Dealing with Request Data

The most important rule about web development is “Do not trust the user”. This is especially true for incoming request data on the input stream. With WSGI this is actually a bit harder than you would expect. Because of that, Werkzeug wraps the request stream for you to save you from the most prominent problems with it.

Missing EOF Marker on Input Stream

The input stream has no end-of-file marker. If you call the read() method on the wsgi.input stream, your application will hang on conforming servers. This is intentional, however painful. Werkzeug solves the problem by wrapping the input stream in a special LimitedStream. The input stream is exposed on the request object as stream. It is either an empty stream (if the form data was parsed) or a limited stream with the contents of the input stream.
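The idea behind a limited stream can be sketched in plain Python. This is only an illustration of the principle, not Werkzeug's LimitedStream implementation; the names BoundedReader and bounded_input are hypothetical, and the sketch assumes the client sent a valid Content-Length header.

```python
import io


class BoundedReader:
    """Hypothetical stand-in for a limited stream: never reads past `limit` bytes."""

    def __init__(self, raw, limit):
        self._raw = raw
        self._remaining = max(int(limit), 0)

    def read(self, size=-1):
        # Cap every read at the number of bytes the client announced, so the
        # call returns instead of waiting for data that will never arrive.
        if size is None or size < 0 or size > self._remaining:
            size = self._remaining
        if size == 0:
            return b""
        chunk = self._raw.read(size)
        self._remaining -= len(chunk)
        return chunk


def bounded_input(environ):
    """Wrap wsgi.input using the CONTENT_LENGTH value from the WSGI environ."""
    length = int(environ.get("CONTENT_LENGTH") or 0)
    return BoundedReader(environ["wsgi.input"], length)


# Usage: the underlying stream holds more bytes than the request announced.
environ = {"wsgi.input": io.BytesIO(b"name=Ada&extra-bytes"), "CONTENT_LENGTH": "8"}
stream = bounded_input(environ)
body = stream.read()       # stops after the announced 8 bytes
leftover = stream.read()   # the limit is exhausted, so this is empty
```

Because every read is capped by the announced length, a plain read() on the wrapper is safe even though the raw stream has no end-of-file marker.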

When does Werkzeug Parse?

Werkzeug parses the incoming data under the following situations:

  • you access either form, files, or stream and the request method was POST or PUT.
  • you call parse_form_data().

These calls are not interchangeable. If you invoke parse_form_data() you must not use the request object, or at least not the attributes that trigger the parsing process.

This is also true if you read from the wsgi.input stream before the parsing.

General rule: Leave the WSGI input stream alone. Especially in WSGI middlewares. Use either the parsing functions or the request object. Do not mix multiple WSGI utility libraries for form data parsing or anything else that works on the input stream.
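The reason behind this rule is that the input stream can only be consumed once. The following sketch shows the effect using only the standard library; parse_form and make_environ are hypothetical helpers for illustration and are not Werkzeug's parse_form_data().

```python
import io
from urllib.parse import parse_qs


def parse_form(environ):
    """Hypothetical form parser: reads the body once and decodes it."""
    length = int(environ.get("CONTENT_LENGTH") or 0)
    raw = environ["wsgi.input"].read(length)
    return parse_qs(raw.decode("utf-8"))


def make_environ():
    """Build a minimal fake WSGI environ carrying a URL-encoded body."""
    body = b"name=Ada"
    return {
        "REQUEST_METHOD": "POST",
        "CONTENT_LENGTH": str(len(body)),
        "wsgi.input": io.BytesIO(body),
    }


# The parser is the only reader: the form data arrives intact.
untouched = make_environ()
form_ok = parse_form(untouched)

# Something else (for example a careless middleware) reads the stream first:
# the parser finds nothing left to read.
peeked = make_environ()
peeked["wsgi.input"].read()
form_lost = parse_form(peeked)
```

In the second case the form comes back empty without any error being raised, which is why mixing readers on the same stream tends to produce bugs that are hard to track down.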

How does it Parse?

The standard Werkzeug parsing behavior handles three cases:

  • the input content type was multipart/form-data. In this situation the stream will be empty and form will contain the regular POST / PUT data; files will contain the uploaded files as FileStorage objects.
  • the input content type was application/x-www-form-urlencoded. Then the stream will be empty, form will contain the regular POST / PUT data, and files will be empty.
  • the input content type was neither of them. In that case stream points to a LimitedStream with the input data for further processing.
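The dispatch on the content type can be sketched roughly as follows. This is a simplified illustration with the standard library, not Werkzeug's parser: split_request_body and fake_environ are hypothetical names, and the multipart case is deliberately left unimplemented.

```python
import io
from urllib.parse import parse_qs


def split_request_body(environ):
    """Return (form, stream) roughly as the three cases above describe."""
    # Strip parameters such as "; charset=utf-8" or "; boundary=..." first.
    mimetype = environ.get("CONTENT_TYPE", "").split(";", 1)[0].strip().lower()
    if mimetype == "multipart/form-data":
        raise NotImplementedError("multipart parsing is outside this sketch")

    length = int(environ.get("CONTENT_LENGTH") or 0)
    body = environ["wsgi.input"].read(length)

    if mimetype == "application/x-www-form-urlencoded":
        # The form data was parsed, so nothing is left on the stream.
        return parse_qs(body.decode("utf-8")), io.BytesIO(b"")

    # Any other content type: no form data, the body stays available as a stream.
    return {}, io.BytesIO(body)


def fake_environ(body, content_type):
    """Build a minimal fake WSGI environ for the examples below."""
    return {
        "CONTENT_TYPE": content_type,
        "CONTENT_LENGTH": str(len(body)),
        "wsgi.input": io.BytesIO(body),
    }


form, form_stream = split_request_body(
    fake_environ(b"a=1&b=2", "application/x-www-form-urlencoded; charset=utf-8")
)
raw_form, raw_stream = split_request_body(
    fake_environ(b'{"a": 1}', "application/json")
)
```

For the URL-encoded request the parsed fields end up in form and the stream is empty; for the JSON request the form is empty and the untouched body can still be read from the stream.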

Special note on the get_data() method: Calling this loads the full request data into memory. This is only safe to do if max_content_length is set. Also, you can either read the stream or call get_data(), not both.

Limiting Request Data

To avoid being the victim of a DDoS attack you can set the maximum accepted content length and request field sizes. The BaseRequest class has two attributes for that: max_content_length and max_form_memory_size.

The first one can be used to limit the total content length. For example, by setting it to 1024 * 1024 * 16 the request won’t accept more than 16 MB of transmitted data.

Because certain data can’t be moved to the hard disk (regular POST data) whereas temporary files can, there is a second limit you can set. max_form_memory_size limits the size of POST-transmitted form data. By setting it to 1024 * 1024 * 2 you can make sure that all fields stored in memory are not more than 2 MB in size.

This, however, does not affect files stored in memory if the stream_factory used returns an in-memory file.
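To make the effect of a content length limit concrete, here is a minimal sketch of what such a check amounts to at the WSGI level. It is not how Werkzeug implements max_content_length; with Werkzeug you would set the attribute on your request class instead. The names limit_content_length, hello_app and record_status are hypothetical, and the check only looks at the Content-Length header without touching the input stream.

```python
MAX_CONTENT_LENGTH = 1024 * 1024 * 16  # 16 MB for the whole request body


def limit_content_length(app, max_content_length=MAX_CONTENT_LENGTH):
    """Wrap a WSGI app and reject requests that announce too large a body."""

    def middleware(environ, start_response):
        length = int(environ.get("CONTENT_LENGTH") or 0)
        if length > max_content_length:
            start_response(
                "413 Request Entity Too Large",
                [("Content-Type", "text/plain")],
            )
            return [b"Request body too large"]
        return app(environ, start_response)

    return middleware


def hello_app(environ, start_response):
    """Trivial application used to exercise the wrapper."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"ok"]


statuses = []


def record_status(status, headers):
    """Fake start_response that only remembers the status line."""
    statuses.append(status)


wrapped = limit_content_length(hello_app)
accepted = wrapped({"CONTENT_LENGTH": str(MAX_CONTENT_LENGTH)}, record_status)
rejected = wrapped({"CONTENT_LENGTH": str(MAX_CONTENT_LENGTH + 1)}, record_status)
```

A request of exactly 16 MB passes through to the application, while one byte more is answered with a 413 status before the body is read.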

How to extend Parsing?

Modern web applications transmit a lot more than multipart form data or URL-encoded data. To extend the capabilities, subclass BaseRequest or Request and add or extend methods.

There is already a mixin that provides JSON parsing:

    from werkzeug.wrappers import Request
    from werkzeug.wrappers.json import JSONMixin

    class JSONRequest(JSONMixin, Request):
        pass

The basic implementation of that looks like:

    from werkzeug.utils import cached_property
    from werkzeug.wrappers import Request
    import simplejson as json

    class JSONRequest(Request):
        @cached_property
        def json(self):
            # Parse the body only when the client declared a JSON payload;
            # for any other mimetype the property evaluates to None.
            if self.mimetype == "application/json":
                return json.loads(self.data)
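The same pattern can be tried without Werkzeug installed. The sketch below is a standard-library analogue, not the Werkzeug implementation: EnvironRequest is a hypothetical minimal wrapper, and functools.cached_property (Python 3.8+) stands in for werkzeug.utils.cached_property.

```python
import io
import json
from functools import cached_property


class EnvironRequest:
    """Hypothetical, minimal request wrapper used only for this illustration."""

    def __init__(self, environ):
        self.environ = environ

    @property
    def mimetype(self):
        # Content type without parameters such as "; charset=utf-8".
        return self.environ.get("CONTENT_TYPE", "").split(";", 1)[0].strip().lower()

    @cached_property
    def data(self):
        # Read the body once; later accesses reuse the cached bytes.
        length = int(self.environ.get("CONTENT_LENGTH") or 0)
        return self.environ["wsgi.input"].read(length)

    @cached_property
    def json(self):
        # Only decode when the client declared a JSON payload.
        if self.mimetype == "application/json":
            return json.loads(self.data)
        return None


def fake_environ(body, content_type):
    """Build a minimal fake WSGI environ for the examples below."""
    return {
        "CONTENT_TYPE": content_type,
        "CONTENT_LENGTH": str(len(body)),
        "wsgi.input": io.BytesIO(body),
    }


json_request = EnvironRequest(fake_environ(b'{"id": 42}', "application/json; charset=utf-8"))
text_request = EnvironRequest(fake_environ(b"hello", "text/plain"))
```

Caching matters here because the body can only be read from the stream once: the first access to json parses the payload, and every later access returns the same object.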