Robert Brewer wrote:

> There's a major change in functionality in the cgi module between
> Python 2 and Python 3 which I've just run across: the behavior of
> FieldStorage.read_multi, specifically when an HTTP app accepts a file
> upload within a multipart/form-data payload.
>
> In Python 2, each part would be read in sequence within its own
> FieldStorage instance. This allowed file uploads to be shunted to a
> TemporaryFile (via make_file) as needed:
>
>     klass = self.FieldStorageClass or self.__class__
>     part = klass(self.fp, {}, ib,
>                  environ, keep_blank_values, strict_parsing)
>     # Throw first part away
>     while not part.done:
>         headers = rfc822.Message(self.fp)
>         part = klass(self.fp, headers, ib,
>                      environ, keep_blank_values, strict_parsing)
>         self.list.append(part)
>
> In Python 3 (svn revision 72466), the whole request body is read into
> memory first via fp.read(), and then broken into separate parts in a
> second step:
>
>     klass = self.FieldStorageClass or self.__class__
>     parser = email.parser.FeedParser()
>     # Create bogus content-type header for proper multipart parsing
>     parser.feed('Content-Type: %s; boundary=%s\r\n\r\n' % (self.type, ib))
>     parser.feed(self.fp.read())
>     full_msg = parser.close()
>     # Get subparts
>     msgs = full_msg.get_payload()
>     for msg in msgs:
>         fp = StringIO(msg.get_payload())
>         part = klass(fp, msg, ib, environ, keep_blank_values,
>                      strict_parsing)
>         self.list.append(part)
>
> This makes the cgi module in Python 3 somewhat crippled for handling
> multipart/form-data file uploads of any significant size (and since
> the client is the one determining the size, it opens a server up to
> an unexpected denial-of-service vector).
>
> I *think* the FeedParser is designed to accept incremental writes,
> but I haven't yet found a way to do any kind of incremental reads
> from it in order to shunt the fp.read out to a tempfile again.
> I'm secretly hoping Barry has a one-liner fix for this. ;)

I think what it needs is for the email.parser.FeedParser class to have
a feed_from_file() method, supported by the BufferedSubFile class.
BufferedSubFile keeps an internal list of lines. Perhaps it could also
keep a list of files, so that when the list of lines becomes empty it
can continue by reading lines from the files instead, dropping a file
from the list when it reaches the end, something like this:

[Module feedparser.py]
...
class BufferedSubFile(object):
    ...
    def __init__(self):
        # The last partial line pushed into this object.
        self._partial = ''
        # The list of full, pushed lines, in reverse order.
        self._lines = []
        # The list of files queued for lazy reading.
        self._files = []
        ...
    ...
    def readline(self):
        # Refill the line buffer from the queued files before giving up.
        while not self._lines and self._files:
            # MAX_DATA_SIZE would be a new module constant bounding each read.
            data = self._files[0].read(MAX_DATA_SIZE)
            if data:
                self.push(data)
            else:
                # This file is exhausted; drop it and try the next one.
                del self._files[0]
        if not self._lines:
            if self._closed:
                return ''
            return NeedMoreData
        ...

    def push_file(self, data_file):
        """Push some new data from a file into this object."""
        self._files.append(data_file)
    ...

and then:

...
class FeedParser:
    ...
    def feed(self, data):
        """Push more data into the parser."""
        self._input.push(data)
        self._call_parse()

    def feed_from_file(self, data_file):
        """Push more data from a file into the parser."""
        self._input.push_file(data_file)
        self._call_parse()
    ...
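To make the connection back to cgi concrete, here is roughly what the
read_multi() call site could look like if a feed_from_file() method of
this shape were added. This is only a sketch against the proposal
above; feed_from_file() does not exist in the stdlib today:

    klass = self.FieldStorageClass or self.__class__
    parser = email.parser.FeedParser()
    # Same bogus content-type header trick as in the current code.
    parser.feed('Content-Type: %s; boundary=%s\r\n\r\n' % (self.type, ib))
    # Hand the parser the file itself; BufferedSubFile would then pull
    # bounded chunks on demand instead of us calling self.fp.read().
    parser.feed_from_file(self.fp)    # proposed method, not in stdlib
    full_msg = parser.close()

Note that this only removes the single monolithic read: _call_parse()
would still drain the whole file during feed_from_file(), so the parsed
Message objects would still hold every part's payload in memory, and
FieldStorage would still need further changes to stream large parts
out through make_file().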
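As for incremental writes: FeedParser.feed() does already accept
arbitrary chunks, so even without any stdlib change the one-shot
fp.read() can be broken into bounded pieces. A minimal, self-contained
sketch (the parse_multipart name, CHUNK_SIZE value, and 'XX' boundary
are all just illustrative):

    import email.parser
    from io import StringIO

    CHUNK_SIZE = 1 << 16  # 64 KiB per feed; an arbitrary choice

    def parse_multipart(fp, ctype, boundary):
        """Feed fp to a FeedParser in bounded chunks, not one big read()."""
        parser = email.parser.FeedParser()
        # Same bogus header trick cgi uses to trigger multipart parsing.
        parser.feed('Content-Type: %s; boundary=%s\r\n\r\n'
                    % (ctype, boundary))
        while True:
            data = fp.read(CHUNK_SIZE)
            if not data:
                break
            parser.feed(data)
        return parser.close()

    body = '--XX\r\n\r\nhello\r\n--XX--\r\n'
    msg = parse_multipart(StringIO(body), 'multipart/form-data', 'XX')
    for part in msg.get_payload():
        print(repr(part.get_payload()))   # each subpart's payload

This at least avoids materializing the entire raw body as one string
alongside the parser's own buffers, but it does not solve Robert's
real problem: the finished Message still buffers every part's payload,
with no way to shunt a large upload out to a tempfile mid-parse.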