Can I just ask why 444 since 392 was the last assigned Python 2 number?

On Wed, Sep 15, 2010 at 15:40, georg.brandl <python-checkins at python.org> wrote:
> Author: georg.brandl
> Date: Thu Sep 16 00:40:38 2010
> New Revision: 84842
>
> Log:
> Add PEP 444, Python Web3 Interface.
>
> Added:
>    peps/trunk/pep-0444.txt   (contents, props changed)
>
> Added: peps/trunk/pep-0444.txt
> ==============================================================================
> --- (empty file)
> +++ peps/trunk/pep-0444.txt Thu Sep 16 00:40:38 2010
> @@ -0,0 +1,1570 @@
> +PEP: 444
> +Title: Python Web3 Interface
> +Version: $Revision$
> +Last-Modified: $Date$
> +Author: Chris McDonough <chrism at plope.com>,
> +        Armin Ronacher <armin.ronacher at active-4.com>
> +Discussions-To: Python Web-SIG <web-sig at python.org>
> +Status: Draft
> +Type: Informational
> +Content-Type: text/x-rst
> +Created: 19-Jul-2010
> +
> +
> +Abstract
> +========
> +
> +This document specifies a proposed second-generation standard
> +interface between web servers and Python web applications or
> +frameworks.
> +
> +
> +Rationale and Goals
> +===================
> +
> +This protocol and specification is influenced heavily by the Web
> +Server Gateway Interface (WSGI) 1.0 standard described in PEP 333
> +[1]_ . The high-level rationale for having any standard that allows
> +Python-based web servers and applications to interoperate is outlined
> +in PEP 333. This document essentially uses PEP 333 as a template, and
> +changes its wording in various places for the purpose of forming a
> +different standard.
> +
> +Python currently boasts a wide variety of web application frameworks
> +which use the WSGI 1.0 protocol. However, due to changes in the
> +language, the WSGI 1.0 protocol is not compatible with Python 3. This
> +specification describes a standardized WSGI-like protocol that lets
> +Python 2.6, 2.7 and 3.1+ applications communicate with web servers.
> +Web3 is clearly a WSGI derivative; it only uses a different name than
> +"WSGI" in order to indicate that it is not in any way backwards
> +compatible.
> +
> +Applications and servers which are written to this specification are
> +meant to work properly under Python 2.6.X, Python 2.7.X and Python
> +3.1+. Neither an application nor a server that implements the Web3
> +specification can be easily written which will work under Python 2
> +versions earlier than 2.6 nor Python 3 versions earlier than 3.1.
> +
> +.. note::
> +
> +   Whatever Python 3 version fixed http://bugs.python.org/issue4006 so
> +   ``os.environ['foo']`` returns surrogates (a la PEP 383) when the
> +   value of 'foo' cannot be decoded using the current locale instead
> +   of failing with a KeyError is the *true* minimum Python 3 version.
> +   In particular, however, Python 3.0 is not supported.
> +
> +.. note::
> +
> +   Python 2.6 is the first Python version that supported an alias for
> +   ``bytes`` and the ``b"foo"`` literal syntax. This is why it is the
> +   minimum version supported by Web3.
> +
> +Explicability and documentability are the main technical drivers for
> +the decisions made within the standard.
> +
> +
> +Differences from WSGI
> +=====================
> +
> +- All protocol-specific environment names are prefixed with ``web3.``
> +  rather than ``wsgi.``, e.g. ``web3.input`` rather than
> +  ``wsgi.input``.
> +
> +- All values present as environment dictionary *values* are explicitly
> +  *bytes* instances instead of native strings. (Environment *keys*
> +  however are native strings, always ``str`` regardless of
> +  platform).
> +
> +- All values returned by an application must be bytes instances,
> +  including status code, header names and values, and the body.
> +
> +- Wherever WSGI 1.0 referred to an ``app_iter``, this specification
> +  refers to a ``body``.
> +
> +- No ``start_response()`` callback (and therefore no ``write()``
> +  callable nor ``exc_info`` data).
> +
> +- The ``readline()`` function of ``web3.input`` must support a size
> +  hint parameter.
> +
> +- The ``read()`` function of ``web3.input`` must be length delimited.
> +  A call without a size argument must not read more than the content
> +  length header specifies. In case a content length header is absent
> +  the stream must not return anything on read. It must never request
> +  more data than specified from the client.
> +
> +- No requirement for middleware to yield an empty string if it needs
> +  more information from an application to produce output (e.g. no
> +  "Middleware Handling of Block Boundaries").
> +
> +- Filelike objects passed to a "file_wrapper" must have an
> +  ``__iter__`` which returns bytes (never text).
> +
> +- ``wsgi.file_wrapper`` is not supported.
> +
> +- ``QUERY_STRING``, ``SCRIPT_NAME``, ``PATH_INFO`` values required to
> +  be placed in environ by server (each as the empty bytes instance if
> +  no associated value is received in the HTTP request).
> +
> +- ``web3.path_info`` and ``web3.script_name`` should be put into the
> +  Web3 environment, if possible, by the origin Web3 server. When
> +  available, each is the original, plain 7-bit ASCII, URL-encoded
> +  variant of its CGI equivalent derived directly from the request URI
> +  (with %2F segment markers and other meta-characters intact). If the
> +  server cannot provide one (or both) of these values, it must omit
> +  the value(s) it cannot provide from the environment.
> +
> +- This requirement was removed: "middleware components **must not**
> +  block iteration waiting for multiple values from an application
> +  iterable. If the middleware needs to accumulate more data from the
> +  application before it can produce any output, it **must** yield an
> +  empty string."
> +
> +- ``SERVER_PORT`` must be a bytes instance (not an integer).
> +
> +- The server must not inject an additional ``Content-Length`` header
> +  by guessing the length from the response iterable.
> +  This must be set by the application itself in all situations.
> +
> +- If the origin server advertises that it has the ``web3.async``
> +  capability, a Web3 application callable used by the server is
> +  permitted to return a callable that accepts no arguments. When it
> +  does so, this callable is to be called periodically by the origin
> +  server until it returns a non-``None`` response, which must be a
> +  normal Web3 response tuple.
> +
> +  .. XXX (chrism) Needs a section of its own for explanation.
> +
> +
> +Specification Overview
> +======================
> +
> +The Web3 interface has two sides: the "server" or "gateway" side, and
> +the "application" or "framework" side. The server side invokes a
> +callable object that is provided by the application side. The
> +specifics of how that object is provided are up to the server or
> +gateway. It is assumed that some servers or gateways will require an
> +application's deployer to write a short script to create an instance
> +of the server or gateway, and supply it with the application object.
> +Other servers and gateways may use configuration files or other
> +mechanisms to specify where an application object should be imported
> +from, or otherwise obtained.
> +
> +In addition to "pure" servers/gateways and applications/frameworks, it
> +is also possible to create "middleware" components that implement both
> +sides of this specification. Such components act as an application to
> +their containing server, and as a server to a contained application,
> +and can be used to provide extended APIs, content transformation,
> +navigation, and other useful functions.
> +
> +Throughout this specification, we will use the term "application
> +callable" to mean "a function, a method, or an instance with a
> +``__call__`` method". It is up to the server, gateway, or application
> +implementing the application callable to choose the appropriate
> +implementation technique for their needs.
> +Conversely, a server, gateway, or application that is invoking a
> +callable **must not** have any dependency on what kind of callable
> +was provided to it. Application callables are only to be called,
> +not introspected upon.
> +
> +
> +The Application/Framework Side
> +------------------------------
> +
> +The application object is simply a callable object that accepts one
> +argument. The term "object" should not be misconstrued as requiring
> +an actual object instance: a function, method, or instance with a
> +``__call__`` method are all acceptable for use as an application
> +object. Application objects must be able to be invoked more than
> +once, as virtually all servers/gateways (other than CGI) will make
> +such repeated requests. If this cannot be guaranteed by the
> +implementation of the actual application, it has to be wrapped in a
> +function that creates a new instance on each call.
> +
> +.. note::
> +
> +   Although we refer to it as an "application" object, this should not
> +   be construed to mean that application developers will use Web3 as a
> +   web programming API. It is assumed that application developers
> +   will continue to use existing, high-level framework services to
> +   develop their applications. Web3 is a tool for framework and
> +   server developers, and is not intended to directly support
> +   application developers.
> +
> +An example of an application which is a function (``simple_app``)::
> +
> +    def simple_app(environ):
> +        """Simplest possible application object"""
> +        status = b'200 OK'
> +        headers = [(b'Content-type', b'text/plain')]
> +        body = [b'Hello world!\n']
> +        return body, status, headers
> +
> +An example of an application which is an instance (``simple_app``)::
> +
> +    class AppClass(object):
> +
> +        """Produce the same output, but using an instance. An
> +        instance of this class must be instantiated before it is
> +        passed to the server.
> +        """
> +
> +        def __call__(self, environ):
> +            status = b'200 OK'
> +            headers = [(b'Content-type', b'text/plain')]
> +            body = [b'Hello world!\n']
> +            return body, status, headers
> +
> +    simple_app = AppClass()
> +
> +Alternately, an application callable may return a callable instead of
> +the tuple if the server supports asynchronous execution. See
> +information concerning ``web3.async`` for more information.
> +
> +
> +The Server/Gateway Side
> +-----------------------
> +
> +The server or gateway invokes the application callable once for each
> +request it receives from an HTTP client, that is directed at the
> +application. To illustrate, here is a simple CGI gateway, implemented
> +as a function taking an application object. Note that this simple
> +example has limited error handling, because by default an uncaught
> +exception will be dumped to ``sys.stderr`` and logged by the web
> +server.
> +
> +::
> +
> +    import locale
> +    import os
> +    import sys
> +
> +    encoding = locale.getpreferredencoding()
> +
> +    stdout = sys.stdout
> +
> +    if hasattr(sys.stdout, 'buffer'):
> +        # Python 3 compatibility; we need to be able to push bytes out
> +        stdout = sys.stdout.buffer
> +
> +    def get_environ():
> +        d = {}
> +        for k, v in os.environ.items():
> +            # Python 3 compatibility
> +            if not isinstance(v, bytes):
> +                # We must explicitly encode the string to bytes under
> +                # Python 3.1+
> +                v = v.encode(encoding, 'surrogateescape')
> +            d[k] = v
> +        return d
> +
> +    def run_with_cgi(application):
> +
> +        environ = get_environ()
> +        environ['web3.input'] = sys.stdin
> +        environ['web3.errors'] = sys.stderr
> +        environ['web3.version'] = (1, 0)
> +        environ['web3.multithread'] = False
> +        environ['web3.multiprocess'] = True
> +        environ['web3.run_once'] = True
> +        environ['web3.async'] = False
> +
> +        if environ.get('HTTPS', b'off') in (b'on', b'1'):
> +            environ['web3.url_scheme'] = b'https'
> +        else:
> +            environ['web3.url_scheme'] = b'http'
> +
> +        rv = application(environ)
> +        if hasattr(rv, '__call__'):
> +            raise TypeError('This webserver does not support asynchronous '
> +                            'responses.')
> +        body, status, headers = rv
> +
> +        CRLF = b'\r\n'
> +
> +        try:
> +            stdout.write(b'Status: ' + status + CRLF)
> +            for header_name, header_val in headers:
> +                stdout.write(header_name + b': ' + header_val + CRLF)
> +            stdout.write(CRLF)
> +            for chunk in body:
> +                stdout.write(chunk)
> +                stdout.flush()
> +        finally:
> +            if hasattr(body, 'close'):
> +                body.close()
> +
> +
> +Middleware: Components that Play Both Sides
> +-------------------------------------------
> +
> +A single object may play the role of a server with respect to some
> +application(s), while also acting as an application with respect to
> +some server(s). Such "middleware" components can perform such
> +functions as:
> +
> +* Routing a request to different application objects based on the
> +  target URL, after rewriting the ``environ`` accordingly.
> +
> +* Allowing multiple applications or frameworks to run side-by-side in
> +  the same process.
> +
> +* Load balancing and remote processing, by forwarding requests and
> +  responses over a network.
> +
> +* Performing content postprocessing, such as applying XSL stylesheets.
> +
> +The presence of middleware in general is transparent to both the
> +"server/gateway" and the "application/framework" sides of the
> +interface, and should require no special support. A user who desires
> +to incorporate middleware into an application simply provides the
> +middleware component to the server, as if it were an application, and
> +configures the middleware component to invoke the application, as if
> +the middleware component were a server. Of course, the "application"
> +that the middleware wraps may in fact be another middleware component
> +wrapping another application, and so on, creating what is referred to
> +as a "middleware stack".
> +
> +A middleware must support asynchronous execution if possible or fall
> +back to disabling itself.
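
The "fall back to disabling itself" clause is not spelled out in the draft, so here is one possible reading as a rough sketch, not anything the PEP prescribes. The middleware name ``uppercase_unless_async`` and its upper-casing behaviour are made up for illustration; the only parts taken from the draft are the ``body, status, headers`` tuple order used in its examples and the ``hasattr(rv, '__call__')`` test for an asynchronous response.

```python
def uppercase_unless_async(app):
    """Hypothetical middleware: upper-cases synchronous bodies and
    steps aside (disables itself) for asynchronous responses."""
    def new_application(environ):
        rv = app(environ)
        if hasattr(rv, '__call__'):
            # Asynchronous response: hand the polling callable through
            # untouched instead of trying to filter it.
            return rv
        body, status, headers = rv
        try:
            # bytes.upper() keeps the length, so any Content-Length
            # header set by the application stays valid.
            new_body = [chunk.upper() for chunk in body]
        finally:
            # The original iterable is consumed here, so release it.
            if hasattr(body, 'close'):
                body.close()
        return new_body, status, headers
    return new_application
```

The draft's own example further down takes the other route and filters asynchronous responses too, by wrapping the polling callable.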
> +
> +Here is a middleware that changes the ``HTTP_HOST`` key if an ``X-Host``
> +header exists and adds a comment to all html responses::
> +
> +    import time
> +
> +    def apply_filter(app, environ, filter_func):
> +        """Helper function that passes the return value from an
> +        application to a filter function when the results are
> +        ready.
> +        """
> +        app_response = app(environ)
> +
> +        # synchronous response, filter now
> +        if not hasattr(app_response, '__call__'):
> +            return filter_func(*app_response)
> +
> +        # asynchronous response. filter when results are ready
> +        def polling_function():
> +            rv = app_response()
> +            if rv is not None:
> +                return filter_func(*rv)
> +        return polling_function
> +
> +    def proxy_and_timing_support(app):
> +        def new_application(environ):
> +            def filter_func(body, status, headers):
> +                now = time.time()
> +                for key, value in headers:
> +                    if key.lower() == b'content-type' and \
> +                       value.split(b';')[0] == b'text/html':
> +                        # assumes ascii compatible encoding in body,
> +                        # but the middleware should actually parse the
> +                        # content type header and figure out the
> +                        # encoding when doing that.
> +                        comment = ('<!-- Execution time: %.2fsec -->' %
> +                                   (now - then)).encode('ascii')
> +                        # body is an iterable of bytes chunks, so the
> +                        # comment is appended as one more chunk
> +                        body = list(body) + [comment]
> +                        break
> +                return body, status, headers
> +            then = time.time()
> +            host = environ.get('HTTP_X_HOST')
> +            if host is not None:
> +                environ['HTTP_HOST'] = host
> +
> +            # use the apply_filter function that applies a given filter
> +            # function for both async and sync responses.
> +            return apply_filter(app, environ, filter_func)
> +        return new_application
> +
> +    app = proxy_and_timing_support(app)
> +
> +
> +Specification Details
> +=====================
> +
> +The application callable must accept one positional argument. For the
> +sake of illustration, we have named it ``environ``, but it is not
> +required to have this name. A server or gateway **must** invoke the
> +application object using a positional (not keyword) argument.
> +(E.g.
> +by calling ``status, headers, body = application(environ)`` as
> +shown above.)
> +
> +The ``environ`` parameter is a dictionary object, containing CGI-style
> +environment variables. This object **must** be a builtin Python
> +dictionary (*not* a subclass, ``UserDict`` or other dictionary
> +emulation), and the application is allowed to modify the dictionary in
> +any way it desires. The dictionary must also include certain
> +Web3-required variables (described in a later section), and may also
> +include server-specific extension variables, named according to a
> +convention that will be described below.
> +
> +When called by the server, the application object must return a tuple
> +yielding three elements: ``status``, ``headers`` and ``body``, or, if
> +supported by an async server, an argumentless callable which either
> +returns ``None`` or a tuple of those three elements.
> +
> +The ``status`` element is a status in bytes of the form
> +``b'999 Message here'``.
> +
> +``headers`` is a Python list of ``(header_name, header_value)`` pairs
> +describing the HTTP response header. The ``headers`` structure must
> +be a literal Python list; it must yield two-tuples. Both
> +``header_name`` and ``header_value`` must be bytes values.
> +
> +The ``body`` is an iterable yielding zero or more bytes instances.
> +This can be accomplished in a variety of ways, such as by returning a
> +list containing bytes instances as ``body``, or by returning a
> +generator function as ``body`` that yields bytes instances, or by the
> +``body`` being an instance of a class which is iterable. Regardless
> +of how it is accomplished, the application object must always return a
> +``body`` iterable yielding zero or more bytes instances.
> +
> +The server or gateway must transmit the yielded bytes to the client in
> +an unbuffered fashion, completing the transmission of each set of
> +bytes before requesting another one.
> +(In other words, applications **should** perform their own
> +buffering. See the `Buffering and Streaming`_ section below for more
> +on how application output must be handled.)
> +
> +The server or gateway should treat the yielded bytes as binary byte
> +sequences: in particular, it should ensure that line endings are not
> +altered. The application is responsible for ensuring that the
> +string(s) to be written are in a format suitable for the client. (The
> +server or gateway **may** apply HTTP transfer encodings, or perform
> +other transformations for the purpose of implementing HTTP features
> +such as byte-range transmission. See `Other HTTP Features`_, below,
> +for more details.)
> +
> +If the ``body`` iterable returned by the application has a ``close()``
> +method, the server or gateway **must** call that method upon
> +completion of the current request, whether the request was completed
> +normally, or terminated early due to an error. This is to support
> +resource release by the application and is intended to complement PEP
> +325's generator support, and other common iterables with ``close()``
> +methods.
> +
> +Finally, servers and gateways **must not** directly use any other
> +attributes of the ``body`` iterable returned by the application.
> +
> +
> +``environ`` Variables
> +---------------------
> +
> +The ``environ`` dictionary is required to contain various CGI
> +environment variables, as defined by the Common Gateway Interface
> +specification [2]_.
> +
> +The following CGI variables **must** be present. Each key is a native
> +string. Each value is a bytes instance.
> +
> +.. note::
> +
> +   In Python 3.1+, a "native string" is a ``str`` type decoded using
> +   the ``surrogateescape`` error handler, as done by
> +   ``os.environ.__getitem__``. In Python 2.6 and 2.7, a "native
> +   string" is a ``str`` type representing a set of bytes.
> +
> +``REQUEST_METHOD``
> +  The HTTP request method, such as ``"GET"`` or ``"POST"``.
> +
> +``SCRIPT_NAME``
> +  The initial portion of the request URL's "path" that corresponds to
> +  the application object, so that the application knows its virtual
> +  "location". This may be the empty bytes instance if the application
> +  corresponds to the "root" of the server. SCRIPT_NAME will be a
> +  bytes instance representing a sequence of URL-encoded segments
> +  separated by the slash character (``/``). It is assumed that
> +  ``%2F`` characters will be decoded into literal slash characters
> +  within ``PATH_INFO`` , as per CGI.
> +
> +``PATH_INFO``
> +  The remainder of the request URL's "path", designating the virtual
> +  "location" of the request's target within the application. This
> +  **may** be the empty bytes instance if the request URL targets the
> +  application root and does not have a trailing slash. PATH_INFO will
> +  be a bytes instance representing a sequence of URL-encoded segments
> +  separated by the slash character (``/``). It is assumed that
> +  ``%2F`` characters will be decoded into literal slash characters
> +  within ``PATH_INFO`` , as per CGI.
> +
> +``QUERY_STRING``
> +  The portion of the request URL (in bytes) that follows the ``"?"``,
> +  if any, or the empty bytes instance.
> +
> +``SERVER_NAME``, ``SERVER_PORT``
> +  When combined with ``SCRIPT_NAME`` and ``PATH_INFO`` (or their raw
> +  equivalents), these variables can be used to complete the URL.
> +  Note, however, that ``HTTP_HOST``, if present, should be used in
> +  preference to ``SERVER_NAME`` for reconstructing the request URL.
> +  See the `URL Reconstruction`_ section below for more detail.
> +  ``SERVER_PORT`` should be a bytes instance, not an integer.
> +
> +``SERVER_PROTOCOL``
> +  The version of the protocol the client used to send the request.
> +  Typically this will be something like ``"HTTP/1.0"`` or
> +  ``"HTTP/1.1"`` and may be used by the application to determine how
> +  to treat any HTTP request headers.
> +  (This variable should probably be called ``REQUEST_PROTOCOL``,
> +  since it denotes the protocol used in the request, and is not
> +  necessarily the protocol that will be used in the server's
> +  response. However, for compatibility with CGI we have to keep the
> +  existing name.)
> +
> +The following CGI values **may** be present in the Web3 environment.
> +Each key is a native string. Each value is a bytes instance.
> +
> +``CONTENT_TYPE``
> +  The contents of any ``Content-Type`` fields in the HTTP request.
> +
> +``CONTENT_LENGTH``
> +  The contents of any ``Content-Length`` fields in the HTTP request.
> +
> +``HTTP_`` Variables
> +  Variables corresponding to the client-supplied HTTP request headers
> +  (i.e., variables whose names begin with ``"HTTP_"``). The presence
> +  or absence of these variables should correspond with the presence or
> +  absence of the appropriate HTTP header in the request.
> +
> +A server or gateway **should** attempt to provide as many other CGI
> +variables as are applicable, each with a string for its key and a
> +bytes instance for its value. In addition, if SSL is in use, the
> +server or gateway **should** also provide as many of the Apache SSL
> +environment variables [5]_ as are applicable, such as ``HTTPS=on`` and
> +``SSL_PROTOCOL``. Note, however, that an application that uses any
> +CGI variables other than the ones listed above is necessarily
> +non-portable to web servers that do not support the relevant
> +extensions. (For example, web servers that do not publish files will
> +not be able to provide a meaningful ``DOCUMENT_ROOT`` or
> +``PATH_TRANSLATED``.)
> +
> +A Web3-compliant server or gateway **should** document what variables
> +it provides, along with their definitions as appropriate.
> +Applications **should** check for the presence of any variables they
> +require, and have a fallback plan in the event such a variable is
> +absent.
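
To make the "check for presence, have a fallback" advice concrete, here is a small sketch of how an application might read optional variables. The helper names ``get_content_length`` and ``get_host`` and the ``b'localhost'`` default are my own assumptions for illustration; the draft only says that keys are native strings, values are bytes, and ``HTTP_HOST`` should be preferred over ``SERVER_NAME``.

```python
def get_content_length(environ, default=0):
    """Return CONTENT_LENGTH as an int, or ``default`` if the variable
    is absent, empty, or not a valid number."""
    raw = environ.get('CONTENT_LENGTH')  # native-string key, bytes value
    if not raw:
        return default
    try:
        return int(raw.decode('ascii'))
    except ValueError:
        # Covers both non-ASCII bytes (UnicodeDecodeError is a
        # ValueError subclass) and non-numeric text.
        return default


def get_host(environ):
    """Prefer HTTP_HOST and fall back to SERVER_NAME."""
    host = environ.get('HTTP_HOST')
    if host is None:
        # b'localhost' is an arbitrary last-resort default.
        host = environ.get('SERVER_NAME', b'localhost')
    return host
```

Both helpers use ``dict.get`` rather than indexing, so a missing optional variable never raises ``KeyError`` inside the application.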
> +
> +Note that CGI variable *values* must be bytes instances, if they are
> +present at all. It is a violation of this specification for a CGI
> +variable's value to be of any type other than ``bytes``. On Python 2,
> +this means they will be of type ``str``. On Python 3, this means they
> +will be of type ``bytes``.
> +
> +The *keys* of all CGI and non-CGI variables in the environ, however,
> +must be "native strings" (on both Python 2 and Python 3, they will be
> +of type ``str``).
> +
> +In addition to the CGI-defined variables, the ``environ`` dictionary
> +**may** also contain arbitrary operating-system "environment
> +variables", and **must** contain the following Web3-defined variables.
> +
> +=====================  ===============================================
> +Variable               Value
> +=====================  ===============================================
> +``web3.version``       The tuple ``(1, 0)``, representing Web3
> +                       version 1.0.
> +
> +``web3.url_scheme``    A bytes value representing the "scheme" portion of
> +                       the URL at which the application is being
> +                       invoked. Normally, this will have the value
> +                       ``b"http"`` or ``b"https"``, as appropriate.
> +
> +``web3.input``         An input stream (file-like object) from which bytes
> +                       constituting the HTTP request body can be read.
> +                       (The server or gateway may perform reads
> +                       on-demand as requested by the application, or
> +                       it may pre-read the client's request body and
> +                       buffer it in-memory or on disk, or use any
> +                       other technique for providing such an input
> +                       stream, according to its preference.)
> +
> +``web3.errors``        An output stream (file-like object) to which error
> +                       output text can be written, for the purpose of
> +                       recording program or other errors in a
> +                       standardized and possibly centralized location.
> +                       This should be a "text mode" stream; i.e.,
> +                       applications should use ``"\n"`` as a line
> +                       ending, and assume that it will be converted to
> +                       the correct line ending by the server/gateway.
> +                       Applications may *not* send bytes to the
> +                       'write' method of this stream; they may only
> +                       send text.
> +
> +                       For many servers, ``web3.errors`` will be the
> +                       server's main error log. Alternatively, this
> +                       may be ``sys.stderr``, or a log file of some
> +                       sort. The server's documentation should
> +                       include an explanation of how to configure this
> +                       or where to find the recorded output. A server
> +                       or gateway may supply different error streams
> +                       to different applications, if this is desired.
> +
> +``web3.multithread``   This value should evaluate true if the
> +                       application object may be simultaneously
> +                       invoked by another thread in the same process,
> +                       and should evaluate false otherwise.
> +
> +``web3.multiprocess``  This value should evaluate true if an
> +                       equivalent application object may be
> +                       simultaneously invoked by another process, and
> +                       should evaluate false otherwise.
> +
> +``web3.run_once``      This value should evaluate true if the server
> +                       or gateway expects (but does not guarantee!)
> +                       that the application will only be invoked this
> +                       one time during the life of its containing
> +                       process. Normally, this will only be true for
> +                       a gateway based on CGI (or something similar).
> +
> +``web3.script_name``   The non-URL-decoded ``SCRIPT_NAME`` value.
> +                       Through a historical inequity, by virtue of the
> +                       CGI specification, ``SCRIPT_NAME`` is present
> +                       within the environment as an already
> +                       URL-decoded string. This is the original
> +                       URL-encoded value derived from the request URI.
> +                       If the server cannot provide this value, it
> +                       must omit it from the environ.
> +
> +``web3.path_info``     The non-URL-decoded ``PATH_INFO`` value.
> +                       Through a historical inequity, by virtue of the
> +                       CGI specification, ``PATH_INFO`` is present
> +                       within the environment as an already
> +                       URL-decoded string. This is the original
> +                       URL-encoded value derived from the request URI.
> +                       If the server cannot provide this value, it
> +                       must omit it from the environ.
> +
> +``web3.async``         This is ``True`` if the webserver supports
> +                       async invocation. In that case an application
> +                       is allowed to return a callable instead of a
> +                       tuple with the response. The exact semantics
> +                       are not specified by this specification.
> +
> +=====================  ===============================================
> +
> +Finally, the ``environ`` dictionary may also contain server-defined
> +variables. These variables should have names which are native
> +strings, composed of only lower-case letters, numbers, dots, and
> +underscores, and should be prefixed with a name that is unique to the
> +defining server or gateway. For example, ``mod_web3`` might define
> +variables with names like ``mod_web3.some_variable``.
> +
> +
> +Input Stream
> +~~~~~~~~~~~~
> +
> +The input stream (``web3.input``) provided by the server must support
> +the following methods:
> +
> +=====================  ========
> +Method                 Notes
> +=====================  ========
> +``read(size)``         1,4
> +``readline([size])``   1,2,4
> +``readlines([size])``  1,3,4
> +``__iter__()``         4
> +=====================  ========
> +
> +The semantics of each method are as documented in the Python Library
> +Reference, except for these notes as listed in the table above:
> +
> +1. The server is not required to read past the client's specified
> +   ``Content-Length``, and is allowed to simulate an end-of-file
> +   condition if the application attempts to read past that point. The
> +   application **should not** attempt to read more data than is
> +   specified by the ``CONTENT_LENGTH`` variable.
> +
> +2. The implementation must support the optional ``size`` argument to
> +   ``readline()``.
> +
> +3. The application is free to not supply a ``size`` argument to
> +   ``readlines()``, and the server or gateway is free to ignore the
> +   value of any supplied ``size`` argument.
> +
> +4. The ``read``, ``readline`` and ``__iter__`` methods must return a
> +   bytes instance. The ``readlines`` method must return a sequence
> +   which contains instances of bytes.
> +
> +The methods listed in the table above **must** be supported by all
> +servers conforming to this specification. Applications conforming to
> +this specification **must not** use any other methods or attributes of
> +the ``input`` object. In particular, applications **must not**
> +attempt to close this stream, even if it possesses a ``close()``
> +method.
> +
> +The input stream should silently ignore attempts to read more than the
> +content length of the request. If no content length is specified the
> +stream must be a dummy stream that does not return anything.
> +
> +
> +Error Stream
> +~~~~~~~~~~~~
> +
> +The error stream (``web3.errors``) provided by the server must support
> +the following methods:
> +
> +===================  ==========  ========
> +Method               Stream      Notes
> +===================  ==========  ========
> +``flush()``          ``errors``  1
> +``write(str)``       ``errors``  2
> +``writelines(seq)``  ``errors``  2
> +===================  ==========  ========
> +
> +The semantics of each method are as documented in the Python Library
> +Reference, except for these notes as listed in the table above:
> +
> +1. Since the ``errors`` stream may not be rewound, servers and
> +   gateways are free to forward write operations immediately, without
> +   buffering. In this case, the ``flush()`` method may be a no-op.
> +   Portable applications, however, cannot assume that output is
> +   unbuffered or that ``flush()`` is a no-op. They must call
> +   ``flush()`` if they need to ensure that output has in fact been
> +   written. (For example, to minimize intermingling of data from
> +   multiple processes writing to the same error log.)
> +
> +2. The ``write()`` method must accept a string argument, but needn't
> +   necessarily accept a bytes argument.
> +   The ``writelines()`` method must accept a sequence argument that
> +   consists entirely of strings, but needn't necessarily accept any
> +   bytes instance as a member of the sequence.
> +
> +The methods listed in the table above **must** be supported by all
> +servers conforming to this specification. Applications conforming to
> +this specification **must not** use any other methods or attributes of
> +the ``errors`` object. In particular, applications **must not**
> +attempt to close this stream, even if it possesses a ``close()``
> +method.
> +
> +
> +Values Returned by A Web3 Application
> +-------------------------------------
> +
> +Web3 applications return an iterable in the form (``status``,
> +``headers``, ``body``). The return value can be any iterable type
> +that returns exactly three values. If the server supports
> +asynchronous applications (``web3.async``), the response may be a
> +callable object (which accepts no arguments).
> +
> +The ``status`` value is assumed by a gateway or server to be an HTTP
> +"status" bytes instance like ``b'200 OK'`` or ``b'404 Not Found'``.
> +That is, it is a string consisting of a Status-Code and a
> +Reason-Phrase, in that order and separated by a single space, with no
> +surrounding whitespace or other characters. (See RFC 2616, Section
> +6.1.1 for more information.) The string **must not** contain control
> +characters, and must not be terminated with a carriage return,
> +linefeed, or combination thereof.
> +
> +The ``headers`` value is assumed by a gateway or server to be a
> +literal Python list of ``(header_name, header_value)`` tuples. Each
> +``header_name`` must be a bytes instance representing a valid HTTP
> +header field-name (as defined by RFC 2616, Section 4.2), without a
> +trailing colon or other punctuation. Each ``header_value`` must be a
> +bytes instance and **must not** include any control characters,
> +including carriage returns or linefeeds, either embedded or at the
> +end.
(These requirements are to minimize the complexity of any > +parsing that must be performed by servers, gateways, and intermediate > +response processors that need to inspect or modify response headers.) > + > +In general, the server or gateway is responsible for ensuring that > +correct headers are sent to the client: if the application omits a > +header required by HTTP (or other relevant specifications that are in > +effect), the server or gateway **must** add it. For example, the HTTP > +``Date:`` and ``Server:`` headers would normally be supplied by the > +server or gateway. The gateway must however not override values with > +the same name if they are emitted by the application. > + > +(A reminder for server/gateway authors: HTTP header names are > +case-insensitive, so be sure to take that into consideration when > +examining application-supplied headers!) > + > +Applications and middleware are forbidden from using HTTP/1.1 > +"hop-by-hop" features or headers, any equivalent features in HTTP/1.0, > +or any headers that would affect the persistence of the client's > +connection to the web server. These features are the exclusive > +province of the actual web server, and a server or gateway **should** > +consider it a fatal error for an application to attempt sending them, > +and raise an error if they are supplied as return values from an > +application in the ``headers`` structure. (For more specifics on > +"hop-by-hop" features and headers, please see the `Other HTTP > +Features`_ section below.) > + > + > +Dealing with Compatibility Across Python Versions > +------------------------------------------------- > + > +Creating Web3 code that runs under both Python 2.6/2.7 and Python 3.1+ > +requires some care on the part of the developer. In general, the Web3 > +specification assumes a certain level of equivalence between the > +Python 2 ``str`` type and the Python 3 ``bytes`` type. 
For example, > +under Python 2, the values present in the Web3 ``environ`` will be > +instances of the ``str`` type; in Python 3, these will be instances of > +the ``bytes`` type. The Python 3 ``bytes`` type does not possess all > +the methods of the Python 2 ``str`` type, and some methods which it > +does possess behave differently than the Python 2 ``str`` type. > +Effectively, to ensure that Web3 middleware and applications work > +across Python versions, developers must do these things: > + > +#) Do not assume comparison equivalence between text values and bytes > + values. If you do so, your code may work under Python 2, but it > + will not work properly under Python 3. For example, don't write > + ``somebytes == 'abc'``. This will sometimes be true on Python 2 > + but it will never be true on Python 3, because a sequence of bytes > + never compares equal to a string under Python 3. Instead, always > + compare a bytes value with a bytes value, e.g. ``somebytes == > + b'abc'``. Code which does this is compatible with and works the > + same in Python 2.6, 2.7, and 3.1. The ``b`` in front of ``'abc'`` > + signals to Python 3 that the value is a literal bytes instance; > + under Python 2 it's a forward compatibility placebo. > + > +#) Don't use the ``__contains__`` method (directly or indirectly) of > + items that are meant to be byteslike without ensuring that its > + argument is also a bytes instance. If you do so, your code may > + work under Python 2, but it will not work properly under Python 3. > + For example, ``'abc' in somebytes`` will raise a ``TypeError`` > + under Python 3, but it will return ``True`` or ``False`` without > + error under Python 2.6 and 2.7. However, ``b'abc' in somebytes`` > + will work the same on both > + versions. In Python 3.2, this restriction may be partially > + removed, as it's rumored that bytes types may obtain a ``__mod__`` > + implementation. > + > +#) ``__getitem__`` should not be used. > + > + ..
XXX > + > +#) Don't try to use the ``format`` method or the ``__mod__`` method of > + instances of bytes (directly or indirectly). In Python 2, the > + ``str`` type which we treat equivalently to Python 3's ``bytes`` > + supports these methods, but actual Python 3 ``bytes`` instances > + don't support them. If you use these methods, your code > + will work under Python 2, but not under Python 3. > + > +#) Do not try to concatenate a bytes value with a string value. This > + may work under Python 2, but it will not work under Python 3. For > + example, doing ``'abc' + somebytes`` will work under Python 2, but > + it will result in a ``TypeError`` under Python 3. Instead, always > + make sure you're concatenating two items of the same type, > + e.g. ``b'abc' + somebytes``. > + > +Web3 expects byte values in other places, such as in all the values > +returned by an application. > + > +In short, to ensure compatibility of Web3 application code between > +Python 2 and Python 3, in Python 2, treat CGI and server variable > +values in the environment as if they had the Python 3 ``bytes`` API > +even though they actually have a more capable API. Likewise for all > +stringlike values returned by a Web3 application. > + > + > +Buffering and Streaming > +----------------------- > + > +Generally speaking, applications will achieve the best throughput by > +buffering their (modestly-sized) output and sending it all at once. > +This is a common approach in existing frameworks: the output is > +buffered in a StringIO or similar object, then transmitted all at > +once, along with the response headers. > + > +The corresponding approach in Web3 is for the application to simply > +return a single-element ``body`` iterable (such as a list) containing > +the response body as a single bytes instance. This is the recommended > +approach for the vast majority of application functions that render > +HTML pages whose text easily fits in memory.
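For what it's worth, the buffered approach described in the quoted text above can be sketched as below. This is only an illustration under my reading of the draft: ``simple_app`` and the header values are made up for the example, and the only thing taken from the spec is the ``(status, headers, body)`` return shape with bytes everywhere.

```python
def simple_app(environ):
    """Hypothetical Web3 application that buffers its whole response."""
    # Render the complete page first; it easily fits in memory.
    body = b"<html><body>Hello, Web3</body></html>"

    status = b"200 OK"
    headers = [
        (b"Content-Type", b"text/html; charset=utf-8"),
        # Header values must be bytes too, so encode the length.
        (b"Content-Length", str(len(body)).encode("ascii")),
    ]

    # A single-element list is the buffered ``body`` iterable.
    return status, headers, [body]
```

A server would unpack the three values and write ``b"".join(body)`` in one go. Note that nothing here uses ``%`` or ``format`` on bytes, in line with the compatibility rules quoted above.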
> + > +For large files, however, or for specialized uses of HTTP streaming > +(such as multipart "server push"), an application may need to provide > +output in smaller blocks (e.g. to avoid loading a large file into > +memory). It's also sometimes the case that part of a response may be > +time-consuming to produce, but it would be useful to send ahead the > +portion of the response that precedes it. > + > +In these cases, applications will usually return a ``body`` iterator > +(often a generator-iterator) that produces the output in a > +block-by-block fashion. These blocks may be broken to coincide with > +multipart boundaries (for "server push"), or just before > +time-consuming tasks (such as reading another block of an on-disk > +file). > + > +Web3 servers, gateways, and middleware **must not** delay the > +transmission of any block; they **must** either fully transmit the > +block to the client, or guarantee that they will continue transmission > +even while the application is producing its next block. A > +server/gateway or middleware may provide this guarantee in one of > +three ways: > + > +1. Send the entire block to the operating system (and request that any > + O/S buffers be flushed) before returning control to the > + application, OR > + > +2. Use a different thread to ensure that the block continues to be > + transmitted while the application produces the next block, OR > + > +3. (Middleware only) send the entire block to its parent > + gateway/server. > + > +By providing this guarantee, Web3 allows applications to ensure that > +transmission will not become stalled at an arbitrary point in their > +output data. This is critical for proper functioning of > +e.g. multipart "server push" streaming, where data between multipart > +boundaries should be transmitted in full to the client. > + > + > +Unicode Issues > +-------------- > + > +HTTP does not directly support Unicode, and neither does this > +interface.
All encoding/decoding must be handled by the > +**application**; all values passed to or from the server must be of > +the Python 3 type ``bytes`` or instances of the Python 2 type ``str``, > +not Python 2 ``unicode`` or Python 3 ``str`` objects. > + > +All "bytes instances" referred to in this specification **must**: > + > +- On Python 2, be of type ``str``. > + > +- On Python 3, be of type ``bytes``. > + > +All "bytes instances" **must not** : > + > +- On Python 2, be of type ``unicode``. > + > +- On Python 3, be of type ``str``. > + > +The result of using a textlike object where a byteslike object is > +required is undefined. > + > +Values returned from a Web3 app as a status or as response headers > +**must** follow RFC 2616 with respect to encoding. That is, the bytes > +returned must contain a character stream of ISO-8859-1 characters, or > +the character stream should use RFC 2047 MIME encoding. > + > +On Python platforms which do not have a native bytes-like type > +(e.g. IronPython, etc.), but instead which generally use textlike > +strings to represent bytes data, the definition of "bytes instance" > +can be changed: their "bytes instances" must be native strings that > +contain only code points representable in ISO-8859-1 encoding > +(``\u0000`` through ``\u00FF``, inclusive). It is a fatal error for > +an application on such a platform to supply strings containing any > +other Unicode character or code point. Similarly, servers and > +gateways on those platforms **must not** supply strings to an > +application containing any other Unicode characters. > + > +.. XXX (armin: Jython now has a bytes type, we might remove this > + section after seeing about IronPython) > + > + > +HTTP 1.1 Expect/Continue > +------------------------ > + > +Servers and gateways that implement HTTP 1.1 **must** provide > +transparent support for HTTP 1.1's "expect/continue" mechanism. This > +may be done in any of several ways: > + > +1. 
Respond to requests containing an ``Expect: 100-continue`` request > + with an immediate "100 Continue" response, and proceed normally. > + > +2. Proceed with the request normally, but provide the application with > + a ``web3.input`` stream that will send the "100 Continue" response > + if/when the application first attempts to read from the input > + stream. The read request must then remain blocked until the client > + responds. > + > +3. Wait until the client decides that the server does not support > + expect/continue, and sends the request body on its own. (This is > + suboptimal, and is not recommended.) > + > +Note that these behavior restrictions do not apply for HTTP 1.0 > +requests, or for requests that are not directed to an application > +object. For more information on HTTP 1.1 Expect/Continue, see RFC > +2616, sections 8.2.3 and 10.1.1. > + > + > +Other HTTP Features > +------------------- > + > +In general, servers and gateways should "play dumb" and allow the > +application complete control over its output. They should only make > +changes that do not alter the effective semantics of the application's > +response. It is always possible for the application developer to add > +middleware components to supply additional features, so server/gateway > +developers should be conservative in their implementation. In a > +sense, a server should consider itself to be like an HTTP "gateway > +server", with the application being an HTTP "origin server". (See RFC > +2616, section 1.3, for the definition of these terms.) > + > +However, because Web3 servers and applications do not communicate via > +HTTP, what RFC 2616 calls "hop-by-hop" headers do not apply to Web3 > +internal communications. Web3 applications **must not** generate any > +"hop-by-hop" headers [4]_, attempt to use HTTP features that would > +require them to generate such headers, or rely on the content of any > +incoming "hop-by-hop" headers in the ``environ`` dictionary. 
Web3 > +servers **must** handle any supported inbound "hop-by-hop" headers on > +their own, such as by decoding any inbound ``Transfer-Encoding``, > +including chunked encoding if applicable. > + > +Applying these principles to a variety of HTTP features, it should be > +clear that a server **may** handle cache validation via the > +``If-None-Match`` and ``If-Modified-Since`` request headers and the > +``Last-Modified`` and ``ETag`` response headers. However, it is not > +required to do this, and the application **should** perform its own > +cache validation if it wants to support that feature, since the > +server/gateway is not required to do such validation. > + > +Similarly, a server **may** re-encode or transport-encode an > +application's response, but the application **should** use a suitable > +content encoding on its own, and **must not** apply a transport > +encoding. A server **may** transmit byte ranges of the application's > +response if requested by the client, and the application doesn't > +natively support byte ranges. Again, however, the application > +**should** perform this function on its own if desired. > + > +Note that these restrictions on applications do not necessarily mean > +that every application must reimplement every HTTP feature; many HTTP > +features can be partially or fully implemented by middleware > +components, thus freeing both server and application authors from > +implementing the same features over and over again. > + > + > +Thread Support > +-------------- > + > +Thread support, or lack thereof, is also server-dependent. Servers > +that can run multiple requests in parallel, **should** also provide > +the option of running an application in a single-threaded fashion, so > +that applications or frameworks that are not thread-safe may still be > +used with that server. 
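Going back to the hop-by-hop rule under "Other HTTP Features": a server-side check might look roughly like the sketch below. The header set is the one listed in RFC 2616, section 13.5.1; the function name ``reject_hop_by_hop`` and the choice of ``ValueError`` are my own assumptions, since the draft only says the server should treat this as a fatal error.

```python
# Hop-by-hop headers from RFC 2616, section 13.5.1, lowercased because
# HTTP header names are case-insensitive.
HOP_BY_HOP = frozenset([
    b"connection",
    b"keep-alive",
    b"proxy-authenticate",
    b"proxy-authorization",
    b"te",
    b"trailers",
    b"transfer-encoding",
    b"upgrade",
])


def reject_hop_by_hop(headers):
    """Raise if an application returned a hop-by-hop header.

    ``headers`` is the list of (name, value) bytes tuples returned by a
    Web3 application. The function name is hypothetical.
    """
    for name, _value in headers:
        if name.lower() in HOP_BY_HOP:
            raise ValueError("hop-by-hop header not allowed: %r" % (name,))
    return headers
```

Comparing on the lowercased name also covers the earlier reminder that servers must not be fooled by header-name casing.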
> + > + > +Implementation/Application Notes > +================================ > + > +Server Extension APIs > +--------------------- > + > +Some server authors may wish to expose more advanced APIs, that > +application or framework authors can use for specialized purposes. > +For example, a gateway based on ``mod_python`` might wish to expose > +part of the Apache API as a Web3 extension. > + > +In the simplest case, this requires nothing more than defining an > +``environ`` variable, such as ``mod_python.some_api``. But, in many > +cases, the possible presence of middleware can make this difficult. > +For example, an API that offers access to the same HTTP headers that > +are found in ``environ`` variables, might return different data if > +``environ`` has been modified by middleware. > + > +In general, any extension API that duplicates, supplants, or bypasses > +some portion of Web3 functionality runs the risk of being incompatible > +with middleware components. Server/gateway developers should *not* > +assume that nobody will use middleware, because some framework > +developers specifically organize their frameworks to function almost > +entirely as middleware of various kinds. > + > +So, to provide maximum compatibility, servers and gateways that > +provide extension APIs that replace some Web3 functionality, **must** > +design those APIs so that they are invoked using the portion of the > +API that they replace. For example, an extension API to access HTTP > +request headers must require the application to pass in its current > +``environ``, so that the server/gateway may verify that HTTP headers > +accessible via the API have not been altered by middleware. If the > +extension API cannot guarantee that it will always agree with > +``environ`` about the contents of HTTP headers, it must refuse service > +to the application, e.g. by raising an error, returning ``None`` > +instead of a header collection, or whatever is appropriate to the API. 
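To make the "safe extension" rule above a bit more concrete, here is a rough sketch of an extension API that must be called with the current ``environ`` and refuses service when middleware has altered the headers. Everything named here (``RequestHeadersAPI``, the ``x_server.request_headers`` key) is hypothetical and not something the draft defines.

```python
def _http_items(environ):
    """Return only the HTTP_* entries of environ as a plain dict."""
    return dict((k, v) for k, v in environ.items() if k.startswith("HTTP_"))


class RequestHeadersAPI(object):
    """Hypothetical server extension exposing the request headers."""

    def __init__(self, environ):
        # Snapshot taken by the server before any middleware runs.
        self._original = _http_items(environ)

    def __call__(self, environ):
        # The caller passes in its *current* environ. If middleware has
        # changed any header, refuse service instead of disagreeing
        # with environ.
        if _http_items(environ) != self._original:
            return None
        return dict(self._original)


# Server side: install the extension under a server-specific key.
environ = {"HTTP_HOST": b"example.com", "REQUEST_METHOD": b"GET"}
environ["x_server.request_headers"] = RequestHeadersAPI(environ)
```

An application would call ``environ["x_server.request_headers"](environ)`` and has to be prepared for ``None`` whenever something upstream rewrote a header.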
> + > +These guidelines also apply to middleware that adds information such > +as parsed cookies, form variables, sessions, and the like to > +``environ``. Specifically, such middleware should provide these > +features as functions which operate on ``environ``, rather than simply > +stuffing values into ``environ``. This helps ensure that information > +is calculated from ``environ`` *after* any middleware has done any URL > +rewrites or other ``environ`` modifications. > + > +It is very important that these "safe extension" rules be followed by > +both server/gateway and middleware developers, in order to avoid a > +future in which middleware developers are forced to delete any and all > +extension APIs from ``environ`` to ensure that their mediation isn't > +being bypassed by applications using those extensions! > + > + > +Application Configuration > +------------------------- > + > +This specification does not define how a server selects or obtains an > +application to invoke. These and other configuration options are > +highly server-specific matters. It is expected that server/gateway > +authors will document how to configure the server to execute a > +particular application object, and with what options (such as > +threading options). > + > +Framework authors, on the other hand, should document how to create an > +application object that wraps their framework's functionality. The > +user, who has chosen both the server and the application framework, > +must connect the two together. However, since both the framework and > +the server have a common interface, this should be merely a mechanical > +matter, rather than a significant engineering effort for each new > +server/framework pair. > + > +Finally, some applications, frameworks, and middleware may wish to use > +the ``environ`` dictionary to receive simple string configuration > +options. 
Servers and gateways **should** support this by allowing an > +application's deployer to specify name-value pairs to be placed in > +``environ``. In the simplest case, this support can consist merely of > +copying all operating system-supplied environment variables from > +``os.environ`` into the ``environ`` dictionary, since the deployer in > +principle can configure these externally to the server, or in the CGI > +case they may be able to be set via the server's configuration files. > + > +Applications **should** try to keep such required variables to a > +minimum, since not all servers will support easy configuration of > +them. Of course, even in the worst case, persons deploying an > +application can create a script to supply the necessary configuration > +values:: > + > + from the_app import application > + > + def new_app(environ): > + environ['the_app.configval1'] = b'something' > + return application(environ) > + > +But, most existing applications and frameworks will probably only need > +a single configuration value from ``environ``, to indicate the > +location of their application or framework-specific configuration > +file(s). (Of course, applications should cache such configuration, to > +avoid having to re-read it upon each invocation.) 
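The "single configuration value plus caching" idea above could be sketched like this. The ``the_app.config_file`` key follows the naming of ``the_app.configval1`` in the quoted example but is otherwise invented, and ``loader`` stands in for whatever parser an application really uses.

```python
# Parsed configurations, keyed by the path found in environ.
_CONFIG_CACHE = {}


def get_config(environ, loader):
    """Return the configuration named by environ, loading it only once.

    ``the_app.config_file`` is a hypothetical key; ``loader`` is any
    callable that turns the path (a bytes instance) into a config object.
    """
    path = environ['the_app.config_file']
    if path not in _CONFIG_CACHE:
        # First request for this path: parse and remember the result.
        _CONFIG_CACHE[path] = loader(path)
    return _CONFIG_CACHE[path]
```

Keying the cache on the path keeps the sketch usable when one process serves several deployments with different configuration files.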
> +
> +
> +URL Reconstruction
> +------------------
> +
> +If an application wishes to reconstruct a request's complete URL (as a
> +bytes object), it may do so using the following algorithm::
> +
> +    # ``url_quote`` is assumed to be a bytes-in, bytes-out
> +    # percent-encoding helper; this specification does not define it.
> +    host = environ.get('HTTP_HOST')
> +
> +    scheme = environ['web3.url_scheme']
> +    port = environ['SERVER_PORT']
> +    query = environ['QUERY_STRING']
> +
> +    url = scheme + b'://'
> +
> +    if host:
> +        # HTTP_HOST already carries any non-default port.
> +        url += host
> +    else:
> +        url += environ['SERVER_NAME']
> +
> +        # Append the port only when it is not the scheme's default.
> +        if scheme == b'https':
> +            if port != b'443':
> +                url += b':' + port
> +        else:
> +            if port != b'80':
> +                url += b':' + port
> +
> +    if 'web3.script_name' in environ:
> +        url += url_quote(environ['web3.script_name'])
> +    else:
> +        url += environ['SCRIPT_NAME']
> +    if 'web3.path_info' in environ:
> +        url += url_quote(environ['web3.path_info'])
> +    else:
> +        url += environ['PATH_INFO']
> +    if query:
> +        url += b'?' + query
> +
> +Note that such a reconstructed URL may not be precisely the same URI
> +as requested by the client. Server rewrite rules, for example, may
> +have modified the client's originally requested URL to place it in a
> +canonical form.
> +
> +
> +Open Questions
> +==============
> +
> +- ``file_wrapper`` replacement. Currently nothing is specified here
> +  but it's clear that the old system of in-band signalling is broken
> +  if it does not provide a way to figure out as a middleware in the
> +  process if the response is a file wrapper.
> +
> +
> +Points of Contention
> +====================
> +
> +Outlined below are potential points of contention regarding this
> +specification.
> +
> +
> +WSGI 1.0 Compatibility
> +----------------------
> +
> +Components written using the WSGI 1.0 specification will not
> +transparently interoperate with components written using this
> +specification. That's because the goals of this proposal and the
> +goals of WSGI 1.0 are not directly aligned.
> + > +WSGI 1.0 is obliged to provide specification-level backwards > +compatibility with versions of Python between 2.2 and 2.7. This > +specification, however, ditches Python 2.5 and lower compatibility in > +order to provide compatibility between relatively recent versions of > +Python 2 (2.6 and 2.7) as well as relatively recent versions of Python > +3 (3.1). > + > +It is currently impossible to write components which work reliably > +under both Python 2 and Python 3 using the WSGI 1.0 specification, > +because the specification implicitly posits that CGI and server > +variable values in the environ and values returned via > +``start_response`` represent a sequence of bytes that can be addressed > +using the Python 2 string API. It posits such a thing because that > +sort of data type was the sensible way to represent bytes in all > +Python 2 versions, and WSGI 1.0 was conceived before Python 3 existed. > + > +Python 3's ``str`` type supports the full API provided by the Python 2 > +``str`` type, but Python 3's ``str`` type does not represent a > +sequence of bytes, it instead represents text. Therefore, using it to > +represent environ values also requires that the environ byte sequence > +be decoded to text via some encoding. We cannot decode these bytes to > +text (at least in any way where the decoding has any meaning other > +than as a tunnelling mechanism) without widening the scope of WSGI to > +include server and gateway knowledge of decoding policies and > +mechanics. WSGI 1.0 never concerned itself with encoding and > +decoding. It made statements about allowable transport values, and > +suggested that various values might be best decoded as one encoding or > +another, but it never required a server to *perform* any decoding > +before > + > +Python 3 does not have a stringlike type that can be used instead to > +represent bytes: it has a ``bytes`` type. 
A bytes type operates quite > +a bit like a Python 2 ``str`` in Python 3.1+, but it lacks behavior > +equivalent to ``str.__mod__`` and its iteration protocol, and > +containment, sequence treatment, and equivalence comparisons are > +different. > + > +In either case, there is no type in Python 3 that behaves just like > +the Python 2 ``str`` type, and a way to create such a type doesn't > +exist because there is no such thing as a "String ABC" which would > +allow a suitable type to be built. Due to this design > +incompatibility, existing WSGI 1.0 servers, middleware, and > +applications will not work under Python 3, even after they are run > +through ``2to3``. > + > +Existing Web-SIG discussions about updating the WSGI specification so > +that it is possible to write a WSGI application that runs in both > +Python 2 and Python 3 tend to revolve around creating a > +specification-level equivalence between the Python 2 ``str`` type > +(which represents a sequence of bytes) and the Python 3 ``str`` type > +(which represents text). Such an equivalence becomes strained in > +various areas, given the different roles of these types. An arguably > +more straightforward equivalence exists between the Python 3 ``bytes`` > +type API and a subset of the Python 2 ``str`` type API. This > +specification exploits this subset equivalence. > + > +In the meantime, aside from any Python 2 vs. Python 3 compatibility > +issue, as various discussions on Web-SIG have pointed out, the WSGI > +1.0 specification is too general, providing support (via ``.write``) > +for asynchronous applications at the expense of implementation > +complexity. This specification uses the fundamental incompatibility > +between WSGI 1.0 and Python 3 as a natural divergence point to create > +a specification with reduced complexity by changing specialized > +support for asynchronous applications. 
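The differences between Python 3 ``bytes`` and Python 2 ``str`` mentioned earlier in this section (iteration, containment, sequence treatment, equality) are easy to see in a few lines. This is a small demonstration meant to be run under Python 3 only; under Python 2 every one of these expressions behaves differently, which is exactly the point.

```python
data = b"abc"

# Sequence treatment: indexing and iteration yield integers,
# not length-1 strings as with the Python 2 str type.
first = data[0]
items = list(data)

# Slicing, by contrast, still yields bytes.
head = data[0:1]

# Equivalence: a bytes value never compares equal to text.
same_as_text = (data == "abc")

# Containment: testing for text inside bytes is an error.
try:
    "a" in data
    mixed_containment = "allowed"
except TypeError:
    mixed_containment = "TypeError"
```

Under Python 3 this leaves ``first`` as ``97``, ``items`` as ``[97, 98, 99]``, ``head`` as ``b"a"``, ``same_as_text`` as ``False`` and ``mixed_containment`` as ``"TypeError"``.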
> + > +To provide backwards compatibility for older WSGI 1.0 applications, so > +that they may run on a Web3 stack, it is presumed that Web3 middleware > +will be created which can be used "in front" of existing WSGI 1.0 > +applications, allowing those existing WSGI 1.0 applications to run > +under a Web3 stack. This middleware will require, when under Python > +3, an equivalence to be drawn between Python 3 ``str`` types and the > +bytes values represented by the HTTP request and all the attendant > +encoding-guessing (or configuration) it implies. > + > +.. note:: > + > + Such middleware *might* in the future, instead of drawing an > + equivalence between Python 3 ``str`` and HTTP byte values, make use > + of a yet-to-be-created "ebytes" type (aka "bytes-with-benefits"), > + particularly if a String ABC proposal is accepted into the Python > + core and implemented. > + > +Conversely, it is presumed that WSGI 1.0 middleware will be created > +which will allow a Web3 application to run behind a WSGI 1.0 stack on > +the Python 2 platform. > + > + > +Environ and Response Values as Bytes > +------------------------------------ > + > +Casual middleware and application writers may consider the use of > +bytes as environment values and response values inconvenient. In > +particular, they won't be able to use common string formatting > +functions such as ``('%s' % bytes_val)`` or > +``bytes_val.format('123')`` because bytes don't have the same API as > +strings on platforms such as Python 3 where the two types differ. > +Likewise, on such platforms, stdlib HTTP-related API support for using > +bytes interchangeably with text can be spotty. In places where bytes > +are inconvenient or incompatible with library APIs, middleware and > +application writers will have to decode such bytes to text explicitly. 
> +This is particularly inconvenient for middleware writers: to work with > +environment values as strings, they'll have to decode them from an > +implied encoding and if they need to mutate an environ value, they'll > +then need to encode the value into a byte stream before placing it > +into the environ. While the use of bytes by the specification as > +environ values might be inconvenient for casual developers, it > +provides several benefits. > + > +Using bytes types to represent HTTP and server values to an > +application most closely matches reality because HTTP is fundamentally > +a bytes-oriented protocol. If the environ values are mandated to be > +strings, each server will need to use heuristics to guess about the > +encoding of various values provided by the HTTP environment. Using > +all strings might increase casual middleware writer convenience, but > +will also lead to ambiguity and confusion when a value cannot be > +decoded to a meaningful non-surrogate string. > + > +Use of bytes as environ values avoids any potential for the need for > +the specification to mandate that a participating server be informed > +of encoding configuration parameters. If environ values are treated > +as strings, and so must be decoded from bytes, configuration > +parameters may eventually become necessary as policy clues from the > +application deployer. Such a policy would be used to guess an > +appropriate decoding strategy in various circumstances, effectively > +placing the burden for enforcing a particular application encoding > +policy upon the server. If the server must serve more than one > +application, such configuration would quickly become complex. Many > +policies would also be impossible to express declaratively. > + > +In reality, HTTP is a complicated and legacy-fraught protocol which > +requires a complex set of heuristics to make sense of. 
It would be > +nice if we could allow this protocol to protect us from this > +complexity, but we cannot do so reliably while still providing to > +application writers a level of control commensurate with reality. > +Python applications must often deal with data embedded in the > +environment which not only must be parsed by legacy heuristics, but > +*does not conform even to any existing HTTP specification*. While > +these eventualities are unpleasant, they crop up with regularity, > +making it impossible and undesirable to hide them from application > +developers, as application developers are the only people who are able > +to decide upon an appropriate action when an HTTP specification > +violation is detected. > + > +Some have argued for mixed use of bytes and string values as environ > +*values*. This proposal avoids that strategy. Sole use of bytes as > +environ values makes it possible to fit this specification entirely in > +one's head; you won't need to guess about which values are strings and > +which are bytes. > + > +This protocol would also fit in a developer's head if all environ > +values were strings, but this specification doesn't use that strategy. > +This will likely be the point of greatest contention regarding the use > +of bytes. In defense of bytes: developers often prefer protocols with > +consistent contracts, even if the contracts themselves are suboptimal. > +If we hide encoding issues from a developer until a value that > +contains surrogates causes problems after it has already reached > +beyond the I/O boundary of their application, they will need to do a > +lot more work to fix assumptions made by their application than if we > +were to just present the problem much earlier in terms of "here's some > +bytes, you decode them". 
This is also a counter-argument to the > +"bytes are inconvenient" assumption: while presenting bytes to an > +application developer may be inconvenient for a casual application > +developer who doesn't care about edge cases, they are extremely > +convenient for the application developer who needs to deal with > +complex, dirty eventualities, because use of bytes allows him the > +appropriate level of control with a clear separation of > +responsibility. > + > +If the protocol uses bytes, it is presumed that libraries will be > +created to make working with bytes-only in the environ and within > +return values more pleasant; for example, analogues of the WSGI 1.0 > +libraries named "WebOb" and "Werkzeug". Such libraries will fill the > +gap between convenience and control, allowing the spec to remain > +simple and regular while still allowing casual authors a convenient > +way to create Web3 middleware and application components. This seems > +to be a reasonable alternative to baking encoding policy into the > +protocol, because many such libraries can be created independently > +from the protocol, and application developers can choose the one that > +provides them the appropriate levels of control and convenience for a > +particular job. > + > +Here are some alternatives to using all bytes: > + > +- Have the server decode all values representing CGI and server > + environ values into strings using the ``latin-1`` encoding, which is > + lossless. Smuggle any undecodable bytes within the resulting > + string. > + > +- Encode all CGI and server environ values to strings using the > + ``utf-8`` encoding with the ``surrogateescape`` error handler. This > + does not work under any existing Python 2. > + > +- Encode some values into bytes and other values into strings, as > + decided by their typical usages. 
> + > + > +Applications Should be Allowed to Read ``web3.input`` Past ``CONTENT_LENGTH`` > +----------------------------------------------------------------------------- > + > +At [6]_, Graham Dumpleton makes the assertion that ``wsgi.input`` > +should be required to return the empty string as a signifier of > +out-of-data, and that applications should be allowed to read past the > +number of bytes specified in ``CONTENT_LENGTH``, depending only upon > +the empty string as an EOF marker. WSGI relies on an application > +"being well behaved and once all data specified by ``CONTENT_LENGTH`` > +is read, that it processes the data and returns any response. That > +same socket connection could then be used for a subsequent request." > +Graham would like WSGI adapters to be required to wrap raw socket > +connections: "this wrapper object will need to count how much data has > +been read, and when the amount of data reaches that as defined by > +``CONTENT_LENGTH``, any subsequent reads should return an empty string > +instead." This may be useful to support chunked encoding and input > +filters. > + > + > +``web3.input`` Unknown Length > +----------------------------- > + > +There's no documented way to indicate that there is content in > +``environ['web3.input']``, but the content length is unknown. > + > + > +``read()`` of ``web3.input`` Should Support No-Size Calling Convention > +---------------------------------------------------------------------- > + > +At [6]_, Graham Dumpleton makes the assertion that the ``read()`` > +method of ``wsgi.input`` should be callable without arguments, and > +that the result should be "all available request content". Needs > +discussion. > + > +Comment Armin: I changed the spec to require that from an > +implementation. I had too much pain with that in the past already. > +Open for discussions though. 
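The counting wrapper Graham describes, combined with the no-argument ``read()`` discussed just above, might look something like the sketch below. It assumes the server has already parsed ``CONTENT_LENGTH`` into an integer; the class name ``LimitedInput`` is made up, and only ``read`` is shown (a real ``web3.input`` would also need ``readline``, ``readlines`` and ``__iter__``).

```python
import io


class LimitedInput(object):
    """Hypothetical wrapper that never reads past the content length."""

    def __init__(self, stream, content_length):
        self._stream = stream
        # Number of request body bytes that may still be read.
        self._remaining = max(content_length, 0)

    def read(self, size=-1):
        # No size (or a negative or oversized one) means "everything
        # that is left of this request's body".
        if size is None or size < 0 or size > self._remaining:
            size = self._remaining
        if size == 0:
            # Out of data: signal EOF with an empty bytes instance.
            return b""
        data = self._stream.read(size)
        self._remaining -= len(data)
        return data


# Usage: 11 bytes of body, followed by the start of the next request.
raw = io.BytesIO(b"hello worldNEXT")
body = LimitedInput(raw, 11)
```

With this in place an application can loop on ``read()`` until it gets ``b""`` without ever consuming bytes that belong to a subsequent request on the same connection.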
> + > + > +Input Filters Should set environ ``CONTENT_LENGTH`` to -1 > +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > + > +At [6]_, Graham Dumpleton suggests that an input filter might set > +``environ['CONTENT_LENGTH']`` to -1 to indicate that it mutated the > +input. > + > + > +``headers`` as Literal List of Two-Tuples > +----------------------------------------- > + > +Why do we make applications return a ``headers`` structure that is a > +literal list of two-tuples? I think the iterability of ``headers`` > +needs to be maintained while it moves up the stack, but I don't think > +we need to be able to mutate it in place at all times. Could we > +loosen that requirement? > + > +Comment Armin: Strong yes > + > + > +Removed Requirement that Middleware Not Block > +--------------------------------------------- > + > +This requirement was removed: "middleware components **must not** > +block iteration waiting for multiple values from an application > +iterable. If the middleware needs to accumulate more data from the > +application before it can produce any output, it **must** yield an > +empty string." This requirement existed to support asynchronous > +applications and servers (see PEP 333's "Middleware Handling of Block > +Boundaries"). Asynchronous applications are now serviced explicitly > +by a ``web3.async``-capable protocol (a Web3 application callable may > +itself return a callable). > + > + > +``web3.script_name`` and ``web3.path_info`` > +------------------------------------------- > + > +These values are required to be placed into the environment by an > +origin server under this specification. Unlike ``SCRIPT_NAME`` and > +``PATH_INFO``, these must be the original *URL-encoded* variants > +derived from the request URI. We probably need to figure out how > +these should be computed originally, and what their values should be > +if the server performs URL rewriting. 
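A small illustration of why the URL-encoded variants carry information that the decoded CGI values lose, assuming Python 3. The request path is made up, and how a server would split it into ``web3.script_name`` and ``web3.path_info`` is left open by the text above, so no split is shown.

```python
from urllib.parse import unquote_to_bytes

# Hypothetical raw path taken from the request line; "%2F" is an
# encoded slash inside a single path segment.
raw_path = b"/app/reports/2010%2F09"

# CGI-style PATH_INFO is percent-decoded, so the encoded slash becomes
# indistinguishable from a real segment separator.
decoded_path = unquote_to_bytes(raw_path)

raw_segments = raw_path.split(b"/")
decoded_segments = decoded_path.split(b"/")
```

Here ``raw_segments`` keeps ``2010%2F09`` as one segment, while ``decoded_segments`` splits it in two, so a router working from the decoded value cannot recover the original structure.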
> + > + > +Long Response Headers > +--------------------- > + > +Bob Brewer notes on Web-SIG [7]_: > + > + Each header_value must not include any control characters, > + including carriage returns or linefeeds, either embedded or at the > + end. (These requirements are to minimize the complexity of any > + parsing that must be performed by servers, gateways, and > + intermediate response processors that need to inspect or modify > + response headers.) [1]_ > + > +That's understandable, but HTTP headers are defined as (mostly) > +\*TEXT, and "words of \*TEXT MAY contain characters from character > +sets other than ISO-8859-1 only when encoded according to the rules of > +RFC 2047." [2]_ And RFC 2047 specifies that "an 'encoded-word' may > +not be more than 75 characters long... If it is desirable to encode > +more text than will fit in an 'encoded-word' of 75 characters, > +multiple 'encoded-word's (separated by CRLF SPACE) may be used." [3]_ > +This satisfies HTTP header folding rules, as well: "Header fields can > +be extended over multiple lines by preceding each extra line with at > +least one SP or HT." [1]_ > + > +So in my reading of HTTP, some code somewhere should introduce > +newlines in longish, encoded response header values. I see three > +options: > + > +1. Keep things as they are and disallow response header values if they > + contain words over 75 chars that are outside the ISO-8859-1 > + character set. > + > +2. Allow newline characters in WSGI response headers. > + > +3. Require/strongly suggest WSGI servers to do the encoding and > + folding before sending the value over HTTP. > + > + > +Request Trailers and Chunked Transfer Encoding > +---------------------------------------------- > + > +When using chunked transfer encoding on request content, the RFCs > +allow there to be request trailers. These are like request headers > +but come after the final null data chunk. 
These trailers are only > +available when the chunked data stream is of finite length and when it > +has all been read in. Neither WSGI nor Web3 currently supports them. > + > +.. XXX (armin) yield from application iterator should be specified as > + write plus flush by the server. > + > +.. XXX (armin) websocket API. > + > + > +References > +========== > + > +.. [1] PEP 333: Python Web Services Gateway Interface > + (http://www.python.org/dev/peps/pep-0333/) > + > +.. [2] The Common Gateway Interface Specification, v 1.1, 3rd Draft > + (http://cgi-spec.golux.com/draft-coar-cgi-v11-03.txt) > + > +.. [3] "Chunked Transfer Coding" -- HTTP/1.1, section 3.6.1 > + (http://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html#sec3.6.1) > + > +.. [4] "End-to-end and Hop-by-hop Headers" -- HTTP/1.1, Section 13.5.1 > + (http://www.w3.org/Protocols/rfc2616/rfc2616-sec13.html#sec13.5.1) > + > +.. [5] mod_ssl Reference, "Environment Variables" > + (http://www.modssl.org/docs/2.8/ssl_reference.html#ToC25) > + > +.. [6] Details on WSGI 1.0 amendments/clarifications. > + (http://blog.dscpl.com.au/2009/10/details-on-wsgi-10-amendmentsclarificat.html) > + > +.. [7] [Web-SIG] WSGI and long response header values > + http://mail.python.org/pipermail/web-sig/2006-September/002244.html > + > +Copyright > +========= > + > +This document has been placed in the public domain. > + > + > + > +.. > + Local Variables: > + mode: indented-text > + indent-tabs-mode: nil > + sentence-end-double-space: t > + fill-column: 70 > + coding: utf-8 > + End: