On Fri, Apr 6, 2012 at 1:06 PM, Vinay Sajip <vinay_sajip at yahoo.co.uk> wrote:

> There is a problem with the way logging.handlers.SysLogHandler works
> when presented with Unicode messages. According to RFC 5424, Unicode
> is supposed to be sent encoded as UTF-8 and preceded by a BOM.
> However, the current handler implementation puts the BOM at the start
> of the formatted message, and this is wrong in scenarios where you
> want to put some additional structured data in front of the
> unstructured message part; the BOM is supposed to go after the
> structured part (which, therefore, has to be ASCII) and before the
> unstructured part. In that scenario, the handler's current behaviour
> does not strictly conform to RFC 5424.
>
> The issue is described in [1]. The BOM was originally added / position
> changed in response to [2] and [3].
>
> It is not possible to achieve conformance with the current
> implementation of the handler, unless you subclass the handler and
> override the whole emit() method. This is not ideal. For 3.3, I will
> refactor the implementation to expose a method which creates the byte
> string which is sent over the wire to the syslog daemon. This method
> can then be overridden for specific use cases where needed.
>
> However, for 2.7 and 3.2, removing the BOM insertion would bring the
> implementation into conformance to the RFC, though the entire message
> would have to be regarded as just a set of octets. A Unicode message
> would still be encoded using UTF-8, but the BOM would be left out.
>
> I am thinking of removing the BOM insertion in 2.7 and 3.2 - although
> it is a change in behaviour, the current behaviour does seem broken
> with regard to RFC 5424 conformance. However, as some might disagree
> with that assessment and view it as a backwards-incompatible behaviour
> change, I thought I should post this to get some opinions about
> whether this change is viewed as objectionable.
>

Given the existing brokenness, I personally think that removing the BOM
insertion (because it is incorrect) in 2.7 and 3.2 is fine if you cannot
find a way to make it correct in 2.7 and 3.2 without breaking existing
APIs.

Could a private method that creates the byte string, and correctly adds
the BOM, not be added and used in 2.7 and 3.2?

> Regards,
>
> Vinay Sajip
>
> [1] http://bugs.python.org/issue14452
> [2] http://bugs.python.org/issue7077
> [3] http://bugs.python.org/issue8795
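
For illustration only, the wire layout described in the quoted message (ASCII structured part first, then the BOM, then the UTF-8 encoded unstructured part) could be sketched roughly as below. This is Python 3, and `build_syslog_payload` and its parameters are hypothetical names invented for this sketch; it is not the method that was later added to SysLogHandler, and the header string is just a minimal RFC 5424 header with every optional field left as "-".

```python
import codecs


def build_syslog_payload(priority, header, message):
    """Return the bytes to send: '<PRI>' + ASCII header + BOM + UTF-8 text.

    Hypothetical helper, not part of the logging module. The header is
    the structured part, so it is encoded as ASCII and a non-ASCII
    character there raises UnicodeEncodeError.
    """
    prefix = "<%d>%s" % (priority, header)
    # The BOM goes after the structured part and before the free-form
    # message, rather than at the very start of the formatted record.
    return prefix.encode("ascii") + codecs.BOM_UTF8 + message.encode("utf-8")


# Priority 14 = facility "user" (1) * 8 + severity "info" (6).
payload = build_syslog_payload(14, "1 - - - - - - ", "caf\u00e9")
print(payload)
```

A handler subclass could call a helper like this from an overridden emit(), at the cost of duplicating the socket handling, which is the drawback raised in the quoted message.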