Stephen J. Turnbull wrote:
>     Martin> With the UTF-8-SIG codec, it would apply to all operation
>     Martin> modes of the codec, whether stream-based or from strings.
>
> I had in mind the ability to treat a string as a stream.

Hmm. A string is not a stream, but it could be the contents of a stream.
A typical application of codecs goes like this:

    data = stream.read()
    [analyze data, e.g. by checking whether there is encoding= in <?xml...]
    data = data.decode(encoding analyzed)

So people do use the "decode-it-all" mode, where no sequential access
is necessary - yet the beginning of the string is still the beginning of
what once was a stream. This case must be supported.

>     Martin> Whether or not to use the codec would be the application's
>     Martin> choice.
>
> What I think should be provided is a stateful object encapsulating the
> codec.  Ie, to avoid the need to write
>
>     out = chunk[0].encode("utf-8-sig") + chunk[1].encode("utf-8")

No. People who want streaming should use cStringIO, i.e.

    >>> s=cStringIO.StringIO()
    >>> s1=codecs.getwriter("utf-8")(s)
    >>> s1.write(u"Hallo")
    >>> s.getvalue()
    'Hallo'

Regards,
Martin
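
The two modes discussed above can be sketched as follows. This is a minimal illustration, assuming a current Python 3 where the "utf-8-sig" codec is available in the standard library and io.BytesIO stands in for the Python 2-only cStringIO. The helper names decode_all and write_chunks are hypothetical and chosen only for this example.

```python
import codecs
import io


def decode_all(data: bytes) -> str:
    """'Decode-it-all' mode: the whole stream content is already in memory.

    The utf-8-sig codec drops a leading UTF-8 signature (BOM) if one is
    present and otherwise behaves like plain utf-8.
    """
    return data.decode("utf-8-sig")


def write_chunks(chunks, encoding="utf-8-sig") -> bytes:
    """Streaming mode: a single stateful writer wraps an in-memory buffer.

    With utf-8-sig the writer emits the signature on the first write only,
    so the caller does not need to encode the first chunk differently.
    """
    buf = io.BytesIO()
    writer = codecs.getwriter(encoding)(buf)
    for chunk in chunks:
        writer.write(chunk)
    return buf.getvalue()


if __name__ == "__main__":
    print(repr(decode_all(codecs.BOM_UTF8 + b"Hallo")))
    print(repr(write_chunks(["Hal", "lo"])))
    print(repr(write_chunks(["Hal", "lo"], encoding="utf-8")))
```

The stream writer keeps the "signature already written" state itself, which is the role the quoted proposal assigned to a separate stateful object.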