Guido van Rossum <guido@python.org> writes:

> This might belong on SF, except it's already been solved in Python
> 2.3, and I need guidance about what to do for Python 2.2.2.
>
> In 2.2.1, a lone surrogate encoded into utf8 gives a utf8 string that
> cannot be decoded back.  In 2.3, this is fixed.  Should this be fixed
> in 2.2.2 as well?

I think this was discussed quite a long time ago, six months or so.

> I'm asking because it caused problems with reading .pyc files: if
> there's a Unicode literal containing a lone surrogate, reading the
> .pyc file causes an exception:
>
>   UnicodeError: UTF-8 decoding error: unexpected code byte
>
> It looks like revision 2.128 fixed this for 2.3, but that patch
> doesn't cleanly apply to the 2.2 maintenance branch.  Can someone
> help?

I think the reason this didn't get fixed in 2.2.1 is that it
necessitates bumping MAGIC.

I can probably dig up more references if you want.

Cheers,
M.

-- 
  34. The string is a stark data structure and everywhere it is
      passed there is much duplication of process. It is a perfect
      vehicle for hiding information.
     -- Alan Perlis, http://www.cs.yale.edu/homes/perlis-alan/quotes.html
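
For readers who want to see the round-trip problem concretely, here is a minimal sketch. It assumes a modern Python 3 interpreter, not 2.2 or 2.3, so it does not reproduce the old codec's behaviour or its error message. In Python 3 the strict UTF-8 codec rejects a lone surrogate in both directions, and the "surrogatepass" error handler is what lets it round-trip. The variable names are only illustrative.

```python
# Sketch under the assumption of Python 3; the 2.2/2.3 codec internals
# discussed in this thread are not reproduced here.
lone = "\ud800"  # a lone high surrogate, with no low surrogate after it

# Strict UTF-8 encoding refuses a lone surrogate.
try:
    lone.encode("utf-8")
    strict_encode_ok = True
except UnicodeEncodeError:
    strict_encode_ok = False

# With 'surrogatepass' the surrogate is written as a three-byte
# sequence and can be read back with the same handler.
raw = lone.encode("utf-8", "surrogatepass")
round_trip = raw.decode("utf-8", "surrogatepass")

# Strict decoding of those same bytes fails, which is the shape of the
# problem described above: bytes were produced that cannot be decoded.
try:
    raw.decode("utf-8")
    strict_decode_ok = True
except UnicodeDecodeError:
    strict_decode_ok = False

print(strict_encode_ok, raw, round_trip == lone, strict_decode_ok)
```

Any serialized form that stores such a string as UTF-8 has to use the same convention when writing and when reading, which is presumably why changing it touches the .pyc format.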