"M.-A. Lemburg" wrote: > > "Martin v. Loewis" wrote: > > > > "M.-A. Lemburg" <mal@lemburg.com> writes: > > > > > Some debugging with gdb indicates that the codec is indeed writing > > > the 'nd', but the final _PyString_Resize() (which allocates a new > > > buffer and copies the data into that buffer) fails to copy the last > > > two characters from the string or overwrites it with NULLs. > > > > > > Looks like a pymalloc problem to me. Tim ? > > > > It's a UTF-8 codec bug. The codec writes over the end of the buffer, > > then invokes resize. Resizing only copies the allocated bytes, hence > > the uninitialized bytes at the end. > > Ah, yes, you're right. That is... instrumenting the codec I get these results: >>> (u'\u6b63\u78ba\u306b\u8a00\u3046\u3068\u7ffb\u8a33\u306f' ... u'\u3055\u308c\u3066\u3044\u307e\u305b\u3093\u3002\u4e00' ... u'\u90e8\u306f\u30c9\u30a4\u30c4\u8a9e\u3067\u3059\u304c' ... u'\u3001\u3042\u3068\u306f\u3067\u305f\u3089\u3081\u3067' ... u'\u3059\u3002\u5b9f\u969b\u306b\u306f\u300cWenn ist das' ... u' Nunstuck git und'.encode('utf-8')) cbWritten=0, cbAllocated=144 cbWritten=3, cbAllocated=144 cbWritten=6, cbAllocated=144 cbWritten=9, cbAllocated=144 ... cbWritten=102, cbAllocated=144 cbWritten=105, cbAllocated=144 cbWritten=108, cbAllocated=144 cbWritten=111, cbAllocated=144 cbWritten=114, cbAllocated=144 cbWritten=117, cbAllocated=144 cbWritten=120, cbAllocated=144 cbWritten=123, cbAllocated=144 cbWritten=126, cbAllocated=144 end of string = 'ck git und' '\xe6\xad\xa3\xe7\xa2\xba\xe3....das Nunstuck git u \x8f' (the last two bytes seem to be random data, they change from run to run) -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.egenix.com/files/python/