"M.-A. Lemburg" <mal at egenix.com> writes:

> Aahz wrote:
>> On Sat, Nov 20, 2004, Andreas Degert wrote:
>>
>>> I think I found a bug in PyLocale_strcoll() (Python 2.3.4). When used
>>> with 2 unicode strings, it converts them to wchar strings and uses
>>> wcscoll. The bug is that the wchar strings are not 0-terminated.
>>
>> If you're sure this is a bug, please file on SF and report back the
>> ID.
>>
>> (If you're not sure, wait until you get confirmation from one of the
>> Unicode experts and then file the bug. ;-)
>
> Please also check that the bug is still present in Python 2.4 and/or
> CVS. We've corrected a bug in the PyUnicode_*WideChar*() APIs just
> recently for Python 2.4.

The off-by-one fix in unicodeobject.c (2.228 -> 2.229) corrects a buffer
overflow in that same piece of code.

I didn't find a clear statement on whether unicode strings should be
0-terminated or not. In _PyUnicode_New the buffer is 0-terminated, even
when it has to call unicode_resize (though there is a comment in
unicode_resize: "Ux0000 terminated -- XXX is this needed ?"). If this is
the only place where unicode objects are created or modified, they seem
to always be 0-terminated.

wchar strings must be 0-terminated if they are to be used with the wcs*
functions, so it's not a good idea to return a non-terminated string
from PyUnicode_AsWideChar. If unicode strings are always 0-terminated
(i.e. the unicode buffer size is length+1), then we could just change

    if (size > PyUnicode_GET_SIZE(unicode))
        size = PyUnicode_GET_SIZE(unicode);

to

    if (size > PyUnicode_GET_SIZE(unicode)+1)
        size = PyUnicode_GET_SIZE(unicode)+1;

in PyUnicode_AsWideChar to get 0-terminated wchar strings.

OK... I'm still not sure whether I should file a bug against
PyLocale_strcoll or PyUnicode_AsWideChar, and whether a patch for the
latter should assume that the unicode string buffer is 0-terminated...

cheers

Andreas
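A minimal C sketch of the two options discussed above, written against plain wchar_t buffers rather than the Python C API so that it compiles on its own. copy_wide_unterminated() and copy_wide_patched() are hypothetical stand-ins modelled on the described behaviour of PyUnicode_AsWideChar before and after the proposed change (they are not CPython code), and strcoll_wide() is a hypothetical PyLocale_strcoll-style caller that avoids the problem by allocating length+1 elements and writing the terminator itself before calling wcscoll().

```c
#include <assert.h>
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical stand-in for the unpatched behaviour: copies at most
   `size` of the `len` characters in `src` into `dst` and never writes
   a terminator. Returns the number of characters copied. */
static size_t copy_wide_unterminated(const wchar_t *src, size_t len,
                                     wchar_t *dst, size_t size)
{
    if (size > len)
        size = len;
    wmemcpy(dst, src, size);
    return size;
}

/* Hypothetical stand-in for the proposed patch. ASSUMES `src` has
   len + 1 elements with src[len] == L'\0', so clamping to len + 1
   copies the terminator too when `dst` is large enough.
   Returns the number of elements copied (terminator included). */
static size_t copy_wide_patched(const wchar_t *src, size_t len,
                                wchar_t *dst, size_t size)
{
    if (size > len + 1)
        size = len + 1;
    wmemcpy(dst, src, size);
    return size;
}

/* Hypothetical caller-side fix: allocate one extra element and
   terminate explicitly, so wcscoll() always sees valid wide strings
   whatever the copy routine does. Stores the comparison in *result;
   returns 0 on success, -1 on allocation failure. */
static int strcoll_wide(const wchar_t *a, size_t alen,
                        const wchar_t *b, size_t blen, int *result)
{
    wchar_t *wa = malloc((alen + 1) * sizeof *wa);
    wchar_t *wb = malloc((blen + 1) * sizeof *wb);
    if (wa == NULL || wb == NULL) {
        free(wa);
        free(wb);
        return -1;
    }
    size_t na = copy_wide_unterminated(a, alen, wa, alen);
    size_t nb = copy_wide_unterminated(b, blen, wb, blen);
    assert(na == alen && nb == blen);
    wa[na] = L'\0';  /* the step missing from the buggy code */
    wb[nb] = L'\0';
    *result = wcscoll(wa, wb);
    free(wa);
    free(wb);
    return 0;
}
```

Terminating on the caller side works however PyUnicode_AsWideChar ends up behaving; the patched copy only helps if the unicode buffer really has length+1 elements with a trailing 0, which is the open question in the message.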