Andreas Degert wrote:
> "M.-A. Lemburg" <mal at egenix.com> writes:
>
>>Aahz wrote:
>>
>>>On Sat, Nov 20, 2004, Andreas Degert wrote:
>>>
>>>>I think I found a bug in PyLocale_strcoll() (Python 2.3.4). When used
>>>>with 2 unicode strings, it converts them to wchar strings and uses
>>>>wcscoll. The bug is that the wchar strings are not 0-terminated.
>>>
>>>If you're sure this is a bug, please file on SF and report back the
>>>ID.
>>>(If you're not sure, wait until you get confirmation from one of the
>>>Unicode experts and then file the bug. ;-)
>>
>>Please also check that the bug is still present in Python 2.4 and/or
>>CVS. We've corrected a bug in the PyUnicode_*WideChar*() APIs just
>>recently for Python 2.4.
>
> The off-by-one error fix in unicodeobject.c (2.228 -> 2.229), which
> corrects a buffer overflow, is in the same piece of code.
>
> I didn't find a clear statement if the unicode string should be
> 0-terminated or not.

You're right: they are always 0-terminated, just like 8-bit strings,
even though it doesn't seem to be necessary, since Python functions
will always use the size field when working on a Unicode object
rather than rely on the 0-termination.

> In _PyUnicode_New it's 0-terminated, even in the
> case when it had to call unicode_resize (though there is a comment in
> unicode_resize "Ux0000 terminated -- XXX is this needed ?"). If these
> are the only places where unicode objects are created or modified, they
> seem to be always 0-terminated.

Right.

> wchar strings must be 0-terminated if they are to be used with the
> wcs* functions. So it's not a good idea to return a non-terminated
> string from PyUnicode_AsWideChar. If the unicode strings are always
> 0-terminated (the unicode buffer size is length+1), then we could just
> change
>
>     if (size > PyUnicode_GET_SIZE(unicode))
>         size = PyUnicode_GET_SIZE(unicode);
>
> to
>
>     if (size > PyUnicode_GET_SIZE(unicode)+1)
>         size = PyUnicode_GET_SIZE(unicode)+1;
>
> in PyUnicode_AsWideChar to get 0-terminated wchars.
>
> Ok... I'm still not sure if I should file a bug for PyLocale_strcoll
> or PyUnicode_AsWideChar and if the patch for the latter should assume
> that the unicode string buffer is 0-terminated...

I think it's probably wise to fix both:

Looking again, the patch we applied to PyUnicode_AsWideChar()
only fixes the 0-termination problem in the case where
HAVE_USABLE_WCHAR_T is set. This should be extended to the
memcpy() as well.

Still, if the buffer passed to PyUnicode_AsWideChar() is not big
enough, you won't get the 0-termination (due to truncation), so
PyLocale_strcoll() must either be very careful to allocate a buffer
that is always big enough or apply the 0-termination itself.

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Nov 22 2004)
>>> Python/Zope Consulting and Support ...        http://www.egenix.com/
>>> mxODBC.Zope.Database.Adapter ...             http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/
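
To illustrate the last point, here is a minimal standalone sketch in plain C (no Python headers) of a caller that allocates length+1 wide characters and writes the terminator itself before calling wcscoll(). The names copy_wide_truncating() and collate_counted() are hypothetical; the first only mimics the truncating-copy behaviour described above and is not the actual PyUnicode_AsWideChar() implementation.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical stand-in for a truncating copy: copies at most `size`
   characters from a length-counted source and never writes a terminator.
   Returns the number of characters actually copied. */
size_t copy_wide_truncating(const wchar_t *src, size_t src_len,
                            wchar_t *dst, size_t size)
{
    assert(src != NULL && dst != NULL);
    if (size > src_len)
        size = src_len;
    memcpy(dst, src, size * sizeof(wchar_t));
    return size;
}

/* Compare two length-counted wide strings with wcscoll().
   Each buffer is allocated with room for length+1 characters and the
   terminator is applied here, so the result does not depend on whether
   the copy routine terminated the string.
   Returns 1 on success (comparison stored in *result), 0 if malloc failed. */
int collate_counted(const wchar_t *a, size_t a_len,
                    const wchar_t *b, size_t b_len, int *result)
{
    wchar_t *wa = malloc((a_len + 1) * sizeof(wchar_t));
    wchar_t *wb = malloc((b_len + 1) * sizeof(wchar_t));
    int ok = 0;

    if (wa != NULL && wb != NULL) {
        size_t na = copy_wide_truncating(a, a_len, wa, a_len + 1);
        size_t nb = copy_wide_truncating(b, b_len, wb, b_len + 1);
        wa[na] = L'\0';   /* explicit 0-termination by the caller */
        wb[nb] = L'\0';
        *result = wcscoll(wa, wb);
        ok = 1;
    }
    free(wa);   /* free(NULL) is a no-op */
    free(wb);
    return ok;
}
```

Because the terminator is written at the index returned by the copy, this stays correct whether or not the copy routine is later changed to terminate the string on its own.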