Andreas Degert wrote:
> "M.-A. Lemburg" <mal at egenix.com> writes:
>
>>You're right: they are always 0-terminated just like 8-bit strings
>>and even though it doesn't seem to be necessary since Python
>>functions will always use the size field when working on
>>a Unicode object rather than rely on the 0-termination.
>
> OK, should be documented in the code

It is, but I wasn't sure whether it is really such a good idea
to waste the extra memory and wanted to keep the option of
removing the 0-termination.

>>>Ok... I'm still not sure if I should file a bug for PyLocale_strcoll
>>>or PyUnicode_AsWideChar and if the patch for the latter should assume
>>>that the unicode string buffer is 0-terminated...
>>
>>I think it's probably wise to fix both:
>>
>>Looking again, the patch we applied to PyUnicode_AsWideChar()
>>only fixes the 0-termination problem in the case where
>>HAVE_USABLE_WCHAR_T is set. This should be extended to
>>the memcpy() as well.
>
> What I read from the code is that now in both cases the string is
> copied without 0 and that is consistent with the size the buffer is
> checked for (PyUnicode_GET_SIZE gives the value of the length field
> and that doesn't include the 0-termination)
>
>>Still, if the buffer passed to PyUnicode_AsWideChar()
>>is not big enough, you won't get the 0-termination (due
>>to truncation), so PyLocale_strcoll() must be either very
>>careful to allocate a buffer that is always big enough
>>or apply 0-termination itself.
>
> PyLocale_strcoll() acts quite careful but even so it didn't get what
> it expected ;-). This bug is masked by the bug you referred to when
> the copy loop is used (ie. if wchar sizes don't match) and the output
> buffer string is big enough (like in the strcoll case because the
> buffer size already accounts for the 0-termination).
>
> I appended a (untested) patch for unicodeobject.c.

I've just checked in a patch which should correct the problem.

> The documentation should be clarified too. Would a patch against
> concrete.tex be accepted where I change
>
> - 'Unicode object' to 'Unicode string' when only the string part of
>   the python object is referenced,

Not sure what you mean here.

> - 'size of the object' to 'length of the string'

Ditto.

> - mention the 0-termination of the return-value of
>   PyUnicode_AS_UNICODE()
>
> - mention the 0-termination of the return-value of
>   PyUnicode_AsWideChar

I don't think we should document this. Programmers should always
use the size of the object rather than rely on the 0-termination.

> - '... represents a 16-bit...' to something that explains 16 vs. 32
>   but depending on internal representation (UCS-2 or UCS-4) selected at
>   compile time

+1

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Nov 22 2004)
>>> Python/Zope Consulting and Support ...        http://www.egenix.com/
>>> mxODBC.Zope.Database.Adapter ...             http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/
________________________________________________________________________

::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,FreeBSD for free ! ::::
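
To illustrate the "allocate a buffer that is always big enough or apply
0-termination itself" point, here is a minimal, self-contained C sketch.
It does not use the Python C API. copy_wide_no_nul() is a hypothetical
stand-in for a copy routine that works purely from a length field and
never writes a terminator, and dup_terminated() and compare_wide() are
likewise made-up names showing what a careful caller could do before
handing the buffers to wcscoll(). Overflow of src_len + 1 is ignored
here for brevity.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical stand-in for a length-based copy: copies at most dst_size
   characters from src and never writes a terminating 0. Returns the
   number of characters copied (truncated if dst is too small). */
static size_t copy_wide_no_nul(const wchar_t *src, size_t src_len,
                               wchar_t *dst, size_t dst_size)
{
    size_t n = src_len < dst_size ? src_len : dst_size;
    assert(src != NULL && dst != NULL);
    memcpy(dst, src, n * sizeof(wchar_t));
    return n;
}

/* Caller side: reserve one extra slot and terminate explicitly instead
   of relying on the copy routine. Returns NULL if allocation fails;
   the caller frees the result. */
static wchar_t *dup_terminated(const wchar_t *src, size_t src_len)
{
    wchar_t *buf = malloc((src_len + 1) * sizeof(wchar_t));
    size_t n;
    if (buf == NULL)
        return NULL;
    n = copy_wide_no_nul(src, src_len, buf, src_len + 1);
    buf[n] = L'\0';   /* n <= src_len, so this index is always in bounds */
    return buf;
}

/* Collate two length-delimited wide strings. Stores the wcscoll() result
   in *result and returns 0, or returns -1 if memory ran out. */
static int compare_wide(const wchar_t *a, size_t a_len,
                        const wchar_t *b, size_t b_len, int *result)
{
    wchar_t *wa = dup_terminated(a, a_len);
    wchar_t *wb = dup_terminated(b, b_len);
    int ok = (wa != NULL && wb != NULL);
    if (ok)
        *result = wcscoll(wa, wb);
    free(wa);   /* free(NULL) is a no-op */
    free(wb);
    return ok ? 0 : -1;
}
```

The explicit buf[n] = L'\0' is the important line: the result is
terminated whether or not the copy routine truncated or wrote a 0 of
its own, so the caller no longer depends on either behaviour.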