A RetroSearch Logo

Home - News ( United States | United Kingdom | Italy | Germany ) - Football scores

Search Query:

Showing content from https://github.com/python/cpython/issues/111089 below:

[C API] Change PyUnicode_AsUTF8() to return NULL on embedded null characters · Issue #111089 · python/cpython · GitHub

I propose to change the PyUnicode_AsUTF8() API to raise an exception and return NULL if the string contains embedded null characters.

If the string contains an embedded null character, the UTF-8 encoded string can be truncated if used with C functions using char* since a null byte is treated as the terminator: marker of the string end. Truncating a string silently is a bad practice and can lead to different bugs including security vulnerabilities.

In practice, the minority of impacted C extensions and impacted users should benefit of such backward incompatible change, since truncating a string silently is a bad practice. Impacted users can use PyUnicode_AsUTF8AndSize(obj, NULL) and just ignore the size if they want to truncate on purpose.

It would address the following "hidden" comment on PyUnicode_AsUTF8():

Use of this API is DEPRECATED since no size information can be
extracted from the returned data.

PyUnicode_AsUTF8String() is part of the limited C API, whereas PyUnicode_AsUTF8() is not.

In the recently added PyUnicode_EqualToUTF8(obj, str), str is treated as not equal if obj contains embedded null characters.

The folllowing functions already raise an exception if the string contains embedded null characters or bytes:

PyUnicode_AsUTF8String() returns a bytes object and so the length, so it doesn't raise the exception.

PyUnicode_AsUTF8AndSize() also returns the size and so don't raise on embedded null characters.

Linked PRs

RetroSearch is an open source project built by @garambo | Open a GitHub Issue

Search and Browse the WWW like it's 1997 | Search results from DuckDuckGo

HTML: 3.2 | Encoding: UTF-8 | Version: 0.7.4