Showing content from https://github.com/python/cpython/issues/119609 below:

Add PyUnicode_Export() and PyUnicode_Import() functions · Issue #119609 · python/cpython · GitHub

Feature or enhancement

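PEP 393 – Flexible String Representation changed the Unicode implementation in Python 3.3 to use 3 string "kinds": UCS-1 (1 byte per character, [U+0000; U+00ff] range), UCS-2 (2 bytes per character, [U+0000; U+ffff] range) and UCS-4 (4 bytes per character, [U+0000; U+10ffff] range).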

Strings must always use the optimal storage: for example, an ASCII string must be stored as PyUnicode_1BYTE_KIND.

Strings have a flag indicating if the string only contains ASCII characters: [U+0000; U+007f] range. It's used by multiple internal optimizations.
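The "optimal storage" rule and the ASCII flag can be illustrated with a small standalone sketch. It does not use the CPython API: `storage_info` and `optimal_storage()` are hypothetical names introduced here, and the input is assumed to be an array of already-decoded code points.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical summary of how PEP 393 would store a string. */
typedef struct {
    int kind;      /* bytes per character: 1 (UCS-1), 2 (UCS-2) or 4 (UCS-4) */
    int is_ascii;  /* 1 if every character is in [U+0000; U+007f] */
} storage_info;

/* Pick the narrowest kind able to hold the largest code point. */
storage_info optimal_storage(const uint32_t *code_points, size_t len)
{
    uint32_t max = 0;
    for (size_t i = 0; i < len; i++) {
        assert(code_points[i] <= 0x10ffff);  /* valid Unicode range */
        if (code_points[i] > max) {
            max = code_points[i];
        }
    }

    storage_info info;
    info.is_ascii = (max <= 0x7f);
    if (max <= 0xff) {
        info.kind = 1;
    }
    else if (max <= 0xffff) {
        info.kind = 2;
    }
    else {
        info.kind = 4;
    }
    return info;
}
```

In this sketch an empty string is reported as a 1-byte ASCII string, and a Latin-1 string such as "é" uses 1 byte per character without the ASCII flag.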

This implementation is not exposed in the limited C API. For example, the PyUnicode_FromKindAndData() function is excluded from the stable ABI. In other words, it's not possible to write efficient code for PEP 393 using the limited C API.

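I propose adding two functions: PyUnicode_AsNativeFormat(), which exports the content of a string in its native format, and PyUnicode_FromNativeFormat(), which creates a string object from a native format string.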

These functions are added to the limited C API version 3.14.

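Native formats (new constants): PyUnicode_NATIVE_ASCII, PyUnicode_NATIVE_UCS1, PyUnicode_NATIVE_UCS2, PyUnicode_NATIVE_UCS4 and PyUnicode_NATIVE_UTF8.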

Differences from PyUnicode_FromKindAndData(), which only accepts the three PEP 393 kinds:

PyUnicode_NATIVE_ASCII format allows further optimizations.

PyUnicode_NATIVE_UTF8 can be used by PyPy and other Python implementations using UTF-8 as the internal storage.

API:

#define PyUnicode_NATIVE_ASCII 1
#define PyUnicode_NATIVE_UCS1 2
#define PyUnicode_NATIVE_UCS2 3
#define PyUnicode_NATIVE_UCS4 4
#define PyUnicode_NATIVE_UTF8 5

// Get the content of a string in its native format.
// - Return the content, set '*size' and '*native_format' on success.
// - Set an exception and return NULL on error.
PyAPI_FUNC(const void*) PyUnicode_AsNativeFormat(
    PyObject *unicode,
    Py_ssize_t *size,
    int *native_format);

// Create a string object from a native format string.
// - Return a reference to a new string object on success.
// - Set an exception and return NULL on error.
PyAPI_FUNC(PyObject*) PyUnicode_FromNativeFormat(
    const void *data,
    Py_ssize_t size,
    int native_format);
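A consumer of the exported buffer has to interpret it according to the reported format. The following standalone sketch shows one way to do that, without calling the proposed functions. The constants are copied from the proposal; `native_unit_size()` and `native_read_unit()` are hypothetical helpers, and indexing by code unit is an assumption, since the proposal text here does not say whether `size` counts bytes or characters.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Constants copied from the proposal. */
#define PyUnicode_NATIVE_ASCII 1
#define PyUnicode_NATIVE_UCS1 2
#define PyUnicode_NATIVE_UCS2 3
#define PyUnicode_NATIVE_UCS4 4
#define PyUnicode_NATIVE_UTF8 5

/* Hypothetical helper: bytes per code unit, or 0 for an unknown format. */
size_t native_unit_size(int native_format)
{
    switch (native_format) {
    case PyUnicode_NATIVE_ASCII:
    case PyUnicode_NATIVE_UCS1:
    case PyUnicode_NATIVE_UTF8:
        return 1;
    case PyUnicode_NATIVE_UCS2:
        return 2;
    case PyUnicode_NATIVE_UCS4:
        return 4;
    default:
        return 0;
    }
}

/* Hypothetical helper: read the character at index i of a fixed-width
   buffer. Returns -1 for UTF-8 (variable width) and unknown formats. */
int64_t native_read_unit(const void *data, size_t i, int native_format)
{
    assert(data != NULL);
    switch (native_format) {
    case PyUnicode_NATIVE_ASCII:
    case PyUnicode_NATIVE_UCS1:
        return ((const uint8_t *)data)[i];
    case PyUnicode_NATIVE_UCS2:
        return ((const uint16_t *)data)[i];
    case PyUnicode_NATIVE_UCS4:
        return ((const uint32_t *)data)[i];
    default:
        return -1;
    }
}
```

UTF-8 is rejected by `native_read_unit()` because a character index does not map to a fixed byte offset there; a real consumer would decode it sequentially instead.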

See the attached pull request for more details.

I was asked to add this feature so that the MarkupSafe C extension can be ported to the limited C API. Currently, each release requires producing around 60 wheel files, which take 20 minutes to build: https://pypi.org/project/MarkupSafe/#files

Using the stable ABI would reduce the number of wheel packages and so ease their release process.

See src/markupsafe/_speedups.c: string functions specialized for the 3 string kinds (UCS-1, UCS-2, UCS-4).
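To show what "specialized for the 3 string kinds" means in practice, here is a minimal standalone sketch of the pattern. It is not MarkupSafe's code: `escape_delta()` and the macro are hypothetical names, the length is assumed to be a number of code units, and the function only counts how many extra characters HTML escaping would add (`&`, `"` and `'` as 5-character entities, `<` and `>` as 4-character entities).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Constants copied from the proposal. */
#define PyUnicode_NATIVE_ASCII 1
#define PyUnicode_NATIVE_UCS1 2
#define PyUnicode_NATIVE_UCS2 3
#define PyUnicode_NATIVE_UCS4 4
#define PyUnicode_NATIVE_UTF8 5

/* Generate one scanning loop per code unit type. */
#define DEFINE_ESCAPE_DELTA(name, type)                      \
    size_t name(const type *s, size_t len)                   \
    {                                                        \
        size_t delta = 0;                                    \
        for (size_t i = 0; i < len; i++) {                   \
            switch (s[i]) {                                  \
            case '"': case '\'': case '&':                   \
                delta += 4;  /* 1 char becomes 5 */          \
                break;                                       \
            case '<': case '>':                              \
                delta += 3;  /* 1 char becomes 4 */          \
                break;                                       \
            default:                                         \
                break;                                       \
            }                                                \
        }                                                    \
        return delta;                                        \
    }

DEFINE_ESCAPE_DELTA(escape_delta_1byte, uint8_t)
DEFINE_ESCAPE_DELTA(escape_delta_2byte, uint16_t)
DEFINE_ESCAPE_DELTA(escape_delta_4byte, uint32_t)

/* Dispatch on the native format. Returns SIZE_MAX for an unknown format. */
size_t escape_delta(const void *data, size_t len, int native_format)
{
    assert(data != NULL || len == 0);
    switch (native_format) {
    case PyUnicode_NATIVE_ASCII:
    case PyUnicode_NATIVE_UCS1:
    case PyUnicode_NATIVE_UTF8:
        /* Byte-wise scan: in UTF-8, bytes below 0x80 are always ASCII. */
        return escape_delta_1byte((const uint8_t *)data, len);
    case PyUnicode_NATIVE_UCS2:
        return escape_delta_2byte((const uint16_t *)data, len);
    case PyUnicode_NATIVE_UCS4:
        return escape_delta_4byte((const uint32_t *)data, len);
    default:
        return SIZE_MAX;
    }
}
```

With an exported buffer and its format, an extension built against the limited C API could keep this kind of per-width fast path instead of falling back to a generic, slower character access.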
