PEP 393 deprecated some Unicode APIs and introduced wchar_t *wstr and Py_ssize_t wstr_length in the Unicode structure to support these deprecated APIs.
This PEP plans the removal of wstr and wstr_length, together with the deprecated APIs that use these members, by Python 3.12.
Deprecated APIs that do not use these members are out of scope because they can be removed independently.
Motivation

Memory usage

str is one of the most used types in Python. Even the simplest ASCII strings have a wstr member, which consumes 8 bytes per string on 64-bit systems.
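As a rough illustration of where those 8 bytes come from, the sketch below compares two simplified headers that differ only in a wchar_t * member. The struct and function names are hypothetical and the layout is not the real CPython one; the sketch only assumes that dropping one pointer member shrinks the header by the size of a pointer on common ABIs.

```c
#include <stddef.h>
#include <wchar.h>

/* Hypothetical, simplified string headers. These are NOT the real
   CPython structures; they only mimic "a few fields plus a pointer". */
struct header_with_wstr {
    ptrdiff_t length;    /* number of code points */
    ptrdiff_t hash;      /* cached hash value */
    unsigned int state;  /* packed flags */
    wchar_t *wstr;       /* legacy wchar_t cache */
};

struct header_without_wstr {
    ptrdiff_t length;
    ptrdiff_t hash;
    unsigned int state;
};

/* Bytes saved per string when the pointer member is dropped. */
size_t wstr_saving(void)
{
    return sizeof(struct header_with_wstr)
         - sizeof(struct header_without_wstr);
}
```

On a typical 64-bit platform wstr_saving() equals the size of a pointer, which matches the 8 bytes per string quoted above.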
Runtime overhead

To support legacy Unicode objects, many Unicode APIs must call PyUnicode_READY().
We can remove this overhead too by dropping support for legacy Unicode objects.
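The kind of overhead meant here can be pictured with a toy model. Everything in the sketch below (toy_str, toy_ready, toy_read_char) is hypothetical and much simpler than CPython's implementation; it only assumes that a legacy object starts with a wchar_t buffer and is converted to its canonical form on first use, so every accessor has to check a "ready" flag first.

```c
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical toy string object; field names are illustrative only. */
typedef struct {
    int ready;           /* 0 while only the wchar_t buffer exists */
    size_t length;       /* number of characters */
    wchar_t *wstr;       /* legacy buffer supplied by the caller */
    unsigned int *data;  /* canonical code points, filled lazily */
} toy_str;

/* Convert the legacy buffer on first use.
   Returns 0 on success and -1 if memory allocation fails. */
int toy_ready(toy_str *s)
{
    if (s->ready) {
        return 0;  /* already converted, but the check still ran */
    }
    /* Allocate at least one element so a zero length is not an error. */
    s->data = malloc((s->length ? s->length : 1) * sizeof *s->data);
    if (s->data == NULL) {
        return -1;
    }
    for (size_t i = 0; i < s->length; i++) {
        s->data[i] = (unsigned int)s->wstr[i];
    }
    s->ready = 1;
    return 0;
}

/* Every accessor pays for the readiness check before touching data.
   Returns the code point, or -1 on error or out-of-range index. */
long toy_read_char(toy_str *s, size_t index)
{
    if (toy_ready(s) < 0 || index >= s->length) {
        return -1;
    }
    return (long)s->data[index];
}
```

Without legacy objects the flag and the check in every accessor are unnecessary, because the canonical data always exists.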
Simplicity

Supporting legacy Unicode objects makes the Unicode implementation more complex. Until we drop legacy Unicode objects, it is very hard to try other Unicode implementations, such as the UTF-8 based implementation in PyPy.
Rationale

Python 4.0 is not scheduled yet

PEP 393 introduced an efficient internal representation of Unicode and removed the border between “narrow” and “wide” builds of Python.
PEP 393 was implemented in Python 3.3, which was released in 2012. The old APIs have been deprecated since then, and their removal was scheduled for Python 4.0.
When PEP 393 was accepted, Python 4.0 was expected to be the version after Python 3.9. But the version after Python 3.9 is Python 3.10, not 4.0. This is why this PEP reschedules the removal.
Python 2 reached EOL

Since Python 2 didn’t have the PEP 393 Unicode implementation, the legacy APIs might have helped C extension modules support both Python 2 and 3.
But Python 2 reached EOL in 2020, so we can remove the legacy APIs that were kept for compatibility with Python 2.
Plan

Python 3.9

These macros and functions are marked as deprecated, using the Py_DEPRECATED macro:
Py_UNICODE_WSTR_LENGTH()
PyUnicode_GET_SIZE()
PyUnicode_GetSize()
PyUnicode_GET_DATA_SIZE()
PyUnicode_AS_UNICODE()
PyUnicode_AS_DATA()
PyUnicode_AsUnicode()
_PyUnicode_AsUnicode()
PyUnicode_AsUnicodeAndSize()
PyUnicode_FromUnicode()

Python 3.10

The following macros and enum members are marked as deprecated. The Py_DEPRECATED(3.10) macro is used where possible; where the macro cannot be used easily, they are deprecated only in comments and documentation.
PyUnicode_WCHAR_KIND
PyUnicode_READY()
PyUnicode_IS_READY()
PyUnicode_IS_COMPACT()
PyUnicode_FromUnicode(NULL, size) and PyUnicode_FromStringAndSize(NULL, size) emit DeprecationWarning when size > 0.
PyArg_ParseTuple() and PyArg_ParseTupleAndKeywords() emit DeprecationWarning when the u, u#, Z, and Z# formats are used.

Python 3.12

The following members are removed from the Unicode structures:
wstr
wstr_length
state.compact
state.ready
The PyUnicodeObject structure is removed.

The following macros, functions, and enum members are removed:
Py_UNICODE_WSTR_LENGTH()
PyUnicode_GET_SIZE()
PyUnicode_GetSize()
PyUnicode_GET_DATA_SIZE()
PyUnicode_AS_UNICODE()
PyUnicode_AS_DATA()
PyUnicode_AsUnicode()
_PyUnicode_AsUnicode()
PyUnicode_AsUnicodeAndSize()
PyUnicode_FromUnicode()
PyUnicode_WCHAR_KIND
PyUnicode_READY()
PyUnicode_IS_READY()
PyUnicode_IS_COMPACT()

PyUnicode_FromStringAndSize(NULL, size) raises RuntimeError when size > 0.
PyArg_ParseTuple() and PyArg_ParseTupleAndKeywords() raise SystemError when the u, u#, Z, and Z# formats are used, as they do for any other unsupported format character.
wchar_t* representation of string objects.
This document has been placed in the public domain.