Showing content from https://github.com/python/peps/commit/12cecb05489e74a36a11c17e8d0b1e36e3768bda below:

Only set LC_CTYPE, never LANG · python/peps@12cecb0 · GitHub

@@ -51,7 +51,6 @@ changed to be roughly equivalent to the following existing configuration
 settings (supported since Python 3.1)::
 
     LC_CTYPE=C.UTF-8
-    LANG=C.UTF-8
     PYTHONIOENCODING=utf-8:surrogateescape
 
 The exact target locale for coercion will be chosen from a predefined list at

@@ -153,7 +152,7 @@ The simplest way to deal with this problem for currently released versions of
 CPython is to explicitly set a more sensible locale when launching the
 application. For example::
 
-    LANG=C.UTF-8 python3 ...
+    LC_CTYPE=C.UTF-8 python3 ...
 
 The ``C.UTF-8`` locale is a full locale definition that uses ``UTF-8`` for the
 ``LC_CTYPE`` category, and the same settings as the ``C`` locale for all other

@@ -276,19 +275,19 @@ The simplest way to get Python 3 (regardless of the exact version) to behave
 sensibly in Fedora and Debian based containers is to run it in the ``C.UTF-8``
 locale that both distros provide::
 
-    $ docker run --rm -e LANG=C.UTF-8 fedora:25 python3 -c 'print("ℙƴ☂ℌøἤ")'
+    $ docker run --rm -e LC_CTYPE=C.UTF-8 fedora:25 python3 -c 'print("ℙƴ☂ℌøἤ")'
     ℙƴ☂ℌøἤ
-    $ docker run --rm -e LANG=C.UTF-8 ncoghlan/debian-python python3 -c 'print("ℙƴ☂ℌøἤ")'
+    $ docker run --rm -e LC_CTYPE=C.UTF-8 ncoghlan/debian-python python3 -c 'print("ℙƴ☂ℌøἤ")'
     ℙƴ☂ℌøἤ
 
-    $ docker run --rm -e LANG=C.UTF-8 fedora:25 locale | grep -E 'LC_ALL|LC_CTYPE|LANG'
-    LANG=C.UTF-8
-    LC_CTYPE="C.UTF-8"
+    $ docker run --rm -e LC_CTYPE=C.UTF-8 fedora:25 locale | grep -E 'LC_ALL|LC_CTYPE|LANG'
+    LANG=
+    LC_CTYPE=C.UTF-8
     LC_ALL=
-    $ docker run --rm -e LANG=C.UTF-8 ncoghlan/debian-python locale | grep -E 'LC_ALL|LC_CTYPE|LANG'
-    LANG=C.UTF-8
+    $ docker run --rm -e LC_CTYPE=C.UTF-8 ncoghlan/debian-python locale | grep -E 'LC_ALL|LC_CTYPE|LANG'
+    LANG=
     LANGUAGE=
-    LC_CTYPE="C.UTF-8"
+    LC_CTYPE=C.UTF-8
     LC_ALL=
 
 The Alpine Linux based Python images provided by Docker, Inc. already use the

@@ -358,8 +357,9 @@ use an explicit locale category like ``LC_TIME``, ``LC_MONETARY`` or
 ``LC_NUMERIC`` while otherwise running in the legacy C locale gives the
 following design principles:
 
-* don't make any environmental changes that would override explicit settings for
-  locale categories other than ``LC_CTYPE`` (most notably: don't set ``LC_ALL``)
+* don't make any environmental changes that would alter any existing settings
+  for locale categories other than ``LC_CTYPE`` (most notably: don't set
+  ``LC_ALL`` or ``LANG``)
 
 Finally, maintaining compatibility with running arbitrary subprocesses in
 orchestration use cases leads to the following design principle:

@@ -374,11 +374,12 @@ Specification
 
 To better handle the cases where CPython would otherwise end up attempting
 to operate in the ``C`` locale, this PEP proposes that CPython automatically
-attempt to coerce the legacy ``C`` locale to a UTF-8 based locale when it is
-run as a standalone command line application.
+attempt to coerce the legacy ``C`` locale to a UTF-8 based locale for the
+``LC_CTYPE`` category when it is run as a standalone command line application.
 
 It further proposes to emit a warning on stderr if the legacy ``C`` locale
-is in effect at the point where the language runtime itself is initialized,
+is in effect for the ``LC_CTYPE`` category at the point where the language
+runtime itself is initialized,
 and the explicit environmental flag to disable locale coercion is not set, in
 order to warn system and application integrators that they're running CPython
 in an unsupported configuration.
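
As a rough illustration of the warning condition described in the hunk above, the following Python sketch may help. The function name ``legacy_c_locale_warning_needed`` is hypothetical, and the ``PYTHONCOERCECLOCALE=0`` opt-out is taken from the warning text quoted elsewhere in this diff. The interpreter performs the real check in its own startup code, so this is only a model of the stated rule, not the implementation.

```python
import locale
from typing import Mapping, Optional


def legacy_c_locale_warning_needed(
    env: Mapping[str, str],
    current_ctype: Optional[str] = None,
) -> bool:
    """Model the rule: warn when LC_CTYPE is still the legacy C locale.

    `current_ctype` defaults to the LC_CTYPE setting of the running
    process; passing it explicitly keeps the function easy to test.
    """
    if current_ctype is None:
        # setlocale(category, None) queries the setting without changing it.
        current_ctype = locale.setlocale(locale.LC_CTYPE, None)
    if env.get("PYTHONCOERCECLOCALE") == "0":
        # The explicit opt-out flag suppresses the warning.
        return False
    # Treating "POSIX" as an alias of "C" is an assumption of this sketch.
    return current_ctype in ("C", "POSIX")
```

Calling ``legacy_c_locale_warning_needed(os.environ)`` (after ``import os``) would evaluate the rule for the current process.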

@@ -423,17 +424,13 @@ Three such locales will be tried:
 * ``C.UTF-8`` (available at least in Debian, Ubuntu, Alpine, and Fedora 25+, and
   expected to be available by default in a future version of glibc)
 * ``C.utf8`` (available at least in HP-UX)
-* ``UTF-8`` (available in at least some \*BSD variants)
+* ``UTF-8`` (available in at least some \*BSD variants, including Mac OS X)
 
-For ``C.UTF-8`` and ``C.utf8``, the coercion will be implemented by setting
-both the ``LC_CTYPE`` and ``LANG`` environment variables to the candidate
-locale name, such that future calls to ``setlocale()`` will see them, as will
-other components looking for those settings (such as GUI development
-frameworks).
-
-For the platforms where it is defined, ``UTF-8`` is a partial locale that only
-defines the ``LC_CTYPE`` category. Accordingly, only the ``LC_CTYPE``
-environment variable would be set when using this fallback option.
+The coercion will be implemented by setting the ``LC_CTYPE`` environment
+variable to the candidate locale name, such that future calls to
+``setlocale()`` will see it, as will other components looking for those
+settings (such as GUI development frameworks and Python's own ``locale``
+module).
 
 To allow for better cross-platform binary portability and to adjust
 automatically to future changes in locale availability, these checks will be
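
The revised coercion rule in the hunk above (try the three candidates in order, then set only ``LC_CTYPE``) can be sketched in Python as follows. This is a minimal model under stated assumptions: ``coerce_c_locale`` and the ``is_available`` callback are hypothetical names, and skipping coercion when ``LC_ALL`` is set is a choice made for this sketch, based on the POSIX rule that ``LC_ALL`` overrides ``LC_CTYPE``.

```python
from typing import Callable, Dict, Mapping, Optional

# Candidate targets from the PEP text, in the order they are tried.
COERCION_TARGETS = ("C.UTF-8", "C.utf8", "UTF-8")


def coerce_c_locale(
    env: Mapping[str, str],
    is_available: Callable[[str], bool],
) -> Optional[Dict[str, str]]:
    """Return a copy of `env` with only LC_CTYPE coerced, else None.

    `is_available` is a hypothetical stand-in for probing the platform,
    for example by attempting setlocale(LC_CTYPE, name).
    """
    if env.get("PYTHONCOERCECLOCALE") == "0":
        return None  # coercion explicitly disabled
    if env.get("LC_ALL"):
        return None  # assumption: LC_ALL would override LC_CTYPE anyway
    # POSIX fallback order for one category: LC_CTYPE, then LANG, then "C".
    effective = env.get("LC_CTYPE") or env.get("LANG") or "C"
    if effective not in ("C", "POSIX"):
        return None  # a non-C locale is already configured
    for target in COERCION_TARGETS:
        if is_available(target):
            coerced = dict(env)
            coerced["LC_CTYPE"] = target  # LANG and LC_ALL are left untouched
            return coerced
    return None  # no candidate locale exists on this platform
```

The function returns a new mapping rather than mutating its input, so a caller can decide whether to apply the result to ``os.environ`` or pass it to a subprocess.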

@@ -444,15 +441,9 @@ When this locale coercion is activated, the following warning will be
 printed on stderr, with the warning containing whichever locale was
 successfully configured::
 
-    Python detected LC_CTYPE=C: LC_CTYPE & LANG coerced to C.UTF-8 (set another
+    Python detected LC_CTYPE=C: LC_CTYPE coerced to C.UTF-8 (set another
     locale or PYTHONCOERCECLOCALE=0 to disable this locale coercion behaviour).
 
-When falling back to the ``UTF-8`` locale, the message would be slightly
-different::
-
-    Python detected LC_CTYPE=C: LC_CTYPE coerced to UTF-8 (set another locale
-    or PYTHONCOERCECLOCALE=0 to disable this locale coercion behaviour).
-
 As long as the current platform provides at least one of the candidate UTF-8
 based environments, this locale coercion will mean that the standard
 Python binary *and* locale-aware extensions should once again "just work"

@@ -489,9 +480,9 @@ Legacy C locale warning during runtime initialization
 
 By the time that ``Py_Initialize`` is called, arbitrary locale-dependent
 operations may have taken place in the current process. This means that
-by the time it is called, it is *too late* to switch to a different locale -
-doing so would introduce inconsistencies in decoded text, even in the context
-of the standalone Python interpreter binary.
+by the time it is called, it is *too late* to reliably switch to a different
+locale - doing so would introduce inconsistencies in decoded text, even in the
+context of the standalone Python interpreter binary.
 
 Accordingly, when ``Py_Initialize`` is called and CPython detects that the
 configured locale is still the default ``C`` locale and

@@ -860,8 +851,8 @@ whether or not the current locale configuration is likely to cause Unicode
 handling problems.
 
 
-Setting both LC_CTYPE & LANG for UTF-8 locale coercion
-------------------------------------------------------
+Explicitly setting LC_CTYPE for UTF-8 locale coercion
+-----------------------------------------------------
 
 Python is often used as a glue language, integrating other C/C++ ABI compatible
 components in the current process, and components written in arbitrary

@@ -872,19 +863,46 @@ problem has arisen from a setting like ``LC_CTYPE=UTF-8`` being provided on a
 system where no ``UTF-8`` locale is defined (e.g. when a Mac OS X ssh client is
 configured to forward locale settings, and the user logs into a Linux server).
 
-Setting ``LANG`` to ``C.UTF-8`` ensures that even components that only check
-the ``LANG`` fallback for their locale settings will still use ``C.UTF-8``.
+This should be sufficient to ensure that when the locale coercion is activated,
+the switch to the UTF-8 based locale will be applied consistently across the
+current process and any subprocesses that inherit the current environment.
+
+
+Avoiding setting LANG for UTF-8 locale coercion
+-----------------------------------------------
+
+Earlier versions of this PEP proposed setting the ``LANG`` category independent
+default locale, in addition to setting ``LC_CTYPE``.
+
+This was later removed on the grounds that setting only ``LC_CTYPE`` is
+sufficient to handle all of the problematic scenarios that the PEP aimed
+to resolve, while setting ``LANG`` as well would break cases where ``LANG``
+was set correctly, and the locale problems were solely due to an incorrect
+``LC_CTYPE`` setting ([22_]).
 
-Together, these should ensure that when the locale coercion is activated, the
-switch to the UTF-8 based locale will be applied consistently across the current
-process and any subprocesses that inherit the current environment.
+For example, consider a Python application that called the Linux ``date``
+utility in a subprocess rather than doing its own date formatting::
+
+    $ LANG=ja_JP.UTF-8 LC_CTYPE=C date
+    2017年 5月 23日 火曜日 17:31:03 JST
+
+    $ LANG=ja_JP.UTF-8 LC_CTYPE=C.UTF-8 date # Coercing only LC_CTYPE
+    2017年 5月 23日 火曜日 17:32:58 JST
+
+    $ LANG=C.UTF-8 LC_CTYPE=C.UTF-8 date # Coercing both of LC_CTYPE and LANG
+    Tue May 23 17:31:10 JST 2017
+
+With only ``LC_CTYPE`` updated in the Python process, the subprocess would
+continue to behave as expected. However, if ``LANG`` was updated as well,
+that would effectively override the ``LC_TIME`` setting and use the wrong
+date formatting conventions.
 
 
 Avoiding setting LC_ALL for UTF-8 locale coercion
 -------------------------------------------------
 
 Earlier versions of this PEP proposed setting the ``LC_ALL`` locale override,
-rather than just setting ``LC_CTYPE`` and ``LANG``.
+in addition to setting ``LC_CTYPE``.
 
 This was changed after it was determined that just setting ``LC_CTYPE`` and
 ``LANG`` should be sufficient to handle all the scenarios the PEP aims to
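
The ``date`` example in the hunk above follows from the POSIX lookup order for a locale category: ``LC_ALL`` first, then the category's own variable, then ``LANG``, then the ``C`` locale. The sketch below models that lookup for the three environments shown. The helper name ``effective_locale`` is hypothetical and only inspects a mapping; it does not call ``setlocale()``.

```python
from typing import Mapping


def effective_locale(env: Mapping[str, str], category: str) -> str:
    """Resolve the locale name a category would use, per POSIX precedence."""
    # LC_ALL overrides everything, then the category variable, then LANG.
    # Empty values count as unset; the final fallback is the "C" locale.
    return env.get("LC_ALL") or env.get(category) or env.get("LANG") or "C"


scenarios = {
    "original": {"LANG": "ja_JP.UTF-8", "LC_CTYPE": "C"},
    "LC_CTYPE only": {"LANG": "ja_JP.UTF-8", "LC_CTYPE": "C.UTF-8"},
    "LC_CTYPE and LANG": {"LANG": "C.UTF-8", "LC_CTYPE": "C.UTF-8"},
}

for label, env in scenarios.items():
    # LC_TIME is never set directly, so it always falls back to LANG.
    print(label, effective_locale(env, "LC_TIME"), effective_locale(env, "LC_CTYPE"))
```

In the first two scenarios ``LC_TIME`` resolves to ``ja_JP.UTF-8``, so the Japanese date format is kept; only when ``LANG`` is also replaced does ``LC_TIME`` resolve to ``C.UTF-8``.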

@@ -1198,6 +1216,10 @@ References
 .. [21] GNU readline misbehaviour on Mac OS X with ``LANG=C``
    (https://mail.python.org/pipermail/python-dev/2017-May/147897.html)
 
+.. [22] Potential problems when setting LANG in addition to setting LC_CTYPE
+   (https://mail.python.org/pipermail/python-dev/2017-May/147968.html)
+
+
 Copyright
 =========
 


Search and Browse the WWW like it's 1997 | Search results from DuckDuckGo

HTML: 3.2 | Encoding: UTF-8 | Version: 0.7.4