@@ -51,7 +51,6 @@ changed to be roughly equivalent to the following existing configuration
51
51
settings (supported since Python 3.1)::
52
52
53
53
LC_CTYPE=C.UTF-8
54
-
LANG=C.UTF-8
55
54
PYTHONIOENCODING=utf-8:surrogateescape
56
55
57
56
The exact target locale for coercion will be chosen from a predefined list at
@@ -153,7 +152,7 @@ The simplest way to deal with this problem for currently released versions of
153
152
CPython is to explicitly set a more sensible locale when launching the
154
153
application. For example::
155
154
156
-
LANG=C.UTF-8 python3 ...
155
+
LC_CTYPE=C.UTF-8 python3 ...
157
156
158
157
The ``C.UTF-8`` locale is a full locale definition that uses ``UTF-8`` for the
159
158
``LC_CTYPE`` category, and the same settings as the ``C`` locale for all other
@@ -276,19 +275,19 @@ The simplest way to get Python 3 (regardless of the exact version) to behave
276
275
sensibly in Fedora and Debian based containers is to run it in the ``C.UTF-8``
277
276
locale that both distros provide::
278
277
279
-
$ docker run --rm -e LANG=C.UTF-8 fedora:25 python3 -c 'print("ℙƴ☂ℌøἤ")'
278
+
$ docker run --rm -e LC_CTYPE=C.UTF-8 fedora:25 python3 -c 'print("ℙƴ☂ℌøἤ")'
280
279
ℙƴ☂ℌøἤ
281
-
$ docker run --rm -e LANG=C.UTF-8 ncoghlan/debian-python python3 -c 'print("ℙƴ☂ℌøἤ")'
280
+
$ docker run --rm -e LC_CTYPE=C.UTF-8 ncoghlan/debian-python python3 -c 'print("ℙƴ☂ℌøἤ")'
282
281
ℙƴ☂ℌøἤ
283
282
284
-
$ docker run --rm -e LANG=C.UTF-8 fedora:25 locale | grep -E 'LC_ALL|LC_CTYPE|LANG'
285
-
LANG=C.UTF-8
286
-
LC_CTYPE="C.UTF-8"
283
+
$ docker run --rm -e LC_CTYPE=C.UTF-8 fedora:25 locale | grep -E 'LC_ALL|LC_CTYPE|LANG'
284
+
LANG=
285
+
LC_CTYPE=C.UTF-8
287
286
LC_ALL=
288
-
$ docker run --rm -e LANG=C.UTF-8 ncoghlan/debian-python locale | grep -E 'LC_ALL|LC_CTYPE|LANG'
289
-
LANG=C.UTF-8
287
+
$ docker run --rm -e LC_CTYPE=C.UTF-8 ncoghlan/debian-python locale | grep -E 'LC_ALL|LC_CTYPE|LANG'
288
+
LANG=
290
289
LANGUAGE=
291
-
LC_CTYPE="C.UTF-8"
290
+
LC_CTYPE=C.UTF-8
292
291
LC_ALL=
293
292
294
293
The Alpine Linux based Python images provided by Docker, Inc. already use the
@@ -358,8 +357,9 @@ use an explicit locale category like ``LC_TIME``, ``LC_MONETARY`` or
358
357
``LC_NUMERIC`` while otherwise running in the legacy C locale gives the
359
358
following design principles:
360
359
361
-
* don't make any environmental changes that would override explicit settings for
362
-
locale categories other than ``LC_CTYPE`` (most notably: don't set ``LC_ALL``)
360
+
* don't make any environmental changes that would alter any existing settings
361
+
for locale categories other than ``LC_CTYPE`` (most notably: don't set
362
+
``LC_ALL`` or ``LANG``)
363
363
364
364
Finally, maintaining compatibility with running arbitrary subprocesses in
365
365
orchestration use cases leads to the following design principle:
@@ -374,11 +374,12 @@ Specification
374
374
375
375
To better handle the cases where CPython would otherwise end up attempting
376
376
to operate in the ``C`` locale, this PEP proposes that CPython automatically
377
-
attempt to coerce the legacy ``C`` locale to a UTF-8 based locale when it is
378
-
run as a standalone command line application.
377
+
attempt to coerce the legacy ``C`` locale to a UTF-8 based locale for the
378
+
``LC_CTYPE`` category when it is run as a standalone command line application.
379
379
380
380
It further proposes to emit a warning on stderr if the legacy ``C`` locale
381
-
is in effect at the point where the language runtime itself is initialized,
381
+
is in effect for the ``LC_CTYPE`` category at the point where the language
382
+
runtime itself is initialized,
382
383
and the explicit environmental flag to disable locale coercion is not set, in
383
384
order to warn system and application integrators that they're running CPython
384
385
in an unsupported configuration.
@@ -423,17 +424,13 @@ Three such locales will be tried:
423
424
* ``C.UTF-8`` (available at least in Debian, Ubuntu, Alpine, and Fedora 25+, and
424
425
expected to be available by default in a future version of glibc)
425
426
* ``C.utf8`` (available at least in HP-UX)
426
-
* ``UTF-8`` (available in at least some \*BSD variants)
427
+
* ``UTF-8`` (available in at least some \*BSD variants, including Mac OS X)
427
428
428
-
For ``C.UTF-8`` and ``C.utf8``, the coercion will be implemented by setting
429
-
both the ``LC_CTYPE`` and ``LANG`` environment variables to the candidate
430
-
locale name, such that future calls to ``setlocale()`` will see them, as will
431
-
other components looking for those settings (such as GUI development
432
-
frameworks).
433
-
434
-
For the platforms where it is defined, ``UTF-8`` is a partial locale that only
435
-
defines the ``LC_CTYPE`` category. Accordingly, only the ``LC_CTYPE``
436
-
environment variable would be set when using this fallback option.
429
+
The coercion will be implemented by setting the ``LC_CTYPE`` environment
430
+
variable to the candidate locale name, such that future calls to
431
+
``setlocale()`` will see it, as will other components looking for those
432
+
settings (such as GUI development frameworks and Python's own ``locale``
433
+
module).
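
As a rough illustration of the coercion step described above, the following
Python sketch tries the three candidate locales in order and sets only
``LC_CTYPE``. The names ``effective_lc_ctype`` and ``coerce_lc_ctype``, and
the ``is_available`` callback, are hypothetical and introduced purely for
illustration. CPython's actual startup behaviour is not reproduced here, and
the sketch leaves any ``LC_ALL`` setting untouched.

```python
# Candidate target locales, tried in the order listed in the PEP text.
CANDIDATE_LOCALES = ("C.UTF-8", "C.utf8", "UTF-8")


def effective_lc_ctype(env):
    """Resolve LC_CTYPE using the usual LC_ALL > LC_CTYPE > LANG precedence."""
    for name in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(name)
        if value:
            return value
    return "C"  # nothing set: the legacy C locale applies


def coerce_lc_ctype(env, is_available):
    """Return a copy of env with only LC_CTYPE coerced, if coercion applies.

    is_available is a hypothetical callback reporting whether a locale name
    can be configured on the current platform.
    """
    new_env = dict(env)
    if new_env.get("PYTHONCOERCECLOCALE") == "0":
        return new_env  # coercion explicitly disabled
    if effective_lc_ctype(new_env) not in ("C", "POSIX"):
        return new_env  # a non-legacy locale is already in effect
    for candidate in CANDIDATE_LOCALES:
        if is_available(candidate):
            new_env["LC_CTYPE"] = candidate  # LANG and LC_ALL are not modified
            break
    return new_env


env = {"LANG": "ja_JP.UTF-8", "LC_CTYPE": "C"}
print(coerce_lc_ctype(env, lambda name: name == "C.UTF-8"))
```

Passing the environment and the availability check in as arguments keeps the
sketch independent of which locales happen to be installed on the machine
running it.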
437
434
438
435
To allow for better cross-platform binary portability and to adjust
439
436
automatically to future changes in locale availability, these checks will be
@@ -444,15 +441,9 @@ When this locale coercion is activated, the following warning will be
444
441
printed on stderr, with the warning containing whichever locale was
445
442
successfully configured::
446
443
447
-
Python detected LC_CTYPE=C: LC_CTYPE & LANG coerced to C.UTF-8 (set another
444
+
Python detected LC_CTYPE=C: LC_CTYPE coerced to C.UTF-8 (set another
448
445
locale or PYTHONCOERCECLOCALE=0 to disable this locale coercion behaviour).
449
446
450
-
When falling back to the ``UTF-8`` locale, the message would be slightly
451
-
different::
452
-
453
-
Python detected LC_CTYPE=C: LC_CTYPE coerced to UTF-8 (set another locale
454
-
or PYTHONCOERCECLOCALE=0 to disable this locale coercion behaviour).
455
-
456
447
As long as the current platform provides at least one of the candidate UTF-8
457
448
based environments, this locale coercion will mean that the standard
458
449
Python binary *and* locale-aware extensions should once again "just work"
@@ -489,9 +480,9 @@ Legacy C locale warning during runtime initialization
489
480
490
481
By the time that ``Py_Initialize`` is called, arbitrary locale-dependent
491
482
operations may have taken place in the current process. This means that
492
-
by the time it is called, it is *too late* to switch to a different locale -
493
-
doing so would introduce inconsistencies in decoded text, even in the context
494
-
of the standalone Python interpreter binary.
483
+
by the time it is called, it is *too late* to reliably switch to a different
484
+
locale - doing so would introduce inconsistencies in decoded text, even in the
485
+
context of the standalone Python interpreter binary.
495
486
496
487
Accordingly, when ``Py_Initialize`` is called and CPython detects that the
497
488
configured locale is still the default ``C`` locale and
@@ -860,8 +851,8 @@ whether or not the current locale configuration is likely to cause Unicode
860
851
handling problems.
861
852
862
853
863
-
Setting both LC_CTYPE & LANG for UTF-8 locale coercion
864
-
------------------------------------------------------
854
+
Explicitly setting LC_CTYPE for UTF-8 locale coercion
855
+
-----------------------------------------------------
865
856
866
857
Python is often used as a glue language, integrating other C/C++ ABI compatible
867
858
components in the current process, and components written in arbitrary
@@ -872,19 +863,46 @@ problem has arisen from a setting like ``LC_CTYPE=UTF-8`` being provided on a
872
863
system where no ``UTF-8`` locale is defined (e.g. when a Mac OS X ssh client is
873
864
configured to forward locale settings, and the user logs into a Linux server).
874
865
875
-
Setting ``LANG`` to ``C.UTF-8`` ensures that even components that only check
876
-
the ``LANG`` fallback for their locale settings will still use ``C.UTF-8``.
866
+
This should be sufficient to ensure that when the locale coercion is activated,
867
+
the switch to the UTF-8 based locale will be applied consistently across the
868
+
current process and any subprocesses that inherit the current environment.
869
+
870
+
871
+
Avoiding setting LANG for UTF-8 locale coercion
872
+
-----------------------------------------------
873
+
874
+
Earlier versions of this PEP proposed setting the ``LANG`` category independent
875
+
default locale, in addition to setting ``LC_CTYPE``.
876
+
877
+
This was later removed on the grounds that setting only ``LC_CTYPE`` is
878
+
sufficient to handle all of the problematic scenarios that the PEP aimed
879
+
to resolve, while setting ``LANG`` as well would break cases where ``LANG``
880
+
was set correctly, and the locale problems were solely due to an incorrect
881
+
``LC_CTYPE`` setting ([22_]).
877
882
878
-
Together, these should ensure that when the locale coercion is activated, the
879
-
switch to the UTF-8 based locale will be applied consistently across the current
880
-
process and any subprocesses that inherit the current environment.
883
+
For example, consider a Python application that called the Linux ``date``
884
+
utility in a subprocess rather than doing its own date formatting::
885
+
886
+
$ LANG=ja_JP.UTF-8 LC_CTYPE=C date
887
+
2017年 5月 23日 火曜日 17:31:03 JST
888
+
889
+
$ LANG=ja_JP.UTF-8 LC_CTYPE=C.UTF-8 date # Coercing only LC_CTYPE
890
+
2017年 5月 23日 火曜日 17:32:58 JST
891
+
892
+
$ LANG=C.UTF-8 LC_CTYPE=C.UTF-8 date # Coercing both of LC_CTYPE and LANG
893
+
Tue May 23 17:31:10 JST 2017
894
+
895
+
With only ``LC_CTYPE`` updated in the Python process, the subprocess would
896
+
continue to behave as expected. However, if ``LANG`` was updated as well,
897
+
that would effectively override the ``LC_TIME`` setting and use the wrong
898
+
date formatting conventions.
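
The effect on ``LC_TIME`` in the example above follows from the standard
lookup order for locale categories: ``LC_ALL`` first, then the category's own
variable, then ``LANG``. A minimal Python sketch of that lookup, using a
hypothetical helper named ``resolve_category`` and plain dictionaries standing
in for the process environment, might look like this:

```python
def resolve_category(env, category):
    """Return the locale name a category such as LC_TIME would resolve to."""
    for name in ("LC_ALL", category, "LANG"):
        value = env.get(name)
        if value:
            return value
    return "C"  # nothing set: fall back to the legacy C locale


original = {"LANG": "ja_JP.UTF-8", "LC_CTYPE": "C"}

# Coercing only LC_CTYPE leaves the LANG-derived LC_TIME setting alone.
only_ctype = dict(original, LC_CTYPE="C.UTF-8")

# Coercing LANG as well changes every category that falls back to LANG.
both = dict(original, LC_CTYPE="C.UTF-8", LANG="C.UTF-8")

for label, env in (("only LC_CTYPE", only_ctype), ("LC_CTYPE and LANG", both)):
    print(label, "->", resolve_category(env, "LC_TIME"))
```

In the first case ``LC_TIME`` still resolves to ``ja_JP.UTF-8``, while in the
second it resolves to ``C.UTF-8``, which matches the change in ``date`` output
shown above.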
881
899
882
900
883
901
Avoiding setting LC_ALL for UTF-8 locale coercion
884
902
-------------------------------------------------
885
903
886
904
Earlier versions of this PEP proposed setting the ``LC_ALL`` locale override,
887
-
rather than just setting ``LC_CTYPE`` and ``LANG``.
905
+
in addition to setting ``LC_CTYPE``.
888
906
889
907
This was changed after it was determined that just setting ``LC_CTYPE`` and
890
908
``LANG`` should be sufficient to handle all the scenarios the PEP aims to
@@ -1198,6 +1216,10 @@ References
1198
1216
.. [21] GNU readline misbehaviour on Mac OS X with ``LANG=C``
1199
1217
(https://mail.python.org/pipermail/python-dev/2017-May/147897.html)
1200
1218
1219
+
.. [22] Potential problems when setting LANG in addition to setting LC_CTYPE
1220
+
(https://mail.python.org/pipermail/python-dev/2017-May/147968.html)
1221
+
1222
+
1201
1223
Copyright
1202
1224
=========
1203
1225