Showing content from https://mail.python.org/pipermail/python-dev/2016-August/145997.html below:

[Python-Dev] File system path encoding on Windows

Steve Dower steve.dower at python.org
Tue Aug 30 19:27:23 EDT 2016
On 30Aug2016 1611, Victor Stinner wrote:
> 2016-08-30 23:51 GMT+02:00 Victor Stinner <victor.stinner at gmail.com>:
>> As I already wrote once, my problem is also that I simply have no idea how
>> much Python 3 code uses bytes filenames. For example, does it concern more
>> than 25% of py3 modules on PyPI, or less than 5%?
>
> I made a very quick test on Windows using a modified Python raising an
> exception on bytes paths.
>
> First of all, setuptools fails. It's a kind of blocker issue :-) I
> quickly fixed it (only one line needs to be modified).
>
> I tried to run Twisted unit tests (python -m twisted.trial twisted) of
> Twisted 16.4. I got a lot of exceptions on bytes path from the
> twisted/python/filepath.py module, but also from
> twisted/trial/util.py. It looks like these modules are doing their
> best to convert all paths to... bytes. I had to modify more than 5
> methods just to be able to start running unit tests.
>
> Quick result: setuptools and Twisted rely on bytes paths. Dropping
> support for bytes paths on Windows breaks these modules.
>
> It also means that these modules don't support the full Unicode range
> on Windows on Python 3.5.
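
To illustrate why a bytes path cannot carry the full Unicode range on Python 3.5 for Windows, here is a minimal sketch. It assumes that the legacy ANSI code page (the "mbcs" file system encoding on that version) can be approximated by cp1252. The mbcs codec only exists on Windows, so cp1252 stands in to keep the example portable. `to_legacy_bytes` is a hypothetical helper, not part of any library.

```python
# Sketch: a legacy single-byte code page cannot represent every filename.
# Assumption: cp1252 stands in for the Windows ANSI code page ("mbcs"),
# which was the file system encoding for bytes paths on Python 3.5.

LEGACY_CODEPAGE = "cp1252"  # hypothetical stand-in for the ANSI code page


def to_legacy_bytes(path: str):
    """Return the path encoded with the legacy code page, or None if it
    contains characters that the code page cannot represent."""
    try:
        return path.encode(LEGACY_CODEPAGE)
    except UnicodeEncodeError:
        return None


for name in ["report.txt", "caf\u00e9.txt", "\u65e5\u672c\u8a9e.txt"]:
    status = "ok" if to_legacy_bytes(name) is not None else "not representable"
    # ascii() keeps the output printable on any console encoding
    print(ascii(name), status)
```

The accented name fits in the code page, but the Japanese name does not, so any code that insists on bytes paths cannot open that file at all.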

Thanks. That's a good idea (certainly better than mine, which was to go 
reading code...)

I haven't looked into setuptools, but Twisted appears to be correctly 
using sys.getfilesystemencoding() when they coerce to bytes, which means 
the proposed change will simply allow the full Unicode range when paths 
are encoded.
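
A minimal sketch of the coercion pattern described above, assuming the proposal makes the Windows file system encoding UTF-8 (as later adopted in PEP 529). The helper names are hypothetical and are not Twisted's actual API. `os.fsencode` and `os.fsdecode` are the standard library functions that apply `sys.getfilesystemencoding()` together with its matching error handler.

```python
import os
import sys


def coerce_to_bytes(path):
    """Hypothetical helper: coerce a str path to bytes using the file system
    encoding, as the message describes Twisted doing. Bytes pass through."""
    if isinstance(path, bytes):
        return path
    # os.fsencode applies sys.getfilesystemencoding() and its error handler
    return os.fsencode(path)


def coerce_to_text(path):
    """Hypothetical inverse of coerce_to_bytes."""
    if isinstance(path, str):
        return path
    return os.fsdecode(path)


name = "caf\u00e9.txt"
raw = coerce_to_bytes(name)
print(sys.getfilesystemencoding(), raw)
```

Because the code always goes through the file system encoding, changing that encoding to UTF-8 changes the bytes it produces without breaking the round trip, and names outside the old code page become representable.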

However, if there are places where bytes are not transcoded when they 
should be, *then* there will be new issues. I wonder if we can quickly 
test whether that happens (e.g. use the file system encoding to "taint" 
the path somehow - special prefix? - so we can raise if bytes that 
haven't been correctly encoded at some point are passed in).
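
One way the "taint" idea might look, as a minimal sketch only: the marker `TAINT` and the functions `tainted_fsencode` and `check_bytes_path` are hypothetical and exist in no Python release. A test build would have the file system encoder add the marker, and the os layer would reject any bytes that lack it.

```python
import os

# Hypothetical marker. A leading NUL byte is a reasonable choice because a
# real path can never contain NUL, so the marker cannot collide with one.
TAINT = b"\x00fsenc:"


def tainted_fsencode(path: str) -> bytes:
    """Encode like os.fsencode, then tag the result with the marker."""
    return TAINT + os.fsencode(path)


def check_bytes_path(path: bytes) -> bytes:
    """Reject bytes that were not produced by tainted_fsencode; otherwise
    strip the marker and return the real encoded path."""
    if not path.startswith(TAINT):
        raise ValueError(
            f"bytes path {path!r} was not produced by the file system encoding"
        )
    return path[len(TAINT):]


good = tainted_fsencode("notes.txt")
print(check_bytes_path(good))  # b'notes.txt'
try:
    check_bytes_path("notes.txt".encode("latin-1"))
except ValueError as exc:
    print("rejected:", exc)
```

Bytes built with a hard-coded codec, or read raw from somewhere else, fail loudly at the os boundary instead of silently naming the wrong file.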

Some of my other searching revealed occasional correct use of 
sys.getfilesystemencoding(), a decent number of uses as a fallback when 
other encodings are not available, and it's very hard to search for code 
that passes bytes to the os module without checking that they use the 
right encoding. This is why I argue that the beta period is the best 
opportunity to check, and why we're better to flip the switch now and 
flip it back if it all goes horribly wrong - the alternative is a *very* 
labour-intensive exercise that I doubt we can muster.
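
Why such code is hard to find: a mismatch between the codec that produced the bytes and the file system encoding often does not raise, it just yields a different filename. This is a minimal sketch assuming a UTF-8 file system encoding and a hypothetical hard-coded latin-1 fallback. The decode step uses errors="replace" for illustration, and the real error handler varies by platform and Python version.

```python
# Sketch: bytes encoded with a fallback codec are misread when decoded with
# the file system encoding.
# Assumptions: the file system encoding is UTF-8; latin-1 plays the
# hypothetical "fallback" encoding that some code hard-codes.

FS_ENCODING = "utf-8"          # assumed file system encoding
FALLBACK_ENCODING = "latin-1"  # hypothetical hard-coded fallback


def reinterpret(path: str, produced_with: str) -> str:
    """Encode a path with one codec, then decode it as the OS layer might."""
    raw = path.encode(produced_with)
    return raw.decode(FS_ENCODING, errors="replace")


name = "caf\u00e9.txt"
print(ascii(reinterpret(name, FS_ENCODING)))        # 'caf\xe9.txt'
print(ascii(reinterpret(name, FALLBACK_ENCODING)))  # 'caf\ufffd.txt'
```

Pure-ASCII names survive either way, which is exactly why this class of bug tends to hide until a user with a non-ASCII filename hits it.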

