Zooko O'Whielacronx wrote:
> [snip...]
> Would it be possible for Python unicode objects to have a flag
> indicating whether the 'python-escape' error handler was present? That
> would serve the same purpose as my "failed_decode" flag above, and would
> basically allow me to use the Python APIs directly and make all this
> work-around code disappear.
>
> Failing that, I can't see any way to use the os.listdir() in its
> unicode-oriented mode to satisfy Tahoe's requirements.
>
> If you take the above code and then add the fact that you want to use
> the failed_decode flag when *encoding* the d argument to os.listdir(),
> then you get this code: [2].
>
> Oh, I just realized that I *could* use the PEP 383 os.listdir(), like
> this:
>
> def listdir(d):
>     fse = sys.getfilesystemencoding()
>     if fse == 'utf-8b':
>         fse = 'utf-8'
>     ns = []
>     for fn in os.listdir(d):
>         bytes = fn.encode(fse, 'python-escape')
>         try:
>             ns.append(FName(bytes.decode(fse, 'strict')))
>         except UnicodeDecodeError:
>             ns.append(FName(fn.decode('utf-8', 'python-escape'),
>                             failed_decode=True))
>     return ns
>
> (And I guess I could define listdir() like this only on the
> non-unicode-safe platforms, as above.)
>
> However, that strikes me as even more horrible than the previous
> "listdir()" work-around, in part because it means decoding, re-encoding,
> and re-decoding every name, so I think I would stick with the previous
> version.
>

The current unicode mode would skip the filenames you are interested in
(those that fail to decode correctly) - so you would have been forced to
use the bytes mode. If you need access to the original bytes then you
should continue to do this. PEP-383 is entirely neutral for your use case
as far as I can see.

Michael

> Oh, one more note: for Tahoe's purposes you can, in all of the code
> above, replace ".decode('utf-8', 'python-replace')" with
> ".decode('windows-1252')" and it works just as well. While UTF-8b seems
> like a really cool hack, and it would produce more legible results if
> utf-8-encoded strings were partially corrupted, I guess I should just
> use 'windows-1252' which is already implemented in Python 2 (as well as
> in all other software in the world).
>
> I guess this means that PEP 383, which I have approved of and liked so
> far in this discussion, would actually not help Tahoe at all and would
> in fact harm Tahoe -- I would have to remember to detect and work-around
> the automatic 'utf-8b' filesystem encoding when porting Tahoe to Python
> 3.
>
> If anyone else has a concrete, real use case which would be helped by
> PEP 383, I would like to hear about it. Perhaps Tahoe can learn
> something from it.
>
> Oh, if this PEP could be extended to add a flag to each unicode object
> indicating whether it was created with the python-escape handler or not,
> then it would be useful to me.
>
> Regards,
>
> Zooko
>
> [1] http://mail.python.org/pipermail/python-dev/2009-April/089020.html
> [2] http://allmydata.org/trac/tahoe/attachment/ticket/534/fsencode.3.py
> _______________________________________________
> Python-Dev mailing list
> Python-Dev at python.org
> http://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe: http://mail.python.org/mailman/options/python-dev/fuzzyman%40voidspace.org.uk
>

--
http://www.ironpythoninaction.com/
http://www.voidspace.org.uk/blog
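
For illustration only, here is a rough sketch of the bytes-mode approach the reply recommends: list the directory as bytes, try a strict decode, and keep the original bytes for any name that fails. It assumes Python 3.2 or later (for os.fsencode). The names classify_name and listdir_with_flags are hypothetical, and the (name, failed_decode) tuples stand in for Tahoe's FName objects, which are not shown in this thread.

```python
import os
import sys


def classify_name(raw, encoding):
    """Return (name, failed_decode) for one raw bytes filename.

    Hypothetical helper: a cleanly decodable name comes back as str
    with failed_decode=False; otherwise the original bytes are kept
    unchanged and failed_decode=True.
    """
    try:
        return (raw.decode(encoding, "strict"), False)
    except UnicodeDecodeError:
        return (raw, True)


def listdir_with_flags(d):
    """List directory d in bytes mode and flag undecodable names."""
    encoding = sys.getfilesystemencoding()
    # Passing a bytes path makes os.listdir() return bytes names,
    # so no name is altered or dropped before we inspect it.
    return [classify_name(raw, encoding) for raw in os.listdir(os.fsencode(d))]
```

Because each name is decoded at most once, this avoids the decode, re-encode, re-decode round trip that the quoted listdir() performs.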