M.-A. Lemburg wrote:
>> On Sun, Dec 7, 2008 at 3:53 PM, Terry Reedy <tjreedy at udel.edu> wrote:
>>> try:
>>>     files = os.listdir(somedir, errors = strict)
>>> except OSError as e:
>>>     log(<verbose error message that includes somedir and e>)
>>>     files = os.listdir(somedir)

> If that error parameter is the same as in unicode(value, errors),
> then this would be a useful feature:

Except that unicode becomes str in 3.0, that is exactly my intention.

> People could then choose among the already existing error handlers
> ('strict', 'ignore', 'replace', 'xmlcharrefreplace') or register
> their own ones via the codecs module.

These could be passed through from listdir or getenv to str.

[Side questions:
1. 'xmlcharrefreplace' is not in the 3.0 LibRef doc or doc string.
Should it be, or is 'xmlcharrefreplace' an addition for a later version?
2. A garbage value for errors (such as 'blah') is silently ignored (so I
cannot test the above). Intended or a bug?]

Someone else proposed a new option 'warn', which Guido has accepted as
the default in place of the current 'ignore'. It could not be passed
through (unless str were changed or something registered). I believe the
implementation of that would be to call str with 'strict' but catch
errors and warn instead. Whether there should be one warning for each
problematic bytes value encountered or one for each listdir (or
whatever) call, possibly with the number of problems, I leave to others
to decide.

> Such application specific error handlers could then also apply
> whatever fancy round-trip safe encoding of non-decodable bytes
> to Unicode escapes, private code points, etc. as seen fit by the
> application.
>
> Perhaps we should also add an ''encoding'' parameter that can be
> set on a per directory basis (if necessary) and defaults to the
> global file system encoding.

That could also be passed through, but I will let others make the
argument for it.

>
> If an application hits directory that is known to cause problems,
> it could then chose to receive the file names in a different,
> more suitable encoding. This allows implementing fallback
> mechanisms with a list of common encodings for a locale.

Terry Jan Reedy
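
For illustration only, here is a rough sketch of the three ideas discussed above, written against byte strings rather than against listdir itself, since the errors and encoding parameters on listdir are only a proposal in this thread. All helper names (decode_names, decode_names_warn, decode_with_fallbacks) and the handler name 'private_use_escape' are made up for the example, and the U+F700 offset is an arbitrary choice. Only codecs.register_error, bytes.decode and warnings.warn are existing library calls.

```python
import codecs
import warnings


def private_use_escape(exc):
    """Hypothetical handler: map each undecodable byte to a private-use
    code point (U+F700 + byte value) so it could be mapped back later."""
    if not isinstance(exc, UnicodeDecodeError):
        raise exc
    bad = exc.object[exc.start:exc.end]
    # Return the replacement text and the position at which to resume.
    return "".join(chr(0xF700 + b) for b in bad), exc.end


# Register the handler under a made-up name via the codecs module.
codecs.register_error("private_use_escape", private_use_escape)


def decode_names(raw_names, encoding="utf-8", errors="strict"):
    """Pass encoding and errors straight through to bytes.decode."""
    return [name.decode(encoding, errors) for name in raw_names]


def decode_names_warn(raw_names, encoding="utf-8"):
    """Sketch of a 'warn' policy: decode strictly, skip failures, and
    issue a single warning per call that carries the problem count."""
    decoded = []
    skipped = 0
    for name in raw_names:
        try:
            decoded.append(name.decode(encoding, "strict"))
        except UnicodeDecodeError:
            skipped += 1
    if skipped:
        warnings.warn(
            "%d name(s) could not be decoded as %s" % (skipped, encoding)
        )
    return decoded


def decode_with_fallbacks(raw_name, encodings=("utf-8", "latin-1")):
    """Try each candidate encoding in order; return (text, encoding)."""
    for enc in encodings:
        try:
            return raw_name.decode(enc), enc
        except UnicodeDecodeError:
            continue
    raise ValueError("none of %r could decode %r" % (encodings, raw_name))
```

In current Python 3, os.listdir returns bytes names when given a bytes path, so something like decode_names(os.listdir(b"."), errors="private_use_escape") would exercise the pass-through. Warning once per call rather than once per name is just one of the two options left open above.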