I explored the possibility of detecting Unicode arguments to open and using _wfopen on Windows NT. This led to trying to store Unicode strings in the f_name and f_mode fields of the file object, which started to escalate in complexity and made Mark's mbcs choice more understandable.

Another approach is to use utf-8 as the Py_FileSystemDefaultEncoding and then convert to and from utf-8 in each file system access function. The core file open function from fileobject.c, changed to work with utf-8, is at the end of this message, with the important lines in the #ifdef MS_WIN32 section. Along with that change goes a change of Py_FileSystemDefaultEncoding to "utf-8" rather than "mbcs".

This change works for me on Windows 2000 and allows access to all files no matter what the current code page is set to. On Windows 9x (not yet tested), the _wfopen call should fail, causing a fallback to fopen. Possibly the OS should be detected instead and _wfopen not attempted on 9x. On 9x, mbcs may be a better choice of encoding, although it may also be possible to ask the file system to find the wide character file name and return the mangled short name, which can then be used by fopen.

The best approach, to me, seems to be to make Py_FileSystemDefaultEncoding settable by the user, at least allowing the choice between 'utf-8' and 'mbcs', with a default of 'utf-8' on NT and 'mbcs' on 9x.

This approach can be extended to other file system calls. For example, os.listdir and glob.glob could, upon detecting a utf-8 default encoding, use wide character system calls and convert the results to utf-8.

Please criticise any stylistic or correctness issues in the code, as it is my first modification to the Python sources.
Neil

static PyObject *
open_the_file(PyFileObject *f, char *name, char *mode)
{
    assert(f != NULL);
    assert(PyFile_Check(f));
    assert(name != NULL);
    assert(mode != NULL);
    assert(f->f_fp == NULL);

    /* rexec.py can't stop a user from getting the file() constructor --
       all they have to do is get *any* file object f, and then do
       type(f).  Here we prevent them from doing damage with it. */
    if (PyEval_GetRestricted()) {
        PyErr_SetString(PyExc_IOError,
            "file() constructor not accessible in restricted mode");
        return NULL;
    }
    errno = 0;
#ifdef HAVE_FOPENRF
    if (*mode == '*') {
        FILE *fopenRF();
        f->f_fp = fopenRF(name, mode+1);
    }
    else
#endif
    {
        Py_BEGIN_ALLOW_THREADS
#ifdef MS_WIN32
        if (strcmp(Py_FileSystemDefaultEncoding, "utf-8") == 0) {
            PyObject *wname;
            PyObject *wmode;
            wname = PyUnicode_DecodeUTF8(name, strlen(name), "strict");
            wmode = PyUnicode_DecodeUTF8(mode, strlen(mode), "strict");
            if (wname && wmode) {
                f->f_fp = _wfopen(PyUnicode_AS_UNICODE(wname),
                                  PyUnicode_AS_UNICODE(wmode));
            }
            Py_XDECREF(wname);
            Py_XDECREF(wmode);
        }
        if (NULL == f->f_fp) {
            f->f_fp = fopen(name, mode);
        }
#else
        f->f_fp = fopen(name, mode);
#endif
        Py_END_ALLOW_THREADS
    }
    if (f->f_fp == NULL) {
#ifdef NO_FOPEN_ERRNO
        /* Metroworks only, wich does not always sets errno */
        if (errno == 0) {
            PyObject *v;
            v = Py_BuildValue("(is)", 0, "Cannot open file");
            if (v != NULL) {
                PyErr_SetObject(PyExc_IOError, v);
                Py_DECREF(v);
            }
            return NULL;
        }
#endif
        if (errno == EINVAL)
            PyErr_Format(PyExc_IOError, "invalid argument: %s", mode);
        else
            PyErr_SetFromErrnoWithFilename(PyExc_IOError, name);
        f = NULL;
    }
    return (PyObject *)f;
}
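
For experimenting with the utf-8 to wide character step without building Python, here is a minimal standalone sketch. The helpers utf8_to_utf16 and open_utf8 are hypothetical names, not part of the Python sources, and the hand-written decoder only stands in for PyUnicode_DecodeUTF8. It assumes wchar_t is 16 bits wide on Windows, and the fixed buffer sizes are arbitrary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#ifdef _WIN32
#include <wchar.h>
#endif

/* Hypothetical helper, not from the Python sources: decode a
   NUL-terminated UTF-8 string into UTF-16 code units.  Returns the
   number of units written (excluding the terminator), or -1 if the
   input is malformed or does not fit in cap units. */
static long
utf8_to_utf16(const char *src, unsigned short *dst, size_t cap)
{
    /* Smallest code point allowed for each sequence length;
       anything below it is an overlong encoding. */
    static const unsigned long min_cp[4] = {0, 0x80, 0x800, 0x10000};
    const unsigned char *s = (const unsigned char *)src;
    size_t n = 0;

    assert(src != NULL && dst != NULL);
    while (*s) {
        unsigned long cp;
        int extra, i;

        if (*s < 0x80)                { cp = *s;        extra = 0; }
        else if ((*s & 0xE0) == 0xC0) { cp = *s & 0x1F; extra = 1; }
        else if ((*s & 0xF0) == 0xE0) { cp = *s & 0x0F; extra = 2; }
        else if ((*s & 0xF8) == 0xF0) { cp = *s & 0x07; extra = 3; }
        else
            return -1;          /* invalid lead byte */
        s++;
        for (i = 0; i < extra; i++, s++) {
            if ((*s & 0xC0) != 0x80)
                return -1;      /* truncated or malformed sequence */
            cp = (cp << 6) | (*s & 0x3F);
        }
        if (cp < min_cp[extra] || cp > 0x10FFFF ||
            (cp >= 0xD800 && cp <= 0xDFFF))
            return -1;          /* overlong, out of range, or surrogate */
        if (cp >= 0x10000) {
            /* Needs a surrogate pair plus room for the terminator. */
            if (n + 2 >= cap)
                return -1;
            cp -= 0x10000;
            dst[n++] = (unsigned short)(0xD800 | (cp >> 10));
            dst[n++] = (unsigned short)(0xDC00 | (cp & 0x3FF));
        }
        else {
            if (n + 1 >= cap)
                return -1;
            dst[n++] = (unsigned short)cp;
        }
    }
    if (n >= cap)
        return -1;              /* no room for the terminator */
    dst[n] = 0;
    return (long)n;
}

/* Hypothetical helper: try the wide character open first on Windows,
   then fall back to plain fopen, in the same order as the patch. */
static FILE *
open_utf8(const char *name, const char *mode)
{
    FILE *fp = NULL;
#ifdef _WIN32
    unsigned short wname[1024];     /* arbitrary sizes for the sketch */
    unsigned short wmode[16];

    if (utf8_to_utf16(name, wname, 1024) >= 0 &&
        utf8_to_utf16(mode, wmode, 16) >= 0)
        fp = _wfopen((const wchar_t *)wname, (const wchar_t *)wmode);
#endif
    if (fp == NULL)
        fp = fopen(name, mode);
    return fp;
}
```

Calling open_utf8("caf\xC3\xA9.txt", "r") would try _wfopen first on Windows and fall back to fopen otherwise. On other platforms only the fopen path is compiled, so the decoder can still be exercised on its own.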