
Showing content from https://mail.python.org/pipermail/python-dev/2007-June/073602.html below:

[Python-Dev] zipfile and unicode filenames

Alexey Borzenkov snaury at gmail.com
Sun Jun 10 22:26:33 CEST 2007
On 6/10/07, "Martin v. Löwis" <martin at v.loewis.de> wrote:
> > So the general idea is that at least directory filename has some sort
> > of convention of using oem (dos, console) encoding on Windows, cp866
> > in my case. Header filenames have different encodings, and seem to be
> > ignored.
> Ok, then this is what the zipfile module should implement.

But this is only on Windows! I have no clue what the common
situation is on other OSes, and I don't even know how to sanely get the
OEM codepage on Windows (the obvious way, calling
ctypes.windll.kernel32.GetOEMCP(), doesn't seem good to me).
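
For reference, the ctypes route mentioned above might look roughly like the sketch below. It is written for present-day Python 3, not the 2007 code base. The helper name `get_oem_encoding` is hypothetical, and mapping the numeric code page to a `cpNNN` codec name is an assumption that will not hold for every code page.

```python
import sys


def get_oem_encoding():
    """Return the OEM code page as a 'cpNNN' name, or None when not on Windows.

    Hypothetical helper: the 'cpNNN' naming is an assumption and may not
    match a codec that Python actually ships for every code page.
    """
    if sys.platform != "win32":
        # GetOEMCP is a Win32 API; other platforms have no such notion.
        return None

    import ctypes  # imported lazily, only needed on Windows

    # GetOEMCP() takes no arguments and returns the code page number (e.g. 866).
    return "cp%d" % ctypes.windll.kernel32.GetOEMCP()


oem = get_oem_encoding()
print(oem)
```

This only answers the Windows half of the question; the sketch deliberately returns None elsewhere because the thread does not establish any convention for other systems.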

So I guess that's a bad idea anyway; maybe conforming to the language
encoding bit (bit 11, 0x800) is better (ASCII will stay ASCII anyway).
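
The "ASCII will stay ASCII" remark can be checked with a small sketch in present-day Python 3. The sample names are illustrative and not taken from any real archive. It shows that a pure-ASCII name yields the same bytes under UTF-8, cp437 and cp866, while a Cyrillic name does not, so only non-ASCII names depend on the flag.

```python
ascii_name = "foo.txt"
cyrillic_name = "тест.txt"  # illustrative sample name

# All three encodings agree on the 0x00-0x7F range, so ASCII names are unaffected.
ascii_same = (
    ascii_name.encode("utf-8")
    == ascii_name.encode("cp437")
    == ascii_name.encode("cp866")
)

# For a Cyrillic name, the UTF-8 bytes and the OEM (cp866) bytes differ.
utf8_bytes = cyrillic_name.encode("utf-8")
oem_bytes = cyrillic_name.encode("cp866")
cyrillic_differs = utf8_bytes != oem_bytes

# A reader that ignores the encoding and assumes cp437 gets a different string.
misread = oem_bytes.decode("cp437")

print(ascii_same, cyrillic_differs, misread != cyrillic_name)
```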

What about this?

Index: Lib/zipfile.py
===================================================================
--- Lib/zipfile.py	(revision 55850)
+++ Lib/zipfile.py	(working copy)
@@ -252,6 +252,7 @@
             self.extract_version = max(45, self.extract_version)
             self.create_version = max(45, self.extract_version)

+        self._encodeFilename()
         header = struct.pack(structFileHeader, stringFileHeader,
                  self.extract_version, self.reserved, self.flag_bits,
                  self.compress_type, dostime, dosdate, CRC,
@@ -259,6 +260,16 @@
                  len(self.filename), len(extra))
         return header + self.filename + extra

+    def _encodeFilename(self):
+        if isinstance(self.filename, unicode):
+            self.filename = self.filename.encode('utf-8')
+            self.flag_bits = self.flag_bits | 0x800
+
+    def _decodeFilename(self):
+        if self.flag_bits & 0x800:
+            self.filename = self.filename.decode('utf-8')
+            self.flag_bits = self.flag_bits & ~0x800
+
     def _decodeExtra(self):
         # Try to decode the extra field.
         extra = self.extra
@@ -683,6 +694,7 @@
                                      t>>11, (t>>5)&0x3F, (t&0x1F) * 2 )

             x._decodeExtra()
+            x._decodeFilename()
             x.header_offset = x.header_offset + concat
             self.filelist.append(x)
             self.NameToInfo[x.filename] = x
@@ -967,6 +979,7 @@
                     extract_version = zinfo.extract_version
                     create_version = zinfo.create_version

+                zinfo._encodeFilename()
                 centdir = struct.pack(structCentralDir,
                   stringCentralDir, create_version,
                   zinfo.create_system, extract_version, zinfo.reserved,
Index: Lib/test/test_zipfile.py
===================================================================
--- Lib/test/test_zipfile.py	(revision 55850)
+++ Lib/test/test_zipfile.py	(working copy)
@@ -515,6 +515,11 @@
         # and report that the first file in the archive was corrupt.
         self.assertRaises(RuntimeError, zipf.testzip)

+    def testUnicodeFilenames(self):
+        zf = zipfile.ZipFile(TESTFN, "w")
+        zf.writestr(u"foo.txt", "Test for unicode filename")
+        zf.close()
+
     def tearDown(self):
         support.unlink(TESTFN)
         support.unlink(TESTFN2)

The problem is that I don't know whether anything actually supports bit 11
at this time, and I can't even tell whether I did this correctly. :(
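
One way to sanity-check the flag handling is a write-then-read round trip. The sketch below runs against the present-day Python 3 zipfile module, which is an assumption about the reader's environment and is not the patch above. The names `UTF8_FLAG` and `roundtrip` are hypothetical. It writes one member to an in-memory archive, reopens it, and reports the decoded name together with whether bit 11 is set in the central directory entry.

```python
import io
import zipfile

UTF8_FLAG = 0x800  # bit 11 of the general purpose bit flag


def roundtrip(name):
    """Write one member called `name`, reopen the archive, and return
    (decoded filename, whether bit 11 is set on the stored entry)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr(name, "Test for unicode filename")

    buf.seek(0)
    with zipfile.ZipFile(buf) as zf:
        info = zf.infolist()[0]
        return info.filename, bool(info.flag_bits & UTF8_FLAG)


print(roundtrip("foo.txt"))
print(roundtrip("тест.txt"))
```

A round trip within one library only shows that writer and reader agree with each other; it says nothing about whether other zip tools honour bit 11, which is the open question in the message.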
