On Fri, 2003-04-11 at 15:54, "Martin v. Löwis" wrote:
> Barry Warsaw wrote:
>
> > - Set the default charset to iso-8859-1. It used to be None, which
> > would cause problems with .ugettext() if the file had no charset
> > parameter. Arguably, the po/mo file would be broken, but I still think
> > iso-8859-1 is a reasonable default.
>
> I'm -1 here. Why do you think it is a reasonable default?
>
> Errors should never pass silently.
> Unless explicitly silenced.
>
> While iso-8859-1 might be a reasonable default in other application
> domains, in the context of non-English text (which it typically is),
> assuming Latin-1 is bound to create mojibake.

Okay, never mind, I'll back this one out. The problem was caused by my
other patch to unicode-ify on read (see below) without first having a
charset. I have a different fix for this.

> > - Add a "coerce" default argument to GNUTranslations's constructor. The
> > reason for this is that in Zope, we want all msgids and msgstrs to be
> > Unicode. For the latter, we could use .ugettext() but there isn't
> > currently a mechanism for Unicode-ifying msgids.
>
> Could you please explain in what context this is needed? msgids are
> ASCII, and you can pass a Unicode string to ugettext just fine.

In Zope, all strings are Unicode and the catalog may include messages
that are extracted from places other than Python source code, e.g.
XML-based files. Message ids can contain non-ASCII characters if they
are written by a non-English coder. In that case, I think we'd want to
encode the strings in the .po/.mo files, possibly with utf-8, but have
them decoded in time to look them up as Unicode strings in the catalog.

Similarly, what happens if a non-English coder writes an i18n'd Python
module with native strings, possibly using a Python 2.3 coding cookie?
We'd want their message ids to be extracted into the .po/.mo files,
right?
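To illustrate the trade-off in the default-charset discussion, here is a minimal sketch in modern Python 3, where `str` plays the role of 2.x `unicode`. `coerce_entry` is a hypothetical helper, not part of `gettext`; it stands in for the proposed `coerce` behavior of decoding both the msgid and the msgstr with the declared charset. It shows that the right charset yields a usable Unicode key, that a silent Latin-1 default produces mojibake without any error, and that a strict ASCII default raises instead.

```python
# UTF-8 bytes as they would sit in a .mo file declaring charset=utf-8.
raw_msgid = "ab\u00de".encode("utf-8")    # b"ab\xc3\x9e"
raw_msgstr = "\u00a4yz".encode("utf-8")   # b"\xc2\xa4yz"


def coerce_entry(msgid, msgstr, charset):
    """Hypothetical helper: decode both halves of a catalog entry with the
    declared charset, roughly what the proposed `coerce` option would do."""
    return msgid.decode(charset), msgstr.decode(charset)


# Declared charset: the decoded msgid equals the Unicode key a caller passes.
key, value = coerce_entry(raw_msgid, raw_msgstr, "utf-8")
print(key == "ab\u00de", value)

# Latin-1 as a silent default: decoding never fails, but produces mojibake,
# so a Unicode lookup for the real msgid would simply miss.
bad_key, _ = coerce_entry(raw_msgid, raw_msgstr, "iso-8859-1")
print(bad_key == "ab\u00c3\u009e")

# A strict default (ASCII) surfaces the problem instead of hiding it.
try:
    coerce_entry(raw_msgid, raw_msgstr, "ascii")
except UnicodeDecodeError as exc:
    print("refused:", exc)
```

Failing loudly here is exactly the "errors should never pass silently" argument: the ASCII case points at the missing charset, while the Latin-1 case only shows up later as an untranslated message.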
> > The plan then is that the charset parameter specifies the encoding for
> > both the msgids and msgstrs, and both are decoded to Unicode when read.
> > For example, we might encode po files with utf-8. I think the GNU
> > gettext tools don't care.
>
> They complain loudly if they find bytes > 127 in the msgid.

Really? Ok, I'm still confused because I tried the following example. I
wrote a .po file (charset=utf-8) with the following record:

#: nofile:0
msgid "ab\xc3\x9e"
msgstr "\xc2\xa4yz"

I used standard msgfmt to turn that into a .mo file. Then I created a
GNUTranslations(fp, coerce=True) and called

>>> t.ugettext(u'ab\xde')
u'\xa4yz'

This is what I should expect, right? ;)

> > - A few other minor changes from the Zope project, including asserting
> > that a zero-length msgid must have a Project-ID-Version header for it to
> > be counted as the metadata record.
>
> That test was there, and removed on request of Bruno Haible, the GNU
> gettext maintainer, as he points out that Project-ID-Version is not
> mandatory for the metadata (see Patch #700839).

Ah, I read the diff backwards in this case. I'll back this one out too.

-Barry
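The utf-8 msgid experiment above can be approximated in modern Python 3, whose `gettext.GNUTranslations` decodes both msgids and msgstrs with the charset named in the catalog's metadata entry (the `coerce` flag from the patch is not a standard-library parameter). This is a minimal sketch, assuming a hand-built .mo image in place of msgfmt output; `build_mo` is a hypothetical helper written only for illustration, emitting a little-endian, revision-0 file with no hash table.

```python
import gettext
import io
import struct

MO_MAGIC = 0x950412DE  # GNU .mo magic number


def build_mo(messages):
    """Hypothetical helper: serialize {msgid_bytes: msgstr_bytes} into a
    minimal GNU .mo image (revision 0, no hash table)."""
    keys = sorted(messages)               # b"" (the metadata entry) sorts first
    n = len(keys)
    orig_table = 7 * 4                    # header is seven 32-bit words
    trans_table = orig_table + 8 * n      # each table entry: (length, offset)
    data_start = trans_table + 8 * n

    originals, translations, data = [], [], b""
    for k in keys:                        # all msgids first, NUL-terminated
        originals.append(struct.pack("<2I", len(k), data_start + len(data)))
        data += k + b"\0"
    for k in keys:                        # then all msgstrs, NUL-terminated
        v = messages[k]
        translations.append(struct.pack("<2I", len(v), data_start + len(data)))
        data += v + b"\0"

    header = struct.pack("<7I", MO_MAGIC, 0, n, orig_table, trans_table, 0, 0)
    return header + b"".join(originals) + b"".join(translations) + data


# The record from the message, plus a metadata entry declaring the charset.
mo_bytes = build_mo({
    b"": b"Content-Type: text/plain; charset=utf-8\n",
    "ab\u00de".encode("utf-8"): "\u00a4yz".encode("utf-8"),
})

t = gettext.GNUTranslations(io.BytesIO(mo_bytes))
print(t.charset())            # utf-8
print(t.gettext("ab\u00de"))  # the Unicode msgid lookup finds the decoded entry
```

Because both sides of each entry are decoded with the declared charset, the caller's Unicode msgid matches the catalog key directly, which is the behavior the `coerce` proposal was after.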