Showing content from https://mail.python.org/pipermail/python-dev/2011-January.txt below:

References: <1295440442.432.18.camel@marge>
Message-ID:

On Thu, Jan 20, 2011 at 5:16 AM, Nick Coghlan wrote:
> On Thu, Jan 20, 2011 at 10:08 PM, Simon Cross wrote:
>> I'm changing my vote on this to a +1 for two reasons:
>>
>> * Initially I thought this wasn't supported by Python at all but I see
>> that currently it is supported but that support is broken (or at least
>> limited to UTF-8 filesystem encodings). Since support is there, might
>> as well make it better (especially if it tidies up the code base at
>> the same time).
>>
>> * I still don't think it's a good idea to give modules non-ASCII names
>> but the "consenting adults" approach suggests we should let people
>> shoot themselves in the foot if they believe they have good reason to
>> do so.
>
> I'm also +1 on this for the reasons Simon gives.

Same here. *Most* code will never be shared, or will only be shared
between users in the same community. When it goes wrong it's also a
learning opportunity. :-)

> I should have a chance to look at the patch this weekend.

--
--Guido van Rossum (python.org/~guido)

From ateijelo at gmail.com Thu Jan 20 17:45:54 2011
From: ateijelo at gmail.com (Andy Teijelo)
Date: Thu, 20 Jan 2011 11:45:54 -0500
Subject: [Python-Dev] Import and unicode: part two
In-Reply-To: <6E9023A7-7701-4AE4-8B86-696C87A569BA@twistedmatrix.com>
References: <1295440442.432.18.camel@marge>
 <20110119234419.GO22400@unaka.lan>
 <1295483161.12324.10.camel@marge>
 <20110120020725.GQ22400@unaka.lan>
 <1295491865.22752.22.camel@marge>
 <20110120043901.GR22400@unaka.lan>
 <4D37C1D9.7080801@g.nevcal.com>
 <5D5CC0A9-BD43-4496-872C-B70C5FC77490@twistedmatrix.com>
 <4D37C5CC.3040200@g.nevcal.com>
 <6E9023A7-7701-4AE4-8B86-696C87A569BA@twistedmatrix.com>
Message-ID: <4D3866C2.2010007@gmail.com>

(Hi, I'm writing from an address different to the one I'm subscribed with
to the list because I don't have reverse dns in my mail server and
mail.python.org rejects my messages.
I hope that's not much trouble)

Maybe Python should always use an ASCII encodable filename for modules: a
translation of the module name into an ASCII encodable string that,
preferably, was the same as the module name if the module name didn't have
any non-ASCII characters. Like, if the code said:

    import cafe

Python would look for a file named:

    cafe.py

but if the code said:

    import café

then Python would look, in any platform, for a file named:

    caf&#233;.py  or  caf&#xe9;.py  or something nicer.

Something along the lines of xmlcharrefreplace.
Just an idea.

Andy.

On 1/20/11 12:21 a.m., Glyph Lefkowitz wrote:
>
> On Jan 20, 2011, at 12:19 AM, Glenn Linderman wrote:
>
>> Now if the stuff after m_ was the hex UTF-8 of "café", that could get
>> interesting :)
>
> (As it happens, it's the hex digest of the MD5 of the UTF-8 of café... ;-))
>
>
>
> _______________________________________________
> Python-Dev mailing list
> Python-Dev at python.org
> http://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe: http://mail.python.org/mailman/options/python-dev/andy%40lists.teijelo.net

From a.badger at gmail.com Thu Jan 20 18:44:39 2011
From: a.badger at gmail.com (Toshio Kuratomi)
Date: Thu, 20 Jan 2011 09:44:39 -0800
Subject: [Python-Dev] Import and unicode: part two
In-Reply-To: <1295524289.2016.116.camel@marge>
References: <1295440442.432.18.camel@marge>
 <20110119234419.GO22400@unaka.lan>
 <1295483161.12324.10.camel@marge>
 <20110120020725.GQ22400@unaka.lan>
 <1295491865.22752.22.camel@marge>
 <20110120043901.GR22400@unaka.lan>
 <1295524289.2016.116.camel@marge>
Message-ID: <20110120174439.GT22400@unaka.lan>

On Thu, Jan 20, 2011 at 12:51:29PM +0100, Victor Stinner wrote:
> Le mercredi 19 janvier 2011 à
> 20:39 -0800, Toshio Kuratomi a écrit :
> > Teaching students to write non-portable code (relying on filesystem encoding
> > where your solution is, don't upload to pypi anything that has non-ascii
> > filenames) seems like the exact opposite of how you'd want to shape a young
> > student's understanding of good programming practices.
>
> That was already discussed before: see PEP 3131.
> http://www.python.org/dev/peps/pep-3131/#common-objections
>
> If the teacher choose to use non-ASCII, (s)he is responsible to explain
> the consequences to his/her students :-)
>
It's not discussed in that PEP section. The PEP section says this:

    "People claim that they will not be able to use a library if to do so
    they have to use characters they cannot type on their keyboards."

Whether you can type it at your keyboard or not is not the problem here.
The problem is portability. The students and professors are sharing code
with each other. But because of a mixture of operating systems (let alone
locale settings), the code written by one partner is unable to run on the
computer of the other.

If non-ascii filenames without a defined encoding are considered a feature,
python cannot even issue a descriptive error when this occurs. It can only
say that it could not find the module but not why. A restriction on module
names to ascii only could actually state that module names are not allowed
to be non-ASCII when it encounters the import line.

> > > In a school, you can use the same configuration
> > > (encoding) on all computers.
> > >
> > In a school computer lab perhaps. But not on all the students' and
> > professors' machines. How many professors will be cursing python when they
> > discover that the example code that they wrote on their Linux workstation
> > doesn't work when the students try to use it in their windows computer lab?
>
> Because some students use a stupid or misconfigured OS, Python should
> only accept ASCII names?
Just a note -- you'll get much farther if you refrain from calling names.
It just makes me think that you aren't reading and understanding the issue
I'm raising. My examples that you're replying to involve two "properly
configured" OS's. The Linux workstations are configured with a UTF-8
locale. The Windows OS's use wide character unicode. The problem occurs in
that the code that one of the parties develops (either the students or the
professors) is developed on one of those OS's and then used on the other OS.

> So, why do Python 3 support non-ASCII
> filenames: it is very well known that non-ASCII filenames is the root in
> many troubles! Should we simply drop unicode support for all filenames?
> And maybe restrict bytes filenames to bytes in [0; 127]? Or better,
> restrict to [32; 126] (U+007f causes some troubles in some terminals).
>
If you want to argue that because python3 supports non-ascii filenames in
other code, then the logical extension is that the import mechanism should
support importing module names defined by byte sequences. I happen to think
that import has a lot of differences between it and other filenames as I've
said three times now.

> I think that in 2011, non-ASCII filenames are well supported on all
> (modern) operating systems. Issues with non-ASCII filenames are OS
> specific and should be fixed by the user (the admin of the computer).
>
> > Additionally, those other filesystem operations have
> > been growing the ability to take byte values and encoding parameters because
> > unicode translation via a single filesystem encoding is a good default but
> > not a complete solution.
>
> If you are unable to configure correctly your system to decode/encode
> correctly filenames, you should just avoid non-ASCII characters in the
> module names.
>
This seems like an argument to only have unicode versions of all filesystem
operations.
Since you've been spearheading the effort to have bytes versions of things
that access filenames, environment variables, etc, I don't think that you
seriously mean that. Perhaps there is a language issue here.

> You only give theoretical arguments: did you at least try to use non-ASCII
> module names on your system with Python 3.2? I suppose that it will just
> work and you will never notice that the unicode module name (on "import
> café") is encoded to bytes.
>
Yes I did and I got it to fail a corner case as I showed twice with the
same example in other posts. However, I want to make clear here that the
issue is not that I can create a non-ascii filename and then import it.
The issue is that I can create a non-ascii filename and then try to share
it with the usual tools and it won't work on the recipient's system.
(A tangent is whether the recipient's system is physically distinct from
mine or only has a different environment on the same physical host.)

> It fails on OSes using filesystem encodings other than UTF-8 (eg.
> Windows)... because of a Python bug, and I just asked if I have to fix
> this bug (or if we should deny non-ASCII names). If the bug is fixed, it
> will work everywhere.
>
I understand that your patch allows non-ASCII names to work on Windows. My
issue is that non-ASCII names have ramifications beyond just, "works on
Windows" "works on Linux". There's also the question of whether it works
when you transfer modules between OS's.

> > Your solution creates modules which aren't portable
>
> More and more operating systems use a filesystem encoding able to encode
> any Unicode characters. ASCII-only always give you the best portability,
> but I think that today you can start to play with (at least) ISO-8859-1
> characters (café should work on all operating systems). If you don't
> Unicode issues (I personally love them!), just use ASCII everywhere.
>
I'd be happy to agree with your enthusiasm for unicode characters if your
patch included a method to preserve portability between operating systems.

> > One of my proposals creates python code which isn't portable. The other one
> > suffers some of the same disadvantages as your solution in portability but
> > allows for tools that could automatically correct modules.
>
> __import__('café'.encode('UTF-8')) or
> __import__('café'.encode('ISO-8859-1')) is less portable than
> __import__('café').
>
Yep, this method is just as unportable as yours as I said in an analysis in
a previous post. The other method is the one that's more portable but has
painful drawbacks. (Also note that your example above ignores one of the
differences between import and open() that I mentioned in a previous post:
import assigns the module to a name automatically whereas open() [like
__import__()] makes the programmer assign the name)

> > You think that if a module is named appropriately on one system but is not portable to another
> > system, that's fine.
>
> No, I am not saying that.
>
> I say that if your name is broken while you transfer your project from a
> system to another (eg. decompressing an archive creates filenames with
> mojibake in the filenames), you should fix your transfer procedure (eg.
> use another archive format, use a script to fix filenames, or anything
> else), but don't try to handle invalid filenames.
>
So here's a revised summary: A module being able to be imported by the
module author is of primary importance. Portability of modules relies upon
third party tool support. Lacking that support, the modules may not be
portable.

> > Setting system locale to ASCII for use in system-wide scripts
>
> This is stupid :-) Yes, on such system you cannot open *any* non-ASCII
> file with Python 3 (except if you work, as Python 2, on bytes
> filenames).
>
> Python cannot do anything to improve Unicode support on such system:
> only the administrator has to do something for that.
>
Python supports open() with a bytes argument for this reason. import does
not support such a thing (and I think it would be more wrong for import to
do so).

> I know that you can give me many examples of systems where Unicode
> doesn't work because the system is not correctly configured. But my
> opinion is that we should support non-ASCII names because there are
> somewhere "some" systems where Unicode is fully functional :-)
>
Comments like these make me think that you aren't understanding me which
just makes me frustrated with you. OTOH, if you could acknowledge the
points that I'm making and simply disagree with the relative merits of them
then we could simply agree to disagree.

-Toshio
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 198 bytes
Desc: not available
URL:

From alexander.belopolsky at gmail.com Thu Jan 20 19:02:28 2011
From: alexander.belopolsky at gmail.com (Alexander Belopolsky)
Date: Thu, 20 Jan 2011 13:02:28 -0500
Subject: [Python-Dev] Import and unicode: part two
In-Reply-To: <4D3866C2.2010007@gmail.com>
References: <1295440442.432.18.camel@marge>
 <20110119234419.GO22400@unaka.lan>
 <1295483161.12324.10.camel@marge>
 <20110120020725.GQ22400@unaka.lan>
 <1295491865.22752.22.camel@marge>
 <20110120043901.GR22400@unaka.lan>
 <4D37C1D9.7080801@g.nevcal.com>
 <5D5CC0A9-BD43-4496-872C-B70C5FC77490@twistedmatrix.com>
 <4D37C5CC.3040200@g.nevcal.com>
 <6E9023A7-7701-4AE4-8B86-696C87A569BA@twistedmatrix.com>
 <4D3866C2.2010007@gmail.com>
Message-ID:

On Thu, Jan 20, 2011 at 11:45 AM, Andy Teijelo wrote:
..
> but if the code said:
>
> import café
>
> then Python would look, in any platform, for a file named:
>
> caf&#233;.py  or  caf&#xe9;.py  or something nicer.
>
> Something along the lines of xmlcharrefreplace.
> Just an idea.
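
For illustration only, here is a minimal sketch of the translation Andy
describes. The helper name ascii_module_filename is hypothetical: it is not
part of Python, and the import system does not do this. The only real API
used is the "xmlcharrefreplace" error handler of str.encode().

```python
def ascii_module_filename(module_name: str) -> str:
    """Hypothetical helper: map a module name to an ASCII-only file name.

    Non-ASCII characters are replaced with XML numeric character
    references; pure-ASCII names pass through unchanged.
    """
    encoded = module_name.encode("ascii", "xmlcharrefreplace")
    return encoded.decode("ascii") + ".py"


print(ascii_module_filename("cafe"))       # cafe.py
print(ascii_module_filename("caf\u00e9"))  # caf&#233;.py
```

The resulting name depends only on the module name, not on the locale or
filesystem encoding, which is the property the proposal is after.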
Curiously, something like this already happens on OSX when filename is
not valid UTF-8. For example,

>>> open(b'\xdb\xcd', 'w').close()
>>> open(b'\xdb\xcd')
<_io.TextIOWrapper name=b'\xdb\xcd' mode='r' encoding='UTF-8'>

but the actual file created is named "%DB%CD". (Looks like URL-encoding).

From alexander.belopolsky at gmail.com Thu Jan 20 19:43:03 2011
From: alexander.belopolsky at gmail.com (Alexander Belopolsky)
Date: Thu, 20 Jan 2011 13:43:03 -0500
Subject: [Python-Dev] Import and unicode: part two
In-Reply-To: <20110120174439.GT22400@unaka.lan>
References: <1295440442.432.18.camel@marge>
 <20110119234419.GO22400@unaka.lan>
 <1295483161.12324.10.camel@marge>
 <20110120020725.GQ22400@unaka.lan>
 <1295491865.22752.22.camel@marge>
 <20110120043901.GR22400@unaka.lan>
 <1295524289.2016.116.camel@marge>
 <20110120174439.GT22400@unaka.lan>
Message-ID:

On Thu, Jan 20, 2011 at 12:44 PM, Toshio Kuratomi wrote:
> .. My examples that you're replying to involve two "properly
> configured" OS's. The Linux workstations are configured with a UTF-8
> locale. The Windows OS's use wide character unicode. The problem occurs in
> that the code that one of the parties develops (either the students or the
> professors) is developed on one of those OS's and then used on the other OS.
>

I re-read your posts on this thread, but could not find the examples
that you refer to. ISTM, your hypothetical students should have no
problem as long as their professor uses proper tools to package her
code. For example, if she uses a recent version of zip that supports
the Info-ZIP Unicode Comment Extra Field (see
http://www.pkware.com/documents/casestudies/APPNOTE.TXT) and students
use similarly up to date unzip tool, the shared code should work as
expected. Similarly, I would be surprised if Samba server would not
be able to present a shared Linux partition that uses UTF-8 encoding
to a Windows client in a way that will make wopen() work as expected.

The problem with current Python import mechanism is that it does not
use wopen() on Windows and instead, attempts to encode Unicode module
name into a mythical single-byte filesystem encoding (locale ANSI code
page?) and calls byte-oriented open(char *) on the result.

From brett at python.org Thu Jan 20 19:42:19 2011
From: brett at python.org (Brett Cannon)
Date: Thu, 20 Jan 2011 10:42:19 -0800
Subject: [Python-Dev] [Python-checkins] devguide: Short doc about where to
 get tech help related to developing Python.
In-Reply-To:
References:
Message-ID:

On Wed, Jan 19, 2011 at 15:21, Sandro Tosi wrote:
> Hi,
>
> On Wed, Jan 19, 2011 at 23:19, brett.cannon wrote:
>> +Where to Get Help
>> +=================
>> +If you are working on Python it is very possible you will come across an issue
>> +where you need some assistance in solving (this happens to core developers all
>> +the time). You have a couple of options depending on what kind of help you need.
>> +If the question involves process or tool usage then please check the developer's
>> +guide first as is should answer your question.
>
> as it should
>
>> +Filing a Bug
>> +------------
>> +If you come across an odd error message that seems like a bug, then file a bug
>> +on the `issue tracker`_. In the bug you can explain that you are not sure why
>> +the error is coming up or that the exact nature of the problem is. Someone will
>
> ...or what the exact...?
>
>> +Asking a Technical Question
>> +---------------------------
>> +You have two avenues of communication out of the :ref:`myriad of options
>> +available `. If you are comfortable with IRC you can try asking
>> +in #python-dev. Typically there are a couple of experienced developers, ranging
>> +from triagers to core developers, who can ask questions about developing for
>
> who can answer questions

They can ask as well. =) Anyway, all changes coming in the next push.
>
> Cheers,
> --
> Sandro Tosi (aka morph, morpheus, matrixhasu)
> My website: http://matrixhasu.altervista.org/
> Me at Debian: http://wiki.debian.org/SandroTosi
> _______________________________________________
> Python-Dev mailing list
> Python-Dev at python.org
> http://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe: http://mail.python.org/mailman/options/python-dev/brett%40python.org
>

From brett at python.org Thu Jan 20 19:43:37 2011
From: brett at python.org (Brett Cannon)
Date: Thu, 20 Jan 2011 10:43:37 -0800
Subject: [Python-Dev] [Python-checkins] devguide: Move Misc/maintainers.rst
 here and rename to experts.rst.
In-Reply-To:
References:
Message-ID:

It's just a bit wordy. I simplified it.

On Thu, Jan 20, 2011 at 01:22, Sandro Tosi wrote:
> Hi,
>
> On Thu, Jan 20, 2011 at 04:56, brett.cannon wrote:
>> +Unless a name is followed by a '*', you should never assign an issue to
>> +that person, only make them nosy.  Names followed by a '*' may be assigned
>> +issues involving the module or topic for which the name has a '*'.
>
> isn't last sentence a bit weird? I'm not native but "Names followed by
> a '*' may issues assigned for the modules...." be a bit better?
> ok, fairly minor you can also ignore it :)
>
> Cheers,
> --
> Sandro Tosi (aka morph, morpheus, matrixhasu)
> My website: http://matrixhasu.altervista.org/
> Me at Debian: http://wiki.debian.org/SandroTosi
> _______________________________________________
> Python-Dev mailing list
> Python-Dev at python.org
> http://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe: http://mail.python.org/mailman/options/python-dev/brett%40python.org
>

From a.badger at gmail.com Thu Jan 20 20:27:43 2011
From: a.badger at gmail.com (Toshio Kuratomi)
Date: Thu, 20 Jan 2011 11:27:43 -0800
Subject: [Python-Dev] Import and unicode: part two
In-Reply-To:
References: <20110119234419.GO22400@unaka.lan>
 <1295483161.12324.10.camel@marge>
 <20110120020725.GQ22400@unaka.lan>
 <1295491865.22752.22.camel@marge>
 <20110120043901.GR22400@unaka.lan>
 <1295524289.2016.116.camel@marge>
 <20110120174439.GT22400@unaka.lan>
Message-ID: <20110120192743.GU22400@unaka.lan>

On Thu, Jan 20, 2011 at 01:43:03PM -0500, Alexander Belopolsky wrote:
> On Thu, Jan 20, 2011 at 12:44 PM, Toshio Kuratomi wrote:
> > .. My examples that you're replying to involve two "properly
> > configured" OS's. The Linux workstations are configured with a UTF-8
> > locale. The Windows OS's use wide character unicode. The problem occurs in
> > that the code that one of the parties develops (either the students or the
> > professors) is developed on one of those OS's and then used on the other OS.
> >
>
> I re-read your posts on this thread, but could not find the examples
> that you refer to.
>
Examples might be a bad word in this context. Victor was commenting on the
two brainstorm ideas for alternatives to ascii-only that I had.

One was:

* Mandate that every python module on a platform has a specific encoding
  (rather than the value of the locale)

The other was:

* allow using byte strings for import

I think that both ideas are inferior to mandating that every python module
filename is ascii.
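
To make the ASCII-only option concrete, here is a minimal sketch of the
kind of descriptive error such a rule would allow at the import line.
The function require_ascii_module_name is a hypothetical name chosen for
illustration; nothing like it exists in the import machinery.

```python
def require_ascii_module_name(name: str) -> None:
    """Hypothetical check: reject non-ASCII module names up front.

    Raises ImportError with a message that says *why* the import is
    refused, instead of a generic "module not found".
    """
    try:
        name.encode("ascii")
    except UnicodeEncodeError:
        raise ImportError(
            f"module name {name!r} is not allowed: "
            "module names must contain only ASCII characters",
            name=name,
        ) from None


require_ascii_module_name("cafe")  # passes silently

try:
    require_ascii_module_name("caf\u00e9")
except ImportError as exc:
    print(exc)
```

The point of the sketch is the error message: the failure is reported
against the name itself, regardless of locale or filesystem encoding.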

What I'm getting from Victor's posts is that he, at least, considers the
portability problems to be ignorable because dealing with ambiguous file
name encodings is something that he'd like to force third party tools to
deal with.

-Toshio
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 198 bytes
Desc: not available
URL:

From brett at python.org Thu Jan 20 20:43:59 2011
From: brett at python.org (Brett Cannon)
Date: Thu, 20 Jan 2011 11:43:59 -0800
Subject: [Python-Dev] Moving stuff out of Misc and over to the devguide
In-Reply-To:
References:
Message-ID:

Short of moving README.coverity (I'm waiting to hear back from the
company), I'm done with my tweaks to the directory.

On Wed, Jan 19, 2011 at 15:31, Brett Cannon wrote:
> OK, here is my plan that I will implement:
>
> MOVE
> ----------
> developers.txt
> maintainers.rst
> README.gdb
> README.coverity
> README.Emacs
>
> DELETE (seem way too old to still be relevant; tell me if I am wrong)
> -----------
> README.OpenBSD
> README.AIX
> cheatsheet
>
> LEAVE everything else (with README properly edited and simplified to
> only list files with non-obvious names)
>
> On Mon, Jan 17, 2011 at 12:32, Brett Cannon wrote:
>> There is a bunch of stuff in Misc that probably belongs in the
>> devguide (under Resources) instead of in svn. Here are the files I
>> think can be moved (in order of how strongly I think they should be
>> moved):
>>
>> PURIFY.README
>> README.coverty
>> README.klocwork
>> README.valgrind
>> Porting
>> developers.txt
>> maintainers.rst
>> SpecialBuilds.txt
>>
>> Now before anyone yells "that is inconvenient", don't forget that all
>> core developers can check out and edit the devguide, and that almost
>> all of the files listed (SpecialBuilds.txt is the exception) are
>> typically edited and viewed on their own.
>>
>> Anyway, if there is a file listed here you don't think should move out
>> of py3k and into the devguide, speak up.
>>
>

From skip at pobox.com Thu Jan 20 20:59:34 2011
From: skip at pobox.com (skip at pobox.com)
Date: Thu, 20 Jan 2011 13:59:34 -0600
Subject: [Python-Dev] Moving stuff out of Misc and over to the devguide
In-Reply-To:
References:
Message-ID: <19768.37926.530229.216373@montanaro.dyndns.org>

Brett,

I'm sure I just missed it, but where is the devguide in the Subversion tree?

Thx,

Skip

From mal at egenix.com Thu Jan 20 21:09:47 2011
From: mal at egenix.com (M.-A. Lemburg)
Date: Thu, 20 Jan 2011 21:09:47 +0100
Subject: [Python-Dev] [Python-checkins] r88127 - in
 python/branches/py3k/Misc: README.AIX README.OpenBSD cheatsheet
In-Reply-To: <20110120193435.89D3CEEA5B@mail.python.org>
References: <20110120193435.89D3CEEA5B@mail.python.org>
Message-ID: <4D38968B.9060103@egenix.com>

brett.cannon wrote:
> Author: brett.cannon
> Date: Thu Jan 20 20:34:35 2011
> New Revision: 88127
>
> Log:
> Remove some outdated files from Misc.
>
> Removed:
>    python/branches/py3k/Misc/README.AIX

Are you sure that the AIX README is outdated ? It explains some
of the details of why there are scripts like ld_so_aix which are
still needed on AIX.

>    python/branches/py3k/Misc/README.OpenBSD

Same here. Does OpenBSD 4.x still have the issues mentioned in the
file.

>    python/branches/py3k/Misc/cheatsheet

Wouldn't it be better to update this useful file (as part of your
PSF grant) ? Most of it still applies to Py3.

Regarding some other things you removed or moved:

> D    SVN-Python3/Misc/maintainers.rst
> D    SVN-Python3/Misc/developers.txt

Why were these removed from the source archive ? They are useful
to have around for users wanting to report bugs and are useful
to follow the development of the core team between different
Python versions.

> D    SVN-Python3/Misc/python-mode.el

Why is this gone ?
It's a useful file for Emacs users and usually
more recent than what you get with your Emacs installation.

> D    SVN-Python3/Misc/AIX-NOTES

I guess this was renamed to README.AIX before you removed it.
See above.

> D    SVN-Python3/Misc/PURIFY.README

Why is this outdated ?
Should probably be renamed to README.Purify.

> D    SVN-Python3/Misc/RFD

That's a piece of Python history. These nuggets should stay
in the Python source archive, IMHO.

> D    SVN-Python3/Misc/setuid-prog.c

This is useful for people writing setuid programs in Python and
avoids many of the usual pitfalls:

http://mail.python.org/pipermail/python-list/1999-April/620658.html

--
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Jan 20 2011)
>>> Python/Zope Consulting and Support ...        http://www.egenix.com/
>>> mxODBC.Zope.Database.Adapter ...             http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/
________________________________________________________________________

::: Try our new mxODBC.Connect Python Database Interface for free ! ::::


   eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
    D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
           Registered at Amtsgericht Duesseldorf: HRB 46611
               http://www.egenix.com/company/contact/

From brett at python.org Thu Jan 20 21:11:58 2011
From: brett at python.org (Brett Cannon)
Date: Thu, 20 Jan 2011 12:11:58 -0800
Subject: [Python-Dev] Moving stuff out of Misc and over to the devguide
In-Reply-To: <19768.37926.530229.216373@montanaro.dyndns.org>
References: <19768.37926.530229.216373@montanaro.dyndns.org>
Message-ID:

It's not in the svn tree; it's an Hg repo: ssh://hg at hg.python.org/devguide .
The link is also listed in the Resources section of the devguide.

On Thu, Jan 20, 2011 at 11:59, wrote:
>
> Brett,
>
> I'm sure I just missed it, but where is the devguide in the Subversion tree?
>
> Thx,
>
> Skip
>

From solipsis at pitrou.net Thu Jan 20 21:23:21 2011
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Thu, 20 Jan 2011 21:23:21 +0100
Subject: [Python-Dev] [Python-checkins] r88127 - in
 python/branches/py3k/Misc: README.AIX README.OpenBSD cheatsheet
References: <20110120193435.89D3CEEA5B@mail.python.org>
 <4D38968B.9060103@egenix.com>
Message-ID: <20110120212321.385b4690@pitrou.net>

On Thu, 20 Jan 2011 21:09:47 +0100
"M.-A. Lemburg" wrote:
> brett.cannon wrote:
> > Author: brett.cannon
> > Date: Thu Jan 20 20:34:35 2011
> > New Revision: 88127
> >
> > Log:
> > Remove some outdated files from Misc.
> >
> > Removed:
> >    python/branches/py3k/Misc/README.AIX
>
> Are you sure that the AIX README is outdated ? It explains some
> of the details of why there are scripts like ld_so_aix which are
> still needed on AIX.

If someone wants to contribute an up-to-date version they're welcome.
The version which has been deleted was totally obsolete.
http://bugs.python.org/issue10709

From glyph at twistedmatrix.com Thu Jan 20 21:27:08 2011
From: glyph at twistedmatrix.com (Glyph Lefkowitz)
Date: Thu, 20 Jan 2011 15:27:08 -0500
Subject: [Python-Dev] Import and unicode: part two
In-Reply-To:
References: <1295440442.432.18.camel@marge>
Message-ID:

On Jan 20, 2011, at 11:46 AM, Guido van Rossum wrote:

> On Thu, Jan 20, 2011 at 5:16 AM, Nick Coghlan wrote:
>> On Thu, Jan 20, 2011 at 10:08 PM, Simon Cross wrote:
>>> I'm changing my vote on this to a +1 for two reasons:
>>>
>>> * Initially I thought this wasn't supported by Python at all but I see
>>> that currently it is supported but that support is broken (or at least
>>> limited to UTF-8 filesystem encodings). Since support is there, might
>>> as well make it better (especially if it tidies up the code base at
>>> the same time).
>>>
>>> * I still don't think it's a good idea to give modules non-ASCII names
>>> but the "consenting adults" approach suggests we should let people
>>> shoot themselves in the foot if they believe they have good reason to
>>> do so.
>>
>> I'm also +1 on this for the reasons Simon gives.
>
> Same here. *Most* code will never be shared, or will only be shared
> between users in the same community. When it goes wrong it's also a
> learning opportunity. :-)

Despite my usual proclivity for being contrarian, I find myself in
agreement here. Linux users with locales that don't specify UTF-8 frankly
_should_ have to deal with all kinds of nastiness until they can transcode
their filesystems. MacOS and Windows both have a "right" answer here and
your third-party tools shouldn't create mojibake in your filenames.

However, I feel that we should not necessarily be making non-ASCII
programmers second-class citizens, if they are to be supported at all. The
obvious outcome of the current regime is, if you want your code to work in
the wider world, you have to make everything ASCII, so non-ASCII
programmers have to do a huge amount of extra work to prepare their stuff
for distribution. As an english speaker I'd be happy about that, but as a
person with a lot of Chinese in-laws, it gives me pause.

There is a difference between sharing code for inspection and editing
(where a little codec pain is good for the soul: set your locale to UTF-8
and forget it already!) and sharing code so that a (non-programming) user
can just run it. If I can write software in English and distribute it to
Chinese people, fair's fair, they should be able to write it in chinese and
have it work on my computer.

To support the latter, could we just make sure that zipimport has a
consistent, non-locale-or-operating-system-dependent interpretation of
encoding? That way a distributed egg would be importable from a zipfile
regardless of how screwed up the distribution target machine's filesystem
is.
(And this is yet more motivation for distributors to set zip_safe=True.)

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From brett at python.org Thu Jan 20 21:37:14 2011
From: brett at python.org (Brett Cannon)
Date: Thu, 20 Jan 2011 12:37:14 -0800
Subject: [Python-Dev] [Python-checkins] r88127 - in
 python/branches/py3k/Misc: README.AIX README.OpenBSD cheatsheet
In-Reply-To: <4D38968B.9060103@egenix.com>
References: <20110120193435.89D3CEEA5B@mail.python.org>
 <4D38968B.9060103@egenix.com>
Message-ID:

On Thu, Jan 20, 2011 at 12:09, M.-A. Lemburg wrote:
> brett.cannon wrote:
>> Author: brett.cannon
>> Date: Thu Jan 20 20:34:35 2011
>> New Revision: 88127
>>
>> Log:
>> Remove some outdated files from Misc.
>>
>> Removed:
>>    python/branches/py3k/Misc/README.AIX
>
> Are you sure that the AIX README is outdated ? It explains some
> of the details of why there are scripts like ld_so_aix which are
> still needed on AIX.
>

I asked earlier if anyone thought they were not and no one spoke up.
Same goes for README.OpenBSD.

>>    python/branches/py3k/Misc/README.OpenBSD
>
> Same here. Does OpenBSD 4.x still have the issues mentioned in the
> file.
>
>>    python/branches/py3k/Misc/cheatsheet
>
> Wouldn't it be better to update this useful file (as part of your
> PSF grant) ? Most of it still applies to Py3.

That file was not even updated to cover context managers and the
'with' keyword so it's been outdated for years and for at least a
couple of releases now. If no one has cared to update it for the last
two releases of Python 2.x I don't see a point in my spending time
doing an update, especially considering it is a duplicate of official
docs which is just asking for maintenance trouble.

>
> Regarding some other things you removed or moved:
>
>> D    SVN-Python3/Misc/maintainers.rst
>> D    SVN-Python3/Misc/developers.txt
>
> Why were these removed from the source archive ?
> They are useful
> to have around for users wanting to report bugs and are useful
> to follow the development of the core team between different
> Python versions.

They are in the devguide now.

>
>> D    SVN-Python3/Misc/python-mode.el
>
> Why is this gone ? It's a useful file for Emacs users and usually
> more recent than what you get with your Emacs installation.

Barry removed that (I think) two months ago; I was simply updating the
README to reflect the actual state of the directory.

>
>> D    SVN-Python3/Misc/AIX-NOTES
>
> I guess this was renamed to README.AIX before you removed it.
> See above.
>
>> D    SVN-Python3/Misc/PURIFY.README
>
> Why is this outdated ?
> Should probably be renamed to README.Purify.

Because Barry said it was, considering it contained an email address
that has not worked in a decade.

>
>> D    SVN-Python3/Misc/RFD
>
> That's a piece of Python history. These nuggets should stay
> in the Python source archive, IMHO.

Once again, it was already not there and this is just a cleanup of the
file; I didn't delete it.

>
>> D    SVN-Python3/Misc/setuid-prog.c
>
> This is useful for people writing setuid programs in Python and
> avoids many of the usual pitfalls:

Another cleanup of the file.

-Brett

>
> http://mail.python.org/pipermail/python-list/1999-April/620658.html
>
> --
> Marc-Andre Lemburg
> eGenix.com
>
> Professional Python Services directly from the Source  (#1, Jan 20 2011)
>>>> Python/Zope Consulting and Support ...        http://www.egenix.com/
>>>> mxODBC.Zope.Database.Adapter ...             http://zope.egenix.com/
>>>> mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/
> ________________________________________________________________________
>
> ::: Try our new mxODBC.Connect Python Database Interface for free ! ::::
>
>
>   eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
>    D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
>           Registered at Amtsgericht Duesseldorf: HRB 46611
>             http://www.egenix.com/company/contact/
> _______________________________________________
> Python-Dev mailing list
> Python-Dev at python.org
> http://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe: http://mail.python.org/mailman/options/python-dev/brett%40python.org
>

From nyamatongwe at gmail.com  Thu Jan 20 21:47:32 2011
From: nyamatongwe at gmail.com (Neil Hodgson)
Date: Fri, 21 Jan 2011 07:47:32 +1100
Subject: [Python-Dev] Import and unicode: part two
In-Reply-To: <20110120174439.GT22400@unaka.lan>
References: <1295440442.432.18.camel@marge> <20110119234419.GO22400@unaka.lan>
 <1295483161.12324.10.camel@marge> <20110120020725.GQ22400@unaka.lan>
 <1295491865.22752.22.camel@marge> <20110120043901.GR22400@unaka.lan>
 <1295524289.2016.116.camel@marge> <20110120174439.GT22400@unaka.lan>
Message-ID:

Toshio Kuratomi:

> My examples that you're replying to involve two "properly
> configured" OS's.  The Linux workstations are configured with a UTF-8
> locale.  The Windows OS's use wide character unicode.  The problem occurs in
> that the code that one of the parties develops (either the students or the
> professors) is developed on one of those OS's and then used on the other OS.

   This implies a symmetric issue, but I cannot see how there can be
a problem with non-ASCII module names on Windows, as the file system
allows all Unicode characters and so can represent any module name.
OS X is also based on Unicode file names. While it is possible to mount
file systems on Windows or OS X that do not support Unicode file names,
this is a very unusual situation that will cause problems in other
ways.

   Common Linux distributions like Ubuntu and Fedora now default to
UTF-8 locales. The situations in which users may encounter
installations that do not support Unicode file names have reduced
greatly.
   Neil

From solipsis at pitrou.net  Thu Jan 20 21:55:22 2011
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Thu, 20 Jan 2011 21:55:22 +0100
Subject: [Python-Dev] Import and unicode: part two
References: <1295440442.432.18.camel@marge>
Message-ID: <20110120215522.61429d36@pitrou.net>

On Thu, 20 Jan 2011 15:27:08 -0500
Glyph Lefkowitz wrote:
>
> To support the latter, could we just make sure that zipimport has a consistent,
> non-locale-or-operating-system-dependent interpretation of encoding?

It already has, but it's dependent on a flag in the zip file itself
(actually, one flag per archived file in the zip it seems).

(by the way, it would be nice if your text/mail editor wrapped lines at
80 characters or something)

Regards

Antoine.

From v+python at g.nevcal.com  Thu Jan 20 21:59:15 2011
From: v+python at g.nevcal.com (Glenn Linderman)
Date: Thu, 20 Jan 2011 12:59:15 -0800
Subject: [Python-Dev] Import and unicode: part two
In-Reply-To:
References: <1295440442.432.18.camel@marge>
Message-ID: <4D38A223.1020908@g.nevcal.com>

On 1/20/2011 12:27 PM, Glyph Lefkowitz wrote:
> To support the latter, could we just make sure that zipimport has a
> consistent, non-locale-or-operating-system-dependent interpretation of
> encoding? That way a distributed egg would be importable from a zipfile
> regardless of how screwed up the distribution target machine's
> filesystem is. (And this is yet more motivation for distributors to set
> zip_safe=True.)

I guess zip_safe is a distutils thing, and I haven't (yet) used
distutils.

But regarding zip files, I was trying to figure out if ZipFile module
supported the CP437/UTF-8 flag, but its documentation seems to predate
that concept, and just talks about unencoded byte streams. Yet, I think
I have Python3 code that passes str to the filenames, and that works, so
some amount of encoding and decoding to something must be happening
behind the documentation's back?
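
As a minimal sketch of the per-file flag mentioned above: in the zip format, bit 11 (0x800) of each member's general-purpose flags marks a name stored as UTF-8, and names without it are conventionally read as CP437. The helper names `name_encodings` and `build_sample` are hypothetical, chosen only for this illustration; the sketch builds a small archive in memory and reports which members carry the flag, assuming a Python 3 `zipfile` module.

```python
import io
import zipfile

UTF8_FLAG = 0x800  # general-purpose bit 11: member name is stored as UTF-8


def name_encodings(zip_bytes):
    """Map each member name to 'utf-8' or 'cp437' based on its flag bits."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        return {
            info.filename: "utf-8" if info.flag_bits & UTF8_FLAG else "cp437"
            for info in zf.infolist()
        }


def build_sample():
    """Create an in-memory archive with one ASCII and one non-ASCII name."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("plain.txt", b"ascii name")
        zf.writestr("caf\u00e9.txt", b"non-ascii name")
    return buf.getvalue()


if __name__ == "__main__":
    for name, encoding in name_encodings(build_sample()).items():
        print(name, encoding)
```

The flag is read back from the archive rather than from the writer's objects, so the result does not depend on the local filesystem encoding.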
It does seem that if a ZipFile is created with the UTF-8 flag turned on,
Python should respect that, and that should be independent of the
file system configured encoding on the local machine on which the
ZipFile is used (as long as the name of the ZipFile is usable).

I do know that listing filenames from a zip file created without the
UTF-8 flag, using ZipFile to access it and place the names inside a web
page that specifies its encoding to be UTF-8, produces illegal
characters, so I've become tuned in recently to the fact that zip files
do have such a flag, and have been learning the right options to turn it
on for the command line tools I use to create zip files... but was
surprised when investigating the same for ZipFile.

From sandro.tosi at gmail.com  Thu Jan 20 22:06:53 2011
From: sandro.tosi at gmail.com (Sandro Tosi)
Date: Thu, 20 Jan 2011 22:06:53 +0100
Subject: [Python-Dev] [Python-checkins] devguide: Move Misc/README.Emacs to here.
In-Reply-To:
References:
Message-ID:

Hi,

On Thu, Jan 20, 2011 at 20:33, brett.cannon wrote:
> +..
> +   Local Variables:
> +   mode: indented-text
> +   indent-tabs-mode: nil
> +   sentence-end-double-space: t
> +   fill-column: 78
> +   coding: utf-8
> +   End:

maybe this can be removed now

Cheers,
-- 
Sandro Tosi (aka morph, morpheus, matrixhasu)
My website: http://matrixhasu.altervista.org/
Me at Debian: http://wiki.debian.org/SandroTosi

From alexander.belopolsky at gmail.com  Thu Jan 20 22:09:58 2011
From: alexander.belopolsky at gmail.com (Alexander Belopolsky)
Date: Thu, 20 Jan 2011 16:09:58 -0500
Subject: [Python-Dev] [Python-checkins] r88127 - in python/branches/py3k/Misc: README.AIX README.OpenBSD cheatsheet
In-Reply-To:
References: <20110120193435.89D3CEEA5B@mail.python.org> <4D38968B.9060103@egenix.com>
Message-ID:

On Thu, Jan 20, 2011 at 3:37 PM, Brett Cannon wrote:
> On Thu, Jan 20, 2011 at 12:09, M.-A. Lemburg wrote:
..
>>>
>>>    python/branches/py3k/Misc/cheatsheet
>>
>> Wouldn't it be better to update this useful file (as part of your
>> PSF grant) ? Most of it still applies to Py3.
>
> That file was not even updated to cover context managers and the
> 'with' keyword so it's been outdated for years and for at least a
> couple of releases now. If no one has cared to update it for the last
> two releases of Python 2.x I don't see a point in my spending time
> doing an update, especially considering it is a duplicate of official
> docs which is just asking for maintenance trouble.
>

You should probably close issue4819 with "won't fix" in this case.

I am with MAL on this one, though.  I don't think equivalent
presentation is duplicated anywhere in the docs.  It would be better
to have it updated and moved to Doc.

From brett at python.org  Thu Jan 20 22:13:46 2011
From: brett at python.org (Brett Cannon)
Date: Thu, 20 Jan 2011 13:13:46 -0800
Subject: [Python-Dev] [Python-checkins] r88127 - in python/branches/py3k/Misc: README.AIX README.OpenBSD cheatsheet
In-Reply-To:
References: <20110120193435.89D3CEEA5B@mail.python.org> <4D38968B.9060103@egenix.com>
Message-ID:

On Thu, Jan 20, 2011 at 13:09, Alexander Belopolsky wrote:
> On Thu, Jan 20, 2011 at 3:37 PM, Brett Cannon wrote:
>> On Thu, Jan 20, 2011 at 12:09, M.-A. Lemburg wrote:
> ..
>>>>    python/branches/py3k/Misc/cheatsheet
>>>
>>> Wouldn't it be better to update this useful file (as part of your
>>> PSF grant) ? Most of it still applies to Py3.
>>
>> That file was not even updated to cover context managers and the
>> 'with' keyword so it's been outdated for years and for at least a
>> couple of releases now. If no one has cared to update it for the last
>> two releases of Python 2.x I don't see a point in my spending time
>> doing an update, especially considering it is a duplicate of official
>> docs which is just asking for maintenance trouble.
>>
>
> You should probably close issue4819 with "won't fix" in this case.
>
>
> I am with MAL on this one, though.  I don't think equivalent
> presentation is duplicated anywhere in the docs.  It would be better
> to have it updated and moved to Doc.
>

If someone wants to update it I'm not objecting, I'm just saying I view
getting the devguide done and moving on to the Python 2 -> 3 porting
guide as more important.

From g.brandl at gmx.net  Thu Jan 20 22:24:32 2011
From: g.brandl at gmx.net (Georg Brandl)
Date: Thu, 20 Jan 2011 22:24:32 +0100
Subject: [Python-Dev] Import and unicode: part two
In-Reply-To: <1295524289.2016.116.camel@marge>
References: <1295440442.432.18.camel@marge> <20110119234419.GO22400@unaka.lan>
 <1295483161.12324.10.camel@marge> <20110120020725.GQ22400@unaka.lan>
 <1295491865.22752.22.camel@marge> <20110120043901.GR22400@unaka.lan>
 <1295524289.2016.116.camel@marge>
Message-ID:

On 20.01.2011 12:51, Victor Stinner wrote:
> You only give theorical arguments

Read Anathem lately? ;)

Georg

From ncoghlan at gmail.com  Fri Jan 21 01:00:14 2011
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Fri, 21 Jan 2011 10:00:14 +1000
Subject: [Python-Dev] Import and unicode: part two
In-Reply-To: <20110120192743.GU22400@unaka.lan>
References: <20110119234419.GO22400@unaka.lan> <1295483161.12324.10.camel@marge>
 <20110120020725.GQ22400@unaka.lan> <1295491865.22752.22.camel@marge>
 <20110120043901.GR22400@unaka.lan> <1295524289.2016.116.camel@marge>
 <20110120174439.GT22400@unaka.lan> <20110120192743.GU22400@unaka.lan>
Message-ID:

On Fri, Jan 21, 2011 at 5:27 AM, Toshio Kuratomi wrote:
> I think that both ideas are inferior to mandating that every python module
> filename is ascii.  From what I'm getting from Victor's posts is that he, at
> least, considers the portability problems to be ignorable because dealing
> with ambiguous file name encodings is something that he'd like to force
> third party tools to deal with.

I think you're starting from an incorrect premise: we *already* allow
non-ASCII module names in Py3k.
They just don't always work properly, hence why
people are currently much, much better off using pure ASCII for their
module names (as ASCII is still the lowest common denominator for
internet communication).

However, you are proposing that, instead of attempting to fix at least
some of the cases where it doesn't work, we throw up our hands and
tell people "Since some poorly configured systems have trouble with
this feature, we're taking it away from everybody. Sorry if this
breaks your code." While there may be situations where that's a valid
approach, this isn't one of them.

Yes, non-ASCII filenames are problems for all sorts of reasons (with
Python's historically poor support being one of them). The idea is
that we're striving to no longer be part of that problem, even if it
isn't within our power to fix it entirely. Once we fix the core to
handle various Unicode issues, then over time that support can ripple
out through the rest of the Python ecosystem - we don't expect
everything to magically "just work" as soon as the basic issue in the
core is fixed. It's going to be *years* before non-ASCII file names
are as portable as pure ASCII ones (it kind of reminds me of the era
when you had to avoid spaces in filenames because so many applications
choked on them, even after the OS had been updated to support them).

As far as the question of filenames not being re-encoded properly when
copied between two systems, then yes, that *is* a problem with the
third party tools used to do the copying. Such tools will break any
code that uses the str APIs to access the filesystem.

To deal with the case of undecodable filenames that the import system
skips over, it is certainly possible that importlib or runpy (probably
the former) could acquire a function that allowed a named file to be
imported directly (with a specific module name) rather than requiring
the import system to search for it.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |
Brisbane, Australia

From ncoghlan at gmail.com  Fri Jan 21 01:58:54 2011
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Fri, 21 Jan 2011 10:58:54 +1000
Subject: [Python-Dev] [Python-checkins] devguide: Copy over the dev FAQ and *only* strip out stuff covered elsewhere in the
In-Reply-To:
References:
Message-ID:

On Fri, Jan 21, 2011 at 6:42 AM, brett.cannon wrote:
> brett.cannon pushed 82d3a1b694b3 to devguide:
>
> http://hg.python.org/devguide/rev/82d3a1b694b3
> changeset:   167:82d3a1b694b3
> user:        Brett Cannon
> date:        Thu Jan 20 12:40:47 2011 -0800
> summary:
>  Copy over the dev FAQ and *only* strip out stuff covered elsewhere in the devguide. Nick Coghlan should be a happy boy after this.

Yay, thanks :)

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From ezio.melotti at gmail.com  Fri Jan 21 03:31:33 2011
From: ezio.melotti at gmail.com (Ezio Melotti)
Date: Fri, 21 Jan 2011 03:31:33 +0100
Subject: [Python-Dev] [Python-checkins] r87815 - peps/trunk/pep-3333.txt
In-Reply-To: <20110107153928.3CE34EE988@mail.python.org>
References: <20110107153928.3CE34EE988@mail.python.org>
Message-ID:

On Fri, Jan 7, 2011 at 4:39 PM, phillip.eby wrote:

> Author: phillip.eby
> Date: Fri Jan 7 16:39:27 2011
> New Revision: 87815
>
> Log:
> More bytes I/O fixes
>
>
> Modified:
> peps/trunk/pep-3333.txt
>
> Modified: peps/trunk/pep-3333.txt
>
> ==============================================================================
> --- peps/trunk/pep-3333.txt (original)
> +++ peps/trunk/pep-3333.txt Fri Jan 7 16:39:27 2011
> @@ -310,9 +310,9 @@
>      elif not headers_sent:
>          # Before the first output, send the stored headers
>          status, response_headers = headers_sent[:] = headers_set
> -        sys.stdout.write('Status: %s\r\n' % status)
> +        sys.stdout.buffer.write('Status: %s\r\n' % status)
>          for header in response_headers:
> -            sys.stdout.write('%s: %s\r\n' % header)
> +            sys.stdout.buffer.write('%s: %s\r\n' % header)
>

Also note that .buffer might not
be available in some cases (i.e. when sys.stdout has been replaced with
other objects).

>          sys.stdout.write('\r\n')
>
>      sys.stdout.buffer.write(data)
> _______________________________________________
> Python-checkins mailing list
> Python-checkins at python.org
> http://mail.python.org/mailman/listinfo/python-checkins
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From foom at fuhm.net  Fri Jan 21 04:16:36 2011
From: foom at fuhm.net (James Y Knight)
Date: Thu, 20 Jan 2011 22:16:36 -0500
Subject: [Python-Dev] [Python-checkins] r87815 - peps/trunk/pep-3333.txt
In-Reply-To:
References: <20110107153928.3CE34EE988@mail.python.org>
Message-ID: <884F4E19-1E56-44C1-9801-9128ADD99743@fuhm.net>

On Jan 20, 2011, at 9:31 PM, Ezio Melotti wrote:
>> Modified: peps/trunk/pep-3333.txt
>> ==============================================================================
>> --- peps/trunk/pep-3333.txt (original)
>> +++ peps/trunk/pep-3333.txt Fri Jan 7 16:39:27 2011
>> @@ -310,9 +310,9 @@
>>      elif not headers_sent:
>>          # Before the first output, send the stored headers
>>          status, response_headers = headers_sent[:] = headers_set
>> -        sys.stdout.write('Status: %s\r\n' % status)
>> +        sys.stdout.buffer.write('Status: %s\r\n' % status)
>>          for header in response_headers:
>> -            sys.stdout.write('%s: %s\r\n' % header)
>> +            sys.stdout.buffer.write('%s: %s\r\n' % header)
>
> Also note that .buffer might not be available in some cases (i.e. when sys.stdout has been replaced with other objects).

Do you have a recommendation for a better way to do bytes I/O on
stdin/stdout, then?...just saying that .buffer might not be available
isn't a very useful comment unless there's a replacement idiom...
James

From foom at fuhm.net  Fri Jan 21 04:25:17 2011
From: foom at fuhm.net (James Y Knight)
Date: Thu, 20 Jan 2011 22:25:17 -0500
Subject: [Python-Dev] Import and unicode: part two
In-Reply-To: <20110120215522.61429d36@pitrou.net>
References: <1295440442.432.18.camel@marge> <20110120215522.61429d36@pitrou.net>
Message-ID: <863D108E-9B6E-4F9D-8D1F-D2B87B0EAC69@fuhm.net>

On Jan 20, 2011, at 3:55 PM, Antoine Pitrou wrote:

> On Thu, 20 Jan 2011 15:27:08 -0500
> Glyph Lefkowitz wrote:
>>
>> To support the latter, could we just make sure that zipimport has a consistent,
>> non-locale-or-operating-system-dependent interpretation of encoding?
>
> It already has, but it's dependent on a flag in the zip file itself
> (actually, one flag per archived file in the zip it seems).
>
> (by the way, it would be nice if your text/mail editor wrapped lines at
> 80 characters or something)

You could complain to Apple, but it seems unlikely that they'd change
it. They broke it intentionally in OSX 10.6.2 for better compatibility
with MS Outlook.

(for the technically inclined: It still wraps lines at 80 characters in
the raw message, but it uses quoted-printable encoding to escape the
line-breaks, so mail readers which decode quoted-printable but can't
flow text are now S.O.L. Apple used to use the nice format=flowed
standard instead.)

James

From ncoghlan at gmail.com  Fri Jan 21 06:59:13 2011
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Fri, 21 Jan 2011 15:59:13 +1000
Subject: [Python-Dev] Import and unicode: part two
In-Reply-To:
References: <1295440442.432.18.camel@marge>
Message-ID:

On Fri, Jan 21, 2011 at 3:44 PM, Atsuo Ishimoto wrote:
> I don't want Python to encourage people to use non-ascii module names.
> Today, seeing UnicodeEncodeError is one of popular reasons for
> newbies to abandon learning Python in Japan. Non-ascii module name is
> an another source of confusion for newbies.
>
> Experienced Japanese programmers may not use non-ascii module names to
> avoid encoding issues.
>
> But novice programmers or non-programmers willing to learn programming
> with Python will wish to use Japanese module names. Their programs
> will stop working if they copy them to another environment. Sooner or
> later, they will see strange ImportError and will start complaining
> "Python sucks! Python doesn't support Japanese!" on Twitter.
>
> Copying files with non-ascii file name over platform is not easy as it
> sounds. What happen if I copy such files from OSX to my web hosting
> server ? Results might differ depending on tools I use to copy and
> platforms.

These all sound like good reasons to continue to *advise* against
using non-ASCII module names. But aside from that, they sound exactly
like a lot of the arguments we heard when Py3k started enforcing the
bytes/text distinction more rigorously: "you're going to break
stuff!".

Yes, we know. But if core software development components like Python
don't try to improve their Unicode support, how is the situation ever
going to get better?

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From tjreedy at udel.edu  Fri Jan 21 07:17:07 2011
From: tjreedy at udel.edu (Terry Reedy)
Date: Fri, 21 Jan 2011 01:17:07 -0500
Subject: [Python-Dev] Import and unicode: part two
In-Reply-To: <20110120174439.GT22400@unaka.lan>
References: <1295440442.432.18.camel@marge> <20110119234419.GO22400@unaka.lan>
 <1295483161.12324.10.camel@marge> <20110120020725.GQ22400@unaka.lan>
 <1295491865.22752.22.camel@marge> <20110120043901.GR22400@unaka.lan>
 <1295524289.2016.116.camel@marge> <20110120174439.GT22400@unaka.lan>
Message-ID:

On 1/20/2011 12:44 PM, Toshio Kuratomi wrote:
> The problem occurs in
> that the code that one of the parties develops (either the students or the
> professors) is developed on one of those OS's and then used on the other OS.
The problem that I reported and hope will be fixed is that private code
written and tested on one machine, which will never be distributed,
could not be imported on the *same* machine, with nothing changed on
that machine except for writing a second file that does the import.

If filenames get mangled when files are transported (admittedly more
likely with non-ascii chars), that is a different issue.

-- 
Terry Jan Reedy

From g.brandl at gmx.net  Fri Jan 21 08:33:48 2011
From: g.brandl at gmx.net (Georg Brandl)
Date: Fri, 21 Jan 2011 08:33:48 +0100
Subject: [Python-Dev] r88121 - python/branches/py3k/Doc/whatsnew/3.2.rst
In-Reply-To: <20110120090440.16F11EE98E@mail.python.org>
References: <20110120090440.16F11EE98E@mail.python.org>
Message-ID:

On 20.01.2011 10:04, raymond.hettinger wrote:
> +os
> +--
> +
> +Different operating systems use various encodings for filenames and environment
> +variables. The :mod:`os` module provides two new functions,
> +:func:`~os.fsencode` and :func:`~os.fsdecode`, for encoding and decoding
> +filenames:
> +
> +>>> filename = 'словарь'
> +>>> os.fsencode(filename)
> +b'\xd1\x81\xd0\xbb\xd0\xbe\xd0\xb2\xd0\xb0\xd1\x80\xd1\x8c'
> +>>> open(os.fsencode(filename))

Please do not include Cyrillic characters directly in the source -- it
breaks the LaTeX PDF build.  A non-ascii name from the latin-1 range
should be fine.

Georg

From ncoghlan at gmail.com  Fri Jan 21 08:55:25 2011
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Fri, 21 Jan 2011 17:55:25 +1000
Subject: [Python-Dev] Import and unicode: part two
In-Reply-To:
References: <1295440442.432.18.camel@marge>
Message-ID:

On Fri, Jan 21, 2011 at 4:44 PM, Atsuo Ishimoto wrote:
> On Fri, Jan 21, 2011 at 2:59 PM, Nick Coghlan wrote:
>>
>> These all sound like good reasons to continue to *advise* against
>> using non-ASCII module names.
But aside from that, they sound exactly
>> like a lot of the arguments we heard when Py3k started enforcing the
>> bytes/text distinction more rigorously: "you're going to break
>> stuff!".
>
> No, non-ASCII module names are new breakage you are going to introduce now :)

No, they're not. Non-ASCII module names *already work* in Python 3.1
on UTF-8 filesystems. The portability problem you're complaining about
exists now, and Victor is trying to at least partially alleviate it by
making these filenames work correctly on more properly configured
systems (such as Windows). It won't go away until all filesystem
manipulation tools are properly Unicode aware, but that's no reason
for us to continue to unnecessarily exacerbate the problem.

Given imp_cafe.py:

  import café

And café.py:

  print('Hello world from {}'.format(__name__))

I get the following result:

  ~$ python3.1 imp_cafe.py
  Hello world from café

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From ishimoto at gembook.org  Fri Jan 21 06:44:48 2011
From: ishimoto at gembook.org (Atsuo Ishimoto)
Date: Fri, 21 Jan 2011 14:44:48 +0900
Subject: [Python-Dev] Import and unicode: part two
In-Reply-To:
References: <1295440442.432.18.camel@marge>
Message-ID:

On Fri, Jan 21, 2011 at 1:46 AM, Guido van Rossum wrote:
> On Thu, Jan 20, 2011 at 5:16 AM, Nick Coghlan wrote:
>> On Thu, Jan 20, 2011 at 10:08 PM, Simon Cross
>> wrote:
>>> I'm changing my vote on this to a +1 for two reasons:
>>>
>>> * Initially I thought this wasn't supported by Python at all but I see
>>> that currently it is supported but that support is broken (or at least
>>> limited to UTF-8 filesystem encodings). Since support is there, might
>>> as well make it better (especially if it tidies up the code base at
>>> the same time).
>>>
>>> * I still don't think it's a good idea to give modules non-ASCII names
>>> but the "consenting adults" approach suggests we should let people
>>> shoot themselves in the foot if they believe they have good reason to
>>> do so.
>>
>> I'm also +1 on this for the reasons Simon gives.
>
> Same here. *Most* code will never be shared, or will only be shared
> between users in the same community. When it goes wrong it's also a
> learning opportunity. :-)
>

I don't want Python to encourage people to use non-ascii module names.
Today, seeing UnicodeEncodeError is one of popular reasons for
newbies to abandon learning Python in Japan. Non-ascii module name is
an another source of confusion for newbies.

Experienced Japanese programmers may not use non-ascii module names to
avoid encoding issues.

But novice programmers or non-programmers willing to learn programming
with Python will wish to use Japanese module names. Their programs
will stop working if they copy them to another environment. Sooner or
later, they will see strange ImportError and will start complaining
"Python sucks! Python doesn't support Japanese!" on Twitter.

Copying files with non-ascii file name over platform is not easy as it
sounds. What happen if I copy such files from OSX to my web hosting
server ? Results might differ depending on tools I use to copy and
platforms.

Is it a good opportunity to start learning about encodings? I don't
think so. They should learn concepts of character set and encodings,
Unicode and JIS character sets, some kind of Japanese encodings, number
of platform specific issues, non-standard extension of Microsoft and
Apple, and so on. I think they should defer learning these messes
until they get ready.
-- 
Atsuo Ishimoto
Mail: ishimoto at gembook.org
Blog: http://d.hatena.ne.jp/atsuoishimoto/
Twitter: atsuoishimoto

From ishimoto at gembook.org  Fri Jan 21 07:44:43 2011
From: ishimoto at gembook.org (Atsuo Ishimoto)
Date: Fri, 21 Jan 2011 15:44:43 +0900
Subject: [Python-Dev] Import and unicode: part two
In-Reply-To:
References: <1295440442.432.18.camel@marge>
Message-ID:

On Fri, Jan 21, 2011 at 2:59 PM, Nick Coghlan wrote:
>
> These all sound like good reasons to continue to *advise* against
> using non-ASCII module names. But aside from that, they sound exactly
> like a lot of the arguments we heard when Py3k started enforcing the
> bytes/text distinction more rigorously: "you're going to break
> stuff!".

No, non-ASCII module names are new breakage you are going to introduce now :)

If the advice against using non-ASCII module names is reasonable, why
bother supporting them?

>
> Yes, we know. But if core software development components like Python
> don't try to improve their Unicode support, how is the situation ever
> going to get better?
>

Java, a leading language of IT industry, have already support
non-ASCII class files for years. But I've never seen such files in
production in Japan, and didn't improve situation until now.

-- 
Atsuo Ishimoto
Mail: ishimoto at gembook.org
Blog: http://d.hatena.ne.jp/atsuoishimoto/
Twitter: atsuoishimoto

From stephen at xemacs.org  Fri Jan 21 09:45:44 2011
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Fri, 21 Jan 2011 17:45:44 +0900
Subject: [Python-Dev] Import and unicode: part two
In-Reply-To:
References: <1295440442.432.18.camel@marge>
Message-ID: <871v467ih3.fsf@uwakimon.sk.tsukuba.ac.jp>

Nick Coghlan writes:
> On Fri, Jan 21, 2011 at 3:44 PM, Atsuo Ishimoto wrote:

> > I don't want Python to encourage people to use non-ascii module names.

I don't think anybody is *encouraging* it.
The argument is for
*permitting* it, partly for consistency with other identifiers, and
partly because of Python's usual "consenting adults" standard for
permitting "dangerous" practices. I realize this is a somewhat
problematic distinction in Japan, for several reasons, but it's really
not one that can be avoided in computing in any case. The sooner
novice programmers learn it, the better.

> > Today, seeing UnicodeEncodeError is one of popular reasons for
> > newbies to abandon learning Python in Japan. Non-ascii module name is
> > an another source of confusion for newbies.
> >
> > Experienced Japanese programmers may not use non-ascii module names to
> > avoid encoding issues.
> >
> > But novice programmers or non-programmers willing to learn programming
> > with Python will wish to use Japanese module names. Their programs
> > will stop working if they copy them to another environment. Sooner or
> > later, they will see strange ImportError and will start complaining
> > "Python sucks! Python doesn't support Japanese!" on Twitter.

So ask them, "What language *does* 'support Japanese'?" ;-)

Seriously, "support Japanese" is an impossibly hard standard in the
current environment. Not only does Japan have 5 more or less standard
encodings still in daily use (EUC-JP, ISO-2022-JP, Shift JIS, UTF-8,
and UTF-16LE), but many major IT companies have their own variants of
the JIS standard character repertoire (all of the variant ideographs
I've seen in the wild are in Unicode, but many corporate repertoires
add extra symbols that are not), and of course some Microsoft utilities
insist on using the deprecated UTF-8 signature with UTF-8.

That said, I really don't see module names as a particular problem. By
the time your novice is using her own modules (as opposed to importing
stdlib and PyPI add-on modules, all with ASCII-only names), she'll be
doing file I/O which has all the same problems, AFAICS.
True, file names
will be strings rather than identifiers, but I don't see why that
matters.

> > Copying files with non-ascii file name over platform is not easy as it
> > sounds.

Agreed, it's not trivial. But it's not that hard, either[1], and web
hosts and others *could* help by providing checkers for languages that
they support.

> > What happen if I copy such files from OSX to my web hosting
> > server ? Results might differ depending on tools I use to copy and
> > platforms.

I don't see why this problem is specific to Python modules, as opposed
to any file name.

> These all sound like good reasons to continue to *advise* against
> using non-ASCII module names.

+1

> But aside from that, they sound exactly like a lot of the arguments
> we heard when Py3k started enforcing the bytes/text distinction
> more rigorously: "you're going to break stuff!".

Well, not exactly. Enforcing the bytes/text distinction was a change
in the definition of Python; breakage was our fault. The change was
made because in the (not so) long run it would reduce new breakage.

Here, Python is fine (or at least we have some pretty good ideas how to
fix it), it's the world that's broken. *Especially* Japan, with its
five standard encodings in daily use and scads of private variant
repertoires masquerading as standard encodings on top of that. But the
whole world is broken because of the NFD/NFC thing. AFAIK, the only
file system that tries to enforce an NF is Mac OS X HFS+, and
(unfortunately for portability *from* Mac OS X *to* other systems) they
chose NFD. Proper NFD support is arguably better for a number of
reasons (for one, people regularly invent new composition sequences
that will not have precomposed glyphs in any font), but NFC has the
advantage that existing fonts support precomposed standard characters
while many display engines do not support composition properly yet.
And it's likely to stay broken for a while: the move to conformant
display engines is going to take more time.
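
To make the NFD/NFC point concrete, here is a minimal sketch, assuming only the standard `unicodedata` module. The helper `same_module_name` is hypothetical and not part of any import machinery; it only shows that the same visible name can arrive as two different code point sequences, and that normalizing both sides makes them compare equal.

```python
import unicodedata


def same_module_name(a, b):
    """Compare two names after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


nfc_name = "caf\u00e9"   # precomposed LATIN SMALL LETTER E WITH ACUTE
nfd_name = "cafe\u0301"  # plain "e" followed by COMBINING ACUTE ACCENT

if __name__ == "__main__":
    print(nfc_name == nfd_name)                  # raw comparison
    print(same_module_name(nfc_name, nfd_name))  # normalized comparison
```

A file copied from a file system that stores NFD names to one that stores names as given may therefore no longer match a name typed in NFC, even though both display identically.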
I still don't see this as a reason to give up on non-ASCII module
names. Just have the documentation warn that many non-ASCII names will
be non-portable, so use on multiple systems will require care (maybe
gloss that with "probably more care than you want to take").

Footnotes:
[1] I actually find copying file names with spaces to be a bigger
problem, because it's so hard to get shell quoting right.

From martin at v.loewis.de  Fri Jan 21 10:53:33 2011
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Fri, 21 Jan 2011 10:53:33 +0100
Subject: [Python-Dev] Import and unicode: part two
In-Reply-To:
References: <1295440442.432.18.camel@marge>
Message-ID: <4D39579D.4060408@v.loewis.de>

> I don't want Python to encourage people to use non-ascii module names.

I don't think the feature is open for debate anymore. PEP 3131 has
been accepted (after *long* debates), and I'll pronounce that
supporting non-ASCII module names is a direct consequence of having it
accepted. Of course, there may be limitations with respect to
operating systems, and in the way Python modules integrate with the
file system - but that non-ASCII module names must be supported is
really out of question.

If you would like this to be reverted, you need to write another PEP.

Regards,
Martin

From stephen at xemacs.org  Fri Jan 21 11:42:14 2011
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Fri, 21 Jan 2011 19:42:14 +0900
Subject: [Python-Dev] Import and unicode: part two
In-Reply-To:
References: <1295440442.432.18.camel@marge>
Message-ID: <87zkqu5yih.fsf@uwakimon.sk.tsukuba.ac.jp>

Atsuo Ishimoto writes:

> Java, a leading language of IT industry, have already support
> non-ASCII class files for years. But I've never seen such files in
> production in Japan, and didn't improve situation until now.

So why wouldn't Python work the same way? The rest of the world can
use non-ASCII modules names sparingly, and Japanese programmers can
avoid them diligently.
Or learn to use them properly and teach each other; if anybody has the experience of multiple encodings needed to figure out a good way to use the native language in program identifiers despite the encoding problem, my bet is it would be Japan. From solipsis at pitrou.net Fri Jan 21 12:31:29 2011 From: solipsis at pitrou.net (Antoine Pitrou) Date: Fri, 21 Jan 2011 12:31:29 +0100 Subject: [Python-Dev] Import and unicode: part two In-Reply-To: <863D108E-9B6E-4F9D-8D1F-D2B87B0EAC69@fuhm.net> References: <1295440442.432.18.camel@marge> <20110120215522.61429d36@pitrou.net> <863D108E-9B6E-4F9D-8D1F-D2B87B0EAC69@fuhm.net> Message-ID: <20110121123129.3c05e129@pitrou.net> On Thu, 20 Jan 2011 22:25:17 -0500 James Y Knight wrote: > > On Jan 20, 2011, at 3:55 PM, Antoine Pitrou wrote: > > > On Thu, 20 Jan 2011 15:27:08 -0500 > > Glyph Lefkowitz wrote: > >> > >> To support the latter, could we just make sure that zipimport has a consistent, > >> non-locale-or-operating-system-dependent interpretation of encoding? > > > > It already has, but it's dependent on a flag in the zip file itself > > (actually, one flag per archived file in the zip it seems). > > > > (by the way, it would be nice if your text/mail editor wrapped lines at > > 80 characters or something) > > You could complain to Apple, but it seems unlikely that they'd change it. They broke it intentionally in OSX 10.6.2 for better compatibility with MS Outlook. > > (for the technically inclined: It still wraps lines at 80 characters in the raw message, but it uses quoted-printable encoding to escape the line-breaks, so mail readers which decode quoted-printable but can't flow text are now S.O.L. Apple used to use the nice format=flowed standard instead.) I think most mail readers are able to word-wrap raw text correctly (even though it still makes your messages look bad amongst a thread of nicely-formatted 80-column messages). The real annoyance is when reading Web archives of mailing-lists, e.g. 
http://twistedmatrix.com/pipermail/twisted-python/2011-January/023346.html Regards Antoine. From solipsis at pitrou.net Fri Jan 21 12:34:42 2011 From: solipsis at pitrou.net (Antoine Pitrou) Date: Fri, 21 Jan 2011 12:34:42 +0100 Subject: [Python-Dev] [Python-checkins] r87815 - peps/trunk/pep-3333.txt References: <20110107153928.3CE34EE988@mail.python.org> <884F4E19-1E56-44C1-9801-9128ADD99743@fuhm.net> Message-ID: <20110121123442.65621877@pitrou.net> On Thu, 20 Jan 2011 22:16:36 -0500 James Y Knight wrote: > > On Jan 20, 2011, at 9:31 PM, Ezio Melotti wrote: > >> Modified: peps/trunk/pep-3333.txt > >> ============================================================================== > >> --- peps/trunk/pep-3333.txt (original) > >> +++ peps/trunk/pep-3333.txt Fri Jan 7 16:39:27 2011 > >> @@ -310,9 +310,9 @@ > >> elif not headers_sent: > >> # Before the first output, send the stored headers > >> status, response_headers = headers_sent[:] = headers_set > >> - sys.stdout.write('Status: %s\r\n' % status) > >> + sys.stdout.buffer.write('Status: %s\r\n' % status) > >> for header in response_headers: > >> - sys.stdout.write('%s: %s\r\n' % header) > >> + sys.stdout.buffer.write('%s: %s\r\n' % header) > > > > Also note that .buffer might not be available in some cases (i.e. when sys.stdout has been replaced with other objects). > > Do you have a recommendation for a better way to do bytes I/O on stdin/sydout, then?...just saying that .buffer might not be available isn't a very useful comment unless there's a replacement idiom... Well, this is the recommmendation. There's no reason for sys.stdXXX.buffer not to exist if you have full control over the application (which you normally have if you do CGI). Regards Antoine. 
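The `.buffer` idiom discussed above can be sketched as follows. This is only an illustration: `write_header_bytes` is a hypothetical helper (it is not the code in PEP 3333), and the ISO-8859-1 encoding of the headers is an assumption made for the example.

```python
def write_header_bytes(stream, status, headers):
    """Write CGI-style headers as bytes through stream.buffer.

    Hypothetical helper for illustration; not taken from PEP 3333.
    """
    # A text stream such as sys.stdout normally exposes its binary layer
    # as .buffer, but a replacement object may not have one.
    buffer = getattr(stream, "buffer", None)
    if buffer is None:
        raise RuntimeError("stream has no .buffer; cannot write bytes")

    lines = ["Status: %s\r\n" % status]
    lines.extend("%s: %s\r\n" % header for header in headers)
    lines.append("\r\n")  # blank line ends the header block

    # Assumption: header text is representable in ISO-8859-1.
    buffer.write("".join(lines).encode("iso-8859-1"))
    buffer.flush()
```

In a CGI script it would be called as `write_header_bytes(sys.stdout, "200 OK", [("Content-Type", "text/plain")])` after `import sys`. Passing a str straight to `sys.stdout.buffer.write` raises `TypeError` on Python 3, so the explicit encode step is the part that matters.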
From foom at fuhm.net  Fri Jan 21 14:23:31 2011
From: foom at fuhm.net (James Y Knight)
Date: Fri, 21 Jan 2011 08:23:31 -0500
Subject: [Python-Dev] Mail archive line wrapping (Was: Import and
	unicode: part two)
In-Reply-To: <20110121123129.3c05e129@pitrou.net>
References: <1295440442.432.18.camel@marge>
	<20110120215522.61429d36@pitrou.net>
	<863D108E-9B6E-4F9D-8D1F-D2B87B0EAC69@fuhm.net>
	<20110121123129.3c05e129@pitrou.net>
Message-ID: 

On Jan 21, 2011, at 6:31 AM, Antoine Pitrou wrote:
> On Thu, 20 Jan 2011 22:25:17 -0500
> James Y Knight wrote:
>>
>> On Jan 20, 2011, at 3:55 PM, Antoine Pitrou wrote:
>>> (by the way, it would be nice if your text/mail editor wrapped lines at
>>> 80 characters or something)
>>
>> You could complain to Apple, but it seems unlikely that they'd change it. They broke it intentionally in OSX 10.6.2 for better compatibility with MS Outlook.
>>
>> (for the technically inclined: It still wraps lines at 80 characters in the raw message, but it uses quoted-printable encoding to escape the line-breaks, so mail readers which decode quoted-printable but can't flow text are now S.O.L. Apple used to use the nice format=flowed standard instead.)
>
> I think most mail readers are able to word-wrap raw text correctly
> (even though it still makes your messages look bad amongst a thread of
> nicely-formatted 80-column messages).
> The real annoyance is when reading Web archives of mailing-lists, e.g.
> http://twistedmatrix.com/pipermail/twisted-python/2011-January/023346.html

Well, yes, that's a pretty annoying bug in mailman, isn't it? If only anyone around here was involved in mailman and could fix it! :) [I've attempted to cc this to mailman-users with this message, but since I'm not subscribed I dunno if it'll make it or not.]

I have this in my user CSS override file to fix the issue for myself globally on all such archives out in the world:
/* Mailing list archives */
html>body>pre { white-space: pre-wrap !important; }

But really, pipermail should just output a suitable style itself, e.g.:

 or a  in the header.

That's supported on all browsers since FF3.0, IE8, Safari 3, Opera 8. There are various nonstandard CSS selectors for reaching older browsers (IE5.5+, Firefox pre-1.0+, Opera 4+)...But by the time this change gets made in mailman, and released, and gets into the distros that the various list hosts around the web use, and those hosts get upgraded, I doubt anyone will actually even be able to run those old browsers anymore.

James
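
For comparison with the CSS approach, a fixed-column wrap can be sketched in a few lines of Python. This is only an illustration under assumptions: `wrap_long_lines` is a hypothetical helper, not anything Mailman or Pipermail is known to do, and the 80-column default is taken from the discussion.

```python
import textwrap


def wrap_long_lines(text, width=80):
    """Re-wrap only the lines longer than `width`; leave the rest untouched.

    Hypothetical helper for illustration. Lines that already fit are kept
    verbatim, so hand-wrapped text and narrow ASCII art survive as sent.
    """
    out = []
    for line in text.splitlines():
        if len(line) <= width:
            out.append(line)
        else:
            # Break on whitespace only; never split a word or a URL.
            out.extend(
                textwrap.wrap(
                    line,
                    width=width,
                    break_long_words=False,
                    break_on_hyphens=False,
                )
            )
    return "\n".join(out)
```

Unlike `white-space: pre-wrap`, which wraps at whatever width the browser window happens to have, this wraps at a fixed column. It still cannot tell a long paragraph from a deliberately wide diagram, so any line over the limit gets re-wrapped.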

From solipsis at pitrou.net  Fri Jan 21 14:30:21 2011
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Fri, 21 Jan 2011 14:30:21 +0100
Subject: [Python-Dev] Mail archive line wrapping (Was: Import and
	unicode: part two)
In-Reply-To: 
References: <1295440442.432.18.camel@marge>
	<20110120215522.61429d36@pitrou.net>
	<863D108E-9B6E-4F9D-8D1F-D2B87B0EAC69@fuhm.net>
	<20110121123129.3c05e129@pitrou.net>
Message-ID: <20110121143021.4a75954b@pitrou.net>

On Fri, 21 Jan 2011 08:23:31 -0500
James Y Knight wrote:
> >
> > I think most mail readers are able to word-wrap raw text correctly
> > (even though it still makes your messages look bad amongst a thread of
> > nicely-formatted 80-column messages).
> > The real annoyance is when reading Web archives of mailing-lists, e.g.
> > http://twistedmatrix.com/pipermail/twisted-python/2011-January/023346.html
>
> Well, yes, that's a pretty annoying bug in mailman, isn't it? If only anyone around here was involved in mailman and could fix it! :) [I've attempted to cc this to mailman-users with this message, but since I'm not subscribed I dunno if it'll make it or not.]

Why is this a bug in mailman? Mailman archives messages as they are
sent (well, perhaps it mangles e-mail addresses, perhaps). If someone
draws a nice ASCII-art diagram which requires 90 columns instead of 80,
you wouldn't want the archive to break its rendering.

So, it's really the mail client (or its user :-)) which should handle
word-wrapping, not some downstream tool which has no idea of the
original intent.

> I have this in my user CSS override file to fix the issue for myself globally on all such archives out in the world:
> /* Mailing list archives */
> html>body>pre { white-space: pre-wrap !important; }

That doesn't wrap to 80 characters, does it? Only whatever the current
window/container width is, which isn't necessarily the right thing (if
that makes lines 160 characters long, it's still quite uncomfortable to
read).

Regards

Antoine.


From barry at python.org  Fri Jan 21 14:31:10 2011
From: barry at python.org (Barry Warsaw)
Date: Fri, 21 Jan 2011 08:31:10 -0500
Subject: [Python-Dev] Mail archive line wrapping (Was: Import and
	unicode: part two)
In-Reply-To: 
References: <1295440442.432.18.camel@marge>
	<20110120215522.61429d36@pitrou.net>
	<863D108E-9B6E-4F9D-8D1F-D2B87B0EAC69@fuhm.net>
	<20110121123129.3c05e129@pitrou.net>
Message-ID: <20110121083110.5445ce58@python.org>

On Jan 21, 2011, at 08:23 AM, James Y Knight wrote:

>Well, yes, that's a pretty annoying bug in mailman, isn't it? If only anyone
>around here was involved in mailman and could fix it! :) [I've attempted to
>cc this to mailman-users with this message, but since I'm not subscribed I
>dunno if it'll make it or not.]

Technically, Pipermail, but jeebus how I hate hacking on that code. :)

Although it's been futile for the last decade, maybe this time will work:
volunteers wanted!

-Barry
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 836 bytes
Desc: not available
URL: 

From ishimoto at gembook.org  Fri Jan 21 16:07:04 2011
From: ishimoto at gembook.org (Atsuo Ishimoto)
Date: Sat, 22 Jan 2011 00:07:04 +0900
Subject: [Python-Dev] Import and unicode: part two
In-Reply-To: <871v467ih3.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <1295440442.432.18.camel@marge>
	<871v467ih3.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: 

On Fri, Jan 21, 2011 at 5:45 PM, Stephen J. Turnbull wrote:
> Nick Coghlan writes:
>  > On Fri, Jan 21, 2011 at 3:44 PM, Atsuo Ishimoto wrote:
>
>  > > I don't want Python to encourage people to use non-ascii module names.
>
> I don't think anybody is *encouraging* it.  The argument is for
> *permitting* it, partly for consistency with other identifiers, and
> partly because of Python's usual "consenting adults" standard for
> permitting "dangerous" practices.

I'm sorry, I was not clear. I was afraid that saying "learning
opportunity" tempt people to try non-ASCII module names.

In these days, even non technical people have access to Windows, Mac
and Linux boxes at a time. So chances to be annoyed with broken
non-ASCII named files are pretty common.

>
> I still don't see this as a reason to give up on non-ASCII module
> names.  Just have the documentation warn that many non-ASCII names
> will be non-portable, so use on multiple systems will require care
> (maybe gloss that with "probably more care than you want to take").
>

Nice gloss.

-- 
Atsuo Ishimoto
Mail: ishimoto at gembook.org
Blog: http://d.hatena.ne.jp/atsuoishimoto/
Twitter: atsuoishimoto

From status at bugs.python.org  Fri Jan 21 18:07:04 2011
From: status at bugs.python.org (Python tracker)
Date: Fri, 21 Jan 2011 18:07:04 +0100 (CET)
Subject: [Python-Dev] Summary of Python tracker Issues
Message-ID: <20110121170704.496D01CB5A@psf.upfronthosting.co.za>


ACTIVITY SUMMARY (2011-01-14 - 2011-01-21)
Python tracker at http://bugs.python.org/

To view or respond to any of the issues listed below, click on the issue.
Do NOT respond to this message.

Issues counts and deltas: open 2527 (+29) closed 20228 (+36) total 22755 (+65) Open issues with patches: 1062 Issues opened (44) ================== #10896: trace module compares directories as strings (--ignore-dir) http://bugs.python.org/issue10896 reopened by SilentGhost #10909: thread hang, possibly related to print http://bugs.python.org/issue10909 opened by PythonInTheGrass #10910: pyport.h FreeBSD/Mac OS X "fix" causes errors in C++ compilati http://bugs.python.org/issue10910 opened by X-Istence #10911: cgi: add more tests http://bugs.python.org/issue10911 opened by haypo #10914: Python sub-interpreter test http://bugs.python.org/issue10914 opened by pitrou #10915: Make the PyGILState API compatible with multiple interpreters http://bugs.python.org/issue10915 opened by pitrou #10919: Environment variables are not expanded in _winreg when using R http://bugs.python.org/issue10919 opened by rjnienaber #10922: Unexpected exception when calling function_proxy.__class__.__c http://bugs.python.org/issue10922 opened by DasIch #10924: Adding salt and Modular Crypt Format to crypt library. http://bugs.python.org/issue10924 opened by jafo #10925: Document pure Python version of integer-to-float correctly-rou http://bugs.python.org/issue10925 opened by mark.dickinson #10932: distutils.core.setup - data_files misbehaviour ? 
http://bugs.python.org/issue10932 opened by Thorsten.Simons #10933: Tracing disabled when a recursion error is triggered (even if http://bugs.python.org/issue10933 opened by fabioz #10936: Simple CSS fix for left margin at docs.python.org http://bugs.python.org/issue10936 opened by cdunn2001 #10937: WinPE 64 bit execution results with errors http://bugs.python.org/issue10937 opened by gettingback2basics #10938: Provide links to system specific strftime/ptime docs http://bugs.python.org/issue10938 opened by hdiogenes #10939: imaplib: Internaldate2tuple raises KeyError parsing month and http://bugs.python.org/issue10939 opened by lavajoe #10940: IDLE 3.2 hangs with Cmd-M hotkey on OS X 10.6 with 64-bit inst http://bugs.python.org/issue10940 opened by rhettinger #10941: imaplib: Internaldate2tuple produces wrong result if date is n http://bugs.python.org/issue10941 opened by lavajoe #10942: xml.etree.ElementTree.tostring returns type bytes, expected ty http://bugs.python.org/issue10942 opened by JTMoon79 #10945: bdist_wininst depends on MBCS codec, unavailable on non-Window http://bugs.python.org/issue10945 opened by eric.araujo #10948: Trouble with dir_util created dir cache http://bugs.python.org/issue10948 opened by diegoqueiroz #10949: logging.RotatingFileHandler not robust enough http://bugs.python.org/issue10949 opened by kalt #10951: gcc 4.6 warnings http://bugs.python.org/issue10951 opened by haypo #10952: Don't normalize module names to NFKC? 
http://bugs.python.org/issue10952 opened by haypo #10954: No warning for csv.writer API change http://bugs.python.org/issue10954 opened by lregebro #10955: Possible regression with stdlib in zipfile http://bugs.python.org/issue10955 opened by ronaldoussoren #10956: file.write and file.read don't handle EINTR http://bugs.python.org/issue10956 opened by eggy #10957: Python FAQ grammar error http://bugs.python.org/issue10957 opened by jerry.seutter #10960: os.stat() does not mention that it follow symlinks by default http://bugs.python.org/issue10960 opened by mmarkk #10961: Pydoc touchups in new browser for 3.2 http://bugs.python.org/issue10961 opened by ron_adam #10963: "subprocess" can raise OSError (EPIPE) when communicating with http://bugs.python.org/issue10963 opened by dmalcolm #10964: Mac installer need not add things to /usr/local http://bugs.python.org/issue10964 opened by reowen #10965: dev task of documenting undocumented APIs http://bugs.python.org/issue10965 opened by brett.cannon #10966: eliminate use of ImportError implicitly representing TestSkipp http://bugs.python.org/issue10966 opened by brett.cannon #10967: move regrtest over to using more unittest infrastructure http://bugs.python.org/issue10967 opened by brett.cannon #10968: threading.Timer should be a class so that it can be derived http://bugs.python.org/issue10968 opened by Kain94 #10969: Make Tcl recommendation more prominent http://bugs.python.org/issue10969 opened by rhettinger #10970: "string".encode('base64') is not the same as base64.b64encode( http://bugs.python.org/issue10970 opened by mahmoudimus #10971: python Lib/test/regrtest.py -R 3:3: test_zipimport_support fai http://bugs.python.org/issue10971 opened by haypo #10972: zipfile: add "unicode" option to the force the filename encodi http://bugs.python.org/issue10972 opened by haypo #10973: '??' 
not working with IDLE 3.2rc1 - OSX 10.6.6 http://bugs.python.org/issue10973 opened by naguilera #10943: abitype: Need better support to port C extension modules to th http://bugs.python.org/issue10943 opened by fhaxbox66 at googlemail.com #10947: imaplib: Internaldate2tuple and ParseFlags require (and latter http://bugs.python.org/issue10947 opened by lavajoe #10946: bdist doesn???t pass --skip-build on to subcommands http://bugs.python.org/issue10946 opened by eric.araujo Most recent 15 issues with no replies (15) ========================================== #10971: python Lib/test/regrtest.py -R 3:3: test_zipimport_support fai http://bugs.python.org/issue10971 #10970: "string".encode('base64') is not the same as base64.b64encode( http://bugs.python.org/issue10970 #10967: move regrtest over to using more unittest infrastructure http://bugs.python.org/issue10967 #10965: dev task of documenting undocumented APIs http://bugs.python.org/issue10965 #10960: os.stat() does not mention that it follow symlinks by default http://bugs.python.org/issue10960 #10957: Python FAQ grammar error http://bugs.python.org/issue10957 #10949: logging.RotatingFileHandler not robust enough http://bugs.python.org/issue10949 #10943: abitype: Need better support to port C extension modules to th http://bugs.python.org/issue10943 #10933: Tracing disabled when a recursion error is triggered (even if http://bugs.python.org/issue10933 #10925: Document pure Python version of integer-to-float correctly-rou http://bugs.python.org/issue10925 #10910: pyport.h FreeBSD/Mac OS X "fix" causes errors in C++ compilati http://bugs.python.org/issue10910 #10909: thread hang, possibly related to print http://bugs.python.org/issue10909 #10891: Tweak sorting howto to eliminate redundancy http://bugs.python.org/issue10891 #10886: Unhelpful backtrace for multiprocessing.Queue http://bugs.python.org/issue10886 #10885: multiprocessing docs http://bugs.python.org/issue10885 Most recent 15 issues waiting for review (15) 
============================================= #10972: zipfile: add "unicode" option to the force the filename encodi http://bugs.python.org/issue10972 #10963: "subprocess" can raise OSError (EPIPE) when communicating with http://bugs.python.org/issue10963 #10961: Pydoc touchups in new browser for 3.2 http://bugs.python.org/issue10961 #10956: file.write and file.read don't handle EINTR http://bugs.python.org/issue10956 #10955: Possible regression with stdlib in zipfile http://bugs.python.org/issue10955 #10949: logging.RotatingFileHandler not robust enough http://bugs.python.org/issue10949 #10947: imaplib: Internaldate2tuple and ParseFlags require (and latter http://bugs.python.org/issue10947 #10941: imaplib: Internaldate2tuple produces wrong result if date is n http://bugs.python.org/issue10941 #10939: imaplib: Internaldate2tuple raises KeyError parsing month and http://bugs.python.org/issue10939 #10925: Document pure Python version of integer-to-float correctly-rou http://bugs.python.org/issue10925 #10924: Adding salt and Modular Crypt Format to crypt library. http://bugs.python.org/issue10924 #10922: Unexpected exception when calling function_proxy.__class__.__c http://bugs.python.org/issue10922 #10915: Make the PyGILState API compatible with multiple interpreters http://bugs.python.org/issue10915 #10914: Python sub-interpreter test http://bugs.python.org/issue10914 #10908: Improvements to trace._Ignore http://bugs.python.org/issue10908 Top 10 most discussed issues (10) ================================= #10955: Possible regression with stdlib in zipfile http://bugs.python.org/issue10955 20 msgs #3080: Full unicode import system http://bugs.python.org/issue3080 18 msgs #10952: Don't normalize module names to NFKC? http://bugs.python.org/issue10952 15 msgs #10915: Make the PyGILState API compatible with multiple interpreters http://bugs.python.org/issue10915 14 msgs #10924: Adding salt and Modular Crypt Format to crypt library. 
http://bugs.python.org/issue10924 13 msgs #4819: Misc/cheatsheet needs updating http://bugs.python.org/issue4819 10 msgs #10936: Simple CSS fix for left margin at docs.python.org http://bugs.python.org/issue10936 8 msgs #10948: Trouble with dir_util created dir cache http://bugs.python.org/issue10948 8 msgs #10968: threading.Timer should be a class so that it can be derived http://bugs.python.org/issue10968 8 msgs #8957: strptime(.., '%c') fails to parse output of strftime('%c', ..) http://bugs.python.org/issue8957 7 msgs Issues closed (36) ================== #2644: errors from msync ignored in mmap_object_dealloc http://bugs.python.org/issue2644 closed by brian.curtin #6075: Patch for IDLE/OS X to work with Tk-Cocoa http://bugs.python.org/issue6075 closed by ned.deily #8846: cgi.py bug report + fix: tailing carriage return and newline c http://bugs.python.org/issue8846 closed by wobsta #9532: pipe.read hang, when calling commands.getstatusoutput in multi http://bugs.python.org/issue9532 closed by r.david.murray #10238: ctypes not building under OS X 10.6 with LLVM/Clang 2.8 http://bugs.python.org/issue10238 closed by brett.cannon #10451: memoryview can be used to write into readonly buffer http://bugs.python.org/issue10451 closed by pitrou #10843: OS X installer: install the Tools source directory http://bugs.python.org/issue10843 closed by ned.deily #10874: test_urllib2 shouldn't use is operator for comparing strings http://bugs.python.org/issue10874 closed by pitrou #10887: Add link to development ML http://bugs.python.org/issue10887 closed by eric.araujo #10898: posixmodule.c redefines FSTAT http://bugs.python.org/issue10898 closed by pitrou #10903: ZipExtFile:_update_crc fails for CRC >= 0x80000000 http://bugs.python.org/issue10903 closed by arindam #10904: PYTHONIOENCODING is not in manpage http://bugs.python.org/issue10904 closed by pebbe #10906: wsgiref should mention that CGI scripts usually expect HTTPS v http://bugs.python.org/issue10906 closed by 
georg.brandl #10912: PyObject_RichCompare differs in behaviour from PyObject_RichCo http://bugs.python.org/issue10912 closed by eli.bendersky #10913: Deprecate PyEval_AcquireLock() and PyEval_ReleaseLock() http://bugs.python.org/issue10913 closed by pitrou #10916: mmap segfault http://bugs.python.org/issue10916 closed by pitrou #10917: PEP 333 link to CGI specification is broken http://bugs.python.org/issue10917 closed by georg.brandl #10918: **kwargs unnecessarily restricted in concurrent.futures 'submi http://bugs.python.org/issue10918 closed by bquinlan #10920: cp65001, PowerShell, Python crash. http://bugs.python.org/issue10920 closed by haypo #10921: imaplib: Internaldate2tuple() string/bytes issues, does not ha http://bugs.python.org/issue10921 closed by lavajoe #10923: Deadlock because of the import lock when loading the utf8 code http://bugs.python.org/issue10923 closed by haypo #10926: Some Invalid Relative Imports succeed in Py 3.0 & 3.1 [& corre http://bugs.python.org/issue10926 closed by r.david.murray #10927: Allow universal line endings in filecmp module http://bugs.python.org/issue10927 closed by r.david.murray #10928: Strange input processing http://bugs.python.org/issue10928 closed by r.david.murray #10929: telnetlib does not send FIN when self.close() issued http://bugs.python.org/issue10929 closed by r.david.murray #10930: dict.setdefault: Bug: default argument is ALWAYS evaluated, i. 
http://bugs.python.org/issue10930 closed by amaury.forgeotdarc #10931: print() from pipe enclosed between {b'} and {'}-pair on python http://bugs.python.org/issue10931 closed by amaury.forgeotdarc #10934: imaplib: Internaldate2tuple() is documented to return UTC, but http://bugs.python.org/issue10934 closed by belopolsky #10935: wsgiref.handlers.BaseHandler and subclasses of str http://bugs.python.org/issue10935 closed by eric.araujo #10944: ctypes documentation does not mention c_bool in table of stand http://bugs.python.org/issue10944 closed by georg.brandl #10950: ServerProxy returns bad XML http://bugs.python.org/issue10950 closed by georg.brandl #10953: safely eval serialized dict/list data from arbitrary string ov http://bugs.python.org/issue10953 closed by georg.brandl #10958: stat.S_ISLNK() does not wok! http://bugs.python.org/issue10958 closed by amaury.forgeotdarc #10959: mmap crash http://bugs.python.org/issue10959 closed by pitrou #10962: gdb support broken http://bugs.python.org/issue10962 closed by pitrou #1535504: CGIHTTPServer doesn't handle path names with embeded space http://bugs.python.org/issue1535504 closed by georg.brandl From victor.stinner at haypocalc.com Fri Jan 21 18:29:18 2011 From: victor.stinner at haypocalc.com (Victor Stinner) Date: Fri, 21 Jan 2011 18:29:18 +0100 Subject: [Python-Dev] Import and unicode: patch is ready for a review and tests Message-ID: <1295630958.15444.3.camel@marge> Hi, It looks like some people fear that non-ASCII module names will cause troubles for the interoperability: you can try my patch attached to issue #3080 to prevent these issues and fix all bugs :-) http://bugs.python.org/issue3080 I should maybe create a dummy Python project using non-ASCII module names to test it. 
I posted my patch on Rietveld:
http://codereview.appspot.com/3972045

Victor


From brett at python.org  Fri Jan 21 19:21:05 2011
From: brett at python.org (Brett Cannon)
Date: Fri, 21 Jan 2011 10:21:05 -0800
Subject: [Python-Dev] [Python-checkins] devguide: Copy over the dev FAQ
	and *only* strip out stuff covered elsewhere in the
In-Reply-To: 
References: 
Message-ID: 

On Thu, Jan 20, 2011 at 16:58, Nick Coghlan wrote:
> On Fri, Jan 21, 2011 at 6:42 AM, brett.cannon
> wrote:
>> brett.cannon pushed 82d3a1b694b3 to devguide:
>>
>> http://hg.python.org/devguide/rev/82d3a1b694b3
>> changeset:   167:82d3a1b694b3
>> user:        Brett Cannon
>> date:        Thu Jan 20 12:40:47 2011 -0800
>> summary:
>>  Copy over the dev FAQ and *only* strip out stuff covered elsewhere in the devguide. Nick Coghlan should be a happy boy after this.
>
> Yay, thanks :)

Watch what you wish for since you are now maintaining it. =)

-Brett

>
> Cheers,
> Nick.
>
> --
> Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
> _______________________________________________
> Python-checkins mailing list
> Python-checkins at python.org
> http://mail.python.org/mailman/listinfo/python-checkins
>

From brett at python.org  Fri Jan 21 19:22:24 2011
From: brett at python.org (Brett Cannon)
Date: Fri, 21 Jan 2011 10:22:24 -0800
Subject: [Python-Dev] [Python-checkins] devguide: Move Misc/README.Emacs to
	here.
In-Reply-To: 
References: 
Message-ID: 

It's the Emacs lovers who put that stuff in all of their files, so I ain't touching it.

On Thu, Jan 20, 2011 at 13:06, Sandro Tosi wrote:
> Hi,
>
> On Thu, Jan 20, 2011 at 20:33, brett.cannon wrote:
>> +..
>> +   Local Variables:
>> +   mode: indented-text
>> +   indent-tabs-mode: nil
>> +   sentence-end-double-space: t
>> +   fill-column: 78
>> +   coding: utf-8
>> +   End:
>
> maybe this can be removed now
>
> Cheers,
> --
> Sandro Tosi (aka morph, morpheus, matrixhasu)
> My website: http://matrixhasu.altervista.org/
> Me at Debian: http://wiki.debian.org/SandroTosi
> _______________________________________________
> Python-Dev mailing list
> Python-Dev at python.org
> http://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe: http://mail.python.org/mailman/options/python-dev/brett%40python.org
>

From martin at v.loewis.de  Sat Jan 22 00:50:33 2011
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Sat, 22 Jan 2011 00:50:33 +0100
Subject: [Python-Dev] Import and unicode: part two
In-Reply-To: 
References: <1295440442.432.18.camel@marge>
	<871v467ih3.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: <4D3A1BC9.40604@v.loewis.de>

>> I don't think anybody is *encouraging* it. The argument is for
>> *permitting* it, partly for consistency with other identifiers, and
>> partly because of Python's usual "consenting adults" standard for
>> permitting "dangerous" practices.
>
> I'm sorry, I was not clear. I was afraid that saying "learning
> opportunity" tempt people to try non-ASCII module names.
> In these days, even non technical people have access to Windows, Mac
> and Linux boxes at a time. So chances to be annoyed with broken
> non-ASCII named files are pretty common.

Actually, as long people only involve Windows, or only involve Mac,
it will all work just fine. It's only when they use non-Mac Unix
(such as Linux), or try to move files across systems using sub-prime
technology (such as your typical Windows zip utility) they will run
into problems.

But then it will be clear whom to blame - and people run in the same
problems regardless of whether they move Python modules, or regular
files (say, Word documents).

So the more people get confronted with the poor support of non-ASCII
file names in tools, the faster the tools will improve. It took PKWARE
many years to come up with a reasonable Unicode story - but now it's
really the tools that need to catch up, not the spec.

Regards,
Martin

From ncoghlan at gmail.com  Sat Jan 22 03:35:39 2011
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sat, 22 Jan 2011 12:35:39 +1000
Subject: [Python-Dev] Triagers and checkin access to the devguide repository
Message-ID: 

Given that some of the dev guide docs cover triaging and other aspects
of managing issues on the tracker, does it make sense to offer
devguide checkin access to triagers that want it?

Regards,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From msapiro at value.net  Sat Jan 22 02:39:19 2011
From: msapiro at value.net (Mark Sapiro)
Date: Fri, 21 Jan 2011 17:39:19 -0800
Subject: [Python-Dev] Mail archive line wrapping (Was: Import and
	unicode: part two)
In-Reply-To: 
                              References: <1295440442.432.18.camel@marge> 
                              
                              
                              
                              
                              
                              
                              <20110120215522.61429d36@pitrou.net> <863D108E-9B6E-4F9D-8D1F-D2B87B0EAC69@fuhm.net> <20110121123129.3c05e129@pitrou.net> 
                              
Message-ID: <4D3A3547.9010508@value.net>

On 11:59 AM, James Y Knight wrote:
>
> Well, yes, that's a pretty annoying bug in mailman, isn't it? If only anyone around here was involved in mailman and could fix it! :) [I've attempted to cc this to mailman-users with this message, but since I'm not subscribed I dunno if it'll make it or not.]
>
> I have this in my user CSS override file to fix the issue for myself globally on all such archives out in the world:
> /* Mailing list archives */
> html>body>pre { white-space: pre-wrap !important; }
>
> But really, pipermail should just output a suitable style itself, e.g.:  or a  in the header.


This is Mailman bug 266467. It was fixed in Mailman 2.1.13.

-- 
Mark Sapiro        The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan


From swamiyeswanth at hotmail.com  Sat Jan 22 19:16:24 2011
From: swamiyeswanth at hotmail.com (yeswanth)
Date: Sat, 22 Jan 2011 23:46:24 +0530
Subject: [Python-Dev] web framework for py3k
Message-ID: 

I would want to help porting some web framework for py3k .. I want to 
know which one is good and which can be ported easily . Also i 
would require some guidance for this work as I am just a beginner here ..

Thanks
Yeswanth

From alexander.belopolsky at gmail.com  Sat Jan 22 19:25:19 2011
From: alexander.belopolsky at gmail.com (Alexander Belopolsky)
Date: Sat, 22 Jan 2011 13:25:19 -0500
Subject: [Python-Dev] [Python-checkins] r88140 - in
	python/branches/py3k: Misc/NEWS Modules/zipimport.c
In-Reply-To: <20110122103029.CE0F8EEA11@mail.python.org>
References: <20110122103029.CE0F8EEA11@mail.python.org>
Message-ID: 

On Sat, Jan 22, 2011 at 5:30 AM, victor.stinner wrote:
..
> zipimport uses ASCII encoding instead of *cp497* to decode filenames, ..

Shouldn't this be "instead of *cp437*"?
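
For readers unfamiliar with the two codecs named in the commit message, here is a minimal sketch of why the choice matters. It is not the zipimport code. It assumes the legacy ZIP convention that entry names stored without the UTF-8 flag are cp437-encoded, and the sample name is hypothetical.

```python
# Hypothetical raw name bytes, as they might be stored in a ZIP entry
# that does not set the UTF-8 flag; byte 0x82 is "e with acute" in cp437.
raw_name = b"caf\x82.py"

# cp437 assigns a character to every byte value, so decoding succeeds.
decoded = raw_name.decode("cp437")

# ASCII only covers bytes 0-127, so the same bytes are rejected.
try:
    raw_name.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

print(decoded, ascii_ok)
```

With ASCII, any name containing a byte above 127 fails to decode at all, whereas cp437 always yields some text, correct or not.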

From tjreedy at udel.edu  Sat Jan 22 19:55:41 2011
From: tjreedy at udel.edu (Terry Reedy)
Date: Sat, 22 Jan 2011 13:55:41 -0500
Subject: [Python-Dev] web framework for py3k
In-Reply-To: 
References: 
Message-ID: 

On 1/22/2011 1:16 PM, yeswanth wrote:

In general, this list is for development of Python, CPython, and its
stdlib, not 3rd party modules.

> I would want to help porting some web framework for py3k ..

The target of any such efforts should be 3.2 as it has changes intended
to help web programming.

> I want to know which one is good and which can be ported easily

Opinions will depend on the person. Such questions might be better asked
on Python list or the specialized web-sig list, where there are more
people involved with web frameworks. Most frameworks have their own
lists. Some can be accessed as newsgroups at news.gmane.org in the
gmane.comp.python hierarchy.

-- 
Terry Jan Reedy

From tjreedy at udel.edu  Sat Jan 22 20:04:00 2011
From: tjreedy at udel.edu (Terry Reedy)
Date: Sat, 22 Jan 2011 14:04:00 -0500
Subject: [Python-Dev] What's new 2.x in 3.x docs.
Message-ID: 
                              
The 3.x docs mostly started fresh with 3.0. The major exception is the
What's new section, which goes back to 2.0. The 2.x stuff comprises
about 650KB in the repository and whatever that translates into in the
distribution. I cannot imagine that anyone who only has 3.x and no 2.x
version would have any interest in the 2.x history. And of course, the
complete 2.x history will always be available with the latest 2.7.z. And
the cover page for 3.x could even say so and include a link. So why not
remove it from the 3.2 release (and have two separate pages for the
online version)?

-- 
Terry Jan Reedy

From solipsis at pitrou.net  Sat Jan 22 20:20:20 2011
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Sat, 22 Jan 2011 20:20:20 +0100
Subject: [Python-Dev] What's new 2.x in 3.x docs.
References: 
                              
Message-ID: <20110122202020.51920405@pitrou.net>

On Sat, 22 Jan 2011 14:04:00 -0500
Terry Reedy wrote:
> The 3.x docs mostly started fresh with 3.0. The major exception is the
> What's new section, which goes back to 2.0. The 2.x stuff comprises
> about 650KB in the repository and whatever that translates into in the
> distribution. I cannot imagine that anyone who only has 3.x and no 2.x
> version would have any interest in the 2.x history. And of course, the
> complete 2.x history will always be available with the latest 2.7.z. And
> the cover page for 3.x could even say so and include a link. So why not
> remove it from the 3.2 release (and have two separate pages for the
> online version)?

Well, is there any point in doing so, apart from saving 650KB in the
repository? I'm not sure we care about the latter (right now the
whole source tree is more than 50MB, and that's without version
control information).

Regards

Antoine.

From brett at python.org  Sat Jan 22 21:54:41 2011
From: brett at python.org (Brett Cannon)
Date: Sat, 22 Jan 2011 12:54:41 -0800
Subject: [Python-Dev] Triagers and checkin access to the devguide repository
In-Reply-To: 
                              
References: 
Message-ID: 

On Fri, Jan 21, 2011 at 18:35, Nick Coghlan wrote:
> Given that some of the dev guide docs cover triaging and other aspects
> of managing issues on the tracker, does it make sense to offer
> devguide checkin access to triagers that want it?
>

There are enough triagers with commit privileges that this is probably
not needed.

-Brett

> Regards,
> Nick.
>
> --
> Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
> _______________________________________________
> Python-Dev mailing list
> Python-Dev at python.org
> http://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe: http://mail.python.org/mailman/options/python-dev/brett%40python.org
>

From raymond.hettinger at gmail.com  Sat Jan 22 22:23:15 2011
From: raymond.hettinger at gmail.com (Raymond Hettinger)
Date: Sat, 22 Jan 2011 13:23:15 -0800
Subject: [Python-Dev] What's new 2.x in 3.x docs.
In-Reply-To: 
                              
References: 
Message-ID: <60D93B3F-FF28-4BFF-AD77-F2316F5BAC53@gmail.com>

On Jan 22, 2011, at 11:04 AM, Terry Reedy wrote:

> The 3.x docs mostly started fresh with 3.0. The major exception is the What's new section, which goes back to 2.0. The 2.x stuff comprises about 650KB in the repository and whatever that translates into in the distribution. I cannot imagine that anyone who only has 3.x and no 2.x version would have any interest in the 2.x history. And of course, the complete 2.x history will always be available with the latest 2.7.z. And the cover page for 3.x could even say so and include a link. So why not remove it from the 3.2 release (and have two separate pages for the online version)?

I think there is value in the older whatsnew docs. They provide a
readable introduction to various features and nicely augment the plain
docs which can be a little dry.

+1 for keeping the links as-is. Removing them takes away a resource and
gains nothing.

Raymond

From victor.stinner at haypocalc.com  Sun Jan 23 01:12:26 2011
From: victor.stinner at haypocalc.com (Victor Stinner)
Date: Sun, 23 Jan 2011 01:12:26 +0100
Subject: [Python-Dev] [Python-checkins] r88140 - in
	python/branches/py3k: Misc/NEWS Modules/zipimport.c
In-Reply-To: 
                              
References: <20110122103029.CE0F8EEA11@mail.python.org>
Message-ID: <1295741546.18456.0.camel@marge>

On Saturday, 22 January 2011 at 13:25 -0500, Alexander Belopolsky wrote:
> > zipimport uses ASCII encoding instead of *cp497* to decode filenames, ..
>
> Shouldn't this be "instead of *cp437*"?

Woops, correct: fixed in r88145.

Victor

From brett at python.org  Sun Jan 23 02:08:00 2011
From: brett at python.org (Brett Cannon)
Date: Sat, 22 Jan 2011 17:08:00 -0800
Subject: [Python-Dev] Beta version of the new devguide
Message-ID: 
                              
http://docs.python.org/devguide/

If you are a core developer and have a correction you want to make you
can simply check out the devguide yourself (link is in the Resources
section of the devguide) and make the corrections yourself. Otherwise
reply here (you can email me directly but I already have instances of
multiple people telling me about the same spelling mistake so it's
nice to have it public so people know when I have been informed).

As for what is left to do, there are a few things. One is fixing
some issues to allow test coverage to be run for the entire test suite
(see the coverage docs to know what issues are tracking the problems).
I will work on this next if no one beats me to it as both issues
should be relatively simple to do.

Two, what should the final URL be? Georg picked the current one and I
am happy with it.

Three, where should it be linked from? docs.python.org homepage?

Four, what to do with www.python.org/dev/? Redirect for all the pages?

Otherwise I consider the devguide ready to go. My next thing will be
an "official" HOWTO on dealing with Python 2/3 porting/maintenance.

From brett at python.org  Sun Jan 23 02:14:44 2011
From: brett at python.org (Brett Cannon)
Date: Sat, 22 Jan 2011 17:14:44 -0800
Subject: [Python-Dev] Beta version of the new devguide
In-Reply-To: 
                              
References: 
Message-ID: 

And I forgot to mention I also plan to edit the help text on the
various fields on the issue tracker to point to the triaging doc.

On Sat, Jan 22, 2011 at 17:08, Brett Cannon wrote:
> http://docs.python.org/devguide/
>
> If you are a core developer and have a correction you want to make you
> can simply check out the devguide yourself (link is in the Resources
> section of the devguide) and make the corrections yourself. Otherwise
> reply here (you can email me directly but I already have instances of
> multiple people telling me about the same spelling mistake so it's
> nice to have it public so people know when I have been informed).
>
> As for what is left to do, there are a few things. One is fixing
> some issues to allow test coverage to be run for the entire test suite
> (see the coverage docs to know what issues are tracking the problems).
> I will work on this next if no one beats me to it as both issues
> should be relatively simple to do.
>
> Two, what should the final URL be? Georg picked the current one and I
> am happy with it.
>
> Three, where should it be linked from? docs.python.org homepage?
>
> Four, what to do with www.python.org/dev/? Redirect for all the pages?
>
> Otherwise I consider the devguide ready to go. My next thing will be
> an "official" HOWTO on dealing with Python 2/3 porting/maintenance.
>

From ncoghlan at gmail.com  Sun Jan 23 02:48:29 2011
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sun, 23 Jan 2011 11:48:29 +1000
Subject: [Python-Dev] What's new 2.x in 3.x docs.
In-Reply-To: <60D93B3F-FF28-4BFF-AD77-F2316F5BAC53@gmail.com>
References: 
                              
	<60D93B3F-FF28-4BFF-AD77-F2316F5BAC53@gmail.com>
Message-ID: 

On Sun, Jan 23, 2011 at 7:23 AM, Raymond Hettinger wrote:
> On Jan 22, 2011, at 11:04 AM, Terry Reedy wrote:
>
>> The 3.x docs mostly started fresh with 3.0. The major exception is the What's new section, which goes back to 2.0. The 2.x stuff comprises about 650KB in the repository and whatever that translates into in the distribution. I cannot imagine that anyone who only has 3.x and no 2.x version would have any interest in the 2.x history. And of course, the complete 2.x history will always be available with the latest 2.7.z. And the cover page for 3.x could even say so and include a link. So why not remove it from the 3.2 release (and have two separate pages for the online version)?
>
> I think there is value in the older whatsnew docs. They provide a readable introduction to various features and nicely augment the plain docs which can be a little dry.
>
> +1 for keeping the links as-is. Removing them takes away a resource and gains nothing.

They're also a useful resource when developing compatibility guides for
projects that target older versions (including ones that support py3k
via 2to3).

With the latest 3.x release always being at the top, I agree with
Raymond that retaining the history is a better option.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From tjreedy at udel.edu  Sun Jan 23 04:05:00 2011
From: tjreedy at udel.edu (Terry Reedy)
Date: Sat, 22 Jan 2011 22:05:00 -0500
Subject: [Python-Dev] What's new 2.x in 3.x docs.
In-Reply-To: <20110122202020.51920405@pitrou.net>
References: 
                              
	<20110122202020.51920405@pitrou.net>
Message-ID: 

On 1/22/2011 2:20 PM, Antoine Pitrou wrote:
> On Sat, 22 Jan 2011 14:04:00 -0500
> Terry Reedy wrote:
>> The 3.x docs mostly started fresh with 3.0. The major exception is the
>> What's new section, which goes back to 2.0. The 2.x stuff comprises
>> about 650KB in the repository and whatever that translates into in the
>> distribution. I cannot imagine that anyone who only has 3.x and no 2.x
>> version would have any interest in the 2.x history. And of course, the
>> complete 2.x history will always be available with the latest 2.7.z. And
>> the cover page for 3.x could even say so and include a link. So why not
>> remove it from the 3.2 release (and have two separate pages for the
>> online version)?
>
> Well, is there any point in doing so, apart from saving 650KB in the
> repository? I'm not sure we care about the latter (right now the
> whole source tree is more than 50MB, and that's without version
> control information).

I was only proposing actual removal of what to me is noise from the
windows help file (now 5.6 mb) with a link to the online version. But
the idea is rejected. Fini.

-- 
Terry Jan Reedy

From prasun3 at gmail.com  Sun Jan 23 08:20:10 2011
From: prasun3 at gmail.com (Prasun Ratn)
Date: Sat, 22 Jan 2011 23:20:10 -0800
Subject: [Python-Dev] build problem
Message-ID: 
                              
Hello

I got the latest copy of python source from svn and was trying to build
it on Windows Vista (32 bit) using Microsoft Visual Express 2008.

I got the following error:

5>"C:\Program Files\TortoiseSVN\bin\subwcrev.exe" .. ..\Modules\getbuildinfo.c "E:\coding\py3kclean\py3k\PCbuild\Win32-temp-Debug\pythoncore\\getbuildinfo2.c"
5>'C:\Program' is not recognized as an internal or external command,

Adding an extra set of quotes around the command seems to fix this.
I've attached a patch.

Thanks
Prasun
-------------- next part --------------
A non-text attachment was scrubbed...
Name: buildinfo.patch
Type: application/octet-stream
Size: 1356 bytes
Desc: not available
URL: 
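
A rough sketch of why the extra pair of quotes can matter. The help text of cmd.exe describes a fallback rule for /c: when the command line starts with a quote and the "exactly two quotes" case does not apply, the leading quote and the last quote on the line are stripped. The model below is a simplified assumption of that rule, not the real parser; the function names and the shortened E:\build path are hypothetical.

```python
def strip_outer_quotes(cmdline):
    """Simplified model of the cmd.exe /c fallback: if the line starts with
    a quote and holds more than two quotes, drop the first and last quote."""
    if cmdline.startswith('"') and cmdline.count('"') > 2:
        body = cmdline[1:]
        last = body.rfind('"')
        return body[:last] + body[last + 1:]
    return cmdline


def wrap_for_cmd_c(cmdline):
    """Add the sacrificial outer pair of quotes, as the patch does."""
    return '"' + cmdline + '"'


# Hypothetical command line with a quoted program path and a quoted argument.
original = (r'"C:\Program Files\TortoiseSVN\bin\subwcrev.exe" .. '
            r'..\Modules\getbuildinfo.c "E:\build\getbuildinfo2.c"')

# Without the extra quotes the program path loses its opening quote,
# so the first whitespace-separated token becomes C:\Program.
broken = strip_outer_quotes(original)
print(broken.split()[0])

# With the extra quotes, only the sacrificial pair is removed.
fixed = strip_outer_quotes(wrap_for_cmd_c(original))
print(fixed == original)
```

Under this model the unwrapped command degrades to a first token of C:\Program, which matches the error text above, while the wrapped command survives intact.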
                              
From martin at v.loewis.de  Sun Jan 23 19:18:35 2011
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Sun, 23 Jan 2011 19:18:35 +0100
Subject: [Python-Dev] build problem
In-Reply-To: 
References: 
Message-ID: <4D3C70FB.60102@v.loewis.de>

> Adding an extra set of quotes around the command seems to fix
> this. I've attached a patch.

This is puzzling: a) AFAICT, the code works on all other systems as it
stands, and b) putting this many quotes into the command line is not
plausible.

Do you have any strange settings on your computer, such as using a
non-standard cmd shell?

Regards,
Martin

From martin at v.loewis.de  Sun Jan 23 19:23:02 2011
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Sun, 23 Jan 2011 19:23:02 +0100
Subject: [Python-Dev] web framework for py3k
In-Reply-To: 
                              
References: 
Message-ID: <4D3C7206.6080609@v.loewis.de>

On 22.01.2011 19:16, yeswanth wrote:
> I would want to help porting some web framework for py3k .. I want to
> know which one is good and which can be ported easily . Also i
> would require some guidance for this work as I am just a beginner here ..

Yeswanth,

Terry already indicated that this is the wrong list. The right list
would be the python-porting list:

http://mail.python.org/mailman/listinfo/python-porting

As for which framework can be ported easily: that is, unfortunately,
difficult to tell in advance. I'd expect that sheer number of lines
gives a good indicator: the smaller the framework, the easier it
should be to port.

HTH,
Martin

From solipsis at pitrou.net  Sun Jan 23 19:58:41 2011
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Sun, 23 Jan 2011 19:58:41 +0100
Subject: [Python-Dev] r88147 - in python/branches/py3k: Misc/NEWS
	Modules/_pickle.c Tools/scripts/find_recursionlimit.py
References: <20110123171226.131CCEE98B@mail.python.org>
Message-ID: <20110123195841.40d2bbff@pitrou.net>

On Sun, 23 Jan 2011 18:12:26 +0100 (CET)
antoine.pitrou wrote:
> Author: antoine.pitrou
> Date: Sun Jan 23 18:12:25 2011
> New Revision: 88147
>
> Log:
> Issue #10987: Fix the recursion limit handling in the _pickle module.

I forgot to mention that it was ok'ed by Georg, so there it is.

Regards

Antoine.

From brett at python.org  Sun Jan 23 21:22:41 2011
From: brett at python.org (Brett Cannon)
Date: Sun, 23 Jan 2011 12:22:41 -0800
Subject: [Python-Dev] Beta version of the new devguide
In-Reply-To: <20110123075621.468d07c9@dino>
References: 
                              
	<20110123075621.468d07c9@dino>
Message-ID: 

On Sat, Jan 22, 2011 at 23:56, Mark Summerfield wrote:
> Hi Brett,
>
> On Sat, 22 Jan 2011 17:08:00 -0800
> Brett Cannon wrote:
>> http://docs.python.org/devguide/
>
> Personally, I found the first paragraph of "Contributing" a bit
> off-putting.
>
> How about replacing:
>
>    People who wish to contribute to Python must read the following
>    documents in the order provided. You can stop where you feel
>    comfortable and begin contributing immediately without reading and
>    understanding these documents all at once, but please do not skip
>    around within the documentation as everything is written assuming
>    preceding documentation has been read.
>
> With something like:
>
>    The Python core development team always welcomes new contributors,
>    so we are very glad of your interest! Please read the following
>    documents---in the order shown---to ensure that you understand how
>    Python's development process works. This will ensure that your
>    contributions are considered purely on their merit and don't get
>    rejected due to missing or incorrectly performing a step in the
>    process.
>

I'll see what I can do.

> In "Getting Set Up" it describes how to build a pydebug build. Is that
> really necessary for those who plan only to contribute by working on
> pure Python code?
>

Yes, there is actually a laundry list of reasons even people only
working on the stdlib should use a pydebug build.

> I had a quick skim over the rest and got the feeling that no clear
> distinction is made between C and Python work. Personally, I feel that
> more of a distinction should be made since not everyone will be
> confident or interested in C. (And maybe more distinction should be made
> between working on CPython and the standard library?)

I don't see where the distinction between extensions and Python code
would serve a purpose beyond clouding the documents by adding more
details. People who know both are fine and the people who don't can
start off ignorant and work their way up.

As for CPython/Python distinction, they are so intertwined at the
moment that the distinction is once again not worth it beyond what I
have already done. When the stdlib is separated from CPython then the
delineation of one over the other will become worth it.

>
> Overall I think this document is *extremely welcome* and I am very glad
> you have done it. I'm sure that once it starts to get known it will help
> add to the pool of people contributing to Python as well as helping to
> keep the processes clear:-)

=) That's the hope.

-Brett

>
> --
> Mark Summerfield, Qtrac Ltd, www.qtrac.eu
>    C++, Python, Qt, PyQt - training and consultancy
>        "Advanced Qt Programming" - ISBN 0321635906
>            http://www.qtrac.eu/aqpbook.html
>

From victor.stinner at haypocalc.com  Sun Jan 23 21:22:31 2011
From: victor.stinner at haypocalc.com (Victor Stinner)
Date: Sun, 23 Jan 2011 21:22:31 +0100
Subject: [Python-Dev] build problem
In-Reply-To: <4D3C70FB.60102@v.loewis.de>
References: 
                              
	<4D3C70FB.60102@v.loewis.de>
Message-ID: <1295814151.22114.4.camel@marge>

On Sunday, 23 January 2011 at 19:18 +0100, "Martin v. Löwis" wrote:
> > Adding an extra set of quotes around the command seems to fix
> > this. I've attached a patch.

Hey! I already wrote exactly the same patch! But I didn't propose it
upstream because I was unable to reproduce the bug.

> This is puzzling: a) AFAICT, the code works on all other systems as it
> stands,

I had this issue already twice, but later (after a reboot? I don't
remember) it worked again (without the patch). It might be related to an
upgrade of TortoiseSVN (try to upgrade TortoiseSVN without rebooting).

> b) putting this many quotes into the command line is not plausible.

""c:\path\to\subwcrev.exe" arg1 arg2 ..." just works. I don't understand
why (strange syntax), but it works :-)

When I had the problem, it worked with extra quotes, but not without. It
is strange because the program ("c:\path\to\subwcrev.exe") existed!?

Victor

From martin at v.loewis.de  Sun Jan 23 23:10:55 2011
From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=)
Date: Sun, 23 Jan 2011 23:10:55 +0100
Subject: [Python-Dev] build problem
In-Reply-To: <1295814151.22114.4.camel@marge>
References: 
                              
	<4D3C70FB.60102@v.loewis.de> <1295814151.22114.4.camel@marge>
Message-ID: <4D3CA76F.9070702@v.loewis.de>

> ""c:\path\to\subwcrev.exe" arg1 arg2 ..." just works. I don't understand
> why (strange syntax), but it works :-)
>
> When I had the problem, it worked with extra quotes, but not without. It
> is strange because the program ("c:\path\to\subwcrev.exe") existed!?

I'd really like to understand it before changing it. The part "it
sometimes works, then fails" is particularly puzzling, and indicates
that the *actual* problem is entirely unrelated to the quoting.

Regards,
Martin

From tjreedy at udel.edu  Mon Jan 24 00:45:50 2011
From: tjreedy at udel.edu (Terry Reedy)
Date: Sun, 23 Jan 2011 18:45:50 -0500
Subject: [Python-Dev] r88147 - in python/branches/py3k: Misc/NEWS
	Modules/_pickle.c Tools/scripts/find_recursionlimit.py
In-Reply-To: <20110123195841.40d2bbff@pitrou.net>
References: <20110123171226.131CCEE98B@mail.python.org>
	<20110123195841.40d2bbff@pitrou.net>
Message-ID: 
                              
On 1/23/2011 1:58 PM, Antoine Pitrou wrote:

>> Issue #10987: Fix the recursion limit handling in the _pickle module.

12 hours after the report!

I am still curious why a previous exception changed pickle behavior, and
only in 3.2, but I would rather you fix another bug than spending much
time to get me up to speed on the intricacies of _pickle ;-).

-- 
Terry Jan Reedy

From fuzzyman at voidspace.org.uk  Mon Jan 24 00:54:08 2011
From: fuzzyman at voidspace.org.uk (Michael Foord)
Date: Sun, 23 Jan 2011 23:54:08 +0000
Subject: [Python-Dev] Beta version of the new devguide
In-Reply-To: 
                              
References: 
	<20110123075621.468d07c9@dino>
Message-ID: <4D3CBFA0.1000402@voidspace.org.uk>

On 23/01/2011 20:22, Brett Cannon wrote:
> [snip...]
>> I had a quick skim over the rest and got the feeling that no clear
>> distinction is made between C and Python work. Personally, I feel that
>> more of a distinction should be made since not everyone will be
>> confident or interested in C. (And maybe more distinction should be made
>> between working on CPython and the standard library?)
> I don't see where the distinction between extensions and Python code
> would serve a purpose beyond clouding the documents by adding more
> details. People who know both are fine and the people who don't can
> start off ignorant and work their way up.
>

I think a lot of people assume that unless they know C they can't
contribute to Python. I don't know where the best place is but it would
be good to make it *clear* that this isn't true.

All the best,

Michael Foord

> As for CPython/Python distinction, they are so intertwined at the
> moment that the distinction is once again not worth it beyond what I
> have already done. When the stdlib is separated from CPython then the
> delineation of one over the other will become worth it.
>
>> Overall I think this document is *extremely welcome* and I am very glad
>> you have done it. I'm sure that once it starts to get known it will help
>> add to the pool of people contributing to Python as well as helping to
>> keep the processes clear:-)
> =) That's the hope.
>
> -Brett
>
>> --
>> Mark Summerfield, Qtrac Ltd, www.qtrac.eu
>>      C++, Python, Qt, PyQt - training and consultancy
>>          "Advanced Qt Programming" - ISBN 0321635906
>>              http://www.qtrac.eu/aqpbook.html
>>
> _______________________________________________
> Python-Dev mailing list
> Python-Dev at python.org
> http://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe: http://mail.python.org/mailman/options/python-dev/fuzzyman%40voidspace.org.uk

-- 
http://www.voidspace.org.uk/

May you do good and not evil
May you find forgiveness for yourself and forgive others
May you share freely, never taking more than you give.
-- the sqlite blessing http://www.sqlite.org/different.html

From solipsis at pitrou.net  Mon Jan 24 01:02:32 2011
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Mon, 24 Jan 2011 01:02:32 +0100
Subject: [Python-Dev] r88147 - in python/branches/py3k: Misc/NEWS
	Modules/_pickle.c Tools/scripts/find_recursionlimit.py
References: <20110123171226.131CCEE98B@mail.python.org>
	<20110123195841.40d2bbff@pitrou.net>
                              
Message-ID: <20110124010232.0f3c5823@pitrou.net>

On Sun, 23 Jan 2011 18:45:50 -0500
Terry Reedy wrote:
> On 1/23/2011 1:58 PM, Antoine Pitrou wrote:
>
> >> Issue #10987: Fix the recursion limit handling in the _pickle module.
>
> 12 hours after the report!
>
> I am still curious why a previous exception changed pickle behavior, and
> only in 3.2, but I would rather you fix another bug than spending much
> time to get me up to speed on the intricacies of _pickle ;-).

It was not about a previous exception. The issue is that pickle detected
the recursion overflow but returned a successful status after having set
the exception. This is the kind of mistake that produces strange
"delayed" exceptions.

Regards

Antoine.

From martin at v.loewis.de  Mon Jan 24 01:28:26 2011
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Mon, 24 Jan 2011 01:28:26 +0100
Subject: [Python-Dev] r88147 - in python/branches/py3k: Misc/NEWS
	Modules/_pickle.c Tools/scripts/find_recursionlimit.py
In-Reply-To: 
                              
References: <20110123171226.131CCEE98B@mail.python.org>
	<20110123195841.40d2bbff@pitrou.net>
Message-ID: <4D3CC7AA.3070204@v.loewis.de>

> I am still curious why a previous exception changed pickle behavior, and
> only in 3.2, but I would rather you fix another bug than spending much
> time to get me up to speed on the intricacies of _pickle ;-).

IIUC, the code change made pickle actually aware of the exception,
rather than just setting it in the thread state, but then happily
declaring that pickle succeeded (with what would turn out to be
incorrect data).

As for why an explicit exception breaks the reporting, and omitting
it makes it report the exception correctly: the report that it gave
wasn't actually correct. I got

Traceback (most recent call last):
  File "a.py", line 4, in
    for i in range(100):
RuntimeError: maximum recursion depth exceeded while pickling an object

So the exception is reported on the range call, or the for loop.
After the change, we get

Traceback (most recent call last):
  File "a.py", line 7, in
    _pickle.Pickler(io.BytesIO(), protocol=-1).dump(l)
RuntimeError: maximum recursion depth exceeded while pickling an object

So it appears that the interpreter would actually pick up the
exception set by pickle, and attribute it to the for loop. When
you add an explicit raise, this raise will clear the stack
overflow exception, and set the new exception.

So the error manages to pass silently, without being explicitly
silenced.

I wonder whether we could sprinkle more exception-set? checks in
the interpreter loop, at least in debug mode.

It's a design flaw in CPython that there are two ways to report
an exception: either through the thread state, or through the
return value. I don't think this flaw can be fully fixed. However,
I wonder whether static analysis of the C code could produce
better detection of this kind of bug.

Regards,
Martin

From solipsis at pitrou.net  Mon Jan 24 02:16:44 2011
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Mon, 24 Jan 2011 02:16:44 +0100
Subject: [Python-Dev] r88147 - in python/branches/py3k: Misc/NEWS
	Modules/_pickle.c Tools/scripts/find_recursionlimit.py
References: <20110123171226.131CCEE98B@mail.python.org>
	<20110123195841.40d2bbff@pitrou.net>
                              
	<4D3CC7AA.3070204@v.loewis.de>
Message-ID: <20110124021644.7dc67ccb@pitrou.net>

On Mon, 24 Jan 2011 01:28:26 +0100
"Martin v. Löwis" wrote:
>
> I wonder whether we could sprinkle more exception-set? checks in
> the interpreter loop, at least in debug mode.

Yes, this would be nice. Nicer if it can be centralized, of course.

That said, it probably wouldn't have helped here, since the code which
exhibited the bug (the find_recursion_limit.py script) is basically
never run automatically, and very rarely by a human.

Regards

Antoine.

From prasun3 at gmail.com  Mon Jan 24 04:08:33 2011
From: prasun3 at gmail.com (prasun3 at gmail.com)
Date: Sun, 23 Jan 2011 19:08:33 -0800
Subject: [Python-Dev] build problem
In-Reply-To: <4D3CA76F.9070702@v.loewis.de>
References: 
                              
                              <4D3C70FB.60102@v.loewis.de> <1295814151.22114.4.camel@marge> <4D3CA76F.9070702@v.loewis.de> Message-ID: 
                              
On Sun, Jan 23, 2011 at 2:10 PM, "Martin v. Löwis"
                              
wrote:
>> ""c:\path\to\subwcrev.exe" arg1 arg2 ..." just works. I don't understand
>> why (strange syntax), but it works :-)
>>
>> When I had the problem, it worked with extra quotes, but not without. It
>> is strange because the program ("c:\path\to\subwcrev.exe") existed!?
>
> I'd really like to understand it before changing it. The part "it
> sometimes works, then fails" is particularly puzzling, and indicates
> that the *actual* problem is entirely unrelated to the quoting.

I used ProcMon to track down the actual command that the system() call
creates. The unmodified code produces this:

C:\Windows\system32\cmd.exe /c "C:\Program Files\TortoiseSVN\bin\subwcrev.exe" .. ..\Modules\getbuildinfo.c "E:\coding\py3k\PCbuild\Win32-temp-pgi\pythoncore\\getbuildinfo2.c"

whereas my patch produces this:

C:\Windows\system32\cmd.exe /c ""C:\Program Files\TortoiseSVN\bin\subwcrev.exe" .. ..\Modules\getbuildinfo.c "E:\coding\py3k\PCbuild\Win32-temp-pgi\pythoncore\\getbuildinfo2.c""

I pasted those two lines on the command prompt. The first results in the
error "'C:\Program' is not recognized ...... ". The second one does the
right thing.

It would be great if someone could run ProcMon on a "normal" system and
see what command is created.

Thanks
Prasun

From stephen at xemacs.org Mon Jan 24 03:33:28 2011
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Mon, 24 Jan 2011 11:33:28 +0900
Subject: [Python-Dev] Import and unicode: part two
In-Reply-To: <4D3A1BC9.40604@v.loewis.de>
References: <1295440442.432.18.camel@marge>
                              
                              
                              
                              
                              
                              
                              
                              <871v467ih3.fsf@uwakimon.sk.tsukuba.ac.jp> 
                              
<4D3A1BC9.40604@v.loewis.de>
Message-ID: <87y66b58uf.fsf@uwakimon.sk.tsukuba.ac.jp>

"Martin v. Löwis" writes:

 > Actually, as long people only involve Windows, or only involve Mac,
 > it will all work just fine. It's only when they use non-Mac Unix
 > (such as Linux), or try to move files across systems using sub-prime
 > technology (such as your typical Windows zip utility) they will run
 > into problems.

I believe that the kind of thing that Ishimoto-san has in mind is
things like "smart cameras" that will upload your photos to your blog
with one touch on the camera's screen and other "Web 2.0 for the rest
of us" apps. What with the popularity of Linux and *BSD for such
sites, it's easy to imagine problems of the kind he describes
occurring between those apps (which will probably be using Shift JIS
in Japan) and the websites.

Why people with the skills to be actually using Python would have a
problem like that, I don't know, but my experience with Japanese
vendors is no different from anywhere else: they put the blame for
bugs in systems on any convenient component other than their own or
close business partners'. Open source is especially convenient
because of the NO WARRANTY section prominently displayed in all
licenses.

 > So the more people get confronted with the poor support of non-ASCII
 > file names in tools, the faster the tools will improve. It took PKWARE
 > many years to come up with a reasonable Unicode story - but now it's
 > really the tools that need to catch up, not the spec.

I still agree with this point of view, but there is some scope for
discussion of whether these tools should be "included batteries" or
not. (Unfortunately I'm not in a position to volunteer to help with
them for some time. :-( )

From guido at python.org Mon Jan 24 05:07:45 2011
From: guido at python.org (Guido van Rossum)
Date: Sun, 23 Jan 2011 20:07:45 -0800
Subject: [Python-Dev] Import and unicode: part two
In-Reply-To: <87y66b58uf.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <1295440442.432.18.camel@marge>
                              
                              
                              
                              
                              
                              
                              
                              <871v467ih3.fsf@uwakimon.sk.tsukuba.ac.jp> 
                              
                              <4D3A1BC9.40604@v.loewis.de> <87y66b58uf.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: 
                              
                              On Sun, Jan 23, 2011 at 6:33 PM, Stephen J. Turnbull 
                              
wrote:
> "Martin v. Löwis" writes:
>  > Actually, as long people only involve Windows, or only involve Mac,
>  > it will all work just fine. It's only when they use non-Mac Unix
>  > (such as Linux), or try to move files across systems using sub-prime
>  > technology (such as your typical Windows zip utility) they will run
>  > into problems.
>
> I believe that the kind of thing that Ishimoto-san has in mind is
> things like "smart cameras" that will upload your photos to your blog
> with one touch on the cameras screen and other "Web 2.0 for the rest
> of us" apps. What with the popularity of Linux and *BSD for such
> sites, it's easy to imagine problems of the kind he describes
> occurring between those (which will probably be using Shift JIS in
> Japan) apps and the websites.

Really? I would have thought that cell phones have long been the
platforms most supportive of Unicode. IIRC Nokia's Python port to S60
*required* Unicode strings for all system interfaces. Android, using
Java, also is pretty much all Unicode inside. Am I naive to generalize
from these two examples?

(This is not meant as a rhetorical question -- I may well be missing
something and am genuinely curious about the answer.)

-- 
--Guido van Rossum (python.org/~guido)

From stephen at xemacs.org Mon Jan 24 10:19:00 2011
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Mon, 24 Jan 2011 18:19:00 +0900
Subject: [Python-Dev] Import and unicode: part two
In-Reply-To: 
                              
                              References: <1295440442.432.18.camel@marge> 
                              
                              
                              
                              
                              
                              
                              
                              <871v467ih3.fsf@uwakimon.sk.tsukuba.ac.jp> 
                              
                              <4D3A1BC9.40604@v.loewis.de> <87y66b58uf.fsf@uwakimon.sk.tsukuba.ac.jp> 
                              
                              Message-ID: <87pqrm64mz.fsf@uwakimon.sk.tsukuba.ac.jp> Guido van Rossum writes: > Really? I would have thought that cell phones have long been the > platforms most supportive of Unicode. I would think so too, except in Japan. However, my previous phones exposed file systems with names encoded in Shift JIS to USB and IR browsers, though. (My current one uses Bluetooth, and I don't know how to "get at" the filesystem itself.) A lot of these devices also tend to present themselves as VFAT-formatted drives (a la a USB memory stick), and Shift JIS is very commonly used on those for reasons I don't really understand. In any case, AIUI here the problem is like the problem of refactoring a "make"-based system. There are identifiers which are "spelled" one way inside of files which need to match the "spelling" of names of external filesystem objects. If you transport such a set of files to a POSIX system (which AFAIK most servers still are), then it's quite possible that the file names will get translated to the locale's encoding while the identifiers will not. From martin at v.loewis.de Mon Jan 24 10:45:35 2011 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Mon, 24 Jan 2011 10:45:35 +0100 Subject: [Python-Dev] Import and unicode: part two In-Reply-To: <87pqrm64mz.fsf@uwakimon.sk.tsukuba.ac.jp> References: <1295440442.432.18.camel@marge> 
                              
                              
                              
                              
                              
                              
                              
                              <871v467ih3.fsf@uwakimon.sk.tsukuba.ac.jp> 
                              
                              <4D3A1BC9.40604@v.loewis.de> <87y66b58uf.fsf@uwakimon.sk.tsukuba.ac.jp> 
                              
                              <87pqrm64mz.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <4D3D4A3F.1020502@v.loewis.de> > > Really? I would have thought that cell phones have long been the > > platforms most supportive of Unicode. > > I would think so too, except in Japan. > > However, my previous phones exposed file systems with names encoded in > Shift JIS to USB and IR browsers, though. (My current one uses > Bluetooth, and I don't know how to "get at" the filesystem itself.) A > lot of these devices also tend to present themselves as VFAT-formatted > drives (a la a USB memory stick), and Shift JIS is very commonly used > on those for reasons I don't really understand. It's one thing how the file systems are formatted, but another thing how they are presented to APIs. For example, the phones using Windows CE would have to convert the file names to Unicode in the OS kernel. So: for these phones - do you know how they present file names to the application? Regards, Martin From ncoghlan at gmail.com Mon Jan 24 11:33:07 2011 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 24 Jan 2011 20:33:07 +1000 Subject: [Python-Dev] Beta version of the new devguide In-Reply-To: 
                              
                              References: 
                              
                              <20110123075621.468d07c9@dino> 
                              
                              Message-ID: 
                              
                              On Mon, Jan 24, 2011 at 6:22 AM, Brett Cannon 
                              
wrote:
>> In "Getting Set Up" it describes how to build a pydebug build. Is that
>> really necessary for those who plan only to contribute by working on
>> pure Python code?
>>
>
> Yes, there is actually a laundry list of reasons even people only
> working on the stdlib should use a pydebug build.

And one big reason why I don't unless I have a specific need to check
something with it - it makes the already quite long running time for
the full test suite take even longer :)

I figure it's beneficial to have people running a mixture of debug and
release builds anyway - it helps catch things that work in one mode
and not the other.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From stephen at xemacs.org Mon Jan 24 11:35:22 2011
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Mon, 24 Jan 2011 19:35:22 +0900
Subject: [Python-Dev] Import and unicode: part two
In-Reply-To: <4D3D4A3F.1020502@v.loewis.de>
References: <1295440442.432.18.camel@marge>
                              
                              
                              
                              
                              
                              
                              
                              <871v467ih3.fsf@uwakimon.sk.tsukuba.ac.jp> 
                              
                              <4D3A1BC9.40604@v.loewis.de> <87y66b58uf.fsf@uwakimon.sk.tsukuba.ac.jp> 
                              
<87pqrm64mz.fsf@uwakimon.sk.tsukuba.ac.jp> <4D3D4A3F.1020502@v.loewis.de>
Message-ID: <87lj2a613p.fsf@uwakimon.sk.tsukuba.ac.jp>

"Martin v. Löwis" writes:

 > It's one thing how the file systems are formatted, but another thing
 > how they are presented to APIs. For example, the phones using Windows CE
 > would have to convert the file names to Unicode in the OS kernel.
 >
 > So: for these phones - do you know how they present file names to the
 > application?

First of all, these aren't just phones; these are all kinds of gadgets
(the example I gave was a camera). They're not as smart as an Android
or iPhone-like device, and I don't know what OS they use.

As for "presentation to the application", as I said, my older phones
presented themselves as "removable memory devices" (specifically on
the USB port), with VFAT-formatted file systems and Shift JIS file
names. In that case you can surely have the kinds of problems
described, even if the app is not running on the device itself.

I don't know if this is still true of more modern devices, but I was a
little shocked that it was true at all, even 5 or 6 years ago. That
may be one reason why the phone I have now doesn't provide a USB
interface at all. That kind of interface is not only unnecessary with
Bluetooth, but Bluetooth uses more robust protocols.

From ncoghlan at gmail.com Mon Jan 24 11:51:34 2011
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Mon, 24 Jan 2011 20:51:34 +1000
Subject: [Python-Dev] Import and unicode: part two
In-Reply-To: <87lj2a613p.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <1295440442.432.18.camel@marge>
                              
                              
                              
                              
                              
                              
                              
                              <871v467ih3.fsf@uwakimon.sk.tsukuba.ac.jp> 
                              
                              <4D3A1BC9.40604@v.loewis.de> <87y66b58uf.fsf@uwakimon.sk.tsukuba.ac.jp> 
                              
                              <87pqrm64mz.fsf@uwakimon.sk.tsukuba.ac.jp> <4D3D4A3F.1020502@v.loewis.de> <87lj2a613p.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: 
                              
                              On Mon, Jan 24, 2011 at 8:35 PM, Stephen J. Turnbull 
                              
wrote:
> First of all, these aren't just phones; these are all kinds of gadgets
> (the example I gave was a camera). They're not as smart as an Android
> or iPhone-like device, and I don't know what OS they use.

We're getting a little far afield from the original question though -
once it was pointed out that non-ASCII module names already work on
some systems but not others, it became fairly clear that Victor's
patch is about fixing an existing feature to be more robust rather
than adding something new.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From list at qtrac.plus.com Sun Jan 23 08:56:21 2011
From: list at qtrac.plus.com (Mark Summerfield)
Date: Sun, 23 Jan 2011 07:56:21 +0000
Subject: [Python-Dev] Beta version of the new devguide
In-Reply-To: 
                              
                              References: 
                              
                              Message-ID: <20110123075621.468d07c9@dino> Hi Brett, On Sat, 22 Jan 2011 17:08:00 -0800 Brett Cannon 
                              
                              wrote: > http://docs.python.org/devguide/ Personally, I found the first paragraph of "Contributing" a bit off-putting. How about replacing: People who wish to contribute to Python must read the following documents in the order provided. You can stop where you feel comfortable and begin contributing immediately without reading and understanding these documents all at once, but please do not skip around within the documentation as everything is written assuming preceding documentation has been read. With something like: The Python core development team always welcomes new contributors, so we are very glad of your interest! Please read the following documents---in the order shown---to ensure that you understand how Python's development process works. This will ensure that your contributions are considered purely on their merit and don't get rejected due to missing or incorrectly performing a step in the process. In "Getting Set Up" it describes how to build a pydebug build. Is that really necessary for those who plan only to contribute by working on pure Python code? I had a quick skim over the rest and got the feeling that no clear distinction is made between C and Python work. Personally, I feel that more of a distinction should be made since not everyone will be confident or interested in C. (And maybe more distinction should be made between working on CPython and the standard library?) Overall I think this document is *extremely welcome* and I am very glad you have done it. 
I'm sure that once it starts to get known it will help add to the pool of people contributing to Python as well as helping to keep the processes clear:-) -- Mark Summerfield, Qtrac Ltd, www.qtrac.eu C++, Python, Qt, PyQt - training and consultancy "Advanced Qt Programming" - ISBN 0321635906 http://www.qtrac.eu/aqpbook.html From solipsis at pitrou.net Mon Jan 24 12:29:58 2011 From: solipsis at pitrou.net (Antoine Pitrou) Date: Mon, 24 Jan 2011 12:29:58 +0100 Subject: [Python-Dev] Beta version of the new devguide References: 
                              
                              <20110123075621.468d07c9@dino> 
                              
                              
                              Message-ID: <20110124122958.589246ed@pitrou.net> On Mon, 24 Jan 2011 20:33:07 +1000 Nick Coghlan 
                              
                              wrote: > On Mon, Jan 24, 2011 at 6:22 AM, Brett Cannon 
                              
                              wrote: > >> In "Getting Set Up" it describes how to build a pydebug build. Is that > >> really necessary for those who plan only to contribute by working on > >> pure Python code? > >> > > > > Yes, there is actually a laundry list of reasons even people only > > working on the stdlib should use a pydebug build. > > And one big reason why I don't unless I have a specific need to check > something with it - it makes the already quite long running time for > the full test suite take even longer :) Please try the -j option to regrtest. Regards Antoine. From earney at umsystem.edu Mon Jan 24 14:56:56 2011 From: earney at umsystem.edu (Earney, Billy C.) Date: Mon, 24 Jan 2011 07:56:56 -0600 Subject: [Python-Dev] tahoe-lafs Message-ID: <59ACE054B23B6045ACC43DBD778609BD6867D17257@UM-EMAIL02.um.umsystem.edu> Greetings! I know that this list is for python development questions/comments, but I wanted to bring up the tahoe-lafs project if people are interested in a project developed in python that allows for secure distributed storage. For more information see http://tahoe-lafs.org For those of you interested in joining a tahoe-lafs storage grid, I'm a member of a newly created storage grid called volunteer-grid2, and we are currently looking for new members. The requirements to be a member can be viewed at http://bigpig.org/twiki/bin/view/Main/AboutVolunteerGrid2 Billy Earney earney at umsystem.edu
                              
                              Programmer/Analyst-Expert [cid:image001.gif at 01CBBB9B.FB70CDB0] MySQL Certified DBA Office of Social and Economic Data Analysis (OSEDA) University of Missouri Phone: 573-882-7396 Fax: 573-884-4635 -------------- next part -------------- An HTML attachment was scrubbed... URL: 
                              
                              -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.gif Type: image/gif Size: 1989 bytes Desc: image001.gif URL: 
                              
                              From solipsis at pitrou.net Mon Jan 24 15:14:27 2011 From: solipsis at pitrou.net (Antoine Pitrou) Date: Mon, 24 Jan 2011 15:14:27 +0100 Subject: [Python-Dev] tahoe-lafs References: <59ACE054B23B6045ACC43DBD778609BD6867D17257@UM-EMAIL02.um.umsystem.edu> Message-ID: <20110124151427.3526dd1a@pitrou.net> On Mon, 24 Jan 2011 07:56:56 -0600 "Earney, Billy C." 
                              
                              wrote: > Greetings! > > I know that this list is for python development questions/comments, but I wanted to bring up the tahoe-lafs project [...] You should really post such messages to comp.lang.python. From ncoghlan at gmail.com Mon Jan 24 16:33:04 2011 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 25 Jan 2011 01:33:04 +1000 Subject: [Python-Dev] Beta version of the new devguide In-Reply-To: <20110124122958.589246ed@pitrou.net> References: 
                              
                              <20110123075621.468d07c9@dino> 
                              
                              
                              <20110124122958.589246ed@pitrou.net> Message-ID: 
                              
                              On Mon, Jan 24, 2011 at 9:29 PM, Antoine Pitrou 
                              
                              wrote: > On Mon, 24 Jan 2011 20:33:07 +1000 > Nick Coghlan 
                              
                              wrote: >> On Mon, Jan 24, 2011 at 6:22 AM, Brett Cannon 
                              
wrote:
>> >> In "Getting Set Up" it describes how to build a pydebug build. Is that
>> >> really necessary for those who plan only to contribute by working on
>> >> pure Python code?
>> >>
>> >
>> > Yes, there is actually a laundry list of reasons even people only
>> > working on the stdlib should use a pydebug build.
>>
>> And one big reason why I don't unless I have a specific need to check
>> something with it - it makes the already quite long running time for
>> the full test suite take even longer :)
>
> Please try the -j option to regrtest.

While I must admit I'm still not in the habit of running tests in
parallel, that's a substantial speed improvement regardless of build
type, so a non-debug build is still noticeably faster.

release (with -j4): 2 min 25 sec (3 min wall clock time)
pydebug (with -j4): 4 min 43 sec (10 min wall clock time)

Given that I typically *don't* need the extra info from a debug build
to analyse problems and a full configure and rebuild cycle takes less
time than a single pydebug test run, I'll happily stick with the much
faster test execution that comes from using a release build.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From ncoghlan at gmail.com Mon Jan 24 16:35:05 2011
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Tue, 25 Jan 2011 01:35:05 +1000
Subject: [Python-Dev] tahoe-lafs
In-Reply-To: <59ACE054B23B6045ACC43DBD778609BD6867D17257@UM-EMAIL02.um.umsystem.edu>
References: <59ACE054B23B6045ACC43DBD778609BD6867D17257@UM-EMAIL02.um.umsystem.edu>
Message-ID: 
                              
                              On Mon, Jan 24, 2011 at 11:56 PM, Earney, Billy C. 
                              
                              wrote: > Greetings! > > > > I know that this list is for python development questions/comments, > People that post questions innocently unaware of the nature of this list have an excuse. You don't. This is not a good way to encourage people to think well of you or your project. Regards, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia -------------- next part -------------- An HTML attachment was scrubbed... URL: 
                              
                              From solipsis at pitrou.net Mon Jan 24 16:46:01 2011 From: solipsis at pitrou.net (Antoine Pitrou) Date: Mon, 24 Jan 2011 16:46:01 +0100 Subject: [Python-Dev] Beta version of the new devguide References: 
                              
                              Message-ID: <20110124164601.6b984c6c@pitrou.net> On Sat, 22 Jan 2011 17:08:00 -0800 Brett Cannon 
                              
wrote:
>
> Two, what should the final URL be? Georg picked the current one and I
> am happy with it.

Ditto for me.

> Three, where should it be linked from? docs.python.org homepage?
> Four, what to do with www.python.org/dev/? Redirect for all the pages?

Right, this whole area (wpo/dev) looks obsolete to me. The devguide
allows us to easily edit and improve development-related docs, which is
great! It should be accessible easily from the main site.

Perhaps "core development" should be renamed "contributing" and
redirect to the devguide. Also, the submenu displayed below "core
development" can be trimmed dramatically.

(then there's the question of whether the devguide should be exhaustive;
should it contain reference-like material about all aspects of core
development?)

Regards

Antoine.

From vstinner at edenwall.com Mon Jan 24 16:39:39 2011
From: vstinner at edenwall.com (Victor Stinner)
Date: Mon, 24 Jan 2011 16:39:39 +0100
Subject: [Python-Dev] Import and unicode: part two
In-Reply-To: <87lj2a613p.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <1295440442.432.18.camel@marge> <4D3D4A3F.1020502@v.loewis.de> <87lj2a613p.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: <201101241639.39345.vstinner@edenwall.com>

On Monday 24 January 2011 at 11:35:22, Stephen J. Turnbull wrote:
> ... VFAT-formatted file systems and Shift JIS file names ...

I missed something: VFAT stores filenames as unicode (whereas FAT only
supports byte filenames). Well, VFAT stores filenames twice: as a 8+3 byte
strings and as a 255 unicode (UTF-16-LE) string (UTF-16-LE).

On which OS do you access this VFAT file system? On Windows, you have two
APIs: bytes (*A) and wide character (*W). If you use the wide character, there
is explicit encoding at all. Linux has two mount options to control unicode on
a VFAT filesystem: "codepage" for the byte filenames (use Shift JIS here) and
"iocharset" for the unicode filenames (I don't understand this option).

Anyway, both systems support unicode filenames. I suppose that Shift JIS is
used to encode the filename in the 8+3 byte string form.

Victor

From spoettl at hotmail.com Mon Jan 24 17:39:54 2011
From: spoettl at hotmail.com (Stefan Spoettl)
Date: Mon, 24 Jan 2011 16:39:54 +0000
Subject: [Python-Dev] (no subject)
Message-ID: 
                              
Using:
Python 2.7.0+ (r27:82500, Sep 15 2010, 18:14:55)
[GCC 4.4.5] on linux2
(Ubuntu 10.10)

Method to reproduce error:

1. Defining a module which is later imported by another:
---------------------------------------------------------------------
class SomeThing:
    def __init__(self):
        self.variable = 'Where is my bytecode?'
    def deliver(self):
        return self.variable

if __name__ == '__main__':
    obj = SomeThing()
    print obj.deliver()
---------------------------------------------------------------------
2. Run this module:
Output of the Python Shell: Where is my bytecode?
                            >>>
3. Defining the importing module:
---------------------------------------------------------------------
class UseSomeThing:
    def __init__(self, something):
        self.anything = something
    def giveanything(self):
        return self.anything

if __name__ == '__main__':
    anything = UseSomeThing(SomeThing.SomeThing().deliver()).giveanything()
    print anything
---------------------------------------------------------------------
4. Run this module:
Output of the Python Shell: Where is my bytecode
                            >>>
(One can find SomeThing.pyc on the disc.)
5. Changing the imported module:
---------------------------------------------------------------------
class SomeThing:
    def __init__(self):
        self.variable = 'What the hell is this? It could not be Python!'
    def deliver(self):
        return self.variable

if __name__ == '__main__':
    obj = SomeThing()
    print obj.deliver()
---------------------------------------------------------------------
6. Run the changed module:
Output of the Python Shell: What the hell is this? It could not be Python!
                            >>>
7. Run the importing module again:
Output of the Python Shell: Where is my bytecode?
                            >>>
8. Deleting the bytecode of the imported module makes no effect!

Remark: I think that I have observed yesterday late night a similar effect on
Windows XP with Python 2.7.1 and Python 3.1.3. But when I have tried it out
today in the morning the error hasn't appeared. So it may be that the Python
interpreter isn't working correctly only on Ubuntu 10.10.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
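
One possible explanation for the report above, which nothing in this thread confirms, is that a shell which keeps running between runs still holds the first import of SomeThing in sys.modules, so later edits to the source are not picked up until the module is reloaded. The sketch below, written for Python 3 rather than the reporter's Python 2.7, reproduces that caching with a throwaway module. The module name something_demo, its MESSAGE attribute, and the function demo_import_cache are made up for illustration.

```python
import importlib
import os
import sys
import tempfile


def demo_import_cache():
    """Return the value seen after import, after re-import, and after reload."""
    old_flag = sys.dont_write_bytecode
    sys.dont_write_bytecode = True  # keep .pyc files out of the picture
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical stand-in for the reporter's SomeThing.py.
        path = os.path.join(tmp, "something_demo.py")
        with open(path, "w") as f:
            f.write("MESSAGE = 'Where is my bytecode?'\n")
        sys.path.insert(0, tmp)
        try:
            importlib.invalidate_caches()
            module = importlib.import_module("something_demo")
            first = module.MESSAGE

            # Edit the source on disk, as in step 5 of the report.
            with open(path, "w") as f:
                f.write("MESSAGE = 'edited source, now a longer string'\n")

            # A second import is served from sys.modules; the file is not re-read.
            second = importlib.import_module("something_demo").MESSAGE

            # reload() re-executes the module from the edited source.
            third = importlib.reload(module).MESSAGE
        finally:
            sys.path.remove(tmp)
            sys.modules.pop("something_demo", None)
            sys.dont_write_bytecode = old_flag
    return first, second, third


if __name__ == "__main__":
    print(demo_import_cache())
```

If this is what happened, deleting SomeThing.pyc would indeed change nothing, because the stale copy lives in the running interpreter's memory and not on disk. Restarting the shell, or reloading the module, would be the things to try.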
                              
From victor.stinner at haypocalc.com Mon Jan 24 17:51:42 2011
From: victor.stinner at haypocalc.com (Victor Stinner)
Date: Mon, 24 Jan 2011 17:51:42 +0100
Subject: [Python-Dev] Import and unicode: part two
In-Reply-To: <201101241639.39345.vstinner@edenwall.com>
References: <1295440442.432.18.camel@marge> <87lj2a613p.fsf@uwakimon.sk.tsukuba.ac.jp> <201101241639.39345.vstinner@edenwall.com>
Message-ID: <201101241751.42470.victor.stinner@haypocalc.com>

On Monday 24 January 2011 at 16:39:39, Victor Stinner wrote:
> On Monday 24 January 2011 at 11:35:22, Stephen J. Turnbull wrote:
> > ... VFAT-formatted file systems and Shift JIS file names ...
>
> I missed something: VFAT stores filenames as unicode (whereas FAT only
> supports byte filenames). Well, VFAT stores filenames twice: as a 8+3 byte
> strings and as a 255 unicode (UTF-16-LE) string (UTF-16-LE).
>
> On which OS do you access this VFAT file system? On Windows, you have two
> APIs: bytes (*A) and wide character (*W). If you use the wide character,
> there is explicit encoding at all.

Oops, there is *no* explicit encoding at all.

Victor

From earney at umsystem.edu Mon Jan 24 17:18:16 2011
From: earney at umsystem.edu (Earney, Billy C.)
Date: Mon, 24 Jan 2011 10:18:16 -0600
Subject: [Python-Dev] tahoe-lafs
In-Reply-To: 
                              
                              References: <59ACE054B23B6045ACC43DBD778609BD6867D17257@UM-EMAIL02.um.umsystem.edu> 
                              
                              Message-ID: <59ACE054B23B6045ACC43DBD778609BD6867D1731B@UM-EMAIL02.um.umsystem.edu> I want to make it clear that I am in no way associated with the tahoe-lafs project. I do not want my email to make that project look bad. That was not my intention. Billy Earney earney at umsystem.edu
                              
                              Programmer/Analyst-Expert [cid:image001.gif at 01CBBBB0.03DD8B00] MySQL Certified DBA Office of Social and Economic Data Analysis (OSEDA) University of Missouri Phone: 573-882-7396 Fax: 573-884-4635 From: Nick Coghlan [mailto:ncoghlan at gmail.com] Sent: Monday, January 24, 2011 9:35 AM To: Earney, Billy C. Cc: Python-Dev at python.org Subject: Re: [Python-Dev] tahoe-lafs On Mon, Jan 24, 2011 at 11:56 PM, Earney, Billy C. 
                              
                              > wrote: Greetings! I know that this list is for python development questions/comments, People that post questions innocently unaware of the nature of this list have an excuse. You don't. This is not a good way to encourage people to think well of you or your project. Regards, Nick. -- Nick Coghlan | ncoghlan at gmail.com
                              
                              | Brisbane, Australia -------------- next part -------------- An HTML attachment was scrubbed... URL: 
                              
                              -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.gif Type: image/gif Size: 1989 bytes Desc: image001.gif URL: 
                              
                              From phd at phdru.name Mon Jan 24 17:49:12 2011 From: phd at phdru.name (Oleg Broytman) Date: Mon, 24 Jan 2011 19:49:12 +0300 Subject: [Python-Dev] Import and unicode: part two In-Reply-To: <201101241639.39345.vstinner@edenwall.com> References: <1295440442.432.18.camel@marge> <4D3D4A3F.1020502@v.loewis.de> <87lj2a613p.fsf@uwakimon.sk.tsukuba.ac.jp> <201101241639.39345.vstinner@edenwall.com> Message-ID: <20110124164912.GA9307@iskra.aviel.ru> On Mon, Jan 24, 2011 at 04:39:39PM +0100, Victor Stinner wrote: > I missed something: VFAT stores filenames as unicode (whereas FAT only > supports byte filenames). Well, VFAT stores filenames twice: as a 8+3 byte > strings and as a 255 unicode (UTF-16-LE) string (UTF-16-LE). > > On which OS do you access this VFAT file system? On Windows, you have two > APIs: bytes (*A) and wide character (*W). If you use the wide character, there > is explicit encoding at all. Linux has two mount options to control unicode on > a VFAT filesystem: "codepage" for the byte filenames (use Shift JIS here) and > "iocharset" for the unicode filenames (I don't understand this option). AFAIU, `codepage` is "remote charset" while `iocharset` is "local charset". I.e., to mount windows-1251 filesystem to my linux with koi8-r locale I use codepage=cp866,iocharset=koi8-r (cp866 is OEM encoding for cp1251 ANSI). Oleg. -- Oleg Broytman http://phdru.name/ phd at phdru.name Programmers don't die, they just GOSUB without RETURN. From phd at phdru.name Mon Jan 24 18:18:00 2011 From: phd at phdru.name (Oleg Broytman) Date: Mon, 24 Jan 2011 20:18:00 +0300 Subject: [Python-Dev] (no subject) In-Reply-To: 
                              
                              References: 
                              
Message-ID: <20110124171800.GB9307@iskra.aviel.ru>

On Mon, Jan 24, 2011 at 04:39:54PM +0000, Stefan Spoettl wrote:
> So it may be that the Python interpreter isn't working correctly only on Ubuntu 10.10

Then you should report the problem to the Ubuntu developers, right? And it would be nice if you investigate deeper and send a proper mail - with a subject, with a properly formatted text, not html.

http://www.catb.org/~esr/faqs/smart-questions.html

Oleg.
--
Oleg Broytman http://phdru.name/ phd at phdru.name
Programmers don't die, they just GOSUB without RETURN.

From brett at python.org Mon Jan 24 18:33:04 2011
From: brett at python.org (Brett Cannon)
Date: Mon, 24 Jan 2011 09:33:04 -0800
Subject: [Python-Dev] (no subject)
In-Reply-To:
                              
                              References: 
                              
                              Message-ID: 
                              
                              Bug reports should be filed at bugs.python.org On Mon, Jan 24, 2011 at 08:39, Stefan Spoettl 
                              
wrote:
> Using:
> Python 2.7.0+ (r27:82500, Sep 15 2010, 18:14:55)
> [GCC 4.4.5] on linux2
> (Ubuntu 10.10)
> Method to reproduce error:
> 1. Defining a module which is later imported by another:
> ---------------------------------------------------------------------
> class SomeThing:
>     def __init__(self):
>         self.variable = 'Where is my bytecode?'
>     def deliver(self):
>         return self.variable
>
> if __name__ == '__main__':
>     obj = SomeThing()
>     print obj.deliver()
> ---------------------------------------------------------------------
> 2. Run this module:
> Output of the Python Shell: Where is my bytecode?
>                             >>>
> 3. Defining the importing module:
> ---------------------------------------------------------------------
> class UseSomeThing:
>     def __init__(self, something):
>         self.anything = something
>     def giveanything(self):
>         return self.anything
>
> if __name__ == '__main__':
>     anything = UseSomeThing(SomeThing.SomeThing().deliver()).giveanything()
>     print anything
> ---------------------------------------------------------------------
> 4. Run this module:
> Output of the Python Shell: Where is my bytecode
>                             >>>
> (One can find SomeThing.pyc on the disc.)
> 5. Changing the imported module:
> ---------------------------------------------------------------------
> class SomeThing:
>     def __init__(self):
>         self.variable = 'What the hell is this? It could not be Python!'
>     def deliver(self):
>         return self.variable
>
> if __name__ == '__main__':
>     obj = SomeThing()
>     print obj.deliver()
> ---------------------------------------------------------------------
> 6. Run the changed module:
> Output of the Python Shell: What the hell is this? It could not be Python!
>                             >>>
> 7. Run the importing module again:
> Output of the Python Shell: Where is my bytecode?
>                             >>>
> 8. Deleting the bytecode of the imported module makes no effect!
> Remark: I think that I have observed yesterday late night a similar effect
> on Windows XP with Python 2.7.1 and Python 3.1.3. But when I have tried it
> out today in the morning the error hasn't appeared. So it may be that the
> Python interpreter isn't working correctly only on Ubuntu 10.10.
> _______________________________________________
> Python-Dev mailing list
> Python-Dev at python.org
> http://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe:
> http://mail.python.org/mailman/options/python-dev/brett%40python.org
>
>

From brett at python.org Mon Jan 24 19:38:45 2011
From: brett at python.org (Brett Cannon)
Date: Mon, 24 Jan 2011 10:38:45 -0800
Subject: [Python-Dev] Beta version of the new devguide
In-Reply-To:
                              
                              References: 
                              
                              <20110123075621.468d07c9@dino> 
                              
                              
                              <20110124122958.589246ed@pitrou.net> 
                              
                              Message-ID: 
                              
                              On Mon, Jan 24, 2011 at 07:33, Nick Coghlan 
                              
                              wrote: > On Mon, Jan 24, 2011 at 9:29 PM, Antoine Pitrou 
                              
                              wrote: >> On Mon, 24 Jan 2011 20:33:07 +1000 >> Nick Coghlan 
                              
                              wrote: >>> On Mon, Jan 24, 2011 at 6:22 AM, Brett Cannon 
                              
wrote:
>>> >> In "Getting Set Up" it describes how to build a pydebug build. Is that
>>> >> really necessary for those who plan only to contribute by working on
>>> >> pure Python code?
>>> >>
>>> >
>>> > Yes, there is actually a laundry list of reasons even people only
>>> > working on the stdlib should use a pydebug build.
>>>
>>> And one big reason why I don't unless I have a specific need to check
>>> something with it - it makes the already quite long running time for
>>> the full test suite take even longer :)
>>
>> Please try the -j option to regrtest.
>
> While I must admit I'm still not in the habit of running tests in
> parallel, that's a substantial speed improvement regardless of build
> type, so a non-debug build is still noticeably faster.
>
> release (with -j4): 2 min 25 sec (3 min wall clock time)
> pydebug (with -j4): 4 min 43 sec (10 min wall clock time)
>

If you think that's slow, try running it under coverage single-threaded. =)

> Given that I typically *don't* need the extra info from a debug build
> to analyse problems and a full configure and rebuild cycle takes less
> time than a single pydebug test run, I'll happily stick with the much
> faster test execution that comes from using a release build.
>

I'm not going to drag on arguing this point, but there is more to pydebug builds than some debug info when working in the C code. For instance, pure Python code can still trigger problems indirectly in C code which get picked up by a pydebug build. You also have ResourceWarnings now which are almost exclusively triggered by pure Python code. My point is there is more to a pydebug build than just direct debugging support for C code.

But if running the test suite w/o a debug build is what it takes to get people to run the test suite I will take that over not running it at all.
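
To make the ResourceWarning point above concrete, here is a minimal sketch in present-day Python 3; it is not code from this thread. It assumes CPython, where dropping the last reference to an open file finalizes it right away, and the helper names leak_file and collect_resource_warnings are made up for the illustration. The filter is enabled explicitly because, as far as I know, release builds ignore ResourceWarning by default while debug builds show it.

```python
import gc
import os
import tempfile
import warnings


def leak_file(path):
    """Open a file and drop the last reference without closing it."""
    f = open(path, "w", encoding="utf-8")
    f.write("data")
    # No f.close(): the file object is finalized while still open.


def collect_resource_warnings(path):
    """Run the leaky code and return the ResourceWarnings it produced."""
    with warnings.catch_warnings(record=True) as caught:
        # Make sure ResourceWarning is not filtered out in this block.
        warnings.simplefilter("always", ResourceWarning)
        leak_file(path)
        gc.collect()  # not needed on CPython, but harmless
    return [w for w in caught if issubclass(w.category, ResourceWarning)]


fd, tmp_path = tempfile.mkstemp()
os.close(fd)
try:
    found = collect_resource_warnings(tmp_path)
finally:
    os.remove(tmp_path)

for w in found:
    print(w.category.__name__, w.message)
```

The leak here is entirely in pure Python code, which is the kind of problem the message says a pydebug test run would surface without anyone touching C.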
From brett at python.org Mon Jan 24 19:43:01 2011 From: brett at python.org (Brett Cannon) Date: Mon, 24 Jan 2011 10:43:01 -0800 Subject: [Python-Dev] Beta version of the new devguide In-Reply-To: <20110124164601.6b984c6c@pitrou.net> References: 
                              
                              <20110124164601.6b984c6c@pitrou.net> Message-ID: 
                              
                              On Mon, Jan 24, 2011 at 07:46, Antoine Pitrou 
                              
                              wrote: > On Sat, 22 Jan 2011 17:08:00 -0800 > Brett Cannon 
                              
                              wrote: >> >> Two, what should the final URL be? Georg picked the current one and I >> am happy with it. > > Ditto for me. > >> Three, where should it be linked from? docs.python.org homepage? Either there and/or the www.python.org homepage. >> Four, what to do with www.python.org/dev/? Redirect for all the pages? > > Right, this whole area (wpo/dev) looks obsolete to me. The devguide > allows us to easily edit and improve development-related docs, which is > great! It should be accessible easily from the main site. Perhaps > "core development" should be renamed "contributing" and redirect to the > devguide. Also, the submenu displayed below "core development" can be > trimmed dramatically. There actually shouldn't be anything at python.org/dev that is useful which has not been rewritten or linked to from the devguide. So that whole page can be heavily gutted to the point of probably being nothing more than a link to the devguide and a link to the PEP 0. But I will hold off on the gutting until the devguide is "released"; probably end of the week. > > (then there's the question of whether the devguide should be > exhaustive; should it contain reference-like material about > all aspects of core development?) That's where the balancing act comes in. If we get too exhaustive then we have to constantly update the docs anytime we make a change. If we leave too loose then someone is going to come along and potentially waste some time on something because they didn't realize what they should have been doing. From raymond.hettinger at gmail.com Mon Jan 24 20:04:06 2011 From: raymond.hettinger at gmail.com (Raymond Hettinger) Date: Mon, 24 Jan 2011 11:04:06 -0800 Subject: [Python-Dev] Keeping __init__.py empty for Python packages used for module grouping. 
Message-ID: <854DFFA6-1BAD-41D7-BBC7-6906C606774A@gmail.com> Looking at http://docs.python.org/dev/library/html.html#module-html it would appear that we've created a new module with a single trivial function. In reality, there was already a python package, html, that served to group two loosely related modules, html.parser and html.entities. ISTM, that if we're going to use python packages as "namespace containers" for categorizing modules, then the top level __init__ namespace should be left empty. Before the placement of html.escape() becomes set in stone, I think we should consider putting it somewhere else. Raymond From g.brandl at gmx.net Mon Jan 24 20:18:07 2011 From: g.brandl at gmx.net (Georg Brandl) Date: Mon, 24 Jan 2011 20:18:07 +0100 Subject: [Python-Dev] Keeping __init__.py empty for Python packages used for module grouping. In-Reply-To: <854DFFA6-1BAD-41D7-BBC7-6906C606774A@gmail.com> References: <854DFFA6-1BAD-41D7-BBC7-6906C606774A@gmail.com> Message-ID: 
                              
                              Am 24.01.2011 20:04, schrieb Raymond Hettinger: > Looking at http://docs.python.org/dev/library/html.html#module-html it would > appear that we've created a new module with a single trivial function. > > In reality, there was already a python package, html, that served to group > two loosely related modules, html.parser and html.entities. > > ISTM, that if we're going to use python packages as "namespace containers" > for categorizing modules, then the top level __init__ namespace should be > left empty. > > Before the placement of html.escape() becomes set in stone, I think we should > consider putting it somewhere else. To be honest, I don't see the issue. I don't see stdlib packages as "namespace containers", but rather as a nice way of structuring functionality. And remember that flat is better than nested -- why should escape() be put away into a new submodule? At least you'll need to let us know where you would rather put that function. Georg From raymond.hettinger at gmail.com Mon Jan 24 20:46:45 2011 From: raymond.hettinger at gmail.com (Raymond Hettinger) Date: Mon, 24 Jan 2011 11:46:45 -0800 Subject: [Python-Dev] Location of tests for packages Message-ID: 
                              
Right now, the tests for the unittest package are under the package directory instead of Lib/test where we have most of the other tests.

There are some other packages that do the same thing, each for their own reason.

I think we should develop a strong preference for tests going under Lib/test unless there is a very compelling reason. We already have a similar preference for all Docs going under Doc/ and that has not proved to be an issue with any package maintainer.

* The Windows distro has an install option to exclude Lib/test. The current situation with unittest works against it.

* The commingling of tests with the regular code is making it more difficult to grep code while excluding tests.

* Having packages create their little worlds within world is making it more difficult to find things.

* For regrtest to work, there still needs to be some file in Lib/test that dispatches to the alternate test directory.

This isn't a critical issue (nothing is broken) but we're a week from another release candidate, so the new Py3.2 package organization (unittest was flat in Py3.1 and its tests were under Lib/test) is about to become a de-facto decision that will be hard to undo. I recommend moving it under Lib/test before everything is set in stone.

Raymond

P.S. I've discussed this with Michael and his preference is against going back to the Py3.1 style where the tests were under Lib/test. He thinks the current tree makes it easier to sync with Py2.7 and the unittest2 third-party module. Also, he likes grepping the regular source and tests all at once.

From g.brandl at gmx.net Mon Jan 24 20:26:00 2011
From: g.brandl at gmx.net (Georg Brandl)
Date: Mon, 24 Jan 2011 20:26:00 +0100
Subject: [Python-Dev] What's new 2.x in 3.x docs.
In-Reply-To:
                              
                              References: 
                              
                              <60D93B3F-FF28-4BFF-AD77-F2316F5BAC53@gmail.com> 
                              
                              Message-ID: 
                              
                              Am 23.01.2011 02:48, schrieb Nick Coghlan: > On Sun, Jan 23, 2011 at 7:23 AM, Raymond Hettinger > 
                              
wrote:
>> On Jan 22, 2011, at 11:04 AM, Terry Reedy wrote:
>>
>>> The 3.x docs mostly started fresh with 3.0. The major exception is the What's new section, which goes back to 2.0. The 2.x stuff comprises about 650KB in the repository and whatever that translates into in the distribution. I cannot imagine that anyone who only has 3.x and no 2.x version would have any interest in the 2.x history. And of course, the complete 2.x history will always be available with the latest 2.7.z. And the cover page for 3.x could even say so and include a link. So why not remove it from the 3.2 release (and have two separate pages for the online version)?
>>
>> I think there is value in the older whatsnew docs. They provide a readable introduction to various features and nicely augment the plain docs which can be a little dry.
>>
>> +1 for keeping the links as-is. Removing them takes away a resource and gains nothing.
>
> They're also a useful resource when developing compatibility guides
> for projects that target older versions (including ones that support
> py3k via 2to3).
>
> With the latest 3.x release always being at the top, I agree with
> Raymond that retaining the history is a better option.

Agreed.

Georg

From fdrake at acm.org Mon Jan 24 21:14:44 2011
From: fdrake at acm.org (Fred Drake)
Date: Mon, 24 Jan 2011 15:14:44 -0500
Subject: [Python-Dev] Keeping __init__.py empty for Python packages used for module grouping.
In-Reply-To: <854DFFA6-1BAD-41D7-BBC7-6906C606774A@gmail.com>
References: <854DFFA6-1BAD-41D7-BBC7-6906C606774A@gmail.com>
Message-ID:
                              
                              On Mon, Jan 24, 2011 at 2:04 PM, Raymond Hettinger 
                              
wrote:
> ISTM, that if we're going to use python packages as "namespace containers" for
> categorizing modules, then the top level __init__ namespace should be left empty.

This is only an issue if the separate components are distributed separately; for the standard library, we're not using it as a namespace package in the same sense that is done with (for example) the "zope" package.

  -Fred

--
Fred L. Drake, Jr.
"A storm broke loose in my mind." --Albert Einstein

From merwok at netwok.org Mon Jan 24 21:21:07 2011
From: merwok at netwok.org (Éric Araujo)
Date: Mon, 24 Jan 2011 21:21:07 +0100
Subject: [Python-Dev] Location of tests for packages
In-Reply-To:
                              
                              References: 
                              
Message-ID: <4D3DDF33.9020702@netwok.org>

> Right now, the tests for the unittest package are under the
> package directory instead of Lib/test where we have most of the
> other tests.
>
> There are some other packages that do the same thing, each for
> their own reason.

The corresponding bug report is #10572 (opened by Michael Foord). R. David Murray was +1 for moving email tests, Barry deferred to him, Brett was +0 for importlib, and I was ?0 for distutils. Maintainers of ctypes, json, lib2to3 and sqlite3 haven't yet expressed themselves.

Regards

From martin at v.loewis.de Mon Jan 24 21:17:34 2011
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Mon, 24 Jan 2011 21:17:34 +0100
Subject: [Python-Dev] PEP 393: Flexible String Representation
Message-ID: <4D3DDE5E.4080807@v.loewis.de>

I have been thinking about Unicode representation for some time now. This was triggered, on the one hand, by discussions with Glyph Lefkowitz (who complained that his server app consumes too much memory), and Carl Friedrich Bolz (who profiled Python applications to determine that Unicode strings are among the top consumers of memory in Python). On the other hand, this was triggered by the discussion on supporting surrogates in the library better.

I'd like to propose PEP 393, which takes a different approach, addressing both problems simultaneously: by getting a flexible representation (one that can be either 1, 2, or 4 bytes), we can support the full range of Unicode on all systems, but still use only one byte per character for strings that are pure ASCII (which will be the majority of strings for the majority of users).

You'll find the PEP at

http://www.python.org/dev/peps/pep-0393/

For convenience, I include it below.

Regards,
Martin

PEP: 393
Title: Flexible String Representation
Version: $Revision: 88168 $
Last-Modified: $Date: 2011-01-24 21:14:21 +0100 (Mo, 24. Jan 2011) $
Author: Martin v. Löwis
                              
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 24-Jan-2010
Python-Version: 3.3
Post-History:

Abstract
========

The Unicode string type is changed to support multiple internal representations, depending on the character with the largest Unicode ordinal (1, 2, or 4 bytes). This will allow a space-efficient representation in common cases, but give access to full UCS-4 on all systems. For compatibility with existing APIs, several representations may exist in parallel; over time, this compatibility should be phased out.

Rationale
=========

There are two classes of complaints about the current implementation of the unicode type: on systems only supporting UTF-16, users complain that non-BMP characters are not properly supported. On systems using UCS-4 internally (and also sometimes on systems using UCS-2), there is a complaint that Unicode strings take up too much memory - especially compared to Python 2.x, where the same code would often use ASCII strings (i.e. ASCII-encoded byte strings). With the proposed approach, ASCII-only Unicode strings will again use only one byte per character; while still allowing efficient indexing of strings containing non-BMP characters (as strings containing them will use 4 bytes per character).

One problem with the approach is support for existing applications (e.g. extension modules). For compatibility, redundant representations may be computed. Applications are encouraged to phase out reliance on a specific internal representation if possible. As interaction with other libraries will often require some sort of internal representation, the specification chooses UTF-8 as the recommended way of exposing strings to C code.

For many strings (e.g. ASCII), multiple representations may actually share memory (e.g. the shortest form may be shared with the UTF-8 form if all characters are ASCII). With such sharing, the overhead of compatibility representations is reduced.
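
As a rough illustration of the width rule described in the Abstract and Rationale (not part of the PEP, and not how the C implementation would be written), the following Python sketch picks the narrowest of the three fixed widths from the largest code point in a string. The function names bytes_per_character and payload_size are hypothetical, and the size estimate deliberately ignores the object header and the null terminator.

```python
def bytes_per_character(text):
    """Return 1, 2, or 4: the narrowest fixed width that fits every code point."""
    largest = max(map(ord, text), default=0)
    if largest < 0x100:      # every character fits in Latin-1
        return 1
    if largest < 0x10000:    # every character fits in UCS-2 (the BMP)
        return 2
    return 4                 # at least one non-BMP character: UCS-4


def payload_size(text):
    """Estimated bytes for the character data alone (no header, no terminator)."""
    return len(text) * bytes_per_character(text)


for sample in ("hello", "na\u00efve", "\u20ac5", "\U0001d11e clef"):
    print(ascii(sample), bytes_per_character(sample), payload_size(sample))
```

A single wide character widens the whole string, which is the trade-off the Rationale accepts in exchange for constant-time indexing.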
Specification
=============

The Unicode object structure is changed to this definition::

  typedef struct {
    PyObject_HEAD
    Py_ssize_t length;
    void *str;
    Py_hash_t hash;
    int state;
    Py_ssize_t utf8_length;
    void *utf8;
    Py_ssize_t wstr_length;
    void *wstr;
  } PyUnicodeObject;

These fields have the following interpretations:

- length: number of code points in the string (result of sq_length)
- str: shortest-form representation of the unicode string; the lower two bits of the pointer indicate the specific form: 01 => 1 byte (Latin-1); 10 => 2 byte (UCS-2); 11 => 4 byte (UCS-4); 00 => null pointer

  The string is null-terminated (in its respective representation).
- hash, state: same as in Python 3.2
- utf8_length, utf8: UTF-8 representation (null-terminated)
- wstr_length, wstr: representation in platform's wchar_t (null-terminated). If wchar_t is 16-bit, this form may use surrogate pairs (in which case wstr_length differs from length).

All three representations are optional, although the str form is considered the canonical representation which can be absent only while the string is being created.

The Py_UNICODE type is still supported but deprecated. It is always defined as a typedef for wchar_t, so the wstr representation can double as Py_UNICODE representation.

The str and utf8 pointers point to the same memory if the string uses only ASCII characters (using only Latin-1 is not sufficient). The str and wstr pointers point to the same memory if the string happens to fit exactly to the wchar_t type of the platform (i.e. uses some BMP-not-Latin-1 characters if sizeof(wchar_t) is 2, and uses some non-BMP characters if sizeof(wchar_t) is 4).

If the string is created directly with the canonical representation (see below), this representation doesn't take a separate memory block, but is allocated right after the PyUnicodeObject struct.
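
The "lower two bits of the pointer" scheme can be simulated with plain integers. The sketch below is only an illustration, not code from the PEP: addresses are ordinary Python ints, the names tag_pointer and untag_pointer are hypothetical, and it assumes the 2-byte form is tagged 10 (matching PyUnicode_2BYTE being 2). It also assumes allocations are at least 4-byte aligned, which is what leaves the two low bits free.

```python
TAG_MASK = 0b11

# Tag bits -> bytes per character, following the list in the Specification.
WIDTH_BY_TAG = {0b01: 1, 0b10: 2, 0b11: 4}
TAG_BY_WIDTH = {width: tag for tag, width in WIDTH_BY_TAG.items()}


def tag_pointer(address, width):
    """Store the character width in the two low bits of an aligned address."""
    if address == 0 or address & TAG_MASK:
        raise ValueError("address must be non-null and 4-byte aligned")
    return address | TAG_BY_WIDTH[width]


def untag_pointer(tagged):
    """Split a tagged value into (address, width); tag 00 means null."""
    tag = tagged & TAG_MASK
    if tag == 0b00:
        return None, None
    return tagged & ~TAG_MASK, WIDTH_BY_TAG[tag]


tagged = tag_pointer(0x7F001000, 2)
print(hex(tagged), untag_pointer(tagged))
```

Masking with ~TAG_MASK corresponds to what the text calls "masking out the pointer kind" when the data pointer is retrieved.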
String Creation
---------------

The recommended way to create a Unicode object is to use the function PyUnicode_New::

  PyObject* PyUnicode_New(Py_ssize_t size, Py_UCS4 maxchar);

Both parameters must denote the eventual size/range of the strings. In particular, codecs using this API must compute both the number of characters and the maximum character in advance. A string is allocated according to the specified size and character range and is null-terminated; the actual characters in it may be uninitialized.

PyUnicode_FromString and PyUnicode_FromStringAndSize remain supported for processing UTF-8 input; the input is decoded, and the UTF-8 representation is not yet set for the string.

PyUnicode_FromUnicode remains supported but is deprecated. If the Py_UNICODE pointer is non-null, the str representation is set. If the pointer is NULL, a properly-sized wstr representation is allocated, which can be modified until PyUnicode_Finalize() is called (explicitly or implicitly). Resizing a Unicode string remains possible until it is finalized.

PyUnicode_Finalize() converts a string containing only a wstr representation into the canonical representation. Unless wstr and str can share the memory, the wstr representation is discarded after the conversion.

String Access
-------------

The canonical representation can be accessed using two macros PyUnicode_Kind and PyUnicode_Data. PyUnicode_Kind gives one of the values PyUnicode_1BYTE (1), PyUnicode_2BYTE (2), or PyUnicode_4BYTE (3). PyUnicode_Data gives the void pointer to the data, masking out the pointer kind. All these functions call PyUnicode_Finalize in case the canonical representation hasn't been computed yet.

A new function PyUnicode_AsUTF8 is provided to access the UTF-8 representation. It is thus identical to the existing _PyUnicode_AsString, which is removed. The function will compute the utf8 representation when first called. Since this representation will consume memory until the string object is released, applications should use the existing PyUnicode_AsUTF8String where possible (which generates a new string object every time). API that implicitly converts a string to a char* (such as the ParseTuple functions) will use this function to compute a conversion.

PyUnicode_AsUnicode is deprecated; it computes the wstr representation on first use.

String Operations
-----------------

Various convenience functions will be provided to deal with the canonical representation, in particular with respect to concatenation and slicing.

Stable ABI
----------

None of the functions in this PEP become part of the stable ABI.

Copyright
=========

This document has been placed in the public domain.

From fdrake at acm.org Mon Jan 24 21:28:21 2011
From: fdrake at acm.org (Fred Drake)
Date: Mon, 24 Jan 2011 15:28:21 -0500
Subject: [Python-Dev] Location of tests for packages
In-Reply-To:
                              
                              References: 
                              
                              Message-ID: 
                              
                              On Mon, Jan 24, 2011 at 2:46 PM, Raymond Hettinger 
                              
wrote:
> P.S. I've discussed this with Michael and his preference is against
> going back to the Py3.1 style where the tests were under Lib/test. He
> thinks the current tree makes it easier to sync with Py2.7 and the
> unittest2 third-party module. Also, he likes grepping the regular source
> and tests all at once.

I'm with Michael on this. -1 on pushing all the tests into Lib/test/.

  -Fred

--
Fred L. Drake, Jr.
"A storm broke loose in my mind." --Albert Einstein

From martin at v.loewis.de Mon Jan 24 21:28:58 2011
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Mon, 24 Jan 2011 21:28:58 +0100
Subject: [Python-Dev] Import and unicode: part two
In-Reply-To: <201101241639.39345.vstinner@edenwall.com>
References: <1295440442.432.18.camel@marge> <4D3D4A3F.1020502@v.loewis.de> <87lj2a613p.fsf@uwakimon.sk.tsukuba.ac.jp> <201101241639.39345.vstinner@edenwall.com>
Message-ID: <4D3DE10A.6020902@v.loewis.de>

Am 24.01.2011 16:39, schrieb Victor Stinner:
> Le lundi 24 janvier 2011 11:35:22, Stephen J. Turnbull a écrit :
>> ... VFAT-formatted file systems and Shift JIS file names ...
>
> I missed something: VFAT stores filenames as unicode (whereas FAT only
> supports byte filenames). Well, VFAT stores filenames twice: as a 8+3 byte
> strings and as a 255 unicode (UTF-16-LE) string (UTF-16-LE).

Stephen may not have meant VFAT. Instead, he might have meant FAT32, or, more likely, exFAT. VFAT is patented by Microsoft, so vendors of devices using flash memory cards often don't support VFAT.

In any case, file names are encoded in the OEM code page even on VFAT.

> On which OS do you access this VFAT file system? On Windows, you have two
> APIs: bytes (*A) and wide character (*W). If you use the wide character, there
> is explicit encoding at all.

Right ("no explicit encoding"). However, this is actually where things can go wrong: Windows needs to guess the file system, and will guess it uses the OEM code page. If the device writing the file system uses a different OEM code page than the Windows installation reading it, you get moji-bake.

This will actually happen with the *A APIs as well: they do *not* give you the file name from disk. Instead, Windows converts the OEM characters on disk to Unicode, and then the Unicode characters to the ANSI code page.
> Linux has two mount options to control unicode on > a VFAT filesystem: "codepage" for the byte filenames (use Shift JIS here) and > "iocharset" for the unicode filenames (I don't understand this option). > Anyway, both systems support unicode filenames. Linux doesn't support "unicode file names". Instead, it can support UTF-8. As Oleg explains: you need one encoding for the bytes on disk (to know what they mean, when converted to Unicode), and one encoding to then convert the "abstract" unicode to bytes again to present to the application. This is similar to how *A works on Windows. The iocharset is needed even if the file system is known to use UTF-16 (say, NTFS, VFAT, or Joliet). Regards, Martin From barry at python.org Mon Jan 24 21:39:27 2011 From: barry at python.org (Barry Warsaw) Date: Mon, 24 Jan 2011 15:39:27 -0500 Subject: [Python-Dev] Location of tests for packages In-Reply-To: 
                              
                              References: 
                              
Message-ID: <20110124153927.78810848@python.org>

On Jan 24, 2011, at 11:46 AM, Raymond Hettinger wrote:

>P.S. I've discussed this with Michael and his preference is against going
>back to the Py3.1 style where the tests were under Lib/test. He thinks the
>current tree makes it easier to sync with Py2.7 and the unittest2 third-party
>module. Also, he likes grepping the regular source and tests all at once.

Which seem like compelling reasons to keep things the way they are for unittest, in addition to the fact that we're already in RC for 3.2, so you would need RM approval to make such a change this late in the process.

I agree that it's not ideal, but for certain packages that are also distributed separately, it can be much easier to keep the tests with the code, and I'm inclined to defer to the primary maintainer's preference.

-Barry
                              
                              From tseaver at palladion.com Mon Jan 24 22:59:12 2011 From: tseaver at palladion.com (Tres Seaver) Date: Mon, 24 Jan 2011 16:59:12 -0500 Subject: [Python-Dev] Keeping __init__.py empty for Python packages used for module grouping. In-Reply-To: 
                              
                              References: <854DFFA6-1BAD-41D7-BBC7-6906C606774A@gmail.com> 
                              
                              Message-ID: 
                              
                              -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 01/24/2011 03:14 PM, Fred Drake wrote: > On Mon, Jan 24, 2011 at 2:04 PM, Raymond Hettinger > 
                              
                              wrote: >> ISTM, that if we're going to use python packages as "namespace containers" for >> categorizing modules, then the top level __init__ namespace should be left empty. > > This is only an issue if the separate components are distributed > separately; for the standard library, we're not using it as a > namespace package in the same sense that is done with (for example) > the "zope" package. It might matter if we want to enable third-party package installation into a namespace also used by the stdlib: ISTR that the 'xml' package had such installs at one point. If that pattern is a goal, having all versions of the namespace's __init__.py empty of anything but the __path__-munging majyk / boilerplate is required to make such installs work regardless of the order of PYTHONPATH. Tres. - -- =================================================================== Tres Seaver +1 540-429-0999 tseaver at palladion.com Palladion Software "Excellence by Design" http://palladion.com -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iEYEARECAAYFAk099jAACgkQ+gerLs4ltQ7e4gCfbYJE8d8bNrX19zrzC4xvfA9Y KkQAnA7niExvMqXtUBD/XwzZZ9EzHcBm =/Q/Y -----END PGP SIGNATURE----- From solipsis at pitrou.net Mon Jan 24 23:03:07 2011 From: solipsis at pitrou.net (Antoine Pitrou) Date: Mon, 24 Jan 2011 23:03:07 +0100 Subject: [Python-Dev] Location of tests for packages References: 
                              
                              Message-ID: <20110124230307.3e7a3ee2@pitrou.net> On Mon, 24 Jan 2011 11:46:45 -0800 Raymond Hettinger 
                              
wrote: > > This isn't a critical issue (nothing is broken) but we're a week from another release candidate, so the new Py3.2 package organization (unittest was flat in Py3.1 and its tests were under Lib/test) is about to become a de-facto decision that will be hard to undo. Well can we stop being melodramatic? Tests are not part of the API and so they are free to move whenever we want. No need to hold a release candidate for that. Regards Antoine. From solipsis at pitrou.net Mon Jan 24 23:12:33 2011 From: solipsis at pitrou.net (Antoine Pitrou) Date: Mon, 24 Jan 2011 23:12:33 +0100 Subject: [Python-Dev] PEP 393: Flexible String Representation References: <4D3DDE5E.4080807@v.loewis.de> Message-ID: <20110124231233.79bed8eb@pitrou.net> On Mon, 24 Jan 2011 21:17:34 +0100 "Martin v. Löwis" 
                              
                              wrote: > I have been thinking about Unicode representation for some time now. > This was triggered, on the one hand, by discussions with Glyph Lefkowitz > (who complained that his server app consumes too much memory), and Carl > Friedrich Bolz (who profiled Python applications to determine that > Unicode strings are among the top consumers of memory in Python). > On the other hand, this was triggered by the discussion on supporting > surrogates in the library better. > > I'd like to propose PEP 393, which takes a different approach, > addressing both problems simultaneously: by getting a flexible > representation (one that can be either 1, 2, or 4 bytes), we can > support the full range of Unicode on all systems, but still use > only one byte per character for strings that are pure ASCII (which > will be the majority of strings for the majority of users). For this kind of experiment, I think a concrete attempt at implementing (together with performance/memory savings numbers) would be much more useful than an abstract proposal. It is hard to judge the concrete effects of the changes you are proposing, even though they might (or not) make sense in theory. For example, you are adding a lot of constant overhead to every unicode object, even very small ones, which might be detrimental. Also, accessing the unicode object's payload can become quite a bit more cumbersome. Only implementing can tell how much this is workable in practice. Regards Antoine. From benjamin at python.org Mon Jan 24 23:13:38 2011 From: benjamin at python.org (Benjamin Peterson) Date: Mon, 24 Jan 2011 16:13:38 -0600 Subject: [Python-Dev] Location of tests for packages In-Reply-To: <4D3DDF33.9020702@netwok.org> References: 
                              
                              <4D3DDF33.9020702@netwok.org> Message-ID: 
                              
2011/1/24 Éric Araujo 
                              
: >> Right now, the tests for the unittest package are under the >> package directory instead of Lib/test where we have most of the >> other tests. >> >> There are some other packages that do the same thing, each for >> their own reason. > > The corresponding bug report is #10572 (opened by Michael Foord). > > R. David Murray was +1 for moving email tests, Barry deferred to him, > Brett was +0 for importlib, and I was ?0 for distutils. Maintainers of > ctypes, json, lib2to3 and sqlite3 haven't yet expressed themselves. I prefer lib2to3 tests to stay in lib2to3/. -- Regards, Benjamin From tjreedy at udel.edu Mon Jan 24 23:44:36 2011 From: tjreedy at udel.edu (Terry Reedy) Date: Mon, 24 Jan 2011 17:44:36 -0500 Subject: [Python-Dev] Keeping __init__.py empty for Python packages used for module grouping. In-Reply-To: 
                              
                              References: <854DFFA6-1BAD-41D7-BBC7-6906C606774A@gmail.com> 
                              
                              Message-ID: 
                              
On 1/24/2011 2:18 PM, Georg Brandl wrote: > On 24.01.2011 20:04, Raymond Hettinger wrote: >> Looking at http://docs.python.org/dev/library/html.html#module-html >> it would appear that we've created a new module with a single >> trivial function. >> >> In reality, there was already a python package, html, that served >> to group two loosely related modules, html.parser and >> html.entities. >> >> ISTM, that if we're going to use python packages as "namespace >> containers" for categorizing modules, then the top level __init__ >> namespace should be left empty. >> >> Before the placement of html.escape() becomes set in stone, I think >> we should consider putting it somewhere else. > > To be honest, I don't see the issue. I don't see stdlib packages as > "namespace containers", but rather as a nice way of structuring > functionality. And remember that flat is better than nested -- why > should escape() be put away into a new submodule? > > At least you'll need to let us know where you would rather put that > function. I would put it in html.entities, which is also sparse, as it seems to me vaguely related. -- Terry Jan Reedy From martin at v.loewis.de Tue Jan 25 00:04:03 2011 From: martin at v.loewis.de ("Martin v. Löwis") Date: Tue, 25 Jan 2011 00:04:03 +0100 Subject: [Python-Dev] Keeping __init__.py empty for Python packages used for module grouping. In-Reply-To: 
                              
                              References: <854DFFA6-1BAD-41D7-BBC7-6906C606774A@gmail.com> 
                              
                              
Message-ID: <4D3E0563.2050807@v.loewis.de> > If that pattern is a goal, having all versions of the namespace's > __init__.py empty of anything but the __path__-munging majyk / > boilerplate is required to make such installs work regardless of the > order of PYTHONPATH. With PEP 382, having extensible packages won't conflict with having a non-trivial __init__.py, and no __path__-munging will be necessary. Regards, Martin From martin at v.loewis.de Tue Jan 25 00:07:03 2011 From: martin at v.loewis.de ("Martin v. Löwis") Date: Tue, 25 Jan 2011 00:07:03 +0100 Subject: [Python-Dev] PEP 393: Flexible String Representation In-Reply-To: <20110124231233.79bed8eb@pitrou.net> References: <4D3DDE5E.4080807@v.loewis.de> <20110124231233.79bed8eb@pitrou.net> Message-ID: <4D3E0617.7010001@v.loewis.de> >> I'd like to propose PEP 393, which takes a different approach, >> addressing both problems simultaneously: by getting a flexible >> representation (one that can be either 1, 2, or 4 bytes), we can >> support the full range of Unicode on all systems, but still use >> only one byte per character for strings that are pure ASCII (which >> will be the majority of strings for the majority of users). > > For this kind of experiment, I think a concrete attempt at implementing > (together with performance/memory savings numbers) would be much more > useful than an abstract proposal. I partially agree. An implementation is certainly needed, but there is nothing wrong (IMO) with designing the change before implementing it. Also, several people have offered to help with the implementation, so we need to agree on a specification first (which is actually cheaper than starting with the implementation only to find out that people misunderstood each other). 
Regards, Martin From brett at python.org Tue Jan 25 00:09:01 2011 From: brett at python.org (Brett Cannon) Date: Mon, 24 Jan 2011 15:09:01 -0800 Subject: [Python-Dev] Keeping __init__.py empty for Python packages used for module grouping. In-Reply-To: 
                              
                              References: <854DFFA6-1BAD-41D7-BBC7-6906C606774A@gmail.com> 
                              
                              Message-ID: 
                              
                              On Mon, Jan 24, 2011 at 11:18, Georg Brandl 
                              
wrote: > On 24.01.2011 20:04, Raymond Hettinger wrote: >> Looking at http://docs.python.org/dev/library/html.html#module-html it would >> appear that we've created a new module with a single trivial function. >> >> In reality, there was already a python package, html, that served to group >> two loosely related modules, html.parser and html.entities. >> >> ISTM, that if we're going to use python packages as "namespace containers" >> for categorizing modules, then the top level __init__ namespace should be >> left empty. >> >> Before the placement of html.escape() becomes set in stone, I think we should >> consider putting it somewhere else. > > To be honest, I don't see the issue. I don't see stdlib packages as > "namespace containers", but rather as a nice way of structuring functionality. > And remember that flat is better than nested -- why should escape() be put > away into a new submodule? Importlib also acts as a precedent with importlib.import_module(). I honestly don't feel the need to treat packages as a namespace explicitly (but then again I also disagree with the argument that __init__.py needs to be left empty). From martin at v.loewis.de Tue Jan 25 00:14:21 2011 From: martin at v.loewis.de ("Martin v. Löwis") Date: Tue, 25 Jan 2011 00:14:21 +0100 Subject: [Python-Dev] Location of tests for packages In-Reply-To: <20110124230307.3e7a3ee2@pitrou.net> References: 
                              
<20110124230307.3e7a3ee2@pitrou.net> Message-ID: <4D3E07CD.1020904@v.loewis.de> >> This isn't a critical issue (nothing is broken) but we're a week >> from another release candidate, so the new Py3.2 package >> organization (unittest was flat in Py3.1 and its tests were under >> Lib/test) is about to become a de-facto decision that will be hard >> to undo. > > Well can we stop being melodramatic? Tests are not part of the API > and so they are free to move whenever we want. No need to hold a > release candidate for that. Of course there is. Any addition or removal of files at this point has the chance of breaking the release process, which may fail to pick up files, or break in trying to pick up files that it expected to be there. This has happened *many* times during the alpha and beta releases of 3.2, so it's not at all a theoretical problem. After the next release candidate, I'd prefer to see no changes whatsoever to the tree (but it's Georg's decision, of course). Regards, Martin From solipsis at pitrou.net Tue Jan 25 00:20:45 2011 From: solipsis at pitrou.net (Antoine Pitrou) Date: Tue, 25 Jan 2011 00:20:45 +0100 Subject: [Python-Dev] PEP 393: Flexible String Representation In-Reply-To: <4D3E0617.7010001@v.loewis.de> References: <4D3DDE5E.4080807@v.loewis.de> <20110124231233.79bed8eb@pitrou.net> <4D3E0617.7010001@v.loewis.de> Message-ID: <1295911245.3704.13.camel@localhost.localdomain> On Tuesday 25 January 2011 at 00:07 +0100, "Martin v. Löwis" wrote: > >> I'd like to propose PEP 393, which takes a different approach, > >> addressing both problems simultaneously: by getting a flexible > >> representation (one that can be either 1, 2, or 4 bytes), we can > >> support the full range of Unicode on all systems, but still use > >> only one byte per character for strings that are pure ASCII (which > >> will be the majority of strings for the majority of users). 
> > > > For this kind of experiment, I think a concrete attempt at implementing > > (together with performance/memory savings numbers) would be much more > > useful than an abstract proposal. > > I partially agree. An implementation is certainly needed, but there is > nothing wrong (IMO) with designing the change before implementing it. > Also, several people have offered to help with the implementation, so > we need to agree on a specification first (which is actually cheaper > than starting with the implementation only to find out that people > misunderstood each other). I'm not sure it's really cheaper. When implementing you will probably find out that it makes more sense to change the meaning of some fields, add or remove some, etc. You will also want to try various tweaks since the whole point is to lighten the footprint of unicode strings in common workloads. So, the only criticism I have, intuitively, is that the unicode structure seems to become a bit too large. For example, I'm not sure you need a generic (pointer, size) pair in addition to the representation-specific ones. Incidentally, to slightly reduce the overhead of the unicode objects, there's this proposal: http://bugs.python.org/issue1943 Regards Antoine. From solipsis at pitrou.net Tue Jan 25 00:21:48 2011 From: solipsis at pitrou.net (Antoine Pitrou) Date: Tue, 25 Jan 2011 00:21:48 +0100 Subject: [Python-Dev] Location of tests for packages In-Reply-To: <4D3E07CD.1020904@v.loewis.de> References: 
                              
<20110124230307.3e7a3ee2@pitrou.net> <4D3E07CD.1020904@v.loewis.de> Message-ID: <1295911308.3704.15.camel@localhost.localdomain> On Tuesday 25 January 2011 at 00:14 +0100, "Martin v. Löwis" wrote: > >> This isn't a critical issue (nothing is broken) but we're a week > >> from another release candidate, so the new Py3.2 package > >> organization (unittest was flat in Py3.1 and its tests were under > >> Lib/test) is about to become a de-facto decision that will be hard > >> to undo. > > > > Well can we stop being melodramatic? Tests are not part of the API > > and so they are free to move whenever we want. No need to hold a > > release candidate for that. > > Of course there is. Any addition or removal of files at this point has > the chance of breaking the release process, which may fail to pick up > files, or break in trying to pick up files that it expected to be there. > This has happened *many* times during the alpha and beta releases of > 3.2, so it's not at all a theoretical problem. My point was that these changes can take place after 3.2 (both final and rc). Regards Antoine. From fuzzyman at voidspace.org.uk Tue Jan 25 00:40:55 2011 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Mon, 24 Jan 2011 23:40:55 +0000 Subject: [Python-Dev] Location of tests for packages In-Reply-To: 
                              
                              References: 
                              
Message-ID: <4D3E0E07.1080005@voidspace.org.uk> On 24/01/2011 19:46, Raymond Hettinger wrote: > Right now, the tests for the unittest package are under the package directory instead of Lib/test where we have most of the other tests. > > There are some other packages that do the same thing, each for their own reason. > > I think we should develop a strong preference for tests going under Lib/test unless there is a very compelling reason. We already have a similar preference for all Docs going under Doc/ and that has not proved to be an issue with any package maintainer. > > * The Windows distro has an install option to exclude Lib/test. The current situation with unittest works against it. > * The commingling of tests with the regular code is making it more difficult to grep code while excluding tests. > * Having packages create their little worlds within a world is making it more difficult to find things. > * For regrtest to work, there still needs to be some file in Lib/test that dispatches to the alternate test directory. > > This isn't a critical issue (nothing is broken) but we're a week from another release candidate, so the new Py3.2 package organization (unittest was flat in Py3.1 and its tests were under Lib/test) is about to become a de-facto decision that will be hard to undo. The tests are already under unittest in 2.7 so that change isn't "new". Moving the tests now makes it harder to maintain them (patches to 3.2 won't apply to 2.7). This is discussed in issue 10572. http://bugs.python.org/issue10572 It isn't just unittest, it seems that all *test packages* are in their respective package and not Lib/test except for the json module where Raymond already moved the tests: distutils/tests email/test ctypes/test importlib/test lib2to3/tests sqlite3/test tkinter/test So I'm a little confused as to why the focus on the *unittest* test suite. 
Brett has expressed a willingness to move the importlib tests under Lib/test and R David Murray would *like* to move the email tests there (but hasn't). Barry is -0 and so am I. It generally makes a few things slightly harder for me but not much. If we make a general policy decision to move all package tests out of their packages and into Lib/test (and actually do it) then fine, but I'm not overjoyed with a unilateral decision that unittest is special in this regard... :-) All the best, Michael > I recommend moving it under Lib/test before everything is set in stone. > > > Raymond > > > P.S. I've discussed this with Michael and his preference is against going back to the Py3.1 style where the tests were under Lib/test. He thinks the current tree makes it easier to sync with Py2.7 and the unittest2 third-party module. Also, he likes grepping the regular source and tests all at once. > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/fuzzyman%40voidspace.org.uk -- http://www.voidspace.org.uk/ May you do good and not evil May you find forgiveness for yourself and forgive others May you share freely, never taking more than you give. -- the sqlite blessing http://www.sqlite.org/different.html From fuzzyman at voidspace.org.uk Tue Jan 25 01:19:02 2011 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Tue, 25 Jan 2011 00:19:02 +0000 Subject: [Python-Dev] Location of tests for packages In-Reply-To: <4D3E07CD.1020904@v.loewis.de> References: 
                              
<20110124230307.3e7a3ee2@pitrou.net> <4D3E07CD.1020904@v.loewis.de> Message-ID: <4D3E16F6.8020402@voidspace.org.uk> On 24/01/2011 23:14, "Martin v. Löwis" wrote: >>> This isn't a critical issue (nothing is broken) but we're a week >>> from another release candidate, so the new Py3.2 package >>> organization (unittest was flat in Py3.1 and its tests were under >>> Lib/test) is about to become a de-facto decision that will be hard >>> to undo. >> Well can we stop being melodramatic? Tests are not part of the API >> and so they are free to move whenever we want. No need to hold a >> release candidate for that. > Of course there is. Any addition or removal of files at this point has > the chance of breaking the release process, which may fail to pick up > files, or break in trying to pick up files that it expected to be there. > This has happened *many* times during the alpha and beta releases of > 3.2, so it's not at all a theoretical problem. > > After the next release candidate, I'd prefer to see no changes > whatsoever to the tree (but it's Georg's decision, of course). What Antoine meant is that we could make the change for 3.2.1 and don't need to delay 3.2. Michael > Regards, > Martin > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/fuzzyman%40voidspace.org.uk -- http://www.voidspace.org.uk/ May you do good and not evil May you find forgiveness for yourself and forgive others May you share freely, never taking more than you give. 
-- the sqlite blessing http://www.sqlite.org/different.html From dmalcolm at redhat.com Tue Jan 25 01:28:43 2011 From: dmalcolm at redhat.com (David Malcolm) Date: Mon, 24 Jan 2011 19:28:43 -0500 Subject: [Python-Dev] PEP 393: Flexible String Representation In-Reply-To: <4D3DDE5E.4080807@v.loewis.de> References: <4D3DDE5E.4080807@v.loewis.de> Message-ID: <1295915323.3219.44.camel@radiator.bos.redhat.com> On Mon, 2011-01-24 at 21:17 +0100, "Martin v. Löwis" wrote: ... snip ... > I'd like to propose PEP 393, which takes a different approach, > addressing both problems simultaneously: by getting a flexible > representation (one that can be either 1, 2, or 4 bytes), we can > support the full range of Unicode on all systems, but still use > only one byte per character for strings that are pure ASCII (which > will be the majority of strings for the majority of users). There was some discussion about this at PyCon 2010, where we referred to it casually as "Pay-as-you-go unicode" ... snip ... > - str: shortest-form representation of the unicode string; the lower > two bits of the pointer indicate the specific form: > 01 => 1 byte (Latin-1); 11 => 2 byte (UCS-2); 11 => 4 byte (UCS-4); Repetition of "11"; I'm guessing that the 2byte/UCS-2 should read "10", so that they give the width of the char representation. > 00 => null pointer Naturally this assumes that all pointers are at least 4-byte aligned (so that they can be masked off). I assume that this is sane on every platform that Python supports, but should it be spelled out explicitly somewhere in the PEP? > > The string is null-terminated (in its respective representation). > - hash, state: same as in Python 3.2 > - utf8_length, utf8: UTF-8 representation (null-terminated) If this is to share its buffer with the "str" representation for the Latin-1 case, then I take it this ptr will typically be (str & ~4) ? i.e. only "str" has the low-order-bit type info. 
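The tag-bit idea discussed above can be sketched in a few lines. This is only an illustration in Python, using plain integers as stand-ins for C pointers; the names TAG_MASK, FORMS, tag_pointer and untag_pointer are hypothetical and appear nowhere in the PEP. It assumes the guess above that the 2-byte form is tagged "10", and that addresses are at least 4-byte aligned.

```python
# Illustrative only: integers stand in for C pointers.
TAG_MASK = 0b11  # the two low-order bits carry the form

# Assumed mapping; the 2-byte entry follows the "10" guess made above.
FORMS = {0b00: "null", 0b01: "latin-1", 0b10: "ucs-2", 0b11: "ucs-4"}


def tag_pointer(address, form_bits):
    """Store form_bits in the two low bits of a 4-byte-aligned address."""
    if address & TAG_MASK:
        raise ValueError("address must be at least 4-byte aligned")
    if form_bits & ~TAG_MASK:
        raise ValueError("form_bits must fit in two bits")
    return address | form_bits


def untag_pointer(tagged):
    """Split a tagged value back into (address, form name)."""
    return tagged & ~TAG_MASK, FORMS[tagged & TAG_MASK]
```

With these helpers, tag_pointer(0x1000, 0b01) yields 0x1001 and untag_pointer(0x1001) yields (0x1000, "latin-1"). Recovering the plain address means clearing both low bits, which is why the alignment assumption matters.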
> - wstr_length, wstr: representation in platform's wchar_t > (null-terminated). If wchar_t is 16-bit, this form may use surrogate > pairs (in which case wstr_length differs from length). > > All three representations are optional, although the str form is > considered the canonical representation which can be absent only > while the string is being created. Spelling out the meaning of "optional": does this mean that the relevant ptr is NULL; if so, if utf8 is null, is utf8_length undefined, or is it some dummy value? (i.e. is the pointer the first thing to check before we know if utf8_length is meaningful?); similar consideration for the wstr representation. > The Py_UNICODE type is still supported but deprecated. It is always > defined as a typedef for wchar_t, so the wstr representation can double > as Py_UNICODE representation. > > The str and utf8 pointers point to the same memory if the string uses > only ASCII characters (using only Latin-1 is not sufficient). The str ...though the ptrs are non-equal for this case, as noted above, as "str" has a 0x1 typecode. > and wstr pointers point to the same memory if the string happens to > fit exactly to the wchar_t type of the platform (i.e. uses some > BMP-not-Latin-1 characters if sizeof(wchar_t) is 2, and uses some > non-BMP characters if sizeof(wchar_t) is 4). > > If the string is created directly with the canonical representation > (see below), this representation doesn't take a separate memory block, > but is allocated right after the PyUnicodeObject struct. Is the idea to do pointer arithmetic when deleting the PyUnicodeObject to determine if the ptr is in that location, and not delete it if it is, or is there some other way of determining whether the pointers need deallocating? If the former, is this embedding an assumption that the underlying allocator couldn't have allocated a buffer directly adjacent to the PyUnicodeObject? 
I know that GNU libc's malloc/free implementation has gaps of two machine words between each allocation; off the top of my head I'm not sure if the optimized Objects/obmalloc.c allocator enforces such gaps. ... snip ... Extra section: GDB Debugging Hooks ------------------- Tools/gdb/libpython.py contains debugging hooks that embed knowledge about the internals of CPython's data types, including PyUnicodeObject instances. It will need to be slightly updated to track the change. (I can do that change if need be; it shouldn't be too hard). Hope this is helpful Dave From raymond.hettinger at gmail.com Tue Jan 25 02:19:44 2011 From: raymond.hettinger at gmail.com (Raymond Hettinger) Date: Mon, 24 Jan 2011 17:19:44 -0800 Subject: [Python-Dev] Location of tests for packages In-Reply-To: <4D3E0E07.1080005@voidspace.org.uk> References: 
                              
<4D3E0E07.1080005@voidspace.org.uk> Message-ID: <749B0A1D-752F-4842-96D4-E73FEEFD5CEF@gmail.com> On Jan 24, 2011, at 3:40 PM, Michael Foord wrote: > It isn't just unittest, it seems that all *test packages* are in their respective package and not Lib/test except for the json module where Raymond already moved the tests: > > distutils/tests > email/test > ctypes/test > importlib/test > lib2to3/tests > sqlite3/test > tkinter/test > > So I'm a little confused as to why the focus on the *unittest* test suite. There's not a focus on unittest. Importlib should also move under Lib/test and when email is ready, it too should fully join the organization of the overall project (Doc, Lib, Lib/test, Modules, Objects, Tools). ISTM, ctypes and distutils could almost be viewed as separate projects. We could ship Python without ctypes for example and we've got a policy against implementing the rest of the library using ctypes. The same goes for tkinter (it is not uncommon to have builds without it). And sqlite3 is close to being completely third-party maintained. In contrast, the unittest module and importlib belong with the core distro. So, I'm thinking that there were some precedents in cases where there was a really good reason for separating the project (we don't even include tkinter docs in our doc build), but that we should maintain a strong preference for keeping the overall project organization intact. ElementTree was fully folded into the project. I think we should follow that precedent and avoid balkanizing the python source into many little project subtrees (worlds within a world). Raymond From a.badger at gmail.com Tue Jan 25 04:26:09 2011 From: a.badger at gmail.com (Toshio Kuratomi) Date: Mon, 24 Jan 2011 19:26:09 -0800 Subject: [Python-Dev] Import and unicode: part two In-Reply-To: 
                              
                              References: <1295440442.432.18.camel@marge> 
                              
                              
                              
                              
                              
                              
Message-ID: <20110125032609.GC24080@unaka.lan> On Thu, Jan 20, 2011 at 03:27:08PM -0500, Glyph Lefkowitz wrote: > > On Jan 20, 2011, at 11:46 AM, Guido van Rossum wrote: > Same here. *Most* code will never be shared, or will only be shared > between users in the same community. When it goes wrong it's also a > learning opportunity. :-) > > > Despite my usual proclivity for being contrarian, I find myself in agreement > here. Linux users with locales that don't specify UTF-8 frankly _should_ have > to deal with all kinds of nastiness until they can transcode their filesystems. > MacOS and Windows both have a "right" answer here and your third-party tools > shouldn't create mojibake in your filenames. > However, if this is the consensus, it makes a lot more sense to pick utf-8 as *the* encoding for python module filenames on Linux. Why UTF-8: * UTF-8 can cover the whole range of unicode whereas most (all?) other locale friendly encodings cannot. * UTF-8 is becoming a standard for Linux distributions whether or not Linux users are adopting it. * Third party tools are gaining support for UTF-8 even when they aren't gaining support for generic encodings (If I read the spec on zip correctly, this is actually what's happening there). Why not locale: * Relying on locale is simply not portable. If nothing prevents people from distributing a unicode filename then they will go ahead and do so. If the result works (say, because it's utf-8 and 80% of the Linux userbase is using utf-8) then it will get packaged and distributed and people won't know that it's a problem until someone with a non-utf-8 locale decides to use it. * Mixing of modules from different locales won't work. Suppose that the system python installs the previous module. The local site has other modules that it has installed using a different filename encoding. The users at the site will find that either one or the other of the two modules won't work. 
* Because of the portability problems you have no choice but to tell people not to distribute python modules with non-ASCII names. This makes the use of unicode names second class indefinitely (until the kernel devs decide that they're wrong to not enforce a filesystem encoding or Linux becomes irrelevant as a platform). * If you can pick a set of encodings that are valid (utf-8 for Linux and MacOS, wide unicode for windows [I get the feeling from other parts of the conversation that Windows won't be so lucky, though]) tools to convert python names become easier to write. If you restrict it far enough, you could even write tools/importers that automatically do the detection. PS: Sorry for not replying immediately, the team I'm on is dealing with an issue at my work and I'm also preparing for a conference later this week. -Toshio
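To make the portability problem above concrete, here is a small sketch of what happens when a module file name is written under one filesystem encoding and read back under another. The module name and the helper names on_disk_name and seen_as are hypothetical; the sketch uses only str.encode and bytes.decode with strict error handling, which is a simplification of what a real interpreter does with file names.

```python
def on_disk_name(module_name, fs_encoding):
    """Bytes that would be stored as the file name under fs_encoding."""
    return module_name.encode(fs_encoding)


def seen_as(raw_name, fs_encoding):
    """How those bytes look to a process using fs_encoding.

    Returns None when the bytes are not valid in fs_encoding.
    """
    try:
        return raw_name.decode(fs_encoding)
    except UnicodeDecodeError:
        return None


name = "caf\u00e9"  # hypothetical non-ASCII module name
utf8_bytes = on_disk_name(name, "utf-8")      # written on a UTF-8 system
latin1_bytes = on_disk_name(name, "latin-1")  # written on a Latin-1 system
```

Under this simplified model, the UTF-8 bytes read with a Latin-1 locale come back as a different, garbled name, and the Latin-1 bytes are not valid UTF-8 at all, so in neither mismatch does the name on disk match the name the importing code asks for.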
                              
                              From fdrake at acm.org Tue Jan 25 05:40:49 2011 From: fdrake at acm.org (Fred Drake) Date: Mon, 24 Jan 2011 23:40:49 -0500 Subject: [Python-Dev] Keeping __init__.py empty for Python packages used for module grouping. In-Reply-To: 
                              
                              References: <854DFFA6-1BAD-41D7-BBC7-6906C606774A@gmail.com> 
                              
                              
                              Message-ID: 
                              
                              On Mon, Jan 24, 2011 at 4:59 PM, Tres Seaver 
                              
wrote: > It might matter if we want to enable third-party package installation > into a namespace also used by the stdlib: ISTR that the 'xml' package > had such installs at one point. Almost, but not quite. The xml package at one point allowed itself to be overridden by another package (_xmlplus specifically), however that was defined. Experience proved that this was a mistake. "Namespace packages", as originally defined by setuptools and applied for the hurry, zc, and zope packages (and many others), are a very different thing than what was done for the xml/_xmlplus package, and have proven significantly more useful and usable. While I heartily approve of "namespace packages" of that sort, I see no reason to support installing into the same package namespace as the standard library. The primary disadvantage I see is that it would be too easy to foster confusion over what's in the standard library among newcomers. -Fred -- Fred L. Drake, Jr. 

"A storm broke loose in my mind." --Albert Einstein From g.brandl at gmx.net Tue Jan 25 08:23:42 2011 From: g.brandl at gmx.net (Georg Brandl) Date: Tue, 25 Jan 2011 08:23:42 +0100 Subject: [Python-Dev] Location of tests for packages In-Reply-To: <4D3E07CD.1020904@v.loewis.de> References: 
                              
                              <20110124230307.3e7a3ee2@pitrou.net> <4D3E07CD.1020904@v.loewis.de> Message-ID: 
                              
On 25.01.2011 00:14, "Martin v. Löwis" wrote: >>> This isn't a critical issue (nothing is broken) but we're a week >>> from another release candidate, so the new Py3.2 package >>> organization (unittest was flat in Py3.1 and its tests were under >>> Lib/test) is about to become a de-facto decision that will be hard >>> to undo. >> >> Well can we stop being melodramatic? Tests are not part of the API >> and so they are free to move whenever we want. No need to hold a >> release candidate for that. Yes, let's postpone this until after the final release. > Of course there is. Any addition or removal of files at this point has > the chance of breaking the release process, which may fail to pick up > files, or break in trying to pick up files that it expected to be there. > This has happened *many* times during the alpha and beta releases of > 3.2, so it's not at all a theoretical problem. > > After the next release candidate, I'd prefer to see no changes > whatsoever to the tree (but it's Georg's decision, of course). I agree with both of you. Ideally there shouldn't be any but cosmetic changes after rc2, otherwise I'd be inclined to add an rc3 to the release schedule. Georg From g.brandl at gmx.net Tue Jan 25 08:26:56 2011 From: g.brandl at gmx.net (Georg Brandl) Date: Tue, 25 Jan 2011 08:26:56 +0100 Subject: [Python-Dev] Location of tests for packages In-Reply-To: <749B0A1D-752F-4842-96D4-E73FEEFD5CEF@gmail.com> References: 
                              
<4D3E0E07.1080005@voidspace.org.uk> <749B0A1D-752F-4842-96D4-E73FEEFD5CEF@gmail.com>
Message-ID:

On 25.01.2011 02:19, Raymond Hettinger wrote:
>
> On Jan 24, 2011, at 3:40 PM, Michael Foord wrote:
>> It isn't just unittest, it seems that all *test packages* are in their
>> respective package and not Lib/test except for the json module where
>> Raymond already moved the tests:
>>
>> distutils/tests
>> email/test
>> ctypes/test
>> importlib/test
>> lib2to3/tests
>> sqlite3/test
>> tkinter/test
>>
>> So I'm a little confused as to why the focus on the *unittest* test suite.
>
>
> There's not a focus on unittest. Importlib should also move under Lib/test
> and when email is ready, it too should fully join the organization of
> the overall project (Doc, Lib, Lib/test, Modules, Objects, Tools).

I'm +0 on moving all tests under Lib/test -- I think the respective
maintainers of the libraries in question should have the final word,
because...

> ISTM, ctypes and distutils could almost be viewed as separate projects.
> We could ship Python without ctypes for example and we've got a policy
> against implementing the rest of library using ctypes. The same goes
> for tkinter (it is not uncommon to have builds without it). And sqlite3 is
> close to being completely third-party maintained.

this weakens the argument of having a consistent organization of test
modules: if one or two are allowed to have the test suite intra-package,
it doesn't matter so much any more for others.

Georg

From stephen at xemacs.org Tue Jan 25 09:29:12 2011
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Tue, 25 Jan 2011 17:29:12 +0900
Subject: [Python-Dev] Import and unicode: part two
In-Reply-To: <201101241639.39345.vstinner@edenwall.com>
References: <1295440442.432.18.camel@marge> <4D3D4A3F.1020502@v.loewis.de> <87lj2a613p.fsf@uwakimon.sk.tsukuba.ac.jp> <201101241639.39345.vstinner@edenwall.com>
Message-ID: <87fwsh5quf.fsf@uwakimon.sk.tsukuba.ac.jp>

As Nick points out, nobody really seems to think this is an argument
against your patch. I'm going to bow out of this thread after this post,
as I'm clearly out of my technical depth.

Victor Stinner writes:
 > On Monday 24 January 2011 11:35:22, Stephen J. Turnbull wrote:
 > > ... VFAT-formatted file systems and Shift JIS file names ...
 >
 > I missed something: VFAT stores filenames as unicode (whereas FAT only
 > supports byte filenames). Well, VFAT stores filenames twice: as an 8+3 byte
 > string and as a 255-character unicode (UTF-16-LE) string.

I don't know what it is; I didn't have char-device-level access to the
file system, nor did I have the specs (it was a proprietary phone by a
Japanese OEM). It *presented* filenames in Shift JIS when mounted on
Linux with the vfat filesystem (either "mount -t vfat /dev/sde1
/mnt/gadget" or "mount -t auto /dev/sde1 /mnt/gadget"). Maybe there is
some unusual layer to translate from Unicode there, I'm not familiar
with Linux kernel drivers and libc facilities (such special-casing is a
common pattern in programming for Japanese; remember, the Japanese had
to deal with these issues before there was any standard for them).

 > On which OS do you access this VFAT file system? On Windows, you have two
 > APIs: bytes (*A) and wide character (*W). If you use the wide character, there
 > is no explicit encoding at all. Linux has two mount options to control unicode on
 > a VFAT filesystem: "codepage" for the byte filenames (use Shift JIS here) and
 > "iocharset" for the unicode filenames (I don't understand this
 > option).

I didn't either, in fact this is the first I've heard of it, so I've
never tried it.

 > I suppose that Shift JIS is used to encode the filename in the 8+3 byte string
 > form.

Could be, but I'm pretty sure these were long filenames, although maybe
they were just short enough (that is, I don't recall noticing any
truncation when mounted compared to the way they were presented on the
phone itself). I don't use that phone anymore, it's in a box of junk
equipment somewhere....
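
Editor's sketch, not part of the thread: a minimal Python 3 illustration of the two storage forms Victor describes above. It assumes a made-up filename (not one from the phone in question) and a hypothetical helper, is_valid_utf8. The same name is encoded as Shift JIS, as a byte-oriented codepage view would present it, and as UTF-16-LE, the form Victor gives for long filenames; the Shift JIS bytes then fail to decode as UTF-8.

```python
# Hypothetical filename chosen for illustration only.
name = "日本語.txt"

# Byte-oriented view: the name as a Shift JIS encoded byte string.
sjis_bytes = name.encode("shift_jis")

# Long-filename view: the same name as UTF-16-LE code units.
utf16_bytes = name.encode("utf-16-le")


def is_valid_utf8(raw):
    """Return True if raw decodes cleanly as UTF-8 (hypothetical helper)."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# Both byte strings carry the same text, but only if the right codec is used.
print(sjis_bytes.decode("shift_jis") == utf16_bytes.decode("utf-16-le"))

# A program that assumes UTF-8 filenames cannot decode the Shift JIS form.
print(is_valid_utf8(sjis_bytes))
```

Which of the two forms a program sees depends on the OS API or mount options in use, which is the uncertainty Stephen describes.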
From catch-all at masklinn.net Tue Jan 25 10:22:41 2011
From: catch-all at masklinn.net (Xavier Morel)
Date: Tue, 25 Jan 2011 10:22:41 +0100
Subject: [Python-Dev] Import and unicode: part two
In-Reply-To: <20110125032609.GC24080@unaka.lan>
References: <1295440442.432.18.camel@marge> <20110125032609.GC24080@unaka.lan>
Message-ID: <7F2E941E-143A-461E-BEC5-D7545C6D877A@masklinn.net>

On 2011-01-25, at 04:26 , Toshio Kuratomi wrote:
>
> * If you can pick a set of encodings that are valid (utf-8 for Linux and
>   MacOS

HFS+ uses UTF-16 in NFD (actually in an Apple-specific variant of NFD).
Right here you've already broken Python modules on OSX.

And as far as I know, Linux software/FS generally use NFC (I've already
seen this issue cause trouble)

From ncoghlan at gmail.com Tue Jan 25 11:13:57 2011
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Tue, 25 Jan 2011 20:13:57 +1000
Subject: [Python-Dev] tahoe-lafs
In-Reply-To: <59ACE054B23B6045ACC43DBD778609BD6867D1731B@UM-EMAIL02.um.umsystem.edu>
References: <59ACE054B23B6045ACC43DBD778609BD6867D17257@UM-EMAIL02.um.umsystem.edu> <59ACE054B23B6045ACC43DBD778609BD6867D1731B@UM-EMAIL02.um.umsystem.edu>
Message-ID:

On Tue, Jan 25, 2011 at 2:18 AM, Earney, Billy C. wrote:

> I want to make it clear that I am in no way associated with the tahoe-lafs
> project. I do not want my email to make that project look bad. That was
> not my intention.
>

Good to know. I was also in a somewhat grumpy mood when I wrote my last
post, so take it with a grain of salt :)

Cheers,
Nick.

--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From ncoghlan at gmail.com Tue Jan 25 12:08:01 2011
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Tue, 25 Jan 2011 21:08:01 +1000
Subject: [Python-Dev] PEP 393: Flexible String Representation
In-Reply-To: <4D3DDE5E.4080807@v.loewis.de>
References: <4D3DDE5E.4080807@v.loewis.de>
Message-ID:

On Tue, Jan 25, 2011 at 6:17 AM, "Martin v. Löwis" wrote:
> A new function PyUnicode_AsUTF8 is provided to access the UTF-8
> representation. It is thus identical to the existing
> _PyUnicode_AsString, which is removed. The function will compute the
> utf8 representation when first called. Since this representation will
> consume memory until the string object is released, applications
> should use the existing PyUnicode_AsUTF8String where possible
> (which generates a new string object every time). API that implicitly
> converts a string to a char* (such as the ParseTuple functions) will
> use this function to compute a conversion.

I'm not entirely clear as to what "this function" is referring to here.

I'm also dubious of the "PyUnicode_Finalize" name - "PyUnicode_Ready"
might be a better option (PyType_Ready seems a better analogy for a
"I've filled everything in, please calculate the derived fields now"
than Py_Finalize).

More generally, let me see if I understand the proposed structure
correctly:

str: Always set once PyUnicode_Ready() has been called.
    Always points to the canonical representation of the string (as
    indicated by PyUnicode_Kind)
length: Always set once PyUnicode_Ready() has been called. Specifies
    the number of code points in the string.
wstr: Set only if PyUnicode_AsUnicode has been called on the string.
    If (sizeof(wchar_t) == 2 && PyUnicode_Kind() == PyUnicode_2BYTE)
    or (sizeof(wchar_t) == 4 && PyUnicode_Kind() == PyUnicode_4BYTE),
    wstr = str, otherwise wstr points to dedicated memory
wstr_length: Valid only if wstr != NULL
    If wstr_length != length, indicates presence of surrogate pairs in
    a UCS-2 string (i.e. sizeof(wchar_t) == 2, PyUnicode_Kind() ==
    PyUnicode_4BYTE).
utf8: Set only if PyUnicode_AsUTF8 has been called on the string.
    If string contents are pure ASCII, utf8 = str, otherwise utf8
    points to dedicated memory.
utf8_length: Valid only if utf8_ptr != NULL

One change I would propose is that rather than hiding flags in the low
order bits of the str pointer, we expand the use of the existing
"state" field to cover the representation information in addition to
the interning information.

I would also suggest explicitly flagging internally whether or not a 1
byte string is ASCII or Latin-1 along the lines of:

/* Already existing string state constants */
#SSTATE_NOT_INTERNED 0x00
#SSTATE_INTERNED_MORTAL 0x01
#SSTATE_INTERNED_IMMORTAL 0x02
/* New string state constants */
#SSTATE_INTERN_MASK 0x03
#SSTATE_KIND_ASCII 0x00
#SSTATE_KIND_LATIN1 0x04
#SSTATE_KIND_2BYTE 0x08
#SSTATE_KIND_4BYTE 0x0C
#SSTATE_KIND_MASK 0x0C

PyUnicode_Kind would then return PyUnicode_1BYTE for strings that were
flagged internally as either ASCII or LATIN1.

Cheers,
Nick.

--
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From solipsis at pitrou.net Tue Jan 25 12:26:03 2011
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Tue, 25 Jan 2011 12:26:03 +0100
Subject: [Python-Dev] r88178 - python/branches/py3k/Lib/test/crashers/underlying_dict.py
References: <20110125000028.94263EEBDB@mail.python.org>
Message-ID: <20110125122603.74e49f8c@pitrou.net>

On Tue, 25 Jan 2011 01:00:28 +0100 (CET)
benjamin.peterson wrote:
> Author: benjamin.peterson
> Date: Tue Jan 25 01:00:28 2011
> New Revision: 88178
>
> Log:
> another pretty crasher served up by pypy

Some comments would be nice. Right now it looks pretty close to
deliberately obfuscated code (especially with the call to
gc.get_referrers()).

Regards

Antoine.

From exarkun at twistedmatrix.com Tue Jan 25 16:00:11 2011
From: exarkun at twistedmatrix.com (exarkun at twistedmatrix.com)
Date: Tue, 25 Jan 2011 15:00:11 -0000
Subject: [Python-Dev] Import and unicode: part two
In-Reply-To: <7F2E941E-143A-461E-BEC5-D7545C6D877A@masklinn.net>
References: <1295440442.432.18.camel@marge>
<20110125032609.GC24080@unaka.lan> <7F2E941E-143A-461E-BEC5-D7545C6D877A@masklinn.net>
Message-ID: <20110125150011.1699.303521989.divmod.xquotient.358@localhost.localdomain>

On 09:22 am, catch-all at masklinn.net wrote:
>On 2011-01-25, at 04:26 , Toshio Kuratomi wrote:
>>
>>* If you can pick a set of encodings that are valid (utf-8 for Linux
>>and
>>  MacOS
>
>HFS+ uses UTF-16 in NFD (actually in an Apple-specific variant of NFD).
>Right here you've already broken Python modules on OSX.

Are you sure about the UTF-16 part? Evidence strongly points towards
UTF-8:

    $ python
    Python 2.6.1 (r261:67515, Feb 11 2010, 00:51:29)
    [GCC 4.2.1 (Apple Inc. build 5646)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import unicodedata, os
    >>> file(u'\N{SNOWMAN}', 'w').close()
    >>> os.listdir('.')
    ['\xe2\x98\x83']
    >>> unicodedata.name('\xe2\x98\x83'.decode('utf-8'))
    'SNOWMAN'
    >>>

Jean-Paul

From ncoghlan at gmail.com Tue Jan 25 17:07:45 2011
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Wed, 26 Jan 2011 02:07:45 +1000
Subject: [Python-Dev] [Python-checkins] r88155 - python/branches/py3k/Doc/whatsnew/3.2.rst
In-Reply-To: <20110124015149.4B8CFEEAE3@mail.python.org>
References: <20110124015149.4B8CFEEAE3@mail.python.org>
Message-ID:

On Mon, Jan 24, 2011 at 11:51 AM, raymond.hettinger wrote:
> Author: raymond.hettinger
> Date: Mon Jan 24 02:51:49 2011
> New Revision: 88155
>
> Log:
> Add entries for dis, dbm, and ctypes.
>
>
> Modified:
>   python/branches/py3k/Doc/whatsnew/3.2.rst
>
> Modified: python/branches/py3k/Doc/whatsnew/3.2.rst
> ==============================================================================
> --- python/branches/py3k/Doc/whatsnew/3.2.rst   (original)
> +++ python/branches/py3k/Doc/whatsnew/3.2.rst   Mon Jan 24 02:51:49 2011
> @@ -1599,6 +1599,51 @@
>
>  (Contributed by Ron Adam; :issue:`2001`.)
>
> +dis
> +---

For the dis module there is also the change to dis.dis() itself from
issue 6507 - you can now pass source strings directly to dis without
needing to compile them first:

>>> dis.dis("1 + 2")
  1           0 LOAD_CONST               2 (3)
              3 RETURN_VALUE

> +The :mod:`dis` module gained two new functions for inspecting code,
> +:func:`~dis.code_info` and :func:`~dis.show_code`.  Both provide detailed code
> +object information for the supplied function, method, source code string or code
> +object.  The former returns a string and the latter prints it::
> +
> +    >>> import dis, random
> +    >>> show_code(random.choice)

Typo here - missing a "dis." at the start of the line.

Cheers,
Nick.

--
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From a.badger at gmail.com Tue Jan 25 17:35:25 2011
From: a.badger at gmail.com (Toshio Kuratomi)
Date: Tue, 25 Jan 2011 08:35:25 -0800
Subject: [Python-Dev] Import and unicode: part two
In-Reply-To: <7F2E941E-143A-461E-BEC5-D7545C6D877A@masklinn.net>
References: <1295440442.432.18.camel@marge>
<20110125032609.GC24080@unaka.lan> <7F2E941E-143A-461E-BEC5-D7545C6D877A@masklinn.net>
Message-ID: <20110125163525.GE24080@unaka.lan>

On Tue, Jan 25, 2011 at 10:22:41AM +0100, Xavier Morel wrote:
> On 2011-01-25, at 04:26 , Toshio Kuratomi wrote:
> >
> > * If you can pick a set of encodings that are valid (utf-8 for Linux and
> >   MacOS
>
> HFS+ uses UTF-16 in NFD (actually in an Apple-specific variant of NFD).
> Right here you've already broken Python modules on OSX.
>
Others have been saying that Mac OSX's HFS+ uses UTF-8. But the question
is not whether UTF-16 or UTF-8 is used by HFS+. It's whether you can
sensibly decide on an encoding from the type of system that is being run
on. This could be querying the filesystem or a check on sys.platform or
some other method. I don't know what detection the current code does.

On Linux there's no defined encoding that will work; file names are just
bytes to the Linux kernel so based on people's argument that the
convention is and should be that filenames are utf-8 and anything else
is a misconfigured system -- python should mandate that its module
filenames on Linux are utf-8 rather than using the user's locale
settings.

>
> And as far as I know, Linux software/FS generally use NFC (I've already
> seen this issue cause trouble)
>
Linux FS's are bytes with a small blacklist (so you can't use the NULL
byte in a filename, for instance). Linux software would be free to use
any normal form that they want. If one software used NFC and another
used NFD, the FS would record two separate files with two separate
filenames. Other programs might or might not display this correctly.

Example:

$ touch cafe

$ python
Python 2.7 (r27:82500, Sep 16 2010, 18:02:00)
>>> import os
>>> import unicodedata
>>> a=u'café'
>>> b=unicodedata.normalize('NFC', a)
>>> c=unicodedata.normalize('NFD', a)
>>> open(b.encode('utf8'), 'w').close()
>>> open(c.encode('utf8'), 'w').close()
>>> os.listdir(u'.')
[u'people-etc-changes.txt', u'cafe\u0301', u'cafe', u'people-etc-changes.sha256sum', u'caf\xe9']
>>> os.listdir('.')
['people-etc-changes.txt', 'cafe\xcc\x81', 'cafe', 'people-etc-changes.sha256sum', 'caf\xc3\xa9']
>>> ^D

$ ls -al .
drwxrwxr-x.  2 badger badger 4096 Jan 25 07:46 .
drwxr-xr-x. 17 badger badger 4096 Jan 24 18:27 ..
-rw-rw-r--.  1 badger badger    0 Jan 25 07:45 cafe
-rw-rw-r--.  1 badger badger    0 Jan 25 07:46 cafe
-rw-rw-r--.  1 badger badger    0 Jan 25 07:46 café

$ ls -al cafe
-rw-rw-r--. 1 badger badger 0 Jan 25 07:45 cafe

$ ls -al cafe?
-rw-rw-r--. 1 badger badger 0 Jan 25 07:46 cafe

Now in this case, the decomposed form of the filename is being displayed
incorrectly and the shell treats the decomposed character as two
characters instead of one. However, when you view these files in dolphin
(the KDE file manager) you properly see café repeated twice.

-Toshio
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 198 bytes
Desc: not available
URL:
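
Editor's sketch, not part of the thread: the effect in the session above can be reproduced without touching the filesystem. This is a minimal Python 3 version; the helper name same_name is hypothetical. It shows that the NFC and NFD spellings of one name encode to different UTF-8 byte strings, which is why a byte-oriented filesystem records them as two separate files.

```python
import unicodedata

# 'cafe' with a precomposed e-acute (U+00E9).
name = "caf\u00e9"

nfc = unicodedata.normalize("NFC", name)  # 4 code points, precomposed
nfd = unicodedata.normalize("NFD", name)  # 5 code points: 'e' + U+0301

# The kernel only sees bytes, and these two byte strings differ.
nfc_bytes = nfc.encode("utf-8")
nfd_bytes = nfd.encode("utf-8")


def same_name(a, b):
    """Compare two names after normalizing both to NFC (hypothetical helper)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


print(nfc_bytes, nfd_bytes)
print(nfc == nfd)            # a plain comparison treats them as different
print(same_name(nfc, nfd))   # normalizing first treats them as equal
```

Software that needs the two spellings to match has to normalize both sides itself, as same_name does; nothing in the byte-level filesystem interface does it.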
                              
From brett at python.org Tue Jan 25 18:38:36 2011
From: brett at python.org (Brett Cannon)
Date: Tue, 25 Jan 2011 09:38:36 -0800
Subject: [Python-Dev] Location of tests for packages
In-Reply-To: <749B0A1D-752F-4842-96D4-E73FEEFD5CEF@gmail.com>
References: <4D3E0E07.1080005@voidspace.org.uk> <749B0A1D-752F-4842-96D4-E73FEEFD5CEF@gmail.com>
Message-ID:

On Mon, Jan 24, 2011 at 17:19, Raymond Hettinger wrote:
>
> On Jan 24, 2011, at 3:40 PM, Michael Foord wrote:
>> It isn't just unittest, it seems that all *test packages* are in their
>> respective package and not Lib/test except for the json module where
>> Raymond already moved the tests:
>>
>>    distutils/tests
>>    email/test
>>    ctypes/test
>>    importlib/test
>>    lib2to3/tests
>>    sqlite3/test
>>    tkinter/test
>>
>> So I'm a little confused as to why the focus on the *unittest* test suite.
>
>
> There's not a focus on unittest.  Importlib should also move under Lib/test
> and when email is ready, it too should fully join the organization of
> the overall project (Doc, Lib, Lib/test, Modules, Objects, Tools).

Just to clarify my position since importlib keeps getting brought up
as an example, I'm fine with a move but I won't be putting the work in
to do the move if there is actually consensus to make this a
stdlib-wide policy. And I am assuming that the directory will be moved
wholesale to Lib/test/importlib (with proper fixes for any relative
imports) along with verification that importlib.test.__main__
continues to work (naming it test.importlib_tests seems rather
redundant compared to test.importlib).

While I'm for consistency, obviously a trend was started by ctypes and
sqlite3 that the rest of us who created full packages followed up to
this point. If we move some modules and not others purely because some
distros choose not to ship e.g., ctypes and sqlite3, that will get
annoying w/o some very clear explanation/delineation as to why some
packages have a special rule to follow (I'm guessing "packages that
have external dependencies" would be it).
From fijall at gmail.com Tue Jan 25 19:11:43 2011
From: fijall at gmail.com (Maciej Fijalkowski)
Date: Tue, 25 Jan 2011 20:11:43 +0200
Subject: [Python-Dev] r88178 - python/branches/py3k/Lib/test/crashers/underlying_dict.py
In-Reply-To: <20110125122603.74e49f8c@pitrou.net>
References: <20110125000028.94263EEBDB@mail.python.org> <20110125122603.74e49f8c@pitrou.net>
Message-ID:

On Tue, Jan 25, 2011 at 1:26 PM, Antoine Pitrou wrote:
> On Tue, 25 Jan 2011 01:00:28 +0100 (CET)
> benjamin.peterson wrote:
>> Author: benjamin.peterson
>> Date: Tue Jan 25 01:00:28 2011
>> New Revision: 88178
>>
>> Log:
>> another pretty crasher served up by pypy
>
> Some comments would be nice. Right now it looks pretty close to
> deliberately obfuscated code (especially with the call to
> gc.get_referrers()).
>
> Regards
>
> Antoine.
>

It gets to a dict of class circumventing dictproxy. It's yet unclear
why it segfaults.

From alexander.belopolsky at gmail.com Tue Jan 25 19:16:07 2011
From: alexander.belopolsky at gmail.com (Alexander Belopolsky)
Date: Tue, 25 Jan 2011 13:16:07 -0500
Subject: [Python-Dev] Location of tests for packages
In-Reply-To:
References: <4D3E0E07.1080005@voidspace.org.uk> <749B0A1D-752F-4842-96D4-E73FEEFD5CEF@gmail.com>
Message-ID:

On Tue, Jan 25, 2011 at 12:38 PM, Brett Cannon wrote:
>.. If we move some modules and not others purely because some
> distros choose not to ship e.g., ctypes and sqlite3

I don't see why this is a problem. Regrtest already has a mechanism
that allows skipping tests based on various criteria. This mechanism
works for both packages and flat modules that can be optionally
installed.

FWIW, I am +0 on consolidating tests under Lib/test. One of the
reasons that I have not seen mentioned is that it is well-known that
test package is not part of the official stdlib API and can be
changed/restructured in backward incompatible ways. It is not obvious
whether the same applies to say lib2to3.tests or ctypes.test.

If you are interested to see what it takes to move tests from a
package, I moved json tests to Lib/test/json_tests in r86875. It is
not hard, but does require some changes to imports.

From solipsis at pitrou.net Tue Jan 25 19:21:32 2011
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Tue, 25 Jan 2011 19:21:32 +0100
Subject: [Python-Dev] r88178 - python/branches/py3k/Lib/test/crashers/underlying_dict.py
In-Reply-To:
References: <20110125000028.94263EEBDB@mail.python.org> <20110125122603.74e49f8c@pitrou.net>
Message-ID: <1295979692.3716.8.camel@localhost.localdomain>

On Tuesday 25 January 2011 at 20:11 +0200, Maciej Fijalkowski wrote:
> On Tue, Jan 25, 2011 at 1:26 PM, Antoine Pitrou wrote:
> > On Tue, 25 Jan 2011 01:00:28 +0100 (CET)
> > benjamin.peterson wrote:
> >> Author: benjamin.peterson
> >> Date: Tue Jan 25 01:00:28 2011
> >> New Revision: 88178
> >>
> >> Log:
> >> another pretty crasher served up by pypy
> >
> > Some comments would be nice. Right now it looks pretty close to
> > deliberately obfuscated code (especially with the call to
> > gc.get_referrers()).
> >
> > Regards
> >
> > Antoine.
> >
>
> It gets to a dict of class circumventing dictproxy. It's yet unclear
> why it segfaults.

Perhaps the method cache?
But why the comment "# should print 1"? Shouldn't it print 2 instead?

From mal at egenix.com Tue Jan 25 23:43:52 2011
From: mal at egenix.com (M.-A. Lemburg)
Date: Tue, 25 Jan 2011 23:43:52 +0100
Subject: [Python-Dev] PEP 393: Flexible String Representation
In-Reply-To: <4D3DDE5E.4080807@v.loewis.de>
References: <4D3DDE5E.4080807@v.loewis.de>
Message-ID: <4D3F5228.4010901@egenix.com>

I'll comment more on this later this week...

From my first impression, I'm not too thrilled by the prospect of
making the Unicode implementation more complicated by having three
different representations on each object.

I also don't see how this could save a lot of memory. As an example
take a French text with say 10mio code points. This would end up
appearing in memory as 3 copies on Windows: one copy stored as UCS2 (20MB),
one as Latin-1 (10MB) and one as UTF-8 (probably around 15MB, depending
on how many accents are used). That's a saving of -10MB compared to
today's implementation :-)

"Martin v. Löwis" wrote:
> I have been thinking about Unicode representation for some time now.
> This was triggered, on the one hand, by discussions with Glyph Lefkowitz
> (who complained that his server app consumes too much memory), and Carl
> Friedrich Bolz (who profiled Python applications to determine that
> Unicode strings are among the top consumers of memory in Python).
> On the other hand, this was triggered by the discussion on supporting
> surrogates in the library better.
>
> I'd like to propose PEP 393, which takes a different approach,
> addressing both problems simultaneously: by getting a flexible
> representation (one that can be either 1, 2, or 4 bytes), we can
> support the full range of Unicode on all systems, but still use
> only one byte per character for strings that are pure ASCII (which
> will be the majority of strings for the majority of users).
>
> You'll find the PEP at
>
> http://www.python.org/dev/peps/pep-0393/
>
> For convenience, I include it below.

--
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source (#1, Jan 25 2011)
>>> Python/Zope Consulting and Support ... http://www.egenix.com/
>>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/
________________________________________________________________________

::: Try our new mxODBC.Connect Python Database Interface for free ! ::::

eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48
D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
Registered at Amtsgericht Duesseldorf: HRB 46611
http://www.egenix.com/company/contact/

From solipsis at pitrou.net Wed Jan 26 00:22:32 2011
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Wed, 26 Jan 2011 00:22:32 +0100
Subject: [Python-Dev] PEP 393: Flexible String Representation
References: <4D3DDE5E.4080807@v.loewis.de> <4D3F5228.4010901@egenix.com>
Message-ID: <20110126002232.1864cd6b@pitrou.net>

For the record:

> I also don't see how this could save a lot of memory. As an example
> take a French text with say 10mio code points. This would end up
> appearing in memory as 3 copies on Windows: one copy stored as UCS2 (20MB),
> one as Latin-1 (10MB) and one as UTF-8 (probably around 15MB, depending
> on how many accents are used).

Typical French text seems to have 5% non-ASCII characters. So the number
of UTF-8 bytes needed to represent a French text would only be 5% higher
than the number of code points.

Anyway, it's quite obvious that Martin's goal is that only one
representation gets created most of the time. To quote the draft:

  "All three representations are optional, although the str form is
  considered the canonical representation which can be absent only
  while the string is being created."

Regards

Antoine.

From martin at v.loewis.de Wed Jan 26 00:23:45 2011
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Wed, 26 Jan 2011 00:23:45 +0100
Subject: [Python-Dev] r88178 - python/branches/py3k/Lib/test/crashers/underlying_dict.py
In-Reply-To: <20110125122603.74e49f8c@pitrou.net>
References: <20110125000028.94263EEBDB@mail.python.org> <20110125122603.74e49f8c@pitrou.net>
Message-ID: <4D3F5B81.6050401@v.loewis.de>

> Some comments would be nice. Right now it looks pretty close to
> deliberately obfuscated code (especially with the call to
> gc.get_referrers()).

That call tries to get at the class dictionary, rather than just the
dict_proxy that you get from A.__dict__. There should be two
referrers to thingy: the class dict, and the module dict. The class
dict will have a __module__ key.

I agree the program should print 2, though.

Regards,
Martin

From solipsis at pitrou.net Wed Jan 26 00:24:12 2011
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Wed, 26 Jan 2011 00:24:12 +0100
Subject: [Python-Dev] PEP 393: Flexible String Representation
References: <4D3DDE5E.4080807@v.loewis.de>
Message-ID: <20110126002412.60002036@pitrou.net>

On Tue, 25 Jan 2011 21:08:01 +1000
Nick Coghlan wrote:
>
> One change I would propose is that rather than hiding flags in the low
> order bits of the str pointer, we expand the use of the existing
> "state" field to cover the representation information in addition to
> the interning information.

+1, by the way. The "state" field has many bits available (even if we
decide to make it a char rather than an int).

Regards

Antoine.

From ncoghlan at gmail.com Wed Jan 26 02:30:27 2011
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Wed, 26 Jan 2011 11:30:27 +1000
Subject: [Python-Dev] Location of tests for packages
In-Reply-To:
References: <4D3E0E07.1080005@voidspace.org.uk> <749B0A1D-752F-4842-96D4-E73FEEFD5CEF@gmail.com>
Message-ID:

On Wed, Jan 26, 2011 at 4:16 AM, Alexander Belopolsky wrote:
> FWIW, I am +0 on consolidating tests under Lib/test.  One of the
> reasons that I have not seen mentioned is that it is well-known that
> test package is not part of the official stdlib API and can be
> changes/restructured in backward incompatible ways. It is not obvious
> whether the same applies to say lib2to3.tests or ctypes.test.

I am +0 for the same reason as Alexander. The test subpackages should
either be moved under the test package, or, for packages with PyPI
distributed backports for previous versions, they should be prefixed
with a leading underscore to make it clear that they're private
implementation details and backwards compatibility guarantees don't
apply.

Cheers,
Nick.

--
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From ncoghlan at gmail.com Wed Jan 26 02:41:31 2011
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Wed, 26 Jan 2011 11:41:31 +1000
Subject: [Python-Dev] [Python-checkins] r88197 - python/branches/py3k/Lib/email/generator.py
In-Reply-To: <20110126003919.A9236EEC96@mail.python.org>
References: <20110126003919.A9236EEC96@mail.python.org>
Message-ID:

                              On Wed, Jan 26, 2011 at 10:39 AM, victor.stinner 
                              
                              wrote: > Author: victor.stinner > Date: Wed Jan 26 01:39:19 2011 > New Revision: 88197 > > Log: > Fix BytesGenerator._handle_text() if the message has no payload (None) Folks, for the peace of mind of python-checkins watchers, please remember to mention the reviewer's name when checking in fixes during the RC period (the last one I checked had been reviewed by Georg on the issue tracker, but it's hard to check without even an issue number to look up). Cheers, Nick. -- Nick Coghlan?? |?? ncoghlan at gmail.com?? |?? Brisbane, Australia From digitalxero at gmail.com Wed Jan 26 02:50:30 2011 From: digitalxero at gmail.com (Dj Gilcrease) Date: Tue, 25 Jan 2011 20:50:30 -0500 Subject: [Python-Dev] PEP 393: Flexible String Representation In-Reply-To: <4D3F5228.4010901@egenix.com> References: <4D3DDE5E.4080807@v.loewis.de> <4D3F5228.4010901@egenix.com> Message-ID: 
                              
                              On Tue, Jan 25, 2011 at 5:43 PM, M.-A. Lemburg 
                              
wrote: > I also don't see how this could save a lot of memory. As an example > take a French text with say 10 million code points. This would end up > appearing in memory as 3 copies on Windows: one copy stored as UCS2 (20MB), > one as Latin-1 (10MB) and one as UTF-8 (probably around 15MB, depending > on how many accents are used). That's a saving of -10MB compared to > today's implementation :-) If I am reading the PEP right, which I may not be as I am no expert on Unicode, the new implementation would actually give a 10MB saving since the wchar field is optional, so only the str (Latin-1) and utf8 fields would need to be stored. How it decides not to store one field or another would need to be clarified in the PEP if I am right. From brett at python.org Wed Jan 26 03:07:38 2011 From: brett at python.org (Brett Cannon) Date: Tue, 25 Jan 2011 18:07:38 -0800 Subject: [Python-Dev] [Python-checkins] r88197 - python/branches/py3k/Lib/email/generator.py In-Reply-To: <20110126003919.A9236EEC96@mail.python.org> References: <20110126003919.A9236EEC96@mail.python.org> Message-ID: 
                              
                              This broke the buildbots (R. David Murray thinks you may have forgotten to call super() in the 'payload is None' branch). Are you getting code reviews and fully running the test suite before committing? We are in RC. On Tue, Jan 25, 2011 at 16:39, victor.stinner 
                              
wrote:
> Author: victor.stinner
> Date: Wed Jan 26 01:39:19 2011
> New Revision: 88197
>
> Log:
> Fix BytesGenerator._handle_text() if the message has no payload (None)
>
> Modified:
>    python/branches/py3k/Lib/email/generator.py
>
> Modified: python/branches/py3k/Lib/email/generator.py
> ==============================================================================
> --- python/branches/py3k/Lib/email/generator.py (original)
> +++ python/branches/py3k/Lib/email/generator.py Wed Jan 26 01:39:19 2011
> @@ -377,8 +377,11 @@
>      def _handle_text(self, msg):
>          # If the string has surrogates the original source was bytes, so
>          # just write it back out.
> -        if _has_surrogates(msg._payload):
> -            self.write(msg._payload)
> +        payload = msg.get_payload()
> +        if payload is None:
> +            return
> +        if _has_surrogates(payload):
> +            self.write(payload)
>          else:
>              super(BytesGenerator,self)._handle_text(msg)
>
> _______________________________________________
> Python-checkins mailing list
> Python-checkins at python.org
> http://mail.python.org/mailman/listinfo/python-checkins
>
From stephen at xemacs.org Wed Jan 26 03:24:54 2011 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Wed, 26 Jan 2011 11:24:54 +0900 Subject: [Python-Dev] Import and unicode: part two In-Reply-To: <20110125163525.GE24080@unaka.lan> References: <1295440442.432.18.camel@marge> 
                              
                              
                              
                              
                              
                              
<20110125032609.GC24080@unaka.lan> <7F2E941E-143A-461E-BEC5-D7545C6D877A@masklinn.net> <20110125163525.GE24080@unaka.lan> Message-ID: <874o8w5rm1.fsf@uwakimon.sk.tsukuba.ac.jp> Toshio Kuratomi writes: > On Linux there's no defined encoding that will work; file names are just > bytes to the Linux kernel so based on people's argument that the convention > is and should be that filenames are utf-8 and anything else is > a misconfigured system -- python should mandate that its module filenames on > Linux are utf-8 rather than using the user's locale settings. This isn't going to work where I live (Tsukuba). At the national university alone there are hundreds of pre-existing *nix systems whose filesystems were often configured a decade or more ago. Even if the hardware and OS have been upgraded, the filesystems are usually migrated as-is, with OS configuration tweaks to accommodate them. Many of them use EUC-JP (and servers often Shift JIS). That means that you won't be able to read module names with ls, and that will make Python unacceptable for this purpose. I imagine that in Russia the same is true for the various Cyrillic encodings. I really don't think there is anything that can be done here except to warn people that "Kids, these stunts are performed by highly-trained professionals. Don't try this at home!" Of course they will anyway, but at least they will have been warned in sufficiently strong terms that they might pay attention and be able to recover when they run into bizarre import exceptions. Oh, yeah, don't forget to apply Victor's patch, which allows Python to keep the promises it can make about consistency.
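To make the failure mode concrete, here is a minimal Python 3 sketch of what happens when a module file name stored as EUC-JP bytes is read under a "file names are always UTF-8" rule. The module name used is purely illustrative (an assumption, not a name taken from this thread); the codecs and the surrogateescape error handler are standard Python.

```python
# Hypothetical module name; any non-ASCII Japanese name would behave similarly.
module_name = "日本語"

# The bytes as they would sit on a filesystem configured for EUC-JP.
raw = module_name.encode("euc_jp")

# Read back with the matching locale encoding: the name survives.
assert raw.decode("euc_jp") == module_name

# Read back under a UTF-8-only rule: strict decoding fails.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# surrogateescape keeps the undecodable bytes so they can be written back
# unchanged, but the resulting str is not the name the user typed.
escaped = raw.decode("utf-8", errors="surrogateescape")
roundtrip = escaped.encode("utf-8", errors="surrogateescape")

print("strict UTF-8 decode worked:", utf8_ok)
print("bytes round-trip via surrogateescape:", roundtrip == raw)
print("escaped text equals original name:", escaped == module_name)
```

The round trip preserves the bytes, which is enough to open the file, but an import statement written with the real characters would not match the escaped name.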
                              
                              From a.badger at gmail.com Wed Jan 26 06:33:56 2011 From: a.badger at gmail.com (Toshio Kuratomi) Date: Tue, 25 Jan 2011 21:33:56 -0800 Subject: [Python-Dev] Import and unicode: part two In-Reply-To: <874o8w5rm1.fsf@uwakimon.sk.tsukuba.ac.jp> References: 
                              
                              
                              
                              
                              
                              
                              <20110125032609.GC24080@unaka.lan> <7F2E941E-143A-461E-BEC5-D7545C6D877A@masklinn.net> <20110125163525.GE24080@unaka.lan> <874o8w5rm1.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <20110126053356.GH24080@unaka.lan> On Wed, Jan 26, 2011 at 11:24:54AM +0900, Stephen J. Turnbull wrote: > Toshio Kuratomi writes: > > > On Linux there's no defined encoding that will work; file names are just > > bytes to the Linux kernel so based on people's argument that the convention > > is and should be that filenames are utf-8 and anything else is > > a misconfigured system -- python should mandate that its module filenames on > > Linux are utf-8 rather than using the user's locale settings. > > This isn't going to work where I live (Tsukuba). At the national > university alone there are hundreds of pre-existing *nix systems whose > filesystems were often configured a decade or more ago. Even if the > hardware and OS have been upgraded, the filesystems are usually > migrated as-is, with OS configuration tweaks to accomodate them. Many > of them use EUC-JP (and servers often Shift JIS). That means that you > won't be able to read module names with ls, and that will make Python > unacceptable for this purpose. I imagine that in Russia the same is > true for the various Cyrillic encodings. > Sure ... but with these systems, neither read-modules-as-locale or read-modules-as-utf-8 are a good solution to work, correct? Especially if the OS does get upgraded but the filesystems with user data (and user created modules) are migrated as-is, you'll run into situations where system installed modules are in utf-8 and user created modules are shift-jis and so something will always be broken. The only way to make sure that modules work is to restrict them to ASCII-only on the filesystem. But because unicode module names are seen as a necessary feature, the question is which way forward is going to lead to the least brokenness. Which could be locale... 
but from the python2 locale-related bugs that I get to look at, I doubt it. > I really don't think there is anything that can be done here except to > warn people that "Kids, these stunts are performed by highly-trained > professionals. Don't try this at home!" Of course they will anyway, > but at least they will have been warned in sufficiently strong terms > that they might pay attention and be able to recover when they run > into bizarre import exceptions. > So on the subject of warnings... I think a reason it's better to pick an encoding for the platform/filesystem rather than to use locale is because people will get an error or a warning at the appropriate time if that's the case -- the first time they attempt to create and import a module with a filename that's not encoded in the correct encoding for the platform. It's all very well to say: "We wrote in the documentation on http://docs.python.org/distutils/introduction.html#Choosing-a-name that only ASCII names should be used when distributing python modules" but if the interpreter doesn't complain when people use a non-ASCII filename we all know that they aren't going to look in the documentation; they'll try it and if it works they'll learn that habit. -Toshio
                              
                              From stephen at xemacs.org Wed Jan 26 09:58:36 2011 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Wed, 26 Jan 2011 17:58:36 +0900 Subject: [Python-Dev] Import and unicode: part two In-Reply-To: <20110126053356.GH24080@unaka.lan> References: 
                              
                              
                              
                              
                              
                              
                              <20110125032609.GC24080@unaka.lan> <7F2E941E-143A-461E-BEC5-D7545C6D877A@masklinn.net> <20110125163525.GE24080@unaka.lan> <874o8w5rm1.fsf@uwakimon.sk.tsukuba.ac.jp> <20110126053356.GH24080@unaka.lan> Message-ID: <87wrls3utf.fsf@uwakimon.sk.tsukuba.ac.jp> Toshio Kuratomi writes: > Sure ... but with these systems, neither read-modules-as-locale or > read-modules-as-utf-8 are a good solution to work, correct? Good solution, no, but I believe that read-modules-as-locale *should* work to a great extent. AFAIK Python 3 reads Python programs as str (ie, converting to Unicode -- if it doesn't, it *should*
                              
                              ). > Especially if the OS does get upgraded but the filesystems with > user data (and user created modules) are migrated as-is, you'll run > into situations where system installed modules are in utf-8 and > user created modules are shift-jis and so something will always be > broken. I don't know what you mean by "system-installed modules". If you're talking about Python itself, it's not a problem. Python doesn't have any Japanese-named modules in any encoding. On the other hand, *everything* that involves scripting (shell scripts, make, etc) related to those filesystems will be broken *unless* the system, after upgrade but before going live, is converted to have an appropriate locale encoding. So I don't really see a problem here. The problem is portability across systems, and that is a problem that only the third-party transports can really deal with. tar and unzip need to be taught how to change file names to the locale, etc. > The only way to make sure that modules work is to restrict them to ASCII-only > on the filesystem. But because unicode module names are seen as > a necessary feature, the question is which way forward is going to lead to > the least brokenness. Which could be locale... but from the python2 > locale-related bugs that I get to look at, I doubt. AFAICS this is going to be site-specific. End of story. Or, if you prefer, "maru-nage".
                              
                              IMHO, Python 2 locale bugs are unlikely to be a good guide to Python 3 locale bugs because in Python 2 most people just ignore locale and use "native" strings (~= bytes in Python 3), and that typically "just works". In Python 3 that just *doesn't* work any more because you get a UnicodeError on import, etc, etc. IMHO, YMMV, and all that. I know *of* such systems (there remain quite a few here used by student and research labs), but the ones I maintain were easy to convert to UTF-8 because I don't export file systems (except my private files for my own use); everything is mediated by Apache and Zope, and browsers are happy to cope if I change from EUC-JP to UTF-8 and then flip the Apache switch to change default encodings. From victor.stinner at haypocalc.com Wed Jan 26 10:40:34 2011 From: victor.stinner at haypocalc.com (Victor Stinner) Date: Wed, 26 Jan 2011 10:40:34 +0100 Subject: [Python-Dev] Import and unicode: part two In-Reply-To: <20110125032609.GC24080@unaka.lan> References: <1295440442.432.18.camel@marge> 
                              
                              
                              
                              
                              
                              
<20110125032609.GC24080@unaka.lan> Message-ID: <1296034834.25379.18.camel@marge> On Monday 24 January 2011 at 19:26 -0800, Toshio Kuratomi wrote: > Why not locale: > * Relying on locale is simply not portable. (...) > * Mixing of modules from different locales won't work. (...) I don't understand what you are talking about. When you import a module, the module name becomes a filename. On Windows, you can reuse the Unicode name directly as a filename. On the other OSes, you have to encode the name to the filesystem encoding. During Python 3.2 development, we tried to be able to use a filesystem encoding different than the locale encoding (PYTHONFSENCODING environment variable): but it doesn't work simply because Python is not alone in the OS. Except for Python, all programs speak the same "language": the locale encoding. Let's try to give you an example: if you create a module with a name encoded to UTF-8 while the locale encoding is something else, your file browser will display mojibake. I don't understand the relation between the local filesystem encoding and the portability. I suppose that you are talking about the distribution of a module to other computers. Here the question is how the filenames are stored during the transfer. The user is free to use any tool, and try to find a tool handling Unicode correctly :-) But that is no longer Python's problem. Each computer may use a different locale encoding. You have to use it to cooperate with other programs and avoid mojibake. But I don't understand why you write that "Mixing of modules from different locales won't work". If you use a tool storing filenames in your locale encoding (eg. TAR file format... and sometimes the ZIP format), the problem comes from your tool and you should use another tool. I created http://bugs.python.org/issue10972 to work around ZIP tools that assume ZIP files use the locale encoding instead of cp437: this issue adds an option to force the usage of the Unicode flag (and so store filenames as UTF-8). 
Initially, though, I created the issue to work around a bootstrap issue (#10955). Victor From victor.stinner at haypocalc.com Wed Jan 26 10:57:28 2011 From: victor.stinner at haypocalc.com (Victor Stinner) Date: Wed, 26 Jan 2011 10:57:28 +0100 Subject: [Python-Dev] [Python-checkins] r88197 - python/branches/py3k/Lib/email/generator.py In-Reply-To: 
                              
                              References: <20110126003919.A9236EEC96@mail.python.org> 
                              
Message-ID: <1296035848.25379.27.camel@marge> Hi, On Tuesday 25 January 2011 at 18:07 -0800, Brett Cannon wrote: > This broke the buildbots (R. David Murray thinks you may have > forgotten to call super() in the 'payload is None' branch). Are you > getting code reviews and fully running the test suite before > committing? We are in RC. > (...) > > - if _has_surrogates(msg._payload): > > - self.write(msg._payload) > > + payload = msg.get_payload() > > + if payload is None: > > + return > > + if _has_surrogates(payload): > > + self.write(payload) I didn't realize that such a minor change can do anything harmful: the parent method (Generator._handle_text) has exactly the same test. If msg._payload is None, calling the parent method with None does nothing. But _has_surrogates() doesn't support None. The problem is not the test of None, but replacing msg._payload by msg.get_payload(). I thought that get_payload() was a dummy getter reading self._payload, but I was completely wrong :-) I was stupid to not run at least test_email, sorry. And no, I didn't ask for a review, because I thought that such a minor change cannot be harmful. FYI the commit is related indirectly to #9124 (Mailbox module should use binary I/O, not text I/O). Victor From martin at v.loewis.de Wed Jan 26 11:12:02 2011 From: martin at v.loewis.de ("Martin v. Löwis") Date: Wed, 26 Jan 2011 11:12:02 +0100 Subject: [Python-Dev] Import and unicode: part two In-Reply-To: <1296034834.25379.18.camel@marge> References: <1295440442.432.18.camel@marge> 
                              
                              
                              
                              
                              
                              
<20110125032609.GC24080@unaka.lan> <1296034834.25379.18.camel@marge> Message-ID: <4D3FF372.3040007@v.loewis.de> On 26.01.2011 10:40, Victor Stinner wrote: > On Monday 24 January 2011 at 19:26 -0800, Toshio Kuratomi wrote: >> Why not locale: >> * Relying on locale is simply not portable. (...) >> * Mixing of modules from different locales won't work. (...) > > I don't understand what you are talking about. I think by "portability", he means "moving files from one computer to another". He argues that if Python would mandate UTF-8 for all file names on Unix, moving files in such a way would support portability, whereas using the locale's filename might not (if the locale uses a different charset on the target system). While this is technically true, I don't think it's a helpful way of thinking: by mandating that file names are UTF-8 when accessed from Python, we make the actual files inaccessible on both the source and the target system. > I don't understand the relation between the local filesystem encoding > and the portability. I suppose that you are talking about the > distribution of a module to other computers. Here the question is how > the filenames are stored during the transfer. The user is free to use > any tool, and try to find a tool handling Unicode correctly :-) But it's > no more the Python problem. There are cases where there is no real "transfer", in the sense in which you are using the word. For example, with NFS, you can access the very same file simultaneously on two systems, with no file name conversion (unless you are using NFSv4, and unless your NFSv4 implementations support the UTF-8 mandate in NFS well). Also, if two users of the same machine have different locale settings, the same file name might be interpreted differently. 
Regards, Martin From phd at phdru.name Wed Jan 26 12:02:31 2011 From: phd at phdru.name (Oleg Broytman) Date: Wed, 26 Jan 2011 14:02:31 +0300 Subject: [Python-Dev] Import and unicode: part two In-Reply-To: <4D3FF372.3040007@v.loewis.de> References: <1295440442.432.18.camel@marge> 
                              
                              
                              
                              
                              
                              
<20110125032609.GC24080@unaka.lan> <1296034834.25379.18.camel@marge> <4D3FF372.3040007@v.loewis.de> Message-ID: <20110126110231.GB27259@iskra.aviel.ru> On Wed, Jan 26, 2011 at 11:12:02AM +0100, "Martin v. Löwis" wrote: > There are cases where there is no real "transfer", in the sense in which > you are using the word. For example, with NFS, you can access the very > same file simultaneously on two systems, with no file name conversion > (unless you are using NFSv4, and unless your NFSv4 implementations > support the UTF-8 mandate in NFS well). > > Also, if two users of the same machine have different locale settings, > the same file name might be interpreted differently. I have a solution for all these problems, with a price, of course. Let's use utf8+base64. Base64 uses a very restricted subset of ASCII and filenames will never be misinterpreted, whatever the filesystem encoding is. The price is that users lose standard OS tools like ls and find. I am partially joking, of course, but only partially. Oleg. -- Oleg Broytman http://phdru.name/ phd at phdru.name Programmers don't die, they just GOSUB without RETURN. From victor.stinner at haypocalc.com Wed Jan 26 12:57:16 2011 From: victor.stinner at haypocalc.com (Victor Stinner) Date: Wed, 26 Jan 2011 12:57:16 +0100 Subject: [Python-Dev] Import and unicode: part two In-Reply-To: <4D3FF372.3040007@v.loewis.de> References: <1295440442.432.18.camel@marge> 
                              
                              
                              
                              
                              
                              
<20110125032609.GC24080@unaka.lan> <1296034834.25379.18.camel@marge> <4D3FF372.3040007@v.loewis.de> Message-ID: <1296043036.25379.41.camel@marge> On Wednesday 26 January 2011 at 11:12 +0100, "Martin v. Löwis" wrote: > There are cases where there is no real "transfer", in the sense in which > you are using the word. For example, with NFS, you can access the very > same file simultaneously on two systems, with no file name conversion > (unless you are using NFSv4, and unless your NFSv4 implementations > support the UTF-8 mandate in NFS well). Python encodes the module name to the locale encoding to create a filename. If the locale encoding is not the encoding used on the NFS server, it doesn't work, but I don't think that Python has to work around this issue. If a user plays with non-ASCII module names, (s)he has to understand that (s)he will have to fight against badly configured systems and tools unable to handle Unicode correctly. We might warn him/her in the documentation. If NFSv3 doesn't reencode filenames for each client and the clients don't reencode filenames, all clients have to use the same locale encoding as the server. Otherwise, I don't see how it can work. > Also, if two users of the same machine have different locale settings, > the same file name might be interpreted differently. Except Mac OS X and Windows, no kernel supports Unicode and so all users of the same computer have to use the same locale encoding, or they will not be able to share non-ASCII filenames. -- Again, I don't think that Python should do anything special to work around these issues. (Hardcoding the module filename encoding to UTF-8 doesn't work for all the reasons explained in other emails.) Victor From ncoghlan at gmail.com Wed Jan 26 13:30:37 2011 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 26 Jan 2011 22:30:37 +1000 Subject: [Python-Dev] PEP 393: Flexible String Representation In-Reply-To: 
                              
                              References: <4D3DDE5E.4080807@v.loewis.de> <4D3F5228.4010901@egenix.com> 
                              
                              Message-ID: 
                              
                              On Wed, Jan 26, 2011 at 11:50 AM, Dj Gilcrease 
                              
                              wrote: > On Tue, Jan 25, 2011 at 5:43 PM, M.-A. Lemburg 
                              
wrote: >> I also don't see how this could save a lot of memory. As an example >> take a French text with say 10 million code points. This would end up >> appearing in memory as 3 copies on Windows: one copy stored as UCS2 (20MB), >> one as Latin-1 (10MB) and one as UTF-8 (probably around 15MB, depending >> on how many accents are used). That's a saving of -10MB compared to >> today's implementation :-) > > If I am reading the PEP right, which I may not be as I am no expert on > Unicode, the new implementation would actually give a 10MB saving > since the wchar field is optional, so only the str (Latin-1) and utf8 > fields would need to be stored. How it decides not to store one field > or another would need to be clarified in the PEP if I am right.

The PEP actually does define that already: PyUnicode_AsUTF8 populates the utf8 field of the existing string, while PyUnicode_AsUTF8String creates a *new* string with that field populated. PyUnicode_AsUnicode will populate the wstr field (but doing so generally shouldn't be necessary).

For a UCS4 build, my reading of the PEP puts the memory savings for a 100 code point string as follows:
Current size: 400 bytes (regardless of max code point)
New initial size (max code point < 256): 100 bytes (75% saving)
New initial size (max code point < 65536): 200 bytes (50% saving)
New initial size (max code point >= 65536): 400 bytes (no saving)

For each of the "new" strings, they may consume additional storage if the utf8 or wstr fields get populated. The maximum possible size would be a UCS4 string (max code point >= 65536) on a sizeof(wchar_t) == 2 system with the utf8 string populated. In such cases, you would consume at least 700 bytes, plus whatever additional memory is needed to encode the non-BMP characters into UTF-8 and UTF-16.

Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | 
Brisbane, Australia From ncoghlan at gmail.com Wed Jan 26 13:34:43 2011 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 26 Jan 2011 22:34:43 +1000 Subject: [Python-Dev] [Python-checkins] r88197 - python/branches/py3k/Lib/email/generator.py In-Reply-To: <1296035848.25379.27.camel@marge> References: <20110126003919.A9236EEC96@mail.python.org> 
                              
                              <1296035848.25379.27.camel@marge> Message-ID: 
                              
                              On Wed, Jan 26, 2011 at 7:57 PM, Victor Stinner 
                              
wrote: > I was stupid to not run at least test_email, sorry. And no, I didn't ask > for a review, because I thought that such a minor change cannot be > harmful. During the RC period, *everything* that touches the code base should be reviewed by a second committer before checkin, and sanctioned by the RM as well. This applies even for apparently trivial changes. Docs checkins are slightly less strict (especially Raymond finishing off the What's New), but even there it's preferable to be cautious in the run up to a final release. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From p.f.moore at gmail.com Wed Jan 26 13:49:44 2011 From: p.f.moore at gmail.com (Paul Moore) Date: Wed, 26 Jan 2011 12:49:44 +0000 Subject: [Python-Dev] PEP 393: Flexible String Representation In-Reply-To: 
                              
                              References: <4D3DDE5E.4080807@v.loewis.de> <4D3F5228.4010901@egenix.com> 
                              
                              
                              Message-ID: 
                              
                              On 26 January 2011 12:30, Nick Coghlan 
                              
                              wrote: > The PEP actually does define that already: > > PyUnicode_AsUTF8 populates the utf8 field of the existing string, > while PyUnicode_AsUTF8String creates a *new* string with that field > populated. > > PyUnicode_AsUnicode will populate the wstr field (but doing so > generally shouldn't be necessary). AIUI, another point is that the PEP deprecates the use of the calls that populate the utf8 and wstr fields, in favour of the calls that expect the caller to manage the extra memory (PyUnicode_AsUTF8String rather than PyUnicode_AsUTF8, ??? rather than PyUnicode_AsUnicode). So in the long term, the extra fields should never be populated - although this could take some time as extensions have to be recoded. Ultimately, the extra fields and older APIs could even be removed. So any space cost (which I concede could be non-trivial in some cases) is expected to be short-term. Paul. From foom at fuhm.net Wed Jan 26 14:24:15 2011 From: foom at fuhm.net (James Y Knight) Date: Wed, 26 Jan 2011 08:24:15 -0500 Subject: [Python-Dev] Import and unicode: part two In-Reply-To: <1296034834.25379.18.camel@marge> References: <1295440442.432.18.camel@marge> 
                              
                              
                              
                              
                              
                              
                              <20110125032609.GC24080@unaka.lan> <1296034834.25379.18.camel@marge> Message-ID: 
                              
                              On Jan 26, 2011, at 4:40 AM, Victor Stinner wrote: > During > Python 3.2 development, we tried to be able to use a filesystem encoding > different than the locale encoding (PYTHONFSENCODING environment > variable): but it doesn't work simply because Python is not alone in the > OS. Except Python, all programs speak the same "language": the locale > encoding. Let's try to give you an example: if create a module with a > name encoded to UTF-8, your file browser will display mojibake. Is that really true? I'm pretty sure GTK+ treats all filenames as UTF-8 no matter what the locale says. (over-rideable by G_FILENAME_ENCODING or G_BROKEN_FILENAMES) James From victor.stinner at haypocalc.com Wed Jan 26 17:47:10 2011 From: victor.stinner at haypocalc.com (Victor Stinner) Date: Wed, 26 Jan 2011 17:47:10 +0100 Subject: [Python-Dev] Import and unicode: part two In-Reply-To: 
                              
                              References: <1295440442.432.18.camel@marge> 
                              
                              
                              
                              
                              
                              
                              <20110125032609.GC24080@unaka.lan> <1296034834.25379.18.camel@marge> 
                              
Message-ID: <1296060430.2672.76.camel@marge> On Wednesday 26 January 2011 at 08:24 -0500, James Y Knight wrote: > On Jan 26, 2011, at 4:40 AM, Victor Stinner wrote: > > During > > Python 3.2 development, we tried to be able to use a filesystem encoding > > different than the locale encoding (PYTHONFSENCODING environment > > variable): but it doesn't work simply because Python is not alone in the > > OS. Except Python, all programs speak the same "language": the locale > > encoding. Let's try to give you an example: if you create a module with a > > name encoded to UTF-8, your file browser will display mojibake. > > Is that really true? I'm pretty sure GTK+ treats all filenames as > UTF-8 no matter what the locale says. (over-rideable by > G_FILENAME_ENCODING or G_BROKEN_FILENAMES)

Not exactly. Gtk+ uses the glib library, and to encode/decode filenames, the glib library uses:
- UTF-8 on Windows
- G_FILENAME_ENCODING environment variable if set (comma-separated list of encodings)
- UTF-8 if G_BROKEN_FILENAMES env var is set
- or the locale encoding

glib has no type to store a filename; a filename is a raw byte string (char*). It has a nice function to work around mojibake issues: g_filename_display_name(). This function tries to decode the filename from each encoding of the filename encoding list; if all decodings fail, it uses UTF-8 and escapes undecodable bytes.

So yes, if you set G_FILENAME_ENCODING you can fix mojibake issues. But you have to pass the raw bytes filenames to other libraries and programs. The problem with PYTHONFSENCODING is that sys.getfilesystemencoding() is not only used for the filenames, but also for the command line arguments and the environment variables. 
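The fallback described for g_filename_display_name() can be approximated in Python as follows. This is only a sketch of the behaviour as described in the message above, not glib's actual implementation: the function name display_name and its charsets parameter are made up for illustration, and undecodable bytes are replaced with U+FFFD here, which may differ from what glib emits.

```python
def display_name(raw, charsets):
    """Decode a raw byte filename for display purposes only.

    Tries each candidate charset in order (hypothetical stand-in for the
    filename encoding list); if none decodes cleanly, falls back to UTF-8
    and substitutes U+FFFD for every undecodable byte.
    """
    for charset in charsets:
        try:
            return raw.decode(charset)
        except UnicodeDecodeError:
            continue
    return raw.decode("utf-8", errors="replace")


# A Latin-1 encoded name is not valid UTF-8, so the second charset is used.
latin1_name = "été.py".encode("iso-8859-1")
print(display_name(latin1_name, ["utf-8", "iso-8859-1"]))

# With only UTF-8 in the list, the fallback path marks the bad bytes.
print(display_name(latin1_name, ["utf-8"]))
```

The returned text is suitable for showing to a user but not for reopening the file; as the message notes, the raw bytes still have to be passed along to other libraries and programs.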
For more information about glib, see g_filename_to_utf8(), g_filename_display_name() and g_get_filename_charsets() documentation: http://library.gnome.org/devel/glib/2.26/glib-Character-Set-Conversion.html Victor From brett at python.org Wed Jan 26 18:43:52 2011 From: brett at python.org (Brett Cannon) Date: Wed, 26 Jan 2011 09:43:52 -0800 Subject: [Python-Dev] [Python-checkins] r88197 - python/branches/py3k/Lib/email/generator.py In-Reply-To: 
                              
                              References: <20110126003919.A9236EEC96@mail.python.org> 
                              
                              <1296035848.25379.27.camel@marge> 
                              
                              Message-ID: 
                              
On Wed, Jan 26, 2011 at 04:34, Nick Coghlan wrote:
> On Wed, Jan 26, 2011 at 7:57 PM, Victor Stinner wrote:
>> I was stupid to not run at least test_email, sorry. And no, I didn't ask
>> for a review, because I thought that such minor change cannot be
>> harmful.
>
> During the RC period, *everything* that touches the code base should
> be reviewed by a second committer before checkin, and sanctioned by
> the RM as well. This applies even for apparently trivial changes.

Especially as this is not the first slip-up; Raymond had a copy-and-paste
slip that broke the buildbots. Luckily he was in #python-dev when it
happened, and it was noticed fast enough that he fixed it in under a minute.

So yes, even stuff we would all consider minor **must** have a review.
Time to update the devguide, I think.

-Brett

From g.brandl at gmx.net Wed Jan 26 19:08:36 2011
From: g.brandl at gmx.net (Georg Brandl)
Date: Wed, 26 Jan 2011 19:08:36 +0100
Subject: [Python-Dev] [Python-checkins] r88197 - python/branches/py3k/Lib/email/generator.py
In-Reply-To: <1296035848.25379.27.camel@marge>
References: <20110126003919.A9236EEC96@mail.python.org>
                              
                              <1296035848.25379.27.camel@marge> Message-ID: 
                              
On 26.01.2011 10:57, Victor Stinner wrote:
> Hi,
>
> On Tuesday 25 January 2011 at 18:07 -0800, Brett Cannon wrote:
>> This broke the buildbots (R. David Murray thinks you may have
>> forgotten to call super() in the 'payload is None' branch). Are you
>> getting code reviews and fully running the test suite before
>> committing? We are in RC.
>> (...)
>> > - if _has_surrogates(msg._payload):
>> > -     self.write(msg._payload)
>> > + payload = msg.get_payload()
>> > + if payload is None:
>> > +     return
>> > + if _has_surrogates(payload):
>> > +     self.write(payload)
>
> I didn't realize that such minor change can do anything harmful:

That's why the rule is that *every change needs to be reviewed*, not
*every change that doesn't look harmful needs to be reviewed*.

(This is true only for code changes, of course. Doc changes rarely have
hidden bugs, nor are they embarrassing when a bug slips into the release.
And I get the "test suite" (building the docs) results twice a day and
can fix problems myself.)

> the
> parent method (Generator._handle_text) has exactly the same test. If
> msg._payload is None, calling the parent method with None does nothing. But
> _has_surrogates() doesn't support None.
>
> The problem is not the test of None, but replacing msg._payload by
> msg.get_payload(). I thought that get_payload() was a dummy getter
> reading self._payload, but I was completely wrong :-)
>
> I was stupid to not run at least test_email, sorry. And no, I didn't ask
> for a review, because I thought that such minor change cannot be
> harmful.

I hope you know better now :) *Always* run the test suite *before* even
asking for review.

Georg

From foom at fuhm.net Wed Jan 26 19:25:49 2011
From: foom at fuhm.net (James Y Knight)
Date: Wed, 26 Jan 2011 13:25:49 -0500
Subject: [Python-Dev] Import and unicode: part two
In-Reply-To: <1296060430.2672.76.camel@marge>
References: <1295440442.432.18.camel@marge>
                              
                              
                              
                              
                              
                              
                              <20110125032609.GC24080@unaka.lan> <1296034834.25379.18.camel@marge> 
                              
<1296060430.2672.76.camel@marge>
Message-ID: <7A5FC2EF-45CC-4838-8919-00E9563AF9D3@fuhm.net>

On Jan 26, 2011, at 11:47 AM, Victor Stinner wrote:
> Not exactly. Gtk+ uses the glib library, and to encode/decode filenames,
> the glib library uses:
>
> - UTF-8 on Windows
> - G_FILENAME_ENCODING environment variable if set (comma-separated list
> of encodings)
> - UTF-8 if G_BROKEN_FILENAMES env var is set
> - or the locale encoding

But the documentation says:

> On Unix, the character sets are determined by consulting the environment variables G_FILENAME_ENCODING and G_BROKEN_FILENAMES. On Windows, the character set used in the GLib API is always UTF-8 and said environment variables have no effect.
>
> G_FILENAME_ENCODING may be set to a comma-separated list of character set names. The special token "@locale" is taken to mean the character set for the current locale. If G_FILENAME_ENCODING is not set, but G_BROKEN_FILENAMES is, the character set of the current locale is taken as the filename encoding. If neither environment variable is set, UTF-8 is taken as the filename encoding, but the character set of the current locale is also put in the list of encodings.

Which indicates to me that (unless you override the behavior with env vars) it encodes filenames in UTF-8 regardless of the locale, and attempts decoding in UTF-8 primarily. Only when the filename doesn't make sense in UTF-8 will it also try decoding it in the locale encoding.

James

From andy-python at hammerhartes.de Wed Jan 26 22:09:15 2011
From: andy-python at hammerhartes.de (=?ISO-8859-1?Q?Andreas_St=FChrk?=)
Date: Wed, 26 Jan 2011 22:09:15 +0100
Subject: [Python-Dev] r88178 - python/branches/py3k/Lib/test/crashers/underlying_dict.py
In-Reply-To:
                              
                              References: <20110125000028.94263EEBDB@mail.python.org> <20110125122603.74e49f8c@pitrou.net> 
                              
Message-ID:

> It gets to a dict of class circumventing dictproxy. It's yet unclear
> why it segfaults.

The crash as well as the output "1" are both caused by the fact that
updating the class dictionary directly doesn't invalidate the method
cache. When the new value for "f" is assigned to the dict, the old "f"
gets garbage collected (because the method cache uses borrowed
references), but there is still an entry in the cache for the (now
garbage-collected) function. When "a.f" is executed next, the entry of
the cache is used and a new method is created. When that method gets
called, it returns "1", and when the interpreter tries to garbage collect
the new method on interpreter finalization, it segfaults because the
referenced "f" is already collected.

Regards,
Andreas

From martin at v.loewis.de Wed Jan 26 22:10:48 2011
From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=)
Date: Wed, 26 Jan 2011 22:10:48 +0100
Subject: [Python-Dev] Import and unicode: part two
In-Reply-To: <1296043036.25379.41.camel@marge>
References: <1295440442.432.18.camel@marge>
                              
                              
                              
                              
                              
                              
<20110125032609.GC24080@unaka.lan> <1296034834.25379.18.camel@marge> <4D3FF372.3040007@v.loewis.de> <1296043036.25379.41.camel@marge>
Message-ID: <4D408DD8.8030505@v.loewis.de>

> If NFSv3 doesn't reencode filenames for each client and the clients
> don't reencode filenames, all clients have to use the same locale
> encoding as the server. Otherwise, I don't see how it can work.

In practice, users accept that they get mojibake - their editors can
still open the files, and they can double-click them in a file browser
just fine. So it doesn't really need to work, and users can still use it.

> Again, I don't think that Python should do anything special to
> work around these issues.

I agree, and I'm certainly in favor of keeping the current code base.
Just make sure you understand the reasoning of those opposing.

Regards,
Martin

From a.badger at gmail.com Thu Jan 27 01:47:08 2011
From: a.badger at gmail.com (Toshio Kuratomi)
Date: Wed, 26 Jan 2011 16:47:08 -0800
Subject: [Python-Dev] Import and unicode: part two
In-Reply-To: <4D3FF372.3040007@v.loewis.de>
References: <1295440442.432.18.camel@marge>
                              
                              
                              
                              
                              
                              
<20110125032609.GC24080@unaka.lan> <1296034834.25379.18.camel@marge> <4D3FF372.3040007@v.loewis.de>
Message-ID: <20110127004708.GI24080@unaka.lan>

On Wed, Jan 26, 2011 at 11:12:02AM +0100, "Martin v. Löwis" wrote:
> On 26.01.2011 10:40, Victor Stinner wrote:
> > On Monday 24 January 2011 at 19:26 -0800, Toshio Kuratomi wrote:
> >> Why not locale:
> >> * Relying on locale is simply not portable. (...)
> >> * Mixing of modules from different locales won't work. (...)
> >
> > I don't understand what you are talking about.
>
> I think by "portability", he means "moving files from one computer to
> another". He argues that if Python would mandate UTF-8 for all file
> names on Unix, moving files in such a way would support portability,
> whereas using the locale's filename might not (if the locale uses a
> different charset on the target system).
>
> While this is technically true, I don't think it's a helpful way of
> thinking: by mandating that file names are UTF-8 when accessed from
> Python, we make the actual files inaccessible on both the source and
> the target system.
>
> > I don't understand the relation between the local filesystem encoding
> > and the portability. I suppose that you are talking about the
> > distribution of a module to other computers. Here the question is how
> > the filenames are stored during the transfer. The user is free to use
> > any tool, and try to find a tool handling Unicode correctly :-) But it's
> > no longer Python's problem.
>
> There are cases where there is no real "transfer", in the sense in which
> you are using the word. For example, with NFS, you can access the very
> same file simultaneously on two systems, with no file name conversion
> (unless you are using NFSv4, and unless your NFSv4 implementations
> support the UTF-8 mandate in NFS well).
>
> Also, if two users of the same machine have different locale settings,
> the same file name might be interpreted differently.
>

Thanks Martin,

I think that you understand my view even if you don't share it.

There's one further case that I am worried about that has no real
"transfer". Since people here seem to think that unicode module names are
the future (for instance, the comments about redefining the C locale to
include utf-8 and the comments about archiving tools needing to support
encoding bits), there are eventually going to be unicode modules that become
dependencies of other modules and programs. These will need to be installed
on systems. Linux distributions that ship these will need to choose
a filesystem encoding for the filenames of these. Likely the sensible thing
for them to do is to use utf-8 since all the ones I can think of default to
utf-8. But, as Stephen and Victor have pointed out, users change their
locale settings to things that aren't utf-8 and save their modules using
filenames in that encoding. When they update their OS to a version that has
utf-8 python module names, they will find that they have to make a choice.
They can either change their locale settings to a utf-8 encoding and have
the system installed modules work or they can leave their encoding on their
non-utf-8 encoding and have the modules that they've created on-site work.

This is not a good position to put users of these systems in.

-Toshio
                              
                              From nyamatongwe at gmail.com Thu Jan 27 02:37:52 2011 From: nyamatongwe at gmail.com (Neil Hodgson) Date: Thu, 27 Jan 2011 12:37:52 +1100 Subject: [Python-Dev] Import and unicode: part two In-Reply-To: <20110127004708.GI24080@unaka.lan> References: <1295440442.432.18.camel@marge> 
                              
                              
                              
                              
                              
                              
                              <20110125032609.GC24080@unaka.lan> <1296034834.25379.18.camel@marge> <4D3FF372.3040007@v.loewis.de> <20110127004708.GI24080@unaka.lan> Message-ID: 
                              
Toshio Kuratomi:

> When they update their OS to a version that has
> utf-8 python module names, they will find that they have to make a choice.
> They can either change their locale settings to a utf-8 encoding and have
> the system installed modules work or they can leave their encoding on their
> non-utf-8 encoding and have the modules that they've created on-site work.

When switching to a UTF-8 locale, they can also change the file
names of their modules to be encoded in UTF-8. It would be fairly easy
to write a script that identifies non-ASCII file names in a directory
and offers to transcode their names from their current encoding to
UTF-8.

Neil

From v+python at g.nevcal.com Thu Jan 27 03:43:11 2011
From: v+python at g.nevcal.com (Glenn Linderman)
Date: Wed, 26 Jan 2011 18:43:11 -0800
Subject: [Python-Dev] Import and unicode: part two
In-Reply-To: <20110127004708.GI24080@unaka.lan>
References: <1295440442.432.18.camel@marge>
                              
                              
                              
                              
                              
                              
<20110125032609.GC24080@unaka.lan> <1296034834.25379.18.camel@marge> <4D3FF372.3040007@v.loewis.de> <20110127004708.GI24080@unaka.lan>
Message-ID: <4D40DBBF.2010808@g.nevcal.com>

On 1/26/2011 4:47 PM, Toshio Kuratomi wrote:
> There's one further case that I am worried about that has no real
> "transfer". Since people here seem to think that unicode module names are
> the future (for instance, the comments about redefining the C locale to
> include utf-8 and the comments about archiving tools needing to support
> encoding bits), there are eventually going to be unicode modules that become
> dependencies of other modules and programs. These will need to be installed
> on systems. Linux distributions that ship these will need to choose
> a filesystem encoding for the filenames of these. Likely the sensible thing
> for them to do is to use utf-8 since all the ones I can think of default to
> utf-8. But, as Stephen and Victor have pointed out, users change their
> locale settings to things that aren't utf-8 and save their modules using
> filenames in that encoding. When they update their OS to a version that has
> utf-8 python module names, they will find that they have to make a choice.
> They can either change their locale settings to a utf-8 encoding and have
> the system installed modules work or they can leave their encoding on their
> non-utf-8 encoding and have the modules that they've created on-site work.
>
> This is not a good position to put users of these systems in.

The way this case should work is that programs that install files
(installation is a form of transfer) should transform their names from
the encoding used in the transfer medium to the encoding of the
filesystem on which they are installed.

Python3 should access the files, transforming the names from the
encoding of the filesystem on which they are installed to Unicode for
use by the program.
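The name transformation an installer would perform, as described above, amounts to a decode followed by an encode. The sketch below is an illustration only: `transcode_name` is a hypothetical helper, and both encodings are assumed to be known to the caller, which is exactly the information that may be missing in practice.

```python
def transcode_name(raw, source_encoding, target_encoding):
    """Re-encode a filename from the transfer medium's encoding to the
    encoding of the filesystem it is being installed on.

    Hypothetical helper for illustration. Raises UnicodeDecodeError if
    the name is not valid in source_encoding, and UnicodeEncodeError if
    the target encoding cannot represent one of its characters.
    """
    text = raw.decode(source_encoding)    # bytes -> characters
    return text.encode(target_encoding)   # characters -> bytes
```

A name such as b"caf\xe9.py" (ISO-8859-1) becomes b"caf\xc3\xa9.py" when the target is UTF-8. The reverse direction can fail if the target encoding lacks a character, which is the case where installation would have to refuse or rename the file.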

I think Python3 is trying to do its part, and Victor is trying to make
that more robust on more platforms, specifically Windows.

The programs that install files, which may include programs that install
Python files (I don't know), may or may not be doing their part, but
clearly there are cases where they do not.

Systems that have different encodings for names on the same or different
file systems need to have a way to obtain the encoding for the file
names, so they can be properly decoded. If they don't have such a way,
they are broken.

=====
The rest of this is an attempt to describe the problem of Linux and
other systems which use byte strings instead of character strings as
file names. No problem, as long as programs allow byte strings as file
names. Python3 does not, for the import statement, thus the problem is
relevant for discussion here, as has been ongoing.
=====

Since file names are defined to be byte strings, there is no way to
obtain the encoding for file names, so they cannot always be decoded,
and sometimes not properly decoded, because no one knows which encoding
was used to create them, _if any_.

Hence, Linux programs that use character strings as file names
internally and expect them to match the byte strings in the file system
are promoting a fiction: that there is a transformation (encoding) from
character strings to byte strings that will match.

When using ASCII character strings, they can be transformed to bytes
using a simple transformation: identity... but that isn't necessarily
correct, if the files were created using EBCDIC (unlikely on Linux
systems, but not impossible, since Linux files are byte strings).

When using non-ASCII character strings, the fiction promoted is even
bigger, and the transformation even harder. Any 8-bit character encoding
can pretend that identity is the correct transformation, but the result
is mojibake if it isn't.
Unicode and other multi-byte encodings have an even harder job, because
there can be 8-bit sequences that are not legal for some
transformations, but are legal for others. This is when the fiction is
exposed!

As the recent description of glib points out, when the file names are
read as bytes, and shown to the user for selection, possibly using some
mojibake-generating transformation to characters, the user has a
fighting chance to pick the right file, less chance if the
transformation is lossy ('?' substitutions, etc.) and/or the names are
redundant in their lossless characters.

However, when the specification of the name is in characters (such as
for Python import, or file names specified as character constants in any
application system that provides/permits such), and there are large
numbers of transformations that could be used to convert characters to
bytes, the problem is harder, and error-prone... programs that want to
promote the fiction of using characters for filenames must work harder.
It seems that Python on Linux is such a program.

One technique is to have conventions agreed on by applications and users
to limit the number of encodings used on a particular system to one
(optimal) or a few; the latter requires understanding that files created
in one encoding may not be accessible by systems that use a different
one... until they are renamed. Subsets of applications and users can
then happily share files with others of their encoding, and with the
subset of files that can be decoded successfully by their encoding, even
though it is not correct (often ASCII, or a few mojibake characters
learned for cross-subset usage). When multiple encodings are used
without such conventions, chaos results.
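The difference between 8-bit and multi-byte encodings described above can be seen with a short experiment. This is a minimal sketch: `try_decode` is a hypothetical helper, and the sample name and the encodings are arbitrary choices for illustration.

```python
def try_decode(raw, encoding):
    """Return the decoded name, or None if raw is not legal in encoding."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        return None


name = "caf\u00e9.py"
raw = name.encode("iso-8859-1")   # the bytes actually stored on disk

# The encoding that created the name recovers it.
same = try_decode(raw, "iso-8859-1")

# A different 8-bit encoding also "succeeds", but yields other
# characters: mojibake, with no error to signal the mismatch.
other = try_decode(raw, "koi8-r")

# UTF-8 rejects the byte sequence outright: the fiction is exposed.
rejected = try_decode(raw, "utf-8")
```

Here `same` equals the original name, `other` is a different string of the same length, and `rejected` is None. Only the multi-byte case gives the program any signal that the guess was wrong.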

Another technique that would be amusing is to use Base64 (as Oleg
suggested), URL-encoding, or some other mapping that transforms
non-ASCII names to ASCII character sequences and the identity mapping to
obtain bytes, and then Python could ship such files to any system, as
long as it always included that mapping as one of the encodings it would
try to find files. This would probably be the most powerful solution,
but would only need to be applied to those systems that do not use
characters for filenames. It could, in fact, be applied on any system
that uses a subset of characters for filenames, and hence transcends the
need for Unicode support in a file system to use Unicode names in
Python3 import statements. It would likely be problematical for use with
3rd-party libraries, however.

Another technique would be to try each possible encoding in turn, in
some defined order, and the filesystem searched for that byte string as
a file name, possibly matching files that shouldn't have been matched.
To limit that search, such programs could allow configuration of a
smaller ordered list of encodings to be tried, and a specific one to be
used for the creation of new files; this opens up the possibility of not
trying the "right" encoding, for some rogue file name. This would be an
issue and implementation for Linux systems, but would not need to be
used on systems such as MacOS (which defines a particular encoding) or
Windows (which defines a particular encoding) etc.
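The "try each encoding in turn" technique can be sketched as follows. This is not how Python's import system works. The helpers `candidate_names` and `find_module_file` are hypothetical, and the directory listing is passed in as a set of byte strings (such as `os.listdir()` returns when given a bytes path) so that the sketch has no filesystem side effects.

```python
def candidate_names(module_name, encodings):
    """Byte-string file names to look for, one per usable encoding,
    in the configured order and without duplicates."""
    candidates = []
    for enc in encodings:
        try:
            raw = (module_name + ".py").encode(enc)
        except UnicodeEncodeError:
            continue  # this encoding cannot represent the name at all
        if raw not in candidates:
            candidates.append(raw)
    return candidates


def find_module_file(module_name, encodings, directory_listing):
    """Return the first candidate present in the listing, or None."""
    for raw in candidate_names(module_name, encodings):
        if raw in directory_listing:
            return raw
    return None
```

With a listing containing b"caf\xe9.py", a search configured with ["utf-8", "iso-8859-1"] finds the file on the second attempt, while a search configured with only ["utf-8"] does not find it. That is the "rogue file name" risk mentioned above. Pure-ASCII names produce a single candidate, since the configured encodings agree on them.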

When mounting filesystems that use byte string file names on systems
with a defined encoding, it should be the responsibility of the mounting
system to do such transformations, and possibly have such
configurations, and possibly have mappings or renaming facilities, and
possibly prohibit access to files whose names cannot be transformed (of
course, one can always punt by configuring latin-1 or other encodings
that can match any byte string, but that produces mojibake, and then
there is no surety that particular files will appear to have the name
that programs expect).

Of course, Victor's patch is addressing Windows issues, and Windows has
defined encodings, it is just a matter of using the proper APIs to see
them, and should be accepted.

It sounds like the current situation on Linux is that Python can access
the subset of files that match the locale encoding for which it is run.

It sounds like it would be inappropriate for Python to begin shipping
files with non-ASCII names as part of its Linux distribution, unless
facilities are created or tools used to remap non-ASCII names to the
local locale encoding. Locales that are not ASCII supersets (in
character repertoire, not encoding) could not be supported. Locales that
do not support all the characters used in files shipped with Python
could not be supported. Since locales vary wildly in their available
non-ASCII names, that limits Python either to shipping ASCII names only,
or restricting the locales that are supported to those that support the
characters used.

I suppose that Victor's patch would point out most or all the places
where such transformations would have to be implemented, if it is
important to support systems having byte string file names whose users
cannot agree to use a single encoding for transforming to/from
characters.
                              
                              From greg at krypto.org Thu Jan 27 06:50:30 2011 From: greg at krypto.org (Gregory P. Smith) Date: Wed, 26 Jan 2011 21:50:30 -0800 Subject: [Python-Dev] PEP 393: Flexible String Representation In-Reply-To: <1295911245.3704.13.camel@localhost.localdomain> References: <4D3DDE5E.4080807@v.loewis.de> <20110124231233.79bed8eb@pitrou.net> <4D3E0617.7010001@v.loewis.de> <1295911245.3704.13.camel@localhost.localdomain> Message-ID: 
                              
On Mon, Jan 24, 2011 at 3:20 PM, Antoine Pitrou wrote:
> On Tuesday 25 January 2011 at 00:07 +0100, "Martin v. Löwis" wrote:
>> >> I'd like to propose PEP 393, which takes a different approach,
>> >> addressing both problems simultaneously: by getting a flexible
>> >> representation (one that can be either 1, 2, or 4 bytes), we can
>> >> support the full range of Unicode on all systems, but still use
>> >> only one byte per character for strings that are pure ASCII (which
>> >> will be the majority of strings for the majority of users).
>> >
>> > For this kind of experiment, I think a concrete attempt at implementing
>> > (together with performance/memory savings numbers) would be much more
>> > useful than an abstract proposal.
>>
>> I partially agree. An implementation is certainly needed, but there is
>> nothing wrong (IMO) with designing the change before implementing it.
>> Also, several people have offered to help with the implementation, so
>> we need to agree on a specification first (which is actually cheaper
>> than starting with the implementation only to find out that people
>> misunderstood each other).
>
> I'm not sure it's really cheaper. When implementing you will probably
> find out that it makes more sense to change the meaning of some fields,
> add or remove some, etc. You will also want to try various tweaks since
> the whole point is to lighten the footprint of unicode strings in common
> workloads.

Yep. This is only a proposal; an implementation will allow all of that to
be experimented with. I have frequently seen code today, even in python 2.x,
that suffers greatly from unicode vs string use (due to APIs in some code
that were returning unicode objects unnecessarily when the data was really
all ascii text). python 3.x only increases this as the default for so many
things passes through unicode even for programs that may not need it.

>
> So, the only criticism I have, intuitively, is that the unicode
> structure seems to become a bit too large.
> For example, I'm not sure you
> need a generic (pointer, size) pair in addition to the
> representation-specific ones.

I believe the intent of this PEP is for the existing in-memory
structure to be compatible with already compiled binary extension modules
without having to recompile them or change the APIs they are using.

Personally I don't care at all about preserving that level of binary
compatibility; it has been convenient in the past but is rarely the right
thing to do. Of course I'd personally like to see PyObject nuked and
revisited; it is too large and is probably not cache line efficient.

>
> Incidentally, to slightly reduce the overhead of the unicode objects,
> there's this proposal: http://bugs.python.org/issue1943

Interesting. But that aims more at cpu performance than memory
overhead. What I see is programs that predominantly process ascii
data yet waste memory on a 2-4x data explosion of the internal
representation. This PEP aims to address that larger target.

-gps

From lukasz at langa.pl Thu Jan 27 11:00:19 2011
From: lukasz at langa.pl (=?UTF-8?B?xYF1a2FzeiBMYW5nYQ==?=)
Date: Thu, 27 Jan 2011 11:00:19 +0100
Subject: [Python-Dev] Why do we bundle lib2to3 with Python? Was: Location of tests for packages
Message-ID: <4D414233.4000906@langa.pl>

On 2011-01-24 23:13, Benjamin Peterson wrote:
> I prefer lib2to3 tests to stay in lib2to3/.

On a related note, I had trouble myself with using outdated 2to3 and
heard complaints about that at least a couple of times. What do we gain
from bundling 2to3 with Python?

--
Best regards,
Łukasz Langa

From solipsis at pitrou.net Thu Jan 27 15:57:18 2011
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Thu, 27 Jan 2011 15:57:18 +0100
Subject: [Python-Dev] PEP 393: Flexible String Representation
In-Reply-To:
                              
                              References: <4D3DDE5E.4080807@v.loewis.de> <20110124231233.79bed8eb@pitrou.net> <4D3E0617.7010001@v.loewis.de> <1295911245.3704.13.camel@localhost.localdomain> 
                              
Message-ID: <1296140238.3685.0.camel@localhost.localdomain>

On Wednesday 26 January 2011 at 21:50 -0800, Gregory P. Smith wrote:
> >
> > Incidentally, to slightly reduce the overhead of the unicode objects,
> > there's this proposal: http://bugs.python.org/issue1943
>
> Interesting. But that aims more at cpu performance than memory
> overhead. What I see is programs that predominantly process ascii
> data yet waste memory on a 2-4x data explosion of the internal
> representation. This PEP aims to address that larger target.

Right, but we should keep in mind that many unicode strings will not be
very large, and so the constant overhead of unicode objects is not
necessarily negligible.

Regards

Antoine.

From brett at python.org Thu Jan 27 18:22:47 2011
From: brett at python.org (Brett Cannon)
Date: Thu, 27 Jan 2011 09:22:47 -0800
Subject: [Python-Dev] Why do we bundle lib2to3 with Python? Was: Location of tests for packages
In-Reply-To: <4D414233.4000906@langa.pl>
References: <4D414233.4000906@langa.pl>
Message-ID:
                              
2011/1/27 Łukasz Langa:
>
> On 2011-01-24 23:13, Benjamin Peterson wrote:
>>
>> I prefer lib2to3 tests to stay in lib2to3/.
>
> On a related note, I had trouble myself with using outdated 2to3 and
> heard complaints about that at least a couple of times. What do we gain
> from bundling 2to3 with Python?

Same thing we get when we bundle anything with Python: one less
dependency for people to download. Obviously this shouldn't be as much
of an issue once Python 3.2 is out.

From stefan_ml at behnel.de Thu Jan 27 20:06:10 2011
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Thu, 27 Jan 2011 20:06:10 +0100
Subject: [Python-Dev] PEP 393: Flexible String Representation
In-Reply-To: <4D3DDE5E.4080807@v.loewis.de>
References: <4D3DDE5E.4080807@v.loewis.de>
Message-ID:

"Martin v. Löwis", 24.01.2011 21:17:
> The Py_UNICODE type is still supported but deprecated. It is always
> defined as a typedef for wchar_t, so the wstr representation can double
> as Py_UNICODE representation.

It's too bad this isn't initialised by default, though. Py_UNICODE is the
only representation that can be used efficiently from C code and Cython
relies on it for fast text processing. This proposal will therefore
likely have a pretty negative performance impact on extensions written in
Cython as the compiler could no longer expect this representation to be
available instantaneously.

Stefan

From martin at v.loewis.de Thu Jan 27 21:26:13 2011
From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=)
Date: Thu, 27 Jan 2011 21:26:13 +0100
Subject: [Python-Dev] Import and unicode: part two
In-Reply-To:
                              
                              References: <1295440442.432.18.camel@marge> 
                              
                              
                              
                              
                              
                              
                              <20110125032609.GC24080@unaka.lan> <1296034834.25379.18.camel@marge> <4D3FF372.3040007@v.loewis.de> <20110127004708.GI24080@unaka.lan> 
                              
Message-ID: <4D41D4E5.5050603@v.loewis.de>

> When switching to a UTF-8 locale, they can also change the file
> names of their modules to be encoded in UTF-8. It would be fairly easy
> to write a script that identifies non-ASCII file names in a directory
> and offers to transcode their names from their current encoding to
> UTF-8.

In fact, convmv (http://j3e.de/linux/convmv/) does exactly that; it
comes as a Debian package also.

Regards,
Martin

From foom at fuhm.net Thu Jan 27 21:26:15 2011
From: foom at fuhm.net (James Y Knight)
Date: Thu, 27 Jan 2011 15:26:15 -0500
Subject: [Python-Dev] PEP 393: Flexible String Representation
In-Reply-To:
                              
                              References: <4D3DDE5E.4080807@v.loewis.de> 
                              
Message-ID: <999F03D8-C1D7-4A3D-BC7A-3A195CFD9CE5@fuhm.net>

On Jan 27, 2011, at 2:06 PM, Stefan Behnel wrote:
> "Martin v. Löwis", 24.01.2011 21:17:
>> The Py_UNICODE type is still supported but deprecated. It is always
>> defined as a typedef for wchar_t, so the wstr representation can double
>> as Py_UNICODE representation.
>
> It's too bad this isn't initialised by default, though. Py_UNICODE is the only representation that can be used efficiently from C code and Cython relies on it for fast text processing. This proposal will therefore likely have a pretty negative performance impact on extensions written in Cython as the compiler could no longer expect this representation to be available instantaneously.

But the whole point of the exercise is so that it doesn't have to store a
4byte-per-char representation when a 1byte-per-char rep would do. If
cython wants to work most efficiently with this proposal, it should learn
to deal with the three possible raw representations.

James

From brett at python.org Thu Jan 27 21:38:45 2011
From: brett at python.org (Brett Cannon)
Date: Thu, 27 Jan 2011 12:38:45 -0800
Subject: [Python-Dev] getting stable URLs for major.minor versions
Message-ID:
                              
Because of all the writing I have been doing lately, I have been
pulling up a lot of URLs pointing to various Python releases based
around minor versions (e.g., Python 2.7, not specifically 2.7.1). What
has been somewhat annoying is that there are no URLs which act as a
redirect to the latest release of a minor version. For instance, it
would be great if http://www.python.org/2.7 redirected to the Python
2.7.1 page. Linking to the 2.7.0 release page seems off since it is
out of date, but linking to 2.7.1 also seems silly as that will become
out of date as the newest release of Python 2.7 at some point as well.

Can we consider coming up with some URL scheme where people can link
to a version of Python that always redirects to the newest release?
Bonus points if we extend this to major versions, too. =)

I am asking here since the RMs will have to be okay with doing this as
part of the release plan.

Get the ball rolling, I say we make http://www.python.org/version/2.7
and http://www.python.org/version/2 redirect to the 2.7.1 release
page, etc. Personally I would rather have http://www.python.org/2.7
redirect to 2.7.1, but since that already redirects to 2.7.0 I doubt
people would be okay with the change.

From martin at v.loewis.de Thu Jan 27 22:05:38 2011
From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=)
Date: Thu, 27 Jan 2011 22:05:38 +0100
Subject: [Python-Dev] PEP 393: Flexible String Representation
In-Reply-To: <1295911245.3704.13.camel@localhost.localdomain>
References: <4D3DDE5E.4080807@v.loewis.de> <20110124231233.79bed8eb@pitrou.net> <4D3E0617.7010001@v.loewis.de> <1295911245.3704.13.camel@localhost.localdomain>
Message-ID: <4D41DE22.2000100@v.loewis.de>

> So, the only criticism I have, intuitively, is that the unicode
> structure seems to become a bit too large. For example, I'm not sure you
> need a generic (pointer, size) pair in addition to the
> representation-specific ones.

It's not really a generic pointer, but rather a variable-sized pointer.
It may not fit into any of the other representations (e.g. if there is
a four-byte wchar_t, then a two-byte representation would fit neither
into the UTF-8 pointer nor into the wchar_t pointer).

> Incidentally, to slightly reduce the overhead the unicode objects,
> there's this proposal: http://bugs.python.org/issue1943

I wonder what aspects of this patch and discussion should be integrated
into the PEP. The notion of allocating the memory in the same block is
already considered in the PEP; what else might be relevant?

Input is welcome!

Regards,
Martin

From martin at v.loewis.de Thu Jan 27 22:07:58 2011
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Thu, 27 Jan 2011 22:07:58 +0100
Subject: [Python-Dev] PEP 393: Flexible String Representation
In-Reply-To:
References: <4D3DDE5E.4080807@v.loewis.de> <20110124231233.79bed8eb@pitrou.net> <4D3E0617.7010001@v.loewis.de> <1295911245.3704.13.camel@localhost.localdomain>
Message-ID: <4D41DEAE.3080501@v.loewis.de>

> I believe the intent this pep is aiming at is for the existing in
> memory structure to be compatible with already compiled binary
> extension modules without having to recompile them or change the APIs
> they are using.

No, binary compatibility is not achieved. ABI-conforming modules will
continue to work even under this change, but only because access to the
unicode object internal representation is not available to the
restricted ABI.

> Personally I don't care at all about preserving that level of binary
> compatibility, it has been convenient in the past but is rarely the
> right thing to do. Of course I'd personally like to see PyObject
> nuked and revisited, it is too large and is probably not cache line
> efficient.

That's a different PEP :-)

Regards,
Martin

From v+python at g.nevcal.com Thu Jan 27 22:06:18 2011
From: v+python at g.nevcal.com (Glenn Linderman)
Date: Thu, 27 Jan 2011 13:06:18 -0800
Subject: [Python-Dev] PEP 393: Flexible String Representation
In-Reply-To: <999F03D8-C1D7-4A3D-BC7A-3A195CFD9CE5@fuhm.net>
References: <4D3DDE5E.4080807@v.loewis.de> <999F03D8-C1D7-4A3D-BC7A-3A195CFD9CE5@fuhm.net>
Message-ID: <4D41DE4A.20303@g.nevcal.com>

On 1/27/2011 12:26 PM, James Y Knight wrote:
> On Jan 27, 2011, at 2:06 PM, Stefan Behnel wrote:
>> "Martin v. Löwis", 24.01.2011 21:17:
>>> The Py_UNICODE type is still supported but deprecated. It is always
>>> defined as a typedef for wchar_t, so the wstr representation can double
>>> as Py_UNICODE representation.
>> It's too bad this isn't initialised by default, though. Py_UNICODE is
>> the only representation that can be used efficiently from C code and
>> Cython relies on it for fast text processing. This proposal will
>> therefore likely have a pretty negative performance impact on
>> extensions written in Cython as the compiler could no longer expect
>> this representation to be available instantaneously.
> But the whole point of the exercise is so that it doesn't have to store
> a 4byte-per-char representation when a 1byte-per-char rep would do. If
> cython wants to work most efficiently with this proposal, it should
> learn to deal with the three possible raw representations.

C was doing fast text processing on char long before Py_UNICODE
existed, or wchar_t.
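
[Editorial illustration, not part of the original thread.] The exchange
above turns on storing one byte per character where that suffices and on
code learning "to deal with the three possible raw representations". A
minimal Python sketch of that idea follows. It is only a model: the
helper names (narrowest_width, pack, char_at) are hypothetical, and the
fixed-width little-endian buffers are an assumption made for
illustration, not the struct layout or C API that the PEP proposes.

```python
# Illustrative model only: these helper names are hypothetical and are
# not part of CPython or of the PEP 393 C API.

def narrowest_width(text):
    """Return 1, 2 or 4: the bytes per code point the widest character needs."""
    widest = max(map(ord, text), default=0)
    if widest < 0x100:
        return 1
    if widest < 0x10000:
        return 2
    return 4


def pack(text):
    """Store every code point of `text` in a fixed-width little-endian buffer."""
    width = narrowest_width(text)
    data = b"".join(ord(ch).to_bytes(width, "little") for ch in text)
    return width, data


def char_at(width, data, index):
    """Read one character; the width has to be consulted at run time."""
    start = index * width
    return chr(int.from_bytes(data[start:start + width], "little"))


if __name__ == "__main__":
    for sample in ("ascii", "caf\u00e9", "\u20ac uro", "\U0001F600!"):
        width, data = pack(sample)
        # Round trip: every character can be read back from the buffer.
        assert all(char_at(width, data, i) == ch for i, ch in enumerate(sample))
        print(repr(sample), "->", width, "byte(s) per code point,", len(data), "bytes")
```

In this model the space saving comes from pack() choosing the width per
string, while char_at() pays for it by needing the width on every read,
which is the kind of per-access dispatch that code consuming the raw
buffers would have to handle.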
                              
From martin at v.loewis.de Thu Jan 27 22:16:54 2011
From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=)
Date: Thu, 27 Jan 2011 22:16:54 +0100
Subject: [Python-Dev] PEP 393: Flexible String Representation
In-Reply-To: <1295915323.3219.44.camel@radiator.bos.redhat.com>
References: <4D3DDE5E.4080807@v.loewis.de> <1295915323.3219.44.camel@radiator.bos.redhat.com>
Message-ID: <4D41E0C6.3090102@v.loewis.de>

> Repetition of "11"; I'm guessing that the 2byte/UCS-2 should read "10",
> so that they give the width of the char representation.

Thanks, fixed.

>> 00 => null pointer
>
> Naturally this assumes that all pointers are at least 4-byte aligned (so
> that they can be masked off). I assume that this is sane on every
> platform that Python supports, but should it be spelled out explicitly
> somewhere in the PEP?

I'll change the PEP to move the type indicator into the state field, so
that issue becomes irrelevant.

>> The string is null-terminated (in its respective representation).
>> - hash, state: same as in Python 3.2
>> - utf8_length, utf8: UTF-8 representation (null-terminated)
> If this is to share its buffer with the "str" representation for the
> Latin-1 case, then I take it this ptr will typically be (str & ~4) ?
> i.e. only "str" has the low-order-bit type info.

Yes, the other pointers are aligned. Notice that the case in which
sharing occurs is only ASCII, though (for Latin-1, some characters
require two bytes in UTF-8).

> Spelling out the meaning of "optional":
> does this mean that the relevant ptr is NULL; if so, if utf8 is null,
> is utf8_length undefined, or is it some dummy value?

I've clarified this: I propose length is undefined (unless there is
a good reason to clear it).

>> If the string is created directly with the canonical representation
>> (see below), this representation doesn't take a separate memory block,
>> but is allocated right after the PyUnicodeObject struct.
>
> Is the idea to do pointer arithmetic when deleting the PyUnicodeObject
> to determine if the ptr is in that location, and not delete it if it is,
> or is there some other way of determining whether the pointers need
> deallocating?

Correct.

> If the former, is this embedding an assumption that the
> underlying allocator couldn't have allocated a buffer directly adjacent
> to the PyUnicodeObject. I know that GNU libc's malloc/free
> implementation has gaps of two machine words between each allocation;
> off the top of my head I'm not sure if the optimized Objects/obmalloc.c
> allocator enforces such gaps.

No, it doesn't... So I guess I reserve another bit in the state for
that.

> GDB Debugging Hooks
> -------------------
> Tools/gdb/libpython.py contains debugging hooks that embed knowledge
> about the internals of CPython's data types, including PyUnicodeObject
> instances. It will need to be slightly updated to track the change.

Thanks, added.

Regards,
Martin

From fdrake at acm.org Thu Jan 27 22:16:59 2011
From: fdrake at acm.org (Fred Drake)
Date: Thu, 27 Jan 2011 16:16:59 -0500
Subject: [Python-Dev] getting stable URLs for major.minor versions
In-Reply-To:
References:
Message-ID:
                              
On Thu, Jan 27, 2011 at 3:38 PM, Brett Cannon wrote:
> Linking to the 2.7.0 release page seems off since it is
> out of date, but linking to 2.7.1 also seems silly as that will become
> out of date as the newest release of Python 2.7 at some point as well.

I'd love to see something like this as well. Part of the problem is
that when we want URLs to specific versions (which might even mean
2.7.0), we use the version number as released, and... there's really
not a 2.7.0.

I'd love for us to include ".0" in the actual release number, instead
of calling it just 2.7. Then we could much more easily handle this for
docs, downloads, and anywhere else we want to multi-plex multiple
versions.

  -Fred

--
Fred L. Drake, Jr.
"A storm broke loose in my mind." --Albert Einstein

From skip at pobox.com Thu Jan 27 22:21:22 2011
From: skip at pobox.com (skip at pobox.com)
Date: Thu, 27 Jan 2011 15:21:22 -0600
Subject: [Python-Dev] getting stable URLs for major.minor versions
In-Reply-To:
References:
Message-ID: <19777.57810.481593.401954@montanaro.dyndns.org>

    Brett> Bonus points if we extend this to major versions, too. =)

I know you added a smiley, but just wanted to point out that since Python 2
and 3 are really different languages, referring 2.4 users to 3.3 might be a
bad idea. (I imagine it wouldn't be hard to generalize from micro to minor
though.)

Skip

From stefan_ml at behnel.de Thu Jan 27 22:24:34 2011
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Thu, 27 Jan 2011 22:24:34 +0100
Subject: [Python-Dev] PEP 393: Flexible String Representation
In-Reply-To: <999F03D8-C1D7-4A3D-BC7A-3A195CFD9CE5@fuhm.net>
References: <4D3DDE5E.4080807@v.loewis.de> <999F03D8-C1D7-4A3D-BC7A-3A195CFD9CE5@fuhm.net>
Message-ID:
                              
James Y Knight, 27.01.2011 21:26:
> On Jan 27, 2011, at 2:06 PM, Stefan Behnel wrote:
>> "Martin v. Löwis", 24.01.2011 21:17:
>>> The Py_UNICODE type is still supported but deprecated. It is always
>>> defined as a typedef for wchar_t, so the wstr representation can
>>> double as Py_UNICODE representation.
>>
>> It's too bad this isn't initialised by default, though. Py_UNICODE is
>> the only representation that can be used efficiently from C code and
>> Cython relies on it for fast text processing. This proposal will
>> therefore likely have a pretty negative performance impact on
>> extensions written in Cython as the compiler could no longer expect
>> this representation to be available instantaneously.
>
> But the whole point of the exercise is so that it doesn't have to store
> a 4byte-per-char representation when a 1byte-per-char rep would do.

I am well aware of that. But I'm arguing that the current simpler internal
representation has had its advantages for CPython as a platform.

> If cython wants to work most efficiently with this proposal, it should
> learn to deal with the three possible raw representations.

I agree. After all, CPython is lucky to have it available. It wouldn't be
the first time that we duplicate looping code based on the input type.
However, like the looping code, it will also complicate all indexing code
at runtime as it always needs to test which of the representations is
current before it can read a character. Currently, all of this is a compile
time decision. This will necessarily have a performance impact.

Stefan

From martin at v.loewis.de Thu Jan 27 22:37:32 2011
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Thu, 27 Jan 2011 22:37:32 +0100
Subject: [Python-Dev] PEP 393: Flexible String Representation
In-Reply-To:
References: <4D3DDE5E.4080807@v.loewis.de>
Message-ID: <4D41E59C.9060903@v.loewis.de>

Am 25.01.2011 12:08, schrieb Nick Coghlan:
> On Tue, Jan 25, 2011 at 6:17 AM, "Martin v. Löwis" wrote:
>> A new function PyUnicode_AsUTF8 is provided to access the UTF-8
>> representation. It is thus identical to the existing
>> _PyUnicode_AsString, which is removed. The function will compute the
>> utf8 representation when first called. Since this representation will
>> consume memory until the string object is released, applications
>> should use the existing PyUnicode_AsUTF8String where possible
>> (which generates a new string object every time). API that implicitly
>> converts a string to a char* (such as the ParseTuple functions) will
>> use this function to compute a conversion.
>
> I'm not entirely clear as to what "this function" is referring to here.

PyUnicode_AsUTF8 (i.e. the one where you don't need to release the
memory). I made this explicit now.

> I'm also dubious of the "PyUnicode_Finalize" name - "PyUnicode_Ready"
> might be a better option (PyType_Ready seems a better analogy for a
> "I've filled everything in, please calculate the derived fields now"
> than Py_Finalize).

Ok, changed (when I was pondering about this PEP, this once occurred
to me also, but I forgot when I typed it in).

>
> More generally, let me see if I understand the proposed structure correctly:
>
> str: Always set once PyUnicode_Ready() has been called.
> Always points to the canonical representation of the string (as
> indicated by PyUnicode_Kind)
> length: Always set once PyUnicode_Ready() has been called. Specifies
> the number of code points in the string.

Correct.

> wstr: Set only if PyUnicode_AsUnicode has been called on the string.

Might also be set when the string was created through
PyUnicode_FromUnicode and PyUnicode_Ready hasn't been called.

> If (sizeof(wchar_t) == 2 && PyUnicode_Kind() == PyUnicode_2BYTE)
> or (sizeof(wchar_t) == 4 && PyUnicode_Kind() == PyUnicode_4BYTE), wstr
> = str, otherwise wstr points to dedicated memory
> wstr_length: Valid only if wstr != NULL
> If wstr_length != length, indicates presence of surrogate pairs in
> a UCS-2 string (i.e. sizeof(wchar_t) == 2, PyUnicode_Kind() ==
> PyUnicode_4BYTE).

Correct.

> utf8: Set only if PyUnicode_AsUTF8 has been called on the string.
> If string contents are pure ASCII, utf8 = str, otherwise utf8
> points to dedicated memory.
> utf8_length: Valid only if utf8_ptr != NULL

Correct.

> One change I would propose is that rather than hiding flags in the low
> order bits of the str pointer, we expand the use of the existing
> "state" field to cover the representation information in addition to
> the interning information.

Thanks for the idea; done.

> I would also suggest explicitly flagging
> internally whether or not a 1 byte string is ASCII or Latin-1 along
> the lines of:

Not sure about that. It would complicate PyUnicode_Kind.
Instead, I'd rather fill out utf8 right away if we can use sharing
(e.g. when the string is created with a max value <128, or
PyUnicode_Ready has determined that).

So I keep it for the moment as reserved (but would use it when str is
NULL, as I'd have to fill in some value, anyway).

Regards,
Martin

From martin at v.loewis.de Thu Jan 27 22:42:39 2011
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Thu, 27 Jan 2011 22:42:39 +0100
Subject: [Python-Dev] PEP 393: Flexible String Representation
In-Reply-To: <4D3F5228.4010901@egenix.com>
References: <4D3DDE5E.4080807@v.loewis.de> <4D3F5228.4010901@egenix.com>
Message-ID: <4D41E6CF.1020206@v.loewis.de>

> From my first impression, I'm
> not too thrilled by the prospect of making the Unicode implementation
> more complicated by having three different representations on each
> object.

Thanks, added as a concern.

> I also don't see how this could save a lot of memory. As an example
> take a French text with say 10mio code points. This would end up
> appearing in memory as 3 copies on Windows: one copy stored as UCS2 (20MB),
> one as Latin-1 (10MB) and one as UTF-8 (probably around 15MB, depending
> on how many accents are used). That's a saving of -10MB compared to
> today's implementation :-)

As others have pointed out: that's not how it works. It actually *will*
save memory, since the alternative representations are optional.

Regards,
Martin

From martin at v.loewis.de Thu Jan 27 22:47:03 2011
From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=)
Date: Thu, 27 Jan 2011 22:47:03 +0100
Subject: [Python-Dev] PEP 393: Flexible String Representation
In-Reply-To:
References: <4D3DDE5E.4080807@v.loewis.de>
Message-ID: <4D41E7D7.5060708@v.loewis.de>

Am 27.01.2011 20:06, schrieb Stefan Behnel:
> "Martin v. Löwis", 24.01.2011 21:17:
>> The Py_UNICODE type is still supported but deprecated. It is always
>> defined as a typedef for wchar_t, so the wstr representation can double
>> as Py_UNICODE representation.
>
> It's too bad this isn't initialised by default, though. Py_UNICODE is
> the only representation that can be used efficiently from C code and
> Cython relies on it for fast text processing.

That's not true. The str representation can also be used efficiently
from C.

> This proposal will
> therefore likely have a pretty negative performance impact on extensions
> written in Cython as the compiler could no longer expect this
> representation to be available instantaneously.

In any case, I've added this concern.

Regards,
Martin

From martin at v.loewis.de Thu Jan 27 22:54:25 2011
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Thu, 27 Jan 2011 22:54:25 +0100
Subject: [Python-Dev] getting stable URLs for major.minor versions
In-Reply-To:
References:
Message-ID: <4D41E991.7030002@v.loewis.de>

Am 27.01.2011 21:38, schrieb Brett Cannon:
> Because of all the writing I have been doing lately, I have been
> pulling up a lot of URLs pointing to various Python releases based
> around minor versions (e.g., Python 2.7, not specifically 2.7.1). What
> has been somewhat annoying is that there are no URLs which act as a
> redirect to the latest release of a minor version. For instance, it
> would be great if http://www.python.org/2.7 redirected to the Python
> 2.7.1 page.

The tradition is that /X.Y actually points to download/releases/X.Y.
These redirects haven't been added for 2.7, but are present for all
earlier releases, and 3.1. So unless there are strong objections,
I'll add the missing redirects soon.

> Get the ball rolling, I say we make http://www.python.org/version/2.7
> and http://www.python.org/version/2 redirect to the 2.7.1 release
> page, etc. Personally I would rather have http://www.python.org/2.7
> redirect to 2.7.1, but since that already redirects to 2.7.0 I doubt
> people would be okay with the change.

How about http://www.python.org/2.7.x redirecting to the latest 2.7.x
release? Likewise 2.x and 3.x.

Regards,
Martin

From brett at python.org Thu Jan 27 22:40:25 2011
From: brett at python.org (Brett Cannon)
Date: Thu, 27 Jan 2011 13:40:25 -0800
Subject: [Python-Dev] getting stable URLs for major.minor versions
In-Reply-To: <19777.57810.481593.401954@montanaro.dyndns.org>
References: <19777.57810.481593.401954@montanaro.dyndns.org>
Message-ID:
                              
On Thu, Jan 27, 2011 at 13:21, wrote:
>    Brett> Bonus points if we extend this to major versions, too. =)
>
> I know you added a smiley, but just wanted to point out that since Python 2
> and 3 are really different languages, referring 2.4 users to 3.3 might be a
> bad idea. (I imagine it wouldn't be hard to generalize from micro to minor
> though.)

I don't get what you are worried about: http://www.python.org/2 would
refer to 2.7.1 while http://www.python.org/3 would refer to 3.1.3. I
added the smiley as I doubt many people worry about linking to Python
2 vs. Python 3 as generically as I have lately.

>
> Skip
>

From martin at v.loewis.de Thu Jan 27 23:01:42 2011
From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=)
Date: Thu, 27 Jan 2011 23:01:42 +0100
Subject: [Python-Dev] PEP 393: Flexible String Representation
In-Reply-To:
References: <4D3DDE5E.4080807@v.loewis.de> <999F03D8-C1D7-4A3D-BC7A-3A195CFD9CE5@fuhm.net>
Message-ID: <4D41EB46.7060105@v.loewis.de>

> I agree. After all, CPython is lucky to have it available. It wouldn't
> be the first time that we duplicate looping code based on the input
> type. However, like the looping code, it will also complicate all
> indexing code at runtime as it always needs to test which of the
> representations is current before it can read a character. Currently,
> all of this is a compile time decision. This will necessarily have a
> performance impact.

That's most certainly the case. That's one of the reasons to discuss
this through a PEP, rather than just coming up with a patch: if people
object to it too much because of the impact on execution speed, it may
get rejected. Of course, that would make those unhappy who complain
about the memory consumption.

This is a classical time-space-tradeoff, favoring space reduction over
time reduction. I fully understand that the actual impact can only be
observed when an implementation is available, and applications have made
a reasonable effort to work with the implementation efficiently (or
perhaps not, which would show the impact on unmodified implementations).

This is something that works much better in PyPy: the actual string
operations are written in RPython, and the tracing JIT would generate
all versions of the code that are relevant for the different
representations (IIUC, this approach is only planned for PyPy, yet).

I hope that C macros can help reduce the maintenance burden.

Regards,
Martin

From greg at krypto.org Thu Jan 27 23:05:51 2011
From: greg at krypto.org (Gregory P. Smith)
Date: Thu, 27 Jan 2011 14:05:51 -0800
Subject: [Python-Dev] PEP 393: Flexible String Representation
In-Reply-To: <4D41E6CF.1020206@v.loewis.de>
References: <4D3DDE5E.4080807@v.loewis.de> <4D3F5228.4010901@egenix.com> <4D41E6CF.1020206@v.loewis.de>
Message-ID:
                              
BTW, has anyone looked at what other languages with a native unicode
type do for their implementations if any of them attempt to conserve
ram?

From brett at python.org Thu Jan 27 22:57:24 2011
From: brett at python.org (Brett Cannon)
Date: Thu, 27 Jan 2011 13:57:24 -0800
Subject: [Python-Dev] getting stable URLs for major.minor versions
In-Reply-To: <4D41E991.7030002@v.loewis.de>
References: <4D41E991.7030002@v.loewis.de>
Message-ID:
                              
On Thu, Jan 27, 2011 at 13:54, "Martin v. Löwis" wrote:
> Am 27.01.2011 21:38, schrieb Brett Cannon:
>> Because of all the writing I have been doing lately, I have been
>> pulling up a lot of URLs pointing to various Python releases based
>> around minor versions (e.g., Python 2.7, not specifically 2.7.1). What
>> has been somewhat annoying is that there are no URLs which act as a
>> redirect to the latest release of a minor version. For instance, it
>> would be great if http://www.python.org/2.7 redirected to the Python
>> 2.7.1 page.
>
> The tradition is that /X.Y actually points to download/releases/X.Y.
> These redirects haven't been added for 2.7, but are present for all
> earlier releases, and 3.1. So unless there are strong objections,
> I'll add the missing redirects soon.

That would be great. I keep bumping up against the missing 2.7 redirect.

>
>> Get the ball rolling, I say we make http://www.python.org/version/2.7
>> and http://www.python.org/version/2 redirect to the 2.7.1 release
>> page, etc. Personally I would rather have http://www.python.org/2.7
>> redirect to 2.7.1, but since that already redirects to 2.7.0 I doubt
>> people would be okay with the change.
>
> How about http://www.python.org/2.7.x redirecting to the latest 2.7.x
> release? Likewise 2.x and 3.x.

Works for me! Short and elegant.

From alexander.belopolsky at gmail.com Thu Jan 27 23:25:44 2011
From: alexander.belopolsky at gmail.com (Alexander Belopolsky)
Date: Thu, 27 Jan 2011 17:25:44 -0500
Subject: [Python-Dev] getting stable URLs for major.minor versions
In-Reply-To: <4D41E991.7030002@v.loewis.de>
References: <4D41E991.7030002@v.loewis.de>
Message-ID:
                              
On Thu, Jan 27, 2011 at 4:54 PM, "Martin v. Löwis" wrote:
..
> How about http://www.python.org/2.7.x redirecting to the latest 2.7.x
> release? Likewise 2.x and 3.x.

Whatever we do, let's use this opportunity to unify redirect rules
for http://www.python.org/X.Y and http://docs.python.org/X.Y. For a
related discussion, see http://bugs.python.org/issue10446.

From solipsis at pitrou.net Thu Jan 27 23:30:08 2011
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Thu, 27 Jan 2011 23:30:08 +0100
Subject: [Python-Dev] PEP 393: Flexible String Representation
In-Reply-To: <4D41DE22.2000100@v.loewis.de>
References: <4D3DDE5E.4080807@v.loewis.de> <20110124231233.79bed8eb@pitrou.net> <4D3E0617.7010001@v.loewis.de> <1295911245.3704.13.camel@localhost.localdomain> <4D41DE22.2000100@v.loewis.de>
Message-ID: <1296167408.3693.1.camel@localhost.localdomain>

> > Incidentally, to slightly reduce the overhead the unicode objects,
> > there's this proposal: http://bugs.python.org/issue1943
>
> I wonder what aspects of this patch and discussion should be integrated
> into the PEP. The notion of allocating the memory in the same block is
> already considered in the PEP; what else might be relevant?

Ok, I'm sorry for not reading the PEP carefully enough, then.
The patch does a couple of other tweaks such as making "state" a char
rather than an int, and changing the freelist algorithm. But the latter
doesn't need to be spelled out in a PEP anyway.

Regards

Antoine.

From martin at v.loewis.de Thu Jan 27 23:40:29 2011
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Thu, 27 Jan 2011 23:40:29 +0100
Subject: [Python-Dev] getting stable URLs for major.minor versions
In-Reply-To:
References: <4D41E991.7030002@v.loewis.de>
Message-ID: <4D41F45D.5020105@v.loewis.de>

> Whatever we do, let's use this opportunity to unify redirect rules
> for http://www.python.org/X.Y and http://docs.python.org/X.Y. For a
> related discussion, see http://bugs.python.org/issue10446.

TLDR; somebody should summarize it and specify what exactly needs to
be changed.

I'm only going to change the release redirects now.

Regards,
Martin

From alexander.belopolsky at gmail.com Thu Jan 27 23:50:07 2011
From: alexander.belopolsky at gmail.com (Alexander Belopolsky)
Date: Thu, 27 Jan 2011 17:50:07 -0500
Subject: [Python-Dev] getting stable URLs for major.minor versions
In-Reply-To: <4D41F45D.5020105@v.loewis.de>
References: <4D41E991.7030002@v.loewis.de> <4D41F45D.5020105@v.loewis.de>
Message-ID:
                              
On Thu, Jan 27, 2011 at 5:40 PM, "Martin v. Löwis" wrote:
>> Whatever we do, let's use this opportunity to unify redirect rules
>> for http://www.python.org/X.Y and http://docs.python.org/X.Y. For a
>> related discussion, see http://bugs.python.org/issue10446.
>
> TLDR; somebody should summarize it and specify what exactly needs to
> be changed.
>

AFAICT, http://docs.python.org/X.Y links consistently point to
http://docs.python.org/release/X.Y.Z, where Z is the last micro
release of X.Y major.minor series. I don't see any reason to change
anything at the moment, but if http://www.python.org will grow X.Y.x
redirects, it would be nice to have the same under
http://docs.python.org/release/ if not under http://docs.python.org/.

From stefan_ml at behnel.de Thu Jan 27 23:53:40 2011
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Thu, 27 Jan 2011 23:53:40 +0100
Subject: [Python-Dev] PEP 393: Flexible String Representation
In-Reply-To: <4D3DDE5E.4080807@v.loewis.de>
References: <4D3DDE5E.4080807@v.loewis.de>
Message-ID:

"Martin v. Löwis", 24.01.2011 21:17:
> If the string is created directly with the canonical representation
> (see below), this representation doesn't take a separate memory block,
> but is allocated right after the PyUnicodeObject struct.

Does this mean it's supposed to become a PyVarObject? Antoine proposed
that, too. Apart from breaking (more or less) all existing C subtyping
code, this will also make it harder to subtype it in new code. I don't
like that idea at all.

Stefan

From martin at v.loewis.de Fri Jan 28 00:56:03 2011
From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=)
Date: Fri, 28 Jan 2011 00:56:03 +0100
Subject: [Python-Dev] getting stable URLs for major.minor versions
In-Reply-To:
References: <4D41E991.7030002@v.loewis.de>
Message-ID: <4D420613.8030603@v.loewis.de>

> Works for me! Short and elegant.

Done!

http://www.python.org/2.6.x
http://www.python.org/2.x
http://www.python.org/3.1.x
http://www.python.org/3.x

Regards,
Martin

From martin at v.loewis.de Fri Jan 28 01:02:32 2011
From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=)
Date: Fri, 28 Jan 2011 01:02:32 +0100
Subject: [Python-Dev] PEP 393: Flexible String Representation
In-Reply-To:
References: <4D3DDE5E.4080807@v.loewis.de>
Message-ID: <4D420798.1050004@v.loewis.de>

Am 27.01.2011 23:53, schrieb Stefan Behnel:
> "Martin v. Löwis", 24.01.2011 21:17:
>> If the string is created directly with the canonical representation
>> (see below), this representation doesn't take a separate memory block,
>> but is allocated right after the PyUnicodeObject struct.
>
> Does this mean it's supposed to become a PyVarObject?

What do you mean by "become"? Will it be declared as such? No.

> Antoine proposed
> that, too. Apart from breaking (more or less) all existing C subtyping
> code, this will also make it harder to subtype it in new code. I don't
> like that idea at all.

Why will it break all existing subtyping code? See the PEP: Only objects
created through PyUnicode_New will be affected - I don't think this can
include objects of a subtype.

Regards,
Martin

From eliben at gmail.com Fri Jan 28 05:55:22 2011
From: eliben at gmail.com (Eli Bendersky)
Date: Fri, 28 Jan 2011 06:55:22 +0200
Subject: [Python-Dev] fcmp() in test.support
Message-ID:
                              
I'm working on improving the .rst documentation of test.support (Issue 11015), and came upon the undocumented "fcmp" function that's being exported from test.support, along with a "FUZZ" constant. As I search through the tests (py3k trunk), I see fcmp() is being used only in two places in a fairly trivial way:
1. test_float: where it can be directly replaced by assertAlmostEqual from unittest
2. test_builtin: where the assertion can also be easily rewritten in terms of assertAlmostEqual
Although fcmp seems to provide extra functionality over assertAlmostEqual, the above makes me think it should probably be removed altogether, or added to unittest if it's still deemed important. +/- ? Eli From stefan_ml at behnel.de Fri Jan 28 07:20:26 2011 From: stefan_ml at behnel.de (Stefan Behnel) Date: Fri, 28 Jan 2011 07:20:26 +0100 Subject: [Python-Dev] PEP 393: Flexible String Representation In-Reply-To: <4D420798.1050004@v.loewis.de> References: <4D3DDE5E.4080807@v.loewis.de> 
                              
                              <4D420798.1050004@v.loewis.de> Message-ID: 
                              
                              "Martin v. L?wis", 28.01.2011 01:02: > Am 27.01.2011 23:53, schrieb Stefan Behnel: >> "Martin v. L?wis", 24.01.2011 21:17: >>> If the string is created directly with the canonical representation >>> (see below), this representation doesn't take a separate memory block, >>> but is allocated right after the PyUnicodeObject struct. >> >> Does this mean it's supposed to become a PyVarObject? > > What do you mean by "become"? Will it be declared as such? No. > >> Antoine proposed >> that, too. Apart from breaking (more or less) all existing C subtyping >> code, this will also make it harder to subtype it in new code. I don't >> like that idea at all. > > Why will it break all existing subtyping code? See the PEP: Only objects > created through PyUnicode_New will be affected - I don't think this can > include objects of a subtype. Ok, that's fine then. Stefan From eliben at gmail.com Fri Jan 28 07:52:55 2011 From: eliben at gmail.com (Eli Bendersky) Date: Fri, 28 Jan 2011 08:52:55 +0200 Subject: [Python-Dev] Beta version of the new devguide In-Reply-To: 
                              
                              References: 
                              
                              Message-ID: 
                              
                              On Sun, Jan 23, 2011 at 03:08, Brett Cannon 
                              
                              wrote: > http://docs.python.org/devguide/ > > If you are a core developer and have a correction you want to make you > can simply check out the devguide yourself (link is in the Resources > section of the devguide) and make the corrections yourself. Otherwise > reply here (you can email me directly but I already have instances of > multiple people telling me about the same spelling mistake so it's > nice to have it public so people know when I have been informed). Brett, A couple of concerns regarding the "Getting Set Up" page: 1) "Do note that CPython will notice that it is being run from a source checkout. This means that it if you edit Python source code in your checkout the changes will be picked up by the interpreter for immediate testing. " I'm not sure what this means. Does CPython really know it's being run from a source checkout as opposed to a source tarball? By editing "Python source code" you mean the standard libraries/tests? To be "picked up by the interpreter" you then need to run it from the root of the checkout (after build) but this is also true for source tarballs. 2) "The core CPython interpreter only needs a C compiler to build itself;" I find this confusing since the CPython interpreter doesn't build itself. A developer builds it with a C compiler / makefile. Some tools indeed "build themselves" in some kind of a bootstrap process (i.e. gcc, AFAIK). I apologize in advance if this is too nit-picky ;-) Eli From fweimer at bfk.de Fri Jan 28 10:35:19 2011 From: fweimer at bfk.de (Florian Weimer) Date: Fri, 28 Jan 2011 09:35:19 +0000 Subject: [Python-Dev] PEP 393: Flexible String Representation In-Reply-To: 
                              
                              (Stefan Behnel's message of "Thu\, 27 Jan 2011 20\:06\:10 +0100") References: <4D3DDE5E.4080807@v.loewis.de> 
                              
Message-ID: <821v3xl6aw.fsf@mid.bfk.de> * Stefan Behnel: > "Martin v. Löwis", 24.01.2011 21:17: >> The Py_UNICODE type is still supported but deprecated. It is always >> defined as a typedef for wchar_t, so the wstr representation can double >> as Py_UNICODE representation. > > It's too bad this isn't initialised by default, though. Py_UNICODE is > the only representation that can be used efficiently from C code Is this really true? I don't think I've seen any C API which actually uses wchar_t, beyond what is provided by libc. UTF-8 and even UTF-16 are much, much more common. -- Florian Weimer 
                              
BFK edv-consulting GmbH http://www.bfk.de/ Kriegsstraße 100 tel: +49-721-96201-1 D-76133 Karlsruhe fax: +49-721-96201-99 From stefan_ml at behnel.de Fri Jan 28 11:30:33 2011 From: stefan_ml at behnel.de (Stefan Behnel) Date: Fri, 28 Jan 2011 11:30:33 +0100 Subject: [Python-Dev] PEP 393: Flexible String Representation In-Reply-To: <821v3xl6aw.fsf@mid.bfk.de> References: <4D3DDE5E.4080807@v.loewis.de> 
                              
                              <821v3xl6aw.fsf@mid.bfk.de> Message-ID: 
                              
Florian Weimer, 28.01.2011 10:35: > * Stefan Behnel: >> "Martin v. Löwis", 24.01.2011 21:17: >>> The Py_UNICODE type is still supported but deprecated. It is always >>> defined as a typedef for wchar_t, so the wstr representation can double >>> as Py_UNICODE representation. >> >> It's too bad this isn't initialised by default, though. Py_UNICODE is >> the only representation that can be used efficiently from C code > > Is this really true? I don't think I've seen any C API which actually > uses wchar_t, beyond that what is provided by libc. UTF-8 and even > UTF-16 are much, much more common. They are also much harder to use, unless you are really only interested in 7-bit ASCII data - which is the case for most C libraries, so I believe that's what you meant here. However, this is the CPython runtime with built-in Unicode support, not the C runtime where it comes as an add-on at best, and where Unicode processing without being Unicode aware is common. The nice thing about Py_UNICODE is that it basically gives you native Unicode code points directly, without needing to decode UTF-8 byte runs and the like. In Cython, it allows you to do things like this:

    def test_for_those_characters(unicode s):
        for c in s:
            # warning: randomly chosen Unicode escapes ahead
            if c in u"\u0356\u1012\u3359\u4567":
                return True
        else:
            return False

The loop runs in plain C, using the somewhat obvious implementation with a loop over Py_UNICODE characters and a switch statement for the comparison. This would look a *lot* more ugly with UTF-8 encoded byte strings. Regarding Cython specifically, the above will still be *possible* under the proposal, given that the memory layout of the strings will still represent the Unicode code points. It will just be trickier to implement in Cython's type system as there is no longer a (user visible) C type representation for those code units. 
It can be any of uchar, ushort16 or uint32, none of which is necessarily a 'native' representation of a Unicode character in CPython. While I'm somewhat confident that I'll find a way to fix this in Cython, my point is just that this adds a certain level of complexity to C code using the new memory layout that simply wasn't there before. Stefan From skip at pobox.com Fri Jan 28 11:50:37 2011 From: skip at pobox.com (skip at pobox.com) Date: Fri, 28 Jan 2011 04:50:37 -0600 Subject: [Python-Dev] getting stable URLs for major.minor versions In-Reply-To: 
                              
                              References: 
                              
                              <19777.57810.481593.401954@montanaro.dyndns.org> 
                              
Message-ID: <19778.40829.489205.651604@montanaro.dyndns.org> Brett> I don't get what you are worried about: http://www.python.org/2 Brett> would refer to 2.7.1 while http://www.python.org/3 would refer to Brett> 3.1.3. In my world, 2 == major, 7 == minor, 1 == micro. I interpreted your reference to "major" as implying .../2 would refer to .../3. I thought the smiley was because you didn't really expect people to do that. S From fweimer at bfk.de Fri Jan 28 15:27:39 2011 From: fweimer at bfk.de (Florian Weimer) Date: Fri, 28 Jan 2011 14:27:39 +0000 Subject: [Python-Dev] PEP 393: Flexible String Representation In-Reply-To: 
                              
                              (Stefan Behnel's message of "Fri\, 28 Jan 2011 11\:30\:33 +0100") References: <4D3DDE5E.4080807@v.loewis.de> 
                              
                              <821v3xl6aw.fsf@mid.bfk.de> 
                              
Message-ID: <82r5bxhzms.fsf@mid.bfk.de> * Stefan Behnel: > The nice thing about Py_UNICODE is that is basically gives you native > Unicode code points directly, without needing to decode UTF-8 byte > runs and the like. In Cython, it allows you to do things like this: > > def test_for_those_characters(unicode s): > for c in s: > # warning: randomly chosen Unicode escapes ahead > if c in u"\u0356\u1012\u3359\u4567": > return True > else: > return False > > The loop runs in plain C, using the somewhat obvious implementation > with a loop over Py_UNICODE characters and a switch statement for the > comparison. This would look a *lot* more ugly with UTF-8 encoded byte > strings. Not really, because UTF-8 is quite search-friendly. (The if would have to invoke a memmem()-like primitive.) Random subscripts are problematic. However, why would one want to write loops like the above? Don't you have to take combining characters (comprising multiple codepoints) into account most of the time when you look at individual characters? Then UTF-32 does not offer much of a simplification. -- Florian Weimer 
                              
BFK edv-consulting GmbH http://www.bfk.de/ Kriegsstraße 100 tel: +49-721-96201-1 D-76133 Karlsruhe fax: +49-721-96201-99 From stefan_ml at behnel.de Fri Jan 28 16:22:37 2011 From: stefan_ml at behnel.de (Stefan Behnel) Date: Fri, 28 Jan 2011 16:22:37 +0100 Subject: [Python-Dev] PEP 393: Flexible String Representation In-Reply-To: <82r5bxhzms.fsf@mid.bfk.de> References: <4D3DDE5E.4080807@v.loewis.de> 
                              
                              <821v3xl6aw.fsf@mid.bfk.de> 
                              
                              <82r5bxhzms.fsf@mid.bfk.de> Message-ID: 
                              
                              Florian Weimer, 28.01.2011 15:27: > * Stefan Behnel: > >> The nice thing about Py_UNICODE is that is basically gives you native >> Unicode code points directly, without needing to decode UTF-8 byte >> runs and the like. In Cython, it allows you to do things like this: >> >> def test_for_those_characters(unicode s): >> for c in s: >> # warning: randomly chosen Unicode escapes ahead >> if c in u"\u0356\u1012\u3359\u4567": >> return True >> else: >> return False >> >> The loop runs in plain C, using the somewhat obvious implementation >> with a loop over Py_UNICODE characters and a switch statement for the >> comparison. This would look a *lot* more ugly with UTF-8 encoded byte >> strings. > > Not really, because UTF-8 is quite search-friendly. (The if would > have to invoke a memmem()-like primitive.) Random subscrips are > problematic. > > However, why would one want to write loops like the above? Don't you > have to take combining characters (comprising multiple codepoints) > into account most of the time when you look at individual characters? > Then UTF-32 does not offer much of a simplification. Hmm, I think this discussion is pointless. Regardless of the memory layout, you can always go down to the byte level and use an efficient (multi-)substring search algorithm. (which is obviously helped if you know the layout at compile time *wink*) Bad example, I guess. Stefan From techtonik at gmail.com Fri Jan 28 17:12:39 2011 From: techtonik at gmail.com (anatoly techtonik) Date: Fri, 28 Jan 2011 18:12:39 +0200 Subject: [Python-Dev] Finally fix installer to add Python to %PATH% on Windows Message-ID: 
                              
                              Hi, I'd like to You probably know that after installation on Windows system it is possible to call Python from Explorer's Run dialog (Win-R). It is because Python path is added to App Paths registry key and Windows Explorer shell checks this key when looking for executable. But Python doesn't work from cmd session and, more importantly, *Python doesn't work from .bat files*. It is because cmd shell doesn't know about App Paths and relies on system PATH to find executable. As far as I remember, there is no function in Python stdlib either, that executes processes and does lookups in App Paths. I never paid much attention to this fact, because I put several .bat files for every 25, 26, 27, 31 and 32 version of Python into PATH manually. But when bootstrap script for build environment of Native Client (NaCl) said that I have no Python available and started to install its own, I've asked myself - "How come? There are 5! possible versions of Python on my system." It appeared that the following .bat file doesn't work: ---cut mypy.bat-- python.exe ---cut mypy.bat-- C:\>mypy.bat C:\>python.exe 'python.exe' is not recognized as an internal or external command, operable program or batch file. I've seen about 7 requests to add Python into %PATH% in installer. All closed with no result, but with some fear and uncertainty. Martin feared that MSI installers are not able to remove entry from PATH and even if they can, they may kill the whole PATH instead of removing just one entry. To prove or dispel these fears, I've just installed/uninstalled Mercurial from mercurial-1.7.3-1-x86.msi and App Engine from GoogleAppEngine-1.4.1.msi several times. Both add entries to PATH and both remove them without any further problems. Should we finally add this to 3.2 installer for Python? -- anatoly t. 
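The message above hinges on the difference between a PATH search and an App Paths registry lookup. As a rough illustration only, the following Python sketch mimics a simplified PATH search. The helper name `find_on_path` and the default extension list are assumptions made for this example, not anything from the thread or the stdlib, and the real cmd.exe behaviour has extra rules (such as checking the current directory and PATHEXT) that are left out here.

```python
import os


def find_on_path(name, path_value, pathext=(".exe", ".bat", ".cmd")):
    """Return the first file matching `name` in the PATH-style string, or None.

    Hypothetical helper: it looks only at the directories listed in
    `path_value` and never consults the App Paths registry key, which is
    why an executable registered only there would not be found.
    """
    candidates = [name]
    # A bare command name such as "python" is also tried with each extension.
    if not os.path.splitext(name)[1]:
        candidates += [name + ext for ext in pathext]
    for directory in path_value.split(os.pathsep):
        if not directory:
            continue  # skip empty PATH entries
        for candidate in candidates:
            full = os.path.join(directory, candidate)
            if os.path.isfile(full):
                return full
    return None
```

Calling `find_on_path("python.exe", os.environ.get("PATH", ""))` returns `None` when no directory on PATH contains the interpreter, which corresponds to the "not recognized" error shown in the message. Python 3.3 and later ship `shutil.which`, which performs a comparable PATH search.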
From brian.curtin at gmail.com Fri Jan 28 17:29:07 2011 From: brian.curtin at gmail.com (Brian Curtin) Date: Fri, 28 Jan 2011 10:29:07 -0600 Subject: [Python-Dev] Finally fix installer to add Python to %PATH% on Windows In-Reply-To: 
                              
                              References: 
                              
                              Message-ID: 
                              
                              On Fri, Jan 28, 2011 at 10:12, anatoly techtonik 
                              
                              wrote: > Hi, I'd like to > > You probably know that after installation on Windows system it is > possible to call Python from Explorer's Run dialog (Win-R). It is > because Python path is added to App Paths registry key and Windows > Explorer shell checks this key when looking for executable. > > But Python doesn't work from cmd session and, more importantly, > *Python doesn't work from .bat files*. It is because cmd shell doesn't > know about App Paths and relies on system PATH to find executable. As > far as I remember, there is no function in Python stdlib either, that > executes processes and does lookups in App Paths. > > I never paid much attention to this fact, because I put several .bat > files for every 25, 26, 27, 31 and 32 version of Python into PATH > manually. But when bootstrap script for build environment of Native > Client (NaCl) said that I have no Python available and started to > install its own, I've asked myself - "How come? There are 5! possible > versions of Python on my system." It appeared that the following .bat > file doesn't work: > > ---cut mypy.bat-- > python.exe > ---cut mypy.bat-- > > C:\>mypy.bat > > C:\>python.exe > 'python.exe' is not recognized as an internal or external command, > operable program or batch file. > > > I've seen about 7 requests to add Python into %PATH% in installer. All > closed with no result, but with some fear and uncertainty. Martin > feared that MSI installers are not able to remove entry from PATH and > even if they can, they may kill the whole PATH instead of removing > just one entry. > > To prove or dispel these fears, I've just installed/uninstalled > Mercurial from mercurial-1.7.3-1-x86.msi and App Engine from > GoogleAppEngine-1.4.1.msi several times. Both add entries to PATH and > both remove them without any further problems. Should we finally add > this to 3.2 installer for Python? > > -- > anatoly t. 
Definitely not for 3.2, but this is something I'd like to look into for 3.3. Recently I've talked to two Python trainers/educators and the major gripe their attendees see is that you can't just sit down and type "python" and have something work. For multi-Python installs, we'll have to define what that "something" is, but I think it should be possible for the installer to optionally put Python into the path, and to also remove itself on uninstall. One of said trainers is running a course inside my company right now and the training room VMs they are running on do not have the path setup. Some users were puzzled as to why "python foo.py" doesn't work, but executing "foo.py" does (via file association). One quick-and-dirty solution was to create a "Command Shell" shortcut in the start menu which would just be a batch file that adds Python to the path for that cmd session. It would be kind of similar to the "Python (command line)" shortcut, which uses pythonw.exe. I think we can do better than this, though.
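The "Command Shell" shortcut idea (a session whose PATH has Python's directory prepended) can be sketched in Python as well. This is only an illustration under stated assumptions: the helper name `env_with_dir_on_path` is made up for this example, and it assumes the variable is stored under the key `PATH`, which holds for `os.environ` but not necessarily for a hand-built dictionary on Windows.

```python
import os


def env_with_dir_on_path(directory, base_env=None):
    """Return a copy of the environment with `directory` prepended to PATH.

    Hypothetical helper: the caller's environment is left untouched, so the
    change only affects processes started with the returned mapping.
    """
    env = dict(os.environ if base_env is None else base_env)
    old_path = env.get("PATH", "")
    # Prepend so that this directory wins over any other interpreter on PATH.
    env["PATH"] = (directory + os.pathsep + old_path) if old_path else directory
    return env
```

A shortcut-like launcher could then start a shell with something like `subprocess.call("cmd.exe", env=env_with_dir_on_path(r"C:\Python32"))`, where `C:\Python32` is an assumed install directory. Because only the child process receives the modified PATH, nothing has to be cleaned up on uninstall, which sidesteps the PATH-removal concern raised earlier in the thread.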
                              
                              From mail at timgolden.me.uk Fri Jan 28 17:37:34 2011 From: mail at timgolden.me.uk (Tim Golden) Date: Fri, 28 Jan 2011 16:37:34 +0000 Subject: [Python-Dev] Finally fix installer to add Python to %PATH% on Windows In-Reply-To: 
                              
                              References: 
                              
                              
                              Message-ID: <4D42F0CE.8080406@timgolden.me.uk> On 28/01/2011 16:29, Brian Curtin wrote: > On Fri, Jan 28, 2011 at 10:12, anatoly techtonik
                              
                              wrote: >> I've seen about 7 requests to add Python into %PATH% in installer. All >> closed with no result, but with some fear and uncertainty. Martin >> feared that MSI installers are not able to remove entry from PATH and >> even if they can, they may kill the whole PATH instead of removing >> just one entry. >> >> To prove or dispel these fears, I've just installed/uninstalled >> Mercurial from mercurial-1.7.3-1-x86.msi and App Engine from >> GoogleAppEngine-1.4.1.msi several times. Both add entries to PATH and >> both remove them without any further problems. Should we finally add >> this to 3.2 installer for Python? >> >> -- >> anatoly t. > > > Definitely not for 3.2, but this is something I'd like to look into for 3.3. > > Recently I've talked to two Python trainers/educators and the major gripe > their attendees see is that you can't just sit down and type "python" and > have something work. For multi-Python installs, we'll have to define what > that "something" is, but I think it should be possible for the installer to > optionally put Python into the path, and to also remove itself on uninstall. I don't think, ultimately, that there is any insurmountable technical objection. There are misgivings but they could undoubtedly be overcome or overridden. But it would require someone to patch the MSI builder so it added the functionality and -- I think -- offered it as an option which could be enabled or disabled. TJG From status at bugs.python.org Fri Jan 28 18:07:04 2011 From: status at bugs.python.org (Python tracker) Date: Fri, 28 Jan 2011 18:07:04 +0100 (CET) Subject: [Python-Dev] Summary of Python tracker Issues Message-ID: <20110128170704.C6AB61CCED@psf.upfronthosting.co.za> ACTIVITY SUMMARY (2011-01-21 - 2011-01-28) Python tracker at http://bugs.python.org/ To view or respond to any of the issues listed below, click on the issue. Do NOT respond to this message. 
Issues counts and deltas: open 2567 (+40) closed 20262 (+34) total 22829 (+74) Open issues with patches: 1085 Issues opened (54) ================== #10042: total_ordering stack overflow http://bugs.python.org/issue10042 reopened by eric.araujo #10501: make_buildinfo regression with unquoted path http://bugs.python.org/issue10501 reopened by eli.bendersky #10708: Misc/porting should be folded into the development FAQ or the http://bugs.python.org/issue10708 reopened by pitrou #10976: json.loads() throws TypeError on bytes object http://bugs.python.org/issue10976 opened by hhas #10977: Concrete object C API needs abstract path for subclasses of bu http://bugs.python.org/issue10977 opened by rhettinger #10978: Add optional argument to Semaphore.release for releasing multi http://bugs.python.org/issue10978 opened by rhettinger #10979: setUpClass exception causes explosion with "-b" http://bugs.python.org/issue10979 opened by brandon-rhodes #10980: http.server Header Unicode Bug http://bugs.python.org/issue10980 opened by aronacher #10983: Errors in http.client.HTTPConnection class (python3) http://bugs.python.org/issue10983 opened by nooB #10984: argparse add_mutually_exclusive_group should accept existing a http://bugs.python.org/issue10984 opened by gotgenes #10988: Descriptor protocol documentation for super bindings is incorr http://bugs.python.org/issue10988 opened by Joshua.Arnold #10989: ssl.SSLContext(True).load_verify_locations(None, True) segfaul http://bugs.python.org/issue10989 opened by haypo #10990: tests mutating sys.gettrace() w/o re-instating previous state http://bugs.python.org/issue10990 opened by brett.cannon #10991: trace fails when test imported a temporary file http://bugs.python.org/issue10991 opened by brett.cannon #10992: tests failing when run under coverage http://bugs.python.org/issue10992 opened by brett.cannon #10994: implementation details in sys module http://bugs.python.org/issue10994 opened by fijall #10998: Remove last traces of -Q 
/ sys.flags.division_warning / Py_Div http://bugs.python.org/issue10998 opened by eric.araujo #10999: os.chflags refers to stat constants, but the constants are not http://bugs.python.org/issue10999 opened by r.david.murray #11001: Various obvious errors in cookies documentation http://bugs.python.org/issue11001 opened by spookylukey #11003: os.system should be deprecated in favour of subprocess module http://bugs.python.org/issue11003 opened by Jakob.Bowyer #11005: Assertion error on RLock._acquire_restore http://bugs.python.org/issue11005 opened by haypo #11006: warnings with subprocess and pipe2 http://bugs.python.org/issue11006 opened by pitrou #11007: stack tracebacks should give the relevant class name http://bugs.python.org/issue11007 opened by stickwithjosh #11009: urllib.splituser is not documented http://bugs.python.org/issue11009 opened by techtonik #11011: More functools functions http://bugs.python.org/issue11011 opened by Jason.Baker #11012: Add log1p(), exp1m(), gamma(), and lgamma() to cmath http://bugs.python.org/issue11012 opened by rhettinger #11015: Bring test.support docs up to date http://bugs.python.org/issue11015 opened by ncoghlan #11021: email MIME-Version headers for each part in multipart message http://bugs.python.org/issue11021 opened by david.caro #11022: locale.setlocale() doesn't change I/O codec, os.environ does http://bugs.python.org/issue11022 opened by sdaoden #11023: pep 227 missing text http://bugs.python.org/issue11023 opened by aisaac #11025: Distutils2 install command without setup.py or setup.cfg creat http://bugs.python.org/issue11025 opened by Boris.FELD #11027: Implement sectionxform in configparser http://bugs.python.org/issue11027 opened by Kunjesh.Kaushik #11028: Implement the setup.py -> setup.cfg in mkcfg http://bugs.python.org/issue11028 opened by alaintty #11029: Crash, 2.7.1, Tkinter and threads and line drawing http://bugs.python.org/issue11029 opened by PythonInTheGrass #11030: regrtest - allow for relative 
path with --coverdir http://bugs.python.org/issue11030 opened by sandro.tosi #11031: regrtest - --testdir, new command-line option to specify alter http://bugs.python.org/issue11031 opened by sandro.tosi #11032: _string: formatter_field_name_split() and formatter_parser doe http://bugs.python.org/issue11032 opened by haypo #11033: ElementTree.fromstring doesn't work with Unicode http://bugs.python.org/issue11033 opened by Peter.Cai #11034: Build problem on Windows with MSVC++ Express 2008 http://bugs.python.org/issue11034 opened by eli.bendersky #11035: Segmentation fault http://bugs.python.org/issue11035 opened by Dmitry.Groshev #11037: How distutils2 handle namespaces http://bugs.python.org/issue11037 opened by sdouche #11038: Some commands should stop if Name and Version are missing http://bugs.python.org/issue11038 opened by gawel #11040: After registering a project to PyPI, classifiers fields aren't http://bugs.python.org/issue11040 opened by Julien.Miotte #11041: On the distutils2 documentation, 'requires-python' shouldn't b http://bugs.python.org/issue11041 opened by Julien.Miotte #11042: [PyPI CSS] Adding project urls onto a project page using regis http://bugs.python.org/issue11042 opened by Julien.Miotte #11043: On GNU/Linux (Ubuntu) distutils2.mkcfg shouldn't create an exe http://bugs.python.org/issue11043 opened by Julien.Miotte #11044: The description-file isn't handled by distutils2 http://bugs.python.org/issue11044 opened by Julien.Miotte #11045: shutil._make_tarball http://bugs.python.org/issue11045 opened by tarek #11046: darwin/MacOS X setup.py hack http://bugs.python.org/issue11046 opened by sdaoden #10997: Duplicate entries in IDLE "Recent Files" menu item on OS X http://bugs.python.org/issue10997 opened by ned.deily #11016: Add S_ISDOOR to the stat module http://bugs.python.org/issue11016 opened by pitrou #11024: imaplib: Time2Internaldate() returns localized strings http://bugs.python.org/issue11024 opened by spaetz #11036: Allow multiple 
files in the description-file metadata http://bugs.python.org/issue11036 opened by gawel #11047: Bad description for a change http://bugs.python.org/issue11047 opened by Oren_Held Most recent 15 issues with no replies (15) ========================================== #11047: Bad description for a change http://bugs.python.org/issue11047 #11044: The description-file isn't handled by distutils2 http://bugs.python.org/issue11044 #11043: On GNU/Linux (Ubuntu) distutils2.mkcfg shouldn't create an exe http://bugs.python.org/issue11043 #11042: [PyPI CSS] Adding project urls onto a project page using regis http://bugs.python.org/issue11042 #11041: On the distutils2 documentation, 'requires-python' shouldn't b http://bugs.python.org/issue11041 #11040: After registering a project to PyPI, classifiers fields aren't http://bugs.python.org/issue11040 #11038: Some commands should stop if Name and Version are missing http://bugs.python.org/issue11038 #11037: How distutils2 handle namespaces http://bugs.python.org/issue11037 #11036: Allow multiple files in the description-file metadata http://bugs.python.org/issue11036 #11033: ElementTree.fromstring doesn't work with Unicode http://bugs.python.org/issue11033 #11031: regrtest - --testdir, new command-line option to specify alter http://bugs.python.org/issue11031 #11030: regrtest - allow for relative path with --coverdir http://bugs.python.org/issue11030 #11028: Implement the setup.py -> setup.cfg in mkcfg http://bugs.python.org/issue11028 #11023: pep 227 missing text http://bugs.python.org/issue11023 #11012: Add log1p(), exp1m(), gamma(), and lgamma() to cmath http://bugs.python.org/issue11012 Most recent 15 issues waiting for review (15) ============================================= #11047: Bad description for a change http://bugs.python.org/issue11047 #11034: Build problem on Windows with MSVC++ Express 2008 http://bugs.python.org/issue11034 #11032: _string: formatter_field_name_split() and formatter_parser doe 
http://bugs.python.org/issue11032 #11031: regrtest - --testdir, new command-line option to specify alter http://bugs.python.org/issue11031 #11030: regrtest - allow for relative path with --coverdir http://bugs.python.org/issue11030 #11024: imaplib: Time2Internaldate() returns localized strings http://bugs.python.org/issue11024 #11015: Bring test.support docs up to date http://bugs.python.org/issue11015 #11011: More functools functions http://bugs.python.org/issue11011 #11001: Various obvious errors in cookies documentation http://bugs.python.org/issue11001 #10999: os.chflags refers to stat constants, but the constants are not http://bugs.python.org/issue10999 #10998: Remove last traces of -Q / sys.flags.division_warning / Py_Div http://bugs.python.org/issue10998 #10997: Duplicate entries in IDLE "Recent Files" menu item on OS X http://bugs.python.org/issue10997 #10992: tests failing when run under coverage http://bugs.python.org/issue10992 #10990: tests mutating sys.gettrace() w/o re-instating previous state http://bugs.python.org/issue10990 #10989: ssl.SSLContext(True).load_verify_locations(None, True) segfaul http://bugs.python.org/issue10989 Top 10 most discussed issues (10) ================================= #10990: tests mutating sys.gettrace() w/o re-instating previous state http://bugs.python.org/issue10990 22 msgs #9124: Mailbox module should use binary I/O, not text I/O http://bugs.python.org/issue9124 21 msgs #10848: Move test.regrtest from getopt to argparse http://bugs.python.org/issue10848 19 msgs #11034: Build problem on Windows with MSVC++ Express 2008 http://bugs.python.org/issue11034 12 msgs #5863: bz2.BZ2File should accept other file-like objects. 
http://bugs.python.org/issue5863 11 msgs #11027: Implement sectionxform in configparser http://bugs.python.org/issue11027 11 msgs #11016: Add S_ISDOOR to the stat module http://bugs.python.org/issue11016 11 msgs #10954: No warning for csv.writer API change http://bugs.python.org/issue10954 10 msgs #10994: implementation details in sys module http://bugs.python.org/issue10994 10 msgs #11022: locale.setlocale() doesn't change I/O codec, os.environ does http://bugs.python.org/issue11022 9 msgs Issues closed (34) ================== #4177: Crash in MIMEText on FreeBSD http://bugs.python.org/issue4177 closed by haypo #5097: asyncore.dispatcher_with_send undocumented http://bugs.python.org/issue5097 closed by giampaolo.rodola #5831: Doc mistake : threading.Timer is *not* a class http://bugs.python.org/issue5831 closed by eric.araujo #10948: Trouble with dir_util created dir cache http://bugs.python.org/issue10948 closed by eric.araujo #10949: logging.RotatingFileHandler not robust enough http://bugs.python.org/issue10949 closed by vinay.sajip #10952: Don't normalize module names to NFKC? 
http://bugs.python.org/issue10952 closed by haypo #10955: Possible regression with stdlib in zipfile http://bugs.python.org/issue10955 closed by haypo #10957: Python developer FAQ grammar error http://bugs.python.org/issue10957 closed by brett.cannon #10960: os.stat() does not mention that it follow symlinks by default http://bugs.python.org/issue10960 closed by r.david.murray #10970: "string".encode('base64') is not the same as base64.b64encode( http://bugs.python.org/issue10970 closed by terry.reedy #10973: OS X 10.6 IDLE, tkinter: Cocoa Tk 8.5 crash when composite cha http://bugs.python.org/issue10973 closed by ned.deily #10974: IDLE 3.x can crash decoding recent file list http://bugs.python.org/issue10974 closed by ned.deily #10975: #10961: Pydoc touchups in new 3.2 Web server (issue4090042) http://bugs.python.org/issue10975 closed by eric.araujo #10981: argparse: options starting with -- match substrings http://bugs.python.org/issue10981 closed by david.caro #10982: asyncore timeouts do not work correctly http://bugs.python.org/issue10982 closed by giampaolo.rodola #10985: test_sys triggers a fatal python error when run under coverage http://bugs.python.org/issue10985 closed by brett.cannon #10986: traceback's rendering behavior while throwing custom exception http://bugs.python.org/issue10986 closed by benjamin.peterson #10987: _pickle doesn't handle recursion limits properly http://bugs.python.org/issue10987 closed by pitrou #10993: HTTPSConnection does not close when call close() method http://bugs.python.org/issue10993 closed by tanakorn #10995: mailbox.py open() calls don't set encoding http://bugs.python.org/issue10995 closed by r.david.murray #10996: Typo in What's New in 3.2 http://bugs.python.org/issue10996 closed by rhettinger #11000: Doc: ast.parse parses source, not just expressions http://bugs.python.org/issue11000 closed by terry.reedy #11002: 'Upload' link on Files page is broken http://bugs.python.org/issue11002 closed by eric.araujo #11004: 
AssertionError on collections.deque().count(1) http://bugs.python.org/issue11004 closed by rhettinger #11008: logging.dictConfig not documented as new in version 2.7 http://bugs.python.org/issue11008 closed by vinay.sajip #11010: Unicode BOM left in loaded text http://bugs.python.org/issue11010 closed by loewis #11013: Build of 2.7 svn fails in readline http://bugs.python.org/issue11013 closed by brett.cannon #11014: 'filter' argument for Tarfile.add needs to be a keyword-only a http://bugs.python.org/issue11014 closed by rhettinger #11017: optparse: error: invalid integer value http://bugs.python.org/issue11017 closed by eric.araujo #11018: typo in test_bz2 http://bugs.python.org/issue11018 closed by pitrou #11019: BytesGenerator fails if the Message body is None http://bugs.python.org/issue11019 closed by r.david.murray #11020: Pyclbr broken because of missing 2-to-3 conversion http://bugs.python.org/issue11020 closed by rhettinger #11026: Distutils2 install command fail with python 2.5/2.7 http://bugs.python.org/issue11026 closed by Boris.FELD #11039: Use of 'L' specifier is inconsistent when printing long intege http://bugs.python.org/issue11039 closed by eric.smith From merwok at netwok.org Fri Jan 28 18:43:16 2011 From: merwok at netwok.org (=?UTF-8?B?w4lyaWMgQXJhdWpv?=) Date: Fri, 28 Jan 2011 18:43:16 +0100 Subject: [Python-Dev] Finally fix installer to add Python to %PATH% on Windows In-Reply-To: 
                              
                              References: 
                              
                              Message-ID: <4D430034.7060207@netwok.org> Hello See http://bugs.python.org/issue3561 (rejected by Martin). Regards From brett at python.org Fri Jan 28 19:05:47 2011 From: brett at python.org (Brett Cannon) Date: Fri, 28 Jan 2011 10:05:47 -0800 Subject: [Python-Dev] Beta version of the new devguide In-Reply-To: 
                              
                              References: 
                              
                              
                              Message-ID: 
                              
                              On Thu, Jan 27, 2011 at 22:52, Eli Bendersky 
                              
                              wrote: > On Sun, Jan 23, 2011 at 03:08, Brett Cannon 
                              
                              wrote: >> http://docs.python.org/devguide/ >> >> If you are a core developer and have a correction you want to make you >> can simply check out the devguide yourself (link is in the Resources >> section of the devguide) and make the corrections yourself. Otherwise >> reply here (you can email me directly but I already have instances of >> multiple people telling me about the same spelling mistake so it's >> nice to have it public so people know when I have been informed). > > Brett, > A couple of concerns regarding the "Getting Set Up" page: > > 1) > > "Do note that CPython will notice that it is being run from a source > checkout. This means that it if you edit Python source code in your > checkout the changes will be picked up by the interpreter for > immediate testing. " > > I'm not sure what this means. Does CPython really know it's being run > from a source checkout as opposed to a source tarball? Technically yes because of sys.subversion, but otherwise not really. But then again the distinction is so minimal I'm not going to bother rephrasing it to make it clear. > By editing > "Python source code" you mean the standard libraries/tests? I'll make it "Python's". > To be > "picked up by the interpreter" you then need to run it from the root > of the checkout (after build) but this is also true for source > tarballs. Once again, not an important distinction. > > 2) > > "The core CPython interpreter only needs a C compiler to build itself;" > > I find this confusing since the CPython interpreter doesn't build > itself. A developer builds it with a C compiler / makefile. Some tools > indeed "build themselves" in some kind of a bootstrap process (i.e. > gcc, AFAIK). True. I'll rephrase. > > > I apologize in advance if this is too nit-picky ;-) Sure, but at least you said it nicely. 
=) -Brett > Eli > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/brett%40python.org > From brett at python.org Fri Jan 28 19:09:12 2011 From: brett at python.org (Brett Cannon) Date: Fri, 28 Jan 2011 10:09:12 -0800 Subject: [Python-Dev] fcmp() in test.support In-Reply-To: 
                              
                              References: 
                              
                              Message-ID: 
                              
                              On Thu, Jan 27, 2011 at 20:55, Eli Bendersky 
                              
                              wrote: > I'm working on improving the .rst documentation of test.support (Issue > 11015), and came upon the undocumented "fcmp" function that's being > exported from test.support, along with a "FUZZ"constant. > > As I search through the tests (py3k trunk), I see fcmp() is being used > only in two places in a fairly trivial way: > 1. test_float: where it can be directly replaced by assertAlmostEqual > from unittest > 2. test_builtin: where the assertion can also be easily rewritten in > terms of assertAlmostEqual > > Although fcmp seems to provide extra functionality over > assertAlmostEqual, the above makes me think it should probably be > removed altogether, or added to unittest if it's still deemed > important. > > +/- ? I say drop it if it can be done so safely. From raymond.hettinger at gmail.com Fri Jan 28 19:51:19 2011 From: raymond.hettinger at gmail.com (Raymond Hettinger) Date: Fri, 28 Jan 2011 10:51:19 -0800 Subject: [Python-Dev] fcmp() in test.support In-Reply-To: 
                              
                              References: 
                              
                              
                              Message-ID: <99C803CF-798E-45DD-A163-F597C38A5E89@gmail.com> On Jan 28, 2011, at 10:09 AM, Brett Cannon wrote: > On Thu, Jan 27, 2011 at 20:55, Eli Bendersky 
                              
                              wrote: >> I'm working on improving the .rst documentation of test.support (Issue >> 11015), and came upon the undocumented "fcmp" function that's being >> exported from test.support, along with a "FUZZ"constant. >> >> As I search through the tests (py3k trunk), I see fcmp() is being used >> only in two places in a fairly trivial way: >> 1. test_float: where it can be directly replaced by assertAlmostEqual >> from unittest >> 2. test_builtin: where the assertion can also be easily rewritten in >> terms of assertAlmostEqual >> >> Although fcmp seems to provide extra functionality over >> assertAlmostEqual, the above makes me think it should probably be >> removed altogether, or added to unittest if it's still deemed >> important. >> >> +/- ? > > I say drop it if it can be done so safely. Yes, please remove fcmp() altogether. Like you said, the usage is trivial. If you're feeling bold, replace them with assertEqual(), the tests look like they produce exact values even in floating point. 
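A minimal sketch of the replacement discussed above, assuming (as Raymond expects) that the divmod() and float() results in question are exactly representable. The class and method names are hypothetical and are not taken from the CPython test suite; only unittest's assertEqual and assertAlmostEqual are real API.

```python
import unittest


class FcmpReplacementSketch(unittest.TestCase):
    """Hypothetical rewrite of the fcmp()-based checks without test.support.fcmp."""

    def test_divmod_exact(self):
        # 3.25, 0.25 and 0.75 are exact binary fractions, so plain
        # equality is enough here.
        self.assertEqual(divmod(3.25, 1.0), (3.0, 0.25))
        self.assertEqual(divmod(-3.25, 1.0), (-4.0, 0.75))
        self.assertEqual(divmod(3.25, -1.0), (-4.0, -0.75))
        self.assertEqual(divmod(-3.25, -1.0), (3.0, -0.25))

    def test_divmod_almost(self):
        # The more conservative variant: compare element by element
        # with assertAlmostEqual (default: 7 decimal places).
        quotient, remainder = divmod(3.25, 1.0)
        self.assertAlmostEqual(quotient, 3.0)
        self.assertAlmostEqual(remainder, 0.25)

    def test_float_parsing(self):
        # float() ignores surrounding whitespace; the parsed value and
        # the literal round to the same double.
        self.assertEqual(float(" .25e-1 "), .025)
```

Saved in a module, this can be run with "python -m unittest"; whether exact or approximate comparison is the better choice for the real tests is left to the thread.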
Raymond ------------------------------ ~/py32 $ ack "fcmp" --python Doc/tools/pygments/lexers/asm.py 261: r'|lshr|ashr|and|or|xor|icmp|fcmp' Lib/test/support.py 36: "fcmp", "is_jython", "TESTFN", "HOST", "FUZZ", "SAVEDCWD", "temp_cwd", 354:def fcmp(x, y): # fuzzy comparison function 364: outcome = fcmp(x[i], y[i]) Lib/test/test_builtin.py 13:from test.support import fcmp, TESTFN, unlink, run_unittest, check_warnings 397: self.assertTrue(not fcmp(divmod(3.25, 1.0), (3.0, 0.25))) 398: self.assertTrue(not fcmp(divmod(-3.25, 1.0), (-4.0, 0.75))) 399: self.assertTrue(not fcmp(divmod(3.25, -1.0), (-4.0, -0.75))) 400: self.assertTrue(not fcmp(divmod(-3.25, -1.0), (3.0, -0.25))) Lib/test/test_float.py 91: self.assertEqual(support.fcmp(float(" .25e-1 "), .025), 0) From fuzzyman at voidspace.org.uk Fri Jan 28 20:21:08 2011 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Fri, 28 Jan 2011 19:21:08 +0000 Subject: [Python-Dev] Finally fix installer to add Python to %PATH% on Windows In-Reply-To: 
                              
                              References: 
                              
                              
                              Message-ID: <4D431724.4010002@voidspace.org.uk> On 28/01/2011 16:29, Brian Curtin wrote: > On Fri, Jan 28, 2011 at 10:12, anatoly techtonik 
                              
                              
                              > wrote: > > Hi, I'd like to > > You probably know that after installation on Windows system it is > possible to call Python from Explorer's Run dialog (Win-R). It is > because Python path is added to App Paths registry key and Windows > Explorer shell checks this key when looking for executable. > > But Python doesn't work from cmd session and, more importantly, > *Python doesn't work from .bat files*. It is because cmd shell doesn't > know about App Paths and relies on system PATH to find executable. As > far as I remember, there is no function in Python stdlib either, that > executes processes and does lookups in App Paths. > > I never paid much attention to this fact, because I put several .bat > files for every 25, 26, 27, 31 and 32 version of Python into PATH > manually. But when bootstrap script for build environment of Native > Client (NaCl) said that I have no Python available and started to > install its own, I've asked myself - "How come? There are 5! possible > versions of Python on my system." It appeared that the following .bat > file doesn't work: > > ---cut mypy.bat-- > python.exe > ---cut mypy.bat-- > > C:\>mypy.bat > > C:\>python.exe > 'python.exe' is not recognized as an internal or external command, > operable program or batch file. > > > I've seen about 7 requests to add Python into %PATH% in installer. All > closed with no result, but with some fear and uncertainty. Martin > feared that MSI installers are not able to remove entry from PATH and > even if they can, they may kill the whole PATH instead of removing > just one entry. > > To prove or dispel these fears, I've just installed/uninstalled > Mercurial from mercurial-1.7.3-1-x86.msi and App Engine from > GoogleAppEngine-1.4.1.msi several times. Both add entries to PATH and > both remove them without any further problems. Should we finally add > this to 3.2 installer for Python? > > -- > anatoly t. 
> > > Definitely not for 3.2, but this is something I'd like to look into > for 3.3. > > Recently I've talked to two Python trainers/educators and the major > gripe their attendees see is that you can't just sit down and type > "python" and have something work. For multi-Python installs, we'll > have to define what that "something" is, but I think it should be > possible for the installer to optionally put Python into the path, and > to also remove itself on uninstall. > I've helped quite a few "python newbies" on Windows who are also surprised / frustrated on learning that "python" on the command line doesn't work after installing python. All the best, Michael Foord > One of said trainers is running a course inside my company right now > and the training room VMs they are running on do not have the path > setup. Some users were puzzled as to why "python foo.py" doesn't work, > but executing "foo.py" does (via file association). > > One quick-and-dirty solution was to create a "Command Shell" shortcut > in the start menu which would just be a batch file that adds Python to > the path for that cmd session. It would be kind of similar to the > "Python (command line)" shortcut, which uses pythonw.exe. I think we > can do better than this, though. > > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/fuzzyman%40voidspace.org.uk -- http://www.voidspace.org.uk/ May you do good and not evil May you find forgiveness for yourself and forgive others May you share freely, never taking more than you give. -- the sqlite blessing http://www.sqlite.org/different.html -------------- next part -------------- An HTML attachment was scrubbed... URL: 
                              
                              From raymond.hettinger at gmail.com Fri Jan 28 20:29:02 2011 From: raymond.hettinger at gmail.com (Raymond Hettinger) Date: Fri, 28 Jan 2011 11:29:02 -0800 Subject: [Python-Dev] Finally fix installer to add Python to %PATH% on Windows In-Reply-To: <4D431724.4010002@voidspace.org.uk> References: 
                              
                              
                              <4D431724.4010002@voidspace.org.uk> Message-ID: <7DA37C12-D3DA-49B3-996A-017CF304BC5C@gmail.com> On Jan 28, 2011, at 11:21 AM, Michael Foord wrote: > On 28/01/2011 16:29, Brian Curtin wrote: >> >> Recently I've talked to two Python trainers/educators and the major gripe their attendees see is that you can't just sit down and type "python" and have something work. For multi-Python installs, we'll have to define what that "something" is, but I think it should be possible for the installer to optionally put Python into the path, and to also remove itself on uninstall. >> > > I've helped quite a few "python newbies" on Windows who are also surprised / frustrated on learning that "python" on the command line doesn't work after installing python. At the very least, we should add some prominent instructions for getting the command line version up and running. Raymond From fuzzyman at voidspace.org.uk Fri Jan 28 20:34:12 2011 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Fri, 28 Jan 2011 19:34:12 +0000 Subject: [Python-Dev] fcmp() in test.support In-Reply-To: 
                              
                              References: 
                              
                              Message-ID: <4D431A34.9030401@voidspace.org.uk> On 28/01/2011 04:55, Eli Bendersky wrote: > I'm working on improving the .rst documentation of test.support (Issue > 11015), and came upon the undocumented "fcmp" function that's being > exported from test.support, along with a "FUZZ"constant. > This module shouldn't really be documented at all. It exists to support the Python test framework and we don't want to have to support users or make API stability guarantees. Plus most of it is rather old. Please don't document more stuff in this module. > As I search through the tests (py3k trunk), I see fcmp() is being used > only in two places in a fairly trivial way: > 1. test_float: where it can be directly replaced by assertAlmostEqual > from unittest > 2. test_builtin: where the assertion can also be easily rewritten in > terms of assertAlmostEqual > > Although fcmp seems to provide extra functionality over > assertAlmostEqual, the above makes me think it should probably be > removed altogether, or added to unittest if it's still deemed > important. > Yes, get rid of it. Michael Foord > +/- ? > Eli > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/fuzzyman%40voidspace.org.uk -- http://www.voidspace.org.uk/ May you do good and not evil May you find forgiveness for yourself and forgive others May you share freely, never taking more than you give. -- the sqlite blessing http://www.sqlite.org/different.html From eliben at gmail.com Fri Jan 28 20:59:23 2011 From: eliben at gmail.com (Eli Bendersky) Date: Fri, 28 Jan 2011 21:59:23 +0200 Subject: [Python-Dev] fcmp() in test.support In-Reply-To: <4D431A34.9030401@voidspace.org.uk> References: 
                              
                              <4D431A34.9030401@voidspace.org.uk> Message-ID: 
                              
>> I'm working on improving the .rst documentation of test.support (Issue >> 11015), and came upon the undocumented "fcmp" function that's being >> exported from test.support, along with a "FUZZ" constant. >> The documentation of the test module clearly states right at the top: "" Note The test package is meant for internal use by Python only. It is documented for the benefit of the core developers of Python. Any use of this package outside of Python's standard library is discouraged as code mentioned here can change or be removed without notice between releases of Python. "" Given that disclaimer, I don't think it's a bad idea to document more parts of test.support. People adding new tests should be aware of some of the tools that already exist there, and only some of which are documented. Just my 2c here. Maybe Nick will want to chip in here since he opened issue 11015. Eli From lists at cheimes.de Fri Jan 28 21:34:05 2011 From: lists at cheimes.de (Christian Heimes) Date: Fri, 28 Jan 2011 21:34:05 +0100 Subject: [Python-Dev] Finally fix installer to add Python to %PATH% on Windows In-Reply-To: <7DA37C12-D3DA-49B3-996A-017CF304BC5C@gmail.com> References: 
                              
                              
                              <4D431724.4010002@voidspace.org.uk> <7DA37C12-D3DA-49B3-996A-017CF304BC5C@gmail.com> Message-ID: 
                              
                              Am 28.01.2011 20:29, schrieb Raymond Hettinger: > At the very least, we should add some prominent instructions for getting the command line version up and running. /me pops out of Guido's time machine and says: "execute Tools/scripts/win_add2path.py" I'm -1 on adding Python to %PATH%. The private MSVCRT DLLs may lead to unexpected side effects and it doesn't scale at all. What about people with more than one Python installation? I suggest that we add a single user specific directory or a global directory to %PATH% for all installations. Then the Python installer or 3rd party modules can drop executables like python3.3.exe or plip-3.3.exe into this directory. A .bat file won't do good because .bat files must be called with "call python33.bat" from another .bat file or the first one gets terminated. We can even use a single and simple executable as template for all tasks: * get registry key from resource section of the executable * use the registry key to lookup the location and name of pythonXX.dll * load DLL * get optional dotted module name for resource section * either fire up interpreter as shell, with **argv or -m dotted.module.name **argv Done ;) Christian From brian.curtin at gmail.com Fri Jan 28 21:46:38 2011 From: brian.curtin at gmail.com (Brian Curtin) Date: Fri, 28 Jan 2011 14:46:38 -0600 Subject: [Python-Dev] Finally fix installer to add Python to %PATH% on Windows In-Reply-To: 
                              
                              References: 
                              
                              
                              <4D431724.4010002@voidspace.org.uk> <7DA37C12-D3DA-49B3-996A-017CF304BC5C@gmail.com> 
                              
                              Message-ID: 
                              
                              On Fri, Jan 28, 2011 at 14:34, Christian Heimes 
                              
wrote: > Am 28.01.2011 20:29, schrieb Raymond Hettinger: > > At the very least, we should add some prominent instructions for getting > the command line version up and running. > > /me pops out of Guido's time machine and says: "execute > Tools/scripts/win_add2path.py" > > I'm -1 on adding Python to %PATH%. The private MSVCRT DLLs may lead to > unexpected side effects and it doesn't scale at all. What about people > with more than one Python installation? The same "problem" exists when it comes to file associations. The last installer you've run wins the battle. Since setting file associations is optional, and only one association can exist, I don't see why we can't do the same for the PATH.
                              From martin at v.loewis.de Fri Jan 28 22:49:08 2011 From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=) Date: Fri, 28 Jan 2011 22:49:08 +0100 Subject: [Python-Dev] PEP 393: Flexible String Representation In-Reply-To: 
                              
                              References: <4D3DDE5E.4080807@v.loewis.de> 
                              
                              <821v3xl6aw.fsf@mid.bfk.de> 
                              
Message-ID: <4D4339D4.5060906@v.loewis.de>

> The nice thing about Py_UNICODE is that it basically gives you native
> Unicode code points directly, without needing to decode UTF-8 byte runs
> and the like. In Cython, it allows you to do things like this:
>
>     def test_for_those_characters(unicode s):
>         for c in s:
>             # warning: randomly chosen Unicode escapes ahead
>             if c in u"\u0356\u1012\u3359\u4567":
>                 return True
>         else:
>             return False
>
> The loop runs in plain C, using the somewhat obvious implementation with
> a loop over Py_UNICODE characters and a switch statement for the
> comparison. This would look a *lot* more ugly with UTF-8 encoded byte
> strings.

And indeed, when Cython is updated to 3.3, it shouldn't access the UTF-8
representation for such a loop. Instead, it should access the str
representation, and might compile this to code like

    #define Cython_CharAt(data, kind, pos) (kind==LATIN1 ? \
            ((unsigned char*)data)[pos] : kind==UCS2 ? \
            ((unsigned short*)data)[pos] : \
            ((Py_UCS4*)data)[pos])

    void *data = PyUnicode_Data(s);
    int kind = PyUnicode_Kind(s);
    for(int pos=0; pos < PyUnicode_Size(s); pos++){
        Py_UCS4 c = Cython_CharAt(data, kind, pos);
        Py_UCS4 tmp[] = {0x356, 0x1012, 0x3359, 0x4567};
        for (int k=0; k<4; k++)
            if (c == tmp[k])
                return 1;
    }
    return 0;

> Regarding Cython specifically, the above will still be *possible* under
> the proposal, given that the memory layout of the strings will still
> represent the Unicode code points. It will just be trickier to implement
> in Cython's type system as there is no longer a (user visible) C type
> representation for those code units.

There is: Py_UCS4 remains available.

> It can be any of uchar, ushort16 or
> uint32, neither of which is necessarily a 'native' representation of a
> Unicode character in CPython.

There won't be a "native" representation anymore - that's the whole
point of the PEP.

> While I'm somewhat confident that I'll > find a way to fix this in Cython, my point is just that this adds a > certain level of complexity to C code using the new memory layout that > simply wasn't there before. Understood. However, I think it is easier than you think it is. Regards, Martin From josiah.carlson at gmail.com Sat Jan 29 01:54:08 2011 From: josiah.carlson at gmail.com (Josiah Carlson) Date: Fri, 28 Jan 2011 16:54:08 -0800 Subject: [Python-Dev] PEP 393: Flexible String Representation In-Reply-To: <4D3DDE5E.4080807@v.loewis.de> References: <4D3DDE5E.4080807@v.loewis.de> Message-ID: 
                              
                              Pardon me for this drive-by posting, but this thread smells a lot like this old thread (don't be afraid to read it all, there are some good points in there; not directed at you Martin, but at all readers/posters in this thread)... http://mail.python.org/pipermail/python-3000/2006-September/003795.html 
                              
I'm not averse to faster and/or more memory efficient unicode representations (I would be quite happy with them, actually). I do see the usefulness of having non-utf-8 representations, and caching them is a good idea, though I wonder if that is a "good for Python itself to cache", or "good for the application to cache". The evil side of me says that we should just provide an API available in Python/C for "give me the representation of unicode string X using the 2byte/4byte code points", and have it just return the appropriate array.array() value (useful for passing to other APIs, or for those who need to do manual manipulation of code-points), or whatever structure is deemed to be appropriate. The less evil side of me says that going with what the PEP offers isn't a bad idea, and might just be a good idea. I'll defer my vote to Martin. Regards, - Josiah On Mon, Jan 24, 2011 at 12:17 PM, "Martin v. Löwis" 
                              
                              wrote: > I have been thinking about Unicode representation for some time now. > This was triggered, on the one hand, by discussions with Glyph Lefkowitz > (who complained that his server app consumes too much memory), and Carl > Friedrich Bolz (who profiled Python applications to determine that > Unicode strings are among the top consumers of memory in Python). > On the other hand, this was triggered by the discussion on supporting > surrogates in the library better. > > I'd like to propose PEP 393, which takes a different approach, > addressing both problems simultaneously: by getting a flexible > representation (one that can be either 1, 2, or 4 bytes), we can > support the full range of Unicode on all systems, but still use > only one byte per character for strings that are pure ASCII (which > will be the majority of strings for the majority of users). > > You'll find the PEP at > > http://www.python.org/dev/peps/pep-0393/ > > For convenience, I include it below. > > Regards, > Martin > > PEP: 393 > Title: Flexible String Representation > Version: $Revision: 88168 $ > Last-Modified: $Date: 2011-01-24 21:14:21 +0100 (Mo, 24. Jan 2011) $ > Author: Martin v. L?wis 
                              
                              > Status: Draft > Type: Standards Track > Content-Type: text/x-rst > Created: 24-Jan-2010 > Python-Version: 3.3 > Post-History: > > Abstract > ======== > > The Unicode string type is changed to support multiple internal > representations, depending on the character with the largest Unicode > ordinal (1, 2, or 4 bytes). This will allow a space-efficient > representation in common cases, but give access to full UCS-4 on all > systems. For compatibility with existing APIs, several representations > may exist in parallel; over time, this compatibility should be phased > out. > > Rationale > ========= > > There are two classes of complaints about the current implementation > of the unicode type: on systems only supporting UTF-16, users complain > that non-BMP characters are not properly supported. On systems using > UCS-4 internally (and also sometimes on systems using UCS-2), there is > a complaint that Unicode strings take up too much memory - especially > compared to Python 2.x, where the same code would often use ASCII > strings (i.e. ASCII-encoded byte strings). With the proposed approach, > ASCII-only Unicode strings will again use only one byte per character; > while still allowing efficient indexing of strings containing non-BMP > characters (as strings containing them will use 4 bytes per > character). > > One problem with the approach is support for existing applications > (e.g. extension modules). For compatibility, redundant representations > may be computed. Applications are encouraged to phase out reliance on > a specific internal representation if possible. As interaction with > other libraries will often require some sort of internal > representation, the specification choses UTF-8 as the recommended way > of exposing strings to C code. > > For many strings (e.g. ASCII), multiple representations may actually > share memory (e.g. the shortest form may be shared with the UTF-8 form > if all characters are ASCII). 
With such sharing, the overhead of
> compatibility representations is reduced.
>
> Specification
> =============
>
> The Unicode object structure is changed to this definition::
>
>   typedef struct {
>     PyObject_HEAD
>     Py_ssize_t length;
>     void *str;
>     Py_hash_t hash;
>     int state;
>     Py_ssize_t utf8_length;
>     void *utf8;
>     Py_ssize_t wstr_length;
>     void *wstr;
>   } PyUnicodeObject;
>
> These fields have the following interpretations:
>
> - length: number of code points in the string (result of sq_length)
> - str: shortest-form representation of the unicode string; the lower
>   two bits of the pointer indicate the specific form:
>   01 => 1 byte (Latin-1); 10 => 2 byte (UCS-2); 11 => 4 byte (UCS-4);
>   00 => null pointer
>
>   The string is null-terminated (in its respective representation).
> - hash, state: same as in Python 3.2
> - utf8_length, utf8: UTF-8 representation (null-terminated)
> - wstr_length, wstr: representation in platform's wchar_t
>   (null-terminated). If wchar_t is 16-bit, this form may use surrogate
>   pairs (in which case wstr_length differs from length).
>
> All three representations are optional, although the str form is
> considered the canonical representation which can be absent only
> while the string is being created.
>
> The Py_UNICODE type is still supported but deprecated. It is always
> defined as a typedef for wchar_t, so the wstr representation can double
> as Py_UNICODE representation.
>
> The str and utf8 pointers point to the same memory if the string uses
> only ASCII characters (using only Latin-1 is not sufficient). The str
> and wstr pointers point to the same memory if the string happens to
> fit exactly to the wchar_t type of the platform (i.e. uses some
> BMP-not-Latin-1 characters if sizeof(wchar_t) is 2, and uses some
> non-BMP characters if sizeof(wchar_t) is 4).
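The rules quoted above for picking the canonical form and for sharing memory can be sketched in Python. This is only an illustration of the draft's wording, not CPython code: the function names and the wchar_size parameter are made up for the example, and the real decision would be made in C when the string is created.

```python
def canonical_width(s: str) -> int:
    """Bytes per character of the shortest form that can hold every
    code point in s: 1 (Latin-1), 2 (UCS-2) or 4 (UCS-4)."""
    maxchar = max((ord(c) for c in s), default=0)
    if maxchar < 0x100:
        return 1
    if maxchar < 0x10000:
        return 2
    return 4


def str_shares_utf8(s: str) -> bool:
    """Per the draft, str and utf8 can share memory only for pure ASCII;
    Latin-1 alone is not sufficient because U+0080..U+00FF take two
    bytes in UTF-8."""
    return all(ord(c) < 0x80 for c in s)


def str_shares_wstr(s: str, wchar_size: int) -> bool:
    """Per the draft, str and wstr can share memory when the canonical
    width equals sizeof(wchar_t). wchar_size is a hypothetical stand-in
    for the platform's sizeof(wchar_t) (2 or 4)."""
    return canonical_width(s) == wchar_size
```

For instance, under these assumptions "caf\u00e9" needs one byte per character but cannot share its buffer with the UTF-8 form, while "\u20ac" (width 2) could share with wstr only on a platform with a 16-bit wchar_t.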
>
> If the string is created directly with the canonical representation
> (see below), this representation doesn't take a separate memory block,
> but is allocated right after the PyUnicodeObject struct.
>
> String Creation
> ---------------
>
> The recommended way to create a Unicode object is to use the function
> PyUnicode_New::
>
>   PyObject* PyUnicode_New(Py_ssize_t size, Py_UCS4 maxchar);
>
> Both parameters must denote the eventual size/range of the strings.
> In particular, codecs using this API must compute both the number of
> characters and the maximum character in advance. A string is
> allocated according to the specified size and character range and is
> null-terminated; the actual characters in it may be uninitialized.
>
> PyUnicode_FromString and PyUnicode_FromStringAndSize remain supported
> for processing UTF-8 input; the input is decoded, and the UTF-8
> representation is not yet set for the string.
>
> PyUnicode_FromUnicode remains supported but is deprecated. If the
> Py_UNICODE pointer is non-null, the str representation is set. If the
> pointer is NULL, a properly-sized wstr representation is allocated,
> which can be modified until PyUnicode_Finalize() is called (explicitly
> or implicitly). Resizing a Unicode string remains possible until it
> is finalized.
>
> PyUnicode_Finalize() converts a string containing only a wstr
> representation into the canonical representation. Unless wstr and str
> can share the memory, the wstr representation is discarded after the
> conversion.
>
> String Access
> -------------
>
> The canonical representation can be accessed using two macros
> PyUnicode_Kind and PyUnicode_Data. PyUnicode_Kind gives one of the
> values PyUnicode_1BYTE (1), PyUnicode_2BYTE (2), or PyUnicode_4BYTE
> (3). PyUnicode_Data gives the void pointer to the data, masking out
> the pointer kind. All these functions call PyUnicode_Finalize
> in case the canonical representation hasn't been computed yet.
> > A new function PyUnicode_AsUTF8 is provided to access the UTF-8 > representation. It is thus identical to the existing > _PyUnicode_AsString, which is removed. The function will compute the > utf8 representation when first called. Since this representation will > consume memory until the string object is released, applications > should use the existing PyUnicode_AsUTF8String where possible > (which generates a new string object every time). API that implicitly > converts a string to a char* (such as the ParseTuple functions) will > use this function to compute a conversion. > > PyUnicode_AsUnicode is deprecated; it computes the wstr representation > on first use. > > String Operations > ----------------- > > Various convenience functions will be provided to deal with the > canonical representation, in particular with respect to concatenation > and slicing. > > Stable ABI > ---------- > > None of the functions in this PEP become part of the stable ABI. > > Copyright > ========= > > This document has been placed in the public domain. > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > http://mail.python.org/mailman/options/python-dev/josiah.carlson%40gmail.com > -------------- next part -------------- An HTML attachment was scrubbed... URL: 
                              
                              From stefan_ml at behnel.de Sat Jan 29 07:33:54 2011 From: stefan_ml at behnel.de (Stefan Behnel) Date: Sat, 29 Jan 2011 07:33:54 +0100 Subject: [Python-Dev] PEP 393: Flexible String Representation In-Reply-To: <4D4339D4.5060906@v.loewis.de> References: <4D3DDE5E.4080807@v.loewis.de> 
                              
                              <821v3xl6aw.fsf@mid.bfk.de> 
                              
                              <4D4339D4.5060906@v.loewis.de> Message-ID: 
                              
                              "Martin v. L?wis", 28.01.2011 22:49: > And indeed, when Cython is updated to 3.3, it shouldn't access the UTF-8 > representation for such a loop. Instead, it should access the str > representation Sure. >> Regarding Cython specifically, the above will still be *possible* under >> the proposal, given that the memory layout of the strings will still >> represent the Unicode code points. It will just be trickier to implement >> in Cython's type system as there is no longer a (user visible) C type >> representation for those code units. > > There is: Py_UCS4 remains available. Thanks for that pointer. I had always thought that all "*UCS4*" names were platform specific and had completely missed that type. It's a lot nicer than Py_UNICODE because it allows users to fold surrogate pairs back into the character value. It's completely missing from the docs, BTW. Google doesn't give me a single mention for all of docs.python.org, even though it existed at least since (and likely long before) Cython's oldest supported runtime Python 2.3. If I had known about that type earlier, I could have ended up making that the native Unicode character type in Cython instead of bothering with Py_UNICODE. But this can still be changed I think. Since type inference was available before native Py_UNICODE support, it's unlikely that users will have Py_UNICODE written in their code explicitly. So I can make the switch under the hood. Just to explain, a native CPython C type is much better than an arbitrary integer type, because it allows Cython to apply specific coercion rules from and to Python object types. As currently Py_UNICODE, Py_UCS4 would obviously coerce from and to a 1 character Unicode string, but it could additionally handle surrogate pair splitting and combining automatically on current 16-bit Unicode builds so that you'd get a Unicode string with two code points on coercion to Python. 
>> While I'm somewhat confident that I'll >> find a way to fix this in Cython, my point is just that this adds a >> certain level of complexity to C code using the new memory layout that >> simply wasn't there before. > > Understood. However, I think it is easier than you think it is. Let's see about the implications once there is an implementation. Stefan From stefan_ml at behnel.de Sat Jan 29 08:47:38 2011 From: stefan_ml at behnel.de (Stefan Behnel) Date: Sat, 29 Jan 2011 08:47:38 +0100 Subject: [Python-Dev] PEP 393: Flexible String Representation In-Reply-To: <4D3DDE5E.4080807@v.loewis.de> References: <4D3DDE5E.4080807@v.loewis.de> Message-ID: 
                              
                              "Martin v. L?wis", 24.01.2011 21:17: > I have been thinking about Unicode representation for some time now. > This was triggered, on the one hand, by discussions with Glyph Lefkowitz > (who complained that his server app consumes too much memory), and Carl > Friedrich Bolz (who profiled Python applications to determine that > Unicode strings are among the top consumers of memory in Python). > On the other hand, this was triggered by the discussion on supporting > surrogates in the library better. > > I'd like to propose PEP 393, which takes a different approach, > addressing both problems simultaneously: by getting a flexible > representation (one that can be either 1, 2, or 4 bytes), we can > support the full range of Unicode on all systems, but still use > only one byte per character for strings that are pure ASCII (which > will be the majority of strings for the majority of users). > > You'll find the PEP at > > http://www.python.org/dev/peps/pep-0393/ > > [...] > Stable ABI > ---------- > > None of the functions in this PEP become part of the stable ABI. I think that's only part of the truth. This PEP can potentially have an impact on the stable ABI in the sense that the build-time size of Py_UNICODE may no longer be important for extensions that work on unicode buffers in the future as long as they only use the 'str' pointer and not 'wstr'. Stefan From stefan_ml at behnel.de Sat Jan 29 08:48:18 2011 From: stefan_ml at behnel.de (Stefan Behnel) Date: Sat, 29 Jan 2011 08:48:18 +0100 Subject: [Python-Dev] PEP 393: Flexible String Representation In-Reply-To: <4D3DDE5E.4080807@v.loewis.de> References: <4D3DDE5E.4080807@v.loewis.de> Message-ID: 
                              
                              "Martin v. L?wis", 24.01.2011 21:17: > I have been thinking about Unicode representation for some time now. > This was triggered, on the one hand, by discussions with Glyph Lefkowitz > (who complained that his server app consumes too much memory), and Carl > Friedrich Bolz (who profiled Python applications to determine that > Unicode strings are among the top consumers of memory in Python). > On the other hand, this was triggered by the discussion on supporting > surrogates in the library better. > > I'd like to propose PEP 393, which takes a different approach, > addressing both problems simultaneously: by getting a flexible > representation (one that can be either 1, 2, or 4 bytes), we can > support the full range of Unicode on all systems, but still use > only one byte per character for strings that are pure ASCII (which > will be the majority of strings for the majority of users). > > You'll find the PEP at > > http://www.python.org/dev/peps/pep-0393/ After much discussion, I'm +1 for this PEP. Implementation and benchmarks are pending, but there are strong indicators that it will bring relief for the memory overhead of most applications without leading to a major degradation performance-wise. Not for Python code anyway, and I'll try to make sure Cython extensions won't notice much when switching to CPython 3.3. Martin, this is a smart way of doing it. Stefan From martin at v.loewis.de Sat Jan 29 10:05:59 2011 From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=) Date: Sat, 29 Jan 2011 10:05:59 +0100 Subject: [Python-Dev] PEP 393: Flexible String Representation In-Reply-To: 
                              
                              References: <4D3DDE5E.4080807@v.loewis.de> 
                              
                              Message-ID: <4D43D877.6090701@v.loewis.de> >> None of the functions in this PEP become part of the stable ABI. > > I think that's only part of the truth. This PEP can potentially have an > impact on the stable ABI in the sense that the build-time size of > Py_UNICODE may no longer be important for extensions that work on > unicode buffers in the future as long as they only use the 'str' pointer > and not 'wstr'. Py_UNICODE isn't part of the stable ABI, so it wasn't important for extensions using the stable ABI before - so really no change here. Regards, Martin From stefan_ml at behnel.de Sat Jan 29 11:00:48 2011 From: stefan_ml at behnel.de (Stefan Behnel) Date: Sat, 29 Jan 2011 11:00:48 +0100 Subject: [Python-Dev] PEP 393: Flexible String Representation In-Reply-To: <4D43D877.6090701@v.loewis.de> References: <4D3DDE5E.4080807@v.loewis.de> 
                              
                              <4D43D877.6090701@v.loewis.de> Message-ID: 
                              
                              "Martin v. L?wis", 29.01.2011 10:05: >>> None of the functions in this PEP become part of the stable ABI. >> >> I think that's only part of the truth. This PEP can potentially have an >> impact on the stable ABI in the sense that the build-time size of >> Py_UNICODE may no longer be important for extensions that work on >> unicode buffers in the future as long as they only use the 'str' pointer >> and not 'wstr'. > > Py_UNICODE isn't part of the stable ABI, so it wasn't important for > extensions using the stable ABI before - so really no change here. I know, that's not what I meant. But this PEP would enable a C API that provides direct access to the underlying buffer. Just as is currently provided for the Py_UNICODE array, but with a stable ABI because the buffer type won't change based on build time options. OTOH, one could argue that this is already partly provided by the generic buffer API. Stefan From ncoghlan at gmail.com Sat Jan 29 13:49:01 2011 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 29 Jan 2011 22:49:01 +1000 Subject: [Python-Dev] fcmp() in test.support In-Reply-To: <4D431A34.9030401@voidspace.org.uk> References: 
                              
                              <4D431A34.9030401@voidspace.org.uk> Message-ID: 
                              
                              On Sat, Jan 29, 2011 at 5:34 AM, Michael Foord 
                              
wrote:
> This module shouldn't really be documented at all. It exists to support the
> Python test framework and we don't want to have to support users or make API
> stability guarantees. Plus most of it is rather old. Please don't document
> more stuff in this module.

As Eli noted, we explicitly disclaim all stability guarantees when it comes to this module. Documenting it properly is intended to make it easier to write high quality tests using the utilities we have developed over the years without having to read the source code.

Cheers,
Nick.

-- 
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From ncoghlan at gmail.com Sat Jan 29 13:53:44 2011
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sat, 29 Jan 2011 22:53:44 +1000
Subject: [Python-Dev] PEP 393: Flexible String Representation
In-Reply-To: 
                              
                              References: <4D3DDE5E.4080807@v.loewis.de> 
                              
                              <4D43D877.6090701@v.loewis.de> 
                              
                              Message-ID: 
                              
                              On Sat, Jan 29, 2011 at 8:00 PM, Stefan Behnel 
                              
wrote:
> OTOH, one could argue that this is already partly provided by the generic
> buffer API.

Which won't be part of the stable ABI until 3.3 - there are some discrepancies between PEP 3118 and the actual implementation that we need to sort out first.

Cheers,
Nick.

-- 
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From solipsis at pitrou.net Sat Jan 29 14:21:14 2011
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Sat, 29 Jan 2011 14:21:14 +0100
Subject: [Python-Dev] PEP 393: Flexible String Representation
References: <4D3DDE5E.4080807@v.loewis.de> 
                              
                              <4D43D877.6090701@v.loewis.de> 
                              
                              Message-ID: <20110129142114.0c6d04c1@pitrou.net> On Sat, 29 Jan 2011 11:00:48 +0100 Stefan Behnel 
                              
                              wrote: > > I know, that's not what I meant. But this PEP would enable a C API that > provides direct access to the underlying buffer. Just as is currently > provided for the Py_UNICODE array, but with a stable ABI because the buffer > type won't change based on build time options. > > OTOH, one could argue that this is already partly provided by the generic > buffer API. Unicode objects don't provide the buffer API (and chances are they never will). Regards Antoine. From stefan_ml at behnel.de Sat Jan 29 18:03:23 2011 From: stefan_ml at behnel.de (Stefan Behnel) Date: Sat, 29 Jan 2011 18:03:23 +0100 Subject: [Python-Dev] PEP 393: Flexible String Representation In-Reply-To: <4D3DDE5E.4080807@v.loewis.de> References: <4D3DDE5E.4080807@v.loewis.de> Message-ID: 
                              
                              "Martin v. L?wis", 24.01.2011 21:17: > I'd like to propose PEP 393, which takes a different approach, > addressing both problems simultaneously: by getting a flexible > representation (one that can be either 1, 2, or 4 bytes), we can > support the full range of Unicode on all systems, but still use > only one byte per character for strings that are pure ASCII (which > will be the majority of strings for the majority of users). > > You'll find the PEP at > > http://www.python.org/dev/peps/pep-0393/ >[...] > The Py_UNICODE type is still supported but deprecated. It is always > defined as a typedef for wchar_t, so the wstr representation can double > as Py_UNICODE representation. What about the character property functions? http://docs.python.org/py3k/c-api/unicode.html#unicode-character-properties Will they be adapted to accept Py_UCS4 instead of Py_UNICODE? Stefan From alexander.belopolsky at gmail.com Sat Jan 29 18:12:19 2011 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Sat, 29 Jan 2011 12:12:19 -0500 Subject: [Python-Dev] PEP 393: Flexible String Representation In-Reply-To: 
                              
                              References: <4D3DDE5E.4080807@v.loewis.de> 
                              
                              Message-ID: 
                              
                              On Sat, Jan 29, 2011 at 12:03 PM, Stefan Behnel 
                              
                              wrote: .. > What about the character property functions? > > http://docs.python.org/py3k/c-api/unicode.html#unicode-character-properties > > Will they be adapted to accept Py_UCS4 instead of Py_UNICODE? They have been already. See revision 84177. Docs should be fixed. From victor.stinner at haypocalc.com Sun Jan 30 09:56:18 2011 From: victor.stinner at haypocalc.com (Victor Stinner) Date: Sun, 30 Jan 2011 09:56:18 +0100 Subject: [Python-Dev] Issue #11051: system calls per import Message-ID: <1296377778.24415.4.camel@marge> Hi, Antoine Pitrou noticed that Python 3.2 tries a lot of filenames to load a module: http://bugs.python.org/issue11051 Python 3.1 does already test many filenames, but with Python 3.2, it is even worse. For each directory in sys.path, it tries 9 suffixes: '', '.cpython-32m.so', 'module.cpython-32m.so', '.abi3.so', 'module.abi3.so', '.so', 'module.so', '.py', '.pyc'. I don't understand why it tests so much .so suffixes. And why it does test with and without "module". Victor From martin at v.loewis.de Sun Jan 30 10:40:52 2011 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sun, 30 Jan 2011 10:40:52 +0100 Subject: [Python-Dev] Issue #11051: system calls per import In-Reply-To: <1296377778.24415.4.camel@marge> References: <1296377778.24415.4.camel@marge> Message-ID: <4D453224.7020809@v.loewis.de> > Python 3.1 does already test many filenames, but with Python 3.2, it is > even worse. > > For each directory in sys.path, it tries 9 suffixes: '', > '.cpython-32m.so', 'module.cpython-32m.so', '.abi3.so', > 'module.abi3.so', '.so', 'module.so', '.py', '.pyc'. > > I don't understand why it tests so much .so suffixes. And why it does > test with and without "module". The many extensions have been specified in PEP 3149. The PEP also specifies # This "tag" will appear between the module base name and the operation # file system extension for shared libraries. 
which apparently meant that the existing mechanism is extended to add the tag. The support for both the "short extension" (i.e. ".so") and "long extension" (i.e. "module.so") goes back to r4297 (Python 1.1), when the short extension was added as an alternative to the long extension. The original module suffix was defined in r3518 when dynamic extension modules got supported, as either "module.so" (SUN_SHLIB) or "module.o" (dl_loadmod, apparently Irix). Regards, Martin From georg at python.org Sun Jan 30 10:25:05 2011 From: georg at python.org (Georg Brandl) Date: Sun, 30 Jan 2011 10:25:05 +0100 Subject: [Python-Dev] Issue #11051: system calls per import In-Reply-To: <1296377778.24415.4.camel@marge> References: <1296377778.24415.4.camel@marge> Message-ID: <4D452E71.6070401@python.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Am 30.01.2011 09:56, schrieb Victor Stinner: > Hi, > > Antoine Pitrou noticed that Python 3.2 tries a lot of filenames to load > a module: > http://bugs.python.org/issue11051 > > Python 3.1 does already test many filenames, but with Python 3.2, it is > even worse. > > For each directory in sys.path, it tries 9 suffixes: '', > '.cpython-32m.so', 'module.cpython-32m.so', '.abi3.so', > 'module.abi3.so', '.so', 'module.so', '.py', '.pyc'. '' is not really a suffix, but a test for a package directory. > I don't understand why it tests so much .so suffixes. Because of PEP 3149 and PEP 384. > And why it does test with and without "module". Because it always did (there's a thing called backwards compatibility.) This is of course probably the obvious one to start a deprecation process. 
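
[Editorial illustration, not part of the original message.] To make the cost discussed in this thread concrete, the sketch below enumerates the filenames that would be probed for one module, using the nine suffixes Victor listed. The helper name is hypothetical, and the probing order simply follows his list; the real import machinery may order or short-circuit the checks differently.

```python
import os

# The nine suffixes quoted in the thread for Python 3.2; '' is the
# package-directory check rather than a real suffix.
SUFFIXES = [
    '',
    '.cpython-32m.so', 'module.cpython-32m.so',
    '.abi3.so', 'module.abi3.so',
    '.so', 'module.so',
    '.py', '.pyc',
]


def candidate_paths(module_name, search_path):
    """Yield every path that would be tried for `module_name` (hypothetical helper)."""
    for directory in search_path:
        for suffix in SUFFIXES:
            yield os.path.join(directory, module_name + suffix)
```

For a module that lives in the last of N directories on the path, this gives up to 9 * N probes, which is the figure the discussion is about.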
Georg -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.17 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iEYEARECAAYFAk1FLnEACgkQN9GcIYhpnLApaACdGDe9qVlZNVHRF92yTqYnYFIp hjIAn34YqvMy8fy7pcz0qAlS/WhRWR4G =1b9C -----END PGP SIGNATURE----- From ncoghlan at gmail.com Sun Jan 30 13:52:25 2011 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 30 Jan 2011 22:52:25 +1000 Subject: [Python-Dev] Issue #11051: system calls per import In-Reply-To: <4D452E71.6070401@python.org> References: <1296377778.24415.4.camel@marge> <4D452E71.6070401@python.org> Message-ID: 
                              
                              On Sun, Jan 30, 2011 at 7:25 PM, Georg Brandl 
                              
wrote:
>> And why it does test with and without "module".
>
> Because it always did (there's a thing called backwards compatibility.)
>
> This is of course probably the obvious one to start a deprecation process.

But why do we check the long suffix for the *new* extension module naming variants from PEP 3149 and PEP 384? Those are completely new, so there's no backwards compatibility argument there.

Cheers,
Nick.

-- 
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From victor.stinner at haypocalc.com Sun Jan 30 17:35:45 2011
From: victor.stinner at haypocalc.com (Victor Stinner)
Date: Sun, 30 Jan 2011 17:35:45 +0100
Subject: [Python-Dev] Issue #11051: system calls per import
In-Reply-To: 
                              
                              References: <1296377778.24415.4.camel@marge> <4D452E71.6070401@python.org> 
                              
Message-ID: <1296405345.24507.9.camel@marge>

Le dimanche 30 janvier 2011 à 22:52 +1000, Nick Coghlan a écrit :
> On Sun, Jan 30, 2011 at 7:25 PM, Georg Brandl 
                              
                              wrote: > >> And why it does test with and without "module". > > > > Because it always did (there's a thing called backwards compatibility.) > > > > This is of course probably the obvious one to start a deprecation process. > > But why do we check the long suffix for the *new* extension module > naming variants from PEP 3149 and PEP 384? Those are completely new, > so there's no backwards compatibility argument there. My implicit question was: can we limit the number of tested suffixes? I see two candidates: remove 'module.cpython-32m.so' ('.cpython-32m.so' should be enough) and 'module.abi3.so' ('.abi3.so' should be enough). And the real question is: should we change that before 3.2 final? If we don't change that in 3.2, it will be harder to change it later (but it is still possible). Limit the number of suffixes is maybe not the right solution to limit the number of system calls at startup. We can imagine alternatives: * remember the last filename when loading a module and retry this filename first * specify that a module is a Python system module and should only be loaded from "system directories" * specify the module type (directory, .py file, dynamic library, ...) when loading a module * (or a least remember the module type and retry this type first) * etc. We should find a compromise between speed (limit the number of system calls) and the usability of Python modules. Victor From georg at python.org Sun Jan 30 17:50:32 2011 From: georg at python.org (Georg Brandl) Date: Sun, 30 Jan 2011 17:50:32 +0100 Subject: [Python-Dev] Issue #11051: system calls per import In-Reply-To: <1296405345.24507.9.camel@marge> References: <1296377778.24415.4.camel@marge> <4D452E71.6070401@python.org> 
                              
                              <1296405345.24507.9.camel@marge> Message-ID: <4D4596D8.8040908@python.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Am 30.01.2011 17:35, schrieb Victor Stinner: > Le dimanche 30 janvier 2011 ? 22:52 +1000, Nick Coghlan a ?crit : >> On Sun, Jan 30, 2011 at 7:25 PM, Georg Brandl 
                              
                              wrote: >>>> And why it does test with and without "module". >>> >>> Because it always did (there's a thing called backwards compatibility.) >>> >>> This is of course probably the obvious one to start a deprecation process. >> >> But why do we check the long suffix for the *new* extension module >> naming variants from PEP 3149 and PEP 384? Those are completely new, >> so there's no backwards compatibility argument there. > > My implicit question was: can we limit the number of tested suffixes? I > see two candidates: remove 'module.cpython-32m.so' ('.cpython-32m.so' > should be enough) and 'module.abi3.so' ('.abi3.so' should be enough). > > And the real question is: should we change that before 3.2 final? We most definitely shouldn't. Georg -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.17 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iEYEARECAAYFAk1FltgACgkQN9GcIYhpnLDquwCfZH+jtM6nsXz4Iyi2XrhpDKBH +6IAnA4Be/CWQhiQ9hq1VqGH2ent7say =e1d5 -----END PGP SIGNATURE----- From alexander.belopolsky at gmail.com Sun Jan 30 17:54:51 2011 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Sun, 30 Jan 2011 11:54:51 -0500 Subject: [Python-Dev] Issue #11051: system calls per import In-Reply-To: <1296405345.24507.9.camel@marge> References: <1296377778.24415.4.camel@marge> <4D452E71.6070401@python.org> 
                              
                              <1296405345.24507.9.camel@marge> Message-ID: 
                              
                              On Sun, Jan 30, 2011 at 11:35 AM, Victor Stinner 
                              
                              wrote: .. > We should find a compromise between speed (limit the number of system > calls) and the usability of Python modules. Do you have measurements that show python spending significant time on failing open calls? From p.f.moore at gmail.com Sun Jan 30 20:37:40 2011 From: p.f.moore at gmail.com (Paul Moore) Date: Sun, 30 Jan 2011 19:37:40 +0000 Subject: [Python-Dev] Stable buildbots In-Reply-To: 
                              
                              References: <20101113133712.60e9be27@pitrou.net> 
                              
                              
                              
                              <4CEB7E12.1070201@snakebite.org> 
                              
                              Message-ID: 
                              
                              On 23 November 2010 23:18, David Bolen 
                              
                              wrote: > Trent Nelson 
                              
writes:
>
>> That's interesting.  (That kill_python.exe doesn't kill the wedged
>> processes, but pskill does.)  kill_python is pretty simple, it just
>> calls TerminateProcess() after acquiring a handle with the relevant
>> PROCESS_TERMINATE access right.  (...)
>>
>> Are you calling pskill with the -t flag? i.e. kill process and all
>> dependents?  That might be the ticket, especially if killing the child
>> process that wedged select() is waiting on causes it to return, and
>> thus, makes it killable.
>
> Nope, just "pskill python_d".  Haven't bothered to check the pskill
> source but I'm assuming it's just a basic TerminateProcess. Ideally my
> quickest workaround would just be to replace the kill_python in the
> buildbot tools script with that command but of course they could get
> updated on checkouts and I'm not arguing it's generally appropriate enough
> to belong in the source.

After a long, long time (:-(), I'm finally getting a chance to look at this. I've patched buildbot as mentioned earlier in the thread, but I don't see where I should put the pskill command to make it work.

At the moment, I have scheduled tasks to pskill python_d and vsjitdebugger. The python_d one runs daily and the debugger one hourly. (I daren't kill python_d too often, or I'll start killing in-progress tests, I assume). The vsjitdebugger one is there because I think it solves the CRT popup issue (I'll add the autoit script as well, but as I'm running as a service, I'm not sure the popup will always be visible for the autoit script to pick up...)

Presumably, you're inserting a pskill command somewhere into the actual build process. I don't know much about buildbot, but I thought that was controlled by the master and/or the Python build scripts, neither of which I can change.

If I want to add a pskill command just after a build/test has run (i.e., about where kill_python runs at the moment) how do I do that?

Thanks,
Paul.
From martin at v.loewis.de Sun Jan 30 20:43:57 2011 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sun, 30 Jan 2011 20:43:57 +0100 Subject: [Python-Dev] Issue #11051: system calls per import In-Reply-To: 
                              
                              References: <1296377778.24415.4.camel@marge> <4D452E71.6070401@python.org> 
                              
                              <1296405345.24507.9.camel@marge> 
                              
                              Message-ID: <4D45BF7D.70405@v.loewis.de> Am 30.01.2011 17:54, schrieb Alexander Belopolsky: > On Sun, Jan 30, 2011 at 11:35 AM, Victor Stinner > 
                              
                              wrote: > .. >> We should find a compromise between speed (limit the number of system >> calls) and the usability of Python modules. > > Do you have measurements that show python spending significant time on > failing open calls? No; past measurements always showed that this is insignificant, probably thanks to operating system caching the relevant directory blocks (so it doesn't really matter whether you make one or ten lookups per directory; my guess is that it matters more if you look into ten directories instead of one). Regards, Martin From db3l.net at gmail.com Sun Jan 30 21:50:36 2011 From: db3l.net at gmail.com (David Bolen) Date: Sun, 30 Jan 2011 15:50:36 -0500 Subject: [Python-Dev] Stable buildbots References: <20101113133712.60e9be27@pitrou.net> 
                              
                              
                              
                              <4CEB7E12.1070201@snakebite.org> 
                              
                              
                              Message-ID: 
                              
                              Paul Moore 
                              
                              writes: > Presumably, you're inserting a pskill command somewhere into the > actual build process. I don't know much about buildbot, but I thought > that was controlled by the master and/or the Python build scripts, > neither of which I can change. > > If I want to add a pskill command just after a build/test has run > (i.e., about where kill_python runs at the moment) how do I do that? I haven't been able to - as you say there's no good way to hook into the build process in real time as the changes have to be external or they'll get zapped on the next checkout. I suppose you could rapidly try to monitor the output of the build slave log file, but then you risk killing a process from a next step if you miss something or are too slow. And I've had cases (after long periods of continuous runtime) where the build slave log stops being generated even while the slave is running fine. Anyway, in the absence of changes to the build tree, I finally gave up and now run an external script (see below) that whacks any python_d process it finds running for more than 2 hours (arbitrary choice). I considered trying to dig deeper to identify processes with no logical test parent (more similar to the build kill_python itself), but decided it was too much effort for the minimal extra gain. So not terribly different from your once a day pskill, though as you say if you arbitrarily kill all python_d processes at any given point in time, you risk interrupting an active test. So the AutoIt script covers pop-ups and the script below cleans up hung processes. On the subject of pop-ups, I'm not sure but if you find your service not showing them try enabling the "Allow service to interact with the desktop" option in the service definition. In my experience though if a service can't perform a UI interaction, the interaction just fails, so I wouldn't expect the process to get stuck in that case. 
Anyway, in my case the kill script itself is Cygwin/bash based, but using the pstools tools, and conceptually just kills (pskill) any python_d process identified as having been running for 2 or more hours of wall time (via pslist):

- - - - - - - - - - - - - - - - - - - - - - - - -
#!/bin/sh
#
# kill_python.sh
#
# Quick 'n dirty script to watch for python_d processes that exceed a few
# hours of runtime, then kill them assuming they're hung
#

PROC="python_d"
TIMEOUT="2"

while [ 1 ]; do
    echo "`date` Checking..."
    PIDS=`pslist 2>&1 | grep "^$PROC" | awk -v TIMEOUT=$TIMEOUT '{split($NF,fields,":"); if (int(fields[1]) >= int(TIMEOUT)) {print $2}}'`
    if [ "$PIDS" ]; then
        echo ===== `date`
        for pid in $PIDS; do
            pslist $pid 2>&1 | grep "^$PROC"
            pskill $pid
        done
        echo =====
    fi
    sleep 300
done
- - - - - - - - - - - - - - - - - - - - - - - - -

It's a kludge, but as you say, for us to impose this on the build slave side requires it to be outside of the build tree. I've been running it for about a month now and it seems to be doing the job.

I run a similar script on OSX (my Tiger slave also sometimes sees stuck processes, though they just burn CPU rather than interfere with tests), but there I can identify stranded python_d processes if they are owned by init, so the script can react more quickly.

I'm pretty sure the best long term fix is to move the kill processing into the clean script (as per issue 9973) rather than where it currently is in the build script, but so far I don't think the idea has been able to attract the interest of anyone who can actually commit such a change. (See also the Dec continuation of this thread - http://www.mail-archive.com/python-dev at python.org/msg54389.html)

I had also created issue 10641 from when I thought I found a problem with kill_python, but that turned out incorrect, and in subsequent tests kill_python in the build tree always worked, so the core issue seems to always be the failure to run it at all as opposed to it not working.
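
[Editorial illustration, not part of the original message.] The awk filter in the script above takes the last whitespace-separated field of each matching line, reads the part before the first ":" as elapsed hours, and prints the second field (the PID) when the hours reach the timeout. A minimal Python sketch of the same selection logic follows. The function name is hypothetical, and the only assumption about the listing is the one the script itself makes: process name first, PID second, elapsed time last.

```python
def stale_pids(listing_lines, proc="python_d", timeout_hours=2):
    """Return PIDs of `proc` entries whose elapsed-hours field >= timeout_hours.

    Mirrors the grep/awk pipeline of the shell script: match lines that
    start with the process name, then compare the hours part of the last
    field against the timeout.
    """
    pids = []
    for line in listing_lines:
        if not line.startswith(proc):
            continue                      # same as: grep "^$PROC"
        fields = line.split()
        if len(fields) < 2:
            continue                      # no PID column to report
        hours_text = fields[-1].split(":")[0]
        # awk's int() treats non-numeric text as 0; do the same here.
        hours = int(hours_text) if hours_text.isdigit() else 0
        if hours >= timeout_hours:
            pids.append(fields[1])
    return pids
```

Keeping the selection in a pure function like this would make the two-hour rule easy to check without touching any live processes.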
For now though, these two external "monitors" seem to have helped contain the number of manual operations I have to do on my two Windows slaves. (Though recently I've begun seeing two new sorts of pop-ups under Windows 7 but both related to memory, so I think I just need to give my VM a little more memory) -- David From solipsis at pitrou.net Sun Jan 30 22:17:24 2011 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sun, 30 Jan 2011 22:17:24 +0100 Subject: [Python-Dev] Stable buildbots References: <20101113133712.60e9be27@pitrou.net> 
                              <4CEB7E12.1070201@snakebite.org> 
Message-ID: <20110130221724.40e9cb4d@pitrou.net> Hello, > I'm pretty sure the best long term fix is to move the kill processing > into the clean script (as per issue 9973) rather than where it > currently is in the build script, but so far I don't think the idea > has been able to attract the interest of anyone who can actually > commit such a change. Thanks for bringing this to my attention. I've added a comment on that issue. If you say this should improve things, there's probably no reason not to commit such a patch. Regards Antoine. From p.f.moore at gmail.com Sun Jan 30 22:46:25 2011 From: p.f.moore at gmail.com (Paul Moore) Date: Sun, 30 Jan 2011 21:46:25 +0000 Subject: [Python-Dev] Stable buildbots In-Reply-To: 
                              
                              References: <20101113133712.60e9be27@pitrou.net> 
                              <4CEB7E12.1070201@snakebite.org> 
                              Message-ID: 
                              
On 30 January 2011 20:50, David Bolen wrote: > I haven't been able to - as you say there's no good way to hook into > the build process in real time as the changes have to be external or > they'll get zapped on the next checkout. I suppose you could rapidly > try to monitor the output of the build slave log file, but then you > risk killing a process from a next step if you miss something or are > too slow. And I've had cases (after long periods of continuous > runtime) where the build slave log stops being generated even while > the slave is running fine. OK, sounds like I hadn't missed anything, then, which is good in some sense :-) > For now though, these two external "monitors" seem to have helped > contain the number of manual operations I have to do on my two Windows > slaves. (Though recently I've begun seeing two new sorts of pop-ups > under Windows 7 but both related to memory, so I think I just need to > give my VM a little more memory) Yes, my (somewhat more simplistic) kill scripts had done some good as well. Having said that, http://bugs.python.org/issue9931 is currently stopping my buildslave (at least if I run it as a service), so it's a bit of a moot point at the moment... (One thing that might be good is if there were a means in the buildslave architecture to deliberately disable a test temporarily, if it's known to fail - I know ignoring errors isn't a good thing in general, but OTOH, having a slave effectively dead for months because of a known issue isn't a lot of help, either :-() Thanks for the reply. Paul. From greg.ewing at canterbury.ac.nz Sun Jan 30 22:23:45 2011 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Mon, 31 Jan 2011 10:23:45 +1300 Subject: [Python-Dev] Issue #11051: system calls per import In-Reply-To: <1296405345.24507.9.camel@marge> References: <1296377778.24415.4.camel@marge> <4D452E71.6070401@python.org> 
                              
                              <1296405345.24507.9.camel@marge> Message-ID: <4D45D6E1.6030906@canterbury.ac.nz> Victor Stinner wrote: > Limit the number of suffixes is maybe not the right solution to limit > the number of system calls at startup. We can imagine alternatives: > > * remember the last filename when loading a module and retry this > filename first > * specify that a module is a Python system module and should only be > loaded from "system directories" > * specify the module type (directory, .py file, dynamic library, ...) > when loading a module > * (or a least remember the module type and retry this type first) > * etc. Maybe also * Read and cache the directory contents and search it ourselves instead of making a system call for every possible name. -- Greg From raymond.hettinger at gmail.com Mon Jan 31 05:26:31 2011 From: raymond.hettinger at gmail.com (Raymond Hettinger) Date: Sun, 30 Jan 2011 20:26:31 -0800 Subject: [Python-Dev] [Python-checkins] r88273 - python/branches/py3k/Doc/whatsnew/3.2.rst In-Reply-To: <20110131042140.E8C47EE991@mail.python.org> References: <20110131042140.E8C47EE991@mail.python.org> Message-ID: 
                              
                              On Jan 30, 2011, at 8:21 PM, eli.bendersky wrote: Please use the open tracker item and do not edit the document directly. Raymond From martin at v.loewis.de Mon Jan 31 08:33:13 2011 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Mon, 31 Jan 2011 08:33:13 +0100 Subject: [Python-Dev] Issue #11051: system calls per import In-Reply-To: <4D45D6E1.6030906@canterbury.ac.nz> References: <1296377778.24415.4.camel@marge> <4D452E71.6070401@python.org> 
                              
                              <1296405345.24507.9.camel@marge> <4D45D6E1.6030906@canterbury.ac.nz> Message-ID: <4D4665B9.9000108@v.loewis.de> > Maybe also > > * Read and cache the directory contents and search it ourselves > instead of making a system call for every possible name. I wouldn't do that - I would expect that this is actually slower than making the system calls, because the system might get away with not reading the entire directory (whereas it will have to when we explicitly ask for that). Regards, Martin From guido at python.org Mon Jan 31 09:08:25 2011 From: guido at python.org (Guido van Rossum) Date: Mon, 31 Jan 2011 00:08:25 -0800 Subject: [Python-Dev] Issue #11051: system calls per import In-Reply-To: <4D4665B9.9000108@v.loewis.de> References: <1296377778.24415.4.camel@marge> <4D452E71.6070401@python.org> 
                              
                              <1296405345.24507.9.camel@marge> <4D45D6E1.6030906@canterbury.ac.nz> <4D4665B9.9000108@v.loewis.de> Message-ID: 
                              
On Sun, Jan 30, 2011 at 11:33 PM, "Martin v. Löwis" wrote: >> Maybe also >> >>     * Read and cache the directory contents and search it ourselves >>       instead of making a system call for every possible name. > > I wouldn't do that - I would expect that this is actually slower than > making the system calls, because the system might get away with not > reading the entire directory (whereas it will have to when we explicitly > ask for that). Hm. Long (very long) ago I had to implement just that, and it was much faster. But this was over NFS. Still, I think the directory would have to be truly enormous before reading its contents (which doesn't access all the inodes) is slower than statting a few dozen of its entries. At least on most *nix filesystems. Another thing to consider: on App Engine (which despite all its architectural weirdness uses a -- mostly -- standard Linux filesystem for the Python code of the app) someone measured that importing from a zipfile is much faster than importing from the filesystem. I would imagine this extends to other contexts too, and it makes sense because the zipfile directory gets cached in memory so no stat() calls are necessary. (Basically I am biased to believe that stat() is a pretty slow system call -- this may just be old NFS lore though.) -- --Guido van Rossum (python.org/~guido) From j.bos-interpay at xs4all.nl Mon Jan 31 10:17:51 2011 From: j.bos-interpay at xs4all.nl (Jurjen N.E. Bos) Date: Mon, 31 Jan 2011 10:17:51 +0100 Subject: [Python-Dev] Byte code arguments from two to one byte: did anyone try this? Message-ID: 
                              
I tried to find any research on this subject, but I couldn't find any, so I'll be daring and vulnerable and just try it out to see what your thoughts are.

I single stepped a simple loop in Python to see where the efficiency bottlenecks are. I was impressed by the optimizations already in there, but I still dare to suggest an optimization that from my estimates might shave off a few cycles, speeding up Python about 5%.

The idea is simple: change the byte code argument values from two bytes to one. Implications are:
- code changes are relatively simple, see below
- fewer memory reads, which are becoming more and more expensive
- saves three instructions for every opcode with args (i.e. most of them)

Code changes are, as far as I could find:
compile.c:
    assemble_emit must produce extended opcodes for all cases of more than 8 bits instead of 16
ceval.c:
    NEXTARG and PEEKARG need adjustment
    EXTENDED_ARG needs adjustment (this will be a four byte instruction, which is ugly, I agree)
peephole.c:
    GETARG, SETARG, need adjustment
    also GETJUMPTGT, CODESIZE
    routine tuple_of_constants, fold_binops_on_constants, PyCode_Optimize are dependent on instruction length, which will be 2 instead of 3 (search for the digit 3 will find all cases, as far as I checked)
    you probably will have to write a macro for codestr[i+3]
    there is a check for code length >32700, but I think this one might stay, maybe if a few extra checks are added.
dis:
    minor adjustments

Estimation of speed impact: about 80% of the instructions seem to have an argument, and I never saw an opcode >255 while looking at bytecode, so they are probably not frequent. The NEXTARG macro expands on my Macbook to:
    mov -408(%ebp),%edx     (next_instr)
    movzbl 2(%edx),%eax     (*second byte)
    shl $0x8,%eax           (*shift)
    movzbl 1(%edx),%edx     (first byte)
    add %edx,%eax           (*combine)
and the starred instructions will vanish. The main loop is approximately 40 instructions, so a saving of three instructions is significant.
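
A minimal sketch of what the proposal changes, written in Python rather than C for brevity. The function names and the opcode value are hypothetical; the two-byte form assumes the low byte of the argument is stored first, as the assembly listing above suggests (byte 1 is added unshifted, byte 2 is shifted left by 8).

```python
def fetch_two_byte_arg(code, i):
    """Current layout: opcode at code[i], 16-bit argument in the next two bytes."""
    arg = code[i + 1] + (code[i + 2] << 8)  # two loads, a shift and an add
    return arg, i + 3                       # the instruction is 3 bytes long


def fetch_one_byte_arg(code, i):
    """Proposed layout: opcode at code[i], 8-bit argument in the next byte."""
    return code[i + 1], i + 2               # one load, the instruction is 2 bytes long


OP = 100  # arbitrary opcode value, for illustration only
old_style = bytes([OP, 5, 0])
new_style = bytes([OP, 5])
print(fetch_two_byte_arg(old_style, 0))
print(fetch_one_byte_arg(new_style, 0))
```

Both calls decode the same argument value; the difference is the number of byte loads and the instruction length, which is what the list of code changes above has to account for.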
I don't dare to claim 3/40 = 7.5% savings, but I think 5% may be realistic. Did anyone try this already? If not, I might take up the gauntlet and try it myself, but I never did this before... - Jurjen PS I also saw that some scratch variables, mainly v and x, are carefully stored back in memory by the compiler at the end of the big interpreter loop, while their value isn't used anymore, of course. A few carefully placed braces might tell the compiler how useless this is and save another few percent. From stefan_ml at behnel.de Mon Jan 31 11:10:13 2011 From: stefan_ml at behnel.de (Stefan Behnel) Date: Mon, 31 Jan 2011 11:10:13 +0100 Subject: [Python-Dev] Byte code arguments from two to one byte: did anyone try this? In-Reply-To: 
                              
                              References: 
                              
                              Message-ID: 
                              
                              Jurjen N.E. Bos, 31.01.2011 10:17: > I single stepped a simple loop in Python to see where the efficiency > bottlenecks are. What version of CPython did you try that with? The latest py3k branch? Stefan From georg at python.org Mon Jan 31 11:32:02 2011 From: georg at python.org (Georg Brandl) Date: Mon, 31 Jan 2011 11:32:02 +0100 Subject: [Python-Dev] [RELEASED] Python 3.2 rc 2 Message-ID: <4D468FA2.4040704@python.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On behalf of the Python development team, I'm quite happy to announce the second release candidate of Python 3.2. Python 3.2 is a continuation of the efforts to improve and stabilize the Python 3.x line. Since the final release of Python 2.7, the 2.x line will only receive bugfixes, and new features are developed for 3.x only. Since PEP 3003, the Moratorium on Language Changes, is in effect, there are no changes in Python's syntax and built-in types in Python 3.2. Development efforts concentrated on the standard library and support for porting code to Python 3. 
Highlights are:

* numerous improvements to the unittest module
* PEP 3147, support for .pyc repository directories
* PEP 3149, support for version tagged dynamic libraries
* PEP 3148, a new futures library for concurrent programming
* PEP 384, a stable ABI for extension modules
* PEP 391, dictionary-based logging configuration
* an overhauled GIL implementation that reduces contention
* an extended email package that handles bytes messages
* a much improved ssl module with support for SSL contexts and certificate hostname matching
* a sysconfig module to access configuration information
* additions to the shutil module, among them archive file support
* many enhancements to configparser, among them mapping protocol support
* improvements to pdb, the Python debugger
* countless fixes regarding bytes/string issues; among them full support for a bytes environment (filenames, environment variables)
* many consistency and behavior fixes for numeric operations

For a more extensive list of changes in 3.2, see

    http://docs.python.org/3.2/whatsnew/3.2.html

To download Python 3.2 visit:

    http://www.python.org/download/releases/3.2/

Please consider trying Python 3.2 with your code and reporting any bugs you may notice to:

    http://bugs.python.org/

Enjoy!

- --
Georg Brandl, Release Manager
georg at python.org
(on behalf of the entire python-dev team and 3.2's contributors)
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)

iEYEARECAAYFAk1Gj6IACgkQN9GcIYhpnLC53wCfcZhc6bxbc+fsmi+PAJxM6npr
Hh4An3QRdeyKHm+L3CqVk+EX02PxNx2r
=sTu6
-----END PGP SIGNATURE-----
From steve at pearwood.info Mon Jan 31 11:31:53 2011 From: steve at pearwood.info (Steven D'Aprano) Date: Mon, 31 Jan 2011 21:31:53 +1100 Subject: [Python-Dev] Byte code arguments from two to one byte: did anyone try this? In-Reply-To: 
                              
                              References: 
                              
Message-ID: <4D468F99.8070001@pearwood.info> Jurjen N.E. Bos wrote: > I was impressed by the optimizations already in there, but I still dare > to suggest an optimization that from my estimates might shave off a few > cycles, speeding up Python about 5%. > The idea is simple: change the byte code argument values from two bytes > to one. Interesting. Have you seen Cesare Di Mauro's WPython project, which takes the opposite strategy? http://code.google.com/p/wpython2/ -- Steven From jussi.enkovaara at csc.fi Mon Jan 31 11:50:27 2011 From: jussi.enkovaara at csc.fi (Jussi Enkovaara) Date: Mon, 31 Jan 2011 12:50:27 +0200 Subject: [Python-Dev] Issue #11051: system calls per import In-Reply-To: <4D45BF7D.70405@v.loewis.de> References: <1296377778.24415.4.camel@marge> <4D452E71.6070401@python.org> <1296405345.24507.9.camel@marge> <4D45BF7D.70405@v.loewis.de> Message-ID: <4D4693F3.8030200@csc.fi> On 2011-01-30 21:43, "Martin v. Löwis" wrote: > On 30.01.2011 17:54, Alexander Belopolsky wrote: >> On Sun, Jan 30, 2011 at 11:35 AM, Victor Stinner >> wrote: >> .. >>> We should find a compromise between speed (limit the number of system >>> calls) and the usability of Python modules. >> >> Do you have measurements that show python spending significant time on >> failing open calls? > > No; past measurements always showed that this is insignificant, probably > thanks to operating system caching the relevant directory blocks (so > it doesn't really matter whether you make one or ten lookups per > directory; my guess is that it matters more if you look into ten > directories instead of one). Dear Python-developers, I would like you to be aware of one particular problem related to the system calls in massively parallel systems. We are developing a Python-based simulation software GPAW (https://wiki.fysik.dtu.dk/gpaw/) and have tested it with up to tens of thousands of CPU cores. The program uses MPI, thus thousands of Python interpreters are launched at start-up time. As all these interpreters execute the same import statements, the huge number of (IO-related) system calls puts extreme pressure on the file system, and as a result just starting the Python interpreter(s) can take ~45 minutes with ~30 000 CPU cores! Currently, we have tried to work around the problem either by installing Python and required additional modules (NumPy and GPAW) to a ramdisk, or by modifying the CPython source (at the moment 2.6 version) in such a way that only a single process performs the system calls and uses MPI to broadcast the results to other processes (preliminary work in progress). As a related problem, dynamic linking can also be quite expensive (or even not available in some systems), and in some cases we have made a small hack to CPython for enabling statically linked packages (simple modules can of course be included relatively easily in static Python build.) 
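
To make the cost discussed in this thread concrete: each import probes several candidate filenames per directory, and most of those probes fail. A minimal sketch of the directory-listing idea raised earlier in the thread follows. It is not how CPython's import machinery works; the class name DirListingCache and the suffix tuple are hypothetical, and a real implementation would need to notice when a directory changes.

```python
import os
import tempfile

# Illustrative suffix list only; the real list depends on the Python build.
SUFFIXES = (".py", ".pyc", ".so")


class DirListingCache:
    """Answer 'does <name><suffix> exist in <directory>?' using a single
    os.listdir() call per directory instead of one probe per candidate."""

    def __init__(self):
        self._listings = {}

    def find(self, directory, name):
        entries = self._listings.get(directory)
        if entries is None:
            try:
                entries = frozenset(os.listdir(directory))  # one directory read
            except OSError:
                entries = frozenset()  # missing or unreadable directory
            self._listings[directory] = entries
        for suffix in SUFFIXES:
            candidate = name + suffix
            if candidate in entries:   # in-memory lookup, no system call
                return os.path.join(directory, candidate)
        return None


with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, "spam.py"), "w").close()
    cache = DirListingCache()
    print(cache.find(tmp, "spam"))   # full path of spam.py
    print(cache.find(tmp, "eggs"))   # None: answered from the cached listing
```

Whether this beats individual stat() calls depends on the filesystem, as the replies in this thread point out. In the massively parallel case described above, such a listing could in principle be read by one process and shared with the others, though that is speculation rather than something measured here.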
I am not expecting that the problems can be solved easily for the general CPython interpreter, especially as massively parallel supercomputers are quite a small niche of Python usage. However, I think it would be good to be aware of the problems that a large number of system calls causes in this more specialised kind of Python usage. Best regards, Jussi -- Jussi Enkovaara, Application Scientist, High Performance Computing, CSC PO. BOX 405 02101 Espoo, Finland, Tel +358 9 457 2935, fax +358 9 457 2302 CSC - IT Center for Science, www.csc.fi, e-mail: jussi.enkovaara at csc.fi From Jurjen.Bos at hetnet.nl Mon Jan 31 12:59:49 2011 From: Jurjen.Bos at hetnet.nl (Jurjen N.E. Bos) Date: Mon, 31 Jan 2011 12:59:49 +0100 Subject: [Python-Dev] Followup: Byte code arguments from two to one byte: did anyone try this? In-Reply-To: 
                              
References: 
Message-ID: 
> What version of CPython did you try that with? The latest py3k branch?

I had a quick look at 3.2, 2.5 and 2.7 and got the impression that the savings are larger if the interpreter loop is faster: the fewer instructions there are, the bigger the difference that 3 instructions would make. The NEXTARG macro is the same in all three versions:
#define NEXTARG() (next_instr += 2, (next_instr[-1]<<8) + next_instr[-2])
and the compiler compiles this to two separate fetches. I found out my compiler (gcc) will make better code if we used a short. It produces a "movswl" instruction to do both fetches at the same time, if I force it to. That saves two instructions already. This would imply that on little-endian machines, this would already save a few percent changing just 1 line of code in ceval.c:
#define NEXTARG() (next_instr += 2, *(short *)&next_instr[-2])
- Jurjen From Jurjen.Bos at hetnet.nl Mon Jan 31 13:28:39 2011 From: Jurjen.Bos at hetnet.nl (Jurjen N.E. Bos) Date: Mon, 31 Jan 2011 13:28:39 +0100 Subject: [Python-Dev] short fetch for NEXTARG macro (was: one byte byte code arguments) Message-ID: <86A291E9-5B01-478F-8FB3-20A422534EEB@hetnet.nl>

I just did it: my first python source code hack. I replaced the NEXTARG and PEEKARG macros in ceval.c using a cast to short pointer, and lo and behold, a crude measurement indicates one to two percent speed increase. That isn't much, but it is virtually for free! Here are the macros I used:
#define NEXTARG() (next_instr +=2, *(short*)&next_instr[-2])
#define PEEKARG() (*(short*)&next_instr[1])
- Jurjen From solipsis at pitrou.net Mon Jan 31 13:43:00 2011 From: solipsis at pitrou.net (Antoine Pitrou) Date: Mon, 31 Jan 2011 13:43:00 +0100 Subject: [Python-Dev] Issue #11051: system calls per import References: <1296377778.24415.4.camel@marge> <4D452E71.6070401@python.org> 
                              
                              <1296405345.24507.9.camel@marge> <4D45D6E1.6030906@canterbury.ac.nz> <4D4665B9.9000108@v.loewis.de> 
                              
                              Message-ID: <20110131134300.2babc577@pitrou.net> On Mon, 31 Jan 2011 00:08:25 -0800 Guido van Rossum 
                              
                              wrote: > > (Basically I am biased to believe that stat() is a pretty slow system > call -- this may just be old NFS lore though.) I don't know about NFS, but starting a Python interpreter located on a Samba share from a Windows VM is quite slow too. I think Martin is right for the common case: on a local filesystem on a modern Unix, stat() is certainly very fast. Remote or distributed filesystems seem to be more of a problem. Regards Antoine. From solipsis at pitrou.net Mon Jan 31 13:45:26 2011 From: solipsis at pitrou.net (Antoine Pitrou) Date: Mon, 31 Jan 2011 13:45:26 +0100 Subject: [Python-Dev] short fetch for NEXTARG macro (was: one byte byte code arguments) References: <86A291E9-5B01-478F-8FB3-20A422534EEB@hetnet.nl> Message-ID: <20110131134526.7a3af3fb@pitrou.net> On Mon, 31 Jan 2011 13:28:39 +0100 "Jurjen N.E. Bos" 
                              
                              wrote: > I just did it: my first python source code hack. > I replaced the NEXTARG and PEEKARG macros in ceval.c using a cast to > short pointer, and lo and behold, a crude measurement indicates one > to two percent speed increase. > That isn't much, but it is virtually for free! > > Here are the macro's I used: > #define NEXTARG() (next_instr +=2, *(short*)&next_instr[-2]) > #define PEEKARG() (*(short*)&next_instr[1]) Some architectures forbid unaligned access, so this can't be used as-is. Regards Antoine. From cesare.di.mauro at gmail.com Mon Jan 31 13:59:16 2011 From: cesare.di.mauro at gmail.com (Cesare Di Mauro) Date: Mon, 31 Jan 2011 13:59:16 +0100 Subject: [Python-Dev] short fetch for NEXTARG macro (was: one byte byte code arguments) In-Reply-To: <20110131134526.7a3af3fb@pitrou.net> References: <86A291E9-5B01-478F-8FB3-20A422534EEB@hetnet.nl> <20110131134526.7a3af3fb@pitrou.net> Message-ID: 
                              
                              2011/1/31 Antoine Pitrou 
                              
                              > On Mon, 31 Jan 2011 13:28:39 +0100 > "Jurjen N.E. Bos" 
                              
wrote: > > I just did it: my first python source code hack. > > I replaced the NEXTARG and PEEKARG macros in ceval.c using a cast to > > short pointer, and lo and behold, a crude measurement indicates one > > to two percent speed increase. > > That isn't much, but it is virtually for free! > > > > Here are the macro's I used: > > #define NEXTARG() (next_instr +=2, *(short*)&next_instr[-2]) > > #define PEEKARG() (*(short*)&next_instr[1]) > > Some architectures forbid unaligned access, so this can't be used as-is. > > Regards > > Antoine. > >

WPython already addressed it ( http://code.google.com/p/wpython2/source/browse/Python/ceval.c?repo=wpython11):
#ifdef WORDS_BIGENDIAN
#define NEXTOPCODE() oparg = *next_instr++; \
        opcode = oparg >> 8; oparg &= 0xff
#else
#define NEXTOPCODE() oparg = *next_instr++; \
        opcode = oparg & 0xff; oparg >>= 8
#endif
Alignment of shorts is also guaranteed due to wordcodes ( http://wpython2.googlecode.com/files/Beyond%20Bytecode%20-%20A%20Wordcode-based%20Python.pdf pag. 12). Cesare -------------- next part -------------- An HTML attachment was scrubbed... URL: 
                              
                              From tjreedy at udel.edu Mon Jan 31 14:23:29 2011 From: tjreedy at udel.edu (Terry Reedy) Date: Mon, 31 Jan 2011 08:23:29 -0500 Subject: [Python-Dev] Byte code arguments from two to one byte: did anyone try this? In-Reply-To: <4D468F99.8070001@pearwood.info> References: 
                              
                              <4D468F99.8070001@pearwood.info> Message-ID: 
                              
                              On 1/31/2011 5:31 AM, Steven D'Aprano wrote: > Jurjen N.E. Bos wrote: >> I was impressed by the optimizations already in there, but I still >> dare to suggest an optimization that from my estimates might shave off >> a few cycles, speeding up Python about 5%. >> The idea is simple: change the byte code argument values from two >> bytes to one. > > > Interesting. Have you seem Cesare Di Mauro's WPython project, which > takes the opposite strategy? > > http://code.google.com/p/wpython2/ The two strategies could be mixed. Some 'word codes' could consist of a bytecode + byte arg, and others a real word code. Maybe WPython does that already. Might end up being slower though. -- Terry Jan Reedy From cesare.di.mauro at gmail.com Mon Jan 31 14:30:57 2011 From: cesare.di.mauro at gmail.com (Cesare Di Mauro) Date: Mon, 31 Jan 2011 14:30:57 +0100 Subject: [Python-Dev] Byte code arguments from two to one byte: did anyone try this? In-Reply-To: 
                              
                              References: 
                              
                              <4D468F99.8070001@pearwood.info> 
                              
                              Message-ID: 
                              
2011/1/31 Terry Reedy > On 1/31/2011 5:31 AM, Steven D'Aprano wrote: > >> Jurjen N.E. Bos wrote: >> >>> I was impressed by the optimizations already in there, but I still >>> dare to suggest an optimization that from my estimates might shave off >>> a few cycles, speeding up Python about 5%. >>> The idea is simple: change the byte code argument values from two >>> bytes to one. >>> >> >> >> Interesting. Have you seem Cesare Di Mauro's WPython project, which >> takes the opposite strategy? >> >> http://code.google.com/p/wpython2/ >> > > The two strategies could be mixed. Some 'word codes' could consist of a > bytecode + byte arg, and others a real word code. Maybe WPython does that > already. Might end up being slower though. > > -- > Terry Jan Reedy Yes, WPython already does it ( http://wpython2.googlecode.com/files/Beyond%20Bytecode%20-%20A%20Wordcode-based%20Python.pdf pag. 7), but on average it was faster (pag. 28). Cesare > -------------- next part -------------- An HTML attachment was scrubbed... URL: 
                              
                              From foom at fuhm.net Mon Jan 31 15:29:46 2011 From: foom at fuhm.net (James Y Knight) Date: Mon, 31 Jan 2011 09:29:46 -0500 Subject: [Python-Dev] short fetch for NEXTARG macro (was: one byte byte code arguments) In-Reply-To: <20110131134526.7a3af3fb@pitrou.net> References: <86A291E9-5B01-478F-8FB3-20A422534EEB@hetnet.nl> <20110131134526.7a3af3fb@pitrou.net> Message-ID: <5BC68B65-92CA-4A2B-B0C4-8AAE764A0D0B@fuhm.net> On Jan 31, 2011, at 7:45 AM, Antoine Pitrou wrote: > On Mon, 31 Jan 2011 13:28:39 +0100 > "Jurjen N.E. Bos" 
                              
                              wrote: >> I just did it: my first python source code hack. >> I replaced the NEXTARG and PEEKARG macros in ceval.c using a cast to >> short pointer, and lo and behold, a crude measurement indicates one >> to two percent speed increase. >> That isn't much, but it is virtually for free! >> >> Here are the macro's I used: >> #define NEXTARG() (next_instr +=2, *(short*)&next_instr[-2]) >> #define PEEKARG() (*(short*)&next_instr[1]) > > Some architectures forbid unaligned access, so this can't be used as-is. It could perhaps be #ifdef'd in on x86/x86-64, though, which is by far the most common architecture to run python on. James From barry at python.org Mon Jan 31 17:11:30 2011 From: barry at python.org (Barry Warsaw) Date: Mon, 31 Jan 2011 11:11:30 -0500 Subject: [Python-Dev] Issue #11051: system calls per import In-Reply-To: <1296405345.24507.9.camel@marge> References: <1296377778.24415.4.camel@marge> <4D452E71.6070401@python.org> 
                              
                              <1296405345.24507.9.camel@marge> Message-ID: <20110131111130.1beefdc7@python.org> On Jan 30, 2011, at 05:35 PM, Victor Stinner wrote: >And the real question is: should we change that before 3.2 final? If we >don't change that in 3.2, it will be harder to change it later (but it >is still possible). I don't see how you possibly can without re-entering beta. Mucking with the import machinery *at all* does not seem prudent in the last RC. ;) FWIW, I recall this being discussed at the time of the PEPs and we decided not to narrow the search patterns down. I'd have to go through my archives for the details, but I think it would be better to officially deprecate the 'module' form so that they can be removed in a future version. -Barry -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 836 bytes Desc: not available URL: 
                              
From techtonik at gmail.com Mon Jan 31 19:19:44 2011 From: techtonik at gmail.com (techtonik at gmail.com) Date: Mon, 31 Jan 2011 18:19:44 +0000 Subject: [Python-Dev] MSI: Remove dependency from win32com.client module (issue4080047) Message-ID: <90e6ba6e85f2cbfc00049b2875bf@google.com>

Reviewers: ,

Please review this at http://codereview.appspot.com/4080047/

Affected files:
  M Tools/msi/msi.py
  M Tools/msi/msilib.py

Index: Tools/msi/msi.py
===================================================================
--- Tools/msi/msi.py (revision 88279)
+++ Tools/msi/msi.py (working copy)
@@ -4,7 +4,6 @@
 import msilib, schema, sequence, os, glob, time, re, shutil, zipfile
 from msilib import Feature, CAB, Directory, Dialog, Binary, add_data
 import uisample
-from win32com.client import constants
 from distutils.spawn import find_executable
 from uuids import product_codes
 import tempfile
@@ -1360,7 +1359,7 @@
     # Step 2: Add CAB files
     i = msilib.MakeInstaller()
-    db = i.OpenDatabase(msi, constants.msiOpenDatabaseModeTransact)
+    db = i.OpenDatabase(msi, msilib.msiOpenDatabaseModeTransact)

     v = db.OpenView("SELECT LastSequence FROM Media")
     v.Execute(None)
Index: Tools/msi/msilib.py
===================================================================
--- Tools/msi/msilib.py (revision 88279)
+++ Tools/msi/msilib.py (working copy)
@@ -4,7 +4,6 @@
 import win32com.client.gencache
 import win32com.client
 import pythoncom, pywintypes
-from win32com.client import constants
 import re, string, os, sets, glob, subprocess, sys, _winreg, struct

 try:
@@ -29,6 +28,18 @@
 knownbits = datasizemask | type_valid | type_localizable | \
             typemask | type_nullable | type_key

+# Constants from Windows Installer SDK
+msiOpenDatabaseModeReadOnly = 0
+msiOpenDatabaseModeTransact = 1
+msiOpenDatabaseModeDirect = 2
+msiOpenDatabaseModeCreate = 3
+msiColumnInfoNames = 0
+msiColumnInfoTypes = 1
+msiReadStreamInteger = 0
+msiReadStreamBytes = 1
+msiViewModifyInsert = 1
+msidbFileAttributesVital = 512
+
 # Summary Info Property IDs
 PID_CODEPAGE=1
 PID_TITLE=2
@@ -141,8 +152,7 @@
 def gen_schema(destpath, schemapath):
     d = MakeInstaller()
-    schema = d.OpenDatabase(schemapath,
-            win32com.client.constants.msiOpenDatabaseModeReadOnly)
+    schema = d.OpenDatabase(schemapath, msiOpenDatabaseModeReadOnly)

     # XXX ORBER BY
     v=schema.OpenView("SELECT * FROM _Columns")
@@ -196,8 +206,7 @@
 def gen_sequence(destpath, msipath):
     dir = os.path.dirname(destpath)
     d = MakeInstaller()
-    seqmsi = d.OpenDatabase(msipath,
-            win32com.client.constants.msiOpenDatabaseModeReadOnly)
+    seqmsi = d.OpenDatabase(msipath, msiOpenDatabaseModeReadOnly)

     v = seqmsi.OpenView("SELECT * FROM _Tables");
     v.Execute(None)
@@ -212,7 +221,7 @@
         f.write("%s = [\n" % table)
         v1 = seqmsi.OpenView("SELECT * FROM `%s`" % table)
         v1.Execute(None)
-        info = v1.ColumnInfo(constants.msiColumnInfoTypes)
+        info = v1.ColumnInfo(msiColumnInfoTypes)
         while 1:
             r = v1.Fetch()
             if not r:break
@@ -226,7 +235,7 @@
                     rec.append(r.StringData(i))
                 elif info.StringData(i)[0]=="v":
                     size = r.DataSize(i)
-                    bytes = r.ReadStream(i, size, constants.msiReadStreamBytes)
+                    bytes = r.ReadStream(i, size, msiReadStreamBytes)
                     bytes = bytes.encode("latin-1") # binary data represented "as-is"
                     if table == "Binary":
                         fname = rec[0]+".bin"
@@ -275,7 +284,7 @@
                 r.SetStream(i+1, field.name)
             else:
                 raise TypeError, "Unsupported type %s" % field.__class__.__name__
-        v.Modify(win32com.client.constants.msiViewModifyInsert, r)
+        v.Modify(msiViewModifyInsert, r)
         r.ClearData()
     v.Close()

@@ -298,8 +307,7 @@
     ProductCode = ProductCode.upper()
     d = MakeInstaller()
     # Create the database
-    db = d.OpenDatabase(name,
-         win32com.client.constants.msiOpenDatabaseModeCreate)
+    db = d.OpenDatabase(name, msiOpenDatabaseModeCreate)
     # Create the tables
     for t in schema.tables:
         t.create(db)
@@ -538,7 +546,7 @@
         short = self.make_short(file)
         full = "%s|%s" % (short, file)
         filesize = os.stat(absolute).st_size
-        # constants.msidbFileAttributesVital
+        # msidbFileAttributesVital
         # Compressed omitted, since it is the database default
         # could add r/o, system, hidden
         attributes = 512

From brett at python.org Mon Jan 31 19:38:57 2011 From: brett at python.org (Brett Cannon) Date: Mon, 31 Jan 2011 10:38:57 -0800 Subject: [Python-Dev] Issue #11051: system calls per import In-Reply-To: <20110131134300.2babc577@pitrou.net> References: <1296377778.24415.4.camel@marge> <4D452E71.6070401@python.org> 
                              
                              <1296405345.24507.9.camel@marge> <4D45D6E1.6030906@canterbury.ac.nz> <4D4665B9.9000108@v.loewis.de> 
                              
                              <20110131134300.2babc577@pitrou.net> Message-ID: 
                              
                              On Mon, Jan 31, 2011 at 04:43, Antoine Pitrou 
                              
                              wrote: > On Mon, 31 Jan 2011 00:08:25 -0800 > Guido van Rossum 
                              
                              wrote: >> >> (Basically I am biased to believe that stat() is a pretty slow system >> call -- this may just be old NFS lore though.) > > I don't know about NFS, but starting a Python interpreter located on a > Samba share from a Windows VM is quite slow too. > I think Martin is right for the common case: on a local filesystem on a > modern Unix, stat() is certainly very fast. Remote or > distributed filesystems seem to be more of a problem. I should mention that I have considered implementing a caching finder and loader for filesystems in importlib for people to optionally install to use for themselves. The real trick, though, is should it only cache hits, misses, or both? Regardless, though, it would be a very simple mixin or subclass to implement if there is demand for this sort of thing. And as for the zipfile being faster, that's true (I have incomplete benchmarks in importlib that you can use if people want to measure this stuff themselves, although you will need to tweak them to run against a zipfile). From amauryfa at gmail.com Mon Jan 31 19:58:49 2011 From: amauryfa at gmail.com (Amaury Forgeot d'Arc) Date: Mon, 31 Jan 2011 19:58:49 +0100 Subject: [Python-Dev] MSI: Remove dependency from win32com.client module (issue4080047) In-Reply-To: <90e6ba6e85f2cbfc00049b2875bf@google.com> References: <90e6ba6e85f2cbfc00049b2875bf@google.com> Message-ID: 
                              
                              Hi, 2011/1/31 
                              
                              : > Reviewers: , > > Please review this at http://codereview.appspot.com/4080047/ [...] It looks good, but did you create an item in the issue tracker? -- Amaury Forgeot d'Arc From georg.brandl at gmail.com Mon Jan 31 20:05:29 2011 From: georg.brandl at gmail.com (georg.brandl at gmail.com) Date: Mon, 31 Jan 2011 19:05:29 +0000 Subject: [Python-Dev] MSI: Remove dependency from win32com.client module (issue4080047) Message-ID: <000325574bc270e521049b2919d9@google.com> Is there a bugs.python.org issue for this? http://codereview.appspot.com/4080047/ From techtonik at gmail.com Mon Jan 31 21:45:45 2011 From: techtonik at gmail.com (techtonik at gmail.com) Date: Mon, 31 Jan 2011 20:45:45 +0000 Subject: [Python-Dev] MSI: Remove dependency from win32com.client module (issue4080047) Message-ID: <20cf30434772fe60a6049b2a7f9a@google.com> There is no b.p.o issue as it's not a bug, but a tiny copy/paste patch to clean up the code a bit while I am trying to understand how to add Python to the PATH. I see no reason for b.p.o bureaucracy. Mercurial-style workflow [1] is more beneficial to development as it doesn't require switching from console to browser for submitting changes. This way tiny changes can be integrated/updated more rapidly. 1. http://mercurial.selenic.com/wiki/ContributingChanges#The_basics:_patches_by_email http://codereview.appspot.com/4080047/ From brian.curtin at gmail.com Mon Jan 31 21:49:42 2011 From: brian.curtin at gmail.com (Brian Curtin) Date: Mon, 31 Jan 2011 14:49:42 -0600 Subject: [Python-Dev] MSI: Remove dependency from win32com.client module (issue4080047) In-Reply-To: <20cf30434772fe60a6049b2a7f9a@google.com> References: <20cf30434772fe60a6049b2a7f9a@google.com> Message-ID: 
                              
                              On Mon, Jan 31, 2011 at 14:45, 
                              
                              wrote: > There is no b.p.o issue as it's not a bug, but a tiny copy/paste patch > to clean up the code a bit while I am trying to understand how to add > Python to the PATH. > > I see no reason for b.p.o bureaucracy. Mercurial-style workflow [1] is > more beneficial to development as it doesn't require switching from > console to browser for submitting changes. This way tiny changes can be > integrated/updated more rapidly. > > 1. > > http://mercurial.selenic.com/wiki/ContributingChanges#The_basics:_patches_by_email > > > http://codereview.appspot.com/4080047/ Please create an issue. -------------- next part -------------- An HTML attachment was scrubbed... URL: 
                              
                              From solipsis at pitrou.net Mon Jan 31 21:54:06 2011 From: solipsis at pitrou.net (Antoine Pitrou) Date: Mon, 31 Jan 2011 21:54:06 +0100 Subject: [Python-Dev] MSI: Remove dependency from win32com.client module (issue4080047) References: <20cf30434772fe60a6049b2a7f9a@google.com> Message-ID: <20110131215406.5c597a50@pitrou.net> On Mon, 31 Jan 2011 20:45:45 +0000 techtonik at gmail.com wrote: > I see no reason for b.p.o bureaucracy. Mercurial-style workflow [1] is > more beneficial to development as it doesn't require switching from > console to browser for submitting changes. Ok, why don't you contribute to Mercurial instead? From g.brandl at gmx.net Mon Jan 31 21:58:43 2011 From: g.brandl at gmx.net (Georg Brandl) Date: Mon, 31 Jan 2011 21:58:43 +0100 Subject: [Python-Dev] MSI: Remove dependency from win32com.client module (issue4080047) In-Reply-To: <20cf30434772fe60a6049b2a7f9a@google.com> References: <20cf30434772fe60a6049b2a7f9a@google.com> Message-ID: 
                              
                              Am 31.01.2011 21:45, schrieb techtonik at gmail.com: > There is no b.p.o issue as it's not a bug, but a tiny copy/paste patch > to clean up the code a bit while I am trying to understand how to add > Python to the PATH. > > I see no reason for b.p.o bureaucracy. Mercurial-style workflow [1] is > more beneficial to development as it doesn't require switching from > console to browser for submitting changes. This way tiny changes can be > integrated/updated more rapidly. The tracker is not bureaucracy, it's how our development process works. I know that Mercurial uses a different process, with patches always going to the mailing list and being reviewed there, but that would be way too much volume for python-dev considering our number of patches. BTW, you should be able to send emails to report at bugs.python.org in order to create new issues, and attachments will automatically become attached to the bug reports. Georg From ethan at stoneleaf.us Mon Jan 31 22:09:16 2011 From: ethan at stoneleaf.us (Ethan Furman) Date: Mon, 31 Jan 2011 13:09:16 -0800 Subject: [Python-Dev] MSI: Remove dependency from win32com.client module (issue4080047) In-Reply-To: <20cf30434772fe60a6049b2a7f9a@google.com> References: <20cf30434772fe60a6049b2a7f9a@google.com> Message-ID: <4D4724FC.1040705@stoneleaf.us> techtonik at gmail.com wrote: > I see no reason for b.p.o bureaucracy. It provides a place for discussion, and makes it easier to coordinate multiple efforts. ~Ethan~ From techtonik at gmail.com Mon Jan 31 22:09:03 2011 From: techtonik at gmail.com (anatoly techtonik) Date: Mon, 31 Jan 2011 23:09:03 +0200 Subject: [Python-Dev] Finally fix installer to add Python to %PATH% on Windows In-Reply-To: 
                              
                              References: 
                              
                              
                              <4D431724.4010002@voidspace.org.uk> <7DA37C12-D3DA-49B3-996A-017CF304BC5C@gmail.com> 
                              
                              Message-ID: 
                              
                              On Fri, Jan 28, 2011 at 10:34 PM, Christian Heimes 
                              
                              wrote: > Am 28.01.2011 20:29, schrieb Raymond Hettinger: >> At the very least, we should add some prominent instructions for getting the command line version up and running. > > /me pops out of Guido's time machine and says: "execute > Tools/scripts/win_add2path.py" > > I'm -1 on adding Python to %PATH%. The private MSVCRT DLLs may lead to > unexpected side effects and it doesn't scale at all. Can you explain that part? There are no any MSVCRT DLLs in my Python26+ installation directories. > What about people > with more than one Python installation? I suggest that we add a single > user specific directory or a global directory to %PATH% for all > installations. Then the Python installer or 3rd party modules can drop > executables like python3.3.exe or plip-3.3.exe into this directory. python33.exe, but user story about people with more than one Python installation is a different one. > A > .bat file won't do good because .bat files must be called with "call > python33.bat" from another .bat file or the first one gets terminated. Wow. I've spent so many years in Windows console and didn't know that. Thanks. > We can even use a single and simple executable as template for all tasks: > > ?* get registry key from resource section of the executable > ?* use the registry key to lookup the location and name of pythonXX.dll > ?* load DLL > ?* get optional dotted module name for resource section > ?* either fire up interpreter as shell, with **argv or -m > dotted.module.name **argv > > Done ;) Actually, I would like to see the code that dynamically finds pythonXX.dll that is available on the system, and loads it into memory. This will be extremely useful for writing 3rd party application plugins in Python. Plugins that they only work when Python is installed and it doesn't really matter which Python version is there. But that is another story also. -- anatoly t. 
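
A rough illustration of the "find whichever pythonXX.dll is available" step mentioned in the message above. This is only a sketch under stated assumptions: the helper name newest_python_dll is hypothetical, the list of file names is assumed to come from some directory listing, and the registry lookup and the actual DLL loading discussed in the thread are deliberately left out.

```python
import re

# Hypothetical pattern: matches names such as python26.dll or python33.dll,
# i.e. one digit for the major version followed by the minor version.
_DLL_NAME = re.compile(r"^python(\d)(\d+)\.dll$", re.IGNORECASE)


def newest_python_dll(filenames):
    """Return the name of the highest-versioned pythonXY.dll, or None."""
    candidates = []
    for name in filenames:
        match = _DLL_NAME.match(name)
        if match:
            # Compare versions numerically, so that 3.10 sorts after 3.9.
            version = (int(match.group(1)), int(match.group(2)))
            candidates.append((version, name))
    if not candidates:
        return None
    return max(candidates)[1]
```

A launcher or plugin host could feed this the contents of a candidate directory and hand the result to whatever loading mechanism it uses; how the directory is located is outside the scope of the sketch.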
From techtonik at gmail.com Mon Jan 31 22:13:47 2011
From: techtonik at gmail.com (anatoly techtonik)
Date: Mon, 31 Jan 2011 23:13:47 +0200
Subject: [Python-Dev] Finally fix installer to add Python to %PATH% on Windows
In-Reply-To:
References: <4D431724.4010002@voidspace.org.uk> <7DA37C12-D3DA-49B3-996A-017CF304BC5C@gmail.com>
Message-ID:

Ok. Here is the patch. I used Orca to reverse installer tables of
Mercurial MSI and inserted similar entry for Python.

Also available for review at: http://codereview.appspot.com/4023055
--
anatoly t.
-------------- next part --------------
Index: Tools/msi/msi.py
===================================================================
--- Tools/msi/msi.py (revision 88279)
+++ Tools/msi/msi.py (working copy)
@@ -463,6 +463,11 @@
         ("CompileGrammar", "COMPILEALL", 6802),
         ])

+    # Add target dir to PATH
+    add_data(db, "Environment",
+            [("Environmnent", "=-*PATH", "[~];[TARGETDIR]", "python.exe"),
+            ])
+
     #####################################################################
     # Standard dialogs: FatalError, UserExit, ExitDialog
     fatal=PyDialog(db, "FatalError", x, y, w, h, modal, title,

From brian.curtin at gmail.com Mon Jan 31 22:24:33 2011
From: brian.curtin at gmail.com (Brian Curtin)
Date: Mon, 31 Jan 2011 15:24:33 -0600
Subject: [Python-Dev] Finally fix installer to add Python to %PATH% on Windows
In-Reply-To:
References: <4D431724.4010002@voidspace.org.uk> <7DA37C12-D3DA-49B3-996A-017CF304BC5C@gmail.com>
Message-ID:

On Mon, Jan 31, 2011 at 15:13, anatoly techtonik wrote:

> Ok. Here is the patch. I used Orca to reverse installer tables of
> Mercurial MSI and inserted similar entry for Python.
>
> Also available for review at: http://codereview.appspot.com/4023055
> --
> anatoly t.

That's the easy part. It doesn't cover any of the real issues with doing
this.
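
For readers unfamiliar with the Environment table notation used in the patch above, here is a small sketch that decodes a row such as ("=-*PATH", "[~];[TARGETDIR]"). The prefix meanings are given as I understand them from the Windows Installer documentation and should be treated as assumptions to verify there; the function name describe_environment_row and the dictionary keys are hypothetical and not part of msilib.

```python
# Prefix meanings as I understand them from the Windows Installer
# documentation for the Environment table; treat them as assumptions.
PREFIX_FLAGS = {
    "=": "set_on_install",       # create the variable if needed, then set it
    "+": "create_if_missing",    # only create the variable if it does not exist
    "-": "remove_on_uninstall",  # remove the value when the component is removed
    "!": "remove_on_install",    # remove the value during installation
    "*": "system_wide",          # machine environment instead of the user's
}

EXISTING = "[~]"  # placeholder that stands for the variable's current value


def describe_environment_row(name, value):
    """Decode the Name and Value columns of an Environment table row."""
    info = {flag: False for flag in PREFIX_FLAGS.values()}
    index = 0
    # Leading prefix characters are flags; the rest is the variable name.
    while index < len(name) and name[index] in PREFIX_FLAGS:
        info[PREFIX_FLAGS[name[index]]] = True
        index += 1
    info["variable"] = name[index:]
    if value.startswith(EXISTING + ";"):
        # "[~];X" keeps the existing value and appends X.
        info["placement"] = "append"
        info["value"] = value[len(EXISTING) + 1:]
    elif value.endswith(";" + EXISTING):
        # "X;[~]" puts X in front of the existing value.
        info["placement"] = "prepend"
        info["value"] = value[:-(len(EXISTING) + 1)]
    else:
        info["placement"] = "replace"
        info["value"] = value
    return info
```

Read this way, the row in the patch would append the install directory to the machine-wide PATH on install and take it out again on uninstall. It says nothing about the issues raised elsewhere in the thread, such as several Python versions being installed side by side.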
                              
From techtonik at gmail.com Mon Jan 31 22:43:28 2011
From: techtonik at gmail.com (anatoly techtonik)
Date: Mon, 31 Jan 2011 23:43:28 +0200
Subject: [Python-Dev] Finally fix installer to add Python to %PATH% on Windows
In-Reply-To:
References: <4D431724.4010002@voidspace.org.uk> <7DA37C12-D3DA-49B3-996A-017CF304BC5C@gmail.com>
Message-ID:

On Mon, Jan 31, 2011 at 11:24 PM, Brian Curtin wrote:
> On Mon, Jan 31, 2011 at 15:13, anatoly techtonik
> wrote:
>>
>> Ok. Here is the patch. I used Orca to reverse installer tables of
>> Mercurial MSI and inserted similar entry for Python.
>>
>> Also available for review at: http://codereview.appspot.com/4023055
>> --
>> anatoly t.
>
> That's the easy part. It doesn't cover any of the real issues with doing
> this.

Please be more specific. It will also help if you integrate this part
while it's still hot.
--
anatoly t.

From brian.curtin at gmail.com Mon Jan 31 22:49:49 2011
From: brian.curtin at gmail.com (Brian Curtin)
Date: Mon, 31 Jan 2011 15:49:49 -0600
Subject: [Python-Dev] Finally fix installer to add Python to %PATH% on Windows
In-Reply-To:
References: <4D431724.4010002@voidspace.org.uk> <7DA37C12-D3DA-49B3-996A-017CF304BC5C@gmail.com>
Message-ID:

On Mon, Jan 31, 2011 at 15:43, anatoly techtonik wrote:

> On Mon, Jan 31, 2011 at 11:24 PM, Brian Curtin
> wrote:
> > On Mon, Jan 31, 2011 at 15:13, anatoly techtonik
> > wrote:
> >>
> >> Ok. Here is the patch. I used Orca to reverse installer tables of
> >> Mercurial MSI and inserted similar entry for Python.
> >>
> >> Also available for review at: http://codereview.appspot.com/4023055
> >> --
> >> anatoly t.
> >
> > That's the easy part. It doesn't cover any of the real issues with doing
> > this.
>
> Please be more specific. It will also help if you integrate this part
> while it's still hot.
> --
> anatoly t.
>

There are numerous comments in the various PATH-related issues on the issue
tracker, and many of them are duplicated in this very thread.
                              
From techtonik at gmail.com Mon Jan 31 22:50:18 2011
From: techtonik at gmail.com (anatoly techtonik)
Date: Mon, 31 Jan 2011 23:50:18 +0200
Subject: [Python-Dev] Mercurial style patch submission (Was: MSI: Remove dependency from win32com.client module (issue4080047))
Message-ID:

On Mon, Jan 31, 2011 at 10:54 PM, Antoine Pitrou wrote:
> On Mon, 31 Jan 2011 20:45:45 +0000
> techtonik at gmail.com wrote:
>> I see no reason for b.p.o bureaucracy. Mercurial-style workflow [1] is
>> more beneficial to development as it doesn't require switching from
>> console to browser for submitting changes.
>
> Ok, why don't you contribute to Mercurial instead?

If you don't want to receive a stupid answer, why don't you read the
link and say what you don't like in this approach in a constructive
manner?

http://mercurial.selenic.com/wiki/ContributingChanges#The_basics:_patches_by_email
--
anatoly t.

From techtonik at gmail.com Mon Jan 31 22:58:20 2011
From: techtonik at gmail.com (anatoly techtonik)
Date: Mon, 31 Jan 2011 23:58:20 +0200
Subject: [Python-Dev] MSI: Remove dependency from win32com.client module (issue4080047)
In-Reply-To: <4D4724FC.1040705@stoneleaf.us>
References: <20cf30434772fe60a6049b2a7f9a@google.com> <4D4724FC.1040705@stoneleaf.us>
Message-ID:

On Mon, Jan 31, 2011 at 11:09 PM, Ethan Furman wrote:
> techtonik at gmail.com wrote:
>>
>> I see no reason for b.p.o bureaucracy.
>
> It provides a place for discussion, and makes it easier to coordinate
> multiple efforts.

A code review system provides a better space for discussion if we are
speaking about simple code cleanup. To me, polluting the tracker with
issues that are neither bugs nor feature requests only makes the bug
triaging process and search more cumbersome.
--
anatoly t.

From techtonik at gmail.com Mon Jan 31 23:05:12 2011
From: techtonik at gmail.com (anatoly techtonik)
Date: Tue, 1 Feb 2011 00:05:12 +0200
Subject: [Python-Dev] MSI: Remove dependency from win32com.client module (issue4080047)
In-Reply-To:
References: <20cf30434772fe60a6049b2a7f9a@google.com>
Message-ID:

On Mon, Jan 31, 2011 at 10:58 PM, Georg Brandl wrote:
> On 31.01.2011 21:45, techtonik at gmail.com wrote:
>> There is no b.p.o issue as it's not a bug, but a tiny copy/paste patch
>> to clean up the code a bit while I am trying to understand how to add
>> Python to the PATH.
>>
>> I see no reason for b.p.o bureaucracy. Mercurial-style workflow [1] is
>> more beneficial to development as it doesn't require switching from
>> console to browser for submitting changes. This way tiny changes can be
>> integrated/updated more rapidly.
>
> The tracker is not bureaucracy, it's how our development process works.

Don't you want to improve this process? A code review system is a much
better place to review patches than a mailing list or bug tracker,
especially for patches that are not related to actual bugs.

> I know that Mercurial uses a different process, with patches always going
> to the mailing list and being reviewed there, but that would be way too
> much volume for python-dev considering our number of patches.

Seems reasonable. Do you have any stats on how many patches are sent
weekly and how many of them are actually integrated?

> BTW, you should be able to send emails to report at bugs.python.org in order
> to create new issues, and attachments will automatically become attached
> to the bug reports.

Thanks. I'll keep this in mind.
--
anatoly t.

From brian.curtin at gmail.com Mon Jan 31 23:09:57 2011
From: brian.curtin at gmail.com (Brian Curtin)
Date: Mon, 31 Jan 2011 16:09:57 -0600
Subject: [Python-Dev] Mercurial style patch submission (Was: MSI: Remove dependency from win32com.client module (issue4080047))
In-Reply-To:
References:
Message-ID:

On Mon, Jan 31, 2011 at 15:50, anatoly techtonik wrote:

> On Mon, Jan 31, 2011 at 10:54 PM, Antoine Pitrou
> wrote:
> > On Mon, 31 Jan 2011 20:45:45 +0000
> > techtonik at gmail.com wrote:
> >> I see no reason for b.p.o bureaucracy. Mercurial-style workflow [1] is
> >> more beneficial to development as it doesn't require switching from
> >> console to browser for submitting changes.
> >
> > Ok, why don't you contribute to Mercurial instead?
>
> If you don't want to receive a stupid answer, why don't you read the
> link and say what you don't like in this approach in a constructive
> manner?
>
>
> http://mercurial.selenic.com/wiki/ContributingChanges#The_basics:_patches_by_email
> --
> anatoly t.

>>> Don't send your patch to the BugTracker - it can't be reviewed there,
>>> so it won't go anywhere!

We do fine with reviews on the tracker, and there has been some on and off
work on integrating Rietveld. For the people actually doing the work here,
accepting patches on the tracker and dealing with them there has been a
reasonably effective workflow, enough that we don't see a need to change it.

>>> Patches go to mercurial-devel at selenic.com - no subscription necessary!

As you were directed to in an earlier email by Georg, there is now a way to
report bugs via email without requiring any subscription.
report at bugs.python.org is the address.

>>> Because this is a community project and our developers are very busy,
>>> patches will sometimes fall through the cracks. If you've gotten no
>>> response to your patch after a few days, feel free to resend it.

This is true of any workflow on just about any open source project. Whether
it's email or a bug tracker, not everything is going to be acknowledged,
reviewed, fixed, or rejected immediately. We feel that the tracker allows us
to, well, keep track of things. It works for us.

What they do works for them, and I'm sure it works great. Could it work for
python-dev? Maybe. Is it worth changing anything when no one who is doing the
actual work has voiced a need for change? Absolutely not.
                              
From solipsis at pitrou.net Mon Jan 31 23:17:52 2011
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Mon, 31 Jan 2011 23:17:52 +0100
Subject: [Python-Dev] Mercurial style patch submission (Was: MSI: Remove dependency from win32com.client module (issue4080047))
In-Reply-To:
References:
Message-ID: <20110131231752.24887e1e@pitrou.net>

On Mon, 31 Jan 2011 23:50:18 +0200
anatoly techtonik wrote:
> On Mon, Jan 31, 2011 at 10:54 PM, Antoine Pitrou wrote:
> > On Mon, 31 Jan 2011 20:45:45 +0000
> > techtonik at gmail.com wrote:
> >> I see no reason for b.p.o bureaucracy. Mercurial-style workflow [1] is
> >> more beneficial to development as it doesn't require switching from
> >> console to browser for submitting changes.
> >
> > Ok, why don't you contribute to Mercurial instead?
>
> If you don't want to receive a stupid answer, why don't you read the
> link and say what you don't like in this approach in a constructive
> manner?

Very simple: I don't want to be spammed with tons of patches, patch
reviews, and issue comments.

Also, I want the history of issue discussions to be easily accessible
from permanent, issue-specific URLs, rather than search through
mailing-list archives to understand why a change was made.

I appreciate that you refrained from giving a stupid answer, however.

From martin at v.loewis.de Mon Jan 31 23:45:12 2011
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Mon, 31 Jan 2011 23:45:12 +0100
Subject: [Python-Dev] Issue #11051: system calls per import
In-Reply-To:
References: <1296377778.24415.4.camel@marge> <4D452E71.6070401@python.org>
 <1296405345.24507.9.camel@marge> <4D45D6E1.6030906@canterbury.ac.nz>
 <4D4665B9.9000108@v.loewis.de>
Message-ID: <4D473B78.8080408@v.loewis.de>

> Another thing to consider: on App Engine (which despite of all its
> architectural weirdness uses a -- mostly -- standard Linux filesystem
> for the Python code of the app) someone measured that importing from a
> zipfile is much faster than importing from the filesystem. I would
> imagine this extends to other contexts too, and it makes sense because
> the zipfile directory gets cached in memory so no stat() calls are
> necessary.

Of course, you can't know until you measure, and then you only know
about the specific case. However, I think you can't really compare
zip reading with directory reading - I'd expect that reading a zip
directory is significantly faster than reading the directory contents
of the zip file unpacked, just because this is so many fewer layers
of indirection.

Regards,
Martin

From benjamin at python.org Mon Jan 31 23:58:30 2011
From: benjamin at python.org (Benjamin Peterson)
Date: Mon, 31 Jan 2011 16:58:30 -0600
Subject: [Python-Dev] Mercurial style patch submission (Was: MSI: Remove dependency from win32com.client module (issue4080047))
In-Reply-To:
References:
Message-ID:

2011/1/31 anatoly techtonik :
> On Mon, Jan 31, 2011 at 10:54 PM, Antoine Pitrou wrote:
>> On Mon, 31 Jan 2011 20:45:45 +0000
>> techtonik at gmail.com wrote:
>>> I see no reason for b.p.o bureaucracy. Mercurial-style workflow [1] is
>>> more beneficial to development as it doesn't require switching from
>>> console to browser for submitting changes.
>>
>> Ok, why don't you contribute to Mercurial instead?
>
> If you don't want to receive a stupid answer, why don't you read the
> link and say what you don't like in this approach in a constructive
> manner?

As I understand it, there used to be patches at python.org. I'm not
sure why this was discontinued, so perhaps someone more senior should
chime in. :)

--
Regards,
Benjamin

From benjamin at python.org Mon Jan 31 23:59:14 2011
From: benjamin at python.org (Benjamin Peterson)
Date: Mon, 31 Jan 2011 16:59:14 -0600
Subject: [Python-Dev] MSI: Remove dependency from win32com.client module (issue4080047)
In-Reply-To:
References: <20cf30434772fe60a6049b2a7f9a@google.com> <4D4724FC.1040705@stoneleaf.us>
Message-ID:

2011/1/31 anatoly techtonik :
> On Mon, Jan 31, 2011 at 11:09 PM, Ethan Furman wrote:
>> techtonik at gmail.com wrote:
>>>
>>> I see no reason for b.p.o bureaucracy.
>>
>> It provides a place for discussion, and makes it easier to coordinate
>> multiple efforts.
>
> Code review system provides a better space for discussion if we are
> speaking about simple code cleanup. To me polluting tracker with the
> issues that are neither bugs nor feature requests only makes bug
> triaging process and search more cumbersome.

If it's not a bug or a feature request, why does it need to change?

--
Regards,
Benjamin
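
On the "Issue #11051: system calls per import" subthread in this digest, the question of caching hits, misses, or both can be illustrated with a small sketch. This is not importlib's implementation and not the finder Brett Cannon describes; the class name DirectoryCache and its methods are hypothetical. The idea shown is one possible answer: cache the whole directory listing once, which answers both hits and misses without a stat() call per candidate file name.

```python
import os


class DirectoryCache:
    """Remember directory listings so lookups need no further system calls."""

    def __init__(self):
        self._listings = {}      # directory path -> frozenset of entry names
        self.listdir_calls = 0   # counts how often the filesystem was hit

    def _listing(self, directory):
        if directory not in self._listings:
            self.listdir_calls += 1
            try:
                names = frozenset(os.listdir(directory))
            except OSError:
                # A missing or unreadable directory is cached as empty.
                names = frozenset()
            self._listings[directory] = names
        return self._listings[directory]

    def exists(self, directory, filename):
        """Answer hits and misses alike from the cached listing."""
        return filename in self._listing(directory)

    def invalidate(self, directory=None):
        """Forget one directory, or everything when no directory is given."""
        if directory is None:
            self._listings.clear()
        else:
            self._listings.pop(directory, None)
```

The trade-off is staleness: a file created after the listing was cached is reported as missing until invalidate() is called, and the simple set lookup is case-sensitive even on case-insensitive filesystems.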
