Showing content from https://mail.python.org/pipermail/python-dev/2010-November.txt below:

wrote:
> On 11/11/2010 16:07, Hirokazu Yamamoto wrote:
> > Hello. Is it possible to remove Win32 ANSI API (ie: GetFileAttributesA)
> > and only use Win32 WIDE API (ie: GetFileAttributesW)?
> > Mainly in posixmodule.c.
> > I think we can simplify the code hugely. (This means dropping bytes
> > support for os.stat etc on windows)
> >
> > # I recently did it for winsound.PlaySound with MvL's approval
>
> +1 from me

How do you support cross-platform code using bytes filenames?
IIRC, it has already been argued that it was an important feature. Many filesystem-related utilities might prefer to handle filenames in bytes form.

("winsound" is a Windows-specific module so that wasn't a concern obviously)

Regards

Antoine.

From merwok at netwok.org Thu Nov 11 18:38:11 2010
From: merwok at netwok.org (=?UTF-8?B?w4lyaWMgQXJhdWpv?=)
Date: Thu, 11 Nov 2010 18:38:11 +0100
Subject: [Python-Dev] [Python-checkins] r86348 - in python/branches/py3k/Lib: test/test_xml_etree.py xml/etree/ElementTree.py
In-Reply-To: <20101109234842.GA1068@rubuntu>
References: <20101109023700.32DC1EEA06@mail.python.org> <4CD9A521.9030200@netwok.org> <20101109234842.GA1068@rubuntu>
Message-ID: <4CDC2A03.1080404@netwok.org>

>> Shouldn't this include an entry in NEWS and maybe in ACKS?
> It was a very simple bug fix (caused due to an overlook initially), so
> did not add NEWS/ACKS. For features, larger fixes or complete patches,
> I then add NEWS and ACKS as appropriate.

Thanks for the reply. Now I'm unsure about the rules for adding NEWS entries: some bugs are important but have a very simple fix (see #1718574 for an example). I guess I'll just always add an entry :)

Brett, maybe this is something to cover in the dev docs.

make-patchcheck-ly yours

From brett at python.org Thu Nov 11 18:56:11 2010
From: brett at python.org (Brett Cannon)
Date: Thu, 11 Nov 2010 09:56:11 -0800
Subject: [Python-Dev] [Python-checkins] r86348 - in python/branches/py3k/Lib: test/test_xml_etree.py xml/etree/ElementTree.py
In-Reply-To: <4CDC2A03.1080404@netwok.org>
References: <20101109023700.32DC1EEA06@mail.python.org> <4CD9A521.9030200@netwok.org> <20101109234842.GA1068@rubuntu> <4CDC2A03.1080404@netwok.org>
Message-ID:

On Thu, Nov 11, 2010 at 09:38, Éric Araujo wrote:
>>> Shouldn't this include an entry in NEWS and maybe in ACKS?
>> It was a very simple bug fix (caused due to an overlook initially), so
>> did not add NEWS/ACKS. For features, larger fixes or complete patches,
>> I then add NEWS and ACKS as appropriate.
>
> Thanks for the reply.  Now I'm unsure about the rules for adding NEWS
> entries: some bugs are important but have a very simple fix (see
> #1718574 for an example).  I guess I'll just always add an entry :)
>
> Brett, maybe this is something to cover in the dev docs.

I just follow Guido's own personal rule: if the fix required thought they should go into Misc/ACKS.

From alexander.belopolsky at gmail.com Thu Nov 11 19:01:10 2010
From: alexander.belopolsky at gmail.com (Alexander Belopolsky)
Date: Thu, 11 Nov 2010 13:01:10 -0500
Subject: [Python-Dev] Breaking undocumented API
In-Reply-To: <4CDC0950.5040309@voidspace.org.uk>
References: <64DF4272-FF17-4E82-96F5-1DA6CA3A06EC@gmail.com> <4CDAA27B.8040703@voidspace.org.uk> <4CDBDB0C.6080703@voidspace.org.uk> <20101111100516.6e90aa41@mission> <4CDC08F3.6010501@langa.pl> <4CDC0950.5040309@voidspace.org.uk>
Message-ID:

2010/11/11 Michael Foord:
..
>> You mean runtime automation, e.g. creating __all__ on the fly omitting
>> underscored names?
>>
> Writing code to generate a __all__ that duplicates the default behaviour
> seems redundant to me.
>

FWIW, I like having __all__ at the top of the module.
It feels like a table of contents at the start of a chapter. In some cases it may also serve as an optimization when len(__all__) is much smaller than len(__dict__).

I also don't like _ prefix to become an exclusive means to express privateness. I think the current definition of "public names" is a good one and just needs to be made more visible in the docs. If the module defines __all__, that should be the ultimate answer to what is public in that module. (Users should learn to use help(module) instead of dir(module) for API discovery.) If __all__ is not defined in the module, I think it is good to introduce it after a careful review of what it should contain. And __all__ should never contain names that start with _.

From merwok at netwok.org Thu Nov 11 19:10:43 2010
From: merwok at netwok.org (=?UTF-8?B?w4lyaWMgQXJhdWpv?=)
Date: Thu, 11 Nov 2010 19:10:43 +0100
Subject: [Python-Dev] [Python-checkins] r86348 - in python/branches/py3k/Lib: test/test_xml_etree.py xml/etree/ElementTree.py
In-Reply-To:
References: <20101109023700.32DC1EEA06@mail.python.org> <4CD9A521.9030200@netwok.org> <20101109234842.GA1068@rubuntu> <4CDC2A03.1080404@netwok.org>
Message-ID: <4CDC31A3.1020306@netwok.org>

> I just follow Guido's own personal rule: if the fix required thought
> they should go into Misc/ACKS.

Okay. Same rule for NEWS?

From brett at python.org Thu Nov 11 19:16:04 2010
From: brett at python.org (Brett Cannon)
Date: Thu, 11 Nov 2010 10:16:04 -0800
Subject: [Python-Dev] [Python-checkins] r86348 - in python/branches/py3k/Lib: test/test_xml_etree.py xml/etree/ElementTree.py
In-Reply-To: <4CDC31A3.1020306@netwok.org>
References: <20101109023700.32DC1EEA06@mail.python.org> <4CD9A521.9030200@netwok.org> <20101109234842.GA1068@rubuntu> <4CDC2A03.1080404@netwok.org> <4CDC31A3.1020306@netwok.org>
Message-ID:

On Thu, Nov 11, 2010 at 10:10, Éric Araujo wrote:
>> I just follow Guido's own personal rule: if the fix required thought
>> they should go into Misc/ACKS.
>
> Okay.
> Same rule for NEWS?
>

I do a NEWS entry if a bug was fixed or semantics changed/added for anything public (e.g., I don't do an entry for every little clarification in the docs or new tests fixed or written).

From steve at pearwood.info Thu Nov 11 19:16:16 2010
From: steve at pearwood.info (Steven D'Aprano)
Date: Fri, 12 Nov 2010 05:16:16 +1100
Subject: [Python-Dev] Breaking undocumented API
In-Reply-To:
References: <64DF4272-FF17-4E82-96F5-1DA6CA3A06EC@gmail.com> <4CDAA27B.8040703@voidspace.org.uk> <4CDBDB0C.6080703@voidspace.org.uk>
Message-ID: <4CDC32F0.3010500@pearwood.info>

Nick Coghlan wrote:

> My personal opinion is that we should be trying to get the standard
> library to the point where __all__ definitions are unnecessary - if a
> name isn't in __all__, it should start with an underscore (and if that
> is true, then the __all__ definition becomes effectively redundant).

You don't *need* to define __all__ -- if you don't, import * will import everything that doesn't start with a leading underscore. __all__ is only useful when you want more control over what is or isn't imported. If you don't need that control, just don't define __all__, and the problem is solved.

> That way, all sources of information (docs, dir(), help(), import *)
> give the same answer as to what constitutes the public API.

I disagree with the underlying assumption that import * need necessarily import the entire public API. That's not how I use it in my modules, and the option should be available to std library modules as well.

When I create a module, I distinguish between three categories of functions:

* private, which start with an underscore;
* the core public API, which is listed in __all__; and
* support/helper functions, which are not part of the core functionality of the module but are public.

If you import * you will get just the core functions. If you want the support functions, you need to use the fully qualified module.name, or otherwise import them yourself.

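A minimal sketch of the three-tier convention described above, to make the import * behaviour concrete. The module name `tiers_demo` and the functions `core`, `helper` and `_private` are made up for the illustration; the module is built in memory only so that the example is self-contained.

```python
import sys
import types

# Source of a hypothetical module using the three tiers.
source = '''
__all__ = ["core"]

def core():
    return "core API"

def helper():
    return "public helper, not in __all__"

def _private():
    return "private"
'''

# Build the module in memory and register it so it can be imported.
mod = types.ModuleType("tiers_demo")
exec(source, mod.__dict__)
sys.modules["tiers_demo"] = mod

# A star import only picks up the names listed in __all__.
ns = {}
exec("from tiers_demo import *", ns)
star_names = sorted(name for name in ns if name != "__builtins__")

# The helper stays reachable through the qualified name.
helper_result = mod.helper()
```

Here `star_names` contains only `core`, while `helper` is still available as `tiers_demo.helper` for anyone who goes looking for it.
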
This division of public functions into first and second class API functions is a deliberate design choice on my part. I expect the core functionality to be fully documented. Helper and support functions may not be -- there should be some docs, but doing so is a lower priority. The support functions are public, and available for use, if you go looking for them, but I neither encourage nor discourage users from doing so.

I don't see any reason that the standard library should not be permitted to use the same convention.

Another couple of objections to getting rid of __all__:

If you're proxying modules or built-ins, you may not be able to use a _private name, but you may not want import * to pick up your proxies.

I find it annoying to see this:

import module as _module
_module.func()

(instead of import module and merely leaving module out of __all__)

I accept that some standard library authors may choose this convention, but I don't want to see it become mandatory.

-- 
Steven

From alexander.belopolsky at gmail.com Thu Nov 11 19:40:36 2010
From: alexander.belopolsky at gmail.com (Alexander Belopolsky)
Date: Thu, 11 Nov 2010 13:40:36 -0500
Subject: [Python-Dev] Breaking undocumented API
In-Reply-To: <4CDBDB0C.6080703@voidspace.org.uk>
References: <64DF4272-FF17-4E82-96F5-1DA6CA3A06EC@gmail.com> <4CDAA27B.8040703@voidspace.org.uk> <4CDBDB0C.6080703@voidspace.org.uk>
Message-ID:

On Thu, Nov 11, 2010 at 7:01 AM, Michael Foord wrote:
..
>> I don't understand why everyone seem to have accepted Michael's
>> premise that "we don't have a clearly stated policy for what defines
>> the public API of standard library modules."  We do have such a policy
>> and it is well known (while the location in the reference manual may
>> not be):
>
> Ha. 14 paragraphs into the grammar reference on the import statement is
> perhaps not where developers would go to look for Python standard library
> development policy..

Very true.
To make it slightly more visible, any objections to the following patch? (It adds "public names (in module globals)" linking to that 14-th paragraph in the index.)

Index: Doc/reference/simple_stmts.rst
===================================================================
--- Doc/reference/simple_stmts.rst (revision 86409)
+++ Doc/reference/simple_stmts.rst (working copy)
@@ -794,6 +794,7 @@
 namespace of the :keyword:`import` statement..

 .. index::
    single: __all__ (optional module attribute)
+.. index:: public names (in module globals)

 The *public names* defined by a module are determined by checking the module's
 namespace for a variable named ``__all__``; if defined, it must be a sequence of

From solipsis at pitrou.net Thu Nov 11 19:47:34 2010
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Thu, 11 Nov 2010 19:47:34 +0100
Subject: [Python-Dev] Breaking undocumented API
References: <64DF4272-FF17-4E82-96F5-1DA6CA3A06EC@gmail.com> <4CDAA27B.8040703@voidspace.org.uk> <4CDBDB0C.6080703@voidspace.org.uk>
Message-ID: <20101111194734.78fb3846@pitrou.net>

On Thu, 11 Nov 2010 13:40:36 -0500
Alexander Belopolsky wrote:
> On Thu, Nov 11, 2010 at 7:01 AM, Michael Foord
> wrote:
> ..
> >> I don't understand why everyone seem to have accepted Michael's
> >> premise that "we don't have a clearly stated policy for what defines
> >> the public API of standard library modules."  We do have such a policy
> >> and it is well known (while the location in the reference manual may
> >> not be):
> >
> > Ha. 14 paragraphs into the grammar reference on the import statement is
> > perhaps not where developers would go to look for Python standard library
> > development policy..
>
> Very true. To make it slightly more visible, any objections to the
> following patch? (It adds "public names (in module globals)" linking
> to that 14-th paragraph in the index.)

I think what Michael meant is that the language grammar reference is not (and shouldn't be) the authority on stdlib development policy. To which I would agree.

Regards

Antoine.

From victor.stinner at haypocalc.com Thu Nov 11 20:26:24 2010
From: victor.stinner at haypocalc.com (Victor Stinner)
Date: Thu, 11 Nov 2010 20:26:24 +0100
Subject: [Python-Dev] Removal of Win32 ANSI API
In-Reply-To: <4CDC14C0.6070300@m2.ccsnet.ne.jp>
References: <4CDC14C0.6070300@m2.ccsnet.ne.jp>
Message-ID: <201011112026.24445.victor.stinner@haypocalc.com>

On Thursday 11 November 2010 17:07:28 Hirokazu Yamamoto wrote:
> Hello. Is it possible to remove Win32 ANSI API (ie: GetFileAttributesA)
> and only use Win32 WIDE API (ie: GetFileAttributesW)?
> Mainly in posixmodule.c.

Even if I hate the MBCS encoding, because it replaces undecodable characters by similar glyphs by default, I'm not certain that it is a good idea to drop the bytes API. Can it be a problem to port programs from Python2 to Python3? Do major Python2 programs/libraries rely on the bytes API?

> I think we can simplify the code hugely. (This means dropping bytes
> support for os.stat etc on windows)

Sure, it will divide the number of lines, of the code specific to Windows, by two.

Victor

From martin at v.loewis.de Thu Nov 11 20:44:52 2010
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Thu, 11 Nov 2010 20:44:52 +0100
Subject: [Python-Dev] Removal of Win32 ANSI API
In-Reply-To: <20101111174335.3173f67e@pitrou.net>
References: <4CDC14C0.6070300@m2.ccsnet.ne.jp> <4CDC157B.6090406@timgolden.me.uk> <20101111174335.3173f67e@pitrou.net>
Message-ID: <4CDC47B4.5080200@v.loewis.de>

> How do you support cross-platform code using bytes filenames?
> IIRC, it has already been argued that it was an important feature. Many
> filesystem-related utilities might prefer to handle filenames in bytes
> form.

It would be a policy decision.
However, I think it is hearsay that filesystem-related utilities might prefer byte file names. On Windows, some files are inaccessible if you constrain yourself to byte filenames, so once people learn about this limitation, I expect them to switch to Unicode filenames on Windows - for the same reason they use byte filenames on Unix (i.e. to be able to access all files correctly).

Regards,
Martin

From martin at v.loewis.de Thu Nov 11 20:50:35 2010
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Thu, 11 Nov 2010 20:50:35 +0100
Subject: [Python-Dev] Removal of Win32 ANSI API
In-Reply-To: <201011112026.24445.victor.stinner@haypocalc.com>
References: <4CDC14C0.6070300@m2.ccsnet.ne.jp> <201011112026.24445.victor.stinner@haypocalc.com>
Message-ID: <4CDC490B.9060809@v.loewis.de>

> Even if I hate the MBCS encoding, because it replaces undecodable characters
> by similar glyphs by default, I'm not certain that it is a good idea to drop
> the bytes API. Can it be a problem to port programs from Python2 to Python3?
> Do major Python2 programs/libraries rely on the bytes API?

I don't actually know for a fact, but I expect that the answer is "no".

The question is: where do file names typically come from? My guess is that they come from
a) hard-coded strings in the source code
b) command line arguments/environment variables
c) directory listings
[of course, there are other ways, like GUI input, getcwd(), etc]

In case a), you have filenames such as ".", e.g. as a parameter to listdir or walk. These will typically be regular strings in Python 2, which become Unicode strings in 3. You would actively need to put b"" prefixes into the code.

In case b), they will be Unicode strings in Python 3.

In case c), they will be Unicode strings if the argument is a Unicode string.

So by induction, file names will be typically unicode. The exception will be libraries/applications which make deliberate attempts to get byte-oriented file names.

Regards,
Martin

From solipsis at pitrou.net Thu Nov 11 21:02:43 2010
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Thu, 11 Nov 2010 21:02:43 +0100
Subject: [Python-Dev] Removal of Win32 ANSI API
In-Reply-To: <4CDC47B4.5080200@v.loewis.de>
References: <4CDC14C0.6070300@m2.ccsnet.ne.jp> <4CDC157B.6090406@timgolden.me.uk> <20101111174335.3173f67e@pitrou.net> <4CDC47B4.5080200@v.loewis.de>
Message-ID: <20101111210243.264ccfb7@pitrou.net>

On Thu, 11 Nov 2010 20:44:52 +0100
"Martin v. Löwis" wrote:
> > How do you support cross-platform code using bytes filenames?
> > IIRC, it has already been argued that it was an important feature. Many
> > filesystem-related utilities might prefer to handle filenames in bytes
> > form.
>
> It would be a policy decision. However, I think it is hear-say that
> filesystem-related utilities might prefer byte file names.

One possible situation is when you receive filenames in bytes form from an external API or tool (or even the contents of a file). If you don't know the encoding, keeping the bytes form is obviously recommended.

I don't know how often this happens.

Regards

Antoine.

From eric at trueblade.com Thu Nov 11 21:44:23 2010
From: eric at trueblade.com (Eric Smith)
Date: Thu, 11 Nov 2010 15:44:23 -0500
Subject: [Python-Dev] Breaking undocumented API
In-Reply-To:
References: <64DF4272-FF17-4E82-96F5-1DA6CA3A06EC@gmail.com> <87fwv9g6li.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: <4CDC55A7.5000001@trueblade.com>

On 11/10/2010 11:58 AM, Tres Seaver wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> On 11/09/2010 11:12 PM, Stephen J. Turnbull wrote:
>> Nick Coghlan writes:
>>
>> > > Module writers who compound the error by expecting to be imported
>> > > this way, thereby bogarting the global namespace for their own
>> > > purposes, should be fish-slapped. ;)
>> >
>> > Be prepared to fish-slap all of python-dev then - we use precisely
>> > this technique to support optional acceleration modules.
>> > The pure Python versions of pairs like profile/_profile and heapq/_heapq
>> > include a try/except block at the end that does the equivalent of:
>> >
>> >     try:
>> >         from _accelerated import * # Allow accelerated overrides
>> >     except ImportError:
>> >         pass # Use pure Python versions
>>
>> But these identifiers will appear at the module level, not global, no?
>> Otherwise this technique couldn't be used. I don't really understand
>> what Tres is talking about when he writes "modules that expect to be
>> imported this way". The *imported* module shouldn't care, no? This
>> is an issue for the *importing* code to deal with.
>
> Right -- "private" star imports aren't the issue for me, because the
> same user who creates them is responsible for the other end of the
> stick. I was ranting about library authors who document star imports as
> the expected usage pattern for their external users.
>
> Note that I still wouldn't use star imports in the "private
> acceleration" case myself. I would prefer a pattern like:
>
> - ----------------------- $< -----------------------------
> # spam.py
>
> # Pure python API implementation
> def foo(spat, blarg):
>     ...
>
> def bar(qux):
>     ...
>
> # Replace with accelerated C implementation
> try:
>     import _spam
> except ImportError:
>     pass # accelerated version not available
> else:
>     foo = _spam.foo
>     bar = _spam.bar
> - ----------------------- $< -----------------------------
>
> This explicit name remapping catches unintentional errors (e.g., _spam
> renames a method) better than the star import.

But then you're saying that all implementations of _spam have to support the same API. What if CPython's _spam has foo, bar, and baz, but Jython's only has foo and bar, and IronPython's only has baz? Without getting into special casing or lots of try/catch blocks on individual names, I think import * is the best way to go.

Eric.

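One possible middle ground between the two patterns discussed above, as a rough sketch only: override names one at a time from an explicit list, skipping any name the accelerator does not provide. The helper `apply_accelerators` and the module name `_spam_demo` are hypothetical and appear nowhere in the thread; the fake accelerator is registered in `sys.modules` purely so the example runs on its own.

```python
import importlib
import sys
import types


def foo(x):
    return ("pure", x)


def bar(x):
    return ("pure", x)


def apply_accelerators(namespace, module_name, names):
    """Replace entries of `namespace` with same-named attributes of the
    accelerator module, where present. Returns the names replaced."""
    try:
        accel = importlib.import_module(module_name)
    except ImportError:
        return []  # accelerator not available: keep the pure versions
    replaced = []
    for name in names:
        impl = getattr(accel, name, None)
        if impl is not None:
            namespace[name] = impl
            replaced.append(name)
    return replaced


# Stand-in for a C accelerator that only implements foo.
_fake = types.ModuleType("_spam_demo")
_fake.foo = lambda x: ("accelerated", x)
sys.modules["_spam_demo"] = _fake

# A real module would pass globals(); a plain dict keeps the demo isolated.
api = {"foo": foo, "bar": bar}
replaced = apply_accelerators(api, "_spam_demo", ["foo", "bar"])
missing = apply_accelerators(api, "_no_such_accelerator_demo", ["foo"])
```

Unlike the star import, the list of overridable names stays explicit; unlike the hard-coded remapping, an implementation that lacks one of the names does not break the import. Whether that trade-off is worth the extra helper is a judgment call.
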
From ncoghlan at gmail.com Thu Nov 11 23:01:32 2010
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Fri, 12 Nov 2010 08:01:32 +1000
Subject: [Python-Dev] Removal of Win32 ANSI API
In-Reply-To: <201011112026.24445.victor.stinner@haypocalc.com>
References: <4CDC14C0.6070300@m2.ccsnet.ne.jp> <201011112026.24445.victor.stinner@haypocalc.com>
Message-ID:

On Fri, Nov 12, 2010 at 5:26 AM, Victor Stinner wrote:
> On Thursday 11 November 2010 17:07:28 Hirokazu Yamamoto wrote:
>> Hello. Is it possible to remove Win32 ANSI API (ie: GetFileAttributesA)
>> and only use Win32 WIDE API (ie: GetFileAttributesW)?
>> Mainly in posixmodule.c.
>
> Even if I hate the MBCS encoding, because it replaces undecodable characters
> by similar glyphs by default, I'm not certain that it is a good idea to drop
> the bytes API. Can it be a problem to port programs from Python2 to Python3?
> Do major Python2 programs/libraries rely on the bytes API?
>
>> I think we can simplify the code hugely. (This means dropping bytes
>> support for os.stat etc on windows)
>
> Sure, it will divide the number of lines, of the code specific to Windows, by
> two.

Can we get most of the code cleanup benefit without the backwards compatibility risk by doing the decode from 'mbcs' on our side of the fence?

That is, have code that was the C equivalent of:

arg_is_bytes = not isinstance(arg, str)
if arg_is_bytes:
    val = _decode_mbcs(arg)
    # Decoding error checking here
else:
    val = arg
# Common processing using WIDE API
if arg_is_bytes:
    result = _encode_mbcs(wide_result)
    # Encoding error checking here
else:
    result = wide_result

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |
Brisbane, Australia

From ncoghlan at gmail.com Thu Nov 11 23:15:36 2010
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Fri, 12 Nov 2010 08:15:36 +1000
Subject: [Python-Dev] Breaking undocumented API
In-Reply-To: <4CDC32F0.3010500@pearwood.info>
References: <64DF4272-FF17-4E82-96F5-1DA6CA3A06EC@gmail.com> <4CDAA27B.8040703@voidspace.org.uk> <4CDBDB0C.6080703@voidspace.org.uk> <4CDC32F0.3010500@pearwood.info>
Message-ID:

On Fri, Nov 12, 2010 at 4:16 AM, Steven D'Aprano wrote:
> Another couple of objections to getting rid of __all__:
>
> If you're proxying modules or built-ins, you may not be able to use a
> _private name, but you may not want import * to pick up your proxies.
>
> I find it annoying to see this:
>
> import module as _module
> _module.func()
>
> (instead of import module and merely leaving module out of __all__)

That gets us back to dir() and help() giving the wrong impression of the module's public API though.

The issue I have is that the current policy (public APIs may or may not be in all, private APIs may or may not be prefixed by a leading underscore) makes it impossible to reliably extract a module's public API programmatically.

If we instead adopt the explicit policy that private APIs are:
- imported modules (with the exception of os.path)
- any names starting with a leading underscore

Then we get the 3 API tiers you describe: core public API in __all__, other public functions and globals without leading underscores, private API with leading underscores (or imported modules).

We could even add two additional functions to the inspect module (e.g. getpublicnames() and getimportstarnames()) which applied the relevant filtering rules.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |
Brisbane, Australia

From greg.ewing at canterbury.ac.nz Thu Nov 11 23:24:49 2010
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Fri, 12 Nov 2010 11:24:49 +1300
Subject: [Python-Dev] Breaking undocumented API
In-Reply-To:
References: <64DF4272-FF17-4E82-96F5-1DA6CA3A06EC@gmail.com> <4CDAA27B.8040703@voidspace.org.uk> <4CDBDB0C.6080703@voidspace.org.uk>
Message-ID: <4CDC6D31.2040809@canterbury.ac.nz>

Nick Coghlan wrote:

> My personal opinion is that we should be trying to get the standard
> library to the point where __all__ definitions are unnecessary - if a
> name isn't in __all__, it should start with an underscore (and if that
> is true, then the __all__ definition becomes effectively redundant).

What about names imported from other modules that are used by the module, but not intended for re-export? How would you prevent them from turning up in help() etc. without using __all__?

-- 
Greg

From nd at perlig.de Fri Nov 12 08:51:58 2010
From: nd at perlig.de (=?iso-8859-1?q?Andr=E9_Malo?=)
Date: Fri, 12 Nov 2010 08:51:58 +0100
Subject: [Python-Dev] Removal of Win32 ANSI API
In-Reply-To: <4CDC490B.9060809@v.loewis.de>
References: <4CDC14C0.6070300@m2.ccsnet.ne.jp> <201011112026.24445.victor.stinner@haypocalc.com> <4CDC490B.9060809@v.loewis.de>
Message-ID: <201011120851.58615.nd@perlig.de>

On Thursday 11 November 2010 20:50:35 Martin v. Löwis wrote:
> > Even if I hate the MBCS encoding, because it replaces undecodable
> > characters by similar glyphs by default, I'm not certain that it is a
> > good idea to drop the bytes API. Can it be a problem to port programs
> > from Python2 to Python3? Do major Python2 programs/libraries rely on the
> > bytes API?
>
> I don't actually know for a fact, but I expect that the answer is "no".
>
> The questions is: where do file names typically come from? My guess
> is that they come from
> a) hard-coded strings in the source code
> b) command line arguments/environment variables

[...]

> In case b), they will be Unicode strings in Python 3.

But not necessarily with unicode semantics if I get the discussions about the environment topic right.

Additionally:

d) Over a socket (like the HTTP protocol) -> Bytes.

nd

From p.f.moore at gmail.com Fri Nov 12 09:44:03 2010
From: p.f.moore at gmail.com (Paul Moore)
Date: Fri, 12 Nov 2010 08:44:03 +0000
Subject: [Python-Dev] Issues 9931 and 9055 - test_ttk_guionly and buildbot run as a service
Message-ID:

Hi,
My buildbot has been failing for some time because of these 2 issues, both related to the fact that tests are hanging when run as a service (and hence have no display to open GUI elements on).

Both issues have patches, and as far as I am aware, the patches fix the issues reasonably well. What can I do to move these 2 issues forwards? As things stand, my buildbot is not providing a lot of value on the 3.x branch :-(

Thanks,
Paul.

From martin at v.loewis.de Fri Nov 12 09:51:19 2010
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Fri, 12 Nov 2010 09:51:19 +0100
Subject: [Python-Dev] Removal of Win32 ANSI API
In-Reply-To: <201011120851.58615.nd@perlig.de>
References: <4CDC14C0.6070300@m2.ccsnet.ne.jp> <201011112026.24445.victor.stinner@haypocalc.com> <4CDC490B.9060809@v.loewis.de> <201011120851.58615.nd@perlig.de>
Message-ID: <4CDD0007.7060201@v.loewis.de>

> Additionally:
>
> d) Over a socket (like the HTTP protocol) -> Bytes.

Sure. However, you can't really expect that the bytes you receive over the socket are a meaningful filename on your local Windows installation. So it would be a bug in the application to not decode the bytes that you receive before using them as a file name.

In a well-specified network protocol, you would know the encoding of the bytes; IETF recommends to use UTF-8 for all new protocols. Using an UTF-8 string as a filename on Windows will create mojibake.

Regards,
Martin

From martin at v.loewis.de Fri Nov 12 10:29:31 2010
From: martin at v.loewis.de (=?ISO-8859-15?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Fri, 12 Nov 2010 10:29:31 +0100
Subject: [Python-Dev] buildbot master update
Message-ID: <4CDD08FB.3070701@v.loewis.de>

As you may have noticed: I updated the buildbot master to release 0.8.2. If you notice any problems, please post them here.

Slave operators can upgrade their installations at their own pace; buildbot is highly backwards compatible. As a recommendation, I suggest that slaves run at least at the version that is available in Debian stable (currently 0.7.8).

Regards,
Martin

From martin at v.loewis.de Fri Nov 12 10:32:46 2010
From: martin at v.loewis.de (=?ISO-8859-15?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Fri, 12 Nov 2010 10:32:46 +0100
Subject: [Python-Dev] bugs.python.org migration complete
Message-ID: <4CDD09BE.8090106@v.loewis.de>

bugs.python.org is now on the new hardware. There have been some problems in the migration: the old hardware would start failing before the scheduled migration date, so the migration was done early, causing outage for some people who then had the old address in their DNS caches.

In addition, there was initially a misconfiguration preventing outgoing IP traffic, particularly preventing outgoing emails from being delivered.

This is all fixed now; report any remaining issues to the metatracker.

Regards,
Martin

From hrvoje.niksic at avl.com Fri Nov 12 10:49:48 2010
From: hrvoje.niksic at avl.com (Hrvoje Niksic)
Date: Fri, 12 Nov 2010 10:49:48 +0100
Subject: [Python-Dev] Breaking undocumented API
In-Reply-To: <4CDC6D31.2040809@canterbury.ac.nz>
References: <64DF4272-FF17-4E82-96F5-1DA6CA3A06EC@gmail.com> <4CDAA27B.8040703@voidspace.org.uk> <4CDBDB0C.6080703@voidspace.org.uk> <4CDC6D31.2040809@canterbury.ac.nz>
Message-ID: <4CDD0DBC.4050405@avl.com>

On 11/11/2010 11:24 PM, Greg Ewing wrote:
> Nick Coghlan wrote:
>
>> My personal opinion is that we should be trying to get the standard
>> library to the point where __all__ definitions are unnecessary - if a
>> name isn't in __all__, it should start with an underscore (and if that
>> is true, then the __all__ definition becomes effectively redundant).
>
> What about names imported from other modules that are used by
> the module, but not intended for re-export? How would you
> prevent them from turning up in help() etc. without using
> __all__?

import foo as _foo

I believe I am not the only one who finds that practice ugly, but I find it just as ugly to underscore-ize every non-public helper function. __all__ is there for a reason, let's use it. Maybe help() could automatically ignore stuff not in __all__, or display it but warn the user of non-public identifiers?

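To illustrate what such a check might look like, here is a small sketch. It assumes the rule quoted from the language reference (`__all__` if defined, otherwise every name without a leading underscore). The function names `public_names` and `undeclared_names` are invented for this example; they are not part of `inspect`, `pydoc` or any other standard module.

```python
import os
import types


def public_names(module):
    """Names a star import would bind: __all__ if the module defines it,
    otherwise every global that does not start with an underscore."""
    declared = getattr(module, "__all__", None)
    if declared is not None:
        return list(declared)
    return [name for name in vars(module) if not name.startswith("_")]


def undeclared_names(module):
    """Non-underscored globals missing from __all__: candidates for the
    kind of warning suggested above. Empty if __all__ is not defined."""
    declared = getattr(module, "__all__", None)
    if declared is None:
        return []
    return sorted(
        name for name in vars(module)
        if not name.startswith("_") and name not in declared
    )


# Hypothetical module: one declared name, one imported module, one private.
demo = types.ModuleType("demo_module")
demo.__all__ = ["spam"]
demo.spam = lambda: "public"
demo.os = os
demo._cache = {}

# Hypothetical module without __all__.
plain = types.ModuleType("plain_module")
plain.eggs = 1
plain._hidden = 2

declared = public_names(demo)
flagged = undeclared_names(demo)
fallback = public_names(plain)
```

For `demo`, the imported `os` module is the only name flagged: it has no underscore but is left out of `__all__`, which is exactly the case the thread is arguing about.
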
From lukasz at langa.pl Fri Nov 12 11:34:01 2010
From: lukasz at langa.pl (=?UTF-8?B?xYF1a2FzeiBMYW5nYQ==?=)
Date: Fri, 12 Nov 2010 11:34:01 +0100
Subject: [Python-Dev] Breaking undocumented API
In-Reply-To:
References: <64DF4272-FF17-4E82-96F5-1DA6CA3A06EC@gmail.com> <4CDAA27B.8040703@voidspace.org.uk> <4CDBDB0C.6080703@voidspace.org.uk> <4CDC32F0.3010500@pearwood.info>
Message-ID: <4CDD1819.4020306@langa.pl>

On 11.11.2010 23:15, Nick Coghlan wrote:
> If we instead adopt the explicit policy that private APIs are:
> - imported modules (with the exception of os.path)
> - any names starting with a leading underscore
>
> Then we get the 3 API tiers you describe: core public API in __all__,
> other public functions and globals without leading underscores,
> private API with leading underscores (or imported modules).

+1

I like this approach *very much*. Let me elaborate:

1. The community knows, understands and accepts _names as private. We need to have _names for private functions and constants because we can change or remove those in later versions. It's very explicit: when the user complains "What, you removed _foo?" we can say "Yes, it was considered an implementation detail *from the start*." And it's hard to beat that argument. It was private from the start. You knew that because the name you called specifies that. If we were now to proclaim __all__ as a decisive point on what's private and what's not, it would make the lives of all Python programmers (I mean the users as well) more complicated.

2. That being said, having help() mark non-underscored names which aren't included in __all__ as private is a good idea, too [1]. I'm a heavy user of interactive API discovery using dir() and help() and this would be definitely welcome. And even for those who don't use those tools, this feature can expose inconsistencies between documentation and code.

3. "import name as _name" or "from x.y import z as _z" is just bad form.
There may be valid exceptions but imagine if that would be the default way to do it. Uglier than nights of November.

4. This is why I think considering all imports as private (unless they're in __all__) is a fine example of "practicality beats purity". We could try to conceive a way to expose this information programmatically but that's not so important at the moment.

[1] As Hrvoje Niksic wrote here: http://mail.python.org/pipermail/python-dev/2010-November/105533.html

-- 
Best regards,
Łukasz Langa

From fdrake at acm.org Fri Nov 12 12:23:31 2010
From: fdrake at acm.org (Fred Drake)
Date: Fri, 12 Nov 2010 06:23:31 -0500
Subject: [Python-Dev] [Python-checkins] r86429 - python/branches/py3k/Doc/tools/sphinxext/pyspecific.py
In-Reply-To: <20101112085712.F3D23EEA2D@mail.python.org>
References: <20101112085712.F3D23EEA2D@mail.python.org>
Message-ID:

On Fri, Nov 12, 2010 at 3:57 AM, georg.brandl wrote in a commit:
> Add a deprecated-removed directive that allows to give the version of removal for deprecations.

This sounds pretty general-purpose rather than Python-specific. Any chance this will move into Sphinx? I know a few projects that like to deprecate things and would use this. :-)

  -Fred

-- 
Fred L. Drake, Jr.
"A storm broke loose in my mind." --Albert Einstein

From victor.stinner at haypocalc.com Fri Nov 12 13:08:30 2010
From: victor.stinner at haypocalc.com (Victor Stinner)
Date: Fri, 12 Nov 2010 13:08:30 +0100
Subject: [Python-Dev] Removal of Win32 ANSI API
In-Reply-To:
References: <4CDC14C0.6070300@m2.ccsnet.ne.jp> <201011112026.24445.victor.stinner@haypocalc.com>
Message-ID: <201011121308.30368.victor.stinner@haypocalc.com>

On Thursday 11 November 2010 23:01:32 you wrote:
> > Sure, it will divide the number of lines, of the code specific to
> > Windows, by two.
>
> Can we get most of the code cleanup benefit without the backwards
> compatibility risk by doing the decode from 'mbcs' on our side of the
> fence?


I created PyUnicode_FSDecoder, a ParseTuple converter used to work on unicode paths, instead of bytes paths. On Windows, this converter uses the mbcs encoding in strict mode, whereas the Windows converter uses the replace error handler to decode, and ignore to encode. So I don't think that we should use this converter on Windows.

> That is, have code that was the C equivalent of:
>
>     arg_is_bytes = not isinstance(arg, str)
>     if arg_is_bytes:
>         val = _decode_mbcs(arg)
>         # Decoding error checking here
>     else:
>         val = arg
>     # Common processing using WIDE API
>     if arg_is_bytes:
>         result = _encode_mbcs(wide_result)
>         # Encoding error checking here
>     else:
>         result = wide_result

This doesn't make the code shorter, it may be longer than the current code, and it is less compliant with the Windows native API...

Victor

From victor.stinner at haypocalc.com Fri Nov 12 13:13:08 2010
From: victor.stinner at haypocalc.com (Victor Stinner)
Date: Fri, 12 Nov 2010 13:13:08 +0100
Subject: [Python-Dev] Removal of Win32 ANSI API
In-Reply-To: <20101111210243.264ccfb7@pitrou.net>
References: <4CDC14C0.6070300@m2.ccsnet.ne.jp> <4CDC47B4.5080200@v.loewis.de> <20101111210243.264ccfb7@pitrou.net>
Message-ID: <201011121313.08741.victor.stinner@haypocalc.com>

On Thursday 11 November 2010 21:02:43 Antoine Pitrou wrote:
> On Thu, 11 Nov 2010 20:44:52 +0100
>
> "Martin v. Löwis" wrote:
> > > How do you support cross-platform code using bytes filenames?
> > > IIRC, it has already been argued that it was an important feature. Many
> > > filesystem-related utilities might prefer to handle filenames in bytes
> > > form.
> >
> > It would be a policy decision. However, I think it is hearsay that
> > filesystem-related utilities might prefer byte file names.
>
> One possible situation is when you receive filenames in bytes form from
> an external API or tool (or even the contents of a file). If you don't
> know the encoding, keeping the bytes form is obviously recommended.

I disagree with you: the filename stored in the binary content/network stream may be encoded with a different code page than the current Windows code page. The application has to decode the filename itself; the application has more information about the right encoding than Windows.

Examples:
- MKV video stores filenames in utf-8
- ZIP stores filenames in cp437 or utf-8
- tar stores filenames... in the locale encoding (except for PAX format which uses utf-8)
- etc.

Victor

From victor.stinner at haypocalc.com Fri Nov 12 13:15:35 2010
From: victor.stinner at haypocalc.com (Victor Stinner)
Date: Fri, 12 Nov 2010 13:15:35 +0100
Subject: [Python-Dev] Removal of Win32 ANSI API
In-Reply-To: <4CDC490B.9060809@v.loewis.de>
References: <4CDC14C0.6070300@m2.ccsnet.ne.jp> <201011112026.24445.victor.stinner@haypocalc.com> <4CDC490B.9060809@v.loewis.de>
Message-ID: <201011121315.35541.victor.stinner@haypocalc.com>

On Thursday 11 November 2010 20:50:35 you wrote:
> > Even if I hate the MBCS encoding, because it replaces undecodable
> > characters by similar glyphs by default, I'm not certain that it is a
> > good idea to drop the bytes API. Can it be a problem to port programs
> > from Python2 to Python3? Do major Python2 programs/libraries rely on the
> > bytes API?
>
> I don't actually know for a fact, but I expect that the answer is "no".
>
> The question is: where do file names typically come from? My guess
> is that they come from
> a) hard-coded strings in the source code
> b) command line arguments/environment variables
> c) directory listings
> [of course, there are other ways, like GUI input, getcwd(), etc]
>
> In case a), you have filenames such as ".", e.g. as a parameter to
> listdir or walk. These will typically be regular strings in Python 2,
> which become Unicode strings in 3. You would actively need to put b""
> prefixes into the code.
>
> In case b), they will be Unicode strings in Python 3.
>
> In case c), they will be Unicode strings if the argument is a Unicode
> string. So by induction, file names will be typically unicode. The
> exception will be libraries/applications which make deliberate attempts
> to get byte-oriented file names.

Ok, good answer. In this case, I vote +1 to remove completely the ANSI version from all Python modules.

I consider the ANSI version as a compatibility layer for old applications written for MS-DOS or early versions of Windows. Even if these APIs are still widely used in C/C++ applications, the wide versions should always be preferred.

Victor

From solipsis at pitrou.net Fri Nov 12 14:40:29 2010
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Fri, 12 Nov 2010 14:40:29 +0100
Subject: [Python-Dev] Removal of Win32 ANSI API
References: <4CDC14C0.6070300@m2.ccsnet.ne.jp> <4CDC47B4.5080200@v.loewis.de> <20101111210243.264ccfb7@pitrou.net> <201011121313.08741.victor.stinner@haypocalc.com>
Message-ID: <20101112144029.00f6fdfc@pitrou.net>

On Fri, 12 Nov 2010 13:13:08 +0100
Victor Stinner wrote:
> On Thursday 11 November 2010 21:02:43 Antoine Pitrou wrote:
> > On Thu, 11 Nov 2010 20:44:52 +0100
> >
> > "Martin v. Löwis" wrote:
> > > > How do you support cross-platform code using bytes filenames?
> > > > IIRC, it has already been argued that it was an important feature. Many
> > > > filesystem-related utilities might prefer to handle filenames in bytes
> > > > form.
> > >
> > > It would be a policy decision. However, I think it is hearsay that
> > > filesystem-related utilities might prefer byte file names.
> >
> > One possible situation is when you receive filenames in bytes form from
> > an external API or tool (or even the contents of a file). If you don't
> > know the encoding, keeping the bytes form is obviously recommended.
>
> I disagree with you: the filename stored in the binary content/network stream
> may be encoded with a different code page than the current Windows code page.
> The application has to decode the filename itself; the application has more
> information about the right encoding than Windows.

I'm not talking about Windows obviously. POSIX filenames are natively bytes, so if you get a bytes filename from an external source, it makes sense to reuse the bytes form.

I think it would be a mistake to allow bytes filenames under POSIX but not under Windows. It makes porting harder.

> - tar stores filenames... in the locale encoding (except for PAX format which
> uses utf-8)

So bytes filenames are useful at least for tar. I'm sure there are many other cases (actually, most kinds of configuration files containing paths would apply).

Regards

Antoine.

From barry at python.org Fri Nov 12 17:15:53 2010
From: barry at python.org (Barry Warsaw)
Date: Fri, 12 Nov 2010 11:15:53 -0500
Subject: [Python-Dev] buildbot master update
In-Reply-To: <4CDD08FB.3070701@v.loewis.de>
References: <4CDD08FB.3070701@v.loewis.de>
Message-ID: <20101112111553.44da8c08@mission>

On Nov 12, 2010, at 10:29 AM, Martin v. Löwis wrote:

>As you may have noticed: I updated the buildbot master to release 0.8.2.
>If you notice any problems, please post them here.

Pretty! My buildbot seems fine.

>Slave operators can upgrade their installations at their own pace;
>buildbot is highly backwards compatible. As a recommendation, I suggest
>that slaves run at least at the version that is available in Debian
>stable (currently 0.7.8).

Thanks Martin, for all you do to keep our infrastructure humming along smoothly, including the recent Roundup migration.

-Barry
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc Type: application/pgp-signature Size: 836 bytes Desc: not available URL: From status at bugs.python.org Fri Nov 12 18:07:02 2010 From: status at bugs.python.org (Python tracker) Date: Fri, 12 Nov 2010 18:07:02 +0100 (CET) Subject: [Python-Dev] Summary of Python tracker Issues Message-ID: <20101112170702.8111B1DBD7@psf.upfronthosting.co.za> ACTIVITY SUMMARY (2010-11-05 - 2010-11-12) Python tracker at http://bugs.python.org/ To view or respond to any of the issues listed below, click on the issue. Do NOT respond to this message. Issues counts and deltas: open 2526 (+12) closed 19651 (+54) total 22177 (+66) Open issues with patches: 1050 Issues opened (47) ================== #9313: distutils error on MSVC older than 8 http://bugs.python.org/issue9313 reopened by eric.araujo #10252: Fix resource warnings in distutils http://bugs.python.org/issue10252 reopened by eric.araujo #10329: trace.py and unicode in Python 3 http://bugs.python.org/issue10329 reopened by belopolsky #10332: Multiprocessing maxtasksperchild results in hang http://bugs.python.org/issue10332 opened by Jimbofbx #10333: Remove ancient backwards compatibility GC API http://bugs.python.org/issue10333 opened by nascheme #10336: test_xmlrpc fails if gzip is not supported by client http://bugs.python.org/issue10336 opened by ocean-city #10338: test_lib2to3 failure on buildbot x86 debian parallel 3.x: node http://bugs.python.org/issue10338 opened by haypo #10339: test_lib2to3 leaks http://bugs.python.org/issue10339 opened by pitrou #10340: asyncore doesn't properly handle EINVAL on OSX http://bugs.python.org/issue10340 opened by giampaolo.rodola #10342: trace module cannot produce coverage reports for zipped module http://bugs.python.org/issue10342 opened by belopolsky #10344: codecs.readline doesn't care buffering=0 http://bugs.python.org/issue10344 opened by Santiago.Piccinini #10348: multiprocessing: use SysV semaphores on FreeBSD http://bugs.python.org/issue10348 opened by haypo 
#10349: Error in Module/python.c when building on OpenBSD 4.8 http://bugs.python.org/issue10349 opened by pgurumur #10350: errno is read too late http://bugs.python.org/issue10350 opened by hfuru #10351: Add autocompletion for keys in dictionaries http://bugs.python.org/issue10351 opened by Valery.Khamenya #10354: tempfile.template is broken http://bugs.python.org/issue10354 opened by giampaolo.rodola #10355: SpooledTemporaryFile's name property is broken http://bugs.python.org/issue10355 opened by giampaolo.rodola #10356: decimal.py: hash of -1 http://bugs.python.org/issue10356 opened by skrah #10357: ** and "mapping" are poorly defined in python docs http://bugs.python.org/issue10357 opened by Fergal.Daly #10358: Doc styles for print should only use dark colors http://bugs.python.org/issue10358 opened by fdrake #10359: ISO C cleanup http://bugs.python.org/issue10359 opened by hfuru #10360: _weakrefset.WeakSet.__contains__ should not propagate TypeErro http://bugs.python.org/issue10360 opened by tseaver #10362: AttributeError: addinfourl instance has no attribute 'tell' http://bugs.python.org/issue10362 opened by Valentin.Lorentz #10363: Embedded python, handle (memory) leak http://bugs.python.org/issue10363 opened by martind #10364: Color coding fails after running program. 
http://bugs.python.org/issue10364 opened by Typo #10365: IDLE Crashes on File Open Dialog when code window closed befor http://bugs.python.org/issue10365 opened by william.barr #10366: Remove unneeded '(object)' from 3.x class examples http://bugs.python.org/issue10366 opened by terry.reedy #10367: "python setup.py sdist upload --show-response" can fail with " http://bugs.python.org/issue10367 opened by jcea #10369: tarfile requires an actual file on disc; a file-like object is http://bugs.python.org/issue10369 opened by strombrg #10371: Deprecate trace module undocumented API http://bugs.python.org/issue10371 opened by belopolsky #10373: Setup Script example incorrect http://bugs.python.org/issue10373 opened by lensart #10374: setup.py caches outdated scripts in the build tree http://bugs.python.org/issue10374 opened by gjb1002 #10375: 2to3 print(single argument) http://bugs.python.org/issue10375 opened by hfuru #10376: ZipFile unzip is unbuffered http://bugs.python.org/issue10376 opened by Jimbofbx #10377: cProfile incorrectly labels its output http://bugs.python.org/issue10377 opened by exarkun #10379: locale.format() input regression http://bugs.python.org/issue10379 opened by barry #10381: Add timezone support to datetime C API http://bugs.python.org/issue10381 opened by belopolsky #10382: Command line error marker misplaced on unicode entry http://bugs.python.org/issue10382 opened by belopolsky #10383: test_os leaks under Windows http://bugs.python.org/issue10383 opened by pitrou #10384: SyntaxError should contain exact location of the invalid chara http://bugs.python.org/issue10384 opened by belopolsky #10385: Mark up "subprocess" as module in its doc http://bugs.python.org/issue10385 opened by belopolsky #10388: spwd returning different value depending on privileges http://bugs.python.org/issue10388 opened by giampaolo.rodola #10391: obj2ast's error handling can lead to python crashing with a C- http://bugs.python.org/issue10391 opened by dmalcolm #10392: 
GZipFile crash when fileobj.mode is None http://bugs.python.org/issue10392 opened by bgreenlee #10394: subprocess Popen deadlock http://bugs.python.org/issue10394 opened by Christoph.Mathys #10395: os.path.commonprefix broken by design http://bugs.python.org/issue10395 opened by ronaldoussoren #10345: fcntl.ioctl always fails claiming an invalid fd http://bugs.python.org/issue10345 opened by bgamari Most recent 15 issues with no replies (15) ========================================== #10394: subprocess Popen deadlock http://bugs.python.org/issue10394 #10392: GZipFile crash when fileobj.mode is None http://bugs.python.org/issue10392 #10388: spwd returning different value depending on privileges http://bugs.python.org/issue10388 #10384: SyntaxError should contain exact location of the invalid chara http://bugs.python.org/issue10384 #10381: Add timezone support to datetime C API http://bugs.python.org/issue10381 #10377: cProfile incorrectly labels its output http://bugs.python.org/issue10377 #10375: 2to3 print(single argument) http://bugs.python.org/issue10375 #10373: Setup Script example incorrect http://bugs.python.org/issue10373 #10350: errno is read too late http://bugs.python.org/issue10350 #10339: test_lib2to3 leaks http://bugs.python.org/issue10339 #10338: test_lib2to3 failure on buildbot x86 debian parallel 3.x: node http://bugs.python.org/issue10338 #10332: Multiprocessing maxtasksperchild results in hang http://bugs.python.org/issue10332 #10320: printf %qd is nonstandard http://bugs.python.org/issue10320 #10310: signed:1 bitfields rarely make sense http://bugs.python.org/issue10310 #10309: dlmalloc.c needs _GNU_SOURCE for mremap() http://bugs.python.org/issue10309 Most recent 15 issues waiting for review (15) ============================================= #10392: GZipFile crash when fileobj.mode is None http://bugs.python.org/issue10392 #10391: obj2ast's error handling can lead to python crashing with a C- http://bugs.python.org/issue10391 #10385: Mark up 
"subprocess" as module in its doc http://bugs.python.org/issue10385 #10382: Command line error marker misplaced on unicode entry http://bugs.python.org/issue10382 #10371: Deprecate trace module undocumented API http://bugs.python.org/issue10371 #10369: tarfile requires an actual file on disc; a file-like object is http://bugs.python.org/issue10369 #10360: _weakrefset.WeakSet.__contains__ should not propagate TypeErro http://bugs.python.org/issue10360 #10359: ISO C cleanup http://bugs.python.org/issue10359 #10356: decimal.py: hash of -1 http://bugs.python.org/issue10356 #10354: tempfile.template is broken http://bugs.python.org/issue10354 #10351: Add autocompletion for keys in dictionaries http://bugs.python.org/issue10351 #10350: errno is read too late http://bugs.python.org/issue10350 #10342: trace module cannot produce coverage reports for zipped module http://bugs.python.org/issue10342 #10340: asyncore doesn't properly handle EINVAL on OSX http://bugs.python.org/issue10340 #10329: trace.py and unicode in Python 3 http://bugs.python.org/issue10329 Top 10 most discussed issues (10) ================================= #10329: trace.py and unicode in Python 3 http://bugs.python.org/issue10329 11 msgs #7061: Improve turtle module documentation http://bugs.python.org/issue7061 9 msgs #10354: tempfile.template is broken http://bugs.python.org/issue10354 9 msgs #10359: ISO C cleanup http://bugs.python.org/issue10359 9 msgs #10379: locale.format() input regression http://bugs.python.org/issue10379 9 msgs #10325: PY_LLONG_MAX & co - preprocessor constants or not? http://bugs.python.org/issue10325 8 msgs #5412: extend configparser to support mapping access(__*item__) http://bugs.python.org/issue5412 7 msgs #10252: Fix resource warnings in distutils http://bugs.python.org/issue10252 7 msgs #10349: Error in Module/python.c when building on OpenBSD 4.8 http://bugs.python.org/issue10349 7 msgs #10364: Color coding fails after running program. 
http://bugs.python.org/issue10364 7 msgs Issues closed (51) ================== #1602: windows console doesn't print utf8 (Py30a2) http://bugs.python.org/issue1602 closed by haypo #1926: NNTPS support in nntplib http://bugs.python.org/issue1926 closed by pitrou #6058: Add cp65001 to encodings/aliases.py http://bugs.python.org/issue6058 closed by haypo #6226: Inconsistent 'file' vs 'stream' kwarg in pprint, other stdlibs http://bugs.python.org/issue6226 closed by eric.araujo #6317: winsound.PlaySound doesn't accept non-unicode string http://bugs.python.org/issue6317 closed by ocean-city #8634: get method for dbm interface http://bugs.python.org/issue8634 closed by eric.araujo #8679: write a distutils to distutils2 converter http://bugs.python.org/issue8679 closed by eric.araujo #8804: http.client should support SSL contexts http://bugs.python.org/issue8804 closed by pitrou #9421: configparser.ConfigParser's getint, getboolean and getfloat do http://bugs.python.org/issue9421 closed by lukasz.langa #9508: python3.2 reversal of distutils reintrocud macos9 support http://bugs.python.org/issue9508 closed by eric.araujo #10008: Two links point to same place http://bugs.python.org/issue10008 closed by georg.brandl #10022: Emit more information in decoded SSL certificates http://bugs.python.org/issue10022 closed by pitrou #10145: float.is_integer is undocumented http://bugs.python.org/issue10145 closed by mark.dickinson #10180: File objects should not pickleable http://bugs.python.org/issue10180 closed by pitrou #10226: urlparse example is wrong http://bugs.python.org/issue10226 closed by orsenthil #10229: Refleak run of test_gettext fails http://bugs.python.org/issue10229 closed by eric.araujo #10232: Tkinter issues with Scrollbar and custom widget list http://bugs.python.org/issue10232 closed by terry.reedy #10245: Fix resource warnings in test_telnetlib http://bugs.python.org/issue10245 closed by orsenthil #10282: IMPLEMENTATION token differently delt with in NNTP 
capability http://bugs.python.org/issue10282 closed by pitrou #10297: decimal module documentation is misguiding http://bugs.python.org/issue10297 closed by mark.dickinson #10302: Add class-functions to hash many small objects with hashlib http://bugs.python.org/issue10302 closed by gregory.p.smith #10303: small inconsistency in tutorial http://bugs.python.org/issue10303 closed by orsenthil #10304: error in tutorial triple-string example http://bugs.python.org/issue10304 closed by terry.reedy #10311: Signal handlers must preserve errno http://bugs.python.org/issue10311 closed by pitrou #10321: Add support for Message objects and binary data to smtplib.sen http://bugs.python.org/issue10321 closed by r.david.murray #10324: Modules/binascii.c: simplify expressions http://bugs.python.org/issue10324 closed by orsenthil #10327: Abnormal SSL timeouts when using socket timeouts - once again http://bugs.python.org/issue10327 closed by pakal #10330: trace module doesn't work without threads http://bugs.python.org/issue10330 closed by belopolsky #10331: test_gdb failure when warnings printed out http://bugs.python.org/issue10331 closed by dmalcolm #10334: Add new reST directive for links to source code http://bugs.python.org/issue10334 closed by georg.brandl #10335: tokenize.open(): open a file with encoding detected from a cod http://bugs.python.org/issue10335 closed by haypo #10337: testTanh() of test_math fails on "NetBSD 5 i386 3.x" http://bugs.python.org/issue10337 closed by haypo #10341: Remove traces of setuptools http://bugs.python.org/issue10341 closed by eric.araujo #10343: urllib.parse problems with bytes vs str http://bugs.python.org/issue10343 closed by r.david.murray #10346: strange arithmetic behaviour http://bugs.python.org/issue10346 closed by mark.dickinson #10347: regrtest progress counter makes -f option less useful http://bugs.python.org/issue10347 closed by pitrou #10352: rlcompleter.py has no tests in trunk http://bugs.python.org/issue10352 closed by 
georg.brandl #10353: 2to3 and places argument in unitests assertAlmostEqual http://bugs.python.org/issue10353 closed by r.david.murray #10361: Fix issue 9995 - distutils forces developers to store password http://bugs.python.org/issue10361 closed by eric.araujo #10368: "python setup.py sdist upload --show-response" fails http://bugs.python.org/issue10368 closed by eric.araujo #10370: py3 readlines() reports wrong offset for UnicodeDecodeError http://bugs.python.org/issue10370 closed by haypo #10372: [REGRESSION] test_gc fails in non-debug mode. http://bugs.python.org/issue10372 closed by pitrou #10378: Typo in results of help(divmod) http://bugs.python.org/issue10378 closed by benjamin.peterson #10380: AttributeError: 'module' object has no attribute 'exc_tracebac http://bugs.python.org/issue10380 closed by georg.brandl #10386: token module should define __all__ http://bugs.python.org/issue10386 closed by belopolsky #10387: ConfigParser's getboolean method is broken http://bugs.python.org/issue10387 closed by lukasz.langa #10389: Document rules for use of case in section titles http://bugs.python.org/issue10389 closed by belopolsky #10390: json.load should handle bytes input http://bugs.python.org/issue10390 closed by r.david.murray #10393: "with" statement isn't thread-safe http://bugs.python.org/issue10393 closed by amaury.forgeotdarc #1466065: base64 module ignores non-alphabet characters http://bugs.python.org/issue1466065 closed by r.david.murray #962772: when both maintainer and author provided, author discarded http://bugs.python.org/issue962772 closed by tarek From tjreedy at udel.edu Fri Nov 12 18:07:44 2010 From: tjreedy at udel.edu (Terry Reedy) Date: Fri, 12 Nov 2010 12:07:44 -0500 Subject: [Python-Dev] Issues 9931 and 9055 - test_ttk_guionly and buildbot run as a service In-Reply-To: References: Message-ID: On 11/12/2010 3:44 AM, Paul Moore wrote: > Hi, > My buildbot has been failing for some time because of these 2 issues, > both related to the fact 
that tests are hanging when run as a service
> (and hence have no display to open GUI elements on). Both issues have
> patches, and as far as I am aware, the patches fix the issues
> reasonably well. What can I do to move these 2 issues forwards? As
> things stand, my buildbot is not providing a lot of value on the 3.x
> branch :-(

http://bugs.python.org/issue9055 is marked as a 2.7 issue only, perhaps fixed by Tim Golden's committed patches. Should it be re-versioned for 3.1/2? There is no patch file attached, though perhaps the code in Yamamoto's message is meant as such (but for which version?). So the first thing you could do is clarify the current status and remaining issue on the tracker.

http://bugs.python.org/issue9931 by Yamamoto is marked for all 3 versions. It seems to be a similar issue, though marked 'test' rather than 'ctypes'. It does have a patch by him apparently based on his previous comments. The issue has no responses and needs a patch review. So the first thing you could do is to provide one;-). If it looks great (no changes that you can think of) and works great, say so. Then it can move on to commit review stage.

PS. Providing links like the above makes it easier for multiple people to take a look and respond.

--
Terry Jan Reedy

From tjreedy at udel.edu Fri Nov 12 18:11:40 2010
From: tjreedy at udel.edu (Terry Reedy)
Date: Fri, 12 Nov 2010 12:11:40 -0500
Subject: [Python-Dev] bugs.python.org migration complete
In-Reply-To: <4CDD09BE.8090106@v.loewis.de>
References: <4CDD09BE.8090106@v.loewis.de>
Message-ID:

On 11/12/2010 4:32 AM, "Martin v. Löwis" wrote:
> bugs.python.org is now on the new hardware. There have been some
> problems in the migration: the old hardware would start failing before
> the scheduled migration date, so the migration was done early, causing
> outage for some people who then had the old address in their DNS caches.
> In addition, there was initially a misconfiguration preventing outgoing
> IP traffic, particularly preventing outgoing emails from being
> delivered. This is all fixed now; report any remaining issues to the
> metatracker.

I got stymied by some of the late failures, but it has been working great, with quick response, since last night. Thanks for the upgrade.

--
Terry Jan Reedy

From p.f.moore at gmail.com Fri Nov 12 18:25:05 2010
From: p.f.moore at gmail.com (Paul Moore)
Date: Fri, 12 Nov 2010 17:25:05 +0000
Subject: [Python-Dev] buildbot master update
In-Reply-To: <20101112111553.44da8c08@mission>
References: <4CDD08FB.3070701@v.loewis.de> <20101112111553.44da8c08@mission>
Message-ID:

On 12 November 2010 16:15, Barry Warsaw wrote:
> On Nov 12, 2010, at 10:29 AM, Martin v. Löwis wrote:
>
>>As you may have noticed: I updated the buildbot master to release 0.8.2.
>>If you notice any problems, please post them here.
>
> Pretty! My buildbot seems fine.

Yes, I like the new look.

>>Slave operators can upgrade their installations at their own pace;
>>buildbot is highly backwards compatible. As a recommendation, I suggest
>>that slaves run at least at the version that is available in Debian
>>stable (currently 0.7.8).
>
> Thanks Martin, for all you do to keep our infrastructure humming along
> smoothly, including the recent Roundup migration.

Thanks from me, too!
Paul

From solipsis at pitrou.net Fri Nov 12 20:42:00 2010
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Fri, 12 Nov 2010 20:42:00 +0100
Subject: [Python-Dev] r86418 - in python/branches/release27-maint: Doc/library/difflib.rst Lib/difflib.py Lib/test/test_difflib.py Misc/NEWS
References: <20101111232219.6AC11EEA01@mail.python.org>
Message-ID: <20101112204200.32856238@pitrou.net>

Hello,

On Fri, 12 Nov 2010 00:22:19 +0100 (CET)
terry.reedy wrote:
> +
> + .. versionadded:: 2.7
> +    The *autojunk* parameter.

Maybe I've missed something, but is there any reason to add a new parameter in a bugfix release?
(apart from security issues)

Regards

Antoine.

From martin at v.loewis.de Fri Nov 12 20:44:34 2010
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Fri, 12 Nov 2010 20:44:34 +0100
Subject: [Python-Dev] Removal of Win32 ANSI API
In-Reply-To: <20101112144029.00f6fdfc@pitrou.net>
References: <4CDC14C0.6070300@m2.ccsnet.ne.jp> <4CDC47B4.5080200@v.loewis.de> <20101111210243.264ccfb7@pitrou.net> <201011121313.08741.victor.stinner@haypocalc.com> <20101112144029.00f6fdfc@pitrou.net>
Message-ID: <4CDD9922.4090309@v.loewis.de>

> I'm not talking about Windows obviously. POSIX filenames are natively
> bytes, so if you get a bytes filename from an external source, it makes
> sense to reuse the bytes form.
>
> I think it would be a mistake to allow bytes filenames under POSIX but
> not under Windows. It makes porting harder.

Not really. People who want to write portable code should use Unicode filenames everywhere, not byte filenames.

>
>> - tar stores filenames... in the locale encoding (except for PAX format which
>> uses utf-8)
>
> So bytes filenames are useful at least for tar.

No, they are not. The tarfile module decodes all file names on its own, IIUC.

Regards,
Martin

From martin at v.loewis.de Fri Nov 12 20:46:27 2010
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Fri, 12 Nov 2010 20:46:27 +0100
Subject: [Python-Dev] Removal of Win32 ANSI API
In-Reply-To: <201011121315.35541.victor.stinner@haypocalc.com>
References: <4CDC14C0.6070300@m2.ccsnet.ne.jp> <201011112026.24445.victor.stinner@haypocalc.com> <4CDC490B.9060809@v.loewis.de> <201011121315.35541.victor.stinner@haypocalc.com>
Message-ID: <4CDD9993.5080709@v.loewis.de>

> Ok, good answer. In this case, I vote +1 to remove completely the ANSI version
> from all Python modules.

I think caution is still necessary. So I propose to deprecate byte filenames on Windows in 3.2, with removal in 3.3.
People who think this is a terrible mistake and breaks their applications with no hope of a sensible solution can then still intervene.

Regards,
Martin

From martin at v.loewis.de Fri Nov 12 20:53:00 2010
From: martin at v.loewis.de (=?ISO-8859-15?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Fri, 12 Nov 2010 20:53:00 +0100
Subject: [Python-Dev] buildbot master update
In-Reply-To: <20101112111553.44da8c08@mission>
References: <4CDD08FB.3070701@v.loewis.de> <20101112111553.44da8c08@mission>
Message-ID: <4CDD9B1C.3070703@v.loewis.de>

> Thanks Martin, for all you do to keep our infrastructure humming along
> smoothly, including the recent Roundup migration.

I just write the announcements :-) In this case, thanks should also extend to Izak Burger of Upfront Hosting who did most of the setup (I just did the DNS changes), and to bitdancer who investigated (together with Izak) the configuration problems of the new installation.

Regards,
Martin

From solipsis at pitrou.net Fri Nov 12 21:07:52 2010
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Fri, 12 Nov 2010 21:07:52 +0100
Subject: [Python-Dev] buildbot master update
References: <4CDD08FB.3070701@v.loewis.de> <20101112111553.44da8c08@mission> <4CDD9B1C.3070703@v.loewis.de>
Message-ID: <20101112210752.00528fcd@pitrou.net>

On Fri, 12 Nov 2010 20:53:00 +0100
"Martin v. Löwis" wrote:
> > Thanks Martin, for all you do to keep our infrastructure humming along
> > smoothly, including the recent Roundup migration.
>
> I just write the announcements :-) In this case, thanks should also
> extend to Izak Burger of Upfront Hosting who did most of the setup
> (I just did the DNS changes), and to bitdancer who
> investigated (together with Izak) the configuration problems of the new
> installation.

And for the record, bitdancer is R. David Murray :-)

cheers

Antoine.
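
[Archive annotation, not part of any archived message.] In the "Removal of Win32 ANSI API" thread above, one recurring point is that an application receiving a filename as bytes (from an archive format, a network stream, a config file) often knows the right encoding better than the operating system does, and that portable code is easier to write with text filenames. The following is only a minimal sketch of that idea for Python 3.2 or later. The helper name to_text_path and its source_encoding parameter are hypothetical, introduced here for illustration; os.fsdecode is the standard-library fallback that uses the interpreter's filesystem encoding.

```python
import os


def to_text_path(name, source_encoding=None):
    """Return a str filename for a name that may arrive as bytes.

    to_text_path and source_encoding are illustrative names, not an
    API proposed anywhere in the thread.
    """
    if isinstance(name, str):
        # Already text: nothing to decode.
        return name
    if source_encoding is not None:
        # The caller knows the encoding (e.g. "utf-8" for a PAX tar entry),
        # so decode strictly and fail loudly on undecodable bytes.
        return name.decode(source_encoding, "strict")
    # No better information: fall back to the filesystem encoding.
    return os.fsdecode(name)


# ascii() keeps the demonstration independent of the console encoding.
print(ascii(to_text_path(b"caf\xc3\xa9.txt", "utf-8")))
print(ascii(to_text_path("already-text.txt")))
```

Whether falling back to the filesystem encoding is acceptable depends on the application; the thread itself does not settle that question.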

From hnassrat at gmail.com Fri Nov 12 21:08:42 2010
From: hnassrat at gmail.com (Hatem Nassrat)
Date: Fri, 12 Nov 2010 13:08:42 -0700
Subject: [Python-Dev] Closures / Python Scopes
Message-ID:

A colleague of mine came across something anecdotal when working with lambdas, it is expressed by the following code snippet.

In [1]: def a():
   ...:     for i in range(10):
   ...:         def b():
   ...:             return i
   ...:         yield b
   ...:
   ...:

In [2]: funcs = list(a())

In [3]: print [f() for f in funcs]
[9, 9, 9, 9, 9, 9, 9, 9, 9, 9]

I understand that for loops in python do not have a scope, neither do if statements, and python is awesome for that. Is this something accidental? i.e. will python ever evolve into having scopes for if and for loops (and other blocks that are not functions)? the reason I ask is with the introduction of http://docs.python.org/py3k/reference/simple_stmts.html#nonlocal it seems like something that can happen.

--
Hatem Nassrat

From tjreedy at udel.edu Fri Nov 12 21:32:21 2010
From: tjreedy at udel.edu (Terry Reedy)
Date: Fri, 12 Nov 2010 15:32:21 -0500
Subject: [Python-Dev] r86418 - in python/branches/release27-maint: Doc/library/difflib.rst Lib/difflib.py Lib/test/test_difflib.py Misc/NEWS
In-Reply-To: <20101112204200.32856238@pitrou.net>
References: <20101111232219.6AC11EEA01@mail.python.org> <20101112204200.32856238@pitrou.net>
Message-ID:

On 11/12/2010 2:42 PM, Antoine Pitrou wrote:
>
> Hello,
>
> On Fri, 12 Nov 2010 00:22:19 +0100 (CET)
> terry.reedy wrote:
>> +
>> + .. versionadded:: 2.7
>> +    The *autojunk* parameter.
>
> Maybe I've missed something, but is there any reason to add a new
> parameter in a bugfix release?
> (apart from security issues)

This is a bugfix. We discussed this (with Tim's participation) here last July/August and pretty well agreed that this was the least obnoxious solution to a bad situation.
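
[Archive annotation, not part of any archived message.] On the "Closures / Python Scopes" question posted earlier in this digest: the [9, 9, ...] result follows from the closure looking up i when the inner function is called, not when it is defined, and by then the loop has finished. The sketch below restates the snippet in Python 3 and contrasts it with one common workaround, binding the current value through a default argument. The function names make_late_bound and make_early_bound are illustrative only, and a list is used instead of the original generator to keep the example short.

```python
def make_late_bound():
    """Mirror the snippet from the thread: each b() reads i at call time."""
    funcs = []
    for i in range(10):
        def b():
            return i  # looked up when b() runs, after the loop has ended
        funcs.append(b)
    return funcs


def make_early_bound():
    """Workaround: a default argument captures the value at definition time."""
    funcs = []
    for i in range(10):
        def b(i=i):  # the default is evaluated once, when b is defined
            return i
        funcs.append(b)
    return funcs


late = [f() for f in make_late_bound()]
early = [f() for f in make_early_bound()]
print(late)
print(early)
```

This only illustrates current behaviour; it takes no position on whether loops should ever get their own scope.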
-- Terry Jan Reedy From tjreedy at udel.edu Fri Nov 12 21:38:19 2010 From: tjreedy at udel.edu (Terry Reedy) Date: Fri, 12 Nov 2010 15:38:19 -0500 Subject: [Python-Dev] Closures / Python Scopes In-Reply-To: References: Message-ID: On 11/12/2010 3:08 PM, Hatem Nassrat wrote: > A colleague of mine came across something anecdotal when working with > lambdas, it is expressed by the following code snippet. > > In [1]: def a(): > ...: for i in range(10): > ...: def b(): > ...: return i > ...: yield b > ...: > ...: > > In [2]: funcs = list(a()) > > In [3]: print [f() for f in funcs] > [9, 9, 9, 9, 9, 9, 9, 9, 9, 9] > > > I understand that for loops in python do not have a scope, neither do > if statements, and python is awesome for that. Is this something > accidental? i.e. will python ever evolve into having scopes for if and > for loops (and other blocks that are not functions)? the reason I ask > is with the introduction of > http://docs.python.org/py3k/reference/simple_stmts.html#nonlocal it > seems like something that can happen. Question/discussion issues like this belong on python-list or python-ideas list. -- Terry Jan Reedy From tjreedy at udel.edu Fri Nov 12 21:53:17 2010 From: tjreedy at udel.edu (Terry Reedy) Date: Fri, 12 Nov 2010 15:53:17 -0500 Subject: [Python-Dev] r86418 - in python/branches/release27-maint: Doc/library/difflib.rst Lib/difflib.py Lib/test/test_difflib.py Misc/NEWS In-Reply-To: References: <20101111232219.6AC11EEA01@mail.python.org> <20101112204200.32856238@pitrou.net> Message-ID: On 11/12/2010 3:32 PM, Terry Reedy wrote: > On 11/12/2010 2:42 PM, Antoine Pitrou wrote: >> >> Hello, >> >> On Fri, 12 Nov 2010 00:22:19 +0100 (CET) >> terry.reedy wrote: >>> + >>> + .. versionadded:: 2.7 >>> + The *autojunk* parameter. I just realized that this should say 2.7.1 so people know not to use it with the original 2.7. I will repeat it again in the SequenceMatcher section. 
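As a minimal sketch of what the *autojunk* parameter changes, assuming Python 2.7.1 or later (or any 3.2+ release) where SequenceMatcher accepts it: the two strings below are made up for illustration. With the default autojunk=True, any element making up more than roughly 1% of a second sequence of at least 200 items is dropped from the matcher's index, so two sequences built from a couple of heavily repeated characters can come out as sharing nothing.

```python
import difflib

# Hypothetical inputs: 300 items each, built from two heavily repeated
# characters, and offset from each other by one position.
a = "ba" * 150
b = "ab" * 150

# Default behaviour: the "popular element" heuristic is active.
with_heuristic = difflib.SequenceMatcher(None, a, b)

# New optional parameter: switch the heuristic off.
without_heuristic = difflib.SequenceMatcher(None, a, b, autojunk=False)

print("autojunk=True :", with_heuristic.ratio())
print("autojunk=False:", without_heuristic.ratio())
```

Keeping the default at True preserves the original 2.7 behaviour for existing callers, which appears to be the compromise described in this thread.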
>> Maybe I've missed something, but is there any reason to add a new >> parameter in a bugfix release? >> (apart from security issues) > > This is a bugfix. We discussed this (with Tim's participation) here last > July/August and pretty well agreed that this was the least obnoxious > solution to a bad situation. -- Terry Jan Reedy From ncoghlan at gmail.com Sat Nov 13 01:45:22 2010 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 13 Nov 2010 10:45:22 +1000 Subject: [Python-Dev] Removal of Win32 ANSI API In-Reply-To: <4CDD9993.5080709@v.loewis.de> References: <4CDC14C0.6070300@m2.ccsnet.ne.jp> <201011112026.24445.victor.stinner@haypocalc.com> <4CDC490B.9060809@v.loewis.de> <201011121315.35541.victor.stinner@haypocalc.com> <4CDD9993.5080709@v.loewis.de> Message-ID: On Sat, Nov 13, 2010 at 5:46 AM, "Martin v. Löwis" wrote: >> Ok, good answer. In this case, I vote +1 to remove completely the ANSI version >> from all Python modules. > > I think caution is still necessary. So I propose to deprecate byte > filenames on Windows in 3.2, with removal in 3.3. People who think this > is a terrible mistake and breaks their applications with no hope of a > sensible solution can then still intervene. I was going to suggest much the same thing. Regards, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Sat Nov 13 01:51:03 2010 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 13 Nov 2010 10:51:03 +1000 Subject: [Python-Dev] r86418 - in python/branches/release27-maint: Doc/library/difflib.rst Lib/difflib.py Lib/test/test_difflib.py Misc/NEWS In-Reply-To: References: <20101111232219.6AC11EEA01@mail.python.org> <20101112204200.32856238@pitrou.net> Message-ID: On Sat, Nov 13, 2010 at 6:32 AM, Terry Reedy wrote: > On 11/12/2010 2:42 PM, Antoine Pitrou wrote: >> Maybe I've missed something, but is there any reason to add a new >> parameter in a bugfix release? >> (apart from security issues) > > This is a bugfix.
We discussed this (with Tim's participation) here last > July/August and pretty well agreed that this was the least obnoxious > solution to a bad situation. Yep, as Terry said, the current behaviour is irredeemably broken in some situations, but switching it off completely would break other cases. Adding a new optional parameter that defaulted to the 2.7 behaviour was considered the least-bad option out of those available (do nothing, add parameter, change default behaviour, add new API). Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From tjreedy at udel.edu Sat Nov 13 02:31:49 2010 From: tjreedy at udel.edu (Terry Reedy) Date: Fri, 12 Nov 2010 20:31:49 -0500 Subject: [Python-Dev] [Python-checkins] r86441 - python/branches/py3k/Lib/test/test_nntplib.py In-Reply-To: <20101113002853.526F8EEA40@mail.python.org> References: <20101113002853.526F8EEA40@mail.python.org> Message-ID: <4CDDEA85.6050907@udel.edu> On 11/12/2010 7:28 PM, antoine.pitrou wrote: > Author: antoine.pitrou > Date: Sat Nov 13 01:28:53 2010 > New Revision: 86441 > > Log: > Switch from gmane to another provider for NNTP tests (as gmane isn't reliable > enough). Also, use setUpClass in order to connect only once per test run. > class NetworkedNNTP_SSLTests(NetworkedNNTPTestsMixin, unittest.TestCase): > - NNTP_HOST = 'snews.gmane.org' > - GROUP_NAME = 'gmane.comp.python.devel' > - GROUP_PAT = 'gmane.comp.python.d*' gmane is most problematical on weekends. > + NNTP_HOST = 'nntp.aioe.org' > + GROUP_NAME = 'comp.lang.python' > + GROUP_PAT = 'comp.lang.*' aioe went away for several months a couple of years ago or so. Let us hope it stays up for awhile now. The ssl connection currently does not work (expired certificate).
Terry From greg.ewing at canterbury.ac.nz Sat Nov 13 04:05:35 2010 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sat, 13 Nov 2010 16:05:35 +1300 Subject: [Python-Dev] Breaking undocumented API In-Reply-To: <4CDC83B3.307@pearwood.info> References: <64DF4272-FF17-4E82-96F5-1DA6CA3A06EC@gmail.com> <4CDAA27B.8040703@voidspace.org.uk> <4CDBDB0C.6080703@voidspace.org.uk> <4CDC32F0.3010500@pearwood.info> <4CDC6CBF.7060500@canterbury.ac.nz> <4CDC83B3.307@pearwood.info> Message-ID: <4CDE007F.9010903@canterbury.ac.nz> Steven D'Aprano wrote: > By the way, did you intend to send this off-list? No, I didn't realise I hadn't sent it to the list. If you don't document them, I won't use them, because I won't know if it's one of these don't-ask-don't-tell pseudo-public functions or something private that's accidentally been given a non-underscore name. > Greg Ewing wrote: >> Also it means that help() wouldn't show me documentation for >> the support functions, which is a bad thing if they really are >> intended for public use. > > I don't see why... if you import the module, and call help(module), they > will show up as normal. If the module has an __all__ list, then help(module) will only show functions included in that list. So your pseudo-public functions would not show up in it. Without some other reason to suspect their existence, I would probably never find them. 
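The effect on help() described above can be checked with a small sketch. The names used here (demo_mod, public_api, undocumented_helper) are hypothetical, and the module is assembled in memory rather than imported from a file. The sketch assumes that pydoc, which backs help(), restricts its listing to the names in __all__ whenever a module defines it.

```python
import pydoc
import types

# Hypothetical module source, purely for illustration.
source = '''
__all__ = ["public_api"]

def public_api():
    """Part of the documented interface."""

def undocumented_helper():
    """Callable from outside, but left out of __all__."""
'''

# Build the module in memory instead of importing it from disk.
demo_mod = types.ModuleType("demo_mod")
exec(source, demo_mod.__dict__)

# Render the text that help(demo_mod) would page, using the plain-text
# renderer so that names are not obscured by overstrike bold sequences.
text = pydoc.render_doc(demo_mod, renderer=pydoc.plaintext)
print(text)

# The helper is still reachable by name; it is only hidden from the listing.
print(callable(demo_mod.undocumented_helper))
```

If the printed listing omits undocumented_helper, a reader relying on help() alone would have no hint that the function exists, which is the concern raised in this message.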
-- Greg From guido at python.org Sat Nov 13 05:38:16 2010 From: guido at python.org (Guido van Rossum) Date: Fri, 12 Nov 2010 20:38:16 -0800 Subject: [Python-Dev] Breaking undocumented API In-Reply-To: <4CDD0DBC.4050405@avl.com> References: <64DF4272-FF17-4E82-96F5-1DA6CA3A06EC@gmail.com> <4CDAA27B.8040703@voidspace.org.uk> <4CDBDB0C.6080703@voidspace.org.uk> <4CDC6D31.2040809@canterbury.ac.nz> <4CDD0DBC.4050405@avl.com> Message-ID: On Fri, Nov 12, 2010 at 1:49 AM, Hrvoje Niksic wrote: > On 11/11/2010 11:24 PM, Greg Ewing wrote: >> >> Nick Coghlan wrote: >> >>> My personal opinion is that we should be trying to get the standard >>> library to the point where __all__ definitions are unnecessary - if a >>> name isn't in __all__, it should start with an underscore (and if that >>> is true, then the __all__ definition becomes effectively redundant). >> >> What about names imported from other modules that are used by >> the module, but not intended for re-export? How would you >> prevent them from turning up in help() etc. without using >> __all__? > > import foo as _foo > > I believe I am not the only one who finds that practice ugly, Agreed. > but I find it > just as ugly to underscore-ize every non-public helper function. __all__ is > there for a reason, let's use it. Maybe help() could automatically ignore > stuff not in __all__, or display it but warn the user of non-public > identifiers? No, I like all non-public functions, constants, classes and variables (but excluding imported modules) to start with _. You'd still need __all__ to make "import *" do the right thing, but the reader of the source code should not have to look up every name in __all__ to find whether it is supposed to be public or private. Plus, the same convention should carry over to methods and other class attributes, where you don't have __all__. If help() is broken we should fix it. (I'm not very happy with it myself anyway, I rarely use it.)
Note that __all__ was originally invented to give "from package import *" a well-defined meaning when the package included submodules that might not have been loaded yet. Using it for other export control (while a good idea) could be considered "newfangled". :-) -- --Guido van Rossum (python.org/~guido) From solipsis at pitrou.net Sat Nov 13 13:06:46 2010 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sat, 13 Nov 2010 13:06:46 +0100 Subject: [Python-Dev] Breaking undocumented API References: <64DF4272-FF17-4E82-96F5-1DA6CA3A06EC@gmail.com> <4CDAA27B.8040703@voidspace.org.uk> <4CDBDB0C.6080703@voidspace.org.uk> <4CDC6D31.2040809@canterbury.ac.nz> <4CDD0DBC.4050405@avl.com> Message-ID: <20101113130646.1237977b@pitrou.net> On Fri, 12 Nov 2010 20:38:16 -0800 Guido van Rossum wrote: > > Note that __all__ was originally invented to give "from package import > *" a well-defined meaning when the package included submodules that > might not have been loaded yet. Using it for other export control > (while a good idea) could be considered "newfangled". :-) Newfangled in a rather old way already, then, perhaps :p regards Antoine. From solipsis at pitrou.net Sat Nov 13 13:08:39 2010 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sat, 13 Nov 2010 13:08:39 +0100 Subject: [Python-Dev] [Python-checkins] r86441 - python/branches/py3k/Lib/test/test_nntplib.py References: <20101113002853.526F8EEA40@mail.python.org> <4CDDEA85.6050907@udel.edu> Message-ID: <20101113130839.1c315e45@pitrou.net> On Fri, 12 Nov 2010 20:31:49 -0500 Terry Reedy wrote: > > > class NetworkedNNTP_SSLTests(NetworkedNNTPTestsMixin, unittest.TestCase): > > - NNTP_HOST = 'snews.gmane.org' > > - GROUP_NAME = 'gmane.comp.python.devel' > > - GROUP_PAT = 'gmane.comp.python.d*' > > gmane is most problematical on weekends. Well we've had buildbot failures in the middle of the week. 
> > + NNTP_HOST = 'nntp.aioe.org' > > + GROUP_NAME = 'comp.lang.python' > > + GROUP_PAT = 'comp.lang.*' > > aioe went away for several months a couple of years ago or so. > Let us hope it stays up for awhile now. > The ssl connection currently does not work (expired certificate). Funny, it shows that the NNTP SSL tests don't check the certificate, then. Regards Antoine. From g.rodola at gmail.com Sat Nov 13 13:12:31 2010 From: g.rodola at gmail.com (=?ISO-8859-1?Q?Giampaolo_Rodol=E0?=) Date: Sat, 13 Nov 2010 13:12:31 +0100 Subject: [Python-Dev] Breaking undocumented API In-Reply-To: References: <64DF4272-FF17-4E82-96F5-1DA6CA3A06EC@gmail.com> <4CDAA27B.8040703@voidspace.org.uk> <4CDBDB0C.6080703@voidspace.org.uk> <20101111100516.6e90aa41@mission> <4CDC08F3.6010501@langa.pl> <4CDC0950.5040309@voidspace.org.uk> Message-ID: +1 on everything. 2010/11/11 Alexander Belopolsky : > 2010/11/11 Michael Foord : > .. >>> You mean runtime automation, e.g. creating __all__ on the fly omitting >>> underscored names? >>> >> Writing code to generate a __all__ that duplicates the default behaviour >> seems redundant to me. >> > > FWIW, I like having __all__ at the top of the module. It feels like a > table of contents at the start of a chapter. In some cases it may > also serve as an optimization when len(__all__) is much smaller than > len(__dict__). I also don't like _ prefix to become an exclusive > means to express privateness. > > I think the current definition of "public names" is a good one and > just needs to be made more visible in the docs. If the module defines > __all__, that should be the ultimate answer to what is public in that > module. (Users should learn to use help(module) instead of > dir(module) for API discovery.) If __all__ is not defined in the > module, I think it is good to introduce it after a careful review of > what it should contain. And __all__ should never contain names that > start with _.
> _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/g.rodola%40gmail.com > From foom at fuhm.net Sat Nov 13 13:30:05 2010 From: foom at fuhm.net (James Y Knight) Date: Sat, 13 Nov 2010 07:30:05 -0500 Subject: [Python-Dev] [Python-checkins] r86441 - python/branches/py3k/Lib/test/test_nntplib.py In-Reply-To: <20101113130839.1c315e45@pitrou.net> References: <20101113002853.526F8EEA40@mail.python.org> <4CDDEA85.6050907@udel.edu> <20101113130839.1c315e45@pitrou.net> Message-ID: <92814936-A0FC-403A-B3BA-46AE3085594B@fuhm.net> On Nov 13, 2010, at 7:08 AM, Antoine Pitrou wrote: > Funny, it shows that the NNTP SSL tests don't check the certificate, > then. Unsurprising, given that you need 140 lines of pretty non-obvious python code to do so... James From solipsis at pitrou.net Sat Nov 13 13:37:12 2010 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sat, 13 Nov 2010 13:37:12 +0100 Subject: [Python-Dev] Stable buildbots Message-ID: <20101113133712.60e9be27@pitrou.net> Hi, Just to let you know that we now have 8 stable buildbots, including Barry's own PPC Ubuntu machine (even though the Windows buildbots give a rather unconventional meaning to the word "stability"). Right now they are mostly green: http://www.python.org/dev/buildbot/all/waterfall?category=3.x.stable cheers Antoine. 
From solipsis at pitrou.net Sat Nov 13 13:40:25 2010 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sat, 13 Nov 2010 13:40:25 +0100 Subject: [Python-Dev] [Python-checkins] r86441 - python/branches/py3k/Lib/test/test_nntplib.py In-Reply-To: <92814936-A0FC-403A-B3BA-46AE3085594B@fuhm.net> References: <20101113002853.526F8EEA40@mail.python.org> <4CDDEA85.6050907@udel.edu> <20101113130839.1c315e45@pitrou.net> <92814936-A0FC-403A-B3BA-46AE3085594B@fuhm.net> Message-ID: <20101113134025.5604fc9c@pitrou.net> On Sat, 13 Nov 2010 07:30:05 -0500 James Y Knight wrote: > On Nov 13, 2010, at 7:08 AM, Antoine Pitrou wrote: > > Funny, it shows that the NNTP SSL tests don't check the certificate, > > then. > > Unsurprising, given that you need 140 lines of pretty non-obvious python code to do so... You must have missed the new match_hostname() function: http://docs.python.org/dev/library/ssl.html#ssl.match_hostname (its implementation is 50 lines rather than 140 lines, though) Regards Antoine. From dickinsm at gmail.com Sat Nov 13 14:00:29 2010 From: dickinsm at gmail.com (Mark Dickinson) Date: Sat, 13 Nov 2010 13:00:29 +0000 Subject: [Python-Dev] buildbot master update In-Reply-To: <4CDD08FB.3070701@v.loewis.de> References: <4CDD08FB.3070701@v.loewis.de> Message-ID: On Fri, Nov 12, 2010 at 9:29 AM, "Martin v. L?wis" wrote: > As you may have noticed: I updated the buildbot master to release 0.8.2. > If you notice any problems, please post them here. One effect of this change seems to be that bbreport[1] no longer works, since it appears that buildbot 0.8.2 has done away with the XMLRPC interface that bbreport uses. But that's really a bbreport issue rather than a buildbot one... 
Mark [1] http://code.google.com/p/bbreport/ From g.brandl at gmx.net Sat Nov 13 15:15:43 2010 From: g.brandl at gmx.net (Georg Brandl) Date: Sat, 13 Nov 2010 15:15:43 +0100 Subject: [Python-Dev] buildbot master update In-Reply-To: References: <4CDD08FB.3070701@v.loewis.de> Message-ID: Am 13.11.2010 14:00, schrieb Mark Dickinson: > On Fri, Nov 12, 2010 at 9:29 AM, "Martin v. L?wis" wrote: >> As you may have noticed: I updated the buildbot master to release 0.8.2. >> If you notice any problems, please post them here. > > One effect of this change seems to be that bbreport[1] no longer > works, since it appears that buildbot 0.8.2 has done away with the > XMLRPC interface that bbreport uses. > > But that's really a bbreport issue rather than a buildbot one... > > Mark I've added a quickfix by copying the removed xmlrpc interface to the local buildbot installation now. I had to patch it up a bit though, because of an apparent API change somewhere in buildbot, and I'm not sure whether this was correct. Georg From ocean-city at m2.ccsnet.ne.jp Sat Nov 13 15:47:53 2010 From: ocean-city at m2.ccsnet.ne.jp (Hirokazu Yamamoto) Date: Sat, 13 Nov 2010 23:47:53 +0900 Subject: [Python-Dev] Issues 9931 and 9055 - test_ttk_guionly and buildbot run as a service In-Reply-To: References: Message-ID: <4CDEA519.1020801@m2.ccsnet.ne.jp> On 2010/11/13 2:07, Terry Reedy wrote: > On 11/12/2010 3:44 AM, Paul Moore wrote: >> Hi, >> My buildbot has been failing for some time because of these 2 issues, >> both related to the fact that tests are hanging when run as a service >> (and hence have no display to open GUI elements on). Both issues have >> patches, and as far as I am aware, the patches fix the issues >> reasonably well. What can I do to move these 2 issues forwards? As >> things stand, my buildbot is not providing a lot of value on the 3.x >> branch :-( > > http://bugs.python.org/issue9055 > is marked as a 2.7 issue only, perhaps fixed by Tim Golden's committed > patches. 
Should it be re-versioned for 3.1/2? There is no patch file > attached, though perhaps the code in Yamamoto's message is meant as such > (but for which version?). So the first thing you could do is clarify the > current status and remaining issue on the tracker. > > http://bugs.python.org/issue9931 > by Yamamoto is marked for all 3 versions. It seems to be a similar > issue, though marked 'test' rather than 'ctypes'. It does have a patch > by him apparently based on his previous comments. The issue has no > responses and needs a patch review. So the first thing you could do is > to provide one;-). If it looks great (no changes that you can think of) > and works great, say so. Then it can move on to commit review stage. > > PS. Providing links like the above makes it easier for multiple people > to take a look and respond. My patch won't fix issue9055 directly, but solves issue9931. Probably it's easy to create a patch to fix issue9055 based on my patch. One problem is, how to skip test. With single decorator like skip_unless_symlink? Or let requires() raise SkipTest? From ocean-city at m2.ccsnet.ne.jp Sat Nov 13 17:21:37 2010 From: ocean-city at m2.ccsnet.ne.jp (Hirokazu Yamamoto) Date: Sun, 14 Nov 2010 01:21:37 +0900 Subject: [Python-Dev] Removal of Win32 ANSI API In-Reply-To: <201011121308.30368.victor.stinner@haypocalc.com> References: <4CDC14C0.6070300@m2.ccsnet.ne.jp> <201011112026.24445.victor.stinner@haypocalc.com> <201011121308.30368.victor.stinner@haypocalc.com> Message-ID: <4CDEBB11.5050209@m2.ccsnet.ne.jp> On 2010/11/12 4:26, Victor Stinner wrote: > On Thursday 11 November 2010 17:07:28 Hirokazu Yamamoto wrote: >> Hello. Is it possible to remove Win32 ANSI API (ie: GetFileAttributesA) >> and only use Win32 WIDE API (ie: GetFileAttributesW)? >> Mainly in posixmodule.c. > > Even if I hate the MBCS encoding, because it replaces undecodable characters > by similar glyphs by default, I'm not certain that it is a good idea to drop > the bytes API. 
On 2010/11/12 21:08, Victor Stinner wrote: > On Thursday 11 November 2010 23:01:32 you wrote: >>> Sure, it will divide the number of lines, of the code specific to >>> Windows, by two. >> >> Can we get most of the code cleanup benefit without the backwards >> compatibility risk by doing the decode from 'mbcs' on our side of the >> fence? > > I created PyUnicode_FSDecoder, a ParseTuple converter used to work on unicode > paths, instead of bytes paths. On Windows, this converter uses mbcs encoding > in strict mode, whereas Windows converter uses replace error handler to > decode, and ignore to encode. So I don't think that we should this converter > on Windows. > >> That is, have code that was the C equivalent of: >> >> arg_is_bytes = not isinstance(arg, str) >> if arg_is_bytes: >> val = _decode_mbcs(arg) >> # Decoding error checking here >> else: >> val = arg >> # Common processing using WIDE API >> if arg_is_bytes: >> result = _encode_mbcs(wide_result) >> # Encoding error checking here >> else: >> result = wide_result > > This doesn't make the code shorter, it may be longer than the actual code, and > it is less compliant with the Windows native API... Is it possible to implement new PyArg_ParseTuple converter to use PyUnicode_Decode(const char *s, Py_ssize_t size, const char *encoding, /* mbcs */ const char *errors) /* replace */ and use it? 
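The decode / call / encode shape quoted above can be written out in plain Python to see how much code it really involves. This is only a sketch of the idea, not the posixmodule.c implementation: call_with_wide_api and fake_wide_fullpath are hypothetical names, and the encoding is a parameter because the 'mbcs' codec is available only on Windows.

```python
def call_with_wide_api(wide_func, arg, encoding="mbcs", errors="strict"):
    """Call a str-only function, returning bytes when given bytes."""
    arg_is_bytes = isinstance(arg, bytes)
    if arg_is_bytes:
        # Decoding errors surface here as UnicodeDecodeError.
        val = arg.decode(encoding, errors)
    else:
        val = arg

    # Common processing using the str-only ("WIDE") implementation.
    wide_result = wide_func(val)

    if arg_is_bytes:
        # Encoding errors surface here as UnicodeEncodeError.
        return wide_result.encode(encoding, errors)
    return wide_result


def fake_wide_fullpath(path):
    # Hypothetical stand-in for a function built on a WIDE Win32 call.
    return "C:\\base\\" + path


# str in, str out; bytes in, bytes out (latin-1 used so this runs anywhere).
print(call_with_wide_api(fake_wide_fullpath, "file.txt", encoding="latin-1"))
print(call_with_wide_api(fake_wide_fullpath, b"file.txt", encoding="latin-1"))
```

The choice of error handler matters here: the thread notes that the ANSI functions on Windows effectively decode with "replace", so a "strict" wrapper like this one would not be behaviour-compatible with them.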
From tjreedy at udel.edu Sat Nov 13 19:40:54 2010 From: tjreedy at udel.edu (Terry Reedy) Date: Sat, 13 Nov 2010 13:40:54 -0500 Subject: [Python-Dev] [Python-checkins] r86441 - python/branches/py3k/Lib/test/test_nntplib.py In-Reply-To: <20101113130839.1c315e45@pitrou.net> References: <20101113002853.526F8EEA40@mail.python.org> <4CDDEA85.6050907@udel.edu> <20101113130839.1c315e45@pitrou.net> Message-ID: On 11/13/2010 7:08 AM, Antoine Pitrou wrote: > On Fri, 12 Nov 2010 20:31:49 -0500 > Terry Reedy wrote: >> >>> class NetworkedNNTP_SSLTests(NetworkedNNTPTestsMixin, unittest.TestCase): >>> - NNTP_HOST = 'snews.gmane.org' >>> - GROUP_NAME = 'gmane.comp.python.devel' >>> - GROUP_PAT = 'gmane.comp.python.d*' >> >> gmane is most problematical on weekends. > > Well we've had buildbot failures in the middle of the week. Why I did not say 'only' ;-). >>> + NNTP_HOST = 'nntp.aioe.org' >>> + GROUP_NAME = 'comp.lang.python' >>> + GROUP_PAT = 'comp.lang.*' >> >> aioe went away for several months a couple of years ago or so. >> Let us hope it stays up for awhile now. >> The ssl connection currently does not work (expired certificate). More specifically, if, with Thunderbird, I turn on SSL/TLS, (which switches from port 119 to 563), I get *invalid* certificate message - good for aioe.org, news.aioe.org, but not nntp.aioe.org. I believe SSL worked before the hiatus so it might be an oversight in restarting. > Funny, it shows that the NNTP SSL tests don't check the certificate, > then. Or not thoroughly.
-- Terry Jan Reedy From tjreedy at udel.edu Sat Nov 13 20:29:23 2010 From: tjreedy at udel.edu (Terry Reedy) Date: Sat, 13 Nov 2010 14:29:23 -0500 Subject: [Python-Dev] [Python-checkins] r86441 - python/branches/py3k/Lib/test/test_nntplib.py In-Reply-To: References: <20101113002853.526F8EEA40@mail.python.org> <4CDDEA85.6050907@udel.edu> <20101113130839.1c315e45@pitrou.net> Message-ID: O > More specifically, if, with Thunderbird, I turn on SSL/TLS, (which > switches from port 119 to 563), I get *invalid* certificate message - > good for aioe.org, news.aioe.org, but not nntp.aioe.org. I believe SSL > worked before the hiatus so it might be an oversight in restarting. > >> Funny, it shows that the NNTP SSL tests don't check the certificate, >> then. > > Or not thoroughly. I can access gmane with SSL, so you could add a conditional (on being up and running) certificate check using that. -- Terry Jan Reedy From tjreedy at udel.edu Sat Nov 13 19:17:25 2010 From: tjreedy at udel.edu (Terry Reedy) Date: Sat, 13 Nov 2010 13:17:25 -0500 Subject: [Python-Dev] [Python-checkins] r86451 - python/branches/py3k/Misc/NEWS In-Reply-To: <20101113132541.861BBEEA82@mail.python.org> References: <20101113132541.861BBEEA82@mail.python.org> Message-ID: <4CDED635.3010409@udel.edu> On 11/13/2010 8:25 AM, georg.brandl wrote: > Author: georg.brandl > Date: Sat Nov 13 14:25:40 2010 > New Revision: 86451 > - unused undocumented value PyBUF_SHADOW, and strangely-looking code in > + undocumented value PyBUF_SHADOW, and strangely-looking code in For future reference, 'strangely-looking' should be either 'strange-looking' or 'strangely appearing'. First, '-ly' adverbs are never hyphenated even when modifying adjectives. Second, 'strangely looking code' would mean that the code is actively looking around strangely (as opposed to passively sitting there appearing strange).
tjr From tjreedy at udel.edu Sat Nov 13 19:21:09 2010 From: tjreedy at udel.edu (Terry Reedy) Date: Sat, 13 Nov 2010 13:21:09 -0500 Subject: [Python-Dev] [Python-checkins] r86453 - in python/branches/release31-maint: Include/patchlevel.h Lib/distutils/__init__.py Lib/idlelib/idlever.py Misc/NEWS Misc/RPM/python-3.1.spec README In-Reply-To: <20101113172857.00DBFEEAC5@mail.python.org> References: <20101113172857.00DBFEEAC5@mail.python.org> Message-ID: <4CDED715.1070100@udel.edu> On 11/13/2010 12:28 PM, benjamin.peterson wrote: > Author: benjamin.peterson > Date: Sat Nov 13 18:28:56 2010 > New Revision: 86453 > Modified: python/branches/release31-maint/README > ============================================================================== > --- python/branches/release31-maint/README (original) > +++ python/branches/release31-maint/README Sat Nov 13 18:28:56 2010 > @@ -1,5 +1,5 @@ > -This is Python version 3.1.2 > -============================ > +This is Python version 3.1.2 release candidate 1 > +================================================ That should be 3.1.3 ;-) From janssen at parc.com Sat Nov 13 21:56:11 2010 From: janssen at parc.com (Bill Janssen) Date: Sat, 13 Nov 2010 12:56:11 PST Subject: [Python-Dev] [Python-checkins] r86441 - python/branches/py3k/Lib/test/test_nntplib.py In-Reply-To: <20101113134025.5604fc9c@pitrou.net> References: <20101113002853.526F8EEA40@mail.python.org> <4CDDEA85.6050907@udel.edu> <20101113130839.1c315e45@pitrou.net> <92814936-A0FC-403A-B3BA-46AE3085594B@fuhm.net> <20101113134025.5604fc9c@pitrou.net> Message-ID: <47826.1289681771@parc.com> Antoine Pitrou wrote: > On Sat, 13 Nov 2010 07:30:05 -0500 > James Y Knight wrote: > > On Nov 13, 2010, at 7:08 AM, Antoine Pitrou wrote: > > > Funny, it shows that the NNTP SSL tests don't check the certificate, > > > then. > > > > Unsurprising, given that you need 140 lines of pretty non-obvious python code to do so... 
> > You must have missed the new match_hostname() function: > http://docs.python.org/dev/library/ssl.html#ssl.match_hostname > > (its implementation is 50 lines rather than 140 lines, though) On the client side, it's pretty easy to see an invalid (say, expired) certificate. Just call get_server_certificate(), which will fail if the server certificate is invalid. That's a separate issue from matching the request hostname to the various host identifiers in the certificate, which various application protocols may or may not require. Bill From benjamin at python.org Sun Nov 14 00:08:10 2010 From: benjamin at python.org (Benjamin Peterson) Date: Sat, 13 Nov 2010 17:08:10 -0600 Subject: [Python-Dev] [RELEASED] Python 3.1.3 release candidate 1 Message-ID: On behalf of the Python development team, I'm gladsome to announce a release candidate of the third bugfix release for the Python 3.1 series, Python 3.1.3. This bug fix release fixes numerous issues found in 3.1.2. Please try it with your packages and report any bugs you find. The final of 3.1.3 is scheduled to be released in two weeks. The Python 3.1 version series focuses on the stabilization and optimization of the features and changes that Python 3.0 introduced. For example, the new I/O system has been rewritten in C for speed. File system APIs that use unicode strings now handle paths with undecodable bytes in them. Other features include an ordered dictionary implementation, a condensed syntax for nested with statements, and support for ttk Tile in Tkinter. For a more extensive list of changes in 3.1, see http://doc.python.org/3.1/whatsnew/3.1.html or Misc/NEWS in the Python distribution. To download Python 3.1.3 visit: http://www.python.org/download/releases/3.1.3/ A list of changes in 3.1.3 can be found here: http://svn.python.org/projects/python/tags/r313rc1/Misc/NEWS The 3.1 documentation can be found at: http://docs.python.org/3.1 Bugs can always be reported to: http://bugs.python.org Enjoy! 
-- Benjamin Peterson Release Manager benjamin at python.org (on behalf of the entire python-dev team and 3.1.3's contributors) From benjamin at python.org Sun Nov 14 00:12:22 2010 From: benjamin at python.org (Benjamin Peterson) Date: Sat, 13 Nov 2010 17:12:22 -0600 Subject: [Python-Dev] [RELEASED] Python 2.7.1 release candidate 1 Message-ID: On behalf of the Python development team, I'm chuffed to announce a release candidate of Python 2.7.1. Please test the release candidate with your packages and report any bugs you find. 2.7.1 final is scheduled in two weeks. 2.7 includes many features that were first released in Python 3.1. The faster io module, the new nested with statement syntax, improved float repr, set literals, dictionary views, and the memoryview object have been backported from 3.1. Other features include an ordered dictionary implementation, unittest improvements, a new sysconfig module, auto-numbering of fields in the str/unicode format method, and support for ttk Tile in Tkinter. For a more extensive list of changes in 2.7, see http://doc.python.org/dev/whatsnew/2.7.html or Misc/NEWS in the Python distribution. To download Python 2.7.1 visit: http://www.python.org/download/releases/2.7.1/ The 2.7.1 changelog is at: http://svn.python.org/projects/python/tags/r271rc1/Misc/NEWS 2.7 documentation can be found at: http://docs.python.org/2.7/ This is a testing release, so we encourage developers to test it with their applications and libraries. Please report any bugs you find, so they can be fixed in the final release. The bug tracker is at: http://bugs.python.org/ Enjoy!
-- Benjamin Peterson Release Manager benjamin at python.org (on behalf of the entire python-dev team and 2.7.1's contributors) From victor.stinner at haypocalc.com Sun Nov 14 01:06:55 2010 From: victor.stinner at haypocalc.com (Victor Stinner) Date: Sun, 14 Nov 2010 01:06:55 +0100 Subject: [Python-Dev] Removal of Win32 ANSI API In-Reply-To: <4CDEBB11.5050209@m2.ccsnet.ne.jp> References: <4CDC14C0.6070300@m2.ccsnet.ne.jp> <201011121308.30368.victor.stinner@haypocalc.com> <4CDEBB11.5050209@m2.ccsnet.ne.jp> Message-ID: <201011140106.55153.victor.stinner@haypocalc.com> On Saturday 13 November 2010 17:21:37 you wrote: > On 2010/11/12 4:26, Victor Stinner wrote: > > On Thursday 11 November 2010 17:07:28 Hirokazu Yamamoto wrote: > >> Hello. Is it possible to remove Win32 ANSI API (ie: GetFileAttributesA) > >> and only use Win32 WIDE API (ie: GetFileAttributesW)? > >> Mainly in posixmodule.c. > > > > Even if I hate the MBCS encoding, because it replaces undecodable > > characters > > > by similar glyphs by default, I'm not certain that it is a good idea > > to drop > > > the bytes API. > > On 2010/11/12 21:08, Victor Stinner wrote: > > On Thursday 11 November 2010 23:01:32 you wrote: > >>> Sure, it will divide the number of lines, of the code specific to > >>> Windows, by two. > >> > >> Can we get most of the code cleanup benefit without the backwards > >> compatibility risk by doing the decode from 'mbcs' on our side of the > >> fence? > > > > I created PyUnicode_FSDecoder, a ParseTuple converter used to work on > > unicode paths, instead of bytes paths. On Windows, this converter uses > > mbcs encoding in strict mode, whereas Windows converter uses replace > > error handler to decode, and ignore to encode. So I don't think that we > > should this converter on Windows. 
> >
> >> That is, have code that was the C equivalent of:
> >>
> >>     arg_is_bytes = not isinstance(arg, str)
> >>
> >>     if arg_is_bytes:
> >>         val = _decode_mbcs(arg)
> >>         # Decoding error checking here
> >>
> >>     else:
> >>         val = arg
> >>
> >>     # Common processing using WIDE API
> >>
> >>     if arg_is_bytes:
> >>         result = _encode_mbcs(wide_result)
> >>         # Encoding error checking here
> >>
> >>     else:
> >>         result = wide_result
> >
> > This doesn't make the code shorter, it may be longer than the actual
> > code, and it is less compliant with the Windows native API...
>
> Is it possible to implement new PyArg_ParseTuple converter to use
> PyUnicode_Decode(const char *s,
>                  Py_ssize_t size,
>                  const char *encoding, /* mbcs */
>                  const char *errors) /* replace */
> and use it?

Yes, but how do you check if the input argument is a bytes or a str object
with your PyArg_Parse converter? You should use "O" format and manually
convert it to unicode, and then convert the result back to bytes (if the
input was bytes). I don't think that it makes the code shorter.

The code is currently working. The question is if we have to drop the ANSI
API now, later or never. It looks like the decision moves to "later"
(deprecate in 3.2, remove in 3.3). I still think that drop now doesn't
really hurt.

Victor

From solipsis at pitrou.net  Sun Nov 14 01:19:28 2010
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Sun, 14 Nov 2010 01:19:28 +0100
Subject: [Python-Dev] Removal of Win32 ANSI API
References: <4CDC14C0.6070300@m2.ccsnet.ne.jp>
 <201011121308.30368.victor.stinner@haypocalc.com>
 <4CDEBB11.5050209@m2.ccsnet.ne.jp>
 <201011140106.55153.victor.stinner@haypocalc.com>
Message-ID: <20101114011928.0f1e3d60@pitrou.net>

On Sun, 14 Nov 2010 01:06:55 +0100
Victor Stinner wrote:
>
> The code is currently working. The question is if we have to drop the ANSI API
> now, later or never.

If the code is currently working and isn't a security hole, then we
obviously don't "have to".
Apparently several developers "want to", which is different.

> It looks like the decision moves to "later" (deprecate in
> 3.2, remove in 3.3). I still think that drop now doesn't really hurt.

If you drop code without first deprecating it, chances are it will
hurt someone. That's why having a deprecation period is the rule we
usually follow (most of the time :-)).

Regards

Antoine.

From ncoghlan at gmail.com  Sun Nov 14 02:06:57 2010
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sun, 14 Nov 2010 11:06:57 +1000
Subject: [Python-Dev] Removal of Win32 ANSI API
In-Reply-To: <20101114011928.0f1e3d60@pitrou.net>
References: <4CDC14C0.6070300@m2.ccsnet.ne.jp>
 <201011121308.30368.victor.stinner@haypocalc.com>
 <4CDEBB11.5050209@m2.ccsnet.ne.jp>
 <201011140106.55153.victor.stinner@haypocalc.com>
 <20101114011928.0f1e3d60@pitrou.net>
Message-ID:

On Sun, Nov 14, 2010 at 10:19 AM, Antoine Pitrou wrote:
> On Sun, 14 Nov 2010 01:06:55 +0100
> Victor Stinner wrote:
>>
>> The code is currently working. The question is if we have to drop the ANSI API
>> now, later or never.
>
> If the code is currently working and isn't a security hole, then we
> obviously don't "have to".
> Apparently several developers "want to", which is different.

We should also keep in mind that *Microsoft* have chosen to keep the
bytes Win32 APIs around, despite their flaws, all in the name of
backwards compatibility. While the goal of nudging third party
developers towards the superior Unicode APIs is an admirable one, it
is still the case that there is a *lot* of ASCII-only code out there.
E.g. applications could easily be storing filenames in an ASCII only
datastore that provides them back to the application as bytes in 3.x.

>> It looks like the decision moves to "later" (deprecate in
>> 3.2, remove in 3.3). I still think that drop now doesn't really hurt.
>
> If you drop code without first deprecating it, chances are it will
> hurt someone.  That's why having a deprecation period is the rule we
> usually follow (most of the time :-)).

Indeed.

Cheers,
Nick.

--
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From ncoghlan at gmail.com  Sun Nov 14 02:28:31 2010
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sun, 14 Nov 2010 11:28:31 +1000
Subject: [Python-Dev] PEP 385: Formatting of Hg checkin notifications
Message-ID:

Following the python-checkins list, I get to see both the current SVN
notifications and the Hg notifications from Tarek's pushes into the
distutils repository. I realised today that there is one key reason as
to why the latter strikes me as a big wall of unintelligible text,
while I find the SVN notification quite easy to read: vertical
whitespace.

The SVN notification uses vertical whitespace to separate out the log
message and the list of files affected clearly from the rest of the
header fields. It makes it *really* easy to see at a glance what the
checkin was about and which files were affected. For the Hg
notification, both of these fields are embedded in a big header block
along with all the other fields, so it is quite difficult to make out
the same information.

It would be really nice if the formatting could be improved for the
email notifications on the Hg side when we adopt it for the main
CPython repository.
The changes would be to:
- add a blank line before and after the summary field
- add a carriage return between the header and content for the summary
  field and the files field
- indent the list of files by two spaces and use a carriage return
  rather than a comma to separate named files

I've included an example below based on one of Tarek's recent pushes:

Current Hg notification header and start of first diff:
================================================
tarek.ziade pushed 7ebf14ab2840 to distutils2:

http://hg.python.org/distutils2/rev/7ebf14ab2840
changeset: 816:7ebf14ab2840
tag: tip
user: Tarek Ziade
date: Sat Nov 13 12:40:33 2010 +0100
summary: compiler_type -> name
files: distutils2/compiler/__init__.py,
distutils2/compiler/bcppcompiler.py, distutils2/compiler/ccompiler.py,
distutils2/compiler/cygwinccompiler.py,
distutils2/compiler/msvc9compiler.py,
distutils2/compiler/msvccompiler.py,
distutils2/compiler/unixccompiler.py, distutils2/tests/test_config.py

diff --git a/distutils2/compiler/__init__.py b/distutils2/compiler/__init__.py
--- a/distutils2/compiler/__init__.py
+++ b/distutils2/compiler/__init__.py
@@ -13,7 +13,7 @@
====================================================

Proposed change to separate out summary and files fields:
================================================
tarek.ziade pushed 7ebf14ab2840 to distutils2:

http://hg.python.org/distutils2/rev/7ebf14ab2840
changeset: 816:7ebf14ab2840
tag: tip
user: Tarek Ziade
date: Sat Nov 13 12:40:33 2010 +0100

summary:
  compiler_type -> name

files:
  distutils2/compiler/__init__.py
  distutils2/compiler/bcppcompiler.py
  distutils2/compiler/ccompiler.py
  distutils2/compiler/cygwinccompiler.py
  distutils2/compiler/msvc9compiler.py
  distutils2/compiler/msvccompiler.py
  distutils2/compiler/unixccompiler.py
  distutils2/tests/test_config.py

diff --git a/distutils2/compiler/__init__.py b/distutils2/compiler/__init__.py
--- a/distutils2/compiler/__init__.py
+++ b/distutils2/compiler/__init__.py
@@ -13,7 +13,7 @@
====================================================

Cheers,
Nick.

--
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From db3l.net at gmail.com  Sun Nov 14 03:40:22 2010
From: db3l.net at gmail.com (David Bolen)
Date: Sat, 13 Nov 2010 21:40:22 -0500
Subject: [Python-Dev] Stable buildbots
References: <20101113133712.60e9be27@pitrou.net>
Message-ID:

Antoine Pitrou writes:

> (even though the Windows buildbots give
> a rather unconventional meaning to the word "stability").

Nag, nag, nag .... :-)

There's been a bit of an uptick in the past few weeks with hung
python_d processes (not a new issue, but it ebbs and flows), so I'm
going to try to pull together a monitor script this weekend to start
killing them off automatically.  Should at least get rid of some of
the low hanging fruit that interferes with subsequent builds.

-- David

From tjreedy at udel.edu  Sun Nov 14 04:10:11 2010
From: tjreedy at udel.edu (Terry Reedy)
Date: Sat, 13 Nov 2010 22:10:11 -0500
Subject: [Python-Dev] PEP 385: Formatting of Hg checkin notifications
In-Reply-To:
References:
Message-ID:

On 11/13/2010 8:28 PM, Nick Coghlan wrote:
> Following the python-checkins list, I get to see both the current SVN
> notifications and the Hg notifications from Tarek's pushes into the
> distutils repository. I realised today that there is one key reason as
> to why the latter strikes me as a big wall of unintelligible text,
> while I find the SVN notification quite easy to read: vertical
> whitespace.
>
> The SVN notification uses vertical whitespace to separate out the log
> message and the list of files affected clearly from the rest of the
> header fields. It makes it *really* easy to see at a glance what the
> checkin was about and which files were affected. For the Hg
> notification, both of these fields are embedded in a big header block
> along with all the other fields, so it is quite difficult to make out
> the same information.
>
> It would be really nice if the formatting could be improved for the
> email notifications on the Hg side when we adopt it for the main
> CPython repository. The changes would be to:
> - add a blank line before and after the summary field
> - add a carriage return between the header and content for the summary
> field and the files field
> - indent the list of files by two spaces and use a carriage return
> rather than a comma to separate named files
>
> I've included an example below based on one of Tarek's recent pushes:
>
> Current Hg notification header and start of first diff:
> ================================================
> tarek.ziade pushed 7ebf14ab2840 to distutils2:
>
> http://hg.python.org/distutils2/rev/7ebf14ab2840
> changeset: 816:7ebf14ab2840
> tag: tip
> user: Tarek Ziade
> date: Sat Nov 13 12:40:33 2010 +0100
> summary: compiler_type -> name
> files: distutils2/compiler/__init__.py,
> distutils2/compiler/bcppcompiler.py, distutils2/compiler/ccompiler.py,
> distutils2/compiler/cygwinccompiler.py,
> distutils2/compiler/msvc9compiler.py,
> distutils2/compiler/msvccompiler.py,
> distutils2/compiler/unixccompiler.py, distutils2/tests/test_config.py
>
> diff --git a/distutils2/compiler/__init__.py b/distutils2/compiler/__init__.py
> --- a/distutils2/compiler/__init__.py
> +++ b/distutils2/compiler/__init__.py
> @@ -13,7 +13,7 @@
> ====================================================
>
> Proposed change to separate out summary and files fields:
> ================================================
> tarek.ziade pushed 7ebf14ab2840 to distutils2:
>
> http://hg.python.org/distutils2/rev/7ebf14ab2840
> changeset: 816:7ebf14ab2840
> tag: tip
> user: Tarek Ziade
> date: Sat Nov 13 12:40:33 2010 +0100
>
> summary:
>   compiler_type -> name
>
> files:
>   distutils2/compiler/__init__.py
>   distutils2/compiler/bcppcompiler.py
>   distutils2/compiler/ccompiler.py
>   distutils2/compiler/cygwinccompiler.py
>   distutils2/compiler/msvc9compiler.py
>   distutils2/compiler/msvccompiler.py
>   distutils2/compiler/unixccompiler.py
>   distutils2/tests/test_config.py
>
> diff --git a/distutils2/compiler/__init__.py b/distutils2/compiler/__init__.py
> --- a/distutils2/compiler/__init__.py
> +++ b/distutils2/compiler/__init__.py
> @@ -13,7 +13,7 @@
> ====================================================

Much better except possible for \n after 'summary:'

--
Terry Jan Reedy

From rdmurray at bitdance.com  Sun Nov 14 04:40:52 2010
From: rdmurray at bitdance.com (R. David Murray)
Date: Sat, 13 Nov 2010 22:40:52 -0500
Subject: [Python-Dev] unexpected traceback/stack behavior with chained
 exceptions (issue 1553375)
Message-ID: <20101114034052.39AE81FC192@kimball.webabinitio.net>

Issue 1553375 [1] proposes a patch to add an 'allframes' option to the
traceback printing and formatting routines so that the full traceback
from the top of the execution stack down to the exception is printed,
instead of just from the point where the exception is caught down to
the exception.  This is useful when the reason you are capturing the
traceback is to log it, and you have several different points in your
application where you do such traceback logging.  You often really want
to know the entire stack in that case; logging only from the capture
point down can lose important debugging information depending on how
the application is structured.

The patch seems to work well, except for one problem that I don't have
enough CPython internals knowledge to understand.  If the traceback we
are printing has a chained traceback, the resulting full traceback shows
the line that is printing the traceback instead of the line from the 'try'
block.  (It prints the expected line if there is no chained traceback).

So, is this a failure in my understanding of how tracebacks are supposed
to work, or a bug in how chained tracebacks are constructed?

[1] http://bugs.python.org/issue1553375

--
R. David Murray                                      www.bitdance.com

From ncoghlan at gmail.com  Sun Nov 14 09:22:31 2010
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sun, 14 Nov 2010 18:22:31 +1000
Subject: [Python-Dev] Stable buildbots
In-Reply-To:
References: <20101113133712.60e9be27@pitrou.net>
Message-ID:

On Sun, Nov 14, 2010 at 12:40 PM, David Bolen wrote:
> Antoine Pitrou writes:
>
>> (even though the Windows buildbots give
>> a rather unconventional meaning to the word "stability").
>
> Nag, nag, nag .... :-)
>
> There's been a bit of an uptick in the past few weeks with hung
> python_d processes (not a new issue, but it ebbs and flows), so I'm
> going to try to pull together a monitor script this weekend to start
> killing them off automatically.  Should at least get rid of some of
> the low hanging fruit that interferes with subsequent builds.

Do we have any idea why the workaround to avoid the popup windows
stopped working? (assuming it ever worked reliably - I thought it did,
but that impression may have been incorrect)

Cheers,
Nick.

--
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From ncoghlan at gmail.com  Sun Nov 14 09:25:27 2010
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sun, 14 Nov 2010 18:25:27 +1000
Subject: [Python-Dev] PEP 385: Formatting of Hg checkin notifications
In-Reply-To:
References:
Message-ID:

On Sun, Nov 14, 2010 at 1:10 PM, Terry Reedy wrote:
> Much better except possible for \n after 'summary:'

That extra line break helps more for multi-line checkin messages
(which happen reasonably often). Doesn't really bother me either way -
I'm mainly looking for info on who has the ability to change the
format in the first place :)

Cheers,
Nick.

--
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From db3l.net at gmail.com  Sun Nov 14 09:48:53 2010
From: db3l.net at gmail.com (David Bolen)
Date: Sun, 14 Nov 2010 03:48:53 -0500
Subject: [Python-Dev] Stable buildbots
References: <20101113133712.60e9be27@pitrou.net>
Message-ID:

Nick Coghlan writes:

> Do we have any idea why the workaround to avoid the popup windows
> stopped working? (assuming it ever worked reliably - I thought it did,
> but that impression may have been incorrect)

Oh, the pop-up handling for the RTL dialogs still seems to be working
fine (at least I haven't seen any since I put it in place).  That, plus
the original buildbot tweaks to block any OS popups still looks solid
for avoiding any dialogs that block a test process.

This is a completely separate issue, though probably around just as
long, and like the popup problem its frequency changes over time.  By
"hung" here I'm referring to cases where something must go wrong with
a test and/or its cleanup such that a python_d process remains
running, usually several of them at the same time.  So I end up with a
bunch of python_d processes in the background (but not with any
dialogs pending), which eventually cause errors during attempts the
next time the same builder is used since the file remains in use.

I expect some of this may be the lack of a good process group cleanup
under Windows, though the root cause may not be unique to Windows.  I
see something very similar with reasonable frequency on my OSX Tiger
buildbot as well.  But since the filesystem there can let the build
tree get cleaned and rebuilt even with a stranded executable, the
impact is minimal on subsequent tests than on Windows, though the OSX
processes do burn a ton of CPU.

I run a script on OSX to kill them off, but that was quick to whip up
since in those cases the stranded processes all end up getting owned
by init so it's a simple ps grep and kill.
In the Windows case I'll probably just set a time limit so if the
processes have been around more than a few hours I figure they're safe
to kill.

-- David

From martin at v.loewis.de  Sun Nov 14 11:09:08 2010
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Sun, 14 Nov 2010 11:09:08 +0100
Subject: [Python-Dev] Removal of Win32 ANSI API
In-Reply-To: <20101114011928.0f1e3d60@pitrou.net>
References: <4CDC14C0.6070300@m2.ccsnet.ne.jp>
 <201011121308.30368.victor.stinner@haypocalc.com>
 <4CDEBB11.5050209@m2.ccsnet.ne.jp>
 <201011140106.55153.victor.stinner@haypocalc.com>
 <20101114011928.0f1e3d60@pitrou.net>
Message-ID: <4CDFB544.7000809@v.loewis.de>

> If the code is currently working and isn't a security hole, then we
> obviously don't "have to".
> Apparently several developers "want to", which is different.

In case the motivation for that isn't clear: it would produce a
significant code reduction, and therefore ease maintenance.

Regards,
Martin

From martin at v.loewis.de  Sun Nov 14 11:14:27 2010
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Sun, 14 Nov 2010 11:14:27 +0100
Subject: [Python-Dev] Removal of Win32 ANSI API
In-Reply-To:
References: <4CDC14C0.6070300@m2.ccsnet.ne.jp>
 <201011121308.30368.victor.stinner@haypocalc.com>
 <4CDEBB11.5050209@m2.ccsnet.ne.jp>
 <201011140106.55153.victor.stinner@haypocalc.com>
 <20101114011928.0f1e3d60@pitrou.net>
Message-ID: <4CDFB683.5000709@v.loewis.de>

> We should also keep in mind that *Microsoft* have chosen to keep the
> bytes Win32 APIs around, despite their flaws, all in the name of
> backwards compatibility.

Of course, Microsoft is in a different position. If they remove a
functionality in some release, their users typically can't go back
and continue to use the old version - at least not on the same
computer.

For Python, it's different: our users can go back to use an old version
if the new one breaks their applications.
And we do break applications from time to time, most notably with the
introduction of Python 3.

> While the goal of nudging third party
> developers towards the superior Unicode APIs is an admirable one, it
> is still the case that there is a *lot* of ASCII-only code out there.

The question is: is there also a lot of ASCII-only Python 3 software
out there? And would developers of such software have difficulties
porting it to a Unicode file name API?

> E.g. applications could easily be storing filenames in an ASCII only
> datastore that provides them back to the application as bytes in 3.x.

That's speculation. My speculation would be that authors of such a
datastore find that they can't even print the data anymore in a
reasonable way, so they changed their API to return strings (i.e.
decoding from ASCII) when they ported it to Python 3. They wouldn't
even consider it a change, because it returned strings all the time,
and now Python 3 has a different string type.

>> If you drop code without first deprecating it, chances are it will
>> hurt someone. That's why having a deprecation period is the rule we
>> usually follow (most of the time :-)).

I'm in favor of deprecating it first.

Regards,
Martin

From martin at v.loewis.de  Sun Nov 14 11:18:07 2010
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Sun, 14 Nov 2010 11:18:07 +0100
Subject: [Python-Dev] Stable buildbots
In-Reply-To:
References: <20101113133712.60e9be27@pitrou.net>
Message-ID: <4CDFB75F.7020802@v.loewis.de>

> This is a completely separate issue, though probably around just as
> long, and like the popup problem its frequency changes over time. By
> "hung" here I'm referring to cases where something must go wrong with
> a test and/or its cleanup such that a python_d process remains
> running, usually several of them at the same time. So I end up with a
> bunch of python_d processes in the background (but not with any
> dialogs pending), which eventually cause errors during attempts the
> next time the same builder is used since the file remains in use.

This is what kill_python.exe is supposed to solve. So I recommend to
investigate why it fails to kill the hanging Pythons.

Regards,
Martin

From martin at v.loewis.de  Sun Nov 14 11:20:47 2010
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Sun, 14 Nov 2010 11:20:47 +0100
Subject: [Python-Dev] PEP 385: Formatting of Hg checkin notifications
In-Reply-To:
References:
Message-ID: <4CDFB7FF.1000300@v.loewis.de>

Am 14.11.2010 09:25, schrieb Nick Coghlan:
> On Sun, Nov 14, 2010 at 1:10 PM, Terry Reedy wrote:
>> Much better except possible for \n after 'summary:'
>
> That extra line break helps more for multi-line checkin messages
> (which happen reasonably often). Doesn't really bother me either way -
> I'm mainly looking for info on who has the ability to change the
> format in the first place :)

See

http://hg.python.org/hooks/

You should have push permissions to that repository.

Regards,
Martin

From db3l.net at gmail.com  Sun Nov 14 11:32:25 2010
From: db3l.net at gmail.com (David Bolen)
Date: Sun, 14 Nov 2010 05:32:25 -0500
Subject: [Python-Dev] Stable buildbots
References: <20101113133712.60e9be27@pitrou.net>
 <4CDFB75F.7020802@v.loewis.de>
Message-ID:

"Martin v. Löwis" writes:

> This is what kill_python.exe is supposed to solve. So I recommend to
> investigate why it fails to kill the hanging Pythons.

Yeah, I know, and I can't say I disagree in principle - not sure why
Windows doesn't let the kill in that module work (or if there's an
issue actually running it under all conditions).
At the moment though, I do know that using the sysinternals pskill
utility externally (which is what I currently do interactively)
definitely works so to be honest, automating that is a guaranteed bang
for buck at this point with no analysis involved.  Looking into
kill_python or its use can be a follow-on.

-- David

From ncoghlan at gmail.com  Sun Nov 14 12:41:59 2010
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sun, 14 Nov 2010 21:41:59 +1000
Subject: [Python-Dev] unexpected traceback/stack behavior with chained
 exceptions (issue 1553375)
In-Reply-To: <20101114034052.39AE81FC192@kimball.webabinitio.net>
References: <20101114034052.39AE81FC192@kimball.webabinitio.net>
Message-ID:

On Sun, Nov 14, 2010 at 1:40 PM, R. David Murray wrote:
> Issue 1553375 [1] proposes a patch to add an 'allframes' option to the
> traceback printing and formatting routines so that the full traceback
> from the top of the execution stack down to the exception is printed,
> instead of just from the point where the exception is caught down to
> the exception.  This is useful when the reason you are capturing the
> traceback is to log it, and you have several different points in your
> application where you do such traceback logging.  You often really want
> to know the entire stack in that case; logging only from the capture
> point down can lose important debugging information depending on how
> the application is structured.
>
> The patch seems to work well, except for one problem that I don't have
> enough CPython internals knowledge to understand.  If the traceback we
> are printing has a chained traceback, the resulting full traceback shows
> the line that is printing the traceback instead of the line from the 'try'
> block.  (It prints the expected line if there is no chained traceback).
>
> So, is this a failure in my understanding of how tracebacks are supposed
> to work, or a bug in how chained tracebacks are constructed?
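
The difference described in the quoted report, between the traceback of a caught exception and the full call stack, can be reproduced with the standard traceback module. This is only a minimal sketch: the function names fail, handler and outer are made up for illustration and have nothing to do with the patch on issue 1553375.

```python
import traceback


def fail():
    raise ValueError("boom")


def handler():
    try:
        fail()
    except ValueError:
        # Traceback of the exception: starts at the frame that caught it
        # (handler) and goes down to the frame that raised it (fail).
        caught = traceback.format_exc()
        # Current call stack: starts at the outermost frame and goes down
        # to this line, so it includes the callers of handler().
        stack = "".join(traceback.format_stack())
    return caught, stack


def outer():
    return handler()


caught, stack = outer()
print(caught)
print(stack)
```

The first printout has no entry for outer(), which is the information that is lost when only the exception traceback is logged; the second printout has it.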

It looks to me like you're grabbing a reference to a frame that is
currently executing and that frame has moved on since the exception
was thrown (to your exception handler). The print_stack() call in the
patch then accurately reflects this.

The other thing to keep in mind is that the exception currently being
handled is the *last* one produced by _iter_chain - all of the rest
have already been caught and handled, it was the handlers for those
that raised the subsequent exceptions in the chain.

Basically, the whole patch strikes me as fundamentally misguided. If
someone wants this information in their exception handler, they should
put a print_stack() with the appropriate header information after the
print_exception() call rather than trying to embed it in the display
of the exception information. logging could also gain an independent
"stack_trace=True" option to request inclusion of a stack trace
independently of whether or not exception information is included.

(Side note: there's a typo in the format_tb docstring claiming it is a
wrapper around extract_stack - that's incorrect, it is a wrapper
around extract_tb)

Cheers,
Nick.

--
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From ncoghlan at gmail.com  Sun Nov 14 12:44:19 2010
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sun, 14 Nov 2010 21:44:19 +1000
Subject: [Python-Dev] Removal of Win32 ANSI API
In-Reply-To: <4CDFB683.5000709@v.loewis.de>
References: <4CDC14C0.6070300@m2.ccsnet.ne.jp>
 <201011121308.30368.victor.stinner@haypocalc.com>
 <4CDEBB11.5050209@m2.ccsnet.ne.jp>
 <201011140106.55153.victor.stinner@haypocalc.com>
 <20101114011928.0f1e3d60@pitrou.net>
 <4CDFB683.5000709@v.loewis.de>
Message-ID:

On Sun, Nov 14, 2010 at 8:14 PM, "Martin v. Löwis" wrote:
> I'm in favor of deprecating it first.

Aye. I've made the best case I could for keeping it, and even I don't
find it terribly convincing. So deprecation for 3.2 sounds like a
reasonable option.

Regards,
Nick.

--
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From ncoghlan at gmail.com  Sun Nov 14 12:46:41 2010
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sun, 14 Nov 2010 21:46:41 +1000
Subject: [Python-Dev] PEP 385: Formatting of Hg checkin notifications
In-Reply-To: <4CDFB7FF.1000300@v.loewis.de>
References: <4CDFB7FF.1000300@v.loewis.de>
Message-ID:

On Sun, Nov 14, 2010 at 8:20 PM, "Martin v. Löwis" wrote:
> Am 14.11.2010 09:25, schrieb Nick Coghlan:
>> On Sun, Nov 14, 2010 at 1:10 PM, Terry Reedy wrote:
>>> Much better except possible for \n after 'summary:'
>>
>> That extra line break helps more for multi-line checkin messages
>> (which happen reasonably often). Doesn't really bother me either way -
>> I'm mainly looking for info on who has the ability to change the
>> format in the first place :)
>
> See
>
> http://hg.python.org/hooks/
>
> You should have push permissions to that repository.

Thanks - it will give me a chance to use Hg for something meaningful as well.

Cheers,
Nick.

--
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From ncoghlan at gmail.com  Sun Nov 14 13:39:40 2010
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sun, 14 Nov 2010 22:39:40 +1000
Subject: [Python-Dev] PEP 385: Formatting of Hg checkin notifications
In-Reply-To: <4CDFB7FF.1000300@v.loewis.de>
References: <4CDFB7FF.1000300@v.loewis.de>
Message-ID:

On Sun, Nov 14, 2010 at 8:20 PM, "Martin v. Löwis" wrote:
> See
>
> http://hg.python.org/hooks/
>
> You should have push permissions to that repository.

I suspect my hg-fu is inadequate at this point - I get an 'access
to repository "hg.python.org/hooks" not permitted' error when I try to
push the modified file to "ssh://hg at hg.python.org/hooks". (I actually
got the same error when cloning, but if I understand hg correctly, it
shouldn't matter that my clone came from the http URL rather than the
ssh one).

My username and email address in my hgrc file match those in Dirkjan's
author map, so I'm not sure what's going on there.

The change I tried to make was to replace the last couple of lines of
the header creation in mail.py's incoming() function with the following
3 lines:

    body += log.splitlines()[:-2]
    body += ['summary:\n ' + ctx.description(), '']
    body += ['files:\n ' + '\n '.join(ctx.files()), '']

Cheers,
Nick.

--
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From g.brandl at gmx.net  Sun Nov 14 14:05:12 2010
From: g.brandl at gmx.net (Georg Brandl)
Date: Sun, 14 Nov 2010 14:05:12 +0100
Subject: [Python-Dev] PEP 385: Formatting of Hg checkin notifications
In-Reply-To:
References: <4CDFB7FF.1000300@v.loewis.de>
Message-ID:

Am 14.11.2010 13:39, schrieb Nick Coghlan:
> On Sun, Nov 14, 2010 at 8:20 PM, "Martin v. Löwis" wrote:
>> See
>>
>> http://hg.python.org/hooks/
>>
>> You should have push permissions to that repository.
>
> I suspect my hg-fu is inadequate to at this point - I get an 'access
> to repository "hg.python.org/hooks" not permitted' error when I try to
> push the modified file to "ssh://hg at hg.python.org/hooks".

Martin told you only half the truth: the SSH URL is (currently) .
I think we will change that to remove the /repos/ part before going
live with the cpython repo, but the hg username remains, corresponding
to the pythondev username for SVN.

> (I actually
> got the same error when cloning, but if I understand hg correctly, it
> shouldn't matter that my clone came from the http URL rather than the
> ssh one).

That's correct.

> My username and email address in my hgrc file match those in Dirkjan's
> author map, so I'm not sure what's going on there.

The usernames and email addresses you use for commits don't matter;
as long as you can connect via SSH you can push commits with any author.
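
For readers following the formatting side of this thread, the header layout Nick proposes (summary and files as separate blocks, one file name per line, two-space indent) could be produced along the following lines. This is a standalone sketch, not the actual mail.py hook: format_header and its arguments are hypothetical, and plain strings stand in for the Mercurial changeset context.

```python
def format_header(fields, summary, files, indent="  "):
    """Build the notification header as a list of lines.

    fields  -- (name, value) pairs such as ("changeset", "816:7ebf14ab2840")
    summary -- the checkin message, possibly spanning several lines
    files   -- names of the files touched by the changeset
    """
    lines = ["%s: %s" % (name, value) for name, value in fields]
    # Blank line before the summary, heading on its own line, then the
    # message indented underneath it.
    lines.append("")
    lines.append("summary:")
    lines.extend(indent + line for line in summary.splitlines())
    # Blank line after the summary, then one indented file name per line.
    lines.append("")
    lines.append("files:")
    lines.extend(indent + name for name in files)
    lines.append("")
    return lines


header = format_header(
    [("changeset", "816:7ebf14ab2840"), ("tag", "tip")],
    "compiler_type -> name",
    ["distutils2/compiler/__init__.py", "distutils2/tests/test_config.py"],
)
print("\n".join(header))
```

Splitting the summary on line breaks keeps multi-line checkin messages indented as a block, which is the case the extra line break after "summary:" is meant to help.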

cheers,
Georg

From p.f.moore at gmail.com  Sun Nov 14 18:31:19 2010
From: p.f.moore at gmail.com (Paul Moore)
Date: Sun, 14 Nov 2010 17:31:19 +0000
Subject: [Python-Dev] Issues 9931 and 9055 - test_ttk_guionly and buildbot
 run as a service
In-Reply-To:
References:
Message-ID:

On 12 November 2010 17:07, Terry Reedy wrote:
> On 11/12/2010 3:44 AM, Paul Moore wrote:
>>
>> Hi,
>> My buildbot has been failing for some time because of these 2 issues,
>> both related to the fact that tests are hanging when run as a service
>> (and hence have no display to open GUI elements on). Both issues have
>> patches, and as far as I am aware, the patches fix the issues
>> reasonably well. What can I do to move these 2 issues forwards? As
>> things stand, my buildbot is not providing a lot of value on the 3.x
>> branch :-(
>
> http://bugs.python.org/issue9055
> is marked as a 2.7 issue only, perhaps fixed by Tim Golden's committed
> patches. Should it be re-versioned for 3.1/2? There is no patch file
> attached, though perhaps the code in Yamamoto's message is meant as such
> (but for which version?). So the first thing you could do is clarify the
> current status and remaining issue on the tracker.

Ah, sorry. I misremembered the history - you are right, I suspect this
is fixed (at least to the extent that my buildbot isn't permanently
red :-)) On rereading, I get the impression that a cleaner fix may be
possible by using the ideas in the patch for 9931, but that's probably
for another time.

> http://bugs.python.org/issue9931
> by Yamamoto is marked for all 3 versions. It seems to be a similar issue,
> though marked 'test' rather than 'ctypes'. It does have a patch by him
> apparently based on his previous comments. The issue has no responses and
> needs a patch review. So the first thing you could do is to provide one;-).
> If it looks great (no changes that you can think of) and works great, say
> so. Then it can move on to commit review stage.

OK, thanks.
I'll see if I can provide a review, and see how it goes from there.
Really, it's not that urgent that this gets fixed in the wider scheme
of things - but as my buildbot is a bit useless while the problem
remains, I'm motivated to do what I can to work on it. I'm just a
little limited in what I can do, hence the request for suggestions.

> PS. Providing links like the above makes it easier for multiple people to
> take a look and respond.

You're right, and I apologise for that. I sent the email in a hurry
and didn't consider others before sending.

Paul

From p.f.moore at gmail.com  Sun Nov 14 18:49:36 2010
From: p.f.moore at gmail.com (Paul Moore)
Date: Sun, 14 Nov 2010 17:49:36 +0000
Subject: [Python-Dev] Stable buildbots
In-Reply-To:
References: <20101113133712.60e9be27@pitrou.net>
Message-ID:

On 14 November 2010 02:40, David Bolen wrote:
> There's been a bit of an uptick in the past few weeks with hung
> python_d processes (not a new issue, but it ebbs and flows), so I'm
> going to try to pull together a monitor script this weekend to start
> killing them off automatically.  Should at least get rid of some of
> the low hanging fruit that interferes with subsequent builds.

My buildslave (x86 XP-5, see
http://www.python.org/dev/buildbot/buildslaves/moore-windows) runs
buildbot as a service. I set it up that way as I assumed that would be
the most sensible approach to avoid manual intervention on reboots,
keeping a user session permanently running, etc. But it seems that
there are a few areas where things don't work quite right when run
from a service (see, for example, http://bugs.python.org/issue9931)
and I assumed that some of my hung python_d processes were related to
that.

Do you run your slave as a service? (And for that matter, what do
other Windows slave owners do?) Are there any "best practices" for
ongoing admin of a Windows buildslave that might be worth collecting
together?
(I'll try to put some notes on what I've found together - maybe a page on the Python wiki would be the best place to collect them). Paul. From martin at v.loewis.de Sun Nov 14 19:27:22 2010 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sun, 14 Nov 2010 19:27:22 +0100 Subject: [Python-Dev] PEP 385: Formatting of Hg checkin notifications In-Reply-To: References: <4CDFB7FF.1000300@v.loewis.de> Message-ID: <4CE02A0A.1070207@v.loewis.de> > I suspect my hg-fu is inadequate to at this point - I get an 'access > to repository "hg.python.org/hooks" not permitted' error when I try to > push the modified file to "ssh://hg at hg.python.org/hooks". Try ssh://hg at hg.python.org/repos/hooks I think this is something that needs to be fixed: I fail to see the point of having this extra repos/ directory in the path (even though it's certainly useful to have them all in a separate directory on disk). It's also unfortunate that hg complains it can't give access to /hooks, when the problem really is that the repository doesn't exist. I guess this is because it tries to create it, and then finds that it can't. Regards, Martin From solipsis at pitrou.net Sun Nov 14 19:35:07 2010 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sun, 14 Nov 2010 19:35:07 +0100 Subject: [Python-Dev] PEP 385: Formatting of Hg checkin notifications References: <4CDFB7FF.1000300@v.loewis.de> <4CE02A0A.1070207@v.loewis.de> Message-ID: <20101114193507.7959c860@pitrou.net> On Sun, 14 Nov 2010 19:27:22 +0100 "Martin v. L?wis" wrote: > > I suspect my hg-fu is inadequate to at this point - I get an 'access > > to repository "hg.python.org/hooks" not permitted' error when I try to > > push the modified file to "ssh://hg at hg.python.org/hooks". 
> > Try > > ssh://hg at hg.python.org/repos/hooks > > I think this is something that needs to be fixed: I fail to see the > point of having this extra repos/ directory in the path (even though > it's certainly useful to have them all in a separate directory on disk). IIUC, "repos/hooks" is interpreted as a relative path to the "hg" user's HOME. The "ssh://" scheme executes remote hg over an ssh session, I don't think there's any additional magic. Regards Antoine. From martin at v.loewis.de Sun Nov 14 19:49:44 2010 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sun, 14 Nov 2010 19:49:44 +0100 Subject: [Python-Dev] PEP 385: Formatting of Hg checkin notifications In-Reply-To: <20101114193507.7959c860@pitrou.net> References: <4CDFB7FF.1000300@v.loewis.de> <4CE02A0A.1070207@v.loewis.de> <20101114193507.7959c860@pitrou.net> Message-ID: <4CE02F48.3040207@v.loewis.de> >> I think this is something that needs to be fixed: I fail to see the >> point of having this extra repos/ directory in the path (even though >> it's certainly useful to have them all in a separate directory on disk). > > IIUC, "repos/hooks" is interpreted as a relative path to the "hg" > user's HOME. The "ssh://" scheme executes remote hg over an ssh > session, I don't think there's any additional magic. Correct. However, this just means that additional magic is required. Regards, Martin From vinay_sajip at yahoo.co.uk Sun Nov 14 21:05:16 2010 From: vinay_sajip at yahoo.co.uk (Vinay Sajip) Date: Sun, 14 Nov 2010 20:05:16 +0000 (UTC) Subject: [Python-Dev] unexpected traceback/stack behavior with chained exceptions (issue 1553375) References: <20101114034052.39AE81FC192@kimball.webabinitio.net> Message-ID: Nick Coghlan gmail.com> writes: > of the exception information. logging could also gain an independent > "stack_trace=True" option to request inclusion of a stack trace > independently of whether or not exception information is included. Good point, Nick. 
There are times when you'd want to know how you got to a certain point in code, irrespective of whether any exception occurred. So your suggestion makes sense, and I'll try and see if I can get it into 3.2. Another benefit of this is that a user only gets this if they want it; if I were to use the allframes flag in logging, then everyone would get the print_stack() even if they didn't want it. Regards, Vinay Sajip From g.brandl at gmx.net Sun Nov 14 21:36:37 2010 From: g.brandl at gmx.net (Georg Brandl) Date: Sun, 14 Nov 2010 21:36:37 +0100 Subject: [Python-Dev] PEP 385: Formatting of Hg checkin notifications In-Reply-To: <20101114193507.7959c860@pitrou.net> References: <4CDFB7FF.1000300@v.loewis.de> <4CE02A0A.1070207@v.loewis.de> <20101114193507.7959c860@pitrou.net> Message-ID: Am 14.11.2010 19:35, schrieb Antoine Pitrou: > On Sun, 14 Nov 2010 19:27:22 +0100 > "Martin v. L?wis" wrote: >> > I suspect my hg-fu is inadequate to at this point - I get an 'access >> > to repository "hg.python.org/hooks" not permitted' error when I try to >> > push the modified file to "ssh://hg at hg.python.org/hooks". >> >> Try >> >> ssh://hg at hg.python.org/repos/hooks >> >> I think this is something that needs to be fixed: I fail to see the >> point of having this extra repos/ directory in the path (even though >> it's certainly useful to have them all in a separate directory on disk). > > IIUC, "repos/hooks" is interpreted as a relative path to the "hg" > user's HOME. The "ssh://" scheme executes remote hg over an ssh > session, I don't think there's any additional magic. There is; we already have a custom authorized_keys command in place to call the hg-ssh wrapper, and all that's needed is to customize that command a bit more. 
Georg From db3l.net at gmail.com Sun Nov 14 22:24:55 2010 From: db3l.net at gmail.com (David Bolen) Date: Sun, 14 Nov 2010 16:24:55 -0500 Subject: [Python-Dev] Stable buildbots References: <20101113133712.60e9be27@pitrou.net> Message-ID: Paul Moore writes: > Do you run your slave as a service? (And for that matter, what do > other Windows slave owners do?) Are there any "best practices" for > ongoing admin of a Windows buildslave that might be worth collecting > together? (I'll try to put some notes on what I've found together - > maybe a page on the Python wiki would be the best place to collect > them). I've always run my slave interactively under Windows (well, started it interactively). Not sure if I tried a service in the beginning or not, it was a while ago. So your slave is probably the guinea pig for service operation. There is http://wiki.python.org/moin/BuildbotOnWindows (for which I can't take any credit). It could probably use a little love and updating, and it's largely aimed at setting things up, but not as much operating it. I think the only stuff I'm doing on my slave above and beyond the basic setup is a small patch to buildbot (circa 2007, couldn't get it back upstream at the time) to use SetErrorMode to disable OS pop-ups, and the AutoIt script (from earlier this year) to auto-acknowledge C RTL pop-ups. The kill script in this thread as a safety net above kill_python would be a third tweak. There was a buildbot fix for uploading that was only needed for the short-lived MSI generation, and which I think later buildbot versions have their own changes for. I'd be happy to work with you if you're willing to combine/edit our bits of information. Probably something we can take off-list, so just let me know. 
-- David

From ncoghlan at gmail.com Mon Nov 15 12:45:46 2010
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Mon, 15 Nov 2010 21:45:46 +1000
Subject: [Python-Dev] PEP 385: Formatting of Hg checkin notifications
In-Reply-To: <4CE02A0A.1070207@v.loewis.de>
References: <4CDFB7FF.1000300@v.loewis.de> <4CE02A0A.1070207@v.loewis.de>
Message-ID:

On Mon, Nov 15, 2010 at 4:27 AM, "Martin v. Löwis" wrote:
>> I suspect my hg-fu is inadequate to at this point - I get an 'access
>> to repository "hg.python.org/hooks" not permitted' error when I try to
>> push the modified file to "ssh://hg at hg.python.org/hooks".
>
> Try
>
> ssh://hg at hg.python.org/repos/hooks

And done :)

Hopefully I didn't break anything in the process...

Cheers,
Nick.

--
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From ncoghlan at gmail.com Mon Nov 15 14:24:01 2010
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Mon, 15 Nov 2010 23:24:01 +1000
Subject: [Python-Dev] [Python-checkins] r86467 - in python/branches/py3k: Doc/library/logging.rst Lib/logging/__init__.py Misc/NEWS
In-Reply-To: <20101114213304.ED32AEE997@mail.python.org>
References: <20101114213304.ED32AEE997@mail.python.org>
Message-ID:

On Mon, Nov 15, 2010 at 7:33 AM, vinay.sajip wrote:
>
> +   .. attribute:: stack_info
> +
> +      Stack frame information (where available) from the bottom of the stack
> +      in the current thread, up to and including the stack frame of the
> +      logging call which resulted in the creation of this record.
> +

Interesting - my mental model of the call stack is that the outermost
frame is the top of the stack and the stack grows downwards as calls
are executed (there are a few idioms like "recursive descent", the
intuitive parallel with "inner functions" being lower in the stack
than "outer functions" as well as the order in which Python prints
stack traces that reinforce this view).

According to the sys._getframe documentation, my mental model is wrong
though :)

(I'll note that the documentation of frame objects in the language
reference itself appears a little confused on the matter - either that
or I'm completely misunderstanding when writing to f_lineno will work)

Cheers,
Nick.

--
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From reid.kleckner at gmail.com Mon Nov 15 18:01:36 2010
From: reid.kleckner at gmail.com (Reid Kleckner)
Date: Mon, 15 Nov 2010 12:01:36 -0500
Subject: [Python-Dev] [Python-checkins] r86467 - in python/branches/py3k: Doc/library/logging.rst Lib/logging/__init__.py Misc/NEWS
In-Reply-To:
References: <20101114213304.ED32AEE997@mail.python.org>
Message-ID:

On Mon, Nov 15, 2010 at 8:24 AM, Nick Coghlan wrote:
> On Mon, Nov 15, 2010 at 7:33 AM, vinay.sajip wrote:
>>
>> +   .. attribute:: stack_info
>> +
>> +      Stack frame information (where available) from the bottom of the stack
>> +      in the current thread, up to and including the stack frame of the
>> +      logging call which resulted in the creation of this record.
>> +
>
> Interesting - my mental model of the call stack is that the outermost
> frame is the top of the stack and the stack grows downwards as calls
> are executed (there are a few idioms like "recursive descent", the
> intuitive parallel with "inner functions" being lower in the stack
> than "outer functions" as well as the order in which Python prints
> stack traces that reinforce this view).

Probably because the C stack tends to grow down for most
architectures, but most stack data structures are implemented over
arrays and hence grow upwards from 0. Depending on the author's
background, they probably use one mental model or the other.
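
[Editorial sketch, not part of the original thread.] The ordering question
above can be checked directly. The following is a minimal sketch, assuming a
Python version (3.2 or later) in which the logging calls accept the
stack_info=True argument discussed in this thread. The logger name
"stack_demo" and the helper functions outer, inner and capture_stack_log are
hypothetical names chosen only for this illustration.

```python
import io
import logging


def inner(logger):
    # The logging call whose frame should be the last one in the output.
    logger.info("checkpoint", stack_info=True)


def outer(logger):
    inner(logger)


def capture_stack_log():
    """Log one record with stack_info=True and return the formatted text."""
    stream = io.StringIO()
    handler = logging.StreamHandler(stream)
    logger = logging.getLogger("stack_demo")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    logger.propagate = False
    logger.addHandler(handler)
    try:
        outer(logger)
    finally:
        logger.removeHandler(handler)
    return stream.getvalue()


if __name__ == "__main__":
    # The text starts with the message, followed by a
    # "Stack (most recent call last):" section listing frames oldest first.
    print(capture_stack_log())
```

In the captured text the frame for outer is listed before the frame for
inner, which matches the "most recent call last" order Python uses when it
prints tracebacks.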
Reid

From techtonik at gmail.com Mon Nov 15 21:43:08 2010
From: techtonik at gmail.com (anatoly techtonik)
Date: Mon, 15 Nov 2010 22:43:08 +0200
Subject: [Python-Dev] PEP 385: Formatting of Hg checkin notifications
In-Reply-To:
References:
Message-ID:

On Sun, Nov 14, 2010 at 5:10 AM, Terry Reedy wrote:
> On 11/13/2010 8:28 PM, Nick Coghlan wrote:
>>
>> Following the python-checkins list, I get to see both the current SVN
>> notifications and the Hg notifications from Tarek's pushes into the
>> distutils repository. I realised today that there is one key reason as
>> to why the latter strikes me as a big wall of unintelligible text,
>> while I find the SVN notification quite easy to read: vertical
>> whitespace.
>>
>> The SVN notification uses vertical whitespace to separate out the log
>> message and the list of files affected clearly from the rest of the
>> header fields. It makes it *really* easy to see at a glance what the
>> checkin was about and which files were affected. For the Hg
>> notification, both of these fields are embedded in a big header block
>> along with all the other fields, so it is quite difficult to make out
>> the same information.
>>
>> It would be really nice if the formatting could be improved for the
>> email notifications on the Hg side when we adopt it for the main
>> CPython repository. The changes would be to:
>> - add a blank line before and after the summary field
>> - add a carriage return between the header and content for the summary
>> field and the files field
>> - indent the list of files by two spaces and use a carriage return
>> rather than a comma to separate named files
>>
>> I've included an example below based on one of Tarek's recent pushes:
>>
>> Current Hg notification header and start of first diff:
>> ================================================
>> tarek.ziade pushed 7ebf14ab2840 to distutils2:
>>
>> http://hg.python.org/distutils2/rev/7ebf14ab2840
>> changeset:   816:7ebf14ab2840
>> tag:         tip
>> user:        Tarek Ziade
>> date:        Sat Nov 13 12:40:33 2010 +0100
>> summary:     compiler_type -> name
>> files:       distutils2/compiler/__init__.py,
>> distutils2/compiler/bcppcompiler.py, distutils2/compiler/ccompiler.py,
>> distutils2/compiler/cygwinccompiler.py,
>> distutils2/compiler/msvc9compiler.py,
>> distutils2/compiler/msvccompiler.py,
>> distutils2/compiler/unixccompiler.py, distutils2/tests/test_config.py
>>
>> diff --git a/distutils2/compiler/__init__.py
>> b/distutils2/compiler/__init__.py
>> --- a/distutils2/compiler/__init__.py
>> +++ b/distutils2/compiler/__init__.py
>> @@ -13,7 +13,7 @@
>> ====================================================
>>
>> Proposed change to separate out summary and files fields:
>> ================================================
>> tarek.ziade pushed 7ebf14ab2840 to distutils2:
>>
>> http://hg.python.org/distutils2/rev/7ebf14ab2840
>> changeset:   816:7ebf14ab2840
>> tag:         tip
>> user:        Tarek Ziade
>> date:        Sat Nov 13 12:40:33 2010 +0100
>>
>> summary:
>> compiler_type -> name
>>
>> files:
>>   distutils2/compiler/__init__.py
>>   distutils2/compiler/bcppcompiler.py
>>   distutils2/compiler/ccompiler.py
>>   distutils2/compiler/cygwinccompiler.py
>>   distutils2/compiler/msvc9compiler.py
>>   distutils2/compiler/msvccompiler.py
>>   distutils2/compiler/unixccompiler.py
>>   distutils2/tests/test_config.py
>>
>> diff --git a/distutils2/compiler/__init__.py
>> b/distutils2/compiler/__init__.py
>> --- a/distutils2/compiler/__init__.py
>> +++ b/distutils2/compiler/__init__.py
>> @@ -13,7 +13,7 @@
>> ====================================================
>
> Much better except possible for \n after 'summary:'

Why not drop the "summary" label altogether? The purpose of the text
delimited with newlines is quite obvious.

--
anatoly t.
From brian.curtin at gmail.com Tue Nov 16 01:23:51 2010 From: brian.curtin at gmail.com (Brian Curtin) Date: Mon, 15 Nov 2010 18:23:51 -0600 Subject: [Python-Dev] Stable buildbots In-Reply-To: References: <20101113133712.60e9be27@pitrou.net> Message-ID: On Sun, Nov 14, 2010 at 02:48, David Bolen wrote: > Nick Coghlan writes: > > > Do we have any idea why the workaround to avoid the popup windows > > stopped working? (assuming it ever worked reliably - I thought it did, > > but that impression may have been incorrect) > > Oh, the pop-up handling for the RTL dialogs still seems to be working > fine (at least I haven't seen any since I put it in place). That, plus > the original buildbot tweaks to block any OS popups still looks solid > for avoiding any dialogs that block a test process. > > This is a completely separate issue, though probably around just as > long, and like the popup problem its frequency changes over time. By > "hung" here I'm referring to cases where something must go wrong with > a test and/or its cleanup such that a python_d process remains > running, usually several of them at the same time. So I end up with a > bunch of python_d processes in the background (but not with any > dialogs pending), which eventually cause errors during attempts the > next time the same builder is used since the file remains in use. > > I expect some of this may be the lack of a good process group cleanup > under Windows, though the root cause may not be unique to Windows. I > see something very similar reasonable frequency on my OSX Tiger > buildbot as well. But since the filesystem there can let the build > tree get cleaned and rebuilt even with a stranded executable, the > impact is minimal on subsequent tests than on Windows, though the OSX > processes do burn a ton of CPU. I run a script on OSX to kill them > off, but that was quick to whip up since in those cases the stranded > processes all end up getting owned by init so it's a simple ps grep > and kill. 
> In the Windows case I'll probably just set a time limit so
> if the processes have been around more than a few hours I figure
> they're safe to kill.
>
> -- David

Is the dialog closer script available somewhere? I'm guessing this is the
same script that closes the window which pops up during test_capi's crash?

I just set up a Windows Server 2008 R2 x64 build slave and noticed it
hanging due to the popup.

From db3l.net at gmail.com Tue Nov 16 03:35:05 2010
From: db3l.net at gmail.com (David Bolen)
Date: Mon, 15 Nov 2010 21:35:05 -0500
Subject: [Python-Dev] Stable buildbots
References: <20101113133712.60e9be27@pitrou.net>
Message-ID:

Brian Curtin writes:

> Is the dialog closer script available somewhere? I'm guessing this is the
> same script that closes the window which pops up during test_capi's crash?

Not sure about that specific test, as I won't normally see the windows.
If the failure is causing a C RTL pop-up, then yes, the script will be
closing it. If the test is generating an OS level pop-up (process error
dialog from the OS, not RTL) then that is instead suppressed for any of
the child processes run on my slave, so it never shows up at all.

The RTL script is trivial enough that I'll just include it inline:

- - - - - - - - - - - - - - - - - - - - - - - - -
; buildbot.au3
; Forceably acknowledge any RTL pop-ups that may occur during testing

$MSVCRT = "Microsoft Visual C++ Runtime Library"

while 1
    ; Wait for any RTL pop-up and then acknowledge
    WinWait($MSVCRT)
    ControlClick($MSVCRT, "", "[CLASS:Button; TEXT:OK]")

    ; Safety check to avoid spinning if it doesn't go away
    Sleep(1000)
WEnd
- - - - - - - - - - - - - - - - - - - - - - - - -

Execute with AutoIt3 (http://www.autoitscript.com/autoit3/). I just use
the plain autoit3.exe against this script from the Startup folder.
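
[Editorial sketch, not part of the original thread.] The OS-level pop-up
suppression described above relies on the Win32 SetErrorMode call. The
following is a minimal standalone sketch of that idea, assuming ctypes is
available. The helper name quiet_error_mode and the constant
QUIET_ERROR_MODE are hypothetical and are not taken from buildbot. On
non-Windows platforms the helper deliberately does nothing.

```python
import contextlib
import sys

# Win32 error-mode flags; their combination is the value 7.
SEM_FAILCRITICALERRORS = 0x0001
SEM_NOGPFAULTERRORBOX = 0x0002
SEM_NOALIGNMENTFAULTEXCEPT = 0x0004
QUIET_ERROR_MODE = (SEM_FAILCRITICALERRORS
                    | SEM_NOGPFAULTERRORBOX
                    | SEM_NOALIGNMENTFAULTEXCEPT)


@contextlib.contextmanager
def quiet_error_mode(mode=QUIET_ERROR_MODE):
    """Temporarily disable OS error dialogs for this process (Windows only).

    Child processes started inside the block inherit the error mode, which
    is what keeps a crashing test from blocking on a dialog.
    """
    if sys.platform != "win32":
        # Assumption for this sketch: silently do nothing elsewhere.
        yield None
        return
    import ctypes
    kernel32 = ctypes.windll.kernel32
    previous = kernel32.SetErrorMode(mode)
    try:
        yield previous
    finally:
        # Restore whatever mode was active before entering the block.
        kernel32.SetErrorMode(previous)
```

A caller would wrap the process launch, for example
"with quiet_error_mode(): subprocess.call(cmd)", so that the previous mode
is restored once the child has been spawned.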

The error mode buildbot patch was discussed in the past on this list
(or it might have been the python-3000-devel list at the time).
Originally it just used pywin32, but I added a fallback to ctypes if
available. When first done, we were still building pre-2.5 builds - I
suppose at this point it could just assume the presence of ctypes.

The patch below is from 0.7.11p3:

- - - - - - - - - - - - - - - - - - - - - - - - -
--- commands.py	2009-08-13 11:53:17.000000000 -0400
+++ /cygdrive/d/python/2.6/lib/site-packages/buildbot/slave/commands.py	2009-11-08 02:09:38.000000000 -0500
@@ -489,6 +489,23 @@
         if not self.keepStdinOpen:
             self.pp.closeStdin()
 
+        # [db3l] Under Win32, try to control error mode
+        win32_SetErrorMode = None
+        if runtime.platformType == 'win32':
+            try:
+                import win32api
+                win32_SetErrorMode = win32api.SetErrorMode
+            except:
+                try:
+                    import ctypes
+                    win32_SetErrorMode = ctypes.windll.kernel32.SetErrorMode
+                except:
+                    pass
+
+        if win32_SetErrorMode:
+            log.msg(" Setting Windows error mode")
+            old_err_mode = win32_SetErrorMode(7)
+
         # win32eventreactor's spawnProcess (under twisted <= 2.0.1) returns
         # None, as opposed to all the posixbase-derived reactors (which
         # return the new Process object). This is a nuisance. We can make up
@@ -509,6 +526,10 @@
         if not self.process:
             self.process = p
 
+        # [db3l]
+        if win32_SetErrorMode:
+            win32_SetErrorMode(old_err_mode)
+
         # connectionMade also closes stdin as long as we're not using a PTY.
         # This is intended to kill off inappropriately interactive commands
         # better than the (long) hung-command timeout. ProcessPTY should be
- - - - - - - - - - - - - - - - - - - - - - - - -

-- David

From janssen at parc.com Tue Nov 16 04:57:10 2010
From: janssen at parc.com (Bill Janssen)
Date: Mon, 15 Nov 2010 19:57:10 PST
Subject: [Python-Dev] Stable buildbots
In-Reply-To:
References: <20101113133712.60e9be27@pitrou.net>
Message-ID: <30929.1289879830@parc.com>

Both the Tiger buildbots are suddenly failing 3.x on test_cmd_line.
Looking at the changes since the last success, I can't see anything which would obviously affect that... Any suspects? Here's what's failing: ====================================================================== ERROR: test_run_code (test.test_cmd_line.CmdLineTest) ---------------------------------------------------------------------- Traceback (most recent call last): File "/Users/buildbot/buildarea/3.x.parc-tiger-1/build/Lib/test/test_cmd_line.py", line 95, in test_run_code assert_python_failure('-c') File "/Users/buildbot/buildarea/3.x.parc-tiger-1/build/Lib/test/script_helper.py", line 55, in assert_python_failure return _assert_python(False, *args, **env_vars) File "/Users/buildbot/buildarea/3.x.parc-tiger-1/build/Lib/test/script_helper.py", line 29, in _assert_python env=env) File "/Users/buildbot/buildarea/3.x.parc-tiger-1/build/Lib/subprocess.py", line 683, in __init__ self.stdin = io.open(p2cwrite, 'wb', bufsize) OSError: [Errno 9] Bad file descriptor ====================================================================== ERROR: test_run_module (test.test_cmd_line.CmdLineTest) ---------------------------------------------------------------------- Traceback (most recent call last): File "/Users/buildbot/buildarea/3.x.parc-tiger-1/build/Lib/test/test_cmd_line.py", line 72, in test_run_module assert_python_failure('-m') File "/Users/buildbot/buildarea/3.x.parc-tiger-1/build/Lib/test/script_helper.py", line 55, in assert_python_failure return _assert_python(False, *args, **env_vars) File "/Users/buildbot/buildarea/3.x.parc-tiger-1/build/Lib/test/script_helper.py", line 29, in _assert_python env=env) File "/Users/buildbot/buildarea/3.x.parc-tiger-1/build/Lib/subprocess.py", line 683, in __init__ self.stdin = io.open(p2cwrite, 'wb', bufsize) OSError: [Errno 9] Bad file descriptor ====================================================================== ERROR: test_version (test.test_cmd_line.CmdLineTest) 
---------------------------------------------------------------------- Traceback (most recent call last): File "/Users/buildbot/buildarea/3.x.parc-tiger-1/build/Lib/test/test_cmd_line.py", line 48, in test_version rc, out, err = assert_python_ok('-V') File "/Users/buildbot/buildarea/3.x.parc-tiger-1/build/Lib/test/script_helper.py", line 48, in assert_python_ok return _assert_python(True, *args, **env_vars) File "/Users/buildbot/buildarea/3.x.parc-tiger-1/build/Lib/test/script_helper.py", line 29, in _assert_python env=env) File "/Users/buildbot/buildarea/3.x.parc-tiger-1/build/Lib/subprocess.py", line 683, in __init__ self.stdin = io.open(p2cwrite, 'wb', bufsize) OSError: [Errno 9] Bad file descriptor Bill From nad at acm.org Tue Nov 16 10:21:29 2010 From: nad at acm.org (Ned Deily) Date: Tue, 16 Nov 2010 01:21:29 -0800 Subject: [Python-Dev] Stable buildbots References: <20101113133712.60e9be27@pitrou.net> <30929.1289879830@parc.com> Message-ID: In article <30929.1289879830 at parc.com>, Bill Janssen wrote: > Both the Tiger buildbots are suddenly failing 3.x on test_cmd_line. > Looking at the changes since the last success, I can't see anything > which would obviously affect that... Any suspects? It appears to be a duplicate of Issue8458. Playing with it again, it seems to be a race condition: sometimes I see all three failures you reported, sometimes just one, sometimes none. Again, only on 10.4 (Tiger), not 10.5 or 10.6. But the 10.4 machine I'm using is by far the slowest of the three so it is possible that could be a factor. Perhaps a race condition with cleaning up the p2c pipe from a previous run? 
> Here's what's failing: > > ====================================================================== > ERROR: test_run_code (test.test_cmd_line.CmdLineTest) > ---------------------------------------------------------------------- > Traceback (most recent call last): > File > "/Users/buildbot/buildarea/3.x.parc-tiger-1/build/Lib/test/test_cmd_line.py" > , line 95, in test_run_code > assert_python_failure('-c') > File > "/Users/buildbot/buildarea/3.x.parc-tiger-1/build/Lib/test/script_helper.py" > , line 55, in assert_python_failure > return _assert_python(False, *args, **env_vars) > File > "/Users/buildbot/buildarea/3.x.parc-tiger-1/build/Lib/test/script_helper.py" > , line 29, in _assert_python > env=env) > File "/Users/buildbot/buildarea/3.x.parc-tiger-1/build/Lib/subprocess.py", > line 683, in __init__ > self.stdin = io.open(p2cwrite, 'wb', bufsize) > OSError: [Errno 9] Bad file descriptor > > ====================================================================== > ERROR: test_run_module (test.test_cmd_line.CmdLineTest) > ---------------------------------------------------------------------- > Traceback (most recent call last): > File > "/Users/buildbot/buildarea/3.x.parc-tiger-1/build/Lib/test/test_cmd_line.py" > , line 72, in test_run_module > assert_python_failure('-m') > File > "/Users/buildbot/buildarea/3.x.parc-tiger-1/build/Lib/test/script_helper.py" > , line 55, in assert_python_failure > return _assert_python(False, *args, **env_vars) > File > "/Users/buildbot/buildarea/3.x.parc-tiger-1/build/Lib/test/script_helper.py" > , line 29, in _assert_python > env=env) > File "/Users/buildbot/buildarea/3.x.parc-tiger-1/build/Lib/subprocess.py", > line 683, in __init__ > self.stdin = io.open(p2cwrite, 'wb', bufsize) > OSError: [Errno 9] Bad file descriptor > > ====================================================================== > ERROR: test_version (test.test_cmd_line.CmdLineTest) > ---------------------------------------------------------------------- > 
Traceback (most recent call last): > File > "/Users/buildbot/buildarea/3.x.parc-tiger-1/build/Lib/test/test_cmd_line.py" > , line 48, in test_version > rc, out, err = assert_python_ok('-V') > File > "/Users/buildbot/buildarea/3.x.parc-tiger-1/build/Lib/test/script_helper.py" > , line 48, in assert_python_ok > return _assert_python(True, *args, **env_vars) > File > "/Users/buildbot/buildarea/3.x.parc-tiger-1/build/Lib/test/script_helper.py" > , line 29, in _assert_python > env=env) > File "/Users/buildbot/buildarea/3.x.parc-tiger-1/build/Lib/subprocess.py", > line 683, in __init__ > self.stdin = io.open(p2cwrite, 'wb', bufsize) > OSError: [Errno 9] Bad file descriptor -- Ned Deily, nad at acm.org From georg at python.org Tue Nov 16 15:05:51 2010 From: georg at python.org (Georg Brandl) Date: Tue, 16 Nov 2010 15:05:51 +0100 Subject: [Python-Dev] [RELEASED] Python 3.2 alpha 4 Message-ID: <4CE28FBF.9020200@python.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On behalf of the Python development team, I'm happy to announce the fourth and (this time really) final alpha preview release of Python 3.2. Python 3.2 is a continuation of the efforts to improve and stabilize the Python 3.x line. Since the final release of Python 2.7, the 2.x line will only receive bugfixes, and new features are developed for 3.x only. Since PEP 3003, the Moratorium on Language Changes, is in effect, there are no changes in Python's syntax and built-in types in Python 3.2. Development efforts concentrated on the standard library and support for porting code to Python 3. 

Highlights are:

* numerous improvements to the unittest module
* PEP 3147, support for .pyc repository directories
* PEP 3149, support for version tagged dynamic libraries
* an overhauled GIL implementation that reduces contention
* many consistency and behavior fixes for numeric operations
* countless fixes regarding string/unicode issues; among them full
  support for a bytes environment (filenames, environment variables)
* a sysconfig module to access configuration information
* a pure-Python implementation of the datetime module
* additions to the shutil module, among them archive file support
* improvements to pdb, the Python debugger

For an extensive list of changes in 3.2, see Misc/NEWS in the Python
distribution.

To download Python 3.2 visit:

     http://www.python.org/download/releases/3.2/

3.2 documentation can be found at:

     http://docs.python.org/3.2/

Please consider trying Python 3.2 with your code and reporting any bugs
you may notice to:

     http://bugs.python.org/

Enjoy!

- --
Georg Brandl, Release Manager
georg at python.org
(on behalf of the entire python-dev team and 3.2's contributors)
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.16 (GNU/Linux)

iEYEARECAAYFAkzij74ACgkQN9GcIYhpnLCbtwCgi4whRruM0Oi6yfgjVclYErFa
OJcAn0U8UBBsQBFyGcnKJRbls6B+guQ2
=Vuqf
-----END PGP SIGNATURE-----

From p.f.moore at gmail.com Tue Nov 16 16:05:49 2010
From: p.f.moore at gmail.com (Paul Moore)
Date: Tue, 16 Nov 2010 15:05:49 +0000
Subject: [Python-Dev] [RELEASED] Python 3.2 alpha 4
In-Reply-To:
References: <4CE28FBF.9020200@python.org>
Message-ID:

(Copying to the list, sorry Georg for the duplicate)

On 16 November 2010 14:05, Georg Brandl wrote:
> On behalf of the Python development team, I'm happy to announce the
> fourth and (this time really) final alpha preview release of Python 3.2.

PEP 3148 (Futures) is noted in the PEP as going into 3.2. It also
seems to be in the release. Should it not be added to the "What's new
in 3.2" document and the release announcements?
It's a fairly significant feature. Paul. From alexander.belopolsky at gmail.com Tue Nov 16 16:16:15 2010 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Tue, 16 Nov 2010 10:16:15 -0500 Subject: [Python-Dev] Breaking undocumented API In-Reply-To: References: <64DF4272-FF17-4E82-96F5-1DA6CA3A06EC@gmail.com> <4CDAA27B.8040703@voidspace.org.uk> <4CDBDB0C.6080703@voidspace.org.uk> <20101111100516.6e90aa41@mission> <4CDC08F3.6010501@langa.pl> <4CDC0950.5040309@voidspace.org.uk> Message-ID: What this thread has shown is that there is no consensus on what public names are and what rules should be followed when changing names that can be imported from a module. I have opened an issue at http://bugs.python.org/issue10434 to address this. My vote is to adopt the definition spelled out in the language reference, copy it to the library manual and add some discussion of the deprecation policies. I also have a similar question about C API. Here, in absence of __all__, the answer should be clear: all symbols in public header files should start with either _Py_ or Py_ and those that start with Py_ are public. The question is what should be done with names that start with Py_, but are not documented? Can we add an underscore to those names? If so, should a (deprecated) alias be made available? Should they be documented as deprecated? I think these questions can only be answered on a case by case bases which choices being: 1. Document. 2. Document as deprecated. 3. Document as deprecated, add underscore prefix and retain a deprecated alias. 4. Add an underscore prefix. The specific set of names that I would like to consider is the following from unicode.h. 
I am marking with (*) the names that I think should be documented and
with (D) those that should be deprecated:

PyUnicode_GetMax
PyUnicode_Resize (*)
PyUnicode_InternImmortal
PyUnicode_FromOrdinal (*)
PyUnicode_GetDefaultEncoding (D)
PyUnicode_AsDecodedObject
PyUnicode_AsDecodedUnicode
PyUnicode_AsEncodedObject
PyUnicode_AsEncodedUnicode
PyUnicode_BuildEncodingMap
PyUnicode_EncodeDecimal (*)
PyUnicode_Append (*)
PyUnicode_AppendAndDel (*)
PyUnicode_Partition (*)
PyUnicode_RPartition (*)
PyUnicode_RSplit (*)
PyUnicode_IsIdentifier (*)
Py_UNICODE_strlen
Py_UNICODE_strcpy
Py_UNICODE_strcat
Py_UNICODE_strncpy
Py_UNICODE_strcmp
Py_UNICODE_strncmp
Py_UNICODE_strchr
Py_UNICODE_strrchr

On Sat, Nov 13, 2010 at 7:12 AM, Giampaolo Rodolà wrote:
> +1 on everything.
>
> 2010/11/11 Alexander Belopolsky :
>> 2010/11/11 Michael Foord :
>> ..
>>>> You mean runtime automation, e.g. creating __all__ on the fly omitting
>>>> underscored names?
>>>>
>>> Writing code to generate a __all__ that duplicates the default behaviour
>>> seems redundant to me.
>>>
>>
>> FWIW, I like having __all__ at the top of the module. It feels like a
>> table of contents at the start of a chapter. In some cases it may
>> also serve as an optimization when len(__all__) is much smaller than
>> len(__dict__). I also don't like _ prefix to become an exclusive
>> means to express privateness.
>>
>> I think the current definition of "public names" is a good one and
>> just needs to be made more visible in the docs. If the module defines
>> __all__, that should be the ultimate answer to what is public in that
>> module. (Users should learn to use help(module) instead of
>> dir(module) for API discovery.) If __all__ is not defined in the
>> module, I think it is good to introduce it after a careful review of
>> what it should contain. And __all__ should never contain names that
>> start with _.
>> _______________________________________________ >> Python-Dev mailing list >> Python-Dev at python.org >> http://mail.python.org/mailman/listinfo/python-dev >> Unsubscribe: http://mail.python.org/mailman/options/python-dev/g.rodola%40gmail.com >> > From fuzzyman at voidspace.org.uk Tue Nov 16 16:31:10 2010 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Tue, 16 Nov 2010 15:31:10 +0000 Subject: [Python-Dev] Breaking undocumented API In-Reply-To: References: <64DF4272-FF17-4E82-96F5-1DA6CA3A06EC@gmail.com> <4CDAA27B.8040703@voidspace.org.uk> <4CDBDB0C.6080703@voidspace.org.uk> <20101111100516.6e90aa41@mission> <4CDC08F3.6010501@langa.pl> <4CDC0950.5040309@voidspace.org.uk> Message-ID: <4CE2A3BE.6060308@voidspace.org.uk> On 16/11/2010 15:16, Alexander Belopolsky wrote: > What this thread has shown is that there is no consensus on what > public names are and what rules should be followed when changing names > that can be imported from a module. I have opened an issue at > http://bugs.python.org/issue10434 to address this. My vote is to > adopt the definition spelled out in the language reference, copy it to > the library manual and add some discussion of the deprecation > policies. > Whilst the definition in the reference manual is fine it only covers module level public APIs (which I realise is your particular concern) it doesn't cover whether a module in a package is public and doesn't cover class members. The rules for these follow as a natural extension, but if we are going to bother codifying the rules (which I think is good given the confusion) then it is worth covering these cases. 
I posted a suggested wording in an earlier message: http://mail.python.org/pipermail/python-dev/2010-November/105476.html We could also note that existing modules that don't follow these rules will generally follow the deprecation rules for "accidentally public" names, but that this will be decided on a case-by-case basis and that names *obviously* never intended to be public may be changed if it is believed that they aren't (or really shouldn't be) in use. All the best, Michael Foord > I also have a similar question about C API. Here, in absence of > __all__, the answer should be clear: all symbols in public header > files should start with either _Py_ or Py_ and those that start with > Py_ are public. The question is what should be done with names that > start with Py_, but are not documented? Can we add an underscore to > those names? If so, should a (deprecated) alias be made available? > Should they be documented as deprecated? > > I think these questions can only be answered on a case by case bases > which choices being: > > 1. Document. > 2. Document as deprecated. > 3. Document as deprecated, add underscore prefix and retain a deprecated alias. > 4. Add an underscore prefix. > > The specific set of names that I would like to consider is the > following from unicode.h. 
I am marking with (*) the names that I > think should be documented and with (D) those that should be > deprecated: > > PyUnicode_GetMax > PyUnicode_Resize (*) > PyUnicode_InternImmortal > PyUnicode_FromOrdinal (*) > PyUnicode_GetDefaultEncoding (D) > PyUnicode_AsDecodedObject > PyUnicode_AsDecodedUnicode > PyUnicode_AsEncodedObject > PyUnicode_AsEncodedUnicode > PyUnicode_BuildEncodingMap > PyUnicode_EncodeDecimal (*) > PyUnicode_Append (*) > PyUnicode_AppendAndDel (*) > PyUnicode_Partition (*) > PyUnicode_RPartition (*) > PyUnicode_RSplit (*) > PyUnicode_IsIdentifier (*) > Py_UNICODE_strlen > Py_UNICODE_strcpy > Py_UNICODE_strcat > Py_UNICODE_strncpy > Py_UNICODE_strcmp > Py_UNICODE_strncmp > Py_UNICODE_strchr > Py_UNICODE_strrchr > > > On Sat, Nov 13, 2010 at 7:12 AM, Giampaolo Rodolà wrote: >> +1 on everything. >> >> 2010/11/11 Alexander Belopolsky : >>> 2010/11/11 Michael Foord : >>> .. >>>>> You mean runtime automation, e.g. creating __all__ on the fly omitting >>>>> underscored names? >>>>> >>>> Writing code to generate a __all__ that duplicates the default behaviour >>>> seems redundant to me. >>>> >>> FWIW, I like having __all__ at the top of the module. It feels like a >>> table of contents at the start of a chapter. In some cases it may >>> also serve as an optimization when len(__all__) is much smaller than >>> len(__dict__). I also don't like _ prefix to become an exclusive >>> means to express privateness. >>> >>> I think the current definition of "public names" is a good one and >>> just needs to be made more visible in the docs. If the module defines >>> __all__, that should be the ultimate answer to what is public in that >>> module. (Users should learn to use help(module) instead of >>> dir(module) for API discovery.) If __all__ is not defined in the >>> module, I think it is good to introduce it after a careful review of >>> what it should contain. And __all__ should never contain names that >>> start with _. 
>>> _______________________________________________ >>> Python-Dev mailing list >>> Python-Dev at python.org >>> http://mail.python.org/mailman/listinfo/python-dev >>> Unsubscribe: http://mail.python.org/mailman/options/python-dev/g.rodola%40gmail.com >>> -- http://www.voidspace.org.uk/ READ CAREFULLY. By accepting and reading this email you agree, on behalf of your employer, to release me from all obligations and waivers arising from any and all NON-NEGOTIATED agreements, licenses, terms-of-service, shrinkwrap, clickwrap, browsewrap, confidentiality, non-disclosure, non-compete and acceptable use policies (?BOGUS AGREEMENTS?) that I have entered into with your employer, its partners, licensors, agents and assigns, in perpetuity, without prejudice to my ongoing rights and privileges. You further represent that you have the authority to release me from any BOGUS AGREEMENTS on behalf of your employer. From mal at egenix.com Tue Nov 16 16:38:04 2010 From: mal at egenix.com (M.-A. Lemburg) Date: Tue, 16 Nov 2010 16:38:04 +0100 Subject: [Python-Dev] Breaking undocumented API In-Reply-To: References: <64DF4272-FF17-4E82-96F5-1DA6CA3A06EC@gmail.com> <4CDAA27B.8040703@voidspace.org.uk> <4CDBDB0C.6080703@voidspace.org.uk> <20101111100516.6e90aa41@mission> <4CDC08F3.6010501@langa.pl> <4CDC0950.5040309@voidspace.org.uk> Message-ID: <4CE2A55C.8030807@egenix.com> Alexander Belopolsky wrote: > What this thread has shown is that there is no consensus on what > public names are and what rules should be followed when changing names > that can be imported from a module. I have opened an issue at > http://bugs.python.org/issue10434 to address this. My vote is to > adopt the definition spelled out in the language reference, copy it to > the library manual and add some discussion of the deprecation > policies. > > I also have a similar question about C API. 
Here, in absence of > __all__, the answer should be clear: all symbols in public header > files should start with either _Py_ or Py_ and those that start with > Py_ are public. The question is what should be done with names that > start with Py_, but are not documented? Can we add an underscore to > those names? If so, should a (deprecated) alias be made available? > Should they be documented as deprecated? > > I think these questions can only be answered on a case by case bases > which choices being: > > 1. Document. > 2. Document as deprecated. > 3. Document as deprecated, add underscore prefix and retain a deprecated alias. > 4. Add an underscore prefix. > > The specific set of names that I would like to consider is the > following from unicode.h. I am marking with (*) the names that I > think should be documented and with (D) those that should be > deprecated: > > PyUnicode_GetMax > PyUnicode_Resize (*) > PyUnicode_InternImmortal > PyUnicode_FromOrdinal (*) > PyUnicode_GetDefaultEncoding (D) > PyUnicode_AsDecodedObject > PyUnicode_AsDecodedUnicode > PyUnicode_AsEncodedObject > PyUnicode_AsEncodedUnicode > PyUnicode_BuildEncodingMap > PyUnicode_EncodeDecimal (*) > PyUnicode_Append (*) > PyUnicode_AppendAndDel (*) > PyUnicode_Partition (*) > PyUnicode_RPartition (*) > PyUnicode_RSplit (*) > PyUnicode_IsIdentifier (*) > Py_UNICODE_strlen > Py_UNICODE_strcpy > Py_UNICODE_strcat > Py_UNICODE_strncpy > Py_UNICODE_strcmp > Py_UNICODE_strncmp > Py_UNICODE_strchr > Py_UNICODE_strrchr For Unicode, unicodeobject.h defines which APIs are private or not. APIs which don't appear in the header file are either private or need to be added to the header file (but I don't think there are any in this category). All APIs in the header that do not appear in the documentation, should be added there as well. unicodeobject.h already provides documentation for most of the APIs you've listed above (except some new ones that were added later on). 
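The prefix rule quoted above (public symbols carry a Py prefix, private ones a leading underscore) can be sketched as a small Python classifier. This is only an illustration under stated assumptions: the function name classify_c_symbol is hypothetical, the check is loosened from "Py_" to "Py" so that names such as PyUnicode_GetMax are covered, and the last two sample names are made up.

```python
def classify_c_symbol(name):
    """Classify a C-level symbol name by its prefix alone.

    Assumption: a leading "_Py" marks a private symbol, a leading "Py"
    marks a public one, and anything else does not follow the convention.
    """
    if name.startswith("_Py"):
        return "private"
    if name.startswith("Py"):
        return "public"
    return "nonconforming"


samples = [
    "PyUnicode_GetMax",      # from the list under discussion
    "Py_UNICODE_strlen",     # from the list under discussion
    "_Py_example_internal",  # hypothetical private name
    "unicode_helper",        # hypothetical name without a prefix
]

for symbol in samples:
    print(symbol, "->", classify_c_symbol(symbol))
```

A prefix check of this kind says nothing about whether a symbol is documented, which is the separate question raised for the names listed above.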
One API I'm not sure about is PyUnicode_AppendAndDel(). It's somewhat obscure and given that we already have PyUnicode_Concat(), I think it should be made private and eventually dropped. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Nov 16 2010) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ ::: Try our new mxODBC.Connect Python Database Interface for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From guido at python.org Tue Nov 16 16:48:20 2010 From: guido at python.org (Guido van Rossum) Date: Tue, 16 Nov 2010 07:48:20 -0800 Subject: [Python-Dev] Breaking undocumented API In-Reply-To: References: <64DF4272-FF17-4E82-96F5-1DA6CA3A06EC@gmail.com> <4CDAA27B.8040703@voidspace.org.uk> <4CDBDB0C.6080703@voidspace.org.uk> <20101111100516.6e90aa41@mission> <4CDC08F3.6010501@langa.pl> <4CDC0950.5040309@voidspace.org.uk> Message-ID: On Tue, Nov 16, 2010 at 7:16 AM, Alexander Belopolsky wrote: > What this thread has shown is that there is no consensus on what > public names are and what rules should be followed when changing names > that can be imported from a module. I have opened an issue at > http://bugs.python.org/issue10434 to address this. My vote is to > adopt the definition spelled out in the language reference, copy it to > the library manual and add some discussion of the deprecation > policies. Hm. Apart from the specific semantics assigned by the language to single and double leading (and trailing) underscores, I still think this belongs in a style guide, not in the library manual. 
When reading the library manual, one should always assume that undocumented features are subject to change at any time. When writing library code, one should of course be much more conservative, and guidelines for contributors are needed to ensure that in the future we won't repeat the mistakes of the past (mostly my own mistakes :-). > I also have a similar question about C API. Here, in absence of > __all__, the answer should be clear: all symbols in public header > files should start with either _Py_ or Py_ and those that start with > Py_ are public. The question is what should be done with names that > start with Py_, but are not documented? Can we add an underscore to > those names? If so, should a (deprecated) alias be made available? > Should they be documented as deprecated? Even more care should be taken here, since breakage is harder to fix, especially in 3rd party code that needs to be compatible with a wide range of Python versions. The good news here is that the intended rule is very clear: - *no* symbols that don't start with Py_ or _Py_ (unless there's a technical reason why it can't be named that way) - public == Py_ - private == _Py_ > I think these questions can only be answered on a case by case bases Right! > which choices being: > > 1. Document. > 2. Document as deprecated. > 3. Document as deprecated, add underscore prefix and retain a deprecated alias. > 4. Add an underscore prefix. > > The specific set of names that I would like to consider is the > following from unicode.h. 
I am marking with (*) the names that I > think should be documented and with (D) those that should be > deprecated: > > PyUnicode_GetMax > PyUnicode_Resize (*) > PyUnicode_InternImmortal > PyUnicode_FromOrdinal (*) > PyUnicode_GetDefaultEncoding (D) > PyUnicode_AsDecodedObject > PyUnicode_AsDecodedUnicode > PyUnicode_AsEncodedObject > PyUnicode_AsEncodedUnicode > PyUnicode_BuildEncodingMap > PyUnicode_EncodeDecimal (*) > PyUnicode_Append (*) > PyUnicode_AppendAndDel (*) > PyUnicode_Partition (*) > PyUnicode_RPartition (*) > PyUnicode_RSplit (*) > PyUnicode_IsIdentifier (*) > Py_UNICODE_strlen > Py_UNICODE_strcpy > Py_UNICODE_strcat > Py_UNICODE_strncpy > Py_UNICODE_strcmp > Py_UNICODE_strncmp > Py_UNICODE_strchr > Py_UNICODE_strrchr I'll leave this to others more familiar with the Unicode code; I would recommend being fairly conservative though since these have been around for a long time. -- --Guido van Rossum (python.org/~guido) From janssen at parc.com Tue Nov 16 17:30:44 2010 From: janssen at parc.com (Bill Janssen) Date: Tue, 16 Nov 2010 08:30:44 PST Subject: [Python-Dev] Stable buildbots In-Reply-To: References: <20101113133712.60e9be27@pitrou.net> <30929.1289879830@parc.com> Message-ID: <45342.1289925044@parc.com> Ned Deily wrote: > In article <30929.1289879830 at parc.com>, Bill Janssen > wrote: > > > Both the Tiger buildbots are suddenly failing 3.x on test_cmd_line. > > Looking at the changes since the last success, I can't see anything > > which would obviously affect that... Any suspects? > > It appears to be a duplicate of Issue8458. Playing with it again, it > seems to be a race condition: sometimes I see all three failures you > reported, sometimes just one, sometimes none. Again, only on 10.4 > (Tiger), not 10.5 or 10.6. But the 10.4 machine I'm using is by far the > slowest of the three so it is possible that could be a factor. Good thought. It's also the slowest of my buildbots -- dual 1GHz PPC. 
> Perhaps a race condition with cleaning up the p2c pipe from a previous run? > > > Here's what's failing: > > > > ====================================================================== > > ERROR: test_run_code (test.test_cmd_line.CmdLineTest) > > ---------------------------------------------------------------------- > > Traceback (most recent call last): > > File > > "/Users/buildbot/buildarea/3.x.parc-tiger-1/build/Lib/test/test_cmd_line.py" > > , line 95, in test_run_code > > assert_python_failure('-c') > > File > > "/Users/buildbot/buildarea/3.x.parc-tiger-1/build/Lib/test/script_helper.py" > > , line 55, in assert_python_failure > > return _assert_python(False, *args, **env_vars) > > File > > "/Users/buildbot/buildarea/3.x.parc-tiger-1/build/Lib/test/script_helper.py" > > , line 29, in _assert_python > > env=env) > > File "/Users/buildbot/buildarea/3.x.parc-tiger-1/build/Lib/subprocess.py", > > line 683, in __init__ > > self.stdin = io.open(p2cwrite, 'wb', bufsize) > > OSError: [Errno 9] Bad file descriptor > > > > ====================================================================== > > ERROR: test_run_module (test.test_cmd_line.CmdLineTest) > > ---------------------------------------------------------------------- > > Traceback (most recent call last): > > File > > "/Users/buildbot/buildarea/3.x.parc-tiger-1/build/Lib/test/test_cmd_line.py" > > , line 72, in test_run_module > > assert_python_failure('-m') > > File > > "/Users/buildbot/buildarea/3.x.parc-tiger-1/build/Lib/test/script_helper.py" > > , line 55, in assert_python_failure > > return _assert_python(False, *args, **env_vars) > > File > > "/Users/buildbot/buildarea/3.x.parc-tiger-1/build/Lib/test/script_helper.py" > > , line 29, in _assert_python > > env=env) > > File "/Users/buildbot/buildarea/3.x.parc-tiger-1/build/Lib/subprocess.py", > > line 683, in __init__ > > self.stdin = io.open(p2cwrite, 'wb', bufsize) > > OSError: [Errno 9] Bad file descriptor > > > > 
====================================================================== > > ERROR: test_version (test.test_cmd_line.CmdLineTest) > > ---------------------------------------------------------------------- > > Traceback (most recent call last): > > File > > "/Users/buildbot/buildarea/3.x.parc-tiger-1/build/Lib/test/test_cmd_line.py" > > , line 48, in test_version > > rc, out, err = assert_python_ok('-V') > > File > > "/Users/buildbot/buildarea/3.x.parc-tiger-1/build/Lib/test/script_helper.py" > > , line 48, in assert_python_ok > > return _assert_python(True, *args, **env_vars) > > File > > "/Users/buildbot/buildarea/3.x.parc-tiger-1/build/Lib/test/script_helper.py" > > , line 29, in _assert_python > > env=env) > > File "/Users/buildbot/buildarea/3.x.parc-tiger-1/build/Lib/subprocess.py", > > line 683, in __init__ > > self.stdin = io.open(p2cwrite, 'wb', bufsize) > > OSError: [Errno 9] Bad file descriptor > > -- > Ned Deily, > nad at acm.org > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/bill%40janssen.org From exarkun at twistedmatrix.com Tue Nov 16 17:34:54 2010 From: exarkun at twistedmatrix.com (exarkun at twistedmatrix.com) Date: Tue, 16 Nov 2010 16:34:54 -0000 Subject: [Python-Dev] Breaking undocumented API In-Reply-To: References: <64DF4272-FF17-4E82-96F5-1DA6CA3A06EC@gmail.com> <4CDAA27B.8040703@voidspace.org.uk> <4CDBDB0C.6080703@voidspace.org.uk> <20101111100516.6e90aa41@mission> <4CDC08F3.6010501@langa.pl> <4CDC0950.5040309@voidspace.org.uk> Message-ID: <20101116163454.2040.394815387.divmod.xquotient.928@localhost.localdomain> On 03:48 pm, guido at python.org wrote: >On Tue, Nov 16, 2010 at 7:16 AM, Alexander Belopolsky > wrote: >>What this thread has shown is that there is no consensus on what >>public names are and what rules should be followed when changing names >>that can be 
imported from a module. I have opened an issue at >>http://bugs.python.org/issue10434 to address this. My vote is to >>adopt the definition spelled out in the language reference, copy it to >>the library manual and add some discussion of the deprecation >>policies. > >Hm. Apart from the specific semantics assigned by the language to >single and double leading (and trailing) underscores, I still think >this belongs in a style guide, not in the library manual. When reading >the library manual, one should always assume that undocumented >features are subject to change at any time. I don't think it belongs only in PEP 8 (that's "a style guide" you're referring to, correct?). It needs to be front and center. This is information that every single user of the stdlib needs in order to use the stdlib correctly. Imagine trying to use a dictionary without knowing about alphabetical ordering. Or driving a car without knowing what lane markers indicate. No matter how many times we discuss this policy on this list (I know it's come up here before), the majority of python users still won't learn about it. PEP 8 isn't nearly visible enough, either. Whatever the rule is, it needs to be presented with the information itself. If the rule is that things not documented in the library manual have no compatibility guarantees, then all of the means of getting documentation *other* than looking at the library manual need to indicate this somehow (alternatively, the information shouldn't be duplicated, but I doubt I'll convince anyone of that). Here's a stupid proposal. What if the top of pydoc output said (for stdlib modules only) "The library manual is the canonical reference. Refer to it before using APIs you find in this documentation." Still inconvenient, but inconvenient is better than secret/impossible. 
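As a rough illustration of that proposal, a sketch in Python might prepend such a notice to pydoc's plain-text output for stdlib modules. The names NOTICE, is_stdlib and render_with_notice are hypothetical, and the stdlib check leans on sys.stdlib_module_names, which only exists on Python 3.10 and later; on older interpreters the sketch simply adds no notice. This is not how pydoc behaves, only what the idea could look like.

```python
import json
import pydoc
import sys

NOTICE = (
    "The library manual is the canonical reference. "
    "Refer to it before using APIs you find in this documentation.\n\n"
)


def is_stdlib(module_name):
    """Best-effort stdlib check based on the top-level package name."""
    top_level = module_name.partition(".")[0]
    # sys.stdlib_module_names is only available on Python 3.10+.
    return top_level in getattr(sys, "stdlib_module_names", ())


def render_with_notice(module):
    """Render pydoc text for a module, prefixing the notice for stdlib modules."""
    text = pydoc.render_doc(module, renderer=pydoc.plaintext)
    if is_stdlib(module.__name__):
        return NOTICE + text
    return text


# Show only the first line of the rendered documentation for json.
print(render_with_notice(json).splitlines()[0])
```

Wrapping the renderer keeps the notice next to the documentation itself, which is the point of the proposal: the rule travels with the information instead of living only in a style guide.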
Jean-Paul From raymond.hettinger at gmail.com Tue Nov 16 18:03:03 2010 From: raymond.hettinger at gmail.com (Raymond Hettinger) Date: Tue, 16 Nov 2010 09:03:03 -0800 Subject: [Python-Dev] [RELEASED] Python 3.2 alpha 4 In-Reply-To: References: <4CE28FBF.9020200@python.org> Message-ID: <662EDCAC-B0D2-4FF4-B666-CDB3363123C7@gmail.com> On Nov 16, 2010, at 7:05 AM, Paul Moore wrote: > > PEP 3148 (Futures) is noted in the PEP as going into 3.2, It also > seems to be in the release. > > Should it not be added to the "What's new in 3.2" document and the > release announcements? It's a fairly significant feature. I'll update the whatsnew document before the beta goes out. 
Raymond From solipsis at pitrou.net Tue Nov 16 18:06:40 2010 From: solipsis at pitrou.net (Antoine Pitrou) Date: Tue, 16 Nov 2010 18:06:40 +0100 Subject: [Python-Dev] Breaking undocumented API References: <64DF4272-FF17-4E82-96F5-1DA6CA3A06EC@gmail.com> <4CDAA27B.8040703@voidspace.org.uk> <4CDBDB0C.6080703@voidspace.org.uk> <20101111100516.6e90aa41@mission> <4CDC08F3.6010501@langa.pl> <4CDC0950.5040309@voidspace.org.uk> <20101116163454.2040.394815387.divmod.xquotient.928@localhost.localdomain> Message-ID: <20101116180640.26a112f2@pitrou.net> On Tue, 16 Nov 2010 16:34:54 -0000 exarkun at twistedmatrix.com wrote: > > Imagine trying to use a dictionary without knowing about alphabetical > ordering. You mean an ordered dictionary, right? From alexander.belopolsky at gmail.com Tue Nov 16 18:13:57 2010 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Tue, 16 Nov 2010 12:13:57 -0500 Subject: [Python-Dev] Breaking undocumented API In-Reply-To: <4CE2A55C.8030807@egenix.com> References: <64DF4272-FF17-4E82-96F5-1DA6CA3A06EC@gmail.com> <4CDAA27B.8040703@voidspace.org.uk> <4CDBDB0C.6080703@voidspace.org.uk> <20101111100516.6e90aa41@mission> <4CDC08F3.6010501@langa.pl> <4CDC0950.5040309@voidspace.org.uk> <4CE2A55C.8030807@egenix.com> Message-ID: On Tue, Nov 16, 2010 at 10:38 AM, M.-A. Lemburg wrote: .. > One API I'm not sure about is PyUnicode_AppendAndDel(). It's somewhat > obscure and given that we already have PyUnicode_Concat(), I think > it should be made private and eventually dropped. > What about PyUnicode_GetMax()? Isn't that supposed to be Py_UNICODE_GETMAX()? Or better still Py_UNICODE_MAXORDINAL? 
From lukasz at langa.pl Tue Nov 16 18:16:21 2010 From: lukasz at langa.pl (Łukasz Langa) Date: Tue, 16 Nov 2010 18:16:21 +0100 Subject: [Python-Dev] Breaking undocumented API In-Reply-To: <20101116180640.26a112f2@pitrou.net> References: <64DF4272-FF17-4E82-96F5-1DA6CA3A06EC@gmail.com> <4CDAA27B.8040703@voidspace.org.uk> <4CDBDB0C.6080703@voidspace.org.uk> <20101111100516.6e90aa41@mission> <4CDC08F3.6010501@langa.pl> <4CDC0950.5040309@voidspace.org.uk> <20101116163454.2040.394815387.divmod.xquotient.928@localhost.localdomain> <20101116180640.26a112f2@pitrou.net> Message-ID: <4CE2BC65.1080001@langa.pl> Am 16.11.2010 18:06, schrieb Antoine Pitrou: > On Tue, 16 Nov 2010 16:34:54 -0000 > exarkun at twistedmatrix.com wrote: >> Imagine trying to use a dictionary without knowing about alphabetical >> ordering. > You mean an ordered dictionary, right? He meant the ones with actual paper pages. From fuzzyman at voidspace.org.uk Tue Nov 16 18:21:38 2010 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Tue, 16 Nov 2010 17:21:38 +0000 Subject: [Python-Dev] Breaking undocumented API In-Reply-To: <4CE2BC65.1080001@langa.pl> References: <64DF4272-FF17-4E82-96F5-1DA6CA3A06EC@gmail.com> <4CDAA27B.8040703@voidspace.org.uk> <4CDBDB0C.6080703@voidspace.org.uk> <20101111100516.6e90aa41@mission> <4CDC08F3.6010501@langa.pl> <4CDC0950.5040309@voidspace.org.uk> <20101116163454.2040.394815387.divmod.xquotient.928@localhost.localdomain> <20101116180640.26a112f2@pitrou.net> <4CE2BC65.1080001@langa.pl> Message-ID: <4CE2BDA2.1000302@voidspace.org.uk> On 16/11/2010 17:16, Łukasz Langa wrote: > Am 16.11.2010 18:06, schrieb Antoine Pitrou: >> On Tue, 16 Nov 2010 16:34:54 -0000 >> exarkun at twistedmatrix.com wrote: >>> Imagine trying to use a dictionary without knowing about alphabetical >>> ordering. >> You mean an ordered dictionary, right? > > He meant the ones with actual paper pages. 
But given that we are particularly talking about how to handle undocumented APIs, a more apropos comparison would be to ask how dictionary readers are supposed to look up words that aren't in the dictionary... This is why I think it *is* a style issue for developers - the more important decision is codifying how we decide what words need to go in the dictionary (to continue to torture the analogy). Michael -- http://www.voidspace.org.uk/ 
From exarkun at twistedmatrix.com Tue Nov 16 18:30:49 2010 From: exarkun at twistedmatrix.com (exarkun at twistedmatrix.com) Date: Tue, 16 Nov 2010 17:30:49 -0000 Subject: [Python-Dev] Breaking undocumented API In-Reply-To: <4CE2BDA2.1000302@voidspace.org.uk> References: <64DF4272-FF17-4E82-96F5-1DA6CA3A06EC@gmail.com> <4CDAA27B.8040703@voidspace.org.uk> <4CDBDB0C.6080703@voidspace.org.uk> <20101111100516.6e90aa41@mission> <4CDC08F3.6010501@langa.pl> <4CDC0950.5040309@voidspace.org.uk> <20101116163454.2040.394815387.divmod.xquotient.928@localhost.localdomain> <20101116180640.26a112f2@pitrou.net> <4CE2BC65.1080001@langa.pl> <4CE2BDA2.1000302@voidspace.org.uk> Message-ID: <20101116173049.2040.989476246.divmod.xquotient.936@localhost.localdomain> On 05:21 pm, fuzzyman at voidspace.org.uk wrote: >On 16/11/2010 17:16, Łukasz Langa wrote: >>Am 16.11.2010 18:06, schrieb Antoine Pitrou: >>>On Tue, 16 Nov 2010 16:34:54 -0000 >>>exarkun at twistedmatrix.com wrote: >>>>Imagine trying to use a dictionary without knowing about >>>>alphabetical >>>>ordering. >>>You mean an ordered dictionary, right? >> >>He meant the ones with actual paper pages. > >But given that we are particularly talking about how to handle >undocumented APIs, a more apropos comparison would be to ask how >dictionary readers are supposed to look up words that aren't in the >dictionary... No, this isn't an appropriate comparison. The dictionary was an example of something that presents information but is very hard to use without knowing the rules. We're not talking about undocumented APIs. We're talking about APIs that are documented somewhere other than in the library manual. Jean-Paul From mal at egenix.com Tue Nov 16 19:06:22 2010 From: mal at egenix.com (M.-A. 
Lemburg) Date: Tue, 16 Nov 2010 19:06:22 +0100 Subject: [Python-Dev] Breaking undocumented API In-Reply-To: References: <64DF4272-FF17-4E82-96F5-1DA6CA3A06EC@gmail.com> <4CDAA27B.8040703@voidspace.org.uk> <4CDBDB0C.6080703@voidspace.org.uk> <20101111100516.6e90aa41@mission> <4CDC08F3.6010501@langa.pl> <4CDC0950.5040309@voidspace.org.uk> <4CE2A55C.8030807@egenix.com> Message-ID: <4CE2C81E.20103@egenix.com> Alexander Belopolsky wrote: > On Tue, Nov 16, 2010 at 10:38 AM, M.-A. Lemburg wrote: > .. >> One API I'm not sure about is PyUnicode_AppendAndDel(). It's somewhat >> obscure and given that we already have PyUnicode_Concat(), I think >> it should be made private and eventually dropped. >> > > What about PyUnicode_GetMax()? Isn't that supposed to be > Py_UNICODE_GETMAX()? Or better still Py_UNICODE_MAXORDINAL? Traditionally, all uppercase symbols refer to macros, whereas the mixed case ones refer to functions. Now, we can't use a macro for this, since the information has to be available as callable in order to applications or extensions to use it (without recompile). Regarding the name: PyUnicode_MaxOrdinal() would certainly have been better. BTW: I'm not really happy about the Py_UNICODE_ prefix for functions in unicodeobject.h, but I guess it's too late to change those. It would be better to stick to one prefix for Unicode related APIs, i.e. "PyUnicode_". -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Nov 16 2010) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ ::: Try our new mxODBC.Connect Python Database Interface for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. 
Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From g.brandl at gmx.net Tue Nov 16 19:05:44 2010 From: g.brandl at gmx.net (Georg Brandl) Date: Tue, 16 Nov 2010 19:05:44 +0100 Subject: [Python-Dev] Breaking undocumented API In-Reply-To: <20101116180640.26a112f2@pitrou.net> References: <64DF4272-FF17-4E82-96F5-1DA6CA3A06EC@gmail.com> <4CDAA27B.8040703@voidspace.org.uk> <4CDBDB0C.6080703@voidspace.org.uk> <20101111100516.6e90aa41@mission> <4CDC08F3.6010501@langa.pl> <4CDC0950.5040309@voidspace.org.uk> <20101116163454.2040.394815387.divmod.xquotient.928@localhost.localdomain> <20101116180640.26a112f2@pitrou.net> Message-ID: Am 16.11.2010 18:06, schrieb Antoine Pitrou: > On Tue, 16 Nov 2010 16:34:54 -0000 > exarkun at twistedmatrix.com wrote: >> >> Imagine trying to use a dictionary without knowing about alphabetical >> ordering. > > You mean an ordered dictionary, right? That one's a sorted dictionary, though. Georg From alexander.belopolsky at gmail.com Tue Nov 16 19:31:32 2010 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Tue, 16 Nov 2010 13:31:32 -0500 Subject: [Python-Dev] Breaking undocumented API In-Reply-To: <4CE2C81E.20103@egenix.com> References: <64DF4272-FF17-4E82-96F5-1DA6CA3A06EC@gmail.com> <4CDAA27B.8040703@voidspace.org.uk> <4CDBDB0C.6080703@voidspace.org.uk> <20101111100516.6e90aa41@mission> <4CDC08F3.6010501@langa.pl> <4CDC0950.5040309@voidspace.org.uk> <4CE2A55C.8030807@egenix.com> <4CE2C81E.20103@egenix.com> Message-ID: On Tue, Nov 16, 2010 at 1:06 PM, M.-A. Lemburg wrote: .. > Now, we can't use a macro for [PyUnicode_GetMax()], since the information has > to be available as callable in order to applications or extensions > to use it (without recompile). > .. but it *is* a macro resolving to either PyUnicodeUCS2_GetMax or PyUnicodeUCS4_GetMax. 
What is the scenario when may want to change what PyUnicodeUCS?_GetMax return and have extensions pick up the change without a recompile? UCS2 case will certainly never change since it is already 0xFFFF. Is it possible that USC4 will be expanded beyond 0x10FFFF? Note that we can have both a macro and a function version. This is fairly standard practice in Python C-API. From jcea at jcea.es Tue Nov 16 19:38:07 2010 From: jcea at jcea.es (Jesus Cea) Date: Tue, 16 Nov 2010 19:38:07 +0100 Subject: [Python-Dev] Mercurial Schedule Message-ID: <4CE2CF8F.4040500@jcea.es> Is there any updated mercurial schedule?. Any impact related with the new 3.2 schedule (three weeks offset)? -- Jesus Cea Avion jcea at jcea.es - http://www.jcea.es/ jabber / xmpp:jcea at jabber.org "Things are not so easy" "My name is Dump, Core Dump" "El amor es poner tu felicidad en la felicidad de otro" - Leibniz From alexander.belopolsky at gmail.com Tue Nov 16 19:40:36 2010 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Tue, 16 Nov 2010 13:40:36 -0500 Subject: [Python-Dev] Breaking undocumented API In-Reply-To: <4CE2C81E.20103@egenix.com> References: <64DF4272-FF17-4E82-96F5-1DA6CA3A06EC@gmail.com> <4CDAA27B.8040703@voidspace.org.uk> <4CDBDB0C.6080703@voidspace.org.uk> <20101111100516.6e90aa41@mission> <4CDC08F3.6010501@langa.pl> <4CDC0950.5040309@voidspace.org.uk> <4CE2A55C.8030807@egenix.com> <4CE2C81E.20103@egenix.com> 
Message-ID:

On Tue, Nov 16, 2010 at 1:06 PM, M.-A. Lemburg wrote:
..
> BTW: I'm not really happy about the Py_UNICODE_ prefix for functions
> in unicodeobject.h, but I guess it's too late to change those.
> It would be better to stick to one prefix for Unicode related
> APIs, i.e. "PyUnicode_".

I don't have a problem with this. It makes sense that functions that
operate on PyUnicode objects start with PyUnicode_ and those that
operate on Py_UNICODE ordinals start with Py_UNICODE_.

Of course, PyUnicode should have been named PyUnicodeObject and
Py_UNICODE should have been named Py_wchar_t, but that's a different
story.

From mal at egenix.com  Tue Nov 16 19:57:04 2010
From: mal at egenix.com (M.-A. Lemburg)
Date: Tue, 16 Nov 2010 19:57:04 +0100
Subject: [Python-Dev] Breaking undocumented API
In-Reply-To:
References: <64DF4272-FF17-4E82-96F5-1DA6CA3A06EC@gmail.com> <4CDAA27B.8040703@voidspace.org.uk> <4CDBDB0C.6080703@voidspace.org.uk> <20101111100516.6e90aa41@mission> <4CDC08F3.6010501@langa.pl> <4CDC0950.5040309@voidspace.org.uk> <4CE2A55C.8030807@egenix.com> <4CE2C81E.20103@egenix.com>
Message-ID: <4CE2D400.5060803@egenix.com>

Alexander Belopolsky wrote:
> On Tue, Nov 16, 2010 at 1:06 PM, M.-A. Lemburg wrote:
> ..
>> Now, we can't use a macro for [PyUnicode_GetMax()], since the information has
>> to be available as callable in order to applications or extensions
>> to use it (without recompile).
>>
>
> .. but it *is* a macro resolving to either PyUnicodeUCS2_GetMax or
> PyUnicodeUCS4_GetMax.

That doesn't count :-) It's only a trick to prevent external code
from using the wrong Unicode APIs.

There still is a real function behind the renaming.

> What is the scenario when we may want to change
> what PyUnicodeUCS?_GetMax returns and have extensions pick up the
> change without a recompile?

If an extension uses the stable ABI, it will want to know
whether the interpreter was built for UCS2 or UCS4 (even if
it doesn't use the Unicode APIs directly).
> UCS2 case will certainly never change
> since it is already 0xFFFF. Is it possible that UCS4 will be expanded
> beyond 0x10FFFF?

Well, the Unicode Consortium decided to not go beyond 0x10FFFF,
but then you never know... when they started out on the quest,
16 bits appeared more than enough, but they found out relatively
quickly that the Asian scripts had enough code points to easily
fill that space.

Once space is available, it tends to get used sooner or later :-)

> Note that we can have both a macro and a function
> version. This is fairly standard practice in Python C-API.

Sure, but what for ?

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Nov 16 2010)
>>> Python/Zope Consulting and Support ...        http://www.egenix.com/
>>> mxODBC.Zope.Database.Adapter ...             http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/
________________________________________________________________________

::: Try our new mxODBC.Connect Python Database Interface for free ! ::::

   eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
    D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
           Registered at Amtsgericht Duesseldorf: HRB 46611
               http://www.egenix.com/company/contact/

From alexander.belopolsky at gmail.com  Tue Nov 16 20:06:37 2010
From: alexander.belopolsky at gmail.com (Alexander Belopolsky)
Date: Tue, 16 Nov 2010 14:06:37 -0500
Subject: [Python-Dev] Breaking undocumented API
In-Reply-To: <4CE2D400.5060803@egenix.com>
References: <64DF4272-FF17-4E82-96F5-1DA6CA3A06EC@gmail.com> <4CDAA27B.8040703@voidspace.org.uk> <4CDBDB0C.6080703@voidspace.org.uk> <20101111100516.6e90aa41@mission> <4CDC08F3.6010501@langa.pl> <4CDC0950.5040309@voidspace.org.uk> <4CE2A55C.8030807@egenix.com> <4CE2C81E.20103@egenix.com> <4CE2D400.5060803@egenix.com>
Message-ID:

On Tue, Nov 16, 2010 at 1:57 PM, M.-A. Lemburg wrote:
..
>> Note that we can have both a macro and a function
>> version.
This is fairly standard practice in Python C-API.
>
> Sure, but what for ?
>

Mostly just for consistency with the other macros:

http://docs.python.org/dev/py3k/c-api/unicode.html#unicode-character-properties

Wait, these actually map to C functions as well. So this is just a
naming issue.

From tjreedy at udel.edu  Tue Nov 16 20:08:18 2010
From: tjreedy at udel.edu (Terry Reedy)
Date: Tue, 16 Nov 2010 14:08:18 -0500
Subject: [Python-Dev] Breaking undocumented API
In-Reply-To:
References: <64DF4272-FF17-4E82-96F5-1DA6CA3A06EC@gmail.com> <4CDAA27B.8040703@voidspace.org.uk> <4CDBDB0C.6080703@voidspace.org.uk> <20101111100516.6e90aa41@mission> <4CDC08F3.6010501@langa.pl> <4CDC0950.5040309@voidspace.org.uk>
Message-ID:

On 11/16/2010 10:16 AM, Alexander Belopolsky wrote:
> What this thread has shown is that there is no consensus on what
> public names are and what rules should be followed when changing names
> that can be imported from a module.

Nor is there any consensus on the use of __all__ in the stdlib, with
opinion ranging from never to sometimes to always.

I do not have any opinions on the particular solution adopted, but
appreciate your persistence in pushing to *some* solution. It would be
nice to add 'Cleanly separated public and private APIs' to the list of
3.x features.

-- 
Terry Jan Reedy

From mal at egenix.com  Tue Nov 16 20:16:50 2010
From: mal at egenix.com (M.-A. Lemburg)
Date: Tue, 16 Nov 2010 20:16:50 +0100
Subject: [Python-Dev] Breaking undocumented API
In-Reply-To:
References: <4CDBDB0C.6080703@voidspace.org.uk> <20101111100516.6e90aa41@mission> <4CDC08F3.6010501@langa.pl> <4CDC0950.5040309@voidspace.org.uk> <4CE2A55C.8030807@egenix.com> <4CE2C81E.20103@egenix.com> <4CE2D400.5060803@egenix.com>
Message-ID: <4CE2D8A2.9040705@egenix.com>

Alexander Belopolsky wrote:
> On Tue, Nov 16, 2010 at 1:57 PM, M.-A. Lemburg wrote:
> ..
>>> Note that we can have both a macro and a function
>>> version. This is fairly standard practice in Python C-API.
>>
>> Sure, but what for ?
>>
>
> Mostly just for consistency with the other macros:
>
> http://docs.python.org/dev/py3k/c-api/unicode.html#unicode-character-properties
>
> Wait, these actually map to C functions as well. So this is just a
> naming issue.

As said: the UCS2/4 name mangling doesn't fall under the macro
naming scheme, since it's done transparently and with a different
reasoning in mind than when you decide to use a macro to access some
object detail, or want to avoid repetition.

This trick was also added after the original APIs had already been
documented for a while, so there was no way to change their names
anymore.

The various ctype functions use macro names for historic reasons: they
were directed to different functions and/or inline code depending on a
configuration switch. This is now gone, since the lib C ctype functions
were locale aware and often implemented things a little differently
than the Python ctype tables.

-- 
Marc-Andre Lemburg
eGenix.com

From alexander.belopolsky at gmail.com  Tue Nov 16 20:52:07 2010
From: alexander.belopolsky at gmail.com (Alexander Belopolsky)
Date: Tue, 16 Nov 2010 14:52:07 -0500
Subject: [Python-Dev] PyUnicode_GetMax() and PyUnicode_FromOrdinal() Was: Breaking undocumented API
Message-ID:

On Tue, Nov 16, 2010 at 1:57 PM, M.-A.
Lemburg wrote:
> Alexander Belopolsky wrote:
>> On Tue, Nov 16, 2010 at 1:06 PM, M.-A. Lemburg wrote:
>> ..
>>> Now, we can't use a macro for [PyUnicode_GetMax()], since the information has
>>> to be available as callable in order to applications or extensions
>>> to use it (without recompile).
>>>
>>
>> .. but it *is* a macro resolving to either PyUnicodeUCS2_GetMax or
>> PyUnicodeUCS4_GetMax.
>
> That doesn't count :-) It's only a trick to prevent external code
> from using the wrong Unicode APIs.
>
> There still is a real function behind the renaming.
>
>> What is the scenario when we may want to change
>> what PyUnicodeUCS?_GetMax returns and have extensions pick up the
>> change without a recompile?
>
> If an extension uses the stable ABI, it will want to know
> whether the interpreter was built for UCS2 or UCS4 (even if
> it doesn't use the Unicode APIs directly).
>
>> UCS2 case will certainly never change
>> since it is already 0xFFFF. Is it possible that UCS4 will be expanded
>> beyond 0x10FFFF?
>
> Well, the Unicode Consortium decided to not go beyond 0x10FFFF,
> but then you never know... when they started out on the quest,
> 16 bits appeared more than enough, but they found out relatively
> quickly that the Asian scripts had enough code points to easily
> fill that space.
>
> Once space is available, it tends to get used sooner or later :-)
>
>> Note that we can have both a macro and a function
>> version. This is fairly standard practice in Python C-API.
>
> Sure, but what for ?

Note that PyUnicode_FromOrdinal() is documented (in unicodeobject.h)
as follows without a reference to PyUnicode_GetMax():

"""
Create a Unicode Object from the given Unicode code point ordinal.

The ordinal must be in range(0x10000) on narrow Python builds
(UCS2), and range(0x110000) on wide builds (UCS4). A ValueError is
raised in case it is not.
"""

The actual implementation checks the UCS4 range only.

    if (ordinal < 0 || ordinal > 0x10ffff) {
        PyErr_SetString(PyExc_ValueError,
                        "chr() arg not in range(0x110000)");
        return NULL;
    }

This actually looks like a bug:

>>> len(chr(0x10FFFF))
2

(on a UCS2 build.)

Also, I think PyUnicode_FromOrdinal() should take Py_UNICODE argument
rather than int.

From mal at egenix.com  Tue Nov 16 21:06:15 2010
From: mal at egenix.com (M.-A. Lemburg)
Date: Tue, 16 Nov 2010 21:06:15 +0100
Subject: [Python-Dev] PyUnicode_GetMax() and PyUnicode_FromOrdinal() Was: Breaking undocumented API
In-Reply-To:
References:
Message-ID: <4CE2E437.5010103@egenix.com>

Alexander Belopolsky wrote:
> On Tue, Nov 16, 2010 at 1:57 PM, M.-A. Lemburg wrote:
>> Alexander Belopolsky wrote:
>>> On Tue, Nov 16, 2010 at 1:06 PM, M.-A. Lemburg wrote:
>>> ..
>>>> Now, we can't use a macro for [PyUnicode_GetMax()], since the information has
>>>> to be available as callable in order to applications or extensions
>>>> to use it (without recompile).
>>>>
>>>
>>> .. but it *is* a macro resolving to either PyUnicodeUCS2_GetMax or
>>> PyUnicodeUCS4_GetMax.
>>
>> That doesn't count :-) It's only a trick to prevent external code
>> from using the wrong Unicode APIs.
>>
>> There still is a real function behind the renaming.
>>
>>> What is the scenario when we may want to change
>>> what PyUnicodeUCS?_GetMax returns and have extensions pick up the
>>> change without a recompile?
>>
>> If an extension uses the stable ABI, it will want to know
>> whether the interpreter was built for UCS2 or UCS4 (even if
>> it doesn't use the Unicode APIs directly).
>>
>>> UCS2 case will certainly never change
>>> since it is already 0xFFFF. Is it possible that UCS4 will be expanded
>>> beyond 0x10FFFF?
>>
>> Well, the Unicode Consortium decided to not go beyond 0x10FFFF,
>> but then you never know... when they started out on the quest,
>> 16 bits appeared more than enough, but they found out relatively
>> quickly that the Asian scripts had enough code points to easily
>> fill that space.
>>
>> Once space is available, it tends to get used sooner or later :-)
>>
>>> Note that we can have both a macro and a function
>>> version. This is fairly standard practice in Python C-API.
>>
>> Sure, but what for ?
>
> Note that PyUnicode_FromOrdinal() is documented (in unicodeobject.h)
> as follows without a reference to PyUnicode_GetMax():
>
> """
> Create a Unicode Object from the given Unicode code point ordinal.
>
> The ordinal must be in range(0x10000) on narrow Python builds
> (UCS2), and range(0x110000) on wide builds (UCS4). A ValueError is
> raised in case it is not.
> """
>
> The actual implementation actually checks UCS4 range only.
>
>     if (ordinal < 0 || ordinal > 0x10ffff) {
>         PyErr_SetString(PyExc_ValueError,
>                         "chr() arg not in range(0x110000)");
>         return NULL;
>     }
>
> This actually looks like a bug:
>
>>>> len(chr(0x10FFFF))
> 2
>
> (on a UCS2 build.)

Yes, it's a documentation bug. I guess someone forgot to update
the comment in unicodeobject.h after the change to have chr()/unichr()
return a 2-char string instead of a 1-char string for non-BMP
code points.

> Also, I think PyUnicode_FromOrdinal() should take Py_UNICODE argument
> rather than int.

No, an ordinal is a number, not a typed value. We have
PyUnicode_FromUnicode() to create strings from Py_UNICODE* arrays.

-- 
Marc-Andre Lemburg
eGenix.com

From rbp at isnomore.net  Tue Nov 16 21:15:56 2010
From: rbp at isnomore.net (Rodrigo Bernardo Pimentel)
Date: Tue, 16 Nov 2010 18:15:56 -0200
Subject: [Python-Dev] Python bug week-end : 20-21 November
In-Reply-To:
References: <20101025230337.41aeef12@pitrou.net>
Message-ID:

On 26 October 2010 18:04, Georg Brandl wrote:
> On 26.10.2010 19:53, Brett Cannon wrote:
>> Can whomever has edit access to the Python Google Calendar add this?
>
> Done.

The Bug Weekend is still up, right? I don't see mention of it at
http://wiki.python.org/moin/PythonBugDay (and when I tried to log in
to edit, I got "A problem occurred in a Python script." - now, I
thought no problems ever occurred on Python scripts! ;)).

rbp
-- 
http://isnomore.net

From alexander.belopolsky at gmail.com  Tue Nov 16 21:31:13 2010
From: alexander.belopolsky at gmail.com (Alexander Belopolsky)
Date: Tue, 16 Nov 2010 15:31:13 -0500
Subject: [Python-Dev] PyUnicode_GetMax() and PyUnicode_FromOrdinal() Was: Breaking undocumented API
In-Reply-To: <4CE2E437.5010103@egenix.com>
References: <4CE2E437.5010103@egenix.com>
Message-ID:

On Tue, Nov 16, 2010 at 3:06 PM, M.-A. Lemburg wrote:
..
>>>>> len(chr(0x10FFFF))
>> 2
>>
>> (on a UCS2 build.)
>
> Yes, it's a documentation bug. I guess someone forgot to update
> the comment in unicodeobject.h after the change to have chr()/unichr()
> return a 2-char string instead of a 1-char string for non-BMP
> code points.

Same problem in reST doc for chr(i):

"""
chr(i)
Return the string of one character whose Unicode codepoint is the
integer i. For example, chr(97) returns the string 'a'. This is the
inverse of ord(). The valid range for the argument depends how Python
was configured - it may be either UCS2 [0..0xFFFF] or UCS4
[0..0x10FFFF]. ValueError will be raised if i is outside that range.
"""
http://docs.python.org/dev/py3k/library/functions.html?chr

And in ord(c):

"""
ord(c)
Given a string of length one, return an integer representing the
Unicode code point of the character. For example, ord('a') returns the
integer 97 and ord('\u2020') returns 8224. This is the inverse of
chr().

If the argument length is not one, a TypeError will be raised. (If
Python was built with UCS2 Unicode, then the character's code point
must be in the range [0..65535] inclusive; otherwise the string length
is two!)
"""
http://docs.python.org/dev/py3k/library/functions.html#ord

From g.brandl at gmx.net  Tue Nov 16 21:49:01 2010
From: g.brandl at gmx.net (Georg Brandl)
Date: Tue, 16 Nov 2010 21:49:01 +0100
Subject: [Python-Dev] Python bug week-end : 20-21 November
In-Reply-To:
References: <20101025230337.41aeef12@pitrou.net>
Message-ID:

On 16.11.2010 21:15, Rodrigo Bernardo Pimentel wrote:
> On 26 October 2010 18:04, Georg Brandl wrote:
>> On 26.10.2010 19:53, Brett Cannon wrote:
>>> Can whomever has edit access to the Python Google Calendar add this?
>>
>> Done.
>
> The Bug Weekend is still up, right? I don't see mention of it at
> http://wiki.python.org/moin/PythonBugDay (and when I tried to log in
> to edit, I got "A problem occurred in a Python script." - now, I
> thought no problems ever occurred on Python scripts! ;)).

Yeah, somebody (Antoine?) should update that wiki page...

Georg

From ben+python at benfinney.id.au  Tue Nov 16 22:31:41 2010
From: ben+python at benfinney.id.au (Ben Finney)
Date: Wed, 17 Nov 2010 08:31:41 +1100
Subject: [Python-Dev] Breaking undocumented API
References: <64DF4272-FF17-4E82-96F5-1DA6CA3A06EC@gmail.com> <4CDAA27B.8040703@voidspace.org.uk> <4CDBDB0C.6080703@voidspace.org.uk> <20101111100516.6e90aa41@mission> <4CDC08F3.6010501@langa.pl> <4CDC0950.5040309@voidspace.org.uk> <20101116163454.2040.394815387.divmod.xquotient.928@localhost.localdomain>
Message-ID: <87lj4t9cqq.fsf@benfinney.id.au>

exarkun at twistedmatrix.com writes:

> On 03:48 pm, guido at python.org wrote:
> >Hm. Apart from the specific semantics assigned by the language to
> >single and double leading (and trailing) underscores, I still think
> >this belongs in a style guide, not in the library manual.
>
> I don't think it belongs only in PEP 8 (that's "a style guide" you're
> referring to, correct?).

I don't know about Guido, but I'd be -1 on suggestions to add more
normative information to PEP 7, PEP 8, PEP 257, or any other established
style guide PEP. I certainly don't want to have to keep going back to
the same documents frequently just to see if the set of recommendations
I already know has changed recently.

Rather, I took Guido's mention of "this belongs in a style guide" as
suggesting a *new* style guide. Perhaps one that explicitly obsoletes an
existing one or perhaps not; either way, the updated normative
recommendations are in a new document with a new name, so that one knows
whether one has already read it.

> It needs to be front and center. This is information that every single
> user of the stdlib needs in order to use the stdlib correctly.

True enough. This is information that goes beyond a style guide for
writers, and into conventions that API users need to know also.

-- 
 \     "I went to the museum where they had all the heads and arms |
  `\     from the statues that are in all the other museums."
-Steven |
_o__)                                                         Wright |
Ben Finney

From fdrake at acm.org  Tue Nov 16 22:41:39 2010
From: fdrake at acm.org (Fred Drake)
Date: Tue, 16 Nov 2010 16:41:39 -0500
Subject: [Python-Dev] Breaking undocumented API
In-Reply-To: <87lj4t9cqq.fsf@benfinney.id.au>
References: <64DF4272-FF17-4E82-96F5-1DA6CA3A06EC@gmail.com> <4CDAA27B.8040703@voidspace.org.uk> <4CDBDB0C.6080703@voidspace.org.uk> <20101111100516.6e90aa41@mission> <4CDC08F3.6010501@langa.pl> <4CDC0950.5040309@voidspace.org.uk> <20101116163454.2040.394815387.divmod.xquotient.928@localhost.localdomain> <87lj4t9cqq.fsf@benfinney.id.au>
Message-ID:

On Tue, Nov 16, 2010 at 4:31 PM, Ben Finney wrote:
> I don't know about Guido, but I'd be -1 on suggestions to add more
> normative information to PEP 7, PEP 8, PEP 257, or any other established
> style guide PEP. I certainly don't want to have to keep going back to
> the same documents frequently just to see if the set of recommendations
> I already know has changed recently.

Agreed. Many style guides are written as extensions of PEP 8 in
particular. This has already bitten the Zope community, which was
developing style beyond what was even written in its own extension,
only to have PEP 8 change out from under it in a contrary manner.

Lessons we learned:

- If you refer to someone else's documents, refer to specific
  versions. References can be updated explicitly if desired.

- If you have even an advisory point of style, write it down in the
  style guide, so people who read the foundational documents you
  referred to without version information will be aware of the
  expectations. Otherwise, you may as well not have one.

  -Fred

-- 
Fred L. Drake, Jr.
"A storm broke loose in my mind."
--Albert Einstein

From guido at python.org  Tue Nov 16 22:49:16 2010
From: guido at python.org (Guido van Rossum)
Date: Tue, 16 Nov 2010 13:49:16 -0800
Subject: [Python-Dev] Breaking undocumented API
In-Reply-To: <20101116163454.2040.394815387.divmod.xquotient.928@localhost.localdomain>
References: <64DF4272-FF17-4E82-96F5-1DA6CA3A06EC@gmail.com> <4CDAA27B.8040703@voidspace.org.uk> <4CDBDB0C.6080703@voidspace.org.uk> <20101111100516.6e90aa41@mission> <4CDC08F3.6010501@langa.pl> <4CDC0950.5040309@voidspace.org.uk> <20101116163454.2040.394815387.divmod.xquotient.928@localhost.localdomain>
Message-ID:

On Tue, Nov 16, 2010 at 8:34 AM, wrote:
> On 03:48 pm, guido at python.org wrote:
>>
>> On Tue, Nov 16, 2010 at 7:16 AM, Alexander Belopolsky
>> wrote:
>>>
>>> What this thread has shown is that there is no consensus on what
>>> public names are and what rules should be followed when changing names
>>> that can be imported from a module. I have opened an issue at
>>> http://bugs.python.org/issue10434 to address this. My vote is to
>>> adopt the definition spelled out in the language reference, copy it to
>>> the library manual and add some discussion of the deprecation
>>> policies.
>>
>> Hm. Apart from the specific semantics assigned by the language to
>> single and double leading (and trailing) underscores, I still think
>> this belongs in a style guide, not in the library manual. When reading
>> the library manual, one should always assume that undocumented
>> features are subject to change at any time.
>
> I don't think it belongs only in PEP 8 (that's "a style guide" you're
> referring to, correct?). It needs to be front and center. This is
> information that every single user of the stdlib needs in order to use the
> stdlib correctly.

That depends on what methods you're imagining "every single user" is
using to find out what the API *is*.
In my experience there are many ways people do this:

- by reading the source
- by reading the official docs
- by trial and error
- inspection of objects (e.g. dir())
- using help()
- by reading pydoc output collected on some website (or local disk)
- by following tutorials
- by reading books containing reference documentation generated by 3rd
  party authors

Most people do several of those things. (Personally, I learned about
many APIs by creating them. But I'm probably an exception. :-)

> No matter how many times we discuss this policy on this list (I know it's
> come up here before), the majority of python users still won't learn about
> it.

Agreed. And adding a disclaimer to help() or pydoc output won't make
much of a difference, I expect.

> PEP 8 isn't nearly visible enough, either. Whatever the rule is, it needs
> to be presented with the information itself. If the rule is that things not
> documented in the library manual have no compatibility guarantees, then all
> of the means of getting documentation *other* than looking at the library
> manual need to indicate this somehow (alternatively, the information
> shouldn't be duplicated, but I doubt I'll convince anyone of that).

Assuming people actually read the disclaimers.

> Here's a stupid proposal. What if the top of pydoc output said (for stdlib
> modules only) "The library manual is the canonical reference. Refer to it
> before using APIs you find in this documentation." Still inconvenient, but
> inconvenient is better than secret/impossible.

Personally I think it would be sufficient if the disclaimer was at the
top of the library reference itself. That's certainly enough from a
legalistic "I told you so" POV and I doubt that we'll be able to move
the POV of what people actually use...

-- 
--Guido van Rossum (python.org/~guido)

From alexander.belopolsky at gmail.com  Tue Nov 16 22:54:24 2010
From: alexander.belopolsky at gmail.com (Alexander Belopolsky)
Date: Tue, 16 Nov 2010 16:54:24 -0500
Subject: [Python-Dev] Breaking undocumented API
In-Reply-To: <87lj4t9cqq.fsf@benfinney.id.au>
References: <64DF4272-FF17-4E82-96F5-1DA6CA3A06EC@gmail.com> <4CDAA27B.8040703@voidspace.org.uk> <4CDBDB0C.6080703@voidspace.org.uk> <20101111100516.6e90aa41@mission> <4CDC08F3.6010501@langa.pl> <4CDC0950.5040309@voidspace.org.uk> <20101116163454.2040.394815387.divmod.xquotient.928@localhost.localdomain> <87lj4t9cqq.fsf@benfinney.id.au>
Message-ID:

On Tue, Nov 16, 2010 at 4:31 PM, Ben Finney wrote:
..
> I don't know about Guido, but I'd be -1 on suggestions to add more
> normative information to PEP 7, PEP 8, PEP 257, or any other established
> style guide PEP. I certainly don't want to have to keep going back to
> the same documents frequently just to see if the set of recommendations
> I already know has changed recently.
>
> Rather, I took Guido's mention of "this belongs in a style guide" as
> suggesting a *new* style guide. Perhaps one that explicitly obsoletes an
> existing one or perhaps not; either way, the updated normative
> recommendations are in a new document with a new name, so that one knows
> whether one has already read it.
>

+1

Numbered PEPs, while well-known to old-timers, are a really odd place
for newcomers to find a style guide. This really should be a separate
part at the top level of docs.python.org.

Note that we already have a documentation style guide under
"Documenting Python." Maybe we should reuse this slot and have, say, a
"Python Development" part which will put together PEP 7, PEP 8 and the
documentation "Style Guide" in one convenient package.

This, however, is a much bigger project than what I had in mind when I
started this thread.
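
[Editorial sketch, not part of the original thread.] The "definition spelled out in the language reference" that the thread keeps returning to is, as far as I understand it, the rule that `from module import *` follows: if the module defines `__all__`, those names are public; otherwise every name that does not begin with an underscore is. A minimal sketch of that rule follows. The helper `public_names` and the two throwaway in-memory modules are hypothetical names made up for this illustration; nothing here is an existing stdlib API for this purpose.

```python
import types


def public_names(module):
    """Return the names the language reference treats as public.

    Assumption: this mirrors the rule used by `from module import *`:
    honour __all__ when present, otherwise skip underscore-prefixed names.
    """
    if hasattr(module, "__all__"):
        return list(module.__all__)
    return [name for name in vars(module) if not name.startswith("_")]


# Two throwaway modules built in memory, purely for illustration.
with_all = types.ModuleType("with_all")
with_all.__all__ = ["spam"]
with_all.spam = 1
with_all.eggs = 2      # no underscore, but excluded because __all__ omits it
with_all._hidden = 3

without_all = types.ModuleType("without_all")
without_all.spam = 1
without_all._hidden = 3  # excluded by the leading underscore

print(public_names(with_all))
print(public_names(without_all))
```

Note that `eggs` in the first module looks public by naming convention yet is not exported, which is one reason the thread could not settle on names alone as the definition.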

From alexander.belopolsky at gmail.com  Tue Nov 16 23:19:36 2010
From: alexander.belopolsky at gmail.com (Alexander Belopolsky)
Date: Tue, 16 Nov 2010 17:19:36 -0500
Subject: [Python-Dev] Breaking undocumented API
In-Reply-To: <4CE2A55C.8030807@egenix.com>
References: <64DF4272-FF17-4E82-96F5-1DA6CA3A06EC@gmail.com> <4CDAA27B.8040703@voidspace.org.uk> <4CDBDB0C.6080703@voidspace.org.uk> <20101111100516.6e90aa41@mission> <4CDC08F3.6010501@langa.pl> <4CDC0950.5040309@voidspace.org.uk> <4CE2A55C.8030807@egenix.com>
Message-ID:

I created http://bugs.python.org/issue10435 to follow up on unicode C
API issues.

On Tue, Nov 16, 2010 at 10:38 AM, M.-A. Lemburg wrote:
> Alexander Belopolsky wrote:
>> What this thread has shown is that there is no consensus on what
>> public names are and what rules should be followed when changing names
>> that can be imported from a module. I have opened an issue at
>> http://bugs.python.org/issue10434 to address this. My vote is to
>> adopt the definition spelled out in the language reference, copy it to
>> the library manual and add some discussion of the deprecation
>> policies.
>>
>> I also have a similar question about C API. Here, in absence of
>> __all__, the answer should be clear: all symbols in public header
>> files should start with either _Py_ or Py_ and those that start with
>> Py_ are public. The question is what should be done with names that
>> start with Py_, but are not documented? Can we add an underscore to
>> those names? If so, should a (deprecated) alias be made available?
>> Should they be documented as deprecated?
>>
>> I think these questions can only be answered on a case by case bases
>> which choices being:
>>
>> 1. Document.
>> 2. Document as deprecated.
>> 3. Document as deprecated, add underscore prefix and retain a deprecated alias.
>> 4. Add an underscore prefix.
>>
>> The specific set of names that I would like to consider is the
>> following from unicode.h.
I am marking with (*) the names that I
>> think should be documented and with (D) those that should be
>> deprecated:
>>
>> PyUnicode_GetMax
>> PyUnicode_Resize (*)
>> PyUnicode_InternImmortal
>> PyUnicode_FromOrdinal (*)
>> PyUnicode_GetDefaultEncoding (D)
>> PyUnicode_AsDecodedObject
>> PyUnicode_AsDecodedUnicode
>> PyUnicode_AsEncodedObject
>> PyUnicode_AsEncodedUnicode
>> PyUnicode_BuildEncodingMap
>> PyUnicode_EncodeDecimal (*)
>> PyUnicode_Append (*)
>> PyUnicode_AppendAndDel (*)
>> PyUnicode_Partition (*)
>> PyUnicode_RPartition (*)
>> PyUnicode_RSplit (*)
>> PyUnicode_IsIdentifier (*)
>> Py_UNICODE_strlen
>> Py_UNICODE_strcpy
>> Py_UNICODE_strcat
>> Py_UNICODE_strncpy
>> Py_UNICODE_strcmp
>> Py_UNICODE_strncmp
>> Py_UNICODE_strchr
>> Py_UNICODE_strrchr
>
> For Unicode, unicodeobject.h defines which APIs are private or not.
> APIs which don't appear in the header file are either private or
> need to be added to the header file (but I don't think there are
> any in this category).
>
> All APIs in the header that do not appear in the documentation,
> should be added there as well. unicodeobject.h already provides
> documentation for most of the APIs you've listed above (except some
> new ones that were added later on).
>
> One API I'm not sure about is PyUnicode_AppendAndDel(). It's somewhat
> obscure and given that we already have PyUnicode_Concat(), I think
> it should be made private and eventually dropped.
>
> --
> Marc-Andre Lemburg
> eGenix.com
>

From glyph at twistedmatrix.com  Wed Nov 17 00:41:42 2010
From: glyph at twistedmatrix.com (Glyph Lefkowitz)
Date: Tue, 16 Nov 2010 18:41:42 -0500
Subject: [Python-Dev] Breaking undocumented API
In-Reply-To:
References: <64DF4272-FF17-4E82-96F5-1DA6CA3A06EC@gmail.com> <4CDAA27B.8040703@voidspace.org.uk> <4CDBDB0C.6080703@voidspace.org.uk> <20101111100516.6e90aa41@mission> <4CDC08F3.6010501@langa.pl> <4CDC0950.5040309@voidspace.org.uk> <20101116163454.2040.394815387.divmod.xquotient.928@localhost.localdomain>
Message-ID: <9F20A8B5-7628-448B-AE21-416D6FE76E80@twistedmatrix.com>

On Nov 16, 2010, at 4:49 PM, Guido van Rossum wrote:

>> PEP 8 isn't nearly visible enough, either. Whatever the rule is, it needs
>> to be presented with the information itself. If the rule is that things not
>> documented in the library manual have no compatibility guarantees, then all
>> of the means of getting documentation *other* than looking at the library
>> manual need to indicate this somehow (alternatively, the information
>> shouldn't be duplicated, but I doubt I'll convince anyone of that).
>
> Assuming people actually read the disclaimers.

I don't think it necessarily needs to be presented as a disclaimer.
There will always be people who just ignore part of the information
presented, but the message could be something along the lines of
"Here's some basic documentation, but it might be out-of-date or
incomplete. You can find a better reference at ." If it's easy to click
on the link, I think a lot of people will click on it. Especially since
the library reference really _is_ more helpful than the docstrings, for
the standard library.

(IMHO, dir()'s semantics are so weird that it should emit a warning too, like "looking for docs?
please use help()".)

From g.brandl at gmx.net  Wed Nov 17 08:18:59 2010
From: g.brandl at gmx.net (Georg Brandl)
Date: Wed, 17 Nov 2010 08:18:59 +0100
Subject: [Python-Dev] Mercurial Schedule
In-Reply-To: <4CE2CF8F.4040500@jcea.es>
References: <4CE2CF8F.4040500@jcea.es>
Message-ID:

On 16.11.2010 19:38, Jesus Cea wrote:
> Is there any updated mercurial schedule?.
>
> Any impact related with the new 3.2 schedule (three weeks offset)?

I've been trying to contact Dirkjan and ask; generally, I don't
see much connection to the 3.2 schedule (with the exception that
the final migration day should not be a release day.)

Georg

From ncoghlan at gmail.com  Wed Nov 17 12:45:39 2010
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Wed, 17 Nov 2010 21:45:39 +1000
Subject: [Python-Dev] Breaking undocumented API
In-Reply-To: <20101116163454.2040.394815387.divmod.xquotient.928@localhost.localdomain>
References: <64DF4272-FF17-4E82-96F5-1DA6CA3A06EC@gmail.com>
 <4CDAA27B.8040703@voidspace.org.uk> <4CDBDB0C.6080703@voidspace.org.uk>
 <20101111100516.6e90aa41@mission> <4CDC08F3.6010501@langa.pl>
 <4CDC0950.5040309@voidspace.org.uk>
 <20101116163454.2040.394815387.divmod.xquotient.928@localhost.localdomain>
Message-ID:

On Wed, Nov 17, 2010 at 2:34 AM, wrote:
> I don't think it belongs only in PEP 8 (that's "a style guide" you're
> referring to, correct?).  It needs to be front and center.  This is
> information that every single user of the stdlib needs in order to use the
> stdlib correctly.
>
> Imagine trying to use a dictionary without knowing about alphabetical
> ordering.  Or driving a car without knowing what lane markers indicate.

The definition of the public/private policy in all its gory detail
should be in PEP 8 as Guido suggests.
The library documentation may then contain a note about the difference
in compatibility guarantees for public and private APIs, say that any
interface and behaviour documented in the manual qualifies as public,
then point readers to PEP 8 for the precise details.

A similar note could be placed in the C API documentation (with a
reference to the detailed policy in PEP 7, perhaps REsTify'ing that
PEP in the process in order to link directly to the naming convention
section).

Cheers,
Nick.

--
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From fuzzyman at voidspace.org.uk  Wed Nov 17 12:57:17 2010
From: fuzzyman at voidspace.org.uk (Michael Foord)
Date: Wed, 17 Nov 2010 11:57:17 +0000
Subject: [Python-Dev] Breaking undocumented API
In-Reply-To:
References: <64DF4272-FF17-4E82-96F5-1DA6CA3A06EC@gmail.com>
 <4CDAA27B.8040703@voidspace.org.uk> <4CDBDB0C.6080703@voidspace.org.uk>
 <20101111100516.6e90aa41@mission> <4CDC08F3.6010501@langa.pl>
 <4CDC0950.5040309@voidspace.org.uk>
 <20101116163454.2040.394815387.divmod.xquotient.928@localhost.localdomain>
Message-ID: <4CE3C31D.50701@voidspace.org.uk>

On 17/11/2010 11:45, Nick Coghlan wrote:
> On Wed, Nov 17, 2010 at 2:34 AM, wrote:
>> I don't think it belongs only in PEP 8 (that's "a style guide" you're
>> referring to, correct?). It needs to be front and center. This is
>> information that every single user of the stdlib needs in order to use the
>> stdlib correctly.
>>
>> Imagine trying to use a dictionary without knowing about alphabetical
>> ordering. Or driving a car without knowing what lane markers indicate.
> The definition of the public/private policy in all its gory detail
> should be in PEP 8 as Guido suggests.

+1

Have we agreed the policy though?
> The library documentation may then contain a note about the difference
> in compatibility guarantees for public and private APIs, say that any
> interface and behaviour documented in the manual qualifies as public,
> then point readers to PEP 8 for the precise details.
>

+1

This sounds like the right approach to me.

All the best,

Michael

> A similar note could be placed in the C API documentation (with a
> reference to the detailed policy in PEP 7, perhaps REsTify'ing that
> PEP in the process in order to link directly to the naming convention
> section).
>
> Cheers,
> Nick.
>

--
http://www.voidspace.org.uk/

READ CAREFULLY. By accepting and reading this email you agree, on behalf
of your employer, to release me from all obligations and waivers arising
from any and all NON-NEGOTIATED agreements, licenses, terms-of-service,
shrinkwrap, clickwrap, browsewrap, confidentiality, non-disclosure,
non-compete and acceptable use policies ("BOGUS AGREEMENTS") that I have
entered into with your employer, its partners, licensors, agents and
assigns, in perpetuity, without prejudice to my ongoing rights and
privileges. You further represent that you have the authority to release
me from any BOGUS AGREEMENTS on behalf of your employer.
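As a rough illustration of the two naming conventions this thread keeps referring to (a leading underscore and `__all__`), the sketch below builds a throwaway module and checks which names `from module import *` binds. The module name `demo_api` and its three functions are made up for the example. The thread had not settled the stdlib policy at this point, so this only shows existing language behaviour, not the rule being debated.

```python
import sys
import types

# Hypothetical module source; none of these names come from the thread.
SOURCE = '''
__all__ = ["public_func"]

def public_func():
    return "stable"

def helper():
    return "no underscore, but not listed in __all__"

def _private_func():
    return "internal"
'''


def star_import_names(source, module_name="demo_api"):
    """Return the sorted names that `from <module> import *` would bind."""
    module = types.ModuleType(module_name)
    exec(source, module.__dict__)
    sys.modules[module_name] = module  # make it importable by name
    try:
        namespace = {}
        exec("from %s import *" % module_name, namespace)
    finally:
        del sys.modules[module_name]
    namespace.pop("__builtins__", None)  # added by exec(), not by the import
    return sorted(namespace)


# With __all__ defined, only the listed names are exported.
names_with_all = star_import_names(SOURCE)

# Without __all__, every name that lacks a leading underscore is exported.
names_without_all = star_import_names(
    SOURCE.replace('__all__ = ["public_func"]', "")
)

print(names_with_all)
print(names_without_all)
```

Note that `helper` changes status depending on whether `__all__` is present, which is one reason a written policy on what counts as public is useful.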
From lukasz at langa.pl  Wed Nov 17 13:37:27 2010
From: lukasz at langa.pl (Łukasz Langa)
Date: Wed, 17 Nov 2010 13:37:27 +0100
Subject: [Python-Dev] Breaking undocumented API
In-Reply-To: <4CE3C31D.50701@voidspace.org.uk>
References: <4CDAA27B.8040703@voidspace.org.uk>
 <4CDBDB0C.6080703@voidspace.org.uk> <20101111100516.6e90aa41@mission>
 <4CDC08F3.6010501@langa.pl> <4CDC0950.5040309@voidspace.org.uk>
 <20101116163454.2040.394815387.divmod.xquotient.928@localhost.localdomain>
 <4CE3C31D.50701@voidspace.org.uk>
Message-ID: <4CE3CC87.1000105@langa.pl>

On 17.11.2010 12:57, Michael Foord wrote:
> On 17/11/2010 11:45, Nick Coghlan wrote:
>> The definition of the public/private policy in all its gory detail
>> should be in PEP 8 as Guido suggests.
>
> +1
>

Guido did not said that, though. I'm with Fred and other people that
agree that PEPs should be more-less immutable. Let's make a new document
(PEP 88?). The reasoning was well laid out here:

http://mail.python.org/pipermail/python-dev/2010-November/105641.html
http://mail.python.org/pipermail/python-dev/2010-November/105642.html

> Have we agreed the policy though?
>

Everybody has their own opinion on the matter. This discussion thread is
getting too fractured to actually get us far enough with the
conclusions. Let's make a PEP and discuss concrete wording on a concrete
proposal.

>> The library documentation may then contain a note about the difference
>> in compatibility guarantees for public and private APIs, say that any
>> interface and behaviour documented in the manual qualifies as public,
>> then point readers to PEP 8 for the precise details.
>>
>
> +1

Yes, point to PEP 88.
Best regards,
Łukasz Langa

From jcea at jcea.es  Wed Nov 17 13:51:49 2010
From: jcea at jcea.es (Jesus Cea)
Date: Wed, 17 Nov 2010 13:51:49 +0100
Subject: [Python-Dev] Mercurial Schedule
In-Reply-To:
References: <4CE2CF8F.4040500@jcea.es>
Message-ID: <4CE3CFE5.7070803@jcea.es>

On 17/11/10 08:18, Georg Brandl wrote:
> On 16.11.2010 19:38, Jesus Cea wrote:
>> Is there any updated mercurial schedule?.
>>
>> Any impact related with the new 3.2 schedule (three weeks offset)?
>
> I've been trying to contact Dirkjan and ask; generally, I don't
> see much connection to the 3.2 schedule (with the exception that
> the final migration day should not be a release day.)

I can't find the mail now, but I remember that months ago the Mercurial
migration schedule was mid-december. I wonder if there is any update.

--
Jesus Cea Avion                         _/_/      _/_/_/        _/_/_/
jcea at jcea.es - http://www.jcea.es/     _/_/    _/_/  _/_/    _/_/  _/_/
jabber / xmpp:jcea at jabber.org         _/_/    _/_/          _/_/_/_/_/
.                              _/_/  _/_/    _/_/          _/_/  _/_/
"Things are not so easy"      _/_/  _/_/    _/_/  _/_/    _/_/  _/_/
"My name is Dump, Core Dump"   _/_/_/        _/_/_/      _/_/  _/_/
"Love is placing your happiness in the happiness of another" - Leibniz

From emile.anclin at logilab.fr  Wed Nov 17 13:48:06 2010
From: emile.anclin at logilab.fr (Emile Anclin)
Date: Wed, 17 Nov 2010 13:48:06 +0100
Subject: [Python-Dev] python3k vs _ast
Message-ID: <201011171348.07169.emile.anclin@logilab>

hello everybody,

migrating Pylint to python3.x, we encounter a little problem :
in the tree generated by _ast, if we consider a "args" node (representing
an argument of a function), the "lineno" (and the "col_offset")
information disappeared from those nodes. Is there a particular
reason for that ? In python2.x, the "args" nodes were just "Name" nodes,
and as for now we keep them as "AssName" nodes in astng/pylint and would
like to know where it was defined.
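To make the question concrete, here is a small sketch that parses a function and reads the position attributes of each argument node through the public `ast` module. The function `greet` and the helper `argument_positions` are invented for illustration. It assumes an interpreter where argument nodes carry `lineno` and `col_offset` (current Python 3 releases do, as far as I know); on the py3k build described above the `getattr` fallback would yield `None` instead.

```python
import ast

# Hypothetical source to inspect; the second argument sits on its own line.
SOURCE = (
    "def greet(name,\n"
    "          punctuation='!'):\n"
    "    return name + punctuation\n"
)


def argument_positions(source):
    """Return (name, lineno, col_offset) for each positional argument."""
    tree = ast.parse(source)
    func = tree.body[0]  # the FunctionDef node
    positions = []
    for arg in func.args.args:
        # getattr() keeps the sketch usable on builds where the argument
        # nodes have no position information at all.
        positions.append(
            (arg.arg, getattr(arg, "lineno", None), getattr(arg, "col_offset", None))
        )
    return positions


positions = argument_positions(SOURCE)
for name, lineno, col_offset in positions:
    print(name, lineno, col_offset)
```

A checker such as Pylint needs exactly these two numbers to point a warning at the argument itself rather than at the enclosing `def` line.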
thx for any information

--

Emile Anclin
http://www.logilab.fr/   http://www.logilab.org/
Scientific computing & knowledge management

From fuzzyman at voidspace.org.uk  Wed Nov 17 14:11:51 2010
From: fuzzyman at voidspace.org.uk (Michael Foord)
Date: Wed, 17 Nov 2010 13:11:51 +0000
Subject: [Python-Dev] Breaking undocumented API
In-Reply-To: <4CE3CC87.1000105@langa.pl>
References: <4CDAA27B.8040703@voidspace.org.uk>
 <4CDBDB0C.6080703@voidspace.org.uk> <20101111100516.6e90aa41@mission>
 <4CDC08F3.6010501@langa.pl> <4CDC0950.5040309@voidspace.org.uk>
 <20101116163454.2040.394815387.divmod.xquotient.928@localhost.localdomain>
 <4CE3C31D.50701@voidspace.org.uk> <4CE3CC87.1000105@langa.pl>
Message-ID: <4CE3D497.50102@voidspace.org.uk>

On 17/11/2010 12:37, Łukasz Langa wrote:
> On 17.11.2010 12:57, Michael Foord wrote:
>> On 17/11/2010 11:45, Nick Coghlan wrote:
>>> The definition of the public/private policy in all its gory detail
>>> should be in PEP 8 as Guido suggests.
>>
>> +1
>>
>
> Guido did not said that, though.

I think that is a reasonable interpretation, and the suggestion that by
"in a style guide" means "create a new style guide" is more of a stretch.

> I'm with Fred and other people that agree that PEPs should be
> more-less immutable. Let's make a new document (PEP 88?). The
> reasoning was well laid out here:
>
> http://mail.python.org/pipermail/python-dev/2010-November/105641.html
> http://mail.python.org/pipermail/python-dev/2010-November/105642.html

In those emails Fred provides a solution to his most substantial
difficulty, that other people base their own documents off pep8, by
recommending that extension documents should refer to a specific
revision.

I don't think those reasons are compelling and the cost of splitting the
Python development style guide into multiple documents are higher.
(They run the risk of contradicting each other, if you want to find a
particular rule you have multiple places to check, there is no single
authoritative place to send people, people *wanting* to base documents
off the Python style rules now have to refer to multiple places, etc.)

So -1 on splitting Python development style guide into multiple documents.

Michael

>> Have we agreed the policy though?
>>
>
> Everybody has their own opinion on the matter. This discussion thread
> is getting too fractured to actually get us far enough with the
> conclusions. Let's make a PEP and discuss concrete wording on a
> concrete proposal.
>
>>> The library documentation may then contain a note about the difference
>>> in compatibility guarantees for public and private APIs, say that any
>>> interface and behaviour documented in the manual qualifies as public,
>>> then point readers to PEP 8 for the precise details.
>>>
>>
>> +1
>
> Yes, point to PEP 88.
>
>
> Best regards,
> Łukasz Langa
>

--
http://www.voidspace.org.uk/
From fdrake at acm.org  Wed Nov 17 14:21:57 2010
From: fdrake at acm.org (Fred Drake)
Date: Wed, 17 Nov 2010 08:21:57 -0500
Subject: [Python-Dev] Breaking undocumented API
In-Reply-To: <4CE3D497.50102@voidspace.org.uk>
References: <4CDAA27B.8040703@voidspace.org.uk>
 <4CDBDB0C.6080703@voidspace.org.uk> <20101111100516.6e90aa41@mission>
 <4CDC08F3.6010501@langa.pl> <4CDC0950.5040309@voidspace.org.uk>
 <20101116163454.2040.394815387.divmod.xquotient.928@localhost.localdomain>
 <4CE3C31D.50701@voidspace.org.uk> <4CE3CC87.1000105@langa.pl>
 <4CE3D497.50102@voidspace.org.uk>
Message-ID:

2010/11/17 Michael Foord :
> So -1 on splitting Python development style guide into multiple documents.

I don't think that the publicness or API stability promises of the
standard library are part of a style guide. They're an essential part
of the library documentation. They aren't a guide for 3rd-party code,
and are specific to the standard library.

If we can't come up with something reasonable for the standard
library, we *certainly* shouldn't be making recommendations on the
matter for 3rd party code. If we do come up with something
reasonable, we can recommend it to others later (once field-proven),
and without duplication. (Possibly by referring to the standard
library documentation, and possibly by refactoring. That's not
important until we have something, though.)


  -Fred

--
Fred L. Drake, Jr.
"A storm broke loose in my mind."  --Albert Einstein

From ncoghlan at gmail.com  Wed Nov 17 14:24:39 2010
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Wed, 17 Nov 2010 23:24:39 +1000
Subject: [Python-Dev] Breaking undocumented API
In-Reply-To: <4CE3D497.50102@voidspace.org.uk>
References: <4CDAA27B.8040703@voidspace.org.uk>
 <4CDBDB0C.6080703@voidspace.org.uk> <20101111100516.6e90aa41@mission>
 <4CDC08F3.6010501@langa.pl> <4CDC0950.5040309@voidspace.org.uk>
 <20101116163454.2040.394815387.divmod.xquotient.928@localhost.localdomain>
 <4CE3C31D.50701@voidspace.org.uk> <4CE3CC87.1000105@langa.pl>
 <4CE3D497.50102@voidspace.org.uk>
Message-ID:

2010/11/17 Michael Foord :
> I don't think those reasons are compelling and the cost of splitting the
> Python development style guide into multiple documents are higher. (They run
> the risk of contradicting each other, if you want to find a particular rule
> you have multiple places to check, there is no single authoritative place to
> send people, people *wanting* to base documents off the Python style rules
> now have to refer to multiple places, etc.)
>
> So -1 on splitting Python development style guide into multiple documents.

Indeed. We don't need to clarify things very often, but the idea of
creating a new PEP every time we want to make something explicit that
was historically implicit (or otherwise underspecified) is a silly
idea. Allowing traceable revisions is what version control is for, and
hence why the PEP archive is part of the SVN repository.

As far as notifying current developers of any changes, they will
generally be following python-dev anyway, or else will get pulled up
on python-checkins if the policy change is significant (and this one
really *isn't* all that significant - the only people it will affect
are those deciding whether to document or deprecate implicitly public
APIs and that almost never happens, since the vast majority of our
APIs are explicitly public or private).

Cheers,
Nick.

--
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From fuzzyman at voidspace.org.uk  Wed Nov 17 14:25:34 2010
From: fuzzyman at voidspace.org.uk (Michael Foord)
Date: Wed, 17 Nov 2010 13:25:34 +0000
Subject: [Python-Dev] Breaking undocumented API
In-Reply-To:
References: <4CDBDB0C.6080703@voidspace.org.uk>
 <20101111100516.6e90aa41@mission> <4CDC08F3.6010501@langa.pl>
 <4CDC0950.5040309@voidspace.org.uk>
 <20101116163454.2040.394815387.divmod.xquotient.928@localhost.localdomain>
 <4CE3C31D.50701@voidspace.org.uk> <4CE3CC87.1000105@langa.pl>
 <4CE3D497.50102@voidspace.org.uk>
Message-ID: <4CE3D7CE.8030108@voidspace.org.uk>

On 17/11/2010 13:21, Fred Drake wrote:
> 2010/11/17 Michael Foord :
>> So -1 on splitting Python development style guide into multiple documents.
> I don't think that the publicness or API stability promises of the
> standard library are part of a style guide. They're an essential part
> of the library documentation. They aren't a guide for 3rd-party code,
> and are specific to the standard library.

PEP 8 *isn't* targeted at third party code - it is the development style
guide for the Python standard library.

    This document gives coding conventions for the Python code
    comprising the standard library in the main Python distribution.

The ideal place for informing the Python core developers of the naming
conventions we should use for our public APIs... (Which is why Guido
said that a style guide *is* the right place for this information.)

It doesn't mean it shouldn't be information provided to library users as
well. (As discussed.)

All the best,

Michael Foord

> If we can't come up with something reasonable for the standard
> library, we *certainly* shouldn't be making recommendations on the
> matter for 3rd party code. If we do come up with something
> reasonable, we can recommend it to others later (once field-proven),
> and without duplication. (Possibly by referring to the standard
> library documentation, and possibly by refactoring.
> That's not important until we have something, though.)
>
>
> -Fred
>
> --
> Fred L. Drake, Jr.
> "A storm broke loose in my mind." --Albert Einstein

--
http://www.voidspace.org.uk/

From dirkjan at ochtman.nl  Wed Nov 17 14:23:59 2010
From: dirkjan at ochtman.nl (Dirkjan Ochtman)
Date: Wed, 17 Nov 2010 14:23:59 +0100
Subject: [Python-Dev] Mercurial Schedule
In-Reply-To: <4CE3CFE5.7070803@jcea.es>
References: <4CE2CF8F.4040500@jcea.es> <4CE3CFE5.7070803@jcea.es>
Message-ID:

On Wed, Nov 17, 2010 at 13:51, Jesus Cea wrote:
> I can't find the mail now, but I remember that months ago the Mercurial
> migration schedule was mid-december. I wonder if there is any update.

I'm still aiming for that date. I've had some problems getting the
test repository together. It's almost done, but I'm on holiday in
Boston and NYC this week, so I don't have much time to spend on it.
The delay shouldn't be much more than a week, and we'll just compress
the testing period such that the migration date should still be about
the same, release schedules willing.

Georg, if you have any further questions, mail is better than IRC
while I'm here.
Cheers,

Dirkjan

From phd at phd.pp.ru  Wed Nov 17 14:29:59 2010
From: phd at phd.pp.ru (Oleg Broytman)
Date: Wed, 17 Nov 2010 16:29:59 +0300
Subject: [Python-Dev] python3k vs _ast
In-Reply-To: <201011171348.07169.emile.anclin@logilab>
References: <201011171348.07169.emile.anclin@logilab>
Message-ID: <20101117132959.GA29283@phd.pp.ru>

Seems to be rather a usage question, not a development question (python-dev
is about *developing* python, not *using* it).

On Wed, Nov 17, 2010 at 01:48:06PM +0100, Emile Anclin wrote:
> hello everybody,
>
> migrating Pylint to python3.x, we encounter a little problem :
> in the tree generated by _ast, if we consider a "args" node (representing
> an argument of a function), the "lineno" (and the "col_offset")
> information disappeared from those nodes. Is there a particular
> reason for that ? In python2.x, the "args" nodes were just "Name" nodes,
> and as for now we keep them as "AssName" nodes in astng/pylint and would
> like to know where it was defined.
>
> thx for any information
>
> --
>
> Emile Anclin
> http://www.logilab.fr/   http://www.logilab.org/
> Scientific computing & knowledge management

Oleg.
--
     Oleg Broytman            http://phd.pp.ru/            phd at phd.pp.ru
           Programmers don't die, they just GOSUB without RETURN.

From ncoghlan at gmail.com  Wed Nov 17 14:30:25 2010
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Wed, 17 Nov 2010 23:30:25 +1000
Subject: [Python-Dev] Breaking undocumented API
In-Reply-To:
References: <4CDAA27B.8040703@voidspace.org.uk>
 <4CDBDB0C.6080703@voidspace.org.uk> <20101111100516.6e90aa41@mission>
 <4CDC08F3.6010501@langa.pl> <4CDC0950.5040309@voidspace.org.uk>
 <20101116163454.2040.394815387.divmod.xquotient.928@localhost.localdomain>
 <4CE3C31D.50701@voidspace.org.uk> <4CE3CC87.1000105@langa.pl>
 <4CE3D497.50102@voidspace.org.uk>
Message-ID:

On Wed, Nov 17, 2010 at 11:21 PM, Fred Drake wrote:
> 2010/11/17 Michael Foord :
>> So -1 on splitting Python development style guide into multiple documents.
>
> I don't think that the publicness or API stability promises of the
> standard library are part of a style guide.  They're an essential part
> of the library documentation.  They aren't a guide for 3rd-party code,
> and are specific to the standard library.
>
> If we can't come up with something reasonable for the standard
> library, we *certainly* shouldn't be making recommendations on the
> matter for 3rd party code.  If we do come up with something
> reasonable, we can recommend it to others later (once field-proven),
> and without duplication.  (Possibly by referring to the standard
> library documentation, and possibly by refactoring.  That's not
> important until we have something, though.)

Would it make people happier if we left PEP 7 and PEP 8 alone, and put
the clarification of what constitutes a "public API" into PEP 5
instead? PEP 5 currently covers the deprecation policy for language
constructs; it would be easy enough to extend it to all public APIs.

The library documentation is *not* the right place for quibbling about
what constitutes a public API when using other means than the library
documentation to find APIs to call.

Cheers,
Nick.

--
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From lukasz at langa.pl  Wed Nov 17 14:31:41 2010
From: lukasz at langa.pl (Łukasz Langa)
Date: Wed, 17 Nov 2010 14:31:41 +0100
Subject: [Python-Dev] Breaking undocumented API
In-Reply-To: <4CE3D497.50102@voidspace.org.uk>
References: <4CDBDB0C.6080703@voidspace.org.uk>
 <20101111100516.6e90aa41@mission> <4CDC08F3.6010501@langa.pl>
 <4CDC0950.5040309@voidspace.org.uk>
 <20101116163454.2040.394815387.divmod.xquotient.928@localhost.localdomain>
 <4CE3C31D.50701@voidspace.org.uk> <4CE3CC87.1000105@langa.pl>
 <4CE3D497.50102@voidspace.org.uk>
Message-ID: <4CE3D93D.3010601@langa.pl>

On 17.11.2010 14:11, Michael Foord wrote:
> I don't think those reasons are compelling and the cost of splitting
> the Python development style guide into multiple documents are higher.
> (They run the risk of contradicting each other, if you want to find a
> particular rule you have multiple places to check, there is no single
> authoritative place to send people, people *wanting* to base documents
> off the Python style rules now have to refer to multiple places, etc.)
>
> So -1 on splitting Python development style guide into multiple
> documents.
>

Bah, again my English skills failed me in a critical moment ;) I was
proposing creation of PEP 88 to supersede PEP 8. This would be better
IMO for the following reasons:

1. Existing projects wouldn't have to explain afterwards why they differ
from PEP 8, e.g. in terms of public/private API declaration. "Your
project claims PEP8 conformance! Why don't you use __all__?" "Ah, that
was before they've added this part to PEP8."

2. All other projects (new and old) would have a much more explicit
(better than implicit) sign that *something significant has changed* in
the recommended style.

3. As someone already said, PEP8 is not visible enough. Transition from
PEP 8 to PEP 88 could help to make some hype that would help raise the
awareness within the community.

Mutating PEP8 is bad form.
We fight mercilessly over source code backwards compatibility so I think
PEPs should be taken just as seriously in that regard.

Łukasz

From benjamin at python.org  Wed Nov 17 14:36:37 2010
From: benjamin at python.org (Benjamin Peterson)
Date: Wed, 17 Nov 2010 07:36:37 -0600
Subject: [Python-Dev] python3k vs _ast
In-Reply-To: <20101117132959.GA29283@phd.pp.ru>
References: <201011171348.07169.emile.anclin@logilab>
 <20101117132959.GA29283@phd.pp.ru>
Message-ID:

2010/11/17 Oleg Broytman :
> Seems to be rather a usage question, not a development question (python-dev
> is about *developing* python, not *using* it).

Well, technically I think it's a feature request.

>
> On Wed, Nov 17, 2010 at 01:48:06PM +0100, Emile Anclin wrote:
>> hello everybody,
>>
>> migrating Pylint to python3.x, we encounter a little problem :
>> in the tree generated by _ast, if we consider a "args" node (representing
>> an argument of a function), the "lineno" (and the "col_offset")
>> information disappeared from those nodes. Is there a particular
>> reason for that ? In python2.x, the "args" nodes were just "Name" nodes,
>> and as for now we keep them as "AssName" nodes in astng/pylint and would
>> like to know where it was defined.

I wouldn't object to adding them back if you want to file a bug report.
--
Regards,
Benjamin

From fdrake at acm.org  Wed Nov 17 14:45:03 2010
From: fdrake at acm.org (Fred Drake)
Date: Wed, 17 Nov 2010 08:45:03 -0500
Subject: [Python-Dev] Breaking undocumented API
In-Reply-To:
References: <4CDAA27B.8040703@voidspace.org.uk>
 <4CDBDB0C.6080703@voidspace.org.uk> <20101111100516.6e90aa41@mission>
 <4CDC08F3.6010501@langa.pl> <4CDC0950.5040309@voidspace.org.uk>
 <20101116163454.2040.394815387.divmod.xquotient.928@localhost.localdomain>
 <4CE3C31D.50701@voidspace.org.uk> <4CE3CC87.1000105@langa.pl>
 <4CE3D497.50102@voidspace.org.uk>
Message-ID:

On Wed, Nov 17, 2010 at 8:30 AM, Nick Coghlan wrote:
> The library documentation is *not* the right place for quibbling about
> what constitutes a public API when using other means than the library
> documentation to find APIs to call.

Quibbling can happen on the mailing list, where it can be ignored by
those who aren't interested. But the documentation is the right place
to document what we come up with for the standard library.

I expect what the tools do will inform any decisions, and the tools
(those in the stdlib) will henceforth be maintained with that in mind.

I *am* suggesting that the scope of this be restricted to what's
appropriate for the standard library, rather than a general
recommendation for others. Third-party projects are free to use what
we come up with, or provide their own policies. That's theirs to
decide, and I see no value in interfering with that.


  -Fred

--
Fred L. Drake, Jr.
"A storm broke loose in my mind."  --Albert Einstein

From fuzzyman at voidspace.org.uk  Wed Nov 17 14:53:24 2010
From: fuzzyman at voidspace.org.uk (Michael Foord)
Date: Wed, 17 Nov 2010 13:53:24 +0000
Subject: [Python-Dev] Breaking undocumented API
In-Reply-To: <4CE3D93D.3010601@langa.pl>
References: <4CDBDB0C.6080703@voidspace.org.uk>
 <20101111100516.6e90aa41@mission> <4CDC08F3.6010501@langa.pl>
 <4CDC0950.5040309@voidspace.org.uk>
 <20101116163454.2040.394815387.divmod.xquotient.928@localhost.localdomain>
 <4CE3C31D.50701@voidspace.org.uk> <4CE3CC87.1000105@langa.pl>
 <4CE3D497.50102@voidspace.org.uk> <4CE3D93D.3010601@langa.pl>
Message-ID: <4CE3DE54.2070008@voidspace.org.uk>

On 17/11/2010 13:31, Łukasz Langa wrote:
> On 17.11.2010 14:11, Michael Foord wrote:
>> I don't think those reasons are compelling and the cost of splitting
>> the Python development style guide into multiple documents are
>> higher. (They run the risk of contradicting each other, if you want
>> to find a particular rule you have multiple places to check, there is
>> no single authoritative place to send people, people *wanting* to
>> base documents off the Python style rules now have to refer to
>> multiple places, etc.)
>>
>> So -1 on splitting Python development style guide into multiple
>> documents.
>>
>
> Bah, again my English skills failed me in a critical moment ;) I was
> proposing creation of PEP 88 to supersede PEP 8. This would be better
> IMO for the following reasons:
>
> 1. Existing projects wouldn't have to explain afterwards why they
> differ from PEP 8, e.g. in terms of public/private API declaration.
> "Your project claims PEP8 conformance! Why don't you use __all__?"
> "Ah, that was before they've added this part to PEP8."
>
> 2. All other projects (new and old) would have a much more explicit
> (better than implicit) sign that *something significant has changed*
> in the recommended style.
>
> 3. As someone already said, PEP8 is not visible enough.
> Transition from PEP 8 to PEP 88 could help to make some hype that
> would help raise the awareness within the community.
>
> Mutating PEP8 is bad form. We fight mercilessly over source code
> backwards compatibility so I think PEPs should be taken just as
> seriously in that regard.

Given the following:

http://code.python.org/hg/peps/log/6b223d6b8b24/pep-0008.txt

Anyone who thinks that PEP 8 is immutable (and should remain so) is
already wrong...

As discussed, the goal is to codify what is already considered "best
practise" within the wider community and the standard library *anyway*.
So in practise this won't be a great surprise or change.

As to the publicity, PEP 8 is both the most widely known PEP and the
most widely known Python style guide. This isn't an argument for letting
it rot, nor for deprecating it and invalidating all those tutorials /
developers / links / books that consider it authoritative. Better to
carefully and slowly evolve it as practise and the language change. For
those wanting immutable versions we provide that in the form of specific
revisions.

All the best,

Michael

>
>
> Łukasz

--
http://www.voidspace.org.uk/
From steve at pearwood.info  Wed Nov 17 15:16:53 2010
From: steve at pearwood.info (Steven D'Aprano)
Date: Thu, 18 Nov 2010 01:16:53 +1100
Subject: [Python-Dev] Breaking undocumented API
In-Reply-To: <87lj4t9cqq.fsf@benfinney.id.au>
References: <64DF4272-FF17-4E82-96F5-1DA6CA3A06EC@gmail.com>
 <4CDAA27B.8040703@voidspace.org.uk> <4CDBDB0C.6080703@voidspace.org.uk>
 <20101111100516.6e90aa41@mission> <4CDC08F3.6010501@langa.pl>
 <4CDC0950.5040309@voidspace.org.uk>
 <20101116163454.2040.394815387.divmod.xquotient.928@localhost.localdomain>
 <87lj4t9cqq.fsf@benfinney.id.au>
Message-ID: <4CE3E3D5.3040607@pearwood.info>

Ben Finney wrote:

> I don't know about Guido, but I'd be -1 on suggestions to add more
> normative information to PEP 7, PEP 8, PEP 257, or any other established
> style guide PEP. I certainly don't want to have to keep going back to
> the same documents frequently just to see if the set of recommendations
> I already know has changed recently.

This is not a problem unique to any specific PEP. How do we learn about
any changes that might interest us? What are the alternatives?

- our knowledge is fixed to what we knew at some particular date, and
  gets further and further obsolete as time goes by;

- we actively search out new knowledge;

- we wait for somebody to tell us that something we knew has changed.

(E.g. I was rather surprised to learn that, sometime over the last few
years, the number of extra-solar planets known to astronomers have
increased from the one or two I was aware of to multiple dozens.)

All three strategies have advantages and disadvantages. Regardless of
whether future versions of the style-guide are called "PEP 8" or whether
they are given new names ("PEP 8" -> "PEP 88" -> ...), we have the
identical problem -- how do we know whether or not there is a new
version of the style guide to look for? In twelve months time, how sure
will we be that PEP 88 is the most recent version to look for? Perhaps
we missed the release of PEP 95.
The one advantage of giving each revision of the document an updated name is that, under some circumstances, we *might* be able to detect a new revision easily. If I think that PEP 88 is the most recent version, and somebody says that the recommended style guide is PEP 89, I might: - think that he merely made a mistake, and meant to say 88; or - think that there is a new document for me to look at. > Rather, I took Guido's mention of "this belongs in a style guide" as > suggesting a *new* style guide. Perhaps one that explicitly obsoletes an > existing one or perhaps not; either way, the updated normative > recommendations are in a new document with a new name, so that one knows > whether one has already read it. How do you know which is the most recent version of the style guide to look at? Instead of doing a O(1) lookup of PEP 8, you have to follow a potentially O(N) search: PEP 8 is obsoleted by PEP 88... go and look at PEP 88. PEP 88 is obsoleted by PEP 93... go and look at PEP 93. PEP 93 is obsoleted by PEP 123... go and look at PEP 123. PEP 123 doesn't contain an "obsoleted by" notice, so: (1) either it is the current document, or (2) it has been obsoleted, but the link to the new version was missed, and it is now very hard to discover what the current document is called. Personally, I don't think the current PEP arrangement is broken enough to change it. Each PEP is already tracked in VCS and history is available for it. There's insufficient advantage, and some disadvantage, to splitting each revision of the PEPs into new documents with new names. -1 on the idea.
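The lookup cost described above can be illustrated with a small sketch. The mapping is hypothetical and only mirrors the made-up PEP numbers used in the message; it is not real PEP metadata.

```python
# Hypothetical "obsoleted by" links, mirroring the example numbers above.
OBSOLETED_BY = {8: 88, 88: 93, 93: 123}


def current_pep(start, links=OBSOLETED_BY):
    """Follow "obsoleted by" links; return (final PEP number, hops taken)."""
    pep = start
    hops = 0
    while pep in links:
        pep = links[pep]
        hops += 1
    return pep, hops


# With a single, evolving document the lookup is one step: read PEP 8.
# With renamed revisions, the reader has to walk the whole chain.
final, hops = current_pep(8)
print("start at PEP 8, end at PEP %d after %d hops" % (final, hops))
```

A missing link in the mapping (case 2 in the message) would make the walk stop early on a stale document, which is the failure mode being argued against.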
-- Steven From emile.anclin at logilab.fr Wed Nov 17 15:18:14 2010 From: emile.anclin at logilab.fr (Emile Anclin) Date: Wed, 17 Nov 2010 15:18:14 +0100 Subject: [Python-Dev] python3k vs _ast In-Reply-To: References: <201011171348.07169.emile.anclin@logilab> <20101117132959.GA29283@phd.pp.ru> Message-ID: <201011171518.14387.emile.anclin@logilab> On Wednesday 17 November 2010 14:36:37 Benjamin Peterson wrote: > I wouldn't object to adding them back if you want to file a bug report. Ok, thank you for quick reply. here is the issue : http://bugs.python.org/issue10445 -- Emile Anclin http://www.logilab.fr/ http://www.logilab.org/ Informatique scientifique & et gestion de connaissances From steve at pearwood.info Wed Nov 17 15:19:22 2010 From: steve at pearwood.info (Steven D'Aprano) Date: Thu, 18 Nov 2010 01:19:22 +1100 Subject: [Python-Dev] Breaking undocumented API In-Reply-To: <4CE3D93D.3010601@langa.pl> References: <4CDBDB0C.6080703@voidspace.org.uk> <20101111100516.6e90aa41@mission> <4CDC08F3.6010501@langa.pl> <4CDC0950.5040309@voidspace.org.uk> <20101116163454.2040.394815387.divmod.xquotient.928@localhost.localdomain> <4CE3C31D.50701@voidspace.org.uk> <4CE3CC87.1000105@langa.pl> <4CE3D497.50102@voidspace.org.uk> <4CE3D93D.3010601@langa.pl> Message-ID: <4CE3E46A.9030905@pearwood.info> Łukasz Langa wrote: > Mutating PEP8 is bad form. We fight mercilessly over source code > backwards compatibility so I think PEPs should be taken just as > seriously in that regard. There's no comparison between the two. If you change your library's API -- not "source code", it doesn't matter if the source code changes so long as the interface remains backwards compatible -- then you will break other people's code. If we change PEP 8, then all that will happen is that some people's coding style will no longer be exactly compatible with PEP 8. Their code will continue to work.
-- Steven From ncoghlan at gmail.com Wed Nov 17 15:19:39 2010 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 18 Nov 2010 00:19:39 +1000 Subject: [Python-Dev] Breaking undocumented API In-Reply-To: References: <4CDAA27B.8040703@voidspace.org.uk> <4CDBDB0C.6080703@voidspace.org.uk> <20101111100516.6e90aa41@mission> <4CDC08F3.6010501@langa.pl> <4CDC0950.5040309@voidspace.org.uk> <20101116163454.2040.394815387.divmod.xquotient.928@localhost.localdomain> <4CE3C31D.50701@voidspace.org.uk> <4CE3CC87.1000105@langa.pl> <4CE3D497.50102@voidspace.org.uk> Message-ID: On Wed, Nov 17, 2010 at 11:45 PM, Fred Drake wrote: > On Wed, Nov 17, 2010 at 8:30 AM, Nick Coghlan wrote: >> The library documentation is *not* the right place for quibbling about >> what constitutes a public API when using other means than the library >> documentation to find APIs to call. > > Quibbling can happen on the mailing list, where it can be ignored by > those who aren't interested. > > But the documentation is the right place to document what we come up > with for the standard library. I expect what the tools do will inform > any decisions, and the tools (those in the stdlib) will henceforth be > maintained with that in mind. > > I *am* suggesting that the scope of this be restricted to what's > appropriate for the standard library, rather than a general > recommendation for others. Third-party projects are free to use what > we come up with, or provide their own policies. That's theirs to > decide, and I see no value in interfering with that. The standard library documentation should say that the public API is what the documentation says it is. Officially, anyone going outside those documented APIs should not be surprised if things get removed or changed arbitrarily without warning. That has long been the python-dev policy and I, for one, don't think it should change.
What we're talking about in this thread is what to do in the grey area of APIs which are not included in the official documentation, but also don't have names starting with an underscore so they "look public" when reading the source code or exploring the API in the interactive interpreter. It *may* be appropriate for the standard library documentation to acknowledge that this grey area exists (I'm not yet convinced on that point), but it definitely should *not* be encouraging anyone to rely on it or on our policies for dealing with it. The policy we're aiming to clarify here is what we should do when we come across standard library APIs that land in the grey area, with there being two appropriate ways to deal with them: 1. Document them and make them officially public 2. Deprecate the public names and make them officially private (with the public names later removed in accordance with normal deprecation procedures) The actual approach taken will vary on a case-by-case basis (and is a little trickier in the case of module level globals, since those can't be deprecated properly), but is always aimed at bringing the standard library more into line with the official position (i.e. APIs are either public-and-documented or private). So the official policy from a language *user* point of view would remain unchanged (i.e. if it isn't documented, you're on your own). As a *pragmatic* policy, however, we would explicitly acknowledge that developers may inadvertently use an undocumented API without realising that it isn't technically public, and hence apply the normal deprecation process even though the official policy says we don't have to. Regards, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From rdmurray at bitdance.com Wed Nov 17 15:19:35 2010 From: rdmurray at bitdance.com (R.
David Murray) Date: Wed, 17 Nov 2010 09:19:35 -0500 Subject: [Python-Dev] python3k vs _ast In-Reply-To: References: <201011171348.07169.emile.anclin@logilab> <20101117132959.GA29283@phd.pp.ru> Message-ID: <20101117141935.599592188CD@kimball.webabinitio.net> On Wed, 17 Nov 2010 07:36:37 -0600, Benjamin Peterson wrote: > 2010/11/17 Oleg Broytman : > > Seems to be rather a usage question, not a development question (python-dev > > is about *developing* python, not *using* it). > > Well, technically I think it's a feature request. > > > > > On Wed, Nov 17, 2010 at 01:48:06PM +0100, Emile Anclin wrote: > >> hello everybody, > >> > >> migrating Pylint to python3.x, we encounter a little problem : > >> in the tree generated by _ast, if we consider a "args" node (representing > >> an argument of a function), the "lineno" (and the "col_offset") > >> information disappeared from those nodes. Is there a particular > >> reason for that ? In python2.x, the "args" nodes were just "Name" nodes, > >> and as for now we keep them as "AssName" nodes in astng/pylint and would > >> like to know where it was defined. > > I wouldn't object to adding them back if you want to file a bug report. It also seems to me that it was a perfectly appropriate question for this list. The question was "why did you developers drop this (obscure) feature that we depend on in Python3?" I don't think that question would make sense on python-list. Granted, there's a fuzzy line there, but pylint is really development infrastructure :) The python-porting list would have been a good alternate choice. -- R. 
David Murray www.bitdance.com From fuzzyman at voidspace.org.uk Wed Nov 17 15:25:01 2010 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Wed, 17 Nov 2010 14:25:01 +0000 Subject: [Python-Dev] Breaking undocumented API In-Reply-To: References: <20101111100516.6e90aa41@mission> <4CDC08F3.6010501@langa.pl> <4CDC0950.5040309@voidspace.org.uk> <20101116163454.2040.394815387.divmod.xquotient.928@localhost.localdomain> <4CE3C31D.50701@voidspace.org.uk> <4CE3CC87.1000105@langa.pl> <4CE3D497.50102@voidspace.org.uk> Message-ID: <4CE3E5BD.6050700@voidspace.org.uk> On 17/11/2010 14:19, Nick Coghlan wrote: > On Wed, Nov 17, 2010 at 11:45 PM, Fred Drake wrote: >> On Wed, Nov 17, 2010 at 8:30 AM, Nick Coghlan wrote: >>> The library documentation is *not* the right place for quibbling about >>> what constitutes a public API when using other means than the library >>> documentation to find APIs to call. >> Quibbling can happen on the mailing list, where it can be ignored by >> those who aren't interested. >> >> But the documentation is the right place to document what we come up >> with for the standard library. I expect what the tools do will inform >> any decisions, and the tools (those in the stdlib) will henceforth be >> maintained with that in mind. >> >> I *am* suggesting that the scope of this be restricted to what's >> appropriate for the standard library, rather than a general >> recommendation for others. Third-party projects are free to use what >> we come up with, or provide their own policies. That's theirs to >> decide, and I see no value in interfering with that. > The standard library documentation should say that the public API is > what the documentation says it is. Officially, anyone going outside > those documented APIs should not be surprised if things get removed or > changed arbitrarily without warning. That has long been the python-dev > policy and I, for one, don't think it should change. 
> > What we're talking about in this thread is what to do in the grey area > of APIs which are not included in the official documentation, but also > don't have names starting with an underscore so they "look public" We're *also* discussing codifying the naming conventions (or using __all__) within the standard library, so it isn't just about deprecations (which is why I think PEP 8 rather than PEP 5). This is so that in the future if a name looks public users can have more confidence that it actually is... Obviously what to do about modules that don't follow these rules currently is a big part of it (and how the discussion started). All the best, Michael > when reading the source code or exploring the API in the interactive > interpreter. It *may* be appropriate for the standard library > documentation to acknowledge that this grey area exists (I'm not yet > convinced on that point), but it definitely should *not* be > encouraging anyone to rely on it or on our policies for dealing with > it. > > The policy we're aiming to clarify here is what we should do when we > come across standard library APIs that land in the grey area, with > there being two appropriate ways to deal with them: > 1. Document them and make them officially public > 2. Deprecate the public names and make them officially private (with > the public names later removed in accordance with normal deprecation > procedures) > > The actual approach taken will vary on a case-by-case basis (and is a > little trickier in the case of module level globals, since those can't > be deprecated properly), but is always aimed at bringing the standard > library more into line with the official position (i.e. APIs are > either public-and-documented or private). > > So the official policy from a language *user* point of view would > remain unchanged (i.e. if it isn't documented, you're on your own). 
As > a *pragmatic* policy, however, we would explicitly acknowledge that > developers may inadvertently use an undocumented API without realising > that it isn't technically public, and hence apply the normal > deprecation process even though the official policy says we don't have > to. > > Regards, > Nick. > -- http://www.voidspace.org.uk/ From jcea at jcea.es Wed Nov 17 15:31:02 2010 From: jcea at jcea.es (Jesus Cea) Date: Wed, 17 Nov 2010 15:31:02 +0100 Subject: [Python-Dev] I need help with IO testuite Message-ID: <4CE3E726.2030008@jcea.es> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Hi all. I am modifying IO module for Python 3.2, and I am unable to understand the mechanism used in IO testsuite to test both the C and the Python implementation. In particular I need to test that the implementation passes some parameters to the OS. The module uses "Mock" classes, but I think "Mock" is something else, and I don't see how it interpose between the C/Python code and the OS. If somebody could explain the mechanism a bit... Thanks for your time and attention. Some background: http://bugs.python.org/issue10142 - -- Jesus Cea Avion _/_/ _/_/_/ _/_/_/ jcea at jcea.es - http://www.jcea.es/ _/_/ _/_/ _/_/ _/_/ _/_/ jabber / xmpp:jcea at jabber.org _/_/ _/_/ _/_/_/_/_/ . 
_/_/ _/_/ _/_/ _/_/ _/_/ "Things are not so easy" _/_/ _/_/ _/_/ _/_/ _/_/ _/_/ "My name is Dump, Core Dump" _/_/_/ _/_/_/ _/_/ _/_/ "El amor es poner tu felicidad en la felicidad de otro" - Leibniz -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iQCVAwUBTOPnJplgi5GaxT1NAQLVqQP/cf9+hdLdoSMzY+cSquq7YZMiQOQ0aMEH ZRn+su4F3qg5e8MgEQOXFj9uGEjVDLwonE4nBZ+T3ovBcPCyGaLB/K/YttZGVM5/ O3gpzZss9bkMvuWQCblyEJp8uzJC831AwPDMg1Q0nbMiTnJlW5dY1CX9BD0gYPBW oIVBt2oBfCI= =hq7M -----END PGP SIGNATURE----- From ncoghlan at gmail.com Wed Nov 17 15:34:22 2010 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 18 Nov 2010 00:34:22 +1000 Subject: [Python-Dev] Proposed adjustments to PEP 0 generation Message-ID: The lists of Meta-PEPs and Other Informational PEPs at the beginning of PEP 0 are starting to get a little long, and contain some outdated information that doesn't really deserve pride of place at the top of the PEP index. If I don't hear any objections in this thread, I plan to make the following tweaks to the PEP 0 generator "soonish": - make these two lists respect the "Withdrawn" and "Rejected" flags (i.e. taking the relevant PEPs out of this list and dropping them into later categories) - adding a new "Historical" category for PEPs that have served their purpose and are no longer of immediate interest (primarily old release PEPs, but also the old SVN migration PEP, the DVCS study and PEP 42) Regards, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From solipsis at pitrou.net Wed Nov 17 15:44:15 2010 From: solipsis at pitrou.net (Antoine Pitrou) Date: Wed, 17 Nov 2010 15:44:15 +0100 Subject: [Python-Dev] I need help with IO testuite References: <4CE3E726.2030008@jcea.es> Message-ID: <20101117154415.41100ec5@pitrou.net> On Wed, 17 Nov 2010 15:31:02 +0100 Jesus Cea wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > Hi all.
I am modifying IO module for Python 3.2, and I am unable to > understand the mechanism used in IO testsuite to test both the C and the > Python implementation. > > In particular I need to test that the implementation passes some > parameters to the OS. > > The module uses "Mock" classes, but I think "Mock" is something else, > and I don't see how it interpose between the C/Python code and the OS. It doesn't interpose between Python and the OS: it mocks the OS. It is, therefore, a mock (!). Consequently, if you want to test that parameters are passed to the OS, you shouldn't use a mock, but an actual file. There are several tests which already do that, it shouldn't be too hard to write your own. Regards Antoine. From ncoghlan at gmail.com Wed Nov 17 15:46:01 2010 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 18 Nov 2010 00:46:01 +1000 Subject: [Python-Dev] Breaking undocumented API In-Reply-To: <4CE3E5BD.6050700@voidspace.org.uk> References: <20101111100516.6e90aa41@mission> <4CDC08F3.6010501@langa.pl> <4CDC0950.5040309@voidspace.org.uk> <20101116163454.2040.394815387.divmod.xquotient.928@localhost.localdomain> <4CE3C31D.50701@voidspace.org.uk> <4CE3CC87.1000105@langa.pl> <4CE3D497.50102@voidspace.org.uk> <4CE3E5BD.6050700@voidspace.org.uk> Message-ID: On Thu, Nov 18, 2010 at 12:25 AM, Michael Foord wrote: > We're *also* discussing codifying the naming conventions (or using __all__) > within the standard library, so it isn't just about deprecations (which is > why I think PEP 8 rather than PEP 5). This is so that in the future if a > name looks public users can have more confidence that it actually is... I deliberately glossed over that, since my stance on the naming conventions is "don't change them" (i.e. PEP 8 already says that a leading underscore is an internal use indicator, and I think that's how we should guide the clarification of our deprecation policy - just carving out an exception for imported modules). 
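The two conventions under discussion (a leading underscore, and `__all__`) can be seen in what `from module import *` actually binds. The following is a minimal sketch; the module source and the names `public_api`, `_internal_helper` and `demo_naming` are hypothetical and exist only for illustration.

```python
import sys
import types

# Hypothetical module source, used only for illustration.
SOURCE = '''
import os  # an imported module: no underscore, yet not meant as API

def public_api():
    return "documented"

def _internal_helper():
    return "private"
'''


def star_exports(source, name="demo_naming"):
    """Return the names that `from <name> import *` would bind."""
    module = types.ModuleType(name)
    exec(source, module.__dict__)
    sys.modules[name] = module
    try:
        namespace = {}
        exec("from %s import *" % name, namespace)
    finally:
        del sys.modules[name]
    # Drop dunder entries such as __builtins__ that exec() adds.
    return sorted(n for n in namespace if not n.startswith("__"))


# Without __all__, every name lacking a leading underscore is exported,
# including the imported module `os`.
without_all = star_exports(SOURCE)

# With __all__, only the listed names are exported.
with_all = star_exports(SOURCE + '\n__all__ = ["public_api"]\n')

print(without_all)
print(with_all)
```

The first result shows why imported modules need an exception under a pure underscore rule: `os` "looks public" in the module namespace even though nobody intends it as API.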
My original question related to dealing with the grey area in the deprecation policy (i.e. wanting to remove an API that was undocumented, but had a public name) and I'm happy that the existing style guide does answer my question (even though the implications aren't necessarily obvious). Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From phd at phd.pp.ru Wed Nov 17 15:50:56 2010 From: phd at phd.pp.ru (Oleg Broytman) Date: Wed, 17 Nov 2010 17:50:56 +0300 Subject: [Python-Dev] python3k vs _ast In-Reply-To: <20101117141935.599592188CD@kimball.webabinitio.net> References: <201011171348.07169.emile.anclin@logilab> <20101117132959.GA29283@phd.pp.ru> <20101117141935.599592188CD@kimball.webabinitio.net> Message-ID: <20101117145056.GA1034@phd.pp.ru> On Wed, Nov 17, 2010 at 09:19:35AM -0500, R. David Murray wrote: > On Wed, 17 Nov 2010 07:36:37 -0600, Benjamin Peterson wrote: > > 2010/11/17 Oleg Broytman : > > > Seems to be rather a usage question, not a development question (python-dev > > > is about *developing* python, not *using* it). > > > > Well, technically I think it's a feature request. > > > > > > > > On Wed, Nov 17, 2010 at 01:48:06PM +0100, Emile Anclin wrote: > > >> hello everybody, > > >> > > >> migrating Pylint to python3.x, we encounter a little problem : > > >> in the tree generated by _ast, if we consider a "args" node (representing > > >> an argument of a function), the "lineno" (and the "col_offset") > > >> information disappeared from those nodes. Is there a particular > > >> reason for that ? In python2.x, the "args" nodes were just "Name" nodes, > > >> and as for now we keep them as "AssName" nodes in astng/pylint and would > > >> like to know where it was defined. > > > > I wouldn't object to adding them back if you want to file a bug report. > > It also seems to me that it was a perfectly appropriate question > for this list.
The question was "why did you developers drop this > (obscure) feature that we depend on in Python3?" The problem for me is the wording. A question like "why did you developers drop a feature?" is certainly a development question, while "like to know where it was defined" seems more like a usage question. I apologize for misunderstanding. > I don't think that > question would make sense on python-list. Granted, there's a fuzzy > line there, but pylint is really development infrastructure :) > > The python-porting list would have been a good alternate choice. > > -- > R. David Murray www.bitdance.com Oleg. -- Oleg Broytman http://phd.pp.ru/ phd at phd.pp.ru Programmers don't die, they just GOSUB without RETURN. From ncoghlan at gmail.com Wed Nov 17 15:58:30 2010 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 18 Nov 2010 00:58:30 +1000 Subject: [Python-Dev] I need help with IO testuite In-Reply-To: <4CE3E726.2030008@jcea.es> References: <4CE3E726.2030008@jcea.es> Message-ID: On Thu, Nov 18, 2010 at 12:31 AM, Jesus Cea wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > Hi all. I am modifying IO module for Python 3.2, and I am unable to > understand the mechanism used in IO testsuite to test both the C and the > Python implementation. > > In particular I need to test that the implementation passes some > parameters to the OS. > > The module uses "Mock" classes, but I think "Mock" is something else, > and I don't see how it interpose between the C/Python code and the OS. The "Mock" refers to stubbing out or substituting various layers of the IO stack with the Python implementations in the test file. It isn't related specifically to the C/Python switching. > If somebody could explain the mechanism a bit... The actual C/Python switching happens later in the file. It is best to start from the bottom of the file (with the list of test cases that are actually executed) and work your way up from there. 
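As a rough illustration of the C/Python switching described above: shared test bodies refer to names such as `self.BytesIO`, and a loop at the bottom of the file attaches either the `io` (C) or `_pyio` (Python) objects to each test class based on its name prefix. The sketch below is a simplified approximation of that pattern, not the actual test_io.py code; the class and function names are hypothetical.

```python
import io              # C-accelerated implementation
import _pyio as pyio   # pure-Python implementation
import unittest


class BytesIORoundTripTest:
    """Shared test body; `self.BytesIO` is injected per implementation."""

    def test_round_trip(self):
        buf = self.BytesIO()
        buf.write(b"spam")
        self.assertEqual(buf.getvalue(), b"spam")


class CBytesIORoundTripTest(BytesIORoundTripTest, unittest.TestCase):
    pass


class PyBytesIORoundTripTest(BytesIORoundTripTest, unittest.TestCase):
    pass


def bind_implementations(tests, names=("BytesIO",)):
    """Attach the C or Python classes to each test case by name prefix."""
    for test in tests:
        impl = io if test.__name__.startswith("C") else pyio
        for name in names:
            setattr(test, name, getattr(impl, name))


ALL_TESTS = [CBytesIORoundTripTest, PyBytesIORoundTripTest]
bind_implementations(ALL_TESTS)


def run_all():
    """Run both variants and return the unittest result object."""
    suite = unittest.TestSuite()
    for test in ALL_TESTS:
        suite.addTests(unittest.defaultTestLoader.loadTestsFromTestCase(test))
    result = unittest.TestResult()
    suite.run(result)
    return result
```

Reading such a file from the bottom up, as suggested, shows first which classes are run and then which implementation each one receives.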
For what Amaury is talking about, what you can test is that the higher layers of the IO stack (e.g. BufferedReader) correctly pass the new flags down to the RawIO layer. You're correct that you can't really test that RawIO is actually passing the flags down to the OS. However, if you have a way to check whether the filesystem in use is ZFS, you may be able to create a conditionally executed test, such that correct behaviour can be verified just by running on a machine that uses ZFS for its temp directory. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From tseaver at palladion.com Wed Nov 17 15:58:37 2010 From: tseaver at palladion.com (Tres Seaver) Date: Wed, 17 Nov 2010 09:58:37 -0500 Subject: [Python-Dev] Breaking undocumented API In-Reply-To: <4CE3E3D5.3040607@pearwood.info> References: <64DF4272-FF17-4E82-96F5-1DA6CA3A06EC@gmail.com> <4CDAA27B.8040703@voidspace.org.uk> <4CDBDB0C.6080703@voidspace.org.uk> <20101111100516.6e90aa41@mission> <4CDC08F3.6010501@langa.pl> <4CDC0950.5040309@voidspace.org.uk> <20101116163454.2040.394815387.divmod.xquotient.928@localhost.localdomain> <87lj4t9cqq.fsf@benfinney.id.au> <4CE3E3D5.3040607@pearwood.info> Message-ID: -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 11/17/2010 09:16 AM, Steven D'Aprano wrote: > Ben Finney wrote: > >> I don't know about Guido, but I'd be -1 on suggestions to add more >> normative information to PEP 7, PEP 8, PEP 257, or any other established >> style guide PEP. I certainly don't want to have to keep going back to >> the same documents frequently just to see if the set of recommendations >> I already know has changed recently. > > This is not a problem unique to any specific PEP. How do we learn about > any changes that might interest us? What are the alternatives?
> > - our knowledge is fixed to what we knew at some particular date, and > gets further and further obsolete as time goes by; > > - we actively search out new knowledge; > > - we wait for somebody to tell us that something we knew has changed. > > (E.g. I was rather surprised to learn that, sometime over the last few > years, the number of extra-solar planets known to astronomers has > increased from the one or two I was aware of to multiple dozens.) > > All three strategies have advantages and disadvantages. > > Regardless of whether future versions of the style-guide are called "PEP > 8" or whether they are given new names ("PEP 8" -> "PEP 88" -> ...), we > have the identical problem -- how do we know whether or not there is a > new version of the style guide to look for? In twelve months time, how > sure will we be that PEP 88 is the most recent version to look for? > Perhaps we missed the release of PEP 95. > > The one advantage of giving each revision of the document an updated > name is that, under some circumstances, we *might* be able to detect a > new revision easily. If I think that PEP 88 is the most recent version, > and somebody says that the recommended style guide is PEP 89, I might: > > - think that he merely made a mistake, and meant to say 88; or > - think that there is a new document for me to look at. > > >> Rather, I took Guido's mention of "this belongs in a style guide" as >> suggesting a *new* style guide. Perhaps one that explicitly obsoletes an >> existing one or perhaps not; either way, the updated normative >> recommendations are in a new document with a new name, so that one knows >> whether one has already read it. > > How do you know which is the most recent version of the style guide to > look at? Instead of doing a O(1) lookup of PEP 8, you have to follow a > potentially O(N) search: > > PEP 8 is obsoleted by PEP 88... go and look at PEP 88. > PEP 88 is obsoleted by PEP 93... go and look at PEP 93.
> PEP 93 is obsoleted by PEP 123... go and look at PEP 123. > PEP 123 doesn't contain an "obsoleted by" notice, so: > (1) either it is the current document, or > (2) it has been obsoleted, but the link to the new version was missed, > and it is now very hard to discover what the current document is called. > > Personally, I don't think the current PEP arrangement is broken enough > to change it. Each PEP is already tracked in VCS and history is > available for it. There's insufficient advantage, and some disadvantage, > to splitting each revision of the PEPs into new documents with new > names. -1 on the idea. FWIW, Guido recently ruled that updating PEP 333 to indicate how WSGI would work in Python3 was not appropriate, and suggested instead a new PEP (3333), stating[1]: Of those, IMO only textual clarifications ought to be made to an existing, accepted, widely implemented standards-track PEP. Note that the BDFL ruled this way even though the changes to PEP 333 were essentially clarifications which applied only to Python 3: the existing Python 2 semantics would have remained the same.[2] [1] http://permalink.gmane.org/gmane.comp.python.devel/117269 [2] http://permalink.gmane.org/gmane.comp.python.devel/117249 Tres.
- -- =================================================================== Tres Seaver +1 540-429-0999 tseaver at palladion.com Palladion Software "Excellence by Design" http://palladion.com -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iEYEARECAAYFAkzj7ZwACgkQ+gerLs4ltQ7mPgCg1TpA+rF0WigLGB1xeuUTyRF7 MLQAnjGUgWZUqQBLfbwl6RanA+ME4Hth =zuiQ -----END PGP SIGNATURE----- From ncoghlan at gmail.com Wed Nov 17 16:00:20 2010 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 18 Nov 2010 01:00:20 +1000 Subject: [Python-Dev] I need help with IO testuite In-Reply-To: References: <4CE3E726.2030008@jcea.es> Message-ID: On Thu, Nov 18, 2010 at 12:58 AM, Nick Coghlan wrote: > For what Amaury is talking about, what you can test is that the higher > layers of the IO stack (e.g. BufferedReader) correctly pass the new > flags down to the RawIO layer. You're correct that you can't really > test that RawIO is actually passing the flags down to the OS. However, > if you have a way to check whether the filesystem in use is ZFS, you > may be able to create a conditionally executed test, such that correct > behaviour can be verified just by running on a machine that uses ZFS > for its temp directory. On further thought, the test should probably be unconditional - just allow a ValueError as an acceptable result that indicates the underlying filesystem isn't ZFS. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com |
Brisbane, Australia From alexander.belopolsky at gmail.com Wed Nov 17 16:17:45 2010 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Wed, 17 Nov 2010 10:17:45 -0500 Subject: [Python-Dev] Breaking undocumented API In-Reply-To: References: <4CDAA27B.8040703@voidspace.org.uk> <4CDBDB0C.6080703@voidspace.org.uk> <20101111100516.6e90aa41@mission> <4CDC08F3.6010501@langa.pl> <4CDC0950.5040309@voidspace.org.uk> <20101116163454.2040.394815387.divmod.xquotient.928@localhost.localdomain> <4CE3C31D.50701@voidspace.org.uk> <4CE3CC87.1000105@langa.pl> <4CE3D497.50102@voidspace.org.uk> Message-ID: On Wed, Nov 17, 2010 at 9:19 AM, Nick Coghlan wrote: .. > The standard library documentation should say that the public API is > what the documentation says it is. Officially, anyone going outside > those documented APIs should not be surprised if things get removed or > changed arbitrarily without warning. That has long been the python-dev > policy and I, for one, don't think it should change. > +1 That's another reason why it is appropriate to document this in both Library Reference and the Developers Guide (whatever it is). In the Library Reference we can say point-blank: "This is the authoritative documentation of what Python Library provides. Anything not mentioned here is subject to change between releases without notice." In the Developers Guide, however, we can take a more nuanced approach that would start with a general policy that changing existing APIs, public or not, is costly and should not be done without significant offsetting benefit. More on this below. > What we're talking about in this thread is what to do in the grey area > of APIs which are not included in the official documentation, but also > don't have names starting with an underscore so they "look public" > when reading the source code or exploring the API in the interactive > interpreter.
It *may* be appropriate for the standard library > documentation to acknowledge that this grey area exists (I'm not yet > convinced on that point), but it definitely should *not* be > encouraging anyone to rely on it or on our policies for dealing with > it. > Users will venture into grey area regardless of whether its existence is acknowledged or not. Developers Guide should take this into consideration, but there is no need to encourage this practice in the Library Reference. In the Developers Guide, we can list a set of factors that need to be considered when changing or removing an undocumented API. For example: 1. Does it start with an underscore? 2. Is __all__ defined for the module? If so, is the name in __all__? 3. Is API name well chosen for what it does? 4. How old is the module? Was it written before modern policies were adopted? 5. Is API used in the standard library outside of the module? 6. Is API broken? Can it be fixed? (If it was broken in several releases and nobody complained - it is ok to remove.) 7. Is API used? General google search or google code search can give an insight. The decision to remove an API should always be made on a case by case basis. Purely style compliance changes such as let's add __all__ and rename all names not in __all__ by prepending an underscore should always add old names back as deprecated aliases. (Breaking from xyz import * by adding __all__ to xyz is probably ok because code using from xyz import * may be broken by any addition to xyz and users have been warned.) .. > So the official policy from a language *user* point of view would > remain unchanged (i.e. if it isn't documented, you're on your own). As > a *pragmatic* policy, however, we would explicitly acknowledge that > developers may inadvertently use an undocumented API without realising > that it isn't technically public, and hence apply the normal > deprecation process even though the official policy says we don't have > to.
+1

From foom at fuhm.net Wed Nov 17 16:24:12 2010
From: foom at fuhm.net (James Y Knight)
Date: Wed, 17 Nov 2010 10:24:12 -0500
Subject: [Python-Dev] Breaking undocumented API
In-Reply-To:
References: <4CDAA27B.8040703@voidspace.org.uk>
 <4CDBDB0C.6080703@voidspace.org.uk> <20101111100516.6e90aa41@mission>
 <4CDC08F3.6010501@langa.pl> <4CDC0950.5040309@voidspace.org.uk>
 <20101116163454.2040.394815387.divmod.xquotient.928@localhost.localdomain>
 <4CE3C31D.50701@voidspace.org.uk> <4CE3CC87.1000105@langa.pl>
 <4CE3D497.50102@voidspace.org.uk>
Message-ID: <3B40F127-F82B-4A91-9485-3D089DAF4A4F@fuhm.net>

On Nov 17, 2010, at 9:19 AM, Nick Coghlan wrote:
> (and is a little trickier in the case of module level globals, since those can't be deprecated properly)

People keep saying this, but examples of how to do it have already been
shown. I actually think that Python should include a standard way to do
so -- it's a reasonable enough desire, as shown by how many times in
this thread the inability to do so has been mentioned. If the existing
working 3rd-party mechanisms aren't good enough for python-dev
standards, come up with a new way...
James

From guido at python.org Wed Nov 17 16:30:03 2010
From: guido at python.org (Guido van Rossum)
Date: Wed, 17 Nov 2010 07:30:03 -0800
Subject: [Python-Dev] Breaking undocumented API
In-Reply-To: <3B40F127-F82B-4A91-9485-3D089DAF4A4F@fuhm.net>
References: <4CDAA27B.8040703@voidspace.org.uk>
 <4CDBDB0C.6080703@voidspace.org.uk> <20101111100516.6e90aa41@mission>
 <4CDC08F3.6010501@langa.pl> <4CDC0950.5040309@voidspace.org.uk>
 <20101116163454.2040.394815387.divmod.xquotient.928@localhost.localdomain>
 <4CE3C31D.50701@voidspace.org.uk> <4CE3CC87.1000105@langa.pl>
 <4CE3D497.50102@voidspace.org.uk>
 <3B40F127-F82B-4A91-9485-3D089DAF4A4F@fuhm.net>
Message-ID:

On Wed, Nov 17, 2010 at 7:24 AM, James Y Knight wrote:
> On Nov 17, 2010, at 9:19 AM, Nick Coghlan wrote:
>> (and is a little trickier in the case of module level globals, since those can't be deprecated properly)
>
> People keep saying this, but there have already been examples shown of how to do it. I actually think that python should include a way to do so standard -- it's a reasonable enough desire, as shown by how many times in this thread the inability to do so has been mentioned. If the existing working 3rd-party mechanisms aren't good enough for python-dev standards, come up with a new way...

That's quite the distraction from the current thread though. Start
discussing it on python-ideas, or submit a code fix, or something in
between. But the hackish way that some 3rd party frameworks use
(replacing the module object with a class instance in sys.modules) is
clearly not right for the standard library (I'll explain on
python-ideas if you insist).
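
[Editorial sketch, not part of the original message.] For readers
unfamiliar with the technique mentioned above, replacing the module
object in sys.modules might look roughly like the following. This is
only an illustration under stated assumptions: the module name
legacy_mod, the names OLD_CONSTANT and new_constant, and the wrapper
class are all hypothetical, and the thread itself describes this
approach as hackish and unsuitable for the standard library.

```python
import sys
import types
import warnings


class _DeprecatingModule(types.ModuleType):
    """Hypothetical stand-in that warns when deprecated globals are read."""

    def __init__(self, wrapped, deprecated):
        super().__init__(wrapped.__name__, wrapped.__doc__)
        # Store state directly in the instance dict so that normal
        # attribute lookup finds it without going through __getattr__.
        self.__dict__["_wrapped"] = wrapped
        self.__dict__["_deprecated"] = deprecated

    def __getattr__(self, name):
        # Called only when normal lookup fails, which here means for
        # every attribute that lives on the wrapped module.
        if name in self._deprecated:
            warnings.warn(
                "%s.%s is deprecated: %s"
                % (self.__name__, name, self._deprecated[name]),
                DeprecationWarning,
                stacklevel=2,
            )
        return getattr(self._wrapped, name)


# Hypothetical module with one global that is being phased out.
legacy = types.ModuleType("legacy_mod")
legacy.OLD_CONSTANT = 42
legacy.new_constant = 42

# The "hack": the entry in sys.modules is a class instance, not the module.
sys.modules["legacy_mod"] = _DeprecatingModule(
    legacy, {"OLD_CONSTANT": "use new_constant instead"}
)

import legacy_mod

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    value = legacy_mod.OLD_CONSTANT      # triggers a DeprecationWarning

with warnings.catch_warnings(record=True) as quiet:
    warnings.simplefilter("always")
    other = legacy_mod.new_constant      # no warning expected
```

Note that "from legacy_mod import OLD_CONSTANT" also goes through
attribute access on the replacement object, so it would warn as well;
whether that is desirable is one of the design questions left open in
the thread.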
-- --Guido van Rossum (python.org/~guido) From guido at python.org Wed Nov 17 16:52:37 2010 From: guido at python.org (Guido van Rossum) Date: Wed, 17 Nov 2010 07:52:37 -0800 Subject: [Python-Dev] Breaking undocumented API In-Reply-To: <87lj4t9cqq.fsf@benfinney.id.au> References: <64DF4272-FF17-4E82-96F5-1DA6CA3A06EC@gmail.com> <4CDAA27B.8040703@voidspace.org.uk> <4CDBDB0C.6080703@voidspace.org.uk> <20101111100516.6e90aa41@mission> <4CDC08F3.6010501@langa.pl> <4CDC0950.5040309@voidspace.org.uk> <20101116163454.2040.394815387.divmod.xquotient.928@localhost.localdomain> <87lj4t9cqq.fsf@benfinney.id.au> Message-ID: On Tue, Nov 16, 2010 at 1:31 PM, Ben Finney wrote: > I don't know about Guido, but I'd be -1 on suggestions to add more > normative information to PEP 7, PEP 8, PEP 257, or any other established > style guide PEP. I certainly don't want to have to keep going back to > the same documents frequently just to see if the set of recommendations > I already know has changed recently. > > Rather, I took Guido's mention of "this belongs in a style guide" as > suggesting a *new* style guide. Perhaps one that explicitly obsoletes an > existing one or perhaps not; either way, the updated normative > recommendations are in a new document with a new name, so that one knows > whether one has already read it. That's not what I meant. In the case of style guides I think it is totally appropriate to update the PEP as new rules are developed or existing ones are clarified (or even changed). I certainly don't want to get into the situation where the style guide is spread over multiple documents that need to be taken together to make sense. It's not like PEP 8 specifies an API that is going to break code in the future -- it is a set of conventions. You could create a new PEP or move the style guide out of the PEP system (a not unreasonable option) but the effect of changes to the style guide is the same: some fraction of old code will become non-compliant. So what? 
A style guide is just that -- a guide for coding style. Every good style guide contains an escape clause: in PEP 8 it is the section named "A Foolish Consistency is the Hobgoblin of Little Minds". I've seen many unreasonable uses of style guides. This is a recurring theme with Google's internal style guides too. For example, some people get in an argument with a code reviewer about what's the best way to do something, and they can't agree -- so now they want a resolution in the style guide, no matter how specific their argument is to one particular context. Other people claim you cannot change a style guide because it would make existing code unnecessarily non-compliant. There are the people who insist that the style guide be followed mindlessly, even in situations where using a different style would be clearly better. Then there are the people who want to update the entire code base to become compliant after each style change. Etc., etc. All I want to say is, people lighten up. The style guide can't solve all your problems. You are never going to have all code compliant. Use the style guide when it helps, ignore it when it's in the way. Finally, there's the issue of the scope of PEP 8. Its heading says that it applies to the stdlib. The reason I put this in was so that 3rd party developers who disagreed with (part of) PEP 8 would not feel obligated to follow it. At the same time I would hope that most people see its value and follow (most of) it for their own code, accepting that a more universal set of conventions helps readability of all code. I would not be against changes to the style guide that emphasize that some rules apply specifically to the stdlib (the rules about mostly not using non-ASCII characters come to mind) and even to include some normative rules for stdlib developers (e.g. exactly how to use __all__ and private names). But we cannot hope that all stdlib modules will all look exactly alike. 
It is the work of many contributors, over many years, with different backgrounds and intentions. That's fine. Let's try to make new stdlib modules use the best style we can think of, but limit the time spent fretting over code that's already there. -- --Guido van Rossum (python.org/~guido) From jcea at jcea.es Wed Nov 17 17:07:02 2010 From: jcea at jcea.es (Jesus Cea) Date: Wed, 17 Nov 2010 17:07:02 +0100 Subject: [Python-Dev] Help deploying a new buildbot running OpenIndiana/x86 Message-ID: <4CE3FDA6.5040703@jcea.es> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Hi, everybody. I am glad to say I am installing an OpenIndiana zone (Openindiana is a fork of Indiana, a distribution of OpenSolaris) with the aim to be a buildbot for python development. This machine has plenty of disk (even SSD!), CPU and memory for the task. I am reading http://wiki.python.org/moin/BuildBot . I have installed buildbotslave already, but I need passwords, etc., to link to python buildbot infraestructure. The machine is behind a NAT system, so any incoming connection will need to be documented and a port mapping request to be done. So, after installing buildbotslave, what is the next step?. Thanks to OpenIndiana staff, specially Alasdair Lumsden, for providing the physical resources for this attempt. - -- Jesus Cea Avion _/_/ _/_/_/ _/_/_/ jcea at jcea.es - http://www.jcea.es/ _/_/ _/_/ _/_/ _/_/ _/_/ jabber / xmpp:jcea at jabber.org _/_/ _/_/ _/_/_/_/_/ . 
_/_/ _/_/ _/_/ _/_/ _/_/ "Things are not so easy" _/_/ _/_/ _/_/ _/_/ _/_/ _/_/ "My name is Dump, Core Dump" _/_/_/ _/_/_/ _/_/ _/_/ "El amor es poner tu felicidad en la felicidad de otro" - Leibniz -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iQCVAwUBTOP9pplgi5GaxT1NAQLmWQP6AqEGqEX3b50qKTP2MrkJwYQ8pXCOJm+6 fGB4jpH+i47mzgSOtANvrp1N5qOmHXzjbdWlVrL2/7ZOeLiGWSnq/ZvpTrYaysU3 o2zG4rhk48jsSYE7u0EoSKk272LmAiTU6WBSt6ZMzOGWIQxdjMhs/OVanpFybBc0 rCbATfdJ3hQ= =rIqM -----END PGP SIGNATURE----- From solipsis at pitrou.net Wed Nov 17 17:23:01 2010 From: solipsis at pitrou.net (Antoine Pitrou) Date: Wed, 17 Nov 2010 17:23:01 +0100 Subject: [Python-Dev] Help deploying a new buildbot running OpenIndiana/x86 References: <4CE3FDA6.5040703@jcea.es> Message-ID: <20101117172301.36ac88f9@pitrou.net> On Wed, 17 Nov 2010 17:07:02 +0100 Jesus Cea wrote: > > I am reading http://wiki.python.org/moin/BuildBot . I have installed > buildbotslave already, but I need passwords, etc., to link to python > buildbot infraestructure. > > The machine is behind a NAT system, so any incoming connection will need > to be documented and a port mapping request to be done. There is no incoming connection; however, a bunch of outgoing connections are made to various hosts by various tests, so it's better if there's no overzealous firewall in-between. Regards Antoine. 
From foom at fuhm.net Wed Nov 17 17:23:43 2010 From: foom at fuhm.net (James Y Knight) Date: Wed, 17 Nov 2010 11:23:43 -0500 Subject: [Python-Dev] Breaking undocumented API In-Reply-To: References: <4CDAA27B.8040703@voidspace.org.uk> <4CDBDB0C.6080703@voidspace.org.uk> <20101111100516.6e90aa41@mission> <4CDC08F3.6010501@langa.pl> <4CDC0950.5040309@voidspace.org.uk> <20101116163454.2040.394815387.divmod.xquotient.928@localhost.localdomain> <4CE3C31D.50701@voidspace.org.uk> <4CE3CC87.1000105@langa.pl> <4CE3D497.50102@voidspace.org.uk> <3B40F127-F82B-4A91-9485-3D089DAF4A4F@fuhm.net> Message-ID: On Nov 17, 2010, at 10:30 AM, Guido van Rossum wrote: > On Wed, Nov 17, 2010 at 7:24 AM, James Y Knight wrote: >> On Nov 17, 2010, at 9:19 AM, Nick Coghlan wrote: >>> (and is a little trickier in the case of module level globals, since those can't be deprecated properly) >> >> People keep saying this, but there have already been examples shown of how to do it. I actually think that python should include a way to do so standard -- it's a reasonable enough desire, as shown by how many times in this thread the inability to do so has been mentioned. If the existing working 3rd-party mechanisms aren't good enough for python-dev standards, come up with a new way... > > That's quite the distraction from the current thread though. Start > discussing it on python-ideas, or submit a code fix, or something in > between. But the hackish way that some 3rd party frameworks use > (replacing the module object with a class instance in sys.modules) is > clearly not right for the standard library (I'll explain on > python-ideas if you insist). I just don't want people to use the current lack as an excuse to simply remove module attributes without prior deprecation (or make a compatibility policy which recommends doing such a thing). I'll leave it up to the experts on this list (or python-ideas...) to determine how to implement a module-level deprecation in a way that isn't considered "hackish". 
(Or, if there is no such way, there's also the alternative of simply never removing module-level names.) James From guido at python.org Wed Nov 17 17:38:09 2010 From: guido at python.org (Guido van Rossum) Date: Wed, 17 Nov 2010 08:38:09 -0800 Subject: [Python-Dev] Breaking undocumented API In-Reply-To: References: <4CDAA27B.8040703@voidspace.org.uk> <4CDBDB0C.6080703@voidspace.org.uk> <20101111100516.6e90aa41@mission> <4CDC08F3.6010501@langa.pl> <4CDC0950.5040309@voidspace.org.uk> <20101116163454.2040.394815387.divmod.xquotient.928@localhost.localdomain> <4CE3C31D.50701@voidspace.org.uk> <4CE3D497.50102@voidspace.org.uk> <3B40F127-F82B-4A91-9485-3D089DAF4A4F@fuhm.net> Message-ID: On Wed, Nov 17, 2010 at 8:23 AM, James Y Knight wrote: > On Nov 17, 2010, at 10:30 AM, Guido van Rossum wrote: >> On Wed, Nov 17, 2010 at 7:24 AM, James Y Knight wrote: >>> On Nov 17, 2010, at 9:19 AM, Nick Coghlan wrote: >>>> (and is a little trickier in the case of module level globals, since those can't be deprecated properly) >>> >>> People keep saying this, but there have already been examples shown of how to do it. I actually think that python should include a way to do so standard -- it's a reasonable enough desire, as shown by how many times in this thread the inability to do so has been mentioned. If the existing working 3rd-party mechanisms aren't good enough for python-dev standards, come up with a new way... >> >> That's quite the distraction from the current thread though. Start >> discussing it on python-ideas, or submit a code fix, or something in >> between. But the hackish way that some 3rd party frameworks use >> (replacing the module object with a class instance in sys.modules) is >> clearly not right for the standard library (I'll explain on >> python-ideas if you insist). > > I just don't want people to use the current lack as an excuse to simply remove module attributes without prior deprecation (or make a compatibility policy which recommends doing such a thing). 
I'll leave it up to the experts on this list (or python-ideas...) to determine how to implement a module-level deprecation in a way that isn't considered "hackish". (Or, if there is no such way, there's also the alternative of simply never removing module-level names.) Deprecation doesn't *require* logging a warning or raising an exception. You can also add a note to the docs, or if it is undocumented, just add a comment to the code. (Though if it is in widespread use despite being undocumented, a better way would be to document it first -- as immediately deprecated if necessary.) Deprecation is in the end a way to give people advance warning about future changes. The mechanism of the warning doesn't always have to be implemented by the interpreter/compiler/parser or whatever other tool. -- --Guido van Rossum (python.org/~guido) From jcea at jcea.es Wed Nov 17 17:52:14 2010 From: jcea at jcea.es (Jesus Cea) Date: Wed, 17 Nov 2010 17:52:14 +0100 Subject: [Python-Dev] Help deploying a new buildbot running OpenIndiana/x86 In-Reply-To: <20101117172301.36ac88f9@pitrou.net> References: <4CE3FDA6.5040703@jcea.es> <20101117172301.36ac88f9@pitrou.net> Message-ID: <4CE4083E.1080603@jcea.es> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 17/11/10 17:23, Antoine Pitrou wrote: > There is no incoming connection; however, a bunch of outgoing > connections are made to various hosts by various tests, so it's better > if there's no overzealous firewall in-between. I know that, just confirming. """ You'll need to get someone to create the slavename/slavepasswd on dinsdale.python.org before doing this. Talk to someone like Antoine Pitrou, Martin von L?wis, Anthony or Neal Norwitz to do this. #python-dev on freenode is a good place to ask. """ ?Could you provide the connection credential?. I rather prefer to skip the IRC (I am a XMPP guy), but I can connect to freenode if you need it. 
- -- Jesus Cea Avion _/_/ _/_/_/ _/_/_/ jcea at jcea.es - http://www.jcea.es/ _/_/ _/_/ _/_/ _/_/ _/_/ jabber / xmpp:jcea at jabber.org _/_/ _/_/ _/_/_/_/_/ . _/_/ _/_/ _/_/ _/_/ _/_/ "Things are not so easy" _/_/ _/_/ _/_/ _/_/ _/_/ _/_/ "My name is Dump, Core Dump" _/_/_/ _/_/_/ _/_/ _/_/ "El amor es poner tu felicidad en la felicidad de otro" - Leibniz -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iQCVAwUBTOQIPplgi5GaxT1NAQJJggP7B+kMnhEpZQlxCy8E95Qs3Q70zJmQJXjj aodjURYlIW9PJLXUMH0dhiK3Oggsl0k/iq44pL1fu+LRpgD7bo9Snxi4IBgYlArj IMGThrpdEHKVh0r2TkVsmkCA6pAwV3crM3170ItzSDqXZPmGQgqdqFuD5fk8xQl2 caqC+sTcJjw= =zbQs -----END PGP SIGNATURE----- From foom at fuhm.net Wed Nov 17 18:05:02 2010 From: foom at fuhm.net (James Y Knight) Date: Wed, 17 Nov 2010 12:05:02 -0500 Subject: [Python-Dev] Breaking undocumented API In-Reply-To: References: <4CDAA27B.8040703@voidspace.org.uk> <4CDBDB0C.6080703@voidspace.org.uk> <20101111100516.6e90aa41@mission> <4CDC08F3.6010501@langa.pl> <4CDC0950.5040309@voidspace.org.uk> <20101116163454.2040.394815387.divmod.xquotient.928@localhost.localdomain> <4CE3C31D.50701@voidspace.org.uk> <4CE3D497.50102@voidspace.org.uk> <3B40F127-F82B-4A91-9485-3D089DAF4A4F@fuhm.net> Message-ID: On Nov 17, 2010, at 11:38 AM, Guido van Rossum wrote: > Deprecation doesn't *require* logging a warning or raising an > exception. You can also add a note to the docs, or if it is > undocumented, just add a comment to the code. (Though if it is in > widespread use despite being undocumented, a better way would be to > document it first -- as immediately deprecated if necessary.) > > Deprecation is in the end a way to give people advance warning about > future changes. The mechanism of the warning doesn't always have to be > implemented by the interpreter/compiler/parser or whatever other tool. Well, that's certainly a possible policy. 
I'd suggest that adding notes to the docs after-the-fact is a singularly ineffective way of giving people advance warning of feature removal compared to having the interpreter/compiler/parser or whatever other tool warn you. And if that's to be python's policy, when it's possible to do better, I'm disappointed. (But won't respond further, my point is made.) James From solipsis at pitrou.net Wed Nov 17 18:10:02 2010 From: solipsis at pitrou.net (Antoine Pitrou) Date: Wed, 17 Nov 2010 18:10:02 +0100 Subject: [Python-Dev] Help deploying a new buildbot running OpenIndiana/x86 References: <4CE3FDA6.5040703@jcea.es> <20101117172301.36ac88f9@pitrou.net> <4CE4083E.1080603@jcea.es> Message-ID: <20101117181002.73e61cd1@pitrou.net> > > ?Could you provide the connection credential?. I rather prefer to skip > the IRC (I am a XMPP guy), but I can connect to freenode if you need it. I've already sent you a private e-mail. From jcea at jcea.es Wed Nov 17 18:13:24 2010 From: jcea at jcea.es (Jesus Cea) Date: Wed, 17 Nov 2010 18:13:24 +0100 Subject: [Python-Dev] Help deploying a new buildbot running OpenIndiana/x86 In-Reply-To: <20101117181002.73e61cd1@pitrou.net> References: <4CE3FDA6.5040703@jcea.es> <20101117172301.36ac88f9@pitrou.net> <4CE4083E.1080603@jcea.es> <20101117181002.73e61cd1@pitrou.net> Message-ID: <4CE40D34.4060804@jcea.es> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 17/11/10 18:10, Antoine Pitrou wrote: >> >> ?Could you provide the connection credential?. I rather prefer to skip >> the IRC (I am a XMPP guy), but I can connect to freenode if you need it. > > I've already sent you a private e-mail. OK. Sorry. My mail greylist is probably involved. Lets wait for another hour... Thanks for your time, Antoine. - -- Jesus Cea Avion _/_/ _/_/_/ _/_/_/ jcea at jcea.es - http://www.jcea.es/ _/_/ _/_/ _/_/ _/_/ _/_/ jabber / xmpp:jcea at jabber.org _/_/ _/_/ _/_/_/_/_/ . 
_/_/ _/_/ _/_/ _/_/ _/_/ "Things are not so easy" _/_/ _/_/ _/_/ _/_/ _/_/ _/_/ "My name is Dump, Core Dump" _/_/_/ _/_/_/ _/_/ _/_/ "El amor es poner tu felicidad en la felicidad de otro" - Leibniz -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iQCVAwUBTOQNNJlgi5GaxT1NAQJVLAP9ElT0GGLWBZsGBMAHbzZCn1b0SC18Ki8o jp5eQgxDGRFo8ZPWVz3Q+/TGoIIs8UHLKjpYskfEae9Vm789lMlY/OZFerTn1Eus D9ldaVMKwpsLgSIgQr3AdAm3d5fXKvT6SXhGVwCOnuVi/iDiIGJl54UXoSqtLqo8 7PVP3LDaK8c= =8poZ -----END PGP SIGNATURE----- From janssen at parc.com Wed Nov 17 18:12:53 2010 From: janssen at parc.com (Bill Janssen) Date: Wed, 17 Nov 2010 09:12:53 PST Subject: [Python-Dev] Help deploying a new buildbot running OpenIndiana/x86 In-Reply-To: <4CE4083E.1080603@jcea.es> References: <4CE3FDA6.5040703@jcea.es> <20101117172301.36ac88f9@pitrou.net> <4CE4083E.1080603@jcea.es> Message-ID: <72914.1290013973@parc.com> Jesus Cea wrote: > On 17/11/10 17:23, Antoine Pitrou wrote: > > There is no incoming connection; however, a bunch of outgoing > > connections are made to various hosts by various tests, so it's better > > if there's no overzealous firewall in-between. For those of us who can't do that, there's a list of what machines the testing framework needs to be able to reach at . If you modify the tests, please keep that list up-to-date. 
Bill

From alexander.belopolsky at gmail.com Wed Nov 17 19:35:19 2010
From: alexander.belopolsky at gmail.com (Alexander Belopolsky)
Date: Wed, 17 Nov 2010 13:35:19 -0500
Subject: [Python-Dev] Breaking undocumented API
In-Reply-To:
References: <4CDAA27B.8040703@voidspace.org.uk>
 <4CDBDB0C.6080703@voidspace.org.uk> <20101111100516.6e90aa41@mission>
 <4CDC08F3.6010501@langa.pl> <4CDC0950.5040309@voidspace.org.uk>
 <20101116163454.2040.394815387.divmod.xquotient.928@localhost.localdomain>
 <4CE3C31D.50701@voidspace.org.uk> <4CE3CC87.1000105@langa.pl>
 <4CE3D497.50102@voidspace.org.uk>
Message-ID:

On Wed, Nov 17, 2010 at 8:30 AM, Nick Coghlan wrote:
..
> The library documentation is *not* the right place for quibbling about
> what constitutes a public API when using other means than the library
> documentation to find APIs to call.
>

+1

People who bother to read the Library Reference most likely already
know that it is the authoritative source. People who read the sources
or use deep introspection most likely know that they are walking on
thin ice. The only grey area is help() and dir(). Unfortunately, many
novice guides recommend using these tools for learning as follows:

>>> L = []
>>> dir(L)
['append', 'count', 'extend', 'index', 'insert', 'pop', 'remove',
'reverse', 'sort']
>>> help(L.append)
Help on built-in function append:
...

See http://docs.python.org/faq/general.html#is-python-a-good-language-for-beginning-programmers

Given the quirkiness of dir(), this is probably not the best practice.
For the standard library, however,

>>> help('module')

or

$ pydoc module

already refer users to the official manual. Unfortunately, this feature
is slightly broken in 3.x (the link takes you to the 2.x documentation
instead of 3.x). I have opened a bug report about this,
http://bugs.python.org/issue10446, and would like to add a sentence or
two to the "MODULE DOC" section explaining the differences between the
auto-generated docs and the official manual.

We may also revisit the rules used by help() to decide what to include
in the auto-generated module documentation. Note that currently help()
output excludes names not in __all__ if the module has __all__ defined.
While I advocated this rule earlier in this thread, I now realize that
it may not be quite practical. Consider the recent addition of open()
to the tokenize module. It was documented in the manual, but (wisely)
excluded from tokenize.__all__.

It appears that this discussion is converging on the conclusion that
public API = documented in the reST manual. An unfortunate consequence
is that it is not easy to discover the public API programmatically.
However, "not easy" does not mean "impossible." The reST documentation
is highly structured and Sphinx already generates various indices that
can be easily queried. Maybe some of these indices should be distilled
into something compact and made available to pydoc by the build
process. This would allow help(anyobject) to display a deep link to the
official documentation or a warning that anyobject is undocumented.

From tjreedy at udel.edu Wed Nov 17 20:52:58 2010
From: tjreedy at udel.edu (Terry Reedy)
Date: Wed, 17 Nov 2010 14:52:58 -0500
Subject: [Python-Dev] Breaking undocumented API
In-Reply-To:
References: <4CDAA27B.8040703@voidspace.org.uk>
 <4CDBDB0C.6080703@voidspace.org.uk> <20101111100516.6e90aa41@mission>
 <4CDC08F3.6010501@langa.pl> <4CDC0950.5040309@voidspace.org.uk>
 <20101116163454.2040.394815387.divmod.xquotient.928@localhost.localdomain>
 <87lj4t9cqq.fsf@benfinney.id.au>
Message-ID:

On 11/17/2010 10:52 AM, Guido van Rossum wrote:
> That's not what I meant. In the case of style guides I think it is
> totally appropriate to update the PEP as new rules are developed or
> existing ones are clarified (or even changed).

Revising style guides is standard practice. The Chicago Manual of
Style, which is practically the 'Bible' of American publishing, is now
in its 16th edition after 104 years.
http://www.amazon.com/Chicago-Manual-Style-16th/dp/0226104206/ref=sr_1_2?s=books&ie=UTF8&qid=1290022712&sr=1-2

Idea: include the 'current' version of PEP8 in the doc set for each
Python version as the frozen Python Stdlib Style Guide for that
version. Then people could specifically refer to the 3.2 version of the
style guide. PEP8 would then be the trunk version subject to further
revision. Include with the frozen version the repository id info needed
to do a diff between it and future revisions so people can discover
what has changed since whenever.

--
Terry Jan Reedy

From merwok at netwok.org Wed Nov 17 22:08:32 2010
From: merwok at netwok.org (Éric Araujo)
Date: Wed, 17 Nov 2010 22:08:32 +0100
Subject: [Python-Dev] Breaking undocumented API
In-Reply-To:
References: <20101111100516.6e90aa41@mission>
 <4CDC08F3.6010501@langa.pl> <4CDC0950.5040309@voidspace.org.uk>
 <20101116163454.2040.394815387.divmod.xquotient.928@localhost.localdomain>
 <4CE3C31D.50701@voidspace.org.uk> <4CE3CC87.1000105@langa.pl>
 <4CE3D497.50102@voidspace.org.uk>
Message-ID: <4CE44450.3020303@netwok.org>

> We may also revisit the rules used by help() to decide what to include
> on the auto-generated module implementation. Note that currently
> help() output excludes names not in __all__ is the module has __all__
> defined. While I advocated this rule earlier in this thread, I now
> Consider the recent addition of open() to the tokenize module. It
> was documented in the manual, but (wisely) excluded from tokenize.__all__.

I'm not sure this was on purpose. Victor?

From ncoghlan at gmail.com Wed Nov 17 22:10:01 2010
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Thu, 18 Nov 2010 07:10:01 +1000
Subject: [Python-Dev] Breaking undocumented API
In-Reply-To: <4CE44450.3020303@netwok.org>
References: <20101111100516.6e90aa41@mission>
 <4CDC08F3.6010501@langa.pl> <4CDC0950.5040309@voidspace.org.uk>
 <20101116163454.2040.394815387.divmod.xquotient.928@localhost.localdomain>
 <4CE3C31D.50701@voidspace.org.uk> <4CE3CC87.1000105@langa.pl>
 <4CE3D497.50102@voidspace.org.uk> <4CE44450.3020303@netwok.org>
Message-ID:

On Thu, Nov 18, 2010 at 7:08 AM, Éric Araujo wrote:
>> We may also revisit the rules used by help() to decide what to include
>> on the auto-generated module implementation. Note that currently
>> help() output excludes names not in __all__ is the module has __all__
>> defined. While I advocated this rule earlier in this thread, I now
>> Consider the recent addition of open() to the tokenize module. It
>> was documented in the manual, but (wisely) excluded from tokenize.__all__.
>
> I'm not sure this was on purpose. Victor?

Excluding a builtin name from __all__ sounds like a perfectly sensible
idea, so even if it wasn't deliberate, I'd say it qualifies as
fortuitous :)

Cheers,
Nick.

--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From jaraco at jaraco.com Wed Nov 17 21:58:10 2010
From: jaraco at jaraco.com (Jason R. Coombs)
Date: Wed, 17 Nov 2010 12:58:10 -0800
Subject: [Python-Dev] new LRU cache API in Py3.2
In-Reply-To:
References:
Message-ID: <12C7AB425F0DD546B6049311F827C74E0986D4151B@VA3DIAXVS141.RED001.local>

I see now that my previous reply went only to Stefan, so I'm
re-submitting, this time to the list.

> -----Original Message-----
> From: Stefan Behnel
> Sent: Saturday, 04 September, 2010 04:29
>
> What about adding an intermediate namespace called "cache", so that
> the new operations are available like this:
>
> print get_phone_number.cache.hits
> get_phone_number.cache.clear()

I agree.
While the function-based implementation is highly efficient, the pure
use of functions has the counter-Pythonic effect of obfuscating the
internal state (the same way the 'private' keyword does in Java). A
class-based implementation would be capable of having its state
introspected and could easily be extended. While the functional
implementation is a powerful construct, it fails to generalize well.
IMHO, a stdlib implementation should err on the side of transparency
and extensibility over performance.

That said, I've adapted Hettinger's Python 2.5 implementation to a
class-based implementation. I've tried to keep the performance
optimizations in place, but instead of instrumenting the wrapped method
with lots of cache_* functions, I simply attach the cache object
itself, which then provides the interface suggested by Stefan. This
technique allows access to the cache object and all of its internal
state, so it's also possible to do things like:

    get_phone_number.cache.maxsize += 100

or

    if get_phone_number.cache.store:
        do_something_interesting()

These techniques are nearly impossible in the functional
implementation, as the state is buried in the locals() of the nested
functions.

I'm most grateful to Raymond for contributing this to Python; on many
occasions, I've used the ActiveState recipes for simple caches, but in
almost every case, I've had to adapt the implementation to provide more
transparency. I'd prefer to not have to do the same with the stdlib.

Regards,
Jason R. Coombs
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: cache.py
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: smime.p7s
Type: application/pkcs7-signature
Size: 6448 bytes
Desc: not available
URL:

From merwok at netwok.org Wed Nov 17 22:16:10 2010
From: merwok at netwok.org (Éric Araujo)
Date: Wed, 17 Nov 2010 22:16:10 +0100
Subject: [Python-Dev] Breaking undocumented API
In-Reply-To:
References: <20101111100516.6e90aa41@mission>
 <4CDC08F3.6010501@langa.pl> <4CDC0950.5040309@voidspace.org.uk>
 <20101116163454.2040.394815387.divmod.xquotient.928@localhost.localdomain>
 <4CE3C31D.50701@voidspace.org.uk> <4CE3CC87.1000105@langa.pl>
 <4CE3D497.50102@voidspace.org.uk> <4CE44450.3020303@netwok.org>
Message-ID: <4CE4461A.1020007@netwok.org>

> Excluding a builtin name from __all__ sounds like a perfectly sensible
> idea, so even if it wasn't deliberate, I'd say it qualifies as
> fortuitous :)

But then, a tool that looks into __all__ to find for example what
objects to document will miss open. I'd put open in __all__.

Regards

From g.brandl at gmx.net Wed Nov 17 22:22:50 2010
From: g.brandl at gmx.net (Georg Brandl)
Date: Wed, 17 Nov 2010 22:22:50 +0100
Subject: [Python-Dev] Breaking undocumented API
In-Reply-To: <4CE4461A.1020007@netwok.org>
References: <20101111100516.6e90aa41@mission>
 <4CDC08F3.6010501@langa.pl> <4CDC0950.5040309@voidspace.org.uk>
 <20101116163454.2040.394815387.divmod.xquotient.928@localhost.localdomain>
 <4CE3C31D.50701@voidspace.org.uk> <4CE3CC87.1000105@langa.pl>
 <4CE3D497.50102@voidspace.org.uk> <4CE44450.3020303@netwok.org>
 <4CE4461A.1020007@netwok.org>
Message-ID:

Am 17.11.2010 22:16, schrieb Éric Araujo:
>> Excluding a builtin name from __all__ sounds like a perfectly sensible
>> idea, so even if it wasn't deliberate, I'd say it qualifies as
>> fortuitous :)
>
> But then, a tool that looks into __all__ to find for example what
> objects to document will miss open. I'd put open in __all__.

So it comes down again to what we'd like __all__ to mean foremost:
public API, or just a list for "import *"?
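
[Editorial sketch, not part of the original message.] To make the
"import *" side of that question concrete, here is a small
illustration of how __all__ changes what a star import binds. The
modules demo_tokenize and demo_no_all are hypothetical stand-ins built
in memory; they are not the real tokenize module, and the helper
make_module exists only for this example.

```python
import sys
import types

SOURCE = '''
__all__ = ["tokenize"]

def tokenize(text):
    return text.split()

def open(path):
    # Deliberately shares its name with the builtin open().
    return "opened " + path

def _helper():
    pass
'''


def make_module(name, source):
    """Build a throwaway module from source and register it for import."""
    module = types.ModuleType(name)
    exec(source, module.__dict__)
    sys.modules[name] = module
    return module


def star_import(name):
    """Return the sorted names that 'from <name> import *' would bind."""
    namespace = {}
    exec("from %s import *" % name, namespace)
    return sorted(n for n in namespace if n != "__builtins__")


# With __all__ defined, only the listed names are star-imported, so the
# module-level open() does not shadow the builtin in the importer.
make_module("demo_tokenize", SOURCE)
with_all = star_import("demo_tokenize")

# Without __all__, every name lacking a leading underscore is imported,
# including open().
make_module("demo_no_all", SOURCE.replace('__all__ = ["tokenize"]', ""))
without_all = star_import("demo_no_all")

# Either way, the function stays reachable as an attribute.
still_reachable = sys.modules["demo_tokenize"].open("f.py")
```

In this sketch, leaving open out of __all__ protects star importers
from shadowing the builtin, while a tool that treats __all__ as the
list of public names would miss it, which is the tension the two
messages above describe.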

Georg

From fdrake at acm.org Wed Nov 17 22:39:25 2010
From: fdrake at acm.org (Fred Drake)
Date: Wed, 17 Nov 2010 16:39:25 -0500
Subject: [Python-Dev] Breaking undocumented API
In-Reply-To:
References: <20101111100516.6e90aa41@mission>
 <4CDC08F3.6010501@langa.pl> <4CDC0950.5040309@voidspace.org.uk>
 <20101116163454.2040.394815387.divmod.xquotient.928@localhost.localdomain>
 <4CE3C31D.50701@voidspace.org.uk> <4CE3CC87.1000105@langa.pl>
 <4CE3D497.50102@voidspace.org.uk> <4CE44450.3020303@netwok.org>
 <4CE4461A.1020007@netwok.org>
Message-ID:

On Wed, Nov 17, 2010 at 4:22 PM, Georg Brandl wrote:
> So it comes down again to what we'd like __all__ to mean foremost:
> public API, or just a list for "import *"?

It is and has been since its inception *the* list for "import *".

Any additional meaning will have to accommodate that usage as well.

  -Fred

--
Fred L. Drake, Jr.
"A storm broke loose in my mind." --Albert Einstein

From solipsis at pitrou.net Wed Nov 17 22:48:01 2010
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Wed, 17 Nov 2010 22:48:01 +0100
Subject: [Python-Dev] PEP 3151 dictator
Message-ID: <20101117224801.44d97bad@pitrou.net>

Hello,

I would like to announce that, following Guido's (private) suggestion
that I find a temporary dictator for PEP 3151, Barry has agreed to
fill this role.

Regards

Antoine.
From g.brandl at gmx.net Wed Nov 17 22:50:10 2010 From: g.brandl at gmx.net (Georg Brandl) Date: Wed, 17 Nov 2010 22:50:10 +0100 Subject: [Python-Dev] Breaking undocumented API In-Reply-To: References: <4CDC08F3.6010501@langa.pl> <4CDC0950.5040309@voidspace.org.uk> <20101116163454.2040.394815387.divmod.xquotient.928@localhost.localdomain> <4CE3C31D.50701@voidspace.org.uk> <4CE3CC87.1000105@langa.pl> <4CE3D497.50102@voidspace.org.uk> <4CE44450.3020303@netwok.org> <4CE4461A.1020007@netwok.org> Message-ID: Am 17.11.2010 22:39, schrieb Fred Drake: > On Wed, Nov 17, 2010 at 4:22 PM, Georg Brandl wrote: >> So it comes down again to what we'd like __all__ to mean foremost: >> public API, or just a list for "import *"? > > It is and has been since its inception *the* list for "import *". > > Any additional meaning will have to accommodate that usage as well. Seeing that "import *" is discouraged anywhere I look, it might just not be as important anymore. BTW, "open" is listed in __all__ for lots of modules: io, gzip, dbm... and even "ancient" ones like aifc. cheers, Georg From steve at pearwood.info Wed Nov 17 22:57:00 2010 From: steve at pearwood.info (Steven D'Aprano) Date: Thu, 18 Nov 2010 08:57:00 +1100 Subject: [Python-Dev] Breaking undocumented API In-Reply-To: References: <20101111100516.6e90aa41@mission> <4CDC08F3.6010501@langa.pl> <4CDC0950.5040309@voidspace.org.uk> <20101116163454.2040.394815387.divmod.xquotient.928@localhost.localdomain> <4CE3C31D.50701@voidspace.org.uk> <4CE3CC87.1000105@langa.pl> <4CE3D497.50102@voidspace.org.uk> Message-ID: <4CE44FAC.5010408@pearwood.info> Nick Coghlan wrote: > The policy we're aiming to clarify here is what we should do when we > come across standard library APIs that land in the grey area, with > there being two appropriate ways to deal with them: > 1. Document them and make them officially public > 2. 
Deprecate the public names and make them officially private (with > the public names later removed in accordance with normal deprecation > procedures) You missed at least two other options: 3. Treat "documented" and "public" as orthogonal, not synonymous: undocumented public API is not an oxymoron, and neither is documented private API. 4. Do nothing. Inertia wins. Is this problem we're trying to solve so serious that we need to solve it now except on a case-by-case basis? The approach that gives us the most flexibility is #3. Clearly one would not need to document private APIs for the use of the general public, but adding docstrings to private functions and classes for in-house use is a sensible thing to do. This applies equally to the standard library as to any other major project. Likewise, one might introduce a public function into some module, but for whatever reason, choose not to document it. (Perhaps it's a lack of hours in the day, perhaps it is a deliberate decision.) In this case, the mere lack of documentation shouldn't relieve us of the responsibility of treating the function as public. For emphasis: I strongly believe that public/private and documented/undocumented are orthogonal qualities, and should not be treated as, or forced to be, identical. The use of imported modules is possibly an exception. If a user is writing something like (say) getopt.os.getcwd() instead of importing os directly, then they're on shaky ground. We shouldn't expect module authors to write "import os as _os" just to avoid making os a part of their public API. I'd be prepared to make an exception to the rule "no leading underscore means public": imported modules are implementation details unless explicitly documented otherwise. E.g. the os module explicitly makes path part of its public API, but os.sys is an implementation detail. 
-- Steven From ben+python at benfinney.id.au Thu Nov 18 02:08:08 2010 From: ben+python at benfinney.id.au (Ben Finney) Date: Thu, 18 Nov 2010 12:08:08 +1100 Subject: [Python-Dev] Breaking undocumented API References: <20101111100516.6e90aa41@mission> <4CDC08F3.6010501@langa.pl> <4CDC0950.5040309@voidspace.org.uk> <20101116163454.2040.394815387.divmod.xquotient.928@localhost.localdomain> <4CE3C31D.50701@voidspace.org.uk> <4CE3CC87.1000105@langa.pl> <4CE3D497.50102@voidspace.org.uk> <4CE44FAC.5010408@pearwood.info> Message-ID: <8762vva16v.fsf@benfinney.id.au> Steven D'Aprano writes: > 3. Treat "documented" and "public" as orthogonal, not synonymous: > undocumented public API is not an oxymoron, and neither is documented > private API. +1 > The use of imported modules is possibly an exception. If a user is > writing something like (say) getopt.os.getcwd() instead of importing > os directly, then they're on shaky ground. We shouldn't expect module > authors to write "import os as _os" just to avoid making os a part of > their public API. > > I'd be prepared to make an exception to the rule "no leading > underscore means public": imported modules are implementation details > unless explicitly documented otherwise. E.g. the os module explicitly > makes path part of its public API, but os.sys is an implementation > detail. After reading the discussion for many days, I'm leaning to this position also. -- \ ?I may disagree with what you say, but I will defend to the | `\ death your right to mis-attribute this quote to Voltaire.? 
| _o__) ?Avram Grumer, rec.arts.sf.written, 2000-05-30 | Ben Finney From guido at python.org Thu Nov 18 03:44:35 2010 From: guido at python.org (Guido van Rossum) Date: Wed, 17 Nov 2010 18:44:35 -0800 Subject: [Python-Dev] Breaking undocumented API In-Reply-To: <8762vva16v.fsf@benfinney.id.au> References: <20101111100516.6e90aa41@mission> <4CDC08F3.6010501@langa.pl> <4CDC0950.5040309@voidspace.org.uk> <20101116163454.2040.394815387.divmod.xquotient.928@localhost.localdomain> <4CE3C31D.50701@voidspace.org.uk> <4CE3CC87.1000105@langa.pl> <4CE3D497.50102@voidspace.org.uk> <4CE44FAC.5010408@pearwood.info> <8762vva16v.fsf@benfinney.id.au> Message-ID: On Wed, Nov 17, 2010 at 5:08 PM, Ben Finney wrote: > Steven D'Aprano writes: > >> 3. Treat "documented" and "public" as orthogonal, not synonymous: >> undocumented public API is not an oxymoron, and neither is documented >> private API. > > +1 > >> The use of imported modules is possibly an exception. If a user is >> writing something like (say) getopt.os.getcwd() instead of importing >> os directly, then they're on shaky ground. We shouldn't expect module >> authors to write "import os as _os" just to avoid making os a part of >> their public API. >> >> I'd be prepared to make an exception to the rule "no leading >> underscore means public": imported modules are implementation details >> unless explicitly documented otherwise. E.g. the os module explicitly >> makes path part of its public API, but os.sys is an implementation >> detail. > > After reading the discussion for many days, I'm leaning to this position > also. Agreed on both counts. 
-- --Guido van Rossum (python.org/~guido) From fuzzyman at voidspace.org.uk Thu Nov 18 11:47:18 2010 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Thu, 18 Nov 2010 10:47:18 +0000 Subject: [Python-Dev] Breaking undocumented API In-Reply-To: <4CE4461A.1020007@netwok.org> References: <20101111100516.6e90aa41@mission> <4CDC08F3.6010501@langa.pl> <4CDC0950.5040309@voidspace.org.uk> <20101116163454.2040.394815387.divmod.xquotient.928@localhost.localdomain> <4CE3C31D.50701@voidspace.org.uk> <4CE3CC87.1000105@langa.pl> <4CE3D497.50102@voidspace.org.uk> <4CE44450.3020303@netwok.org> <4CE4461A.1020007@netwok.org> Message-ID: <4CE50436.4010706@voidspace.org.uk> On 17/11/2010 21:16, ?ric Araujo wrote: >> Excluding a builtin name from __all__ sounds like a perfectly sensible >> idea, so even if it wasn't deliberate, I'd say it qualifies as >> fortuitous :) > But then, a tool that looks into __all__ to find for example what > objects to document will miss open. I?d put open in __all__. > "import *" would then override the builtin open. A good reason not to use "import *" I guess, but also a good reason not to create names that shadow builtins. All the best, Michael > Regards > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/fuzzyman%40voidspace.org.uk -- http://www.voidspace.org.uk/ READ CAREFULLY. By accepting and reading this email you agree, on behalf of your employer, to release me from all obligations and waivers arising from any and all NON-NEGOTIATED agreements, licenses, terms-of-service, shrinkwrap, clickwrap, browsewrap, confidentiality, non-disclosure, non-compete and acceptable use policies (?BOGUS AGREEMENTS?) that I have entered into with your employer, its partners, licensors, agents and assigns, in perpetuity, without prejudice to my ongoing rights and privileges. 
You further represent that you have the authority to release me from any BOGUS AGREEMENTS on behalf of your employer. From fuzzyman at voidspace.org.uk Thu Nov 18 11:54:23 2010 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Thu, 18 Nov 2010 10:54:23 +0000 Subject: [Python-Dev] Breaking undocumented API In-Reply-To: References: <4CDC08F3.6010501@langa.pl> <4CDC0950.5040309@voidspace.org.uk> <20101116163454.2040.394815387.divmod.xquotient.928@localhost.localdomain> <4CE3C31D.50701@voidspace.org.uk> <4CE3CC87.1000105@langa.pl> <4CE3D497.50102@voidspace.org.uk> <4CE44450.3020303@netwok.org> <4CE4461A.1020007@netwok.org> Message-ID: <4CE505DF.8030401@voidspace.org.uk> On 17/11/2010 21:22, Georg Brandl wrote: > On 17.11.2010 22:16, Éric Araujo wrote: >>> Excluding a builtin name from __all__ sounds like a perfectly sensible >>> idea, so even if it wasn't deliberate, I'd say it qualifies as >>> fortuitous :) >> But then, a tool that looks into __all__ to find for example what >> objects to document will miss open. I'd put open in __all__. > So it comes down again to what we'd like __all__ to mean foremost: > public API, or just a list for "import *"? Well, as noted earlier in this discussion - the language reference *states* that __all__ defines the module level public API. From: http://docs.python.org/reference/simple_stmts.html#grammar-token-import_stmt "If the list of identifiers is replaced by a star ('*'), all public names defined in the module are bound in the local namespace of the import statement." ... "The public names defined by a module are determined by checking the module's namespace for a variable named __all__" If we decide that __all__ is purely for "import *" we should refine the use of the word public on this page.
All the best, Michael Foord > Georg -- http://www.voidspace.org.uk/ From fuzzyman at voidspace.org.uk Thu Nov 18 12:41:07 2010 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Thu, 18 Nov 2010 11:41:07 +0000 Subject: [Python-Dev] Breaking undocumented API In-Reply-To: <4CE44FAC.5010408@pearwood.info> References: <20101111100516.6e90aa41@mission> <4CDC08F3.6010501@langa.pl> <4CDC0950.5040309@voidspace.org.uk> <20101116163454.2040.394815387.divmod.xquotient.928@localhost.localdomain> <4CE3C31D.50701@voidspace.org.uk> <4CE3CC87.1000105@langa.pl> <4CE3D497.50102@voidspace.org.uk> <4CE44FAC.5010408@pearwood.info> Message-ID: <4CE510D3.5090501@voidspace.org.uk> On 17/11/2010 21:57, Steven D'Aprano wrote: > Nick Coghlan wrote: > >> The policy we're aiming to clarify here is what we should do when we >> come across standard library APIs that land in the grey area, with >> there being two appropriate ways to deal with them: >> 1. Document them and make them officially public >> 2.
Deprecate the public names and make them officially private (with >> the public names later removed in accordance with normal deprecation >> procedures) > > You missed at least two other options: > > 3. Treat "documented" and "public" as orthogonal, not synonymous: > undocumented public API is not an oxymoron, and neither is documented > private API. > Along with the others +1 I think how we handle the deprecations (legacy modules with unclear or clearly wrong naming policies) is the least interesting part of this discussion. For deprecating existing names we have *no choice* but to proceed on a case-by-case basis evaluating how likely the deprecation is to break other code, whether or not the name was originally intended to be public or not. (At least that is how we *should* proceed and part of our standard deprecation policy - it is why we aren't removing unittest.TestCase.assertEquals and assert_ even though they are deprecated. They are just too widely used.) What is more important is that we have a clearly stated policy for new modules and adding names to existing modules so that we don't have to repeat this debate in five years time. My suggestion, which fits in with the use of __all__ by the language and also the convention widely in use by the community already boils down to: * If __all__ exists it is definitive * Imported names are never part of the public API of a module unless in __all__ or documented to be part of the API * Names with leading underscores are private unless in __all__ (and if you want to export leading underscore names as part of a public API you should define __all__ or "import *" won't export them) * Leading underscore convention extends to packages and class members; no members of a package or class whose name begins with a leading underscore are public It is still good practise that public APIs *should* be documented (and *should* have docstrings). 
There is however no corollary that private APIs should not be documented (and they may have docstrings). All the best, Michael Foord > 4. Do nothing. Inertia wins. Is this problem we're trying to solve so > serious that we need to solve it now except on a case-by-case basis? > > The approach that gives us the most flexibility is #3. Clearly one > would not need to document private APIs for the use of the general > public, but adding docstrings to private functions and classes for > in-house use is a sensible thing to do. This applies equally to the > standard library as to any other major project. > > Likewise, one might introduce a public function into some module, but > for whatever reason, choose not to document it. (Perhaps it's a lack > of hours in the day, perhaps it is a deliberate decision.) In this > case, the mere lack of documentation shouldn't relieve us of the > responsibility of treating the function as public. > > For emphasis: I strongly believe that public/private and > documented/undocumented are orthogonal qualities, and should not be > treated as, or forced to be, identical. > > The use of imported modules is possibly an exception. If a user is > writing something like (say) getopt.os.getcwd() instead of importing > os directly, then they're on shaky ground. We shouldn't expect module > authors to write "import os as _os" just to avoid making os a part of > their public API. > > I'd be prepared to make an exception to the rule "no leading > underscore means public": imported modules are implementation details > unless explicitly documented otherwise. E.g. the os module explicitly > makes path part of its public API, but os.sys is an implementation > detail. > > > -- http://www.voidspace.org.uk/ READ CAREFULLY. 
By accepting and reading this email you agree, on behalf of your employer, to release me from all obligations and waivers arising from any and all NON-NEGOTIATED agreements, licenses, terms-of-service, shrinkwrap, clickwrap, browsewrap, confidentiality, non-disclosure, non-compete and acceptable use policies (?BOGUS AGREEMENTS?) that I have entered into with your employer, its partners, licensors, agents and assigns, in perpetuity, without prejudice to my ongoing rights and privileges. You further represent that you have the authority to release me from any BOGUS AGREEMENTS on behalf of your employer. From ncoghlan at gmail.com Thu Nov 18 13:16:35 2010 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 18 Nov 2010 22:16:35 +1000 Subject: [Python-Dev] Breaking undocumented API In-Reply-To: References: <20101111100516.6e90aa41@mission> <4CDC08F3.6010501@langa.pl> <4CDC0950.5040309@voidspace.org.uk> <20101116163454.2040.394815387.divmod.xquotient.928@localhost.localdomain> <4CE3C31D.50701@voidspace.org.uk> <4CE3CC87.1000105@langa.pl> <4CE3D497.50102@voidspace.org.uk> <4CE44450.3020303@netwok.org> <4CE4461A.1020007@netwok.org> Message-ID: On Thu, Nov 18, 2010 at 7:22 AM, Georg Brandl wrote: > Am 17.11.2010 22:16, schrieb ?ric Araujo: >>> Excluding a builtin name from __all__ sounds like a perfectly sensible >>> idea, so even if it wasn't deliberate, I'd say it qualifies as >>> fortuitous :) >> >> But then, a tool that looks into __all__ to find for example what >> objects to document will miss open. ?I?d put open in __all__. > > So it comes down again to what we'd like __all__ to mean foremost: > public API, or just a list for "import *"? It's the list for star imports. 
This intended use case is borne out by the description of the feature when it was first added to the language back in 2.1: http://docs.python.org/dev/whatsnew/2.1.html?highlight=__all__#other-changes-and-fixes The public API (for documentation and introspection purposes) is any name that doesn't start with an underscore and isn't an imported module. If a tool is attempting to use __all__ as more than just the list of names for star imports, I would call the tool buggy. The use of the term "public names" in the language reference when describing the semantics of __all__ is an unfortunate choice, but it is used specifically in the context of talking about star imports and clarifying which names they bring in without making any reference to standards for documentation or deprecation policies. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From g.brandl at gmx.net Thu Nov 18 13:37:38 2010 From: g.brandl at gmx.net (Georg Brandl) Date: Thu, 18 Nov 2010 13:37:38 +0100 Subject: [Python-Dev] Breaking undocumented API In-Reply-To: <4CE50436.4010706@voidspace.org.uk> References: <4CDC08F3.6010501@langa.pl> <4CDC0950.5040309@voidspace.org.uk> <20101116163454.2040.394815387.divmod.xquotient.928@localhost.localdomain> <4CE3C31D.50701@voidspace.org.uk> <4CE3CC87.1000105@langa.pl> <4CE3D497.50102@voidspace.org.uk> <4CE44450.3020303@netwok.org> <4CE4461A.1020007@netwok.org> <4CE50436.4010706@voidspace.org.uk> Message-ID: On 18.11.2010 11:47, Michael Foord wrote: > On 17/11/2010 21:16, Éric Araujo wrote: >>> Excluding a builtin name from __all__ sounds like a perfectly sensible >>> idea, so even if it wasn't deliberate, I'd say it qualifies as >>> fortuitous :) >> But then, a tool that looks into __all__ to find for example what >> objects to document will miss open. I'd put open in __all__. >> > > "import *" would then override the builtin open.
A good reason not to > use "import *" I guess, but also a good reason not to create names that > shadow builtins. Heh. Instead have fun with io.ioopen(), gzip.gzipopen(), webbrowser.webbrowseropen(), etc.? We do have namespace support for a reason. Georg From fuzzyman at voidspace.org.uk Thu Nov 18 13:48:57 2010 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Thu, 18 Nov 2010 12:48:57 +0000 Subject: [Python-Dev] Breaking undocumented API In-Reply-To: References: <4CDC0950.5040309@voidspace.org.uk> <20101116163454.2040.394815387.divmod.xquotient.928@localhost.localdomain> <4CE3C31D.50701@voidspace.org.uk> <4CE3CC87.1000105@langa.pl> <4CE3D497.50102@voidspace.org.uk> <4CE44450.3020303@netwok.org> <4CE4461A.1020007@netwok.org> <4CE50436.4010706@voidspace.org.uk> Message-ID: <4CE520B9.3020500@voidspace.org.uk> On 18/11/2010 12:37, Georg Brandl wrote: > Am 18.11.2010 11:47, schrieb Michael Foord: >> On 17/11/2010 21:16, ?ric Araujo wrote: >>>> Excluding a builtin name from __all__ sounds like a perfectly sensible >>>> idea, so even if it wasn't deliberate, I'd say it qualifies as >>>> fortuitous :) >>> But then, a tool that looks into __all__ to find for example what >>> objects to document will miss open. I?d put open in __all__. >>> >> "import *" would then override the builtin open. A good reason not to >> use "import *" I guess, but also a good reason not to create names that >> shadow builtins. > Heh. Instead have fun with io.ioopen(), gzip.gzipopen(), > webbrowser.webbrowseropen(), etc.? We do have namespace support for a reason. Or urllib2.urlopen, oh wait - that's real... If I was importing from those namespaces I probably *would* import and rename to have unambiguous names (and you would *have* to if there was any possibility of you using the builtin open). io.open is arguably an exception to this as it does the same as the builtin open... Using meaningful names is *good*. 
This is a reason I dislike modules that just call their base exception class "Error". You *have* to use it from the namespace (or import with import as and give it a good name) for it to have any meaning. Michael > Georg -- http://www.voidspace.org.uk/
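A minimal sketch of the star-import behaviour discussed in this thread, for readers who want to check it themselves. It assumes two throwaway in-memory modules, demo_without_all and demo_with_all, and two hypothetical helpers, make_module and star_import; none of these names come from the thread or the standard library.

```python
import sys
import types

# Source for a hypothetical module that imports another module, shadows
# the builtin open(), and defines one public and one underscore name.
SOURCE = '''
import os

def open(path):
    return "demo open: " + path

def visible():
    return "public"

def _helper():
    return "private by convention"
'''


def make_module(name, source):
    """Build a throwaway module from source text and register it for import."""
    module = types.ModuleType(name)
    exec(source, module.__dict__)
    sys.modules[name] = module
    return module


def star_import(name):
    """Return the sorted names that 'from <name> import *' binds."""
    namespace = {}
    exec("from %s import *" % name, namespace)
    namespace.pop("__builtins__", None)  # added by exec(), not by the import
    return sorted(namespace)


make_module("demo_without_all", SOURCE)
make_module("demo_with_all", SOURCE + "\n__all__ = ['visible', '_helper']\n")

print(star_import("demo_without_all"))  # ['open', 'os', 'visible']
print(star_import("demo_with_all"))     # ['_helper', 'visible']
```

Without __all__, the star import binds every name that lacks a leading underscore, which here includes the imported os module and a function shadowing the builtin open. With __all__, exactly the listed names are bound, including the underscore-prefixed one.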
From lukasz at langa.pl Thu Nov 18 14:13:39 2010 From: lukasz at langa.pl (=?UTF-8?B?xYF1a2FzeiBMYW5nYQ==?=) Date: Thu, 18 Nov 2010 14:13:39 +0100 Subject: [Python-Dev] Breaking undocumented API In-Reply-To: <4CE520B9.3020500@voidspace.org.uk> References: <20101116163454.2040.394815387.divmod.xquotient.928@localhost.localdomain> <4CE3C31D.50701@voidspace.org.uk> <4CE3CC87.1000105@langa.pl> <4CE3D497.50102@voidspace.org.uk> <4CE44450.3020303@netwok.org> <4CE4461A.1020007@netwok.org> <4CE50436.4010706@voidspace.org.uk> <4CE520B9.3020500@voidspace.org.uk> Message-ID: <4CE52683.2040809@langa.pl> Am 18.11.2010 13:48, schrieb Michael Foord: > On 18/11/2010 12:37, Georg Brandl wrote: >> Am 18.11.2010 11:47, schrieb Michael Foord: >>> On 17/11/2010 21:16, ?ric Araujo wrote: >>>>> Excluding a builtin name from __all__ sounds like a perfectly >>>>> sensible >>>>> idea, so even if it wasn't deliberate, I'd say it qualifies as >>>>> fortuitous :) >>>> But then, a tool that looks into __all__ to find for example what >>>> objects to document will miss open. I?d put open in __all__. >>>> >>> "import *" would then override the builtin open. A good reason not to >>> use "import *" I guess, but also a good reason not to create names that >>> shadow builtins. >> Heh. Instead have fun with io.ioopen(), gzip.gzipopen(), >> webbrowser.webbrowseropen(), etc.? We do have namespace support for >> a reason. > > Or urllib2.urlopen, oh wait - that's real... > > If I was importing from those namespaces I probably *would* import and > rename to have unambiguous names (and you would *have* to if there was > any possibility of you using the builtin open). io.open is arguably an > exception to this as it does the same as the builtin open... > > Using meaningful names is *good*. This is a reason I dislike modules > that just call their base exception class "Error". You *have* to use > it from the namespace (or import with import as and give it a good > name) for it to have any meaning. 
> Guys, I may agree or disagree with these statements but we are drifting towards "opinion" versus "solid, well understood practice". Let's focus on the subject. For that matter, "import *" is a discouraged mechanism anyway, apart from the rare exceptions where its usage is valid. If you use star-imports and you don't know what you're doing, you might just as well hurt yourself in other ways than just by "open". Maybe we should just sum up the discussion somewhere already. Keeping up with a thread reaching a megabyte in size is starting to be painful. Best regards, Łukasz From fdrake at acm.org Thu Nov 18 14:47:05 2010 From: fdrake at acm.org (Fred Drake) Date: Thu, 18 Nov 2010 08:47:05 -0500 Subject: [Python-Dev] Breaking undocumented API In-Reply-To: <4CE510D3.5090501@voidspace.org.uk> References: <20101111100516.6e90aa41@mission> <4CDC08F3.6010501@langa.pl> <4CDC0950.5040309@voidspace.org.uk> <20101116163454.2040.394815387.divmod.xquotient.928@localhost.localdomain> <4CE3C31D.50701@voidspace.org.uk> <4CE3CC87.1000105@langa.pl> <4CE3D497.50102@voidspace.org.uk> <4CE44FAC.5010408@pearwood.info> <4CE510D3.5090501@voidspace.org.uk> Message-ID: On Thu, Nov 18, 2010 at 6:41 AM, Michael Foord wrote: > Along with the others +1 I agree with keeping these distinct and orthogonal as well. > What is more important is that we have a clearly stated policy for new > modules and adding names to existing modules so that we don't have to repeat > this debate in five years time. Agreed again. > My suggestion, which fits in with the use of __all__ by the language and > also the convention widely in use by the community already boils down to: > > * If __all__ exists it is definitive I think this is overly vague. :-) Specifically, if something is mentioned in __all__, it's public. Non-inclusion in __all__ doesn't imply privateness.
> * Names with leading underscores are private unless in __all__ (and if you > want to export leading underscore names as part of a public API you should > define __all__ or "import *" won't export them) We shouldn't confuse non-export via "import *" with non-public, however. -Fred -- Fred L. Drake, Jr. "A storm broke loose in my mind." --Albert Einstein From solipsis at pitrou.net Thu Nov 18 16:18:57 2010 From: solipsis at pitrou.net (Antoine Pitrou) Date: Thu, 18 Nov 2010 16:18:57 +0100 Subject: [Python-Dev] r86514 - in python/branches/py3k/Lib: test/test_xmlrpc.py xmlrpc/client.py References: <20101118150053.F2247EEA6E@mail.python.org> Message-ID: <20101118161857.42a6750e@pitrou.net> On Thu, 18 Nov 2010 16:00:53 +0100 (CET) senthil.kumaran wrote: > Author: senthil.kumaran > Date: Thu Nov 18 16:00:53 2010 > New Revision: 86514 > > Log: > Fix Issue 9991: xmlrpc client ssl check faulty > [...] > > + def test_ssl_presence(self): > + #Check for ssl support > + have_ssl = False > + if hasattr(socket, 'ssl'): > + have_ssl = True This is not the right way to check for ssl. socket.ssl is deprecated in 2.x and doesn't exist in 3.x. "import ssl" is enough. > + try: > + xmlrpc.client.ServerProxy('https://localhost:9999').bad_function() > + except: > + exc = sys.exc_info() > + if exc[0] == socket.error: This is a rather clumsy way to check for exception types. Why don't you just write "except socket.error"? > - if not hasattr(socket, "ssl"): > + if not hasattr(http.client, "ssl"): That isn't better. "http.client.ssl" is not a public API. You should check for http.client.HTTPSConnection instead. cheers Antoine.
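The three review points above can be sketched roughly as follows, assuming Python 3. This is an illustration only, not the code that went into r86514 or its follow-up; call_and_report and the HAVE_* names are hypothetical.

```python
import socket
import http.client

# Point 1: detect SSL support by importing the ssl module directly,
# rather than probing socket.ssl (which does not exist in 3.x).
try:
    import ssl
except ImportError:
    ssl = None

HAVE_SSL = ssl is not None

# Point 3: check the public class, not the private http.client.ssl name.
# http.client only defines HTTPSConnection when ssl could be imported.
HAVE_HTTPS = hasattr(http.client, "HTTPSConnection")


def call_and_report(func):
    """Run func() and report whether it failed with a socket-level error."""
    try:
        func()
    except socket.error as exc:
        # Point 2: name the exception type in the except clause instead
        # of a bare "except:" followed by a sys.exc_info() comparison.
        return "socket error: %s" % exc
    return "ok"
```

A caller could pass something like a ServerProxy method call as func; any exception other than socket.error then propagates instead of being silently swallowed by a bare except.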
From orsenthil at gmail.com Thu Nov 18 17:23:25 2010 From: orsenthil at gmail.com (Senthil Kumaran) Date: Fri, 19 Nov 2010 00:23:25 +0800 Subject: [Python-Dev] r86514 - in python/branches/py3k/Lib: test/test_xmlrpc.py xmlrpc/client.py In-Reply-To: <20101118161857.42a6750e@pitrou.net> References: <20101118150053.F2247EEA6E@mail.python.org> <20101118161857.42a6750e@pitrou.net> Message-ID: On Thu, Nov 18, 2010 at 11:18 PM, Antoine Pitrou wrote: >> Log: >> Fix Issue 9991: xmlrpc client ssl check faulty >> > [...] >> >> + def test_ssl_presence(self): >> + #Check for ssl support >> + have_ssl = False >> + if hasattr(socket, 'ssl'): >> + have_ssl = True > > This is not the right way to check for ssl. socket.ssl is deprecated in > 2.x and doesn't exist in 3.x. "import ssl" is enough. The history of the bug report showed that it was closed earlier with comments such as "Python should be compiled with SSL", which had resulted in some confusion, so after some thought, I let those earlier verifications remain (just for readability/understanding the context of the tests). Thinking again, I see that it is not required. I agree with your comments on the code changes and shall change it. -- Senthil From martin at v.loewis.de Thu Nov 18 17:25:41 2010 From: martin at v.loewis.de ("Martin v. Löwis") Date: Thu, 18 Nov 2010 17:25:41 +0100 Subject: [Python-Dev] Mercurial Schedule In-Reply-To: References: <4CE2CF8F.4040500@jcea.es> Message-ID: <4CE55385.6080002@v.loewis.de> On 17.11.2010 08:18, Georg Brandl wrote: > On 16.11.2010 19:38, Jesus Cea wrote: >> Is there any updated mercurial schedule?. >> >> Any impact related with the new 3.2 schedule (three weeks offset)? > > I've been trying to contact Dirkjan and ask; generally, I don't > see much connection to the 3.2 schedule (with the exception that > the final migration day should not be a release day.) Please reconsider.
When Python migrates to Mercurial, new features will be added to Python, most notably a new way of identifying versions, perhaps new variables in the sys module. So far, the policy has been that no new features can be added after beta 1. So consequentially, migrating 3.2 to Mercurial would violate that policy if done after b1. Consequentially, we would need to release 3.2 from Subversion, which in turn means that the Mercurial migration should be delayed until after 3.2 is released. Alternatively, b1 should be postponed until after the Mercurial migration is done. Regards, Martin From guido at python.org Thu Nov 18 17:50:22 2010 From: guido at python.org (Guido van Rossum) Date: Thu, 18 Nov 2010 08:50:22 -0800 Subject: [Python-Dev] Breaking undocumented API In-Reply-To: References: <20101111100516.6e90aa41@mission> <4CDC08F3.6010501@langa.pl> <4CDC0950.5040309@voidspace.org.uk> <20101116163454.2040.394815387.divmod.xquotient.928@localhost.localdomain> <4CE3C31D.50701@voidspace.org.uk> <4CE3CC87.1000105@langa.pl> <4CE3D497.50102@voidspace.org.uk> <4CE44450.3020303@netwok.org> <4CE4461A.1020007@netwok.org> Message-ID: On Thu, Nov 18, 2010 at 4:16 AM, Nick Coghlan wrote: > On Thu, Nov 18, 2010 at 7:22 AM, Georg Brandl wrote: >> So it comes down again to what we'd like __all__ to mean foremost: >> public API, or just a list for "import *"? > > It's the list for star imports. This intended use case is borne out by > the description of the feature when it was first added to the language > back in 2.1: > http://docs.python.org/dev/whatsnew/2.1.html?highlight=__all__#other-changes-and-fixes > > The public API (for documentation and introspection purposes) is any > name that doesn't start with an underscore and isn't an imported > module. If a tool is attempting to use __all__ as more than just the > list of names for star imports, I would call the tool buggy. Not so fast. The feature's meaning has clearly evolved. 
> The use of the term "public names" in the language reference when > describing the semantics of __all__ is an unfortunate choice, but it > is used specifically in the context of talking about star imports and > clarifying which names they bring in without making any reference to > standards for documentation or deprecation policies. Let's live with a little ambiguity. There are more shades of gray here than you can imagine. I like gray. -- --Guido van Rossum (python.org/~guido) From guido at python.org Thu Nov 18 17:57:58 2010 From: guido at python.org (Guido van Rossum) Date: Thu, 18 Nov 2010 08:57:58 -0800 Subject: [Python-Dev] Mercurial Schedule In-Reply-To: <4CE55385.6080002@v.loewis.de> References: <4CE2CF8F.4040500@jcea.es> <4CE55385.6080002@v.loewis.de> Message-ID: On Thu, Nov 18, 2010 at 8:25 AM, "Martin v. L?wis" wrote: > Am 17.11.2010 08:18, schrieb Georg Brandl: >> Am 16.11.2010 19:38, schrieb Jesus Cea: >>> Is there any updated mercurial schedule?. >>> >>> Any impact related with the new 3.2 schedule (three weeks offset)? >> >> I've been trying to contact Dirkjan and ask; generally, I don't >> see much connection to the 3.2 schedule (with the exception that >> the final migration day should not be a release day.) > > Please reconsider. When Python migrates to Mercurial, new features > will be added to Python, most notably a new way of identifying versions, > perhaps new variables in the sys module. So far, the policy has been > that no new features can be added after beta 1. So consequentially, > migrating 3.2 to Mercurial would violate that policy if done after b1. > Consequentially, we would need to release 3.2 from Subversion, which > in turn means that the Mercurial migration should be delayed until > after 3.2 is released. > > Alternatively, b1 should be postponed until after the Mercurial > migration is done. I think this "new feature" is not so shocking that it can be used as an argument to hold up the migration. 
If you have another reason to stop the migration please say so; personally I can't wait for it to happen. -- --Guido van Rossum (python.org/~guido) From g.brandl at gmx.net Thu Nov 18 18:08:10 2010 From: g.brandl at gmx.net (Georg Brandl) Date: Thu, 18 Nov 2010 18:08:10 +0100 Subject: [Python-Dev] Mercurial Schedule In-Reply-To: <4CE55385.6080002@v.loewis.de> References: <4CE2CF8F.4040500@jcea.es> <4CE55385.6080002@v.loewis.de> Message-ID: Am 18.11.2010 17:25, schrieb "Martin v. L?wis": > Am 17.11.2010 08:18, schrieb Georg Brandl: >> Am 16.11.2010 19:38, schrieb Jesus Cea: >>> Is there any updated mercurial schedule?. >>> >>> Any impact related with the new 3.2 schedule (three weeks offset)? >> >> I've been trying to contact Dirkjan and ask; generally, I don't >> see much connection to the 3.2 schedule (with the exception that >> the final migration day should not be a release day.) > > Please reconsider. When Python migrates to Mercurial, new features > will be added to Python, most notably a new way of identifying versions, > perhaps new variables in the sys module. So far, the policy has been > that no new features can be added after beta 1. So consequentially, > migrating 3.2 to Mercurial would violate that policy if done after b1. > Consequentially, we would need to release 3.2 from Subversion, which > in turn means that the Mercurial migration should be delayed until > after 3.2 is released. I'm with Guido here. Plus, if you like it can be seen as a bug fix: the SVN build identification stops working, and we neeed to fix it. Georg From martin at v.loewis.de Thu Nov 18 18:32:33 2010 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Thu, 18 Nov 2010 18:32:33 +0100 Subject: [Python-Dev] Mercurial Schedule In-Reply-To: References: <4CE2CF8F.4040500@jcea.es> <4CE55385.6080002@v.loewis.de> Message-ID: <4CE56331.3050508@v.loewis.de> >> Alternatively, b1 should be postponed until after the Mercurial >> migration is done. 
> > I think this "new feature" is not so shocking that it can be used as > an argument to hold up the migration. If you have another reason to > stop the migration please say so; personally I can't wait for it to > happen. I can't point out any other specific concern, just a general feeling that *when* the migration happens, it will be rushed, and we will have to deal for a long time with the aftermath. For example, I expect that it will take me several days until I get the Windows build process to work correctly, and, if the migration gets as rushed as it appears to, that the migration will happen without everything being worked out beforehand. Therefore, I'm concerned that I will have to work out all the details on my own, just so that I can produce the b2 binaries (says); this is not something I look forward to. I'm not asking that the migration be stopped - I'm asking that it be accelerated, so that there is plenty of time to identify all the problems. But I'm also not willing to put time into it. Failing the acceleration, I ask that appropriate consequences for the 3.2 release are drawn: either it is postponed, or done using Subversion until the final release (I think something can be worked out then to get the 3.2.1 release from Mercurial - with only slight incompatibilities). In general, I'm *also* concerned about the lack of volunteers that are interested in working on the infrastructure. I wish some of the people who stated that they can't wait for the migration to happen would work on solving some of the remaining problems. Regards, Martin From g.brandl at gmx.net Thu Nov 18 19:56:51 2010 From: g.brandl at gmx.net (Georg Brandl) Date: Thu, 18 Nov 2010 19:56:51 +0100 Subject: [Python-Dev] Mercurial Schedule In-Reply-To: <4CE56331.3050508@v.loewis.de> References: <4CE2CF8F.4040500@jcea.es> <4CE55385.6080002@v.loewis.de> <4CE56331.3050508@v.loewis.de> Message-ID: Am 18.11.2010 18:32, schrieb "Martin v. 
L?wis": >>> Alternatively, b1 should be postponed until after the Mercurial >>> migration is done. >> >> I think this "new feature" is not so shocking that it can be used as >> an argument to hold up the migration. If you have another reason to >> stop the migration please say so; personally I can't wait for it to >> happen. > > I can't point out any other specific concern, just a general feeling > that *when* the migration happens, it will be rushed, and we will have > to deal for a long time with the aftermath. For example, I expect that > it will take me several days until I get the Windows build process to > work correctly, and, if the migration gets as rushed as it appears to, > that the migration will happen without everything being worked out > beforehand. > > Therefore, I'm concerned that I will have to work out all the details > on my own, just so that I can produce the b2 binaries (says); this is > not something I look forward to. How much does the binary build process really depend on version control? I.e., what would be stopping you from making a binary from an archive made with e.g. "svn export"? (I'm really asking because I don't know.) Concerning the SVN external/ subdir, that is quite orthogonal to the main development repo, and doesn't need to be migrated in lockstep (if it is migrated to Mercurial at all in its current shape. > I'm not asking that the migration be stopped - I'm asking that it be > accelerated, so that there is plenty of time to identify all the > problems. But I'm also not willing to put time into it. I think we have anticipated what we could. Of course there will still be problems, but I think not of the sort that causes big disruptions everywhere, preventing our developers from committing or breaking the issue tracker, etc. 
> Failing the acceleration, I ask that appropriate consequences for > the 3.2 release are drawn: either it is postponed, or done using > Subversion until the final release (I think something can be worked > out then to get the 3.2.1 release from Mercurial - with only slight > incompatibilities). > > In general, I'm *also* concerned about the lack of volunteers that > are interested in working on the infrastructure. I wish some of the > people who stated that they can't wait for the migration to happen > would work on solving some of the remaining problems. Well, put some butter to the fish: how many volunteers would you deem sufficient, and which specific tasks are uncared for in the infrastructure? I can only speak for myself, but I am prepared to put in my time. Georg From martin at v.loewis.de Thu Nov 18 20:33:40 2010 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Thu, 18 Nov 2010 20:33:40 +0100 Subject: [Python-Dev] Mercurial Schedule In-Reply-To: References: <4CE2CF8F.4040500@jcea.es> <4CE55385.6080002@v.loewis.de> <4CE56331.3050508@v.loewis.de> Message-ID: <4CE57F94.4080703@v.loewis.de> >> Therefore, I'm concerned that I will have to work out all the details >> on my own, just so that I can produce the b2 binaries (says); this is >> not something I look forward to. > > How much does the binary build process really depend on version control? > I.e., what would be stopping you from making a binary from an archive made > with e.g. "svn export"? (I'm really asking because I don't know.) The build process currently compiles a program (make_buildinfo), which in turn finds the subversion installation, and runs subwcrev if found. If no .svn folder is found, it falls back to the version information in the export. 
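The decision described above can be sketched roughly as follows. This is only an illustration in Python of the logic as stated in the message, not the actual make_buildinfo program; the function name, its parameters, and the returned labels are all assumptions.

```python
import os
import tempfile


def pick_buildinfo_source(source_dir, subwcrev_path):
    """Return which route a make_buildinfo-like step would take.

    Hypothetical helper: 'subwcrev' when the tree is a Subversion
    checkout and the tool was located, otherwise the version
    information shipped with an exported tree.
    """
    has_svn_metadata = os.path.isdir(os.path.join(source_dir, ".svn"))
    if has_svn_metadata and subwcrev_path is not None:
        return "subwcrev"
    return "exported-version-info"


# A Mercurial checkout has a .hg folder but no .svn folder, so under
# these assumptions it would take the fallback route.
with tempfile.TemporaryDirectory() as checkout:
    os.mkdir(os.path.join(checkout, ".hg"))
    hg_result = pick_buildinfo_source(checkout, "subwcrev.exe")

with tempfile.TemporaryDirectory() as checkout:
    os.mkdir(os.path.join(checkout, ".svn"))
    svn_result = pick_buildinfo_source(checkout, "subwcrev.exe")

print(hg_result, svn_result)
```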
I would have to try out what exactly will happen when I try to build the current hg conversion result on Windows, but chances are that the resulting interpreter will crash because the string manipulation fails to find the right substrings. > Well, put some butter to the fish: how many volunteers would you deem > sufficient, and which specific tasks are uncared for in the infrastructure? > I can only speak for myself, but I am prepared to put in my time. As a starting point, I'd like to see a complete, current conversion result, using as many repositories as planned, and including as many branches into each repository as planned (rather than the giant cpython repository which we have now - unless the plan now is that there will be a single giant repository). Then the existing patches to the build identification should be applied, and the repositories should be opened for (test) commits. Then people could start identifying problems. As a parallel activity, I'd also ask that the PEP is finished, or atleast put into a form where the authors consider it complete (again so that people could start identifying issues, and determine where the PEP differs from reality - currently most obviously in the branching approach). Regards, Martin From jcea at jcea.es Fri Nov 19 03:13:38 2010 From: jcea at jcea.es (Jesus Cea) Date: Fri, 19 Nov 2010 03:13:38 +0100 Subject: [Python-Dev] Mercurial Schedule In-Reply-To: <4CE56331.3050508@v.loewis.de> References: <4CE2CF8F.4040500@jcea.es> <4CE55385.6080002@v.loewis.de> <4CE56331.3050508@v.loewis.de> Message-ID: <4CE5DD52.7050907@jcea.es> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 18/11/10 18:32, "Martin v. L?wis" wrote: > In general, I'm *also* concerned about the lack of volunteers that > are interested in working on the infrastructure. I wish some of the > people who stated that they can't wait for the migration to happen > would work on solving some of the remaining problems. 
Do we have a exhaustive list of mercurial "to do" things?. I thought the plan was to keep a read only SVN mirror fedded from mercurial. The 3.2 build could come from the mirror, I guess. - -- Jesus Cea Avion _/_/ _/_/_/ _/_/_/ jcea at jcea.es - http://www.jcea.es/ _/_/ _/_/ _/_/ _/_/ _/_/ jabber / xmpp:jcea at jabber.org _/_/ _/_/ _/_/_/_/_/ . _/_/ _/_/ _/_/ _/_/ _/_/ "Things are not so easy" _/_/ _/_/ _/_/ _/_/ _/_/ _/_/ "My name is Dump, Core Dump" _/_/_/ _/_/_/ _/_/ _/_/ "El amor es poner tu felicidad en la felicidad de otro" - Leibniz -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iQCVAwUBTOXdUplgi5GaxT1NAQIL3AP/WRq9IwRZXEuFkKRAqBm0cOi4CkTbcV5X Ix+JZvimKEiq1DkUsJJb6q5/ViQ3z15ai9idY+AOmv4EdMK9hbgYZIQXGig9TLvA LFvqTqnl9ZuZCVFEYh2QdnXU576edgn2AaBpBDpoC88IXcu6Y3kcmzFIHWRTh2MF SEkUAzETSrc= =cOVM -----END PGP SIGNATURE----- From benjamin at python.org Fri Nov 19 03:23:25 2010 From: benjamin at python.org (Benjamin Peterson) Date: Thu, 18 Nov 2010 20:23:25 -0600 Subject: [Python-Dev] Mercurial Schedule In-Reply-To: <4CE5DD52.7050907@jcea.es> References: <4CE2CF8F.4040500@jcea.es> <4CE55385.6080002@v.loewis.de> <4CE56331.3050508@v.loewis.de> <4CE5DD52.7050907@jcea.es> Message-ID: 2010/11/18 Jesus Cea : > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > On 18/11/10 18:32, "Martin v. L?wis" wrote: >> In general, I'm *also* concerned about the lack of volunteers that >> are interested in working on the infrastructure. I wish some of the >> people who stated that they can't wait for the migration to happen >> would work on solving some of the remaining problems. > > Do we have a exhaustive list of mercurial "to do" things?. 
http://hg.python.org/pymigr/file/1576eb34ec9f/tasks.txt -- Regards, Benjamin From g.brandl at gmx.net Fri Nov 19 08:43:15 2010 From: g.brandl at gmx.net (Georg Brandl) Date: Fri, 19 Nov 2010 08:43:15 +0100 Subject: [Python-Dev] Mercurial Schedule In-Reply-To: References: <4CE2CF8F.4040500@jcea.es> <4CE55385.6080002@v.loewis.de> <4CE56331.3050508@v.loewis.de> <4CE5DD52.7050907@jcea.es> Message-ID: On 19.11.2010 03:23, Benjamin Peterson wrote: > 2010/11/18 Jesus Cea : >> -----BEGIN PGP SIGNED MESSAGE----- >> Hash: SHA1 >> >> On 18/11/10 18:32, "Martin v. Löwis" wrote: >>> In general, I'm *also* concerned about the lack of volunteers that >>> are interested in working on the infrastructure. I wish some of the >>> people who stated that they can't wait for the migration to happen >>> would work on solving some of the remaining problems. >> >> Do we have an exhaustive list of Mercurial "to do" things? > > http://hg.python.org/pymigr/file/1576eb34ec9f/tasks.txt Uh, that's the list of things to do *at* the migration. The todo list is http://hg.python.org/pymigr/file/1576eb34ec9f/todo.txt Georg From martin at v.loewis.de Fri Nov 19 08:58:27 2010 From: martin at v.loewis.de ("Martin v. Löwis") Date: Fri, 19 Nov 2010 08:58:27 +0100 Subject: [Python-Dev] Mercurial Schedule In-Reply-To: References: <4CE2CF8F.4040500@jcea.es> <4CE55385.6080002@v.loewis.de> <4CE56331.3050508@v.loewis.de> <4CE5DD52.7050907@jcea.es> Message-ID: <4CE62E23.9010701@v.loewis.de> On 19.11.2010 03:23, Benjamin Peterson wrote: > 2010/11/18 Jesus Cea : >> -----BEGIN PGP SIGNED MESSAGE----- >> Hash: SHA1 >> >> On 18/11/10 18:32, "Martin v. Löwis" wrote: >>> In general, I'm *also* concerned about the lack of volunteers that >>> are interested in working on the infrastructure. I wish some of the >>> people who stated that they can't wait for the migration to happen >>> would work on solving some of the remaining problems. >> >> Do we have an exhaustive list of Mercurial "to do" things?
> > http://hg.python.org/pymigr/file/1576eb34ec9f/tasks.txt This doesn't, but IMO should, list - resolve open issues in PEP - finalize and implement branch structure - set and implement policy for external code bases for Windows builds - set up account management infrastructure, determine account managers Regards, Martin From kristjan at ccpgames.com Fri Nov 19 08:31:59 2010 From: kristjan at ccpgames.com (=?iso-8859-1?Q?Kristj=E1n_Valur_J=F3nsson?=) Date: Fri, 19 Nov 2010 15:31:59 +0800 Subject: [Python-Dev] sha digest endianness Message-ID: <2E034B571A5CE44E949B9FCC3B6D24EE57872AF5@exchcn.ccp.ad.local> Please see this defect: http://bugs.python.org/issue10430 It would appear that the digest and hexdigest for sha, is wrong on little endian machines. There certainly is a discrepancy between little and big endian ones, irrespective of which one is "right" Any thoughts? K -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Fri Nov 19 14:50:36 2010 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 19 Nov 2010 23:50:36 +1000 Subject: [Python-Dev] Mercurial Schedule In-Reply-To: References: <4CE2CF8F.4040500@jcea.es> <4CE55385.6080002@v.loewis.de> <4CE56331.3050508@v.loewis.de> <4CE5DD52.7050907@jcea.es> Message-ID: On Fri, Nov 19, 2010 at 5:43 PM, Georg Brandl wrote: > Am 19.11.2010 03:23, schrieb Benjamin Peterson: >> 2010/11/18 Jesus Cea : >>> -----BEGIN PGP SIGNED MESSAGE----- >>> Hash: SHA1 >>> >>> On 18/11/10 18:32, "Martin v. L?wis" wrote: >>>> In general, I'm *also* concerned about the lack of volunteers that >>>> are interested in working on the infrastructure. I wish some of the >>>> people who stated that they can't wait for the migration to happen >>>> would work on solving some of the remaining problems. >>> >>> Do we have a exhaustive list of mercurial "to do" things?. >> >> http://hg.python.org/pymigr/file/1576eb34ec9f/tasks.txt > > Uh, that's the list of things to do *at* the migration. 
The todo list is > > http://hg.python.org/pymigr/file/1576eb34ec9f/todo.txt That kind of link is the sort of thing that should really be in the PEP... (along with the info about where to find the hooks repository, specific URLs for at least 3.x, 3.1 and 2.7, pointers to a draft FAQ to replace the current SVN-focused FAQ, etc) Target dates for the following specific activities would also be useful: - date a "final draft" of converted repository will be made available to Martin and Ronald to dry run creation of Windows and Mac OS X installers - date SVN will go read only - date Hg will be available for write access (it should be frozen for a while, to give the folks doing the conversion a chance to make sure buildbot is back up and running, commit emails are working properly, etc) So as long as we acknowledge that any migration problems may mean additional beta releases of 3.2 to iron things out, I don't see a problem with releasing beta 1 as planned to close the door on any *other* new features, and giving the Hg migration a clear run at the source repository before we start working seriously on dealing with bug reports (either existing ones, or those from the first beta). Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com |
Brisbane, Australia From martin at v.loewis.de Fri Nov 19 15:36:35 2010 From: martin at v.loewis.de ("Martin v. Löwis") Date: Fri, 19 Nov 2010 15:36:35 +0100 Subject: [Python-Dev] Mercurial Schedule In-Reply-To: References: <4CE2CF8F.4040500@jcea.es> <4CE55385.6080002@v.loewis.de> <4CE56331.3050508@v.loewis.de> <4CE5DD52.7050907@jcea.es> Message-ID: <4CE68B73.30005@v.loewis.de> > - date Hg will be available for write access (it should be frozen for > a while, to give the folks doing the conversion a chance to make sure > buildbot is back up and running, commit emails are working properly, etc) I would target the build slaves to the Mercurial repository already in the testing phase, e.g. by creating builders for building from commits to the 3k branch. I hope Buildbot supports multiple change sources now. Likewise, I'd also like to see commit emails being delivered in the test phase already, and let committers make test commits to trigger this all (and also to get acquainted with the Mercurial tools they are going to use, without fear of breaking something). Regards, Martin From barry at python.org Fri Nov 19 15:46:57 2010 From: barry at python.org (Barry Warsaw) Date: Fri, 19 Nov 2010 09:46:57 -0500 Subject: [Python-Dev] Mercurial Schedule In-Reply-To: References: <4CE2CF8F.4040500@jcea.es> <4CE55385.6080002@v.loewis.de> <4CE56331.3050508@v.loewis.de> <4CE5DD52.7050907@jcea.es> Message-ID: <20101119094657.1a7cc24a@mission> On Nov 19, 2010, at 11:50 PM, Nick Coghlan wrote: >- date SVN will go read only Please note that svn cannot be made completely read-only. We've already decided that versions already in maintenance or security-only mode (2.5, 2.6, 2.7, 3.1) will get updates and releases only via svn. But only the release managers should have write access to the svn repositories. -Barry
From ncoghlan at gmail.com Fri Nov 19 15:56:40 2010 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 20 Nov 2010 00:56:40 +1000 Subject: [Python-Dev] Mercurial Schedule In-Reply-To: <20101119094657.1a7cc24a@mission> References: <4CE2CF8F.4040500@jcea.es> <4CE55385.6080002@v.loewis.de> <4CE56331.3050508@v.loewis.de> <4CE5DD52.7050907@jcea.es> <20101119094657.1a7cc24a@mission> Message-ID: On Sat, Nov 20, 2010 at 12:46 AM, Barry Warsaw wrote: > On Nov 19, 2010, at 11:50 PM, Nick Coghlan wrote: > >>- date SVN will go read only > > Please note that svn cannot be made completely read-only. We've already > decided that versions already in maintenance or security-only mode (2.5, 2.6, > 2.7, 3.1) will get updates and releases only via svn. But only the release > managers should have write access to the svn repositories. Again, something that should be in PEP 385 (but isn't). It seems that the work *is* going on, and the people actually doing it have a reasonable idea as to what has been decided and where things are going, but those of us "out here" have a fair stake in this as well, and without an up-to-date PEP 385 there's no one place to go to see the current state of the migration. That's enough to make folks like me somewhat nervous as to whether or not we're actually going to have a usable source control system come December 12. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com |
Brisbane, Australia From dirkjan at ochtman.nl Fri Nov 19 16:00:40 2010 From: dirkjan at ochtman.nl (Dirkjan Ochtman) Date: Fri, 19 Nov 2010 16:00:40 +0100 Subject: [Python-Dev] Mercurial Schedule In-Reply-To: References: <4CE2CF8F.4040500@jcea.es> <4CE55385.6080002@v.loewis.de> <4CE56331.3050508@v.loewis.de> <4CE5DD52.7050907@jcea.es> <20101119094657.1a7cc24a@mission> Message-ID: On Fri, Nov 19, 2010 at 15:56, Nick Coghlan wrote: > That's enough to make folks like me somewhat nervous as to whether or > not we're actually going to have a usable source control system come > December 12. Yes, I've been negligent about updating the PEP. I'll try to do so next week. Georg, if you have time to update it a bit, that would be great as well. Cheers, Dirkjan From g.brandl at gmx.net Fri Nov 19 17:23:44 2010 From: g.brandl at gmx.net (Georg Brandl) Date: Fri, 19 Nov 2010 17:23:44 +0100 Subject: [Python-Dev] Mercurial Schedule In-Reply-To: <4CE62E23.9010701@v.loewis.de> References: <4CE2CF8F.4040500@jcea.es> <4CE55385.6080002@v.loewis.de> <4CE56331.3050508@v.loewis.de> <4CE5DD52.7050907@jcea.es> <4CE62E23.9010701@v.loewis.de> Message-ID: On 19.11.2010 08:58, "Martin v. Löwis" wrote: > On 19.11.2010 03:23, Benjamin Peterson wrote: >> 2010/11/18 Jesus Cea : >>> -----BEGIN PGP SIGNED MESSAGE----- >>> Hash: SHA1 >>> >>> On 18/11/10 18:32, "Martin v. Löwis" wrote: >>>> In general, I'm *also* concerned about the lack of volunteers that >>>> are interested in working on the infrastructure. I wish some of the >>>> people who stated that they can't wait for the migration to happen >>>> would work on solving some of the remaining problems. >>> >>> Do we have an exhaustive list of Mercurial "to do" things?
>> >> http://hg.python.org/pymigr/file/1576eb34ec9f/tasks.txt > > This doesn't, but IMO should, list > > - resolve open issues in PEP > - finalize and implement branch structure > - set and implement policy for external code bases for Windows builds > - set up account management infrastructure, determine account managers Good points, I've added the missing ones to the todo list. Georg From john at arbash-meinel.com Fri Nov 19 17:38:16 2010 From: john at arbash-meinel.com (John Arbash Meinel) Date: Fri, 19 Nov 2010 10:38:16 -0600 Subject: [Python-Dev] Mercurial Schedule In-Reply-To: References: <4CE2CF8F.4040500@jcea.es> <4CE55385.6080002@v.loewis.de> <4CE56331.3050508@v.loewis.de> <4CE5DD52.7050907@jcea.es> Message-ID: <4CE6A7F8.3030008@arbash-meinel.com> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 11/19/2010 7:50 AM, Nick Coghlan wrote: > On Fri, Nov 19, 2010 at 5:43 PM, Georg Brandl wrote: >> Am 19.11.2010 03:23, schrieb Benjamin Peterson: >>> 2010/11/18 Jesus Cea : >>>> -----BEGIN PGP SIGNED MESSAGE----- >>>> Hash: SHA1 >>>> >>>> On 18/11/10 18:32, "Martin v. L?wis" wrote: >>>>> In general, I'm *also* concerned about the lack of volunteers that >>>>> are interested in working on the infrastructure. I wish some of the >>>>> people who stated that they can't wait for the migration to happen >>>>> would work on solving some of the remaining problems. >>>> >>>> Do we have a exhaustive list of mercurial "to do" things?. >>> >>> http://hg.python.org/pymigr/file/1576eb34ec9f/tasks.txt >> >> Uh, that's the list of things to do *at* the migration. The todo list is >> >> http://hg.python.org/pymigr/file/1576eb34ec9f/todo.txt > > That kind of link is the sort of thing that should really be in the > PEP... 
(along with the info about where to find the hooks repository, > specific URLs for at least 3.x, 3.1 and 2.7, pointers to a draft FAQ > to replace the current SVN focused FAQ, etc) > Well, if it goes in the pep, you should at least use the 'always the most recent' version :) http://hg.python.org/pymigr/file/tip/todo.txt John =:-> -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.9 (Cygwin) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iEYEARECAAYFAkzmp/gACgkQJdeBCYSNAAOwjgCeOda2XeNvxOR0UnFuQOfN0zZt jGIAoIuarrvIz3oQ+o1jtnH5dFoFk35t =JJo8 -----END PGP SIGNATURE----- From g.brandl at gmx.net Fri Nov 19 17:51:23 2010 From: g.brandl at gmx.net (Georg Brandl) Date: Fri, 19 Nov 2010 17:51:23 +0100 Subject: [Python-Dev] Mercurial Schedule In-Reply-To: References: <4CE2CF8F.4040500@jcea.es> <4CE55385.6080002@v.loewis.de> <4CE56331.3050508@v.loewis.de> <4CE5DD52.7050907@jcea.es> <20101119094657.1a7cc24a@mission> Message-ID: Am 19.11.2010 16:00, schrieb Dirkjan Ochtman: > On Fri, Nov 19, 2010 at 15:56, Nick Coghlan wrote: >> That's enough to make folks like me somewhat nervous as to whether or >> not we're actually going to have a usable source control system come >> December 12. > > Yes, I've been negligent about updating the PEP. I'll try do so next > week. Georg, if you have time to update it a bit, that would be great > as well. I'm at it. In fact, I think I will merge both todo.txt and tasks.txt into the PEP. It's not more of a burden to update it there, and it's more visible to the developer community. Georg From alexander.belopolsky at gmail.com Fri Nov 19 17:53:58 2010 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Fri, 19 Nov 2010 11:53:58 -0500 Subject: [Python-Dev] len(chr(i)) = 2? Message-ID: I was recently surprised to learn that chr(i) can produce a string of length 2 in python 3.x. 
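A short way to see the effect for yourself, as a hedged sketch rather than anything posted in the thread: the helper names below are made up for illustration. On the narrow builds discussed here, sys.maxunicode is 0xFFFF and a code point above that is stored as a surrogate pair, so len(chr(i)) is 2. Interpreters from Python 3.3 onward no longer have narrow builds, so there the same expression gives 1; counting UTF-16 code units shows the pair on any version.

```python
import sys


def utf16_code_units(code_point):
    """Number of 16-bit code units UTF-16 needs for one code point."""
    return len(chr(code_point).encode("utf-16-le")) // 2


def is_narrow_build():
    """True when str stores UTF-16 code units (sys.maxunicode is 0xFFFF)."""
    return sys.maxunicode == 0xFFFF


# U+0041 fits in one code unit; U+10000 is the first code point that
# needs a surrogate pair in UTF-16.
print(utf16_code_units(0x41))
print(utf16_code_units(0x10000))

# len(chr(i)) follows the build: 2 on a narrow build, 1 otherwise.
print(len(chr(0x10000)), is_narrow_build())
```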
I suspect that I am not alone finding this behavior non-obvious given that a mistake in Python manual stating the contrary survived several releases. [1] Note that I am not arguing that the change was bad. In Python 2.x, \U escapes have been producing surrogate pair on narrow builds for a long time if not since introduction of unicode. I do believe, however that a change like this [2] and its consequences should be better publicized. I have not found any discussion of this change in PEPs or "What's new" documents. The closest find was a mentioning of a related issue #3280 in the 3.0 NEWS file. [3] Since this feature will be first documented in the Library Reference in 3.2, I wonder if it will be appropriate to mention it in "What's new in 3.2"? [1] http://bugs.python.org/issue7828 [2] http://svn.python.org/view?view=rev&revision=56395 [3] http://www.python.org/download/releases/3.0.1/NEWS.txt From g.brandl at gmx.net Fri Nov 19 17:53:28 2010 From: g.brandl at gmx.net (Georg Brandl) Date: Fri, 19 Nov 2010 17:53:28 +0100 Subject: [Python-Dev] Mercurial Schedule In-Reply-To: <4CE68B73.30005@v.loewis.de> References: <4CE2CF8F.4040500@jcea.es> <4CE55385.6080002@v.loewis.de> <4CE56331.3050508@v.loewis.de> <4CE5DD52.7050907@jcea.es> <4CE68B73.30005@v.loewis.de> Message-ID: Am 19.11.2010 15:36, schrieb "Martin v. L?wis": >> - date Hg will be available for write access (it should be frozen for >> a while, to give the folks doing the conversion a chance to make sure >> buildbot is back up and run, commit emails are working properly, etc) > > I would target the build slaves to the Mercurial repository already in > the testing phase, e.g by creating builders for building from commits > to the 3k branch. I hope Buildbot supports multiple change sources now. 
> Likewise, I'd also see commit emails being delivered in the test phase > already, and let committers make test commits to trigger this all (and > also to get acquainted with the Mercurial tools they are going to use, > without fear of breaking something). I've already let my Mercurial buildbot configuration run for a few checkins while testing it; a separate changesource was not needed. The commit email hook also has been tested extensively by its usage for the distutils2 repo, which are also sent to python-checkins. That said, it will of course be nice to activate both for the test repo as well, once it's available. Georg From status at bugs.python.org Fri Nov 19 18:07:02 2010 From: status at bugs.python.org (Python tracker) Date: Fri, 19 Nov 2010 18:07:02 +0100 (CET) Subject: [Python-Dev] Summary of Python tracker Issues Message-ID: <20101119170702.BB0FA1DBAD@psf.upfronthosting.co.za> ACTIVITY SUMMARY (2010-11-12 - 2010-11-19) Python tracker at http://bugs.python.org/ To view or respond to any of the issues listed below, click on the issue. Do NOT respond to this message. Issues counts and deltas: open 2549 (+23) closed 19694 (+43) total 22243 (+66) Open issues with patches: 1058 Issues opened (43) ================== #2571: cmd.py always uses raw_input, even when another stdin is speci http://bugs.python.org/issue2571 reopened by eric.araujo #4153: Unicode HOWTO up to date? 
http://bugs.python.org/issue4153 reopened by belopolsky #6941: Socket error when launching IDLE http://bugs.python.org/issue6941 reopened by 08jpurcell #10356: decimal.py: hash of -1 http://bugs.python.org/issue10356 reopened by rhettinger #10399: AST Optimization: inlining of function calls http://bugs.python.org/issue10399 opened by dmalcolm #10401: Globals / builtins cache http://bugs.python.org/issue10401 opened by pitrou #10402: sporadic test_bsddb3 failures http://bugs.python.org/issue10402 opened by pitrou #10403: Use "member" consistently http://bugs.python.org/issue10403 opened by fdrake #10404: IDLE on OS X popup menus do not work: cannot set/clear breakpo http://bugs.python.org/issue10404 opened by ned.deily #10405: IDLE breakpoint facility undocumented http://bugs.python.org/issue10405 opened by ned.deily #10406: IDLE 2.7 on OS X does not enable Rstrip extension by default http://bugs.python.org/issue10406 opened by ned.deily #10407: missing errno import in distutils/dir_util.py http://bugs.python.org/issue10407 opened by zbysz #10408: Denser dicts and linear probing http://bugs.python.org/issue10408 opened by pitrou #10415: readline.insert_text documentation incomplete http://bugs.python.org/issue10415 opened by Justin.Lebar #10417: unittest triggers UnicodeEncodeError with non-ASCII character http://bugs.python.org/issue10417 opened by jammon #10419: distutils command build_scripts fails with UnicodeDecodeError http://bugs.python.org/issue10419 opened by hagen #10420: Document of Bdb.effective is wrong. http://bugs.python.org/issue10420 opened by naoki #10423: s/args/options in arpgarse "Upgrading optparse code" http://bugs.python.org/issue10423 opened by bethard #10424: better error message from argparse when positionals missing http://bugs.python.org/issue10424 opened by bethard #10427: 24:00 Hour in DateTime http://bugs.python.org/issue10427 opened by ingo.janssen #10430: _sha.sha().digest() method is endian-sensitive. 
and hexdigest( http://bugs.python.org/issue10430 opened by krisvale #10433: Document unique behavior of 'getgroups' on OSX http://bugs.python.org/issue10433 opened by r.david.murray #10434: Document the rules for "public names" http://bugs.python.org/issue10434 opened by belopolsky #10435: Document unicode C-API in reST http://bugs.python.org/issue10435 opened by belopolsky #10436: tarfile.extractfile in "r|" stream mode fails with filenames o http://bugs.python.org/issue10436 opened by David.Nesting #10437: ThreadPoolExecutor should accept max_workers=None http://bugs.python.org/issue10437 opened by stutzbach #10438: list an example for calling static methods from WITHIN classes http://bugs.python.org/issue10438 opened by ifreecarve #10439: PyCodec C API is not documented in reST http://bugs.python.org/issue10439 opened by belopolsky #10441: some stdlib modules need to be updated to handle SSL certifica http://bugs.python.org/issue10441 opened by db #10444: A mechanism is needed to override waiting for Python threads t http://bugs.python.org/issue10444 opened by michaelahughes #10446: pydoc3 links to 2.x library reference http://bugs.python.org/issue10446 opened by belopolsky #10448: Add Mako template benchmark to Python Benchmark Suite http://bugs.python.org/issue10448 opened by bobbyi #10449: ???os.environ was modified by test_httpservers??? 
http://bugs.python.org/issue10449 opened by eric.araujo #10450: Fix markup in Misc/NEWS http://bugs.python.org/issue10450 opened by eric.araujo #10451: memoryview can be used to write into readonly buffer http://bugs.python.org/issue10451 opened by abacabadabacaba #10453: Add -h/--help option to compileall http://bugs.python.org/issue10453 opened by eric.araujo #10454: Clarify compileall command-line options http://bugs.python.org/issue10454 opened by eric.araujo #10457: "Related help topics" shown outside pager http://bugs.python.org/issue10457 opened by cben #10458: 2.7 += re.ASCII http://bugs.python.org/issue10458 opened by hfuru #10459: missing character names in unicodedata (CJK...) http://bugs.python.org/issue10459 opened by vbr #10460: Misc/indent.pro does not reflect PEP 7 http://bugs.python.org/issue10460 opened by Mick.Beaver #10461: Use with statement throughout the docs http://bugs.python.org/issue10461 opened by eric.araujo #10445: _ast py3k : add lineno back to "args" node http://bugs.python.org/issue10445 opened by emile.anclin Most recent 15 issues with no replies (15) ========================================== #10461: Use with statement throughout the docs http://bugs.python.org/issue10461 #10460: Misc/indent.pro does not reflect PEP 7 http://bugs.python.org/issue10460 #10457: "Related help topics" shown outside pager http://bugs.python.org/issue10457 #10451: memoryview can be used to write into readonly buffer http://bugs.python.org/issue10451 #10449: ???os.environ was modified by test_httpservers??? 
http://bugs.python.org/issue10449 #10445: _ast py3k : add lineno back to "args" node http://bugs.python.org/issue10445 #10439: PyCodec C API is not documented in reST http://bugs.python.org/issue10439 #10437: ThreadPoolExecutor should accept max_workers=None http://bugs.python.org/issue10437 #10433: Document unique behavior of 'getgroups' on OSX http://bugs.python.org/issue10433 #10424: better error message from argparse when positionals missing http://bugs.python.org/issue10424 #10423: s/args/options in arpgarse "Upgrading optparse code" http://bugs.python.org/issue10423 #10420: Document of Bdb.effective is wrong. http://bugs.python.org/issue10420 #10419: distutils command build_scripts fails with UnicodeDecodeError http://bugs.python.org/issue10419 #10406: IDLE 2.7 on OS X does not enable Rstrip extension by default http://bugs.python.org/issue10406 #10405: IDLE breakpoint facility undocumented http://bugs.python.org/issue10405 Most recent 15 issues waiting for review (15) ============================================= #10448: Add Mako template benchmark to Python Benchmark Suite http://bugs.python.org/issue10448 #10446: pydoc3 links to 2.x library reference http://bugs.python.org/issue10446 #10444: A mechanism is needed to override waiting for Python threads t http://bugs.python.org/issue10444 #10435: Document unicode C-API in reST http://bugs.python.org/issue10435 #10419: distutils command build_scripts fails with UnicodeDecodeError http://bugs.python.org/issue10419 #10408: Denser dicts and linear probing http://bugs.python.org/issue10408 #10406: IDLE 2.7 on OS X does not enable Rstrip extension by default http://bugs.python.org/issue10406 #10404: IDLE on OS X popup menus do not work: cannot set/clear breakpo http://bugs.python.org/issue10404 #10401: Globals / builtins cache http://bugs.python.org/issue10401 #10399: AST Optimization: inlining of function calls http://bugs.python.org/issue10399 #10391: obj2ast's error handling can lead to python crashing with a 
C- http://bugs.python.org/issue10391 #10385: Mark up "subprocess" as module in its doc http://bugs.python.org/issue10385 #10383: test_os leaks under Windows http://bugs.python.org/issue10383 #10382: Command line error marker misplaced on unicode entry http://bugs.python.org/issue10382 #10371: Deprecate trace module undocumented API http://bugs.python.org/issue10371 Top 10 most discussed issues (10) ================================= #3871: cross and native build of python for mingw32 with distutils http://bugs.python.org/issue3871 17 msgs #10441: some stdlib modules need to be updated to handle SSL certifica http://bugs.python.org/issue10441 16 msgs #2001: Pydoc interactive browsing enhancement http://bugs.python.org/issue2001 14 msgs #10356: decimal.py: hash of -1 http://bugs.python.org/issue10356 14 msgs #10446: pydoc3 links to 2.x library reference http://bugs.python.org/issue10446 12 msgs #7900: posix.getgroups() failure on Mac OS X http://bugs.python.org/issue7900 11 msgs #10435: Document unicode C-API in reST http://bugs.python.org/issue10435 11 msgs #4153: Unicode HOWTO up to date? 
http://bugs.python.org/issue4153 10 msgs #10417: unittest triggers UnicodeEncodeError with non-ASCII character http://bugs.python.org/issue10417 8 msgs #1553375: Add traceback.print_full_exception() http://bugs.python.org/issue1553375 8 msgs Issues closed (44) ================== #4471: IMAP4 missing support for starttls http://bugs.python.org/issue4471 closed by pitrou #4476: compileall fails if current dir has a "types" package http://bugs.python.org/issue4476 closed by ncoghlan #5111: httplib: wrong Host header when connecting to IPv6 litteral UR http://bugs.python.org/issue5111 closed by orsenthil #7828: chr() and ord() documentation for wide characters http://bugs.python.org/issue7828 closed by belopolsky #8649: Py_UNICODE_* functions are undocumented http://bugs.python.org/issue8649 closed by belopolsky #9076: Add C-API documentation for PyUnicode_AsDecodedObject/Unicode http://bugs.python.org/issue9076 closed by georg.brandl #9520: Add Patricia Trie high performance container http://bugs.python.org/issue9520 closed by rhettinger #9991: xmlrpc client ssl check faulty http://bugs.python.org/issue9991 closed by orsenthil #10070: 2to3 wishes for already-2to3'ed files http://bugs.python.org/issue10070 closed by loewis #10205: Can't have two tags with the same QName http://bugs.python.org/issue10205 closed by orsenthil #10260: Add a threading.Condition.wait_for() method http://bugs.python.org/issue10260 closed by krisvale #10373: Setup Script example incorrect http://bugs.python.org/issue10373 closed by eric.araujo #10392: GZipFile crash when fileobj.mode is None http://bugs.python.org/issue10392 closed by r.david.murray #10396: stdin argument to pdb.Pdb doesn't work unless you also set Pdb http://bugs.python.org/issue10396 closed by georg.brandl #10397: Unified Benchmark Suite fails on py3k with --track-memory http://bugs.python.org/issue10397 closed by pitrou #10398: errors in docs re module initialization vs self arg to functio http://bugs.python.org/issue10398 
closed by georg.brandl #10400: updating unicodedata to Unicode 6 http://bugs.python.org/issue10400 closed by loewis #10409: mkcfg crashes with ValueError http://bugs.python.org/issue10409 closed by tarek #10410: Is iterable a container type? http://bugs.python.org/issue10410 closed by rhettinger #10411: Pickle benchmark fails after converting Benchmark Suite to py3 http://bugs.python.org/issue10411 closed by pitrou #10412: Add py3k support for "slow" pickle benchmark in Benchmark Suit http://bugs.python.org/issue10412 closed by pitrou #10413: Comments in unicode.h are out of date http://bugs.python.org/issue10413 closed by belopolsky #10414: socket.gethostbyname doesn't return an ipv6 address http://bugs.python.org/issue10414 closed by loewis #10416: UnicodeDecodeError when 2to3 is run on a dir with numpy .npy f http://bugs.python.org/issue10416 closed by benjamin.peterson #10418: test_io hangs on 3.1.3rc1 http://bugs.python.org/issue10418 closed by vdupras #10421: Failed issue tracker submission http://bugs.python.org/issue10421 closed by eric.araujo #10422: pstats.py : error when loading multiple stats files http://bugs.python.org/issue10422 closed by ezio.melotti #10425: xmlrpclib support for None isn't compliant with XMLRPC http://bugs.python.org/issue10425 closed by orsenthil #10426: The whole thing is NOT good http://bugs.python.org/issue10426 closed by georg.brandl #10428: IDLE Trouble shooting http://bugs.python.org/issue10428 closed by r.david.murray #10429: bug in test_imaplib http://bugs.python.org/issue10429 closed by pitrou #10431: Failed issue tracker submission http://bugs.python.org/issue10431 closed by ezio.melotti #10432: concurrent.futures.as_completed() spins waiting for futures to http://bugs.python.org/issue10432 closed by bquinlan #10440: support RUSAGE_THREAD as a constant in the resource module http://bugs.python.org/issue10440 closed by pitrou #10442: Please by default enforce ssl certificate checking in modules 
http://bugs.python.org/issue10442 closed by ned.deily #10443: add wrapper for SSL_CTX_set_default_verify_paths http://bugs.python.org/issue10443 closed by pitrou #10447: zipfile: IOError for long directory paths on Windows http://bugs.python.org/issue10447 closed by amaury.forgeotdarc #10452: Unhelpful diagnostic 'cannot find the path specified' http://bugs.python.org/issue10452 closed by eric.smith #10455: typo in urllib.request documentation http://bugs.python.org/issue10455 closed by ezio.melotti #10456: unittest.main(verbosity=2) broke in python31, worked when I ha http://bugs.python.org/issue10456 closed by r.david.murray #1599329: urllib(2) should allow automatic decoding by charset http://bugs.python.org/issue1599329 closed by eric.araujo #1376292: Write user's version of the reference guide http://bugs.python.org/issue1376292 closed by akuchling #1509798: replace dist/src/Tools/scripts/which.py with tmick's which http://bugs.python.org/issue1509798 closed by eric.araujo #1520831: urrlib2 max_redirections=0 disables redirects http://bugs.python.org/issue1520831 closed by orsenthil From g.brandl at gmx.net Fri Nov 19 18:12:22 2010 From: g.brandl at gmx.net (Georg Brandl) Date: Fri, 19 Nov 2010 18:12:22 +0100 Subject: [Python-Dev] Mercurial Schedule In-Reply-To: <20101119094657.1a7cc24a@mission> References: <4CE2CF8F.4040500@jcea.es> <4CE55385.6080002@v.loewis.de> <4CE56331.3050508@v.loewis.de> <4CE5DD52.7050907@jcea.es> <20101119094657.1a7cc24a@mission> Message-ID: Am 19.11.2010 15:46, schrieb Barry Warsaw: > On Nov 19, 2010, at 11:50 PM, Nick Coghlan wrote: > >>- date SVN will go read only > > Please note that svn cannot be made completely read-only. We've already > decided that versions already in maintenance or security-only mode (2.5, 2.6, > 2.7, 3.1) will get updates and releases only via svn. But only the release > managers should have write access to the svn repositories. Really? 
I can understand this for security-only branches (commits there will be rare, and equivalent commits to the Mercurial branches can be made by others than the release managers, in order to keep history consistent). But having the maintenance branches (by then, that will mostly be 2.7 because 3.1 will go to security-only mode soon) in SVN will be a burden for every developer, since they have to backport bugfixes from Hg to SVN... Georg From solipsis at pitrou.net Fri Nov 19 18:17:20 2010 From: solipsis at pitrou.net (Antoine Pitrou) Date: Fri, 19 Nov 2010 18:17:20 +0100 Subject: [Python-Dev] len(chr(i)) = 2? References: Message-ID: <20101119181720.10ec11d3@pitrou.net> On Fri, 19 Nov 2010 11:53:58 -0500 Alexander Belopolsky wrote: > Since this feature will be first documented in the > Library Reference in 3.2, I wonder if it will be appropriate to > mention it in "What's new in 3.2"? No, since it's not new in 3.2. No need to further confuse users. If there's a porting guide to 3.x it should be mentioned there. Regards Antoine. From barry at python.org Fri Nov 19 18:41:58 2010 From: barry at python.org (Barry Warsaw) Date: Fri, 19 Nov 2010 12:41:58 -0500 Subject: [Python-Dev] Mercurial Schedule In-Reply-To: References: <4CE2CF8F.4040500@jcea.es> <4CE55385.6080002@v.loewis.de> <4CE56331.3050508@v.loewis.de> <4CE5DD52.7050907@jcea.es> <20101119094657.1a7cc24a@mission> Message-ID: <20101119124158.3d8debc9@mission> On Nov 19, 2010, at 06:12 PM, Georg Brandl wrote: >Am 19.11.2010 15:46, schrieb Barry Warsaw: >> On Nov 19, 2010, at 11:50 PM, Nick Coghlan wrote: >> >>>- date SVN will go read only >> >> Please note that svn cannot be made completely read-only. We've already >> decided that versions already in maintenance or security-only mode (2.5, 2.6, >> 2.7, 3.1) will get updates and releases only via svn. But only the release >> managers should have write access to the svn repositories. > >Really? 
I can understand this for security-only branches (commits there will >be rare, and equivalent commits to the Mercurial branches can be made by >others than the release managers, in order to keep history consistent). > >But having the maintenance branches (by then, that will mostly be 2.7 because >3.1 will go to security-only mode soon) in SVN will be a burden for every >developer, since they have to backport bugfixes from Hg to SVN... Maybe I misremembered Martin's suggestion, and he was only talking about security releases. I think the key thing is whether you're going to backport the vcs related bits to stable releases. I plan to only do releases for 2.6 from svn, because it's not worth breaking things like sys.subversion, and as you say the number of commits will be small. -Barry From solipsis at pitrou.net Fri Nov 19 19:06:09 2010 From: solipsis at pitrou.net (Antoine Pitrou) Date: Fri, 19 Nov 2010 19:06:09 +0100 Subject: [Python-Dev] Mercurial Schedule References: <4CE2CF8F.4040500@jcea.es> <4CE55385.6080002@v.loewis.de> <4CE56331.3050508@v.loewis.de> <4CE5DD52.7050907@jcea.es> <20101119094657.1a7cc24a@mission> <20101119124158.3d8debc9@mission> Message-ID: <20101119190609.637c7a72@pitrou.net> On Fri, 19 Nov 2010 12:41:58 -0500 Barry Warsaw wrote: > >Really? I can understand this for security-only branches (commits there will > >be rare, and equivalent commits to the Mercurial branches can be made by > >others than the release managers, in order to keep history consistent). > > > >But having the maintenance branches (by then, that will mostly be 2.7 because > >3.1 will go to security-only mode soon) in SVN will be a burden for every > >developer, since they have to backport bugfixes from Hg to SVN...
> > Maybe I misremembered Martin's suggestion, and he was only talking about > security releases. I think the key thing is whether you're going to backport > the vcs related bits to stable releases. It would be horribly burdensome to use two different VCSes depending on whether you're working on a bugfix branch or a feature branch. > I plan to only do releases for 2.6 from svn, because it's not worth breaking > things like sys.subversion, and as you say the number of commits will be > small. But 2.6 is security-fixes only, right? It would really be annoying if the same rules applied for 2.7 and 3.1. I don't understand all the worry about sys.subversion. It's not like it's useful to anybody else than us, and I think it should have been named sys._subversion instead. There's no point in making API-like promises about which DVCS, bug tracker or documentation toolset we use for our workflow. Regards Antoine. From merwok at netwok.org Fri Nov 19 19:41:54 2010 From: merwok at netwok.org (Éric Araujo) Date: Fri, 19 Nov 2010 19:41:54 +0100 Subject: [Python-Dev] Mercurial Schedule In-Reply-To: <20101119190609.637c7a72@pitrou.net> References: <4CE2CF8F.4040500@jcea.es> <4CE55385.6080002@v.loewis.de> <4CE56331.3050508@v.loewis.de> <4CE5DD52.7050907@jcea.es> <20101119094657.1a7cc24a@mission> <20101119124158.3d8debc9@mission> <20101119190609.637c7a72@pitrou.net> Message-ID: <4CE6C4F2.2040806@netwok.org> > I don't understand all the worry about sys.subversion. It's not like > it's useful to anybody else than us, and I think it should have been > named sys._subversion instead. There's no point in making API-like > promises about which DVCS, bug tracker or documentation toolset we use > for our workflow. I read “subversion” as “sub-piece of information about version”, not the name of a VCS, so I have no problem with its continuing existence under Mercurial (it’s in PEP 385).
Regards From brett at python.org Fri Nov 19 19:52:03 2010 From: brett at python.org (Brett Cannon) Date: Fri, 19 Nov 2010 10:52:03 -0800 Subject: [Python-Dev] Mercurial Schedule In-Reply-To: References: <4CE2CF8F.4040500@jcea.es> <4CE55385.6080002@v.loewis.de> <4CE56331.3050508@v.loewis.de> <4CE5DD52.7050907@jcea.es> Message-ID: On Fri, Nov 19, 2010 at 05:50, Nick Coghlan wrote: > On Fri, Nov 19, 2010 at 5:43 PM, Georg Brandl wrote: >> Am 19.11.2010 03:23, schrieb Benjamin Peterson: >>> 2010/11/18 Jesus Cea : >>>> -----BEGIN PGP SIGNED MESSAGE----- >>>> Hash: SHA1 >>>> >>>> On 18/11/10 18:32, "Martin v. Löwis" wrote: >>>>> In general, I'm *also* concerned about the lack of volunteers that >>>>> are interested in working on the infrastructure. I wish some of the >>>>> people who stated that they can't wait for the migration to happen >>>>> would work on solving some of the remaining problems. >>>> >>>> Do we have an exhaustive list of mercurial "to do" things? >>> >>> http://hg.python.org/pymigr/file/1576eb34ec9f/tasks.txt >> >> Uh, that's the list of things to do *at* the migration. The todo list is >> >> http://hg.python.org/pymigr/file/1576eb34ec9f/todo.txt > > That kind of link is the sort of thing that should really be in the > PEP... (along with the info about where to find the hooks repository, > specific URLs for at least 3.x, 3.1 and 2.7, pointers to a draft FAQ > to replace the current SVN focused FAQ, etc) I am spending my PSF grant time in January rewriting python.org/dev practically from scratch. Any needed updates to take Mercurial into account will happen no later than then.
-Brett > > Target dates for the following specific activities would also be useful: > - date a "final draft" of converted repository will be made available > to Martin and Ronald to dry run creation of Windows and Mac OS X > installers > - date SVN will go read only > - date Hg will be available for write access (it should be frozen for > a while, to give the folks doing the conversion a chance to make sure > buildbot is back up and run, commit emails are working properly, etc) > > So as long as we acknowledge that any migration problems may mean > additional beta releases of 3.2 to iron things out, I don't see a > problem with releasing beta 1 as planned to close the door on any > *other* new features, and giving the Hg migration a clear run at the > source repository before we start working seriously on dealing with > bug reports (either existing ones, or those from the first beta). > > Cheers, > Nick. > > -- > Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia > From victor.stinner at haypocalc.com Fri Nov 19 21:23:14 2010 From: victor.stinner at haypocalc.com (Victor Stinner) Date: Fri, 19 Nov 2010 21:23:14 +0100 Subject: [Python-Dev] len(chr(i)) = 2? In-Reply-To: References: Message-ID: <201011192123.14169.victor.stinner@haypocalc.com> Hi, On Friday 19 November 2010 17:53:58 Alexander Belopolsky wrote: > I was recently surprised to learn that chr(i) can produce a string of > length 2 in python 3.x. Yes, but only on narrow build. Eg. Debian and Ubuntu compile Python 3.1 in wide mode (sys.maxunicode == 1114111). > I suspect that I am not alone finding this behavior non-obvious > given that a mistake in Python manual stating the contrary survived > several releases.
[1] It was a documentation bug and you fixed it. Non-BMP characters are rare, so few (maybe only you?) noticed the documentation bug. I consider the behaviour an improvement of the non-BMP support of Python3. Python is unclear about non-BMP characters: the narrow build was called "ucs2" for a long time, even if it is UTF-16 (each character is encoded to one or two UTF-16 words). Python2 accepts non-BMP characters with \U syntax, but not with chr(). This is inconsistent and I see this as a bug. But I don't want to touch Python2 about non-BMP characters, and the "bug" is already fixed in Python3! > I do believe, however that a change like > this [2] and its consequences should be better publicized. The change was made before the release of Python 3.0. Do you want to patch the "What's new in Python 3.0?" document? > I have not > found any discussion of this change in PEPs or "What's new" documents. > The closest find was a mentioning of a related issue #3280 in the 3.0 > NEWS file. [3] Since this feature will be first documented in the > Library Reference in 3.2, I wonder if it will be appropriate to > mention it in "What's new in 3.2"? In my opinion, the question is more why it was not fixed in Python2. I suppose that the answer is something ugly like "backward compatibility" or "historical reasons" :-) Victor From martin at v.loewis.de Fri Nov 19 22:25:08 2010 From: martin at v.loewis.de ("Martin v. Löwis") Date: Fri, 19 Nov 2010 22:25:08 +0100 Subject: [Python-Dev] Mercurial Schedule In-Reply-To: <20101119124158.3d8debc9@mission> References: <4CE2CF8F.4040500@jcea.es> <4CE55385.6080002@v.loewis.de> <4CE56331.3050508@v.loewis.de> <4CE5DD52.7050907@jcea.es> <20101119094657.1a7cc24a@mission> <20101119124158.3d8debc9@mission> Message-ID: <4CE6EB34.5010805@v.loewis.de> > Maybe I misremembered Martin's suggestion, and he was only talking about > security releases. Technically, I was only talking about 2.5.
For each branch, the respective release manager should make a decision. For 2.5 and 2.6, it's been decided; Benjamin has not yet announced plans how 2.7 and 3.1 will be maintained after the switchover. Regards, Martin From martin at v.loewis.de Fri Nov 19 22:35:54 2010 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Fri, 19 Nov 2010 22:35:54 +0100 Subject: [Python-Dev] Mercurial Schedule In-Reply-To: <20101119190609.637c7a72@pitrou.net> References: <4CE2CF8F.4040500@jcea.es> <4CE55385.6080002@v.loewis.de> <4CE56331.3050508@v.loewis.de> <4CE5DD52.7050907@jcea.es> <20101119094657.1a7cc24a@mission> <20101119124158.3d8debc9@mission> <20101119190609.637c7a72@pitrou.net> Message-ID: <4CE6EDBA.9040706@v.loewis.de> > I don't understand all the worry about sys.subversion. Really? For a security release, there should be *zero* chance that it breaks existing applications, unless the application relies on the security bug that has been fixed. By "zero chance", I mean absolutely no chance, never. I'm pretty sure that applications *will* break because of the change to sys.subversion, or sys.version. People made bug reports complaining that sys.version has a newline on some systems and not on others. > It's not like > it's useful to anybody else than us I think you underestimate what API people actually use in applications http://tinyurl.com/292vhxx http://tinyurl.com/23ah8ps http://tinyurl.com/27fhyvk http://tinyurl.com/28cuyv9 etc. 
Regards, Martin From g.brandl at gmx.net Fri Nov 19 22:39:04 2010 From: g.brandl at gmx.net (Georg Brandl) Date: Fri, 19 Nov 2010 22:39:04 +0100 Subject: [Python-Dev] Mercurial Schedule In-Reply-To: <4CE6EDBA.9040706@v.loewis.de> References: <4CE2CF8F.4040500@jcea.es> <4CE55385.6080002@v.loewis.de> <4CE56331.3050508@v.loewis.de> <4CE5DD52.7050907@jcea.es> <20101119094657.1a7cc24a@mission> <20101119124158.3d8debc9@mission> <20101119190609.637c7a72@pitrou.net> <4CE6EDBA.9040706@v.loewis.de> Message-ID: Am 19.11.2010 22:35, schrieb "Martin v. Löwis": >> I don't understand all the worry about sys.subversion. > > Really? For a security release, there should be *zero* chance that it > breaks existing applications, unless the application relies on the > security bug that has been fixed. By "zero chance", I mean absolutely > no chance, never. I'm pretty sure that applications *will* break because > of the change to sys.subversion, or sys.version. People made bug reports > complaining that sys.version has a newline on some systems and not on > others. > >> It's not like >> it's useful to anybody else than us > > I think you underestimate what API people actually use in applications > > http://tinyurl.com/292vhxx > http://tinyurl.com/23ah8ps > http://tinyurl.com/27fhyvk > http://tinyurl.com/28cuyv9 > etc. Well, it should not be a problem to continue to provide a sys.subversion that at least will not break applications reading it. And yes, I am in favor of giving the new attribute a leading underscore.
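[Editorial illustration, not part of the original message.] The compatibility concern in this exchange is that applications read sys.subversion directly and would break if it disappeared. A minimal sketch of a defensive reader follows. The helper name describe_build is hypothetical, and falling back to sys.version is an assumption about what such an application would want; the only thing taken from the interpreter itself is that sys.subversion, where present, is a (project, branch, revision) tuple.

```python
import sys


def describe_build():
    """Return a short build description without assuming sys.subversion exists."""
    # On SVN-era builds sys.subversion is a (project, branch, revision) tuple.
    # getattr() avoids an AttributeError on interpreters that lack it.
    info = getattr(sys, "subversion", None)
    if info:
        project, branch, revision = info
        return "%s %s r%s" % (project, branch, revision)
    # Hypothetical fallback: the first token of sys.version, e.g. "3.2a4".
    return sys.version.split()[0]


print(describe_build())
```

Code written this way keeps working whether the attribute is kept, emptied, or renamed with a leading underscore.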
Georg From solipsis at pitrou.net Fri Nov 19 22:43:12 2010 From: solipsis at pitrou.net (Antoine Pitrou) Date: Fri, 19 Nov 2010 22:43:12 +0100 Subject: [Python-Dev] Mercurial Schedule In-Reply-To: <4CE6EDBA.9040706@v.loewis.de> References: <4CE2CF8F.4040500@jcea.es> <4CE55385.6080002@v.loewis.de> <4CE56331.3050508@v.loewis.de> <4CE5DD52.7050907@jcea.es> <20101119094657.1a7cc24a@mission> <20101119124158.3d8debc9@mission> <20101119190609.637c7a72@pitrou.net> <4CE6EDBA.9040706@v.loewis.de> Message-ID: <1290202992.3621.4.camel@localhost.localdomain> Le vendredi 19 novembre 2010 à 22:35 +0100, "Martin v. Löwis" a écrit : > > I don't understand all the worry about sys.subversion. > > Really? For a security release, there should be *zero* chance that it > breaks existing applications, It should have been clear that my message explicitly excluded security releases. Regards Antoine. From martin at v.loewis.de Fri Nov 19 22:43:45 2010 From: martin at v.loewis.de ("Martin v. Löwis") Date: Fri, 19 Nov 2010 22:43:45 +0100 Subject: [Python-Dev] len(chr(i)) = 2? In-Reply-To: <201011192123.14169.victor.stinner@haypocalc.com> References: <201011192123.14169.victor.stinner@haypocalc.com> Message-ID: <4CE6EF91.1040803@v.loewis.de> > In my opinion, the question is more what was it not fixed in Python2. I suppose > that the answer is something ugly like "backward compatibility" or "historical > reasons" :-) No, there was a deliberate decision to not support that, see http://www.python.org/dev/peps/pep-0261/ There had been a long discussion on this specific detail when PEP 261 was written, and in the end, an explicit, deliberate, considered decision was made to raise a ValueError.
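[Editorial illustration, not part of the original message.] The length-2 result discussed in this thread comes from UTF-16 surrogate pairs: on a narrow build, a code point above U+FFFF is stored as two 16-bit code units. The sketch below reproduces that arithmetic in pure Python so it gives the same answer on any build. The function name utf16_code_units is hypothetical; the formula is the standard UTF-16 encoding rule, not code taken from CPython.

```python
def utf16_code_units(code_point):
    """Return the UTF-16 code units for a code point as a tuple of ints."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("code point out of range")
    if code_point < 0x10000:
        # BMP code points fit in a single 16-bit unit.
        return (code_point,)
    # Non-BMP: split the 20-bit offset into a high and a low surrogate.
    offset = code_point - 0x10000
    return (0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF))


for cp in (0x41, 0xFFFF, 0x10000, 0x1F600):
    print(hex(cp), [hex(unit) for unit in utf16_code_units(cp)])
```

The number of units returned for a given i is what len(chr(i)) reports on a narrow build, as described in the messages above.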
Regards, Martin From ezio.melotti at gmail.com Fri Nov 19 23:05:51 2010 From: ezio.melotti at gmail.com (Ezio Melotti) Date: Sat, 20 Nov 2010 00:05:51 +0200 Subject: [Python-Dev] [Python-checkins] r86530 - python/branches/py3k/Doc/howto/unicode.rst In-Reply-To: <20101119161003.AAEEAEE9A6@mail.python.org> References: <20101119161003.AAEEAEE9A6@mail.python.org> Message-ID: <4CE6F4BF.9050409@gmail.com> Hi, On 19/11/2010 18.10, alexander.belopolsky wrote: > Author: alexander.belopolsky > Date: Fri Nov 19 17:09:58 2010 > New Revision: 86530 > > Log: > Issue #4153: Updated Unicode HOWTO. > > Modified: > python/branches/py3k/Doc/howto/unicode.rst > > Modified: python/branches/py3k/Doc/howto/unicode.rst > ============================================================================== > --- python/branches/py3k/Doc/howto/unicode.rst (original) > +++ python/branches/py3k/Doc/howto/unicode.rst Fri Nov 19 17:09:58 2010 > > > [...] > > > -Python 2.x's Unicode Support > -============================ > +Python's Unicode Support > +======================== > > Now that you've learned the rudiments of Unicode, we can look at Python's > Unicode features. > @@ -265,7 +263,7 @@ > UnicodeDecodeError: 'utf8' codec can't decode byte 0x80 in position 0: > unexpected code byte > >>> b'\x80abc'.decode("utf-8", "replace") > - '\ufffdabc' > + '?abc' Apparently 'make latex' and 'make all-pdf' don't like this char. > >>> b'\x80abc'.decode("utf-8", "ignore") > 'abc' > > [...] Best Regards, Ezio Melotti From benjamin at python.org Fri Nov 19 23:20:25 2010 From: benjamin at python.org (Benjamin Peterson) Date: Fri, 19 Nov 2010 16:20:25 -0600 Subject: [Python-Dev] Mercurial Schedule In-Reply-To: <4CE6EB34.5010805@v.loewis.de> References: <4CE2CF8F.4040500@jcea.es> <4CE55385.6080002@v.loewis.de> <4CE56331.3050508@v.loewis.de> <4CE5DD52.7050907@jcea.es> <20101119094657.1a7cc24a@mission> <20101119124158.3d8debc9@mission> <4CE6EB34.5010805@v.loewis.de> Message-ID: 2010/11/19 "Martin v. 
Löwis" : >> Maybe I misremembered Martin's suggestion, and he was only talking about >> security releases. > > Technically, I was only talking about 2.5. For each branch, the > respective release manager should make a decision. For 2.5 and 2.6, > it's been decided; Benjamin has not yet announced plans how 2.7 and 3.1 > will be maintained after the switchover. I propose that they follow the development branches over to hg. Having to backport bug fixes with any frequency from hg to svn would probably be more unpleasant than the current svnmerge situation. -- Regards, Benjamin From mal at egenix.com Fri Nov 19 23:25:03 2010 From: mal at egenix.com (M.-A. Lemburg) Date: Fri, 19 Nov 2010 23:25:03 +0100 Subject: [Python-Dev] len(chr(i)) = 2? In-Reply-To: <201011192123.14169.victor.stinner@haypocalc.com> References: <201011192123.14169.victor.stinner@haypocalc.com> Message-ID: <4CE6F93F.9010109@egenix.com> Victor Stinner wrote: > Hi, > > On Friday 19 November 2010 17:53:58 Alexander Belopolsky wrote: >> I was recently surprised to learn that chr(i) can produce a string of >> length 2 in python 3.x. > > Yes, but only on narrow build. Eg. Debian and Ubuntu compile Python 3.1 in > wide mode (sys.maxunicode == 1114111). > >> I suspect that I am not alone finding this behavior non-obvious >> given that a mistake in Python manual stating the contrary survived >> several releases. [1] > > It was a documentation bug and you fixed it. Non-BMP characters are rare, so > few (maybe only you?) noticed the documentation bug. I consider the behaviour > as an improvement of non-BMP support of Python3. > > Python is unclear about non-BMP characters: narrow build was called "ucs2" for > long time, even if it is UTF-16 (each character is encoded to one or two > UTF-16 words). No, no, no :-) UCS2 and UCS4 are more appropriate than "narrow" and "wide" or even "UTF-16" and "UTF-32". It's rather common to confuse a transfer encoding with a storage format.
UCS2 and UCS4 refer to code units (the storage format). You can use UCS2 and UCS4 code units to represent UTF-16 and UTF-32 resp., but those are not the same things. In UTF-16 0xD800 has a special meaning, in UCS2 it doesn't. Python uses UCS2 internally. It does not assign a special meaning to those surrogate code point ranges. However, when it comes to codecs, we do try to make use of the fact that UCS2 can easily be used to represent an UTF-16 encoding and that's why you often see surrogates being created for code points that wouldn't otherwise fit into UCS2 and you see those surrogates being converted back to single code units in UCS4 builds. I don't know who invented the terms "narrow" and "wide" builds for Python3. Not me that's for sure :-) They don't have any meaning in Unicode terminology and thus cause even more confusion than UCS2 and UCS4. E.g. the import errors you get when importing extensions built for a different Unicode version, (correctly) refer to UCS2 vs. UCS4 and now give even less of a clue that they relate to difference in Unicode builds (since these are now labeled "narrow" and "wide"). IMO, we should go back to the Python2 terms UCS2 and UCS4 which are correct and provide a clear description of what Python uses internally for code units. > Python2 accepts non-BMP characters with \U syntax, but not with > chr(). This is inconsistent and I see this as a bug. But I don't want to touch > Python2 about non-BMP characters, and the "bug" is already fixed in Python3! > >> I do believe, however that a change like >> this [2] and its consequences should be better publicized. > > Change made before the release of Python 3.0. Do you want to patch the "What's > new in Python 3.0?" document? Perhaps add a section "What we forgot to mention in 3.0" or "What's not so new in 3.2" to "What's new in 3.2" :-) >> I have not >> found any discussion of this change in PEPs or "What's new" documents. 
>> The closest find was a mentioning of a related issue #3280 in the 3.0 >> NEWS file. [3] Since this feature will be first documented in the >> Library Reference in 3.2, I wonder if it will be appropriate to >> mention it in "What's new in 3.2"? > > In my opinion, the question is more what was it not fixed in Python2. I suppose > that the answer is something ugly like "backward compatibility" or "historical > reasons" :-) Backwards compatibility. Python2 applications don't expect unichr(i) to return anything other than a single character. If you need this in Python2, it's easy enough to get around, though, with a little helper function. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Nov 19 2010) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ ::: Try our new mxODBC.Connect Python Database Interface for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From martin at v.loewis.de Fri Nov 19 23:46:08 2010 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Fri, 19 Nov 2010 23:46:08 +0100 Subject: [Python-Dev] len(chr(i)) = 2? In-Reply-To: <4CE6F93F.9010109@egenix.com> References: <201011192123.14169.victor.stinner@haypocalc.com> <4CE6F93F.9010109@egenix.com> Message-ID: <4CE6FE30.1050903@v.loewis.de> > It'S rather common to confuse a transfer encoding with a storage format. > UCS2 and UCS4 refer to code units (the storage format). Actually, they don't. Instead, they refer to "coded character sets", in W3C terminology: mapping of characters to natural numbers. 
See http://unicode.org/faq/basic_q.html#14 The term "UCS-2" is a character set that can encode only encode 65536 characters; it thus refers to Unicode 1.1. According to the Unicode Consortium's FAQ, the term UCS-2 should be avoided these days. > IMO, we should go back to the Python2 terms UCS2 and UCS4 which > are correct and provide a clear description of what Python uses > internally for code units. No, we shouldn't. The term UCS-2 is deprecated, see above. Regards, Martin From v+python at g.nevcal.com Sat Nov 20 04:48:58 2010 From: v+python at g.nevcal.com (Glenn Linderman) Date: Fri, 19 Nov 2010 19:48:58 -0800 Subject: [Python-Dev] Web servers, bytes, str, documentation, Python 3.2a4 Message-ID: <4CE7452A.7050109@g.nevcal.com> So maybe this is the wrong forum, if so please tell me what the right forum is for each of the various pieces. I'm assuming that I should file some bugs in the tracker, but I'm not exactly sure whether to file them on cgitb, http.server, or subprocess, or all of the above. Pretty sure there are at least some in http.server, but maybe some of those will be considered "enhancement requests" since they are long outstanding in the predecessor code. So I've been writing CGI scripts in Python behind Apache. No framework, just raw CGI. Got everything working on Python 2.6 (it's the newest that the hosting company has). Whacked at 2.6's CGIHTTPServer.py until I got an environment that would actually run CGI programs in the same sort of way that Apache does, so I can test faster, locally. Got the site working. Am happy. Now I decided to tackle porting the code to Python 3, in hopes that someday the hosting company might have it, and to see what I could learn about the "Subject:" matters, and to altruistically see if 3.2a4 has a consistent story. Um. Well. Some of me, Python 3.2a4, or its documentation is missing something. Maybe several somethings. Here's some code to ponder. 
    import sys
    import traceback
    sys.stdout = open("sob", "wb") # WSGI sez data should be binary, so stdout should be binary???
    import cgitb
    sys.stdout.write(b"out")
    fhb = open("fhb", "wb")
    cgitb.enable(0,"d:\temp")
    fhb.write("abcdef") # try writing non-binary to binary file. Expect an error, of course.

Feed it to python32...

    d:\temp>c:\python32\python.exe test11.py
    Error in sys.excepthook:
    TypeError: 'str' does not support the buffer interface

    Original exception was:
    Traceback (most recent call last):
      File "d:\my\py\test11.py", line 8, in <module>
        fhb.write("abcdef") # try writing non-binary to binary file. Expect an error, of course.
    TypeError: 'str' does not support the buffer interface

So it seems that cgitb can't write to binary files, to report the error? Or how else should I interpret the Error in sys.excepthook?

So then I tweaked the code for cgitb's enjoyment:

    import sys
    import traceback
    sys.stdout = open("sob", "w", encoding="UTF-8") # WSGI sez data should be binary, so stdout should be binary???
    import cgitb
    sys.stdout.write("out")
    fhb = open("fhb", "wb")
    cgitb.enable(0,"d:\temp")
    fhb.write("abcdef") # try writing non-binary to binary file. Expect an error, of course.

Now I get the following report in the stdout file:

    out --> -->

    A problem occurred in a Python script.

and the following error on the console:

    d:\temp>c:\python32\python.exe test12.py
    Error in sys.excepthook:
    Traceback (most recent call last):
      File "c:\python32\lib\tempfile.py", line 209, in _mkstemp_inner
        fd = _os.open(file, flags, 0o600)
    OSError: [Errno 22] Invalid argument

    Original exception was:
    Traceback (most recent call last):
      File "d:\my\py\test12.py", line 8, in <module>
        fhb.write("abcdef") # try writing non-binary to binary file. Expect an error, of course.
    TypeError: 'str' does not support the buffer interface

I was expecting to see a whole cgitb report in sob, but no such luck. Not sure why it is trying to create a temporary file, but it seems to fail to do that. Of course, the next test would have been to write binary data into fhb, and try to copy it to stdout, which would fail, because stdout has to not be binary to make cgitb work???

That brings me to http.server, the 3.2a4 replacement for CGIHTTPServer. There are definitely some improvements here, and some reported-but-yet-unfixed bugs. And some pitiful missing features, especially on Windows. I applied some of the whacks I had applied to CGIHTTPServer, and got some things working, but, per what I was trying to demonstrate above, there seems to be an incompatibility between the idea of using cgitb (which wants stdout open with some encoding provided) and serving binary files (which wants stdout open in binary) [this latter is supported by the WSGI spec too].

So it seems to me that there are some problems. Yet, it seems that http.server can somehow accept the data sent by cgitb, which comes from subprocess running my CGI script, but my CGI script fails to be able to copy a binary file to its stdout (a subprocess created PIPE). The subprocess documentation doesn't say what encoding is supplied to the PIPE-created handles, if any, but since cgitb data is accepted but binary file data is not, I infer it must be a non-binary handle, encoding unknown.
The subprocess documentation doesn't document any way to specify what encoding should be used on the PIPE-created handles, either. So this isn't very enlightening. In the absence of a specification or parameter, I would have expected the PIPEs to be binary, but this seems to be experimentally false.

Yet http.server, when serving plain files, seems to open them in binary mode, and transfer them successfully to the browser. And it can also accept the non-binary?? data from cgitb from my CGI script, and display it in the browser. The former comes from a file it opens in binary mode, and the latter from the subprocess PIPE in unknown mode.

It seems that the socketfile.server opens the socket in "wb" mode, and encodes most data. That, in turn, seems to imply that the binary data from SimpleHTTPServer files are reasonably returned, and I note the headers and such are explicitly encoded before being written to wfile... again, consistent with the socket, wfile, being in binary mode. But the data coming back from the subprocess PIPE from my CGI script seems to be acceptable to be written to wfile also, implying that the PIPEs are binary, as the absence of specifications and parameters, and the knowledge that pipes are bytestreams, would lead one to expect. But then, it would seem that the cgitb output should be in binary to get into the PIPE, but it seems that using a binary stdout makes cgitb fail, in the above experiment... and I can't find any code in cgitb that does explicit encoding.

So I'm confused, and it seems a little extra documentation might help decide which are the modules that have bugs or missing features, and which do not.

One of the cgitb outputs from my attempt to serve the binary file claims that my CGI script's output file (which comes from a subprocess PIPE) is a TextIOWrapper with encoding cp1252. Maybe that is the default that comes when a new Python is launched, even though it gets a subprocess PIPE as stdout?
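
One way to probe the question raised above is a small experiment. This is only a sketch, not code from the thread, and it assumes a Python 3 interpreter reachable through `sys.executable`. The child script in `CHILD` is made up for illustration. It suggests that the parent end of a `subprocess.PIPE` delivers bytes, while the child's `sys.stdout` is a text wrapper whose underlying binary stream is reachable as `sys.stdout.buffer`.

```python
import subprocess
import sys

# Hypothetical child script: writes text through the text layer,
# then raw bytes through the underlying binary buffer.
CHILD = r"""
import sys
sys.stdout.write("text\n")
sys.stdout.flush()                    # flush text before touching the buffer
sys.stdout.buffer.write(b"\x00\xff")  # bytes bypass the text encoding
sys.stdout.buffer.flush()
"""

# The parent reads the child's stdout through a pipe.
proc = subprocess.Popen([sys.executable, "-c", CHILD], stdout=subprocess.PIPE)
output, _ = proc.communicate()

# The parent side of the pipe hands back a bytes object.
print(type(output), output)
```

If this holds on a given build, a CGI script could emit its headers as text and its payload through `sys.stdout.buffer`, provided it flushes between the two, which is roughly what the wrapper class later in the thread does.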
-------------- next part -------------- An HTML attachment was scrubbed... URL: From stephen at xemacs.org Sat Nov 20 05:11:48 2010 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Sat, 20 Nov 2010 13:11:48 +0900 Subject: [Python-Dev] len(chr(i)) = 2? In-Reply-To: <4CE6FE30.1050903@v.loewis.de> References: <201011192123.14169.victor.stinner@haypocalc.com> <4CE6F93F.9010109@egenix.com> <4CE6FE30.1050903@v.loewis.de> Message-ID: <87hbfc1vnf.fsf@uwakimon.sk.tsukuba.ac.jp> "Martin v. L?wis" writes: > The term "UCS-2" is a character set that can encode only encode 65536 > characters; it thus refers to Unicode 1.1. According to the Unicode > Consortium's FAQ, the term UCS-2 should be avoided these days. So what do you propose we call the Python implementation? You can call it "code-unit-oriented" if you like, but in fact it is identical to UCS-2 for all non-hairsplitting purposes. AFAICS the Unicode Consortium deprecates the *term* UCS-2 because they would like us to avoid *implementations* that don't encode the full Unicode character set, not because the term is technically incorrect. Strictly speaking, internally Python only encodes 65536 characters in 2-octet builds. Its (Unicode) string-handling code does not know about surrogates at all, AFAIK, and therefore is not UTF-16 conforming. (The anomolies discussed here are type transformations, not string-handling, for my purpose.) I really don't see why we shouldn't call a UCS-2 implementation by its name. AFAIK this was not supposed to change in Python 3; indexing and slicing go by code unit (isomorphic to UCS-n), not character, and due to PEP 383 4-octet builds do not conform (internally) to UTF-32, and can produce output that conforms to Unicode not at all (as a user option, of course, but it's still non-conformant). > > IMO, we should go back to the Python2 terms UCS2 and UCS4 which > > are correct and provide a clear description of what Python uses > > internally for code units. > > No, we shouldn't. 
The term UCS-2 is deprecated, see above. Too bad for the Unicode Consortium, I say. UCS-2 is the closest term that folks who are not Unicode geeks will have a chance of understanding. I agree with Marc-Andre that "narrow" and "wide" are too ambiguous to be useful. Many people will interpret that as "UTF-16" (or even "UTF-8") and "UTF-32", respectively, which is dead wrong. Others won't have a clue. Using "UCS-2" and "UCS-4" has the correct connotations to Unicode geeks, and they are easy to look up for non-geeks who care about precise definitions. Cf. the second half of the FAQ you quote: Instead, "UCS-2" has sometimes been used in the past to indicate that an implementation does not support supplementary characters and doesn't interpret pairs of surrogate code points as characters. Such an implementation would not handle processing like character properties, codepoint boundaries, collation, etc. for supplementary characters. "Hey, Python, I'm looking at you!" (Strictly speaking, Python libraries do some of that for us, but the Python *language* does not.) 
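
The code-unit versus code-point distinction argued over in this thread can be illustrated with a few lines of Python. This is a sketch for the reader, not code from any poster: the helper names `to_surrogate_pair` and `utf16_code_units` are hypothetical, and the first simply applies the standard UTF-16 arithmetic to a supplementary code point. On a build with 16-bit code units, a non-BMP character is stored as such a pair, which is why `len(chr(i))` could report 2 there.

```python
import struct

def to_surrogate_pair(code_point):
    """Split a supplementary code point into UTF-16 high/low surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("not a supplementary code point")
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits of the offset
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits of the offset
    return high, low

def utf16_code_units(text):
    """Return the 16-bit code units of the UTF-16 encoding of text."""
    data = text.encode("utf-16-be")
    return list(struct.unpack(">%dH" % (len(data) // 2), data))

# U+1D11E (MUSICAL SYMBOL G CLEF) lies outside the BMP.
pair = to_surrogate_pair(0x1D11E)
units = utf16_code_units(chr(0x1D11E))
print([hex(u) for u in pair], [hex(u) for u in units])
```

The codec and the hand computation agree on the two code units, whatever the interpreter uses internally; whether indexing and `len()` see one item or two depends on the internal storage, which is the point under dispute.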
From brian.curtin at gmail.com Sat Nov 20 05:24:38 2010
From: brian.curtin at gmail.com (Brian Curtin)
Date: Fri, 19 Nov 2010 22:24:38 -0600
Subject: [Python-Dev] [Python-checkins] r86540 - in python/branches/py3k: Parser/asdl_c.py Python/Python-ast.c
In-Reply-To: <20101120020146.25797EE989@mail.python.org>
References: <20101120020146.25797EE989@mail.python.org>
Message-ID:

On Fri, Nov 19, 2010 at 20:01, benjamin.peterson wrote:
> Author: benjamin.peterson
> Date: Sat Nov 20 03:01:45 2010
> New Revision: 86540
>
> Log:
> c89 declarations
>
> Modified:
>    python/branches/py3k/Parser/asdl_c.py
>    python/branches/py3k/Python/Python-ast.c
>
> Modified: python/branches/py3k/Parser/asdl_c.py
> ==============================================================================
> --- python/branches/py3k/Parser/asdl_c.py (original)
> +++ python/branches/py3k/Parser/asdl_c.py Sat Nov 20 03:01:45 2010
> @@ -366,9 +366,9 @@
>          self.emit("obj2ast_%s(PyObject* obj, %s* out, PyArena* arena)" %
> (name, ctype), 0)
>          self.emit("{", 0)
>          self.emit("PyObject* tmp = NULL;", 1)
> +        self.emit("int isinstance;", 1)
>          # Prevent compiler warnings about unused variable.
>          self.emit("tmp = tmp;", 1)
> -        self.emit("int isinstance;", 1)
>          self.emit("", 0)
>
>      def sumTrailer(self, name, add_label=False):
>
> Modified: python/branches/py3k/Python/Python-ast.c
> ==============================================================================
> --- python/branches/py3k/Python/Python-ast.c (original)
> +++ python/branches/py3k/Python/Python-ast.c Sat Nov 20 03:01:45 2010
> @@ -3375,8 +3375,8 @@
>  obj2ast_mod(PyObject* obj, mod_ty* out, PyArena* arena)
>  {
>      PyObject* tmp = NULL;
> -    tmp = tmp;
>      int isinstance;
> +    tmp = tmp;

Windows builds fail due to this change.

-------------- next part -------------- An HTML attachment was scrubbed...
URL: From v+python at g.nevcal.com Sat Nov 20 07:56:18 2010 From: v+python at g.nevcal.com (Glenn Linderman) Date: Fri, 19 Nov 2010 22:56:18 -0800 Subject: [Python-Dev] Web servers, bytes, str, documentation, Python 3.2a4 In-Reply-To: <4CE7452A.7050109@g.nevcal.com> References: <4CE7452A.7050109@g.nevcal.com> Message-ID: <4CE77112.3080604@g.nevcal.com> On 11/19/2010 7:48 PM, Glenn Linderman wrote: > One of the cgitb outputs from my attempt to serve the binary file > claims that my CGI script's output file (which comes from a subprocess > PIPE) is a TextIOWrapper with encoding cp1252. Maybe that is the > default that comes when a new Python is launched, even though it gets > a subprocess PIPE as stdout? So the rather gross code below solves the cp1252 stdout problem, and also permits both strings and bytes to be written to the same file, although those two features are separable. But now that I've worked around it, it seems that subprocesss should somehow ensure that launched Python programs know they are working on a binary stream? Of course, not all programs launched are Python programs... so maybe it should be a documentation issue, but it seems to be missing from the documentation. 
#####################################
import sys

if sys.version_info[ 0 ] == 2:
    class IOMix():
        def __init__( self, fh, encoding="UTF-8"):
            self.fh = fh
            self.encoding = encoding  # remembered so write() can use it
        def write( self, param ):
            # Encode text, pass bytes through unchanged.
            if isinstance( param, unicode ):
                self.fh.write( param.encode( self.encoding ))
            else:
                self.fh.write( param )
#####################################
if sys.version_info[ 0 ] == 3:
    class IOMix():
        def __init__( self, fh, encoding="UTF-8"):
            if hasattr( fh, 'buffer'):
                self.bio = fh.buffer
                fh.flush()
                self.last = 'b'
                import io
                self.txt = io.TextIOWrapper( self.bio, encoding, None, '\r\n')
            else:
                raise ValueError("not a buffered stream")
        def write( self, param ):
            if isinstance( param, str ):
                self.last = 't'
                self.txt.write( param )
            else:
                # Flush pending text before writing raw bytes.
                if self.last == 't':
                    self.txt.flush()
                self.last = 'b'
                self.bio.write( param )
#####################################

From martin at v.loewis.de Sat Nov 20 10:05:38 2010
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Sat, 20 Nov 2010 10:05:38 +0100
Subject: [Python-Dev] len(chr(i)) = 2?
In-Reply-To: <87hbfc1vnf.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <201011192123.14169.victor.stinner@haypocalc.com> <4CE6F93F.9010109@egenix.com> <4CE6FE30.1050903@v.loewis.de> <87hbfc1vnf.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: <4CE78F62.7060707@v.loewis.de>

On 20.11.2010 05:11, Stephen J. Turnbull wrote:
> "Martin v. Löwis" writes:
>
> > The term "UCS-2" is a character set that can only encode 65536
> > characters; it thus refers to Unicode 1.1. According to the Unicode
> > Consortium's FAQ, the term UCS-2 should be avoided these days.
>
> So what do you propose we call the Python implementation?

A technically correct description would be to say that Python uses either 16-bit code units or 32-bit code units; for brevity, these can be called narrow and wide code units.

> Strictly speaking, internally Python only encodes 65536 characters in
> 2-octet builds.
Its (Unicode) string-handling code does not know
> about surrogates at all, AFAIK

Here you are mistaken: it does indeed know about UTF-16 and surrogates in several places, e.g. in the UTF-8 codec, or in the repr() implementation; likewise in the parser.

> and therefore is not UTF-16 conforming.

I disagree. Python does "conform" to "UTF-16" (certainly in the sense that no UTF-16 specification ever mandates a certain Python API, and that Python follows all general requirements of the UTF-16 specification).

> AFAIK this was not supposed to change in Python 3; indexing and
> slicing go by code unit (isomorphic to UCS-n), not character, and due
> to PEP 383 4-octet builds do not conform (internally) to UTF-32, and
> can produce output that conforms to Unicode not at all (as a user
> option, of course, but it's still non-conformant).

What behavior specifically do you consider non-conforming, and what specific specification do you think it is not conforming to? For example, it *is* fully conforming with UTF-8.

Regards,
Martin

From merwok at netwok.org Sat Nov 20 12:38:53 2010
From: merwok at netwok.org (Éric Araujo)
Date: Sat, 20 Nov 2010 12:38:53 +0100
Subject: [Python-Dev] Web servers, bytes, str, documentation, Python 3.2a4
In-Reply-To: <4CE7452A.7050109@g.nevcal.com>
References: <4CE7452A.7050109@g.nevcal.com>
Message-ID: <4CE7B34D.4020309@netwok.org>

Hello

> cgitb.enable(0,"d:\temp")

Isn't that expanded to "d:<TAB>emp"?

From ncoghlan at gmail.com Sat Nov 20 14:16:27 2010
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sat, 20 Nov 2010 23:16:27 +1000
Subject: [Python-Dev] [Python-checkins] pymigr: Build identification patch is updated, but only for Unix.
In-Reply-To:
References:
Message-ID:

On Sat, Nov 20, 2010 at 6:02 PM, georg.brandl wrote:
> georg.brandl pushed abd0dc1328ce to pymigr:
>
> http://hg.python.org/pymigr/rev/abd0dc1328ce
> changeset:   70:abd0dc1328ce
> tag:         tip
> user:        Georg Brandl
> date:
Sat Nov 20 09:01:03 2010 +0100
> summary:     Build identification patch is updated, but only for Unix.
> files:       todo.txt

Does this repository use the same set of hooks as distutils2? (I'm hoping not, since if it does, my change to the email hook didn't work...)

Cheers,
Nick.

--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From ncoghlan at gmail.com Sat Nov 20 14:55:57 2010
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sat, 20 Nov 2010 23:55:57 +1000
Subject: [Python-Dev] Mercurial Schedule
In-Reply-To:
References: <4CE2CF8F.4040500@jcea.es> <4CE55385.6080002@v.loewis.de> <4CE56331.3050508@v.loewis.de> <4CE5DD52.7050907@jcea.es> <20101119094657.1a7cc24a@mission>
Message-ID:

On Sat, Nov 20, 2010 at 2:51 AM, Georg Brandl wrote:
> I'm at it. In fact, I think I will merge both todo.txt and tasks.txt
> into the PEP. It's not more of a burden to update it there, and it's
> more visible to the developer community.

The latest checkin was definitely an improvement (especially the updated timeline).

According to the PEP, the .hgeol rules aren't currently enforced server side - having such a hook in place before Hg went live was definitely one of the things we agreed on before the hgeol extension even existed in a usable form.

For fixing whitespace issues (another open question mentioned in the PEP), "make patchcheck" can continue to handle that - no need to create a Hg specific extension for it.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan at gmail.com |
Brisbane, Australia From ncoghlan at gmail.com Sat Nov 20 16:21:32 2010 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 21 Nov 2010 01:21:32 +1000 Subject: [Python-Dev] [Python-checkins] r86566 - in python/branches/py3k: Doc/glossary.rst Doc/library/inspect.rst Lib/inspect.py Lib/test/test_inspect.py Misc/NEWS Misc/python-wing4.wpr In-Reply-To: <20101120150731.2D346E78E@mail.python.org> References: <20101120150731.2D346E78E@mail.python.org> Message-ID: On Sun, Nov 21, 2010 at 1:07 AM, michael.foord wrote: > +Fetching attributes statically > +------------------------------ > + > +Both :func:`getattr` and :func:`hasattr` can trigger code execution when > +fetching or checking for the existence of attributes. Descriptors, like > +properties, will be invoked and :meth:`__getattr__` and :meth:`__getattribute__` > +may be called. > + > +For cases where you want passive introspection, like documentation tools, this > +can be inconvenient. `getattr_static` has the same signature as :func:`getattr` > +but avoids executing code when it fetches attributes. This description feels a little strong to me - getattr_static still executes all those things on the metaclass as it retrieves the information it needs to do the "static" lookup. Leaving this original description (which assumes metaclass=type) alone and adding a note near the end of the section to say that metaclass code is still executed might be an improvement. Cheers, Nick. -- Nick Coghlan?? |?? ncoghlan at gmail.com?? |?? 
Brisbane, Australia From fuzzyman at voidspace.org.uk Sat Nov 20 16:29:13 2010 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Sat, 20 Nov 2010 15:29:13 +0000 Subject: [Python-Dev] [Python-checkins] r86566 - in python/branches/py3k: Doc/glossary.rst Doc/library/inspect.rst Lib/inspect.py Lib/test/test_inspect.py Misc/NEWS Misc/python-wing4.wpr In-Reply-To: References: <20101120150731.2D346E78E@mail.python.org> Message-ID: <4CE7E949.5030300@voidspace.org.uk> On 20/11/2010 15:21, Nick Coghlan wrote: > On Sun, Nov 21, 2010 at 1:07 AM, michael.foord > wrote: >> +Fetching attributes statically >> +------------------------------ >> + >> +Both :func:`getattr` and :func:`hasattr` can trigger code execution when >> +fetching or checking for the existence of attributes. Descriptors, like >> +properties, will be invoked and :meth:`__getattr__` and :meth:`__getattribute__` >> +may be called. >> + >> +For cases where you want passive introspection, like documentation tools, this >> +can be inconvenient. `getattr_static` has the same signature as :func:`getattr` >> +but avoids executing code when it fetches attributes. > This description feels a little strong to me - getattr_static still > executes all those things on the metaclass as it retrieves the > information it needs to do the "static" lookup. Leaving this original > description (which assumes metaclass=type) alone and adding a note > near the end of the section to say that metaclass code is still > executed might be an improvement. Can you give an example of code in a metaclass that may be executed by getattr_static? It's not that I don't believe you I just can't think of an example. Looking up the class and the mro are the only two examples I can think of (klass.__mro__ and instance.__class__ - and they are noted in the docs?) but aren't metaclass specific. Michael > Cheers, > Nick. > -- http://www.voidspace.org.uk/ READ CAREFULLY. 
By accepting and reading this email you agree, on behalf of your employer, to release me from all obligations and waivers arising from any and all NON-NEGOTIATED agreements, licenses, terms-of-service, shrinkwrap, clickwrap, browsewrap, confidentiality, non-disclosure, non-compete and acceptable use policies (?BOGUS AGREEMENTS?) that I have entered into with your employer, its partners, licensors, agents and assigns, in perpetuity, without prejudice to my ongoing rights and privileges. You further represent that you have the authority to release me from any BOGUS AGREEMENTS on behalf of your employer. From solipsis at pitrou.net Sat Nov 20 16:42:30 2010 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sat, 20 Nov 2010 16:42:30 +0100 Subject: [Python-Dev] r86570 - in python/branches/py3k: Lib/unittest/case.py Lib/unittest/test/test_case.py Misc/NEWS References: <20101120153426.47AC0ED9A@mail.python.org> Message-ID: <20101120164230.5dc326bc@pitrou.net> On Sat, 20 Nov 2010 16:34:26 +0100 (CET) michael.foord wrote: > + > + def testPickle(self): > + # Issue 10326 > + > + # Can't use TestCase classes defined in Test class as > + # pickle does not work with inner classes > + test = unittest.TestCase('run') > + for protocol in range(pickle.HIGHEST_PROTOCOL + 1): > + > + # blew up prior to fix > + pickled_test = pickle.dumps(test, protocol=protocol) You must also check that the object can be unpickled, otherwise making TestCase picklable is not only pointless, but misleading the user. Other classes which claim to be picklable (such as e.g. io.BytesIO) are careful to check that unpickling works fine and produces an usable object. Regards Antoine. 
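
The round-trip check asked for above can be sketched as follows. This is an illustration, not the test that was committed: the `Recorder` class and the `round_trip` helper are hypothetical stand-ins, chosen so that pickling only works if the class drops its unpicklable lock when storing state and recreates it when restoring.

```python
import pickle
import threading

class Recorder:
    """Hypothetical class holding one attribute that cannot be pickled."""

    def __init__(self, name):
        self.name = name
        self._lock = threading.Lock()  # lock objects are not picklable

    def __getstate__(self):
        # Store everything except the lock.
        state = self.__dict__.copy()
        del state["_lock"]
        return state

    def __setstate__(self, state):
        # Restore the stored attributes and recreate the lock.
        self.__dict__.update(state)
        self._lock = threading.Lock()

def round_trip(obj):
    """Pickle and unpickle obj with every protocol; return the copies."""
    copies = []
    for protocol in range(pickle.HIGHEST_PROTOCOL + 1):
        data = pickle.dumps(obj, protocol=protocol)
        copies.append(pickle.loads(data))
    return copies

copies = round_trip(Recorder("run"))
# Each copy should be usable, not merely constructible.
print(all(copy.name == "run" for copy in copies))
```

Checking only `pickle.dumps` would pass even if `__setstate__` were missing or wrong; loading the data back and exercising the result is what shows the object survives the trip.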
From fuzzyman at voidspace.org.uk Sat Nov 20 16:48:59 2010 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Sat, 20 Nov 2010 15:48:59 +0000 Subject: [Python-Dev] r86570 - in python/branches/py3k: Lib/unittest/case.py Lib/unittest/test/test_case.py Misc/NEWS In-Reply-To: <20101120164230.5dc326bc@pitrou.net> References: <20101120153426.47AC0ED9A@mail.python.org> <20101120164230.5dc326bc@pitrou.net> Message-ID: <4CE7EDEB.9080706@voidspace.org.uk> On 20/11/2010 15:42, Antoine Pitrou wrote: > On Sat, 20 Nov 2010 16:34:26 +0100 (CET) > michael.foord wrote: >> + >> + def testPickle(self): >> + # Issue 10326 >> + >> + # Can't use TestCase classes defined in Test class as >> + # pickle does not work with inner classes >> + test = unittest.TestCase('run') >> + for protocol in range(pickle.HIGHEST_PROTOCOL + 1): >> + >> + # blew up prior to fix >> + pickled_test = pickle.dumps(test, protocol=protocol) > You must also check that the object can be unpickled, otherwise > making TestCase picklable is not only pointless, but misleading the > user. Other classes which claim to be picklable (such as e.g. > io.BytesIO) are careful to check that unpickling works fine and > produces an usable object. Well, given the *particular* bug it is fixing, ensuring that the TestCase instances can be pickled is enough. If they fail to unpickle that is a bug in pickle and not in unittest. *However*, the test is very easy to extend to what you suggest so I have done it. All the best, Michael > Regards > > Antoine. > > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/fuzzyman%40voidspace.org.uk -- http://www.voidspace.org.uk/ READ CAREFULLY. 
By accepting and reading this email you agree, on behalf of your employer, to release me from all obligations and waivers arising from any and all NON-NEGOTIATED agreements, licenses, terms-of-service, shrinkwrap, clickwrap, browsewrap, confidentiality, non-disclosure, non-compete and acceptable use policies (?BOGUS AGREEMENTS?) that I have entered into with your employer, its partners, licensors, agents and assigns, in perpetuity, without prejudice to my ongoing rights and privileges. You further represent that you have the authority to release me from any BOGUS AGREEMENTS on behalf of your employer. From solipsis at pitrou.net Sat Nov 20 16:59:49 2010 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sat, 20 Nov 2010 16:59:49 +0100 Subject: [Python-Dev] r86570 - in python/branches/py3k: Lib/unittest/case.py Lib/unittest/test/test_case.py Misc/NEWS In-Reply-To: <4CE7EDEB.9080706@voidspace.org.uk> References: <20101120153426.47AC0ED9A@mail.python.org> <20101120164230.5dc326bc@pitrou.net> <4CE7EDEB.9080706@voidspace.org.uk> Message-ID: <1290268789.3560.12.camel@localhost.localdomain> Le samedi 20 novembre 2010 ? 15:48 +0000, Michael Foord a ?crit : > On 20/11/2010 15:42, Antoine Pitrou wrote: > > On Sat, 20 Nov 2010 16:34:26 +0100 (CET) > > michael.foord wrote: > >> + > >> + def testPickle(self): > >> + # Issue 10326 > >> + > >> + # Can't use TestCase classes defined in Test class as > >> + # pickle does not work with inner classes > >> + test = unittest.TestCase('run') > >> + for protocol in range(pickle.HIGHEST_PROTOCOL + 1): > >> + > >> + # blew up prior to fix > >> + pickled_test = pickle.dumps(test, protocol=protocol) > > You must also check that the object can be unpickled, otherwise > > making TestCase picklable is not only pointless, but misleading the > > user. Other classes which claim to be picklable (such as e.g. > > io.BytesIO) are careful to check that unpickling works fine and > > produces an usable object. 
> > Well, given the *particular* bug it is fixing, ensuring that the > TestCase instances can be pickled is enough. If they fail to unpickle > that is a bug in pickle and not in unittest. It wouldn't be, no. pickle provides several different APIs to ensure that state gets correctly stored *and* restored, but it's up to application classes such as TestCase to ensure that they implement those APIs correctly for the intended behaviour. Therefore, checking that pickling "works" fine (or, rather, seems to work) is only half ot the job. (for example, if you define a __getstate__, chances are you must define a __setstate__ too, and it is your job to make it work properly) Antoine. From ncoghlan at gmail.com Sat Nov 20 17:01:06 2010 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 21 Nov 2010 02:01:06 +1000 Subject: [Python-Dev] [Python-checkins] r86566 - in python/branches/py3k: Doc/glossary.rst Doc/library/inspect.rst Lib/inspect.py Lib/test/test_inspect.py Misc/NEWS Misc/python-wing4.wpr In-Reply-To: <4CE7E949.5030300@voidspace.org.uk> References: <20101120150731.2D346E78E@mail.python.org> <4CE7E949.5030300@voidspace.org.uk> Message-ID: On Sun, Nov 21, 2010 at 1:29 AM, Michael Foord wrote: > Can you give an example of code in a metaclass that may be executed by > getattr_static? It's not that I don't believe you I just can't think of an > example. Looking up the class and the mro are the only two examples I can > think of (klass.__mro__ and instance.__class__ - and they are noted in the > docs?) but aren't metaclass specific. The description heavily implies that arbitrary Python code won't be executed by calling getattr_static, and that isn't necessarily true. It's almost certain to be true in the case when the metaclass is type, but can't be guaranteed otherwise. 
The retrieval of __class__ is a normal lookup on the object, so it can trigger all of the things getattr_static is trying to avoid (unavoidable if you want to support proxy classes at all), and the lookup of __mro__ invokes all of those things on the metaclass. I'll see if I'm still of the same opinion after I sleep on it, but my first impression of the docs was that they slightly oversold the strength of the "doesn't execute arbitrary code" aspect of the new function. The existing caveats were all relating to when getattr() and getattr_static() might give different answers, while the additional caveats I was suggesting related to cases where arbitrary code may still be executed. Cheers, Nick. -- Nick Coghlan?? |?? ncoghlan at gmail.com?? |?? Brisbane, Australia From fuzzyman at voidspace.org.uk Sat Nov 20 17:06:59 2010 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Sat, 20 Nov 2010 16:06:59 +0000 Subject: [Python-Dev] [Python-checkins] r86566 - in python/branches/py3k: Doc/glossary.rst Doc/library/inspect.rst Lib/inspect.py Lib/test/test_inspect.py Misc/NEWS Misc/python-wing4.wpr In-Reply-To: References: <20101120150731.2D346E78E@mail.python.org> <4CE7E949.5030300@voidspace.org.uk> Message-ID: <4CE7F223.5040009@voidspace.org.uk> On 20/11/2010 16:01, Nick Coghlan wrote: > On Sun, Nov 21, 2010 at 1:29 AM, Michael Foord > wrote: >> Can you give an example of code in a metaclass that may be executed by >> getattr_static? It's not that I don't believe you I just can't think of an >> example. Looking up the class and the mro are the only two examples I can >> think of (klass.__mro__ and instance.__class__ - and they are noted in the >> docs?) but aren't metaclass specific. > The description heavily implies that arbitrary Python code won't be > executed by calling getattr_static, and that isn't necessarily true. > It's almost certain to be true in the case when the metaclass is type, > but can't be guaranteed otherwise. 
Given the way that member lookups are done by getattr_static I don't think any assumptions about the metaclass are made. I'm happy to be proven wrong (but would rather fix it than document it as an exception). (Actually we assume the metaclass doesn't use __slots__, but only because it isn't *possible* for a metaclass to use __slots__.) > The retrieval of __class__ is a > normal lookup on the object, so it can trigger all of the things > getattr_static is trying to avoid (unavoidable if you want to support > proxy classes at all), and the lookup of __mro__ invokes all of those > things on the metaclass. __class__ and mro lookup are noted in the docs as being exceptions. We could actually remove the __class__ lookup from the list of exceptions by using type(...) instead of obj.__class__. > I'll see if I'm still of the same opinion after I sleep on it, but my > first impression of the docs was that they slightly oversold the > strength of the "doesn't execute arbitrary code" aspect of the new > function. The existing caveats were all relating to when getattr() and > getattr_static() might give different answers, while the additional > caveats I was suggesting related to cases where arbitrary code may > still be executed. I'm happy to change the wording to make the promise less strong. All the best, Michael > Cheers, > Nick. > -- http://www.voidspace.org.uk/ READ CAREFULLY. By accepting and reading this email you agree, on behalf of your employer, to release me from all obligations and waivers arising from any and all NON-NEGOTIATED agreements, licenses, terms-of-service, shrinkwrap, clickwrap, browsewrap, confidentiality, non-disclosure, non-compete and acceptable use policies (?BOGUS AGREEMENTS?) that I have entered into with your employer, its partners, licensors, agents and assigns, in perpetuity, without prejudice to my ongoing rights and privileges. 
You further represent that you have the authority to release me from any BOGUS AGREEMENTS on behalf of your employer.

From fuzzyman at voidspace.org.uk Sat Nov 20 17:10:42 2010
From: fuzzyman at voidspace.org.uk (Michael Foord)
Date: Sat, 20 Nov 2010 16:10:42 +0000
Subject: [Python-Dev] r86570 - in python/branches/py3k: Lib/unittest/case.py Lib/unittest/test/test_case.py Misc/NEWS
In-Reply-To: <1290268789.3560.12.camel@localhost.localdomain>
References: <20101120153426.47AC0ED9A@mail.python.org> <20101120164230.5dc326bc@pitrou.net> <4CE7EDEB.9080706@voidspace.org.uk> <1290268789.3560.12.camel@localhost.localdomain>
Message-ID: <4CE7F302.8090909@voidspace.org.uk>

On 20/11/2010 15:59, Antoine Pitrou wrote:
> On Saturday 20 November 2010 at 15:48 +0000, Michael Foord wrote:
>> On 20/11/2010 15:42, Antoine Pitrou wrote:
>>> On Sat, 20 Nov 2010 16:34:26 +0100 (CET)
>>> michael.foord wrote:
>>>> +
>>>> +    def testPickle(self):
>>>> +        # Issue 10326
>>>> +
>>>> +        # Can't use TestCase classes defined in Test class as
>>>> +        # pickle does not work with inner classes
>>>> +        test = unittest.TestCase('run')
>>>> +        for protocol in range(pickle.HIGHEST_PROTOCOL + 1):
>>>> +
>>>> +            # blew up prior to fix
>>>> +            pickled_test = pickle.dumps(test, protocol=protocol)
>>> You must also check that the object can be unpickled, otherwise
>>> making TestCase picklable is not only pointless, but misleading the
>>> user. Other classes which claim to be picklable (such as e.g.
>>> io.BytesIO) are careful to check that unpickling works fine and
>>> produces an usable object.
>> Well, given the *particular* bug it is fixing, ensuring that the
>> TestCase instances can be pickled is enough. If they fail to unpickle
>> that is a bug in pickle and not in unittest.
> It wouldn't be, no.
> pickle provides several different APIs to ensure
> that state gets correctly stored *and* restored, but it's up to
> application classes such as TestCase to ensure that they implement those
> APIs correctly for the intended behaviour. Therefore, checking that
> pickling "works" fine (or, rather, seems to work) is only half of the
> job.
>
> (for example, if you define a __getstate__, chances are you must define
> a __setstate__ too, and it is your job to make it work properly)

Yes, but unittest.TestCase doesn't implement any of those APIs (and if we did we would *definitely* need to test unpickling). That aside, I have extended the test in the way you suggest.

Actually it would be nice to implement custom pickling / unpickling methods to allow Python 2.7 / 3.2 pickled TestCases to be unpickled on earlier versions of Python. I couldn't see how to change the class name in the pickle using the pickle protocol methods. Suggestions welcome.

Michael

> Antoine.

--
http://www.voidspace.org.uk/
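
The round-trip check discussed in this exchange (pickle the TestCase, unpickle it, and confirm the copy is usable) might look roughly like the following. This is a minimal sketch, not the test that was committed; the helper name `roundtrip_all_protocols` is hypothetical, and it assumes a Python 3 version in which the Issue 10326 fix is present.

```python
import pickle
import unittest


def roundtrip_all_protocols(obj):
    """Pickle and unpickle obj once per protocol; return the restored copies.

    Hypothetical helper, for illustration only.
    """
    copies = []
    for protocol in range(pickle.HIGHEST_PROTOCOL + 1):
        data = pickle.dumps(obj, protocol=protocol)  # the step that failed before the fix
        copies.append(pickle.loads(data))            # the step the review asks to verify
    return copies


# As in the quoted test, use a plain TestCase bound to an existing method ('run').
test = unittest.TestCase('run')
copies = roundtrip_all_protocols(test)

for restored in copies:
    # TestCase equality compares the type and the test method name.
    assert restored == test
    # Exercise the restored instance to confirm it is actually usable.
    restored.assertEqual(set(), set())
```

Checking only `pickle.dumps` would cover the original crash; adding `pickle.loads` and a call on the restored object covers the other half that the review asks for.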
From fuzzyman at voidspace.org.uk Sat Nov 20 17:28:40 2010 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Sat, 20 Nov 2010 16:28:40 +0000 Subject: [Python-Dev] [Python-checkins] r86566 - in python/branches/py3k: Doc/glossary.rst Doc/library/inspect.rst Lib/inspect.py Lib/test/test_inspect.py Misc/NEWS Misc/python-wing4.wpr In-Reply-To: <4CE7F223.5040009@voidspace.org.uk> References: <20101120150731.2D346E78E@mail.python.org> <4CE7E949.5030300@voidspace.org.uk> <4CE7F223.5040009@voidspace.org.uk> Message-ID: <4CE7F738.90706@voidspace.org.uk> On 20/11/2010 16:06, Michael Foord wrote: > On 20/11/2010 16:01, Nick Coghlan wrote: > [snip...] >> The retrieval of __class__ is a >> normal lookup on the object, so it can trigger all of the things >> getattr_static is trying to avoid (unavoidable if you want to support >> proxy classes at all), and the lookup of __mro__ invokes all of those >> things on the metaclass. > > __class__ and mro lookup are noted in the docs as being exceptions. We > could actually remove the __class__ lookup from the list of exceptions > by using type(...) instead of obj.__class__. > Done. >> I'll see if I'm still of the same opinion after I sleep on it, but my >> first impression of the docs was that they slightly oversold the >> strength of the "doesn't execute arbitrary code" aspect of the new >> function. The existing caveats were all relating to when getattr() and >> getattr_static() might give different answers, while the additional >> caveats I was suggesting related to cases where arbitrary code may >> still be executed. > I'm happy to change the wording to make the promise less strong. I've also removed the __mro__ exception. This is done with: type.__dict__['__mro__'].__get__(klass) If you can think of any other exceptions then please let me know. Michael > All the best, > > Michael > >> Cheers, >> Nick. >> > > -- http://www.voidspace.org.uk/ READ CAREFULLY. 
By accepting and reading this email you agree, on behalf of your employer, to release me from all obligations and waivers arising from any and all NON-NEGOTIATED agreements, licenses, terms-of-service, shrinkwrap, clickwrap, browsewrap, confidentiality, non-disclosure, non-compete and acceptable use policies ("BOGUS AGREEMENTS") that I have entered into with your employer, its partners, licensors, agents and assigns, in perpetuity, without prejudice to my ongoing rights and privileges. You further represent that you have the authority to release me from any BOGUS AGREEMENTS on behalf of your employer.

From v+python at g.nevcal.com Sat Nov 20 19:19:11 2010
From: v+python at g.nevcal.com (Glenn Linderman)
Date: Sat, 20 Nov 2010 10:19:11 -0800
Subject: [Python-Dev] Web servers, bytes, str, documentation, Python 3.2a4
In-Reply-To: <4CE7B34D.4020309@netwok.org>
References: <4CE7452A.7050109@g.nevcal.com> <4CE7B34D.4020309@netwok.org>
Message-ID: <4CE8111F.9060502@g.nevcal.com>

On 11/20/2010 3:38 AM, Éric Araujo wrote:
> Hello
>
>> cgitb.enable(0,"d:\temp")
> Isn't that expanded to "d:<tab>emp"?
>

Oops. Yes, that fixes the problem with creation of the temp file, thanks for catching that. I now get a complete report of the original error in the temp file (below). I am a bit less confused now... but it seems that there are still a number of issues. Here is an enumeration of problems I was hard pressed to make before you removed my confusion on this issue.

1. cgitb should expect to report to a binary stdout, using whatever encoding (possibly ASCII) that seems appropriate for the output that it generates.

2. Some appropriate documentation or API or both should be provided to enable a script to set "binary" mode for stdout for CGI scripts. This link demonstrates the confusion (wish I had found it earlier) that is encountered by such lack.
One must tell msvcrt the stream is binary (I had figured that out early on), one must also sidestep the use of the cp1252 default when printing binary, one must also choose a proper text encoding corresponding to the HTTP headers sent. My second email in this thread, sent a few hours after the first, shows a convenient set of cures for all but msvcrt (as long as only "write" is used for writing. "print" support could be added, similarly). Likely something along this line is needed for stdin as well, I haven't yet experimented with uploading binary content to a CGI. One could speculate about having the Python runtime auto-detect CGI mode, but I don't know of any foolproof technique for that, and the selection of the "proper" text encoding depends on the details of the CGI, so having instead an API or two that assists with doing this sort of thing would be better; the need for documentation, at least, seems imperative. 3. subprocess documentation could be improved to point out that when using subprocess.PIPE to talk to a Python subprocess, that the communications will be in binary. Again, I don't know of any way to autodetect the subprocess environment, but if it were possible to select an appropriate encoding and use it consistently on both sides of the PIPE, that would be a convenience to its use; if not possible, documenting the issue, and providing an API to use to easily select such encodings both in client and server, would be helpful. While the layers are all there, and ".buffer" is documented for TextIOWrapper, the use of sys.stdout.buffer and the fact that it has a full set of operations isn't immediately obvious from the reference material; perhaps it is in a tutorial I haven't found, but... I was looking, and didn't find it. Of course, subprocess may launch non-Python programs; they will have their own ideas of binary vs text encoding, so it is important that it is convenient to match them on the Python side. 
It would be nice if subprocess had a mechanism for providing no-deadlock stdout data to the parent prior to the child terminating. A CGI implementation via subprocess shouldn't accumulate all of stdout (or all of stderr, for that matter, although less important). I don't (yet) know enough about Python threading to know if this is possible, but it certainly would be useful. 4. http.server has a number of bugs and limitations. 4a. _url_collapse_path_split seems inefficient (although I have to benchmark it against what I think would be more efficient), and for its only use within http.server it produces the wrong information, so the information has to be recombined and resplit to make it function properly, adding to the perception of inefficiency. 4b. Detection of "executable" on Windows is simply wrong. Unix execution bits do not exist. 4c. is_cgi doesn't properly handle PATHINFO parts of the path, this is the other half of 4a. The Python2.x CGIHTTPServer.py had this right, but the introduction and use of _url_collapse_path_split broke it. 4d. Searching for a ? to find an explicit query string should use .find('?') rather than .rfind('?') as there is no prohibition on using '?' within a query string, AFAIK. 4e. doesn't set the REQUEST_URI, HTTP_HOST, or HTTP_PORT environment variables for the CGI. 4f. Should not send the 200 response until it sees if the CGI sends a Status: header. 4g. Should not buffer all of stdout: subprocess.communicate is inappropriate for a web server CGI interface. The data should stream through to avoid consuming inordinate amounts of memory. The only solution within the current limitations of subprocess is to abandon stderr, force the CGI to do its own error logging, and use shutil.copyfileobj to hook up p.stdout to self.wfile once the Status: message processing has happened. 4h. 
Doesn't seem to close p.stdin (I'm not sure if that is necessary, it may happen when p is garbage collected, but effort was made to close p.stdout and p.stderr, which seem similar.)

*TypeError*
Python 3.2a4: c:\python32\python.exe
Sat Nov 20 09:28:41 2010

A problem occurred in a Python script. Here is the sequence of function calls leading up to the error, in the order they occurred.

d:\my\py\test12.py in **()
    4 import cgitb
    5 sys.stdout.write("out")
    6 fhb = open("fhb", "wb")
    7 cgitb.enable(0,"d:\\temp")
=>  8 fhb.write("abcdef") # try writing non-binary to binary file. Expect an error, of course.
*fhb* = <_io.BufferedWriter name='fhb'>, fhb.*write* =

*TypeError*: 'str' does not support the buffer interface
    args = ("'str' does not support the buffer interface",)
    with_traceback =

From alexander.belopolsky at gmail.com Sat Nov 20 23:32:28 2010
From: alexander.belopolsky at gmail.com (Alexander Belopolsky)
Date: Sat, 20 Nov 2010 17:32:28 -0500
Subject: [Python-Dev] len(chr(i)) = 2?
In-Reply-To: <4CE78F62.7060707@v.loewis.de>
References: <201011192123.14169.victor.stinner@haypocalc.com> <4CE6F93F.9010109@egenix.com> <4CE6FE30.1050903@v.loewis.de> <87hbfc1vnf.fsf@uwakimon.sk.tsukuba.ac.jp> <4CE78F62.7060707@v.loewis.de>
Message-ID:

On Sat, Nov 20, 2010 at 4:05 AM, "Martin v. Löwis" wrote:
..
> A technical correct description would be to say that Python uses either
> 16-bit code units or 32-bit code units; for brevity, these can be called
> narrow and wide code units.

+1

PEP 261 introduced terms "wide Py_UNICODE" and "narrow Py_UNICODE," but when discussion is at Python level, I don't think we should use names of C typedefs. I think "wide/narrow Unicode" builds describe the two options clearly and unambiguously.

I prefer Python-specific terminology to Unicode terms because in Python reference documentation we often discuss details that are outside of the scope of Unicode Standard.
For example, interpretation of lone surrogates on narrow builds is one such detail.

From ziade.tarek at gmail.com Sun Nov 21 00:05:12 2010
From: ziade.tarek at gmail.com (Tarek Ziadé)
Date: Sun, 21 Nov 2010 00:05:12 +0100
Subject: [Python-Dev] Reminder: Distutils vs Distutils2
Message-ID:

Hello,

I have seen some efforts recently to improve Distutils in the standard library. Just a quick reminder of the status of Distutils: it's frozen and is just being bug fixed at this time.

The work I did last year was reverted and pushed to Distutils2. A lot of work has been done since then, and we had 4 GSOC students working this summer on Distutils2. It's backward-incompatible, so we can remove the things we don't like and add new things w/o suffering from backward compatibility pains.

So if you want to improve the tool, or if you have some pending changes to Distutils, I would encourage you to join the Distutils2 effort and not to waste time on Distutils anymore. The patches that did not make it to Distutils can still be added in Distutils2, for most of them.

The workflow we currently use to change the code is as follows and makes it easy for everyone to contribute:

1. clone http://bitbucket.org/tarek/distutils2
2. discuss / propose a patch on IRC (#distutils - Freenode) or on the dedicated mailing list (http://groups.google.com/group/the-fellowship-of-the-packaging)
3. I review and merge all changes at bitbucket, then push them on http://hg.python.org/distutils2

Crazy ideas are welcome. "setup.py" is gone in d2 for instance ;)

Thanks !

Regards.
Tarek

--
Tarek Ziadé | http://ziade.org

From ziade.tarek at gmail.com Sun Nov 21 00:15:41 2010
From: ziade.tarek at gmail.com (Tarek Ziadé)
Date: Sun, 21 Nov 2010 00:15:41 +0100
Subject: [Python-Dev] Reminder: Distutils vs Distutils2
In-Reply-To:
References:
Message-ID:

On Sun, Nov 21, 2010 at 12:05 AM, Tarek Ziadé wrote:
..
> Crazy ideas are welcome.
"setup.py" is gone in d2 for instance ;) But you can still use a similar form if you want - just to mention From ncoghlan at gmail.com Sun Nov 21 04:52:19 2010 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 21 Nov 2010 13:52:19 +1000 Subject: [Python-Dev] [Python-checkins] r86566 - in python/branches/py3k: Doc/glossary.rst Doc/library/inspect.rst Lib/inspect.py Lib/test/test_inspect.py Misc/NEWS Misc/python-wing4.wpr In-Reply-To: <4CE7F223.5040009@voidspace.org.uk> References: <20101120150731.2D346E78E@mail.python.org> <4CE7E949.5030300@voidspace.org.uk> <4CE7F223.5040009@voidspace.org.uk> Message-ID: On Sun, Nov 21, 2010 at 2:06 AM, Michael Foord wrote: >> I'll see if I'm still of the same opinion after I sleep on it, but my >> first impression of the docs was that they slightly oversold the >> strength of the "doesn't execute arbitrary code" aspect of the new >> function. The existing caveats were all relating to when getattr() and >> getattr_static() might give different answers, while the additional >> caveats I was suggesting related to cases where arbitrary code may >> still be executed. > > I'm happy to change the wording to make the promise less strong. Your latest changes may have actually made the stronger wording accurate (I certainly can't think of any loopholes off the top of my head). If you did still want to soften the wording, I'd be inclined to replace the word "avoids" with "minimises" in the appropriate places. Cheers, Nick. -- Nick Coghlan?? |?? ncoghlan at gmail.com?? |?? 
Brisbane, Australia From ncoghlan at gmail.com Sun Nov 21 04:54:11 2010 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 21 Nov 2010 13:54:11 +1000 Subject: [Python-Dev] [Python-checkins] r86566 - in python/branches/py3k: Doc/glossary.rst Doc/library/inspect.rst Lib/inspect.py Lib/test/test_inspect.py Misc/NEWS Misc/python-wing4.wpr In-Reply-To: <20101120150731.2D346E78E@mail.python.org> References: <20101120150731.2D346E78E@mail.python.org> Message-ID: On Sun, Nov 21, 2010 at 1:07 AM, michael.foord wrote: > Author: michael.foord > Date: Sat Nov 20 16:07:30 2010 > New Revision: 86566 > > Log: > Issue 9732: addition of getattr_static to the inspect module > > Modified: > ? python/branches/py3k/Doc/glossary.rst > ? python/branches/py3k/Doc/library/inspect.rst > ? python/branches/py3k/Lib/inspect.py > ? python/branches/py3k/Lib/test/test_inspect.py > ? python/branches/py3k/Misc/NEWS > ? python/branches/py3k/Misc/python-wing4.wpr Unrelated to my previous comment - when adding inspect.getgeneratorstate, I noticed that inspect.getattr_static isn't mentioned in the 3.2 What's New yet (I put a XXX placeholder in for you/Raymond). -- Nick Coghlan?? |?? ncoghlan at gmail.com?? |?? Brisbane, Australia From v+python at g.nevcal.com Sun Nov 21 08:52:45 2010 From: v+python at g.nevcal.com (Glenn Linderman) Date: Sat, 20 Nov 2010 23:52:45 -0800 Subject: [Python-Dev] Web servers, bytes, str, documentation, Python 3.2a4 In-Reply-To: <4CE8111F.9060502@g.nevcal.com> References: <4CE7452A.7050109@g.nevcal.com> <4CE7B34D.4020309@netwok.org> <4CE8111F.9060502@g.nevcal.com> Message-ID: <4CE8CFCD.4040906@g.nevcal.com> On 11/20/2010 10:19 AM, Glenn Linderman wrote: > Oops. Yes, that fixes the problem with creation of the temp file, > thanks for catching that. I now get a complete report of the > original error in the temp file (below). I am a bit less confused > now... but it seems that there are still a number of issues. 
Here is > an enumeration of problems I was hard pressed to make before you > removed my confusion on this issue. Related issues, regarding binary stream requirements for cgi interface. Perhaps the cgi module should have the API to set binary mode. http://bugs.python.org/issue1610654 http://bugs.python.org/issue8077 http://bugs.python.org/issue4953 Sadly, cgi.py input handling seems to depend on the email module, thought to be fixed for 3.2, but it is not clear if that has been achieved, or if the surrogate encode workaround is sufficient for this. More testing needed, but I don't have such a test case developed yet. > 1. cgitb should expect to report to a binary stdout, using whatever > encoding (possibly ASCII) that seems appropriate for the output that > in generates. Maybe cgi.py should have an API to set the stdin and stdout to binary streams. Although cgi.py deals more with stdin than stdout, cgitb deals more with stdout. Created http://bugs.python.org/issue10479 > > 2. Some appropriate documentation or API or both should be provided to > enable a script to set "binary" mode for stdout for CGI scripts. This > link > > demonstrates the confusion (wish I had found it earlier) that is > encountered by such lack. One must tell msvcrt the stream is binary > (I had figured that out early on), one must also sidestep the use of > the cp1252 default when printing binary, one must also choose a proper > text encoding corresponding to the HTTP headers sent. My second email > in this thread, sent a few hours after the first, shows a convenient > set of cures for all but msvcrt (as long as only "write" is used for > writing. "print" support could be added, similarly). Likely > something along this line is needed for stdin as well, I haven't yet > experimented with uploading binary content to a CGI. 
> > One could speculate about having the Python runtime auto-detect CGI > mode, but I don't know of any foolproof technique for that, and the > selection of the "proper" text encoding depends on the details of the > CGI, so having instead an API or two that assists with doing this sort > of thing would be better; the need for documentation, at least, seems > imperative. Created http://bugs.python.org/issue10480 > > 3. subprocess documentation could be improved to point out that when > using subprocess.PIPE to talk to a Python subprocess, that the > communications will be in binary. Again, I don't know of any way to > autodetect the subprocess environment, but if it were possible to > select an appropriate encoding and use it consistently on both sides > of the PIPE, that would be a convenience to its use; if not possible, > documenting the issue, and providing an API to use to easily select > such encodings both in client and server, would be helpful. > > While the layers are all there, and ".buffer" is documented for > TextIOWrapper, the use of sys.stdout.buffer and the fact that it has a > full set of operations isn't immediately obvious from the reference > material; perhaps it is in a tutorial I haven't found, but... I was > looking, and didn't find it. > > Of course, subprocess may launch non-Python programs; they will have > their own ideas of binary vs text encoding, so it is important that it > is convenient to match them on the Python side. > > It would be nice if subprocess had a mechanism for providing > no-deadlock stdout data to the parent prior to the child terminating. > A CGI implementation via subprocess shouldn't accumulate all of stdout > (or all of stderr, for that matter, although less important). I don't > (yet) know enough about Python threading to know if this is possible, > but it certainly would be useful. http://bugs.python.org/issue1048 for subprocess to document that communicate produces byte stream output. 
http://bugs.python.org/issue10482 for subprocess enhancements to handle more cases without deadlock.

Found http://bugs.python.org/issue4571 which documents how to switch stdin/stdout/stderr to binary mode, and even back! I couldn't track the documented change to the actual documentation, though, but I did find it in section 26.1, under the documentation for the three stdio streams:

    import sys

    def make_streams_binary():
        sys.stdin = sys.stdin.detach()
        sys.stdout = sys.stdout.detach()

> 4. http.server has a number of bugs and limitations.
> 4a. _url_collapse_path_split seems inefficient (although I have to
> benchmark it against what I think would be more efficient), and for
> its only use within http.server it produces the wrong information, so
> the information has to be recombined and resplit to make it function
> properly, adding to the perception of inefficiency.
> 4b. Detection of "executable" on Windows is simply wrong. Unix
> execution bits do not exist.

http://bugs.python.org/issue10483 for 4b.

> 4c. is_cgi doesn't properly handle PATHINFO parts of the path, this is
> the other half of 4a. The Python2.x CGIHTTPServer.py had this right,
> but the introduction and use of _url_collapse_path_split broke it.

http://bugs.python.org/issue10484 for 4a and 4c.

> 4d. Searching for a ? to find an explicit query string should use
> .find('?') rather than .rfind('?') as there is no prohibition on using
> '?' within a query string, AFAIK.

http://bugs.python.org/issue10485 for 4d.

> 4e. doesn't set the REQUEST_URI, HTTP_HOST, or HTTP_PORT environment
> variables for the CGI.

http://bugs.python.org/issue10486 for 4e.

> 4f. Should not send the 200 response until it sees if the CGI sends a
> Status: header.

http://bugs.python.org/issue10487 for 4f and 4g.

> 4g. Should not buffer all of stdout: subprocess.communicate is
> inappropriate for a web server CGI interface. The data should stream
> through to avoid consuming inordinate amounts of memory.
The only > solution within the current limitations of subprocess is to abandon > stderr, force the CGI to do its own error logging, and use > shutil.copyfileobj to hook up p.stdout to self.wfile once the Status: > message processing has happened. > 4h. Doesn't seem to close p.stdin (I'm not sure if that is necessary, > it may happen when p is garbage collected, but effort was made to > close p.stdout and p.stderr, which seem similar.) Discovered that subprocess.communicate closes p.stdin, so it wasn't needed until I quit using .communicate in my version of the code. -------------- next part -------------- An HTML attachment was scrubbed... URL: From stephen at xemacs.org Sun Nov 21 13:55:12 2010 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Sun, 21 Nov 2010 21:55:12 +0900 Subject: [Python-Dev] len(chr(i)) = 2? In-Reply-To: <4CE78F62.7060707@v.loewis.de> References: <201011192123.14169.victor.stinner@haypocalc.com> <4CE6F93F.9010109@egenix.com> <4CE6FE30.1050903@v.loewis.de> <87hbfc1vnf.fsf@uwakimon.sk.tsukuba.ac.jp> <4CE78F62.7060707@v.loewis.de> Message-ID: <8739qukf9r.fsf@uwakimon.sk.tsukuba.ac.jp> "Martin v. L?wis" writes: > Am 20.11.2010 05:11, schrieb Stephen J. Turnbull: > > "Martin v. L?wis" writes: > > > > > The term "UCS-2" is a character set that can encode only encode 65536 > > > characters; it thus refers to Unicode 1.1. According to the Unicode > > > Consortium's FAQ, the term UCS-2 should be avoided these days. > > > > So what do you propose we call the Python implementation? > > A technical correct description would be to say that Python uses either > 16-bit code units or 32-bit code units; for brevity, these can be called > narrow and wide code units. I agree that's technically correct. Unfortunately, it's also useless to anybody who doesn't already know more about Unicode than anybody should have to know. > > and therefore is not UTF-16 conforming. > > I disagree. Python does "conform" to "UTF-16" I'm sure the codecs do. 
But the Unicode standard doesn't care about the parts of the process, it cares about what it does as a whole. Python's internal coding does not conform to UTF-16, and that internal coding can, under certain conditions, escape to the outside world as invalid "Unicode" output.

 > > AFAIK this was not supposed to change in Python 3; indexing and
 > > slicing go by code unit (isomorphic to UCS-n), not character, and due
 > > to PEP 383 4-octet builds do not conform (internally) to UTF-32, and
 > > can produce output that conforms to Unicode not at all (as a user
 > > option, of course, but it's still non-conformant).
 >
 > What behavior specifically do you consider non-conforming, and what
 > specific specification do you think it is not conforming to? For
 > example, it *is* fully conforming with UTF-8.

Oh,

    f = open('/tmp/broken','wt',encoding='utf8',errors='surrogateescape')
    f.write(chr(int('dc80',16)))
    f.close()

for one. That produces a non-UTF-8 file in a 32-bit-code-unit build. You can say, "oh, but that's not really a UTF-8 codec", and I'd agree. Nevertheless, the program is able to produce output from internal "Unicode" strings that does not conform to Unicode at all. A Unicode-conforming Python implementation would error at the chr() call, or perhaps would not provide surrogateescape error handlers.

It is, of course, possible to write Python programs that conform (and easier than in any other language I know), but Python itself does not conform to post-1.1 Unicode standards. Too bad for the standards: "Although practicality beats purity."

The point is that internal code is *not* UTF-16 (or -32), but it *is* isomorphic to UCS-2 (or -4). *That is very useful information to users*, it's not a technical detail of interest only to Unicode geeks. It means that if you stick to defined characters in the BMP when giving Python input, then slicing and indexing unicode (Python 2) or str (Python 3) objects gives only valid output even in builds with 16-bit code units.
OTOH, invalid processing (involving functions like 'chr' or input using
surrogateescape codecs) can lead to invalid output even in builds with
32-bit code units.

IMO, saying "UCS-2" or "UCS-4" tells ordinary developers most of what
they need to know about the limitations of their Python vis-a-vis full
conformance, at least with respect to the string manipulation functions.

From rdmurray at bitdance.com Sun Nov 21 18:18:20 2010
From: rdmurray at bitdance.com (R. David Murray)
Date: Sun, 21 Nov 2010 12:18:20 -0500
Subject: [Python-Dev] Web servers, bytes, str, documentation, Python 3.2a4
In-Reply-To: <4CE8CFCD.4040906@g.nevcal.com>
References: <4CE7452A.7050109@g.nevcal.com> <4CE7B34D.4020309@netwok.org>
 <4CE8111F.9060502@g.nevcal.com> <4CE8CFCD.4040906@g.nevcal.com>
Message-ID: <20101121171821.195552194AC@kimball.webabinitio.net>

On Sat, 20 Nov 2010 23:52:45 -0800, Glenn Linderman wrote:
> Sadly, cgi.py input handling seems to depend on the email module,
> thought to be fixed for 3.2, but it is not clear if that has been
> achieved, or if the surrogate encode workaround is sufficient for this.
> More testing needed, but I don't have such a test case developed yet.

Indeed, this should theoretically be fixable now. The email module is
now perfectly capable of both consuming and producing binary data. The
user of the module doesn't need to care how this was achieved unless
they want to do processing of non-RFC conformant data.

I want to look at the CGI issue, but I'm not sure when I'll get to it.

--
R. David Murray                                      www.bitdance.com

From jcea at jcea.es Sun Nov 21 18:27:42 2010
From: jcea at jcea.es (Jesus Cea)
Date: Sun, 21 Nov 2010 18:27:42 +0100
Subject: [Python-Dev] Mercurial Schedule
In-Reply-To: <4CE2CF8F.4040500@jcea.es>
References: <4CE2CF8F.4040500@jcea.es>
Message-ID: <4CE9568E.4010102@jcea.es>

What is the impact in the buildbot architecture? Slaves must do
anything? At least they need to have mercurial installed, I guess.

What, as a buildslave manager, must I do to ready my server for the
migration?

--
Jesus Cea Avion
jcea at jcea.es - http://www.jcea.es/
jabber / xmpp:jcea at jabber.org
"Things are not so easy"
"My name is Dump, Core Dump"
"Love is placing your happiness in the happiness of another" - Leibniz

From rdmurray at bitdance.com Sun Nov 21 18:38:25 2010
From: rdmurray at bitdance.com (R. David Murray)
Date: Sun, 21 Nov 2010 12:38:25 -0500
Subject: [Python-Dev] len(chr(i)) = 2?
In-Reply-To: <8739qukf9r.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <201011192123.14169.victor.stinner@haypocalc.com>
 <4CE6F93F.9010109@egenix.com> <4CE6FE30.1050903@v.loewis.de>
 <87hbfc1vnf.fsf@uwakimon.sk.tsukuba.ac.jp> <4CE78F62.7060707@v.loewis.de>
 <8739qukf9r.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: <20101121173825.B1BFB235977@kimball.webabinitio.net>

On Sun, 21 Nov 2010 21:55:12 +0900, "Stephen J. Turnbull" wrote:
> "Martin v. Löwis" writes:
> > Am 20.11.2010 05:11, schrieb Stephen J. Turnbull:
> > > "Martin v. Löwis" writes:
> > >
> > > > The term "UCS-2" is a character set that can encode only 65536
> > > > characters; it thus refers to Unicode 1.1. According to the Unicode
> > > > Consortium's FAQ, the term UCS-2 should be avoided these days.
> > >
> > > So what do you propose we call the Python implementation?
> > > > A technical correct description would be to say that Python uses either > > 16-bit code units or 32-bit code units; for brevity, these can be called > > narrow and wide code units. > > I agree that's technically correct. Unfortunately, it's also useless > to anybody who doesn't already know more about Unicode than anybody > should have to know. [...] > The point is that internal code is *not* UTF-16 (or -32), but it *is* > isomorphic to UCS-2 (or -4). *That is very useful information to > users*, it's not a technical detail of interest only to Unicode geeks. > It means that if you stick to defined characters in the BMP when > giving Python input, then slicing and indexing unicode (Python 2) or > str (Python 3) objects gives only valid output even in builds with > 16-bit code units. OTOH, invalid processing (involving functions like > 'chr' or input using surrogateescape codecs) can lead to invalid > output even in builds with 32-bit code units. > > IMO, saying "UCS-2" or "UCS-4" tells ordinary developers most of what > they need to know about the limitations of their Python vis-a-vis full > conformance, at least with respect to the string manipulation functions. I'm sorry, but I have to disagree. As a relative unicode ignoramus, "UCS-2" and "UCS-4" convey almost no information to me, and the bits I have heard about them on this list have only confused me. On the other hand, I understand that 'narrow' means that fewer bytes are used for each internal character, meaning that some unicode characters need to be represented by more than one string element, and thus that slicing strings containing such characters on a narrow build causes problems. Now, you could tell me the same information using the terms 'UCS-2' and 'UCS-4' instead of 'narrow' and 'wide', but to my ear 'narrow' and 'wide' convey a better gut level feeling for what is going on than 'UCS-2' and 'UCS-4' do. 
And it avoids any question of whether or not Python's internal representation actually conforms to whatever standard it is that UCS refers to, a point on which there seems to be some dissension. Having written the above, I googled for UCS-2 and got the Wikipedia article on UTF16/UCS-2 [1]. Scanning that article, I do not see anything that would clue me in to the problems of slicing strings in a Python narrow build. Indeed, reading that article with my limited unicode knowledge, if I were told Python used UCS-2, I would assume that non-BMP characters could not be processed by a Python narrow build. -- R. David Murray www.bitdance.com [1] http://en.wikipedia.org/wiki/UTF-16/UCS-2 From g.brandl at gmx.net Sun Nov 21 18:58:53 2010 From: g.brandl at gmx.net (Georg Brandl) Date: Sun, 21 Nov 2010 18:58:53 +0100 Subject: [Python-Dev] Mercurial Schedule In-Reply-To: <4CE9568E.4010102@jcea.es> References: <4CE2CF8F.4040500@jcea.es> <4CE9568E.4010102@jcea.es> Message-ID: Am 21.11.2010 18:27, schrieb Jesus Cea: > What is the impact in the buildbot architecture?. Slaves must do > anything?. At least they need to have mercurial installed, I guess. > > What, as a buildslave manager, must I do to ready my server for the > migration?. Apart from having Mercurial installed and "hg" in the PATH (that will be important for Windows I assume), I don't think anything else is required. Georg From raymond.hettinger at gmail.com Sun Nov 21 19:17:57 2010 From: raymond.hettinger at gmail.com (Raymond Hettinger) Date: Sun, 21 Nov 2010 10:17:57 -0800 Subject: [Python-Dev] len(chr(i)) = 2? 
In-Reply-To: <20101121173825.B1BFB235977@kimball.webabinitio.net>
References: <201011192123.14169.victor.stinner@haypocalc.com>
 <4CE6F93F.9010109@egenix.com> <4CE6FE30.1050903@v.loewis.de>
 <87hbfc1vnf.fsf@uwakimon.sk.tsukuba.ac.jp> <4CE78F62.7060707@v.loewis.de>
 <8739qukf9r.fsf@uwakimon.sk.tsukuba.ac.jp>
 <20101121173825.B1BFB235977@kimball.webabinitio.net>
Message-ID: <60F8726F-C1C2-4803-8B8E-688EF0443FA0@gmail.com>

On Nov 21, 2010, at 9:38 AM, R. David Murray wrote:
>
> I'm sorry, but I have to disagree. As a relative unicode ignoramus,
> "UCS-2" and "UCS-4" convey almost no information to me, and the bits I
> have heard about them on this list have only confused me.

From the user's point of view, it doesn't much matter which encoding
is used internally.

Neither UTF-16 nor UCS-2 is exactly correct anyway. The former encodes
the entire range of Unicode characters in a variable-length code (a
character is usually 2 bytes but is sometimes 4 bytes long). The latter
encodes only a subset of Unicode (the Basic Multilingual Plane) in a
fixed-length code of 2 bytes per character. What we use internally
looks like UTF-16, but a character encoded with 4 bytes is treated as
two 2-byte characters (hence the subject of this thread). Our hybrid
internal coding lets us handle the entire range of Unicode while
getting speed and simplicity by doing len() and slicing with a
surrogate pair being treated as two separate characters.

For the "wide" build, the entire range of Unicode is encoded at 4 bytes
per character, and slicing/len operate correctly since every character
is the same length. This used to be called UCS-4 and is now UTF-32.

So, with "wide" builds there isn't much confusion (except perhaps
unfamiliar terminology). The real issue seems to be that for "narrow"
builds, none of the usual encoding names is exactly correct.

From a user's point of view, the actual encoding or encoding name
doesn't matter much.
They just need to be able to predict the relevant behaviors (memory consumption and len/slicing behavior). For the narrow build, that behavior is: - Characters in the BMP consume 2 bytes and count as one char for purposes of len and slicing. - Characters above the BMP consume 4 bytes and counts as two distinct chars for purpose of len and slicing. For wide builds, all characters are 4 bytes and count as a single char for len and slicing. Hope this helps, Raymond From martin at v.loewis.de Sun Nov 21 19:51:44 2010 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sun, 21 Nov 2010 19:51:44 +0100 Subject: [Python-Dev] len(chr(i)) = 2? In-Reply-To: <8739qukf9r.fsf@uwakimon.sk.tsukuba.ac.jp> References: <201011192123.14169.victor.stinner@haypocalc.com> <4CE6F93F.9010109@egenix.com> <4CE6FE30.1050903@v.loewis.de> <87hbfc1vnf.fsf@uwakimon.sk.tsukuba.ac.jp> <4CE78F62.7060707@v.loewis.de> <8739qukf9r.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <4CE96A40.1050705@v.loewis.de> > > I disagree. Python does "conform" to "UTF-16" > > I'm sure the codecs do. But the Unicode standard doesn't care about > the parts of the process, it cares about what it does as a whole. Chapter and verse? > Python's internal coding does not conform to UTF-16, and that internal > coding can, under certain conditions, escape to the outside world as > invalid "Unicode" output. I'm fairly certain there are provisions in the Unicode standard for such behavior (taking into account "certain conditions"). > > What behavior specifically do you consider non-conforming, and what > > specific specification do you think it is not conforming to? For > > example, it *is* fully conforming with UTF-8. > > Oh, > > f = open('/tmp/broken','wt',encoding='utf8',errors='surrogateescape') > f.write(chr(int('dc80',16))) > f.close() > > for one. That produces a non-UTF-8 file Right. You are using an API that does not promise to create UTF-8, and hence isn't UTF-8. 
The Unicode standard certainly allows implementations to use character encoding schemes other than UTF-8; this one being "UTF-8 with surrogate escapes", which is different from "UTF-8" (IANA MIBEnum 106). > You can say, "oh, but that's not really a UTF-8 codec", and I'd agree. See above :-) > Nevertheless, the program is able to produce output from internal > "Unicode" strings that does not conform to Unicode at all. *Any* Unicode implementation will do that, since they all have to support legacy encodings in some form. This is certainly conforming to the Unicode standard, and in fact one of the primary Unicode design principles. > A Unicode- > conforming Python implementation would error at the chr() call, or > perhaps would not provide surrogateescape error handlers. Chapter and verse? > "Although practicality beats purity." The Unicode standard itself is based on practicality. It wouldn't have received the success it did if it was based on purity only (and indeed, was often rejected in cases where it put purity over practicality, e.g. with the Hangul syllables). Regards, Martin From rdmurray at bitdance.com Sun Nov 21 20:29:15 2010 From: rdmurray at bitdance.com (R. David Murray) Date: Sun, 21 Nov 2010 14:29:15 -0500 Subject: [Python-Dev] len(chr(i)) = 2? In-Reply-To: <60F8726F-C1C2-4803-8B8E-688EF0443FA0@gmail.com> References: <201011192123.14169.victor.stinner@haypocalc.com> <4CE6F93F.9010109@egenix.com> <4CE6FE30.1050903@v.loewis.de> <87hbfc1vnf.fsf@uwakimon.sk.tsukuba.ac.jp> <4CE78F62.7060707@v.loewis.de> <8739qukf9r.fsf@uwakimon.sk.tsukuba.ac.jp> <20101121173825.B1BFB235977@kimball.webabinitio.net> <60F8726F-C1C2-4803-8B8E-688EF0443FA0@gmail.com> Message-ID: <20101121192915.0FFE1209B7A@kimball.webabinitio.net> On Sun, 21 Nov 2010 10:17:57 -0800, Raymond Hettinger wrote: > On Nov 21, 2010, at 9:38 AM, R. David Murray wrote: > > I'm sorry, but I have to disagree. 
As a relative unicode ignoramus,
> > "UCS-2" and "UCS-4" convey almost no information to me, and the bits I
> > have heard about them on this list have only confused me.
[...]
> From a user's point of view, the actual encoding or encoding name
> doesn't matter much. They just need to be able to predict the relevant
> behaviors (memory consumption and len/slicing behavior).
>
> For the narrow build, that behavior is:
> - Characters in the BMP consume 2 bytes and count as one char
>   for purposes of len and slicing.
> - Characters above the BMP consume 4 bytes and count as
>   two distinct chars for purposes of len and slicing.
>
> For wide builds, all characters are 4 bytes and count as a single
> char for len and slicing.
>
> Hope this helps,

Thank you, that nicely summarizes and confirms what I thought I knew
about wide versus narrow build. And as I said, using the names
UCS-2/UCS-4 would only *confuse* that understanding, not clarify it.

--
R. David Murray                                      www.bitdance.com

From alexander.belopolsky at gmail.com Sun Nov 21 23:13:22 2010
From: alexander.belopolsky at gmail.com (Alexander Belopolsky)
Date: Sun, 21 Nov 2010 17:13:22 -0500
Subject: [Python-Dev] len(chr(i)) = 2?
In-Reply-To: <4CE6EF91.1040803@v.loewis.de>
References: <201011192123.14169.victor.stinner@haypocalc.com>
 <4CE6EF91.1040803@v.loewis.de>
Message-ID:

On Fri, Nov 19, 2010 at 4:43 PM, "Martin v. Löwis" wrote:
>> In my opinion, the question is more why was it not fixed in Python2. I suppose
>> that the answer is something ugly like "backward compatibility" or "historical
>> reasons" :-)
>
> No, there was a deliberate decision to not support that, see
>
> http://www.python.org/dev/peps/pep-0261/
>
> There had been a long discussion on this specific detail when PEP 261
> was written, and in the end, an explicit, deliberate, considered
> decision was made to raise a ValueError.
>

Yes, the existence of PEP 261 was one of the reasons I was surprised
that a change like this was made without deliberation.
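
The narrow-build arithmetic discussed in this thread can be reproduced
on any interpreter, as a rough sketch. It assumes a Python where len()
counts code points (CPython 3.3 and later), so the narrow-build view is
simulated by encoding to UTF-16; the helper names utf16_units and
code_points are made up for illustration and are not part of any
Python API.

```python
import struct


def utf16_units(s):
    """Return the 16-bit code units of s, i.e. what a narrow build stored."""
    data = s.encode("utf-16-le")  # 2 bytes per code unit, no BOM
    return list(struct.unpack("<%dH" % (len(data) // 2), data))


def code_points(units):
    """Yield code points from 16-bit code units, joining surrogate pairs."""
    i = 0
    n = len(units)
    while i < n:
        u = units[i]
        is_high = 0xD800 <= u <= 0xDBFF
        if is_high and i + 1 < n and 0xDC00 <= units[i + 1] <= 0xDFFF:
            # High surrogate followed by low surrogate: one non-BMP character.
            yield 0x10000 + ((u - 0xD800) << 10) + (units[i + 1] - 0xDC00)
            i += 2
        else:
            # BMP character, or a lone surrogate passed through unchanged.
            yield u
            i += 1


if __name__ == "__main__":
    s = "a\U0001D11E"  # 'a' followed by one character above the BMP
    units = utf16_units(s)
    print(len(units), [hex(u) for u in units])
    print([hex(cp) for cp in code_points(units)])
```

The length of utf16_units(s) is what len(s) reported on a narrow
build, and code_points() is one way to walk such a sequence by
character rather than by code unit.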
Personally, I've never used chr() or ord() other than on the python command prompt. Processing text one character at a time is just too slow in Python. So for my own use cases, the change is quite welcome. I also find that with bytes() items being int in 3.x more or less removes the need for ord(). On the other hand any 2.x program that uses unichr() and ord() is very likely to exhibit subtly buggy behavior when ported to 3.x. I don't think len(chr(i)) = 2 is likely to cause problems, but map(ord, s) not being an iterator over code points is likely to break naive programs. This is especially true because as far as I can tell there is no easy way to iterate over code points in a Python string on a narrow build. From merwok at netwok.org Mon Nov 22 01:54:34 2010 From: merwok at netwok.org (=?UTF-8?B?w4lyaWMgQXJhdWpv?=) Date: Mon, 22 Nov 2010 01:54:34 +0100 Subject: [Python-Dev] [Python-checkins] r86633 - in python/branches/py3k: Doc/library/inspect.rst Doc/whatsnew/3.2.rst Lib/inspect.py Lib/test/test_inspect.py Misc/NEWS In-Reply-To: <20101121034404.52924F20A@mail.python.org> References: <20101121034404.52924F20A@mail.python.org> Message-ID: <4CE9BF4A.1020302@netwok.org> > Author: nick.coghlan > New Revision: 86633 > > Issue #10220: Add inspect.getgeneratorstate(). Initial patch by Rodolpho Eckhardt > > Modified: python/branches/py3k/Doc/library/inspect.rst > ============================================================================== > --- python/branches/py3k/Doc/library/inspect.rst (original) > +++ python/branches/py3k/Doc/library/inspect.rst Sun Nov 21 04:44:04 2010 > @@ -620,3 +620,25 @@ > # in which case the descriptor itself will > # have to do > pass > + > +Current State of a Generator > +---------------------------- > + > +When implementing coroutine schedulers and for other advanced uses of > +generators, it is useful to determine whether a generator is currently > +executing, is waiting to start or resume or execution, or has already > +terminated. 
:func:`getgeneratorstate` allows the current state of a
> +generator to be determined easily.
> +
> +.. function:: getgeneratorstate(generator)
> +
> +   Get current state of a generator-iterator.
> +
> +   Possible states are:
> +     GEN_CREATED: Waiting to start execution.
> +     GEN_RUNNING: Currently being executed by the interpreter.
> +     GEN_SUSPENDED: Currently suspended at a yield expression.
> +     GEN_CLOSED: Execution has completed.

I wonder if those shouldn't be marked up as :data: or something to make
them indexed.

From v+python at g.nevcal.com Mon Nov 22 04:59:54 2010
From: v+python at g.nevcal.com (Glenn Linderman)
Date: Sun, 21 Nov 2010 19:59:54 -0800
Subject: [Python-Dev] Web servers, bytes, str, documentation, Python 3.2a4
In-Reply-To: <20101121171821.195552194AC@kimball.webabinitio.net>
References: <4CE7452A.7050109@g.nevcal.com> <4CE7B34D.4020309@netwok.org>
 <4CE8111F.9060502@g.nevcal.com> <4CE8CFCD.4040906@g.nevcal.com>
 <20101121171821.195552194AC@kimball.webabinitio.net>
Message-ID: <4CE9EABA.1090306@g.nevcal.com>

On 11/21/2010 9:18 AM, R. David Murray wrote:
> I want to look at the CGI issue, but I'm not sure when I'll get to it.

Actually, since this code was working before 3.x, and if email.parser
can now accept binary streams, it seems like maybe the only thing that
might be wrong is that presently it is getting a text stream instead, so
that is something cgi.py or the application program would have to
switch, and then maybe some testing would discover correctness, or maybe
a specification of UTF-8 as the encoding to use for the text parts would
have to be done.

From rdmurray at bitdance.com Mon Nov 22 05:39:57 2010
From: rdmurray at bitdance.com (R.
David Murray) Date: Sun, 21 Nov 2010 23:39:57 -0500 Subject: [Python-Dev] Web servers, bytes, str, documentation, Python 3.2a4 In-Reply-To: <4CE9EABA.1090306@g.nevcal.com> References: <4CE7452A.7050109@g.nevcal.com> <4CE7B34D.4020309@netwok.org> <4CE8111F.9060502@g.nevcal.com> <4CE8CFCD.4040906@g.nevcal.com> <20101121171821.195552194AC@kimball.webabinitio.net> <4CE9EABA.1090306@g.nevcal.com> Message-ID: <20101122043957.2A5D6235C7A@kimball.webabinitio.net> On Sun, 21 Nov 2010 19:59:54 -0800, Glenn Linderman wrote: > On 11/21/2010 9:18 AM, R. David Murray wrote: > > I want to look at the CGI issue, but I'm not sure when I'll get to it. > > Actually, since this code was working before 3.x, and if email.parser > can now accept binary streams, it seems like maybe the only thing that > might be wrong is that presently it is getting a text stream instead, so > that is something cgi.py or the application program would have to > switch, and then maybe some testing would discover correctness, or maybe > a specification of UTF-8 as the encoding to use for the text parts would > have to be done. Well, given the bytes/string split in Python3, code definitely has to be changed to make this work, since you have to explicitly call bytes processing routines (message_from_bytes, message_from_binary_file, BytesFeedparser, etc) to parse binary data, and likewise use BytesGenerator to emit binary data. -- R. David Murray www.bitdance.com From brian.curtin at gmail.com Mon Nov 22 06:14:24 2010 From: brian.curtin at gmail.com (Brian Curtin) Date: Sun, 21 Nov 2010 23:14:24 -0600 Subject: [Python-Dev] Bug week-end on the 20th-21st? In-Reply-To: <20101025220401.0406722b@pitrou.net> References: <20101023190828.47b7f03e@pitrou.net> <20101025153242.2FBEC219F92@kimball.webabinitio.net> <20101025220401.0406722b@pitrou.net> Message-ID: On Mon, Oct 25, 2010 at 15:04, Antoine Pitrou wrote: > On Mon, 25 Oct 2010 11:32:42 -0400 > "R. 
David Murray" wrote:
> > On Mon, 25 Oct 2010 12:22:24 -0200, Rodrigo Bernardo Pimentel
> > <rbp at isnomore.net> wrote:
> > >> Am 23.10.2010 19:08, schrieb Antoine Pitrou:
> > >>> The first 3.2 beta is scheduled by Georg for November 13th.
> > >>> What would you think of scheduling a bug week-end one week later, that
> > >>> is on November 20th and 21st? We would need enough core developers to
> > >>> be available on #python-dev.
> > >
> > >FWIW, I'm +1, and I'll try to get the Sao Paulo users group to participate.
> >
> > I think this is a great idea (both Antoine's initial suggestion and the
> > idea of getting users groups to participate).
> >
> > I'll be around and able to participate that weekend except for evening
> > US Eastern time.
>
> Ok, so 20th-21st of November it shall be!
>
> Regards
>
> Antoine.

Although a few time zones are still celebrating Bug Weekend, it looks
like at least 76 bugs got closed out [0]. Some of those happened thanks
to a number of first time contributors.

Thanks to everyone for their efforts!

[0] http://bugs.python.org/issue?%40columns=title&%40columns=id&activity=from+2010-11-20+to+2010-11-22&%40columns=activity&%40sort=activity&%40group=priority&status=2&%40columns=status&%40pagesize=50&%40startwith=0&%40action=search

From stephen at xemacs.org Mon Nov 22 06:28:13 2010
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Mon, 22 Nov 2010 14:28:13 +0900
Subject: [Python-Dev] len(chr(i)) = 2?
In-Reply-To: <4CE96A40.1050705@v.loewis.de>
References: <201011192123.14169.victor.stinner@haypocalc.com>
 <4CE6F93F.9010109@egenix.com> <4CE6FE30.1050903@v.loewis.de>
 <87hbfc1vnf.fsf@uwakimon.sk.tsukuba.ac.jp> <4CE78F62.7060707@v.loewis.de>
 <8739qukf9r.fsf@uwakimon.sk.tsukuba.ac.jp> <4CE96A40.1050705@v.loewis.de>
Message-ID: <87ipzqc4gi.fsf@uwakimon.sk.tsukuba.ac.jp>

"Martin v. Löwis" writes:

 > Chapter and verse?
Unicode 5.0, Chapter 3, verse C9: When a process generates a code unit sequence which purports to be in a Unicode character encoding form, it shall not emit ill-formed code sequences. I think anything called "UTF-8 something" is likely to be taken to "purport". Furthermore, users don't necessarily see which error handlers are being used. A user who specifies "utf8" as the output codec is likely to be rather surprised if non-UTF-8 is emitted because the app specified surrogateescape. Eg, consider a script which munges file descriptions into reasonable-length file names on Unix. Yes, technically the non-Unicode output is the app's fault, but I expect many users will put some blame on Python. I am in full agreement with you about the technicalities, but I am looking for ways to clue in users that (a) the technicalities matter, and (b) that Python does a *very* good job of making things as safe as possible without becoming unable to handle bytes. I think "wide" vs. "narrow" fails at both. It focuses on storage issues, which of course are important, but at the cost of ignoring the fact that for users of non-BMP characters 32-bit code units are much safer. Users who need non-BMP characters are relatively few, and at least at the present time most are painfully aware of the need to care for technicalities. I expect them to be pleasantly surprised by how easy it is to get reasonably safe behavior even from a 16-bit build. > > Python's internal coding does not conform to UTF-16, and that internal > > coding can, under certain conditions, escape to the outside world as > > invalid "Unicode" output. > > I'm fairly certain there are provisions in the Unicode standard for such > behavior (taking into account "certain conditions"). Sure. There's nothing in the Unicode standard that says you have to conform to it unless you claim to conform to it. So it is valid to say that Python's Unicode codecs without surrogateescape do conform. 
The point is that Python does not, even if all of the input is valid Unicode, because of the provision of surrogateescape and the lack of Unicode conformance-checking for certain internal functionality like chr() and slicing. You can say "we don't make any such claim", but IMO the distinction in question is too fine a point for most users, and requires a very large amount of Unicode knowledge (not to mention standards geekiness) to even understand the precise statement. "Unicode support" to users should mean that Python does the right thing, not that if you look hard enough in the documentation you will discover that Python doesn't claim to do the right thing even though in practice it mostly does. IMO, "UCS-2" is a pretty good description of what the user can leave up to Python in perfect safety. RDM's reply worries me a little, but I'll reply to his message separately. > *Any* Unicode implementation will do that, since they all have to > support legacy encodings in some form. This is certainly conforming to > the Unicode standard, and in fact one of the primary Unicode design > principles. No. Support for legacy encodings takes you outside of the realm of Unicode conformance by definition. Their names tell you that, however. "UTF-8 with surrogate escapes" on the other hand is an entirely different kettle of fish. It pretends to be UTF-8, but isn't. I think that users who give Python valid input should be able to expect valid output, but they can't. Chapter 3, verse C7: When a process purports not to modify the interpretation of a valid coded character sequence, it shall make no change to that coded character sequence other than the possible replacement of character sequences by their canonical-equivalent sequences, or the deletion of *noncharacter* code points. Sure, you can tell users the truth: "Python may modify your Unicode characters if you slice or index Unicode strings. It may even silently turn them into invalid codes which will eventually raise Errors." 
Then you are conformant, but why would anyone want to use such a program? If you tell them "UCS-2[sic] Python is safe to use with *no* extra care if you use only UCS-2 [or BMP] characters", suddenly Python looks very nice indeed again. "UCS-4" Python is even better; all you have to do is to avoid surrogateescape codecs. However, you're still vulnerable to hard-to-diagnose errors at the output stage in case of program bugs, because not enough checking of values is done by Python itself. > > A Unicode-conforming Python implementation would error at the > > chr() call, or perhaps would not provide surrogateescape error > > handlers. > > Chapter and verse? Chapter 3, verse C9 again. > > "Although practicality beats purity." > > The Unicode standard itself is based on practicality. It wouldn't > have received the success it did if it was based on purity only > (and indeed, was often rejected in cases where it put purity over > practicality, e.g. with the Hangul syllables). Python practicality is very different from Unicode practicality. From v+python at g.nevcal.com Mon Nov 22 06:40:22 2010 From: v+python at g.nevcal.com (Glenn Linderman) Date: Sun, 21 Nov 2010 21:40:22 -0800 Subject: [Python-Dev] is this a bug? no environment variables Message-ID: <4CEA0246.9080607@g.nevcal.com> In reviewing my notes from my experimentations with CGIHTTPServer (Python2.6) and then http.server (Python 3.2a4), I note one behavior I haven't reported as a bug, nor do I know where to start to figure it out, other than experimentally. The experiment: launching CGIHTTPServer without environment variables, by the simple expedient of using a batch file to unset all the existing environment variables, and then launching Python2.6 with CGIHTTPServer. So it failed early: random.py fails at line 110 (Python 2.6). 
I suppose it is possible that some environment variables are used by
Python directly (but I can't seem to find a documented list of them)
although I would expect that usage to be optional, with fall-back
defaults when they don't exist. I suppose it is even possible that some
Windows APIs might depend on some environment variables, but I expected
that the registry had replaced such usage completely, by now, with the
environment variables mostly being a convenience tool for batch files,
or for optional, temporary alteration of particular settings.

If anyone knows of documentation listing what environment variables are
required by Python on Windows, I would appreciate a pointer, searches
and doc browsing having not turned it up.

I'll attempt to recreate the test situation later this week with Python
3.2a4, if no one responds, but the only debug technique I can think of
is to slowly remove environment variables until I find the minimum set
required to run http.server successfully for my tests with CGI files.

From stephen at xemacs.org Mon Nov 22 07:14:46 2010
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Mon, 22 Nov 2010 15:14:46 +0900
Subject: [Python-Dev] len(chr(i)) = 2?
In-Reply-To: <20101121173825.B1BFB235977@kimball.webabinitio.net>
References: <201011192123.14169.victor.stinner@haypocalc.com>
 <4CE6F93F.9010109@egenix.com> <4CE6FE30.1050903@v.loewis.de>
 <87hbfc1vnf.fsf@uwakimon.sk.tsukuba.ac.jp> <4CE78F62.7060707@v.loewis.de>
 <8739qukf9r.fsf@uwakimon.sk.tsukuba.ac.jp>
 <20101121173825.B1BFB235977@kimball.webabinitio.net>
Message-ID: <87hbf9dgvd.fsf@uwakimon.sk.tsukuba.ac.jp>

R. David Murray writes:

 > I'm sorry, but I have to disagree. As a relative unicode ignoramus,
 > "UCS-2" and "UCS-4" convey almost no information to me, and the bits I
 > have heard about them on this list have only confused me.

OK, point taken.
> On the other hand, I understand that 'narrow' means that fewer > bytes are used for each internal character, meaning that some > unicode characters need to be represented by more than one string > element, and thus that slicing strings containing such characters > on a narrow build causes problems. Now, you could tell me the same > information using the terms 'UCS-2' and 'UCS-4' instead of 'narrow' > and 'wide', but to my ear 'narrow' and 'wide' convey a better gut > level feeling for what is going on than 'UCS-2' and 'UCS-4' do. I think that is probably conditioned by your long experience with Python's Unicode features, specifically the knowledge that Python's Unicode strings are not arrays of characters, which often is referred to on this list. My guess is that very few newbies would know that, and it is not implied by "narrow". For example, both Emacs (for sure) and Perl (IIUC) index strings of variable-width character by characters (at great expense of performance in Emacs, at least), not as code units. > And it avoids any question of whether or not Python's internal > representation actually conforms to whatever standard it is that > UCS refers to, a point on which there seems to be some dissension. UCS-2 refers to ISO 10646, Annex 1 IIRC.[1] Anyway, it's somewhere in ISO 10646. I don't think there's actually dissension on conformance to UCS-2, as that's very easy to achieve. Rather, Guido explicitly pronounced that Python processes arrays of code units, not characters. My point is that if you pretend that Python is processing *characters* according to UCS-2 rules for characters, you'll always come to the same conclusion about what Python will do as if you use the technically correct terminology of code units. (At least for the BMP and UTF-16 private areas. 
There will necessarily be some confusion about surrogates, since in UCS-2 they are characters while in UTF-16 they're merely "code points", and the Unicode characters they represent can't be represented at all in UCS-2.) > Indeed, reading that article with my limited unicode knowledge, if > I were told Python used UCS-2, I would assume that non-BMP > characters could not be processed by a Python narrow build. Actually, I'm almost happy with that. That is, the precise formulation is "could not be processed *safely without extra care* by a Python narrow build." Specifically, AFAIK if you range check characters that have been indexed out of a string, or are located at slice boundaries, or produced by chr() or a surrogateescape input codec, you're safe. But practically speaking few apps will actually do those checks and therefore they are unsafe: processing non-BMP characters can easily lead to show-stopping Exceptions. It's very analogous to the kind of show-stopping "bad character in a header" exception that plagued Mailman for so long, and had to be fixed on a case-by-case basis. But the restriction to BMP characters is much more reasonable (at least for now) than RFC 822's restriction to ASCII! But evidently you take it much more stringently. So the question is, "what fraction of developers who think as you do would therefore be put off from using Python to build their applications?" If most would say "OK, we'll stick with BMP for now and use UCS-4 or some hack to deal with extended characters later -- it can't really be true that it's absolutely impossible to use non-BMP characters," I don't mind that misunderstanding. OTOH, yes, it would be bad if the use of "UCS-2" were to imply to more than a couple of developers that 16-bit builds of Python can't handle UTF-16 *at all*. Footnotes: [1] It simply says "we have a subset of the Unicode character set all of whose code points can be represented in 16 bits, excluding 0xFFFF." 
It goes on to define a private area, reserved for use by applications that will never be standardized, and it says that if you don't know what a code point in the character area is, don't change it (you can delete it, however). ISTR that a later Amendment added 0xFFFE to the short-list of non-characters. The surrogate area was taken out of the private area, so a UCS-2 application will simply consider each surrogate to be an unknown character and pass it through unchanged -- unless it deletes it, or inserts other characters between the code points of a surrogate pair. And that's why UCS-2 isn't UTF-16 conforming -- which is basically why Python isn't either. From martin at v.loewis.de Mon Nov 22 09:20:59 2010 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Mon, 22 Nov 2010 09:20:59 +0100 Subject: [Python-Dev] len(chr(i)) = 2? In-Reply-To: <87ipzqc4gi.fsf@uwakimon.sk.tsukuba.ac.jp> References: <201011192123.14169.victor.stinner@haypocalc.com> <4CE6F93F.9010109@egenix.com> <4CE6FE30.1050903@v.loewis.de> <87hbfc1vnf.fsf@uwakimon.sk.tsukuba.ac.jp> <4CE78F62.7060707@v.loewis.de> <8739qukf9r.fsf@uwakimon.sk.tsukuba.ac.jp> <4CE96A40.1050705@v.loewis.de> <87ipzqc4gi.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <4CEA27EB.8000104@v.loewis.de> > Unicode 5.0, Chapter 3, verse C9: > > When a process generates a code unit sequence which purports to be > in a Unicode character encoding form, it shall not emit ill-formed > code sequences. > > > A Unicode-conforming Python implementation would error at the > > > chr() call, or perhaps would not provide surrogateescape error > > > handlers. > > > > Chapter and verse? > > Chapter 3, verse C9 again. I agree that the surrogateescape error handler is non-conforming, but, as you say, it doesn't claim to, either (would your concern about utf-8 being misleading here been resolved if the thing had been called "utf-8b"?) 
More interestingly (and to the subject) is chr: how did you arrive at C9 banning Python3's definition of chr? This chr function puts the code sequence into well-formed UTF-16; that's the whole point of UTF-16.

Regards,
Martin

From stephen at xemacs.org Mon Nov 22 11:47:09 2010
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Mon, 22 Nov 2010 19:47:09 +0900
Subject: [Python-Dev] len(chr(i)) = 2?
In-Reply-To: <4CEA27EB.8000104@v.loewis.de>
References: <201011192123.14169.victor.stinner@haypocalc.com> <4CE6F93F.9010109@egenix.com> <4CE6FE30.1050903@v.loewis.de> <87hbfc1vnf.fsf@uwakimon.sk.tsukuba.ac.jp> <4CE78F62.7060707@v.loewis.de> <8739qukf9r.fsf@uwakimon.sk.tsukuba.ac.jp> <4CE96A40.1050705@v.loewis.de> <87ipzqc4gi.fsf@uwakimon.sk.tsukuba.ac.jp> <4CEA27EB.8000104@v.loewis.de>
Message-ID: <87fwutd49e.fsf@uwakimon.sk.tsukuba.ac.jp>

"Martin v. Löwis" writes:

> More interestingly (and to the subject) is chr: how did you arrive
> at C9 banning Python3's definition of chr? This chr function puts
> the code sequence into well-formed UTF-16; that's the whole point of
> UTF-16.

No, it doesn't, in the specific case of surrogate code points. In 3.1.2 from MacPorts on a iBook G4 and from Gentoo on AMD64, chr(0xd800) returns "\ud800".

I don't know if that's by design (eg, so that it can be used in the implementation of the surrogateescape error handler) or a correctable oversight, but it's not conformant.

From stephen at xemacs.org Mon Nov 22 11:48:42 2010
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Mon, 22 Nov 2010 19:48:42 +0900
Subject: [Python-Dev] len(chr(i)) = 2?
In-Reply-To: <60F8726F-C1C2-4803-8B8E-688EF0443FA0@gmail.com>
References: <201011192123.14169.victor.stinner@haypocalc.com> <4CE6F93F.9010109@egenix.com> <4CE6FE30.1050903@v.loewis.de> <87hbfc1vnf.fsf@uwakimon.sk.tsukuba.ac.jp> <4CE78F62.7060707@v.loewis.de> <8739qukf9r.fsf@uwakimon.sk.tsukuba.ac.jp> <20101121173825.B1BFB235977@kimball.webabinitio.net> <60F8726F-C1C2-4803-8B8E-688EF0443FA0@gmail.com>
Message-ID: <87eiadd46t.fsf@uwakimon.sk.tsukuba.ac.jp>

Raymond Hettinger writes:

> Neither UTF-16 nor UCS-2 is exactly correct anyway.

From a standards lawyer point of view, UCS-2 is exactly correct, as far as I can tell upon rereading ISO 10646-1, especially Annexes H ("retransmitting devices") and Q ("UTF-16"). Annex Q makes it clear that UTF-16 was intentionally designed so that Python-style processing could be done in a UCS-2 context.

> For the "wide" build, the entire range of unicode is encoded at
> 4 bytes per character and slicing/len operate correctly since
> every character is the same length. This used to be called UCS-4
> and is now UTF-32.

That's inaccurate, I believe. UCS-4 is not a UTF, and doesn't satisfy the range restrictions of a UTF.

> So, with "wide" builds there isn't much confusion (except perhaps
> unfamiliar terminology). The real issue seems to be that for
> "narrow" builds, none of the usual encoding names is exactly
> correct.

I disagree. I do see a problem with "UCS-2", because it fails to tell us that Python implements a large number of features that make it easy to do a very good job of working with non-BMP data in 16-bit builds of Python, with no extra effort. Python is not perfect, and (rarely) some of the imperfections may be very distressing. But it's very good, and deserves to be advertised as such.

However, I don't see how "narrow" tells us more than "UCS-2" does. If "UCS-2" is equally (or more) informative, I prefer it because it is the technically precise, already well-defined, term.
> From a users point-of-view, the actual encoding or encoding name
> doesn't matter much. They just need to be able to predict the relevant
> behaviors (memory consumption and len/slicing behavior).

"UCS-2" indicates those behaviors precisely and concisely. The problems are (a) the lack of familiarity of users with this term, if David is reasonably representative, and (b) the fact that it fails to advertise Python's UTF-16 capabilities.

"Narrow" suffers from both of those problems, and further from the fact that it has no independent standard definition. Furthermore, "wide" has a very widespread, platform-dependent meaning derived from wchar_t.

If we have to document what the terms we choose mean anyway, why not document the existing terms and reduce entropy, rather than invent new ones and increase entropy?

From martin at v.loewis.de Mon Nov 22 12:22:35 2010
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Mon, 22 Nov 2010 12:22:35 +0100
Subject: [Python-Dev] len(chr(i)) = 2?
In-Reply-To: <87fwutd49e.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <201011192123.14169.victor.stinner@haypocalc.com> <4CE6F93F.9010109@egenix.com> <4CE6FE30.1050903@v.loewis.de> <87hbfc1vnf.fsf@uwakimon.sk.tsukuba.ac.jp> <4CE78F62.7060707@v.loewis.de> <8739qukf9r.fsf@uwakimon.sk.tsukuba.ac.jp> <4CE96A40.1050705@v.loewis.de> <87ipzqc4gi.fsf@uwakimon.sk.tsukuba.ac.jp> <4CEA27EB.8000104@v.loewis.de> <87fwutd49e.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: <4CEA527B.4030002@v.loewis.de>

On 22.11.2010 11:47, Stephen J. Turnbull wrote:
> "Martin v. Löwis" writes:
>
> > More interestingly (and to the subject) is chr: how did you arrive
> > at C9 banning Python3's definition of chr? This chr function puts
> > the code sequence into well-formed UTF-16; that's the whole point of
> > UTF-16.
>
> No, it doesn't, in the specific case of surrogate code points. In
> 3.1.2 from MacPorts on a iBook G4 and from Gentoo on AMD64,
> chr(0xd800) returns "\ud800".
Ah, I see - this is *not* the subject's issue, right? > > I don't know if that's by design (eg, so that it can be used in the > implementation of the surrogateescape error handler) or a correctable > oversight, but it's not conformant. I disagree: Quoting from Unicode 5.0, section 5.4: # The individual components of implementations may have different # levels of support for surrogates, as long as those components are # assembled and communicate correctly. Low-level string processing, # where a Unicode string is not interpreted but is handled simply as an # array of code units, may ignore surrogate pairs. With such strings, # for example, a truncation operation with an arbitrary offset might # break a surrogate pair. (For further discussion, see Section 2.7, # Unicode Strings.) For performance in string operations, such behavior # is reasonable at a low level, but it requires higher-level processes # to ensure that offsets are on character boundaries so as to guarantee # the integrity of surrogate pairs. So lower-level routines (which I claim chr() is one) are allowed to create lone surrogates. The formal requirement behind this is C1: # A process shall not interpret a high-surrogate code point or a # low-surrogate code point as an abstract character. I also claim that Python, in both narrow and wide mode, conforms to this requirement. Notice that the requirement is a ban on interpreting the code point as a character. In particular, unicodedata.category claims that the code point is of class Cs (surrogate), which I consider conforming. By the same line of reasoning, it is also OK that chr() allows the creation of unassigned code points, even though C2 says that they must not be interpreted as abstract characters. 
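A minimal sketch of the behaviour described above, assuming a current Python 3 interpreter (the thread itself concerns 3.1/3.2, where details may differ): `chr()` happily creates a lone surrogate, `unicodedata.category()` reports it as `Cs` rather than treating it as a character, and the stricter check only happens at a higher level, here when encoding to UTF-8 with the default error handler. The variable names are illustrative only.

```python
import unicodedata

# chr() acts as a low-level routine: it creates a lone high surrogate
# code point without complaint.
lone = chr(0xD800)
length = len(lone)

# The code point is classified as a surrogate (Cs), i.e. it is not
# interpreted as an abstract character.
category = unicodedata.category(lone)

# A higher-level process (a strict UTF-8 encoder) refuses to emit it.
try:
    lone.encode("utf-8")
    encodable = True
except UnicodeEncodeError:
    encodable = False

print(length, category, encodable)
```

This matches the split Martin describes between low-level string handling, which may produce lone surrogates, and higher-level processes, which are responsible for well-formed output.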
The rationale for supporting these characters in chr() goes back much further than the surrogateescape handler - as Python unicode strings are sequences of code points, it would be impractical if you couldn't create some of them, or even would have to consult the UCD before determining whether they can be created.

Regards,
Martin

From martin at v.loewis.de Mon Nov 22 12:43:00 2010
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Mon, 22 Nov 2010 12:43:00 +0100
Subject: [Python-Dev] len(chr(i)) = 2?
In-Reply-To: <87eiadd46t.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <201011192123.14169.victor.stinner@haypocalc.com> <4CE6F93F.9010109@egenix.com> <4CE6FE30.1050903@v.loewis.de> <87hbfc1vnf.fsf@uwakimon.sk.tsukuba.ac.jp> <4CE78F62.7060707@v.loewis.de> <8739qukf9r.fsf@uwakimon.sk.tsukuba.ac.jp> <20101121173825.B1BFB235977@kimball.webabinitio.net> <60F8726F-C1C2-4803-8B8E-688EF0443FA0@gmail.com> <87eiadd46t.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: <4CEA5744.3080308@v.loewis.de>

On 22.11.2010 11:48, Stephen J. Turnbull wrote:
> Raymond Hettinger writes:
>
> > Neither UTF-16 nor UCS-2 is exactly correct anyway.
>
> From a standards lawyer point of view, UCS-2 is exactly correct, as
> far as I can tell upon rereading ISO 10646-1, especially Annexes H
> ("retransmitting devices") and Q ("UTF-16"). Annex Q makes it clear
> that UTF-16 was intentionally designed so that Python-style processing
> could be done in a UCS-2 context.

I could only find the FCD of 10646:2010, where annex H was integrated into section 10:

http://www.itscj.ipsj.or.jp/sc2/open/02n4125/FCD10646-Main.pdf

There they have stopped using the term UCS-2, and added a note

# NOTE – Former editions of this standard included references to a
# two-octet BMP form called UCS-2 which would be a subset
# of the UTF-16 encoding form restricted to the BMP UCS scalar values.
# The UCS-2 form is deprecated.
I think they are now acknowledging that UCS-2 was a misleading term, making it ambiguous whether this refers to a CCS, a CEF, or a CES; like "ASCII", people have been using it for all three of them. Apparently, the ISO WG interprets earlier revisions as saying that UCS-2 is a CEF that restricted UTF-16 to the BMP. THIS IS NOT WHAT PYTHON DOES. In a narrow Python build, the character set is *not* restricted to the BMP. Instead, Unicode strings are meant to be interpreted (by applications) as UTF-16. > > For the "wide" build, the entire range of unicode is encoded at > > 4 bytes per character and slicing/len operate correctly since > > every character is the same length. This used to be called UCS-4 > > and is now UTF-32. > > That's inaccurate, I believe. UCS-4 is not a UTF, and doesn't satisfy > the range restrictions of a UTF. Not sure what it says in your copy; in mine, section 9.3 says # 9.3 UTF-32 (UCS-4) # UTF-32 (or UCS-4) is the UCS encoding form that assigns each UCS # scalar value to a single unsigned 32-bit code unit. The terms UTF-32 # and UCS-4 can be used interchangeably to designate this encoding # form. so they (now) view the two as synonyms. I think that when ISO 10646 started, they were also fairly confused about these issues (as the group/plane/row/cell structure demonstrates, IMO). This is not surprising, since the notion of byte-based character sets had been ingrained for so long. It took 20 years to learn that a UCS scalar value really is *not* a sequence of bytes, but a natural number. > However, I don't see how "narrow" tells us more than "UCS-2" does. If > "UCS-2" is equally (or more) informative, I prefer it because it is > the technically precise, already well-defined, term. But it's not. It is a confusing term, one that the relevant standards bodies are abandoning. After reading FCD 10646:2010, I could agree to call the two implementations UTF-16 and UTF-32 (as these terms designate CEFs). Unfortunately, they also designate CESs. 
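To make the code-unit distinction concrete, here is a small sketch (not from the original message) that uses the standard `utf-16-le` and `utf-32-le` codecs on a current Python 3 interpreter. It shows a single non-BMP character taking two 16-bit code units as a surrogate pair but only one 32-bit code unit, and recombines the pair using the UTF-16 arithmetic. The choice of U+10400 is arbitrary.

```python
# One character outside the BMP (U+10400, chosen arbitrarily).
ch = "\U00010400"

utf16 = ch.encode("utf-16-le")
utf32 = ch.encode("utf-32-le")

# Number of code units in each encoding form.
units16 = len(utf16) // 2   # a surrogate pair
units32 = len(utf32) // 4   # a single 32-bit unit

# Split the UTF-16 bytes into the high and low surrogate.
hi = int.from_bytes(utf16[0:2], "little")
lo = int.from_bytes(utf16[2:4], "little")

# Recombine the pair into the scalar value, per the UTF-16 definition.
scalar = 0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00)

print(units16, units32, hex(hi), hex(lo), hex(scalar))
```

On a narrow build as discussed in this thread, `len(ch)` for such a string would have been the UTF-16 count; the sketch computes both counts explicitly so that it does not depend on the build.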
> If we have to document what the terms we choose mean anyway, why not
> document the existing terms and reduce entropy, rather than invent new
> ones and increase entropy?

Because the proposed existing term is deprecated.

Regards,
Martin

From mal at egenix.com Mon Nov 22 13:47:29 2010
From: mal at egenix.com (M.-A. Lemburg)
Date: Mon, 22 Nov 2010 13:47:29 +0100
Subject: [Python-Dev] len(chr(i)) = 2?
In-Reply-To: <4CEA5744.3080308@v.loewis.de>
References: <201011192123.14169.victor.stinner@haypocalc.com> <4CE6F93F.9010109@egenix.com> <4CE6FE30.1050903@v.loewis.de> <87hbfc1vnf.fsf@uwakimon.sk.tsukuba.ac.jp> <4CE78F62.7060707@v.loewis.de> <8739qukf9r.fsf@uwakimon.sk.tsukuba.ac.jp> <20101121173825.B1BFB235977@kimball.webabinitio.net> <60F8726F-C1C2-4803-8B8E-688EF0443FA0@gmail.com> <87eiadd46t.fsf@uwakimon.sk.tsukuba.ac.jp> <4CEA5744.3080308@v.loewis.de>
Message-ID: <4CEA6661.4080402@egenix.com>

Martin, it is really irrelevant whether the standards have decided to no longer use the terms UCS-2 and UCS-4 in their latest standard documents.

The definitions still stand (just like Unicode 2.0 is still a valid standard, even if it's ten years old):

* UCS-2 is defined as "Universal Character Set coded in 2 octets" by ISO 10646 (see http://www.unicode.org/versions/Unicode5.2.0/appC.pdf)

* UCS-4 is defined as "Universal Character Set coded in 4 octets" by ISO 10646.

Those two terms have been in use for many years. They refer to the Unicode character set as it can be represented in 2 or 4 bytes. As such they don't include any of the special meanings associated with the UTF transfer encodings. There are no invalid sequences, no invalid code points, etc. as you can find in the UTF encodings. And that's an important detail.

If you interpret them as encodings, they are 1-1 mappings of Unicode code point ordinals to integers represented using 2 or 4 bytes.
UCS-2 only supports BMP code points and can conveniently be interpreted as UTF-16, if you need to encode non-BMP code points (which we do in the UTF codecs). UCS-4 also supports non-BMP code points directly. Now, from a ISO or Unicode Consortium point of view, deprecating the term UCS-2 in *their* standard papers is only natural, since they are actively starting to assign non-BMP code points which cannot be represented in UCS-2. However, this deprecation is only relevant for the purpose of defining the standard. The above definitions are still useful when it comes to defining code units, i.e. the used storage format, (as opposed to the transfer format). For the purpose of describing the code units we are using in Python they are (still) the most correct terms and that's also the reason why we chose to use them when introducing the configure options in Python2. There are no other accurate definitions we could use. The terms "narrow" and "wide" are simply too inaccurate to be used as description of UCS-2 and UCS-4 code units. Please also note that we have used the terms UCS-2 and UCS-4 in Python2 for 9+ years now and users are just starting to learn the difference and get acquainted with the fact that Python uses these two forms. Confronting them with "narrow" and "wide" builds is only going to cause more confusion, not less, and adding those strings to Python package files isn't going to help much either, since the terms don't convey any relationship to Unicode: package-3.1.3.linux-x86_64-py2.6_ucs2.egg vs. package-3.1.3.linux-x86_64-py2.6_narrow.egg I opt for switching to the following config options: --with-unicode=ucs2 (default) --with-unicode=ucs4 and using "UCS-2" and "UCS-4" in the Python documentation when describing the two different build modes. We can add glossary entries for the two which clarify the differences. 
Python2 used --enable-unicode=ucs2/ucs4, but since Python3 doesn't build without Unicode support, the above two versions appear more appropriate. We can keep the alternative --with-wide-unicode as an alias for --with-unicode=ucs4 to maintain 3.x backwards compatibility. Cheers, -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Nov 22 2010) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ ::: Try our new mxODBC.Connect Python Database Interface for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ "Martin v. L?wis" wrote: > Am 22.11.2010 11:48, schrieb Stephen J. Turnbull: >> Raymond Hettinger writes: >> >> > Neither UTF-16 nor UCS-2 is exactly correct anyway. >> >> >From a standards lawyer point of view, UCS-2 is exactly correct, as >> far as I can tell upon rereading ISO 10646-1, especially Annexes H >> ("retransmitting devices") and Q ("UTF-16"). Annex Q makes it clear >> that UTF-16 was intentionally designed so that Python-style processing >> could be done in a UCS-2 context. > > I could only find the FCD of 10646:2010, where annex H was integrated > into section 10: > > http://www.itscj.ipsj.or.jp/sc2/open/02n4125/FCD10646-Main.pdf > > There they have stopped using the term UCS-2, and added a note > > # NOTE ? Former editions of this standard included references to a > # two-octet BMP form called UCS-2 which would be a subset > # of the UTF-16 encoding form restricted to the BMP UCS scalar values. # > The UCS-2 form is deprecated. 
> > I think they are now acknowledging that UCS-2 was a misleading term, > making it ambiguous whether this refers to a CCS, a CEF, or a CES; > like "ASCII", people have been using it for all three of them. > > Apparently, the ISO WG interprets earlier revisions as saying that > UCS-2 is a CEF that restricted UTF-16 to the BMP. THIS IS NOT WHAT > PYTHON DOES. In a narrow Python build, the character set is *not* > restricted to the BMP. Instead, Unicode strings are meant to be > interpreted (by applications) as UTF-16. > >> > For the "wide" build, the entire range of unicode is encoded at >> > 4 bytes per character and slicing/len operate correctly since >> > every character is the same length. This used to be called UCS-4 >> > and is now UTF-32. >> >> That's inaccurate, I believe. UCS-4 is not a UTF, and doesn't satisfy >> the range restrictions of a UTF. > > Not sure what it says in your copy; in mine, section 9.3 says > > # 9.3 UTF-32 (UCS-4) > # UTF-32 (or UCS-4) is the UCS encoding form that assigns each UCS > # scalar value to a single unsigned 32-bit code unit. The terms UTF-32 # > and UCS-4 can be used interchangeably to designate this encoding > # form. > > so they (now) view the two as synonyms. > > I think that when ISO 10646 started, they were also fairly confused > about these issues (as the group/plane/row/cell structure demonstrates, > IMO). This is not surprising, since the notion of byte-based character > sets had been ingrained for so long. It took 20 years to learn that > a UCS scalar value really is *not* a sequence of bytes, but a natural > number. > >> However, I don't see how "narrow" tells us more than "UCS-2" does. If >> "UCS-2" is equally (or more) informative, I prefer it because it is >> the technically precise, already well-defined, term. > > But it's not. It is a confusing term, one that the relevant standards > bodies are abandoning. 
After reading FCD 10646:2010, I could agree to
> call the two implementations UTF-16 and UTF-32 (as these terms
> designate CEFs). Unfortunately, they also designate CESs.
>
>> If we have to document what the terms we choose mean anyway, why not
>> document the existing terms and reduce entropy, rather than invent new
>> ones and increase entropy?
>
> Because the proposed existing term is deprecated.
>
> Regards,
> Martin

From foom at fuhm.net Mon Nov 22 15:18:02 2010
From: foom at fuhm.net (James Y Knight)
Date: Mon, 22 Nov 2010 09:18:02 -0500
Subject: [Python-Dev] len(chr(i)) = 2?
In-Reply-To: <4CEA6661.4080402@egenix.com>
References: <201011192123.14169.victor.stinner@haypocalc.com> <4CE6F93F.9010109@egenix.com> <4CE6FE30.1050903@v.loewis.de> <87hbfc1vnf.fsf@uwakimon.sk.tsukuba.ac.jp> <4CE78F62.7060707@v.loewis.de> <8739qukf9r.fsf@uwakimon.sk.tsukuba.ac.jp> <20101121173825.B1BFB235977@kimball.webabinitio.net> <60F8726F-C1C2-4803-8B8E-688EF0443FA0@gmail.com> <87eiadd46t.fsf@uwakimon.sk.tsukuba.ac.jp> <4CEA5744.3080308@v.loewis.de> <4CEA6661.4080402@egenix.com>
Message-ID:

Why don't y'all just call them "--unichar-width=16/32"? That describes precisely what the options do, and doesn't invite any quibbling over definitions.

James

From ncoghlan at gmail.com Mon Nov 22 16:14:46 2010
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Tue, 23 Nov 2010 01:14:46 +1000
Subject: [Python-Dev] [Python-checkins] r86633 - in python/branches/py3k: Doc/library/inspect.rst Doc/whatsnew/3.2.rst Lib/inspect.py Lib/test/test_inspect.py Misc/NEWS
In-Reply-To: <4CE9BF4A.1020302@netwok.org>
References: <20101121034404.52924F20A@mail.python.org> <4CE9BF4A.1020302@netwok.org>
Message-ID:

On Mon, Nov 22, 2010 at 10:54 AM, Éric Araujo wrote:
>> +.. function:: getgeneratorstate(generator)
>> +
>> +    Get current state of a generator-iterator.
>> +
>> +    Possible states are:
>> +      GEN_CREATED: Waiting to start execution.
>> +
GEN_RUNNING: Currently being executed by the interpreter.
>> +      GEN_SUSPENDED: Currently suspended at a yield expression.
>> +      GEN_CLOSED: Execution has completed.
>
> I wonder if those shouldn't be marked up as :data: or something to make
> them indexed.

The same definitions are in the docstrings, and they're just integer constants so I'm not sure why anyone would be looking them up directly. Still, if someone with greater Sphinx-fu thinks additional markup would be helpful, I have no problem with them adding it :)

Cheers,
Nick.

--
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From fuzzyman at voidspace.org.uk Mon Nov 22 16:19:04 2010
From: fuzzyman at voidspace.org.uk (Michael Foord)
Date: Mon, 22 Nov 2010 15:19:04 +0000
Subject: [Python-Dev] [Python-checkins] r86633 - in python/branches/py3k: Doc/library/inspect.rst Doc/whatsnew/3.2.rst Lib/inspect.py Lib/test/test_inspect.py Misc/NEWS
In-Reply-To:
References: <20101121034404.52924F20A@mail.python.org> <4CE9BF4A.1020302@netwok.org>
Message-ID: <4CEA89E8.5090107@voidspace.org.uk>

On 22/11/2010 15:14, Nick Coghlan wrote:
> On Mon, Nov 22, 2010 at 10:54 AM, Éric Araujo wrote:
>>> +.. function:: getgeneratorstate(generator)
>>> +
>>> +    Get current state of a generator-iterator.
>>> +
>>> +    Possible states are:
>>> +      GEN_CREATED: Waiting to start execution.
>>> +      GEN_RUNNING: Currently being executed by the interpreter.
>>> +      GEN_SUSPENDED: Currently suspended at a yield expression.
>>> +      GEN_CLOSED: Execution has completed.
>> I wonder if those shouldn't be marked up as :data: or something to make
>> them indexed.
> The same definitions are in the docstrings, and they're just integer
> constants so I'm not sure why anyone would be looking them up
> directly. Still, if someone with greater Sphinx-fu thinks additional
> markup would be helpful, I have no problem with them adding it :)
>

Why not use string constants instead?
You lose comparability (less than / greater than) but gain readability. Comparability may be a requirement - of course if Python had an Enum type we could use that and have both.

Michael

> Cheers,
> Nick.
>

--
http://www.voidspace.org.uk/

READ CAREFULLY. By accepting and reading this email you agree, on behalf of your employer, to release me from all obligations and waivers arising from any and all NON-NEGOTIATED agreements, licenses, terms-of-service, shrinkwrap, clickwrap, browsewrap, confidentiality, non-disclosure, non-compete and acceptable use policies ("BOGUS AGREEMENTS") that I have entered into with your employer, its partners, licensors, agents and assigns, in perpetuity, without prejudice to my ongoing rights and privileges. You further represent that you have the authority to release me from any BOGUS AGREEMENTS on behalf of your employer.

From ncoghlan at gmail.com Mon Nov 22 16:37:21 2010
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Tue, 23 Nov 2010 01:37:21 +1000
Subject: [Python-Dev] len(chr(i)) = 2?
In-Reply-To: <4CEA6661.4080402@egenix.com>
References: <201011192123.14169.victor.stinner@haypocalc.com> <4CE6F93F.9010109@egenix.com> <4CE6FE30.1050903@v.loewis.de> <87hbfc1vnf.fsf@uwakimon.sk.tsukuba.ac.jp> <4CE78F62.7060707@v.loewis.de> <8739qukf9r.fsf@uwakimon.sk.tsukuba.ac.jp> <20101121173825.B1BFB235977@kimball.webabinitio.net> <60F8726F-C1C2-4803-8B8E-688EF0443FA0@gmail.com> <87eiadd46t.fsf@uwakimon.sk.tsukuba.ac.jp> <4CEA5744.3080308@v.loewis.de> <4CEA6661.4080402@egenix.com>
Message-ID:

On Mon, Nov 22, 2010 at 10:47 PM, M.-A. Lemburg wrote:
> Please also note that we have used the terms UCS-2 and UCS-4 in Python2
> for 9+ years now and users are just starting to learn the difference
> and get acquainted with the fact that Python uses these two forms.
>
> Confronting them with "narrow" and "wide" builds is only
> going to cause more confusion, not less, and adding those
> strings to Python package files isn't going to help much either,
> since the terms don't convey any relationship to Unicode:

I was personally surprised to learn in this discussion that there had even been an *attempt* to change the names of the two build variants to anything other than UCS2/UCS4. The concrete API implementations certainly still use those two terms to prevent inadvertent linkage with the wrong version of the C API.

For practical purposes, UCS2/UCS4 convey far more inherent information than narrow/wide:

- many developers will recognise them as Unicode related, even if they don't know exactly what they mean
- even those that don't recognise them, can soon learn that they're Unicode related just by plugging them into Google*
- a bit more digging should reveal that they're Unicode storage formats closely related to the UTF-16 and UTF-32 transfer encodings respectively*

*(The first Google hit for "ucs2" is the UTF-16/UCS-2 article on Wikipedia, the first hit for "ucs4" is the UTF-32/UCS-4 article)

All that just armed with Google, without even looking at the Python docs specifically.

So don't just think about "what will developers know?", also think about "what will developers know, and what will a quick trip to a search engine tell them?". And once you take that stance, the overly generic narrow/wide terms fail, badly.

+1 for MAL's suggested tweaks to the Py3k configure options.

Cheers,
Nick.

--
Nick Coghlan   |   ncoghlan at gmail.com   |
Brisbane, Australia

From solipsis at pitrou.net Mon Nov 22 16:37:22 2010
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Mon, 22 Nov 2010 16:37:22 +0100
Subject: [Python-Dev] [Python-checkins] r86633 - in python/branches/py3k: Doc/library/inspect.rst Doc/whatsnew/3.2.rst Lib/inspect.py Lib/test/test_inspect.py Misc/NEWS
References: <20101121034404.52924F20A@mail.python.org> <4CE9BF4A.1020302@netwok.org> <4CEA89E8.5090107@voidspace.org.uk>
Message-ID: <20101122163722.7e96d123@pitrou.net>

On Mon, 22 Nov 2010 15:19:04 +0000 Michael Foord wrote:
> On 22/11/2010 15:14, Nick Coghlan wrote:
> > On Mon, Nov 22, 2010 at 10:54 AM, Éric Araujo wrote:
> >>> +.. function:: getgeneratorstate(generator)
> >>> +
> >>> +    Get current state of a generator-iterator.
> >>> +
> >>> +    Possible states are:
> >>> +      GEN_CREATED: Waiting to start execution.
> >>> +      GEN_RUNNING: Currently being executed by the interpreter.
> >>> +      GEN_SUSPENDED: Currently suspended at a yield expression.
> >>> +      GEN_CLOSED: Execution has completed.
> >> I wonder if those shouldn't be marked up as :data: or something to make
> >> them indexed.
> > The same definitions are in the docstrings, and they're just integer
> > constants so I'm not sure why anyone would be looking them up
> > directly. Still, if someone with greater Sphinx-fu thinks additional
> > markup would be helpful, I have no problem with them adding it :)
> >
>
> Why not use string constants instead? You lose comparability (less than
> / greater than) but gain readability. Comparability may be a requirement
> - of course if Python had an Enum type we could use that and have both.

+1. The problem with int constants is that the int gets printed, not the name, when you dump them for debugging purposes :)

cheers

Antoine.
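For reference, a short sketch of how this looks in released versions of Python 3, where `inspect.getgeneratorstate()` returns string constants, so the state name itself is what gets printed when debugging. The `countdown` generator is a made-up example.

```python
import inspect


def countdown(n):
    """Hypothetical generator used only to walk through the states."""
    while n > 0:
        yield n
        n -= 1


gen = countdown(2)
states = [inspect.getgeneratorstate(gen)]       # not started yet

next(gen)
states.append(inspect.getgeneratorstate(gen))   # paused at the yield

gen.close()
states.append(inspect.getgeneratorstate(gen))   # finished

print(states)
```

Because the constants are plain strings, `inspect.GEN_CREATED == "GEN_CREATED"` holds and no lookup table is needed to interpret a dumped value.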
From ncoghlan at gmail.com Mon Nov 22 16:45:28 2010
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Tue, 23 Nov 2010 01:45:28 +1000
Subject: [Python-Dev] [Python-checkins] r86633 - in python/branches/py3k: Doc/library/inspect.rst Doc/whatsnew/3.2.rst Lib/inspect.py Lib/test/test_inspect.py Misc/NEWS
In-Reply-To: <4CEA89E8.5090107@voidspace.org.uk>
References: <20101121034404.52924F20A@mail.python.org> <4CE9BF4A.1020302@netwok.org> <4CEA89E8.5090107@voidspace.org.uk>
Message-ID:

On Tue, Nov 23, 2010 at 1:19 AM, Michael Foord wrote:
> On 22/11/2010 15:14, Nick Coghlan wrote:
>> On Mon, Nov 22, 2010 at 10:54 AM, Éric Araujo wrote:
>>>> +    Possible states are:
>>>> +      GEN_CREATED: Waiting to start execution.
>>>> +      GEN_RUNNING: Currently being executed by the interpreter.
>>>> +      GEN_SUSPENDED: Currently suspended at a yield expression.
>>>> +      GEN_CLOSED: Execution has completed.
>>>
>>> I wonder if those shouldn't be marked up as :data: or something to make
>>> them indexed.
>>
>> The same definitions are in the docstrings, and they're just integer
>> constants so I'm not sure why anyone would be looking them up
>> directly. Still, if someone with greater Sphinx-fu thinks additional
>> markup would be helpful, I have no problem with them adding it :)
>>
>
> Why not use string constants instead? You lose comparability (less than /
> greater than) but gain readability. Comparability may be a requirement - of
> course if Python had an Enum type we could use that and have both.

With only 4 states, comparability isn't really necessary. I'm just so used to using the range() trick as a replacement for the lack of proper Enum type that using strings instead didn't even occur to me.

The lack of printability did bother me a bit, so yeah, +1 from me as well (I've reopened the relevant issue to remind me to change it before beta 1).

Cheers,
Nick.

--
Nick Coghlan   |   ncoghlan at gmail.com   |
Brisbane, Australia From alexander.belopolsky at gmail.com Mon Nov 22 17:03:47 2010 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Mon, 22 Nov 2010 11:03:47 -0500 Subject: [Python-Dev] len(chr(i)) = 2? In-Reply-To: References: <201011192123.14169.victor.stinner@haypocalc.com> <4CE6F93F.9010109@egenix.com> <4CE6FE30.1050903@v.loewis.de> <87hbfc1vnf.fsf@uwakimon.sk.tsukuba.ac.jp> <4CE78F62.7060707@v.loewis.de> <8739qukf9r.fsf@uwakimon.sk.tsukuba.ac.jp> <20101121173825.B1BFB235977@kimball.webabinitio.net> <60F8726F-C1C2-4803-8B8E-688EF0443FA0@gmail.com> <87eiadd46t.fsf@uwakimon.sk.tsukuba.ac.jp> <4CEA5744.3080308@v.loewis.de> <4CEA6661.4080402@egenix.com> Message-ID: On Mon, Nov 22, 2010 at 10:37 AM, Nick Coghlan wrote: .. > *(The first Google hit for "ucs2" is the UTF-16/UCS-2 article on > Wikipedia, the first hit for "ucs4" is the UTF-32/UCS-4 article) > Do you think these articles are helpful for someone learning how to use chr() and ord() in Python for the first time? From hrvoje.niksic at avl.com Mon Nov 22 17:08:36 2010 From: hrvoje.niksic at avl.com (Hrvoje Niksic) Date: Mon, 22 Nov 2010 17:08:36 +0100 Subject: [Python-Dev] [Python-checkins] r86633 - in python/branches/py3k: Doc/library/inspect.rst Doc/whatsnew/3.2.rst Lib/inspect.py Lib/test/test_inspect.py Misc/NEWS In-Reply-To: <20101122163722.7e96d123@pitrou.net> References: <20101121034404.52924F20A@mail.python.org> <4CE9BF4A.1020302@netwok.org> <4CEA89E8.5090107@voidspace.org.uk> <20101122163722.7e96d123@pitrou.net> Message-ID: <4CEA9584.7040301@avl.com> On 11/22/2010 04:37 PM, Antoine Pitrou wrote: > +1. The problem with int constants is that the int gets printed, not > the name, when you dump them for debugging purposes :) Well, it's trivial to subclass int to something with a nicer __repr__. 
PyGTK uses that technique for wrapping C enums: >>> gtk.PREVIEW_GRAYSCALE >>> isinstance(gtk.PREVIEW_GRAYSCALE, int) True >>> gtk.PREVIEW_GRAYSCALE + 0 1 From ncoghlan at gmail.com Mon Nov 22 17:13:39 2010 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 23 Nov 2010 02:13:39 +1000 Subject: [Python-Dev] len(chr(i)) = 2? In-Reply-To: References: <201011192123.14169.victor.stinner@haypocalc.com> <4CE6F93F.9010109@egenix.com> <4CE6FE30.1050903@v.loewis.de> <87hbfc1vnf.fsf@uwakimon.sk.tsukuba.ac.jp> <4CE78F62.7060707@v.loewis.de> <8739qukf9r.fsf@uwakimon.sk.tsukuba.ac.jp> <20101121173825.B1BFB235977@kimball.webabinitio.net> <60F8726F-C1C2-4803-8B8E-688EF0443FA0@gmail.com> <87eiadd46t.fsf@uwakimon.sk.tsukuba.ac.jp> <4CEA5744.3080308@v.loewis.de> <4CEA6661.4080402@egenix.com> Message-ID: On Tue, Nov 23, 2010 at 2:03 AM, Alexander Belopolsky wrote: > On Mon, Nov 22, 2010 at 10:37 AM, Nick Coghlan wrote: > .. >> *(The first Google hit for "ucs2" is the UTF-16/UCS-2 article on >> Wikipedia, the first hit for "ucs4" is the UTF-32/UCS-4 article) >> > > Do you think these articles are helpful for someone learning how to > use chr() and ord() in Python for the first time? No, that's what the documentation of chr() and ord() is for. For that use case, it doesn't matter *what* the terms are. They could say "in a FOO build this will do X, in a BAR build it will do Y, see for a detailed explanation of the differences between FOO and BAR builds of Python" and be perfectly adequate for the task. If there is no appropriate documentation link to point to (probably somewhere in the C API docs if it isn't anywhere else) then that is a key issue that needs to be fixed, rather than trying to change the terms that have been in use for the better part of a decade already. The raw meaning of UCS2/UCS4 mainly comes into the story when people are encountering this as a config option when building Python. 
The whole idea of changing the terms for the two build types *should* have been short circuited by the "status quo wins a stalemate" guideline, but apparently that didn't happen at the time. Cheers, Nick. -- Nick Coghlan?? |?? ncoghlan at gmail.com?? |?? Brisbane, Australia From solipsis at pitrou.net Mon Nov 22 17:24:40 2010 From: solipsis at pitrou.net (Antoine Pitrou) Date: Mon, 22 Nov 2010 17:24:40 +0100 Subject: [Python-Dev] [Python-checkins] r86633 - in python/branches/py3k: Doc/library/inspect.rst Doc/whatsnew/3.2.rst Lib/inspect.py Lib/test/test_inspect.py Misc/NEWS References: <20101121034404.52924F20A@mail.python.org> <4CE9BF4A.1020302@netwok.org> <4CEA89E8.5090107@voidspace.org.uk> <20101122163722.7e96d123@pitrou.net> <4CEA9584.7040301@avl.com> Message-ID: <20101122172440.77d27ed5@pitrou.net> On Mon, 22 Nov 2010 17:08:36 +0100 Hrvoje Niksic wrote: > On 11/22/2010 04:37 PM, Antoine Pitrou wrote: > > +1. The problem with int constants is that the int gets printed, not > > the name, when you dump them for debugging purposes :) > > Well, it's trivial to subclass int to something with a nicer __repr__. > PyGTK uses that technique for wrapping C enums: Nice. It might be useful to add a private _Constant class somewhere for stdlib purposes. Regards Antoine. From guido at python.org Mon Nov 22 17:33:57 2010 From: guido at python.org (Guido van Rossum) Date: Mon, 22 Nov 2010 08:33:57 -0800 Subject: [Python-Dev] is this a bug? no environment variables In-Reply-To: <4CEA0246.9080607@g.nevcal.com> References: <4CEA0246.9080607@g.nevcal.com> Message-ID: On Sun, Nov 21, 2010 at 9:40 PM, Glenn Linderman wrote: > In reviewing my notes from my experimentations with CGIHTTPServer > (Python2.6) and then http.server (Python 3.2a4), I note one behavior I > haven't reported as a bug, nor do I know where to start to figure it out, > other than experimentally. 
> > The experiment: launching CGIHTTPServer without environment variables, by > the simple expedient of using a batch file to unset all the existing > environment variables, and then launching Python2.6 with CGIHTTPServer. > > So it failed early: random.py fails at line 110 (Python 2.6). What specific traceback do you get? In my copy of the code that line says a = long(_hexlify(_urandom(16)), 16) and I could just imagine that _urandom() fails for some reason to do with the environment (it is a reference to os.urandom()), which, being part of the C library code, might depend on the environment. But you're not giving enough info to debug this. > I suppose it is possible that some environment variables are used by Python > directly (but I can't seem to find a documented list of them) although I > would expect that usage to be optional, with fall-back defaults when they > don't exist. That is certainly the idea, but the fallbacks may not always be nice. Environment variables used by Python or the stdlib itself are supposed to be named PYTHON if they are Python-specific, and there's a way to disable all of these (-E). But there are other environment variables (HOME and PATH come to mind) that have a broader definition and that are used in some part of the stdlib. Plus, as I mentioned, who knows what the non-Python C library uses (well, somebody probably knows, but I don't know of a central source that we can actually trust across the many platforms where Python runs). > I suppose it is even possible that some Windows APIs might > depend on some environment variables, but I expected that the registry had > replaced such usage completely, by now, with the environment variables > mostly being a convenience tool for batch files, or for optional, temporary > alteration of particular settings. That sounds like wishful thinking. 
:-) > If anyone knows of documentation listing what environment variables are > required by Python on Windows, I would appreciate a pointer, searches and > doc browsing having not turned it up. > > I'll attempt to recreate the test situation later this week with Python > 3.2a4, if no one responds, but the only debug technique I can think of is to > slowly remove environment variables until I find the minimum set required to > run http.server successfully for my tests with CGI files. -- --Guido van Rossum (python.org/~guido) From fuzzyman at voidspace.org.uk Mon Nov 22 17:58:56 2010 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Mon, 22 Nov 2010 16:58:56 +0000 Subject: [Python-Dev] [Python-checkins] r86633 - in python/branches/py3k: Doc/library/inspect.rst Doc/whatsnew/3.2.rst Lib/inspect.py Lib/test/test_inspect.py Misc/NEWS In-Reply-To: <20101122172440.77d27ed5@pitrou.net> References: <20101121034404.52924F20A@mail.python.org> <4CE9BF4A.1020302@netwok.org> <4CEA89E8.5090107@voidspace.org.uk> <20101122163722.7e96d123@pitrou.net> <4CEA9584.7040301@avl.com> <20101122172440.77d27ed5@pitrou.net> Message-ID: <4CEAA150.3020106@voidspace.org.uk> On 22/11/2010 16:24, Antoine Pitrou wrote: > On Mon, 22 Nov 2010 17:08:36 +0100 > Hrvoje Niksic wrote: >> On 11/22/2010 04:37 PM, Antoine Pitrou wrote: >>> +1. The problem with int constants is that the int gets printed, not >>> the name, when you dump them for debugging purposes :) >> Well, it's trivial to subclass int to something with a nicer __repr__. >> PyGTK uses that technique for wrapping C enums: > Nice. It might be useful to add a private _Constant class somewhere for > stdlib purposes. Why not just solve the problem properly and add it to the standard library... (Allowing for flag enums too that can be or'd together and still have a decent repr.) Michael > Regards > > Antoine. 
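A minimal sketch of the int-subclass idea raised in this thread, for illustration only. The NamedInt class and the 0-3 values (the range() trick) are assumptions; this is not what inspect.py or PyGTK actually ship.

```python
class NamedInt(int):
    """An int that carries a symbolic name for nicer debugging output."""

    def __new__(cls, name, value):
        # int is immutable, so the value has to be set in __new__.
        self = super().__new__(cls, value)
        self._name = name
        return self

    def __repr__(self):
        return "<%s: %d>" % (self._name, int(self))


# Hypothetical constants, numbered with the range() trick.
_NAMES = ("GEN_CREATED", "GEN_RUNNING", "GEN_SUSPENDED", "GEN_CLOSED")
GEN_CREATED, GEN_RUNNING, GEN_SUSPENDED, GEN_CLOSED = (
    NamedInt(name, value) for value, name in enumerate(_NAMES)
)

print(repr(GEN_SUSPENDED))
```

The constants still compare and add like plain ints; only the repr changes. Arithmetic returns an ordinary int, so the name is lost after `GEN_SUSPENDED + 0`.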
> > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/fuzzyman%40voidspace.org.uk -- http://www.voidspace.org.uk/ READ CAREFULLY. By accepting and reading this email you agree, on behalf of your employer, to release me from all obligations and waivers arising from any and all NON-NEGOTIATED agreements, licenses, terms-of-service, shrinkwrap, clickwrap, browsewrap, confidentiality, non-disclosure, non-compete and acceptable use policies (?BOGUS AGREEMENTS?) that I have entered into with your employer, its partners, licensors, agents and assigns, in perpetuity, without prejudice to my ongoing rights and privileges. You further represent that you have the authority to release me from any BOGUS AGREEMENTS on behalf of your employer. From alexander.belopolsky at gmail.com Mon Nov 22 18:00:14 2010 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Mon, 22 Nov 2010 12:00:14 -0500 Subject: [Python-Dev] len(chr(i)) = 2? In-Reply-To: References: <201011192123.14169.victor.stinner@haypocalc.com> <4CE6F93F.9010109@egenix.com> <4CE6FE30.1050903@v.loewis.de> <87hbfc1vnf.fsf@uwakimon.sk.tsukuba.ac.jp> <4CE78F62.7060707@v.loewis.de> <8739qukf9r.fsf@uwakimon.sk.tsukuba.ac.jp> <20101121173825.B1BFB235977@kimball.webabinitio.net> <60F8726F-C1C2-4803-8B8E-688EF0443FA0@gmail.com> <87eiadd46t.fsf@uwakimon.sk.tsukuba.ac.jp> <4CEA5744.3080308@v.loewis.de> <4CEA6661.4080402@egenix.com> Message-ID: On Mon, Nov 22, 2010 at 11:13 AM, Nick Coghlan wrote: .. >> Do you think these articles are helpful for someone learning how to >> use chr() and ord() in Python for the first time? > > No, that's what the documentation of chr() and ord() is for. For that > use case, it doesn't matter *what* the terms are. I recently updated chr() and ord() documentation and used "narrow/wide" terms. 
I thought USC2/4 proponents objected to that on the basis that these terms are imprecise. http://docs.python.org/dev/library/functions.html#chr http://docs.python.org/dev/library/functions.html#ord > They could say "in a > FOO build this will do X, in a BAR build it will do Y, see for > a detailed explanation of the differences between FOO and BAR builds > of Python" and be perfectly adequate for the task. If there is no > appropriate documentation link to point to (probably somewhere in the > C API docs if it isn't anywhere else) then that is a key issue that > needs to be fixed, rather than trying to change the terms that have > been in use for the better part of a decade already. > That's the point that I was trying to make. Using somewhat vague narrow/wide terms gives us an opportunity to describe exactly what is going on without confusing the reader with the intricacies of the Unicode Standard or Python'd compliance with a particular version of it. > The raw meaning of UCS2/UCS4 mainly comes into the story when people > are encountering this as a config option when building Python. The > whole idea of changing the terms for the two build types *should* have > been short circuited by the "status quo wins a stalemate" guideline, > but apparently that didn't happen at the time. > It also comes in the "Data model" reference section on String which is currently out of date: """ Strings The items of a string object are Unicode code units. A Unicode code unit is represented by a string object of one item and can hold either a 16-bit or 32-bit value representing a Unicode ordinal (the maximum value for the ordinal is given in sys.maxunicode, and depends on how Python is configured at compile time). Surrogate pairs may be present in the Unicode object, and will be reported as two separate items. The built-in functions chr() and ord() convert between code units and nonnegative integers representing the Unicode ordinals as defined in the Unicode Standard 3.0. 
Conversion from and to other encodings are possible through the string method encode(). """ http://docs.python.org/dev/reference/datamodel.html The out of date part is the reference to the Unicode Standard 3.0. I don't think we should refer to a specific version of Unicode here. It has little consequence for the "Python data model" and AFAICT does not come into play anywhere except unicodedata which is currently at version 6.0. The description of chr() and ord() is also not accurate on narrow builds and nether is the statement "The items of a string object are Unicode code units." From exarkun at twistedmatrix.com Mon Nov 22 17:46:54 2010 From: exarkun at twistedmatrix.com (exarkun at twistedmatrix.com) Date: Mon, 22 Nov 2010 16:46:54 -0000 Subject: [Python-Dev] [Python-checkins] r86633 - in python/branches/py3k: Doc/library/inspect.rst Doc/whatsnew/3.2.rst Lib/inspect.py Lib/test/test_inspect.py Misc/NEWS In-Reply-To: <20101122172440.77d27ed5@pitrou.net> References: <20101121034404.52924F20A@mail.python.org> <4CE9BF4A.1020302@netwok.org> <4CEA89E8.5090107@voidspace.org.uk> <20101122163722.7e96d123@pitrou.net> <4CEA9584.7040301@avl.com> <20101122172440.77d27ed5@pitrou.net> Message-ID: <20101122164654.2109.588145158.divmod.xquotient.165@localhost.localdomain> On 04:24 pm, solipsis at pitrou.net wrote: >On Mon, 22 Nov 2010 17:08:36 +0100 >Hrvoje Niksic wrote: >>On 11/22/2010 04:37 PM, Antoine Pitrou wrote: >> > +1. The problem with int constants is that the int gets printed, >>not >> > the name, when you dump them for debugging purposes :) >> >>Well, it's trivial to subclass int to something with a nicer __repr__. >>PyGTK uses that technique for wrapping C enums: > >Nice. It might be useful to add a private _Constant class somewhere for >stdlib purposes. http://www.python.org/dev/peps/pep-0354/ >Regards > >Antoine. 
> > >_______________________________________________ >Python-Dev mailing list >Python-Dev at python.org >http://mail.python.org/mailman/listinfo/python-dev >Unsubscribe: http://mail.python.org/mailman/options/python- >dev/exarkun%40twistedmatrix.com From ezio.melotti at gmail.com Mon Nov 22 18:14:03 2010 From: ezio.melotti at gmail.com (Ezio Melotti) Date: Mon, 22 Nov 2010 19:14:03 +0200 Subject: [Python-Dev] Re-enable warnings in regrtest and/or unittest Message-ID: <4CEAA4DB.6020904@gmail.com> I would like to re-enable by default warnings for regrtest and/or unittest. The reasons are: 1) these tools are used mainly by developers and they (should) care about warnings; 2) developers won't have to remember that warning are silenced and how to enable them manually; 3) developers won't have to enable them manually every time they run the tests; 4) some developers are not even aware that warnings have been silenced and might not notice things like DeprecationWarnings until the function/method/class/etc gets removed and breaks their code; 5) another developer tool -- the --with-pydebug flag -- already re-enables warnings when it's used; If this is fixed in unittest it won't be necessary to patch regrtest. If it's fixed in regrtest only the core developers will benefit from this. This could be fixed checking if any warning flags (-Wx) are passed to python. If no flags are passed the default will be -Wd, otherwise the behavior will be the one specified by the flag. This will allow developers to use `python -Wi` to ignore errors explicitly. Best Regards, Ezio Melotti From rdmurray at bitdance.com Mon Nov 22 18:30:29 2010 From: rdmurray at bitdance.com (R. David Murray) Date: Mon, 22 Nov 2010 12:30:29 -0500 Subject: [Python-Dev] len(chr(i)) = 2? 
In-Reply-To: References: <201011192123.14169.victor.stinner@haypocalc.com> <4CE6F93F.9010109@egenix.com> <4CE6FE30.1050903@v.loewis.de> <87hbfc1vnf.fsf@uwakimon.sk.tsukuba.ac.jp> <4CE78F62.7060707@v.loewis.de> <8739qukf9r.fsf@uwakimon.sk.tsukuba.ac.jp> <20101121173825.B1BFB235977@kimball.webabinitio.net> <60F8726F-C1C2-4803-8B8E-688EF0443FA0@gmail.com> <87eiadd46t.fsf@uwakimon.sk.tsukuba.ac.jp> <4CEA5744.3080308@v.loewis.de> <4CEA6661.4080402@egenix.com> Message-ID: <20101122173029.CB5AA235E1E@kimball.webabinitio.net> On Mon, 22 Nov 2010 12:00:14 -0500, Alexander Belopolsky wrote: > I recently updated chr() and ord() documentation and used > "narrow/wide" terms. I thought USC2/4 proponents objected to that on > the basis that these terms are imprecise. For reference, a grep in py3k/Doc reveals that there are currently exactly 23 lines mentioning UCS2 or UCS4 in the docs. Most are in the unicode part of the c-api, and 6 are in what's new for 2.2: c-api/arg.rst: Convert a null-terminated buffer of Unicode (UCS-2 or UCS-4) data to a Python c-api/arg.rst: Convert a Unicode (UCS-2 or UCS-4) data buffer and its length to a Python c-api/unicode.rst: for :c:type:`Py_UNICODE` and store Unicode values internally as UCS2. It is also c-api/unicode.rst: possible to build a UCS4 version of Python (most recent Linux distributions come c-api/unicode.rst: with UCS4 builds of Python). These builds then use a 32-bit type for c-api/unicode.rst: :c:type:`Py_UNICODE` and store Unicode data internally as UCS4. On platforms c-api/unicode.rst: short` (UCS2) or :c:type:`unsigned long` (UCS4). c-api/unicode.rst:Note that UCS2 and UCS4 Python builds are not binary compatible. Please keep c-api/unicode.rst: values is interpreted as an UCS-2 character. whatsnew/2.2.rst:usually stored as UCS-2, as 16-bit unsigned integers. 
Python 2.2 can also be whatsnew/2.2.rst:compiled to use UCS-4, 32-bit unsigned integers, as its internal encoding by whatsnew/2.2.rst:supplying :option:`--enable-unicode=ucs4` to the configure script. (It's also whatsnew/2.2.rst:When built to use UCS-4 (a "wide Python"), the interpreter can natively handle whatsnew/2.2.rst:compiled to use UCS-2 (a "narrow Python"), values greater than 65535 will still whatsnew/2.2.rst:Marc-André Lemburg. The changes to support using UCS-4 internally were howto/unicode.rst:.. comment Additional topic: building Python w/ UCS2 or UCS4 support howto/unicode.rst: - [ ] Building Python (UCS2, UCS4) library/sys.rst: characters are stored as UCS-2 or UCS-4. library/json.rst: specified. Encodings that are not ASCII based (such as UCS-2) are not faq/extending.rst:When importing module X, why do I get "undefined symbol: PyUnicodeUCS2*"? faq/extending.rst:If instead the name of the undefined symbol starts with ``PyUnicodeUCS4``, the faq/extending.rst: ... print('UCS4 build') faq/extending.rst: ... print('UCS2 build') -- R. David Murray www.bitdance.com From lukasz at langa.pl Mon Nov 22 18:35:16 2010 From: lukasz at langa.pl (Łukasz Langa) Date: Mon, 22 Nov 2010 18:35:16 +0100 Subject: [Python-Dev] Re-enable warnings in regrtest and/or unittest In-Reply-To: <4CEAA4DB.6020904@gmail.com> References: <4CEAA4DB.6020904@gmail.com> Message-ID: <4CEAA9D4.2020904@langa.pl> On 22.11.2010 18:14, Ezio Melotti wrote: > I would like to re-enable by default warnings for regrtest and/or > unittest. +1 Especially in regrtest it could help manage stdlib quality (currently we have a horde of ResourceWarnings, zipfile mostly). I would even be +1 on making warnings errors for regrtest but that seems to be unpopular on #python-dev.
Best regards, Łukasz Langa From alexander.belopolsky at gmail.com Mon Nov 22 18:37:59 2010 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Mon, 22 Nov 2010 12:37:59 -0500 Subject: [Python-Dev] len(chr(i)) = 2? In-Reply-To: <20101122173029.CB5AA235E1E@kimball.webabinitio.net> References: <201011192123.14169.victor.stinner@haypocalc.com> <4CE6F93F.9010109@egenix.com> <4CE6FE30.1050903@v.loewis.de> <87hbfc1vnf.fsf@uwakimon.sk.tsukuba.ac.jp> <4CE78F62.7060707@v.loewis.de> <8739qukf9r.fsf@uwakimon.sk.tsukuba.ac.jp> <20101121173825.B1BFB235977@kimball.webabinitio.net> <60F8726F-C1C2-4803-8B8E-688EF0443FA0@gmail.com> <87eiadd46t.fsf@uwakimon.sk.tsukuba.ac.jp> <4CEA5744.3080308@v.loewis.de> <4CEA6661.4080402@egenix.com> <20101122173029.CB5AA235E1E@kimball.webabinitio.net> Message-ID: On Mon, Nov 22, 2010 at 12:30 PM, R. David Murray wrote: .. > For reference, a grep in py3k/Doc reveals that there are currently exactly > 23 lines mentioning UCS2 or UCS4 in the docs. Did you grep for USC-2 and USC-4 as well? I have to admit that my aversion to these terms is mostly due to the fact that I don't know how to spell them correctly. :-) From tjreedy at udel.edu Mon Nov 22 18:41:46 2010 From: tjreedy at udel.edu (Terry Reedy) Date: Mon, 22 Nov 2010 12:41:46 -0500 Subject: [Python-Dev] len(chr(i)) = 2? In-Reply-To: <87eiadd46t.fsf@uwakimon.sk.tsukuba.ac.jp> References: <201011192123.14169.victor.stinner@haypocalc.com> <4CE6F93F.9010109@egenix.com> <4CE6FE30.1050903@v.loewis.de> <87hbfc1vnf.fsf@uwakimon.sk.tsukuba.ac.jp> <4CE78F62.7060707@v.loewis.de> <8739qukf9r.fsf@uwakimon.sk.tsukuba.ac.jp> <20101121173825.B1BFB235977@kimball.webabinitio.net> <60F8726F-C1C2-4803-8B8E-688EF0443FA0@gmail.com> <87eiadd46t.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On 11/22/2010 5:48 AM, Stephen J. Turnbull wrote: > I disagree.
I do see a problem with "UCS-2", because it fails to tell > us that Python implements a large number of features that make it easy > to do a very good job of working with non-BMP data in 16-bit builds of Yes. As I read the standard, UCS-2 is limited to BMP chars. So I was a bit confused when Python was described as UCS-2, until I realized that the term was inaccurate. Using that term punishes people like me who take the time to read the standard or otherwise learn what the term means. What Python does might be called USC-2+ or UCS-2e (xtended). -- Terry Jan Reedy From fuzzyman at voidspace.org.uk Mon Nov 22 18:45:58 2010 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Mon, 22 Nov 2010 17:45:58 +0000 Subject: [Python-Dev] Re-enable warnings in regrtest and/or unittest In-Reply-To: <4CEAA9D4.2020904@langa.pl> References: <4CEAA4DB.6020904@gmail.com> <4CEAA9D4.2020904@langa.pl> Message-ID: <4CEAAC56.2090702@voidspace.org.uk> On 22/11/2010 17:35, Łukasz Langa wrote: > On 22.11.2010 18:14, Ezio Melotti wrote: >> I would like to re-enable by default warnings for regrtest and/or >> unittest. > > +1 > > Especially in regrtest it could help manage stdlib quality (currently > we have a horde of ResourceWarnings, zipfile mostly). I would even be > +1 on making warnings errors for regrtest but that seems to be > unpopular on #python-dev. > Enabling it for regrtest makes sense. For unittest I still think it is a choice that should be left to developers. Michael > Best regards, > Łukasz Langa -- http://www.voidspace.org.uk/ From raymond.hettinger at gmail.com Mon Nov 22 19:13:30 2010 From: raymond.hettinger at gmail.com (Raymond Hettinger) Date: Mon, 22 Nov 2010 10:13:30 -0800 Subject: [Python-Dev] len(chr(i)) = 2? 
In-Reply-To: <87eiadd46t.fsf@uwakimon.sk.tsukuba.ac.jp> References: <201011192123.14169.victor.stinner@haypocalc.com> <4CE6F93F.9010109@egenix.com> <4CE6FE30.1050903@v.loewis.de> <87hbfc1vnf.fsf@uwakimon.sk.tsukuba.ac.jp> <4CE78F62.7060707@v.loewis.de> <8739qukf9r.fsf@uwakimon.sk.tsukuba.ac.jp> <20101121173825.B1BFB235977@kimball.webabinitio.net> <60F8726F-C1C2-4803-8B8E-688EF0443FA0@gmail.com> <87eiadd46t.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On Nov 22, 2010, at 2:48 AM, Stephen J. Turnbull wrote: > Raymond Hettinger writes: > >> Neither UTF-16 nor UCS-2 is exactly correct anyway. > > From a standards lawyer point of view, UCS-2 is exactly correct, You're twisting yourself into definitional knots. Any explanation we give users needs to let them know two things: * that we cover the entire range of unicode not just BMP * that sometimes len(chr(i)) is one and sometimes two The term UCS-2 is a complete communications failure in that regard. If someone looks up the term, they will immediately see something like the wikipedia entry which says, "UCS-2 cannot represent code points outside the BMP". How is that helpful?
Raymond -------------- next part -------------- An HTML attachment was scrubbed... URL: From raymond.hettinger at gmail.com Mon Nov 22 19:29:33 2010 From: raymond.hettinger at gmail.com (Raymond Hettinger) Date: Mon, 22 Nov 2010 10:29:33 -0800 Subject: [Python-Dev] len(chr(i)) = 2? In-Reply-To: References: <201011192123.14169.victor.stinner@haypocalc.com> <4CE6F93F.9010109@egenix.com> <4CE6FE30.1050903@v.loewis.de> <87hbfc1vnf.fsf@uwakimon.sk.tsukuba.ac.jp> <4CE78F62.7060707@v.loewis.de> <8739qukf9r.fsf@uwakimon.sk.tsukuba.ac.jp> <20101121173825.B1BFB235977@kimball.webabinitio.net> <60F8726F-C1C2-4803-8B8E-688EF0443FA0@gmail.com> <87eiadd46t.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On Nov 22, 2010, at 9:41 AM, Terry Reedy wrote: > On 11/22/2010 5:48 AM, Stephen J. Turnbull wrote: > >> I disagree. I do see a problem with "UCS-2", because it fails to tell >> us that Python implements a large number of features that make it easy >> to do a very good job of working with non-BMP data in 16-bit builds of > > Yes. As I read the standard, UCS-2 is limited to BMP chars. So I was a bit confused when Python was described as UCS-2, until I realized that the term was inaccurate. Using that term punishes people like me who take the time to read the standard or otherwise learn what the term means. Bingo! Thanks for the excellent summary of the problem. > > What Python does might be called USC-2+ or UCS-2e (xtended). That would be a step in the right direction. Raymond From jcea at jcea.es Mon Nov 22 19:34:49 2010 From: jcea at jcea.es (Jesus Cea) Date: Mon, 22 Nov 2010 19:34:49 +0100 Subject: [Python-Dev] Solaris family and 64 bits compiling Message-ID: <4CEAB7C9.7020504@jcea.es> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 A Solaris installation contains ALWAYS 32 and 64 bits libraries. So in any Solaris you can run 32/64 bits programs, and compile in 32 and 64 bits. 
For this, libraries are stores in "/usr/lib", for instance, for 32 bits, while the same 64 bits libraries are stored in "/usr/lib/64". Currently, python do not considerate this. We have Solaris 10 buildslaves, but they compile in 32 bits, aparently. For instance . We now have 32 and 64 bits OpenIndiana buildslaves, so we can actually check this. They were deployed yesterday. Apparently the changes would be pretty simple, adding ".../64" to library paths, to try to find the extra libraries. What do you think?. - -- Jesus Cea Avion _/_/ _/_/_/ _/_/_/ jcea at jcea.es - http://www.jcea.es/ _/_/ _/_/ _/_/ _/_/ _/_/ jabber / xmpp:jcea at jabber.org _/_/ _/_/ _/_/_/_/_/ . _/_/ _/_/ _/_/ _/_/ _/_/ "Things are not so easy" _/_/ _/_/ _/_/ _/_/ _/_/ _/_/ "My name is Dump, Core Dump" _/_/_/ _/_/_/ _/_/ _/_/ "El amor es poner tu felicidad en la felicidad de otro" - Leibniz -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iQCVAwUBTOq3yZlgi5GaxT1NAQLQhAP9G2liX+YveYmfYDOuVjWWS8PE7r2wM/XA 5rik9mJM4Z7/wDnY4wrWjG5l3B9sSyrhhNI1YmIcXm4klfYxV9xTkG9dMNL+2bVc +s98rlTdjNlMVTf8Xc7U3tMpdkG/JK0+XWmRfWsf52ATdtxPHazI9L6KvqdYjNuZ 2w3dXNXErZE= =oYXo -----END PGP SIGNATURE----- From mal at egenix.com Mon Nov 22 19:53:00 2010 From: mal at egenix.com (M.-A. Lemburg) Date: Mon, 22 Nov 2010 19:53:00 +0100 Subject: [Python-Dev] len(chr(i)) = 2? 
In-Reply-To: References: <201011192123.14169.victor.stinner@haypocalc.com> <4CE6F93F.9010109@egenix.com> <4CE6FE30.1050903@v.loewis.de> <87hbfc1vnf.fsf@uwakimon.sk.tsukuba.ac.jp> <4CE78F62.7060707@v.loewis.de> <8739qukf9r.fsf@uwakimon.sk.tsukuba.ac.jp> <20101121173825.B1BFB235977@kimball.webabinitio.net> <60F8726F-C1C2-4803-8B8E-688EF0443FA0@gmail.com> <87eiadd46t.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <4CEABC0C.4080909@egenix.com> Raymond Hettinger wrote: > Any explanation we give users needs to let them know two things: > * that we cover the entire range of unicode not just BMP > * that sometimes len(chr(i)) is one and sometimes two > > The term UCS-2 is a complete communications failure > in that regard. If someone looks up the term, they will > immediately see something like the wikipedia entry which says, > "UCS-2 cannot represent code points outside the BMP". > How is that helpful? It's very helpful, since it explains why a UCS-2 build of Python requires a surrogates pair to represent a non-BMP code point and explains why chr(i) gives you a length 2 string rather than a length 1 string. A UCS-4 build does not need to use surrogates for this, hence you get a length 1 string from chr(i). There are two levels we have to explain to users: 1. the transfer level 2. the storage level The UTF encodings address the transfer level and is what you deal with in I/O. These provide variable length encodings of the complete Unicode code point range, regardless of whether you have a UCS-2 or a UCS-4 build. The storage level becomes important if you want to work on strings using indexing and slicing. Here you do have to know whether you're dealing with a UCS-2 or a UCS-4 build, since the indexes will vary if you're using non-BMP code points. 
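The length-2 result on a 16-bit build follows from the UTF-16 surrogate arithmetic. A minimal sketch of that arithmetic, assuming a valid non-surrogate code point; the helper name utf16_code_units is made up for illustration and is not part of any Python API.

```python
def utf16_code_units(code_point):
    """Return the list of 16-bit code units that UTF-16 uses for one code point."""
    if code_point < 0x10000:
        # BMP code points fit in a single 16-bit unit.
        return [code_point]
    # Non-BMP code points are split into a high and a low surrogate,
    # each carrying 10 bits of the offset above U+FFFF.
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)
    low = 0xDC00 + (offset & 0x3FF)
    return [high, low]


for cp in (0x41, 0x1D11E):
    units = utf16_code_units(cp)
    print(hex(cp), [hex(u) for u in units], "length", len(units))
```

A build that stores 16-bit units would index and slice over these units, which is why a non-BMP character occupies two positions there and one position in a 32-bit build.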
Finally, to tie both together, we have to explain that UTF-16 (the transfer encoding) maps to UCS-2 in a straight-forward way, so it is possible to work with a UCS-2 build of Python and still use the complete Unicode code point range - you only have to take into consideration, that Python's string indexing will not necessarily point you to n-th code point in a string, but may well give you half or a surrogate. Note that while that last aspect may appear like a good argument for UCS-4 builds, in reality it is not. UCS-4 has the same issue on a different level: the letters that get printed on the screen or printer (graphemes) may well be made up of multiple combining code points, e.g. an "e" and an "?". Those again map to two indexes in the Python string, even though, the appear to be one character on output. Now try to explain all of the above using the terms "narrow" and "wide" (while remembering "explicit is better than implicit" and "avoid the temptation to guess") :-) It is not really helpful to replace a correct and accurate term with a fuzzy term: either way we're stuck with the semantics. However, the correct and accurate terms at least give you a chance to figure out and understand the reasoning behind the design. UCS-2 vs. UCS-4 is a trade-off, "narrow" and "wide" is marketing talk with an implicit emphasis on one side :-) -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Nov 22 2010) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ ::: Try our new mxODBC.Connect Python Database Interface for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. 
Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From ezio.melotti at gmail.com Mon Nov 22 19:58:33 2010 From: ezio.melotti at gmail.com (Ezio Melotti) Date: Mon, 22 Nov 2010 20:58:33 +0200 Subject: [Python-Dev] Re-enable warnings in regrtest and/or unittest In-Reply-To: <4CEAAC56.2090702@voidspace.org.uk> References: <4CEAA4DB.6020904@gmail.com> <4CEAA9D4.2020904@langa.pl> <4CEAAC56.2090702@voidspace.org.uk> Message-ID: <4CEABD59.6080005@gmail.com> On 22/11/2010 19.45, Michael Foord wrote: > On 22/11/2010 17:35, ?ukasz Langa wrote: >> Am 22.11.2010 18:14, schrieb Ezio Melotti: >>> I would like to re-enable by default warnings for regrtest and/or >>> unittest. >> >> +1 >> >> Especially in regrtest it could help manage stdlib quality (currently >> we have a horde of ResourceWarnings, zipfile mostly). I would even be >> +1 on making warnings errors for regrtest but that seems to be >> unpopular on #python-dev. >> As I said on IRC I think it makes sense to turn them into errors once we fixed/silenced all the ones that we have now. That would help keeping the number of warning to 0. > > Enabling it for regrtest makes sense. For unittest I still think it is > a choice that should be left to developers. If we consider that most of the developers want to see them, I'd prefer to have the warnings by default rather than having to use -Wd explicitly every time I run the tests (keep in mind that many developers out there don't even know/remember that now they should use -Wd). 
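The flag check proposed earlier in this thread (show warnings by default unless a -W option was given) could look roughly like the sketch below. It is only an illustration: enable_default_warnings is a hypothetical helper, not the code in regrtest or unittest; sys.warnoptions and warnings.simplefilter are the real stdlib names it relies on.

```python
import sys
import warnings


def enable_default_warnings(warnoptions=None):
    """Switch to the 'default' warning action unless -W flags were passed.

    Returns True if the filter was installed, False if the user's own
    -W options were left in charge.
    """
    if warnoptions is None:
        # sys.warnoptions holds the -W options given on the command line.
        warnoptions = sys.warnoptions
    if not warnoptions:
        # Same effect as running with -Wd.
        warnings.simplefilter("default")
        return True
    return False


if __name__ == "__main__":
    print("warnings enabled:", enable_default_warnings())
```

With this shape, `python -Wi` still silences everything because the helper does nothing when any -W option is present.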
> > Michael > >> Best regards, >> Łukasz Langa >> _______________________________________________ >> Python-Dev mailing list >> Python-Dev at python.org >> http://mail.python.org/mailman/listinfo/python-dev >> Unsubscribe: >> http://mail.python.org/mailman/options/python-dev/fuzzyman%40voidspace.org.uk > > From alexander.belopolsky at gmail.com Mon Nov 22 20:09:14 2010 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Mon, 22 Nov 2010 14:09:14 -0500 Subject: [Python-Dev] len(chr(i)) = 2? In-Reply-To: References: <201011192123.14169.victor.stinner@haypocalc.com> <4CE6F93F.9010109@egenix.com> <4CE6FE30.1050903@v.loewis.de> <87hbfc1vnf.fsf@uwakimon.sk.tsukuba.ac.jp> <4CE78F62.7060707@v.loewis.de> <8739qukf9r.fsf@uwakimon.sk.tsukuba.ac.jp> <20101121173825.B1BFB235977@kimball.webabinitio.net> <60F8726F-C1C2-4803-8B8E-688EF0443FA0@gmail.com> <87eiadd46t.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On Mon, Nov 22, 2010 at 12:41 PM, Terry Reedy wrote: .. > What Python does might be called USC-2+ or UCS-2e (xtended). > Wow! I am not the only one who can't get the order of letters right in these acronyms. (I am usually consistent within one sentence, though.) :-) I-can't-spell-three-letter-acronyms-right-ly yours ... From brett at python.org Mon Nov 22 20:12:26 2010 From: brett at python.org (Brett Cannon) Date: Mon, 22 Nov 2010 11:12:26 -0800 Subject: [Python-Dev] Solaris family and 64 bits compiling In-Reply-To: <4CEAB7C9.7020504@jcea.es> References: <4CEAB7C9.7020504@jcea.es> Message-ID: On Mon, Nov 22, 2010 at 10:34, Jesus Cea wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > A Solaris installation contains ALWAYS 32 and 64 bits libraries. So in > any Solaris you can run 32/64 bits programs, and compile in 32 and 64 bits. > > For this, libraries are stores in "/usr/lib", for instance, for 32 bits, > while the same 64 bits libraries are stored in "/usr/lib/64". > > Currently, python do not considerate this.
> > We have Solaris 10 buildslaves, but they compile in 32 bits, aparently. > For instance > . > > We now have 32 and 64 bits OpenIndiana buildslaves, so we can actually > check this. They were deployed yesterday. > > Apparently the changes would be pretty simple, adding ".../64" to > library paths, to try to find the extra libraries. > > What do you think?. Are you asking about buildbots only or as a general policy? If you are asking about the buildbots then I definitely think we should use 64 bits. If you are asking about policy I would say it should be an option in case people are using C extensions that are not designed to work with 64 bits. From brett at python.org Mon Nov 22 20:24:34 2010 From: brett at python.org (Brett Cannon) Date: Mon, 22 Nov 2010 11:24:34 -0800 Subject: [Python-Dev] Re-enable warnings in regrtest and/or unittest In-Reply-To: <4CEABD59.6080005@gmail.com> References: <4CEAA4DB.6020904@gmail.com> <4CEAA9D4.2020904@langa.pl> <4CEAAC56.2090702@voidspace.org.uk> <4CEABD59.6080005@gmail.com> Message-ID: On Mon, Nov 22, 2010 at 10:58, Ezio Melotti wrote: > On 22/11/2010 19.45, Michael Foord wrote: >> >> On 22/11/2010 17:35, ?ukasz Langa wrote: >>> >>> Am 22.11.2010 18:14, schrieb Ezio Melotti: >>>> >>>> I would like to re-enable by default warnings for regrtest and/or >>>> unittest. >>> >>> +1 >>> >>> Especially in regrtest it could help manage stdlib quality (currently we >>> have a horde of ResourceWarnings, zipfile mostly). I would even be +1 on >>> making warnings errors for regrtest but that seems to be unpopular on >>> #python-dev. >>> > > As I said on IRC I think it makes sense to turn them into errors once we > fixed/silenced all the ones that we have now. That would help keeping the > number of warning to 0. I agree. > >> >> Enabling it for regrtest makes sense. For unittest I still think it is a >> choice that should be left to developers. 
> > If we consider that most of the developers want to see them, I'd prefer to > have the warnings by default rather than having to use -Wd explicitly every > time I run the tests (keep in mind that many developers out there don't even > know/remember that now they should use -Wd). The problem with that is it means developers who switch to Python 3.2 or whatever are suddenly going to have their tests fail until they update their code to turn the warnings off. Then again, if we make the switch for this dead simple to add and backwards-compatible so that turning them off doesn't trigger an error in older versions then I am all for turning warnings on by default. Another approach is to have unittest's runner, when run in verbose mode, print out what the warnings filter is set to so developers are aware that they are silencing warnings. -Brett > > >> >> Michael >> >>> Best regards, >>> Łukasz Langa From jcea at jcea.es Mon Nov 22 20:26:40 2010 From: jcea at jcea.es (Jesus Cea) Date: Mon, 22 Nov 2010 20:26:40 +0100 Subject: [Python-Dev] Solaris family and 64 bits compiling In-Reply-To: References: <4CEAB7C9.7020504@jcea.es> Message-ID: <4CEAC3F0.4040806@jcea.es> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 22/11/10 20:12, Brett Cannon wrote: > Are you asking about buildbots only or as a general policy? If you are > asking about the buildbots then I definitely think we should use 64 > bits.
If you are asking about policy I would say it should be an > option in case people are using C extensions that are not designed to > work with 64 bits. The point is that building python in 64 bits under Solaris (family) is not easy, because the 64 bits libraries (zlib, openssl, berkeley db, curses, etc., etc., etc) are not is "/usr/lib", "/usr/local/lib", etc., but "/usr/lib/64", "/usr/local/lib/64", etc. Solaris overcomes most of the issue having separate library searchpath in 32 and 64 bits (via the "crle" command). But in some cases python try to find some library in "/usr/local/lib", and my point is that it should search TOO inside "/usr/local/lib/64". - -- Jesus Cea Avion _/_/ _/_/_/ _/_/_/ jcea at jcea.es - http://www.jcea.es/ _/_/ _/_/ _/_/ _/_/ _/_/ jabber / xmpp:jcea at jabber.org _/_/ _/_/ _/_/_/_/_/ . _/_/ _/_/ _/_/ _/_/ _/_/ "Things are not so easy" _/_/ _/_/ _/_/ _/_/ _/_/ _/_/ "My name is Dump, Core Dump" _/_/_/ _/_/_/ _/_/ _/_/ "El amor es poner tu felicidad en la felicidad de otro" - Leibniz -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iQCVAwUBTOrD8Jlgi5GaxT1NAQJhRQP/dd4q70eXsq5AUFrleqUx3A+AagChpCcp UDHAomaX26cMl0tLFwLOd4SaKizzRMvjdTJc3GhZDIqYrF3QuqZAyLPjr5tyogP8 /4KPM73l5L2cb7IdHdSHpruwMh8f2WJ4S6+ig8DzOj6qBcttXKMymrV/skum4ENJ yb4mbpH9q/0= =Oe2G -----END PGP SIGNATURE----- From barry at python.org Mon Nov 22 20:28:43 2010 From: barry at python.org (Barry Warsaw) Date: Mon, 22 Nov 2010 14:28:43 -0500 Subject: [Python-Dev] issue 9807 - abiflags in paths and symlinks (updated patch) In-Reply-To: <20101110162719.11ae7fe6@mission> References: <20101110162719.11ae7fe6@mission> Message-ID: <20101122142843.45ae45ae@mission> On Nov 10, 2010, at 04:27 PM, Barry Warsaw wrote: >I finally found a chance to address all the outstanding technical issues >mentioned in bug 9807: > > http://bugs.python.org/issue9807 > >I've uploaded a new patch which contains the rest of the changes 
I'm >proposing. I think we still need consensus about whether these changes are >good to commit. With 3.2b1 coming soon, now's the time to do that. > >If there are any remaining concerns about the details of the patch, please add >them to the tracker issue. If you have any remaining objections to the >change, please let me know or follow up here. The patch has now been updated to address the last few comments in the tracker issue. I am now ready to commit it to py3k. If there are any remaining objections or concerns, please reply here or update the tracker issue. Otherwise, I plan to commit this to py3k on Wednesday. Cheers, -Barry -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 836 bytes Desc: not available URL: From martin at v.loewis.de Mon Nov 22 20:42:16 2010 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Mon, 22 Nov 2010 20:42:16 +0100 Subject: [Python-Dev] Solaris family and 64 bits compiling In-Reply-To: <4CEAC3F0.4040806@jcea.es> References: <4CEAB7C9.7020504@jcea.es> <4CEAC3F0.4040806@jcea.es> Message-ID: <4CEAC798.5050707@v.loewis.de> > Solaris overcomes most of the issue having separate library searchpath > in 32 and 64 bits (via the "crle" command). But in some cases python try > to find some library in "/usr/local/lib", and my point is that it should > search TOO inside "/usr/local/lib/64". I don't think this will work. If the linker finds a library of the wrong ELF type, then it will choke. Before enabling anything on a build slave, a patch needs to be contributed to make it work in the first place. Regards, Martin From rdmurray at bitdance.com Mon Nov 22 20:50:14 2010 From: rdmurray at bitdance.com (R. David Murray) Date: Mon, 22 Nov 2010 14:50:14 -0500 Subject: [Python-Dev] len(chr(i)) = 2? 
In-Reply-To: References: <201011192123.14169.victor.stinner@haypocalc.com> <4CE6F93F.9010109@egenix.com> <4CE6FE30.1050903@v.loewis.de> <87hbfc1vnf.fsf@uwakimon.sk.tsukuba.ac.jp> <4CE78F62.7060707@v.loewis.de> <8739qukf9r.fsf@uwakimon.sk.tsukuba.ac.jp> <20101121173825.B1BFB235977@kimball.webabinitio.net> <60F8726F-C1C2-4803-8B8E-688EF0443FA0@gmail.com> <87eiadd46t.fsf@uwakimon.sk.tsukuba.ac.jp> <4CEA5744.3080308@v.loewis.de> <4CEA6661.4080402@egenix.com> <20101122173029.CB5AA235E1E@kimball.webabinitio.net> Message-ID: <20101122195014.B3D9A235C94@kimball.webabinitio.net> On Mon, 22 Nov 2010 12:37:59 -0500, Alexander Belopolsky wrote: > On Mon, Nov 22, 2010 at 12:30 PM, R. David Murray wrote: > .. > > For reference, a grep in py3k/Doc reveals that there are currently exactly > > 23 lines mentioning UCS2 or UCS4 in the docs. > > Did you grep for USC-2 and USC-4 as well? I have to admit that my > aversion to these terms is mostly due to the fact that I don't know > how to spell them correctly. :-) I grepped using "-ri ucs." and eliminated the false positives (of which there were only a few) by hand. -- R. David Murray www.bitdance.com From guido at python.org Mon Nov 22 22:08:57 2010 From: guido at python.org (Guido van Rossum) Date: Mon, 22 Nov 2010 13:08:57 -0800 Subject: [Python-Dev] Re-enable warnings in regrtest and/or unittest In-Reply-To: References: <4CEAA4DB.6020904@gmail.com> <4CEAA9D4.2020904@langa.pl> <4CEAAC56.2090702@voidspace.org.uk> <4CEABD59.6080005@gmail.com> Message-ID: On Mon, Nov 22, 2010 at 11:24 AM, Brett Cannon wrote: > The problem with that is it means developers who switch to Python 3.2 > or whatever are suddenly going to have their tests fail until they > update their code to turn the warnings off. That sounds like a feature to me... 
:-) -- --Guido van Rossum (python.org/~guido) From jcea at jcea.es Mon Nov 22 22:31:21 2010 From: jcea at jcea.es (Jesus Cea) Date: Mon, 22 Nov 2010 22:31:21 +0100 Subject: [Python-Dev] Solaris family and 64 bits compiling In-Reply-To: <4CEAC798.5050707@v.loewis.de> References: <4CEAB7C9.7020504@jcea.es> <4CEAC3F0.4040806@jcea.es> <4CEAC798.5050707@v.loewis.de> Message-ID: <4CEAE129.2060505@jcea.es> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 22/11/10 20:42, "Martin v. L?wis" wrote: > Before enabling anything on a build slave, a patch needs to be > contributed to make it work in the first place. I actually agree. I am not sure yet, but I am thinking that adding a "--build-64" parameter to "configure" could be an option under Solaris. Most OSs (let say, Linux) force you to choose 32/64 bits at install time, but Solaris can use both at the same time, and compilers allow to compile both (using -m32 or -m64). Since choosing 32 or 64 bits when compiling python under Solaris change the requirement, paths, etc., automating it should be a goal. PS: Martin, is there any reason to restrict the solaris 10 buildslaves to 32 bits, beside the said problems?. - -- Jesus Cea Avion _/_/ _/_/_/ _/_/_/ jcea at jcea.es - http://www.jcea.es/ _/_/ _/_/ _/_/ _/_/ _/_/ jabber / xmpp:jcea at jabber.org _/_/ _/_/ _/_/_/_/_/ . 
_/_/ _/_/ _/_/ _/_/ _/_/ "Things are not so easy" _/_/ _/_/ _/_/ _/_/ _/_/ _/_/ "My name is Dump, Core Dump" _/_/_/ _/_/_/ _/_/ _/_/ "El amor es poner tu felicidad en la felicidad de otro" - Leibniz -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iQCVAwUBTOrhKZlgi5GaxT1NAQI0cAP+OUFGVDd7UV6MdHzMenBn8fO3h4M1n0dR UZrVyYJhUYvEX9p7MRBdDNFY/6LrUITb3WCVegD3PuOymQP16GgksRfIA/jGDXyl Fe+Ed5amlDgdVPeVVH/55OodrO4SuOrJZ846G6GB1wav2IjR7I9YGxZQ6PA0LR7l 4Iph6HfcMlw= =hTNy -----END PGP SIGNATURE----- From v+python at g.nevcal.com Mon Nov 22 22:54:47 2010 From: v+python at g.nevcal.com (Glenn Linderman) Date: Mon, 22 Nov 2010 13:54:47 -0800 Subject: [Python-Dev] is this a bug? no environment variables In-Reply-To: References: <4CEA0246.9080607@g.nevcal.com> Message-ID: <4CEAE6A7.3010902@g.nevcal.com> On 11/22/2010 8:33 AM, Guido van Rossum wrote: > On Sun, Nov 21, 2010 at 9:40 PM, Glenn Linderman wrote: >> In reviewing my notes from my experimentations with CGIHTTPServer >> (Python2.6) and then http.server (Python 3.2a4), I note one behavior I >> haven't reported as a bug, nor do I know where to start to figure it out, >> other than experimentally. >> >> The experiment: launching CGIHTTPServer without environment variables, by >> the simple expedient of using a batch file to unset all the existing >> environment variables, and then launching Python2.6 with CGIHTTPServer. >> >> So it failed early: random.py fails at line 110 (Python 2.6). > What specific traceback do you get? In my copy of the code that line says > > a = long(_hexlify(_urandom(16)), 16) > > and I could just imagine that _urandom() fails for some reason to do > with the environment (it is a reference to os.urandom()), which, being > part of the C library code, might depend on the environment. > > But you're not giving enough info to debug this. Yep, that's the line. 
I'll have to re-run the scenario, but will do it on 3.2a4, hopefully tonight or tomorrow, to get the traceback. >> I suppose it is possible that some environment variables are used by Python >> directly (but I can't seem to find a documented list of them) although I >> would expect that usage to be optional, with fall-back defaults when they >> don't exist. > That is certainly the idea, but the fallbacks may not always be nice. > > Environment variables used by Python or the stdlib itself are supposed > to be named PYTHON if they are Python-specific, and there's > a way to disable all of these (-E). But there are other environment > variables (HOME and PATH come to mind) that have a broader definition > and that are used in some part of the stdlib. Plus, as I mentioned, > who knows what the non-Python C library uses (well, somebody probably > knows, but I don't know of a central source that we can actually trust > across the many platforms where Python runs). OK, thanks for the philosophy statement. That's what I didn't know, being new. >> I suppose it is even possible that some Windows APIs might >> depend on some environment variables, but I expected that the registry had >> replaced such usage completely, by now, with the environment variables >> mostly being a convenience tool for batch files, or for optional, temporary >> alteration of particular settings. > That sounds like wishful thinking. :-) Well, wishful thinking from me regarding the Windows and the registry is that Windows would be better off without a registry. But it seemed like their direction was instead to do away with environment variables, but in any case, I have little idea if they've achieved it, but should have achieved something in 6.1 versions of Windows! -------------- next part -------------- An HTML attachment was scrubbed... 
From fuzzyman at voidspace.org.uk Mon Nov 22 23:01:12 2010 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Mon, 22 Nov 2010 22:01:12 +0000 Subject: [Python-Dev] Re-enable warnings in regrtest and/or unittest In-Reply-To: References: <4CEAA4DB.6020904@gmail.com> <4CEAA9D4.2020904@langa.pl> <4CEAAC56.2090702@voidspace.org.uk> <4CEABD59.6080005@gmail.com> Message-ID: <4CEAE828.5000801@voidspace.org.uk> On 22/11/2010 21:08, Guido van Rossum wrote: > On Mon, Nov 22, 2010 at 11:24 AM, Brett Cannon wrote: >> The problem with that is it means developers who switch to Python 3.2 >> or whatever are suddenly going to have their tests fail until they >> update their code to turn the warnings off. > That sounds like a feature to me... :-) > I think Ezio was suggesting just turning warnings on by default when unittest is run, not turning them into errors. Ezio is suggesting that developers could explicitly turn warnings off again, but when you use the default test runner warnings would be shown. His logic is that warnings are for developers, and so are tests... Michael -- http://www.voidspace.org.uk/ READ CAREFULLY. By accepting and reading this email you agree, on behalf of your employer, to release me from all obligations and waivers arising from any and all NON-NEGOTIATED agreements, licenses, terms-of-service, shrinkwrap, clickwrap, browsewrap, confidentiality, non-disclosure, non-compete and acceptable use policies ("BOGUS AGREEMENTS") that I have entered into with your employer, its partners, licensors, agents and assigns, in perpetuity, without prejudice to my ongoing rights and privileges. You further represent that you have the authority to release me from any BOGUS AGREEMENTS on behalf of your employer.
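The three outcomes being weighed in this thread (warnings shown, warnings silenced, warnings turned into errors) correspond to filter actions in the standard `warnings` module. The following is only a minimal sketch of that distinction, not what regrtest or unittest actually do; `old_api` and `collect` are hypothetical names made up for illustration.

```python
import warnings


def old_api():
    """Hypothetical deprecated function, used only for illustration."""
    warnings.warn("old_api() is deprecated", DeprecationWarning, stacklevel=2)
    return "result"


def collect(action):
    """Call old_api() under the given filter action and return the
    messages of the warnings that were recorded.

    "default" is what -Wd selects, "ignore" silences the warning,
    and "error" raises it as an exception.
    """
    with warnings.catch_warnings(record=True) as caught:
        # Replace whatever filters are active with a single blanket action;
        # catch_warnings() restores the previous filters on exit.
        warnings.simplefilter(action)
        old_api()
    return [str(w.message) for w in caught]
```

Under these assumptions, `collect("ignore")` returns an empty list, `collect("default")` returns the one deprecation message, and `collect("error")` raises `DeprecationWarning` instead of returning, which is the "tests suddenly fail" case discussed above.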
From martin at v.loewis.de Mon Nov 22 23:05:40 2010 From: martin at v.loewis.de (=?ISO-8859-15?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Mon, 22 Nov 2010 23:05:40 +0100 Subject: [Python-Dev] Solaris family and 64 bits compiling In-Reply-To: <4CEAE129.2060505@jcea.es> References: <4CEAB7C9.7020504@jcea.es> <4CEAC3F0.4040806@jcea.es> <4CEAC798.5050707@v.loewis.de> <4CEAE129.2060505@jcea.es> Message-ID: <4CEAE934.9000106@v.loewis.de> > I actually agree. I am not sure yet, but I am thinking that adding a > "--build-64" parameter to "configure" could be an option under Solaris. > Most OSs (let say, Linux) force you to choose 32/64 bits at install > time Actually, that's not at all the case. Most systems these days support 32-bit and 64-bit applications simultaneously, and also support compiler tool chains that allow building for either mode. Solaris, Linux, and Windows are about on-par in this respect; OS X is more advanced as it allows to have a single binary that supports both 32-bit and 64-bit execution (making the need for adjusted path names irrelevant). > Since choosing 32 or 64 bits when compiling python under Solaris change > the requirement, paths, etc., automating it should be a goal. > > PS: Martin, is there any reason to restrict the solaris 10 buildslaves > to 32 bits, beside the said problems?. I don't see that as a restriction. I have to make a choice, and there are sooo many choices to make: - gcc vs. SunPRO - 32-bit vs. 64-bit - GNU make vs. /usr/ccs/bin/make I picked the combination which was most easy to setup, and is therefore likely to be used by most users (except for those who think 64-bit is somehow "better" than 32-bit, when it is actually the other way 'round - IMO). As for configuration, I personally prefer that setting CC indicates what type of build you want. Set CC to "gcc -m64" to indicate a 64-build. Ideally, you will *not* have to adjust library paths, since the other compiler will know on its own where to search things. 
Regards, Martin From nad at acm.org Mon Nov 22 23:12:05 2010 From: nad at acm.org (Ned Deily) Date: Mon, 22 Nov 2010 14:12:05 -0800 Subject: [Python-Dev] Solaris family and 64 bits compiling References: <4CEAB7C9.7020504@jcea.es> <4CEAC3F0.4040806@jcea.es> <4CEAC798.5050707@v.loewis.de> <4CEAE129.2060505@jcea.es> Message-ID: In article <4CEAE129.2060505 at jcea.es>, Jesus Cea wrote: > On 22/11/10 20:42, "Martin v. Löwis" wrote: > > Before enabling anything on a build slave, a patch needs to be > > contributed to make it work in the first place. > > I actually agree. I am not sure yet, but I am thinking that adding a > "--build-64" parameter to "configure" could be an option under Solaris. > Most OSs (let say, Linux) force you to choose 32/64 bits at install > time, but Solaris can use both at the same time, and compilers allow to > compile both (using -m32 or -m64). > > Since choosing 32 or 64 bits when compiling python under Solaris change > the requirement, paths, etc., automating it should be a goal. You might want to look at the existing --with-universal-archs=ARCH in configure for how this is done for OS X builds. It's probably both simpler and more complicated than would be needed elsewhere: on OS X, a single file can contain object code for multiple architectures, e.g. 32-bit and 64-bit, rather than having to have multiple files.
-- Ned Deily, nad at acm.org From brett at python.org Mon Nov 22 23:20:21 2010 From: brett at python.org (Brett Cannon) Date: Mon, 22 Nov 2010 14:20:21 -0800 Subject: [Python-Dev] Re-enable warnings in regrtest and/or unittest In-Reply-To: References: <4CEAA4DB.6020904@gmail.com> <4CEAA9D4.2020904@langa.pl> <4CEAAC56.2090702@voidspace.org.uk> <4CEABD59.6080005@gmail.com> Message-ID: On Mon, Nov 22, 2010 at 13:08, Guido van Rossum wrote: > On Mon, Nov 22, 2010 at 11:24 AM, Brett Cannon wrote: >> The problem with that is it means developers who switch to Python 3.2 >> or whatever are suddenly going to have their tests fail until they >> update their code to turn the warnings off. > > That sounds like a feature to me... :-) =) I meant update their tests with the switch to turn off the warnings, not update to make the warnings properly disappear. I guess it's a question of whether it will be errors by default or simply output the warning. I can get behind printing the warnings by default and adding a switch to make them errors or off otherwise. -Brett > > -- > --Guido van Rossum (python.org/~guido) > From anurag.chourasia at gmail.com Mon Nov 22 23:46:16 2010 From: anurag.chourasia at gmail.com (Anurag Chourasia) Date: Tue, 23 Nov 2010 04:16:16 +0530 Subject: [Python-Dev] Missing Python Symbols when Starting Python App (Apache/Django/Mod_Wsgi) Message-ID: All, I have a problem in starting my Python(Django) App using Apache and Mod_Wsgi I am using Django 1.2.3 and Python 2.6.6 running on Apache 2.2.17 with Mod_Wsgi 3.3 When I try to access the app from Web Browser, I am getting these errors. [Mon Nov 22 09:45:25 2010] [notice] Apache/2.2.17 (Unix) mod_wsgi/3.3 Python/2.6.6 configured -- resuming normal operations [Mon Nov 22 09:45:43 2010] [error] [client 108.10.0.191] mod_wsgi (pid=1273874): Target WSGI script '/u01/home/apli/wm/app/gdd/pyserver/ apache/django.wsgi' cannot be loaded as Python module. 
[Mon Nov 22 09:45:43 2010] [error] [client 108.10.0.191] mod_wsgi (pid=1273874): Exception occurred processing WSGI script '/u01/home/ apli/wm/app/gdd/pyserver/apache/django.wsgi'. [Mon Nov 22 09:45:43 2010] [error] [client 108.10.0.191] Traceback (most recent call last): [Mon Nov 22 09:45:43 2010] [error] [client 108.10.0.191] File "/u01/ home/apli/wm/app/gdd/pyserver/apache/django.wsgi", line 19, in [Mon Nov 22 09:45:43 2010] [error] [client 108.10.0.191] import django.core.handlers.wsgi [Mon Nov 22 09:45:43 2010] [error] [client 108.10.0.191] File "/usr/ local/lib/python2.6/site-packages/django/core/handlers/wsgi.py", line 1, in [Mon Nov 22 09:45:43 2010] [error] [client 108.10.0.191] from threading import Lock [Mon Nov 22 09:45:43 2010] [error] [client 108.10.0.191] File "/usr/ local/lib/python2.6/threading.py", line 13, in [Mon Nov 22 09:45:43 2010] [error] [client 108.10.0.191] from functools import wraps [Mon Nov 22 09:45:43 2010] [error] [client 108.10.0.191] File "/usr/ local/lib/python2.6/functools.py", line 10, in [Mon Nov 22 09:45:43 2010] [error] [client 108.10.0.191] from _functools import partial, reduce [Mon Nov 22 09:45:43 2010] [error] [client 108.10.0.191] ImportError: rtld: 0712-001 Symbol PyArg_UnpackTuple was referenced [Mon Nov 22 09:45:43 2010] [error] [client 108.10.0.191] from module /usr/local/lib/python2.6/lib-dynload/_functools.so(), but a runtime definition [Mon Nov 22 09:45:43 2010] [error] [client 108.10.0.191] of the symbol was not found. [Mon Nov 22 09:45:43 2010] [error] [client 108.10.0.191] rtld: 0712-001 Symbol PyCallable_Check was referenced [Mon Nov 22 09:45:43 2010] [error] [client 108.10.0.191] from module /usr/local/lib/python2.6/lib-dynload/_functools.so(), but a runtime definition [Mon Nov 22 09:45:43 2010] [error] [client 108.10.0.191] of the symbol was not found. 
[Mon Nov 22 09:45:43 2010] [error] [client 108.10.0.191] rtld: 0712-001 Symbol PyDict_Copy was referenced [Mon Nov 22 09:45:43 2010] [error] [client 108.10.0.191] from module /usr/local/lib/python2.6/lib-dynload/_functools.so(), but a runtime definition [Mon Nov 22 09:45:43 2010] [error] [client 108.10.0.191] of the symbol was not found. [Mon Nov 22 09:45:43 2010] [error] [client 108.10.0.191] rtld: 0712-001 Symbol PyDict_Merge was referenced [Mon Nov 22 09:45:43 2010] [error] [client 108.10.0.191] from module /usr/local/lib/python2.6/lib-dynload/_functools.so(), but a runtime definition [Mon Nov 22 09:45:43 2010] [error] [client 108.10.0.191] of the symbol was not found. [Mon Nov 22 09:45:43 2010] [error] [client 108.10.0.191] rtld: 0712-001 Symbol PyDict_New was referenced [Mon Nov 22 09:45:43 2010] [error] [client 108.10.0.191] from module /usr/local/lib/python2.6/lib-dynload/_functools.so(), but a runtime definition [Mon Nov 22 09:45:43 2010] [error] [client 108.10.0.191] of the symbol was not found. [Mon Nov 22 09:45:43 2010] [error] [client 108.10.0.191] rtld: 0712-001 Symbol PyErr_Occurred was referenced [Mon Nov 22 09:45:43 2010] [error] [client 108.10.0.191] from module /usr/local/lib/python2.6/lib-dynload/_functools.so(), but a runtime definition [Mon Nov 22 09:45:43 2010] [error] [client 108.10.0.191] of the symbol was not found. [Mon Nov 22 09:45:43 2010] [error] [client 108.10.0.191] rtld: 0712-001 Symbol PyErr_SetString was referenced [Mon Nov 22 09:45:43 2010] [error] [client 108.10.0.191] from module /usr/local/lib/python2.6/lib-dynload/_functools.so(), but a runtime definition [Mon Nov 22 09:45:43 2010] [error] [client 108.10.0.191] of the symbol was not found. [Mon Nov 22 09:45:43 2010] [error] [client 108.10.0.191] \t0509-021 Additional errors occurred but are not reported. I assume that those missing runtime definitions are supposed to be in the Python executable. Doing an nm on the first missing symbol reveals that it does exist. 
root [zibal]% nm /usr/local/bin/python | grep -i PyArg_UnpackTuple
.PyArg_UnpackTuple T 268683204 524
PyArg_UnpackTuple D 537073500
PyArg_UnpackTuple d 537073500 12
PyArg_UnpackTuple:F-1 - 224
Please guide. Regards, Guddu From merwok at netwok.org Mon Nov 22 23:51:18 2010 From: merwok at netwok.org (Éric Araujo) Date: Mon, 22 Nov 2010 23:51:18 +0100 Subject: [Python-Dev] Solaris family and 64 bits compiling In-Reply-To: <4CEAB7C9.7020504@jcea.es> References: <4CEAB7C9.7020504@jcea.es> Message-ID: <4CEAF3E6.4080602@netwok.org> Hi, I think this bug is related: http://bugs.python.org/issue1294959 "Problems with /usr/lib64 builds." Regards From tlesher at gmail.com Mon Nov 22 23:56:25 2010 From: tlesher at gmail.com (Tim Lesher) Date: Mon, 22 Nov 2010 17:56:25 -0500 Subject: [Python-Dev] is this a bug? no environment variables In-Reply-To: <4CEAE6A7.3010902@g.nevcal.com> References: <4CEA0246.9080607@g.nevcal.com> <4CEAE6A7.3010902@g.nevcal.com> Message-ID: On Mon, Nov 22, 2010 at 16:54, Glenn Linderman wrote: > I suppose it is possible that some environment variables are used by Python > directly (but I can't seem to find a documented list of them) although I > would expect that usage to be optional, with fall-back defaults when they > don't exist. I can verify that that's the case: Python (at least through 3.1.2) runs fine on Windows platforms when environment variables are completely unavailable. I know that from running our port for Windows CE (which has no environment variables at all), cross-compiled for Windows XP.
-- Tim Lesher From martin at v.loewis.de Tue Nov 23 00:16:47 2010 From: martin at v.loewis.de ("Martin v. Löwis") Date: Tue, 23 Nov 2010 00:16:47 +0100 Subject: [Python-Dev] Solaris family and 64 bits compiling In-Reply-To: <4CEAF3E6.4080602@netwok.org> References: <4CEAB7C9.7020504@jcea.es> <4CEAF3E6.4080602@netwok.org> Message-ID: <4CEAF9DF.6070509@v.loewis.de> Am 22.11.2010 23:51, schrieb Éric Araujo: > Hi, > > I think this bug is related: http://bugs.python.org/issue1294959 > "Problems with /usr/lib64 builds." Perhaps more closely related: http://bugs.python.org/issue847812 http://bugs.python.org/issue1733484 http://bugs.python.org/issue1676121 http://bugs.python.org/issue1628484 Regards, Martin From jcea at jcea.es Tue Nov 23 00:41:19 2010 From: jcea at jcea.es (Jesus Cea) Date: Tue, 23 Nov 2010 00:41:19 +0100 Subject: [Python-Dev] Solaris family and 64 bits compiling In-Reply-To: <4CEAE934.9000106@v.loewis.de> References: <4CEAB7C9.7020504@jcea.es> <4CEAC3F0.4040806@jcea.es> <4CEAC798.5050707@v.loewis.de> <4CEAE129.2060505@jcea.es> <4CEAE934.9000106@v.loewis.de> Message-ID: <4CEAFF9F.5070503@jcea.es> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 22/11/10 23:05, "Martin v. Löwis" wrote: >> PS: Martin, is there any reason to restrict the solaris 10 buildslaves >> to 32 bits, beside the said problems?. > > I don't see that as a restriction. I have to make a choice, and there > are sooo many choices to make: > - gcc vs. SunPRO > - 32-bit vs. 64-bit > - GNU make vs. /usr/ccs/bin/make > > I picked the combination which was most easy to setup, and is therefore > likely to be used by most users (except for those who think 64-bit > is somehow "better" than 32-bit, when it is actually the other way > 'round - IMO). Do not think this is a personal attack. Not at all. I am deploying 32 and 64 bits buildslaves (in the same machine) and feeling the pain. You are far more experienced than me with buildbots and python.
I want to know if I am missing something. > As for configuration, I personally prefer that setting CC indicates > what type of build you want. Set CC to "gcc -m64" to indicate a > 64-build. Ideally, you will *not* have to adjust library paths, since > the other compiler will know on its own where to search things. The problem is not with system library paths. Compilers overcome that. The problem is with things like "/usr/local/lib" and hardcoded library paths in Python. For example, checking : """ gcc -shared -m64 build/temp.solaris-2.11-i86pc-3.2-pydebug/export/home/buildbot/64bits/3.x.cea-indiana-amd64/build/Modules/readline.o - -L/usr/lib/termcap -L/usr/local/lib -lreadline -lncursesw -o build/lib.solaris-2.11-i86pc-3.2-pydebug/readline.so ld: fatal: file /usr/local/lib/libncursesw.so: wrong ELF class: ELFCLASS32 ld: fatal: file processing errors. No output written to build/lib.solaris-2.11-i86pc-3.2-pydebug/readline.so collect2: ld returned 1 exit status """ The "-L/usr/local/lib" should be "-L/usr/local/lib/64". An example of many. - -- Jesus Cea Avion _/_/ _/_/_/ _/_/_/ jcea at jcea.es - http://www.jcea.es/ _/_/ _/_/ _/_/ _/_/ _/_/ jabber / xmpp:jcea at jabber.org _/_/ _/_/ _/_/_/_/_/ . 
_/_/ _/_/ _/_/ _/_/ _/_/ "Things are not so easy" _/_/ _/_/ _/_/ _/_/ _/_/ _/_/ "My name is Dump, Core Dump" _/_/_/ _/_/_/ _/_/ _/_/ "El amor es poner tu felicidad en la felicidad de otro" - Leibniz -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iQCVAwUBTOr/n5lgi5GaxT1NAQLzogP/Sb2VMe7UwK/YeB8/cQSxhuoKeNRre0pZ XCJDePusysqI3uXBHmH8vitEIILmUKd5kQ6vsFwErPIry7ikl2fbDHe7eQgNr2HK o5Xcul36bqtuKWGkDV+gIyBH/m9k4pkvc7Lfp3mvR7yiYTBB75V/azt64XSTC9si 7QjjetX5wnA= =NCtE -----END PGP SIGNATURE----- From benjamin at python.org Tue Nov 23 00:47:16 2010 From: benjamin at python.org (Benjamin Peterson) Date: Mon, 22 Nov 2010 17:47:16 -0600 Subject: [Python-Dev] [Python-checkins] r86699 - python/branches/py3k/Lib/zipfile.py In-Reply-To: <20101122233126.C8BDBEE981@mail.python.org> References: <20101122233126.C8BDBEE981@mail.python.org> Message-ID: No test? 2010/11/22 lukasz.langa : > Author: lukasz.langa > Date: Tue Nov 23 00:31:26 2010 > New Revision: 86699 > > Log: > Issue #9846: ZipExtFile provides no mechanism for closing the underlying file object > > > > Modified: > ? python/branches/py3k/Lib/zipfile.py > > Modified: python/branches/py3k/Lib/zipfile.py > ============================================================================== > --- python/branches/py3k/Lib/zipfile.py (original) > +++ python/branches/py3k/Lib/zipfile.py Tue Nov 23 00:31:26 2010 > @@ -473,9 +473,11 @@ > ? ? # Search for universal newlines or line chunks. > ? ? PATTERN = re.compile(br'^(?P [^\r\n]+)|(?P \n|\r\n?)') > > - ? ?def __init__(self, fileobj, mode, zipinfo, decrypter=None): > + ? ?def __init__(self, fileobj, mode, zipinfo, decrypter=None, > + ? ? ? ? ? ? ? ? close_fileobj=False): > ? ? ? ? self._fileobj = fileobj > ? ? ? ? self._decrypter = decrypter > + ? ? ? ?self._close_fileobj = close_fileobj > > ? ? ? ? self._compress_type = zipinfo.compress_type > ? ? ? ? 
self._compress_size = zipinfo.compress_size > @@ -647,6 +649,12 @@ > ? ? ? ? self._offset += len(data) > ? ? ? ? return data > > + ? ?def close(self): > + ? ? ? ?try: > + ? ? ? ? ? ?if self._close_fileobj: > + ? ? ? ? ? ? ? ?self._fileobj.close() > + ? ? ? ?finally: > + ? ? ? ? ? ?super().close() > > > ?class ZipFile: > @@ -889,8 +897,10 @@ > ? ? ? ? # given a file object in the constructor > ? ? ? ? if self._filePassed: > ? ? ? ? ? ? zef_file = self.fp > + ? ? ? ? ? ?should_close = False > ? ? ? ? else: > ? ? ? ? ? ? zef_file = io.open(self.filename, 'rb') > + ? ? ? ? ? ?should_close = True > > ? ? ? ? # Make sure we have an info object > ? ? ? ? if isinstance(name, ZipInfo): > @@ -944,7 +954,7 @@ > ? ? ? ? ? ? if h[11] != check_byte: > ? ? ? ? ? ? ? ? raise RuntimeError("Bad password for file", name) > > - ? ? ? ?return ?ZipExtFile(zef_file, mode, zinfo, zd) > + ? ? ? ?return ?ZipExtFile(zef_file, mode, zinfo, zd, close_fileobj=should_close) > > ? ? def extract(self, member, path=None, pwd=None): > ? ? ? ? """Extract a member from the archive to the current working directory, > _______________________________________________ > Python-checkins mailing list > Python-checkins at python.org > http://mail.python.org/mailman/listinfo/python-checkins > -- Regards, Benjamin From jcea at jcea.es Tue Nov 23 00:48:06 2010 From: jcea at jcea.es (Jesus Cea) Date: Tue, 23 Nov 2010 00:48:06 +0100 Subject: [Python-Dev] Solaris family and 64 bits compiling In-Reply-To: <4CEAE934.9000106@v.loewis.de> References: <4CEAB7C9.7020504@jcea.es> <4CEAC3F0.4040806@jcea.es> <4CEAC798.5050707@v.loewis.de> <4CEAE129.2060505@jcea.es> <4CEAE934.9000106@v.loewis.de> Message-ID: <4CEB0136.9050602@jcea.es> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 I think this is probably trivial, but is there any foolproof way to detect 64 bit builds in python, beside "sys.maxint"?. And any macro useable for conditional compilation in C?. Checking Solaris 10 header files, I see macros like "_LP64". 
Portability would be nice, but in this personal case, probably unneeded...

--
Jesus Cea Avion
jcea at jcea.es - http://www.jcea.es/
jabber / xmpp:jcea at jabber.org
"Things are not so easy"
"My name is Dump, Core Dump"
"Love is placing your happiness in the happiness of another" - Leibniz

From tjreedy at udel.edu  Tue Nov 23 00:58:03 2010
From: tjreedy at udel.edu (Terry Reedy)
Date: Mon, 22 Nov 2010 18:58:03 -0500
Subject: [Python-Dev] Missing Python Symbols when Starting Python App (Apache/Django/Mod_Wsgi)
In-Reply-To:
References:
Message-ID:

On 11/22/2010 5:46 PM, Anurag Chourasia wrote:
>
> [Mon Nov 22 09:45:43 2010] [error] [client 108.10.0.191] mod_wsgi
> (pid=1273874): Target WSGI script '/u01/home/apli/wm/app/gdd/pyserver/
> apache/django.wsgi' cannot be loaded as Python module.

All other errors probably stem from this.

> Please guide.

Ask usage questions like this on python-list or a django-specific list.
python-dev is for discussion of development of future versions of
Python, not usage of current versions.
--
Terry Jan Reedy

From martin at v.loewis.de  Tue Nov 23 01:05:59 2010
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Tue, 23 Nov 2010 01:05:59 +0100
Subject: [Python-Dev] Solaris family and 64 bits compiling
In-Reply-To: <4CEAFF9F.5070503@jcea.es>
References: <4CEAB7C9.7020504@jcea.es> <4CEAC3F0.4040806@jcea.es> <4CEAC798.5050707@v.loewis.de> <4CEAE129.2060505@jcea.es> <4CEAE934.9000106@v.loewis.de> <4CEAFF9F.5070503@jcea.es>
Message-ID: <4CEB0567.8040500@v.loewis.de>

On 23.11.2010 00:41, Jesus Cea wrote:
> On 22/11/10 23:05, "Martin v. Löwis" wrote:
>>> PS: Martin, is there any reason to restrict the solaris 10 buildslaves
>>> to 32 bits, beside the said problems?
>
>> I don't see that as a restriction. I have to make a choice, and there
>> are sooo many choices to make:
>> - gcc vs. SunPRO
>> - 32-bit vs. 64-bit
>> - GNU make vs. /usr/ccs/bin/make
>
>> I picked the combination which was most easy to setup, and is therefore
>> likely to be used by most users (except for those who think 64-bit
>> is somehow "better" than 32-bit, when it is actually the other way
>> 'round - IMO).
>
> Do not think this is a personal attack.

No offense taken. If you really want to know the historical background:
this was the very first build slave (before I actually announced it to
python-dev), and I haven't changed much from the initial setup.

I just point out that none of the binaries in /usr/bin is a 64-bit
binary; this includes the Sun-provided /usr/sfw/bin/python

> The "-L/usr/local/lib" should be "-L/usr/local/lib/64". An example of many.

Is that really the case? I.e. will ncurses automatically install into
/usr/local/lib/64 if built with a 64-bit compiler? My installation
doesn't even have a /usr/local/lib/64 folder.

In any case: this shouldn't need a configure option. Instead, Python can
find out itself whether it's a 64-bit build, and make modifications
it considers necessary.
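
A minimal sketch of the idea that a Python build could work out its own
bitness and adjust its library search path accordingly. The helper name
candidate_lib_dirs and the "<prefix>/lib/64" layout are assumptions made
for illustration (the layout is the one described for Solaris in this
thread); this is not what Python's own build scripts do.

```python
import posixpath
import sys


def is_64bit_build():
    """Return True when this interpreter was built with a 64-bit Py_ssize_t."""
    return sys.maxsize > 2**32


def candidate_lib_dirs(prefix="/usr/local", sixty_four=None):
    """List library directories to search, most specific first.

    Hypothetical helper: on a 64-bit build it offers "<prefix>/lib/64"
    before the plain "<prefix>/lib" directory, so 64-bit libraries win
    when both exist.
    """
    if sixty_four is None:
        sixty_four = is_64bit_build()
    base = posixpath.join(prefix, "lib")
    dirs = [base]
    if sixty_four:
        dirs.insert(0, posixpath.join(base, "64"))
    return dirs
```

Keeping the plain "lib" directory as a fallback is a deliberate choice
in this sketch: as noted above, not every installation has a "lib/64"
folder at all.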
Regards,
Martin

From solipsis at pitrou.net  Tue Nov 23 01:06:12 2010
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Tue, 23 Nov 2010 01:06:12 +0100
Subject: [Python-Dev] Solaris family and 64 bits compiling
References: <4CEAB7C9.7020504@jcea.es> <4CEAC3F0.4040806@jcea.es> <4CEAC798.5050707@v.loewis.de> <4CEAE129.2060505@jcea.es> <4CEAE934.9000106@v.loewis.de> <4CEB0136.9050602@jcea.es>
Message-ID: <20101123010612.119d401c@pitrou.net>

On Tue, 23 Nov 2010 00:48:06 +0100
Jesus Cea wrote:
> I think this is probably trivial, but is there any foolproof way to
> detect 64 bit builds in python, beside "sys.maxint"?

sys.maxsize

> And any macro useable for conditional compilation in C?

SIZEOF_VOID_P > 4

From brian.curtin at gmail.com  Tue Nov 23 01:06:33 2010
From: brian.curtin at gmail.com (Brian Curtin)
Date: Mon, 22 Nov 2010 18:06:33 -0600
Subject: [Python-Dev] Solaris family and 64 bits compiling
In-Reply-To: <4CEB0136.9050602@jcea.es>
References: <4CEAB7C9.7020504@jcea.es> <4CEAC3F0.4040806@jcea.es> <4CEAC798.5050707@v.loewis.de> <4CEAE129.2060505@jcea.es> <4CEAE934.9000106@v.loewis.de> <4CEB0136.9050602@jcea.es>
Message-ID:

On Mon, Nov 22, 2010 at 17:48, Jesus Cea wrote:
> I think this is probably trivial, but is there any foolproof way to
> detect 64 bit builds in python, beside "sys.maxint"?
>

import platform
platform.architecture()
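
The two suggestions above (sys.maxsize and platform.architecture()) can
be put side by side in a small sketch. The function name
describe_bitness is made up for illustration, and struct.calcsize("P")
is added here as a third view (the size of a C pointer); none of this
is taken from the thread itself.

```python
import platform
import struct
import sys


def describe_bitness():
    """Collect several views of the running interpreter's word size."""
    return {
        # Size of a C pointer in bits, as seen by the struct module.
        "pointer_bits": struct.calcsize("P") * 8,
        # sys.maxsize is the largest Py_ssize_t; it exceeds 2**32 only
        # on 64-bit builds.
        "maxsize_is_64bit": sys.maxsize > 2**32,
        # First element is a string such as "32bit" or "64bit", derived
        # by inspecting the interpreter executable.
        "platform_architecture": platform.architecture()[0],
    }


if __name__ == "__main__":
    for name, value in describe_bitness().items():
        print(name, value)
```

The checks based on the running interpreter (sys.maxsize, pointer size)
describe the build itself, while platform.architecture() inspects the
executable file, so it seems safer to treat the latter as informational.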
From martin at v.loewis.de  Tue Nov 23 01:12:16 2010
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Tue, 23 Nov 2010 01:12:16 +0100
Subject: [Python-Dev] Solaris family and 64 bits compiling
In-Reply-To: <4CEB0136.9050602@jcea.es>
References: <4CEAB7C9.7020504@jcea.es> <4CEAC3F0.4040806@jcea.es> <4CEAC798.5050707@v.loewis.de> <4CEAE129.2060505@jcea.es> <4CEAE934.9000106@v.loewis.de> <4CEB0136.9050602@jcea.es>
Message-ID: <4CEB06E0.1080204@v.loewis.de>

On 23.11.2010 00:48, Jesus Cea wrote:
> I think this is probably trivial, but is there any foolproof way to
> detect 64 bit builds in python, beside "sys.maxint"?

The canonical way is to use platform.architecture().

> And any macro useable for conditional compilation in C?

You need to be more specific than that. There are perhaps ten
independent properties you may query, depending on what precise
problem you try to solve. Most likely, you are looking for
SIZEOF_VOID_P (but don't use that unless you literally want to
know how many bytes a pointer uses, or whether it uses 4 or 8 bytes).

Regards,
Martin

From lukasz at langa.pl  Tue Nov 23 01:25:01 2010
From: lukasz at langa.pl (Łukasz Langa)
Date: Tue, 23 Nov 2010 01:25:01 +0100
Subject: [Python-Dev] [Python-checkins] r86699 - python/branches/py3k/Lib/zipfile.py
In-Reply-To:
References: <20101122233126.C8BDBEE981@mail.python.org>
Message-ID: <66720F75-169A-4702-AF53-69845701AA55@langa.pl>

Message written by Benjamin Peterson on 2010-11-23 at 00:47:

> No test?
>

The tests were there already, raising ResourceWarnings. After this
change, they stopped doing that.
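
A minimal sketch of how a test can observe the ResourceWarning mentioned
here: a file object that is dropped without being closed triggers one,
while a file closed by a "with" block does not. The names
resource_warnings_from and demo are hypothetical, and this is not the
code from the zipfile test suite.

```python
import gc
import os
import tempfile
import warnings


def resource_warnings_from(action):
    """Run action() and return the ResourceWarnings it triggered."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        action()
        gc.collect()  # help implementations without reference counting
    return [w for w in caught if issubclass(w.category, ResourceWarning)]


def demo():
    """Return (warnings from a leaked file, warnings from a closed file)."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        def leaky():
            open(path, "rb")  # dropped without an explicit close()

        def tidy():
            with open(path, "rb") as f:
                f.read()

        leaked = len(resource_warnings_from(leaky))
        clean = len(resource_warnings_from(tidy))
        return leaked, clean
    finally:
        os.remove(path)
```

Turning such warnings into errors, for example with
warnings.simplefilter("error"), would make the leaky case fail loudly
instead of passing silently.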
You may say: now they pass for the first time :)

Best regards,
Łukasz

--
Kind regards,
Łukasz Langa
tel. +48 791 080 144
WWW http://lukasz.langa.pl/

From reinout at vanrees.org  Mon Nov 22 23:52:10 2010
From: reinout at vanrees.org (Reinout van Rees)
Date: Mon, 22 Nov 2010 23:52:10 +0100
Subject: [Python-Dev] Missing Python Symbols when Starting Python App (Apache/Django/Mod_Wsgi)
In-Reply-To:
References:
Message-ID:

On 11/22/2010 11:46 PM, Anurag Chourasia wrote:
>
> I have a problem in starting my Python(Django) App using Apache and Mod_Wsgi

I'm pretty sure you're asking on the wrong list. This one is for
discussing development of python-the-language :-) You'd better head over
to the django-user mailinglist, for instance via
http://groups.google.com/group/django-users

Reinout

--
Reinout van Rees - reinout at vanrees.org - http://reinout.vanrees.org
Colleagues wanted! Django/Python job opening in Utrecht: http://tinyurl.com/35v34f9

From lukasz at langa.pl  Tue Nov 23 01:43:21 2010
From: lukasz at langa.pl (Łukasz Langa)
Date: Tue, 23 Nov 2010 01:43:21 +0100
Subject: [Python-Dev] Re-enable warnings in regrtest and/or unittest
In-Reply-To: <4CEAE828.5000801@voidspace.org.uk>
References: <4CEAA4DB.6020904@gmail.com> <4CEAA9D4.2020904@langa.pl> <4CEAAC56.2090702@voidspace.org.uk> <4CEABD59.6080005@gmail.com> <4CEAE828.5000801@voidspace.org.uk>
Message-ID:

Message written by Michael Foord on 2010-11-22 at 23:01:

> On 22/11/2010 21:08, Guido van Rossum wrote:
>> On Mon, Nov 22, 2010 at 11:24 AM, Brett Cannon wrote:
>>> The problem with that is it means developers who switch to Python 3.2
>>> or whatever are suddenly going to have their tests fail until they
>>> update their code to turn the warnings off.
>> That sounds like a feature to me... :-)
>>
> I think Ezio was suggesting just turning warnings on by default when
> unittest is run, not turning them into errors. Ezio is suggesting that
> developers could explicitly turn warnings off again, but when you use
> the default test runner warnings would be shown. His logic is that
> warnings are for developers, and so are tests...

Then again, he is not against the idea to turn those warnings into
errors, at least for regrtest. If you agree to do that for regrtest I
will clean up the tests for warnings. Already did that for zipfile so it
doesn't raise ResourceWarnings anymore. I just need to correct
multiprocessing and xmlrpc ResourceWarnings, silence some
DeprecationWarnings in the tests and we're all set. Ah, I see a couple
more with -uall but nothing scary.

Anyway, I find warnings as errors in regrtest a welcome feature. Let's
make it happen :)

--
Best regards,
Łukasz Langa
tel. +48 791 080 144
WWW http://lukasz.langa.pl/

From jcea at jcea.es  Tue Nov 23 01:47:01 2010
From: jcea at jcea.es (Jesus Cea)
Date: Tue, 23 Nov 2010 01:47:01 +0100
Subject: [Python-Dev] Solaris family and 64 bits compiling
In-Reply-To: <4CEB0567.8040500@v.loewis.de>
References: <4CEAB7C9.7020504@jcea.es> <4CEAC3F0.4040806@jcea.es> <4CEAC798.5050707@v.loewis.de> <4CEAE129.2060505@jcea.es> <4CEAE934.9000106@v.loewis.de> <4CEAFF9F.5070503@jcea.es> <4CEB0567.8040500@v.loewis.de>
Message-ID: <4CEB0F05.1040700@jcea.es>

On 23/11/10 01:05, "Martin v. Löwis" wrote:
> No offense taken.
> If you really want to know the historical background:
> this was the very first build slave (before I actually announced it to
> python-dev), and I haven't changed much from the initial setup.

I do really want to know. I love trivia :-). Thanks.

> I just point out that none of the binaries in /usr/bin is a 64-bit
> binary; this includes the Sun-provided /usr/sfw/bin/python
>
>> The "-L/usr/local/lib" should be "-L/usr/local/lib/64". An example of many.
>
> Is that really the case? I.e. will ncurses automatically install into
> /usr/local/lib/64 if built with a 64-bit compiler? My installation
> doesn't even have a /usr/local/lib/64 folder.

A fresh Solaris 10 install doesn't even have a "/usr/local" directory
:). Sadly, today most Open Source code is written as if Linux were the
only Unix system out there.

I was amazed that OpenSSL 1.0 installs automatically in
"/usr/local/ssl/lib" when compiled in 32 bits, and in
"/usr/local/ssl/lib/64" when compiled in 64 bits. I almost cried.

> In any case: this shouldn't need a configure option. Instead, Python can
> find out itself whether it's a 64-bit build, and make modifications
> it considers necessary.

I agree. Python should detect it automatically and update the paths when
compiling.

--
Jesus Cea Avion
jcea at jcea.es - http://www.jcea.es/
jabber / xmpp:jcea at jabber.org
"Things are not so easy"
"My name is Dump, Core Dump"
"Love is placing your happiness in the happiness of another" - Leibniz

From jcea at jcea.es  Tue Nov 23 01:58:46 2010
From: jcea at jcea.es (Jesus Cea)
Date: Tue, 23 Nov 2010 01:58:46 +0100
Subject: [Python-Dev] Solaris family and 64 bits compiling
In-Reply-To: <4CEB0567.8040500@v.loewis.de>
References: <4CEAB7C9.7020504@jcea.es> <4CEAC3F0.4040806@jcea.es> <4CEAC798.5050707@v.loewis.de> <4CEAE129.2060505@jcea.es> <4CEAE934.9000106@v.loewis.de> <4CEAFF9F.5070503@jcea.es> <4CEB0567.8040500@v.loewis.de>
Message-ID: <4CEB11C6.1010504@jcea.es>

On 23/11/10 01:05, "Martin v. Löwis" wrote:
> I just point out that none of the binaries in /usr/bin is a 64-bit
> binary; this includes the Sun-provided /usr/sfw/bin/python

True. This is for simplicity reasons (provide only one binary valid for
32 and 64 bits CPUs) and because 64 bits is overkill for a lot of stuff.
In my own system my only 64 bits libraries are OpenSSL, GMP, and some
multimedia stuff like mencoder, vorbis, etc, where the difference is
big. And the GCC 4.5.x install, which installs libraries (fortran,
stdc++, objective C, etc) automatically under "/usr/local/lib/64". GOOD.

But if we say that Python can be compiled as 64 bits under Solaris, it
would be nice if that were actually true. Now that we have a buildbot
(under OpenIndiana) to test, it is doable.

If not, we could say that Solaris+64 bits is unsupported. I don't think
we should go that way. Solaris+64 bits should be a full citizen.

--
Jesus Cea Avion
jcea at jcea.es - http://www.jcea.es/
jabber / xmpp:jcea at jabber.org
"Things are not so easy"
"My name is Dump, Core Dump"
"Love is placing your happiness in the happiness of another" - Leibniz

From benjamin at python.org  Tue Nov 23 05:00:08 2010
From: benjamin at python.org (Benjamin Peterson)
Date: Mon, 22 Nov 2010 22:00:08 -0600
Subject: [Python-Dev] [Python-checkins] r86699 - python/branches/py3k/Lib/zipfile.py
In-Reply-To: <66720F75-169A-4702-AF53-69845701AA55@langa.pl>
References: <20101122233126.C8BDBEE981@mail.python.org> <66720F75-169A-4702-AF53-69845701AA55@langa.pl>
Message-ID:

2010/11/22 Łukasz Langa:
> Message written by Benjamin Peterson on 2010-11-23 at 00:47:
>
> No test?
>
>
> The tests were there already, raising ResourceWarnings. After this change,
> they stopped doing that. You may say: now they pass for the first time :)

It looks like you added new API, though. For that, we would expect new tests.

--
Regards,
Benjamin

From ocean-city at m2.ccsnet.ne.jp  Tue Nov 23 05:13:38 2010
From: ocean-city at m2.ccsnet.ne.jp (Hirokazu Yamamoto)
Date: Tue, 23 Nov 2010 13:13:38 +0900
Subject: [Python-Dev] OpenSSL Voluntarily (openssl-1.0.0a)
Message-ID: <4CEB3F72.7000006@m2.ccsnet.ne.jp>

Hello. Does this affect python? Thank you.
http://www.openssl.org/news/secadv_20101116.txt

From glyph at twistedmatrix.com  Tue Nov 23 06:07:09 2010
From: glyph at twistedmatrix.com (Glyph Lefkowitz)
Date: Tue, 23 Nov 2010 00:07:09 -0500
Subject: [Python-Dev] OpenSSL Voluntarily (openssl-1.0.0a)
In-Reply-To: <4CEB3F72.7000006@m2.ccsnet.ne.jp>
References: <4CEB3F72.7000006@m2.ccsnet.ne.jp>
Message-ID:

On Mon, Nov 22, 2010 at 11:13 PM, Hirokazu Yamamoto
<ocean-city at m2.ccsnet.ne.jp> wrote:
> Hello. Does this affect python? Thank you.
>
> http://www.openssl.org/news/secadv_20101116.txt
>

No.

From tjreedy at udel.edu  Tue Nov 23 07:13:44 2010
From: tjreedy at udel.edu (Terry Reedy)
Date: Tue, 23 Nov 2010 01:13:44 -0500
Subject: [Python-Dev] [Python-checkins] r86702 - python/branches/py3k/Lib/idlelib/IOBinding.py
In-Reply-To: <20101123060131.EB345EE9C0@mail.python.org>
References: <20101123060131.EB345EE9C0@mail.python.org>
Message-ID: <4CEB5B98.6070003@udel.edu>

On 11/23/2010 1:01 AM, terry.reedy wrote:
> Author: terry.reedy
> Date: Tue Nov 23 07:01:31 2010
> New Revision: 86702
>
> Log:
Issue 9222 Fix filetypes for open dialog

Sorry, forgot to add this before clicking [go] or whatever the button
is. Is there any way to revise a revision ;-?

> Modified:
>   python/branches/py3k/Lib/idlelib/IOBinding.py
>
> Modified: python/branches/py3k/Lib/idlelib/IOBinding.py
> ==============================================================================
> --- python/branches/py3k/Lib/idlelib/IOBinding.py (original)
> +++ python/branches/py3k/Lib/idlelib/IOBinding.py Tue Nov 23 07:01:31 2010
> @@ -476,8 +476,8 @@
>      savedialog = None
>
>      filetypes = [
> -        ("Python and text files", "*.py *.pyw *.txt", "TEXT"),
> -        ("All text files", "*", "TEXT"),
> +        ("Python files", "*.py *.pyw", "TEXT"),
> +        ("Text files", "*.txt", "TEXT"),
>          ("All files", "*"),
>          ]

From orsenthil at gmail.com  Tue Nov 23 07:16:12 2010
From: orsenthil at gmail.com (Senthil Kumaran)
Date: Tue, 23 Nov 2010 14:16:12 +0800
Subject: [Python-Dev] [Python-checkins] r86703 - python/branches/release31-maint/Lib/idlelib/IOBinding.py
In-Reply-To: <20101123060705.0651CEE9C0@mail.python.org>
References: <20101123060705.0651CEE9C0@mail.python.org>
Message-ID:

Hi Terry,

On Tue, Nov 23, 2010 at 2:07 PM, terry.reedy wrote:
> Author: terry.reedy
> Date: Tue Nov 23 07:07:04 2010
> New Revision: 86703
>
> Log:
> Issue 9222 Fix filetypes for open dialog
>
> Modified:
>   python/branches/release31-maint/Lib/idlelib/IOBinding.py

You should be using svnmerge.py script ( referenced in the dev FAQ),
to merge your changes to release31-maint. This helps in merge tracking
and helpful to release managers when they do the release.

It is pretty simple, in your release31-maint checkout:

Just run python svnmerge.py merge -r 9221 (your py3k revision value)
If successful, do a svn commit -F svnmerge-output-filename ( this file
is autogenerated)
If any conflicts occur, resolve them and then do the step 2.

Thanks,
Senthil

From g.brandl at gmx.net  Tue Nov 23 07:44:43 2010
From: g.brandl at gmx.net (Georg Brandl)
Date: Tue, 23 Nov 2010 07:44:43 +0100
Subject: [Python-Dev] [Python-checkins] r86702 - python/branches/py3k/Lib/idlelib/IOBinding.py
In-Reply-To: <4CEB5B98.6070003@udel.edu>
References: <20101123060131.EB345EE9C0@mail.python.org> <4CEB5B98.6070003@udel.edu>
Message-ID:

On 23.11.2010 07:13, Terry Reedy wrote:
>
>
> On 11/23/2010 1:01 AM, terry.reedy wrote:
>> Author: terry.reedy
>> Date: Tue Nov 23 07:01:31 2010
>> New Revision: 86702
>>
>> Log:
> Issue 9222 Fix filetypes for open dialog
>
> Sorry, forgot to add this before clicking [go] or whatever the button
> is. Is there any way to revise a revision ;-?

Yes, with SVN there is. I don't know if you can do it with whatever
GUI tool you use, but the command is the following:

svn propedit --revprop -r 86702 svn:log

In a short time however, after switching to Mercurial, commits will
be truly immutable. However, since the equivalent to committing in SVN
is a two-step process (commit locally and then push one or more commits
to the public repo on the server), you can review your commits locally
before pushing them, and fix mistakes by "rewriting history" (you can
see from that description that it won't work when the changes are
already public).
Georg

From tjreedy at udel.edu  Tue Nov 23 07:49:56 2010
From: tjreedy at udel.edu (Terry Reedy)
Date: Tue, 23 Nov 2010 01:49:56 -0500
Subject: [Python-Dev] [Python-checkins] r86703 - python/branches/release31-maint/Lib/idlelib/IOBinding.py
In-Reply-To:
References: <20101123060705.0651CEE9C0@mail.python.org>
Message-ID: <4CEB6414.9020606@udel.edu>

On 11/23/2010 1:16 AM, Senthil Kumaran wrote:
> Hi Terry,
>
> On Tue, Nov 23, 2010 at 2:07 PM, terry.reedy wrote:
>> Author: terry.reedy
>> Date: Tue Nov 23 07:07:04 2010
>> New Revision: 86703
>>
>> Log:
>> Issue 9222 Fix filetypes for open dialog
>>
>> Modified:
>>   python/branches/release31-maint/Lib/idlelib/IOBinding.py
>
>
> You should be using svnmerge.py script ( referenced in the dev FAQ),
> to merge your changes to release31-maint. This helps in merge tracking
> and helpful to release managers when they do the release.
>
> It is pretty simple, in your release31-maint checkout:
>
> Just run python svnmerge.py merge -r 9221 (your py3k revision value)
> If successful, do a svn commit -F svnmerge-output-filename ( this file
> is autogenerated)

I am using TortoiseSVN which has a similar merge but does not seem to
autogenerate anything. I did use its merge + commit for the 2.7 backport.

Terry

From martin at v.loewis.de  Tue Nov 23 07:55:20 2010
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Tue, 23 Nov 2010 07:55:20 +0100
Subject: [Python-Dev] Solaris family and 64 bits compiling
In-Reply-To: <4CEB11C6.1010504@jcea.es>
References: <4CEAB7C9.7020504@jcea.es> <4CEAC3F0.4040806@jcea.es> <4CEAC798.5050707@v.loewis.de> <4CEAE129.2060505@jcea.es> <4CEAE934.9000106@v.loewis.de> <4CEAFF9F.5070503@jcea.es> <4CEB0567.8040500@v.loewis.de> <4CEB11C6.1010504@jcea.es>
Message-ID: <4CEB6558.3000600@v.loewis.de>

> But if we say the Python can be compiled as 64 bits under Solaris, would
> be nice if that was actually true. Now that we have a buildbot (under
> OpenIndiana) to test, it is doable.

But it is true, and always has been true. The lib/64 issue did not
prevent one building Python on Solaris/SPARC64 at all, including the
extension modules. Just edit Modules/Setup to suit your needs - that
has worked since 1995 (before distutils was even written).

> If not, we could say that Solaris+64 bits is unsupported. I don't think
> we should go that way. Solaris+64 bits should be a full citizen.

There we go again: "supported". Python builds on many systems which we
don't have buildbots for, including obscure systems (although Guido has
ruled that we won't specifically accept code for obscure systems
anymore, unlike we did before). It is never fully automatic (you always
have to at least make sure manually that the dependencies are
installed).

Regards,
Martin

From tjreedy at udel.edu  Tue Nov 23 08:16:11 2010
From: tjreedy at udel.edu (Terry Reedy)
Date: Tue, 23 Nov 2010 02:16:11 -0500
Subject: [Python-Dev] [Python-checkins] r86702 - python/branches/py3k/Lib/idlelib/IOBinding.py
In-Reply-To:
References: <20101123060131.EB345EE9C0@mail.python.org> <4CEB5B98.6070003@udel.edu>
Message-ID:

On 11/23/2010 1:44 AM, Georg Brandl wrote:
> On 23.11.2010 07:13, Terry Reedy wrote:
>>
>>
>> On 11/23/2010 1:01 AM, terry.reedy wrote:
>>> Author: terry.reedy
>>> Date: Tue Nov 23 07:01:31 2010
>>> New Revision: 86702
>>>
>>> Log:
>> Issue 9222 Fix filetypes for open dialog
>>
>> Sorry, forgot to add this before clicking [go] or whatever the button
>> is. Is there any way to revise a revision ;-?
>
> Yes, with SVN there is. I don't know if you can do it with whatever
> GUI tool you use, but the command is the following:
>
> svn propedit --revprop -r 86702 svn:log

(followed by new message?)

OK, done. TortoiseSVN has a nice revision log dialog. Right click and
one of the choices is 'edit log message'. Easy. I see that there is a
TortoiseHg as well.

--
Terry Jan Reedy

From g.brandl at gmx.net  Tue Nov 23 09:10:46 2010
From: g.brandl at gmx.net (Georg Brandl)
Date: Tue, 23 Nov 2010 09:10:46 +0100
Subject: [Python-Dev] [Python-checkins] r86703 - python/branches/release31-maint/Lib/idlelib/IOBinding.py
In-Reply-To: <4CEB6414.9020606@udel.edu>
References: <20101123060705.0651CEE9C0@mail.python.org> <4CEB6414.9020606@udel.edu>
Message-ID:

On 23.11.2010 07:49, Terry Reedy wrote:
>
>
> On 11/23/2010 1:16 AM, Senthil Kumaran wrote:
>> Hi Terry,
>>
>> On Tue, Nov 23, 2010 at 2:07 PM, terry.reedy wrote:
>>> Author: terry.reedy
>>> Date: Tue Nov 23 07:07:04 2010
>>> New Revision: 86703
>>>
>>> Log:
>>> Issue 9222 Fix filetypes for open dialog
>>>
>>> Modified:
>>>   python/branches/release31-maint/Lib/idlelib/IOBinding.py
>>
>>
>> You should be using svnmerge.py script ( referenced in the dev FAQ),
>> to merge your changes to release31-maint. This helps in merge tracking
>> and helpful to release managers when they do the release.
>>
>> It is pretty simple, in your release31-maint checkout:
>>
>> Just run python svnmerge.py merge -r 9221 (your py3k revision value)
>> If successful, do a svn commit -F svnmerge-output-filename ( this file
>> is autogenerated)
>
> I am using TortoiseSVN which has a similar merge but does not seem to
> autogenerate anything. I did use its merge + commit for the 2.7 backport.

While the policy is to use svnmerge and I'd expect developers to follow
this policy, in this specific case it's not as important anymore since
we use neither svnmerge's mass merging nor its blocking feature anymore.
Georg

From trent at snakebite.org  Tue Nov 23 09:40:50 2010
From: trent at snakebite.org (Trent Nelson)
Date: Tue, 23 Nov 2010 03:40:50 -0500
Subject: [Python-Dev] Stable buildbots
In-Reply-To:
References: <20101113133712.60e9be27@pitrou.net>
Message-ID: <4CEB7E12.1070201@snakebite.org>

On 14-Nov-10 3:48 AM, David Bolen wrote:
> This is a completely separate issue, though probably around just as
> long, and like the popup problem its frequency changes over time. By
> "hung" here I'm referring to cases where something must go wrong with
> a test and/or its cleanup such that a python_d process remains
> running, usually several of them at the same time.

My guess: the "hung" (single-threaded) Python process has called
select() without a timeout in order to wait for some data. However, the
data never arrives (due to a broken/failed test), and the select() never
returns. On Windows, processes seem harder to kill when they get into
this state. If I purposely wedge a Windows process with select() from
the interactive interpreter, ctrl-c has absolutely no effect (whereas on
Unix, ctrl-c will interrupt the select()).

As for why kill_python.exe doesn't seem to be able to kill said wedged
processes, the MSDN documentation on TerminateProcess[1] states the
following:

    The terminated process cannot exit until all pending I/O has
    been completed or canceled. (sic)

It's not unreasonable to assume a wedged select() constitutes pending
I/O, so that's a possible explanation as to why kill_python.exe isn't
able to terminate the processes.

(Also, kill_python currently assumes TerminateProcess() always works;
perhaps this optimism is misplaced. Also note the XXX TODO regarding the
fact that we don't kill processes that have loaded our python*.dll, but
may not be named python_d.exe. I don't think that's the issue here,
though.)

On 14-Nov-10 5:32 AM, David Bolen wrote:
> "Martin v. Löwis" writes:
>
>> This is what kill_python.exe is supposed to solve. So I recommend to
>> investigate why it fails to kill the hanging Pythons.
>
> Yeah, I know, and I can't say I disagree in principle - not sure why
> Windows doesn't let the kill in that module work (or if there's an
> issue actually running it under all conditions).
>
> At the moment though, I do know that using the sysinternals pskill
> utility externally (which is what I currently do interactively)
> definitely works so to be honest,

That's interesting. (That kill_python.exe doesn't kill the wedged
processes, but pskill does.) kill_python is pretty simple, it just calls
TerminateProcess() after acquiring a handle with the relevant
PROCESS_TERMINATE access right. That being said, that's the recommended
way to kill a process -- I doubt pskill would be going about it any
differently (although, it is sysinternals... you never know what kind of
crazy black magic it's doing behind the scenes).

Are you calling pskill with the -t flag? i.e. kill process and all
dependents? That might be the ticket, especially if killing the child
process that wedged select() is waiting on causes it to return, and
thus, makes it killable.

Otherwise, if it happens again, can you try kill_python.exe first, then
pskill, and confirm if the former fails but the latter succeeds?

    Trent.

[1]: http://msdn.microsoft.com/en-us/library/ms686714(VS.85).aspx

From v+python at g.nevcal.com  Tue Nov 23 11:30:31 2010
From: v+python at g.nevcal.com (Glenn Linderman)
Date: Tue, 23 Nov 2010 02:30:31 -0800
Subject: [Python-Dev] is this a bug? no environment variables
In-Reply-To:
References: <4CEA0246.9080607@g.nevcal.com>
Message-ID: <4CEB97C7.1070708@g.nevcal.com>

On 11/22/2010 8:33 AM, Guido van Rossum wrote:
> On Sun, Nov 21, 2010 at 9:40 PM, Glenn Linderman wrote:
>> In reviewing my notes from my experimentations with CGIHTTPServer
>> (Python2.6) and then http.server (Python 3.2a4), I note one behavior I
>> haven't reported as a bug, nor do I know where to start to figure it out,
>> other than experimentally.
>>
>> The experiment: launching CGIHTTPServer without environment variables, by
>> the simple expedient of using a batch file to unset all the existing
>> environment variables, and then launching Python2.6 with CGIHTTPServer.
>>
>> So it failed early: random.py fails at line 110 (Python 2.6).
> What specific traceback do you get? In my copy of the code that line says
>
> a = long(_hexlify(_urandom(16)), 16)
>
> and I could just imagine that _urandom() fails for some reason to do
> with the environment (it is a reference to os.urandom()), which, being
> part of the C library code, might depend on the environment.
>
> But you're not giving enough info to debug this.

OK, here is the traceback. I've upgraded the application from Python 2.6
+ CGIHTTPServer.py + bugfixes to Python 3.2a4 + http.server + bugfixes,
hoping that it would fix it, and, since it didn't, that the traceback
would be more relevant. It seems that _urandom is the likely culprit.
Traceback (most recent call last):
  File "d:\my\web\areliabl\0test\https.py", line 5, in <module>
    import server
  File "d:\my\web\areliabl\0test\server.py", line 88, in <module>
    import email.message
  File "C:\Python32\lib\email\message.py", line 17, in <module>
    from email import utils
  File "C:\Python32\lib\email\utils.py", line 27, in <module>
    import random
  File "C:\Python32\lib\random.py", line 698, in <module>
    _inst = Random()
  File "C:\Python32\lib\random.py", line 90, in __init__
    self.seed(x)
  File "C:\Python32\lib\random.py", line 108, in seed
    a = int.from_bytes(_urandom(32), 'big')
WindowsError: [Error -2146893818] Invalid Signature
From amauryfa at gmail.com Tue Nov 23 11:55:08 2010 From: amauryfa at gmail.com (Amaury Forgeot d'Arc) Date: Tue, 23 Nov 2010 11:55:08 +0100 Subject: [Python-Dev] is this a bug? no environment variables In-Reply-To: <4CEB97C7.1070708@g.nevcal.com> References: <4CEA0246.9080607@g.nevcal.com> <4CEB97C7.1070708@g.nevcal.com> Message-ID: Hi, 2010/11/23 Glenn Linderman : > File "C:\Python32\lib\random.py", line 108, in seed > a = int.from_bytes(_urandom(32), 'big') > WindowsError: [Error -2146893818] Invalid Signature In the subprocess documentation http://docs.python.org/library/subprocess.html """On Windows, in order to run a side-by-side assembly the specified env *must* include a valid SystemRoot.""" Can you keep this variable and start again? -- Amaury Forgeot d'Arc From martin at v.loewis.de Tue Nov 23 12:55:38 2010 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Tue, 23 Nov 2010 12:55:38 +0100 Subject: [Python-Dev] is this a bug?
no environment variables In-Reply-To: References: <4CEA0246.9080607@g.nevcal.com> <4CEB97C7.1070708@g.nevcal.com> Message-ID: <4CEBABBA.9050002@v.loewis.de> Am 23.11.2010 11:55, schrieb Amaury Forgeot d'Arc: > Hi, > > 2010/11/23 Glenn Linderman : >> File "C:\Python32\lib\random.py", line 108, in seed >> a = int.from_bytes(_urandom(32), 'big') >> WindowsError: [Error -2146893818] Invalid Signature > > In the subprocess documentation http://docs.python.org/library/subprocess.html > """On Windows, in order to run a side-by-side assembly the specified > env *must* include a valid SystemRoot.""" Indeed, setting SystemRoot might solve this problem. According to http://jpassing.com/2009/12/28/the-hidden-danger-of-forgetting-to-specify-systemroot-in-a-custom-environment-block/ CryptoAPI, in Windows 7, requires this variable be set. Failure to find the enhanced crypto provider would explain why the "random" module of Python fails to work. The specific cause is in the registry: HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Microsoft\Cryptography\Defaults\Provider\Microsoft Strong Cryptographic Provider has as its ImagePath value %SystemRoot%\system32\rsaenh.dll So the registry (and COM) do rely on environment variables. Regards, Martin From stephen at xemacs.org Tue Nov 23 13:15:20 2010 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Tue, 23 Nov 2010 21:15:20 +0900 Subject: [Python-Dev] len(chr(i)) = 2? In-Reply-To: References: <201011192123.14169.victor.stinner@haypocalc.com> <4CE6F93F.9010109@egenix.com> <4CE6FE30.1050903@v.loewis.de> <87hbfc1vnf.fsf@uwakimon.sk.tsukuba.ac.jp> <4CE78F62.7060707@v.loewis.de> <8739qukf9r.fsf@uwakimon.sk.tsukuba.ac.jp> <20101121173825.B1BFB235977@kimball.webabinitio.net> <60F8726F-C1C2-4803-8B8E-688EF0443FA0@gmail.com> <87eiadd46t.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <877hg4ck2v.fsf@uwakimon.sk.tsukuba.ac.jp> Terry Reedy writes: > Yes. As I read the standard, UCS-2 is limited to BMP chars. Et tu, Terry?
OK, I change my vote on the suggestion of "UCS2" to -1. If a couple of conscientious blokes like you and David both understand it that way, I can't see any way to fight it. FWIW, ISO/IEC 10646 (which is authoritative for UCS-2 and UCS-4) is available via http://standards.iso.org/ittf/PubliclyAvailableStandards/index.html Probably I'm the last non-author to ever read that document! From nadeem.vawda at gmail.com Tue Nov 23 13:15:18 2010 From: nadeem.vawda at gmail.com (Nadeem Vawda) Date: Tue, 23 Nov 2010 14:15:18 +0200 Subject: [Python-Dev] Re-enable warnings in regrtest and/or unittest In-Reply-To: References: <4CEAA4DB.6020904@gmail.com> <4CEAA9D4.2020904@langa.pl> <4CEAAC56.2090702@voidspace.org.uk> <4CEABD59.6080005@gmail.com> <4CEAE828.5000801@voidspace.org.uk> Message-ID: 2010/11/23 Łukasz Langa : > If you agree to do that for regrtest I will clean up the tests for warnings. > Already did that for zipfile so it doesn't raise ResourceWarnings anymore. I > just need to correct multiprocessing and xmlrpc ResourceWarnings, silence > some DeprecationWarnings in the tests and we're all set. Ah, I see a couple > more with -uall but nothing scary. There are also some in test_socket - I've submitted a patch on Roundup: http://bugs.python.org/issue10512 Looking at the multiprocessing warnings, they seem to be caused by leaks in the underlying package, unlike xmlrpc and socket, where it's just a matter of the test code neglecting to close the connection. So +1 to: > Anyway, I find warnings as errors in regrtest a welcome feature.
Let's make > it happen :) Nadeem From jcea at jcea.es Tue Nov 23 13:19:39 2010 From: jcea at jcea.es (Jesus Cea) Date: Tue, 23 Nov 2010 13:19:39 +0100 Subject: [Python-Dev] Solaris family and 64 bits compiling In-Reply-To: <4CEB6558.3000600@v.loewis.de> References: <4CEAB7C9.7020504@jcea.es> <4CEAC3F0.4040806@jcea.es> <4CEAC798.5050707@v.loewis.de> <4CEAE129.2060505@jcea.es> <4CEAE934.9000106@v.loewis.de> <4CEAFF9F.5070503@jcea.es> <4CEB0567.8040500@v.loewis.de> <4CEB11C6.1010504@jcea.es> <4CEB6558.3000600@v.loewis.de> Message-ID: <4CEBB15B.1010800@jcea.es> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 23/11/10 07:55, "Martin v. Löwis" wrote: >> >> But if we say the Python can be compiled as 64 bits under Solaris, would >> >> be nice if that was actually true. Now that we have a buildbot (under >> >> OpenIndiana) to test, it is doable. > > > > But it is true, and always has been true. The lib/64 issue did not > > prevent one building Python on Solaris/SPARC64 at all, including the > > extension modules. Just edit Modules/Setup to suit your needs - that > > works since 1995 (before distutils was even written). Would it be acceptable to change something like:
"""
add_library_path("/usr/local/lib")
"""
to something similar to:
"""
if platform.system() == "SunOS" and platform.architecture()[0] == "64bit":
    add_library_path("/usr/local/lib/64")
else:
    add_library_path("/usr/local/lib")
"""
Would python-dev consider that change OK? - -- Jesus Cea Avion _/_/ _/_/_/ _/_/_/ jcea at jcea.es - http://www.jcea.es/ _/_/ _/_/ _/_/ _/_/ _/_/ jabber / xmpp:jcea at jabber.org _/_/ _/_/ _/_/_/_/_/ .
_/_/ _/_/ _/_/ _/_/ _/_/ "Things are not so easy" _/_/ _/_/ _/_/ _/_/ _/_/ _/_/ "My name is Dump, Core Dump" _/_/_/ _/_/_/ _/_/ _/_/ "El amor es poner tu felicidad en la felicidad de otro" - Leibniz -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iQCVAwUBTOuxW5lgi5GaxT1NAQJuDwP/dzbhDZScanoSnPeF4Ze5XHm+WnSmowx+ x9qvM782i4bYzqYNsbpPHflshROpUwdl9dC0/dFySLFWmMYo12hYogbM6vr5RD6k vEgq1iriIfsei9yNrtt2Ou6+1LVxJ2FMsbpY0Av5hDQVfuJpvB5WRML/mbyYj4T7 9w/jmPT2+rc= =riDG -----END PGP SIGNATURE----- From ncoghlan at gmail.com Tue Nov 23 14:41:05 2010 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 23 Nov 2010 23:41:05 +1000 Subject: [Python-Dev] [Python-checkins] r86633 - in python/branches/py3k: Doc/library/inspect.rst Doc/whatsnew/3.2.rst Lib/inspect.py Lib/test/test_inspect.py Misc/NEWS In-Reply-To: <20101122164654.2109.588145158.divmod.xquotient.165@localhost.localdomain> References: <20101121034404.52924F20A@mail.python.org> <4CE9BF4A.1020302@netwok.org> <4CEA89E8.5090107@voidspace.org.uk> <20101122163722.7e96d123@pitrou.net> <4CEA9584.7040301@avl.com> <20101122172440.77d27ed5@pitrou.net> <20101122164654.2109.588145158.divmod.xquotient.165@localhost.localdomain> Message-ID: On Tue, Nov 23, 2010 at 2:46 AM, wrote: > On 04:24 pm, solipsis at pitrou.net wrote: >> >> On Mon, 22 Nov 2010 17:08:36 +0100 >> Hrvoje Niksic wrote: >>> >>> On 11/22/2010 04:37 PM, Antoine Pitrou wrote: >>> > +1. ?The problem with int constants is that the int gets printed, not >>> > the name, when you dump them for debugging purposes :) >>> >>> Well, it's trivial to subclass int to something with a nicer __repr__. >>> PyGTK uses that technique for wrapping C enums: >> >> Nice. It might be useful to add a private _Constant class somewhere for >> stdlib purposes. 
> > http://www.python.org/dev/peps/pep-0354/ Indeed, it is difficult to do enums in such a way that they feel sufficiently robust to be worth the effort of including them (although these days, I would be inclined to follow the namedtuple API style rather than that presented in PEP 354). Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From fuzzyman at voidspace.org.uk Tue Nov 23 14:50:53 2010 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Tue, 23 Nov 2010 13:50:53 +0000 Subject: [Python-Dev] constant/enum type in stdlib In-Reply-To: References: <20101121034404.52924F20A@mail.python.org> <4CE9BF4A.1020302@netwok.org> <4CEA89E8.5090107@voidspace.org.uk> <20101122163722.7e96d123@pitrou.net> <4CEA9584.7040301@avl.com> <20101122172440.77d27ed5@pitrou.net> <20101122164654.2109.588145158.divmod.xquotient.165@localhost.localdomain> Message-ID: <4CEBC6BD.9060402@voidspace.org.uk> On 23/11/2010 13:41, Nick Coghlan wrote: > On Tue, Nov 23, 2010 at 2:46 AM, wrote: >> On 04:24 pm, solipsis at pitrou.net wrote: >>> On Mon, 22 Nov 2010 17:08:36 +0100 >>> Hrvoje Niksic wrote: >>>> On 11/22/2010 04:37 PM, Antoine Pitrou wrote: >>>>> +1. The problem with int constants is that the int gets printed, not >>>>> the name, when you dump them for debugging purposes :) >>>> Well, it's trivial to subclass int to something with a nicer __repr__. >>>> PyGTK uses that technique for wrapping C enums: >>> Nice. It might be useful to add a private _Constant class somewhere for >>> stdlib purposes. >> http://www.python.org/dev/peps/pep-0354/ > Indeed, it is difficult to do enums in such a way that they feel > sufficiently robust to be worth the effort of including them (although > these days, I would be inclined to follow the namedtuple API style > rather than that presented in PEP 354). Right.
As it happens I just submitted a patch to Barry Warsaw's enum package (nice), flufl.enum [1], to allow namedtuple style creation of named constants: >>> from flufl.enum import make_enum >>> Colors = make_enum('Colors', 'red green blue') >>> Colors PEP 354 was rejected for two primary reasons - lack of interest and nowhere obvious to put it. Would it be *so bad* if an enum type lived in its own module? There is certainly more interest now, and if we are to use something like this in the standard library it *has* to be in the standard library (unless every module implements their own private _Constant class). Time to revisit the PEP? All the best, Michael [1] https://launchpad.net/flufl.enum > Cheers, > Nick. > -- http://www.voidspace.org.uk/ From solipsis at pitrou.net Tue Nov 23 15:02:19 2010 From: solipsis at pitrou.net (Antoine Pitrou) Date: Tue, 23 Nov 2010 15:02:19 +0100 Subject: [Python-Dev] OpenSSL Voluntarily (openssl-1.0.0a) References: <4CEB3F72.7000006@m2.ccsnet.ne.jp> Message-ID: <20101123150219.29e20374@pitrou.net> On Tue, 23 Nov 2010 00:07:09 -0500 Glyph Lefkowitz wrote: > On Mon, Nov 22, 2010 at 11:13 PM, Hirokazu Yamamoto < > ocean-city at m2.ccsnet.ne.jp> wrote: > > > Hello. Does this affect python? Thank you. > > > > http://www.openssl.org/news/secadv_20101116.txt > > > > No. Well, actually it does, but Python links against the system OpenSSL on most platforms (except Windows), so it's up to the OS vendor to apply the patch. Regards Antoine. 
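As a side note on the point that Python links against the system OpenSSL: a quick way to see which OpenSSL build a given interpreter has loaded is to look at the version constants in the ssl module. This is only a minimal sketch, assuming Python 2.7 / 3.2 or later, where these attributes exist; it reports the library version and does not by itself tell you whether a vendor has backported a particular fix.

```python
import ssl

# Human-readable version string of the OpenSSL library loaded by this interpreter.
print(ssl.OPENSSL_VERSION)

# The same information as a tuple of integers, convenient for comparisons.
print(ssl.OPENSSL_VERSION_INFO)
```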
From ncoghlan at gmail.com Tue Nov 23 15:03:53 2010 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 24 Nov 2010 00:03:53 +1000 Subject: [Python-Dev] Re-enable warnings in regrtest and/or unittest In-Reply-To: <4CEAE828.5000801@voidspace.org.uk> References: <4CEAA4DB.6020904@gmail.com> <4CEAA9D4.2020904@langa.pl> <4CEAAC56.2090702@voidspace.org.uk> <4CEABD59.6080005@gmail.com> <4CEAE828.5000801@voidspace.org.uk> Message-ID: On Tue, Nov 23, 2010 at 8:01 AM, Michael Foord wrote: > On 22/11/2010 21:08, Guido van Rossum wrote: >> >> On Mon, Nov 22, 2010 at 11:24 AM, Brett Cannon wrote: >>> >>> The problem with that is it means developers who switch to Python 3.2 >>> or whatever are suddenly going to have their tests fail until they >>> update their code to turn the warnings off. >> >> That sounds like a feature to me... :-) >> > I think Ezio was suggesting just turning warnings on by default when > unittest is run, not turning them into errors. Ezio is suggesting that > developers could explicitly turn warnings off again, but when you use the > default test runner warnings would be shown. His logic is that warnings are > for developers, and so are tests... Having at least the default test runner change the default warnings behaviour to -Wd (while still respecting sys.warnoptions) sounds like a good idea. That way users won't see the warnings (as intended with that change), but developers are less likely to get nasty surprises when things break in future releases (which was one of our major concerns when we made the decision to change the default handling of DeprecationWarning). A similar change may be appropriate for doctest as well. Printing out the list of suppressed warnings in verbose mode may also be useful. A blanket -We is unlikely to work for the test suite, since generating warnings on some platforms is expected behaviour (e.g. due to the ongoing argument between multiprocessing and FreeBSD as to the appropriate behaviour of semaphores).
However, we may be able to get to the point where it is run that way by default and then affected tests use check_warnings() to alter the filter configuration (something that many such affected tests already do). Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From solipsis at pitrou.net Tue Nov 23 15:02:57 2010 From: solipsis at pitrou.net (Antoine Pitrou) Date: Tue, 23 Nov 2010 15:02:57 +0100 Subject: [Python-Dev] r86699 - python/branches/py3k/Lib/zipfile.py References: <20101122233126.C8BDBEE981@mail.python.org> <66720F75-169A-4702-AF53-69845701AA55@langa.pl> Message-ID: <20101123150257.76a423ad@pitrou.net> On Mon, 22 Nov 2010 22:00:08 -0600 Benjamin Peterson wrote: > 2010/11/22 Łukasz Langa : > > Wiadomość napisana przez Benjamin Peterson w dniu 2010-11-23, o godz. 00:47: > > > > No test? > > > > > > The tests were there already, raising ResourceWarnings. After this change, > > they stopped doing that. You may say: now they pass for the first time :) > > It looks like you added new API, though. For that, we would expect new tests. It's an internal API, although ZipExtFile doesn't begin with an underscore. Regards Antoine. From ncoghlan at gmail.com Tue Nov 23 15:16:15 2010 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 24 Nov 2010 00:16:15 +1000 Subject: [Python-Dev] constant/enum type in stdlib In-Reply-To: <4CEBC6BD.9060402@voidspace.org.uk> References: <20101121034404.52924F20A@mail.python.org> <4CE9BF4A.1020302@netwok.org> <4CEA89E8.5090107@voidspace.org.uk> <20101122163722.7e96d123@pitrou.net> <4CEA9584.7040301@avl.com> <20101122172440.77d27ed5@pitrou.net> <20101122164654.2109.588145158.divmod.xquotient.165@localhost.localdomain> <4CEBC6BD.9060402@voidspace.org.uk> Message-ID: On Tue, Nov 23, 2010 at 11:50 PM, Michael Foord wrote: > PEP 354 was rejected for two primary reasons - lack of interest and nowhere > obvious to put it. Would it be *so bad* if an enum type lived in its own > module?
There is certainly more interest now, and if we are to use something > like this in the standard library it *has* to be in the standard library > (unless every module implements their own private _Constant class). > > Time to revisit the PEP? If you (or anyone else) wanted to revisit the PEP, then I would advise trawling through the standard library looking for constants that could be sensibly converted to enum values. A decision would also need to be made as to whether or not to subclass int, or just provide __index__ (the former has the advantage of being able to drop cleanly into OS level APIs that expect a numerical constant). Whether enums should provide arbitrary name-value mappings (ala C enums) or were restricted to sequential indices starting from zero would be another question best addressed by a code survey of at least the stdlib. And getgeneratorstate() doesn't count as a use case, since the ordering isn't needed and using string literals instead of integers will cover the debugging aspect :) Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From fuzzyman at voidspace.org.uk Tue Nov 23 15:24:18 2010 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Tue, 23 Nov 2010 14:24:18 +0000 Subject: [Python-Dev] constant/enum type in stdlib In-Reply-To: References: <20101121034404.52924F20A@mail.python.org> <4CE9BF4A.1020302@netwok.org> <4CEA89E8.5090107@voidspace.org.uk> <20101122163722.7e96d123@pitrou.net> <4CEA9584.7040301@avl.com> <20101122172440.77d27ed5@pitrou.net> <20101122164654.2109.588145158.divmod.xquotient.165@localhost.localdomain> <4CEBC6BD.9060402@voidspace.org.uk> Message-ID: <4CEBCE92.40801@voidspace.org.uk> On 23/11/2010 14:16, Nick Coghlan wrote: > On Tue, Nov 23, 2010 at 11:50 PM, Michael Foord > wrote: >> PEP 354 was rejected for two primary reasons - lack of interest and nowhere >> obvious to put it. Would it be *so bad* if an enum type lived in its own >> module?
There is certainly more interest now, and if we are to use something >> like this in the standard library it *has* to be in the standard library >> (unless every module implements their own private _Constant class). >> >> Time to revisit the PEP? > If you (or anyone else) wanted to revisit the PEP, then I would advise > trawling through the standard library looking for constants that could > be sensibly converted to enum values. > > A decision would also need to be made as to whether or not to subclass > int, or just provide __index__ (the former has the advantage of being > able to drop cleanly into OS level APIs that expect a numerical > constant). > > Whether enums should provide arbitrary name-value mappings (ala C > enums) or were restricted to sequential indices starting from zero > would be another question best addressed by a code survey of at least > the stdlib. > > And getgeneratorstate() doesn't count as a use case, since the > ordering isn't needed and using string literals instead of integers > will cover the debugging aspect :) > Well, for backwards compatibility reasons the new constants would have to *behave* like the old ones (including having the same underlying value and comparing equal to it). In many cases it is *likely* that subclassing int is a better way of achieving that. Actually looking through the standard library to evaluate it is the only way of confirming that. Another API, that reduces the duplication of creating the enum and setting the names, could be something like: make_enums("Names", "NAME_ONE NAME_TWO NAME_THREE", base_type=int, module=__name__) Using __name__ we can set the module globals in the call to make_enums. All the best, Michael > Cheers, > Nick. 
> -- http://www.voidspace.org.uk/ From solipsis at pitrou.net Tue Nov 23 15:42:29 2010 From: solipsis at pitrou.net (Antoine Pitrou) Date: Tue, 23 Nov 2010 15:42:29 +0100 Subject: [Python-Dev] constant/enum type in stdlib References: <20101121034404.52924F20A@mail.python.org> <4CE9BF4A.1020302@netwok.org> <4CEA89E8.5090107@voidspace.org.uk> <20101122163722.7e96d123@pitrou.net> <4CEA9584.7040301@avl.com> <20101122172440.77d27ed5@pitrou.net> <20101122164654.2109.588145158.divmod.xquotient.165@localhost.localdomain> <4CEBC6BD.9060402@voidspace.org.uk> <4CEBCE92.40801@voidspace.org.uk> Message-ID: <20101123154229.474f7a90@pitrou.net> On Tue, 23 Nov 2010 14:24:18 +0000 Michael Foord wrote: > Well, for backwards compatibility reasons the new constants would have > to *behave* like the old ones (including having the same underlying > value and comparing equal to it). > > In many cases it is *likely* that subclassing int is a better way of > achieving that. Actually looking through the standard library to > evaluate it is the only way of confirming that. > > Another API, that reduces the duplication of creating the enum and > setting the names, could be something like: > > make_enums("Names", "NAME_ONE NAME_TWO NAME_THREE", base_type=int, > module=__name__) > > Using __name__ we can set the module globals in the call to make_enums. I don't understand why people insist on calling that an "enum". enum is a C legacy and it doesn't bring anything useful as I can tell. Instead, just assign the values explicitly. Antoine. 
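For readers following the thread, the int-subclass idea mentioned earlier (a constant that still behaves like its integer value but prints its name) can be sketched in a few lines. This is only an illustration under stated assumptions: NamedInt and make_constants are hypothetical names, not an existing stdlib or flufl.enum API, and numbering the names sequentially from zero is an arbitrary choice made for the example.

```python
class NamedInt(int):
    """An int that remembers a name and shows it in repr() (hypothetical helper)."""

    def __new__(cls, name, value):
        self = super().__new__(cls, value)
        self._name = name
        return self

    def __repr__(self):
        return "<%s: %d>" % (self._name, int(self))


def make_constants(names, namespace):
    """Bind each name in a space-separated string to a NamedInt in `namespace`."""
    for value, name in enumerate(names.split()):
        namespace[name] = NamedInt(name, value)


constants = {}
make_constants("NAME_ONE NAME_TWO NAME_THREE", constants)

# The repr carries the name, which helps when dumping values for debugging.
print(repr(constants["NAME_TWO"]))

# The constant still compares equal to the plain integer it replaces.
print(constants["NAME_TWO"] == 1)
```

Passing globals() as the namespace would bind the names at module level, which is roughly what the module=__name__ argument in the make_enums proposal is meant to achieve.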
From benjamin at python.org Tue Nov 23 15:49:37 2010 From: benjamin at python.org (Benjamin Peterson) Date: Tue, 23 Nov 2010 08:49:37 -0600 Subject: [Python-Dev] r86699 - python/branches/py3k/Lib/zipfile.py In-Reply-To: <20101123150257.76a423ad@pitrou.net> References: <20101122233126.C8BDBEE981@mail.python.org> <66720F75-169A-4702-AF53-69845701AA55@langa.pl> <20101123150257.76a423ad@pitrou.net> Message-ID: 2010/11/23 Antoine Pitrou : > On Mon, 22 Nov 2010 22:00:08 -0600 > Benjamin Peterson wrote: >> 2010/11/22 Łukasz Langa : >> > Wiadomość napisana przez Benjamin Peterson w dniu 2010-11-23, o godz. 00:47: >> > >> > No test? >> > >> > >> > The tests were there already, raising ResourceWarnings. After this change, >> > they stopped doing that. You may say: now they pass for the first time :) >> >> It looks like you added new API, though. For that, we would expect new tests. > > It's an internal API, although ZipExtFile doesn't begin with an > underscore. Why is it internal API then? -- Regards, Benjamin From benjamin at python.org Tue Nov 23 15:52:09 2010 From: benjamin at python.org (Benjamin Peterson) Date: Tue, 23 Nov 2010 08:52:09 -0600 Subject: [Python-Dev] constant/enum type in stdlib In-Reply-To: <20101123154229.474f7a90@pitrou.net> References: <20101121034404.52924F20A@mail.python.org> <4CE9BF4A.1020302@netwok.org> <4CEA89E8.5090107@voidspace.org.uk> <20101122163722.7e96d123@pitrou.net> <4CEA9584.7040301@avl.com> <20101122172440.77d27ed5@pitrou.net> <20101122164654.2109.588145158.divmod.xquotient.165@localhost.localdomain> <4CEBC6BD.9060402@voidspace.org.uk> <4CEBCE92.40801@voidspace.org.uk> <20101123154229.474f7a90@pitrou.net> Message-ID: 2010/11/23 Antoine Pitrou : > On Tue, 23 Nov 2010 14:24:18 +0000 > Michael Foord wrote: >> Well, for backwards compatibility reasons the new constants would have >> to *behave* like the old ones (including having the same underlying >> value and comparing equal to it).
>> >> In many cases it is *likely* that subclassing int is a better way of >> achieving that. Actually looking through the standard library to >> evaluate it is the only way of confirming that. >> >> Another API, that reduces the duplication of creating the enum and >> setting the names, could be something like: >> >> make_enums("Names", "NAME_ONE NAME_TWO NAME_THREE", base_type=int, >> module=__name__) >> >> Using __name__ we can set the module globals in the call to make_enums. > > I don't understand why people insist on calling that an "enum". enum is > a C legacy and it doesn't bring anything useful as I can tell. Instead, > just assign the values explicitly. The concept of an "enumeration" of values is still useful outside its stunted C incarnation. Out of curiosity, why is enum "legacy" in C? -- Regards, Benjamin From fuzzyman at voidspace.org.uk Tue Nov 23 15:56:36 2010 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Tue, 23 Nov 2010 14:56:36 +0000 Subject: [Python-Dev] constant/enum type in stdlib In-Reply-To: <20101123154229.474f7a90@pitrou.net> References: <20101121034404.52924F20A@mail.python.org> <4CE9BF4A.1020302@netwok.org> <4CEA89E8.5090107@voidspace.org.uk> <20101122163722.7e96d123@pitrou.net> <4CEA9584.7040301@avl.com> <20101122172440.77d27ed5@pitrou.net> <20101122164654.2109.588145158.divmod.xquotient.165@localhost.localdomain> <4CEBC6BD.9060402@voidspace.org.uk> <4CEBCE92.40801@voidspace.org.uk> <20101123154229.474f7a90@pitrou.net> Message-ID: <4CEBD624.9000402@voidspace.org.uk> On 23/11/2010 14:42, Antoine Pitrou wrote: > On Tue, 23 Nov 2010 14:24:18 +0000 > Michael Foord wrote: >> Well, for backwards compatibility reasons the new constants would have >> to *behave* like the old ones (including having the same underlying >> value and comparing equal to it). >> >> In many cases it is *likely* that subclassing int is a better way of >> achieving that.
Actually looking through the standard library to >> evaluate it is the only way of confirming that. >> >> Another API, that reduces the duplication of creating the enum and >> setting the names, could be something like: >> >> make_enums("Names", "NAME_ONE NAME_TWO NAME_THREE", base_type=int, >> module=__name__) >> >> Using __name__ we can set the module globals in the call to make_enums. > I don't understand why people insist on calling that an "enum". enum is > a C legacy and it doesn't bring anything useful as I can tell. Instead, > just assign the values explicitly. > enum isn't only in C. (They are in C# as well at least.) Wikipedia links enum to "enumerated type" and says: an enumerated type (also called enumeration or enum) is a data type consisting of a set of named values It sounds entirely appropriate. I have no problem with explicitly assigning values instead of doing it automagically. All the best, Michael > Antoine. > > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/fuzzyman%40voidspace.org.uk -- http://www.voidspace.org.uk/ From stephen at xemacs.org Tue Nov 23 16:00:22 2010 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Wed, 24 Nov 2010 00:00:22 +0900 Subject: [Python-Dev] len(chr(i)) = 2? 
In-Reply-To: <4CEA5744.3080308@v.loewis.de> References: <201011192123.14169.victor.stinner@haypocalc.com> <4CE6F93F.9010109@egenix.com> <4CE6FE30.1050903@v.loewis.de> <87hbfc1vnf.fsf@uwakimon.sk.tsukuba.ac.jp> <4CE78F62.7060707@v.loewis.de> <8739qukf9r.fsf@uwakimon.sk.tsukuba.ac.jp> <20101121173825.B1BFB235977@kimball.webabinitio.net> <60F8726F-C1C2-4803-8B8E-688EF0443FA0@gmail.com> <87eiadd46t.fsf@uwakimon.sk.tsukuba.ac.jp> <4CEA5744.3080308@v.loewis.de> Message-ID: <8762voccft.fsf@uwakimon.sk.tsukuba.ac.jp> If you don't care about the ISO standard, but only about Python, Martin's right, I was wrong. You can stop reading now. "Martin v. Löwis" writes: > I could only find the FCD of 10646:2010, where annex H was integrated > into section 10: Thank you for the reference. I referred to two older versions, 10646-1:1993 (for the annexes and Amendment, and my basic understanding) and 10646:2003 (for the detailed definition of UCS-2 in Sections 7, 8 and 13; unfortunately, I missed the most important detail, which is in Section 9). In :2003 the Annex I referred to as "Annex H" is Annex J, and "Annex Q" is partly in Section 9.1 and mostly in Annex C. I don't know where the former is in the 2010 FCD, and the latter is section 9.2. > I think they are now acknowledging that UCS-2 was a misleading term, > making it ambiguous whether this refers to a CCS, a CEF, or a CES; > like "ASCII", people have been using it for all three of them. In :1993 it wasn't ambiguous, they simply didn't make those distinctions. They were not needed for ISO 10646's published versions, although they certainly are for Unicode. Now, quite clearly, the ISO has *changed the definition* in every new version, progressively adding new restrictions that go beyond clarifying ambiguity. But even in :2003, in view of 4.2, 6.2, 6.3, and 13.1, UCS-2 is clearly well-defined as a CM according to UTR#17, which can probably be identified with CCS in :2003 terminology.
Ie, returning to UTR#17 terminology, it is the composition of a CES, a CEF, and a CCS, which are not defined individually. Note: The definition of "coded character" changed between :2003 and the 2010 FCD, from "character with representation" to "character with integer". There is a NOTE indicating that 16-bit integers may be used in processing. Given that this is a non-normative note, I take it to mean that in an array of 16-bit integers, "most significant octet" is to be interpreted in the natural way for the architecture rather than by the representation in memory, which might be little-endian. IMO it's unnatural to think that that changes the definition of UCS-2 to be either a CEF, or a composition of a CEF and a CCS. > Apparently, the ISO WG interprets earlier revisions as saying that > UCS-2 is a CEF that restricted UTF-16 to the BMP. I think that ISO 10646-1:1993 admits only one interpretation, a CM restricted to the BMP (including surrogates), and ISO 10646:2003 admits only one interpretation, a CM restricted to the BMP (not including surrogates). The note under Table 4 on p.24 of the FCD is, uh, well, a lie. Earlier versions certainly did not restrict to "scalar values"; they had no such concept. > THIS IS NOT WHAT PYTHON DOES. Well, no shit, Sherlock. You don't have to yell at me, I know what Python does. The question is, is what does UCS-2 do? The answer is that in :1993, AFAICT it did what Python does. In :2003, they added (last sentence, section 9.1): UCS-2 cannot be used to represent any characters on the supplementary planes. I assume they maintain that position in 2010, so End Of Thread. I apologize for missing that when I was reviewing the standard earlier, but I expected restrictions on UCS-2 to be explained in 13.1 or perhaps 14. And 13.1 simply requires that characters in the BMP be represented by their defined code positions, truncated to two octets. 
Like earlier versions, it doesn't prohibit use of surrogates or say that non-BMP characters can't be represented. > Not sure what it says in your copy; in mine, section 9.3 says [snip] Mine (:2003) says "NOTE 2 - When confined to the code positions in Planes 00 to 10, UCS-4 is also referred to as UCS Transformation Format 32 (UTF-32)." Then it references the Unicode Standard (v4.0) as the authority for UTF-32. Obviously they continued to be confused at this point in time; by the draft you have, apparently the WG had decided to pretty much completely synchronize the whole standard to a subset of Unicode. This seems pointless to me (unlike, say, the work that has been done on standardizing criteria for repertoire changes). In particular, the :1993 definition of UCS-2 was a perfectly good standard for describing the processing Python actually does internally. The current definition of UCS-2 as identical to the BMP is useless, and good riddance, I'm perfectly happy to have them deprecate it. From solipsis at pitrou.net Tue Nov 23 16:01:06 2010 From: solipsis at pitrou.net (Antoine Pitrou) Date: Tue, 23 Nov 2010 16:01:06 +0100 Subject: [Python-Dev] constant/enum type in stdlib In-Reply-To: References: <20101121034404.52924F20A@mail.python.org> <4CE9BF4A.1020302@netwok.org> <4CEA89E8.5090107@voidspace.org.uk> <20101122163722.7e96d123@pitrou.net> <4CEA9584.7040301@avl.com> <20101122172440.77d27ed5@pitrou.net> <20101122164654.2109.588145158.divmod.xquotient.165@localhost.localdomain> <4CEBC6BD.9060402@voidspace.org.uk> <4CEBCE92.40801@voidspace.org.uk> <20101123154229.474f7a90@pitrou.net> Message-ID: <1290524466.3642.4.camel@localhost.localdomain> Le mardi 23 novembre 2010 ? 
08:52 -0600, Benjamin Peterson a ?crit : > 2010/11/23 Antoine Pitrou : > > On Tue, 23 Nov 2010 14:24:18 +0000 > > Michael Foord wrote: > >> Well, for backwards compatibility reasons the new constants would have > >> to *behave* like the old ones (including having the same underlying > >> value and comparing equal to it). > >> > >> In many cases it is *likely* that subclassing int is a better way of > >> achieving that. Actually looking through the standard library to > >> evaluate it is the only way of confirming that. > >> > >> Another API, that reduces the duplication of creating the enum and > >> setting the names, could be something like: > >> > >> make_enums("Names", "NAME_ONE NAME_TWO NAME_THREE", base_type=int, > >> module=__name__) > >> > >> Using __name__ we can set the module globals in the call to make_enums. > > > > I don't understand why people insist on calling that an "enum". enum is > > a C legacy and it doesn't bring anything useful as I can tell. Instead, > > just assign the values explicitly. > > The concept of a "enumeration" of values is still useful outside its > stunted C incarnation. Well, it is easy to assign range(N) to a tuple of names when desired. I don't think an automatically-enumerating constant generator is needed. Regards Antoine. 
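
The explicit idiom described in this message is ordinary tuple unpacking of `range(N)`. A minimal sketch, reusing the placeholder names from the quoted `make_enums` example (they are illustrative only):

```python
# Explicit enumeration via tuple unpacking: each name gets the next integer.
# NAME_ONE, NAME_TWO and NAME_THREE are placeholder names from the quoted
# make_enums example, not real stdlib constants.
NAME_ONE, NAME_TWO, NAME_THREE = range(3)

# The results are plain ints: they compare equal to the raw numbers,
# but their repr carries no symbolic name.
print(NAME_ONE, NAME_TWO, NAME_THREE)
print(repr(NAME_TWO))
```

The trade-off is that nothing ties the names together afterwards: there is no object to iterate over or to test membership against.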
From solipsis at pitrou.net Tue Nov 23 16:01:59 2010
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Tue, 23 Nov 2010 16:01:59 +0100
Subject: [Python-Dev] constant/enum type in stdlib
In-Reply-To: <4CEBD624.9000402@voidspace.org.uk>
References: <20101121034404.52924F20A@mail.python.org> <4CE9BF4A.1020302@netwok.org> <4CEA89E8.5090107@voidspace.org.uk> <20101122163722.7e96d123@pitrou.net> <4CEA9584.7040301@avl.com> <20101122172440.77d27ed5@pitrou.net> <20101122164654.2109.588145158.divmod.xquotient.165@localhost.localdomain> <4CEBC6BD.9060402@voidspace.org.uk> <4CEBCE92.40801@voidspace.org.uk> <20101123154229.474f7a90@pitrou.net> <4CEBD624.9000402@voidspace.org.uk>
Message-ID: <1290524519.3642.5.camel@localhost.localdomain>

Le mardi 23 novembre 2010 à 14:56 +0000, Michael Foord a écrit :
> On 23/11/2010 14:42, Antoine Pitrou wrote:
> > On Tue, 23 Nov 2010 14:24:18 +0000
> > Michael Foord wrote:
> >> Well, for backwards compatibility reasons the new constants would have
> >> to *behave* like the old ones (including having the same underlying
> >> value and comparing equal to it).
> >>
> >> In many cases it is *likely* that subclassing int is a better way of
> >> achieving that. Actually looking through the standard library to
> >> evaluate it is the only way of confirming that.
> >>
> >> Another API, that reduces the duplication of creating the enum and
> >> setting the names, could be something like:
> >>
> >>     make_enums("Names", "NAME_ONE NAME_TWO NAME_THREE", base_type=int,
> >>                module=__name__)
> >>
> >> Using __name__ we can set the module globals in the call to make_enums.
> > I don't understand why people insist on calling that an "enum". enum is
> > a C legacy and it doesn't bring anything useful as I can tell. Instead,
> > just assign the values explicitly.
> >
>
> enum isn't only in C. (They are in C# as well at least.)

Well, it's been inherited by C-like languages, no doubt. Like braces and semicolons :)

Regards

Antoine.
From solipsis at pitrou.net Tue Nov 23 15:59:59 2010
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Tue, 23 Nov 2010 15:59:59 +0100
Subject: [Python-Dev] r86699 - python/branches/py3k/Lib/zipfile.py
In-Reply-To:
References: <20101122233126.C8BDBEE981@mail.python.org> <66720F75-169A-4702-AF53-69845701AA55@langa.pl> <20101123150257.76a423ad@pitrou.net>
Message-ID: <1290524399.3642.3.camel@localhost.localdomain>

Le mardi 23 novembre 2010 à 08:49 -0600, Benjamin Peterson a écrit :
> 2010/11/23 Antoine Pitrou :
> > On Mon, 22 Nov 2010 22:00:08 -0600
> > Benjamin Peterson wrote:
> >> 2010/11/22 Łukasz Langa :
> >> > Wiadomość napisana przez Benjamin Peterson w dniu 2010-11-23, o godz. 00:47:
> >> >
> >> > No test?
> >> >
> >> >
> >> > The tests were there already, raising ResourceWarnings. After this change,
> >> > they stopped doing that. You may say: now they pass for the first time :)
> >>
> >> It looks like you added new API, though. For that, we would expect new tests.
> >
> > It's an internal API, although ZipExtFile doesn't begin with an
> > underscore.
>
> Why is it internal API then?

Because it's for use by ZipFile.open(). The ZipExtFile constructor is not supposed to be called by the user. You might instead ask why ZipExtFile isn't called _ZipExtFile, and I have no idea.

Regards

Antoine.
From fuzzyman at voidspace.org.uk Tue Nov 23 16:15:29 2010 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Tue, 23 Nov 2010 15:15:29 +0000 Subject: [Python-Dev] constant/enum type in stdlib In-Reply-To: <1290524466.3642.4.camel@localhost.localdomain> References: <20101121034404.52924F20A@mail.python.org> <4CE9BF4A.1020302@netwok.org> <4CEA89E8.5090107@voidspace.org.uk> <20101122163722.7e96d123@pitrou.net> <4CEA9584.7040301@avl.com> <20101122172440.77d27ed5@pitrou.net> <20101122164654.2109.588145158.divmod.xquotient.165@localhost.localdomain> <4CEBC6BD.9060402@voidspace.org.uk> <4CEBCE92.40801@voidspace.org.uk> <20101123154229.474f7a90@pitrou.net> <1290524466.3642.4.camel@localhost.localdomain> Message-ID: <4CEBDA91.4050205@voidspace.org.uk> On 23/11/2010 15:01, Antoine Pitrou wrote: > Le mardi 23 novembre 2010 ? 08:52 -0600, Benjamin Peterson a ?crit : >> 2010/11/23 Antoine Pitrou : >>> On Tue, 23 Nov 2010 14:24:18 +0000 >>> Michael Foord wrote: >>>> Well, for backwards compatibility reasons the new constants would have >>>> to *behave* like the old ones (including having the same underlying >>>> value and comparing equal to it). >>>> >>>> In many cases it is *likely* that subclassing int is a better way of >>>> achieving that. Actually looking through the standard library to >>>> evaluate it is the only way of confirming that. >>>> >>>> Another API, that reduces the duplication of creating the enum and >>>> setting the names, could be something like: >>>> >>>> make_enums("Names", "NAME_ONE NAME_TWO NAME_THREE", base_type=int, >>>> module=__name__) >>>> >>>> Using __name__ we can set the module globals in the call to make_enums. >>> I don't understand why people insist on calling that an "enum". enum is >>> a C legacy and it doesn't bring anything useful as I can tell. Instead, >>> just assign the values explicitly. >> The concept of a "enumeration" of values is still useful outside its >> stunted C incarnation. 
> Well, it is easy to assign range(N) to a tuple of names when desired. I
> don't think an automatically-enumerating constant generator is needed.
>
Right, and that is current practise. It has the disadvantage (that you seemed to acknowledge) that when debugging the integer values are seen instead of something with a useful repr.

Having a *simple* class (and API to create them) that produces named constants with a useful repr, is what we are discussing, and that seems awfully like an enum (in the general sense not in a C specific sense).

For backwards compatibility these constants, where they replace integer constants, would need to be integer subclasses with the same behaviour. Like the Qt example you appreciated so much. ;-)

There are still two reasonable APIs (unless you have changed your mind and think that sticking with plain integers is best), of which I prefer the latter:

    SOME_CONST = Constant('SOME_CONST', 1)
    OTHER_CONST = Constant('OTHER_CONST', 2)

or:

    Constants = make_constants('Constants', 'SOME_CONST OTHER_CONST', start=1)
    SOME_CONST = Constants.SOME_CONST
    OTHER_CONST = Constants.OTHER_CONST

(Well, there is a third option that takes __name__ and sets the constants in the module automagically. I can understand why people would dislike that though.)

All the best,

Michael Foord

Michael
> Regards
>
> Antoine.
> > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/fuzzyman%40voidspace.org.uk -- http://www.voidspace.org.uk/ From solipsis at pitrou.net Tue Nov 23 16:30:53 2010 From: solipsis at pitrou.net (Antoine Pitrou) Date: Tue, 23 Nov 2010 16:30:53 +0100 Subject: [Python-Dev] constant/enum type in stdlib In-Reply-To: <4CEBDA91.4050205@voidspace.org.uk> References: <20101121034404.52924F20A@mail.python.org> <4CE9BF4A.1020302@netwok.org> <4CEA89E8.5090107@voidspace.org.uk> <20101122163722.7e96d123@pitrou.net> <4CEA9584.7040301@avl.com> <20101122172440.77d27ed5@pitrou.net> <20101122164654.2109.588145158.divmod.xquotient.165@localhost.localdomain> <4CEBC6BD.9060402@voidspace.org.uk> <4CEBCE92.40801@voidspace.org.uk> <20101123154229.474f7a90@pitrou.net> <1290524466.3642.4.camel@localhost.localdomain> <4CEBDA91.4050205@voidspace.org.uk> Message-ID: <1290526253.3642.9.camel@localhost.localdomain> Le mardi 23 novembre 2010 ? 15:15 +0000, Michael Foord a ?crit : > There are still two reasonable APIs (unless you have changed your mind > and think that sticking with plain integers is best), of which I prefer > the latter: > > SOME_CONST = Constant('SOME_CONST', 1) > OTHER_CONST = Constant('OTHER_CONST', 2) > > or: > > Constants = make_constants('Constants', 'SOME_CONST OTHER_CONST', start=1) Or: Constants = make_constants('Constants', 'SOME_CONST OTHER_CONST', values=range(1, 3)) Again, auto-enumeration is useless since it's trivial to achieve explicitly. Regards Antoine. 
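
A rough sketch of the `values=` variant proposed in this message. `make_constants` is a hypothetical helper that exists only in this thread, and the implementation below is an assumption: a plain class built with `type()` holding ordinary ints, with none of the repr or int-subclass behaviour discussed elsewhere.

```python
def make_constants(class_name, names, values):
    """Hypothetical helper: build a namespace class mapping names to values.

    `names` is a whitespace-separated string; `values` is any iterable
    that supplies exactly one value per name.
    """
    name_list = names.split()
    value_list = list(values)
    if len(name_list) != len(value_list):
        raise ValueError("need exactly one value per name")
    # type(name, bases, namespace) creates the class dynamically.
    return type(class_name, (), dict(zip(name_list, value_list)))


Constants = make_constants('Constants', 'SOME_CONST OTHER_CONST',
                           values=range(1, 3))
SOME_CONST = Constants.SOME_CONST
OTHER_CONST = Constants.OTHER_CONST
print(SOME_CONST, OTHER_CONST)  # prints: 1 2
```

Passing the values explicitly keeps the numbering visible at the call site, which is the point being argued; a `start=` parameter would only be shorthand for `values=range(start, start + n)`.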
From fuzzyman at voidspace.org.uk Tue Nov 23 16:40:28 2010 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Tue, 23 Nov 2010 15:40:28 +0000 Subject: [Python-Dev] constant/enum type in stdlib In-Reply-To: <1290526253.3642.9.camel@localhost.localdomain> References: <20101121034404.52924F20A@mail.python.org> <4CE9BF4A.1020302@netwok.org> <4CEA89E8.5090107@voidspace.org.uk> <20101122163722.7e96d123@pitrou.net> <4CEA9584.7040301@avl.com> <20101122172440.77d27ed5@pitrou.net> <20101122164654.2109.588145158.divmod.xquotient.165@localhost.localdomain> <4CEBC6BD.9060402@voidspace.org.uk> <4CEBCE92.40801@voidspace.org.uk> <20101123154229.474f7a90@pitrou.net> <1290524466.3642.4.camel@localhost.localdomain> <4CEBDA91.4050205@voidspace.org.uk> <1290526253.3642.9.camel@localhost.localdomain> Message-ID: <4CEBE06C.9030101@voidspace.org.uk> On 23/11/2010 15:30, Antoine Pitrou wrote: > Le mardi 23 novembre 2010 ? 15:15 +0000, Michael Foord a ?crit : >> There are still two reasonable APIs (unless you have changed your mind >> and think that sticking with plain integers is best), of which I prefer >> the latter: >> >> SOME_CONST = Constant('SOME_CONST', 1) >> OTHER_CONST = Constant('OTHER_CONST', 2) >> >> or: >> >> Constants = make_constants('Constants', 'SOME_CONST OTHER_CONST', start=1) > Or: > > Constants = make_constants('Constants', 'SOME_CONST OTHER_CONST', > values=range(1, 3)) > > Again, auto-enumeration is useless since it's trivial to achieve > explicitly. Ah, I see. It is the auto-enumeration you disliked. Sure - not a problem. I think the step that Nick described, of evaluating places in the standard library that this could be used, is a good one. I'll try to get around to it and perhaps attempt to resuscitate the PEP. (Any suggestions as to an appropriate module if having it live in its own module is still an objection?) Michael > Regards > > Antoine. 
--
http://www.voidspace.org.uk/

READ CAREFULLY. By accepting and reading this email you agree, on behalf of your employer, to release me from all obligations and waivers arising from any and all NON-NEGOTIATED agreements, licenses, terms-of-service, shrinkwrap, clickwrap, browsewrap, confidentiality, non-disclosure, non-compete and acceptable use policies (“BOGUS AGREEMENTS”) that I have entered into with your employer, its partners, licensors, agents and assigns, in perpetuity, without prejudice to my ongoing rights and privileges. You further represent that you have the authority to release me from any BOGUS AGREEMENTS on behalf of your employer.

From solipsis at pitrou.net Tue Nov 23 17:05:19 2010
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Tue, 23 Nov 2010 17:05:19 +0100
Subject: [Python-Dev] constant/enum type in stdlib
In-Reply-To: <4CEBE06C.9030101@voidspace.org.uk>
References: <20101121034404.52924F20A@mail.python.org> <4CE9BF4A.1020302@netwok.org> <4CEA89E8.5090107@voidspace.org.uk> <20101122163722.7e96d123@pitrou.net> <4CEA9584.7040301@avl.com> <20101122172440.77d27ed5@pitrou.net> <20101122164654.2109.588145158.divmod.xquotient.165@localhost.localdomain> <4CEBC6BD.9060402@voidspace.org.uk> <4CEBCE92.40801@voidspace.org.uk> <20101123154229.474f7a90@pitrou.net> <1290524466.3642.4.camel@localhost.localdomain> <4CEBDA91.4050205@voidspace.org.uk> <1290526253.3642.9.camel@localhost.localdomain> <4CEBE06C.9030101@voidspace.org.uk>
Message-ID: <1290528319.3642.11.camel@localhost.localdomain>

Le mardi 23 novembre 2010 à 15:40 +0000, Michael Foord a écrit :
> On 23/11/2010 15:30, Antoine Pitrou wrote:
> > Le mardi 23 novembre 2010 à
15:15 +0000, Michael Foord a ?crit : > >> There are still two reasonable APIs (unless you have changed your mind > >> and think that sticking with plain integers is best), of which I prefer > >> the latter: > >> > >> SOME_CONST = Constant('SOME_CONST', 1) > >> OTHER_CONST = Constant('OTHER_CONST', 2) > >> > >> or: > >> > >> Constants = make_constants('Constants', 'SOME_CONST OTHER_CONST', start=1) > > Or: > > > > Constants = make_constants('Constants', 'SOME_CONST OTHER_CONST', > > values=range(1, 3)) > > > > Again, auto-enumeration is useless since it's trivial to achieve > > explicitly. > > Ah, I see. It is the auto-enumeration you disliked. Sure - not a problem. > > I think the step that Nick described, of evaluating places in the > standard library that this could be used, is a good one. I'll try to get > around to it and perhaps attempt to resuscitate the PEP. (Any > suggestions as to an appropriate module if having it live in its own > module is still an objection?) We already have a bunch of bizarrely unrelated stuff in collections (such as Callable), so we could put enum there too. Regards Antoine. 
From fuzzyman at voidspace.org.uk Tue Nov 23 17:07:30 2010 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Tue, 23 Nov 2010 16:07:30 +0000 Subject: [Python-Dev] constant/enum type in stdlib In-Reply-To: <1290528319.3642.11.camel@localhost.localdomain> References: <20101121034404.52924F20A@mail.python.org> <4CE9BF4A.1020302@netwok.org> <4CEA89E8.5090107@voidspace.org.uk> <20101122163722.7e96d123@pitrou.net> <4CEA9584.7040301@avl.com> <20101122172440.77d27ed5@pitrou.net> <20101122164654.2109.588145158.divmod.xquotient.165@localhost.localdomain> <4CEBC6BD.9060402@voidspace.org.uk> <4CEBCE92.40801@voidspace.org.uk> <20101123154229.474f7a90@pitrou.net> <1290524466.3642.4.camel@localhost.localdomain> <4CEBDA91.4050205@voidspace.org.uk> <1290526253.3642.9.camel@localhost.localdomain> <4CEBE06C.9030101@voidspace.org.uk> <1290528319.3642.11.camel@localhost.localdomain> Message-ID: <4CEBE6C2.1070204@voidspace.org.uk> On 23/11/2010 16:05, Antoine Pitrou wrote: > Le mardi 23 novembre 2010 ? 15:40 +0000, Michael Foord a ?crit : >> On 23/11/2010 15:30, Antoine Pitrou wrote: >>> Le mardi 23 novembre 2010 ? 15:15 +0000, Michael Foord a ?crit : >>>> There are still two reasonable APIs (unless you have changed your mind >>>> and think that sticking with plain integers is best), of which I prefer >>>> the latter: >>>> >>>> SOME_CONST = Constant('SOME_CONST', 1) >>>> OTHER_CONST = Constant('OTHER_CONST', 2) >>>> >>>> or: >>>> >>>> Constants = make_constants('Constants', 'SOME_CONST OTHER_CONST', start=1) >>> Or: >>> >>> Constants = make_constants('Constants', 'SOME_CONST OTHER_CONST', >>> values=range(1, 3)) >>> >>> Again, auto-enumeration is useless since it's trivial to achieve >>> explicitly. >> Ah, I see. It is the auto-enumeration you disliked. Sure - not a problem. >> >> I think the step that Nick described, of evaluating places in the >> standard library that this could be used, is a good one. I'll try to get >> around to it and perhaps attempt to resuscitate the PEP. 
(Any
>> suggestions as to an appropriate module if having it live in its own
>> module is still an objection?)
> We already have a bunch of bizarrely unrelated stuff in collections
> (such as Callable), so we could put enum there too.
>
I guess it creates collections of constants...

Michael

> Regards
>
> Antoine.
--
http://www.voidspace.org.uk/

From Ben.Cottrell at nominum.com Tue Nov 23 16:37:43 2010
From: Ben.Cottrell at nominum.com (Ben.Cottrell at nominum.com)
Date: Tue, 23 Nov 2010 07:37:43 -0800
Subject: [Python-Dev] constant/enum type in stdlib
In-Reply-To: Your message of "Tue, 23 Nov 2010 15:15:29 GMT."
<4CEBDA91.4050205@voidspace.org.uk>
Message-ID: <20101123153743.3D9451B8ED4@shell-too.nominum.com>

On Tue, 23 Nov 2010 15:15:29 +0000, Michael Foord wrote:
> There are still two reasonable APIs (unless you have changed your mind
> and think that sticking with plain integers is best), of which I prefer
> the latter:
>
>     SOME_CONST = Constant('SOME_CONST', 1)
>     OTHER_CONST = Constant('OTHER_CONST', 2)
>
> or:
>
>     Constants = make_constants('Constants', 'SOME_CONST OTHER_CONST', start=1)
>     SOME_CONST = Constants.SOME_CONST
>     OTHER_CONST = Constants.OTHER_CONST

I prefer the latter too, because that makes it possible to have 'Constants' be a rendezvous point for making sure that you're passing something valid. Perhaps using 'in':

    def func(foo):
        if foo not in Constants:
            raise ValueError('foo must be SOME_CONST or OTHER_CONST')
        ...

I know this is probably not going to happen, but I would *so much* like it if functions would start rejecting "the wrong kind of 2". Constants that are valid, integer-wise, but which aren't part of the set of constants allowed for that argument. I'd prefer not to think of the number of times I've made the following mistake:

    s = socket.socket(socket.SOCK_DGRAM, socket.AF_INET)

~Ben

From turnbull at sk.tsukuba.ac.jp Tue Nov 23 17:16:55 2010
From: turnbull at sk.tsukuba.ac.jp (Stephen J. Turnbull)
Date: Wed, 24 Nov 2010 01:16:55 +0900
Subject: [Python-Dev] len(chr(i)) = 2?
In-Reply-To: <4CEA527B.4030002@v.loewis.de>
References: <201011192123.14169.victor.stinner@haypocalc.com> <4CE6F93F.9010109@egenix.com> <4CE6FE30.1050903@v.loewis.de> <87hbfc1vnf.fsf@uwakimon.sk.tsukuba.ac.jp> <4CE78F62.7060707@v.loewis.de> <8739qukf9r.fsf@uwakimon.sk.tsukuba.ac.jp> <4CE96A40.1050705@v.loewis.de> <87ipzqc4gi.fsf@uwakimon.sk.tsukuba.ac.jp> <4CEA27EB.8000104@v.loewis.de> <87fwutd49e.fsf@uwakimon.sk.tsukuba.ac.jp> <4CEA527B.4030002@v.loewis.de>
Message-ID: <871v6cc8w8.fsf@uwakimon.sk.tsukuba.ac.jp>

"Martin v.
L?wis" writes: > I disagree: Quoting from Unicode 5.0, section 5.4: > > # The individual components of implementations may have different > # levels of support for surrogates, as long as those components are > # assembled and communicate correctly. "Assembly" is the problem. If chr() or a slice creates a lone surrogate and surrogateescape passes it back out, Python as a whole is non-conforming. Technically, you can hide behind "none of slicing, chr(), or surrogateescape promises to conform", and maybe that would fly to a standards lawyer; I'd have to see the precise statement. Here's a more convincing example. A user specifies "utf8" as her locale charset. Then she specifies a string containing a non-BMP character as the "description" of a file, and internal code munges this via slicing into a file name conforming to some specification (eg, length limit + uniquifier if needed). Then if the non-BMP character is in the "right" place, she will get either a broken file name, which will either get written to disk or raise an exception, depending on whether the munging program has enabled surrogateescape or not. I claim both of those results are non-conforming to the specification of UTF-16, and therefore Python Unicode processing as a whole must be considered non-conforming. It's still pretty damn good. But I've elaborated that point elsewhere. > The rationale for supporting these characters in chr() goes back much > further than the surrogateescape handler - as Python unicode strings > are sequences of code points, it would be impractical if you couldn't > create some of them, or even would have to consult the UCD before > determining whether they can be created. The Zen is irrelevant to determining conformance to Unicode, which has its own Zen. From stephen at xemacs.org Tue Nov 23 17:18:57 2010 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Wed, 24 Nov 2010 01:18:57 +0900 Subject: [Python-Dev] len(chr(i)) = 2? 
In-Reply-To: References: <201011192123.14169.victor.stinner@haypocalc.com> <4CE6F93F.9010109@egenix.com> <4CE6FE30.1050903@v.loewis.de> <87hbfc1vnf.fsf@uwakimon.sk.tsukuba.ac.jp> <4CE78F62.7060707@v.loewis.de> <8739qukf9r.fsf@uwakimon.sk.tsukuba.ac.jp> <20101121173825.B1BFB235977@kimball.webabinitio.net> <60F8726F-C1C2-4803-8B8E-688EF0443FA0@gmail.com> <87eiadd46t.fsf@uwakimon.sk.tsukuba.ac.jp> <4CEA5744.3080308@v.loewis.de> <4CEA6661.4080402@egenix.com> Message-ID: <87zkt0au8e.fsf@uwakimon.sk.tsukuba.ac.jp> Nick Coghlan writes: > For practical purposes, UCS2/UCS4 convey far more inherent information > than narrow/wide: That was my stance, but in fact (1) the ISO JTC1/SC2 has deliberately made them ambiguous by changing their definitions over the years[1], and (2) the more recent definitions and "interpretations" of UCS-2 *prohibit* use of surrogates in UCS-2 as far as I can tell. And that's what you'll see everywhere you look, because Wikipedia and friends pick up the most recent versions of everything. > So don't just think about "what will developers know?", also think > about "what will developers know, and what will a quick trip to a > search engine tell them?". It will tell them that UCS-2 cannot even *express* non-BMP characters. Terry and David are *not* dummies, and that's what they got from more or less careful study of the issue. > And once you take that stance, the overly > generic narrow/wide terms fail, badly. I still agree that something more accurate would be nice, but face it: the ISO will redefine and deprecate such terms as soon as they notice us using them. > +1 for MAL's suggested tweaks to the Py3k configure options. Despite my natural sympathy for your arguments, and MAL's, I'm still -1. I really wish I could switch back, but it seems to me that "UCS-2" is a liability we don't need, *especially* on Windows where the default build is presumably going to be UCS2 forever. 
Footnotes: [1] You'd think it would be hard to change the definition of UCS-4, but they managed. :-( From fuzzyman at voidspace.org.uk Tue Nov 23 17:19:16 2010 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Tue, 23 Nov 2010 16:19:16 +0000 Subject: [Python-Dev] constant/enum type in stdlib In-Reply-To: <20101123153743.3D9451B8ED4@shell-too.nominum.com> References: <20101123153743.3D9451B8ED4@shell-too.nominum.com> Message-ID: <4CEBE984.4050807@voidspace.org.uk> On 23/11/2010 15:37, Ben.Cottrell at nominum.com wrote: > On Tue, 23 Nov 2010 15:15:29 +0000, Michael Foord wrote: >> There are still two reasonable APIs (unless you have changed your mind >> and think that sticking with plain integers is best), of which I prefer >> the latter: >> >> SOME_CONST = Constant('SOME_CONST', 1) >> OTHER_CONST = Constant('OTHER_CONST', 2) >> >> or: >> >> Constants = make_constants('Constants', 'SOME_CONST OTHER_CONST', start=1) >> SOME_CONST = Constants.SOME_CONST >> OTHER_CONST = Constants.OTHER_CONST > I prefer the latter too, because that makes it possible to have > 'Constants' be a rendezvous point for making sure that you're > passing something valid. Perhaps using 'in': > > def func(foo): > if foo not in Constants: > raise ValueError('foo must be SOME_CONST or OTHER_CONST') > ... > > I know this is probably not going to happen, but I would *so much* > like it if functions would start rejecting "the wrong kind of 2". > Constants that are valid, integer-wise, but which aren't part of > the set of constants allowed for that argument. I'd prefer not to > think of the number of times I've made the following mistake: > > s = socket.socket(socket.SOCK_DGRAM, socket.AF_INET) Well it would be perfectly possible for the __contains__ method (on the metaclass so that a Constants class can act as a container) to permit a *raw integer* (to be backwards compatible with code using hard coded values) but not permit other constants that aren't valid. 
Code that is *deliberately* using the wrong constants would be screwed of course...

All the best,

Michael

> ~Ben
--
http://www.voidspace.org.uk/

From barry at python.org Tue Nov 23 17:27:03 2010
From: barry at python.org (Barry Warsaw)
Date: Tue, 23 Nov 2010 11:27:03 -0500
Subject: [Python-Dev] constant/enum type in stdlib
In-Reply-To: <4CEBC6BD.9060402@voidspace.org.uk>
References: <20101121034404.52924F20A@mail.python.org> <4CE9BF4A.1020302@netwok.org> <4CEA89E8.5090107@voidspace.org.uk> <20101122163722.7e96d123@pitrou.net> <4CEA9584.7040301@avl.com> <20101122172440.77d27ed5@pitrou.net> <20101122164654.2109.588145158.divmod.xquotient.165@localhost.localdomain> <4CEBC6BD.9060402@voidspace.org.uk>
Message-ID: <20101123112703.42b42812@mission>

On Nov 23, 2010, at 01:50 PM, Michael Foord wrote:

>Right. As it happens I just submitted a patch to Barry Warsaw's enum package
>(nice), flufl.enum [1], to allow namedtuple style creation of named
>constants:

Thanks for the plug (and the nice patch).
FWIW, the documentation for the package is here:

http://packages.python.org/flufl.enum/

I made some explicit decisions about the API and semantics of this package, to fit my own use cases and sensibilities. I guess you wouldn't expect anything else, but I'm willing to acknowledge that others would make different decisions, and certainly the number of existing enum implementations out there proves that there are lots of interesting ways to go about it.

That said, there are several things I like about my package:

* Enums are not subclassed from ints or strs. They are a distinct data type
  that can be converted to and from ints and strs. EIBTI.

* The typical way to create them is through a simple, but explicit class
  definition. I personally like being explicit about the item values, and the
  assignments are required to make the metaclass work properly, but Michael's
  convenience patch is totally appropriate for cases where you don't care, or
  you want a one-liner.

* Enum items are singletons and are intended to be compared by identity. They
  can be compared by equality but are not ordered.

* Enum items have an unambiguous symbolic repr and a nice human readable str.

* Given an enum item, you can get to its enum class, and given the class you
  can get to the set of items.

* Enums can be subclassed (though all items in the subclass must have unique
  values).

In any case it may be that enums are too tied to specific use cases to find a good common ground for the stdlib. I've been using my module for years and if there's interest I would of course be happy to donate it for use in the stdlib. Like the original sets implementation, it makes perfect sense to provide them in a separate module rather than as a built-in type.

-Barry

From barry at python.org Tue Nov 23 17:31:27 2010
From: barry at python.org (Barry Warsaw)
Date: Tue, 23 Nov 2010 11:31:27 -0500
Subject: [Python-Dev] constant/enum type in stdlib
In-Reply-To: <4CEBDA91.4050205@voidspace.org.uk>
References: <20101121034404.52924F20A@mail.python.org> <4CE9BF4A.1020302@netwok.org> <4CEA89E8.5090107@voidspace.org.uk> <20101122163722.7e96d123@pitrou.net> <4CEA9584.7040301@avl.com> <20101122172440.77d27ed5@pitrou.net> <20101122164654.2109.588145158.divmod.xquotient.165@localhost.localdomain> <4CEBC6BD.9060402@voidspace.org.uk> <4CEBCE92.40801@voidspace.org.uk> <20101123154229.474f7a90@pitrou.net> <1290524466.3642.4.camel@localhost.localdomain> <4CEBDA91.4050205@voidspace.org.uk>
Message-ID: <20101123113127.78506cb5@mission>

On Nov 23, 2010, at 03:15 PM, Michael Foord wrote:

>(Well, there is a third option that takes __name__ and sets the constants in
>the module automagically. I can understand why people would dislike that
>though.)

Personally, I think if you want that, then the explicit class definition is a better way to go.

-Barry

From pje at telecommunity.com Tue Nov 23 17:52:37 2010
From: pje at telecommunity.com (P.J.
Eby)
Date: Tue, 23 Nov 2010 11:52:37 -0500
Subject: [Python-Dev] constant/enum type in stdlib
In-Reply-To: <20101123113127.78506cb5@mission>
References: <20101121034404.52924F20A@mail.python.org> <4CE9BF4A.1020302@netwok.org> <4CEA89E8.5090107@voidspace.org.uk> <20101122163722.7e96d123@pitrou.net> <4CEA9584.7040301@avl.com> <20101122172440.77d27ed5@pitrou.net> <20101122164654.2109.588145158.divmod.xquotient.165@localhost.localdomain> <4CEBC6BD.9060402@voidspace.org.uk> <4CEBCE92.40801@voidspace.org.uk> <20101123154229.474f7a90@pitrou.net> <1290524466.3642.4.camel@localhost.localdomain> <4CEBDA91.4050205@voidspace.org.uk> <20101123113127.78506cb5@mission>
Message-ID: <20101123165252.0C0743A4114@sparrow.telecommunity.com>

At 11:31 AM 11/23/2010 -0500, Barry Warsaw wrote:
>On Nov 23, 2010, at 03:15 PM, Michael Foord wrote:
>
> >(Well, there is a third option that takes __name__ and sets the constants in
> >the module automagically. I can understand why people would dislike that
> >though.)
>
>Personally, I think if you want that, then the explicit class definition is a
>better way to go.

This reminds me: a stdlib enum should support proper pickling and copying; i.e.:

    assert SomeEnum.anEnum is pickle.loads(pickle.dumps(SomeEnum.anEnum))

This could probably be implemented by adding something like:

    def __reduce__(self):
        return getattr, (self._class, self._enumname)

in the EnumValue class.
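The `__reduce__` suggestion above can be tried out with a small sketch. The attribute names `_class` and `_enumname` come from the message; everything else here (the constructor, the `SomeEnum` holder class, the item value) is hypothetical scaffolding for illustration and is not flufl.enum's actual implementation.

```python
import copy
import pickle


class EnumValue:
    """Hypothetical enum item; only the pickling hook matters here."""

    def __init__(self, enum_class, name, value):
        self._class = enum_class
        self._enumname = name
        self._value = value

    def __repr__(self):
        return "<EnumValue: %s.%s>" % (self._class.__name__, self._enumname)

    def __reduce__(self):
        # Unpickling (and copying) calls getattr(enum_class, name), which
        # returns the existing item instead of building a new object.
        return getattr, (self._class, self._enumname)


class SomeEnum:
    """Hypothetical enum class; its single item is attached below."""


SomeEnum.anEnum = EnumValue(SomeEnum, "anEnum", 1)

# Identity survives a pickle round trip and a shallow copy.
restored = pickle.loads(pickle.dumps(SomeEnum.anEnum))
copied = copy.copy(SomeEnum.anEnum)
```

Because the reduce tuple refers to the class by name, this only works when the enum class itself is importable (defined at module level), which is the usual pickle restriction for classes.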
From fuzzyman at voidspace.org.uk Tue Nov 23 18:02:33 2010 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Tue, 23 Nov 2010 17:02:33 +0000 Subject: [Python-Dev] constant/enum type in stdlib In-Reply-To: <20101123112703.42b42812@mission> References: <20101121034404.52924F20A@mail.python.org> <4CE9BF4A.1020302@netwok.org> <4CEA89E8.5090107@voidspace.org.uk> <20101122163722.7e96d123@pitrou.net> <4CEA9584.7040301@avl.com> <20101122172440.77d27ed5@pitrou.net> <20101122164654.2109.588145158.divmod.xquotient.165@localhost.localdomain> <4CEBC6BD.9060402@voidspace.org.uk> <20101123112703.42b42812@mission> Message-ID: <4CEBF3A9.3060604@voidspace.org.uk> On 23/11/2010 16:27, Barry Warsaw wrote: > On Nov 23, 2010, at 01:50 PM, Michael Foord wrote: > >> Right. As it happens I just submitted a patch to Barry Warsaw's enum package >> (nice), flufl.enum [1], to allow namedtuple style creation of named >> constants: > Thanks for the plug (and the nice patch). > > FWIW, the documentation for the package is here: > > http://packages.python.org/flufl.enum/ > > I made some explicit decisions about the API and semantics of this package, to > fit my own use cases and sensibilities. I guess you wouldn't expect anything > else , but I'm willing to acknowledge that others would make different > decisions, and certainly the number of existing enum implementations out there > proves that there are lots of interesting ways to go about it. > > That said, there are several things I like about my package: > > * Enums are not subclassed from ints or strs. They are a distinct data type > that can be converted to and from ints and strs. EIBTI. But if we are to use it *in* the standard library (as opposed to merely adding a module *to* the standard library) there are backwards compatibility concerns. Where modules are already using integers for constants then integers still need to work. One easy way to achieve this is to subclass integer. 
If we don't do that (assuming we decide that putting a solution in the standard library is appropriate) then we'll have to evaluate what we mean by backwards compatible. If the modules that use the constants aren't to change then comparing equal to the underlying value is the minimum (so that the original value can still be used in place of the new named constant). Not sure if you'd be happy to make that change in flufl.enum. > * The typical way to create them is through a simple, but explicit class > definition. I personally like being explicit about the item values, and the > assignments are required to make the metaclass work properly, but Michael's > convenience patch is totally appropriate for cases where you don't care, or > you want a one-liner. If make_enum was to take a set of values to use (as Antoine suggested) I don't see what's un-explicit about it. All the best, Michael > * Enum items are singletons and are intended to be compared by identity. They > can be compared by equality but are not ordered. > > * Enum items have an unambiguous symbolic repr and a nice human readable str. > > * Given an enum item, you can get to its enum class, and given the class you > can get to the set of items. > > * Enums can be subclassed (though all items in the subclass must have unique > values). > > In any case it may be that enums are too tied to specific use cases to find a > good common ground for the stdlib. I've been using my module for years and if > there's interest I would of course be happy to donate it for use in the > stdlib. Like the original sets implementation, it makes perfect sense to > provide them in a separate module rather than as a built-in type. 
>
> -Barry

-- 
http://www.voidspace.org.uk/

From solipsis at pitrou.net Tue Nov 23 18:37:40 2010
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Tue, 23 Nov 2010 18:37:40 +0100
Subject: [Python-Dev] constant/enum type in stdlib
In-Reply-To:
References: <20101121034404.52924F20A@mail.python.org> <4CE9BF4A.1020302@netwok.org> <4CEA89E8.5090107@voidspace.org.uk> <20101122163722.7e96d123@pitrou.net> <4CEA9584.7040301@avl.com> <20101122172440.77d27ed5@pitrou.net> <20101122164654.2109.588145158.divmod.xquotient.165@localhost.localdomain> <4CEBC6BD.9060402@voidspace.org.uk> <4CEBCE92.40801@voidspace.org.uk> <20101123154229.474f7a90@pitrou.net> <1290524466.3642.4.camel@localhost.localdomain> <4CEBDA91.4050205@voidspace.org.uk> <1290526253.3642.9.camel@localhost.localdomain> <4CEBE06C.9030101@voidspace.org.uk> <1290528319.3642.11.camel@localhost.localdomain>
Message-ID: <1290533860.3642.73.camel@localhost.localdomain>

Le mardi 23 novembre 2010 à
12:32 -0500, Isaac Morland a ?crit : > On Tue, 23 Nov 2010, Antoine Pitrou wrote: > > > We already have a bunch of bizarrely unrelated stuff in collections > > (such as Callable), so we could put enum there too. > > Why not just "enum" (i.e., "from enum import [...]" or "import > enum.[...]")? Enumerations are one of the basic kinds of types overall > (speaking informally and independent of any specific language) - they > aren't at all exotic. Enumerations aren't a type at all (they have no distinguishing property). > And "Flat is better than nested", after all. Not when it means creating a separate module for every micro-feature. Regards Antoine. From ijmorlan at uwaterloo.ca Tue Nov 23 18:32:15 2010 From: ijmorlan at uwaterloo.ca (Isaac Morland) Date: Tue, 23 Nov 2010 12:32:15 -0500 (EST) Subject: [Python-Dev] constant/enum type in stdlib In-Reply-To: <1290528319.3642.11.camel@localhost.localdomain> References: <20101121034404.52924F20A@mail.python.org> <4CE9BF4A.1020302@netwok.org> <4CEA89E8.5090107@voidspace.org.uk> <20101122163722.7e96d123@pitrou.net> <4CEA9584.7040301@avl.com> <20101122172440.77d27ed5@pitrou.net> <20101122164654.2109.588145158.divmod.xquotient.165@localhost.localdomain> <4CEBC6BD.9060402@voidspace.org.uk> <4CEBCE92.40801@voidspace.org.uk> <20101123154229.474f7a90@pitrou.net> <1290524466.3642.4.camel@localhost.localdomain> <4CEBDA91.4050205@voidspace.org.uk> <1290526253.3642.9.camel@localhost.localdomain> <4CEBE06C.9030101@voidspace.org.uk> <1290528319.3642.11.camel@localhost.localdomain> Message-ID: On Tue, 23 Nov 2010, Antoine Pitrou wrote: > We already have a bunch of bizarrely unrelated stuff in collections > (such as Callable), so we could put enum there too. Why not just "enum" (i.e., "from enum import [...]" or "import enum.[...]")? Enumerations are one of the basic kinds of types overall (speaking informally and independent of any specific language) - they aren't at all exotic. And "Flat is better than nested", after all. 
Isaac Morland CSCF Web Guru DC 2554C, x36650 WWW Software Specialist From ijmorlan at uwaterloo.ca Tue Nov 23 18:50:31 2010 From: ijmorlan at uwaterloo.ca (Isaac Morland) Date: Tue, 23 Nov 2010 12:50:31 -0500 (EST) Subject: [Python-Dev] constant/enum type in stdlib In-Reply-To: <1290533860.3642.73.camel@localhost.localdomain> References: <20101121034404.52924F20A@mail.python.org> <4CEA89E8.5090107@voidspace.org.uk> <20101122163722.7e96d123@pitrou.net> <4CEA9584.7040301@avl.com> <20101122172440.77d27ed5@pitrou.net> <20101122164654.2109.588145158.divmod.xquotient.165@localhost.localdomain> <4CEBC6BD.9060402@voidspace.org.uk> <4CEBCE92.40801@voidspace.org.uk> <20101123154229.474f7a90@pitrou.net> <1290524466.3642.4.camel@localhost.localdomain> <4CEBDA91.4050205@voidspace.org.uk> <1290526253.3642.9.camel@localhost.localdomain> <4CEBE06C.9030101@voidspace.org.uk> <1290528319.3642.11.camel@localhost.localdomain> <1290533860.3642.73.camel@localhost.localdomain> Message-ID: On Tue, 23 Nov 2010, Antoine Pitrou wrote: > Le mardi 23 novembre 2010 ? 12:32 -0500, Isaac Morland a ?crit : >> On Tue, 23 Nov 2010, Antoine Pitrou wrote: >> >>> We already have a bunch of bizarrely unrelated stuff in collections >>> (such as Callable), so we could put enum there too. >> >> Why not just "enum" (i.e., "from enum import [...]" or "import >> enum.[...]")? Enumerations are one of the basic kinds of types overall >> (speaking informally and independent of any specific language) - they >> aren't at all exotic. > > Enumerations aren't a type at all (they have no distinguishing > property). Each enumeration is a type (well, OK, not in every language, presumably, but certainly in many languages). The word "basic" is more important than "types" in my sentence - the point is that an enumeration capability is a very common one in a type system, and is very general, not specific to any particular application. >> And "Flat is better than nested", after all. 
> > Not when it means creating a separate module for every micro-feature. Classes have their own keyword. I don't think it's disproportionate to give enums a top-level module name. Having said that, I understand we're trying to have a not-too-flat module namespace and I can see the sense in putting it in "collections". But I think the idea that enumerations are of very wide applicability and hence deserve a shorter name should be seriously considered. I'll leave it at that, except for: Hey, how about this syntax: enum Colors: red = 0 green = 10 blue (blue gets the value 11) ;-) Isaac Morland CSCF Web Guru DC 2554C, x36650 WWW Software Specialist From fdrake at acm.org Tue Nov 23 18:57:20 2010 From: fdrake at acm.org (Fred Drake) Date: Tue, 23 Nov 2010 12:57:20 -0500 Subject: [Python-Dev] constant/enum type in stdlib In-Reply-To: <1290533860.3642.73.camel@localhost.localdomain> References: <20101121034404.52924F20A@mail.python.org> <4CE9BF4A.1020302@netwok.org> <4CEA89E8.5090107@voidspace.org.uk> <20101122163722.7e96d123@pitrou.net> <4CEA9584.7040301@avl.com> <20101122172440.77d27ed5@pitrou.net> <20101122164654.2109.588145158.divmod.xquotient.165@localhost.localdomain> <4CEBC6BD.9060402@voidspace.org.uk> <4CEBCE92.40801@voidspace.org.uk> <20101123154229.474f7a90@pitrou.net> <1290524466.3642.4.camel@localhost.localdomain> <4CEBDA91.4050205@voidspace.org.uk> <1290526253.3642.9.camel@localhost.localdomain> <4CEBE06C.9030101@voidspace.org.uk> <1290528319.3642.11.camel@localhost.localdomain> <1290533860.3642.73.camel@localhost.localdomain> Message-ID: On Tue, Nov 23, 2010 at 12:37 PM, Antoine Pitrou wrote: > Enumerations aren't a type at all (they have no distinguishing > property). In any given language, this may be true, or not. Whether they should be distinct in Python is core to the current discussion. >From a backward-compatibility perspective, what makes sense depends on whether they're used to implement existing constants (socket.AF_INET, etc.) 
or if they are reserved for new features only.

  -Fred

-- 
Fred L. Drake, Jr.
"A storm broke loose in my mind." --Albert Einstein

From solipsis at pitrou.net Tue Nov 23 19:06:42 2010
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Tue, 23 Nov 2010 19:06:42 +0100
Subject: [Python-Dev] constant/enum type in stdlib
In-Reply-To:
References: <20101121034404.52924F20A@mail.python.org> <4CE9BF4A.1020302@netwok.org> <4CEA89E8.5090107@voidspace.org.uk> <20101122163722.7e96d123@pitrou.net> <4CEA9584.7040301@avl.com> <20101122172440.77d27ed5@pitrou.net> <20101122164654.2109.588145158.divmod.xquotient.165@localhost.localdomain> <4CEBC6BD.9060402@voidspace.org.uk> <4CEBCE92.40801@voidspace.org.uk> <20101123154229.474f7a90@pitrou.net> <1290524466.3642.4.camel@localhost.localdomain> <4CEBDA91.4050205@voidspace.org.uk> <1290526253.3642.9.camel@localhost.localdomain> <4CEBE06C.9030101@voidspace.org.uk> <1290528319.3642.11.camel@localhost.localdomain> <1290533860.3642.73.camel@localhost.localdomain>
Message-ID: <1290535602.3642.87.camel@localhost.localdomain>

Le mardi 23 novembre 2010 à 12:57 -0500, Fred Drake a écrit :
> On Tue, Nov 23, 2010 at 12:37 PM, Antoine Pitrou wrote:
> > Enumerations aren't a type at all (they have no distinguishing
> > property).
>
> In any given language, this may be true, or not. Whether they should
> be distinct in Python is core to the current discussion.

I meant "type" in the structural sense (hence the parenthesis). enums are just auto-generated constants. Since Python makes it trivial to generate sequential integers, there's no need for a specific "enum" construct.

Now you may argue that enums should be strongly-typed, but that would be a bit backwards given Python's preference for duck-typing.

> From a backward-compatibility perspective, what makes sense depends on
> whether they're used to implement existing constants (socket.AF_INET,
> etc.) or if they are reserved for new features only.

It's not only backwards compatibility.
New features relying on C APIs have to be able to map constants to the integers used in the C library. It would be much better if this were done naturally rather than through explicit conversion maps. (this really means subclassing int, if we don't want to complicate C-level code) Regards Antoine. From solipsis at pitrou.net Tue Nov 23 19:07:56 2010 From: solipsis at pitrou.net (Antoine Pitrou) Date: Tue, 23 Nov 2010 19:07:56 +0100 Subject: [Python-Dev] constant/enum type in stdlib In-Reply-To: References: <20101121034404.52924F20A@mail.python.org> <4CEA89E8.5090107@voidspace.org.uk> <20101122163722.7e96d123@pitrou.net> <4CEA9584.7040301@avl.com> <20101122172440.77d27ed5@pitrou.net> <20101122164654.2109.588145158.divmod.xquotient.165@localhost.localdomain> <4CEBC6BD.9060402@voidspace.org.uk> <4CEBCE92.40801@voidspace.org.uk> <20101123154229.474f7a90@pitrou.net> <1290524466.3642.4.camel@localhost.localdomain> <4CEBDA91.4050205@voidspace.org.uk> <1290526253.3642.9.camel@localhost.localdomain> <4CEBE06C.9030101@voidspace.org.uk> <1290528319.3642.11.camel@localhost.localdomain> <1290533860.3642.73.camel@localhost.localdomain> Message-ID: <1290535676.3642.89.camel@localhost.localdomain> Le mardi 23 novembre 2010 ? 12:50 -0500, Isaac Morland a ?crit : > Each enumeration is a type (well, OK, not in every language, presumably, > but certainly in many languages). The word "basic" is more important than > "types" in my sentence - the point is that an enumeration capability is a > very common one in a type system, and is very general, not specific to any > particular application. Python already has an enumeration capability. It's called range(). There's nothing else that C enums have. AFAICT, neither do enums in other mainstream languages (assuming they even exist; I don't remember Perl, PHP or Javascript having anything like that, but perhaps I'm mistaken). Regards Antoine. 
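For readers following the argument, here is a minimal sketch of the `range()` approach mentioned above. The constant names and the `palette` list are made up for illustration. It shows both sides of the debate: the constants are plain ints, so they work anywhere an integer is expected, but they carry no symbolic name.

```python
# Sequential integer constants, as in the range() suggestion above.
RED, GREEN, BLUE = range(3)

# They are plain ints, so they index, compare and pass through
# integer-based APIs unchanged.
palette = ["#f00", "#0f0", "#00f"]
green_hex = palette[GREEN]

# The name is lost, though: repr() shows only the number.
green_repr = repr(GREEN)
```

The missing symbolic repr is the gap that the named-constant proposals elsewhere in this thread are trying to fill.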
From v+python at g.nevcal.com Tue Nov 23 19:56:20 2010 From: v+python at g.nevcal.com (Glenn Linderman) Date: Tue, 23 Nov 2010 10:56:20 -0800 Subject: [Python-Dev] is this a bug? no environment variables In-Reply-To: <4CEBABBA.9050002@v.loewis.de> References: <4CEA0246.9080607@g.nevcal.com> <4CEB97C7.1070708@g.nevcal.com> <4CEBABBA.9050002@v.loewis.de> Message-ID: <4CEC0E54.5070101@g.nevcal.com> On 11/23/2010 3:55 AM, "Martin v. L?wis" wrote: > Am 23.11.2010 11:55, schrieb Amaury Forgeot d'Arc: >> Hi, >> >> 2010/11/23 Glenn Linderman : >>> File "C:\Python32\lib\random.py", line 108, in seed >>> a = int.from_bytes(_urandom(32), 'big') >>> WindowsError: [Error -2146893818] Invalid Signature >> In the subprocess documentation http://docs.python.org/library/subprocess.html >> """On Windows, in order to run a side-by-side assembly the specified >> env *must* include a valid SystemRoot.""" > Indeed, setting SystemRoot might solve this problem. According to > > http://jpassing.com/2009/12/28/the-hidden-danger-of-forgetting-to-specify-systemroot-in-a-custom-environment-block/ > > CrypoAPI, in Windows 7, requires this variable be set. Failure to > find the enhanced crypto provider would explain why the "random" > module of Python fails to work. > > The specific cause is in the registry: > HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Microsoft\Cryptography\Defaults\Provider\Microsoft > Strong Cryptographic Provider has as it's ImagePath value > > %SystemRoot%\system32\rsaenh.dll > > So the registry (and COM) do rely on environment variables. > > Regards, > Martin I find it sad but hilarious that after working so hard to remove the need for environment variables from Windows that M$ has introduced new dependencies on them. I wonder if this particular registry variable is simply an oversight/bug on M$' part, that they will eventually fix, or if it a turnaround toward the use of more environment variables in the future. Hmm. Time will tell, I suppose. 
I'm unaware of any benefits in _changing_ SystemRoot to other values, so not pre-expanding it in that registry location seems only to add an unnecessary dependency on the environment.

Indeed, preserving that one environment variable allows my version of http.server to proceed with, as far as initial testing can determine, proper behavior.

Thanks for your help in figuring this out. That was a lot faster than a "binary search" to choose which variable(s) to preserve.

My purpose in such testing was two-fold: firstly, web servers, for security purposes, generally limit the number of environment variables that are seen by CGI programs, and secondly, in debugging whether or not http.server was properly setting the necessary environment variables, the many other environment variables were cluttering up log dumps of all environment variables. It will be nicer to limit the "passed through" environment variables to SystemRoot, and see how things go.

I have read some about side-by-side assemblies but had considered them a good reason to stick with the outdated M$VC 6.0 compiler, which doesn't seem to need to create them, and their myriad requirements, which seem far from necessary for simply compiling a program. I was disappointed to realize that Python was heading down the path of using the newer tools that create side-by-side assemblies, but I suppose using an old and crufty compiler like M$VC 6.0 cannot support some of the newer features of Windows, which may seem to be necessary to some.... like 64-bit support, which does seem necessary, even to me.
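The idea of passing only SystemRoot through to a child process, as described above, could be sketched like this. The helper name `minimal_env` and its parameters are hypothetical (they are not part of http.server or subprocess); the sketch only filters a mapping and leaves process creation to the caller.

```python
import os


def minimal_env(base=None, keep=("SYSTEMROOT",)):
    """Return a new dict holding only the variables named in `keep`.

    Names are compared case-insensitively, since Windows treats
    environment variable names that way. `base` defaults to os.environ.
    """
    if base is None:
        base = os.environ
    wanted = {name.upper() for name in keep}
    return {key: value for key, value in base.items() if key.upper() in wanted}


# Example with an explicit mapping, so the result does not depend on the host.
sample = {"SystemRoot": "C:/Windows", "PATH": "C:/bin", "HOME": "C:/Users/me"}
filtered = minimal_env(sample)
```

A caller could then hand the result to a child process, for example `subprocess.Popen(cmd, env=minimal_env())`, adding further names to `keep` only as they prove necessary.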
I was well aware that shortcuts and the registry _may_ refer to environment variables, and have a number of environment variables of my own which leverage that capability, to avoid hard-coded drive letters and paths in certain areas, and for the convenience of shorting the specification of some of the long-winded path names that Windows foists upon us (some of those have been significantly shortened in Windows 6.1, and maybe 6.0 which I used only for 2 months with disgust; 6.1 has helped alleviate the disgust, but I still recommend XP for people that don't need 64-bit capabilities). -------------- next part -------------- An HTML attachment was scrubbed... URL: From v+python at g.nevcal.com Tue Nov 23 19:58:37 2010 From: v+python at g.nevcal.com (Glenn Linderman) Date: Tue, 23 Nov 2010 10:58:37 -0800 Subject: [Python-Dev] is this a bug? no environment variables In-Reply-To: References: <4CEA0246.9080607@g.nevcal.com> <4CEAE6A7.3010902@g.nevcal.com> Message-ID: <4CEC0EDD.5080604@g.nevcal.com> On 11/22/2010 2:56 PM, Tim Lesher wrote: > On Mon, Nov 22, 2010 at 16:54, Glenn Linderman wrote: >> I suppose it is possible that some environment variables are used by Python >> directly (but I can't seem to find a documented list of them) although I >> would expect that usage to be optional, with fall-back defaults when they >> don't exist. > I can verify that that's the case: Python (at least through 3.1.2) > runs fine on Windows platforms when environment variables are > completely unavailable. I know that from running our port for Windows > CE (which has no environment variables at all), cross-compiled for > Windows XP. Is the Windows CE port generally available? From where? The CE ports I have found in past searches seem to have been quite outdated and not much on-going activity. -------------- next part -------------- An HTML attachment was scrubbed... 
From alexander.belopolsky at gmail.com Tue Nov 23 20:11:06 2010
From: alexander.belopolsky at gmail.com (Alexander Belopolsky)
Date: Tue, 23 Nov 2010 14:11:06 -0500
Subject: [Python-Dev] len(chr(i)) = 2?
In-Reply-To:
References: <201011192123.14169.victor.stinner@haypocalc.com> <4CE6F93F.9010109@egenix.com> <4CE6FE30.1050903@v.loewis.de> <87hbfc1vnf.fsf@uwakimon.sk.tsukuba.ac.jp> <4CE78F62.7060707@v.loewis.de> <8739qukf9r.fsf@uwakimon.sk.tsukuba.ac.jp> <20101121173825.B1BFB235977@kimball.webabinitio.net> <60F8726F-C1C2-4803-8B8E-688EF0443FA0@gmail.com> <87eiadd46t.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID:

On Mon, Nov 22, 2010 at 1:13 PM, Raymond Hettinger wrote:
..
> Any explanation we give users needs to let them know two things:
> * that we cover the entire range of unicode not just BMP
> * that sometimes len(chr(i)) is one and sometimes two

This discussion motivated me to start looking into how well Python library itself is prepared to deal with len(chr(i)) = 2. I was not surprised to find that textwrap does not handle the issue that well:

>>> len(wrap(' \U00010140' * 80, 20))
12
>>> len(wrap(' \U00000140' * 80, 20))
8

That module should probably be rewritten to properly implement the Unicode line breaking algorithm.

Yet finding a bug in a str object method after a 5 min review was a bit discouraging:

>>> 'xyz'.center(20, '\U00010140')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: The fill character must be exactly one character long

Given the apparent difficulty of writing even basic text processing algorithms in presence of surrogate pairs, I wonder how wise it is to expose Python users to them. As Wikipedia explains, [1]

"""
Because the most commonly used characters are all in the Basic Multilingual Plane, converting between surrogate pairs and the original values is often not tested thoroughly. This leads to persistent bugs, and potential security holes, even in popular and well-reviewed application software.
"""

Since UCS-2 (the Character Encoding Form (CEF)) is now defined [2] to cover only BMP, maybe rather than changing the terms used in the reference manual, we should tighten the code to conform to the updated standards?

Again, given that the str object itself has at least one non-BMP character bug as we are closing on the third major release of py3k, how likely are 3rd party developers to get their libraries right as they port to 3.x?

[1] http://en.wikipedia.org/wiki/UTF-16/UCS-2
[2] http://unicode.org/reports/tr17/#CharacterEncodingForm

From amauryfa at gmail.com Tue Nov 23 20:19:28 2010
From: amauryfa at gmail.com (Amaury Forgeot d'Arc)
Date: Tue, 23 Nov 2010 20:19:28 +0100
Subject: [Python-Dev] len(chr(i)) = 2?
In-Reply-To:
References: <201011192123.14169.victor.stinner@haypocalc.com> <4CE6F93F.9010109@egenix.com> <4CE6FE30.1050903@v.loewis.de> <87hbfc1vnf.fsf@uwakimon.sk.tsukuba.ac.jp> <4CE78F62.7060707@v.loewis.de> <8739qukf9r.fsf@uwakimon.sk.tsukuba.ac.jp> <20101121173825.B1BFB235977@kimball.webabinitio.net> <60F8726F-C1C2-4803-8B8E-688EF0443FA0@gmail.com> <87eiadd46t.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID:

2010/11/23 Alexander Belopolsky:
> This discussion motivated me to start looking into how well Python
> library itself is prepared to deal with len(chr(i)) = 2. I was not
> surprised to find that textwrap does not handle the issue that well:
>
>>>> len(wrap(' \U00010140' * 80, 20))
> 12
>>>> len(wrap(' \U00000140' * 80, 20))
> 8
>
> That module should probably be rewritten to properly implement the
> Unicode line breaking algorithm.
> > Yet finding a bug in a str object method after a 5 min review was a > bit discouraging: > >>>> 'xyz'.center(20, '\U00010140') > Traceback (most recent call last): > ?File " ", line 1, in > TypeError: The fill character must be exactly one character long > > Given the apparent difficulty of writing even basic text processing > algorithms in presence of surrogate pairs, I wonder how wise it is to > expose Python users to them. This was already discussed two years ago: http://mail.python.org/pipermail/python-dev/2008-July/080900.html So yes, wrap() and center() should be fixed. -- Amaury Forgeot d'Arc From janssen at parc.com Tue Nov 23 20:26:57 2010 From: janssen at parc.com (Bill Janssen) Date: Tue, 23 Nov 2010 11:26:57 PST Subject: [Python-Dev] constant/enum type in stdlib In-Reply-To: References: <20101121034404.52924F20A@mail.python.org> <4CEA89E8.5090107@voidspace.org.uk> <20101122163722.7e96d123@pitrou.net> <4CEA9584.7040301@avl.com> <20101122172440.77d27ed5@pitrou.net> <20101122164654.2109.588145158.divmod.xquotient.165@localhost.localdomain> <4CEBC6BD.9060402@voidspace.org.uk> <4CEBCE92.40801@voidspace.org.uk> <20101123154229.474f7a90@pitrou.net> <1290524466.3642.4.camel@localhost.localdomain> <4CEBDA91.4050205@voidspace.org.uk> <1290526253.3642.9.camel@localhost.localdomain> <4CEBE06C.9030101@voidspace.org.uk> <1290528319.3642.11.camel@localhost.localdomain> <1290533860.3642.73.camel@localhost.localdomain> Message-ID: <58396.1290540417@parc.com> Isaac Morland wrote: > On Tue, 23 Nov 2010, Antoine Pitrou wrote: > > > Le mardi 23 novembre 2010 ? 12:32 -0500, Isaac Morland a ?crit : > >> On Tue, 23 Nov 2010, Antoine Pitrou wrote: > >> > >>> We already have a bunch of bizarrely unrelated stuff in collections > >>> (such as Callable), so we could put enum there too. > >> > >> Why not just "enum" (i.e., "from enum import [...]" or "import > >> enum.[...]")? 
Enumerations are one of the basic kinds of types overall > >> (speaking informally and independent of any specific language) - they > >> aren't at all exotic. > > > > Enumerations aren't a type at all (they have no distinguishing > > property). Not in C, but in some other languages. > Each enumeration is a type (well, OK, not in every language, > presumably, but certainly in many languages). The main purpose of that is to be able to catch type mismatches with static typing, though. Seems kind of pointless for Python. > Classes have their own keyword. I don't think it's disproportionate > to give enums a top-level module name. I do. > Hey, how about this syntax: > > enum Colors: > red = 0 > green = 10 > blue Why not class Color: red = (255, 0, 0) green = (0, 255, 0) blue = (0, 0, 255) Seems to handle the situation OK. Bill From mal at egenix.com Tue Nov 23 20:31:37 2010 From: mal at egenix.com (M.-A. Lemburg) Date: Tue, 23 Nov 2010 20:31:37 +0100 Subject: [Python-Dev] len(chr(i)) = 2? In-Reply-To: References: <201011192123.14169.victor.stinner@haypocalc.com> <4CE6F93F.9010109@egenix.com> <4CE6FE30.1050903@v.loewis.de> <87hbfc1vnf.fsf@uwakimon.sk.tsukuba.ac.jp> <4CE78F62.7060707@v.loewis.de> <8739qukf9r.fsf@uwakimon.sk.tsukuba.ac.jp> <20101121173825.B1BFB235977@kimball.webabinitio.net> <60F8726F-C1C2-4803-8B8E-688EF0443FA0@gmail.com> <87eiadd46t.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <4CEC1699.40700@egenix.com> Alexander Belopolsky wrote: > On Mon, Nov 22, 2010 at 1:13 PM, Raymond Hettinger > wrote: > .. >> Any explanation we give users needs to let them know two things: >> * that we cover the entire range of unicode not just BMP >> * that sometimes len(chr(i)) is one and sometimes two > > This discussion motivated me to start looking into how well Python > library itself is prepared to deal with len(chr(i)) = 2. 
I was not > surprised to find that textwrap does not handle the issue that well: > >>>> len(wrap(' \U00010140' * 80, 20)) > 12 >>>> len(wrap(' \U00000140' * 80, 20)) > 8 > > That module should probably be rewritten to properly implement the > Unicode line breaking algorithm > . > > Yet finding a bug in a str object method after a 5 min review was a > bit discouraging: > >>>> 'xyz'.center(20, '\U00010140') > Traceback (most recent call last): > File " ", line 1, in > TypeError: The fill character must be exactly one character long > > Given the apparent difficulty of writing even basic text processing > algorithms in presence of surrogate pairs, I wonder how wise it is to > expose Python users to them. What's the alternative ? Without surrogates, Python users with UCS-2 build (e.g. the Windows Python users) would not be allowed to play with non-BMP code points. IMHO, it's better to fix the stdlib. This is a long process, as you can see with the Python3 stdlib evolution, but Python will eventually get there. > As Wikipedia explains, [1] > > """ > Because the most commonly used characters are all in the Basic > Multilingual Plane, converting between surrogate pairs and the > original values is often not tested thoroughly. This leads to > persistent bugs, and potential security holes, even in popular and > well-reviewed application software. > """ > > Since UCS-2 (the Character Encoding Form (CEF)) is now defined [1] to > cover only BMP, maybe rather than changing the terms used in the > reference manual, we should tighten the code to conform to the updated > standards? Can we please stop turning this around over and over again :-) UCS-2 has never supported anything other than the BMP. However, you can interpret sequences of UCS-2 code unit as UTF-16 and then get access to the full Unicode character set. We've been doing this in codecs ever since UCS-4 builds were introduced some 8-9 years ago. 
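The point about reading sequences of 16-bit code units as UTF-16 can be illustrated with a short sketch. The helper names `utf16_code_units` and `surrogate_pair` are made up for this example. It counts code units via the standard `utf-16-le` codec, so it gives the same answer regardless of how the interpreter stores strings internally (on the narrow builds discussed in this thread, `len()` itself would report the code-unit count).

```python
def utf16_code_units(text):
    """Number of 16-bit code units needed to store `text` as UTF-16."""
    return len(text.encode("utf-16-le")) // 2


def surrogate_pair(code_point):
    """Split a non-BMP code point into its (high, low) UTF-16 surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point is not outside the BMP")
    offset = code_point - 0x10000
    # High surrogate carries the top 10 bits, low surrogate the bottom 10.
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)


bmp_units = utf16_code_units("\u0140")          # a BMP character
non_bmp_units = utf16_code_units("\U00010140")  # the character used above
pair = surrogate_pair(0x10140)
```

Here the BMP character needs one code unit and U+10140 needs two, which is the one-versus-two distinction the thread is about.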
The change to have chr(i) return surrogates on UCS-2 builds was perhaps done too early, but then, without such changes you'd never notice that your code doesn't work well with surrogates. It's just one piece of the puzzle when going from 8-bit strings to Unicode. > Again, given that the str object itself has at least one non-BMP > character bug as we are closing on the third major release of py3k, > how likely are 3rd party developers to get their libraries right as > they port to 3.x? > > [1] http://en.wikipedia.org/wiki/UTF-16/UCS-2 > [2] http://unicode.org/reports/tr17/#CharacterEncodingForm -- Marc-Andre Lemburg eGenix.com
From guido at python.org Tue Nov 23 20:34:17 2010 From: guido at python.org (Guido van Rossum) Date: Tue, 23 Nov 2010 11:34:17 -0800 Subject: [Python-Dev] constant/enum type in stdlib In-Reply-To: <1290535602.3642.87.camel@localhost.localdomain> References: <20101121034404.52924F20A@mail.python.org> <4CE9BF4A.1020302@netwok.org> <4CEA89E8.5090107@voidspace.org.uk> <20101122163722.7e96d123@pitrou.net> <4CEA9584.7040301@avl.com> <20101122172440.77d27ed5@pitrou.net> <20101122164654.2109.588145158.divmod.xquotient.165@localhost.localdomain> <4CEBC6BD.9060402@voidspace.org.uk> <4CEBCE92.40801@voidspace.org.uk> <20101123154229.474f7a90@pitrou.net> <1290524466.3642.4.camel@localhost.localdomain> <4CEBDA91.4050205@voidspace.org.uk> <1290526253.3642.9.camel@localhost.localdomain> <4CEBE06C.9030101@voidspace.org.uk> <1290528319.3642.11.camel@localhost.localdomain> <1290533860.3642.73.camel@localhost.localdomain> <1290535602.3642.87.camel@localhost.localdomain> Message-ID: On Tue, Nov 23, 2010 at 10:06 AM, Antoine Pitrou wrote: > On Tuesday, 23 November 2010 at 12:57 -0500, Fred Drake wrote: >> On Tue, Nov 23, 2010 at 12:37 PM, Antoine Pitrou wrote: >> > Enumerations aren't a type at all (they have no distinguishing >> > property). >> >> In any given language, this may be true, or not. Whether they should >> be distinct in Python is core to the current discussion. > > I meant "type" in the structural sense (hence the parenthesis). enums > are just auto-generated constants. Since Python makes it trivial to > generate sequential integers, there's no need for a specific "enum" > construct. > > Now you may argue that enums should be strongly-typed, but that would be > a bit backwards given Python's preference for duck-typing. Please take a step back. The best example of the utility of enums even for Python is bool.
I resisted this for the longest time but people kept asking for it. Some properties of bool: (a) bool is a (final) subclass of int, and an int is acceptable in a pinch where a bool is expected (b) bool values are guaranteed unique -- there is only one instance with value True, and only one with value False (c) bool values have a str() and repr() that shows their name instead of their value (but not their class -- that's rarely an issue, and makes the output more compact) I think it makes sense to add a way to the stdlib to add other types like bool. I think (c) is probably the most important feature, followed by (a) -- except the *final* part: I want to subclass enums. (b) is probably easy to do but I don't think it matters that much in practice. >> From a backward-compatibility perspective, what makes sense depends on >> whether they're used to implement existing constants (socket.AF_INET, >> etc.) or if they reserved for new features only. > > It's not only backwards compatibility. New features relying on C APIs > have to be able to map constants to the integers used in the C library. > It would be much better if this were done naturally rather than through > explicit conversion maps. I'm not sure what you mean here. Can you give an example of what you mean? I agree that it should be possible to make pretty much any constant in the OS modules enums -- even if the values vary across platforms. > (this really means subclassing int, if we don't want to complicate > C-level code) Right. FWIW I don't think I'm particular about the exact API to construct a new enum type in Python code; I think in most cases explicitly assigning values is fine. Often the values are constrained by something external anyway; it should be easy to dynamically set the values of a particular enum type (even add new values after the fact). There might also be enums with the same value (even though the mapping from int to enum will then have to pick one). 
I expect that the API to convert between enums and bare ints should be i = int(e) and e = (i). It would be nice if s = str(e) and e = (s) would work too. -- --Guido van Rossum (python.org/~guido) From barry at python.org Tue Nov 23 20:40:45 2010 From: barry at python.org (Barry Warsaw) Date: Tue, 23 Nov 2010 14:40:45 -0500 Subject: [Python-Dev] constant/enum type in stdlib In-Reply-To: References: <20101121034404.52924F20A@mail.python.org> <4CEA89E8.5090107@voidspace.org.uk> <20101122163722.7e96d123@pitrou.net> <4CEA9584.7040301@avl.com> <20101122172440.77d27ed5@pitrou.net> <20101122164654.2109.588145158.divmod.xquotient.165@localhost.localdomain> <4CEBC6BD.9060402@voidspace.org.uk> <4CEBCE92.40801@voidspace.org.uk> <20101123154229.474f7a90@pitrou.net> <1290524466.3642.4.camel@localhost.localdomain> <4CEBDA91.4050205@voidspace.org.uk> <1290526253.3642.9.camel@localhost.localdomain> <4CEBE06C.9030101@voidspace.org.uk> <1290528319.3642.11.camel@localhost.localdomain> <1290533860.3642.73.camel@localhost.localdomain> Message-ID: <20101123144045.17b00ac4@mission> On Nov 23, 2010, at 12:57 PM, Fred Drake wrote: >>From a backward-compatibility perspective, what makes sense depends on >whether they're used to implement existing constants (socket.AF_INET, >etc.) or if they reserved for new features only. As is usually the case, there's little reason to change existing working code. Enums can be used whenever a module or API is updated. -Barry
From barry at python.org Tue Nov 23 20:47:47 2010 From: barry at python.org (Barry Warsaw) Date: Tue, 23 Nov 2010 14:47:47 -0500 Subject: [Python-Dev] constant/enum type in stdlib In-Reply-To: <4CEBF3A9.3060604@voidspace.org.uk> References: <20101121034404.52924F20A@mail.python.org> <4CE9BF4A.1020302@netwok.org> <4CEA89E8.5090107@voidspace.org.uk> <20101122163722.7e96d123@pitrou.net> <4CEA9584.7040301@avl.com> <20101122172440.77d27ed5@pitrou.net> <20101122164654.2109.588145158.divmod.xquotient.165@localhost.localdomain> <4CEBC6BD.9060402@voidspace.org.uk> <20101123112703.42b42812@mission> <4CEBF3A9.3060604@voidspace.org.uk> Message-ID: <20101123144747.44a2f4c9@mission> On Nov 23, 2010, at 05:02 PM, Michael Foord wrote: >> * Enums are not subclassed from ints or strs. They are a distinct data type >> that can be converted to and from ints and strs. EIBTI. > >But if we are to use it *in* the standard library (as opposed to merely >adding a module *to* the standard library) there are backwards compatibility >concerns. Where modules are already using integers for constants then >integers still need to work. Is int(enum_value) enough, or must the enum value actually *be* an int? >One easy way to achieve this is to subclass integer. If we don't do that >(assuming we decide that putting a solution in the standard library is >appropriate) then we'll have to evaluate what we mean by backwards >compatible. If the modules that use the constants aren't to change then >comparing equal to the underlying value is the minimum (so that the original >value can still be used in place of the new named constant). Not sure if >you'd be happy to make that change in flufl.enum. I'm not sure either. In flufl.enum enum_class(i) also works as expected. >> * The typical way to create them is through a simple, but explicit class >> definition.
I personally like being explicit about the item values, and >> the assignments are required to make the metaclass work properly, but >> Michael's convenience patch is totally appropriate for cases where you >> don't care, or you want a one-liner. > >If make_enum was to take a set of values to use (as Antoine suggested) I >don't see what's un-explicit about it. When I saw your patch I immediately thought that I could add a default argument that was something like `int_iter`, i.e. an iterator of integers for the values in the string. I suspect YAGNI, which is why I didn't just add it, but I'm not totally opposed to it. -Barry From barry at python.org Tue Nov 23 21:01:02 2010 From: barry at python.org (Barry Warsaw) Date: Tue, 23 Nov 2010 15:01:02 -0500 Subject: [Python-Dev] constant/enum type in stdlib In-Reply-To: <20101123165252.0C0743A4114@sparrow.telecommunity.com> References: <20101121034404.52924F20A@mail.python.org> <4CE9BF4A.1020302@netwok.org> <4CEA89E8.5090107@voidspace.org.uk> <20101122163722.7e96d123@pitrou.net> <4CEA9584.7040301@avl.com> <20101122172440.77d27ed5@pitrou.net> <20101122164654.2109.588145158.divmod.xquotient.165@localhost.localdomain> <4CEBC6BD.9060402@voidspace.org.uk> <4CEBCE92.40801@voidspace.org.uk> <20101123154229.474f7a90@pitrou.net> <1290524466.3642.4.camel@localhost.localdomain> <4CEBDA91.4050205@voidspace.org.uk> <20101123113127.78506cb5@mission> <20101123165252.0C0743A4114@sparrow.telecommunity.com> Message-ID: <20101123150102.75f6256c@mission> On Nov 23, 2010, at 11:52 AM, P.J.
Eby wrote: >This reminds me: a stdlib enum should support proper pickling and copying; >i.e.: > > assert SomeEnum.anEnum is pickle.loads(pickle.dumps(SomeEnum.anEnum)) > >This could probably be implemented by adding something like: > > def __reduce__(self): > return getattr, (self._class, self._enumname) > >in the EnumValue class. Excellent idea, thanks. Added to flufl.enum in r38. However, only enums created with the class syntax can be pickled. Cheers, -Barry From guido at python.org Tue Nov 23 21:00:51 2010 From: guido at python.org (Guido van Rossum) Date: Tue, 23 Nov 2010 12:00:51 -0800 Subject: [Python-Dev] constant/enum type in stdlib In-Reply-To: <20101123144747.44a2f4c9@mission> References: <20101121034404.52924F20A@mail.python.org> <4CE9BF4A.1020302@netwok.org> <4CEA89E8.5090107@voidspace.org.uk> <20101122163722.7e96d123@pitrou.net> <4CEA9584.7040301@avl.com> <20101122172440.77d27ed5@pitrou.net> <20101122164654.2109.588145158.divmod.xquotient.165@localhost.localdomain> <4CEBC6BD.9060402@voidspace.org.uk> <20101123112703.42b42812@mission> <4CEBF3A9.3060604@voidspace.org.uk> <20101123144747.44a2f4c9@mission> Message-ID: On Tue, Nov 23, 2010 at 11:47 AM, Barry Warsaw wrote: > On Nov 23, 2010, at 05:02 PM, Michael Foord wrote: > >>> * Enums are not subclassed from ints or strs. They are a distinct data type >>> that can be converted to and from ints and strs. EIBTI. >> >>But if we are to use it *in* the standard library (as opposed to merely >>adding a module *to* the standard library) there are backwards compatibility >>concerns. Where modules are already using integers for constants then >>integers still need to work. > > Is int(enum_value) enough, or must the enum value actually *be* an int? I vote for *be*, following bool's example.
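For illustration only (not from the thread): a minimal sketch of what "actually *be* an int, following bool's example" could look like. `NamedInt` and the way it stores its name are hypothetical; this is not the flufl.enum API, nor anything in the stdlib. The SEEK_* values mirror the usual os constants.

```python
class NamedInt(int):
    """Hypothetical int subclass whose str() and repr() show a name."""

    def __new__(cls, name, value):
        # Build the underlying int, then remember the display name.
        self = super().__new__(cls, value)
        self._name = name
        return self

    def __repr__(self):
        return self._name

    __str__ = __repr__


# Named constants that still behave as plain integers.
SEEK_SET = NamedInt('SEEK_SET', 0)
SEEK_END = NamedInt('SEEK_END', 2)
```

Because the constant is an int, existing code that compares it with, or passes along, the bare integer keeps working; only the printed form changes. Note that arithmetic on such a value returns a plain int, so the name is lost there.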
>>One easy way to achieve this is to subclass integer. If we don't do that >>(assuming we decide that putting a solution in the standard library is >>appropriate) then we'll have to evaluate what we mean by backwards >>compatible. If the modules that use the constants aren't to change then >>comparing equal to the underlying value is the minimum (so that the original >>value can still be used in place of the new named constant). Not sure if >>you'd be happy to make that change in flufl.enum. > > I'm not sure either. In flufl.enum enum_class(i) also works as expected. > >>> * The typical way to create them is through a simple, but explicit class >>> definition. I personally like being explicit about the item values, and >>> the assignments are required to make the metaclass work properly, but >>> Michael's convenience patch is totally appropriate for cases where you >>> don't care, or you want a one-liner. >> >>If make_enum was to take a set of values to use (as Antoine suggested) I >>don't see what's un-explicit about it. > > When I saw your patch I immediately thought that I could add a default > argument that was something like `int_iter`, i.e. an iterator of integers for > the values in the string. I suspect YAGNI, which is why I didn't just add it, > but I'm not totally opposed to it. > > -Barry -- --Guido van Rossum (python.org/~guido) From jcea at jcea.es Tue Nov 23 21:33:02 2010 From: jcea at jcea.es (Jesus Cea) Date: Tue, 23 Nov 2010 21:33:02 +0100 Subject: [Python-Dev] Sporadic problems with bugs.python.org Message-ID: <4CEC24FE.70107@jcea.es> This happened to me last Sunday, and it is happening again right now.
I can access http://bugs.python.org/ just fine, but trying to post a message, open a new bug, change nosy, etc., takes a LONG time (minutes) and finally fails with a "400 Bad Request" error: """ Bad Request Your browser sent a request that this server could not understand. Apache/2.2.9 (Debian) mod_python/3.3.1 Python/2.5.2 mod_ssl/2.2.9 OpenSSL/0.9.8g mod_wsgi/2.5 Server at bugs.python.org Port 80 """ Last Sunday I was able to open the bug after a while. Today I have been retrying for a while, with no luck yet. -- Jesus Cea Avion jcea at jcea.es - http://www.jcea.es/ From martin at v.loewis.de Tue Nov 23 21:33:19 2010 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Tue, 23 Nov 2010 21:33:19 +0100 Subject: [Python-Dev] is this a bug? no environment variables In-Reply-To: <4CEC0E54.5070101@g.nevcal.com> References: <4CEA0246.9080607@g.nevcal.com> <4CEB97C7.1070708@g.nevcal.com> <4CEBABBA.9050002@v.loewis.de> <4CEC0E54.5070101@g.nevcal.com> Message-ID: <4CEC250F.6060102@v.loewis.de> > I have read some about side-by-side assemblies but had considered them a > good reason to stick with the outdated M$VC 6.0 compiler, which doesn't > seem to need to create them, and their myriad requirements, which seem > far from necessary for simply compiling a program.
I was disappointed > to realize that Python was heading down the path of using the newer > tools that create side-by-side assemblies, but I suppose using an old > and crufty compiler like M$VC 6.0 cannot support some of the newer > features of Windows, which may seem to be necessary to some.... like > 64-bit support, which does seem necessary, even to me. The rationale for moving along with the releases is different, though: you cannot obtain the old versions anymore, except perhaps on eBay. So new developers coming to Python would not be able to build Python extensions if we didn't always try to use a compiler that is still available (and we are stressing that a little bit: 3.2 will use VS 2008, even though it has already been superseded). In any case, VS 2010 will stop using SxS for the CRT. Regards, Martin From v+python at g.nevcal.com Tue Nov 23 21:42:40 2010 From: v+python at g.nevcal.com (Glenn Linderman) Date: Tue, 23 Nov 2010 12:42:40 -0800 Subject: [Python-Dev] is this a bug? no environment variables In-Reply-To: <4CEC250F.6060102@v.loewis.de> References: <4CEA0246.9080607@g.nevcal.com> <4CEB97C7.1070708@g.nevcal.com> <4CEBABBA.9050002@v.loewis.de> <4CEC0E54.5070101@g.nevcal.com> <4CEC250F.6060102@v.loewis.de> Message-ID: <4CEC2740.7@g.nevcal.com> On 11/23/2010 12:33 PM, "Martin v. Löwis" wrote: > In any case, VS 2010 will stop using SxS for the CRT. Good news! Maybe M$VC will become a useful compiler yet again :)
From v+python at g.nevcal.com Tue Nov 23 21:43:05 2010 From: v+python at g.nevcal.com (Glenn Linderman) Date: Tue, 23 Nov 2010 12:43:05 -0800 Subject: [Python-Dev] constant/enum type in stdlib In-Reply-To: References: <20101121034404.52924F20A@mail.python.org> <4CEA9584.7040301@avl.com> <20101122172440.77d27ed5@pitrou.net> <20101122164654.2109.588145158.divmod.xquotient.165@localhost.localdomain> <4CEBC6BD.9060402@voidspace.org.uk> <4CEBCE92.40801@voidspace.org.uk> <20101123154229.474f7a90@pitrou.net> <1290524466.3642.4.camel@localhost.localdomain> <4CEBDA91.4050205@voidspace.org.uk> <1290526253.3642.9.camel@localhost.localdomain> <4CEBE06C.9030101@voidspace.org.uk> <1290528319.3642.11.camel@localhost.localdomain> <1290533860.3642.73.camel@localhost.localdomain> <1290535602.3642.87.camel@localhost.localdomain> Message-ID: <4CEC2759.40203@g.nevcal.com> On 11/23/2010 11:34 AM, Guido van Rossum wrote: > The best example of the utility of enums even for Python is bool. I > resisted this for the longest time but people kept asking for it. Some > properties of bool: > > (a) bool is a (final) subclass of int, and an int is acceptable in a > pinch where a bool is expected > (b) bool values are guaranteed unique -- there is only one instance > with value True, and only one with value False > (c) bool values have a str() and repr() that shows their name instead > of their value (but not their class -- that's rarely an issue, and > makes the output more compact) > > I think it makes sense to add a way to the stdlib to add other types > like bool. I think (c) is probably the most important feature, > followed by (a) -- except the *final* part: I want to subclass enums. > (b) is probably easy to do but I don't think it matters that much in > practice. I was concerned about uniqueness constraints some were touting. While that can be a useful property for some enumerations, it can also be convenient for other enumerations to have multiple names map to the same value.
Bool seems appropriately not extensible to additional values. While there are tri-valued (and other) logic systems, they deserve a separate namespace. Bool seems to be an example, then, of a "set of distinguished names, with values associated to the names", and is restricted to [two] [unique] integer values. C/C++/C# enum is somewhat like that, and is also restricted to integer values [not necessarily unique]. I wonder if a set of distinguished names needs to be restricted to integer values to be useful, although I have no doubt that distinguished names with integer values are useful. Someone used an example of color names class having RGB tuple values, which is a counter example to a restriction to integers. I can think of others as well. Perhaps a "set of distinguished names, with values associated to the names" is really a dict, with the unique names restricted to Python identifier syntax (to be useful), and the values unrestricted. The type of the named value, and the value of the named value, seem not to need to be restricted. But the implementations Bool = {'False': 0, 'True': 1} or alternately class Bool(): self.False = 0 self.True = 1 are missing a couple of characteristics of Python's present bool: the names are not special, and the values are not immutable. Perhaps games could be played to make the second implementation effectively immutable. So I think the real trick of the "enum" (or a generalized "distinguished names") is in the naming. A technique to import the keys that are legal Python identifiers from a dict into a namespace, and retain henceforth immutable values for those names would permit the syntactical usage that people are accustomed to from the C/C++/C# enum, but with extended ranges and types of values, and it seems Bool could be mostly reimplemented via that technique. What is still missing?
The "debugging" help: the values, once imported, should not become "just" values of their type, but rather a new type of value, that has an associated name (and type, I think). Whatever magic is worked under the covers to make sure that there is just one True and just one False, so that they can be distinguished from the values 1 and 0, and so reported, should also be applied to these values. So there need not be new syntax for creating the name/value pairs; just use dict. The only new API would be the code that "imports" the dict into the local namespace. Note that other scoped definitions of True and False are not possible today because True and False are keywords. It would be inappropriate to define these distinguished names as all being keywords, so it seems like one could still override the names, even once defined, but such overridden names would lose their special value that makes them a distinguished name.
From solipsis at pitrou.net Tue Nov 23 21:48:43 2010 From: solipsis at pitrou.net (Antoine Pitrou) Date: Tue, 23 Nov 2010 21:48:43 +0100 Subject: [Python-Dev] constant/enum type in stdlib In-Reply-To: References: <20101121034404.52924F20A@mail.python.org> <4CE9BF4A.1020302@netwok.org> <4CEA89E8.5090107@voidspace.org.uk> <20101122163722.7e96d123@pitrou.net> <4CEA9584.7040301@avl.com> <20101122172440.77d27ed5@pitrou.net> <20101122164654.2109.588145158.divmod.xquotient.165@localhost.localdomain> <4CEBC6BD.9060402@voidspace.org.uk> <4CEBCE92.40801@voidspace.org.uk> <20101123154229.474f7a90@pitrou.net> <1290524466.3642.4.camel@localhost.localdomain> <4CEBDA91.4050205@voidspace.org.uk> <1290526253.3642.9.camel@localhost.localdomain> <4CEBE06C.9030101@voidspace.org.uk> <1290528319.3642.11.camel@localhost.localdomain> <1290533860.3642.73.camel@localhost.localdomain> <1290535602.3642.87.camel@localhost.localdomain> Message-ID: <1290545323.3642.101.camel@localhost.localdomain> On Tuesday, 23 November 2010 at 11:34 -0800, Guido van Rossum wrote: > >> From a backward-compatibility perspective, what makes sense depends on > >> whether they're used to implement existing constants (socket.AF_INET, > >> etc.) or if they reserved for new features only. > > > > It's not only backwards compatibility. New features relying on C APIs > > have to be able to map constants to the integers used in the C library. > > It would be much better if this were done naturally rather than through > > explicit conversion maps. > > I'm not sure what you mean here. Can you give an example of what you > mean? I agree that it should be possible to make pretty much any > constant in the OS modules enums -- even if the values vary across > platforms. I mean that PyArg_ParseTuple should continue to be practical even if e.g. os.SEEK_SET and friends become named constants. It implies that the various format codes such as "i", "l", etc. are still usable with those constants.
Hence: > > (this really means subclassing int, if we don't want to complicate > > C-level code) > > Right. :-) Regards Antoine. From rrr at ronadam.com Tue Nov 23 22:03:21 2010 From: rrr at ronadam.com (Ron Adam) Date: Tue, 23 Nov 2010 15:03:21 -0600 Subject: [Python-Dev] constant/enum type in stdlib In-Reply-To: <1290535676.3642.89.camel@localhost.localdomain> References: <20101121034404.52924F20A@mail.python.org> <4CEA89E8.5090107@voidspace.org.uk> <20101122163722.7e96d123@pitrou.net> <4CEA9584.7040301@avl.com> <20101122172440.77d27ed5@pitrou.net> <20101122164654.2109.588145158.divmod.xquotient.165@localhost.localdomain> <4CEBC6BD.9060402@voidspace.org.uk> <4CEBCE92.40801@voidspace.org.uk> <20101123154229.474f7a90@pitrou.net> <1290524466.3642.4.camel@localhost.localdomain> <4CEBDA91.4050205@voidspace.org.uk> <1290526253.3642.9.camel@localhost.localdomain> <4CEBE06C.9030101@voidspace.org.uk> <1290528319.3642.11.camel@localhost.localdomain> <1290533860.3642.73.camel@localhost.localdomain> <1290535676.3642.89.camel@localhost.localdomain> Message-ID: On 11/23/2010 12:07 PM, Antoine Pitrou wrote: > On Tuesday, 23 November 2010 at 12:50 -0500, Isaac Morland wrote: >> Each enumeration is a type (well, OK, not in every language, presumably, >> but certainly in many languages). The word "basic" is more important than >> "types" in my sentence - the point is that an enumeration capability is a >> very common one in a type system, and is very general, not specific to any >> particular application. > > Python already has an enumeration capability. It's called range(). > There's nothing else that C enums have. AFAICT, neither do enums in > other mainstream languages (assuming they even exist; I don't remember > Perl, PHP or Javascript having anything like that, but perhaps I'm > mistaken). Aren't we forgetting enumerate?
>>> colors = 'BLACK BROWN RED ORANGE YELLOW GREEN BLUE VIOLET GREY WHITE' >>> dict(e for e in enumerate(colors.split())) {0: 'BLACK', 1: 'BROWN', 2: 'RED', 3: 'ORANGE', 4: 'YELLOW', 5: 'GREEN', 6: 'BLUE', 7: 'VIOLET', 8: 'GREY', 9: 'WHITE'} >>> dict((f, n) for (n, f) in enumerate(colors.split())) {'BLUE': 6, 'BROWN': 1, 'GREY': 8, 'YELLOW': 4, 'GREEN': 5, 'VIOLET': 7, 'ORANGE': 3, 'BLACK': 0, 'WHITE': 9, 'RED': 2} Most other languages that use numbered constants number them by base n^2. >>> [x**2 for x in range(10)] [0, 1, 4, 9, 16, 25, 36, 49, 64, 81] Binary flags have the advantage of saving memory because you can assign more than one to a single integer. Another advantage is that other languages use them, so it can make it easier to interface with them. There may also be some performance advantages, since you can test for multiple flags with a single comparison. Sets of strings can also work when you don't need to associate a numeric value to the constant, i.e. the constant is the value. In this case the set supplies the API. Cheers, Ron From glyph at twistedmatrix.com Tue Nov 23 22:06:41 2010 From: glyph at twistedmatrix.com (Glyph Lefkowitz) Date: Tue, 23 Nov 2010 16:06:41 -0500 Subject: [Python-Dev] constant/enum type in stdlib In-Reply-To: <20101123153743.3D9451B8ED4@shell-too.nominum.com> References: <20101123153743.3D9451B8ED4@shell-too.nominum.com> Message-ID: On Nov 23, 2010, at 10:37 AM, Ben.Cottrell at nominum.com wrote: > I'd prefer not to think of the number of times I've made the following mistake: > > s = socket.socket(socket.SOCK_DGRAM, socket.AF_INET) If it's any consolation, it's fewer than the number of times I have :). (More fun, actually, is where you pass a file descriptor to the wrong argument of 'fromfd'...)
From steve at pearwood.info Tue Nov 23 22:06:45 2010 From: steve at pearwood.info (Steven D'Aprano) Date: Wed, 24 Nov 2010 08:06:45 +1100 Subject: [Python-Dev] constant/enum type in stdlib In-Reply-To: <1290526253.3642.9.camel@localhost.localdomain> References: <20101121034404.52924F20A@mail.python.org> <4CE9BF4A.1020302@netwok.org> <4CEA89E8.5090107@voidspace.org.uk> <20101122163722.7e96d123@pitrou.net> <4CEA9584.7040301@avl.com> <20101122172440.77d27ed5@pitrou.net> <20101122164654.2109.588145158.divmod.xquotient.165@localhost.localdomain> <4CEBC6BD.9060402@voidspace.org.uk> <4CEBCE92.40801@voidspace.org.uk> <20101123154229.474f7a90@pitrou.net> <1290524466.3642.4.camel@localhost.localdomain> <4CEBDA91.4050205@voidspace.org.uk> <1290526253.3642.9.camel@localhost.localdomain> Message-ID: <4CEC2CE5.8000302@pearwood.info> Antoine Pitrou wrote: > Constants = make_constants('Constants', 'SOME_CONST OTHER_CONST', > values=range(1, 3)) > > Again, auto-enumeration is useless since it's trivial to achieve > explicitly. That doesn't make auto-enumeration "useless". Unnecessary, perhaps, but not useless. But even then it's only unnecessary if the number of constants is small enough that you can see how many there are without counting (essentially, 4 or fewer).
When you have more, it becomes error-prone and a nuisance to have to count them by hand: Constants = make_constants( 'Constants', 'ST_MODE ST_INO ST_DEV ST_NLINK ST_UID ST_GID ' 'ST_SIZE ST_ATIME ST_MTIME ST_CTIME', values=range(10) ) -- Steven From glyph at twistedmatrix.com Tue Nov 23 22:10:00 2010 From: glyph at twistedmatrix.com (Glyph Lefkowitz) Date: Tue, 23 Nov 2010 16:10:00 -0500 Subject: [Python-Dev] constant/enum type in stdlib In-Reply-To: <1290524466.3642.4.camel@localhost.localdomain> References: <20101121034404.52924F20A@mail.python.org> <4CE9BF4A.1020302@netwok.org> <4CEA89E8.5090107@voidspace.org.uk> <20101122163722.7e96d123@pitrou.net> <4CEA9584.7040301@avl.com> <20101122172440.77d27ed5@pitrou.net> <20101122164654.2109.588145158.divmod.xquotient.165@localhost.localdomain> <4CEBC6BD.9060402@voidspace.org.uk> <4CEBCE92.40801@voidspace.org.uk> <20101123154229.474f7a90@pitrou.net> <1290524466.3642.4.camel@localhost.localdomain> Message-ID: <935CA187-6799-437E-8F18-2A35886B5117@twistedmatrix.com> On Nov 23, 2010, at 10:01 AM, Antoine Pitrou wrote: > Well, it is easy to assign range(N) to a tuple of names when desired. I > don't think an automatically-enumerating constant generator is needed. I don't think that numerical enumerations are the only kind of constants we're talking about. Others have already mentioned strings. Also, see for some other use-cases. Since this isn't coming to 2.x, we're probably going to do our own thing anyway (unless it turns out that flufl.enum is so great that we want to add another dependency...) but I'm hoping that the outcome of this discussion will point to something we can be compatible with.
From solipsis at pitrou.net Tue Nov 23 22:15:20 2010 From: solipsis at pitrou.net (Antoine Pitrou) Date: Tue, 23 Nov 2010 22:15:20 +0100 Subject: [Python-Dev] constant/enum type in stdlib In-Reply-To: <935CA187-6799-437E-8F18-2A35886B5117@twistedmatrix.com> References: <20101121034404.52924F20A@mail.python.org> <4CE9BF4A.1020302@netwok.org> <4CEA89E8.5090107@voidspace.org.uk> <20101122163722.7e96d123@pitrou.net> <4CEA9584.7040301@avl.com> <20101122172440.77d27ed5@pitrou.net> <20101122164654.2109.588145158.divmod.xquotient.165@localhost.localdomain> <4CEBC6BD.9060402@voidspace.org.uk> <4CEBCE92.40801@voidspace.org.uk> <20101123154229.474f7a90@pitrou.net> <1290524466.3642.4.camel@localhost.localdomain> <935CA187-6799-437E-8F18-2A35886B5117@twistedmatrix.com> Message-ID: <1290546920.3642.104.camel@localhost.localdomain> On Tuesday, 23 November 2010 at 16:10 -0500, Glyph Lefkowitz wrote: > > On Nov 23, 2010, at 10:01 AM, Antoine Pitrou wrote: > > > Well, it is easy to assign range(N) to a tuple of names when > > desired. I > > don't think an automatically-enumerating constant generator is > > needed. > > I don't think that numerical enumerations are the only kind of > constants we're talking about. Others have already mentioned strings. > Also, see for some other use-cases. Since this > isn't coming to 2.x, we're probably going to do our own thing anyway > (unless it turns out that flufl.enum is so great that we want to add > another dependency...) but I'm hoping that the outcome of this > discussion will point to something we can be compatible with. I think that asking for too many features would get in the way, and also make the API quite un-Pythonic. If you want your values to be e.g. OR'able, just choose your values wisely ;) Regards Antoine.
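As a small illustration of "choose your values wisely" (a sketch, not from the thread; the flag names READ, WRITE and EXECUTE are made up for the example): giving each constant a distinct power of two makes plain integers combinable with | and testable with &.

```python
# Hypothetical flag names; each value is a distinct power of two (1, 2, 4).
READ, WRITE, EXECUTE = (1 << n for n in range(3))

# Combine flags with bitwise OR into a single integer.
mode = READ | WRITE

# Test for individual flags with bitwise AND.
has_write = bool(mode & WRITE)
has_exec = bool(mode & EXECUTE)
```

The combined value is an ordinary int, so printing it shows 3 rather than any names; that is the gap the named-constant proposals in this thread aim to close.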
From rrr at ronadam.com Tue Nov 23 22:21:17 2010 From: rrr at ronadam.com (Ron Adam) Date: Tue, 23 Nov 2010 15:21:17 -0600 Subject: [Python-Dev] constant/enum type in stdlib In-Reply-To: References: <20101121034404.52924F20A@mail.python.org> <20101122163722.7e96d123@pitrou.net> <4CEA9584.7040301@avl.com> <20101122172440.77d27ed5@pitrou.net> <20101122164654.2109.588145158.divmod.xquotient.165@localhost.localdomain> <4CEBC6BD.9060402@voidspace.org.uk> <4CEBCE92.40801@voidspace.org.uk> <20101123154229.474f7a90@pitrou.net> <1290524466.3642.4.camel@localhost.localdomain> <4CEBDA91.4050205@voidspace.org.uk> <1290526253.3642.9.camel@localhost.localdomain> <4CEBE06C.9030101@voidspace.org.uk> <1290528319.3642.11.camel@localhost.localdomain> <1290533860.3642.73.camel@localhost.localdomain> <1290535676.3642.89.camel@localhost.localdomain> Message-ID: Oops.. x**2 should have been 2**x below. On 11/23/2010 03:03 PM, Ron Adam wrote: > > > On 11/23/2010 12:07 PM, Antoine Pitrou wrote: >> On Tuesday, 23 November 2010 at 12:50 -0500, Isaac Morland wrote: >>> Each enumeration is a type (well, OK, not in every language, presumably, >>> but certainly in many languages). The word "basic" is more important than >>> "types" in my sentence - the point is that an enumeration capability is a >>> very common one in a type system, and is very general, not specific to any >>> particular application. >> >> Python already has an enumeration capability. It's called range(). >> There's nothing else that C enums have. AFAICT, neither do enums in >> other mainstream languages (assuming they even exist; I don't remember >> Perl, PHP or Javascript having anything like that, but perhaps I'm >> mistaken). > > > Aren't we forgetting enumerate?
>
> >>> colors = 'BLACK BROWN RED ORANGE YELLOW GREEN BLUE VIOLET GREY WHITE'
>
> >>> dict(e for e in enumerate(colors.split()))
> {0: 'BLACK', 1: 'BROWN', 2: 'RED', 3: 'ORANGE', 4: 'YELLOW', 5: 'GREEN', 6:
> 'BLUE', 7: 'VIOLET', 8: 'GREY', 9: 'WHITE'}
>
> >>> dict((f, n) for (n, f) in enumerate(colors.split()))
> {'BLUE': 6, 'BROWN': 1, 'GREY': 8, 'YELLOW': 4, 'GREEN': 5, 'VIOLET': 7,
> 'ORANGE': 3, 'BLACK': 0, 'WHITE': 9, 'RED': 2}
>
>
> Most other languages that use numbered constants number them by base n^2.
>
> >>> [x**2 for x in range(10)]
> [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

>>> [2**x for x in range(10)]
[1, 2, 4, 8, 16, 32, 64, 128, 256, 512]

> Binary flags have the advantage of saving memory because you can assign
> more than one to a single integer. Another advantage is other languages use
> them so it can make it easier interface with them. There also may be some
> performance advantages as well since you can test for multiple flags with a
> single comparison.
>
> Sets of strings can also work when you don't need to associate a numeric
> value to the constant. ie... the constant is the value. In this case the
> set supplies the api.
> > Cheers, > Ron > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > http://mail.python.org/mailman/options/python-dev/python-python-dev%40m.gmane.org > From steve at pearwood.info Tue Nov 23 22:30:37 2010 From: steve at pearwood.info (Steven D'Aprano) Date: Wed, 24 Nov 2010 08:30:37 +1100 Subject: [Python-Dev] constant/enum type in stdlib In-Reply-To: <1290535676.3642.89.camel@localhost.localdomain> References: <20101121034404.52924F20A@mail.python.org> <4CEA89E8.5090107@voidspace.org.uk> <20101122163722.7e96d123@pitrou.net> <4CEA9584.7040301@avl.com> <20101122172440.77d27ed5@pitrou.net> <20101122164654.2109.588145158.divmod.xquotient.165@localhost.localdomain> <4CEBC6BD.9060402@voidspace.org.uk> <4CEBCE92.40801@voidspace.org.uk> <20101123154229.474f7a90@pitrou.net> <1290524466.3642.4.camel@localhost.localdomain> <4CEBDA91.4050205@voidspace.org.uk> <1290526253.3642.9.camel@localhost.localdomain> <4CEBE06C.9030101@voidspace.org.uk> <1290528319.3642.11.camel@localhost.localdomain> <1290533860.3642.73.camel@localhost.localdomain> <1290535676.3642.89.camel@localhost.localdomain> Message-ID: <4CEC327D.1050503@pearwood.info> Antoine Pitrou wrote: > Python already has an enumeration capability. It's called range(). > There's nothing else that C enums have. AFAICT, neither do enums in > other mainstream languages (assuming they even exist; I don't remember > Perl, PHP or Javascript having anything like that, but perhaps I'm > mistaken). In Pascal, enumerations are a type, and the value of the named values are an implementation detail. E.g. one would define an enumerated type: type flavour = (sweet, salty, sour, bitter, umame); var x: flavour; and then you would write something like: x := sour; Notice that the constants sweet etc. 
aren't explicitly predefined, since they're purely internal details and the compiler is allowed to number them any way it likes. In Python, we would need stronger guarantees about the values chosen, so that they could be exposed to external modules, pickled, etc. But that doesn't mean we should be forced to specify the values ourselves. -- Steven From greg.ewing at canterbury.ac.nz Tue Nov 23 22:26:58 2010 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Wed, 24 Nov 2010 10:26:58 +1300 Subject: [Python-Dev] constant/enum type in stdlib In-Reply-To: <20101123154229.474f7a90@pitrou.net> References: <20101121034404.52924F20A@mail.python.org> <4CE9BF4A.1020302@netwok.org> <4CEA89E8.5090107@voidspace.org.uk> <20101122163722.7e96d123@pitrou.net> <4CEA9584.7040301@avl.com> <20101122172440.77d27ed5@pitrou.net> <20101122164654.2109.588145158.divmod.xquotient.165@localhost.localdomain> <4CEBC6BD.9060402@voidspace.org.uk> <4CEBCE92.40801@voidspace.org.uk> <20101123154229.474f7a90@pitrou.net> Message-ID: <4CEC31A2.5080809@canterbury.ac.nz> Antoine Pitrou wrote: > I don't understand why people insist on calling that an "enum". enum is > a C legacy and it doesn't bring anything useful as I can tell. The usefulness is that they can have a str() or repr() that displays the name of the value instead of an integer. The bool type was added for much the same reason -- otherwise we would simply have gotten builtin names False = 0 and True = 1. 
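The bool comparison can be checked directly in an interpreter. The short sketch below only uses built-in behaviour: bool values are still integers, but their str() and repr() show the name.

```python
# bool is a subclass of int, so the values still behave as 0 and 1 ...
print(isinstance(True, int))     # True
print(True == 1, False == 0)     # True True
print(True + True)               # 2

# ... but str() and repr() display the name rather than the integer.
print(repr(True), str(False))    # True False
print(repr(int(True)))           # 1
```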
-- Greg From greg.ewing at canterbury.ac.nz Tue Nov 23 22:27:02 2010 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Wed, 24 Nov 2010 10:27:02 +1300 Subject: [Python-Dev] constant/enum type in stdlib In-Reply-To: <1290524519.3642.5.camel@localhost.localdomain> References: <20101121034404.52924F20A@mail.python.org> <4CE9BF4A.1020302@netwok.org> <4CEA89E8.5090107@voidspace.org.uk> <20101122163722.7e96d123@pitrou.net> <4CEA9584.7040301@avl.com> <20101122172440.77d27ed5@pitrou.net> <20101122164654.2109.588145158.divmod.xquotient.165@localhost.localdomain> <4CEBC6BD.9060402@voidspace.org.uk> <4CEBCE92.40801@voidspace.org.uk> <20101123154229.474f7a90@pitrou.net> <4CEBD624.9000402@voidspace.org.uk> <1290524519.3642.5.camel@localhost.localdomain> Message-ID: <4CEC31A6.5090505@canterbury.ac.nz> Antoine Pitrou wrote: > Well, it's been inherited by C-like languages, no doubt. Like braces and > semicolumns :) The idea isn't confined to the C family. Pascal and many of the languages inspired by it also have enumerated types. -- Greg From tjreedy at udel.edu Tue Nov 23 23:44:07 2010 From: tjreedy at udel.edu (Terry Reedy) Date: Tue, 23 Nov 2010 17:44:07 -0500 Subject: [Python-Dev] len(chr(i)) = 2? In-Reply-To: References: <201011192123.14169.victor.stinner@haypocalc.com> <4CE6F93F.9010109@egenix.com> <4CE6FE30.1050903@v.loewis.de> <87hbfc1vnf.fsf@uwakimon.sk.tsukuba.ac.jp> <4CE78F62.7060707@v.loewis.de> <8739qukf9r.fsf@uwakimon.sk.tsukuba.ac.jp> <20101121173825.B1BFB235977@kimball.webabinitio.net> <60F8726F-C1C2-4803-8B8E-688EF0443FA0@gmail.com> <87eiadd46t.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On 11/23/2010 2:11 PM, Alexander Belopolsky wrote: > This discussion motivated me to start looking into how well Python > library itself is prepared to deal with len(chr(i)) = 2. I was not Good idea! 
> surprised to find that textwrap does not handle the issue that well:
>
> >>> len(wrap(' \U00010140' * 80, 20))
> 12
> >>> len(wrap(' \U00000140' * 80, 20))
> 8

How well does textwrap handle composable pairs (letter + accent)? Does
it count two codepoints as one char space, and avoid putting line breaks
between them? I suspect textwrap should be regarded as
(extended?)_ascii_textwrap.

>
> That module should probably be rewritten to properly implement the
> Unicode line breaking algorithm
> .

Probably a good idea

> Yet finding a bug in a str object method after a 5 min review was a
> bit discouraging:
>
> >>> 'xyz'.center(20, '\U00010140')
> Traceback (most recent call last):
>   File " ", line 1, in
> TypeError: The fill character must be exactly one character long

Again, what does it do with letter + decorator combinations? It seems to
me that the whole notion that one code point == one printed character
space is broken once one leaves ascii. Perhaps we need an is_uchar
function to recognize multi-code sequences, including surrogate pairs,
that represent one char for the purpose of character oriented functions.

> Given the apparent difficulty of writing even basic text processing
> algorithms in presence of surrogate pairs, I wonder how wise it is to
> expose Python users to them. As Wikipedia explains, [1]
>
> """
> Because the most commonly used characters are all in the Basic
> Multilingual Plane, converting between surrogate pairs and the
> original values is often not tested thoroughly. This leads to
> persistent bugs, and potential security holes, even in popular and
> well-reviewed application software.
> """

So we did not test thoroughly enough and need to add appropriate unit
tests as bugs are fixed.
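As a rough illustration of the "one code point is not one character" problem discussed above, the sketch below groups a string into approximate user-perceived characters. The helper name `clusters` is hypothetical (it is not the is_uchar function suggested in the message, and not a stdlib API). It only glues combining marks and low surrogates onto the preceding item, which is much cruder than the real Unicode grapheme-cluster rules.

```python
import unicodedata


def clusters(text):
    """Split text into approximate user-perceived characters (a sketch only)."""
    out = []
    for ch in text:
        is_low_surrogate = 0xDC00 <= ord(ch) <= 0xDFFF
        prev_is_high = bool(out) and 0xD800 <= ord(out[-1][-1]) <= 0xDBFF
        # Attach combining marks, and the second half of a surrogate pair,
        # to the item that precedes them.
        if out and (unicodedata.combining(ch) or (is_low_surrogate and prev_is_high)):
            out[-1] += ch
        else:
            out.append(ch)
    return out


s = "e\u0301x"              # 'e' + COMBINING ACUTE ACCENT, then 'x'
print(len(s))               # 3 code points
print(len(clusters(s)))     # 2 perceived characters
```

On a build where a non-BMP character is stored as a surrogate pair, the same helper would report that pair as one item as well.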
--
Terry Jan Reedy

From tjreedy at udel.edu  Wed Nov 24 00:07:03 2010
From: tjreedy at udel.edu (Terry Reedy)
Date: Tue, 23 Nov 2010 18:07:03 -0500
Subject: [Python-Dev] [Python-checkins] r86720 - python/branches/py3k/Misc/ACKS
In-Reply-To: <4CEC43A4.80907@netwok.org>
References: <20101123203252.39BE7EE9CF@mail.python.org>
 <4CEC43A4.80907@netwok.org>
Message-ID: <4CEC4917.2070508@udel.edu>

On 11/23/2010 5:43 PM, Éric Araujo wrote:
>> Modified: python/branches/py3k/Misc/ACKS
>> ==============================================================================
>> --- python/branches/py3k/Misc/ACKS (original)
>> +++ python/branches/py3k/Misc/ACKS Tue Nov 23 21:32:47 2010
>> @@ -1,4 +1,4 @@
>> -Acknowledgements
>> +?Acknowledgements
>
> This change introduced a so-called UTF-8 BOM in the file. Is
> TortoiseSvn the culprit or a text editor?

I used Notepad to edit the file, TortoiseSvn to commit, the same as I did
for #9222, rev86702, Lib\idlelib\IOBinding.py, yesterday. If the latter
is OK, perhaps *.py gets filtered better than misc. text files.

I believe I have the config as specified in dev/faq.

[miscellany]
enable-auto-props = yes

[auto-props]
* = svn:eol-style=native
*.c = svn:keywords=Id
*.h = svn:keywords=Id
*.py = svn:keywords=Id
*.txt = svn:keywords=Author Date Id Revision

Terry

From ijmorlan at uwaterloo.ca  Wed Nov 24 00:15:03 2010
From: ijmorlan at uwaterloo.ca (Isaac Morland)
Date: Tue, 23 Nov 2010 18:15:03 -0500 (EST)
Subject: [Python-Dev] constant/enum type in stdlib
In-Reply-To: <58396.1290540417@parc.com>
References: <20101121034404.52924F20A@mail.python.org>
 <4CEA89E8.5090107@voidspace.org.uk> <20101122163722.7e96d123@pitrou.net>
 <4CEA9584.7040301@avl.com> <20101122172440.77d27ed5@pitrou.net>
 <20101122164654.2109.588145158.divmod.xquotient.165@localhost.localdomain>
 <4CEBC6BD.9060402@voidspace.org.uk> <4CEBCE92.40801@voidspace.org.uk>
 <20101123154229.474f7a90@pitrou.net>
 <1290524466.3642.4.camel@localhost.localdomain>
 <4CEBDA91.4050205@voidspace.org.uk>
 <1290526253.3642.9.camel@localhost.localdomain>
 <4CEBE06C.9030101@voidspace.org.uk>
 <1290528319.3642.11.camel@localhost.localdomain>
 <1290533860.3642.73.camel@localhost.localdomain>
 <58396.1290540417@parc.com>
Message-ID:

On Tue, 23 Nov 2010, Bill Janssen wrote:

> The main purpose of that is to be able to catch type mismatches with
> static typing, though. Seems kind of pointless for Python.

The concept can work dynamically. In fact, the flufl.enum package which
has been discussed here makes each enumeration into a separate class so
many of the advantages of catching type mismatches are obtained.

>> Hey, how about this syntax:
>>
>> enum Colors:
>>     red = 0
>>     green = 10
>>     blue
>
> Why not
>
>   class Color:
>       red = (255, 0, 0)
>       green = (0, 255, 0)
>       blue = (0, 0, 255)
>
> Seems to handle the situation OK.

Yes, this looks almost exactly like flufl.enum syntax. In any case my
suggestion of a new keyword was not meant to be taken seriously.
If I ever think I have a good reason to suggest a new keyword I'll sleep on it, take a vacation, and then if I still think a new keyword is justified I will specifically disclaim any possibility of the suggestion being a joke. Isaac Morland CSCF Web Guru DC 2554C, x36650 WWW Software Specialist From db3l.net at gmail.com Wed Nov 24 00:18:33 2010 From: db3l.net at gmail.com (David Bolen) Date: Tue, 23 Nov 2010 18:18:33 -0500 Subject: [Python-Dev] Stable buildbots References: <20101113133712.60e9be27@pitrou.net> <4CEB7E12.1070201@snakebite.org> Message-ID: Trent Nelson writes: > That's interesting. (That kill_python.exe doesn't kill the wedged > processes, but pskill does.) kill_python is pretty simple, it just > calls TerminateProcess() after acquiring a handle with the relevant > PROCESS_TERMINATE access right. (...) > > Are you calling pskill with the -t flag? i.e. kill process and all > dependents? That might be the ticket, especially if killing the child > process that wedged select() is waiting on causes it to return, and > thus, makes it killable. Nope, just "pskill python_d". Haven't bothered to check the pskill source but I'm assuming it's just a basic TerminateProcess. Ideally my quickest workaround would just be to replace the kill_python in the buildbot tools script with that command but of course they could get updated on checkouts and I'm not arguing it's generally appropriate enough to belong in the source. I suspect the problem may be on the "identify which process to kill" rather than the "kill it" part, but it's definitely going to take time to figure that out for sure. While the approach kill_python takes is much more appropriate, since we don't currently have multiple builds running simultaneously (and for me the machines are dedicated as build slaves, so I won't be having my own python_d), a more blanket kill operation is safe enough. 
> Otherwise, if it happens again, can you try kill_python.exe first, > then pskill, and confirm if the former fails but the latter succeeds? Yeah, I've got a temporary tree with a built-binary around, but still have to make sure of the right way to run it manually in a way that it will do the identification right (which I think also means I need to figure out from which build tree the hung process started). Up until now, typically when I've found a hung setup, the rest of the build tree which originally applied to that process has been cleaned. I definitely sympathize with Martin's position though - it wasn't the simplest tool to write (and I still have some email from him about the week+ it took just to test the process identification part remotely through buildbots at the time), so I regret not jumping right in to try to fix it. But it's just way more effort than typing "pskill python_d", at least with my current availability. -- David From greg.ewing at canterbury.ac.nz Wed Nov 24 00:32:39 2010 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Wed, 24 Nov 2010 12:32:39 +1300 Subject: [Python-Dev] constant/enum type in stdlib In-Reply-To: <1290546920.3642.104.camel@localhost.localdomain> References: <20101121034404.52924F20A@mail.python.org> <4CE9BF4A.1020302@netwok.org> <4CEA89E8.5090107@voidspace.org.uk> <20101122163722.7e96d123@pitrou.net> <4CEA9584.7040301@avl.com> <20101122172440.77d27ed5@pitrou.net> <20101122164654.2109.588145158.divmod.xquotient.165@localhost.localdomain> <4CEBC6BD.9060402@voidspace.org.uk> <4CEBCE92.40801@voidspace.org.uk> <20101123154229.474f7a90@pitrou.net> <1290524466.3642.4.camel@localhost.localdomain> <935CA187-6799-437E-8F18-2A35886B5117@twistedmatrix.com> <1290546920.3642.104.camel@localhost.localdomain> Message-ID: <4CEC4F17.7030600@canterbury.ac.nz> Antoine Pitrou wrote: > I think that asking for too many features would get in the way, and also > make the API quite un-Pythonic. If you want your values to be e.g. 
> OR'able, just choose your values wisely ;)

On the other hand it could be useful to have an easy way to
request power-of-2 value assignment, seeing as it's another
common pattern.

--
Greg

From greg.ewing at canterbury.ac.nz  Wed Nov 24 00:32:56 2010
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Wed, 24 Nov 2010 12:32:56 +1300
Subject: [Python-Dev] constant/enum type in stdlib
In-Reply-To: <58396.1290540417@parc.com>
References: <20101121034404.52924F20A@mail.python.org>
 <4CEA89E8.5090107@voidspace.org.uk> <20101122163722.7e96d123@pitrou.net>
 <4CEA9584.7040301@avl.com> <20101122172440.77d27ed5@pitrou.net>
 <20101122164654.2109.588145158.divmod.xquotient.165@localhost.localdomain>
 <4CEBC6BD.9060402@voidspace.org.uk> <4CEBCE92.40801@voidspace.org.uk>
 <20101123154229.474f7a90@pitrou.net>
 <1290524466.3642.4.camel@localhost.localdomain>
 <4CEBDA91.4050205@voidspace.org.uk>
 <1290526253.3642.9.camel@localhost.localdomain>
 <4CEBE06C.9030101@voidspace.org.uk>
 <1290528319.3642.11.camel@localhost.localdomain>
 <1290533860.3642.73.camel@localhost.localdomain>
 <58396.1290540417@parc.com>
Message-ID: <4CEC4F28.7010904@canterbury.ac.nz>

Bill Janssen wrote:

> The main purpose of that is to be able to catch type mismatches with
> static typing, though. Seems kind of pointless for Python.

But catching type mismatches with dynamic typing doesn't
seem pointless for Python. There's nothing static about
the proposals being made here that I can see.

> Why not
>
>   class Color:
>       red = (255, 0, 0)
>       green = (0, 255, 0)
>       blue = (0, 0, 255)

If all you want is a bunch of named constants, that's fine.
But the facilities being discussed here are designed to give
you other things as well, such as

   c = Color.red
   print(c)

printing "red" rather than "(255, 0, 0)".
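A minimal sketch of the behaviour described above, assuming a small wrapper class. `NamedConstant` is a hypothetical helper written for this illustration; it is not flufl.enum and not any of the APIs proposed in the thread.

```python
class NamedConstant:
    """A value that remembers the name it was defined under (sketch only)."""

    def __init__(self, name, value):
        self.name = name
        self.value = value

    def __repr__(self):
        # print() falls back to __repr__ because __str__ is not overridden.
        return self.name


class Color:
    red = NamedConstant("red", (255, 0, 0))
    green = NamedConstant("green", (0, 255, 0))
    blue = NamedConstant("blue", (0, 0, 255))


c = Color.red
print(c)          # red
print(c.value)    # (255, 0, 0)
```

Note that each name has to be written twice here, once as the attribute and once as a string, which is the kind of minor ugliness that later messages in the thread complain about.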
-- Greg From greg.ewing at canterbury.ac.nz Wed Nov 24 00:33:02 2010 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Wed, 24 Nov 2010 12:33:02 +1300 Subject: [Python-Dev] constant/enum type in stdlib In-Reply-To: <1290526253.3642.9.camel@localhost.localdomain> References: <20101121034404.52924F20A@mail.python.org> <4CE9BF4A.1020302@netwok.org> <4CEA89E8.5090107@voidspace.org.uk> <20101122163722.7e96d123@pitrou.net> <4CEA9584.7040301@avl.com> <20101122172440.77d27ed5@pitrou.net> <20101122164654.2109.588145158.divmod.xquotient.165@localhost.localdomain> <4CEBC6BD.9060402@voidspace.org.uk> <4CEBCE92.40801@voidspace.org.uk> <20101123154229.474f7a90@pitrou.net> <1290524466.3642.4.camel@localhost.localdomain> <4CEBDA91.4050205@voidspace.org.uk> <1290526253.3642.9.camel@localhost.localdomain> Message-ID: <4CEC4F2E.6080601@canterbury.ac.nz> Antoine Pitrou wrote: > Constants = make_constants('Constants', 'SOME_CONST OTHER_CONST', > values=range(1, 3)) > > Again, auto-enumeration is useless since it's trivial to achieve > explicitly. But seeing as it's going to be a common thing to do, why not make it the default? When defining an enum, often you don't *care* what the underlying values are, so assigning sequential natural numbers is as good a default as any. In fact, with the Pascal concept of an enumerated type you don't get any choice in the matter. It's only in the C family that you get this bastardised conflation of enumerations with arbitrary named constants... 
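To make the "sequential by default" idea concrete, here is one way a factory with the quoted call signature could behave. The thread only shows a call to make_constants, so the body below is an assumption made for illustration, not anyone's actual proposal.

```python
from itertools import count


def make_constants(typename, names, values=None):
    """Build a class whose attributes are the given names (hypothetical sketch)."""
    names = names.split()
    if values is None:
        values = count()  # default: 0, 1, 2, ... in definition order
    return type(typename, (), dict(zip(names, values)))


# Default: the caller does not care about the underlying values.
Constants = make_constants('Constants', 'SOME_CONST OTHER_CONST')
print(Constants.SOME_CONST, Constants.OTHER_CONST)    # 0 1

# Explicit values, as in the quoted call.
Explicit = make_constants('Constants', 'SOME_CONST OTHER_CONST',
                          values=range(1, 3))
print(Explicit.SOME_CONST, Explicit.OTHER_CONST)      # 1 2
```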
-- Greg From greg.ewing at canterbury.ac.nz Wed Nov 24 00:41:50 2010 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Wed, 24 Nov 2010 12:41:50 +1300 Subject: [Python-Dev] constant/enum type in stdlib In-Reply-To: References: <20101121034404.52924F20A@mail.python.org> <20101122163722.7e96d123@pitrou.net> <4CEA9584.7040301@avl.com> <20101122172440.77d27ed5@pitrou.net> <20101122164654.2109.588145158.divmod.xquotient.165@localhost.localdomain> <4CEBC6BD.9060402@voidspace.org.uk> <4CEBCE92.40801@voidspace.org.uk> <20101123154229.474f7a90@pitrou.net> <1290524466.3642.4.camel@localhost.localdomain> <4CEBDA91.4050205@voidspace.org.uk> <1290526253.3642.9.camel@localhost.localdomain> <4CEBE06C.9030101@voidspace.org.uk> <1290528319.3642.11.camel@localhost.localdomain> <1290533860.3642.73.camel@localhost.localdomain> <58396.1290540417@parc.com> Message-ID: <4CEC513E.4050603@canterbury.ac.nz> Isaac Morland wrote: > In any case my > suggestion of a new keyword was not meant to be taken seriously. I don't think it need be taken entirely as a joke, either. All the proposed patterns for creating enums that I've seen end up leaving something to be desired. They violate DRY by requiring you to write the class name twice, or they make you write the names of the values in quotes, or some other minor ugliness. While it may be possible to work around these things with sufficient levels of metaclass hackery and black magic, at some point one has to consider whether new syntax might be the least worst option. -- Greg From greg.ewing at canterbury.ac.nz Wed Nov 24 00:49:42 2010 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Wed, 24 Nov 2010 12:49:42 +1300 Subject: [Python-Dev] len(chr(i)) = 2? 
In-Reply-To: References: <201011192123.14169.victor.stinner@haypocalc.com> <4CE6F93F.9010109@egenix.com> <4CE6FE30.1050903@v.loewis.de> <87hbfc1vnf.fsf@uwakimon.sk.tsukuba.ac.jp> <4CE78F62.7060707@v.loewis.de> <8739qukf9r.fsf@uwakimon.sk.tsukuba.ac.jp> <20101121173825.B1BFB235977@kimball.webabinitio.net> <60F8726F-C1C2-4803-8B8E-688EF0443FA0@gmail.com> <87eiadd46t.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <4CEC5316.4010608@canterbury.ac.nz> Alexander Belopolsky wrote: > """ > Because the most commonly used characters are all in the Basic > Multilingual Plane, converting between surrogate pairs and the > original values is often not tested thoroughly. This leads to > persistent bugs, and potential security holes, even in popular and > well-reviewed application software. > """ Maybe Python should have used UTF-8 as its internal unicode representation. Then people who were foolish enough to assume one character per string item would have their programs break rather soon under only light unicode testing. :-) -- Greg From foom at fuhm.net Wed Nov 24 01:22:23 2010 From: foom at fuhm.net (James Y Knight) Date: Tue, 23 Nov 2010 19:22:23 -0500 Subject: [Python-Dev] len(chr(i)) = 2? In-Reply-To: <4CEC5316.4010608@canterbury.ac.nz> References: <201011192123.14169.victor.stinner@haypocalc.com> <4CE6F93F.9010109@egenix.com> <4CE6FE30.1050903@v.loewis.de> <87hbfc1vnf.fsf@uwakimon.sk.tsukuba.ac.jp> <4CE78F62.7060707@v.loewis.de> <8739qukf9r.fsf@uwakimon.sk.tsukuba.ac.jp> <20101121173825.B1BFB235977@kimball.webabinitio.net> <60F8726F-C1C2-4803-8B8E-688EF0443FA0@gmail.com> <87eiadd46t.fsf@uwakimon.sk.tsukuba.ac.jp> <4CEC5316.4010608@canterbury.ac.nz> Message-ID: <77AAC178-F868-4F05-8509-4A9FB66F61EC@fuhm.net> On Nov 23, 2010, at 6:49 PM, Greg Ewing wrote: > Maybe Python should have used UTF-8 as its internal unicode > representation. 
Then people who were foolish enough to assume > one character per string item would have their programs break > rather soon under only light unicode testing. :-) You put a smiley, but, in all seriousness, I think that's actually the right thing to do if anyone writes a new programming language. It is clearly the right thing if you don't have to be concerned with backwards-compatibility: nobody really needs to be able to access the Nth codepoint in a string in constant time, so there's not really any point in storing a vector of codepoints. Instead, provide bidirectional iterators which can traverse the string by byte, codepoint, or by grapheme (that is: the set of combining characters + base character that go together, making up one thing which a human would think of as a character). James From jcea at jcea.es Wed Nov 24 01:31:01 2010 From: jcea at jcea.es (Jesus Cea) Date: Wed, 24 Nov 2010 01:31:01 +0100 Subject: [Python-Dev] Sporadic problems with bugs.python.org In-Reply-To: <4CEC24FE.70107@jcea.es> References: <4CEC24FE.70107@jcea.es> Message-ID: <4CEC5CC5.5070305@jcea.es> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 23/11/10 21:33, Jesus Cea wrote: > Happen to me last Sunday, and happening just now. > > I can access http://bugs.python.org/ just fine, but trying to post a > message, open a new bug, change nosy, etc., takes a LONG time (minutes) > and it is finally failing with a "400 Bad Request" error: > > """ > Bad Request > > Your browser sent a request that this server could not understand. > Apache/2.2.9 (Debian) mod_python/3.3.1 Python/2.5.2 mod_ssl/2.2.9 > OpenSSL/0.9.8g mod_wsgi/2.5 Server at bugs.python.org Port 80 > """ > > Last sunday I was able to open the bug after a time. Today I have been > retrying for while, with no luck yet. Still retrying, with no luck. Anybody else can reproduce?. - -- Jesus Cea Avion _/_/ _/_/_/ _/_/_/ jcea at jcea.es - http://www.jcea.es/ _/_/ _/_/ _/_/ _/_/ _/_/ jabber / xmpp:jcea at jabber.org _/_/ _/_/ _/_/_/_/_/ . 
_/_/ _/_/ _/_/ _/_/ _/_/ "Things are not so easy" _/_/ _/_/ _/_/ _/_/ _/_/ _/_/ "My name is Dump, Core Dump" _/_/_/ _/_/_/ _/_/ _/_/ "El amor es poner tu felicidad en la felicidad de otro" - Leibniz -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iQCVAwUBTOxcxZlgi5GaxT1NAQJGEQQApyTPFFyPbzc45v5AfeLwT0YHvIcFyT5a lZVZIJ+TVeI1PY/bZpebO4YnjQ6JrHIIedXf8IUqBi9sD8UUDY5tST8TikZPwvvk pGvdCRwa2A6slGG5zgnA4u4+H2MiOiRhua0sTELNQJYAgzTNER+LDTWQ04p31kOD D++Hjb2mBs8= =TI1J -----END PGP SIGNATURE----- From fuzzyman at voidspace.org.uk Wed Nov 24 01:41:37 2010 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Wed, 24 Nov 2010 00:41:37 +0000 Subject: [Python-Dev] constant/enum type in stdlib In-Reply-To: <1290546920.3642.104.camel@localhost.localdomain> References: <20101121034404.52924F20A@mail.python.org> <4CE9BF4A.1020302@netwok.org> <4CEA89E8.5090107@voidspace.org.uk> <20101122163722.7e96d123@pitrou.net> <4CEA9584.7040301@avl.com> <20101122172440.77d27ed5@pitrou.net> <20101122164654.2109.588145158.divmod.xquotient.165@localhost.localdomain> <4CEBC6BD.9060402@voidspace.org.uk> <4CEBCE92.40801@voidspace.org.uk> <20101123154229.474f7a90@pitrou.net> <1290524466.3642.4.camel@localhost.localdomain> <935CA187-6799-437E-8F18-2A35886B5117@twistedmatrix.com> <1290546920.3642.104.camel@localhost.localdomain> Message-ID: <4CEC5F41.8060806@voidspace.org.uk> On 23/11/2010 21:15, Antoine Pitrou wrote: > Le mardi 23 novembre 2010 ? 16:10 -0500, Glyph Lefkowitz a ?crit : >> On Nov 23, 2010, at 10:01 AM, Antoine Pitrou wrote: >> >>> Well, it is easy to assign range(N) to a tuple of names when >>> desired. I >>> don't think an automatically-enumerating constant generator is >>> needed. >> I don't think that numerical enumerations are the only kind of >> constants we're talking about. Others have already mentioned strings. >> Also, see for some other use-cases. 
Since this
>> isn't coming to 2.x, we're probably going to do our own thing anyway
>> (unless it turns out that flufl.enum is so great that we want to add
>> another dependency...) but I'm hoping that the outcome of this
>> discussion will point to something we can be compatible with.
> I think that asking for too many features would get in the way, and also
> make the API quite un-Pythonic. If you want your values to be e.g.
> OR'able, just choose your values wisely ;)
>

Well, the point of an OR'able flag is that the result shows the OR'd
values in the repr.

Raymond suggests using a set of strings where you need flag constants.
For new apis (so no backwards compatibility constraints) where you don't
need to use integers (i.e. not wrapping a C library) that's a great
suggestion:

    flags = {'FOO', 'BAR'}

Michael

> Regards
>
> Antoine.
>

--
http://www.voidspace.org.uk/

READ CAREFULLY. By accepting and reading this email you agree,
on behalf of your employer, to release me from all obligations
and waivers arising from any and all NON-NEGOTIATED agreements,
licenses, terms-of-service, shrinkwrap, clickwrap, browsewrap,
confidentiality, non-disclosure, non-compete and acceptable use
policies (“BOGUS AGREEMENTS”) that I have entered into with your
employer, its partners, licensors, agents and assigns, in
perpetuity, without prejudice to my ongoing rights and privileges.
You further represent that you have the authority to release me
from any BOGUS AGREEMENTS on behalf of your employer.

From lukasz at langa.pl  Wed Nov 24 01:50:23 2010
From: lukasz at langa.pl (=?utf-8?Q?=C5=81ukasz_Langa?=)
Date: Wed, 24 Nov 2010 01:50:23 +0100
Subject: [Python-Dev] Centos 5.5 freeze during test_concurrent_futures
Message-ID:

Hi there!
py3k built from trunk on Centos 5.5 freezes during regrtest on test_concurrent_futures with "Fatal Python error: Invalid thread state for this thread". As in a typical concurrent problem, subsequent calls freeze in different test cases, but the freeze itself is always reproducible and always during this test. A colorful example: http://bpaste.net/show/11493/ I created an issue for that here: http://bugs.python.org/issue10517 If necessary, I can provide Centos 5.5 shell access. I would also like to donate a Centos 5.5 buildbot. -- Best regards, ?ukasz Langa tel. +48 791 080 144 WWW http://lukasz.langa.pl/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From jcea at jcea.es Wed Nov 24 02:32:05 2010 From: jcea at jcea.es (Jesus Cea) Date: Wed, 24 Nov 2010 02:32:05 +0100 Subject: [Python-Dev] Sporadic problems with bugs.python.org In-Reply-To: <4CEC5CC5.5070305@jcea.es> References: <4CEC24FE.70107@jcea.es> <4CEC5CC5.5070305@jcea.es> Message-ID: <4CEC6B15.6060606@jcea.es> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 24/11/10 01:31, Jesus Cea wrote: > Still retrying, with no luck. > > Anybody else can reproduce?. One of my tracker changes was just processed. The important one still retrying every 5 minutes... I hope I can go sleep before dawn :-P. - -- Jesus Cea Avion _/_/ _/_/_/ _/_/_/ jcea at jcea.es - http://www.jcea.es/ _/_/ _/_/ _/_/ _/_/ _/_/ jabber / xmpp:jcea at jabber.org _/_/ _/_/ _/_/_/_/_/ . 
_/_/ _/_/ _/_/ _/_/ _/_/ "Things are not so easy" _/_/ _/_/ _/_/ _/_/ _/_/ _/_/ "My name is Dump, Core Dump" _/_/_/ _/_/_/ _/_/ _/_/ "El amor es poner tu felicidad en la felicidad de otro" - Leibniz -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iQCVAwUBTOxrFZlgi5GaxT1NAQLHUQP+IyN3X/vt5AQKpg/fTjSUpfX2f3wTzeOp 8+5Gnb2ktyZQEF0ELBo0wiWNReJcxicw3ZD9Zqy05cprJ8VL7QZSRHkom+BiXrKK P+Rllulp8Eu+wq59NKJb5DGk8tfDt6zywepUAHB449Dkcyq9p8gt8L5LAiABTfsy dFaQPP2w1Kg= =ERTw -----END PGP SIGNATURE----- From tjreedy at udel.edu Wed Nov 24 02:51:20 2010 From: tjreedy at udel.edu (Terry Reedy) Date: Tue, 23 Nov 2010 20:51:20 -0500 Subject: [Python-Dev] Sporadic problems with bugs.python.org In-Reply-To: <4CEC6B15.6060606@jcea.es> References: <4CEC24FE.70107@jcea.es> <4CEC5CC5.5070305@jcea.es> <4CEC6B15.6060606@jcea.es> Message-ID: On 11/23/2010 8:32 PM, Jesus Cea wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > On 24/11/10 01:31, Jesus Cea wrote: >> Still retrying, with no luck. >> >> Anybody else can reproduce?. > > One of my tracker changes was just processed. > > The important one still retrying every 5 minutes... > > I hope I can go sleep before dawn :-P. I added a comment to one issue and opened another with no problem during the last couple of hours. -- Terry Jan Reedy From glyph at twistedmatrix.com Wed Nov 24 02:52:13 2010 From: glyph at twistedmatrix.com (Glyph Lefkowitz) Date: Tue, 23 Nov 2010 20:52:13 -0500 Subject: [Python-Dev] len(chr(i)) = 2? 
In-Reply-To: <77AAC178-F868-4F05-8509-4A9FB66F61EC@fuhm.net> References: <201011192123.14169.victor.stinner@haypocalc.com> <4CE6F93F.9010109@egenix.com> <4CE6FE30.1050903@v.loewis.de> <87hbfc1vnf.fsf@uwakimon.sk.tsukuba.ac.jp> <4CE78F62.7060707@v.loewis.de> <8739qukf9r.fsf@uwakimon.sk.tsukuba.ac.jp> <20101121173825.B1BFB235977@kimball.webabinitio.net> <60F8726F-C1C2-4803-8B8E-688EF0443FA0@gmail.com> <87eiadd46t.fsf@uwakimon.sk.tsukuba.ac.jp> <4CEC5316.4010608@canterbury.ac.nz> <77AAC178-F868-4F05-8509-4A9FB66F61EC@fuhm.net> Message-ID: On Nov 23, 2010, at 7:22 PM, James Y Knight wrote: > On Nov 23, 2010, at 6:49 PM, Greg Ewing wrote: >> Maybe Python should have used UTF-8 as its internal unicode >> representation. Then people who were foolish enough to assume >> one character per string item would have their programs break >> rather soon under only light unicode testing. :-) > > You put a smiley, but, in all seriousness, I think that's actually the right thing to do if anyone writes a new programming language. It is clearly the right thing if you don't have to be concerned with backwards-compatibility: nobody really needs to be able to access the Nth codepoint in a string in constant time, so there's not really any point in storing a vector of codepoints. > > Instead, provide bidirectional iterators which can traverse the string by byte, codepoint, or by grapheme (that is: the set of combining characters + base character that go together, making up one thing which a human would think of as a character). I really hope that this idea is not just for new programming languages. If you switch from doing unicode "wrong" to doing unicode "right" in Python, you quadruple the memory footprint of programs which primarily store and manipulate large amounts of text. 
This is especially ridiculous in PyGTK applications, where the internal representation required by the GUI is UTF-8 anyway, so the round-tripping of string data back and forth to the exploded UTF-32 representation is wasting gobs of memory and time. It at least makes sense when your C library's idea about character width and your Python build match up. But, in a desktop app this is unlikely to be a performance concern; in servers, it's a big deal; measurably so. I am pretty sure that in the server apps that I work on, we are eventually going to need our own string type and UTF-8 logic that does exactly what James suggested - certainly if we ever hope to support Py3. (I dimly recall that both James and I have made this point before, but it's pretty important, so it bears repeating.) From glyph at twistedmatrix.com Wed Nov 24 02:56:57 2010 From: glyph at twistedmatrix.com (Glyph Lefkowitz) Date: Tue, 23 Nov 2010 20:56:57 -0500 Subject: [Python-Dev] OpenSSL Voluntarily (openssl-1.0.0a) In-Reply-To: <20101123150219.29e20374@pitrou.net> References: <4CEB3F72.7000006@m2.ccsnet.ne.jp> <20101123150219.29e20374@pitrou.net> Message-ID: <720EFE43-119F-4F2F-BCB1-939275B5FA6E@twistedmatrix.com> On Nov 23, 2010, at 9:02 AM, Antoine Pitrou wrote: > On Tue, 23 Nov 2010 00:07:09 -0500 > Glyph Lefkowitz wrote: >> On Mon, Nov 22, 2010 at 11:13 PM, Hirokazu Yamamoto < >> ocean-city at m2.ccsnet.ne.jp> wrote: >> >>> Hello. Does this affect python? Thank you. >>> >>> http://www.openssl.org/news/secadv_20101116.txt >>> >> >> No. > > Well, actually it does, but Python links against the system OpenSSL on > most platforms (except Windows), so it's up to the OS vendor to apply > the patch. It does? If so, I must have misunderstood the vulnerability. Can you explain how it affects Python? From stephen at xemacs.org Wed Nov 24 03:29:47 2010 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Wed, 24 Nov 2010 11:29:47 +0900 Subject: [Python-Dev] len(chr(i)) = 2?
In-Reply-To: References: <201011192123.14169.victor.stinner@haypocalc.com> <4CE6F93F.9010109@egenix.com> <4CE6FE30.1050903@v.loewis.de> <87hbfc1vnf.fsf@uwakimon.sk.tsukuba.ac.jp> <4CE78F62.7060707@v.loewis.de> <8739qukf9r.fsf@uwakimon.sk.tsukuba.ac.jp> <20101121173825.B1BFB235977@kimball.webabinitio.net> <60F8726F-C1C2-4803-8B8E-688EF0443FA0@gmail.com> <87eiadd46t.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <87tyj7bgis.fsf@uwakimon.sk.tsukuba.ac.jp> Alexander Belopolsky writes: > Yet finding a bug in a str object method after a 5 min review was a > bit discouraging: > > >>> 'xyz'.center(20, '\U00010140') > Traceback (most recent call last): > File "<stdin>", line 1, in <module> > TypeError: The fill character must be exactly one character long > > Given the apparent difficulty of writing even basic text processing > algorithms in presence of surrogate pairs, I wonder how wise it is to > expose Python users to them. "Consenting adults" applies here. What to do? Write tests, fix the stdlib. Raise the probability of surrogate pair tests in the fuzzer. But "expose the users to surrogate pairs in an efficient (ie, UCS-2) implementation" is a fundamental design principle of Python. Tightening up the internal implementation is -10 unacceptable IMO YMMV. > Again, given that the str object itself has at least one non-BMP > character bug as we are closing on the third major release of py3k, > how likely are 3rd party developers to get their libraries right as > they port to 3.x? Not our problem, really. We need to fix the stdlib, but 3rd party libraries know what they're doing. I guess we could provide a fuzztest module that generates known nasty data (zero, very big numbers, "\0x00", "\U00010140", etc) that people would be able to plug in as a data source for their own code. Of course that doesn't replace conventional unittests based on analysis of edge cases and tests designed to tickle them, but it would be a start for many projects.
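A minimal sketch of the "known nasty data" idea in the message above. The names nasty_strings and utf16_units are hypothetical and chosen only for illustration; no fuzztest module with this interface exists in the standard library. The helper counts UTF-16 code units, which is roughly what len() reported for a non-BMP character on a UCS-2 build of that era.

```python
# Hypothetical helpers; neither name comes from the stdlib.

def utf16_units(s):
    """Return how many UTF-16 code units are needed to encode s."""
    return len(s.encode("utf-16-le")) // 2


def nasty_strings():
    """Yield strings that tend to expose text-handling bugs."""
    yield ""             # empty string
    yield "\x00"         # embedded NUL
    yield "\U00010140"   # non-BMP code point (a surrogate pair in UTF-16)
    yield "e\u0301"      # base character followed by a combining accent
    yield "x" * 10000    # long input


if __name__ == "__main__":
    for sample in nasty_strings():
        # Show only a short prefix so the long sample stays readable.
        print(repr(sample[:8]), utf16_units(sample))
```

A project could feed each yielded value to its own text routines (padding, slicing, truncation) and check that nothing raises or splits a character.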
From raymond.hettinger at gmail.com Wed Nov 24 03:35:35 2010 From: raymond.hettinger at gmail.com (Raymond Hettinger) Date: Tue, 23 Nov 2010 18:35:35 -0800 Subject: [Python-Dev] constant/enum type in stdlib In-Reply-To: <4CEC513E.4050603@canterbury.ac.nz> References: <20101121034404.52924F20A@mail.python.org> <20101122163722.7e96d123@pitrou.net> <4CEA9584.7040301@avl.com> <20101122172440.77d27ed5@pitrou.net> <20101122164654.2109.588145158.divmod.xquotient.165@localhost.localdomain> <4CEBC6BD.9060402@voidspace.org.uk> <4CEBCE92.40801@voidspace.org.uk> <20101123154229.474f7a90@pitrou.net> <1290524466.3642.4.camel@localhost.localdomain> <4CEBDA91.4050205@voidspace.org.uk> <1290526253.3642.9.camel@localhost.localdomain> <4CEBE06C.9030101@voidspace.org.uk> <1290528319.3642.11.camel@localhost.localdomain> <1290533860.3642.73.camel@localhost.localdomain> <58396.1290540417@parc.com> <4CEC513E.4050603@canterbury.ac.nz> Message-ID: <6A9ADF09-971A-4CD7-B583-3BF264E47CF2@gmail.com> On Nov 23, 2010, at 3:41 PM, Greg Ewing wrote: > While it may be possible to work around these things with > sufficient levels of metaclass hackery and black magic, at > some point one has to consider whether new syntax might > be the least worst option. The least worst option is to do nothing at all. That's better than creating a new little monster with its own nuances and limitations. We've gotten by well for almost two decades without this particular static language feature creeping into Python. For the most part, strings work well enough (see decimal.ROUND_UP for example). They are self-documenting and work well with the rest of the language. When a cluster of names cries out for its own namespace, the usual technique is to put the names in class (see the examples in the namedtuple docs for a way to make this a one-liner) or in a module (see opcode.py for example). 
For xor'able and or'able flags, sets of strings work well: flags = {'runnable', 'callable'} flags |= {'runnable', 'kissable'} if 'callable' in flags: . . . We have a hard enough time getting people to not program Java in Python. IMO, adding a new enumeration type would make this situation worse. Also, it adds weight to the language -- Python is not in need of yet another fundamental construct. Raymond P.S. I do recognize that lots of people have written their own versions of Enum(), but I think they do it either out of habits formed from statically compiled languages that lack all of our namespace mechanisms or they do it because it is easy and fun to write (just like people seem to enjoy writing flatten() recipes more than they like actually using them). One other thought: With Py3.x, the language had its one chance to get smaller. Old-style classes were tossed, some built-ins vanished, and a few obsolete modules got nuked. It would be easy to have a "let's add thingie x" fest and lose those benefits. There are many devs who find that the language does not fit-in-their-heads anymore, so considerable restraint needs to be exercised before adding a new language feature that would soon permeate everyone's code base and add yet another thing that infrequent users have to learn before being able to read code. From stephen at xemacs.org Wed Nov 24 03:44:40 2010 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Wed, 24 Nov 2010 11:44:40 +0900 Subject: [Python-Dev] len(chr(i)) = 2?
In-Reply-To: <77AAC178-F868-4F05-8509-4A9FB66F61EC@fuhm.net> References: <201011192123.14169.victor.stinner@haypocalc.com> <4CE6F93F.9010109@egenix.com> <4CE6FE30.1050903@v.loewis.de> <87hbfc1vnf.fsf@uwakimon.sk.tsukuba.ac.jp> <4CE78F62.7060707@v.loewis.de> <8739qukf9r.fsf@uwakimon.sk.tsukuba.ac.jp> <20101121173825.B1BFB235977@kimball.webabinitio.net> <60F8726F-C1C2-4803-8B8E-688EF0443FA0@gmail.com> <87eiadd46t.fsf@uwakimon.sk.tsukuba.ac.jp> <4CEC5316.4010608@canterbury.ac.nz> <77AAC178-F868-4F05-8509-4A9FB66F61EC@fuhm.net> Message-ID: <87sjyrbftz.fsf@uwakimon.sk.tsukuba.ac.jp> James Y Knight writes: > You put a smiley, but, in all seriousness, I think that's actually > the right thing to do if anyone writes a new programming > language. It is clearly the right thing if you don't have to be > concerned with backwards-compatibility: nobody really needs to be > able to access the Nth codepoint in a string in constant time, so > there's not really any point in storing a vector of codepoints. A sad commentary on the state of Emacs usage, "nobody". The theory is that accessing the first character of a region in a string often occurs as a primitive operation in O(N) or worse algorithms, sometimes without enough locality at the "collection of regions" level to give a reasonably small average access time. In practice, any *Emacs user can tell you that yes, we do need to be able to access the Nth codepoint in a buffer in constant time. The O(N) behavior of current Emacs implementations means that people often use a binary coding system on large files. Yes, some position caching is done, but if you have a large file (eg, a mail file) which is virtually segmented using pointers to regions, locality gets lost. (This is not a design bug, this is a fundamental requirement: consider fast switching between threaded view and author-sorted view.) And of course an operation that sorts regions in a buffer using character pointers will have the same problem. 
Working with memory pointers, OTOH, sucks more than that; GNU Emacs recently bit the bullet and got rid of their higher-level memory-oriented APIs, all of the Lisp structures now work with pointers, and only the very low-level structures know about character-to-memory pointer translation. This performance issue is perceptible even on 3GHz machines with not so large (50MB) mbox files. It's *horrid* if you do something like "occur" on a 1GB log file, then try randomly jumping to detected log entries. From fdrake at acm.org Wed Nov 24 03:58:47 2010 From: fdrake at acm.org (Fred Drake) Date: Tue, 23 Nov 2010 21:58:47 -0500 Subject: [Python-Dev] constant/enum type in stdlib In-Reply-To: <6A9ADF09-971A-4CD7-B583-3BF264E47CF2@gmail.com> References: <20101121034404.52924F20A@mail.python.org> <20101122163722.7e96d123@pitrou.net> <4CEA9584.7040301@avl.com> <20101122172440.77d27ed5@pitrou.net> <20101122164654.2109.588145158.divmod.xquotient.165@localhost.localdomain> <4CEBC6BD.9060402@voidspace.org.uk> <4CEBCE92.40801@voidspace.org.uk> <20101123154229.474f7a90@pitrou.net> <1290524466.3642.4.camel@localhost.localdomain> <4CEBDA91.4050205@voidspace.org.uk> <1290526253.3642.9.camel@localhost.localdomain> <4CEBE06C.9030101@voidspace.org.uk> <1290528319.3642.11.camel@localhost.localdomain> <1290533860.3642.73.camel@localhost.localdomain> <58396.1290540417@parc.com> <4CEC513E.4050603@canterbury.ac.nz> <6A9ADF09-971A-4CD7-B583-3BF264E47CF2@gmail.com> Message-ID: On Tue, Nov 23, 2010 at 9:35 PM, Raymond Hettinger wrote: > The least worst option is to do nothing at all. For the standard library, I agree. There are enough variants that are needed/desired in different contexts, and there isn't a single clear winner. Nor is there any compelling reason to have a winner. I'm generally in favor of enums (or whatever you want to call them), and I'm in favor of importing support for the flavor you need, or just defining constants in whatever way makes sense for your library or application. 
I don't see any problems that aren't solved by that. -Fred -- Fred L. Drake, Jr. "A storm broke loose in my mind." --Albert Einstein From jcea at jcea.es Wed Nov 24 04:03:36 2010 From: jcea at jcea.es (Jesus Cea) Date: Wed, 24 Nov 2010 04:03:36 +0100 Subject: [Python-Dev] Sporadic problems with bugs.python.org In-Reply-To: References: <4CEC24FE.70107@jcea.es> <4CEC5CC5.5070305@jcea.es> <4CEC6B15.6060606@jcea.es> Message-ID: <4CEC8088.7010709@jcea.es> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 24/11/10 02:51, Terry Reedy wrote: >> I hope I can go sleep before dawn :-P. > > I added a comment to one issue and opened another with no problem during > the last couple of hours. My changes have worked now. After like 8 hours and a retry every five minutes. - -- Jesus Cea Avion _/_/ _/_/_/ _/_/_/ jcea at jcea.es - http://www.jcea.es/ _/_/ _/_/ _/_/ _/_/ _/_/ jabber / xmpp:jcea at jabber.org _/_/ _/_/ _/_/_/_/_/ . _/_/ _/_/ _/_/ _/_/ _/_/ "Things are not so easy" _/_/ _/_/ _/_/ _/_/ _/_/ _/_/ "My name is Dump, Core Dump" _/_/_/ _/_/_/ _/_/ _/_/ "El amor es poner tu felicidad en la felicidad de otro" - Leibniz -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iQCVAwUBTOyAiJlgi5GaxT1NAQLavgP/ZmlKIu+luLw7DpJAVk/p3BCF7wmciE0J KW5SmCHVsyPuKFgOY45f5PM0q7+iXiv3m59zrDNbk0yBvLnVbmGwEeeV1/kGsZ94 NrYuHqnwW6h19tbrFTmVZ5BVKBSc4pdvBhV3+0Zx9hAfkkH/heE4WKJEFd7tIzTu h9jsvAI8pR8= =sG82 -----END PGP SIGNATURE----- From glyph at twistedmatrix.com Wed Nov 24 04:27:38 2010 From: glyph at twistedmatrix.com (Glyph Lefkowitz) Date: Tue, 23 Nov 2010 22:27:38 -0500 Subject: [Python-Dev] len(chr(i)) = 2?
In-Reply-To: <87sjyrbftz.fsf@uwakimon.sk.tsukuba.ac.jp> References: <201011192123.14169.victor.stinner@haypocalc.com> <4CE6F93F.9010109@egenix.com> <4CE6FE30.1050903@v.loewis.de> <87hbfc1vnf.fsf@uwakimon.sk.tsukuba.ac.jp> <4CE78F62.7060707@v.loewis.de> <8739qukf9r.fsf@uwakimon.sk.tsukuba.ac.jp> <20101121173825.B1BFB235977@kimball.webabinitio.net> <60F8726F-C1C2-4803-8B8E-688EF0443FA0@gmail.com> <87eiadd46t.fsf@uwakimon.sk.tsukuba.ac.jp> <4CEC5316.4010608@canterbury.ac.nz> <77AAC178-F868-4F05-8509-4A9FB66F61EC@fuhm.net> <87sjyrbftz.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <635C265A-90A8-4B92-A65C-59EF3E8EFD68@twistedmatrix.com> On Nov 23, 2010, at 9:44 PM, Stephen J. Turnbull wrote: > James Y Knight writes: > >> You put a smiley, but, in all seriousness, I think that's actually >> the right thing to do if anyone writes a new programming >> language. It is clearly the right thing if you don't have to be >> concerned with backwards-compatibility: nobody really needs to be >> able to access the Nth codepoint in a string in constant time, so >> there's not really any point in storing a vector of codepoints. > > A sad commentary on the state of Emacs usage, "nobody". > > The theory is that accessing the first character of a region in a > string often occurs as a primitive operation in O(N) or worse > algorithms, sometimes without enough locality at the "collection of > regions" level to give a reasonably small average access time. I'm not sure what you mean by "the theory is". Whose theory? About what? > In practice, any *Emacs user can tell you that yes, we do need to be > able to access the Nth codepoint in a buffer in constant time. The > O(N) behavior of current Emacs implementations means that people often > use a binary coding system on large files. Yes, some position caching > is done, but if you have a large file (eg, a mail file) which is > virtually segmented using pointers to regions, locality gets lost. 
> (This is not a design bug, this is a fundamental requirement: consider > fast switching between threaded view and author-sorted view.) Sounds like a design bug to me. Personally, I'd implement "fast switching between threaded view and author-sorted view" the same way I'd address any other multiple-views-on-the-same-data problem. I'd retain data structures for both, and update them as the underlying model changed. These representations may need to maintain cursors into the underlying character data, if they must retain giant wads of character data as an underlying representation (arguably the _main_ design bug in Emacs, that it encourages you to do that for everything, rather than imposing a sensible structure), but those cursors don't need to be code-point counters; they could be byte offsets, or opaque handles whose precise meaning varied with the potentially variable underlying storage. Also, please remember that Emacs couldn't be implemented with giant Python strings anyway: crucially, all of this stuff is _mutable_ in Emacs. > And of course an operation that sorts regions in a buffer using > character pointers will have the same problem. Working with memory > pointers, OTOH, sucks more than that; GNU Emacs recently bit the > bullet and got rid of their higher-level memory-oriented APIs, all of > the Lisp structures now work with pointers, and only the very > low-level structures know about character-to-memory pointer > translation. > > This performance issue is perceptible even on 3GHz machines with not > so large (50MB) mbox files. It's *horrid* if you do something like > "occur" on a 1GB log file, then try randomly jumping to detected log > entries. Case in point: "occur" needs to scan the buffer anyway; you can't do better than linear time there. So you're going to iterate through the buffer, using one of the techniques that James proposed, and remember some locations. Why not just have those locations be opaque cursors into your data? 
In summary: you're right, in that James missed a spot. You need bidirectional, *copyable* iterators that can traverse the string by byte, codepoint, grapheme, or decomposed glyph. From v+python at g.nevcal.com Wed Nov 24 05:28:19 2010 From: v+python at g.nevcal.com (Glenn Linderman) Date: Tue, 23 Nov 2010 20:28:19 -0800 Subject: [Python-Dev] http.server - reference to bug #427345 Message-ID: <4CEC9463.8030302@g.nevcal.com> Where might I find the bug #427345 that is referred to in a comment inside http.server ? Here is a code excerpt: # throw away additional data [see bug #427345] while select.select([self.rfile._sock], [], [], 0)[0]: if not self.rfile._sock.recv(1): break -------------- next part -------------- An HTML attachment was scrubbed... URL: From brian.curtin at gmail.com Wed Nov 24 05:35:10 2010 From: brian.curtin at gmail.com (Brian Curtin) Date: Tue, 23 Nov 2010 22:35:10 -0600 Subject: [Python-Dev] http.server - reference to bug #427345 In-Reply-To: <4CEC9463.8030302@g.nevcal.com> References: <4CEC9463.8030302@g.nevcal.com> Message-ID: On Tue, Nov 23, 2010 at 22:28, Glenn Linderman > wrote: > Where might I find the bug #427345 that is referred to in a comment inside > http.server ? Here is a code excerpt: > > # throw away additional data [see bug #427345] > while select.select([self.rfile._sock], [], [], 0)[0]: > if not self.rfile._sock.recv(1): > break > http://bugs.python.org/issue427345 http://bugs.python.org/ has a box on the left-hand side where you can enter issue numbers. -------------- next part -------------- An HTML attachment was scrubbed... URL: From stephen at xemacs.org Wed Nov 24 06:07:52 2010 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Wed, 24 Nov 2010 14:07:52 +0900 Subject: [Python-Dev] len(chr(i)) = 2? 
In-Reply-To: <635C265A-90A8-4B92-A65C-59EF3E8EFD68@twistedmatrix.com> References: <201011192123.14169.victor.stinner@haypocalc.com> <4CE6F93F.9010109@egenix.com> <4CE6FE30.1050903@v.loewis.de> <87hbfc1vnf.fsf@uwakimon.sk.tsukuba.ac.jp> <4CE78F62.7060707@v.loewis.de> <8739qukf9r.fsf@uwakimon.sk.tsukuba.ac.jp> <20101121173825.B1BFB235977@kimball.webabinitio.net> <60F8726F-C1C2-4803-8B8E-688EF0443FA0@gmail.com> <87eiadd46t.fsf@uwakimon.sk.tsukuba.ac.jp> <4CEC5316.4010608@canterbury.ac.nz> <77AAC178-F868-4F05-8509-4A9FB66F61EC@fuhm.net> <87sjyrbftz.fsf@uwakimon.sk.tsukuba.ac.jp> <635C265A-90A8-4B92-A65C-59EF3E8EFD68@twistedmatrix.com> Message-ID: <87oc9fb97b.fsf@uwakimon.sk.tsukuba.ac.jp> Note that I'm not saying that there shouldn't be a UTF-8 string type; I'm just saying that for some purposes it might be a good idea to keep UTF-16 and UTF-32 string types around. Glyph Lefkowitz writes: > > The theory is that accessing the first character of a region in a > > string often occurs as a primitive operation in O(N) or worse > > algorithms, sometimes without enough locality at the "collection of > > regions" level to give a reasonably small average access time. > > I'm not sure what you mean by "the theory is". Whose theory? About what? Mine. About why somebody somewhere someday would need fast random access to character positions. "Nobody ever needs that" is a strong claim. > > In practice, any *Emacs user can tell you that yes, we do need to be > > able to access the Nth codepoint in a buffer in constant time. The > > O(N) behavior of current Emacs implementations means that people often > > use a binary coding system on large files. Yes, some position caching > > is done, but if you have a large file (eg, a mail file) which is > > virtually segmented using pointers to regions, locality gets lost. > > (This is not a design bug, this is a fundamental requirement: consider > > fast switching between threaded view and author-sorted view.) 
> > Sounds like a design bug to me. Personally, I'd implement "fast > switching between threaded view and author-sorted view" the same > way I'd address any other multiple-views-on-the-same-data problem. > I'd retain data structures for both, and update them as the > underlying model changed. Um, that's precisely the design I'm talking about. But as you recognize later, the message content is not part of those structures because there's no real point in copying it *if you have fast access to character positions*. In a variable width character, character- addressed design, there can be a perceptible delay in accessing even the "next" message's content if you're in the wrong view. > These representations may need to maintain cursors into the > underlying character data, if they must retain giant wads of > character data as an underlying representation (arguably the _main_ > design bug in Emacs, that it encourages you to do that for > everything, rather than imposing a sensible structure), but those > cursors don't need to be code-point counters; they could be byte > offsets, or opaque handles whose precise meaning varied with the > potentially variable underlying storage. Both byte offsets and opaque handles really really suck to design, implement, and maintain, if Lisp or Python level users can use them. They're hard enough to do when you can hide them behind internal APIs, but if they're accessible to users they're an endless source of user bugs. What was that you were saying about the difficulty of remembering which argument is the fd? It's like that. Sure, you can design APIs to help get that right, but it's not easy to provide one that can be used for all the different applications out there. > Also, please remember that Emacs couldn't be implemented with giant > Python strings anyway: crucially, all of this stuff is _mutable_ in > Emacs. No, that's a red herring. 
The use-cases where Emacs users complain most is browsing giant logs and reading old mail; neither needs the content to be mutable (although of course it's a convenience in the mail case if you delete messages or fetch new mail, but that could be done with transaction logs that get appended to the on-disk file). > Case in point: "occur" needs to scan the buffer anyway; you can't > do better than linear time there. So you're going to iterate > through the buffer, using one of the techniques that James > proposed, and remember some locations. Why not just have those > locations be opaque cursors into your data? They are. But unless you're willing to implement correct character motion, they need to be character indicies, which will be slow to access the actual locations. We've implemented caches, as does Emacs, but they don't always get hits. Finding an arbitrary position once can involve perceptible delay on up to 1GHz machines; doing it in a loop (which mail programs have a habit of doing) could be very painful. > In summary: you're right, in that James missed a spot. You need > bidirectional, *copyable* iterators that can traverse the string by > byte, codepoint, grapheme, or decomposed glyph. That's a good start, yes. But once you talk about "remembering some locations", you're implicitly talking about random access. Either you maintain position indexes which naively implemented can easily be close to the size of the text buffer (indexes are going to be at least 4 bytes, possibly 8, per position, and something like "occur" can generate a lot of positions) -- in which case you might as well just use a representation that is an array in the first place -- or you need to implement a position cache which can be very hairy to do well. 
Or you can give user programs memory indicies, and enjoy the fun as the poor developers do things like "pos += 1" which works fine on the ASCII data they have lying around, then wonder why they get Unicode errors when they take substrings. I'm sure it all can be done, but I don't think it will be done right the first time around. You may be right that designs better adapted to large data sets than Emacs's "everything is a big buffer" will almost always be available with reasonable effort. But remember, a lot of good applications start small, when a flat array might make lots of sense as the underlying structure, and then need to scale. If you need to scale for the paying customers, well, "ouch!" but you can afford it, but for many volunteer or startup projects that takes the wind right out of your sails. Note that if the user doesn't use private space, in a UCS-2 build you have about 1.5K code points available for compressing non-BMP characters into a 2-byte, valid Unicode representation (of course you need to save off the table somewhere if that ever gets out of your program, but that's easy). I find it hard to imagine that there will be many use-cases that need more than that many non-BMP characters. So probably you can tell those few users who care to use a UCS-4 build; most of the array use-cases can be served by UCS-2. Note that in my Japanese corpuses, UTF-8 averages just about 2 bytes per character anyway, and those are mail files, where two lines of Japanese may be preceded by 2KB of ASCII-only header. I suspect Hebrew, Arabic, and Cyrillic users will have similar experiences. By the way, to send the ball back into your court, I have this feeling that the demand for UTF-8 is once again driven by native English speakers who are very shortly going to find themselves, and the data they are most familiar with, very much in the minority. 
Of course the market that benefits from UTF-8 compression will remain very large for the immediate future, but in the grand scheme of things, most of the world is going to prefer UTF-16 by a substantial margin. N.B. I'm not talking about persistent storage, where it's 6 of one and half a dozen of the other; you can translate UTF-8 to UTF-16 way faster than you can read content from disk, of course. From foom at fuhm.net Wed Nov 24 07:26:11 2010 From: foom at fuhm.net (James Y Knight) Date: Wed, 24 Nov 2010 01:26:11 -0500 Subject: [Python-Dev] len(chr(i)) = 2? In-Reply-To: <87oc9fb97b.fsf@uwakimon.sk.tsukuba.ac.jp> References: <201011192123.14169.victor.stinner@haypocalc.com> <4CE6F93F.9010109@egenix.com> <4CE6FE30.1050903@v.loewis.de> <87hbfc1vnf.fsf@uwakimon.sk.tsukuba.ac.jp> <4CE78F62.7060707@v.loewis.de> <8739qukf9r.fsf@uwakimon.sk.tsukuba.ac.jp> <20101121173825.B1BFB235977@kimball.webabinitio.net> <60F8726F-C1C2-4803-8B8E-688EF0443FA0@gmail.com> <87eiadd46t.fsf@uwakimon.sk.tsukuba.ac.jp> <4CEC5316.4010608@canterbury.ac.nz> <77AAC178-F868-4F05-8509-4A9FB66F61EC@fuhm.net> <87sjyrbftz.fsf@uwakimon.sk.tsukuba.ac.jp> <635C265A-90A8-4B92-A65C-59EF3E8EFD68@twistedmatrix.com> <87oc9fb97b.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <3C1ADB64-63F3-4165-926D-EDE9846E0DBD@fuhm.net> On Nov 24, 2010, at 12:07 AM, Stephen J. Turnbull wrote: > Or you can give user programs memory indicies, and enjoy the fun as > the poor developers do things like "pos += 1" which works fine on > the ASCII data they have lying around, then wonder why they get > Unicode errors when they take substrings. a) You seem to be hung up on implementation details of Emacs. But yes, positions should be stored as a byte offset into the UTF-8 string, NOT as the number of codepoints since the beginning of the string. Probably you want it to be somewhat opaque, so that you actually have to specify whether you wanted to go to +1 byte, codepoint, or grapheme.
b) Those poor developers are *already* screwed if they're using pos += 1 when pos is a codepoint index and they then take a substring based on that! They will get half a character when the string contains combining characters... Pretending that "codepoints" are a useful abstraction just makes poor developers get by without doing the correct thing (incrementing to the next grapheme boundary) for a little bit longer. But once you [the language implementor] are providing correct abstractions for grapheme movement, it's just as easy to also provide an abstraction for codepoint movement, and make your low-level implementation of the iterator object be a byte-offset into a UTF8 buffer. James From foom at fuhm.net Wed Nov 24 07:27:52 2010 From: foom at fuhm.net (James Y Knight) Date: Wed, 24 Nov 2010 01:27:52 -0500 Subject: [Python-Dev] len(chr(i)) = 2? In-Reply-To: <87oc9fb97b.fsf@uwakimon.sk.tsukuba.ac.jp> References: <201011192123.14169.victor.stinner@haypocalc.com> <4CE6F93F.9010109@egenix.com> <4CE6FE30.1050903@v.loewis.de> <87hbfc1vnf.fsf@uwakimon.sk.tsukuba.ac.jp> <4CE78F62.7060707@v.loewis.de> <8739qukf9r.fsf@uwakimon.sk.tsukuba.ac.jp> <20101121173825.B1BFB235977@kimball.webabinitio.net> <60F8726F-C1C2-4803-8B8E-688EF0443FA0@gmail.com> <87eiadd46t.fsf@uwakimon.sk.tsukuba.ac.jp> <4CEC5316.4010608@canterbury.ac.nz> <77AAC178-F868-4F05-8509-4A9FB66F61EC@fuhm.net> <87sjyrbftz.fsf@uwakimon.sk.tsukuba.ac.jp> <635C265A-90A8-4B92-A65C-59EF3E8EFD68@twistedmatrix.com> <87oc9fb97b.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On Nov 24, 2010, at 12:07 AM, Stephen J. Turnbull wrote: > By the way, to send the ball back into your court, I have this feeling > that the demand for UTF-8 is once again driven by native English > speakers who are very shortly going to find themselves, and the data > they are most familiar with, very much in the minority. 
Of course the > market that benefits from UTF-8 compression will remain very large for > the immediate future, but in the grand scheme of things, most of the > world is going to prefer UTF-16 by a substantial margin. No, the demand for UTF-8 is because that's what much of the internet (and not coincidentally, unix) world has standardized on. The main pieces of software using UTF-16 (Windows, Java) started doing so before it became apparent that 16 bits wasn't enough to actually hold a unicode codepoint, so they were actually implementing UCS-2. In those days, UCS-2 was a fairly sensible choice. But, now, if your choices are UTF-8 or UTF-16, UTF-8 is clearly superior. Not because it's smaller -- it's pretty much a tossup -- but because it is an ASCII superset, and thus more easily compatible with other software. That also makes it most commonly used for internet communication. (So, there's a huge advantage for using it internally as well right there: no transcoding necessary for writing your HTML output). UTF-16 is incompatible with ASCII, and furthermore, it's still a variable-width encoding, with all the same issues that causes. As such, there's really very little to be said in favor of it. If you really want a fixed-width encoding, you have to go to UTF-32, which is excessively large. UTF-32 is a losing choice, simply because of the wasted memory usage. But that's all a side issue: even if you do choose UTF-16 as your underlying encoding, you *still* need to provide iterators that work by "byte" (only now bytes are 16-bits), by codepoint, and by grapheme. Of course, people who implement UTF-16 (such as python, java, and windows) often pretend they're still implementing UCS-2, and don't bother even providing their users with the necessary APIs to do things correctly. Which, you can often get away with...just so long as you don't mind that you sometimes end up splitting a string in the middle of a codepoint and causing a unicode error! 
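The byte-offset cursor idea argued for in this thread can be sketched as follows. This is an illustrative assumption, not code from any of the posters: the function names are hypothetical, the buffer is assumed to hold well-formed UTF-8, and grapheme-level stepping is left out because it needs Unicode segmentation data.

```python
# Hypothetical sketch: a cursor is a byte offset into a UTF-8 buffer,
# and "advance by one code point" is a separate, explicit operation.

def next_codepoint(buf, pos):
    """Return the byte offset just past the code point starting at pos."""
    pos += 1
    # UTF-8 continuation bytes always look like 10xxxxxx.
    while pos < len(buf) and (buf[pos] & 0xC0) == 0x80:
        pos += 1
    return pos


def codepoints(buf):
    """Yield each code point of a well-formed UTF-8 buffer as a str."""
    pos = 0
    while pos < len(buf):
        end = next_codepoint(buf, pos)
        yield buf[pos:end].decode("utf-8")
        pos = end


if __name__ == "__main__":
    data = "a\u00e9\U00010140".encode("utf-8")  # 1 + 2 + 4 bytes
    print(len(data), list(codepoints(data)))
```

Stepping by byte is just pos + 1, so the same offset type serves both granularities; only the advance operation differs.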
James From v+python at g.nevcal.com Wed Nov 24 08:43:18 2010 From: v+python at g.nevcal.com (Glenn Linderman) Date: Tue, 23 Nov 2010 23:43:18 -0800 Subject: [Python-Dev] Web servers, bytes, str, documentation, Python 3.2a4 In-Reply-To: <20101122043957.2A5D6235C7A@kimball.webabinitio.net> References: <4CE7452A.7050109@g.nevcal.com> <4CE7B34D.4020309@netwok.org> <4CE8111F.9060502@g.nevcal.com> <4CE8CFCD.4040906@g.nevcal.com> <20101121171821.195552194AC@kimball.webabinitio.net> <4CE9EABA.1090306@g.nevcal.com> <20101122043957.2A5D6235C7A@kimball.webabinitio.net> Message-ID: <4CECC216.8090802@g.nevcal.com> On 11/21/2010 8:39 PM, R. David Murray wrote: > On Sun, 21 Nov 2010 19:59:54 -0800, Glenn Linderman wrote: >> On 11/21/2010 9:18 AM, R. David Murray wrote: >>> I want to look at the CGI issue, but I'm not sure when I'll get to it. >> Actually, since this code was working before 3.x, and if email.parser >> can now accept binary streams, it seems like maybe the only thing that >> might be wrong is that presently it is getting a text stream instead, so >> that is something cgi.py or the application program would have to >> switch, and then maybe some testing would discover correctness, or maybe >> a specification of UTF-8 as the encoding to use for the text parts would >> have to be done. > Well, given the bytes/string split in Python3, code definitely has to > be changed to make this work, since you have to explicitly call bytes > processing routines (message_from_bytes, message_from_binary_file, > BytesFeedparser, etc) to parse binary data, and likewise use > BytesGenerator to emit binary data. Looks like cgi.py also calls http.client and both of them would need to be changed to deal with bytes. I don't have the full translation of API calls in my head, nor have I ever used the email.parser API to know what the calls actually do... just read a bit about it... but that is different than using it... 
However, I find code in http.client.parse_headers that is attempting to
work around reading a binary stream and feeding email.parser a string.
So definitely some work to be done to fix things.

I did add some explicit threads to http.server CGI script code that I
think work around the deadlocks that can result from attempting to
serialize 3 pipes, and yet not require full buffering of stdin or
stdout. At the moment, I still am doing full buffering of stderr, but
that is thought to be small potatoes in an http.server environment,
generally.

But since my test case is CGI form data, I'm stuck until this is
fixed, or I wrap my head around the code in http.client and
email.parser. But not tonight (yawn!).

From solipsis at pitrou.net Wed Nov 24 09:02:13 2010
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Wed, 24 Nov 2010 09:02:13 +0100
Subject: [Python-Dev] OpenSSL Voluntarily (openssl-1.0.0a)
In-Reply-To: <720EFE43-119F-4F2F-BCB1-939275B5FA6E@twistedmatrix.com>
References: <4CEB3F72.7000006@m2.ccsnet.ne.jp>
 <20101123150219.29e20374@pitrou.net>
 <720EFE43-119F-4F2F-BCB1-939275B5FA6E@twistedmatrix.com>
Message-ID: <1290585733.3642.2.camel@localhost.localdomain>

On Tuesday 23 November 2010 at 20:56 -0500, Glyph Lefkowitz wrote:
> On Nov 23, 2010, at 9:02 AM, Antoine Pitrou wrote:
>
> > On Tue, 23 Nov 2010 00:07:09 -0500
> > Glyph Lefkowitz wrote:
> >> On Mon, Nov 22, 2010 at 11:13 PM, Hirokazu Yamamoto <
> >> ocean-city at m2.ccsnet.ne.jp> wrote:
> >>
> >>> Hello. Does this affect python? Thank you.
> >>>
> >>> http://www.openssl.org/news/secadv_20101116.txt
> >>>
> >>
> >> No.
> >
> > Well, actually it does, but Python links against the system OpenSSL on
> > most platforms (except Windows), so it's up to the OS vendor to apply
> > the patch.
>
>
> It does? If so, I must have misunderstood the vulnerability. Can you
> explain how it affects Python?
If I believe the link above:

"Any OpenSSL based TLS server is vulnerable if it is multi-threaded and
uses OpenSSL's internal caching mechanism. Servers that are
multi-process and/or disable internal session caching are NOT
affected."

So, you just have to create a multithreaded TLS server which doesn't
disable server-side session caching (it is enabled by default according
to http://www.openssl.org/docs/ssl/SSL_CTX_set_session_cache_mode.html )

Regards

Antoine.

From solipsis at pitrou.net Wed Nov 24 09:42:07 2010
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Wed, 24 Nov 2010 09:42:07 +0100
Subject: [Python-Dev] Centos 5.5 freeze during test_concurrent_futures
References:
Message-ID: <20101124094207.33ac093f@pitrou.net>

Hi,

> py3k built from trunk on Centos 5.5 freezes during regrtest on test_concurrent_futures with "Fatal Python error: Invalid thread state for this thread". As in a typical concurrent problem, subsequent calls freeze in different test cases, but the freeze itself is always reproducible and always during this test.

Well, could you run this under gdb and report the stacks for the
various threads when the process crashes?
(when compiled --with-pydebug, if possible)

Thank you

Antoine.

From solipsis at pitrou.net Wed Nov 24 09:43:12 2010
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Wed, 24 Nov 2010 09:43:12 +0100
Subject: [Python-Dev] http.server - reference to bug #427345
References: <4CEC9463.8030302@g.nevcal.com>
Message-ID: <20101124094312.06bec373@pitrou.net>

On Tue, 23 Nov 2010 22:35:10 -0600
Brian Curtin wrote:
> On Tue, Nov 23, 2010 at 22:28, Glenn Linderman
> > > wrote:
>
> > Where might I find the bug #427345 that is referred to in a comment inside
> > http.server ?
Here is a code excerpt: > > > > # throw away additional data [see bug #427345] > > while select.select([self.rfile._sock], [], [], 0)[0]: > > if not self.rfile._sock.recv(1): > > break > > > > http://bugs.python.org/issue427345 > > http://bugs.python.org/ has a box on the left-hand side where you can enter > issue numbers. And of course you can also reverse-engineer the clever URL scheme used by Roundup bug entries ;) Regards Antoine. From stephen at xemacs.org Wed Nov 24 10:03:29 2010 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Wed, 24 Nov 2010 18:03:29 +0900 Subject: [Python-Dev] len(chr(i)) = 2? In-Reply-To: <3C1ADB64-63F3-4165-926D-EDE9846E0DBD@fuhm.net> References: <201011192123.14169.victor.stinner@haypocalc.com> <4CE6F93F.9010109@egenix.com> <4CE6FE30.1050903@v.loewis.de> <87hbfc1vnf.fsf@uwakimon.sk.tsukuba.ac.jp> <4CE78F62.7060707@v.loewis.de> <8739qukf9r.fsf@uwakimon.sk.tsukuba.ac.jp> <20101121173825.B1BFB235977@kimball.webabinitio.net> <60F8726F-C1C2-4803-8B8E-688EF0443FA0@gmail.com> <87eiadd46t.fsf@uwakimon.sk.tsukuba.ac.jp> <4CEC5316.4010608@canterbury.ac.nz> <77AAC178-F868-4F05-8509-4A9FB66F61EC@fuhm.net> <87sjyrbftz.fsf@uwakimon.sk.tsukuba.ac.jp> <635C265A-90A8-4B92-A65C-59EF3E8EFD68@twistedmatrix.com> <87oc9fb97b.fsf@uwakimon.sk.tsukuba.ac.jp> <3C1ADB64-63F3-4165-926D-EDE9846E0DBD@fuhm.net> Message-ID: <87mxozayam.fsf@uwakimon.sk.tsukuba.ac.jp> James Y Knight writes: > a) You seem to be hung up implementation details of emacs. Hung up? No. It's the program whose text model I know best, and even if its design could theoretically be a lot better for this purpose, I can't say I've seen a real program whose model is obviously better for the purpose of a language for implementing text editors.[1] So it's not obvious to me that its model can be ruled out on a priori grounds. If not, it would be nice if your new language could implement it efficiently without contorted programming. 
> But yes, positions should be stored as an byte offset into the > utf8 string. NOT as number of codepoints since the beginning of > the string. Probably you want it to be somewhat opaque, so that > you actually have to specify whether you wanted to go to +1 > byte, codepoint, or grapheme. Well, first of all, +1 byte should not be available to a text iterator, at least not with the same iterator/position object that implements character and/or grapheme movement. (You seem to have thought about this issue a lot, but mixing bytes with text units makes wonder how much practical implementation you've done.) Second, incrementing to grapheme boundaries is relatively easy to do efficiently, just as incrementing to a UTF-8 character boundary is easy to do. We already do the latter, the former is pragmatically harder, but not a conceptual stretch. That's not the question. The question is how do we identify an arbitrary position in the text? Sometimes it's nice to have a numerical measure of size or location. It is not obvious that position by grapheme count is going to be the obvious way to determine position in a text. Eg, for languages with variable metric characters, character counts as a way of lining up table columns is going the way of Tyrannosaurus. In the Han-using languages, yes, column counts within lines are going to be important forever, because the characters are literally square for most practical purposes ... but they don't use composing characters (all the Japanese kana are precomposed, for example), so position by grapheme is going to be very close to position by character, and fine positioning will be done either by mouse or by incrementing the last few characters. Nor do I think operations like "advance 1,000,000 characters" will have less meaning than "advance 1,000,000 graphemes." 
Both of them are just a way of saying "go way far away", end up in about
the same place, and where there's a bias, it will be pretty consistent
in a statistical sense for any given natural language (and therefore,
for 99% of users).

> But once you [the language implementor] are providing correct
> abstractions for grapheme movement, it's just as easy to also
> provide an abstraction for codepoint movement, and make your
> low-level implementation of the iterator object be a byte-offset
> into a UTF8 buffer.

Sure, that's fine for something that just iterates over the text. But
if you actually need to remember positions, or regions, to jump to
later or to communicate to other code that manipulates them, doing this
stuff the straightforward way (just copying the whole iterator object
to hang on to its state) becomes expensive. You end up proliferating
types that all do the same kind of thing. Judicious use of inheritance
helps, but getting the fundamental abstraction right is hard. Or at
least, Emacs hasn't found it in 20 years of trying.

OTOH, all that stuff "just works" and just works efficiently, up to the
grapheme vs. character issue, with an array.

About that issue, to go back to tired old Emacs, *all* of the things I
can think of that I might want to do by grapheme (display, insert,
delete, move a few places) do fit the "increment until done" model.
These things already work quite well for the variable-width buffer that
"multilingual" Emacsen use, whether the old Mule encoding or UTF-8. So
I can see how the UTF-8 model with appropriate iterators for characters
and graphemes can work well for lots of applications and use cases.

But Emacs already has opaque "markers", yet nevertheless the use of
integer character positions in strings and buffers has survived. That
*may* have to do with mutability, and the "all the world is a buffer"
design, as Glyph suggested, but I think it more likely that markers are
very expensive to create and use compared to integers.
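To make the "increment until done" model concrete, here is a rough sketch of stepping through a UTF-8 buffer where a position is a plain byte offset. The function names next_codepoint and next_grapheme are hypothetical. The grapheme step is a deliberate simplification: it only skips combining marks (using unicodedata.combining), whereas real grapheme cluster boundaries are defined by Unicode's text segmentation rules.

```python
import unicodedata


def next_codepoint(buf, pos):
    """Return the byte offset of the code point after the one at pos."""
    pos += 1
    # UTF-8 continuation bytes all have the bit pattern 10xxxxxx.
    while pos < len(buf) and (buf[pos] & 0xC0) == 0x80:
        pos += 1
    return pos


def next_grapheme(buf, pos):
    """Skip one base character plus any combining marks that follow it.

    Simplified: this looks only at the canonical combining class, not at
    the full grapheme cluster rules.
    """
    pos = next_codepoint(buf, pos)
    while pos < len(buf):
        end = next_codepoint(buf, pos)
        ch = buf[pos:end].decode("utf-8")
        if unicodedata.combining(ch) == 0:
            break  # next base character starts here
        pos = end
    return pos


text = "e\u0301a"           # 'e', COMBINING ACUTE ACCENT, 'a'
buf = text.encode("utf-8")  # 4 bytes: 1 + 2 + 1

print(next_codepoint(buf, 0))  # offset of the combining accent
print(next_grapheme(buf, 0))   # offset of 'a', past the accent
```

Both movements are cheap when done one step at a time. What this sketch does not give you is the thing argued for above: a constant-time answer to "where is character number N?", which needs either an array of fixed-width units or an auxiliary index.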
Perhaps an editor of power similar to Emacs could be implemented with string operations on lines, or the like, and these issues would go away. But it's not obvious to me. Footnotes: [1] Yes, I know that not all programs are text editors. So shoot me. It's still the text manipulation program I know best, and it's not obvious to me that it's the unique class that would need these features. From stephen at xemacs.org Wed Nov 24 10:51:49 2010 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Wed, 24 Nov 2010 18:51:49 +0900 Subject: [Python-Dev] len(chr(i)) = 2? In-Reply-To: References: <201011192123.14169.victor.stinner@haypocalc.com> <4CE6F93F.9010109@egenix.com> <4CE6FE30.1050903@v.loewis.de> <87hbfc1vnf.fsf@uwakimon.sk.tsukuba.ac.jp> <4CE78F62.7060707@v.loewis.de> <8739qukf9r.fsf@uwakimon.sk.tsukuba.ac.jp> <20101121173825.B1BFB235977@kimball.webabinitio.net> <60F8726F-C1C2-4803-8B8E-688EF0443FA0@gmail.com> <87eiadd46t.fsf@uwakimon.sk.tsukuba.ac.jp> <4CEC5316.4010608@canterbury.ac.nz> <77AAC178-F868-4F05-8509-4A9FB66F61EC@fuhm.net> <87sjyrbftz.fsf@uwakimon.sk.tsukuba.ac.jp> <635C265A-90A8-4B92-A65C-59EF3E8EFD68@twistedmatrix.com> <87oc9fb97b.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <87lj4jaw22.fsf@uwakimon.sk.tsukuba.ac.jp> James Y Knight writes: > But, now, if your choices are UTF-8 or UTF-16, UTF-8 is clearly > superior [...]a because it is an ASCII superset, and thus more > easily compatible with other software. That also makes it most > commonly used for internet communication. Sure, UTF-8 is very nice as a protocol for communicating text. So what? If your application involves shoveling octets real fast, don't convert and shovel those octets. If your application involves significant text processing, well, conversion can almost always be done as fast as you can do I/O so it doesn't cost wallclock time, and generally doesn't require a huge percentage of CPU time compared to the actual text processing. 
It's just a specialization of serialization, that we do all the time for more complex data structures. So wire protocols are not a killer argument for or against any particular internal representation of text. > (So, there's a huge advantage for using it internally as well right > there: no transcoding necessary for writing your HTML output). I don't know your use cases but for mine, transcoding (whether in Lisp or Python or C) is invariably the least of my worries. *Especially* transcoding to UTF-8, which is the default codec for me, and I *never* mix bytes and text, so having not bothered to set the codec, I don't bother to transcode explicitly. > If you really want a fixed-width encoding, you have to go to > UTF-32 Not really. I never bothered implementing the codec, because I haven't yet seen a non-BMP Unicode character in the wild (I still see a lot of non-Unicode characters, but hey, that's the price you pay for living in the land that invented sushi, sake, and anime). For most use cases, those are going to be rare, where by "rare" I mean "you aren't going to see 6400 *different* non-BMP characters."[1] So instead of having the codec produce UTF-16, you have it produce (Holy CEF, Batman!) "pure" UCS-2 with the non-BMP characters registered on demand and encoded in the BMP private area. Python, of course, will never know the difference, and your language won't need to care, either. > But that's all a side issue: even if you do choose UTF-16 as your > underlying encoding, you *still* need to provide iterators that > work by "byte" (only now bytes are 16-bits), by codepoint, Nope, see above. Codepoints can be bytes and vice versa. The needed codec is no harder to use than any other codec, and only slightly less efficient than the normal UTF-8 codec unless you're basically restricted to a rather uncommon script (and even then there are optimizations). > and by grapheme. 
Sure, but as I point out elsewhere, the use cases where grapheme movement is distinguished from character movement I can come up with are all iterative, and I don't need array behavior for both anyway. So since I *can* have a character array in Unicode, and I *can't* have a grapheme array (except maybe by a scheme like the above), I'll go for the character array. Unless maybe you convince me I don't need it, but I'm yet to be convinced. > away with...just so long as you don't mind that you sometimes end > up splitting a string in the middle of a codepoint and causing a > unicode error! I *do* mind, but I like Python anyway. Footnotes: [1] OK, in practice a lot of the private space will be taken by existing system characters, such as the Apple logo (absolutely essential for writing email on Mac, at least in Japan). Whose use-case is going to see 1000 different non-BMP characters in a session? I do know a couple of Buddhist dictionary editors, but aside from them, I can't think of anybody. Lara Croft, maybe. From solipsis at pitrou.net Wed Nov 24 11:27:30 2010 From: solipsis at pitrou.net (Antoine Pitrou) Date: Wed, 24 Nov 2010 11:27:30 +0100 Subject: [Python-Dev] len(chr(i)) = 2? References: <201011192123.14169.victor.stinner@haypocalc.com> <4CE6F93F.9010109@egenix.com> <4CE6FE30.1050903@v.loewis.de> <87hbfc1vnf.fsf@uwakimon.sk.tsukuba.ac.jp> <4CE78F62.7060707@v.loewis.de> <8739qukf9r.fsf@uwakimon.sk.tsukuba.ac.jp> <20101121173825.B1BFB235977@kimball.webabinitio.net> <60F8726F-C1C2-4803-8B8E-688EF0443FA0@gmail.com> <87eiadd46t.fsf@uwakimon.sk.tsukuba.ac.jp> <4CEC5316.4010608@canterbury.ac.nz> <77AAC178-F868-4F05-8509-4A9FB66F61EC@fuhm.net> <87sjyrbftz.fsf@uwakimon.sk.tsukuba.ac.jp> <635C265A-90A8-4B92-A65C-59EF3E8EFD68@twistedmatrix.com> <87oc9fb97b.fsf@uwakimon.sk.tsukuba.ac.jp> <87lj4jaw22.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <20101124112730.6867fb17@pitrou.net> On Wed, 24 Nov 2010 18:51:49 +0900 "Stephen J. 
Turnbull" wrote: > James Y Knight writes: > > > But, now, if your choices are UTF-8 or UTF-16, UTF-8 is clearly > > superior [...]a because it is an ASCII superset, and thus more > > easily compatible with other software. That also makes it most > > commonly used for internet communication. > > Sure, UTF-8 is very nice as a protocol for communicating text. So > what? If your application involves shoveling octets real fast, don't > convert and shovel those octets. If your application involves > significant text processing, well, conversion can almost always be > done as fast as you can do I/O so it doesn't cost wallclock time, and > generally doesn't require a huge percentage of CPU time compared to > the actual text processing. It's just a specialization of > serialization, that we do all the time for more complex data > structures. > > So wire protocols are not a killer argument for or against any > particular internal representation of text. Agreed. Decoding and encoding utf-8 is so fast that it should be dwarfed by any actual processing done on the text. Regards Antoine. From solipsis at pitrou.net Wed Nov 24 12:37:54 2010 From: solipsis at pitrou.net (Antoine Pitrou) Date: Wed, 24 Nov 2010 12:37:54 +0100 Subject: [Python-Dev] r86726 - python/branches/release27-maint/Objects/setobject.c References: <20101124103923.DC18EDE50@mail.python.org> Message-ID: <20101124123754.3b60d3a3@pitrou.net> On Wed, 24 Nov 2010 11:39:23 +0100 (CET) armin.rigo wrote: > Author: armin.rigo > Date: Wed Nov 24 11:39:23 2010 > New Revision: 86726 > > Log: > A no-op change. It looks like this call was not meant to be a recursive > call, but just call the helper (which the recursive call ends up doing). Since it's allegedly a no-op change, it doesn't come with a test, and 2.7.1 is in rc phase, is it really the right time to do it? What is the motivation for it? Thanks Antoine. 
> > > Modified: > python/branches/release27-maint/Objects/setobject.c > > Modified: python/branches/release27-maint/Objects/setobject.c > ============================================================================== > --- python/branches/release27-maint/Objects/setobject.c (original) > +++ python/branches/release27-maint/Objects/setobject.c Wed Nov 24 11:39:23 2010 > @@ -1858,7 +1858,7 @@ > tmpkey = make_new_set(&PyFrozenSet_Type, key); > if (tmpkey == NULL) > return -1; > - rv = set_contains(so, tmpkey); > + rv = set_contains_key(so, tmpkey); > Py_DECREF(tmpkey); > } > return rv; From fuzzyman at voidspace.org.uk Wed Nov 24 13:30:15 2010 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Wed, 24 Nov 2010 12:30:15 +0000 Subject: [Python-Dev] constant/enum type in stdlib In-Reply-To: References: <20101121034404.52924F20A@mail.python.org> <4CE9BF4A.1020302@netwok.org> <4CEA89E8.5090107@voidspace.org.uk> <20101122163722.7e96d123@pitrou.net> <4CEA9584.7040301@avl.com> <20101122172440.77d27ed5@pitrou.net> <20101122164654.2109.588145158.divmod.xquotient.165@localhost.localdomain> <4CEBC6BD.9060402@voidspace.org.uk> Message-ID: <4CED0557.9090101@voidspace.org.uk> On 23/11/2010 14:16, Nick Coghlan wrote: > On Tue, Nov 23, 2010 at 11:50 PM, Michael Foord > wrote: >> PEP 354 was rejected for two primary reasons - lack of interest and nowhere >> obvious to put it. Would it be *so bad* if an enum type lived in its own >> module? There is certainly more interest now, and if we are to use something >> like this in the standard library it *has* to be in the standard library >> (unless every module implements their own private _Constant class). >> >> Time to revisit the PEP? > If you (or anyone else) wanted to revisit the PEP, then I would advise > trawling through the standard library looking for constants that could > be sensibly converted to enum values. 
Based on a non-exhaustive search, Python standard library modules
currently using integers for constants:

* re - has flags (OR'able constants) defined in sre_constants, each flag
  has two names (e.g. re.IGNORECASE and re.I)
* os has SEEK_SET, SEEK_CUR, SEEK_END - *plus* those implemented in
  posix / nt
* doctest has its own flag system, but is really just using integer
  flags / constants (quite a few of them)
* token has a tonne of constants (autogenerated)
* socket exports a bunch of constants defined in _socket
* gzip has flags: FTEXT, FHCRC, FEXTRA, FNAME, FCOMMENT
* errno (builtin module) EALREADY, EINPROGRESS, EWOULDBLOCK, ECONNRESET,
  EINVAL, ENOTCONN, ESHUTDOWN, EINTR, EISCONN, EBADF, ECONNABORTED
* opcode has HAVE_ARGUMENT, EXTENDED_ARG. In fact pretty much the whole
  of opcode is about defining and exposing named constants
* msilib uses flag constants
* multiprocessing.pool - RUN, CLOSE, TERMINATE
* multiprocessing.util - NOTSET, SUBDEBUG, DEBUG, INFO, SUBWARNING
* xml.dom and xml.dom.Node (in __init__.py) have a bunch of constants
* xml.dom.NodeFilter.NodeFilter holds a bunch of constants (some of them
  flags)
* xmlrpc.client has a bunch of error constants
* calendar uses constants to represent weekdays, plus one for the EPOCH
  that is best left alone
* http.client has a tonne of constants - recognisable as ports / error
  codes though
* dis has flags in COMPILER_FLAG_NAMES, which are then set as locals in
  inspect
* io defines SEEK_SET, SEEK_CUR, SEEK_END (same as os)

Where constants are implemented in C but exported via a Python module
(the constants exported by os and socket for example) they could be
wrapped. Where they are exported directly by a C extension or builtin
module (e.g. errno) they are probably best left.

Raymond feels that having an enum / constant type would be Javaesque and
unused. If we used it in the standard library the unused fear at least
would be unwarranted.
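As a rough idea of what wrapping such integer constants could look like, here is a minimal sketch. The class name NamedInt is hypothetical and is not taken from PEP 354 or from any proposal in this thread. It subclasses int, so the value still works anywhere a plain integer is expected, and only repr() changes.

```python
import io


class NamedInt(int):
    """Hypothetical sketch: an int that also carries a name for repr()."""

    def __new__(cls, name, value):
        self = super().__new__(cls, value)
        self._name = name
        return self

    def __repr__(self):
        return "<%s: %d>" % (self._name, int(self))


# Stand-ins for the os / io constants of the same names.
SEEK_SET = NamedInt("SEEK_SET", 0)
SEEK_END = NamedInt("SEEK_END", 2)

print(repr(SEEK_END))  # shows the name as well as the value
print(SEEK_END == 2)   # still compares equal to the plain int

# Existing APIs that expect an int keep working unchanged.
stream = io.BytesIO(b"hello")
print(stream.seek(0, SEEK_END))
```

One caveat with this particular sketch: on Python 3, str() of the value also picks up the custom repr, so code that formats these constants with str() would see a difference.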
The change would be largely transparent to developers, except they get
better debugging info.

Twisted is also looking for an enum / constant type:

    http://twistedmatrix.com/trac/ticket/4671

Because we would need to subclass int for backwards compatibility
(unless the base class is set dynamically, which I don't propose), it
couldn't replace float / string constants. Hopefully it would still be
sufficient to allow Twisted to use it. (Although they do so love
reimplementing parts of the standard library - usually better than the
standard library it has to be said.)

All the best,

Michael

There are a tonne of constants that are used as numbers (MAX_LINE_LENGTH
appears in a few places) and aren't just arbitrary constants.

There are also some other interesting ones:

* pty has STDIN_FILENO, STDOUT_FILENO, STDERR_FILENO, CHILD
* poplib has POP3_PORT, POP3_SSL_PORT - recognisable as port numbers,
  should be left as ints
* datetime.py has MINYEAR and MAXYEAR
* colorsys has float constants
* tty uses constants for termios list indexes (used as numbers I guess)
* curses.ascii has a whole bunch of integer constants referring to ascii
  characters
* Several modules - decimal, concurrent.futures, uuid (and now inspect)
  already use strings

> A decision would also need to be made as to whether or not to subclass
> int, or just provide __index__ (the former has the advantage of being
> able to drop cleanly into OS level APIs that expect a numerical
> constant).
>
> Whether enums should provide arbitrary name-value mappings (ala C
> enums) or were restricted to sequential indices starting from zero
> would be another question best addressed by a code survey of at least
> the stdlib.
>
> And getgeneratorstate() doesn't count as a use case, since the
> ordering isn't needed and using string literals instead of integers
> will cover the debugging aspect :)
>
> Cheers,
> Nick.
>

--
http://www.voidspace.org.uk/

READ CAREFULLY.
By accepting and reading this email you agree, on behalf of your
employer, to release me from all obligations and waivers arising from
any and all NON-NEGOTIATED agreements, licenses, terms-of-service,
shrinkwrap, clickwrap, browsewrap, confidentiality, non-disclosure,
non-compete and acceptable use policies ("BOGUS AGREEMENTS") that I have
entered into with your employer, its partners, licensors, agents and
assigns, in perpetuity, without prejudice to my ongoing rights and
privileges. You further represent that you have the authority to release
me from any BOGUS AGREEMENTS on behalf of your employer.

From ncoghlan at gmail.com Wed Nov 24 15:08:04 2010
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Thu, 25 Nov 2010 00:08:04 +1000
Subject: [Python-Dev] constant/enum type in stdlib
In-Reply-To: <4CED0557.9090101@voidspace.org.uk>
References: <20101121034404.52924F20A@mail.python.org>
 <4CE9BF4A.1020302@netwok.org>
 <4CEA89E8.5090107@voidspace.org.uk>
 <20101122163722.7e96d123@pitrou.net>
 <4CEA9584.7040301@avl.com>
 <20101122172440.77d27ed5@pitrou.net>
 <20101122164654.2109.588145158.divmod.xquotient.165@localhost.localdomain>
 <4CEBC6BD.9060402@voidspace.org.uk>
 <4CED0557.9090101@voidspace.org.uk>
Message-ID:

On Wed, Nov 24, 2010 at 10:30 PM, Michael Foord wrote:
> Based on a non-exhaustive search, Python standard library modules currently
> using integers for constants:

Thanks for that review. I think following up on the "NamedConstant"
idea may make more sense than pursuing enums in their own right. That
way we could get the debugging benefits on the Python side regardless
of any type constraints on the value (e.g. needing to be an integer in
order to interface to C code), without needing to design an enum API
that suited all purposes.

Cheers,
Nick.

--
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From exarkun at twistedmatrix.com Wed Nov 24 16:01:06 2010
From: exarkun at twistedmatrix.com (exarkun at twistedmatrix.com)
Date: Wed, 24 Nov 2010 15:01:06 -0000
Subject: [Python-Dev] OpenSSL Voluntarily (openssl-1.0.0a)
In-Reply-To: <1290585733.3642.2.camel@localhost.localdomain>
References: <4CEB3F72.7000006@m2.ccsnet.ne.jp>
 <20101123150219.29e20374@pitrou.net>
 <720EFE43-119F-4F2F-BCB1-939275B5FA6E@twistedmatrix.com>
 <1290585733.3642.2.camel@localhost.localdomain>
Message-ID: <20101124150106.2109.660794265.divmod.xquotient.197@localhost.localdomain>

On 08:02 am, solipsis at pitrou.net wrote:
>On Tuesday 23 November 2010 at 20:56 -0500, Glyph Lefkowitz wrote:
>>On Nov 23, 2010, at 9:02 AM, Antoine Pitrou wrote:
>>
>> > On Tue, 23 Nov 2010 00:07:09 -0500
>> > Glyph Lefkowitz wrote:
>> >> On Mon, Nov 22, 2010 at 11:13 PM, Hirokazu Yamamoto <
>> >> ocean-city at m2.ccsnet.ne.jp> wrote:
>> >>
>> >>> Hello. Does this affect python? Thank you.
>> >>>
>> >>> http://www.openssl.org/news/secadv_20101116.txt
>> >>>
>> >>
>> >> No.
>> >
>> > Well, actually it does, but Python links against the system OpenSSL on
>> > most platforms (except Windows), so it's up to the OS vendor to apply
>> > the patch.
>>
>>
>>It does? If so, I must have misunderstood the vulnerability. Can you
>>explain how it affects Python?
>
>If I believe the link above:
>"Any OpenSSL based TLS server is vulnerable if it is multi-threaded and
>uses OpenSSL's internal caching mechanism. Servers that are
>multi-process and/or disable internal session caching are NOT
>affected."
>
>So, you just have to create a multithreaded TLS server which doesn't
>disable server-side session caching (it is enabled by default according
>to http://www.openssl.org/docs/ssl/SSL_CTX_set_session_cache_mode.html
>)

Hm. The session cache is enabled by default, but nothing will ever use
it unless the server specifies a session id using
SSL_set_session_id_context or SSL_CTX_set_session_id_context.
Python doesn't expose these, so I don't think any Python SSL server can
set them.

The vulnerability announcement isn't 100% clear on this, but I took a
look at the patch which fixes the issue and it /appears/ as though if a
client never tries to re-use a session then you will be safe from this
bug. However, perhaps this only means that only malicious clients
(which send a session id even when they can't actually have one) will be
able to trigger the bug.

Or I may misunderstand how SSL sessions work in OpenSSL entirely. The
documentation for them is on par with that for most of the rest of
OpenSSL.

Jean-Paul

From solipsis at pitrou.net Wed Nov 24 16:11:20 2010
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Wed, 24 Nov 2010 16:11:20 +0100
Subject: [Python-Dev] OpenSSL Voluntarily (openssl-1.0.0a)
References: <4CEB3F72.7000006@m2.ccsnet.ne.jp>
 <20101123150219.29e20374@pitrou.net>
 <720EFE43-119F-4F2F-BCB1-939275B5FA6E@twistedmatrix.com>
 <1290585733.3642.2.camel@localhost.localdomain>
 <20101124150106.2109.660794265.divmod.xquotient.197@localhost.localdomain>
Message-ID: <20101124161120.5ddd106c@pitrou.net>

On Wed, 24 Nov 2010 15:01:06 -0000
exarkun at twistedmatrix.com wrote:
> >
> >If I believe the link above:
> >"Any OpenSSL based TLS server is vulnerable if it is multi-threaded and
> >uses OpenSSL's internal caching mechanism. Servers that are
> >multi-process and/or disable internal session caching are NOT
> >affected."
> >
> >So, you just have to create a multithreaded TLS server which doesn't
> >disable server-side session caching (it is enabled by default according
> >to http://www.openssl.org/docs/ssl/SSL_CTX_set_session_cache_mode.html
> >)
>
> Hm. The session cache is enabled by default, but nothing will ever use
> it unless the server specifies a session id using
> SSL_set_session_id_context or SSL_CTX_set_session_id_context. Python
> doesn't expose these, so I don't think any Python SSL server can set
> them.
Well, Python calls SSL_CTX_set_session_id_context() implicitly, starting
from 3.2 (precisely so that the session cache gets used). The
"documentation" I've found about the "session id context" seems to
suggest that a process-wide constant is enough.

(and you can verify that caching occurs using the new
SSLContext.session_stats() method)

> Or I may misunderstand how SSL sessions work in OpenSSL entirely. The
> documentation for them is on par with that for most of the rest of
> OpenSSL.

Agreed.

Regards

Antoine.

From steve at pearwood.info Wed Nov 24 16:44:57 2010
From: steve at pearwood.info (Steven D'Aprano)
Date: Thu, 25 Nov 2010 02:44:57 +1100
Subject: [Python-Dev] constant/enum type in stdlib
In-Reply-To:
References: <20101121034404.52924F20A@mail.python.org>
 <4CE9BF4A.1020302@netwok.org>
 <4CEA89E8.5090107@voidspace.org.uk>
 <20101122163722.7e96d123@pitrou.net>
 <4CEA9584.7040301@avl.com>
 <20101122172440.77d27ed5@pitrou.net>
 <20101122164654.2109.588145158.divmod.xquotient.165@localhost.localdomain>
 <4CEBC6BD.9060402@voidspace.org.uk>
 <4CED0557.9090101@voidspace.org.uk>
Message-ID: <4CED32F9.5050004@pearwood.info>

Nick Coghlan wrote:
> On Wed, Nov 24, 2010 at 10:30 PM, Michael Foord
> wrote:
>> Based on a non-exhaustive search, Python standard library modules currently
>> using integers for constants:
>
> Thanks for that review. I think following up on the "NamedConstant"
> idea may make more sense than pursuing enums in their own right.

Pardon me if I've missed something in this thread, but when you say
"NamedConstant", do you mean actual constants that can only be bound
once but not re-bound? If so, +1. If not, what do you mean?

I thought PEP 3115 could be used to implement such constants, but I
can't get it to work...

class readonlydict(dict):
    def __setitem__(self, key, value):
        if key in self:
            raise TypeError("can't rebind constant")
        dict.__setitem__(self, key, value)
    # Need to also handle updates, del, pop, etc.

class MetaConstant(type):
    @classmethod
    def __prepare__(metacls, name, bases):
        return readonlydict()
    def __new__(cls, name, bases, classdict):
        assert type(classdict) is readonlydict
        return type.__new__(cls, name, bases, classdict)

class Constant(metaclass=MetaConstant):
    a = 1
    b = 2
    c = 3


What I expect is that Constant.a should return 1, and Constant.a=2
should raise TypeError, but what I get is a normal class __dict__.

>>> Constant.a
1
>>> Constant.a = 2
>>> Constant.a
2

--
Steven

From exarkun at twistedmatrix.com Wed Nov 24 17:23:12 2010
From: exarkun at twistedmatrix.com (exarkun at twistedmatrix.com)
Date: Wed, 24 Nov 2010 16:23:12 -0000
Subject: [Python-Dev] OpenSSL Vulnerability (openssl-1.0.0a)
In-Reply-To: <20101124161120.5ddd106c@pitrou.net>
References: <4CEB3F72.7000006@m2.ccsnet.ne.jp>
 <20101123150219.29e20374@pitrou.net>
 <720EFE43-119F-4F2F-BCB1-939275B5FA6E@twistedmatrix.com>
 <1290585733.3642.2.camel@localhost.localdomain>
 <20101124150106.2109.660794265.divmod.xquotient.197@localhost.localdomain>
 <20101124161120.5ddd106c@pitrou.net>
Message-ID: <20101124162312.2109.1025683352.divmod.xquotient.215@localhost.localdomain>

On 03:11 pm, solipsis at pitrou.net wrote:
>On Wed, 24 Nov 2010 15:01:06 -0000
>exarkun at twistedmatrix.com wrote:
>> >
>> >If I believe the link above:
>> >"Any OpenSSL based TLS server is vulnerable if it is multi-threaded and
>> >uses OpenSSL's internal caching mechanism. Servers that are
>> >multi-process and/or disable internal session caching are NOT
>> >affected."
>> >
>> >So, you just have to create a multithreaded TLS server which doesn't
>> >disable server-side session caching (it is enabled by default according
>> >to http://www.openssl.org/docs/ssl/SSL_CTX_set_session_cache_mode.html
>> >)
>>
>>Hm. The session cache is enabled by default, but nothing will ever use
>>it unless the server specifies a session id using
>>SSL_set_session_id_context or SSL_CTX_set_session_id_context.
>>Python doesn't expose these, so I don't think any Python SSL server
>>can set them.
>
>Well, Python calls SSL_CTX_set_session_id_context() implicitly,
>starting from 3.2 (precisely so that the session cache gets used). The
>"documentation" I've found about the "session id context" seems to
>suggest that a process-wide constant is enough.

Ah. Okay, then Python 3.2 would be vulnerable. Good thing it isn't
released yet. ;)

Jean-Paul


From benjamin at python.org  Wed Nov 24 17:32:56 2010
From: benjamin at python.org (Benjamin Peterson)
Date: Wed, 24 Nov 2010 10:32:56 -0600
Subject: [Python-Dev] constant/enum type in stdlib
In-Reply-To: <4CED32F9.5050004@pearwood.info>
References: <20101121034404.52924F20A@mail.python.org>
    <4CE9BF4A.1020302@netwok.org> <4CEA89E8.5090107@voidspace.org.uk>
    <20101122163722.7e96d123@pitrou.net> <4CEA9584.7040301@avl.com>
    <20101122172440.77d27ed5@pitrou.net>
    <20101122164654.2109.588145158.divmod.xquotient.165@localhost.localdomain>
    <4CEBC6BD.9060402@voidspace.org.uk> <4CED0557.9090101@voidspace.org.uk>
    <4CED32F9.5050004@pearwood.info>
Message-ID:

2010/11/24 Steven D'Aprano :
> Nick Coghlan wrote:
>>
>> On Wed, Nov 24, 2010 at 10:30 PM, Michael Foord
>> wrote:
>>>
>>> Based on a non-exhaustive search, Python standard library modules
>>> currently using integers for constants:
>>
>> Thanks for that review. I think following up on the "NamedConstant"
>> idea may make more sense than pursuing enums in their own right.
>
> Pardon me if I've missed something in this thread, but when you say
> "NamedConstant", do you mean actual constants that can only be bound once
> but not re-bound? If so, +1. If not, what do you mean?
>
> I thought PEP 3115 could be used to implement such constants, but I can't
> get it to work...
>
> class readonlydict(dict):
>     def __setitem__(self, key, value):
>         if key in self:
>             raise TypeError("can't rebind constant")
>         dict.__setitem__(self, key, value)
>     # Need to also handle updates, del, pop, etc.
>
> class MetaConstant(type):
>     @classmethod
>     def __prepare__(metacls, name, bases):
>         return readonlydict()
>     def __new__(cls, name, bases, classdict):
>         assert type(classdict) is readonlydict
>         return type.__new__(cls, name, bases, classdict)
>
> class Constant(metaclass=MetaConstant):
>     a = 1
>     b = 2
>     c = 3
>
>
> What I expect is that Constant.a should return 1, and Constant.a=2 should
> raise TypeError, but what I get is a normal class __dict__.

The construction namespace can be customized, but class.__dict__ must
always be a real dict.

--
Regards,
Benjamin


From jsbueno at python.org.br  Wed Nov 24 18:23:57 2010
From: jsbueno at python.org.br (Joao S. O. Bueno)
Date: Wed, 24 Nov 2010 15:23:57 -0200
Subject: [Python-Dev] Fwd: constant/enum type in stdlib
In-Reply-To:
References: <20101121034404.52924F20A@mail.python.org>
    <4CEA89E8.5090107@voidspace.org.uk> <20101122163722.7e96d123@pitrou.net>
    <4CEA9584.7040301@avl.com> <20101122172440.77d27ed5@pitrou.net>
    <20101122164654.2109.588145158.divmod.xquotient.165@localhost.localdomain>
    <4CEBC6BD.9060402@voidspace.org.uk> <4CEBCE92.40801@voidspace.org.uk>
    <20101123154229.474f7a90@pitrou.net>
    <1290524466.3642.4.camel@localhost.localdomain>
    <4CEBDA91.4050205@voidspace.org.uk>
    <1290526253.3642.9.camel@localhost.localdomain>
    <4CEBE06C.9030101@voidspace.org.uk>
    <1290528319.3642.11.camel@localhost.localdomain>
    <1290533860.3642.73.camel@localhost.localdomain>
Message-ID:

Hi --

If I may add my 0.02 cents - this sample has a sample implementation of
the proposed features I found most interesting up to now:

1) inherit from int
2) display the constant's name on 'repr'
3) optionally populate a module with the constants
4) Optionally provide a starting value for the enum
5) Optionally provide a mapping with the values

http://pastebin.com/6f1u35qJ
(implementation is in python 2)

Todo here:
6) Make them "read only"
7) Make the base type optional,
with "int" as default - but also being able to create "constants"
inheriting from other objects
8) more ideas?

I am willing to play along this sample code as discussion goes on if
there is any feedback.

  js
 -><-


From alexander.belopolsky at gmail.com  Wed Nov 24 18:37:43 2010
From: alexander.belopolsky at gmail.com (Alexander Belopolsky)
Date: Wed, 24 Nov 2010 12:37:43 -0500
Subject: [Python-Dev] len(chr(i)) = 2?
In-Reply-To:
References: <201011192123.14169.victor.stinner@haypocalc.com>
    <4CE6F93F.9010109@egenix.com> <4CE6FE30.1050903@v.loewis.de>
    <87hbfc1vnf.fsf@uwakimon.sk.tsukuba.ac.jp> <4CE78F62.7060707@v.loewis.de>
    <8739qukf9r.fsf@uwakimon.sk.tsukuba.ac.jp>
    <20101121173825.B1BFB235977@kimball.webabinitio.net>
    <60F8726F-C1C2-4803-8B8E-688EF0443FA0@gmail.com>
    <87eiadd46t.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID:

On Tue, Nov 23, 2010 at 2:18 PM, Amaury Forgeot d'Arc wrote:
..
>> Given the apparent difficulty of writing even basic text processing
>> algorithms in presence of surrogate pairs, I wonder how wise it is to
>> expose Python users to them.
>
> This was already discussed two years ago:
>
> http://mail.python.org/pipermail/python-dev/2008-July/080900.html
>

Thanks for the link. Let me summarize that discussion as I read it.

The discussion starts with a reference to Guido's 2001 post which
concluded with

"""
... if we had wanted to use a
variable-lenth internal representation, we should have picked UTF-8
way back, like Perl did. Moving to a UTF-16-based internal
representation now will give us all the problems of the Perl choice
without any of the benefits.
""" [1]

and proposes to move to UCS-4 completely for Python 3.0.

Note that this is not the option that I would like to discuss here. I
don't propose to discuss abandoning narrow builds. Instead, I would
like to discuss the costs and benefits associated with using variable
width CES as an internal representation.

This is where the 2008 discussion moved.
The OP did not realize that narrow builds supported UTF-16 and, like
myself, was surprised that application developers should be aware of
surrogates if they want to use narrow builds. It was also suggested
that Python itself is likely to have many bugs that can be triggered
by non-BMP characters on narrow builds. Guido's response was:

"""
I'd also prefer to receive bug reports about breakages actually
encountered in the wild than purely theoretical issues
"""

I don't think this is a good position to take. Programs that expect
one code unit where Python may produce two are likely to have security
holes. Even when programmers carefully sanitize their input, they are
likely to do it at the code point level based on Unicode category and
0xFFFF boundary does not mean anything special for their applications.
I think anyone who wants to write a robust application has two choices
in practice: (a) use wide Unicode build; (b) restrict all text to BMP.
Supporting surrogates at the application level is likely to be
prohibitively expensive.

It was later suggested that the main benefit of "UTF-16" builds is
that they can easily interface with system libraries that are "UTF-16"
based. However, how likely are these libraries to be bug-free when it
comes to non-BMP characters? History teaches us that it is not very
likely.

Daniel Arbuckle presented arguments against imposing the burden of
dealing with surrogates on application writers. [2]

The recurrent theme on the thread was that non-BMP characters are rare
and those who need them can afford the extra development cost
associated with the surrogates. This point was very eloquently
articulated by Guido:

"""
Who are the many here? Who are the few?
I'd venture that (at least for
the foreseeable future, say, until China will finally have taken over
the role of the US as the de-facto dominant super power :-) the many
are people whose app will never see a Unicode character outside the
BMP, or who do such minimal string processing that their code doesn't
care whether it's handling UTF-16-encoded data.
""" [3]

This argument can also be used to support the position that narrow
builds should not support non-BMP characters.

Later the discussion started resembling this thread when it went into
a scholastic dispute over fine points in Unicode Standard terminology.
:-)

Then BDFL vetoed len(u"\U00012345") returning 1 on narrow builds. [4]
I would be against that as well. I don't see len("\U00012345") == 2 as
a big problem because application developers can simply avoid using \U
literals if they don't want to support non-BMP characters. On the
other hand, an option to warn users about non-BMP literals on a narrow
build may be useful but it is easy to implement in lint-like tools.

There were multiple suggestions for standard library additions to help
application writers to deal with surrogate pairs, but as far as I can
tell, nothing has been done in this area in the following two years.
I don't think there is a recipe on how to fix a legacy
character-by-character processing loop such as

for c in string:
    ...

to make it iterate over code points consistently in wide and narrow
builds. (Note that I am not asking for a grapheme iterator here. This
is clearly an application level feature.)

> So yes, wrap() and center() should be fixed.

I opened issue 10521 for that. [5] I am fully prepared to see it
dismissed as "theoretical" and be closed with "won't fix" or linger
indefinitely. Fixing it would most likely involve writing the second
version of pad() utility function specifically for the narrow build.
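
For the character-by-character loop mentioned above, one possible
shape for such a recipe is sketched below. This is only an
illustration and not something proposed in the thread: the helper name
iter_code_points is hypothetical, and the sketch assumes that a
non-BMP character may arrive as a UTF-16 surrogate pair (as on a narrow
build) or as a single unit (as on a wide build). It yields integer code
points so the result is the same in both cases.

```python
def iter_code_points(s):
    """Yield the integer code points of s, joining UTF-16 surrogate pairs.

    iter_code_points is a hypothetical helper, not a stdlib function.
    A high surrogate followed by a low surrogate is combined into one
    code point; every other unit is yielded unchanged.
    """
    i, n = 0, len(s)
    while i < n:
        unit = ord(s[i])
        # High surrogates are U+D800..U+DBFF, low ones U+DC00..U+DFFF.
        if 0xD800 <= unit <= 0xDBFF and i + 1 < n:
            low = ord(s[i + 1])
            if 0xDC00 <= low <= 0xDFFF:
                yield 0x10000 + ((unit - 0xD800) << 10) + (low - 0xDC00)
                i += 2
                continue
        # BMP characters, lone surrogates and wide-build non-BMP units.
        yield unit
        i += 1
```

A legacy loop would then read "for cp in iter_code_points(string): ..."
and could call chr(cp) only where a string is really needed. Lone
surrogates are passed through rather than rejected, which is a design
choice of this sketch.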
All examples I've seen in Python C code of dealing with surrogates
came with hand-coded #ifndef Py_UNICODE_WIDE fragments and no
user-friendly macros or APIs that would abstract it away. A quick grep
for maxunicode in the standard library revealed only one case of
"narrow-build aware" code:

    if sys.maxunicode != 65535:
        # XXX: negation does not work with big charsets
        return charset

See Lib/sre_compile.py. Not exactly a model to follow.

To conclude, I feel that rather than trying to fully support non-BMP
characters as surrogate pairs in narrow builds, we should make it
easier for application developers to avoid them. If abandoning
internal use of UTF-16 is not an option, I think we should at least
add an option for decoders that currently produce surrogate pairs to
treat non-BMP characters as errors and handle them according to user's
choice.

[1] http://mail.python.org/pipermail/i18n-sig/2001-June/001107.html
[2] http://mail.python.org/pipermail/python-dev/2008-July/080912.html
[3] http://mail.python.org/pipermail/python-dev/2008-July/080940.html
[4] http://mail.python.org/pipermail/python-dev/2008-July/080916.html
[5] http://bugs.python.org/issue10521


From fuzzyman at voidspace.org.uk  Wed Nov 24 18:41:08 2010
From: fuzzyman at voidspace.org.uk (Michael Foord)
Date: Wed, 24 Nov 2010 17:41:08 +0000
Subject: [Python-Dev] constant/enum type in stdlib
In-Reply-To:
References: <20101121034404.52924F20A@mail.python.org>
    <4CE9BF4A.1020302@netwok.org> <4CEA89E8.5090107@voidspace.org.uk>
    <20101122163722.7e96d123@pitrou.net> <4CEA9584.7040301@avl.com>
    <20101122172440.77d27ed5@pitrou.net>
    <20101122164654.2109.588145158.divmod.xquotient.165@localhost.localdomain>
    <4CEBC6BD.9060402@voidspace.org.uk> <4CED0557.9090101@voidspace.org.uk>
Message-ID: <4CED4E34.5060400@voidspace.org.uk>

On 24/11/2010 14:08, Nick Coghlan wrote:
> On Wed, Nov 24, 2010 at 10:30 PM, Michael Foord
> wrote:
>> Based on a non-exhaustive search, Python standard library modules
>> currently using integers for
>> constants:
> Thanks for that review. I think following up on the "NamedConstant"
> idea may make more sense than pursuing enums in their own right. That
> way we could get the debugging benefits on the Python side regardless
> of any type constraints on the value (e.g. needing to be an integer in
> order to interface to C code), without needing to design an enum API
> that suited all purposes.

Can you explain what you see as the difference?

I'm not particularly interested in type validation but I like the fact
that typical enum APIs allow you to group constants: the generated
constant class acts as a namespace for all the defined constants.

Are you just suggesting something along the lines of:

class NamedConstant(int):
    def __new__(cls, name, val):
        return int.__new__(cls, val)

    def __init__(self, name, val):
        self._name = name

    def __repr__(self):
        return ' ' % self._name

FOO = NamedConstant('FOO', 3)

In general the less features the better, but I'd like a few more
features than that. :-)

All the best,

Michael

> Cheers,
> Nick.
>

--
http://www.voidspace.org.uk/

READ CAREFULLY. By accepting and reading this email you agree,
on behalf of your employer, to release me from all obligations
and waivers arising from any and all NON-NEGOTIATED agreements,
licenses, terms-of-service, shrinkwrap, clickwrap, browsewrap,
confidentiality, non-disclosure, non-compete and acceptable use
policies ("BOGUS AGREEMENTS") that I have entered into with your
employer, its partners, licensors, agents and assigns, in
perpetuity, without prejudice to my ongoing rights and privileges.
You further represent that you have the authority to release me
from any BOGUS AGREEMENTS on behalf of your employer.


From mal at egenix.com  Wed Nov 24 19:50:57 2010
From: mal at egenix.com (M.-A. Lemburg)
Date: Wed, 24 Nov 2010 19:50:57 +0100
Subject: [Python-Dev] len(chr(i)) = 2?
In-Reply-To:
References: <201011192123.14169.victor.stinner@haypocalc.com>
    <4CE6F93F.9010109@egenix.com> <4CE6FE30.1050903@v.loewis.de>
    <87hbfc1vnf.fsf@uwakimon.sk.tsukuba.ac.jp> <4CE78F62.7060707@v.loewis.de>
    <8739qukf9r.fsf@uwakimon.sk.tsukuba.ac.jp>
    <20101121173825.B1BFB235977@kimball.webabinitio.net>
    <60F8726F-C1C2-4803-8B8E-688EF0443FA0@gmail.com>
    <87eiadd46t.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: <4CED5E91.9070705@egenix.com>

Alexander Belopolsky wrote:
> To conclude, I feel that rather than trying to fully support non-BMP
> characters as surrogate pairs in narrow builds, we should make it
> easier for application developers to avoid them.

I don't understand what you're after here. Programmers can easily
avoid them by not using them :-)

> If abandoning
> internal use of UTF-16 is not an option, I think we should at least
> add an option for decoders that currently produce surrogate pairs to
> treat non-BMP characters as errors and handle them according to user's
> choice.

But what do you gain by doing this? You'd lose the round-trip
safety of those codecs and that's not a good thing.

Note that most text processing APIs in Python work based on code
units, which in most cases represent single code points, but in
some cases can also represent surrogates (both on UCS-2 and on
UCS-4 builds).

E.g. str.center(n) centers the string in a padded string that
is composed of n code units. Whether that operation will result
in a text that's centered visually on output is a completely
different story. The original string could contain surrogates,
it could also contain combining code points, so the visual
presentation of the result may very well not be centered at all;
it may not even appear as having the length n to the user.

Since we're not going to change the semantics of those APIs,
it is OK to not support padding with non-BMP code points on
UCS-2 builds.
Supporting such cases would only cause problems:

 * if the methods would pad with surrogates, the resulting
   string would no longer have length n; breaking the
   assumption that len(str.center(n)) == n

 * if the methods would pad with half the number of surrogates
   to make sure that len(str.center(n)) == n, the resulting
   output to e.g. a terminal would be further off, than what
   you already have with surrogates and combining code points
   in the original string.

More on codecs supporting surrogates:

  http://mail.python.org/pipermail/python-dev/2008-July/080915.html

Perhaps it's time to reconsider a project I once started
but that never got off the ground:

  http://mail.python.org/pipermail/python-dev/2008-July/080911.html

Here's the pre-PEP:

  http://mail.python.org/pipermail/python-dev/2001-July/015938.html

--
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Nov 24 2010)
>>> Python/Zope Consulting and Support ...        http://www.egenix.com/
>>> mxODBC.Zope.Database.Adapter ...             http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/
________________________________________________________________________

::: Try our new mxODBC.Connect Python Database Interface for free ! ::::

   eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
    D-40764 Langenfeld, Germany. CEO Dipl.-Math.
Marc-Andre Lemburg
           Registered at Amtsgericht Duesseldorf: HRB 46611
               http://www.egenix.com/company/contact/


From brett at python.org  Wed Nov 24 20:04:01 2010
From: brett at python.org (Brett Cannon)
Date: Wed, 24 Nov 2010 11:04:01 -0800
Subject: [Python-Dev] [Python-checkins] r86720 -
    python/branches/py3k/Misc/ACKS
In-Reply-To: <4CEC4917.2070508@udel.edu>
References: <20101123203252.39BE7EE9CF@mail.python.org>
    <4CEC43A4.80907@netwok.org> <4CEC4917.2070508@udel.edu>
Message-ID:

On Tue, Nov 23, 2010 at 15:07, Terry Reedy wrote:
>
>
> On 11/23/2010 5:43 PM, Éric Araujo wrote:
>>>
>>> Modified: python/branches/py3k/Misc/ACKS
>>>
>>> ==============================================================================
>>> --- python/branches/py3k/Misc/ACKS      (original)
>>> +++ python/branches/py3k/Misc/ACKS      Tue Nov 23 21:32:47 2010
>>> @@ -1,4 +1,4 @@
>>> -Acknowledgements
>>> +?Acknowledgements
>>
>> This change introduced a so-called UTF-8 BOM in the file.  Is
>> TortoiseSvn the culprit or a text editor?
>
> I used Notepad to edit the file, TortoiseSvn to commit, the same as I did
> for #9222, rev86702, Lib\idlelib\IOBinding.py, yesterday.
> If the latter is OK, perhaps *.py gets filtered better than misc. text
> files. I believe I have the config as specified in dev/faq.

Adding the BOM will be an editor thing, not a svn thing. Doing a
Google search for [ms notepad bom] shows that Notepad did the
"helpful", invisible edit.
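
Since the edit is invisible in most editors, it can be checked for
programmatically. The following is a minimal sketch, not part of the
original message: has_utf8_bom and strip_utf8_bom are hypothetical
helper names, and the only library facility relied on is the
codecs.BOM_UTF8 constant (the three bytes EF BB BF).

```python
import codecs


def has_utf8_bom(data):
    """Return True if the byte string starts with the UTF-8 signature."""
    return data.startswith(codecs.BOM_UTF8)


def strip_utf8_bom(data):
    """Return data without a leading UTF-8 signature, if one is present."""
    if has_utf8_bom(data):
        return data[len(codecs.BOM_UTF8):]
    return data
```

In use, the first bytes of a file read in binary mode would be passed
to has_utf8_bom() before committing.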
-Brett

>
> [miscellany]
> enable-auto-props = yes
>
> [auto-props]
> * = svn:eol-style=native
> *.c = svn:keywords=Id
> *.h = svn:keywords=Id
> *.py = svn:keywords=Id
> *.txt = svn:keywords=Author Date Id Revision
>
> Terry
>
> _______________________________________________
> Python-Dev mailing list
> Python-Dev at python.org
> http://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe:
> http://mail.python.org/mailman/options/python-dev/brett%40python.org
>


From tjreedy at udel.edu  Wed Nov 24 20:25:17 2010
From: tjreedy at udel.edu (Terry Reedy)
Date: Wed, 24 Nov 2010 14:25:17 -0500
Subject: [Python-Dev] [Python-checkins] r86720 -
    python/branches/py3k/Misc/ACKS
In-Reply-To:
References: <20101123203252.39BE7EE9CF@mail.python.org>
    <4CEC43A4.80907@netwok.org> <4CEC4917.2070508@udel.edu>
Message-ID:

On 11/24/2010 2:04 PM, Brett Cannon wrote:
> On Tue, Nov 23, 2010 at 15:07, Terry Reedy wrote:

>> I used Notepad to edit the file, TortoiseSvn to commit, the same as I did
>> for #9222, rev86702, Lib\idlelib\IOBinding.py, yesterday.
>> If the latter is OK, perhaps *.py gets filtered better than misc. text
>> files. I believe I have the config as specified in dev/faq.
>
> Adding the BOM will be an editor thing, not a svn thing. Doing a
> Google search for [ms notepad bom] shows that Notepad did the
> "helpful", invisible edit.

So I presume it did the same with IOBinding.py. Does *.py get filtered
in a way that could be extended to no-extension files? Do *.txt files
get BOM filtered off? Should all text files in repository have some
extension (default .txt)?

More to the point, can better filtering be added to the new hg
repository? Or can a local Windows hg setup have such filtering on local
commits before pushing?

I know now that I could always edit with IDLE's editor, but it is a lot
easier to right click and select edit than it is to run thru the
directory tree in an open dialog.
And of course, since the pseudo-BOM
addition is undocumented within notepad itself, and probably other
editors, it is easy to not know.

--
Terry Jan Reedy


From g.brandl at gmx.net  Wed Nov 24 21:04:40 2010
From: g.brandl at gmx.net (Georg Brandl)
Date: Wed, 24 Nov 2010 21:04:40 +0100
Subject: [Python-Dev] [Python-checkins] r86720 -
    python/branches/py3k/Misc/ACKS
In-Reply-To:
References: <20101123203252.39BE7EE9CF@mail.python.org>
    <4CEC43A4.80907@netwok.org> <4CEC4917.2070508@udel.edu>
Message-ID:

On 24.11.2010 20:25, Terry Reedy wrote:
> On 11/24/2010 2:04 PM, Brett Cannon wrote:
>> On Tue, Nov 23, 2010 at 15:07, Terry Reedy wrote:
>
>>> I used Notepad to edit the file, TortoiseSvn to commit, the same as I did
>>> for #9222, rev86702, Lib\idlelib\IOBinding.py, yesterday.
>>> If the latter is OK, perhaps *.py gets filtered better than misc. text
>>> files. I believe I have the config as specified in dev/faq.
>>
>> Adding the BOM will be an editor thing, not a svn thing. Doing a
>> Google search for [ms notepad bom] shows that Notepad did the
>> "helpful", invisible edit.
>
> So I presume it did the same with IOBinding.py. Does *.py get filtered
> in a way that could be extended to no-extension files? Do *.txt files
> get BOM filtered off? Should all text files in repository have some
> extension (default .txt)?
>
> More to the point, can better filtering be added to the new hg
> repository? Or can a local Windows hg setup have such filtering on local
> commits before pushing?

Of course it can; it's just a matter of writing the respective hooks.

What we *can* do in any case is to check for UTF-8 "BOMs" server-side
in the whitespace checking hook.

> I know now that I could always edit with IDLE's editor, but it is a lot
> easier to right click and select edit than it is to run thru the
> directory tree in an open dialog. And of course, since the pseudo-BOM
> addition is undocumented within notepad itself, and probably other
> editors, it is easy to not know.

It should show up as an invisible change in the first line of a file
when you look at a "svn diff". (It is a very good practice to look
at a diff before committing anyway.)

Georg


From alexander.belopolsky at gmail.com  Wed Nov 24 21:06:25 2010
From: alexander.belopolsky at gmail.com (Alexander Belopolsky)
Date: Wed, 24 Nov 2010 15:06:25 -0500
Subject: [Python-Dev] len(chr(i)) = 2?
In-Reply-To: <4CED5E91.9070705@egenix.com>
References: <201011192123.14169.victor.stinner@haypocalc.com>
    <4CE6F93F.9010109@egenix.com> <4CE6FE30.1050903@v.loewis.de>
    <87hbfc1vnf.fsf@uwakimon.sk.tsukuba.ac.jp> <4CE78F62.7060707@v.loewis.de>
    <8739qukf9r.fsf@uwakimon.sk.tsukuba.ac.jp>
    <20101121173825.B1BFB235977@kimball.webabinitio.net>
    <60F8726F-C1C2-4803-8B8E-688EF0443FA0@gmail.com>
    <87eiadd46t.fsf@uwakimon.sk.tsukuba.ac.jp> <4CED5E91.9070705@egenix.com>
Message-ID:

On Wed, Nov 24, 2010 at 1:50 PM, M.-A. Lemburg wrote:
..
>> add an option for decoders that currently produce surrogate pairs to
>> treat non-BMP characters as errors and handle them according to user's
>> choice.
>
> But what do you gain by doing this? You'd lose the round-trip
> safety of those codecs and that's not a good thing.
>

Any non-trivial text processing is likely to be broken in presence of
surrogates. Producing them on input is just trading a known issue for
an unknown one. Processing surrogate pairs in python code is hard.
Software that has to support non-BMP characters will most likely be
written for a wide build and contain subtle bugs when run under a
narrow build. Note that my latest proposal does not abolish
surrogates outright. Users who want them can still use something like
"surrogateescape" error handler for non-BMP characters.

> Since we're not going to change the semantics of those APIs,
> it is OK to not support padding with non-BMP code points on
> UCS-2 builds.
>

Well, I think more users are willing to accept slightly misaligned
text in their web-app logs than those willing to cope with

Traceback (most recent call last):
  ...
TypeError: The fill character must be exactly one character long

there. Yes, allowing non-trusted users to specify fill character is
unlikely, but it is quite likely that naive slicing or iteration over
string units would result in

Traceback (most recent call last):
  ...
UnicodeEncodeError: 'utf-8' codec can't encode character '\ud800' in
position 0: surrogates not allowed

> Supporting such cases would only cause problems:
>
> * if the methods would pad with surrogates, the resulting
>   string would no longer have length n; breaking the
>   assumption that len(str.center(n)) == n
>

I agree, but how is this different from breaking the assumption that
len(chr(i)) == 1?

> * if the methods would pad with half the number of surrogates
>   to make sure that len(str.center(n)) == n, the resulting
>   output to e.g. a terminal would be further off, than what
>   you already have with surrogates and combining code points
>   in the original string.
>

I agree again. As I suggested on the tracker, supporting non-BMP
characters in narrow builds should mean that library functions given
input with the same UCS-4 encoding should produce output with the same
UCS-4 encoding.

> Perhaps it's time to reconsider a project I once started
> but that never got off the ground:
>
>   http://mail.python.org/pipermail/python-dev/2008-July/080911.html
>
> Here's the pre-PEP:
>
>   http://mail.python.org/pipermail/python-dev/2001-July/015938.html

I agree again, but I feel that exposing code units rather than code
points at the Python string level takes us back to 2.x days of mixing
bytes and strings. Let me quote Guido circa 2001 again:

"""
... if we had wanted to use a
variable-lenth internal representation, we should have picked UTF-8
way back, like Perl did.
Moving to a UTF-16-based internal
representation now will give us all the problems of the Perl choice
without any of the benefits.
"""

I don't understand what changed since 2001 that made this argument
invalid. I note that an opinion has been raised on this thread that
if we want compressed internal representation for strings, we should
use UTF-8. I tend to agree, but UTF-8 has been repeatedly rejected as
too hard to implement. What makes UTF-16 easier than UTF-8? Only the
fact that you can ignore bugs longer, in my view.


From g.brandl at gmx.net  Wed Nov 24 21:24:49 2010
From: g.brandl at gmx.net (Georg Brandl)
Date: Wed, 24 Nov 2010 21:24:49 +0100
Subject: [Python-Dev] [Preview] Comments and change proposals on
    documentation
Message-ID:

Hi,

at , you can look at a version of the 3.2 docs that has the upcoming
commenting feature. JavaScript is mandatory. I've switched on anonymous
comments for testing, but usually at least comments from anonymous
users can be moderated.

Be sure to test the "propose a change" feature too.

Login currently allows OpenID exclusively.

Credits go to Jacob Mason, whose GSOC project is responsible for almost
all of what you see there. [1]

Please test on a smaller page, such as , there is currently a speed
issue with larger pages. (Helpful tips from JS experts are welcome.)

Other things I have to do before this can go live:

* reuse existing logins from either wiki or tracker?
* (re)Captcha integration for anonymous comments
* easier moderation (currently emails are sent on new comments)
* facility for (semi)automatic applying of proposals (once Hg is live,
  this should be easy to do due to the separation between commit and
  merge)
* allow commenting on code blocks (figure out where to place the
  "bubble")

Any feedback is appreciated (I'd suggest mailing it to doc-SIG only, to
avoid cluttering up python-dev).

Have fun,
Georg

[1] The source for the webapp is at , but most of the functionality is
implemented in Sphinx trunk.


From anurag.chourasia at gmail.com  Wed Nov 24 22:01:32 2010
From: anurag.chourasia at gmail.com (Anurag Chourasia)
Date: Thu, 25 Nov 2010 02:31:32 +0530
Subject: [Python-Dev] collect2: library libpython2.6 not found while
    building extensions (--enable-shared)
Message-ID:

All,

When I configure python to enable shared libraries, none of the
extensions are getting built during the make step due to this error.

building 'cStringIO' extension
gcc -pthread -fno-strict-aliasing -g -O2 -DNDEBUG -g -fwrapv -O3 -Wall
-Wstrict-prototypes -I. -I/u01/home/apli/wm/GDD/Python-2.6.6/./Include
-I. -IInclude -I./Include -I/opt/freeware/include
-I/opt/freeware/include/readline -I/opt/freeware/include/ncurses
-I/usr/local/include -I/u01/home/apli/wm/GDD/Python-2.6.6/Include
-I/u01/home/apli/wm/GDD/Python-2.6.6 -c
/u01/home/apli/wm/GDD/Python-2.6.6/Modules/cStringIO.c -o
build/temp.aix-5.3-2.6/u01/home/apli/wm/GDD/Python-2.6.6/Modules/cStringIO.o
./Modules/ld_so_aix gcc -pthread -bI:Modules/python.exp
build/temp.aix-5.3-2.6/u01/home/apli/wm/GDD/Python-2.6.6/Modules/cStringIO.o
-L/usr/local/lib *-lpython2.6* -o build/lib.aix-5.3-2.6/cStringIO.so
*collect2: library libpython2.6 not found*
building 'cPickle' extension
gcc -pthread -fno-strict-aliasing -g -O2 -DNDEBUG -g -fwrapv -O3 -Wall
-Wstrict-prototypes -I. -I/u01/home/apli/wm/GDD/Python-2.6.6/./Include
-I.
-IInclude -I./Include -I/opt/freeware/include
-I/opt/freeware/include/readline -I/opt/freeware/include/ncurses
-I/usr/local/include -I/u01/home/apli/wm/GDD/Python-2.6.6/Include
-I/u01/home/apli/wm/GDD/Python-2.6.6 -c
/u01/home/apli/wm/GDD/Python-2.6.6/Modules/cPickle.c -o
build/temp.aix-5.3-2.6/u01/home/apli/wm/GDD/Python-2.6.6/Modules/cPickle.o
./Modules/ld_so_aix gcc -pthread -bI:Modules/python.exp
build/temp.aix-5.3-2.6/u01/home/apli/wm/GDD/Python-2.6.6/Modules/cPickle.o
-L/usr/local/lib *-lpython2.6* -o build/lib.aix-5.3-2.6/cPickle.so
*collect2: library libpython2.6 not found*

This is on AIX 5.3, GCC 4.2, Python 2.6.6

I can confirm that there is a libpython2.6.a file in the top level
directory from where I am doing the configure/make etc

Here are the options supplied to the configure command

./configure --enable-shared --disable-ipv6 --with-gcc=gcc
CPPFLAGS="-I /opt/freeware/include -I /opt/freeware/include/readline
-I /opt/freeware/include/ncurses"

Please guide me in getting past this error. Thanks for your help on
this.

Regards,
Anurag
-------------- next part --------------
An HTML attachment was scrubbed...
URL:


From martin at v.loewis.de  Wed Nov 24 23:13:50 2010
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Wed, 24 Nov 2010 23:13:50 +0100
Subject: [Python-Dev] [Python-checkins] r86720 -
    python/branches/py3k/Misc/ACKS
In-Reply-To:
References: <20101123203252.39BE7EE9CF@mail.python.org>
    <4CEC43A4.80907@netwok.org> <4CEC4917.2070508@udel.edu>
Message-ID: <4CED8E1E.5050400@v.loewis.de>

> So I presume it did the same with IOBinding.py.

No. This file contains only ASCII characters, so notepad has decided
to not add the BOM.

Regards,
Martin


From dreamingforward at gmail.com  Thu Nov 25 00:38:01 2010
From: dreamingforward at gmail.com (average)
Date: Wed, 24 Nov 2010 16:38:01 -0700
Subject: [Python-Dev] constant/enum type in stdlib
In-Reply-To:
References: <20101121034404.52924F20A@mail.python.org>
    <4CE9BF4A.1020302@netwok.org> <4CEA89E8.5090107@voidspace.org.uk>
    <20101122163722.7e96d123@pitrou.net> <4CEA9584.7040301@avl.com>
    <20101122172440.77d27ed5@pitrou.net>
    <20101122164654.2109.588145158.divmod.xquotient.165@localhost.localdomain>
    <4CEBC6BD.9060402@voidspace.org.uk> <4CED0557.9090101@voidspace.org.uk>
Message-ID:

Is immutability a general need that should have a general solution? By
generalizing the idea to lists/tuples, set/frozenset, dicts, and strings
(for example), it seems one could simplify the container classes,
eliminate code complexity, and perhaps improve resource utilization.

mark
-------------- next part --------------
An HTML attachment was scrubbed...
URL:


From solipsis at pitrou.net  Thu Nov 25 00:41:58 2010
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Thu, 25 Nov 2010 00:41:58 +0100
Subject: [Python-Dev] r86731 - in python/branches/py3k:
    Lib/distutils/command/install.py Lib/distutils/sysconfig.py
    Lib/sysconfig.py Makefile.pre.in Misc/python.pc.in configure configure.in
References: <20101124194347.C5C86EEA56@mail.python.org>
Message-ID: <20101125004158.32b1ceaa@pitrou.net>

On Wed, 24 Nov 2010 20:43:47 +0100 (CET)
barry.warsaw wrote:
> Author: barry.warsaw
> Date: Wed Nov 24 20:43:47 2010
> New Revision: 86731
>
> Log:
> Final patch for issue 9807.
This seems to have broken compilation under Windows: Build started: Project: ssl, Configuration: Debug|Win32 Performing Makefile project actions Traceback (most recent call last): File "d:\cygwin\home\db3l\buildarea\3.x.bolen-windows\build\lib\site.py", line 519, in main() File "d:\cygwin\home\db3l\buildarea\3.x.bolen-windows\build\lib\site.py", line 507, in main known_paths = addusersitepackages(known_paths) File "d:\cygwin\home\db3l\buildarea\3.x.bolen-windows\build\lib\site.py", line 253, in addusersitepackages user_site = getusersitepackages() File "d:\cygwin\home\db3l\buildarea\3.x.bolen-windows\build\lib\site.py", line 228, in getusersitepackages user_base = getuserbase() # this will also set USER_BASE File "d:\cygwin\home\db3l\buildarea\3.x.bolen-windows\build\lib\site.py", line 218, in getuserbase USER_BASE = get_config_var('userbase') File "d:\cygwin\home\db3l\buildarea\3.x.bolen-windows\build\lib\sysconfig.py", line 586, in get_config_var return get_config_vars().get(name) File "d:\cygwin\home\db3l\buildarea\3.x.bolen-windows\build\lib\sysconfig.py", line 478, in get_config_vars _CONFIG_VARS['abiflags'] = sys.abiflags AttributeError: 'module' object has no attribute 'abiflags' Regards Antoine. From barry at python.org Thu Nov 25 00:50:25 2010 From: barry at python.org (Barry Warsaw) Date: Wed, 24 Nov 2010 18:50:25 -0500 Subject: [Python-Dev] r86731 - in python/branches/py3k: Lib/distutils/command/install.py Lib/distutils/sysconfig.py Lib/sysconfig.py Makefile.pre.in Misc/python.pc.in configure configure.in In-Reply-To: <20101125004158.32b1ceaa@pitrou.net> References: <20101124194347.C5C86EEA56@mail.python.org> <20101125004158.32b1ceaa@pitrou.net> Message-ID: <20101124185025.6cb67127@mission> On Nov 25, 2010, at 12:41 AM, Antoine Pitrou wrote: >On Wed, 24 Nov 2010 20:43:47 +0100 (CET) >barry.warsaw wrote: >> Author: barry.warsaw >> Date: Wed Nov 24 20:43:47 2010 >> New Revision: 86731 >> >> Log: >> Final patch for issue 9807. 
> >This seems to have broken compilation under Windows: > >Build started: Project: ssl, Configuration: Debug|Win32 >Performing Makefile project actions >Traceback (most recent call last): > File "d:\cygwin\home\db3l\buildarea\3.x.bolen-windows\build\lib\site.py", line 519, in > main() > File "d:\cygwin\home\db3l\buildarea\3.x.bolen-windows\build\lib\site.py", line 507, in main > known_paths = addusersitepackages(known_paths) > File "d:\cygwin\home\db3l\buildarea\3.x.bolen-windows\build\lib\site.py", line 253, in addusersitepackages > user_site = getusersitepackages() > File "d:\cygwin\home\db3l\buildarea\3.x.bolen-windows\build\lib\site.py", line 228, in getusersitepackages > user_base = getuserbase() # this will also set USER_BASE > File "d:\cygwin\home\db3l\buildarea\3.x.bolen-windows\build\lib\site.py", line 218, in getuserbase > USER_BASE = get_config_var('userbase') > File "d:\cygwin\home\db3l\buildarea\3.x.bolen-windows\build\lib\sysconfig.py", line 586, in get_config_var > return get_config_vars().get(name) > File "d:\cygwin\home\db3l\buildarea\3.x.bolen-windows\build\lib\sysconfig.py", line 478, in get_config_vars > _CONFIG_VARS['abiflags'] = sys.abiflags >AttributeError: 'module' object has no attribute 'abiflags' As discussed on IRC, _CONFIG_VARS['abiflags'] = '' if sys.abiflags is not defined. Amaury is going to test that. -Barry -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 836 bytes Desc: not available URL: From greg.ewing at canterbury.ac.nz Thu Nov 25 01:19:37 2010 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Thu, 25 Nov 2010 13:19:37 +1300 Subject: [Python-Dev] len(chr(i)) = 2? 
In-Reply-To: <77AAC178-F868-4F05-8509-4A9FB66F61EC@fuhm.net> References: <201011192123.14169.victor.stinner@haypocalc.com> <4CE6F93F.9010109@egenix.com> <4CE6FE30.1050903@v.loewis.de> <87hbfc1vnf.fsf@uwakimon.sk.tsukuba.ac.jp> <4CE78F62.7060707@v.loewis.de> <8739qukf9r.fsf@uwakimon.sk.tsukuba.ac.jp> <20101121173825.B1BFB235977@kimball.webabinitio.net> <60F8726F-C1C2-4803-8B8E-688EF0443FA0@gmail.com> <87eiadd46t.fsf@uwakimon.sk.tsukuba.ac.jp> <4CEC5316.4010608@canterbury.ac.nz> <77AAC178-F868-4F05-8509-4A9FB66F61EC@fuhm.net> Message-ID: <4CEDAB99.2000005@canterbury.ac.nz> On 24/11/10 13:22, James Y Knight wrote: > Instead, provide bidirectional iterators which can traverse the string by byte, > codepoint, or by grapheme Maybe it would be a good idea to add some iterators like this to Python. (Or has the time machine beaten me there?) -- Greg From stephen at xemacs.org Thu Nov 25 03:17:44 2010 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Thu, 25 Nov 2010 11:17:44 +0900 Subject: [Python-Dev] len(chr(i)) = 2? In-Reply-To: References: <201011192123.14169.victor.stinner@haypocalc.com> <4CE6F93F.9010109@egenix.com> <4CE6FE30.1050903@v.loewis.de> <87hbfc1vnf.fsf@uwakimon.sk.tsukuba.ac.jp> <4CE78F62.7060707@v.loewis.de> <8739qukf9r.fsf@uwakimon.sk.tsukuba.ac.jp> <20101121173825.B1BFB235977@kimball.webabinitio.net> <60F8726F-C1C2-4803-8B8E-688EF0443FA0@gmail.com> <87eiadd46t.fsf@uwakimon.sk.tsukuba.ac.jp> <4CED5E91.9070705@egenix.com> Message-ID: <87bp5eb0zb.fsf@uwakimon.sk.tsukuba.ac.jp> Alexander Belopolsky writes: > Any non-trivial text processing is likely to be broken in presence of > surrogates. If you're worried about this, write a UCS-2-producing codec that rejects surrogates or stuffs them into the private zone of the BMP. Maybe such a codec should be default, but so far nobody seems to want one enough; they want UTF-16 even though they know it's wrong. 
One of the things that makes the 16-bit code unit attractive to me is that the options for working around the variable-width nature of UTF-16 (without actually implementing conformance to UTF-16 in internal operations!) are many. If you use octets as code units, you don't have such options: you have to do it right. > Processing surrogate pairs in python code is hard. Sure, but as James Knight and MAL point out, so is processing compose characters, and those errors will go undetected in your proposals, even with a strict UCS-2 definition. What can you do? Banning composing characters isn't going to fly! > Yes, allowing non-trusted users to specify fill character is unlikely, > but it is quite likely that naive slicing or iteration over string > units would result in > > Traceback (most recent call last): Naive slicing yes, but naive iteration (ie, iteration that consumes the whole string, or up to a known character, rather than up to a specified position) is highly unlikely to result in such a traceback. It is precisely that property (non-BMP characters get passed through unchanged, or ignored) that makes extension to non-BMP code points attractive. > I agree again, but I feel that exposing code units rather than code > points at the Python string level takes us back to 2.x days of mixing > bytes and strings. It does, but there's a difference. With bytes as UTF-8, only ASCII values have defined semantics in Unicode. The rest have semantics that is context-dependent, and they are frequent in any non-English processing and many English use cases (math symbols, correctly- oriented punctuation). With 16-bit code units, all values have well- defined semantics in Unicode, and non-characters are going to be extremely rare in the vast majority of use cases. IOW, you can think of Python as a UCS-2 device processing characters, and let surrounding UTF-16 processors deal with the errors. > Let me quote Guido circa 2001 again: > > """ > ... 
if we had wanted to use a > variable-lenth internal representation, we should have picked UTF-8 > way back, like Perl did. Moving to a UTF-16-based internal > representation now will give us all the problems of the Perl choice > without any of the benefits. > """ > > I don't understand what changed since 2001 that made this argument > invalid. Nothing. The internal representation of Python is UCS-2, not UTF-16. People who want to think otherwise are kidding themselves. The presence of surrogates is not sufficient to call something UTF-16. Preserving the Unicode code points through any builtin operations is a necessary condition, and Python doesn't do that. *However*, in my opinion, it's not a big deal to allow surrogates in UCS-2 a la ISO 10646-1:1996. That lets people who want a quick and dirty way to handle BMP text that *might* (but usually won't) contain some non-BMP characters go a long way fast. "Although practicality beats purity." > I note that an opinion has been raised on this thread that > if we want compressed internal representation for strings, we should > use UTF-8. I tend to agree, but UTF-8 has been repeatedly rejected as > too hard to implement. What makes UTF-16 easier than UTF-8? Only the > fact that you can ignore bugs longer, in my view. That's mostly true. My guess is that we can probably ignore those bugs for as long as it takes someone to write the higher-level libraries that James suggests and MAL has actually proposed and started a PEP for. From greg.ewing at canterbury.ac.nz Thu Nov 25 03:35:50 2010 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Thu, 25 Nov 2010 15:35:50 +1300 Subject: [Python-Dev] len(chr(i)) = 2? 
In-Reply-To: <87mxozayam.fsf@uwakimon.sk.tsukuba.ac.jp> References: <201011192123.14169.victor.stinner@haypocalc.com> <4CE6F93F.9010109@egenix.com> <4CE6FE30.1050903@v.loewis.de> <87hbfc1vnf.fsf@uwakimon.sk.tsukuba.ac.jp> <4CE78F62.7060707@v.loewis.de> <8739qukf9r.fsf@uwakimon.sk.tsukuba.ac.jp> <20101121173825.B1BFB235977@kimball.webabinitio.net> <60F8726F-C1C2-4803-8B8E-688EF0443FA0@gmail.com> <87eiadd46t.fsf@uwakimon.sk.tsukuba.ac.jp> <4CEC5316.4010608@canterbury.ac.nz> <77AAC178-F868-4F05-8509-4A9FB66F61EC@fuhm.net> <87sjyrbftz.fsf@uwakimon.sk.tsukuba.ac.jp> <635C265A-90A8-4B92-A65C-59EF3E8EFD68@twistedmatrix.com> <87oc9fb97b.fsf@uwakimon.sk.tsukuba.ac.jp> <3C1ADB64-63F3-4165-926D-EDE9846E0DBD@fuhm.net> <87mxozayam.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <4CEDCB86.9030506@canterbury.ac.nz> On 24/11/10 22:03, Stephen J. Turnbull wrote: > But > if you actually need to remember positions, or regions, to jump to > later or to communicate to other code that manipulates them, doing > this stuff the straightforward way (just copying the whole iterator > object to hang on to its state) becomes expensive. If the internal representation of a text pointer (I won't call it an iterator because that means something else in Python) is a byte offset or something similar, it shouldn't take up any more space than a Python int, which is what you'd be using anyway if you represented text positions by grapheme indexes or whatever. If you want the text pointer to also remember which string it points into, it'll be a bit bigger, but again, no bigger than you would need to get the same functionality using a grapheme index plus a reference to the original string. Probably smaller, because it would all be encapsulated in one object. So I don't really see what you're arguing for here. How do *you* think positions in unicode strings should be represented? 
-- Greg From greg.ewing at canterbury.ac.nz Thu Nov 25 04:19:33 2010 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Thu, 25 Nov 2010 16:19:33 +1300 Subject: [Python-Dev] len(chr(i)) = 2? In-Reply-To: References: <201011192123.14169.victor.stinner@haypocalc.com> <4CE6F93F.9010109@egenix.com> <4CE6FE30.1050903@v.loewis.de> <87hbfc1vnf.fsf@uwakimon.sk.tsukuba.ac.jp> <4CE78F62.7060707@v.loewis.de> <8739qukf9r.fsf@uwakimon.sk.tsukuba.ac.jp> <20101121173825.B1BFB235977@kimball.webabinitio.net> <60F8726F-C1C2-4803-8B8E-688EF0443FA0@gmail.com> <87eiadd46t.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <4CEDD5C5.9050306@canterbury.ac.nz> On 25/11/10 06:37, Alexander Belopolsky wrote: > I don't think there is a recipe on how to fix legacy > character-by-character processing loop such as > > for c in string: > ... > > to make it iterate over code points consistently in wide and narrow > builds. A couple of possibilities: 1) Make things so that 'for c in string' does actually iterate over characters rather than code units. This could break existing code, though. 2) Provide some things like for c in string.chars(): ... for c in string.graphemes(): ... where chars() and graphemes() return appropriate iterators. (Or possibly iterable views, but that would raise the expectation that the views could also be randomly indexed by char or grapheme, which we probably wouldn't want to support.) 
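A minimal sketch of the second possibility, written as a free function because str has no chars() or graphemes() method. The name code_points is hypothetical. It assumes the input is a sequence of UTF-16 code units, as on a narrow build, and joins each high/low surrogate pair into one item; lone surrogates are passed through unchanged. Grapheme iteration is not attempted here.

```python
def code_points(s):
    """Yield one string per code point, joining surrogate pairs.

    Hypothetical helper, not a str method.  On a build where non-BMP
    characters are already single items, this behaves like plain
    iteration for well-formed text.
    """
    pending = None  # a high surrogate waiting for its low partner
    for unit in s:
        if pending is not None:
            if '\udc00' <= unit <= '\udfff':
                # Well-formed pair: emit both code units as one item.
                yield pending + unit
                pending = None
                continue
            # Lone high surrogate: pass it through unchanged.
            yield pending
            pending = None
        if '\ud800' <= unit <= '\udbff':
            pending = unit
        else:
            yield unit
    if pending is not None:
        yield pending  # trailing lone high surrogate
```

A loop written as "for c in code_points(string): ..." then sees the same number of items whether or not the build stores non-BMP characters as pairs.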
-- Greg From greg.ewing at canterbury.ac.nz Thu Nov 25 04:46:53 2010 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Thu, 25 Nov 2010 16:46:53 +1300 Subject: [Python-Dev] constant/enum type in stdlib In-Reply-To: References: <20101121034404.52924F20A@mail.python.org> <4CE9BF4A.1020302@netwok.org> <4CEA89E8.5090107@voidspace.org.uk> <20101122163722.7e96d123@pitrou.net> <4CEA9584.7040301@avl.com> <20101122172440.77d27ed5@pitrou.net> <20101122164654.2109.588145158.divmod.xquotient.165@localhost.localdomain> <4CEBC6BD.9060402@voidspace.org.uk> <4CED0557.9090101@voidspace.org.uk> Message-ID: <4CEDDC2D.204@canterbury.ac.nz> On 25/11/10 12:38, average wrote: > Is immutability a general need that should have general solution? I don't think it really generalizes. Tuples are not just frozen lists, for example -- they have a different internal structure that's more efficient to create and access. -- Greg From stephen at xemacs.org Thu Nov 25 04:55:40 2010 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Thu, 25 Nov 2010 12:55:40 +0900 Subject: [Python-Dev] len(chr(i)) = 2? In-Reply-To: <4CEDCB86.9030506@canterbury.ac.nz> References: <201011192123.14169.victor.stinner@haypocalc.com> <4CE6F93F.9010109@egenix.com> <4CE6FE30.1050903@v.loewis.de> <87hbfc1vnf.fsf@uwakimon.sk.tsukuba.ac.jp> <4CE78F62.7060707@v.loewis.de> <8739qukf9r.fsf@uwakimon.sk.tsukuba.ac.jp> <20101121173825.B1BFB235977@kimball.webabinitio.net> <60F8726F-C1C2-4803-8B8E-688EF0443FA0@gmail.com> <87eiadd46t.fsf@uwakimon.sk.tsukuba.ac.jp> <4CEC5316.4010608@canterbury.ac.nz> <77AAC178-F868-4F05-8509-4A9FB66F61EC@fuhm.net> <87sjyrbftz.fsf@uwakimon.sk.tsukuba.ac.jp> <635C265A-90A8-4B92-A65C-59EF3E8EFD68@twistedmatrix.com> <87oc9fb97b.fsf@uwakimon.sk.tsukuba.ac.jp> <3C1ADB64-63F3-4165-926D-EDE9846E0DBD@fuhm.net> <87mxozayam.fsf@uwakimon.sk.tsukuba.ac.jp> <4CEDCB86.9030506@canterbury.ac.nz> Message-ID: <87ipzm6oqr.fsf@uwakimon.sk.tsukuba.ac.jp> Greg Ewing writes: > On 24/11/10 22:03, Stephen J. 
Turnbull wrote: > > But > > if you actually need to remember positions, or regions, to jump to > > later or to communicate to other code that manipulates them, doing > > this stuff the straightforward way (just copying the whole iterator > > object to hang on to its state) becomes expensive. > > If the internal representation of a text pointer (I won't call it > an iterator because that means something else in Python) is a byte > offset or something similar, it shouldn't take up any more space > than a Python int, which is what you'd be using anyway if you > represented text positions by grapheme indexes or whatever. That's not necessarily true. Eg, in Emacs ("there you go again"), Lisp integers are not only immediate (saving one pointer), but the type is encoded in the lower bits, so that there is no need for a type pointer -- the representation is smaller than the opaque marker type. Altogether, up to 8 of 12 bytes saved on a 32-bit platform, or 16 of 24 bytes on a 64-bit platform. In Python it's true that markers can use the same data structure as integers and simply provide different methods, and it's arguable that Python's design is better. But if you use bytes internally, then you have problems. Do you expose that byte value to the user? Can users (programmers using the language and end users) specify positions in terms of byte values? If so, what do you do if the user specifies a byte value that points into a multibyte character? What if the user wants to specify position by number of characters? Can you translate efficiently? As I say elsewhere, it's possible that there really never is a need to efficiently specify an absolute position in a large text as a character (grapheme, whatever) count. But I think it would be hard to implement an efficient text-processing *language*, eg, a Python module for *full conformance* in handling Unicode, on top of UTF-8. 
Any time you have an algorithm that requires efficient access to
arbitrary text positions, you'll spend all your skull sweat fighting
the representation.  At least, that's been my experience with Emacsen.

 > So I don't really see what you're arguing for here. How do
 > *you* think positions in unicode strings should be represented?

I think what users should see is character positions, and they should
be able to specify them numerically as well as via an opaque marker
object.  I don't care whether that position is represented as bytes or
characters internally, except that the experience of Emacsen is that
representation as byte positions is both inefficient and fragile.  The
representation as character positions is more robust but slightly more
inefficient.

From alexander.belopolsky at gmail.com  Thu Nov 25 05:37:33 2010
From: alexander.belopolsky at gmail.com (Alexander Belopolsky)
Date: Wed, 24 Nov 2010 23:37:33 -0500
Subject: [Python-Dev] len(chr(i)) = 2?
In-Reply-To: <87bp5eb0zb.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <201011192123.14169.victor.stinner@haypocalc.com>
	<4CE6F93F.9010109@egenix.com>
	<4CE6FE30.1050903@v.loewis.de>
	<87hbfc1vnf.fsf@uwakimon.sk.tsukuba.ac.jp>
	<4CE78F62.7060707@v.loewis.de>
	<8739qukf9r.fsf@uwakimon.sk.tsukuba.ac.jp>
	<20101121173825.B1BFB235977@kimball.webabinitio.net>
	<60F8726F-C1C2-4803-8B8E-688EF0443FA0@gmail.com>
	<87eiadd46t.fsf@uwakimon.sk.tsukuba.ac.jp>
	<4CED5E91.9070705@egenix.com>
	<87bp5eb0zb.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID:

On Wed, Nov 24, 2010 at 9:17 PM, Stephen J. Turnbull wrote:
..
>  > I note that an opinion has been raised on this thread that
>  > if we want compressed internal representation for strings, we should
>  > use UTF-8.  I tend to agree, but UTF-8 has been repeatedly rejected as
>  > too hard to implement.  What makes UTF-16 easier than UTF-8?  Only the
>  > fact that you can ignore bugs longer, in my view.
>
> That's mostly true.
> My guess is that we can probably ignore those
> bugs for as long as it takes someone to write the higher-level
> libraries that James suggests and MAL has actually proposed and
> started a PEP for.
>

As far as I can tell, that PEP generated grand total of one comment in
nine years.  This may or may not be indicative of how far away we are
from seeing it implemented. :-)

As far as UTF-8 vs. UCS-2/4 debate, I have an idea that may be even
more far fetched.  Once upon a time, Python Unicode strings supported
buffer protocol and would lazily fill an internal buffer with bytes in
the default encoding.  In 3.x the default encoding has been fixed as
UTF-8, buffer protocol support was removed from strings, but the
internal buffer caching (now UTF-8) encoded representation remained.
Maybe we can now implement defenc logic in reverse.  Recall that
strings are stored as UCS-2/4 sequences, but once buffer is requested
in 2.x Python code or char* is obtained via
_PyUnicode_AsStringAndSize() at the C level in 3.x, an internal buffer
is filled with UTF-8 bytes and defenc is set to point to that buffer.

So the idea is for strings to store their data as UTF-8 buffer
pointed by defenc upon construction.  If an application uses string
indexing, UTF-8 only strings will lazily fill their UCS-2/4 buffer.
Proper, Unicode-aware algorithms such as grapheme, word or line
iteration or simple operations such as concatenation, search or
substitution would operate directly on defenc buffers.  Presumably
over time fewer and fewer applications would use code unit indexing
that require UCS-2/4 buffer and eventually Python strings can stop
supporting indexing altogether just like they stopped supporting the
buffer protocol in 3.x.
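A toy model of the storage scheme proposed above, as a rough sketch only. The class name LazyText and its attributes are hypothetical, and nothing here describes how CPython actually lays out str objects. UTF-8 bytes are the primary storage; the indexable form is built only when something indexes the text, while concatenation and search work on the bytes directly.

```python
class LazyText:
    """Hypothetical text type: UTF-8 first, indexable form on demand."""

    def __init__(self, text):
        self._utf8 = text.encode('utf-8')  # always present
        self._points = None                # filled lazily by indexing

    def __add__(self, other):
        # Concatenation needs only the UTF-8 bytes.
        result = LazyText('')
        result._utf8 = self._utf8 + other._utf8
        return result

    def find(self, needle):
        # Substring search is safe on UTF-8 bytes because the encoding
        # is self-synchronizing.  The result is a byte offset, not a
        # character index.
        return self._utf8.find(needle._utf8)

    def __getitem__(self, index):
        # First use of indexing pays for building the indexable form.
        if self._points is None:
            self._points = self._utf8.decode('utf-8')
        return self._points[index]

    @property
    def indexed(self):
        """True once the indexable form has been built."""
        return self._points is not None
```

The sketch makes the trade-off in the proposal visible: find() has to report positions in bytes, which is exactly the kind of position a caller cannot pass back to s[i] without forcing the second buffer into existence.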

From tjreedy at udel.edu  Thu Nov 25 06:22:01 2010
From: tjreedy at udel.edu (Terry Reedy)
Date: Thu, 25 Nov 2010 00:22:01 -0500
Subject: [Python-Dev] [Python-checkins] r86720 -
	python/branches/py3k/Misc/ACKS
In-Reply-To:
References: <20101123203252.39BE7EE9CF@mail.python.org>
	<4CEC43A4.80907@netwok.org>
	<4CEC4917.2070508@udel.edu>
Message-ID:

On 11/24/2010 3:04 PM, Georg Brandl wrote:

>>> Adding the BOM will be an editor thing, not a svn thing.  Doing a

> It should show up as an invisible change in the first line of a file when you
> look at a "svn diff".  (It is a very good practice to look at a diff before
> committing anyway.)

It does show up, and yes I agree.
That should be in dev/faq if not already

-- 
Terry Jan Reedy

From tjreedy at udel.edu  Thu Nov 25 06:23:27 2010
From: tjreedy at udel.edu (Terry Reedy)
Date: Thu, 25 Nov 2010 00:23:27 -0500
Subject: [Python-Dev] [Python-checkins] r86720 -
	python/branches/py3k/Misc/ACKS
In-Reply-To: <4CED8E1E.5050400@v.loewis.de>
References: <20101123203252.39BE7EE9CF@mail.python.org>
	<4CEC43A4.80907@netwok.org>
	<4CEC4917.2070508@udel.edu>
	<4CED8E1E.5050400@v.loewis.de>
Message-ID:

On 11/24/2010 5:13 PM, "Martin v. Löwis" wrote:
>> So I presume it did the same with IOBinding.py.
>
> No. This file contains only ASCII characters, so notepad has decided
> to not add the BOM.

Or it somehow got removed from the .py file. I tried with another .py
file (and reverted!) and the diff showed the invisible change to the
first line that Georg predicted.

-- 
Terry Jan Reedy

From tjreedy at udel.edu  Thu Nov 25 06:39:30 2010
From: tjreedy at udel.edu (Terry Reedy)
Date: Thu, 25 Nov 2010 00:39:30 -0500
Subject: [Python-Dev] len(chr(i)) = 2?
In-Reply-To: References: <201011192123.14169.victor.stinner@haypocalc.com> <4CE6F93F.9010109@egenix.com> <4CE6FE30.1050903@v.loewis.de> <87hbfc1vnf.fsf@uwakimon.sk.tsukuba.ac.jp> <4CE78F62.7060707@v.loewis.de> <8739qukf9r.fsf@uwakimon.sk.tsukuba.ac.jp> <20101121173825.B1BFB235977@kimball.webabinitio.net> <60F8726F-C1C2-4803-8B8E-688EF0443FA0@gmail.com> <87eiadd46t.fsf@uwakimon.sk.tsukuba.ac.jp> <4CED5E91.9070705@egenix.com> Message-ID: On 11/24/2010 3:06 PM, Alexander Belopolsky wrote: > Any non-trivial text processing is likely to be broken in presence of > surrogates. Producing them on input is just trading known issue for > an unknown one. Processing surrogate pairs in python code is hard. > Software that has to support non-BMP characters will most likely be > written for a wide build and contain subtle bugs when run under a > narrow build. Note that my latest proposal does not abolish > surrogates outright. Users who want them can still use something like > "surrogateescape" error handler for non-BMP characters. It seems to me that what you are asking for is an alternate, optional, utf-8-bmp codec that would raise an error, in either direction, for non-bmp chars. Then, as you suggest, if one is not prepared for surrogates, they are not allowed. -- Terry Jan Reedy From anurag.chourasia at gmail.com Thu Nov 25 10:24:34 2010 From: anurag.chourasia at gmail.com (Anurag Chourasia) Date: Thu, 25 Nov 2010 14:54:34 +0530 Subject: [Python-Dev] AIX 5.3 - Enabling Shared Library Support Vs Extensions Message-ID: All, When I configure python to enable shared libraries, none of the extensions are getting built during the make step due to this error. building 'cStringIO' extension gcc -pthread -fno-strict-aliasing -g -O2 -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -I. -I/u01/home/apli/wm/GDD/Python-2.6.6/./Include -I. 
-IInclude -I./Include -I/opt/freeware/include -I/opt/freeware/include/readline -I/opt/freeware/include/ncurses -I/usr/local/include -I/u01/home/apli/wm/GDD/Python-2.6.6/Include -I/u01/home/apli/wm/GDD/Python-2.6.6 -c /u01/home/apli/wm/GDD/Python-2.6.6/Modules/cStringIO.c -o build/temp.aix-5.3-2.6/u01/home/apli/wm/GDD/Python-2.6.6/Modules/cStringIO.o ./Modules/ld_so_aix gcc -pthread -bI:Modules/python.exp build/temp.aix-5.3-2.6/u01/home/apli/wm/GDD/Python-2.6.6/Modules/cStringIO.o -L/usr/local/lib *-lpython2.6* -o build/lib.aix-5.3-2.6/cStringIO.so *collect2: library libpython2.6 not found* building 'cPickle' extension gcc -pthread -fno-strict-aliasing -g -O2 -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -I. -I/u01/home/apli/wm/GDD/Python-2.6.6/./Include -I. -IInclude -I./Include -I/opt/freeware/include -I/opt/freeware/include/readline -I/opt/freeware/include/ncurses -I/usr/local/include -I/u01/home/apli/wm/GDD/Python-2.6.6/Include -I/u01/home/apli/wm/GDD/Python-2.6.6 -c /u01/home/apli/wm/GDD/Python-2.6.6/Modules/cPickle.c -o build/temp.aix-5.3-2.6/u01/home/apli/wm/GDD/Python-2.6.6/Modules/cPickle.o ./Modules/ld_so_aix gcc -pthread -bI:Modules/python.exp build/temp.aix-5.3-2.6/u01/home/apli/wm/GDD/Python-2.6.6/Modules/cPickle.o -L/usr/local/lib *-lpython2.6* -o build/lib.aix-5.3-2.6/cPickle.so *collect2: library libpython2.6 not found* This is on AIX 5.3, GCC 4.2, Python 2.6.6 I can confirm that there is a libpython2.6.a file in the top level directory from where I am doing the configure/make etc Here are the options supplied to the configure command ./configure --enable-shared --disable-ipv6 --with-gcc=gcc CPPFLAGS="-I /opt/freeware/include -I /opt/freeware/include/readline -I /opt/freeware/include/ncurses" Please guide me in getting past this error. Thanks for your help on this. Regards, Anurag -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From v+python at g.nevcal.com Thu Nov 25 10:34:51 2010 From: v+python at g.nevcal.com (Glenn Linderman) Date: Thu, 25 Nov 2010 01:34:51 -0800 Subject: [Python-Dev] constant/enum type in stdlib In-Reply-To: <4CEC2759.40203@g.nevcal.com> References: <20101121034404.52924F20A@mail.python.org> <20101122172440.77d27ed5@pitrou.net> <20101122164654.2109.588145158.divmod.xquotient.165@localhost.localdomain> <4CEBC6BD.9060402@voidspace.org.uk> <4CEBCE92.40801@voidspace.org.uk> <20101123154229.474f7a90@pitrou.net> <1290524466.3642.4.camel@localhost.localdomain> <4CEBDA91.4050205@voidspace.org.uk> <1290526253.3642.9.camel@localhost.localdomain> <4CEBE06C.9030101@voidspace.org.uk> <1290528319.3642.11.camel@localhost.localdomain> <1290533860.3642.73.camel@localhost.localdomain> <1290535602.3642.87.camel@localhost.localdomain> <4CEC2759.40203@g.nevcal.com> Message-ID: <4CEE2DBB.3040502@g.nevcal.com> So the following code defines constants with associated names that get put in the repr. I'm still a Python newbie in some areas, particularly classes and metaclasses, maybe more. But this Python 3 code seems to create constants with names ... works for int and str at least. Special case for int defines a special __or__ operator to OR both the values and the names, which some might like. Dunno why it doesn't work for dict, and it is too late to research that today. That's the last test case in the code below, so you can see how it works for int and string before it bombs. There's some obvious cleanup work to be done, and it would be nice to make the names actually be constant... but they do lose their .name if you ignorantly assign the base type, so at least it is hard to change the value and keep the associated .name that gets reported by repr, which might reduce some confusion at debug time. 
An idea I had, but have no idea how to implement, is that it might be
nice to say:

    with imported_constants_from_module:
        do_stuff

where do_stuff could reference the constants without qualifying them by
module.  Of course, if you knew it was just a module of constants, you
could "from module import *" :)  But the idea of with is that they'd go
away at the end of that scope.

Some techniques here came from Raymond's namedtuple code.


def constant( name, val ):
    typ = str( type( val ))
    if typ.startswith("<class '") and typ.endswith("'>"):
        typ = typ[ 8:-2 ]
    ev = '''
class constant_%s( %s ):
    def __new__( cls, val, name ):
        self = %s.__new__( cls, val )
        self.name = name
        return self
    def __repr__( self ):
        return self.name + ': ' + str( self )
'''
    if typ == 'int':
        ev += '''
    def __or__( self, other ):
        if isinstance( other, constant_int ):
            return constant_int( int( self ) | int( other ),
                                 self.name + ' | ' + other.name )
'''
    ev += '''
%s = constant_%s( %s, '%s' )
'''
    ev = ev % ( typ, typ, typ, name, typ, repr( val ), name )
    print( ev )
    exec( ev, globals())

constant('O_RANDOM', val=16 )
constant('O_SEQUENTIAL', val=32 )
constant("O_STRING", val="string")

def foo( x ):
    print( str( x ))
    print( repr( x ))
    print( type( x ))

foo( O_RANDOM )
foo( O_SEQUENTIAL )
foo( O_STRING )
zz = O_RANDOM | O_SEQUENTIAL
foo( zz )
y = {'ab': 2, 'yz': 3 }
constant('O_DICT', y )

From mal at egenix.com  Thu Nov 25 10:51:09 2010
From: mal at egenix.com (M.-A. Lemburg)
Date: Thu, 25 Nov 2010 10:51:09 +0100
Subject: [Python-Dev] len(chr(i)) = 2?
In-Reply-To: References: <201011192123.14169.victor.stinner@haypocalc.com> <4CE6F93F.9010109@egenix.com> <4CE6FE30.1050903@v.loewis.de> <87hbfc1vnf.fsf@uwakimon.sk.tsukuba.ac.jp> <4CE78F62.7060707@v.loewis.de> <8739qukf9r.fsf@uwakimon.sk.tsukuba.ac.jp> <20101121173825.B1BFB235977@kimball.webabinitio.net> <60F8726F-C1C2-4803-8B8E-688EF0443FA0@gmail.com> <87eiadd46t.fsf@uwakimon.sk.tsukuba.ac.jp> <4CED5E91.9070705@egenix.com> Message-ID: <4CEE318D.5000705@egenix.com> Terry Reedy wrote: > On 11/24/2010 3:06 PM, Alexander Belopolsky wrote: > >> Any non-trivial text processing is likely to be broken in presence of >> surrogates. Producing them on input is just trading known issue for >> an unknown one. Processing surrogate pairs in python code is hard. >> Software that has to support non-BMP characters will most likely be >> written for a wide build and contain subtle bugs when run under a >> narrow build. Note that my latest proposal does not abolish >> surrogates outright. Users who want them can still use something like >> "surrogateescape" error handler for non-BMP characters. > > It seems to me that what you are asking for is an alternate, optional, > utf-8-bmp codec that would raise an error, in either direction, for > non-bmp chars. Then, as you suggest, if one is not prepared for > surrogates, they are not allowed. That would be a possibility as well... but I doubt that many users are going to bother, since slicing surrogates is just as bad as slicing combining code points and the latter are much more common in real life and they do happen to mostly live in the BMP. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Nov 25 2010) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... 
http://python.egenix.com/ ________________________________________________________________________ ::: Try our new mxODBC.Connect Python Database Interface for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From mal at egenix.com Thu Nov 25 10:57:17 2010 From: mal at egenix.com (M.-A. Lemburg) Date: Thu, 25 Nov 2010 10:57:17 +0100 Subject: [Python-Dev] len(chr(i)) = 2? In-Reply-To: References: <201011192123.14169.victor.stinner@haypocalc.com> <4CE6F93F.9010109@egenix.com> <4CE6FE30.1050903@v.loewis.de> <87hbfc1vnf.fsf@uwakimon.sk.tsukuba.ac.jp> <4CE78F62.7060707@v.loewis.de> <8739qukf9r.fsf@uwakimon.sk.tsukuba.ac.jp> <20101121173825.B1BFB235977@kimball.webabinitio.net> <60F8726F-C1C2-4803-8B8E-688EF0443FA0@gmail.com> <87eiadd46t.fsf@uwakimon.sk.tsukuba.ac.jp> <4CED5E91.9070705@egenix.com> <87bp5eb0zb.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <4CEE32FD.90507@egenix.com> Alexander Belopolsky wrote: > On Wed, Nov 24, 2010 at 9:17 PM, Stephen J. Turnbull wrote: > .. >> > I note that an opinion has been raised on this thread that >> > if we want compressed internal representation for strings, we should >> > use UTF-8. I tend to agree, but UTF-8 has been repeatedly rejected as >> > too hard to implement. What makes UTF-16 easier than UTF-8? Only the >> > fact that you can ignore bugs longer, in my view. >> >> That's mostly true. My guess is that we can probably ignore those >> bugs for as long as it takes someone to write the higher-level >> libraries that James suggests and MAL has actually proposed and >> started a PEP for. >> > > As far as I can tell, that PEP generated grand total of one comment in > nine years. This may or may not be indicative of how far away we are > from seeing it implemented. :-) At the time it was too early for people to start thinking about these issues. 
Actual use of Unicode really only started a few years ago. Since I didn't have a need for such an indexing module myself (and didn't have much time to work on it anyway), I punted on the idea. If someone else wants to pick up the idea, I'd gladly help out with the details. > As far as UTF-8 vs. UCS-2/4 debate, I have an idea that may be even > more far fetched. Once upon a time, Python Unicode strings supported > buffer protocol and would lazily fill an internal buffer with bytes in > the default encoding. In 3.x the default encoding has been fixed as > UTF-8, buffer protocol support was removed from strings, but the > internal buffer caching (now UTF-8) encoded representation remained. > Maybe we can now implement defenc logic in reverse. Recall that > strings are stored as UCS-2/4 sequences, but once buffer is requested > in 2.x Python code or char* is obtained via > _PyUnicode_AsStringAndSize() at the C level in 3.x, an internal buffer > is filled with UTF-8 bytes and defenc is set to point to that buffer. The original idea was for that buffer to go away once we moved to Unicode for strings. Reality has shown that we still need to stick the buffer, though, since the UTF-8 representation of Unicode objects is used a lot. > So the idea is for strings to store their data as UTF-8 buffer > pointed by defenc upon construction. If an application uses string > indexing, UTF-8 only strings will lazily fill their UCS-2/4 buffer. > Proper, Unicode-aware algorithms such as grapheme, word or line > iteration or simple operations such as concatenation, search or > substitution would operate directly on defenc buffers. Presumably > over time fewer and fewer applications would use code unit indexing > that require UCS-2/4 buffer and eventually Python strings can stop > supporting indexing altogether just like they stopped supporting the > buffer protocol in 3.x. 
I don't follow you: how would UTF-8, which has even more issues with variable length representation of code points, make something easier compared to UTF-16, which has far fewer such issues and then only for non-BMP code points? Please note that we can only provide one way of string indexing in Python using the standard s[1] notation and since we want that operation to be fast and no more than O(1), using the code units as items is the only reasonable way to implement it. With an indexing module, we could then let applications work based on higher level indexing schemes such as complete code points (skipping surrogates), combined code points, graphemes (ignoring e.g. most control code points and zero width code points), words (with some customizations as to where to break words, which will likely have to be language dependent), lines (which can be complicated for scripts that use columns instead ;-)), paragraphs, etc. It would also help to add transparent indexing for right-to-left scripts and text that uses both left-to-right and right-to-left text (BIDI). However, in order for these indexing methods to actually work, they will need to return references to the code units, so we cannot just drop that access method. * Back on the surrogates topic: In any case, I think this discussion is losing its grip on reality. By far, most strings you find in actual applications don't use surrogates at all, so the problem is being exaggerated. If you need to be careful about surrogates for some reason, I think a single new method .hassurrogates() on string objects would go a long way in making detection and adding special-casing for these a lot easier. If adding support for surrogates doesn't make sense (e.g. in the case of the formatting methods), then we simply punt on that and leave such handling to other tools.
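The kind of check described above can be sketched in pure Python. The name has_surrogates is hypothetical (str has no such method), and the sketch assumes an interpreter where the items of a str are code points (3.3 and later), so it can only ever find lone surrogates, such as those produced by the surrogateescape error handler. On the narrow builds discussed in this thread, the same loop would also flag both halves of every non-BMP character.

```python
def has_surrogates(s):
    """Return True if s contains any code point in the surrogate range."""
    # U+D800..U+DFFF is the block reserved for UTF-16 surrogates.
    return any(0xD800 <= ord(ch) <= 0xDFFF for ch in s)


# An undecodable byte smuggled in by surrogateescape becomes a lone surrogate.
text = b"caf\xe9".decode("utf-8", "surrogateescape")

print(has_surrogates(text))
print(has_surrogates("cafe"))
```

Code that cannot cope with surrogates could call such a helper up front and either reject the input or take a slower special-cased path.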
* Regarding preventing surrogates from entering the Python runtime: It is by far more important to maintain round-trip safety for Unicode data, than getting every bit of code work correctly with surrogates (often, there won't be a single correct way). With a new method for fast detection of surrogates, we could protect code which obviously doesn't work with surrogates and then consider each case individually by either adding special cases as necessary or punting on the support. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Nov 25 2010) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ ::: Try our new mxODBC.Connect Python Database Interface for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. 
Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From nadeem.vawda at gmail.com Thu Nov 25 11:12:20 2010 From: nadeem.vawda at gmail.com (Nadeem Vawda) Date: Thu, 25 Nov 2010 12:12:20 +0200 Subject: [Python-Dev] constant/enum type in stdlib In-Reply-To: <4CEE2DBB.3040502@g.nevcal.com> References: <20101121034404.52924F20A@mail.python.org> <20101122172440.77d27ed5@pitrou.net> <20101122164654.2109.588145158.divmod.xquotient.165@localhost.localdomain> <4CEBC6BD.9060402@voidspace.org.uk> <4CEBCE92.40801@voidspace.org.uk> <20101123154229.474f7a90@pitrou.net> <1290524466.3642.4.camel@localhost.localdomain> <4CEBDA91.4050205@voidspace.org.uk> <1290526253.3642.9.camel@localhost.localdomain> <4CEBE06C.9030101@voidspace.org.uk> <1290528319.3642.11.camel@localhost.localdomain> <1290533860.3642.73.camel@localhost.localdomain> <1290535602.3642.87.camel@localhost.localdomain> <4CEC2759.40203@g.nevcal.com> <4CEE2DBB.3040502@g.nevcal.com> Message-ID: On Thu, Nov 25, 2010 at 11:34 AM, Glenn Linderman wrote: > So the following code defines constants with associated names that get put > in the repr. The code you gave doesn't work if the constant() function is moved into a separate module from the code that calls it. The globals() function, as I understand it, gives you access to the global namespace *of the current module*, so the constants end up being defined in the module containing constant(), not the module you're calling it from. You could get around this by passing the globals of the calling module to constant(), but I think it's cleaner to use a class to provide a distinct namespace for the constants. > An idea I had, but have no idea how to implement, is that it might be nice > to say: > > with imported_constants_from_module: > do_stuff > > where do_stuff could reference the constants without qualifying them by > module.
Of course, if you knew it was just a module of constants, you could > "import * from module" :) But the idea of with is that they'd go away at > the end of that scope. I don't think this is possible - the context manager protocol doesn't allow you to modify the namespace of the caller like that. Also, a with statement does not have its own namespace; any names defined inside its body will continue to be visible in the containing scope. Of course, if you want to achieve something similar (at function scope), you could say: def foo(bar, baz): from module import * ... From fuzzyman at voidspace.org.uk Thu Nov 25 11:34:25 2010 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Thu, 25 Nov 2010 10:34:25 +0000 Subject: [Python-Dev] constant/enum type in stdlib In-Reply-To: References: <20101121034404.52924F20A@mail.python.org> <4CEBC6BD.9060402@voidspace.org.uk> <4CEBCE92.40801@voidspace.org.uk> <20101123154229.474f7a90@pitrou.net> <1290524466.3642.4.camel@localhost.localdomain> <4CEBDA91.4050205@voidspace.org.uk> <1290526253.3642.9.camel@localhost.localdomain> <4CEBE06C.9030101@voidspace.org.uk> <1290528319.3642.11.camel@localhost.localdomain> <1290533860.3642.73.camel@localhost.localdomain> <1290535602.3642.87.camel@localhost.localdomain> <4CEC2759.40203@g.nevcal.com> <4CEE2DBB.3040502@g.nevcal.com> Message-ID: <4CEE3BB1.5090308@voidspace.org.uk> On 25/11/2010 10:12, Nadeem Vawda wrote: > On Thu, Nov 25, 2010 at 11:34 AM, Glenn Linderman wrote: >> So the following code defines constants with associated names that get put >> in the repr. > The code you gave doesn't work if the constant() function is moved > into a separate module from the code that calls it. The globals() > function, as I understand it, gives you access to the global namespace > *of the current module*, so the constants end up being defined in the > module containing constant(), not the module you're calling it from.
> > You could get around this by passing the globals of the calling module > to constant(), but I think it's cleaner to use a class to provide a > distinct namespace for the constants. > >> An idea I had, but have no idea how to implement, is that it might be nice >> to say: >> >> with imported_constants_from_module: >> do_stuff >> >> where do_stuff could reference the constants without qualifying them by >> module. Of course, if you knew it was just a module of constants, you could >> "import * from module" :) But the idea of with is that they'd go away at >> the end of that scope. > I don't think this is possible - the context manager protocol doesn't > allow you to modify the namespace of the caller like that. Also, a > with statement does not have its own namespace; any names defined > inside its body will continue to be visible in the containing scope. > > Of course, if you want to achieve something similar (at function > scope), you could say: > > def foo(bar, baz): > from module import * > ... Not in Python 3 you can't. :-) That's invalid syntax, import * can only be used at module level. This makes *testing* import * (i.e. testing your __all__) annoying - you have to exec('from module import *') instead. Michael > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/fuzzyman%40voidspace.org.uk -- http://www.voidspace.org.uk/ READ CAREFULLY. By accepting and reading this email you agree, on behalf of your employer, to release me from all obligations and waivers arising from any and all NON-NEGOTIATED agreements, licenses, terms-of-service, shrinkwrap, clickwrap, browsewrap, confidentiality, non-disclosure, non-compete and acceptable use policies ("BOGUS AGREEMENTS")
that I have entered into with your employer, its partners, licensors, agents and assigns, in perpetuity, without prejudice to my ongoing rights and privileges. You further represent that you have the authority to release me from any BOGUS AGREEMENTS on behalf of your employer. From fuzzyman at voidspace.org.uk Thu Nov 25 11:37:13 2010 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Thu, 25 Nov 2010 10:37:13 +0000 Subject: [Python-Dev] constant/enum type in stdlib In-Reply-To: <4CEE2DBB.3040502@g.nevcal.com> References: <20101121034404.52924F20A@mail.python.org> <20101122164654.2109.588145158.divmod.xquotient.165@localhost.localdomain> <4CEBC6BD.9060402@voidspace.org.uk> <4CEBCE92.40801@voidspace.org.uk> <20101123154229.474f7a90@pitrou.net> <1290524466.3642.4.camel@localhost.localdomain> <4CEBDA91.4050205@voidspace.org.uk> <1290526253.3642.9.camel@localhost.localdomain> <4CEBE06C.9030101@voidspace.org.uk> <1290528319.3642.11.camel@localhost.localdomain> <1290533860.3642.73.camel@localhost.localdomain> <1290535602.3642.87.camel@localhost.localdomain> <4CEC2759.40203@g.nevcal.com> <4CEE2DBB.3040502@g.nevcal.com> Message-ID: <4CEE3C59.1030002@voidspace.org.uk> On 25/11/2010 09:34, Glenn Linderman wrote: > So the following code defines constants with associated names that get > put in the repr. > > I'm still a Python newbie in some areas, particularly classes and > metaclasses, maybe more. > But this Python 3 code seems to create constants with names ... works > for int and str at least. > > Special case for int defines a special __or__ operator to OR both the > values and the names, which some might like. > > Dunno why it doesn't work for dict, and it is too late to research > that today. That's the last test case in the code below, so you can > see how it works for int and string before it bombs. > > There's some obvious cleanup work to be done, and it would be nice to > make the names actually be constant... 
but they do lose their .name if > you ignorantly assign the base type, so at least it is hard to change > the value and keep the associated .name that gets reported by repr, > which might reduce some confusion at debug time. > > An idea I had, but have no idea how to implement, is that it might be > nice to say: > > with imported_constants_from_module: > do_stuff > > where do_stuff could reference the constants without qualifying them > by module. Of course, if you knew it was just a module of constants, > you could "import * from module" :) But the idea of with is that > they'd go away at the end of that scope. > > Some techniques here came from Raymond's namedtuple code. > > > def constant( name, val ): > typ = str( type( val )) > if typ.startswith("<class '"): > typ = typ[ 8:-2 ] > ev = ''' > class constant_%s( %s ): > def __new__( cls, val, name ): > self = %s.__new__( cls, val ) > self.name = name > return self > def __repr__( self ): > return self.name + ': ' + str( self ) > ''' > if typ == 'int': > ev += ''' > def __or__( self, other ): > if isinstance( other, constant_int ): > return constant_int( int( self ) | int( other ), > self.name + ' | ' + other.name ) > ''' Not quite correct. If you OR a value with itself you should get back just the value, not something with "name|name" as the repr. We can hold off on implementations until we have general agreement that some kind of named constant *should* be added, and what the feature set should look like.
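To make the objection concrete, here is a minimal sketch of a named integer constant whose OR with itself returns the original object. It is not the code from the thread: the class name NamedInt and its layout are assumptions chosen for illustration, and it covers only the int case.

```python
class NamedInt(int):
    """An int that carries a display name (hypothetical helper)."""

    def __new__(cls, value, name):
        self = super().__new__(cls, value)
        self.name = name
        return self

    def __repr__(self):
        return "%s: %d" % (self.name, int(self))

    def __or__(self, other):
        if not isinstance(other, NamedInt):
            # Mixing with a plain int gives a plain int back.
            return int(self) | other
        if int(self) == int(other):
            # x | x is just x, so keep the single name.
            return self
        return NamedInt(int(self) | int(other),
                        self.name + " | " + other.name)


O_RANDOM = NamedInt(16, "O_RANDOM")
O_SEQUENTIAL = NamedInt(32, "O_SEQUENTIAL")

print(repr(O_RANDOM | O_RANDOM))
print(repr(O_RANDOM | O_SEQUENTIAL))
```

Using a real class instead of exec'd source also sidesteps the globals() problem raised earlier, since nothing is injected into a module namespace.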
All the best, Michael > ev += ''' > %s = constant_%s( %s, '%s' ) > > ''' > ev = ev % ( typ, typ, typ, name, typ, repr( val ), name ) > print( ev ) > exec( ev, globals()) > > constant('O_RANDOM', val=16 ) > > constant('O_SEQUENTIAL', val=32 ) > > constant("O_STRING", val="string") > > def foo( x ): > print( str( x )) > print( repr( x )) > print( type( x )) > > foo( O_RANDOM ) > foo( O_SEQUENTIAL ) > foo( O_STRING ) > > zz = O_RANDOM | O_SEQUENTIAL > > foo( zz ) > > y = {'ab': 2, 'yz': 3 } > constant('O_DICT', y ) > > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/fuzzyman%40voidspace.org.uk -- http://www.voidspace.org.uk/ READ CAREFULLY. By accepting and reading this email you agree, on behalf of your employer, to release me from all obligations and waivers arising from any and all NON-NEGOTIATED agreements, licenses, terms-of-service, shrinkwrap, clickwrap, browsewrap, confidentiality, non-disclosure, non-compete and acceptable use policies ("BOGUS AGREEMENTS") that I have entered into with your employer, its partners, licensors, agents and assigns, in perpetuity, without prejudice to my ongoing rights and privileges. You further represent that you have the authority to release me from any BOGUS AGREEMENTS on behalf of your employer. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From merwok at netwok.org Thu Nov 25 12:47:00 2010 From: merwok at netwok.org (=?UTF-8?B?w4lyaWMgQXJhdWpv?=) Date: Thu, 25 Nov 2010 12:47:00 +0100 Subject: [Python-Dev] [Python-checkins] r86748 - in python/branches/py3k-urllib/Lib: http/client.py urllib/request.py In-Reply-To: <20101125081820.7FA2EEEA97@mail.python.org> References: <20101125081820.7FA2EEEA97@mail.python.org> Message-ID: <4CEE4CB4.6010107@netwok.org> > Author: senthil.kumaran > New Revision: 86748 > > Log: > Experimental - Transparent gzip Encoding in urllib2. There should be a good way to deal with Content-Length. Cool feature! But... > Modified: > python/branches/py3k-urllib/Lib/http/client.py > python/branches/py3k-urllib/Lib/urllib/request.py No tests? Misc/NEWS? :) Regards From rob.cliffe at btinternet.com Thu Nov 25 13:52:44 2010 From: rob.cliffe at btinternet.com (Rob Cliffe) Date: Thu, 25 Nov 2010 12:52:44 +0000 Subject: [Python-Dev] constant/enum type in stdlib In-Reply-To: <4CEDDC2D.204@canterbury.ac.nz> References: <20101121034404.52924F20A@mail.python.org> <4CE9BF4A.1020302@netwok.org> <4CEA89E8.5090107@voidspace.org.uk> <20101122163722.7e96d123@pitrou.net> <4CEA9584.7040301@avl.com> <20101122172440.77d27ed5@pitrou.net> <20101122164654.2109.588145158.divmod.xquotient.165@localhost.localdomain> <4CEBC6BD.9060402@voidspace.org.uk> <4CED0557.9090101@voidspace.org.uk> <4CEDDC2D.204@canterbury.ac.nz> Message-ID: <4CEE5C1C.9000905@btinternet.com> On 25/11/2010 03:46, Greg Ewing wrote: > On 25/11/10 12:38, average wrote: >> Is immutability a general need that should have general solution? > Yes, I have sometimes thought this. Might be nice to have a "mutable" attribute that could be read and could be changed from True to False, though presumably not vice versa. > I don't think it really generalizes. Tuples are not just frozen > lists, for example -- they have a different internal structure > that's more efficient to create and access. 
> But couldn't they be presented to the Python programmer as a single type, with the implementation details hidden "under the hood"? So MyList.__mutable__ = False would have the same effect as the present MyList = tuple(MyList) This would simplify some code that copes with either list(s) or tuple(s) as input data. One would need syntax for (im)mutable literals, e.g. []i # immutable list (really a tuple). Bit of a shame that "i[]" doesn't work. or []f # frozen list (same thing) [] # mutable list (same as now) []m # alternative syntax for mutable list This would reduce the overloading on parentheses and avoid having to write a tuple of one item as (t,) which often trips up newbies. It would also avoid one FAQ: Why does Python have separate list and tuple types? Also the syntax could be extended, e.g. {a,b,c}f # frozen set with 3 objects {p:x,q:y}f # frozen dictionary with 2 items {:}f, {}f # (re the thread on set literals) frozen empty dictionary and frozen empty set! Just some thoughts for Python 4. Best wishes Rob Cliffe From g.brandl at gmx.net Thu Nov 25 14:27:14 2010 From: g.brandl at gmx.net (Georg Brandl) Date: Thu, 25 Nov 2010 14:27:14 +0100 Subject: [Python-Dev] [Python-checkins] r86748 - in python/branches/py3k-urllib/Lib: http/client.py urllib/request.py In-Reply-To: <4CEE4CB4.6010107@netwok.org> References: <20101125081820.7FA2EEEA97@mail.python.org> <4CEE4CB4.6010107@netwok.org> Message-ID: On 25.11.2010 12:47, Éric Araujo wrote: >> Author: senthil.kumaran >> New Revision: 86748 >> >> Log: >> Experimental - Transparent gzip Encoding in urllib2. There should be a good way to deal with Content-Length. > Cool feature! But... > >> Modified: >> python/branches/py3k-urllib/Lib/http/client.py >> python/branches/py3k-urllib/Lib/urllib/request.py > No tests? Misc/NEWS? :) Note that this is work in a separate branch.
Georg From emile.anclin at logilab.fr Thu Nov 25 15:30:23 2010 From: emile.anclin at logilab.fr (Emile Anclin) Date: Thu, 25 Nov 2010 15:30:23 +0100 Subject: [Python-Dev] python3k : imp.find_module raises SyntaxError Message-ID: <201011251530.23947.emile.anclin@logilab> hello, working on Pylint, we have a lot of voluntary corrupted files to test Pylint behavior; for instance $ cat /home/emile/var/pylint/test/input/func_unknown_encoding.py # -*- coding: IBO-8859-1 -*- """ check correct unknown encoding declaration """ __revision__ = '????' and we try to find that module : find_module('func_unknown_encoding', None). But python3 raises SyntaxError in that case ; it didn't raise SyntaxError on python2 nor does so on our func_nonascii_noencoding and func_wrong_encoding modules (with obvious names) Python 3.2a2 (r32a2:84522, Sep 14 2010, 15:22:36) [GCC 4.3.4] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> from imp import find_module >>> find_module('func_unknown_encoding', None) Traceback (most recent call last): File "<stdin>", line 1, in <module> SyntaxError: encoding problem: with BOM >>> find_module('func_wrong_encoding', None) (<_io.TextIOWrapper name=5 encoding='utf-8'>, 'func_wrong_encoding.py', ('.py', 'U', 1)) >>> find_module('func_nonascii_noencoding', None) (<_io.TextIOWrapper name=6 encoding='utf-8'>, 'func_nonascii_noencoding.py', ('.py', 'U', 1)) So what is the reason of this selective behavior? Furthermore, there is BOM in our func_unknown_encoding.py module.
-- Emile Anclin http://www.logilab.fr/ http://www.logilab.org/ Informatique scientifique & et gestion de connaissances From rrr at ronadam.com Thu Nov 25 18:22:58 2010 From: rrr at ronadam.com (Ron Adam) Date: Thu, 25 Nov 2010 11:22:58 -0600 Subject: [Python-Dev] python3k : imp.find_module raises SyntaxError In-Reply-To: <201011251530.23947.emile.anclin@logilab> References: <201011251530.23947.emile.anclin@logilab> Message-ID: <4CEE9B72.1070002@ronadam.com> On 11/25/2010 08:30 AM, Emile Anclin wrote: > > hello, > > working on Pylint, we have a lot of voluntary corrupted files to test > Pylint behavior; for instance > > $ cat /home/emile/var/pylint/test/input/func_unknown_encoding.py > # -*- coding: IBO-8859-1 -*- > """ check correct unknown encoding declaration > """ > > __revision__ = '????' > > > and we try to find that module : > find_module('func_unknown_encoding', None). But python3 raises SyntaxError > in that case ; it didn't raise SyntaxError on python2 nor does so on our > func_nonascii_noencoding and func_wrong_encoding modules (with obvious > names) > > Python 3.2a2 (r32a2:84522, Sep 14 2010, 15:22:36) > [GCC 4.3.4] on linux2 > Type "help", "copyright", "credits" or "license" for more information. >>>> from imp import find_module >>>> find_module('func_unknown_encoding', None) > Traceback (most recent call last): > File "<stdin>", line 1, in <module> > SyntaxError: encoding problem: with BOM >>>> find_module('func_wrong_encoding', None) > (<_io.TextIOWrapper name=5 encoding='utf-8'>, 'func_wrong_encoding.py', > ('.py', 'U', 1)) >>>> find_module('func_nonascii_noencoding', None) > (<_io.TextIOWrapper name=6 encoding='utf-8'>, > 'func_nonascii_noencoding.py', ('.py', 'U', 1)) > > > So what is the reason of this selective behavior? > Furthermore, there is BOM in our func_unknown_encoding.py module. I don't think there is a clear reason by design. Also try importing the same modules directly and noting the differences in the errors you get.
For example, the problem that brought this to my attention in python3.2. >>> find_module('test/badsyntax_pep3120') Segmentation fault >>> from test import badsyntax_pep3120 Traceback (most recent call last): File "<stdin>", line 1, in <module> File "/usr/local/lib/python3.2/test/badsyntax_pep3120.py", line 1 SyntaxError: Non-UTF-8 code starting with '\xf6' in file /usr/local/lib/python3.2/test/badsyntax_pep3120.py on line 1, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details The import statement uses parser.c, and tokenizer.c indirectly, to import a file, but the imp module uses tokenizer.c directly. They aren't consistent in how they handle errors because the different error messages are generated in different places depending on what the error is, *and* what the code path to get to that point was, *and* whether or not a filename was set. For the example above with imp.find_module(), the filename isn't set, so you get a different error than if you used import, which uses the parser module and that does set the filename. From what I've seen, it would help if the imp module was rewritten to use parser.c like the import statement does, rather than tokenizer.c directly. The error handling in parser.c is much better than tokenizer.c. Possibly tokenizer.c could be cleaned up after that and be made much simpler.
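For comparison, the pure-Python tokenize module reports an unknown coding cookie as a SyntaxError too. The sketch below feeds an in-memory copy of the cookie line from func_unknown_encoding.py (the second line is a stand-in) to tokenize.detect_encoding. It only illustrates why a bad encoding declaration surfaces as SyntaxError; it is not the C code path (tokenizer.c) that imp.find_module went through in 3.2.

```python
import io
import tokenize

# The coding cookie names a codec that does not exist ("IBO" instead of "ISO").
source = b"# -*- coding: IBO-8859-1 -*-\n__revision__ = ''\n"

raised = False
try:
    # detect_encoding reads at most two lines through the readline callable.
    encoding, _ = tokenize.detect_encoding(io.BytesIO(source).readline)
except SyntaxError as exc:
    raised = True
    encoding = None
    print("SyntaxError:", exc)
```

Because the input here is a bytes buffer rather than a file on disk, no filename is attached to the exception, which mirrors the "filename isn't set" situation described above.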
Ron Adam From merwok at netwok.org Thu Nov 25 18:53:54 2010 From: merwok at netwok.org (=?UTF-8?B?w4lyaWMgQXJhdWpv?=) Date: Thu, 25 Nov 2010 18:53:54 +0100 Subject: [Python-Dev] [Python-checkins] r86748 - in python/branches/py3k-urllib/Lib: http/client.py urllib/request.py In-Reply-To: References: <20101125081820.7FA2EEEA97@mail.python.org> <4CEE4CB4.6010107@netwok.org> Message-ID: <4CEEA2B2.1030306@netwok.org> >>> Modified: >>> python/branches/py3k-urllib/Lib/http/client.py >>> python/branches/py3k-urllib/Lib/urllib/request.py >> No tests? Misc/NEWS? :) > > Note that this is work in a separate branch. Ah, didn't notice that! Senthil replied as much in private email: > That was in a different branch.
Once stable shall definitely include > the tests and news. unconsciously-ignoring-svn-branches-to-preserve-sanity-ly yours, Éric From victor.stinner at haypocalc.com Thu Nov 25 22:39:00 2010 From: victor.stinner at haypocalc.com (Victor Stinner) Date: Thu, 25 Nov 2010 22:39:00 +0100 Subject: [Python-Dev] len(chr(i)) = 2? In-Reply-To: <4CE6F93F.9010109@egenix.com> References: <201011192123.14169.victor.stinner@haypocalc.com> <4CE6F93F.9010109@egenix.com> Message-ID: <201011252239.00288.victor.stinner@haypocalc.com> On Friday 19 November 2010 23:25:03 you wrote: > > Python is unclear about non-BMP characters: narrow build was called > > "ucs2" for long time, even if it is UTF-16 (each character is encoded to > > one or two UTF-16 words). > > No, no, no :-) > > UCS2 and UCS4 are more appropriate than "narrow" and "wide" or even > "UTF-16" and "UTF-32". Ok for Python 2: $ ./python Python 2.7.0+ (release27-maint:84618M, Sep 8 2010, 12:43:49) >>> import sys; sys.maxunicode 65535 >>> x=u'\U0010ffff'; len(x) 2 >>> ord(x) ...
TypeError: ord() expected a character, but string of length 2 found But Python 3 does use UTF-16 for narrow build: $ ./python Python 3.2a3+ (py3k:86396:86399M, Nov 10 2010, 15:24:09) >>> import sys; sys.maxunicode 65535 >>> c=chr(0x10ffff); len(c) 2 >>> ord(c) 1114111 Victor From merwok at netwok.org Fri Nov 26 02:32:43 2010 From: merwok at netwok.org (=?UTF-8?B?w4lyaWMgQXJhdWpv?=) Date: Fri, 26 Nov 2010 02:32:43 +0100 Subject: [Python-Dev] [Python-checkins] r86750 - python/branches/py3k/Demo/curses/life.py In-Reply-To: <20101125145644.D98FAEEA26@mail.python.org> References: <20101125145644.D98FAEEA26@mail.python.org> Message-ID: <4CEF0E3B.2070608@netwok.org> Hello, > Author: senthil.kumaran > Log: > Mouse support and colour to Demo/curses/life.py by Dafydd Crosby > > Modified: > python/branches/py3k/Demo/curses/life.py Okay, this time I'm reacting to the right branch > Modified: python/branches/py3k/Demo/curses/life.py > ============================================================================== > --- python/branches/py3k/Demo/curses/life.py (original) > +++ python/branches/py3k/Demo/curses/life.py Thu Nov 25 15:56:44 2010 > @@ -1,6 +1,7 @@ > #!/usr/bin/env python3 > # life.py -- A curses-based version of Conway's Game of Life. > # Contributed by AMK > +# Mouse support and colour by Dafydd Crosby Shouldn't his name rather be in Misc/ACKS too? Modules typically (warning: non-scientific data) include the name of the author or first contributors but not the name of every contributor. I think these cool features deserve a note in Misc/NEWS too :) Re: "colour": the rest of the file uses US English, as do the function names (see for example curses.has_colors). It's good to use one dialect consistently in one file.
going-back-to-stare-at-shiny-colors-ly yours, Éric From orsenthil at gmail.com Fri Nov 26 03:15:24 2010 From: orsenthil at gmail.com (Senthil Kumaran) Date: Fri, 26 Nov 2010 10:15:24 +0800 Subject: [Python-Dev] [Python-checkins] r86750 - python/branches/py3k/Demo/curses/life.py In-Reply-To: <4CEF0E3B.2070608@netwok.org> References: <20101125145644.D98FAEEA26@mail.python.org> <4CEF0E3B.2070608@netwok.org> Message-ID: <20101126021524.GA1450@rubuntu> On Fri, Nov 26, 2010 at 02:32:43AM +0100, Éric Araujo wrote: > Shouldn't his name rather be in Misc/ACKS too? Modules typically > (warning: non-scientific data) include the name of the author or first > contributors but not the name of every contributor. > > I think these cool features deserve a note in Misc/NEWS too :) I don't think it is required. Demo stuffs are usually fun demonstrations. The contributor had added his name to the patch in the header, and I just left it like that. It's fine. For features and important patches (subjective), Misc/{ACKS,NEWS} are both added. > Re: "colour": the rest of the file uses US English, as do the function > names (see for example curses.has_colors). It's good to use one dialect > consistently in one file. Good catch. Did not realize it because we write it as colour too. Changing it. Thanks, Senthil From stephen at xemacs.org Fri Nov 26 03:42:33 2010 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Fri, 26 Nov 2010 11:42:33 +0900 Subject: [Python-Dev] len(chr(i)) = 2?
In-Reply-To: <4CEE318D.5000705@egenix.com> References: <201011192123.14169.victor.stinner@haypocalc.com> <4CE6F93F.9010109@egenix.com> <4CE6FE30.1050903@v.loewis.de> <87hbfc1vnf.fsf@uwakimon.sk.tsukuba.ac.jp> <4CE78F62.7060707@v.loewis.de> <8739qukf9r.fsf@uwakimon.sk.tsukuba.ac.jp> <20101121173825.B1BFB235977@kimball.webabinitio.net> <60F8726F-C1C2-4803-8B8E-688EF0443FA0@gmail.com> <87eiadd46t.fsf@uwakimon.sk.tsukuba.ac.jp> <4CED5E91.9070705@egenix.com> <4CEE318D.5000705@egenix.com> Message-ID: <87fwuo7qli.fsf@uwakimon.sk.tsukuba.ac.jp> M.-A. Lemburg writes: > That would be a possibility as well... but I doubt that many users > are going to bother, since slicing surrogates is just as bad as > slicing combining code points and the latter are much more common in > real life and they do happen to mostly live in the BMP. That's only if you require 100% fidelity in the data, which may not be true in some use cases. Where 99.99% fidelity is good enough, an unexpected sliced surrogate pair is a show-stopper, while a sliced combining character sequence not only doesn't stop the show (at least in Python, and I doubt any correct Unicode process can signal a fatal error there either, I can put a tilde on a Cyrillic character if I want to, no?), it's probably readable enough that readers will assume a keypunch error. Personally, if available I would always use some such dodge in server software (I don't care enough about 24x7 availability to write it myself, though). And never in a script for interactive use; something needs fixing, may as well take the fatal error and fix it on the spot. (Again, "on the spot" for me can mean "tomorrow".) From stephen at xemacs.org Fri Nov 26 04:02:09 2010 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Fri, 26 Nov 2010 12:02:09 +0900 Subject: [Python-Dev] len(chr(i)) = 2? 
In-Reply-To: <4CEE32FD.90507@egenix.com> References: <201011192123.14169.victor.stinner@haypocalc.com> <4CE6F93F.9010109@egenix.com> <4CE6FE30.1050903@v.loewis.de> <87hbfc1vnf.fsf@uwakimon.sk.tsukuba.ac.jp> <4CE78F62.7060707@v.loewis.de> <8739qukf9r.fsf@uwakimon.sk.tsukuba.ac.jp> <20101121173825.B1BFB235977@kimball.webabinitio.net> <60F8726F-C1C2-4803-8B8E-688EF0443FA0@gmail.com> <87eiadd46t.fsf@uwakimon.sk.tsukuba.ac.jp> <4CED5E91.9070705@egenix.com> <87bp5eb0zb.fsf@uwakimon.sk.tsukuba.ac.jp> <4CEE32FD.90507@egenix.com> Message-ID: <87eia87pou.fsf@uwakimon.sk.tsukuba.ac.jp> M.-A. Lemburg writes: > Please note that we can only provide one way of string indexing > in Python using the standard s[1] notation and since we don't > want that operation to be fast and no more than O(1), using the > code units as items is the only reasonable way to implement it. AFAICT, the "we" that wants "no more than O(1)" does not include Glyph Lefkowitz, James Knight, and Greg Ewing. Greg even said that in designing a UTF-8 string type he might not provide an indexing operation at all. (Caution: That may not be what he meant; I'm just reporting the way I interpreted it.) Of course none of them are proposing to change Python, that's all in the context of designing a new language. But it does suggest that a lot of people can't think of use cases where O(1) string indexing is more important than Unicode robustness. > It is by far more important to maintain round-trip safety for > Unicode data, than getting every bit of code work correctly > with surrogates (often, there won't be a single correct way). But surely it's more important than that to ensure that surrogates can't crash a Python process with unexpected UnicodeErrors? 
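[Editor's illustration, not part of the original thread.] The surrogate arithmetic behind Victor's narrow-build session (len(chr(0x10ffff)) == 2) and the failure mode Stephen worries about can be sketched as below. This is a minimal sketch under stated assumptions: the helper names (to_surrogates, from_surrogates, utf16_units, can_encode_utf8) are hypothetical and made up for this note, and it assumes a current CPython (3.3 or later), where narrow builds no longer exist, so the UTF-16 code units are obtained by encoding rather than by indexing the str.

```python
# Hypothetical helpers for illustration only; none of these names come
# from the thread or from the standard library.

def to_surrogates(code_point):
    """Split a code point above the BMP into a UTF-16 (high, low) pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points above the BMP need surrogates")
    offset = code_point - 0x10000          # 20 significant bits remain
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)


def from_surrogates(high, low):
    """Recombine a (high, low) surrogate pair into one code point."""
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


def utf16_units(text):
    """Return the UTF-16 code units of text as a list of integers."""
    data = text.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little")
            for i in range(0, len(data), 2)]


def can_encode_utf8(text):
    """Report whether text survives a strict UTF-8 encode."""
    try:
        text.encode("utf-8")
    except UnicodeEncodeError:
        return False
    return True


high, low = to_surrogates(0x10FFFF)
units = utf16_units(chr(0x10FFFF))   # what a narrow build stored per item
lone_half = chr(high)                # what slicing between the units leaves
```

Slicing a narrow-build string between the two units would leave something like lone_half behind, and a strict UTF-8 encode of that half raises UnicodeEncodeError, which is the "sliced surrogate pair is a show-stopper" case; a sliced combining sequence, by contrast, still encodes.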
From jcea at jcea.es Fri Nov 26 05:11:56 2010 From: jcea at jcea.es (Jesus Cea) Date: Fri, 26 Nov 2010 05:11:56 +0100 Subject: [Python-Dev] Question about GDB bindings and 32/64 bits Message-ID: <4CEF338C.4070509@jcea.es> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 I have installed GDB 7.2 32 bits and 32 bits buildslaves are green. Nevertheless 64 bits buildslaves are failing test_gdb. Is there any expectation that a 32 bits GDB be able to debug a 64 bits python?. If not, gdb test should compare "platform.architecture()" (for python and gdb in the system) and run only when they are the same. If this should work, I would open a bug and maybe spend some time with it. But before thinking about investing time, I would like to know if this mix is actually expected or not to work. If not, I would consider to install a 64 bits GDB too and do some tricks (like using an "/usr/local/bin/gdb" script wrapper to choose 32/64 "real" gdb version) to actually execute "test_gdb" in both buildslaves (they are running in the same physical machine). Any advice? PS: I am talking about AMD64 OpenIndiana buildbots. Haven't check others. - -- Jesus Cea Avion _/_/ _/_/_/ _/_/_/ jcea at jcea.es - http://www.jcea.es/ _/_/ _/_/ _/_/ _/_/ _/_/ jabber / xmpp:jcea at jabber.org _/_/ _/_/ _/_/_/_/_/ . 
_/_/ _/_/ _/_/ _/_/ _/_/ "Things are not so easy" _/_/ _/_/ _/_/ _/_/ _/_/ _/_/ "My name is Dump, Core Dump" _/_/_/ _/_/_/ _/_/ _/_/ "El amor es poner tu felicidad en la felicidad de otro" - Leibniz -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iQCVAwUBTO8zjJlgi5GaxT1NAQLusgP9GVuhvQJWhPqjzdkZnrMObQg0AD6ggbIR 2B4IstFpD1bKvIcGPJv0Irk3+heaQuFbTzYVLC132d89Ektfib9ZbJ/hzJz2wqd2 lnkfNUCV0tKal3P7kbGYUk828glIrlufSuF1HYIknd2BAzHFl5Zf6q5/AXzYr90D v4Y82b7Wg0k= =NHcR -----END PGP SIGNATURE----- From glyph at twistedmatrix.com Fri Nov 26 08:21:26 2010 From: glyph at twistedmatrix.com (Glyph Lefkowitz) Date: Fri, 26 Nov 2010 02:21:26 -0500 Subject: [Python-Dev] len(chr(i)) = 2? In-Reply-To: <87mxozayam.fsf@uwakimon.sk.tsukuba.ac.jp> References: <201011192123.14169.victor.stinner@haypocalc.com> <4CE6F93F.9010109@egenix.com> <4CE6FE30.1050903@v.loewis.de> <87hbfc1vnf.fsf@uwakimon.sk.tsukuba.ac.jp> <4CE78F62.7060707@v.loewis.de> <8739qukf9r.fsf@uwakimon.sk.tsukuba.ac.jp> <20101121173825.B1BFB235977@kimball.webabinitio.net> <60F8726F-C1C2-4803-8B8E-688EF0443FA0@gmail.com> <87eiadd46t.fsf@uwakimon.sk.tsukuba.ac.jp> <4CEC5316.4010608@canterbury.ac.nz> <77AAC178-F868-4F05-8509-4A9FB66F61EC@fuhm.net> <87sjyrbftz.fsf@uwakimon.sk.tsukuba.ac.jp> <635C265A-90A8-4B92-A65C-59EF3E8EFD68@twistedmatrix.com> <87oc9fb97b.fsf@uwakimon.sk.tsukuba.ac.jp> <3C1ADB64-63F3-4165-926D-EDE9846E0DBD@fuhm.net> <87mxozayam.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On Nov 24, 2010, at 4:03 AM, Stephen J. Turnbull wrote: > You end up proliferating types that all do the same kind of thing. Judicious use of inheritance helps, but getting the fundamental abstraction right is hard. Or least, Emacs hasn't found it in 20 years of trying. Emacs hasn't even figured out how to do general purpose iteration in 20 years of trying either. 
The easiest way I've found to loop across an arbitrary pile of 'stuff' is the CL 'loop' macro, which you're not even supposed to use. Even then, you still have to make the arcane and pointless distinction of using 'across' or 'in' or 'on'. Python, on the other hand, has iteration pretty well tied up nicely in a bow. I don't know how to respond to the rest of your argument. Nothing you've said has in any way indicated to me why having code-point offsets is a good idea, only that people who know C and elisp would rather sling around piles of integers than have good abstract types. For example: > I think it more likely that markers are very expense to create and use compared to integers. What? When you do 'for x in str' in python, you are already creating an iterator object, which has to store the exact same amount of state that our proposed 'marker' or 'character pointer' would have to store. The proposed UTF-8 marker would have to do a tiny bit more work when iterating because it would have to combine multibyte characters, but in exchange for that you get to skip a whole ton of copying when encoding and decoding. How is this expensive to create and use? For every application I have ever designed, encountered, or can even conjecture about, this would be cheaper. (Assuming not just a UTF-8 string type, but one for UTF-16 as well, where native data is in that format already.) For what it's worth, not wanting to use abstract types in Emacs makes sense to me: I've written my share of elisp code, and it is hard to create reasonable abstractions in Emacs, because the facilities for defining types and creating polymorphic logic are so crude. It's a lot easier to just assume your underlying storage is an array, because at the end of the day you're going to need to call some functions on it which care whether it's an array or an alist or a list or a vector anyway, so you might as well just say so up front. 
But in Python we could just call 'mystring.by_character()' or 'mystring.by_codepoint()' and get an iterator object back and forget about all that junk. From glyph at twistedmatrix.com Fri Nov 26 08:51:35 2010 From: glyph at twistedmatrix.com (Glyph Lefkowitz) Date: Fri, 26 Nov 2010 02:51:35 -0500 Subject: [Python-Dev] len(chr(i)) = 2? In-Reply-To: <87ipzm6oqr.fsf@uwakimon.sk.tsukuba.ac.jp> References: <201011192123.14169.victor.stinner@haypocalc.com> <4CE6F93F.9010109@egenix.com> <4CE6FE30.1050903@v.loewis.de> <87hbfc1vnf.fsf@uwakimon.sk.tsukuba.ac.jp> <4CE78F62.7060707@v.loewis.de> <8739qukf9r.fsf@uwakimon.sk.tsukuba.ac.jp> <20101121173825.B1BFB235977@kimball.webabinitio.net> <60F8726F-C1C2-4803-8B8E-688EF0443FA0@gmail.com> <87eiadd46t.fsf@uwakimon.sk.tsukuba.ac.jp> <4CEC5316.4010608@canterbury.ac.nz> <77AAC178-F868-4F05-8509-4A9FB66F61EC@fuhm.net> <87sjyrbftz.fsf@uwakimon.sk.tsukuba.ac.jp> <635C265A-90A8-4B92-A65C-59EF3E8EFD68@twistedmatrix.com> <87oc9fb97b.fsf@uwakimon.sk.tsukuba.ac.jp> <3C1ADB64-63F3-4165-926D-EDE9846E0DBD@fuhm.net> <87mxozayam.fsf@uwakimon.sk.tsukuba.ac.jp> <4CEDCB86.9030506@canterbury.ac.nz> <87ipzm6oqr.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On Nov 24, 2010, at 10:55 PM, Stephen J. Turnbull wrote: > Greg Ewing writes: >> On 24/11/10 22:03, Stephen J. Turnbull wrote: >>> But >>> if you actually need to remember positions, or regions, to jump to >>> later or to communicate to other code that manipulates them, doing >>> this stuff the straightforward way (just copying the whole iterator >>> object to hang on to its state) becomes expensive. 
>> >> If the internal representation of a text pointer (I won't call it >> an iterator because that means something else in Python) is a byte >> offset or something similar, it shouldn't take up any more space >> than a Python int, which is what you'd be using anyway if you >> represented text positions by grapheme indexes or whatever. > > That's not necessarily true. Eg, in Emacs ("there you go again"), > Lisp integers are not only immediate (saving one pointer), but the > type is encoded in the lower bits, so that there is no need for a type > pointer -- the representation is smaller than the opaque marker type. > Altogether, up to 8 of 12 bytes saved on a 32-bit platform, or 16 of > 24 bytes on a 64-bit platform. Yes, yes, lisp is very clever. Maybe some other runtime, like PyPy, could make this optimization. But I don't think that anyone is filling up main memory with gigantic piles of character indexes and need to squeeze out that extra couple of bytes of memory on such a tiny object. Plus, this would allow such a user to stop copying the character data itself just to decode it, and on mostly-ascii UTF-8 text (a common use-case) this is a 2x savings right off the bat. > In Python it's true that markers can use the same data structure as > integers and simply provide different methods, and it's arguable that > Python's design is better. But if you use bytes internally, then you > have problems. No, you just have design questions. > Do you expose that byte value to the user? Yes, but only if they ask for it. It's useful for computing things like quota and the like. > Can users (programmers using the language and end users) specify positions in terms of byte values? Sure, why not? > If so, what do you do if the user specifies a byte value that points into a multibyte character? Go to the beginning of the multibyte character. 
Report that position; if the user then asks the requested marker object for its position, it will report that byte offset, not the originally-requested one. (Obviously, do the same thing for surrogate pair code points.) > What if the user wants to specify position by number of characters? Part of the point that we are trying to make here is that nobody really cares about that use-case. In order to know anything useful about a position in a text, you have to have traversed to that location in the text. You can remember interesting things like the offsets of starts of lines, or the x/y positions of characters. > Can you translate efficiently? No, because there's no point :). But you _could_ implement an overlay that cached things like the beginning of lines, or the x/y positions of interesting characters. > As I say elsewhere, it's possible that there really never is a need to efficiently specify an absolute position in a large text as a character (grapheme, whatever) count. > But I think it would be hard to implement an efficient text-processing *language*, eg, a Python module > for *full conformance* in handling Unicode, on top of UTF-8. Still: why? I guess if I have some free time I'll try my hand at it, and maybe I'll run into a wall and realize you're right :). > Any time you have an algorithm that requires efficient access to arbitrary text positions, you'll spend all your skull sweat fighting the representation. At least, that's been my experience with Emacsen. What sort of algorithm would that be, though? The main thing that I could think of is a text editor trying to efficiently allow the user to scroll to the middle of a large file without reading the whole thing into memory. But, in that case, you could use byte-positions to estimate, and display an heuristic number while calculating the real line numbers. (This is what 'less' does, and it seems to work well.) >> So I don't really see what you're arguing for here. 
How do >> *you* think positions in unicode strings should be represented? > > I think what users should see is character positions, and they should > be able to specify them numerically as well as via an opaque marker > object. I don't care whether that position is represented as bytes or > characters internally, except that the experience of Emacsen is that > representation as byte positions is both inefficient and fragile. The > representation as character positions is more robust but slightly more > inefficient. Is it really the representation as byte positions which is fragile (i.e. the internal implementation detail), or the exposure of that position to calling code, and the idiomatic usage of that number as an integer? From facundobatista at gmail.com Fri Nov 26 16:05:09 2010 From: facundobatista at gmail.com (Facundo Batista) Date: Fri, 26 Nov 2010 12:05:09 -0300 Subject: [Python-Dev] [Preview] Comments and change proposals on documentation In-Reply-To: References: Message-ID: On Wed, Nov 24, 2010 at 5:24 PM, Georg Brandl wrote: > at , you can look at a version of the 3.2 > docs that has the upcoming commenting feature. JavaScript is mandatory. This is awesome!! Thanks for this work, remember to buy you a beer next PyCon! > Credits go to Jacob Mason, whose GSOC project is responsible for almost all > of what you see there. [1] Ok, two beers. -- . 
Facundo Blog: http://www.taniquetil.com.ar/plog/ PyAr: http://www.python.org/ar/ From ocean-city at m2.ccsnet.ne.jp Fri Nov 26 17:33:50 2010 From: ocean-city at m2.ccsnet.ne.jp (Hirokazu Yamamoto) Date: Sat, 27 Nov 2010 01:33:50 +0900 Subject: [Python-Dev] Removal of Win32 ANSI API In-Reply-To: <201011140106.55153.victor.stinner@haypocalc.com> References: <4CDC14C0.6070300@m2.ccsnet.ne.jp> <201011121308.30368.victor.stinner@haypocalc.com> <4CDEBB11.5050209@m2.ccsnet.ne.jp> <201011140106.55153.victor.stinner@haypocalc.com> Message-ID: <4CEFE16E.6040801@m2.ccsnet.ne.jp> On 2010/11/14 9:06, Victor Stinner wrote: > Yes, but how do you check if the input argument is a bytes or a str object > with your PyArg_Parse converter? You should use "O" format and manually > convert it to unicode, and then convert the result back to bytes (if the input > was bytes). It don't think that it makes the code shorter. > > The code is currently working. The question is if we have to drop the ANSI API > now, later or never. It looks like the decision moves to "later" (deprecate in > 3.2, remove in 3.3). I still think that drop now doesn't really hurt. > > Victor Humble thoughts... Is it possible a conversion from bytes (ANSI) to unicode fails on windows? If not, is it allowed to convert to unicode with PyUnicode_FSDecoder if function doesn't return str? For example, os.stat() takes str as arguments but doesn't return str. # I noticed win_readlink() in Modules/posixmodule.c already unicode # only. Maybe not so much problem? 
;-) From ocean-city at m2.ccsnet.ne.jp Fri Nov 26 18:06:06 2010 From: ocean-city at m2.ccsnet.ne.jp (Hirokazu Yamamoto) Date: Sat, 27 Nov 2010 02:06:06 +0900 Subject: [Python-Dev] Removal of Win32 ANSI API In-Reply-To: <201011111718.08207.eckhardt@satorlaser.com> References: <4CDC14C0.6070300@m2.ccsnet.ne.jp> <201011111718.08207.eckhardt@satorlaser.com> Message-ID: <4CEFE8FE.8060201@m2.ccsnet.ne.jp> On 2010/11/12 1:18, Ulrich Eckhardt wrote: >> # I recently did it for winsound.PlaySound with MvL's approval > > Interesting, is there a ticket associate with this? Also, was that on Python 3 > or 2? Which commits? Sorry for late posting. Rev 86300 and Issue 6317. From status at bugs.python.org Fri Nov 26 18:07:01 2010 From: status at bugs.python.org (Python tracker) Date: Fri, 26 Nov 2010 18:07:01 +0100 (CET) Subject: [Python-Dev] Summary of Python tracker Issues Message-ID: <20101126170701.EDA80104026@psf.upfronthosting.co.za> ACTIVITY SUMMARY (2010-11-19 - 2010-11-26) Python tracker at http://bugs.python.org/ To view or respond to any of the issues listed below, click on the issue. Do NOT respond to this message. 
Issues counts and deltas: open 2533 (-16) closed 19792 (+98) total 22325 (+82) Open issues with patches: 1083 Issues opened (66) ================== #1178: IDLE - add "paste code" functionality http://bugs.python.org/issue1178 reopened by ned.deily #3709: BaseHTTPRequestHandler innefficient when sending HTTP header http://bugs.python.org/issue3709 reopened by r.david.murray #5150: IDLE to support reindent.py http://bugs.python.org/issue5150 reopened by rhettinger #8879: Implement os.link on Windows http://bugs.python.org/issue8879 reopened by amaury.forgeotdarc #9769: PyUnicode_FromFormatV() doesn't handle non-ascii text correctl http://bugs.python.org/issue9769 reopened by belopolsky #10220: Make generator state easier to introspect http://bugs.python.org/issue10220 reopened by ncoghlan #10268: Add --enable-loadable-sqlite-extensions option to `configure` http://bugs.python.org/issue10268 reopened by ned.deily #10441: some stdlib modules need to be updated to handle SSL certifica http://bugs.python.org/issue10441 reopened by pitrou #10453: Add -h/--help option to compileall http://bugs.python.org/issue10453 reopened by eric.araujo #10464: netrc module not parsing passwords containing #s. 
http://bugs.python.org/issue10464 opened by the_isz #10466: locale.py resetlocale throws exception on Windows (getdefaultl http://bugs.python.org/issue10466 opened by skoczian #10469: test_socket fails using Visual Studio 2010 http://bugs.python.org/issue10469 opened by Kotan #10475: hardcoded compilers for LDSHARED/LDCXXSHARED on NetBSD http://bugs.python.org/issue10475 opened by njoly #10478: Ctrl-C locks up the interpreter http://bugs.python.org/issue10478 opened by isandler #10479: cgitb.py should assume a binary stream for output http://bugs.python.org/issue10479 opened by v+python #10480: cgi.py should document the need for binary stdin/stdout http://bugs.python.org/issue10480 opened by v+python #10481: subprocess PIPEs are byte streams http://bugs.python.org/issue10481 opened by v+python #10482: subprocess and deadlock avoidance http://bugs.python.org/issue10482 opened by v+python #10483: http.server - what is executable on Windows http://bugs.python.org/issue10483 opened by v+python #10484: http.server.is_cgi fails to handle CGI URLs containing PATH_IN http://bugs.python.org/issue10484 opened by v+python #10485: http.server fails when query string contains addition '?' char http://bugs.python.org/issue10485 opened by v+python #10486: http.server doesn't set all CGI environment variables http://bugs.python.org/issue10486 opened by v+python #10487: http.server - doesn't process Status: header from CGI scripts http://bugs.python.org/issue10487 opened by v+python #10492: test_doctest fails with iso-8859-15 locale http://bugs.python.org/issue10492 opened by pitrou #10494: Demo/comparisons/regextest.py needs some usage information. http://bugs.python.org/issue10494 opened by ramiroluz #10495: Demo/comparisons/sortingtest.py needs some usage information. 
http://bugs.python.org/issue10495 opened by ramiroluz #10496: "import site failed" when Python can't find home directory http://bugs.python.org/issue10496 opened by bbi5291 #10497: Incorrect use of gettext in argparse http://bugs.python.org/issue10497 opened by eric.araujo #10498: calendar.LocaleHTMLCalendar.formatyearpage() results in traceb http://bugs.python.org/issue10498 opened by r.david.murray #10499: Modular interpolation in configparser http://bugs.python.org/issue10499 opened by lukasz.langa #10500: Palevo.DZ worm msix86 installer 3.x installer http://bugs.python.org/issue10500 opened by VilIgnoble #10502: Add unittestguirunner to Tools/ http://bugs.python.org/issue10502 opened by michael.foord #10503: os.getuid() documentation should be clear on what kind of uid http://bugs.python.org/issue10503 opened by giampaolo.rodola #10504: Trivial mingw compile fixes http://bugs.python.org/issue10504 opened by jonny #10507: Check well-formedness of reST markup within "make patchcheck" http://bugs.python.org/issue10507 opened by dmalcolm #10509: PyTokenizer_FindEncoding can lead to a segfault if bad charact http://bugs.python.org/issue10509 opened by Trundle #10510: distutils upload/register should use CRLF in HTTP requests http://bugs.python.org/issue10510 opened by Brian.Jones #10512: regrtest ResourceWarning - unclosed sockets and files http://bugs.python.org/issue10512 opened by nvawda #10513: sqlite3.InterfaceError after commit http://bugs.python.org/issue10513 opened by anders.blomdell at control.lth.se #10514: configure does not create accurate Makefile http://bugs.python.org/issue10514 opened by daelious #10515: csv sniffer does not recognize quotes at the end of line http://bugs.python.org/issue10515 opened by Martin.Budaj #10516: Add list.clear() and list.copy() http://bugs.python.org/issue10516 opened by terry.reedy #10517: test_concurrent_futures crashes with "Fatal Python error: Inva http://bugs.python.org/issue10517 opened by lukasz.langa #10518: 
Bring back callable() http://bugs.python.org/issue10518 opened by pitrou #10519: setobject.c no-op typo http://bugs.python.org/issue10519 opened by arigo #10521: str methods don't accept non-BMP fillchar on a narrow Unicode http://bugs.python.org/issue10521 opened by belopolsky #10522: test_telnet exception http://bugs.python.org/issue10522 opened by pitrou #10523: argparse has problem parsing option files containing empty row http://bugs.python.org/issue10523 opened by Michal.Pomorski #10524: Patch to add Pardus to supported dists in platform http://bugs.python.org/issue10524 opened by zaburt #10527: multiprocessing.Pipe problem: "handle out of range in select() http://bugs.python.org/issue10527 opened by synapse #10528: argparse uses %s in gettext calls http://bugs.python.org/issue10528 opened by eric.araujo #10529: Write argparse i18n howto http://bugs.python.org/issue10529 opened by eric.araujo #10530: distutils2 should allow the installing of python files with in http://bugs.python.org/issue10530 opened by michael.foord #10531: write tilted text in turtle http://bugs.python.org/issue10531 opened by lanyjie #10532: A bug related to matching the empty string http://bugs.python.org/issue10532 opened by lanyjie #10533: Need example of using __missing__ http://bugs.python.org/issue10533 opened by lukasz.langa #10534: difflib.SequenceMatcher: expose junk sets, deprecate undocumen http://bugs.python.org/issue10534 opened by terry.reedy #10535: Enable warnings by default in unittest http://bugs.python.org/issue10535 opened by ezio.melotti #10536: Enhancements to gettext docs http://bugs.python.org/issue10536 opened by eric.araujo #10537: IDLE crashes when you paste something. 
http://bugs.python.org/issue10537 opened by 5ragar5 #10538: PyArg_ParseTuple("s*") does not always incref object http://bugs.python.org/issue10538 opened by krisvale #10539: Regular expression not checking 'range' element on 1st char in http://bugs.python.org/issue10539 opened by TxRxFx #10540: test_shutil fails on Windows after r86733 http://bugs.python.org/issue10540 opened by brian.curtin #10541: regrtest.py -T broken http://bugs.python.org/issue10541 opened by doerwalter #10542: Py_UNICODE_NEXT and other macros for surrogates http://bugs.python.org/issue10542 opened by belopolsky #10543: Test discovery (unittest) does not work with jython http://bugs.python.org/issue10543 opened by michael.foord Most recent 15 issues with no replies (15) ========================================== #10543: Test discovery (unittest) does not work with jython http://bugs.python.org/issue10543 #10542: Py_UNICODE_NEXT and other macros for surrogates http://bugs.python.org/issue10542 #10541: regrtest.py -T broken http://bugs.python.org/issue10541 #10539: Regular expression not checking 'range' element on 1st char in http://bugs.python.org/issue10539 #10538: PyArg_ParseTuple("s*") does not always incref object http://bugs.python.org/issue10538 #10537: IDLE crashes when you paste something. 
http://bugs.python.org/issue10537 #10536: Enhancements to gettext docs http://bugs.python.org/issue10536 #10534: difflib.SequenceMatcher: expose junk sets, deprecate undocumen http://bugs.python.org/issue10534 #10531: write tilted text in turtle http://bugs.python.org/issue10531 #10530: distutils2 should allow the installing of python files with in http://bugs.python.org/issue10530 #10523: argparse has problem parsing option files containing empty row http://bugs.python.org/issue10523 #10522: test_telnet exception http://bugs.python.org/issue10522 #10514: configure does not create accurate Makefile http://bugs.python.org/issue10514 #10507: Check well-formedness of reST markup within "make patchcheck" http://bugs.python.org/issue10507 #10499: Modular interpolation in configparser http://bugs.python.org/issue10499 Most recent 15 issues waiting for review (15) ============================================= #10542: Py_UNICODE_NEXT and other macros for surrogates http://bugs.python.org/issue10542 #10540: test_shutil fails on Windows after r86733 http://bugs.python.org/issue10540 #10536: Enhancements to gettext docs http://bugs.python.org/issue10536 #10535: Enable warnings by default in unittest http://bugs.python.org/issue10535 #10527: multiprocessing.Pipe problem: "handle out of range in select() http://bugs.python.org/issue10527 #10524: Patch to add Pardus to supported dists in platform http://bugs.python.org/issue10524 #10521: str methods don't accept non-BMP fillchar on a narrow Unicode http://bugs.python.org/issue10521 #10518: Bring back callable() http://bugs.python.org/issue10518 #10515: csv sniffer does not recognize quotes at the end of line http://bugs.python.org/issue10515 #10512: regrtest ResourceWarning - unclosed sockets and files http://bugs.python.org/issue10512 #10509: PyTokenizer_FindEncoding can lead to a segfault if bad charact http://bugs.python.org/issue10509 #10504: Trivial mingw compile fixes http://bugs.python.org/issue10504 #10499: Modular 
interpolation in configparser http://bugs.python.org/issue10499 #10498: calendar.LocaleHTMLCalendar.formatyearpage() results in traceb http://bugs.python.org/issue10498 #10497: Incorrect use of gettext in argparse http://bugs.python.org/issue10497 Top 10 most discussed issues (10) ================================= #10461: Use with statement throughout the docs http://bugs.python.org/issue10461 27 msgs #7995: On Mac / BSD sockets returned by accept inherit the parent's F http://bugs.python.org/issue7995 24 msgs #10453: Add -h/--help option to compileall http://bugs.python.org/issue10453 24 msgs #9915: speeding up sorting with a key http://bugs.python.org/issue9915 14 msgs #9742: Python 2.7: math module fails to build on Solaris 9 http://bugs.python.org/issue9742 13 msgs #10533: Need example of using __missing__ http://bugs.python.org/issue10533 13 msgs #9509: argparse FileType raises ugly exception for missing file http://bugs.python.org/issue9509 12 msgs #10469: test_socket fails using Visual Studio 2010 http://bugs.python.org/issue10469 12 msgs #10504: Trivial mingw compile fixes http://bugs.python.org/issue10504 12 msgs #10518: Bring back callable() http://bugs.python.org/issue10518 12 msgs Issues closed (92) ================== #2244: urllib and urllib2 decode userinfo multiple times http://bugs.python.org/issue2244 closed by orsenthil #2986: difflib.SequenceMatcher not matching long sequences http://bugs.python.org/issue2986 closed by terry.reedy #3292: Position index limit; s.insert(i,x) not same as s[i:i]=[x] http://bugs.python.org/issue3292 closed by rhettinger #4493: urllib2 doesn't always supply / where URI path component is em http://bugs.python.org/issue4493 closed by orsenthil #4925: Improve error message of subprocess when cannot open http://bugs.python.org/issue4925 closed by benjamin.peterson #5353: Improve IndexError messages with actual values http://bugs.python.org/issue5353 closed by rhettinger #5412: extend configparser to support mapping 
access(__*item__) http://bugs.python.org/issue5412 closed by lukasz.langa #5616: Distutils 2to3 support doesn't have the doctest_only flag. http://bugs.python.org/issue5616 closed by eric.araujo #6166: encoding error for 'setup.py --author' when read via subproces http://bugs.python.org/issue6166 closed by eric.araujo #6378: Patch to make 'idle.bat' run idle.pyw using appropriate Python http://bugs.python.org/issue6378 closed by brian.curtin #6466: duplicate get_version() code between cygwinccompiler and emxcc http://bugs.python.org/issue6466 closed by eric.araujo #6722: collections.namedtuple: confusing example http://bugs.python.org/issue6722 closed by rhettinger #6799: mimetypes does not give canonical extension for guess_extensio http://bugs.python.org/issue6799 closed by eric.araujo #6878: changed return type from tkinter.Canvas.coords http://bugs.python.org/issue6878 closed by belopolsky #7212: Retrieve an arbitrary element from a set without removing it http://bugs.python.org/issue7212 closed by rhettinger #7226: IDLE right-clicks don't work on Mac OS 10.5 http://bugs.python.org/issue7226 closed by ned.deily #7257: Improve documentation of list.sort and sorted() http://bugs.python.org/issue7257 closed by rhettinger #7645: test_distutils fails on Windows XP http://bugs.python.org/issue7645 closed by brian.curtin #7770: sin/cos function in decimal-docs http://bugs.python.org/issue7770 closed by rhettinger #7804: test_readline failure http://bugs.python.org/issue7804 closed by pitrou #8078: add more baud constants to termios http://bugs.python.org/issue8078 closed by pitrou #8340: bytearray undocumented on trunk http://bugs.python.org/issue8340 closed by pitrou #8381: IDLE 2.6 freezes on OS X 10.6 http://bugs.python.org/issue8381 closed by ned.deily #8569: Upgrade OpenSSL in Windows builds http://bugs.python.org/issue8569 closed by brian.curtin #8590: test_httpservers.CGIHTTPServerTestCase failure on 3.1-maint Ma http://bugs.python.org/issue8590 closed by 
michael.foord #8631: subprocess.Popen.communicate(...) hangs on Windows http://bugs.python.org/issue8631 closed by brian.curtin #8645: PyUnicode_AsEncodedObject is undocumented http://bugs.python.org/issue8645 closed by belopolsky #8646: PyUnicode_EncodeDecimal is undocumented http://bugs.python.org/issue8646 closed by belopolsky #8647: PyUnicode_GetMax is undocumented http://bugs.python.org/issue8647 closed by eric.araujo #8705: shutil.rmtree with empty filepath http://bugs.python.org/issue8705 closed by brian.curtin #8938: Mac OS dialogs(Save As..., Load) translation http://bugs.python.org/issue8938 closed by ned.deily #9222: IDLE: Fix open/saveas 'Files of type' choices http://bugs.python.org/issue9222 closed by terry.reedy #9500: urllib2: Content-Encoding http://bugs.python.org/issue9500 closed by r.david.murray #9732: Addition of getattr_static for inspect module http://bugs.python.org/issue9732 closed by michael.foord #9746: All sequence types support .index and .count http://bugs.python.org/issue9746 closed by eric.araujo #9802: Document 'stability' of builtin min() and max() http://bugs.python.org/issue9802 closed by rhettinger #9807: deriving configuration information for different builds with t http://bugs.python.org/issue9807 closed by barry #9846: ZipExtFile provides no mechanism for closing the underlying fi http://bugs.python.org/issue9846 closed by lukasz.langa #9852: test_ctypes fail with clang http://bugs.python.org/issue9852 closed by ned.deily #9876: ConfigParser can't interpolate values from other sections http://bugs.python.org/issue9876 closed by lukasz.langa #9965: Loading malicious pickle may cause excessive memory usage http://bugs.python.org/issue9965 closed by georg.brandl #10134: test_email failures on Windows: end of line issue? 
http://bugs.python.org/issue10134 closed by r.david.murray #10138: calendar module does not support years outside [1, 9999] range http://bugs.python.org/issue10138 closed by belopolsky #10164: Add an assertBytesEqual to unittest and use it for bytes asser http://bugs.python.org/issue10164 closed by rhettinger #10172: code block has no syntax coloring http://bugs.python.org/issue10172 closed by georg.brandl #10183: test_concurrent_futures failure on Windows http://bugs.python.org/issue10183 closed by bquinlan #10255: refleak in initstdio http://bugs.python.org/issue10255 closed by pitrou #10299: Add index with links section for built-in functions http://bugs.python.org/issue10299 closed by ezio.melotti #10319: SocketServer.TCPServer truncates responses on close (in some s http://bugs.python.org/issue10319 closed by orsenthil #10325: PY_LLONG_MAX & co - preprocessor constants or not? http://bugs.python.org/issue10325 closed by mark.dickinson #10366: Remove unneeded '(object)' from 3.x class examples http://bugs.python.org/issue10366 closed by eric.araujo #10371: Deprecate trace module undocumented API http://bugs.python.org/issue10371 closed by belopolsky #10377: cProfile incorrectly labels its output http://bugs.python.org/issue10377 closed by orsenthil #10391: obj2ast's error handling can lead to python crashing with a C- http://bugs.python.org/issue10391 closed by benjamin.peterson #10420: Document of Bdb.effective is wrong. http://bugs.python.org/issue10420 closed by georg.brandl #10430: _sha.sha().digest() method is endian-sensitive. 
and hexdigest( http://bugs.python.org/issue10430 closed by krisvale #10437: ThreadPoolExecutor should accept max_workers=None http://bugs.python.org/issue10437 closed by stutzbach #10439: PyCodec C API is not documented in reST http://bugs.python.org/issue10439 closed by georg.brandl #10448: Add Mako template benchmark to Python Benchmark Suite http://bugs.python.org/issue10448 closed by pitrou #10450: Fix markup in Misc/NEWS http://bugs.python.org/issue10450 closed by eric.araujo #10458: 2.7 += re.ASCII http://bugs.python.org/issue10458 closed by terry.reedy #10459: missing character names in unicodedata (CJK...) http://bugs.python.org/issue10459 closed by loewis #10460: Misc/indent.pro does not reflect PEP 7 http://bugs.python.org/issue10460 closed by georg.brandl #10462: Handler.close is not called in subclass while Logger.removeHan http://bugs.python.org/issue10462 closed by vinay.sajip #10463: Wrong return type for xml.etree.ElementTree.parse() http://bugs.python.org/issue10463 closed by tiwoc #10465: gzip module calls getattr incorrectly http://bugs.python.org/issue10465 closed by georg.brandl #10467: io.BytesIO.readinto() segfaults when used on BytesIO object se http://bugs.python.org/issue10467 closed by benjamin.peterson #10468: Document UnicodeError access functions http://bugs.python.org/issue10468 closed by georg.brandl #10470: python -m unittest ought to default to discovery http://bugs.python.org/issue10470 closed by michael.foord #10471: include documentation in python docs and under python -h for o http://bugs.python.org/issue10471 closed by georg.brandl #10472: Strange tab key behaviour in interactive python 2.7 OSX 10.6.2 http://bugs.python.org/issue10472 closed by ned.deily #10473: Strange behavior for socket.timeout http://bugs.python.org/issue10473 closed by ned.deily #10474: range.count returns boolean http://bugs.python.org/issue10474 closed by benjamin.peterson #10476: __iter__ on a byte file object using a method to return an ite 
http://bugs.python.org/issue10476 closed by benjamin.peterson #10477: AttributeError: 'NoneType' object has no attribute 'name' (bo http://bugs.python.org/issue10477 closed by eric.araujo #10488: Improve documentation for 'float' built-in. http://bugs.python.org/issue10488 closed by mark.dickinson #10489: configparser: remove broken `__name__` support http://bugs.python.org/issue10489 closed by lukasz.langa #10490: mimetypes read_windows_registry fails for non-ASCII keys http://bugs.python.org/issue10490 closed by r.david.murray #10491: Insecure Windows python directory permissions http://bugs.python.org/issue10491 closed by loewis #10493: test_strptime failures under OpenIndiana http://bugs.python.org/issue10493 closed by jcea #10501: make_buildinfo regression with unquoted path http://bugs.python.org/issue10501 closed by krisvale #10505: test_compileall: failure on Windows http://bugs.python.org/issue10505 closed by eric.araujo #10506: argparse execute system exit in python prompt http://bugs.python.org/issue10506 closed by r.david.murray #10508: compiler warnings about formatting pid_t as an int http://bugs.python.org/issue10508 closed by georg.brandl #10511: heapq docs clarification http://bugs.python.org/issue10511 closed by georg.brandl #10520: Build with --enable-shared fails http://bugs.python.org/issue10520 closed by barry #10525: Added mouse and colour support to Game of Life curses demo http://bugs.python.org/issue10525 closed by orsenthil #10526: Minor typo in What's New in Python 2.7 http://bugs.python.org/issue10526 closed by georg.brandl #10345: fcntl.ioctl always fails claiming an invalid fd http://bugs.python.org/issue10345 closed by ned.deily #1059244: distutil bdist hardcodes the python location http://bugs.python.org/issue1059244 closed by eric.araujo #1574217: isinstance swallows exceptions http://bugs.python.org/issue1574217 closed by r.david.murray #1699853: locale.getlocale() output fails as setlocale() input 
http://bugs.python.org/issue1699853 closed by r.david.murray From fijall at gmail.com Fri Nov 26 19:23:45 2010 From: fijall at gmail.com (Maciej Fijalkowski) Date: Fri, 26 Nov 2010 20:23:45 +0200 Subject: [Python-Dev] PyPy 1.4 released Message-ID: =============================== PyPy 1.4: Ouroboros in practice =============================== We're pleased to announce the 1.4 release of PyPy. This is a major breakthrough in our long journey, as PyPy 1.4 is the first PyPy release that can translate itself faster than CPython. Starting today, we are using PyPy more for our every-day development. So may you :) You can download it here: http://pypy.org/download.html What is PyPy ============ PyPy is a very compliant Python interpreter, almost a drop-in replacement for CPython. It's fast (`pypy 1.4 and cpython 2.6`_ comparison) Among its new features, this release includes numerous performance improvements (which made fast self-hosting possible), a 64-bit JIT backend, as well as serious stabilization. As of now, we can consider the 32-bit and 64-bit linux versions of PyPy stable enough to run `in production`_. Numerous speed achievements are described on `our blog`_. Normalized speed charts comparing `pypy 1.4 and pypy 1.3`_ as well as `pypy 1.4 and cpython 2.6`_ are available on benchmark website. For the impatient: yes, we got a lot faster! More highlights =============== * PyPy's built-in Just-in-Time compiler is fully transparent and automatically generated; it now also has very reasonable memory requirements. The total memory used by a very complex and long-running process (translating PyPy itself) is within 1.5x to at most 2x the memory needed by CPython, for a speed-up of 2x. * More compact instances. All instances are as compact as if they had ``__slots__``. This can give programs a big gain in memory. (In the example of translation above, we already have carefully placed ``__slots__``, so there is no extra win.) 
* `Virtualenv support`_: now PyPy is fully compatible with virtualenv_: note that to use it, you need a recent version of virtualenv (>= 1.5). * Faster (and JITted) regular expressions - huge boost in speeding up the `re` module. * Other speed improvements, like JITted calls to functions like map(). .. _virtualenv: http://pypi.python.org/pypi/virtualenv .. _`Virtualenv support`: http://morepypy.blogspot.com/2010/08/using-virtualenv-with-pypy.html .. _`in production`: http://morepypy.blogspot.com/2010/11/running-large-radio-telescope-software.html .. _`our blog`: http://morepypy.blogspot.com .. _`pypy 1.4 and pypy 1.3`: http://speed.pypy.org/comparison/?exe=1%2B41,1%2B172&ben=1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20&env=1&hor=false&bas=1%2B41&chart=normal+bars .. _`pypy 1.4 and cpython 2.6`: http://speed.pypy.org/comparison/?exe=2%2B35,1%2B172&ben=1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20&env=1&hor=false&bas=2%2B35&chart=normal+bars Cheers, Carl Friedrich Bolz, Antonio Cuni, Maciej Fijalkowski, Amaury Forgeot d'Arc, Armin Rigo and the PyPy team
From reid.kleckner at gmail.com Fri Nov 26 19:33:54 2010 From: reid.kleckner at gmail.com (Reid Kleckner) Date: Fri, 26 Nov 2010 13:33:54 -0500 Subject: [Python-Dev] PyPy 1.4 released In-Reply-To: References: Message-ID: Congratulations! Excellent work. Reid
From brian.curtin at gmail.com Fri Nov 26 19:52:22 2010 From: brian.curtin at gmail.com (Brian Curtin) Date: Fri, 26 Nov 2010 12:52:22 -0600 Subject: [Python-Dev] [Python-checkins] r86817 - python/branches/py3k-stat-on-windows/Lib/test/test_shutil.py In-Reply-To: <20101126184428.E04A0EE984@mail.python.org> References: <20101126184428.E04A0EE984@mail.python.org> Message-ID: On Fri, Nov 26, 2010 at 12:44, hirokazu.yamamoto wrote: > Author: hirokazu.yamamoto > Date: Fri Nov 26 19:44:28 2010 > New Revision: 86817 > > Log: > Now can reproduce the error on AMD64 Windows Server 2008 > even where os.symlink is not supported.
> > > Modified: > python/branches/py3k-stat-on-windows/Lib/test/test_shutil.py > > Modified: python/branches/py3k-stat-on-windows/Lib/test/test_shutil.py > > ============================================================================== > --- python/branches/py3k-stat-on-windows/Lib/test/test_shutil.py > (original) > +++ python/branches/py3k-stat-on-windows/Lib/test/test_shutil.py Fri > Nov 26 19:44:28 2010 > @@ -271,24 +271,32 @@ > shutil.rmtree(src_dir) > shutil.rmtree(os.path.dirname(dst_dir)) > > - @support.skip_unless_symlink > + @unittest.skipUnless(hasattr(os, 'link'), 'requires os.link') > def test_dont_copy_file_onto_link_to_itself(self): > # bug 851123. > os.mkdir(TESTFN) > src = os.path.join(TESTFN, 'cheese') > dst = os.path.join(TESTFN, 'shop') > try: > - f = open(src, 'w') > - f.write('cheddar') > - f.close() > - > - if hasattr(os, "link"): > - os.link(src, dst) > - self.assertRaises(shutil.Error, shutil.copyfile, src, dst) > - with open(src, 'r') as f: > - self.assertEqual(f.read(), 'cheddar') > - os.remove(dst) > + with open(src, 'w') as f: > + f.write('cheddar') > + os.link(src, dst) > + self.assertRaises(shutil.Error, shutil.copyfile, src, dst) > + with open(src, 'r') as f: > + self.assertEqual(f.read(), 'cheddar') > + os.remove(dst) > + finally: > + shutil.rmtree(TESTFN, ignore_errors=True) > > + @support.skip_unless_symlink > + def test_dont_copy_file_onto_symlink_to_itself(self): > + # bug 851123. > + os.mkdir(TESTFN) > + src = os.path.join(TESTFN, 'cheese') > + dst = os.path.join(TESTFN, 'shop') > + try: > + with open(src, 'w') as f: > + f.write('cheddar') > # Using `src` here would mean we end up with a symlink pointing > # to TESTFN/TESTFN/cheese, while it should point at > # TESTFN/cheese. 
> @@ -298,10 +306,7 @@ > self.assertEqual(f.read(), 'cheddar') > os.remove(dst) > finally: > - try: > - shutil.rmtree(TESTFN) > - except OSError: > - pass > + shutil.rmtree(TESTFN, ignore_errors=True) > > @support.skip_unless_symlink > def test_rmtree_on_symlink(self): You might be working on something slightly different, but I have an issue created for the failure of that test: http://bugs.python.org/issue10540 It slipped past me because I was only running the test suite as a regular user without the required symlink privilege, so the test was skipped. That Server 2008 build slave runs the test suite as administrator, so it was running that test and going into the os.link block, which it didn't do until r86733.
From ocean-city at m2.ccsnet.ne.jp Fri Nov 26 20:45:18 2010 From: ocean-city at m2.ccsnet.ne.jp (Hirokazu Yamamoto) Date: Sat, 27 Nov 2010 04:45:18 +0900 Subject: [Python-Dev] [Python-checkins] r86817 - python/branches/py3k-stat-on-windows/Lib/test/test_shutil.py In-Reply-To: References: <20101126184428.E04A0EE984@mail.python.org> Message-ID: <4CF00E4E.6030507@m2.ccsnet.ne.jp> On 2010/11/27 3:52, Brian Curtin wrote: > You might be working on something slightly different, but I have an issue > created for the failure of that test: http://bugs.python.org/issue10540 > > It slipped past me because I was only running the test suite as a regular > user without the required symlink privilege, so the test was skipped. That > Server 2008 build slave runs the test suite as administrator, so it was > running that test and going into the os.link block, which it didn't do until > r86733. I'm not sure, but why does os.path.samefile return False for hard link on windows? MSDN says, > A hard link is the file system representation of a file by which more > than one path references a single file in the same volume. (http://msdn.microsoft.com/en-us/library/aa365006%28VS.85%29.aspx) I know st_ino on windows is a bit different from POSIX, so, just I'm not sure. ;-)
From brian.curtin at gmail.com Fri Nov 26 21:02:29 2010 From: brian.curtin at gmail.com (Brian Curtin) Date: Fri, 26 Nov 2010 14:02:29 -0600 Subject: [Python-Dev] [Python-checkins] r86817 - python/branches/py3k-stat-on-windows/Lib/test/test_shutil.py In-Reply-To: <4CF00E4E.6030507@m2.ccsnet.ne.jp> References: <20101126184428.E04A0EE984@mail.python.org> <4CF00E4E.6030507@m2.ccsnet.ne.jp> Message-ID: On Fri, Nov 26, 2010 at 13:45, Hirokazu Yamamoto wrote: > I'm not sure, but why does os.path.samefile return False for hard link > on windows? MSDN says, > > > A hard link is the file system representation of a file by which more > > than one path references a single file in the same volume. > (http://msdn.microsoft.com/en-us/library/aa365006%28VS.85%29.aspx) > > I know st_ino on windows is a bit different from POSIX, so, just I'm not > sure. ;-) The samefile thing, I don't know either. GetFinalPathNameByHandle does not appear to work with hard links, at least how it's being used right now. It has no problem with symlinks. We briefly chatted about this on the os.link feature issue, but I never found a way around it. I'll look into it this weekend.
From ocean-city at m2.ccsnet.ne.jp Fri Nov 26 21:18:58 2010 From: ocean-city at m2.ccsnet.ne.jp (Hirokazu Yamamoto) Date: Sat, 27 Nov 2010 05:18:58 +0900 Subject: [Python-Dev] [Python-checkins] r86817 - python/branches/py3k-stat-on-windows/Lib/test/test_shutil.py In-Reply-To: References: <20101126184428.E04A0EE984@mail.python.org> <4CF00E4E.6030507@m2.ccsnet.ne.jp> Message-ID: <4CF01632.8070504@m2.ccsnet.ne.jp> On 2010/11/27 5:02, Brian Curtin wrote: > We briefly chatted about this on the os.link > feature issue, but I never found a way around it. How about implementing os.path.samefile in Modules/posixmodule.c like this? http://bugs.python.org/file19262/py3k_fix_kill_python_for_short_path.patch # I hope this works.
From brian.curtin at gmail.com Fri Nov 26 21:31:49 2010 From: brian.curtin at gmail.com (Brian Curtin) Date: Fri, 26 Nov 2010 14:31:49 -0600 Subject: [Python-Dev] [Python-checkins] r86817 - python/branches/py3k-stat-on-windows/Lib/test/test_shutil.py In-Reply-To: <4CF01632.8070504@m2.ccsnet.ne.jp> References: <20101126184428.E04A0EE984@mail.python.org> <4CF00E4E.6030507@m2.ccsnet.ne.jp> <4CF01632.8070504@m2.ccsnet.ne.jp> Message-ID: On Fri, Nov 26, 2010 at 14:18, Hirokazu Yamamoto wrote: > On 2010/11/27 5:02, Brian Curtin wrote: > >> We briefly chatted about this on the os.link >> feature issue, but I never found a way around it. >> > > How about implementing os.path.samefile in > Modules/posixmodule.c like this? > > http://bugs.python.org/file19262/py3k_fix_kill_python_for_short_path.patch > > # I hope this works. > That's almost identical to what the current os.path.sameopenfile is. Lib/ntpath.py opens both files, then compares them via _getfileinformation. That function is implemented to take in a file descriptor, call GetFileInformationByHandle with it, then returns a tuple of dwVolumeSerialNumber, nFileIndexHigh, and nFileIndexLow.
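The identity comparison described above (volume serial number plus file index) has a rough analogue in the (st_dev, st_ino) pair reported by os.stat(). The following is only a minimal sketch of that idea, not the Lib/ntpath.py or Modules/posixmodule.c code: same_file_by_identity and demo are hypothetical names, and it assumes a platform where os.link is available and os.stat() reports meaningful st_dev and st_ino values, which this thread suggests could not be taken for granted on Windows at the time.

```python
import os
import tempfile


def same_file_by_identity(path_a, path_b):
    """Return True if both paths refer to the same underlying file.

    Assumption: (st_dev, st_ino) uniquely identifies a file on this
    platform. Hard links to one file share that pair; copies do not.
    """
    stat_a = os.stat(path_a)
    stat_b = os.stat(path_b)
    return (stat_a.st_dev, stat_a.st_ino) == (stat_b.st_dev, stat_b.st_ino)


def demo():
    """Create a file, a hard link to it, and an unrelated copy, then compare."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "cheese")
        dst = os.path.join(tmp, "shop")
        other = os.path.join(tmp, "spam")
        with open(src, "w") as f:
            f.write("cheddar")
        with open(other, "w") as f:
            f.write("cheddar")
        os.link(src, dst)  # dst is a second name for the same file
        return (same_file_by_identity(src, dst),
                same_file_by_identity(src, other))
```

Here demo() returns one flag for the hard-linked pair and one for two distinct files with equal contents, which is the distinction the test_shutil hard-link test depends on.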
From martin at v.loewis.de Fri Nov 26 21:39:36 2010 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Fri, 26 Nov 2010 21:39:36 +0100 Subject: [Python-Dev] Removal of Win32 ANSI API In-Reply-To: <4CEFE16E.6040801@m2.ccsnet.ne.jp> References: <4CDC14C0.6070300@m2.ccsnet.ne.jp> <201011121308.30368.victor.stinner@haypocalc.com> <4CDEBB11.5050209@m2.ccsnet.ne.jp> <201011140106.55153.victor.stinner@haypocalc.com> <4CEFE16E.6040801@m2.ccsnet.ne.jp> Message-ID: <4CF01B08.9000409@v.loewis.de> > Is it possible a conversion from bytes (ANSI) to unicode fails on > windows? It should fail sometimes, right? Not for windows-1252, but certainly for shift-jis (you know better than me). It seems that whether MultiByteToWideChar will fail depends on whether MB_ERR_INVALID_CHARS is given or not. I don't know what it will do if this flag is not given - my guess it fills in REPLACEMENT CHARACTER. > If not, is it allowed to convert to unicode with > PyUnicode_FSDecoder if function doesn't return str? For example, > os.stat() takes str as arguments but doesn't return str. This I don't understand. os.stat doesn't return text at all - so what do you want to convert? > # I noticed win_readlink() in Modules/posixmodule.c already unicode > # only. Maybe not so much problem? ;-) Well, readlink is new on Windows, and symlinks are not widespread. So there is no backwards compatibility concern here.
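The strict-versus-lenient distinction raised above can be illustrated with Python's own codecs. This is only a sketch under stated assumptions: it uses Python's shift_jis codec and its "strict" and "replace" error handlers, not the Win32 MultiByteToWideChar call, and whether the Win32 function behaves the same way without MB_ERR_INVALID_CHARS is exactly what the message leaves open. The names decode_ansi, try_strict and TRUNCATED are hypothetical.

```python
def decode_ansi(raw, encoding="shift_jis", strict=True):
    """Decode bytes, either failing on bad input or substituting U+FFFD."""
    errors = "strict" if strict else "replace"
    return raw.decode(encoding, errors)


# Two Shift-JIS characters (93 FA, 96 7B) with the final byte cut off,
# leaving an incomplete multibyte sequence at the end.
TRUNCATED = b"\x93\xfa\x96"


def try_strict(raw):
    """Return the decoded text, or a short failure note if decoding fails."""
    try:
        return decode_ansi(raw, strict=True)
    except UnicodeDecodeError as exc:
        return "failed: " + exc.reason
```

With strict decoding the truncated input is rejected, while the lenient call keeps the first character and substitutes REPLACEMENT CHARACTER for the incomplete tail.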
Regards, Martin
From ncoghlan at gmail.com Sat Nov 27 08:35:52 2010 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 27 Nov 2010 17:35:52 +1000 Subject: [Python-Dev] [Python-checkins] r86720 - python/branches/py3k/Misc/ACKS In-Reply-To: References: <20101123203252.39BE7EE9CF@mail.python.org> <4CEC43A4.80907@netwok.org> <4CEC4917.2070508@udel.edu> Message-ID: On Thu, Nov 25, 2010 at 5:25 AM, Terry Reedy wrote: > I know now that I could always edit with IDLE's editor, but it is a lot > easier to right click and select edit than it is to run thru the directory > tree in an open dialog. If you want a decent free text editor on Windows, the open source Notepad++ does a very nice job. It also adds an "Edit with Notepad++" to the explorer context menu :) > And of course, since the pseudo-BOM addition is > undocumented within notepad itself, and probably other editors, it is easy > to not know. As far as the implicit BOM addition itself goes, reindent.py and reindent-rst.py could probably be updated to check for it, but the miscellaneous files (like ACKS) are likely to continue to need manual checks. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia
From stephen at xemacs.org Sat Nov 27 09:48:52 2010 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Sat, 27 Nov 2010 17:48:52 +0900 Subject: [Python-Dev] len(chr(i)) = 2?
In-Reply-To: References: <201011192123.14169.victor.stinner@haypocalc.com> <4CE6F93F.9010109@egenix.com> <4CE6FE30.1050903@v.loewis.de> <87hbfc1vnf.fsf@uwakimon.sk.tsukuba.ac.jp> <4CE78F62.7060707@v.loewis.de> <8739qukf9r.fsf@uwakimon.sk.tsukuba.ac.jp> <20101121173825.B1BFB235977@kimball.webabinitio.net> <60F8726F-C1C2-4803-8B8E-688EF0443FA0@gmail.com> <87eiadd46t.fsf@uwakimon.sk.tsukuba.ac.jp> <4CEC5316.4010608@canterbury.ac.nz> <77AAC178-F868-4F05-8509-4A9FB66F61EC@fuhm.net> <87sjyrbftz.fsf@uwakimon.sk.tsukuba.ac.jp> <635C265A-90A8-4B92-A65C-59EF3E8EFD68@twistedmatrix.com> <87oc9fb97b.fsf@uwakimon.sk.tsukuba.ac.jp> <3C1ADB64-63F3-4165-926D-EDE9846E0DBD@fuhm.net> <87mxozayam.fsf@uwakimon.sk.tsukuba.ac.jp> <4CEDCB86.9030506@canterbury.ac.nz> <87ipzm6oqr.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <87y68f5eyz.fsf@uwakimon.sk.tsukuba.ac.jp> Glyph Lefkowitz writes: > But I don't think that anyone is filling up main memory with > gigantic piles of character indexes and need to squeeze out that > extra couple of bytes of memory on such a tiny object. How do you think editors and browsers represent the regions that they highlight, then? How do you think that structure-oriented editors represent the structures that they work with, then? In a detailed analysis of a C or Java file, it's easy to end up with almost 1:2 positions to characters ratio. Note that *buffer* characters are typically smaller than a platform word, so saving one word in the representation of a position means a 100% or more increase in the character count of the buffer. Even in the case of UCS-4 on a 32-bit platform, that's a 50% increase in the maximum usable size of a buffer before a parser starts raising OOM errors. There are two plausible ways to represent these structures that I can think of offhand. The first is to do it the way Emacs does, by reading the text into a buffer and using position offsets to map to display or structure attributes.
The second is to use a hierarchical document model, and render the display by traversing the document hierarchy. It's not obvious to me that forcing use of the second representation is a good idea for performance in an editor, and I would think that they have similar memory requirements. > Plus, this would allow such a user to stop copying the character > data itself just to decode it, and on mostly-ascii UTF-8 text (a > common use-case) this is a 2x savings right off the bat. Which only matters if you're a server in the business of shoveling octets really fast but are CPU bound (seems unlikely to me, but I'm no expert; WDYT?), and even then is only that big a savings if you can push off the issue of validating the purported UTF-8 text on others. If you're not validating, you may as well acknowledge that you're processing binary data, not text.[1] But we're talking about text. And of course, if you copy mostly-Han UTF-8 text (a common use-case) to UCS-2, this is a 1.5x memory savings right off the bat, and a 3x time savings when iterating in most architectures (one increment operation per character instead of three). As I've already said, I don't think this is an argument in favor of either representation. Sometimes one wins, sometimes the other. I don't think supplying both is a great idea, although I've proposed it myself for XEmacs (but made as opaque as possible). > > In Python it's true that markers can use the same data structure as > > integers and simply provide different methods, and it's arguable that > > Python's design is better. But if you use bytes internally, then you > > have problems. > > No, you just have design questions. Call them what you like, they're as yet unanswered. In any given editing scenario, I'd concede that it's a "SMOD". But if you're designing a language for text processing, it's a restriction that I believe to be a hindrance to applications. 
Many applications may prefer to use a straightforward array implementation of text and focus their design efforts on the real problems of their use cases. > > Do you expose that byte value to the user? If so, what do you do > > if the user specifies a byte value that points into a multibyte > > character? > > Go to the beginning of the multibyte character. Report that > position; if the user then asks the requested marker object for its > position, it will report that byte offset, not the > originally-requested one. (Obviously, do the same thing for > surrogate pair code points.) I will guarantee that some use cases will prefer that you go to the beginning of the *next* character. For an obvious example, your algorithm will infloop if you iterate "pos += 1". (And the opposite problem appears for "beginning of next character" combined with "pos -= 1".) Of course this trivial example is easily addressed by saying "the user should be using the character iterator API here", but I expect the issue can arise where that is not an easy answer. Either the API becomes complex, or the user/developers will have to do complex bookkeeping that should be done by the text implementation. Nor is it obvious that surrogate pairs will be present in a UCS-2 representation. Specifically, they can be encoded to single private space characters in almost all applications, at a very small cost in performance. > > What if the user wants to specify position by number of > > characters? > > Part of the point that we are trying to make here is that nobody > really cares about that use-case. In order to know anything useful > about a position in a text, you have to have traversed to that > location in the text. Binary search of an ordered text is useful. Granted, this particular example can be addressed usefully in terms of byte positions (viz. your example of less), but your basic premise is falsified. 
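The "go to the beginning of the multibyte character" rule and the "pos += 1" loop discussed above can be illustrated with a minimal sketch. It assumes well-formed UTF-8 held in a Python bytes object; the helper names snap_to_char_start and next_char_start are hypothetical and come from neither Emacs nor Python.

```python
def snap_to_char_start(data: bytes, pos: int) -> int:
    """Move pos back to the first byte of the UTF-8 character containing it."""
    # UTF-8 continuation bytes have the bit pattern 10xxxxxx.
    while 0 < pos < len(data) and (data[pos] & 0xC0) == 0x80:
        pos -= 1
    return pos


def next_char_start(data: bytes, pos: int) -> int:
    """Return the byte offset of the character following the one at pos."""
    pos += 1
    while pos < len(data) and (data[pos] & 0xC0) == 0x80:
        pos += 1
    return pos


# Four characters occupying 1 + 2 + 3 + 1 = 7 bytes.
text = "a\u00e9\u6f22b".encode("utf-8")

# Naive stepping: "pos += 1" followed by snapping back lands on the same
# character start again, so a loop built this way would never advance.
pos = 1                                      # first byte of the 2-byte character
stuck = snap_to_char_start(text, pos + 1)    # back at byte offset 1

# Character-aware stepping visits each character start exactly once.
starts = []
pos = 0
while pos < len(text):
    starts.append(pos)
    pos = next_char_start(text, pos)
```

Here stuck equals the position the loop started from, which is the infinite loop described above, while the character-aware iteration makes progress. Which of the two rounding directions a caller wants is exactly the design question under dispute in the thread.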
> You can remember interesting things like the offsets of starts of > lines, or the x/y positions of characters. > > > Can you translate efficiently? > > No, because there's no point :). But you _could_ implement an > overlay that cached things like the beginning of lines, or the x/y > positions of interesting characters. Emacs does, and a lot of effort has gone into it, and it still sucks compared to an array representation. Maybe _you_ _could_ do better, but as yet we haven't managed to pull it off. :-( > > But I think it would be hard to implement an efficient > > text-processing *language*, eg, a Python module for *full > > conformance* in handling Unicode, on top of UTF-8. > > Still: why? I guess if I have some free time I'll try my hand at > it, and maybe I'll run into a wall and realize you're right :). I'd rather have you make it plausible to me that there's no point in having efficient access to arbitrary character positions. Then maybe you can delegate that implementation to me. :-) But my Emacs experience says otherwise, and IIUC the intuition and/or experience of MAL and Guido says this is not a YAGNI. > > Any time you have an algorithm that requires efficient access to > > arbitrary text positions, you'll spend all your skull sweat > > fighting the representation. At least, that's been my experience > > with Emacsen. > > What sort of algorithm would that be, though? The main thing that > I could think of is a text editor trying to efficiently allow the > user to scroll to the middle of a large file without reading the > whole thing into memory. Reading into memory or not is a red herring, I think. For many legacy encodings you have to pretty much read the whole thing because they are stateful, and it's just not very expensive compared to the text processing itself (unless your application is shoveling octets as fast as possible, in which case character positions are indeed a YAGNI). The question is whether opaque markers are always sufficient. 
For example, XEmacs does use byte positions internally for markers and extents (objects representing regions of text that can carry arbitrary properties but are tuned for display properties). Obviously, we have the marker objects you propose as sufficient, and indeed the representation is as efficient as you claim. However, these positions are not exposed as integers to end users, Lisp, or even most of the C code. If a client (end user or code) requests a position, they get a character position. Such requests are frequent enough that they constitute a major drag on many practical applications. It may be that this is unnecessary, as less shows for its application. But less is not an editor, let alone a language for writing editors. Do you know of an editor language of power comparable to Emacs Lisp that is not based on an array representation of text? > Is it really the representation as byte positions which is fragile > (i.e. the internal implementation detail), or the exposure of that > position to calling code, and the idiomatic usage of that number as > an integer? It's the latter. Sufficient effort can make it safe to use byte positions, and the effort is not all that great as long as you don't demand efficiency. XEmacs vs. Emacs implementation of Mule demonstrates that. We at XEmacs never did expose byte positions to even the C code (other than to buffer and string methods), and that implementation has not had to change much, if at all, in 15 years. The caching mechanism to make character position access reasonably efficient, however, has been buggy and not so efficient, and so complex that RMS said "I was going to implement your [position cache] in Emacs but it was too hard for me to understand". (OTOH, the alternative Emacs had implemented turned out to be O(n**2) or worse, so he had to replace it. Translating byte positions to character positions seems to be a real loser.) 
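To make the cost of translating between character positions and byte positions concrete, here is a minimal sketch of a checkpoint cache over immutable, well-formed UTF-8 data. It is not the XEmacs position cache mentioned above, whose design is not described in this message; the class name CharIndex and the stride parameter are assumptions made for illustration.

```python
import bisect


class CharIndex:
    """Map character positions to byte offsets in UTF-8 data (hypothetical helper)."""

    def __init__(self, data: bytes, stride: int = 64):
        self.data = data
        self.stride = stride
        # checkpoints[k] is the byte offset of character number k * stride.
        self.checkpoints = []
        nchars = 0
        for offset, byte in enumerate(data):
            if (byte & 0xC0) != 0x80:            # first byte of a character
                if nchars % stride == 0:
                    self.checkpoints.append(offset)
                nchars += 1
        self.nchars = nchars

    def char_to_byte(self, index: int) -> int:
        if not 0 <= index < self.nchars:
            raise IndexError(index)
        offset = self.checkpoints[index // self.stride]
        remaining = index % self.stride
        # Walk forward at most stride - 1 characters from the checkpoint.
        while remaining:
            offset += 1
            if (self.data[offset] & 0xC0) != 0x80:
                remaining -= 1
        return offset

    def byte_to_char(self, offset: int) -> int:
        if not 0 <= offset < len(self.data):
            raise IndexError(offset)
        # Find the last checkpoint at or before offset, then count the
        # character starts between that checkpoint and offset.
        k = bisect.bisect_right(self.checkpoints, offset) - 1
        count = k * self.stride
        for byte in self.data[self.checkpoints[k] + 1 : offset + 1]:
            if (byte & 0xC0) != 0x80:
                count += 1
        return count
```

Lookups cost at most one stride of scanning, but the sketch ignores the hard part in an editor: keeping the checkpoints valid while text is inserted and deleted. An array of fixed-width code units needs none of this bookkeeping, which is the trade-off argued over above.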
Emacs did expose byte positions for efficiency reasons, and has had at least four regressions of the "\201 bug". "\201" prefixes a Latin-1 character in internal code, and code that treated byte positions would often result in this being duplicated because all trailing bytes in Mule code are also Latin-1 code points. (Don't ask me about the exact mechanism, XEmacs's implementation is quite different and never suffered from this bug.) Note that a \201-like bug is very unlikely to occur in Python's UCS-2 representation because the semantics of surrogate values in Unicode is unambiguous. However, I believe similar bugs would be possible in a UTF-8 representation -- if code is allowed to choose whether to view UTF-8 in binary or text mode -- because trailing byte values are Latin-1 code points. Maybe I'm just an old granny, scared of my shadow. Footnotes: [1] I have no objection to providing "text" algorithms (such as regexps) for use on "binary" data. But then they don't provide any guarantees that transformations of purported text remains text. From ncoghlan at gmail.com Sat Nov 27 11:51:38 2010 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 27 Nov 2010 20:51:38 +1000 Subject: [Python-Dev] constant/enum type in stdlib In-Reply-To: <4CED4E34.5060400@voidspace.org.uk> References: <20101121034404.52924F20A@mail.python.org> <4CE9BF4A.1020302@netwok.org> <4CEA89E8.5090107@voidspace.org.uk> <20101122163722.7e96d123@pitrou.net> <4CEA9584.7040301@avl.com> <20101122172440.77d27ed5@pitrou.net> <20101122164654.2109.588145158.divmod.xquotient.165@localhost.localdomain> <4CEBC6BD.9060402@voidspace.org.uk> <4CED0557.9090101@voidspace.org.uk> <4CED4E34.5060400@voidspace.org.uk> Message-ID: On Thu, Nov 25, 2010 at 3:41 AM, Michael Foord wrote: > Can you explain what you see as the difference? 
>
> I'm not particularly interested in type validation but I like the fact that
> typical enum APIs allow you to group constants: the generated constant class
> acts as a namespace for all the defined constants.

The problem with blessing one particular "enum API" is that people have so many different ideas as to what an enum API should look like.

However, the one thing they all have in common is the ability to take a value and give it a name, then present *both* of those in debugging information.

> Are you just suggesting something along the lines of:
>
> class NamedConstant(int):
>     def __new__(cls, name, val):
>         return int.__new__(cls, val)
>
>     def __init__(self, name, val):
>         self._name = name
>
>     def __repr__(self):
>         return ' ' % self._name
>
> FOO = NamedConstant('FOO', 3)
>
> In general the less features the better, but I'd like a few more features
> than that. :-)

Not quite. I'm suggesting a factory function that works for any value, and derives the parent class from the type of the supplied value. However, what you wrote is still the essence of the idea - we would be primarily providing a building block that makes it easier for people to *create* enum APIs if they want to, but for simple use cases (where all they really wanted was the enhanced debugging information) they wouldn't need to bother. In the standard library, wherever we do "enum-like things" we would switch to using named values where it makes sense to do so.

Doing so may actually make sense for more than just constants - it may make sense for significant mutable globals as well.
==========================================================================
# Implementation (more than just a sketch, since it handles some interesting corner cases)
import functools
@functools.lru_cache()
def _make_named_value_type(base_type):
    class _NamedValueType(base_type):
        def __new__(cls, name, value):
            return base_type.__new__(cls, value)
        def __init__(self, name, value):
            self.__name = name
            super().__init__(value)
        @property
        def _name(self):
            return self.__name
        def _raw(self):
            return base_type(self)
        def __repr__(self):
            return "{}={}".format(self._name, super().__repr__())
        if base_type.__str__ is object.__str__:
            __str__ = base_type.__repr__
    _NamedValueType.__name__ = "Named<{}>".format(base_type.__name__)
    return _NamedValueType

def named_value(name, value):
    return _make_named_value_type(type(value))(name, value)

def set_named_values(namespace, **kwds):
    for k, v in kwds.items():
        namespace[k] = named_value(k, v)

x = named_value("FOO", 1)
y = named_value("BAR", "Hello World!")
z = named_value("BAZ", dict(a=1, b=2, c=3))

print(x, y, z, sep="\n")
print("\n".join(map(repr, (x, y, z))))
print("\n".join(map(str, map(type, (x, y, z)))))

set_named_values(globals(), foo=x._raw(), bar=y._raw(), baz=z._raw())
print("\n".join(map(repr, (foo, bar, baz))))
print(type(x) is type(foo), type(y) is type(bar), type(z) is type(baz))

==========================================================================

# Session output for the last 6 lines

>>> print(x, y, z, sep="\n")
1
Hello World!
{'a': 1, 'c': 3, 'b': 2}

>>> print("\n".join(map(repr, (x, y, z))))
FOO=1
BAR='Hello World!'
BAZ={'a': 1, 'c': 3, 'b': 2}

>>> print("\n".join(map(str, map(type, (x, y, z)))))
<class '__main__.Named<int>'>
<class '__main__.Named<str>'>
<class '__main__.Named<dict>'>

>>> set_named_values(globals(), foo=x._raw(), bar=y._raw(), baz=z._raw())
>>> print("\n".join(map(repr, (foo, bar, baz))))
foo=1
bar='Hello World!'
baz={'a': 1, 'c': 3, 'b': 2}

>>> print(type(x) is type(foo), type(y) is type(bar), type(z) is type(baz))
True True True

For "normal" use, such objects would look like ordinary instances of their class. They would only behave differently when their representation is printed (prepending their name), or when their type is interrogated (being an instance of the named subclass rather than the ordinary type).

Cheers,
Nick.

--
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From ncoghlan at gmail.com Sat Nov 27 13:05:32 2010
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sat, 27 Nov 2010 22:05:32 +1000
Subject: [Python-Dev] [Preview] Comments and change proposals on documentation
In-Reply-To:
References:
Message-ID:

On Thu, Nov 25, 2010 at 6:24 AM, Georg Brandl wrote:
> Hi,
>
> at , you can look at a version of the 3.2
> docs that has the upcoming commenting feature.  JavaScript is mandatory.

Very nice!

I'm not sure what to do about the discoverability of the comment bubbles at the end of each paragraph. I initially thought commenting wasn't available on What's New or the Using Python docs until seeing where the blue comment bubbles appeared in the math module docs. A discreet notice at the bottom of the sidebar and/or an explanation at the "Report a Bug" page may cover it I guess.

> Please test on a smaller page, such as ,
> there is currently a speed issue with larger pages.  (Helpful tips from
> JS experts are welcome.)

I gave the JS a fair few comments on the first paragraph to digest. I also put my detailed UI comments there as well (I needed something to write about while testing, so I figured I may as well make it useful to you!)

> Other things I have to do before this can go live:
>
> * reuse existing logins from either wiki or tracker?

Tracker sounds like the best bet to me.

> Any feedback is appreciated (I'd suggest mailing it to doc-SIG only, to avoid
> cluttering up python-dev).
My comments on the math module may give you a chance to see how easy it is to get text out of comments into a form suitable for sending to a mailing list or posting to a tracker issue for further discussion :)

Cheers,
Nick.

--
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From ncoghlan at gmail.com Sat Nov 27 13:17:31 2010
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sat, 27 Nov 2010 22:17:31 +1000
Subject: [Python-Dev] [Python-checkins] r86745 - in python/branches/py3k: Doc/library/difflib.rst Lib/difflib.py Lib/test/test_difflib.py Misc/NEWS
In-Reply-To: <20101125061234.F1CC3EEA23@mail.python.org>
References: <20101125061234.F1CC3EEA23@mail.python.org>
Message-ID:

On Thu, Nov 25, 2010 at 4:12 PM, terry.reedy wrote:
>  The :class:`SequenceMatcher` class has this constructor:
>
>
> -.. class:: SequenceMatcher(isjunk=None, a='', b='')
> +.. class:: SequenceMatcher(isjunk=None, a='', b='', autojunk=True)
>
>     Optional argument *isjunk* must be ``None`` (the default) or a one-argument
>     function that takes a sequence element and returns true if and only if the
> @@ -340,6 +349,9 @@
>     The optional arguments *a* and *b* are sequences to be compared; both default to
>     empty strings.  The elements of both sequences must be :term:`hashable`.
>
> +   The optional argument *autojunk* can be used to disable the automatic junk
> +   heuristic.
> +

Catching up on checkins traffic, so a later checkin may already fix this, but there should be a versionchanged tag in the docs to note when the autojunk parameter was added.

Cheers,
Nick.

--
Nick Coghlan   |   ncoghlan at gmail.com   |
Brisbane, Australia

From ncoghlan at gmail.com Sat Nov 27 13:22:50 2010
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sat, 27 Nov 2010 22:22:50 +1000
Subject: [Python-Dev] [Python-checkins] r86750 - python/branches/py3k/Demo/curses/life.py
In-Reply-To: <20101126021524.GA1450@rubuntu>
References: <20101125145644.D98FAEEA26@mail.python.org> <4CEF0E3B.2070608@netwok.org> <20101126021524.GA1450@rubuntu>
Message-ID:

On Fri, Nov 26, 2010 at 12:15 PM, Senthil Kumaran wrote:
>> Re: "colour": the rest of the file use US English, as do the function
>> names (see for example curses.has_color).  It's good to use one dialect
>> consistently in one file.
>
> Good catch. Did not realize it because, we write it as colour too.
> Changing it.

I just resign myself to having to spell words like colour and serialise wrong when I'm working on Python. Compared to the adjustments the non-native English speakers have to make, I figure I'm getting off lightly ;)

Cheers,
Nick.

--
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From fuzzyman at voidspace.org.uk Sat Nov 27 13:52:40 2010
From: fuzzyman at voidspace.org.uk (Michael Foord)
Date: Sat, 27 Nov 2010 12:52:40 +0000
Subject: [Python-Dev] [Python-checkins] r86750 - python/branches/py3k/Demo/curses/life.py
In-Reply-To:
References: <20101125145644.D98FAEEA26@mail.python.org> <4CEF0E3B.2070608@netwok.org> <20101126021524.GA1450@rubuntu>
Message-ID: <4CF0FF18.4030408@voidspace.org.uk>

On 27/11/2010 12:22, Nick Coghlan wrote:
> On Fri, Nov 26, 2010 at 12:15 PM, Senthil Kumaran wrote:
>>> Re: "colour": the rest of the file use US English, as do the function
>>> names (see for example curses.has_color). It's good to use one dialect
>>> consistently in one file.
>> Good catch. Did not realize it because, we write it as colour too.
>> Changing it.
> I just resign myself to having to spell words like colour and
> serialise wrong when I'm working on Python.
Compared to the > adjustments the non-native English speakers have to make, I figure I'm > getting off lightly ;) > I *thought* that the Python policy was that English speakers wrote documentation in English and American speakers wrote documentation in American and that we *don't* insist on US spellings in the Python documentation? Michael > Cheers, > Nick. > -- http://www.voidspace.org.uk/ READ CAREFULLY. By accepting and reading this email you agree, on behalf of your employer, to release me from all obligations and waivers arising from any and all NON-NEGOTIATED agreements, licenses, terms-of-service, shrinkwrap, clickwrap, browsewrap, confidentiality, non-disclosure, non-compete and acceptable use policies (?BOGUS AGREEMENTS?) that I have entered into with your employer, its partners, licensors, agents and assigns, in perpetuity, without prejudice to my ongoing rights and privileges. You further represent that you have the authority to release me from any BOGUS AGREEMENTS on behalf of your employer. From eliben at gmail.com Sat Nov 27 14:00:27 2010 From: eliben at gmail.com (Eli Bendersky) Date: Sat, 27 Nov 2010 15:00:27 +0200 Subject: [Python-Dev] [Python-checkins] r86745 - in python/branches/py3k: Doc/library/difflib.rst Lib/difflib.py Lib/test/test_difflib.py Misc/NEWS In-Reply-To: References: <20101125061234.F1CC3EEA23@mail.python.org> Message-ID: On Sat, Nov 27, 2010 at 14:17, Nick Coghlan wrote: > On Thu, Nov 25, 2010 at 4:12 PM, terry.reedy > wrote: > > The :class:`SequenceMatcher` class has this constructor: > > > > > > -.. class:: SequenceMatcher(isjunk=None, a='', b='') > > +.. class:: SequenceMatcher(isjunk=None, a='', b='', autojunk=True) > > > > Optional argument *isjunk* must be ``None`` (the default) or a > one-argument > > function that takes a sequence element and returns true if and only if > the > > @@ -340,6 +349,9 @@ > > The optional arguments *a* and *b* are sequences to be compared; both > default to > > empty strings. 
The elements of both sequences must be > :term:`hashable`. > > > > + The optional argument *autojunk* can be used to disable the automatic > junk > > + heuristic. > > + > > Catching up on checkins traffic, so a later checkin may already fix > this, but there should be a versionchanged tag in the docs to note > when the autojunk parameter was added. > Hi Nick, Since autojunk was added in 2.7.1 (the docs of which do indicate this is the versionchanged tag), I think Terry may have left the tag in 3.2 out on purpose. That said, personally I don't know what the policy is regarding features added just in 3.2 and 2.7 (and didn't exist in 3.1) in this respect. Eli -------------- next part -------------- An HTML attachment was scrubbed... URL: From fuzzyman at voidspace.org.uk Sat Nov 27 14:02:36 2010 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Sat, 27 Nov 2010 13:02:36 +0000 Subject: [Python-Dev] [Python-checkins] r86745 - in python/branches/py3k: Doc/library/difflib.rst Lib/difflib.py Lib/test/test_difflib.py Misc/NEWS In-Reply-To: References: <20101125061234.F1CC3EEA23@mail.python.org> Message-ID: <4CF1016C.8050902@voidspace.org.uk> On 27/11/2010 13:00, Eli Bendersky wrote: > On Sat, Nov 27, 2010 at 14:17, Nick Coghlan > wrote: > > On Thu, Nov 25, 2010 at 4:12 PM, terry.reedy > > > wrote: > > The :class:`SequenceMatcher` class has this constructor: > > > > > > -.. class:: SequenceMatcher(isjunk=None, a='', b='') > > +.. class:: SequenceMatcher(isjunk=None, a='', b='', autojunk=True) > > > > Optional argument *isjunk* must be ``None`` (the default) or > a one-argument > > function that takes a sequence element and returns true if > and only if the > > @@ -340,6 +349,9 @@ > > The optional arguments *a* and *b* are sequences to be > compared; both default to > > empty strings. The elements of both sequences must be > :term:`hashable`. > > > > + The optional argument *autojunk* can be used to disable the > automatic junk > > + heuristic. 
> > + > > Catching up on checkins traffic, so a later checkin may already fix > this, but there should be a versionchanged tag in the docs to note > when the autojunk parameter was added. > > > Hi Nick, > > Since autojunk was added in 2.7.1 (the docs of which do indicate this > is the versionchanged tag), I think Terry may have left the tag in 3.2 > out on purpose. That said, personally I don't know what the policy is > regarding features added just in 3.2 and 2.7 (and didn't exist in 3.1) > in this respect. Features new in Python 3.2 that didn't exist in 3.1 should have a versionadded:: 3.2 tag. Michael > > Eli > > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/fuzzyman%40voidspace.org.uk -- http://www.voidspace.org.uk/ READ CAREFULLY. By accepting and reading this email you agree, on behalf of your employer, to release me from all obligations and waivers arising from any and all NON-NEGOTIATED agreements, licenses, terms-of-service, shrinkwrap, clickwrap, browsewrap, confidentiality, non-disclosure, non-compete and acceptable use policies ("BOGUS AGREEMENTS") that I have entered into with your employer, its partners, licensors, agents and assigns, in perpetuity, without prejudice to my ongoing rights and privileges. You further represent that you have the authority to release me from any BOGUS AGREEMENTS on behalf of your employer. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From fuzzyman at voidspace.org.uk Sat Nov 27 15:01:22 2010 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Sat, 27 Nov 2010 14:01:22 +0000 Subject: [Python-Dev] constant/enum type in stdlib In-Reply-To: References: <20101121034404.52924F20A@mail.python.org> <4CE9BF4A.1020302@netwok.org> <4CEA89E8.5090107@voidspace.org.uk> <20101122163722.7e96d123@pitrou.net> <4CEA9584.7040301@avl.com> <20101122172440.77d27ed5@pitrou.net> <20101122164654.2109.588145158.divmod.xquotient.165@localhost.localdomain> <4CEBC6BD.9060402@voidspace.org.uk> <4CED0557.9090101@voidspace.org.uk> <4CED4E34.5060400@voidspace.org.uk> Message-ID: <4CF10F32.9020805@voidspace.org.uk> On 27/11/2010 10:51, Nick Coghlan wrote: > On Thu, Nov 25, 2010 at 3:41 AM, Michael Foord > wrote: >> Can you explain what you see as the difference? >> >> I'm not particularly interested in type validation but I like the fact that >> typical enum APIs allow you to group constants: the generated constant class >> acts as a namespace for all the defined constants. > The problem with blessing one particular "enum API" is that people > have so many different ideas as to what an enum API should look like. > There actually seemed to be quite a bit of agreement around basic functionality though. > However, the one thing they all have in common is the ability to take > a value and give it a name, then present *both* of those in debugging > information. And this is the most important functionality. I would say that the grouping (namespacing) of constants is also useful, provided by *most* Python enum APIs and easy to implement without over complexifying the API. (Note that there is no *particular* hurry to get this into 3.2 - the beta is due imminently. 
I wouldn't object to it )

>> Are you just suggesting something along the lines of:
>>
>> class NamedConstant(int):
>>     def __new__(cls, name, val):
>>         return int.__new__(cls, val)
>>
>>     def __init__(self, name, val):
>>         self._name = name
>>
>>     def __repr__(self):
>>         return ' ' % self._name
>>
>> FOO = NamedConstant('FOO', 3)
>>
>> In general the less features the better, but I'd like a few more features
>> than that. :-)
> Not quite. I'm suggesting a factory function that works for any value,
> and derives the parent class from the type of the supplied value.
> However, what you wrote is still the essence of the idea - we would be
> primarily providing a building block that makes it easier for people
> to *create* enum APIs if they want to, but for simple use cases (where
> all they really wanted was the enhanced debugging information) they
> wouldn't need to bother. In the standard library, wherever we do
> "enum-like things" we would switch to using named values where it
> makes sense to do so.
>
> Doing so may actually make sense for more than just constants - it may
> make sense for significant mutable globals as well.

Very interesting proposal (typed named values rather than just named constants). It doesn't handle flag values, which I would still like, but that only really makes sense for integers (sets can be OR'd but their representation is already understandable). Perhaps the integer named type could be special cased for that.

Without the grouping functionality (associating a bunch of names together) you lose the 'from_name' functionality. Guido was in favour of this, and it is an obvious feature where you have grouping:

http://mail.python.org/pipermail/python-dev/2010-November/105912.html

"""I expect that the API to convert between enums and bare ints should be i = int(e) and e = (i). It would be nice if s = str(e) and e = (s) would work too."""

This wouldn't work with your suggested implementation (as it is).
Grouping and mutable "named values" could be inefficient and have issues around identity / equality. Maybe restrict the API to the immutable primitives.

All the best,

Michael

> ==========================================================================
> # Implementation (more than just a sketch, since it handles some interesting corner cases)
> import functools
> @functools.lru_cache()
> def _make_named_value_type(base_type):
>     class _NamedValueType(base_type):
>         def __new__(cls, name, value):
>             return base_type.__new__(cls, value)
>         def __init__(self, name, value):
>             self.__name = name
>             super().__init__(value)
>         @property
>         def _name(self):
>             return self.__name
>         def _raw(self):
>             return base_type(self)
>         def __repr__(self):
>             return "{}={}".format(self._name, super().__repr__())
>         if base_type.__str__ is object.__str__:
>             __str__ = base_type.__repr__
>     _NamedValueType.__name__ = "Named<{}>".format(base_type.__name__)
>     return _NamedValueType
>
> def named_value(name, value):
>     return _make_named_value_type(type(value))(name, value)
>
> def set_named_values(namespace, **kwds):
>     for k, v in kwds.items():
>         namespace[k] = named_value(k, v)
>
> x = named_value("FOO", 1)
> y = named_value("BAR", "Hello World!")
> z = named_value("BAZ", dict(a=1, b=2, c=3))
>
> print(x, y, z, sep="\n")
> print("\n".join(map(repr, (x, y, z))))
> print("\n".join(map(str, map(type, (x, y, z)))))
>
> set_named_values(globals(), foo=x._raw(), bar=y._raw(), baz=z._raw())
> print("\n".join(map(repr, (foo, bar, baz))))
> print(type(x) is type(foo), type(y) is type(bar), type(z) is type(baz))
>
> ==========================================================================
>
> # Session output for the last 6 lines
> >>> print(x, y, z, sep="\n")
> 1
> Hello World!
> {'a': 1, 'c': 3, 'b': 2}
>
> >>> print("\n".join(map(repr, (x, y, z))))
> FOO=1
> BAR='Hello World!'
> BAZ={'a': 1, 'c': 3, 'b': 2}
>
> >>> print("\n".join(map(str, map(type, (x, y, z)))))
> <class '__main__.Named<int>'>
> <class '__main__.Named<str>'>
> <class '__main__.Named<dict>'>
>
> >>> set_named_values(globals(), foo=x._raw(), bar=y._raw(), baz=z._raw())
> >>> print("\n".join(map(repr, (foo, bar, baz))))
> foo=1
> bar='Hello World!'
> baz={'a': 1, 'c': 3, 'b': 2}
>
> >>> print(type(x) is type(foo), type(y) is type(bar), type(z) is type(baz))
> True True True
>
> For "normal" use, such objects would look like ordinary instances of
> their class. They would only behave differently when their
> representation is printed (prepending their name), or when their type
> is interrogated (being an instance of the named subclass rather than
> the ordinary type).
>
> Cheers,
> Nick.
>

--
http://www.voidspace.org.uk/

READ CAREFULLY. By accepting and reading this email you agree, on behalf of your employer, to release me from all obligations and waivers arising from any and all NON-NEGOTIATED agreements, licenses, terms-of-service, shrinkwrap, clickwrap, browsewrap, confidentiality, non-disclosure, non-compete and acceptable use policies ("BOGUS AGREEMENTS") that I have entered into with your employer, its partners, licensors, agents and assigns, in perpetuity, without prejudice to my ongoing rights and privileges. You further represent that you have the authority to release me from any BOGUS AGREEMENTS on behalf of your employer.
From ncoghlan at gmail.com Sat Nov 27 15:58:08 2010
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sun, 28 Nov 2010 00:58:08 +1000
Subject: [Python-Dev] constant/enum type in stdlib
In-Reply-To: <4CF10F32.9020805@voidspace.org.uk>
References: <20101121034404.52924F20A@mail.python.org> <4CE9BF4A.1020302@netwok.org> <4CEA89E8.5090107@voidspace.org.uk> <20101122163722.7e96d123@pitrou.net> <4CEA9584.7040301@avl.com> <20101122172440.77d27ed5@pitrou.net> <20101122164654.2109.588145158.divmod.xquotient.165@localhost.localdomain> <4CEBC6BD.9060402@voidspace.org.uk> <4CED0557.9090101@voidspace.org.uk> <4CED4E34.5060400@voidspace.org.uk> <4CF10F32.9020805@voidspace.org.uk>
Message-ID:

On Sun, Nov 28, 2010 at 12:01 AM, Michael Foord wrote:
> Very interesting proposal (typed named values rather than just named
> constants). It doesn't handle flag values, which I would still like, but
> that only really makes sense for integers (sets can be OR'd but their
> representation is already understandable). Perhaps the integer named type
> could be special cased for that.
>
> Without the grouping functionality (associating a bunch of names together)
> you lose the 'from_name' functionality. Guido was in favour of this, and it
> is an obvious feature where you have grouping:
> http://mail.python.org/pipermail/python-dev/2010-November/105912.html
>
> """I expect that the API to convert between enums and bare ints should be
> i = int(e) and e = (i). It would be nice if s = str(e) and
> e = (s) would work too."""

Note that the "i = int(e)" and "s = str(e)" parts of Guido's expectation do work (they are, in fact, the underlying implementation of the _raw() method), so an enum class would only be needed to provide the other half of the equation.

The named values have no opinion on equivalence at all (they just defer to the parent class), but change the rules for identity (which are always murky anyway, since caching is optional even for immutable types).
> This wouldn't work with your suggested implementation (as it is). Grouping > and mutable "named values" could be inefficient and have issues around > identity / equality. Maybe restrict the API to the immutable primitives. My proposal doesn't say anything about grouping at all - it's just an idea for "here's a standard way to associate a canonical name with a particular object, independent of the namespaces that happen to reference that object". Now, a particular *grouping* API may want to restrict itself in various ways, but that's my point. We should be looking at a standard solution for the ground level problem (i.e. the idea named_value attempts to solve) and then let various 3rd party enum/name grouping implementations flourish on top of that, rather than trying to create an all-singing all-dancing "value grouping" API (which is going to be far more intrusive than a simple API for "here's a way to give your constants and important data structures names that show up in their representations"). 
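As a hedged illustration of the building block described above, here is a minimal sketch of named_value adapted to run on current Python 3. It is an adaptation rather than the code posted in this thread: it assumes that forwarding the value to object.__init__, as the posted version does for int and str, raises TypeError on recent interpreters, so the __init__ call is made only for bases that define their own. The simplified _name attribute is likewise an assumption.

```python
import functools


@functools.lru_cache(maxsize=None)
def _make_named_value_type(base_type):
    class _NamedValueType(base_type):
        def __new__(cls, name, value):
            return base_type.__new__(cls, value)

        def __init__(self, name, value):
            # Mutable bases such as dict populate themselves in __init__;
            # immutable ones such as int and str already did so in __new__.
            if base_type.__init__ is not object.__init__:
                super().__init__(value)
            self._name = name

        def _raw(self):
            # Convert back to a plain instance of the base type.
            return base_type(self)

        def __repr__(self):
            return "{}={}".format(self._name, super().__repr__())

        # Keep str() free of the name when the base type has no __str__
        # of its own and would otherwise fall back to our __repr__.
        if base_type.__str__ is object.__str__:
            __str__ = base_type.__repr__

    _NamedValueType.__name__ = "Named<{}>".format(base_type.__name__)
    return _NamedValueType


def named_value(name, value):
    return _make_named_value_type(type(value))(name, value)


FOO = named_value("FOO", 1)
BAZ = named_value("BAZ", {"a": 1})
```

With this sketch, repr(FOO) carries the name while int(FOO), FOO == 1 and FOO + 1 behave as for a plain int, which matches the point that named values defer to the parent class for equivalence. Types that cannot be subclassed, such as bool and NoneType, are not handled.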
For example, using named_value as a primitive, you can fairly easily do:

class Namegroup:
    # Missing lots of niceties of a real enum class, but shows the idea
    # as to how a real implementation could leverage named_value
    def __init__(self, _groupname, **kwds):
        self._groupname = _groupname
        pattern = _groupname + ".{}"
        self._value_map = {}
        for k, v in kwds.items():
            attr = named_value(pattern.format(k), v)
            setattr(self, k, attr)
            self._value_map[v] = attr
    @classmethod
    def from_names(cls, groupname, *args):
        kwds = dict(zip(args, range(len(args))))
        return cls(groupname, **kwds)
    def __call__(self, arg):
        return self._value_map[arg]

silly = Namegroup.from_names("Silly", "FOO", "BAR", "BAZ")

>>> silly.FOO
Silly.FOO=0
>>> int(silly.FOO)
0
>>> silly(0)
Silly.FOO=0

named_value deals with all the stuff to do with pretending to be the original type of object (only with an associated name), leaving the grouping API to deal with issues of creating groups of names and mapping between them and the original values in various ways.

Cheers,
Nick.

--
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From ncoghlan at gmail.com Sat Nov 27 16:04:17 2010
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sun, 28 Nov 2010 01:04:17 +1000
Subject: [Python-Dev] [Python-checkins] r86750 - python/branches/py3k/Demo/curses/life.py
In-Reply-To: <4CF0FF18.4030408@voidspace.org.uk>
References: <20101125145644.D98FAEEA26@mail.python.org> <4CEF0E3B.2070608@netwok.org> <20101126021524.GA1450@rubuntu> <4CF0FF18.4030408@voidspace.org.uk>
Message-ID:

On Sat, Nov 27, 2010 at 10:52 PM, Michael Foord wrote:
>> I just resign myself to having to spell words like colour and
>> serialise wrong when I'm working on Python.
Compared to the >> adjustments the non-native English speakers have to make, I figure I'm >> getting off lightly ;) >> > > I *thought* that the Python policy was that English speakers wrote > documentation in English and American speakers wrote documentation in > American and that we *don't* insist on US spellings in the Python > documentation? If we're just talking about those things in generally, then that's a reasonable rule. But when in close proximity to an actual API that uses the American spelling, or modifying a file that uses the relevant word a lot, following the prevailing style is a definite courtesy to the reader. Cheers, Nick. -- Nick Coghlan?? |?? ncoghlan at gmail.com?? |?? Brisbane, Australia From fuzzyman at voidspace.org.uk Sat Nov 27 16:07:18 2010 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Sat, 27 Nov 2010 15:07:18 +0000 Subject: [Python-Dev] [Python-checkins] r86750 - python/branches/py3k/Demo/curses/life.py In-Reply-To: References: <20101125145644.D98FAEEA26@mail.python.org> <4CEF0E3B.2070608@netwok.org> <20101126021524.GA1450@rubuntu> <4CF0FF18.4030408@voidspace.org.uk> Message-ID: <4CF11EA6.8050409@voidspace.org.uk> On 27/11/2010 15:04, Nick Coghlan wrote: > On Sat, Nov 27, 2010 at 10:52 PM, Michael Foord > wrote: >>> I just resign myself to having to spell words like colour and >>> serialise wrong when I'm working on Python. Compared to the >>> adjustments the non-native English speakers have to make, I figure I'm >>> getting off lightly ;) >>> >> I *thought* that the Python policy was that English speakers wrote >> documentation in English and American speakers wrote documentation in >> American and that we *don't* insist on US spellings in the Python >> documentation? > If we're just talking about those things in generally, then that's a > reasonable rule. 
But when in close proximity to an actual API that
> uses the American spelling, or modifying a file that uses the relevant
> word a lot, following the prevailing style is a definite courtesy to
> the reader.
>

Ok, thanks. Sounds like a good guideline.

Michael

> Cheers,
> Nick.
>

--
http://www.voidspace.org.uk/

READ CAREFULLY. By accepting and reading this email you agree, on behalf of your employer, to release me from all obligations and waivers arising from any and all NON-NEGOTIATED agreements, licenses, terms-of-service, shrinkwrap, clickwrap, browsewrap, confidentiality, non-disclosure, non-compete and acceptable use policies ("BOGUS AGREEMENTS") that I have entered into with your employer, its partners, licensors, agents and assigns, in perpetuity, without prejudice to my ongoing rights and privileges. You further represent that you have the authority to release me from any BOGUS AGREEMENTS on behalf of your employer.

From ncoghlan at gmail.com Sat Nov 27 16:07:35 2010
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sun, 28 Nov 2010 01:07:35 +1000
Subject: [Python-Dev] [Python-checkins] r86745 - in python/branches/py3k: Doc/library/difflib.rst Lib/difflib.py Lib/test/test_difflib.py Misc/NEWS
In-Reply-To: <4CF1016C.8050902@voidspace.org.uk>
References: <20101125061234.F1CC3EEA23@mail.python.org> <4CF1016C.8050902@voidspace.org.uk>
Message-ID:

On Sat, Nov 27, 2010 at 11:02 PM, Michael Foord wrote:
> Features new in Python 3.2 that didn't exist in 3.1 should have a
> versionadded:: 3.2 tag.

As Michael said, from a docs point of view, the version flow is independent: "2.6 -> 2.7" and "3.1 -> 3.2". The issue has really only come up with this release, since there was no intervening 2.x release between 3.0 and 3.1.

Cheers,
Nick.

--
Nick Coghlan   |   ncoghlan at gmail.com   |
Brisbane, Australia From barry at python.org Sat Nov 27 19:22:16 2010 From: barry at python.org (Barry Warsaw) Date: Sat, 27 Nov 2010 13:22:16 -0500 Subject: [Python-Dev] constant/enum type in stdlib In-Reply-To: <4CF10F32.9020805@voidspace.org.uk> References: <20101121034404.52924F20A@mail.python.org> <4CE9BF4A.1020302@netwok.org> <4CEA89E8.5090107@voidspace.org.uk> <20101122163722.7e96d123@pitrou.net> <4CEA9584.7040301@avl.com> <20101122172440.77d27ed5@pitrou.net> <20101122164654.2109.588145158.divmod.xquotient.165@localhost.localdomain> <4CEBC6BD.9060402@voidspace.org.uk> <4CED0557.9090101@voidspace.org.uk> <4CED4E34.5060400@voidspace.org.uk> <4CF10F32.9020805@voidspace.org.uk> Message-ID: <20101127132216.533f7332@mission> On Nov 27, 2010, at 02:01 PM, Michael Foord wrote: >(Note that there is no *particular* hurry to get this into 3.2 - the beta is >due imminently. I wouldn't object to it ) Indeed. I don't think the time is right to try to get this into 3.2. -Barry -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 836 bytes Desc: not available URL: From anurag.chourasia at gmail.com Sat Nov 27 19:45:44 2010 From: anurag.chourasia at gmail.com (Anurag Chourasia) Date: Sun, 28 Nov 2010 00:15:44 +0530 Subject: [Python-Dev] Python make fails with error "Fatal Python error: Interpreter not initialized (version mismatch?)" Message-ID: Hi All, During the make step of python, I am encountering a weird error. This is on AIX 5.3 using gcc as the compiler. My configuration options are as follows ./configure --enable-shared --disable-ipv6 --with-gcc=gcc CPPFLAGS="-I /opt/freeware/include -I /opt/freeware/include/readline -I /opt/freeware/include/ncurses" LDFLAGS="-L. -L/usr/local/lib" Below is the transcript from the make step. ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ running build running build_ext ldd: /lib/libreadline.a: File is an archive. 
INFO: Can't locate Tcl/Tk libs and/or headers building '_struct' extension gcc -pthread -fno-strict-aliasing -g -O2 -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -I. -I/u01/home/apli/wm/GDD/Python-2.6.6/./Include -I. -IInclude -I./Include -I/opt/freeware/include -I/opt/freeware/include/readline -I/opt/freeware/include/ncurses -I/usr/local/include -I/u01/home/apli/wm/GDD/Python-2.6.6/Include -I/u01/home/apli/wm/GDD/Python-2.6.6 -c /u01/home/apli/wm/GDD/Python-2.6.6/Modules/_struct.c -o build/temp.aix-5.3-2.6/u01/home/apli/wm/GDD/Python-2.6.6/Modules/_struct.o ./Modules/ld_so_aix gcc -pthread -bI:Modules/python.exp -L. -L/usr/local/lib build/temp.aix-5.3-2.6/u01/home/apli/wm/GDD/Python-2.6.6/Modules/_struct.o -L. -L/usr/local/lib -lpython2.6 -o build/lib.aix-5.3-2.6/_struct.so *Fatal Python error: Interpreter not initialized (version mismatch?)* *make: 1254-059 The signal code from the last command is 6.* ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ The last command that i see above (ld_so_aix) seems to have completed as the file _struct.so exists after this command and hence I am not sure which step is failing. There is no other Python version on my machine. Please guide. -------------- next part -------------- An HTML attachment was scrubbed... URL: From tjreedy at udel.edu Sat Nov 27 21:50:11 2010 From: tjreedy at udel.edu (Terry Reedy) Date: Sat, 27 Nov 2010 15:50:11 -0500 Subject: [Python-Dev] [Python-checkins] r86745 - in python/branches/py3k: Doc/library/difflib.rst Lib/difflib.py Lib/test/test_difflib.py Misc/NEWS In-Reply-To: References: <20101125061234.F1CC3EEA23@mail.python.org> Message-ID: <4CF16F03.9060407@udel.edu> On 11/27/2010 7:17 AM, Nick Coghlan wrote: > On Thu, Nov 25, 2010 at 4:12 PM, terry.reedy wrote: >> The :class:`SequenceMatcher` class has this constructor: >> >> >> -.. class:: SequenceMatcher(isjunk=None, a='', b='') >> +.. 
class:: SequenceMatcher(isjunk=None, a='', b='', autojunk=True) >> >> Optional argument *isjunk* must be ``None`` (the default) or a one-argument >> function that takes a sequence element and returns true if and only if the >> @@ -340,6 +349,9 @@ >> The optional arguments *a* and *b* are sequences to be compared; both default to >> empty strings. The elements of both sequences must be :term:`hashable`. >> >> + The optional argument *autojunk* can be used to disable the automatic junk >> + heuristic. >> + > > Catching up on checkins traffic, so a later checkin may already fix > this, but there should be a versionchanged tag in the docs to note > when the autojunk parameter was added. Right. When S.C. forward-ported the 2.7 patch. he must have thought it not needed and I missed the difference between the diffs. Will add note in both places needed immediately. Terry From v+python at g.nevcal.com Sat Nov 27 21:56:14 2010 From: v+python at g.nevcal.com (Glenn Linderman) Date: Sat, 27 Nov 2010 12:56:14 -0800 Subject: [Python-Dev] constant/enum type in stdlib In-Reply-To: References: <20101121034404.52924F20A@mail.python.org> <4CE9BF4A.1020302@netwok.org> <4CEA89E8.5090107@voidspace.org.uk> <20101122163722.7e96d123@pitrou.net> <4CEA9584.7040301@avl.com> <20101122172440.77d27ed5@pitrou.net> <20101122164654.2109.588145158.divmod.xquotient.165@localhost.localdomain> <4CEBC6BD.9060402@voidspace.org.uk> <4CED0557.9090101@voidspace.org.uk> <4CED4E34.5060400@voidspace.org.uk> Message-ID: <4CF1706E.5030503@g.nevcal.com> On 11/27/2010 2:51 AM, Nick Coghlan wrote: > Not quite. I'm suggesting a factory function that works for any value, > and derives the parent class from the type of the supplied value. Nick, thanks for the much better implementation than I achieved; you seem to have the same goals as my implementation. I learned a bit making mine, and more understanding yours to some degree. 
What I still don't understand about your implementation, is that when adding one additional line to your file, it fails: w = named_value("ABC", z ) Now I can understand why it might not be a good thing to make a named value of a named value (confusing, at least), but I was surprised, and still do not understand, that it failed reporting the __new__() takes exactly 3 arguments (2 given). -------------- next part -------------- An HTML attachment was scrubbed... URL: From steve at pearwood.info Sat Nov 27 23:11:44 2010 From: steve at pearwood.info (Steven D'Aprano) Date: Sun, 28 Nov 2010 09:11:44 +1100 Subject: [Python-Dev] [Preview] Comments and change proposals on documentation In-Reply-To: References: Message-ID: <4CF18220.7000202@pearwood.info> Nick Coghlan wrote: > On Thu, Nov 25, 2010 at 6:24 AM, Georg Brandl wrote: >> Hi, >> >> at , you can look at a version of the 3.2 >> docs that has the upcoming commenting feature. JavaScript is mandatory. > > Very nice! > > I'm not sure what to do about the discoverability of the comment > bubbles as the end of each paragraph. I initially thought commenting > wasn't available on What's New or the Using Python docs until seeing > where the blue comment bubbles appeared in the math module docs. I wonder what the point of the comment bubbles is? This isn't a graphical UI where (contrary to popular opinion) a picture is *not* worth a thousand words, but may require a help-bubble to explain. This is text. If you want to make a comment on some text, the usual practice is to add more text :) I wasn't able to find a comment bubble that contained anything, so I don't know what sort of information you expect them to contain -- every one I tried said "0 comments". But it seems to me that comments are superfluous, if not actively harmful: (1) Anything important enough to tell the reader should be included in the text, where it can be easily seen, read and printed. 
(2) Discovery is lousy -- not only do you need to be running Javascript, which many people do not for performance, privacy and convenience[*], but you have to carefully mouse-over the paragraph just to see the blue bubble, and THEN you have to *precisely* mouse-over the bubble itself. (3) This will be a horrible and possibly even literally painful experience for anyone with a physical disability that makes precise positioning of the mouse difficult. (4) Accessibility for the blind and those using screen readers will probably be non-existent. (5) If the information in the comment bubbles is trivial enough that we're happy to say that the blind, the disabled and those who avoid Javascript don't need it, then perhaps *nobody* needs it. [*] In my experience, websites tend to fall into two basic categories: those that don't work at all without Javascript, and those that run better, faster, and with fewer anti-features and inconveniences without Javascript. -- Steven From g.brandl at gmx.net Sat Nov 27 23:37:29 2010 From: g.brandl at gmx.net (Georg Brandl) Date: Sat, 27 Nov 2010 23:37:29 +0100 Subject: [Python-Dev] [Preview] Comments and change proposals on documentation In-Reply-To: <4CF18220.7000202@pearwood.info> References: <4CF18220.7000202@pearwood.info> Message-ID: Am 27.11.2010 23:11, schrieb Steven D'Aprano: > Nick Coghlan wrote: >> On Thu, Nov 25, 2010 at 6:24 AM, Georg Brandl wrote: >>> Hi, >>> >>> at , you can look at a version of the 3.2 >>> docs that has the upcoming commenting feature. JavaScript is mandatory. >> >> Very nice! >> >> I'm not sure what to do about the discoverability of the comment >> bubbles as the end of each paragraph. I initially thought commenting >> wasn't available on What's New or the Using Python docs until seeing >> where the blue comment bubbles appeared in the math module docs. > > I wonder what the point of the comment bubbles is? 
This isn't a > graphical UI where (contrary to popular opinion) a picture is *not* > worth a thousand words, but may require a help-bubble to explain. This > is text. If you want to make a comment on some text, the usual practice > is to add more text :) Yes, I already mentioned that the bubbles could be replaced by text links if they prove too confusing. > I wasn't able to find a comment bubble that contained anything, so I > don't know what sort of information you expect them to contain -- every > one I tried said "0 comments". Maybe you should have tried the page I recommended as a demo, and where Nick made his comments? :) > But it seems to me that comments are superfluous, if not actively harmful: (I've not read anything about harmful below. Was that just FUD?) > (1) Anything important enough to tell the reader should be included in > the text, where it can be easily seen, read and printed. Yes. There need to be ways for the reader to feed back to the author what they want to have included. Currently, this is I'm all for removing comments with suggestions once they have been integrated in the main text. > (2) Discovery is lousy -- not only do you need to be running Javascript, > which many people do not for performance, privacy and convenience[*], That is not an argument nowadays, seeing how many sites/web applications require JS. (Most people who deactivate JS globally maintain a whitelist anyway, and can easily add docs.python.org to that.) These comments are an optional feature and therefore do not need to be accessible for 100% of users. > but you have to carefully mouse-over the paragraph just to see the blue > bubble, and THEN you have to *precisely* mouse-over the bubble itself. Bubbles are always shown for paragraphs *with* comments. > (3) This will be a horrible and possibly even literally painful > experience for anyone with a physical disability that makes precise > positioning of the mouse difficult. 
You're making this point just because of the size of the bubbles? Well, these users can register on the site and there can be a user preference to display larger links instead (if we choose to keep the bubbles, anyway.) > (4) Accessibility for the blind and those using screen readers will > probably be non-existent. It will be the same as for other web apps using JavaScript. Since I'm not a professional user interface designer, I don't know what screen readers can and cannot do. > (5) If the information in the comment bubbles is trivial enough that > we're happy to say that the blind, the disabled and those who avoid > Javascript don't need it, then perhaps *nobody* needs it. Sorry, but that is a nonsensical argument. Apart from the questionable notion that anything must be available to everyone to be worth anything, it also doesn't consider that the comments are not only for fellow users: as I said above, the comments are designed to be a very quick way to give feedback to *us* developers. (This is the reason for the "propose a change" feature, for example.) So even if only 30% of all users had access to the comments and could use that to help us improve the documentation by submitting suggestions and corrections they never would have bothered registering in the tracker for, that would be a net gain. 
cheers, Georg From raymond.hettinger at gmail.com Sun Nov 28 00:26:13 2010 From: raymond.hettinger at gmail.com (Raymond Hettinger) Date: Sat, 27 Nov 2010 15:26:13 -0800 Subject: [Python-Dev] constant/enum type in stdlib In-Reply-To: <4CF1706E.5030503@g.nevcal.com> References: <20101121034404.52924F20A@mail.python.org> <4CE9BF4A.1020302@netwok.org> <4CEA89E8.5090107@voidspace.org.uk> <20101122163722.7e96d123@pitrou.net> <4CEA9584.7040301@avl.com> <20101122172440.77d27ed5@pitrou.net> <20101122164654.2109.588145158.divmod.xquotient.165@localhost.localdomain> <4CEBC6BD.9060402@voidspace.org.uk> <4CED0557.9090101@voidspace.org.uk> <4CED4E34.5060400@voidspace.org.uk> <4CF1706E.5030503@g.nevcal.com> Message-ID: <1D372F35-B455-4982-997B-2C54A7D56741@gmail.com> On Nov 27, 2010, at 12:56 PM, Glenn Linderman wrote: > On 11/27/2010 2:51 AM, Nick Coghlan wrote: >> >> Not quite. I'm suggesting a factory function that works for any value, >> and derives the parent class from the type of the supplied value. > > Nick, thanks for the much better implementation than I achieved; you seem to have the same goals as my implementation. I learned a bit making mine, and more understanding yours to some degree. What I still don't understand about your implementation, is that when adding one additional line to your file, it fails: > > w = named_value("ABC", z ) > > Now I can understand why it might not be a good thing to make a named value of a named value (confusing, at least), but I was surprised, and still do not understand, that it failed reporting the __new__() takes exactly 3 arguments (2 given). Can I suggest that an enum-maker be offered as a third-party module rather than prematurely adding it into the standard library. 
Raymond From steve at pearwood.info Sun Nov 28 00:58:52 2010 From: steve at pearwood.info (Steven D'Aprano) Date: Sun, 28 Nov 2010 10:58:52 +1100 Subject: [Python-Dev] [Preview] Comments and change proposals on documentation In-Reply-To: References: <4CF18220.7000202@pearwood.info> Message-ID: <4CF19B3C.2000308@pearwood.info> Georg Brandl wrote: > Am 27.11.2010 23:11, schrieb Steven D'Aprano: >> I wasn't able to find a comment bubble that contained anything, so I >> don't know what sort of information you expect them to contain -- every >> one I tried said "0 comments". > > Maybe you should have tried the page I recommended as a demo, and where Nick > made his comments? :) Aha! I never would have guessed that the bubbles are clickable -- I thought you just moused-over them and they showed static comments put there by the developers, part of the documentation itself. I didn't realise that it was for users to add spam^W comments to the page. With that perspective, I need to rethink. Yes, I failed to fully read the instructions you sent, or understand them. That's what users do -- they don't read your instructions, and they misunderstand them. If your UI isn't easily discoverable, users will not be able to use it, and will be frustrated and annoyed. The user is always right, even when they're doing it wrong *wink* >> But it seems to me that comments are superfluous, if not actively harmful: > > (I've not read anything about harmful below. Was that just FUD?) Lowering accessibility to parts of the documentation is what I was talking about when I said "actively harmful". But now that I have better understanding of what the comment system is actually for, I have to rethink. 
-- Steven From glenn at nevcal.com Sun Nov 28 02:04:49 2010 From: glenn at nevcal.com (Glenn Linderman) Date: Sat, 27 Nov 2010 17:04:49 -0800 Subject: [Python-Dev] constant/enum type in stdlib In-Reply-To: <4CF1706E.5030503@g.nevcal.com> References: <20101121034404.52924F20A@mail.python.org> <4CE9BF4A.1020302@netwok.org> <4CEA89E8.5090107@voidspace.org.uk> <20101122163722.7e96d123@pitrou.net> <4CEA9584.7040301@avl.com> <20101122172440.77d27ed5@pitrou.net> <20101122164654.2109.588145158.divmod.xquotient.165@localhost.localdomain> <4CEBC6BD.9060402@voidspace.org.uk> <4CED0557.9090101@voidspace.org.uk> <4CED4E34.5060400@voidspace.org.uk> <4CF1706E.5030503@g.nevcal.com> Message-ID: <4CF1AAB1.4010808@nevcal.com> On 11/27/2010 12:56 PM, Glenn Linderman wrote: > On 11/27/2010 2:51 AM, Nick Coghlan wrote: >> Not quite. I'm suggesting a factory function that works for any value, >> and derives the parent class from the type of the supplied value. > > Nick, thanks for the much better implementation than I achieved; you > seem to have the same goals as my implementation. I learned a bit > making mine, and more understanding yours to some degree. What I > still don't understand about your implementation, is that when adding > one additional line to your file, it fails: > > w = named_value("ABC", z ) > > Now I can understand why it might not be a good thing to make a named > value of a named value (confusing, at least), but I was surprised, and > still do not understand, that it failed reporting the __new__() takes > exactly 3 arguments (2 given). OK, I puzzled out the error, and here is a "cure" of sorts. 
    def __new__(cls, name, value):
        try:
            return base_type.__new__(cls, value)
        except TypeError:
            return base_type.__new__(cls, name, value)

    def __init__(self, name, value):
        self.__name = name
        try:
            super().__init__(value)
        except TypeError:
            super().__init__(name, value)

Probably it would be better for the except clause to raise a different type of error ( Can't recursively create named value ) or to cleverly bypass the intermediate named value, and simply apply a new name to the original value. Hmm... For this, only __new__ need be changed:

    def __new__(cls, name, value):
        try:
            return base_type.__new__(cls, value)
        except TypeError:
            return _make_named_value_type( type( value._raw() ))( name, value._raw() )

    def __init__(self, name, value):
        self.__name = name
        super().__init__(value)

Thanks for not responding too quickly, I figured out more, and learned more.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From ncoghlan at gmail.com Sun Nov 28 03:38:27 2010
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sun, 28 Nov 2010 12:38:27 +1000
Subject: [Python-Dev] constant/enum type in stdlib
In-Reply-To: <1D372F35-B455-4982-997B-2C54A7D56741@gmail.com>
References: <20101121034404.52924F20A@mail.python.org> <4CE9BF4A.1020302@netwok.org> <4CEA89E8.5090107@voidspace.org.uk> <20101122163722.7e96d123@pitrou.net> <4CEA9584.7040301@avl.com> <20101122172440.77d27ed5@pitrou.net> <20101122164654.2109.588145158.divmod.xquotient.165@localhost.localdomain> <4CEBC6BD.9060402@voidspace.org.uk> <4CED0557.9090101@voidspace.org.uk> <4CED4E34.5060400@voidspace.org.uk> <4CF1706E.5030503@g.nevcal.com> <1D372F35-B455-4982-997B-2C54A7D56741@gmail.com>
Message-ID:

On Sun, Nov 28, 2010 at 9:26 AM, Raymond Hettinger wrote:
>
> On Nov 27, 2010, at 12:56 PM, Glenn Linderman wrote:
>
>> On 11/27/2010 2:51 AM, Nick Coghlan wrote:
>>>
>>> Not quite. I'm suggesting a factory function that works for any value,
>>> and derives the parent class from the type of the supplied value.
>>
>> Nick, thanks for the much better implementation than I achieved; you seem to have the same goals as my implementation. I learned a bit making mine, and more understanding yours to some degree. What I still don't understand about your implementation, is that when adding one additional line to your file, it fails:
>>
>> w = named_value("ABC", z )
>>
>> Now I can understand why it might not be a good thing to make a named value of a named value (confusing, at least), but I was surprised, and still do not understand, that it failed reporting the __new__() takes exactly 3 arguments (2 given).
>
> Can I suggest that an enum-maker be offered as a third-party module rather than prematurely adding it into the standard library.

Indeed. Glenn's failing example suggests to me that using a new metaclass is probably going to be a cleaner option than trying to dance around type's default behaviour within an ordinary class definition (if nothing else, a separate metaclass makes it much easier to detect when you're dealing with an instance of a named type).

Regardless, I still see value in approaching this whole discussion as a two-level design problem, with "named values" as the more fundamental concept, and then higher level grouping APIs to get enum-style behaviour. Eventually attaining "One Obvious Way" for the former seems achievable to me, while the diversity of use cases for grouping APIs suggests to me that "one-size-fits-all" isn't going to work unless that "one size" is a Frankenstein API with more options than anyone could reasonably hope to keep in their head at once.

Cheers,
Nick.

--
Nick Coghlan   |   ncoghlan at gmail.com   |
Brisbane, Australia

From tjreedy at udel.edu Sun Nov 28 04:20:50 2010
From: tjreedy at udel.edu (Terry Reedy)
Date: Sat, 27 Nov 2010 22:20:50 -0500
Subject: [Python-Dev] constant/enum type in stdlib
In-Reply-To: <1D372F35-B455-4982-997B-2C54A7D56741@gmail.com>
References: <20101121034404.52924F20A@mail.python.org> <4CE9BF4A.1020302@netwok.org> <4CEA89E8.5090107@voidspace.org.uk> <20101122163722.7e96d123@pitrou.net> <4CEA9584.7040301@avl.com> <20101122172440.77d27ed5@pitrou.net> <20101122164654.2109.588145158.divmod.xquotient.165@localhost.localdomain> <4CEBC6BD.9060402@voidspace.org.uk> <4CED0557.9090101@voidspace.org.uk> <4CED4E34.5060400@voidspace.org.uk> <4CF1706E.5030503@g.nevcal.com> <1D372F35-B455-4982-997B-2C54A7D56741@gmail.com>
Message-ID:

On 11/27/2010 6:26 PM, Raymond Hettinger wrote:

> Can I suggest that an enum-maker be offered as a third-party module

Possibly with competing versions for trial and testing ;-)

> rather than prematurely adding it into the standard library.

I had same thought.

--
Terry Jan Reedy

From donjohnston at selfaware.com Sun Nov 28 05:17:11 2010
From: donjohnston at selfaware.com (Don Johnston)
Date: Sun, 28 Nov 2010 04:17:11 +0000 (UTC)
Subject: [Python-Dev] [Preview] Comments and change proposals on documentation
References: <4CF18220.7000202@pearwood.info> <4CF19B3C.2000308@pearwood.info>
Message-ID:

Steven D'Aprano <steve at pearwood.info> writes:

> Aha! I never would have guessed that the bubbles are clickable -- I
> thought you just moused-over them and they showed static comments put
> there by the developers, part of the documentation itself. I didn't
> realise that it was for users to add spam^W comments to the page. With
> that perspective, I need to rethink.
>
> Yes, I failed to fully read the instructions you sent, or understand
> them. That's what users do -- they don't read your instructions, and
> they misunderstand them.
If your UI isn't easily discoverable, users > will not be able to use it, and will be frustrated and annoyed. The user > is always right, even when they're doing it wrong *wink* > > > >> But it seems to me that comments are superfluous, if not actively harmful: > > > > (I've not read anything about harmful below. Was that just FUD?) > > Lowering accessibility to parts of the documentation is what I was > talking about when I said "actively harmful". But now that I have better > understanding of what the comment system is actually for, I have to rethink. > As an end-user, I, too, share concerns about the accessibility of the pending (proposed?) commenting functionality. A read-only JSON API would be great. Up until now, Sphinx has been an incredibly helpful tool for generating beautiful documentation from ReStructuredText, which is great for limiting the risk of malformed input. The new commenting feature ("dynamic application functionality") requires persistence for user-submitted content. Database persistence is currently implemented with the -excellent- SQLAlchemy ORM. So, this is a transition from Sphinx being an excellent publishing tool to being a dynamic publishing platform for user-submitted content ("comments"). I am sure this was not without due consideration, and FUD. The Python Web Framework communities (favorite framework *here*) will be the first to reiterate the challenges that all web application developers (and commenting API providers) face on a daily basis: - SQL Injection - XSS (Cross Site Scripting) - CSRF (Cross Site Request Forgery) Here are a few scenarios to consider: (1) Freeloading jackass decides that each paragraph of our documentation would look better with 200 "comments" for viagara. Freeloading jackass is aware of how HTTP GETs work. - What markup features are supported? - How does the application sanitize user-supplied input? - Is html5lib good enough? 
- On docs.python.org, how are 1000 inappropriate (freeloading) comments from 1000 different IPs deleted? - What's the roadmap for {..., Akismet, ReCaptcha, ...} support? (2) Freeloading jackass buys a block of javascript adspace on . The block of javascript surreptitiously posts helpful comments on behalf of unwitting users. - How does the application ensure that comments are submitted from the site hosting the documentation? - Which frameworks have existing, reviewed CSRF protections? Trying to read through the new source here [1], but there aren't many docstrings and BB doesn't yet support inline commenting. AFAIK, there are not yet any issues filed for these concerns. [2] 1. In the event that that kind of bug is discovered, how should the community report the issues? 2. If we have an alternate method of encouraging documentation feedback, how can this feature be turned off? Thanks again for a great publishing tool, Don [1] http://bitbucket.org/birkenfeld/sphinx [2] http://bitbucket.org/birkenfeld/sphinx/issues/new From benjamin at python.org Sun Nov 28 05:33:43 2010 From: benjamin at python.org (Benjamin Peterson) Date: Sat, 27 Nov 2010 22:33:43 -0600 Subject: [Python-Dev] [RELEASED] Python 2.7.1 Message-ID: On behalf of the Python development team, I'm happy as a clam to announce the immediate availability of Python 2.7.1. 2.7 includes many features that were first released in Python 3.1. The faster io module, the new nested with statement syntax, improved float repr, set literals, dictionary views, and the memoryview object have been backported from 3.1. Other features include an ordered dictionary implementation, unittests improvements, a new sysconfig module, auto-numbering of fields in the str/unicode format method, and support for ttk Tile in Tkinter. For a more extensive list of changes in 2.7, see http://doc.python.org/dev/whatsnew/2.7.html or Misc/NEWS in the Python distribution. 
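A few of the backported features listed above can be tried in a handful of lines. This is only an illustrative sketch: the variable names are made up for the example, and the snippet is written so that it should behave the same on 2.7 and on 3.x.

```python
from collections import OrderedDict
import io

# Set literal: no need to spell out set([...]).
primes = {2, 3, 5, 7}

# Auto-numbered format fields: "{}" instead of "{0}" and "{1}".
label = "{} of {}".format(1, 3)

# OrderedDict remembers the order in which keys were inserted.
od = OrderedDict()
od["first"] = 1
od["second"] = 2
keys = list(od)

# Condensed syntax for nested with statements: two context managers, one line.
with io.BytesIO(b"a\n") as src, io.BytesIO() as dst:
    dst.write(src.read().upper())
    copied = dst.getvalue()

# memoryview: slice a bytes object without copying until bytes() is called.
view = memoryview(b"python")
tail = bytes(view[2:])

print(sorted(primes), label, keys, copied == b"A\n", tail == b"thon")
```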
To download Python 2.7.1 visit: http://www.python.org/download/releases/2.7.1/ The 2.7.1 changelog is at: http://svn.python.org/projects/python/tags/r271/Misc/NEWS 2.7 documentation can be found at: http://docs.python.org/2.7/ This is a production release. Please report any bugs you find to the bug tracker: http://bugs.python.org/ Enjoy! -- Benjamin Peterson Release Manager benjamin at python.org (on behalf of the entire python-dev team and 2.7.1's contributors) From benjamin at python.org Sun Nov 28 05:34:42 2010 From: benjamin at python.org (Benjamin Peterson) Date: Sat, 27 Nov 2010 22:34:42 -0600 Subject: [Python-Dev] [RELEASED] Python 3.1.3 Message-ID: On behalf of the Python development team, I'm happy as a lark to announce the third bugfix release for the Python 3.1 series, Python 3.1.3. This bug fix release features numerous bug fixes and documentation improvements over 3.1.2. The Python 3.1 version series focuses on the stabilization and optimization of the features and changes that Python 3.0 introduced. For example, the new I/O system has been rewritten in C for speed. File system APIs that use unicode strings now handle paths with undecodable bytes in them. Other features include an ordered dictionary implementation, a condensed syntax for nested with statements, and support for ttk Tile in Tkinter. For a more extensive list of changes in 3.1, see http://doc.python.org/3.1/whatsnew/3.1.html or Misc/NEWS in the Python distribution. This is a production release. To download Python 3.1.3 visit: http://www.python.org/download/releases/3.1.3/ A list of changes in 3.1.3 can be found here: http://svn.python.org/projects/python/tags/r313/Misc/NEWS The 3.1 documentation can be found at: http://docs.python.org/3.1 Bugs can always be reported to: http://bugs.python.org Enjoy! 
-- 
Benjamin Peterson
Release Manager
benjamin at python.org
(on behalf of the entire python-dev team and 3.1.3's contributors)

From martin at v.loewis.de Sun Nov 28 09:09:53 2010
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Sun, 28 Nov 2010 09:09:53 +0100
Subject: [Python-Dev] Virus on python-3.1.2.msi?
Message-ID: <4CF20E51.3050004@v.loewis.de>

Issue 10500 claims that the 3.1.2 installer has the virus Palevo.DZ.
Can somebody with a virus scanner please confirm or contest that
claim?

Thanks,
Martin

http://bugs.python.org/issue10500

From fuzzyman at voidspace.org.uk Sun Nov 28 14:48:08 2010
From: fuzzyman at voidspace.org.uk (Michael Foord)
Date: Sun, 28 Nov 2010 13:48:08 +0000
Subject: [Python-Dev] constant/enum type in stdlib
In-Reply-To:
References: <20101121034404.52924F20A@mail.python.org> <4CE9BF4A.1020302@netwok.org> <4CEA89E8.5090107@voidspace.org.uk> <20101122163722.7e96d123@pitrou.net> <4CEA9584.7040301@avl.com> <20101122172440.77d27ed5@pitrou.net> <20101122164654.2109.588145158.divmod.xquotient.165@localhost.localdomain> <4CEBC6BD.9060402@voidspace.org.uk> <4CED0557.9090101@voidspace.org.uk> <4CED4E34.5060400@voidspace.org.uk> <4CF1706E.5030503@g.nevcal.com> <1D372F35-B455-4982-997B-2C54A7D56741@gmail.com>
Message-ID: <4CF25D98.10105@voidspace.org.uk>

On 28/11/2010 03:20, Terry Reedy wrote:
> On 11/27/2010 6:26 PM, Raymond Hettinger wrote:
>
>> Can I suggest that an enum-maker be offered as a third-party module
>
> Possibly with competing versions for trial and testing ;-)
>
>> rather than prematurely adding it into the standard library.
>
> I had same thought.
>

There are already *several* enum packages for Python available. The
implementation by Ben Finney, associated with the previous PEP, is on
PyPI and the most recent release has over 4000 downloads, making it
reasonably popular:

http://pypi.python.org/pypi/enum/

Other contenders include flufl.enum and lazr.enum.
The Twisted guys would like a named constant type, and have a ticket for it, and PyQt has its own implementation (subclassing int) providing this functionality. In terms of assessing *general* usefulness in the wider community that step has already been done. This discussion came out of yet-another-set-of-integer-constants being added to the Python standard library (since changed to strings). We have integer constants, with the associated inscrutability when used from the interactive interpreter or debugging, in *many* standard library modules. The particular features and use cases being discussed have use *within* the standard library in mind. Releasing yet-another-enum-library-that-the-standard-library-can't-use would be a particularly pointless outcome of this discussion. The decision is whether or not to use named constants in the standard library, otherwise we can just point people at one of the several existing packages. All the best, Michael Foord -- http://www.voidspace.org.uk/ READ CAREFULLY. By accepting and reading this email you agree, on behalf of your employer, to release me from all obligations and waivers arising from any and all NON-NEGOTIATED agreements, licenses, terms-of-service, shrinkwrap, clickwrap, browsewrap, confidentiality, non-disclosure, non-compete and acceptable use policies (?BOGUS AGREEMENTS?) that I have entered into with your employer, its partners, licensors, agents and assigns, in perpetuity, without prejudice to my ongoing rights and privileges. You further represent that you have the authority to release me from any BOGUS AGREEMENTS on behalf of your employer. 
From doko at ubuntu.com Sun Nov 28 16:46:09 2010
From: doko at ubuntu.com (Matthias Klose)
Date: Sun, 28 Nov 2010 16:46:09 +0100
Subject: [Python-Dev] Question about GDB bindings and 32/64 bits
In-Reply-To: <4CEF338C.4070509@jcea.es>
References: <4CEF338C.4070509@jcea.es>
Message-ID: <4CF27941.1020200@ubuntu.com>

On 26.11.2010 05:11, Jesus Cea wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> I have installed GDB 7.2 32 bits and 32 bits buildslaves are green.
> Nevertheless 64 bits buildslaves are failing test_gdb.
>
> Is there any expectation that a 32 bits GDB be able to debug a 64 bits
> python? If not, gdb test should compare "platform.architecture()" (for
> python and gdb in the system) and run only when they are the same.

That would be too restrictive, as a 64-bit gdb is able to handle 32-bit
binaries too.

> If
> this should work, I would open a bug and maybe spend some time with it.
>
> But before thinking about investing time, I would like to know if this
> mix is actually expected or not to work.
>
> If not, I would consider to install a 64 bits GDB too and do some tricks
> (like using an "/usr/local/bin/gdb" script wrapper to choose 32/64
> "real" gdb version) to actually execute "test_gdb" in both buildslaves
> (they are running in the same physical machine).

Yes, and then you should be able to use this gdb for both 32-bit and
64-bit builds. No need for a wrapper. (Such a gdb is available in the
gdb64 package on Debian/Ubuntu.)
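
A minimal sketch of the check discussed in this message, relaxed along the lines of the reply so that a wider gdb is accepted for a narrower Python. The helper names bits_of and gdb_can_debug are hypothetical, and treating a narrower gdb as unusable is an assumption that the thread does not settle. platform.architecture() inspects the file it is given, so pointing it at a wrapper script instead of the real gdb binary would not give a meaningful answer.

```python
import platform
import sys


def bits_of(executable):
    """Return the pointer width (for example 32 or 64) reported for a binary."""
    bits, _linkage = platform.architecture(executable)
    return int(bits.replace("bit", ""))


def gdb_can_debug(gdb_path, python_path=sys.executable):
    """Assumption: a gdb at least as wide as the interpreter is good enough."""
    return bits_of(gdb_path) >= bits_of(python_path)
```

A test could call gdb_can_debug() with the path of the installed gdb and skip itself when the result is false, instead of requiring both architectures to be identical.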
Matthias From fuzzyman at voidspace.org.uk Sun Nov 28 17:28:00 2010 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Sun, 28 Nov 2010 16:28:00 +0000 Subject: [Python-Dev] constant/enum type in stdlib In-Reply-To: References: <20101121034404.52924F20A@mail.python.org> <4CE9BF4A.1020302@netwok.org> <4CEA89E8.5090107@voidspace.org.uk> <20101122163722.7e96d123@pitrou.net> <4CEA9584.7040301@avl.com> <20101122172440.77d27ed5@pitrou.net> <20101122164654.2109.588145158.divmod.xquotient.165@localhost.localdomain> <4CEBC6BD.9060402@voidspace.org.uk> <4CED0557.9090101@voidspace.org.uk> <4CED4E34.5060400@voidspace.org.uk> <4CF1706E.5030503@g.nevcal.com> <1D372F35-B455-4982-997B-2C54A7D56741@gmail.com> Message-ID: <4CF28310.7070304@voidspace.org.uk> On 28/11/2010 02:38, Nick Coghlan wrote: > On Sun, Nov 28, 2010 at 9:26 AM, Raymond Hettinger > wrote: >> On Nov 27, 2010, at 12:56 PM, Glenn Linderman wrote: >> >>> On 11/27/2010 2:51 AM, Nick Coghlan wrote: >>>> Not quite. I'm suggesting a factory function that works for any value, >>>> and derives the parent class from the type of the supplied value. >>> Nick, thanks for the much better implementation than I achieved; you seem to have the same goals as my implementation. I learned a bit making mine, and more understanding yours to some degree. What I still don't understand about your implementation, is that when adding one additional line to your file, it fails: >>> >>> w = named_value("ABC", z ) >>> >>> Now I can understand why it might not be a good thing to make a named value of a named value (confusing, at least), but I was surprised, and still do not understand, that it failed reporting the __new__() takes exactly 3 arguments (2 given). >> Can I suggest that an enum-maker be offered as a third-party module rather than prematurely adding it into the standard library. > Indeed. 
Glenn's failing example suggests to me that using a new > metaclass is probably going to be a cleaner option than trying to > dance around type's default behaviour within an ordinary class > definition (if nothing else, a separate metaclass makes it much easier > to detect when you're dealing with an instance of a named type). > Yep, for representing a group of names a single class with a metaclass seems like a reasonable approach. See my note below about agreeing minimal feature-set and minimal-api before we discuss implementation though. > Regardless, I still see value in approaching this whole discussion as > a two-level design problem, with "named values" as the more > fundamental concept, and then higher level grouping APIs to get > enum-style behaviour. It seems like using the term "enum" provokes a strong negative reaction in some of the core-devs who are basically in favour named constants and not actively against grouping. I'm happy with NamedConstant and GroupedNames (or similar) and dropping the use of the term enum. There are also valid concerns about over-engineering (and not so valid concerns...). Simplicity in creating them and no additional burden in using them are fundamental, but in the APIs / implementations suggested so far I think we are keeping that in mind. > Eventually attaining "One Obvious Way" for the > former seems achievable to me, while the diversity of use cases for > grouping APIs suggests to me that "one-size-fits-all" isn't going to > work unless that "one size" is a Frankenstein API with more options > than anyone could reasonably hope to keep in their head at once. > Well... yes - treating it as a two level design problem is fine. I don't think there are *many* competing features, in fact as far as feature requests on python-dev go I think this is a relatively straightforward one with a lot of *agreement* on the basic functionality. 
We have had various discussions about what the API should look like, or
what the implementation should look like, but I don't think there is a
lot of disagreement about basic features. There are some 'optional
features'. Many of these can be added later without backwards
compatibility issues, so those can profitably be omitted from an
initial implementation.

Features as I see them:

Named constant
--------------

* Nice repr
* Subclass of the type it represents
* Trivially easy to convert to either a string (name) or the value it
  represents
* If an integer type, can be OR'd with other named constants and
  retains a useful repr

Grouped constants
-----------------

* Easy to create a group of named constants, accessible as attributes
  on group object
* Capability to go from name or value to corresponding constants

Optional Features
-----------------

* Ability to dynamically add new named values to a group. (Suggested
  by Guido)
* Ability to test if a name or value is in a group
* Ability to list all names in a group
* ANDing as well as ORing
* Constants are unique
* OR'ing with an integer will look up the name (or calculate it if the
  int itself represents flags that have already been OR'd) and return a
  named value (with useful repr) instead of just an integer
* Named constants be named values that can wrap *any* type and not
  just immutable values. (Note that wrapping mutable types makes
  providing "from_value" functionality harder *unless* we guarantee that
  named values are unique. If they aren't unique, named values for a
  mutable type can have different values and there is no single
  definition of what the named value actually is.)
* Requiring that values only have one name - or alternatively that
  values on a group could have multiple names (obviously incompatible
  features).
* Requiring all names in a group to be of the same type
* Allow names to be set automatically in a namespace, for example in a
  class namespace or on a module
* Allow subclassing and adding of new values only present in subclass

I'd rather we agree a suitable (minimal) API and feature set and go to
implementation from that.

For wrapping mutable types I'm tempted to say YAGNI. For the standard
library wrapping integers meets almost all our use-cases except for
one float. (At work we have a decimal constant as it happens.) Perhaps
we could require immutable types for groups but allow arbitrary values
for individual named values?

For the named values api:

    name = NamedValue('name', value)

For the grouping (tentatively accepted as reasonable by Antoine):

    Group = make_constants('Group', name1=value1, name2=value2)
    name1, name2 = Group.name1, Group.name2
    flag = name1 | name2

    value = int(Group.name1)
    name = Group('name1')
    # alternatively: value = Group.from_name('name1')
    name = Group.from_value(value1)
    # Group(value1) could work only if values aren't strings
    # perhaps: name = Group(value=value1)

    Group.new_name = value3 # create new value on the group
    names = Group.all_names()
    # further bikeshedding on spelling of all_names required
    # correspondingly 'all_values' I guess, returning the constants
    # themselves

Some of the optional features couldn't later be added without
backwards compatibility concerns (I think the type checking features
and requiring unique values for example). We should at least consider
these if we are to make adding them later difficult. I would be fine
with not having these features.

All the best,

Michael

> Cheers,
> Nick.
>

-- 
http://www.voidspace.org.uk/

READ CAREFULLY.
By accepting and reading this email you agree, on behalf of your employer, to release me from all obligations and waivers arising from any and all NON-NEGOTIATED agreements, licenses, terms-of-service, shrinkwrap, clickwrap, browsewrap, confidentiality, non-disclosure, non-compete and acceptable use policies (?BOGUS AGREEMENTS?) that I have entered into with your employer, its partners, licensors, agents and assigns, in perpetuity, without prejudice to my ongoing rights and privileges. You further represent that you have the authority to release me from any BOGUS AGREEMENTS on behalf of your employer. From fuzzyman at voidspace.org.uk Sun Nov 28 18:05:12 2010 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Sun, 28 Nov 2010 17:05:12 +0000 Subject: [Python-Dev] constant/enum type in stdlib In-Reply-To: <4CF28310.7070304@voidspace.org.uk> References: <20101121034404.52924F20A@mail.python.org> <4CE9BF4A.1020302@netwok.org> <4CEA89E8.5090107@voidspace.org.uk> <20101122163722.7e96d123@pitrou.net> <4CEA9584.7040301@avl.com> <20101122172440.77d27ed5@pitrou.net> <20101122164654.2109.588145158.divmod.xquotient.165@localhost.localdomain> <4CEBC6BD.9060402@voidspace.org.uk> <4CED0557.9090101@voidspace.org.uk> <4CED4E34.5060400@voidspace.org.uk> <4CF1706E.5030503@g.nevcal.com> <1D372F35-B455-4982-997B-2C54A7D56741@gmail.com> <4CF28310.7070304@voidspace.org.uk> Message-ID: <4CF28BC8.1080508@voidspace.org.uk> On 28/11/2010 16:28, Michael Foord wrote: > [snip...] > I don't think there are *many* competing features, in fact as far as > feature requests on python-dev go I think this is a relatively > straightforward one with a lot of *agreement* on the basic functionality. > > We have had various discussions about what the API should look like, > or what the implementation should look like, but I don't think there > is a lot of disagreement about basic features. There are some > 'optional features'. 
Many of these can be added later without
> backwards compatibility issues, so those can profitably be omitted
> from an initial implementation.
>
> Features as I see them:
>
> Named constant
> --------------
>
> * Nice repr
> * Subclass of the type it represents
> * Trivially easy to convert either to a string (name) and the value it
> represents
> * If an integer type, can be OR'd with other named constants and
> retains a useful repr
>

Note that having an OR repr is meaningless *unless* the constants are
intended to be flags, so OR'ing should be specified explicitly:

    name = NamedValue('name', value, flags=True)

Where flags defaults to False.

Typically you will use this through the grouping API anyway - where it
can either be a keyword argument (slightly annoying because the
suggestion is to create the named values through keyword arguments) or
we can have two group-factory functions:

    Group = make_constants('Group', name1=value1, name2=value2)
    Flags = make_flags('Flags', name1=value1, name2=value2)

It is sensible if flag values are only powers of 2; we could enforce
that or not... (Another one for the optional feature list.)

I forgot auto-enumeration (specifying names only and having values
autogenerated) from the optional feature set by the way. I think
Antoine strongly disapproves of this feature because it reminds him of
C enums.

Mark Dickinson thinks that the flags feature could be an optional
feature too. If we have ORing it makes sense to have ANDing, so I guess
they belong together. I think there is value in it though.

I realise that the optional feature list is now not small, and
implementing all of it would create the "franken-api" Nick is worried
about. The minimal feature list is nicely small though and provides
useful functionality.
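
The proposal above could be sketched roughly as follows. The names NamedValue, make_constants and make_flags come from the message, but the bodies are purely illustrative assumptions and not an implementation anyone in the thread posted. The sketch only handles immutable, subclassable value types such as int, float and str (bool cannot be subclassed), and the repr format is invented for the example.

```python
def NamedValue(name, value, flags=False):
    """Wrap value in a subclass of its own type that remembers a name."""
    base = type(value)

    class Named(base):
        def __new__(cls, label, raw):
            self = base.__new__(cls, raw)
            self._name = label
            return self

        def __repr__(self):
            # Hypothetical repr format: <NAME=value>
            return "<%s=%s>" % (self._name, base.__repr__(self))

        if flags:
            def __or__(self, other):
                # Only flag-style values keep a name when OR'd together.
                result = base.__or__(self, other)
                if result is NotImplemented:
                    return result
                other_name = getattr(other, "_name", repr(other))
                return Named("%s|%s" % (self._name, other_name), result)

    return Named(name, value)


def _make_group(group_name, names, flags):
    members = {n: NamedValue(n, v, flags=flags) for n, v in names.items()}
    return type(group_name, (), members)


def make_constants(group_name, /, **names):
    """Group of plain named constants, reachable as attributes."""
    return _make_group(group_name, names, flags=False)


def make_flags(group_name, /, **names):
    """Group of named constants whose OR results keep a readable repr."""
    return _make_group(group_name, names, flags=True)


Flags = make_flags("Flags", READ=1, WRITE=2)
print(repr(Flags.READ | Flags.WRITE))
```

Having two factory functions, as suggested in the message, avoids reserving a keyword such as flags that would otherwise clash with a constant of the same name.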
All the best,

Michael

> [snip...]

-- 
http://www.voidspace.org.uk/
From fuzzyman at voidspace.org.uk Sun Nov 28 18:16:21 2010
From: fuzzyman at voidspace.org.uk (Michael Foord)
Date: Sun, 28 Nov 2010 17:16:21 +0000
Subject: [Python-Dev] constant/enum type in stdlib
In-Reply-To: <4CF28BC8.1080508@voidspace.org.uk>
References: <20101121034404.52924F20A@mail.python.org> <4CE9BF4A.1020302@netwok.org> <4CEA89E8.5090107@voidspace.org.uk> <20101122163722.7e96d123@pitrou.net> <4CEA9584.7040301@avl.com> <20101122172440.77d27ed5@pitrou.net> <20101122164654.2109.588145158.divmod.xquotient.165@localhost.localdomain> <4CEBC6BD.9060402@voidspace.org.uk> <4CED0557.9090101@voidspace.org.uk> <4CED4E34.5060400@voidspace.org.uk> <4CF1706E.5030503@g.nevcal.com> <1D372F35-B455-4982-997B-2C54A7D56741@gmail.com> <4CF28310.7070304@voidspace.org.uk> <4CF28BC8.1080508@voidspace.org.uk>
Message-ID: <4CF28E65.2060405@voidspace.org.uk>

On 28/11/2010 17:05, Michael Foord wrote:
> [snip...]
> It is sensible if flag values are only powers of 2; we could enforce
> that or not... (Another one for the optional feature list.)
>

Another 'optional' feature I omitted was Phillip J. Eby's suggestion /
requirement that named values be pickleable.

Email is clunky for handling this. Is there enough support (there is
still some objection, that is for sure) to revive the PEP or create a
new one?

I also didn't include Nick's suggested API, which is slightly different
from the one I suggested:

    silly = Namegroup.from_names("Silly", "FOO", "BAR", "BAZ")

    >>> silly.FOO
    Silly.FOO=0
    >>> int(silly.FOO)
    0
    >>> silly(0)
    Silly.FOO=0

    x = named_value("FOO", 1)
    y = named_value("BAR", "Hello World!")
    z = named_value("BAZ", dict(a=1, b=2, c=3))

    set_named_values(globals(), foo=x._raw(), bar=y._raw(), baz=z._raw())

Where a named value created from an integer is an int subclass, from a
dict a dict subclass and so on.

Michael

> [snip...]

-- 
http://www.voidspace.org.uk/

READ CAREFULLY.
By accepting and reading this email you agree, on behalf of your employer, to release me from all obligations and waivers arising from any and all NON-NEGOTIATED agreements, licenses, terms-of-service, shrinkwrap, clickwrap, browsewrap, confidentiality, non-disclosure, non-compete and acceptable use policies (?BOGUS AGREEMENTS?) that I have entered into with your employer, its partners, licensors, agents and assigns, in perpetuity, without prejudice to my ongoing rights and privileges. You further represent that you have the authority to release me from any BOGUS AGREEMENTS on behalf of your employer. From steve at pearwood.info Sun Nov 28 19:05:55 2010 From: steve at pearwood.info (Steven D'Aprano) Date: Mon, 29 Nov 2010 05:05:55 +1100 Subject: [Python-Dev] constant/enum type in stdlib In-Reply-To: <4CF28E65.2060405@voidspace.org.uk> References: <20101121034404.52924F20A@mail.python.org> <4CE9BF4A.1020302@netwok.org> <4CEA89E8.5090107@voidspace.org.uk> <20101122163722.7e96d123@pitrou.net> <4CEA9584.7040301@avl.com> <20101122172440.77d27ed5@pitrou.net> <20101122164654.2109.588145158.divmod.xquotient.165@localhost.localdomain> <4CEBC6BD.9060402@voidspace.org.uk> <4CED0557.9090101@voidspace.org.uk> <4CED4E34.5060400@voidspace.org.uk> <4CF1706E.5030503@g.nevcal.com> <1D372F35-B455-4982-997B-2C54A7D56741@gmail.com> <4CF28310.7070304@voidspace.org.uk> <4CF28BC8.1080508@voidspace.org.uk> <4CF28E65.2060405@voidspace.org.uk> Message-ID: <4CF29A03.3060900@pearwood.info> Michael Foord wrote: > Another 'optional' feature I omitted was Phillip J. Eby's suggestion / > requirement that named values be pickleable. Email is clunky for > handling this, is there enough support (there is still some objection > that is sure) to revive the PEP or create a new one? I think it definitely needs a PEP. I don't care whether you revive the old PEP or write a new one. 
-- Steven From fuzzyman at voidspace.org.uk Sun Nov 28 19:49:30 2010 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Sun, 28 Nov 2010 18:49:30 +0000 Subject: [Python-Dev] constant/enum type in stdlib In-Reply-To: <4CF29A03.3060900@pearwood.info> References: <20101121034404.52924F20A@mail.python.org> <4CEA89E8.5090107@voidspace.org.uk> <20101122163722.7e96d123@pitrou.net> <4CEA9584.7040301@avl.com> <20101122172440.77d27ed5@pitrou.net> <20101122164654.2109.588145158.divmod.xquotient.165@localhost.localdomain> <4CEBC6BD.9060402@voidspace.org.uk> <4CED0557.9090101@voidspace.org.uk> <4CED4E34.5060400@voidspace.org.uk> <4CF1706E.5030503@g.nevcal.com> <1D372F35-B455-4982-997B-2C54A7D56741@gmail.com> <4CF28310.7070304@voidspace.org.uk> <4CF28BC8.1080508@voidspace.org.uk> <4CF28E65.2060405@voidspace.org.uk> <4CF29A03.3060900@pearwood.info> Message-ID: <4CF2A43A.5040009@voidspace.org.uk> On 28/11/2010 18:05, Steven D'Aprano wrote: > Michael Foord wrote: > >> Another 'optional' feature I omitted was Phillip J. Eby's suggestion >> / requirement that named values be pickleable. Email is clunky for >> handling this, is there enough support (there is still some objection >> that is sure) to revive the PEP or create a new one? > > I think it definitely needs a PEP. I don't care whether you revive the > old PEP or write a new one. > Well, "if it were to be accepted it would need a PEP" and "the next step should be a PEP" are slightly different statements. :-) As I agree with the former *anyway* at the worst starting a PEP will waste time, so I guess I'll get that underway when I get a chance... Thanks Michael -- http://www.voidspace.org.uk/ READ CAREFULLY. 
By accepting and reading this email you agree, on behalf of your
employer, to release me from all obligations and waivers arising from
any and all NON-NEGOTIATED agreements, licenses, terms-of-service,
shrinkwrap, clickwrap, browsewrap, confidentiality, non-disclosure,
non-compete and acceptable use policies ("BOGUS AGREEMENTS") that I
have entered into with your employer, its partners, licensors, agents
and assigns, in perpetuity, without prejudice to my ongoing rights and
privileges. You further represent that you have the authority to
release me from any BOGUS AGREEMENTS on behalf of your employer.

From alexander.belopolsky at gmail.com Sun Nov 28 21:24:37 2010
From: alexander.belopolsky at gmail.com (Alexander Belopolsky)
Date: Sun, 28 Nov 2010 15:24:37 -0500
Subject: [Python-Dev] Python and the Unicode Character Database
Message-ID:

Two recently reported issues brought to light the fact that the Python
language definition is closely tied to character properties maintained
by the Unicode Consortium. [1,2] For example, when Python switches to
Unicode 6.0.0 (planned for the upcoming 3.2 release), we will gain two
additional characters that Python can use in identifiers. [3]

With Python 3.1:

>>> exec('\u0CF1 = 1')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<string>", line 1
    ೱ = 1
    ^
SyntaxError: invalid character in identifier

but with Python 3.2a4:

>>> exec('\u0CF1 = 1')
>>> eval('\u0CF1')
1

Of course, the likelihood is low that this change will affect any
user, but the change in str.isspace() reported in [1] is likely to
cause some trouble:

Python 2.6.5:
>>> u'A\u200bB'.split()
[u'A', u'B']

Python 2.7:
>>> u'A\u200bB'.split()
[u'A\u200bB']

While we have little choice but to follow UCD in defining
str.isidentifier(), I think Python can promise users more stability in
what it treats as space or as a digit in its builtins.
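
One way to see which behaviour a given interpreter has is to ask it directly. The following is a small sketch for Python 3; the helper name whitespace_report is hypothetical, and the values printed depend on the Unicode database version bundled with the interpreter that runs it.

```python
import unicodedata


def whitespace_report(ch):
    """Collect what this interpreter's Unicode tables say about one character."""
    return {
        "ucd_version": unicodedata.unidata_version,
        "category": unicodedata.category(ch),
        "isspace": ch.isspace(),
    }


ZWSP = "\u200b"  # ZERO WIDTH SPACE, the character from the split() example
report = whitespace_report(ZWSP)
print(report)
print(("A" + ZWSP + "B").split())
```

On interpreters that follow the newer classification, isspace is false for this character and split() leaves the string in one piece, matching the Python 2.7 output quoted above.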
For example, I don't think that supporting >>> float('????.??') 1234.56 is more important than to assure users that once their program accepted some text as a number, they can assume that the text is ASCII. [1] http://bugs.python.org/issue10567 [2] http://bugs.python.org/issue10557 [3] http://www.unicode.org/versions/Unicode6.0.0/#Database_Changes From solipsis at pitrou.net Sun Nov 28 21:43:11 2010 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sun, 28 Nov 2010 21:43:11 +0100 Subject: [Python-Dev] Python and the Unicode Character Database References: Message-ID: <20101128214311.092abd35@pitrou.net> On Sun, 28 Nov 2010 15:24:37 -0500 Alexander Belopolsky wrote: > While we have little choice but to follow UCD in defining > str.isidentifier(), I think Python can promise users more stability in > what it treats as space or as a digit in its builtins. Well, if "unicode support" means "support the latest version of the Unicode standard", I'm not sure we have a choice. We can make exceptions, but that would only confuse users even more, wouldn't it? > For example, > I don't think that supporting > > >>> float('????.??') > 1234.56 > > is more important than to assure users that once their program > accepted some text as a number, they can assume that the text is > ASCII. Why would they assume the text is ASCII? Regards Antoine. From alexander.belopolsky at gmail.com Sun Nov 28 21:58:33 2010 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Sun, 28 Nov 2010 15:58:33 -0500 Subject: [Python-Dev] Python and the Unicode Character Database In-Reply-To: <20101128214311.092abd35@pitrou.net> References: <20101128214311.092abd35@pitrou.net> Message-ID: On Sun, Nov 28, 2010 at 3:43 PM, Antoine Pitrou wrote: .. >> For example, >> I don't think that supporting >> >> >>> float('????.??') >> 1234.56 >> >> is more important than to assure users that once their program >> accepted some text as a number, they can assume that the text is >> ASCII. 
>
> Why would they assume the text is ASCII?

def deposit(self, amountstr):
    self.balance += float(amountstr)
    audit_log("Deposited: " + amountstr)

Auditor:

$ cat numbered-account.log
Deposited: ?????.??
...

From solipsis at pitrou.net Sun Nov 28 22:04:15 2010
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Sun, 28 Nov 2010 22:04:15 +0100
Subject: [Python-Dev] Python and the Unicode Character Database
In-Reply-To:
References: <20101128214311.092abd35@pitrou.net>
Message-ID: <20101128220415.28b77508@pitrou.net>

On Sun, 28 Nov 2010 15:58:33 -0500
Alexander Belopolsky wrote:
> On Sun, Nov 28, 2010 at 3:43 PM, Antoine Pitrou wrote:
> ..
> >> For example,
> >> I don't think that supporting
> >>
> >> >>> float('١٢٣٤.٥٦')
> >> 1234.56
> >>
> >> is more important than to assure users that once their program
> >> accepted some text as a number, they can assume that the text is
> >> ASCII.
> >
> > Why would they assume the text is ASCII?
>
> def deposit(self, amountstr):
>     self.balance += float(amountstr)
>     audit_log("Deposited: " + amountstr)
>
> Auditor:
>
> $ cat numbered-account.log
> Deposited: ?????.??

I'm not sure that's how banking applications are written :)

Antoine.

From jsbueno at python.org.br Sun Nov 28 22:12:09 2010
From: jsbueno at python.org.br (Joao S. O. Bueno)
Date: Sun, 28 Nov 2010 19:12:09 -0200
Subject: [Python-Dev] Python and the Unicode Character Database
In-Reply-To: <20101128220415.28b77508@pitrou.net>
References: <20101128214311.092abd35@pitrou.net> <20101128220415.28b77508@pitrou.net>
Message-ID:

On Sun, Nov 28, 2010 at 7:04 PM, Antoine Pitrou wrote:
> On Sun, 28 Nov 2010 15:58:33 -0500
> Alexander Belopolsky wrote:
>
>> On Sun, Nov 28, 2010 at 3:43 PM, Antoine Pitrou wrote:
>> ..
>> >> For example,
>> >> I don't think that supporting
>> >>
>> >> >>> float('١٢٣٤.٥٦')
>> >> 1234.56
>> >>
>> >> is more important than to assure users that once their program
>> >> accepted some text as a number, they can assume that the text is
>> >> ASCII.
>> > >> > Why would they assume the text is ASCII? >> >> def deposit(self, amountstr): >> ? ? ? self.balance += float(amountstr) >> ? ? ? audit_log("Deposited: " + amountstr) >> >> Auditor: >> >> $ cat numbered-account.log >> Deposited: ?????.?? > > > I'm not sure that's how banking applications are written :) > +1 for this being bogus - I see no correlation whatsoever in numbers inside unicode having to be "ASCII" if we have surpassed all technical barriers for needing to behave like that. ASCII is an oversimplification of human communication needed for computing devices not complex enough to represent it fully. Let novice C programmers in English speaking countries deal with the fact that 1 character is not 1 byte anymore. We are past this point. js -><- > Antoine. > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/jsbueno%40python.org.br > From alexander.belopolsky at gmail.com Sun Nov 28 22:18:06 2010 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Sun, 28 Nov 2010 16:18:06 -0500 Subject: [Python-Dev] Python and the Unicode Character Database In-Reply-To: References: <20101128214311.092abd35@pitrou.net> <20101128220415.28b77508@pitrou.net> Message-ID: On Sun, Nov 28, 2010 at 4:12 PM, Joao S. O. Bueno wrote: .. > Let novice C programmers in English speaking countries deal with the > fact that 1 character is not 1 byte anymore. We are past this point. 
If you are, please contribute your expertise here: http://bugs.python.org/issue2382 From greg.ewing at canterbury.ac.nz Sun Nov 28 22:23:56 2010 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Mon, 29 Nov 2010 10:23:56 +1300 Subject: [Python-Dev] constant/enum type in stdlib In-Reply-To: <4CEE5C1C.9000905@btinternet.com> References: <20101121034404.52924F20A@mail.python.org> <4CE9BF4A.1020302@netwok.org> <4CEA89E8.5090107@voidspace.org.uk> <20101122163722.7e96d123@pitrou.net> <4CEA9584.7040301@avl.com> <20101122172440.77d27ed5@pitrou.net> <20101122164654.2109.588145158.divmod.xquotient.165@localhost.localdomain> <4CEBC6BD.9060402@voidspace.org.uk> <4CED0557.9090101@voidspace.org.uk> <4CEDDC2D.204@canterbury.ac.nz> <4CEE5C1C.9000905@btinternet.com> Message-ID: <4CF2C86C.9030505@canterbury.ac.nz> Rob Cliffe wrote: > But couldn't they be presented to the Python programmer as a single > type, with the implementation details hidden "under the hood"? Not in CPython, because tuple items are kept in the same block of memory as the object header. Because CPython can't move objects, this means that the size of the tuple must be known when the object is created. -- Greg From martin at v.loewis.de Sun Nov 28 23:17:13 2010 From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=) Date: Sun, 28 Nov 2010 23:17:13 +0100 Subject: [Python-Dev] Python and the Unicode Character Database In-Reply-To: <20101128214311.092abd35@pitrou.net> References: <20101128214311.092abd35@pitrou.net> Message-ID: <4CF2D4E9.3060607@v.loewis.de> >>>>> float('????.??') >> 1234.56 I think it's a bug that this works. The definition of the float builtin says Convert a string or a number to floating point. If the argument is a string, it must contain a possibly signed decimal or floating point number, possibly embedded in whitespace. The argument may also be '[+|-]nan' or '[+|-]inf'. 
Now, one may wonder what precisely a "possibly signed floating point
number" is, but most likely, this refers to

  floatnumber   ::=  pointfloat | exponentfloat
  pointfloat    ::=  [intpart] fraction | intpart "."
  exponentfloat ::=  (intpart | pointfloat) exponent
  intpart       ::=  digit+
  fraction      ::=  "." digit+
  exponent      ::=  ("e" | "E") ["+" | "-"] digit+
  digit         ::=  "0"..."9"

Regards,
Martin

From alexander.belopolsky at gmail.com Sun Nov 28 23:31:51 2010
From: alexander.belopolsky at gmail.com (Alexander Belopolsky)
Date: Sun, 28 Nov 2010 17:31:51 -0500
Subject: [Python-Dev] Python and the Unicode Character Database
In-Reply-To: <4CF2D4E9.3060607@v.loewis.de>
References: <20101128214311.092abd35@pitrou.net> <4CF2D4E9.3060607@v.loewis.de>
Message-ID:

On Sun, Nov 28, 2010 at 5:17 PM, "Martin v. Löwis" wrote:
>>>>>> float('١٢٣٤.٥٦')
>>> 1234.56
>
> I think it's a bug that this works. The definition of the float builtin says
>
> Convert a string or a number to floating point. If the argument is a
> string, it must contain a possibly signed decimal or floating point
> number, possibly embedded in whitespace. The argument may also be
> '[+|-]nan' or '[+|-]inf'.
>

This definition fails long before we get beyond 127-th code point:

>>> float('infinity')
inf

From mal at egenix.com Sun Nov 28 23:42:31 2010
From: mal at egenix.com (M.-A. Lemburg)
Date: Sun, 28 Nov 2010 23:42:31 +0100
Subject: [Python-Dev] Python and the Unicode Character Database
In-Reply-To: <4CF2D4E9.3060607@v.loewis.de>
References: <20101128214311.092abd35@pitrou.net> <4CF2D4E9.3060607@v.loewis.de>
Message-ID: <4CF2DAD7.2000408@egenix.com>

"Martin v. Löwis" wrote:
>>>>>> float('١٢٣٤.٥٦')
>>> 1234.56
>
> I think it's a bug that this works. The definition of the float builtin says
>
> Convert a string or a number to floating point. If the argument is a
> string, it must contain a possibly signed decimal or floating point
> number, possibly embedded in whitespace. The argument may also be
> '[+|-]nan' or '[+|-]inf'.
> > Now, one may wonder what precisely a "possibly signed floating point > number" is, but most likely, this refers to > > floatnumber ::= pointfloat | exponentfloat > pointfloat ::= [intpart] fraction | intpart "." > exponentfloat ::= (intpart | pointfloat) exponent > intpart ::= digit+ > fraction ::= "." digit+ > exponent ::= ("e" | "E") ["+" | "-"] digit+ > digit ::= "0"..."9" I don't see why the language spec should limit the wealth of number formats supported by float(). It is not uncommon for Asians and other non-Latin script users to use their own native script symbols for numbers. Just because these digits may look strange to someone doesn't mean that they are meaningless or should be discarded. Please also remember that Python3 now allows Unicode names for identifiers for much the same reasons. Note that the support in float() (and the other numeric constructors) to work with Unicode code points was explicitly added when Unicode support was added to Python and has been available since Python 1.6. It is not a bug by any definition of "bug", even though the feature may bug someone occasionally to go read up a bit on what else the world has to offer other than Arabic numerals :-) http://en.wikipedia.org/wiki/Numeral_system -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Nov 28 2010) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ ::: Try our new mxODBC.Connect Python Database Interface for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From mal at egenix.com Sun Nov 28 23:48:59 2010 From: mal at egenix.com (M.-A. 
Lemburg) Date: Sun, 28 Nov 2010 23:48:59 +0100 Subject: [Python-Dev] Python and the Unicode Character Database In-Reply-To: References: Message-ID: <4CF2DC5B.4020702@egenix.com> Alexander Belopolsky wrote: > Two recently reported issues brought into light the fact that Python > language definition is closely tied to character properties maintained > by the Unicode Consortium. [1,2] For example, when Python switches to > Unicode 6.0.0 (planned for the upcoming 3.2 release), we will gain two > additional characters that Python can use in identifiers. [3] > > With Python 3.1: > >>>> exec('\u0CF1 = 1') > Traceback (most recent call last): > File " ", line 1, in > File " ", line 1 > ? = 1 > ^ > SyntaxError: invalid character in identifier > > but with Python 3.2a4: > >>>> exec('\u0CF1 = 1') >>>> eval('\u0CF1') > 1 Such changes are not new, but I agree that they should probably be highlighted in the "What's new in Python x.x". > Of course, the likelihood is low that this change will affect any > user, but the change in str.isspace() reported in [1] is likely to > cause some trouble: > > Python 2.6.5: >>>> u'A\u200bB'.split() > [u'A', u'B'] > > Python 2.7: >>>> u'A\u200bB'.split() > [u'A\u200bB'] That's a classical bug fix. > While we have little choice but to follow UCD in defining > str.isidentifier(), I think Python can promise users more stability in > what it treats as space or as a digit in its builtins. Why should we divert from the work done by the Unicode Consortium ? After all, most of their changes are in fact bug fixes as well. > For example, > I don't think that supporting > >>>> float('????.??') > 1234.56 > > is more important than to assure users that once their program > accepted some text as a number, they can assume that the text is > ASCII. Sorry, but I don't agree. If ASCII numerals are an important aspect of an application, the application should make sure that only those numerals are used (e.g. by using a regular expression for checking). 
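A minimal sketch of the regular-expression check suggested above, for applications that want to accept ASCII numerals only. The helper name parse_ascii_float and the exact grammar it enforces are assumptions made for illustration, not something taken from the thread. Note that the pattern spells out [0-9] because \d in a Python 3 str pattern also matches non-ASCII decimal digits.

```python
import re

# Assumption: "ASCII numeral" means an optionally signed decimal such as
# "12", "12.5", ".5" or "1e-3"; the grammar is illustrative only.
_ASCII_FLOAT = re.compile(
    r"[+-]?(?:[0-9]+(?:\.[0-9]*)?|\.[0-9]+)(?:[eE][+-]?[0-9]+)?"
)


def parse_ascii_float(text):
    """Convert text to float, rejecting anything but ASCII digits."""
    stripped = text.strip()
    if _ASCII_FLOAT.fullmatch(stripped) is None:
        raise ValueError("not an ASCII decimal number: %r" % (text,))
    return float(stripped)
```

With a check like this in front of float(), the string written to an audit log is known to contain only ASCII digits, whatever float() itself would have accepted.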
In a Unicode world, not accepting non-Arabic numerals would be a limitation, not a feature. Besides Python has had this support since Python 1.6. > [1] http://bugs.python.org/issue10567 > [2] http://bugs.python.org/issue10557 > [3] http://www.unicode.org/versions/Unicode6.0.0/#Database_Changes -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Nov 28 2010) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ ::: Try our new mxODBC.Connect Python Database Interface for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From alexander.belopolsky at gmail.com Sun Nov 28 23:51:00 2010 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Sun, 28 Nov 2010 17:51:00 -0500 Subject: [Python-Dev] Python and the Unicode Character Database In-Reply-To: <4CF2DAD7.2000408@egenix.com> References: <20101128214311.092abd35@pitrou.net> <4CF2D4E9.3060607@v.loewis.de> <4CF2DAD7.2000408@egenix.com> Message-ID: On Sun, Nov 28, 2010 at 5:42 PM, M.-A. Lemburg wrote: .. > I don't see why the language spec should limit the wealth of number > formats supported by float(). > The Language Spec (whatever it is) should not, but hopefully the Library Reference should. 
If you follow http://docs.python.org/dev/py3k/library/functions.html#float link and the references therein, you'll end up with digit ::= "0"..."9" http://docs.python.org/dev/py3k/reference/lexical_analysis.html#grammar-token-digit From martin at v.loewis.de Sun Nov 28 23:56:47 2010 From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=) Date: Sun, 28 Nov 2010 23:56:47 +0100 Subject: [Python-Dev] Python and the Unicode Character Database In-Reply-To: References: <20101128214311.092abd35@pitrou.net> <4CF2D4E9.3060607@v.loewis.de> Message-ID: <4CF2DE2F.5040405@v.loewis.de> Am 28.11.2010 23:31, schrieb Alexander Belopolsky: > On Sun, Nov 28, 2010 at 5:17 PM, "Martin v. L?wis" wrote: >>>>>>> float('????.??') >>>> 1234.56 >> >> I think it's a bug that this works. The definition of the float builtin says >> >> Convert a string or a number to floating point. If the argument is a >> string, it must contain a possibly signed decimal or floating point >> number, possibly embedded in whitespace. The argument may also be >> '[+|-]nan' or '[+|-]inf'. >> > > This definition fails long before we get beyond 127-th code point: > >>>> float('infinity') > inf What do infer from that? That the definition is wrong, or the code is wrong? Regards, Martin From tjreedy at udel.edu Mon Nov 29 00:00:25 2010 From: tjreedy at udel.edu (Terry Reedy) Date: Sun, 28 Nov 2010 18:00:25 -0500 Subject: [Python-Dev] Python and the Unicode Character Database In-Reply-To: References: <20101128214311.092abd35@pitrou.net> Message-ID: On 11/28/2010 3:58 PM, Alexander Belopolsky wrote: > On Sun, Nov 28, 2010 at 3:43 PM, Antoine Pitrou wrote: > .. >>> For example, >>> I don't think that supporting >>> >>>>>> float('????.??') >>> 1234.56 Even if this is somehow an accident or something that someone snuck in, I think it a good idea that *users* be able to input amounts with their native digits. 
That is different from requiring *programmers* to write
literals with euro-ascii-digits

>>> is more important than to assure users that once their program
>>> accepted some text as a number, they can assume that the text is
>>> ASCII.
>>
>> Why would they assume the text is ASCII?
>
> def deposit(self, amountstr):
>     self.balance += float(amountstr)
>     audit_log("Deposited: " + amountstr)

If the programmer wants to ensure ASCII, he can produce a string,
possibly formatted, from the amount

depform = "Deposited: ${:14.2f}".format
def deposit(self, amountstr):
    amount = float(amountstr)
    self.balance += amount
    # audit_log("Deposited: " + str(amount))  # simple version
    audit_log(depform(amount))

Given that amountstr could be something like ' 182.33 ', I think
the programmer should plan to format it.

--
Terry Jan Reedy

From alexander.belopolsky at gmail.com Mon Nov 29 00:01:10 2010
From: alexander.belopolsky at gmail.com (Alexander Belopolsky)
Date: Sun, 28 Nov 2010 18:01:10 -0500
Subject: [Python-Dev] Python and the Unicode Character Database
In-Reply-To: <4CF2DE2F.5040405@v.loewis.de>
References: <20101128214311.092abd35@pitrou.net> <4CF2D4E9.3060607@v.loewis.de> <4CF2DE2F.5040405@v.loewis.de>
Message-ID:

On Sun, Nov 28, 2010 at 5:56 PM, "Martin v. Löwis" wrote:
..
>> This definition fails long before we get beyond 127-th code point:
>>
>>>>> float('infinity')
>> inf
>
> What do infer from that? That the definition is wrong, or the code is wrong?

The development version of the reference manual is more detailed, but
as far as I can tell, it still defines digit as 0-9.
http://docs.python.org/dev/py3k/library/functions.html#float From martin at v.loewis.de Mon Nov 29 00:03:45 2010 From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=) Date: Mon, 29 Nov 2010 00:03:45 +0100 Subject: [Python-Dev] Python and the Unicode Character Database In-Reply-To: <4CF2DAD7.2000408@egenix.com> References: <20101128214311.092abd35@pitrou.net> <4CF2D4E9.3060607@v.loewis.de> <4CF2DAD7.2000408@egenix.com> Message-ID: <4CF2DFD1.10901@v.loewis.de> >> Now, one may wonder what precisely a "possibly signed floating point >> number" is, but most likely, this refers to >> >> floatnumber ::= pointfloat | exponentfloat >> pointfloat ::= [intpart] fraction | intpart "." >> exponentfloat ::= (intpart | pointfloat) exponent >> intpart ::= digit+ >> fraction ::= "." digit+ >> exponent ::= ("e" | "E") ["+" | "-"] digit+ >> digit ::= "0"..."9" > > I don't see why the language spec should limit the wealth of number > formats supported by float(). If it doesn't, there should be some other specification of what is correct and what is not. It must not be unspecified. > It is not uncommon for Asians and other non-Latin script users to > use their own native script symbols for numbers. Just because these > digits may look strange to someone doesn't mean that they are > meaningless or should be discarded. Then these users should speak up and indicate their need, or somebody should speak up and confirm that there are users who actually want '????.??' to denote 1234.56. To my knowledge, there is no writing system in which '????.??e4' means 12345600.0. > Please also remember that Python3 now allows Unicode names for > identifiers for much the same reasons. No no no. Addition of Unicode identifiers has a well-designed, deliberate specification, with a PEP and all. The support for non-ASCII digits in float appears to be ad-hoc, and not founded on actual needs of actual users. 
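The behaviour being argued about can be reproduced with a short sketch. It assumes a current CPython 3 interpreter; the variable names and the accepts() helper are illustrative only, and the digits are written as escapes so the source stays ASCII.

```python
import unicodedata

# Arabic-Indic digits spelling 1234.56, with an ASCII full stop.
arabic_indic = "\u0661\u0662\u0663\u0664.\u0665\u0666"

# float() accepts any characters the UCD classifies as decimal digits.
value = float(arabic_indic)

# unicodedata.decimal() exposes the per-character digit values from the UCD.
digits = [unicodedata.decimal(ch) for ch in arabic_indic if ch != "."]

# The ASCII exponent marker is accepted alongside the non-ASCII digits.
scaled = float(arabic_indic + "e4")


def accepts(text):
    """Return True if float() parses text, False if it raises ValueError."""
    try:
        float(text)
    except ValueError:
        return False
    return True


# Same number, but using ARABIC DECIMAL SEPARATOR (U+066B) instead of ".".
with_arabic_separator = "\u0661\u0662\u0663\u0664\u066b\u0665\u0666"
```

Here accepts(arabic_indic) is true while accepts(with_arabic_separator) is false, which is the asymmetry raised in the thread: the digits are taken from the Unicode database, but the separator and exponent syntax stay ASCII.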
> Note that the support in float() (and the other numeric constructors) > to work with Unicode code points was explicitly added when Unicode > support was added to Python and has been available since Python 1.6. That doesn't necessarily make it useful. Alexander's complaint is that it makes Python unstable (i.e. changing as the UCD changes). > It is not a bug by any definition of "bug" Most certainly it is: the documentation is either underspecified, or deviates from the implementation (when taking the most plausible interpretation). This is the very definition of "bug". Regards, Martin From tjreedy at udel.edu Mon Nov 29 00:03:30 2010 From: tjreedy at udel.edu (Terry Reedy) Date: Sun, 28 Nov 2010 18:03:30 -0500 Subject: [Python-Dev] Python and the Unicode Character Database In-Reply-To: References: <20101128214311.092abd35@pitrou.net> <4CF2D4E9.3060607@v.loewis.de> <4CF2DAD7.2000408@egenix.com> Message-ID: On 11/28/2010 5:51 PM, Alexander Belopolsky wrote: > The Language Spec (whatever it is) should not, but hopefully the > Library Reference should. If you follow > http://docs.python.org/dev/py3k/library/functions.html#float link and > the references therein, you'll end up with > > digit ::= "0"..."9" > > http://docs.python.org/dev/py3k/reference/lexical_analysis.html#grammar-token-digit So fix the doc for builtin float() and perhaps int(). -- Terry Jan Reedy From alexander.belopolsky at gmail.com Mon Nov 29 00:05:56 2010 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Sun, 28 Nov 2010 18:05:56 -0500 Subject: [Python-Dev] Python and the Unicode Character Database In-Reply-To: <4CF2DFD1.10901@v.loewis.de> References: <20101128214311.092abd35@pitrou.net> <4CF2D4E9.3060607@v.loewis.de> <4CF2DAD7.2000408@egenix.com> <4CF2DFD1.10901@v.loewis.de> Message-ID: +1 on all point below. On Sun, Nov 28, 2010 at 6:03 PM, "Martin v. 
L?wis" wrote: >>> Now, one may wonder what precisely a "possibly signed floating point >>> number" is, but most likely, this refers to >>> >>> floatnumber ? ::= ?pointfloat | exponentfloat >>> pointfloat ? ?::= ?[intpart] fraction | intpart "." >>> exponentfloat ::= ?(intpart | pointfloat) exponent >>> intpart ? ? ? ::= ?digit+ >>> fraction ? ? ?::= ?"." digit+ >>> exponent ? ? ?::= ?("e" | "E") ["+" | "-"] digit+ >>> digit ? ? ? ? ?::= ?"0"..."9" >> >> I don't see why the language spec should limit the wealth of number >> formats supported by float(). > > If it doesn't, there should be some other specification of what > is correct and what is not. It must not be unspecified. > >> It is not uncommon for Asians and other non-Latin script users to >> use their own native script symbols for numbers. Just because these >> digits may look strange to someone doesn't mean that they are >> meaningless or should be discarded. > > Then these users should speak up and indicate their need, or somebody > should speak up and confirm that there are users who actually want > '????.??' to denote 1234.56. To my knowledge, there is no writing > system in which '????.??e4' means 12345600.0. > >> Please also remember that Python3 now allows Unicode names for >> identifiers for much the same reasons. > > No no no. Addition of Unicode identifiers has a well-designed, > deliberate specification, with a PEP and all. The support for > non-ASCII digits in float appears to be ad-hoc, and not founded > on actual needs of actual users. > >> Note that the support in float() (and the other numeric constructors) >> to work with Unicode code points was explicitly added when Unicode >> support was added to Python and has been available since Python 1.6. > > That doesn't necessarily make it useful. Alexander's complaint is that > it makes Python unstable (i.e. changing as the UCD changes). 
> >> It is not a bug by any definition of "bug" > > Most certainly it is: the documentation is either underspecified, > or deviates from the implementation (when taking the most plausible > interpretation). This is the very definition of "bug". > > Regards, > Martin > From martin at v.loewis.de Mon Nov 29 00:08:29 2010 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Mon, 29 Nov 2010 00:08:29 +0100 Subject: [Python-Dev] Python and the Unicode Character Database In-Reply-To: References: <20101128214311.092abd35@pitrou.net> <4CF2D4E9.3060607@v.loewis.de> <4CF2DE2F.5040405@v.loewis.de> Message-ID: <4CF2E0ED.1080807@v.loewis.de> Am 29.11.2010 00:01, schrieb Alexander Belopolsky: > On Sun, Nov 28, 2010 at 5:56 PM, "Martin v. L?wis" wrote: > .. >>> This definition fails long before we get beyond 127-th code point: >>> >>>>>> float('infinity') >>> inf >> >> What do infer from that? That the definition is wrong, or the code is wrong? > > The development version of the reference manual is more detailed, but > as far as I can tell, it still defines digit as 0-9. > > http://docs.python.org/dev/py3k/library/functions.html#float > I wasn't asking about 0..9, but about "infinity". According to the spec, it shouldn't accept that (and neither should it accept 'infinitY'). However, whether that's a spec bug or an implementation bug - it seems like a minor issue to me (i.e. easily fixed). Regards, Martin From alexander.belopolsky at gmail.com Mon Nov 29 00:12:44 2010 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Sun, 28 Nov 2010 18:12:44 -0500 Subject: [Python-Dev] Python and the Unicode Character Database In-Reply-To: <4CF2DFD1.10901@v.loewis.de> References: <20101128214311.092abd35@pitrou.net> <4CF2D4E9.3060607@v.loewis.de> <4CF2DAD7.2000408@egenix.com> <4CF2DFD1.10901@v.loewis.de> Message-ID: On Sun, Nov 28, 2010 at 6:03 PM, "Martin v. L?wis" wrote: .. 
>> Note that the support in float() (and the other numeric constructors) >> to work with Unicode code points was explicitly added when Unicode >> support was added to Python and has been available since Python 1.6. > > That doesn't necessarily make it useful. Alexander's complaint is that > it makes Python unstable (i.e. changing as the UCD changes). > What makes it worse, is that while superficially, Unicode versions follow the same X.Y.Z format as Python versions, the stability promises are completely different. For example, it appears that the general category for the ZERO WIDTH SPACE was changed in Unicode 4.0.1. I don't think a change affecting str.split(), int(), float() and probably numerous other library functions would be acceptable in a Python micro release. From alexander.belopolsky at gmail.com Mon Nov 29 00:16:24 2010 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Sun, 28 Nov 2010 18:16:24 -0500 Subject: [Python-Dev] Python and the Unicode Character Database In-Reply-To: <4CF2E0ED.1080807@v.loewis.de> References: <20101128214311.092abd35@pitrou.net> <4CF2D4E9.3060607@v.loewis.de> <4CF2DE2F.5040405@v.loewis.de> <4CF2E0ED.1080807@v.loewis.de> Message-ID: On Sun, Nov 28, 2010 at 6:08 PM, "Martin v. L?wis" wrote: > Am 29.11.2010 00:01, schrieb Alexander Belopolsky: >> On Sun, Nov 28, 2010 at 5:56 PM, "Martin v. L?wis" wrote: >> .. >>>> This definition fails long before we get beyond 127-th code point: >>>> >>>>>>> float('infinity') >>>> inf >>> >>> What do infer from that? That the definition is wrong, or the code is wrong? >> >> The development version of the reference manual is more detailed, but >> as far as I can tell, it still defines digit as 0-9. >> >> http://docs.python.org/dev/py3k/library/functions.html#float >> > > I wasn't asking about 0..9, but about "infinity". According to the > spec, it shouldn't accept that (and neither should it accept > 'infinitY'). 
According to the link that I mentioned, infinity ::= "Infinity" | "inf" and "Case is not significant, so, for example, ?inf?, ?Inf?, ?INFINITY? and ?iNfINity? are all acceptable spellings for positive infinity." I completely agree with your arguments and the reference manual has been improved a lot in the recent years. From martin at v.loewis.de Mon Nov 29 00:19:54 2010 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Mon, 29 Nov 2010 00:19:54 +0100 Subject: [Python-Dev] Python and the Unicode Character Database In-Reply-To: References: <20101128214311.092abd35@pitrou.net> <4CF2D4E9.3060607@v.loewis.de> <4CF2DAD7.2000408@egenix.com> <4CF2DFD1.10901@v.loewis.de> Message-ID: <4CF2E39A.6060605@v.loewis.de> > What makes it worse, is that while superficially, Unicode versions > follow the same X.Y.Z format as Python versions, the stability > promises are completely different. For example, it appears that the > general category for the ZERO WIDTH SPACE was changed in Unicode > 4.0.1. I don't think a change affecting str.split(), int(), float() > and probably numerous other library functions would be acceptable in a > Python micro release. Well, we managed to completely break Unicode normalization between 2.6.5 and 2.6.6, due to a bug. You can see the Unicode Consortium's stability policy at http://unicode.org/policies/stability_policy.html In a sense, this is stronger than Python's backwards compatibility promises (which allow for certain incompatible changes to occur over time, whereas Unicode makes promises about all future versions). Regards, Martin From benjamin at python.org Mon Nov 29 00:23:01 2010 From: benjamin at python.org (Benjamin Peterson) Date: Sun, 28 Nov 2010 17:23:01 -0600 Subject: [Python-Dev] Python and the Unicode Character Database In-Reply-To: <4CF2DAD7.2000408@egenix.com> References: <20101128214311.092abd35@pitrou.net> <4CF2D4E9.3060607@v.loewis.de> <4CF2DAD7.2000408@egenix.com> Message-ID: 2010/11/28 M.-A. 
Lemburg : > > > "Martin v. L?wis" wrote: >>>>>>> float('????.??') >>>> 1234.56 >> >> I think it's a bug that this works. The definition of the float builtin says >> >> Convert a string or a number to floating point. If the argument is a >> string, it must contain a possibly signed decimal or floating point >> number, possibly embedded in whitespace. The argument may also be >> '[+|-]nan' or '[+|-]inf'. >> >> Now, one may wonder what precisely a "possibly signed floating point >> number" is, but most likely, this refers to >> >> floatnumber ? ::= ?pointfloat | exponentfloat >> pointfloat ? ?::= ?[intpart] fraction | intpart "." >> exponentfloat ::= ?(intpart | pointfloat) exponent >> intpart ? ? ? ::= ?digit+ >> fraction ? ? ?::= ?"." digit+ >> exponent ? ? ?::= ?("e" | "E") ["+" | "-"] digit+ >> digit ? ? ? ? ?::= ?"0"..."9" > > I don't see why the language spec should limit the wealth of number > formats supported by float(). > > It is not uncommon for Asians and other non-Latin script users to > use their own native script symbols for numbers. Just because these > digits may look strange to someone doesn't mean that they are > meaningless or should be discarded. That's different. Python doesn't assign any semantic meaning to the characters in identifiers. The non-latin support for numerals, though, could change the meaning of a program dramatically and needs to be well-specified. Whether int() should do this is debatable. I, for one, think this kind of support belongs in the locale module. 
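As a rough illustration of why parsing user-entered numbers is a locale question rather than a digit question, here is a sketch with the separators passed in explicitly. The function parse_localized and its parameters are hypothetical; the standard library's locale.atof() does the equivalent job using the conventions of the process locale, which is not assumed to be configured here.

```python
def parse_localized(text, decimal_sep=".", thousands_sep=","):
    """Parse a number written with the given separators (hypothetical helper)."""
    # Drop the grouping separator first, then normalise the decimal separator.
    cleaned = text.strip().replace(thousands_sep, "").replace(decimal_sep, ".")
    return float(cleaned)


# The same text means different numbers under different conventions.
us_value = parse_localized("1,000")
german_value = parse_localized("1,000", decimal_sep=",", thousands_sep=".")
```

Under the U.S. convention the text "1,000" is one thousand, while under the German convention it is one; float() alone cannot tell these apart.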
-- Regards, Benjamin From alexander.belopolsky at gmail.com Mon Nov 29 00:29:47 2010 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Sun, 28 Nov 2010 18:29:47 -0500 Subject: [Python-Dev] Python and the Unicode Character Database In-Reply-To: <4CF2E39A.6060605@v.loewis.de> References: <20101128214311.092abd35@pitrou.net> <4CF2D4E9.3060607@v.loewis.de> <4CF2DAD7.2000408@egenix.com> <4CF2DFD1.10901@v.loewis.de> <4CF2E39A.6060605@v.loewis.de> Message-ID: On Sun, Nov 28, 2010 at 6:19 PM, "Martin v. L?wis" wrote: .. > You can see the Unicode Consortium's stability policy at > > http://unicode.org/policies/stability_policy.html > >From the link above: """ As more experience is gathered in implementing the characters, adjustments in the properties may become necessary. Examples of such properties include, but are not limited to, the following: General_Category ... """ > In a sense, this is stronger than Python's backwards compatibility > promises (which allow for certain incompatible changes to occur > over time, whereas Unicode makes promises about all future versions). I would say it is *different* and should be taken into account when tying language features to Unicode specifications. This was done in PEP 3131. Note that one of the stated objections was "Unicode is young; its problems are not yet well understood and solved;" (It is still true.) From martin at v.loewis.de Mon Nov 29 00:33:23 2010 From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=) Date: Mon, 29 Nov 2010 00:33:23 +0100 Subject: [Python-Dev] Python and the Unicode Character Database In-Reply-To: References: <20101128214311.092abd35@pitrou.net> Message-ID: <4CF2E6C3.3010009@v.loewis.de> >>>>>>> float('????.??') >>>> 1234.56 > > Even if this is somehow an accident or something that someone snuck in, > I think it a good idea that *users* be able to input amounts with their > native digits. 
That is different from requiring *programmers* to write
> literals with euro-ascii-digits

So one question is what kind of data float() is aimed at. I claim that
it is about "programmer" data, not "user" data. If it supported "user"
data, it probably would have to support "1,000" to denote 1e3 in the
U.S., and denote 1e0 in Germany. Our users are generally confused
on whether they should use the full stop or the comma as the decimal
separator.

As not even the locale-dependent issues are considered in float(),
it is clear to me that entering local numbers cannot possibly be
the objective of the function.

Instead, following a wide-spread Python convention, it is meant to be
the reverse of repr().

Can you name a single person who actually wants to write '١٢٣٤.٥٦'
as a number? I'm fairly skeptical that users of Arabic-Indic digits do.
Instead,

http://en.wikipedia.org/wiki/Decimal_separator

suggests that they would rather use U+066B, i.e. '١٢٣٤٫٥٦', which isn't
supported by Python.

Regards,
Martin

From martin at v.loewis.de Mon Nov 29 00:40:31 2010
From: martin at v.loewis.de (=?ISO-8859-15?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Mon, 29 Nov 2010 00:40:31 +0100
Subject: [Python-Dev] PEP 384 final review
Message-ID: <4CF2E86F.5000606@v.loewis.de>

I have now completed

http://www.python.org/dev/peps/pep-0384/

Benjamin has volunteered to rule on this PEP. Please comment with
any changes you want to see, or speak in favor or against this PEP.

Regards,
Martin

From fuzzyman at voidspace.org.uk Mon Nov 29 00:44:50 2010
From: fuzzyman at voidspace.org.uk (Michael Foord)
Date: Sun, 28 Nov 2010 23:44:50 +0000
Subject: [Python-Dev] Python and the Unicode Character Database
In-Reply-To: <4CF2E6C3.3010009@v.loewis.de>
References: <20101128214311.092abd35@pitrou.net> <4CF2E6C3.3010009@v.loewis.de>
Message-ID: <4CF2E972.2040209@voidspace.org.uk>

On 28/11/2010 23:33, "Martin v.
L?wis" wrote: >>>>>>>> float('????.??') >>>>> 1234.56 >> Even if this is somehow an accident or something that someone snuck in, >> I think it a good idea that *users* be able to input amounts with their >> native digits. That is different from requiring *programmers* to write >> literals with euro-ascii-digits > So one question is what kind of data float() is aimed at. I claim that > it is about "programmer" data, not "user" data. If it supported "user" > data, it probably would have to support "1,000" to denote 1e3 in the > U.S., and denote 1e0 in Germany. Our users are generally confused > on whether they should use th full stop or the comma as the decimal > separator. > FWIW the C# equivalent is locale aware *unless* you pass in a specific culture. (System.Double.Parse): http://msdn.microsoft.com/en-us/library/fd84bdyt.aspx If you're not aware that your code may be run on non-US computers this is a trap for the unwary. If you *are* aware then it is very useful. An alternative overload allows you to specify the culture used to do the conversion: http://msdn.microsoft.com/en-us/library/t9ebt447.aspx Michael > As not even the locale-dependent issues are considered in float(), > it is clear to me that entering local numbers cannot possibly be > the objective of the function. > > Instead, following a wide-spread Python convention, it is meant to be > the reverse of repr(). > > Can you name a single person who actually wants to write '????.??' > as a number? I'm fairly skeptical that users of arabic-indic digits. > Instead, > > http://en.wikipedia.org/wiki/Decimal_separator > > suggests that they would rather U+066B, i.e. '???????', which isn't > supported by Python. 
> > Regards, > Martin > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/fuzzyman%40voidspace.org.uk -- http://www.voidspace.org.uk/ READ CAREFULLY. By accepting and reading this email you agree, on behalf of your employer, to release me from all obligations and waivers arising from any and all NON-NEGOTIATED agreements, licenses, terms-of-service, shrinkwrap, clickwrap, browsewrap, confidentiality, non-disclosure, non-compete and acceptable use policies (?BOGUS AGREEMENTS?) that I have entered into with your employer, its partners, licensors, agents and assigns, in perpetuity, without prejudice to my ongoing rights and privileges. You further represent that you have the authority to release me from any BOGUS AGREEMENTS on behalf of your employer. From alexander.belopolsky at gmail.com Mon Nov 29 00:56:00 2010 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Sun, 28 Nov 2010 18:56:00 -0500 Subject: [Python-Dev] Python and the Unicode Character Database In-Reply-To: <4CF2DFD1.10901@v.loewis.de> References: <20101128214311.092abd35@pitrou.net> <4CF2D4E9.3060607@v.loewis.de> <4CF2DAD7.2000408@egenix.com> <4CF2DFD1.10901@v.loewis.de> Message-ID: On Sun, Nov 28, 2010 at 6:03 PM, "Martin v. L?wis" wrote: .. > No no no. Addition of Unicode identifiers has a well-designed, > deliberate specification, with a PEP and all. The support for > non-ASCII digits in float appears to be ad-hoc, and not founded > on actual needs of actual users. > I wonder how carefully right-to-left scripts were considered when PEP 3131 was discussed. Try the following on the python prompt: >>> ?= int('???') >>> ? 123 In my OSX Terminal window, entering ? flips the >>> prompt and the session looks like this: ('???')int = ? 
<<< From martin at v.loewis.de Mon Nov 29 00:59:12 2010 From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=) Date: Mon, 29 Nov 2010 00:59:12 +0100 Subject: [Python-Dev] Python and the Unicode Character Database In-Reply-To: <4CF2E972.2040209@voidspace.org.uk> References: <20101128214311.092abd35@pitrou.net> <4CF2E6C3.3010009@v.loewis.de> <4CF2E972.2040209@voidspace.org.uk> Message-ID: <4CF2ECD0.4000003@v.loewis.de> > FWIW the C# equivalent is locale aware *unless* you pass in a specific > culture. > (System.Double.Parse): That's not quite the equivalent of float(), I would say: this one apparently is locale-aware, so it is more the equivalent of locale.atof. The next question then is if it supports indo-arabic digits in any locale (or more specifically in an arabic locale). Regards, Martin From solipsis at pitrou.net Mon Nov 29 01:01:12 2010 From: solipsis at pitrou.net (Antoine Pitrou) Date: Mon, 29 Nov 2010 01:01:12 +0100 Subject: [Python-Dev] Python and the Unicode Character Database References: <20101128214311.092abd35@pitrou.net> <4CF2D4E9.3060607@v.loewis.de> <4CF2DAD7.2000408@egenix.com> Message-ID: <20101129010112.343eaf64@pitrou.net> On Sun, 28 Nov 2010 17:23:01 -0600 Benjamin Peterson wrote: > 2010/11/28 M.-A. Lemburg : > > > > > > "Martin v. Löwis" wrote: > >>>>>>> float('١٢٣٤.٥٦') > >>>> 1234.56 > >> > >> I think it's a bug that this works. The definition of the float builtin says > >> > >> Convert a string or a number to floating point. If the argument is a > >> string, it must contain a possibly signed decimal or floating point > >> number, possibly embedded in whitespace. The argument may also be > >> '[+|-]nan' or '[+|-]inf'. > >> > >> Now, one may wonder what precisely a "possibly signed floating point > >> number" is, but most likely, this refers to > >>
> >> floatnumber   ::= pointfloat | exponentfloat
> >> pointfloat    ::= [intpart] fraction | intpart "."
> >> exponentfloat ::= (intpart | pointfloat) exponent
> >> intpart       ::= digit+
> >> fraction      ::= "." digit+
> >> exponent      ::= ("e" | "E") ["+" | "-"] digit+
> >> digit         ::= "0"..."9"
> > > > I don't see why the language spec should limit the wealth of number > > formats supported by float(). > > > > It is not uncommon for Asians and other non-Latin script users to > > use their own native script symbols for numbers. Just because these > > digits may look strange to someone doesn't mean that they are > > meaningless or should be discarded. > > That's different. Python doesn't assign any semantic meaning to the > characters in identifiers. The non-latin support for numerals, though, > could change the meaning of a program dramatically and needs to be > well-specified. Whether int() should do this is debatable. Perhaps int(), float(), Decimal() and friends could take an optional parameter indicating whether non-ascii digits are considered. It would then satisfy all parties. Antoine. From martin at v.loewis.de Mon Nov 29 01:02:18 2010 From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=) Date: Mon, 29 Nov 2010 01:02:18 +0100 Subject: [Python-Dev] Python and the Unicode Character Database In-Reply-To: References: <20101128214311.092abd35@pitrou.net> <4CF2D4E9.3060607@v.loewis.de> <4CF2DAD7.2000408@egenix.com> <4CF2DFD1.10901@v.loewis.de> Message-ID: <4CF2ED8A.2010503@v.loewis.de> On 29.11.2010 00:56, Alexander Belopolsky wrote: > On Sun, Nov 28, 2010 at 6:03 PM, "Martin v. Löwis" wrote: > .. >> No no no. Addition of Unicode identifiers has a well-designed, >> deliberate specification, with a PEP and all. The support for >> non-ASCII digits in float appears to be ad-hoc, and not founded >> on actual needs of actual users. >> > > I wonder how carefully right-to-left scripts were considered when PEP > 3131 was discussed. IIRC, some Hebrew users have spoken in favor of the PEP, despite the obvious difficulties it would create.
I may misremember, but I think someone pointed out that they had these difficulties all the time, and that it wasn't really a burden. Unicode specifies that one should always use "logical order" in memory, and that's what the PEP does. Rendering is then a tool issue. Regards, Martin From alexander.belopolsky at gmail.com Mon Nov 29 01:04:53 2010 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Sun, 28 Nov 2010 19:04:53 -0500 Subject: [Python-Dev] Python and the Unicode Character Database In-Reply-To: <4CF2ECD0.4000003@v.loewis.de> References: <20101128214311.092abd35@pitrou.net> <4CF2E6C3.3010009@v.loewis.de> <4CF2E972.2040209@voidspace.org.uk> <4CF2ECD0.4000003@v.loewis.de> Message-ID: On Sun, Nov 28, 2010 at 6:59 PM, "Martin v. L?wis" wrote: .. > The next question then is if it supports indo-arabic digits in any > locale (or more specifically in an arabic locale). And once you answered that question, does it support Devanagari or Bengali digits? And if so, an arbitrary mix of those and indo-arabic digits? From alexander.belopolsky at gmail.com Mon Nov 29 01:25:37 2010 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Sun, 28 Nov 2010 19:25:37 -0500 Subject: [Python-Dev] Python and the Unicode Character Database In-Reply-To: <20101129010112.343eaf64@pitrou.net> References: <20101128214311.092abd35@pitrou.net> <4CF2D4E9.3060607@v.loewis.de> <4CF2DAD7.2000408@egenix.com> <20101129010112.343eaf64@pitrou.net> Message-ID: On Sun, Nov 28, 2010 at 7:01 PM, Antoine Pitrou wrote: .. >> That's different. Python doesn't assign any semantic meaning to the >> characters in identifiers. The non-latin support for numerals, though, >> could change the meaning of a program dramatically and needs to be >> well-specified. Whether int() should do this is debatable. > > Perhaps int(), float(), Decimal() and friends could take an optional > parameter indicating whether non-ascii digits are considered. It would > then satisfy all parties. 
What parties? I don't think anyone has claimed to actually have used non-ASCII digits with float(). Of course it is fun that Python can process Bengali numerals, but so would be allowing Roman numerals. There is a reason why after careful consideration, PEP 313 was ultimately rejected. BTW, it is common in Russia to specify months using roman numerals. Maybe we should consider allowing datetime.date() to accept '1.IV.2011'. From fuzzyman at voidspace.org.uk Mon Nov 29 01:25:40 2010 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Mon, 29 Nov 2010 00:25:40 +0000 Subject: [Python-Dev] Python and the Unicode Character Database In-Reply-To: <4CF2ECD0.4000003@v.loewis.de> References: <20101128214311.092abd35@pitrou.net> <4CF2E6C3.3010009@v.loewis.de> <4CF2E972.2040209@voidspace.org.uk> <4CF2ECD0.4000003@v.loewis.de> Message-ID: <4CF2F304.60905@voidspace.org.uk> On 28/11/2010 23:59, "Martin v. Löwis" wrote: >> FWIW the C# equivalent is locale aware *unless* you pass in a specific >> culture. >> (System.Double.Parse): > That's not quite the equivalent of float(), I would say: this one > apparently is locale-aware, so it is more the equivalent of locale.atof. Right. It is *the* standard way of getting a float from a string though, whereas in Python we have two depending on whether or not you want to be locale aware. The standard way in C# is locale aware. To be non-locale aware you pass in a specific culture or number format. > The next question then is if it supports indo-arabic digits in any > locale (or more specifically in an arabic locale). I don't think so actually. The float parse formatting rules are defined like this: [ws][$][sign][integral-digits[,]]integral-digits[.[fractional-digits]][E[sign]exponential-digits][ws] (From http://msdn.microsoft.com/en-us/library/7yd1h1be.aspx ) integral-digits, fractional-digits and exponential-digits are all defined as "A series of digits ranging from 0 to 9". Arguably this is not conclusive.
In fact the NumberFormatInfo class seems to hint that it may be otherwise: http://msdn.microsoft.com/en-us/library/system.globalization.numberformatinfo.aspx See DigitSubstitution on that page. I would have to try it to be sure and I don't have a Windows VM in convenient reach right now. All the best, Michael > Regards, > Martin -- http://www.voidspace.org.uk/ From fuzzyman at voidspace.org.uk Mon Nov 29 01:28:59 2010 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Mon, 29 Nov 2010 00:28:59 +0000 Subject: [Python-Dev] Python and the Unicode Character Database In-Reply-To: References: <20101128214311.092abd35@pitrou.net> <4CF2E6C3.3010009@v.loewis.de> <4CF2E972.2040209@voidspace.org.uk> <4CF2ECD0.4000003@v.loewis.de> Message-ID: <4CF2F3CB.6090808@voidspace.org.uk> On 29/11/2010 00:04, Alexander Belopolsky wrote: > On Sun, Nov 28, 2010 at 6:59 PM, "Martin v. Löwis" wrote: > .. >> The next question then is if it supports indo-arabic digits in any >> locale (or more specifically in an arabic locale). > And once you answered that question, does it support Devanagari or > Bengali digits? And if so, an arbitrary mix of those and indo-arabic > digits? Haha. Go and try it yourself. :-) Michael -- http://www.voidspace.org.uk/
From solipsis at pitrou.net Mon Nov 29 01:29:40 2010 From: solipsis at pitrou.net (Antoine Pitrou) Date: Mon, 29 Nov 2010 01:29:40 +0100 Subject: [Python-Dev] Python and the Unicode Character Database In-Reply-To: References: <20101128214311.092abd35@pitrou.net> <4CF2D4E9.3060607@v.loewis.de> <4CF2DAD7.2000408@egenix.com> <20101129010112.343eaf64@pitrou.net> Message-ID: <1290990580.8242.2.camel@localhost.localdomain> > > Perhaps int(), float(), Decimal() and friends could take an optional > > parameter indicating whether non-ascii digits are considered. It would > > then satisfy all parties. > > What parties? I don't think anyone has claimed to actually have used > non-ASCII digits with float(). Have you done a poll of all Python 3 users? > Of course it is fun that Python can > process Bengali numerals, but so would be allowing Roman numerals. > There is a reason why after careful consideration, PEP 313 was > ultimately rejected. That's mostly irrelevant. This feature exists and someone, somewhere, may be using it. We normally don't remove stuff without deprecation. Antoine.
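The optional-parameter idea quoted above, combined with the point about deprecation, could look roughly like the sketch below. The function name float_compat and the ascii_only keyword are hypothetical; nothing like them exists in the builtins. The sketch only assumes the behaviour discussed in this thread, namely that float() accepts any Unicode decimal digit.

```python
import warnings


def float_compat(text, ascii_only=False):
    """Hypothetical wrapper around float(); not part of Python.

    With ascii_only=True, any non-ASCII character (digit or
    whitespace) is rejected. Otherwise non-ASCII input is still
    parsed, but a DeprecationWarning is emitted first.
    """
    has_non_ascii = any(ord(ch) > 127 for ch in text)
    if has_non_ascii and ascii_only:
        raise ValueError("non-ASCII characters in %r" % text)
    if has_non_ascii:
        warnings.warn("non-ASCII input passed to float_compat()",
                      DeprecationWarning, stacklevel=2)
    return float(text)


print(float_compat("1234.56"))
```

Callers who want the old behaviour keep calling it without the keyword and get a warning; callers who want strict input pass ascii_only=True. Whether such a flag belongs on the builtins themselves is exactly what the thread is arguing about.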
From ncoghlan at gmail.com Mon Nov 29 01:48:51 2010 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 29 Nov 2010 10:48:51 +1000 Subject: [Python-Dev] constant/enum type in stdlib In-Reply-To: <4CF28310.7070304@voidspace.org.uk> References: <20101121034404.52924F20A@mail.python.org> <4CE9BF4A.1020302@netwok.org> <4CEA89E8.5090107@voidspace.org.uk> <20101122163722.7e96d123@pitrou.net> <4CEA9584.7040301@avl.com> <20101122172440.77d27ed5@pitrou.net> <20101122164654.2109.588145158.divmod.xquotient.165@localhost.localdomain> <4CEBC6BD.9060402@voidspace.org.uk> <4CED0557.9090101@voidspace.org.uk> <4CED4E34.5060400@voidspace.org.uk> <4CF1706E.5030503@g.nevcal.com> <1D372F35-B455-4982-997B-2C54A7D56741@gmail.com> <4CF28310.7070304@voidspace.org.uk> Message-ID: On Mon, Nov 29, 2010 at 2:28 AM, Michael Foord wrote: > For wrapping mutable types I'm tempted to say YAGNI. For the standard > library wrapping integers meets almost all our use-cases except for one > float. (At work we have a decimal constant as it happens.) Perhaps we could > require immutable types for groups but allow arbitrary values for individual > named values? Whereas my opinion is that "immutable vs mutable" is such a blurry distinction that we shouldn't try to make it at the lowest level. Would it be possible to name frozenset instances? Tuples? How about objects that are conceptually immutable, but don't close all the loopholes allowing you to mutate them? (e.g. Decimal, Fraction) Better to design a named value API that doesn't care about mutability, and then leave questions of reverse mappings from values back to names to the grouping API level. At that level, it would be trivial (and natural) to limit names to referencing Hashable values so that a reverse lookup table would be easy to implement. 
For standard library purposes, we could even reasonably provide an int-only grouping API, since the main use case is almost certainly to be in managing translation of OS-level integer constants to named values. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ben+python at benfinney.id.au Mon Nov 29 01:55:33 2010 From: ben+python at benfinney.id.au (Ben Finney) Date: Mon, 29 Nov 2010 11:55:33 +1100 Subject: [Python-Dev] Python and the Unicode Character Database References: <20101128214311.092abd35@pitrou.net> <4CF2D4E9.3060607@v.loewis.de> <4CF2DAD7.2000408@egenix.com> <20101129010112.343eaf64@pitrou.net> Message-ID: <8739ql54oq.fsf@benfinney.id.au> Alexander Belopolsky writes: > On Sun, Nov 28, 2010 at 7:01 PM, Antoine Pitrou wrote: > > Perhaps int(), float(), Decimal() and friends could take an optional > > parameter indicating whether non-ascii digits are considered. It > > would then satisfy all parties. > > What parties? I don't think anyone has claimed to actually have used > non-ASCII digits with float(). Rather, it has been pointed out that there is an unknown amount of existing code which does that. You're not going to know how much or how little from this forum. > Of course it is fun that Python can process Bengali numerals, but so > would be allowing Roman numerals. There is a reason why after careful > consideration, PEP 313 was ultimately rejected. Rejecting a proposed *new* capability is a different matter from disabling an *existing* capability which works in existing Python releases. -- \ “Following fashion and the status quo is easy. Thinking about | `\ your users' lives and creating something practical is much | _o__) harder.”
?Ryan Singer, 2008-07-09 | Ben Finney From fuzzyman at voidspace.org.uk Mon Nov 29 01:57:27 2010 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Mon, 29 Nov 2010 00:57:27 +0000 Subject: [Python-Dev] constant/enum type in stdlib In-Reply-To: References: <20101121034404.52924F20A@mail.python.org> <4CE9BF4A.1020302@netwok.org> <4CEA89E8.5090107@voidspace.org.uk> <20101122163722.7e96d123@pitrou.net> <4CEA9584.7040301@avl.com> <20101122172440.77d27ed5@pitrou.net> <20101122164654.2109.588145158.divmod.xquotient.165@localhost.localdomain> <4CEBC6BD.9060402@voidspace.org.uk> <4CED0557.9090101@voidspace.org.uk> <4CED4E34.5060400@voidspace.org.uk> <4CF1706E.5030503@g.nevcal.com> <1D372F35-B455-4982-997B-2C54A7D56741@gmail.com> <4CF28310.7070304@voidspace.org.uk> Message-ID: <4CF2FA77.3000604@voidspace.org.uk> On 29/11/2010 00:48, Nick Coghlan wrote: > On Mon, Nov 29, 2010 at 2:28 AM, Michael Foord > wrote: >> For wrapping mutable types I'm tempted to say YAGNI. For the standard >> library wrapping integers meets almost all our use-cases except for one >> float. (At work we have a decimal constant as it happens.) Perhaps we could >> require immutable types for groups but allow arbitrary values for individual >> named values? > Whereas my opinion is that "immutable vs mutable" is such a blurry > distinction that we shouldn't try to make it at the lowest level. > Would it be possible to name frozenset instances? Tuples? How about > objects that are conceptually immutable, but don't close all the > loopholes allowing you to mutate them? (e.g. Decimal, Fraction) > > Better to design a named value API that doesn't care about mutability, > and then leave questions of reverse mappings from values back to names > to the grouping API level. At that level, it would be trivial (and > natural) to limit names to referencing Hashable values so that a > reverse lookup table would be easy to implement. 
For standard library > purposes, we could even reasonably provide an int-only grouping API, > since the main use case is almost certainly to be in managing > translation of OS-level integer constants to named values. Sounds reasonable to me. Michael > Cheers, > Nick. > -- http://www.voidspace.org.uk/ From tjreedy at udel.edu Mon Nov 29 02:00:56 2010 From: tjreedy at udel.edu (Terry Reedy) Date: Sun, 28 Nov 2010 20:00:56 -0500 Subject: [Python-Dev] PEP 384 final review In-Reply-To: <4CF2E86F.5000606@v.loewis.de> References: <4CF2E86F.5000606@v.loewis.de> Message-ID: On 11/28/2010 6:40 PM, "Martin v. Löwis" wrote: > I have now completed > > http://www.python.org/dev/peps/pep-0384/ The current text contains several error messages like: "System Message: WARNING/2 (pep-0384.txt, line 194) Bullet list ends without a blank line; unexpected unindent." Terry Jan Reedy From steve at pearwood.info Mon Nov 29 01:14:31 2010 From: steve at pearwood.info (Steven D'Aprano) Date: Mon, 29 Nov 2010 11:14:31 +1100 Subject: [Python-Dev] Python and the Unicode Character Database In-Reply-To: <4CF2D4E9.3060607@v.loewis.de> References: <20101128214311.092abd35@pitrou.net> <4CF2D4E9.3060607@v.loewis.de> Message-ID: <4CF2F067.5020705@pearwood.info> Martin v. Löwis wrote: >>>>>> float('١٢٣٤.٥٦') >>> 1234.56 > > I think it's a bug that this works.
The definition of the float builtin says [...] I think that's a documentation bug rather than a coding bug. If Python wishes to limit the digits allowed in numeric *literals* to ASCII 0...9, that's one thing, but I think that the digits allowed in numeric *strings* should allow the full range of digits supported by the Unicode standard. The former ensures that literals in code are always readable; the latter allows users to enter numbers in their own number system. How could that be a bad thing? -- Steven From rob.cliffe at btinternet.com Sun Nov 28 02:07:08 2010 From: rob.cliffe at btinternet.com (Rob Cliffe) Date: Sun, 28 Nov 2010 01:07:08 +0000 Subject: [Python-Dev] constant/enum type in stdlib In-Reply-To: <4CF2C86C.9030505@canterbury.ac.nz> References: <20101121034404.52924F20A@mail.python.org> <4CE9BF4A.1020302@netwok.org> <4CEA89E8.5090107@voidspace.org.uk> <20101122163722.7e96d123@pitrou.net> <4CEA9584.7040301@avl.com> <20101122172440.77d27ed5@pitrou.net> <20101122164654.2109.588145158.divmod.xquotient.165@localhost.localdomain> <4CEBC6BD.9060402@voidspace.org.uk> <4CED0557.9090101@voidspace.org.uk> <4CEDDC2D.204@canterbury.ac.nz> <4CEE5C1C.9000905@btinternet.com> <4CF2C86C.9030505@canterbury.ac.nz> Message-ID: <4CF1AB3C.3060408@btinternet.com> On 28/11/2010 21:23, Greg Ewing wrote: > Rob Cliffe wrote: > >> But couldn't they be presented to the Python programmer as a single >> type, with the implementation details hidden "under the hood"? > > Not in CPython, because tuple items are kept in the same block > of memory as the object header. Because CPython can't move > objects, this means that the size of the tuple must be known > when the object is created. > But when a frozen list a.k.a. tuple would be created - either directly, or by setting a list's mutable flag to False which would really turn it into a tuple - the size *would* be known. And since the object would now be immutable, there would be no requirement for its size to change.
(My idea doesn't require additional functionality, just a different API.) Rob Cliffe From alexander.belopolsky at gmail.com Mon Nov 29 02:24:24 2010 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Sun, 28 Nov 2010 20:24:24 -0500 Subject: [Python-Dev] Python and the Unicode Character Database In-Reply-To: <8739ql54oq.fsf@benfinney.id.au> References: <20101128214311.092abd35@pitrou.net> <4CF2D4E9.3060607@v.loewis.de> <4CF2DAD7.2000408@egenix.com> <20101129010112.343eaf64@pitrou.net> <8739ql54oq.fsf@benfinney.id.au> Message-ID: On Sun, Nov 28, 2010 at 7:55 PM, Ben Finney wrote: .. >> Of course it is fun that Python can process Bengali numerals, but so >> would be allowing Roman numerals. There is a reason why after careful >> consideration, PEP 313 was ultimately rejected. > > Rejecting a proposed *new* capability is a different matter from > disabling an *existing* capability which works in existing Python > releases. Was this capability ever documented? It does not feel like a deliberate feature. If it was, '\N{ARABIC DECIMAL SEPARATOR}' would be accepted in arabic-indic notation. It feels more like a CPython implementation detail similar to, say: >>> int('10') is 10 True >>> int('10000') is 10000 False Note that the underlying PyUnicode_EncodeDecimal() function is described in the unicodeobject.h header file as follows: /* --- Decimal Encoder ---------------------------------------------------- */ /* Takes a Unicode string holding a decimal value and writes it into an output buffer using standard ASCII digit codes. .. The encoder converts whitespace to ' ', decimal characters to their corresponding ASCII digit and all other Latin-1 characters except \0 as-is. Characters outside this range (Unicode ordinals 1-256) are treated as errors. This includes embedded NULL bytes. */ So the support for non-ASCII digits is accidental and should be treated as a bug.
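For reference, the behaviour under discussion can be observed from Python 3 with the standard unicodedata module. The helper name to_ascii_digits below is made up for illustration; it is only a rough Python-level analogue of what the header comment says the decimal encoder does, not the actual C implementation.

```python
import unicodedata


def to_ascii_digits(text):
    """Replace every Unicode decimal digit (category Nd) by its ASCII digit."""
    return "".join(
        str(unicodedata.decimal(ch)) if unicodedata.category(ch) == "Nd" else ch
        for ch in text
    )


# ARABIC-INDIC DIGIT ONE..FOUR, an ASCII full stop, then FIVE and SIX.
arabic_indic = "\u0661\u0662\u0663\u0664.\u0665\u0666"
print(to_ascii_digits(arabic_indic))
print(float(arabic_indic))

# Digits from different scripts can be mixed:
# ASCII 1, ARABIC-INDIC DIGIT TWO, DEVANAGARI DIGIT THREE.
print(int("1\u0662\u0969"))

# U+066B ARABIC DECIMAL SEPARATOR is not a digit and is not accepted.
try:
    float("\u0661\u0662\u066b\u0665")
except ValueError as exc:
    print("rejected:", exc)
```

The last case is the asymmetry pointed out above: the digits of the script are accepted, but its own decimal separator is not.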
From ben+python at benfinney.id.au Mon Nov 29 02:25:56 2010 From: ben+python at benfinney.id.au (Ben Finney) Date: Mon, 29 Nov 2010 12:25:56 +1100 Subject: [Python-Dev] Python and the Unicode Character Database References: <20101128214311.092abd35@pitrou.net> <4CF2D4E9.3060607@v.loewis.de> <4CF2F067.5020705@pearwood.info> Message-ID: <87y68c53a3.fsf@benfinney.id.au> Steven D'Aprano writes: > If Python wishes to limit the digits allowed in numeric *literals* to > ASCII 0...9, that's one thing, but I think that the digits allowed in > numeric *strings* should allow the full range of digits supported by > the Unicode standard. I assume you specifically mean that the numeric class constructors, like ‘int’ and ‘float’, should parse their input string such that any character Unicode defines as a numeric digit is mapped to the corresponding digit. That sounds attractive, but it raises questions about mixed notations, mixing digits from different writing systems, and probably other questions I haven't thought of. It's not something to make a simple yes-or-no-decision on now, IMO. This sounds best suited to a PEP, which someone who cares enough can champion in ‘python-ideas’. -- \ “The manager has personally passed all the water served here.” | `\ —hotel, Acapulco | _o__) | Ben Finney From steve at pearwood.info Mon Nov 29 00:43:59 2010 From: steve at pearwood.info (Steven D'Aprano) Date: Mon, 29 Nov 2010 10:43:59 +1100 Subject: [Python-Dev] Python and the Unicode Character Database In-Reply-To: References: Message-ID: <4CF2E93F.70208@pearwood.info> Alexander Belopolsky wrote: > Two recently reported issues brought into light the fact that Python > language definition is closely tied to character properties maintained > by the Unicode Consortium. [1,2] For example, when Python switches to > Unicode 6.0.0 (planned for the upcoming 3.2 release), we will gain two > additional characters that Python can use in identifiers. [3] [...] Why do you consider this a problem?
It would be a problem if previously valid identifiers *stopped* being valid, but not the other way around. > Of course, the likelihood is low that this change will affect any > user, but the change in str.isspace() reported in [1] is likely to > cause some trouble: Looking at the thread here: http://bugs.python.org/issue10567 I interpret it as indicating that Python's isspace() has been buggy for many years, and is only now being fixed. It's always unfortunate when people rely on bugs, but I'm not sure we should be promising to support bug-for-bug compatibility from one version to the next :) > While we have little choice but to follow UCD in defining > str.isidentifier(), I think Python can promise users more stability in > what it treats as space or as a digit in its builtins. For example, > I don't think that supporting > >>>> float('١٢٣٤.٥٦') > 1234.56 > > is more important than to assure users that once their program > accepted some text as a number, they can assume that the text is > ASCII. Seems like a pretty foolish assumption, if you ask me, pretty much akin to assuming that if string.isalpha() returns true that string is ASCII. Support for non-Arabic numerals in number strings goes back to at least Python 2.4: [steve at sylar ~]$ python2.4 Python 2.4.6 (#1, Mar 30 2009, 10:08:01) [GCC 4.1.2 20070925 (Red Hat 4.1.2-27)] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> float(u'١٢٣٤.٥٦') 1234.5599999999999 The fact that this is (apparently) only being raised now means that it isn't actually a problem in real life. I'd even say that it's a feature, and that if Python didn't support non-Arabic numerals, it should.
-- Steven From alexander.belopolsky at gmail.com Mon Nov 29 03:32:15 2010 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Sun, 28 Nov 2010 21:32:15 -0500 Subject: [Python-Dev] Python and the Unicode Character Database In-Reply-To: <4CF2E93F.70208@pearwood.info> References: <4CF2E93F.70208@pearwood.info> Message-ID: On Sun, Nov 28, 2010 at 6:43 PM, Steven D'Aprano wrote: .. >> is more important than to assure users that once their program >> accepted some text as a number, they can assume that the text is >> ASCII. > > Seems like a pretty foolish assumption, if you ask me, pretty much akin to > assuming that if string.isalpha() returns true that string is ASCII. > It is not to 99.9% of Python users whose code is written for 2.x. Their strings are byte strings and string.isdigit() does imply ASCII even if string.isalpha() does not in many locales. .. > The fact that this is (apparently) only being raised now means that it isn't > actually a problem in real life. I'd even say that it's a feature, and that > if Python didn't support non-Arabic numerals, it should. > I raised this problem because I found a bug that is related to this feature. The bug is also a regression from 2.x. In 2.7: >>> float(u'1234\xa1') .. ValueError: invalid literal for float(): 1234? The last character is lost, but the error message is still meaningful. In 3.x, however: >>> float('1234\xa1') .. ValueError See http://bugs.python.org/issue10557 While investigating this issue I found that by the time the string gets to the number parser (_Py_dg_strtod), all non-ascii characters are dropped by PyUnicode_EncodeDecimal() so it cannot produce meaningful diagnostic. Of course, PyUnicode_EncodeDecimal(), can be fixed by making it pass non-ascii chars through as UTF-8 bytes, but I was wondering if preserving the ability to parse exotic numerals was worth the effort. 
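One way to get a meaningful diagnostic regardless of how the C-level conversion behaves is to locate the offending character before calling float(). The sketch below is only a Python-level illustration; float_or_explain is a hypothetical name, and this is not the fix that went into issue 10557.

```python
import unicodedata


def float_or_explain(text):
    """Like float(), but name the first non-ASCII character that is
    neither whitespace nor a Unicode decimal digit."""
    for index, ch in enumerate(text):
        if ord(ch) < 128 or ch.isspace():
            continue  # plain ASCII syntax errors are left to float() itself
        if unicodedata.category(ch) != "Nd":
            raise ValueError(
                "invalid character %r (U+%04X) at position %d in %r"
                % (ch, ord(ch), index, text)
            )
    return float(text)


try:
    float_or_explain("1234\xa1")
except ValueError as exc:
    print(exc)
```

The pre-scan keeps exotic digits working while still telling the user which character was rejected, at the cost of a second pass over the string.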
From rrr at ronadam.com Mon Nov 29 04:03:39 2010 From: rrr at ronadam.com (Ron Adam) Date: Sun, 28 Nov 2010 21:03:39 -0600 Subject: [Python-Dev] constant/enum type in stdlib In-Reply-To: References: <20101121034404.52924F20A@mail.python.org> <4CE9BF4A.1020302@netwok.org> <4CEA89E8.5090107@voidspace.org.uk> <20101122163722.7e96d123@pitrou.net> <4CEA9584.7040301@avl.com> <20101122172440.77d27ed5@pitrou.net> <20101122164654.2109.588145158.divmod.xquotient.165@localhost.localdomain> <4CEBC6BD.9060402@voidspace.org.uk> <4CED0557.9090101@voidspace.org.uk> <4CED4E34.5060400@voidspace.org.uk> Message-ID: <4CF3180B.1060306@ronadam.com> On 11/27/2010 04:51 AM, Nick Coghlan wrote: > x = named_value("FOO", 1) > y = named_value("BAR", "Hello World!") > z = named_value("BAZ", dict(a=1, b=2, c=3)) > > print(x, y, z, sep="\n") > print("\n".join(map(repr, (x, y, z)))) > print("\n".join(map(str, map(type, (x, y, z))))) > > set_named_values(globals(), foo=x._raw(), bar=y._raw(), baz=z._raw()) > print("\n".join(map(repr, (foo, bar, baz)))) > print(type(x) is type(foo), type(y) is type(bar), type(z) is type(baz)) > > ========================================================================== > > # Session output for the last 6 lines >>>> >>> print(x, y, z, sep="\n") > 1 > Hello World! > {'a': 1, 'c': 3, 'b': 2} > >>>> >>> print("\n".join(map(repr, (x, y, z)))) > FOO=1 > BAR='Hello World!' > BAZ={'a': 1, 'c': 3, 'b': 2} This reminds me of python annotations. Which seem like an already forgotten new feature. Maybe they can help with this? It does associate additional info to names and creates a nice dictionary to reference. >>> def name_values( FOO: 1, BAR: "Hello World!", BAZ: dict(a=1, b=2, c=3) ): ... return FOO, BAR, BAZ ... 
>>> name_values(1,2,3)
(1, 2, 3)
>>> name_values.__annotations__
{'BAR': 'Hello World!', 'FOO': 1, 'BAZ': {'a': 1, 'c': 3, 'b': 2}}

Cheers,
Ron

From stephen at xemacs.org Mon Nov 29 04:39:32 2010
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Mon, 29 Nov 2010 12:39:32 +0900
Subject: [Python-Dev] Python and the Unicode Character Database
In-Reply-To: <4CF2DAD7.2000408@egenix.com>
References: <20101128214311.092abd35@pitrou.net> <4CF2D4E9.3060607@v.loewis.de> <4CF2DAD7.2000408@egenix.com>
Message-ID: <87pqto6bnv.fsf@uwakimon.sk.tsukuba.ac.jp>

M.-A. Lemburg writes:

> It is not uncommon for Asians and other non-Latin script users to
> use their own native script symbols for numbers.

Japanese don't, in computational or scientific work where float() would be used. Japanese numerals are used for dates and for certain felicitous ages (and even there so-called "Arabic" numerals are perfectly acceptable). Otherwise, it's all ASCII (although it might be "full-width" compatibility variants).

> Please also remember that Python3 now allows Unicode names for
> identifiers for much the same reasons.

I don't think it's the same reason, not for Japanese, anyway. I agree that Python should make it easy for the programmer to get numerical values of native numeric strings, but it's not at all clear to me that there is any point to having float() recognize them by default.

From ncoghlan at gmail.com Mon Nov 29 04:58:05 2010
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Mon, 29 Nov 2010 13:58:05 +1000
Subject: [Python-Dev] Python and the Unicode Character Database
In-Reply-To: <87pqto6bnv.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <20101128214311.092abd35@pitrou.net> <4CF2D4E9.3060607@v.loewis.de> <4CF2DAD7.2000408@egenix.com> <87pqto6bnv.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID:

On Mon, Nov 29, 2010 at 1:39 PM, Stephen J.
Turnbull wrote:
> I agree that Python should make it easy for the programmer to get
> numerical values of native numeric strings, but it's not at all clear
> to me that there is any point to having float() recognize them by
> default.

Indeed, as someone else suggested earlier in the thread, supporting non-ASCII digits sounds more like a job for the locale module than for the builtin types.

Deprecating non-ASCII support in the latter, while ensuring it is properly supported in the former sounds like a better way forward than maintaining the status quo (starting in 3.3 though, with the first beta just around the corner, we don't want to be monkeying with this in 3.2)

Cheers,
Nick.

--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From martin at v.loewis.de Mon Nov 29 08:18:59 2010
From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=)
Date: Mon, 29 Nov 2010 08:18:59 +0100
Subject: [Python-Dev] Python and the Unicode Character Database
In-Reply-To: <20101129010112.343eaf64@pitrou.net>
References: <20101128214311.092abd35@pitrou.net> <4CF2D4E9.3060607@v.loewis.de> <4CF2DAD7.2000408@egenix.com> <20101129010112.343eaf64@pitrou.net>
Message-ID: <4CF353E3.4020706@v.loewis.de>

> Perhaps int(), float(), Decimal() and friends could take an optional
> parameter indicating whether non-ascii digits are considered. It would
> then satisfy all parties.

Not really. I still would want to see what the actual requirement is: i.e. do any users actually have the desire to have these digits accepted, yet the alternative decimal points rejected?
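The asymmetry raised in the question above can be made concrete with a short sketch, assuming a recent CPython 3. The variable names are illustrative only. It checks that Arabic-Indic digits are in Unicode category Nd and are accepted by float(), while the script's own decimal separator (U+066B) is not a digit and is not accepted in place of the ASCII point.

```python
import unicodedata

# Arabic-Indic digits one to six and the Arabic decimal separator,
# written with escapes so this source stays ASCII.
INTEGER_PART = "\u0661\u0662\u0663\u0664"
FRACTION_PART = "\u0665\u0666"
ARABIC_DECIMAL_SEPARATOR = "\u066b"

# Every digit used here belongs to the decimal-digit category "Nd".
digit_categories = {
    unicodedata.category(ch) for ch in INTEGER_PART + FRACTION_PART
}

# The separator is punctuation, not a digit.
separator_category = unicodedata.category(ARABIC_DECIMAL_SEPARATOR)

# Native digits combined with an ASCII point parse fine.
with_ascii_point = float(INTEGER_PART + "." + FRACTION_PART)

# Native digits combined with the native separator do not.
try:
    float(INTEGER_PART + ARABIC_DECIMAL_SEPARATOR + FRACTION_PART)
    native_separator_accepted = True
except ValueError:
    native_separator_accepted = False
```

In other words, the builtin parser handles the digits of another script but none of the surrounding number-formatting conventions, which is the gap the question points at.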
Regards,
Martin

From martin at v.loewis.de Mon Nov 29 08:22:46 2010
From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=)
Date: Mon, 29 Nov 2010 08:22:46 +0100
Subject: [Python-Dev] Python and the Unicode Character Database
In-Reply-To: <4CF2F067.5020705@pearwood.info>
References: <20101128214311.092abd35@pitrou.net> <4CF2D4E9.3060607@v.loewis.de> <4CF2F067.5020705@pearwood.info>
Message-ID: <4CF354C6.9020302@v.loewis.de>

> The former ensures that literals in code are always readable; the latter
> allows users to enter numbers in their own number system. How could that
> be a bad thing?

It's YAGNI, feature bloat. It gives the illusion of supporting something that actually isn't supported very well (namely, parsing local number strings). I claim that there is no meaningful application of this feature.

Regards,
Martin

From martin at v.loewis.de Mon Nov 29 08:25:19 2010
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Mon, 29 Nov 2010 08:25:19 +0100
Subject: [Python-Dev] Python and the Unicode Character Database
In-Reply-To: <1290990580.8242.2.camel@localhost.localdomain>
References: <20101128214311.092abd35@pitrou.net> <4CF2D4E9.3060607@v.loewis.de> <4CF2DAD7.2000408@egenix.com> <20101129010112.343eaf64@pitrou.net> <1290990580.8242.2.camel@localhost.localdomain>
Message-ID: <4CF3555F.9040106@v.loewis.de>

> That's mostly irrelevant. This feature exists and someone, somewhere,
> may be using it. We normally don't remove stuff without deprecation.

Sure: it should be deprecated before being removed.

Regards,
Martin

From amauryfa at gmail.com Mon Nov 29 08:55:13 2010
From: amauryfa at gmail.com (Amaury Forgeot d'Arc)
Date: Mon, 29 Nov 2010 08:55:13 +0100
Subject: [Python-Dev] PEP 384 final review
In-Reply-To: <4CF2E86F.5000606@v.loewis.de>
References: <4CF2E86F.5000606@v.loewis.de>
Message-ID:

2010/11/29 "Martin v. Löwis"
> I have now completed
>
> http://www.python.org/dev/peps/pep-0384/

was structseq.h considered?
IMO it could be made PEP384-compliant with two additions that would replace two non-compliant functions:

- A new function to create types, since PyStructSequence_InitType is supposed to work on an uninitialized static variable:
  PyTypeObject *PyStructSequence_NewType(PyStructSequence_Desc *desc);
- PyStructSequence_SetItem(), similar to the macro PyStructSequence_SET_ITEM; the PyStructSequence structure should be hidden.

--
Amaury Forgeot d'Arc

From martin at v.loewis.de Mon Nov 29 09:09:14 2010
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Mon, 29 Nov 2010 09:09:14 +0100
Subject: [Python-Dev] PEP 384 final review
In-Reply-To:
References: <4CF2E86F.5000606@v.loewis.de>
Message-ID: <4CF35FAA.50600@v.loewis.de>

> I have now completed
>
> http://www.python.org/dev/peps/pep-0384/
>
>
> was structseq.h considered?

No, it wasn't - unfortunately, it still doesn't get included when including Python.h. I'll add it.

> IMO it could be made PEP384-compliant with two additions that would
> replace two non-compliant functions:
>
> - A new function to create types, since PyStructSequence_InitType
> is supposed to work on an uninitialized static variable:
> PyTypeObject *PyStructSequence_NewType(PyStructSequence_Desc *desc);
> - PyStructSequence_SetItem(), similar to the
> macro PyStructSequence_SET_ITEM; the PyStructSequence structure should
> be hidden.

Sounds good.

Regards,
Martin

From mal at egenix.com Mon Nov 29 09:35:05 2010
From: mal at egenix.com (M.-A. Lemburg)
Date: Mon, 29 Nov 2010 09:35:05 +0100
Subject: [Python-Dev] Python and the Unicode Character Database
In-Reply-To:
References: <20101128214311.092abd35@pitrou.net> <4CF2D4E9.3060607@v.loewis.de> <4CF2DAD7.2000408@egenix.com>
Message-ID: <4CF365B9.5040303@egenix.com>

Alexander Belopolsky wrote:
> On Sun, Nov 28, 2010 at 5:42 PM, M.-A. Lemburg wrote:
> ..
>> I don't see why the language spec should limit the wealth of number
>> formats supported by float().
>>
>
> The Language Spec (whatever it is) should not, but hopefully the
> Library Reference should. If you follow
> http://docs.python.org/dev/py3k/library/functions.html#float link and
> the references therein, you'll end up with

... the language spec again :-)

> digit ::= "0"..."9"
>
> http://docs.python.org/dev/py3k/reference/lexical_analysis.html#grammar-token-digit

That's obviously a bug in the documentation, since the Python 2.7 docs don't mention any such relationship to the language spec:

http://docs.python.org/library/functions.html#float

--
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source (#1, Nov 29 2010)
>>> Python/Zope Consulting and Support ... http://www.egenix.com/
>>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/
________________________________________________________________________

::: Try our new mxODBC.Connect Python Database Interface for free ! ::::

eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48
D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
Registered at Amtsgericht Duesseldorf: HRB 46611
http://www.egenix.com/company/contact/

From g.brandl at gmx.net Mon Nov 29 09:36:56 2010
From: g.brandl at gmx.net (Georg Brandl)
Date: Mon, 29 Nov 2010 09:36:56 +0100
Subject: [Python-Dev] PEP 384 final review
In-Reply-To: <4CF35FAA.50600@v.loewis.de>
References: <4CF2E86F.5000606@v.loewis.de> <4CF35FAA.50600@v.loewis.de>
Message-ID:

On 29.11.2010 09:09, "Martin v. Löwis" wrote:
>> I have now completed
>>
>> http://www.python.org/dev/peps/pep-0384/
>>
>>
>> was structseq.h considered?
>
> No, it wasn't - unfortunately, it still doesn't get included when
> including Python.h. I'll add it.

Would 3.2 be a good time to finally include it?
All of its macros and declarations are named PyStructSequence*, so there shouldn't be a name clash concern.

Georg

From g.brandl at gmx.net Mon Nov 29 09:52:19 2010
From: g.brandl at gmx.net (Georg Brandl)
Date: Mon, 29 Nov 2010 09:52:19 +0100
Subject: [Python-Dev] [Preview] Comments and change proposals on documentation
In-Reply-To: <4CF19B3C.2000308@pearwood.info>
References: <4CF18220.7000202@pearwood.info> <4CF19B3C.2000308@pearwood.info>
Message-ID:

On 28.11.2010 00:58, Steven D'Aprano wrote:
> Georg Brandl wrote:
>> On 27.11.2010 23:11, Steven D'Aprano wrote:
>
>>> I wasn't able to find a comment bubble that contained anything, so I
>>> don't know what sort of information you expect them to contain -- every
>>> one I tried said "0 comments".
>>
>> Maybe you should have tried the page I recommended as a demo, and where Nick
>> made his comments? :)
>
> Aha! I never would have guessed that the bubbles are clickable -- I
> thought you just moused-over them and they showed static comments put
> there by the developers, part of the documentation itself. I didn't
> realise that it was for users to add spam^W comments to the page. With
> that perspective, I need to rethink.
>
> Yes, I failed to fully read the instructions you sent, or understand
> them. That's what users do -- they don't read your instructions, and
> they misunderstand them. If your UI isn't easily discoverable, users
> will not be able to use it, and will be frustrated and annoyed. The user
> is always right, even when they're doing it wrong *wink*

That's right, of course. I really come to the conclusion that having a text link that "looks like" a link, i.e. is underlined, will have a better UI experience (since we cannot put notes "click bubble to comment" everywhere).

>>> But it seems to me that comments are superfluous, if not actively harmful:
>>
>> (I've not read anything about harmful below. Was that just FUD?)
>
> Lowering accessibility to parts of the documentation is what I was
> talking about when I said "actively harmful". But now that I have better
> understanding of what the comment system is actually for, I have to rethink.

Thanks!

Georg

From doko at ubuntu.com Mon Nov 29 11:24:22 2010
From: doko at ubuntu.com (Matthias Klose)
Date: Mon, 29 Nov 2010 11:24:22 +0100
Subject: [Python-Dev] PEP 384 final review
In-Reply-To: <4CF2E86F.5000606@v.loewis.de>
References: <4CF2E86F.5000606@v.loewis.de>
Message-ID: <4CF37F56.9030808@ubuntu.com>

On 29.11.2010 00:40, "Martin v. Löwis" wrote:
> I have now completed
>
> http://www.python.org/dev/peps/pep-0384/
>
> Benjamin has volunteered to rule on this PEP.
>
> Please comment with any changes you want to see, or speak in
> favor or against this PEP.

I looked at a diff with r84330 from the py3k branch.

Extensions built with Py_LIMITED_API have the python version encoded in their names. Which abi name should be used for these extensions?

- The m and u modifiers in the abi name are complementary (?)
- debug builds and Py_LIMITED_API are incompatible (?) and therefore the current name should be used?
- For posix systems the implementation is currently part of the abi name, are Py_LIMITED_API extensions supposed to be compatible with e.g. PyPy? Should the LIMITED_API abi name include the implementation string?
- Should the distutils support for LIMITED_API be part of the pep, or be implemented later?

In favour of the pep.

Matthias

From mal at egenix.com Mon Nov 29 12:02:57 2010
From: mal at egenix.com (M.-A. Lemburg)
Date: Mon, 29 Nov 2010 12:02:57 +0100
Subject: [Python-Dev] Python and the Unicode Character Database
In-Reply-To:
References: <20101128214311.092abd35@pitrou.net> <4CF2D4E9.3060607@v.loewis.de> <4CF2DAD7.2000408@egenix.com> <87pqto6bnv.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: <4CF38861.5090309@egenix.com>

Nick Coghlan wrote:
> On Mon, Nov 29, 2010 at 1:39 PM, Stephen J.
Turnbull wrote:
>> I agree that Python should make it easy for the programmer to get
>> numerical values of native numeric strings, but it's not at all clear
>> to me that there is any point to having float() recognize them by
>> default.
>
> Indeed, as someone else suggested earlier in the thread, supporting
> non-ASCII digits sounds more like a job for the locale module than for
> the builtin types.
>
> Deprecating non-ASCII support in the latter, while ensuring it is
> properly supported in the former sounds like a better way forward than
> maintaining the status quo (starting in 3.3 though, with the first
> beta just around the corner, we don't want to be monkeying with this
> in 3.2)

Since when do we only support certain Unicode features in specific locales?

If we would go down that road, we would also have to disable other Unicode features based on locale, e.g. whether to apply non-ASCII case mappings, what to consider whitespace, etc.

We don't do that for a good reason: Unicode is supposed to be universal and not limited to a single locale.

--
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source (#1, Nov 29 2010)
>>> Python/Zope Consulting and Support ... http://www.egenix.com/
>>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/
________________________________________________________________________

::: Try our new mxODBC.Connect Python Database Interface for free ! ::::

eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48
D-40764 Langenfeld, Germany. CEO Dipl.-Math.
Marc-Andre Lemburg
Registered at Amtsgericht Duesseldorf: HRB 46611
http://www.egenix.com/company/contact/

From sylvain.thenault at logilab.fr Mon Nov 29 12:53:11 2010
From: sylvain.thenault at logilab.fr (Sylvain =?utf-8?B?VGjDqW5hdWx0?=)
Date: Mon, 29 Nov 2010 12:53:11 +0100
Subject: [Python-Dev] python3k : imp.find_module raises SyntaxError
In-Reply-To: <4CEE9B72.1070002@ronadam.com>
References: <201011251530.23947.emile.anclin@logilab> <4CEE9B72.1070002@ronadam.com>
Message-ID: <20101129115311.GD18888@lupus.logilab.fr>

On 25 November 11:22, Ron Adam wrote:
> On 11/25/2010 08:30 AM, Emile Anclin wrote:
> >
> >hello,
> >
> >working on Pylint, we have a lot of voluntary corrupted files to test
> >Pylint behavior; for instance
> >
> >$ cat /home/emile/var/pylint/test/input/func_unknown_encoding.py
> ># -*- coding: IBO-8859-1 -*-
> >""" check correct unknown encoding declaration
> >"""
> >
> >__revision__ = '????'
> >
> >
> >and we try to find that module :
> >find_module('func_unknown_encoding', None). But python3 raises SyntaxError
> >in that case ; it didn't raise SyntaxError on python2 nor does so on our
> >func_nonascii_noencoding and func_wrong_encoding modules (with obvious
> >names)
> >
> >Python 3.2a2 (r32a2:84522, Sep 14 2010, 15:22:36)
> >[GCC 4.3.4] on linux2
> >Type "help", "copyright", "credits" or "license" for more information.
> >>>>from imp import find_module
> >>>>find_module('func_unknown_encoding', None)
> >Traceback (most recent call last):
> >  File "<stdin>", line 1, in <module>
> >SyntaxError: encoding problem: with BOM
>
> I don't think there is a clear reason by design. Also try importing
> the same modules directly and noting the differences in the errors
> you get.

IMO the point is that we can consider it a bug that find_module tries to read the content of the file at all, no? Though it seems to be doing this only for encoding detection or the like, since find_module doesn't choke on a module containing another kind of syntax error.
So the question is, should we deal with this in pylint/astng, or can we expect this to be fixed at some point?

--
Sylvain Thénault
LOGILAB, Paris (France)
Python, Debian and Agile methods training: http://www.logilab.fr/formations
Custom software development: http://www.logilab.fr/services
CubicWeb, the semantic web framework: http://www.cubicweb.org

From ncoghlan at gmail.com Mon Nov 29 13:43:26 2010
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Mon, 29 Nov 2010 22:43:26 +1000
Subject: [Python-Dev] Python and the Unicode Character Database
In-Reply-To: <4CF38861.5090309@egenix.com>
References: <20101128214311.092abd35@pitrou.net> <4CF2D4E9.3060607@v.loewis.de> <4CF2DAD7.2000408@egenix.com> <87pqto6bnv.fsf@uwakimon.sk.tsukuba.ac.jp> <4CF38861.5090309@egenix.com>
Message-ID:

On Mon, Nov 29, 2010 at 9:02 PM, M.-A. Lemburg wrote:
> If we would go down that road, we would also have to disable other
> Unicode features based on locale, e.g. whether to apply non-ASCII
> case mappings, what to consider whitespace, etc.
>
> We don't do that for a good reason: Unicode is supposed to be
> universal and not limited to a single locale.

Because parsing numbers is about more than just the characters used for the individual digits. There are additional semantics associated with digit ordering (for any number) and decimal separators and exponential notation (for floating point numbers) and those vary by locale. We deliberately chose to make the builtin numeric parsers unaware of all of those things, and assuming that we can simply parse other digits as if they were their ASCII equivalents and otherwise assume a C locale seems questionable.

If the existing semantics can be adequately defined, documented and defended, then retaining them would be fine. However, the language reference needs to define the behaviour properly so that other implementations know what they need to support and what can be chalked up as being just an implementation accident of CPython.
(As a point in the plus column, both decimal.Decimal and fractions.Fraction were able to handle the '????.??' example in a manner consistent with the int and float handling)

Regards,
Nick.

--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From merwok at netwok.org Mon Nov 29 14:14:30 2010
From: merwok at netwok.org (=?UTF-8?B?w4lyaWMgQXJhdWpv?=)
Date: Mon, 29 Nov 2010 14:14:30 +0100
Subject: [Python-Dev] PEP 384 final review
In-Reply-To: <4CF2E86F.5000606@v.loewis.de>
References: <4CF2E86F.5000606@v.loewis.de>
Message-ID: <4CF3A736.4050003@netwok.org>

Hello,

> Please comment with any changes you want to see, or speak in
> favor or against this PEP.

How to get a diff between py3k and this branch?

Regards

From doko at ubuntu.com Mon Nov 29 14:37:33 2010
From: doko at ubuntu.com (Matthias Klose)
Date: Mon, 29 Nov 2010 14:37:33 +0100
Subject: [Python-Dev] PEP 384 final review
In-Reply-To: <4CF3A736.4050003@netwok.org>
References: <4CF2E86F.5000606@v.loewis.de> <4CF3A736.4050003@netwok.org>
Message-ID: <4CF3AC9D.20309@ubuntu.com>

On 29.11.2010 14:14, Éric Araujo wrote:
> Hello,
>
>> Please comment with any changes you want to see, or speak in
>> favor or against this PEP.
>
> How to get a diff between py3k and this branch?

I used

svn diff svn://svn.python.org/python/branches/py3k@84330 svn://svn.python.org/python/branches/pep-0384

From ncoghlan at gmail.com Mon Nov 29 14:58:50 2010
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Mon, 29 Nov 2010 23:58:50 +1000
Subject: [Python-Dev] PEP 384 final review
In-Reply-To: <4CF3AC9D.20309@ubuntu.com>
References: <4CF2E86F.5000606@v.loewis.de> <4CF3A736.4050003@netwok.org> <4CF3AC9D.20309@ubuntu.com>
Message-ID:

On Mon, Nov 29, 2010 at 11:37 PM, Matthias Klose wrote:
> On 29.11.2010 14:14, Éric Araujo wrote:
>>
>> Hello,
>>
>>> Please comment with any changes you want to see, or speak in
>>> favor or against this PEP.
>>
>> How to get a diff between py3k and this branch?
>
> I used
> svn diff svn://svn.python.org/python/branches/py3k@84330
> svn://svn.python.org/python/branches/pep-0384

I had to use the full read/write svn+ssh:pythondev@svn.python.org repository URLs to get it to give me a diff. The http read only URLs didn't work (no diff returned, just "svn: OPTIONS of 'http://svn.python.org/python/branches/pep-0384': 200 OK (http://svn.python.org)"), and the bare svn protocol isn't enabled on svn.python.org.

Since directory diffs don't appear to be enabled on the svn.python.org ViewVC instance, it would probably be a good idea to put this up on Rietveld so people can more easily see the details of what has been changed on the branch to date. If nobody beats me to it, I'll put one up in the morning.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From ncoghlan at gmail.com Mon Nov 29 15:07:32 2010
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Tue, 30 Nov 2010 00:07:32 +1000
Subject: [Python-Dev] PEP 384 final review
In-Reply-To: <4CF2E86F.5000606@v.loewis.de>
References: <4CF2E86F.5000606@v.loewis.de>
Message-ID:

On Mon, Nov 29, 2010 at 9:40 AM, "Martin v. Löwis" wrote:
> I have now completed
>
> http://www.python.org/dev/peps/pep-0384/
>
> Benjamin has volunteered to rule on this PEP.
>
> Please comment with any changes you want to see, or speak in
> favor or against this PEP.

This is probably an issue independent of the PEP, but there appear to be a *lot* of exposed typedefs for various type slots and other function signatures that don't start with the Py prefix (i.e. getter, setter, unaryfunc and friends). Python.h shouldn't be leaking unprefixed names like that. We certainly shouldn't be enshrining them in the stable ABI without adding prefixes first.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan at gmail.com |
Brisbane, Australia

From solipsis at pitrou.net Mon Nov 29 15:19:07 2010
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Mon, 29 Nov 2010 15:19:07 +0100
Subject: [Python-Dev] Python and the Unicode Character Database
References: <20101128214311.092abd35@pitrou.net> <4CF2D4E9.3060607@v.loewis.de> <4CF2DAD7.2000408@egenix.com> <87pqto6bnv.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: <20101129151907.64e3f6ae@pitrou.net>

On Mon, 29 Nov 2010 13:58:05 +1000 Nick Coghlan wrote:
> On Mon, Nov 29, 2010 at 1:39 PM, Stephen J. Turnbull wrote:
> > I agree that Python should make it easy for the programmer to get
> > numerical values of native numeric strings, but it's not at all clear
> > to me that there is any point to having float() recognize them by
> > default.
>
> Indeed, as someone else suggested earlier in the thread, supporting
> non-ASCII digits sounds more like a job for the locale module than for
> the builtin types.

Not sure, really. For example, "\d" in a regular expression will match all Unicode digits, unless you pass the re.ASCII flag. The C locale mechanism generally does a poor job of supporting what MS seems to call "culture-specific" characteristics.

Regards

Antoine.

From solipsis at pitrou.net Mon Nov 29 15:22:24 2010
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Mon, 29 Nov 2010 15:22:24 +0100
Subject: [Python-Dev] Python and the Unicode Character Database
References: <4CF2E93F.70208@pearwood.info>
Message-ID: <20101129152224.7c253a8c@pitrou.net>

On Sun, 28 Nov 2010 21:32:15 -0500 Alexander Belopolsky wrote:
> On Sun, Nov 28, 2010 at 6:43 PM, Steven D'Aprano wrote:
> ..
> >> is more important than to assure users that once their program
> >> accepted some text as a number, they can assume that the text is
> >> ASCII.
> >
> > Seems like a pretty foolish assumption, if you ask me, pretty much akin to
> > assuming that if string.isalpha() returns true that string is ASCII.
> >
>
> It is not to 99.9% of Python users whose code is written for 2.x.
> Their strings are byte strings and string.isdigit() does imply ASCII
> even if string.isalpha() does not in many locales.

We are not talking about string.isdigit(), we are talking about the float() constructor when given a unicode string. Constructing a float from a unicode string is certainly a common thing, even in 2.x.

Regards

Antoine.

From foom at fuhm.net Mon Nov 29 15:15:12 2010
From: foom at fuhm.net (James Y Knight)
Date: Mon, 29 Nov 2010 09:15:12 -0500
Subject: [Python-Dev] PEP 384 final review
In-Reply-To:
References: <4CF2E86F.5000606@v.loewis.de> <4CF3A736.4050003@netwok.org> <4CF3AC9D.20309@ubuntu.com>
Message-ID: <28693E2E-A60E-4F83-BF55-DBD6EAD88353@fuhm.net>

On Nov 29, 2010, at 8:58 AM, Nick Coghlan wrote:
> The http read only URLs
> didn't work (no diff returned, just "svn: OPTIONS of
> 'http://svn.python.org/python/branches/pep-0384': 200 OK
> (http://svn.python.org)"),

That was the wrong url: you should've used http://svn.python.org/projects/python/branches/pep-0384

James

From mal at egenix.com Mon Nov 29 16:19:19 2010
From: mal at egenix.com (M.-A. Lemburg)
Date: Mon, 29 Nov 2010 16:19:19 +0100
Subject: [Python-Dev] Python and the Unicode Character Database
In-Reply-To:
References: <20101128214311.092abd35@pitrou.net> <4CF2D4E9.3060607@v.loewis.de> <4CF2DAD7.2000408@egenix.com> <87pqto6bnv.fsf@uwakimon.sk.tsukuba.ac.jp> <4CF38861.5090309@egenix.com>
Message-ID: <4CF3C477.1020007@egenix.com>

Nick Coghlan wrote:
> On Mon, Nov 29, 2010 at 9:02 PM, M.-A. Lemburg wrote:
>> If we would go down that road, we would also have to disable other
>> Unicode features based on locale, e.g. whether to apply non-ASCII
>> case mappings, what to consider whitespace, etc.
>>
>> We don't do that for a good reason: Unicode is supposed to be
>> universal and not limited to a single locale.
>
> Because parsing numbers is about more than just the characters used
> for the individual digits. There are additional semantics associated
> with digit ordering (for any number) and decimal separators and
> exponential notation (for floating point numbers) and those vary by
> locale. We deliberately chose to make the builtin numeric parsers
> unaware of all of those things, and assuming that we can simply parse
> other digits as if they were their ASCII equivalents and otherwise
> assume a C locale seems questionable.

Sure, and those additional semantics are locale dependent, even between ASCII-only locales. However, that does not apply to the basic building blocks, the decimal digits themselves.

> If the existing semantics can be adequately defined, documented and
> defended, then retaining them would be fine. However, the language
> reference needs to define the behaviour properly so that other
> implementations know what they need to support and what can be chalked
> up as being just an implementation accident of CPython. (As a point in
> the plus column, both decimal.Decimal and fractions.Fraction were able
> to handle the '????.??' example in a manner consistent with the int
> and float handling)

The support is built into the C API, so there's not really much surprise there.

Regarding documentation, we'd just have to add that numbers may be made up of Unicode code points in the category "Nd". See http://www.unicode.org/versions/Unicode5.2.0/ch04.pdf, section 4.6 for details....

"""
Decimal digits form a large subcategory of numbers consisting of those digits that can be used to form decimal-radix numbers. They include script-specific digits, but exclude characters such as Roman numerals and Greek acrophonic numerals. (Note that <1, 5> = 15 = fifteen, but <I, V> = IV = four.) Decimal digits also exclude the compatibility subscript or superscript digits to prevent simplistic parsers from misinterpreting their values in context.
"""

int(), float() and long() (in Python2) are such simplistic parsers.

--
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source (#1, Nov 29 2010)
>>> Python/Zope Consulting and Support ... http://www.egenix.com/
>>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/
________________________________________________________________________

::: Try our new mxODBC.Connect Python Database Interface for free ! ::::

eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48
D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
Registered at Amtsgericht Duesseldorf: HRB 46611
http://www.egenix.com/company/contact/

From ziade.tarek at gmail.com Mon Nov 29 16:59:42 2010
From: ziade.tarek at gmail.com (=?ISO-8859-1?Q?Tarek_Ziad=E9?=)
Date: Mon, 29 Nov 2010 16:59:42 +0100
Subject: [Python-Dev] PEP 384 final review
In-Reply-To: <4CF37F56.9030808@ubuntu.com>
References: <4CF2E86F.5000606@v.loewis.de> <4CF37F56.9030808@ubuntu.com>
Message-ID:

On Mon, Nov 29, 2010 at 11:24 AM, Matthias Klose wrote:
> On 29.11.2010 00:40, "Martin v. Löwis" wrote:
>>
>> I have now completed
>>
>> http://www.python.org/dev/peps/pep-0384/
>>
>> Benjamin has volunteered to rule on this PEP.
>>
>> Please comment with any changes you want to see, or speak in
>> favor or against this PEP.
>
> I looked at a diff with r84330 from the py3k branch.
>
> Extensions built with Py_LIMITED_API have the python version encoded in their
> names. Which abi name should be used for these extensions?
> ..
> - Should the distutils support for LIMITED_API be part of the pep, or
>   be implemented later?

In any case, it has to be implemented in Distutils2, not in Distutils. Distutils is frozen and just in maintenance mode. Once Distutils2 final is released (it's currently in alpha), it will be installable from 2.4 to 3.x and can provide this feature.
For Python itself we can backport the feature in its setup.py, until
Distutils2 is back to the stdlib

> In favour of the pep.

+1

>
>  Matthias
>

--
Tarek Ziadé | http://ziade.org

From alexander.belopolsky at gmail.com Mon Nov 29 17:07:03 2010
From: alexander.belopolsky at gmail.com (Alexander Belopolsky)
Date: Mon, 29 Nov 2010 11:07:03 -0500
Subject: [Python-Dev] [Preview] Comments and change proposals on documentation
In-Reply-To:
References: <4CF18220.7000202@pearwood.info> <4CF19B3C.2000308@pearwood.info>
Message-ID:

On Mon, Nov 29, 2010 at 3:52 AM, Georg Brandl wrote:
..
>> Yes, I failed to fully read the instructions you sent, or understand
>> them. That's what users do -- they don't read your instructions, and
>> they misunderstand them. If your UI isn't easily discoverable, users
>> will not be able to use it, and will be frustrated and annoyed. The user
>> is always right, even when they're doing it wrong *wink*
>
> That's right, of course.  I really come to the conclusion that having a text
> link that "looks like" a link, i.e. is underlined, will have a better UI
> experience (since we cannot put notes "click bubble to comment" everywhere).
>

Please don't make comment bubbles more visible. Doing so will only
decrease signal to noise ratio. I think a little bit of a learning
barrier is a good thing: it will keep down the number of "Bart was
here" comments.

From alexander.belopolsky at gmail.com Mon Nov 29 19:09:58 2010
From: alexander.belopolsky at gmail.com (Alexander Belopolsky)
Date: Mon, 29 Nov 2010 13:09:58 -0500
Subject: [Python-Dev] Python and the Unicode Character Database
In-Reply-To: <4CF354C6.9020302@v.loewis.de>
References: <20101128214311.092abd35@pitrou.net> <4CF2D4E9.3060607@v.loewis.de> <4CF2F067.5020705@pearwood.info> <4CF354C6.9020302@v.loewis.de>
Message-ID:

On Mon, Nov 29, 2010 at 2:22 AM, "Martin v.
Löwis" wrote:
>> The former ensures that literals in code are always readable; the latter
>> allows users to enter numbers in their own number system. How could that
>> be a bad thing?
>
> It's YAGNI, feature bloat. It gives the illusion of supporting something
> that actually isn't supported very well (namely, parsing local number
> strings). I claim that there is no meaningful application
> of this feature.
>

Speaking of YAGNI, does anyone want to defend

>>> complex('????.??j')
1234.56j

?

Especially given that we reject complex('1234.56i'):

http://bugs.python.org/issue10562

From solipsis at pitrou.net Mon Nov 29 19:33:02 2010
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Mon, 29 Nov 2010 19:33:02 +0100
Subject: [Python-Dev] Python and the Unicode Character Database
References: <20101128214311.092abd35@pitrou.net> <4CF2D4E9.3060607@v.loewis.de> <4CF2F067.5020705@pearwood.info> <4CF354C6.9020302@v.loewis.de>
Message-ID: <20101129193302.115dbcd5@pitrou.net>

On Mon, 29 Nov 2010 08:22:46 +0100
"Martin v. Löwis" wrote:
> > The former ensures that literals in code are always readable; the latter
> > allows users to enter numbers in their own number system. How could that
> > be a bad thing?
>
> It's YAGNI, feature bloat. It gives the illusion of supporting something
> that actually isn't supported very well (namely, parsing local number
> strings). I claim that there is no meaningful application
> of this feature.

Still, if it's not detrimental and it's not difficult to support,
then why do you care? You aren't even maintaining that part of the code.

I don't think "remove feature bloat" is part of our development goals
or practices. Given the diversity of our user base, such removal should
be done carefully and only for serious reasons.

Regards

Antoine.

From mal at egenix.com Mon Nov 29 19:59:57 2010
From: mal at egenix.com (M.-A.
Lemburg)
Date: Mon, 29 Nov 2010 19:59:57 +0100
Subject: [Python-Dev] Python and the Unicode Character Database
In-Reply-To:
References: <20101128214311.092abd35@pitrou.net> <4CF2D4E9.3060607@v.loewis.de> <4CF2F067.5020705@pearwood.info> <4CF354C6.9020302@v.loewis.de>
Message-ID: <4CF3F82D.2040000@egenix.com>

Alexander Belopolsky wrote:
> On Mon, Nov 29, 2010 at 2:22 AM, "Martin v. Löwis" wrote:
>>> The former ensures that literals in code are always readable; the latter
>>> allows users to enter numbers in their own number system. How could that
>>> be a bad thing?
>>
>> It's YAGNI, feature bloat. It gives the illusion of supporting something
>> that actually isn't supported very well (namely, parsing local number
>> strings). I claim that there is no meaningful application
>> of this feature.

This is not about parsing local number strings, it's about parsing
number strings represented using different scripts - besides en-US
is a locale as well, ye know :-)

> Speaking of YAGNI, does anyone want to defend
>
>>>> complex('????.??j')
> 1234.56j
>
> ?

Yes. The same arguments apply.

Just because ASCII-proponents may have a hard time reading such
literals, doesn't mean that script users have the same trouble.

> Especially given that we reject complex('1234.56i'):
>
> http://bugs.python.org/issue10562

We've had that discussion long before we had Unicode in Python. The
main reason was that 'i' looked too similar to 1 in a number of fonts
which is why it was rejected for Python source code.

However, I don't see any reason why we shouldn't accept both i and j
for complex(), though, since the input to that constructor doesn't
have to originate in Python source code.

--
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source (#1, Nov 29 2010)
>>> Python/Zope Consulting and Support ... http://www.egenix.com/
>>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...
http://python.egenix.com/
________________________________________________________________________

::: Try our new mxODBC.Connect Python Database Interface for free ! ::::

eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48
D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
Registered at Amtsgericht Duesseldorf: HRB 46611
http://www.egenix.com/company/contact/

From brett at python.org Mon Nov 29 20:22:22 2010
From: brett at python.org (Brett Cannon)
Date: Mon, 29 Nov 2010 11:22:22 -0800
Subject: [Python-Dev] python3k : imp.find_module raises SyntaxError
In-Reply-To: <20101129115311.GD18888@lupus.logilab.fr>
References: <201011251530.23947.emile.anclin@logilab> <4CEE9B72.1070002@ronadam.com> <20101129115311.GD18888@lupus.logilab.fr>
Message-ID:

On Mon, Nov 29, 2010 at 03:53, Sylvain Thénault wrote:
> On 25 novembre 11:22, Ron Adam wrote:
>> On 11/25/2010 08:30 AM, Emile Anclin wrote:
>> >
>> >hello,
>> >
>> >working on Pylint, we have a lot of voluntary corrupted files to test
>> >Pylint behavior; for instance
>> >
>> >$ cat /home/emile/var/pylint/test/input/func_unknown_encoding.py
>> ># -*- coding: IBO-8859-1 -*-
>> >""" check correct unknown encoding declaration
>> >"""
>> >
>> >__revision__ = '????'
>> >
>> >
>> >and we try to find that module :
>> >find_module('func_unknown_encoding', None). But python3 raises SyntaxError
>> >in that case ; it didn't raise SyntaxError on python2 nor does so on our
>> >func_nonascii_noencoding and func_wrong_encoding modules (with obvious
>> >names)
>> >
>> >Python 3.2a2 (r32a2:84522, Sep 14 2010, 15:22:36)
>> >[GCC 4.3.4] on linux2
>> >Type "help", "copyright", "credits" or "license" for more information.
>> >>>>from imp import find_module
>> >>>>find_module('func_unknown_encoding', None)
>> >Traceback (most recent call last):
>> >  File "<stdin>", line 1, in <module>
>> >SyntaxError: encoding problem: with BOM
>>
>> I don't think there is a clear reason by design.
Also try importing
>> the same modules directly and noting the differences in the errors
>> you get.
>
> IMO the point is that we can consider as a bug the fact that find_module
> tries to somewhat read the content of the file, no? Though it seems to only
> doing this for encoding detection or like since find_module doesn't choke on
> a module containing another kind of syntax error.
>
> So the question is, should we deal with this in pylint/astng, or can we expect
> this to be fixed at some point?

Considering these semantics changed between Python 2 and 3 w/o a
discernable benefit (I would consider it a negative as finding a
module should not be impacted by syntactic correctness; the full act
of importing should be the only thing that cares about that), I would
consider it a bug that should be filed.

From tjreedy at udel.edu Mon Nov 29 20:23:28 2010
From: tjreedy at udel.edu (Terry Reedy)
Date: Mon, 29 Nov 2010 14:23:28 -0500
Subject: [Python-Dev] Python and the Unicode Character Database
In-Reply-To: <4CF3C477.1020007@egenix.com>
References: <20101128214311.092abd35@pitrou.net> <4CF2D4E9.3060607@v.loewis.de> <4CF2DAD7.2000408@egenix.com> <87pqto6bnv.fsf@uwakimon.sk.tsukuba.ac.jp> <4CF38861.5090309@egenix.com> <4CF3C477.1020007@egenix.com>
Message-ID:

On 11/29/2010 10:19 AM, M.-A. Lemburg wrote:
> Nick Coghlan wrote:
>> On Mon, Nov 29, 2010 at 9:02 PM, M.-A. Lemburg wrote:
>>> If we would go down that road, we would also have to disable other
>>> Unicode features based on locale, e.g. whether to apply non-ASCII
>>> case mappings, what to consider whitespace, etc.
>>>
>>> We don't do that for a good reason: Unicode is supposed to be
>>> universal and not limited to a single locale.
>>
>> Because parsing numbers is about more than just the characters used
>> for the individual digits.
There are additional semantics associated
>> with digit ordering (for any number) and decimal separators and
>> exponential notation (for floating point numbers) and those vary by
>> locale. We deliberately chose to make the builtin numeric parsers
>> unaware of all of those things, and assuming that we can simply parse
>> other digits as if they were their ASCII equivalents and otherwise
>> assume a C locale seems questionable.
>
> Sure, and those additional semantics are locale dependent, even
> between ASCII-only locales. However, that does not apply to the
> basic building blocks, the decimal digits themselves.
>
>> If the existing semantics can be adequately defined, documented and
>> defended, then retaining them would be fine. However, the language
>> reference needs to define the behaviour properly so that other
>> implementations know what they need to support and what can be chalked
>> up as being just an implementation accident of CPython. (As a point in
>> the plus column, both decimal.Decimal and fractions.Fraction were able
>> to handle the '????.??' example in a manner consistent with the int
>> and float handling)
>
> The support is built into the C API, so there's not really much
> surprise there.
>
> Regarding documentation, we'd just have to add that numbers may
> be made up of a Unicode code point in the category "Nd".
>
> See http://www.unicode.org/versions/Unicode5.2.0/ch04.pdf, section
> 4.6 for details....
>
> """
> Decimal digits form a large subcategory of numbers consisting of those digits that can be
> used to form decimal-radix numbers. They include script-specific digits, but exclude
> characters such as Roman numerals and Greek acrophonic numerals. (Note that <1, 5> = 15 =
> fifteen, but <I, V> = IV = four.) Decimal digits also exclude the compatibility subscript or
> superscript digits to prevent simplistic parsers from misinterpreting their values in context.
> """
>
> int(), float() and long() (in Python2) are such simplistic
> parsers.

Since you are the knowledgeable advocate of the current behavior,
perhaps you could open an issue and propose a doc patch, even if not
.rst formatted.

--
Terry Jan Reedy

From alexander.belopolsky at gmail.com Mon Nov 29 20:38:46 2010
From: alexander.belopolsky at gmail.com (Alexander Belopolsky)
Date: Mon, 29 Nov 2010 14:38:46 -0500
Subject: [Python-Dev] Python and the Unicode Character Database
In-Reply-To: <20101129193302.115dbcd5@pitrou.net>
References: <20101128214311.092abd35@pitrou.net> <4CF2D4E9.3060607@v.loewis.de> <4CF2F067.5020705@pearwood.info> <4CF354C6.9020302@v.loewis.de> <20101129193302.115dbcd5@pitrou.net>
Message-ID:

On Mon, Nov 29, 2010 at 1:33 PM, Antoine Pitrou wrote:
> On Mon, 29 Nov 2010 08:22:46 +0100
> "Martin v. Löwis" wrote:
>> > The former ensures that literals in code are always readable; the latter
>> > allows users to enter numbers in their own number system. How could that
>> > be a bad thing?
>>
>> It's YAGNI, feature bloat. It gives the illusion of supporting something
>> that actually isn't supported very well (namely, parsing local number
>> strings). I claim that there is no meaningful application
>> of this feature.
>
> Still, if it's not detrimental and it's not difficult to support,
> then why do you care?

It is difficult to support. A fix for issue10557 would be much
simpler if we did not support non-European digits. I now added a
patch that handles non-ascii digits, so you can see what's involved.
Note that when Unicode Consortium inevitably adds more Nd characters
to the non-BMP planes, we will have to add surrogate pairs' support to
this code.

In any case, there is little we can do about it in 3.2 other than fix
bugs like issue10557 without breaking currently valid code, so I
created a separate issue to continue this debate in context of 3.3.
[issue10581]

Now, I would like to bring this thread back to its subject.
Given that UCD is now affecting the language definition and the
standard library behavior, how should changes to UCD be handled?

- Should Python documentation refer to the specific version of Unicode
that it supports? Current documentation refers to old versions.
Should version be updated or removed to imply the latest?

- How UCD updates should be handled during the language moratorium?
During PEP 3003 discussion, it was suggested to handle it on a case by
case basis, but I don't see discussion of the upgrade to 6.0.0 in PEP
3003. Should this upgrade be backported to 2.7?

- How specific should library reference manual be in defining methods
affected by UCD such as str.upper()?

- What is an acceptable level of variation between Python
implementations? For example, if '\UXXXXXXXX'.isalpha() returns true
in one implementation, can it return false in another? Note that even
CPython narrow and wide builds are presently not consistent in this
respect.

[issue10581] http://bugs.python.org/issue10581

From alexander.belopolsky at gmail.com Mon Nov 29 20:43:14 2010
From: alexander.belopolsky at gmail.com (Alexander Belopolsky)
Date: Mon, 29 Nov 2010 14:43:14 -0500
Subject: [Python-Dev] Python and the Unicode Character Database
In-Reply-To:
References: <20101128214311.092abd35@pitrou.net> <4CF2D4E9.3060607@v.loewis.de> <4CF2DAD7.2000408@egenix.com> <87pqto6bnv.fsf@uwakimon.sk.tsukuba.ac.jp> <4CF38861.5090309@egenix.com> <4CF3C477.1020007@egenix.com>
Message-ID:

On Mon, Nov 29, 2010 at 2:23 PM, Terry Reedy wrote:
..
> Since you are the knowledgeable advocate of the current behavior, perhaps you
> could open an issue and propose a doc patch, even if not .rst formatted.
>

I am not an advocate of the current behavior, but an issue for doc
patches is at .
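
The digit handling debated in the messages above can be checked
directly on an interpreter. What follows is only a minimal sketch,
assuming Python 3: the Arabic-Indic digits are an assumption standing
in for the characters that appear as '????.??' in this archive (the
originals did not survive), and the helper name `rejected` is made up
for illustration. The reported UCD version depends on the interpreter
build, so no particular value is claimed for it.

```python
import unicodedata

# Assumption: ARABIC-INDIC digits 1234.56 stand in for the characters
# that the archive shows as '????.??'.
text = "\u0661\u0662\u0663\u0664.\u0665\u0666"

# Every character except the ASCII decimal point is in category "Nd".
categories = {unicodedata.category(ch) for ch in text if ch != "."}

# float() and int() treat "Nd" digits like their ASCII equivalents.
as_float = float(text)
as_int = int(text.split(".")[0])


def rejected(s):
    """Return True if int() refuses the string (hypothetical helper)."""
    try:
        int(s)
    except ValueError:
        return True
    return False


superscript_two = "\u00b2"  # SUPERSCRIPT TWO, category "No"
roman_four = "\u2163"       # ROMAN NUMERAL FOUR, category "Nl"

# The UCD version is a property of the running interpreter build.
ucd_version = unicodedata.unidata_version

print(categories, as_float, as_int, ucd_version)
print(rejected(superscript_two), rejected(roman_four))
```

Characters outside category "Nd", such as the superscript and the
Roman numeral above, are refused by int(), which matches the passage
from the Unicode standard quoted earlier in the thread.
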

From martin at v.loewis.de Mon Nov 29 20:38:59 2010
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Mon, 29 Nov 2010 20:38:59 +0100
Subject: [Python-Dev] PEP 384 final review
In-Reply-To:
References: <4CF2E86F.5000606@v.loewis.de> <4CF37F56.9030808@ubuntu.com>
Message-ID: <4CF40153.8030100@v.loewis.de>

>> - Should the distutils support for LIMITED_API be part of the pep, or
>> be implemented later?
>
> In any case, it has to be implemented in Distutils2, not in Distutils.
> Distutils is frozen and just in maintenance mode.

I think it's too late for that. PEP 3149 is accepted, and it does
specify a change to distutils (namely, the abi= parameter). ISTM that
an approved PEP will override the distutils code freeze.

> For Python itself we can backport the feature in its setup.py, until
> Distutils2 is back to the stdlib

This won't be for python itself, but for extension modules.

Regards,
Martin

From ziade.tarek at gmail.com Mon Nov 29 20:45:35 2010
From: ziade.tarek at gmail.com (=?ISO-8859-1?Q?Tarek_Ziad=E9?=)
Date: Mon, 29 Nov 2010 20:45:35 +0100
Subject: [Python-Dev] PEP 384 final review
In-Reply-To: <4CF40153.8030100@v.loewis.de>
References: <4CF2E86F.5000606@v.loewis.de> <4CF37F56.9030808@ubuntu.com> <4CF40153.8030100@v.loewis.de>
Message-ID:

2010/11/29 "Martin v. Löwis" :
>>>  - Should the distutils support for LIMITED_API be part of the pep, or
>>>    be implemented later?
>>
>> In any case, it has to be implemented in Distutils2, not in Distutils.
>> Distutils is frozen and just in maintenance mode.
>
> I think it's too late for that. PEP 3149 is accepted, and it does
> specify a change to distutils (namely, the abi= parameter). ISTM that
> an approved PEP will override the distutils code freeze.

Having an accepted PEP does not imply that it should be implemented in
the standard library. For instance PEP 345 and PEP 376 are accepted
but implemented in Distutils2.

it's also a:

- good opportunity to boost Distutils2 adoption
- way to get feedback from people for that abi= option and have the
  chance to correct any design issue before d2 is added in the stdlib

>
>> For Python itself we can backport the feature in its setup.py, until
>> Distutils2 is back to the stdlib
>
> This won't be for python itself, but for extension modules.

ok.

>
> Regards,
> Martin
>

--
Tarek Ziadé | http://ziade.org

From rrr at ronadam.com Mon Nov 29 21:21:07 2010
From: rrr at ronadam.com (Ron Adam)
Date: Mon, 29 Nov 2010 14:21:07 -0600
Subject: [Python-Dev] python3k : imp.find_module raises SyntaxError
In-Reply-To:
References: <201011251530.23947.emile.anclin@logilab> <4CEE9B72.1070002@ronadam.com> <20101129115311.GD18888@lupus.logilab.fr>
Message-ID:

On 11/29/2010 01:22 PM, Brett Cannon wrote:
> On Mon, Nov 29, 2010 at 03:53, Sylvain Thénault
> wrote:
>> On 25 novembre 11:22, Ron Adam wrote:
>>> On 11/25/2010 08:30 AM, Emile Anclin wrote:
>>>>
>>>> hello,
>>>>
>>>> working on Pylint, we have a lot of voluntary corrupted files to test
>>>> Pylint behavior; for instance
>>>>
>>>> $ cat /home/emile/var/pylint/test/input/func_unknown_encoding.py
>>>> # -*- coding: IBO-8859-1 -*-
>>>> """ check correct unknown encoding declaration
>>>> """
>>>>
>>>> __revision__ = '????'
>>>>
>>>>
>>>> and we try to find that module :
>>>> find_module('func_unknown_encoding', None). But python3 raises SyntaxError
>>>> in that case ; it didn't raise SyntaxError on python2 nor does so on our
>>>> func_nonascii_noencoding and func_wrong_encoding modules (with obvious
>>>> names)
>>>>
>>>> Python 3.2a2 (r32a2:84522, Sep 14 2010, 15:22:36)
>>>> [GCC 4.3.4] on linux2
>>>> Type "help", "copyright", "credits" or "license" for more information.
>>>>>> >from imp import find_module
>>>>>>> find_module('func_unknown_encoding', None)
>>>> Traceback (most recent call last):
>>>>   File "<stdin>", line 1, in <module>
>>>> SyntaxError: encoding problem: with BOM
>>>
>>> I don't think there is a clear reason by design. Also try importing
>>> the same modules directly and noting the differences in the errors
>>> you get.
>>
>> IMO the point is that we can consider as a bug the fact that find_module
>> tries to somewhat read the content of the file, no? Though it seems to only
>> doing this for encoding detection or like since find_module doesn't choke on
>> a module containing another kind of syntax error.
>>
>> So the question is, should we deal with this in pylint/astng, or can we expect
>> this to be fixed at some point?
>
> Considering these semantics changed between Python 2 and 3 w/o a
> discernable benefit (I would consider it a negative as finding a
> module should not be impacted by syntactic correctness; the full act
> of importing should be the only thing that cares about that), I would
> consider it a bug that should be filed.

The output of imp.find_module() returns an open file io object, and
its output feeds directly into imp.load_module().

>>> imp.find_module('pydoc')
(<_io.TextIOWrapper name=4 encoding='utf-8'>, '/usr/local/lib/python3.2/pydoc.py', ('.py', 'U', 1))

So I think the imp.find_module() is supposed to be used when you *do*
want to do the full act of importing and not for just finding out if or
where module xyz exists.

Ron

From martin at v.loewis.de Mon Nov 29 21:22:02 2010
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Mon, 29 Nov 2010 21:22:02 +0100
Subject: [Python-Dev] PEP 384 final review
In-Reply-To: <4CF37F56.9030808@ubuntu.com>
References: <4CF2E86F.5000606@v.loewis.de> <4CF37F56.9030808@ubuntu.com>
Message-ID: <4CF40B6A.6080407@v.loewis.de>

> Extensions built with Py_LIMITED_API have the python version encoded in
> its name.
Which abi name should be used for these extensions?

PEP 3149, IIUC, says it should be "abi3". I don't understand what that
means, though (with respect to, say, distutils)

> - The m and u modifiers in the abi name are complimentary (?)

See above: none of these will be used. Of course, it is possible to name
an ABI-conforming extension with the regular ABI name of the Python
release.

> - For posix systems the implementation is currently part of the abi name,
> are Py_LIMITED_API extensions supposed to be compatible with e.g. PyPy?

That's a choice that PyPy needs to make, of course, but Amaury has
indicated that they are interested in doing so.

> Should the LIMITED_API abi name include the implementation string?
> - Should the distutils support for LIMITED_API be part of the pep, or
> be implemented later?

Depends on what support you want. Currently, all you need to do is to
define Py_LIMITED_API to the preprocessor - this is something that is
already supported in distutils. If you want the support suggested in
PEP 3149 (specifying abi=3), it should certainly be implemented in
Python 3.2, despite the distutils freeze.

Regards,
Martin

From martin at v.loewis.de Mon Nov 29 21:36:46 2010
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Mon, 29 Nov 2010 21:36:46 +0100
Subject: [Python-Dev] PEP 384 final review
In-Reply-To:
References: <4CF2E86F.5000606@v.loewis.de>
Message-ID: <4CF40EDE.10004@v.loewis.de>

> This is probably an issue independent of the PEP but there appear to
> be a *lot* of exposed typedefs for various type slots and other
> function signatures that don't start with the Py prefix (i.e. getter,
> setter, unaryfunc and friends).

It's indeed independent: the names don't actually affect the ABI, but
the API. Changing them is possible later without risking binary
compatibility.

> Python.h shouldn't be leaking
> unprefixed names like that. We certainly shouldn't be enshrining them
> in the stable ABI without adding prefixes first.

The stable ABI isn't actually enshrining them - what gets enshrined is
the value of the typedefs, not their names.

I don't mind renaming them, though. I see a number of different cases:
- struct names. I don't see a problem to have "typedef struct PyFoo PyFoo"
  I vaguely recall that there had been compiler problems with that
  construct at some point, but to my knowledge, they are past, and this
  is actually both well-formed C and well-formed C++.
- function pointer type names
- "various" other types

For the struct types, in particular for the ones which already have a
typedef, I think renaming them should be possible right away.
Applications that break should be able to use the typedef instead, and
continue to work with older releases.

For the function pointer type names, caution is necessary. We cannot
remove them, since it would break a lot of code. I also think that some
smart naming scheme would be desirable that makes the names all sound
right, yet allows easy mapping from the existing types. Once such a
scheme is added, we should have a graceful deprecation procedure, such
as:
- release A: add typedefs in addition to existing pointer types,
  deprecate pointer types in documentation
- release B>A: make the old names somehow conditional (e.g.
  put them all into a header file rename3.h, or some such)
- release C>B: remove rename3.h

For the other rest, I think many of them are considered internal
(of course, they shouldn't appear in the ABI then at all). Renaming
them right away might be fine.

Regards,
Martin

From martin at v.loewis.de Mon Nov 29 21:41:09 2010
From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=)
Date: Mon, 29 Nov 2010 21:41:09 +0100
Subject: [Python-Dev] PEP 384 final review
In-Reply-To: <4CF3A736.4050003@netwok.org>
References: <4CF2E86F.5000606@v.loewis.de> <4CF3A736.4050003@netwok.org>
Message-ID: <4CF40FE5.8080800@v.loewis.de>

Am 29.11.2010 14:14, schrieb Éric Araujo:
> Hello,
>
>> Please comment with any changes you want to see, or speak in
>> favor or against this PEP.
>
> How to get a diff between py3k and this branch?

As others have already explained:

svn diff http://svn.python.org/projects/python/branches/py3k@84329 http://svn.python.org/projects/python/branches/pep-0384

(84329 is the value of svnmerge-integrated).

In any case, I posted it to Rietveld as

http://codereview.appspot.com/3262043/

Regards,
Martin

From greg.ewing at canterbury.ac.nz Mon Nov 29 21:47:23 2010
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Tue, 30 Nov 2010 09:47:23 +1300
Subject: [Python-Dev] constant/enum type in stdlib
In-Reply-To: <4CF1AB3C.3060408@btinternet.com>
References: <20101121034404.52924F20A@mail.python.org> <4CE9BF4A.1020302@netwok.org> <4CEA89E8.5090107@voidspace.org.uk> <20101122163722.7e96d123@pitrou.net> <4CEA9584.7040301@avl.com> <20101122172440.77d27ed5@pitrou.net> <20101122164654.2109.588145158.divmod.xquotient.165@localhost.localdomain> <4CEBC6BD.9060402@voidspace.org.uk> <4CED0557.9090101@voidspace.org.uk> <4CEDDC2D.204@canterbury.ac.nz> <4CEE5C1C.9000905@btinternet.com> <4CF2C86C.9030505@canterbury.ac.nz> <4CF1AB3C.3060408@btinternet.com>
Message-ID: <4CF4115B.7080200@canterbury.ac.nz>

Rob Cliffe wrote:

> But when a frozen list a.k.a. tuple would be created - either directly,
> or by setting a list's mutable flag to False which would really turn it
> into a tuple - the size *would* be known.

But at that point the object consists of two memory blocks -- one
containing just the object header and a pointer to the items, and the
other containing the items.

To turn that into a true tuple structure would require resizing the
main object block to be big enough to hold the items and copying them
into it. The main object can't be moved (because there are PyObject *s
all over the place pointing to it), so if there's not enough room at
its current location, you're out of luck.

So lists frozen after creation would have to remain as two blocks,
making them second-class citizens compared to those that were created
frozen.

Either that or store all lists/tuples as two blocks, and give up some
of the performance advantages of the current tuple structure.

--
Greg

From martin at v.loewis.de Mon Nov 29 22:04:03 2010
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Mon, 29 Nov 2010 22:04:03 +0100
Subject: [Python-Dev] Python and the Unicode Character Database
In-Reply-To: <20101129193302.115dbcd5@pitrou.net>
References: <20101128214311.092abd35@pitrou.net> <4CF2D4E9.3060607@v.loewis.de> <4CF2F067.5020705@pearwood.info> <4CF354C6.9020302@v.loewis.de> <20101129193302.115dbcd5@pitrou.net>
Message-ID: <4CF41543.1030800@v.loewis.de>

Am 29.11.2010 19:33, schrieb Antoine Pitrou:
> On Mon, 29 Nov 2010 08:22:46 +0100
> "Martin v. Löwis" wrote:
>>> The former ensures that literals in code are always readable; the latter
>>> allows users to enter numbers in their own number system. How could that
>>> be a bad thing?
>>
>> It's YAGNI, feature bloat. It gives the illusion of supporting something
>> that actually isn't supported very well (namely, parsing local number
>> strings). I claim that there is no meaningful application
>> of this feature.
>
> Still, if it's not detrimental and it's not difficult to support,
> then why do you care? You aren't even maintaining that part of the code.

I sure do maintain the Unicode database implementation in Python - the
one that is being used (IMO incorrectly) to implement the conversion in
question (and also the one that triggered this thread).

> I don't think "remove feature bloat" is part of our development goals
> or practices. Given the diversity of our user base, such removal should
> be done carefully and only for serious reasons.

I think it's a serious reason that the intuitive expectation of many
people (including committers) deviates from the actual implementation -
so much that they clarify the documentation in a way that makes the
difference explicit.

Having a mismatch between the expected behavior and the actual behavior
is a serious problem because it could lead to security issues, e.g.
when someone relies on float() to perform certain syntactic checking,
making it then possible to sneak in values that cause corruption later
on (speaking theoretically, of course - I'm not aware of an application
that is vulnerable in this manner).

Regards,
Martin

From martin at v.loewis.de Mon Nov 29 22:13:41 2010
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Mon, 29 Nov 2010 22:13:41 +0100
Subject: [Python-Dev] Python and the Unicode Character Database
In-Reply-To:
References: <20101128214311.092abd35@pitrou.net> <4CF2D4E9.3060607@v.loewis.de> <4CF2F067.5020705@pearwood.info> <4CF354C6.9020302@v.loewis.de> <20101129193302.115dbcd5@pitrou.net>
Message-ID: <4CF41785.5020807@v.loewis.de>

> - Should Python documentation refer to the specific version of Unicode
> that it supports?

You mean, mention it somewhere? Sure (although it would be nice if the
documentation generator would automatically extract it from the source,
just as it extracts the Python version number).

Of course, such mentioning should explain that this is specific to
CPython, and not an aspect of Python-the-language.

> Current documentation refers to old versions.
Should version be
> updated or removed to imply the latest?

What specific reference are you referring to?

> - How UCD updates should be handled during the language moratorium?

It's clearly not affected.

> During PEP 3003 discussion, it was suggested to handle it on a case by
> case basis, but I don't see discussion of the upgrade to 6.0.0 in PEP
> 3003.

It's covered by "As the standard library is not directly tied to the
language definition it is not covered by this moratorium."

> Should this upgrade be backported to 2.7?

No, it's a new feature.

> - How specific should library reference manual be in defining methods
> affected by UCD such as str.upper()?

It should specify what this actually does in Unicode terminology
(probably in addition to a layman's rephrase of that)

> - What is an acceptable level of variation between Python
> implementations? For example, if '\UXXXXXXXX'.isalpha() returns true
> in one implementation, can it return false in another?

Implementations are free to use any version of the UCD.

Regards,
Martin

From martin at v.loewis.de Mon Nov 29 22:14:07 2010
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Mon, 29 Nov 2010 22:14:07 +0100
Subject: [Python-Dev] PEP 384 final review
In-Reply-To:
References: <4CF2E86F.5000606@v.loewis.de> <4CF35FAA.50600@v.loewis.de>
Message-ID: <4CF4179F.9080700@v.loewis.de>

Am 29.11.2010 09:36, schrieb Georg Brandl:
> Am 29.11.2010 09:09, schrieb "Martin v. Löwis":
>>> I have now completed
>>>
>>> http://www.python.org/dev/peps/pep-0384/
>>>
>>>
>>> was structseq.h considered?
>>
>> No, it wasn't - unfortunately, it still doesn't get included when
>> including Python.h. I'll add it.
>
> Would 3.2 be a good time to finally include it? All of its macros and
> declarations are named PyStructSequence*, so there shouldn't be a
> name clash concern.

Sure, I see no problem with that.

Regards,
Martin

From greg.ewing at canterbury.ac.nz Mon Nov 29 22:36:51 2010
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Tue, 30 Nov 2010 10:36:51 +1300
Subject: [Python-Dev] constant/enum type in stdlib
In-Reply-To:
References: <20101121034404.52924F20A@mail.python.org> <4CE9BF4A.1020302@netwok.org> <4CEA89E8.5090107@voidspace.org.uk> <20101122163722.7e96d123@pitrou.net> <4CEA9584.7040301@avl.com> <20101122172440.77d27ed5@pitrou.net> <20101122164654.2109.588145158.divmod.xquotient.165@localhost.localdomain> <4CEBC6BD.9060402@voidspace.org.uk> <4CED0557.9090101@voidspace.org.uk> <4CED4E34.5060400@voidspace.org.uk> <4CF1706E.5030503@g.nevcal.com> <1D372F35-B455-4982-997B-2C54A7D56741@gmail.com> <4CF28310.7070304@voidspace.org.uk>
Message-ID: <4CF41CF3.7040001@canterbury.ac.nz>

I don't see how the grouping can be completely separated from the
value-naming. If the named values are to be subclassed from the base
values, then you want all the members of a group to belong to the
*same* subclass. You can't get that by treating each named value on
its own and then trying to group them together afterwards.

--
Greg

From steve at pearwood.info Mon Nov 29 23:09:15 2010
From: steve at pearwood.info (Steven D'Aprano)
Date: Tue, 30 Nov 2010 09:09:15 +1100
Subject: [Python-Dev] Python and the Unicode Character Database
In-Reply-To:
References: <20101128214311.092abd35@pitrou.net> <4CF2D4E9.3060607@v.loewis.de> <4CF2F067.5020705@pearwood.info> <4CF354C6.9020302@v.loewis.de>
Message-ID: <4CF4248B.1060409@pearwood.info>

Alexander Belopolsky wrote:

> Speaking of YAGNI, does anyone want to defend
>
>>>> complex('????.??j')
> 1234.56j

*If* we allow float('????.??') (as we currently do, but is being
disputed by some), then we should allow complex('????.??j'). It would
be silly for complex to be more restrictive than float.

> Especially given that we reject complex('1234.56i'):

I don't understand why you use 'i' when Python uses 'j' as the symbol
for imaginary numbers.
>>> complex('1234.56j') 1234.56j works fine. I have no problem with Python choosing one of i/j as the symbol for imaginary-1 and rejecting the other. I prefer i rather than j, but that's because my background is in maths rather than electrical engineering, but I can live with either. But in any case, please don't conflate the question of whether Python should accept j and/or i for complex numbers with the question of supporting non-arabic numerals. The two issues are unrelated. -- Steven From rrr at ronadam.com Tue Nov 30 00:38:26 2010 From: rrr at ronadam.com (Ron Adam) Date: Mon, 29 Nov 2010 17:38:26 -0600 Subject: [Python-Dev] constant/enum type in stdlib In-Reply-To: <4CF3180B.1060306@ronadam.com> References: <20101121034404.52924F20A@mail.python.org> <4CE9BF4A.1020302@netwok.org> <4CEA89E8.5090107@voidspace.org.uk> <20101122163722.7e96d123@pitrou.net> <4CEA9584.7040301@avl.com> <20101122172440.77d27ed5@pitrou.net> <20101122164654.2109.588145158.divmod.xquotient.165@localhost.localdomain> <4CEBC6BD.9060402@voidspace.org.uk> <4CED0557.9090101@voidspace.org.uk> <4CED4E34.5060400@voidspace.org.uk> <4CF3180B.1060306@ronadam.com> Message-ID: On 11/28/2010 09:03 PM, Ron Adam wrote: > It does associate additional info to names and creates a nice dictionary to > reference. > > > >>> def name_values( FOO: 1, > BAR: "Hello World!", > BAZ: dict(a=1, b=2, c=3) ): > ... return FOO, BAR, BAZ > ... > >>> foo(1,2,3) > (1, 2, 3) > >>> foo.__annotations__ > {'BAR': 'Hello World!', 'FOO': 1, 'BAZ': {'a': 1, 'c': 3, 'b': 2}} sigh... I havn't been very focused lately. That should have been: >>> def named_values(FOO:1, BAR:"Hello World!", BAZ:dict(a=1, b=2, c=3)): ... return FOO, BAR, BAZ ... 
>>> named_values.__annotations__ {'BAR': 'Hello World!', 'FOO': 1, 'BAZ': {'a': 1, 'c': 3, 'b': 2}} >>> named_values(1, 2, 3) (1, 2, 3) Cheers, Ron From ncoghlan at gmail.com Tue Nov 30 03:04:28 2010 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 30 Nov 2010 12:04:28 +1000 Subject: [Python-Dev] PEP 384 final review In-Reply-To: <28693E2E-A60E-4F83-BF55-DBD6EAD88353@fuhm.net> References: <4CF2E86F.5000606@v.loewis.de> <4CF3A736.4050003@netwok.org> <4CF3AC9D.20309@ubuntu.com> <28693E2E-A60E-4F83-BF55-DBD6EAD88353@fuhm.net> Message-ID: On Tue, Nov 30, 2010 at 12:15 AM, James Y Knight wrote: > > On Nov 29, 2010, at 8:58 AM, Nick Coghlan wrote: > > The http read only URLs > didn't work (no diff returned, just "svn: OPTIONS of > 'http://svn.python.org/python/branches/pep-0384': 200 OK > (http://svn.python.org)"), > > That was the wrong url: you should've > used?http://svn.python.org/projects/python/branches/pep-0384 > James Ah, thanks, I always forget that part (since it isn't there in the read/write URLs). The SVN output may qualify as one of the least helpful error messages I have ever seen, though :) Cheers, Nick. -- Nick Coghlan?? |?? ncoghlan at gmail.com?? |?? 
Brisbane, Australia From ncoghlan at gmail.com Tue Nov 30 03:23:04 2010 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 30 Nov 2010 12:23:04 +1000 Subject: [Python-Dev] constant/enum type in stdlib In-Reply-To: <4CF41CF3.7040001@canterbury.ac.nz> References: <20101121034404.52924F20A@mail.python.org> <4CE9BF4A.1020302@netwok.org> <4CEA89E8.5090107@voidspace.org.uk> <20101122163722.7e96d123@pitrou.net> <4CEA9584.7040301@avl.com> <20101122172440.77d27ed5@pitrou.net> <20101122164654.2109.588145158.divmod.xquotient.165@localhost.localdomain> <4CEBC6BD.9060402@voidspace.org.uk> <4CED0557.9090101@voidspace.org.uk> <4CED4E34.5060400@voidspace.org.uk> <4CF1706E.5030503@g.nevcal.com> <1D372F35-B455-4982-997B-2C54A7D56741@gmail.com> <4CF28310.7070304@voidspace.org.uk> <4CF41CF3.7040001@canterbury.ac.nz> Message-ID: On Tue, Nov 30, 2010 at 7:36 AM, Greg Ewing wrote: > I don't see how the grouping can be completely separated > from the value-naming. If the named values are to be > subclassed from the base values, then you want all the > members of a group to belong to the *same* subclass. > You can't get that by treating each named value on its > own and then trying to group them together afterwards. Note that my sample implementation cached the created types, so that (for example) there was only ever one "Named " type (my implementation wasn't quite kosher in that respect, since functools.lru_cache has a non-optional size limit - setting maxsize to float('inf') deals with that). A grouping API would use either single or multiple inheritance to create members that supported both the naming aspects as well as the grouping aspects. Cheers, Nick. -- Nick Coghlan?? |?? ncoghlan at gmail.com?? |?? 
Brisbane, Australia From alexander.belopolsky at gmail.com Tue Nov 30 04:46:33 2010 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Mon, 29 Nov 2010 22:46:33 -0500 Subject: [Python-Dev] Python and the Unicode Character Database In-Reply-To: <4CF4248B.1060409@pearwood.info> References: <20101128214311.092abd35@pitrou.net> <4CF2D4E9.3060607@v.loewis.de> <4CF2F067.5020705@pearwood.info> <4CF354C6.9020302@v.loewis.de> <4CF4248B.1060409@pearwood.info> Message-ID: On Mon, Nov 29, 2010 at 5:09 PM, Steven D'Aprano wrote: .. > But in any case, please don't conflate the question of whether Python should > accept j and/or i for complex numbers with the question of supporting > non-arabic numerals. The two issues are unrelated. The two issues are related because they are both about how strict numerical constructors should be. If we want to accept wide variations in how numbers can be spelled, then surely using i for the imaginary unit is much more common than using ? for the digit 7. I see two problems with supporting non-ascii spellings: 1. Support costs. 2. User confusion. The two are related because when users are confused, they will report invalid bugs when Python does not meet their expectations. For example, why >>> int('???', 10) 123 works, but >>> int('??????', 16) Traceback (most recent call last): .. UnicodeEncodeError: 'decimal' codec can't encode character '\uff21' in position 3: invalid decimal Unicode string does not? And if 'decimal' is a codec, why >>> '123'.encode('decimal') Traceback (most recent call last): ... LookupError: unknown encoding: decimal Before anyone suggests that int(.., 16) should consult the new Hex_Digit property in the UCD, let me remind that int() supports bases from 2 through 36. I thought Python design was primarily driven by practicality. Here the only plausible argument that one can make is that if Unicode says it is a digit, we should treat it as a digit. Purity over practicality. 
In practical terms, UCD comes at a price. The unicodedata module size is over 700K on my machine. This is almost half the size of the python executable and by far the largest extension module. (only CJK encodings come close.) Making builtins depend on the largest extension module for operation does not strike me as sound design. From stephen at xemacs.org Tue Nov 30 05:20:11 2010 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Tue, 30 Nov 2010 13:20:11 +0900 Subject: [Python-Dev] Python and the Unicode Character Database In-Reply-To: <4CF3F82D.2040000@egenix.com> References: <20101128214311.092abd35@pitrou.net> <4CF2D4E9.3060607@v.loewis.de> <4CF2F067.5020705@pearwood.info> <4CF354C6.9020302@v.loewis.de> <4CF3F82D.2040000@egenix.com> Message-ID: <87d3pn5tok.fsf@uwakimon.sk.tsukuba.ac.jp> M.-A. Lemburg writes: > Just because ASCII-proponents may have a hard time reading such > literals, That's not the point. > doesn't mean that script users have the same trouble. The script users may have no trouble reading them, but that doesn't mean it's not a YAGNI. In Japanese, it's a YAGNI except in addresses on New Year cards and in dates, which could be handled by specialized modules, or by a generic module for extracting numeric information from general (as opposed to program) text. Neither of those is likely to appear in program text in context where they would be used as a numeric literal. In fact, Python *does* consider it a YAGNI for Han! Although my apartment number would be written "???" on a New Year card, Python won't parse it as 704: unicodedata considers those digits to be Lo, except for "?" which fails anyway because it's Nl, not Nd. (To add insult to injury, it doesn't even return numeric values for those characters, even though any Han-user would consider them numeric when used in isolation, except that Japanese would be likely to consider "?" to be the non-numeric "maru" symbol, ie, circle, meaning "OK"!) 
The whole concept of numeric in Unicode is a mess; why import that mess into Python? Can you give any examples where people do computation, keep books, or do nuclear physics in non-Arabic numerals? I suppose Arabic users might, but even there I suspect not. From stephen at xemacs.org Tue Nov 30 05:39:21 2010 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Tue, 30 Nov 2010 13:39:21 +0900 Subject: [Python-Dev] Python and the Unicode Character Database In-Reply-To: <4CF4248B.1060409@pearwood.info> References: <20101128214311.092abd35@pitrou.net> <4CF2D4E9.3060607@v.loewis.de> <4CF2F067.5020705@pearwood.info> <4CF354C6.9020302@v.loewis.de> <4CF4248B.1060409@pearwood.info> Message-ID: <87bp575ssm.fsf@uwakimon.sk.tsukuba.ac.jp> Steven D'Aprano writes: > But in any case, please don't conflate the question of whether Python > should accept j and/or i for complex numbers with the question of > supporting non-arabic numerals. The two issues are unrelated. Different, yes, unrelated, no. They're both about whether variant forms of universally used literals should be allowed in a programming language, or whether only the canonical form is allowed. Note that *nobody* is saying that Python should have no facility for parsing these numbers, only that by default literal decimal numerals should be encoded as ASCII digits. For example, I would not object to int() getting a Boolean flag meaning "consult unicodedata for non-ASCII digits", just as it has an optional parameter meaning "decode in base other than 10".[1] OTOH, until somebody says "Yes, in Mecca the bazaar traders keep books on their Lenovos using ISO-8859-6 numerals, and it would be painful for them to switch to what we call 'Arabic' numerals", I'm going to consider it a YAGNI. Just as even though mathematicians clearly prefer "i" as the imaginary unit, there's not enough pain involved in them switching to "j" to make it worth supporting both. 
(BTW, my first reaction to the "j" notation was "cool, Python supports quaternions out of the box!" It took only a second or so to return to reality, but that was my first reaction.) Footnotes: [1] That might not be a good idea on other grounds, but in principle I would be OK with such built-ins accepting non-ASCII digits on request. From merwok at netwok.org Tue Nov 30 07:33:51 2010 From: merwok at netwok.org (=?UTF-8?B?w4lyaWMgQXJhdWpv?=) Date: Tue, 30 Nov 2010 07:33:51 +0100 Subject: [Python-Dev] PEP 291 versus Python 3 Message-ID: <4CF49ACF.6070904@netwok.org> Good morning python-dev, PEP 291 (Backward Compatibility for Standard Library) does not seem to take Python 3 into account. Is this PEP only relevant for the 2.7 branch?* If it?s supposed to apply to 3.x too, despite the view that 3.0 was a clean break, what does it mean to have a module that is developed in the py3k branch and should retain compatibility with 2.3 or 1.5.2? * Tarek?s interpretation: ?The 2.x needs to stay 2.3 compatible so we should keep the 3.x as similar as possible for bugfixes.? In the particular case of distutils (should be compatible with 2.3), we (including I) have been lax. Our tests for example use modern unittest features like skips, which makes them not runnable on old Pythons. I am very uncomfortable with code that seems to run fine but which tests (however few) cannot be run, so I think I?ll have to trade the skips for old-style ?return? statements. The other way of solving that is to change the compat policy. If I remember correctly, the rationale for code compat in distutils is that people may copy distutils from Python x.y to their install of x.y-n; I don?t know if this is still an active practice, and if it is, I don?t know if it should be supported, considering that distutils2 (compatible with 2.4+ and available from PyPI) is coming. 
Regards From regebro at gmail.com Tue Nov 30 09:10:37 2010 From: regebro at gmail.com (Lennart Regebro) Date: Tue, 30 Nov 2010 09:10:37 +0100 Subject: [Python-Dev] Python and the Unicode Character Database In-Reply-To: References: Message-ID: On Sun, Nov 28, 2010 at 21:24, Alexander Belopolsky wrote: > While we have little choice but to follow UCD in defining > str.isidentifier(), I think Python can promise users more stability in > what it treats as space or as a digit in its builtins. Why? I can see this is a problem if one character that earlier was allowed no longer is. That breaks backwards compatibility. This doesn't. >>>> float('????.??') > 1234.56 > > is more important than to assure users that once their program > accepted some text as a number, they can assume that the text is > ASCII. *I* think it is more important. In python 3, you can never ever assume anything is ASCII any more. ASCII is practically dead an buried as far as Python goes, unless you explicitly encode to it. > def deposit(self, amountstr): > self.balance += float(amountstr) > audit_log("Deposited: " + amountstr) > > Auditor: > > $ cat numbered-account.log > Deposited: ?????.?? That log reasonably should be in UTF-8 or something else, in which case this is not a problem. And that's ignoring that it makes way more sense to log the numerical amount. 
-- Lennart Regebro: http://regebro.wordpress.com/ Python 3 Porting: http://python3porting.com/ +33 661 58 14 64 From hagen at zhuliguan.net Tue Nov 30 09:15:54 2010 From: hagen at zhuliguan.net (=?ISO-8859-1?Q?Hagen_F=FCrstenau?=) Date: Tue, 30 Nov 2010 09:15:54 +0100 Subject: [Python-Dev] Python and the Unicode Character Database In-Reply-To: <4CF41785.5020807@v.loewis.de> References: <20101128214311.092abd35@pitrou.net> <4CF2D4E9.3060607@v.loewis.de> <4CF2F067.5020705@pearwood.info> <4CF354C6.9020302@v.loewis.de> <20101129193302.115dbcd5@pitrou.net> <4CF41785.5020807@v.loewis.de> Message-ID: >> During PEP 3003 discussion, it was suggested to handle it on a case by >> case basis, but I don't see discussion of the upgrade to 6.0.0 in PEP >> 3003. > > It's covered by "As the standard library is not directly tied to the > language definition it is not covered by this moratorium." How is this restricted to the stdlib if it defines the set of valid identifiers? - Hagen From stephen at xemacs.org Tue Nov 30 09:23:10 2010 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Tue, 30 Nov 2010 17:23:10 +0900 Subject: [Python-Dev] Python and the Unicode Character Database In-Reply-To: References: Message-ID: <87wrnv43v5.fsf@uwakimon.sk.tsukuba.ac.jp> Lennart Regebro writes: > *I* think it is more important. In python 3, you can never ever assume > anything is ASCII any more. Sure you can. In Python program text, all keywords will be ASCII (English, even, though it may be en_NL.UTF-8 ) for the forseeable future. I see no reason not to make a similar promise for numeric literals. I see no good reason to allow compatibility full-width Japanese "ASCII" numerals or Arabic cursive numerals in "for i in range(...)" for example. As soon as somebody gives an example of a culture, however minor, that uses computers but actively prefers to use non-ASCII numerals to express numbers in an IT context, I'll review my thinking. But at the moment it's 101% YAGNI. 
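
The distinction drawn above between program text and string data can be checked directly. The following is a minimal sketch, assuming a current CPython 3 interpreter; the helper name `compiles` and the sample characters (U+0663 ARABIC-INDIC DIGIT THREE, and the French word "été" as an identifier) are illustrative choices, not taken from the thread.

```python
def compiles(source):
    """Return True if the compiler accepts `source` as a module body."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError:
        return False
    return True


# An ASCII numeric literal is ordinary program text.
ascii_literal = compiles("x = 3")

# U+0663 used as a numeric literal in source code is rejected.
non_ascii_literal = compiles("x = \u0663")

# Non-ASCII letters are accepted in identifiers (PEP 3131).
non_ascii_identifier = compiles("\u00e9t\u00e9 = 3")

# The same digit is accepted when it arrives as string data.
from_string = int("\u0663")

print(ascii_literal, non_ascii_literal, non_ascii_identifier, from_string)
```

So, under these assumptions, the set of valid identifiers depends on the Unicode data, while numeric literals in source stay ASCII-only and only the string-to-number constructors accept other decimal digits.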
From sylvain.thenault at logilab.fr Tue Nov 30 09:34:18 2010 From: sylvain.thenault at logilab.fr (Sylvain =?utf-8?B?VGjDqW5hdWx0?=) Date: Tue, 30 Nov 2010 09:34:18 +0100 Subject: [Python-Dev] python3k : imp.find_module raises SyntaxError In-Reply-To: References: <201011251530.23947.emile.anclin@logilab> <4CEE9B72.1070002@ronadam.com> <20101129115311.GD18888@lupus.logilab.fr> Message-ID: <20101130083418.GB4157@lupus.logilab.fr> On 29 novembre 14:21, Ron Adam wrote: > On 11/29/2010 01:22 PM, Brett Cannon wrote: > >Considering these semantics changed between Python 2 and 3 w/o a > >discernable benefit (I would consider it a negative as finding a > >module should not be impacted by syntactic correctness; the full act > >of importing should be the only thing that cares about that), I would > >consider it a bug that should be filed. > > The output of imp.find_module() returns an open file io object, and > it's output feeds directly into to imp.load_module(). > > >>> imp.find_module('pydoc') > (<_io.TextIOWrapper name=4 encoding='utf-8'>, > '/usr/local/lib/python3.2/pydoc.py', ('.py', 'U', 1)) > > So I think the imp.find_module() is suppose to be used when you *do* > want to do the full act of importing and not for just finding out if > or where module xyz exists. in python 2, find_module was usable for such usage, and this is a needed api for a tool like pylint. Is there another way to do so with python 3? -- Sylvain Th?nault LOGILAB, Paris (France) Formations Python, Debian, M?th. 
Agiles: http://www.logilab.fr/formations D?veloppement logiciel sur mesure: http://www.logilab.fr/services CubicWeb, the semantic web framework: http://www.cubicweb.org From cornsea at gmail.com Tue Nov 30 09:41:19 2010 From: cornsea at gmail.com (haiyang kang) Date: Tue, 30 Nov 2010 16:41:19 +0800 Subject: [Python-Dev] Python and the Unicode Character Database In-Reply-To: <87wrnv43v5.fsf@uwakimon.sk.tsukuba.ac.jp> References: <87wrnv43v5.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: hi, I agree with this. I never seen any man in China using chinese number literals (at least two kinds:?, ?, same meaning with 1) in Python program, except UI output. They can do some mappings when want to output these non-ascii numbers. Example: if 1: print "?" I think it is a little ugly to have code like this: num = float("?.?"), expected result is: num = 1.1 br, khy On Tue, Nov 30, 2010 at 4:23 PM, Stephen J. Turnbull wrote: > Lennart Regebro writes: > > ?> *I* think it is more important. In python 3, you can never ever assume > ?> anything is ASCII any more. > > Sure you can. ?In Python program text, all keywords will be ASCII > (English, even, though it may be en_NL.UTF-8 ) for the forseeable > future. > > I see no reason not to make a similar promise for numeric literals. ?I > see no good reason to allow compatibility full-width Japanese "ASCII" > numerals or Arabic cursive numerals in "for i in range(...)" for > example. > > As soon as somebody gives an example of a culture, however minor, that > uses computers but actively prefers to use non-ASCII numerals to > express numbers in an IT context, I'll review my thinking. ?But at the > moment it's 101% YAGNI. 
> _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/cornsea%40gmail.com > From ziade.tarek at gmail.com Tue Nov 30 10:14:20 2010 From: ziade.tarek at gmail.com (=?ISO-8859-1?Q?Tarek_Ziad=E9?=) Date: Tue, 30 Nov 2010 10:14:20 +0100 Subject: [Python-Dev] PEP 291 versus Python 3 In-Reply-To: <4CF49ACF.6070904@netwok.org> References: <4CF49ACF.6070904@netwok.org> Message-ID: On Tue, Nov 30, 2010 at 7:33 AM, ?ric Araujo wrote: > Good morning python-dev, > > PEP 291 (Backward Compatibility for Standard Library) does not seem to > take Python 3 into account. ?Is this PEP only relevant for the 2.7 > branch?* ?If it?s supposed to apply to 3.x too, despite the view that > 3.0 was a clean break, what does it mean to have a module that is > developed in the py3k branch and should retain compatibility with 2.3 or > 1.5.2? > > * Tarek?s interpretation: ?The 2.x needs to stay 2.3 compatible > ?so we should keep the 3.x as similar as possible for bugfixes.? > > In the particular case of distutils (should be compatible with 2.3), we > (including I) have been lax. ?Our tests for example use modern unittest > features like skips, which makes them not runnable on old Pythons. ?I am > very uncomfortable with code that seems to run fine but which tests > (however few) cannot be run, so I think I?ll have to trade the skips for > old-style ?return? statements. You shouldn't be uncomfortable with the current state of distutils and try to improve its tests (or improve any other nasty stuff you'll find in that code) Distutils is dead code. All we have to do is the bare minimum maintenance. Everything else is a waste of time. >?The other way of solving that is to > change the compat policy. 
?If I remember correctly, the rationale for > code compat in distutils is that people may copy distutils from Python > x.y to their install of x.y-n; I don?t know if this is still an active > practice, and if it is, I don?t know if it should be supported, > considering that distutils2 (compatible with 2.4+ and available from > PyPI) is coming. Again, don't worry about these rules in Distutils now. The only rule that now apply to Distutils is that we do only bug fixing, and we should not waste our precious time to do other stuff in there. Plain python tests are fine for what we want to do and simplify our forward ports and backports. One thing we should do though, is fix those bugs in Distutils2 first when they exist there too. I really appreciate all the hard work your are doing in triaging the issues and bug fixing by the way ! Tarek From emile.anclin at logilab.fr Tue Nov 30 10:39:29 2010 From: emile.anclin at logilab.fr (Emile Anclin) Date: Tue, 30 Nov 2010 10:39:29 +0100 Subject: [Python-Dev] python3k : imp.find_module raises SyntaxError In-Reply-To: References: <201011251530.23947.emile.anclin@logilab> <20101129115311.GD18888@lupus.logilab.fr> Message-ID: <201011301039.30033.emile.anclin@logilab> On Monday 29 November 2010 20:22:22 Brett Cannon wrote: > > Considering these semantics changed between Python 2 and 3 w/o a > discernable benefit (I would consider it a negative as finding a > module should not be impacted by syntactic correctness; the full act > of importing should be the only thing that cares about that), I would > consider it a bug that should be filed. ok, here it is : http://bugs.python.org/issue10588 Since I did not understand all of it, I just quoted Brett Cannon in the ticket. 
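
On the question of locating a module without importing it: later Python 3 releases provide importlib.util.find_spec, which did not exist when this thread was written. The sketch below is an assumption about how a tool could use it, not what pylint or the tracker issue settled on; the module name `broken_example_mod` and the helper `locate` are hypothetical. It writes a deliberately unparsable file and shows that it can still be located, while a real import fails.

```python
import importlib
import importlib.util
import os
import sys
import tempfile


def locate(name):
    """Return the file a top-level module would be loaded from, or None."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


with tempfile.TemporaryDirectory() as tmp:
    # A module whose source cannot be parsed.
    path = os.path.join(tmp, "broken_example_mod.py")
    with open(path, "w") as f:
        f.write("def (:\n")

    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        # Locating the module does not compile or execute it.
        found = locate("broken_example_mod")

        # Importing it is what trips over the syntax error.
        try:
            importlib.import_module("broken_example_mod")
            import_failed = False
        except SyntaxError:
            import_failed = True
    finally:
        sys.path.remove(tmp)

print(found is not None, import_failed)
```

Note that for dotted names find_spec imports the parent packages, so this only avoids execution for top-level modules.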
--
Emile Anclin
http://www.logilab.fr/ http://www.logilab.org/
Informatique scientifique & et gestion de connaissances

From steve at pearwood.info Tue Nov 30 13:59:49 2010
From: steve at pearwood.info (Steven D'Aprano)
Date: Tue, 30 Nov 2010 23:59:49 +1100
Subject: [Python-Dev] Python and the Unicode Character Database
In-Reply-To:
References: <87wrnv43v5.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: <4CF4F545.5030902@pearwood.info>

haiyang kang wrote:
> hi,
>
> I agree with this.
>
> I never seen any man in China using chinese number literals (at
> least two kinds:?, ?, same meaning with 1)
> in Python program, except UI output.
>
> They can do some mappings when want to output these non-ascii numbers.
> Example: if 1: print "?"
>
> I think it is a little ugly to have code like this: num =
> float("?.?"), expected result is: num = 1.1

I don't expect that anyone would sensibly write code like that, except
for testing. You wouldn't write

num = float("1.1")

instead of just

num = 1.1

either. But you should be able to write:

text = input("Enter a number using your preferred digits: ")
num = float(text)

without caring whether the user enters ?.? or 1.1 or something else.

--
Steven

From fuzzyman at voidspace.org.uk Tue Nov 30 14:09:16 2010
From: fuzzyman at voidspace.org.uk (Michael Foord)
Date: Tue, 30 Nov 2010 13:09:16 +0000
Subject: [Python-Dev] PEP 291 versus Python 3
In-Reply-To: <4CF49ACF.6070904@netwok.org>
References: <4CF49ACF.6070904@netwok.org>
Message-ID: <4CF4F77C.4000308@voidspace.org.uk>

On 30/11/2010 06:33, Éric Araujo wrote:
> Good morning python-dev,
>
> PEP 291 (Backward Compatibility for Standard Library) does not seem to
> take Python 3 into account. Is this PEP only relevant for the 2.7
> branch?* If it's supposed to apply to 3.x too, despite the view that
> 3.0 was a clean break, what does it mean to have a module that is
> developed in the py3k branch and should retain compatibility with 2.3 or
> 1.5.2?
PEP 291 is very old and should probably be retired. I don't think anyone is maintaining standard libraries in py3k that are also compatible with Python 2.anything. (At least not in a single codebase.) For Python 2.7 that may not be true, but for Python 3 I think we can start with a clean slate on compatibility. > * Tarek?s interpretation: ?The 2.x needs to stay 2.3 compatible > so we should keep the 3.x as similar as possible for bugfixes.? > > In the particular case of distutils (should be compatible with 2.3), we > (including I) have been lax. Our tests for example use modern unittest > features like skips, which makes them not runnable on old Pythons. They can be run on old Pythons with unittest2. This is what distutils2 is doing. > I am > very uncomfortable with code that seems to run fine but which tests > (however few) cannot be run, so I think I?ll have to trade the skips for > old-style ?return? statements. The other way of solving that is to > change the compat policy. This is only an issue for distutils in Python 2.7 right? Maintaining the compat policy for that will be a short-lived pain, and distutils itself is getting only infrequent bugfixes *anyway*, right? I defer to Tarek on that particular decision. All the best, Michael > If I remember correctly, the rationale for > code compat in distutils is that people may copy distutils from Python > x.y to their install of x.y-n; I don?t know if this is still an active > practice, and if it is, I don?t know if it should be supported, > considering that distutils2 (compatible with 2.4+ and available from > PyPI) is coming. > > Regards > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/fuzzyman%40voidspace.org.uk -- http://www.voidspace.org.uk/ READ CAREFULLY. 
By accepting and reading this email you agree, on behalf of your employer, to release me from all obligations and waivers arising from any and all NON-NEGOTIATED agreements, licenses, terms-of-service, shrinkwrap, clickwrap, browsewrap, confidentiality, non-disclosure, non-compete and acceptable use policies (?BOGUS AGREEMENTS?) that I have entered into with your employer, its partners, licensors, agents and assigns, in perpetuity, without prejudice to my ongoing rights and privileges. You further represent that you have the authority to release me from any BOGUS AGREEMENTS on behalf of your employer. From steve at pearwood.info Tue Nov 30 14:23:22 2010 From: steve at pearwood.info (Steven D'Aprano) Date: Wed, 01 Dec 2010 00:23:22 +1100 Subject: [Python-Dev] Python and the Unicode Character Database In-Reply-To: <87wrnv43v5.fsf@uwakimon.sk.tsukuba.ac.jp> References: <87wrnv43v5.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <4CF4FACA.8040900@pearwood.info> Stephen J. Turnbull wrote: > Lennart Regebro writes: > > > *I* think it is more important. In python 3, you can never ever assume > > anything is ASCII any more. > > Sure you can. In Python program text, all keywords will be ASCII > (English, even, though it may be en_NL.UTF-8 ) for the forseeable > future. > > I see no reason not to make a similar promise for numeric literals. I > see no good reason to allow compatibility full-width Japanese "ASCII" > numerals or Arabic cursive numerals in "for i in range(...)" for > example. I agree with you that numeric *literals* should be restricted to the ASCII digits. I don't think anyone here is arguing differently -- if they are, they should speak up and try to make the case for allowing numeric literals in arbitrary scripts. Python doesn't currently allow non-ASCII numeric literals, and even if such a change were desirable, it would run up against the moratorium. So let's just forget the specter of code like: x = math.sqrt(????.?? ** ?.?) 
It ain't gonna happen :)

But I think there is a good case for allowing the constructors int,
float and complex to continue to accept numeric *strings* with non-ASCII
digits. The code already exists, there's probably people out there who
rely on it, and in the absence of any convincing demonstration that the
existing behaviour is causing widespread difficulty, we should leave
well-enough alone.

Various people have suggested that there should be a function in the
locale module that handles numeric string input in non-ASCII digits.
This is a de facto admission that there are use-cases for taking user
input like the string '?' and turning it into the int 3. Python can
already do this, and has been able to for many years:

[steve at sylar ~]$ python2.4
Python 2.4.6 (#1, Mar 30 2009, 10:08:01)
[GCC 4.1.2 20070925 (Red Hat 4.1.2-27)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> int(u'?')
3

It seems to me that there's no need to move this functionality into locale.

--
Steven

From solipsis at pitrou.net Tue Nov 30 14:32:54 2010
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Tue, 30 Nov 2010 14:32:54 +0100
Subject: [Python-Dev] Python and the Unicode Character Database
References: <87wrnv43v5.fsf@uwakimon.sk.tsukuba.ac.jp> <4CF4FACA.8040900@pearwood.info>
Message-ID: <20101130143254.1964e4a8@pitrou.net>

On Wed, 01 Dec 2010 00:23:22 +1100
Steven D'Aprano wrote:
>
> But I think there is a good case for allowing the constructors int,
> float and complex to continue to accept numeric *strings* with non-ASCII
> digits. The code already exists, there's probably people out there who
> rely on it, and in the absence of any convincing demonstration that the
> existing behaviour is causing widespread difficulty, we should leave
> well-enough alone.

+1

> It seems to me that there's no need to move this functionality into locale.

Not only, but moving it into locale won't make it easier to maintain
anyway.

Regards

Antoine.
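
The constructor behaviour being defended here can be sketched as follows, assuming a current CPython 3 interpreter. The digits in the archived messages were lost to encoding damage, so the Arabic-Indic digits below (written as escapes) are an illustrative choice; `parse_ascii_int` is a hypothetical helper showing what an ASCII-only policy could look like for callers who want one, not an API proposed in the thread.

```python
# Arabic-Indic digits U+0661..U+0666, written as escapes so the example
# survives any re-encoding of this text. (Illustrative choice.)
ARABIC_INDIC_123 = "\u0661\u0662\u0663"
ARABIC_INDIC_1234_56 = "\u0661\u0662\u0663\u0664.\u0665\u0666"

# The built-in constructors accept Unicode decimal digits (category Nd);
# the decimal point itself is still the ASCII full stop.
as_int = int(ARABIC_INDIC_123)
as_float = float(ARABIC_INDIC_1234_56)


def parse_ascii_int(text):
    """Hypothetical strict parser: refuse any non-ASCII input."""
    try:
        text.encode("ascii")
    except UnicodeEncodeError:
        raise ValueError("non-ASCII input rejected: %r" % (text,))
    return int(text)


print(as_int, as_float)
print(parse_ascii_int("123"))
```

A caller who needs the guarantee that accepted text is ASCII can apply a check like this at the boundary, which leaves the permissive built-ins untouched.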
From solipsis at pitrou.net Tue Nov 30 14:38:22 2010
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Tue, 30 Nov 2010 14:38:22 +0100
Subject: [Python-Dev] Module size
References: <20101128214311.092abd35@pitrou.net> <4CF2D4E9.3060607@v.loewis.de> <4CF2F067.5020705@pearwood.info> <4CF354C6.9020302@v.loewis.de> <4CF4248B.1060409@pearwood.info>
Message-ID: <20101130143822.40a827de@pitrou.net>

On Mon, 29 Nov 2010 22:46:33 -0500
Alexander Belopolsky wrote:
>
> In practical terms, UCD comes at a price. The unicodedata module size
> is over 700K on my machine. This is almost half the size of the
> python executable and by far the largest extension module. (only CJK
> encodings come close.) Making builtins depend on the largest
> extension module for operation does not strike me as sound design.

Well, do they depend on it? _PyUnicode_EncodeDecimal seems to depend
only on Objects/unicodectype.c.

$ size Objects/unicode*.o
   text    data     bss     dec     hex filename
  60398       0       0   60398    ebee Objects/unicodectype.o
 130440   13559    2208  146207   23b1f Objects/unicodeobject.o

Antoine.

From alexander.belopolsky at gmail.com Tue Nov 30 15:18:13 2010
From: alexander.belopolsky at gmail.com (Alexander Belopolsky)
Date: Tue, 30 Nov 2010 09:18:13 -0500
Subject: [Python-Dev] Python and the Unicode Character Database
In-Reply-To: <4CF4F545.5030902@pearwood.info>
References: <87wrnv43v5.fsf@uwakimon.sk.tsukuba.ac.jp> <4CF4F545.5030902@pearwood.info>
Message-ID:

On Tue, Nov 30, 2010 at 7:59 AM, Steven D'Aprano wrote:
..
> But you should be able to write:
>
> text = input("Enter a number using your preferred digits: ")
> num = float(text)
>
> without caring whether the user enters ?.? or 1.1 or something else.
>

I find it ironic that people who argue for preservation of the current
behavior do it without checking what it actually is:

>>> float('?.?')
..
UnicodeEncodeError: 'decimal' codec can't encode character '\u4e00' ..

This is one of the biggest problems with this feature.
It does not fit user's expectations. Even the original author of the decimal "codec" expected the above to work. [1] > Python can already do this, and has been able to for many years: > >>> int(u'?') > 3 but you can do this without support from int() as well: >>> import unicodedata >>> unicodedata.digit('?') 3 and for Unihan numbers, you can do >>> unicodedata.numeric('?') 1.0 and >>> unicodedata.numeric('?') 8.0 and if you are so inclined, >>> [unicodedata.numeric(c) for c in "? ? ? ? ?".split()] [10000.0, 5000.0, 0.6, 0.875, 90000.0] Do you want to see all these supported by float()? [1] "makeunicodedata.py does not support Unihan digit data" http://bugs.python.org/issue10575 From alexander.belopolsky at gmail.com Tue Nov 30 15:32:38 2010 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Tue, 30 Nov 2010 09:32:38 -0500 Subject: [Python-Dev] Module size In-Reply-To: <20101130143822.40a827de@pitrou.net> References: <20101128214311.092abd35@pitrou.net> <4CF2D4E9.3060607@v.loewis.de> <4CF2F067.5020705@pearwood.info> <4CF354C6.9020302@v.loewis.de> <4CF4248B.1060409@pearwood.info> <20101130143822.40a827de@pitrou.net> Message-ID: On Tue, Nov 30, 2010 at 8:38 AM, Antoine Pitrou wrote: > On Mon, 29 Nov 2010 22:46:33 -0500 > Alexander Belopolsky wrote: >> >> In practical terms, UCD comes at a price. ?The unicodedata module size >> is over 700K on my machine. ?This is almost half the size of the >> python executable and by far the largest extension module. (only CJK >> encodings come close.) ?Making builtins depend on the largest >> extension module for operation does not strike me as sound design. > > Well, do they depend on it? _PyUnicode_EncodeDecimal seems to depend > only on Objects/unicodectype.c. My mistake. That was a late night post. I wonder why unicodedata.so is so big then. 
It must be character names:

$ python -v
>>> '\N{DIGIT ONE}'
dlopen("/.../unicodedata.so", 2);
import unicodedata # dynamically loaded from /.../unicodedata.so
'1'

From solipsis at pitrou.net Tue Nov 30 15:41:48 2010
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Tue, 30 Nov 2010 15:41:48 +0100
Subject: [Python-Dev] Module size
In-Reply-To:
References: <20101128214311.092abd35@pitrou.net> <4CF2D4E9.3060607@v.loewis.de> <4CF2F067.5020705@pearwood.info> <4CF354C6.9020302@v.loewis.de> <4CF4248B.1060409@pearwood.info> <20101130143822.40a827de@pitrou.net>
Message-ID: <1291128108.3538.10.camel@localhost.localdomain>

On Tuesday 30 November 2010 at 09:32 -0500, Alexander Belopolsky wrote:
> On Tue, Nov 30, 2010 at 8:38 AM, Antoine Pitrou wrote:
> > On Mon, 29 Nov 2010 22:46:33 -0500
> > Alexander Belopolsky wrote:
> >>
> >> In practical terms, UCD comes at a price. The unicodedata module size
> >> is over 700K on my machine. This is almost half the size of the
> >> python executable and by far the largest extension module. (only CJK
> >> encodings come close.) Making builtins depend on the largest
> >> extension module for operation does not strike me as sound design.
> >
> > Well, do they depend on it? _PyUnicode_EncodeDecimal seems to depend
> > only on Objects/unicodectype.c.
>
> My mistake. That was a late night post. I wonder why unicodedata.so
> is so big then.
>
> It must be character names:
>
> $ python -v
> >>> '\N{DIGIT ONE}'
> dlopen("/.../unicodedata.so", 2);
> import unicodedata # dynamically loaded from /.../unicodedata.so
> '1'

From a quick peek using hexdump, character names seem to only account
for 1/4 of the module size.

That said, I don't think the size is very important. For any non-trivial
Python application, the size of unicodedata will be negligible compared
to the size of Python objects.

Regards

Antoine.
From tlesher at gmail.com Tue Nov 30 15:48:32 2010 From: tlesher at gmail.com (Tim Lesher) Date: Tue, 30 Nov 2010 09:48:32 -0500 Subject: [Python-Dev] Module size In-Reply-To: <1291128108.3538.10.camel@localhost.localdomain> References: <20101128214311.092abd35@pitrou.net> <4CF2D4E9.3060607@v.loewis.de> <4CF2F067.5020705@pearwood.info> <4CF354C6.9020302@v.loewis.de> <4CF4248B.1060409@pearwood.info> <20101130143822.40a827de@pitrou.net> <1291128108.3538.10.camel@localhost.localdomain> Message-ID: On Tue, Nov 30, 2010 at 09:41, Antoine Pitrou wrote: > That said, I don't think the size is very important. For any non-trivial > Python application, the size of unicodedata will be negligible compared > to the size of Python objects. That depends very much on the platform and the application. For our embedded use of Python, static data size (like the text segment of a shared object) is far dearer than the heap space used by Python objects, which is why we've had to excise both the UCD and the CJK codecs in our builds. -- Tim Lesher From cornsea at gmail.com Tue Nov 30 15:56:33 2010 From: cornsea at gmail.com (haiyang kang) Date: Tue, 30 Nov 2010 22:56:33 +0800 Subject: [Python-Dev] Python and the Unicode Character Database In-Reply-To: <4CF4F545.5030902@pearwood.info> References: <87wrnv43v5.fsf@uwakimon.sk.tsukuba.ac.jp> <4CF4F545.5030902@pearwood.info> Message-ID: > But you should be able to write: > > text = input("Enter a number using your preferred digits: ") > num = float(text) > > without caring whether the user enters ?.? or 1.1 or something else. yes. from logical point of view, this can happen. But i really doubt that if really there are users who would like to input number like that, means that they first use google pinyin method to input ?, then change to english input method to input . , then change to google pinyin again for the other ?; or maybe you mean they input the whole ?.? words with google pinyin input method. 
To input 1, users only need to type one time keyboard, but to input ?, they need to type three times (yi SPACE). Of course, users can also input something accidentally, but we just need to give them some kind reminders. At least coders in my around will restrain their system users to input numbers with ASCII, and seems that users are still happy with the ASCII type numbers :). br, khy From alexander.belopolsky at gmail.com Tue Nov 30 16:05:42 2010 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Tue, 30 Nov 2010 10:05:42 -0500 Subject: [Python-Dev] Python and the Unicode Character Database In-Reply-To: <4CF41785.5020807@v.loewis.de> References: <20101128214311.092abd35@pitrou.net> <4CF2D4E9.3060607@v.loewis.de> <4CF2F067.5020705@pearwood.info> <4CF354C6.9020302@v.loewis.de> <20101129193302.115dbcd5@pitrou.net> <4CF41785.5020807@v.loewis.de> Message-ID: On Mon, Nov 29, 2010 at 4:13 PM, "Martin v. L?wis" wrote: >> - Should Python documentation refer to the specific version of Unicode >> that it supports? > > You mean, mention it somewhere? Sure (although it would be nice if the > documentation generator would automatically extract it from the source, > just as it extracts the Python version number). > > Of course, such mentioning should explain that this is specific to > CPython, and not an aspect of Python-the-language. > >> Current documentation refers to old versions. ?Should version be >> updated or removed to imply the latest? > > What specific reference are you referring to? > I found two places: A reference to Unicode 3.0 (!) in the Data Model section and a reference to 5.2.0 in unicodedata docs. See http://mail.python.org/pipermail/docs/2010-November/002074.html >> - How UCD updates should be handled during the language moratorium? > > It's clearly not affected. 
> This is not what Guido said last year: """ > One question: > > There are currently number of patch waiting on the tracker for > additional Unicode feature support and it's also likely that we'll > want to upgrade to a more recent Unicode version within the > next few years. > > How would such indirect changes be seen under the moratorium ? That would fall under the Case-by-Case Exemptions section. "Within the next few years" sounds like it might well wait until the moratorium is ended though. :-) """ http://mail.python.org/pipermail/python-dev/2009-November/093666.html I don't see it as a big deal, but technically speaking, with Unicode 6.0 changing properties of two characters to become identifiers Python language definition is affected. For example, an alternative implementation based on 5.2.0 will not accept a valid CPython program that uses one of these characters. >> During PEP 3003 discussion, it was suggested to handle it on a case by >> case basis, but I don't see discussion of the upgrade to 6.0.0 in PEP >> 3003. > > It's covered by "As the standard library is not directly tied to the > language definition it is not covered by this moratorium." > See above. Also, it has been suggested that semantics of built-ins cannot change. (If that was so, it would put int('????') debate to rest at least for the time being.:-) >> ?Should this upgrade be backported to 2.7? > > No, it's a new feature. > Given that 2.7 will be maintained for 5 years and arguably Unicode Consortium takes backward compatibility very seriously, wouldn't it make sense to consider a backport at some point? I am sure we will soon see a bug report that the following does not work in 2.7: :-) >>> ord('\N{CAT FACE WITH WRY SMILE}') 128572 >> - How specific should library reference manual be in defining methods >> affected by UCD such as str.upper()? 
> > It should specify what this actually does in Unicode terminology > (probably in addition to a layman's rephrase of that) > I opened an issue for this: http://bugs.python.org/issue10587 >> .. For example, if '\UXXXXXXXX'.isalpha() returns true >> in one implementation, can it return false in another? > > Implementations are free to use any version of the UCD. I was more concerned about wide an narrow unicode CPython builds. Is it a bug that '\UXXXXXXXX'.isalpha() may disagree even when the two implementations are based on the same version of UCD? Thanks for your answers. From alexander.belopolsky at gmail.com Tue Nov 30 16:11:24 2010 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Tue, 30 Nov 2010 10:11:24 -0500 Subject: [Python-Dev] Python and the Unicode Character Database In-Reply-To: References: <87wrnv43v5.fsf@uwakimon.sk.tsukuba.ac.jp> <4CF4F545.5030902@pearwood.info> Message-ID: On Tue, Nov 30, 2010 at 9:56 AM, haiyang kang wrote: >> But you should be able to write: >> >> text = input("Enter a number using your preferred digits: ") >> num = float(text) >> >> without caring whether the user enters ?.? or 1.1 or something else. > > yes. from logical point of view, this can happen. ... Please stop discussing a non-feature. Python's float *does not* accept ' ?.?'. This was reported as a bug and closed as invalid. See "makeunicodedata.py does not support Unihan digit data" http://bugs.python.org/issue10575 From barry at python.org Tue Nov 30 16:35:31 2010 From: barry at python.org (Barry Warsaw) Date: Tue, 30 Nov 2010 10:35:31 -0500 Subject: [Python-Dev] PEP 291 versus Python 3 In-Reply-To: <4CF4F77C.4000308@voidspace.org.uk> References: <4CF49ACF.6070904@netwok.org> <4CF4F77C.4000308@voidspace.org.uk> Message-ID: <20101130103531.54d79465@mission> On Nov 30, 2010, at 01:09 PM, Michael Foord wrote: >PEP 291 is very old and should probably be retired. 
I don't think anyone is >maintaining standard libraries in py3k that are also compatible with Python >2.anything. (At least not in a single codebase.) I agree. I think we should change the status of PEP 291 to Final, and add a few words to make it clear it applies only to Python 2. Since Neal owns the PEP, he should get first crack at doing the update, but I volunteer to make those changes if he declines (or does not respond). We may eventually need a similar document for Python 3, but it should be a new PEP. -Barry -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 836 bytes Desc: not available URL: From stefan-usenet at bytereef.org Tue Nov 30 16:55:19 2010 From: stefan-usenet at bytereef.org (Stefan Krah) Date: Tue, 30 Nov 2010 16:55:19 +0100 Subject: [Python-Dev] Python and the Unicode Character Database In-Reply-To: References: <87wrnv43v5.fsf@uwakimon.sk.tsukuba.ac.jp> <4CF4F545.5030902@pearwood.info> Message-ID: <20101130155519.GA23354@yoda.bytereef.org> Alexander Belopolsky wrote: > On Tue, Nov 30, 2010 at 9:56 AM, haiyang kang wrote: > >> But you should be able to write: > >> > >> text = input("Enter a number using your preferred digits: ") > >> num = float(text) > >> > >> without caring whether the user enters ?.? or 1.1 or something else. > > > > yes. from logical point of view, this can happen. ... > > Please stop discussing a non-feature. Python's float *does not* > accept ' ?.?'. This was reported as a bug and closed as invalid. That seems irrelevant to me. One of the main topics of this thread is whether actual native speakers would be happy with ascii-only input for float(). haiyang kang confirmed that this is the case. I hope that more local speakers will contribute their views. 
Stefan Krah From alexander.belopolsky at gmail.com Tue Nov 30 17:40:19 2010 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Tue, 30 Nov 2010 11:40:19 -0500 Subject: [Python-Dev] Python and the Unicode Character Database In-Reply-To: References: <20101128214311.092abd35@pitrou.net> <4CF2D4E9.3060607@v.loewis.de> <4CF2F067.5020705@pearwood.info> <4CF354C6.9020302@v.loewis.de> <20101129193302.115dbcd5@pitrou.net> Message-ID: On Mon, Nov 29, 2010 at 2:38 PM, Alexander Belopolsky wrote: .. >> Still, if it's not detrimental and it it's not difficult to support, >> then why do you care? > > It is difficult to support. ?A fix for issue10557 would be much > simpler if we did not support non-European digits. ?I now added a > patch that handles non-ascii digits, so you can see what's involved. > Note that when Unicode Consortium inevitably adds more Nd characters > to the non-BMP planes, we will have to add surrogate pairs' support to > this code. > It turns out that this did in fact happen: # Newly assigned in Unicode 3.1.0 (March, 2001) .. 1D7CE..1D7FF ; 3.1 # [50] MATHEMATICAL BOLD DIGIT ZERO..MATHEMATICAL MONOSPACE DIGIT NINE See http://unicode.org/Public/UNIDATA/DerivedAge.txt And of course, >>> unicodedata.digit('\U0001D7CE') 0 but >>> int('\U0001D7CE') .. UnicodeEncodeError: 'decimal' codec can't encode character '\ud835' .. on a narrow Unicode build. (Note the character reported in the error message!) 
If you think non-ASCII digits are not difficult to support, please contribute to the following tracker issues: http://bugs.python.org/issue10581 (Review and document string format accepted in numeric data type constructors) http://bugs.python.org/issue10557 (Malformed error message from float()) http://bugs.python.org/issue10435 (Document unicode C-API in reST - Specifically, PyUnicode_EncodeDecimal) http://bugs.python.org/issue8646 (PyUnicode_EncodeDecimal is undocumented) http://bugs.python.org/issue6632 (Include more fullwidth chars in the decimal codec) and back to the issue of user confusion http://bugs.python.org/issue652104 [closed/invalid] (int(u"\u1234") raises UnicodeEncodeError by Guido van Rossum) From fuzzyman at voidspace.org.uk Tue Nov 30 18:40:52 2010 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Tue, 30 Nov 2010 17:40:52 +0000 Subject: [Python-Dev] Python and the Unicode Character Database In-Reply-To: References: <20101128214311.092abd35@pitrou.net> <4CF2D4E9.3060607@v.loewis.de> <4CF2F067.5020705@pearwood.info> <4CF354C6.9020302@v.loewis.de> <20101129193302.115dbcd5@pitrou.net> Message-ID: <4CF53724.8090000@voidspace.org.uk> On 30/11/2010 16:40, Alexander Belopolsky wrote: > [snip...] > And of course, > >>>> unicodedata.digit('\U0001D7CE') > 0 > > but > >>>> int('\U0001D7CE') > .. > UnicodeEncodeError: 'decimal' codec can't encode character '\ud835' .. > > on a narrow Unicode build. (Note the character reported in the error message!) > > > If you think non-ASCII digits are not difficult to support, please > contribute to the following tracker issues: > Would moving this functionality to the locale module make the issues any easier to fix? 
Michael

> http://bugs.python.org/issue10581
> (Review and document string format accepted in numeric data type constructors)
>
> http://bugs.python.org/issue10557
> (Malformed error message from float())
>
> http://bugs.python.org/issue10435
> (Document unicode C-API in reST - Specifically, PyUnicode_EncodeDecimal)
>
> http://bugs.python.org/issue8646
> (PyUnicode_EncodeDecimal is undocumented)
>
> http://bugs.python.org/issue6632
> (Include more fullwidth chars in the decimal codec)
>
> and back to the issue of user confusion
>
> http://bugs.python.org/issue652104 [closed/invalid]
> (int(u"\u1234") raises UnicodeEncodeError by Guido van Rossum)

--
http://www.voidspace.org.uk/

READ CAREFULLY. By accepting and reading this email you agree, on behalf
of your employer, to release me from all obligations and waivers arising
from any and all NON-NEGOTIATED agreements, licenses, terms-of-service,
shrinkwrap, clickwrap, browsewrap, confidentiality, non-disclosure,
non-compete and acceptable use policies ("BOGUS AGREEMENTS") that I have
entered into with your employer, its partners, licensors, agents and
assigns, in perpetuity, without prejudice to my ongoing rights and
privileges. You further represent that you have the authority to release
me from any BOGUS AGREEMENTS on behalf of your employer.
From alexander.belopolsky at gmail.com Tue Nov 30 19:21:30 2010
From: alexander.belopolsky at gmail.com (Alexander Belopolsky)
Date: Tue, 30 Nov 2010 13:21:30 -0500
Subject: [Python-Dev] Python and the Unicode Character Database
In-Reply-To: <4CF53724.8090000@voidspace.org.uk>
References: <20101128214311.092abd35@pitrou.net> <4CF2D4E9.3060607@v.loewis.de> <4CF2F067.5020705@pearwood.info> <4CF354C6.9020302@v.loewis.de> <20101129193302.115dbcd5@pitrou.net> <4CF53724.8090000@voidspace.org.uk>
Message-ID:

On Tue, Nov 30, 2010 at 12:40 PM, Michael Foord wrote:
..
>> If you think non-ASCII digits are not difficult to support, please
>> contribute to the following tracker issues:
>>
>
> Would moving this functionality to the locale module make the issues any
> easier to fix?
>

Sure, if we code it in Python, supporting it will be much easier:

import re
import unicodedata

def normalize_digits(s):
    # Collect the distinct decimal digits that occur in s.
    digits = {m.group(1) for m in re.finditer(r'(\d)', s)}
    # Map each digit's code point to its ASCII equivalent.
    trtab = {ord(d): str(unicodedata.digit(d)) for d in digits}
    return s.translate(trtab)

>>> normalize_digits('????.??')
'1234.56'

I am not sure this belongs to the locale module, however. It seems to
me, something like 'unicodealgo' for unicode algorithms would be more
appropriate.
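[Editorial illustration, not part of the original messages.] The
normalize_digits idea above can be tried on a current Python 3. The
sketch below keeps the same approach (find decimal digits with a regular
expression, then translate them to ASCII). The input digits are written
as escapes because the archive lost the originals; Arabic-Indic and
Devanagari digits are assumptions chosen only for illustration.

```python
import re
import unicodedata


def normalize_digits(s):
    """Return s with every Unicode decimal digit replaced by ASCII 0-9."""
    # In a str pattern, \d matches any character in category Nd.
    digits = {m.group(0) for m in re.finditer(r"\d", s)}
    # str.translate takes a mapping from code point to replacement string.
    table = {ord(d): str(unicodedata.digit(d)) for d in digits}
    return s.translate(table)


arabic_indic = "\u0661\u0662\u0663\u0664.\u0665\u0666"  # illustrative input
devanagari = "\u0967\u0968"                             # illustrative input

print(normalize_digits(arabic_indic))
print(normalize_digits(devanagari))
print(normalize_digits("no digits here"))
```

Because the function returns a plain ASCII string, the result can be
passed to int() or float() without those constructors needing any
knowledge of non-ASCII digits.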
From solipsis at pitrou.net Tue Nov 30 19:29:52 2010
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Tue, 30 Nov 2010 19:29:52 +0100
Subject: [Python-Dev] Python and the Unicode Character Database
In-Reply-To:
References: <20101128214311.092abd35@pitrou.net> <4CF2D4E9.3060607@v.loewis.de> <4CF2F067.5020705@pearwood.info> <4CF354C6.9020302@v.loewis.de> <20101129193302.115dbcd5@pitrou.net> <4CF53724.8090000@voidspace.org.uk>
Message-ID: <1291141792.8628.0.camel@localhost.localdomain>

> Sure, if we code it in Python, supporting it will be much easier:
>
> def normalize_digits(s):
>     digits = {m.group(1) for m in re.finditer('(\d)', s)}
>     trtab = {ord(d): str(unicodedata.digit(d)) for d in digits}
>     return s.translate(trtab)
>
> >>> normalize_digits('????.??')
> '1234.56'
>
> I am not sure this belongs to the locale module, however. It seems to
> me, something like 'unicodealgo' for unicode algorithms would be more
> appropriate.

It could simply be in unicodedata if you split the implementation into a
core C part and some Python bits.

Regards

Antoine.

From alexander.belopolsky at gmail.com Tue Nov 30 19:59:29 2010
From: alexander.belopolsky at gmail.com (Alexander Belopolsky)
Date: Tue, 30 Nov 2010 13:59:29 -0500
Subject: [Python-Dev] Python and the Unicode Character Database
In-Reply-To: <1291141792.8628.0.camel@localhost.localdomain>
References: <20101128214311.092abd35@pitrou.net> <4CF2D4E9.3060607@v.loewis.de> <4CF2F067.5020705@pearwood.info> <4CF354C6.9020302@v.loewis.de> <20101129193302.115dbcd5@pitrou.net> <4CF53724.8090000@voidspace.org.uk> <1291141792.8628.0.camel@localhost.localdomain>
Message-ID:

On Tue, Nov 30, 2010 at 1:29 PM, Antoine Pitrou wrote:
..
>> I am not sure this belongs to the locale module, however. It seems to
>> me, something like 'unicodealgo' for unicode algorithms would be more
>> appropriate.
>
> It could simply be in unicodedata if you split the implementation into a
> core C part and some Python bits.
> Splitting unicodedata may not be a bad idea. There are many more pieces in UCD than covered by unicodedata. [1] Hardcoding them all into unicodedata module is hard to justify, but some are quite useful. For example, PropertyValueAliases.txt is quite useful for those like myself who cannot remember what Pd or Zl category names stand for. SpecialCasing.txt is required for proper casing, but is not currently included in Python. I would not want to change str.upper or str.title because of this, but providing the raw info to someone who wants to implement proper case mappings may not be a bad idea. Blocks.txt is certainly useful for any language-dependent processing. On the other hand, I think we should keep Unicode data and Unicode algorithms separate. And the latter may not even belong to the Python stdlib. [1] http://unicode.org/Public/UNIDATA/ From martin at v.loewis.de Tue Nov 30 20:13:01 2010 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Tue, 30 Nov 2010 20:13:01 +0100 Subject: [Python-Dev] Python and the Unicode Character Database In-Reply-To: References: <20101128214311.092abd35@pitrou.net> <4CF2D4E9.3060607@v.loewis.de> <4CF2F067.5020705@pearwood.info> <4CF354C6.9020302@v.loewis.de> <20101129193302.115dbcd5@pitrou.net> <4CF41785.5020807@v.loewis.de> Message-ID: <4CF54CBD.9030703@v.loewis.de> Am 30.11.2010 09:15, schrieb Hagen F?rstenau: >>> During PEP 3003 discussion, it was suggested to handle it on a case by >>> case basis, but I don't see discussion of the upgrade to 6.0.0 in PEP >>> 3003. >> >> It's covered by "As the standard library is not directly tied to the >> language definition it is not covered by this moratorium." > > How is this restricted to the stdlib if it defines the set of valid > identifiers? The language does not change. The language specification says Python 3.0 introduces additional characters from outside the ASCII range (see PEP 3131). 
For these characters, the classification uses the version of the Unicode
Character Database as included in the unicodedata module.

That remains unchanged. It was a deliberate design decision of PEP 3131
to not codify a fixed set of characters that can be used in identifiers.

Regards,
Martin

From martin at v.loewis.de Tue Nov 30 20:16:49 2010
From: martin at v.loewis.de (=?windows-1252?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Tue, 30 Nov 2010 20:16:49 +0100
Subject: [Python-Dev] Python and the Unicode Character Database
In-Reply-To: <4CF53724.8090000@voidspace.org.uk>
References: <20101128214311.092abd35@pitrou.net> <4CF2D4E9.3060607@v.loewis.de> <4CF2F067.5020705@pearwood.info> <4CF354C6.9020302@v.loewis.de> <20101129193302.115dbcd5@pitrou.net> <4CF53724.8090000@voidspace.org.uk>
Message-ID: <4CF54DA1.5080900@v.loewis.de>

> Would moving this functionality to the locale module make the issues any
> easier to fix?

You could delegate it to the C library, so: yes.

Regards,
Martin

From solipsis at pitrou.net Tue Nov 30 20:23:13 2010
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Tue, 30 Nov 2010 20:23:13 +0100
Subject: [Python-Dev] Python and the Unicode Character Database
In-Reply-To: <4CF54DA1.5080900@v.loewis.de>
References: <20101128214311.092abd35@pitrou.net> <4CF2D4E9.3060607@v.loewis.de> <4CF2F067.5020705@pearwood.info> <4CF354C6.9020302@v.loewis.de> <20101129193302.115dbcd5@pitrou.net> <4CF53724.8090000@voidspace.org.uk> <4CF54DA1.5080900@v.loewis.de>
Message-ID: <1291144993.8628.1.camel@localhost.localdomain>

On Tuesday 30 November 2010 at 20:16 +0100, "Martin v. Löwis" wrote:
> > Would moving this functionality to the locale module make the issues any
> > easier to fix?
>
> You could delegate it to the C library, so: yes.

I hope you don't suggest delegating it to the C locale functions.
Do you?
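[Editorial illustration, not part of the original messages.] For readers
unfamiliar with what number parsing through the locale module looks like
today, here is a minimal sketch using the existing locale.atof function.
The helper name parse_with_locale is hypothetical, and only the portable
"C" locale is used; whether any given platform locale would accept
non-ASCII digits is not claimed here. The save/restore dance is needed
because the locale setting is process-wide state.

```python
import locale


def parse_with_locale(text, locale_name="C"):
    """Parse text as a float under the given LC_NUMERIC locale.

    Hypothetical helper for illustration only. It temporarily changes
    the process-wide numeric locale and restores it afterwards.
    """
    saved = locale.setlocale(locale.LC_NUMERIC)  # query the current setting
    try:
        locale.setlocale(locale.LC_NUMERIC, locale_name)
        return locale.atof(text)
    finally:
        locale.setlocale(locale.LC_NUMERIC, saved)


value = parse_with_locale("1234.56")
print(value)
```

Changing the locale inside a helper like this is not thread-safe, since
other threads observe the temporary setting.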
From martin at v.loewis.de Tue Nov 30 20:40:54 2010 From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=) Date: Tue, 30 Nov 2010 20:40:54 +0100 Subject: [Python-Dev] Python and the Unicode Character Database In-Reply-To: <1291144993.8628.1.camel@localhost.localdomain> References: <20101128214311.092abd35@pitrou.net> <4CF2D4E9.3060607@v.loewis.de> <4CF2F067.5020705@pearwood.info> <4CF354C6.9020302@v.loewis.de> <20101129193302.115dbcd5@pitrou.net> <4CF53724.8090000@voidspace.org.uk> <4CF54DA1.5080900@v.loewis.de> <1291144993.8628.1.camel@localhost.localdomain> Message-ID: <4CF55346.1040108@v.loewis.de> Am 30.11.2010 20:23, schrieb Antoine Pitrou: > Le mardi 30 novembre 2010 ? 20:16 +0100, "Martin v. L?wis" a ?crit : >>> Would moving this functionality to the locale module make the issues any >>> easier to fix? >> >> You could delegate it to the C library, so: yes. > > I hope you don't suggest delegating it to the C locale functions. > Do you? Yes, I do. Why do you hope I don't? Regards, Martin From brett at python.org Tue Nov 30 20:41:47 2010 From: brett at python.org (Brett Cannon) Date: Tue, 30 Nov 2010 11:41:47 -0800 Subject: [Python-Dev] python3k : imp.find_module raises SyntaxError In-Reply-To: References: <201011251530.23947.emile.anclin@logilab> <4CEE9B72.1070002@ronadam.com> <20101129115311.GD18888@lupus.logilab.fr> Message-ID: On Mon, Nov 29, 2010 at 12:21, Ron Adam wrote: > > > On 11/29/2010 01:22 PM, Brett Cannon wrote: >> >> On Mon, Nov 29, 2010 at 03:53, Sylvain Th?nault >> ?wrote: >>> >>> On 25 novembre 11:22, Ron Adam wrote: >>>> >>>> On 11/25/2010 08:30 AM, Emile Anclin wrote: >>>>> >>>>> hello, >>>>> >>>>> working on Pylint, we have a lot of voluntary corrupted files to test >>>>> Pylint behavior; for instance >>>>> >>>>> $ cat /home/emile/var/pylint/test/input/func_unknown_encoding.py >>>>> # -*- coding: IBO-8859-1 -*- >>>>> """ check correct unknown encoding declaration >>>>> """ >>>>> >>>>> __revision__ = '????' 
>>>>> >>>>> >>>>> and we try to find that module : >>>>> find_module('func_unknown_encoding', None). But python3 raises >>>>> SyntaxError >>>>> in that case ; it didn't raise SyntaxError on python2 nor does so on >>>>> our >>>>> func_nonascii_noencoding and func_wrong_encoding modules (with obvious >>>>> names) >>>>> >>>>> Python 3.2a2 (r32a2:84522, Sep 14 2010, 15:22:36) >>>>> [GCC 4.3.4] on linux2 >>>>> Type "help", "copyright", "credits" or "license" for more information. >>>>>>> >>>>>>> >from imp import find_module >>>>>>>> >>>>>>>> find_module('func_unknown_encoding', None) >>>>> >>>>> Traceback (most recent call last): >>>>> ? File " ", line 1, in >>>>> SyntaxError: encoding problem: with BOM >>>> >>>> I don't think there is a clear reason by design. ?Also try importing >>>> the same modules directly and noting the differences in the errors >>>> you get. >>> >>> IMO the point is that we can consider as a bug the fact that find_module >>> tries to somewhat read the content of the file, no? Though it seems to >>> only >>> doing this for encoding detection or like since find_module doesn't choke >>> on >>> a module containing another kind of syntax error. >>> >>> So the question is, should we deal with this in pylint/astng, or can we >>> expect >>> this to be fixed at some point? >> >> Considering these semantics changed between Python 2 and 3 w/o a >> discernable benefit (I would consider it a negative as finding a >> module should not be impacted by syntactic correctness; the full act >> of importing should be the only thing that cares about that), I would >> consider it a bug that should be filed. > > The output of imp.find_module() returns an open file io object, and it's > output feeds directly into to imp.load_module(). 
> >>>> imp.find_module('pydoc') > (<_io.TextIOWrapper name=4 encoding='utf-8'>, > '/usr/local/lib/python3.2/pydoc.py', ('.py', 'U', 1)) > > So I think the imp.find_module() is suppose to be used when you *do* want to > do the full act of importing and not for just finding out if or where module > xyz exists. Going with your line of argument, why can't imp.load_module be the call that figures out there is a syntax error? If you look at this from the perspective of PEP 302, finding a module has absolutely nothing to do with the validity of the found source, just that something was found somewhere which (hopefully) contains code that represents the module. From solipsis at pitrou.net Tue Nov 30 20:44:14 2010 From: solipsis at pitrou.net (Antoine Pitrou) Date: Tue, 30 Nov 2010 20:44:14 +0100 Subject: [Python-Dev] Python and the Unicode Character Database In-Reply-To: <4CF55346.1040108@v.loewis.de> References: <20101128214311.092abd35@pitrou.net> <4CF2D4E9.3060607@v.loewis.de> <4CF2F067.5020705@pearwood.info> <4CF354C6.9020302@v.loewis.de> <20101129193302.115dbcd5@pitrou.net> <4CF53724.8090000@voidspace.org.uk> <4CF54DA1.5080900@v.loewis.de> <1291144993.8628.1.camel@localhost.localdomain> <4CF55346.1040108@v.loewis.de> Message-ID: <1291146254.8628.4.camel@localhost.localdomain> Le mardi 30 novembre 2010 ? 20:40 +0100, "Martin v. L?wis" a ?crit : > Am 30.11.2010 20:23, schrieb Antoine Pitrou: > > Le mardi 30 novembre 2010 ? 20:16 +0100, "Martin v. L?wis" a ?crit : > >>> Would moving this functionality to the locale module make the issues any > >>> easier to fix? > >> > >> You could delegate it to the C library, so: yes. > > > > I hope you don't suggest delegating it to the C locale functions. > > Do you? > > Yes, I do. Why do you hope I don't? Because we all know how locale is a pile of cr*p, both in specification and in implementations. Our unit tests for it are a clear proof of that. 
Actually, I remember you saying that locale should ideally be replaced
with a wrapper around the ICU library.

Regards

Antoine.

From brett at python.org Tue Nov 30 20:46:07 2010
From: brett at python.org (Brett Cannon)
Date: Tue, 30 Nov 2010 11:46:07 -0800
Subject: [Python-Dev] python3k : imp.find_module raises SyntaxError
In-Reply-To: <20101130083418.GB4157@lupus.logilab.fr>
References: <201011251530.23947.emile.anclin@logilab> <4CEE9B72.1070002@ronadam.com> <20101129115311.GD18888@lupus.logilab.fr> <20101130083418.GB4157@lupus.logilab.fr>
Message-ID:

On Tue, Nov 30, 2010 at 00:34, Sylvain Thénault wrote:
> On 29 November 14:21, Ron Adam wrote:
>> On 11/29/2010 01:22 PM, Brett Cannon wrote:
>> >Considering these semantics changed between Python 2 and 3 w/o a
>> >discernable benefit (I would consider it a negative as finding a
>> >module should not be impacted by syntactic correctness; the full act
>> >of importing should be the only thing that cares about that), I would
>> >consider it a bug that should be filed.
>>
>> The output of imp.find_module() returns an open file io object, and
>> its output feeds directly into imp.load_module().
>>
>> >>> imp.find_module('pydoc')
>> (<_io.TextIOWrapper name=4 encoding='utf-8'>,
>> '/usr/local/lib/python3.2/pydoc.py', ('.py', 'U', 1))
>>
>> So I think the imp.find_module() is supposed to be used when you *do*
>> want to do the full act of importing and not for just finding out if
>> or where module xyz exists.
>
> in python 2, find_module was usable for such usage, and this is a needed api
> for a tool like pylint. Is there another way to do so with python 3?

At the moment, no. Best option would be to create an
importlib.find_module function which returns a loader if the module is
found, else returns None. The loader can have its get_source method
called to read the source code (w/o verification). I have this planned
for Python 3.3 but not 3.2 with us so close to 3.2b1.

> --
> Sylvain Thénault                               LOGILAB, Paris (France)
> Python, Debian, Agile methods training:  http://www.logilab.fr/formations
> Custom software development:             http://www.logilab.fr/services
> CubicWeb, the semantic web framework:    http://www.cubicweb.org
>
> _______________________________________________
> Python-Dev mailing list
> Python-Dev at python.org
> http://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe: http://mail.python.org/mailman/options/python-dev/brett%40python.org
>

From martin at v.loewis.de Tue Nov 30 20:55:52 2010
From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=)
Date: Tue, 30 Nov 2010 20:55:52 +0100
Subject: [Python-Dev] Python and the Unicode Character Database
In-Reply-To: <1291146254.8628.4.camel@localhost.localdomain>
References: <20101128214311.092abd35@pitrou.net> <4CF2D4E9.3060607@v.loewis.de> <4CF2F067.5020705@pearwood.info> <4CF354C6.9020302@v.loewis.de> <20101129193302.115dbcd5@pitrou.net> <4CF53724.8090000@voidspace.org.uk> <4CF54DA1.5080900@v.loewis.de> <1291144993.8628.1.camel@localhost.localdomain> <4CF55346.1040108@v.loewis.de> <1291146254.8628.4.camel@localhost.localdomain>
Message-ID: <4CF556C8.9010704@v.loewis.de>

> Because we all know how locale is a pile of cr*p, both in specification
> and in implementations. Our unit tests for it are a clear proof of that.

I wouldn't use expletives, but rather claim that the locale module is
highly platform-dependent.

> Actually, I remember you saying that locale should ideally be replaced
> with a wrapper around the ICU library.

By that, I stand - however, I have given up the hope that this will
happen anytime soon.

Wrt. local number parsing, I think that the locale module would be
way better than the nonsense that Python currently does. In the locale
module, somebody at least has thought about what specifically
constitutes a number. The current not-ASCII-but-not-local-either
approach is just useless.
Maintaining a reasonable implementation is a burden, so deferring to the
C library is more attractive than having to maintain an unreasonable
implementation.

Regards,
Martin

From solipsis at pitrou.net Tue Nov 30 21:11:59 2010
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Tue, 30 Nov 2010 21:11:59 +0100
Subject: [Python-Dev] Python and the Unicode Character Database
In-Reply-To: <4CF556C8.9010704@v.loewis.de>
References: <20101128214311.092abd35@pitrou.net> <4CF2D4E9.3060607@v.loewis.de> <4CF2F067.5020705@pearwood.info> <4CF354C6.9020302@v.loewis.de> <20101129193302.115dbcd5@pitrou.net> <4CF53724.8090000@voidspace.org.uk> <4CF54DA1.5080900@v.loewis.de> <1291144993.8628.1.camel@localhost.localdomain> <4CF55346.1040108@v.loewis.de> <1291146254.8628.4.camel@localhost.localdomain> <4CF556C8.9010704@v.loewis.de>
Message-ID: <1291147919.8628.12.camel@localhost.localdomain>

On Tuesday 30 November 2010 at 20:55 +0100, "Martin v. Löwis" wrote:
> Wrt. to local number parsing, I think that the locale module would be
> way better than the nonsense that Python currently does. In the locale
> module, somebody at least has thought about what specifically
> constitutes a number. The current not-ASCII-but-not-local-either
> approach is just useless.

It depends what you need. If you parse integers it's probably good
enough. And it's better to have a trustable standard (Unicode) than a
myriad of ad-hoc, possibly buggy or incomplete, often unavailable,
cultural specifications drafted by OS vendors who have no business (and
no expertise) in drafting them.

At least you can build more sophisticated routines on the simple
information given to you by the Unicode database. You cannot build
anything solid on the C locale functions (and even then you are limited
by various issues inherent in the locale semantics, such as the fact
that it relies on process-wide state, which would only be ok, at best,
for single-user applications).

There's a reason that e.g. Babel (*) reimplements locale-like
functionality from scratch.

(*) http://pypi.python.org/pypi/Babel/

Regards

Antoine.

From brett at python.org Tue Nov 30 21:11:58 2010
From: brett at python.org (Brett Cannon)
Date: Tue, 30 Nov 2010 12:11:58 -0800
Subject: [Python-Dev] PEP 291 versus Python 3
In-Reply-To: <20101130103531.54d79465@mission>
References: <4CF49ACF.6070904@netwok.org> <4CF4F77C.4000308@voidspace.org.uk> <20101130103531.54d79465@mission>
Message-ID:

On Tue, Nov 30, 2010 at 07:35, Barry Warsaw wrote:
> On Nov 30, 2010, at 01:09 PM, Michael Foord wrote:
>
>>PEP 291 is very old and should probably be retired. I don't think anyone is
>>maintaining standard libraries in py3k that are also compatible with Python
>>2.anything. (At least not in a single codebase.)
>
> I agree.

Same here; I have purposefully ignored compatibility requirements
because I always found those promises to be extremely annoying and
somewhat painful to enforce.

> I think we should change the status of PEP 291 to Final, and add a
> few words to make it clear it applies only to Python 2. Since Neal owns the
> PEP, he should get first crack at doing the update, but I volunteer to make
> those changes if he declines (or does not respond).
>

I will channel Neal: "I decline and/or do not want to respond". =)

> We may eventually need a similar document for Python 3, but it should be a new
> PEP.

I hope not.
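
The digit-parsing question debated in the "Python and the Unicode Character Database" messages can be tried directly in Python 3. The following is only a minimal sketch: `ascii_digits_only` and `digit_values` are hypothetical helper names chosen for illustration, not anything proposed in the thread. It shows that `int()` and `float()` accept non-ASCII decimal digits, and that an application wanting stricter input can check for itself.

```python
import unicodedata

# Sample inputs: fullwidth digits 1-2-3 and Arabic-Indic digits 1-2-3.
fullwidth = "\uff11\uff12\uff13"
arabic_indic = "\u0661\u0662\u0663"

# int() and float() accept Unicode decimal digits, not only ASCII 0-9.
parsed_fullwidth = int(fullwidth)
parsed_arabic = int(arabic_indic)
parsed_float = float("\uff11.\uff11")


def ascii_digits_only(text):
    """Hypothetical helper: True if text is non-empty and uses only 0-9."""
    return text != "" and all(ch in "0123456789" for ch in text)


def digit_values(text):
    """Look up each character's decimal value in the Unicode database.

    unicodedata.decimal() raises ValueError for non-decimal characters.
    """
    return [unicodedata.decimal(ch) for ch in text]
```

A program that wants to reject such input could call something like `ascii_digits_only` before converting, while one that wants to accept native digits can rely on the built-in conversions as they stand.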

From solipsis at pitrou.net Tue Nov 30 21:13:07 2010
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Tue, 30 Nov 2010 21:13:07 +0100
Subject: [Python-Dev] ICU
In-Reply-To: <4CF556C8.9010704@v.loewis.de>
References: <20101128214311.092abd35@pitrou.net> <4CF2D4E9.3060607@v.loewis.de> <4CF2F067.5020705@pearwood.info> <4CF354C6.9020302@v.loewis.de> <20101129193302.115dbcd5@pitrou.net> <4CF53724.8090000@voidspace.org.uk> <4CF54DA1.5080900@v.loewis.de> <1291144993.8628.1.camel@localhost.localdomain> <4CF55346.1040108@v.loewis.de> <1291146254.8628.4.camel@localhost.localdomain> <4CF556C8.9010704@v.loewis.de>
Message-ID: <1291147987.8628.13.camel@localhost.localdomain>

Oh, about ICU:

> > Actually, I remember you saying that locale should ideally be replaced
> > with a wrapper around the ICU library.
>
> By that, I stand - however, I have given up the hope that this will
> happen anytime soon.

Perhaps this could be made a GSOC topic.

Regards

Antoine.

From ben+python at benfinney.id.au Tue Nov 30 21:24:08 2010
From: ben+python at benfinney.id.au (Ben Finney)
Date: Wed, 01 Dec 2010 07:24:08 +1100
Subject: [Python-Dev] Python and the Unicode Character Database
References: <87wrnv43v5.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: <87r5e236hj.fsf@benfinney.id.au>

haiyang kang writes:

> I think it is a little ugly to have code like this: num =
> float("?.?"), expected result is: num = 1.1

That's a straw man, though. The string need not be a literal in the
program; it can be input to the program.

    num = float(input_from_the_external_world)

Does that change your assessment of whether non-ASCII digits are used?

--
 \     "The greatest tragedy in mankind's entire history may be the |
  `\    hijacking of morality by religion." -- Arthur C. Clarke, 1991 |
_o__)                                                                |
Ben Finney

From barry at python.org Tue Nov 30 22:05:43 2010
From: barry at python.org (Barry Warsaw)
Date: Tue, 30 Nov 2010 16:05:43 -0500
Subject: [Python-Dev] PEP 291 versus Python 3
In-Reply-To:
References: <4CF49ACF.6070904@netwok.org> <4CF4F77C.4000308@voidspace.org.uk> <20101130103531.54d79465@mission>
Message-ID: <20101130160543.3b478311@mission>

On Nov 30, 2010, at 12:11 PM, Brett Cannon wrote:

>I will channel Neal: "I decline and/or do not want to respond". =)

PEP 291 updated.

-Barry
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 836 bytes
Desc: not available
URL:

From tjreedy at udel.edu Tue Nov 30 23:43:22 2010
From: tjreedy at udel.edu (Terry Reedy)
Date: Tue, 30 Nov 2010 17:43:22 -0500
Subject: [Python-Dev] Python and the Unicode Character Database
In-Reply-To: <87wrnv43v5.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <87wrnv43v5.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID:

On 11/30/2010 3:23 AM, Stephen J. Turnbull wrote:

> I see no reason not to make a similar promise for numeric literals. I
> see no good reason to allow compatibility full-width Japanese "ASCII"
> numerals or Arabic cursive numerals in "for i in range(...)" for
> example.

I do not think that anyone, at least not me, has argued for anything
other than 0-9 digits (or 0-f for hex) in literals in program code. The
only issue is whether non-programmer *users* should be able to use their
native digits in applications in response to input prompts.

--
Terry Jan Reedy

From rrr at ronadam.com Tue Nov 30 23:48:56 2010
From: rrr at ronadam.com (Ron Adam)
Date: Tue, 30 Nov 2010 16:48:56 -0600
Subject: [Python-Dev] python3k : imp.find_module raises SyntaxError
In-Reply-To:
References: <201011251530.23947.emile.anclin@logilab> <4CEE9B72.1070002@ronadam.com> <20101129115311.GD18888@lupus.logilab.fr>
Message-ID:

On 11/30/2010 01:41 PM, Brett Cannon wrote:
> On Mon, Nov 29, 2010 at 12:21, Ron Adam wrote:
>>
>>
>> On 11/29/2010 01:22 PM, Brett Cannon wrote:
>>>
>>> On Mon, Nov 29, 2010 at 03:53, Sylvain Thénault wrote:
>>>>
>>>> On 25 November 11:22, Ron Adam wrote:
>>>>>
>>>>> On 11/25/2010 08:30 AM, Emile Anclin wrote:
>>>>>>
>>>>>> hello,
>>>>>>
>>>>>> working on Pylint, we have a lot of voluntarily corrupted files to test
>>>>>> Pylint behavior; for instance
>>>>>>
>>>>>> $ cat /home/emile/var/pylint/test/input/func_unknown_encoding.py
>>>>>> # -*- coding: IBO-8859-1 -*-
>>>>>> """ check correct unknown encoding declaration
>>>>>> """
>>>>>>
>>>>>> __revision__ = '????'
>>>>>>
>>>>>>
>>>>>> and we try to find that module :
>>>>>> find_module('func_unknown_encoding', None). But python3 raises
>>>>>> SyntaxError in that case ; it didn't raise SyntaxError on python2
>>>>>> nor does so on our func_nonascii_noencoding and func_wrong_encoding
>>>>>> modules (with obvious names)
>>>>>>
>>>>>> Python 3.2a2 (r32a2:84522, Sep 14 2010, 15:22:36)
>>>>>> [GCC 4.3.4] on linux2
>>>>>> Type "help", "copyright", "credits" or "license" for more information.
>>>>>> >>> from imp import find_module
>>>>>> >>> find_module('func_unknown_encoding', None)
>>>>>> Traceback (most recent call last):
>>>>>> File " ", line 1, in
>>>>>> SyntaxError: encoding problem: with BOM
>>>>>
>>>>> I don't think there is a clear reason by design. Also try importing
>>>>> the same modules directly and noting the differences in the errors
>>>>> you get.
>>>>
>>>> IMO the point is that we can consider as a bug the fact that find_module
>>>> tries to somewhat read the content of the file, no? Though it seems to
>>>> only be doing this for encoding detection or the like, since find_module
>>>> doesn't choke on a module containing another kind of syntax error.
>>>>
>>>> So the question is, should we deal with this in pylint/astng, or can we
>>>> expect this to be fixed at some point?
>>>
>>> Considering these semantics changed between Python 2 and 3 w/o a
>>> discernable benefit (I would consider it a negative as finding a
>>> module should not be impacted by syntactic correctness; the full act
>>> of importing should be the only thing that cares about that), I would
>>> consider it a bug that should be filed.
>>
>> The output of imp.find_module() returns an open file io object, and its
>> output feeds directly into imp.load_module().
>>
>> >>> imp.find_module('pydoc')
>> (<_io.TextIOWrapper name=4 encoding='utf-8'>,
>> '/usr/local/lib/python3.2/pydoc.py', ('.py', 'U', 1))
>>
>> So I think the imp.find_module() is supposed to be used when you *do* want to
>> do the full act of importing and not for just finding out if or where module
>> xyz exists.
>
> Going with your line of argument, why can't imp.load_module be the
> call that figures out there is a syntax error? If you look at this
> from the perspective of PEP 302, finding a module has absolutely
> nothing to do with the validity of the found source, just that
> something was found somewhere which (hopefully) contains code that
> represents the module.

The part that I'm looking at is: what would find_module return if the
encoding is bad or not found?

    <_io.TextIOWrapper name=4 encoding='bad_encoding'>

Maybe we could have some library introspection function in the inspect
module for just looking in the library rather than loading modules. But I
think those would have the same issues, as packages need to be loaded in
order to find sub modules.*

* It almost seems like the concept of a sub-module (in a package) is
flawed. I'm not sure I can explain what causes me to feel that way at the
moment though.

Ron
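
The "find without importing" need raised in this thread can be sketched with an API that, as an assumption worth stating loudly, postdates these messages: `importlib.util.find_spec` and `importlib.machinery.PathFinder.find_spec` were added in Python 3.4 and are not the `importlib.find_module` function Brett describes. The helper names `locate_module` and `read_raw_source` are hypothetical. The sketch locates a top-level module's file and reads it as bytes, so a bogus coding declaration like the Pylint test file's is never decoded.

```python
import importlib.machinery
import importlib.util


def locate_module(name, search_dirs=None):
    """Hypothetical helper: return the file path of top-level module
    `name`, or None if it is not found or has no file location.

    The module's source is neither decoded nor compiled here.
    """
    if search_dirs is None:
        # Searches sys.path (and sys.modules) for a top-level name.
        spec = importlib.util.find_spec(name)
    else:
        # Restrict the search to the given directories.
        spec = importlib.machinery.PathFinder.find_spec(name, list(search_dirs))
    if spec is None or not spec.has_location:
        return None
    return spec.origin


def read_raw_source(path):
    """Read the source as bytes, ignoring any encoding declaration."""
    with open(path, "rb") as handle:
        return handle.read()
```

For dotted names, `importlib.util.find_spec` imports the parent package first, which matches Ron's remark that packages need to be loaded in order to find submodules; the sketch therefore only claims to handle top-level modules.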

