A problem occurred in a Python script.

and the following error on the console:

d:\temp>c:\python32\python.exe test12.py
Error in sys.excepthook:
Traceback (most recent call last):
  File "c:\python32\lib\tempfile.py", line 209, in _mkstemp_inner
    fd = _os.open(file, flags, 0o600)
OSError: [Errno 22] Invalid argument

Original exception was:
Traceback (most recent call last):
  File "d:\my\py\test12.py", line 8, in
    fhb.write("abcdef") # try writing non-binary to binary file. Expect an error, of course.
TypeError: 'str' does not support the buffer interface

I was expecting to see a whole cgitb report in sob, but no such luck. Not sure why it is trying to create a temporary file, but it seems to fail to do that.

Of course, the next test would have been to write binary data into fhb and try to copy it to stdout, which would fail, because stdout has to be non-binary for cgitb to work???

That brings me to http.server, the 3.2a4 replacement for CGIHTTPServer. There are definitely some improvements here, and some reported-but-yet-unfixed bugs. And some pitiful missing features, especially on Windows.

I applied some of the whacks I had applied to CGIHTTPServer, and got some things working, but, per what I was trying to demonstrate above, there seems to be an incompatibility between using cgitb (which wants stdout open with some encoding provided) and serving binary files (which wants stdout open in binary) [this latter is supported by the WSGI spec too].

So it seems to me that there are some problems. Yet, it seems that http.server can somehow accept the data sent by cgitb, which comes from the subprocess running my CGI script, but my CGI script fails to be able to copy a binary file to its stdout (a subprocess-created PIPE). The subprocess documentation doesn't say what encoding is supplied to the PIPE-created handles, if any, but since cgitb data is accepted but binary file data is not, I infer it must be a non-binary handle, encoding unknown.
The subprocess documentation doesn't document any way to specify what encoding should be used on the PIPE-created handles, either. So this isn't very enlightening. In the absence of a specification or parameter, I would have expected the PIPEs to be binary, but this seems to be experimentally false.

Yet http.server, when serving plain files, seems to open them in binary mode, and transfer them successfully to the browser. And it can also accept the non-binary?? data from cgitb from my CGI script, and display it in the browser. The former comes from a file it opens in binary mode, and the latter from the subprocess PIPE in unknown mode.

It seems that the socketfile.server opens the socket in "wb" mode, and encodes most data. That, in turn, seems to imply that the binary data from SimpleHTTPServer files are reasonably returned, and I note the headers and such are explicitly encoded before being written to wfile... again, consistent with the socket, wfile, being in binary mode.

But the data coming back from the subprocess PIPE from my CGI script seems to be acceptable to be written to wfile also, implying that the PIPEs are binary, as the absence of specifications and parameters, and the knowledge that pipes are bytestreams, would lead one to expect. But then, it would seem that the cgitb output should be in binary to get into the PIPE, but it seems that using a binary stdout makes cgitb fail, in the above experiment... and I can't find any code in cgitb that does explicit encoding.

So I'm confused, and it seems a little extra documentation might help decide which are the modules that have bugs or missing features, and which do not.

One of the cgitb outputs from my attempt to serve the binary file claims that my CGI script's output file (which comes from a subprocess PIPE) is a TextIOWrapper with encoding cp1252. Maybe that is the default that comes when a new Python is launched, even though it gets a subprocess PIPE as stdout?
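The question above can be checked with a small experiment. This is only a sketch, assuming a current Python 3 rather than 3.2a4, and the child script is a made-up stand-in for a CGI script. It suggests that the pipe itself carries bytes (the parent reads bytes), while the freshly launched child wraps its end in a text layer with a platform-dependent default encoding; the binary stream underneath is reachable as sys.stdout.buffer.

```python
import subprocess
import sys

# Hypothetical child script: report the type of sys.stdout, then write
# raw bytes through the underlying binary buffer.
CHILD = r'''
import sys
sys.stdout.write(type(sys.stdout).__name__ + "\n")
sys.stdout.flush()                      # flush text before switching to bytes
sys.stdout.buffer.write(b"\x00\x01\xff\n")
sys.stdout.buffer.flush()
'''

def run_child():
    """Launch a new interpreter with stdout connected to a PIPE."""
    proc = subprocess.Popen([sys.executable, "-c", CHILD],
                            stdout=subprocess.PIPE)
    out, _ = proc.communicate()
    return out

if __name__ == "__main__":
    data = run_child()
    print(type(data))   # the parent end of the pipe yields bytes
    print(data)
```

The text line may arrive with a different line ending depending on the platform, which is the text layer in the child at work; the bytes written through the buffer arrive untouched.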
From stephen at xemacs.org  Sat Nov 20 05:11:48 2010
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Sat, 20 Nov 2010 13:11:48 +0900
Subject: [Python-Dev] len(chr(i)) = 2?
In-Reply-To: <4CE6FE30.1050903@v.loewis.de>
References: <201011192123.14169.victor.stinner@haypocalc.com>
	<4CE6F93F.9010109@egenix.com> <4CE6FE30.1050903@v.loewis.de>
Message-ID: <87hbfc1vnf.fsf@uwakimon.sk.tsukuba.ac.jp>

"Martin v. Löwis" writes:

 > The term "UCS-2" is a character set that can encode only 65536
 > characters; it thus refers to Unicode 1.1. According to the Unicode
 > Consortium's FAQ, the term UCS-2 should be avoided these days.

So what do you propose we call the Python implementation? You can call it "code-unit-oriented" if you like, but in fact it is identical to UCS-2 for all non-hairsplitting purposes. AFAICS the Unicode Consortium deprecates the *term* UCS-2 because they would like us to avoid *implementations* that don't encode the full Unicode character set, not because the term is technically incorrect.

Strictly speaking, internally Python only encodes 65536 characters in 2-octet builds. Its (Unicode) string-handling code does not know about surrogates at all, AFAIK, and therefore is not UTF-16 conforming. (The anomalies discussed here are type transformations, not string-handling, for my purpose.) I really don't see why we shouldn't call a UCS-2 implementation by its name.

AFAIK this was not supposed to change in Python 3; indexing and slicing go by code unit (isomorphic to UCS-n), not character, and due to PEP 383 4-octet builds do not conform (internally) to UTF-32, and can produce output that conforms to Unicode not at all (as a user option, of course, but it's still non-conformant).

 > > IMO, we should go back to the Python2 terms UCS2 and UCS4 which
 > > are correct and provide a clear description of what Python uses
 > > internally for code units.
 >
 > No, we shouldn't. The term UCS-2 is deprecated, see above.

Too bad for the Unicode Consortium, I say. UCS-2 is the closest term that folks who are not Unicode geeks will have a chance of understanding.

I agree with Marc-Andre that "narrow" and "wide" are too ambiguous to be useful. Many people will interpret that as "UTF-16" (or even "UTF-8") and "UTF-32", respectively, which is dead wrong. Others won't have a clue. Using "UCS-2" and "UCS-4" has the correct connotations to Unicode geeks, and they are easy to look up for non-geeks who care about precise definitions.

Cf. the second half of the FAQ you quote:

    Instead, "UCS-2" has sometimes been used in the past to indicate
    that an implementation does not support supplementary characters
    and doesn't interpret pairs of surrogate code points as
    characters. Such an implementation would not handle processing
    like character properties, codepoint boundaries, collation,
    etc. for supplementary characters.

"Hey, Python, I'm looking at you!" (Strictly speaking, Python libraries do some of that for us, but the Python *language* does not.)
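To make the code-unit vocabulary in this thread concrete, here is a small sketch. It assumes a current CPython (3.3 or later), where len(chr(i)) is 1 for every code point; on the 2-octet builds under discussion the same call returned 2 for characters outside the BMP. The helper names are illustrative only. The first counts 16-bit code units by encoding to UTF-16, the second splits a supplementary code point into its surrogate pair by hand.

```python
def utf16_code_units(s):
    """Number of 16-bit code units needed to represent s in UTF-16."""
    return len(s.encode("utf-16-le")) // 2

def surrogate_pair(code_point):
    """Split a supplementary code point (>= 0x10000) into its UTF-16
    high and low surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("not a supplementary code point")
    offset = code_point - 0x10000
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)

if __name__ == "__main__":
    ch = chr(0x1D11E)               # MUSICAL SYMBOL G CLEF, outside the BMP
    print(len(ch))                  # counts code points on Python 3.3+
    print(utf16_code_units(ch))     # counts 16-bit code units
    print([hex(u) for u in surrogate_pair(0x1D11E)])
```

An implementation that indexes by 16-bit code unit sees the two surrogates as two separate items, which is the behaviour the subject line of this thread asks about.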
From brian.curtin at gmail.com  Sat Nov 20 05:24:38 2010
From: brian.curtin at gmail.com (Brian Curtin)
Date: Fri, 19 Nov 2010 22:24:38 -0600
Subject: [Python-Dev] [Python-checkins] r86540 - in python/branches/py3k:
	Parser/asdl_c.py Python/Python-ast.c
In-Reply-To: <20101120020146.25797EE989@mail.python.org>
References: <20101120020146.25797EE989@mail.python.org>
Message-ID:

On Fri, Nov 19, 2010 at 20:01, benjamin.peterson wrote:

> Author: benjamin.peterson
> Date: Sat Nov 20 03:01:45 2010
> New Revision: 86540
>
> Log:
> c89 declarations
>
> Modified:
>   python/branches/py3k/Parser/asdl_c.py
>   python/branches/py3k/Python/Python-ast.c
>
> Modified: python/branches/py3k/Parser/asdl_c.py
> ==============================================================================
> --- python/branches/py3k/Parser/asdl_c.py (original)
> +++ python/branches/py3k/Parser/asdl_c.py Sat Nov 20 03:01:45 2010
> @@ -366,9 +366,9 @@
>          self.emit("obj2ast_%s(PyObject* obj, %s* out, PyArena* arena)" %
> (name, ctype), 0)
>          self.emit("{", 0)
>          self.emit("PyObject* tmp = NULL;", 1)
> +        self.emit("int isinstance;", 1)
>          # Prevent compiler warnings about unused variable.
>          self.emit("tmp = tmp;", 1)
> -        self.emit("int isinstance;", 1)
>          self.emit("", 0)
>
>      def sumTrailer(self, name, add_label=False):
>
> Modified: python/branches/py3k/Python/Python-ast.c
> ==============================================================================
> --- python/branches/py3k/Python/Python-ast.c (original)
> +++ python/branches/py3k/Python/Python-ast.c Sat Nov 20 03:01:45 2010
> @@ -3375,8 +3375,8 @@
>  obj2ast_mod(PyObject* obj, mod_ty* out, PyArena* arena)
>  {
>      PyObject* tmp = NULL;
> -    tmp = tmp;
>      int isinstance;
> +    tmp = tmp;

Windows builds fail due to this change.

From v+python at g.nevcal.com  Sat Nov 20 07:56:18 2010
From: v+python at g.nevcal.com (Glenn Linderman)
Date: Fri, 19 Nov 2010 22:56:18 -0800
Subject: [Python-Dev] Web servers, bytes, str, documentation, Python 3.2a4
In-Reply-To: <4CE7452A.7050109@g.nevcal.com>
References: <4CE7452A.7050109@g.nevcal.com>
Message-ID: <4CE77112.3080604@g.nevcal.com>

On 11/19/2010 7:48 PM, Glenn Linderman wrote:
> One of the cgitb outputs from my attempt to serve the binary file
> claims that my CGI script's output file (which comes from a subprocess
> PIPE) is a TextIOWrapper with encoding cp1252. Maybe that is the
> default that comes when a new Python is launched, even though it gets
> a subprocess PIPE as stdout?

So the rather gross code below solves the cp1252 stdout problem, and also permits both strings and bytes to be written to the same file, although those two features are separable. But now that I've worked around it, it seems that subprocess should somehow ensure that launched Python programs know they are working on a binary stream? Of course, not all programs launched are Python programs... so maybe it should be a documentation issue, but it seems to be missing from the documentation.
#####################################
import io
import sys

if sys.version_info[ 0 ] == 2:
    class IOMix():
        # Python 2: encode unicode on the way out, pass byte strings through.
        def __init__( self, fh, encoding="UTF-8"):
            self.fh = fh
            self.encoding = encoding    # kept for use in write()
        def write( self, param ):
            if isinstance( param, unicode ):
                self.fh.write( param.encode( self.encoding ))
            else:
                self.fh.write( param )
#####################################
if sys.version_info[ 0 ] == 3:
    class IOMix():
        # Python 3: route str through a text wrapper and bytes straight to
        # the underlying binary buffer, flushing text before writing bytes.
        def __init__( self, fh, encoding="UTF-8"):
            if hasattr( fh, 'buffer'):
                self.bio = fh.buffer
                fh.flush()
                self.last = 'b'
                self.txt = io.TextIOWrapper( self.bio, encoding, None, '\r\n')
            else:
                raise ValueError("not a buffered stream")
        def write( self, param ):
            if isinstance( param, str ):
                self.last = 't'
                self.txt.write( param )
            else:
                if self.last == 't':
                    self.txt.flush()
                self.last = 'b'
                self.bio.write( param )
#####################################

From martin at v.loewis.de  Sat Nov 20 10:05:38 2010
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Sat, 20 Nov 2010 10:05:38 +0100
Subject: [Python-Dev] len(chr(i)) = 2?
In-Reply-To: <87hbfc1vnf.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <201011192123.14169.victor.stinner@haypocalc.com>
	<4CE6F93F.9010109@egenix.com> <4CE6FE30.1050903@v.loewis.de>
	<87hbfc1vnf.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: <4CE78F62.7060707@v.loewis.de>

On 20.11.2010 05:11, Stephen J. Turnbull wrote:
> "Martin v. Löwis" writes:
>
> > The term "UCS-2" is a character set that can encode only 65536
> > characters; it thus refers to Unicode 1.1. According to the Unicode
> > Consortium's FAQ, the term UCS-2 should be avoided these days.
>
> So what do you propose we call the Python implementation?

A technically correct description would be to say that Python uses either 16-bit code units or 32-bit code units; for brevity, these can be called narrow and wide code units.

> Strictly speaking, internally Python only encodes 65536 characters in
> 2-octet builds.
> Its (Unicode) string-handling code does not know
> about surrogates at all, AFAIK

Here you are mistaken: it does indeed know about UTF-16 and surrogates in several places, e.g. in the UTF-8 codec, or in the repr() implementation; likewise in the parser.

> and therefore is not UTF-16 conforming.

I disagree. Python does "conform" to "UTF-16" (certainly in the sense that no UTF-16 specification ever mandates a certain Python API, and that Python follows all general requirements of the UTF-16 specification).

> AFAIK this was not supposed to change in Python 3; indexing and
> slicing go by code unit (isomorphic to UCS-n), not character, and due
> to PEP 383 4-octet builds do not conform (internally) to UTF-32, and
> can produce output that conforms to Unicode not at all (as a user
> option, of course, but it's still non-conformant).

What behavior specifically do you consider non-conforming, and what specific specification do you think it is not conforming to? For example, it *is* fully conforming with UTF-8.

Regards,
Martin

From merwok at netwok.org  Sat Nov 20 12:38:53 2010
From: merwok at netwok.org (Éric Araujo)
Date: Sat, 20 Nov 2010 12:38:53 +0100
Subject: [Python-Dev] Web servers, bytes, str, documentation, Python 3.2a4
In-Reply-To: <4CE7452A.7050109@g.nevcal.com>
References: <4CE7452A.7050109@g.nevcal.com>
Message-ID: <4CE7B34D.4020309@netwok.org>

Hello

> cgitb.enable(0,"d:\temp")

Isn't that expanded to "d:<tab>emp"?

From ncoghlan at gmail.com  Sat Nov 20 14:16:27 2010
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sat, 20 Nov 2010 23:16:27 +1000
Subject: [Python-Dev] [Python-checkins] pymigr: Build identification patch
	is updated, but only for Unix.
In-Reply-To:
References:
Message-ID:

On Sat, Nov 20, 2010 at 6:02 PM, georg.brandl wrote:
> georg.brandl pushed abd0dc1328ce to pymigr:
>
> http://hg.python.org/pymigr/rev/abd0dc1328ce
> changeset:   70:abd0dc1328ce
> tag:         tip
> user:        Georg Brandl
> date:        Sat Nov 20 09:01:03 2010 +0100
> summary:     Build identification patch is updated, but only for Unix.
> files:       todo.txt

Does this repository use the same set of hooks as distutils2? (I'm hoping not, since if it does, my change to the email hook didn't work...)

Cheers,
Nick.

--
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From ncoghlan at gmail.com  Sat Nov 20 14:55:57 2010
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sat, 20 Nov 2010 23:55:57 +1000
Subject: [Python-Dev] Mercurial Schedule
In-Reply-To:
References: <4CE2CF8F.4040500@jcea.es> <4CE55385.6080002@v.loewis.de>
	<4CE56331.3050508@v.loewis.de> <4CE5DD52.7050907@jcea.es>
	<20101119094657.1a7cc24a@mission>
Message-ID:

On Sat, Nov 20, 2010 at 2:51 AM, Georg Brandl wrote:
> I'm at it. In fact, I think I will merge both todo.txt and tasks.txt
> into the PEP. It's not more of a burden to update it there, and it's
> more visible to the developer community.

The latest checkin was definitely an improvement (especially the updated timeline).

According to the PEP, the .hgeol rules aren't currently enforced server side - having such a hook in place before Hg went live was definitely one of the things we agreed on before the hgeol extension even existed in a usable form.

For fixing whitespace issues (another open question mentioned in the PEP), "make patchcheck" can continue to handle that - no need to create a Hg specific extension for it.

Cheers,
Nick.

--
Nick Coghlan   |   ncoghlan at gmail.com   |
Brisbane, Australia

From ncoghlan at gmail.com  Sat Nov 20 16:21:32 2010
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sun, 21 Nov 2010 01:21:32 +1000
Subject: [Python-Dev] [Python-checkins] r86566 - in python/branches/py3k:
	Doc/glossary.rst Doc/library/inspect.rst Lib/inspect.py
	Lib/test/test_inspect.py Misc/NEWS Misc/python-wing4.wpr
In-Reply-To: <20101120150731.2D346E78E@mail.python.org>
References: <20101120150731.2D346E78E@mail.python.org>
Message-ID:

On Sun, Nov 21, 2010 at 1:07 AM, michael.foord wrote:
> +Fetching attributes statically
> +------------------------------
> +
> +Both :func:`getattr` and :func:`hasattr` can trigger code execution when
> +fetching or checking for the existence of attributes. Descriptors, like
> +properties, will be invoked and :meth:`__getattr__` and :meth:`__getattribute__`
> +may be called.
> +
> +For cases where you want passive introspection, like documentation tools, this
> +can be inconvenient. `getattr_static` has the same signature as :func:`getattr`
> +but avoids executing code when it fetches attributes.

This description feels a little strong to me - getattr_static still executes all those things on the metaclass as it retrieves the information it needs to do the "static" lookup. Leaving this original description (which assumes metaclass=type) alone and adding a note near the end of the section to say that metaclass code is still executed might be an improvement.

Cheers,
Nick.

--
Nick Coghlan   |   ncoghlan at gmail.com   |
Brisbane, Australia

From fuzzyman at voidspace.org.uk  Sat Nov 20 16:29:13 2010
From: fuzzyman at voidspace.org.uk (Michael Foord)
Date: Sat, 20 Nov 2010 15:29:13 +0000
Subject: [Python-Dev] [Python-checkins] r86566 - in python/branches/py3k:
	Doc/glossary.rst Doc/library/inspect.rst Lib/inspect.py
	Lib/test/test_inspect.py Misc/NEWS Misc/python-wing4.wpr
In-Reply-To:
References: <20101120150731.2D346E78E@mail.python.org>
Message-ID: <4CE7E949.5030300@voidspace.org.uk>

On 20/11/2010 15:21, Nick Coghlan wrote:
> On Sun, Nov 21, 2010 at 1:07 AM, michael.foord
> wrote:
>> +Fetching attributes statically
>> +------------------------------
>> +
>> +Both :func:`getattr` and :func:`hasattr` can trigger code execution when
>> +fetching or checking for the existence of attributes. Descriptors, like
>> +properties, will be invoked and :meth:`__getattr__` and :meth:`__getattribute__`
>> +may be called.
>> +
>> +For cases where you want passive introspection, like documentation tools, this
>> +can be inconvenient. `getattr_static` has the same signature as :func:`getattr`
>> +but avoids executing code when it fetches attributes.
> This description feels a little strong to me - getattr_static still
> executes all those things on the metaclass as it retrieves the
> information it needs to do the "static" lookup. Leaving this original
> description (which assumes metaclass=type) alone and adding a note
> near the end of the section to say that metaclass code is still
> executed might be an improvement.

Can you give an example of code in a metaclass that may be executed by getattr_static? It's not that I don't believe you, I just can't think of an example. Looking up the class and the mro are the only two examples I can think of (klass.__mro__ and instance.__class__ - and they are noted in the docs?) but aren't metaclass specific.

Michael

> Cheers,
> Nick.
>

--
http://www.voidspace.org.uk/

READ CAREFULLY.
By accepting and reading this email you agree, on behalf of your employer, to release me from all obligations and waivers arising from any and all NON-NEGOTIATED agreements, licenses, terms-of-service, shrinkwrap, clickwrap, browsewrap, confidentiality, non-disclosure, non-compete and acceptable use policies ("BOGUS AGREEMENTS") that I have entered into with your employer, its partners, licensors, agents and assigns, in perpetuity, without prejudice to my ongoing rights and privileges. You further represent that you have the authority to release me from any BOGUS AGREEMENTS on behalf of your employer.

From solipsis at pitrou.net  Sat Nov 20 16:42:30 2010
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Sat, 20 Nov 2010 16:42:30 +0100
Subject: [Python-Dev] r86570 - in python/branches/py3k:
	Lib/unittest/case.py Lib/unittest/test/test_case.py Misc/NEWS
References: <20101120153426.47AC0ED9A@mail.python.org>
Message-ID: <20101120164230.5dc326bc@pitrou.net>

On Sat, 20 Nov 2010 16:34:26 +0100 (CET)
michael.foord wrote:
> +
> +    def testPickle(self):
> +        # Issue 10326
> +
> +        # Can't use TestCase classes defined in Test class as
> +        # pickle does not work with inner classes
> +        test = unittest.TestCase('run')
> +        for protocol in range(pickle.HIGHEST_PROTOCOL + 1):
> +
> +            # blew up prior to fix
> +            pickled_test = pickle.dumps(test, protocol=protocol)

You must also check that the object can be unpickled, otherwise making TestCase picklable is not only pointless, but misleading to the user. Other classes which claim to be picklable (such as e.g. io.BytesIO) are careful to check that unpickling works fine and produces a usable object.

Regards

Antoine.
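A rough sketch of the round-trip check being asked for here, assuming a current Python 3 in which unittest.TestCase instances are picklable. The helper name roundtrip_all_protocols is made up for this illustration and is not taken from the commit under review.

```python
import pickle
import unittest

def roundtrip_all_protocols(obj):
    """Pickle and unpickle obj with every available protocol and
    return the list of restored copies."""
    copies = []
    for protocol in range(pickle.HIGHEST_PROTOCOL + 1):
        data = pickle.dumps(obj, protocol=protocol)
        copies.append(pickle.loads(data))
    return copies

if __name__ == "__main__":
    test = unittest.TestCase('run')
    for restored in roundtrip_all_protocols(test):
        # The restored object should compare equal to the original
        # and still behave like a working TestCase.
        assert restored == test
        restored.assertEqual(set(), set())
    print("round trip ok")
```

Calling an assert method on the restored instance is what turns "the dump did not raise" into "the object is usable again".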
From fuzzyman at voidspace.org.uk  Sat Nov 20 16:48:59 2010
From: fuzzyman at voidspace.org.uk (Michael Foord)
Date: Sat, 20 Nov 2010 15:48:59 +0000
Subject: [Python-Dev] r86570 - in python/branches/py3k:
	Lib/unittest/case.py Lib/unittest/test/test_case.py Misc/NEWS
In-Reply-To: <20101120164230.5dc326bc@pitrou.net>
References: <20101120153426.47AC0ED9A@mail.python.org>
	<20101120164230.5dc326bc@pitrou.net>
Message-ID: <4CE7EDEB.9080706@voidspace.org.uk>

On 20/11/2010 15:42, Antoine Pitrou wrote:
> On Sat, 20 Nov 2010 16:34:26 +0100 (CET)
> michael.foord wrote:
>> +
>> +    def testPickle(self):
>> +        # Issue 10326
>> +
>> +        # Can't use TestCase classes defined in Test class as
>> +        # pickle does not work with inner classes
>> +        test = unittest.TestCase('run')
>> +        for protocol in range(pickle.HIGHEST_PROTOCOL + 1):
>> +
>> +            # blew up prior to fix
>> +            pickled_test = pickle.dumps(test, protocol=protocol)
> You must also check that the object can be unpickled, otherwise
> making TestCase picklable is not only pointless, but misleading the
> user. Other classes which claim to be picklable (such as e.g.
> io.BytesIO) are careful to check that unpickling works fine and
> produces a usable object.

Well, given the *particular* bug it is fixing, ensuring that the TestCase instances can be pickled is enough. If they fail to unpickle that is a bug in pickle and not in unittest.

*However*, the test is very easy to extend to what you suggest so I have done it.

All the best,

Michael

> Regards
>
> Antoine.
>
>
> _______________________________________________
> Python-Dev mailing list
> Python-Dev at python.org
> http://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe: http://mail.python.org/mailman/options/python-dev/fuzzyman%40voidspace.org.uk

--
http://www.voidspace.org.uk/
From solipsis at pitrou.net  Sat Nov 20 16:59:49 2010
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Sat, 20 Nov 2010 16:59:49 +0100
Subject: [Python-Dev] r86570 - in python/branches/py3k:
	Lib/unittest/case.py Lib/unittest/test/test_case.py Misc/NEWS
In-Reply-To: <4CE7EDEB.9080706@voidspace.org.uk>
References: <20101120153426.47AC0ED9A@mail.python.org>
	<20101120164230.5dc326bc@pitrou.net>
	<4CE7EDEB.9080706@voidspace.org.uk>
Message-ID: <1290268789.3560.12.camel@localhost.localdomain>

On Saturday 20 November 2010 at 15:48 +0000, Michael Foord wrote:
> On 20/11/2010 15:42, Antoine Pitrou wrote:
> > On Sat, 20 Nov 2010 16:34:26 +0100 (CET)
> > michael.foord wrote:
> >> +
> >> +    def testPickle(self):
> >> +        # Issue 10326
> >> +
> >> +        # Can't use TestCase classes defined in Test class as
> >> +        # pickle does not work with inner classes
> >> +        test = unittest.TestCase('run')
> >> +        for protocol in range(pickle.HIGHEST_PROTOCOL + 1):
> >> +
> >> +            # blew up prior to fix
> >> +            pickled_test = pickle.dumps(test, protocol=protocol)
> > You must also check that the object can be unpickled, otherwise
> > making TestCase picklable is not only pointless, but misleading the
> > user. Other classes which claim to be picklable (such as e.g.
> > io.BytesIO) are careful to check that unpickling works fine and
> > produces a usable object.
>
> Well, given the *particular* bug it is fixing, ensuring that the
> TestCase instances can be pickled is enough. If they fail to unpickle
> that is a bug in pickle and not in unittest.

It wouldn't be, no. pickle provides several different APIs to ensure that state gets correctly stored *and* restored, but it's up to application classes such as TestCase to ensure that they implement those APIs correctly for the intended behaviour. Therefore, checking that pickling "works" fine (or, rather, seems to work) is only half of the job.

(for example, if you define a __getstate__, chances are you must define a __setstate__ too, and it is your job to make it work properly)

Antoine.

From ncoghlan at gmail.com  Sat Nov 20 17:01:06 2010
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sun, 21 Nov 2010 02:01:06 +1000
Subject: [Python-Dev] [Python-checkins] r86566 - in python/branches/py3k:
	Doc/glossary.rst Doc/library/inspect.rst Lib/inspect.py
	Lib/test/test_inspect.py Misc/NEWS Misc/python-wing4.wpr
In-Reply-To: <4CE7E949.5030300@voidspace.org.uk>
References: <20101120150731.2D346E78E@mail.python.org>
	<4CE7E949.5030300@voidspace.org.uk>
Message-ID:

On Sun, Nov 21, 2010 at 1:29 AM, Michael Foord wrote:
> Can you give an example of code in a metaclass that may be executed by
> getattr_static? It's not that I don't believe you I just can't think of an
> example. Looking up the class and the mro are the only two examples I can
> think of (klass.__mro__ and instance.__class__ - and they are noted in the
> docs?) but aren't metaclass specific.

The description heavily implies that arbitrary Python code won't be executed by calling getattr_static, and that isn't necessarily true. It's almost certain to be true in the case when the metaclass is type, but can't be guaranteed otherwise.
The retrieval of __class__ is a normal lookup on the object, so it can trigger all of the things getattr_static is trying to avoid (unavoidable if you want to support proxy classes at all), and the lookup of __mro__ invokes all of those things on the metaclass.

I'll see if I'm still of the same opinion after I sleep on it, but my first impression of the docs was that they slightly oversold the strength of the "doesn't execute arbitrary code" aspect of the new function. The existing caveats were all relating to when getattr() and getattr_static() might give different answers, while the additional caveats I was suggesting related to cases where arbitrary code may still be executed.

Cheers,
Nick.

--
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From fuzzyman at voidspace.org.uk  Sat Nov 20 17:06:59 2010
From: fuzzyman at voidspace.org.uk (Michael Foord)
Date: Sat, 20 Nov 2010 16:06:59 +0000
Subject: [Python-Dev] [Python-checkins] r86566 - in python/branches/py3k:
	Doc/glossary.rst Doc/library/inspect.rst Lib/inspect.py
	Lib/test/test_inspect.py Misc/NEWS Misc/python-wing4.wpr
In-Reply-To:
References: <20101120150731.2D346E78E@mail.python.org>
	<4CE7E949.5030300@voidspace.org.uk>
Message-ID: <4CE7F223.5040009@voidspace.org.uk>

On 20/11/2010 16:01, Nick Coghlan wrote:
> On Sun, Nov 21, 2010 at 1:29 AM, Michael Foord
> wrote:
>> Can you give an example of code in a metaclass that may be executed by
>> getattr_static? It's not that I don't believe you I just can't think of an
>> example. Looking up the class and the mro are the only two examples I can
>> think of (klass.__mro__ and instance.__class__ - and they are noted in the
>> docs?) but aren't metaclass specific.
> The description heavily implies that arbitrary Python code won't be
> executed by calling getattr_static, and that isn't necessarily true.
> It's almost certain to be true in the case when the metaclass is type,
> but can't be guaranteed otherwise.
Given the way that member lookups are done by getattr_static I don't think any assumptions about the metaclass are made. I'm happy to be proven wrong (but would rather fix it than document it as an exception).

(Actually we assume the metaclass doesn't use __slots__, but only because it isn't *possible* for a metaclass to use __slots__.)

> The retrieval of __class__ is a
> normal lookup on the object, so it can trigger all of the things
> getattr_static is trying to avoid (unavoidable if you want to support
> proxy classes at all), and the lookup of __mro__ invokes all of those
> things on the metaclass.

__class__ and mro lookup are noted in the docs as being exceptions. We could actually remove the __class__ lookup from the list of exceptions by using type(...) instead of obj.__class__.

> I'll see if I'm still of the same opinion after I sleep on it, but my
> first impression of the docs was that they slightly oversold the
> strength of the "doesn't execute arbitrary code" aspect of the new
> function. The existing caveats were all relating to when getattr() and
> getattr_static() might give different answers, while the additional
> caveats I was suggesting related to cases where arbitrary code may
> still be executed.

I'm happy to change the wording to make the promise less strong.

All the best,

Michael

> Cheers,
> Nick.
>

--
http://www.voidspace.org.uk/
From fuzzyman at voidspace.org.uk  Sat Nov 20 17:10:42 2010
From: fuzzyman at voidspace.org.uk (Michael Foord)
Date: Sat, 20 Nov 2010 16:10:42 +0000
Subject: [Python-Dev] r86570 - in python/branches/py3k:
	Lib/unittest/case.py Lib/unittest/test/test_case.py Misc/NEWS
In-Reply-To: <1290268789.3560.12.camel@localhost.localdomain>
References: <20101120153426.47AC0ED9A@mail.python.org>
	<20101120164230.5dc326bc@pitrou.net>
	<4CE7EDEB.9080706@voidspace.org.uk>
	<1290268789.3560.12.camel@localhost.localdomain>
Message-ID: <4CE7F302.8090909@voidspace.org.uk>

On 20/11/2010 15:59, Antoine Pitrou wrote:
> On Saturday 20 November 2010 at 15:48 +0000, Michael Foord wrote:
>> On 20/11/2010 15:42, Antoine Pitrou wrote:
>>> On Sat, 20 Nov 2010 16:34:26 +0100 (CET)
>>> michael.foord wrote:
>>>> +
>>>> +    def testPickle(self):
>>>> +        # Issue 10326
>>>> +
>>>> +        # Can't use TestCase classes defined in Test class as
>>>> +        # pickle does not work with inner classes
>>>> +        test = unittest.TestCase('run')
>>>> +        for protocol in range(pickle.HIGHEST_PROTOCOL + 1):
>>>> +
>>>> +            # blew up prior to fix
>>>> +            pickled_test = pickle.dumps(test, protocol=protocol)
>>> You must also check that the object can be unpickled, otherwise
>>> making TestCase picklable is not only pointless, but misleading the
>>> user. Other classes which claim to be picklable (such as e.g.
>>> io.BytesIO) are careful to check that unpickling works fine and
>>> produces a usable object.
>> Well, given the *particular* bug it is fixing, ensuring that the
>> TestCase instances can be pickled is enough. If they fail to unpickle
>> that is a bug in pickle and not in unittest.
> It wouldn't be, no.
> pickle provides several different APIs to ensure
> that state gets correctly stored *and* restored, but it's up to
> application classes such as TestCase to ensure that they implement those
> APIs correctly for the intended behaviour. Therefore, checking that
> pickling "works" fine (or, rather, seems to work) is only half of the
> job.
>
> (for example, if you define a __getstate__, chances are you must define
> a __setstate__ too, and it is your job to make it work properly)

Yes, but unittest.TestCase doesn't implement any of those APIs (and if we did we would *definitely* need to test unpickling).

That aside I have extended the test in the way you suggest.

Actually it would be nice to implement custom pickling / unpickling methods to allow Python 2.7 / 3.2 pickled TestCases to be unpickled on earlier versions of Python. I couldn't see how to change the class name in the pickle using the pickle protocol methods. Suggestions welcomed.

Michael

> Antoine.
>

--
http://www.voidspace.org.uk/
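For readers unfamiliar with the pairing of __getstate__ and __setstate__ mentioned in the quoted text, here is a minimal sketch. The Counter class is hypothetical and has nothing to do with unittest; it only shows why a class that customises what gets stored usually has to customise how it is restored as well.

```python
import pickle

class Counter:
    """Hypothetical class with a member that should not be pickled."""

    def __init__(self, start=0):
        self.value = start
        self.log = lambda msg: None      # lambdas cannot be pickled

    def __getstate__(self):
        # Store everything except the unpicklable callable.
        state = self.__dict__.copy()
        del state["log"]
        return state

    def __setstate__(self, state):
        # Restore the saved attributes and rebuild the dropped one.
        self.__dict__.update(state)
        self.log = lambda msg: None

if __name__ == "__main__":
    original = Counter(41)
    original.value += 1
    restored = pickle.loads(pickle.dumps(original))
    print(restored.value)
    restored.log("still usable after unpickling")
```

Without __setstate__ the restored object would come back without its log attribute, so the dump would appear to work while the result would not be usable.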
From fuzzyman at voidspace.org.uk Sat Nov 20 17:28:40 2010 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Sat, 20 Nov 2010 16:28:40 +0000 Subject: [Python-Dev] [Python-checkins] r86566 - in python/branches/py3k: Doc/glossary.rst Doc/library/inspect.rst Lib/inspect.py Lib/test/test_inspect.py Misc/NEWS Misc/python-wing4.wpr In-Reply-To: <4CE7F223.5040009@voidspace.org.uk> References: <20101120150731.2D346E78E@mail.python.org> <4CE7E949.5030300@voidspace.org.uk> <4CE7F223.5040009@voidspace.org.uk> Message-ID: <4CE7F738.90706@voidspace.org.uk> On 20/11/2010 16:06, Michael Foord wrote: > On 20/11/2010 16:01, Nick Coghlan wrote: > [snip...] >> The retrieval of __class__ is a >> normal lookup on the object, so it can trigger all of the things >> getattr_static is trying to avoid (unavoidable if you want to support >> proxy classes at all), and the lookup of __mro__ invokes all of those >> things on the metaclass. > > __class__ and mro lookup are noted in the docs as being exceptions. We > could actually remove the __class__ lookup from the list of exceptions > by using type(...) instead of obj.__class__. > Done. >> I'll see if I'm still of the same opinion after I sleep on it, but my >> first impression of the docs was that they slightly oversold the >> strength of the "doesn't execute arbitrary code" aspect of the new >> function. The existing caveats were all relating to when getattr() and >> getattr_static() might give different answers, while the additional >> caveats I was suggesting related to cases where arbitrary code may >> still be executed. > I'm happy to change the wording to make the promise less strong. I've also removed the __mro__ exception. This is done with: type.__dict__['__mro__'].__get__(klass) If you can think of any other exceptions then please let me know. Michael > All the best, > > Michael > >> Cheers, >> Nick. >> > > -- http://www.voidspace.org.uk/ READ CAREFULLY. 
By accepting and reading this email you agree, on behalf of your employer, to release me from all obligations and waivers arising from any and all NON-NEGOTIATED agreements, licenses, terms-of-service, shrinkwrap, clickwrap, browsewrap, confidentiality, non-disclosure, non-compete and acceptable use policies ("BOGUS AGREEMENTS") that I have entered into with your employer, its partners, licensors, agents and assigns, in perpetuity, without prejudice to my ongoing rights and privileges. You further represent that you have the authority to release me from any BOGUS AGREEMENTS on behalf of your employer. From v+python at g.nevcal.com Sat Nov 20 19:19:11 2010 From: v+python at g.nevcal.com (Glenn Linderman) Date: Sat, 20 Nov 2010 10:19:11 -0800 Subject: [Python-Dev] Web servers, bytes, str, documentation, Python 3.2a4 In-Reply-To: <4CE7B34D.4020309@netwok.org> References: <4CE7452A.7050109@g.nevcal.com> <4CE7B34D.4020309@netwok.org> Message-ID: <4CE8111F.9060502@g.nevcal.com> On 11/20/2010 3:38 AM, Éric Araujo wrote: > Hello > >> cgitb.enable(0,"d:\temp") > Isn't that expanded to "d:<tab>emp"? > Oops. Yes, that fixes the problem with creation of the temp file, thanks for catching that. I now get a complete report of the original error in the temp file (below). I am a bit less confused now... but it seems that there are still a number of issues. Here is an enumeration of problems I was hard pressed to make before you removed my confusion on this issue. 1. cgitb should expect to report to a binary stdout, using whatever encoding (possibly ASCII) seems appropriate for the output that it generates. 2. Some appropriate documentation or API or both should be provided to enable a script to set "binary" mode for stdout for CGI scripts. This link demonstrates the confusion (wish I had found it earlier) that is encountered by such lack.
One must tell msvcrt the stream is binary (I had figured that out early on), one must also sidestep the use of the cp1252 default when printing binary, one must also choose a proper text encoding corresponding to the HTTP headers sent. My second email in this thread, sent a few hours after the first, shows a convenient set of cures for all but msvcrt (as long as only "write" is used for writing. "print" support could be added, similarly). Likely something along this line is needed for stdin as well, I haven't yet experimented with uploading binary content to a CGI. One could speculate about having the Python runtime auto-detect CGI mode, but I don't know of any foolproof technique for that, and the selection of the "proper" text encoding depends on the details of the CGI, so having instead an API or two that assists with doing this sort of thing would be better; the need for documentation, at least, seems imperative. 3. subprocess documentation could be improved to point out that when using subprocess.PIPE to talk to a Python subprocess, that the communications will be in binary. Again, I don't know of any way to autodetect the subprocess environment, but if it were possible to select an appropriate encoding and use it consistently on both sides of the PIPE, that would be a convenience to its use; if not possible, documenting the issue, and providing an API to use to easily select such encodings both in client and server, would be helpful. While the layers are all there, and ".buffer" is documented for TextIOWrapper, the use of sys.stdout.buffer and the fact that it has a full set of operations isn't immediately obvious from the reference material; perhaps it is in a tutorial I haven't found, but... I was looking, and didn't find it. Of course, subprocess may launch non-Python programs; they will have their own ideas of binary vs text encoding, so it is important that it is convenient to match them on the Python side. 
It would be nice if subprocess had a mechanism for providing no-deadlock stdout data to the parent prior to the child terminating. A CGI implementation via subprocess shouldn't accumulate all of stdout (or all of stderr, for that matter, although less important). I don't (yet) know enough about Python threading to know if this is possible, but it certainly would be useful. 4. http.server has a number of bugs and limitations. 4a. _url_collapse_path_split seems inefficient (although I have to benchmark it against what I think would be more efficient), and for its only use within http.server it produces the wrong information, so the information has to be recombined and resplit to make it function properly, adding to the perception of inefficiency. 4b. Detection of "executable" on Windows is simply wrong. Unix execution bits do not exist. 4c. is_cgi doesn't properly handle PATHINFO parts of the path, this is the other half of 4a. The Python2.x CGIHTTPServer.py had this right, but the introduction and use of _url_collapse_path_split broke it. 4d. Searching for a ? to find an explicit query string should use .find('?') rather than .rfind('?') as there is no prohibition on using '?' within a query string, AFAIK. 4e. doesn't set the REQUEST_URI, HTTP_HOST, or HTTP_PORT environment variables for the CGI. 4f. Should not send the 200 response until it sees if the CGI sends a Status: header. 4g. Should not buffer all of stdout: subprocess.communicate is inappropriate for a web server CGI interface. The data should stream through to avoid consuming inordinate amounts of memory. The only solution within the current limitations of subprocess is to abandon stderr, force the CGI to do its own error logging, and use shutil.copyfileobj to hook up p.stdout to self.wfile once the Status: message processing has happened. 4h. 
Doesn't seem to close p.stdin (I'm not sure if that is necessary, it may happen when p is garbage collected, but effort was made to close p.stdout and p.stderr, which seem similar.) *TypeError* Python 3.2a4: c:\python32\python.exe Sat Nov 20 09:28:41 2010 A problem occurred in a Python script. Here is the sequence of function calls leading up to the error, in the order they occurred. d:\my\py\test12.py in **() 4 import cgitb 5 sys.stdout.write("out") 6 fhb = open("fhb", "wb") 7 cgitb.enable(0,"d:\\temp") => 8 fhb.write("abcdef") # try writing non-binary to binary file. Expect an error, of course. *fhb* = <_io.BufferedWriter name='fhb'>, fhb.*write* = *TypeError*: 'str' does not support the buffer interface args = ("'str' does not support the buffer interface",) with_traceback = From alexander.belopolsky at gmail.com Sat Nov 20 23:32:28 2010 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Sat, 20 Nov 2010 17:32:28 -0500 Subject: [Python-Dev] len(chr(i)) = 2? In-Reply-To: <4CE78F62.7060707@v.loewis.de> References: <201011192123.14169.victor.stinner@haypocalc.com> <4CE6F93F.9010109@egenix.com> <4CE6FE30.1050903@v.loewis.de> <87hbfc1vnf.fsf@uwakimon.sk.tsukuba.ac.jp> <4CE78F62.7060707@v.loewis.de> Message-ID: On Sat, Nov 20, 2010 at 4:05 AM, "Martin v. Löwis" wrote: .. > A technically correct description would be to say that Python uses either > 16-bit code units or 32-bit code units; for brevity, these can be called > narrow and wide code units. +1 PEP 261 introduced the terms "wide Py_UNICODE" and "narrow Py_UNICODE," but when discussion is at the Python level, I don't think we should use names of C typedefs. I think "wide/narrow Unicode" builds describe the two options clearly and unambiguously. I prefer Python-specific terminology to Unicode terms because in the Python reference documentation we often discuss details that are outside the scope of the Unicode Standard.
For example, interpretation of lone surrogates on narrow builds is one such detail. From ziade.tarek at gmail.com Sun Nov 21 00:05:12 2010 From: ziade.tarek at gmail.com (Tarek Ziadé) Date: Sun, 21 Nov 2010 00:05:12 +0100 Subject: [Python-Dev] Reminder: Distutils vs Distutils2 Message-ID: Hello, I have seen some efforts recently to improve Distutils in the standard library. Just a quick reminder of the status of Distutils: it's frozen and is only receiving bug fixes at this time. The work I did last year was reverted and pushed to Distutils2. A lot of work has been done since then, and we had 4 GSOC students working this summer on Distutils2. It's backward-incompatible, so we can remove the things we don't like and add new things w/o suffering from backward compatibility pains. So if you want to improve the tool, or if you have some pending changes to Distutils, I would encourage you to join the Distutils2 effort and not to waste time on Distutils anymore. Most of the patches that did not make it into Distutils can still be added to Distutils2. The workflow we currently use to change the code is as follows, and it makes it easy for everyone to contribute: 1. clone http://bitbucket.org/tarek/distutils2 2. discuss / propose a patch on IRC (#distutils - Freenode) or on the dedicated mailing list (http://groups.google.com/group/the-fellowship-of-the-packaging) 3. I review and merge all changes at bitbucket, then push them to http://hg.python.org/distutils2 Crazy ideas are welcome. "setup.py" is gone in d2 for instance ;) Thanks ! Regards. Tarek -- Tarek Ziadé | http://ziade.org From ziade.tarek at gmail.com Sun Nov 21 00:15:41 2010 From: ziade.tarek at gmail.com (Tarek Ziadé) Date: Sun, 21 Nov 2010 00:15:41 +0100 Subject: [Python-Dev] Reminder: Distutils vs Distutils2 In-Reply-To: References: Message-ID: On Sun, Nov 21, 2010 at 12:05 AM, Tarek Ziadé wrote: .. > Crazy ideas are welcome.
"setup.py" is gone in d2 for instance ;) But you can still use a similar form if you want - just to mention From ncoghlan at gmail.com Sun Nov 21 04:52:19 2010 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 21 Nov 2010 13:52:19 +1000 Subject: [Python-Dev] [Python-checkins] r86566 - in python/branches/py3k: Doc/glossary.rst Doc/library/inspect.rst Lib/inspect.py Lib/test/test_inspect.py Misc/NEWS Misc/python-wing4.wpr In-Reply-To: <4CE7F223.5040009@voidspace.org.uk> References: <20101120150731.2D346E78E@mail.python.org> <4CE7E949.5030300@voidspace.org.uk> <4CE7F223.5040009@voidspace.org.uk> Message-ID: On Sun, Nov 21, 2010 at 2:06 AM, Michael Foord wrote: >> I'll see if I'm still of the same opinion after I sleep on it, but my >> first impression of the docs was that they slightly oversold the >> strength of the "doesn't execute arbitrary code" aspect of the new >> function. The existing caveats were all relating to when getattr() and >> getattr_static() might give different answers, while the additional >> caveats I was suggesting related to cases where arbitrary code may >> still be executed. > > I'm happy to change the wording to make the promise less strong. Your latest changes may have actually made the stronger wording accurate (I certainly can't think of any loopholes off the top of my head). If you did still want to soften the wording, I'd be inclined to replace the word "avoids" with "minimises" in the appropriate places. Cheers, Nick. -- Nick Coghlan?? |?? ncoghlan at gmail.com?? |?? 
Brisbane, Australia From ncoghlan at gmail.com Sun Nov 21 04:54:11 2010 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 21 Nov 2010 13:54:11 +1000 Subject: [Python-Dev] [Python-checkins] r86566 - in python/branches/py3k: Doc/glossary.rst Doc/library/inspect.rst Lib/inspect.py Lib/test/test_inspect.py Misc/NEWS Misc/python-wing4.wpr In-Reply-To: <20101120150731.2D346E78E@mail.python.org> References: <20101120150731.2D346E78E@mail.python.org> Message-ID: On Sun, Nov 21, 2010 at 1:07 AM, michael.foord wrote: > Author: michael.foord > Date: Sat Nov 20 16:07:30 2010 > New Revision: 86566 > > Log: > Issue 9732: addition of getattr_static to the inspect module > > Modified: > ? python/branches/py3k/Doc/glossary.rst > ? python/branches/py3k/Doc/library/inspect.rst > ? python/branches/py3k/Lib/inspect.py > ? python/branches/py3k/Lib/test/test_inspect.py > ? python/branches/py3k/Misc/NEWS > ? python/branches/py3k/Misc/python-wing4.wpr Unrelated to my previous comment - when adding inspect.getgeneratorstate, I noticed that inspect.getattr_static isn't mentioned in the 3.2 What's New yet (I put a XXX placeholder in for you/Raymond). -- Nick Coghlan?? |?? ncoghlan at gmail.com?? |?? Brisbane, Australia From v+python at g.nevcal.com Sun Nov 21 08:52:45 2010 From: v+python at g.nevcal.com (Glenn Linderman) Date: Sat, 20 Nov 2010 23:52:45 -0800 Subject: [Python-Dev] Web servers, bytes, str, documentation, Python 3.2a4 In-Reply-To: <4CE8111F.9060502@g.nevcal.com> References: <4CE7452A.7050109@g.nevcal.com> <4CE7B34D.4020309@netwok.org> <4CE8111F.9060502@g.nevcal.com> Message-ID: <4CE8CFCD.4040906@g.nevcal.com> On 11/20/2010 10:19 AM, Glenn Linderman wrote: > Oops. Yes, that fixes the problem with creation of the temp file, > thanks for catching that. I now get a complete report of the > original error in the temp file (below). I am a bit less confused > now... but it seems that there are still a number of issues. 
Here is > an enumeration of problems I was hard pressed to make before you > removed my confusion on this issue. Related issues, regarding binary stream requirements for cgi interface. Perhaps the cgi module should have the API to set binary mode. http://bugs.python.org/issue1610654 http://bugs.python.org/issue8077 http://bugs.python.org/issue4953 Sadly, cgi.py input handling seems to depend on the email module, thought to be fixed for 3.2, but it is not clear if that has been achieved, or if the surrogate encode workaround is sufficient for this. More testing needed, but I don't have such a test case developed yet. > 1. cgitb should expect to report to a binary stdout, using whatever > encoding (possibly ASCII) that seems appropriate for the output that > in generates. Maybe cgi.py should have an API to set the stdin and stdout to binary streams. Although cgi.py deals more with stdin than stdout, cgitb deals more with stdout. Created http://bugs.python.org/issue10479 > > 2. Some appropriate documentation or API or both should be provided to > enable a script to set "binary" mode for stdout for CGI scripts. This > link > > demonstrates the confusion (wish I had found it earlier) that is > encountered by such lack. One must tell msvcrt the stream is binary > (I had figured that out early on), one must also sidestep the use of > the cp1252 default when printing binary, one must also choose a proper > text encoding corresponding to the HTTP headers sent. My second email > in this thread, sent a few hours after the first, shows a convenient > set of cures for all but msvcrt (as long as only "write" is used for > writing. "print" support could be added, similarly). Likely > something along this line is needed for stdin as well, I haven't yet > experimented with uploading binary content to a CGI. 
> > One could speculate about having the Python runtime auto-detect CGI > mode, but I don't know of any foolproof technique for that, and the > selection of the "proper" text encoding depends on the details of the > CGI, so having instead an API or two that assists with doing this sort > of thing would be better; the need for documentation, at least, seems > imperative. Created http://bugs.python.org/issue10480 > > 3. subprocess documentation could be improved to point out that when > using subprocess.PIPE to talk to a Python subprocess, that the > communications will be in binary. Again, I don't know of any way to > autodetect the subprocess environment, but if it were possible to > select an appropriate encoding and use it consistently on both sides > of the PIPE, that would be a convenience to its use; if not possible, > documenting the issue, and providing an API to use to easily select > such encodings both in client and server, would be helpful. > > While the layers are all there, and ".buffer" is documented for > TextIOWrapper, the use of sys.stdout.buffer and the fact that it has a > full set of operations isn't immediately obvious from the reference > material; perhaps it is in a tutorial I haven't found, but... I was > looking, and didn't find it. > > Of course, subprocess may launch non-Python programs; they will have > their own ideas of binary vs text encoding, so it is important that it > is convenient to match them on the Python side. > > It would be nice if subprocess had a mechanism for providing > no-deadlock stdout data to the parent prior to the child terminating. > A CGI implementation via subprocess shouldn't accumulate all of stdout > (or all of stderr, for that matter, although less important). I don't > (yet) know enough about Python threading to know if this is possible, > but it certainly would be useful. http://bugs.python.org/issue1048 for subprocess to document that communicate produces byte stream output. 
http://bugs.python.org/issue10482 for subprocess enhancements to handle more cases without deadlock. Found http://bugs.python.org/issue4571 which documents how to switch stdin/stdout/stderr to binary mode, and even back! I couldn't track the documented change to the actual documentation, though, but I did find it in section 26.1, under the documentation for the three stdio streams: def make_streams_binary(): sys.stdin = sys.stdin.detach() sys.stdout = sys.stdout.detach() > 4. http.server has a number of bugs and limitations. > 4a. _url_collapse_path_split seems inefficient (although I have to > benchmark it against what I think would be more efficient), and for > its only use within http.server it produces the wrong information, so > the information has to be recombined and resplit to make it function > properly, adding to the perception of inefficiency. > 4b. Detection of "executable" on Windows is simply wrong. Unix > execution bits do not exist. http://bugs.python.org/issue10483 for 4b. > 4c. is_cgi doesn't properly handle PATHINFO parts of the path, this is > the other half of 4a. The Python2.x CGIHTTPServer.py had this right, > but the introduction and use of _url_collapse_path_split broke it. http://bugs.python.org/issue10484 for 4a and 4c. > 4d. Searching for a ? to find an explicit query string should use > .find('?') rather than .rfind('?') as there is no prohibition on using > '?' within a query string, AFAIK. http://bugs.python.org/issue10485 for 4d. > 4e. doesn't set the REQUEST_URI, HTTP_HOST, or HTTP_PORT environment > variables for the CGI. http://bugs.python.org/issue10486 for 4e. > 4f. Should not send the 200 response until it sees if the CGI sends a > Status: header. http://bugs.python.org/issue10487 for 4f and 4g. > 4g. Should not buffer all of stdout: subprocess.communicate is > inappropriate for a web server CGI interface. The data should stream > through to avoid consuming inordinate amounts of memory. 
The only > solution within the current limitations of subprocess is to abandon > stderr, force the CGI to do its own error logging, and use > shutil.copyfileobj to hook up p.stdout to self.wfile once the Status: > message processing has happened. > 4h. Doesn't seem to close p.stdin (I'm not sure if that is necessary, > it may happen when p is garbage collected, but effort was made to > close p.stdout and p.stderr, which seem similar.) Discovered that subprocess.communicate closes p.stdin, so it wasn't needed until I quit using .communicate in my version of the code. -------------- next part -------------- An HTML attachment was scrubbed... URL: From stephen at xemacs.org Sun Nov 21 13:55:12 2010 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Sun, 21 Nov 2010 21:55:12 +0900 Subject: [Python-Dev] len(chr(i)) = 2? In-Reply-To: <4CE78F62.7060707@v.loewis.de> References: <201011192123.14169.victor.stinner@haypocalc.com> <4CE6F93F.9010109@egenix.com> <4CE6FE30.1050903@v.loewis.de> <87hbfc1vnf.fsf@uwakimon.sk.tsukuba.ac.jp> <4CE78F62.7060707@v.loewis.de> Message-ID: <8739qukf9r.fsf@uwakimon.sk.tsukuba.ac.jp> "Martin v. L?wis" writes: > Am 20.11.2010 05:11, schrieb Stephen J. Turnbull: > > "Martin v. L?wis" writes: > > > > > The term "UCS-2" is a character set that can encode only encode 65536 > > > characters; it thus refers to Unicode 1.1. According to the Unicode > > > Consortium's FAQ, the term UCS-2 should be avoided these days. > > > > So what do you propose we call the Python implementation? > > A technical correct description would be to say that Python uses either > 16-bit code units or 32-bit code units; for brevity, these can be called > narrow and wide code units. I agree that's technically correct. Unfortunately, it's also useless to anybody who doesn't already know more about Unicode than anybody should have to know. > > and therefore is not UTF-16 conforming. > > I disagree. Python does "conform" to "UTF-16" I'm sure the codecs do. 
But the Unicode standard doesn't care about the parts of the process, it cares about what it does as a whole. Python's internal coding does not conform to UTF-16, and that internal coding can, under certain conditions, escape to the outside world as invalid "Unicode" output. > > AFAIK this was not supposed to change in Python 3; indexing and > > slicing go by code unit (isomorphic to UCS-n), not character, and due > > to PEP 383 4-octet builds do not conform (internally) to UTF-32, and > > can produce output that conforms to Unicode not at all (as a user > > option, of course, but it's still non-conformant). > > What behavior specifically do you consider non-conforming, and what > specific specification do you think it is not conforming to? For > example, it *is* fully conforming with UTF-8. Oh, f = open('/tmp/broken','wt',encoding='utf8',errors='surrogateescape') f.write(chr(int('dc80',16))) f.close() for one. That produces a non-UTF-8 file in a 32-bit-code-unit build. You can say, "oh, but that's not really a UTF-8 codec", and I'd agree. Nevertheless, the program is able to produce output from internal "Unicode" strings that does not conform to Unicode at all. A Unicode- conforming Python implementation would error at the chr() call, or perhaps would not provide surrogateescape error handlers. It is, of course, possible to write Python programs that conform (and easier than in any other language I know), but Python itself does not conform to post-1.1 Unicode standards. Too bad for the standards: "Although practicality beats purity." The point is that internal code is *not* UTF-16 (or -32), but it *is* isomorphic to UCS-2 (or -4). *That is very useful information to users*, it's not a technical detail of interest only to Unicode geeks. It means that if you stick to defined characters in the BMP when giving Python input, then slicing and indexing unicode (Python 2) or str (Python 3) objects gives only valid output even in builds with 16-bit code units. 
OTOH, invalid processing (involving functions like 'chr' or input using surrogateescape codecs) can lead to invalid output even in builds with 32-bit code units. IMO, saying "UCS-2" or "UCS-4" tells ordinary developers most of what they need to know about the limitations of their Python vis-a-vis full conformance, at least with respect to the string manipulation functions. From rdmurray at bitdance.com Sun Nov 21 18:18:20 2010 From: rdmurray at bitdance.com (R. David Murray) Date: Sun, 21 Nov 2010 12:18:20 -0500 Subject: [Python-Dev] Web servers, bytes, str, documentation, Python 3.2a4 In-Reply-To: <4CE8CFCD.4040906@g.nevcal.com> References: <4CE7452A.7050109@g.nevcal.com> <4CE7B34D.4020309@netwok.org> <4CE8111F.9060502@g.nevcal.com> <4CE8CFCD.4040906@g.nevcal.com> Message-ID: <20101121171821.195552194AC@kimball.webabinitio.net> On Sat, 20 Nov 2010 23:52:45 -0800, Glenn Linderman wrote: > Sadly, cgi.py input handling seems to depend on the email module, > thought to be fixed for 3.2, but it is not clear if that has been > achieved, or if the surrogate encode workaround is sufficient for this. > More testing needed, but I don't have such a test case developed yet. Indeed, this should theoretically be fixable now. The email module is now perfectly capable of both consuming and producing binary data. The user of the module doesn't need to care how this was achieved unless they want to do processing of non-RFC conformant data. I want to look at the CGI issue, but I'm not sure when I'll get to it. -- R. David Murray www.bitdance.com From jcea at jcea.es Sun Nov 21 18:27:42 2010 From: jcea at jcea.es (Jesus Cea) Date: Sun, 21 Nov 2010 18:27:42 +0100 Subject: [Python-Dev] Mercurial Schedule In-Reply-To: <4CE2CF8F.4040500@jcea.es> References: <4CE2CF8F.4040500@jcea.es> Message-ID: <4CE9568E.4010102@jcea.es> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 What is the impact in the buildbot architecture?. Slaves must do anything?. 
At least they need to have mercurial installed, I guess. What, as a buildslave manager, must I do to ready my server for the migration?. - -- Jesus Cea Avion _/_/ _/_/_/ _/_/_/ jcea at jcea.es - http://www.jcea.es/ _/_/ _/_/ _/_/ _/_/ _/_/ jabber / xmpp:jcea at jabber.org _/_/ _/_/ _/_/_/_/_/ . _/_/ _/_/ _/_/ _/_/ _/_/ "Things are not so easy" _/_/ _/_/ _/_/ _/_/ _/_/ _/_/ "My name is Dump, Core Dump" _/_/_/ _/_/_/ _/_/ _/_/ "El amor es poner tu felicidad en la felicidad de otro" - Leibniz -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iQCVAwUBTOlWjplgi5GaxT1NAQKwJAP/W1w/mn3Jv9XECxGCLKFj1Xvjz4fKq8im e1oKpvrl5hzXfKfYtIC4K2fy5G4O3iP1gS/Iwy0iGSSqcpnxFIfpwcTpjigRGaBi rpZp956TosaSLTGZxS2Wb11KFxsGlhAcgVF2ooFF7Z+wL73wCyVjfUqMXCB/50Nr dztlJuv3Wvg= =ntFy -----END PGP SIGNATURE----- From rdmurray at bitdance.com Sun Nov 21 18:38:25 2010 From: rdmurray at bitdance.com (R. David Murray) Date: Sun, 21 Nov 2010 12:38:25 -0500 Subject: [Python-Dev] len(chr(i)) = 2? In-Reply-To: <8739qukf9r.fsf@uwakimon.sk.tsukuba.ac.jp> References: <201011192123.14169.victor.stinner@haypocalc.com> <4CE6F93F.9010109@egenix.com> <4CE6FE30.1050903@v.loewis.de> <87hbfc1vnf.fsf@uwakimon.sk.tsukuba.ac.jp> <4CE78F62.7060707@v.loewis.de> <8739qukf9r.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <20101121173825.B1BFB235977@kimball.webabinitio.net> On Sun, 21 Nov 2010 21:55:12 +0900, "Stephen J. Turnbull" wrote: > "Martin v. L??wis" writes: > > Am 20.11.2010 05:11, schrieb Stephen J. Turnbull: > > > "Martin v. L??wis" writes: > > > > > > > The term "UCS-2" is a character set that can encode only encode 65536 > > > > characters; it thus refers to Unicode 1.1. According to the Unicode > > > > Consortium's FAQ, the term UCS-2 should be avoided these days. > > > > > > So what do you propose we call the Python implementation? 
> > > > A technical correct description would be to say that Python uses either > > 16-bit code units or 32-bit code units; for brevity, these can be called > > narrow and wide code units. > > I agree that's technically correct. Unfortunately, it's also useless > to anybody who doesn't already know more about Unicode than anybody > should have to know. [...] > The point is that internal code is *not* UTF-16 (or -32), but it *is* > isomorphic to UCS-2 (or -4). *That is very useful information to > users*, it's not a technical detail of interest only to Unicode geeks. > It means that if you stick to defined characters in the BMP when > giving Python input, then slicing and indexing unicode (Python 2) or > str (Python 3) objects gives only valid output even in builds with > 16-bit code units. OTOH, invalid processing (involving functions like > 'chr' or input using surrogateescape codecs) can lead to invalid > output even in builds with 32-bit code units. > > IMO, saying "UCS-2" or "UCS-4" tells ordinary developers most of what > they need to know about the limitations of their Python vis-a-vis full > conformance, at least with respect to the string manipulation functions. I'm sorry, but I have to disagree. As a relative unicode ignoramus, "UCS-2" and "UCS-4" convey almost no information to me, and the bits I have heard about them on this list have only confused me. On the other hand, I understand that 'narrow' means that fewer bytes are used for each internal character, meaning that some unicode characters need to be represented by more than one string element, and thus that slicing strings containing such characters on a narrow build causes problems. Now, you could tell me the same information using the terms 'UCS-2' and 'UCS-4' instead of 'narrow' and 'wide', but to my ear 'narrow' and 'wide' convey a better gut level feeling for what is going on than 'UCS-2' and 'UCS-4' do. 
And it avoids any question of whether or not Python's internal representation actually conforms to whatever standard it is that UCS refers to, a point on which there seems to be some dissension. Having written the above, I googled for UCS-2 and got the Wikipedia article on UTF16/UCS-2 [1]. Scanning that article, I do not see anything that would clue me in to the problems of slicing strings in a Python narrow build. Indeed, reading that article with my limited unicode knowledge, if I were told Python used UCS-2, I would assume that non-BMP characters could not be processed by a Python narrow build. -- R. David Murray www.bitdance.com [1] http://en.wikipedia.org/wiki/UTF-16/UCS-2 From g.brandl at gmx.net Sun Nov 21 18:58:53 2010 From: g.brandl at gmx.net (Georg Brandl) Date: Sun, 21 Nov 2010 18:58:53 +0100 Subject: [Python-Dev] Mercurial Schedule In-Reply-To: <4CE9568E.4010102@jcea.es> References: <4CE2CF8F.4040500@jcea.es> <4CE9568E.4010102@jcea.es> Message-ID: Am 21.11.2010 18:27, schrieb Jesus Cea: > What is the impact in the buildbot architecture?. Slaves must do > anything?. At least they need to have mercurial installed, I guess. > > What, as a buildslave manager, must I do to ready my server for the > migration?. Apart from having Mercurial installed and "hg" in the PATH (that will be important for Windows I assume), I don't think anything else is required. Georg From raymond.hettinger at gmail.com Sun Nov 21 19:17:57 2010 From: raymond.hettinger at gmail.com (Raymond Hettinger) Date: Sun, 21 Nov 2010 10:17:57 -0800 Subject: [Python-Dev] len(chr(i)) = 2? 
In-Reply-To: <20101121173825.B1BFB235977@kimball.webabinitio.net>
References: <201011192123.14169.victor.stinner@haypocalc.com> <4CE6F93F.9010109@egenix.com> <4CE6FE30.1050903@v.loewis.de> <87hbfc1vnf.fsf@uwakimon.sk.tsukuba.ac.jp> <4CE78F62.7060707@v.loewis.de> <8739qukf9r.fsf@uwakimon.sk.tsukuba.ac.jp> <20101121173825.B1BFB235977@kimball.webabinitio.net>
Message-ID: <60F8726F-C1C2-4803-8B8E-688EF0443FA0@gmail.com>

On Nov 21, 2010, at 9:38 AM, R. David Murray wrote:
>
> I'm sorry, but I have to disagree. As a relative unicode ignoramus,
> "UCS-2" and "UCS-4" convey almost no information to me, and the bits I
> have heard about them on this list have only confused me.

From the user's point of view, it doesn't much matter which encoding
is used internally.

Neither UTF-16 nor UCS-2 is exactly correct anyway. The former encodes
the entire range of unicode characters in a variable length code (a
character is usually 2 bytes but is sometimes 4 bytes long). The
latter encodes only a subset of unicode (the basic multilingual plane)
in a fixed-length code of 2 bytes per character.

What we use internally looks like utf-16 but a character encoded with
4 bytes is treated as two 2-byte characters (hence the subject of this
thread). Our hybrid internal coding lets us handle the entire range of
unicode while getting speed and simplicity by doing len() and slicing
with a surrogate pair being treated as two separate characters.

For the "wide" build, the entire range of unicode is encoded at 4 bytes
per character and slicing/len operate correctly since every character
is the same length. This used to be called UCS-4 and is now UTF-32.

So, with "wide" builds there isn't much confusion (except perhaps
unfamiliar terminology). The real issue seems to be that for "narrow"
builds, none of the usual encoding names is exactly correct.

From a user's point of view, the actual encoding or encoding name
doesn't matter much.
They just need to be able to predict the relevant behaviors (memory consumption and len/slicing behavior). For the narrow build, that behavior is: - Characters in the BMP consume 2 bytes and count as one char for purposes of len and slicing. - Characters above the BMP consume 4 bytes and counts as two distinct chars for purpose of len and slicing. For wide builds, all characters are 4 bytes and count as a single char for len and slicing. Hope this helps, Raymond From martin at v.loewis.de Sun Nov 21 19:51:44 2010 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sun, 21 Nov 2010 19:51:44 +0100 Subject: [Python-Dev] len(chr(i)) = 2? In-Reply-To: <8739qukf9r.fsf@uwakimon.sk.tsukuba.ac.jp> References: <201011192123.14169.victor.stinner@haypocalc.com> <4CE6F93F.9010109@egenix.com> <4CE6FE30.1050903@v.loewis.de> <87hbfc1vnf.fsf@uwakimon.sk.tsukuba.ac.jp> <4CE78F62.7060707@v.loewis.de> <8739qukf9r.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <4CE96A40.1050705@v.loewis.de> > > I disagree. Python does "conform" to "UTF-16" > > I'm sure the codecs do. But the Unicode standard doesn't care about > the parts of the process, it cares about what it does as a whole. Chapter and verse? > Python's internal coding does not conform to UTF-16, and that internal > coding can, under certain conditions, escape to the outside world as > invalid "Unicode" output. I'm fairly certain there are provisions in the Unicode standard for such behavior (taking into account "certain conditions"). > > What behavior specifically do you consider non-conforming, and what > > specific specification do you think it is not conforming to? For > > example, it *is* fully conforming with UTF-8. > > Oh, > > f = open('/tmp/broken','wt',encoding='utf8',errors='surrogateescape') > f.write(chr(int('dc80',16))) > f.close() > > for one. That produces a non-UTF-8 file Right. You are using an API that does not promise to create UTF-8, and hence isn't UTF-8. 
The Unicode standard certainly allows implementations to use character encoding schemes other than UTF-8; this one being "UTF-8 with surrogate escapes", which is different from "UTF-8" (IANA MIBEnum 106). > You can say, "oh, but that's not really a UTF-8 codec", and I'd agree. See above :-) > Nevertheless, the program is able to produce output from internal > "Unicode" strings that does not conform to Unicode at all. *Any* Unicode implementation will do that, since they all have to support legacy encodings in some form. This is certainly conforming to the Unicode standard, and in fact one of the primary Unicode design principles. > A Unicode- > conforming Python implementation would error at the chr() call, or > perhaps would not provide surrogateescape error handlers. Chapter and verse? > "Although practicality beats purity." The Unicode standard itself is based on practicality. It wouldn't have received the success it did if it was based on purity only (and indeed, was often rejected in cases where it put purity over practicality, e.g. with the Hangul syllables). Regards, Martin From rdmurray at bitdance.com Sun Nov 21 20:29:15 2010 From: rdmurray at bitdance.com (R. David Murray) Date: Sun, 21 Nov 2010 14:29:15 -0500 Subject: [Python-Dev] len(chr(i)) = 2? In-Reply-To: <60F8726F-C1C2-4803-8B8E-688EF0443FA0@gmail.com> References: <201011192123.14169.victor.stinner@haypocalc.com> <4CE6F93F.9010109@egenix.com> <4CE6FE30.1050903@v.loewis.de> <87hbfc1vnf.fsf@uwakimon.sk.tsukuba.ac.jp> <4CE78F62.7060707@v.loewis.de> <8739qukf9r.fsf@uwakimon.sk.tsukuba.ac.jp> <20101121173825.B1BFB235977@kimball.webabinitio.net> <60F8726F-C1C2-4803-8B8E-688EF0443FA0@gmail.com> Message-ID: <20101121192915.0FFE1209B7A@kimball.webabinitio.net> On Sun, 21 Nov 2010 10:17:57 -0800, Raymond Hettinger wrote: > On Nov 21, 2010, at 9:38 AM, R. David Murray wrote: > > I'm sorry, but I have to disagree. 
As a relative unicode ignoramus,
> > "UCS-2" and "UCS-4" convey almost no information to me, and the bits I
> > have heard about them on this list have only confused me.
[...]
> From a users point-of-view, the actual encoding or encoding name
> doesn't matter much. They just need to be able to predict the relevant
> behaviors (memory consumption and len/slicing behavior).
>
> For the narrow build, that behavior is:
> - Characters in the BMP consume 2 bytes and count as one char
>   for purposes of len and slicing.
> - Characters above the BMP consume 4 bytes and counts as
>   two distinct chars for purpose of len and slicing.
>
> For wide builds, all characters are 4 bytes and count as a single
> char for len and slicing.
>
> Hope this helps,

Thank you, that nicely summarizes and confirms what I thought I knew
about wide versus narrow build. And as I said, using the names
UCS-2/UCS-4 would only *confuse* that understanding, not clarify it.

--
R. David Murray                                      www.bitdance.com

From alexander.belopolsky at gmail.com Sun Nov 21 23:13:22 2010
From: alexander.belopolsky at gmail.com (Alexander Belopolsky)
Date: Sun, 21 Nov 2010 17:13:22 -0500
Subject: [Python-Dev] len(chr(i)) = 2?
In-Reply-To: <4CE6EF91.1040803@v.loewis.de>
References: <201011192123.14169.victor.stinner@haypocalc.com> <4CE6EF91.1040803@v.loewis.de>
Message-ID:

On Fri, Nov 19, 2010 at 4:43 PM, "Martin v. Löwis" wrote:
>> In my opinion, the question is more what was it not fixed in Python2. I suppose
>> that the answer is something ugly like "backward compatibility" or "historical
>> reasons" :-)
>
> No, there was a deliberate decision to not support that, see
>
> http://www.python.org/dev/peps/pep-0261/
>
> There had been a long discussion on this specific detail when PEP 261
> was written, and in the end, an explicit, deliberate, considered
> decision was made to raise a ValueError.
>

Yes, the existence of PEP 261 was one of the reasons I was surprised
that a change like this was made without a deliberation.
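
[Editor's sketch, not part of the original message. The narrow-build
rule quoted earlier in this thread (BMP characters count as one unit,
characters above the BMP as two) can be checked on any interpreter by
counting 16-bit code units by hand. The helper name utf16_units is
hypothetical, and the sketch assumes a Python where str is indexed by
code point, so that len() shows the wide-build view.]

```python
def utf16_units(s):
    """Count the 16-bit code units needed for s.

    This is the figure the thread says len() reports on a narrow build:
    one unit per BMP character, two per character above U+FFFF.
    """
    return sum(2 if ord(ch) > 0xFFFF else 1 for ch in s)


for sample in ("A", "\u20ac", "\U00010000", "a\U0001F600b"):
    # ascii() keeps the output printable whatever the console encoding is
    print(ascii(sample), "code points:", len(sample),
          "16-bit units:", utf16_units(sample))
```

[The two counts differ only for strings that contain characters outside
the BMP, which is exactly the case the subject line is about.]
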
Personally, I've never used chr() or ord() other than on the python command prompt. Processing text one character at a time is just too slow in Python. So for my own use cases, the change is quite welcome. I also find that with bytes() items being int in 3.x more or less removes the need for ord(). On the other hand any 2.x program that uses unichr() and ord() is very likely to exhibit subtly buggy behavior when ported to 3.x. I don't think len(chr(i)) = 2 is likely to cause problems, but map(ord, s) not being an iterator over code points is likely to break naive programs. This is especially true because as far as I can tell there is no easy way to iterate over code points in a Python string on a narrow build. From merwok at netwok.org Mon Nov 22 01:54:34 2010 From: merwok at netwok.org (=?UTF-8?B?w4lyaWMgQXJhdWpv?=) Date: Mon, 22 Nov 2010 01:54:34 +0100 Subject: [Python-Dev] [Python-checkins] r86633 - in python/branches/py3k: Doc/library/inspect.rst Doc/whatsnew/3.2.rst Lib/inspect.py Lib/test/test_inspect.py Misc/NEWS In-Reply-To: <20101121034404.52924F20A@mail.python.org> References: <20101121034404.52924F20A@mail.python.org> Message-ID: <4CE9BF4A.1020302@netwok.org> > Author: nick.coghlan > New Revision: 86633 > > Issue #10220: Add inspect.getgeneratorstate(). Initial patch by Rodolpho Eckhardt > > Modified: python/branches/py3k/Doc/library/inspect.rst > ============================================================================== > --- python/branches/py3k/Doc/library/inspect.rst (original) > +++ python/branches/py3k/Doc/library/inspect.rst Sun Nov 21 04:44:04 2010 > @@ -620,3 +620,25 @@ > # in which case the descriptor itself will > # have to do > pass > + > +Current State of a Generator > +---------------------------- > + > +When implementing coroutine schedulers and for other advanced uses of > +generators, it is useful to determine whether a generator is currently > +executing, is waiting to start or resume or execution, or has already > +terminated. 
:func:`getgeneratorstate` allows the current state of a
> +generator to be determined easily.
> +
> +.. function:: getgeneratorstate(generator)
> +
> +   Get current state of a generator-iterator.
> +
> +   Possible states are:
> +     GEN_CREATED: Waiting to start execution.
> +     GEN_RUNNING: Currently being executed by the interpreter.
> +     GEN_SUSPENDED: Currently suspended at a yield expression.
> +     GEN_CLOSED: Execution has completed.

I wonder if those shouldn't be marked up as :data: or something to make
them indexed.

From v+python at g.nevcal.com Mon Nov 22 04:59:54 2010
From: v+python at g.nevcal.com (Glenn Linderman)
Date: Sun, 21 Nov 2010 19:59:54 -0800
Subject: [Python-Dev] Web servers, bytes, str, documentation, Python 3.2a4
In-Reply-To: <20101121171821.195552194AC@kimball.webabinitio.net>
References: <4CE7452A.7050109@g.nevcal.com> <4CE7B34D.4020309@netwok.org> <4CE8111F.9060502@g.nevcal.com> <4CE8CFCD.4040906@g.nevcal.com> <20101121171821.195552194AC@kimball.webabinitio.net>
Message-ID: <4CE9EABA.1090306@g.nevcal.com>

On 11/21/2010 9:18 AM, R. David Murray wrote:
> I want to look at the CGI issue, but I'm not sure when I'll get to it.

Actually, since this code was working before 3.x, and if email.parser
can now accept binary streams, it seems like maybe the only thing that
might be wrong is that presently it is getting a text stream instead, so
that is something cgi.py or the application program would have to
switch, and then maybe some testing would discover correctness, or maybe
a specification of UTF-8 as the encoding to use for the text parts would
have to be done.

From rdmurray at bitdance.com Mon Nov 22 05:39:57 2010
From: rdmurray at bitdance.com (R.
David Murray) Date: Sun, 21 Nov 2010 23:39:57 -0500 Subject: [Python-Dev] Web servers, bytes, str, documentation, Python 3.2a4 In-Reply-To: <4CE9EABA.1090306@g.nevcal.com> References: <4CE7452A.7050109@g.nevcal.com> <4CE7B34D.4020309@netwok.org> <4CE8111F.9060502@g.nevcal.com> <4CE8CFCD.4040906@g.nevcal.com> <20101121171821.195552194AC@kimball.webabinitio.net> <4CE9EABA.1090306@g.nevcal.com> Message-ID: <20101122043957.2A5D6235C7A@kimball.webabinitio.net> On Sun, 21 Nov 2010 19:59:54 -0800, Glenn Linderman wrote: > On 11/21/2010 9:18 AM, R. David Murray wrote: > > I want to look at the CGI issue, but I'm not sure when I'll get to it. > > Actually, since this code was working before 3.x, and if email.parser > can now accept binary streams, it seems like maybe the only thing that > might be wrong is that presently it is getting a text stream instead, so > that is something cgi.py or the application program would have to > switch, and then maybe some testing would discover correctness, or maybe > a specification of UTF-8 as the encoding to use for the text parts would > have to be done. Well, given the bytes/string split in Python3, code definitely has to be changed to make this work, since you have to explicitly call bytes processing routines (message_from_bytes, message_from_binary_file, BytesFeedparser, etc) to parse binary data, and likewise use BytesGenerator to emit binary data. -- R. David Murray www.bitdance.com From brian.curtin at gmail.com Mon Nov 22 06:14:24 2010 From: brian.curtin at gmail.com (Brian Curtin) Date: Sun, 21 Nov 2010 23:14:24 -0600 Subject: [Python-Dev] Bug week-end on the 20th-21st? In-Reply-To: <20101025220401.0406722b@pitrou.net> References: <20101023190828.47b7f03e@pitrou.net> <20101025153242.2FBEC219F92@kimball.webabinitio.net> <20101025220401.0406722b@pitrou.net> Message-ID: On Mon, Oct 25, 2010 at 15:04, Antoine Pitrou wrote: > On Mon, 25 Oct 2010 11:32:42 -0400 > "R. 
David Murray" wrote: > > On Mon, 25 Oct 2010 12:22:24 -0200, Rodrigo Bernardo Pimentel < > rbp at isnomore.net> wrote: > > >> Am 23.10.2010 19:08, schrieb Antoine Pitrou: > > >>> The first 3.2 beta is scheduled by Georg for November 13th. > > >>> What would you think of scheduling a bug week-end one week later, > that > > >>> is on November 20th and 21st? We would need enough core developers to > > >>> be available on #python-dev. > > > > > >FWIW, I'm +1, and I'll try to get the Sao Paulo users group to > participate. > > > > I think this is a great idea (both Antoine's initial suggestion and the > > idea of getting users groups to participate). > > > > I'll be around and able to participate that weekend except for evening > > US Eastern time. > > Ok, so 20th-21st of November it shall be! > > Regards > > Antoine. Although a few time zones are still celebrating Bug Weekend, it looks like at least 76 bugs got closed out [0]. Some of those happened thanks to a number of first time contributors. Thanks to everyone for their efforts! [0] http://bugs.python.org/issue?%40columns=title&%40columns=id&activity=from+2010-11-20+to+2010-11-22&%40columns=activity&%40sort=activity&%40group=priority&status=2&%40columns=status&%40pagesize=50&%40startwith=0&%40action=search -------------- next part -------------- An HTML attachment was scrubbed... URL: From stephen at xemacs.org Mon Nov 22 06:28:13 2010 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Mon, 22 Nov 2010 14:28:13 +0900 Subject: [Python-Dev] len(chr(i)) = 2? In-Reply-To: <4CE96A40.1050705@v.loewis.de> References: <201011192123.14169.victor.stinner@haypocalc.com> <4CE6F93F.9010109@egenix.com> <4CE6FE30.1050903@v.loewis.de> <87hbfc1vnf.fsf@uwakimon.sk.tsukuba.ac.jp> <4CE78F62.7060707@v.loewis.de> <8739qukf9r.fsf@uwakimon.sk.tsukuba.ac.jp> <4CE96A40.1050705@v.loewis.de> Message-ID: <87ipzqc4gi.fsf@uwakimon.sk.tsukuba.ac.jp> "Martin v. L?wis" writes: > Chapter and verse? 
Unicode 5.0, Chapter 3, verse C9: When a process generates a code unit sequence which purports to be in a Unicode character encoding form, it shall not emit ill-formed code sequences. I think anything called "UTF-8 something" is likely to be taken to "purport". Furthermore, users don't necessarily see which error handlers are being used. A user who specifies "utf8" as the output codec is likely to be rather surprised if non-UTF-8 is emitted because the app specified surrogateescape. Eg, consider a script which munges file descriptions into reasonable-length file names on Unix. Yes, technically the non-Unicode output is the app's fault, but I expect many users will put some blame on Python. I am in full agreement with you about the technicalities, but I am looking for ways to clue in users that (a) the technicalities matter, and (b) that Python does a *very* good job of making things as safe as possible without becoming unable to handle bytes. I think "wide" vs. "narrow" fails at both. It focuses on storage issues, which of course are important, but at the cost of ignoring the fact that for users of non-BMP characters 32-bit code units are much safer. Users who need non-BMP characters are relatively few, and at least at the present time most are painfully aware of the need to care for technicalities. I expect them to be pleasantly surprised by how easy it is to get reasonably safe behavior even from a 16-bit build. > > Python's internal coding does not conform to UTF-16, and that internal > > coding can, under certain conditions, escape to the outside world as > > invalid "Unicode" output. > > I'm fairly certain there are provisions in the Unicode standard for such > behavior (taking into account "certain conditions"). Sure. There's nothing in the Unicode standard that says you have to conform to it unless you claim to conform to it. So it is valid to say that Python's Unicode codecs without surrogateescape do conform. 
The point is that Python does not, even if all of the input is valid Unicode, because of the provision of surrogateescape and the lack of Unicode conformance-checking for certain internal functionality like chr() and slicing. You can say "we don't make any such claim", but IMO the distinction in question is too fine a point for most users, and requires a very large amount of Unicode knowledge (not to mention standards geekiness) to even understand the precise statement. "Unicode support" to users should mean that Python does the right thing, not that if you look hard enough in the documentation you will discover that Python doesn't claim to do the right thing even though in practice it mostly does. IMO, "UCS-2" is a pretty good description of what the user can leave up to Python in perfect safety. RDM's reply worries me a little, but I'll reply to his message separately. > *Any* Unicode implementation will do that, since they all have to > support legacy encodings in some form. This is certainly conforming to > the Unicode standard, and in fact one of the primary Unicode design > principles. No. Support for legacy encodings takes you outside of the realm of Unicode conformance by definition. Their names tell you that, however. "UTF-8 with surrogate escapes" on the other hand is an entirely different kettle of fish. It pretends to be UTF-8, but isn't. I think that users who give Python valid input should be able to expect valid output, but they can't. Chapter 3, verse C7: When a process purports not to modify the interpretation of a valid coded character sequence, it shall make no change to that coded character sequence other than the possible replacement of character sequences by their canonical-equivalent sequences, or the deletion of *noncharacter* code points. Sure, you can tell users the truth: "Python may modify your Unicode characters if you slice or index Unicode strings. It may even silently turn them into invalid codes which will eventually raise Errors." 
Then you are conformant, but why would anyone want to use such a program? If you tell them "UCS-2[sic] Python is safe to use with *no* extra care if you use only UCS-2 [or BMP] characters", suddenly Python looks very nice indeed again. "UCS-4" Python is even better; all you have to do is to avoid surrogateescape codecs. However, you're still vulnerable to hard-to-diagnose errors at the output stage in case of program bugs, because not enough checking of values is done by Python itself. > > A Unicode-conforming Python implementation would error at the > > chr() call, or perhaps would not provide surrogateescape error > > handlers. > > Chapter and verse? Chapter 3, verse C9 again. > > "Although practicality beats purity." > > The Unicode standard itself is based on practicality. It wouldn't > have received the success it did if it was based on purity only > (and indeed, was often rejected in cases where it put purity over > practicality, e.g. with the Hangul syllables). Python practicality is very different from Unicode practicality. From v+python at g.nevcal.com Mon Nov 22 06:40:22 2010 From: v+python at g.nevcal.com (Glenn Linderman) Date: Sun, 21 Nov 2010 21:40:22 -0800 Subject: [Python-Dev] is this a bug? no environment variables Message-ID: <4CEA0246.9080607@g.nevcal.com> In reviewing my notes from my experimentations with CGIHTTPServer (Python2.6) and then http.server (Python 3.2a4), I note one behavior I haven't reported as a bug, nor do I know where to start to figure it out, other than experimentally. The experiment: launching CGIHTTPServer without environment variables, by the simple expedient of using a batch file to unset all the existing environment variables, and then launching Python2.6 with CGIHTTPServer. So it failed early: random.py fails at line 110 (Python 2.6). 
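
[Editor's sketch, not part of the original message. The experiment
described above (starting Python with no environment variables) can be
repeated without a batch file by handing a child interpreter an explicit
environment. Only subprocess.run and sys.executable are used; the helper
name run_with_env and the probe command are illustrative assumptions.
Whether the child starts at all with an empty environment is platform
dependent: the subprocess documentation notes that a valid SystemRoot
can be required on Windows.]

```python
import os
import subprocess
import sys


def run_with_env(env, code="import random; print(random.random() < 1)"):
    """Run a child interpreter whose environment is exactly `env`.

    Returns (returncode, stdout, stderr); the two streams are bytes,
    because the pipes are opened without any text encoding.
    """
    proc = subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    return proc.returncode, proc.stdout, proc.stderr


if __name__ == "__main__":
    # Baseline: inherit the full environment of the parent process.
    print("full environment :", run_with_env(dict(os.environ))[0])
    # The experiment: no variables at all. The outcome is not asserted
    # here because it differs between platforms and Python versions.
    rc, _out, err = run_with_env({})
    print("empty environment:", rc, err[-200:])
```

[Starting from a copy of os.environ and deleting one key per run is one
way to search for the minimum set of variables the child needs.]
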
I suppose it is possible that some environment variables are used by
Python directly (but I can't seem to find a documented list of them)
although I would expect that usage to be optional, with fall-back
defaults when they don't exist. I suppose it is even possible that some
Windows APIs might depend on some environment variables, but I expected
that the registry had replaced such usage completely, by now, with the
environment variables mostly being a convenience tool for batch files,
or for optional, temporary alteration of particular settings.

If anyone knows of documentation listing what environment variables are
required by Python on Windows, I would appreciate a pointer, searches
and doc browsing having not turned it up.

I'll attempt to recreate the test situation later this week with Python
3.2a4, if no one responds, but the only debug technique I can think of
is to slowly remove environment variables until I find the minimum set
required to run http.server successfully for my tests with CGI files.

From stephen at xemacs.org Mon Nov 22 07:14:46 2010
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Mon, 22 Nov 2010 15:14:46 +0900
Subject: [Python-Dev] len(chr(i)) = 2?
In-Reply-To: <20101121173825.B1BFB235977@kimball.webabinitio.net>
References: <201011192123.14169.victor.stinner@haypocalc.com> <4CE6F93F.9010109@egenix.com> <4CE6FE30.1050903@v.loewis.de> <87hbfc1vnf.fsf@uwakimon.sk.tsukuba.ac.jp> <4CE78F62.7060707@v.loewis.de> <8739qukf9r.fsf@uwakimon.sk.tsukuba.ac.jp> <20101121173825.B1BFB235977@kimball.webabinitio.net>
Message-ID: <87hbf9dgvd.fsf@uwakimon.sk.tsukuba.ac.jp>

R. David Murray writes:

 > I'm sorry, but I have to disagree. As a relative unicode ignoramus,
 > "UCS-2" and "UCS-4" convey almost no information to me, and the bits I
 > have heard about them on this list have only confused me.

OK, point taken.
> On the other hand, I understand that 'narrow' means that fewer > bytes are used for each internal character, meaning that some > unicode characters need to be represented by more than one string > element, and thus that slicing strings containing such characters > on a narrow build causes problems. Now, you could tell me the same > information using the terms 'UCS-2' and 'UCS-4' instead of 'narrow' > and 'wide', but to my ear 'narrow' and 'wide' convey a better gut > level feeling for what is going on than 'UCS-2' and 'UCS-4' do. I think that is probably conditioned by your long experience with Python's Unicode features, specifically the knowledge that Python's Unicode strings are not arrays of characters, which often is referred to on this list. My guess is that very few newbies would know that, and it is not implied by "narrow". For example, both Emacs (for sure) and Perl (IIUC) index strings of variable-width character by characters (at great expense of performance in Emacs, at least), not as code units. > And it avoids any question of whether or not Python's internal > representation actually conforms to whatever standard it is that > UCS refers to, a point on which there seems to be some dissension. UCS-2 refers to ISO 10646, Annex 1 IIRC.[1] Anyway, it's somewhere in ISO 10646. I don't think there's actually dissension on conformance to UCS-2, as that's very easy to achieve. Rather, Guido explicitly pronounced that Python processes arrays of code units, not characters. My point is that if you pretend that Python is processing *characters* according to UCS-2 rules for characters, you'll always come to the same conclusion about what Python will do as if you use the technically correct terminology of code units. (At least for the BMP and UTF-16 private areas. 
There will necessarily be some confusion about surrogates, since in UCS-2 they are characters while in UTF-16 they're merely "code points", and the Unicode characters they represent can't be represented at all in UCS-2.) > Indeed, reading that article with my limited unicode knowledge, if > I were told Python used UCS-2, I would assume that non-BMP > characters could not be processed by a Python narrow build. Actually, I'm almost happy with that. That is, the precise formulation is "could not be processed *safely without extra care* by a Python narrow build." Specifically, AFAIK if you range check characters that have been indexed out of a string, or are located at slice boundaries, or produced by chr() or a surrogateescape input codec, you're safe. But practically speaking few apps will actually do those checks and therefore they are unsafe: processing non-BMP characters can easily lead to show-stopping Exceptions. It's very analogous to the kind of show-stopping "bad character in a header" exception that plagued Mailman for so long, and had to be fixed on a case-by-case basis. But the restriction to BMP characters is much more reasonable (at least for now) than RFC 822's restriction to ASCII! But evidently you take it much more stringently. So the question is, "what fraction of developers who think as you do would therefore be put off from using Python to build their applications?" If most would say "OK, we'll stick with BMP for now and use UCS-4 or some hack to deal with extended characters later -- it can't really be true that it's absolutely impossible to use non-BMP characters," I don't mind that misunderstanding. OTOH, yes, it would be bad if the use of "UCS-2" were to imply to more than a couple of developers that 16-bit builds of Python can't handle UTF-16 *at all*. Footnotes: [1] It simply says "we have a subset of the Unicode character set all of whose code points can be represented in 16 bits, excluding 0xFFFF." 
It goes on to define a private area, reserved for use by applications that will never be standardized, and it says that if you don't know what a code point in the character area is, don't change it (you can delete it, however). ISTR that a later Amendment added 0xFFFE to the short-list of non-characters. The surrogate area was taken out of the private area, so a UCS-2 application will simply consider each surrogate to be an unknown character and pass it through unchanged -- unless it deletes it, or inserts other characters between the code points of a surrogate pair. And that's why UCS-2 isn't UTF-16 conforming -- which is basically why Python isn't either. From martin at v.loewis.de Mon Nov 22 09:20:59 2010 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Mon, 22 Nov 2010 09:20:59 +0100 Subject: [Python-Dev] len(chr(i)) = 2? In-Reply-To: <87ipzqc4gi.fsf@uwakimon.sk.tsukuba.ac.jp> References: <201011192123.14169.victor.stinner@haypocalc.com> <4CE6F93F.9010109@egenix.com> <4CE6FE30.1050903@v.loewis.de> <87hbfc1vnf.fsf@uwakimon.sk.tsukuba.ac.jp> <4CE78F62.7060707@v.loewis.de> <8739qukf9r.fsf@uwakimon.sk.tsukuba.ac.jp> <4CE96A40.1050705@v.loewis.de> <87ipzqc4gi.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <4CEA27EB.8000104@v.loewis.de> > Unicode 5.0, Chapter 3, verse C9: > > When a process generates a code unit sequence which purports to be > in a Unicode character encoding form, it shall not emit ill-formed > code sequences. > > > A Unicode-conforming Python implementation would error at the > > > chr() call, or perhaps would not provide surrogateescape error > > > handlers. > > > > Chapter and verse? > > Chapter 3, verse C9 again. I agree that the surrogateescape error handler is non-conforming, but, as you say, it doesn't claim to, either (would your concern about utf-8 being misleading here been resolved if the thing had been called "utf-8b"?) 
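
[Editor's sketch, not part of the original message. The surrogateescape
behaviour argued over in this thread can be reproduced in memory, without
writing a file. Only str.encode and bytes.decode with the documented
"surrogateescape" error handler are used; the sample bytes and variable
names are illustrative.]

```python
raw = b"caf\x80"   # not valid UTF-8: 0x80 is a stray continuation byte

# Decoding with surrogateescape smuggles the bad byte in as U+DC80.
text = raw.decode("utf-8", "surrogateescape")

# Encoding with the same handler writes the original byte back out,
# so the result is again a byte string that is not UTF-8.
round_trip = text.encode("utf-8", "surrogateescape")

# The strict codec, by contrast, refuses to emit the lone surrogate.
try:
    text.encode("utf-8")
    strict_error = None
except UnicodeEncodeError as exc:
    strict_error = exc

print(ascii(text))
print(round_trip == raw)
print(type(strict_error).__name__)
```

[So the ill-formed output appears only when the error handler is
requested explicitly; the plain "utf-8" codec raises instead.]
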
More interestingly (and to the subject) is chr: how did you arrive at C9 banning Python3's definition of chr? This chr function puts the code sequence into well-formed UTF-16; that's the whole point of UTF-16. Regards, Martin From stephen at xemacs.org Mon Nov 22 11:47:09 2010 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Mon, 22 Nov 2010 19:47:09 +0900 Subject: [Python-Dev] len(chr(i)) = 2? In-Reply-To: <4CEA27EB.8000104@v.loewis.de> References: <201011192123.14169.victor.stinner@haypocalc.com> <4CE6F93F.9010109@egenix.com> <4CE6FE30.1050903@v.loewis.de> <87hbfc1vnf.fsf@uwakimon.sk.tsukuba.ac.jp> <4CE78F62.7060707@v.loewis.de> <8739qukf9r.fsf@uwakimon.sk.tsukuba.ac.jp> <4CE96A40.1050705@v.loewis.de> <87ipzqc4gi.fsf@uwakimon.sk.tsukuba.ac.jp> <4CEA27EB.8000104@v.loewis.de> Message-ID: <87fwutd49e.fsf@uwakimon.sk.tsukuba.ac.jp> "Martin v. L?wis" writes: > More interestingly (and to the subject) is chr: how did you arrive > at C9 banning Python3's definition of chr? This chr function puts > the code sequence into well-formed UTF-16; that's the whole point of > UTF-16. No, it doesn't, in the specific case of surrogate code points. In 3.1.2 from MacPorts on a iBook G4 and from Gentoo on AMD64, chr(0xd800) returns "\ud800". I don't know if that's by design (eg, so that it can be used in the implementation of the surrogateescape error handler) or a correctable oversight, but it's not conformant. From stephen at xemacs.org Mon Nov 22 11:48:42 2010 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Mon, 22 Nov 2010 19:48:42 +0900 Subject: [Python-Dev] len(chr(i)) = 2? 
In-Reply-To: <60F8726F-C1C2-4803-8B8E-688EF0443FA0@gmail.com>
References: <201011192123.14169.victor.stinner@haypocalc.com> <4CE6F93F.9010109@egenix.com> <4CE6FE30.1050903@v.loewis.de> <87hbfc1vnf.fsf@uwakimon.sk.tsukuba.ac.jp> <4CE78F62.7060707@v.loewis.de> <8739qukf9r.fsf@uwakimon.sk.tsukuba.ac.jp> <20101121173825.B1BFB235977@kimball.webabinitio.net> <60F8726F-C1C2-4803-8B8E-688EF0443FA0@gmail.com>
Message-ID: <87eiadd46t.fsf@uwakimon.sk.tsukuba.ac.jp>

Raymond Hettinger writes:

 > Neither UTF-16 nor UCS-2 is exactly correct anyway.

From a standards lawyer point of view, UCS-2 is exactly correct, as far
as I can tell upon rereading ISO 10646-1, especially Annexes H
("retransmitting devices") and Q ("UTF-16"). Annex Q makes it clear
that UTF-16 was intentionally designed so that Python-style processing
could be done in a UCS-2 context.

 > For the "wide" build, the entire range of unicode is encoded at
 > 4 bytes per character and slicing/len operate correctly since
 > every character is the same length. This used to be called UCS-4
 > and is now UTF-32.

That's inaccurate, I believe. UCS-4 is not a UTF, and doesn't satisfy
the range restrictions of a UTF.

 > So, with "wide" builds there isn't much confusion (except perhaps
 > unfamiliar terminology). The real issue seems to be that for
 > "narrow" builds, none of the usual encoding names is exactly
 > correct.

I disagree. I do see a problem with "UCS-2", because it fails to tell
us that Python implements a large number of features that make it easy
to do a very good job of working with non-BMP data in 16-bit builds of
Python, with no extra effort. Python is not perfect, and (rarely) some
of the imperfections may be very distressing. But it's very good, and
deserves to be advertised as such.

However, I don't see how "narrow" tells us more than "UCS-2" does. If
"UCS-2" is equally (or more) informative, I prefer it because it is the
technically precise, already well-defined, term.
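
[Editor's sketch, not part of the original message. Working with non-BMP
data held as 16-bit code units comes down to pairing surrogates. The
generator below turns a sequence of code units (plain ints) back into
code points. The name code_points is hypothetical, and passing unpaired
surrogates through unchanged is a choice made for this sketch, not
something the thread prescribes.]

```python
def code_points(units):
    """Yield code points from an iterable of 16-bit code units (ints)."""
    pending = None                      # high surrogate awaiting its partner
    for unit in units:
        if pending is not None:
            if 0xDC00 <= unit <= 0xDFFF:
                # Valid pair: combine the two 10-bit halves.
                yield 0x10000 + ((pending - 0xD800) << 10) + (unit - 0xDC00)
                pending = None
                continue
            yield pending               # unpaired high surrogate, pass through
            pending = None
        if 0xD800 <= unit <= 0xDBFF:
            pending = unit              # wait to see what follows
        else:
            yield unit                  # BMP code unit or lone low surrogate
    if pending is not None:
        yield pending                   # input ended on a high surrogate


print([hex(cp) for cp in code_points([0x41, 0xD83D, 0xDE00, 0x42])])
```

[A range check on the yielded values is enough to detect the unpaired
surrogates that this sketch lets through.]
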
> From a users point-of-view, the actual encoding or encoding name > doesn't matter much. They just need to be able to predict the relevant > behaviors (memory consumption and len/slicing behavior). "UCS-2" indicates those behaviors precisely and concisely. The problems are (a) the lack of familiarity of users with this term, if David is reasonably representative, and (b) the fact that it fails to advertise Python's UTF-16 capabilities. "Narrow" suffers from both of those problems, and further from the fact that it has no independent standard definition. Furthermore, "wide" has a very widespread, platform-dependent meaning derived from wchar_t. If we have to document what the terms we choose mean anyway, why not document the existing terms and reduce entropy, rather than invent new ones and increase entropy? From martin at v.loewis.de Mon Nov 22 12:22:35 2010 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Mon, 22 Nov 2010 12:22:35 +0100 Subject: [Python-Dev] len(chr(i)) = 2? In-Reply-To: <87fwutd49e.fsf@uwakimon.sk.tsukuba.ac.jp> References: <201011192123.14169.victor.stinner@haypocalc.com> <4CE6F93F.9010109@egenix.com> <4CE6FE30.1050903@v.loewis.de> <87hbfc1vnf.fsf@uwakimon.sk.tsukuba.ac.jp> <4CE78F62.7060707@v.loewis.de> <8739qukf9r.fsf@uwakimon.sk.tsukuba.ac.jp> <4CE96A40.1050705@v.loewis.de> <87ipzqc4gi.fsf@uwakimon.sk.tsukuba.ac.jp> <4CEA27EB.8000104@v.loewis.de> <87fwutd49e.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <4CEA527B.4030002@v.loewis.de> Am 22.11.2010 11:47, schrieb Stephen J. Turnbull: > "Martin v. L?wis" writes: > > > More interestingly (and to the subject) is chr: how did you arrive > > at C9 banning Python3's definition of chr? This chr function puts > > the code sequence into well-formed UTF-16; that's the whole point of > > UTF-16. > > No, it doesn't, in the specific case of surrogate code points. In > 3.1.2 from MacPorts on a iBook G4 and from Gentoo on AMD64, > chr(0xd800) returns "\ud800". 
Ah, I see - this is *not* the subject's issue, right? > > I don't know if that's by design (eg, so that it can be used in the > implementation of the surrogateescape error handler) or a correctable > oversight, but it's not conformant. I disagree: Quoting from Unicode 5.0, section 5.4: # The individual components of implementations may have different # levels of support for surrogates, as long as those components are # assembled and communicate correctly. Low-level string processing, # where a Unicode string is not interpreted but is handled simply as an # array of code units, may ignore surrogate pairs. With such strings, # for example, a truncation operation with an arbitrary offset might # break a surrogate pair. (For further discussion, see Section 2.7, # Unicode Strings.) For performance in string operations, such behavior # is reasonable at a low level, but it requires higher-level processes # to ensure that offsets are on character boundaries so as to guarantee # the integrity of surrogate pairs. So lower-level routines (which I claim chr() is one) are allowed to create lone surrogates. The formal requirement behind this is C1: # A process shall not interpret a high-surrogate code point or a # low-surrogate code point as an abstract character. I also claim that Python, in both narrow and wide mode, conforms to this requirement. Notice that the requirement is a ban on interpreting the code point as a character. In particular, unicodedata.category claims that the code point is of class Cs (surrogate), which I consider conforming. By the same line of reasoning, it is also OK that chr() allows the creation of unassigned code points, even though C2 says that they must not be interpreted as abstract characters. 
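The distinction drawn above (chr() may create a lone surrogate, but nothing interprets it as a character) can be checked from the interpreter. This is only a minimal sketch, assuming a Python 3 interpreter; the helper names `describe` and `encodes_as_strict_utf8` are invented for illustration and are not part of any module discussed in this thread.

```python
import unicodedata


def describe(code_point):
    """Return (length of chr(code_point), Unicode general category)."""
    ch = chr(code_point)
    return len(ch), unicodedata.category(ch)


def encodes_as_strict_utf8(text):
    """True if text can be encoded with the default (strict) UTF-8 codec."""
    try:
        text.encode("utf-8")
        return True
    except UnicodeEncodeError:
        return False


# chr() happily creates a lone surrogate; the UCD classifies it as a
# surrogate (Cs), not as a character.
print(describe(0xD800))

# The strict UTF-8 codec refuses to interpret it; an explicit error
# handler is needed to pass it through.
print(encodes_as_strict_utf8(chr(0xD800)))
print(chr(0xD800).encode("utf-8", "surrogatepass"))

# A code point that is unassigned in current Unicode versions can be
# created as well; its category depends on the UCD version in use.
print(describe(0x0378))
```

The point of the sketch is that creating the code point and interpreting it are separate steps: the first always succeeds, the second is where higher-level routines such as codecs apply their checks.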
The rationale for supporting these characters in chr() goes back much further than the surrogateescape handler - as Python unicode strings are sequences of code points, it would be impractical if you couldn't create some of them, or even would have to consult the UCD before determining whether they can be created. Regards, Martin From martin at v.loewis.de Mon Nov 22 12:43:00 2010 From: martin at v.loewis.de (=?windows-1252?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Mon, 22 Nov 2010 12:43:00 +0100 Subject: [Python-Dev] len(chr(i)) = 2? In-Reply-To: <87eiadd46t.fsf@uwakimon.sk.tsukuba.ac.jp> References: <201011192123.14169.victor.stinner@haypocalc.com> <4CE6F93F.9010109@egenix.com> <4CE6FE30.1050903@v.loewis.de> <87hbfc1vnf.fsf@uwakimon.sk.tsukuba.ac.jp> <4CE78F62.7060707@v.loewis.de> <8739qukf9r.fsf@uwakimon.sk.tsukuba.ac.jp> <20101121173825.B1BFB235977@kimball.webabinitio.net> <60F8726F-C1C2-4803-8B8E-688EF0443FA0@gmail.com> <87eiadd46t.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <4CEA5744.3080308@v.loewis.de> Am 22.11.2010 11:48, schrieb Stephen J. Turnbull: > Raymond Hettinger writes: > > > Neither UTF-16 nor UCS-2 is exactly correct anyway. > >>From a standards lawyer point of view, UCS-2 is exactly correct, as > far as I can tell upon rereading ISO 10646-1, especially Annexes H > ("retransmitting devices") and Q ("UTF-16"). Annex Q makes it clear > that UTF-16 was intentionally designed so that Python-style processing > could be done in a UCS-2 context. I could only find the FCD of 10646:2010, where annex H was integrated into section 10: http://www.itscj.ipsj.or.jp/sc2/open/02n4125/FCD10646-Main.pdf There they have stopped using the term UCS-2, and added a note # NOTE ? Former editions of this standard included references to a # two-octet BMP form called UCS-2 which would be a subset # of the UTF-16 encoding form restricted to the BMP UCS scalar values. # The UCS-2 form is deprecated. 
I think they are now acknowledging that UCS-2 was a misleading term, making it ambiguous whether this refers to a CCS, a CEF, or a CES; like "ASCII", people have been using it for all three of them. Apparently, the ISO WG interprets earlier revisions as saying that UCS-2 is a CEF that restricted UTF-16 to the BMP. THIS IS NOT WHAT PYTHON DOES. In a narrow Python build, the character set is *not* restricted to the BMP. Instead, Unicode strings are meant to be interpreted (by applications) as UTF-16. > > For the "wide" build, the entire range of unicode is encoded at > > 4 bytes per character and slicing/len operate correctly since > > every character is the same length. This used to be called UCS-4 > > and is now UTF-32. > > That's inaccurate, I believe. UCS-4 is not a UTF, and doesn't satisfy > the range restrictions of a UTF. Not sure what it says in your copy; in mine, section 9.3 says # 9.3 UTF-32 (UCS-4) # UTF-32 (or UCS-4) is the UCS encoding form that assigns each UCS # scalar value to a single unsigned 32-bit code unit. The terms UTF-32 # and UCS-4 can be used interchangeably to designate this encoding # form. so they (now) view the two as synonyms. I think that when ISO 10646 started, they were also fairly confused about these issues (as the group/plane/row/cell structure demonstrates, IMO). This is not surprising, since the notion of byte-based character sets had been ingrained for so long. It took 20 years to learn that a UCS scalar value really is *not* a sequence of bytes, but a natural number. > However, I don't see how "narrow" tells us more than "UCS-2" does. If > "UCS-2" is equally (or more) informative, I prefer it because it is > the technically precise, already well-defined, term. But it's not. It is a confusing term, one that the relevant standards bodies are abandoning. After reading FCD 10646:2010, I could agree to call the two implementations UTF-16 and UTF-32 (as these terms designate CEFs). Unfortunately, they also designate CESs. 
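To make the code-unit argument concrete, here is a small sketch that counts 16-bit and 32-bit code units for a BMP and a non-BMP code point, and computes the surrogate pair by hand. It assumes a Python 3 interpreter; the function names are hypothetical helpers, not anything from the standards or from CPython.

```python
def utf16_code_units(text):
    """Number of 16-bit code units needed for text in UTF-16."""
    return len(text.encode("utf-16-le")) // 2


def utf32_code_units(text):
    """Number of 32-bit code units needed for text in UTF-32."""
    return len(text.encode("utf-32-le")) // 4


def surrogate_pair(code_point):
    """Split a non-BMP code point into (high, low) UTF-16 surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points beyond the BMP need a pair")
    offset = code_point - 0x10000
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)


bmp = chr(0x20AC)      # EURO SIGN, inside the BMP
astral = chr(0x10000)  # first code point beyond the BMP

print(utf16_code_units(bmp), utf32_code_units(bmp))        # 1 1
print(utf16_code_units(astral), utf32_code_units(astral))  # 2 1
print([hex(unit) for unit in surrogate_pair(0x10000)])     # ['0xd800', '0xdc00']
```

On a 16-bit build, len() counts the 16-bit code units, which is why len(chr(i)) comes out as 2 for any i above 0xFFFF.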
> If we have to document what the terms we choose mean anyway, why not
> document the existing terms and reduce entropy, rather than invent new
> ones and increase entropy?

Because the proposed existing term is deprecated.

Regards,
Martin

From mal at egenix.com Mon Nov 22 13:47:29 2010
From: mal at egenix.com (M.-A. Lemburg)
Date: Mon, 22 Nov 2010 13:47:29 +0100
Subject: [Python-Dev] len(chr(i)) = 2?
In-Reply-To: <4CEA5744.3080308@v.loewis.de>
References: <201011192123.14169.victor.stinner@haypocalc.com>
	<4CE6F93F.9010109@egenix.com> <4CE6FE30.1050903@v.loewis.de>
	<87hbfc1vnf.fsf@uwakimon.sk.tsukuba.ac.jp>
	<4CE78F62.7060707@v.loewis.de>
	<8739qukf9r.fsf@uwakimon.sk.tsukuba.ac.jp>
	<20101121173825.B1BFB235977@kimball.webabinitio.net>
	<60F8726F-C1C2-4803-8B8E-688EF0443FA0@gmail.com>
	<87eiadd46t.fsf@uwakimon.sk.tsukuba.ac.jp>
	<4CEA5744.3080308@v.loewis.de>
Message-ID: <4CEA6661.4080402@egenix.com>

Martin, it is really irrelevant whether the standards have decided
to no longer use the terms UCS-2 and UCS-4 in their latest standard
documents.

The definitions still stand (just like Unicode 2.0 is still a valid
standard, even if it's ten years old):

 * UCS-2 is defined as "Universal Character Set coded in 2 octets"
   by ISO 10646:
   (see http://www.unicode.org/versions/Unicode5.2.0/appC.pdf)

 * UCS-4 is defined as "Universal Character Set coded in 4 octets"
   by ISO 10646.

Those two terms have been in use for many years. They refer to
the Unicode character set as it can be represented in 2 or 4 bytes.
As such they don't include any of the special meanings associated
with the UTF transfer encodings. There are no invalid sequences,
no invalid code points, etc. as you can find in the UTF encodings.
And that's an important detail.

If you interpret them as encodings, they are 1-1 mappings of
Unicode code point ordinals to integers represented using
2 or 4 bytes.
UCS-2 only supports BMP code points and can conveniently be interpreted as UTF-16, if you need to encode non-BMP code points (which we do in the UTF codecs). UCS-4 also supports non-BMP code points directly. Now, from a ISO or Unicode Consortium point of view, deprecating the term UCS-2 in *their* standard papers is only natural, since they are actively starting to assign non-BMP code points which cannot be represented in UCS-2. However, this deprecation is only relevant for the purpose of defining the standard. The above definitions are still useful when it comes to defining code units, i.e. the used storage format, (as opposed to the transfer format). For the purpose of describing the code units we are using in Python they are (still) the most correct terms and that's also the reason why we chose to use them when introducing the configure options in Python2. There are no other accurate definitions we could use. The terms "narrow" and "wide" are simply too inaccurate to be used as description of UCS-2 and UCS-4 code units. Please also note that we have used the terms UCS-2 and UCS-4 in Python2 for 9+ years now and users are just starting to learn the difference and get acquainted with the fact that Python uses these two forms. Confronting them with "narrow" and "wide" builds is only going to cause more confusion, not less, and adding those strings to Python package files isn't going to help much either, since the terms don't convey any relationship to Unicode: package-3.1.3.linux-x86_64-py2.6_ucs2.egg vs. package-3.1.3.linux-x86_64-py2.6_narrow.egg I opt for switching to the following config options: --with-unicode=ucs2 (default) --with-unicode=ucs4 and using "UCS-2" and "UCS-4" in the Python documentation when describing the two different build modes. We can add glossary entries for the two which clarify the differences. 
Python2 used --enable-unicode=ucs2/ucs4, but since Python3 doesn't
build without Unicode support, the above two versions appear more
appropriate.

We can keep the alternative --with-wide-unicode as an alias
for --with-unicode=ucs4 to maintain 3.x backwards compatibility.

Cheers,
-- 
Marc-Andre Lemburg
eGenix.com

From foom at fuhm.net Mon Nov 22 15:18:02 2010
From: foom at fuhm.net (James Y Knight)
Date: Mon, 22 Nov 2010 09:18:02 -0500
Subject: [Python-Dev] len(chr(i)) = 2?
In-Reply-To: <4CEA6661.4080402@egenix.com>
References: <201011192123.14169.victor.stinner@haypocalc.com>
	<4CE6F93F.9010109@egenix.com> <4CE6FE30.1050903@v.loewis.de>
	<87hbfc1vnf.fsf@uwakimon.sk.tsukuba.ac.jp>
	<4CE78F62.7060707@v.loewis.de>
	<8739qukf9r.fsf@uwakimon.sk.tsukuba.ac.jp>
	<20101121173825.B1BFB235977@kimball.webabinitio.net>
	<60F8726F-C1C2-4803-8B8E-688EF0443FA0@gmail.com>
	<87eiadd46t.fsf@uwakimon.sk.tsukuba.ac.jp>
	<4CEA5744.3080308@v.loewis.de> <4CEA6661.4080402@egenix.com>
Message-ID:

Why don't y'all just call them "--unichar-width=16/32". That describes
precisely what the options do, and doesn't invite any quibbling over
definitions.

James

From ncoghlan at gmail.com Mon Nov 22 16:14:46 2010
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Tue, 23 Nov 2010 01:14:46 +1000
Subject: [Python-Dev] [Python-checkins] r86633 - in python/branches/py3k:
	Doc/library/inspect.rst Doc/whatsnew/3.2.rst Lib/inspect.py
	Lib/test/test_inspect.py Misc/NEWS
In-Reply-To: <4CE9BF4A.1020302@netwok.org>
References: <20101121034404.52924F20A@mail.python.org>
	<4CE9BF4A.1020302@netwok.org>
Message-ID:

On Mon, Nov 22, 2010 at 10:54 AM, Éric Araujo wrote:
>> +.. function:: getgeneratorstate(generator)
>> +
>> +    Get current state of a generator-iterator.
>> +
>> +    Possible states are:
>> +      GEN_CREATED: Waiting to start execution.
>> +      GEN_RUNNING: Currently being executed by the interpreter.
>> +      GEN_SUSPENDED: Currently suspended at a yield expression.
>> +      GEN_CLOSED: Execution has completed.
>
> I wonder if those shouldn't be marked up as :data: or something to make
> them indexed.

The same definitions are in the docstrings, and they're just integer
constants so I'm not sure why anyone would be looking them up
directly. Still, if someone with greater Sphinx-fu thinks additional
markup would be helpful, I have no problem with them adding it :)

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From fuzzyman at voidspace.org.uk Mon Nov 22 16:19:04 2010
From: fuzzyman at voidspace.org.uk (Michael Foord)
Date: Mon, 22 Nov 2010 15:19:04 +0000
Subject: [Python-Dev] [Python-checkins] r86633 - in python/branches/py3k:
	Doc/library/inspect.rst Doc/whatsnew/3.2.rst Lib/inspect.py
	Lib/test/test_inspect.py Misc/NEWS
In-Reply-To:
References: <20101121034404.52924F20A@mail.python.org>
	<4CE9BF4A.1020302@netwok.org>
Message-ID: <4CEA89E8.5090107@voidspace.org.uk>

On 22/11/2010 15:14, Nick Coghlan wrote:
> On Mon, Nov 22, 2010 at 10:54 AM, Éric Araujo wrote:
>>> +.. function:: getgeneratorstate(generator)
>>> +
>>> +    Get current state of a generator-iterator.
>>> +
>>> +    Possible states are:
>>> +      GEN_CREATED: Waiting to start execution.
>>> +      GEN_RUNNING: Currently being executed by the interpreter.
>>> +      GEN_SUSPENDED: Currently suspended at a yield expression.
>>> +      GEN_CLOSED: Execution has completed.
>> I wonder if those shouldn't be marked up as :data: or something to make
>> them indexed.
> The same definitions are in the docstrings, and they're just integer
> constants so I'm not sure why anyone would be looking them up
> directly. Still, if someone with greater Sphinx-fu thinks additional
> markup would be helpful, I have no problem with them adding it :)
>

Why not use string constants instead?
You lose comparability (less than / greater than) but gain readability.
Comparability may be a requirement - of course if Python had an Enum
type we could use that and have both.

Michael

> Cheers,
> Nick.
>

-- 
http://www.voidspace.org.uk/

READ CAREFULLY. By accepting and reading this email you agree,
on behalf of your employer, to release me from all obligations
and waivers arising from any and all NON-NEGOTIATED agreements,
licenses, terms-of-service, shrinkwrap, clickwrap, browsewrap,
confidentiality, non-disclosure, non-compete and acceptable use
policies ("BOGUS AGREEMENTS") that I have entered into with your
employer, its partners, licensors, agents and assigns, in
perpetuity, without prejudice to my ongoing rights and privileges.
You further represent that you have the authority to release me
from any BOGUS AGREEMENTS on behalf of your employer.

From ncoghlan at gmail.com Mon Nov 22 16:37:21 2010
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Tue, 23 Nov 2010 01:37:21 +1000
Subject: [Python-Dev] len(chr(i)) = 2?
In-Reply-To: <4CEA6661.4080402@egenix.com>
References: <201011192123.14169.victor.stinner@haypocalc.com>
	<4CE6F93F.9010109@egenix.com> <4CE6FE30.1050903@v.loewis.de>
	<87hbfc1vnf.fsf@uwakimon.sk.tsukuba.ac.jp>
	<4CE78F62.7060707@v.loewis.de>
	<8739qukf9r.fsf@uwakimon.sk.tsukuba.ac.jp>
	<20101121173825.B1BFB235977@kimball.webabinitio.net>
	<60F8726F-C1C2-4803-8B8E-688EF0443FA0@gmail.com>
	<87eiadd46t.fsf@uwakimon.sk.tsukuba.ac.jp>
	<4CEA5744.3080308@v.loewis.de> <4CEA6661.4080402@egenix.com>
Message-ID:

On Mon, Nov 22, 2010 at 10:47 PM, M.-A. Lemburg wrote:
> Please also note that we have used the terms UCS-2 and UCS-4 in Python2
> for 9+ years now and users are just starting to learn the difference
> and get acquainted with the fact that Python uses these two forms.
> > Confronting them with "narrow" and "wide" builds is only > going to cause more confusion, not less, and adding those > strings to Python package files isn't going to help much either, > since the terms don't convey any relationship to Unicode: I was personally surprised to learn in this discussion that there had even been an *attempt* to change the names of the two build variants to anything other than UCS2/UCS4. The concrete API implementations certainly still use those two terms to prevent inadvertent linkage with the wrong version of the C API. For practical purposes, UCS2/UCS4 convey far more inherent information than narrow/wide: - many developers will recognise them as Unicode related, even if they don't know exactly what they mean - even those that don't recognise them, can soon learn that they're Unicode related just by plugging them into Google* - a bit more digging should reveal that they're Unicode storage formats closely related to the UTF-16 and UTF-32 transfer encodings respectively* *(The first Google hit for "ucs2" is the UTF-16/UCS-2 article on Wikipedia, the first hit for "ucs4" is the UTF-32/UCS-4 article) All that just armed with Google, without even looking at the Python docs specifically. So don't just think about "what will developers know?", also think about "what will developers know, and what will a quick trip to a search engine tell them?". And once you take that stance, the overly generic narrow/wide terms fail, badly. +1 for MAL's suggested tweaks to the Py3k configure options. Cheers, Nick. -- Nick Coghlan?? |?? ncoghlan at gmail.com?? |?? 
Brisbane, Australia

From solipsis at pitrou.net Mon Nov 22 16:37:22 2010
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Mon, 22 Nov 2010 16:37:22 +0100
Subject: [Python-Dev] [Python-checkins] r86633 - in python/branches/py3k:
	Doc/library/inspect.rst Doc/whatsnew/3.2.rst Lib/inspect.py
	Lib/test/test_inspect.py Misc/NEWS
References: <20101121034404.52924F20A@mail.python.org>
	<4CE9BF4A.1020302@netwok.org> <4CEA89E8.5090107@voidspace.org.uk>
Message-ID: <20101122163722.7e96d123@pitrou.net>

On Mon, 22 Nov 2010 15:19:04 +0000
Michael Foord wrote:
> On 22/11/2010 15:14, Nick Coghlan wrote:
> > On Mon, Nov 22, 2010 at 10:54 AM, Éric Araujo wrote:
> >>> +.. function:: getgeneratorstate(generator)
> >>> +
> >>> +    Get current state of a generator-iterator.
> >>> +
> >>> +    Possible states are:
> >>> +      GEN_CREATED: Waiting to start execution.
> >>> +      GEN_RUNNING: Currently being executed by the interpreter.
> >>> +      GEN_SUSPENDED: Currently suspended at a yield expression.
> >>> +      GEN_CLOSED: Execution has completed.
> >> I wonder if those shouldn't be marked up as :data: or something to make
> >> them indexed.
> > The same definitions are in the docstrings, and they're just integer
> > constants so I'm not sure why anyone would be looking them up
> > directly. Still, if someone with greater Sphinx-fu thinks additional
> > markup would be helpful, I have no problem with them adding it :)
> >
>
> Why not use string constants instead? You lose comparability (less than
> / greater than) but gain readability. Comparability may be a requirement
> - of course if Python had an Enum type we could use that and have both.

+1. The problem with int constants is that the int gets printed, not
the name, when you dump them for debugging purposes :)

cheers

Antoine.
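For reference, a minimal sketch of walking one generator through the four states quoted above. It assumes the `inspect.getgeneratorstate` API as it shipped in Python 3.2 and later, where the GEN_* constants are plain strings (so the name is what gets printed); `observe_states` and `probe` are hypothetical names used only for this illustration.

```python
import inspect


def observe_states():
    """Collect the state of one generator at each stage of its life."""
    seen = []

    def probe():
        # While this body executes, the generator reports itself as running.
        seen.append(inspect.getgeneratorstate(gen))
        yield

    gen = probe()
    seen.append(inspect.getgeneratorstate(gen))  # before the first next()
    next(gen)                                    # runs the body up to the yield
    seen.append(inspect.getgeneratorstate(gen))  # paused at the yield
    gen.close()
    seen.append(inspect.getgeneratorstate(gen))  # after close()
    return seen


for state in observe_states():
    print(state)
```

Because the states are strings, dumping them for debugging shows GEN_CREATED, GEN_RUNNING, GEN_SUSPENDED and GEN_CLOSED rather than the integers 0 to 3.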
From ncoghlan at gmail.com Mon Nov 22 16:45:28 2010
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Tue, 23 Nov 2010 01:45:28 +1000
Subject: [Python-Dev] [Python-checkins] r86633 - in python/branches/py3k:
	Doc/library/inspect.rst Doc/whatsnew/3.2.rst Lib/inspect.py
	Lib/test/test_inspect.py Misc/NEWS
In-Reply-To: <4CEA89E8.5090107@voidspace.org.uk>
References: <20101121034404.52924F20A@mail.python.org>
	<4CE9BF4A.1020302@netwok.org> <4CEA89E8.5090107@voidspace.org.uk>
Message-ID:

On Tue, Nov 23, 2010 at 1:19 AM, Michael Foord wrote:
> On 22/11/2010 15:14, Nick Coghlan wrote:
>> On Mon, Nov 22, 2010 at 10:54 AM, Éric Araujo wrote:
>>>> +    Possible states are:
>>>> +      GEN_CREATED: Waiting to start execution.
>>>> +      GEN_RUNNING: Currently being executed by the interpreter.
>>>> +      GEN_SUSPENDED: Currently suspended at a yield expression.
>>>> +      GEN_CLOSED: Execution has completed.
>>>
>>> I wonder if those shouldn't be marked up as :data: or something to make
>>> them indexed.
>>
>> The same definitions are in the docstrings, and they're just integer
>> constants so I'm not sure why anyone would be looking them up
>> directly. Still, if someone with greater Sphinx-fu thinks additional
>> markup would be helpful, I have no problem with them adding it :)
>>
>
> Why not use string constants instead? You lose comparability (less than /
> greater than) but gain readability. Comparability may be a requirement - of
> course if Python had an Enum type we could use that and have both.

With only 4 states, comparability isn't really necessary. I'm just so
used to using the range() trick as a replacement for the lack of
proper Enum type that using strings instead didn't even occur to me.

The lack of printability did bother me a bit, so yeah, +1 from me as
well (I've reopened the relevant issue to remind me to change it
before beta 1).

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |
Brisbane, Australia From alexander.belopolsky at gmail.com Mon Nov 22 17:03:47 2010 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Mon, 22 Nov 2010 11:03:47 -0500 Subject: [Python-Dev] len(chr(i)) = 2? In-Reply-To: References: <201011192123.14169.victor.stinner@haypocalc.com> <4CE6F93F.9010109@egenix.com> <4CE6FE30.1050903@v.loewis.de> <87hbfc1vnf.fsf@uwakimon.sk.tsukuba.ac.jp> <4CE78F62.7060707@v.loewis.de> <8739qukf9r.fsf@uwakimon.sk.tsukuba.ac.jp> <20101121173825.B1BFB235977@kimball.webabinitio.net> <60F8726F-C1C2-4803-8B8E-688EF0443FA0@gmail.com> <87eiadd46t.fsf@uwakimon.sk.tsukuba.ac.jp> <4CEA5744.3080308@v.loewis.de> <4CEA6661.4080402@egenix.com> Message-ID: On Mon, Nov 22, 2010 at 10:37 AM, Nick Coghlan wrote: .. > *(The first Google hit for "ucs2" is the UTF-16/UCS-2 article on > Wikipedia, the first hit for "ucs4" is the UTF-32/UCS-4 article) > Do you think these articles are helpful for someone learning how to use chr() and ord() in Python for the first time? From hrvoje.niksic at avl.com Mon Nov 22 17:08:36 2010 From: hrvoje.niksic at avl.com (Hrvoje Niksic) Date: Mon, 22 Nov 2010 17:08:36 +0100 Subject: [Python-Dev] [Python-checkins] r86633 - in python/branches/py3k: Doc/library/inspect.rst Doc/whatsnew/3.2.rst Lib/inspect.py Lib/test/test_inspect.py Misc/NEWS In-Reply-To: <20101122163722.7e96d123@pitrou.net> References: <20101121034404.52924F20A@mail.python.org> <4CE9BF4A.1020302@netwok.org> <4CEA89E8.5090107@voidspace.org.uk> <20101122163722.7e96d123@pitrou.net> Message-ID: <4CEA9584.7040301@avl.com> On 11/22/2010 04:37 PM, Antoine Pitrou wrote: > +1. The problem with int constants is that the int gets printed, not > the name, when you dump them for debugging purposes :) Well, it's trivial to subclass int to something with a nicer __repr__. 
PyGTK uses that technique for wrapping C enums:

>>> gtk.PREVIEW_GRAYSCALE
>>> isinstance(gtk.PREVIEW_GRAYSCALE, int)
True
>>> gtk.PREVIEW_GRAYSCALE + 0
1

From ncoghlan at gmail.com Mon Nov 22 17:13:39 2010
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Tue, 23 Nov 2010 02:13:39 +1000
Subject: [Python-Dev] len(chr(i)) = 2?
In-Reply-To:
References: <201011192123.14169.victor.stinner@haypocalc.com>
	<4CE6F93F.9010109@egenix.com> <4CE6FE30.1050903@v.loewis.de>
	<87hbfc1vnf.fsf@uwakimon.sk.tsukuba.ac.jp>
	<4CE78F62.7060707@v.loewis.de>
	<8739qukf9r.fsf@uwakimon.sk.tsukuba.ac.jp>
	<20101121173825.B1BFB235977@kimball.webabinitio.net>
	<60F8726F-C1C2-4803-8B8E-688EF0443FA0@gmail.com>
	<87eiadd46t.fsf@uwakimon.sk.tsukuba.ac.jp>
	<4CEA5744.3080308@v.loewis.de> <4CEA6661.4080402@egenix.com>
Message-ID:

On Tue, Nov 23, 2010 at 2:03 AM, Alexander Belopolsky wrote:
> On Mon, Nov 22, 2010 at 10:37 AM, Nick Coghlan wrote:
> ..
>> *(The first Google hit for "ucs2" is the UTF-16/UCS-2 article on
>> Wikipedia, the first hit for "ucs4" is the UTF-32/UCS-4 article)
>>
>
> Do you think these articles are helpful for someone learning how to
> use chr() and ord() in Python for the first time?

No, that's what the documentation of chr() and ord() is for. For that
use case, it doesn't matter *what* the terms are. They could say "in a
FOO build this will do X, in a BAR build it will do Y, see for a
detailed explanation of the differences between FOO and BAR builds of
Python" and be perfectly adequate for the task. If there is no
appropriate documentation link to point to (probably somewhere in the
C API docs if it isn't anywhere else) then that is a key issue that
needs to be fixed, rather than trying to change the terms that have
been in use for the better part of a decade already.

The raw meaning of UCS2/UCS4 mainly comes into the story when people
are encountering this as a config option when building Python.
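Whatever the two builds end up being called, a script can tell them apart at run time through `sys.maxunicode`. The following is only a sketch: the function name `unicode_build` and the returned labels are made up here, using the UCS-2/UCS-4 terms from this thread. Note that from Python 3.3 on (PEP 393) the build distinction no longer exists and `sys.maxunicode` is always 0x10FFFF.

```python
import sys


def unicode_build(max_code_point=None):
    """Label the build from the largest code point a str item can hold."""
    if max_code_point is None:
        max_code_point = sys.maxunicode
    # 16-bit builds report 0xFFFF; 32-bit builds report 0x10FFFF.
    return "UCS-4" if max_code_point > 0xFFFF else "UCS-2"


print(hex(sys.maxunicode), unicode_build())
```

Code that needs len(chr(i)) to be 1 for every i could check this once at start-up instead of guessing from the platform.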
The whole idea of changing the terms for the two build types *should* have been short circuited by the "status quo wins a stalemate" guideline, but apparently that didn't happen at the time. Cheers, Nick. -- Nick Coghlan?? |?? ncoghlan at gmail.com?? |?? Brisbane, Australia From solipsis at pitrou.net Mon Nov 22 17:24:40 2010 From: solipsis at pitrou.net (Antoine Pitrou) Date: Mon, 22 Nov 2010 17:24:40 +0100 Subject: [Python-Dev] [Python-checkins] r86633 - in python/branches/py3k: Doc/library/inspect.rst Doc/whatsnew/3.2.rst Lib/inspect.py Lib/test/test_inspect.py Misc/NEWS References: <20101121034404.52924F20A@mail.python.org> <4CE9BF4A.1020302@netwok.org> <4CEA89E8.5090107@voidspace.org.uk> <20101122163722.7e96d123@pitrou.net> <4CEA9584.7040301@avl.com> Message-ID: <20101122172440.77d27ed5@pitrou.net> On Mon, 22 Nov 2010 17:08:36 +0100 Hrvoje Niksic wrote: > On 11/22/2010 04:37 PM, Antoine Pitrou wrote: > > +1. The problem with int constants is that the int gets printed, not > > the name, when you dump them for debugging purposes :) > > Well, it's trivial to subclass int to something with a nicer __repr__. > PyGTK uses that technique for wrapping C enums: Nice. It might be useful to add a private _Constant class somewhere for stdlib purposes. Regards Antoine. From guido at python.org Mon Nov 22 17:33:57 2010 From: guido at python.org (Guido van Rossum) Date: Mon, 22 Nov 2010 08:33:57 -0800 Subject: [Python-Dev] is this a bug? no environment variables In-Reply-To: <4CEA0246.9080607@g.nevcal.com> References: <4CEA0246.9080607@g.nevcal.com> Message-ID: On Sun, Nov 21, 2010 at 9:40 PM, Glenn Linderman wrote: > In reviewing my notes from my experimentations with CGIHTTPServer > (Python2.6) and then http.server (Python 3.2a4), I note one behavior I > haven't reported as a bug, nor do I know where to start to figure it out, > other than experimentally. 
> > The experiment: launching CGIHTTPServer without environment variables, by > the simple expedient of using a batch file to unset all the existing > environment variables, and then launching Python2.6 with CGIHTTPServer. > > So it failed early: random.py fails at line 110 (Python 2.6). What specific traceback do you get? In my copy of the code that line says a = long(_hexlify(_urandom(16)), 16) and I could just imagine that _urandom() fails for some reason to do with the environment (it is a reference to os.urandom()), which, being part of the C library code, might depend on the environment. But you're not giving enough info to debug this. > I suppose it is possible that some environment variables are used by Python > directly (but I can't seem to find a documented list of them) although I > would expect that usage to be optional, with fall-back defaults when they > don't exist. That is certainly the idea, but the fallbacks may not always be nice. Environment variables used by Python or the stdlib itself are supposed to be named PYTHON if they are Python-specific, and there's a way to disable all of these (-E). But there are other environment variables (HOME and PATH come to mind) that have a broader definition and that are used in some part of the stdlib. Plus, as I mentioned, who knows what the non-Python C library uses (well, somebody probably knows, but I don't know of a central source that we can actually trust across the many platforms where Python runs). > I suppose it is even possible that some Windows APIs might > depend on some environment variables, but I expected that the registry had > replaced such usage completely, by now, with the environment variables > mostly being a convenience tool for batch files, or for optional, temporary > alteration of particular settings. That sounds like wishful thinking. 
:-) > If anyone knows of documentation listing what environment variables are > required by Python on Windows, I would appreciate a pointer, searches and > doc browsing having not turned it up. > > I'll attempt to recreate the test situation later this week with Python > 3.2a4, if no one responds, but the only debug technique I can think of is to > slowly remove environment variables until I find the minimum set required to > run http.server successfully for my tests with CGI files. -- --Guido van Rossum (python.org/~guido) From fuzzyman at voidspace.org.uk Mon Nov 22 17:58:56 2010 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Mon, 22 Nov 2010 16:58:56 +0000 Subject: [Python-Dev] [Python-checkins] r86633 - in python/branches/py3k: Doc/library/inspect.rst Doc/whatsnew/3.2.rst Lib/inspect.py Lib/test/test_inspect.py Misc/NEWS In-Reply-To: <20101122172440.77d27ed5@pitrou.net> References: <20101121034404.52924F20A@mail.python.org> <4CE9BF4A.1020302@netwok.org> <4CEA89E8.5090107@voidspace.org.uk> <20101122163722.7e96d123@pitrou.net> <4CEA9584.7040301@avl.com> <20101122172440.77d27ed5@pitrou.net> Message-ID: <4CEAA150.3020106@voidspace.org.uk> On 22/11/2010 16:24, Antoine Pitrou wrote: > On Mon, 22 Nov 2010 17:08:36 +0100 > Hrvoje Niksic wrote: >> On 11/22/2010 04:37 PM, Antoine Pitrou wrote: >>> +1. The problem with int constants is that the int gets printed, not >>> the name, when you dump them for debugging purposes :) >> Well, it's trivial to subclass int to something with a nicer __repr__. >> PyGTK uses that technique for wrapping C enums: > Nice. It might be useful to add a private _Constant class somewhere for > stdlib purposes. Why not just solve the problem properly and add it to the standard library... (Allowing for flag enums too that can be or'd together and still have a decent repr.) Michael > Regards > > Antoine. 
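For readers following the int-subclass idea quoted above, here is a minimal sketch of what such a constant could look like. The class name NamedInt and the two constants are made up for illustration; this is neither PyGTK's code nor the _Constant class suggested in the thread.

```python
class NamedInt(int):
    """An int that carries a symbolic name for debugging output (hypothetical helper)."""

    def __new__(cls, name, value):
        # int is immutable, so the value has to be set in __new__.
        self = super().__new__(cls, value)
        self._name = name
        return self

    def __repr__(self):
        # Show the name and the underlying number instead of the bare number.
        return "<%s: %d>" % (self._name, int(self))


# Illustrative constants only; the values mirror the usual seek constants.
SEEK_SET = NamedInt("SEEK_SET", 0)
SEEK_END = NamedInt("SEEK_END", 2)

print(repr(SEEK_END))   # the name shows up when the constant is dumped
print(SEEK_END + 1)     # arithmetic returns a plain int
```

The instances still compare and hash like ordinary ints, so existing code that tests against the raw numbers should keep working; only the repr changes.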
> > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/fuzzyman%40voidspace.org.uk -- http://www.voidspace.org.uk/ READ CAREFULLY. By accepting and reading this email you agree, on behalf of your employer, to release me from all obligations and waivers arising from any and all NON-NEGOTIATED agreements, licenses, terms-of-service, shrinkwrap, clickwrap, browsewrap, confidentiality, non-disclosure, non-compete and acceptable use policies (?BOGUS AGREEMENTS?) that I have entered into with your employer, its partners, licensors, agents and assigns, in perpetuity, without prejudice to my ongoing rights and privileges. You further represent that you have the authority to release me from any BOGUS AGREEMENTS on behalf of your employer. From alexander.belopolsky at gmail.com Mon Nov 22 18:00:14 2010 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Mon, 22 Nov 2010 12:00:14 -0500 Subject: [Python-Dev] len(chr(i)) = 2? In-Reply-To: References: <201011192123.14169.victor.stinner@haypocalc.com> <4CE6F93F.9010109@egenix.com> <4CE6FE30.1050903@v.loewis.de> <87hbfc1vnf.fsf@uwakimon.sk.tsukuba.ac.jp> <4CE78F62.7060707@v.loewis.de> <8739qukf9r.fsf@uwakimon.sk.tsukuba.ac.jp> <20101121173825.B1BFB235977@kimball.webabinitio.net> <60F8726F-C1C2-4803-8B8E-688EF0443FA0@gmail.com> <87eiadd46t.fsf@uwakimon.sk.tsukuba.ac.jp> <4CEA5744.3080308@v.loewis.de> <4CEA6661.4080402@egenix.com> Message-ID: On Mon, Nov 22, 2010 at 11:13 AM, Nick Coghlan wrote: .. >> Do you think these articles are helpful for someone learning how to >> use chr() and ord() in Python for the first time? > > No, that's what the documentation of chr() and ord() is for. For that > use case, it doesn't matter *what* the terms are. I recently updated chr() and ord() documentation and used "narrow/wide" terms. 
I thought USC2/4 proponents objected to that on the basis that these terms are imprecise. http://docs.python.org/dev/library/functions.html#chr http://docs.python.org/dev/library/functions.html#ord > They could say "in a > FOO build this will do X, in a BAR build it will do Y, see for > a detailed explanation of the differences between FOO and BAR builds > of Python" and be perfectly adequate for the task. If there is no > appropriate documentation link to point to (probably somewhere in the > C API docs if it isn't anywhere else) then that is a key issue that > needs to be fixed, rather than trying to change the terms that have > been in use for the better part of a decade already. > That's the point that I was trying to make. Using somewhat vague narrow/wide terms gives us an opportunity to describe exactly what is going on without confusing the reader with the intricacies of the Unicode Standard or Python's compliance with a particular version of it. > The raw meaning of UCS2/UCS4 mainly comes into the story when people > are encountering this as a config option when building Python. The > whole idea of changing the terms for the two build types *should* have > been short circuited by the "status quo wins a stalemate" guideline, > but apparently that didn't happen at the time. > It also comes in the "Data model" reference section on String which is currently out of date: """ Strings The items of a string object are Unicode code units. A Unicode code unit is represented by a string object of one item and can hold either a 16-bit or 32-bit value representing a Unicode ordinal (the maximum value for the ordinal is given in sys.maxunicode, and depends on how Python is configured at compile time). Surrogate pairs may be present in the Unicode object, and will be reported as two separate items. The built-in functions chr() and ord() convert between code units and nonnegative integers representing the Unicode ordinals as defined in the Unicode Standard 3.0.
Conversion from and to other encodings are possible through the string method encode(). """ http://docs.python.org/dev/reference/datamodel.html The out of date part is the reference to the Unicode Standard 3.0. I don't think we should refer to a specific version of Unicode here. It has little consequence for the "Python data model" and AFAICT does not come into play anywhere except unicodedata which is currently at version 6.0. The description of chr() and ord() is also not accurate on narrow builds and neither is the statement "The items of a string object are Unicode code units." From exarkun at twistedmatrix.com Mon Nov 22 17:46:54 2010 From: exarkun at twistedmatrix.com (exarkun at twistedmatrix.com) Date: Mon, 22 Nov 2010 16:46:54 -0000 Subject: [Python-Dev] [Python-checkins] r86633 - in python/branches/py3k: Doc/library/inspect.rst Doc/whatsnew/3.2.rst Lib/inspect.py Lib/test/test_inspect.py Misc/NEWS In-Reply-To: <20101122172440.77d27ed5@pitrou.net> References: <20101121034404.52924F20A@mail.python.org> <4CE9BF4A.1020302@netwok.org> <4CEA89E8.5090107@voidspace.org.uk> <20101122163722.7e96d123@pitrou.net> <4CEA9584.7040301@avl.com> <20101122172440.77d27ed5@pitrou.net> Message-ID: <20101122164654.2109.588145158.divmod.xquotient.165@localhost.localdomain> On 04:24 pm, solipsis at pitrou.net wrote: >On Mon, 22 Nov 2010 17:08:36 +0100 >Hrvoje Niksic wrote: >>On 11/22/2010 04:37 PM, Antoine Pitrou wrote: >> > +1. The problem with int constants is that the int gets printed, >>not >> > the name, when you dump them for debugging purposes :) >> >>Well, it's trivial to subclass int to something with a nicer __repr__. >>PyGTK uses that technique for wrapping C enums: > >Nice. It might be useful to add a private _Constant class somewhere for >stdlib purposes. http://www.python.org/dev/peps/pep-0354/ >Regards > >Antoine.
> > >_______________________________________________ >Python-Dev mailing list >Python-Dev at python.org >http://mail.python.org/mailman/listinfo/python-dev >Unsubscribe: http://mail.python.org/mailman/options/python- >dev/exarkun%40twistedmatrix.com From ezio.melotti at gmail.com Mon Nov 22 18:14:03 2010 From: ezio.melotti at gmail.com (Ezio Melotti) Date: Mon, 22 Nov 2010 19:14:03 +0200 Subject: [Python-Dev] Re-enable warnings in regrtest and/or unittest Message-ID: <4CEAA4DB.6020904@gmail.com> I would like to re-enable by default warnings for regrtest and/or unittest. The reasons are: 1) these tools are used mainly by developers and they (should) care about warnings; 2) developers won't have to remember that warning are silenced and how to enable them manually; 3) developers won't have to enable them manually every time they run the tests; 4) some developers are not even aware that warnings have been silenced and might not notice things like DeprecationWarnings until the function/method/class/etc gets removed and breaks their code; 5) another developer tool -- the --with-pydebug flag -- already re-enables warnings when it's used; If this is fixed in unittest it won't be necessary to patch regrtest. If it's fixed in regrtest only the core developers will benefit from this. This could be fixed checking if any warning flags (-Wx) are passed to python. If no flags are passed the default will be -Wd, otherwise the behavior will be the one specified by the flag. This will allow developers to use `python -Wi` to ignore errors explicitly. Best Regards, Ezio Melotti From rdmurray at bitdance.com Mon Nov 22 18:30:29 2010 From: rdmurray at bitdance.com (R. David Murray) Date: Mon, 22 Nov 2010 12:30:29 -0500 Subject: [Python-Dev] len(chr(i)) = 2? 
In-Reply-To: References: <201011192123.14169.victor.stinner@haypocalc.com> <4CE6F93F.9010109@egenix.com> <4CE6FE30.1050903@v.loewis.de> <87hbfc1vnf.fsf@uwakimon.sk.tsukuba.ac.jp> <4CE78F62.7060707@v.loewis.de> <8739qukf9r.fsf@uwakimon.sk.tsukuba.ac.jp> <20101121173825.B1BFB235977@kimball.webabinitio.net> <60F8726F-C1C2-4803-8B8E-688EF0443FA0@gmail.com> <87eiadd46t.fsf@uwakimon.sk.tsukuba.ac.jp> <4CEA5744.3080308@v.loewis.de> <4CEA6661.4080402@egenix.com> Message-ID: <20101122173029.CB5AA235E1E@kimball.webabinitio.net> On Mon, 22 Nov 2010 12:00:14 -0500, Alexander Belopolsky wrote: > I recently updated chr() and ord() documentation and used > "narrow/wide" terms. I thought USC2/4 proponents objected to that on > the basis that these terms are imprecise. For reference, a grep in py3k/Doc reveals that there are currently exactly 23 lines mentioning UCS2 or UCS4 in the docs. Most are in the unicode part of the c-api, and 6 are in what's new for 2.2:

c-api/arg.rst: Convert a null-terminated buffer of Unicode (UCS-2 or UCS-4) data to a Python
c-api/arg.rst: Convert a Unicode (UCS-2 or UCS-4) data buffer and its length to a Python
c-api/unicode.rst: for :c:type:`Py_UNICODE` and store Unicode values internally as UCS2. It is also
c-api/unicode.rst: possible to build a UCS4 version of Python (most recent Linux distributions come
c-api/unicode.rst: with UCS4 builds of Python). These builds then use a 32-bit type for
c-api/unicode.rst: :c:type:`Py_UNICODE` and store Unicode data internally as UCS4. On platforms
c-api/unicode.rst: short` (UCS2) or :c:type:`unsigned long` (UCS4).
c-api/unicode.rst:Note that UCS2 and UCS4 Python builds are not binary compatible. Please keep
c-api/unicode.rst: values is interpreted as an UCS-2 character.
whatsnew/2.2.rst:usually stored as UCS-2, as 16-bit unsigned integers. Python 2.2 can also be
whatsnew/2.2.rst:compiled to use UCS-4, 32-bit unsigned integers, as its internal encoding by
whatsnew/2.2.rst:supplying :option:`--enable-unicode=ucs4` to the configure script. (It's also
whatsnew/2.2.rst:When built to use UCS-4 (a "wide Python"), the interpreter can natively handle
whatsnew/2.2.rst:compiled to use UCS-2 (a "narrow Python"), values greater than 65535 will still
whatsnew/2.2.rst:Marc-André Lemburg. The changes to support using UCS-4 internally were
howto/unicode.rst:.. comment Additional topic: building Python w/ UCS2 or UCS4 support
howto/unicode.rst: - [ ] Building Python (UCS2, UCS4)
library/sys.rst: characters are stored as UCS-2 or UCS-4.
library/json.rst: specified. Encodings that are not ASCII based (such as UCS-2) are not
faq/extending.rst:When importing module X, why do I get "undefined symbol: PyUnicodeUCS2*"?
faq/extending.rst:If instead the name of the undefined symbol starts with ``PyUnicodeUCS4``, the
faq/extending.rst: ... print('UCS4 build')
faq/extending.rst: ... print('UCS2 build')

-- R. David Murray www.bitdance.com From lukasz at langa.pl Mon Nov 22 18:35:16 2010 From: lukasz at langa.pl (Łukasz Langa) Date: Mon, 22 Nov 2010 18:35:16 +0100 Subject: [Python-Dev] Re-enable warnings in regrtest and/or unittest In-Reply-To: <4CEAA4DB.6020904@gmail.com> References: <4CEAA4DB.6020904@gmail.com> Message-ID: <4CEAA9D4.2020904@langa.pl> On 22.11.2010 18:14, Ezio Melotti wrote: > I would like to re-enable by default warnings for regrtest and/or > unittest. +1 Especially in regrtest it could help manage stdlib quality (currently we have a horde of ResourceWarnings, zipfile mostly). I would even be +1 on making warnings errors for regrtest but that seems to be unpopular on #python-dev.
Best regards, Łukasz Langa From alexander.belopolsky at gmail.com Mon Nov 22 18:37:59 2010 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Mon, 22 Nov 2010 12:37:59 -0500 Subject: [Python-Dev] len(chr(i)) = 2? In-Reply-To: <20101122173029.CB5AA235E1E@kimball.webabinitio.net> References: <201011192123.14169.victor.stinner@haypocalc.com> <4CE6F93F.9010109@egenix.com> <4CE6FE30.1050903@v.loewis.de> <87hbfc1vnf.fsf@uwakimon.sk.tsukuba.ac.jp> <4CE78F62.7060707@v.loewis.de> <8739qukf9r.fsf@uwakimon.sk.tsukuba.ac.jp> <20101121173825.B1BFB235977@kimball.webabinitio.net> <60F8726F-C1C2-4803-8B8E-688EF0443FA0@gmail.com> <87eiadd46t.fsf@uwakimon.sk.tsukuba.ac.jp> <4CEA5744.3080308@v.loewis.de> <4CEA6661.4080402@egenix.com> <20101122173029.CB5AA235E1E@kimball.webabinitio.net> Message-ID: On Mon, Nov 22, 2010 at 12:30 PM, R. David Murray wrote: .. > For reference, a grep in py3k/Doc reveals that there are currently exactly > 23 lines mentioning UCS2 or UCS4 in the docs. Did you grep for USC-2 and USC-4 as well? I have to admit that my aversion to these terms is mostly due to the fact that I don't know how to spell them correctly. :-) From tjreedy at udel.edu Mon Nov 22 18:41:46 2010 From: tjreedy at udel.edu (Terry Reedy) Date: Mon, 22 Nov 2010 12:41:46 -0500 Subject: [Python-Dev] len(chr(i)) = 2? In-Reply-To: <87eiadd46t.fsf@uwakimon.sk.tsukuba.ac.jp> References: <201011192123.14169.victor.stinner@haypocalc.com> <4CE6F93F.9010109@egenix.com> <4CE6FE30.1050903@v.loewis.de> <87hbfc1vnf.fsf@uwakimon.sk.tsukuba.ac.jp> <4CE78F62.7060707@v.loewis.de> <8739qukf9r.fsf@uwakimon.sk.tsukuba.ac.jp> <20101121173825.B1BFB235977@kimball.webabinitio.net> <60F8726F-C1C2-4803-8B8E-688EF0443FA0@gmail.com> <87eiadd46t.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On 11/22/2010 5:48 AM, Stephen J. Turnbull wrote: > I disagree.
I do see a problem with "UCS-2", because it fails to tell > us that Python implements a large number of features that make it easy > to do a very good job of working with non-BMP data in 16-bit builds of Yes. As I read the standard, UCS-2 is limited to BMP chars. So I was a bit confused when Python was described as UCS-2, until I realized that the term was inaccurate. Using that term punishes people like me who take the time to read the standard or otherwise learn what the term means. What Python does might be called USC-2+ or UCS-2e (xtended). -- Terry Jan Reedy From fuzzyman at voidspace.org.uk Mon Nov 22 18:45:58 2010 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Mon, 22 Nov 2010 17:45:58 +0000 Subject: [Python-Dev] Re-enable warnings in regrtest and/or unittest In-Reply-To: <4CEAA9D4.2020904@langa.pl> References: <4CEAA4DB.6020904@gmail.com> <4CEAA9D4.2020904@langa.pl> Message-ID: <4CEAAC56.2090702@voidspace.org.uk> On 22/11/2010 17:35, ?ukasz Langa wrote: > Am 22.11.2010 18:14, schrieb Ezio Melotti: >> I would like to re-enable by default warnings for regrtest and/or >> unittest. > > +1 > > Especially in regrtest it could help manage stdlib quality (currently > we have a horde of ResourceWarnings, zipfile mostly). I would even be > +1 on making warnings errors for regrtest but that seems to be > unpopular on #python-dev. > Enabling it for regrtest makes sense. For unittest I still think it is a choice that should be left to developers. Michael > Best regards, > ?ukasz Langa > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > http://mail.python.org/mailman/options/python-dev/fuzzyman%40voidspace.org.uk -- http://www.voidspace.org.uk/ READ CAREFULLY. 
By accepting and reading this email you agree, on behalf of your employer, to release me from all obligations and waivers arising from any and all NON-NEGOTIATED agreements, licenses, terms-of-service, shrinkwrap, clickwrap, browsewrap, confidentiality, non-disclosure, non-compete and acceptable use policies (?BOGUS AGREEMENTS?) that I have entered into with your employer, its partners, licensors, agents and assigns, in perpetuity, without prejudice to my ongoing rights and privileges. You further represent that you have the authority to release me from any BOGUS AGREEMENTS on behalf of your employer. From raymond.hettinger at gmail.com Mon Nov 22 19:13:30 2010 From: raymond.hettinger at gmail.com (Raymond Hettinger) Date: Mon, 22 Nov 2010 10:13:30 -0800 Subject: [Python-Dev] len(chr(i)) = 2? In-Reply-To: <87eiadd46t.fsf@uwakimon.sk.tsukuba.ac.jp> References: <201011192123.14169.victor.stinner@haypocalc.com> <4CE6F93F.9010109@egenix.com> <4CE6FE30.1050903@v.loewis.de> <87hbfc1vnf.fsf@uwakimon.sk.tsukuba.ac.jp> <4CE78F62.7060707@v.loewis.de> <8739qukf9r.fsf@uwakimon.sk.tsukuba.ac.jp> <20101121173825.B1BFB235977@kimball.webabinitio.net> <60F8726F-C1C2-4803-8B8E-688EF0443FA0@gmail.com> <87eiadd46t.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On Nov 22, 2010, at 2:48 AM, Stephen J. Turnbull wrote: > Raymond Hettinger writes: > >> Neither UTF-16 nor UCS-2 is exactly correct anyway. > > From a standards lawyer point of view, UCS-2 is exactly correct, You're twisting yourself into definitional knots. Any explanation we give users needs to let them know two things: * that we cover the entire range of unicode not just BMP * that sometimes len(chr(i)) is one and sometimes two The term UCS-2 is a complete communications failure in that regard. If someone looks up the term, they will immediately see something like the wikipedia entry which says, "UCS-2 cannot represent code points outside the BMP". How is that helpful? 
Raymond From raymond.hettinger at gmail.com Mon Nov 22 19:29:33 2010 From: raymond.hettinger at gmail.com (Raymond Hettinger) Date: Mon, 22 Nov 2010 10:29:33 -0800 Subject: [Python-Dev] len(chr(i)) = 2? In-Reply-To: References: <201011192123.14169.victor.stinner@haypocalc.com> <4CE6F93F.9010109@egenix.com> <4CE6FE30.1050903@v.loewis.de> <87hbfc1vnf.fsf@uwakimon.sk.tsukuba.ac.jp> <4CE78F62.7060707@v.loewis.de> <8739qukf9r.fsf@uwakimon.sk.tsukuba.ac.jp> <20101121173825.B1BFB235977@kimball.webabinitio.net> <60F8726F-C1C2-4803-8B8E-688EF0443FA0@gmail.com> <87eiadd46t.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On Nov 22, 2010, at 9:41 AM, Terry Reedy wrote: > On 11/22/2010 5:48 AM, Stephen J. Turnbull wrote: > >> I disagree. I do see a problem with "UCS-2", because it fails to tell >> us that Python implements a large number of features that make it easy >> to do a very good job of working with non-BMP data in 16-bit builds of > > Yes. As I read the standard, UCS-2 is limited to BMP chars. So I was a bit confused when Python was described as UCS-2, until I realized that the term was inaccurate. Using that term punishes people like me who take the time to read the standard or otherwise learn what the term means. Bingo! Thanks for the excellent summary of the problem. > > What Python does might be called USC-2+ or UCS-2e (xtended). That would be a step in the right direction. Raymond From jcea at jcea.es Mon Nov 22 19:34:49 2010 From: jcea at jcea.es (Jesus Cea) Date: Mon, 22 Nov 2010 19:34:49 +0100 Subject: [Python-Dev] Solaris family and 64 bits compiling Message-ID: <4CEAB7C9.7020504@jcea.es> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 A Solaris installation contains ALWAYS 32 and 64 bits libraries. So in any Solaris you can run 32/64 bits programs, and compile in 32 and 64 bits.
For this, libraries are stores in "/usr/lib", for instance, for 32 bits, while the same 64 bits libraries are stored in "/usr/lib/64". Currently, python do not considerate this. We have Solaris 10 buildslaves, but they compile in 32 bits, aparently. For instance . We now have 32 and 64 bits OpenIndiana buildslaves, so we can actually check this. They were deployed yesterday. Apparently the changes would be pretty simple, adding ".../64" to library paths, to try to find the extra libraries. What do you think?. - -- Jesus Cea Avion _/_/ _/_/_/ _/_/_/ jcea at jcea.es - http://www.jcea.es/ _/_/ _/_/ _/_/ _/_/ _/_/ jabber / xmpp:jcea at jabber.org _/_/ _/_/ _/_/_/_/_/ . _/_/ _/_/ _/_/ _/_/ _/_/ "Things are not so easy" _/_/ _/_/ _/_/ _/_/ _/_/ _/_/ "My name is Dump, Core Dump" _/_/_/ _/_/_/ _/_/ _/_/ "El amor es poner tu felicidad en la felicidad de otro" - Leibniz -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iQCVAwUBTOq3yZlgi5GaxT1NAQLQhAP9G2liX+YveYmfYDOuVjWWS8PE7r2wM/XA 5rik9mJM4Z7/wDnY4wrWjG5l3B9sSyrhhNI1YmIcXm4klfYxV9xTkG9dMNL+2bVc +s98rlTdjNlMVTf8Xc7U3tMpdkG/JK0+XWmRfWsf52ATdtxPHazI9L6KvqdYjNuZ 2w3dXNXErZE= =oYXo -----END PGP SIGNATURE----- From mal at egenix.com Mon Nov 22 19:53:00 2010 From: mal at egenix.com (M.-A. Lemburg) Date: Mon, 22 Nov 2010 19:53:00 +0100 Subject: [Python-Dev] len(chr(i)) = 2? 
In-Reply-To: References: <201011192123.14169.victor.stinner@haypocalc.com> <4CE6F93F.9010109@egenix.com> <4CE6FE30.1050903@v.loewis.de> <87hbfc1vnf.fsf@uwakimon.sk.tsukuba.ac.jp> <4CE78F62.7060707@v.loewis.de> <8739qukf9r.fsf@uwakimon.sk.tsukuba.ac.jp> <20101121173825.B1BFB235977@kimball.webabinitio.net> <60F8726F-C1C2-4803-8B8E-688EF0443FA0@gmail.com> <87eiadd46t.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <4CEABC0C.4080909@egenix.com> Raymond Hettinger wrote: > Any explanation we give users needs to let them know two things: > * that we cover the entire range of unicode not just BMP > * that sometimes len(chr(i)) is one and sometimes two > > The term UCS-2 is a complete communications failure > in that regard. If someone looks up the term, they will > immediately see something like the wikipedia entry which says, > "UCS-2 cannot represent code points outside the BMP". > How is that helpful? It's very helpful, since it explains why a UCS-2 build of Python requires a surrogates pair to represent a non-BMP code point and explains why chr(i) gives you a length 2 string rather than a length 1 string. A UCS-4 build does not need to use surrogates for this, hence you get a length 1 string from chr(i). There are two levels we have to explain to users: 1. the transfer level 2. the storage level The UTF encodings address the transfer level and is what you deal with in I/O. These provide variable length encodings of the complete Unicode code point range, regardless of whether you have a UCS-2 or a UCS-4 build. The storage level becomes important if you want to work on strings using indexing and slicing. Here you do have to know whether you're dealing with a UCS-2 or a UCS-4 build, since the indexes will vary if you're using non-BMP code points. 
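A small sketch of the storage-level point above, for readers without a 16-bit build at hand. It counts UTF-16 code units through the standard utf-16-le codec, which corresponds to what a 16-bit build would index. The helper name utf16_units is made up for this illustration and is not part of Python.

```python
import sys


def utf16_units(s):
    """Return how many 16-bit code units s needs when stored as UTF-16."""
    # Every UTF-16 code unit is two bytes; a non-BMP code point takes two
    # units (a surrogate pair), a BMP code point takes one.
    return len(s.encode("utf-16-le")) // 2


bmp = chr(0x20AC)        # EURO SIGN, inside the BMP
non_bmp = chr(0x10400)   # a Deseret letter, outside the BMP

print(utf16_units(bmp))
print(utf16_units(non_bmp))

# len() counts storage items, so its answer for non_bmp depends on the build.
# sys.maxunicode tells the two cases apart: 0xFFFF means 16-bit storage.
print(hex(sys.maxunicode), len(non_bmp))
```

On a build where sys.maxunicode is 0xFFFF, len(non_bmp) would agree with utf16_units(non_bmp); elsewhere it is 1, which is the indexing difference the message describes.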
Finally, to tie both together, we have to explain that UTF-16 (the transfer encoding) maps to UCS-2 in a straight-forward way, so it is possible to work with a UCS-2 build of Python and still use the complete Unicode code point range - you only have to take into consideration that Python's string indexing will not necessarily point you to the n-th code point in a string, but may well give you half of a surrogate. Note that while that last aspect may appear like a good argument for UCS-4 builds, in reality it is not. UCS-4 has the same issue on a different level: the letters that get printed on the screen or printer (graphemes) may well be made up of multiple combining code points, e.g. an "e" and a combining "´". Those again map to two indexes in the Python string, even though they appear to be one character on output. Now try to explain all of the above using the terms "narrow" and "wide" (while remembering "explicit is better than implicit" and "avoid the temptation to guess") :-) It is not really helpful to replace a correct and accurate term with a fuzzy term: either way we're stuck with the semantics. However, the correct and accurate terms at least give you a chance to figure out and understand the reasoning behind the design. UCS-2 vs. UCS-4 is a trade-off, "narrow" and "wide" is marketing talk with an implicit emphasis on one side :-) -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Nov 22 2010) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ ::: Try our new mxODBC.Connect Python Database Interface for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math.
Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From ezio.melotti at gmail.com Mon Nov 22 19:58:33 2010 From: ezio.melotti at gmail.com (Ezio Melotti) Date: Mon, 22 Nov 2010 20:58:33 +0200 Subject: [Python-Dev] Re-enable warnings in regrtest and/or unittest In-Reply-To: <4CEAAC56.2090702@voidspace.org.uk> References: <4CEAA4DB.6020904@gmail.com> <4CEAA9D4.2020904@langa.pl> <4CEAAC56.2090702@voidspace.org.uk> Message-ID: <4CEABD59.6080005@gmail.com> On 22/11/2010 19.45, Michael Foord wrote: > On 22/11/2010 17:35, ?ukasz Langa wrote: >> Am 22.11.2010 18:14, schrieb Ezio Melotti: >>> I would like to re-enable by default warnings for regrtest and/or >>> unittest. >> >> +1 >> >> Especially in regrtest it could help manage stdlib quality (currently >> we have a horde of ResourceWarnings, zipfile mostly). I would even be >> +1 on making warnings errors for regrtest but that seems to be >> unpopular on #python-dev. >> As I said on IRC I think it makes sense to turn them into errors once we fixed/silenced all the ones that we have now. That would help keeping the number of warning to 0. > > Enabling it for regrtest makes sense. For unittest I still think it is > a choice that should be left to developers. If we consider that most of the developers want to see them, I'd prefer to have the warnings by default rather than having to use -Wd explicitly every time I run the tests (keep in mind that many developers out there don't even know/remember that now they should use -Wd). 
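One way the default discussed in this thread could look, as a rough sketch and not what regrtest or unittest actually does: if no -W option was given (sys.warnoptions is empty), switch the filter to "default"; otherwise leave the user's choice alone. The function name enable_default_warnings and its optional parameter are hypothetical.

```python
import sys
import warnings


def enable_default_warnings(warnoptions=None):
    """Enable the 'default' warnings filter unless -W options were given.

    Returns True if the filter was changed, False if the caller's explicit
    choice was left untouched.
    """
    if warnoptions is None:
        # sys.warnoptions holds the -W options passed to the interpreter.
        warnoptions = sys.warnoptions
    if warnoptions:
        # Something like -Wi or -We was requested explicitly; respect it.
        return False
    # Roughly what running with -Wd does: show each warning once per location.
    warnings.simplefilter("default")
    return True


if __name__ == "__main__":
    print(enable_default_warnings())
```

A test runner could call this once at start-up, so that `python -Wi` keeps silencing warnings while a plain `python` run shows them.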
> > Michael > >> Best regards, >> ?ukasz Langa >> _______________________________________________ >> Python-Dev mailing list >> Python-Dev at python.org >> http://mail.python.org/mailman/listinfo/python-dev >> Unsubscribe: >> http://mail.python.org/mailman/options/python-dev/fuzzyman%40voidspace.org.uk > > From alexander.belopolsky at gmail.com Mon Nov 22 20:09:14 2010 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Mon, 22 Nov 2010 14:09:14 -0500 Subject: [Python-Dev] len(chr(i)) = 2? In-Reply-To: References: <201011192123.14169.victor.stinner@haypocalc.com> <4CE6F93F.9010109@egenix.com> <4CE6FE30.1050903@v.loewis.de> <87hbfc1vnf.fsf@uwakimon.sk.tsukuba.ac.jp> <4CE78F62.7060707@v.loewis.de> <8739qukf9r.fsf@uwakimon.sk.tsukuba.ac.jp> <20101121173825.B1BFB235977@kimball.webabinitio.net> <60F8726F-C1C2-4803-8B8E-688EF0443FA0@gmail.com> <87eiadd46t.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On Mon, Nov 22, 2010 at 12:41 PM, Terry Reedy wrote: .. > What Python does might be called USC-2+ or UCS-2e (xtended). > Wow! I am not the only one who can't get the order of letters right in these acronyms. (I am usually consistent within one sentence, though.) :-) I-can't-spell-three-letter-acronyms-right-ly yours ... From brett at python.org Mon Nov 22 20:12:26 2010 From: brett at python.org (Brett Cannon) Date: Mon, 22 Nov 2010 11:12:26 -0800 Subject: [Python-Dev] Solaris family and 64 bits compiling In-Reply-To: <4CEAB7C9.7020504@jcea.es> References: <4CEAB7C9.7020504@jcea.es> Message-ID: On Mon, Nov 22, 2010 at 10:34, Jesus Cea wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > A Solaris installation contains ALWAYS 32 and 64 bits libraries. So in > any Solaris you can run 32/64 bits programs, and compile in 32 and 64 bits. > > For this, libraries are stores in "/usr/lib", for instance, for 32 bits, > while the same 64 bits libraries are stored in "/usr/lib/64". > > Currently, python do not considerate this. 
> > We have Solaris 10 buildslaves, but they compile in 32 bits, aparently. > For instance > . > > We now have 32 and 64 bits OpenIndiana buildslaves, so we can actually > check this. They were deployed yesterday. > > Apparently the changes would be pretty simple, adding ".../64" to > library paths, to try to find the extra libraries. > > What do you think?. Are you asking about buildbots only or as a general policy? If you are asking about the buildbots then I definitely think we should use 64 bits. If you are asking about policy I would say it should be an option in case people are using C extensions that are not designed to work with 64 bits. From brett at python.org Mon Nov 22 20:24:34 2010 From: brett at python.org (Brett Cannon) Date: Mon, 22 Nov 2010 11:24:34 -0800 Subject: [Python-Dev] Re-enable warnings in regrtest and/or unittest In-Reply-To: <4CEABD59.6080005@gmail.com> References: <4CEAA4DB.6020904@gmail.com> <4CEAA9D4.2020904@langa.pl> <4CEAAC56.2090702@voidspace.org.uk> <4CEABD59.6080005@gmail.com> Message-ID: On Mon, Nov 22, 2010 at 10:58, Ezio Melotti wrote: > On 22/11/2010 19.45, Michael Foord wrote: >> >> On 22/11/2010 17:35, ?ukasz Langa wrote: >>> >>> Am 22.11.2010 18:14, schrieb Ezio Melotti: >>>> >>>> I would like to re-enable by default warnings for regrtest and/or >>>> unittest. >>> >>> +1 >>> >>> Especially in regrtest it could help manage stdlib quality (currently we >>> have a horde of ResourceWarnings, zipfile mostly). I would even be +1 on >>> making warnings errors for regrtest but that seems to be unpopular on >>> #python-dev. >>> > > As I said on IRC I think it makes sense to turn them into errors once we > fixed/silenced all the ones that we have now. That would help keeping the > number of warning to 0. I agree. > >> >> Enabling it for regrtest makes sense. For unittest I still think it is a >> choice that should be left to developers. 
> > If we consider that most of the developers want to see them, I'd prefer to > have the warnings by default rather than having to use -Wd explicitly every > time I run the tests (keep in mind that many developers out there don't even > know/remember that now they should use -Wd). The problem with that is it means developers who switch to Python 3.2 or whatever are suddenly going to have their tests fail until they update their code to turn the warnings off. Then again, if we make the switch for this dead simple to add and backwards-compatible so that turning them off doesn't trigger an error in older versions then I am all for turning warnings on by default. Another approach is to have unittest's runner, when run in verbose mode, print out what the warnings filter is set to so developers are aware that they are silencing warnings. -Brett > > >> >> Michael >> >>> Best regards, >>> ?ukasz Langa >>> _______________________________________________ >>> Python-Dev mailing list >>> Python-Dev at python.org >>> http://mail.python.org/mailman/listinfo/python-dev >>> Unsubscribe: >>> http://mail.python.org/mailman/options/python-dev/fuzzyman%40voidspace.org.uk >> >> > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > http://mail.python.org/mailman/options/python-dev/brett%40python.org > From jcea at jcea.es Mon Nov 22 20:26:40 2010 From: jcea at jcea.es (Jesus Cea) Date: Mon, 22 Nov 2010 20:26:40 +0100 Subject: [Python-Dev] Solaris family and 64 bits compiling In-Reply-To: References: <4CEAB7C9.7020504@jcea.es> Message-ID: <4CEAC3F0.4040806@jcea.es> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 22/11/10 20:12, Brett Cannon wrote: > Are you asking about buildbots only or as a general policy? If you are > asking about the buildbots then I definitely think we should use 64 > bits. 
If you are asking about policy I would say it should be an > option in case people are using C extensions that are not designed to > work with 64 bits. The point is that building python in 64 bits under Solaris (family) is not easy, because the 64 bits libraries (zlib, openssl, berkeley db, curses, etc., etc., etc) are not is "/usr/lib", "/usr/local/lib", etc., but "/usr/lib/64", "/usr/local/lib/64", etc. Solaris overcomes most of the issue having separate library searchpath in 32 and 64 bits (via the "crle" command). But in some cases python try to find some library in "/usr/local/lib", and my point is that it should search TOO inside "/usr/local/lib/64". - -- Jesus Cea Avion _/_/ _/_/_/ _/_/_/ jcea at jcea.es - http://www.jcea.es/ _/_/ _/_/ _/_/ _/_/ _/_/ jabber / xmpp:jcea at jabber.org _/_/ _/_/ _/_/_/_/_/ . _/_/ _/_/ _/_/ _/_/ _/_/ "Things are not so easy" _/_/ _/_/ _/_/ _/_/ _/_/ _/_/ "My name is Dump, Core Dump" _/_/_/ _/_/_/ _/_/ _/_/ "El amor es poner tu felicidad en la felicidad de otro" - Leibniz -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iQCVAwUBTOrD8Jlgi5GaxT1NAQJhRQP/dd4q70eXsq5AUFrleqUx3A+AagChpCcp UDHAomaX26cMl0tLFwLOd4SaKizzRMvjdTJc3GhZDIqYrF3QuqZAyLPjr5tyogP8 /4KPM73l5L2cb7IdHdSHpruwMh8f2WJ4S6+ig8DzOj6qBcttXKMymrV/skum4ENJ yb4mbpH9q/0= =Oe2G -----END PGP SIGNATURE----- From barry at python.org Mon Nov 22 20:28:43 2010 From: barry at python.org (Barry Warsaw) Date: Mon, 22 Nov 2010 14:28:43 -0500 Subject: [Python-Dev] issue 9807 - abiflags in paths and symlinks (updated patch) In-Reply-To: <20101110162719.11ae7fe6@mission> References: <20101110162719.11ae7fe6@mission> Message-ID: <20101122142843.45ae45ae@mission> On Nov 10, 2010, at 04:27 PM, Barry Warsaw wrote: >I finally found a chance to address all the outstanding technical issues >mentioned in bug 9807: > > http://bugs.python.org/issue9807 > >I've uploaded a new patch which contains the rest of the changes 
I'm >proposing. I think we still need consensus about whether these changes are >good to commit. With 3.2b1 coming soon, now's the time to do that. > >If there are any remaining concerns about the details of the patch, please add >them to the tracker issue. If you have any remaining objections to the >change, please let me know or follow up here. The patch has now been updated to address the last few comments in the tracker issue. I am now ready to commit it to py3k. If there are any remaining objections or concerns, please reply here or update the tracker issue. Otherwise, I plan to commit this to py3k on Wednesday. Cheers, -Barry -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 836 bytes Desc: not available URL: From martin at v.loewis.de Mon Nov 22 20:42:16 2010 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Mon, 22 Nov 2010 20:42:16 +0100 Subject: [Python-Dev] Solaris family and 64 bits compiling In-Reply-To: <4CEAC3F0.4040806@jcea.es> References: <4CEAB7C9.7020504@jcea.es> <4CEAC3F0.4040806@jcea.es> Message-ID: <4CEAC798.5050707@v.loewis.de> > Solaris overcomes most of the issue having separate library searchpath > in 32 and 64 bits (via the "crle" command). But in some cases python try > to find some library in "/usr/local/lib", and my point is that it should > search TOO inside "/usr/local/lib/64". I don't think this will work. If the linker finds a library of the wrong ELF type, then it will choke. Before enabling anything on a build slave, a patch needs to be contributed to make it work in the first place. Regards, Martin From rdmurray at bitdance.com Mon Nov 22 20:50:14 2010 From: rdmurray at bitdance.com (R. David Murray) Date: Mon, 22 Nov 2010 14:50:14 -0500 Subject: [Python-Dev] len(chr(i)) = 2? 
In-Reply-To: References: <201011192123.14169.victor.stinner@haypocalc.com> <4CE6F93F.9010109@egenix.com> <4CE6FE30.1050903@v.loewis.de> <87hbfc1vnf.fsf@uwakimon.sk.tsukuba.ac.jp> <4CE78F62.7060707@v.loewis.de> <8739qukf9r.fsf@uwakimon.sk.tsukuba.ac.jp> <20101121173825.B1BFB235977@kimball.webabinitio.net> <60F8726F-C1C2-4803-8B8E-688EF0443FA0@gmail.com> <87eiadd46t.fsf@uwakimon.sk.tsukuba.ac.jp> <4CEA5744.3080308@v.loewis.de> <4CEA6661.4080402@egenix.com> <20101122173029.CB5AA235E1E@kimball.webabinitio.net> Message-ID: <20101122195014.B3D9A235C94@kimball.webabinitio.net> On Mon, 22 Nov 2010 12:37:59 -0500, Alexander Belopolsky wrote: > On Mon, Nov 22, 2010 at 12:30 PM, R. David Murray wrote: > .. > > For reference, a grep in py3k/Doc reveals that there are currently exactly > > 23 lines mentioning UCS2 or UCS4 in the docs. > > Did you grep for USC-2 and USC-4 as well? I have to admit that my > aversion to these terms is mostly due to the fact that I don't know > how to spell them correctly. :-) I grepped using "-ri ucs." and eliminated the false positives (of which there were only a few) by hand. -- R. David Murray www.bitdance.com From guido at python.org Mon Nov 22 22:08:57 2010 From: guido at python.org (Guido van Rossum) Date: Mon, 22 Nov 2010 13:08:57 -0800 Subject: [Python-Dev] Re-enable warnings in regrtest and/or unittest In-Reply-To: References: <4CEAA4DB.6020904@gmail.com> <4CEAA9D4.2020904@langa.pl> <4CEAAC56.2090702@voidspace.org.uk> <4CEABD59.6080005@gmail.com> Message-ID: On Mon, Nov 22, 2010 at 11:24 AM, Brett Cannon wrote: > The problem with that is it means developers who switch to Python 3.2 > or whatever are suddenly going to have their tests fail until they > update their code to turn the warnings off. That sounds like a feature to me... 
:-) -- --Guido van Rossum (python.org/~guido) From jcea at jcea.es Mon Nov 22 22:31:21 2010 From: jcea at jcea.es (Jesus Cea) Date: Mon, 22 Nov 2010 22:31:21 +0100 Subject: [Python-Dev] Solaris family and 64 bits compiling In-Reply-To: <4CEAC798.5050707@v.loewis.de> References: <4CEAB7C9.7020504@jcea.es> <4CEAC3F0.4040806@jcea.es> <4CEAC798.5050707@v.loewis.de> Message-ID: <4CEAE129.2060505@jcea.es> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 22/11/10 20:42, "Martin v. L?wis" wrote: > Before enabling anything on a build slave, a patch needs to be > contributed to make it work in the first place. I actually agree. I am not sure yet, but I am thinking that adding a "--build-64" parameter to "configure" could be an option under Solaris. Most OSs (let say, Linux) force you to choose 32/64 bits at install time, but Solaris can use both at the same time, and compilers allow to compile both (using -m32 or -m64). Since choosing 32 or 64 bits when compiling python under Solaris change the requirement, paths, etc., automating it should be a goal. PS: Martin, is there any reason to restrict the solaris 10 buildslaves to 32 bits, beside the said problems?. - -- Jesus Cea Avion _/_/ _/_/_/ _/_/_/ jcea at jcea.es - http://www.jcea.es/ _/_/ _/_/ _/_/ _/_/ _/_/ jabber / xmpp:jcea at jabber.org _/_/ _/_/ _/_/_/_/_/ . 
_/_/ _/_/ _/_/ _/_/ _/_/ "Things are not so easy" _/_/ _/_/ _/_/ _/_/ _/_/ _/_/ "My name is Dump, Core Dump" _/_/_/ _/_/_/ _/_/ _/_/ "El amor es poner tu felicidad en la felicidad de otro" - Leibniz -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iQCVAwUBTOrhKZlgi5GaxT1NAQI0cAP+OUFGVDd7UV6MdHzMenBn8fO3h4M1n0dR UZrVyYJhUYvEX9p7MRBdDNFY/6LrUITb3WCVegD3PuOymQP16GgksRfIA/jGDXyl Fe+Ed5amlDgdVPeVVH/55OodrO4SuOrJZ846G6GB1wav2IjR7I9YGxZQ6PA0LR7l 4Iph6HfcMlw= =hTNy -----END PGP SIGNATURE----- From v+python at g.nevcal.com Mon Nov 22 22:54:47 2010 From: v+python at g.nevcal.com (Glenn Linderman) Date: Mon, 22 Nov 2010 13:54:47 -0800 Subject: [Python-Dev] is this a bug? no environment variables In-Reply-To: References: <4CEA0246.9080607@g.nevcal.com> Message-ID: <4CEAE6A7.3010902@g.nevcal.com> On 11/22/2010 8:33 AM, Guido van Rossum wrote: > On Sun, Nov 21, 2010 at 9:40 PM, Glenn Linderman wrote: >> In reviewing my notes from my experimentations with CGIHTTPServer >> (Python2.6) and then http.server (Python 3.2a4), I note one behavior I >> haven't reported as a bug, nor do I know where to start to figure it out, >> other than experimentally. >> >> The experiment: launching CGIHTTPServer without environment variables, by >> the simple expedient of using a batch file to unset all the existing >> environment variables, and then launching Python2.6 with CGIHTTPServer. >> >> So it failed early: random.py fails at line 110 (Python 2.6). > What specific traceback do you get? In my copy of the code that line says > > a = long(_hexlify(_urandom(16)), 16) > > and I could just imagine that _urandom() fails for some reason to do > with the environment (it is a reference to os.urandom()), which, being > part of the C library code, might depend on the environment. > > But you're not giving enough info to debug this. Yep, that's the line. 
I'll have to re-run the scenario, but will do it on 3.2a4, hopefully tonight or tomorrow, to get the traceback. >> I suppose it is possible that some environment variables are used by Python >> directly (but I can't seem to find a documented list of them) although I >> would expect that usage to be optional, with fall-back defaults when they >> don't exist. > That is certainly the idea, but the fallbacks may not always be nice. > > Environment variables used by Python or the stdlib itself are supposed > to be named PYTHON if they are Python-specific, and there's > a way to disable all of these (-E). But there are other environment > variables (HOME and PATH come to mind) that have a broader definition > and that are used in some part of the stdlib. Plus, as I mentioned, > who knows what the non-Python C library uses (well, somebody probably > knows, but I don't know of a central source that we can actually trust > across the many platforms where Python runs). OK, thanks for the philosophy statement. That's what I didn't know, being new. >> I suppose it is even possible that some Windows APIs might >> depend on some environment variables, but I expected that the registry had >> replaced such usage completely, by now, with the environment variables >> mostly being a convenience tool for batch files, or for optional, temporary >> alteration of particular settings. > That sounds like wishful thinking. :-) Well, wishful thinking from me regarding the Windows and the registry is that Windows would be better off without a registry. But it seemed like their direction was instead to do away with environment variables, but in any case, I have little idea if they've achieved it, but should have achieved something in 6.1 versions of Windows! -------------- next part -------------- An HTML attachment was scrubbed... 
From fuzzyman at voidspace.org.uk Mon Nov 22 23:01:12 2010
From: fuzzyman at voidspace.org.uk (Michael Foord)
Date: Mon, 22 Nov 2010 22:01:12 +0000
Subject: [Python-Dev] Re-enable warnings in regrtest and/or unittest
In-Reply-To:
References: <4CEAA4DB.6020904@gmail.com> <4CEAA9D4.2020904@langa.pl> <4CEAAC56.2090702@voidspace.org.uk> <4CEABD59.6080005@gmail.com>
Message-ID: <4CEAE828.5000801@voidspace.org.uk>

On 22/11/2010 21:08, Guido van Rossum wrote:
> On Mon, Nov 22, 2010 at 11:24 AM, Brett Cannon wrote:
>> The problem with that is it means developers who switch to Python 3.2
>> or whatever are suddenly going to have their tests fail until they
>> update their code to turn the warnings off.
> That sounds like a feature to me... :-)
>

I think Ezio was suggesting just turning warnings on by default when unittest is run, not turning them into errors. Ezio is suggesting that developers could explicitly turn warnings off again, but when you use the default test runner warnings would be shown.

His logic is that warnings are for developers, and so are tests...

Michael

--
http://www.voidspace.org.uk/

READ CAREFULLY. By accepting and reading this email you agree, on behalf of your employer, to release me from all obligations and waivers arising from any and all NON-NEGOTIATED agreements, licenses, terms-of-service, shrinkwrap, clickwrap, browsewrap, confidentiality, non-disclosure, non-compete and acceptable use policies (“BOGUS AGREEMENTS”) that I have entered into with your employer, its partners, licensors, agents and assigns, in perpetuity, without prejudice to my ongoing rights and privileges. You further represent that you have the authority to release me from any BOGUS AGREEMENTS on behalf of your employer.
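The three behaviours debated in this thread (warnings silenced, warnings shown, warnings turned into errors) can be reproduced with the standard `warnings` module alone. The following is only a minimal sketch: `deprecated_helper`, `count_warnings` and `raises_under_error_filter` are hypothetical names invented for illustration, and nothing here is the actual regrtest or unittest implementation.

```python
import warnings


def deprecated_helper():
    """Hypothetical function that emits a DeprecationWarning when called."""
    warnings.warn("deprecated_helper() is deprecated",
                  DeprecationWarning, stacklevel=2)
    return 42


def count_warnings(filter_action):
    """Call deprecated_helper() under the given filter action.

    Returns how many warnings were recorded instead of being printed.
    """
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter(filter_action)
        deprecated_helper()
    return len(caught)


def raises_under_error_filter():
    """Return True if the warning becomes an exception under "error"."""
    with warnings.catch_warnings():
        warnings.simplefilter("error")
        try:
            deprecated_helper()
        except DeprecationWarning:
            return True
    return False


shown = count_warnings("default")    # comparable to running with -Wd
hidden = count_warnings("ignore")    # warnings switched off explicitly
as_error = raises_under_error_filter()
```

Under these assumptions, a test suite that merely shows warnings keeps passing, while one that turns them into errors fails until the code is updated, which is the distinction Michael draws above.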
From martin at v.loewis.de Mon Nov 22 23:05:40 2010 From: martin at v.loewis.de (=?ISO-8859-15?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Mon, 22 Nov 2010 23:05:40 +0100 Subject: [Python-Dev] Solaris family and 64 bits compiling In-Reply-To: <4CEAE129.2060505@jcea.es> References: <4CEAB7C9.7020504@jcea.es> <4CEAC3F0.4040806@jcea.es> <4CEAC798.5050707@v.loewis.de> <4CEAE129.2060505@jcea.es> Message-ID: <4CEAE934.9000106@v.loewis.de> > I actually agree. I am not sure yet, but I am thinking that adding a > "--build-64" parameter to "configure" could be an option under Solaris. > Most OSs (let say, Linux) force you to choose 32/64 bits at install > time Actually, that's not at all the case. Most systems these days support 32-bit and 64-bit applications simultaneously, and also support compiler tool chains that allow building for either mode. Solaris, Linux, and Windows are about on-par in this respect; OS X is more advanced as it allows to have a single binary that supports both 32-bit and 64-bit execution (making the need for adjusted path names irrelevant). > Since choosing 32 or 64 bits when compiling python under Solaris change > the requirement, paths, etc., automating it should be a goal. > > PS: Martin, is there any reason to restrict the solaris 10 buildslaves > to 32 bits, beside the said problems?. I don't see that as a restriction. I have to make a choice, and there are sooo many choices to make: - gcc vs. SunPRO - 32-bit vs. 64-bit - GNU make vs. /usr/ccs/bin/make I picked the combination which was most easy to setup, and is therefore likely to be used by most users (except for those who think 64-bit is somehow "better" than 32-bit, when it is actually the other way 'round - IMO). As for configuration, I personally prefer that setting CC indicates what type of build you want. Set CC to "gcc -m64" to indicate a 64-build. Ideally, you will *not* have to adjust library paths, since the other compiler will know on its own where to search things. 
Regards, Martin From nad at acm.org Mon Nov 22 23:12:05 2010 From: nad at acm.org (Ned Deily) Date: Mon, 22 Nov 2010 14:12:05 -0800 Subject: [Python-Dev] Solaris family and 64 bits compiling References: <4CEAB7C9.7020504@jcea.es> <4CEAC3F0.4040806@jcea.es> <4CEAC798.5050707@v.loewis.de> <4CEAE129.2060505@jcea.es> Message-ID: In article <4CEAE129.2060505 at jcea.es>, Jesus Cea wrote: > On 22/11/10 20:42, "Martin v. L?wis" wrote: > > Before enabling anything on a build slave, a patch needs to be > > contributed to make it work in the first place. > > I actually agree. I am not sure yet, but I am thinking that adding a > "--build-64" parameter to "configure" could be an option under Solaris. > Most OSs (let say, Linux) force you to choose 32/64 bits at install > time, but Solaris can use both at the same time, and compilers allow to > compile both (using -m32 or -m64). > > Since choosing 32 or 64 bits when compiling python under Solaris change > the requirement, paths, etc., automating it should be a goal. You might want to look at the existing --with-universal-archs=ARCH in configure for how this is done for OS X builds. It's probably both simpler and more complicated than would be needed elsewhere: on OS X, a single file can contain object codes for multiple architectures, e.g 32-bit and 64-bit, rather than having to have multiple files. 
-- Ned Deily, nad at acm.org From brett at python.org Mon Nov 22 23:20:21 2010 From: brett at python.org (Brett Cannon) Date: Mon, 22 Nov 2010 14:20:21 -0800 Subject: [Python-Dev] Re-enable warnings in regrtest and/or unittest In-Reply-To: References: <4CEAA4DB.6020904@gmail.com> <4CEAA9D4.2020904@langa.pl> <4CEAAC56.2090702@voidspace.org.uk> <4CEABD59.6080005@gmail.com> Message-ID: On Mon, Nov 22, 2010 at 13:08, Guido van Rossum wrote: > On Mon, Nov 22, 2010 at 11:24 AM, Brett Cannon wrote: >> The problem with that is it means developers who switch to Python 3.2 >> or whatever are suddenly going to have their tests fail until they >> update their code to turn the warnings off. > > That sounds like a feature to me... :-) =) I meant update their tests with the switch to turn off the warnings, not update to make the warnings properly disappear. I guess it's a question of whether it will be errors by default or simply output the warning. I can get behind printing the warnings by default and adding a switch to make them errors or off otherwise. -Brett > > -- > --Guido van Rossum (python.org/~guido) > From anurag.chourasia at gmail.com Mon Nov 22 23:46:16 2010 From: anurag.chourasia at gmail.com (Anurag Chourasia) Date: Tue, 23 Nov 2010 04:16:16 +0530 Subject: [Python-Dev] Missing Python Symbols when Starting Python App (Apache/Django/Mod_Wsgi) Message-ID: All, I have a problem in starting my Python(Django) App using Apache and Mod_Wsgi I am using Django 1.2.3 and Python 2.6.6 running on Apache 2.2.17 with Mod_Wsgi 3.3 When I try to access the app from Web Browser, I am getting these errors. [Mon Nov 22 09:45:25 2010] [notice] Apache/2.2.17 (Unix) mod_wsgi/3.3 Python/2.6.6 configured -- resuming normal operations [Mon Nov 22 09:45:43 2010] [error] [client 108.10.0.191] mod_wsgi (pid=1273874): Target WSGI script '/u01/home/apli/wm/app/gdd/pyserver/ apache/django.wsgi' cannot be loaded as Python module. 
[Mon Nov 22 09:45:43 2010] [error] [client 108.10.0.191] mod_wsgi (pid=1273874): Exception occurred processing WSGI script '/u01/home/ apli/wm/app/gdd/pyserver/apache/django.wsgi'. [Mon Nov 22 09:45:43 2010] [error] [client 108.10.0.191] Traceback (most recent call last): [Mon Nov 22 09:45:43 2010] [error] [client 108.10.0.191] File "/u01/ home/apli/wm/app/gdd/pyserver/apache/django.wsgi", line 19, in [Mon Nov 22 09:45:43 2010] [error] [client 108.10.0.191] import django.core.handlers.wsgi [Mon Nov 22 09:45:43 2010] [error] [client 108.10.0.191] File "/usr/ local/lib/python2.6/site-packages/django/core/handlers/wsgi.py", line 1, in [Mon Nov 22 09:45:43 2010] [error] [client 108.10.0.191] from threading import Lock [Mon Nov 22 09:45:43 2010] [error] [client 108.10.0.191] File "/usr/ local/lib/python2.6/threading.py", line 13, in [Mon Nov 22 09:45:43 2010] [error] [client 108.10.0.191] from functools import wraps [Mon Nov 22 09:45:43 2010] [error] [client 108.10.0.191] File "/usr/ local/lib/python2.6/functools.py", line 10, in [Mon Nov 22 09:45:43 2010] [error] [client 108.10.0.191] from _functools import partial, reduce [Mon Nov 22 09:45:43 2010] [error] [client 108.10.0.191] ImportError: rtld: 0712-001 Symbol PyArg_UnpackTuple was referenced [Mon Nov 22 09:45:43 2010] [error] [client 108.10.0.191] from module /usr/local/lib/python2.6/lib-dynload/_functools.so(), but a runtime definition [Mon Nov 22 09:45:43 2010] [error] [client 108.10.0.191] of the symbol was not found. [Mon Nov 22 09:45:43 2010] [error] [client 108.10.0.191] rtld: 0712-001 Symbol PyCallable_Check was referenced [Mon Nov 22 09:45:43 2010] [error] [client 108.10.0.191] from module /usr/local/lib/python2.6/lib-dynload/_functools.so(), but a runtime definition [Mon Nov 22 09:45:43 2010] [error] [client 108.10.0.191] of the symbol was not found. 
[Mon Nov 22 09:45:43 2010] [error] [client 108.10.0.191] rtld: 0712-001 Symbol PyDict_Copy was referenced [Mon Nov 22 09:45:43 2010] [error] [client 108.10.0.191] from module /usr/local/lib/python2.6/lib-dynload/_functools.so(), but a runtime definition [Mon Nov 22 09:45:43 2010] [error] [client 108.10.0.191] of the symbol was not found. [Mon Nov 22 09:45:43 2010] [error] [client 108.10.0.191] rtld: 0712-001 Symbol PyDict_Merge was referenced [Mon Nov 22 09:45:43 2010] [error] [client 108.10.0.191] from module /usr/local/lib/python2.6/lib-dynload/_functools.so(), but a runtime definition [Mon Nov 22 09:45:43 2010] [error] [client 108.10.0.191] of the symbol was not found. [Mon Nov 22 09:45:43 2010] [error] [client 108.10.0.191] rtld: 0712-001 Symbol PyDict_New was referenced [Mon Nov 22 09:45:43 2010] [error] [client 108.10.0.191] from module /usr/local/lib/python2.6/lib-dynload/_functools.so(), but a runtime definition [Mon Nov 22 09:45:43 2010] [error] [client 108.10.0.191] of the symbol was not found. [Mon Nov 22 09:45:43 2010] [error] [client 108.10.0.191] rtld: 0712-001 Symbol PyErr_Occurred was referenced [Mon Nov 22 09:45:43 2010] [error] [client 108.10.0.191] from module /usr/local/lib/python2.6/lib-dynload/_functools.so(), but a runtime definition [Mon Nov 22 09:45:43 2010] [error] [client 108.10.0.191] of the symbol was not found. [Mon Nov 22 09:45:43 2010] [error] [client 108.10.0.191] rtld: 0712-001 Symbol PyErr_SetString was referenced [Mon Nov 22 09:45:43 2010] [error] [client 108.10.0.191] from module /usr/local/lib/python2.6/lib-dynload/_functools.so(), but a runtime definition [Mon Nov 22 09:45:43 2010] [error] [client 108.10.0.191] of the symbol was not found. [Mon Nov 22 09:45:43 2010] [error] [client 108.10.0.191] \t0509-021 Additional errors occurred but are not reported. I assume that those missing runtime definitions are supposed to be in the Python executable. Doing an nm on the first missing symbol reveals that it does exist. 
root [zibal]% nm /usr/local/bin/python | grep -i PyArg_UnpackTuple
.PyArg_UnpackTuple T 268683204 524
PyArg_UnpackTuple D 537073500
PyArg_UnpackTuple d 537073500 12
PyArg_UnpackTuple:F-1 - 224

Please guide.

Regards,
Guddu

From merwok at netwok.org Mon Nov 22 23:51:18 2010
From: merwok at netwok.org (Éric Araujo)
Date: Mon, 22 Nov 2010 23:51:18 +0100
Subject: [Python-Dev] Solaris family and 64 bits compiling
In-Reply-To: <4CEAB7C9.7020504@jcea.es>
References: <4CEAB7C9.7020504@jcea.es>
Message-ID: <4CEAF3E6.4080602@netwok.org>

Hi,

I think this bug is related: http://bugs.python.org/issue1294959
“Problems with /usr/lib64 builds.”

Regards

From tlesher at gmail.com Mon Nov 22 23:56:25 2010
From: tlesher at gmail.com (Tim Lesher)
Date: Mon, 22 Nov 2010 17:56:25 -0500
Subject: [Python-Dev] is this a bug? no environment variables
In-Reply-To: <4CEAE6A7.3010902@g.nevcal.com>
References: <4CEA0246.9080607@g.nevcal.com> <4CEAE6A7.3010902@g.nevcal.com>
Message-ID:

On Mon, Nov 22, 2010 at 16:54, Glenn Linderman wrote:
> I suppose it is possible that some environment variables are used by Python
> directly (but I can't seem to find a documented list of them) although I
> would expect that usage to be optional, with fall-back defaults when they
> don't exist.

I can verify that that's the case: Python (at least through 3.1.2) runs fine on Windows platforms when environment variables are completely unavailable. I know that from running our port for Windows CE (which has no environment variables at all), cross-compiled for Windows XP.

--
Tim Lesher

From martin at v.loewis.de Tue Nov 23 00:16:47 2010
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Tue, 23 Nov 2010 00:16:47 +0100
Subject: [Python-Dev] Solaris family and 64 bits compiling
In-Reply-To: <4CEAF3E6.4080602@netwok.org>
References: <4CEAB7C9.7020504@jcea.es> <4CEAF3E6.4080602@netwok.org>
Message-ID: <4CEAF9DF.6070509@v.loewis.de>

On 22.11.2010 23:51, Éric Araujo wrote:
> Hi,
>
> I think this bug is related: http://bugs.python.org/issue1294959
> “Problems with /usr/lib64 builds.”

Perhaps more closely related:

http://bugs.python.org/issue847812
http://bugs.python.org/issue1733484
http://bugs.python.org/issue1676121
http://bugs.python.org/issue1628484

Regards,
Martin

From jcea at jcea.es Tue Nov 23 00:41:19 2010
From: jcea at jcea.es (Jesus Cea)
Date: Tue, 23 Nov 2010 00:41:19 +0100
Subject: [Python-Dev] Solaris family and 64 bits compiling
In-Reply-To: <4CEAE934.9000106@v.loewis.de>
References: <4CEAB7C9.7020504@jcea.es> <4CEAC3F0.4040806@jcea.es> <4CEAC798.5050707@v.loewis.de> <4CEAE129.2060505@jcea.es> <4CEAE934.9000106@v.loewis.de>
Message-ID: <4CEAFF9F.5070503@jcea.es>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 22/11/10 23:05, "Martin v. Löwis" wrote:
>> PS: Martin, is there any reason to restrict the solaris 10 buildslaves
>> to 32 bits, beside the said problems?.
>
> I don't see that as a restriction. I have to make a choice, and there
> are sooo many choices to make:
> - gcc vs. SunPRO
> - 32-bit vs. 64-bit
> - GNU make vs. /usr/ccs/bin/make
>
> I picked the combination which was most easy to setup, and is therefore
> likely to be used by most users (except for those who think 64-bit
> is somehow "better" than 32-bit, when it is actually the other way
> 'round - IMO).

Do not think this is a personal attack. Not at all. I am deploying 32-bit and 64-bit buildslaves (on the same machine) and feeling the pain. You are far more experienced than me with buildbots and python.
I want to know if I am missing something. > As for configuration, I personally prefer that setting CC indicates > what type of build you want. Set CC to "gcc -m64" to indicate a > 64-build. Ideally, you will *not* have to adjust library paths, since > the other compiler will know on its own where to search things. The problem is not with system library paths. Compilers overcome that. The problem is with things like "/usr/local/lib" and hardcoded library paths in Python. For example, checking : """ gcc -shared -m64 build/temp.solaris-2.11-i86pc-3.2-pydebug/export/home/buildbot/64bits/3.x.cea-indiana-amd64/build/Modules/readline.o - -L/usr/lib/termcap -L/usr/local/lib -lreadline -lncursesw -o build/lib.solaris-2.11-i86pc-3.2-pydebug/readline.so ld: fatal: file /usr/local/lib/libncursesw.so: wrong ELF class: ELFCLASS32 ld: fatal: file processing errors. No output written to build/lib.solaris-2.11-i86pc-3.2-pydebug/readline.so collect2: ld returned 1 exit status """ The "-L/usr/local/lib" should be "-L/usr/local/lib/64". An example of many. - -- Jesus Cea Avion _/_/ _/_/_/ _/_/_/ jcea at jcea.es - http://www.jcea.es/ _/_/ _/_/ _/_/ _/_/ _/_/ jabber / xmpp:jcea at jabber.org _/_/ _/_/ _/_/_/_/_/ . 
_/_/ _/_/ _/_/ _/_/ _/_/ "Things are not so easy" _/_/ _/_/ _/_/ _/_/ _/_/ _/_/ "My name is Dump, Core Dump" _/_/_/ _/_/_/ _/_/ _/_/ "El amor es poner tu felicidad en la felicidad de otro" - Leibniz -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iQCVAwUBTOr/n5lgi5GaxT1NAQLzogP/Sb2VMe7UwK/YeB8/cQSxhuoKeNRre0pZ XCJDePusysqI3uXBHmH8vitEIILmUKd5kQ6vsFwErPIry7ikl2fbDHe7eQgNr2HK o5Xcul36bqtuKWGkDV+gIyBH/m9k4pkvc7Lfp3mvR7yiYTBB75V/azt64XSTC9si 7QjjetX5wnA= =NCtE -----END PGP SIGNATURE----- From benjamin at python.org Tue Nov 23 00:47:16 2010 From: benjamin at python.org (Benjamin Peterson) Date: Mon, 22 Nov 2010 17:47:16 -0600 Subject: [Python-Dev] [Python-checkins] r86699 - python/branches/py3k/Lib/zipfile.py In-Reply-To: <20101122233126.C8BDBEE981@mail.python.org> References: <20101122233126.C8BDBEE981@mail.python.org> Message-ID: No test? 2010/11/22 lukasz.langa : > Author: lukasz.langa > Date: Tue Nov 23 00:31:26 2010 > New Revision: 86699 > > Log: > Issue #9846: ZipExtFile provides no mechanism for closing the underlying file object > > > > Modified: > ? python/branches/py3k/Lib/zipfile.py > > Modified: python/branches/py3k/Lib/zipfile.py > ============================================================================== > --- python/branches/py3k/Lib/zipfile.py (original) > +++ python/branches/py3k/Lib/zipfile.py Tue Nov 23 00:31:26 2010 > @@ -473,9 +473,11 @@ > ? ? # Search for universal newlines or line chunks. > ? ? PATTERN = re.compile(br'^(?P [^\r\n]+)|(?P \n|\r\n?)') > > - ? ?def __init__(self, fileobj, mode, zipinfo, decrypter=None): > + ? ?def __init__(self, fileobj, mode, zipinfo, decrypter=None, > + ? ? ? ? ? ? ? ? close_fileobj=False): > ? ? ? ? self._fileobj = fileobj > ? ? ? ? self._decrypter = decrypter > + ? ? ? ?self._close_fileobj = close_fileobj > > ? ? ? ? self._compress_type = zipinfo.compress_type > ? ? ? ? 
self._compress_size = zipinfo.compress_size > @@ -647,6 +649,12 @@ > ? ? ? ? self._offset += len(data) > ? ? ? ? return data > > + ? ?def close(self): > + ? ? ? ?try: > + ? ? ? ? ? ?if self._close_fileobj: > + ? ? ? ? ? ? ? ?self._fileobj.close() > + ? ? ? ?finally: > + ? ? ? ? ? ?super().close() > > > ?class ZipFile: > @@ -889,8 +897,10 @@ > ? ? ? ? # given a file object in the constructor > ? ? ? ? if self._filePassed: > ? ? ? ? ? ? zef_file = self.fp > + ? ? ? ? ? ?should_close = False > ? ? ? ? else: > ? ? ? ? ? ? zef_file = io.open(self.filename, 'rb') > + ? ? ? ? ? ?should_close = True > > ? ? ? ? # Make sure we have an info object > ? ? ? ? if isinstance(name, ZipInfo): > @@ -944,7 +954,7 @@ > ? ? ? ? ? ? if h[11] != check_byte: > ? ? ? ? ? ? ? ? raise RuntimeError("Bad password for file", name) > > - ? ? ? ?return ?ZipExtFile(zef_file, mode, zinfo, zd) > + ? ? ? ?return ?ZipExtFile(zef_file, mode, zinfo, zd, close_fileobj=should_close) > > ? ? def extract(self, member, path=None, pwd=None): > ? ? ? ? """Extract a member from the archive to the current working directory, > _______________________________________________ > Python-checkins mailing list > Python-checkins at python.org > http://mail.python.org/mailman/listinfo/python-checkins > -- Regards, Benjamin From jcea at jcea.es Tue Nov 23 00:48:06 2010 From: jcea at jcea.es (Jesus Cea) Date: Tue, 23 Nov 2010 00:48:06 +0100 Subject: [Python-Dev] Solaris family and 64 bits compiling In-Reply-To: <4CEAE934.9000106@v.loewis.de> References: <4CEAB7C9.7020504@jcea.es> <4CEAC3F0.4040806@jcea.es> <4CEAC798.5050707@v.loewis.de> <4CEAE129.2060505@jcea.es> <4CEAE934.9000106@v.loewis.de> Message-ID: <4CEB0136.9050602@jcea.es> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 I think this is probably trivial, but is there any foolproof way to detect 64 bit builds in python, beside "sys.maxint"?. And any macro useable for conditional compilation in C?. Checking Solaris 10 header files, I see macros like "_LP64". 
Portability would be nice, but in this personal case, probably unneeded...

- --
Jesus Cea Avion _/_/ _/_/_/ _/_/_/
jcea at jcea.es - http://www.jcea.es/ _/_/ _/_/ _/_/ _/_/ _/_/
jabber / xmpp:jcea at jabber.org _/_/ _/_/ _/_/_/_/_/
. _/_/ _/_/ _/_/ _/_/ _/_/
"Things are not so easy" _/_/ _/_/ _/_/ _/_/ _/_/ _/_/
"My name is Dump, Core Dump" _/_/_/ _/_/_/ _/_/ _/_/
"El amor es poner tu felicidad en la felicidad de otro" - Leibniz
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iQCVAwUBTOsBNplgi5GaxT1NAQLkJwP+P1YyABBPGInHJXvwsU2ZLuj+u/OuZCRE
m6hmbZgMajAyc5NtTie36qyHKAtVBcxFFvUdDeyfDZXV5gU+dF9Ha7/R16dclG3k
b5W0CbccnGFcQJ/XypNPjH2dYPFDiqF8kCkDfeLJ7ZyL9ojA1YlRGFrgswN77/cF
XM7Cwq1mh5k=
=JXDq
-----END PGP SIGNATURE-----

From tjreedy at udel.edu Tue Nov 23 00:58:03 2010
From: tjreedy at udel.edu (Terry Reedy)
Date: Mon, 22 Nov 2010 18:58:03 -0500
Subject: [Python-Dev] Missing Python Symbols when Starting Python App (Apache/Django/Mod_Wsgi)
In-Reply-To:
References:
Message-ID:

On 11/22/2010 5:46 PM, Anurag Chourasia wrote:
>
> [Mon Nov 22 09:45:43 2010] [error] [client 108.10.0.191] mod_wsgi
> (pid=1273874): Target WSGI script '/u01/home/apli/wm/app/gdd/pyserver/
> apache/django.wsgi' cannot be loaded as Python module.

All the other errors probably stem from this.

> Please guide.

Ask usage questions like this on python-list or a django-specific list. python-dev is for discussion of development of future versions of Python, not usage of current versions.

--
Terry Jan Reedy

From martin at v.loewis.de Tue Nov 23 01:05:59 2010
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Tue, 23 Nov 2010 01:05:59 +0100
Subject: [Python-Dev] Solaris family and 64 bits compiling
In-Reply-To: <4CEAFF9F.5070503@jcea.es>
References: <4CEAB7C9.7020504@jcea.es> <4CEAC3F0.4040806@jcea.es> <4CEAC798.5050707@v.loewis.de> <4CEAE129.2060505@jcea.es> <4CEAE934.9000106@v.loewis.de> <4CEAFF9F.5070503@jcea.es>
Message-ID: <4CEB0567.8040500@v.loewis.de>

On 23.11.2010 00:41, Jesus Cea wrote:
> On 22/11/10 23:05, "Martin v. Löwis" wrote:
>>> PS: Martin, is there any reason to restrict the solaris 10 buildslaves
>>> to 32 bits, beside the said problems?.
>
>> I don't see that as a restriction. I have to make a choice, and there
>> are sooo many choices to make:
>> - gcc vs. SunPRO
>> - 32-bit vs. 64-bit
>> - GNU make vs. /usr/ccs/bin/make
>
>> I picked the combination which was most easy to setup, and is therefore
>> likely to be used by most users (except for those who think 64-bit
>> is somehow "better" than 32-bit, when it is actually the other way
>> 'round - IMO).
>
> Do not think this is a personal attack.

No offense taken. If you really want to know the historical background: this was the very first build slave (before I actually announced it to python-dev), and I haven't changed much from the initial setup.

I just point out that none of the binaries in /usr/bin is a 64-bit binary; this includes the Sun-provided /usr/sfw/bin/python

> The "-L/usr/local/lib" should be "-L/usr/local/lib/64". An example of many.

Is that really the case? I.e. will ncurses automatically install into /usr/local/lib/64 if built with a 64-bit compiler? My installation doesn't even have a /usr/local/lib/64 folder.

In any case: this shouldn't need a configure option. Instead, Python can find out itself whether it's a 64-bit build, and make modifications it considers necessary.
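
As an illustration of the idea that the interpreter can work out its own bitness and adjust library paths, here is a hedged sketch. It assumes the Solaris-style layout described earlier in the thread, where the 64-bit libraries live in a `64` subdirectory; `is_64bit_build` and `library_search_dirs` are hypothetical helpers, not anything that Python's build scripts are known to contain.

```python
import posixpath
import struct
import sys


def is_64bit_build():
    """True if the running interpreter uses 8-byte C pointers."""
    # struct.calcsize("P") is the size in bytes of a C "void *".
    return struct.calcsize("P") == 8


def library_search_dirs(base_dirs, want_64bit):
    """Map each library directory to its "64" subdirectory when requested.

    Assumption: 64-bit libraries sit in <dir>/64, as in the
    /usr/local/lib/64 example from the thread. The 32-bit directory is
    dropped for 64-bit builds so the linker never sees the wrong ELF class.
    """
    if not want_64bit:
        return list(base_dirs)
    return [posixpath.join(base, "64") for base in base_dirs]


dirs = library_search_dirs(["/usr/local/lib", "/usr/lib"], is_64bit_build())
```

Replacing the directory rather than adding to it is a deliberate choice in this sketch: it sidesteps the linker error shown above for a 32-bit `libncursesw.so`, at the cost of finding nothing when no `64` subdirectory exists.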
Regards,
Martin

From solipsis at pitrou.net  Tue Nov 23 01:06:12 2010
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Tue, 23 Nov 2010 01:06:12 +0100
Subject: [Python-Dev] Solaris family and 64 bits compiling
References: <4CEAB7C9.7020504@jcea.es> <4CEAC3F0.4040806@jcea.es>
 <4CEAC798.5050707@v.loewis.de> <4CEAE129.2060505@jcea.es>
 <4CEAE934.9000106@v.loewis.de> <4CEB0136.9050602@jcea.es>
Message-ID: <20101123010612.119d401c@pitrou.net>

On Tue, 23 Nov 2010 00:48:06 +0100
Jesus Cea wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> I think this is probably trivial, but is there any foolproof way to
> detect 64 bit builds in python, beside "sys.maxint"?.

sys.maxsize

> And any macro useable for conditional compilation in C?.

SIZEOF_VOID_P > 4

From brian.curtin at gmail.com  Tue Nov 23 01:06:33 2010
From: brian.curtin at gmail.com (Brian Curtin)
Date: Mon, 22 Nov 2010 18:06:33 -0600
Subject: [Python-Dev] Solaris family and 64 bits compiling
In-Reply-To: <4CEB0136.9050602@jcea.es>
References: <4CEAB7C9.7020504@jcea.es> <4CEAC3F0.4040806@jcea.es>
 <4CEAC798.5050707@v.loewis.de> <4CEAE129.2060505@jcea.es>
 <4CEAE934.9000106@v.loewis.de> <4CEB0136.9050602@jcea.es>
Message-ID: 

On Mon, Nov 22, 2010 at 17:48, Jesus Cea wrote:

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> I think this is probably trivial, but is there any foolproof way to
> detect 64 bit builds in python, beside "sys.maxint"?.
>

import platform
platform.architecture()
-------------- next part --------------
An HTML attachment was scrubbed...
URL: From martin at v.loewis.de Tue Nov 23 01:12:16 2010 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Tue, 23 Nov 2010 01:12:16 +0100 Subject: [Python-Dev] Solaris family and 64 bits compiling In-Reply-To: <4CEB0136.9050602@jcea.es> References: <4CEAB7C9.7020504@jcea.es> <4CEAC3F0.4040806@jcea.es> <4CEAC798.5050707@v.loewis.de> <4CEAE129.2060505@jcea.es> <4CEAE934.9000106@v.loewis.de> <4CEB0136.9050602@jcea.es> Message-ID: <4CEB06E0.1080204@v.loewis.de> Am 23.11.2010 00:48, schrieb Jesus Cea: > I think this is probably trivial, but is there any foolproof way to > detect 64 bit builds in python, beside "sys.maxint"?. The canonical way is to use platform.architecture(). > And any macro useable for conditional compilation in C?. You need to be more specific than that. There are perhaps ten independent properties you may query, depending on what precise problem you try to solve. Most likely, you are looking for SIZEOF_VOID_P (but don't use that unless you literally want to know how many bytes a pointer uses, or whether it uses 4 or 8 bytes). Regards, Martin From lukasz at langa.pl Tue Nov 23 01:25:01 2010 From: lukasz at langa.pl (=?utf-8?Q?=C5=81ukasz_Langa?=) Date: Tue, 23 Nov 2010 01:25:01 +0100 Subject: [Python-Dev] [Python-checkins] r86699 - python/branches/py3k/Lib/zipfile.py In-Reply-To: References: <20101122233126.C8BDBEE981@mail.python.org> Message-ID: <66720F75-169A-4702-AF53-69845701AA55@langa.pl> Wiadomo?? napisana przez Benjamin Peterson w dniu 2010-11-23, o godz. 00:47: > No test? > The tests were there already, raising ResourceWarnings. After this change, they stopped doing that. 
You may say: now they pass for the first time :)

Best regards,
Łukasz

> 2010/11/22 lukasz.langa :
>> Author: lukasz.langa
>> Date: Tue Nov 23 00:31:26 2010
>> New Revision: 86699
>>
>> Log:
>> Issue #9846: ZipExtFile provides no mechanism for closing the underlying file object
>>
>>
>>
>> Modified:
>>   python/branches/py3k/Lib/zipfile.py
>>
>> Modified: python/branches/py3k/Lib/zipfile.py
>> ==============================================================================
>> --- python/branches/py3k/Lib/zipfile.py (original)
>> +++ python/branches/py3k/Lib/zipfile.py Tue Nov 23 00:31:26 2010
>> @@ -473,9 +473,11 @@
>>      # Search for universal newlines or line chunks.
>>      PATTERN = re.compile(br'^(?P<chunk>[^\r\n]+)|(?P<newline>\n|\r\n?)')
>>
>> -    def __init__(self, fileobj, mode, zipinfo, decrypter=None):
>> +    def __init__(self, fileobj, mode, zipinfo, decrypter=None,
>> +                 close_fileobj=False):
>>          self._fileobj = fileobj
>>          self._decrypter = decrypter
>> +        self._close_fileobj = close_fileobj
>>
>>          self._compress_type = zipinfo.compress_type
>>          self._compress_size = zipinfo.compress_size
>> @@ -647,6 +649,12 @@
>>          self._offset += len(data)
>>          return data
>>
>> +    def close(self):
>> +        try:
>> +            if self._close_fileobj:
>> +                self._fileobj.close()
>> +        finally:
>> +            super().close()
>>
>>
>>  class ZipFile:
>> @@ -889,8 +897,10 @@
>>          # given a file object in the constructor
>>          if self._filePassed:
>>              zef_file = self.fp
>> +            should_close = False
>>          else:
>>              zef_file = io.open(self.filename, 'rb')
>> +            should_close = True
>>
>>          # Make sure we have an info object
>>          if isinstance(name, ZipInfo):
>> @@ -944,7 +954,7 @@
>>              if h[11] != check_byte:
>>                  raise RuntimeError("Bad password for file", name)
>>
>> -        return ZipExtFile(zef_file, mode, zinfo, zd)
>> +        return ZipExtFile(zef_file, mode, zinfo, zd, close_fileobj=should_close)
>>
>>      def extract(self, member, path=None, pwd=None):
>>          """Extract a member from the archive to the current working directory,
>>
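The ownership idea in the diff above (close the underlying file only when the wrapper opened it itself) can be sketched in isolation. This is not the zipfile code; `OwnedReader` and `open_reader` are hypothetical names used only for this illustration.

```python
import io


class OwnedReader(io.BufferedIOBase):
    """Read-only wrapper that optionally owns the file object it wraps."""

    def __init__(self, fileobj, close_fileobj=False):
        self._fileobj = fileobj
        # True only when this wrapper is responsible for closing fileobj.
        self._close_fileobj = close_fileobj

    def readable(self):
        return True

    def read(self, n=-1):
        return self._fileobj.read(n)

    def close(self):
        try:
            if self._close_fileobj:
                self._fileobj.close()
        finally:
            # Always mark the wrapper itself as closed.
            super().close()


def open_reader(source):
    """Wrap `source`, which is either a path or an already open binary file."""
    if hasattr(source, "read"):
        # The caller passed a file object and stays responsible for it.
        return OwnedReader(source, close_fileobj=False)
    # We open the file ourselves, so the wrapper must close it.
    return OwnedReader(io.open(source, "rb"), close_fileobj=True)
```

With this split, a file object handed in by the caller survives closing the wrapper, while a file opened on the caller's behalf does not leak and so should not trigger a ResourceWarning.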
_______________________________________________ >> Python-checkins mailing list >> Python-checkins at python.org >> http://mail.python.org/mailman/listinfo/python-checkins >> > > > > -- > Regards, > Benjamin > _______________________________________________ > Python-checkins mailing list > Python-checkins at python.org > http://mail.python.org/mailman/listinfo/python-checkins -- Pozdrawiam serdecznie, ?ukasz Langa tel. +48 791 080 144 WWW http://lukasz.langa.pl/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From reinout at vanrees.org Mon Nov 22 23:52:10 2010 From: reinout at vanrees.org (Reinout van Rees) Date: Mon, 22 Nov 2010 23:52:10 +0100 Subject: [Python-Dev] Missing Python Symbols when Starting Python App (Apache/Django/Mod_Wsgi) In-Reply-To: References: Message-ID: On 11/22/2010 11:46 PM, Anurag Chourasia wrote: > > I have a problem in starting my Python(Django) App using Apache and Mod_Wsgi I'm pretty sure you're asking on the wrong list. This one is for discussing development of python-the-language :-) You'd better head over to the django-user mailinglist, for instance via http://groups.google.com/group/django-users Reinout -- Reinout van Rees - reinout at vanrees.org - http://reinout.vanrees.org Collega's gezocht! Django/python vacature in Utrecht: http://tinyurl.com/35v34f9 From lukasz at langa.pl Tue Nov 23 01:43:21 2010 From: lukasz at langa.pl (=?utf-8?Q?=C5=81ukasz_Langa?=) Date: Tue, 23 Nov 2010 01:43:21 +0100 Subject: [Python-Dev] Re-enable warnings in regrtest and/or unittest In-Reply-To: <4CEAE828.5000801@voidspace.org.uk> References: <4CEAA4DB.6020904@gmail.com> <4CEAA9D4.2020904@langa.pl> <4CEAAC56.2090702@voidspace.org.uk> <4CEABD59.6080005@gmail.com> <4CEAE828.5000801@voidspace.org.uk> Message-ID: Wiadomo?? napisana przez Michael Foord w dniu 2010-11-22, o godz. 
23:01: > On 22/11/2010 21:08, Guido van Rossum wrote: >> On Mon, Nov 22, 2010 at 11:24 AM, Brett Cannon wrote: >>> The problem with that is it means developers who switch to Python 3.2 >>> or whatever are suddenly going to have their tests fail until they >>> update their code to turn the warnings off. >> That sounds like a feature to me... :-) >> > I think Ezio was suggesting just turning warnings on by default when unittest is run, not turning them into errors. Ezio is suggesting that developers could explicitly turn warnings off again, but when you use the default test runner warnings would be shown. His logic is that warnings are for developers, and so are tests... Then again, he is not against the idea to turn those warnings into errors, at least for regrtest. If you agree to do that for regrtest I will clean up the tests for warnings. Already did that for zipfile so it doesn't raise ResourceWarnings anymore. I just need to correct multiprocessing and xmlrpc ResourceWarnings, silence some DeprecationWarnings in the tests and we're all set. Ah, I see a couple more with -uall but nothing scary. Anyway, I find warnings as errors in regrtest a welcome feature. Let's make it happen :) -- Best regards, ?ukasz Langa tel. +48 791 080 144 WWW http://lukasz.langa.pl/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From jcea at jcea.es Tue Nov 23 01:47:01 2010 From: jcea at jcea.es (Jesus Cea) Date: Tue, 23 Nov 2010 01:47:01 +0100 Subject: [Python-Dev] Solaris family and 64 bits compiling In-Reply-To: <4CEB0567.8040500@v.loewis.de> References: <4CEAB7C9.7020504@jcea.es> <4CEAC3F0.4040806@jcea.es> <4CEAC798.5050707@v.loewis.de> <4CEAE129.2060505@jcea.es> <4CEAE934.9000106@v.loewis.de> <4CEAFF9F.5070503@jcea.es> <4CEB0567.8040500@v.loewis.de> Message-ID: <4CEB0F05.1040700@jcea.es> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 23/11/10 01:05, "Martin v. L?wis" wrote: > No offense taken. 
If you really want to know the historical background: > this was the very first build slave (before I actually announced it to > python-dev), and I haven't changed much from the initial setup. I do really want to know. I love trivia :-). Thanks. > I just point out that none of the binaries in /usr/bin is a 64-bit > binary; this includes the Sun-provided /usr/sfw/bin/python > >> The "-L/usr/local/lib" should be "-L/usr/local/lib/64". An example of many. > > Is that really the case? I.e. will ncurses automatically install into > /usr/local/lib/64 if built with a 64-bit compiler? My installation > doesn't even have a /usr/local/lib/64 folder. A fresh Solaris 10 install doesn't even have a "/usr/local" directory :). Sadly today most Open Source code is written like if Linux were the only Unix system out there. I was amazed that OpenSSL 1.0 installs automatically in "/usr/local/ssl/lib" when compiled in 32 bits, and in "/usr/local/ssl/lib/64" when compiled in 64 bits. I almost cry. > In any case: this shouldn't need a configure option. Instead, Python can > find out itself whether it's a 64-bit build, and make modifications > it considers necessary. I agree. Python should detect it automatically and update the paths when compiling. - -- Jesus Cea Avion _/_/ _/_/_/ _/_/_/ jcea at jcea.es - http://www.jcea.es/ _/_/ _/_/ _/_/ _/_/ _/_/ jabber / xmpp:jcea at jabber.org _/_/ _/_/ _/_/_/_/_/ . 
_/_/ _/_/ _/_/ _/_/ _/_/ "Things are not so easy" _/_/ _/_/ _/_/ _/_/ _/_/ _/_/ "My name is Dump, Core Dump" _/_/_/ _/_/_/ _/_/ _/_/ "El amor es poner tu felicidad en la felicidad de otro" - Leibniz -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iQCVAwUBTOsPBZlgi5GaxT1NAQIw+QP/ZuxpWo2WZYUUcDfARRnOtp60n4PbIGMf fqQ4ZnC9JnelzKDU9kBo0yReL2zYAw0ZwezsGwZ98M9i3XyKkFCtcJcM1vXpIsDL eBwga8kPDpab5loP/vuac5kVC0wn0Z0z8x+BRMW6mwoOMHJzd463E8GTQywdx3x1 06FUHwJ0Hv4= =PV43 -----END PGP SIGNATURE----- From jcea at jcea.es Tue Nov 23 01:58:46 2010 From: jcea at jcea.es (Jesus Cea) Date: Tue, 23 Nov 2010 01:58:46 +0100 Subject: [Python-Dev] Solaris family and 64 bits compiling In-Reply-To: <4CEB0567.8040500@v.loewis.de> References: <4CEAB7C9.7020504@jcea.es> <4CEAC3F0.4040806@jcea.es> <4CEAC798.5050707@v.loewis.de> <4CEAE129.2060505@jcea.es> <4CEAE934.9000106@v.loewis.de> <4CEAFF9F.5070503@jcea.es> <4CEB0567.8040500@v.loewis.de> Message-ID: <4CEB11C6.1010504@jcea.es> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 23/11/10 01:05, "Martin v. L?wis" wrote: > I just point out that none of the binaries in /usr/bin is a 64-bit > binary; this includes the Sun-provided /usr/sfw/bin/python True. This is for simplicity reasons (provide only one binary valid for 32 and 64 bits CPUs) and because 64 bits is overkill for a lot of stuff. In my own system my only 64 bits libraries are OpenSSL, GMP, and some multimedia stuff like mencoder, vorbis, etc, where the difference is big. And the GCC 4.5.x install, that installs libraries (fortran, stdc++, objective C, etc) automatically under "/usr/local/lib/64". GOOD. But if we say the Python can be compiled as 64 bits under Solaris, would be nice if that was actually true. Now that we have a buildbot (under OpenIndiana) to test, it is doable. If not, we could say that Solaris+64 bits is unsupported. I don't think we should go that way. 
Solaris+64 bits should be a full citizen. - -- Jesus Cea Avion _/_/ _/_/_/ _/_/_/ jcea at jcea.es - http://www.jcea.es/ _/_/ _/_/ _/_/ _/_/ _/_/ jabber / xmpp:jcea at jabber.org _/_/ _/_/ _/_/_/_/_/ . _/_/ _/_/ _/_/ _/_/ _/_/ "Things are not so easy" _/_/ _/_/ _/_/ _/_/ _/_/ _/_/ "My name is Dump, Core Dump" _/_/_/ _/_/_/ _/_/ _/_/ "El amor es poner tu felicidad en la felicidad de otro" - Leibniz -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iQCVAwUBTOsRxplgi5GaxT1NAQKqqAP/fkiPpnPswMYOWc30Bflg3nDqRf6ih1bW ZZYHEMuJN9C8rm419LnRtoTyeAruHQYJ3o/dAoA2xDZu1xDYz8OOJKzG1L8hRVce OGm9TmziS4zuwWS4sYdmh21/ZCuD0MVq3gqD1h8zYPwrqbTTA6shYr6/He5hAo6j 5PsYWj4gIAE= =Rr80 -----END PGP SIGNATURE----- From benjamin at python.org Tue Nov 23 05:00:08 2010 From: benjamin at python.org (Benjamin Peterson) Date: Mon, 22 Nov 2010 22:00:08 -0600 Subject: [Python-Dev] [Python-checkins] r86699 - python/branches/py3k/Lib/zipfile.py In-Reply-To: <66720F75-169A-4702-AF53-69845701AA55@langa.pl> References: <20101122233126.C8BDBEE981@mail.python.org> <66720F75-169A-4702-AF53-69845701AA55@langa.pl> Message-ID: 2010/11/22 ?ukasz Langa : > Wiadomo?? napisana przez Benjamin Peterson w dniu 2010-11-23, o godz. 00:47: > > No test? > > > The tests were there already, raising ResourceWarnings. After this change, > they stopped doing that. You may say: now they pass for the first time :) It looks like you added new API, though. For that, we would expect new tests. -- Regards, Benjamin From ocean-city at m2.ccsnet.ne.jp Tue Nov 23 05:13:38 2010 From: ocean-city at m2.ccsnet.ne.jp (Hirokazu Yamamoto) Date: Tue, 23 Nov 2010 13:13:38 +0900 Subject: [Python-Dev] OpenSSL Voluntarily (openssl-1.0.0a) Message-ID: <4CEB3F72.7000006@m2.ccsnet.ne.jp> Hello. Does this affect python? Thank you. 
http://www.openssl.org/news/secadv_20101116.txt From glyph at twistedmatrix.com Tue Nov 23 06:07:09 2010 From: glyph at twistedmatrix.com (Glyph Lefkowitz) Date: Tue, 23 Nov 2010 00:07:09 -0500 Subject: [Python-Dev] OpenSSL Voluntarily (openssl-1.0.0a) In-Reply-To: <4CEB3F72.7000006@m2.ccsnet.ne.jp> References: <4CEB3F72.7000006@m2.ccsnet.ne.jp> Message-ID: On Mon, Nov 22, 2010 at 11:13 PM, Hirokazu Yamamoto < ocean-city at m2.ccsnet.ne.jp> wrote: > Hello. Does this affect python? Thank you. > > http://www.openssl.org/news/secadv_20101116.txt > No. -------------- next part -------------- An HTML attachment was scrubbed... URL: From tjreedy at udel.edu Tue Nov 23 07:13:44 2010 From: tjreedy at udel.edu (Terry Reedy) Date: Tue, 23 Nov 2010 01:13:44 -0500 Subject: [Python-Dev] [Python-checkins] r86702 - python/branches/py3k/Lib/idlelib/IOBinding.py In-Reply-To: <20101123060131.EB345EE9C0@mail.python.org> References: <20101123060131.EB345EE9C0@mail.python.org> Message-ID: <4CEB5B98.6070003@udel.edu> On 11/23/2010 1:01 AM, terry.reedy wrote: > Author: terry.reedy > Date: Tue Nov 23 07:01:31 2010 > New Revision: 86702 > > Log: Issue 9222 Fix filetypes for open dialog Sorry, forgot to add this before clicking [go] or whatever the button is. Is there any way to revise a revision ;-? 
> Modified: > python/branches/py3k/Lib/idlelib/IOBinding.py > > Modified: python/branches/py3k/Lib/idlelib/IOBinding.py > ============================================================================== > --- python/branches/py3k/Lib/idlelib/IOBinding.py (original) > +++ python/branches/py3k/Lib/idlelib/IOBinding.py Tue Nov 23 07:01:31 2010 > @@ -476,8 +476,8 @@ > savedialog = None > > filetypes = [ > - ("Python and text files", "*.py *.pyw *.txt", "TEXT"), > - ("All text files", "*", "TEXT"), > + ("Python files", "*.py *.pyw", "TEXT"), > + ("Text files", "*.txt", "TEXT"), > ("All files", "*"), > ] From orsenthil at gmail.com Tue Nov 23 07:16:12 2010 From: orsenthil at gmail.com (Senthil Kumaran) Date: Tue, 23 Nov 2010 14:16:12 +0800 Subject: [Python-Dev] [Python-checkins] r86703 - python/branches/release31-maint/Lib/idlelib/IOBinding.py In-Reply-To: <20101123060705.0651CEE9C0@mail.python.org> References: <20101123060705.0651CEE9C0@mail.python.org> Message-ID: Hi Terry, On Tue, Nov 23, 2010 at 2:07 PM, terry.reedy wrote: > Author: terry.reedy > Date: Tue Nov 23 07:07:04 2010 > New Revision: 86703 > > Log: > Issue 9222 Fix filetypes for open dialog > > Modified: > ? python/branches/release31-maint/Lib/idlelib/IOBinding.py You should be using svnmerge.py script ( referenced in the dev FAQ), to merge your changes to release31-maint. This helps in merge tracking and helpful to release managers when they do the release. It is pretty simple, in your release31-maint checkout: Just run python svnmerge.py merge -r 9221 (your py3k revision value) If successful, do a svn commit -F svnmerge-output-filename ( this file is autogenerated) If any conflicts occur, resolve them and then do the step 2. 
Thanks, Senthil From g.brandl at gmx.net Tue Nov 23 07:44:43 2010 From: g.brandl at gmx.net (Georg Brandl) Date: Tue, 23 Nov 2010 07:44:43 +0100 Subject: [Python-Dev] [Python-checkins] r86702 - python/branches/py3k/Lib/idlelib/IOBinding.py In-Reply-To: <4CEB5B98.6070003@udel.edu> References: <20101123060131.EB345EE9C0@mail.python.org> <4CEB5B98.6070003@udel.edu> Message-ID: Am 23.11.2010 07:13, schrieb Terry Reedy: > > > On 11/23/2010 1:01 AM, terry.reedy wrote: >> Author: terry.reedy >> Date: Tue Nov 23 07:01:31 2010 >> New Revision: 86702 >> >> Log: > Issue 9222 Fix filetypes for open dialog > > Sorry, forgot to add this before clicking [go] or whatever the button > is. Is there any way to revise a revision ;-? Yes, with SVN there is. I don't know if you can do it with whatever GUI tool you use, but the command is the following: svn propedit --revprop -r 86702 svn:log In a short time however, after switching to Mercurial, commits will be truly immutable. However, since the equivalent to committing in SVN is a two-step process (commit locally and then push one or more commits to the public repo on the server), you can review your commits locally before pushing them, and fix mistakes by "rewriting history" (you can see from that description that it won't work when the changes are already public). 
Georg From tjreedy at udel.edu Tue Nov 23 07:49:56 2010 From: tjreedy at udel.edu (Terry Reedy) Date: Tue, 23 Nov 2010 01:49:56 -0500 Subject: [Python-Dev] [Python-checkins] r86703 - python/branches/release31-maint/Lib/idlelib/IOBinding.py In-Reply-To: References: <20101123060705.0651CEE9C0@mail.python.org> Message-ID: <4CEB6414.9020606@udel.edu> On 11/23/2010 1:16 AM, Senthil Kumaran wrote: > Hi Terry, > > On Tue, Nov 23, 2010 at 2:07 PM, terry.reedy wrote: >> Author: terry.reedy >> Date: Tue Nov 23 07:07:04 2010 >> New Revision: 86703 >> >> Log: >> Issue 9222 Fix filetypes for open dialog >> >> Modified: >> python/branches/release31-maint/Lib/idlelib/IOBinding.py > > > You should be using svnmerge.py script ( referenced in the dev FAQ), > to merge your changes to release31-maint. This helps in merge tracking > and helpful to release managers when they do the release. > > It is pretty simple, in your release31-maint checkout: > > Just run python svnmerge.py merge -r 9221 (your py3k revision value) > If successful, do a svn commit -F svnmerge-output-filename ( this file > is autogenerated) I am using TortoiseSVN which has a similar merge but does not seem to autogenerate anything. I did use its merge + commit for the 2.7 backport. Terry From martin at v.loewis.de Tue Nov 23 07:55:20 2010 From: martin at v.loewis.de (=?ISO-8859-15?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Tue, 23 Nov 2010 07:55:20 +0100 Subject: [Python-Dev] Solaris family and 64 bits compiling In-Reply-To: <4CEB11C6.1010504@jcea.es> References: <4CEAB7C9.7020504@jcea.es> <4CEAC3F0.4040806@jcea.es> <4CEAC798.5050707@v.loewis.de> <4CEAE129.2060505@jcea.es> <4CEAE934.9000106@v.loewis.de> <4CEAFF9F.5070503@jcea.es> <4CEB0567.8040500@v.loewis.de> <4CEB11C6.1010504@jcea.es> Message-ID: <4CEB6558.3000600@v.loewis.de> > But if we say the Python can be compiled as 64 bits under Solaris, would > be nice if that was actually true. Now that we have a buildbot (under > OpenIndiana) to test, it is doable. 
But it is true, and always has been true. The lib/64 issue did not prevent one building Python on Solaris/SPARC64 at all, including the extension modules. Just edit Modules/Setup to suit your needs - that works since 1995 (before distutils was even written). > If not, we could say that Solaris+64 bits is unsupported. I don't think > we should go that way. Solaris+64 bits should be a full citizen. There we go again: "supported". Python builds on many systems which we don't have buildbots for, including obscure systems (although Guido has ruled that we won't specifically accept code for obscure systems anymore, unlike we did before). It is never fully automatic (you always have to at least make sure manually that the dependencies are installed). Regards, Martin From tjreedy at udel.edu Tue Nov 23 08:16:11 2010 From: tjreedy at udel.edu (Terry Reedy) Date: Tue, 23 Nov 2010 02:16:11 -0500 Subject: [Python-Dev] [Python-checkins] r86702 - python/branches/py3k/Lib/idlelib/IOBinding.py In-Reply-To: References: <20101123060131.EB345EE9C0@mail.python.org> <4CEB5B98.6070003@udel.edu> Message-ID: On 11/23/2010 1:44 AM, Georg Brandl wrote: > Am 23.11.2010 07:13, schrieb Terry Reedy: >> >> >> On 11/23/2010 1:01 AM, terry.reedy wrote: >>> Author: terry.reedy >>> Date: Tue Nov 23 07:01:31 2010 >>> New Revision: 86702 >>> >>> Log: >> Issue 9222 Fix filetypes for open dialog >> >> Sorry, forgot to add this before clicking [go] or whatever the button >> is. Is there any way to revise a revision ;-? > > Yes, with SVN there is. I don't know if you can do it with whatever > GUI tool you use, but the command is the following: > > svn propedit --revprop -r 86702 svn:log (followed by new message?) OK, done. TortoiseSVN has a nice revision log dialog. Right click and one of the choices is 'edit log message'. Easy. I see that there is a TortoiseHg as well. 
-- Terry Jan Reedy From g.brandl at gmx.net Tue Nov 23 09:10:46 2010 From: g.brandl at gmx.net (Georg Brandl) Date: Tue, 23 Nov 2010 09:10:46 +0100 Subject: [Python-Dev] [Python-checkins] r86703 - python/branches/release31-maint/Lib/idlelib/IOBinding.py In-Reply-To: <4CEB6414.9020606@udel.edu> References: <20101123060705.0651CEE9C0@mail.python.org> <4CEB6414.9020606@udel.edu> Message-ID: Am 23.11.2010 07:49, schrieb Terry Reedy: > > > On 11/23/2010 1:16 AM, Senthil Kumaran wrote: >> Hi Terry, >> >> On Tue, Nov 23, 2010 at 2:07 PM, terry.reedy wrote: >>> Author: terry.reedy >>> Date: Tue Nov 23 07:07:04 2010 >>> New Revision: 86703 >>> >>> Log: >>> Issue 9222 Fix filetypes for open dialog >>> >>> Modified: >>> python/branches/release31-maint/Lib/idlelib/IOBinding.py >> >> >> You should be using svnmerge.py script ( referenced in the dev FAQ), >> to merge your changes to release31-maint. This helps in merge tracking >> and helpful to release managers when they do the release. >> >> It is pretty simple, in your release31-maint checkout: >> >> Just run python svnmerge.py merge -r 9221 (your py3k revision value) >> If successful, do a svn commit -F svnmerge-output-filename ( this file >> is autogenerated) > > I am using TortoiseSVN which has a similar merge but does not seem to > autogenerate anything. I did use its merge + commit for the 2.7 backport. While the policy is to use svnmerge and I'd expect developers to follow this policy, in this specific case it's not as important anymore since we use neither svnmerge's mass merging nor its blocking feature anymore. 
Georg From trent at snakebite.org Tue Nov 23 09:40:50 2010 From: trent at snakebite.org (Trent Nelson) Date: Tue, 23 Nov 2010 03:40:50 -0500 Subject: [Python-Dev] Stable buildbots In-Reply-To: References: <20101113133712.60e9be27@pitrou.net> Message-ID: <4CEB7E12.1070201@snakebite.org> On 14-Nov-10 3:48 AM, David Bolen wrote: > This is a completely separate issue, though probably around just as > long, and like the popup problem its frequency changes over time. By > "hung" here I'm referring to cases where something must go wrong with > a test and/or its cleanup such that a python_d process remains > running, usually several of them at the same time. My guess: the "hung" (single-threaded) Python process has called select() without a timeout in order to wait for some data. However, the data never arrives (due to a broken/failed test), and the select() never returns. On Windows, processes seem harder to kill when they get into this state. If I purposely wedge a Windows process via select() via the interactive interpreter, ctrl-c has absolutely no effect (whereas on Unix, ctrl-c will interrupt the select()). As for why kill_python.exe doesn't seem to be able to kill said wedged processes, the MSDN documentation on TerminateProcess[1] states the following: The terminated process cannot exit until all pending I/O has been completed or canceled. (sic) It's not unreasonable to assume a wedged select() constitutes pending I/O, so that's a possible explanation as to why kill_python.exe isn't able to terminate the processes. (Also, kill_python currently assumes TerminateProcess() always works; perhaps this optimism is misplaced. Also note the XXX TODO regarding the fact that we don't kill processes that have loaded our python*.dll, but may not be named python_d.exe. I don't think that's the issue here, though.) On 14-Nov-10 5:32 AM, David Bolen wrote: > "Martin v. L?wis" writes: > >> This is what kill_python.exe is supposed to solve. 
So I recommend to >> investigate why it fails to kill the hanging Pythons. > > Yeah, I know, and I can't say I disagree in principle - not sure why > Windows doesn't let the kill in that module work (or if there's an > issue actually running it under all conditions). > > At the moment though, I do know that using the sysinternals pskill > utility externally (which is what I currently do interactively) > definitely works so to be honest, That's interesting. (That kill_python.exe doesn't kill the wedged processes, but pskill does.) kill_python is pretty simple, it just calls TerminateProcess() after acquiring a handle with the relevant PROCESS_TERMINATE access right. That being said, that's the recommended way to kill a process -- I doubt pskill would be going about it any differently (although, it is sysinternals... you never know what kind of crazy black magic it's doing behind the scenes). Are you calling pskill with the -t flag? i.e. kill process and all dependents? That might be the ticket, especially if killing the child process that wedged select() is waiting on causes it to return, and thus, makes it killable. Otherwise, if it happens again, can you try kill_python.exe first, then pskill, and confirm if the former fails but the latter succeeds? Trent. [1]: http://msdn.microsoft.com/en-us/library/ms686714(VS.85).aspx From v+python at g.nevcal.com Tue Nov 23 11:30:31 2010 From: v+python at g.nevcal.com (Glenn Linderman) Date: Tue, 23 Nov 2010 02:30:31 -0800 Subject: [Python-Dev] is this a bug? 
no environment variables In-Reply-To: References: <4CEA0246.9080607@g.nevcal.com> Message-ID: <4CEB97C7.1070708@g.nevcal.com> On 11/22/2010 8:33 AM, Guido van Rossum wrote: > On Sun, Nov 21, 2010 at 9:40 PM, Glenn Linderman wrote: >> > In reviewing my notes from my experimentations with CGIHTTPServer >> > (Python2.6) and then http.server (Python 3.2a4), I note one behavior I >> > haven't reported as a bug, nor do I know where to start to figure it out, >> > other than experimentally. >> > >> > The experiment: launching CGIHTTPServer without environment variables, by >> > the simple expedient of using a batch file to unset all the existing >> > environment variables, and then launching Python2.6 with CGIHTTPServer. >> > >> > So it failed early: random.py fails at line 110 (Python 2.6). > What specific traceback do you get? In my copy of the code that line says > > a = long(_hexlify(_urandom(16)), 16) > > and I could just imagine that _urandom() fails for some reason to do > with the environment (it is a reference to os.urandom()), which, being > part of the C library code, might depend on the environment. > > But you're not giving enough info to debug this. OK, here is the traceback. I've upgraded the application from Python 2.6 + CGIHTTPServer.py + bugfixes to Python 3.2a4 + http.server + bugfixes, hoping that it would fix it, but since it didn't that the traceback would be more relevant. It seems that _urandom is the likely culprit. 
Traceback (most recent call last):
  File "d:\my\web\areliabl\0test\https.py", line 5, in <module>
    import server
  File "d:\my\web\areliabl\0test\server.py", line 88, in <module>
    import email.message
  File "C:\Python32\lib\email\message.py", line 17, in <module>
    from email import utils
  File "C:\Python32\lib\email\utils.py", line 27, in <module>
    import random
  File "C:\Python32\lib\random.py", line 698, in <module>
    _inst = Random()
  File "C:\Python32\lib\random.py", line 90, in __init__
    self.seed(x)
  File "C:\Python32\lib\random.py", line 108, in seed
    a = int.from_bytes(_urandom(32), 'big')
WindowsError: [Error -2146893818] Invalid Signature

From amauryfa at gmail.com  Tue Nov 23 11:55:08 2010
From: amauryfa at gmail.com (Amaury Forgeot d'Arc)
Date: Tue, 23 Nov 2010 11:55:08 +0100
Subject: [Python-Dev] is this a bug? no environment variables
In-Reply-To: <4CEB97C7.1070708@g.nevcal.com>
References: <4CEA0246.9080607@g.nevcal.com> <4CEB97C7.1070708@g.nevcal.com>
Message-ID: 

Hi,

2010/11/23 Glenn Linderman :
>   File "C:\Python32\lib\random.py", line 108, in seed
>     a = int.from_bytes(_urandom(32), 'big')
> WindowsError: [Error -2146893818] Invalid Signature

In the subprocess documentation http://docs.python.org/library/subprocess.html
"""On Windows, in order to run a side-by-side assembly the specified
env *must* include a valid SystemRoot."""

Can you keep this variable and start again?

-- 
Amaury Forgeot d'Arc

From martin at v.loewis.de  Tue Nov 23 12:55:38 2010
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Tue, 23 Nov 2010 12:55:38 +0100
Subject: [Python-Dev] is this a bug?
 no environment variables
In-Reply-To: 
References: <4CEA0246.9080607@g.nevcal.com> <4CEB97C7.1070708@g.nevcal.com>
Message-ID: <4CEBABBA.9050002@v.loewis.de>

On 23.11.2010 11:55, Amaury Forgeot d'Arc wrote:
> Hi,
>
> 2010/11/23 Glenn Linderman :
>>   File "C:\Python32\lib\random.py", line 108, in seed
>>     a = int.from_bytes(_urandom(32), 'big')
>> WindowsError: [Error -2146893818] Invalid Signature
>
> In the subprocess documentation http://docs.python.org/library/subprocess.html
> """On Windows, in order to run a side-by-side assembly the specified
> env *must* include a valid SystemRoot."""

Indeed, setting SystemRoot might solve this problem. According to

http://jpassing.com/2009/12/28/the-hidden-danger-of-forgetting-to-specify-systemroot-in-a-custom-environment-block/

CryptoAPI, in Windows 7, requires this variable to be set. Failure to find
the enhanced crypto provider would explain why the "random" module of
Python fails to work.

The specific cause is in the registry:

HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Microsoft\Cryptography\Defaults\Provider\Microsoft Strong Cryptographic Provider

has as its ImagePath value

%SystemRoot%\system32\rsaenh.dll

So the registry (and COM) do rely on environment variables.

Regards,
Martin

From stephen at xemacs.org  Tue Nov 23 13:15:20 2010
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Tue, 23 Nov 2010 21:15:20 +0900
Subject: [Python-Dev] len(chr(i)) = 2?
In-Reply-To: 
References: <201011192123.14169.victor.stinner@haypocalc.com>
 <4CE6F93F.9010109@egenix.com> <4CE6FE30.1050903@v.loewis.de>
 <87hbfc1vnf.fsf@uwakimon.sk.tsukuba.ac.jp> <4CE78F62.7060707@v.loewis.de>
 <8739qukf9r.fsf@uwakimon.sk.tsukuba.ac.jp>
 <20101121173825.B1BFB235977@kimball.webabinitio.net>
 <60F8726F-C1C2-4803-8B8E-688EF0443FA0@gmail.com>
 <87eiadd46t.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: <877hg4ck2v.fsf@uwakimon.sk.tsukuba.ac.jp>

Terry Reedy writes:

 > Yes. As I read the standard, UCS-2 is limited to BMP chars.

Et tu, Terry?
OK, I change my vote on the suggestion of "UCS2" to -1. If a couple of conscientious blokes like you and David both understand it that way, I can't see any way to fight it. FWIW, ISO/IEC 10646 (which is authoritative for UCS-2 and UCS-4) is available via http://standards.iso.org/ittf/PubliclyAvailableStandards/index.html Probably I'm the last non-author to ever read that document! From nadeem.vawda at gmail.com Tue Nov 23 13:15:18 2010 From: nadeem.vawda at gmail.com (Nadeem Vawda) Date: Tue, 23 Nov 2010 14:15:18 +0200 Subject: [Python-Dev] Re-enable warnings in regrtest and/or unittest In-Reply-To: References: <4CEAA4DB.6020904@gmail.com> <4CEAA9D4.2020904@langa.pl> <4CEAAC56.2090702@voidspace.org.uk> <4CEABD59.6080005@gmail.com> <4CEAE828.5000801@voidspace.org.uk> Message-ID: 2010/11/23 ?ukasz Langa : > If you agree to do that for regrtest I will clean up the tests for warnings. > Already did that for zipfile so it doesn't raise ResourceWarnings anymore. I > just need to correct multiprocessing and xmlrpc ResourceWarnings, silence > some DeprecationWarnings in the tests and we're all set. Ah, I see a couple > more with -uall but nothing scary. There are also some in test_socket - I've submitted a patch on Roundup: http://bugs.python.org/issue10512 Looking at the multiprocessing warnings, they seem to be caused by leaks in the underlying package, unlike xmlrpc and socket, where it's just a matter of the test code neglecting to close the connection. So +1 to: > Anyway, I find warnings as errors in regrtest a welcome feature. 
Let's make
> it happen :)

Nadeem

From jcea at jcea.es Tue Nov 23 13:19:39 2010
From: jcea at jcea.es (Jesus Cea)
Date: Tue, 23 Nov 2010 13:19:39 +0100
Subject: [Python-Dev] Solaris family and 64 bits compiling
In-Reply-To: <4CEB6558.3000600@v.loewis.de>
References: <4CEAB7C9.7020504@jcea.es> <4CEAC3F0.4040806@jcea.es> <4CEAC798.5050707@v.loewis.de> <4CEAE129.2060505@jcea.es> <4CEAE934.9000106@v.loewis.de> <4CEAFF9F.5070503@jcea.es> <4CEB0567.8040500@v.loewis.de> <4CEB11C6.1010504@jcea.es> <4CEB6558.3000600@v.loewis.de>
Message-ID: <4CEBB15B.1010800@jcea.es>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 23/11/10 07:55, "Martin v. Löwis" wrote:
>> >> But if we say the Python can be compiled as 64 bits under Solaris, would
>> >> be nice if that was actually true. Now that we have a buildbot (under
>> >> OpenIndiana) to test, it is doable.
> >
> > But it is true, and always has been true. The lib/64 issue did not
> > prevent one building Python on Solaris/SPARC64 at all, including the
> > extension modules. Just edit Modules/Setup to suit your needs - that
> > works since 1995 (before distutils was even written).

Would it be acceptable to change something like:

"""
add_library_path("/usr/local/lib")
"""

to something similar to:

"""
# platform.system() returns the OS name; architecture()[0] is "32bit" or "64bit"
if platform.system() == "SunOS" and platform.architecture()[0] == "64bit":
    add_library_path("/usr/local/lib/64")
else:
    add_library_path("/usr/local/lib")
"""

Would python-dev consider that change OK?

- --
Jesus Cea Avion _/_/ _/_/_/ _/_/_/
jcea at jcea.es - http://www.jcea.es/ _/_/ _/_/ _/_/ _/_/ _/_/
jabber / xmpp:jcea at jabber.org _/_/ _/_/ _/_/_/_/_/
.
_/_/ _/_/ _/_/ _/_/ _/_/ "Things are not so easy" _/_/ _/_/ _/_/ _/_/ _/_/ _/_/ "My name is Dump, Core Dump" _/_/_/ _/_/_/ _/_/ _/_/ "El amor es poner tu felicidad en la felicidad de otro" - Leibniz -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iQCVAwUBTOuxW5lgi5GaxT1NAQJuDwP/dzbhDZScanoSnPeF4Ze5XHm+WnSmowx+ x9qvM782i4bYzqYNsbpPHflshROpUwdl9dC0/dFySLFWmMYo12hYogbM6vr5RD6k vEgq1iriIfsei9yNrtt2Ou6+1LVxJ2FMsbpY0Av5hDQVfuJpvB5WRML/mbyYj4T7 9w/jmPT2+rc= =riDG -----END PGP SIGNATURE----- From ncoghlan at gmail.com Tue Nov 23 14:41:05 2010 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 23 Nov 2010 23:41:05 +1000 Subject: [Python-Dev] [Python-checkins] r86633 - in python/branches/py3k: Doc/library/inspect.rst Doc/whatsnew/3.2.rst Lib/inspect.py Lib/test/test_inspect.py Misc/NEWS In-Reply-To: <20101122164654.2109.588145158.divmod.xquotient.165@localhost.localdomain> References: <20101121034404.52924F20A@mail.python.org> <4CE9BF4A.1020302@netwok.org> <4CEA89E8.5090107@voidspace.org.uk> <20101122163722.7e96d123@pitrou.net> <4CEA9584.7040301@avl.com> <20101122172440.77d27ed5@pitrou.net> <20101122164654.2109.588145158.divmod.xquotient.165@localhost.localdomain> Message-ID: On Tue, Nov 23, 2010 at 2:46 AM, wrote: > On 04:24 pm, solipsis at pitrou.net wrote: >> >> On Mon, 22 Nov 2010 17:08:36 +0100 >> Hrvoje Niksic wrote: >>> >>> On 11/22/2010 04:37 PM, Antoine Pitrou wrote: >>> > +1. ?The problem with int constants is that the int gets printed, not >>> > the name, when you dump them for debugging purposes :) >>> >>> Well, it's trivial to subclass int to something with a nicer __repr__. >>> PyGTK uses that technique for wrapping C enums: >> >> Nice. It might be useful to add a private _Constant class somewhere for >> stdlib purposes. 
>
> http://www.python.org/dev/peps/pep-0354/

Indeed, it is difficult to do enums in such a way that they feel sufficiently robust to be worth the effort of including them (although these days, I would be inclined to follow the namedtuple API style rather than that presented in PEP 354).

Cheers,
Nick.

--
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From fuzzyman at voidspace.org.uk Tue Nov 23 14:50:53 2010
From: fuzzyman at voidspace.org.uk (Michael Foord)
Date: Tue, 23 Nov 2010 13:50:53 +0000
Subject: [Python-Dev] constant/enum type in stdlib
In-Reply-To:
References: <20101121034404.52924F20A@mail.python.org> <4CE9BF4A.1020302@netwok.org> <4CEA89E8.5090107@voidspace.org.uk> <20101122163722.7e96d123@pitrou.net> <4CEA9584.7040301@avl.com> <20101122172440.77d27ed5@pitrou.net> <20101122164654.2109.588145158.divmod.xquotient.165@localhost.localdomain>
Message-ID: <4CEBC6BD.9060402@voidspace.org.uk>

On 23/11/2010 13:41, Nick Coghlan wrote:
> On Tue, Nov 23, 2010 at 2:46 AM, wrote:
>> On 04:24 pm, solipsis at pitrou.net wrote:
>>> On Mon, 22 Nov 2010 17:08:36 +0100
>>> Hrvoje Niksic wrote:
>>>> On 11/22/2010 04:37 PM, Antoine Pitrou wrote:
>>>>> +1. The problem with int constants is that the int gets printed, not
>>>>> the name, when you dump them for debugging purposes :)
>>>> Well, it's trivial to subclass int to something with a nicer __repr__.
>>>> PyGTK uses that technique for wrapping C enums:
>>> Nice. It might be useful to add a private _Constant class somewhere for
>>> stdlib purposes.
>> http://www.python.org/dev/peps/pep-0354/
> Indeed, it is difficult to do enums in such a way that they feel
> sufficiently robust to be worth the effort of including them (although
> these days, I would be inclined to follow the namedtuple API style
> rather than that presented in PEP 354).

Right.
As it happens I just submitted a patch to Barry Warsaw's enum package (nice), flufl.enum [1], to allow namedtuple style creation of named constants: >>> from flufl.enum import make_enum >>> Colors = make_enum('Colors', 'red green blue') >>> Colors PEP 354 was rejected for two primary reasons - lack of interest and nowhere obvious to put it. Would it be *so bad* if an enum type lived in its own module? There is certainly more interest now, and if we are to use something like this in the standard library it *has* to be in the standard library (unless every module implements their own private _Constant class). Time to revisit the PEP? All the best, Michael [1] https://launchpad.net/flufl.enum > Cheers, > Nick. > -- http://www.voidspace.org.uk/ From solipsis at pitrou.net Tue Nov 23 15:02:19 2010 From: solipsis at pitrou.net (Antoine Pitrou) Date: Tue, 23 Nov 2010 15:02:19 +0100 Subject: [Python-Dev] OpenSSL Voluntarily (openssl-1.0.0a) References: <4CEB3F72.7000006@m2.ccsnet.ne.jp> Message-ID: <20101123150219.29e20374@pitrou.net> On Tue, 23 Nov 2010 00:07:09 -0500 Glyph Lefkowitz wrote: > On Mon, Nov 22, 2010 at 11:13 PM, Hirokazu Yamamoto < > ocean-city at m2.ccsnet.ne.jp> wrote: > > > Hello. Does this affect python? Thank you. > > > > http://www.openssl.org/news/secadv_20101116.txt > > > > No. Well, actually it does, but Python links against the system OpenSSL on most platforms (except Windows), so it's up to the OS vendor to apply the patch. Regards Antoine. 
From ncoghlan at gmail.com Tue Nov 23 15:03:53 2010 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 24 Nov 2010 00:03:53 +1000 Subject: [Python-Dev] Re-enable warnings in regrtest and/or unittest In-Reply-To: <4CEAE828.5000801@voidspace.org.uk> References: <4CEAA4DB.6020904@gmail.com> <4CEAA9D4.2020904@langa.pl> <4CEAAC56.2090702@voidspace.org.uk> <4CEABD59.6080005@gmail.com> <4CEAE828.5000801@voidspace.org.uk> Message-ID: On Tue, Nov 23, 2010 at 8:01 AM, Michael Foord wrote: > On 22/11/2010 21:08, Guido van Rossum wrote: >> >> On Mon, Nov 22, 2010 at 11:24 AM, Brett Cannon ?wrote: >>> >>> The problem with that is it means developers who switch to Python 3.2 >>> or whatever are suddenly going to have their tests fail until they >>> update their code to turn the warnings off. >> >> That sounds like a feature to me... :-) >> > I think Ezio was suggesting just turning warnings on by default when > unittest is run, not turning them into errors. Ezio is suggesting that > developers could explicitly turn warnings off again, but when you use the > default test runner warnings would be shown. His logic is that warnings are > for developers, and so are tests... Having at least the default test runner change the default warnings behaviour to -Wd (while still respecting sys.warnoptions) sounds like a good idea. That way users won't see the warnings (as intended with that change), but developers are less likely to get nasty surprises when things break in future releases (which was one of our major concerns when we made the decision to change the default handling of DeprecationWarning). A similar change may be appropriate for doctest as well. Printing out the list of suppressed warnings in verbose mode may also be useful. A blanket -We is unlikely to work for the test suite, since generating warnings on some platforms is expected behaviour (e.g. due to the ongoing argument between multiprocessing and FreeBSD as to the appropriate behaviour of semaphores). 
However, we may be able to get to the point where it is run that way by default and then affected tests use check_warnings() to alter the filter configuration (something that many such affected tests already do).

Cheers,
Nick.

--
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From solipsis at pitrou.net Tue Nov 23 15:02:57 2010
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Tue, 23 Nov 2010 15:02:57 +0100
Subject: [Python-Dev] r86699 - python/branches/py3k/Lib/zipfile.py
References: <20101122233126.C8BDBEE981@mail.python.org> <66720F75-169A-4702-AF53-69845701AA55@langa.pl>
Message-ID: <20101123150257.76a423ad@pitrou.net>

On Mon, 22 Nov 2010 22:00:08 -0600
Benjamin Peterson wrote:
> 2010/11/22 Łukasz Langa :
> > Message written by Benjamin Peterson on 2010-11-23, at 00:47:
> >
> > No test?
> >
> >
> > The tests were there already, raising ResourceWarnings. After this change,
> > they stopped doing that. You may say: now they pass for the first time :)
>
> It looks like you added new API, though. For that, we would expect new tests.

It's an internal API, although ZipExtFile doesn't begin with an underscore.

Regards

Antoine.

From ncoghlan at gmail.com Tue Nov 23 15:16:15 2010
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Wed, 24 Nov 2010 00:16:15 +1000
Subject: [Python-Dev] constant/enum type in stdlib
In-Reply-To: <4CEBC6BD.9060402@voidspace.org.uk>
References: <20101121034404.52924F20A@mail.python.org> <4CE9BF4A.1020302@netwok.org> <4CEA89E8.5090107@voidspace.org.uk> <20101122163722.7e96d123@pitrou.net> <4CEA9584.7040301@avl.com> <20101122172440.77d27ed5@pitrou.net> <20101122164654.2109.588145158.divmod.xquotient.165@localhost.localdomain> <4CEBC6BD.9060402@voidspace.org.uk>
Message-ID:

On Tue, Nov 23, 2010 at 11:50 PM, Michael Foord wrote:
> PEP 354 was rejected for two primary reasons - lack of interest and nowhere
> obvious to put it. Would it be *so bad* if an enum type lived in its own
> module?
There is certainly more interest now, and if we are to use something > like this in the standard library it *has* to be in the standard library > (unless every module implements their own private _Constant class). > > Time to revisit the PEP? If you (or anyone else) wanted to revisit the PEP, then I would advise trawling through the standard library looking for constants that could be sensibly converted to enum values. A decision would also need to be made as to whether or not to subclass int, or just provide __index__ (the former has the advantage of being able to drop cleanly into OS level APIs that expect a numerical constant). Whether enums should provide arbitrary name-value mappings (ala C enums) or were restricted to sequential indices starting from zero would be another question best addressed by a code survey of at least the stdlib. And getgeneratorstate() doesn't count as a use case, since the ordering isn't needed and using string literals instead of integers will cover the debugging aspect :) Cheers, Nick. -- Nick Coghlan?? |?? ncoghlan at gmail.com?? |?? Brisbane, Australia From fuzzyman at voidspace.org.uk Tue Nov 23 15:24:18 2010 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Tue, 23 Nov 2010 14:24:18 +0000 Subject: [Python-Dev] constant/enum type in stdlib In-Reply-To: References: <20101121034404.52924F20A@mail.python.org> <4CE9BF4A.1020302@netwok.org> <4CEA89E8.5090107@voidspace.org.uk> <20101122163722.7e96d123@pitrou.net> <4CEA9584.7040301@avl.com> <20101122172440.77d27ed5@pitrou.net> <20101122164654.2109.588145158.divmod.xquotient.165@localhost.localdomain> <4CEBC6BD.9060402@voidspace.org.uk> Message-ID: <4CEBCE92.40801@voidspace.org.uk> On 23/11/2010 14:16, Nick Coghlan wrote: > On Tue, Nov 23, 2010 at 11:50 PM, Michael Foord > wrote: >> PEP 354 was rejected for two primary reasons - lack of interest and nowhere >> obvious to put it. Would it be *so bad* if an enum type lived in its own >> module? 
There is certainly more interest now, and if we are to use something >> like this in the standard library it *has* to be in the standard library >> (unless every module implements their own private _Constant class). >> >> Time to revisit the PEP? > If you (or anyone else) wanted to revisit the PEP, then I would advise > trawling through the standard library looking for constants that could > be sensibly converted to enum values. > > A decision would also need to be made as to whether or not to subclass > int, or just provide __index__ (the former has the advantage of being > able to drop cleanly into OS level APIs that expect a numerical > constant). > > Whether enums should provide arbitrary name-value mappings (ala C > enums) or were restricted to sequential indices starting from zero > would be another question best addressed by a code survey of at least > the stdlib. > > And getgeneratorstate() doesn't count as a use case, since the > ordering isn't needed and using string literals instead of integers > will cover the debugging aspect :) > Well, for backwards compatibility reasons the new constants would have to *behave* like the old ones (including having the same underlying value and comparing equal to it). In many cases it is *likely* that subclassing int is a better way of achieving that. Actually looking through the standard library to evaluate it is the only way of confirming that. Another API, that reduces the duplication of creating the enum and setting the names, could be something like: make_enums("Names", "NAME_ONE NAME_TWO NAME_THREE", base_type=int, module=__name__) Using __name__ we can set the module globals in the call to make_enums. All the best, Michael > Cheers, > Nick. 
> -- http://www.voidspace.org.uk/ From solipsis at pitrou.net Tue Nov 23 15:42:29 2010 From: solipsis at pitrou.net (Antoine Pitrou) Date: Tue, 23 Nov 2010 15:42:29 +0100 Subject: [Python-Dev] constant/enum type in stdlib References: <20101121034404.52924F20A@mail.python.org> <4CE9BF4A.1020302@netwok.org> <4CEA89E8.5090107@voidspace.org.uk> <20101122163722.7e96d123@pitrou.net> <4CEA9584.7040301@avl.com> <20101122172440.77d27ed5@pitrou.net> <20101122164654.2109.588145158.divmod.xquotient.165@localhost.localdomain> <4CEBC6BD.9060402@voidspace.org.uk> <4CEBCE92.40801@voidspace.org.uk> Message-ID: <20101123154229.474f7a90@pitrou.net> On Tue, 23 Nov 2010 14:24:18 +0000 Michael Foord wrote: > Well, for backwards compatibility reasons the new constants would have > to *behave* like the old ones (including having the same underlying > value and comparing equal to it). > > In many cases it is *likely* that subclassing int is a better way of > achieving that. Actually looking through the standard library to > evaluate it is the only way of confirming that. > > Another API, that reduces the duplication of creating the enum and > setting the names, could be something like: > > make_enums("Names", "NAME_ONE NAME_TWO NAME_THREE", base_type=int, > module=__name__) > > Using __name__ we can set the module globals in the call to make_enums. I don't understand why people insist on calling that an "enum". enum is a C legacy and it doesn't bring anything useful as I can tell. Instead, just assign the values explicitly. Antoine. 
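[Editorial sketch, not part of the original thread.] The make_enums() call quoted above is only a proposed API; no implementation was posted. A minimal sketch of how such a helper might work is given below, assuming members are numbered sequentially from zero and that the module argument names a module present in sys.modules. The helper body, the _name attribute and the demo_constants module are all hypothetical; real code would pass module=__name__ as in the proposal.

```python
import sys
import types


def make_enums(typename, names, base_type=int, module=None):
    """Hypothetical helper: create named constants numbered from zero."""

    def __repr__(self):
        return "%s.%s" % (typename, self._name)

    # A subclass of base_type, so members still behave like plain values.
    cls = type(typename, (base_type,), {"__repr__": __repr__})
    for value, name in enumerate(names.split()):
        member = cls(value)
        member._name = name
        setattr(cls, name, member)
        if module is not None:
            # Publish the constant as a global of the named module.
            setattr(sys.modules[module], name, member)
    return cls


# Demonstration with a throwaway module instead of a real one.
demo = types.ModuleType("demo_constants")
sys.modules[demo.__name__] = demo
Names = make_enums("Names", "NAME_ONE NAME_TWO NAME_THREE",
                   base_type=int, module=demo.__name__)

assert demo.NAME_TWO == 1               # compares equal to the old int value
assert demo.NAME_TWO is Names.NAME_TWO  # same object in both namespaces
print(repr(demo.NAME_THREE))            # Names.NAME_THREE
```

Subclassing int here is one of the two options raised in the thread (the other being to provide only __index__); it is chosen in this sketch because it keeps the constants interchangeable with the integers they would replace.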
From benjamin at python.org Tue Nov 23 15:49:37 2010 From: benjamin at python.org (Benjamin Peterson) Date: Tue, 23 Nov 2010 08:49:37 -0600 Subject: [Python-Dev] r86699 - python/branches/py3k/Lib/zipfile.py In-Reply-To: <20101123150257.76a423ad@pitrou.net> References: <20101122233126.C8BDBEE981@mail.python.org> <66720F75-169A-4702-AF53-69845701AA55@langa.pl> <20101123150257.76a423ad@pitrou.net> Message-ID: 2010/11/23 Antoine Pitrou : > On Mon, 22 Nov 2010 22:00:08 -0600 > Benjamin Peterson wrote: >> 2010/11/22 ?ukasz Langa : >> > Wiadomo?? napisana przez Benjamin Peterson w dniu 2010-11-23, o godz. 00:47: >> > >> > No test? >> > >> > >> > The tests were there already, raising ResourceWarnings. After this change, >> > they stopped doing that. You may say: now they pass for the first time :) >> >> It looks like you added new API, though. For that, we would expect new tests. > > It's an internal API, although ZipExtFile doesn't begin with an > underscore. Why is it internal API then? -- Regards, Benjamin From benjamin at python.org Tue Nov 23 15:52:09 2010 From: benjamin at python.org (Benjamin Peterson) Date: Tue, 23 Nov 2010 08:52:09 -0600 Subject: [Python-Dev] constant/enum type in stdlib In-Reply-To: <20101123154229.474f7a90@pitrou.net> References: <20101121034404.52924F20A@mail.python.org> <4CE9BF4A.1020302@netwok.org> <4CEA89E8.5090107@voidspace.org.uk> <20101122163722.7e96d123@pitrou.net> <4CEA9584.7040301@avl.com> <20101122172440.77d27ed5@pitrou.net> <20101122164654.2109.588145158.divmod.xquotient.165@localhost.localdomain> <4CEBC6BD.9060402@voidspace.org.uk> <4CEBCE92.40801@voidspace.org.uk> <20101123154229.474f7a90@pitrou.net> Message-ID: 2010/11/23 Antoine Pitrou : > On Tue, 23 Nov 2010 14:24:18 +0000 > Michael Foord wrote: >> Well, for backwards compatibility reasons the new constants would have >> to *behave* like the old ones (including having the same underlying >> value and comparing equal to it). 
>>
>> In many cases it is *likely* that subclassing int is a better way of
>> achieving that. Actually looking through the standard library to
>> evaluate it is the only way of confirming that.
>>
>> Another API, that reduces the duplication of creating the enum and
>> setting the names, could be something like:
>>
>>      make_enums("Names", "NAME_ONE NAME_TWO NAME_THREE", base_type=int,
>> module=__name__)
>>
>> Using __name__ we can set the module globals in the call to make_enums.
>
> I don't understand why people insist on calling that an "enum". enum is
> a C legacy and it doesn't bring anything useful as I can tell. Instead,
> just assign the values explicitly.

The concept of an "enumeration" of values is still useful outside its stunted C incarnation.

Out of curiosity, why is enum "legacy" in C?

--
Regards,
Benjamin

From fuzzyman at voidspace.org.uk Tue Nov 23 15:56:36 2010
From: fuzzyman at voidspace.org.uk (Michael Foord)
Date: Tue, 23 Nov 2010 14:56:36 +0000
Subject: [Python-Dev] constant/enum type in stdlib
In-Reply-To: <20101123154229.474f7a90@pitrou.net>
References: <20101121034404.52924F20A@mail.python.org> <4CE9BF4A.1020302@netwok.org> <4CEA89E8.5090107@voidspace.org.uk> <20101122163722.7e96d123@pitrou.net> <4CEA9584.7040301@avl.com> <20101122172440.77d27ed5@pitrou.net> <20101122164654.2109.588145158.divmod.xquotient.165@localhost.localdomain> <4CEBC6BD.9060402@voidspace.org.uk> <4CEBCE92.40801@voidspace.org.uk> <20101123154229.474f7a90@pitrou.net>
Message-ID: <4CEBD624.9000402@voidspace.org.uk>

On 23/11/2010 14:42, Antoine Pitrou wrote:
> On Tue, 23 Nov 2010 14:24:18 +0000
> Michael Foord wrote:
>> Well, for backwards compatibility reasons the new constants would have
>> to *behave* like the old ones (including having the same underlying
>> value and comparing equal to it).
>>
>> In many cases it is *likely* that subclassing int is a better way of
>> achieving that.
Actually looking through the standard library to
>> evaluate it is the only way of confirming that.
>>
>> Another API, that reduces the duplication of creating the enum and
>> setting the names, could be something like:
>>
>> make_enums("Names", "NAME_ONE NAME_TWO NAME_THREE", base_type=int,
>> module=__name__)
>>
>> Using __name__ we can set the module globals in the call to make_enums.
> I don't understand why people insist on calling that an "enum". enum is
> a C legacy and it doesn't bring anything useful as I can tell. Instead,
> just assign the values explicitly.
>

enum isn't only in C. (They are in C# as well at least.)

Wikipedia links enum to "enumerated type" and says:

an enumerated type (also called enumeration or enum) is a data type consisting of a set of named values

It sounds entirely appropriate. I have no problem with explicitly assigning values instead of doing it automagically.

All the best,

Michael
> Antoine.

--
http://www.voidspace.org.uk/

From stephen at xemacs.org Tue Nov 23 16:00:22 2010
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Wed, 24 Nov 2010 00:00:22 +0900
Subject: [Python-Dev] len(chr(i)) = 2?
In-Reply-To: <4CEA5744.3080308@v.loewis.de> References: <201011192123.14169.victor.stinner@haypocalc.com> <4CE6F93F.9010109@egenix.com> <4CE6FE30.1050903@v.loewis.de> <87hbfc1vnf.fsf@uwakimon.sk.tsukuba.ac.jp> <4CE78F62.7060707@v.loewis.de> <8739qukf9r.fsf@uwakimon.sk.tsukuba.ac.jp> <20101121173825.B1BFB235977@kimball.webabinitio.net> <60F8726F-C1C2-4803-8B8E-688EF0443FA0@gmail.com> <87eiadd46t.fsf@uwakimon.sk.tsukuba.ac.jp> <4CEA5744.3080308@v.loewis.de> Message-ID: <8762voccft.fsf@uwakimon.sk.tsukuba.ac.jp> If you don't care about the ISO standard, but only about Python, Martin's right, I was wrong. You can stop reading now. "Martin v. L?wis" writes: > I could only find the FCD of 10646:2010, where annex H was integrated > into section 10: Thank you for the reference. I referred to two older versions, 10646-1:1993 (for the annexes and Amendment, and my basic understanding) and 10646:2003 (for the detailed definition of UCS-2 in Sections 7, 8 and 13; unfortunately, I missed the most important detail, which is in Section 9). In :2003 the Annex I referred to as "Annex H" is Annex J, and "Annex Q" is partly in Section 9.1 and mostly in Annex C. I don't know where the former is in the 2010 FCD, and the latter is section 9.2. > I think they are now acknowledging that UCS-2 was a misleading term, > making it ambiguous whether this refers to a CCS, a CEF, or a CES; > like "ASCII", people have been using it for all three of them. In :1993 it wasn't ambiguous, they simply didn't make those distinctions. They were not needed for ISO 10646's published versions, although they certainly are for Unicode. Now, quite clearly, the ISO has *changed the definition* in every new version, progressively adding new restrictions that go beyond clarifying ambiguity. But even in :2003, in view of 4.2, 6.2, 6.3, and 13.1, UCS-2 is clearly well-defined as a CM according to UTR#17, which can probably be identified with CCS in :2003 terminology. 
Ie, returning to UTR#17 terminology, it is the composition of a CES, a CEF, and a CCS, which are not defined individually. Note: The definition of "coded character" changed between :2003 and the 2010 FCD, from "character with representation" to "character with integer". There is a NOTE indicating that 16-bit integers may be used in processing. Given that this is a non-normative note, I take it to mean that in an array of 16-bit integers, "most significant octet" is to be interpreted in the natural way for the architecture rather than by the representation in memory, which might be little-endian. IMO it's unnatural to think that that changes the definition of UCS-2 to be either a CEF, or a composition of a CEF and a CCS. > Apparently, the ISO WG interprets earlier revisions as saying that > UCS-2 is a CEF that restricted UTF-16 to the BMP. I think that ISO 10646-1:1993 admits only one interpretation, a CM restricted to the BMP (including surrogates), and ISO 10646:2003 admits only one interpretation, a CM restricted to the BMP (not including surrogates). The note under Table 4 on p.24 of the FCD is, uh, well, a lie. Earlier versions certainly did not restrict to "scalar values"; they had no such concept. > THIS IS NOT WHAT PYTHON DOES. Well, no shit, Sherlock. You don't have to yell at me, I know what Python does. The question is, is what does UCS-2 do? The answer is that in :1993, AFAICT it did what Python does. In :2003, they added (last sentence, section 9.1): UCS-2 cannot be used to represent any characters on the supplementary planes. I assume they maintain that position in 2010, so End Of Thread. I apologize for missing that when I was reviewing the standard earlier, but I expected restrictions on UCS-2 to be explained in 13.1 or perhaps 14. And 13.1 simply requires that characters in the BMP be represented by their defined code positions, truncated to two octets. 
Like earlier versions, it doesn't prohibit use of surrogates or say that non-BMP characters can't be represented. > Not sure what it says in your copy; in mine, section 9.3 says [snip] Mine (:2003) says "NOTE 2 - When confined to the code positions in Planes 00 to 10, UCS-4 is also referred to as UCS Transformation Format 32 (UTF-32)." Then it references the Unicode Standard (v4.0) as the authority for UTF-32. Obviously they continued to be confused at this point in time; by the draft you have, apparently the WG had decided to pretty much completely synchronize the whole standard to a subset of Unicode. This seems pointless to me (unlike, say, the work that has been done on standardizing criteria for repertoire changes). In particular, the :1993 definition of UCS-2 was a perfectly good standard for describing the processing Python actually does internally. The current definition of UCS-2 as identical to the BMP is useless, and good riddance, I'm perfectly happy to have them deprecate it. From solipsis at pitrou.net Tue Nov 23 16:01:06 2010 From: solipsis at pitrou.net (Antoine Pitrou) Date: Tue, 23 Nov 2010 16:01:06 +0100 Subject: [Python-Dev] constant/enum type in stdlib In-Reply-To: References: <20101121034404.52924F20A@mail.python.org> <4CE9BF4A.1020302@netwok.org> <4CEA89E8.5090107@voidspace.org.uk> <20101122163722.7e96d123@pitrou.net> <4CEA9584.7040301@avl.com> <20101122172440.77d27ed5@pitrou.net> <20101122164654.2109.588145158.divmod.xquotient.165@localhost.localdomain> <4CEBC6BD.9060402@voidspace.org.uk> <4CEBCE92.40801@voidspace.org.uk> <20101123154229.474f7a90@pitrou.net> Message-ID: <1290524466.3642.4.camel@localhost.localdomain> Le mardi 23 novembre 2010 ? 
08:52 -0600, Benjamin Peterson a ?crit : > 2010/11/23 Antoine Pitrou : > > On Tue, 23 Nov 2010 14:24:18 +0000 > > Michael Foord wrote: > >> Well, for backwards compatibility reasons the new constants would have > >> to *behave* like the old ones (including having the same underlying > >> value and comparing equal to it). > >> > >> In many cases it is *likely* that subclassing int is a better way of > >> achieving that. Actually looking through the standard library to > >> evaluate it is the only way of confirming that. > >> > >> Another API, that reduces the duplication of creating the enum and > >> setting the names, could be something like: > >> > >> make_enums("Names", "NAME_ONE NAME_TWO NAME_THREE", base_type=int, > >> module=__name__) > >> > >> Using __name__ we can set the module globals in the call to make_enums. > > > > I don't understand why people insist on calling that an "enum". enum is > > a C legacy and it doesn't bring anything useful as I can tell. Instead, > > just assign the values explicitly. > > The concept of a "enumeration" of values is still useful outside its > stunted C incarnation. Well, it is easy to assign range(N) to a tuple of names when desired. I don't think an automatically-enumerating constant generator is needed. Regards Antoine. 
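[Editorial sketch, not part of the original thread.] The plain-integer approach described in the message above can be illustrated in a few lines. The names RED, GREEN and BLUE are invented for illustration and do not appear in the thread.

```python
# Hypothetical constant names, chosen only for illustration.
RED, GREEN, BLUE = range(3)

# The constants are plain ints: comparisons and arithmetic just work...
assert GREEN == 1
assert isinstance(BLUE, int)

# ...but the name cannot be recovered from the value when debugging.
print(repr(GREEN))
```

This is the trade-off the thread keeps returning to: the assignment is trivial, yet a dumped value carries no hint of which constant it was.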
From solipsis at pitrou.net Tue Nov 23 16:01:59 2010
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Tue, 23 Nov 2010 16:01:59 +0100
Subject: [Python-Dev] constant/enum type in stdlib
In-Reply-To: <4CEBD624.9000402@voidspace.org.uk>
References: <20101121034404.52924F20A@mail.python.org> <4CE9BF4A.1020302@netwok.org> <4CEA89E8.5090107@voidspace.org.uk> <20101122163722.7e96d123@pitrou.net> <4CEA9584.7040301@avl.com> <20101122172440.77d27ed5@pitrou.net> <20101122164654.2109.588145158.divmod.xquotient.165@localhost.localdomain> <4CEBC6BD.9060402@voidspace.org.uk> <4CEBCE92.40801@voidspace.org.uk> <20101123154229.474f7a90@pitrou.net> <4CEBD624.9000402@voidspace.org.uk>
Message-ID: <1290524519.3642.5.camel@localhost.localdomain>

On Tuesday 23 November 2010 at 14:56 +0000, Michael Foord wrote:
> On 23/11/2010 14:42, Antoine Pitrou wrote:
> > On Tue, 23 Nov 2010 14:24:18 +0000
> > Michael Foord wrote:
> >> Well, for backwards compatibility reasons the new constants would have
> >> to *behave* like the old ones (including having the same underlying
> >> value and comparing equal to it).
> >>
> >> In many cases it is *likely* that subclassing int is a better way of
> >> achieving that. Actually looking through the standard library to
> >> evaluate it is the only way of confirming that.
> >>
> >> Another API, that reduces the duplication of creating the enum and
> >> setting the names, could be something like:
> >>
> >> make_enums("Names", "NAME_ONE NAME_TWO NAME_THREE", base_type=int,
> >> module=__name__)
> >>
> >> Using __name__ we can set the module globals in the call to make_enums.
> > I don't understand why people insist on calling that an "enum". enum is
> > a C legacy and it doesn't bring anything useful as I can tell. Instead,
> > just assign the values explicitly.
> >
>
> enum isn't only in C. (They are in C# as well at least.)

Well, it's been inherited by C-like languages, no doubt. Like braces and semicolons :)

Regards

Antoine.
From solipsis at pitrou.net Tue Nov 23 15:59:59 2010
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Tue, 23 Nov 2010 15:59:59 +0100
Subject: [Python-Dev] r86699 - python/branches/py3k/Lib/zipfile.py
In-Reply-To:
References: <20101122233126.C8BDBEE981@mail.python.org> <66720F75-169A-4702-AF53-69845701AA55@langa.pl> <20101123150257.76a423ad@pitrou.net>
Message-ID: <1290524399.3642.3.camel@localhost.localdomain>

On Tuesday 23 November 2010 at 08:49 -0600, Benjamin Peterson wrote:
> 2010/11/23 Antoine Pitrou :
> > On Mon, 22 Nov 2010 22:00:08 -0600
> > Benjamin Peterson wrote:
> >> 2010/11/22 Łukasz Langa :
> >> > Message written by Benjamin Peterson on 2010-11-23, at 00:47:
> >> >
> >> > No test?
> >> >
> >> >
> >> > The tests were there already, raising ResourceWarnings. After this change,
> >> > they stopped doing that. You may say: now they pass for the first time :)
> >>
> >> It looks like you added new API, though. For that, we would expect new tests.
> >
> > It's an internal API, although ZipExtFile doesn't begin with an
> > underscore.
>
> Why is it internal API then?

Because it's for use by ZipFile.open(). The ZipExtFile constructor is not supposed to be called by the user. You might instead ask why ZipExtFile isn't called _ZipExtFile, and I have no idea.

Regards

Antoine.
From fuzzyman at voidspace.org.uk Tue Nov 23 16:15:29 2010 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Tue, 23 Nov 2010 15:15:29 +0000 Subject: [Python-Dev] constant/enum type in stdlib In-Reply-To: <1290524466.3642.4.camel@localhost.localdomain> References: <20101121034404.52924F20A@mail.python.org> <4CE9BF4A.1020302@netwok.org> <4CEA89E8.5090107@voidspace.org.uk> <20101122163722.7e96d123@pitrou.net> <4CEA9584.7040301@avl.com> <20101122172440.77d27ed5@pitrou.net> <20101122164654.2109.588145158.divmod.xquotient.165@localhost.localdomain> <4CEBC6BD.9060402@voidspace.org.uk> <4CEBCE92.40801@voidspace.org.uk> <20101123154229.474f7a90@pitrou.net> <1290524466.3642.4.camel@localhost.localdomain> Message-ID: <4CEBDA91.4050205@voidspace.org.uk> On 23/11/2010 15:01, Antoine Pitrou wrote: > Le mardi 23 novembre 2010 ? 08:52 -0600, Benjamin Peterson a ?crit : >> 2010/11/23 Antoine Pitrou : >>> On Tue, 23 Nov 2010 14:24:18 +0000 >>> Michael Foord wrote: >>>> Well, for backwards compatibility reasons the new constants would have >>>> to *behave* like the old ones (including having the same underlying >>>> value and comparing equal to it). >>>> >>>> In many cases it is *likely* that subclassing int is a better way of >>>> achieving that. Actually looking through the standard library to >>>> evaluate it is the only way of confirming that. >>>> >>>> Another API, that reduces the duplication of creating the enum and >>>> setting the names, could be something like: >>>> >>>> make_enums("Names", "NAME_ONE NAME_TWO NAME_THREE", base_type=int, >>>> module=__name__) >>>> >>>> Using __name__ we can set the module globals in the call to make_enums. >>> I don't understand why people insist on calling that an "enum". enum is >>> a C legacy and it doesn't bring anything useful as I can tell. Instead, >>> just assign the values explicitly. >> The concept of a "enumeration" of values is still useful outside its >> stunted C incarnation. 
> Well, it is easy to assign range(N) to a tuple of names when desired. I > don't think an automatically-enumerating constant generator is needed. > Right, and that is current practise. It has the disadvantage (that you seemed to acknowledge) that when debugging the integer values are seen instead of something with a useful repr. Having a *simple* class (and API to create them) that produces named constants with a useful repr, is what we are discussing, and that seems awfully like an enum (in the general sense not in a C specific sense). For backwards compatibility these constants, where they replace integer constants, would need to be integer subclasses with the same behaviour. Like the Qt example you appreciated so much. ;-) There are still two reasonable APIs (unless you have changed your mind and think that sticking with plain integers is best), of which I prefer the latter: SOME_CONST = Constant('SOME_CONST', 1) OTHER_CONST = Constant('OTHER_CONST', 2) or: Constants = make_constants('Constants', 'SOME_CONST OTHER_CONST', start=1) SOME_CONST = Constants.SOME_CONST OTHER_CONST = Constants.OTHER_CONST (Well, there is a third option that takes __name__ and sets the constants in the module automagically. I can understand why people would dislike that though.) All the best, Michael Foord Michael > Regards > > Antoine. 
>
>
> _______________________________________________
> Python-Dev mailing list
> Python-Dev at python.org
> http://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe: http://mail.python.org/mailman/options/python-dev/fuzzyman%40voidspace.org.uk

--
http://www.voidspace.org.uk/

From solipsis at pitrou.net Tue Nov 23 16:30:53 2010
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Tue, 23 Nov 2010 16:30:53 +0100
Subject: [Python-Dev] constant/enum type in stdlib
In-Reply-To: <4CEBDA91.4050205@voidspace.org.uk>
References: <20101121034404.52924F20A@mail.python.org>
 <4CE9BF4A.1020302@netwok.org> <4CEA89E8.5090107@voidspace.org.uk>
 <20101122163722.7e96d123@pitrou.net> <4CEA9584.7040301@avl.com>
 <20101122172440.77d27ed5@pitrou.net>
 <20101122164654.2109.588145158.divmod.xquotient.165@localhost.localdomain>
 <4CEBC6BD.9060402@voidspace.org.uk> <4CEBCE92.40801@voidspace.org.uk>
 <20101123154229.474f7a90@pitrou.net>
 <1290524466.3642.4.camel@localhost.localdomain>
 <4CEBDA91.4050205@voidspace.org.uk>
Message-ID: <1290526253.3642.9.camel@localhost.localdomain>

On Tuesday 23 November 2010 at 15:15 +0000, Michael Foord wrote:
> There are still two reasonable APIs (unless you have changed your mind
> and think that sticking with plain integers is best), of which I prefer
> the latter:
>
> SOME_CONST = Constant('SOME_CONST', 1)
> OTHER_CONST = Constant('OTHER_CONST', 2)
>
> or:
>
> Constants = make_constants('Constants', 'SOME_CONST OTHER_CONST', start=1)

Or:

Constants = make_constants('Constants', 'SOME_CONST OTHER_CONST',
                           values=range(1, 3))

Again, auto-enumeration is useless since it's trivial to achieve
explicitly.

Regards

Antoine.
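The two spellings quoted above can be sketched in a few lines. This is only an illustration of the idea under discussion, not an agreed design: `Constant` and `make_constants` are the names used in the thread, but the bodies below, the `_name` attribute, and the `<NAME: value>` repr format are assumptions made for the example.

```python
class Constant(int):
    """Hypothetical named constant: behaves as an int, with a readable repr."""

    def __new__(cls, name, value):
        self = super().__new__(cls, value)
        self._name = name  # assumed attribute name, not taken from the thread
        return self

    def __repr__(self):
        return "<%s: %d>" % (self._name, int(self))


def make_constants(class_name, names, values):
    """Build a plain class whose attributes are Constant instances.

    `names` is a whitespace-separated string; `values` supplies one
    explicit value per name, so nothing is auto-enumerated.
    """
    name_list = names.split()
    value_list = list(values)
    if len(name_list) != len(value_list):
        raise ValueError("need exactly one value per name")
    namespace = {n: Constant(n, v) for n, v in zip(name_list, value_list)}
    return type(class_name, (), namespace)


Constants = make_constants('Constants', 'SOME_CONST OTHER_CONST',
                           values=range(1, 3))
SOME_CONST = Constants.SOME_CONST
OTHER_CONST = Constants.OTHER_CONST
```

Because each value is still an int and compares equal to the underlying number, code that passes the raw integer would keep working in this sketch; only the repr seen while debugging changes.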
From fuzzyman at voidspace.org.uk Tue Nov 23 16:40:28 2010 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Tue, 23 Nov 2010 15:40:28 +0000 Subject: [Python-Dev] constant/enum type in stdlib In-Reply-To: <1290526253.3642.9.camel@localhost.localdomain> References: <20101121034404.52924F20A@mail.python.org> <4CE9BF4A.1020302@netwok.org> <4CEA89E8.5090107@voidspace.org.uk> <20101122163722.7e96d123@pitrou.net> <4CEA9584.7040301@avl.com> <20101122172440.77d27ed5@pitrou.net> <20101122164654.2109.588145158.divmod.xquotient.165@localhost.localdomain> <4CEBC6BD.9060402@voidspace.org.uk> <4CEBCE92.40801@voidspace.org.uk> <20101123154229.474f7a90@pitrou.net> <1290524466.3642.4.camel@localhost.localdomain> <4CEBDA91.4050205@voidspace.org.uk> <1290526253.3642.9.camel@localhost.localdomain> Message-ID: <4CEBE06C.9030101@voidspace.org.uk> On 23/11/2010 15:30, Antoine Pitrou wrote: > Le mardi 23 novembre 2010 ? 15:15 +0000, Michael Foord a ?crit : >> There are still two reasonable APIs (unless you have changed your mind >> and think that sticking with plain integers is best), of which I prefer >> the latter: >> >> SOME_CONST = Constant('SOME_CONST', 1) >> OTHER_CONST = Constant('OTHER_CONST', 2) >> >> or: >> >> Constants = make_constants('Constants', 'SOME_CONST OTHER_CONST', start=1) > Or: > > Constants = make_constants('Constants', 'SOME_CONST OTHER_CONST', > values=range(1, 3)) > > Again, auto-enumeration is useless since it's trivial to achieve > explicitly. Ah, I see. It is the auto-enumeration you disliked. Sure - not a problem. I think the step that Nick described, of evaluating places in the standard library that this could be used, is a good one. I'll try to get around to it and perhaps attempt to resuscitate the PEP. (Any suggestions as to an appropriate module if having it live in its own module is still an objection?) Michael > Regards > > Antoine. 

--
http://www.voidspace.org.uk/

READ CAREFULLY. By accepting and reading this email you agree, on behalf
of your employer, to release me from all obligations and waivers arising
from any and all NON-NEGOTIATED agreements, licenses, terms-of-service,
shrinkwrap, clickwrap, browsewrap, confidentiality, non-disclosure,
non-compete and acceptable use policies ("BOGUS AGREEMENTS") that I have
entered into with your employer, its partners, licensors, agents and
assigns, in perpetuity, without prejudice to my ongoing rights and
privileges. You further represent that you have the authority to release
me from any BOGUS AGREEMENTS on behalf of your employer.

From solipsis at pitrou.net Tue Nov 23 17:05:19 2010
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Tue, 23 Nov 2010 17:05:19 +0100
Subject: [Python-Dev] constant/enum type in stdlib
In-Reply-To: <4CEBE06C.9030101@voidspace.org.uk>
References: <20101121034404.52924F20A@mail.python.org>
 <4CE9BF4A.1020302@netwok.org> <4CEA89E8.5090107@voidspace.org.uk>
 <20101122163722.7e96d123@pitrou.net> <4CEA9584.7040301@avl.com>
 <20101122172440.77d27ed5@pitrou.net>
 <20101122164654.2109.588145158.divmod.xquotient.165@localhost.localdomain>
 <4CEBC6BD.9060402@voidspace.org.uk> <4CEBCE92.40801@voidspace.org.uk>
 <20101123154229.474f7a90@pitrou.net>
 <1290524466.3642.4.camel@localhost.localdomain>
 <4CEBDA91.4050205@voidspace.org.uk>
 <1290526253.3642.9.camel@localhost.localdomain>
 <4CEBE06C.9030101@voidspace.org.uk>
Message-ID: <1290528319.3642.11.camel@localhost.localdomain>

On Tuesday 23 November 2010 at 15:40 +0000, Michael Foord wrote:
> On 23/11/2010 15:30, Antoine Pitrou wrote:
> > On Tuesday 23 November 2010 at 15:15 +0000, Michael Foord wrote:
> >> There are still two reasonable APIs (unless you have changed your mind
> >> and think that sticking with plain integers is best), of which I prefer
> >> the latter:
> >>
> >> SOME_CONST = Constant('SOME_CONST', 1)
> >> OTHER_CONST = Constant('OTHER_CONST', 2)
> >>
> >> or:
> >>
> >> Constants = make_constants('Constants', 'SOME_CONST OTHER_CONST', start=1)
> > Or:
> >
> > Constants = make_constants('Constants', 'SOME_CONST OTHER_CONST',
> > values=range(1, 3))
> >
> > Again, auto-enumeration is useless since it's trivial to achieve
> > explicitly.
>
> Ah, I see. It is the auto-enumeration you disliked. Sure - not a problem.
>
> I think the step that Nick described, of evaluating places in the
> standard library that this could be used, is a good one. I'll try to get
> around to it and perhaps attempt to resuscitate the PEP. (Any
> suggestions as to an appropriate module if having it live in its own
> module is still an objection?)

We already have a bunch of bizarrely unrelated stuff in collections
(such as Callable), so we could put enum there too.

Regards

Antoine.
From fuzzyman at voidspace.org.uk Tue Nov 23 17:07:30 2010 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Tue, 23 Nov 2010 16:07:30 +0000 Subject: [Python-Dev] constant/enum type in stdlib In-Reply-To: <1290528319.3642.11.camel@localhost.localdomain> References: <20101121034404.52924F20A@mail.python.org> <4CE9BF4A.1020302@netwok.org> <4CEA89E8.5090107@voidspace.org.uk> <20101122163722.7e96d123@pitrou.net> <4CEA9584.7040301@avl.com> <20101122172440.77d27ed5@pitrou.net> <20101122164654.2109.588145158.divmod.xquotient.165@localhost.localdomain> <4CEBC6BD.9060402@voidspace.org.uk> <4CEBCE92.40801@voidspace.org.uk> <20101123154229.474f7a90@pitrou.net> <1290524466.3642.4.camel@localhost.localdomain> <4CEBDA91.4050205@voidspace.org.uk> <1290526253.3642.9.camel@localhost.localdomain> <4CEBE06C.9030101@voidspace.org.uk> <1290528319.3642.11.camel@localhost.localdomain> Message-ID: <4CEBE6C2.1070204@voidspace.org.uk> On 23/11/2010 16:05, Antoine Pitrou wrote: > Le mardi 23 novembre 2010 ? 15:40 +0000, Michael Foord a ?crit : >> On 23/11/2010 15:30, Antoine Pitrou wrote: >>> Le mardi 23 novembre 2010 ? 15:15 +0000, Michael Foord a ?crit : >>>> There are still two reasonable APIs (unless you have changed your mind >>>> and think that sticking with plain integers is best), of which I prefer >>>> the latter: >>>> >>>> SOME_CONST = Constant('SOME_CONST', 1) >>>> OTHER_CONST = Constant('OTHER_CONST', 2) >>>> >>>> or: >>>> >>>> Constants = make_constants('Constants', 'SOME_CONST OTHER_CONST', start=1) >>> Or: >>> >>> Constants = make_constants('Constants', 'SOME_CONST OTHER_CONST', >>> values=range(1, 3)) >>> >>> Again, auto-enumeration is useless since it's trivial to achieve >>> explicitly. >> Ah, I see. It is the auto-enumeration you disliked. Sure - not a problem. >> >> I think the step that Nick described, of evaluating places in the >> standard library that this could be used, is a good one. I'll try to get >> around to it and perhaps attempt to resuscitate the PEP. 
(Any >> suggestions as to an appropriate module if having it live in its own >> module is still an objection?) > We already have a bunch of bizarrely unrelated stuff in collections > (such as Callable), so we could put enum there too. > I guess it creates collections of constants... Michael > Regards > > Antoine. > > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/fuzzyman%40voidspace.org.uk -- http://www.voidspace.org.uk/ READ CAREFULLY. By accepting and reading this email you agree, on behalf of your employer, to release me from all obligations and waivers arising from any and all NON-NEGOTIATED agreements, licenses, terms-of-service, shrinkwrap, clickwrap, browsewrap, confidentiality, non-disclosure, non-compete and acceptable use policies (?BOGUS AGREEMENTS?) that I have entered into with your employer, its partners, licensors, agents and assigns, in perpetuity, without prejudice to my ongoing rights and privileges. You further represent that you have the authority to release me from any BOGUS AGREEMENTS on behalf of your employer. From Ben.Cottrell at nominum.com Tue Nov 23 16:37:43 2010 From: Ben.Cottrell at nominum.com (Ben.Cottrell at nominum.com) Date: Tue, 23 Nov 2010 07:37:43 -0800 Subject: [Python-Dev] constant/enum type in stdlib In-Reply-To: Your message of "Tue, 23 Nov 2010 15:15:29 GMT." 
<4CEBDA91.4050205@voidspace.org.uk> Message-ID: <20101123153743.3D9451B8ED4@shell-too.nominum.com> On Tue, 23 Nov 2010 15:15:29 +0000, Michael Foord wrote: > There are still two reasonable APIs (unless you have changed your mind > and think that sticking with plain integers is best), of which I prefer > the latter: > > SOME_CONST = Constant('SOME_CONST', 1) > OTHER_CONST = Constant('OTHER_CONST', 2) > > or: > > Constants = make_constants('Constants', 'SOME_CONST OTHER_CONST', start=1) > SOME_CONST = Constants.SOME_CONST > OTHER_CONST = Constants.OTHER_CONST I prefer the latter too, because that makes it possible to have 'Constants' be a rendezvous point for making sure that you're passing something valid. Perhaps using 'in': def func(foo): if foo not in Constants: raise ValueError('foo must be SOME_CONST or OTHER_CONST') ... I know this is probably not going to happen, but I would *so much* like it if functions would start rejecting "the wrong kind of 2". Constants that are valid, integer-wise, but which aren't part of the set of constants allowed for that argument. I'd prefer not to think of the number of times I've made the following mistake: s = socket.socket(socket.SOCK_DGRAM, socket.AF_INET) ~Ben From turnbull at sk.tsukuba.ac.jp Tue Nov 23 17:16:55 2010 From: turnbull at sk.tsukuba.ac.jp (Stephen J. Turnbull) Date: Wed, 24 Nov 2010 01:16:55 +0900 Subject: [Python-Dev] len(chr(i)) = 2? In-Reply-To: <4CEA527B.4030002@v.loewis.de> References: <201011192123.14169.victor.stinner@haypocalc.com> <4CE6F93F.9010109@egenix.com> <4CE6FE30.1050903@v.loewis.de> <87hbfc1vnf.fsf@uwakimon.sk.tsukuba.ac.jp> <4CE78F62.7060707@v.loewis.de> <8739qukf9r.fsf@uwakimon.sk.tsukuba.ac.jp> <4CE96A40.1050705@v.loewis.de> <87ipzqc4gi.fsf@uwakimon.sk.tsukuba.ac.jp> <4CEA27EB.8000104@v.loewis.de> <87fwutd49e.fsf@uwakimon.sk.tsukuba.ac.jp> <4CEA527B.4030002@v.loewis.de> Message-ID: <871v6cc8w8.fsf@uwakimon.sk.tsukuba.ac.jp> "Martin v. 
Löwis" writes:

 > I disagree: Quoting from Unicode 5.0, section 5.4:
 >
 > # The individual components of implementations may have different
 > # levels of support for surrogates, as long as those components are
 > # assembled and communicate correctly.

"Assembly" is the problem. If chr() or a slice creates a lone surrogate
and surrogateescape passes it back out, Python as a whole is
non-conforming. Technically, you can hide behind "none of slicing,
chr(), or surrogateescape promises to conform", and maybe that would fly
to a standards lawyer; I'd have to see the precise statement.

Here's a more convincing example. A user specifies "utf8" as her locale
charset. Then she specifies a string containing a non-BMP character as
the "description" of a file, and internal code munges this via slicing
into a file name conforming to some specification (eg, length limit +
uniquifier if needed). Then if the non-BMP character is in the "right"
place, she will get either a broken file name, which will either get
written to disk or raise an exception, depending on whether the munging
program has enabled surrogateescape or not. I claim both of those
results are non-conforming to the specification of UTF-16, and therefore
Python Unicode processing as a whole must be considered non-conforming.

It's still pretty damn good. But I've elaborated that point elsewhere.

 > The rationale for supporting these characters in chr() goes back much
 > further than the surrogateescape handler - as Python unicode strings
 > are sequences of code points, it would be impractical if you couldn't
 > create some of them, or even would have to consult the UCD before
 > determining whether they can be created.

The Zen is irrelevant to determining conformance to Unicode, which has
its own Zen.

From stephen at xemacs.org Tue Nov 23 17:18:57 2010
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Wed, 24 Nov 2010 01:18:57 +0900
Subject: [Python-Dev] len(chr(i)) = 2?
In-Reply-To: References: <201011192123.14169.victor.stinner@haypocalc.com> <4CE6F93F.9010109@egenix.com> <4CE6FE30.1050903@v.loewis.de> <87hbfc1vnf.fsf@uwakimon.sk.tsukuba.ac.jp> <4CE78F62.7060707@v.loewis.de> <8739qukf9r.fsf@uwakimon.sk.tsukuba.ac.jp> <20101121173825.B1BFB235977@kimball.webabinitio.net> <60F8726F-C1C2-4803-8B8E-688EF0443FA0@gmail.com> <87eiadd46t.fsf@uwakimon.sk.tsukuba.ac.jp> <4CEA5744.3080308@v.loewis.de> <4CEA6661.4080402@egenix.com> Message-ID: <87zkt0au8e.fsf@uwakimon.sk.tsukuba.ac.jp> Nick Coghlan writes: > For practical purposes, UCS2/UCS4 convey far more inherent information > than narrow/wide: That was my stance, but in fact (1) the ISO JTC1/SC2 has deliberately made them ambiguous by changing their definitions over the years[1], and (2) the more recent definitions and "interpretations" of UCS-2 *prohibit* use of surrogates in UCS-2 as far as I can tell. And that's what you'll see everywhere you look, because Wikipedia and friends pick up the most recent versions of everything. > So don't just think about "what will developers know?", also think > about "what will developers know, and what will a quick trip to a > search engine tell them?". It will tell them that UCS-2 cannot even *express* non-BMP characters. Terry and David are *not* dummies, and that's what they got from more or less careful study of the issue. > And once you take that stance, the overly > generic narrow/wide terms fail, badly. I still agree that something more accurate would be nice, but face it: the ISO will redefine and deprecate such terms as soon as they notice us using them. > +1 for MAL's suggested tweaks to the Py3k configure options. Despite my natural sympathy for your arguments, and MAL's, I'm still -1. I really wish I could switch back, but it seems to me that "UCS-2" is a liability we don't need, *especially* on Windows where the default build is presumably going to be UCS2 forever. 
Footnotes: [1] You'd think it would be hard to change the definition of UCS-4, but they managed. :-( From fuzzyman at voidspace.org.uk Tue Nov 23 17:19:16 2010 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Tue, 23 Nov 2010 16:19:16 +0000 Subject: [Python-Dev] constant/enum type in stdlib In-Reply-To: <20101123153743.3D9451B8ED4@shell-too.nominum.com> References: <20101123153743.3D9451B8ED4@shell-too.nominum.com> Message-ID: <4CEBE984.4050807@voidspace.org.uk> On 23/11/2010 15:37, Ben.Cottrell at nominum.com wrote: > On Tue, 23 Nov 2010 15:15:29 +0000, Michael Foord wrote: >> There are still two reasonable APIs (unless you have changed your mind >> and think that sticking with plain integers is best), of which I prefer >> the latter: >> >> SOME_CONST = Constant('SOME_CONST', 1) >> OTHER_CONST = Constant('OTHER_CONST', 2) >> >> or: >> >> Constants = make_constants('Constants', 'SOME_CONST OTHER_CONST', start=1) >> SOME_CONST = Constants.SOME_CONST >> OTHER_CONST = Constants.OTHER_CONST > I prefer the latter too, because that makes it possible to have > 'Constants' be a rendezvous point for making sure that you're > passing something valid. Perhaps using 'in': > > def func(foo): > if foo not in Constants: > raise ValueError('foo must be SOME_CONST or OTHER_CONST') > ... > > I know this is probably not going to happen, but I would *so much* > like it if functions would start rejecting "the wrong kind of 2". > Constants that are valid, integer-wise, but which aren't part of > the set of constants allowed for that argument. I'd prefer not to > think of the number of times I've made the following mistake: > > s = socket.socket(socket.SOCK_DGRAM, socket.AF_INET) Well it would be perfectly possible for the __contains__ method (on the metaclass so that a Constants class can act as a container) to permit a *raw integer* (to be backwards compatible with code using hard coded values) but not permit other constants that aren't valid. 
Code that is *deliberately* using the wrong constants would be screwed of course... All the best, Michael > ~Ben > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/fuzzyman%40voidspace.org.uk -- http://www.voidspace.org.uk/ READ CAREFULLY. By accepting and reading this email you agree, on behalf of your employer, to release me from all obligations and waivers arising from any and all NON-NEGOTIATED agreements, licenses, terms-of-service, shrinkwrap, clickwrap, browsewrap, confidentiality, non-disclosure, non-compete and acceptable use policies (?BOGUS AGREEMENTS?) that I have entered into with your employer, its partners, licensors, agents and assigns, in perpetuity, without prejudice to my ongoing rights and privileges. You further represent that you have the authority to release me from any BOGUS AGREEMENTS on behalf of your employer. From barry at python.org Tue Nov 23 17:27:03 2010 From: barry at python.org (Barry Warsaw) Date: Tue, 23 Nov 2010 11:27:03 -0500 Subject: [Python-Dev] constant/enum type in stdlib In-Reply-To: <4CEBC6BD.9060402@voidspace.org.uk> References: <20101121034404.52924F20A@mail.python.org> <4CE9BF4A.1020302@netwok.org> <4CEA89E8.5090107@voidspace.org.uk> <20101122163722.7e96d123@pitrou.net> <4CEA9584.7040301@avl.com> <20101122172440.77d27ed5@pitrou.net> <20101122164654.2109.588145158.divmod.xquotient.165@localhost.localdomain> <4CEBC6BD.9060402@voidspace.org.uk> Message-ID: <20101123112703.42b42812@mission> On Nov 23, 2010, at 01:50 PM, Michael Foord wrote: >Right. As it happens I just submitted a patch to Barry Warsaw's enum package >(nice), flufl.enum [1], to allow namedtuple style creation of named >constants: Thanks for the plug (and the nice patch). 
FWIW, the documentation for the package is here: http://packages.python.org/flufl.enum/ I made some explicit decisions about the API and semantics of this package, to fit my own use cases and sensibilities. I guess you wouldn't expect anything else , but I'm willing to acknowledge that others would make different decisions, and certainly the number of existing enum implementations out there proves that there are lots of interesting ways to go about it. That said, there are several things I like about my package: * Enums are not subclassed from ints or strs. They are a distinct data type that can be converted to and from ints and strs. EIBTI. * The typical way to create them is through a simple, but explicit class definition. I personally like being explicit about the item values, and the assignments are required to make the metaclass work properly, but Michael's convenience patch is totally appropriate for cases where you don't care, or you want a one-liner. * Enum items are singletons and are intended to be compared by identity. They can be compared by equality but are not ordered. * Enum items have an unambiguous symbolic repr and a nice human readable str. * Given an enum item, you can get to its enum class, and given the class you can get to the set of items. * Enums can be subclassed (though all items in the subclass must have unique values). In any case it may be that enums are too tied to specific use cases to find a good common ground for the stdlib. I've been using my module for years and if there's interest I would of course be happy to donate it for use in the stdlib. Like the original sets implementation, it makes perfect sense to provide them in a separate module rather than as a built-in type. -Barry -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc
Type: application/pgp-signature
Size: 836 bytes
Desc: not available
URL:

From barry at python.org Tue Nov 23 17:31:27 2010
From: barry at python.org (Barry Warsaw)
Date: Tue, 23 Nov 2010 11:31:27 -0500
Subject: [Python-Dev] constant/enum type in stdlib
In-Reply-To: <4CEBDA91.4050205@voidspace.org.uk>
References: <20101121034404.52924F20A@mail.python.org>
 <4CE9BF4A.1020302@netwok.org> <4CEA89E8.5090107@voidspace.org.uk>
 <20101122163722.7e96d123@pitrou.net> <4CEA9584.7040301@avl.com>
 <20101122172440.77d27ed5@pitrou.net>
 <20101122164654.2109.588145158.divmod.xquotient.165@localhost.localdomain>
 <4CEBC6BD.9060402@voidspace.org.uk> <4CEBCE92.40801@voidspace.org.uk>
 <20101123154229.474f7a90@pitrou.net>
 <1290524466.3642.4.camel@localhost.localdomain>
 <4CEBDA91.4050205@voidspace.org.uk>
Message-ID: <20101123113127.78506cb5@mission>

On Nov 23, 2010, at 03:15 PM, Michael Foord wrote:

>(Well, there is a third option that takes __name__ and sets the constants in
>the module automagically. I can understand why people would dislike that
>though.)

Personally, I think if you want that, then the explicit class definition is a
better way to go.

-Barry
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 836 bytes
Desc: not available
URL:

From pje at telecommunity.com Tue Nov 23 17:52:37 2010
From: pje at telecommunity.com (P.J. Eby)
Date: Tue, 23 Nov 2010 11:52:37 -0500
Subject: [Python-Dev] constant/enum type in stdlib
In-Reply-To: <20101123113127.78506cb5@mission>
References: <20101121034404.52924F20A@mail.python.org>
 <4CE9BF4A.1020302@netwok.org> <4CEA89E8.5090107@voidspace.org.uk>
 <20101122163722.7e96d123@pitrou.net> <4CEA9584.7040301@avl.com>
 <20101122172440.77d27ed5@pitrou.net>
 <20101122164654.2109.588145158.divmod.xquotient.165@localhost.localdomain>
 <4CEBC6BD.9060402@voidspace.org.uk> <4CEBCE92.40801@voidspace.org.uk>
 <20101123154229.474f7a90@pitrou.net>
 <1290524466.3642.4.camel@localhost.localdomain>
 <4CEBDA91.4050205@voidspace.org.uk> <20101123113127.78506cb5@mission>
Message-ID: <20101123165252.0C0743A4114@sparrow.telecommunity.com>

At 11:31 AM 11/23/2010 -0500, Barry Warsaw wrote:
>On Nov 23, 2010, at 03:15 PM, Michael Foord wrote:
>
> >(Well, there is a third option that takes __name__ and sets the constants in
> >the module automagically. I can understand why people would dislike that
> >though.)
>
>Personally, I think if you want that, then the explicit class definition is a
>better way to go.

This reminds me: a stdlib enum should support proper pickling and
copying; i.e.:

    assert SomeEnum.anEnum is pickle.loads(pickle.dumps(SomeEnum.anEnum))

This could probably be implemented by adding something like:

    def __reduce__(self):
        return getattr, (self._class, self._enumname)

in the EnumValue class.
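A minimal sketch of that `__reduce__` suggestion follows. The `_class` and `_enumname` attributes and the names `EnumValue`, `SomeEnum` and `anEnum` come from the message above; the constructor, the `_value` attribute and the way the item is attached to the class are assumptions made only so the example is self-contained.

```python
import copy
import pickle


class EnumValue:
    """Hypothetical enum item that pickles and copies back to the same object."""

    def __init__(self, enum_class, name, value):
        self._class = enum_class  # the class that owns this item
        self._enumname = name     # attribute name of the item on that class
        self._value = value       # assumed field, not from the thread

    def __repr__(self):
        return "<%s.%s: %r>" % (self._class.__name__, self._enumname,
                                self._value)

    def __reduce__(self):
        # Rebuild by looking the item up on its class, so unpickling or
        # copying returns the existing singleton, not a new instance.
        return getattr, (self._class, self._enumname)


class SomeEnum:
    pass


# Attach the item after class creation so it can refer to its own class.
SomeEnum.anEnum = EnumValue(SomeEnum, 'anEnum', 1)
```

With this in place, `pickle.loads(pickle.dumps(SomeEnum.anEnum))`, `copy.copy(SomeEnum.anEnum)` and `copy.deepcopy(SomeEnum.anEnum)` should all hand back the original object, provided `SomeEnum` is importable under its module-level name when unpickling.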
From fuzzyman at voidspace.org.uk Tue Nov 23 18:02:33 2010 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Tue, 23 Nov 2010 17:02:33 +0000 Subject: [Python-Dev] constant/enum type in stdlib In-Reply-To: <20101123112703.42b42812@mission> References: <20101121034404.52924F20A@mail.python.org> <4CE9BF4A.1020302@netwok.org> <4CEA89E8.5090107@voidspace.org.uk> <20101122163722.7e96d123@pitrou.net> <4CEA9584.7040301@avl.com> <20101122172440.77d27ed5@pitrou.net> <20101122164654.2109.588145158.divmod.xquotient.165@localhost.localdomain> <4CEBC6BD.9060402@voidspace.org.uk> <20101123112703.42b42812@mission> Message-ID: <4CEBF3A9.3060604@voidspace.org.uk> On 23/11/2010 16:27, Barry Warsaw wrote: > On Nov 23, 2010, at 01:50 PM, Michael Foord wrote: > >> Right. As it happens I just submitted a patch to Barry Warsaw's enum package >> (nice), flufl.enum [1], to allow namedtuple style creation of named >> constants: > Thanks for the plug (and the nice patch). > > FWIW, the documentation for the package is here: > > http://packages.python.org/flufl.enum/ > > I made some explicit decisions about the API and semantics of this package, to > fit my own use cases and sensibilities. I guess you wouldn't expect anything > else , but I'm willing to acknowledge that others would make different > decisions, and certainly the number of existing enum implementations out there > proves that there are lots of interesting ways to go about it. > > That said, there are several things I like about my package: > > * Enums are not subclassed from ints or strs. They are a distinct data type > that can be converted to and from ints and strs. EIBTI. But if we are to use it *in* the standard library (as opposed to merely adding a module *to* the standard library) there are backwards compatibility concerns. Where modules are already using integers for constants then integers still need to work. One easy way to achieve this is to subclass integer. 
If we don't do that (assuming we decide that putting a solution in the standard library is appropriate) then we'll have to evaluate what we mean by backwards compatible. If the modules that use the constants aren't to change then comparing equal to the underlying value is the minimum (so that the original value can still be used in place of the new named constant). Not sure if you'd be happy to make that change in flufl.enum. > * The typical way to create them is through a simple, but explicit class > definition. I personally like being explicit about the item values, and the > assignments are required to make the metaclass work properly, but Michael's > convenience patch is totally appropriate for cases where you don't care, or > you want a one-liner. If make_enum was to take a set of values to use (as Antoine suggested) I don't see what's un-explicit about it. All the best, Michael > * Enum items are singletons and are intended to be compared by identity. They > can be compared by equality but are not ordered. > > * Enum items have an unambiguous symbolic repr and a nice human readable str. > > * Given an enum item, you can get to its enum class, and given the class you > can get to the set of items. > > * Enums can be subclassed (though all items in the subclass must have unique > values). > > In any case it may be that enums are too tied to specific use cases to find a > good common ground for the stdlib. I've been using my module for years and if > there's interest I would of course be happy to donate it for use in the > stdlib. Like the original sets implementation, it makes perfect sense to > provide them in a separate module rather than as a built-in type. 
>
> -Barry

--
http://www.voidspace.org.uk/

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From solipsis at pitrou.net Tue Nov 23 18:37:40 2010
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Tue, 23 Nov 2010 18:37:40 +0100
Subject: [Python-Dev] constant/enum type in stdlib
In-Reply-To:
References: <20101121034404.52924F20A@mail.python.org>
 <4CE9BF4A.1020302@netwok.org> <4CEA89E8.5090107@voidspace.org.uk>
 <20101122163722.7e96d123@pitrou.net> <4CEA9584.7040301@avl.com>
 <20101122172440.77d27ed5@pitrou.net>
 <20101122164654.2109.588145158.divmod.xquotient.165@localhost.localdomain>
 <4CEBC6BD.9060402@voidspace.org.uk> <4CEBCE92.40801@voidspace.org.uk>
 <20101123154229.474f7a90@pitrou.net>
 <1290524466.3642.4.camel@localhost.localdomain>
 <4CEBDA91.4050205@voidspace.org.uk>
 <1290526253.3642.9.camel@localhost.localdomain>
 <4CEBE06C.9030101@voidspace.org.uk>
 <1290528319.3642.11.camel@localhost.localdomain>
Message-ID: <1290533860.3642.73.camel@localhost.localdomain>

On Tuesday 23 November 2010 at 12:32 -0500, Isaac Morland wrote:
> On Tue, 23 Nov 2010, Antoine Pitrou wrote:
>
> > We already have a bunch of bizarrely unrelated stuff in collections
> > (such as Callable), so we could put enum there too.
>
> Why not just "enum" (i.e., "from enum import [...]" or "import
> enum.[...]")? Enumerations are one of the basic kinds of types overall
> (speaking informally and independent of any specific language) - they
> aren't at all exotic.

Enumerations aren't a type at all (they have no distinguishing
property).

> And "Flat is better than nested", after all.

Not when it means creating a separate module for every micro-feature.

Regards

Antoine.

From ijmorlan at uwaterloo.ca Tue Nov 23 18:32:15 2010
From: ijmorlan at uwaterloo.ca (Isaac Morland)
Date: Tue, 23 Nov 2010 12:32:15 -0500 (EST)
Subject: [Python-Dev] constant/enum type in stdlib
In-Reply-To: <1290528319.3642.11.camel@localhost.localdomain>
References: <20101121034404.52924F20A@mail.python.org>
 <4CE9BF4A.1020302@netwok.org> <4CEA89E8.5090107@voidspace.org.uk>
 <20101122163722.7e96d123@pitrou.net> <4CEA9584.7040301@avl.com>
 <20101122172440.77d27ed5@pitrou.net>
 <20101122164654.2109.588145158.divmod.xquotient.165@localhost.localdomain>
 <4CEBC6BD.9060402@voidspace.org.uk> <4CEBCE92.40801@voidspace.org.uk>
 <20101123154229.474f7a90@pitrou.net>
 <1290524466.3642.4.camel@localhost.localdomain>
 <4CEBDA91.4050205@voidspace.org.uk>
 <1290526253.3642.9.camel@localhost.localdomain>
 <4CEBE06C.9030101@voidspace.org.uk>
 <1290528319.3642.11.camel@localhost.localdomain>
Message-ID:

On Tue, 23 Nov 2010, Antoine Pitrou wrote:

> We already have a bunch of bizarrely unrelated stuff in collections
> (such as Callable), so we could put enum there too.

Why not just "enum" (i.e., "from enum import [...]" or "import
enum.[...]")? Enumerations are one of the basic kinds of types overall
(speaking informally and independent of any specific language) - they
aren't at all exotic.

And "Flat is better than nested", after all.
Isaac Morland CSCF Web Guru DC 2554C, x36650 WWW Software Specialist From ijmorlan at uwaterloo.ca Tue Nov 23 18:50:31 2010 From: ijmorlan at uwaterloo.ca (Isaac Morland) Date: Tue, 23 Nov 2010 12:50:31 -0500 (EST) Subject: [Python-Dev] constant/enum type in stdlib In-Reply-To: <1290533860.3642.73.camel@localhost.localdomain> References: <20101121034404.52924F20A@mail.python.org> <4CEA89E8.5090107@voidspace.org.uk> <20101122163722.7e96d123@pitrou.net> <4CEA9584.7040301@avl.com> <20101122172440.77d27ed5@pitrou.net> <20101122164654.2109.588145158.divmod.xquotient.165@localhost.localdomain> <4CEBC6BD.9060402@voidspace.org.uk> <4CEBCE92.40801@voidspace.org.uk> <20101123154229.474f7a90@pitrou.net> <1290524466.3642.4.camel@localhost.localdomain> <4CEBDA91.4050205@voidspace.org.uk> <1290526253.3642.9.camel@localhost.localdomain> <4CEBE06C.9030101@voidspace.org.uk> <1290528319.3642.11.camel@localhost.localdomain> <1290533860.3642.73.camel@localhost.localdomain> Message-ID: On Tue, 23 Nov 2010, Antoine Pitrou wrote: > Le mardi 23 novembre 2010 ? 12:32 -0500, Isaac Morland a ?crit : >> On Tue, 23 Nov 2010, Antoine Pitrou wrote: >> >>> We already have a bunch of bizarrely unrelated stuff in collections >>> (such as Callable), so we could put enum there too. >> >> Why not just "enum" (i.e., "from enum import [...]" or "import >> enum.[...]")? Enumerations are one of the basic kinds of types overall >> (speaking informally and independent of any specific language) - they >> aren't at all exotic. > > Enumerations aren't a type at all (they have no distinguishing > property). Each enumeration is a type (well, OK, not in every language, presumably, but certainly in many languages). The word "basic" is more important than "types" in my sentence - the point is that an enumeration capability is a very common one in a type system, and is very general, not specific to any particular application. >> And "Flat is better than nested", after all. 
> > Not when it means creating a separate module for every micro-feature. Classes have their own keyword. I don't think it's disproportionate to give enums a top-level module name. Having said that, I understand we're trying to have a not-too-flat module namespace and I can see the sense in putting it in "collections". But I think the idea that enumerations are of very wide applicability and hence deserve a shorter name should be seriously considered. I'll leave it at that, except for: Hey, how about this syntax: enum Colors: red = 0 green = 10 blue (blue gets the value 11) ;-) Isaac Morland CSCF Web Guru DC 2554C, x36650 WWW Software Specialist From fdrake at acm.org Tue Nov 23 18:57:20 2010 From: fdrake at acm.org (Fred Drake) Date: Tue, 23 Nov 2010 12:57:20 -0500 Subject: [Python-Dev] constant/enum type in stdlib In-Reply-To: <1290533860.3642.73.camel@localhost.localdomain> References: <20101121034404.52924F20A@mail.python.org> <4CE9BF4A.1020302@netwok.org> <4CEA89E8.5090107@voidspace.org.uk> <20101122163722.7e96d123@pitrou.net> <4CEA9584.7040301@avl.com> <20101122172440.77d27ed5@pitrou.net> <20101122164654.2109.588145158.divmod.xquotient.165@localhost.localdomain> <4CEBC6BD.9060402@voidspace.org.uk> <4CEBCE92.40801@voidspace.org.uk> <20101123154229.474f7a90@pitrou.net> <1290524466.3642.4.camel@localhost.localdomain> <4CEBDA91.4050205@voidspace.org.uk> <1290526253.3642.9.camel@localhost.localdomain> <4CEBE06C.9030101@voidspace.org.uk> <1290528319.3642.11.camel@localhost.localdomain> <1290533860.3642.73.camel@localhost.localdomain> Message-ID: On Tue, Nov 23, 2010 at 12:37 PM, Antoine Pitrou wrote: > Enumerations aren't a type at all (they have no distinguishing > property). In any given language, this may be true, or not. Whether they should be distinct in Python is core to the current discussion. >From a backward-compatibility perspective, what makes sense depends on whether they're used to implement existing constants (socket.AF_INET, etc.) 
or if they're reserved for new features only.

-Fred

-- Fred L. Drake, Jr. "A storm broke loose in my mind." --Albert Einstein

From solipsis at pitrou.net Tue Nov 23 19:06:42 2010 From: solipsis at pitrou.net (Antoine Pitrou) Date: Tue, 23 Nov 2010 19:06:42 +0100 Subject: [Python-Dev] constant/enum type in stdlib In-Reply-To: References: <20101121034404.52924F20A@mail.python.org> <4CE9BF4A.1020302@netwok.org> <4CEA89E8.5090107@voidspace.org.uk> <20101122163722.7e96d123@pitrou.net> <4CEA9584.7040301@avl.com> <20101122172440.77d27ed5@pitrou.net> <20101122164654.2109.588145158.divmod.xquotient.165@localhost.localdomain> <4CEBC6BD.9060402@voidspace.org.uk> <4CEBCE92.40801@voidspace.org.uk> <20101123154229.474f7a90@pitrou.net> <1290524466.3642.4.camel@localhost.localdomain> <4CEBDA91.4050205@voidspace.org.uk> <1290526253.3642.9.camel@localhost.localdomain> <4CEBE06C.9030101@voidspace.org.uk> <1290528319.3642.11.camel@localhost.localdomain> <1290533860.3642.73.camel@localhost.localdomain> Message-ID: <1290535602.3642.87.camel@localhost.localdomain>

On Tuesday 23 November 2010 at 12:57 -0500, Fred Drake wrote:
> On Tue, Nov 23, 2010 at 12:37 PM, Antoine Pitrou wrote:
> > Enumerations aren't a type at all (they have no distinguishing
> > property).
>
> In any given language, this may be true, or not. Whether they should
> be distinct in Python is core to the current discussion.

I meant "type" in the structural sense (hence the parenthesis). enums are just auto-generated constants. Since Python makes it trivial to generate sequential integers, there's no need for a specific "enum" construct.

Now you may argue that enums should be strongly-typed, but that would be a bit backwards given Python's preference for duck-typing.

> From a backward-compatibility perspective, what makes sense depends on
> whether they're used to implement existing constants (socket.AF_INET,
> etc.) or if they're reserved for new features only.

It's not only backwards compatibility.
New features relying on C APIs have to be able to map constants to the integers used in the C library. It would be much better if this were done naturally rather than through explicit conversion maps. (this really means subclassing int, if we don't want to complicate C-level code) Regards Antoine. From solipsis at pitrou.net Tue Nov 23 19:07:56 2010 From: solipsis at pitrou.net (Antoine Pitrou) Date: Tue, 23 Nov 2010 19:07:56 +0100 Subject: [Python-Dev] constant/enum type in stdlib In-Reply-To: References: <20101121034404.52924F20A@mail.python.org> <4CEA89E8.5090107@voidspace.org.uk> <20101122163722.7e96d123@pitrou.net> <4CEA9584.7040301@avl.com> <20101122172440.77d27ed5@pitrou.net> <20101122164654.2109.588145158.divmod.xquotient.165@localhost.localdomain> <4CEBC6BD.9060402@voidspace.org.uk> <4CEBCE92.40801@voidspace.org.uk> <20101123154229.474f7a90@pitrou.net> <1290524466.3642.4.camel@localhost.localdomain> <4CEBDA91.4050205@voidspace.org.uk> <1290526253.3642.9.camel@localhost.localdomain> <4CEBE06C.9030101@voidspace.org.uk> <1290528319.3642.11.camel@localhost.localdomain> <1290533860.3642.73.camel@localhost.localdomain> Message-ID: <1290535676.3642.89.camel@localhost.localdomain> Le mardi 23 novembre 2010 ? 12:50 -0500, Isaac Morland a ?crit : > Each enumeration is a type (well, OK, not in every language, presumably, > but certainly in many languages). The word "basic" is more important than > "types" in my sentence - the point is that an enumeration capability is a > very common one in a type system, and is very general, not specific to any > particular application. Python already has an enumeration capability. It's called range(). There's nothing else that C enums have. AFAICT, neither do enums in other mainstream languages (assuming they even exist; I don't remember Perl, PHP or Javascript having anything like that, but perhaps I'm mistaken). Regards Antoine. 
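
For readers following the thread, the range() idiom mentioned above can be sketched as follows. The names RED, GREEN, BLUE, COLOR_NAMES and color_name are illustrative assumptions, not taken from any message; the sketch only shows that constants built this way are plain ints, so any symbolic name has to be tracked separately.

```python
# Sequential integer constants via range(); the names are illustrative only.
RED, GREEN, BLUE = range(3)

# The constants are ordinary ints, so their repr is just the number.
print(repr(GREEN))

# If symbolic names are wanted, a reverse mapping must be kept by hand.
COLOR_NAMES = {RED: "RED", GREEN: "GREEN", BLUE: "BLUE"}


def color_name(value):
    """Return the symbolic name for a colour constant (hypothetical helper)."""
    return COLOR_NAMES[value]
```

This is the gap the other posters are pointing at: the values work anywhere an int works, but nothing in the object itself says which constant it is.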
From v+python at g.nevcal.com Tue Nov 23 19:56:20 2010 From: v+python at g.nevcal.com (Glenn Linderman) Date: Tue, 23 Nov 2010 10:56:20 -0800 Subject: [Python-Dev] is this a bug? no environment variables In-Reply-To: <4CEBABBA.9050002@v.loewis.de> References: <4CEA0246.9080607@g.nevcal.com> <4CEB97C7.1070708@g.nevcal.com> <4CEBABBA.9050002@v.loewis.de> Message-ID: <4CEC0E54.5070101@g.nevcal.com>

On 11/23/2010 3:55 AM, "Martin v. Löwis" wrote:
> On 23.11.2010 11:55, Amaury Forgeot d'Arc wrote:
>> Hi,
>>
>> 2010/11/23 Glenn Linderman :
>>> File "C:\Python32\lib\random.py", line 108, in seed
>>> a = int.from_bytes(_urandom(32), 'big')
>>> WindowsError: [Error -2146893818] Invalid Signature
>> In the subprocess documentation http://docs.python.org/library/subprocess.html
>> """On Windows, in order to run a side-by-side assembly the specified
>> env *must* include a valid SystemRoot."""
> Indeed, setting SystemRoot might solve this problem. According to
>
> http://jpassing.com/2009/12/28/the-hidden-danger-of-forgetting-to-specify-systemroot-in-a-custom-environment-block/
>
> CryptoAPI, in Windows 7, requires this variable be set. Failure to
> find the enhanced crypto provider would explain why the "random"
> module of Python fails to work.
>
> The specific cause is in the registry:
> HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Microsoft\Cryptography\Defaults\Provider\Microsoft
> Strong Cryptographic Provider has as its ImagePath value
>
> %SystemRoot%\system32\rsaenh.dll
>
> So the registry (and COM) do rely on environment variables.
>
> Regards,
> Martin

I find it sad but hilarious that, after working so hard to remove the need for environment variables from Windows, M$ has introduced new dependencies on them. I wonder if this particular registry variable is simply an oversight/bug on M$' part that they will eventually fix, or if it is a turnaround toward the use of more environment variables in the future. Hmm. Time will tell, I suppose.
I'm unaware of any benefits in _changing_ SystemRoot to other values, so not pre-expanding it in that registry location seems only to add an unnecessary dependency on the environment.

Indeed, preserving that one environment variable allows my version of http.server to proceed with, as far as initial testing can determine, proper behavior.

Thanks for your help in figuring this out. That was a lot faster than a "binary search" to choose which variable(s) to preserve.

My purpose in such testing was two-fold: firstly, web servers, for security purposes, generally limit the number of environment variables that are seen by CGI programs, and secondly, in debugging whether or not http.server was properly setting the necessary environment variables, the many other environment variables were cluttering up log dumps of all environment variables. It will be nicer to limit the "passed through" environment variables to SystemRoot, and see how things go.

I have read some about side-by-side assemblies but had considered them a good reason to stick with the outdated M$VC 6.0 compiler, which doesn't seem to need to create them, and their myriad requirements, which seem far from necessary for simply compiling a program. I was disappointed to realize that Python was heading down the path of using the newer tools that create side-by-side assemblies, but I suppose an old and crufty compiler like M$VC 6.0 cannot support some of the newer features of Windows, which may seem to be necessary to some... like 64-bit support, which does seem necessary, even to me.
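
As a rough sketch of the pass-through idea described above, the following builds a reduced environment for a child process that keeps only an allow-list of variables. The helper names minimal_env and run_with_minimal_env are hypothetical, subprocess.run needs Python 3.5 or later (newer than the 3.2 alpha discussed in this thread), and this is not a description of how http.server itself launches CGI scripts.

```python
import os
import subprocess


def minimal_env(keep=("SystemRoot",)):
    """Copy only the allow-listed variables that exist in os.environ."""
    return {name: os.environ[name] for name in keep if name in os.environ}


def run_with_minimal_env(args, keep=("SystemRoot",)):
    """Run a child process that sees only the allow-listed variables."""
    return subprocess.run(
        args,
        env=minimal_env(keep),      # replaces the inherited environment
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
```

For example, run_with_minimal_env([sys.executable, "-c", "import random"]) (with sys imported) would exercise the same import that failed in the quoted traceback; passing keep=() instead should reproduce the stripped environment on an affected Windows system.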
I was well aware that shortcuts and the registry _may_ refer to environment variables, and have a number of environment variables of my own which leverage that capability, to avoid hard-coded drive letters and paths in certain areas, and for the convenience of shortening the specification of some of the long-winded path names that Windows foists upon us (some of those have been significantly shortened in Windows 6.1, and maybe 6.0 which I used only for 2 months with disgust; 6.1 has helped alleviate the disgust, but I still recommend XP for people that don't need 64-bit capabilities).

From v+python at g.nevcal.com Tue Nov 23 19:58:37 2010 From: v+python at g.nevcal.com (Glenn Linderman) Date: Tue, 23 Nov 2010 10:58:37 -0800 Subject: [Python-Dev] is this a bug? no environment variables In-Reply-To: References: <4CEA0246.9080607@g.nevcal.com> <4CEAE6A7.3010902@g.nevcal.com> Message-ID: <4CEC0EDD.5080604@g.nevcal.com>

On 11/22/2010 2:56 PM, Tim Lesher wrote:
> On Mon, Nov 22, 2010 at 16:54, Glenn Linderman wrote:
>> I suppose it is possible that some environment variables are used by Python
>> directly (but I can't seem to find a documented list of them) although I
>> would expect that usage to be optional, with fall-back defaults when they
>> don't exist.
> I can verify that that's the case: Python (at least through 3.1.2)
> runs fine on Windows platforms when environment variables are
> completely unavailable. I know that from running our port for Windows
> CE (which has no environment variables at all), cross-compiled for
> Windows XP.

Is the Windows CE port generally available? From where? The CE ports I have found in past searches seem to have been quite outdated, with not much ongoing activity.

-------------- next part -------------- An HTML attachment was scrubbed...
URL: From alexander.belopolsky at gmail.com Tue Nov 23 20:11:06 2010 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Tue, 23 Nov 2010 14:11:06 -0500 Subject: [Python-Dev] len(chr(i)) = 2? In-Reply-To: References: <201011192123.14169.victor.stinner@haypocalc.com> <4CE6F93F.9010109@egenix.com> <4CE6FE30.1050903@v.loewis.de> <87hbfc1vnf.fsf@uwakimon.sk.tsukuba.ac.jp> <4CE78F62.7060707@v.loewis.de> <8739qukf9r.fsf@uwakimon.sk.tsukuba.ac.jp> <20101121173825.B1BFB235977@kimball.webabinitio.net> <60F8726F-C1C2-4803-8B8E-688EF0443FA0@gmail.com> <87eiadd46t.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On Mon, Nov 22, 2010 at 1:13 PM, Raymond Hettinger wrote: .. > Any explanation we give users needs to let them know two things: > * that we cover the entire range of unicode not just BMP > * that sometimes len(chr(i)) is one and sometimes two This discussion motivated me to start looking into how well Python library itself is prepared to deal with len(chr(i)) = 2. I was not surprised to find that textwrap does not handle the issue that well: >>> len(wrap(' \U00010140' * 80, 20)) 12 >>> len(wrap(' \U00000140' * 80, 20)) 8 That module should probably be rewritten to properly implement the Unicode line breaking algorithm . Yet finding a bug in a str object method after a 5 min review was a bit discouraging: >>> 'xyz'.center(20, '\U00010140') Traceback (most recent call last): File " ", line 1, in TypeError: The fill character must be exactly one character long Given the apparent difficulty of writing even basic text processing algorithms in presence of surrogate pairs, I wonder how wise it is to expose Python users to them. As Wikipedia explains, [1] """ Because the most commonly used characters are all in the Basic Multilingual Plane, converting between surrogate pairs and the original values is often not tested thoroughly. This leads to persistent bugs, and potential security holes, even in popular and well-reviewed application software. 
""" Since UCS-2 (the Character Encoding Form (CEF)) is now defined [1] to cover only BMP, maybe rather than changing the terms used in the reference manual, we should tighten the code to conform to the updated standards? Again, given that the str object itself has at least one non-BMP character bug as we are closing on the third major release of py3k, how likely are 3rd party developers to get their libraries right as they port to 3.x? [1] http://en.wikipedia.org/wiki/UTF-16/UCS-2 [2] http://unicode.org/reports/tr17/#CharacterEncodingForm From amauryfa at gmail.com Tue Nov 23 20:19:28 2010 From: amauryfa at gmail.com (Amaury Forgeot d'Arc) Date: Tue, 23 Nov 2010 20:19:28 +0100 Subject: [Python-Dev] len(chr(i)) = 2? In-Reply-To: References: <201011192123.14169.victor.stinner@haypocalc.com> <4CE6F93F.9010109@egenix.com> <4CE6FE30.1050903@v.loewis.de> <87hbfc1vnf.fsf@uwakimon.sk.tsukuba.ac.jp> <4CE78F62.7060707@v.loewis.de> <8739qukf9r.fsf@uwakimon.sk.tsukuba.ac.jp> <20101121173825.B1BFB235977@kimball.webabinitio.net> <60F8726F-C1C2-4803-8B8E-688EF0443FA0@gmail.com> <87eiadd46t.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: 2010/11/23 Alexander Belopolsky : > This discussion motivated me to start looking into how well Python > library itself is prepared to deal with len(chr(i)) = 2. ?I was not > surprised to find that textwrap does not handle the issue that well: > >>>> len(wrap(' \U00010140' * 80, 20)) > 12 >>>> len(wrap(' \U00000140' * 80, 20)) > 8 > > That module should probably be rewritten to properly implement ?the > Unicode line breaking algorithm > . 
> > Yet finding a bug in a str object method after a 5 min review was a > bit discouraging: > >>>> 'xyz'.center(20, '\U00010140') > Traceback (most recent call last): > ?File " ", line 1, in > TypeError: The fill character must be exactly one character long > > Given the apparent difficulty of writing even basic text processing > algorithms in presence of surrogate pairs, I wonder how wise it is to > expose Python users to them. This was already discussed two years ago: http://mail.python.org/pipermail/python-dev/2008-July/080900.html So yes, wrap() and center() should be fixed. -- Amaury Forgeot d'Arc From janssen at parc.com Tue Nov 23 20:26:57 2010 From: janssen at parc.com (Bill Janssen) Date: Tue, 23 Nov 2010 11:26:57 PST Subject: [Python-Dev] constant/enum type in stdlib In-Reply-To: References: <20101121034404.52924F20A@mail.python.org> <4CEA89E8.5090107@voidspace.org.uk> <20101122163722.7e96d123@pitrou.net> <4CEA9584.7040301@avl.com> <20101122172440.77d27ed5@pitrou.net> <20101122164654.2109.588145158.divmod.xquotient.165@localhost.localdomain> <4CEBC6BD.9060402@voidspace.org.uk> <4CEBCE92.40801@voidspace.org.uk> <20101123154229.474f7a90@pitrou.net> <1290524466.3642.4.camel@localhost.localdomain> <4CEBDA91.4050205@voidspace.org.uk> <1290526253.3642.9.camel@localhost.localdomain> <4CEBE06C.9030101@voidspace.org.uk> <1290528319.3642.11.camel@localhost.localdomain> <1290533860.3642.73.camel@localhost.localdomain> Message-ID: <58396.1290540417@parc.com> Isaac Morland wrote: > On Tue, 23 Nov 2010, Antoine Pitrou wrote: > > > Le mardi 23 novembre 2010 ? 12:32 -0500, Isaac Morland a ?crit : > >> On Tue, 23 Nov 2010, Antoine Pitrou wrote: > >> > >>> We already have a bunch of bizarrely unrelated stuff in collections > >>> (such as Callable), so we could put enum there too. > >> > >> Why not just "enum" (i.e., "from enum import [...]" or "import > >> enum.[...]")? 
Enumerations are one of the basic kinds of types overall > >> (speaking informally and independent of any specific language) - they > >> aren't at all exotic. > > > > Enumerations aren't a type at all (they have no distinguishing > > property). Not in C, but in some other languages. > Each enumeration is a type (well, OK, not in every language, > presumably, but certainly in many languages). The main purpose of that is to be able to catch type mismatches with static typing, though. Seems kind of pointless for Python. > Classes have their own keyword. I don't think it's disproportionate > to give enums a top-level module name. I do. > Hey, how about this syntax: > > enum Colors: > red = 0 > green = 10 > blue Why not class Color: red = (255, 0, 0) green = (0, 255, 0) blue = (0, 0, 255) Seems to handle the situation OK. Bill From mal at egenix.com Tue Nov 23 20:31:37 2010 From: mal at egenix.com (M.-A. Lemburg) Date: Tue, 23 Nov 2010 20:31:37 +0100 Subject: [Python-Dev] len(chr(i)) = 2? In-Reply-To: References: <201011192123.14169.victor.stinner@haypocalc.com> <4CE6F93F.9010109@egenix.com> <4CE6FE30.1050903@v.loewis.de> <87hbfc1vnf.fsf@uwakimon.sk.tsukuba.ac.jp> <4CE78F62.7060707@v.loewis.de> <8739qukf9r.fsf@uwakimon.sk.tsukuba.ac.jp> <20101121173825.B1BFB235977@kimball.webabinitio.net> <60F8726F-C1C2-4803-8B8E-688EF0443FA0@gmail.com> <87eiadd46t.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <4CEC1699.40700@egenix.com> Alexander Belopolsky wrote: > On Mon, Nov 22, 2010 at 1:13 PM, Raymond Hettinger > wrote: > .. >> Any explanation we give users needs to let them know two things: >> * that we cover the entire range of unicode not just BMP >> * that sometimes len(chr(i)) is one and sometimes two > > This discussion motivated me to start looking into how well Python > library itself is prepared to deal with len(chr(i)) = 2. 
I was not > surprised to find that textwrap does not handle the issue that well: > >>>> len(wrap(' \U00010140' * 80, 20)) > 12 >>>> len(wrap(' \U00000140' * 80, 20)) > 8 > > That module should probably be rewritten to properly implement the > Unicode line breaking algorithm > . > > Yet finding a bug in a str object method after a 5 min review was a > bit discouraging: > >>>> 'xyz'.center(20, '\U00010140') > Traceback (most recent call last): > File " ", line 1, in > TypeError: The fill character must be exactly one character long > > Given the apparent difficulty of writing even basic text processing > algorithms in presence of surrogate pairs, I wonder how wise it is to > expose Python users to them. What's the alternative ? Without surrogates, Python users with UCS-2 build (e.g. the Windows Python users) would not be allowed to play with non-BMP code points. IMHO, it's better to fix the stdlib. This is a long process, as you can see with the Python3 stdlib evolution, but Python will eventually get there. > As Wikipedia explains, [1] > > """ > Because the most commonly used characters are all in the Basic > Multilingual Plane, converting between surrogate pairs and the > original values is often not tested thoroughly. This leads to > persistent bugs, and potential security holes, even in popular and > well-reviewed application software. > """ > > Since UCS-2 (the Character Encoding Form (CEF)) is now defined [1] to > cover only BMP, maybe rather than changing the terms used in the > reference manual, we should tighten the code to conform to the updated > standards? Can we please stop turning this around over and over again :-) UCS-2 has never supported anything other than the BMP. However, you can interpret sequences of UCS-2 code unit as UTF-16 and then get access to the full Unicode character set. We've been doing this in codecs ever since UCS-4 builds were introduced some 8-9 years ago. 
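
To make the UCS-2/UTF-16 relationship above concrete, here is a small sketch of the surrogate-pair arithmetic for a non-BMP code point such as U+10140. The function names are illustrative assumptions. Note that since Python 3.3 strings are no longer stored as UCS-2 or UCS-4 code units, so len(chr(0x10140)) is 1 there; the code below works on integers and UTF-16 bytes, so it does not depend on the build.

```python
def to_surrogates(code_point):
    """Split a non-BMP code point into a (high, low) UTF-16 surrogate pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points above the BMP need surrogates")
    offset = code_point - 0x10000          # 20 significant bits
    high = 0xD800 + (offset >> 10)         # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)        # bottom 10 bits
    return high, low


def from_surrogates(high, low):
    """Recombine a surrogate pair into the original code point."""
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


def utf16_units(text):
    """Count the 16-bit code units needed to encode text as UTF-16."""
    return len(text.encode("utf-16-le")) // 2
```

A narrow build reported the count that utf16_units computes, which is why len(chr(i)) could be 2 there and why code such as str.center() had to be aware of pairs.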
The change to have chr(i) return surrogates on UCS-2 builds was perhaps done too early, but then, without such changes you'd never notice that your code doesn't work well with surrogates. It's just one piece of the puzzle when going from 8-bit strings to Unicode. > Again, given that the str object itself has at least one non-BMP > character bug as we are closing on the third major release of py3k, > how likely are 3rd party developers to get their libraries right as > they port to 3.x? > > [1] http://en.wikipedia.org/wiki/UTF-16/UCS-2 > [2] http://unicode.org/reports/tr17/#CharacterEncodingForm -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Nov 23 2010) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ ::: Try our new mxODBC.Connect Python Database Interface for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. 
Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From guido at python.org Tue Nov 23 20:34:17 2010 From: guido at python.org (Guido van Rossum) Date: Tue, 23 Nov 2010 11:34:17 -0800 Subject: [Python-Dev] constant/enum type in stdlib In-Reply-To: <1290535602.3642.87.camel@localhost.localdomain> References: <20101121034404.52924F20A@mail.python.org> <4CE9BF4A.1020302@netwok.org> <4CEA89E8.5090107@voidspace.org.uk> <20101122163722.7e96d123@pitrou.net> <4CEA9584.7040301@avl.com> <20101122172440.77d27ed5@pitrou.net> <20101122164654.2109.588145158.divmod.xquotient.165@localhost.localdomain> <4CEBC6BD.9060402@voidspace.org.uk> <4CEBCE92.40801@voidspace.org.uk> <20101123154229.474f7a90@pitrou.net> <1290524466.3642.4.camel@localhost.localdomain> <4CEBDA91.4050205@voidspace.org.uk> <1290526253.3642.9.camel@localhost.localdomain> <4CEBE06C.9030101@voidspace.org.uk> <1290528319.3642.11.camel@localhost.localdomain> <1290533860.3642.73.camel@localhost.localdomain> <1290535602.3642.87.camel@localhost.localdomain> Message-ID: On Tue, Nov 23, 2010 at 10:06 AM, Antoine Pitrou wrote: > Le mardi 23 novembre 2010 ? 12:57 -0500, Fred Drake a ?crit : >> On Tue, Nov 23, 2010 at 12:37 PM, Antoine Pitrou wrote: >> > Enumerations aren't a type at all (they have no distinguishing >> > property). >> >> In any given language, this may be true, or not. ?Whether they should >> be distinct in Python is core to the current discussion. > > I meant "type" in the structural sense (hence the parenthesis). enums > are just auto-generated constants. Since Python makes it trivial to > generate sequential integers, there's no need for a specific "enum" > construct. > > Now you may argue that enums should be strongly-typed, but that would be > a bit backwards given Python's preference for duck-typing. Please take a step back. The best example of the utility of enums even for Python is bool. 
I resisted this for the longest time but people kept asking for it. Some properties of bool: (a) bool is a (final) subclass of int, and an int is acceptable in a pinch where a bool is expected (b) bool values are guaranteed unique -- there is only one instance with value True, and only one with value False (c) bool values have a str() and repr() that shows their name instead of their value (but not their class -- that's rarely an issue, and makes the output more compact) I think it makes sense to add a way to the stdlib to add other types like bool. I think (c) is probably the most important feature, followed by (a) -- except the *final* part: I want to subclass enums. (b) is probably easy to do but I don't think it matters that much in practice. >> From a backward-compatibility perspective, what makes sense depends on >> whether they're used to implement existing constants (socket.AF_INET, >> etc.) or if they reserved for new features only. > > It's not only backwards compatibility. New features relying on C APIs > have to be able to map constants to the integers used in the C library. > It would be much better if this were done naturally rather than through > explicit conversion maps. I'm not sure what you mean here. Can you give an example of what you mean? I agree that it should be possible to make pretty much any constant in the OS modules enums -- even if the values vary across platforms. > (this really means subclassing int, if we don't want to complicate > C-level code) Right. FWIW I don't think I'm particular about the exact API to construct a new enum type in Python code; I think in most cases explicitly assigning values is fine. Often the values are constrained by something external anyway; it should be easy to dynamically set the values of a particular enum type (even add new values after the fact). There might also be enums with the same value (even though the mapping from int to enum will then have to pick one). 
I expect that the API to convert between enums and bare ints should be i = int(e) and e = (i). It would be nice if s = str(e) and e = (s) would work too. -- --Guido van Rossum (python.org/~guido) From barry at python.org Tue Nov 23 20:40:45 2010 From: barry at python.org (Barry Warsaw) Date: Tue, 23 Nov 2010 14:40:45 -0500 Subject: [Python-Dev] constant/enum type in stdlib In-Reply-To: References: <20101121034404.52924F20A@mail.python.org> <4CEA89E8.5090107@voidspace.org.uk> <20101122163722.7e96d123@pitrou.net> <4CEA9584.7040301@avl.com> <20101122172440.77d27ed5@pitrou.net> <20101122164654.2109.588145158.divmod.xquotient.165@localhost.localdomain> <4CEBC6BD.9060402@voidspace.org.uk> <4CEBCE92.40801@voidspace.org.uk> <20101123154229.474f7a90@pitrou.net> <1290524466.3642.4.camel@localhost.localdomain> <4CEBDA91.4050205@voidspace.org.uk> <1290526253.3642.9.camel@localhost.localdomain> <4CEBE06C.9030101@voidspace.org.uk> <1290528319.3642.11.camel@localhost.localdomain> <1290533860.3642.73.camel@localhost.localdomain> Message-ID: <20101123144045.17b00ac4@mission> On Nov 23, 2010, at 12:57 PM, Fred Drake wrote: >>From a backward-compatibility perspective, what makes sense depends on >whether they're used to implement existing constants (socket.AF_INET, >etc.) or if they reserved for new features only. As is usually the case, there's little reason to change existing working code. Enums can be used whenever a module or API is updated. -Barry -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 836 bytes Desc: not available URL: From barry at python.org Tue Nov 23 20:47:47 2010 From: barry at python.org (Barry Warsaw) Date: Tue, 23 Nov 2010 14:47:47 -0500 Subject: [Python-Dev] constant/enum type in stdlib In-Reply-To: <4CEBF3A9.3060604@voidspace.org.uk> References: <20101121034404.52924F20A@mail.python.org> <4CE9BF4A.1020302@netwok.org> <4CEA89E8.5090107@voidspace.org.uk> <20101122163722.7e96d123@pitrou.net> <4CEA9584.7040301@avl.com> <20101122172440.77d27ed5@pitrou.net> <20101122164654.2109.588145158.divmod.xquotient.165@localhost.localdomain> <4CEBC6BD.9060402@voidspace.org.uk> <20101123112703.42b42812@mission> <4CEBF3A9.3060604@voidspace.org.uk> Message-ID: <20101123144747.44a2f4c9@mission> On Nov 23, 2010, at 05:02 PM, Michael Foord wrote: >> * Enums are not subclassed from ints or strs. They are a distinct data type >> that can be converted to and from ints and strs. EIBTI. > >But if we are to use it *in* the standard library (as opposed to merely >adding a module *to* the standard library) there are backwards compatibility >concerns. Where modules are already using integers for constants then >integers still need to work. Is int(enum_value) enough, or must the enum value actually *be* an int? >One easy way to achieve this is to subclass integer. If we don't do that >(assuming we decide that putting a solution in the standard library is >appropriate) then we'll have to evaluate what we mean by backwards >compatible. If the modules that use the constants aren't to change then >comparing equal to the underlying value is the minimum (so that the original >value can still be used in place of the new named constant). Not sure if >you'd be happy to make that change in flufl.enum. I'm not sure either. In flufl.enum enum_class(i) also works as expected. >> * The typical way to create them is through a simple, but explicit class >> definition. 
I personally like being explicit about the item values, and >> the assignments are required to make the metaclass work properly, but >> Michael's convenience patch is totally appropriate for cases where you >> don't care, or you want a one-liner. > >If make_enum was to take a set of values to use (as Antoine suggested) I >don't see what's un-explicit about it. When I saw your patch I immediately thought that I could add a default argument that was something like `int_iter`, i.e. an iterator of integers for the values in the string. I suspect YAGNI, which is why I didn't just add it, but I'm not totally opposed to it. -Barry -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 836 bytes Desc: not available URL: From barry at python.org Tue Nov 23 21:01:02 2010 From: barry at python.org (Barry Warsaw) Date: Tue, 23 Nov 2010 15:01:02 -0500 Subject: [Python-Dev] constant/enum type in stdlib In-Reply-To: <20101123165252.0C0743A4114@sparrow.telecommunity.com> References: <20101121034404.52924F20A@mail.python.org> <4CE9BF4A.1020302@netwok.org> <4CEA89E8.5090107@voidspace.org.uk> <20101122163722.7e96d123@pitrou.net> <4CEA9584.7040301@avl.com> <20101122172440.77d27ed5@pitrou.net> <20101122164654.2109.588145158.divmod.xquotient.165@localhost.localdomain> <4CEBC6BD.9060402@voidspace.org.uk> <4CEBCE92.40801@voidspace.org.uk> <20101123154229.474f7a90@pitrou.net> <1290524466.3642.4.camel@localhost.localdomain> <4CEBDA91.4050205@voidspace.org.uk> <20101123113127.78506cb5@mission> <20101123165252.0C0743A4114@sparrow.telecommunity.com> Message-ID: <20101123150102.75f6256c@mission> On Nov 23, 2010, at 11:52 AM, P.J. 
Eby wrote: >This reminds me: a stdlib enum should support proper pickling and copying; >i.e.: > > assert SomeEnum.anEnum is pickle.loads(pickle.dumps(SomeEnum.anEnum)) > >This could probably be implemented by adding something like: > > def __reduce__(self): > return getattr, (self._class, self._enumname) > >in the EnumValue class. Excellent idea, thanks. Added to flufl.enum in r38. However, only enums created with the class syntax can be pickled. Cheers, -Barry -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 836 bytes Desc: not available URL: From guido at python.org Tue Nov 23 21:00:51 2010 From: guido at python.org (Guido van Rossum) Date: Tue, 23 Nov 2010 12:00:51 -0800 Subject: [Python-Dev] constant/enum type in stdlib In-Reply-To: <20101123144747.44a2f4c9@mission> References: <20101121034404.52924F20A@mail.python.org> <4CE9BF4A.1020302@netwok.org> <4CEA89E8.5090107@voidspace.org.uk> <20101122163722.7e96d123@pitrou.net> <4CEA9584.7040301@avl.com> <20101122172440.77d27ed5@pitrou.net> <20101122164654.2109.588145158.divmod.xquotient.165@localhost.localdomain> <4CEBC6BD.9060402@voidspace.org.uk> <20101123112703.42b42812@mission> <4CEBF3A9.3060604@voidspace.org.uk> <20101123144747.44a2f4c9@mission> Message-ID: On Tue, Nov 23, 2010 at 11:47 AM, Barry Warsaw wrote: > On Nov 23, 2010, at 05:02 PM, Michael Foord wrote: > >>> * Enums are not subclassed from ints or strs. They are a distinct data type >>> that can be converted to and from ints and strs. EIBTI. >> >>But if we are to use it *in* the standard library (as opposed to merely >>adding a module *to* the standard library) there are backwards compatibility >>concerns. Where modules are already using integers for constants then >>integers still need to work. > > Is int(enum_value) enough, or must the enum value actually *be* an int? I vote for *be*, following bool's example. 
>>One easy way to achieve this is to subclass integer. If we don't do that >>(assuming we decide that putting a solution in the standard library is >>appropriate) then we'll have to evaluate what we mean by backwards >>compatible. If the modules that use the constants aren't to change then >>comparing equal to the underlying value is the minimum (so that the original >>value can still be used in place of the new named constant). Not sure if >>you'd be happy to make that change in flufl.enum. > > I'm not sure either. In flufl.enum enum_class(i) also works as expected. > >>> * The typical way to create them is through a simple, but explicit class >>> definition. I personally like being explicit about the item values, and >>> the assignments are required to make the metaclass work properly, but >>> Michael's convenience patch is totally appropriate for cases where you >>> don't care, or you want a one-liner. >> >>If make_enum was to take a set of values to use (as Antoine suggested) I >>don't see what's un-explicit about it. > > When I saw your patch I immediately thought that I could add a default > argument that was something like `int_iter`, i.e. an iterator of integers for > the values in the string. I suspect YAGNI, which is why I didn't just add it, > but I'm not totally opposed to it. > > -Barry > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/guido%40python.org > > -- --Guido van Rossum (python.org/~guido) From jcea at jcea.es Tue Nov 23 21:33:02 2010 From: jcea at jcea.es (Jesus Cea) Date: Tue, 23 Nov 2010 21:33:02 +0100 Subject: [Python-Dev] Sporadic problems with bugs.python.org Message-ID: <4CEC24FE.70107@jcea.es> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 It happened to me last Sunday, and it is happening again just now. 
I can access http://bugs.python.org/ just fine, but trying to post a message, open a new bug, change nosy, etc., takes a LONG time (minutes) and it is finally failing with a "400 Bad Request" error: """ Bad Request Your browser sent a request that this server could not understand. Apache/2.2.9 (Debian) mod_python/3.3.1 Python/2.5.2 mod_ssl/2.2.9 OpenSSL/0.9.8g mod_wsgi/2.5 Server at bugs.python.org Port 80 """ Last sunday I was able to open the bug after a time. Today I have been retrying for while, with no luck yet. - -- Jesus Cea Avion _/_/ _/_/_/ _/_/_/ jcea at jcea.es - http://www.jcea.es/ _/_/ _/_/ _/_/ _/_/ _/_/ jabber / xmpp:jcea at jabber.org _/_/ _/_/ _/_/_/_/_/ . _/_/ _/_/ _/_/ _/_/ _/_/ "Things are not so easy" _/_/ _/_/ _/_/ _/_/ _/_/ _/_/ "My name is Dump, Core Dump" _/_/_/ _/_/_/ _/_/ _/_/ "El amor es poner tu felicidad en la felicidad de otro" - Leibniz -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iQCVAwUBTOwk/plgi5GaxT1NAQJYuQP+LhEUtOXyaz0Ut6586/cwura87jq/XVxn XatNzwadYNH4yF3ewXVkLk6eSjXOnEszr8kWX3inoLY9ND7o3TCMn5uCKOF2G4Lh sgogv7eB5KEffAaXoxZxT+ZJVYBEPyUISgMeD40DL/tQJIcMBtyZtU1nY5QxwPzN O8mGHBlEGpQ= =i/s7 -----END PGP SIGNATURE----- From martin at v.loewis.de Tue Nov 23 21:33:19 2010 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Tue, 23 Nov 2010 21:33:19 +0100 Subject: [Python-Dev] is this a bug? no environment variables In-Reply-To: <4CEC0E54.5070101@g.nevcal.com> References: <4CEA0246.9080607@g.nevcal.com> <4CEB97C7.1070708@g.nevcal.com> <4CEBABBA.9050002@v.loewis.de> <4CEC0E54.5070101@g.nevcal.com> Message-ID: <4CEC250F.6060102@v.loewis.de> > I have read some about side-by-side assemblies but had considered them a > good reason to stick with the outdated M$VC 6.0 compiler, which doesn't > seem to need to create them, and their myriad requirements, which seem > far from necessary for simply compiling a program. 
I was disappointed > to realize that Python was heading down the path of using the newer > tools that create side-by-side assemblies, but I suppose using an old > and crufty compiler like M$VC 6.0 cannot support some of the newer > features of Windows, which may seem to be necessary to some.... like > 64-bit support, which does seem necessary, even to me. The rationale for moving along with the releases is different, though: you cannot obtain the old versions anymore, except perhaps on eBay. So new developers coming to Python would not be able to build Python extensions if we didn't always try to use a compiler that is still available (and we are stressing that a little bit: 3.2 will use VS 2008, even though it has already been superseded). In any case, VS 2010 will stop using SxS for the CRT. Regards, Martin From v+python at g.nevcal.com Tue Nov 23 21:42:40 2010 From: v+python at g.nevcal.com (Glenn Linderman) Date: Tue, 23 Nov 2010 12:42:40 -0800 Subject: [Python-Dev] is this a bug? no environment variables In-Reply-To: <4CEC250F.6060102@v.loewis.de> References: <4CEA0246.9080607@g.nevcal.com> <4CEB97C7.1070708@g.nevcal.com> <4CEBABBA.9050002@v.loewis.de> <4CEC0E54.5070101@g.nevcal.com> <4CEC250F.6060102@v.loewis.de> Message-ID: <4CEC2740.7@g.nevcal.com> On 11/23/2010 12:33 PM, "Martin v. Löwis" wrote: > In any case, VS 2010 will stop using SxS for the CRT. Good news! Maybe M$VC will become a useful compiler yet again :) -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From v+python at g.nevcal.com Tue Nov 23 21:43:05 2010 From: v+python at g.nevcal.com (Glenn Linderman) Date: Tue, 23 Nov 2010 12:43:05 -0800 Subject: [Python-Dev] constant/enum type in stdlib In-Reply-To: References: <20101121034404.52924F20A@mail.python.org> <4CEA9584.7040301@avl.com> <20101122172440.77d27ed5@pitrou.net> <20101122164654.2109.588145158.divmod.xquotient.165@localhost.localdomain> <4CEBC6BD.9060402@voidspace.org.uk> <4CEBCE92.40801@voidspace.org.uk> <20101123154229.474f7a90@pitrou.net> <1290524466.3642.4.camel@localhost.localdomain> <4CEBDA91.4050205@voidspace.org.uk> <1290526253.3642.9.camel@localhost.localdomain> <4CEBE06C.9030101@voidspace.org.uk> <1290528319.3642.11.camel@localhost.localdomain> <1290533860.3642.73.camel@localhost.localdomain> <1290535602.3642.87.camel@localhost.localdomain> Message-ID: <4CEC2759.40203@g.nevcal.com> On 11/23/2010 11:34 AM, Guido van Rossum wrote: > The best example of the utility of enums even for Python is bool. I > resisted this for the longest time but people kept asking for it. Some > properties of bool: > > (a) bool is a (final) subclass of int, and an int is acceptable in a > pinch where a bool is expected > (b) bool values are guaranteed unique -- there is only one instance > with value True, and only one with value False > (c) bool values have a str() and repr() that shows their name instead > of their value (but not their class -- that's rarely an issue, and > makes the output more compact) > > I think it makes sense to add a way to the stdlib to add other types > like bool. I think (c) is probably the most important feature, > followed by (a) -- except the *final* part: I want to subclass enums. > (b) is probably easy to do but I don't think it matters that much in > practice. I was concerned about uniqueness constraints some were touting. While that can be a useful property for some enumerations, it can also be convenient for other enumerations to have multiple names map to the same value. 
Bool seems appropriately not extensible to additional values. While there are tri-valued (and other) logic systems, they deserve a separate namespace. Bool seems to be an example, then, of a "set of distinguished names, with values associated to the names", and is restricted to [two] [unique] integer values. C/C++/C# enum is somewhat like that, and is also restricted to integer values [not necessarily unique]. I wonder if a set of distinguished names needs to be restricted to integer values to be useful, although I have no doubt that distinguished names with integer values are useful. Someone used an example of a color names class having RGB tuple values, which is a counterexample to a restriction to integers. I can think of others as well. Perhaps a "set of distinguished names, with values associated to the names" is really a dict, with the unique names restricted to Python identifier syntax (to be useful), and the values unrestricted. The type of the named value, and the value of the named value, seem not to need to be restricted. But the implementations Bool = {'False': 0, 'True': 1} or alternately class Bool(): self.False = 0 self.True = 1 are missing a couple of characteristics of Python's present bool: the names are not special, and the values are not immutable. Perhaps games could be played to make the second implementation effectively immutable. So I think the real trick of the "enum" (or a generalized "distinguished names") is in the naming. A technique to import the keys that are legal Python identifiers from a dict into a namespace, and retain henceforth immutable values for those names would permit the syntactical usage that people are accustomed to from the C/C++/C# enum, but with extended ranges and types of values, and it seems Bool could be mostly reimplemented via that technique. What is still missing? 
The "debugging" help: the values, once imported, should not become "just" values of their type, but rather a new type of value, that has an associated name (and type, I think). Whatever magic is worked under the covers to make sure that there is just one True and just one False, so that they can be distinguished from the values 1 and 0, and so reported, should also be applied to these values. So there need not be new syntax for creating the name/value pairs; just use dict. The only new API would be the code that "imports" the dict into the local namespace. Note that other scoped definitions of True and False are not possible today because True and False are keywords. It would be inappropriate to define these distinguished names as all being keywords, so it seems like one could still override the names, even once defined, but such overridden names would lose their special value that makes them a distinguished name. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From solipsis at pitrou.net Tue Nov 23 21:48:43 2010 From: solipsis at pitrou.net (Antoine Pitrou) Date: Tue, 23 Nov 2010 21:48:43 +0100 Subject: [Python-Dev] constant/enum type in stdlib In-Reply-To: References: <20101121034404.52924F20A@mail.python.org> <4CE9BF4A.1020302@netwok.org> <4CEA89E8.5090107@voidspace.org.uk> <20101122163722.7e96d123@pitrou.net> <4CEA9584.7040301@avl.com> <20101122172440.77d27ed5@pitrou.net> <20101122164654.2109.588145158.divmod.xquotient.165@localhost.localdomain> <4CEBC6BD.9060402@voidspace.org.uk> <4CEBCE92.40801@voidspace.org.uk> <20101123154229.474f7a90@pitrou.net> <1290524466.3642.4.camel@localhost.localdomain> <4CEBDA91.4050205@voidspace.org.uk> <1290526253.3642.9.camel@localhost.localdomain> <4CEBE06C.9030101@voidspace.org.uk> <1290528319.3642.11.camel@localhost.localdomain> <1290533860.3642.73.camel@localhost.localdomain> <1290535602.3642.87.camel@localhost.localdomain> Message-ID: <1290545323.3642.101.camel@localhost.localdomain> On Tuesday 23 November 2010 at 11:34 -0800, Guido van Rossum wrote: > >> From a backward-compatibility perspective, what makes sense depends on > >> whether they're used to implement existing constants (socket.AF_INET, > >> etc.) or if they are reserved for new features only. > > > > It's not only backwards compatibility. New features relying on C APIs > > have to be able to map constants to the integers used in the C library. > > It would be much better if this were done naturally rather than through > > explicit conversion maps. > > I'm not sure what you mean here. Can you give an example of what you > mean? I agree that it should be possible to make pretty much any > constant in the OS modules enums -- even if the values vary across > platforms. I mean that PyArg_ParseTuple should continue to be practical even if e.g. os.SEEK_SET and friends become named constants. It implies that the various format codes such as "i", "l", etc. are still usable with those constants. 
Hence: > > (this really means subclassing int, if we don't want to complicate > > C-level code) > > Right. :-) Regards Antoine. From rrr at ronadam.com Tue Nov 23 22:03:21 2010 From: rrr at ronadam.com (Ron Adam) Date: Tue, 23 Nov 2010 15:03:21 -0600 Subject: [Python-Dev] constant/enum type in stdlib In-Reply-To: <1290535676.3642.89.camel@localhost.localdomain> References: <20101121034404.52924F20A@mail.python.org> <4CEA89E8.5090107@voidspace.org.uk> <20101122163722.7e96d123@pitrou.net> <4CEA9584.7040301@avl.com> <20101122172440.77d27ed5@pitrou.net> <20101122164654.2109.588145158.divmod.xquotient.165@localhost.localdomain> <4CEBC6BD.9060402@voidspace.org.uk> <4CEBCE92.40801@voidspace.org.uk> <20101123154229.474f7a90@pitrou.net> <1290524466.3642.4.camel@localhost.localdomain> <4CEBDA91.4050205@voidspace.org.uk> <1290526253.3642.9.camel@localhost.localdomain> <4CEBE06C.9030101@voidspace.org.uk> <1290528319.3642.11.camel@localhost.localdomain> <1290533860.3642.73.camel@localhost.localdomain> <1290535676.3642.89.camel@localhost.localdomain> Message-ID: On 11/23/2010 12:07 PM, Antoine Pitrou wrote: > On Tuesday 23 November 2010 at 12:50 -0500, Isaac Morland wrote: >> Each enumeration is a type (well, OK, not in every language, presumably, >> but certainly in many languages). The word "basic" is more important than >> "types" in my sentence - the point is that an enumeration capability is a >> very common one in a type system, and is very general, not specific to any >> particular application. > > Python already has an enumeration capability. It's called range(). > There's nothing else that C enums have. AFAICT, neither do enums in > other mainstream languages (assuming they even exist; I don't remember > Perl, PHP or Javascript having anything like that, but perhaps I'm > mistaken). Aren't we forgetting enumerate? 
>>> colors = 'BLACK BROWN RED ORANGE YELLOW GREEN BLUE VIOLET GREY WHITE' >>> dict(e for e in enumerate(colors.split())) {0: 'BLACK', 1: 'BROWN', 2: 'RED', 3: 'ORANGE', 4: 'YELLOW', 5: 'GREEN', 6: 'BLUE', 7: 'VIOLET', 8: 'GREY', 9: 'WHITE'} >>> dict((f, n) for (n, f) in enumerate(colors.split())) {'BLUE': 6, 'BROWN': 1, 'GREY': 8, 'YELLOW': 4, 'GREEN': 5, 'VIOLET': 7, 'ORANGE': 3, 'BLACK': 0, 'WHITE': 9, 'RED': 2} Most other languages that use numbered constants number them by base n^2. >>> [x**2 for x in range(10)] [0, 1, 4, 9, 16, 25, 36, 49, 64, 81] Binary flags have the advantage of saving memory because you can assign more than one to a single integer. Another advantage is other languages use them so it can make it easier interface with them. There also may be some performance advantages as well since you can test for multiple flags with a single comparison. Sets of strings can also work when you don't need to associate a numeric value to the constant. ie... the constant is the value. In this case the set supplies the api. Cheers, Ron From glyph at twistedmatrix.com Tue Nov 23 22:06:41 2010 From: glyph at twistedmatrix.com (Glyph Lefkowitz) Date: Tue, 23 Nov 2010 16:06:41 -0500 Subject: [Python-Dev] constant/enum type in stdlib In-Reply-To: <20101123153743.3D9451B8ED4@shell-too.nominum.com> References: <20101123153743.3D9451B8ED4@shell-too.nominum.com> Message-ID: On Nov 23, 2010, at 10:37 AM, Ben.Cottrell at nominum.com wrote: > I'd prefer not to think of the number of times I've made the following mistake: > > s = socket.socket(socket.SOCK_DGRAM, socket.AF_INET) If it's any consolation, it's fewer than the number of times I have :). (More fun, actually, is where you pass a file descriptor to the wrong argument of 'fromfd'...) -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From steve at pearwood.info Tue Nov 23 22:06:45 2010 From: steve at pearwood.info (Steven D'Aprano) Date: Wed, 24 Nov 2010 08:06:45 +1100 Subject: [Python-Dev] constant/enum type in stdlib In-Reply-To: <1290526253.3642.9.camel@localhost.localdomain> References: <20101121034404.52924F20A@mail.python.org> <4CE9BF4A.1020302@netwok.org> <4CEA89E8.5090107@voidspace.org.uk> <20101122163722.7e96d123@pitrou.net> <4CEA9584.7040301@avl.com> <20101122172440.77d27ed5@pitrou.net> <20101122164654.2109.588145158.divmod.xquotient.165@localhost.localdomain> <4CEBC6BD.9060402@voidspace.org.uk> <4CEBCE92.40801@voidspace.org.uk> <20101123154229.474f7a90@pitrou.net> <1290524466.3642.4.camel@localhost.localdomain> <4CEBDA91.4050205@voidspace.org.uk> <1290526253.3642.9.camel@localhost.localdomain> Message-ID: <4CEC2CE5.8000302@pearwood.info> Antoine Pitrou wrote: > Constants = make_constants('Constants', 'SOME_CONST OTHER_CONST', > values=range(1, 3)) > > Again, auto-enumeration is useless since it's trivial to achieve > explicitly. That doesn't make auto-enumeration "useless". Unnecessary, perhaps, but not useless. But even then it's only unnecessary if the number of constants are small enough that you can see how many there are without counting (essentially, 4 or fewer). 
When you have more, it becomes error-prone and a nuisance to have to count them by hand: Constants = make_constants( 'Constants', 'ST_MODE ST_INO ST_DEV ST_NLINK ST_UID ST_GID ' 'ST_SIZE ST_ATIME ST_MTIME ST_CTIME', values=range(10) ) -- Steven From glyph at twistedmatrix.com Tue Nov 23 22:10:00 2010 From: glyph at twistedmatrix.com (Glyph Lefkowitz) Date: Tue, 23 Nov 2010 16:10:00 -0500 Subject: [Python-Dev] constant/enum type in stdlib In-Reply-To: <1290524466.3642.4.camel@localhost.localdomain> References: <20101121034404.52924F20A@mail.python.org> <4CE9BF4A.1020302@netwok.org> <4CEA89E8.5090107@voidspace.org.uk> <20101122163722.7e96d123@pitrou.net> <4CEA9584.7040301@avl.com> <20101122172440.77d27ed5@pitrou.net> <20101122164654.2109.588145158.divmod.xquotient.165@localhost.localdomain> <4CEBC6BD.9060402@voidspace.org.uk> <4CEBCE92.40801@voidspace.org.uk> <20101123154229.474f7a90@pitrou.net> <1290524466.3642.4.camel@localhost.localdomain> Message-ID: <935CA187-6799-437E-8F18-2A35886B5117@twistedmatrix.com> On Nov 23, 2010, at 10:01 AM, Antoine Pitrou wrote: > Well, it is easy to assign range(N) to a tuple of names when desired. I > don't think an automatically-enumerating constant generator is needed. I don't think that numerical enumerations are the only kind of constants we're talking about. Others have already mentioned strings. Also, see for some other use-cases. Since this isn't coming to 2.x, we're probably going to do our own thing anyway (unless it turns out that flufl.enum is so great that we want to add another dependency...) but I'm hoping that the outcome of this discussion will point to something we can be compatible with. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From solipsis at pitrou.net Tue Nov 23 22:15:20 2010 From: solipsis at pitrou.net (Antoine Pitrou) Date: Tue, 23 Nov 2010 22:15:20 +0100 Subject: [Python-Dev] constant/enum type in stdlib In-Reply-To: <935CA187-6799-437E-8F18-2A35886B5117@twistedmatrix.com> References: <20101121034404.52924F20A@mail.python.org> <4CE9BF4A.1020302@netwok.org> <4CEA89E8.5090107@voidspace.org.uk> <20101122163722.7e96d123@pitrou.net> <4CEA9584.7040301@avl.com> <20101122172440.77d27ed5@pitrou.net> <20101122164654.2109.588145158.divmod.xquotient.165@localhost.localdomain> <4CEBC6BD.9060402@voidspace.org.uk> <4CEBCE92.40801@voidspace.org.uk> <20101123154229.474f7a90@pitrou.net> <1290524466.3642.4.camel@localhost.localdomain> <935CA187-6799-437E-8F18-2A35886B5117@twistedmatrix.com> Message-ID: <1290546920.3642.104.camel@localhost.localdomain> On Tuesday 23 November 2010 at 16:10 -0500, Glyph Lefkowitz wrote: > > On Nov 23, 2010, at 10:01 AM, Antoine Pitrou wrote: > > > Well, it is easy to assign range(N) to a tuple of names when > > desired. I > > don't think an automatically-enumerating constant generator is > > needed. > > I don't think that numerical enumerations are the only kind of > constants we're talking about. Others have already mentioned strings. > Also, see for some other use-cases. Since this > isn't coming to 2.x, we're probably going to do our own thing anyway > (unless it turns out that flufl.enum is so great that we want to add > another dependency...) but I'm hoping that the outcome of this > discussion will point to something we can be compatible with. I think that asking for too many features would get in the way, and also make the API quite un-Pythonic. If you want your values to be e.g. OR'able, just choose your values wisely ;) Regards Antoine. 
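One way to read "choose your values wisely" for OR-able constants is to give each name a distinct power of two, so that any combination stays unambiguous. The following is only a minimal sketch under that assumption; the names READ, WRITE, EXEC and the helper has_flag are hypothetical and do not come from flufl.enum or from any proposal in this thread.

```python
# Hypothetical flag names for illustration; each value is a distinct
# power of two, so OR-ing them never loses information.
READ, WRITE, EXEC = (1 << i for i in range(3))


def has_flag(mask, flag):
    """Return True if every bit of `flag` is set in `mask`."""
    return mask & flag == flag


mode = READ | WRITE
print(has_flag(mode, READ))   # True
print(has_flag(mode, EXEC))   # False
```

With plain integers like these, a combined value prints as a bare number (3 here), which is the debugging drawback that named constants are meant to address.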
From rrr at ronadam.com Tue Nov 23 22:21:17 2010 From: rrr at ronadam.com (Ron Adam) Date: Tue, 23 Nov 2010 15:21:17 -0600 Subject: [Python-Dev] constant/enum type in stdlib In-Reply-To: References: <20101121034404.52924F20A@mail.python.org> <20101122163722.7e96d123@pitrou.net> <4CEA9584.7040301@avl.com> <20101122172440.77d27ed5@pitrou.net> <20101122164654.2109.588145158.divmod.xquotient.165@localhost.localdomain> <4CEBC6BD.9060402@voidspace.org.uk> <4CEBCE92.40801@voidspace.org.uk> <20101123154229.474f7a90@pitrou.net> <1290524466.3642.4.camel@localhost.localdomain> <4CEBDA91.4050205@voidspace.org.uk> <1290526253.3642.9.camel@localhost.localdomain> <4CEBE06C.9030101@voidspace.org.uk> <1290528319.3642.11.camel@localhost.localdomain> <1290533860.3642.73.camel@localhost.localdomain> <1290535676.3642.89.camel@localhost.localdomain> Message-ID: Oops.. x**2 should have been 2**x below. On 11/23/2010 03:03 PM, Ron Adam wrote: > > > On 11/23/2010 12:07 PM, Antoine Pitrou wrote: >> On Tuesday 23 November 2010 at 12:50 -0500, Isaac Morland wrote: >>> Each enumeration is a type (well, OK, not in every language, presumably, >>> but certainly in many languages). The word "basic" is more important than >>> "types" in my sentence - the point is that an enumeration capability is a >>> very common one in a type system, and is very general, not specific to any >>> particular application. >> >> Python already has an enumeration capability. It's called range(). >> There's nothing else that C enums have. AFAICT, neither do enums in >> other mainstream languages (assuming they even exist; I don't remember >> Perl, PHP or Javascript having anything like that, but perhaps I'm >> mistaken). > > > Aren't we forgetting enumerate? 
> > >>> colors = 'BLACK BROWN RED ORANGE YELLOW GREEN BLUE VIOLET GREY WHITE' > > >>> dict(e for e in enumerate(colors.split())) > {0: 'BLACK', 1: 'BROWN', 2: 'RED', 3: 'ORANGE', 4: 'YELLOW', 5: 'GREEN', 6: > 'BLUE', 7: 'VIOLET', 8: 'GREY', 9: 'WHITE'} > > >>> dict((f, n) for (n, f) in enumerate(colors.split())) > {'BLUE': 6, 'BROWN': 1, 'GREY': 8, 'YELLOW': 4, 'GREEN': 5, 'VIOLET': 7, > 'ORANGE': 3, 'BLACK': 0, 'WHITE': 9, 'RED': 2} > > > Most other languages that use numbered constants number them by base n^2. > > >>> [x**2 for x in range(10)] > [0, 1, 4, 9, 16, 25, 36, 49, 64, 81] >>> [2**x for x in range(10)] [1, 2, 4, 8, 16, 32, 64, 128, 256, 512] > Binary flags have the advantage of saving memory because you can assign > more than one to a single integer. Another advantage is other languages use > them so it can make it easier interface with them. There also may be some > performance advantages as well since you can test for multiple flags with a > single comparison. > > Sets of strings can also work when you don't need to associate a numeric > value to the constant. ie... the constant is the value. In this case the > set supplies the api. 
> > Cheers, > Ron > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > http://mail.python.org/mailman/options/python-dev/python-python-dev%40m.gmane.org > From steve at pearwood.info Tue Nov 23 22:30:37 2010 From: steve at pearwood.info (Steven D'Aprano) Date: Wed, 24 Nov 2010 08:30:37 +1100 Subject: [Python-Dev] constant/enum type in stdlib In-Reply-To: <1290535676.3642.89.camel@localhost.localdomain> References: <20101121034404.52924F20A@mail.python.org> <4CEA89E8.5090107@voidspace.org.uk> <20101122163722.7e96d123@pitrou.net> <4CEA9584.7040301@avl.com> <20101122172440.77d27ed5@pitrou.net> <20101122164654.2109.588145158.divmod.xquotient.165@localhost.localdomain> <4CEBC6BD.9060402@voidspace.org.uk> <4CEBCE92.40801@voidspace.org.uk> <20101123154229.474f7a90@pitrou.net> <1290524466.3642.4.camel@localhost.localdomain> <4CEBDA91.4050205@voidspace.org.uk> <1290526253.3642.9.camel@localhost.localdomain> <4CEBE06C.9030101@voidspace.org.uk> <1290528319.3642.11.camel@localhost.localdomain> <1290533860.3642.73.camel@localhost.localdomain> <1290535676.3642.89.camel@localhost.localdomain> Message-ID: <4CEC327D.1050503@pearwood.info> Antoine Pitrou wrote: > Python already has an enumeration capability. It's called range(). > There's nothing else that C enums have. AFAICT, neither do enums in > other mainstream languages (assuming they even exist; I don't remember > Perl, PHP or Javascript having anything like that, but perhaps I'm > mistaken). In Pascal, enumerations are a type, and the value of the named values are an implementation detail. E.g. one would define an enumerated type: type flavour = (sweet, salty, sour, bitter, umame); var x: flavour; and then you would write something like: x := sour; Notice that the constants sweet etc. 
aren't explicitly predefined, since they're purely internal details and the compiler is allowed to number them any way it likes. In Python, we would need stronger guarantees about the values chosen, so that they could be exposed to external modules, pickled, etc. But that doesn't mean we should be forced to specify the values ourselves. -- Steven From greg.ewing at canterbury.ac.nz Tue Nov 23 22:26:58 2010 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Wed, 24 Nov 2010 10:26:58 +1300 Subject: [Python-Dev] constant/enum type in stdlib In-Reply-To: <20101123154229.474f7a90@pitrou.net> References: <20101121034404.52924F20A@mail.python.org> <4CE9BF4A.1020302@netwok.org> <4CEA89E8.5090107@voidspace.org.uk> <20101122163722.7e96d123@pitrou.net> <4CEA9584.7040301@avl.com> <20101122172440.77d27ed5@pitrou.net> <20101122164654.2109.588145158.divmod.xquotient.165@localhost.localdomain> <4CEBC6BD.9060402@voidspace.org.uk> <4CEBCE92.40801@voidspace.org.uk> <20101123154229.474f7a90@pitrou.net> Message-ID: <4CEC31A2.5080809@canterbury.ac.nz> Antoine Pitrou wrote: > I don't understand why people insist on calling that an "enum". enum is > a C legacy and it doesn't bring anything useful as I can tell. The usefulness is that they can have a str() or repr() that displays the name of the value instead of an integer. The bool type was added for much the same reason -- otherwise we would simply have gotten builtin names False = 0 and True = 1. 
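The point about str() and repr() showing a name while the value still behaves as an integer can be illustrated with a small int subclass. This is a sketch only: NamedInt and AF_EXAMPLE are hypothetical names invented here, not the API of flufl.enum or of any stdlib module.

```python
class NamedInt(int):
    """Hypothetical int subclass that displays a name instead of a number."""

    def __new__(cls, name, value):
        self = super().__new__(cls, value)
        self._name = name  # kept only for display purposes
        return self

    def __repr__(self):
        return self._name

    __str__ = __repr__


AF_EXAMPLE = NamedInt('AF_EXAMPLE', 2)
print(AF_EXAMPLE)        # shows the name
print(AF_EXAMPLE == 2)   # still compares equal to the underlying value
print(AF_EXAMPLE + 1)    # arithmetic falls back to a plain int
```

Because the instance really is an int, code that already accepts the bare integer keeps working, which is the backward-compatibility property argued for earlier in the thread.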
-- Greg From greg.ewing at canterbury.ac.nz Tue Nov 23 22:27:02 2010 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Wed, 24 Nov 2010 10:27:02 +1300 Subject: [Python-Dev] constant/enum type in stdlib In-Reply-To: <1290524519.3642.5.camel@localhost.localdomain> References: <20101121034404.52924F20A@mail.python.org> <4CE9BF4A.1020302@netwok.org> <4CEA89E8.5090107@voidspace.org.uk> <20101122163722.7e96d123@pitrou.net> <4CEA9584.7040301@avl.com> <20101122172440.77d27ed5@pitrou.net> <20101122164654.2109.588145158.divmod.xquotient.165@localhost.localdomain> <4CEBC6BD.9060402@voidspace.org.uk> <4CEBCE92.40801@voidspace.org.uk> <20101123154229.474f7a90@pitrou.net> <4CEBD624.9000402@voidspace.org.uk> <1290524519.3642.5.camel@localhost.localdomain> Message-ID: <4CEC31A6.5090505@canterbury.ac.nz> Antoine Pitrou wrote: > Well, it's been inherited by C-like languages, no doubt. Like braces and > semicolons :) The idea isn't confined to the C family. Pascal and many of the languages inspired by it also have enumerated types. -- Greg From tjreedy at udel.edu Tue Nov 23 23:44:07 2010 From: tjreedy at udel.edu (Terry Reedy) Date: Tue, 23 Nov 2010 17:44:07 -0500 Subject: [Python-Dev] len(chr(i)) = 2? In-Reply-To: References: <201011192123.14169.victor.stinner@haypocalc.com> <4CE6F93F.9010109@egenix.com> <4CE6FE30.1050903@v.loewis.de> <87hbfc1vnf.fsf@uwakimon.sk.tsukuba.ac.jp> <4CE78F62.7060707@v.loewis.de> <8739qukf9r.fsf@uwakimon.sk.tsukuba.ac.jp> <20101121173825.B1BFB235977@kimball.webabinitio.net> <60F8726F-C1C2-4803-8B8E-688EF0443FA0@gmail.com> <87eiadd46t.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On 11/23/2010 2:11 PM, Alexander Belopolsky wrote: > This discussion motivated me to start looking into how well Python > library itself is prepared to deal with len(chr(i)) = 2. I was not Good idea! 
> surprised to find that textwrap does not handle the issue that well: > > >>> len(wrap(' \U00010140' * 80, 20)) > 12 > >>> len(wrap(' \U00000140' * 80, 20)) > 8 How well does textwrap handle composable pairs (letter + accent)? Does it count two codepoints as one char space, and avoid putting line breaks between them? I suspect textwrap should be regarded as (extended?)_ascii_textwrap. > > That module should probably be rewritten to properly implement the > Unicode line breaking algorithm > . Probably a good idea > Yet finding a bug in a str object method after a 5 min review was a > bit discouraging: > > >>> 'xyz'.center(20, '\U00010140') > Traceback (most recent call last): > File " ", line 1, in > TypeError: The fill character must be exactly one character long Again, what does it do with letter + decorator combinations? It seems to me that the whole notion that one code point == one printed character space is broken once one leaves ascii. Perhaps we need an is_uchar function to recognize multi-code sequences, including surrogate pairs, that represent one char for the purpose of character oriented functions. > Given the apparent difficulty of writing even basic text processing > algorithms in presence of surrogate pairs, I wonder how wise it is to > expose Python users to them. As Wikipedia explains, [1] > > """ > Because the most commonly used characters are all in the Basic > Multilingual Plane, converting between surrogate pairs and the > original values is often not tested thoroughly. This leads to > persistent bugs, and potential security holes, even in popular and > well-reviewed application software. > """ So we did not test thoroughly enough and need to add appropriate unit tests as bugs are fixed.
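A rough sketch of the "multi-code sequence" idea raised above, assuming a hypothetical `clusters` helper. It only groups a base character with the combining marks that follow it, using `unicodedata.combining`; it is not the full Unicode grapheme cluster algorithm, and on a narrow (UTF-16) build it would not join surrogate pairs.

```python
import unicodedata


def clusters(text):
    """Group each base character with the combining marks that follow it.

    Hypothetical helper: a simplification, not full grapheme segmentation.
    """
    groups = []
    for ch in text:
        # combining() returns a non-zero class for combining marks.
        if groups and unicodedata.combining(ch):
            groups[-1] += ch
        else:
            groups.append(ch)
    return groups


# "e" followed by COMBINING ACUTE ACCENT, then "x".
sample = "e\u0301x"
print(len(sample), len(clusters(sample)))
```

A character-oriented function such as a wrapper or a padder could then count and split on these groups instead of on individual code points.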
-- Terry Jan Reedy From tjreedy at udel.edu Wed Nov 24 00:07:03 2010 From: tjreedy at udel.edu (Terry Reedy) Date: Tue, 23 Nov 2010 18:07:03 -0500 Subject: [Python-Dev] [Python-checkins] r86720 - python/branches/py3k/Misc/ACKS In-Reply-To: <4CEC43A4.80907@netwok.org> References: <20101123203252.39BE7EE9CF@mail.python.org> <4CEC43A4.80907@netwok.org> Message-ID: <4CEC4917.2070508@udel.edu> On 11/23/2010 5:43 PM, Éric Araujo wrote: >> Modified: python/branches/py3k/Misc/ACKS >> ============================================================================== >> --- python/branches/py3k/Misc/ACKS (original) >> +++ python/branches/py3k/Misc/ACKS Tue Nov 23 21:32:47 2010 >> @@ -1,4 +1,4 @@ >> -Acknowledgements >> +?Acknowledgements > > This change introduced a so-called UTF-8 BOM in the file. Is > TortoiseSvn the culprit or a text editor? I used Notepad to edit the file, TortoiseSvn to commit, the same as I did for #9222, rev86702, Lib\idlelib\IOBinding.py, yesterday. If the latter is OK, perhaps *.py gets filtered better than misc. text files. I believe I have the config as specified in dev/faq.
[miscellany] enable-auto-props = yes [auto-props] * = svn:eol-style=native *.c = svn:keywords=Id *.h = svn:keywords=Id *.py = svn:keywords=Id *.txt = svn:keywords=Author Date Id Revision Terry From ijmorlan at uwaterloo.ca Wed Nov 24 00:15:03 2010 From: ijmorlan at uwaterloo.ca (Isaac Morland) Date: Tue, 23 Nov 2010 18:15:03 -0500 (EST) Subject: [Python-Dev] constant/enum type in stdlib In-Reply-To: <58396.1290540417@parc.com> References: <20101121034404.52924F20A@mail.python.org> <4CEA89E8.5090107@voidspace.org.uk> <20101122163722.7e96d123@pitrou.net> <4CEA9584.7040301@avl.com> <20101122172440.77d27ed5@pitrou.net> <20101122164654.2109.588145158.divmod.xquotient.165@localhost.localdomain> <4CEBC6BD.9060402@voidspace.org.uk> <4CEBCE92.40801@voidspace.org.uk> <20101123154229.474f7a90@pitrou.net> <1290524466.3642.4.camel@localhost.localdomain> <4CEBDA91.4050205@voidspace.org.uk> <1290526253.3642.9.camel@localhost.localdomain> <4CEBE06C.9030101@voidspace.org.uk> <1290528319.3642.11.camel@localhost.localdomain> <1290533860.3642.73.camel@localhost.localdomain> <58396.1290540417@parc.com> Message-ID: On Tue, 23 Nov 2010, Bill Janssen wrote: > The main purpose of that is to be able to catch type mismatches with > static typing, though. Seems kind of pointless for Python. The concept can work dynamically. In fact, the flufl.enum package which has been discussed here makes each enumeration into a separate class so many of the advantages of catching type mismatches are obtained. >> Hey, how about this syntax: >> >> enum Colors: >> red = 0 >> green = 10 >> blue > > Why not > > class Color: > red = (255, 0, 0) > green = (0, 255, 0) > blue = (0, 0, 255) > > Seems to handle the situation OK. Yes, this looks almost exactly like flufl.enum syntax. In any case my suggestion of a new keyword was not meant to be taken seriously. 
If I ever think I have a good reason to suggest a new keyword I'll sleep on it, take a vacation, and then if I still think a new keyword is justified I will specifically disclaim any possibility of the suggestion being a joke. Isaac Morland CSCF Web Guru DC 2554C, x36650 WWW Software Specialist From db3l.net at gmail.com Wed Nov 24 00:18:33 2010 From: db3l.net at gmail.com (David Bolen) Date: Tue, 23 Nov 2010 18:18:33 -0500 Subject: [Python-Dev] Stable buildbots References: <20101113133712.60e9be27@pitrou.net> <4CEB7E12.1070201@snakebite.org> Message-ID: Trent Nelson writes: > That's interesting. (That kill_python.exe doesn't kill the wedged > processes, but pskill does.) kill_python is pretty simple, it just > calls TerminateProcess() after acquiring a handle with the relevant > PROCESS_TERMINATE access right. (...) > > Are you calling pskill with the -t flag? i.e. kill process and all > dependents? That might be the ticket, especially if killing the child > process that wedged select() is waiting on causes it to return, and > thus, makes it killable. Nope, just "pskill python_d". Haven't bothered to check the pskill source but I'm assuming it's just a basic TerminateProcess. Ideally my quickest workaround would just be to replace the kill_python in the buildbot tools script with that command but of course they could get updated on checkouts and I'm not arguing it's generally appropriate enough to belong in the source. I suspect the problem may be on the "identify which process to kill" rather than the "kill it" part, but it's definitely going to take time to figure that out for sure. While the approach kill_python takes is much more appropriate, since we don't currently have multiple builds running simultaneously (and for me the machines are dedicated as build slaves, so I won't be having my own python_d), a more blanket kill operation is safe enough. 
> Otherwise, if it happens again, can you try kill_python.exe first, > then pskill, and confirm if the former fails but the latter succeeds? Yeah, I've got a temporary tree with a built-binary around, but still have to make sure of the right way to run it manually in a way that it will do the identification right (which I think also means I need to figure out from which build tree the hung process started). Up until now, typically when I've found a hung setup, the rest of the build tree which originally applied to that process has been cleaned. I definitely sympathize with Martin's position though - it wasn't the simplest tool to write (and I still have some email from him about the week+ it took just to test the process identification part remotely through buildbots at the time), so I regret not jumping right in to try to fix it. But it's just way more effort than typing "pskill python_d", at least with my current availability. -- David From greg.ewing at canterbury.ac.nz Wed Nov 24 00:32:39 2010 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Wed, 24 Nov 2010 12:32:39 +1300 Subject: [Python-Dev] constant/enum type in stdlib In-Reply-To: <1290546920.3642.104.camel@localhost.localdomain> References: <20101121034404.52924F20A@mail.python.org> <4CE9BF4A.1020302@netwok.org> <4CEA89E8.5090107@voidspace.org.uk> <20101122163722.7e96d123@pitrou.net> <4CEA9584.7040301@avl.com> <20101122172440.77d27ed5@pitrou.net> <20101122164654.2109.588145158.divmod.xquotient.165@localhost.localdomain> <4CEBC6BD.9060402@voidspace.org.uk> <4CEBCE92.40801@voidspace.org.uk> <20101123154229.474f7a90@pitrou.net> <1290524466.3642.4.camel@localhost.localdomain> <935CA187-6799-437E-8F18-2A35886B5117@twistedmatrix.com> <1290546920.3642.104.camel@localhost.localdomain> Message-ID: <4CEC4F17.7030600@canterbury.ac.nz> Antoine Pitrou wrote: > I think that asking for too many features would get in the way, and also > make the API quite un-Pythonic. If you want your values to be e.g. 
> OR'able, just choose your values wisely ;) On the other hand it could be useful to have an easy way to request power-of-2 value assignment, seeing as it's another common pattern. -- Greg From greg.ewing at canterbury.ac.nz Wed Nov 24 00:32:56 2010 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Wed, 24 Nov 2010 12:32:56 +1300 Subject: [Python-Dev] constant/enum type in stdlib In-Reply-To: <58396.1290540417@parc.com> References: <20101121034404.52924F20A@mail.python.org> <4CEA89E8.5090107@voidspace.org.uk> <20101122163722.7e96d123@pitrou.net> <4CEA9584.7040301@avl.com> <20101122172440.77d27ed5@pitrou.net> <20101122164654.2109.588145158.divmod.xquotient.165@localhost.localdomain> <4CEBC6BD.9060402@voidspace.org.uk> <4CEBCE92.40801@voidspace.org.uk> <20101123154229.474f7a90@pitrou.net> <1290524466.3642.4.camel@localhost.localdomain> <4CEBDA91.4050205@voidspace.org.uk> <1290526253.3642.9.camel@localhost.localdomain> <4CEBE06C.9030101@voidspace.org.uk> <1290528319.3642.11.camel@localhost.localdomain> <1290533860.3642.73.camel@localhost.localdomain> <58396.1290540417@parc.com> Message-ID: <4CEC4F28.7010904@canterbury.ac.nz> Bill Janssen wrote: > The main purpose of that is to be able to catch type mismatches with > static typing, though. Seems kind of pointless for Python. But catching type mismatches with dynamic typing doesn't seem pointless for Python. There's nothing static about the proposals being made here that I can see. > Why not > > class Color: > red = (255, 0, 0) > green = (0, 255, 0) > blue = (0, 0, 255) If all you want is a bunch of named constants, that's fine. But the facilities being discussed here are designed to give you other things as well, such as c = Color.red print(c) printing "red" rather than "(255, 0, 0)". 
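For illustration only: the standard library gained an `enum` module in Python 3.4, well after this thread, and it can serve as one possible sketch of the behaviour described above. The `Color` class mirrors the quoted example; nothing here is the design that was under discussion at the time.

```python
from enum import Enum


class Color(Enum):
    red = (255, 0, 0)
    green = (0, 255, 0)
    blue = (0, 0, 255)


c = Color.red
print(c)        # shows the member's name, qualified by its class
print(c.name)   # just the name
print(c.value)  # the underlying tuple

# A member is not interchangeable with its raw value, so a mix-up
# between a Color and a bare tuple is detectable at run time.
print(c == (255, 0, 0))
```

With a plain class of tuple attributes, `print(Color.red)` would show only `(255, 0, 0)`, and nothing would distinguish it from any other tuple with the same contents.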
-- Greg From greg.ewing at canterbury.ac.nz Wed Nov 24 00:33:02 2010 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Wed, 24 Nov 2010 12:33:02 +1300 Subject: [Python-Dev] constant/enum type in stdlib In-Reply-To: <1290526253.3642.9.camel@localhost.localdomain> References: <20101121034404.52924F20A@mail.python.org> <4CE9BF4A.1020302@netwok.org> <4CEA89E8.5090107@voidspace.org.uk> <20101122163722.7e96d123@pitrou.net> <4CEA9584.7040301@avl.com> <20101122172440.77d27ed5@pitrou.net> <20101122164654.2109.588145158.divmod.xquotient.165@localhost.localdomain> <4CEBC6BD.9060402@voidspace.org.uk> <4CEBCE92.40801@voidspace.org.uk> <20101123154229.474f7a90@pitrou.net> <1290524466.3642.4.camel@localhost.localdomain> <4CEBDA91.4050205@voidspace.org.uk> <1290526253.3642.9.camel@localhost.localdomain> Message-ID: <4CEC4F2E.6080601@canterbury.ac.nz> Antoine Pitrou wrote: > Constants = make_constants('Constants', 'SOME_CONST OTHER_CONST', > values=range(1, 3)) > > Again, auto-enumeration is useless since it's trivial to achieve > explicitly. But seeing as it's going to be a common thing to do, why not make it the default? When defining an enum, often you don't *care* what the underlying values are, so assigning sequential natural numbers is as good a default as any. In fact, with the Pascal concept of an enumerated type you don't get any choice in the matter. It's only in the C family that you get this bastardised conflation of enumerations with arbitrary named constants... 
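A sketch of what "sequential numbers by default" could look like. The `make_constants` name and call shape come from the quoted message; the body, including the `values=None` default, is an assumption made for illustration and does not give the constants named reprs.

```python
def make_constants(typename, names, values=None):
    """Build a namespace class of constants (hypothetical implementation).

    If values is omitted, the names are numbered 0, 1, 2, ... in order.
    """
    name_list = names.split()
    if values is None:
        values = range(len(name_list))
    value_list = list(values)
    if len(value_list) != len(name_list):
        raise ValueError("need exactly one value per name")
    return type(typename, (), dict(zip(name_list, value_list)))


# Default: the caller does not care what the underlying values are.
Constants = make_constants("Constants", "SOME_CONST OTHER_CONST")
print(Constants.SOME_CONST, Constants.OTHER_CONST)

# Explicit values remain available, as in the quoted call.
Explicit = make_constants("Explicit", "SOME_CONST OTHER_CONST",
                          values=range(1, 3))
print(Explicit.SOME_CONST, Explicit.OTHER_CONST)
```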
-- Greg From greg.ewing at canterbury.ac.nz Wed Nov 24 00:41:50 2010 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Wed, 24 Nov 2010 12:41:50 +1300 Subject: [Python-Dev] constant/enum type in stdlib In-Reply-To: References: <20101121034404.52924F20A@mail.python.org> <20101122163722.7e96d123@pitrou.net> <4CEA9584.7040301@avl.com> <20101122172440.77d27ed5@pitrou.net> <20101122164654.2109.588145158.divmod.xquotient.165@localhost.localdomain> <4CEBC6BD.9060402@voidspace.org.uk> <4CEBCE92.40801@voidspace.org.uk> <20101123154229.474f7a90@pitrou.net> <1290524466.3642.4.camel@localhost.localdomain> <4CEBDA91.4050205@voidspace.org.uk> <1290526253.3642.9.camel@localhost.localdomain> <4CEBE06C.9030101@voidspace.org.uk> <1290528319.3642.11.camel@localhost.localdomain> <1290533860.3642.73.camel@localhost.localdomain> <58396.1290540417@parc.com> Message-ID: <4CEC513E.4050603@canterbury.ac.nz> Isaac Morland wrote: > In any case my > suggestion of a new keyword was not meant to be taken seriously. I don't think it need be taken entirely as a joke, either. All the proposed patterns for creating enums that I've seen end up leaving something to be desired. They violate DRY by requiring you to write the class name twice, or they make you write the names of the values in quotes, or some other minor ugliness. While it may be possible to work around these things with sufficient levels of metaclass hackery and black magic, at some point one has to consider whether new syntax might be the least worst option. -- Greg From greg.ewing at canterbury.ac.nz Wed Nov 24 00:49:42 2010 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Wed, 24 Nov 2010 12:49:42 +1300 Subject: [Python-Dev] len(chr(i)) = 2? 
In-Reply-To: References: <201011192123.14169.victor.stinner@haypocalc.com> <4CE6F93F.9010109@egenix.com> <4CE6FE30.1050903@v.loewis.de> <87hbfc1vnf.fsf@uwakimon.sk.tsukuba.ac.jp> <4CE78F62.7060707@v.loewis.de> <8739qukf9r.fsf@uwakimon.sk.tsukuba.ac.jp> <20101121173825.B1BFB235977@kimball.webabinitio.net> <60F8726F-C1C2-4803-8B8E-688EF0443FA0@gmail.com> <87eiadd46t.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <4CEC5316.4010608@canterbury.ac.nz> Alexander Belopolsky wrote: > """ > Because the most commonly used characters are all in the Basic > Multilingual Plane, converting between surrogate pairs and the > original values is often not tested thoroughly. This leads to > persistent bugs, and potential security holes, even in popular and > well-reviewed application software. > """ Maybe Python should have used UTF-8 as its internal unicode representation. Then people who were foolish enough to assume one character per string item would have their programs break rather soon under only light unicode testing. :-) -- Greg From foom at fuhm.net Wed Nov 24 01:22:23 2010 From: foom at fuhm.net (James Y Knight) Date: Tue, 23 Nov 2010 19:22:23 -0500 Subject: [Python-Dev] len(chr(i)) = 2? In-Reply-To: <4CEC5316.4010608@canterbury.ac.nz> References: <201011192123.14169.victor.stinner@haypocalc.com> <4CE6F93F.9010109@egenix.com> <4CE6FE30.1050903@v.loewis.de> <87hbfc1vnf.fsf@uwakimon.sk.tsukuba.ac.jp> <4CE78F62.7060707@v.loewis.de> <8739qukf9r.fsf@uwakimon.sk.tsukuba.ac.jp> <20101121173825.B1BFB235977@kimball.webabinitio.net> <60F8726F-C1C2-4803-8B8E-688EF0443FA0@gmail.com> <87eiadd46t.fsf@uwakimon.sk.tsukuba.ac.jp> <4CEC5316.4010608@canterbury.ac.nz> Message-ID: <77AAC178-F868-4F05-8509-4A9FB66F61EC@fuhm.net> On Nov 23, 2010, at 6:49 PM, Greg Ewing wrote: > Maybe Python should have used UTF-8 as its internal unicode > representation. 
Then people who were foolish enough to assume > one character per string item would have their programs break > rather soon under only light unicode testing. :-) You put a smiley, but, in all seriousness, I think that's actually the right thing to do if anyone writes a new programming language. It is clearly the right thing if you don't have to be concerned with backwards-compatibility: nobody really needs to be able to access the Nth codepoint in a string in constant time, so there's not really any point in storing a vector of codepoints. Instead, provide bidirectional iterators which can traverse the string by byte, codepoint, or by grapheme (that is: the set of combining characters + base character that go together, making up one thing which a human would think of as a character). James From jcea at jcea.es Wed Nov 24 01:31:01 2010 From: jcea at jcea.es (Jesus Cea) Date: Wed, 24 Nov 2010 01:31:01 +0100 Subject: [Python-Dev] Sporadic problems with bugs.python.org In-Reply-To: <4CEC24FE.70107@jcea.es> References: <4CEC24FE.70107@jcea.es> Message-ID: <4CEC5CC5.5070305@jcea.es> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 23/11/10 21:33, Jesus Cea wrote: > Happen to me last Sunday, and happening just now. > > I can access http://bugs.python.org/ just fine, but trying to post a > message, open a new bug, change nosy, etc., takes a LONG time (minutes) > and it is finally failing with a "400 Bad Request" error: > > """ > Bad Request > > Your browser sent a request that this server could not understand. > Apache/2.2.9 (Debian) mod_python/3.3.1 Python/2.5.2 mod_ssl/2.2.9 > OpenSSL/0.9.8g mod_wsgi/2.5 Server at bugs.python.org Port 80 > """ > > Last sunday I was able to open the bug after a time. Today I have been > retrying for while, with no luck yet. Still retrying, with no luck. Anybody else can reproduce?. - -- Jesus Cea Avion _/_/ _/_/_/ _/_/_/ jcea at jcea.es - http://www.jcea.es/ _/_/ _/_/ _/_/ _/_/ _/_/ jabber / xmpp:jcea at jabber.org _/_/ _/_/ _/_/_/_/_/ . 
_/_/ _/_/ _/_/ _/_/ _/_/ "Things are not so easy" _/_/ _/_/ _/_/ _/_/ _/_/ _/_/ "My name is Dump, Core Dump" _/_/_/ _/_/_/ _/_/ _/_/ "El amor es poner tu felicidad en la felicidad de otro" - Leibniz -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iQCVAwUBTOxcxZlgi5GaxT1NAQJGEQQApyTPFFyPbzc45v5AfeLwT0YHvIcFyT5a lZVZIJ+TVeI1PY/bZpebO4YnjQ6JrHIIedXf8IUqBi9sD8UUDY5tST8TikZPwvvk pGvdCRwa2A6slGG5zgnA4u4+H2MiOiRhua0sTELNQJYAgzTNER+LDTWQ04p31kOD D++Hjb2mBs8= =TI1J -----END PGP SIGNATURE----- From fuzzyman at voidspace.org.uk Wed Nov 24 01:41:37 2010 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Wed, 24 Nov 2010 00:41:37 +0000 Subject: [Python-Dev] constant/enum type in stdlib In-Reply-To: <1290546920.3642.104.camel@localhost.localdomain> References: <20101121034404.52924F20A@mail.python.org> <4CE9BF4A.1020302@netwok.org> <4CEA89E8.5090107@voidspace.org.uk> <20101122163722.7e96d123@pitrou.net> <4CEA9584.7040301@avl.com> <20101122172440.77d27ed5@pitrou.net> <20101122164654.2109.588145158.divmod.xquotient.165@localhost.localdomain> <4CEBC6BD.9060402@voidspace.org.uk> <4CEBCE92.40801@voidspace.org.uk> <20101123154229.474f7a90@pitrou.net> <1290524466.3642.4.camel@localhost.localdomain> <935CA187-6799-437E-8F18-2A35886B5117@twistedmatrix.com> <1290546920.3642.104.camel@localhost.localdomain> Message-ID: <4CEC5F41.8060806@voidspace.org.uk> On 23/11/2010 21:15, Antoine Pitrou wrote: > Le mardi 23 novembre 2010 ? 16:10 -0500, Glyph Lefkowitz a ?crit : >> On Nov 23, 2010, at 10:01 AM, Antoine Pitrou wrote: >> >>> Well, it is easy to assign range(N) to a tuple of names when >>> desired. I >>> don't think an automatically-enumerating constant generator is >>> needed. >> I don't think that numerical enumerations are the only kind of >> constants we're talking about. Others have already mentioned strings. >> Also, see for some other use-cases. 
Since this >> isn't coming to 2.x, we're probably going to do our own thing anyway >> (unless it turns out that flufl.enum is so great that we want to add >> another dependency...) but I'm hoping that the outcome of this >> discussion will point to something we can be compatible with. > I think that asking for too many features would get in the way, and also > make the API quite un-Pythonic. If you want your values to be e.g. > OR'able, just choose your values wisely ;) > Well, the point of an OR'able flag is that the result shows the OR'd values in the repr. Raymond suggests using a set of strings where you need flag constants. For new apis (so no backwards compatibility constraints) where you don't need to use integers (i.e. not wrapping a C library) that's a great suggestion: flags = {'FOO', 'BAR'} Michael > Regards > > Antoine. > > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/fuzzyman%40voidspace.org.uk -- http://www.voidspace.org.uk/ READ CAREFULLY. By accepting and reading this email you agree, on behalf of your employer, to release me from all obligations and waivers arising from any and all NON-NEGOTIATED agreements, licenses, terms-of-service, shrinkwrap, clickwrap, browsewrap, confidentiality, non-disclosure, non-compete and acceptable use policies (?BOGUS AGREEMENTS?) that I have entered into with your employer, its partners, licensors, agents and assigns, in perpetuity, without prejudice to my ongoing rights and privileges. You further represent that you have the authority to release me from any BOGUS AGREEMENTS on behalf of your employer. From lukasz at langa.pl Wed Nov 24 01:50:23 2010 From: lukasz at langa.pl (=?utf-8?Q?=C5=81ukasz_Langa?=) Date: Wed, 24 Nov 2010 01:50:23 +0100 Subject: [Python-Dev] Centos 5.5 freeze during test_concurrent_futures Message-ID: Hi there! 
py3k built from trunk on Centos 5.5 freezes during regrtest on test_concurrent_futures with "Fatal Python error: Invalid thread state for this thread". As in a typical concurrent problem, subsequent calls freeze in different test cases, but the freeze itself is always reproducible and always during this test. A colorful example: http://bpaste.net/show/11493/ I created an issue for that here: http://bugs.python.org/issue10517 If necessary, I can provide Centos 5.5 shell access. I would also like to donate a Centos 5.5 buildbot. -- Best regards, Łukasz Langa tel. +48 791 080 144 WWW http://lukasz.langa.pl/ From jcea at jcea.es Wed Nov 24 02:32:05 2010 From: jcea at jcea.es (Jesus Cea) Date: Wed, 24 Nov 2010 02:32:05 +0100 Subject: [Python-Dev] Sporadic problems with bugs.python.org In-Reply-To: <4CEC5CC5.5070305@jcea.es> References: <4CEC24FE.70107@jcea.es> <4CEC5CC5.5070305@jcea.es> Message-ID: <4CEC6B15.6060606@jcea.es> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 24/11/10 01:31, Jesus Cea wrote: > Still retrying, with no luck. > > Anybody else can reproduce?. One of my tracker changes was just processed. The important one still retrying every 5 minutes... I hope I can go sleep before dawn :-P. - -- Jesus Cea Avion _/_/ _/_/_/ _/_/_/ jcea at jcea.es - http://www.jcea.es/ _/_/ _/_/ _/_/ _/_/ _/_/ jabber / xmpp:jcea at jabber.org _/_/ _/_/ _/_/_/_/_/ .
_/_/ _/_/ _/_/ _/_/ _/_/ "Things are not so easy" _/_/ _/_/ _/_/ _/_/ _/_/ _/_/ "My name is Dump, Core Dump" _/_/_/ _/_/_/ _/_/ _/_/ "El amor es poner tu felicidad en la felicidad de otro" - Leibniz -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iQCVAwUBTOxrFZlgi5GaxT1NAQLHUQP+IyN3X/vt5AQKpg/fTjSUpfX2f3wTzeOp 8+5Gnb2ktyZQEF0ELBo0wiWNReJcxicw3ZD9Zqy05cprJ8VL7QZSRHkom+BiXrKK P+Rllulp8Eu+wq59NKJb5DGk8tfDt6zywepUAHB449Dkcyq9p8gt8L5LAiABTfsy dFaQPP2w1Kg= =ERTw -----END PGP SIGNATURE----- From tjreedy at udel.edu Wed Nov 24 02:51:20 2010 From: tjreedy at udel.edu (Terry Reedy) Date: Tue, 23 Nov 2010 20:51:20 -0500 Subject: [Python-Dev] Sporadic problems with bugs.python.org In-Reply-To: <4CEC6B15.6060606@jcea.es> References: <4CEC24FE.70107@jcea.es> <4CEC5CC5.5070305@jcea.es> <4CEC6B15.6060606@jcea.es> Message-ID: On 11/23/2010 8:32 PM, Jesus Cea wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > On 24/11/10 01:31, Jesus Cea wrote: >> Still retrying, with no luck. >> >> Anybody else can reproduce?. > > One of my tracker changes was just processed. > > The important one still retrying every 5 minutes... > > I hope I can go sleep before dawn :-P. I added a comment to one issue and opened another with no problem during the last couple of hours. -- Terry Jan Reedy From glyph at twistedmatrix.com Wed Nov 24 02:52:13 2010 From: glyph at twistedmatrix.com (Glyph Lefkowitz) Date: Tue, 23 Nov 2010 20:52:13 -0500 Subject: [Python-Dev] len(chr(i)) = 2? 
In-Reply-To: <77AAC178-F868-4F05-8509-4A9FB66F61EC@fuhm.net> References: <201011192123.14169.victor.stinner@haypocalc.com> <4CE6F93F.9010109@egenix.com> <4CE6FE30.1050903@v.loewis.de> <87hbfc1vnf.fsf@uwakimon.sk.tsukuba.ac.jp> <4CE78F62.7060707@v.loewis.de> <8739qukf9r.fsf@uwakimon.sk.tsukuba.ac.jp> <20101121173825.B1BFB235977@kimball.webabinitio.net> <60F8726F-C1C2-4803-8B8E-688EF0443FA0@gmail.com> <87eiadd46t.fsf@uwakimon.sk.tsukuba.ac.jp> <4CEC5316.4010608@canterbury.ac.nz> <77AAC178-F868-4F05-8509-4A9FB66F61EC@fuhm.net> Message-ID: On Nov 23, 2010, at 7:22 PM, James Y Knight wrote: > On Nov 23, 2010, at 6:49 PM, Greg Ewing wrote: >> Maybe Python should have used UTF-8 as its internal unicode >> representation. Then people who were foolish enough to assume >> one character per string item would have their programs break >> rather soon under only light unicode testing. :-) > > You put a smiley, but, in all seriousness, I think that's actually the right thing to do if anyone writes a new programming language. It is clearly the right thing if you don't have to be concerned with backwards-compatibility: nobody really needs to be able to access the Nth codepoint in a string in constant time, so there's not really any point in storing a vector of codepoints. > > Instead, provide bidirectional iterators which can traverse the string by byte, codepoint, or by grapheme (that is: the set of combining characters + base character that go together, making up one thing which a human would think of as a character). I really hope that this idea is not just for new programming languages. If you switch from doing unicode "wrong" to doing unicode "right" in Python, you quadruple the memory footprint of programs which primarily store and manipulate large amounts of text. 
This is especially ridiculous in PyGTK applications, where the internal representation required by the GUI is UTF-8 anyway, so the round-tripping of string data back and forth to the exploded UTF-32 representation is wasting gobs of memory and time. It at least makes sense when your C library's idea about character width and your Python build match up. But, in a desktop app this is unlikely to be a performance concern; in servers, it's a big deal; measurably so. I am pretty sure that in the server apps that I work on, we are eventually going to need our own string type and UTF-8 logic that does exactly what James suggested - certainly if we ever hope to support Py3. (I dimly recall that both James and I have made this point before, but it's pretty important, so it bears repeating.) From glyph at twistedmatrix.com Wed Nov 24 02:56:57 2010 From: glyph at twistedmatrix.com (Glyph Lefkowitz) Date: Tue, 23 Nov 2010 20:56:57 -0500 Subject: [Python-Dev] OpenSSL Voluntarily (openssl-1.0.0a) In-Reply-To: <20101123150219.29e20374@pitrou.net> References: <4CEB3F72.7000006@m2.ccsnet.ne.jp> <20101123150219.29e20374@pitrou.net> Message-ID: <720EFE43-119F-4F2F-BCB1-939275B5FA6E@twistedmatrix.com> On Nov 23, 2010, at 9:02 AM, Antoine Pitrou wrote: > On Tue, 23 Nov 2010 00:07:09 -0500 > Glyph Lefkowitz wrote: >> On Mon, Nov 22, 2010 at 11:13 PM, Hirokazu Yamamoto < >> ocean-city at m2.ccsnet.ne.jp> wrote: >> >>> Hello. Does this affect python? Thank you. >>> >>> http://www.openssl.org/news/secadv_20101116.txt >>> >> >> No. > > Well, actually it does, but Python links against the system OpenSSL on > most platforms (except Windows), so it's up to the OS vendor to apply > the patch. It does? If so, I must have misunderstood the vulnerability. Can you explain how it affects Python? From stephen at xemacs.org Wed Nov 24 03:29:47 2010 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Wed, 24 Nov 2010 11:29:47 +0900 Subject: [Python-Dev] len(chr(i)) = 2?
In-Reply-To: References: <201011192123.14169.victor.stinner@haypocalc.com> <4CE6F93F.9010109@egenix.com> <4CE6FE30.1050903@v.loewis.de> <87hbfc1vnf.fsf@uwakimon.sk.tsukuba.ac.jp> <4CE78F62.7060707@v.loewis.de> <8739qukf9r.fsf@uwakimon.sk.tsukuba.ac.jp> <20101121173825.B1BFB235977@kimball.webabinitio.net> <60F8726F-C1C2-4803-8B8E-688EF0443FA0@gmail.com> <87eiadd46t.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <87tyj7bgis.fsf@uwakimon.sk.tsukuba.ac.jp> Alexander Belopolsky writes: > Yet finding a bug in a str object method after a 5 min review was a > bit discouraging: > > >>> 'xyz'.center(20, '\U00010140') > Traceback (most recent call last): > File " ", line 1, in > TypeError: The fill character must be exactly one character long > > Given the apparent difficulty of writing even basic text processing > algorithms in presence of surrogate pairs, I wonder how wise it is to > expose Python users to them. "Consenting adults" applies here. What to do? Write tests, fix the stdlib. Raise the probability of surrogate pair tests in the fuzzer. But "expose the users to surrogate pairs in an efficient (ie, UCS-2) implementation" is a fundamental design principle of Python. Tightening up the internal implementation is -10 unacceptable IMO YMMV. > Again, given that the str object itself has at least one non-BMP > character bug as we are closing on the third major release of py3k, > how likely are 3rd party developers to get their libraries right as > they port to 3.x? Not our problem, really. We need to fix the stdlib, but 3rd party libraries know what they're doing. I guess we could provide a fuzztest module that generates known nasty data (zero, very big numbers, "\0x00", "\U00010140", etc) that people would be able to plug in as a data source for their own code. Of course that doesn't replace conventional unittests based on analysis of edge cases and tests designed to tickle them, but it would be a start for many projects. 
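A small sketch of the kind of data source suggested above. No `fuzztest` module exists in the standard library; the names `NASTY_INTS`, `NASTY_STRINGS` and `nasty_strings` are hypothetical, and the sample values simply follow the examples given in the message (zero, very big numbers, a NUL character, a non-BMP character), plus a combining mark.

```python
import random

# Hypothetical collections of values that tend to expose edge cases.
NASTY_INTS = [0, -1, 2**31, 2**63, 2**64 + 1]
NASTY_STRINGS = ["\x00", "\U00010140", "e\u0301", " ", "a"]


def nasty_strings(count, seed=0, max_parts=4):
    """Yield `count` strings built from NASTY_STRINGS (hypothetical helper).

    A fixed seed keeps a failing input reproducible between runs.
    """
    rng = random.Random(seed)
    for _ in range(count):
        parts = rng.randint(0, max_parts)
        yield "".join(rng.choice(NASTY_STRINGS) for _ in range(parts))


# Usage: feed the generated strings to the code under test. Here the
# check is only that str.center() pads to at least the requested width.
for text in nasty_strings(20):
    padded = text.center(10)
    assert len(padded) >= 10
    assert text in padded
```

As the message says, generated data of this kind supplements tests written from an analysis of edge cases; it does not replace them.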
From raymond.hettinger at gmail.com Wed Nov 24 03:35:35 2010 From: raymond.hettinger at gmail.com (Raymond Hettinger) Date: Tue, 23 Nov 2010 18:35:35 -0800 Subject: [Python-Dev] constant/enum type in stdlib In-Reply-To: <4CEC513E.4050603@canterbury.ac.nz> References: <20101121034404.52924F20A@mail.python.org> <20101122163722.7e96d123@pitrou.net> <4CEA9584.7040301@avl.com> <20101122172440.77d27ed5@pitrou.net> <20101122164654.2109.588145158.divmod.xquotient.165@localhost.localdomain> <4CEBC6BD.9060402@voidspace.org.uk> <4CEBCE92.40801@voidspace.org.uk> <20101123154229.474f7a90@pitrou.net> <1290524466.3642.4.camel@localhost.localdomain> <4CEBDA91.4050205@voidspace.org.uk> <1290526253.3642.9.camel@localhost.localdomain> <4CEBE06C.9030101@voidspace.org.uk> <1290528319.3642.11.camel@localhost.localdomain> <1290533860.3642.73.camel@localhost.localdomain> <58396.1290540417@parc.com> <4CEC513E.4050603@canterbury.ac.nz> Message-ID: <6A9ADF09-971A-4CD7-B583-3BF264E47CF2@gmail.com> On Nov 23, 2010, at 3:41 PM, Greg Ewing wrote: > While it may be possible to work around these things with > sufficient levels of metaclass hackery and black magic, at > some point one has to consider whether new syntax might > be the least worst option. The least worst option is to do nothing at all. That's better than creating a new little monster with its own nuances and limitations. We've gotten by well for almost two decades without this particular static language feature creeping into Python. For the most part, strings work well enough (see decimal.ROUND_UP for example). They are self-documenting and work well with the rest of the language. When a cluster of names cries out for its own namespace, the usual technique is to put the names in class (see the examples in the namedtuple docs for a way to make this a one-liner) or in a module (see opcode.py for example). 
For xor'able and or'able flags, sets of strings work well: flags = {'runnable', 'callable'} flags |= {'runnable', 'kissable'} if 'callable' in flags: . . . We have a hard enough time getting people to not program Java in Python. IMO, adding a new enumeration type would make this situation worse. Also, it adds weight to the language -- Python is not in needs of yet another fundamental construct. Raymond P.S. I do recognize that lots of people have written their own versions of Enum(), but I think they do it either out of habits formed from statically compiled languages that lack all of our namespace mechanisms or they do it because it is easy and fun to write (just like people seem to enjoy writing flatten() recipes more than they like actually using them). One other thought: With Py3.x, the language had its one chance to get smaller. Old-style classes were tossed, some built-ins vanished, and a few obsolete modules got nuked. It would be easy to have a "let's add thingie x" fest and lose those benefits. There are many devs who find that the language does not fit-in-their-heads anymore, so considerable restraint needs to be exercised before adding a new language feature that would soon permeate everyone's code base and add yet another thing that infrequent users have to learn before being able to read code. From stephen at xemacs.org Wed Nov 24 03:44:40 2010 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Wed, 24 Nov 2010 11:44:40 +0900 Subject: [Python-Dev] len(chr(i)) = 2? 
In-Reply-To: <77AAC178-F868-4F05-8509-4A9FB66F61EC@fuhm.net> References: <201011192123.14169.victor.stinner@haypocalc.com> <4CE6F93F.9010109@egenix.com> <4CE6FE30.1050903@v.loewis.de> <87hbfc1vnf.fsf@uwakimon.sk.tsukuba.ac.jp> <4CE78F62.7060707@v.loewis.de> <8739qukf9r.fsf@uwakimon.sk.tsukuba.ac.jp> <20101121173825.B1BFB235977@kimball.webabinitio.net> <60F8726F-C1C2-4803-8B8E-688EF0443FA0@gmail.com> <87eiadd46t.fsf@uwakimon.sk.tsukuba.ac.jp> <4CEC5316.4010608@canterbury.ac.nz> <77AAC178-F868-4F05-8509-4A9FB66F61EC@fuhm.net> Message-ID: <87sjyrbftz.fsf@uwakimon.sk.tsukuba.ac.jp> James Y Knight writes: > You put a smiley, but, in all seriousness, I think that's actually > the right thing to do if anyone writes a new programming > language. It is clearly the right thing if you don't have to be > concerned with backwards-compatibility: nobody really needs to be > able to access the Nth codepoint in a string in constant time, so > there's not really any point in storing a vector of codepoints. A sad commentary on the state of Emacs usage, "nobody". The theory is that accessing the first character of a region in a string often occurs as a primitive operation in O(N) or worse algorithms, sometimes without enough locality at the "collection of regions" level to give a reasonably small average access time. In practice, any *Emacs user can tell you that yes, we do need to be able to access the Nth codepoint in a buffer in constant time. The O(N) behavior of current Emacs implementations means that people often use a binary coding system on large files. Yes, some position caching is done, but if you have a large file (eg, a mail file) which is virtually segmented using pointers to regions, locality gets lost. (This is not a design bug, this is a fundamental requirement: consider fast switching between threaded view and author-sorted view.) And of course an operation that sorts regions in a buffer using character pointers will have the same problem. 
Working with memory pointers, OTOH, sucks more than that; GNU Emacs recently bit the bullet and got rid of their higher-level memory-oriented APIs, all of the Lisp structures now work with pointers, and only the very low-level structures know about character-to-memory pointer translation. This performance issue is perceptible even on 3GHz machines with not so large (50MB) mbox files. It's *horrid* if you do something like "occur" on a 1GB log file, then try randomly jumping to detected log entries. From fdrake at acm.org Wed Nov 24 03:58:47 2010 From: fdrake at acm.org (Fred Drake) Date: Tue, 23 Nov 2010 21:58:47 -0500 Subject: [Python-Dev] constant/enum type in stdlib In-Reply-To: <6A9ADF09-971A-4CD7-B583-3BF264E47CF2@gmail.com> References: <20101121034404.52924F20A@mail.python.org> <20101122163722.7e96d123@pitrou.net> <4CEA9584.7040301@avl.com> <20101122172440.77d27ed5@pitrou.net> <20101122164654.2109.588145158.divmod.xquotient.165@localhost.localdomain> <4CEBC6BD.9060402@voidspace.org.uk> <4CEBCE92.40801@voidspace.org.uk> <20101123154229.474f7a90@pitrou.net> <1290524466.3642.4.camel@localhost.localdomain> <4CEBDA91.4050205@voidspace.org.uk> <1290526253.3642.9.camel@localhost.localdomain> <4CEBE06C.9030101@voidspace.org.uk> <1290528319.3642.11.camel@localhost.localdomain> <1290533860.3642.73.camel@localhost.localdomain> <58396.1290540417@parc.com> <4CEC513E.4050603@canterbury.ac.nz> <6A9ADF09-971A-4CD7-B583-3BF264E47CF2@gmail.com> Message-ID: On Tue, Nov 23, 2010 at 9:35 PM, Raymond Hettinger wrote: > The least worst option is to do nothing at all. For the standard library, I agree. There are enough variants that are needed/desired in different contexts, and there isn't a single clear winner. Nor is there any compelling reason to have a winner. I'm generally in favor of enums (or whatever you want to call them), and I'm in favor of importing support for the flavor you need, or just defining constants in whatever way makes sense for your library or application. 
I don't see any problems that aren't solved by that. -Fred -- Fred L. Drake, Jr. "A storm broke loose in my mind." --Albert Einstein From jcea at jcea.es Wed Nov 24 04:03:36 2010 From: jcea at jcea.es (Jesus Cea) Date: Wed, 24 Nov 2010 04:03:36 +0100 Subject: [Python-Dev] Sporadic problems with bugs.python.org In-Reply-To: References: <4CEC24FE.70107@jcea.es> <4CEC5CC5.5070305@jcea.es> <4CEC6B15.6060606@jcea.es> Message-ID: <4CEC8088.7010709@jcea.es> On 24/11/10 02:51, Terry Reedy wrote: >> I hope I can go sleep before dawn :-P. > > I added a comment to one issue and opened another with no problem during > the last couple of hours. My changes have worked now. After like 8 hours and a retry every five minutes. -- Jesus Cea Avion jcea at jcea.es - http://www.jcea.es/ jabber / xmpp:jcea at jabber.org "Things are not so easy" "My name is Dump, Core Dump" "Love is placing your happiness in the happiness of another" - Leibniz From glyph at twistedmatrix.com Wed Nov 24 04:27:38 2010 From: glyph at twistedmatrix.com (Glyph Lefkowitz) Date: Tue, 23 Nov 2010 22:27:38 -0500 Subject: [Python-Dev] len(chr(i)) = 2?
In-Reply-To: <87sjyrbftz.fsf@uwakimon.sk.tsukuba.ac.jp> References: <201011192123.14169.victor.stinner@haypocalc.com> <4CE6F93F.9010109@egenix.com> <4CE6FE30.1050903@v.loewis.de> <87hbfc1vnf.fsf@uwakimon.sk.tsukuba.ac.jp> <4CE78F62.7060707@v.loewis.de> <8739qukf9r.fsf@uwakimon.sk.tsukuba.ac.jp> <20101121173825.B1BFB235977@kimball.webabinitio.net> <60F8726F-C1C2-4803-8B8E-688EF0443FA0@gmail.com> <87eiadd46t.fsf@uwakimon.sk.tsukuba.ac.jp> <4CEC5316.4010608@canterbury.ac.nz> <77AAC178-F868-4F05-8509-4A9FB66F61EC@fuhm.net> <87sjyrbftz.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <635C265A-90A8-4B92-A65C-59EF3E8EFD68@twistedmatrix.com> On Nov 23, 2010, at 9:44 PM, Stephen J. Turnbull wrote: > James Y Knight writes: > >> You put a smiley, but, in all seriousness, I think that's actually >> the right thing to do if anyone writes a new programming >> language. It is clearly the right thing if you don't have to be >> concerned with backwards-compatibility: nobody really needs to be >> able to access the Nth codepoint in a string in constant time, so >> there's not really any point in storing a vector of codepoints. > > A sad commentary on the state of Emacs usage, "nobody". > > The theory is that accessing the first character of a region in a > string often occurs as a primitive operation in O(N) or worse > algorithms, sometimes without enough locality at the "collection of > regions" level to give a reasonably small average access time. I'm not sure what you mean by "the theory is". Whose theory? About what? > In practice, any *Emacs user can tell you that yes, we do need to be > able to access the Nth codepoint in a buffer in constant time. The > O(N) behavior of current Emacs implementations means that people often > use a binary coding system on large files. Yes, some position caching > is done, but if you have a large file (eg, a mail file) which is > virtually segmented using pointers to regions, locality gets lost. 
> (This is not a design bug, this is a fundamental requirement: consider > fast switching between threaded view and author-sorted view.) Sounds like a design bug to me. Personally, I'd implement "fast switching between threaded view and author-sorted view" the same way I'd address any other multiple-views-on-the-same-data problem. I'd retain data structures for both, and update them as the underlying model changed. These representations may need to maintain cursors into the underlying character data, if they must retain giant wads of character data as an underlying representation (arguably the _main_ design bug in Emacs, that it encourages you to do that for everything, rather than imposing a sensible structure), but those cursors don't need to be code-point counters; they could be byte offsets, or opaque handles whose precise meaning varied with the potentially variable underlying storage. Also, please remember that Emacs couldn't be implemented with giant Python strings anyway: crucially, all of this stuff is _mutable_ in Emacs. > And of course an operation that sorts regions in a buffer using > character pointers will have the same problem. Working with memory > pointers, OTOH, sucks more than that; GNU Emacs recently bit the > bullet and got rid of their higher-level memory-oriented APIs, all of > the Lisp structures now work with pointers, and only the very > low-level structures know about character-to-memory pointer > translation. > > This performance issue is perceptible even on 3GHz machines with not > so large (50MB) mbox files. It's *horrid* if you do something like > "occur" on a 1GB log file, then try randomly jumping to detected log > entries. Case in point: "occur" needs to scan the buffer anyway; you can't do better than linear time there. So you're going to iterate through the buffer, using one of the techniques that James proposed, and remember some locations. Why not just have those locations be opaque cursors into your data? 
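A minimal sketch of what such opaque cursors could look like, assuming the data is held as one UTF-8 bytes object and treating a cursor as nothing more than a byte offset. The function names occur and line_at are hypothetical and used only for illustration; this is not how Emacs implements "occur".

```python
def occur(buf: bytes, needle: str) -> list:
    """Scan a UTF-8 buffer once (linear time) and return, for every line
    containing needle, the byte offset at which that line starts."""
    pattern = needle.encode("utf-8")
    cursors = []
    offset = 0
    for line in buf.splitlines(keepends=True):
        if pattern in line:
            cursors.append(offset)
        offset += len(line)
    return cursors


def line_at(buf: bytes, cursor: int) -> str:
    """Jump straight to a remembered cursor and decode just that line.
    No codepoint counting from the start of the buffer is needed."""
    end = buf.find(b"\n", cursor)
    if end == -1:
        end = len(buf)
    return buf[cursor:end].decode("utf-8")


log = "héllo wörld\nerror: ディスク full\nok\nerror: 𝄞 clef\n".encode("utf-8")
hits = occur(log, "error")
print([line_at(log, c) for c in hits])
```

The cursors are only meaningful for the buffer they were taken from, which is what makes them opaque rather than general-purpose integers.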
In summary: you're right, in that James missed a spot. You need bidirectional, *copyable* iterators that can traverse the string by byte, codepoint, grapheme, or decomposed glyph.

From v+python at g.nevcal.com Wed Nov 24 05:28:19 2010 From: v+python at g.nevcal.com (Glenn Linderman) Date: Tue, 23 Nov 2010 20:28:19 -0800 Subject: [Python-Dev] http.server - reference to bug #427345 Message-ID: <4CEC9463.8030302@g.nevcal.com>

Where might I find the bug #427345 that is referred to in a comment inside http.server? Here is a code excerpt:

    # throw away additional data [see bug #427345]
    while select.select([self.rfile._sock], [], [], 0)[0]:
        if not self.rfile._sock.recv(1):
            break

From brian.curtin at gmail.com Wed Nov 24 05:35:10 2010 From: brian.curtin at gmail.com (Brian Curtin) Date: Tue, 23 Nov 2010 22:35:10 -0600 Subject: [Python-Dev] http.server - reference to bug #427345 In-Reply-To: <4CEC9463.8030302@g.nevcal.com> References: <4CEC9463.8030302@g.nevcal.com> Message-ID:

On Tue, Nov 23, 2010 at 22:28, Glenn Linderman wrote:
> Where might I find the bug #427345 that is referred to in a comment inside
> http.server? Here is a code excerpt:
>
>     # throw away additional data [see bug #427345]
>     while select.select([self.rfile._sock], [], [], 0)[0]:
>         if not self.rfile._sock.recv(1):
>             break

http://bugs.python.org/issue427345

http://bugs.python.org/ has a box on the left-hand side where you can enter issue numbers.

From stephen at xemacs.org Wed Nov 24 06:07:52 2010 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Wed, 24 Nov 2010 14:07:52 +0900 Subject: [Python-Dev] len(chr(i)) = 2?
In-Reply-To: <635C265A-90A8-4B92-A65C-59EF3E8EFD68@twistedmatrix.com> References: <201011192123.14169.victor.stinner@haypocalc.com> <4CE6F93F.9010109@egenix.com> <4CE6FE30.1050903@v.loewis.de> <87hbfc1vnf.fsf@uwakimon.sk.tsukuba.ac.jp> <4CE78F62.7060707@v.loewis.de> <8739qukf9r.fsf@uwakimon.sk.tsukuba.ac.jp> <20101121173825.B1BFB235977@kimball.webabinitio.net> <60F8726F-C1C2-4803-8B8E-688EF0443FA0@gmail.com> <87eiadd46t.fsf@uwakimon.sk.tsukuba.ac.jp> <4CEC5316.4010608@canterbury.ac.nz> <77AAC178-F868-4F05-8509-4A9FB66F61EC@fuhm.net> <87sjyrbftz.fsf@uwakimon.sk.tsukuba.ac.jp> <635C265A-90A8-4B92-A65C-59EF3E8EFD68@twistedmatrix.com> Message-ID: <87oc9fb97b.fsf@uwakimon.sk.tsukuba.ac.jp> Note that I'm not saying that there shouldn't be a UTF-8 string type; I'm just saying that for some purposes it might be a good idea to keep UTF-16 and UTF-32 string types around. Glyph Lefkowitz writes: > > The theory is that accessing the first character of a region in a > > string often occurs as a primitive operation in O(N) or worse > > algorithms, sometimes without enough locality at the "collection of > > regions" level to give a reasonably small average access time. > > I'm not sure what you mean by "the theory is". Whose theory? About what? Mine. About why somebody somewhere someday would need fast random access to character positions. "Nobody ever needs that" is a strong claim. > > In practice, any *Emacs user can tell you that yes, we do need to be > > able to access the Nth codepoint in a buffer in constant time. The > > O(N) behavior of current Emacs implementations means that people often > > use a binary coding system on large files. Yes, some position caching > > is done, but if you have a large file (eg, a mail file) which is > > virtually segmented using pointers to regions, locality gets lost. > > (This is not a design bug, this is a fundamental requirement: consider > > fast switching between threaded view and author-sorted view.) 
> > Sounds like a design bug to me. Personally, I'd implement "fast > switching between threaded view and author-sorted view" the same > way I'd address any other multiple-views-on-the-same-data problem. > I'd retain data structures for both, and update them as the > underlying model changed. Um, that's precisely the design I'm talking about. But as you recognize later, the message content is not part of those structures because there's no real point in copying it *if you have fast access to character positions*. In a variable width character, character- addressed design, there can be a perceptible delay in accessing even the "next" message's content if you're in the wrong view. > These representations may need to maintain cursors into the > underlying character data, if they must retain giant wads of > character data as an underlying representation (arguably the _main_ > design bug in Emacs, that it encourages you to do that for > everything, rather than imposing a sensible structure), but those > cursors don't need to be code-point counters; they could be byte > offsets, or opaque handles whose precise meaning varied with the > potentially variable underlying storage. Both byte offsets and opaque handles really really suck to design, implement, and maintain, if Lisp or Python level users can use them. They're hard enough to do when you can hide them behind internal APIs, but if they're accessible to users they're an endless source of user bugs. What was that you were saying about the difficulty of remembering which argument is the fd? It's like that. Sure, you can design APIs to help get that right, but it's not easy to provide one that can be used for all the different applications out there. > Also, please remember that Emacs couldn't be implemented with giant > Python strings anyway: crucially, all of this stuff is _mutable_ in > Emacs. No, that's a red herring. 
The use-cases where Emacs users complain most is browsing giant logs and reading old mail; neither needs the content to be mutable (although of course it's a convenience in the mail case if you delete messages or fetch new mail, but that could be done with transaction logs that get appended to the on-disk file). > Case in point: "occur" needs to scan the buffer anyway; you can't > do better than linear time there. So you're going to iterate > through the buffer, using one of the techniques that James > proposed, and remember some locations. Why not just have those > locations be opaque cursors into your data? They are. But unless you're willing to implement correct character motion, they need to be character indicies, which will be slow to access the actual locations. We've implemented caches, as does Emacs, but they don't always get hits. Finding an arbitrary position once can involve perceptible delay on up to 1GHz machines; doing it in a loop (which mail programs have a habit of doing) could be very painful. > In summary: you're right, in that James missed a spot. You need > bidirectional, *copyable* iterators that can traverse the string by > byte, codepoint, grapheme, or decomposed glyph. That's a good start, yes. But once you talk about "remembering some locations", you're implicitly talking about random access. Either you maintain position indexes which naively implemented can easily be close to the size of the text buffer (indexes are going to be at least 4 bytes, possibly 8, per position, and something like "occur" can generate a lot of positions) -- in which case you might as well just use a representation that is an array in the first place -- or you need to implement a position cache which can be very hairy to do well. 
Or you can give user programs memory indicies, and enjoy the fun as the poor developers do things like "pos += 1" which works fine on the ASCII data they have lying around, then wonder why they get Unicode errors when they take substrings. I'm sure it all can be done, but I don't think it will be done right the first time around. You may be right that designs better adapted to large data sets than Emacs's "everything is a big buffer" will almost always be available with reasonable effort. But remember, a lot of good applications start small, when a flat array might make lots of sense as the underlying structure, and then need to scale. If you need to scale for the paying customers, well, "ouch!" but you can afford it, but for many volunteer or startup projects that takes the wind right out of your sails. Note that if the user doesn't use private space, in a UCS-2 build you have about 1.5K code points available for compressing non-BMP characters into a 2-byte, valid Unicode representation (of course you need to save off the table somewhere if that ever gets out of your program, but that's easy). I find it hard to imagine that there will be many use-cases that need more than that many non-BMP characters. So probably you can tell those few users who care to use a UCS-4 build; most of the array use-cases can be served by UCS-2. Note that in my Japanese corpuses, UTF-8 averages just about 2 bytes per character anyway, and those are mail files, where two lines of Japanese may be preceded by 2KB of ASCII-only header. I suspect Hebrew, Arabic, and Cyrillic users will have similar experiences. By the way, to send the ball back into your court, I have this feeling that the demand for UTF-8 is once again driven by native English speakers who are very shortly going to find themselves, and the data they are most familiar with, very much in the minority. 
Of course the market that benefits from UTF-8 compression will remain very large for the immediate future, but in the grand scheme of things, most of the world is going to prefer UTF-16 by a substantial margin. N.B. I'm not talking about persistent storage, where it's 6 of one and half a dozen of the other; you can translate UTF-8 to UTF-16 way faster than you can read content from disk, of course. From foom at fuhm.net Wed Nov 24 07:26:11 2010 From: foom at fuhm.net (James Y Knight) Date: Wed, 24 Nov 2010 01:26:11 -0500 Subject: [Python-Dev] len(chr(i)) = 2? In-Reply-To: <87oc9fb97b.fsf@uwakimon.sk.tsukuba.ac.jp> References: <201011192123.14169.victor.stinner@haypocalc.com> <4CE6F93F.9010109@egenix.com> <4CE6FE30.1050903@v.loewis.de> <87hbfc1vnf.fsf@uwakimon.sk.tsukuba.ac.jp> <4CE78F62.7060707@v.loewis.de> <8739qukf9r.fsf@uwakimon.sk.tsukuba.ac.jp> <20101121173825.B1BFB235977@kimball.webabinitio.net> <60F8726F-C1C2-4803-8B8E-688EF0443FA0@gmail.com> <87eiadd46t.fsf@uwakimon.sk.tsukuba.ac.jp> <4CEC5316.4010608@canterbury.ac.nz> <77AAC178-F868-4F05-8509-4A9FB66F61EC@fuhm.net> <87sjyrbftz.fsf@uwakimon.sk.tsukuba.ac.jp> <635C265A-90A8-4B92-A65C-59EF3E8EFD68@twistedmatrix.com> <87oc9fb97b.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <3C1ADB64-63F3-4165-926D-EDE9846E0DBD@fuhm.net> On Nov 24, 2010, at 12:07 AM, Stephen J. Turnbull wrote: > Or you can give user programs memory indicies, and enjoy the fun as > the poor developers do things like "pos += 1" which works fine on > the ASCII data they have lying around, then wonder why they get > Unicode errors when they take substrings. a) You seem to be hung up implementation details of emacs. But yes, positions should be stored as an byte offset into the utf8 string. NOT as number of codepoints since the beginning of the string. Probably you want it to be somewhat opaque, so that you actually have to specify whether you wanted to go to +1 byte, codepoint, or grapheme. 
b) Those poor developers are *already* screwed if they're using pos += 1 when pos is a codepoint index and they then take a substring based on that! They will get half a character when the string contains combining characters... Pretending that "codepoints" are a useful abstraction just makes poor developers get by without doing the correct thing (incrementing to the next grapheme boundary) for a little bit longer. But once you [the language implementor] are providing correct abstractions for grapheme movement, it's just as easy to also provide an abstraction for codepoint movement, and make your low-level implementation of the iterator object be a byte-offset into a UTF8 buffer. James From foom at fuhm.net Wed Nov 24 07:27:52 2010 From: foom at fuhm.net (James Y Knight) Date: Wed, 24 Nov 2010 01:27:52 -0500 Subject: [Python-Dev] len(chr(i)) = 2? In-Reply-To: <87oc9fb97b.fsf@uwakimon.sk.tsukuba.ac.jp> References: <201011192123.14169.victor.stinner@haypocalc.com> <4CE6F93F.9010109@egenix.com> <4CE6FE30.1050903@v.loewis.de> <87hbfc1vnf.fsf@uwakimon.sk.tsukuba.ac.jp> <4CE78F62.7060707@v.loewis.de> <8739qukf9r.fsf@uwakimon.sk.tsukuba.ac.jp> <20101121173825.B1BFB235977@kimball.webabinitio.net> <60F8726F-C1C2-4803-8B8E-688EF0443FA0@gmail.com> <87eiadd46t.fsf@uwakimon.sk.tsukuba.ac.jp> <4CEC5316.4010608@canterbury.ac.nz> <77AAC178-F868-4F05-8509-4A9FB66F61EC@fuhm.net> <87sjyrbftz.fsf@uwakimon.sk.tsukuba.ac.jp> <635C265A-90A8-4B92-A65C-59EF3E8EFD68@twistedmatrix.com> <87oc9fb97b.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On Nov 24, 2010, at 12:07 AM, Stephen J. Turnbull wrote: > By the way, to send the ball back into your court, I have this feeling > that the demand for UTF-8 is once again driven by native English > speakers who are very shortly going to find themselves, and the data > they are most familiar with, very much in the minority. 
Of course the > market that benefits from UTF-8 compression will remain very large for > the immediate future, but in the grand scheme of things, most of the > world is going to prefer UTF-16 by a substantial margin. No, the demand for UTF-8 is because that's what much of the internet (and not coincidentally, unix) world has standardized on. The main pieces of software using UTF-16 (Windows, Java) started doing so before it became apparent that 16 bits wasn't enough to actually hold a unicode codepoint, so they were actually implementing UCS-2. In those days, UCS-2 was a fairly sensible choice. But, now, if your choices are UTF-8 or UTF-16, UTF-8 is clearly superior. Not because it's smaller -- it's pretty much a tossup -- but because it is an ASCII superset, and thus more easily compatible with other software. That also makes it most commonly used for internet communication. (So, there's a huge advantage for using it internally as well right there: no transcoding necessary for writing your HTML output). UTF-16 is incompatible with ASCII, and furthermore, it's still a variable-width encoding, with all the same issues that causes. As such, there's really very little to be said in favor of it. If you really want a fixed-width encoding, you have to go to UTF-32, which is excessively large. UTF-32 is a losing choice, simply because of the wasted memory usage. But that's all a side issue: even if you do choose UTF-16 as your underlying encoding, you *still* need to provide iterators that work by "byte" (only now bytes are 16-bits), by codepoint, and by grapheme. Of course, people who implement UTF-16 (such as python, java, and windows) often pretend they're still implementing UCS-2, and don't bother even providing their users with the necessary APIs to do things correctly. Which, you can often get away with...just so long as you don't mind that you sometimes end up splitting a string in the middle of a codepoint and causing a unicode error! 
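As a small illustration of that last failure mode, here is a sketch that cuts a UTF-16 buffer by code units rather than by codepoints. The helper name first_units is hypothetical; the codec calls are the standard str.encode and bytes.decode.

```python
def first_units(text: str, n_units: int) -> bytes:
    """Return the first n_units 16-bit code units of text, encoded as
    UTF-16 (little-endian, no BOM), ignoring codepoint boundaries."""
    return text.encode("utf-16-le")[: 2 * n_units]


s = "a\U0001D11Eb"  # 'a', MUSICAL SYMBOL G CLEF (outside the BMP), 'b'

# Four code units: the clef needs a surrogate pair.
assert len(s.encode("utf-16-le")) // 2 == 4

# Cutting after two code units keeps 'a' plus only the high surrogate.
broken = first_units(s, 2)
try:
    broken.decode("utf-16-le")
except UnicodeDecodeError as exc:
    print("split inside a codepoint:", exc.reason)

# Cutting after three code units lands on a codepoint boundary.
assert first_units(s, 3).decode("utf-16-le") == "a\U0001D11E"
```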
James From v+python at g.nevcal.com Wed Nov 24 08:43:18 2010 From: v+python at g.nevcal.com (Glenn Linderman) Date: Tue, 23 Nov 2010 23:43:18 -0800 Subject: [Python-Dev] Web servers, bytes, str, documentation, Python 3.2a4 In-Reply-To: <20101122043957.2A5D6235C7A@kimball.webabinitio.net> References: <4CE7452A.7050109@g.nevcal.com> <4CE7B34D.4020309@netwok.org> <4CE8111F.9060502@g.nevcal.com> <4CE8CFCD.4040906@g.nevcal.com> <20101121171821.195552194AC@kimball.webabinitio.net> <4CE9EABA.1090306@g.nevcal.com> <20101122043957.2A5D6235C7A@kimball.webabinitio.net> Message-ID: <4CECC216.8090802@g.nevcal.com> On 11/21/2010 8:39 PM, R. David Murray wrote: > On Sun, 21 Nov 2010 19:59:54 -0800, Glenn Linderman wrote: >> On 11/21/2010 9:18 AM, R. David Murray wrote: >>> I want to look at the CGI issue, but I'm not sure when I'll get to it. >> Actually, since this code was working before 3.x, and if email.parser >> can now accept binary streams, it seems like maybe the only thing that >> might be wrong is that presently it is getting a text stream instead, so >> that is something cgi.py or the application program would have to >> switch, and then maybe some testing would discover correctness, or maybe >> a specification of UTF-8 as the encoding to use for the text parts would >> have to be done. > Well, given the bytes/string split in Python3, code definitely has to > be changed to make this work, since you have to explicitly call bytes > processing routines (message_from_bytes, message_from_binary_file, > BytesFeedparser, etc) to parse binary data, and likewise use > BytesGenerator to emit binary data. Looks like cgi.py also calls http.client and both of them would need to be changed to deal with bytes. I don't have the full translation of API calls in my head, nor have I ever used the email.parser API to know what the calls actually do... just read a bit about it... but that is different than using it... 
However, I find code in http.client.parse_headers that is attempting to work around reading a binary stream and feeding email.parser a string. So definitely some work to be done to fix things. I did add some explicit threads to http.server CGI script code that I think work around the deadlocks that can result from attempting to serialize 3 pipes, and yet not require full buffering of stdin or stdout. At the moment, I still am doing full buffering of stderr, but that is thought to be small potatoes in an http.server environment, generally. But since my test case is CGI form data, I'm stuck until this is fixed, or I wrap my head around the code in http.client and email.parser. But not tonight (yawn!). From solipsis at pitrou.net Wed Nov 24 09:02:13 2010 From: solipsis at pitrou.net (Antoine Pitrou) Date: Wed, 24 Nov 2010 09:02:13 +0100 Subject: [Python-Dev] OpenSSL Voluntarily (openssl-1.0.0a) In-Reply-To: <720EFE43-119F-4F2F-BCB1-939275B5FA6E@twistedmatrix.com> References: <4CEB3F72.7000006@m2.ccsnet.ne.jp> <20101123150219.29e20374@pitrou.net> <720EFE43-119F-4F2F-BCB1-939275B5FA6E@twistedmatrix.com> Message-ID: <1290585733.3642.2.camel@localhost.localdomain> On Tuesday, 23 November 2010 at 20:56 -0500, Glyph Lefkowitz wrote: > On Nov 23, 2010, at 9:02 AM, Antoine Pitrou wrote: > > > On Tue, 23 Nov 2010 00:07:09 -0500 > > Glyph Lefkowitz wrote: > >> On Mon, Nov 22, 2010 at 11:13 PM, Hirokazu Yamamoto < > >> ocean-city at m2.ccsnet.ne.jp> wrote: > >> > >>> Hello. Does this affect python? Thank you. > >>> > >>> http://www.openssl.org/news/secadv_20101116.txt > >>> > >> > >> No. > > > > Well, actually it does, but Python links against the system OpenSSL on > > most platforms (except Windows), so it's up to the OS vendor to apply > > the patch. > > > It does? If so, I must have misunderstood the vulnerability. Can you > explain how it affects Python?
If I believe the link above: "Any OpenSSL based TLS server is vulnerable if it is multi-threaded and uses OpenSSL's internal caching mechanism. Servers that are multi-process and/or disable internal session caching are NOT affected." So, you just have to create a multithreaded TLS server which doesn't disable server-side session caching (it is enabled by default according to http://www.openssl.org/docs/ssl/SSL_CTX_set_session_cache_mode.html ) Regards Antoine. From solipsis at pitrou.net Wed Nov 24 09:42:07 2010 From: solipsis at pitrou.net (Antoine Pitrou) Date: Wed, 24 Nov 2010 09:42:07 +0100 Subject: [Python-Dev] Centos 5.5 freeze during test_concurrent_futures References: Message-ID: <20101124094207.33ac093f@pitrou.net> Hi, > py3k built from trunk on Centos 5.5 freezes during regrtest on test_concurrent_futures with "Fatal Python error: Invalid thread state for this thread". As in a typical concurrent problem, subsequent calls freeze in different test cases, but the freeze itself is always reproducible and always during this test. Well, could you run this under gdb and report the stacks for the various threads when the process crashes? (when compiled --with-pydebug, if possible) Thank you Antoine. From solipsis at pitrou.net Wed Nov 24 09:43:12 2010 From: solipsis at pitrou.net (Antoine Pitrou) Date: Wed, 24 Nov 2010 09:43:12 +0100 Subject: [Python-Dev] http.server - reference to bug #427345 References: <4CEC9463.8030302@g.nevcal.com> Message-ID: <20101124094312.06bec373@pitrou.net> On Tue, 23 Nov 2010 22:35:10 -0600 Brian Curtin wrote: > On Tue, Nov 23, 2010 at 22:28, Glenn Linderman wrote: > > > Where might I find the bug #427345 that is referred to in a comment inside > > http.server ?
Here is a code excerpt: > > > > # throw away additional data [see bug #427345] > > while select.select([self.rfile._sock], [], [], 0)[0]: > > if not self.rfile._sock.recv(1): > > break > > > > http://bugs.python.org/issue427345 > > http://bugs.python.org/ has a box on the left-hand side where you can enter > issue numbers. And of course you can also reverse-engineer the clever URL scheme used by Roundup bug entries ;) Regards Antoine. From stephen at xemacs.org Wed Nov 24 10:03:29 2010 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Wed, 24 Nov 2010 18:03:29 +0900 Subject: [Python-Dev] len(chr(i)) = 2? In-Reply-To: <3C1ADB64-63F3-4165-926D-EDE9846E0DBD@fuhm.net> References: <201011192123.14169.victor.stinner@haypocalc.com> <4CE6F93F.9010109@egenix.com> <4CE6FE30.1050903@v.loewis.de> <87hbfc1vnf.fsf@uwakimon.sk.tsukuba.ac.jp> <4CE78F62.7060707@v.loewis.de> <8739qukf9r.fsf@uwakimon.sk.tsukuba.ac.jp> <20101121173825.B1BFB235977@kimball.webabinitio.net> <60F8726F-C1C2-4803-8B8E-688EF0443FA0@gmail.com> <87eiadd46t.fsf@uwakimon.sk.tsukuba.ac.jp> <4CEC5316.4010608@canterbury.ac.nz> <77AAC178-F868-4F05-8509-4A9FB66F61EC@fuhm.net> <87sjyrbftz.fsf@uwakimon.sk.tsukuba.ac.jp> <635C265A-90A8-4B92-A65C-59EF3E8EFD68@twistedmatrix.com> <87oc9fb97b.fsf@uwakimon.sk.tsukuba.ac.jp> <3C1ADB64-63F3-4165-926D-EDE9846E0DBD@fuhm.net> Message-ID: <87mxozayam.fsf@uwakimon.sk.tsukuba.ac.jp> James Y Knight writes: > a) You seem to be hung up implementation details of emacs. Hung up? No. It's the program whose text model I know best, and even if its design could theoretically be a lot better for this purpose, I can't say I've seen a real program whose model is obviously better for the purpose of a language for implementing text editors.[1] So it's not obvious to me that its model can be ruled out on a priori grounds. If not, it would be nice if your new language could implement it efficiently without contorted programming. 
> But yes, positions should be stored as an byte offset into the
> utf8 string. NOT as number of codepoints since the beginning of
> the string. Probably you want it to be somewhat opaque, so that
> you actually have to specify whether you wanted to go to +1
> byte, codepoint, or grapheme.

Well, first of all, +1 byte should not be available to a text iterator,
at least not with the same iterator/position object that implements
character and/or grapheme movement. (You seem to have thought about
this issue a lot, but mixing bytes with text units makes me wonder how
much practical implementation you've done.)

Second, incrementing to grapheme boundaries is relatively easy to do
efficiently, just as incrementing to a UTF-8 character boundary is easy
to do. We already do the latter; the former is pragmatically harder,
but not a conceptual stretch.

That's not the question. The question is how do we identify an
arbitrary position in the text? Sometimes it's nice to have a
numerical measure of size or location. It is not obvious that
grapheme count is the right way to determine position in a text.
E.g., for languages with variable-metric characters, using character
counts to line up table columns is going the way of Tyrannosaurus.
In the Han-using languages, yes, column counts within lines are going
to be important forever, because the characters are literally square
for most practical purposes ... but they don't use composing
characters (all the Japanese kana are precomposed, for example), so
position by grapheme is going to be very close to position by
character, and fine positioning will be done either by mouse or by
incrementing the last few characters.

Nor do I think operations like "advance 1,000,000 characters" will
have less meaning than "advance 1,000,000 graphemes."
Both of them are just a way of saying "go way far away", end up in
about the same place, and where there's a bias, it will be pretty
consistent in a statistical sense for any given natural language (and
therefore, for 99% of users).

> But once you [the language implementor] are providing correct
> abstractions for grapheme movement, it's just as easy to also
> provide an abstraction for codepoint movement, and make your
> low-level implementation of the iterator object be a byte-offset
> into a UTF8 buffer.

Sure, that's fine for something that just iterates over the text. But
if you actually need to remember positions, or regions, to jump to
later or to communicate to other code that manipulates them, doing
this stuff the straightforward way (just copying the whole iterator
object to hang on to its state) becomes expensive. You end up
proliferating types that all do the same kind of thing. Judicious use
of inheritance helps, but getting the fundamental abstraction right is
hard. Or at least, Emacs hasn't found it in 20 years of trying.

OTOH, all that stuff "just works" and just works efficiently, up to
the grapheme vs. character issue, with an array.

About that issue, to go back to tired old Emacs, *all* of the things I
can think of that I might want to do by grapheme (display, insert,
delete, move a few places) do fit the "increment until done" model.
These things already work quite well for the variable-width buffer
that "multilingual" Emacsen use, whether the old Mule encoding or
UTF-8. So I can see how the UTF-8 model with appropriate iterators
for characters and graphemes can work well for lots of applications
and use cases.

But Emacs already has opaque "markers", yet nevertheless the use of
integer character positions in strings and buffers has survived. That
*may* have to do with mutability, and the "all the world is a buffer"
design, as Glyph suggested, but I think it more likely that markers
are very expensive to create and use compared to integers.
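The "increment until done" model on a UTF-8 buffer can be sketched
roughly as follows. This is only an illustration under the assumption
of well-formed UTF-8; `next_code_point` and `offset_of_index` are
hypothetical helper names, not Emacs or Python APIs.

```python
def next_code_point(buf, pos):
    """Return the byte offset just past the code point starting at pos."""
    pos += 1
    # UTF-8 continuation bytes have the form 0b10xxxxxx; skip over them.
    while pos < len(buf) and (buf[pos] & 0xC0) == 0x80:
        pos += 1
    return pos


def offset_of_index(buf, index):
    """Map a code point index to a byte offset by walking from the start."""
    pos = 0
    for _ in range(index):
        pos = next_code_point(buf, pos)
    return pos


# 'a' (1 byte), U+00E9 (2 bytes), U+12345 (4 bytes), 'b' (1 byte)
buf = "a\u00e9\U00012345b".encode("utf-8")
```

Stepping forward costs a handful of byte comparisons, but turning an
integer character position into a byte offset means walking the
buffer, which is the cost that an array of fixed-width characters
avoids.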
Perhaps an editor of power similar to Emacs could be implemented with string operations on lines, or the like, and these issues would go away. But it's not obvious to me. Footnotes: [1] Yes, I know that not all programs are text editors. So shoot me. It's still the text manipulation program I know best, and it's not obvious to me that it's the unique class that would need these features. From stephen at xemacs.org Wed Nov 24 10:51:49 2010 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Wed, 24 Nov 2010 18:51:49 +0900 Subject: [Python-Dev] len(chr(i)) = 2? In-Reply-To: References: <201011192123.14169.victor.stinner@haypocalc.com> <4CE6F93F.9010109@egenix.com> <4CE6FE30.1050903@v.loewis.de> <87hbfc1vnf.fsf@uwakimon.sk.tsukuba.ac.jp> <4CE78F62.7060707@v.loewis.de> <8739qukf9r.fsf@uwakimon.sk.tsukuba.ac.jp> <20101121173825.B1BFB235977@kimball.webabinitio.net> <60F8726F-C1C2-4803-8B8E-688EF0443FA0@gmail.com> <87eiadd46t.fsf@uwakimon.sk.tsukuba.ac.jp> <4CEC5316.4010608@canterbury.ac.nz> <77AAC178-F868-4F05-8509-4A9FB66F61EC@fuhm.net> <87sjyrbftz.fsf@uwakimon.sk.tsukuba.ac.jp> <635C265A-90A8-4B92-A65C-59EF3E8EFD68@twistedmatrix.com> <87oc9fb97b.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <87lj4jaw22.fsf@uwakimon.sk.tsukuba.ac.jp> James Y Knight writes: > But, now, if your choices are UTF-8 or UTF-16, UTF-8 is clearly > superior [...]a because it is an ASCII superset, and thus more > easily compatible with other software. That also makes it most > commonly used for internet communication. Sure, UTF-8 is very nice as a protocol for communicating text. So what? If your application involves shoveling octets real fast, don't convert and shovel those octets. If your application involves significant text processing, well, conversion can almost always be done as fast as you can do I/O so it doesn't cost wallclock time, and generally doesn't require a huge percentage of CPU time compared to the actual text processing. 
It's just a specialization of serialization, that we do all the time for more complex data structures. So wire protocols are not a killer argument for or against any particular internal representation of text. > (So, there's a huge advantage for using it internally as well right > there: no transcoding necessary for writing your HTML output). I don't know your use cases but for mine, transcoding (whether in Lisp or Python or C) is invariably the least of my worries. *Especially* transcoding to UTF-8, which is the default codec for me, and I *never* mix bytes and text, so having not bothered to set the codec, I don't bother to transcode explicitly. > If you really want a fixed-width encoding, you have to go to > UTF-32 Not really. I never bothered implementing the codec, because I haven't yet seen a non-BMP Unicode character in the wild (I still see a lot of non-Unicode characters, but hey, that's the price you pay for living in the land that invented sushi, sake, and anime). For most use cases, those are going to be rare, where by "rare" I mean "you aren't going to see 6400 *different* non-BMP characters."[1] So instead of having the codec produce UTF-16, you have it produce (Holy CEF, Batman!) "pure" UCS-2 with the non-BMP characters registered on demand and encoded in the BMP private area. Python, of course, will never know the difference, and your language won't need to care, either. > But that's all a side issue: even if you do choose UTF-16 as your > underlying encoding, you *still* need to provide iterators that > work by "byte" (only now bytes are 16-bits), by codepoint, Nope, see above. Codepoints can be bytes and vice versa. The needed codec is no harder to use than any other codec, and only slightly less efficient than the normal UTF-8 codec unless you're basically restricted to a rather uncommon script (and even then there are optimizations). > and by grapheme. 
Sure, but as I point out elsewhere, the use cases where grapheme movement is distinguished from character movement I can come up with are all iterative, and I don't need array behavior for both anyway. So since I *can* have a character array in Unicode, and I *can't* have a grapheme array (except maybe by a scheme like the above), I'll go for the character array. Unless maybe you convince me I don't need it, but I'm yet to be convinced. > away with...just so long as you don't mind that you sometimes end > up splitting a string in the middle of a codepoint and causing a > unicode error! I *do* mind, but I like Python anyway. Footnotes: [1] OK, in practice a lot of the private space will be taken by existing system characters, such as the Apple logo (absolutely essential for writing email on Mac, at least in Japan). Whose use-case is going to see 1000 different non-BMP characters in a session? I do know a couple of Buddhist dictionary editors, but aside from them, I can't think of anybody. Lara Croft, maybe. From solipsis at pitrou.net Wed Nov 24 11:27:30 2010 From: solipsis at pitrou.net (Antoine Pitrou) Date: Wed, 24 Nov 2010 11:27:30 +0100 Subject: [Python-Dev] len(chr(i)) = 2? References: <201011192123.14169.victor.stinner@haypocalc.com> <4CE6F93F.9010109@egenix.com> <4CE6FE30.1050903@v.loewis.de> <87hbfc1vnf.fsf@uwakimon.sk.tsukuba.ac.jp> <4CE78F62.7060707@v.loewis.de> <8739qukf9r.fsf@uwakimon.sk.tsukuba.ac.jp> <20101121173825.B1BFB235977@kimball.webabinitio.net> <60F8726F-C1C2-4803-8B8E-688EF0443FA0@gmail.com> <87eiadd46t.fsf@uwakimon.sk.tsukuba.ac.jp> <4CEC5316.4010608@canterbury.ac.nz> <77AAC178-F868-4F05-8509-4A9FB66F61EC@fuhm.net> <87sjyrbftz.fsf@uwakimon.sk.tsukuba.ac.jp> <635C265A-90A8-4B92-A65C-59EF3E8EFD68@twistedmatrix.com> <87oc9fb97b.fsf@uwakimon.sk.tsukuba.ac.jp> <87lj4jaw22.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <20101124112730.6867fb17@pitrou.net> On Wed, 24 Nov 2010 18:51:49 +0900 "Stephen J. 
Turnbull" wrote: > James Y Knight writes: > > > But, now, if your choices are UTF-8 or UTF-16, UTF-8 is clearly > > superior [...]a because it is an ASCII superset, and thus more > > easily compatible with other software. That also makes it most > > commonly used for internet communication. > > Sure, UTF-8 is very nice as a protocol for communicating text. So > what? If your application involves shoveling octets real fast, don't > convert and shovel those octets. If your application involves > significant text processing, well, conversion can almost always be > done as fast as you can do I/O so it doesn't cost wallclock time, and > generally doesn't require a huge percentage of CPU time compared to > the actual text processing. It's just a specialization of > serialization, that we do all the time for more complex data > structures. > > So wire protocols are not a killer argument for or against any > particular internal representation of text. Agreed. Decoding and encoding utf-8 is so fast that it should be dwarfed by any actual processing done on the text. Regards Antoine. From solipsis at pitrou.net Wed Nov 24 12:37:54 2010 From: solipsis at pitrou.net (Antoine Pitrou) Date: Wed, 24 Nov 2010 12:37:54 +0100 Subject: [Python-Dev] r86726 - python/branches/release27-maint/Objects/setobject.c References: <20101124103923.DC18EDE50@mail.python.org> Message-ID: <20101124123754.3b60d3a3@pitrou.net> On Wed, 24 Nov 2010 11:39:23 +0100 (CET) armin.rigo wrote: > Author: armin.rigo > Date: Wed Nov 24 11:39:23 2010 > New Revision: 86726 > > Log: > A no-op change. It looks like this call was not meant to be a recursive > call, but just call the helper (which the recursive call ends up doing). Since it's allegedly a no-op change, it doesn't come with a test, and 2.7.1 is in rc phase, is it really the right time to do it? What is the motivation for it? Thanks Antoine. 
> > > Modified: > python/branches/release27-maint/Objects/setobject.c > > Modified: python/branches/release27-maint/Objects/setobject.c > ============================================================================== > --- python/branches/release27-maint/Objects/setobject.c (original) > +++ python/branches/release27-maint/Objects/setobject.c Wed Nov 24 11:39:23 2010 > @@ -1858,7 +1858,7 @@ > tmpkey = make_new_set(&PyFrozenSet_Type, key); > if (tmpkey == NULL) > return -1; > - rv = set_contains(so, tmpkey); > + rv = set_contains_key(so, tmpkey); > Py_DECREF(tmpkey); > } > return rv; From fuzzyman at voidspace.org.uk Wed Nov 24 13:30:15 2010 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Wed, 24 Nov 2010 12:30:15 +0000 Subject: [Python-Dev] constant/enum type in stdlib In-Reply-To: References: <20101121034404.52924F20A@mail.python.org> <4CE9BF4A.1020302@netwok.org> <4CEA89E8.5090107@voidspace.org.uk> <20101122163722.7e96d123@pitrou.net> <4CEA9584.7040301@avl.com> <20101122172440.77d27ed5@pitrou.net> <20101122164654.2109.588145158.divmod.xquotient.165@localhost.localdomain> <4CEBC6BD.9060402@voidspace.org.uk> Message-ID: <4CED0557.9090101@voidspace.org.uk> On 23/11/2010 14:16, Nick Coghlan wrote: > On Tue, Nov 23, 2010 at 11:50 PM, Michael Foord > wrote: >> PEP 354 was rejected for two primary reasons - lack of interest and nowhere >> obvious to put it. Would it be *so bad* if an enum type lived in its own >> module? There is certainly more interest now, and if we are to use something >> like this in the standard library it *has* to be in the standard library >> (unless every module implements their own private _Constant class). >> >> Time to revisit the PEP? > If you (or anyone else) wanted to revisit the PEP, then I would advise > trawling through the standard library looking for constants that could > be sensibly converted to enum values. 
Based on a non-exhaustive search, Python standard library modules
currently using integers for constants:

* re - has flags (OR'able constants) defined in sre_constants, each
  flag has two names (e.g. re.IGNORECASE and re.I)
* os has SEEK_SET, SEEK_CUR, SEEK_END - *plus* those implemented in
  posix / nt
* doctest has its own flag system, but is really just using integer
  flags / constants (quite a few of them)
* token has a tonne of constants (autogenerated)
* socket exports a bunch of constants defined in _socket
* gzip has flags: FTEXT, FHCRC, FEXTRA, FNAME, FCOMMENT
* errno (builtin module) EALREADY, EINPROGRESS, EWOULDBLOCK,
  ECONNRESET, EINVAL, ENOTCONN, ESHUTDOWN, EINTR, EISCONN, EBADF,
  ECONNABORTED
* opcode has HAVE_ARGUMENT, EXTENDED_ARG. In fact pretty much the
  whole of opcode is about defining and exposing named constants
* msilib uses flag constants
* multiprocessing.pool - RUN, CLOSE, TERMINATE
* multiprocessing.util - NOTSET, SUBDEBUG, DEBUG, INFO, SUBWARNING
* xml.dom and xml.dom.Node (in __init__.py) have a bunch of constants
* xml.dom.NodeFilter.NodeFilter holds a bunch of constants (some of
  them flags)
* xmlrpc.client has a bunch of error constants
* calendar uses constants to represent weekdays, plus one for the
  EPOCH that is best left alone
* http.client has a tonne of constants - recognisable as ports /
  error codes though
* dis has flags in COMPILER_FLAG_NAMES, which are then set as locals
  in inspect
* io defines SEEK_SET, SEEK_CUR, SEEK_END (same as os)

Where constants are implemented in C but exported via a Python module
(the constants exported by os and socket for example) they could be
wrapped. Where they are exported directly by a C extension or builtin
module (e.g. errno) they are probably best left.

Raymond feels that having an enum / constant type would be Javaesque
and unused. If we used it in the standard library the unused fear at
least would be unwarranted.
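A minimal sketch of the int-subclass idea, assuming a hypothetical
`NamedInt` class (it is not part of the standard library, nor taken
from any proposal in this thread): the value still behaves as an
ordinary integer, but its repr shows the constant's name.

```python
class NamedInt(int):
    """Hypothetical integer constant that remembers its own name."""

    def __new__(cls, name, value):
        self = super().__new__(cls, value)
        self._name = name  # int subclasses have a __dict__, so this works
        return self

    def __repr__(self):
        return self._name


# Illustrative stand-ins for the os / io seek constants listed above.
SEEK_SET = NamedInt("SEEK_SET", 0)
SEEK_CUR = NamedInt("SEEK_CUR", 1)
```

Because the object is a real int, it can be passed anywhere a plain
integer is expected; arithmetic on it returns a plain int, so the name
is not carried over to computed values.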
The change would be largely transparent to developers, except they get
better debugging info.

Twisted is also looking for an enum / constant type:

http://twistedmatrix.com/trac/ticket/4671

Because we would need to subclass int for backwards compatibility, it
couldn't replace float / string constants (unless the base class is
set dynamically, which I don't propose). Hopefully it would still be
sufficient to allow Twisted to use it. (Although they do so love
reimplementing parts of the standard library - usually better than the
standard library it has to be said.)

All the best,

Michael

There are a tonne of constants that are used as numbers
(MAX_LINE_LENGTH appears in a few places) and aren't just arbitrary
constants.

There are also some other interesting ones:

* pty has STDIN_FILENO, STDOUT_FILENO, STDERR_FILENO, CHILD
* poplib has POP3_PORT, POP3_SSL_PORT - recognisable as port numbers,
  should be left as ints
* datetime.py has MINYEAR and MAXYEAR
* colorsys has float constants
* tty uses constants for termios list indexes (used as numbers I
  guess)
* curses.ascii has a whole bunch of integer constants referring to
  ascii characters
* Several modules - decimal, concurrent.futures, uuid (and now
  inspect) already use strings

> A decision would also need to be made as to whether or not to subclass
> int, or just provide __index__ (the former has the advantage of being
> able to drop cleanly into OS level APIs that expect a numerical
> constant).
>
> Whether enums should provide arbitrary name-value mappings (ala C
> enums) or were restricted to sequential indices starting from zero
> would be another question best addressed by a code survey of at least
> the stdlib.
>
> And getgeneratorstate() doesn't count as a use case, since the
> ordering isn't needed and using string literals instead of integers
> will cover the debugging aspect :)
>
> Cheers,
> Nick.
>

--
http://www.voidspace.org.uk/

READ CAREFULLY.
By accepting and reading this email you agree, on behalf of your
employer, to release me from all obligations and waivers arising from
any and all NON-NEGOTIATED agreements, licenses, terms-of-service,
shrinkwrap, clickwrap, browsewrap, confidentiality, non-disclosure,
non-compete and acceptable use policies ("BOGUS AGREEMENTS") that I
have entered into with your employer, its partners, licensors, agents
and assigns, in perpetuity, without prejudice to my ongoing rights and
privileges. You further represent that you have the authority to
release me from any BOGUS AGREEMENTS on behalf of your employer.

From ncoghlan at gmail.com Wed Nov 24 15:08:04 2010
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Thu, 25 Nov 2010 00:08:04 +1000
Subject: [Python-Dev] constant/enum type in stdlib
In-Reply-To: <4CED0557.9090101@voidspace.org.uk>
References: <20101121034404.52924F20A@mail.python.org>
 <4CE9BF4A.1020302@netwok.org> <4CEA89E8.5090107@voidspace.org.uk>
 <20101122163722.7e96d123@pitrou.net> <4CEA9584.7040301@avl.com>
 <20101122172440.77d27ed5@pitrou.net>
 <20101122164654.2109.588145158.divmod.xquotient.165@localhost.localdomain>
 <4CEBC6BD.9060402@voidspace.org.uk> <4CED0557.9090101@voidspace.org.uk>
Message-ID:

On Wed, Nov 24, 2010 at 10:30 PM, Michael Foord wrote:
> Based on a non-exhaustive search, Python standard library modules currently
> using integers for constants:

Thanks for that review. I think following up on the "NamedConstant"
idea may make more sense than pursuing enums in their own right.

That way we could get the debugging benefits on the Python side
regardless of any type constraints on the value (e.g. needing to be an
integer in order to interface to C code), without needing to design an
enum API that suited all purposes.

Cheers,
Nick.

--
Nick Coghlan   |   ncoghlan at gmail.com   |
Brisbane, Australia

From exarkun at twistedmatrix.com Wed Nov 24 16:01:06 2010
From: exarkun at twistedmatrix.com (exarkun at twistedmatrix.com)
Date: Wed, 24 Nov 2010 15:01:06 -0000
Subject: [Python-Dev] OpenSSL Voluntarily (openssl-1.0.0a)
In-Reply-To: <1290585733.3642.2.camel@localhost.localdomain>
References: <4CEB3F72.7000006@m2.ccsnet.ne.jp>
 <20101123150219.29e20374@pitrou.net>
 <720EFE43-119F-4F2F-BCB1-939275B5FA6E@twistedmatrix.com>
 <1290585733.3642.2.camel@localhost.localdomain>
Message-ID: <20101124150106.2109.660794265.divmod.xquotient.197@localhost.localdomain>

On 08:02 am, solipsis at pitrou.net wrote:
>On Tuesday 23 November 2010 at 20:56 -0500, Glyph Lefkowitz wrote:
>>On Nov 23, 2010, at 9:02 AM, Antoine Pitrou wrote:
>>
>> > On Tue, 23 Nov 2010 00:07:09 -0500
>> > Glyph Lefkowitz wrote:
>> >> On Mon, Nov 22, 2010 at 11:13 PM, Hirokazu Yamamoto <
>> >> ocean-city at m2.ccsnet.ne.jp> wrote:
>> >>
>> >>> Hello. Does this affect python? Thank you.
>> >>>
>> >>> http://www.openssl.org/news/secadv_20101116.txt
>> >>>
>> >>
>> >> No.
>> >
>> > Well, actually it does, but Python links against the system OpenSSL on
>> > most platforms (except Windows), so it's up to the OS vendor to apply
>> > the patch.
>>
>>
>>It does? If so, I must have misunderstood the vulnerability. Can you
>>explain how it affects Python?
>
>If I believe the link above:
> "Any OpenSSL based TLS server is vulnerable if it is multi-threaded and
>uses OpenSSL's internal caching mechanism. Servers that are
>multi-process and/or disable internal session caching are NOT
>affected."
>
>So, you just have to create a multithreaded TLS server which doesn't
>disable server-side session caching (it is enabled by default according
>to http://www.openssl.org/docs/ssl/SSL_CTX_set_session_cache_mode.html
>)

Hm. The session cache is enabled by default, but nothing will ever use
it unless the server specifies a session id using
SSL_set_session_id_context or SSL_CTX_set_session_id_context.
Python doesn't expose these, so I don't think any Python SSL server
can set them.

The vulnerability announcement isn't 100% clear on this, but I took a
look at the patch which fixes the issue and it /appears/ as though if
a client never tries to re-use a session then you will be safe from
this bug. However, perhaps this only means that only malicious
clients (which send a session id even when they can't actually have
one) will be able to trigger the bug.

Or I may misunderstand how SSL sessions work in OpenSSL entirely. The
documentation for them is on par with that for most of the rest of
OpenSSL.

Jean-Paul

From solipsis at pitrou.net Wed Nov 24 16:11:20 2010
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Wed, 24 Nov 2010 16:11:20 +0100
Subject: [Python-Dev] OpenSSL Voluntarily (openssl-1.0.0a)
References: <4CEB3F72.7000006@m2.ccsnet.ne.jp>
 <20101123150219.29e20374@pitrou.net>
 <720EFE43-119F-4F2F-BCB1-939275B5FA6E@twistedmatrix.com>
 <1290585733.3642.2.camel@localhost.localdomain>
 <20101124150106.2109.660794265.divmod.xquotient.197@localhost.localdomain>
Message-ID: <20101124161120.5ddd106c@pitrou.net>

On Wed, 24 Nov 2010 15:01:06 -0000
exarkun at twistedmatrix.com wrote:
> >
> >If I believe the link above:
> > "Any OpenSSL based TLS server is vulnerable if it is multi-threaded and
> >uses OpenSSL's internal caching mechanism. Servers that are
> >multi-process and/or disable internal session caching are NOT
> >affected."
> >
> >So, you just have to create a multithreaded TLS server which doesn't
> >disable server-side session caching (it is enabled by default according
> >to http://www.openssl.org/docs/ssl/SSL_CTX_set_session_cache_mode.html
> >)
>
> Hm. The session cache is enabled by default, but nothing will ever use
> it unless the server specifies a session id using
> SSL_set_session_id_context or SSL_CTX_set_session_id_context. Python
> doesn't expose these, so I don't think any Python SSL server can set
> them.
Well, Python calls SSL_CTX_set_session_id_context() implicitly,
starting from 3.2 (precisely so that the session cache gets used). The
"documentation" I've found about the "session id context" seems to
suggest that a process-wide constant is enough.

(and you can verify that caching occurs using the new
SSLContext.session_stats() method)

> Or I may misunderstand how SSL sessions work in OpenSSL entirely. The
> documentation for them is on par with that for most of the rest of
> OpenSSL.

Agreed.

Regards

Antoine.

From steve at pearwood.info Wed Nov 24 16:44:57 2010
From: steve at pearwood.info (Steven D'Aprano)
Date: Thu, 25 Nov 2010 02:44:57 +1100
Subject: [Python-Dev] constant/enum type in stdlib
In-Reply-To:
References: <20101121034404.52924F20A@mail.python.org>
 <4CE9BF4A.1020302@netwok.org> <4CEA89E8.5090107@voidspace.org.uk>
 <20101122163722.7e96d123@pitrou.net> <4CEA9584.7040301@avl.com>
 <20101122172440.77d27ed5@pitrou.net>
 <20101122164654.2109.588145158.divmod.xquotient.165@localhost.localdomain>
 <4CEBC6BD.9060402@voidspace.org.uk> <4CED0557.9090101@voidspace.org.uk>
Message-ID: <4CED32F9.5050004@pearwood.info>

Nick Coghlan wrote:
> On Wed, Nov 24, 2010 at 10:30 PM, Michael Foord
> wrote:
>> Based on a non-exhaustive search, Python standard library modules currently
>> using integers for constants:
>
> Thanks for that review. I think following up on the "NamedConstant"
> idea may make more sense than pursuing enums in their own right.

Pardon me if I've missed something in this thread, but when you say
"NamedConstant", do you mean actual constants that can only be bound
once but not re-bound? If so, +1. If not, what do you mean?

I thought PEP 3115 could be used to implement such constants, but I
can't get it to work...

class readonlydict(dict):
    def __setitem__(self, key, value):
        if key in self:
            raise TypeError("can't rebind constant")
        dict.__setitem__(self, key, value)
    # Need to also handle updates, del, pop, etc.
class MetaConstant(type):
    @classmethod
    def __prepare__(metacls, name, bases):
        return readonlydict()
    def __new__(cls, name, bases, classdict):
        assert type(classdict) is readonlydict
        return type.__new__(cls, name, bases, classdict)

class Constant(metaclass=MetaConstant):
    a = 1
    b = 2
    c = 3

What I expect is that Constant.a should return 1, and Constant.a=2
should raise TypeError, but what I get is a normal class __dict__.

>>> Constant.a
1
>>> Constant.a = 2
>>> Constant.a
2

--
Steven

From exarkun at twistedmatrix.com Wed Nov 24 17:23:12 2010
From: exarkun at twistedmatrix.com (exarkun at twistedmatrix.com)
Date: Wed, 24 Nov 2010 16:23:12 -0000
Subject: [Python-Dev] OpenSSL Vulnerability (openssl-1.0.0a)
In-Reply-To: <20101124161120.5ddd106c@pitrou.net>
References: <4CEB3F72.7000006@m2.ccsnet.ne.jp>
 <20101123150219.29e20374@pitrou.net>
 <720EFE43-119F-4F2F-BCB1-939275B5FA6E@twistedmatrix.com>
 <1290585733.3642.2.camel@localhost.localdomain>
 <20101124150106.2109.660794265.divmod.xquotient.197@localhost.localdomain>
 <20101124161120.5ddd106c@pitrou.net>
Message-ID: <20101124162312.2109.1025683352.divmod.xquotient.215@localhost.localdomain>

On 03:11 pm, solipsis at pitrou.net wrote:
>On Wed, 24 Nov 2010 15:01:06 -0000
>exarkun at twistedmatrix.com wrote:
>> >
>> >If I believe the link above:
>> > "Any OpenSSL based TLS server is vulnerable if it is multi-threaded and
>> >uses OpenSSL's internal caching mechanism. Servers that are
>> >multi-process and/or disable internal session caching are NOT
>> >affected."
>> >
>> >So, you just have to create a multithreaded TLS server which doesn't
>> >disable server-side session caching (it is enabled by default according
>> >to http://www.openssl.org/docs/ssl/SSL_CTX_set_session_cache_mode.html
>> >)
>>
>>Hm. The session cache is enabled by default, but nothing will ever use
>>it unless the server specifies a session id using
>>SSL_set_session_id_context or SSL_CTX_set_session_id_context.
>>Python doesn't expose these, so I don't think any Python SSL server can
>>set them.
>
>Well, Python calls SSL_CTX_set_session_id_context() implicitly, starting
>from 3.2 (precisely so that the session cache gets used). The
>"documentation" I've found about the "session id context" seems to
>suggest that a process-wide constant is enough.

Ah. Okay, then Python 3.2 would be vulnerable. Good thing it isn't
released yet. ;)

Jean-Paul

From benjamin at python.org Wed Nov 24 17:32:56 2010
From: benjamin at python.org (Benjamin Peterson)
Date: Wed, 24 Nov 2010 10:32:56 -0600
Subject: [Python-Dev] constant/enum type in stdlib
In-Reply-To: <4CED32F9.5050004@pearwood.info>
References: <20101121034404.52924F20A@mail.python.org>
 <4CE9BF4A.1020302@netwok.org> <4CEA89E8.5090107@voidspace.org.uk>
 <20101122163722.7e96d123@pitrou.net> <4CEA9584.7040301@avl.com>
 <20101122172440.77d27ed5@pitrou.net>
 <20101122164654.2109.588145158.divmod.xquotient.165@localhost.localdomain>
 <4CEBC6BD.9060402@voidspace.org.uk> <4CED0557.9090101@voidspace.org.uk>
 <4CED32F9.5050004@pearwood.info>
Message-ID:

2010/11/24 Steven D'Aprano :
> Nick Coghlan wrote:
>>
>> On Wed, Nov 24, 2010 at 10:30 PM, Michael Foord
>> wrote:
>>>
>>> Based on a non-exhaustive search, Python standard library modules currently
>>> using integers for constants:
>>
>> Thanks for that review. I think following up on the "NamedConstant"
>> idea may make more sense than pursuing enums in their own right.
>
> Pardon me if I've missed something in this thread, but when you say
> "NamedConstant", do you mean actual constants that can only be bound once
> but not re-bound? If so, +1. If not, what do you mean?
>
> I thought PEP 3115 could be used to implement such constants, but I can't
> get it to work...
>
> class readonlydict(dict):
>     def __setitem__(self, key, value):
>         if key in self:
>             raise TypeError("can't rebind constant")
>         dict.__setitem__(self, key, value)
>     # Need to also handle updates, del, pop, etc.
>
> class MetaConstant(type):
>     @classmethod
>     def __prepare__(metacls, name, bases):
>         return readonlydict()
>     def __new__(cls, name, bases, classdict):
>         assert type(classdict) is readonlydict
>         return type.__new__(cls, name, bases, classdict)
>
> class Constant(metaclass=MetaConstant):
>     a = 1
>     b = 2
>     c = 3
>
>
> What I expect is that Constant.a should return 1, and Constant.a=2 should
> raise TypeError, but what I get is a normal class __dict__.

The construction namespace can be customized, but class.__dict__ must
always be a real dict.

--
Regards,
Benjamin

From jsbueno at python.org.br Wed Nov 24 18:23:57 2010
From: jsbueno at python.org.br (Joao S. O. Bueno)
Date: Wed, 24 Nov 2010 15:23:57 -0200
Subject: [Python-Dev] Fwd: constant/enum type in stdlib
In-Reply-To:
References: <20101121034404.52924F20A@mail.python.org>
 <4CEA89E8.5090107@voidspace.org.uk> <20101122163722.7e96d123@pitrou.net>
 <4CEA9584.7040301@avl.com> <20101122172440.77d27ed5@pitrou.net>
 <20101122164654.2109.588145158.divmod.xquotient.165@localhost.localdomain>
 <4CEBC6BD.9060402@voidspace.org.uk> <4CEBCE92.40801@voidspace.org.uk>
 <20101123154229.474f7a90@pitrou.net>
 <1290524466.3642.4.camel@localhost.localdomain>
 <4CEBDA91.4050205@voidspace.org.uk>
 <1290526253.3642.9.camel@localhost.localdomain>
 <4CEBE06C.9030101@voidspace.org.uk>
 <1290528319.3642.11.camel@localhost.localdomain>
 <1290533860.3642.73.camel@localhost.localdomain>
Message-ID:

Hi -- If I may add my 0.02 cents - this sample has a sample
implementation of the proposed features I found most interesting up to
now:

1) inherit from int
2) display the constant's name on 'repr'
3) optionally populate a module with the constants
4) Optionally provide a starting value for the enum
5) Optionally provide a mapping with the values

http://pastebin.com/6f1u35qJ
(implementation is in python 2)

Todo here:
6) Make them "read only"
7) Make the base type optional,
   with "int" as default - but also being able to create "constants"
   inheriting from other objects
8) more ideas?

I am willing to keep working on this sample code as the discussion
goes on, if there is any feedback.

js
-><-

From alexander.belopolsky at gmail.com Wed Nov 24 18:37:43 2010
From: alexander.belopolsky at gmail.com (Alexander Belopolsky)
Date: Wed, 24 Nov 2010 12:37:43 -0500
Subject: [Python-Dev] len(chr(i)) = 2?
In-Reply-To:
References: <201011192123.14169.victor.stinner@haypocalc.com>
 <4CE6F93F.9010109@egenix.com> <4CE6FE30.1050903@v.loewis.de>
 <87hbfc1vnf.fsf@uwakimon.sk.tsukuba.ac.jp> <4CE78F62.7060707@v.loewis.de>
 <8739qukf9r.fsf@uwakimon.sk.tsukuba.ac.jp>
 <20101121173825.B1BFB235977@kimball.webabinitio.net>
 <60F8726F-C1C2-4803-8B8E-688EF0443FA0@gmail.com>
 <87eiadd46t.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID:

On Tue, Nov 23, 2010 at 2:18 PM, Amaury Forgeot d'Arc wrote:
..
>> Given the apparent difficulty of writing even basic text processing
>> algorithms in presence of surrogate pairs, I wonder how wise it is to
>> expose Python users to them.
>
> This was already discussed two years ago:
>
> http://mail.python.org/pipermail/python-dev/2008-July/080900.html
>

Thanks for the link. Let me summarize that discussion as I read it.

The discussion starts with a reference to Guido's 2001 post which
concluded with

"""
... if we had wanted to use a variable-length internal representation,
we should have picked UTF-8 way back, like Perl did. Moving to a
UTF-16-based internal representation now will give us all the problems
of the Perl choice without any of the benefits.
""" [1]

and proposes to move to UCS-4 completely for Python 3.0.

Note that this is not the option that I would like to discuss here. I
don't propose to discuss abandoning narrow builds. Instead, I would
like to discuss the costs and benefits associated with using variable
width CES as an internal representation.

This is where the 2008 discussion moved.
OP did not realize that narrow build supported UTF-16 and like myself was surprised that application developers should be aware of surrogates if they want to use narrow builds. It was also suggested that Python itself is likely to have many bugs that can be triggered by non-BMP characters on narrow builds. Guido's response was: """ I'd also prefer to receive bug reports about breakages actually encountered in the wild than purely theoretical issues """ I don't think this is a good position to take. Programs that expect one code unit where Python may produce two are likely to have security holes. Even when programmers carefully sanitize their input, they are likely to do it at the code point level based on Unicode category and 0xFFFF boundary does not mean anything special for their applications. I think anyone who wants to write a robust application has two choices in practice: (a) use wide Unicode build; (b) restrict all text to BMP. Supporting surrogates at the application level is likely to be prohibitively expensive. It was later suggested that the main benefit of "UTF-16" builds is that they can easily interface with system libraries that are "UTF-16" based. However, how likely are these libraries be bug-free when it comes to non-BMP characters? The history teaches us that not very likely. Daniel Arbuckle presented arguments against imposing the burden of dealing with surrogates on application writers. [2] The recurrent theme on the thread was that non-BMP characters are rare and those who need them can afford the extra development cost associated with the surrogates. This point was very eloquently articulated by Guido: """ Who are the many here? Who are the few? 
I'd venture that (at least for the foreseeable future, say, until China will finally have taken over the role of the US as the de-facto dominant super power :-) the many are people whose app will never see a Unicode character outside the BMP, or who do such minimal string processing that their code doesn't care whether it's handling UTF-16-encoded data. """ [3] This argument can also be used to support the position that narrow builds should not support non-BMP characters. Later the discussion started resembling this thread when it went into a scholastic dispute over fine points in Unicode Standard terminology. :-) Then BDFL vetoed len(u"\U00012345") returning 1 on narrow builds. [4] I would be against that as well. I don't see len("\U00012345") == 2 as a big problem because application developers can simply avoid using \U literals if they don't want to support non-BMP characters. On the other hand, an option to warn users about non-BMP literals on a narrow build may be useful but it is easy to implement in lint-like tools. There were multiple suggestions for standard library additions to help application writers to deal with surrogate pairs, but as far as I can tell, nothing has been done in this area in the following two years. I don't think there is a recipe on how to fix legacy character-by-character processing loop such as for c in string: ... to make it iterate over code points consistently in wide and narrow builds. (Note that I am not asking for a grapheme iterator here. This is clearly an application level feature.) > So yes, wrap() and center() should be fixed. I opened an issue 10521 for that. [5] I am fully prepared to see it dismissed as "theoretical" and be closed with "won't fix" or linger indefinitely. Fixing it would most likely involve writing the second version of pad() utility function specifically for the narrow build. 
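
The recipe asked for above, a replacement for the `for c in string: ...` loop that yields whole code points on both wide and narrow builds, might look roughly like the following. This is only a sketch and not something posted on the thread: the name `iter_code_points` is made up for illustration, and it assumes that `chr()` of a non-BMP value returns a surrogate pair on a narrow build, as discussed earlier in this thread.

```python
def iter_code_points(s):
    """Yield one string per code point, joining UTF-16 surrogate pairs.

    Hypothetical helper, not a stdlib function.  A high surrogate that is
    immediately followed by a low surrogate is combined into one item;
    everything else (including lone surrogates) is passed through as is.
    """
    i, n = 0, len(s)
    while i < n:
        ch = s[i]
        is_high = "\ud800" <= ch <= "\udbff"
        if is_high and i + 1 < n and "\udc00" <= s[i + 1] <= "\udfff":
            hi = ord(ch) - 0xD800
            lo = ord(s[i + 1]) - 0xDC00
            # chr() gives a single character on a wide build and a
            # two-unit surrogate pair on a narrow build.
            yield chr(0x10000 + (hi << 10) + lo)
            i += 2
        else:
            yield ch
            i += 1


# U+12345 written as its surrogate pair D808 DF45.
for c in iter_code_points("a\ud808\udf45b"):
    print(hex(ord(c)))
```

Lone surrogates are deliberately left alone here; whether they should instead raise an error is an application-level decision.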
All examples I've seen in Python C code of dealing with surrogates came with hand-coded #ifndef Py_UNICODE_WIDE fragments and no user-friendly macros or APIs that would abstract it away. A quick grep for maxunicode in the standard library revealed only one case of "narrow-build aware" code: if sys.maxunicode != 65535: # XXX: negation does not work with big charsets return charset See Lib/sre_compile.py. Not exactly a model to follow. To conclude, I feel that rather than trying to fully support non-BMP characters as surrogate pairs in narrow builds, we should make it easier for application developers to avoid them. If abandoning internal use of UTF-16 is not an option, I think we should at least add an option for decoders that currently produce surrogate pairs to treat non-BMP characters as errors and handle them according to user's choice. [1] http://mail.python.org/pipermail/i18n-sig/2001-June/001107.html [2] http://mail.python.org/pipermail/python-dev/2008-July/080912.html [3] http://mail.python.org/pipermail/python-dev/2008-July/080940.html [4] http://mail.python.org/pipermail/python-dev/2008-July/080916.html [5] http://bugs.python.org/issue10521 From fuzzyman at voidspace.org.uk Wed Nov 24 18:41:08 2010 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Wed, 24 Nov 2010 17:41:08 +0000 Subject: [Python-Dev] constant/enum type in stdlib In-Reply-To: References: <20101121034404.52924F20A@mail.python.org> <4CE9BF4A.1020302@netwok.org> <4CEA89E8.5090107@voidspace.org.uk> <20101122163722.7e96d123@pitrou.net> <4CEA9584.7040301@avl.com> <20101122172440.77d27ed5@pitrou.net> <20101122164654.2109.588145158.divmod.xquotient.165@localhost.localdomain> <4CEBC6BD.9060402@voidspace.org.uk> <4CED0557.9090101@voidspace.org.uk> Message-ID: <4CED4E34.5060400@voidspace.org.uk> On 24/11/2010 14:08, Nick Coghlan wrote: > On Wed, Nov 24, 2010 at 10:30 PM, Michael Foord > wrote: >> Based on a non-exhaustive search, Python standard library modules currently >> using integers for 
constants:
> Thanks for that review. I think following up on the "NamedConstant"
> idea may make more sense than pursuing enums in their own right. That
> way we could get the debugging benefits on the Python side regardless
> of any type constraints on the value (e.g. needing to be an integer in
> order to interface to C code), without needing to design an enum API
> that suited all purposes.

Can you explain what you see as the difference?

I'm not particularly interested in type validation but I like the fact that typical enum APIs allow you to group constants: the generated constant class acts as a namespace for all the defined constants.

Are you just suggesting something along the lines of:

    class NamedConstant(int):
        def __new__(cls, name, val):
            return int.__new__(cls, val)

        def __init__(self, name, val):
            self._name = name

        def __repr__(self):
            return '<NamedConstant %s>' % self._name

    FOO = NamedConstant('FOO', 3)

In general the fewer features the better, but I'd like a few more features than that. :-)

All the best,

Michael

> Cheers,
> Nick.
>

--
http://www.voidspace.org.uk/

READ CAREFULLY. By accepting and reading this email you agree, on behalf of your employer, to release me from all obligations and waivers arising from any and all NON-NEGOTIATED agreements, licenses, terms-of-service, shrinkwrap, clickwrap, browsewrap, confidentiality, non-disclosure, non-compete and acceptable use policies ("BOGUS AGREEMENTS") that I have entered into with your employer, its partners, licensors, agents and assigns, in perpetuity, without prejudice to my ongoing rights and privileges. You further represent that you have the authority to release me from any BOGUS AGREEMENTS on behalf of your employer.

From mal at egenix.com Wed Nov 24 19:50:57 2010
From: mal at egenix.com (M.-A. Lemburg)
Date: Wed, 24 Nov 2010 19:50:57 +0100
Subject: [Python-Dev] len(chr(i)) = 2?
In-Reply-To:
References: <201011192123.14169.victor.stinner@haypocalc.com> <4CE6F93F.9010109@egenix.com> <4CE6FE30.1050903@v.loewis.de> <87hbfc1vnf.fsf@uwakimon.sk.tsukuba.ac.jp> <4CE78F62.7060707@v.loewis.de> <8739qukf9r.fsf@uwakimon.sk.tsukuba.ac.jp> <20101121173825.B1BFB235977@kimball.webabinitio.net> <60F8726F-C1C2-4803-8B8E-688EF0443FA0@gmail.com> <87eiadd46t.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: <4CED5E91.9070705@egenix.com>

Alexander Belopolsky wrote:
> To conclude, I feel that rather than trying to fully support non-BMP
> characters as surrogate pairs in narrow builds, we should make it
> easier for application developers to avoid them.

I don't understand what you're after here. Programmers can easily avoid them by not using them :-)

> If abandoning
> internal use of UTF-16 is not an option, I think we should at least
> add an option for decoders that currently produce surrogate pairs to
> treat non-BMP characters as errors and handle them according to user's
> choice.

But what do you gain by doing this? You'd lose the round-trip safety of those codecs and that's not a good thing.

Note that most text processing APIs in Python work based on code units, which in most cases represent single code points, but in some cases can also represent surrogates (both on UCS-2 and on UCS-4 builds).

E.g. str.center(n) centers the string in a padded string that is composed of n code units. Whether that operation will result in a text that's centered visually on output is a completely different story. The original string could contain surrogates, it could also contain combining code points, so the visual presentation of the result may very well not be centered at all; it may not even appear as having the length n to the user.

Since we're not going to change the semantics of those APIs, it is OK to not support padding with non-BMP code points on UCS-2 builds.
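
The point about combining code points can be seen with a small experiment. The sketch below is illustrative only: `visible_width` is a made-up helper that merely skips combining marks, which is a crude stand-in for real terminal width calculation.

```python
import unicodedata


def visible_width(text):
    """Count code points that are not combining marks.

    Hypothetical helper: a rough estimate of displayed columns, ignoring
    wide characters, control characters and similar complications.
    """
    return sum(1 for ch in text if not unicodedata.combining(ch))


word = "e\u0301"               # 'e' followed by COMBINING ACUTE ACCENT
padded = word.center(6, "*")   # center() counts code units, not columns

print(len(padded))             # the length that center() guarantees
print(visible_width(padded))   # roughly what a user would see
```

The two numbers differ, so the padded result has the requested length without necessarily looking centered.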
Supporting such cases would only cause problems: * if the methods would pad with surrogates, the resulting string would no longer have length n; breaking the assumption that len(str.center(n)) == n * if the methods would pad with half the number of surroagtes to make sure that len(str.center(n)) == n, the resulting output to e.g. a terminal would be further off, than what you already have with surrogates and combining code points in the original string. More on codecs supporting surrogates: http://mail.python.org/pipermail/python-dev/2008-July/080915.html Perhaps it's time to reconsider a project I once started but that never got off the ground: http://mail.python.org/pipermail/python-dev/2008-July/080911.html Here's the pre-PEP: http://mail.python.org/pipermail/python-dev/2001-July/015938.html -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Nov 24 2010) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ ::: Try our new mxODBC.Connect Python Database Interface for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. 
Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From brett at python.org Wed Nov 24 20:04:01 2010 From: brett at python.org (Brett Cannon) Date: Wed, 24 Nov 2010 11:04:01 -0800 Subject: [Python-Dev] [Python-checkins] r86720 - python/branches/py3k/Misc/ACKS In-Reply-To: <4CEC4917.2070508@udel.edu> References: <20101123203252.39BE7EE9CF@mail.python.org> <4CEC43A4.80907@netwok.org> <4CEC4917.2070508@udel.edu> Message-ID: On Tue, Nov 23, 2010 at 15:07, Terry Reedy wrote: > > > On 11/23/2010 5:43 PM, ?ric Araujo wrote: >>> >>> Modified: python/branches/py3k/Misc/ACKS >>> >>> ============================================================================== >>> --- python/branches/py3k/Misc/ACKS ? ? ?(original) >>> +++ python/branches/py3k/Misc/ACKS ? ? ?Tue Nov 23 21:32:47 2010 >>> @@ -1,4 +1,4 @@ >>> -Acknowledgements >>> +?Acknowledgements >> >> This change introduced a so-called UTF-8 BOM in the file. ?Is >> TortoiseSvn the culprit or a text editor? > > I used Notepad to edit the file, TortoiseSvn to commit, the same as I did > for #9222, rev86702, Lib\idlelib\IOBinding.py, yesterday. > If the latter is OK, perhaps *.py gets filtered better than misc. text > files. I believe I have the config as specified in dev/faq. Adding the BOM will be an editor thing, not a svn thing. Doing a Google search for [ms notepad bom] shows that Notepad did the "helpful", invisible edit. 
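
For reference, the invisible bytes in question are the UTF-8 encoding of U+FEFF, which the standard library exposes as `codecs.BOM_UTF8`. A minimal sketch of detecting and stripping it is shown here; the helper names are hypothetical and are not taken from any existing commit hook.

```python
import codecs


def has_utf8_bom(data):
    """Return True if the byte string starts with the UTF-8 'BOM'."""
    return data.startswith(codecs.BOM_UTF8)


def strip_utf8_bom(data):
    """Return the byte string without a leading UTF-8 'BOM', if present."""
    if has_utf8_bom(data):
        return data[len(codecs.BOM_UTF8):]
    return data


# Simulate the first line of a file saved by an editor that adds the mark.
saved = codecs.BOM_UTF8 + b"Acknowledgements\n"
print(has_utf8_bom(saved))
print(strip_utf8_bom(saved))
```

Reading a file with the "utf-8-sig" codec achieves the same effect at the text level, since that codec skips the mark when decoding.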
-Brett

>
> [miscellany]
> enable-auto-props = yes
>
> [auto-props]
> * = svn:eol-style=native
> *.c = svn:keywords=Id
> *.h = svn:keywords=Id
> *.py = svn:keywords=Id
> *.txt = svn:keywords=Author Date Id Revision
>
> Terry
>
> _______________________________________________
> Python-Dev mailing list
> Python-Dev at python.org
> http://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe:
> http://mail.python.org/mailman/options/python-dev/brett%40python.org
>

From tjreedy at udel.edu Wed Nov 24 20:25:17 2010
From: tjreedy at udel.edu (Terry Reedy)
Date: Wed, 24 Nov 2010 14:25:17 -0500
Subject: [Python-Dev] [Python-checkins] r86720 - python/branches/py3k/Misc/ACKS
In-Reply-To:
References: <20101123203252.39BE7EE9CF@mail.python.org> <4CEC43A4.80907@netwok.org> <4CEC4917.2070508@udel.edu>
Message-ID:

On 11/24/2010 2:04 PM, Brett Cannon wrote:
> On Tue, Nov 23, 2010 at 15:07, Terry Reedy wrote:
>> I used Notepad to edit the file, TortoiseSvn to commit, the same as I did
>> for #9222, rev86702, Lib\idlelib\IOBinding.py, yesterday.
>> If the latter is OK, perhaps *.py gets filtered better than misc. text
>> files. I believe I have the config as specified in dev/faq.
>
> Adding the BOM will be an editor thing, not a svn thing. Doing a
> Google search for [ms notepad bom] shows that Notepad did the
> "helpful", invisible edit.

So I presume it did the same with IOBinding.py. Does *.py get filtered in a way that could be extended to no-extension files? Do *.txt files get BOM filtered off? Should all text files in repository have some extension (default .txt)?

More to the point, can better filtering be added to the new hg repository? Or can a local Windows hg setup have such filtering on local commits before pushing?

I know now that I could always edit with IDLE's editor, but it is a lot easier to right click and select edit than it is to run through the directory tree in an open dialog.
And of course, since the pseudo-BOM addition is undocumented within notepad itself, and probably other editors, it is easy to not know. -- Terry Jan Reedy From g.brandl at gmx.net Wed Nov 24 21:04:40 2010 From: g.brandl at gmx.net (Georg Brandl) Date: Wed, 24 Nov 2010 21:04:40 +0100 Subject: [Python-Dev] [Python-checkins] r86720 - python/branches/py3k/Misc/ACKS In-Reply-To: References: <20101123203252.39BE7EE9CF@mail.python.org> <4CEC43A4.80907@netwok.org> <4CEC4917.2070508@udel.edu> Message-ID: Am 24.11.2010 20:25, schrieb Terry Reedy: > On 11/24/2010 2:04 PM, Brett Cannon wrote: >> On Tue, Nov 23, 2010 at 15:07, Terry Reedy wrote: > >>> I used Notepad to edit the file, TortoiseSvn to commit, the same as I did >>> for #9222, rev86702, Lib\idlelib\IOBinding.py, yesterday. >>> If the latter is OK, perhaps *.py gets filtered better than misc. text >>> files. I believe I have the config as specified in dev/faq. >> >> Adding the BOM will be an editor thing, not a svn thing. Doing a >> Google search for [ms notepad bom] shows that Notepad did the >> "helpful", invisible edit. > > So I presume it did the same with IOBinding.py. Does *.py get filtered > is a way that could be extended to no-extention files? Do *.txt files > get BOM filtered off? Should all text files in repository have some > extension (default .txt)? > > More to the point, can better filtering be added to the new hg > repository? Or can a local Windows hg setup have such filtering on local > commits before pushing? Of course it can; it's just a matter of writing the respective hooks. What we *can* do in any case is to check for UTF-8 "BOMs" server-side in the whitespace checking hook. > I know now that I could always edit with IDLE's editor, but it is a lot > easier to right click and select edit than it is to run thru the > directory tree in an open dialog. And of course, since the pseudo-BOM > addition is undocumented within notepad itself, and probably other > editors, it is easy to not know. 
It should show up as an invisible change in the first line of a file when you look at a "svn diff". (It is a very good practice to look at a diff before committing anyway.) Georg From alexander.belopolsky at gmail.com Wed Nov 24 21:06:25 2010 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Wed, 24 Nov 2010 15:06:25 -0500 Subject: [Python-Dev] len(chr(i)) = 2? In-Reply-To: <4CED5E91.9070705@egenix.com> References: <201011192123.14169.victor.stinner@haypocalc.com> <4CE6F93F.9010109@egenix.com> <4CE6FE30.1050903@v.loewis.de> <87hbfc1vnf.fsf@uwakimon.sk.tsukuba.ac.jp> <4CE78F62.7060707@v.loewis.de> <8739qukf9r.fsf@uwakimon.sk.tsukuba.ac.jp> <20101121173825.B1BFB235977@kimball.webabinitio.net> <60F8726F-C1C2-4803-8B8E-688EF0443FA0@gmail.com> <87eiadd46t.fsf@uwakimon.sk.tsukuba.ac.jp> <4CED5E91.9070705@egenix.com> Message-ID: On Wed, Nov 24, 2010 at 1:50 PM, M.-A. Lemburg wrote: .. >> add an option for decoders that currently produce surrogate pairs to >> treat non-BMP characters as errors and handle them according to user's >> choice. > > But what do you gain by doing this ? You'd lose the round-trip > safety of those codecs and that's not a good thing. > Any non-trivial text processing is likely to be broken in presence of surrogates. Producing them on input is just trading known issue for an unknown one. Processing surrogate pairs in python code is hard. Software that has to support non-BMP characters will most likely be written for a wide build and contain subtle bugs when run under a narrow build. Note that my latest proposal does not abolish surrogates outright. Users who want them can still use something like "surrogateescape" error handler for non-BMP characters. > Since we're not going change the semantics of those APIs, > it is OK to not support padding with non-BMP code points on > UCS-2 builds. 
> Well, I think more users are willing to accept slightly misaligned text in their web-app logs than those willing to cope with Traceback (most recent call last): ... TypeError: The fill character must be exactly one character long there. Yes, allowing non-trusted users to specify fill character is unlikely, but it is quite likely that naive slicing or iteration over string units would result in Traceback (most recent call last): ... UnicodeEncodeError: 'utf-8' codec can't encode character '\ud800' in position 0: surrogates not allowed > Supporting such cases would only cause problems: > > * if the methods would pad with surrogates, the resulting > ?string would no longer have length n; breaking the > ?assumption that len(str.center(n)) == n > I agree, but how is this different from breaking the assumption that len(chr(i)) == 1? > * if the methods would pad with half the number of surroagtes > ?to make sure that len(str.center(n)) == n, the resulting > ?output to e.g. a terminal would be further off, than what > ?you already have with surrogates and combining code points > ?in the original string. > I agree again. What I suggested on the tracker, supporting non-BMP characters in narrow builds should mean that library functions given input with the same UCS-4 encoding should produce output with the same UCS-4 encoding. > Perhaps it's time to reconsider a project I once started > but that never got off the ground: > > ?http://mail.python.org/pipermail/python-dev/2008-July/080911.html > > Here's the pre-PEP: > > ?http://mail.python.org/pipermail/python-dev/2001-July/015938.html I agree again, but I feel that exposing code units rather than code points at the Python string level takes us back to 2.x days of mixing bytes and strings. Let me quote Guido circa 2001 again: """ ... if we had wanted to use a variable-lenth internal representation, we should have picked UTF-8 way back, like Perl did. 
Moving to a UTF-16-based internal representation now will give us all the problems of the Perl choice without any of the benefits. """ I don't understand what changed since 2001 that made this argument invalid. I note that an opinion has been raised on this thread that if we want compressed internal representation for strings, we should use UTF-8. I tend to agree, but UTF-8 has been repeatedly rejected as too hard to implement. What makes UTF-16 easier than UTF-8? Only the fact that you can ignore bugs longer, in my view. From g.brandl at gmx.net Wed Nov 24 21:24:49 2010 From: g.brandl at gmx.net (Georg Brandl) Date: Wed, 24 Nov 2010 21:24:49 +0100 Subject: [Python-Dev] [Preview] Comments and change proposals on documentation Message-ID: Hi, at , you can look at a version of the 3.2 docs that has the upcoming commenting feature. JavaScript is mandatory. I've switched on anonymous comments for testing, but usually at least comments from anonymous users can be moderated. Be sure to test the "propose a change" feature too. Login currently allows OpenID exclusively. Credits go to Jacob Mason, whose GSOC project is responsible for almost all of what you see there. [1] Please test on a smaller page, such as , there is currently a speed issue with larger pages. (Helpful tips from JS experts are welcome.) Other things I have to do before this can go live: * reuse existing logins from either wiki or tracker? * (re)Captcha integration for anonymous comments * easier moderation (currently emails are sent on new comments) * facility for (semi)automatic applying of proposals (once Hg is live, this should be easy to do due to the separation between commit and merge) * allow commenting on code blocks (figure out where to place the "bubble") Any feedback is appreciated (I'd suggest mailing it to doc-SIG only, to avoid cluttering up python-dev). Have fun, Georg [1] The source for the webapp is at , but most of the functionality is implemented in Sphinx trunk. 
From anurag.chourasia at gmail.com Wed Nov 24 22:01:32 2010 From: anurag.chourasia at gmail.com (Anurag Chourasia) Date: Thu, 25 Nov 2010 02:31:32 +0530 Subject: [Python-Dev] collect2: library libpython2.6 not found while building extensions (--enable-shared) Message-ID: All, When I configure python to enable shared libraries, none of the extensions are getting built during the make step due to this error. building 'cStringIO' extension gcc -pthread -fno-strict-aliasing -g -O2 -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -I. -I/u01/home/apli/wm/GDD/Python-2.6.6/./Include -I. -IInclude -I./Include -I/opt/freeware/include -I/opt/freeware/include/readline -I/opt/freeware/include/ncurses -I/usr/local/include -I/u01/home/apli/wm/GDD/Python-2.6.6/Include -I/u01/home/apli/wm/GDD/Python-2.6.6 -c /u01/home/apli/wm/GDD/Python-2.6.6/Modules/cStringIO.c -o build/temp.aix-5.3-2.6/u01/home/apli/wm/GDD/Python-2.6.6/Modules/cStringIO.o ./Modules/ld_so_aix gcc -pthread -bI:Modules/python.exp build/temp.aix-5.3-2.6/u01/home/apli/wm/GDD/Python-2.6.6/Modules/cStringIO.o -L/usr/local/lib *-lpython2.6* -o build/lib.aix-5.3-2.6/cStringIO.so *collect2: library libpython2.6 not found* building 'cPickle' extension gcc -pthread -fno-strict-aliasing -g -O2 -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -I. -I/u01/home/apli/wm/GDD/Python-2.6.6/./Include -I. 
-IInclude -I./Include -I/opt/freeware/include -I/opt/freeware/include/readline -I/opt/freeware/include/ncurses -I/usr/local/include -I/u01/home/apli/wm/GDD/Python-2.6.6/Include -I/u01/home/apli/wm/GDD/Python-2.6.6 -c /u01/home/apli/wm/GDD/Python-2.6.6/Modules/cPickle.c -o build/temp.aix-5.3-2.6/u01/home/apli/wm/GDD/Python-2.6.6/Modules/cPickle.o ./Modules/ld_so_aix gcc -pthread -bI:Modules/python.exp build/temp.aix-5.3-2.6/u01/home/apli/wm/GDD/Python-2.6.6/Modules/cPickle.o -L/usr/local/lib *-lpython2.6* -o build/lib.aix-5.3-2.6/cPickle.so *collect2: library libpython2.6 not found* This is on AIX 5.3, GCC 4.2, Python 2.6.6 I can confirm that there is a libpython2.6.a file in the top level directory from where I am doing the configure/make etc Here are the options supplied to the configure command ./configure --enable-shared --disable-ipv6 --with-gcc=gcc CPPFLAGS="-I /opt/freeware/include -I /opt/freeware/include/readline -I /opt/freeware/include/ncurses" Please guide me in getting past this error. Thanks for your help on this. Regards, Anurag -------------- next part -------------- An HTML attachment was scrubbed... URL: From martin at v.loewis.de Wed Nov 24 23:13:50 2010 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Wed, 24 Nov 2010 23:13:50 +0100 Subject: [Python-Dev] [Python-checkins] r86720 - python/branches/py3k/Misc/ACKS In-Reply-To: References: <20101123203252.39BE7EE9CF@mail.python.org> <4CEC43A4.80907@netwok.org> <4CEC4917.2070508@udel.edu> Message-ID: <4CED8E1E.5050400@v.loewis.de> > So I presume it did the same with IOBinding.py. No. This file contains only ASCII characters, so notepad has decided to not add the BOM. 
Regards, Martin From dreamingforward at gmail.com Thu Nov 25 00:38:01 2010 From: dreamingforward at gmail.com (average) Date: Wed, 24 Nov 2010 16:38:01 -0700 Subject: [Python-Dev] constant/enum type in stdlib In-Reply-To: References: <20101121034404.52924F20A@mail.python.org> <4CE9BF4A.1020302@netwok.org> <4CEA89E8.5090107@voidspace.org.uk> <20101122163722.7e96d123@pitrou.net> <4CEA9584.7040301@avl.com> <20101122172440.77d27ed5@pitrou.net> <20101122164654.2109.588145158.divmod.xquotient.165@localhost.localdomain> <4CEBC6BD.9060402@voidspace.org.uk> <4CED0557.9090101@voidspace.org.uk> Message-ID: Is immutability a general need that should have general solution? By generalizing the idea to lists/tuples, set/frozenset, dicts, and strings (for example), it seems one could simplify the container classes, eliminate code complexity, and perhaps improve resource utilization. mark -------------- next part -------------- An HTML attachment was scrubbed... URL: From solipsis at pitrou.net Thu Nov 25 00:41:58 2010 From: solipsis at pitrou.net (Antoine Pitrou) Date: Thu, 25 Nov 2010 00:41:58 +0100 Subject: [Python-Dev] r86731 - in python/branches/py3k: Lib/distutils/command/install.py Lib/distutils/sysconfig.py Lib/sysconfig.py Makefile.pre.in Misc/python.pc.in configure configure.in References: <20101124194347.C5C86EEA56@mail.python.org> Message-ID: <20101125004158.32b1ceaa@pitrou.net> On Wed, 24 Nov 2010 20:43:47 +0100 (CET) barry.warsaw wrote: > Author: barry.warsaw > Date: Wed Nov 24 20:43:47 2010 > New Revision: 86731 > > Log: > Final patch for issue 9807. 
This seems to have broken compilation under Windows: Build started: Project: ssl, Configuration: Debug|Win32 Performing Makefile project actions Traceback (most recent call last): File "d:\cygwin\home\db3l\buildarea\3.x.bolen-windows\build\lib\site.py", line 519, in main() File "d:\cygwin\home\db3l\buildarea\3.x.bolen-windows\build\lib\site.py", line 507, in main known_paths = addusersitepackages(known_paths) File "d:\cygwin\home\db3l\buildarea\3.x.bolen-windows\build\lib\site.py", line 253, in addusersitepackages user_site = getusersitepackages() File "d:\cygwin\home\db3l\buildarea\3.x.bolen-windows\build\lib\site.py", line 228, in getusersitepackages user_base = getuserbase() # this will also set USER_BASE File "d:\cygwin\home\db3l\buildarea\3.x.bolen-windows\build\lib\site.py", line 218, in getuserbase USER_BASE = get_config_var('userbase') File "d:\cygwin\home\db3l\buildarea\3.x.bolen-windows\build\lib\sysconfig.py", line 586, in get_config_var return get_config_vars().get(name) File "d:\cygwin\home\db3l\buildarea\3.x.bolen-windows\build\lib\sysconfig.py", line 478, in get_config_vars _CONFIG_VARS['abiflags'] = sys.abiflags AttributeError: 'module' object has no attribute 'abiflags' Regards Antoine. From barry at python.org Thu Nov 25 00:50:25 2010 From: barry at python.org (Barry Warsaw) Date: Wed, 24 Nov 2010 18:50:25 -0500 Subject: [Python-Dev] r86731 - in python/branches/py3k: Lib/distutils/command/install.py Lib/distutils/sysconfig.py Lib/sysconfig.py Makefile.pre.in Misc/python.pc.in configure configure.in In-Reply-To: <20101125004158.32b1ceaa@pitrou.net> References: <20101124194347.C5C86EEA56@mail.python.org> <20101125004158.32b1ceaa@pitrou.net> Message-ID: <20101124185025.6cb67127@mission> On Nov 25, 2010, at 12:41 AM, Antoine Pitrou wrote: >On Wed, 24 Nov 2010 20:43:47 +0100 (CET) >barry.warsaw wrote: >> Author: barry.warsaw >> Date: Wed Nov 24 20:43:47 2010 >> New Revision: 86731 >> >> Log: >> Final patch for issue 9807. 
> >This seems to have broken compilation under Windows: > >Build started: Project: ssl, Configuration: Debug|Win32 >Performing Makefile project actions >Traceback (most recent call last): > File "d:\cygwin\home\db3l\buildarea\3.x.bolen-windows\build\lib\site.py", line 519, in > main() > File "d:\cygwin\home\db3l\buildarea\3.x.bolen-windows\build\lib\site.py", line 507, in main > known_paths = addusersitepackages(known_paths) > File "d:\cygwin\home\db3l\buildarea\3.x.bolen-windows\build\lib\site.py", line 253, in addusersitepackages > user_site = getusersitepackages() > File "d:\cygwin\home\db3l\buildarea\3.x.bolen-windows\build\lib\site.py", line 228, in getusersitepackages > user_base = getuserbase() # this will also set USER_BASE > File "d:\cygwin\home\db3l\buildarea\3.x.bolen-windows\build\lib\site.py", line 218, in getuserbase > USER_BASE = get_config_var('userbase') > File "d:\cygwin\home\db3l\buildarea\3.x.bolen-windows\build\lib\sysconfig.py", line 586, in get_config_var > return get_config_vars().get(name) > File "d:\cygwin\home\db3l\buildarea\3.x.bolen-windows\build\lib\sysconfig.py", line 478, in get_config_vars > _CONFIG_VARS['abiflags'] = sys.abiflags >AttributeError: 'module' object has no attribute 'abiflags' As discussed on IRC, _CONFIG_VARS['abiflags'] = '' if sys.abiflags is not defined. Amaury is going to test that. -Barry -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 836 bytes Desc: not available URL: From greg.ewing at canterbury.ac.nz Thu Nov 25 01:19:37 2010 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Thu, 25 Nov 2010 13:19:37 +1300 Subject: [Python-Dev] len(chr(i)) = 2? 
In-Reply-To: <77AAC178-F868-4F05-8509-4A9FB66F61EC@fuhm.net> References: <201011192123.14169.victor.stinner@haypocalc.com> <4CE6F93F.9010109@egenix.com> <4CE6FE30.1050903@v.loewis.de> <87hbfc1vnf.fsf@uwakimon.sk.tsukuba.ac.jp> <4CE78F62.7060707@v.loewis.de> <8739qukf9r.fsf@uwakimon.sk.tsukuba.ac.jp> <20101121173825.B1BFB235977@kimball.webabinitio.net> <60F8726F-C1C2-4803-8B8E-688EF0443FA0@gmail.com> <87eiadd46t.fsf@uwakimon.sk.tsukuba.ac.jp> <4CEC5316.4010608@canterbury.ac.nz> <77AAC178-F868-4F05-8509-4A9FB66F61EC@fuhm.net> Message-ID: <4CEDAB99.2000005@canterbury.ac.nz> On 24/11/10 13:22, James Y Knight wrote: > Instead, provide bidirectional iterators which can traverse the string by byte, > codepoint, or by grapheme Maybe it would be a good idea to add some iterators like this to Python. (Or has the time machine beaten me there?) -- Greg From stephen at xemacs.org Thu Nov 25 03:17:44 2010 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Thu, 25 Nov 2010 11:17:44 +0900 Subject: [Python-Dev] len(chr(i)) = 2? In-Reply-To: References: <201011192123.14169.victor.stinner@haypocalc.com> <4CE6F93F.9010109@egenix.com> <4CE6FE30.1050903@v.loewis.de> <87hbfc1vnf.fsf@uwakimon.sk.tsukuba.ac.jp> <4CE78F62.7060707@v.loewis.de> <8739qukf9r.fsf@uwakimon.sk.tsukuba.ac.jp> <20101121173825.B1BFB235977@kimball.webabinitio.net> <60F8726F-C1C2-4803-8B8E-688EF0443FA0@gmail.com> <87eiadd46t.fsf@uwakimon.sk.tsukuba.ac.jp> <4CED5E91.9070705@egenix.com> Message-ID: <87bp5eb0zb.fsf@uwakimon.sk.tsukuba.ac.jp> Alexander Belopolsky writes: > Any non-trivial text processing is likely to be broken in presence of > surrogates. If you're worried about this, write a UCS-2-producing codec that rejects surrogates or stuffs them into the private zone of the BMP. Maybe such a codec should be default, but so far nobody seems to want one enough; they want UTF-16 even though they know it's wrong. 
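
As a rough idea of what such a strict decoder could do, here is a sketch written as a plain function rather than a registered codec. The name `decode_bmp_only` and the choice of ValueError are assumptions made for illustration; stuffing the offending characters into the private use area instead would be the other option mentioned above.

```python
def decode_bmp_only(data, encoding="utf-8"):
    """Decode bytes and reject anything that is not a plain BMP character.

    Hypothetical helper, not an existing codec.  Non-BMP code points (as
    seen on a wide build) and surrogate code units (as seen on a narrow
    build) are both treated as errors.
    """
    text = data.decode(encoding)
    for index, ch in enumerate(text):
        cp = ord(ch)
        if cp > 0xFFFF or 0xD800 <= cp <= 0xDFFF:
            raise ValueError(
                "non-BMP character or surrogate at index %d: U+%04X"
                % (index, cp)
            )
    return text


print(decode_bmp_only("h\u00e9llo".encode("utf-8")))

try:
    decode_bmp_only("\U00012345".encode("utf-8"))
except ValueError as exc:
    print("rejected:", exc)
```

A real codec would more likely hook into the error handler machinery so that callers could choose between raising, replacing and ignoring.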
One of the things that makes the 16-bit code unit attractive to me is that the options for working around the variable-width nature of UTF-16 (without actually implementing conformance to UTF-16 in internal operations!) are many. If you use octets as code units, you don't have such options: you have to do it right. > Processing surrogate pairs in python code is hard. Sure, but as James Knight and MAL point out, so is processing compose characters, and those errors will go undetected in your proposals, even with a strict UCS-2 definition. What can you do? Banning composing characters isn't going to fly! > Yes, allowing non-trusted users to specify fill character is unlikely, > but it is quite likely that naive slicing or iteration over string > units would result in > > Traceback (most recent call last): Naive slicing yes, but naive iteration (ie, iteration that consumes the whole string, or up to a known character, rather than up to a specified position) is highly unlikely to result in such a traceback. It is precisely that property (non-BMP characters get passed through unchanged, or ignored) that makes extension to non-BMP code points attractive. > I agree again, but I feel that exposing code units rather than code > points at the Python string level takes us back to 2.x days of mixing > bytes and strings. It does, but there's a difference. With bytes as UTF-8, only ASCII values have defined semantics in Unicode. The rest have semantics that is context-dependent, and they are frequent in any non-English processing and many English use cases (math symbols, correctly- oriented punctuation). With 16-bit code units, all values have well- defined semantics in Unicode, and non-characters are going to be extremely rare in the vast majority of use cases. IOW, you can think of Python as a UCS-2 device processing characters, and let surrounding UTF-16 processors deal with the errors. > Let me quote Guido circa 2001 again: > > """ > ... 
if we had wanted to use a > variable-lenth internal representation, we should have picked UTF-8 > way back, like Perl did. Moving to a UTF-16-based internal > representation now will give us all the problems of the Perl choice > without any of the benefits. > """ > > I don't understand what changed since 2001 that made this argument > invalid. Nothing. The internal representation of Python is UCS-2, not UTF-16. People who want to think otherwise are kidding themselves. The presence of surrogates is not sufficient to call something UTF-16. Preserving the Unicode code points through any builtin operations is a necessary condition, and Python doesn't do that. *However*, in my opinion, it's not a big deal to allow surrogates in UCS-2 a la ISO 10646-1:1996. That lets people who want a quick and dirty way to handle BMP text that *might* (but usually won't) contain some non-BMP characters go a long way fast. "Although practicality beats purity." > I note that an opinion has been raised on this thread that > if we want compressed internal representation for strings, we should > use UTF-8. I tend to agree, but UTF-8 has been repeatedly rejected as > too hard to implement. What makes UTF-16 easier than UTF-8? Only the > fact that you can ignore bugs longer, in my view. That's mostly true. My guess is that we can probably ignore those bugs for as long as it takes someone to write the higher-level libraries that James suggests and MAL has actually proposed and started a PEP for. From greg.ewing at canterbury.ac.nz Thu Nov 25 03:35:50 2010 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Thu, 25 Nov 2010 15:35:50 +1300 Subject: [Python-Dev] len(chr(i)) = 2? 
In-Reply-To: <87mxozayam.fsf@uwakimon.sk.tsukuba.ac.jp> References: <201011192123.14169.victor.stinner@haypocalc.com> <4CE6F93F.9010109@egenix.com> <4CE6FE30.1050903@v.loewis.de> <87hbfc1vnf.fsf@uwakimon.sk.tsukuba.ac.jp> <4CE78F62.7060707@v.loewis.de> <8739qukf9r.fsf@uwakimon.sk.tsukuba.ac.jp> <20101121173825.B1BFB235977@kimball.webabinitio.net> <60F8726F-C1C2-4803-8B8E-688EF0443FA0@gmail.com> <87eiadd46t.fsf@uwakimon.sk.tsukuba.ac.jp> <4CEC5316.4010608@canterbury.ac.nz> <77AAC178-F868-4F05-8509-4A9FB66F61EC@fuhm.net> <87sjyrbftz.fsf@uwakimon.sk.tsukuba.ac.jp> <635C265A-90A8-4B92-A65C-59EF3E8EFD68@twistedmatrix.com> <87oc9fb97b.fsf@uwakimon.sk.tsukuba.ac.jp> <3C1ADB64-63F3-4165-926D-EDE9846E0DBD@fuhm.net> <87mxozayam.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <4CEDCB86.9030506@canterbury.ac.nz> On 24/11/10 22:03, Stephen J. Turnbull wrote: > But > if you actually need to remember positions, or regions, to jump to > later or to communicate to other code that manipulates them, doing > this stuff the straightforward way (just copying the whole iterator > object to hang on to its state) becomes expensive. If the internal representation of a text pointer (I won't call it an iterator because that means something else in Python) is a byte offset or something similar, it shouldn't take up any more space than a Python int, which is what you'd be using anyway if you represented text positions by grapheme indexes or whatever. If you want the text pointer to also remember which string it points into, it'll be a bit bigger, but again, no bigger than you would need to get the same functionality using a grapheme index plus a reference to the original string. Probably smaller, because it would all be encapsulated in one object. So I don't really see what you're arguing for here. How do *you* think positions in unicode strings should be represented? 
-- Greg From greg.ewing at canterbury.ac.nz Thu Nov 25 04:19:33 2010 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Thu, 25 Nov 2010 16:19:33 +1300 Subject: [Python-Dev] len(chr(i)) = 2? In-Reply-To: References: <201011192123.14169.victor.stinner@haypocalc.com> <4CE6F93F.9010109@egenix.com> <4CE6FE30.1050903@v.loewis.de> <87hbfc1vnf.fsf@uwakimon.sk.tsukuba.ac.jp> <4CE78F62.7060707@v.loewis.de> <8739qukf9r.fsf@uwakimon.sk.tsukuba.ac.jp> <20101121173825.B1BFB235977@kimball.webabinitio.net> <60F8726F-C1C2-4803-8B8E-688EF0443FA0@gmail.com> <87eiadd46t.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <4CEDD5C5.9050306@canterbury.ac.nz> On 25/11/10 06:37, Alexander Belopolsky wrote: > I don't think there is a recipe on how to fix legacy > character-by-character processing loop such as > > for c in string: > ... > > to make it iterate over code points consistently in wide and narrow > builds. A couple of possibilities: 1) Make things so that 'for c in string' does actually iterate over characters rather than code units. This could break existing code, though. 2) Provide some things like for c in string.chars(): ... for c in string.graphemes(): ... where chars() and graphemes() return appropriate iterators. (Or possibly iterable views, but that would raise the expectation that the views could also be randomly indexed by char or grapheme, which we probably wouldn't want to support.) 
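As a rough sketch of what a chars()-style iterator could do, assuming a plain function rather than a str method (the name `iter_code_points` is made up for this example): it walks the string item by item and joins each high/low surrogate pair into one code point, so the loop body sees the same items whether or not the string stores non-BMP characters as pairs. Lone surrogates are passed through unchanged. Grapheme iteration is a much bigger job and is not attempted here.

```python
def iter_code_points(s):
    """Yield one string per code point, joining surrogate pairs."""
    pending = None  # a high surrogate waiting for its low partner
    for unit in s:
        if pending is not None:
            if "\udc00" <= unit <= "\udfff":
                # Combine the pair into a single code point.
                cp = (0x10000
                      + ((ord(pending) - 0xD800) << 10)
                      + (ord(unit) - 0xDC00))
                yield chr(cp)
                pending = None
                continue
            # Lone high surrogate: pass it through unchanged.
            yield pending
            pending = None
        if "\ud800" <= unit <= "\udbff":
            pending = unit
        else:
            yield unit
    if pending is not None:
        # The string ended with an unpaired high surrogate.
        yield pending
```

A legacy loop would then be written as `for c in iter_code_points(string): ...` instead of `for c in string: ...`.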
-- Greg From greg.ewing at canterbury.ac.nz Thu Nov 25 04:46:53 2010 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Thu, 25 Nov 2010 16:46:53 +1300 Subject: [Python-Dev] constant/enum type in stdlib In-Reply-To: References: <20101121034404.52924F20A@mail.python.org> <4CE9BF4A.1020302@netwok.org> <4CEA89E8.5090107@voidspace.org.uk> <20101122163722.7e96d123@pitrou.net> <4CEA9584.7040301@avl.com> <20101122172440.77d27ed5@pitrou.net> <20101122164654.2109.588145158.divmod.xquotient.165@localhost.localdomain> <4CEBC6BD.9060402@voidspace.org.uk> <4CED0557.9090101@voidspace.org.uk> Message-ID: <4CEDDC2D.204@canterbury.ac.nz> On 25/11/10 12:38, average wrote: > Is immutability a general need that should have general solution? I don't think it really generalizes. Tuples are not just frozen lists, for example -- they have a different internal structure that's more efficient to create and access. -- Greg From stephen at xemacs.org Thu Nov 25 04:55:40 2010 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Thu, 25 Nov 2010 12:55:40 +0900 Subject: [Python-Dev] len(chr(i)) = 2? In-Reply-To: <4CEDCB86.9030506@canterbury.ac.nz> References: <201011192123.14169.victor.stinner@haypocalc.com> <4CE6F93F.9010109@egenix.com> <4CE6FE30.1050903@v.loewis.de> <87hbfc1vnf.fsf@uwakimon.sk.tsukuba.ac.jp> <4CE78F62.7060707@v.loewis.de> <8739qukf9r.fsf@uwakimon.sk.tsukuba.ac.jp> <20101121173825.B1BFB235977@kimball.webabinitio.net> <60F8726F-C1C2-4803-8B8E-688EF0443FA0@gmail.com> <87eiadd46t.fsf@uwakimon.sk.tsukuba.ac.jp> <4CEC5316.4010608@canterbury.ac.nz> <77AAC178-F868-4F05-8509-4A9FB66F61EC@fuhm.net> <87sjyrbftz.fsf@uwakimon.sk.tsukuba.ac.jp> <635C265A-90A8-4B92-A65C-59EF3E8EFD68@twistedmatrix.com> <87oc9fb97b.fsf@uwakimon.sk.tsukuba.ac.jp> <3C1ADB64-63F3-4165-926D-EDE9846E0DBD@fuhm.net> <87mxozayam.fsf@uwakimon.sk.tsukuba.ac.jp> <4CEDCB86.9030506@canterbury.ac.nz> Message-ID: <87ipzm6oqr.fsf@uwakimon.sk.tsukuba.ac.jp> Greg Ewing writes: > On 24/11/10 22:03, Stephen J. 
Turnbull wrote: > > But > > if you actually need to remember positions, or regions, to jump to > > later or to communicate to other code that manipulates them, doing > > this stuff the straightforward way (just copying the whole iterator > > object to hang on to its state) becomes expensive. > > If the internal representation of a text pointer (I won't call it > an iterator because that means something else in Python) is a byte > offset or something similar, it shouldn't take up any more space > than a Python int, which is what you'd be using anyway if you > represented text positions by grapheme indexes or whatever. That's not necessarily true. Eg, in Emacs ("there you go again"), Lisp integers are not only immediate (saving one pointer), but the type is encoded in the lower bits, so that there is no need for a type pointer -- the representation is smaller than the opaque marker type. Altogether, up to 8 of 12 bytes saved on a 32-bit platform, or 16 of 24 bytes on a 64-bit platform. In Python it's true that markers can use the same data structure as integers and simply provide different methods, and it's arguable that Python's design is better. But if you use bytes internally, then you have problems. Do you expose that byte value to the user? Can users (programmers using the language and end users) specify positions in terms of byte values? If so, what do you do if the user specifies a byte value that points into a multibyte character? What if the user wants to specify position by number of characters? Can you translate efficiently? As I say elsewhere, it's possible that there really never is a need to efficiently specify an absolute position in a large text as a character (grapheme, whatever) count. But I think it would be hard to implement an efficient text-processing *language*, eg, a Python module for *full conformance* in handling Unicode, on top of UTF-8. 
Any time you have an algorithm that requires efficient access to arbitrary text positions, you'll spend all your skull sweat fighting the representation. At least, that's been my experience with Emacsen.

> So I don't really see what you're arguing for here. How do
> *you* think positions in unicode strings should be represented?

I think what users should see is character positions, and they should be able to specify them numerically as well as via an opaque marker object. I don't care whether that position is represented as bytes or characters internally, except that the experience of Emacsen is that representation as byte positions is both inefficient and fragile. The representation as character positions is more robust but slightly more inefficient.

From alexander.belopolsky at gmail.com Thu Nov 25 05:37:33 2010
From: alexander.belopolsky at gmail.com (Alexander Belopolsky)
Date: Wed, 24 Nov 2010 23:37:33 -0500
Subject: [Python-Dev] len(chr(i)) = 2?
In-Reply-To: <87bp5eb0zb.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <201011192123.14169.victor.stinner@haypocalc.com> <4CE6F93F.9010109@egenix.com> <4CE6FE30.1050903@v.loewis.de> <87hbfc1vnf.fsf@uwakimon.sk.tsukuba.ac.jp> <4CE78F62.7060707@v.loewis.de> <8739qukf9r.fsf@uwakimon.sk.tsukuba.ac.jp> <20101121173825.B1BFB235977@kimball.webabinitio.net> <60F8726F-C1C2-4803-8B8E-688EF0443FA0@gmail.com> <87eiadd46t.fsf@uwakimon.sk.tsukuba.ac.jp> <4CED5E91.9070705@egenix.com> <87bp5eb0zb.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID:

On Wed, Nov 24, 2010 at 9:17 PM, Stephen J. Turnbull wrote:
..
> > I note that an opinion has been raised on this thread that
> > if we want compressed internal representation for strings, we should
> > use UTF-8. I tend to agree, but UTF-8 has been repeatedly rejected as
> > too hard to implement. What makes UTF-16 easier than UTF-8? Only the
> > fact that you can ignore bugs longer, in my view.
>
> That's mostly true.
> My guess is that we can probably ignore those
> bugs for as long as it takes someone to write the higher-level
> libraries that James suggests and MAL has actually proposed and
> started a PEP for.
>

As far as I can tell, that PEP generated grand total of one comment in nine years. This may or may not be indicative of how far away we are from seeing it implemented. :-)

As far as UTF-8 vs. UCS-2/4 debate, I have an idea that may be even more far fetched. Once upon a time, Python Unicode strings supported buffer protocol and would lazily fill an internal buffer with bytes in the default encoding. In 3.x the default encoding has been fixed as UTF-8, buffer protocol support was removed from strings, but the internal buffer caching (now UTF-8) encoded representation remained. Maybe we can now implement defenc logic in reverse. Recall that strings are stored as UCS-2/4 sequences, but once buffer is requested in 2.x Python code or char* is obtained via _PyUnicode_AsStringAndSize() at the C level in 3.x, an internal buffer is filled with UTF-8 bytes and defenc is set to point to that buffer.

So the idea is for strings to store their data as UTF-8 buffer pointed by defenc upon construction. If an application uses string indexing, UTF-8 only strings will lazily fill their UCS-2/4 buffer. Proper, Unicode-aware algorithms such as grapheme, word or line iteration or simple operations such as concatenation, search or substitution would operate directly on defenc buffers. Presumably over time fewer and fewer applications would use code unit indexing that require UCS-2/4 buffer and eventually Python strings can stop supporting indexing altogether just like they stopped supporting the buffer protocol in 3.x.
From tjreedy at udel.edu Thu Nov 25 06:22:01 2010
From: tjreedy at udel.edu (Terry Reedy)
Date: Thu, 25 Nov 2010 00:22:01 -0500
Subject: [Python-Dev] [Python-checkins] r86720 - python/branches/py3k/Misc/ACKS
In-Reply-To:
References: <20101123203252.39BE7EE9CF@mail.python.org> <4CEC43A4.80907@netwok.org> <4CEC4917.2070508@udel.edu>
Message-ID:

On 11/24/2010 3:04 PM, Georg Brandl wrote:

>>> Adding the BOM will be an editor thing, not a svn thing. Doing a

> It should show up as an invisible change in the first line of a file when you
> look at a "svn diff". (It is a very good practice to look at a diff before
> committing anyway.)

It does show up, and yes I agree. That should be in dev/faq if not already

--
Terry Jan Reedy

From tjreedy at udel.edu Thu Nov 25 06:23:27 2010
From: tjreedy at udel.edu (Terry Reedy)
Date: Thu, 25 Nov 2010 00:23:27 -0500
Subject: [Python-Dev] [Python-checkins] r86720 - python/branches/py3k/Misc/ACKS
In-Reply-To: <4CED8E1E.5050400@v.loewis.de>
References: <20101123203252.39BE7EE9CF@mail.python.org> <4CEC43A4.80907@netwok.org> <4CEC4917.2070508@udel.edu> <4CED8E1E.5050400@v.loewis.de>
Message-ID:

On 11/24/2010 5:13 PM, "Martin v. Löwis" wrote:
>> So I presume it did the same with IOBinding.py.
>
> No. This file contains only ASCII characters, so notepad has decided
> to not add the BOM.

Or it somehow got removed from the .py file. I tried with another .py file (and reverted!) and the diff showed the invisible change to the first line that Georg predicted.

--
Terry Jan Reedy

From tjreedy at udel.edu Thu Nov 25 06:39:30 2010
From: tjreedy at udel.edu (Terry Reedy)
Date: Thu, 25 Nov 2010 00:39:30 -0500
Subject: [Python-Dev] len(chr(i)) = 2?
In-Reply-To: References: <201011192123.14169.victor.stinner@haypocalc.com> <4CE6F93F.9010109@egenix.com> <4CE6FE30.1050903@v.loewis.de> <87hbfc1vnf.fsf@uwakimon.sk.tsukuba.ac.jp> <4CE78F62.7060707@v.loewis.de> <8739qukf9r.fsf@uwakimon.sk.tsukuba.ac.jp> <20101121173825.B1BFB235977@kimball.webabinitio.net> <60F8726F-C1C2-4803-8B8E-688EF0443FA0@gmail.com> <87eiadd46t.fsf@uwakimon.sk.tsukuba.ac.jp> <4CED5E91.9070705@egenix.com> Message-ID: On 11/24/2010 3:06 PM, Alexander Belopolsky wrote: > Any non-trivial text processing is likely to be broken in presence of > surrogates. Producing them on input is just trading known issue for > an unknown one. Processing surrogate pairs in python code is hard. > Software that has to support non-BMP characters will most likely be > written for a wide build and contain subtle bugs when run under a > narrow build. Note that my latest proposal does not abolish > surrogates outright. Users who want them can still use something like > "surrogateescape" error handler for non-BMP characters. It seems to me that what you are asking for is an alternate, optional, utf-8-bmp codec that would raise an error, in either direction, for non-bmp chars. Then, as you suggest, if one is not prepared for surrogates, they are not allowed. -- Terry Jan Reedy From anurag.chourasia at gmail.com Thu Nov 25 10:24:34 2010 From: anurag.chourasia at gmail.com (Anurag Chourasia) Date: Thu, 25 Nov 2010 14:54:34 +0530 Subject: [Python-Dev] AIX 5.3 - Enabling Shared Library Support Vs Extensions Message-ID: All, When I configure python to enable shared libraries, none of the extensions are getting built during the make step due to this error. building 'cStringIO' extension gcc -pthread -fno-strict-aliasing -g -O2 -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -I. -I/u01/home/apli/wm/GDD/Python-2.6.6/./Include -I. 
-IInclude -I./Include -I/opt/freeware/include -I/opt/freeware/include/readline -I/opt/freeware/include/ncurses -I/usr/local/include -I/u01/home/apli/wm/GDD/Python-2.6.6/Include -I/u01/home/apli/wm/GDD/Python-2.6.6 -c /u01/home/apli/wm/GDD/Python-2.6.6/Modules/cStringIO.c -o build/temp.aix-5.3-2.6/u01/home/apli/wm/GDD/Python-2.6.6/Modules/cStringIO.o ./Modules/ld_so_aix gcc -pthread -bI:Modules/python.exp build/temp.aix-5.3-2.6/u01/home/apli/wm/GDD/Python-2.6.6/Modules/cStringIO.o -L/usr/local/lib *-lpython2.6* -o build/lib.aix-5.3-2.6/cStringIO.so *collect2: library libpython2.6 not found* building 'cPickle' extension gcc -pthread -fno-strict-aliasing -g -O2 -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -I. -I/u01/home/apli/wm/GDD/Python-2.6.6/./Include -I. -IInclude -I./Include -I/opt/freeware/include -I/opt/freeware/include/readline -I/opt/freeware/include/ncurses -I/usr/local/include -I/u01/home/apli/wm/GDD/Python-2.6.6/Include -I/u01/home/apli/wm/GDD/Python-2.6.6 -c /u01/home/apli/wm/GDD/Python-2.6.6/Modules/cPickle.c -o build/temp.aix-5.3-2.6/u01/home/apli/wm/GDD/Python-2.6.6/Modules/cPickle.o ./Modules/ld_so_aix gcc -pthread -bI:Modules/python.exp build/temp.aix-5.3-2.6/u01/home/apli/wm/GDD/Python-2.6.6/Modules/cPickle.o -L/usr/local/lib *-lpython2.6* -o build/lib.aix-5.3-2.6/cPickle.so *collect2: library libpython2.6 not found* This is on AIX 5.3, GCC 4.2, Python 2.6.6 I can confirm that there is a libpython2.6.a file in the top level directory from where I am doing the configure/make etc Here are the options supplied to the configure command ./configure --enable-shared --disable-ipv6 --with-gcc=gcc CPPFLAGS="-I /opt/freeware/include -I /opt/freeware/include/readline -I /opt/freeware/include/ncurses" Please guide me in getting past this error. Thanks for your help on this. Regards, Anurag -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From v+python at g.nevcal.com Thu Nov 25 10:34:51 2010 From: v+python at g.nevcal.com (Glenn Linderman) Date: Thu, 25 Nov 2010 01:34:51 -0800 Subject: [Python-Dev] constant/enum type in stdlib In-Reply-To: <4CEC2759.40203@g.nevcal.com> References: <20101121034404.52924F20A@mail.python.org> <20101122172440.77d27ed5@pitrou.net> <20101122164654.2109.588145158.divmod.xquotient.165@localhost.localdomain> <4CEBC6BD.9060402@voidspace.org.uk> <4CEBCE92.40801@voidspace.org.uk> <20101123154229.474f7a90@pitrou.net> <1290524466.3642.4.camel@localhost.localdomain> <4CEBDA91.4050205@voidspace.org.uk> <1290526253.3642.9.camel@localhost.localdomain> <4CEBE06C.9030101@voidspace.org.uk> <1290528319.3642.11.camel@localhost.localdomain> <1290533860.3642.73.camel@localhost.localdomain> <1290535602.3642.87.camel@localhost.localdomain> <4CEC2759.40203@g.nevcal.com> Message-ID: <4CEE2DBB.3040502@g.nevcal.com> So the following code defines constants with associated names that get put in the repr. I'm still a Python newbie in some areas, particularly classes and metaclasses, maybe more. But this Python 3 code seems to create constants with names ... works for int and str at least. Special case for int defines a special __or__ operator to OR both the values and the names, which some might like. Dunno why it doesn't work for dict, and it is too late to research that today. That's the last test case in the code below, so you can see how it works for int and string before it bombs. There's some obvious cleanup work to be done, and it would be nice to make the names actually be constant... but they do lose their .name if you ignorantly assign the base type, so at least it is hard to change the value and keep the associated .name that gets reported by repr, which might reduce some confusion at debug time. 
An idea I had, but have no idea how to implement, is that it might be nice to say:

    with imported_constants_from_module:
        do_stuff

where do_stuff could reference the constants without qualifying them by module. Of course, if you knew it was just a module of constants, you could "import * from module" :) But the idea of with is that they'd go away at the end of that scope.

Some techniques here came from Raymond's namedtuple code.

def constant( name, val ):
    typ = str( type( val ))
    if typ.startswith("<class '"):
        typ = typ[ 8:-2 ]
    ev = '''
class constant_%s( %s ):
    def __new__( cls, val, name ):
        self = %s.__new__( cls, val )
        self.name = name
        return self
    def __repr__( self ):
        # convert to the base type first so str() cannot call back into __repr__
        return self.name + ': ' + str( %s( self ))
'''
    if typ == 'int':
        ev += '''
    def __or__( self, other ):
        if isinstance( other, constant_int ):
            return constant_int( int( self ) | int( other ),
                                 self.name + ' | ' + other.name )
'''
    ev += '''
%s = constant_%s( %s, '%s' )
'''
    ev = ev % ( typ, typ, typ, typ, name, typ, repr( val ), name )
    print( ev )
    exec( ev, globals())

constant('O_RANDOM', val=16 )
constant('O_SEQUENTIAL', val=32 )
constant("O_STRING", val="string")

def foo( x ):
    print( str( x ))
    print( repr( x ))
    print( type( x ))

foo( O_RANDOM )
foo( O_SEQUENTIAL )
foo( O_STRING )
zz = O_RANDOM | O_SEQUENTIAL
foo( zz )
y = {'ab': 2, 'yz': 3 }
constant('O_DICT', y )

From mal at egenix.com Thu Nov 25 10:51:09 2010
From: mal at egenix.com (M.-A. Lemburg)
Date: Thu, 25 Nov 2010 10:51:09 +0100
Subject: [Python-Dev] len(chr(i)) = 2?
In-Reply-To: References: <201011192123.14169.victor.stinner@haypocalc.com> <4CE6F93F.9010109@egenix.com> <4CE6FE30.1050903@v.loewis.de> <87hbfc1vnf.fsf@uwakimon.sk.tsukuba.ac.jp> <4CE78F62.7060707@v.loewis.de> <8739qukf9r.fsf@uwakimon.sk.tsukuba.ac.jp> <20101121173825.B1BFB235977@kimball.webabinitio.net> <60F8726F-C1C2-4803-8B8E-688EF0443FA0@gmail.com> <87eiadd46t.fsf@uwakimon.sk.tsukuba.ac.jp> <4CED5E91.9070705@egenix.com> Message-ID: <4CEE318D.5000705@egenix.com> Terry Reedy wrote: > On 11/24/2010 3:06 PM, Alexander Belopolsky wrote: > >> Any non-trivial text processing is likely to be broken in presence of >> surrogates. Producing them on input is just trading known issue for >> an unknown one. Processing surrogate pairs in python code is hard. >> Software that has to support non-BMP characters will most likely be >> written for a wide build and contain subtle bugs when run under a >> narrow build. Note that my latest proposal does not abolish >> surrogates outright. Users who want them can still use something like >> "surrogateescape" error handler for non-BMP characters. > > It seems to me that what you are asking for is an alternate, optional, > utf-8-bmp codec that would raise an error, in either direction, for > non-bmp chars. Then, as you suggest, if one is not prepared for > surrogates, they are not allowed. That would be a possibility as well... but I doubt that many users are going to bother, since slicing surrogates is just as bad as slicing combining code points and the latter are much more common in real life and they do happen to mostly live in the BMP. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Nov 25 2010) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... 
http://python.egenix.com/ ________________________________________________________________________ ::: Try our new mxODBC.Connect Python Database Interface for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From mal at egenix.com Thu Nov 25 10:57:17 2010 From: mal at egenix.com (M.-A. Lemburg) Date: Thu, 25 Nov 2010 10:57:17 +0100 Subject: [Python-Dev] len(chr(i)) = 2? In-Reply-To: References: <201011192123.14169.victor.stinner@haypocalc.com> <4CE6F93F.9010109@egenix.com> <4CE6FE30.1050903@v.loewis.de> <87hbfc1vnf.fsf@uwakimon.sk.tsukuba.ac.jp> <4CE78F62.7060707@v.loewis.de> <8739qukf9r.fsf@uwakimon.sk.tsukuba.ac.jp> <20101121173825.B1BFB235977@kimball.webabinitio.net> <60F8726F-C1C2-4803-8B8E-688EF0443FA0@gmail.com> <87eiadd46t.fsf@uwakimon.sk.tsukuba.ac.jp> <4CED5E91.9070705@egenix.com> <87bp5eb0zb.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <4CEE32FD.90507@egenix.com> Alexander Belopolsky wrote: > On Wed, Nov 24, 2010 at 9:17 PM, Stephen J. Turnbull wrote: > .. >> > I note that an opinion has been raised on this thread that >> > if we want compressed internal representation for strings, we should >> > use UTF-8. I tend to agree, but UTF-8 has been repeatedly rejected as >> > too hard to implement. What makes UTF-16 easier than UTF-8? Only the >> > fact that you can ignore bugs longer, in my view. >> >> That's mostly true. My guess is that we can probably ignore those >> bugs for as long as it takes someone to write the higher-level >> libraries that James suggests and MAL has actually proposed and >> started a PEP for. >> > > As far as I can tell, that PEP generated grand total of one comment in > nine years. This may or may not be indicative of how far away we are > from seeing it implemented. :-) At the time it was too early for people to start thinking about these issues. 
Actual use of Unicode really only started a few years ago.

Since I didn't have a need for such an indexing module myself (and didn't have much time to work on it anyway), I punted on the idea.

If someone else wants to pick up the idea, I'd gladly help out with the details.

> As far as UTF-8 vs. UCS-2/4 debate, I have an idea that may be even
> more far fetched. Once upon a time, Python Unicode strings supported
> buffer protocol and would lazily fill an internal buffer with bytes in
> the default encoding. In 3.x the default encoding has been fixed as
> UTF-8, buffer protocol support was removed from strings, but the
> internal buffer caching (now UTF-8) encoded representation remained.
> Maybe we can now implement defenc logic in reverse. Recall that
> strings are stored as UCS-2/4 sequences, but once buffer is requested
> in 2.x Python code or char* is obtained via
> _PyUnicode_AsStringAndSize() at the C level in 3.x, an internal buffer
> is filled with UTF-8 bytes and defenc is set to point to that buffer.

The original idea was for that buffer to go away once we moved to Unicode for strings. Reality has shown that we still need to stick with the buffer, though, since the UTF-8 representation of Unicode objects is used a lot.

> So the idea is for strings to store their data as UTF-8 buffer
> pointed by defenc upon construction. If an application uses string
> indexing, UTF-8 only strings will lazily fill their UCS-2/4 buffer.
> Proper, Unicode-aware algorithms such as grapheme, word or line
> iteration or simple operations such as concatenation, search or
> substitution would operate directly on defenc buffers. Presumably
> over time fewer and fewer applications would use code unit indexing
> that require UCS-2/4 buffer and eventually Python strings can stop
> supporting indexing altogether just like they stopped supporting the
> buffer protocol in 3.x.
I don't follow you: how would UTF-8, which has even more issues with variable-length representation of code points, make something easier compared to UTF-16, which has far fewer such issues, and then only for non-BMP code points?

Please note that we can only provide one way of string indexing in Python using the standard s[1] notation, and since we want that operation to be fast and to cost no more than O(1), using the code units as items is the only reasonable way to implement it.

With an indexing module, we could then let applications work based on higher-level indexing schemes such as complete code points (skipping surrogates), combined code points, graphemes (ignoring e.g. most control code points and zero-width code points), words (with some customizations as to where to break words, which will likely have to be language dependent), lines (which can be complicated for scripts that use columns instead ;-)), paragraphs, etc.

It would also help to add transparent indexing for right-to-left scripts and text that uses both left-to-right and right-to-left text (BIDI).

However, in order for these indexing methods to actually work, they will need to return references to the code units, so we cannot just drop that access method.

* Back on the surrogates topic:

In any case, I think this discussion is losing its grip on reality. By far, most strings you find in actual applications don't use surrogates at all, so the problem is being exaggerated.

If you need to be careful about surrogates for some reason, I think a single new method .hassurrogates() on string objects would go a long way in making detection and adding special-casing for these a lot easier.

If adding support for surrogates doesn't make sense (e.g. in the case of the formatting methods), then we simply punt on that and leave such handling to other tools.
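The detection-plus-punt approach described above could look something like the following sketch. Note the assumptions: no .hassurrogates() method exists on str, so `has_surrogates` here is a stand-alone hypothetical function, and `guarded_center` is likewise an invented wrapper that shows how a formatting operation might refuse surrogate-bearing input rather than risk splitting a pair. The check is a linear scan over the items of the string; on a build where non-BMP characters are stored as single items it only reports lone surrogates.

```python
def has_surrogates(s):
    """Return True if any item of s lies in the surrogate range."""
    return any("\ud800" <= ch <= "\udfff" for ch in s)


def guarded_center(s, width, fill=" "):
    """Center s like str.center(), but punt on surrogate-bearing input."""
    if has_surrogates(s) or has_surrogates(fill):
        raise ValueError("surrogates are not supported by this formatter")
    return s.center(width, fill)
```

Code that cannot sensibly handle surrogates would call the guard up front and leave those strings to other tools, which is the policy suggested in the paragraph above.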
* Regarding preventing surrogates from entering the Python runtime: It is by far more important to maintain round-trip safety for Unicode data, than getting every bit of code work correctly with surrogates (often, there won't be a single correct way). With a new method for fast detection of surrogates, we could protect code which obviously doesn't work with surrogates and then consider each case individually by either adding special cases as necessary or punting on the support. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Nov 25 2010) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ ::: Try our new mxODBC.Connect Python Database Interface for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. 
Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/

From nadeem.vawda at gmail.com Thu Nov 25 11:12:20 2010
From: nadeem.vawda at gmail.com (Nadeem Vawda)
Date: Thu, 25 Nov 2010 12:12:20 +0200
Subject: [Python-Dev] constant/enum type in stdlib
In-Reply-To: <4CEE2DBB.3040502@g.nevcal.com>
References: <20101121034404.52924F20A@mail.python.org> <20101122172440.77d27ed5@pitrou.net> <20101122164654.2109.588145158.divmod.xquotient.165@localhost.localdomain> <4CEBC6BD.9060402@voidspace.org.uk> <4CEBCE92.40801@voidspace.org.uk> <20101123154229.474f7a90@pitrou.net> <1290524466.3642.4.camel@localhost.localdomain> <4CEBDA91.4050205@voidspace.org.uk> <1290526253.3642.9.camel@localhost.localdomain> <4CEBE06C.9030101@voidspace.org.uk> <1290528319.3642.11.camel@localhost.localdomain> <1290533860.3642.73.camel@localhost.localdomain> <1290535602.3642.87.camel@localhost.localdomain> <4CEC2759.40203@g.nevcal.com> <4CEE2DBB.3040502@g.nevcal.com>
Message-ID:

On Thu, Nov 25, 2010 at 11:34 AM, Glenn Linderman wrote:
> So the following code defines constants with associated names that get put
> in the repr.

The code you gave doesn't work if the constant() function is moved into a separate module from the code that calls it. The globals() function, as I understand it, gives you access to the global namespace *of the current module*, so the constants end up being defined in the module containing constant(), not the module you're calling it from.

You could get around this by passing the globals of the calling module to constant(), but I think it's cleaner to use a class to provide a distinct namespace for the constants.

> An idea I had, but have no idea how to implement, is that it might be nice
> to say:
>
>     with imported_constants_from_module:
>         do_stuff
>
> where do_stuff could reference the constants without qualifying them by
> module.
Of course, if you knew it was just a module of constants, you could > "import * from module" :)? But the idea of with is that they'd go away at > the end of that scope. I don't think this is possible - the context manager protocol doesn't allow you to modify the namespace of the caller like that. Also, a with statement does not have its own namespace; any names defined inside its body will continue to be visible in the containing scope. Of course, if you want to achieve something similar (at function scope), you could say: def foo(bar, baz): from module import * ... From fuzzyman at voidspace.org.uk Thu Nov 25 11:34:25 2010 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Thu, 25 Nov 2010 10:34:25 +0000 Subject: [Python-Dev] constant/enum type in stdlib In-Reply-To: References: <20101121034404.52924F20A@mail.python.org> <4CEBC6BD.9060402@voidspace.org.uk> <4CEBCE92.40801@voidspace.org.uk> <20101123154229.474f7a90@pitrou.net> <1290524466.3642.4.camel@localhost.localdomain> <4CEBDA91.4050205@voidspace.org.uk> <1290526253.3642.9.camel@localhost.localdomain> <4CEBE06C.9030101@voidspace.org.uk> <1290528319.3642.11.camel@localhost.localdomain> <1290533860.3642.73.camel@localhost.localdomain> <1290535602.3642.87.camel@localhost.localdomain> <4CEC2759.40203@g.nevcal.com> <4CEE2DBB.3040502@g.nevcal.com> Message-ID: <4CEE3BB1.5090308@voidspace.org.uk> On 25/11/2010 10:12, Nadeem Vawda wrote: > On Thu, Nov 25, 2010 at 11:34 AM, Glenn Linderman wrote: >> So the following code defines constants with associated names that get put >> in the repr. > The code you gave doesn't work if the constant() function is moved > into a separate module from the code that calls it. The globals() > function, as I understand it, gives you access to the global namespace > *of the current module*, so the constants end up being defined in the > module containing constant(), not the module you're calling it from. 
>
> You could get around this by passing the globals of the calling module
> to constant(), but I think it's cleaner to use a class to provide a
> distinct namespace for the constants.
>
>> An idea I had, but have no idea how to implement, is that it might be nice
>> to say:
>>
>>     with imported_constants_from_module:
>>         do_stuff
>>
>> where do_stuff could reference the constants without qualifying them by
>> module. Of course, if you knew it was just a module of constants, you could
>> "import * from module" :) But the idea of with is that they'd go away at
>> the end of that scope.
> I don't think this is possible - the context manager protocol doesn't
> allow you to modify the namespace of the caller like that. Also, a
> with statement does not have its own namespace; any names defined
> inside its body will continue to be visible in the containing scope.
>
> Of course, if you want to achieve something similar (at function
> scope), you could say:
>
> def foo(bar, baz):
>     from module import *
>     ...

Not in Python 3 you can't. :-)

That's invalid syntax; import * can only be used at module level. This makes *testing* import * (i.e. testing your __all__) annoying - you have to exec('from module import *') instead.

Michael

-- 
http://www.voidspace.org.uk/

READ CAREFULLY. By accepting and reading this email you agree, on behalf of your employer, to release me from all obligations and waivers arising from any and all NON-NEGOTIATED agreements, licenses, terms-of-service, shrinkwrap, clickwrap, browsewrap, confidentiality, non-disclosure, non-compete and acceptable use policies ("BOGUS AGREEMENTS")
that I have entered into with your employer, its partners, licensors, agents and assigns, in perpetuity, without prejudice to my ongoing rights and privileges. You further represent that you have the authority to release me from any BOGUS AGREEMENTS on behalf of your employer. From fuzzyman at voidspace.org.uk Thu Nov 25 11:37:13 2010 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Thu, 25 Nov 2010 10:37:13 +0000 Subject: [Python-Dev] constant/enum type in stdlib In-Reply-To: <4CEE2DBB.3040502@g.nevcal.com> References: <20101121034404.52924F20A@mail.python.org> <20101122164654.2109.588145158.divmod.xquotient.165@localhost.localdomain> <4CEBC6BD.9060402@voidspace.org.uk> <4CEBCE92.40801@voidspace.org.uk> <20101123154229.474f7a90@pitrou.net> <1290524466.3642.4.camel@localhost.localdomain> <4CEBDA91.4050205@voidspace.org.uk> <1290526253.3642.9.camel@localhost.localdomain> <4CEBE06C.9030101@voidspace.org.uk> <1290528319.3642.11.camel@localhost.localdomain> <1290533860.3642.73.camel@localhost.localdomain> <1290535602.3642.87.camel@localhost.localdomain> <4CEC2759.40203@g.nevcal.com> <4CEE2DBB.3040502@g.nevcal.com> Message-ID: <4CEE3C59.1030002@voidspace.org.uk> On 25/11/2010 09:34, Glenn Linderman wrote: > So the following code defines constants with associated names that get > put in the repr. > > I'm still a Python newbie in some areas, particularly classes and > metaclasses, maybe more. > But this Python 3 code seems to create constants with names ... works > for int and str at least. > > Special case for int defines a special __or__ operator to OR both the > values and the names, which some might like. > > Dunno why it doesn't work for dict, and it is too late to research > that today. That's the last test case in the code below, so you can > see how it works for int and string before it bombs. > > There's some obvious cleanup work to be done, and it would be nice to > make the names actually be constant... 
but they do lose their .name if
> you ignorantly assign the base type, so at least it is hard to change
> the value and keep the associated .name that gets reported by repr,
> which might reduce some confusion at debug time.
>
> An idea I had, but have no idea how to implement, is that it might be
> nice to say:
>
>     with imported_constants_from_module:
>         do_stuff
>
> where do_stuff could reference the constants without qualifying them
> by module. Of course, if you knew it was just a module of constants,
> you could "import * from module" :) But the idea of with is that
> they'd go away at the end of that scope.
>
> Some techniques here came from Raymond's namedtuple code.
>
>
> def constant( name, val ):
>     typ = str( type( val ))
>     if typ.startswith( "<class '" ):
>         typ = typ[ 8:-2 ]
>     ev = '''
> class constant_%s( %s ):
>     def __new__( cls, val, name ):
>         self = %s.__new__( cls, val )
>         self.name = name
>         return self
>     def __repr__( self ):
>         return self.name + ': ' + str( self )
> '''
>     if typ == 'int':
>         ev += '''
>     def __or__( self, other ):
>         if isinstance( other, constant_int ):
>             return constant_int( int( self ) | int( other ),
>                                  self.name + ' | ' + other.name )
> '''

Not quite correct. If you OR a value with itself you should get back just the value, not something with "name|name" as the repr.

We can hold off on implementations until we have general agreement that some kind of named constant *should* be added, and what the feature set should look like.
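
The point about OR-ing a value with itself can be illustrated with a minimal sketch. This is not a proposed stdlib design: the class name `NamedInt` and its `name` attribute are hypothetical, chosen only for this example, and returning the existing operand whenever the OR adds no new bits is one possible reading of the requirement.

```python
class NamedInt(int):
    """Hypothetical named integer constant; an illustration, not a stdlib API."""

    def __new__(cls, value, name):
        self = super().__new__(cls, value)
        self.name = name
        return self

    def __repr__(self):
        # int(self) yields a plain int, so this does not recurse into __repr__.
        return "{}: {}".format(self.name, int(self))

    def __or__(self, other):
        if not isinstance(other, NamedInt):
            # Unnamed operands fall back to ordinary int behaviour.
            return int.__or__(self, other)
        value = int(self) | int(other)
        if value == int(self):
            return self   # covers x | x, so no "name | name" repr appears
        if value == int(other):
            return other
        return NamedInt(value, self.name + " | " + other.name)


O_RANDOM = NamedInt(16, "O_RANDOM")
O_SEQUENTIAL = NamedInt(32, "O_SEQUENTIAL")
print(repr(O_RANDOM | O_SEQUENTIAL))
print(repr(O_RANDOM | O_RANDOM))
```

Because the sketch defines an ordinary class rather than building source text for exec(), it also sidesteps the globals() problem raised earlier in the thread: the constants live wherever the caller assigns them.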
All the best,

Michael

> ev += '''
> %s = constant_%s( %s, '%s' )
>
> '''
> ev = ev % ( typ, typ, typ, name, typ, repr( val ), name )
> print( ev )
> exec( ev, globals())
>
> constant('O_RANDOM', val=16 )
>
> constant('O_SEQUENTIAL', val=32 )
>
> constant("O_STRING", val="string")
>
> def foo( x ):
>     print( str( x ))
>     print( repr( x ))
>     print( type( x ))
>
> foo( O_RANDOM )
> foo( O_SEQUENTIAL )
> foo( O_STRING )
>
> zz = O_RANDOM | O_SEQUENTIAL
>
> foo( zz )
>
> y = {'ab': 2, 'yz': 3 }
> constant('O_DICT', y )

-- 
http://www.voidspace.org.uk/

From merwok at netwok.org Thu Nov 25 12:47:00 2010
From: merwok at netwok.org (Éric Araujo)
Date: Thu, 25 Nov 2010 12:47:00 +0100
Subject: [Python-Dev] [Python-checkins] r86748 - in python/branches/py3k-urllib/Lib: http/client.py urllib/request.py
In-Reply-To: <20101125081820.7FA2EEEA97@mail.python.org>
References: <20101125081820.7FA2EEEA97@mail.python.org>
Message-ID: <4CEE4CB4.6010107@netwok.org>

> Author: senthil.kumaran
> New Revision: 86748
>
> Log:
> Experimental - Transparent gzip Encoding in urllib2. There should be a good way to deal with Content-Length.

Cool feature! But...

> Modified:
> python/branches/py3k-urllib/Lib/http/client.py
> python/branches/py3k-urllib/Lib/urllib/request.py

No tests? Misc/NEWS? :)

Regards

From rob.cliffe at btinternet.com Thu Nov 25 13:52:44 2010
From: rob.cliffe at btinternet.com (Rob Cliffe)
Date: Thu, 25 Nov 2010 12:52:44 +0000
Subject: [Python-Dev] constant/enum type in stdlib
In-Reply-To: <4CEDDC2D.204@canterbury.ac.nz>
References: <20101121034404.52924F20A@mail.python.org> <4CE9BF4A.1020302@netwok.org> <4CEA89E8.5090107@voidspace.org.uk> <20101122163722.7e96d123@pitrou.net> <4CEA9584.7040301@avl.com> <20101122172440.77d27ed5@pitrou.net> <20101122164654.2109.588145158.divmod.xquotient.165@localhost.localdomain> <4CEBC6BD.9060402@voidspace.org.uk> <4CED0557.9090101@voidspace.org.uk> <4CEDDC2D.204@canterbury.ac.nz>
Message-ID: <4CEE5C1C.9000905@btinternet.com>

On 25/11/2010 03:46, Greg Ewing wrote:
> On 25/11/10 12:38, average wrote:
>> Is immutability a general need that should have general solution?
>
Yes, I have sometimes thought this. Might be nice to have a "mutable" attribute that could be read and could be changed from True to False, though presumably not vice versa.
> I don't think it really generalizes. Tuples are not just frozen
> lists, for example -- they have a different internal structure
> that's more efficient to create and access.
>
But couldn't they be presented to the Python programmer as a single type, with the implementation details hidden "under the hood"? So

    MyList.__mutable__ = False

would have the same effect as the present

    MyList = tuple(MyList)

This would simplify some code that copes with either list(s) or tuple(s) as input data.

One would need syntax for (im)mutable literals, e.g.

    []i   # immutable list (really a tuple). Bit of a shame that "i[]" doesn't work.
or
    []f   # frozen list (same thing)
    []    # mutable list (same as now)
    []m   # alternative syntax for mutable list

This would reduce the overloading on parentheses and avoid having to write a tuple of one item as (t,) which often trips up newbies. It would also avoid one FAQ: Why does Python have separate list and tuple types?

Also the syntax could be extended, e.g.

    {a,b,c}f      # frozen set with 3 objects
    {p:x,q:y}f    # frozen dictionary with 2 items
    {:}f, {}f     # (re the thread on set literals) frozen empty dictionary and frozen empty set!

Just some thoughts for Python 4.

Best wishes
Rob Cliffe

From g.brandl at gmx.net Thu Nov 25 14:27:14 2010
From: g.brandl at gmx.net (Georg Brandl)
Date: Thu, 25 Nov 2010 14:27:14 +0100
Subject: [Python-Dev] [Python-checkins] r86748 - in python/branches/py3k-urllib/Lib: http/client.py urllib/request.py
In-Reply-To: <4CEE4CB4.6010107@netwok.org>
References: <20101125081820.7FA2EEEA97@mail.python.org> <4CEE4CB4.6010107@netwok.org>
Message-ID:

On 25.11.2010 12:47, Éric Araujo wrote:
>> Author: senthil.kumaran
>> New Revision: 86748
>>
>> Log:
>> Experimental - Transparent gzip Encoding in urllib2. There should be a good way to deal with Content-Length.
> Cool feature! But...
>
>> Modified:
>> python/branches/py3k-urllib/Lib/http/client.py
>> python/branches/py3k-urllib/Lib/urllib/request.py
> No tests? Misc/NEWS? :)

Note that this is work in a separate branch.
Georg

From emile.anclin at logilab.fr Thu Nov 25 15:30:23 2010
From: emile.anclin at logilab.fr (Emile Anclin)
Date: Thu, 25 Nov 2010 15:30:23 +0100
Subject: [Python-Dev] python3k : imp.find_module raises SyntaxError
Message-ID: <201011251530.23947.emile.anclin@logilab>

hello,

working on Pylint, we have a lot of deliberately corrupted files to test Pylint behavior; for instance

$ cat /home/emile/var/pylint/test/input/func_unknown_encoding.py
# -*- coding: IBO-8859-1 -*-
""" check correct unknown encoding declaration
"""

__revision__ = '????'

and we try to find that module: find_module('func_unknown_encoding', None). But python3 raises SyntaxError in that case; it didn't raise SyntaxError on python2, nor does it do so on our func_nonascii_noencoding and func_wrong_encoding modules (with obvious names)

Python 3.2a2 (r32a2:84522, Sep 14 2010, 15:22:36)
[GCC 4.3.4] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from imp import find_module
>>> find_module('func_unknown_encoding', None)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
SyntaxError: encoding problem: with BOM
>>> find_module('func_wrong_encoding', None)
(<_io.TextIOWrapper name=5 encoding='utf-8'>, 'func_wrong_encoding.py', ('.py', 'U', 1))
>>> find_module('func_nonascii_noencoding', None)
(<_io.TextIOWrapper name=6 encoding='utf-8'>, 'func_nonascii_noencoding.py', ('.py', 'U', 1))

So what is the reason for this selective behavior?
Furthermore, there is BOM in our func_unknown_encoding.py module.
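
The kind of encoding check involved can be explored from pure Python with tokenize.detect_encoding(), as in the sketch below. This is only an approximation for experimenting: it exercises the standard library's Python-level detector on a current Python 3, not the C code path behind imp.find_module in 3.2a2, so its results need not match the session above. The helper name `describe_encoding` and the inline byte strings (stand-ins for the Pylint test files) are assumptions made for this example.

```python
import io
import tokenize


def describe_encoding(source_bytes):
    """Return the encoding tokenize detects, or the SyntaxError text it raises."""
    readline = io.BytesIO(source_bytes).readline
    try:
        encoding, _first_lines = tokenize.detect_encoding(readline)
    except SyntaxError as exc:
        return "SyntaxError: {}".format(exc)
    return encoding


# Inline stand-ins for the kinds of test files described above.
samples = {
    "unknown_encoding": b"# -*- coding: IBO-8859-1 -*-\nx = 1\n",
    "nonascii_noencoding": b"x = '\xe9'\n",
    "plain": b"x = 1\n",
}
for label, data in samples.items():
    print(label, "->", describe_encoding(data))
```

An unknown codec name in the coding cookie is reported as a SyntaxError here as well, which suggests the exception type itself is deliberate even if the message text differs between code paths.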
-- Emile Anclin http://www.logilab.fr/ http://www.logilab.org/ Informatique scientifique & et gestion de connaissances From rrr at ronadam.com Thu Nov 25 18:22:58 2010 From: rrr at ronadam.com (Ron Adam) Date: Thu, 25 Nov 2010 11:22:58 -0600 Subject: [Python-Dev] python3k : imp.find_module raises SyntaxError In-Reply-To: <201011251530.23947.emile.anclin@logilab> References: <201011251530.23947.emile.anclin@logilab> Message-ID: <4CEE9B72.1070002@ronadam.com> On 11/25/2010 08:30 AM, Emile Anclin wrote: > > hello, > > working on Pylint, we have a lot of voluntary corrupted files to test > Pylint behavior; for instance > > $ cat /home/emile/var/pylint/test/input/func_unknown_encoding.py > # -*- coding: IBO-8859-1 -*- > """ check correct unknown encoding declaration > """ > > __revision__ = '????' > > > and we try to find that module : > find_module('func_unknown_encoding', None). But python3 raises SyntaxError > in that case ; it didn't raise SyntaxError on python2 nor does so on our > func_nonascii_noencoding and func_wrong_encoding modules (with obvious > names) > > Python 3.2a2 (r32a2:84522, Sep 14 2010, 15:22:36) > [GCC 4.3.4] on linux2 > Type "help", "copyright", "credits" or "license" for more information. >>>> from imp import find_module >>>> find_module('func_unknown_encoding', None) > Traceback (most recent call last): > File " ", line 1, in > SyntaxError: encoding problem: with BOM >>>> find_module('func_wrong_encoding', None) > (<_io.TextIOWrapper name=5 encoding='utf-8'>, 'func_wrong_encoding.py', > ('.py', 'U', 1)) >>>> find_module('func_nonascii_noencoding', None) > (<_io.TextIOWrapper name=6 encoding='utf-8'>, > 'func_nonascii_noencoding.py', ('.py', 'U', 1)) > > > So what is the reason of this selective behavior? > Furthermore, there is BOM in our func_unknown_encoding.py module. I don't think there is a clear reason by design. Also try importing the same modules directly and noting the differences in the errors you get. 
For example, here is the problem that brought this to my attention in python3.2.

>>> find_module('test/badsyntax_pep3120')
Segmentation fault

>>> from test import badsyntax_pep3120
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.2/test/badsyntax_pep3120.py", line 1
SyntaxError: Non-UTF-8 code starting with '\xf6' in file /usr/local/lib/python3.2/test/badsyntax_pep3120.py on line 1, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details

The import statement uses parser.c, and tokenizer.c indirectly, to import a file, but the imp module uses tokenizer.c directly. They aren't consistent in how they handle errors because the different error messages are generated in different places depending on what the error is, *and* what the code path to get to that point was, *and* whether or not a filename was set.

For the example above with imp.find_module(), the filename isn't set, so you get a different error than if you used import, which uses the parser module and that does set the filename.

From what I've seen, it would help if the imp module was rewritten to use parser.c like the import statement does, rather than tokenizer.c directly. The error handling in parser.c is much better than in tokenizer.c. Possibly tokenizer.c could be cleaned up after that and be made much simpler.
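
The difference between locating a module and actually importing it can still be reproduced on a current Python 3, with one large caveat: the imp module was removed in Python 3.12, so the sketch below substitutes importlib.util.find_spec(), which only locates the file and does not tokenize it. It is therefore an analogy to the behaviour discussed above, not a reproduction of the 3.2 code paths. The function name `locate_then_import` and the module name `hypothetical_badsyntax_mod` are made up for this example.

```python
import importlib
import importlib.util
import os
import sys
import tempfile


def locate_then_import(module_name, source_bytes):
    """Write a module to a temp dir; return (was it located?, import error name)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, module_name + ".py")
        with open(path, "wb") as fh:
            fh.write(source_bytes)
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            # Locating the module does not read or decode its source.
            spec = importlib.util.find_spec(module_name)
            try:
                importlib.import_module(module_name)
                error = None
            except SyntaxError as exc:
                # Compiling the source is where the encoding problem surfaces.
                error = type(exc).__name__
            return spec is not None, error
        finally:
            sys.path.remove(tmp)
            sys.modules.pop(module_name, None)


# Non-UTF-8 byte with no encoding declaration, like badsyntax_pep3120.
print(locate_then_import("hypothetical_badsyntax_mod", b"x = '\xf6'\n"))
```

Here the error is raised only by the step that compiles the source, and that step knows the filename, which matches the observation that the quality of the message depends on which code path reports it.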
Ron Adam

From merwok at netwok.org Thu Nov 25 18:53:54 2010
From: merwok at netwok.org (Éric Araujo)
Date: Thu, 25 Nov 2010 18:53:54 +0100
Subject: [Python-Dev] [Python-checkins] r86748 - in python/branches/py3k-urllib/Lib: http/client.py urllib/request.py
In-Reply-To:
References: <20101125081820.7FA2EEEA97@mail.python.org> <4CEE4CB4.6010107@netwok.org>
Message-ID: <4CEEA2B2.1030306@netwok.org>

>>> Modified:
>>> python/branches/py3k-urllib/Lib/http/client.py
>>> python/branches/py3k-urllib/Lib/urllib/request.py
>> No tests? Misc/NEWS? :)
>
> Note that this is work in a separate branch.

Ah, didn't notice that! Senthil replied as much in private email:

> That was in a different branch.
Once stable shall definitely include
> the tests and news.

unconsciously-ignoring-svn-branches-to-preserve-sanity-ly yours,
Éric

From victor.stinner at haypocalc.com Thu Nov 25 22:39:00 2010
From: victor.stinner at haypocalc.com (Victor Stinner)
Date: Thu, 25 Nov 2010 22:39:00 +0100
Subject: [Python-Dev] len(chr(i)) = 2?
In-Reply-To: <4CE6F93F.9010109@egenix.com>
References: <201011192123.14169.victor.stinner@haypocalc.com> <4CE6F93F.9010109@egenix.com>
Message-ID: <201011252239.00288.victor.stinner@haypocalc.com>

On Friday 19 November 2010 23:25:03 you wrote:
> > Python is unclear about non-BMP characters: narrow build was called
> > "ucs2" for long time, even if it is UTF-16 (each character is encoded to
> > one or two UTF-16 words).
>
> No, no, no :-)
>
> UCS2 and UCS4 are more appropriate than "narrow" and "wide" or even
> "UTF-16" and "UTF-32".

Ok for Python 2:

$ ./python
Python 2.7.0+ (release27-maint:84618M, Sep 8 2010, 12:43:49)
>>> import sys; sys.maxunicode
65535
>>> x=u'\U0010ffff'; len(x)
2
>>> ord(x)
...
TypeError: ord() expected a character, but string of length 2 found

But Python 3 does use UTF-16 for narrow build:

$ ./python
Python 3.2a3+ (py3k:86396:86399M, Nov 10 2010, 15:24:09)
>>> import sys; sys.maxunicode
65535
>>> c=chr(0x10ffff); len(c)
2
>>> ord(c)
1114111

Victor

From merwok at netwok.org Fri Nov 26 02:32:43 2010
From: merwok at netwok.org (Éric Araujo)
Date: Fri, 26 Nov 2010 02:32:43 +0100
Subject: [Python-Dev] [Python-checkins] r86750 - python/branches/py3k/Demo/curses/life.py
In-Reply-To: <20101125145644.D98FAEEA26@mail.python.org>
References: <20101125145644.D98FAEEA26@mail.python.org>
Message-ID: <4CEF0E3B.2070608@netwok.org>

Hello,

> Author: senthil.kumaran
> Log:
> Mouse support and colour to Demo/curses/life.py by Dafydd Crosby
>
> Modified:
> python/branches/py3k/Demo/curses/life.py

Okay, this time I'm reacting to the right branch

> Modified: python/branches/py3k/Demo/curses/life.py
> ==============================================================================
> --- python/branches/py3k/Demo/curses/life.py (original)
> +++ python/branches/py3k/Demo/curses/life.py Thu Nov 25 15:56:44 2010
> @@ -1,6 +1,7 @@
> #!/usr/bin/env python3
> # life.py -- A curses-based version of Conway's Game of Life.
> # Contributed by AMK
> +# Mouse support and colour by Dafydd Crosby

Shouldn't his name rather be in Misc/ACKS too? Modules typically (warning: non-scientific data) include the name of the author or first contributors but not the name of every contributor.

I think these cool features deserve a note in Misc/NEWS too :)

Re: "colour": the rest of the file uses US English, as do the function names (see for example curses.has_color). It's good to use one dialect consistently in one file.
going-back-to-stare-at-shiny-colors-ly yours,
Éric

From orsenthil at gmail.com Fri Nov 26 03:15:24 2010
From: orsenthil at gmail.com (Senthil Kumaran)
Date: Fri, 26 Nov 2010 10:15:24 +0800
Subject: [Python-Dev] [Python-checkins] r86750 - python/branches/py3k/Demo/curses/life.py
In-Reply-To: <4CEF0E3B.2070608@netwok.org>
References: <20101125145644.D98FAEEA26@mail.python.org> <4CEF0E3B.2070608@netwok.org>
Message-ID: <20101126021524.GA1450@rubuntu>

On Fri, Nov 26, 2010 at 02:32:43AM +0100, Éric Araujo wrote:
> Shouldn't his name rather be in Misc/ACKS too? Modules typically
> (warning: non-scientific data) include the name of the author or first
> contributors but not the name of every contributor.
>
> I think these cool features deserve a note in Misc/NEWS too :)

I don't think it is required. Demo stuff is usually a fun demonstration. The contributor had added his name to the patch in the header, and I just left it like that. It's fine. For features and important patches (subjective), Misc/{ACKS,NEWS} are both updated.

> Re: "colour": the rest of the file use US English, as do the function
> names (see for example curses.has_color). It's good to use one dialect
> consistently in one file.

Good catch. Did not realize it because we write it as colour too. Changing it.

Thanks,
Senthil

From stephen at xemacs.org Fri Nov 26 03:42:33 2010
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Fri, 26 Nov 2010 11:42:33 +0900
Subject: [Python-Dev] len(chr(i)) = 2?
In-Reply-To: <4CEE318D.5000705@egenix.com> References: <201011192123.14169.victor.stinner@haypocalc.com> <4CE6F93F.9010109@egenix.com> <4CE6FE30.1050903@v.loewis.de> <87hbfc1vnf.fsf@uwakimon.sk.tsukuba.ac.jp> <4CE78F62.7060707@v.loewis.de> <8739qukf9r.fsf@uwakimon.sk.tsukuba.ac.jp> <20101121173825.B1BFB235977@kimball.webabinitio.net> <60F8726F-C1C2-4803-8B8E-688EF0443FA0@gmail.com> <87eiadd46t.fsf@uwakimon.sk.tsukuba.ac.jp> <4CED5E91.9070705@egenix.com> <4CEE318D.5000705@egenix.com> Message-ID: <87fwuo7qli.fsf@uwakimon.sk.tsukuba.ac.jp> M.-A. Lemburg writes: > That would be a possibility as well... but I doubt that many users > are going to bother, since slicing surrogates is just as bad as > slicing combining code points and the latter are much more common in > real life and they do happen to mostly live in the BMP. That's only if you require 100% fidelity in the data, which may not be true in some use cases. Where 99.99% fidelity is good enough, an unexpected sliced surrogate pair is a show-stopper, while a sliced combining character sequence not only doesn't stop the show (at least in Python, and I doubt any correct Unicode process can signal a fatal error there either, I can put a tilde on a Cyrillic character if I want to, no?), it's probably readable enough that readers will assume a keypunch error. Personally, if available I would always use some such dodge in server software (I don't care enough about 24x7 availability to write it myself, though). And never in a script for interactive use; something needs fixing, may as well take the fatal error and fix it on the spot. (Again, "on the spot" for me can mean "tomorrow".) From stephen at xemacs.org Fri Nov 26 04:02:09 2010 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Fri, 26 Nov 2010 12:02:09 +0900 Subject: [Python-Dev] len(chr(i)) = 2? 
In-Reply-To: <4CEE32FD.90507@egenix.com>
References: <201011192123.14169.victor.stinner@haypocalc.com> <4CE6F93F.9010109@egenix.com> <4CE6FE30.1050903@v.loewis.de> <87hbfc1vnf.fsf@uwakimon.sk.tsukuba.ac.jp> <4CE78F62.7060707@v.loewis.de> <8739qukf9r.fsf@uwakimon.sk.tsukuba.ac.jp> <20101121173825.B1BFB235977@kimball.webabinitio.net> <60F8726F-C1C2-4803-8B8E-688EF0443FA0@gmail.com> <87eiadd46t.fsf@uwakimon.sk.tsukuba.ac.jp> <4CED5E91.9070705@egenix.com> <87bp5eb0zb.fsf@uwakimon.sk.tsukuba.ac.jp> <4CEE32FD.90507@egenix.com>
Message-ID: <87eia87pou.fsf@uwakimon.sk.tsukuba.ac.jp>

M.-A. Lemburg writes:

> Please note that we can only provide one way of string indexing
> in Python using the standard s[1] notation and since we
> want that operation to be fast and no more than O(1), using the
> code units as items is the only reasonable way to implement it.

AFAICT, the "we" that wants "no more than O(1)" does not include Glyph Lefkowitz, James Knight, and Greg Ewing. Greg even said that in designing a UTF-8 string type he might not provide an indexing operation at all. (Caution: That may not be what he meant; I'm just reporting the way I interpreted it.)

Of course none of them are proposing to change Python; that's all in the context of designing a new language. But it does suggest that a lot of people can't think of use cases where O(1) string indexing is more important than Unicode robustness.

> It is by far more important to maintain round-trip safety for
> Unicode data, than getting every bit of code work correctly
> with surrogates (often, there won't be a single correct way).

But surely it's more important than that to ensure that surrogates can't crash a Python process with unexpected UnicodeErrors?
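
Both concerns in this exchange, detecting surrogates up front and keeping data round-trip safe, can be sketched in a few lines. The sketch assumes Python 3.3 or later, where str is indexed by code point; on the narrow builds discussed in this thread every non-BMP character would also show up as a surrogate pair and trip the check. The helper names `has_surrogates` and `encode_checked` are hypothetical, and the linear scan is only a stand-in for the "fast detection" method mentioned earlier.

```python
def has_surrogates(text):
    """Return True if text contains any code point in U+D800..U+DFFF."""
    return any(0xD800 <= ord(ch) <= 0xDFFF for ch in text)


def encode_checked(text, encoding="utf-8"):
    """Encode text, rejecting surrogates with an explicit, early error."""
    if has_surrogates(text):
        raise ValueError("text contains surrogate code points")
    return text.encode(encoding)


lone = "a\ud800b"
print(has_surrogates(lone))
print(has_surrogates("abc"))

# Without a guard, the failure surfaces later as a UnicodeEncodeError.
try:
    lone.encode("utf-8")
except UnicodeEncodeError as exc:
    print("encode failed:", exc.reason)

# The 'surrogatepass' error handler keeps the data round-trip safe.
raw = lone.encode("utf-8", "surrogatepass")
print(raw.decode("utf-8", "surrogatepass") == lone)
```

The two halves pull in the directions argued above: the guard turns an unexpected UnicodeError deep in the program into a deliberate check at the boundary, while surrogatepass favours round-trip safety over strictness.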
From jcea at jcea.es Fri Nov 26 05:11:56 2010
From: jcea at jcea.es (Jesus Cea)
Date: Fri, 26 Nov 2010 05:11:56 +0100
Subject: [Python-Dev] Question about GDB bindings and 32/64 bits
Message-ID: <4CEF338C.4070509@jcea.es>

I have installed a 32-bit GDB 7.2 and the 32-bit buildslaves are green. Nevertheless the 64-bit buildslaves are failing test_gdb.

Is a 32-bit GDB expected to be able to debug a 64-bit python? If not, the gdb test should compare "platform.architecture()" (for python and gdb in the system) and run only when they are the same.

If this should work, I would open a bug and maybe spend some time on it. But before thinking about investing time, I would like to know whether this mix is actually expected to work or not.

If not, I would consider installing a 64-bit GDB too and doing some tricks (like using a "/usr/local/bin/gdb" script wrapper to choose the 32/64 "real" gdb version) to actually execute "test_gdb" in both buildslaves (they are running on the same physical machine).

Any advice?

PS: I am talking about the AMD64 OpenIndiana buildbots. Haven't checked others.

-- 
Jesus Cea Avion
jcea at jcea.es - http://www.jcea.es/
jabber / xmpp:jcea at jabber.org
"Things are not so easy"
"My name is Dump, Core Dump"
"Love is to place your happiness in the happiness of another" - Leibniz

From glyph at twistedmatrix.com Fri Nov 26 08:21:26 2010
From: glyph at twistedmatrix.com (Glyph Lefkowitz)
Date: Fri, 26 Nov 2010 02:21:26 -0500
Subject: [Python-Dev] len(chr(i)) = 2?
In-Reply-To: <87mxozayam.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <201011192123.14169.victor.stinner@haypocalc.com> <4CE6F93F.9010109@egenix.com> <4CE6FE30.1050903@v.loewis.de> <87hbfc1vnf.fsf@uwakimon.sk.tsukuba.ac.jp> <4CE78F62.7060707@v.loewis.de> <8739qukf9r.fsf@uwakimon.sk.tsukuba.ac.jp> <20101121173825.B1BFB235977@kimball.webabinitio.net> <60F8726F-C1C2-4803-8B8E-688EF0443FA0@gmail.com> <87eiadd46t.fsf@uwakimon.sk.tsukuba.ac.jp> <4CEC5316.4010608@canterbury.ac.nz> <77AAC178-F868-4F05-8509-4A9FB66F61EC@fuhm.net> <87sjyrbftz.fsf@uwakimon.sk.tsukuba.ac.jp> <635C265A-90A8-4B92-A65C-59EF3E8EFD68@twistedmatrix.com> <87oc9fb97b.fsf@uwakimon.sk.tsukuba.ac.jp> <3C1ADB64-63F3-4165-926D-EDE9846E0DBD@fuhm.net> <87mxozayam.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID:

On Nov 24, 2010, at 4:03 AM, Stephen J. Turnbull wrote:

> You end up proliferating types that all do the same kind of thing. Judicious use of inheritance helps, but getting the fundamental abstraction right is hard. Or at least, Emacs hasn't found it in 20 years of trying.

Emacs hasn't even figured out how to do general purpose iteration in 20 years of trying either.
The easiest way I've found to loop across an arbitrary pile of 'stuff' is the CL 'loop' macro, which you're not even supposed to use. Even then, you still have to make the arcane and pointless distinction of using 'across' or 'in' or 'on'. Python, on the other hand, has iteration pretty well tied up nicely in a bow.

I don't know how to respond to the rest of your argument. Nothing you've said has in any way indicated to me why having code-point offsets is a good idea, only that people who know C and elisp would rather sling around piles of integers than have good abstract types. For example:

> I think it more likely that markers are very expensive to create and use compared to integers.

What? When you do 'for x in str' in Python, you are already creating an iterator object, which has to store the exact same amount of state that our proposed 'marker' or 'character pointer' would have to store. The proposed UTF-8 marker would have to do a tiny bit more work when iterating because it would have to combine multibyte characters, but in exchange for that you get to skip a whole ton of copying when encoding and decoding. How is this expensive to create and use? For every application I have ever designed, encountered, or can even conjecture about, this would be cheaper. (Assuming not just a UTF-8 string type, but one for UTF-16 as well, where native data is in that format already.)

For what it's worth, not wanting to use abstract types in Emacs makes sense to me: I've written my share of elisp code, and it is hard to create reasonable abstractions in Emacs, because the facilities for defining types and creating polymorphic logic are so crude. It's a lot easier to just assume your underlying storage is an array, because at the end of the day you're going to need to call some functions on it which care whether it's an array or an alist or a list or a vector anyway, so you might as well just say so up front.
But in Python we could just call 'mystring.by_character()' or 'mystring.by_codepoint()' and get an iterator object back and forget about all that junk.

From glyph at twistedmatrix.com Fri Nov 26 08:51:35 2010
From: glyph at twistedmatrix.com (Glyph Lefkowitz)
Date: Fri, 26 Nov 2010 02:51:35 -0500
Subject: [Python-Dev] len(chr(i)) = 2?
In-Reply-To: <87ipzm6oqr.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <201011192123.14169.victor.stinner@haypocalc.com>
 <4CE6F93F.9010109@egenix.com>
 <4CE6FE30.1050903@v.loewis.de>
 <87hbfc1vnf.fsf@uwakimon.sk.tsukuba.ac.jp>
 <4CE78F62.7060707@v.loewis.de>
 <8739qukf9r.fsf@uwakimon.sk.tsukuba.ac.jp>
 <20101121173825.B1BFB235977@kimball.webabinitio.net>
 <60F8726F-C1C2-4803-8B8E-688EF0443FA0@gmail.com>
 <87eiadd46t.fsf@uwakimon.sk.tsukuba.ac.jp>
 <4CEC5316.4010608@canterbury.ac.nz>
 <77AAC178-F868-4F05-8509-4A9FB66F61EC@fuhm.net>
 <87sjyrbftz.fsf@uwakimon.sk.tsukuba.ac.jp>
 <635C265A-90A8-4B92-A65C-59EF3E8EFD68@twistedmatrix.com>
 <87oc9fb97b.fsf@uwakimon.sk.tsukuba.ac.jp>
 <3C1ADB64-63F3-4165-926D-EDE9846E0DBD@fuhm.net>
 <87mxozayam.fsf@uwakimon.sk.tsukuba.ac.jp>
 <4CEDCB86.9030506@canterbury.ac.nz>
 <87ipzm6oqr.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID:

On Nov 24, 2010, at 10:55 PM, Stephen J. Turnbull wrote:

> Greg Ewing writes:
>> On 24/11/10 22:03, Stephen J. Turnbull wrote:
>>> But
>>> if you actually need to remember positions, or regions, to jump to
>>> later or to communicate to other code that manipulates them, doing
>>> this stuff the straightforward way (just copying the whole iterator
>>> object to hang on to its state) becomes expensive.
>>
>> If the internal representation of a text pointer (I won't call it
>> an iterator because that means something else in Python) is a byte
>> offset or something similar, it shouldn't take up any more space
>> than a Python int, which is what you'd be using anyway if you
>> represented text positions by grapheme indexes or whatever.
>
> That's not necessarily true. Eg, in Emacs ("there you go again"),
> Lisp integers are not only immediate (saving one pointer), but the
> type is encoded in the lower bits, so that there is no need for a type
> pointer -- the representation is smaller than the opaque marker type.
> Altogether, up to 8 of 12 bytes saved on a 32-bit platform, or 16 of
> 24 bytes on a 64-bit platform.

Yes, yes, Lisp is very clever. Maybe some other runtime, like PyPy, could make this optimization. But I don't think that anyone is filling up main memory with gigantic piles of character indexes and needs to squeeze out that extra couple of bytes of memory on such a tiny object. Plus, this would allow such a user to stop copying the character data itself just to decode it, and on mostly-ASCII UTF-8 text (a common use-case) this is a 2x savings right off the bat.

> In Python it's true that markers can use the same data structure as
> integers and simply provide different methods, and it's arguable that
> Python's design is better. But if you use bytes internally, then you
> have problems.

No, you just have design questions.

> Do you expose that byte value to the user?

Yes, but only if they ask for it. It's useful for computing things like quota and the like.

> Can users (programmers using the language and end users) specify positions in terms of byte values?

Sure, why not?

> If so, what do you do if the user specifies a byte value that points into a multibyte character?

Go to the beginning of the multibyte character.
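That snapping rule can be sketched in a few lines. This is only an illustration, assuming the text is held as valid UTF-8 bytes; the name `snap_to_char_start` is hypothetical and is not part of any API proposed in this thread.

```python
def snap_to_char_start(data: bytes, offset: int) -> int:
    """Return the nearest index <= offset that begins a UTF-8 character.

    Assumes `data` is valid UTF-8. An offset equal to len(data) is
    treated as the end-of-text position and returned unchanged.
    """
    if not 0 <= offset <= len(data):
        raise IndexError("byte offset out of range")
    # UTF-8 continuation bytes always have the bit pattern 10xxxxxx,
    # so step backwards until we reach a byte that is not one.
    while 0 < offset < len(data) and (data[offset] & 0xC0) == 0x80:
        offset -= 1
    return offset


# "na\u00efve \u20ac" encodes to 10 bytes: the i-diaeresis occupies
# bytes 2-3 and the euro sign occupies bytes 7-9.
text = "na\u00efve \u20ac".encode("utf-8")
print(snap_to_char_start(text, 3))  # -> 2
print(snap_to_char_start(text, 9))  # -> 7
```

A marker object built on this would store the snapped offset, which is why it reports the adjusted position rather than the one originally requested.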
Report that position; if the user then asks the requested marker object for its position, it will report that byte offset, not the originally-requested one. (Obviously, do the same thing for surrogate pair code points.)

> What if the user wants to specify position by number of characters?

Part of the point that we are trying to make here is that nobody really cares about that use-case. In order to know anything useful about a position in a text, you have to have traversed to that location in the text. You can remember interesting things like the offsets of starts of lines, or the x/y positions of characters.

> Can you translate efficiently?

No, because there's no point :). But you _could_ implement an overlay that cached things like the beginning of lines, or the x/y positions of interesting characters.

> As I say elsewhere, it's possible that there really never is a need to efficiently specify an absolute position in a large text as a character (grapheme, whatever) count.
> But I think it would be hard to implement an efficient text-processing *language*, eg, a Python module
> for *full conformance* in handling Unicode, on top of UTF-8.

Still: why? I guess if I have some free time I'll try my hand at it, and maybe I'll run into a wall and realize you're right :).

> Any time you have an algorithm that requires efficient access to arbitrary text positions, you'll spend all your skull sweat fighting the representation. At least, that's been my experience with Emacsen.

What sort of algorithm would that be, though? The main thing that I could think of is a text editor trying to efficiently allow the user to scroll to the middle of a large file without reading the whole thing into memory. But, in that case, you could use byte-positions to estimate, and display a heuristic number while calculating the real line numbers. (This is what 'less' does, and it seems to work well.)

>> So I don't really see what you're arguing for here.
How do
>> *you* think positions in unicode strings should be represented?
>
> I think what users should see is character positions, and they should
> be able to specify them numerically as well as via an opaque marker
> object. I don't care whether that position is represented as bytes or
> characters internally, except that the experience of Emacsen is that
> representation as byte positions is both inefficient and fragile. The
> representation as character positions is more robust but slightly more
> inefficient.

Is it really the representation as byte positions which is fragile (i.e. the internal implementation detail), or the exposure of that position to calling code, and the idiomatic usage of that number as an integer?

From facundobatista at gmail.com Fri Nov 26 16:05:09 2010
From: facundobatista at gmail.com (Facundo Batista)
Date: Fri, 26 Nov 2010 12:05:09 -0300
Subject: [Python-Dev] [Preview] Comments and change proposals on documentation
In-Reply-To:
References:
Message-ID:

On Wed, Nov 24, 2010 at 5:24 PM, Georg Brandl wrote:

> at , you can look at a version of the 3.2
> docs that has the upcoming commenting feature.  JavaScript is mandatory.

This is awesome!! Thanks for this work, remind me to buy you a beer at the next PyCon!

> Credits go to Jacob Mason, whose GSOC project is responsible for almost all
> of what you see there.  [1]

Ok, two beers.

--
.    Facundo

Blog: http://www.taniquetil.com.ar/plog/
PyAr: http://www.python.org/ar/

From ocean-city at m2.ccsnet.ne.jp Fri Nov 26 17:33:50 2010
From: ocean-city at m2.ccsnet.ne.jp (Hirokazu Yamamoto)
Date: Sat, 27 Nov 2010 01:33:50 +0900
Subject: [Python-Dev] Removal of Win32 ANSI API
In-Reply-To: <201011140106.55153.victor.stinner@haypocalc.com>
References: <4CDC14C0.6070300@m2.ccsnet.ne.jp>
 <201011121308.30368.victor.stinner@haypocalc.com>
 <4CDEBB11.5050209@m2.ccsnet.ne.jp>
 <201011140106.55153.victor.stinner@haypocalc.com>
Message-ID: <4CEFE16E.6040801@m2.ccsnet.ne.jp>

On 2010/11/14 9:06, Victor Stinner wrote:
> Yes, but how do you check if the input argument is a bytes or a str object
> with your PyArg_Parse converter? You should use "O" format and manually
> convert it to unicode, and then convert the result back to bytes (if the input
> was bytes). I don't think that it makes the code shorter.
>
> The code is currently working. The question is if we have to drop the ANSI API
> now, later or never. It looks like the decision moves to "later" (deprecate in
> 3.2, remove in 3.3). I still think that dropping it now doesn't really hurt.
>
> Victor

Humble thoughts... Is it possible for a conversion from bytes (ANSI) to unicode to fail on Windows? If not, would it be acceptable to convert to unicode with PyUnicode_FSDecoder when the function doesn't return str? For example, os.stat() takes str arguments but doesn't return str.

# I noticed that win_readlink() in Modules/posixmodule.c is already unicode
# only. Maybe it's not much of a problem? ;-)

From ocean-city at m2.ccsnet.ne.jp Fri Nov 26 18:06:06 2010
From: ocean-city at m2.ccsnet.ne.jp (Hirokazu Yamamoto)
Date: Sat, 27 Nov 2010 02:06:06 +0900
Subject: [Python-Dev] Removal of Win32 ANSI API
In-Reply-To: <201011111718.08207.eckhardt@satorlaser.com>
References: <4CDC14C0.6070300@m2.ccsnet.ne.jp>
 <201011111718.08207.eckhardt@satorlaser.com>
Message-ID: <4CEFE8FE.8060201@m2.ccsnet.ne.jp>

On 2010/11/12 1:18, Ulrich Eckhardt wrote:
>> # I recently did it for winsound.PlaySound with MvL's approval
>
> Interesting, is there a ticket associated with this? Also, was that on Python 3
> or 2? Which commits?

Sorry for the late reply. Rev 86300 and Issue 6317.

From status at bugs.python.org Fri Nov 26 18:07:01 2010
From: status at bugs.python.org (Python tracker)
Date: Fri, 26 Nov 2010 18:07:01 +0100 (CET)
Subject: [Python-Dev] Summary of Python tracker Issues
Message-ID: <20101126170701.EDA80104026@psf.upfronthosting.co.za>

ACTIVITY SUMMARY (2010-11-19 - 2010-11-26)
Python tracker at http://bugs.python.org/

To view or respond to any of the issues listed below, click on the issue.
Do NOT respond to this message.
Issues counts and deltas: open 2533 (-16) closed 19792 (+98) total 22325 (+82) Open issues with patches: 1083 Issues opened (66) ================== #1178: IDLE - add "paste code" functionality http://bugs.python.org/issue1178 reopened by ned.deily #3709: BaseHTTPRequestHandler innefficient when sending HTTP header http://bugs.python.org/issue3709 reopened by r.david.murray #5150: IDLE to support reindent.py http://bugs.python.org/issue5150 reopened by rhettinger #8879: Implement os.link on Windows http://bugs.python.org/issue8879 reopened by amaury.forgeotdarc #9769: PyUnicode_FromFormatV() doesn't handle non-ascii text correctl http://bugs.python.org/issue9769 reopened by belopolsky #10220: Make generator state easier to introspect http://bugs.python.org/issue10220 reopened by ncoghlan #10268: Add --enable-loadable-sqlite-extensions option to `configure` http://bugs.python.org/issue10268 reopened by ned.deily #10441: some stdlib modules need to be updated to handle SSL certifica http://bugs.python.org/issue10441 reopened by pitrou #10453: Add -h/--help option to compileall http://bugs.python.org/issue10453 reopened by eric.araujo #10464: netrc module not parsing passwords containing #s. 
http://bugs.python.org/issue10464 opened by the_isz #10466: locale.py resetlocale throws exception on Windows (getdefaultl http://bugs.python.org/issue10466 opened by skoczian #10469: test_socket fails using Visual Studio 2010 http://bugs.python.org/issue10469 opened by Kotan #10475: hardcoded compilers for LDSHARED/LDCXXSHARED on NetBSD http://bugs.python.org/issue10475 opened by njoly #10478: Ctrl-C locks up the interpreter http://bugs.python.org/issue10478 opened by isandler #10479: cgitb.py should assume a binary stream for output http://bugs.python.org/issue10479 opened by v+python #10480: cgi.py should document the need for binary stdin/stdout http://bugs.python.org/issue10480 opened by v+python #10481: subprocess PIPEs are byte streams http://bugs.python.org/issue10481 opened by v+python #10482: subprocess and deadlock avoidance http://bugs.python.org/issue10482 opened by v+python #10483: http.server - what is executable on Windows http://bugs.python.org/issue10483 opened by v+python #10484: http.server.is_cgi fails to handle CGI URLs containing PATH_IN http://bugs.python.org/issue10484 opened by v+python #10485: http.server fails when query string contains addition '?' char http://bugs.python.org/issue10485 opened by v+python #10486: http.server doesn't set all CGI environment variables http://bugs.python.org/issue10486 opened by v+python #10487: http.server - doesn't process Status: header from CGI scripts http://bugs.python.org/issue10487 opened by v+python #10492: test_doctest fails with iso-8859-15 locale http://bugs.python.org/issue10492 opened by pitrou #10494: Demo/comparisons/regextest.py needs some usage information. http://bugs.python.org/issue10494 opened by ramiroluz #10495: Demo/comparisons/sortingtest.py needs some usage information. 
http://bugs.python.org/issue10495 opened by ramiroluz #10496: "import site failed" when Python can't find home directory http://bugs.python.org/issue10496 opened by bbi5291 #10497: Incorrect use of gettext in argparse http://bugs.python.org/issue10497 opened by eric.araujo #10498: calendar.LocaleHTMLCalendar.formatyearpage() results in traceb http://bugs.python.org/issue10498 opened by r.david.murray #10499: Modular interpolation in configparser http://bugs.python.org/issue10499 opened by lukasz.langa #10500: Palevo.DZ worm msix86 installer 3.x installer http://bugs.python.org/issue10500 opened by VilIgnoble #10502: Add unittestguirunner to Tools/ http://bugs.python.org/issue10502 opened by michael.foord #10503: os.getuid() documentation should be clear on what kind of uid http://bugs.python.org/issue10503 opened by giampaolo.rodola #10504: Trivial mingw compile fixes http://bugs.python.org/issue10504 opened by jonny #10507: Check well-formedness of reST markup within "make patchcheck" http://bugs.python.org/issue10507 opened by dmalcolm #10509: PyTokenizer_FindEncoding can lead to a segfault if bad charact http://bugs.python.org/issue10509 opened by Trundle #10510: distutils upload/register should use CRLF in HTTP requests http://bugs.python.org/issue10510 opened by Brian.Jones #10512: regrtest ResourceWarning - unclosed sockets and files http://bugs.python.org/issue10512 opened by nvawda #10513: sqlite3.InterfaceError after commit http://bugs.python.org/issue10513 opened by anders.blomdell at control.lth.se #10514: configure does not create accurate Makefile http://bugs.python.org/issue10514 opened by daelious #10515: csv sniffer does not recognize quotes at the end of line http://bugs.python.org/issue10515 opened by Martin.Budaj #10516: Add list.clear() and list.copy() http://bugs.python.org/issue10516 opened by terry.reedy #10517: test_concurrent_futures crashes with "Fatal Python error: Inva http://bugs.python.org/issue10517 opened by lukasz.langa #10518: 
Bring back callable() http://bugs.python.org/issue10518 opened by pitrou #10519: setobject.c no-op typo http://bugs.python.org/issue10519 opened by arigo #10521: str methods don't accept non-BMP fillchar on a narrow Unicode http://bugs.python.org/issue10521 opened by belopolsky #10522: test_telnet exception http://bugs.python.org/issue10522 opened by pitrou #10523: argparse has problem parsing option files containing empty row http://bugs.python.org/issue10523 opened by Michal.Pomorski #10524: Patch to add Pardus to supported dists in platform http://bugs.python.org/issue10524 opened by zaburt #10527: multiprocessing.Pipe problem: "handle out of range in select() http://bugs.python.org/issue10527 opened by synapse #10528: argparse uses %s in gettext calls http://bugs.python.org/issue10528 opened by eric.araujo #10529: Write argparse i18n howto http://bugs.python.org/issue10529 opened by eric.araujo #10530: distutils2 should allow the installing of python files with in http://bugs.python.org/issue10530 opened by michael.foord #10531: write tilted text in turtle http://bugs.python.org/issue10531 opened by lanyjie #10532: A bug related to matching the empty string http://bugs.python.org/issue10532 opened by lanyjie #10533: Need example of using __missing__ http://bugs.python.org/issue10533 opened by lukasz.langa #10534: difflib.SequenceMatcher: expose junk sets, deprecate undocumen http://bugs.python.org/issue10534 opened by terry.reedy #10535: Enable warnings by default in unittest http://bugs.python.org/issue10535 opened by ezio.melotti #10536: Enhancements to gettext docs http://bugs.python.org/issue10536 opened by eric.araujo #10537: IDLE crashes when you paste something. 
http://bugs.python.org/issue10537 opened by 5ragar5 #10538: PyArg_ParseTuple("s*") does not always incref object http://bugs.python.org/issue10538 opened by krisvale #10539: Regular expression not checking 'range' element on 1st char in http://bugs.python.org/issue10539 opened by TxRxFx #10540: test_shutil fails on Windows after r86733 http://bugs.python.org/issue10540 opened by brian.curtin #10541: regrtest.py -T broken http://bugs.python.org/issue10541 opened by doerwalter #10542: Py_UNICODE_NEXT and other macros for surrogates http://bugs.python.org/issue10542 opened by belopolsky #10543: Test discovery (unittest) does not work with jython http://bugs.python.org/issue10543 opened by michael.foord Most recent 15 issues with no replies (15) ========================================== #10543: Test discovery (unittest) does not work with jython http://bugs.python.org/issue10543 #10542: Py_UNICODE_NEXT and other macros for surrogates http://bugs.python.org/issue10542 #10541: regrtest.py -T broken http://bugs.python.org/issue10541 #10539: Regular expression not checking 'range' element on 1st char in http://bugs.python.org/issue10539 #10538: PyArg_ParseTuple("s*") does not always incref object http://bugs.python.org/issue10538 #10537: IDLE crashes when you paste something. 
http://bugs.python.org/issue10537 #10536: Enhancements to gettext docs http://bugs.python.org/issue10536 #10534: difflib.SequenceMatcher: expose junk sets, deprecate undocumen http://bugs.python.org/issue10534 #10531: write tilted text in turtle http://bugs.python.org/issue10531 #10530: distutils2 should allow the installing of python files with in http://bugs.python.org/issue10530 #10523: argparse has problem parsing option files containing empty row http://bugs.python.org/issue10523 #10522: test_telnet exception http://bugs.python.org/issue10522 #10514: configure does not create accurate Makefile http://bugs.python.org/issue10514 #10507: Check well-formedness of reST markup within "make patchcheck" http://bugs.python.org/issue10507 #10499: Modular interpolation in configparser http://bugs.python.org/issue10499 Most recent 15 issues waiting for review (15) ============================================= #10542: Py_UNICODE_NEXT and other macros for surrogates http://bugs.python.org/issue10542 #10540: test_shutil fails on Windows after r86733 http://bugs.python.org/issue10540 #10536: Enhancements to gettext docs http://bugs.python.org/issue10536 #10535: Enable warnings by default in unittest http://bugs.python.org/issue10535 #10527: multiprocessing.Pipe problem: "handle out of range in select() http://bugs.python.org/issue10527 #10524: Patch to add Pardus to supported dists in platform http://bugs.python.org/issue10524 #10521: str methods don't accept non-BMP fillchar on a narrow Unicode http://bugs.python.org/issue10521 #10518: Bring back callable() http://bugs.python.org/issue10518 #10515: csv sniffer does not recognize quotes at the end of line http://bugs.python.org/issue10515 #10512: regrtest ResourceWarning - unclosed sockets and files http://bugs.python.org/issue10512 #10509: PyTokenizer_FindEncoding can lead to a segfault if bad charact http://bugs.python.org/issue10509 #10504: Trivial mingw compile fixes http://bugs.python.org/issue10504 #10499: Modular 
interpolation in configparser http://bugs.python.org/issue10499 #10498: calendar.LocaleHTMLCalendar.formatyearpage() results in traceb http://bugs.python.org/issue10498 #10497: Incorrect use of gettext in argparse http://bugs.python.org/issue10497 Top 10 most discussed issues (10) ================================= #10461: Use with statement throughout the docs http://bugs.python.org/issue10461 27 msgs #7995: On Mac / BSD sockets returned by accept inherit the parent's F http://bugs.python.org/issue7995 24 msgs #10453: Add -h/--help option to compileall http://bugs.python.org/issue10453 24 msgs #9915: speeding up sorting with a key http://bugs.python.org/issue9915 14 msgs #9742: Python 2.7: math module fails to build on Solaris 9 http://bugs.python.org/issue9742 13 msgs #10533: Need example of using __missing__ http://bugs.python.org/issue10533 13 msgs #9509: argparse FileType raises ugly exception for missing file http://bugs.python.org/issue9509 12 msgs #10469: test_socket fails using Visual Studio 2010 http://bugs.python.org/issue10469 12 msgs #10504: Trivial mingw compile fixes http://bugs.python.org/issue10504 12 msgs #10518: Bring back callable() http://bugs.python.org/issue10518 12 msgs Issues closed (92) ================== #2244: urllib and urllib2 decode userinfo multiple times http://bugs.python.org/issue2244 closed by orsenthil #2986: difflib.SequenceMatcher not matching long sequences http://bugs.python.org/issue2986 closed by terry.reedy #3292: Position index limit; s.insert(i,x) not same as s[i:i]=[x] http://bugs.python.org/issue3292 closed by rhettinger #4493: urllib2 doesn't always supply / where URI path component is em http://bugs.python.org/issue4493 closed by orsenthil #4925: Improve error message of subprocess when cannot open http://bugs.python.org/issue4925 closed by benjamin.peterson #5353: Improve IndexError messages with actual values http://bugs.python.org/issue5353 closed by rhettinger #5412: extend configparser to support mapping 
access(__*item__) http://bugs.python.org/issue5412 closed by lukasz.langa #5616: Distutils 2to3 support doesn't have the doctest_only flag. http://bugs.python.org/issue5616 closed by eric.araujo #6166: encoding error for 'setup.py --author' when read via subproces http://bugs.python.org/issue6166 closed by eric.araujo #6378: Patch to make 'idle.bat' run idle.pyw using appropriate Python http://bugs.python.org/issue6378 closed by brian.curtin #6466: duplicate get_version() code between cygwinccompiler and emxcc http://bugs.python.org/issue6466 closed by eric.araujo #6722: collections.namedtuple: confusing example http://bugs.python.org/issue6722 closed by rhettinger #6799: mimetypes does not give canonical extension for guess_extensio http://bugs.python.org/issue6799 closed by eric.araujo #6878: changed return type from tkinter.Canvas.coords http://bugs.python.org/issue6878 closed by belopolsky #7212: Retrieve an arbitrary element from a set without removing it http://bugs.python.org/issue7212 closed by rhettinger #7226: IDLE right-clicks don't work on Mac OS 10.5 http://bugs.python.org/issue7226 closed by ned.deily #7257: Improve documentation of list.sort and sorted() http://bugs.python.org/issue7257 closed by rhettinger #7645: test_distutils fails on Windows XP http://bugs.python.org/issue7645 closed by brian.curtin #7770: sin/cos function in decimal-docs http://bugs.python.org/issue7770 closed by rhettinger #7804: test_readline failure http://bugs.python.org/issue7804 closed by pitrou #8078: add more baud constants to termios http://bugs.python.org/issue8078 closed by pitrou #8340: bytearray undocumented on trunk http://bugs.python.org/issue8340 closed by pitrou #8381: IDLE 2.6 freezes on OS X 10.6 http://bugs.python.org/issue8381 closed by ned.deily #8569: Upgrade OpenSSL in Windows builds http://bugs.python.org/issue8569 closed by brian.curtin #8590: test_httpservers.CGIHTTPServerTestCase failure on 3.1-maint Ma http://bugs.python.org/issue8590 closed by 
michael.foord #8631: subprocess.Popen.communicate(...) hangs on Windows http://bugs.python.org/issue8631 closed by brian.curtin #8645: PyUnicode_AsEncodedObject is undocumented http://bugs.python.org/issue8645 closed by belopolsky #8646: PyUnicode_EncodeDecimal is undocumented http://bugs.python.org/issue8646 closed by belopolsky #8647: PyUnicode_GetMax is undocumented http://bugs.python.org/issue8647 closed by eric.araujo #8705: shutil.rmtree with empty filepath http://bugs.python.org/issue8705 closed by brian.curtin #8938: Mac OS dialogs(Save As..., Load) translation http://bugs.python.org/issue8938 closed by ned.deily #9222: IDLE: Fix open/saveas 'Files of type' choices http://bugs.python.org/issue9222 closed by terry.reedy #9500: urllib2: Content-Encoding http://bugs.python.org/issue9500 closed by r.david.murray #9732: Addition of getattr_static for inspect module http://bugs.python.org/issue9732 closed by michael.foord #9746: All sequence types support .index and .count http://bugs.python.org/issue9746 closed by eric.araujo #9802: Document 'stability' of builtin min() and max() http://bugs.python.org/issue9802 closed by rhettinger #9807: deriving configuration information for different builds with t http://bugs.python.org/issue9807 closed by barry #9846: ZipExtFile provides no mechanism for closing the underlying fi http://bugs.python.org/issue9846 closed by lukasz.langa #9852: test_ctypes fail with clang http://bugs.python.org/issue9852 closed by ned.deily #9876: ConfigParser can't interpolate values from other sections http://bugs.python.org/issue9876 closed by lukasz.langa #9965: Loading malicious pickle may cause excessive memory usage http://bugs.python.org/issue9965 closed by georg.brandl #10134: test_email failures on Windows: end of line issue? 
http://bugs.python.org/issue10134 closed by r.david.murray #10138: calendar module does not support years outside [1, 9999] range http://bugs.python.org/issue10138 closed by belopolsky #10164: Add an assertBytesEqual to unittest and use it for bytes asser http://bugs.python.org/issue10164 closed by rhettinger #10172: code block has no syntax coloring http://bugs.python.org/issue10172 closed by georg.brandl #10183: test_concurrent_futures failure on Windows http://bugs.python.org/issue10183 closed by bquinlan #10255: refleak in initstdio http://bugs.python.org/issue10255 closed by pitrou #10299: Add index with links section for built-in functions http://bugs.python.org/issue10299 closed by ezio.melotti #10319: SocketServer.TCPServer truncates responses on close (in some s http://bugs.python.org/issue10319 closed by orsenthil #10325: PY_LLONG_MAX & co - preprocessor constants or not? http://bugs.python.org/issue10325 closed by mark.dickinson #10366: Remove unneeded '(object)' from 3.x class examples http://bugs.python.org/issue10366 closed by eric.araujo #10371: Deprecate trace module undocumented API http://bugs.python.org/issue10371 closed by belopolsky #10377: cProfile incorrectly labels its output http://bugs.python.org/issue10377 closed by orsenthil #10391: obj2ast's error handling can lead to python crashing with a C- http://bugs.python.org/issue10391 closed by benjamin.peterson #10420: Document of Bdb.effective is wrong. http://bugs.python.org/issue10420 closed by georg.brandl #10430: _sha.sha().digest() method is endian-sensitive. 
and hexdigest( http://bugs.python.org/issue10430 closed by krisvale #10437: ThreadPoolExecutor should accept max_workers=None http://bugs.python.org/issue10437 closed by stutzbach #10439: PyCodec C API is not documented in reST http://bugs.python.org/issue10439 closed by georg.brandl #10448: Add Mako template benchmark to Python Benchmark Suite http://bugs.python.org/issue10448 closed by pitrou #10450: Fix markup in Misc/NEWS http://bugs.python.org/issue10450 closed by eric.araujo #10458: 2.7 += re.ASCII http://bugs.python.org/issue10458 closed by terry.reedy #10459: missing character names in unicodedata (CJK...) http://bugs.python.org/issue10459 closed by loewis #10460: Misc/indent.pro does not reflect PEP 7 http://bugs.python.org/issue10460 closed by georg.brandl #10462: Handler.close is not called in subclass while Logger.removeHan http://bugs.python.org/issue10462 closed by vinay.sajip #10463: Wrong return type for xml.etree.ElementTree.parse() http://bugs.python.org/issue10463 closed by tiwoc #10465: gzip module calls getattr incorrectly http://bugs.python.org/issue10465 closed by georg.brandl #10467: io.BytesIO.readinto() segfaults when used on BytesIO object se http://bugs.python.org/issue10467 closed by benjamin.peterson #10468: Document UnicodeError access functions http://bugs.python.org/issue10468 closed by georg.brandl #10470: python -m unittest ought to default to discovery http://bugs.python.org/issue10470 closed by michael.foord #10471: include documentation in python docs and under python -h for o http://bugs.python.org/issue10471 closed by georg.brandl #10472: Strange tab key behaviour in interactive python 2.7 OSX 10.6.2 http://bugs.python.org/issue10472 closed by ned.deily #10473: Strange behavior for socket.timeout http://bugs.python.org/issue10473 closed by ned.deily #10474: range.count returns boolean http://bugs.python.org/issue10474 closed by benjamin.peterson #10476: __iter__ on a byte file object using a method to return an ite 
http://bugs.python.org/issue10476 closed by benjamin.peterson #10477: AttributeError: 'NoneType' object has no attribute 'name' (bo http://bugs.python.org/issue10477 closed by eric.araujo #10488: Improve documentation for 'float' built-in. http://bugs.python.org/issue10488 closed by mark.dickinson #10489: configparser: remove broken `__name__` support http://bugs.python.org/issue10489 closed by lukasz.langa #10490: mimetypes read_windows_registry fails for non-ASCII keys http://bugs.python.org/issue10490 closed by r.david.murray #10491: Insecure Windows python directory permissions http://bugs.python.org/issue10491 closed by loewis #10493: test_strptime failures under OpenIndiana http://bugs.python.org/issue10493 closed by jcea #10501: make_buildinfo regression with unquoted path http://bugs.python.org/issue10501 closed by krisvale #10505: test_compileall: failure on Windows http://bugs.python.org/issue10505 closed by eric.araujo #10506: argparse execute system exit in python prompt http://bugs.python.org/issue10506 closed by r.david.murray #10508: compiler warnings about formatting pid_t as an int http://bugs.python.org/issue10508 closed by georg.brandl #10511: heapq docs clarification http://bugs.python.org/issue10511 closed by georg.brandl #10520: Build with --enable-shared fails http://bugs.python.org/issue10520 closed by barry #10525: Added mouse and colour support to Game of Life curses demo http://bugs.python.org/issue10525 closed by orsenthil #10526: Minor typo in What's New in Python 2.7 http://bugs.python.org/issue10526 closed by georg.brandl #10345: fcntl.ioctl always fails claiming an invalid fd http://bugs.python.org/issue10345 closed by ned.deily #1059244: distutil bdist hardcodes the python location http://bugs.python.org/issue1059244 closed by eric.araujo #1574217: isinstance swallows exceptions http://bugs.python.org/issue1574217 closed by r.david.murray #1699853: locale.getlocale() output fails as setlocale() input 
http://bugs.python.org/issue1699853 closed by r.david.murray

From fijall at gmail.com Fri Nov 26 19:23:45 2010
From: fijall at gmail.com (Maciej Fijalkowski)
Date: Fri, 26 Nov 2010 20:23:45 +0200
Subject: [Python-Dev] PyPy 1.4 released
Message-ID:

===============================
PyPy 1.4: Ouroboros in practice
===============================

We're pleased to announce the 1.4 release of PyPy. This is a major breakthrough
in our long journey, as PyPy 1.4 is the first PyPy release that can translate
itself faster than CPython. Starting today, we are using PyPy more for
our every-day development. So may you :) You can download it here:

    http://pypy.org/download.html

What is PyPy
============

PyPy is a very compliant Python interpreter, almost a drop-in replacement
for CPython. It's fast (`pypy 1.4 and cpython 2.6`_ comparison)

Among its new features, this release includes numerous performance improvements
(which made fast self-hosting possible), a 64-bit JIT backend, as well
as serious stabilization. As of now, we can consider the 32-bit and 64-bit
linux versions of PyPy stable enough to run `in production`_.

Numerous speed achievements are described on `our blog`_. Normalized speed
charts comparing `pypy 1.4 and pypy 1.3`_ as well as `pypy 1.4 and cpython 2.6`_
are available on benchmark website. For the impatient: yes, we got a lot faster!

More highlights
===============

* PyPy's built-in Just-in-Time compiler is fully transparent and
  automatically generated; it now also has very reasonable memory
  requirements. The total memory used by a very complex and
  long-running process (translating PyPy itself) is within 1.5x to
  at most 2x the memory needed by CPython, for a speed-up of 2x.

* More compact instances. All instances are as compact as if
  they had ``__slots__``. This can give programs a big gain in
  memory. (In the example of translation above, we already have
  carefully placed ``__slots__``, so there is no extra win.)

* `Virtualenv support`_: now PyPy is fully compatible with virtualenv_: note that
  to use it, you need a recent version of virtualenv (>= 1.5).

* Faster (and JITted) regular expressions - huge boost in speeding up
  the `re` module.

* Other speed improvements, like JITted calls to functions like map().

.. _virtualenv: http://pypi.python.org/pypi/virtualenv
.. _`Virtualenv support`: http://morepypy.blogspot.com/2010/08/using-virtualenv-with-pypy.html
.. _`in production`: http://morepypy.blogspot.com/2010/11/running-large-radio-telescope-software.html
.. _`our blog`: http://morepypy.blogspot.com
.. _`pypy 1.4 and pypy 1.3`: http://speed.pypy.org/comparison/?exe=1%2B41,1%2B172&ben=1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20&env=1&hor=false&bas=1%2B41&chart=normal+bars
.. _`pypy 1.4 and cpython 2.6`: http://speed.pypy.org/comparison/?exe=2%2B35,1%2B172&ben=1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20&env=1&hor=false&bas=2%2B35&chart=normal+bars

Cheers,

Carl Friedrich Bolz, Antonio Cuni, Maciej Fijalkowski,
Amaury Forgeot d'Arc, Armin Rigo and the PyPy team

From reid.kleckner at gmail.com Fri Nov 26 19:33:54 2010
From: reid.kleckner at gmail.com (Reid Kleckner)
Date: Fri, 26 Nov 2010 13:33:54 -0500
Subject: [Python-Dev] PyPy 1.4 released
In-Reply-To:
References:
Message-ID:

Congratulations! Excellent work.

Reid

On Fri, Nov 26, 2010 at 1:23 PM, Maciej Fijalkowski wrote:
> ===============================
> PyPy 1.4: Ouroboros in practice
> ===============================

From brian.curtin at gmail.com Fri Nov 26 19:52:22 2010
From: brian.curtin at gmail.com (Brian Curtin)
Date: Fri, 26 Nov 2010 12:52:22 -0600
Subject: [Python-Dev] [Python-checkins] r86817 - python/branches/py3k-stat-on-windows/Lib/test/test_shutil.py
In-Reply-To: <20101126184428.E04A0EE984@mail.python.org>
References: <20101126184428.E04A0EE984@mail.python.org>
Message-ID:

On Fri, Nov 26, 2010 at 12:44, hirokazu.yamamoto wrote:

> Author: hirokazu.yamamoto
> Date: Fri Nov 26 19:44:28 2010
> New Revision: 86817
>
> Log:
> Now can reproduce the error on AMD64 Windows Server 2008
> even where os.symlink is not supported.
>
>
> Modified:
>    python/branches/py3k-stat-on-windows/Lib/test/test_shutil.py
>
> Modified: python/branches/py3k-stat-on-windows/Lib/test/test_shutil.py
>
> ==============================================================================
> --- python/branches/py3k-stat-on-windows/Lib/test/test_shutil.py     (original)
> +++ python/branches/py3k-stat-on-windows/Lib/test/test_shutil.py     Fri Nov 26 19:44:28 2010
> @@ -271,24 +271,32 @@
>              shutil.rmtree(src_dir)
>              shutil.rmtree(os.path.dirname(dst_dir))
>
> -    @support.skip_unless_symlink
> +    @unittest.skipUnless(hasattr(os, 'link'), 'requires os.link')
>      def test_dont_copy_file_onto_link_to_itself(self):
>          # bug 851123.
>          os.mkdir(TESTFN)
>          src = os.path.join(TESTFN, 'cheese')
>          dst = os.path.join(TESTFN, 'shop')
>          try:
> -            f = open(src, 'w')
> -            f.write('cheddar')
> -            f.close()
> -
> -            if hasattr(os, "link"):
> -                os.link(src, dst)
> -                self.assertRaises(shutil.Error, shutil.copyfile, src, dst)
> -                with open(src, 'r') as f:
> -                    self.assertEqual(f.read(), 'cheddar')
> -                os.remove(dst)
> +            with open(src, 'w') as f:
> +                f.write('cheddar')
> +            os.link(src, dst)
> +            self.assertRaises(shutil.Error, shutil.copyfile, src, dst)
> +            with open(src, 'r') as f:
> +                self.assertEqual(f.read(), 'cheddar')
> +            os.remove(dst)
> +        finally:
> +            shutil.rmtree(TESTFN, ignore_errors=True)
>
> +    @support.skip_unless_symlink
> +    def test_dont_copy_file_onto_symlink_to_itself(self):
> +        # bug 851123.
> +        os.mkdir(TESTFN)
> +        src = os.path.join(TESTFN, 'cheese')
> +        dst = os.path.join(TESTFN, 'shop')
> +        try:
> +            with open(src, 'w') as f:
> +                f.write('cheddar')
>              # Using `src` here would mean we end up with a symlink pointing
>              # to TESTFN/TESTFN/cheese, while it should point at
>              # TESTFN/cheese.
> @@ -298,10 +306,7 @@
>                  self.assertEqual(f.read(), 'cheddar')
>              os.remove(dst)
>          finally:
> -            try:
> -                shutil.rmtree(TESTFN)
> -            except OSError:
> -                pass
> +            shutil.rmtree(TESTFN, ignore_errors=True)
>
>      @support.skip_unless_symlink
>      def test_rmtree_on_symlink(self):

You might be working on something slightly different, but I have an issue
created for the failure of that test: http://bugs.python.org/issue10540

It slipped past me because I was only running the test suite as a regular
user without the required symlink privilege, so the test was skipped. That
Server 2008 build slave runs the test suite as administrator, so it was
running that test and going into the os.link block, which it didn't do until
r86733.

From ocean-city at m2.ccsnet.ne.jp Fri Nov 26 20:45:18 2010
From: ocean-city at m2.ccsnet.ne.jp (Hirokazu Yamamoto)
Date: Sat, 27 Nov 2010 04:45:18 +0900
Subject: [Python-Dev] [Python-checkins] r86817 - python/branches/py3k-stat-on-windows/Lib/test/test_shutil.py
In-Reply-To:
References: <20101126184428.E04A0EE984@mail.python.org>
Message-ID: <4CF00E4E.6030507@m2.ccsnet.ne.jp>

On 2010/11/27 3:52, Brian Curtin wrote:
> You might be working on something slightly different, but I have an issue
> created for the failure of that test: http://bugs.python.org/issue10540
>
> It slipped past me because I was only running the test suite as a regular
> user without the required symlink privilege, so the test was skipped. That
> Server 2008 build slave runs the test suite as administrator, so it was
> running that test and going into the os.link block, which it didn't do until
> r86733.

I'm not sure, but why does os.path.samefile return False for hard link
on windows? MSDN says,

> A hard link is the file system representation of a file by which more
> than one path references a single file in the same volume.
(http://msdn.microsoft.com/en-us/library/aa365006%28VS.85%29.aspx)

I know st_ino on windows is a bit different from POSIX, so, just I'm not
sure. ;-)

From brian.curtin at gmail.com Fri Nov 26 21:02:29 2010
From: brian.curtin at gmail.com (Brian Curtin)
Date: Fri, 26 Nov 2010 14:02:29 -0600
Subject: [Python-Dev] [Python-checkins] r86817 - python/branches/py3k-stat-on-windows/Lib/test/test_shutil.py
In-Reply-To: <4CF00E4E.6030507@m2.ccsnet.ne.jp>
References: <20101126184428.E04A0EE984@mail.python.org> <4CF00E4E.6030507@m2.ccsnet.ne.jp>
Message-ID:

On Fri, Nov 26, 2010 at 13:45, Hirokazu Yamamoto wrote:
> On 2010/11/27 3:52, Brian Curtin wrote:
>
>> On Fri, Nov 26, 2010 at 12:44, hirokazu.yamamoto <python-checkins at python.org> wrote:
>>
>>> Author: hirokazu.yamamoto
>>> Date: Fri Nov 26 19:44:28 2010
>>> New Revision: 86817
>>>
>>> Log:
>>> Now can reproduce the error on AMD64 Windows Server 2008
>>> even where os.symlink is not supported.
>>
>> You might be working on something slightly different, but I have an issue
>> created for the failure of that test: http://bugs.python.org/issue10540
>>
>> It slipped past me because I was only running the test suite as a regular
>> user without the required symlink privilege, so the test was skipped. That
>> Server 2008 build slave runs the test suite as administrator, so it was
>> running that test and going into the os.link block, which it didn't do
>> until r86733.
>
> I'm not sure, but why does os.path.samefile return False for hard link
> on windows? MSDN says,
>
> > A hard link is the file system representation of a file by which more
> > than one path references a single file in the same volume.
> (http://msdn.microsoft.com/en-us/library/aa365006%28VS.85%29.aspx)
>
> I know st_ino on windows is a bit different from POSIX, so, just I'm not
> sure. ;-)

The samefile thing, I don't know either. GetFinalPathNameByHandle does not
appear to work with hard links, at least how it's being used right now. It
has no problem with symlinks. We briefly chatted about this on the os.link
feature issue, but I never found a way around it. I'll look into it this
weekend.

From ocean-city at m2.ccsnet.ne.jp Fri Nov 26 21:18:58 2010
From: ocean-city at m2.ccsnet.ne.jp (Hirokazu Yamamoto)
Date: Sat, 27 Nov 2010 05:18:58 +0900
Subject: [Python-Dev] [Python-checkins] r86817 - python/branches/py3k-stat-on-windows/Lib/test/test_shutil.py
In-Reply-To:
References: <20101126184428.E04A0EE984@mail.python.org> <4CF00E4E.6030507@m2.ccsnet.ne.jp>
Message-ID: <4CF01632.8070504@m2.ccsnet.ne.jp>

On 2010/11/27 5:02, Brian Curtin wrote:
> We briefly chatted about this on the os.link
> feature issue, but I never found a way around it.

How about implementing os.path.samefile in
Modules/posixmodule.c like this?

http://bugs.python.org/file19262/py3k_fix_kill_python_for_short_path.patch

# I hope this works.

From brian.curtin at gmail.com Fri Nov 26 21:31:49 2010
From: brian.curtin at gmail.com (Brian Curtin)
Date: Fri, 26 Nov 2010 14:31:49 -0600
Subject: [Python-Dev] [Python-checkins] r86817 - python/branches/py3k-stat-on-windows/Lib/test/test_shutil.py
In-Reply-To: <4CF01632.8070504@m2.ccsnet.ne.jp>
References: <20101126184428.E04A0EE984@mail.python.org> <4CF00E4E.6030507@m2.ccsnet.ne.jp> <4CF01632.8070504@m2.ccsnet.ne.jp>
Message-ID:

On Fri, Nov 26, 2010 at 14:18, Hirokazu Yamamoto wrote:
> On 2010/11/27 5:02, Brian Curtin wrote:
>
>> We briefly chatted about this on the os.link
>> feature issue, but I never found a way around it.
>
> How about implementing os.path.samefile in
> Modules/posixmodule.c like this?
>
> http://bugs.python.org/file19262/py3k_fix_kill_python_for_short_path.patch
>
> # I hope this works.

That's almost identical to what the current os.path.sameopenfile is.

Lib/ntpath.py opens both files, then compares them via _getfileinformation.
That function is implemented to take in a file descriptor, call
GetFileInformationByHandle with it, then returns a tuple
of dwVolumeSerialNumber, nFileIndexHigh, and nFileIndexLow.

From martin at v.loewis.de Fri Nov 26 21:39:36 2010
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Fri, 26 Nov 2010 21:39:36 +0100
Subject: [Python-Dev] Removal of Win32 ANSI API
In-Reply-To: <4CEFE16E.6040801@m2.ccsnet.ne.jp>
References: <4CDC14C0.6070300@m2.ccsnet.ne.jp> <201011121308.30368.victor.stinner@haypocalc.com> <4CDEBB11.5050209@m2.ccsnet.ne.jp> <201011140106.55153.victor.stinner@haypocalc.com> <4CEFE16E.6040801@m2.ccsnet.ne.jp>
Message-ID: <4CF01B08.9000409@v.loewis.de>

> Is it possible a conversion from bytes (ANSI) to unicode fails on
> windows? It should fail sometimes, right?

Not for windows-1252, but certainly for shift-jis (you know better than
me). It seems that whether MultiByteToWideChar will fail depends on
whether MB_ERR_INVALID_CHARS is given or not. I don't know what it will
do if this flag is not given - my guess it fills in REPLACEMENT CHARACTER.

> If not, is it allowed to convert to unicode with
> PyUnicode_FSDecoder if function doesn't return str? For example,
> os.stat() takes str as arguments but doesn't return str.

This I don't understand. os.stat doesn't return text at all - so what do
you want to convert?

> # I noticed win_readlink() in Modules/posixmodule.c already unicode
> # only. Maybe not so much problem? ;-)

Well, readlink is new on Windows, and symlinks are not widespread. So
there is no backwards compatibility concern here.
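
To illustrate the earlier question of whether a bytes-to-unicode conversion can fail, here is a minimal sketch. It uses Python's own `shift_jis` codec as a stand-in for a Japanese ANSI code page; that substitution is an assumption for illustration only, and the Win32 MultiByteToWideChar call discussed above may differ in detail.

```python
# Python's codecs stand in for the Windows conversion here; this is an
# assumption for illustration, not the MultiByteToWideChar API itself.

valid = b"\x93\xfa\x96\x7b"   # two complete Shift-JIS characters
truncated = b"\x93"           # a lead byte with no trail byte

decoded = valid.decode("shift_jis")

try:
    truncated.decode("shift_jis")     # strict mode rejects the lone lead byte
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# With errors="replace" the bad byte becomes U+FFFD REPLACEMENT CHARACTER.
replaced = truncated.decode("shift_jis", errors="replace")

print(ascii(decoded), strict_failed, ascii(replaced))
```

Whether a failure is reported or papered over is a property of the error mode chosen by the caller, which is the same distinction the MB_ERR_INVALID_CHARS flag draws.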

Regards,
Martin

From ncoghlan at gmail.com Sat Nov 27 08:35:52 2010
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sat, 27 Nov 2010 17:35:52 +1000
Subject: [Python-Dev] [Python-checkins] r86720 - python/branches/py3k/Misc/ACKS
In-Reply-To:
References: <20101123203252.39BE7EE9CF@mail.python.org> <4CEC43A4.80907@netwok.org> <4CEC4917.2070508@udel.edu>
Message-ID:

On Thu, Nov 25, 2010 at 5:25 AM, Terry Reedy wrote:
> I know now that I could always edit with IDLE's editor, but it is a lot
> easier to right click and select edit than it is to run thru the directory
> tree in an open dialog.

If you want a decent free text editor on Windows, the open source
Notepad++ does a very nice job. It also adds an "Edit with Notepad++"
to the explorer context menu :)

> And of course, since the pseudo-BOM addition is
> undocumented within notepad itself, and probably other editors, it is easy
> to not know.

As far as the implicit BOM addition itself goes, reindent.py and
reindent-rst.py could probably be updated to check for it, but the
miscellaneous files (like ACKS) are likely to continue to need manual
checks.

Cheers,
Nick.

--
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From stephen at xemacs.org Sat Nov 27 09:48:52 2010
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Sat, 27 Nov 2010 17:48:52 +0900
Subject: [Python-Dev] len(chr(i)) = 2?
In-Reply-To: References: <201011192123.14169.victor.stinner@haypocalc.com> <4CE6F93F.9010109@egenix.com> <4CE6FE30.1050903@v.loewis.de> <87hbfc1vnf.fsf@uwakimon.sk.tsukuba.ac.jp> <4CE78F62.7060707@v.loewis.de> <8739qukf9r.fsf@uwakimon.sk.tsukuba.ac.jp> <20101121173825.B1BFB235977@kimball.webabinitio.net> <60F8726F-C1C2-4803-8B8E-688EF0443FA0@gmail.com> <87eiadd46t.fsf@uwakimon.sk.tsukuba.ac.jp> <4CEC5316.4010608@canterbury.ac.nz> <77AAC178-F868-4F05-8509-4A9FB66F61EC@fuhm.net> <87sjyrbftz.fsf@uwakimon.sk.tsukuba.ac.jp> <635C265A-90A8-4B92-A65C-59EF3E8EFD68@twistedmatrix.com> <87oc9fb97b.fsf@uwakimon.sk.tsukuba.ac.jp> <3C1ADB64-63F3-4165-926D-EDE9846E0DBD@fuhm.net> <87mxozayam.fsf@uwakimon.sk.tsukuba.ac.jp> <4CEDCB86.9030506@canterbury.ac.nz> <87ipzm6oqr.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <87y68f5eyz.fsf@uwakimon.sk.tsukuba.ac.jp> Glyph Lefkowitz writes: > But I don't think that anyone is filling up main memory with > gigantic piles of character indexes and need to squeeze out that > extra couple of bytes of memory on such a tiny object. How do you think editors and browsers represent the regions that they highlight, then? How do you think that structure-oriented editors represent the structures that they work with, then? In a detailed analysis of a C or Java file, it's easy to end up with almost 1:2 positions to characters ratio. Note that *buffer* characters are typically smaller than a platform word, so saving one word in the representation of a position mean a 100% or more increase in the character count of the buffer. Even in the case of UCS-4 on a 32-bit platform, that's a 50% increase in the maximum usable size of a buffer before a parser starts raising OOM errors. There are two plausible ways to represent these structures that I can think of offhand. The first is to do it the way Emacs does, by reading the text into a buffer and using position offsets to map to display or structure attributes. 
The second is to use a hierarchical document model, and render the display by traversing the document hierarchy. It's not obvious to me that forcing use of the second representation is a good idea for performance in an editor, and I would think that they have similar memory requirements. > Plus, this would allow such a user to stop copying the character > data itself just to decode it, and on mostly-ascii UTF-8 text (a > common use-case) this is a 2x savings right off the bat. Which only matters if you're a server in the business of shoveling octets really fast but are CPU bound (seems unlikely to me, but I'm no expert; WDYT?), and even then is only that big a savings if you can push off the issue of validating the purported UTF-8 text on others. If you're not validating, you may as well acknowledge that you're processing binary data, not text.[1] But we're talking about text. And of course, if you copy mostly-Han UTF-8 text (a common use-case) to UCS-2, this is a 1.5x memory savings right off the bat, and a 3x time savings when iterating in most architectures (one increment operation per character instead of three). As I've already said, I don't think this is an argument in favor of either representation. Sometimes one wins, sometimes the other. I don't think supplying both is a great idea, although I've proposed it myself for XEmacs (but made as opaque as possible). > > In Python it's true that markers can use the same data structure as > > integers and simply provide different methods, and it's arguable that > > Python's design is better. But if you use bytes internally, then you > > have problems. > > No, you just have design questions. Call them what you like, they're as yet unanswered. In any given editing scenario, I'd concede that it's a "SMOD". But if you're designing a language for text processing, it's a restriction that I believe to be a hindrance to applications. 
Many applications may prefer to use a straightforward array implementation of text and focus their design efforts on the real problems of their use cases. > > Do you expose that byte value to the user? If so, what do you do > > if the user specifies a byte value that points into a multibyte > > character? > > Go to the beginning of the multibyte character. Report that > position; if the user then asks the requested marker object for its > position, it will report that byte offset, not the > originally-requested one. (Obviously, do the same thing for > surrogate pair code points.) I will guarantee that some use cases will prefer that you go to the beginning of the *next* character. For an obvious example, your algorithm will infloop if you iterate "pos += 1". (And the opposite problem appears for "beginning of next character" combined with "pos -= 1".) Of course this trivial example is easily addressed by saying "the user should be using the character iterator API here", but I expect the issue can arise where that is not an easy answer. Either the API becomes complex, or the user/developers will have to do complex bookkeeping that should be done by the text implementation. Nor is it obvious that surrogate pairs will be present in a UCS-2 representation. Specifically, they can be encoded to single private space characters in almost all applications, at a very small cost in performance. > > What if the user wants to specify position by number of > > characters? > > Part of the point that we are trying to make here is that nobody > really cares about that use-case. In order to know anything useful > about a position in a text, you have to have traversed to that > location in the text. Binary search of an ordered text is useful. Granted, this particular example can be addressed usefully in terms of byte positions (viz. your example of less), but your basic premise is falsified. 
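
The two snapping policies discussed above (go back to the start of the character, or go forward to the start of the next one) can be sketched for UTF-8 as follows. This is only an illustration under stated assumptions: the function names are hypothetical and are not taken from Emacs, XEmacs, or Python.

```python
def snap_to_char_start(data: bytes, pos: int) -> int:
    """Move pos back to the first byte of the UTF-8 character containing it."""
    # Continuation bytes have the bit pattern 10xxxxxx.
    while 0 < pos < len(data) and (data[pos] & 0xC0) == 0x80:
        pos -= 1
    return pos


def snap_to_next_char(data: bytes, pos: int) -> int:
    """Move pos forward to the first byte of the next UTF-8 character."""
    while pos < len(data) and (data[pos] & 0xC0) == 0x80:
        pos += 1
    return pos


# 'a' is 1 byte, U+00E9 is 2 bytes, U+20AC is 3 bytes.
text = "a\u00e9\u20ac".encode("utf-8")

# Starting on the 2-byte character at byte 1, "pos += 1" followed by
# snapping backwards lands on byte 1 again, so the loop never advances.
pos = 1
stalls = snap_to_char_start(text, pos + 1) == pos

# Snapping forwards instead reaches the next character at byte 3.
advanced = snap_to_next_char(text, pos + 1)

print(stalls, advanced)
```

Neither policy is right for every caller, which is why exposing raw byte offsets as ordinary integers pushes bookkeeping onto the user.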
> You can remember interesting things like the offsets of starts of > lines, or the x/y positions of characters. > > > Can you translate efficiently? > > No, because there's no point :). But you _could_ implement an > overlay that cached things like the beginning of lines, or the x/y > positions of interesting characters. Emacs does, and a lot of effort has gone into it, and it still sucks compared to an array representation. Maybe _you_ _could_ do better, but as yet we haven't managed to pull it off. :-( > > But I think it would be hard to implement an efficient > > text-processing *language*, eg, a Python module for *full > > conformance* in handling Unicode, on top of UTF-8. > > Still: why? I guess if I have some free time I'll try my hand at > it, and maybe I'll run into a wall and realize you're right :). I'd rather have you make it plausible to me that there's no point in having efficient access to arbitrary character positions. Then maybe you can delegate that implementation to me. :-) But my Emacs experience says otherwise, and IIUC the intuition and/or experience of MAL and Guido says this is not a YAGNI. > > Any time you have an algorithm that requires efficient access to > > arbitrary text positions, you'll spend all your skull sweat > > fighting the representation. At least, that's been my experience > > with Emacsen. > > What sort of algorithm would that be, though? The main thing that > I could think of is a text editor trying to efficiently allow the > user to scroll to the middle of a large file without reading the > whole thing into memory. Reading into memory or not is a red herring, I think. For many legacy encodings you have to pretty much read the whole thing because they are stateful, and it's just not very expensive compared to the text processing itself (unless your application is shoveling octets as fast as possible, in which case character positions are indeed a YAGNI). The question is whether opaque markers are always sufficient. 
For example, XEmacs does use byte positions internally for markers and extents (objects representing regions of text that can carry arbitrary properties but are tuned for display properties). Obviously, we have the marker objects you propose as sufficient, and indeed the representation is as efficient as you claim. However, these positions are not exposed as integers to end users, Lisp, or even most of the C code. If a client (end user or code) requests a position, they get a character position. Such requests are frequent enough that they constitute a major drag on many practical applications. It may be that this is unnecessary, as less shows for its application. But less is not an editor, let alone a language for writing editors. Do you know of an editor language of power comparable to Emacs Lisp that is not based on an array representation of text? > Is it really the representation as byte positions which is fragile > (i.e. the internal implementation detail), or the exposure of that > position to calling code, and the idiomatic usage of that number as > an integer? It's the latter. Sufficient effort can make it safe to use byte positions, and the effort is not all that great as long as you don't demand efficiency. XEmacs vs. Emacs implementation of Mule demonstrates that. We at XEmacs never did expose byte positions to even the C code (other than to buffer and string methods), and that implementation has not had to change much, if at all, in 15 years. The caching mechanism to make character position access reasonably efficient, however, has been buggy and not so efficient, and so complex that RMS said "I was going to implement your [position cache] in Emacs but it was too hard for me to understand". (OTOH, the alternative Emacs had implemented turned out to be O(n**2) or worse, so he had to replace it. Translating byte positions to character positions seems to be a real loser.) 
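
As a rough sketch of the kind of position cache described above, the following keeps a checkpoint every few characters and scans only the short stretch after the nearest checkpoint. The class name, the `stride` parameter, and the checkpoint layout are assumptions made for illustration; this is not the XEmacs or Emacs implementation.

```python
import bisect


class PositionCache:
    """Translate between character and byte positions in UTF-8 data."""

    def __init__(self, data: bytes, stride: int = 2):
        self.data = data
        self.stride = stride
        self.byte_marks = []   # byte offset of every stride-th character
        nchars = 0
        for i, b in enumerate(data):
            if (b & 0xC0) != 0x80:          # first byte of a character
                if nchars % stride == 0:
                    self.byte_marks.append(i)
                nchars += 1
        self.nchars = nchars

    def char_to_byte(self, cpos: int) -> int:
        if not 0 <= cpos < self.nchars:
            raise IndexError(cpos)
        k = cpos // self.stride
        i = self.byte_marks[k]              # jump to the nearest checkpoint
        for _ in range(cpos - k * self.stride):
            i += 1                          # step over the lead byte
            while i < len(self.data) and (self.data[i] & 0xC0) == 0x80:
                i += 1                      # skip continuation bytes
        return i

    def byte_to_char(self, bpos: int) -> int:
        if not 0 <= bpos < len(self.data):
            raise IndexError(bpos)
        k = bisect.bisect_right(self.byte_marks, bpos) - 1
        cpos = k * self.stride
        # Count character starts strictly after the checkpoint, up to bpos.
        for b in self.data[self.byte_marks[k] + 1 : bpos + 1]:
            if (b & 0xC0) != 0x80:
                cpos += 1
        return cpos


cache = PositionCache("a\u00e9\u20acb".encode("utf-8"))
print(cache.char_to_byte(3), cache.byte_to_char(6), cache.byte_to_char(4))
```

Even this toy version has to be rebuilt or patched whenever the text is edited, which hints at why a production-quality cache is hard to get both correct and fast.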
Emacs did expose byte positions for efficiency reasons, and has had at least four regressions of the "\201 bug". "\201" prefixes a Latin-1 character in internal code, and code that treated byte positions would often result in this being duplicated because all trailing bytes in Mule code are also Latin-1 code points. (Don't ask me about the exact mechanism, XEmacs's implementation is quite different and never suffered from this bug.) Note that a \201-like bug is very unlikely to occur in Python's UCS-2 representation because the semantics of surrogate values in Unicode is unambiguous. However, I believe similar bugs would be possible in a UTF-8 representation -- if code is allowed to choose whether to view UTF-8 in binary or text mode -- because trailing byte values are Latin-1 code points. Maybe I'm just an old granny, scared of my shadow. Footnotes: [1] I have no objection to providing "text" algorithms (such as regexps) for use on "binary" data. But then they don't provide any guarantees that transformations of purported text remains text. From ncoghlan at gmail.com Sat Nov 27 11:51:38 2010 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 27 Nov 2010 20:51:38 +1000 Subject: [Python-Dev] constant/enum type in stdlib In-Reply-To: <4CED4E34.5060400@voidspace.org.uk> References: <20101121034404.52924F20A@mail.python.org> <4CE9BF4A.1020302@netwok.org> <4CEA89E8.5090107@voidspace.org.uk> <20101122163722.7e96d123@pitrou.net> <4CEA9584.7040301@avl.com> <20101122172440.77d27ed5@pitrou.net> <20101122164654.2109.588145158.divmod.xquotient.165@localhost.localdomain> <4CEBC6BD.9060402@voidspace.org.uk> <4CED0557.9090101@voidspace.org.uk> <4CED4E34.5060400@voidspace.org.uk> Message-ID: On Thu, Nov 25, 2010 at 3:41 AM, Michael Foord wrote: > Can you explain what you see as the difference? 
>
> I'm not particularly interested in type validation but I like the fact that
> typical enum APIs allow you to group constants: the generated constant class
> acts as a namespace for all the defined constants.

The problem with blessing one particular "enum API" is that people
have so many different ideas as to what an enum API should look like.

However, the one thing they all have in common is the ability to take
a value and give it a name, then present *both* of those in debugging
information.

> Are you just suggesting something along the lines of:
>
> class NamedConstant(int):
>     def __new__(cls, name, val):
>         return int.__new__(cls, val)
>
>     def __init__(self, name, val):
>         self._name = name
>
>     def __repr__(self):
>         return ' ' % self._name
>
> FOO = NamedConstant('FOO', 3)
>
> In general the less features the better, but I'd like a few more features
> than that. :-)

Not quite. I'm suggesting a factory function that works for any value,
and derives the parent class from the type of the supplied value.
However, what you wrote is still the essence of the idea - we would be
primarily providing a building block that makes it easier for people
to *create* enum APIs if they want to, but for simple use cases (where
all they really wanted was the enhanced debugging information) they
wouldn't need to bother. In the standard library, wherever we do
"enum-like things" we would switch to using named values where it
makes sense to do so.

Doing so may actually make sense for more than just constants - it may
make sense for significant mutable globals as well.

==========================================================================
# Implementation (more than just a sketch, since it handles some interesting corner cases)
import functools

@functools.lru_cache()
def _make_named_value_type(base_type):
    class _NamedValueType(base_type):
        def __new__(cls, name, value):
            return base_type.__new__(cls, value)
        def __init__(self, name, value):
            self.__name = name
            super().__init__(value)
        @property
        def _name(self):
            return self.__name
        def _raw(self):
            return base_type(self)
        def __repr__(self):
            return "{}={}".format(self._name, super().__repr__())
        if base_type.__str__ is object.__str__:
            __str__ = base_type.__repr__
    _NamedValueType.__name__ = "Named<{}>".format(base_type.__name__)
    return _NamedValueType

def named_value(name, value):
    return _make_named_value_type(type(value))(name, value)

def set_named_values(namespace, **kwds):
    for k, v in kwds.items():
        namespace[k] = named_value(k, v)

x = named_value("FOO", 1)
y = named_value("BAR", "Hello World!")
z = named_value("BAZ", dict(a=1, b=2, c=3))

print(x, y, z, sep="\n")
print("\n".join(map(repr, (x, y, z))))
print("\n".join(map(str, map(type, (x, y, z)))))

set_named_values(globals(), foo=x._raw(), bar=y._raw(), baz=z._raw())
print("\n".join(map(repr, (foo, bar, baz))))
print(type(x) is type(foo), type(y) is type(bar), type(z) is type(baz))

==========================================================================

# Session output for the last 6 lines

>>> print(x, y, z, sep="\n")
1
Hello World!
{'a': 1, 'c': 3, 'b': 2}

>>> print("\n".join(map(repr, (x, y, z))))
FOO=1
BAR='Hello World!'
BAZ={'a': 1, 'c': 3, 'b': 2}

>>> print("\n".join(map(str, map(type, (x, y, z)))))
<class '__main__.Named<int>'>
<class '__main__.Named<str>'>
<class '__main__.Named<dict>'>

>>> set_named_values(globals(), foo=x._raw(), bar=y._raw(), baz=z._raw())
>>> print("\n".join(map(repr, (foo, bar, baz))))
foo=1
bar='Hello World!'
baz={'a': 1, 'c': 3, 'b': 2}

>>> print(type(x) is type(foo), type(y) is type(bar), type(z) is type(baz))
True True True

For "normal" use, such objects would look like ordinary instances of
their class. They would only behave differently when their
representation is printed (prepending their name), or when their type
is interrogated (being an instance of the named subclass rather than
the ordinary type).

Cheers,
Nick.

-- 
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From ncoghlan at gmail.com Sat Nov 27 13:05:32 2010
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sat, 27 Nov 2010 22:05:32 +1000
Subject: [Python-Dev] [Preview] Comments and change proposals on documentation
In-Reply-To: 
References: 
Message-ID: 

On Thu, Nov 25, 2010 at 6:24 AM, Georg Brandl wrote:
> Hi,
>
> at , you can look at a version of the 3.2
> docs that has the upcoming commenting feature. JavaScript is mandatory.

Very nice!

I'm not sure what to do about the discoverability of the comment
bubbles at the end of each paragraph. I initially thought commenting
wasn't available on What's New or the Using Python docs until seeing
where the blue comment bubbles appeared in the math module docs.

A discreet notice at the bottom of the sidebar and/or an explanation
at the "Report a Bug" page may cover it I guess.

> Please test on a smaller page, such as ,
> there is currently a speed issue with larger pages. (Helpful tips from
> JS experts are welcome.)

I gave the JS a fair few comments on the first paragraph to digest. I
also put my detailed UI comments there as well (I needed something to
write about while testing, so I figured I may as well make it useful
to you!)

> Other things I have to do before this can go live:
>
> * reuse existing logins from either wiki or tracker?

Tracker sounds like the best bet to me.

> Any feedback is appreciated (I'd suggest mailing it to doc-SIG only, to avoid
> cluttering up python-dev).

My comments on the math module may give you a chance to see how
easy it is to get text out of comments into a form suitable for
sending to a mailing list or posting to a tracker issue for further
discussion :)

Cheers,
Nick.

-- 
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From ncoghlan at gmail.com Sat Nov 27 13:17:31 2010
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sat, 27 Nov 2010 22:17:31 +1000
Subject: [Python-Dev] [Python-checkins] r86745 - in python/branches/py3k: Doc/library/difflib.rst Lib/difflib.py Lib/test/test_difflib.py Misc/NEWS
In-Reply-To: <20101125061234.F1CC3EEA23@mail.python.org>
References: <20101125061234.F1CC3EEA23@mail.python.org>
Message-ID: 

On Thu, Nov 25, 2010 at 4:12 PM, terry.reedy wrote:
> The :class:`SequenceMatcher` class has this constructor:
>
>
> -.. class:: SequenceMatcher(isjunk=None, a='', b='')
> +.. class:: SequenceMatcher(isjunk=None, a='', b='', autojunk=True)
>
>    Optional argument *isjunk* must be ``None`` (the default) or a one-argument
>    function that takes a sequence element and returns true if and only if the
> @@ -340,6 +349,9 @@
>    The optional arguments *a* and *b* are sequences to be compared; both default to
>    empty strings. The elements of both sequences must be :term:`hashable`.
>
> +   The optional argument *autojunk* can be used to disable the automatic junk
> +   heuristic.
> +

Catching up on checkins traffic, so a later checkin may already fix
this, but there should be a versionchanged tag in the docs to note
when the autojunk parameter was added.

Cheers,
Nick.

-- 
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From ncoghlan at gmail.com Sat Nov 27 13:22:50 2010
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sat, 27 Nov 2010 22:22:50 +1000
Subject: [Python-Dev] [Python-checkins] r86750 - python/branches/py3k/Demo/curses/life.py
In-Reply-To: <20101126021524.GA1450@rubuntu>
References: <20101125145644.D98FAEEA26@mail.python.org> <4CEF0E3B.2070608@netwok.org> <20101126021524.GA1450@rubuntu>
Message-ID: 

On Fri, Nov 26, 2010 at 12:15 PM, Senthil Kumaran wrote:
>> Re: "colour": the rest of the file uses US English, as do the function
>> names (see for example curses.has_color). It's good to use one dialect
>> consistently in one file.
>
> Good catch. Did not realize it because, we write it as colour too.
> Changing it.

I just resign myself to having to spell words like colour and
serialise wrong when I'm working on Python. Compared to the
adjustments the non-native English speakers have to make, I figure I'm
getting off lightly ;)

Cheers,
Nick.

-- 
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From fuzzyman at voidspace.org.uk Sat Nov 27 13:52:40 2010
From: fuzzyman at voidspace.org.uk (Michael Foord)
Date: Sat, 27 Nov 2010 12:52:40 +0000
Subject: [Python-Dev] [Python-checkins] r86750 - python/branches/py3k/Demo/curses/life.py
In-Reply-To: 
References: <20101125145644.D98FAEEA26@mail.python.org> <4CEF0E3B.2070608@netwok.org> <20101126021524.GA1450@rubuntu>
Message-ID: <4CF0FF18.4030408@voidspace.org.uk>

On 27/11/2010 12:22, Nick Coghlan wrote:
> On Fri, Nov 26, 2010 at 12:15 PM, Senthil Kumaran wrote:
>>> Re: "colour": the rest of the file uses US English, as do the function
>>> names (see for example curses.has_color). It's good to use one dialect
>>> consistently in one file.
>> Good catch. Did not realize it because, we write it as colour too.
>> Changing it.
> I just resign myself to having to spell words like colour and
> serialise wrong when I'm working on Python.
Compared to the > adjustments the non-native English speakers have to make, I figure I'm > getting off lightly ;) > I *thought* that the Python policy was that English speakers wrote documentation in English and American speakers wrote documentation in American and that we *don't* insist on US spellings in the Python documentation? Michael > Cheers, > Nick. > -- http://www.voidspace.org.uk/ READ CAREFULLY. By accepting and reading this email you agree, on behalf of your employer, to release me from all obligations and waivers arising from any and all NON-NEGOTIATED agreements, licenses, terms-of-service, shrinkwrap, clickwrap, browsewrap, confidentiality, non-disclosure, non-compete and acceptable use policies (?BOGUS AGREEMENTS?) that I have entered into with your employer, its partners, licensors, agents and assigns, in perpetuity, without prejudice to my ongoing rights and privileges. You further represent that you have the authority to release me from any BOGUS AGREEMENTS on behalf of your employer. From eliben at gmail.com Sat Nov 27 14:00:27 2010 From: eliben at gmail.com (Eli Bendersky) Date: Sat, 27 Nov 2010 15:00:27 +0200 Subject: [Python-Dev] [Python-checkins] r86745 - in python/branches/py3k: Doc/library/difflib.rst Lib/difflib.py Lib/test/test_difflib.py Misc/NEWS In-Reply-To: References: <20101125061234.F1CC3EEA23@mail.python.org> Message-ID: On Sat, Nov 27, 2010 at 14:17, Nick Coghlan wrote: > On Thu, Nov 25, 2010 at 4:12 PM, terry.reedy > wrote: > > The :class:`SequenceMatcher` class has this constructor: > > > > > > -.. class:: SequenceMatcher(isjunk=None, a='', b='') > > +.. class:: SequenceMatcher(isjunk=None, a='', b='', autojunk=True) > > > > Optional argument *isjunk* must be ``None`` (the default) or a > one-argument > > function that takes a sequence element and returns true if and only if > the > > @@ -340,6 +349,9 @@ > > The optional arguments *a* and *b* are sequences to be compared; both > default to > > empty strings. 
The elements of both sequences must be > :term:`hashable`. > > > > + The optional argument *autojunk* can be used to disable the automatic > junk > > + heuristic. > > + > > Catching up on checkins traffic, so a later checkin may already fix > this, but there should be a versionchanged tag in the docs to note > when the autojunk parameter was added. > Hi Nick, Since autojunk was added in 2.7.1 (the docs of which do indicate this is the versionchanged tag), I think Terry may have left the tag in 3.2 out on purpose. That said, personally I don't know what the policy is regarding features added just in 3.2 and 2.7 (and didn't exist in 3.1) in this respect. Eli -------------- next part -------------- An HTML attachment was scrubbed... URL: From fuzzyman at voidspace.org.uk Sat Nov 27 14:02:36 2010 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Sat, 27 Nov 2010 13:02:36 +0000 Subject: [Python-Dev] [Python-checkins] r86745 - in python/branches/py3k: Doc/library/difflib.rst Lib/difflib.py Lib/test/test_difflib.py Misc/NEWS In-Reply-To: References: <20101125061234.F1CC3EEA23@mail.python.org> Message-ID: <4CF1016C.8050902@voidspace.org.uk> On 27/11/2010 13:00, Eli Bendersky wrote: > On Sat, Nov 27, 2010 at 14:17, Nick Coghlan > wrote: > > On Thu, Nov 25, 2010 at 4:12 PM, terry.reedy > > > wrote: > > The :class:`SequenceMatcher` class has this constructor: > > > > > > -.. class:: SequenceMatcher(isjunk=None, a='', b='') > > +.. class:: SequenceMatcher(isjunk=None, a='', b='', autojunk=True) > > > > Optional argument *isjunk* must be ``None`` (the default) or > a one-argument > > function that takes a sequence element and returns true if > and only if the > > @@ -340,6 +349,9 @@ > > The optional arguments *a* and *b* are sequences to be > compared; both default to > > empty strings. The elements of both sequences must be > :term:`hashable`. > > > > + The optional argument *autojunk* can be used to disable the > automatic junk > > + heuristic. 
> > + > > Catching up on checkins traffic, so a later checkin may already fix > this, but there should be a versionchanged tag in the docs to note > when the autojunk parameter was added. > > > Hi Nick, > > Since autojunk was added in 2.7.1 (the docs of which do indicate this > is the versionchanged tag), I think Terry may have left the tag in 3.2 > out on purpose. That said, personally I don't know what the policy is > regarding features added just in 3.2 and 2.7 (and didn't exist in 3.1) > in this respect. Features new in Python 3.2 that didn't exist in 3.1 should have a versionadded:: 3.2 tag. Michael > > Eli > > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/fuzzyman%40voidspace.org.uk -- http://www.voidspace.org.uk/ READ CAREFULLY. By accepting and reading this email you agree, on behalf of your employer, to release me from all obligations and waivers arising from any and all NON-NEGOTIATED agreements, licenses, terms-of-service, shrinkwrap, clickwrap, browsewrap, confidentiality, non-disclosure, non-compete and acceptable use policies ("BOGUS AGREEMENTS") that I have entered into with your employer, its partners, licensors, agents and assigns, in perpetuity, without prejudice to my ongoing rights and privileges. You further represent that you have the authority to release me from any BOGUS AGREEMENTS on behalf of your employer. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From fuzzyman at voidspace.org.uk Sat Nov 27 15:01:22 2010 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Sat, 27 Nov 2010 14:01:22 +0000 Subject: [Python-Dev] constant/enum type in stdlib In-Reply-To: References: <20101121034404.52924F20A@mail.python.org> <4CE9BF4A.1020302@netwok.org> <4CEA89E8.5090107@voidspace.org.uk> <20101122163722.7e96d123@pitrou.net> <4CEA9584.7040301@avl.com> <20101122172440.77d27ed5@pitrou.net> <20101122164654.2109.588145158.divmod.xquotient.165@localhost.localdomain> <4CEBC6BD.9060402@voidspace.org.uk> <4CED0557.9090101@voidspace.org.uk> <4CED4E34.5060400@voidspace.org.uk> Message-ID: <4CF10F32.9020805@voidspace.org.uk> On 27/11/2010 10:51, Nick Coghlan wrote: > On Thu, Nov 25, 2010 at 3:41 AM, Michael Foord > wrote: >> Can you explain what you see as the difference? >> >> I'm not particularly interested in type validation but I like the fact that >> typical enum APIs allow you to group constants: the generated constant class >> acts as a namespace for all the defined constants. > The problem with blessing one particular "enum API" is that people > have so many different ideas as to what an enum API should look like. > There actually seemed to be quite a bit of agreement around basic functionality though. > However, the one thing they all have in common is the ability to take > a value and give it a name, then present *both* of those in debugging > information. And this is the most important functionality. I would say that the grouping (namespacing) of constants is also useful, provided by *most* Python enum APIs and easy to implement without over complexifying the API. (Note that there is no *particular* hurry to get this into 3.2 - the beta is due imminently. 
I wouldn't object to it ) >> Are you just suggesting something along the lines of: >> >> class NamedConstant(int): >> def __new__(cls, name, val): >> return int.__new__(cls, val) >> >> def __init__(self, name, val): >> self._name = name >> >> def __repr__(self): >> return ' ' % self._name >> >> FOO = NamedConstant('FOO', 3) >> >> In general the less features the better, but I'd like a few more features >> than that. :-) > Not quite. I'm suggesting a factory function that works for any value, > and derives the parent class from the type of the supplied value. > However, what you wrote is still the essence of the idea - we would be > primarily providing a building block that makes it easier for people > to *create* enum APIs if they want to, but for simple use cases (where > all they really wanted was the enhanced debugging information) they > wouldn't need to bother. In the standard library, wherever we do > "enum-like things" we would switch to using named values where it > makes sense to do so. > > Doing so may actually make sense for more than just constants - it may > make sense for significant mutable globals as well. Very interesting proposal (typed named values rather than just named constants). It doesn't handle flag values, which I would still like, but that only really makes sense for integers (sets can be OR'd but their representation is already understandable). Perhaps the integer named type could be special cased for that. Without the grouping functionality (associating a bunch of names together) you lose the 'from_name' functionality. Guido was in favour of this, and it is an obvious feature where you have grouping: http://mail.python.org/pipermail/python-dev/2010-November/105912.html """I expect that the API to convert between enums and bare ints should be i = int(e) and e = (i). It would be nice if s = str(e) and e = (s) would work too.""" This wouldn't work with your suggested implementation (as it is). 
Grouping and mutable "named values" could be inefficient and have issues around identity / equality. Maybe restrict the API to the immutable primitives. All the best, Michael > ========================================================================== > # Implementation (more than just a sketch, since it handles some > interesting corner cases) > import functools > @functools.lru_cache() > def _make_named_value_type(base_type): > class _NamedValueType(base_type): > def __new__(cls, name, value): > return base_type.__new__(cls, value) > def __init__(self, name, value): > self.__name = name > super().__init__(value) > @property > def _name(self): > return self.__name > def _raw(self): > return base_type(self) > def __repr__(self): > return "{}={}".format(self._name, super().__repr__()) > if base_type.__str__ is object.__str__: > __str__ = base_type.__repr__ > _NamedValueType.__name__ = "Named<{}>".format(base_type.__name__) > return _NamedValueType > > def named_value(name, value): > return _make_named_value_type(type(value))(name, value) > > def set_named_values(namespace, **kwds): > for k, v in kwds.items(): > namespace[k] = named_value(k, v) > > x = named_value("FOO", 1) > y = named_value("BAR", "Hello World!") > z = named_value("BAZ", dict(a=1, b=2, c=3)) > > print(x, y, z, sep="\n") > print("\n".join(map(repr, (x, y, z)))) > print("\n".join(map(str, map(type, (x, y, z))))) > > set_named_values(globals(), foo=x._raw(), bar=y._raw(), baz=z._raw()) > print("\n".join(map(repr, (foo, bar, baz)))) > print(type(x) is type(foo), type(y) is type(bar), type(z) is type(baz)) > > ========================================================================== > > # Session output for the last 6 lines >>>> print(x, y, z, sep="\n") > 1 > Hello World! > {'a': 1, 'c': 3, 'b': 2} > >>>> print("\n".join(map(repr, (x, y, z)))) > FOO=1 > BAR='Hello World!' 
> BAZ={'a': 1, 'c': 3, 'b': 2} > >>>> print("\n".join(map(str, map(type, (x, y, z))))) > '> > '> > '> > >>>> set_named_values(globals(), foo=x._raw(), bar=y._raw(), baz=z._raw()) >>>> print("\n".join(map(repr, (foo, bar, baz)))) > foo=1 > bar='Hello World!' > baz={'a': 1, 'c': 3, 'b': 2} > >>>> print(type(x) is type(foo), type(y) is type(bar), type(z) is type(baz)) > True True True > > For "normal" use, such objects would look like ordinary instances of > their class. They would only behave differently when their > representation is printed (prepending their name), or when their type > is interrogated (being an instance of the named subclass rather than > the ordinary type). > > Cheers, > Nick. > -- http://www.voidspace.org.uk/ READ CAREFULLY. By accepting and reading this email you agree, on behalf of your employer, to release me from all obligations and waivers arising from any and all NON-NEGOTIATED agreements, licenses, terms-of-service, shrinkwrap, clickwrap, browsewrap, confidentiality, non-disclosure, non-compete and acceptable use policies (?BOGUS AGREEMENTS?) that I have entered into with your employer, its partners, licensors, agents and assigns, in perpetuity, without prejudice to my ongoing rights and privileges. You further represent that you have the authority to release me from any BOGUS AGREEMENTS on behalf of your employer. 
From ncoghlan at gmail.com Sat Nov 27 15:58:08 2010 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 28 Nov 2010 00:58:08 +1000 Subject: [Python-Dev] constant/enum type in stdlib In-Reply-To: <4CF10F32.9020805@voidspace.org.uk> References: <20101121034404.52924F20A@mail.python.org> <4CE9BF4A.1020302@netwok.org> <4CEA89E8.5090107@voidspace.org.uk> <20101122163722.7e96d123@pitrou.net> <4CEA9584.7040301@avl.com> <20101122172440.77d27ed5@pitrou.net> <20101122164654.2109.588145158.divmod.xquotient.165@localhost.localdomain> <4CEBC6BD.9060402@voidspace.org.uk> <4CED0557.9090101@voidspace.org.uk> <4CED4E34.5060400@voidspace.org.uk> <4CF10F32.9020805@voidspace.org.uk> Message-ID: On Sun, Nov 28, 2010 at 12:01 AM, Michael Foord wrote: > Very interesting proposal (typed named values rather than just named > constants). It doesn't handle flag values, which I would still like, but > that only really makes sense for integers (sets can be OR'd but their > representation is already understandable). Perhaps the integer named type > could be special cased for that. > > Without the grouping functionality (associating a bunch of names together) > you lose the 'from_name' functionality. Guido was in favour of this, and it > is an obvious feature where you have grouping: > http://mail.python.org/pipermail/python-dev/2010-November/105912.html > > """I expect that the API to convert between enums and bare ints should be > i = int(e) and e = (i). It would be nice if s = str(e) and > e = (s) would work too.""" Note that the "i = int(e)" and "s = str(e)" parts of Guido's expectation do work (they are, in fact, the underling implementation of the _raw() method), so an enum class would only be needed to provide the other half of the equation. The named values have no opinion on equivalence at all (they just defer to the parent class), but change the rules for identity (which are always murky anyway, since caching is optional even for immutable types). 
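The behaviour described in the paragraph above can be sketched as follows. This is a minimal illustration under stated assumptions: `NamedInt` is a hypothetical, hand-written stand-in for the `Named<int>` type that `named_value` would generate, not the proposed API itself. It shows that conversion back to the raw value and equality both defer to `int`, while identity and `repr()` differ.

```python
class NamedInt(int):
    """Hypothetical stand-in for the generated Named<int> type."""

    def __new__(cls, name, value):
        self = super().__new__(cls, value)
        self._name = name          # the canonical name travels with the value
        return self

    def __repr__(self):
        return "{}={}".format(self._name, int.__repr__(self))

    # Keep str() showing only the raw value, as in the session output above.
    __str__ = int.__repr__


FOO = NamedInt("FOO", 3)
raw = int(FOO)                     # "i = int(e)": back to a plain int

# Equality and hashing are inherited from int, so FOO is interchangeable
# with 3 as a value or as a dict key; only repr() and the type differ.
lookup = {3: "three"}
result = (FOO == 3, lookup[FOO], type(raw) is int, raw is FOO, repr(FOO), str(FOO))
```

Going the other way (from a raw value or a name back to the named object) needs a registry of some kind, which is the part a grouping API would have to supply.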
> This wouldn't work with your suggested implementation (as it is). Grouping > and mutable "named values" could be inefficient and have issues around > identity / equality. Maybe restrict the API to the immutable primitives. My proposal doesn't say anything about grouping at all - it's just an idea for "here's a standard way to associate a canonical name with a particular object, independent of the namespaces that happen to reference that object". Now, a particular *grouping* API may want to restrict itself in various ways, but that's my point. We should be looking at a standard solution for the ground level problem (i.e. the idea named_value attempts to solve) and then let various 3rd party enum/name grouping implementations flourish on top of that, rather than trying to create an all-singing all-dancing "value grouping" API (which is going to be far more intrusive than a simple API for "here's a way to give your constants and important data structures names that show up in their representations"). 
For example, using named_value as a primitive, you can fairly easily do:

class Namegroup:
    # Missing lots of niceties of a real enum class, but shows the idea
    # as to how a real implementation could leverage named_value
    def __init__(self, _groupname, **kwds):
        self._groupname = _groupname
        pattern = _groupname + ".{}"
        self._value_map = {}
        for k, v in kwds.items():
            attr = named_value(pattern.format(k), v)
            setattr(self, k, attr)
            self._value_map[v] = attr
    @classmethod
    def from_names(cls, groupname, *args):
        kwds = dict(zip(args, range(len(args))))
        return cls(groupname, **kwds)
    def __call__(self, arg):
        return self._value_map[arg]

silly = Namegroup.from_names("Silly", "FOO", "BAR", "BAZ")

>>> silly.FOO
Silly.FOO=0
>>> int(silly.FOO)
0
>>> silly(0)
Silly.FOO=0

named_value deals with all the stuff to do with pretending to be the
original type of object (only with an associated name), leaving the
grouping API to deal with issues of creating groups of names and
mapping between them and the original values in various ways.

Cheers,
Nick.

-- 
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From ncoghlan at gmail.com Sat Nov 27 16:04:17 2010
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sun, 28 Nov 2010 01:04:17 +1000
Subject: [Python-Dev] [Python-checkins] r86750 - python/branches/py3k/Demo/curses/life.py
In-Reply-To: <4CF0FF18.4030408@voidspace.org.uk>
References: <20101125145644.D98FAEEA26@mail.python.org> <4CEF0E3B.2070608@netwok.org> <20101126021524.GA1450@rubuntu> <4CF0FF18.4030408@voidspace.org.uk>
Message-ID: 

On Sat, Nov 27, 2010 at 10:52 PM, Michael Foord wrote:
>> I just resign myself to having to spell words like colour and
>> serialise wrong when I'm working on Python.
Compared to the >> adjustments the non-native English speakers have to make, I figure I'm >> getting off lightly ;) >> > > I *thought* that the Python policy was that English speakers wrote > documentation in English and American speakers wrote documentation in > American and that we *don't* insist on US spellings in the Python > documentation? If we're just talking about those things in generally, then that's a reasonable rule. But when in close proximity to an actual API that uses the American spelling, or modifying a file that uses the relevant word a lot, following the prevailing style is a definite courtesy to the reader. Cheers, Nick. -- Nick Coghlan?? |?? ncoghlan at gmail.com?? |?? Brisbane, Australia From fuzzyman at voidspace.org.uk Sat Nov 27 16:07:18 2010 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Sat, 27 Nov 2010 15:07:18 +0000 Subject: [Python-Dev] [Python-checkins] r86750 - python/branches/py3k/Demo/curses/life.py In-Reply-To: References: <20101125145644.D98FAEEA26@mail.python.org> <4CEF0E3B.2070608@netwok.org> <20101126021524.GA1450@rubuntu> <4CF0FF18.4030408@voidspace.org.uk> Message-ID: <4CF11EA6.8050409@voidspace.org.uk> On 27/11/2010 15:04, Nick Coghlan wrote: > On Sat, Nov 27, 2010 at 10:52 PM, Michael Foord > wrote: >>> I just resign myself to having to spell words like colour and >>> serialise wrong when I'm working on Python. Compared to the >>> adjustments the non-native English speakers have to make, I figure I'm >>> getting off lightly ;) >>> >> I *thought* that the Python policy was that English speakers wrote >> documentation in English and American speakers wrote documentation in >> American and that we *don't* insist on US spellings in the Python >> documentation? > If we're just talking about those things in generally, then that's a > reasonable rule. 
But when in close proximity to an actual API that > uses the American spelling, or modifying a file that uses the relevant > word a lot, following the prevailing style is a definite courtesy to > the reader. > Ok, thanks. Sounds like a good guideline. Michael > Cheers, > Nick. > -- http://www.voidspace.org.uk/ READ CAREFULLY. By accepting and reading this email you agree, on behalf of your employer, to release me from all obligations and waivers arising from any and all NON-NEGOTIATED agreements, licenses, terms-of-service, shrinkwrap, clickwrap, browsewrap, confidentiality, non-disclosure, non-compete and acceptable use policies (?BOGUS AGREEMENTS?) that I have entered into with your employer, its partners, licensors, agents and assigns, in perpetuity, without prejudice to my ongoing rights and privileges. You further represent that you have the authority to release me from any BOGUS AGREEMENTS on behalf of your employer. From ncoghlan at gmail.com Sat Nov 27 16:07:35 2010 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 28 Nov 2010 01:07:35 +1000 Subject: [Python-Dev] [Python-checkins] r86745 - in python/branches/py3k: Doc/library/difflib.rst Lib/difflib.py Lib/test/test_difflib.py Misc/NEWS In-Reply-To: <4CF1016C.8050902@voidspace.org.uk> References: <20101125061234.F1CC3EEA23@mail.python.org> <4CF1016C.8050902@voidspace.org.uk> Message-ID: On Sat, Nov 27, 2010 at 11:02 PM, Michael Foord wrote: > Features new in Python 3.2 that didn't exist in 3.1 should have a > versionadded:: 3.2 tag. As Michael said, from a docs point of view, the version flow is independent: "2.6 -> 2.7" and "3.1 -> 3.2". The issue has really only come up with this release, since there was no intervening 2.x release between 3.0 and 3.1. Cheers, Nick. -- Nick Coghlan?? |?? ncoghlan at gmail.com?? |?? 
Brisbane, Australia From barry at python.org Sat Nov 27 19:22:16 2010 From: barry at python.org (Barry Warsaw) Date: Sat, 27 Nov 2010 13:22:16 -0500 Subject: [Python-Dev] constant/enum type in stdlib In-Reply-To: <4CF10F32.9020805@voidspace.org.uk> References: <20101121034404.52924F20A@mail.python.org> <4CE9BF4A.1020302@netwok.org> <4CEA89E8.5090107@voidspace.org.uk> <20101122163722.7e96d123@pitrou.net> <4CEA9584.7040301@avl.com> <20101122172440.77d27ed5@pitrou.net> <20101122164654.2109.588145158.divmod.xquotient.165@localhost.localdomain> <4CEBC6BD.9060402@voidspace.org.uk> <4CED0557.9090101@voidspace.org.uk> <4CED4E34.5060400@voidspace.org.uk> <4CF10F32.9020805@voidspace.org.uk> Message-ID: <20101127132216.533f7332@mission> On Nov 27, 2010, at 02:01 PM, Michael Foord wrote: >(Note that there is no *particular* hurry to get this into 3.2 - the beta is >due imminently. I wouldn't object to it ) Indeed. I don't think the time is right to try to get this into 3.2. -Barry -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 836 bytes Desc: not available URL: From anurag.chourasia at gmail.com Sat Nov 27 19:45:44 2010 From: anurag.chourasia at gmail.com (Anurag Chourasia) Date: Sun, 28 Nov 2010 00:15:44 +0530 Subject: [Python-Dev] Python make fails with error "Fatal Python error: Interpreter not initialized (version mismatch?)" Message-ID: Hi All, During the make step of python, I am encountering a weird error. This is on AIX 5.3 using gcc as the compiler. My configuration options are as follows ./configure --enable-shared --disable-ipv6 --with-gcc=gcc CPPFLAGS="-I /opt/freeware/include -I /opt/freeware/include/readline -I /opt/freeware/include/ncurses" LDFLAGS="-L. -L/usr/local/lib" Below is the transcript from the make step. ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ running build running build_ext ldd: /lib/libreadline.a: File is an archive. 
INFO: Can't locate Tcl/Tk libs and/or headers building '_struct' extension gcc -pthread -fno-strict-aliasing -g -O2 -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -I. -I/u01/home/apli/wm/GDD/Python-2.6.6/./Include -I. -IInclude -I./Include -I/opt/freeware/include -I/opt/freeware/include/readline -I/opt/freeware/include/ncurses -I/usr/local/include -I/u01/home/apli/wm/GDD/Python-2.6.6/Include -I/u01/home/apli/wm/GDD/Python-2.6.6 -c /u01/home/apli/wm/GDD/Python-2.6.6/Modules/_struct.c -o build/temp.aix-5.3-2.6/u01/home/apli/wm/GDD/Python-2.6.6/Modules/_struct.o ./Modules/ld_so_aix gcc -pthread -bI:Modules/python.exp -L. -L/usr/local/lib build/temp.aix-5.3-2.6/u01/home/apli/wm/GDD/Python-2.6.6/Modules/_struct.o -L. -L/usr/local/lib -lpython2.6 -o build/lib.aix-5.3-2.6/_struct.so *Fatal Python error: Interpreter not initialized (version mismatch?)* *make: 1254-059 The signal code from the last command is 6.* ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ The last command that i see above (ld_so_aix) seems to have completed as the file _struct.so exists after this command and hence I am not sure which step is failing. There is no other Python version on my machine. Please guide. -------------- next part -------------- An HTML attachment was scrubbed... URL: From tjreedy at udel.edu Sat Nov 27 21:50:11 2010 From: tjreedy at udel.edu (Terry Reedy) Date: Sat, 27 Nov 2010 15:50:11 -0500 Subject: [Python-Dev] [Python-checkins] r86745 - in python/branches/py3k: Doc/library/difflib.rst Lib/difflib.py Lib/test/test_difflib.py Misc/NEWS In-Reply-To: References: <20101125061234.F1CC3EEA23@mail.python.org> Message-ID: <4CF16F03.9060407@udel.edu> On 11/27/2010 7:17 AM, Nick Coghlan wrote: > On Thu, Nov 25, 2010 at 4:12 PM, terry.reedy wrote: >> The :class:`SequenceMatcher` class has this constructor: >> >> >> -.. class:: SequenceMatcher(isjunk=None, a='', b='') >> +.. 
class:: SequenceMatcher(isjunk=None, a='', b='', autojunk=True) >> >> Optional argument *isjunk* must be ``None`` (the default) or a one-argument >> function that takes a sequence element and returns true if and only if the >> @@ -340,6 +349,9 @@ >> The optional arguments *a* and *b* are sequences to be compared; both default to >> empty strings. The elements of both sequences must be :term:`hashable`. >> >> + The optional argument *autojunk* can be used to disable the automatic junk >> + heuristic. >> + > > Catching up on checkins traffic, so a later checkin may already fix > this, but there should be a versionchanged tag in the docs to note > when the autojunk parameter was added. Right. When S.C. forward-ported the 2.7 patch. he must have thought it not needed and I missed the difference between the diffs. Will add note in both places needed immediately. Terry From v+python at g.nevcal.com Sat Nov 27 21:56:14 2010 From: v+python at g.nevcal.com (Glenn Linderman) Date: Sat, 27 Nov 2010 12:56:14 -0800 Subject: [Python-Dev] constant/enum type in stdlib In-Reply-To: References: <20101121034404.52924F20A@mail.python.org> <4CE9BF4A.1020302@netwok.org> <4CEA89E8.5090107@voidspace.org.uk> <20101122163722.7e96d123@pitrou.net> <4CEA9584.7040301@avl.com> <20101122172440.77d27ed5@pitrou.net> <20101122164654.2109.588145158.divmod.xquotient.165@localhost.localdomain> <4CEBC6BD.9060402@voidspace.org.uk> <4CED0557.9090101@voidspace.org.uk> <4CED4E34.5060400@voidspace.org.uk> Message-ID: <4CF1706E.5030503@g.nevcal.com> On 11/27/2010 2:51 AM, Nick Coghlan wrote: > Not quite. I'm suggesting a factory function that works for any value, > and derives the parent class from the type of the supplied value. Nick, thanks for the much better implementation than I achieved; you seem to have the same goals as my implementation. I learned a bit making mine, and more understanding yours to some degree. 
What I still don't understand about your implementation, is that when adding one additional line to your file, it fails: w = named_value("ABC", z ) Now I can understand why it might not be a good thing to make a named value of a named value (confusing, at least), but I was surprised, and still do not understand, that it failed reporting the __new__() takes exactly 3 arguments (2 given). -------------- next part -------------- An HTML attachment was scrubbed... URL: From steve at pearwood.info Sat Nov 27 23:11:44 2010 From: steve at pearwood.info (Steven D'Aprano) Date: Sun, 28 Nov 2010 09:11:44 +1100 Subject: [Python-Dev] [Preview] Comments and change proposals on documentation In-Reply-To: References: Message-ID: <4CF18220.7000202@pearwood.info> Nick Coghlan wrote: > On Thu, Nov 25, 2010 at 6:24 AM, Georg Brandl wrote: >> Hi, >> >> at , you can look at a version of the 3.2 >> docs that has the upcoming commenting feature. JavaScript is mandatory. > > Very nice! > > I'm not sure what to do about the discoverability of the comment > bubbles as the end of each paragraph. I initially thought commenting > wasn't available on What's New or the Using Python docs until seeing > where the blue comment bubbles appeared in the math module docs. I wonder what the point of the comment bubbles is? This isn't a graphical UI where (contrary to popular opinion) a picture is *not* worth a thousand words, but may require a help-bubble to explain. This is text. If you want to make a comment on some text, the usual practice is to add more text :) I wasn't able to find a comment bubble that contained anything, so I don't know what sort of information you expect them to contain -- every one I tried said "0 comments". But it seems to me that comments are superfluous, if not actively harmful: (1) Anything important enough to tell the reader should be included in the text, where it can be easily seen, read and printed. 
(2) Discovery is lousy -- not only do you need to be running Javascript, which many people do not for performance, privacy and convenience[*], but you have to carefully mouse-over the paragraph just to see the blue bubble, and THEN you have to *precisely* mouse-over the bubble itself. (3) This will be a horrible and possibly even literally painful experience for anyone with a physical disability that makes precise positioning of the mouse difficult. (4) Accessibility for the blind and those using screen readers will probably be non-existent. (5) If the information in the comment bubbles is trivial enough that we're happy to say that the blind, the disabled and those who avoid Javascript don't need it, then perhaps *nobody* needs it. [*] In my experience, websites tend to fall into two basic categories: those that don't work at all without Javascript, and those that run better, faster, and with fewer anti-features and inconveniences without Javascript. -- Steven From g.brandl at gmx.net Sat Nov 27 23:37:29 2010 From: g.brandl at gmx.net (Georg Brandl) Date: Sat, 27 Nov 2010 23:37:29 +0100 Subject: [Python-Dev] [Preview] Comments and change proposals on documentation In-Reply-To: <4CF18220.7000202@pearwood.info> References: <4CF18220.7000202@pearwood.info> Message-ID: Am 27.11.2010 23:11, schrieb Steven D'Aprano: > Nick Coghlan wrote: >> On Thu, Nov 25, 2010 at 6:24 AM, Georg Brandl wrote: >>> Hi, >>> >>> at , you can look at a version of the 3.2 >>> docs that has the upcoming commenting feature. JavaScript is mandatory. >> >> Very nice! >> >> I'm not sure what to do about the discoverability of the comment >> bubbles as the end of each paragraph. I initially thought commenting >> wasn't available on What's New or the Using Python docs until seeing >> where the blue comment bubbles appeared in the math module docs. > > I wonder what the point of the comment bubbles is? 
This isn't a > graphical UI where (contrary to popular opinion) a picture is *not* > worth a thousand words, but may require a help-bubble to explain. This > is text. If you want to make a comment on some text, the usual practice > is to add more text :) Yes, I already mentioned that the bubbles could be replaced by text links if they prove too confusing. > I wasn't able to find a comment bubble that contained anything, so I > don't know what sort of information you expect them to contain -- every > one I tried said "0 comments". Maybe you should have tried the page I recommended as a demo, and where Nick made his comments? :) > But it seems to me that comments are superfluous, if not actively harmful: (I've not read anything about harmful below. Was that just FUD?) > (1) Anything important enough to tell the reader should be included in > the text, where it can be easily seen, read and printed. Yes. There need to be ways for the reader to feed back to the author what they want to have included. Currently, this is I'm all for removing comments with suggestions once they have been integrated in the main text. > (2) Discovery is lousy -- not only do you need to be running Javascript, > which many people do not for performance, privacy and convenience[*], That is not an argument nowadays, seeing how many sites/web applications require JS. (Most people who deactivate JS globally maintain a whitelist anyway, and can easily add docs.python.org to that.) These comments are an optional feature and therefore do not need to be accessible for 100% of users. > but you have to carefully mouse-over the paragraph just to see the blue > bubble, and THEN you have to *precisely* mouse-over the bubble itself. Bubbles are always shown for paragraphs *with* comments. > (3) This will be a horrible and possibly even literally painful > experience for anyone with a physical disability that makes precise > positioning of the mouse difficult. 
You're making this point just because of the size of the bubbles? Well, these users can register on the site and there can be a user preference to display larger links instead (if we choose to keep the bubbles, anyway.) > (4) Accessibility for the blind and those using screen readers will > probably be non-existent. It will be the same as for other web apps using JavaScript. Since I'm not a professional user interface designer, I don't know what screen readers can and cannot do. > (5) If the information in the comment bubbles is trivial enough that > we're happy to say that the blind, the disabled and those who avoid > Javascript don't need it, then perhaps *nobody* needs it. Sorry, but that is a nonsensical argument. Apart from the questionable notion that anything must be available to everyone to be worth anything, it also doesn't consider that the comments are not only for fellow users: as I said above, the comments are designed to be a very quick way to give feedback to *us* developers. (This is the reason for the "propose a change" feature, for example.) So even if only 30% of all users had access to the comments and could use that to help us improve the documentation by submitting suggestions and corrections they never would have bothered registering in the tracker for, that would be a net gain. 
cheers, Georg From raymond.hettinger at gmail.com Sun Nov 28 00:26:13 2010 From: raymond.hettinger at gmail.com (Raymond Hettinger) Date: Sat, 27 Nov 2010 15:26:13 -0800 Subject: [Python-Dev] constant/enum type in stdlib In-Reply-To: <4CF1706E.5030503@g.nevcal.com> References: <20101121034404.52924F20A@mail.python.org> <4CE9BF4A.1020302@netwok.org> <4CEA89E8.5090107@voidspace.org.uk> <20101122163722.7e96d123@pitrou.net> <4CEA9584.7040301@avl.com> <20101122172440.77d27ed5@pitrou.net> <20101122164654.2109.588145158.divmod.xquotient.165@localhost.localdomain> <4CEBC6BD.9060402@voidspace.org.uk> <4CED0557.9090101@voidspace.org.uk> <4CED4E34.5060400@voidspace.org.uk> <4CF1706E.5030503@g.nevcal.com> Message-ID: <1D372F35-B455-4982-997B-2C54A7D56741@gmail.com> On Nov 27, 2010, at 12:56 PM, Glenn Linderman wrote: > On 11/27/2010 2:51 AM, Nick Coghlan wrote: >> >> Not quite. I'm suggesting a factory function that works for any value, >> and derives the parent class from the type of the supplied value. > > Nick, thanks for the much better implementation than I achieved; you seem to have the same goals as my implementation. I learned a bit making mine, and more understanding yours to some degree. What I still don't understand about your implementation, is that when adding one additional line to your file, it fails: > > w = named_value("ABC", z ) > > Now I can understand why it might not be a good thing to make a named value of a named value (confusing, at least), but I was surprised, and still do not understand, that it failed reporting the __new__() takes exactly 3 arguments (2 given). Can I suggest that an enum-maker be offered as a third-party module rather than prematurely adding it into the standard library. 
Raymond From steve at pearwood.info Sun Nov 28 00:58:52 2010 From: steve at pearwood.info (Steven D'Aprano) Date: Sun, 28 Nov 2010 10:58:52 +1100 Subject: [Python-Dev] [Preview] Comments and change proposals on documentation In-Reply-To: References: <4CF18220.7000202@pearwood.info> Message-ID: <4CF19B3C.2000308@pearwood.info> Georg Brandl wrote: > Am 27.11.2010 23:11, schrieb Steven D'Aprano: >> I wasn't able to find a comment bubble that contained anything, so I >> don't know what sort of information you expect them to contain -- every >> one I tried said "0 comments". > > Maybe you should have tried the page I recommended as a demo, and where Nick > made his comments? :) Aha! I never would have guessed that the bubbles are clickable -- I thought you just moused-over them and they showed static comments put there by the developers, part of the documentation itself. I didn't realise that it was for users to add spam^W comments to the page. With that perspective, I need to rethink. Yes, I failed to fully read the instructions you sent, or understand them. That's what users do -- they don't read your instructions, and they misunderstand them. If your UI isn't easily discoverable, users will not be able to use it, and will be frustrated and annoyed. The user is always right, even when they're doing it wrong *wink* >> But it seems to me that comments are superfluous, if not actively harmful: > > (I've not read anything about harmful below. Was that just FUD?) Lowering accessibility to parts of the documentation is what I was talking about when I said "actively harmful". But now that I have better understanding of what the comment system is actually for, I have to rethink. 
-- Steven From glenn at nevcal.com Sun Nov 28 02:04:49 2010 From: glenn at nevcal.com (Glenn Linderman) Date: Sat, 27 Nov 2010 17:04:49 -0800 Subject: [Python-Dev] constant/enum type in stdlib In-Reply-To: <4CF1706E.5030503@g.nevcal.com> References: <20101121034404.52924F20A@mail.python.org> <4CE9BF4A.1020302@netwok.org> <4CEA89E8.5090107@voidspace.org.uk> <20101122163722.7e96d123@pitrou.net> <4CEA9584.7040301@avl.com> <20101122172440.77d27ed5@pitrou.net> <20101122164654.2109.588145158.divmod.xquotient.165@localhost.localdomain> <4CEBC6BD.9060402@voidspace.org.uk> <4CED0557.9090101@voidspace.org.uk> <4CED4E34.5060400@voidspace.org.uk> <4CF1706E.5030503@g.nevcal.com> Message-ID: <4CF1AAB1.4010808@nevcal.com> On 11/27/2010 12:56 PM, Glenn Linderman wrote: > On 11/27/2010 2:51 AM, Nick Coghlan wrote: >> Not quite. I'm suggesting a factory function that works for any value, >> and derives the parent class from the type of the supplied value. > > Nick, thanks for the much better implementation than I achieved; you > seem to have the same goals as my implementation. I learned a bit > making mine, and more understanding yours to some degree. What I > still don't understand about your implementation, is that when adding > one additional line to your file, it fails: > > w = named_value("ABC", z ) > > Now I can understand why it might not be a good thing to make a named > value of a named value (confusing, at least), but I was surprised, and > still do not understand, that it failed reporting the __new__() takes > exactly 3 arguments (2 given). OK, I puzzled out the error, and here is a "cure" of sorts. 
    def __new__(cls, name, value):
        try:
            return base_type.__new__(cls, value)
        except TypeError:
            return base_type.__new__(cls, name, value)
    def __init__(self, name, value):
        self.__name = name
        try:
            super().__init__(value)
        except TypeError:
            super().__init__(name, value)

Probably it would be better for the except clause to raise a different type of error ( Can't recursively create named value ) or to cleverly bypass the intermediate named value, and simply apply a new name to the original value. Hmm... For this, only __new__ need be changed:

    def __new__(cls, name, value):
        try:
            return base_type.__new__(cls, value)
        except TypeError:
            return _make_named_value_type( type( value._raw() ))( name, value._raw() )
    def __init__(self, name, value):
        self.__name = name
        super().__init__(value)

Thanks for not responding too quickly, I figured out more, and learned more.

From ncoghlan at gmail.com Sun Nov 28 03:38:27 2010 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 28 Nov 2010 12:38:27 +1000 Subject: [Python-Dev] constant/enum type in stdlib In-Reply-To: <1D372F35-B455-4982-997B-2C54A7D56741@gmail.com> References: <20101121034404.52924F20A@mail.python.org> <4CE9BF4A.1020302@netwok.org> <4CEA89E8.5090107@voidspace.org.uk> <20101122163722.7e96d123@pitrou.net> <4CEA9584.7040301@avl.com> <20101122172440.77d27ed5@pitrou.net> <20101122164654.2109.588145158.divmod.xquotient.165@localhost.localdomain> <4CEBC6BD.9060402@voidspace.org.uk> <4CED0557.9090101@voidspace.org.uk> <4CED4E34.5060400@voidspace.org.uk> <4CF1706E.5030503@g.nevcal.com> <1D372F35-B455-4982-997B-2C54A7D56741@gmail.com> Message-ID: On Sun, Nov 28, 2010 at 9:26 AM, Raymond Hettinger wrote: > > On Nov 27, 2010, at 12:56 PM, Glenn Linderman wrote: > >> On 11/27/2010 2:51 AM, Nick Coghlan wrote: >>> >>> Not quite. I'm suggesting a factory function that works for any value, >>> and derives the parent class from the type of the supplied value.
>> >> Nick, thanks for the much better implementation than I achieved; you seem to have the same goals as my implementation. I learned a bit making mine, and more understanding yours to some degree. What I still don't understand about your implementation, is that when adding one additional line to your file, it fails: >> >> w = named_value("ABC", z ) >> >> Now I can understand why it might not be a good thing to make a named value of a named value (confusing, at least), but I was surprised, and still do not understand, that it failed reporting the __new__() takes exactly 3 arguments (2 given). > > Can I suggest that an enum-maker be offered as a third-party module rather than prematurely adding it into the standard library. Indeed. Glenn's failing example suggests to me that using a new metaclass is probably going to be a cleaner option than trying to dance around type's default behaviour within an ordinary class definition (if nothing else, a separate metaclass makes it much easier to detect when you're dealing with an instance of a named type). Regardless, I still see value in approaching this whole discussion as a two-level design problem, with "named values" as the more fundamental concept, and then higher level grouping APIs to get enum-style behaviour. Eventually attaining "One Obvious Way" for the former seems achievable to me, while the diversity of use cases for grouping APIs suggests to me that "one-size-fits-all" isn't going to work unless that "one size" is a Frankenstein API with more options than anyone could reasonably hope to keep in their head at once. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | 
Brisbane, Australia From tjreedy at udel.edu Sun Nov 28 04:20:50 2010 From: tjreedy at udel.edu (Terry Reedy) Date: Sat, 27 Nov 2010 22:20:50 -0500 Subject: [Python-Dev] constant/enum type in stdlib In-Reply-To: <1D372F35-B455-4982-997B-2C54A7D56741@gmail.com> References: <20101121034404.52924F20A@mail.python.org> <4CE9BF4A.1020302@netwok.org> <4CEA89E8.5090107@voidspace.org.uk> <20101122163722.7e96d123@pitrou.net> <4CEA9584.7040301@avl.com> <20101122172440.77d27ed5@pitrou.net> <20101122164654.2109.588145158.divmod.xquotient.165@localhost.localdomain> <4CEBC6BD.9060402@voidspace.org.uk> <4CED0557.9090101@voidspace.org.uk> <4CED4E34.5060400@voidspace.org.uk> <4CF1706E.5030503@g.nevcal.com> <1D372F35-B455-4982-997B-2C54A7D56741@gmail.com> Message-ID: On 11/27/2010 6:26 PM, Raymond Hettinger wrote: > Can I suggest that an enum-maker be offered as a third-party module Possibly with competing versions for trial and testing ;-) > rather than prematurely adding it into the standard library. I had same thought. -- Terry Jan Reedy From donjohnston at selfaware.com Sun Nov 28 05:17:11 2010 From: donjohnston at selfaware.com (Don Johnston) Date: Sun, 28 Nov 2010 04:17:11 +0000 (UTC) Subject: [Python-Dev] [Preview] Comments and change proposals on documentation References: <4CF18220.7000202@pearwood.info> <4CF19B3C.2000308@pearwood.info> Message-ID: Steven D'Aprano <steve at pearwood.info> writes: > Aha! I never would have guessed that the bubbles are clickable -- I > thought you just moused-over them and they showed static comments put > there by the developers, part of the documentation itself. I didn't > realise that it was for users to add spam^W comments to the page. With > that perspective, I need to rethink. > > Yes, I failed to fully read the instructions you sent, or understand > them. That's what users do -- they don't read your instructions, and > they misunderstand them.
If your UI isn't easily discoverable, users > will not be able to use it, and will be frustrated and annoyed. The user > is always right, even when they're doing it wrong *wink* > > > >> But it seems to me that comments are superfluous, if not actively harmful: > > > > (I've not read anything about harmful below. Was that just FUD?) > > Lowering accessibility to parts of the documentation is what I was > talking about when I said "actively harmful". But now that I have better > understanding of what the comment system is actually for, I have to rethink. > As an end-user, I, too, share concerns about the accessibility of the pending (proposed?) commenting functionality. A read-only JSON API would be great. Up until now, Sphinx has been an incredibly helpful tool for generating beautiful documentation from reStructuredText, which is great for limiting the risk of malformed input. The new commenting feature ("dynamic application functionality") requires persistence for user-submitted content. Database persistence is currently implemented with the -excellent- SQLAlchemy ORM. So, this is a transition from Sphinx being an excellent publishing tool to being a dynamic publishing platform for user-submitted content ("comments"). I am sure this was not without due consideration, and FUD. The Python Web Framework communities (favorite framework *here*) will be the first to reiterate the challenges that all web application developers (and commenting API providers) face on a daily basis: - SQL Injection - XSS (Cross Site Scripting) - CSRF (Cross Site Request Forgery) Here are a few scenarios to consider: (1) Freeloading jackass decides that each paragraph of our documentation would look better with 200 "comments" for Viagra. Freeloading jackass is aware of how HTTP GETs work. - What markup features are supported? - How does the application sanitize user-supplied input? - Is html5lib good enough?
- On docs.python.org, how are 1000 inappropriate (freeloading) comments from 1000 different IPs deleted? - What's the roadmap for {..., Akismet, ReCaptcha, ...} support? (2) Freeloading jackass buys a block of javascript adspace on . The block of javascript surreptitiously posts helpful comments on behalf of unwitting users. - How does the application ensure that comments are submitted from the site hosting the documentation? - Which frameworks have existing, reviewed CSRF protections? Trying to read through the new source here [1], but there aren't many docstrings and BB doesn't yet support inline commenting. AFAIK, there are not yet any issues filed for these concerns. [2] 1. In the event that that kind of bug is discovered, how should the community report the issues? 2. If we have an alternate method of encouraging documentation feedback, how can this feature be turned off? Thanks again for a great publishing tool, Don [1] http://bitbucket.org/birkenfeld/sphinx [2] http://bitbucket.org/birkenfeld/sphinx/issues/new From benjamin at python.org Sun Nov 28 05:33:43 2010 From: benjamin at python.org (Benjamin Peterson) Date: Sat, 27 Nov 2010 22:33:43 -0600 Subject: [Python-Dev] [RELEASED] Python 2.7.1 Message-ID: On behalf of the Python development team, I'm happy as a clam to announce the immediate availability of Python 2.7.1. 2.7 includes many features that were first released in Python 3.1. The faster io module, the new nested with statement syntax, improved float repr, set literals, dictionary views, and the memoryview object have been backported from 3.1. Other features include an ordered dictionary implementation, unittests improvements, a new sysconfig module, auto-numbering of fields in the str/unicode format method, and support for ttk Tile in Tkinter. For a more extensive list of changes in 2.7, see http://doc.python.org/dev/whatsnew/2.7.html or Misc/NEWS in the Python distribution. 
To download Python 2.7.1 visit: http://www.python.org/download/releases/2.7.1/ The 2.7.1 changelog is at: http://svn.python.org/projects/python/tags/r271/Misc/NEWS 2.7 documentation can be found at: http://docs.python.org/2.7/ This is a production release. Please report any bugs you find to the bug tracker: http://bugs.python.org/ Enjoy! -- Benjamin Peterson Release Manager benjamin at python.org (on behalf of the entire python-dev team and 2.7.1's contributors) From benjamin at python.org Sun Nov 28 05:34:42 2010 From: benjamin at python.org (Benjamin Peterson) Date: Sat, 27 Nov 2010 22:34:42 -0600 Subject: [Python-Dev] [RELEASED] Python 3.1.3 Message-ID: On behalf of the Python development team, I'm happy as a lark to announce the third bugfix release for the Python 3.1 series, Python 3.1.3. This bug fix release features numerous bug fixes and documentation improvements over 3.1.2. The Python 3.1 version series focuses on the stabilization and optimization of the features and changes that Python 3.0 introduced. For example, the new I/O system has been rewritten in C for speed. File system APIs that use unicode strings now handle paths with undecodable bytes in them. Other features include an ordered dictionary implementation, a condensed syntax for nested with statements, and support for ttk Tile in Tkinter. For a more extensive list of changes in 3.1, see http://doc.python.org/3.1/whatsnew/3.1.html or Misc/NEWS in the Python distribution. This is a production release. To download Python 3.1.3 visit: http://www.python.org/download/releases/3.1.3/ A list of changes in 3.1.3 can be found here: http://svn.python.org/projects/python/tags/r313/Misc/NEWS The 3.1 documentation can be found at: http://docs.python.org/3.1 Bugs can always be reported to: http://bugs.python.org Enjoy! 
-- Benjamin Peterson Release Manager benjamin at python.org (on behalf of the entire python-dev team and 3.1.3's contributors) From martin at v.loewis.de Sun Nov 28 09:09:53 2010 From: martin at v.loewis.de ("Martin v. Löwis") Date: Sun, 28 Nov 2010 09:09:53 +0100 Subject: [Python-Dev] Virus on python-3.1.2.msi? Message-ID: <4CF20E51.3050004@v.loewis.de> Issue 10500 claims that the 3.1.2 installer has the virus Palevo.DZ. Can somebody with a virus scanner please confirm or contest that claim? Thanks, Martin http://bugs.python.org/issue10500 From fuzzyman at voidspace.org.uk Sun Nov 28 14:48:08 2010 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Sun, 28 Nov 2010 13:48:08 +0000 Subject: [Python-Dev] constant/enum type in stdlib In-Reply-To: References: <20101121034404.52924F20A@mail.python.org> <4CE9BF4A.1020302@netwok.org> <4CEA89E8.5090107@voidspace.org.uk> <20101122163722.7e96d123@pitrou.net> <4CEA9584.7040301@avl.com> <20101122172440.77d27ed5@pitrou.net> <20101122164654.2109.588145158.divmod.xquotient.165@localhost.localdomain> <4CEBC6BD.9060402@voidspace.org.uk> <4CED0557.9090101@voidspace.org.uk> <4CED4E34.5060400@voidspace.org.uk> <4CF1706E.5030503@g.nevcal.com> <1D372F35-B455-4982-997B-2C54A7D56741@gmail.com> Message-ID: <4CF25D98.10105@voidspace.org.uk> On 28/11/2010 03:20, Terry Reedy wrote: > On 11/27/2010 6:26 PM, Raymond Hettinger wrote: > >> Can I suggest that an enum-maker be offered as a third-party module > > Possibly with competing versions for trial and testing ;-) > >> rather than prematurely adding it into the standard library. > > I had same thought. > There are already *several* enum packages for Python available. The implementation by Ben Finney, associated with the previous PEP, is on PyPI and the most recent release has over 4000 downloads making it reasonably popular: http://pypi.python.org/pypi/enum/ Other contenders include flufl.enum and lazr.enum.
The Twisted guys would like a named constant type, and have a ticket for it, and PyQt has its own implementation (subclassing int) providing this functionality. In terms of assessing *general* usefulness in the wider community that step has already been done. This discussion came out of yet-another-set-of-integer-constants being added to the Python standard library (since changed to strings). We have integer constants, with the associated inscrutability when used from the interactive interpreter or debugging, in *many* standard library modules. The particular features and use cases being discussed have use *within* the standard library in mind. Releasing yet-another-enum-library-that-the-standard-library-can't-use would be a particularly pointless outcome of this discussion. The decision is whether or not to use named constants in the standard library, otherwise we can just point people at one of the several existing packages. All the best, Michael Foord -- http://www.voidspace.org.uk/ READ CAREFULLY. By accepting and reading this email you agree, on behalf of your employer, to release me from all obligations and waivers arising from any and all NON-NEGOTIATED agreements, licenses, terms-of-service, shrinkwrap, clickwrap, browsewrap, confidentiality, non-disclosure, non-compete and acceptable use policies ("BOGUS AGREEMENTS") that I have entered into with your employer, its partners, licensors, agents and assigns, in perpetuity, without prejudice to my ongoing rights and privileges. You further represent that you have the authority to release me from any BOGUS AGREEMENTS on behalf of your employer.
From doko at ubuntu.com Sun Nov 28 16:46:09 2010 From: doko at ubuntu.com (Matthias Klose) Date: Sun, 28 Nov 2010 16:46:09 +0100 Subject: [Python-Dev] Question about GDB bindings and 32/64 bits In-Reply-To: <4CEF338C.4070509@jcea.es> References: <4CEF338C.4070509@jcea.es> Message-ID: <4CF27941.1020200@ubuntu.com> On 26.11.2010 05:11, Jesus Cea wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > I have installed GDB 7.2 32 bits and 32 bits buildslaves are green. > Nevertheless 64 bits buildslaves are failing test_gdb. > > Is there any expectation that a 32 bits GDB be able to debug a 64 bits > python?. If not, gdb test should compare "platform.architecture()" (for > python and gdb in the system) and run only when they are the same. that would be too restrictive, as an 64bit gdb is able to handle 32bit binaries too. > If > this should work, I would open a bug and maybe spend some time with it. > > But before thinking about investing time, I would like to know if this > mix is actually expected or not to work. > > If not, I would consider to install a 64 bits GDB too and do some tricks > (like using an "/usr/local/bin/gdb" script wrapper to choose 32/64 > "real" gdb version) to actually execute "test_gdb" in both buildslaves > (they are running in the same physical machine). yes, and then you should be able to use this gdb for both 32 and 64bit builds. No need for a wrapper (Such a gdb is available in the gdb64 package on Debian/Ubuntu). 
Matthias

From fuzzyman at voidspace.org.uk Sun Nov 28 17:28:00 2010
From: fuzzyman at voidspace.org.uk (Michael Foord)
Date: Sun, 28 Nov 2010 16:28:00 +0000
Subject: [Python-Dev] constant/enum type in stdlib
In-Reply-To:
References: <20101121034404.52924F20A@mail.python.org> <4CE9BF4A.1020302@netwok.org> <4CEA89E8.5090107@voidspace.org.uk> <20101122163722.7e96d123@pitrou.net> <4CEA9584.7040301@avl.com> <20101122172440.77d27ed5@pitrou.net> <20101122164654.2109.588145158.divmod.xquotient.165@localhost.localdomain> <4CEBC6BD.9060402@voidspace.org.uk> <4CED0557.9090101@voidspace.org.uk> <4CED4E34.5060400@voidspace.org.uk> <4CF1706E.5030503@g.nevcal.com> <1D372F35-B455-4982-997B-2C54A7D56741@gmail.com>
Message-ID: <4CF28310.7070304@voidspace.org.uk>

On 28/11/2010 02:38, Nick Coghlan wrote:
> On Sun, Nov 28, 2010 at 9:26 AM, Raymond Hettinger
> wrote:
>> On Nov 27, 2010, at 12:56 PM, Glenn Linderman wrote:
>>
>>> On 11/27/2010 2:51 AM, Nick Coghlan wrote:
>>>> Not quite. I'm suggesting a factory function that works for any value,
>>>> and derives the parent class from the type of the supplied value.
>>> Nick, thanks for the much better implementation than I achieved; you seem to have the same goals as my implementation. I learned a bit making mine, and more understanding yours to some degree. What I still don't understand about your implementation, is that when adding one additional line to your file, it fails:
>>>
>>> w = named_value("ABC", z )
>>>
>>> Now I can understand why it might not be a good thing to make a named value of a named value (confusing, at least), but I was surprised, and still do not understand, that it failed reporting the __new__() takes exactly 3 arguments (2 given).
>> Can I suggest that an enum-maker be offered as a third-party module rather than prematurely adding it into the standard library.
> Indeed. Glenn's failing example suggests to me that using a new
> metaclass is probably going to be a cleaner option than trying to
> dance around type's default behaviour within an ordinary class
> definition (if nothing else, a separate metaclass makes it much easier
> to detect when you're dealing with an instance of a named type).
>

Yep, for representing a group of names a single class with a metaclass seems like a reasonable approach. See my note below about agreeing minimal feature-set and minimal-api before we discuss implementation though.

> Regardless, I still see value in approaching this whole discussion as
> a two-level design problem, with "named values" as the more
> fundamental concept, and then higher level grouping APIs to get
> enum-style behaviour.

It seems like using the term "enum" provokes a strong negative reaction in some of the core-devs who are basically in favour of named constants and not actively against grouping. I'm happy with NamedConstant and GroupedNames (or similar) and dropping the use of the term enum.

There are also valid concerns about over-engineering (and not so valid concerns...). Simplicity in creating them and no additional burden in using them are fundamental, but in the APIs / implementations suggested so far I think we are keeping that in mind.

> Eventually attaining "One Obvious Way" for the
> former seems achievable to me, while the diversity of use cases for
> grouping APIs suggests to me that "one-size-fits-all" isn't going to
> work unless that "one size" is a Frankenstein API with more options
> than anyone could reasonably hope to keep in their head at once.
>

Well... yes - treating it as a two level design problem is fine.

I don't think there are *many* competing features, in fact as far as feature requests on python-dev go I think this is a relatively straightforward one with a lot of *agreement* on the basic functionality.
We have had various discussions about what the API should look like, or what the implementation should look like, but I don't think there is a lot of disagreement about basic features. There are some 'optional features'. Many of these can be added later without backwards compatibility issues, so those can profitably be omitted from an initial implementation.

Features as I see them:

Named constant
--------------

* Nice repr
* Subclass of the type it represents
* Trivially easy to convert to either a string (name) or the value it represents
* If an integer type, can be OR'd with other named constants and retains a useful repr

Grouped constants
-----------------

* Easy to create a group of named constants, accessible as attributes on group object
* Capability to go from name or value to corresponding constants

Optional Features
-----------------

* Ability to dynamically add new named values to a group. (Suggested by Guido)
* Ability to test if a name or value is in a group
* Ability to list all names in a group
* ANDing as well as ORing
* Constants are unique
* OR'ing with an integer will look up the name (or calculate it if the int itself represents flags that have already been OR'd) and return a named value (with useful repr) instead of just an integer
* Named constants being named values that can wrap *any* type and not just immutable values. (Note that wrapping mutable types makes providing "from_value" functionality harder *unless* we guarantee that named values are unique. If they aren't unique, named values for a mutable type can have different values and there is no single definition of what the named value actually is.)
* Requiring that values only have one name - or alternatively that values on a group could have multiple names (obviously incompatible features).
* Requiring all names in a group to be of the same type
* Allow names to be set automatically in a namespace, for example in a class namespace or on a module
* Allow subclassing and adding of new values only present in subclass

I'd rather we agree a suitable (minimal) API and feature set and go to implementation from that.

For wrapping mutable types I'm tempted to say YAGNI. For the standard library wrapping integers meets almost all our use-cases except for one float. (At work we have a decimal constant as it happens.) Perhaps we could require immutable types for groups but allow arbitrary values for individual named values?

For the named values api:

    name = NamedValue('name', value)

For the grouping (tentatively accepted as reasonable by Antoine):

    Group = make_constants('Group', name1=value1, name2=value2)
    name1, name2 = Group.name1, Group.name2
    flag = name1 | name2

    value = int(Group.name1)
    name = Group('name1')
    # alternatively: value = Group.from_name('name1')
    name = Group.from_value(value1)
    # Group(value1) could work only if values aren't strings
    # perhaps: name = Group(value=value1)

    Group.new_name = value3 # create new value on the group
    names = Group.all_names()
    # further bikeshedding on spelling of all_names required
    # correspondingly 'all_values' I guess, returning the constants themselves

Some of the optional features couldn't later be added without backwards compatibility concerns (I think the type checking features and requiring unique values for example). We should at least consider these if we are to make adding them later difficult. I would be fine with not having these features.

All the best,

Michael

> Cheers,
> Nick.
>

--
http://www.voidspace.org.uk/

READ CAREFULLY. By accepting and reading this email you agree, on behalf of your employer, to release me from all obligations and waivers arising from any and all NON-NEGOTIATED agreements, licenses, terms-of-service, shrinkwrap, clickwrap, browsewrap, confidentiality, non-disclosure, non-compete and acceptable use policies ("BOGUS AGREEMENTS") that I have entered into with your employer, its partners, licensors, agents and assigns, in perpetuity, without prejudice to my ongoing rights and privileges. You further represent that you have the authority to release me from any BOGUS AGREEMENTS on behalf of your employer.

From fuzzyman at voidspace.org.uk Sun Nov 28 18:05:12 2010
From: fuzzyman at voidspace.org.uk (Michael Foord)
Date: Sun, 28 Nov 2010 17:05:12 +0000
Subject: [Python-Dev] constant/enum type in stdlib
In-Reply-To: <4CF28310.7070304@voidspace.org.uk>
References: <20101121034404.52924F20A@mail.python.org> <4CE9BF4A.1020302@netwok.org> <4CEA89E8.5090107@voidspace.org.uk> <20101122163722.7e96d123@pitrou.net> <4CEA9584.7040301@avl.com> <20101122172440.77d27ed5@pitrou.net> <20101122164654.2109.588145158.divmod.xquotient.165@localhost.localdomain> <4CEBC6BD.9060402@voidspace.org.uk> <4CED0557.9090101@voidspace.org.uk> <4CED4E34.5060400@voidspace.org.uk> <4CF1706E.5030503@g.nevcal.com> <1D372F35-B455-4982-997B-2C54A7D56741@gmail.com> <4CF28310.7070304@voidspace.org.uk>
Message-ID: <4CF28BC8.1080508@voidspace.org.uk>

On 28/11/2010 16:28, Michael Foord wrote:
> [snip...]
> I don't think there are *many* competing features, in fact as far as
> feature requests on python-dev go I think this is a relatively
> straightforward one with a lot of *agreement* on the basic functionality.
>
> We have had various discussions about what the API should look like,
> or what the implementation should look like, but I don't think there
> is a lot of disagreement about basic features. There are some
> 'optional features'.
> Many of these can be added later without
> backwards compatibility issues, so those can profitably be omitted
> from an initial implementation.
>
> Features as I see them:
>
> Named constant
> --------------
>
> * Nice repr
> * Subclass of the type it represents
> * Trivially easy to convert either to a string (name) and the value it
> represents
> * If an integer type, can be OR'd with other named constants and
> retains a useful repr
>

Note that having an OR repr is meaningless *unless* the constants are intended to be flags, so whether OR'ing is supported should be specified explicitly:

    name = NamedValue('name', value, flags=True)

Where flags defaults to False.

Typically you will use this through the grouping API anyway - where it can either be a keyword argument (slightly annoying because the suggestion is to create the named values through keyword arguments) or we can have two group-factory functions:

    Group = make_constants('Group', name1=value1, name2=value2)
    Flags = make_flags('Flags', name1=value1, name2=value2)

It is sensible if flag values are only powers of 2; we could enforce that or not... (Another one for the optional feature list.)

I forgot auto-enumeration (specifying names only and having values autogenerated) from the optional feature set by the way. I think Antoine strongly disapproves of this feature because it reminds him of C enums.

Mark Dickinson thinks that the flags feature could be an optional feature too. If we have ORing it makes sense to have ANDing, so I guess they belong together. I think there is value in it though.

I realise that the optional feature list is now not small, and implementing all of it would create the "franken-api" Nick is worried about. The minimal feature list is nicely small though and provides useful functionality.
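
As a rough illustration of the flags idea above, here is a minimal sketch of an int-based named value whose repr survives OR'ing. The class body, the `<name=value>` repr format and the `is_power_of_two` helper are assumptions made for illustration only; the thread proposes just the call signature `NamedValue('name', value, flags=True)`, not an implementation.

```python
class NamedValue(int):
    """Hypothetical named constant; a sketch, not a stdlib API."""

    def __new__(cls, name, value, flags=False):
        self = super().__new__(cls, value)
        self._name = name
        self._flags = flags
        return self

    def __repr__(self):
        return "<%s=%d>" % (self._name, int(self))

    def __str__(self):
        # Trivial conversion to the name; int(obj) gives the value.
        return self._name

    def __or__(self, other):
        result = int(self) | int(other)
        # Only two flag constants combine into a new named value;
        # anything else degrades to a plain int.
        if self._flags and isinstance(other, NamedValue) and other._flags:
            combined = "%s|%s" % (self._name, other._name)
            return NamedValue(combined, result, flags=True)
        return result


def is_power_of_two(value):
    """Optional check a flags factory could apply to each value."""
    return value > 0 and value & (value - 1) == 0


READ = NamedValue("READ", 1, flags=True)
WRITE = NamedValue("WRITE", 2, flags=True)
print(repr(READ | WRITE))
```

Whether OR'ing a flag with a bare integer should look up a name (one of the optional features listed) is left out here; the sketch simply returns a plain int in that case.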
All the best,

Michael

>
> Grouped constants
> ----------------
> * Easy to create a group of named constants, accessible as attributes
> on group object
> * Capability to go from name or value to corresponding constants
>
>
> Optional Features
> ---------------
>
> * Ability to dynamically add new named values to a group. (Suggested
> by Guido)
> * Ability to test if a name or value is in a group
> * Ability to list all names in a group
> * ANDing as well as ORing
> * Constants are unique
> * OR'ing with an integer will look up the name (or calculate it if the
> int itself represents flags that have already been OR'd) and return a
> named value (with useful repr) instead of just an integer
> * Named constants be named values that can wrap *any* type and not
> just immutable values. (Note that wrapping mutable types makes
> providing "from_value" functionality harder *unless* we guarantee that
> named values are unique. If they aren't unique named values for a
> mutable type can have different values and there is no single
> definition of what the named value actually is.)
> Requiring that values only have one name - or alternatively that
> values on a group could have multiple names (obviously incompatible
> features).
> * Requiring all names in a group to be of the same type
> * Allow names to be set automatically in a namespace, for example in a
> class namespace or on a module
> * Allow subclassing and adding of new values only present in subclass
>
>
> I'd rather we agree a suitable (minimal) API and feature set and go to
> implementation from that.
>
> For wrapping mutable types I'm tempted to say YAGNI. For the standard
> library wrapping integers meets almost all our use-cases except for
> one float. (At work we have a decimal constant as it happens.) Perhaps
> we could require immutable types for groups but allow arbitrary values
> for individual named values?
>
> For the named values api:
>
> name = NamedValue('name', value)
>
> For the grouping (tentatively accepted as reasonable by Antoine):
>
> Group = make_constants('Group', name1=value1, name2=value2)
> name1, name2 = Group.name1, Group.name2
> flag = name1 | name2
>
> value = int(Group.name1)
> name = Group('name1')
> # alternatively: value = Group.from_name('name1')
> name = Group.from_value(value1)
> # Group(value1) could work only if values aren't strings
> # perhaps: name = Group(value=value1)
>
> Group.new_name = value3 # create new value on the group
> names = Group.all_names()
> # further bikeshedding on spelling of all_names required
> # correspondingly 'all_values' I guess, returning the constants
> themselves
>
> Some of the optional features couldn't later be added without
> backwards compatibility concerns (I think the type checking features
> and requiring unique values for example). We should at least consider
> these if we are to make adding them later difficult. I would be fine
> with not having these features.
>
> All the best,
>
> Michael
>
>> Cheers,
>> Nick.
>>
>
>

--
http://www.voidspace.org.uk/
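
The grouping API quoted above could be sketched roughly as follows. This is only one possible reading of the proposal: the `Group` class, its internal `_by_name` dictionary and the use of an instance (rather than a class built by a metaclass) are assumptions chosen to keep the example short, and the method names `from_name`, `from_value` and `all_names` are taken from the message, where their spelling is explicitly still open.

```python
class NamedValue(int):
    """Hypothetical int-based named constant (sketch only)."""

    def __new__(cls, name, value):
        self = super().__new__(cls, value)
        self.name = name
        return self

    def __repr__(self):
        return "<%s=%d>" % (self.name, int(self))


class Group:
    """Hypothetical group object exposing constants as attributes."""

    def __init__(self, group_name, **names):
        # Bookkeeping attributes bypass our own __setattr__.
        object.__setattr__(self, "_group_name", group_name)
        object.__setattr__(self, "_by_name", {})
        for name, value in names.items():
            setattr(self, name, value)

    def __setattr__(self, name, value):
        # "Group.new_name = value3" creates a new named value.
        constant = NamedValue(name, value)
        self._by_name[name] = constant
        object.__setattr__(self, name, constant)

    def from_name(self, name):
        return self._by_name[name]

    def from_value(self, value):
        for constant in self._by_name.values():
            if constant == value:
                return constant
        raise ValueError("%r is not a value in %s" % (value, self._group_name))

    def all_names(self):
        return sorted(self._by_name)


def make_constants(group_name, **names):
    return Group(group_name, **names)


Colours = make_constants("Colours", red=1, green=2)
Colours.blue = 4  # dynamically add a value to the group
print(Colours.from_value(2), Colours.all_names())
```

Note that `from_value` returns the first matching constant, which is exactly where the "one name per value" question from the optional feature list would need a decision.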
From fuzzyman at voidspace.org.uk Sun Nov 28 18:16:21 2010
From: fuzzyman at voidspace.org.uk (Michael Foord)
Date: Sun, 28 Nov 2010 17:16:21 +0000
Subject: [Python-Dev] constant/enum type in stdlib
In-Reply-To: <4CF28BC8.1080508@voidspace.org.uk>
References: <20101121034404.52924F20A@mail.python.org> <4CE9BF4A.1020302@netwok.org> <4CEA89E8.5090107@voidspace.org.uk> <20101122163722.7e96d123@pitrou.net> <4CEA9584.7040301@avl.com> <20101122172440.77d27ed5@pitrou.net> <20101122164654.2109.588145158.divmod.xquotient.165@localhost.localdomain> <4CEBC6BD.9060402@voidspace.org.uk> <4CED0557.9090101@voidspace.org.uk> <4CED4E34.5060400@voidspace.org.uk> <4CF1706E.5030503@g.nevcal.com> <1D372F35-B455-4982-997B-2C54A7D56741@gmail.com> <4CF28310.7070304@voidspace.org.uk> <4CF28BC8.1080508@voidspace.org.uk>
Message-ID: <4CF28E65.2060405@voidspace.org.uk>

On 28/11/2010 17:05, Michael Foord wrote:
> [snip...]
> It is sensible if flag values are only powers of 2; we could enforce
> that or not... (Another one for the optional feature list.)
>

Another 'optional' feature I omitted was Phillip J. Eby's suggestion / requirement that named values be pickleable. Email is clunky for handling this, is there enough support (there is still some objection that is sure) to revive the PEP or create a new one?

I also didn't include Nick's suggested API, which is slightly different from the one I suggested:

    silly = Namegroup.from_names("Silly", "FOO", "BAR", "BAZ")

    >>> silly.FOO
    Silly.FOO=0
    >>> int(silly.FOO)
    0
    >>> silly(0)
    Silly.FOO=0

    x = named_value("FOO", 1)
    y = named_value("BAR", "Hello World!")
    z = named_value("BAZ", dict(a=1, b=2, c=3))

    set_named_values(globals(), foo=x._raw(), bar=y._raw(), baz=z._raw())

Where a named value created from an integer is an int subclass, from a dict a dict subclass and so on.

Michael

> I forgot auto-enumeration (specifying names only and having values
> autogenerated) from the optional feature set by the way.
> I think
> Antoine strongly disapproves of this feature because it reminds him of
> C enums.
>
> Mark Dickinson thinks that the flags feature could be an optional
> feature too. If we have ORing it makes sense to have ANDing, so I
> guess they belong together. I think there is value in it though.
>
> I realise that the optional feature list is now not small, and
> implementing all of it would create the "franken-api" Nick is worried
> about. The minimal feature list is nicely small though and provides
> useful functionality.
>
> All the best,
>
> Michael
>
>>
>> Grouped constants
>> ----------------
>> * Easy to create a group of named constants, accessible as attributes
>> on group object
>> * Capability to go from name or value to corresponding constants
>>
>>
>> Optional Features
>> ---------------
>>
>> * Ability to dynamically add new named values to a group. (Suggested
>> by Guido)
>> * Ability to test if a name or value is in a group
>> * Ability to list all names in a group
>> * ANDing as well as ORing
>> * Constants are unique
>> * OR'ing with an integer will look up the name (or calculate it if
>> the int itself represents flags that have already been OR'd) and
>> return a named value (with useful repr) instead of just an integer
>> * Named constants be named values that can wrap *any* type and not
>> just immutable values. (Note that wrapping mutable types makes
>> providing "from_value" functionality harder *unless* we guarantee
>> that named values are unique. If they aren't unique named values for
>> a mutable type can have different values and there is no single
>> definition of what the named value actually is.)
>> Requiring that values only have one name - or alternatively that
>> values on a group could have multiple names (obviously incompatible
>> features).
>> * Requiring all names in a group to be of the same type
>> * Allow names to be set automatically in a namespace, for example in
>> a class namespace or on a module
>> * Allow subclassing and adding of new values only present in subclass
>>
>>
>> I'd rather we agree a suitable (minimal) API and feature set and go
>> to implementation from that.
>>
>> For wrapping mutable types I'm tempted to say YAGNI. For the standard
>> library wrapping integers meets almost all our use-cases except for
>> one float. (At work we have a decimal constant as it happens.)
>> Perhaps we could require immutable types for groups but allow
>> arbitrary values for individual named values?
>>
>> For the named values api:
>>
>> name = NamedValue('name', value)
>>
>> For the grouping (tentatively accepted as reasonable by Antoine):
>>
>> Group = make_constants('Group', name1=value1, name2=value2)
>> name1, name2 = Group.name1, Group.name2
>> flag = name1 | name2
>>
>> value = int(Group.name1)
>> name = Group('name1')
>> # alternatively: value = Group.from_name('name1')
>> name = Group.from_value(value1)
>> # Group(value1) could work only if values aren't strings
>> # perhaps: name = Group(value=value1)
>>
>> Group.new_name = value3 # create new value on the group
>> names = Group.all_names()
>> # further bikeshedding on spelling of all_names required
>> # correspondingly 'all_values' I guess, returning the constants
>> themselves
>>
>> Some of the optional features couldn't later be added without
>> backwards compatibility concerns (I think the type checking features
>> and requiring unique values for example). We should at least consider
>> these if we are to make adding them later difficult. I would be fine
>> with not having these features.
>>
>> All the best,
>>
>> Michael
>>
>>> Cheers,
>>> Nick.
>>>
>>
>>
>
>

--
http://www.voidspace.org.uk/

From steve at pearwood.info Sun Nov 28 19:05:55 2010
From: steve at pearwood.info (Steven D'Aprano)
Date: Mon, 29 Nov 2010 05:05:55 +1100
Subject: [Python-Dev] constant/enum type in stdlib
In-Reply-To: <4CF28E65.2060405@voidspace.org.uk>
References: <20101121034404.52924F20A@mail.python.org> <4CE9BF4A.1020302@netwok.org> <4CEA89E8.5090107@voidspace.org.uk> <20101122163722.7e96d123@pitrou.net> <4CEA9584.7040301@avl.com> <20101122172440.77d27ed5@pitrou.net> <20101122164654.2109.588145158.divmod.xquotient.165@localhost.localdomain> <4CEBC6BD.9060402@voidspace.org.uk> <4CED0557.9090101@voidspace.org.uk> <4CED4E34.5060400@voidspace.org.uk> <4CF1706E.5030503@g.nevcal.com> <1D372F35-B455-4982-997B-2C54A7D56741@gmail.com> <4CF28310.7070304@voidspace.org.uk> <4CF28BC8.1080508@voidspace.org.uk> <4CF28E65.2060405@voidspace.org.uk>
Message-ID: <4CF29A03.3060900@pearwood.info>

Michael Foord wrote:

> Another 'optional' feature I omitted was Phillip J. Eby's suggestion /
> requirement that named values be pickleable. Email is clunky for
> handling this, is there enough support (there is still some objection
> that is sure) to revive the PEP or create a new one?

I think it definitely needs a PEP. I don't care whether you revive the old PEP or write a new one.
--
Steven

From fuzzyman at voidspace.org.uk Sun Nov 28 19:49:30 2010
From: fuzzyman at voidspace.org.uk (Michael Foord)
Date: Sun, 28 Nov 2010 18:49:30 +0000
Subject: [Python-Dev] constant/enum type in stdlib
In-Reply-To: <4CF29A03.3060900@pearwood.info>
References: <20101121034404.52924F20A@mail.python.org> <4CEA89E8.5090107@voidspace.org.uk> <20101122163722.7e96d123@pitrou.net> <4CEA9584.7040301@avl.com> <20101122172440.77d27ed5@pitrou.net> <20101122164654.2109.588145158.divmod.xquotient.165@localhost.localdomain> <4CEBC6BD.9060402@voidspace.org.uk> <4CED0557.9090101@voidspace.org.uk> <4CED4E34.5060400@voidspace.org.uk> <4CF1706E.5030503@g.nevcal.com> <1D372F35-B455-4982-997B-2C54A7D56741@gmail.com> <4CF28310.7070304@voidspace.org.uk> <4CF28BC8.1080508@voidspace.org.uk> <4CF28E65.2060405@voidspace.org.uk> <4CF29A03.3060900@pearwood.info>
Message-ID: <4CF2A43A.5040009@voidspace.org.uk>

On 28/11/2010 18:05, Steven D'Aprano wrote:
> Michael Foord wrote:
>
>> Another 'optional' feature I omitted was Phillip J. Eby's suggestion
>> / requirement that named values be pickleable. Email is clunky for
>> handling this, is there enough support (there is still some objection
>> that is sure) to revive the PEP or create a new one?
>
> I think it definitely needs a PEP. I don't care whether you revive the
> old PEP or write a new one.
>

Well, "if it were to be accepted it would need a PEP" and "the next step should be a PEP" are slightly different statements. :-)

As I agree with the former *anyway* at the worst starting a PEP will waste time, so I guess I'll get that underway when I get a chance...

Thanks

Michael

--
http://www.voidspace.org.uk/

From alexander.belopolsky at gmail.com Sun Nov 28 21:24:37 2010
From: alexander.belopolsky at gmail.com (Alexander Belopolsky)
Date: Sun, 28 Nov 2010 15:24:37 -0500
Subject: [Python-Dev] Python and the Unicode Character Database
Message-ID:

Two recently reported issues brought into light the fact that Python language definition is closely tied to character properties maintained by the Unicode Consortium. [1,2] For example, when Python switches to Unicode 6.0.0 (planned for the upcoming 3.2 release), we will gain two additional characters that Python can use in identifiers. [3]

With Python 3.1:

>>> exec('\u0CF1 = 1')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<string>", line 1
    ? = 1
    ^
SyntaxError: invalid character in identifier

but with Python 3.2a4:

>>> exec('\u0CF1 = 1')
>>> eval('\u0CF1')
1

Of course, the likelihood is low that this change will affect any user, but the change in str.isspace() reported in [1] is likely to cause some trouble:

Python 2.6.5:
>>> u'A\u200bB'.split()
[u'A', u'B']

Python 2.7:
>>> u'A\u200bB'.split()
[u'A\u200bB']

While we have little choice but to follow UCD in defining str.isidentifier(), I think Python can promise users more stability in what it treats as space or as a digit in its builtins.
For example, I don't think that supporting

>>> float('????.??')
1234.56

is more important than to assure users that once their program accepted some text as a number, they can assume that the text is ASCII.

[1] http://bugs.python.org/issue10567
[2] http://bugs.python.org/issue10557
[3] http://www.unicode.org/versions/Unicode6.0.0/#Database_Changes

From solipsis at pitrou.net Sun Nov 28 21:43:11 2010
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Sun, 28 Nov 2010 21:43:11 +0100
Subject: [Python-Dev] Python and the Unicode Character Database
References:
Message-ID: <20101128214311.092abd35@pitrou.net>

On Sun, 28 Nov 2010 15:24:37 -0500
Alexander Belopolsky wrote:
> While we have little choice but to follow UCD in defining
> str.isidentifier(), I think Python can promise users more stability in
> what it treats as space or as a digit in its builtins.

Well, if "unicode support" means "support the latest version of the Unicode standard", I'm not sure we have a choice. We can make exceptions, but that would only confuse users even more, wouldn't it?

> For example,
> I don't think that supporting
>
> >>> float('????.??')
> 1234.56
>
> is more important than to assure users that once their program
> accepted some text as a number, they can assume that the text is
> ASCII.

Why would they assume the text is ASCII?

Regards

Antoine.

From alexander.belopolsky at gmail.com Sun Nov 28 21:58:33 2010
From: alexander.belopolsky at gmail.com (Alexander Belopolsky)
Date: Sun, 28 Nov 2010 15:58:33 -0500
Subject: [Python-Dev] Python and the Unicode Character Database
In-Reply-To: <20101128214311.092abd35@pitrou.net>
References: <20101128214311.092abd35@pitrou.net>
Message-ID:

On Sun, Nov 28, 2010 at 3:43 PM, Antoine Pitrou wrote:
..
>> For example,
>> I don't think that supporting
>>
>> >>> float('????.??')
>> 1234.56
>>
>> is more important than to assure users that once their program
>> accepted some text as a number, they can assume that the text is
>> ASCII.
>
> Why would they assume the text is ASCII?

def deposit(self, amountstr):
    self.balance += float(amountstr)
    audit_log("Deposited: " + amountstr)

Auditor:

$ cat numbered-account.log
Deposited: ?????.??
...

From solipsis at pitrou.net Sun Nov 28 22:04:15 2010
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Sun, 28 Nov 2010 22:04:15 +0100
Subject: [Python-Dev] Python and the Unicode Character Database
In-Reply-To:
References: <20101128214311.092abd35@pitrou.net>
Message-ID: <20101128220415.28b77508@pitrou.net>

On Sun, 28 Nov 2010 15:58:33 -0500
Alexander Belopolsky wrote:

> On Sun, Nov 28, 2010 at 3:43 PM, Antoine Pitrou wrote:
> ..
> >> For example,
> >> I don't think that supporting
> >>
> >> >>> float('????.??')
> >> 1234.56
> >>
> >> is more important than to assure users that once their program
> >> accepted some text as a number, they can assume that the text is
> >> ASCII.
> >
> > Why would they assume the text is ASCII?
>
> def deposit(self, amountstr):
>     self.balance += float(amountstr)
>     audit_log("Deposited: " + amountstr)
>
> Auditor:
>
> $ cat numbered-account.log
> Deposited: ?????.??

I'm not sure that's how banking applications are written :)

Antoine.

From jsbueno at python.org.br Sun Nov 28 22:12:09 2010
From: jsbueno at python.org.br (Joao S. O. Bueno)
Date: Sun, 28 Nov 2010 19:12:09 -0200
Subject: [Python-Dev] Python and the Unicode Character Database
In-Reply-To: <20101128220415.28b77508@pitrou.net>
References: <20101128214311.092abd35@pitrou.net> <20101128220415.28b77508@pitrou.net>
Message-ID:

On Sun, Nov 28, 2010 at 7:04 PM, Antoine Pitrou wrote:
> On Sun, 28 Nov 2010 15:58:33 -0500
> Alexander Belopolsky wrote:
>
>> On Sun, Nov 28, 2010 at 3:43 PM, Antoine Pitrou wrote:
>> ..
>> >> For example,
>> >> I don't think that supporting
>> >>
>> >> >>> float('????.??')
>> >> 1234.56
>> >>
>> >> is more important than to assure users that once their program
>> >> accepted some text as a number, they can assume that the text is
>> >> ASCII.
>> >
>> > Why would they assume the text is ASCII?
>>
>> def deposit(self, amountstr):
>>     self.balance += float(amountstr)
>>     audit_log("Deposited: " + amountstr)
>>
>> Auditor:
>>
>> $ cat numbered-account.log
>> Deposited: ?????.??
>
>
> I'm not sure that's how banking applications are written :)
>

+1 for this being bogus - I see no correlation whatsoever in numbers inside unicode having to be "ASCII" if we have surpassed all technical barriers for needing to behave like that. ASCII is an oversimplification of human communication needed for computing devices not complex enough to represent it fully.

Let novice C programmers in English speaking countries deal with the fact that 1 character is not 1 byte anymore. We are past this point.

js
-><-

> Antoine.
>

From alexander.belopolsky at gmail.com Sun Nov 28 22:18:06 2010
From: alexander.belopolsky at gmail.com (Alexander Belopolsky)
Date: Sun, 28 Nov 2010 16:18:06 -0500
Subject: [Python-Dev] Python and the Unicode Character Database
In-Reply-To:
References: <20101128214311.092abd35@pitrou.net> <20101128220415.28b77508@pitrou.net>
Message-ID:

On Sun, Nov 28, 2010 at 4:12 PM, Joao S. O. Bueno wrote:
..
> Let novice C programmers in English speaking countries deal with the
> fact that 1 character is not 1 byte anymore. We are past this point.
If you are, please contribute your expertise here:

http://bugs.python.org/issue2382

From greg.ewing at canterbury.ac.nz Sun Nov 28 22:23:56 2010
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Mon, 29 Nov 2010 10:23:56 +1300
Subject: [Python-Dev] constant/enum type in stdlib
In-Reply-To: <4CEE5C1C.9000905@btinternet.com>
References: <20101121034404.52924F20A@mail.python.org> <4CE9BF4A.1020302@netwok.org> <4CEA89E8.5090107@voidspace.org.uk> <20101122163722.7e96d123@pitrou.net> <4CEA9584.7040301@avl.com> <20101122172440.77d27ed5@pitrou.net> <20101122164654.2109.588145158.divmod.xquotient.165@localhost.localdomain> <4CEBC6BD.9060402@voidspace.org.uk> <4CED0557.9090101@voidspace.org.uk> <4CEDDC2D.204@canterbury.ac.nz> <4CEE5C1C.9000905@btinternet.com>
Message-ID: <4CF2C86C.9030505@canterbury.ac.nz>

Rob Cliffe wrote:

> But couldn't they be presented to the Python programmer as a single
> type, with the implementation details hidden "under the hood"?

Not in CPython, because tuple items are kept in the same block of memory as the object header. Because CPython can't move objects, this means that the size of the tuple must be known when the object is created.

--
Greg

From martin at v.loewis.de Sun Nov 28 23:17:13 2010
From: martin at v.loewis.de (Martin v. Löwis)
Date: Sun, 28 Nov 2010 23:17:13 +0100
Subject: [Python-Dev] Python and the Unicode Character Database
In-Reply-To: <20101128214311.092abd35@pitrou.net>
References: <20101128214311.092abd35@pitrou.net>
Message-ID: <4CF2D4E9.3060607@v.loewis.de>

>> >>> float('????.??')
>> 1234.56

I think it's a bug that this works. The definition of the float builtin says

    Convert a string or a number to floating point. If the argument is a
    string, it must contain a possibly signed decimal or floating point
    number, possibly embedded in whitespace. The argument may also be
    '[+|-]nan' or '[+|-]inf'.
Now, one may wonder what precisely a "possibly signed floating point number" is, but most likely, this refers to

    floatnumber   ::= pointfloat | exponentfloat
    pointfloat    ::= [intpart] fraction | intpart "."
    exponentfloat ::= (intpart | pointfloat) exponent
    intpart       ::= digit+
    fraction      ::= "." digit+
    exponent      ::= ("e" | "E") ["+" | "-"] digit+
    digit         ::= "0"..."9"

Regards,
Martin

From alexander.belopolsky at gmail.com Sun Nov 28 23:31:51 2010
From: alexander.belopolsky at gmail.com (Alexander Belopolsky)
Date: Sun, 28 Nov 2010 17:31:51 -0500
Subject: [Python-Dev] Python and the Unicode Character Database
In-Reply-To: <4CF2D4E9.3060607@v.loewis.de>
References: <20101128214311.092abd35@pitrou.net> <4CF2D4E9.3060607@v.loewis.de>
Message-ID:

On Sun, Nov 28, 2010 at 5:17 PM, "Martin v. Löwis" wrote:
>>> >>> float('????.??')
>>> 1234.56
>
> I think it's a bug that this works. The definition of the float builtin says
>
> Convert a string or a number to floating point. If the argument is a
> string, it must contain a possibly signed decimal or floating point
> number, possibly embedded in whitespace. The argument may also be
> '[+|-]nan' or '[+|-]inf'.
>

This definition fails long before we get beyond 127-th code point:

>>> float('infinity')
inf

From mal at egenix.com Sun Nov 28 23:42:31 2010
From: mal at egenix.com (M.-A. Lemburg)
Date: Sun, 28 Nov 2010 23:42:31 +0100
Subject: [Python-Dev] Python and the Unicode Character Database
In-Reply-To: <4CF2D4E9.3060607@v.loewis.de>
References: <20101128214311.092abd35@pitrou.net> <4CF2D4E9.3060607@v.loewis.de>
Message-ID: <4CF2DAD7.2000408@egenix.com>

"Martin v. Löwis" wrote:
>>> >>> float('????.??')
>>> 1234.56
>
> I think it's a bug that this works. The definition of the float builtin says
>
> Convert a string or a number to floating point. If the argument is a
> string, it must contain a possibly signed decimal or floating point
> number, possibly embedded in whitespace. The argument may also be
> '[+|-]nan' or '[+|-]inf'.
> > Now, one may wonder what precisely a "possibly signed floating point > number" is, but most likely, this refers to > > floatnumber ::= pointfloat | exponentfloat > pointfloat ::= [intpart] fraction | intpart "." > exponentfloat ::= (intpart | pointfloat) exponent > intpart ::= digit+ > fraction ::= "." digit+ > exponent ::= ("e" | "E") ["+" | "-"] digit+ > digit ::= "0"..."9" I don't see why the language spec should limit the wealth of number formats supported by float(). It is not uncommon for Asians and other non-Latin script users to use their own native script symbols for numbers. Just because these digits may look strange to someone doesn't mean that they are meaningless or should be discarded. Please also remember that Python3 now allows Unicode names for identifiers for much the same reasons. Note that the support in float() (and the other numeric constructors) to work with Unicode code points was explicitly added when Unicode support was added to Python and has been available since Python 1.6. It is not a bug by any definition of "bug", even though the feature may bug someone occasionally to go read up a bit on what else the world has to offer other than Arabic numerals :-) http://en.wikipedia.org/wiki/Numeral_system -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Nov 28 2010) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ ::: Try our new mxODBC.Connect Python Database Interface for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From mal at egenix.com Sun Nov 28 23:48:59 2010 From: mal at egenix.com (M.-A. 
Lemburg) Date: Sun, 28 Nov 2010 23:48:59 +0100 Subject: [Python-Dev] Python and the Unicode Character Database In-Reply-To: References: Message-ID: <4CF2DC5B.4020702@egenix.com> Alexander Belopolsky wrote: > Two recently reported issues brought into light the fact that Python > language definition is closely tied to character properties maintained > by the Unicode Consortium. [1,2] For example, when Python switches to > Unicode 6.0.0 (planned for the upcoming 3.2 release), we will gain two > additional characters that Python can use in identifiers. [3] > > With Python 3.1: > >>>> exec('\u0CF1 = 1') > Traceback (most recent call last): > File " ", line 1, in > File " ", line 1 > ? = 1 > ^ > SyntaxError: invalid character in identifier > > but with Python 3.2a4: > >>>> exec('\u0CF1 = 1') >>>> eval('\u0CF1') > 1 Such changes are not new, but I agree that they should probably be highlighted in the "What's new in Python x.x". > Of course, the likelihood is low that this change will affect any > user, but the change in str.isspace() reported in [1] is likely to > cause some trouble: > > Python 2.6.5: >>>> u'A\u200bB'.split() > [u'A', u'B'] > > Python 2.7: >>>> u'A\u200bB'.split() > [u'A\u200bB'] That's a classical bug fix. > While we have little choice but to follow UCD in defining > str.isidentifier(), I think Python can promise users more stability in > what it treats as space or as a digit in its builtins. Why should we divert from the work done by the Unicode Consortium ? After all, most of their changes are in fact bug fixes as well. > For example, > I don't think that supporting > >>>> float('????.??') > 1234.56 > > is more important than to assure users that once their program > accepted some text as a number, they can assume that the text is > ASCII. Sorry, but I don't agree. If ASCII numerals are an important aspect of an application, the application should make sure that only those numerals are used (e.g. by using a regular expression for checking). 
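A minimal sketch of such a check, for illustration only. The names `ascii_float` and `_ASCII_FLOAT` are hypothetical, and the pattern is just an approximation of the float grammar quoted earlier in the thread (it ignores 'nan' and 'inf'). It spells out `[0-9]` because `\d` in a str pattern also matches non-ASCII decimal digits.

```python
import re

# ASCII-only decimal syntax: optional sign, digits with an optional
# fraction, optional exponent. This is an illustrative assumption,
# not the grammar that float() itself implements.
_ASCII_FLOAT = re.compile(
    r"[+-]?(?:[0-9]+\.?[0-9]*|\.[0-9]+)(?:[eE][+-]?[0-9]+)?"
)


def ascii_float(text):
    """Convert text to a float, rejecting anything but ASCII digits."""
    stripped = text.strip()
    if _ASCII_FLOAT.fullmatch(stripped) is None:
        raise ValueError("not an ASCII decimal number: %r" % (text,))
    return float(stripped)
```

An application that needs ASCII-only input would call `ascii_float()` where it now calls `float()`; everything else keeps the more permissive built-in behaviour.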
In a Unicode world, not accepting non-Arabic numerals would be a limitation, not a feature. Besides Python has had this support since Python 1.6. > [1] http://bugs.python.org/issue10567 > [2] http://bugs.python.org/issue10557 > [3] http://www.unicode.org/versions/Unicode6.0.0/#Database_Changes -- Marc-Andre Lemburg eGenix.com From alexander.belopolsky at gmail.com Sun Nov 28 23:51:00 2010 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Sun, 28 Nov 2010 17:51:00 -0500 Subject: [Python-Dev] Python and the Unicode Character Database In-Reply-To: <4CF2DAD7.2000408@egenix.com> References: <20101128214311.092abd35@pitrou.net> <4CF2D4E9.3060607@v.loewis.de> <4CF2DAD7.2000408@egenix.com> Message-ID: On Sun, Nov 28, 2010 at 5:42 PM, M.-A. Lemburg wrote: .. > I don't see why the language spec should limit the wealth of number > formats supported by float(). > The Language Spec (whatever it is) should not, but hopefully the Library Reference should.
If you follow http://docs.python.org/dev/py3k/library/functions.html#float link and the references therein, you'll end up with digit ::= "0"..."9" http://docs.python.org/dev/py3k/reference/lexical_analysis.html#grammar-token-digit From martin at v.loewis.de Sun Nov 28 23:56:47 2010 From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=) Date: Sun, 28 Nov 2010 23:56:47 +0100 Subject: [Python-Dev] Python and the Unicode Character Database In-Reply-To: References: <20101128214311.092abd35@pitrou.net> <4CF2D4E9.3060607@v.loewis.de> Message-ID: <4CF2DE2F.5040405@v.loewis.de> Am 28.11.2010 23:31, schrieb Alexander Belopolsky: > On Sun, Nov 28, 2010 at 5:17 PM, "Martin v. L?wis" wrote: >>>>>>> float('????.??') >>>> 1234.56 >> >> I think it's a bug that this works. The definition of the float builtin says >> >> Convert a string or a number to floating point. If the argument is a >> string, it must contain a possibly signed decimal or floating point >> number, possibly embedded in whitespace. The argument may also be >> '[+|-]nan' or '[+|-]inf'. >> > > This definition fails long before we get beyond 127-th code point: > >>>> float('infinity') > inf What do infer from that? That the definition is wrong, or the code is wrong? Regards, Martin From tjreedy at udel.edu Mon Nov 29 00:00:25 2010 From: tjreedy at udel.edu (Terry Reedy) Date: Sun, 28 Nov 2010 18:00:25 -0500 Subject: [Python-Dev] Python and the Unicode Character Database In-Reply-To: References: <20101128214311.092abd35@pitrou.net> Message-ID: On 11/28/2010 3:58 PM, Alexander Belopolsky wrote: > On Sun, Nov 28, 2010 at 3:43 PM, Antoine Pitrou wrote: > .. >>> For example, >>> I don't think that supporting >>> >>>>>> float('????.??') >>> 1234.56 Even if this is somehow an accident or something that someone snuck in, I think it a good idea that *users* be able to input amounts with their native digits. 
That is different from requiring *programmers* to write
literals with euro-ascii-digits.

>>> is more important than to assure users that once their program
>>> accepted some text as a number, they can assume that the text is
>>> ASCII.
>>
>> Why would they assume the text is ASCII?
>
> def deposit(self, amountstr):
>     self.balance += float(amountstr)
>     audit_log("Deposited: " + amountstr)

If the programmer wants to ensure ASCII, he can produce a string,
possibly formatted, from the amount:

depform = "Deposited: ${:14.2f}".format

def deposit(self, amountstr):
    amount = float(amountstr)
    self.balance += amount
    # audit_log("Deposited: " + str(amount))  # simple version
    audit_log(depform(amount))

Given that amountstr could be something like ' 182.33 ', I think the
programmer should plan to format it.

--
Terry Jan Reedy

From alexander.belopolsky at gmail.com Mon Nov 29 00:01:10 2010
From: alexander.belopolsky at gmail.com (Alexander Belopolsky)
Date: Sun, 28 Nov 2010 18:01:10 -0500
Subject: [Python-Dev] Python and the Unicode Character Database
In-Reply-To: <4CF2DE2F.5040405@v.loewis.de>
References: <20101128214311.092abd35@pitrou.net>
 <4CF2D4E9.3060607@v.loewis.de> <4CF2DE2F.5040405@v.loewis.de>
Message-ID:

On Sun, Nov 28, 2010 at 5:56 PM, "Martin v. Löwis" wrote:
..
>> This definition fails long before we get beyond 127-th code point:
>>
>>>>> float('infinity')
>> inf
>
> What do infer from that? That the definition is wrong, or the code is wrong?

The development version of the reference manual is more detailed, but
as far as I can tell, it still defines digit as 0-9.
http://docs.python.org/dev/py3k/library/functions.html#float From martin at v.loewis.de Mon Nov 29 00:03:45 2010 From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=) Date: Mon, 29 Nov 2010 00:03:45 +0100 Subject: [Python-Dev] Python and the Unicode Character Database In-Reply-To: <4CF2DAD7.2000408@egenix.com> References: <20101128214311.092abd35@pitrou.net> <4CF2D4E9.3060607@v.loewis.de> <4CF2DAD7.2000408@egenix.com> Message-ID: <4CF2DFD1.10901@v.loewis.de> >> Now, one may wonder what precisely a "possibly signed floating point >> number" is, but most likely, this refers to >> >> floatnumber ::= pointfloat | exponentfloat >> pointfloat ::= [intpart] fraction | intpart "." >> exponentfloat ::= (intpart | pointfloat) exponent >> intpart ::= digit+ >> fraction ::= "." digit+ >> exponent ::= ("e" | "E") ["+" | "-"] digit+ >> digit ::= "0"..."9" > > I don't see why the language spec should limit the wealth of number > formats supported by float(). If it doesn't, there should be some other specification of what is correct and what is not. It must not be unspecified. > It is not uncommon for Asians and other non-Latin script users to > use their own native script symbols for numbers. Just because these > digits may look strange to someone doesn't mean that they are > meaningless or should be discarded. Then these users should speak up and indicate their need, or somebody should speak up and confirm that there are users who actually want '????.??' to denote 1234.56. To my knowledge, there is no writing system in which '????.??e4' means 12345600.0. > Please also remember that Python3 now allows Unicode names for > identifiers for much the same reasons. No no no. Addition of Unicode identifiers has a well-designed, deliberate specification, with a PEP and all. The support for non-ASCII digits in float appears to be ad-hoc, and not founded on actual needs of actual users. 
> Note that the support in float() (and the other numeric constructors) > to work with Unicode code points was explicitly added when Unicode > support was added to Python and has been available since Python 1.6. That doesn't necessarily make it useful. Alexander's complaint is that it makes Python unstable (i.e. changing as the UCD changes). > It is not a bug by any definition of "bug" Most certainly it is: the documentation is either underspecified, or deviates from the implementation (when taking the most plausible interpretation). This is the very definition of "bug". Regards, Martin From tjreedy at udel.edu Mon Nov 29 00:03:30 2010 From: tjreedy at udel.edu (Terry Reedy) Date: Sun, 28 Nov 2010 18:03:30 -0500 Subject: [Python-Dev] Python and the Unicode Character Database In-Reply-To: References: <20101128214311.092abd35@pitrou.net> <4CF2D4E9.3060607@v.loewis.de> <4CF2DAD7.2000408@egenix.com> Message-ID: On 11/28/2010 5:51 PM, Alexander Belopolsky wrote: > The Language Spec (whatever it is) should not, but hopefully the > Library Reference should. If you follow > http://docs.python.org/dev/py3k/library/functions.html#float link and > the references therein, you'll end up with > > digit ::= "0"..."9" > > http://docs.python.org/dev/py3k/reference/lexical_analysis.html#grammar-token-digit So fix the doc for builtin float() and perhaps int(). -- Terry Jan Reedy From alexander.belopolsky at gmail.com Mon Nov 29 00:05:56 2010 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Sun, 28 Nov 2010 18:05:56 -0500 Subject: [Python-Dev] Python and the Unicode Character Database In-Reply-To: <4CF2DFD1.10901@v.loewis.de> References: <20101128214311.092abd35@pitrou.net> <4CF2D4E9.3060607@v.loewis.de> <4CF2DAD7.2000408@egenix.com> <4CF2DFD1.10901@v.loewis.de> Message-ID: +1 on all point below. On Sun, Nov 28, 2010 at 6:03 PM, "Martin v. 
L?wis" wrote: >>> Now, one may wonder what precisely a "possibly signed floating point >>> number" is, but most likely, this refers to >>> >>> floatnumber ? ::= ?pointfloat | exponentfloat >>> pointfloat ? ?::= ?[intpart] fraction | intpart "." >>> exponentfloat ::= ?(intpart | pointfloat) exponent >>> intpart ? ? ? ::= ?digit+ >>> fraction ? ? ?::= ?"." digit+ >>> exponent ? ? ?::= ?("e" | "E") ["+" | "-"] digit+ >>> digit ? ? ? ? ?::= ?"0"..."9" >> >> I don't see why the language spec should limit the wealth of number >> formats supported by float(). > > If it doesn't, there should be some other specification of what > is correct and what is not. It must not be unspecified. > >> It is not uncommon for Asians and other non-Latin script users to >> use their own native script symbols for numbers. Just because these >> digits may look strange to someone doesn't mean that they are >> meaningless or should be discarded. > > Then these users should speak up and indicate their need, or somebody > should speak up and confirm that there are users who actually want > '????.??' to denote 1234.56. To my knowledge, there is no writing > system in which '????.??e4' means 12345600.0. > >> Please also remember that Python3 now allows Unicode names for >> identifiers for much the same reasons. > > No no no. Addition of Unicode identifiers has a well-designed, > deliberate specification, with a PEP and all. The support for > non-ASCII digits in float appears to be ad-hoc, and not founded > on actual needs of actual users. > >> Note that the support in float() (and the other numeric constructors) >> to work with Unicode code points was explicitly added when Unicode >> support was added to Python and has been available since Python 1.6. > > That doesn't necessarily make it useful. Alexander's complaint is that > it makes Python unstable (i.e. changing as the UCD changes). 
> >> It is not a bug by any definition of "bug" > > Most certainly it is: the documentation is either underspecified, > or deviates from the implementation (when taking the most plausible > interpretation). This is the very definition of "bug". > > Regards, > Martin > From martin at v.loewis.de Mon Nov 29 00:08:29 2010 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Mon, 29 Nov 2010 00:08:29 +0100 Subject: [Python-Dev] Python and the Unicode Character Database In-Reply-To: References: <20101128214311.092abd35@pitrou.net> <4CF2D4E9.3060607@v.loewis.de> <4CF2DE2F.5040405@v.loewis.de> Message-ID: <4CF2E0ED.1080807@v.loewis.de> Am 29.11.2010 00:01, schrieb Alexander Belopolsky: > On Sun, Nov 28, 2010 at 5:56 PM, "Martin v. L?wis" wrote: > .. >>> This definition fails long before we get beyond 127-th code point: >>> >>>>>> float('infinity') >>> inf >> >> What do infer from that? That the definition is wrong, or the code is wrong? > > The development version of the reference manual is more detailed, but > as far as I can tell, it still defines digit as 0-9. > > http://docs.python.org/dev/py3k/library/functions.html#float > I wasn't asking about 0..9, but about "infinity". According to the spec, it shouldn't accept that (and neither should it accept 'infinitY'). However, whether that's a spec bug or an implementation bug - it seems like a minor issue to me (i.e. easily fixed). Regards, Martin From alexander.belopolsky at gmail.com Mon Nov 29 00:12:44 2010 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Sun, 28 Nov 2010 18:12:44 -0500 Subject: [Python-Dev] Python and the Unicode Character Database In-Reply-To: <4CF2DFD1.10901@v.loewis.de> References: <20101128214311.092abd35@pitrou.net> <4CF2D4E9.3060607@v.loewis.de> <4CF2DAD7.2000408@egenix.com> <4CF2DFD1.10901@v.loewis.de> Message-ID: On Sun, Nov 28, 2010 at 6:03 PM, "Martin v. L?wis" wrote: .. 
>> Note that the support in float() (and the other numeric constructors) >> to work with Unicode code points was explicitly added when Unicode >> support was added to Python and has been available since Python 1.6. > > That doesn't necessarily make it useful. Alexander's complaint is that > it makes Python unstable (i.e. changing as the UCD changes). > What makes it worse, is that while superficially, Unicode versions follow the same X.Y.Z format as Python versions, the stability promises are completely different. For example, it appears that the general category for the ZERO WIDTH SPACE was changed in Unicode 4.0.1. I don't think a change affecting str.split(), int(), float() and probably numerous other library functions would be acceptable in a Python micro release. From alexander.belopolsky at gmail.com Mon Nov 29 00:16:24 2010 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Sun, 28 Nov 2010 18:16:24 -0500 Subject: [Python-Dev] Python and the Unicode Character Database In-Reply-To: <4CF2E0ED.1080807@v.loewis.de> References: <20101128214311.092abd35@pitrou.net> <4CF2D4E9.3060607@v.loewis.de> <4CF2DE2F.5040405@v.loewis.de> <4CF2E0ED.1080807@v.loewis.de> Message-ID: On Sun, Nov 28, 2010 at 6:08 PM, "Martin v. L?wis" wrote: > Am 29.11.2010 00:01, schrieb Alexander Belopolsky: >> On Sun, Nov 28, 2010 at 5:56 PM, "Martin v. L?wis" wrote: >> .. >>>> This definition fails long before we get beyond 127-th code point: >>>> >>>>>>> float('infinity') >>>> inf >>> >>> What do infer from that? That the definition is wrong, or the code is wrong? >> >> The development version of the reference manual is more detailed, but >> as far as I can tell, it still defines digit as 0-9. >> >> http://docs.python.org/dev/py3k/library/functions.html#float >> > > I wasn't asking about 0..9, but about "infinity". According to the > spec, it shouldn't accept that (and neither should it accept > 'infinitY'). 
According to the link that I mentioned, infinity ::= "Infinity" | "inf" and "Case is not significant, so, for example, ?inf?, ?Inf?, ?INFINITY? and ?iNfINity? are all acceptable spellings for positive infinity." I completely agree with your arguments and the reference manual has been improved a lot in the recent years. From martin at v.loewis.de Mon Nov 29 00:19:54 2010 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Mon, 29 Nov 2010 00:19:54 +0100 Subject: [Python-Dev] Python and the Unicode Character Database In-Reply-To: References: <20101128214311.092abd35@pitrou.net> <4CF2D4E9.3060607@v.loewis.de> <4CF2DAD7.2000408@egenix.com> <4CF2DFD1.10901@v.loewis.de> Message-ID: <4CF2E39A.6060605@v.loewis.de> > What makes it worse, is that while superficially, Unicode versions > follow the same X.Y.Z format as Python versions, the stability > promises are completely different. For example, it appears that the > general category for the ZERO WIDTH SPACE was changed in Unicode > 4.0.1. I don't think a change affecting str.split(), int(), float() > and probably numerous other library functions would be acceptable in a > Python micro release. Well, we managed to completely break Unicode normalization between 2.6.5 and 2.6.6, due to a bug. You can see the Unicode Consortium's stability policy at http://unicode.org/policies/stability_policy.html In a sense, this is stronger than Python's backwards compatibility promises (which allow for certain incompatible changes to occur over time, whereas Unicode makes promises about all future versions). Regards, Martin From benjamin at python.org Mon Nov 29 00:23:01 2010 From: benjamin at python.org (Benjamin Peterson) Date: Sun, 28 Nov 2010 17:23:01 -0600 Subject: [Python-Dev] Python and the Unicode Character Database In-Reply-To: <4CF2DAD7.2000408@egenix.com> References: <20101128214311.092abd35@pitrou.net> <4CF2D4E9.3060607@v.loewis.de> <4CF2DAD7.2000408@egenix.com> Message-ID: 2010/11/28 M.-A. 
Lemburg : > > > "Martin v. L?wis" wrote: >>>>>>> float('????.??') >>>> 1234.56 >> >> I think it's a bug that this works. The definition of the float builtin says >> >> Convert a string or a number to floating point. If the argument is a >> string, it must contain a possibly signed decimal or floating point >> number, possibly embedded in whitespace. The argument may also be >> '[+|-]nan' or '[+|-]inf'. >> >> Now, one may wonder what precisely a "possibly signed floating point >> number" is, but most likely, this refers to >> >> floatnumber ? ::= ?pointfloat | exponentfloat >> pointfloat ? ?::= ?[intpart] fraction | intpart "." >> exponentfloat ::= ?(intpart | pointfloat) exponent >> intpart ? ? ? ::= ?digit+ >> fraction ? ? ?::= ?"." digit+ >> exponent ? ? ?::= ?("e" | "E") ["+" | "-"] digit+ >> digit ? ? ? ? ?::= ?"0"..."9" > > I don't see why the language spec should limit the wealth of number > formats supported by float(). > > It is not uncommon for Asians and other non-Latin script users to > use their own native script symbols for numbers. Just because these > digits may look strange to someone doesn't mean that they are > meaningless or should be discarded. That's different. Python doesn't assign any semantic meaning to the characters in identifiers. The non-latin support for numerals, though, could change the meaning of a program dramatically and needs to be well-specified. Whether int() should do this is debatable. I, for one, think this kind of support belongs in the locale module. 
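A rough sketch of what parsing through the existing locale module looks like, for comparison. `parse_user_float` is a hypothetical helper, and any locale name other than "C" is an assumption about what is installed on the system. `locale.atof` only applies the locale's decimal point and grouping conventions; nothing here is claimed about non-ASCII digits.

```python
import locale


def parse_user_float(text, locale_name="C"):
    """Parse text using the numeric conventions of locale_name.

    The LC_NUMERIC setting is changed only for the duration of the
    call and restored afterwards. setlocale() is process-wide, so
    this sketch is not safe to use from multiple threads.
    """
    saved = locale.setlocale(locale.LC_NUMERIC)  # query current setting
    try:
        locale.setlocale(locale.LC_NUMERIC, locale_name)
        return locale.atof(text)
    finally:
        locale.setlocale(locale.LC_NUMERIC, saved)
```

With a German locale installed (for example a name such as "de_DE.UTF-8", which is platform dependent), the same call would be expected to treat the comma as the decimal separator.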
-- Regards, Benjamin From alexander.belopolsky at gmail.com Mon Nov 29 00:29:47 2010 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Sun, 28 Nov 2010 18:29:47 -0500 Subject: [Python-Dev] Python and the Unicode Character Database In-Reply-To: <4CF2E39A.6060605@v.loewis.de> References: <20101128214311.092abd35@pitrou.net> <4CF2D4E9.3060607@v.loewis.de> <4CF2DAD7.2000408@egenix.com> <4CF2DFD1.10901@v.loewis.de> <4CF2E39A.6060605@v.loewis.de> Message-ID: On Sun, Nov 28, 2010 at 6:19 PM, "Martin v. L?wis" wrote: .. > You can see the Unicode Consortium's stability policy at > > http://unicode.org/policies/stability_policy.html > >From the link above: """ As more experience is gathered in implementing the characters, adjustments in the properties may become necessary. Examples of such properties include, but are not limited to, the following: General_Category ... """ > In a sense, this is stronger than Python's backwards compatibility > promises (which allow for certain incompatible changes to occur > over time, whereas Unicode makes promises about all future versions). I would say it is *different* and should be taken into account when tying language features to Unicode specifications. This was done in PEP 3131. Note that one of the stated objections was "Unicode is young; its problems are not yet well understood and solved;" (It is still true.) From martin at v.loewis.de Mon Nov 29 00:33:23 2010 From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=) Date: Mon, 29 Nov 2010 00:33:23 +0100 Subject: [Python-Dev] Python and the Unicode Character Database In-Reply-To: References: <20101128214311.092abd35@pitrou.net> Message-ID: <4CF2E6C3.3010009@v.loewis.de> >>>>>>> float('????.??') >>>> 1234.56 > > Even if this is somehow an accident or something that someone snuck in, > I think it a good idea that *users* be able to input amounts with their > native digits. 
That is different from requiring *programmers* to write
> literals with euro-ascii-digits

So one question is what kind of data float() is aimed at. I claim that
it is about "programmer" data, not "user" data. If it supported "user"
data, it probably would have to support "1,000" to denote 1e3 in the
U.S., and 1e0 in Germany. Our users are generally confused
on whether they should use the full stop or the comma as the decimal
separator.

As not even the locale-dependent issues are considered in float(),
it is clear to me that entering local numbers cannot possibly be
the objective of the function.

Instead, following a widespread Python convention, it is meant to be
the reverse of repr().

Can you name a single person who actually wants to write '١٢٣٤.٥٦'
as a number? I'm fairly skeptical that users of Arabic-Indic digits do.
Instead,

http://en.wikipedia.org/wiki/Decimal_separator

suggests that they would rather use U+066B, i.e. '١٢٣٤٫٥٦', which isn't
supported by Python.

Regards,
Martin

From martin at v.loewis.de Mon Nov 29 00:40:31 2010
From: martin at v.loewis.de (=?ISO-8859-15?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Mon, 29 Nov 2010 00:40:31 +0100
Subject: [Python-Dev] PEP 384 final review
Message-ID: <4CF2E86F.5000606@v.loewis.de>

I have now completed

http://www.python.org/dev/peps/pep-0384/

Benjamin has volunteered to rule on this PEP.

Please comment with any changes you want to see, or speak in favor of
or against this PEP.

Regards,
Martin

From fuzzyman at voidspace.org.uk Mon Nov 29 00:44:50 2010
From: fuzzyman at voidspace.org.uk (Michael Foord)
Date: Sun, 28 Nov 2010 23:44:50 +0000
Subject: [Python-Dev] Python and the Unicode Character Database
In-Reply-To: <4CF2E6C3.3010009@v.loewis.de>
References: <20101128214311.092abd35@pitrou.net>
 <4CF2E6C3.3010009@v.loewis.de>
Message-ID: <4CF2E972.2040209@voidspace.org.uk>

On 28/11/2010 23:33, "Martin v.
L?wis" wrote: >>>>>>>> float('????.??') >>>>> 1234.56 >> Even if this is somehow an accident or something that someone snuck in, >> I think it a good idea that *users* be able to input amounts with their >> native digits. That is different from requiring *programmers* to write >> literals with euro-ascii-digits > So one question is what kind of data float() is aimed at. I claim that > it is about "programmer" data, not "user" data. If it supported "user" > data, it probably would have to support "1,000" to denote 1e3 in the > U.S., and denote 1e0 in Germany. Our users are generally confused > on whether they should use th full stop or the comma as the decimal > separator. > FWIW the C# equivalent is locale aware *unless* you pass in a specific culture. (System.Double.Parse): http://msdn.microsoft.com/en-us/library/fd84bdyt.aspx If you're not aware that your code may be run on non-US computers this is a trap for the unwary. If you *are* aware then it is very useful. An alternative overload allows you to specify the culture used to do the conversion: http://msdn.microsoft.com/en-us/library/t9ebt447.aspx Michael > As not even the locale-dependent issues are considered in float(), > it is clear to me that entering local numbers cannot possibly be > the objective of the function. > > Instead, following a wide-spread Python convention, it is meant to be > the reverse of repr(). > > Can you name a single person who actually wants to write '????.??' > as a number? I'm fairly skeptical that users of arabic-indic digits. > Instead, > > http://en.wikipedia.org/wiki/Decimal_separator > > suggests that they would rather U+066B, i.e. '???????', which isn't > supported by Python. 
> > Regards, > Martin -- http://www.voidspace.org.uk/ From alexander.belopolsky at gmail.com Mon Nov 29 00:56:00 2010 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Sun, 28 Nov 2010 18:56:00 -0500 Subject: [Python-Dev] Python and the Unicode Character Database In-Reply-To: <4CF2DFD1.10901@v.loewis.de> References: <20101128214311.092abd35@pitrou.net> <4CF2D4E9.3060607@v.loewis.de> <4CF2DAD7.2000408@egenix.com> <4CF2DFD1.10901@v.loewis.de> Message-ID: On Sun, Nov 28, 2010 at 6:03 PM, "Martin v. L?wis" wrote: .. > No no no. Addition of Unicode identifiers has a well-designed, > deliberate specification, with a PEP and all. The support for > non-ASCII digits in float appears to be ad-hoc, and not founded > on actual needs of actual users. > I wonder how carefully right-to-left scripts were considered when PEP 3131 was discussed. Try the following on the python prompt: >>> ?= int('???') >>> ? 123 In my OSX Terminal window, entering ? flips the >>> prompt and the session looks like this: ('???')int = ?
<<< From martin at v.loewis.de Mon Nov 29 00:59:12 2010 From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=) Date: Mon, 29 Nov 2010 00:59:12 +0100 Subject: [Python-Dev] Python and the Unicode Character Database In-Reply-To: <4CF2E972.2040209@voidspace.org.uk> References: <20101128214311.092abd35@pitrou.net> <4CF2E6C3.3010009@v.loewis.de> <4CF2E972.2040209@voidspace.org.uk> Message-ID: <4CF2ECD0.4000003@v.loewis.de> > FWIW the C# equivalent is locale aware *unless* you pass in a specific > culture. > (System.Double.Parse): That's not quite the equivalent of float(), I would say: this one apparently is locale-aware, so it is more the equivalent of locale.atof. The next question then is if it supports indo-arabic digits in any locale (or more specifically in an arabic locale). Regards, Martin From solipsis at pitrou.net Mon Nov 29 01:01:12 2010 From: solipsis at pitrou.net (Antoine Pitrou) Date: Mon, 29 Nov 2010 01:01:12 +0100 Subject: [Python-Dev] Python and the Unicode Character Database References: <20101128214311.092abd35@pitrou.net> <4CF2D4E9.3060607@v.loewis.de> <4CF2DAD7.2000408@egenix.com> Message-ID: <20101129010112.343eaf64@pitrou.net> On Sun, 28 Nov 2010 17:23:01 -0600 Benjamin Peterson wrote: > 2010/11/28 M.-A. Lemburg : > > > > > > "Martin v. L?wis" wrote: > >>>>>>> float('????.??') > >>>> 1234.56 > >> > >> I think it's a bug that this works. The definition of the float builtin says > >> > >> Convert a string or a number to floating point. If the argument is a > >> string, it must contain a possibly signed decimal or floating point > >> number, possibly embedded in whitespace. The argument may also be > >> '[+|-]nan' or '[+|-]inf'. > >> > >> Now, one may wonder what precisely a "possibly signed floating point > >> number" is, but most likely, this refers to > >> > >> floatnumber ? ::= ?pointfloat | exponentfloat > >> pointfloat ? ?::= ?[intpart] fraction | intpart "." > >> exponentfloat ::= ?(intpart | pointfloat) exponent > >> intpart ? ? ? 
::= ?digit+ > >> fraction ? ? ?::= ?"." digit+ > >> exponent ? ? ?::= ?("e" | "E") ["+" | "-"] digit+ > >> digit ? ? ? ? ?::= ?"0"..."9" > > > > I don't see why the language spec should limit the wealth of number > > formats supported by float(). > > > > It is not uncommon for Asians and other non-Latin script users to > > use their own native script symbols for numbers. Just because these > > digits may look strange to someone doesn't mean that they are > > meaningless or should be discarded. > > That's different. Python doesn't assign any semantic meaning to the > characters in identifiers. The non-latin support for numerals, though, > could change the meaning of a program dramatically and needs to be > well-specified. Whether int() should do this is debatable. Perhaps int(), float(), Decimal() and friends could take an optional parameter indicating whether non-ascii digits are considered. It would then satisfy all parties. Antoine. From martin at v.loewis.de Mon Nov 29 01:02:18 2010 From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=) Date: Mon, 29 Nov 2010 01:02:18 +0100 Subject: [Python-Dev] Python and the Unicode Character Database In-Reply-To: References: <20101128214311.092abd35@pitrou.net> <4CF2D4E9.3060607@v.loewis.de> <4CF2DAD7.2000408@egenix.com> <4CF2DFD1.10901@v.loewis.de> Message-ID: <4CF2ED8A.2010503@v.loewis.de> Am 29.11.2010 00:56, schrieb Alexander Belopolsky: > On Sun, Nov 28, 2010 at 6:03 PM, "Martin v. L?wis" wrote: > .. >> No no no. Addition of Unicode identifiers has a well-designed, >> deliberate specification, with a PEP and all. The support for >> non-ASCII digits in float appears to be ad-hoc, and not founded >> on actual needs of actual users. >> > > I wonder how carefully right-to-left scripts were considered when PEP > 3131 was discussed. IIRC, some Hebrew users have spoken in favor of the PEP, despite the obvious difficulties it would create. 
I may misremember, but I think someone pointed out that they had these difficulties all the time, and that it wasn't really a burden. Unicode specifies that one should always use "logical order" in memory, and that's what the PEP does. Rendering is then a tool issue. Regards, Martin From alexander.belopolsky at gmail.com Mon Nov 29 01:04:53 2010 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Sun, 28 Nov 2010 19:04:53 -0500 Subject: [Python-Dev] Python and the Unicode Character Database In-Reply-To: <4CF2ECD0.4000003@v.loewis.de> References: <20101128214311.092abd35@pitrou.net> <4CF2E6C3.3010009@v.loewis.de> <4CF2E972.2040209@voidspace.org.uk> <4CF2ECD0.4000003@v.loewis.de> Message-ID: On Sun, Nov 28, 2010 at 6:59 PM, "Martin v. L?wis" wrote: .. > The next question then is if it supports indo-arabic digits in any > locale (or more specifically in an arabic locale). And once you answered that question, does it support Devanagari or Bengali digits? And if so, an arbitrary mix of those and indo-arabic digits? From alexander.belopolsky at gmail.com Mon Nov 29 01:25:37 2010 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Sun, 28 Nov 2010 19:25:37 -0500 Subject: [Python-Dev] Python and the Unicode Character Database In-Reply-To: <20101129010112.343eaf64@pitrou.net> References: <20101128214311.092abd35@pitrou.net> <4CF2D4E9.3060607@v.loewis.de> <4CF2DAD7.2000408@egenix.com> <20101129010112.343eaf64@pitrou.net> Message-ID: On Sun, Nov 28, 2010 at 7:01 PM, Antoine Pitrou wrote: .. >> That's different. Python doesn't assign any semantic meaning to the >> characters in identifiers. The non-latin support for numerals, though, >> could change the meaning of a program dramatically and needs to be >> well-specified. Whether int() should do this is debatable. > > Perhaps int(), float(), Decimal() and friends could take an optional > parameter indicating whether non-ascii digits are considered. It would > then satisfy all parties. 
What parties? I don't think anyone has claimed to actually have used
non-ASCII digits with float(). Of course it is fun that Python can
process Bengali numerals, but so would be allowing Roman numerals.
There is a reason why after careful consideration, PEP 313 was
ultimately rejected.

BTW, it is common in Russia to specify months using roman numerals.
Maybe we should consider allowing datetime.date() to accept '1.IV.2011'.

From fuzzyman at voidspace.org.uk Mon Nov 29 01:25:40 2010
From: fuzzyman at voidspace.org.uk (Michael Foord)
Date: Mon, 29 Nov 2010 00:25:40 +0000
Subject: [Python-Dev] Python and the Unicode Character Database
In-Reply-To: <4CF2ECD0.4000003@v.loewis.de>
References: <20101128214311.092abd35@pitrou.net> <4CF2E6C3.3010009@v.loewis.de> <4CF2E972.2040209@voidspace.org.uk> <4CF2ECD0.4000003@v.loewis.de>
Message-ID: <4CF2F304.60905@voidspace.org.uk>

On 28/11/2010 23:59, "Martin v. Löwis" wrote:
>> FWIW the C# equivalent is locale aware *unless* you pass in a specific
>> culture.
>> (System.Double.Parse):
> That's not quite the equivalent of float(), I would say: this one
> apparently is locale-aware, so it is more the equivalent of locale.atof.

Right. It is *the* standard way of getting a float from a string though,
whereas in Python we have two depending on whether or not you want to be
locale aware. The standard way in C# is locale aware. To be non-locale
aware you pass in a specific culture or number format.

> The next question then is if it supports indo-arabic digits in any
> locale (or more specifically in an arabic locale).

I don't think so actually. The float parse formatting rules are defined
like this:

[ws][$][sign][integral-digits[,]]integral-digits[.[fractional-digits]][E[sign]exponential-digits][ws]

(From http://msdn.microsoft.com/en-us/library/7yd1h1be.aspx )

integral-digits, fractional-digits and exponential-digits are all
defined as "A series of digits ranging from 0 to 9".

Arguably this is not conclusive.
In fact the NumberFormatInfo class seems to hint that it may be
otherwise:

http://msdn.microsoft.com/en-us/library/system.globalization.numberformatinfo.aspx

See DigitSubstitution on that page. I would have to try it to be sure
and I don't have a Windows VM in convenient reach right now.

All the best,

Michael

> Regards,
> Martin

-- 
http://www.voidspace.org.uk/

READ CAREFULLY. By accepting and reading this email you agree, on behalf
of your employer, to release me from all obligations and waivers arising
from any and all NON-NEGOTIATED agreements, licenses, terms-of-service,
shrinkwrap, clickwrap, browsewrap, confidentiality, non-disclosure,
non-compete and acceptable use policies (“BOGUS AGREEMENTS”) that I have
entered into with your employer, its partners, licensors, agents and
assigns, in perpetuity, without prejudice to my ongoing rights and
privileges. You further represent that you have the authority to release
me from any BOGUS AGREEMENTS on behalf of your employer.

From fuzzyman at voidspace.org.uk Mon Nov 29 01:28:59 2010
From: fuzzyman at voidspace.org.uk (Michael Foord)
Date: Mon, 29 Nov 2010 00:28:59 +0000
Subject: [Python-Dev] Python and the Unicode Character Database
In-Reply-To:
References: <20101128214311.092abd35@pitrou.net> <4CF2E6C3.3010009@v.loewis.de> <4CF2E972.2040209@voidspace.org.uk> <4CF2ECD0.4000003@v.loewis.de>
Message-ID: <4CF2F3CB.6090808@voidspace.org.uk>

On 29/11/2010 00:04, Alexander Belopolsky wrote:
> On Sun, Nov 28, 2010 at 6:59 PM, "Martin v. Löwis" wrote:
> ..
>> The next question then is if it supports indo-arabic digits in any
>> locale (or more specifically in an arabic locale).
> And once you answered that question, does it support Devanagari or
> Bengali digits? And if so, an arbitrary mix of those and indo-arabic
> digits?

Haha. Go and try it yourself. :-)

Michael

-- 
http://www.voidspace.org.uk/

From solipsis at pitrou.net Mon Nov 29 01:29:40 2010
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Mon, 29 Nov 2010 01:29:40 +0100
Subject: [Python-Dev] Python and the Unicode Character Database
In-Reply-To:
References: <20101128214311.092abd35@pitrou.net> <4CF2D4E9.3060607@v.loewis.de> <4CF2DAD7.2000408@egenix.com> <20101129010112.343eaf64@pitrou.net>
Message-ID: <1290990580.8242.2.camel@localhost.localdomain>

> > Perhaps int(), float(), Decimal() and friends could take an optional
> > parameter indicating whether non-ascii digits are considered. It would
> > then satisfy all parties.
>
> What parties? I don't think anyone has claimed to actually have used
> non-ASCII digits with float().

Have you done a poll of all Python 3 users?

> Of course it is fun that Python can
> process Bengali numerals, but so would be allowing Roman numerals.
> There is a reason why after careful consideration, PEP 313 was
> ultimately rejected.

That's mostly irrelevant. This feature exists and someone, somewhere,
may be using it. We normally don't remove stuff without deprecation.

Antoine.
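
The "deprecate before removing" path mentioned above could look roughly like the following in user code. This is only a sketch under stated assumptions: the name `float_with_deprecation` and the warning text are hypothetical, nothing like it exists in the standard library, and the thread does not settle on any such API.

```python
import warnings


def float_with_deprecation(text):
    """Hypothetical shim: keep accepting non-ASCII decimal digits,
    but emit a DeprecationWarning whenever they are used."""
    if any(ord(ch) > 127 and ch.isdecimal() for ch in text):
        warnings.warn(
            "non-ASCII digits passed to float() (hypothetical deprecation)",
            DeprecationWarning,
            stacklevel=2,
        )
    # The actual conversion is still left to the builtin.
    return float(text)


# Plain ASCII input takes the quiet path.
print(float_with_deprecation("12.5"))
```

Existing callers keep working during the deprecation period, and the warning gives them a release cycle to notice.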
From ncoghlan at gmail.com Mon Nov 29 01:48:51 2010 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 29 Nov 2010 10:48:51 +1000 Subject: [Python-Dev] constant/enum type in stdlib In-Reply-To: <4CF28310.7070304@voidspace.org.uk> References: <20101121034404.52924F20A@mail.python.org> <4CE9BF4A.1020302@netwok.org> <4CEA89E8.5090107@voidspace.org.uk> <20101122163722.7e96d123@pitrou.net> <4CEA9584.7040301@avl.com> <20101122172440.77d27ed5@pitrou.net> <20101122164654.2109.588145158.divmod.xquotient.165@localhost.localdomain> <4CEBC6BD.9060402@voidspace.org.uk> <4CED0557.9090101@voidspace.org.uk> <4CED4E34.5060400@voidspace.org.uk> <4CF1706E.5030503@g.nevcal.com> <1D372F35-B455-4982-997B-2C54A7D56741@gmail.com> <4CF28310.7070304@voidspace.org.uk> Message-ID: On Mon, Nov 29, 2010 at 2:28 AM, Michael Foord wrote: > For wrapping mutable types I'm tempted to say YAGNI. For the standard > library wrapping integers meets almost all our use-cases except for one > float. (At work we have a decimal constant as it happens.) Perhaps we could > require immutable types for groups but allow arbitrary values for individual > named values? Whereas my opinion is that "immutable vs mutable" is such a blurry distinction that we shouldn't try to make it at the lowest level. Would it be possible to name frozenset instances? Tuples? How about objects that are conceptually immutable, but don't close all the loopholes allowing you to mutate them? (e.g. Decimal, Fraction) Better to design a named value API that doesn't care about mutability, and then leave questions of reverse mappings from values back to names to the grouping API level. At that level, it would be trivial (and natural) to limit names to referencing Hashable values so that a reverse lookup table would be easy to implement. 
For standard library purposes, we could even reasonably provide an int-only grouping API, since the main use case is almost certainly to be in managing translation of OS-level integer constants to named values. Cheers, Nick. -- Nick Coghlan?? |?? ncoghlan at gmail.com?? |?? Brisbane, Australia From ben+python at benfinney.id.au Mon Nov 29 01:55:33 2010 From: ben+python at benfinney.id.au (Ben Finney) Date: Mon, 29 Nov 2010 11:55:33 +1100 Subject: [Python-Dev] Python and the Unicode Character Database References: <20101128214311.092abd35@pitrou.net> <4CF2D4E9.3060607@v.loewis.de> <4CF2DAD7.2000408@egenix.com> <20101129010112.343eaf64@pitrou.net> Message-ID: <8739ql54oq.fsf@benfinney.id.au> Alexander Belopolsky writes: > On Sun, Nov 28, 2010 at 7:01 PM, Antoine Pitrou wrote: > > Perhaps int(), float(), Decimal() and friends could take an optional > > parameter indicating whether non-ascii digits are considered. It > > would then satisfy all parties. > > What parties? I don't think anyone has claimed to actually have used > non-ASCII digits with float(). Rather, it has been pointed out that there is an unknown amount of existing code which does that. You're not going to know how much or how little from this forum. > Of course it is fun that Python can process Bengali numerals, but so > would be allowing Roman numerals. There is a reason why after careful > consideration, PEP 313 was ultimately rejected. Rejecting a proposed *new* capability is a different matter from disabling an *existing* capability which works in existing Python releases. -- \ ?Following fashion and the status quo is easy. Thinking about | `\ your users' lives and creating something practical is much | _o__) harder.? 
?Ryan Singer, 2008-07-09 | Ben Finney From fuzzyman at voidspace.org.uk Mon Nov 29 01:57:27 2010 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Mon, 29 Nov 2010 00:57:27 +0000 Subject: [Python-Dev] constant/enum type in stdlib In-Reply-To: References: <20101121034404.52924F20A@mail.python.org> <4CE9BF4A.1020302@netwok.org> <4CEA89E8.5090107@voidspace.org.uk> <20101122163722.7e96d123@pitrou.net> <4CEA9584.7040301@avl.com> <20101122172440.77d27ed5@pitrou.net> <20101122164654.2109.588145158.divmod.xquotient.165@localhost.localdomain> <4CEBC6BD.9060402@voidspace.org.uk> <4CED0557.9090101@voidspace.org.uk> <4CED4E34.5060400@voidspace.org.uk> <4CF1706E.5030503@g.nevcal.com> <1D372F35-B455-4982-997B-2C54A7D56741@gmail.com> <4CF28310.7070304@voidspace.org.uk> Message-ID: <4CF2FA77.3000604@voidspace.org.uk> On 29/11/2010 00:48, Nick Coghlan wrote: > On Mon, Nov 29, 2010 at 2:28 AM, Michael Foord > wrote: >> For wrapping mutable types I'm tempted to say YAGNI. For the standard >> library wrapping integers meets almost all our use-cases except for one >> float. (At work we have a decimal constant as it happens.) Perhaps we could >> require immutable types for groups but allow arbitrary values for individual >> named values? > Whereas my opinion is that "immutable vs mutable" is such a blurry > distinction that we shouldn't try to make it at the lowest level. > Would it be possible to name frozenset instances? Tuples? How about > objects that are conceptually immutable, but don't close all the > loopholes allowing you to mutate them? (e.g. Decimal, Fraction) > > Better to design a named value API that doesn't care about mutability, > and then leave questions of reverse mappings from values back to names > to the grouping API level. At that level, it would be trivial (and > natural) to limit names to referencing Hashable values so that a > reverse lookup table would be easy to implement. 
> For standard library purposes, we could even reasonably provide an
> int-only grouping API, since the main use case is almost certainly to
> be in managing translation of OS-level integer constants to named values.

Sounds reasonable to me.

Michael

> Cheers,
> Nick.
>

-- 
http://www.voidspace.org.uk/

From tjreedy at udel.edu Mon Nov 29 02:00:56 2010
From: tjreedy at udel.edu (Terry Reedy)
Date: Sun, 28 Nov 2010 20:00:56 -0500
Subject: [Python-Dev] PEP 384 final review
In-Reply-To: <4CF2E86F.5000606@v.loewis.de>
References: <4CF2E86F.5000606@v.loewis.de>
Message-ID:

On 11/28/2010 6:40 PM, "Martin v. Löwis" wrote:
> I have now completed
>
> http://www.python.org/dev/peps/pep-0384/

The current text contains several error messages like:

"System Message: WARNING/2 (pep-0384.txt, line 194)
Bullet list ends without a blank line; unexpected unindent."

Terry Jan Reedy

From steve at pearwood.info Mon Nov 29 01:14:31 2010
From: steve at pearwood.info (Steven D'Aprano)
Date: Mon, 29 Nov 2010 11:14:31 +1100
Subject: [Python-Dev] Python and the Unicode Character Database
In-Reply-To: <4CF2D4E9.3060607@v.loewis.de>
References: <20101128214311.092abd35@pitrou.net> <4CF2D4E9.3060607@v.loewis.de>
Message-ID: <4CF2F067.5020705@pearwood.info>

Martin v. Löwis wrote:
>>>>>> float('١٢٣٤.٥٦')
>>> 1234.56
>
> I think it's a bug that this works.
> The definition of the float builtin says
[...]

I think that's a documentation bug rather than a coding bug. If Python
wishes to limit the digits allowed in numeric *literals* to ASCII 0...9,
that's one thing, but I think that the digits allowed in numeric
*strings* should allow the full range of digits supported by the Unicode
standard.

The former ensures that literals in code are always readable; the latter
allows users to enter numbers in their own number system. How could that
be a bad thing?

-- 
Steven

From rob.cliffe at btinternet.com Sun Nov 28 02:07:08 2010
From: rob.cliffe at btinternet.com (Rob Cliffe)
Date: Sun, 28 Nov 2010 01:07:08 +0000
Subject: [Python-Dev] constant/enum type in stdlib
In-Reply-To: <4CF2C86C.9030505@canterbury.ac.nz>
References: <20101121034404.52924F20A@mail.python.org> <4CE9BF4A.1020302@netwok.org> <4CEA89E8.5090107@voidspace.org.uk> <20101122163722.7e96d123@pitrou.net> <4CEA9584.7040301@avl.com> <20101122172440.77d27ed5@pitrou.net> <20101122164654.2109.588145158.divmod.xquotient.165@localhost.localdomain> <4CEBC6BD.9060402@voidspace.org.uk> <4CED0557.9090101@voidspace.org.uk> <4CEDDC2D.204@canterbury.ac.nz> <4CEE5C1C.9000905@btinternet.com> <4CF2C86C.9030505@canterbury.ac.nz>
Message-ID: <4CF1AB3C.3060408@btinternet.com>

On 28/11/2010 21:23, Greg Ewing wrote:
> Rob Cliffe wrote:
>
>> But couldn't they be presented to the Python programmer as a single
>> type, with the implementation details hidden "under the hood"?
>
> Not in CPython, because tuple items are kept in the same block
> of memory as the object header. Because CPython can't move
> objects, this means that the size of the tuple must be known
> when the object is created.
>
But when a frozen list a.k.a. tuple would be created - either directly,
or by setting a list's mutable flag to False which would really turn it
into a tuple - the size *would* be known. And since the object would now
be immutable, there would be no requirement for its size to change.
(My idea doesn't require additional functionality, just a different API.)

Rob Cliffe

From alexander.belopolsky at gmail.com Mon Nov 29 02:24:24 2010
From: alexander.belopolsky at gmail.com (Alexander Belopolsky)
Date: Sun, 28 Nov 2010 20:24:24 -0500
Subject: [Python-Dev] Python and the Unicode Character Database
In-Reply-To: <8739ql54oq.fsf@benfinney.id.au>
References: <20101128214311.092abd35@pitrou.net> <4CF2D4E9.3060607@v.loewis.de> <4CF2DAD7.2000408@egenix.com> <20101129010112.343eaf64@pitrou.net> <8739ql54oq.fsf@benfinney.id.au>
Message-ID:

On Sun, Nov 28, 2010 at 7:55 PM, Ben Finney wrote:
..
>> Of course it is fun that Python can process Bengali numerals, but so
>> would be allowing Roman numerals. There is a reason why after careful
>> consideration, PEP 313 was ultimately rejected.
>
> Rejecting a proposed *new* capability is a different matter from
> disabling an *existing* capability which works in existing Python
> releases.

Was this capability ever documented? It does not feel like a
deliberate feature. If it was, '\N{ARABIC DECIMAL SEPARATOR}' would be
accepted in arabic-indic notation. It feels more like a CPython
implementation detail similar to say:

>>> int('10') is 10
True
>>> int('10000') is 10000
False

Note that the underlying PyUnicode_EncodeDecimal() function is
described in the unicodeobject.h header file as follows:

/* --- Decimal Encoder ---------------------------------------------------- */

/* Takes a Unicode string holding a decimal value and writes it into
   an output buffer using standard ASCII digit codes.

   ..

   The encoder converts whitespace to ' ', decimal characters to their
   corresponding ASCII digit and all other Latin-1 characters except
   \0 as-is. Characters outside this range (Unicode ordinals 1-256)
   are treated as errors. This includes embedded NULL bytes.

*/

So the support for non-ASCII digits is accidental and should be
treated as a bug.
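
The digit mapping described in the header comment can be imitated in pure Python to see what float() ends up parsing. This is a minimal sketch, assuming a Python 3 interpreter and assuming the digits under discussion are Arabic-Indic (U+0660 to U+0669); the helper `to_ascii_digits` is hypothetical and is not how CPython implements the conversion.

```python
import unicodedata

# Assumption: Arabic-Indic digits one to six around an ASCII full stop.
ARABIC_INDIC = "\u0661\u0662\u0663\u0664.\u0665\u0666"


def to_ascii_digits(text):
    """Hypothetical helper: map each Unicode decimal digit (category Nd)
    to its ASCII counterpart and leave every other character untouched."""
    return "".join(
        str(unicodedata.decimal(ch)) if unicodedata.category(ch) == "Nd" else ch
        for ch in text
    )


print(to_ascii_digits(ARABIC_INDIC))  # the digits, mapped to ASCII
print(float(ARABIC_INDIC))            # float() accepts the digits directly

# U+066B ARABIC DECIMAL SEPARATOR is punctuation, not a digit, so the
# mapping leaves it alone and float() refuses the string.
try:
    float("\u0661\u0662\u0663\u0664\u066b\u0665\u0666")
except ValueError:
    print("ARABIC DECIMAL SEPARATOR rejected")
```

The last case is the asymmetry pointed out above: the digits are accepted, but the separator that would normally accompany them is not.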
From ben+python at benfinney.id.au Mon Nov 29 02:25:56 2010
From: ben+python at benfinney.id.au (Ben Finney)
Date: Mon, 29 Nov 2010 12:25:56 +1100
Subject: [Python-Dev] Python and the Unicode Character Database
References: <20101128214311.092abd35@pitrou.net> <4CF2D4E9.3060607@v.loewis.de> <4CF2F067.5020705@pearwood.info>
Message-ID: <87y68c53a3.fsf@benfinney.id.au>

Steven D'Aprano writes:

> If Python wishes to limit the digits allowed in numeric *literals* to
> ASCII 0...9, that's one thing, but I think that the digits allowed in
> numeric *strings* should allow the full range of digits supported by
> the Unicode standard.

I assume you specifically mean that the numeric class constructors, like
‘int’ and ‘float’, should parse their input string such that any
character Unicode defines as a numeric digit is mapped to the
corresponding digit.

That sounds attractive, but it raises questions about mixed notations,
mixing digits from different writing systems, and probably other
questions I haven't thought of. It's not something to make a simple
yes-or-no-decision on now, IMO.

This sounds best suited to a PEP, which someone who cares enough can
champion in ‘python-ideas’.

-- 
 \     “The manager has personally passed all the water served here.” |
  `\                                               -- hotel, Acapulco |
_o__)                                                                  |
Ben Finney

From steve at pearwood.info Mon Nov 29 00:43:59 2010
From: steve at pearwood.info (Steven D'Aprano)
Date: Mon, 29 Nov 2010 10:43:59 +1100
Subject: [Python-Dev] Python and the Unicode Character Database
In-Reply-To:
References:
Message-ID: <4CF2E93F.70208@pearwood.info>

Alexander Belopolsky wrote:
> Two recently reported issues brought into light the fact that Python
> language definition is closely tied to character properties maintained
> by the Unicode Consortium. [1,2] For example, when Python switches to
> Unicode 6.0.0 (planned for the upcoming 3.2 release), we will gain two
> additional characters that Python can use in identifiers. [3]
[...]

Why do you consider this a problem?
It would be a problem if previously valid identifiers *stopped* being
valid, but not the other way around.

> Of course, the likelihood is low that this change will affect any
> user, but the change in str.isspace() reported in [1] is likely to
> cause some trouble:

Looking at the thread here:

http://bugs.python.org/issue10567

I interpret it as indicating that Python's isspace() has been buggy for
many years, and is only now being fixed. It's always unfortunate when
people rely on bugs, but I'm not sure we should be promising to support
bug-for-bug compatibility from one version to the next :)

> While we have little choice but to follow UCD in defining
> str.isidentifier(), I think Python can promise users more stability in
> what it treats as space or as a digit in its builtins. For example,
> I don't think that supporting
>
>>>> float('١٢٣٤.٥٦')
> 1234.56
>
> is more important than to assure users that once their program
> accepted some text as a number, they can assume that the text is
> ASCII.

Seems like a pretty foolish assumption, if you ask me, pretty much akin
to assuming that if string.isalpha() returns true that string is ASCII.

Support for non-Arabic numerals in number strings goes back to at least
Python 2.4:

[steve at sylar ~]$ python2.4
Python 2.4.6 (#1, Mar 30 2009, 10:08:01)
[GCC 4.1.2 20070925 (Red Hat 4.1.2-27)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> float(u'١٢٣٤.٥٦')
1234.5599999999999

The fact that this is (apparently) only being raised now means that it
isn't actually a problem in real life. I'd even say that it's a feature,
and that if Python didn't support non-Arabic numerals, it should.
-- Steven From alexander.belopolsky at gmail.com Mon Nov 29 03:32:15 2010 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Sun, 28 Nov 2010 21:32:15 -0500 Subject: [Python-Dev] Python and the Unicode Character Database In-Reply-To: <4CF2E93F.70208@pearwood.info> References: <4CF2E93F.70208@pearwood.info> Message-ID: On Sun, Nov 28, 2010 at 6:43 PM, Steven D'Aprano wrote: .. >> is more important than to assure users that once their program >> accepted some text as a number, they can assume that the text is >> ASCII. > > Seems like a pretty foolish assumption, if you ask me, pretty much akin to > assuming that if string.isalpha() returns true that string is ASCII. > It is not to 99.9% of Python users whose code is written for 2.x. Their strings are byte strings and string.isdigit() does imply ASCII even if string.isalpha() does not in many locales. .. > The fact that this is (apparently) only being raised now means that it isn't > actually a problem in real life. I'd even say that it's a feature, and that > if Python didn't support non-Arabic numerals, it should. > I raised this problem because I found a bug that is related to this feature. The bug is also a regression from 2.x. In 2.7: >>> float(u'1234\xa1') .. ValueError: invalid literal for float(): 1234? The last character is lost, but the error message is still meaningful. In 3.x, however: >>> float('1234\xa1') .. ValueError See http://bugs.python.org/issue10557 While investigating this issue I found that by the time the string gets to the number parser (_Py_dg_strtod), all non-ascii characters are dropped by PyUnicode_EncodeDecimal() so it cannot produce meaningful diagnostic. Of course, PyUnicode_EncodeDecimal(), can be fixed by making it pass non-ascii chars through as UTF-8 bytes, but I was wondering if preserving the ability to parse exotic numerals was worth the effort. 
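
For code that wants the ASCII-only guarantee today, together with a diagnostic that still names the offending character, a small wrapper is enough. This is a hedged sketch rather than a proposed fix for issue 10557: the function name `ascii_only_float` and its message format are hypothetical, and it assumes Python 3 string semantics.

```python
def ascii_only_float(text):
    """Hypothetical wrapper: refuse non-ASCII input before float() sees it,
    so the error message can still report the offending character."""
    for index, ch in enumerate(text):
        if ord(ch) > 127:
            # %a renders the character with backslash escapes.
            raise ValueError(
                "invalid character %a at index %d in %a" % (ch, index, text)
            )
    return float(text)


print(ascii_only_float("1234.56"))

try:
    ascii_only_float("1234\xa1")
except ValueError as exc:
    print(exc)
```

Doing the check before the conversion sidesteps the question of what the decimal encoder passes through, at the price of rejecting every non-ASCII digit.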
From rrr at ronadam.com Mon Nov 29 04:03:39 2010 From: rrr at ronadam.com (Ron Adam) Date: Sun, 28 Nov 2010 21:03:39 -0600 Subject: [Python-Dev] constant/enum type in stdlib In-Reply-To: References: <20101121034404.52924F20A@mail.python.org> <4CE9BF4A.1020302@netwok.org> <4CEA89E8.5090107@voidspace.org.uk> <20101122163722.7e96d123@pitrou.net> <4CEA9584.7040301@avl.com> <20101122172440.77d27ed5@pitrou.net> <20101122164654.2109.588145158.divmod.xquotient.165@localhost.localdomain> <4CEBC6BD.9060402@voidspace.org.uk> <4CED0557.9090101@voidspace.org.uk> <4CED4E34.5060400@voidspace.org.uk> Message-ID: <4CF3180B.1060306@ronadam.com> On 11/27/2010 04:51 AM, Nick Coghlan wrote: > x = named_value("FOO", 1) > y = named_value("BAR", "Hello World!") > z = named_value("BAZ", dict(a=1, b=2, c=3)) > > print(x, y, z, sep="\n") > print("\n".join(map(repr, (x, y, z)))) > print("\n".join(map(str, map(type, (x, y, z))))) > > set_named_values(globals(), foo=x._raw(), bar=y._raw(), baz=z._raw()) > print("\n".join(map(repr, (foo, bar, baz)))) > print(type(x) is type(foo), type(y) is type(bar), type(z) is type(baz)) > > ========================================================================== > > # Session output for the last 6 lines >>>> >>> print(x, y, z, sep="\n") > 1 > Hello World! > {'a': 1, 'c': 3, 'b': 2} > >>>> >>> print("\n".join(map(repr, (x, y, z)))) > FOO=1 > BAR='Hello World!' > BAZ={'a': 1, 'c': 3, 'b': 2} This reminds me of python annotations. Which seem like an already forgotten new feature. Maybe they can help with this? It does associate additional info to names and creates a nice dictionary to reference. >>> def name_values( FOO: 1, BAR: "Hello World!", BAZ: dict(a=1, b=2, c=3) ): ... return FOO, BAR, BAZ ... 
>>> name_values(1, 2, 3)
(1, 2, 3)
>>> name_values.__annotations__
{'BAR': 'Hello World!', 'FOO': 1, 'BAZ': {'a': 1, 'c': 3, 'b': 2}}

Cheers,
Ron

From stephen at xemacs.org Mon Nov 29 04:39:32 2010
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Mon, 29 Nov 2010 12:39:32 +0900
Subject: [Python-Dev] Python and the Unicode Character Database
In-Reply-To: <4CF2DAD7.2000408@egenix.com>
References: <20101128214311.092abd35@pitrou.net> <4CF2D4E9.3060607@v.loewis.de> <4CF2DAD7.2000408@egenix.com>
Message-ID: <87pqto6bnv.fsf@uwakimon.sk.tsukuba.ac.jp>

M.-A. Lemburg writes:

> It is not uncommon for Asians and other non-Latin script users to
> use their own native script symbols for numbers.

Japanese don't, in computational or scientific work where float()
would be used. Japanese numerals are used for dates and for certain
felicitous ages (and even there so-called "Arabic" numerals are
perfectly acceptable). Otherwise, it's all ASCII (although it might be
"full-width" compatibility variants).

> Please also remember that Python3 now allows Unicode names for
> identifiers for much the same reasons.

I don't think it's the same reason, not for Japanese, anyway.

I agree that Python should make it easy for the programmer to get
numerical values of native numeric strings, but it's not at all clear
to me that there is any point to having float() recognize them by
default.

From ncoghlan at gmail.com Mon Nov 29 04:58:05 2010
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Mon, 29 Nov 2010 13:58:05 +1000
Subject: [Python-Dev] Python and the Unicode Character Database
In-Reply-To: <87pqto6bnv.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <20101128214311.092abd35@pitrou.net> <4CF2D4E9.3060607@v.loewis.de> <4CF2DAD7.2000408@egenix.com> <87pqto6bnv.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID:

On Mon, Nov 29, 2010 at 1:39 PM, Stephen J.
Turnbull wrote: > I agree that Python should make it easy for the programmer to get > numerical values of native numeric strings, but it's not at all clear > to me that there is any point to having float() recognize them by > default. Indeed, as someone else suggested earlier in the thread, supporting non-ASCII digits sounds more like a job for the locale module than for the builtin types. Deprecating non-ASCII support in the latter, while ensuring it is properly supported in the former sounds like a better way forward than maintaining the status quo (starting in 3.3 though, with the first beta just around the corner, we don't want to be monkeying with this in 3.2) Cheers, Nick. -- Nick Coghlan?? |?? ncoghlan at gmail.com?? |?? Brisbane, Australia From martin at v.loewis.de Mon Nov 29 08:18:59 2010 From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=) Date: Mon, 29 Nov 2010 08:18:59 +0100 Subject: [Python-Dev] Python and the Unicode Character Database In-Reply-To: <20101129010112.343eaf64@pitrou.net> References: <20101128214311.092abd35@pitrou.net> <4CF2D4E9.3060607@v.loewis.de> <4CF2DAD7.2000408@egenix.com> <20101129010112.343eaf64@pitrou.net> Message-ID: <4CF353E3.4020706@v.loewis.de> > Perhaps int(), float(), Decimal() and friends could take an optional > parameter indicating whether non-ascii digits are considered. It would > then satisfy all parties. Not really. I still would want to see what the actual requirement is: i.e. do any users actually have the desire to have these digits accepted, yet the alternative decimal points rejected? 
Regards, Martin From martin at v.loewis.de Mon Nov 29 08:22:46 2010 From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=) Date: Mon, 29 Nov 2010 08:22:46 +0100 Subject: [Python-Dev] Python and the Unicode Character Database In-Reply-To: <4CF2F067.5020705@pearwood.info> References: <20101128214311.092abd35@pitrou.net> <4CF2D4E9.3060607@v.loewis.de> <4CF2F067.5020705@pearwood.info> Message-ID: <4CF354C6.9020302@v.loewis.de> > The former ensures that literals in code are always readable; the later > allows users to enter numbers in their own number system. How could that > be a bad thing? It's YAGNI, feature bloat. It gives the illusion of supporting something that actually isn't supported very well (namely, parsing local number strings). I claim that there is no meaningful application of this feature. Regards, Martin From martin at v.loewis.de Mon Nov 29 08:25:19 2010 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Mon, 29 Nov 2010 08:25:19 +0100 Subject: [Python-Dev] Python and the Unicode Character Database In-Reply-To: <1290990580.8242.2.camel@localhost.localdomain> References: <20101128214311.092abd35@pitrou.net> <4CF2D4E9.3060607@v.loewis.de> <4CF2DAD7.2000408@egenix.com> <20101129010112.343eaf64@pitrou.net> <1290990580.8242.2.camel@localhost.localdomain> Message-ID: <4CF3555F.9040106@v.loewis.de> > That's mostly irrelevant. This feature exists and someone, somewhere, > may be using it. We normally don't remove stuff without deprecation. Sure: it should be deprecated before being removed. Regards, Martin From amauryfa at gmail.com Mon Nov 29 08:55:13 2010 From: amauryfa at gmail.com (Amaury Forgeot d'Arc) Date: Mon, 29 Nov 2010 08:55:13 +0100 Subject: [Python-Dev] PEP 384 final review In-Reply-To: <4CF2E86F.5000606@v.loewis.de> References: <4CF2E86F.5000606@v.loewis.de> Message-ID: 2010/11/29 "Martin v. L?wis" > I have now completed > > http://www.python.org/dev/peps/pep-0384/ was structseq.h considered? 
IMO it could be made PEP384-compliant with two additions that would replace two non-compliant functions:

- A new function to create types, since PyStructSequence_InitType is supposed to work on an uninitialized static variable:
  PyTypeObject *PyStructSequence_NewType(PyStructSequence_Desc *desc);
- PyStructSequence_SetItem(), similar to the macro PyStructSequence_SET_ITEM; the PyStructSequence structure should be hidden.

--
Amaury Forgeot d'Arc

From martin at v.loewis.de Mon Nov 29 09:09:14 2010
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Mon, 29 Nov 2010 09:09:14 +0100
Subject: [Python-Dev] PEP 384 final review
In-Reply-To:
References: <4CF2E86F.5000606@v.loewis.de>
Message-ID: <4CF35FAA.50600@v.loewis.de>

> I have now completed
>
> http://www.python.org/dev/peps/pep-0384/
>
>
> was structseq.h considered?

No, it wasn't - unfortunately, it still doesn't get included when including Python.h. I'll add it.

> IMO it could be made PEP384-compliant with two additions that would
> replace two non-compliant functions:
>
> - A new function to create types, since PyStructSequence_InitType
> is supposed to work on an uninitialized static variable:
> PyTypeObject *PyStructSequence_NewType(PyStructSequence_Desc *desc);
> - PyStructSequence_SetItem(), similar to the
> macro PyStructSequence_SET_ITEM; the PyStructSequence structure should
> be hidden.

Sounds good.

Regards,
Martin

From mal at egenix.com Mon Nov 29 09:35:05 2010
From: mal at egenix.com (M.-A. Lemburg)
Date: Mon, 29 Nov 2010 09:35:05 +0100
Subject: [Python-Dev] Python and the Unicode Character Database
In-Reply-To:
References: <20101128214311.092abd35@pitrou.net> <4CF2D4E9.3060607@v.loewis.de> <4CF2DAD7.2000408@egenix.com>
Message-ID: <4CF365B9.5040303@egenix.com>

Alexander Belopolsky wrote:
> On Sun, Nov 28, 2010 at 5:42 PM, M.-A. Lemburg wrote:
> ..
>> I don't see why the language spec should limit the wealth of number >> formats supported by float(). >> > > The Language Spec (whatever it is) should not, but hopefully the > Library Reference should. If you follow > http://docs.python.org/dev/py3k/library/functions.html#float link and > the references therein, you'll end up with ... the language spec again :-) > digit ::= "0"..."9" > > http://docs.python.org/dev/py3k/reference/lexical_analysis.html#grammar-token-digit That's obviously a bug in the documentation, since the Python 2.7 docs don't mention any such relationship to the language spec: http://docs.python.org/library/functions.html#float -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Nov 29 2010) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ ::: Try our new mxODBC.Connect Python Database Interface for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From g.brandl at gmx.net Mon Nov 29 09:36:56 2010 From: g.brandl at gmx.net (Georg Brandl) Date: Mon, 29 Nov 2010 09:36:56 +0100 Subject: [Python-Dev] PEP 384 final review In-Reply-To: <4CF35FAA.50600@v.loewis.de> References: <4CF2E86F.5000606@v.loewis.de> <4CF35FAA.50600@v.loewis.de> Message-ID: Am 29.11.2010 09:09, schrieb "Martin v. L?wis": >> I have now completed >> >> http://www.python.org/dev/peps/pep-0384/ >> >> >> was structseq.h considered? > > No, it wasn't - unfortunately, it still doesn't get included when > including Python.h. I'll add it. Would 3.2 be a good time to finally include it? 
All of its macros and declarations are named PyStructSequence*, so there shouldn't be a name clash concern.

Georg

From g.brandl at gmx.net Mon Nov 29 09:52:19 2010
From: g.brandl at gmx.net (Georg Brandl)
Date: Mon, 29 Nov 2010 09:52:19 +0100
Subject: [Python-Dev] [Preview] Comments and change proposals on documentation
In-Reply-To: <4CF19B3C.2000308@pearwood.info>
References: <4CF18220.7000202@pearwood.info> <4CF19B3C.2000308@pearwood.info>
Message-ID:

On 28.11.2010 00:58, Steven D'Aprano wrote:
> Georg Brandl wrote:
>> On 27.11.2010 23:11, Steven D'Aprano wrote:
>
>>> I wasn't able to find a comment bubble that contained anything, so I
>>> don't know what sort of information you expect them to contain -- every
>>> one I tried said "0 comments".
>>
>> Maybe you should have tried the page I recommended as a demo, and where Nick
>> made his comments? :)
>
> Aha! I never would have guessed that the bubbles are clickable -- I
> thought you just moused-over them and they showed static comments put
> there by the developers, part of the documentation itself. I didn't
> realise that it was for users to add spam^W comments to the page. With
> that perspective, I need to rethink.
>
> Yes, I failed to fully read the instructions you sent, or understand
> them. That's what users do -- they don't read your instructions, and
> they misunderstand them. If your UI isn't easily discoverable, users
> will not be able to use it, and will be frustrated and annoyed. The user
> is always right, even when they're doing it wrong *wink*

That's right, of course. I have really come to the conclusion that having a text link that "looks like" a link, i.e. is underlined, will give a better UI experience (since we cannot put notes "click bubble to comment" everywhere).

>>> But it seems to me that comments are superfluous, if not actively harmful:
>>
>> (I've not read anything about harmful below. Was that just FUD?)
> > Lowering accessibility to parts of the documentation is what I was > talking about when I said "actively harmful". But now that I have better > understanding of what the comment system is actually for, I have to rethink. Thanks! Georg From doko at ubuntu.com Mon Nov 29 11:24:22 2010 From: doko at ubuntu.com (Matthias Klose) Date: Mon, 29 Nov 2010 11:24:22 +0100 Subject: [Python-Dev] PEP 384 final review In-Reply-To: <4CF2E86F.5000606@v.loewis.de> References: <4CF2E86F.5000606@v.loewis.de> Message-ID: <4CF37F56.9030808@ubuntu.com> On 29.11.2010 00:40, "Martin v. L?wis" wrote: > I have now completed > > http://www.python.org/dev/peps/pep-0384/ > > Benjamin has volunteered to rule on this PEP. > > Please comment with any changes you want to see, or speak in > favor or against this PEP. I looked at a diff with r84330 from the py3k branch. Extensions built with Py_LIMITED_API have the python version encoded in it's name. Which abi name should be used for these extensions? - The m and u modifiers in the abi name are complimentary (?) - debug builds and Py_LIMITED_API are incompatible (?) and therefore the current name should be used? - For posix systems the implementation is currently part of the abi name, are Py_LIMITED_API extensions supposed to be compatible with e.g. PyPy? Should the LIMITED_API abi name include the implementation string? - Should the distutils support for LIMITED_API be part of the pep, or be implemented later? In favour of the pep. Matthias From mal at egenix.com Mon Nov 29 12:02:57 2010 From: mal at egenix.com (M.-A. Lemburg) Date: Mon, 29 Nov 2010 12:02:57 +0100 Subject: [Python-Dev] Python and the Unicode Character Database In-Reply-To: References: <20101128214311.092abd35@pitrou.net> <4CF2D4E9.3060607@v.loewis.de> <4CF2DAD7.2000408@egenix.com> <87pqto6bnv.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <4CF38861.5090309@egenix.com> Nick Coghlan wrote: > On Mon, Nov 29, 2010 at 1:39 PM, Stephen J. 
Turnbull wrote: >> I agree that Python should make it easy for the programmer to get >> numerical values of native numeric strings, but it's not at all clear >> to me that there is any point to having float() recognize them by >> default. > > Indeed, as someone else suggested earlier in the thread, supporting > non-ASCII digits sounds more like a job for the locale module than for > the builtin types. > > Deprecating non-ASCII support in the latter, while ensuring it is > properly supported in the former sounds like a better way forward than > maintaining the status quo (starting in 3.3 though, with the first > beta just around the corner, we don't want to be monkeying with this > in 3.2) Since when do we only support certain Unicode features in specific locales ? If we would go down that road, we would also have to disable other Unicode features based on locale, e.g. whether to apply non-ASCII case mappings, what to consider whitespace, etc. We don't do that for a good reason: Unicode is supposed to be universal and not limited to a single locale. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Nov 29 2010) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ ::: Try our new mxODBC.Connect Python Database Interface for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. 
Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From sylvain.thenault at logilab.fr Mon Nov 29 12:53:11 2010 From: sylvain.thenault at logilab.fr (Sylvain =?utf-8?B?VGjDqW5hdWx0?=) Date: Mon, 29 Nov 2010 12:53:11 +0100 Subject: [Python-Dev] python3k : imp.find_module raises SyntaxError In-Reply-To: <4CEE9B72.1070002@ronadam.com> References: <201011251530.23947.emile.anclin@logilab> <4CEE9B72.1070002@ronadam.com> Message-ID: <20101129115311.GD18888@lupus.logilab.fr> On 25 novembre 11:22, Ron Adam wrote: > On 11/25/2010 08:30 AM, Emile Anclin wrote: > > > >hello, > > > >working on Pylint, we have a lot of voluntary corrupted files to test > >Pylint behavior; for instance > > > >$ cat /home/emile/var/pylint/test/input/func_unknown_encoding.py > ># -*- coding: IBO-8859-1 -*- > >""" check correct unknown encoding declaration > >""" > > > >__revision__ = '????' > > > > > >and we try to find that module : > >find_module('func_unknown_encoding', None). But python3 raises SyntaxError > >in that case ; it didn't raise SyntaxError on python2 nor does so on our > >func_nonascii_noencoding and func_wrong_encoding modules (with obvious > >names) > > > >Python 3.2a2 (r32a2:84522, Sep 14 2010, 15:22:36) > >[GCC 4.3.4] on linux2 > >Type "help", "copyright", "credits" or "license" for more information. > >>>>from imp import find_module > >>>>find_module('func_unknown_encoding', None) > >Traceback (most recent call last): > > File " ", line 1, in > >SyntaxError: encoding problem: with BOM > > I don't think there is a clear reason by design. Also try importing > the same modules directly and noting the differences in the errors > you get. IMO the point is that we can consider as a bug the fact that find_module tries to somewhat read the content of the file, no? Though it seems to only doing this for encoding detection or like since find_module doesn't choke on a module containing another kind of syntax error. 
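For readers trying this on a current interpreter, the distinction under discussion (locating a module versus actually reading its source) can be sketched as below. This is an illustration under assumptions, not the `imp.find_module` code path from the report: `imp` has since been removed from the standard library, so `importlib.util.find_spec` stands in for the "find" step, and the module name `func_unknown_encoding_demo` and the helper `locate_then_import` are made up for the example.

```python
import importlib
import importlib.util
import os
import sys
import tempfile

# Hypothetical stand-in for Pylint's func_unknown_encoding.py test input:
# the coding cookie names an encoding that does not exist.
SOURCE = (
    b"# -*- coding: IBO-8859-1 -*-\n"
    b'""" check correct unknown encoding declaration\n"""\n'
    b"__revision__ = 1\n"
)


def locate_then_import(name="func_unknown_encoding_demo"):
    """Return (found, import_error) for a module with a bogus coding cookie."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, name + ".py")
        with open(path, "wb") as handle:
            handle.write(SOURCE)
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            # Step 1: locate the module only; the source is not decoded here.
            spec = importlib.util.find_spec(name)
            found = spec is not None
            # Step 2: a real import has to decode and compile the source.
            try:
                importlib.import_module(name)
                import_error = None
            except SyntaxError as exc:
                import_error = exc
        finally:
            sys.path.remove(tmp)
            sys.modules.pop(name, None)
    return found, import_error


if __name__ == "__main__":
    found, import_error = locate_then_import()
    print("located:", found)
    print("import failed with:", type(import_error).__name__)
```

With this split, the find step is expected to succeed and only the import step should complain about the encoding declaration, which is the separation of concerns being asked for here.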
So the question is, should we deal with this in pylint/astng, or can we expect this to be fixed at some point?

--
Sylvain Thénault
LOGILAB, Paris (France)
Python, Debian and Agile methods training: http://www.logilab.fr/formations
Custom software development: http://www.logilab.fr/services
CubicWeb, the semantic web framework: http://www.cubicweb.org

From ncoghlan at gmail.com Mon Nov 29 13:43:26 2010
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Mon, 29 Nov 2010 22:43:26 +1000
Subject: [Python-Dev] Python and the Unicode Character Database
In-Reply-To: <4CF38861.5090309@egenix.com>
References: <20101128214311.092abd35@pitrou.net> <4CF2D4E9.3060607@v.loewis.de> <4CF2DAD7.2000408@egenix.com> <87pqto6bnv.fsf@uwakimon.sk.tsukuba.ac.jp> <4CF38861.5090309@egenix.com>
Message-ID:

On Mon, Nov 29, 2010 at 9:02 PM, M.-A. Lemburg wrote:
> If we would go down that road, we would also have to disable other
> Unicode features based on locale, e.g. whether to apply non-ASCII
> case mappings, what to consider whitespace, etc.
>
> We don't do that for a good reason: Unicode is supposed to be
> universal and not limited to a single locale.

Because parsing numbers is about more than just the characters used for the individual digits. There are additional semantics associated with digit ordering (for any number) and decimal separators and exponential notation (for floating point numbers) and those vary by locale. We deliberately chose to make the builtin numeric parsers unaware of all of those things, and assuming that we can simply parse other digits as if they were their ASCII equivalents and otherwise assume a C locale seems questionable.

If the existing semantics can be adequately defined, documented and defended, then retaining them would be fine. However, the language reference needs to define the behaviour properly so that other implementations know what they need to support and what can be chalked up as being just an implementation accident of CPython.
(As a point in the plus column, both decimal.Decimal and fractions.Fraction were able to handle the '١٢٣٤.٥٦' example in a manner consistent with the int and float handling)

Regards,
Nick.

--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From merwok at netwok.org Mon Nov 29 14:14:30 2010
From: merwok at netwok.org (Éric Araujo)
Date: Mon, 29 Nov 2010 14:14:30 +0100
Subject: [Python-Dev] PEP 384 final review
In-Reply-To: <4CF2E86F.5000606@v.loewis.de>
References: <4CF2E86F.5000606@v.loewis.de>
Message-ID: <4CF3A736.4050003@netwok.org>

Hello,

> Please comment with any changes you want to see, or speak in
> favor or against this PEP.

How to get a diff between py3k and this branch?

Regards

From doko at ubuntu.com Mon Nov 29 14:37:33 2010
From: doko at ubuntu.com (Matthias Klose)
Date: Mon, 29 Nov 2010 14:37:33 +0100
Subject: [Python-Dev] PEP 384 final review
In-Reply-To: <4CF3A736.4050003@netwok.org>
References: <4CF2E86F.5000606@v.loewis.de> <4CF3A736.4050003@netwok.org>
Message-ID: <4CF3AC9D.20309@ubuntu.com>

On 29.11.2010 14:14, Éric Araujo wrote:
> Hello,
>
>> Please comment with any changes you want to see, or speak in
>> favor or against this PEP.
>
> How to get a diff between py3k and this branch?

I used

svn diff svn://svn.python.org/python/branches/py3k@84330 svn://svn.python.org/python/branches/pep-0384

From ncoghlan at gmail.com Mon Nov 29 14:58:50 2010
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Mon, 29 Nov 2010 23:58:50 +1000
Subject: [Python-Dev] PEP 384 final review
In-Reply-To: <4CF3AC9D.20309@ubuntu.com>
References: <4CF2E86F.5000606@v.loewis.de> <4CF3A736.4050003@netwok.org> <4CF3AC9D.20309@ubuntu.com>
Message-ID:

On Mon, Nov 29, 2010 at 11:37 PM, Matthias Klose wrote:
> On 29.11.2010 14:14, Éric Araujo wrote:
>>
>> Hello,
>>
>>> Please comment with any changes you want to see, or speak in
>>> favor or against this PEP.
>>
>> How to get a diff between py3k and this branch?
>
> I used
> svn diff svn://svn.python.org/python/branches/py3k@84330
> svn://svn.python.org/python/branches/pep-0384

I had to use the full read/write svn+ssh:pythondev at svn.python.org repository URLs to get it to give me a diff. The http read only URLs didn't work (no diff returned, just "svn: OPTIONS of 'http://svn.python.org/python/branches/pep-0384': 200 OK (http://svn.python.org)"), and the bare svn protocol isn't enabled on svn.python.org.

Since directory diffs don't appear to be enabled on the svn.python.org ViewVC instance, it would probably be a good idea to put this up on Rietveld so people can more easily see the details of what has been changed on the branch to date. If nobody beats me to it, I'll put one up in the morning.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From ncoghlan at gmail.com Mon Nov 29 15:07:32 2010
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Tue, 30 Nov 2010 00:07:32 +1000
Subject: [Python-Dev] PEP 384 final review
In-Reply-To: <4CF2E86F.5000606@v.loewis.de>
References: <4CF2E86F.5000606@v.loewis.de>
Message-ID:

On Mon, Nov 29, 2010 at 9:40 AM, "Martin v. Löwis" wrote:
> I have now completed
>
> http://www.python.org/dev/peps/pep-0384/
>
> Benjamin has volunteered to rule on this PEP.
>
> Please comment with any changes you want to see, or speak in
> favor or against this PEP.

This is probably an issue independent of the PEP, but there appear to be a *lot* of exposed typedefs for various type slots and other function signatures that don't start with the Py prefix (i.e. getter, setter, unaryfunc and friends). Python.h shouldn't be leaking unprefixed names like that. We certainly shouldn't be enshrining them in the stable ABI without adding prefixes first.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From solipsis at pitrou.net Mon Nov 29 15:19:07 2010
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Mon, 29 Nov 2010 15:19:07 +0100
Subject: [Python-Dev] Python and the Unicode Character Database
References: <20101128214311.092abd35@pitrou.net> <4CF2D4E9.3060607@v.loewis.de> <4CF2DAD7.2000408@egenix.com> <87pqto6bnv.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: <20101129151907.64e3f6ae@pitrou.net>

On Mon, 29 Nov 2010 13:58:05 +1000 Nick Coghlan wrote:
> On Mon, Nov 29, 2010 at 1:39 PM, Stephen J. Turnbull wrote:
> > I agree that Python should make it easy for the programmer to get
> > numerical values of native numeric strings, but it's not at all clear
> > to me that there is any point to having float() recognize them by
> > default.
>
> Indeed, as someone else suggested earlier in the thread, supporting
> non-ASCII digits sounds more like a job for the locale module than for
> the builtin types.

Not sure, really. For example, "\d" in a regular expression will match all Unicode digits, unless you pass the re.ASCII flag.

The C locale mechanism generally does a poor job of supporting what MS seems to call "culture-specific" characteristics.

Regards

Antoine.

From solipsis at pitrou.net Mon Nov 29 15:22:24 2010
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Mon, 29 Nov 2010 15:22:24 +0100
Subject: [Python-Dev] Python and the Unicode Character Database
References: <4CF2E93F.70208@pearwood.info>
Message-ID: <20101129152224.7c253a8c@pitrou.net>

On Sun, 28 Nov 2010 21:32:15 -0500 Alexander Belopolsky wrote:
> On Sun, Nov 28, 2010 at 6:43 PM, Steven D'Aprano wrote:
> ..
> >> is more important than to assure users that once their program
> >> accepted some text as a number, they can assume that the text is
> >> ASCII.
> >
> > Seems like a pretty foolish assumption, if you ask me, pretty much akin to
> > assuming that if string.isalpha() returns true that string is ASCII.
> >
>
> It is not to 99.9% of Python users whose code is written for 2.x.
> Their strings are byte strings and string.isdigit() does imply ASCII
> even if string.isalpha() does not in many locales.

We are not talking about string.isdigit(), we are talking about the float() constructor when given a unicode string. Constructing a float from a unicode string is certainly a common thing, even in 2.x.

Regards

Antoine.

From foom at fuhm.net Mon Nov 29 15:15:12 2010
From: foom at fuhm.net (James Y Knight)
Date: Mon, 29 Nov 2010 09:15:12 -0500
Subject: [Python-Dev] PEP 384 final review
In-Reply-To:
References: <4CF2E86F.5000606@v.loewis.de> <4CF3A736.4050003@netwok.org> <4CF3AC9D.20309@ubuntu.com>
Message-ID: <28693E2E-A60E-4F83-BF55-DBD6EAD88353@fuhm.net>

On Nov 29, 2010, at 8:58 AM, Nick Coghlan wrote:
> The http read only URLs
> didn't work (no diff returned, just "svn: OPTIONS of
> 'http://svn.python.org/python/branches/pep-0384': 200 OK
> (http://svn.python.org)"),

That was the wrong url: you should've used http://svn.python.org/projects/python/branches/pep-0384

James

From mal at egenix.com Mon Nov 29 16:19:19 2010
From: mal at egenix.com (M.-A. Lemburg)
Date: Mon, 29 Nov 2010 16:19:19 +0100
Subject: [Python-Dev] Python and the Unicode Character Database
In-Reply-To:
References: <20101128214311.092abd35@pitrou.net> <4CF2D4E9.3060607@v.loewis.de> <4CF2DAD7.2000408@egenix.com> <87pqto6bnv.fsf@uwakimon.sk.tsukuba.ac.jp> <4CF38861.5090309@egenix.com>
Message-ID: <4CF3C477.1020007@egenix.com>

Nick Coghlan wrote:
> On Mon, Nov 29, 2010 at 9:02 PM, M.-A. Lemburg wrote:
>> If we would go down that road, we would also have to disable other
>> Unicode features based on locale, e.g. whether to apply non-ASCII
>> case mappings, what to consider whitespace, etc.
>>
>> We don't do that for a good reason: Unicode is supposed to be
>> universal and not limited to a single locale.
> > Because parsing numbers is about more than just the characters used > for the individual digits. There are additional semantics associated > with digit ordering (for any number) and decimal separators and > exponential notation (for floating point numbers) and those vary by > locale. We deliberately chose to make the builtin numeric parsers > unaware of all of those things, and assuming that we can simply parse > other digits as if they were their ASCII equivalents and otherwise > assume a C locale seems questionable. Sure, and those additional semantics are locale dependent, even between ASCII-only locales. However, that does not apply to the basic building blocks, the decimal digits themselves. > If the existing semantics can be adequately defined, documented and > defended, then retaining them would be fine. However, the language > reference needs to define the behaviour properly so that other > implementations know what they need to support and what can be chalked > up as being just an implementation accident of CPython. (As a point in > the plus column, both decimal.Decimal and fractions.Fraction were able > to handle the '????.??' example in a manner consistent with the int > and float handling) The support is built into the C API, so there's not really much surprise there. Regarding documentation, we'd just have to add that numbers may be made up of an Unicode code point in the category "Nd". See http://www.unicode.org/versions/Unicode5.2.0/ch04.pdf, section 4.6 for details.... """ Decimal digits form a large subcategory of numbers consisting of those digits that can be used to form decimal-radix numbers. They include script-specific digits, but exclude char- acters such as Roman numerals and Greek acrophonic numerals. (Note that <1, 5> = 15 = fifteen, but = IV = four.) Decimal digits also exclude the compatibility subscript or superscript digits to prevent simplistic parsers from misinterpreting their values in context. 
""" int(), float() and long() (in Python2) are such simplistic parsers. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Nov 29 2010) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ ::: Try our new mxODBC.Connect Python Database Interface for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From ziade.tarek at gmail.com Mon Nov 29 16:59:42 2010 From: ziade.tarek at gmail.com (=?ISO-8859-1?Q?Tarek_Ziad=E9?=) Date: Mon, 29 Nov 2010 16:59:42 +0100 Subject: [Python-Dev] PEP 384 final review In-Reply-To: <4CF37F56.9030808@ubuntu.com> References: <4CF2E86F.5000606@v.loewis.de> <4CF37F56.9030808@ubuntu.com> Message-ID: On Mon, Nov 29, 2010 at 11:24 AM, Matthias Klose wrote: > On 29.11.2010 00:40, "Martin v. L?wis" wrote: >> >> I have now completed >> >> http://www.python.org/dev/peps/pep-0384/ >> >> Benjamin has volunteered to rule on this PEP. >> >> Please comment with any changes you want to see, or speak in >> favor or against this PEP. > > I looked at a diff with r84330 from the py3k branch. > > Extensions built with Py_LIMITED_API have the python version encoded in it's > name. ?Which abi name should be used for these extensions? >.. > ?- Should the distutils support for LIMITED_API be part of the pep, or > ? be implemented later? In any case, it has to be implemented in Distutils2, not in Distutils. Distutils is frozen and just in maintenance mode. Once Distutils2 final is released (it's currently in alpha), it will be installable from 2.4 to 3.x and can provide this feature. 
For Python itself we can backport the feature in its setup.py, until Distutils2 is back in the stdlib

> In favour of the pep.

+1

>
> Matthias
>

--
Tarek Ziadé | http://ziade.org

From alexander.belopolsky at gmail.com Mon Nov 29 17:07:03 2010
From: alexander.belopolsky at gmail.com (Alexander Belopolsky)
Date: Mon, 29 Nov 2010 11:07:03 -0500
Subject: [Python-Dev] [Preview] Comments and change proposals on documentation
In-Reply-To:
References: <4CF18220.7000202@pearwood.info> <4CF19B3C.2000308@pearwood.info>
Message-ID:

On Mon, Nov 29, 2010 at 3:52 AM, Georg Brandl wrote:
..
>> Yes, I failed to fully read the instructions you sent, or understand
>> them. That's what users do -- they don't read your instructions, and
>> they misunderstand them. If your UI isn't easily discoverable, users
>> will not be able to use it, and will be frustrated and annoyed. The user
>> is always right, even when they're doing it wrong *wink*
>
> That's right, of course. I really come to the conclusion that having a text
> link that "looks like" a link, i.e. is underlined, will have a better UI
> experience (since we cannot put notes "click bubble to comment" everywhere).
>

Please don't make comment bubbles more visible. Doing so will only decrease signal to noise ratio. I think a little bit of a learning barrier is a good thing: it will keep down the number of "Bart was here" comments.

From alexander.belopolsky at gmail.com Mon Nov 29 19:09:58 2010
From: alexander.belopolsky at gmail.com (Alexander Belopolsky)
Date: Mon, 29 Nov 2010 13:09:58 -0500
Subject: [Python-Dev] Python and the Unicode Character Database
In-Reply-To: <4CF354C6.9020302@v.loewis.de>
References: <20101128214311.092abd35@pitrou.net> <4CF2D4E9.3060607@v.loewis.de> <4CF2F067.5020705@pearwood.info> <4CF354C6.9020302@v.loewis.de>
Message-ID:

On Mon, Nov 29, 2010 at 2:22 AM, "Martin v.
Löwis" wrote:
>> The former ensures that literals in code are always readable; the latter
>> allows users to enter numbers in their own number system. How could that
>> be a bad thing?
>
> It's YAGNI, feature bloat. It gives the illusion of supporting something
> that actually isn't supported very well (namely, parsing local number
> strings). I claim that there is no meaningful application
> of this feature.
>

Speaking of YAGNI, does anyone want to defend

>>> complex('١٢٣٤.٥٦j')
1234.56j

?

Especially given that we reject complex('1234.56i'):

http://bugs.python.org/issue10562

From solipsis at pitrou.net Mon Nov 29 19:33:02 2010
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Mon, 29 Nov 2010 19:33:02 +0100
Subject: [Python-Dev] Python and the Unicode Character Database
References: <20101128214311.092abd35@pitrou.net> <4CF2D4E9.3060607@v.loewis.de> <4CF2F067.5020705@pearwood.info> <4CF354C6.9020302@v.loewis.de>
Message-ID: <20101129193302.115dbcd5@pitrou.net>

On Mon, 29 Nov 2010 08:22:46 +0100 "Martin v. Löwis" wrote:
> > The former ensures that literals in code are always readable; the latter
> > allows users to enter numbers in their own number system. How could that
> > be a bad thing?
>
> It's YAGNI, feature bloat. It gives the illusion of supporting something
> that actually isn't supported very well (namely, parsing local number
> strings). I claim that there is no meaningful application
> of this feature.

Still, if it's not detrimental and if it's not difficult to support, then why do you care? You aren't even maintaining that part of the code.

I don't think "remove feature bloat" is part of our development goals or practices. Given the diversity of our user base, such removal should be done carefully and only for serious reasons.

Regards

Antoine.

From mal at egenix.com Mon Nov 29 19:59:57 2010
From: mal at egenix.com (M.-A.
Lemburg) Date: Mon, 29 Nov 2010 19:59:57 +0100 Subject: [Python-Dev] Python and the Unicode Character Database In-Reply-To: References: <20101128214311.092abd35@pitrou.net> <4CF2D4E9.3060607@v.loewis.de> <4CF2F067.5020705@pearwood.info> <4CF354C6.9020302@v.loewis.de> Message-ID: <4CF3F82D.2040000@egenix.com> Alexander Belopolsky wrote: > On Mon, Nov 29, 2010 at 2:22 AM, "Martin v. L?wis" wrote: >>> The former ensures that literals in code are always readable; the later >>> allows users to enter numbers in their own number system. How could that >>> be a bad thing? >> >> It's YAGNI, feature bloat. It gives the illusion of supporting something >> that actually isn't supported very well (namely, parsing local number >> strings). I claim that there is no meaningful application >> of this feature. This is not about parsing local number strings, it's about parsing number strings represented using different scripts - besides en-US is a locale as well, ye know :-) > Speaking of YAGNI, does anyone want to defend > >>>> complex('????.??j') > 1234.56j > > ? Yes. The same arguments apply. Just because ASCII-proponents may have a hard time reading such literals, doesn't mean that script users have the same trouble. > Especially given that we reject complex('1234.56i'): > > http://bugs.python.org/issue10562 We've had that discussion long before we had Unicode in Python. The main reason was that 'i' looked to similar to 1 in a number of fonts which is why it was rejected for Python source code. However, I don't any reason why we shouldn't accept both i and j for complex(), though, since the input to that constructor doesn't have to originate in Python source code. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Nov 29 2010) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... 
http://python.egenix.com/ ________________________________________________________________________ ::: Try our new mxODBC.Connect Python Database Interface for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From brett at python.org Mon Nov 29 20:22:22 2010 From: brett at python.org (Brett Cannon) Date: Mon, 29 Nov 2010 11:22:22 -0800 Subject: [Python-Dev] python3k : imp.find_module raises SyntaxError In-Reply-To: <20101129115311.GD18888@lupus.logilab.fr> References: <201011251530.23947.emile.anclin@logilab> <4CEE9B72.1070002@ronadam.com> <20101129115311.GD18888@lupus.logilab.fr> Message-ID: On Mon, Nov 29, 2010 at 03:53, Sylvain Th?nault wrote: > On 25 novembre 11:22, Ron Adam wrote: >> On 11/25/2010 08:30 AM, Emile Anclin wrote: >> > >> >hello, >> > >> >working on Pylint, we have a lot of voluntary corrupted files to test >> >Pylint behavior; for instance >> > >> >$ cat /home/emile/var/pylint/test/input/func_unknown_encoding.py >> ># -*- coding: IBO-8859-1 -*- >> >""" check correct unknown encoding declaration >> >""" >> > >> >__revision__ = '????' >> > >> > >> >and we try to find that module : >> >find_module('func_unknown_encoding', None). But python3 raises SyntaxError >> >in that case ; it didn't raise SyntaxError on python2 nor does so on our >> >func_nonascii_noencoding and func_wrong_encoding modules (with obvious >> >names) >> > >> >Python 3.2a2 (r32a2:84522, Sep 14 2010, 15:22:36) >> >[GCC 4.3.4] on linux2 >> >Type "help", "copyright", "credits" or "license" for more information. >> >>>>from imp import find_module >> >>>>find_module('func_unknown_encoding', None) >> >Traceback (most recent call last): >> > ? File " ", line 1, in >> >SyntaxError: encoding problem: with BOM >> >> I don't think there is a clear reason by design. 
>> Also try importing
>> the same modules directly and noting the differences in the errors
>> you get.
>
> IMO the point is that we can consider as a bug the fact that find_module
> tries to somewhat read the content of the file, no? Though it seems to be
> only doing this for encoding detection or the like, since find_module
> doesn't choke on a module containing another kind of syntax error.
>
> So the question is, should we deal with this in pylint/astng, or can we expect
> this to be fixed at some point?

Considering these semantics changed between Python 2 and 3 w/o a
discernable benefit (I would consider it a negative as finding a
module should not be impacted by syntactic correctness; the full act
of importing should be the only thing that cares about that), I would
consider it a bug that should be filed.

From tjreedy at udel.edu Mon Nov 29 20:23:28 2010
From: tjreedy at udel.edu (Terry Reedy)
Date: Mon, 29 Nov 2010 14:23:28 -0500
Subject: [Python-Dev] Python and the Unicode Character Database
In-Reply-To: <4CF3C477.1020007@egenix.com>
References: <20101128214311.092abd35@pitrou.net> <4CF2D4E9.3060607@v.loewis.de>
	<4CF2DAD7.2000408@egenix.com> <87pqto6bnv.fsf@uwakimon.sk.tsukuba.ac.jp>
	<4CF38861.5090309@egenix.com> <4CF3C477.1020007@egenix.com>
Message-ID:

On 11/29/2010 10:19 AM, M.-A. Lemburg wrote:
> Nick Coghlan wrote:
>> On Mon, Nov 29, 2010 at 9:02 PM, M.-A. Lemburg wrote:
>>> If we would go down that road, we would also have to disable other
>>> Unicode features based on locale, e.g. whether to apply non-ASCII
>>> case mappings, what to consider whitespace, etc.
>>>
>>> We don't do that for a good reason: Unicode is supposed to be
>>> universal and not limited to a single locale.
>>
>> Because parsing numbers is about more than just the characters used
>> for the individual digits.
>> There are additional semantics associated
>> with digit ordering (for any number) and decimal separators and
>> exponential notation (for floating point numbers) and those vary by
>> locale. We deliberately chose to make the builtin numeric parsers
>> unaware of all of those things, and assuming that we can simply parse
>> other digits as if they were their ASCII equivalents and otherwise
>> assume a C locale seems questionable.
>
> Sure, and those additional semantics are locale dependent, even
> between ASCII-only locales. However, that does not apply to the
> basic building blocks, the decimal digits themselves.
>
>> If the existing semantics can be adequately defined, documented and
>> defended, then retaining them would be fine. However, the language
>> reference needs to define the behaviour properly so that other
>> implementations know what they need to support and what can be chalked
>> up as being just an implementation accident of CPython. (As a point in
>> the plus column, both decimal.Decimal and fractions.Fraction were able
>> to handle the '????.??' example in a manner consistent with the int
>> and float handling)
>
> The support is built into the C API, so there's not really much
> surprise there.
>
> Regarding documentation, we'd just have to add that numbers may
> be made up of a Unicode code point in the category "Nd".
>
> See http://www.unicode.org/versions/Unicode5.2.0/ch04.pdf, section
> 4.6 for details....
>
> """
> Decimal digits form a large subcategory of numbers consisting of those digits that can be
> used to form decimal-radix numbers. They include script-specific digits, but exclude
> characters such as Roman numerals and Greek acrophonic numerals. (Note that <1, 5> = 15 =
> fifteen, but <I, V> = IV = four.) Decimal digits also exclude the compatibility subscript or
> superscript digits to prevent simplistic parsers from misinterpreting their values in context.
> """
>
> int(), float() and long() (in Python2) are such simplistic
> parsers.
Since you are the knowledgeable advocate of the current behavior, perhaps
you could open an issue and propose a doc patch, even if not .rst
formatted.

-- 
Terry Jan Reedy

From alexander.belopolsky at gmail.com Mon Nov 29 20:38:46 2010
From: alexander.belopolsky at gmail.com (Alexander Belopolsky)
Date: Mon, 29 Nov 2010 14:38:46 -0500
Subject: [Python-Dev] Python and the Unicode Character Database
In-Reply-To: <20101129193302.115dbcd5@pitrou.net>
References: <20101128214311.092abd35@pitrou.net> <4CF2D4E9.3060607@v.loewis.de>
	<4CF2F067.5020705@pearwood.info> <4CF354C6.9020302@v.loewis.de>
	<20101129193302.115dbcd5@pitrou.net>
Message-ID:

On Mon, Nov 29, 2010 at 1:33 PM, Antoine Pitrou wrote:
> On Mon, 29 Nov 2010 08:22:46 +0100
> "Martin v. Löwis" wrote:
>> > The former ensures that literals in code are always readable; the latter
>> > allows users to enter numbers in their own number system. How could that
>> > be a bad thing?
>>
>> It's YAGNI, feature bloat. It gives the illusion of supporting something
>> that actually isn't supported very well (namely, parsing local number
>> strings). I claim that there is no meaningful application
>> of this feature.
>
> Still, if it's not detrimental and it's not difficult to support,
> then why do you care?

It is difficult to support. A fix for issue10557 would be much
simpler if we did not support non-European digits. I now added a
patch that handles non-ascii digits, so you can see what's involved.
Note that when Unicode Consortium inevitably adds more Nd characters
to the non-BMP planes, we will have to add surrogate pairs' support to
this code.

In any case, there is little we can do about it in 3.2 other than fix
bugs like issue10557 without breaking currently valid code, so I
created a separate issue to continue this debate in context of 3.3.
[issue10581]

Now, I would like to bring this thread back to its subject.
Given that UCD is now affecting the language definition and the
standard library behavior, how should changes to UCD be handled?

- Should Python documentation refer to the specific version of Unicode
that it supports? Current documentation refers to old versions.
Should version be updated or removed to imply the latest?

- How UCD updates should be handled during the language moratorium?
During PEP 3003 discussion, it was suggested to handle it on a case by
case basis, but I don't see discussion of the upgrade to 6.0.0 in PEP
3003. Should this upgrade be backported to 2.7?

- How specific should library reference manual be in defining methods
affected by UCD such as str.upper()?

- What is an acceptable level of variation between Python
implementations? For example, if '\UXXXXXXXX'.isalpha() returns true
in one implementation, can it return false in another? Note that
even CPython narrow and wide builds are presently not consistent in
this respect.

[issue10581] http://bugs.python.org/issue10581

From alexander.belopolsky at gmail.com Mon Nov 29 20:43:14 2010
From: alexander.belopolsky at gmail.com (Alexander Belopolsky)
Date: Mon, 29 Nov 2010 14:43:14 -0500
Subject: [Python-Dev] Python and the Unicode Character Database
In-Reply-To:
References: <20101128214311.092abd35@pitrou.net> <4CF2D4E9.3060607@v.loewis.de>
	<4CF2DAD7.2000408@egenix.com> <87pqto6bnv.fsf@uwakimon.sk.tsukuba.ac.jp>
	<4CF38861.5090309@egenix.com> <4CF3C477.1020007@egenix.com>
Message-ID:

On Mon, Nov 29, 2010 at 2:23 PM, Terry Reedy wrote:
..
> Since you are the knowledgeable advocate of the current behavior, perhaps you
> could open an issue and propose a doc patch, even if not .rst formatted.
>

I am not an advocate of the current behavior, but an issue for doc
patches is at .
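
For the documentation question raised earlier in this thread (which
Unicode version does an implementation support, and may isalpha()
differ between implementations?), here is a minimal sketch of how a
given interpreter can be asked directly. It assumes a CPython 3 build
with the standard unicodedata module; the helper name describe() is
made up for illustration, and the two sample characters were chosen
for this sketch rather than taken from the thread.

```python
import unicodedata

# Version of the Unicode Character Database compiled into this interpreter.
# The value differs between Python releases and implementations.
ucd_version = unicodedata.unidata_version


def describe(ch):
    """Return (general category, isalpha(), isdecimal()) for one character.

    Hypothetical helper, for illustration only.
    """
    return unicodedata.category(ch), ch.isalpha(), ch.isdecimal()


# ARABIC-INDIC DIGIT ONE: general category Nd, accepted by int() and float().
arabic_one = describe("\u0661")

# DESERET CAPITAL LETTER LONG I: a letter outside the BMP, the kind of
# character for which narrow and wide builds could disagree.
deseret_i = describe("\U00010400")

print("UCD version:", ucd_version)
print("U+0661:", arabic_one)
print("U+10400:", deseret_i)
```

Recording the output of such a script for each implementation is one
possible way to see how much variation there is in practice.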
From martin at v.loewis.de Mon Nov 29 20:38:59 2010
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Mon, 29 Nov 2010 20:38:59 +0100
Subject: [Python-Dev] PEP 384 final review
In-Reply-To:
References: <4CF2E86F.5000606@v.loewis.de> <4CF37F56.9030808@ubuntu.com>
Message-ID: <4CF40153.8030100@v.loewis.de>

>> - Should the distutils support for LIMITED_API be part of the pep, or
>> be implemented later?
>
> In any case, it has to be implemented in Distutils2, not in Distutils.
> Distutils is frozen and just in maintenance mode.

I think it's too late for that. PEP 3149 is accepted, and it does
specify a change to distutils (namely, the abi= parameter). ISTM that
an approved PEP will override the distutils code freeze.

> For Python itself we can backport the feature in its setup.py, until
> Distutils2 is back to the stdlib

This won't be for python itself, but for extension modules.

Regards,
Martin

From ziade.tarek at gmail.com Mon Nov 29 20:45:35 2010
From: ziade.tarek at gmail.com (=?ISO-8859-1?Q?Tarek_Ziad=E9?=)
Date: Mon, 29 Nov 2010 20:45:35 +0100
Subject: [Python-Dev] PEP 384 final review
In-Reply-To: <4CF40153.8030100@v.loewis.de>
References: <4CF2E86F.5000606@v.loewis.de> <4CF37F56.9030808@ubuntu.com>
	<4CF40153.8030100@v.loewis.de>
Message-ID:

2010/11/29 "Martin v. Löwis" :
>>>  - Should the distutils support for LIMITED_API be part of the pep, or
>>>    be implemented later?
>>
>> In any case, it has to be implemented in Distutils2, not in Distutils.
>> Distutils is frozen and just in maintenance mode.
>
> I think it's too late for that. PEP 3149 is accepted, and it does
> specify a change to distutils (namely, the abi= parameter). ISTM that
> an approved PEP will override the distutils code freeze.

Having an accepted PEP does not imply that it should be implemented in
the standard library. For instance PEP 345 and PEP 376 are accepted
but implemented in Distutils2.
it's also a:

- good opportunity to boost Distutils2 adoption
- way to get feedback from people for that abi= option and have the
  chance to correct any design issue before d2 is added in the stdlib

>
>> For Python itself we can backport the feature in its setup.py, until
>> Distutils2 is back to the stdlib
>
> This won't be for python itself, but for extension modules.

ok.

>
> Regards,
> Martin
>

-- 
Tarek Ziadé | http://ziade.org

From rrr at ronadam.com Mon Nov 29 21:21:07 2010
From: rrr at ronadam.com (Ron Adam)
Date: Mon, 29 Nov 2010 14:21:07 -0600
Subject: [Python-Dev] python3k : imp.find_module raises SyntaxError
In-Reply-To:
References: <201011251530.23947.emile.anclin@logilab> <4CEE9B72.1070002@ronadam.com>
	<20101129115311.GD18888@lupus.logilab.fr>
Message-ID:

On 11/29/2010 01:22 PM, Brett Cannon wrote:
> On Mon, Nov 29, 2010 at 03:53, Sylvain Thénault
> wrote:
>> On 25 novembre 11:22, Ron Adam wrote:
>>> On 11/25/2010 08:30 AM, Emile Anclin wrote:
>>>>
>>>> hello,
>>>>
>>>> working on Pylint, we have a lot of voluntary corrupted files to test
>>>> Pylint behavior; for instance
>>>>
>>>> $ cat /home/emile/var/pylint/test/input/func_unknown_encoding.py
>>>> # -*- coding: IBO-8859-1 -*-
>>>> """ check correct unknown encoding declaration
>>>> """
>>>>
>>>> __revision__ = '????'
>>>>
>>>>
>>>> and we try to find that module :
>>>> find_module('func_unknown_encoding', None). But python3 raises SyntaxError
>>>> in that case ; it didn't raise SyntaxError on python2 nor does so on our
>>>> func_nonascii_noencoding and func_wrong_encoding modules (with obvious
>>>> names)
>>>>
>>>> Python 3.2a2 (r32a2:84522, Sep 14 2010, 15:22:36)
>>>> [GCC 4.3.4] on linux2
>>>> Type "help", "copyright", "credits" or "license" for more information.
>>>>>> >from imp import find_module
>>>>>>> find_module('func_unknown_encoding', None)
>>>> Traceback (most recent call last):
>>>>   File "<stdin>", line 1, in <module>
>>>> SyntaxError: encoding problem: with BOM
>>>
>>> I don't think there is a clear reason by design. Also try importing
>>> the same modules directly and noting the differences in the errors
>>> you get.
>>
>> IMO the point is that we can consider as a bug the fact that find_module
>> tries to somewhat read the content of the file, no? Though it seems to be
>> only doing this for encoding detection or the like, since find_module
>> doesn't choke on a module containing another kind of syntax error.
>>
>> So the question is, should we deal with this in pylint/astng, or can we expect
>> this to be fixed at some point?
>
> Considering these semantics changed between Python 2 and 3 w/o a
> discernable benefit (I would consider it a negative as finding a
> module should not be impacted by syntactic correctness; the full act
> of importing should be the only thing that cares about that), I would
> consider it a bug that should be filed.

imp.find_module() returns an open file io object, and its output
feeds directly into imp.load_module().

>>> imp.find_module('pydoc')
(<_io.TextIOWrapper name=4 encoding='utf-8'>,
'/usr/local/lib/python3.2/pydoc.py', ('.py', 'U', 1))

So I think imp.find_module() is supposed to be used when you *do*
want to do the full act of importing and not for just finding out if or
where module xyz exists.

Ron

From martin at v.loewis.de Mon Nov 29 21:22:02 2010
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Mon, 29 Nov 2010 21:22:02 +0100
Subject: [Python-Dev] PEP 384 final review
In-Reply-To: <4CF37F56.9030808@ubuntu.com>
References: <4CF2E86F.5000606@v.loewis.de> <4CF37F56.9030808@ubuntu.com>
Message-ID: <4CF40B6A.6080407@v.loewis.de>

> Extensions built with Py_LIMITED_API have the python version encoded in
> its name.
> Which abi name should be used for these extensions?

PEP 3149, IIUC, says it should be "abi3". I don't understand what
that means, though (with respect to, say, distutils)

> - The m and u modifiers in the abi name are complimentary (?)

See above: none of these will be used. Of course, it is possible to
name an ABI-conforming extensions with the regular ABI name of the
Python release.

> - For posix systems the implementation is currently part of the abi name,
> are Py_LIMITED_API extensions supposed to be compatible with e.g. PyPy?

That's a choice that PyPy needs to make, of course, but Amaury has
indicated that they are interested in doing so.

> Should the LIMITED_API abi name include the implementation string?
> - Should the distutils support for LIMITED_API be part of the pep, or
> be implemented later?

Depends on what support you want. Currently, all you need to do is to
define Py_LIMITED_API to the preprocessor - this is something that
is already supported in distutils. If you want the support suggested
in PEP 3149 (specifying abi=3), it should certainly be implemented in
Python 3.2, despite the distutils freeze.

Regards,
Martin

From martin at v.loewis.de Mon Nov 29 21:36:46 2010
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Mon, 29 Nov 2010 21:36:46 +0100
Subject: [Python-Dev] PEP 384 final review
In-Reply-To:
References: <4CF2E86F.5000606@v.loewis.de>
Message-ID: <4CF40EDE.10004@v.loewis.de>

> This is probably an issue independent of the PEP but there appear to
> be a *lot* of exposed typedefs for various type slots and other
> function signatures that don't start with the Py prefix (i.e. getter,
> setter, unaryfunc and friends).

It's indeed independent: the names don't actually affect the ABI, but
the API. Changing them is possible later without risking binary
compatibility.

> Python.h shouldn't be leaking
> unprefixed names like that. We certainly shouldn't be enshrining them
> in the stable ABI without adding prefixes first.
The stable ABI isn't actually enshrining them - what gets enshrined
is the value of the typedefs, not their names.

I don't mind renaming them, though. I see a number of different cases:
- struct names. I don't see a problem to have "typedef struct PyFoo PyFoo"
  I vaguely recall that there had been compiler problems with that
  construct at some point, but to my knowledge, they are past, and this
  is actually both well-formed C and well-formed C++.
- function pointer type names
- "various" other types

For the struct types, in particular for the ones which already have
a typedef, I think renaming them should be possible right away.
Applications that break should be able to use the typedef instead,
and continue to work with older releases.

For the function pointer type names, caution is necessary. We cannot
remove them, since it would break a lot of code. I also think that
some smart naming scheme would be desirable that makes the names
all sound right, yet allows easy mapping from the existing types.
Once such a scheme is added, we should have a graceful deprecation
procedure, such as:
- release A: add typedefs in addition to existing pointer types,
  deprecate pointer types in documentation
- release B>A: make the old names somehow conditional (e.g.
  put them all into a header file rename3.h, or some such)
- release C>B: remove rename3.h

For the other rest, I think many of them are considered internal
(of course, they shouldn't appear in the ABI then at all). Renaming
them right away might be fine.

Regards,
Martin

From martin at v.loewis.de Mon Nov 29 21:41:09 2010
From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=)
Date: Mon, 29 Nov 2010 21:41:09 +0100
Subject: [Python-Dev] PEP 384 final review
In-Reply-To: <4CF3A736.4050003@netwok.org>
References: <4CF2E86F.5000606@v.loewis.de> <4CF3A736.4050003@netwok.org>
Message-ID: <4CF40FE5.8080800@v.loewis.de>

Am 29.11.2010 14:14, schrieb Éric Araujo:
> Hello,
>
>> Please comment with any changes you want to see, or speak in
>> favor or against this PEP.
>
> How to get a diff between py3k and this branch?

As others have already explained:

svn diff http://svn.python.org/projects/python/branches/py3k@84329
http://svn.python.org/projects/python/branches/pep-0384

(84329 is the value of svnmerge-integrated).

In any case, I posted it to Rietveld as

http://codereview.appspot.com/3262043/

Regards,
Martin

From greg.ewing at canterbury.ac.nz Mon Nov 29 21:47:23 2010
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Tue, 30 Nov 2010 09:47:23 +1300
Subject: [Python-Dev] constant/enum type in stdlib
In-Reply-To: <4CF1AB3C.3060408@btinternet.com>
References: <20101121034404.52924F20A@mail.python.org> <4CE9BF4A.1020302@netwok.org>
	<4CEA89E8.5090107@voidspace.org.uk> <20101122163722.7e96d123@pitrou.net>
	<4CEA9584.7040301@avl.com> <20101122172440.77d27ed5@pitrou.net>
	<20101122164654.2109.588145158.divmod.xquotient.165@localhost.localdomain>
	<4CEBC6BD.9060402@voidspace.org.uk> <4CED0557.9090101@voidspace.org.uk>
	<4CEDDC2D.204@canterbury.ac.nz> <4CEE5C1C.9000905@btinternet.com>
	<4CF2C86C.9030505@canterbury.ac.nz> <4CF1AB3C.3060408@btinternet.com>
Message-ID: <4CF4115B.7080200@canterbury.ac.nz>

Rob Cliffe wrote:

> But when a frozen list a.k.a. tuple would be created - either directly,
> or by setting a list's mutable flag to False which would really turn it
> into a tuple - the size *would* be known.

But at that point the object consists of two memory blocks -- one
containing just the object header and a pointer to the items, and
the other containing the items.

To turn that into a true tuple structure would require resizing the
main object block to be big enough to hold the items and copying
them into it. The main object can't be moved (because there are
PyObject *s all over the place pointing to it), so if there's not
enough room at its current location, you're out of luck.

So lists frozen after creation would have to remain as two blocks,
making them second-class citizens compared to those that were
created frozen.

Either that or store all lists/tuples as two blocks, and give up
some of the performance advantages of the current tuple structure.

-- 
Greg

From martin at v.loewis.de Mon Nov 29 22:04:03 2010
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Mon, 29 Nov 2010 22:04:03 +0100
Subject: [Python-Dev] Python and the Unicode Character Database
In-Reply-To: <20101129193302.115dbcd5@pitrou.net>
References: <20101128214311.092abd35@pitrou.net> <4CF2D4E9.3060607@v.loewis.de>
	<4CF2F067.5020705@pearwood.info> <4CF354C6.9020302@v.loewis.de>
	<20101129193302.115dbcd5@pitrou.net>
Message-ID: <4CF41543.1030800@v.loewis.de>

Am 29.11.2010 19:33, schrieb Antoine Pitrou:
> On Mon, 29 Nov 2010 08:22:46 +0100
> "Martin v. Löwis" wrote:
>>> The former ensures that literals in code are always readable; the latter
>>> allows users to enter numbers in their own number system. How could that
>>> be a bad thing?
>>
>> It's YAGNI, feature bloat. It gives the illusion of supporting something
>> that actually isn't supported very well (namely, parsing local number
>> strings). I claim that there is no meaningful application
>> of this feature.
>
> Still, if it's not detrimental and it's not difficult to support,
> then why do you care? You aren't even maintaining that part of the code.

I sure do maintain the Unicode database implementation in Python - the
one that is being used (IMO incorrectly) to implement the conversion
in question (and also the one that triggered this thread).

> I don't think "remove feature bloat" is part of our development goals
> or practices. Given the diversity of our user base, such removal should
> be done carefully and only for serious reasons.

I think it's a serious reason that the intuitive expectation of many
people (including committers) deviates from the actual implementation -
so much that they clarify the documentation in a way that makes the
difference explicit.

Having a mismatch between the expected behavior and the actual behavior
is a serious problem because it could lead to security issues, e.g. when
someone relies on float() to perform certain syntactic checking, making
it then possible to sneak in values that cause corruption later on
(speaking theoretically, of course - I'm not aware of an application
that is vulnerable in this manner).

Regards,
Martin

From martin at v.loewis.de Mon Nov 29 22:13:41 2010
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Mon, 29 Nov 2010 22:13:41 +0100
Subject: [Python-Dev] Python and the Unicode Character Database
In-Reply-To:
References: <20101128214311.092abd35@pitrou.net> <4CF2D4E9.3060607@v.loewis.de>
	<4CF2F067.5020705@pearwood.info> <4CF354C6.9020302@v.loewis.de>
	<20101129193302.115dbcd5@pitrou.net>
Message-ID: <4CF41785.5020807@v.loewis.de>

> - Should Python documentation refer to the specific version of Unicode
> that it supports?

You mean, mention it somewhere? Sure (although it would be nice if
the documentation generator would automatically extract it from the
source, just as it extracts the Python version number).

Of course, such mentioning should explain that this is specific to
CPython, and not an aspect of Python-the-language.

> Current documentation refers to old versions.
> Should version be
> updated or removed to imply the latest?

What specific reference are you referring to?

> - How UCD updates should be handled during the language moratorium?

It's clearly not affected.

> During PEP 3003 discussion, it was suggested to handle it on a case by
> case basis, but I don't see discussion of the upgrade to 6.0.0 in PEP
> 3003.

It's covered by "As the standard library is not directly tied to the
language definition it is not covered by this moratorium."

> Should this upgrade be backported to 2.7?

No, it's a new feature.

> - How specific should library reference manual be in defining methods
> affected by UCD such as str.upper()?

It should specify what this actually does in Unicode terminology
(probably in addition to a layman's rephrase of that)

> - What is an acceptable level of variation between Python
> implementations? For example, if '\UXXXXXXXX'.isalpha() returns true
> in one implementation, can it return false in another?

Implementations are free to use any version of the UCD.

Regards,
Martin

From martin at v.loewis.de Mon Nov 29 22:14:07 2010
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Mon, 29 Nov 2010 22:14:07 +0100
Subject: [Python-Dev] PEP 384 final review
In-Reply-To:
References: <4CF2E86F.5000606@v.loewis.de> <4CF35FAA.50600@v.loewis.de>
Message-ID: <4CF4179F.9080700@v.loewis.de>

Am 29.11.2010 09:36, schrieb Georg Brandl:
> Am 29.11.2010 09:09, schrieb "Martin v. Löwis":
>>> I have now completed
>>>
>>> http://www.python.org/dev/peps/pep-0384/
>>>
>>>
>>> was structseq.h considered?
>>
>> No, it wasn't - unfortunately, it still doesn't get included when
>> including Python.h. I'll add it.
>
> Would 3.2 be a good time to finally include it? All of its macros and
> declarations are named PyStructSequence*, so there shouldn't be a
> name clash concern.

Sure, I see no problem with that.

Regards,
Martin

From greg.ewing at canterbury.ac.nz Mon Nov 29 22:36:51 2010
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Tue, 30 Nov 2010 10:36:51 +1300
Subject: [Python-Dev] constant/enum type in stdlib
In-Reply-To:
References: <20101121034404.52924F20A@mail.python.org> <4CE9BF4A.1020302@netwok.org>
	<4CEA89E8.5090107@voidspace.org.uk> <20101122163722.7e96d123@pitrou.net>
	<4CEA9584.7040301@avl.com> <20101122172440.77d27ed5@pitrou.net>
	<20101122164654.2109.588145158.divmod.xquotient.165@localhost.localdomain>
	<4CEBC6BD.9060402@voidspace.org.uk> <4CED0557.9090101@voidspace.org.uk>
	<4CED4E34.5060400@voidspace.org.uk> <4CF1706E.5030503@g.nevcal.com>
	<1D372F35-B455-4982-997B-2C54A7D56741@gmail.com>
	<4CF28310.7070304@voidspace.org.uk>
Message-ID: <4CF41CF3.7040001@canterbury.ac.nz>

I don't see how the grouping can be completely separated
from the value-naming. If the named values are to be
subclassed from the base values, then you want all the
members of a group to belong to the *same* subclass.
You can't get that by treating each named value on its
own and then trying to group them together afterwards.

-- 
Greg

From steve at pearwood.info Mon Nov 29 23:09:15 2010
From: steve at pearwood.info (Steven D'Aprano)
Date: Tue, 30 Nov 2010 09:09:15 +1100
Subject: [Python-Dev] Python and the Unicode Character Database
In-Reply-To:
References: <20101128214311.092abd35@pitrou.net> <4CF2D4E9.3060607@v.loewis.de>
	<4CF2F067.5020705@pearwood.info> <4CF354C6.9020302@v.loewis.de>
Message-ID: <4CF4248B.1060409@pearwood.info>

Alexander Belopolsky wrote:

> Speaking of YAGNI, does anyone want to defend
>
> >>> complex('????.??j')
> 1234.56j

*If* we allow float('????.??') (as we currently do, but is being
disputed by some), then we should allow complex('????.??j'). It would be
silly for complex to be more restrictive than float.

> Especially given that we reject complex('1234.56i'):

I don't understand why you use 'i' when Python uses 'j' as the symbol
for imaginary numbers.

>>> complex('1234.56j')
1234.56j

works fine.

I have no problem with Python choosing one of i/j as the symbol for
imaginary-1 and rejecting the other. I prefer i rather than j, but
that's because my background is in maths rather than electrical
engineering, but I can live with either.

But in any case, please don't conflate the question of whether Python
should accept j and/or i for complex numbers with the question of
supporting non-arabic numerals. The two issues are unrelated.

-- 
Steven

From rrr at ronadam.com Tue Nov 30 00:38:26 2010
From: rrr at ronadam.com (Ron Adam)
Date: Mon, 29 Nov 2010 17:38:26 -0600
Subject: [Python-Dev] constant/enum type in stdlib
In-Reply-To: <4CF3180B.1060306@ronadam.com>
References: <20101121034404.52924F20A@mail.python.org> <4CE9BF4A.1020302@netwok.org>
	<4CEA89E8.5090107@voidspace.org.uk> <20101122163722.7e96d123@pitrou.net>
	<4CEA9584.7040301@avl.com> <20101122172440.77d27ed5@pitrou.net>
	<20101122164654.2109.588145158.divmod.xquotient.165@localhost.localdomain>
	<4CEBC6BD.9060402@voidspace.org.uk> <4CED0557.9090101@voidspace.org.uk>
	<4CED4E34.5060400@voidspace.org.uk> <4CF3180B.1060306@ronadam.com>
Message-ID:

On 11/28/2010 09:03 PM, Ron Adam wrote:
> It does associate additional info to names and creates a nice dictionary to
> reference.
>
>
> >>> def name_values( FOO: 1,
>                      BAR: "Hello World!",
>                      BAZ: dict(a=1, b=2, c=3) ):
> ...     return FOO, BAR, BAZ
> ...
> >>> foo(1,2,3)
> (1, 2, 3)
> >>> foo.__annotations__
> {'BAR': 'Hello World!', 'FOO': 1, 'BAZ': {'a': 1, 'c': 3, 'b': 2}}

sigh... I haven't been very focused lately. That should have been:

>>> def named_values(FOO:1, BAR:"Hello World!", BAZ:dict(a=1, b=2, c=3)):
...     return FOO, BAR, BAZ
...
>>> named_values.__annotations__
{'BAR': 'Hello World!', 'FOO': 1, 'BAZ': {'a': 1, 'c': 3, 'b': 2}}
>>> named_values(1, 2, 3)
(1, 2, 3)

Cheers,
Ron

From ncoghlan at gmail.com Tue Nov 30 03:04:28 2010
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Tue, 30 Nov 2010 12:04:28 +1000
Subject: [Python-Dev] PEP 384 final review
In-Reply-To: <28693E2E-A60E-4F83-BF55-DBD6EAD88353@fuhm.net>
References: <4CF2E86F.5000606@v.loewis.de> <4CF3A736.4050003@netwok.org>
	<4CF3AC9D.20309@ubuntu.com> <28693E2E-A60E-4F83-BF55-DBD6EAD88353@fuhm.net>
Message-ID:

On Tue, Nov 30, 2010 at 12:15 AM, James Y Knight wrote:
>
> On Nov 29, 2010, at 8:58 AM, Nick Coghlan wrote:
>
> The http read only URLs
> didn't work (no diff returned, just "svn: OPTIONS of
> 'http://svn.python.org/python/branches/pep-0384': 200 OK
> (http://svn.python.org)"),
>
> That was the wrong url: you should've
> used http://svn.python.org/projects/python/branches/pep-0384
> James

Ah, thanks, I always forget that part (since it isn't there in the
read/write URLs). The SVN output may qualify as one of the least
helpful error messages I have ever seen, though :)

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |
Brisbane, Australia

From ncoghlan at gmail.com Tue Nov 30 03:23:04 2010
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Tue, 30 Nov 2010 12:23:04 +1000
Subject: [Python-Dev] constant/enum type in stdlib
In-Reply-To: <4CF41CF3.7040001@canterbury.ac.nz>
References: <20101121034404.52924F20A@mail.python.org> <4CE9BF4A.1020302@netwok.org>
	<4CEA89E8.5090107@voidspace.org.uk> <20101122163722.7e96d123@pitrou.net>
	<4CEA9584.7040301@avl.com> <20101122172440.77d27ed5@pitrou.net>
	<20101122164654.2109.588145158.divmod.xquotient.165@localhost.localdomain>
	<4CEBC6BD.9060402@voidspace.org.uk> <4CED0557.9090101@voidspace.org.uk>
	<4CED4E34.5060400@voidspace.org.uk> <4CF1706E.5030503@g.nevcal.com>
	<1D372F35-B455-4982-997B-2C54A7D56741@gmail.com>
	<4CF28310.7070304@voidspace.org.uk> <4CF41CF3.7040001@canterbury.ac.nz>
Message-ID:

On Tue, Nov 30, 2010 at 7:36 AM, Greg Ewing wrote:
> I don't see how the grouping can be completely separated
> from the value-naming. If the named values are to be
> subclassed from the base values, then you want all the
> members of a group to belong to the *same* subclass.
> You can't get that by treating each named value on its
> own and then trying to group them together afterwards.

Note that my sample implementation cached the created types, so that
(for example) there was only ever one "Named " type (my
implementation wasn't quite kosher in that respect, since
functools.lru_cache has a non-optional size limit - setting maxsize to
float('inf') deals with that).

A grouping API would use either single or multiple inheritance to
create members that supported both the naming aspects as well as the
grouping aspects.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |
Brisbane, Australia

From alexander.belopolsky at gmail.com Tue Nov 30 04:46:33 2010
From: alexander.belopolsky at gmail.com (Alexander Belopolsky)
Date: Mon, 29 Nov 2010 22:46:33 -0500
Subject: [Python-Dev] Python and the Unicode Character Database
In-Reply-To: <4CF4248B.1060409@pearwood.info>
References: <20101128214311.092abd35@pitrou.net> <4CF2D4E9.3060607@v.loewis.de>
	<4CF2F067.5020705@pearwood.info> <4CF354C6.9020302@v.loewis.de>
	<4CF4248B.1060409@pearwood.info>
Message-ID:

On Mon, Nov 29, 2010 at 5:09 PM, Steven D'Aprano wrote:
..
> But in any case, please don't conflate the question of whether Python should
> accept j and/or i for complex numbers with the question of supporting
> non-arabic numerals. The two issues are unrelated.

The two issues are related because they are both about how strict
numerical constructors should be. If we want to accept wide
variations in how numbers can be spelled, then surely using i for the
imaginary unit is much more common than using ? for the digit 7.

I see two problems with supporting non-ascii spellings:

1. Support costs.
2. User confusion.

The two are related because when users are confused, they will report
invalid bugs when Python does not meet their expectations. For
example, why

>>> int('???', 10)
123

works, but

>>> int('??????', 16)
Traceback (most recent call last):
..
UnicodeEncodeError: 'decimal' codec can't encode character '\uff21' in
position 3: invalid decimal Unicode string

does not?

And if 'decimal' is a codec, why

>>> '123'.encode('decimal')
Traceback (most recent call last):
...
LookupError: unknown encoding: decimal

Before anyone suggests that int(.., 16) should consult the new
Hex_Digit property in the UCD, let me remind that int() supports bases
from 2 through 36.

I thought Python design was primarily driven by practicality. Here
the only plausible argument that one can make is that if Unicode says
it is a digit, we should treat it as a digit. Purity over
practicality.
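
A minimal sketch of the decimal/hex asymmetry described above, for
anyone who wants to reproduce it. The escapes are an assumption: the
original characters did not survive archiving, so fullwidth digits and
fullwidth letters (the '\uff21' in the traceback is FULLWIDTH LATIN
CAPITAL LETTER A) are used as stand-ins. The except clause catches
ValueError, which also covers UnicodeEncodeError, so the sketch does
not depend on which of the two a particular release raises.

```python
# Stand-ins for the garbled literals: fullwidth 1, 2, 3 and fullwidth A, B, C.
fullwidth_digits = "\uff11\uff12\uff13"
fullwidth_hex = "\uff11\uff12\uff13\uff21\uff22\uff23"

# Non-ASCII decimal digits (category Nd) are accepted in base 10.
decimal_value = int(fullwidth_digits, 10)

# Non-ASCII letters are not accepted as hex digits, even though their
# ASCII counterparts would be.
try:
    int(fullwidth_hex, 16)
except ValueError as exc:  # UnicodeEncodeError is a ValueError subclass
    hex_error = exc
else:
    hex_error = None

print(decimal_value)
print(type(hex_error).__name__)
```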
In practical terms, UCD comes at a price. The unicodedata module size
is over 700K on my machine. This is almost half the size of the
python executable and by far the largest extension module. (only CJK
encodings come close.) Making builtins depend on the largest
extension module for operation does not strike me as sound design.

From stephen at xemacs.org  Tue Nov 30 05:20:11 2010
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Tue, 30 Nov 2010 13:20:11 +0900
Subject: [Python-Dev] Python and the Unicode Character Database
In-Reply-To: <4CF3F82D.2040000@egenix.com>
References: <20101128214311.092abd35@pitrou.net> <4CF2D4E9.3060607@v.loewis.de> <4CF2F067.5020705@pearwood.info> <4CF354C6.9020302@v.loewis.de> <4CF3F82D.2040000@egenix.com>
Message-ID: <87d3pn5tok.fsf@uwakimon.sk.tsukuba.ac.jp>

M.-A. Lemburg writes:

 > Just because ASCII-proponents may have a hard time reading such
 > literals,

That's not the point.

 > doesn't mean that script users have the same trouble.

The script users may have no trouble reading them, but that doesn't
mean it's not a YAGNI. In Japanese, it's a YAGNI except in addresses
on New Year cards and in dates, which could be handled by specialized
modules, or by a generic module for extracting numeric information
from general (as opposed to program) text. Neither of those is likely
to appear in program text in context where they would be used as a
numeric literal.

In fact, Python *does* consider it a YAGNI for Han! Although my
apartment number would be written "???" on a New Year card, Python
won't parse it as 704: unicodedata considers those digits to be Lo,
except for "?" which fails anyway because it's Nl, not Nd. (To add
insult to injury, it doesn't even return numeric values for those
characters, even though any Han-user would consider them numeric when
used in isolation, except that Japanese would be likely to consider
"?" to be the non-numeric "maru" symbol, ie, circle, meaning "OK"!)
The whole concept of numeric in Unicode is a mess; why import that
mess into Python?

Can you give any examples where people do computation, keep books, or
do nuclear physics in non-Arabic numerals? I suppose Arabic users
might, but even there I suspect not.

From stephen at xemacs.org  Tue Nov 30 05:39:21 2010
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Tue, 30 Nov 2010 13:39:21 +0900
Subject: [Python-Dev] Python and the Unicode Character Database
In-Reply-To: <4CF4248B.1060409@pearwood.info>
References: <20101128214311.092abd35@pitrou.net> <4CF2D4E9.3060607@v.loewis.de> <4CF2F067.5020705@pearwood.info> <4CF354C6.9020302@v.loewis.de> <4CF4248B.1060409@pearwood.info>
Message-ID: <87bp575ssm.fsf@uwakimon.sk.tsukuba.ac.jp>

Steven D'Aprano writes:

 > But in any case, please don't conflate the question of whether Python
 > should accept j and/or i for complex numbers with the question of
 > supporting non-arabic numerals. The two issues are unrelated.

Different, yes, unrelated, no. They're both about whether variant
forms of universally used literals should be allowed in a programming
language, or whether only the canonical form is allowed.

Note that *nobody* is saying that Python should have no facility for
parsing these numbers, only that by default literal decimal numerals
should be encoded as ASCII digits. For example, I would not object to
int() getting a Boolean flag meaning "consult unicodedata for
non-ASCII digits", just as it has an optional parameter meaning
"decode in base other than 10".[1]

OTOH, until somebody says "Yes, in Mecca the bazaar traders keep books
on their Lenovos using ISO-8859-6 numerals, and it would be painful
for them to switch to what we call 'Arabic' numerals", I'm going to
consider it a YAGNI. Just as even though mathematicians clearly
prefer "i" as the imaginary unit, there's not enough pain involved in
them switching to "j" to make it worth supporting both.
(BTW, my first reaction to the "j" notation was "cool, Python supports
quaternions out of the box!" It took only a second or so to return to
reality, but that was my first reaction.)

Footnotes:
[1] That might not be a good idea on other grounds, but in principle
I would be OK with such built-ins accepting non-ASCII digits on
request.

From merwok at netwok.org  Tue Nov 30 07:33:51 2010
From: merwok at netwok.org (=?UTF-8?B?w4lyaWMgQXJhdWpv?=)
Date: Tue, 30 Nov 2010 07:33:51 +0100
Subject: [Python-Dev] PEP 291 versus Python 3
Message-ID: <4CF49ACF.6070904@netwok.org>

Good morning python-dev,

PEP 291 (Backward Compatibility for Standard Library) does not seem to
take Python 3 into account. Is this PEP only relevant for the 2.7
branch?* If it's supposed to apply to 3.x too, despite the view that
3.0 was a clean break, what does it mean to have a module that is
developed in the py3k branch and should retain compatibility with 2.3 or
1.5.2?

* Tarek's interpretation: "The 2.x needs to stay 2.3 compatible
  so we should keep the 3.x as similar as possible for bugfixes."

In the particular case of distutils (should be compatible with 2.3), we
(including I) have been lax. Our tests for example use modern unittest
features like skips, which makes them not runnable on old Pythons. I am
very uncomfortable with code that seems to run fine but which tests
(however few) cannot be run, so I think I'll have to trade the skips for
old-style "return" statements. The other way of solving that is to
change the compat policy. If I remember correctly, the rationale for
code compat in distutils is that people may copy distutils from Python
x.y to their install of x.y-n; I don't know if this is still an active
practice, and if it is, I don't know if it should be supported,
considering that distutils2 (compatible with 2.4+ and available from
PyPI) is coming.
Regards

From regebro at gmail.com  Tue Nov 30 09:10:37 2010
From: regebro at gmail.com (Lennart Regebro)
Date: Tue, 30 Nov 2010 09:10:37 +0100
Subject: [Python-Dev] Python and the Unicode Character Database
In-Reply-To:
References:
Message-ID:

On Sun, Nov 28, 2010 at 21:24, Alexander Belopolsky wrote:

> While we have little choice but to follow UCD in defining
> str.isidentifier(), I think Python can promise users more stability in
> what it treats as space or as a digit in its builtins.

Why? I can see this is a problem if one character that earlier was
allowed no longer is. That breaks backwards compatibility. This
doesn't.

> >>> float('????.??')
> 1234.56
>
> is more important than to assure users that once their program
> accepted some text as a number, they can assume that the text is
> ASCII.

*I* think it is more important. In python 3, you can never ever assume
anything is ASCII any more. ASCII is practically dead and buried as far
as Python goes, unless you explicitly encode to it.

> def deposit(self, amountstr):
>     self.balance += float(amountstr)
>     audit_log("Deposited: " + amountstr)
>
> Auditor:
>
> $ cat numbered-account.log
> Deposited: ?????.??

That log reasonably should be in UTF-8 or something else, in which
case this is not a problem. And that's ignoring that it makes way more
sense to log the numerical amount.
--
Lennart Regebro: http://regebro.wordpress.com/
Python 3 Porting: http://python3porting.com/
+33 661 58 14 64

From hagen at zhuliguan.net  Tue Nov 30 09:15:54 2010
From: hagen at zhuliguan.net (=?ISO-8859-1?Q?Hagen_F=FCrstenau?=)
Date: Tue, 30 Nov 2010 09:15:54 +0100
Subject: [Python-Dev] Python and the Unicode Character Database
In-Reply-To: <4CF41785.5020807@v.loewis.de>
References: <20101128214311.092abd35@pitrou.net> <4CF2D4E9.3060607@v.loewis.de> <4CF2F067.5020705@pearwood.info> <4CF354C6.9020302@v.loewis.de> <20101129193302.115dbcd5@pitrou.net> <4CF41785.5020807@v.loewis.de>
Message-ID:

>> During PEP 3003 discussion, it was suggested to handle it on a case by
>> case basis, but I don't see discussion of the upgrade to 6.0.0 in PEP
>> 3003.
>
> It's covered by "As the standard library is not directly tied to the
> language definition it is not covered by this moratorium."

How is this restricted to the stdlib if it defines the set of valid
identifiers?

- Hagen

From stephen at xemacs.org  Tue Nov 30 09:23:10 2010
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Tue, 30 Nov 2010 17:23:10 +0900
Subject: [Python-Dev] Python and the Unicode Character Database
In-Reply-To:
References:
Message-ID: <87wrnv43v5.fsf@uwakimon.sk.tsukuba.ac.jp>

Lennart Regebro writes:

 > *I* think it is more important. In python 3, you can never ever assume
 > anything is ASCII any more.

Sure you can. In Python program text, all keywords will be ASCII
(English, even, though it may be en_NL.UTF-8 ) for the foreseeable
future.

I see no reason not to make a similar promise for numeric literals. I
see no good reason to allow compatibility full-width Japanese "ASCII"
numerals or Arabic cursive numerals in "for i in range(...)" for
example.

As soon as somebody gives an example of a culture, however minor, that
uses computers but actively prefers to use non-ASCII numerals to
express numbers in an IT context, I'll review my thinking. But at the
moment it's 101% YAGNI.
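
The thread distinguishes numeric literals in program text from numeric
strings handed to a constructor. The following is a minimal sketch of
that distinction, not code from any of the posters. It assumes a current
CPython 3 interpreter; the helper name literal_is_valid is hypothetical,
and the Arabic-Indic digits are written as escapes so the example
survives re-encoding.

```python
# Sketch only: contrasts a non-ASCII digit sequence used as a source
# literal with the same sequence passed to int() as a string.
# `literal_is_valid` is a hypothetical helper, not a stdlib function.

ARABIC_INDIC_12 = "\u0661\u0662"  # ARABIC-INDIC DIGIT ONE, DIGIT TWO


def literal_is_valid(source):
    """Return True if `source` compiles as a Python expression."""
    try:
        compile(source, "<example>", "eval")
    except SyntaxError:
        return False
    return True


ascii_literal_ok = literal_is_valid("12")                  # plain ASCII literal
non_ascii_literal_ok = literal_is_valid(ARABIC_INDIC_12)   # rejected by the tokenizer
parsed_from_string = int(ARABIC_INDIC_12)                  # accepted by the constructor

print(ascii_literal_ok, non_ascii_literal_ok, parsed_from_string)
```

On the interpreters this was written against, the literal form is
rejected while the string form is parsed, which is the behaviour both
sides of the thread take as given.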

From sylvain.thenault at logilab.fr  Tue Nov 30 09:34:18 2010
From: sylvain.thenault at logilab.fr (Sylvain =?utf-8?B?VGjDqW5hdWx0?=)
Date: Tue, 30 Nov 2010 09:34:18 +0100
Subject: [Python-Dev] python3k : imp.find_module raises SyntaxError
In-Reply-To:
References: <201011251530.23947.emile.anclin@logilab> <4CEE9B72.1070002@ronadam.com> <20101129115311.GD18888@lupus.logilab.fr>
Message-ID: <20101130083418.GB4157@lupus.logilab.fr>

On 29 November 14:21, Ron Adam wrote:
> On 11/29/2010 01:22 PM, Brett Cannon wrote:
> >Considering these semantics changed between Python 2 and 3 w/o a
> >discernable benefit (I would consider it a negative as finding a
> >module should not be impacted by syntactic correctness; the full act
> >of importing should be the only thing that cares about that), I would
> >consider it a bug that should be filed.
>
> The output of imp.find_module() returns an open file io object, and
> it's output feeds directly into to imp.load_module().
>
> >>> imp.find_module('pydoc')
> (<_io.TextIOWrapper name=4 encoding='utf-8'>,
> '/usr/local/lib/python3.2/pydoc.py', ('.py', 'U', 1))
>
> So I think the imp.find_module() is suppose to be used when you *do*
> want to do the full act of importing and not for just finding out if
> or where module xyz exists.

in python 2, find_module was usable for such usage, and this is a needed
api for a tool like pylint. Is there another way to do so with python 3?

--
Sylvain Thénault                               LOGILAB, Paris (France)
Formations Python, Debian, Méth.
Agiles: http://www.logilab.fr/formations
Développement logiciel sur mesure: http://www.logilab.fr/services
CubicWeb, the semantic web framework: http://www.cubicweb.org

From cornsea at gmail.com  Tue Nov 30 09:41:19 2010
From: cornsea at gmail.com (haiyang kang)
Date: Tue, 30 Nov 2010 16:41:19 +0800
Subject: [Python-Dev] Python and the Unicode Character Database
In-Reply-To: <87wrnv43v5.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <87wrnv43v5.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID:

hi,

  I agree with this.

  I never seen any man in China using chinese number literals (at
least two kinds:?, ?, same meaning with 1)
  in Python program, except UI output.

  They can do some mappings when want to output these non-ascii numbers.
  Example: if 1: print "?"

  I think it is a little ugly to have code like this: num =
float("?.?"), expected result is: num = 1.1

br,
khy

On Tue, Nov 30, 2010 at 4:23 PM, Stephen J. Turnbull wrote:
> Lennart Regebro writes:
>
>  > *I* think it is more important. In python 3, you can never ever assume
>  > anything is ASCII any more.
>
> Sure you can. In Python program text, all keywords will be ASCII
> (English, even, though it may be en_NL.UTF-8 ) for the forseeable
> future.
>
> I see no reason not to make a similar promise for numeric literals. I
> see no good reason to allow compatibility full-width Japanese "ASCII"
> numerals or Arabic cursive numerals in "for i in range(...)" for
> example.
>
> As soon as somebody gives an example of a culture, however minor, that
> uses computers but actively prefers to use non-ASCII numerals to
> express numbers in an IT context, I'll review my thinking. But at the
> moment it's 101% YAGNI.
> _______________________________________________
> Python-Dev mailing list
> Python-Dev at python.org
> http://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe: http://mail.python.org/mailman/options/python-dev/cornsea%40gmail.com
>

From ziade.tarek at gmail.com  Tue Nov 30 10:14:20 2010
From: ziade.tarek at gmail.com (=?ISO-8859-1?Q?Tarek_Ziad=E9?=)
Date: Tue, 30 Nov 2010 10:14:20 +0100
Subject: [Python-Dev] PEP 291 versus Python 3
In-Reply-To: <4CF49ACF.6070904@netwok.org>
References: <4CF49ACF.6070904@netwok.org>
Message-ID:

On Tue, Nov 30, 2010 at 7:33 AM, Éric Araujo wrote:
> Good morning python-dev,
>
> PEP 291 (Backward Compatibility for Standard Library) does not seem to
> take Python 3 into account. Is this PEP only relevant for the 2.7
> branch?* If it's supposed to apply to 3.x too, despite the view that
> 3.0 was a clean break, what does it mean to have a module that is
> developed in the py3k branch and should retain compatibility with 2.3 or
> 1.5.2?
>
> * Tarek's interpretation: "The 2.x needs to stay 2.3 compatible
>   so we should keep the 3.x as similar as possible for bugfixes."
>
> In the particular case of distutils (should be compatible with 2.3), we
> (including I) have been lax. Our tests for example use modern unittest
> features like skips, which makes them not runnable on old Pythons. I am
> very uncomfortable with code that seems to run fine but which tests
> (however few) cannot be run, so I think I'll have to trade the skips for
> old-style "return" statements.

You shouldn't be uncomfortable with the current state of distutils and
try to improve its tests (or improve any other nasty stuff you'll find
in that code)

Distutils is dead code. All we have to do is the bare minimum
maintenance. Everything else is a waste of time.

> The other way of solving that is to
> change the compat policy.
> If I remember correctly, the rationale for
> code compat in distutils is that people may copy distutils from Python
> x.y to their install of x.y-n; I don't know if this is still an active
> practice, and if it is, I don't know if it should be supported,
> considering that distutils2 (compatible with 2.4+ and available from
> PyPI) is coming.

Again, don't worry about these rules in Distutils now. The only rule
that now applies to Distutils is that we do only bug fixing, and we
should not waste our precious time to do other stuff in there.

Plain python tests are fine for what we want to do and simplify our
forward ports and backports.

One thing we should do though, is fix those bugs in Distutils2 first
when they exist there too.

I really appreciate all the hard work you are doing in triaging the
issues and bug fixing by the way !

Tarek

From emile.anclin at logilab.fr  Tue Nov 30 10:39:29 2010
From: emile.anclin at logilab.fr (Emile Anclin)
Date: Tue, 30 Nov 2010 10:39:29 +0100
Subject: [Python-Dev] python3k : imp.find_module raises SyntaxError
In-Reply-To:
References: <201011251530.23947.emile.anclin@logilab> <20101129115311.GD18888@lupus.logilab.fr>
Message-ID: <201011301039.30033.emile.anclin@logilab>

On Monday 29 November 2010 20:22:22 Brett Cannon wrote:
>
> Considering these semantics changed between Python 2 and 3 w/o a
> discernable benefit (I would consider it a negative as finding a
> module should not be impacted by syntactic correctness; the full act
> of importing should be the only thing that cares about that), I would
> consider it a bug that should be filed.

ok, here it is : http://bugs.python.org/issue10588

Since I did not understand all of it, I just quoted Brett Cannon in the
ticket.
--
Emile Anclin
http://www.logilab.fr/ http://www.logilab.org/
Informatique scientifique & et gestion de connaissances

From steve at pearwood.info  Tue Nov 30 13:59:49 2010
From: steve at pearwood.info (Steven D'Aprano)
Date: Tue, 30 Nov 2010 23:59:49 +1100
Subject: [Python-Dev] Python and the Unicode Character Database
In-Reply-To:
References: <87wrnv43v5.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: <4CF4F545.5030902@pearwood.info>

haiyang kang wrote:
> hi,
>
>   I agree with this.
>
>   I never seen any man in China using chinese number literals (at
> least two kinds:?, ?, same meaning with 1)
>   in Python program, except UI output.
>
>   They can do some mappings when want to output these non-ascii numbers.
>   Example: if 1: print "?"
>
>   I think it is a little ugly to have code like this: num =
> float("?.?"), expected result is: num = 1.1

I don't expect that anyone would sensibly write code like that, except
for testing. You wouldn't write num = float("1.1") instead of just
num = 1.1 either.

But you should be able to write:

text = input("Enter a number using your preferred digits: ")
num = float(text)

without caring whether the user enters ?.? or 1.1 or something else.

--
Steven

From fuzzyman at voidspace.org.uk  Tue Nov 30 14:09:16 2010
From: fuzzyman at voidspace.org.uk (Michael Foord)
Date: Tue, 30 Nov 2010 13:09:16 +0000
Subject: [Python-Dev] PEP 291 versus Python 3
In-Reply-To: <4CF49ACF.6070904@netwok.org>
References: <4CF49ACF.6070904@netwok.org>
Message-ID: <4CF4F77C.4000308@voidspace.org.uk>

On 30/11/2010 06:33, Éric Araujo wrote:
> Good morning python-dev,
>
> PEP 291 (Backward Compatibility for Standard Library) does not seem to
> take Python 3 into account. Is this PEP only relevant for the 2.7
> branch?* If it's supposed to apply to 3.x too, despite the view that
> 3.0 was a clean break, what does it mean to have a module that is
> developed in the py3k branch and should retain compatibility with 2.3 or
> 1.5.2?
PEP 291 is very old and should probably be retired. I don't think
anyone is maintaining standard libraries in py3k that are also
compatible with Python 2.anything. (At least not in a single codebase.)

For Python 2.7 that may not be true, but for Python 3 I think we can
start with a clean slate on compatibility.

> * Tarek's interpretation: "The 2.x needs to stay 2.3 compatible
>   so we should keep the 3.x as similar as possible for bugfixes."
>
> In the particular case of distutils (should be compatible with 2.3), we
> (including I) have been lax. Our tests for example use modern unittest
> features like skips, which makes them not runnable on old Pythons.

They can be run on old Pythons with unittest2. This is what distutils2
is doing.

> I am
> very uncomfortable with code that seems to run fine but which tests
> (however few) cannot be run, so I think I'll have to trade the skips for
> old-style "return" statements. The other way of solving that is to
> change the compat policy.

This is only an issue for distutils in Python 2.7 right? Maintaining the
compat policy for that will be a short-lived pain, and distutils itself
is getting only infrequent bugfixes *anyway*, right? I defer to Tarek on
that particular decision.

All the best,

Michael

> If I remember correctly, the rationale for
> code compat in distutils is that people may copy distutils from Python
> x.y to their install of x.y-n; I don't know if this is still an active
> practice, and if it is, I don't know if it should be supported,
> considering that distutils2 (compatible with 2.4+ and available from
> PyPI) is coming.
>
> Regards
>
> _______________________________________________
> Python-Dev mailing list
> Python-Dev at python.org
> http://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe: http://mail.python.org/mailman/options/python-dev/fuzzyman%40voidspace.org.uk

--
http://www.voidspace.org.uk/

READ CAREFULLY.
By accepting and reading this email you agree, on behalf of your
employer, to release me from all obligations and waivers arising from
any and all NON-NEGOTIATED agreements, licenses, terms-of-service,
shrinkwrap, clickwrap, browsewrap, confidentiality, non-disclosure,
non-compete and acceptable use policies ("BOGUS AGREEMENTS") that I have
entered into with your employer, its partners, licensors, agents and
assigns, in perpetuity, without prejudice to my ongoing rights and
privileges. You further represent that you have the authority to release
me from any BOGUS AGREEMENTS on behalf of your employer.

From steve at pearwood.info  Tue Nov 30 14:23:22 2010
From: steve at pearwood.info (Steven D'Aprano)
Date: Wed, 01 Dec 2010 00:23:22 +1100
Subject: [Python-Dev] Python and the Unicode Character Database
In-Reply-To: <87wrnv43v5.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <87wrnv43v5.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: <4CF4FACA.8040900@pearwood.info>

Stephen J. Turnbull wrote:
> Lennart Regebro writes:
>
>  > *I* think it is more important. In python 3, you can never ever assume
>  > anything is ASCII any more.
>
> Sure you can. In Python program text, all keywords will be ASCII
> (English, even, though it may be en_NL.UTF-8 ) for the forseeable
> future.
>
> I see no reason not to make a similar promise for numeric literals. I
> see no good reason to allow compatibility full-width Japanese "ASCII"
> numerals or Arabic cursive numerals in "for i in range(...)" for
> example.

I agree with you that numeric *literals* should be restricted to the
ASCII digits. I don't think anyone here is arguing differently -- if
they are, they should speak up and try to make the case for allowing
numeric literals in arbitrary scripts. Python doesn't currently allow
non-ASCII numeric literals, and even if such a change were desirable, it
would run up against the moratorium. So let's just forget the specter of
code like:

x = math.sqrt(????.?? ** ?.?)
It ain't gonna happen :)

But I think there is a good case for allowing the constructors int,
float and complex to continue to accept numeric *strings* with non-ASCII
digits. The code already exists, there's probably people out there who
rely on it, and in the absence of any convincing demonstration that the
existing behaviour is causing widespread difficulty, we should leave
well-enough alone.

Various people have suggested that there should be a function in the
locale module that handles numeric string input in non-ASCII digits.
This is a de facto admission that there are use-cases for taking user
input like the string '?' and turning it into the int 3. Python can
already do this, and has been able to for many years:

[steve at sylar ~]$ python2.4
Python 2.4.6 (#1, Mar 30 2009, 10:08:01)
[GCC 4.1.2 20070925 (Red Hat 4.1.2-27)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> int(u'?')
3

It seems to me that there's no need to move this functionality into
locale.

--
Steven

From solipsis at pitrou.net  Tue Nov 30 14:32:54 2010
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Tue, 30 Nov 2010 14:32:54 +0100
Subject: [Python-Dev] Python and the Unicode Character Database
References: <87wrnv43v5.fsf@uwakimon.sk.tsukuba.ac.jp> <4CF4FACA.8040900@pearwood.info>
Message-ID: <20101130143254.1964e4a8@pitrou.net>

On Wed, 01 Dec 2010 00:23:22 +1100
Steven D'Aprano wrote:
>
> But I think there is a good case for allowing the constructors int,
> float and complex to continue to accept numeric *strings* with non-ASCII
> digits. The code already exists, there's probably people out there who
> rely on it, and in the absence of any convincing demonstration that the
> existing behaviour is causing widespread difficulty, we should leave
> well-enough alone.

+1

> It seems to me that there's no need to move this functionality into locale.

Not only, but moving it into locale won't make it easier to maintain
anyway.

Regards

Antoine.
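
The behaviour under debate can be probed directly. The following is a
minimal sketch under stated assumptions, not code from the thread: it
assumes a current CPython 3, uses escapes instead of literal non-ASCII
characters, and introduces the hypothetical helper try_int. It relies on
UnicodeEncodeError (the exception shown in the 3.2-era tracebacks in this
thread) being a subclass of ValueError, so one except clause covers old
and new interpreters.

```python
# Sketch only: which "digit-like" strings do int() and float() accept?
# `try_int` is a hypothetical helper introduced for this illustration.
import unicodedata

FULLWIDTH_123 = "\uff11\uff12\uff13"  # FULLWIDTH DIGIT ONE..THREE, category Nd
ARABIC_INDIC_1_5 = "\u0661.\u0665"    # ARABIC-INDIC DIGIT ONE, '.', DIGIT FIVE
HAN_ONE = "\u4e00"                    # CJK ideograph for "one", category Lo


def try_int(text):
    """Return int(text), or None if the constructor rejects the string."""
    try:
        return int(text)
    except ValueError:  # also catches UnicodeEncodeError, a ValueError subclass
        return None


results = {
    "fullwidth": try_int(FULLWIDTH_123),            # decimal digits (Nd) are accepted
    "han": try_int(HAN_ONE),                        # Han numerals are not Nd: rejected
    "float": float(ARABIC_INDIC_1_5),               # float() follows the same digit rule
    "han_category": unicodedata.category(HAN_ONE),
    "han_numeric": unicodedata.numeric(HAN_ONE),    # unicodedata still knows the value
}
print(results)
```

The split this shows is the one the posters argue over: the constructors
follow the Nd (decimal digit) category, while other numeric characters
are only reachable through unicodedata.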

From solipsis at pitrou.net  Tue Nov 30 14:38:22 2010
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Tue, 30 Nov 2010 14:38:22 +0100
Subject: [Python-Dev] Module size
References: <20101128214311.092abd35@pitrou.net> <4CF2D4E9.3060607@v.loewis.de> <4CF2F067.5020705@pearwood.info> <4CF354C6.9020302@v.loewis.de> <4CF4248B.1060409@pearwood.info>
Message-ID: <20101130143822.40a827de@pitrou.net>

On Mon, 29 Nov 2010 22:46:33 -0500
Alexander Belopolsky wrote:
>
> In practical terms, UCD comes at a price. The unicodedata module size
> is over 700K on my machine. This is almost half the size of the
> python executable and by far the largest extension module. (only CJK
> encodings come close.) Making builtins depend on the largest
> extension module for operation does not strike me as sound design.

Well, do they depend on it? _PyUnicode_EncodeDecimal seems to depend
only on Objects/unicodectype.c.

$ size Objects/unicode*.o
   text    data     bss     dec     hex filename
  60398       0       0   60398    ebee Objects/unicodectype.o
 130440   13559    2208  146207   23b1f Objects/unicodeobject.o

Antoine.

From alexander.belopolsky at gmail.com  Tue Nov 30 15:18:13 2010
From: alexander.belopolsky at gmail.com (Alexander Belopolsky)
Date: Tue, 30 Nov 2010 09:18:13 -0500
Subject: [Python-Dev] Python and the Unicode Character Database
In-Reply-To: <4CF4F545.5030902@pearwood.info>
References: <87wrnv43v5.fsf@uwakimon.sk.tsukuba.ac.jp> <4CF4F545.5030902@pearwood.info>
Message-ID:

On Tue, Nov 30, 2010 at 7:59 AM, Steven D'Aprano wrote:
..
> But you should be able to write:
>
> text = input("Enter a number using your preferred digits: ")
> num = float(text)
>
> without caring whether the user enters ?.? or 1.1 or something else.
>

I find it ironic that people who argue for preservation of the current
behavior do it without checking what it actually is:

>>> float('?.?')
..
UnicodeEncodeError: 'decimal' codec can't encode character '\u4e00' ..

This is one of the biggest problems with this feature.
It does not fit user's expectations. Even the original author of the
decimal "codec" expected the above to work. [1]

> Python can already do this, and has been able to for many years:
> >>> int(u'?')
> 3

but you can do this without support from int() as well:

>>> import unicodedata
>>> unicodedata.digit('?')
3

and for Unihan numbers, you can do

>>> unicodedata.numeric('?')
1.0

and

>>> unicodedata.numeric('?')
8.0

and if you are so inclined,

>>> [unicodedata.numeric(c) for c in "? ? ? ? ?".split()]
[10000.0, 5000.0, 0.6, 0.875, 90000.0]

Do you want to see all these supported by float()?

[1] "makeunicodedata.py does not support Unihan digit data"
http://bugs.python.org/issue10575

From alexander.belopolsky at gmail.com  Tue Nov 30 15:32:38 2010
From: alexander.belopolsky at gmail.com (Alexander Belopolsky)
Date: Tue, 30 Nov 2010 09:32:38 -0500
Subject: [Python-Dev] Module size
In-Reply-To: <20101130143822.40a827de@pitrou.net>
References: <20101128214311.092abd35@pitrou.net> <4CF2D4E9.3060607@v.loewis.de> <4CF2F067.5020705@pearwood.info> <4CF354C6.9020302@v.loewis.de> <4CF4248B.1060409@pearwood.info> <20101130143822.40a827de@pitrou.net>
Message-ID:

On Tue, Nov 30, 2010 at 8:38 AM, Antoine Pitrou wrote:
> On Mon, 29 Nov 2010 22:46:33 -0500
> Alexander Belopolsky wrote:
>>
>> In practical terms, UCD comes at a price. The unicodedata module size
>> is over 700K on my machine. This is almost half the size of the
>> python executable and by far the largest extension module. (only CJK
>> encodings come close.) Making builtins depend on the largest
>> extension module for operation does not strike me as sound design.
>
> Well, do they depend on it? _PyUnicode_EncodeDecimal seems to depend
> only on Objects/unicodectype.c.

My mistake. That was a late night post. I wonder why unicodedata.so
is so big then.
It must be character names:

$ python -v
>>> '\N{DIGIT ONE}'
dlopen("/.../unicodedata.so", 2);
import unicodedata # dynamically loaded from /.../unicodedata.so
'1'

From solipsis at pitrou.net  Tue Nov 30 15:41:48 2010
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Tue, 30 Nov 2010 15:41:48 +0100
Subject: [Python-Dev] Module size
In-Reply-To:
References: <20101128214311.092abd35@pitrou.net> <4CF2D4E9.3060607@v.loewis.de> <4CF2F067.5020705@pearwood.info> <4CF354C6.9020302@v.loewis.de> <4CF4248B.1060409@pearwood.info> <20101130143822.40a827de@pitrou.net>
Message-ID: <1291128108.3538.10.camel@localhost.localdomain>

On Tuesday 30 November 2010 at 09:32 -0500, Alexander Belopolsky wrote:
> On Tue, Nov 30, 2010 at 8:38 AM, Antoine Pitrou wrote:
> > On Mon, 29 Nov 2010 22:46:33 -0500
> > Alexander Belopolsky wrote:
> >>
> >> In practical terms, UCD comes at a price. The unicodedata module size
> >> is over 700K on my machine. This is almost half the size of the
> >> python executable and by far the largest extension module. (only CJK
> >> encodings come close.) Making builtins depend on the largest
> >> extension module for operation does not strike me as sound design.
> >
> > Well, do they depend on it? _PyUnicode_EncodeDecimal seems to depend
> > only on Objects/unicodectype.c.
>
> My mistake. That was a late night post. I wonder why unicodedata.so
> is so big then.
>
> It must be character names:
>
> $ python -v
> >>> '\N{DIGIT ONE}'
> dlopen("/.../unicodedata.so", 2);
> import unicodedata # dynamically loaded from /.../unicodedata.so
> '1'

From a quick peek using hexdump, character names seem to only account
for 1/4 of the module size.

That said, I don't think the size is very important. For any non-trivial
Python application, the size of unicodedata will be negligible compared
to the size of Python objects.

Regards

Antoine.
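
The character-name lookups mentioned above go through the unicodedata
tables. The following is a small sketch of the equivalent explicit
calls, assuming only the standard unicodedata module; it says nothing
about the module's size on any given platform, which is the point
actually in dispute.

```python
# Sketch only: the name-based lookups that the \N{...} escape relies on,
# written as explicit unicodedata calls.
import unicodedata

by_escape = "\N{DIGIT ONE}"                  # resolved when the source is compiled
by_lookup = unicodedata.lookup("DIGIT ONE")  # same mapping, resolved at run time
name = unicodedata.name("1")                 # reverse direction: character to name
ucd_version = unicodedata.unidata_version    # UCD version this build was generated from

print(by_escape, by_lookup, name, ucd_version)
```

unidata_version also bears on the question raised elsewhere in the
thread about documenting which Unicode version an implementation
follows: it reports the version for the running interpreter.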

From tlesher at gmail.com  Tue Nov 30 15:48:32 2010
From: tlesher at gmail.com (Tim Lesher)
Date: Tue, 30 Nov 2010 09:48:32 -0500
Subject: [Python-Dev] Module size
In-Reply-To: <1291128108.3538.10.camel@localhost.localdomain>
References: <20101128214311.092abd35@pitrou.net> <4CF2D4E9.3060607@v.loewis.de> <4CF2F067.5020705@pearwood.info> <4CF354C6.9020302@v.loewis.de> <4CF4248B.1060409@pearwood.info> <20101130143822.40a827de@pitrou.net> <1291128108.3538.10.camel@localhost.localdomain>
Message-ID:

On Tue, Nov 30, 2010 at 09:41, Antoine Pitrou wrote:
> That said, I don't think the size is very important. For any non-trivial
> Python application, the size of unicodedata will be negligible compared
> to the size of Python objects.

That depends very much on the platform and the application. For our
embedded use of Python, static data size (like the text segment of a
shared object) is far dearer than the heap space used by Python
objects, which is why we've had to excise both the UCD and the CJK
codecs in our builds.

--
Tim Lesher

From cornsea at gmail.com  Tue Nov 30 15:56:33 2010
From: cornsea at gmail.com (haiyang kang)
Date: Tue, 30 Nov 2010 22:56:33 +0800
Subject: [Python-Dev] Python and the Unicode Character Database
In-Reply-To: <4CF4F545.5030902@pearwood.info>
References: <87wrnv43v5.fsf@uwakimon.sk.tsukuba.ac.jp> <4CF4F545.5030902@pearwood.info>
Message-ID:

> But you should be able to write:
>
> text = input("Enter a number using your preferred digits: ")
> num = float(text)
>
> without caring whether the user enters ?.? or 1.1 or something else.

yes. from logical point of view, this can happen.

But i really doubt that if really there are users who would like to
input number like that, means that they first use google pinyin method
to input ?, then change to english input method to input . , then
change to google pinyin again for the other ?; or maybe you mean they
input the whole ?.? words with google pinyin input method.
To input 1, users only need to type one time keyboard, but to input ?,
they need to type three times (yi SPACE).

Of course, users can also input something accidentally, but we just
need to give them some kind reminders.

At least coders in my around will restrain their system users to input
numbers with ASCII, and seems that users are still happy with the
ASCII type numbers :).

br,
khy

From alexander.belopolsky at gmail.com  Tue Nov 30 16:05:42 2010
From: alexander.belopolsky at gmail.com (Alexander Belopolsky)
Date: Tue, 30 Nov 2010 10:05:42 -0500
Subject: [Python-Dev] Python and the Unicode Character Database
In-Reply-To: <4CF41785.5020807@v.loewis.de>
References: <20101128214311.092abd35@pitrou.net> <4CF2D4E9.3060607@v.loewis.de> <4CF2F067.5020705@pearwood.info> <4CF354C6.9020302@v.loewis.de> <20101129193302.115dbcd5@pitrou.net> <4CF41785.5020807@v.loewis.de>
Message-ID:

On Mon, Nov 29, 2010 at 4:13 PM, "Martin v. Löwis" wrote:
>> - Should Python documentation refer to the specific version of Unicode
>> that it supports?
>
> You mean, mention it somewhere? Sure (although it would be nice if the
> documentation generator would automatically extract it from the source,
> just as it extracts the Python version number).
>
> Of course, such mentioning should explain that this is specific to
> CPython, and not an aspect of Python-the-language.
>
>> Current documentation refers to old versions. Should version be
>> updated or removed to imply the latest?
>
> What specific reference are you referring to?
>

I found two places: A reference to Unicode 3.0 (!) in the Data Model
section and a reference to 5.2.0 in unicodedata docs. See
http://mail.python.org/pipermail/docs/2010-November/002074.html

>> - How UCD updates should be handled during the language moratorium?
>
> It's clearly not affected.
> This is not what Guido said last year: """ > One question: > > There are currently number of patch waiting on the tracker for > additional Unicode feature support and it's also likely that we'll > want to upgrade to a more recent Unicode version within the > next few years. > > How would such indirect changes be seen under the moratorium ? That would fall under the Case-by-Case Exemptions section. "Within the next few years" sounds like it might well wait until the moratorium is ended though. :-) """ http://mail.python.org/pipermail/python-dev/2009-November/093666.html I don't see it as a big deal, but technically speaking, with Unicode 6.0 changing properties of two characters to become identifiers Python language definition is affected. For example, an alternative implementation based on 5.2.0 will not accept a valid CPython program that uses one of these characters. >> During PEP 3003 discussion, it was suggested to handle it on a case by >> case basis, but I don't see discussion of the upgrade to 6.0.0 in PEP >> 3003. > > It's covered by "As the standard library is not directly tied to the > language definition it is not covered by this moratorium." > See above. Also, it has been suggested that semantics of built-ins cannot change. (If that was so, it would put int('????') debate to rest at least for the time being.:-) >> ?Should this upgrade be backported to 2.7? > > No, it's a new feature. > Given that 2.7 will be maintained for 5 years and arguably Unicode Consortium takes backward compatibility very seriously, wouldn't it make sense to consider a backport at some point? I am sure we will soon see a bug report that the following does not work in 2.7: :-) >>> ord('\N{CAT FACE WITH WRY SMILE}') 128572 >> - How specific should library reference manual be in defining methods >> affected by UCD such as str.upper()? 
>
> It should specify what this actually does in Unicode terminology
> (probably in addition to a layman's rephrase of that)
>

I opened an issue for this: http://bugs.python.org/issue10587

>> .. For example, if '\UXXXXXXXX'.isalpha() returns true
>> in one implementation, can it return false in another?
>
> Implementations are free to use any version of the UCD.

I was more concerned about wide and narrow Unicode CPython builds. Is it
a bug that '\UXXXXXXXX'.isalpha() may disagree even when the two
implementations are based on the same version of UCD?

Thanks for your answers.

From alexander.belopolsky at gmail.com Tue Nov 30 16:11:24 2010
From: alexander.belopolsky at gmail.com (Alexander Belopolsky)
Date: Tue, 30 Nov 2010 10:11:24 -0500
Subject: [Python-Dev] Python and the Unicode Character Database
In-Reply-To:
References: <87wrnv43v5.fsf@uwakimon.sk.tsukuba.ac.jp> <4CF4F545.5030902@pearwood.info>
Message-ID:

On Tue, Nov 30, 2010 at 9:56 AM, haiyang kang wrote:
>> But you should be able to write:
>>
>> text = input("Enter a number using your preferred digits: ")
>> num = float(text)
>>
>> without caring whether the user enters ?.? or 1.1 or something else.
>
> yes. from logical point of view, this can happen.
...

Please stop discussing a non-feature. Python's float *does not*
accept ' ?.?'. This was reported as a bug and closed as invalid.

See "makeunicodedata.py does not support Unihan digit data"
http://bugs.python.org/issue10575

From barry at python.org Tue Nov 30 16:35:31 2010
From: barry at python.org (Barry Warsaw)
Date: Tue, 30 Nov 2010 10:35:31 -0500
Subject: [Python-Dev] PEP 291 versus Python 3
In-Reply-To: <4CF4F77C.4000308@voidspace.org.uk>
References: <4CF49ACF.6070904@netwok.org> <4CF4F77C.4000308@voidspace.org.uk>
Message-ID: <20101130103531.54d79465@mission>

On Nov 30, 2010, at 01:09 PM, Michael Foord wrote:

>PEP 291 is very old and should probably be retired.
I don't think anyone is
>maintaining standard libraries in py3k that are also compatible with Python
>2.anything. (At least not in a single codebase.)

I agree. I think we should change the status of PEP 291 to Final, and add a
few words to make it clear it applies only to Python 2. Since Neal owns the
PEP, he should get first crack at doing the update, but I volunteer to make
those changes if he declines (or does not respond).

We may eventually need a similar document for Python 3, but it should be a new
PEP.

-Barry

From stefan-usenet at bytereef.org Tue Nov 30 16:55:19 2010
From: stefan-usenet at bytereef.org (Stefan Krah)
Date: Tue, 30 Nov 2010 16:55:19 +0100
Subject: [Python-Dev] Python and the Unicode Character Database
In-Reply-To:
References: <87wrnv43v5.fsf@uwakimon.sk.tsukuba.ac.jp> <4CF4F545.5030902@pearwood.info>
Message-ID: <20101130155519.GA23354@yoda.bytereef.org>

Alexander Belopolsky wrote:
> On Tue, Nov 30, 2010 at 9:56 AM, haiyang kang wrote:
> >> But you should be able to write:
> >>
> >> text = input("Enter a number using your preferred digits: ")
> >> num = float(text)
> >>
> >> without caring whether the user enters ?.? or 1.1 or something else.
> >
> > yes. from logical point of view, this can happen.
> ...
>
> Please stop discussing a non-feature. Python's float *does not*
> accept ' ?.?'. This was reported as a bug and closed as invalid.

That seems irrelevant to me. One of the main topics of this thread is
whether actual native speakers would be happy with ascii-only input for
float(). haiyang kang confirmed that this is the case.

I hope that more local speakers will contribute their views.
Stefan Krah From alexander.belopolsky at gmail.com Tue Nov 30 17:40:19 2010 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Tue, 30 Nov 2010 11:40:19 -0500 Subject: [Python-Dev] Python and the Unicode Character Database In-Reply-To: References: <20101128214311.092abd35@pitrou.net> <4CF2D4E9.3060607@v.loewis.de> <4CF2F067.5020705@pearwood.info> <4CF354C6.9020302@v.loewis.de> <20101129193302.115dbcd5@pitrou.net> Message-ID: On Mon, Nov 29, 2010 at 2:38 PM, Alexander Belopolsky wrote: .. >> Still, if it's not detrimental and it it's not difficult to support, >> then why do you care? > > It is difficult to support. ?A fix for issue10557 would be much > simpler if we did not support non-European digits. ?I now added a > patch that handles non-ascii digits, so you can see what's involved. > Note that when Unicode Consortium inevitably adds more Nd characters > to the non-BMP planes, we will have to add surrogate pairs' support to > this code. > It turns out that this did in fact happen: # Newly assigned in Unicode 3.1.0 (March, 2001) .. 1D7CE..1D7FF ; 3.1 # [50] MATHEMATICAL BOLD DIGIT ZERO..MATHEMATICAL MONOSPACE DIGIT NINE See http://unicode.org/Public/UNIDATA/DerivedAge.txt And of course, >>> unicodedata.digit('\U0001D7CE') 0 but >>> int('\U0001D7CE') .. UnicodeEncodeError: 'decimal' codec can't encode character '\ud835' .. on a narrow Unicode build. (Note the character reported in the error message!) 
If you think non-ASCII digits are not difficult to support, please contribute to the following tracker issues: http://bugs.python.org/issue10581 (Review and document string format accepted in numeric data type constructors) http://bugs.python.org/issue10557 (Malformed error message from float()) http://bugs.python.org/issue10435 (Document unicode C-API in reST - Specifically, PyUnicode_EncodeDecimal) http://bugs.python.org/issue8646 (PyUnicode_EncodeDecimal is undocumented) http://bugs.python.org/issue6632 (Include more fullwidth chars in the decimal codec) and back to the issue of user confusion http://bugs.python.org/issue652104 [closed/invalid] (int(u"\u1234") raises UnicodeEncodeError by Guido van Rossum) From fuzzyman at voidspace.org.uk Tue Nov 30 18:40:52 2010 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Tue, 30 Nov 2010 17:40:52 +0000 Subject: [Python-Dev] Python and the Unicode Character Database In-Reply-To: References: <20101128214311.092abd35@pitrou.net> <4CF2D4E9.3060607@v.loewis.de> <4CF2F067.5020705@pearwood.info> <4CF354C6.9020302@v.loewis.de> <20101129193302.115dbcd5@pitrou.net> Message-ID: <4CF53724.8090000@voidspace.org.uk> On 30/11/2010 16:40, Alexander Belopolsky wrote: > [snip...] > And of course, > >>>> unicodedata.digit('\U0001D7CE') > 0 > > but > >>>> int('\U0001D7CE') > .. > UnicodeEncodeError: 'decimal' codec can't encode character '\ud835' .. > > on a narrow Unicode build. (Note the character reported in the error message!) > > > If you think non-ASCII digits are not difficult to support, please > contribute to the following tracker issues: > Would moving this functionality to the locale module make the issues any easier to fix? 
Michael

--
http://www.voidspace.org.uk/
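The '\ud835' reported in the quoted UnicodeEncodeError is the high (lead) surrogate of U+1D7CE: a narrow Unicode build stores a non-BMP character as a UTF-16 surrogate pair, so the decimal codec sees two code units instead of one digit. A minimal sketch of that arithmetic follows; the helper name surrogate_pair is hypothetical and is not part of CPython.

```python
def surrogate_pair(code_point):
    """Split a non-BMP code point into its UTF-16 (high, low) surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points above U+FFFF have surrogate pairs")
    offset = code_point - 0x10000       # 20-bit value to distribute
    high = 0xD800 + (offset >> 10)      # top 10 bits go to the lead surrogate
    low = 0xDC00 + (offset & 0x3FF)     # bottom 10 bits go to the trail surrogate
    return high, low


# U+1D7CE is MATHEMATICAL BOLD DIGIT ZERO, the character from the example.
high, low = surrogate_pair(0x1D7CE)
print(hex(high), hex(low))  # prints: 0xd835 0xdfce
```

Any code that walks such a string one code unit at a time on a narrow build has to recombine the pair before it can look the character up in the UCD.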
From alexander.belopolsky at gmail.com Tue Nov 30 19:21:30 2010
From: alexander.belopolsky at gmail.com (Alexander Belopolsky)
Date: Tue, 30 Nov 2010 13:21:30 -0500
Subject: [Python-Dev] Python and the Unicode Character Database
In-Reply-To: <4CF53724.8090000@voidspace.org.uk>
References: <20101128214311.092abd35@pitrou.net> <4CF2D4E9.3060607@v.loewis.de> <4CF2F067.5020705@pearwood.info> <4CF354C6.9020302@v.loewis.de> <20101129193302.115dbcd5@pitrou.net> <4CF53724.8090000@voidspace.org.uk>
Message-ID:

On Tue, Nov 30, 2010 at 12:40 PM, Michael Foord wrote:
..
>> If you think non-ASCII digits are not difficult to support, please
>> contribute to the following tracker issues:
>>
>
> Would moving this functionality to the locale module make the issues any
> easier to fix?
>

Sure, if we code it in Python, supporting it will be much easier:

    def normalize_digits(s):
        digits = {m.group(1) for m in re.finditer(r'(\d)', s)}
        trtab = {ord(d): str(unicodedata.digit(d)) for d in digits}
        return s.translate(trtab)

    >>> normalize_digits('????.??')
    '1234.56'

I am not sure this belongs to the locale module, however. It seems to
me, something like 'unicodealgo' for unicode algorithms would be more
appropriate.
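For readers who want to try the idea, here is a self-contained, regex-free variant of the same approach. It is a sketch, not the code from the message: it uses unicodedata.decimal() with a default so that only true decimal digits (category Nd) are mapped, and the Arabic-Indic sample input is an assumption chosen for illustration, since the digits in the archived example were lost.

```python
import unicodedata


def normalize_digits(text):
    """Return text with each Unicode decimal digit replaced by its ASCII digit."""
    table = {}
    for ch in set(text):
        # decimal() returns the default (None) for anything that is not
        # a decimal digit, e.g. letters, punctuation or superscripts.
        value = unicodedata.decimal(ch, None)
        if value is not None:
            table[ord(ch)] = str(value)
    return text.translate(table)


# Illustrative input: ARABIC-INDIC DIGIT ONE..SIX around an ASCII full stop.
sample = "\u0661\u0662\u0663\u0664.\u0665\u0666"
print(normalize_digits(sample))  # prints: 1234.56
```

Characters that merely carry a digit value without being decimal digits, such as SUPERSCRIPT TWO, are left untouched by this variant.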
From solipsis at pitrou.net Tue Nov 30 19:29:52 2010
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Tue, 30 Nov 2010 19:29:52 +0100
Subject: [Python-Dev] Python and the Unicode Character Database
In-Reply-To:
References: <20101128214311.092abd35@pitrou.net> <4CF2D4E9.3060607@v.loewis.de> <4CF2F067.5020705@pearwood.info> <4CF354C6.9020302@v.loewis.de> <20101129193302.115dbcd5@pitrou.net> <4CF53724.8090000@voidspace.org.uk>
Message-ID: <1291141792.8628.0.camel@localhost.localdomain>

> Sure, if we code it in Python, supporting it will be much easier:
>
>     def normalize_digits(s):
>         digits = {m.group(1) for m in re.finditer(r'(\d)', s)}
>         trtab = {ord(d): str(unicodedata.digit(d)) for d in digits}
>         return s.translate(trtab)
>
>     >>> normalize_digits('????.??')
>     '1234.56'
>
> I am not sure this belongs to the locale module, however. It seems to
> me, something like 'unicodealgo' for unicode algorithms would be more
> appropriate.

It could simply be in unicodedata if you split the implementation into a
core C part and some Python bits.

Regards

Antoine.

From alexander.belopolsky at gmail.com Tue Nov 30 19:59:29 2010
From: alexander.belopolsky at gmail.com (Alexander Belopolsky)
Date: Tue, 30 Nov 2010 13:59:29 -0500
Subject: [Python-Dev] Python and the Unicode Character Database
In-Reply-To: <1291141792.8628.0.camel@localhost.localdomain>
References: <20101128214311.092abd35@pitrou.net> <4CF2D4E9.3060607@v.loewis.de> <4CF2F067.5020705@pearwood.info> <4CF354C6.9020302@v.loewis.de> <20101129193302.115dbcd5@pitrou.net> <4CF53724.8090000@voidspace.org.uk> <1291141792.8628.0.camel@localhost.localdomain>
Message-ID:

On Tue, Nov 30, 2010 at 1:29 PM, Antoine Pitrou wrote:
..
>> I am not sure this belongs to the locale module, however. It seems to
>> me, something like 'unicodealgo' for unicode algorithms would be more
>> appropriate.
>
> It could simply be in unicodedata if you split the implementation into a
> core C part and some Python bits.
> Splitting unicodedata may not be a bad idea. There are many more pieces in UCD than covered by unicodedata. [1] Hardcoding them all into unicodedata module is hard to justify, but some are quite useful. For example, PropertyValueAliases.txt is quite useful for those like myself who cannot remember what Pd or Zl category names stand for. SpecialCasing.txt is required for proper casing, but is not currently included in Python. I would not want to change str.upper or str.title because of this, but providing the raw info to someone who wants to implement proper case mappings may not be a bad idea. Blocks.txt is certainly useful for any language-dependent processing. On the other hand, I think we should keep Unicode data and Unicode algorithms separate. And the latter may not even belong to the Python stdlib. [1] http://unicode.org/Public/UNIDATA/ From martin at v.loewis.de Tue Nov 30 20:13:01 2010 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Tue, 30 Nov 2010 20:13:01 +0100 Subject: [Python-Dev] Python and the Unicode Character Database In-Reply-To: References: <20101128214311.092abd35@pitrou.net> <4CF2D4E9.3060607@v.loewis.de> <4CF2F067.5020705@pearwood.info> <4CF354C6.9020302@v.loewis.de> <20101129193302.115dbcd5@pitrou.net> <4CF41785.5020807@v.loewis.de> Message-ID: <4CF54CBD.9030703@v.loewis.de> Am 30.11.2010 09:15, schrieb Hagen F?rstenau: >>> During PEP 3003 discussion, it was suggested to handle it on a case by >>> case basis, but I don't see discussion of the upgrade to 6.0.0 in PEP >>> 3003. >> >> It's covered by "As the standard library is not directly tied to the >> language definition it is not covered by this moratorium." > > How is this restricted to the stdlib if it defines the set of valid > identifiers? The language does not change. The language specification says Python 3.0 introduces additional characters from outside the ASCII range (see PEP 3131). 
For these characters, the classification uses the version of the Unicode
Character Database as included in the unicodedata module.

That remains unchanged. It was a deliberate design decision of PEP 3131
to not codify a fixed set of characters that can be used in identifiers.

Regards,
Martin

From martin at v.loewis.de Tue Nov 30 20:16:49 2010
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Tue, 30 Nov 2010 20:16:49 +0100
Subject: [Python-Dev] Python and the Unicode Character Database
In-Reply-To: <4CF53724.8090000@voidspace.org.uk>
References: <20101128214311.092abd35@pitrou.net> <4CF2D4E9.3060607@v.loewis.de> <4CF2F067.5020705@pearwood.info> <4CF354C6.9020302@v.loewis.de> <20101129193302.115dbcd5@pitrou.net> <4CF53724.8090000@voidspace.org.uk>
Message-ID: <4CF54DA1.5080900@v.loewis.de>

> Would moving this functionality to the locale module make the issues any
> easier to fix?

You could delegate it to the C library, so: yes.

Regards,
Martin

From solipsis at pitrou.net Tue Nov 30 20:23:13 2010
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Tue, 30 Nov 2010 20:23:13 +0100
Subject: [Python-Dev] Python and the Unicode Character Database
In-Reply-To: <4CF54DA1.5080900@v.loewis.de>
References: <20101128214311.092abd35@pitrou.net> <4CF2D4E9.3060607@v.loewis.de> <4CF2F067.5020705@pearwood.info> <4CF354C6.9020302@v.loewis.de> <20101129193302.115dbcd5@pitrou.net> <4CF53724.8090000@voidspace.org.uk> <4CF54DA1.5080900@v.loewis.de>
Message-ID: <1291144993.8628.1.camel@localhost.localdomain>

On Tuesday 30 November 2010 at 20:16 +0100, "Martin v. Löwis" wrote:
> > Would moving this functionality to the locale module make the issues any
> > easier to fix?
>
> You could delegate it to the C library, so: yes.

I hope you don't suggest delegating it to the C locale functions.
Do you?
From martin at v.loewis.de Tue Nov 30 20:40:54 2010 From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=) Date: Tue, 30 Nov 2010 20:40:54 +0100 Subject: [Python-Dev] Python and the Unicode Character Database In-Reply-To: <1291144993.8628.1.camel@localhost.localdomain> References: <20101128214311.092abd35@pitrou.net> <4CF2D4E9.3060607@v.loewis.de> <4CF2F067.5020705@pearwood.info> <4CF354C6.9020302@v.loewis.de> <20101129193302.115dbcd5@pitrou.net> <4CF53724.8090000@voidspace.org.uk> <4CF54DA1.5080900@v.loewis.de> <1291144993.8628.1.camel@localhost.localdomain> Message-ID: <4CF55346.1040108@v.loewis.de> Am 30.11.2010 20:23, schrieb Antoine Pitrou: > Le mardi 30 novembre 2010 ? 20:16 +0100, "Martin v. L?wis" a ?crit : >>> Would moving this functionality to the locale module make the issues any >>> easier to fix? >> >> You could delegate it to the C library, so: yes. > > I hope you don't suggest delegating it to the C locale functions. > Do you? Yes, I do. Why do you hope I don't? Regards, Martin From brett at python.org Tue Nov 30 20:41:47 2010 From: brett at python.org (Brett Cannon) Date: Tue, 30 Nov 2010 11:41:47 -0800 Subject: [Python-Dev] python3k : imp.find_module raises SyntaxError In-Reply-To: References: <201011251530.23947.emile.anclin@logilab> <4CEE9B72.1070002@ronadam.com> <20101129115311.GD18888@lupus.logilab.fr> Message-ID: On Mon, Nov 29, 2010 at 12:21, Ron Adam wrote: > > > On 11/29/2010 01:22 PM, Brett Cannon wrote: >> >> On Mon, Nov 29, 2010 at 03:53, Sylvain Th?nault >> ?wrote: >>> >>> On 25 novembre 11:22, Ron Adam wrote: >>>> >>>> On 11/25/2010 08:30 AM, Emile Anclin wrote: >>>>> >>>>> hello, >>>>> >>>>> working on Pylint, we have a lot of voluntary corrupted files to test >>>>> Pylint behavior; for instance >>>>> >>>>> $ cat /home/emile/var/pylint/test/input/func_unknown_encoding.py >>>>> # -*- coding: IBO-8859-1 -*- >>>>> """ check correct unknown encoding declaration >>>>> """ >>>>> >>>>> __revision__ = '????' 
>>>>> >>>>> >>>>> and we try to find that module : >>>>> find_module('func_unknown_encoding', None). But python3 raises >>>>> SyntaxError >>>>> in that case ; it didn't raise SyntaxError on python2 nor does so on >>>>> our >>>>> func_nonascii_noencoding and func_wrong_encoding modules (with obvious >>>>> names) >>>>> >>>>> Python 3.2a2 (r32a2:84522, Sep 14 2010, 15:22:36) >>>>> [GCC 4.3.4] on linux2 >>>>> Type "help", "copyright", "credits" or "license" for more information. >>>>>>> >>>>>>> >from imp import find_module >>>>>>>> >>>>>>>> find_module('func_unknown_encoding', None) >>>>> >>>>> Traceback (most recent call last): >>>>> ? File " ", line 1, in >>>>> SyntaxError: encoding problem: with BOM >>>> >>>> I don't think there is a clear reason by design. ?Also try importing >>>> the same modules directly and noting the differences in the errors >>>> you get. >>> >>> IMO the point is that we can consider as a bug the fact that find_module >>> tries to somewhat read the content of the file, no? Though it seems to >>> only >>> doing this for encoding detection or like since find_module doesn't choke >>> on >>> a module containing another kind of syntax error. >>> >>> So the question is, should we deal with this in pylint/astng, or can we >>> expect >>> this to be fixed at some point? >> >> Considering these semantics changed between Python 2 and 3 w/o a >> discernable benefit (I would consider it a negative as finding a >> module should not be impacted by syntactic correctness; the full act >> of importing should be the only thing that cares about that), I would >> consider it a bug that should be filed. > > The output of imp.find_module() returns an open file io object, and it's > output feeds directly into to imp.load_module(). 
> >>>> imp.find_module('pydoc') > (<_io.TextIOWrapper name=4 encoding='utf-8'>, > '/usr/local/lib/python3.2/pydoc.py', ('.py', 'U', 1)) > > So I think the imp.find_module() is suppose to be used when you *do* want to > do the full act of importing and not for just finding out if or where module > xyz exists. Going with your line of argument, why can't imp.load_module be the call that figures out there is a syntax error? If you look at this from the perspective of PEP 302, finding a module has absolutely nothing to do with the validity of the found source, just that something was found somewhere which (hopefully) contains code that represents the module. From solipsis at pitrou.net Tue Nov 30 20:44:14 2010 From: solipsis at pitrou.net (Antoine Pitrou) Date: Tue, 30 Nov 2010 20:44:14 +0100 Subject: [Python-Dev] Python and the Unicode Character Database In-Reply-To: <4CF55346.1040108@v.loewis.de> References: <20101128214311.092abd35@pitrou.net> <4CF2D4E9.3060607@v.loewis.de> <4CF2F067.5020705@pearwood.info> <4CF354C6.9020302@v.loewis.de> <20101129193302.115dbcd5@pitrou.net> <4CF53724.8090000@voidspace.org.uk> <4CF54DA1.5080900@v.loewis.de> <1291144993.8628.1.camel@localhost.localdomain> <4CF55346.1040108@v.loewis.de> Message-ID: <1291146254.8628.4.camel@localhost.localdomain> Le mardi 30 novembre 2010 ? 20:40 +0100, "Martin v. L?wis" a ?crit : > Am 30.11.2010 20:23, schrieb Antoine Pitrou: > > Le mardi 30 novembre 2010 ? 20:16 +0100, "Martin v. L?wis" a ?crit : > >>> Would moving this functionality to the locale module make the issues any > >>> easier to fix? > >> > >> You could delegate it to the C library, so: yes. > > > > I hope you don't suggest delegating it to the C locale functions. > > Do you? > > Yes, I do. Why do you hope I don't? Because we all know how locale is a pile of cr*p, both in specification and in implementations. Our unit tests for it are a clear proof of that. 
Actually, I remember you saying that locale should ideally be replaced
with a wrapper around the ICU library.

Regards

Antoine.

From brett at python.org Tue Nov 30 20:46:07 2010
From: brett at python.org (Brett Cannon)
Date: Tue, 30 Nov 2010 11:46:07 -0800
Subject: [Python-Dev] python3k : imp.find_module raises SyntaxError
In-Reply-To: <20101130083418.GB4157@lupus.logilab.fr>
References: <201011251530.23947.emile.anclin@logilab> <4CEE9B72.1070002@ronadam.com> <20101129115311.GD18888@lupus.logilab.fr> <20101130083418.GB4157@lupus.logilab.fr>
Message-ID:

On Tue, Nov 30, 2010 at 00:34, Sylvain Thénault wrote:
> On 29 November 14:21, Ron Adam wrote:
>> On 11/29/2010 01:22 PM, Brett Cannon wrote:
>> >Considering these semantics changed between Python 2 and 3 w/o a
>> >discernable benefit (I would consider it a negative as finding a
>> >module should not be impacted by syntactic correctness; the full act
>> >of importing should be the only thing that cares about that), I would
>> >consider it a bug that should be filed.
>>
>> The output of imp.find_module() returns an open file io object, and
>> it's output feeds directly into to imp.load_module().
>>
>> >>> imp.find_module('pydoc')
>> (<_io.TextIOWrapper name=4 encoding='utf-8'>,
>> '/usr/local/lib/python3.2/pydoc.py', ('.py', 'U', 1))
>>
>> So I think the imp.find_module() is suppose to be used when you *do*
>> want to do the full act of importing and not for just finding out if
>> or where module xyz exists.
>
> in python 2, find_module was usable for such usage, and this is a needed api
> for a tool like pylint. Is there another way to do so with python 3?

At the moment, no. Best option would be to create an
importlib.find_module function which returns a loader if the module is
found, else returns None. The loader can have its get_source method
called to read the source code (w/o verification). I have this planned
for Python 3.3 but not 3.2 with us so close to 3.2b1.

From martin at v.loewis.de Tue Nov 30 20:55:52 2010
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Tue, 30 Nov 2010 20:55:52 +0100
Subject: [Python-Dev] Python and the Unicode Character Database
In-Reply-To: <1291146254.8628.4.camel@localhost.localdomain>
References: <20101128214311.092abd35@pitrou.net> <4CF2D4E9.3060607@v.loewis.de> <4CF2F067.5020705@pearwood.info> <4CF354C6.9020302@v.loewis.de> <20101129193302.115dbcd5@pitrou.net> <4CF53724.8090000@voidspace.org.uk> <4CF54DA1.5080900@v.loewis.de> <1291144993.8628.1.camel@localhost.localdomain> <4CF55346.1040108@v.loewis.de> <1291146254.8628.4.camel@localhost.localdomain>
Message-ID: <4CF556C8.9010704@v.loewis.de>

> Because we all know how locale is a pile of cr*p, both in specification
> and in implementations. Our unit tests for it are a clear proof of that.

I wouldn't use expletives, but rather claim that the locale module is
highly platform-dependent.

> Actually, I remember you saying that locale should ideally be replaced
> with a wrapper around the ICU library.

By that, I stand - however, I have given up the hope that this will
happen anytime soon.

Wrt. to local number parsing, I think that the locale module would be
way better than the nonsense that Python currently does. In the locale
module, somebody at least has thought about what specifically
constitutes a number. The current not-ASCII-but-not-local-either
approach is just useless.
Maintaining a reasonable implementation is a burden, so deferring to the C library is more attractive than having to maintain an unreasonable implementation. Regards, Martin From solipsis at pitrou.net Tue Nov 30 21:11:59 2010 From: solipsis at pitrou.net (Antoine Pitrou) Date: Tue, 30 Nov 2010 21:11:59 +0100 Subject: [Python-Dev] Python and the Unicode Character Database In-Reply-To: <4CF556C8.9010704@v.loewis.de> References: <20101128214311.092abd35@pitrou.net> <4CF2D4E9.3060607@v.loewis.de> <4CF2F067.5020705@pearwood.info> <4CF354C6.9020302@v.loewis.de> <20101129193302.115dbcd5@pitrou.net> <4CF53724.8090000@voidspace.org.uk> <4CF54DA1.5080900@v.loewis.de> <1291144993.8628.1.camel@localhost.localdomain> <4CF55346.1040108@v.loewis.de> <1291146254.8628.4.camel@localhost.localdomain> <4CF556C8.9010704@v.loewis.de> Message-ID: <1291147919.8628.12.camel@localhost.localdomain> Le mardi 30 novembre 2010 ? 20:55 +0100, "Martin v. L?wis" a ?crit : > Wrt. to local number parsing, I think that the locale module would be > way better than the nonsense that Python currently does. In the locale > module, somebody at least has thought about what specifically > constitutes a number. The current not-ASCII-but-not-local-either > approach is just useless. It depends what you need. If you parse integers it's probably good enough. And it's better to have a trustable standard (unicode) than a myriad of ad-hoc, possibly buggy or incomplete, often unavailable, cultural specifications drafted by OS vendors who have no business (and no expertise) in drafting them. At least you can build more sophisticated routines on the simple information given to you by the unicode database. You cannot build anything solid on the C locale functions (and even then you are limited by various issues inherent in the locale semantics, such as the fact that it relies on process-wide state, which would only be ok, at best, for single-user applications). There's a reason that e.g. 
Babel (*) reimplements locale-like functionality from scratch.

(*) http://pypi.python.org/pypi/Babel/

Regards

Antoine.

From brett at python.org Tue Nov 30 21:11:58 2010
From: brett at python.org (Brett Cannon)
Date: Tue, 30 Nov 2010 12:11:58 -0800
Subject: [Python-Dev] PEP 291 versus Python 3
In-Reply-To: <20101130103531.54d79465@mission>
References: <4CF49ACF.6070904@netwok.org> <4CF4F77C.4000308@voidspace.org.uk> <20101130103531.54d79465@mission>
Message-ID:

On Tue, Nov 30, 2010 at 07:35, Barry Warsaw wrote:
> On Nov 30, 2010, at 01:09 PM, Michael Foord wrote:
>
>>PEP 291 is very old and should probably be retired. I don't think anyone is
>>maintaining standard libraries in py3k that are also compatible with Python
>>2.anything. (At least not in a single codebase.)
>
> I agree.

Same here; I have purposefully ignored compatibility requirements
because I always found those promises to be extremely annoying and
somewhat painful to enforce.

> I think we should change the status of PEP 291 to Final, and add a
> few words to make it clear it applies only to Python 2. Since Neal owns the
> PEP, he should get first crack at doing the update, but I volunteer to make
> those changes if he declines (or does not respond).
>

I will channel Neal: "I decline and/or do not want to respond". =)

> We may eventually need a similar document for Python 3, but it should be a new
> PEP.

I hope not.
From solipsis at pitrou.net Tue Nov 30 21:13:07 2010 From: solipsis at pitrou.net (Antoine Pitrou) Date: Tue, 30 Nov 2010 21:13:07 +0100 Subject: [Python-Dev] ICU In-Reply-To: <4CF556C8.9010704@v.loewis.de> References: <20101128214311.092abd35@pitrou.net> <4CF2D4E9.3060607@v.loewis.de> <4CF2F067.5020705@pearwood.info> <4CF354C6.9020302@v.loewis.de> <20101129193302.115dbcd5@pitrou.net> <4CF53724.8090000@voidspace.org.uk> <4CF54DA1.5080900@v.loewis.de> <1291144993.8628.1.camel@localhost.localdomain> <4CF55346.1040108@v.loewis.de> <1291146254.8628.4.camel@localhost.localdomain> <4CF556C8.9010704@v.loewis.de> Message-ID: <1291147987.8628.13.camel@localhost.localdomain> Oh, about ICU: > > Actually, I remember you saying that locale should ideally be replaced > > with a wrapper around the ICU library. > > By that, I stand - however, I have given up the hope that this will > happen anytime soon. Perhaps this could be made a GSOC topic. Regards Antoine. From ben+python at benfinney.id.au Tue Nov 30 21:24:08 2010 From: ben+python at benfinney.id.au (Ben Finney) Date: Wed, 01 Dec 2010 07:24:08 +1100 Subject: [Python-Dev] Python and the Unicode Character Database References: <87wrnv43v5.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <87r5e236hj.fsf@benfinney.id.au> haiyang kang writes: > I think it is a little ugly to have code like this: num = > float("?.?"), expected result is: num = 1.1 That's a straw man, though. The string need not be a literal in the program; it can be input to the program. num = float(input_from_the_external_world) Does that change your assessment of whether non-ASCII digits are used? -- \ ?The greatest tragedy in mankind's entire history may be the | `\ hijacking of morality by religion.? ?Arthur C. 
Clarke, 1991 | _o__) | Ben Finney

From barry at python.org Tue Nov 30 22:05:43 2010
From: barry at python.org (Barry Warsaw)
Date: Tue, 30 Nov 2010 16:05:43 -0500
Subject: [Python-Dev] PEP 291 versus Python 3
In-Reply-To:
References: <4CF49ACF.6070904@netwok.org> <4CF4F77C.4000308@voidspace.org.uk> <20101130103531.54d79465@mission>
Message-ID: <20101130160543.3b478311@mission>

On Nov 30, 2010, at 12:11 PM, Brett Cannon wrote:

>I will channel Neal: "I decline and/or do not want to respond". =)

PEP 291 updated.

-Barry

From tjreedy at udel.edu Tue Nov 30 23:43:22 2010
From: tjreedy at udel.edu (Terry Reedy)
Date: Tue, 30 Nov 2010 17:43:22 -0500
Subject: [Python-Dev] Python and the Unicode Character Database
In-Reply-To: <87wrnv43v5.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <87wrnv43v5.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID:

On 11/30/2010 3:23 AM, Stephen J. Turnbull wrote:
> I see no reason not to make a similar promise for numeric literals. I
> see no good reason to allow compatibility full-width Japanese "ASCII"
> numerals or Arabic cursive numerals in "for i in range(...)" for
> example.

I do not think that anyone, at least not me, has argued for anything
other than 0-9 digits (or 0-f for hex) in literals in program code. The
only issue is whether non-programmer *users* should be able to use their
native digits in applications in response to input prompts.
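The distinction drawn in this thread, between digits typed at an input prompt that float() accepts and numerals that it rejects, can be checked directly in Python 3. The sketch below assumes two sample inputs chosen purely for illustration (the characters in the archived messages were lost): Arabic-Indic digits, which have Unicode category Nd, and the CJK ideograph for "one", which is a letter (category Lo) with a numeric value.

```python
import unicodedata


def try_float(text):
    """Return float(text), or None if float() rejects the string."""
    try:
        return float(text)
    except ValueError:
        return None


# Illustrative inputs only; these are assumptions, not the lost originals.
arabic_indic = "\u0661.\u0661"  # ARABIC-INDIC DIGIT ONE, ASCII '.', same digit
cjk_numeral = "\u4e00.\u4e00"   # CJK ideograph "one", ASCII '.', same ideograph

for label, text in (("Arabic-Indic", arabic_indic), ("CJK", cjk_numeral)):
    # Show the general category of the first character next to the result.
    print(label, unicodedata.category(text[0]), try_float(text))
```

Only characters in category Nd are treated as digits by float() and int(); ideographic numerals would need an application-level conversion step before parsing.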
--
Terry Jan Reedy

From rrr at ronadam.com Tue Nov 30 23:48:56 2010
From: rrr at ronadam.com (Ron Adam)
Date: Tue, 30 Nov 2010 16:48:56 -0600
Subject: [Python-Dev] python3k : imp.find_module raises SyntaxError
In-Reply-To:
References: <201011251530.23947.emile.anclin@logilab>
 <4CEE9B72.1070002@ronadam.com>
 <20101129115311.GD18888@lupus.logilab.fr>
Message-ID:

On 11/30/2010 01:41 PM, Brett Cannon wrote:
> On Mon, Nov 29, 2010 at 12:21, Ron Adam wrote:
>>
>> On 11/29/2010 01:22 PM, Brett Cannon wrote:
>>>
>>> On Mon, Nov 29, 2010 at 03:53, Sylvain Thénault wrote:
>>>>
>>>> On 25 November 11:22, Ron Adam wrote:
>>>>>
>>>>> On 11/25/2010 08:30 AM, Emile Anclin wrote:
>>>>>>
>>>>>> hello,
>>>>>>
>>>>>> working on Pylint, we have a lot of voluntarily corrupted files to test
>>>>>> Pylint behavior; for instance
>>>>>>
>>>>>> $ cat /home/emile/var/pylint/test/input/func_unknown_encoding.py
>>>>>> # -*- coding: IBO-8859-1 -*-
>>>>>> """ check correct unknown encoding declaration
>>>>>> """
>>>>>>
>>>>>> __revision__ = '????'
>>>>>>
>>>>>>
>>>>>> and we try to find that module :
>>>>>> find_module('func_unknown_encoding', None). But python3 raises SyntaxError
>>>>>> in that case ; it didn't raise SyntaxError on python2 nor does so on our
>>>>>> func_nonascii_noencoding and func_wrong_encoding modules (with obvious
>>>>>> names)
>>>>>>
>>>>>> Python 3.2a2 (r32a2:84522, Sep 14 2010, 15:22:36)
>>>>>> [GCC 4.3.4] on linux2
>>>>>> Type "help", "copyright", "credits" or "license" for more information.
>>>>>> >>> from imp import find_module
>>>>>> >>> find_module('func_unknown_encoding', None)
>>>>>> Traceback (most recent call last):
>>>>>>   File "<stdin>", line 1, in <module>
>>>>>> SyntaxError: encoding problem: with BOM
>>>>>
>>>>> I don't think there is a clear reason by design. Also try importing
>>>>> the same modules directly and noting the differences in the errors
>>>>> you get.
>>>>
>>>> IMO the point is that we can consider as a bug the fact that find_module
>>>> tries to somewhat read the content of the file, no? Though it seems to
>>>> only be doing this for encoding detection or the like, since find_module
>>>> doesn't choke on a module containing another kind of syntax error.
>>>>
>>>> So the question is, should we deal with this in pylint/astng, or can we
>>>> expect this to be fixed at some point?
>>>
>>> Considering these semantics changed between Python 2 and 3 w/o a
>>> discernable benefit (I would consider it a negative as finding a
>>> module should not be impacted by syntactic correctness; the full act
>>> of importing should be the only thing that cares about that), I would
>>> consider it a bug that should be filed.
>>
>> imp.find_module() returns an open file io object, and its output
>> feeds directly into imp.load_module().
>>
>> >>> imp.find_module('pydoc')
>> (<_io.TextIOWrapper name=4 encoding='utf-8'>,
>> '/usr/local/lib/python3.2/pydoc.py', ('.py', 'U', 1))
>>
>> So I think imp.find_module() is supposed to be used when you *do* want to
>> do the full act of importing and not for just finding out if or where module
>> xyz exists.
>
> Going with your line of argument, why can't imp.load_module be the
> call that figures out there is a syntax error? If you look at this
> from the perspective of PEP 302, finding a module has absolutely
> nothing to do with the validity of the found source, just that
> something was found somewhere which (hopefully) contains code that
> represents the module.

The part that I'm looking at is: what would find_module return if the
encoding is bad or no codec is found for it?

    <_io.TextIOWrapper name=4 encoding='bad_encoding'>

Maybe we could have some library introspection function in the inspect
module for just looking in the library rather than loading modules.
But I think those would have the same issues, as packages need to be loaded
in order to find sub modules.*


* It almost seems like the concept of a sub-module (in a package) is flawed.
I'm not sure I can explain what causes me to feel that way at the moment
though.

Ron
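
Editorial illustration, not part of the original thread: the split argued for above (locating a module is one step, validating its source encoding is another) can be sketched as follows. This is a minimal sketch under stated assumptions. It relies on `importlib.machinery.PathFinder.find_spec` and `tokenize.detect_encoding`, which come from Python versions later than the 3.2 alpha under discussion, and the helper names `locate_source` and `declared_encoding` as well as the temporary test file are hypothetical.

```python
import importlib.machinery
import os
import tempfile
import tokenize


def locate_source(name, search_dirs):
    """Return the path of module `name` found in `search_dirs`, or None.

    Only directory listings are consulted; the file itself is not opened,
    so a bad coding declaration cannot make this step fail.
    """
    spec = importlib.machinery.PathFinder.find_spec(name, search_dirs)
    return spec.origin if spec is not None else None


def declared_encoding(path):
    """Return the source encoding of `path`, or None if it is unusable.

    tokenize.detect_encoding raises SyntaxError for an unknown or
    inconsistent coding declaration; that is reported here as None.
    """
    try:
        with open(path, "rb") as source:
            encoding, _ = tokenize.detect_encoding(source.readline)
    except SyntaxError:
        return None
    return encoding


# Hypothetical stand-in for the func_unknown_encoding.py test file.
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "func_unknown_encoding.py")
with open(demo_path, "wb") as handle:
    handle.write(b"# -*- coding: IBO-8859-1 -*-\n__revision__ = 'x'\n")

found = locate_source("func_unknown_encoding", [demo_dir])
encoding = declared_encoding(demo_path)
print(found, encoding)
```

With this separation a tool such as a linter could report the location of the file first and treat the unusable encoding as a distinct diagnostic, rather than having the lookup itself raise SyntaxError.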