'" into all your scripts! As for me, i've probably done this hundreds of times now, and would love to stop doing it. I anticipate a possible security concern (as this shows bits of your source code to strangers when problems happen). So i have tried to address that by providing a SECRET flag in cgitb that causes the tracebacks to get written to files instead of the Web browser. Opinions and suggestions are welcomed! (I'm looking at the good stuff that the WebWare people have done with it, and i plan to merge in their improvements. For the HTML-heads out there in particular, i'm looking for your thoughts on the reset() routine.) -- ?!ng From barry@zope.com Tue Jul 31 04:04:03 2001 From: barry@zope.com (Barry A. Warsaw) Date: Mon, 30 Jul 2001 23:04:03 -0400 Subject: [Python-Dev] cgitb.py for Python 2.2 References: Message-ID: <15206.8227.652539.471067@anthem.wooz.org> >>>>> "KY" == Ka-Ping Yee writes: KY> What i'm proposing is that we toss cgitb.py into the standard KY> library (pretty small at about 100 lines, since all the heavy KY> lifting is in pydoc and inspect). Then we can add this to KY> site.py: No time right now to look at it, but I remember it looked pretty cool at IPC9. I'd like to merge in some of the ideas I've developed in Mailman's driver script, which prints out the environment and some other sys information. driver always prints to a log file and optionally to stdout (it has a STEALTH_MODE variable that's probably equivalent to your SECRET). One thing I tried very hard to do was to make driver bulletproof, so that it only imported a very minimal amount of stuff, and that /any/ exception along the way would get caught and not allowed to percolate up out of the top frame (which would cause a non-zero exit status and unhelpful message in the browser). About the only thing that isn't caught are exceptions importing sys, but if that happens you have bigger problems! 
:) I'll take a closer look at cgitb.py when I get a chance, but I'm generally +1 on the idea. -Barry From gnat@oreilly.com Tue Jul 31 04:08:47 2001 From: gnat@oreilly.com (Nathan Torkington) Date: Mon, 30 Jul 2001 20:08:47 -0700 Subject: [Python-Dev] Parrot -- should life imitate satire? In-Reply-To: <20010730205657.A2298@ute.cnri.reston.va.us> References: <20010730051831.B1122@thyrsus.com> <20010731012432.G20676@xs4all.nl> <20010730205657.A2298@ute.cnri.reston.va.us> Message-ID: <15206.8511.147000.832644@gargle.gargle.HOWL> Andrew Kuchling writes: > If regex opcodes form part of the basic VM, would the main loop end up > looking like the union of ceval.c and pypcre.c/_sre.c? The thought is > too ghastly to contemplate, though a little part of me [*] would like > to see it. (perl guy speaking alert) The plan for perl6 is to implement the regular expression engine as opcodes. We feel this would be cleaner and faster than having the essentially separate module that we have right now. I think our current perl5 project manager was the one who said that we have no idea how inefficient our current RE engine is, because it's been "optimized" to the point where it's impossible to read. The core loop would just be the usual opcode dispatch loop ("call the function for the current operation, which returns the next operation"). The only difference is that some of the opcodes would be specific to RE matches. (I'm unclear on how much special logic RE opcodes involve--it may be possible to implement REs with the operations that regular language features like loops and tests require). Nat From gnat@oreilly.com Tue Jul 31 04:12:04 2001 From: gnat@oreilly.com (Nathan Torkington) Date: Mon, 30 Jul 2001 20:12:04 -0700 Subject: [Python-Dev] Parrot -- should life imitate satire? 
In-Reply-To: <20010731012432.G20676@xs4all.nl> References: <20010730051831.B1122@thyrsus.com> <20010731012432.G20676@xs4all.nl> Message-ID: <15206.8708.811000.468489@gargle.gargle.HOWL> Thomas Wouters writes: > Also, the Perl engine has some features SRE hasn't, yet, and vice > versa (last I checked, Perl's regexps didn't do unicode or named > groups.) Perl's REs now do Unicode. Perl 6's REs will do named groups. > And I won't even start with Perl's more archaic features, that > change the whole working of the interpreter. Those are going away. Perl people hate them as much as you do--the only time they're used now is to make deliberately hideous code, and hardly anyone will seriously lament the passing of that ability. No more "change the starting position for subscripts", no more "change all RE matches globally", and so on. Nat From gnat@oreilly.com Tue Jul 31 04:15:34 2001 From: gnat@oreilly.com (Nathan Torkington) Date: Mon, 30 Jul 2001 20:15:34 -0700 Subject: [Python-Dev] Parrot -- should life imitate satire? In-Reply-To: <20010730162901.F9578@ute.cnri.reston.va.us> References: <20010730014859.A15971@thyrsus.com> <200107301918.f6UJIt003517@odiug.digicool.com> <20010730033517.A17356@thyrsus.com> <200107302016.f6UKGoG03676@odiug.digicool.com> <20010730162901.F9578@ute.cnri.reston.va.us> Message-ID: <15206.8918.603000.448728@gargle.gargle.HOWL> Andrew Kuchling writes: > There's also the cultural difference between Python's "write it > clearly and then optimize it" and Perl's "let's write clever optimized > code right from the start". Perhaps this can be bridged, perhaps not. The people designing and implementing perl6 have already agreed on a "do it clean, then make it faster" approach. We can all see the problems with the current Perl internals, and have no desire to repeat the mistakes of the past. There may or may not be an impedance mismatch between the two languages (Perl's flexitypes might be one of the sticking points) but this won't be one of them.
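The core loop described earlier in this thread ("call the function for the current operation, which returns the next operation") can be sketched in a few lines of Python. Everything below is a hypothetical toy, not Parrot's actual design; regex opcodes would simply be more entries in the dispatch table.

```python
# A toy VM where each opcode handler returns the next program counter,
# in the spirit of "call the function for the current operation, which
# returns the next operation".  All names here are hypothetical.

def op_push(vm, pc):
    vm.stack.append(vm.code[pc + 1])    # literal operand follows the opcode
    return pc + 2

def op_add(vm, pc):
    b, a = vm.stack.pop(), vm.stack.pop()
    vm.stack.append(a + b)
    return pc + 1

def op_halt(vm, pc):
    return None                         # tells the loop to stop

DISPATCH = {0: op_push, 1: op_add, 2: op_halt}

class ToyVM:
    def __init__(self, code):
        self.code = code
        self.stack = []

    def run(self):
        pc = 0
        while pc is not None:           # the core dispatch loop
            pc = DISPATCH[self.code[pc]](self, pc)
        return self.stack

# push 2, push 3, add, halt
print(ToyVM([0, 2, 0, 3, 1, 2]).run())   # [5]
```

The point of the return-the-next-operation convention is that nothing in the loop itself knows whether a handler is an arithmetic opcode or a regex-match opcode.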
Nat From esr@thyrsus.com Mon Jul 30 16:51:03 2001 From: esr@thyrsus.com (Eric S. Raymond) Date: Mon, 30 Jul 2001 11:51:03 -0400 Subject: [Python-Dev] cgitb.py for Python 2.2 In-Reply-To: ; from ping@lfw.org on Mon, Jul 30, 2001 at 07:43:45PM -0700 References: Message-ID: <20010730115103.A2052@thyrsus.com> Ka-Ping Yee : > The upside is that we *automagically* get pretty tracebacks for all > the Python CGI scripts there, with zero effort from the CGI script > writers. I think this is a really strong hook for people getting > started with Python. I've been to look at the cgitb page. My jaw dropped open. +1 -- Eric S. Raymond The abortion rights and gun control debates are twin aspects of a deeper question --- does an individual ever have the right to make decisions that are literally life-or-death? And if not the individual, who does? From paulp@ActiveState.com Tue Jul 31 04:49:50 2001 From: paulp@ActiveState.com (Paul Prescod) Date: Mon, 30 Jul 2001 20:49:50 -0700 Subject: [Python-Dev] Parrot -- should life imitate satire? References: <20010730051831.B1122@thyrsus.com> <20010731012432.G20676@xs4all.nl> <20010730205657.A2298@ute.cnri.reston.va.us> Message-ID: <3B662ADD.9E701795@ActiveState.com> Andrew Kuchling wrote: > >... > > If regex opcodes form part of the basic VM, would the main loop end up > looking like the union of ceval.c and pypcre.c/_sre.c? The thought is > too ghastly to contemplate, though a little part of me [*] would like > to see it. Welcome to Perl. :) I don't really understand it but here are references that might help: http://aspn.activestate.com/ASPN/Mail/Message/638953 http://aspn.activestate.com/ASPN/Mail/Message/639000 http://aspn.activestate.com/ASPN/Mail/Message/639048 -- Take a recipe. Leave a recipe. Python Cookbook! 
http://www.ActiveState.com/pythoncookbook From paulp@ActiveState.com Tue Jul 31 05:17:04 2001 From: paulp@ActiveState.com (Paul Prescod) Date: Mon, 30 Jul 2001 21:17:04 -0700 Subject: [Python-Dev] Parrot -- should life imitate satire? References: <007701c11954$6b0017c0$8a73fea9@newmexico> Message-ID: <3B663140.41CB9DD7@ActiveState.com> Samuele Pedroni wrote: > >... > A question: are there already some data about > what would be the actual performance of Python.NET vs. CPython ? I think it is safe to say that the current version of Python.NET is slower than Jython. Now it hasn't been optimized as much as Jython so we might be able to get it as fast as Jython. But I don't think that there is anything in the .NET runtime that makes it a great deal better than the JVM for dynamic languages. The only difference is that Microsoft seems more aware of the problem and may move to correct it whereas I have a feeling that explicit support for our languages would dilute Sun's 100% Java marketing campaign. Also, the .NET CLR is standardized at ECMA so we could (at least in theory!) go to the meetings and try to influence version 2. -- Take a recipe. Leave a recipe. Python Cookbook! http://www.ActiveState.com/pythoncookbook From m@moshez.org Tue Jul 31 05:15:46 2001 From: m@moshez.org (Moshe Zadka) Date: Tue, 31 Jul 2001 07:15:46 +0300 Subject: [Python-Dev] Parrot -- should life imitate satire? In-Reply-To: <20010730051831.B1122@thyrsus.com> References: <20010730051831.B1122@thyrsus.com>, <20010730014859.A15971@thyrsus.com> <200107301918.f6UJIt003517@odiug.digicool.com> <20010730033517.A17356@thyrsus.com> <200107302016.f6UKGoG03676@odiug.digicool.com> Message-ID: On Mon, 30 Jul 2001, "Eric S. Raymond" wrote: > Let's further suppose that we have a callout mechanism from the Parrot > interpreter core to the Perl or Python runtime's C level that can pass out > Python/Perl types and return them. > > Given these two premises, what other problems are there? 
This solution sounds like just taking two VM interpreters and forcing them together by having the first byte of the instruction be "Python opcode" or "Perl opcode". You get none of the wins you were aiming for. > I can see one: garbage collection. How is GC a problem? Python never promised a specific GC mechanism, so as long as you have something which collects garbage, Python is fine. -- gpg --keyserver keyserver.pgp.com --recv-keys 46D01BD6 54C4E1FE Secure (inaccessible): 4BD1 7705 EEC0 260A 7F21 4817 C7FC A636 46D0 1BD6 Insecure (accessible): C5A5 A8FA CA39 AB03 10B8 F116 1713 1BCF 54C4 E1FE Learn Python! http://www.ibiblio.org/obp/thinkCSpy From m@moshez.org Tue Jul 31 05:18:10 2001 From: m@moshez.org (Moshe Zadka) Date: Tue, 31 Jul 2001 07:18:10 +0300 Subject: [Python-Dev] Parrot -- should life imitate satire? In-Reply-To: References: Message-ID: On Mon, 30 Jul 2001, "Steven D. Majewski" wrote: > Scheme48 is probably considered the best portable byte-code Scheme > implementation. ( Don't know anything about it's internals myself ) Last I heard (admittedly, >1 yr. ago), it didn't support 64 bit architectures. -- gpg --keyserver keyserver.pgp.com --recv-keys 46D01BD6 54C4E1FE Secure (inaccessible): 4BD1 7705 EEC0 260A 7F21 4817 C7FC A636 46D0 1BD6 Insecure (accessible): C5A5 A8FA CA39 AB03 10B8 F116 1713 1BCF 54C4 E1FE Learn Python! http://www.ibiblio.org/obp/thinkCSpy From greg@cosc.canterbury.ac.nz Tue Jul 31 06:00:45 2001 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Tue, 31 Jul 2001 17:00:45 +1200 (NZST) Subject: [Python-Dev] Parrot -- should life imitate satire? In-Reply-To: Message-ID: <200107310500.RAA00648@s454.cosc.canterbury.ac.nz> "Steven D. Majewski" : > But then, I've always thought that one of the problems with > trying to optimize Python was that the VM was too high level. No, the problem is that Python is just too darn dynamic! This is a feature of the language, not just the VM. 
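A minimal, purely illustrative example of the dynamism in question: even a plain call to a global function cannot be compiled to a static jump, because the name may be rebound at any time.

```python
# The same bytecode for call_it() must re-look-up the global name
# 'greet' on every call, because it can be rebound at run time.

def greet():
    return "hello"

def call_it():
    return greet()              # global lookup each time -- no static binding

print(call_it())                # hello

greet = lambda: "goodbye"       # rebind the global at run time
print(call_it())                # goodbye
```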
Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From guido@zope.com Tue Jul 31 07:22:13 2001 From: guido@zope.com (Guido van Rossum) Date: Tue, 31 Jul 2001 02:22:13 -0400 Subject: [Python-Dev] cgitb.py for Python 2.2 In-Reply-To: Your message of "Mon, 30 Jul 2001 19:43:45 PDT." References: Message-ID: <200107310622.CAA11742@cj20424-a.reston1.va.home.com> > Sorry i've been fairly quiet recently -- at least life isn't dull. You still have a few SF bugs and patches assigned! How about addressing those?! > I wanted to put in a few words for cgitb.py for your consideration. > > I think you all saw it at IPC 9 -- if you missed the presentation, > there are examples at http://www.lfw.org/python to check out. Yeah, it's cool. > What i'm proposing is that we toss cgitb.py into the standard library > (pretty small at about 100 lines, since all the heavy lifting is in > pydoc and inspect). Then we can add this to site.py: > > if os.environ.has_key("GATEWAY_INTERFACE"): > import sys, cgitb > sys.excepthook = cgitb.excepthook Why not add this to cgi.py instead? The site.py initialization is accumulating a lot of cruft, and I don't like new additions that are irrelevant for most apps (CGI is a tiny niche for Python IMO). (I also think all the stuff that's only for interactive mode should be moved off to another module that is only run in interactive mode.) --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@zope.com Tue Jul 31 07:29:36 2001 From: guido@zope.com (Guido van Rossum) Date: Tue, 31 Jul 2001 02:29:36 -0400 Subject: [Python-Dev] Parrot -- should life imitate satire? In-Reply-To: Your message of "Mon, 30 Jul 2001 21:17:04 PDT."
<3B663140.41CB9DD7@ActiveState.com> References: <007701c11954$6b0017c0$8a73fea9@newmexico> <3B663140.41CB9DD7@ActiveState.com> Message-ID: <200107310629.CAA11818@cj20424-a.reston1.va.home.com> > Also, the .NET CLR is standardized at ECMA so we could (at least in > theory!) go to the meetings and try to influence version 2. Notice the addition "in theory". In practice, this is BS. --Guido van Rossum (home page: http://www.python.org/~guido/) From mal@lemburg.com Tue Jul 31 08:37:27 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Tue, 31 Jul 2001 09:37:27 +0200 Subject: [Python-Dev] Python API version & optional features References: <200107302222.f6UMM5105688@mira.informatik.hu-berlin.de> Message-ID: <3B666037.6A813780@lemburg.com> "Martin v. Loewis" wrote: > > >> I guess one could argue that extension writers should check > >> for narrow/wide builds in their extensions before using Unicode. > >> > >> Since the number of Unicode extension writers is much smaller > >> than the number of users, I think that this apporach would be > >> reasonable, provided that we document the problem clearly in the > >> NEWS file. > > > OK. I approve. > > I'm not sure I can follow. What did you approve? To use macros in unicodeobject.h which then map all interface names to either PyUnicodeUC2_* or PyUnicodeUCS4_*. The linker will then report the mismatch in interfaces. > That extension > writers should check whether their Unicode build matches the one they > get at run-time? How are they going to do that? They would have to use at least one of the PyUnicode_* APIs in their code. I think it would also be a good idea to provide a non-mangled PyUnicode_UnicodeSize() API which would then return the number of bytes occupied by Py_UNICODE of the Python build. 
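From pure Python, the build flavour can at least be detected at run time via `sys.maxunicode` (the C-level link-time check above is for extensions). A small sketch; note that since PEP 393 (Python 3.3) every build reports the full range:

```python
import sys

# sys.maxunicode is 0xFFFF on a narrow (UCS-2) build and 0x10FFFF on a
# wide (UCS-4) build; modern interpreters always report the latter.
if sys.maxunicode == 0xFFFF:
    build = "narrow: non-BMP characters are stored as surrogate pairs"
else:
    build = "wide: one code unit per code point"

print(build)
print(len('\U0001F40D'))   # 2 on a narrow build, 1 on a wide build
```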
-- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Consulting & Company: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From mal@lemburg.com Tue Jul 31 09:14:53 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Tue, 31 Jul 2001 10:14:53 +0200 Subject: [Python-Dev] Python API version & optional features References: <3B655980.948BCDEF@lemburg.com> <15205.25545.353887.299167@cj42289-a.reston1.va.home.com> <3B6567A3.E386EAB9@lemburg.com> <200107301427.f6UERW802779@odiug.digicool.com> <3B65765A.9706A4A2@lemburg.com> <200107301547.f6UFlhB02991@odiug.digicool.com> Message-ID: <3B6668FD.DA986A28@lemburg.com> Guido van Rossum wrote: > > > > Hm, the "u" argument parser is a nasty one to catch. How likely is > > > this to be the *only* reference to Unicode in a particular extension? > > > > It is not very likely but IMHO possible for e.g. extensions > > which rely on the fact that wchar_t == Py_UNICODE and then do > > direct interfacing to some other third party code. > > > > I guess one could argue that extension writers should check > > for narrow/wide builds in their extensions before using Unicode. > > > > Since the number of Unicode extension writers is much smaller > > than the number of users, I think that this apporach would be > > reasonable, provided that we document the problem clearly in the > > NEWS file. > > OK. I approve. Great ! I'll go ahead and fix unicodeobject.h. > > Hmm, that would probably not make UCS-4 builds very popular ;-) > > Do you have any reason to assume that it would be popular otherwise? > :-) :-) :-) Oh, I do hope that people try out the UCS-4 builds. They may not be all that interesting yet, but I believe that for Asian users they do have some advantages. > > > These warnings should use the warnings framework, by the way, to make > > > it easier to ignore a specific warning. Currently it's a hard write > > > to stderr. 
> > > > Using the warnings framework would indeed be a good idea (many older > > extensions work just fine even with later API levels; the warnings > > are annoying, though) ! > > Exactly. > > I'm not going to make the change, but it should be a two-liner in > Python/modsupport.c:Py_InitModule4(). I'll look into this as well. -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Consulting & Company: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From mal@lemburg.com Tue Jul 31 09:30:20 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Tue, 31 Jul 2001 10:30:20 +0200 Subject: [Python-Dev] Revised decimal type PEP References: <0107301106520A.02216@fermi.eeel.nist.gov> Message-ID: <3B666C9C.4400BD9C@lemburg.com> Michael McLay wrote: > > PEP: 2XX > Title: Adding a Decimal type to Python > Version: $Revision:$ > Author: mclay@nist.gov > Status: Draft > Type: ?? > Created: 25-Jul-2001 > Python-Version: 2.2 > > Introduction > > This PEP describes the addition of a decimal number type to Python. > > ... > > Implementation > > The tokenizer will be modified to recognize number literals with > a 'd' suffix and a decimal() function will be added to __builtins__. How will you be able to define the precision of decimals ? Implicit by providing a decimal string with enough 0s to let the parser deduce the precision ? Explicit like so: decimal(12, 5) ? Also, what happens to the precision of the decimal object resulting from numeric operations ? > A decimal number can be used to represent integers and floating point > numbers and decimal numbers can also be displayed using scientific > notation. Examples of decimal numbers include: > > ... > > This proposal will also add an optional 'b' suffix to the > representation of binary float type literals and binary int type > literals. Hmm, I don't quite grasp the need for the 'b'...
numbers without any modifier will work the same way as they do now, right ? > ... > > Expressions that mix binary floats with decimals introduce the > possibility of unexpected results because the two number types use > different internal representations for the same numerical value. I'd rather have this explicit in the sense that you define which assumptions will be made and what issues arise (rounding, truncation, loss of precision, etc.). > The > severity of this problem is dependent on the application domain. For > applications that normally use binary numbers the error may not be > important and the conversion should be done silently. For newbie > programmers a warning should be issued so the newbie will be able to > locate the source of a discrepancy between the expected results and > the results that were achieved. For financial applications the mixing > of floating point with binary numbers should raise an exception. > > To accommodate the three possible usage models the python interpreter > command line options will be used to set the level for warning and > error messages. The three levels are: > > promiscuous mode, -f or --promiscuous > safe mode -s or --safe > pedantic mode -p or --pedantic How about a generic option: --numerics:[loose|safe|pedantic] or -n:[l|s|p] > The default setting will be set to the safe setting. In safe mode > mixing decimal and binary floats in a calculation will trigger a warning > message. > > >>> type(12.3d + 12.2b) > Warning: the calculation mixes decimal numbers with binary floats > > > In promiscuous mode warnings will be turned off. > > >>> type(12.3d + 12.2b) > > > In pedantic mode warnings from safe mode will be turned into exceptions. > > >>> type(12.3d + 12.2b) > Traceback (innermost last): > File " ", line 1, in ? > TypeError: the calculation mixes decimal numbers with binary floats > > Semantics of Decimal Numbers > > ??
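For comparison, the semantics being debated here can be sketched with the decimal module that later entered the standard library (PEP 327, Python 2.4) -- a context-based type rather than the 'd' literal proposed above:

```python
from decimal import Decimal, getcontext

# Decimal literals are exact, unlike binary floats:
assert Decimal('0.1') + Decimal('0.2') == Decimal('0.3')

# Precision lives in an arithmetic context, not in the literal:
getcontext().prec = 5
print(Decimal(1) / Decimal(3))          # 0.33333

# Arithmetic that mixes Decimal with a binary float is refused
# outright, much like the pedantic mode sketched in the draft:
try:
    Decimal('12.3') + 12.2
except TypeError as exc:
    print('refused:', exc)
```

This answers the precision question above in one particular way: precision is a property of the arithmetic context, not of individual values.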
-- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Consulting & Company: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From mal@lemburg.com Tue Jul 31 09:05:14 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Tue, 31 Jul 2001 10:05:14 +0200 Subject: [Python-Dev] pep-discuss References: <20010730154936.AE36899C94@waltz.rahul.net> Message-ID: <3B6666BA.7F774C46@lemburg.com> Aahz Maruch wrote: > > Paul Prescod wrote: > > > > We've talked about having a mailing list for general PEP-related > > discussions. Two things make me think that revisiting this would be a > > good idea right now. > > > > First, the recent loosening up of the python-dev rules threatens the > > quality of discussion about bread and butter issues such as patch > > discussions and process issues. > > > > Second, the flamewar on python-list basically drowned out the usual > > newbie questions and would give a person coming new to Python a very > > negative opinion about the language's future and the friendliness of the > > community. I would rather redirect as much as possible of that to a list > > that only interested participants would have to endure. > > While what you say makes sense, overall, there are a lot of people (me > included) who prefer discussion on newsgroups, and I can't quite see > creating a newsgroup for PEP discussions yet. Call me -0.25 for kicking > discussion off c.l.py and +0.25 for getting it off python-dev. I don't really mind having PEP discussions on both c.l.p (to get user feedback) and python-dev (for the purpose of reaching consensus). After all, python-dev is about developing Python, so PEP discussion is very much on topic. Note that a filter on "python-dev" in the List-ID field and "PEP" in the subject should pretty much filter out all PEP discussions from python-dev if you don't want to participate in them. 
-- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Consulting & Company: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From paulp@ActiveState.com Tue Jul 31 09:47:03 2001 From: paulp@ActiveState.com (Paul Prescod) Date: Tue, 31 Jul 2001 01:47:03 -0700 Subject: [Python-Dev] Parrot -- should life imitate satire? References: <007701c11954$6b0017c0$8a73fea9@newmexico> <3B663140.41CB9DD7@ActiveState.com> <200107310629.CAA11818@cj20424-a.reston1.va.home.com> Message-ID: <3B667087.EBBE8938@ActiveState.com> Guido van Rossum wrote: > > > Also, the .NET CLR is standardized at ECMA so we could (at least in > > theory!) go to the meetings and try to influence version 2. > > Notice the addition "in theory". In practice, this is BS. It depends on the rules and politics of each particular standards group. It is fundamentally a social activity. It also depends how much effort you are willing to put into promoting your cause. Sam Ruby is chair of the ECMA CLI group. He is a big scripting language fan. http://www2.hursley.ibm.com/tc39/ Also note the presence of Mike Cowlishaw of REXX fame and Dave Raggett of the W3C. Working within a standards body is a gamble. It can pay off big or it can completely fail. We might find Microsoft our strongest ally -- they have always been interested in having the scripting languages work well on their platforms. They would hate to give programmers an excuse to stick to Unix or the JVM. I don't personally know enough about this particular circumstance to know whether there is any possibility of significantly influencing version 2 or not. Maybe the gamble isn't worth the effort. But I wouldn't dismiss it out of hand. -- Take a recipe. Leave a recipe. Python Cookbook!
http://www.ActiveState.com/pythoncookbook From mwh@python.net Tue Jul 31 10:23:48 2001 From: mwh@python.net (Michael Hudson) Date: 31 Jul 2001 05:23:48 -0400 Subject: [Python-Dev] Changing the Division Operator -- PEP 238, rev 1.12 In-Reply-To: Samuele Pedroni's message of "Mon, 30 Jul 2001 21:59:35 +0200 (MET DST)" References: <200107301959.VAA11733@core.inf.ethz.ch> Message-ID: <2m4rrt78pn.fsf@starship.python.net> Samuele Pedroni writes: > ... > > > > > > > > Does codeop currently work in Jython? The solution should continue to > > > > work in Jython then. > > > We have our interface compatible version of codeop that works. > > > > Would implementing the new interfaces I sketched out for codeop.py be > > possible in Jython? That's the bit I care about, not so much the > > interface to __builtin__.compile. > Yes, it's possible. Good; hopefully we can get somewhere then. > > > > Does Jython support the same flag bit values as > > > > CPython? If not, Paul Prescod's suggestion to use keyword arguments > > > > becomes very relevant. > > > we support a subset of the co_flags, CO_NESTED e.g. is there with the same > > > value. > > > > > > But the embedding API is very different, my implementation of nested > > > scopes does not define any Py_CF... flags, we have an internal CompilerFlags > > > object but is more similar to PyFutureFeatures ... > > > > Is this object exposed to Python code at all? > Not publicly, but in Jython the separating line is a bit different, > because public java classes are always accessible from jython, > even most of the internals. That does not mean that every use of that > is welcome and supported. Ah, of course. I'd forgotten how cool Jython was in some ways. > > One approach would be > > PyObject-izing PyFutureFlags and making *that* the fourth argument to > > compile...
> > > > class Compiler: > > def __init__(self): > > self.ff = ff.new() # or whatever > > def __call__(self, source, filename, start_symbol): > > code = compile(source, filename, start_symbol, self.ff) > > self.ff.merge(code.co_flags) > > return code > I see, "internally" we already have a compiler_flags function > that does the same as: > > code = compile(source, filename, start_symbol, self.ff) > > self.ff.merge(code.co_flags) > > where self.ff is a CompilerFlags object. > > I can re-arrange things for any interface, Well, I don't want to make more work for you - I imagine Guido's doing enough of that for two! > I was only trying to explain our approach and situation and a > possible way to avoid duplicating some internal code in Python. Can you point me to the code in CVS that implements this sort of thing? I don't really know Java but I can probably muddle through to some extent. We might as well have CPython copy Jython for once... Cheers, M. -- On the other hand, the following areas are subject to boycott in reaction to the rampant impurity of design or execution, as determined after a period of study, in no particular order: ... http://www.naggum.no/profile.html From thomas@xs4all.net Tue Jul 31 10:55:22 2001 From: thomas@xs4all.net (Thomas Wouters) Date: Tue, 31 Jul 2001 11:55:22 +0200 Subject: [Python-Dev] Parrot -- should life imitate satire? In-Reply-To: <15206.8708.811000.468489@gargle.gargle.HOWL> References: <20010730051831.B1122@thyrsus.com> <20010731012432.G20676@xs4all.nl> <15206.8708.811000.468489@gargle.gargle.HOWL> Message-ID: <20010731115521.I20676@xs4all.nl> On Mon, Jul 30, 2001 at 08:12:04PM -0700, Nathan Torkington wrote: > > And I won't even start with Perl's more archaic features, that > > change the whole working of the interpreter. > > Those are going away.
Yeah, I thought as much, which is why I wasn't going to start on them :) > Perl people hate them as much as you do--the only time they're used now is > to make deliberately hideous code, and hardly anyone will seriously lament > the passing of that ability. No more "change the starting position for > subscripts", no more "change all RE matches globally", and so on. I don't really hate the features, I just don't use them, and wouldn't want them in Python :-) I do actually program Perl, and will do a lot more of it in the next couple of months at least (I switched projects at work, to one that will entail Perl programming roughly 80% of the time) -- I just like Python a lot more. Your comments do lead me to ask this question, though (and forgive me if it comes over as the arrogant ranting of a Python bigot; it's definitely not intended as such, even though I only have a Python-implementor's point of view.) What's going to be the difference between Perl6 and Python ? The variable typing-naming ($var, %var, etc) I presume, and the curly bracket vs. indentation blocking issue. Regex-literals, 'unless', the ' if/unless/while ' shortcut, I guess ? Those are basically all parser/compiler issues, so shouldn't be a real problem. The transmorphic typing is trickier, as is taint mode and Perl's scoping rules.... Though the latter could be done if we refactor the namespace-creation that is currently done implicitly on function-creation, and allow it to be done explicitly. The same goes for the variable-filling-assignment (which is quite different from the name-binding assignment Python has.) I don't really doubt that Perl and Python could use the same VM.... I'm not entirely certain how much of the shared VM the two implementations would actually be using. Is it worth it if the overlap is a mere, say, 25% ? (I think it's more, but it depends entirely on how different Perl6 is from Perl5, and how much Python is willing to change....
Lurkers here know I'm aggressively against gratuitous breakage :) -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From paulp@ActiveState.com Tue Jul 31 11:18:48 2001 From: paulp@ActiveState.com (Paul Prescod) Date: Tue, 31 Jul 2001 03:18:48 -0700 Subject: [Python-Dev] Parrot -- should life imitate satire? References: <20010730051831.B1122@thyrsus.com> <20010731012432.G20676@xs4all.nl> <15206.8708.811000.468489@gargle.gargle.HOWL> <20010731115521.I20676@xs4all.nl> Message-ID: <3B668608.F68B5953@ActiveState.com> One of the things I picked up from the Perl conference is that Perl users *seem* (to me) to have a higher tolerance for code breakage than Python users. (and Python users have a higher tolerance than (let's say) Java users) Even if we put aside Perl 6, Perlers talk pretty glibly about ripping little-used features out in Perl 5.8.0 and Perl 5.10 and so forth. e.g. Damian said that Autoload is going away (or pseudo hashes or something like that). Whether or not he was right, nobody in the room threw tomatoes as I'm sure they would if Guido tried to kill __getattr__. Admittedly, I never know when I hear stuff like "tr///CU is dead" or "package; is dead" whether each was a feature that has been in for three years or was added to an experimental release and removed from the next experimental release. I'm not criticizing the Perl community. Acceptance of change is a good thing! But I think they should know how conservative the Python world is. Last week there were storm troopers heading for Guido's house when he announced that the division operator is going to change its behaviour in two or three years. That means it would take a major PR effort to convince the Python community that even minor language changes would be worth the benefit of sharing a VM. -- Take a recipe. Leave a recipe. Python Cookbook!
http://www.ActiveState.com/pythoncookbook From sjoerd.mullender@oratrix.com Tue Jul 31 11:23:27 2001 From: sjoerd.mullender@oratrix.com (Sjoerd Mullender) Date: Tue, 31 Jul 2001 12:23:27 +0200 Subject: [Python-Dev] Picking on platform fmod In-Reply-To: Your message of Sat, 28 Jul 2001 16:13:53 -0400. References: Message-ID: <20010731102328.1260D301CF7@bireme.oratrix.nl> Success on SGI O2 running IRIX6.5.12m with native compiler version 7.2.1.3m and compiled without -O. On Sat, Jul 28 2001 "Tim Peters" wrote: > Here's your chance to prove your favorite platform isn't a worthless pile of > monkey crap . Please run the attached. If it prints anything other > than > > 0 failures in 10000 tries > > it will probably print a lot. In that case I'd like to know which flavor of > C+libc+libm you're using, and the OS; a few of the failures it prints may be > helpful too. -- Sjoerd Mullender From barry@zope.com Tue Jul 31 11:54:54 2001 From: barry@zope.com (Barry A. Warsaw) Date: Tue, 31 Jul 2001 06:54:54 -0400 Subject: [Python-Dev] cgitb.py for Python 2.2 References: <200107310622.CAA11742@cj20424-a.reston1.va.home.com> Message-ID: <15206.36478.421953.437702@anthem.wooz.org> >>>>> "GvR" == Guido van Rossum writes: >> What i'm proposing is that we toss cgitb.py into the standard >> library (pretty small at about 100 lines, since all the heavy >> lifting is in pydoc and inspect). Then we can add this to >> site.py: if os.environ.has_key("GATEWAY_INTERFACE"): import >> sys, cgitb sys.excepthook = cgitb.excepthook GvR> Why not add this to cgi.py instead? The site.py GvR> initialization is accumulating a lot of cruft, and I don't GvR> like new additions that are irrelevant for most apps (CGI is GvR> a tiny niche for Python IMO). (I also think all the stuff GvR> that's only for interactive mode should be moved off to GvR> another module that is only run in interactive mode.) I'm at best +0 on adding it to site.py too. E.g.
for performance reasons Mailman's cgi wrappers invoke Python with -S to avoid the expensive overhead of importing site.py for each cgi hit. -Barry From barry@zope.com Tue Jul 31 12:01:03 2001 From: barry@zope.com (Barry A. Warsaw) Date: Tue, 31 Jul 2001 07:01:03 -0400 Subject: [Python-Dev] pep-discuss References: <3B62EB05.396DF4D7@ActiveState.com> Message-ID: <15206.36847.621663.568615@anthem.wooz.org> >>>>> "PP" == Paul Prescod writes: PP> We've talked about having a mailing list for general PP> PEP-related discussions. Two things make me think that PP> revisiting this would be a good idea right now. PP> First, the recent loosening up of the python-dev rules PP> threatens the quality of discussion about bread and butter PP> issues such as patch discussions and process issues. I'm not worrying about that until it becomes a problem. :) PP> Second, the flamewar on python-list basically drowned out the PP> usual newbie questions and would give a person coming new to PP> Python a very negative opinion about the language's future and PP> the friendliness of the community. I would rather redirect as PP> much as possible of that to a list that only interested PP> participants would have to endure. For me too, it'd be just another list to subscribe to and follow, so I'm generally against a separate pep list too. One thing I'll note: in Mailman 2.1 we will be able to define "topics" and you will be able to filter on specific topics. E.g. if we defined a pep topic, you could filter out all pep messages, receive only pep messages, or do mail client filtering on the X-Topics: header. (This only works for regular delivery, not digest delivery.) just-dont-ask-when-MM2.1-will-be-ready-ly y'rs, -Barry From guido@zope.com Tue Jul 31 12:31:21 2001 From: guido@zope.com (Guido van Rossum) Date: Tue, 31 Jul 2001 07:31:21 -0400 Subject: [Python-Dev] Parrot -- should life imitate satire? In-Reply-To: Your message of "Tue, 31 Jul 2001 01:47:03 PDT." 
<3B667087.EBBE8938@ActiveState.com> References: <007701c11954$6b0017c0$8a73fea9@newmexico> <3B663140.41CB9DD7@ActiveState.com> <200107310629.CAA11818@cj20424-a.reston1.va.home.com> <3B667087.EBBE8938@ActiveState.com> Message-ID: <200107311131.HAA15851@cj20424-a.reston1.va.home.com> > Guido van Rossum wrote: > > > > > Also, the .NET CLR is standardized at ECMA so we could (at least in > > > theory!) go to the meetings and try to influence version 2. > > > > Notice the addition "in theory". In practice, this is BS. > > It depends on the rules and politics of each particular standards group. > It is fundamentally a social activity. It also depends how much effort > you are willing to put into promoting your cause. Sam Ruby is chair of > the ECMA CLI group. He is a big scripting language fan. > > http://www2.hursley.ibm.com/tc39/ > > Also note the presence of Mike Cowlishaw of REXX fame and Dave Raggett > of the W3C. > > Working within a standards body is a gamble. It can pay off big or it > can completely fail. We might find Microsoft our strongest ally -- they > have always been interested in having the scripting languages work well > on their platforms. They would hate to give programmers an > excuse to stick to Unix or the JVM. So it boils down to us vs. MS. Guess who wins whenever there's a disagreement. I still maintain that it's a waste of our time. > I don't personally know enough about this particular circumstance to > know whether there is any possibility of significantly influencing > version 2 or not. Maybe the gamble isn't worth the effort. But I > wouldn't dismiss it out of hand. Well, your boss has a pact with MS, so AS might pull it off.
:-) --Guido van Rossum (home page: http://www.python.org/~guido/) From akuchlin@mems-exchange.org Tue Jul 31 12:52:58 2001 From: akuchlin@mems-exchange.org (Andrew Kuchling) Date: Tue, 31 Jul 2001 07:52:58 -0400 Subject: [Python-Dev] cgitb.py for Python 2.2 In-Reply-To: <15206.8227.652539.471067@anthem.wooz.org>; from barry@zope.com on Mon, Jul 30, 2001 at 11:04:03PM -0400 References: <15206.8227.652539.471067@anthem.wooz.org> Message-ID: <20010731075258.A2757@ute.cnri.reston.va.us> On Mon, Jul 30, 2001 at 11:04:03PM -0400, Barry A. Warsaw wrote: >I'll take a closer look at cgitb.py when I get a chance, but I'm >generally +1 on the idea. +0 from me, though I also think it would be better in cgi.py and not in site.py. It would also be useful if it could mail tracebacks and return a non-committal but secure error message to the browser; I'll contribute that as a patch if cgitb.py goes in. (Or should that be cgi/tb.py? Hmm...) --amk From akuchlin@mems-exchange.org Tue Jul 31 13:01:28 2001 From: akuchlin@mems-exchange.org (Andrew Kuchling) Date: Tue, 31 Jul 2001 08:01:28 -0400 Subject: [Python-Dev] Parrot -- should life imitate satire? In-Reply-To: <15206.8511.147000.832644@gargle.gargle.HOWL>; from gnat@oreilly.com on Mon, Jul 30, 2001 at 08:08:47PM -0700 References: <20010730051831.B1122@thyrsus.com> <20010731012432.G20676@xs4all.nl> <20010730205657.A2298@ute.cnri.reston.va.us> <15206.8511.147000.832644@gargle.gargle.HOWL> Message-ID: <20010731080128.B2757@ute.cnri.reston.va.us> On Mon, Jul 30, 2001 at 08:08:47PM -0700, Nathan Torkington wrote: >Andrew Kuchling writes: >The core loop would just be the usual opcode dispatch loop ("call the >function for the current operation, which returns the next >operation"). The only difference is that some of the opcodes would be >specific to RE matches. (I'm unclear on how much special logic RE The big difference I see between regex opcodes and language opcodes is that regexes need to backtrack and language ones don't. 
Unless the idea is to compile a regex to actual VM code similar to that generated by Python/Perl code, but then wouldn't that sacrifice efficiency? --amk From paulp@ActiveState.com Tue Jul 31 13:39:53 2001 From: paulp@ActiveState.com (Paul Prescod) Date: Tue, 31 Jul 2001 05:39:53 -0700 Subject: [Python-Dev] Frank Willison Message-ID: <3B66A719.4252CAAC@ActiveState.com> The Python world has lost a great friend in Frank Willison. Frank died yesterday of a massive heart attack. I've searched in vain for a biography of Frank for those that didn't know him but perhaps he was too modest to put his biography on the Web. Suffice to say that before there were 30 or 10 or 5 Python books, before acquisitions editors started cold-calling Python programmers, Frank had a sense that this little language could become something. In Frank's words: "This is my third Python Conference. At the first one, a loyal 70 or so Python loyalists debated potential new features of the language. At the second, 120 or so Python programmers split their time between a review of language features and the discussion of interesting Python applications. At this conference, the third, we moved onto a completely different level. Presentations and demonstrations at this conference of nearly 250 attendees have covered applications built on Python. Companies are demonstrating their Python-based products. There is venture capital here. There are people here because they want to learn about Python. This year, mark my words: Python is here to stay." http://www.oreilly.com/frank/pythonconf_0100.html The O'Reilly books that Frank edited helped to give Python the legitimacy it needed to get over the hump. I carefully put in the word "helped" because Frank requires honesty and modesty: "O'Reilly doesn't legitimize. If we did, lots of technology creators who enjoy their status as bastards would shun us. We try to find the technologies that are interesting and powerful, that solve the problems people really have. 
Then we take pleasure in publishing an interesting book on that subject. I'd like to put another issue to rest: the Camel book did not legitimize Perl. It may have accelerated Perl's adoption by making information about Perl more readily available. But the truth is that Perl would have succeeded without an O'Reilly book (as would Python and Zope), and that we're very pleased to have been smart enough to recognize Perl's potential before other publishers did." http://www.oreilly.com/frank/legitimacy_1199.html Frank was also a Perl guy. He was big enough for both worlds. To me he was a Perl guy but *the* Python guy. Frank was the guy who got Python books into print. He and his protege Laura Llewin were constantly on the lookout for opportunities to write about Python. Much more important than anything he did with or for Python: Frank was a really great guy with an excellent sense of humor and a way of connecting with people. I know all of that after only meeting him two or three times because it was just so obvious what kind of person he was that it didn't take you any time to figure it out. You can find more of Frank's writings here: http://www.oreilly.com/frank/ Paul Prescod From mal@lemburg.com Tue Jul 31 14:28:39 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Tue, 31 Jul 2001 15:28:39 +0200 Subject: [Python-Dev] PyOS_snprintf() / PyOS_vsnprintf() Message-ID: <3B66B287.5D319774@lemburg.com> Just to let you know and to initiate some cross-platform testing: While working on the warning patch for modsupport.c, I've added two new APIs which hopefully make it easier for Python to switch to buffer overflow safe [v]snprintf() APIs for error reporting et al. The two new APIs are PyOS_snprintf() and PyOS_vsnprintf() and work just like the standard ones in many C libs. On platforms which have snprintf(), the native APIs are used; on all others an emulation tries to do its best. Please try them out on your platform.
If all goes well, I think we should replace all sprintf() (without the n in the name) with these new safer APIs. Thanks, -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From skip@pobox.com (Skip Montanaro) Tue Jul 31 15:07:26 2001 From: skip@pobox.com (Skip Montanaro) (Skip Montanaro) Date: Tue, 31 Jul 2001 09:07:26 -0500 Subject: [Python-Dev] zipfiles on sys.path In-Reply-To: <3B65AA15.27947.E9B214D@localhost> References: <20010725215830.2F49D14A25D@oratrix.oratrix.nl> <3B65AA15.27947.E9B214D@localhost> Message-ID: <15206.48030.99097.902155@beluga.mojam.com> Gordon> ... but it's my observation that package authors are enamored of Gordon> import hacks, so be wary. One for amk's quotes file? ;-) Skip From skip@pobox.com (Skip Montanaro) Tue Jul 31 15:51:22 2001 From: skip@pobox.com (Skip Montanaro) (Skip Montanaro) Date: Tue, 31 Jul 2001 09:51:22 -0500 Subject: [Python-Dev] Parrot -- should life imitate satire? In-Reply-To: <20010731013448.H20676@xs4all.nl> References: <200107302251.KAA00585@s454.cosc.canterbury.ac.nz> <20010731013448.H20676@xs4all.nl> Message-ID: <15206.50666.998086.720321@beluga.mojam.com> Skip> The main stumbling block was that pesky "from module import *" Skip> statement. It could push an unknown quantity of stuff onto the Skip> stack Greg> Are you *sure* about that? I'm pretty certain it can't be true, Greg> since the compiler has to know at all times how much is on the Greg> stack, so it can decide how much stack space is needed. Thomas> I think Skip meant it does an arbitrary number of Thomas> load-onto-stack Thomas> store-into-namespace Thomas> operations. Skip, you'll be glad to know that's no longer true Thomas> :) Since 2.0 (or when was it that we introduced 'import as' ?) 
Thomas> import-* is not a special case of 'IMPORT_FROM', but rather a Thomas> separate opcode that doesn't touch the stack. I'm not sure what I meant any more. (They say eyewitness testimony in a courtroom is quite unreliable.) I'm pretty sure Greg's analysis is at least partly correct (in that that couldn't have been why I failed to implement a converter for IMPORT_FROM). I went back and looked briefly at my old code last night (which was broken when I put it aside - don't *ever* do that!) and could find nothing that would indicate why I didn't like "from-import-*". The instruction set converter would refuse to try converting any code that contained these opcodes: {LOAD,STORE,DELETE}_NAME, SETUP_{FINALLY,EXCEPT}, or IMPORT_FROM. At this point in time I'm not sure which of those six opcodes were just ones I hadn't gotten around to writing converters for and which were showstoppers. wish-i-had-more-time-for-this-ly y'rs, Skip From skip@pobox.com (Skip Montanaro) Tue Jul 31 16:02:25 2001 From: skip@pobox.com (Skip Montanaro) (Skip Montanaro) Date: Tue, 31 Jul 2001 10:02:25 -0500 Subject: [Python-Dev] Parrot -- should life imitate satire? In-Reply-To: References: <20010730051831.B1122@thyrsus.com> Message-ID: <15206.51329.561652.565480@beluga.mojam.com> I was thinking a little about a Python/Perl VM merge. One problem I imagine would be difficult to reconcile is the subtle difference in semantics of various basic types. Consider the various bits of Python's (proposed) number system that Perl might not have (or want): rationals, automatic promotion from machine ints to longs, complex numbers. These may not work well with Perl's semantics. What about exceptions? Do Python and Perl have similar notions of what exceptional conditions exist? Skip From pedroni@inf.ethz.ch Tue Jul 31 16:06:57 2001 From: pedroni@inf.ethz.ch (Samuele Pedroni) Date: Tue, 31 Jul 2001 17:06:57 +0200 Subject: [Python-Dev] Parrot -- should life imitate satire?
References: <007701c11954$6b0017c0$8a73fea9@newmexico> <3B663140.41CB9DD7@ActiveState.com> Message-ID: <003801c119d2$724781c0$8a73fea9@newmexico> Thanks for the answer. > Samuele Pedroni wrote: > > > >... > > A question: are there already some data about > > what would be the actual performance of Python.NET vs. CPython ? > > I think it is safe to say that the current version of Python.NET is > slower than Jython. Now it hasn't been optimized as much as Jython so we > might be able to get it as fast as Jython. This may surprise you, but Jython is not that much optimized; it's mostly a straightforward OO design. But I think that's the only way to avoid specializing for some development state of the JVMs. For example we have changed nothing, but it seems (it seems) that under Java 1.4 asymptotically (meaning you need a long-running process to exploit the HotSpot technology) Jython is a bit faster than CPython, at least for non-I/O-intensive stuff. It seems they optimized reflection. > But I don't think that there > is anything in the .NET runtime that makes it a great deal better than > the JVM for dynamic languages. I have the same impression, unless one can do something really clever with boxing/unboxing without losing too many cycles or getting in the way of the compiler. > The only difference is that Microsoft > seems more aware of the problem and may move to correct it whereas I > have a feeling that explicit support for our languages would dilute > Sun's 100% Java marketing campaign. But will Sun be such a passive actor, even if MS will have a market advantage from especially supporting scripting languages? There is much hype in both camps, but Unix/C seems to show that you need a good system language and the possibility to write some scripting languages over it to have a good platform. > Also, the .NET CLR is standardized > at ECMA so we could (at least in theory!) go to the meetings and try to > influence version 2.
I imagine you can go the same way entering the JCP. ASF is in, for example. Samuele Pedroni. From mclay@nist.gov Tue Jul 31 04:11:52 2001 From: mclay@nist.gov (Michael McLay) Date: Mon, 30 Jul 2001 23:11:52 -0400 Subject: [Python-Dev] Revised decimal type PEP In-Reply-To: <3B666C9C.4400BD9C@lemburg.com> References: <0107301106520A.02216@fermi.eeel.nist.gov> <3B666C9C.4400BD9C@lemburg.com> Message-ID: <01073023115207.02466@fermi.eeel.nist.gov> On Tuesday 31 July 2001 04:30 am, M.-A. Lemburg wrote: > How will you be able to define the precision of decimals ? Implicit > by providing a decimal string with enough 0s to let the parser > deduce the precision ? Explicit like so: decimal(12, 5) ? Would the following work? For literal type definitions the precision would be implicit. For values set using the decimal() function the definition would be implicit unless an explicit precision definition is set. The following would all define the same value and precision. 3.40d decimal("3.40") decimal(3.4, 2) Those were easy. How would the following be interpreted? decimal(3.404, 2) decimal(3.405, 2) decimal(3.39999, 2) > Also, what happens to the precision of the decimal object resulting > from numeric operations ? Good question. I'm not the right person to answer this, but here is a first stab at what I would expect. For addition, subtraction, and multiplication the results would be exact with no rounding of the results. For calculations that include division, the number of digits in a non-terminating result will have to be explicitly set. Would it make sense for this to be defined by the numbers used in the calculation? Could this be set in the module or could it be global for the application? What do you suggest? > > > A decimal number can be used to represent integers and floating point > > numbers and decimal numbers can also be displayed using scientific > > notation. Examples of decimal numbers include: > > > > ...
> > > > This proposal will also add an optional 'b' suffix to the > > representation of binary float type literals and binary int type > > literals. > > Hmm, I don't quite grasp the need for the 'b'... numbers without > any modifier will work the same way as they do now, right ? I made a change to the parsenumber() function in compile.c so that the type of the number is determined by the suffix attached to the number. To retain backward compatibility the tokenizer automatically attaches the 'b' suffix to float and int types if they do not have a suffix in the literal definition. My original PEP included the definition of a .dp and a dpython mode for the interpreter in which the default number type is decimal instead of binary. When the mode is switched the language becomes easier to use for developing applications that use decimal numbers. > > Expressions that mix binary floats with decimals introduce the > > possibility of unexpected results because the two number types use > > different internal representations for the same numerical value. > > I'd rather have this explicit in the sense that you define which > assumptions will be made and what issues arise (rounding, truncation, > loss of precision, etc.). Can you give an example of how this might be implemented? > > To accommodate the three possible usage models the python interpreter > > command line options will be used to set the level for warning and > > error messages. The three levels are: > > > > promiscuous mode, -f or --promiscuous > > safe mode -s or --safe > > pedantic mode -p or --pedantic > > How about a generic option: > > --numerics:[loose|safe|pedantic] or -n:[l|s|p] Thanks for the suggestion. I'll change it.
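The rounding puzzlers McLay poses were eventually settled, for CPython at least, by the decimal module that shipped with Python 2.4 (PEP 327), which follows the Cowlishaw decimal arithmetic spec and answers them through an explicit context and rounding mode. A minimal sketch, with the caveat that this module did not exist at the time of the thread:

```python
from decimal import Decimal, getcontext, ROUND_HALF_EVEN, ROUND_HALF_UP

TWO_PLACES = Decimal("0.01")  # target: two fractional digits

# The three puzzlers, resolved by naming a rounding mode explicitly.
print(Decimal("3.404").quantize(TWO_PLACES, rounding=ROUND_HALF_EVEN))    # 3.40
print(Decimal("3.405").quantize(TWO_PLACES, rounding=ROUND_HALF_EVEN))    # 3.40 (tie goes to even)
print(Decimal("3.405").quantize(TWO_PLACES, rounding=ROUND_HALF_UP))      # 3.41
print(Decimal("3.39999").quantize(TWO_PLACES, rounding=ROUND_HALF_EVEN))  # 3.40

# Addition, subtraction, and multiplication are exact up to the context's
# precision; division is where the precision setting visibly matters.
getcontext().prec = 6
print(Decimal(1) / Decimal(3))  # 0.333333
```

Note how this resolves the open question above: precision is not carried per-number but is a property of the (thread-local) arithmetic context, which is the design the follow-up messages attribute to Cowlishaw et al.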
From aahz@rahul.net Tue Jul 31 17:37:02 2001 From: aahz@rahul.net (Aahz Maruch) Date: Tue, 31 Jul 2001 09:37:02 -0700 (PDT) Subject: [Python-Dev] Revised decimal type PEP In-Reply-To: <01073023115207.02466@fermi.eeel.nist.gov> from "Michael McLay" at Jul 30, 2001 11:11:52 PM Message-ID: <20010731163703.2F86E99C85@waltz.rahul.net> Michael McLay wrote: > > Those were easy. How would the following be interpreted? > > decimal(3.404, 2) > decimal(3.405, 2) > decimal(3.39999, 2) > > [...] > > For addition, subtraction, and multiplication the results would be > exact with no rounding of the results. For calculations that include > division, the number of digits in a non-terminating result will have to > be explicitly set. Would it make sense for this to be defined by the > numbers used in the calculation? Could this be set in the module or > could it be global for the application? This is why Cowlishaw et al require a full context for all operations. At one point I tried implementing things with the context being contained in the number rather than "global" (which actually means thread-global, but I'm probably punting on *that* bit for the moment), but Tim Peters persuaded me that sticking with the spec was the Right Thing until *after* the spec was fully implemented. After seeing the mess generated by PEP-238, I'm fervently in favor of sticking with external specs whenever possible. -- --- Aahz (@pobox.com) Hugs and backrubs -- I break Rule 6 <*> http://www.rahul.net/aahz/ Androgynous poly kinky vanilla queer het Pythonista I don't really mind a person having the last whine, but I do mind someone else having the last self-righteous whine. From mal@lemburg.com Tue Jul 31 17:36:28 2001 From: mal@lemburg.com (M.-A.
Lemburg) Date: Tue, 31 Jul 2001 18:36:28 +0200 Subject: [Python-Dev] Revised decimal type PEP References: <0107301106520A.02216@fermi.eeel.nist.gov> <3B666C9C.4400BD9C@lemburg.com> <01073023115207.02466@fermi.eeel.nist.gov> Message-ID: <3B66DE8C.C9C62012@lemburg.com> Michael McLay wrote: > > On Tuesday 31 July 2001 04:30 am, M.-A. Lemburg wrote: > > How will you be able to define the precision of decimals ? Implicit > > by providing a decimal string with enough 0s to let the parser > > deduce the precision ? Explicit like so: decimal(12, 5) ? > > Would the following work? For literal type definitions the precision would > be implicit. For values set using the decimal() function the definition > would be implicit unless an explicit precision definition is set. The > following would all define the same value and precision. > > 3.40d > decimal("3.40") > decimal(3.4, 2) > > Those were easy. How would the following be interpreted? > > decimal(3.404, 2) > decimal(3.405, 2) > decimal(3.39999, 2) I'd suggest to follow the rules for the SQL definitions of DECIMAL(,). > > Also, what happens to the precision of the decimal object resulting > > from numeric operations ? > > Good question. I'm not the right person to answer this, but here is a > first stab at what I would expect. > > For addition, subtraction, and multiplication the results would be exact with > no rounding of the results. For calculations that include division, the number of > digits in a non-terminating result will have to be explicitly set. Would it > make sense for this to be defined by the numbers used in the calculation? > Could this be set in the module or could it be global for the application? > > What do you suggest? Well, there are several options. I suspect that the IBM paper on decimal types has good hints as to what the type should do. Again, SQL is probably a good source for inspiration too, since it deals with decimals a lot.
> > > > > A decimal number can be used to represent integers and floating point > > > numbers and decimal numbers can also be displayed using scientific > > > notation. Examples of decimal numbers include: > > > > > > ... > > > > > > This proposal will also add an optional 'b' suffix to the > > > representation of binary float type literals and binary int type > > > literals. > > > > Hmm, I don't quite grasp the need for the 'b'... numbers without > > any modifier will work the same way as they do now, right ? > > I made a change to the parsenumber() function in compile.c so that the type > of the number is determined by the suffix attached to the number. To retain > backward compatibility the tokenizer automatically attaches the 'b' suffix to > float and int types if they do not have a suffix in the literal definition. > > My original PEP included the definition of a .dp and a dpython mode for the > interpreter in which the default number type is decimal instead of binary. > When the mode is switched the language becomes easier to use for developing > applications that use decimal numbers. I see, the small 'b' still looks funny to me though. Wouldn't 1.23f and 25i be more intuitive ? > > > Expressions that mix binary floats with decimals introduce the > > > possibility of unexpected results because the two number types use > > > different internal representations for the same numerical value. > > > > I'd rather have this explicit in the sense that you define which > > assumptions will be made and what issues arise (rounding, truncation, > > loss of precision, etc.). > > Can you give an example of how this might be implemented? You would typically first coerce the types to the "larger" type, e.g. float + decimal -> float + float -> float, so you'd only have to document how the conversion is done and which accuracy to expect.
> > > To accommodate the three possible usage models the python interpreter > > > command line options will be used to set the level for warning and > > > error messages. The three levels are: > > > > > > promiscuous mode, -f or --promiscuous > > > safe mode -s or --safe > > > pedantic mode -p or --pedantic > > > > How about a generic option: > > > > --numerics:[loose|safe|pedantic] or -n:[l|s|p] > > Thanks for the suggestion. I'll change it. Great. -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From guido@zope.com Tue Jul 31 17:56:51 2001 From: guido@zope.com (Guido van Rossum) Date: Tue, 31 Jul 2001 12:56:51 -0400 Subject: [Python-Dev] PyOS_snprintf() / PyOS_vsnprintf() In-Reply-To: Your message of "Tue, 31 Jul 2001 15:28:39 +0200." <3B66B287.5D319774@lemburg.com> References: <3B66B287.5D319774@lemburg.com> Message-ID: <200107311656.MAA16366@cj20424-a.reston1.va.home.com> > While working on the warning patch for modsupport.c, > I've added two new APIs which hopefully make it easier for Python > to switch to buffer overflow safe [v]snprintf() APIs for error > reporting et al. > > The two new APIs are PyOS_snprintf() and > PyOS_vsnprintf() and work just like the standard ones in many > C libs. On platforms which have snprintf(), the native APIs are used; > on all others an emulation tries to do its best. > > Please try them out on your platform. If all goes well, I think > we should replace all sprintf() (without the n in the name) > with these new safer APIs. It would be easier to test out the fallback implementation if there was a config option to enable it even on platforms that do have the native version. Or maybe (following the getopt example) we might consider always using our own code -- so it gets the maximum testing.
--Guido van Rossum (home page: http://www.python.org/~guido/) From guido@zope.com Tue Jul 31 18:08:47 2001 From: guido@zope.com (Guido van Rossum) Date: Tue, 31 Jul 2001 13:08:47 -0400 Subject: [Python-Dev] Parrot -- should life imitate satire? In-Reply-To: Your message of "Tue, 31 Jul 2001 10:02:25 CDT." <15206.51329.561652.565480@beluga.mojam.com> References: <20010730051831.B1122@thyrsus.com> <15206.51329.561652.565480@beluga.mojam.com> Message-ID: <200107311708.NAA16497@cj20424-a.reston1.va.home.com> > I was thinking a little about a Python/Perl VM merge. One problem I imagine > would be difficult to reconcile is the subtle difference in semantics of > various basic types. Consider the various bits of Python's (proposed) > number system that Perl might not have (or want): rationals, automatic > promotion from machine ints to longs, complex numbers. These may not work > well with Perl's semantics. What about exceptions? Do Python and Perl have > similar notions of what exceptional conditions exist? Actually, this may not be as big a deal as I thought before. The PVM doesn't have a lot of knowledge about types built into its instruction set. It knows a bit about classes, lists, dicts, but not e.g. about ints and strings. The opcodes are mostly very abstract: BINARY_ADD etc. --Guido van Rossum (home page: http://www.python.org/~guido/) From DavidA@ActiveState.com Tue Jul 31 18:21:59 2001 From: DavidA@ActiveState.com (David Ascher) Date: Tue, 31 Jul 2001 10:21:59 -0700 Subject: [Python-Dev] Frank Willison Message-ID: <3B66E937.D2390F90@ActiveState.com> As Paul mentioned on python-list, Frank Willison died of a heart attack yesterday. I'm sad. --david From mal@lemburg.com Tue Jul 31 18:22:52 2001 From: mal@lemburg.com (M.-A. 
Lemburg) Date: Tue, 31 Jul 2001 19:22:52 +0200 Subject: [Python-Dev] PyOS_snprintf() / PyOS_vsnprintf() References: <3B66B287.5D319774@lemburg.com> <200107311656.MAA16366@cj20424-a.reston1.va.home.com> Message-ID: <3B66E96C.FBAB8A62@lemburg.com> Guido van Rossum wrote: > > > While working on the warning patch for modsupport.c, > > I've added two new APIs which hopefully make it easier for Python > > to switch to buffer overflow safe [v]snprintf() APIs for error > > reporting et al. > > > > The two new APIs are PyOS_snprintf() and > > PyOS_vsnprintf() and work just like the standard ones in many > > C libs. On platforms which have snprintf(), the native APIs are used; > > on all others an emulation tries to do its best. > > > > Please try them out on your platform. If all goes well, I think > > we should replace all sprintf() (without the n in the name) > > with these new safer APIs. > > It would be easier to test out the fallback implementation if there > was a config option to enable it even on platforms that do have the > native version. > > Or maybe (following the getopt example) we might consider always using > our own code -- so it gets the maximum testing. How about always enabling our version in the alpha cycle and then reverting back to the native one in the betas ? -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From DavidA@ActiveState.com Tue Jul 31 18:40:16 2001 From: DavidA@ActiveState.com (David Ascher) Date: Tue, 31 Jul 2001 10:40:16 -0700 Subject: [Python-Dev] pep-discuss References: <3B62EB05.396DF4D7@ActiveState.com> <15206.36847.621663.568615@anthem.wooz.org> Message-ID: <3B66ED80.61B7E4C6@ActiveState.com> "Barry A.
Warsaw" wrote: > PP> Second, the flamewar on python-list basically drowned out the > PP> usual newbie questions and would give a person coming new to > PP> Python a very negative opinion about the language's future and > PP> the friendliness of the community. I would rather redirect as > PP> much as possible of that to a list that only interested > PP> participants would have to endure. > > For me too, it'd be just another list to subscribe to and follow, so > I'm generally against a separate pep list too. > > One thing I'll note: in Mailman 2.1 we will be able to define "topics" > and you will be able to filter on specific topics. E.g. if we defined > a pep topic, you could filter out all pep messages, receive only pep > messages, or do mail client filtering on the X-Topics: header. (This > only works for regular delivery, not digest delivery.) But that doesn't really solve the problem for newbies who aren't going to set up filters just for this Python list they just got onto. IMO, having 100 or so people add a new list is cheaper than having 10's of 1000's of people setting up filters. But whatever. =) --david From skip@pobox.com (Skip Montanaro) Tue Jul 31 18:48:32 2001 From: skip@pobox.com (Skip Montanaro) (Skip Montanaro) Date: Tue, 31 Jul 2001 12:48:32 -0500 Subject: [Python-Dev] Parrot -- should life imitate satire? In-Reply-To: <200107311708.NAA16497@cj20424-a.reston1.va.home.com> References: <20010730051831.B1122@thyrsus.com> <15206.51329.561652.565480@beluga.mojam.com> <200107311708.NAA16497@cj20424-a.reston1.va.home.com> Message-ID: <15206.61296.958360.72700@beluga.mojam.com> Guido> The PVM doesn't have a lot of knowledge about types built into Guido> its instruction set.... The opcodes are mostly very abstract: Guido> BINARY_ADD etc. Yeah, but the runtime behind the virtual machine knows a hell of a lot about the types. A stream of opcodes doesn't mean anything without the semantics of the functions the interpreter loop calls to do its work.
I thought the aim of Eric's Parrot idea was that Perl and Python might be able to share a virtual machine. If both can generate something like today's BINARY_ADD opcode, the underlying types of both Python and Perl better have the same semantics. Skip From nascheme@mems-exchange.org Tue Jul 31 18:54:57 2001 From: nascheme@mems-exchange.org (Neil Schemenauer) Date: Tue, 31 Jul 2001 13:54:57 -0400 Subject: [Python-Dev] Good news about ExtensionClass and Python 2.2a1 Message-ID: <20010731135457.A15139@mems-exchange.org> After a few tweaks to ExtensionClass and a few small fixes to some of our introspection code I'm happy to say that Python 2.2a1 passes our unit test suite. This is significant since there are about 45000 lines of code (counted by "wc -l") tested by 3569 test cases. Since we use ZODB, ExtensionClasses are quite widely used. Merging descr_branch into HEAD sounds like a good idea to me. Well done, Guido. I'm going to spend a bit of time trying to rewrite the ZODB Persistent class as a type. Attached is a diff of the changes I made to ExtensionClass.
Neil --- ExtensionClass.h.dist Tue Jul 31 11:50:39 2001 +++ ExtensionClass.h Tue Jul 31 12:15:21 2001 @@ -143,12 +143,48 @@ reprfunc tp_str; getattrofunc tp_getattro; setattrofunc tp_setattro; - /* Space for future expansion */ - long tp_xxx3; - long tp_xxx4; + + /* Functions to access object as input/output buffer */ + PyBufferProcs *tp_as_buffer; + + /* Flags to define presence of optional/expanded features */ + long tp_flags; char *tp_doc; /* Documentation string */ + /* call function for all accessible objects */ + traverseproc tp_traverse; + + /* delete references to contained objects */ + inquiry tp_clear; + + /* rich comparisons */ + richcmpfunc tp_richcompare; + + /* weak reference enabler */ + long tp_weaklistoffset; + + /* Iterators */ + getiterfunc tp_iter; + iternextfunc tp_iternext; + + /* Attribute descriptor and subclassing stuff */ + struct PyMethodDef *tp_methods; + struct memberlist *tp_members; + struct getsetlist *tp_getset; + struct _typeobject *tp_base; + PyObject *tp_dict; + descrgetfunc tp_descr_get; + descrsetfunc tp_descr_set; + long tp_dictoffset; + initproc tp_init; + allocfunc tp_alloc; + newfunc tp_new; + destructor tp_free; /* Low-level free-memory routine */ + PyObject *tp_bases; + PyObject *tp_mro; /* method resolution order */ + PyObject *tp_defined; + #ifdef COUNT_ALLOCS /* these must be last */ int tp_alloc; @@ -302,7 +338,9 @@ { PyExtensionClassCAPI->Export(D,N,&T); } /* Convert a method list to a method chain. 
*/ -#define METHOD_CHAIN(DEF) { DEF, NULL } +#define METHOD_CHAIN(DEF) \ + 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, \ + { DEF, NULL } /* The following macro checks whether a type is an extension class: */ #define PyExtensionClass_Check(TYPE) \ @@ -336,7 +374,9 @@ #define PURE_MIXIN_CLASS(NAME,DOC,METHODS) \ static PyExtensionClass NAME ## Type = { PyObject_HEAD_INIT(NULL) \ 0, # NAME, sizeof(PyPureMixinObject), 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, \ - 0, 0, 0, 0, 0, 0, 0, DOC, {METHODS, NULL}, \ + 0, 0, 0, 0, 0, 0, 0, DOC, \ + 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, \ + {METHODS, NULL}, \ EXTENSIONCLASS_BASICNEW_FLAG} /* The following macros provide limited access to extension-class --- ExtensionClass.c.dist Tue Jul 31 11:01:20 2001 +++ ExtensionClass.c Tue Jul 31 12:15:24 2001 @@ -119,7 +119,7 @@ static PyObject *subclass_watcher=0; /* Object that subclass events */ static void -init_py_names() +init_py_names(void) { #define INIT_PY_NAME(N) py ## N = PyString_FromString(#N) INIT_PY_NAME(__add__); @@ -1800,8 +1800,8 @@ if (PyFunction_Check(r) || NeedsToBeBound(r)) ASSIGN(r,newPMethod(self,r)); - else if (PyMethod_Check(r) && ! PyMethod_Self(r)) - ASSIGN(r,newPMethod(self, PyMethod_Function(r))); + else if (PyMethod_Check(r) && ! PyMethod_GET_SELF(r)) + ASSIGN(r,newPMethod(self, PyMethod_GET_FUNCTION(r))); return r; } @@ -3527,7 +3527,7 @@ }; void -initExtensionClass() +initExtensionClass(void) { PyObject *m, *d; char *rev="$Revision: 1.1 $"; From DavidA@ActiveState.com Tue Jul 31 18:57:35 2001 From: DavidA@ActiveState.com (David Ascher) Date: Tue, 31 Jul 2001 10:57:35 -0700 Subject: [Python-Dev] Parrot -- should life imitate satire? 
References: <20010730051831.B1122@thyrsus.com> <15206.51329.561652.565480@beluga.mojam.com> <200107311708.NAA16497@cj20424-a.reston1.va.home.com> <15206.61296.958360.72700@beluga.mojam.com> Message-ID: <3B66F18F.3EE81628@ActiveState.com> Skip Montanaro wrote: > > Guido> The PVM doesn't have a lot of knowledge about types built into > Guido> its instruction set.... The opcodes are mostly very abstract: > Guido> BINARY_ADD etc. > > Yeah, but the runtime behind the virtual machine knows a hell of a lot about > the types. A stream of opcodes doesn't mean anything without the semantics > of the functions the interpreter loop calls to do its work. I thought the > aim of Eric's Parrot idea was that Perl and Python might be able to share a > virtual machine. If both can generate something like today's BINARY_ADD > opcode, the underlying types of both Python and Perl better have the same > semantics. I don't think that needs to be true _in toto_. In other words, some opcodes can be used by both languages, some can be language-specific. The implementation of the VM for a given opcode can be shared per language, or even just partially shared. BINARY_ADD can do the same thing in most languages for 'native' types, and defer to per-language codepaths for objects, for example. One problem with a hybrid approach might be that optimizations become really hard to do if you can't assume much about the semantics, or if you can only assume the union of the various semantics. But the idea is intriguing anyway =). --david From guido@zope.com Tue Jul 31 20:00:01 2001 From: guido@zope.com (Guido van Rossum) Date: Tue, 31 Jul 2001 15:00:01 -0400 Subject: [Python-Dev] Parrot -- should life imitate satire? In-Reply-To: Your message of "Tue, 31 Jul 2001 12:48:32 CDT." 
<15206.61296.958360.72700@beluga.mojam.com> References: <20010730051831.B1122@thyrsus.com> <15206.51329.561652.565480@beluga.mojam.com> <200107311708.NAA16497@cj20424-a.reston1.va.home.com> <15206.61296.958360.72700@beluga.mojam.com> Message-ID: <200107311900.PAA17062@cj20424-a.reston1.va.home.com> > Guido> The PVM doesn't have a lot of knowledge about types built into > Guido> its instruction set.... The opcodes are mostly very abstract: > Guido> BINARY_ADD etc. > > Yeah, but the runtime behind the virtual machine knows a hell of a lot about > the types. A stream of opcodes doesn't mean anything without the semantics > of the functions the interpreter loop calls to do its work. I thought the > aim of Eric's Parrot idea was that Perl and Python might be able to share a > virtual machine. If both can generate something like today's BINARY_ADD > opcode, the underlying types of both Python and Perl better have the same > semantics. Yeah, but the runtime could offer a choice of data types -- for Python code the constants table would contain Python ints and strings etc., for Perl code it would contain Perl string-number objects. Maybe. --Guido van Rossum (home page: http://www.python.org/~guido/) From mwh@python.net Tue Jul 31 20:11:13 2001 From: mwh@python.net (Michael Hudson) Date: 31 Jul 2001 15:11:13 -0400 Subject: [Python-Dev] Parrot -- should life imitate satire? In-Reply-To: Guido van Rossum's message of "Tue, 31 Jul 2001 15:00:01 -0400" References: <20010730051831.B1122@thyrsus.com> <15206.51329.561652.565480@beluga.mojam.com> <200107311708.NAA16497@cj20424-a.reston1.va.home.com> <15206.61296.958360.72700@beluga.mojam.com> <200107311900.PAA17062@cj20424-a.reston1.va.home.com> Message-ID: <2mr8uwsylq.fsf@starship.python.net> Guido van Rossum writes: > > Guido> The PVM doesn't have a lot of knowledge about types built into > > Guido> its instruction set.... The opcodes are mostly very abstract: > > Guido> BINARY_ADD etc. 
> > > > Yeah, but the runtime behind the virtual machine knows a hell of a lot about > > the types. A stream of opcodes doesn't mean anything without the semantics > > of the functions the interpreter loop calls to do its work. I thought the > > aim of Eric's Parrot idea was that Perl and Python might be able to share a > > virtual machine. If both can generate something like today's BINARY_ADD > > opcode, the underlying types of both Python and Perl better have the same > > semantics. > > Yeah, but the runtime could offer a choice of data types -- for Python > code the constants table would contain Python ints and strings etc., for > Perl code it would contain Perl string-number objects. Maybe. And the point of this would be? I don't see much more benefit than just arranging for the numbers in Include/opcode.h to match perl's equivalents (i.e. none), but I may be missing something... Cheers, M. -- I've even been known to get Marmite *near* my mouth -- but never actually in it yet. Vegamite is right out. UnicodeError: ASCII unpalatable error: vegamite found, ham expected -- Tim Peters, comp.lang.python From skip@pobox.com (Skip Montanaro) Tue Jul 31 20:22:25 2001 From: skip@pobox.com (Skip Montanaro) (Skip Montanaro) Date: Tue, 31 Jul 2001 14:22:25 -0500 Subject: [Python-Dev] Parrot -- should life imitate satire? In-Reply-To: <200107311900.PAA17062@cj20424-a.reston1.va.home.com> References: <20010730051831.B1122@thyrsus.com> <15206.51329.561652.565480@beluga.mojam.com> <200107311708.NAA16497@cj20424-a.reston1.va.home.com> <15206.61296.958360.72700@beluga.mojam.com> <200107311900.PAA17062@cj20424-a.reston1.va.home.com> Message-ID: <15207.1393.232974.785433@beluga.mojam.com> Guido> Yeah, but the runtime could offer a choice of data types -- for Guido> Python code the constants table would contain Python ints and Guido> strings etc., for Perl code it would contain Perl string-number Guido> objects. Maybe. 
So I could give a code object generated by the Python compiler to the Perl runtime and get different results than if it was executed by the Python environment? Perhaps it's time for Eric to chime in again and tell us what he really has in mind. I can't see the utility in having the same set of opcodes for the two languages if the semantics of running them under either environment aren't going to be the same. It seems like it would artificially constrain people working on the internals of both languages. Skip From gnat@oreilly.com Tue Jul 31 20:31:01 2001 From: gnat@oreilly.com (Nathan Torkington) Date: Tue, 31 Jul 2001 13:31:01 -0600 Subject: [Python-Dev] Parrot -- should life imitate satire? In-Reply-To: <200107311900.PAA17062@cj20424-a.reston1.va.home.com> References: <20010730051831.B1122@thyrsus.com> <15206.51329.561652.565480@beluga.mojam.com> <200107311708.NAA16497@cj20424-a.reston1.va.home.com> <15206.61296.958360.72700@beluga.mojam.com> <200107311900.PAA17062@cj20424-a.reston1.va.home.com> Message-ID: <15207.1909.395000.123189@gargle.gargle.HOWL> Guido van Rossum writes: > Yeah, but the runtime could offer a choice of data types -- for Python > code the constants table would contain Python ints and strings etc., for > Perl code it would contain Perl string-number objects. Maybe. A perl6 value has a vtable, essentially an array of function pointers which comprises the standard operations on that value. I talked to Dan (the perl6 internals guy, dan@sidhe.org) about an impedance mismatch between Perl and Python data types, and he pointed out that you can have Perl values and Python values, each with their own semantics, simply by having separate vtables (and thus separate functions to implement the behaviour of those types). Code can work with either type because the type carries around (in its vtable) the knowledge of how it should behave. Feel free to grill Dan about these things if you want.
Nat From esr@thyrsus.com Tue Jul 31 09:14:43 2001 From: esr@thyrsus.com (Eric S. Raymond) Date: Tue, 31 Jul 2001 04:14:43 -0400 Subject: [Python-Dev] Parrot -- should life imitate satire? In-Reply-To: <15207.1393.232974.785433@beluga.mojam.com>; from skip@pobox.com on Tue, Jul 31, 2001 at 02:22:25PM -0500 References: <20010730051831.B1122@thyrsus.com> <15206.51329.561652.565480@beluga.mojam.com> <200107311708.NAA16497@cj20424-a.reston1.va.home.com> <15206.61296.958360.72700@beluga.mojam.com> <200107311900.PAA17062@cj20424-a.reston1.va.home.com> <15207.1393.232974.785433@beluga.mojam.com> Message-ID: <20010731041443.A26075@thyrsus.com> Skip Montanaro : > > Guido> Yeah, but the runtime could offer a choice of data types -- for > Guido> Python code the constants table would contain Python ints and > Guido> strings etc., for Perl code it would contain Perl string-number > Guido> objects. Maybe. > > So I could give a code object generated by the Python compiler to the Perl > runtime and get different results than if it was executed by the Python > environment? No, I don't think that's what Guido is saying. He and I are both imagining a *single* runtime, but with some type-specific opcodes that are generated only by Perl and some only generated by Python. > Perhaps it's time for Eric to chime in again and tell us what he really has > in mind. I can't see the utility in having the same set of opcodes for the > two languages if the semantics of running them under either environment > aren't going to be the same. It seems like it would artificially constrain > people working on the internals of both languages. You're right. What I have in mind starts with a common opcode interpreter, perhaps based on the Python VM but with extended opcodes where Perl type semantics don't match, and a common callout mechanism to C-level runtime libraries linked to the opcode interpreter. 
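[Editorial aside: the "common opcode interpreter with a few language-specific opcodes" Eric describes can be sketched in a few lines of Python. This is an illustration only, not code from Parrot or CPython, and the opcode names (PUSH, COERCE_INT) are invented for the example:]

```python
# A toy shared dispatch loop: most opcodes are common; COERCE_INT is a
# hypothetical Perl-flavoured opcode a Perl compiler might emit before '+'.
def run(code, stack=None):
    """Interpret a list of (opcode, arg) pairs on a simple value stack."""
    stack = stack if stack is not None else []
    for op, arg in code:
        if op == "PUSH":
            stack.append(arg)
        elif op == "BINARY_ADD":
            # Common opcode: the semantics come from the values themselves.
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "COERCE_INT":
            # Language-specific opcode: numify the top of the stack.
            stack.append(int(stack.pop()))
    return stack

# Python-style '+': "1" + "2" concatenates.
assert run([("PUSH", "1"), ("PUSH", "2"), ("BINARY_ADD", None)]) == ["12"]
# Perl-style '+': the compiler emits explicit coercions, so "1" + "2" is 3.
assert run([("PUSH", "1"), ("COERCE_INT", None),
            ("PUSH", "2"), ("COERCE_INT", None),
            ("BINARY_ADD", None)]) == [3]
```

Under this sketch, the per-language difference lives entirely in which opcodes each compiler emits, not in the dispatch loop itself.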
In the conservative version of this vision, Perl and Python have different runtimes dynamically linked to an instance of the same opcode interpreter. Memory allocation/GC and scheduling/threading are handled inside the opcode interpreter but the OS and environment binding is (mostly) in the libraries. Things Python would bring to this party: our serious-cool GC, our C extension/embedding system (*much* nicer than XS). Things Perl would bring: blazingly fast regexps, taint, flexitypes, references. In the radical version, the Perl and Python runtimes merge and the differences in semantics are implemented by compiling different wrapper sequences of opcodes around the library callouts. At this point we're doing something competitive with Microsoft's CLR. My proposed work plan is: 1. Separate the Python VM from the Python compiler. Initially it's OK if they still communicate by hard linkage but that will change later. 2. Build the Parrot VM out from the Python VM by adding the minimum number of Perliferous opcodes. 3. Start building the Perl runtime on top of that, re-using as much of the Python runtime as possible to save effort. -- Eric S. Raymond Every election is a sort of advance auction sale of stolen goods. -- H.L. Mencken From m@moshez.org Tue Jul 31 21:10:50 2001 From: m@moshez.org (Moshe Zadka) Date: Tue, 31 Jul 2001 23:10:50 +0300 Subject: [Python-Dev] Parrot -- should life imitate satire? In-Reply-To: <200107311708.NAA16497@cj20424-a.reston1.va.home.com> References: <200107311708.NAA16497@cj20424-a.reston1.va.home.com>, <20010730051831.B1122@thyrsus.com> <15206.51329.561652.565480@beluga.mojam.com> Message-ID: On Tue, 31 Jul 2001, Guido van Rossum wrote: > Actually, this may not be as big a deal as I thought before. The PVM > doesn't have a lot of knowledge about types built into its instruction > set. It knows a bit about classes, lists, dicts, but not e.g. about > ints and strings. The opcodes are mostly very abstract: BINARY_ADD etc. 
PUSH "1" PUSH "2" BINARY_ADD In Python that gives "12". In Perl that gives 3. Unless you suggest a PERL_BINARY_ADD and a PYTHON_BINARY_ADD, I don't see how you can get around these things. -- gpg --keyserver keyserver.pgp.com --recv-keys 46D01BD6 54C4E1FE Secure (inaccessible): 4BD1 7705 EEC0 260A 7F21 4817 C7FC A636 46D0 1BD6 Insecure (accessible): C5A5 A8FA CA39 AB03 10B8 F116 1713 1BCF 54C4 E1FE Learn Python! http://www.ibiblio.org/obp/thinkCSpy From mclay@nist.gov Tue Jul 31 08:27:11 2001 From: mclay@nist.gov (Michael McLay) Date: Tue, 31 Jul 2001 03:27:11 -0400 Subject: [Python-Dev] Revised decimal type PEP In-Reply-To: <3B66DE8C.C9C62012@lemburg.com> References: <0107301106520A.02216@fermi.eeel.nist.gov> <01073023115207.02466@fermi.eeel.nist.gov> <3B66DE8C.C9C62012@lemburg.com> Message-ID: <01073103271101.02004@fermi.eeel.nist.gov> On Tuesday 31 July 2001 12:36 pm, M.-A. Lemburg wrote: > I'd suggest to follow the rules for the SQL definitions > of DECIMAL(,). > Well, there are several options. I support that the IBM paper > on decimal types has good hints as to what the type should do. > Again, SQL is probably a good source for inspiration too, since > it deals with decimals a lot. Ok, I know about the IBM paper. Is there an online document on the SQL semantics that can be referenced in the PEP? > I see, the small 'b' still looks funny to me though. Wouldn't > 1.23f and 25i be more intuitive ? I originally used 'f' for both the integer and float. The use of 'b' was suggested by Guido. There were two reasons not to use 'i' for integers. The first has to do with how the tokenizer works. It doesn't distinguish between float and int when the token string is passed to parsenumber(). Both float and int are processed by the same function. I could have got around this problem by having the switch statement in parsenumber recognize both 'i' and 'f', but there is another problem with using 'i'.
The 25i would be confusing for someone if they were trying to use imaginary numbers. If they accidentally typed 25i instead of 25j, they would get an integer instead of an imaginary number. The error might not be detected since 3.0 + 4i would evaluate properly. > > > I'd rather have this explicit in the sense that you define which > > > assumptions will be made and what issues arise (rounding, truncation, > > > loss of precision, etc.). > > > > Can you give an example of how this might be implemented. > > You would typically first coerce the types to the "larger" > type, e.g. float + decimal -> float + float -> float, so > you'd only have to document how the conversion is done and > which accuracy to expect. I would be concerned about the float + decimal automatically generating a float. Would it generate an error message if the pedantic flag was set? Would it generate a warning in safe mode? Also, why do you consider a float to be a "larger" value type than decimal? Do you mean that a float is less precise? From gmcm@hypernet.com Tue Jul 31 21:27:29 2001 From: gmcm@hypernet.com (Gordon McMillan) Date: Tue, 31 Jul 2001 16:27:29 -0400 Subject: [Python-Dev] Parrot -- should life imitate satire? In-Reply-To: References: <200107311708.NAA16497@cj20424-a.reston1.va.home.com> Message-ID: <3B66DC71.25881.1347D90B@localhost> Moshe Zadka wrote: > PUSH "1" > PUSH "2" > BINARY_ADD But you get a pair of LOAD_CONSTs and a BINARY_ADD. Presumably a Perl "1" is a different object than a Python "1". - Gordon From thomas@xs4all.net Tue Jul 31 21:32:59 2001 From: thomas@xs4all.net (Thomas Wouters) Date: Tue, 31 Jul 2001 22:32:59 +0200 Subject: [Python-Dev] Parrot -- should life imitate satire? In-Reply-To: Message-ID: <20010731223259.A626@xs4all.nl> On Tue, Jul 31, 2001 at 11:10:50PM +0300, Moshe Zadka wrote: > On Tue, 31 Jul 2001, Guido van Rossum wrote: > > Actually, this may not be as big a deal as I thought before.
The PVM > > doesn't have a lot of knowledge about types built into its instruction > > set. It knows a bit about classes, lists, dicts, but not e.g. about > > ints and strings. The opcodes are mostly very abstract: BINARY_ADD etc. > PUSH "1" > PUSH "2" > BINARY_ADD > In Python that gives "12". In Perl that gives 3. > Unless you suggest a PERL_BINARY_ADD and a PYTHON_BINARY_ADD, I > don't see how you can around these things. The Perl version of the compiled code could of course be PUSH "1" COERCE_INT PUSH "2" COERCE_INT BINARY_ADD for Perl's "1" + "2" and PUSH "1" PUSH "2" BINARY_ADD for it's "1" . "2" (or, in the case of variables instead of literals, an explicit 'COERCE_STRING' or whatever.) -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From mclay@nist.gov Tue Jul 31 08:40:21 2001 From: mclay@nist.gov (Michael McLay) Date: Tue, 31 Jul 2001 03:40:21 -0400 Subject: [Python-Dev] Revised decimal type PEP In-Reply-To: <20010731163703.2F86E99C85@waltz.rahul.net> References: <20010731163703.2F86E99C85@waltz.rahul.net> Message-ID: <01073103402102.02004@fermi.eeel.nist.gov> On Tuesday 31 July 2001 12:37 pm, Aahz Maruch wrote: > Michael McLay wrote: > > For addition, subtraction, and multiplication the results would be > > exact with no rounding of the results. Calculations that include > > division the number of digits in a non-terminating result will have to > > be explicitly set. Would it make sense for this to be definedby the > > numbers used in the calculation? Could this be set in the module or > > could it be global for the application? > > This is why Cowlishaw et al require a full context for all operations. 
> At one point I tried implementing things with the context being > contained in the number rather than "global" (which actually means > thread-global, but I'm probably punting on *that* bit for the moment), > but Tim Peters persuaded me that sticking with the spec was the Right > Thing until *after* the spec was fully implemented. > > After seeing the mess generated by PEP-238, I'm fervently in favor of > sticking with external specs whenever possible. I had originally expected the context for decimal calculations to be the module in which a statement is defined. If a function defined in another module is called the rules of that other module would be applied to that part of the calculation. My expectations of how Python would work with decimal numbers don't seem to match what Guido said about his conversation with Tim, and what you said in this message. How can the rules for using decimals be stated so that a newbie can understand what they should expect to happen? We could set a default precision of 17 digits and all calculations that were not exact would be rounded to 17 digits. This would match how their calculator works. I would think this would be the model with the least surprises. For someone needing to be more precise, or less precise, how would this rule be modified? From paulp@ActiveState.com Tue Jul 31 21:48:45 2001 From: paulp@ActiveState.com (Paul Prescod) Date: Tue, 31 Jul 2001 13:48:45 -0700 Subject: [Python-Dev] Parrot -- should life imitate satire? References: <200107311708.NAA16497@cj20424-a.reston1.va.home.com>, <20010730051831.B1122@thyrsus.com> <15206.51329.561652.565480@beluga.mojam.com> Message-ID: <3B6719AD.EAC715FA@ActiveState.com> Moshe Zadka wrote: > >... > > PUSH "1" > PUSH "2" > BINARY_ADD > > In Python that gives "12". In Perl that gives 3. > Unless you suggest a PERL_BINARY_ADD and a PYTHON_BINARY_ADD, I > don't see how you can get around these things.
I'm not endorsing the approach but I think the answer is: PUSH PyString("1") PUSH PyString("2") BINARY_ADD versus PUSH PlString("1") PUSH PlString("2") BINARY_ADD i.e. the operators are generic but the operand types vary across languages. So you can completely unify the bytecodes or the types, but trying to unify both seems impossible without changing the semantics of one language or the other quite a bit. -- Take a recipe. Leave a recipe. Python Cookbook! http://www.ActiveState.com/pythoncookbook From dan@sidhe.org Tue Jul 31 21:51:30 2001 From: dan@sidhe.org (Dan Sugalski) Date: Tue, 31 Jul 2001 16:51:30 -0400 Subject: [Python-Dev] Parrot -- should life imitate satire? In-Reply-To: <20010731041443.A26075@thyrsus.com> References: <15207.1393.232974.785433@beluga.mojam.com> <20010730051831.B1122@thyrsus.com> <15206.51329.561652.565480@beluga.mojam.com> <200107311708.NAA16497@cj20424-a.reston1.va.home.com> <15206.61296.958360.72700@beluga.mojam.com> <200107311900.PAA17062@cj20424-a.reston1.va.home.com> <15207.1393.232974.785433@beluga.mojam.com> Message-ID: <5.1.0.14.0.20010731161946.02753210@24.8.96.48> [Eric, could you forward this to python-dev if it doesn't show of its own accord? I'm not yet subscribed, so I don't know if it'll make it] I should start with an apology for not being on python-dev when this started. Do please Cc me on anything, as I've not gotten on yet. (My subscription's caught in the mail, I guess... :) At 04:14 AM 7/31/2001 -0400, Eric S. Raymond wrote: >Skip Montanaro : > > > > Guido> Yeah, but the runtime could offer a choice of data types -- for > > Guido> Python code the constants table would contain Python ints and > > Guido> strings etc., for Perl code it would contain Perl string-number > > Guido> objects. Maybe. > > > > So I could give a code object generated by the Python compiler to the Perl > > runtime and get different results than if it was executed by the Python > > environment? > >No, I don't think that's what Guido is saying. 
He and I are both imagining >a *single* runtime, but with some type-specific opcodes that are generated >only by Perl and some only generated by Python. Odds are there won't even be a different set of opcodes. (Barring the possibility of the optimizer being able to *know* that an operation is guaranteed to be integer or float, and thus using special-purpose opcodes. And that's really an optimization, not a set of language-specific opcodes) The behaviour of data is governed by the data itself, so Python variables would have Python vtables attached to them guaranteeing Python behaviour, while perl ones would have perl vtables guaranteeing perl behaviour. This was covered, more or less, by the chunks of the internals talk I didn't get to. Slides, for the interested, are at http://dev.perl.org/perl6/talks/. I'm not sure if there's enough info on the slides themselves to be clear--they were written to be talked around. > > Perhaps it's time for Eric to chime in again and tell us what he really has > > in mind. I can't see the utility in having the same set of opcodes for the > > two languages if the semantics of running them under either environment > > aren't going to be the same. It seems like it would artificially constrain > > people working on the internals of both languages. > >You're right. > >What I have in mind starts with a common opcode interpreter, perhaps >based on the Python VM but with extended opcodes where Perl type >semantics don't match, and a common callout mechanism to C-level >runtime libraries linked to the opcode interpreter. I've snipped the rest here. I don't think Parrot will be built off the Python interpreter. This isn't out of any NIH feelings or anything--I'm obligated to make it work for Perl, as that's the primary point. If we can make Python a primary point too that's keen, and something I *want*, but I do need to keep focused on perl. 
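[Editorial aside: the per-value vtable scheme Dan describes earlier in this message can be sketched in Python. This is a toy illustration, not code from the Parrot slides; the names Value, PYTHON_VTABLE, and PERL_VTABLE are invented for the example:]

```python
# Each value carries a table of operations, so one interpreter loop can host
# values with Python-like semantics and values with Perl-like semantics
# side by side: behaviour is delegated to the value, not to the opcode.
class Value:
    def __init__(self, vtable, raw):
        self.vtable = vtable
        self.raw = raw

    def add(self, other):
        # Dispatch through this value's own vtable.
        return self.vtable["add"](self, other)

def py_add(a, b):
    # Python-ish '+': whatever the underlying objects do (here, str concat).
    return Value(a.vtable, a.raw + b.raw)

def perl_add(a, b):
    # Perl-ish '+': numify both operands first.
    return Value(a.vtable, int(a.raw) + int(b.raw))

PYTHON_VTABLE = {"add": py_add}
PERL_VTABLE = {"add": perl_add}

assert Value(PYTHON_VTABLE, "1").add(Value(PYTHON_VTABLE, "2")).raw == "12"
assert Value(PERL_VTABLE, "1").add(Value(PERL_VTABLE, "2")).raw == 3
```

The same BINARY_ADD-style entry point yields "12" or 3 depending only on which vtable the operands carry, which is the point Dan and Nat are making.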
Having said that, what I'm doing is stepping back from perl and trying, wherever possible, to make the runtime generic. If there's no reason to be perl specific I'm not, and so far that's not been a problem. (It actually makes life easier in a lot of ways, since we can then delegate the decision on how things are done to the variables involved, providing a default set of behaviours which the parser will end up determining anyway) On some things I think I'm being a bit more vicious than, say, Python is by default. (For example, if extension code wants to hold on to a variable across a GC boundary it had darned well better register that fact with the interpreter, or it's going to find itself with trash) I'm not sure about the extension mechanism in general--I've not had a chance to look too closely at what Python does now, but I don't doubt that, at the C level, the differences between the languages will be pretty trivial and easily abstractable. Seeing what you folks have is on the list 'o things to do--I may well steal from it wholesale. :) I expect there's a bunch of stuff I'm missing here, so if anyone wants to peg me with questions, go for it. (Cc me if they're going to the dev list please, at least until I'm sure I'm on) I really would like to see Parrot as a viable back end for Python--I think the joint development resources we could muster (possibly with the Ruby folks as well) could get us a VM for dynamically typed languages to rival the JVM/.NET for statically typed ones. Dan --------------------------------------"it's like this"------------------- Dan Sugalski even samurai dan@sidhe.org have teddy bears and even teddy bears get drunk From thomas@xs4all.net Tue Jul 31 21:54:45 2001 From: thomas@xs4all.net (Thomas Wouters) Date: Tue, 31 Jul 2001 22:54:45 +0200 Subject: [Python-Dev] Parrot -- should life imitate satire? 
In-Reply-To: <15207.1909.395000.123189@gargle.gargle.HOWL> References: <20010730051831.B1122@thyrsus.com> <15206.51329.561652.565480@beluga.mojam.com> <200107311708.NAA16497@cj20424-a.reston1.va.home.com> <15206.61296.958360.72700@beluga.mojam.com> <200107311900.PAA17062@cj20424-a.reston1.va.home.com> <15207.1909.395000.123189@gargle.gargle.HOWL> Message-ID: <20010731225445.B626@xs4all.nl> On Tue, Jul 31, 2001 at 01:31:01PM -0600, Nathan Torkington wrote: > Guido van Rossum writes: > > Yeah, but the runtime could offer a choice of data types -- for Python > > code the constants table would contain Python ints and strings etc., for > > Perl code it would contain Perl string-number objects. Maybe. > A perl6 value has a vtable, essentially an array of function pointers > which comprises the standard operations on that value. I talked to > Dan (the perl6 internals guy, dan@sidhe.org) about an impedance > mismatch between Perl and Python data types, and he pointed out that > you can have Perl values and Python values, each with their own > semantics, simply by having separate vtables (and thus separate > functions to implement the behaviour of those types). Code can work > with either type because the type carries around (in its vtable) the > knowledge of how it should behave. Python objects all have vtables too (though they're structs, not arrays... I'm not sure why you'd use arrays; check the way Python uses them, you can do just about anything you want with them, including growing them without breaking binary compatibility, due to the fact Python never memmoves/copies) but that wouldn't solve the problem. The problem isn't that the VM wouldn't know what to do with the various types -- it's absolutely no problem to make a Python object that behaves like a Perl scalar, or a Perl hash, including the auto-converting bits... The problem is that we'd end up with two different sets of types... 
Dicts/hashes could be merged, though Perl6 will have to decide if it still wants to auto-stringify the keys (Python dicts can hold any hashable object as key) and arrays could possibly be too, but scalars are a different type. You basically lose the interchangeability benefit if Perl6 code all works with the 'Scalar' type, but Python code just uses the distinct int/string/class-instance... But now that I think about it, this might not be a big problem after all. I assume Perl6 will always convert to fit the operation, like Perl5 does. It'll just have to learn to handle a few more objects, and most notably user-defined types and extension types. Python C code already does things like 'PyObject_ToInt' to convert a Python value to a C value it can work with, or just uses the PyObject_ (or PyMapping_ , etc) API to manipulate objects. Python code wouldn't notice the difference unless it did type checks, and the Perl6 types could be made siblings of the Python types to make it pass those, too. We already have the 8-bit and 16-bit strings. About the only *real* problem I see with that is getting the whole farm of mexican jumping beans to figure-skate in unison... It'll be an interesting experience, with a lot of slippery falls and just-in-time recovering... not to mention quite a bit of ego-massaging :-) But I think it's just a manner of typing code and taking the time, and forget about optimizing the code the first couple of years. -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread!
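The auto-stringification difference mentioned above is easy to make visible: Python keeps differently-typed keys distinct, while a Perl-style hash coerces every key to a string. A toy emulation (not Perl's real behaviour in every corner):

```python
# Python dict: the int 1 and the string "1" are two separate keys
d = {1: "int key", "1": "str key"}

class StringifyingHash(dict):
    """Toy Perl-hash emulation: coerce every key to a string."""
    def __setitem__(self, key, value):
        super().__setitem__(str(key), value)
    def __getitem__(self, key):
        return super().__getitem__(str(key))

h = StringifyingHash()
h[1] = "first"
h["1"] = "second"    # same stringified key: overwrites the entry above
```

Merging the two models means deciding which of these two behaviours wins when code from both languages touches the same hash.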
From aahz@rahul.net Tue Jul 31 22:07:02 2001 From: aahz@rahul.net (Aahz Maruch) Date: Tue, 31 Jul 2001 14:07:02 -0700 (PDT) Subject: [Python-Dev] Revised decimal type PEP In-Reply-To: <01073103402102.02004@fermi.eeel.nist.gov> from "Michael McLay" at Jul 31, 2001 03:40:21 AM Message-ID: <20010731210702.A778D99C82@waltz.rahul.net> Michael McLay wrote: > > I had originally expected the context for decimal calculations to be > the module in which a statement is defined. If a function defined > in another module is called the rules of that other module would be > applied to that part of the calculation. My expectations of how > Python would work with decimal numbers doesn't seem to match what > Guido said about his conversation with Tim, and what you said in this > message. > > How can the rules for using decimals be stated so that a newbie can > understand what they should expect to happen? We could set a default > precision of 17 digits and all calculations that were not exact would > be rounded to 17 digits. This would match how their calculator works. > I would think this would be the model with the least surprises. For > someone needing to be more precise, or less precise, how would this > rule be modified? I intend to have more discussions with Cowlishaw once I finish implementing his spec, but I suspect his answer will be that whoever calls the module should set the precision. -- --- Aahz (@pobox.com) Hugs and backrubs -- I break Rule 6 <*> http://www.rahul.net/aahz/ Androgynous poly kinky vanilla queer het Pythonista I don't really mind a person having the last whine, but I do mind someone else having the last self-righteous whine. From niemeyer@conectiva.com Tue Jul 31 22:09:54 2001 From: niemeyer@conectiva.com (Gustavo Niemeyer) Date: Tue, 31 Jul 2001 18:09:54 -0300 Subject: [Python-Dev] Info documentation Message-ID: <20010731180954.J19610@tux.distro.conectiva> Hello!
I've taken the info files somebody has sent to the python-list and included in Conectiva Linux' python package. People found it very practical to use the documentation in this format. Would it be possible to have this format built just like the others for version 2.2? Thanks! -- Gustavo Niemeyer [ 2AAC 7928 0FBF 0299 5EB5 60E2 2253 B29A 6664 3A0C ] From esr@thyrsus.com Tue Jul 31 10:18:08 2001 From: esr@thyrsus.com (Eric S. Raymond) Date: Tue, 31 Jul 2001 05:18:08 -0400 Subject: [Python-Dev] Parrot -- should life imitate satire? In-Reply-To: <20010731225445.B626@xs4all.nl>; from thomas@xs4all.net on Tue, Jul 31, 2001 at 10:54:45PM +0200 References: <20010730051831.B1122@thyrsus.com> <15206.51329.561652.565480@beluga.mojam.com> <200107311708.NAA16497@cj20424-a.reston1.va.home.com> <15206.61296.958360.72700@beluga.mojam.com> <200107311900.PAA17062@cj20424-a.reston1.va.home.com> <15207.1909.395000.123189@gargle.gargle.HOWL> <20010731225445.B626@xs4all.nl> Message-ID: <20010731051808.A27187@thyrsus.com> Thomas Wouters : > About the only *real* problem I see with that is getting the whole farm of > mexican jumping beans to figure-skate in unison... It'll be an interesting > experience, with a lot of slippery falls and just-in-time recovering... not > to mention quite a bit of ego-massaging :-) But I think it's just a manner > of typing code and taking the time, and forget about optimizing the code the > first couple of years. This is just about exactly how I see it, too. The big problem isn't any of the technical challenges -- the discussion so far indicates these are surmountable, and in fact may be less daunting than many of us originally assumed. The big problem will be summoning the political will to make the right commitments and the right compromises. Making this work is going to take strong leadership from Larry and Guido. We're laying some of the technical groundwork now. More will have to be done. 
But I think the key moment, if it happens, will be the one at which Guido and Larry, each flanked by their three or four chief lieutenants, shake hands for the cameras and issue a joint ukase to their tribes. Tim, hosting that meeting will be your job, of course :-). -- Eric S. Raymond "Those who make peaceful revolution impossible will make violent revolution inevitable." -- John F. Kennedy From tim.one@home.com Tue Jul 31 23:19:39 2001 From: tim.one@home.com (Tim Peters) Date: Tue, 31 Jul 2001 18:19:39 -0400 Subject: [Python-Dev] Plan to merge descr-branch into trunk Message-ID: Unless somebody raises a killer objection over the next ~24 hours, I plan to merge the descr-branch back into the trunk Wednesday PM (EDT), thus ending descr-branch as a distinct line of Python development. Since it would be intractably hard to roll back the code changes, this represents a solid commitment to Guido's type/class work for 2.2 final. There may be objections on those grounds. If so, good luck selling them to Guido . I don't have any worries about the mechanics of the merge, so you shouldn't either. We've been very conscientious over the last month+ about merging trunk changes into descr-branch frequently, and of course I'll do that one last time before going the other direction. all's-well-that-ends-ly y'rs - tim From esr@thyrsus.com Tue Jul 31 14:59:41 2001 From: esr@thyrsus.com (Eric S. Raymond) Date: Tue, 31 Jul 2001 09:59:41 -0400 Subject: [Python-Dev] Parrot -- should life imitate satire? In-Reply-To: ; from ping@lfw.org on Tue, Jul 31, 2001 at 06:12:55PM -0700 References: <20010731041443.A26075@thyrsus.com> Message-ID: <20010731095941.E1708@thyrsus.com> Ka-Ping Yee : > On Tue, 31 Jul 2001, Eric S. Raymond wrote: > > Things Python would bring to this party: our serious-cool GC, our > > C extension/embedding system (*much* nicer than XS). Things Perl would > > bring: blazingly fast regexps, taint, flexitypes, references. > > I don't really understand the motivation. 
Do we want any of those things? No, but we want to be able to interoperate with Perl and, if possible, have just one back end on which efforts to do things like native code compilation can be concentrated. -- Eric S. Raymond The common argument that crime is caused by poverty is a kind of slander on the poor. -- H. L. Mencken From tim.one at home.com Sun Jul 1 03:58:29 2001 From: tim.one at home.com (Tim Peters) Date: Sat, 30 Jun 2001 21:58:29 -0400 Subject: [Python-Dev] Support for "wide" Unicode characters In-Reply-To: <3B3E4487.40054EAE@ActiveState.com> Message-ID: [Paul Prescod] > "The Energy is the mass of the object times the speed of light times > two." [David Ascher] > Actually, it's "squared", not times two. At least in my universe =) This is something for Guido to Pronounce on, then. Who's going to write the PEP? The threat of nuclear war seems almost laughable in Paul's universe, so it's certainly got attractions. OTOH, it's got to be a lot colder too. energy-will-do-what-guido-tells-it-to-do-ly y'rs - tim From paulp at ActiveState.com Sun Jul 1 05:59:02 2001 From: paulp at ActiveState.com (Paul Prescod) Date: Sat, 30 Jun 2001 20:59:02 -0700 Subject: [Python-Dev] Support for "wide" Unicode characters References: <20010630141524.E029999C80@waltz.rahul.net> <3B3E23D3.69D591DD@ActiveState.com> <3B3E4487.40054EAE@ActiveState.com> Message-ID: <3B3EA006.14882609@ActiveState.com> David Ascher wrote: > > > "The Energy is the mass of the object times the speed of light times > > two." > > Actually, it's "squared", not times two. At least in my universe =) Pedant. Next you're going to claim that these silly equations effect my life somehow. -- Take a recipe. Leave a recipe. Python Cookbook!
http://www.ActiveState.com/pythoncookbook From paulp at ActiveState.com Sun Jul 1 06:04:49 2001 From: paulp at ActiveState.com (Paul Prescod) Date: Sat, 30 Jun 2001 21:04:49 -0700 Subject: [Python-Dev] Support for "wide" Unicode characters References: <3B3BEF21.63411C4C@ActiveState.com> <3B3C95D8.518E5175@egenix.com> <3B3D2869.5C1DDCF1@ActiveState.com> <3B3DBD86.81F80D06@egenix.com> Message-ID: <3B3EA161.1375F74C@ActiveState.com> "M.-A. Lemburg" wrote: > >... > > The term "character" in Python should really only be used for > the 8-bit strings. Are we going to change chr() and unichr() to one_element_string() and unicode_one_element_string() u[i] is a character. If u is Unicode, then u[i] is a Python Unicode character. No Python user will find that confusing no matter how Unicode knuckle-dragging, mouth-breathing, wife-by-hair-dragging they are. > In Unicode a "character" can mean any of: Mark Davis said that "people" can use the word to mean any of those things. He did not say that it was imprecisely defined in Unicode. Nevertheless I'm not using the Unicode definition anymore than our standard library uses an ancient Greek definition of integer. Python has a concept of integer and a concept of character. > > It has been proposed that there should be a module for working > > with UTF-16 strings in narrow Python builds through some sort of > > abstraction that handles surrogates for you. If someone wants > > to implement that, it will be another PEP. > > Uhm, narrow builds don't support UTF-16... it's UCS-2 which > is supported (basically: store everything in range(0x10000)); > the codecs can map code points to surrogates, but it is solely > their responsibility and the responsibility of the application > using them to take care of dealing with surrogates. The user can view the data as UCS-2, UTF-16, Base64, ROT-13, XML, .... 
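The surrogate arithmetic such a codec has to perform is small. A sketch, with hypothetical helper names but the standard UTF-16 bit manipulation for code points beyond the Basic Multilingual Plane:

```python
def to_surrogates(cp):
    """Split a code point in U+10000..U+10FFFF into a (high, low) pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("not a supplementary-plane code point")
    v = cp - 0x10000            # 20 bits of payload
    return 0xD800 | (v >> 10), 0xDC00 | (v & 0x3FF)

def from_surrogates(high, low):
    """Recombine a (high, low) surrogate pair into one code point."""
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)
```

U+1D11E (musical G clef), for example, becomes the pair (0xD834, 0xDD1E) -- exactly the two narrow-build "characters" being discussed.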
Just as we have a base64 module, we could have a UTF-16 module that interprets the data in the string as UTF-16 and does surrogate manipulation for you. Anyhow, if any of those is the "real" encoding of the data, it is UTF-16. After all, if the codec reads in four non-BMP characters in, let's say, UTF-8, we represent them as 8 narrow-build Python characters. That's the definition of UTF-16! But it's easy enough for me to take that word out so I will. >... > Also, the module will be useful for both narrow and wide builds, > since the notion of an encoded character can involve multiple code > points. In that sense Unicode is always a variable length > encoding for characters and that's the application field of > this module. I wouldn't advise that you do all different types of normalization in a single module but I'll wait for your PEP. > Here's the adjusted text: > > It has been proposed that there should be a module for working > with Unicode objects using character-, word- and line- based > indexing. The details of the implementation is left to > another PEP. It has been proposed that there should be a module that handles surrogates in narrow Python builds for programmers. If someone wants to implement that, it will be another PEP. It might also be combined with features that allow other kinds of character-, word- and line- based indexing. -- Take a recipe. Leave a recipe. Python Cookbook! 
http://www.ActiveState.com/pythoncookbook From DavidA at ActiveState.com Sun Jul 1 08:09:40 2001 From: DavidA at ActiveState.com (David Ascher) Date: Sat, 30 Jun 2001 23:09:40 -0700 Subject: [Python-Dev] Support for "wide" Unicode characters References: <20010630141524.E029999C80@waltz.rahul.net> <3B3E23D3.69D591DD@ActiveState.com> <3B3E4487.40054EAE@ActiveState.com> <3B3EA006.14882609@ActiveState.com> Message-ID: <3B3EBEA4.3EC84EAF@ActiveState.com> Paul Prescod wrote: > > David Ascher wrote: > > > > > "The Energy is the mass of the object times the speed of light times > > > two." > > > > Actually, it's "squared", not times two. At least in my universe =) > > Pedant. Next you're going to claim that these silly equations effect my > life somehow. Although one stretch the argument to say that the equations _effect_ your life, I'd limit the claim to stating that they _affect_ your life. pedantly y'rs, --dr david From paulp at ActiveState.com Sun Jul 1 08:15:46 2001 From: paulp at ActiveState.com (Paul Prescod) Date: Sat, 30 Jun 2001 23:15:46 -0700 Subject: [Python-Dev] Support for "wide" Unicode characters References: <20010630141524.E029999C80@waltz.rahul.net> <3B3E23D3.69D591DD@ActiveState.com> <3B3E4487.40054EAE@ActiveState.com> <3B3EA006.14882609@ActiveState.com> <3B3EBEA4.3EC84EAF@ActiveState.com> Message-ID: <3B3EC012.A3A05E64@ActiveState.com> David Ascher wrote: > > Paul Prescod wrote: > > > > David Ascher wrote: > > > > > > > "The Energy is the mass of the object times the speed of light times > > > > two." > > > > > > Actually, it's "squared", not times two. At least in my universe =) > > > > Pedant. Next you're going to claim that these silly equations effect my > > life somehow. > > Although one stretch the argument to say that the equations _effect_ ^ might ----- > your life, I'd limit the claim to stating that they _affect_ your life. And you just bought such a shiny, new glass, house. Pity. -- Take a recipe. Leave a recipe. Python Cookbook! 
http://www.ActiveState.com/pythoncookbook From nhodgson at bigpond.net.au Sun Jul 1 15:00:15 2001 From: nhodgson at bigpond.net.au (Neil Hodgson) Date: Sun, 1 Jul 2001 23:00:15 +1000 Subject: [Python-Dev] Support for "wide" Unicode characters References: <3B3BEF21.63411C4C@ActiveState.com> <3B3C95D8.518E5175@egenix.com> <3B3D2869.5C1DDCF1@ActiveState.com> <3B3DBD86.81F80D06@egenix.com> <3B3EA161.1375F74C@ActiveState.com> Message-ID: <00dd01c1022d$c61e4160$0acc8490@neil> Paul Prescod: The problem I have with this PEP is that it is a compile time option which makes it hard to work with both 32 bit and 16 bit strings in one program. Can not the 32 bit string type be introduced as an additional type? > Are we going to change chr() and unichr() to one_element_string() and > unicode_one_element_string() > > u[i] is a character. If u is Unicode, then u[i] is a Python Unicode > character. This wasn't usefully true in the past for DBCS strings and is not the right way to think of either narrow or wide strings now. The idea that strings are arrays of characters gets in the way of dealing with many encodings and is the primary difficulty in localising software for Japanese. Iteration through the code units in a string is a problem waiting to bite you and string APIs should encourage behaviour which is correct when faced with variable width characters, both DBCS and UTF style. Iteration over variable width characters should be performed in a way that preserves the integrity of the characters. M.-A. Lemburg's proposed set of iterators could be extended to indicate encoding "for c in s.asCharacters('utf-8')" and to provide for the various intended string uses such as "for c in s.inVisualOrder()" reversing the receipt of right-to-left substrings. 
Neil From guido at digicool.com Sun Jul 1 15:44:29 2001 From: guido at digicool.com (Guido van Rossum) Date: Sun, 01 Jul 2001 09:44:29 -0400 Subject: [Python-Dev] Support for "wide" Unicode characters In-Reply-To: Your message of "Sun, 01 Jul 2001 23:00:15 +1000." <00dd01c1022d$c61e4160$0acc8490@neil> References: <3B3BEF21.63411C4C@ActiveState.com> <3B3C95D8.518E5175@egenix.com> <3B3D2869.5C1DDCF1@ActiveState.com> <3B3DBD86.81F80D06@egenix.com> <3B3EA161.1375F74C@ActiveState.com> <00dd01c1022d$c61e4160$0acc8490@neil> Message-ID: <200107011344.f61DiTM03548@odiug.digicool.com> > > > The problem I have with this PEP is that it is a compile time option > which makes it hard to work with both 32 bit and 16 bit strings in one > program. Can not the 32 bit string type be introduced as an additional type? Not without an outrageous amount of additional coding (every place in the code that currently uses PyUnicode_Check() would have to be bifurcated in a 16-bit and a 32-bit variant). I doubt that the desire to work with both 16- and 32-bit characters in one program is typical for folks using Unicode -- that's mostly limited to folks writing conversion tools. Python will offer the necessary codecs so you shouldn't have this need very often. You can use the array module to manipulate 16- and 32-bit arrays, and you can use the various Unicode encodings to do the necessary encodings. > > u[i] is a character. If u is Unicode, then u[i] is a Python Unicode > > character. > > This wasn't usefully true in the past for DBCS strings and is not the > right way to think of either narrow or wide strings now. The idea that > strings are arrays of characters gets in the way of dealing with many > encodings and is the primary difficulty in localising software for Japanese. Can you explain the kind of problems encountered in some more detail? 
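The array-module approach suggested above can be sketched in today's spellings (the codec names assume a modern Python; 'I' is 4 bytes on common platforms): encode through a codec, then manipulate the raw 16- or 32-bit code units with the array module.

```python
import array

s = "abc"

units16 = array.array("H")                # 'H': unsigned 16-bit integers
units16.frombytes(s.encode("utf-16-le"))  # UTF-16 code units

units32 = array.array("I")                # 'I': unsigned int, 4 bytes on
units32.frombytes(s.encode("utf-32-le"))  # common platforms (UCS-4 units)
```

For BMP-only text the two arrays hold the same numbers; only supplementary-plane characters make them diverge.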
> Iteration through the code units in a string is a problem waiting to bite > you and string APIs should encourage behaviour which is correct when faced > with variable width characters, both DBCS and UTF style. But this is not the Unicode philosophy. All the variable-length character manipulation is supposed to be taken care of by the codecs, and then the application can deal in arrays of characters. Alternatively, the application can deal in opaque objects representing variable-length encodings, but then it should be very careful with concatenation and even more so with slicing. > Iteration over > variable width characters should be performed in a way that preserves the > integrity of the characters. M.-A. Lemburg's proposed set of iterators could > be extended to indicate encoding "for c in s.asCharacters('utf-8')" and to > provide for the various intended string uses such as "for c in > s.inVisualOrder()" reversing the receipt of right-to-left substrings. I think it's a good idea to provide a set of higher-level tools as well. However nobody seems to know what these higher-level tools should do yet. PEP 261 is specifically focused on getting the lower-level foundations right (i.e. the objects that represent arrays of code units), so that the authors of higher level tools will have a solid base. If you want to help author a PEP for such higher-level tools, you're welcome! --Guido van Rossum (home page: http://www.python.org/~guido/) From loewis at informatik.hu-berlin.de Sun Jul 1 15:52:58 2001 From: loewis at informatik.hu-berlin.de (Martin von Loewis) Date: Sun, 1 Jul 2001 15:52:58 +0200 (MEST) Subject: [Python-Dev] Support for "wide" Unicode characters Message-ID: <200107011352.PAA27645@pandora.informatik.hu-berlin.de> > The problem I have with this PEP is that it is a compile time option > which makes it hard to work with both 32 bit and 16 bit strings in > one program. Can you elaborate why you think this is a problem?
> Can not the 32 bit string type be introduced as an additional type? Yes, but not just "like that". You'd have to define an API for creating values of this type, you'd have to teach all functions which ought to accept it to process it, you'd have to define conversion operations and all that: In short, you'd have to go through all the trouble that introduction of the Unicode type gave us once again. Also, I cannot see any advantages in introducing yet another type. Implementing this PEP is straight forward, and with almost no visible effect to Python programs. People have suggested to make it a run-time decision, having the internal representation switch on demand, but that would give an API nightmare for C code that has to access such values. > u[i] is a character. If u is Unicode, then u[i] is a Python Unicode > character. > This wasn't usefully true in the past for DBCS strings and is not the > right way to think of either narrow or wide strings now. The idea > that strings are arrays of characters gets in the way of dealing > with many encodings and is the primary difficulty in localising > software for Japanese. While I don't know much about localising software for Japanese (*), I agree that 'u[i] is a character' isn't useful to say in many cases. If this is the old Python string type, I'd much prefer calling u[i] a 'byte'. Regards, Martin (*) Methinks that the primary difficulty still is translating all the documentation, and messages. Actually, keeping the translations up-to-date is even more challenging. 
From aahz at rahul.net Sun Jul 1 16:19:41 2001 From: aahz at rahul.net (Aahz Maruch) Date: Sun, 1 Jul 2001 07:19:41 -0700 (PDT) Subject: [Python-Dev] Support for "wide" Unicode characters In-Reply-To: <3B3EC012.A3A05E64@ActiveState.com> from "Paul Prescod" at Jun 30, 2001 11:15:46 PM Message-ID: <20010701141941.A323099C80@waltz.rahul.net> Paul Prescod wrote: > David Ascher wrote: >> Paul Prescod wrote: >>> David Ascher wrote: >>>>> >>>>> "The Energy is the mass of the object times the speed of light times >>>>> two." >>>> >>>> Actually, it's "squared", not times two. At least in my universe =) >>> >>> Pedant. Next you're going to claim that these silly equations effect my >>> life somehow. >> >> Although one stretch the argument to say that the equations _effect_ > ^ > might ----- > >> your life, I'd limit the claim to stating that they _affect_ your life. > > And you just bought such a shiny, new glass, house. Pity. All speeling falmes contain at least one erorr. -- --- Aahz (@pobox.com) Hugs and backrubs -- I break Rule 6 <*> http://www.rahul.net/aahz/ Androgynous poly kinky vanilla queer het Pythonista I don't really mind a person having the last whine, but I do mind someone else having the last self-righteous whine. From just at letterror.com Sun Jul 1 16:43:08 2001 From: just at letterror.com (Just van Rossum) Date: Sun, 1 Jul 2001 16:43:08 +0200 Subject: [Python-Dev] Support for "wide" Unicode characters In-Reply-To: <200107011344.f61DiTM03548@odiug.digicool.com> Message-ID: <20010701164315-r01010600-c2d5b07d@213.84.27.177> Guido van Rossum wrote: > > > > > > The problem I have with this PEP is that it is a compile time option > > which makes it hard to work with both 32 bit and 16 bit strings in one > > program. Can not the 32 bit string type be introduced as an additional type? 
> > Not without an outrageous amount of additional coding (every place in > the code that currently uses PyUnicode_Check() would have to be > bifurcated in a 16-bit and a 32-bit variant). Alternatively, a Unicode object could *internally* be either 8, 16 or 32 bits wide (to be clear: not per character, but per string). Also a lot of work, but it'll be a lot less wasteful. > I doubt that the desire to work with both 16- and 32-bit characters in > one program is typical for folks using Unicode -- that's mostly > limited to folks writing conversion tools. Python will offer the > necessary codecs so you shouldn't have this need very often. Not a lot of people will want to work with 16 or 32 bit chars directly, but I think a less wasteful solution to the surrogate pair problem *will* be desired by people. Why use 32 bits for all strings in a program when only a tiny percentage actually *needs* more than 16? (Or even 8...) > > Iteration through the code units in a string is a problem waiting to bite > > you and string APIs should encourage behaviour which is correct when faced > > with variable width characters, both DBCS and UTF style. > > But this is not the Unicode philosophy. All the variable-length > character manipulation is supposed to be taken care of by the codecs, > and then the application can deal in arrays of characters. Right: this is the way it should be. My difficulty with PEP 261 is that I'm afraid few people will actually enable 32-bit support (*what*?! all unicode strings become 32 bits wide? no way!), therefore making programs non-portable in very subtle ways.
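The per-string idea fits in a few lines: pick the narrowest unit that holds the string's widest character (the helper name is hypothetical; Python itself eventually adopted exactly this scheme in PEP 393, in 3.3).

```python
def best_unit_width(s):
    """Smallest unit width in bytes (1, 2 or 4) that can hold s."""
    widest = max(map(ord, s), default=0)
    if widest < 0x100:
        return 1    # Latin-1 payload
    if widest < 0x10000:
        return 2    # BMP-only payload
    return 4        # supplementary-plane characters present
```

Mostly-ASCII programs then pay one byte per character, and only strings that really contain supplementary-plane characters pay four.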
Just From DavidA at ActiveState.com Sun Jul 1 19:13:30 2001 From: DavidA at ActiveState.com (David Ascher) Date: Sun, 01 Jul 2001 10:13:30 -0700 Subject: [Python-Dev] Support for "wide" Unicode characters References: <20010630141524.E029999C80@waltz.rahul.net> <3B3E23D3.69D591DD@ActiveState.com> <3B3E4487.40054EAE@ActiveState.com> <3B3EA006.14882609@ActiveState.com> <3B3EBEA4.3EC84EAF@ActiveState.com> <3B3EC012.A3A05E64@ActiveState.com> Message-ID: <3B3F5A3A.A88B54B2@ActiveState.com> Paul: > And you just bought such a shiny, new glass, house. Pity. What kind of comma placement is that? --david From paulp at ActiveState.com Sun Jul 1 20:08:10 2001 From: paulp at ActiveState.com (Paul Prescod) Date: Sun, 01 Jul 2001 11:08:10 -0700 Subject: [Python-Dev] Support for "wide" Unicode characters References: <3B3BEF21.63411C4C@ActiveState.com> <3B3C95D8.518E5175@egenix.com> <3B3D2869.5C1DDCF1@ActiveState.com> <3B3DBD86.81F80D06@egenix.com> <3B3EA161.1375F74C@ActiveState.com> <00dd01c1022d$c61e4160$0acc8490@neil> Message-ID: <3B3F670A.B5396D61@ActiveState.com> Neil Hodgson wrote: > > Paul Prescod: > > > The problem I have with this PEP is that it is a compile time option > which makes it hard to work with both 32 bit and 16 bit strings in one > program. Can not the 32 bit string type be introduced as an additional type? The two solutions are not mutually exclusive. If you (or someone) supplies a 32-bit type and Guido accepts it, then the compile option might fall into disuse. But this solution was chosen because it is much less work. Really though, I think that having 16-bit and 32-bit types is extra confusion for very little gain. I would much rather have a single space-efficient type that hid the details of its implementation. But nobody has volunteered to code it and Guido might not accept it even if someone did. >... > This wasn't usefully true in the past for DBCS strings and is not the > right way to think of either narrow or wide strings now. 
The idea that > strings are arrays of characters gets in the way of dealing with many > encodings and is the primary difficulty in localising software for Japanese. The whole benefit of moving to 32-bit character strings is to allow people to think of strings as arrays of characters. Forcing them to consider variable-length encodings is precisely what we are trying to avoid. > Iteration through the code units in a string is a problem waiting to bite > you and string APIs should encourage behaviour which is correct when faced > with variable width characters, both DBCS and UTF style. Iteration over > variable width characters should be performed in a way that preserves the > integrity of the characters. On wide Python builds there is no such thing as variable width Unicode characters. It doesn't make sense to combine two 32-bit characters to get a 64-bit one. On narrow Python builds you might want to treat a surrogate pair as a single character but I would strongly advise against it. If you want wide characters, move to a wide build. Even if a narrow build is more space efficient, you'll lose a ton of performance emulating wide characters in Python code. > ... M.-A. Lemburg's proposed set of iterators could > be extended to indicate encoding "for c in s.asCharacters('utf-8')" and to > provide for the various intended string uses such as "for c in > s.inVisualOrder()" reversing the receipt of right-to-left substrings. A floor wax and a dessert topping. <0.5 wink> I don't think that the average Python programmer would want s.asCharacters('utf-8') when they already have s.decode('utf-8'). We decided a long time ago that the model for standard users would be fixed-length (1!), abstract characters. That's the way Python's Unicode subsystem has always worked. -- Take a recipe. Leave a recipe. Python Cookbook!
http://www.ActiveState.com/pythoncookbook From paulp at ActiveState.com Sun Jul 1 20:19:17 2001 From: paulp at ActiveState.com (Paul Prescod) Date: Sun, 01 Jul 2001 11:19:17 -0700 Subject: [Python-Dev] Support for "wide" Unicode characters References: <20010701164315-r01010600-c2d5b07d@213.84.27.177> Message-ID: <3B3F69A5.D7CE539D@ActiveState.com> Just van Rossum wrote: > > Guido van Rossum wrote: > > > > > > > > > > The problem I have with this PEP is that it is a compile time option > > > which makes it hard to work with both 32 bit and 16 bit strings in one > > > program. Can not the 32 bit string type be introduced as an additional type? > > > > Not without an outrageous amount of additional coding (every place in > > the code that currently uses PyUnicode_Check() would have to be > > bifurcated in a 16-bit and a 32-bit variant). > > Alternatively, a Unicode object could *internally* be either 8, 16 or 32 bits > wide (to be clear: not per character, but per string). Also a lot of work, but > it'll be a lot less wasteful. I hope this is where we end up one day. But the compile-time option is better than where we are today. Even though PEP 261 is not my favorite solution, it buys us a couple of years of wait-and-see time. Consider that computer memory is growing much faster than textual data. People's text processing techniques get more and more "wasteful" because it is now almost always possible to load the entire "text" into memory at once. I remember how some text editors used to boast that they only loaded your text "on demand". Maybe so much data will be passed to us from UCS-4 APIs that trying to "compress it" will actually be inefficient. Maybe two years from now Guido will make UCS-4 the default and only a tiny minority will notice or care. > ... > My difficulty with PEP 261 is that I'm afraid few people will actually enable > 32-bit support (*what*?! all unicode strings become 32 bits wide? 
no way!), > therefore making programs non-portable in very subtle ways. It really depends on what the default build option is. -- Take a recipe. Leave a recipe. Python Cookbook! http://www.ActiveState.com/pythoncookbook From paulp at ActiveState.com Sun Jul 1 20:22:01 2001 From: paulp at ActiveState.com (Paul Prescod) Date: Sun, 01 Jul 2001 11:22:01 -0700 Subject: [Python-Dev] Support for "wide" Unicode characters References: <20010630141524.E029999C80@waltz.rahul.net> <3B3E23D3.69D591DD@ActiveState.com> <3B3E4487.40054EAE@ActiveState.com> <3B3EA006.14882609@ActiveState.com> <3B3EBEA4.3EC84EAF@ActiveState.com> <3B3EC012.A3A05E64@ActiveState.com> <3B3F5A3A.A88B54B2@ActiveState.com> Message-ID: <3B3F6A49.6E82B7DE@ActiveState.com> David Ascher wrote: > > Paul: > > And you just bought such a shiny, new glass, house. Pity. > > What kind of comma placement is that? I had to leave you something to complain about; -- Take a recipe. Leave a recipe. Python Cookbook! http://www.ActiveState.com/pythoncookbook From guido at digicool.com Sun Jul 1 20:37:48 2001 From: guido at digicool.com (Guido van Rossum) Date: Sun, 01 Jul 2001 14:37:48 -0400 Subject: [Python-Dev] Support for "wide" Unicode characters In-Reply-To: Your message of "Sun, 01 Jul 2001 16:43:08 +0200." <20010701164315-r01010600-c2d5b07d@213.84.27.177> References: <20010701164315-r01010600-c2d5b07d@213.84.27.177> Message-ID: <200107011837.f61IbmZ03645@odiug.digicool.com> > Alternatively, a Unicode object could *internally* be either 8, 16 > or 32 bits wide (to be clear: not per character, but per > string). Also a lot of work, but it'll be a lot less wasteful. Depending on what you prefer to waste: developers' time or computer resources. I bet that if you try the measure the wasted space you'll find that it wastes very little compared to all the other overheads in a typical Python program: CPU time compared to writing your code in C, memory overhead for integers, etc. 
It so happened that the Unicode support was written to make it very easy to change the compile-time code unit size; but making this a per-string (or even global) run-time variable is much harder without touching almost every place that uses Unicode (not to mention slowing down the common case). Nobody was enthusiastic about fixing this, so our choice was really between staying with 16 bits or making 32 bits an option for those who need it. > Not a lot of people will want to work with 16 or 32 bit chars > directly, How do you know? There are more Chinese than Americans and Europeans together, and they will soon all have computers. :-) > but I think a less wasteful solution to the surrogate pair > problem *will* be desired by people. Why use 32 bits for all strings > in a program when only a tiny percentage actually *needs* more than > 16? (Or even 8...) So work in UTF-8 -- a lot of work can be done in UTF-8. > > But this is not the Unicode philosophy. All the variable-length > > character manipulation is supposed to be taken care of by the codecs, > > and then the application can deal in arrays of characters. > > Right: this is the way it should be. > > My difficulty with PEP 261 is that I'm afraid few people will > actually enable 32-bit support (*what*?! all unicode strings become > 32 bits wide? no way!), therefore making programs non-portable in > very subtle ways. My hope and expectation is that those folks who need 32-bit support will enable it. If this solution is not sufficient, we may have to provide something else in the future, but given that the implementation effort for PEP 261 was very minimal (certainly less than the time expended in discussing it) I am very happy with it. It will take quite a while until lots of folks will need the 32-bit support (there aren't that many characters defined outside the basic plane yet). In the mean time, those that need 32-bit support should be happy that we allow them to rebuild Python with 32-bit support.
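Guido's "work in UTF-8" suggestion can be illustrated with a small sketch (modern Python 3 syntax, not code from this thread): characters outside the Basic Multilingual Plane simply become longer byte sequences, and the codec, not the application, absorbs the variable length.

```python
# Illustration only: "working in UTF-8" means the variable-width
# handling lives in the codec. 'a' is 1 byte; the non-BMP character
# U+1D11E (MUSICAL SYMBOL G CLEF) is 4 bytes in UTF-8.
text = u"a\U0001D11E"

encoded = text.encode("utf-8")
print(len(text), len(encoded))   # 2 code points, 5 bytes

# Decoding restores the original code points exactly.
assert encoded.decode("utf-8") == text
```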
In the next 5-10 years, the 32-bit support requirement will become more common -- as will be the memory upgrades to make it painless. It's not like Python is making this decision in a vacuum either: Linux already has 32-bit wchar_t. 32-bit characters will eventually be common (even in Windows, which probably has the largest investment in 16-bit Unicode at the moment of any system). Like IPv6, we're trying to enable uncommon uses of Python without breaking things for the not-so-early adopters. Again, don't see PEP 261 as the ultimate answer to all your 32-bit Unicode questions. Just consider that realistically we have two choices: stick with 16-bit support only or make 32-bit support an option. Other approaches (more surrogate support, run-time choices, transparent variable-length encodings) simply aren't realistic -- no-one has the time to code them. It should be easy to write portable Python programs that work correctly with 16-bit Unicode characters on a "narrow" interpreter and also work correctly with 21-bit Unicode on a "wide" interpreter: just avoid using surrogates. If you *need* to work with surrogates, try to limit yourself to very simple operations like concatenations of valid strings, and splitting strings at known delimiters only. There's a lot you can do with this. --Guido van Rossum (home page: http://www.python.org/~guido/) From tim.one at home.com Sun Jul 1 20:52:36 2001 From: tim.one at home.com (Tim Peters) Date: Sun, 1 Jul 2001 14:52:36 -0400 Subject: [Python-Dev] Support for "wide" Unicode characters In-Reply-To: <3B3F69A5.D7CE539D@ActiveState.com> Message-ID: [Paul Prescod] > ... > Consider that computer memory is growing much faster than textual data. > People's text processing techniques get more and more "wasteful" because > it is now almost always possible to load the entire "text" into memory > at once. Indeed, the entire text of the Bible fits in a corner of my year-old box's RAM, even at 32 bits per character. 
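Tim's back-of-the-envelope claim is easy to check. The character count below is an assumption for illustration (the Bible is commonly estimated at roughly four million characters), not a figure from the thread:

```python
# Rough memory cost of holding the whole Bible at 4 bytes per
# character (UCS-4), versus a 2001-era box with, say, 128 MB of RAM.
chars = 4_000_000                 # assumed approximate character count
bytes_ucs4 = chars * 4            # 16,000,000 bytes
megabytes = bytes_ucs4 / 2**20
print(round(megabytes, 1))        # about 15.3 MB -- a corner of RAM indeed
```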
> I remember how some text editors used to boast that they only loaded > your text "on demand". Well, they still do -- fancy editors use fancy data structures, so that, e.g., inserting characters at the start of the file doesn't cause a 50Mb memmove each time. Response time is still important, but I'd wager relatively insensitive to basic character size (you need tricks that cut factors of 1000s off potential worst cases to give the appearance of instantaneous results; a factor of 2 or 4 is in the noise compared to what's needed regardless). From aahz at rahul.net Sun Jul 1 21:21:26 2001 From: aahz at rahul.net (Aahz Maruch) Date: Sun, 1 Jul 2001 12:21:26 -0700 (PDT) Subject: [Python-Dev] Support for "wide" Unicode characters In-Reply-To: <3B3F670A.B5396D61@ActiveState.com> from "Paul Prescod" at Jul 01, 2001 11:08:10 AM Message-ID: <20010701192126.9EB8299C80@waltz.rahul.net> Paul Prescod wrote: > > On wide Python builds there is no such thing as variable width Unicode > characters. It doesn't make sense to combine two 32-bit characters to > get a 64-bit one. On narrow Python builds you might want to treat a > surrogate pair as a single character but I would strongly advise against > it. If you want wide characters, move to a wide build. Even if a narrow > build is more space efficient, you'll lose a ton of performance > emulating wide characters in Python code. This needn't go into the PEP, I think, but I'd like you to say something about what you expect the end result of this PEP to look like under Windows, where "rebuild" isn't really a valid option for most Python users. Are we simply committing to make two builds available? If so, what happens the next time we run into a situation like this? -- --- Aahz (@pobox.com) Hugs and backrubs -- I break Rule 6 <*> http://www.rahul.net/aahz/ Androgynous poly kinky vanilla queer het Pythonista I don't really mind a person having the last whine, but I do mind someone else having the last self-righteous whine. 
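Tim's "fancy data structures" remark refers to structures like the gap buffer, which keeps the free space at the cursor so local edits never move the whole file. This minimal sketch is not from the thread; it just illustrates why insertion cost is independent of file size:

```python
class GapBuffer:
    """Minimal gap buffer: only the text between the old and new
    cursor positions is ever moved, never the whole buffer."""

    def __init__(self, text=""):
        self.before = list(text)   # text left of the gap (cursor)
        self.after = []            # text right of the gap, stored reversed

    def move_to(self, pos):
        # Shift characters across the gap until the cursor sits at `pos`.
        while len(self.before) > pos:
            self.after.append(self.before.pop())
        while len(self.before) < pos and self.after:
            self.before.append(self.after.pop())

    def insert(self, s):
        self.before.extend(s)      # O(len(s)), regardless of buffer size

    def text(self):
        return "".join(self.before) + "".join(reversed(self.after))

buf = GapBuffer("hello world")
buf.move_to(5)
buf.insert(",")
print(buf.text())                  # hello, world
```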
From paulp at ActiveState.com Sun Jul 1 21:21:09 2001 From: paulp at ActiveState.com (Paul Prescod) Date: Sun, 01 Jul 2001 12:21:09 -0700 Subject: [Python-Dev] Text editors References: Message-ID: <3B3F7825.CA3D1B5B@ActiveState.com> Tim Peters wrote: > >... > > > I remember how some text editors used to boast that they only loaded > > your text "on demand". > > Well, they still do -- fancy editors use fancy data structures, so that, > e.g., inserting characters at the start of the file doesn't cause a 50Mb > memmove each time. Yes, but most modern text editors take O(n) time to open the file. There was a time when the more advanced ones did not. Or maybe that was just SGML editors...I can't remember. -- Take a recipe. Leave a recipe. Python Cookbook! http://www.ActiveState.com/pythoncookbook From guido at digicool.com Sun Jul 1 21:32:52 2001 From: guido at digicool.com (Guido van Rossum) Date: Sun, 01 Jul 2001 15:32:52 -0400 Subject: [Python-Dev] Support for "wide" Unicode characters In-Reply-To: Your message of "Sun, 01 Jul 2001 12:21:26 PDT." <20010701192126.9EB8299C80@waltz.rahul.net> References: <20010701192126.9EB8299C80@waltz.rahul.net> Message-ID: <200107011932.f61JWq803843@odiug.digicool.com> > This needn't go into the PEP, I think, but I'd like you to say something > about what you expect the end result of this PEP to look like under > Windows, where "rebuild" isn't really a valid option for most Python > users. Are we simply committing to make two builds available? If so, > what happens the next time we run into a situation like this? I imagine that we will pick a choice (I expect it'll be UCS2) and make only that build available, until there are loud enough cries from folks who have a reasonable excuse not to have a copy of VCC around. Given that the rest of Windows uses 16-bit Unicode, I think we'll be able to get away with this for quite a while. 
--Guido van Rossum (home page: http://www.python.org/~guido/) From paulp at ActiveState.com Sun Jul 1 21:33:20 2001 From: paulp at ActiveState.com (Paul Prescod) Date: Sun, 01 Jul 2001 12:33:20 -0700 Subject: [Python-Dev] Support for "wide" Unicode characters References: <20010701192126.9EB8299C80@waltz.rahul.net> Message-ID: <3B3F7B00.29D6832@ActiveState.com> Aahz Maruch wrote: > >... > > This needn't go into the PEP, I think, but I'd like you to say something > about what you expect the end result of this PEP to look like under > Windows, where "rebuild" isn't really a valid option for most Python > users. Are we simply committing to make two builds available? If so, > what happens the next time we run into a situation like this? Windows itself is strongly biased towards 16-bit characters. Therefore I expect that to be the default for a while. Then I expect Guido to announce that 32-bit characters are the new default with version 3000 (perhaps right after Windows 3000 ships) and we'll all change. So most Windows users will not be able to work with 32-bit characters for a while. But since Windows itself doesn't like those characters, they probably won't run into them much. I strongly doubt that we'll ever make two builds available because it would cause a mess of extension module incompatibilities. -- Take a recipe. Leave a recipe. Python Cookbook! http://www.ActiveState.com/pythoncookbook From paulp at ActiveState.com Sun Jul 1 21:57:09 2001 From: paulp at ActiveState.com (Paul Prescod) Date: Sun, 01 Jul 2001 12:57:09 -0700 Subject: [Python-Dev] PEP 261, Rev 1.3 - Support for "wide" Unicode characters Message-ID: <3B3F8095.8D58631D@ActiveState.com> PEP: 261 Title: Support for "wide" Unicode characters Version: $Revision: 1.3 $ Author: paulp at activestate.com (Paul Prescod) Status: Draft Type: Standards Track Created: 27-Jun-2001 Python-Version: 2.2 Post-History: 27-Jun-2001 Abstract Python 2.1 unicode characters can have ordinals only up to 2**16 -1. 
This range corresponds to a range in Unicode known as the Basic Multilingual Plane. There are now characters in Unicode that live on other "planes". The largest addressable character in Unicode has the ordinal 17 * 2**16 - 1 (0x10ffff). For readability, we will call this TOPCHAR and call characters in this range "wide characters". Glossary Character Used by itself, means the addressable units of a Python Unicode string. Code point A code point is an integer between 0 and TOPCHAR. If you imagine Unicode as a mapping from integers to characters, each integer is a code point. But the integers between 0 and TOPCHAR that do not map to characters are also code points. Some will someday be used for characters. Some are guaranteed never to be used for characters. Codec A set of functions for translating between physical encodings (e.g. on disk or coming in from a network) into logical Python objects. Encoding Mechanism for representing abstract characters in terms of physical bits and bytes. Encodings allow us to store Unicode characters on disk and transmit them over networks in a manner that is compatible with other Unicode software. Surrogate pair Two physical characters that represent a single logical character. Part of a convention for representing 32-bit code points in terms of two 16-bit code points. Unicode string A Python type representing a sequence of code points with "string semantics" (e.g. case conversions, regular expression compatibility, etc.) Constructed with the unicode() function. Proposed Solution One solution would be to merely increase the maximum ordinal to a larger value. Unfortunately the only straightforward implementation of this idea is to use 4 bytes per character. This has the effect of doubling the size of most Unicode strings. In order to avoid imposing this cost on every user, Python 2.2 will allow the 4-byte implementation as a build-time option. Users can choose whether they care about wide characters or prefer to preserve memory. 
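The build-time choice described above was exposed to Python code as sys.maxunicode (listed later in the PEP). A hedged sketch of detecting the width at runtime — on any modern Python the "narrow" branch is historical, since builds have been effectively wide since 3.3:

```python
import sys

# sys.maxunicode distinguishes the two build options the PEP proposes.
if sys.maxunicode == 0xFFFF:
    build = "narrow"   # 2-byte Py_UNICODE; non-BMP chars need surrogate pairs
else:                  # 0x10FFFF
    build = "wide"     # 4-byte Py_UNICODE (or flexible, in later Pythons)

print(build)
```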
The 4-byte option is called "wide Py_UNICODE". The 2-byte option is called "narrow Py_UNICODE". Most things will behave identically in the wide and narrow worlds. * unichr(i) for 0 <= i < 2**16 (0x10000) always returns a length-one string. * unichr(i) for 2**16 <= i <= TOPCHAR will return a length-one string on wide Python builds. On narrow builds it will raise ValueError. ISSUE Python currently allows \U literals that cannot be represented as a single Python character. It generates two Python characters known as a "surrogate pair". Should this be disallowed on future narrow Python builds? Pro: Python already allows the construction of a surrogate pair for a large unicode literal character escape sequence. This is basically designed as a simple way to construct "wide characters" even in a narrow Python build. It is also somewhat logical considering that the Unicode-literal syntax is basically a short-form way of invoking the unicode-escape codec. Con: Surrogates could be easily created this way but the user still needs to be careful about slicing, indexing, printing etc. Therefore some have suggested that Unicode literals should not support surrogates. ISSUE Should Python allow the construction of characters that do not correspond to Unicode code points? Unassigned Unicode code points should obviously be legal (because they could be assigned at any time). But code points above TOPCHAR are guaranteed never to be used by Unicode. Should we allow access to them anyhow? Pro: If a Python user thinks they know what they're doing why should we try to prevent them from violating the Unicode spec? After all, we don't stop 8-bit strings from containing non-ASCII characters. Con: Codecs and other Unicode-consuming code will have to be careful of these characters which are disallowed by the Unicode specification.
* ord() is always the inverse of unichr() * There is an integer value in the sys module that describes the largest ordinal for a character in a Unicode string on the current interpreter. sys.maxunicode is 2**16-1 (0xffff) on narrow builds of Python and TOPCHAR on wide builds. ISSUE: Should there be distinct constants for accessing TOPCHAR and the real upper bound for the domain of unichr (if they differ)? There has also been a suggestion of sys.unicodewidth which can take the values 'wide' and 'narrow'. * every Python Unicode character represents exactly one Unicode code point (i.e. Python Unicode Character = Abstract Unicode character). * codecs will be upgraded to support "wide characters" (represented directly in UCS-4, and as variable-length sequences in UTF-8 and UTF-16). This is the main part of the implementation left to be done. * There is a convention in the Unicode world for encoding a 32-bit code point in terms of two 16-bit code points. These are known as "surrogate pairs". Python's codecs will adopt this convention and encode 32-bit code points as surrogate pairs on narrow Python builds. ISSUE Should there be a way to tell codecs not to generate surrogates and instead treat wide characters as errors? Pro: I might want to write code that works only with fixed-width characters and does not have to worry about surrogates. Con: No clear proposal of how to communicate this to codecs. * there are no restrictions on constructing strings that use code points "reserved for surrogates" improperly. These are called "isolated surrogates". The codecs should disallow reading these from files, but you could construct them using string literals or unichr(). 
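The surrogate-pair convention the PEP describes for narrow builds is straightforward arithmetic, standardized for UTF-16. A sketch in modern Python (not the 2001 codec code itself):

```python
def to_surrogate_pair(cp):
    """Split a code point above 0xFFFF into a UTF-16 surrogate pair."""
    assert 0x10000 <= cp <= 0x10FFFF
    cp -= 0x10000                       # 20 significant bits remain
    high = 0xD800 + (cp >> 10)          # top 10 bits -> high (lead) surrogate
    low = 0xDC00 + (cp & 0x3FF)         # low 10 bits -> low (trail) surrogate
    return high, low

def from_surrogate_pair(high, low):
    """Inverse: recombine a surrogate pair into a single code point."""
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)

# MUSICAL SYMBOL G CLEF, U+1D11E, encodes as the pair D834/DD1E.
pair = to_surrogate_pair(0x1D11E)
print([hex(v) for v in pair])           # ['0xd834', '0xdd1e']
assert from_surrogate_pair(*pair) == 0x1D11E
```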
Implementation There is a new (experimental) define: #define PY_UNICODE_SIZE 2 There is a new configure option: --enable-unicode=ucs2 configures a narrow Py_UNICODE, and uses wchar_t if it fits --enable-unicode=ucs4 configures a wide Py_UNICODE, and uses wchar_t if it fits --enable-unicode same as "=ucs2" The intention is that --disable-unicode, or --enable-unicode=no removes the Unicode type altogether; this is not yet implemented. It is also proposed that one day --enable-unicode will just default to the width of your platform's wchar_t. Windows builds will be narrow for a while based on the fact that there have been few requests for wide characters, those requests are mostly from hard-core programmers with the ability to build their own Python, and Windows itself is strongly biased towards 16-bit characters. Notes This PEP does NOT imply that people using Unicode need to use a 4-byte encoding for their files on disk or sent over the network. It only allows them to do so. For example, ASCII is still a legitimate (7-bit) Unicode-encoding. It has been proposed that there should be a module that handles surrogates in narrow Python builds for programmers. If someone wants to implement that, it will be another PEP. It might also be combined with features that allow other kinds of character-, word- and line-based indexing. Rejected Suggestions More or less the status-quo We could officially say that Python characters are 16-bit and require programmers to implement wide characters in their application logic by combining surrogate pairs. This is a heavy burden because emulating 32-bit characters is likely to be very inefficient if it is coded entirely in Python. Plus these abstracted pseudo-strings would not be legal as input to the regular expression engine. "Space-efficient Unicode" type Another class of solution is to use some efficient storage internally but present an abstraction of wide characters to the programmer.
Any of these would require a much more complex implementation than the accepted solution. For instance consider the impact on the regular expression engine. In theory, we could move to this implementation in the future without breaking Python code. A future Python could "emulate" wide Python semantics on narrow Python. Guido is not willing to undertake the implementation right now. Two types We could introduce a 32-bit Unicode type alongside the 16-bit type. There is a lot of code that expects there to be only a single Unicode type. This PEP represents the least-effort solution. Over the next several years, 32-bit Unicode characters will become more common and that may either convince us that we need a more sophisticated solution or (on the other hand) convince us that simply mandating wide Unicode characters is an appropriate solution. Right now the two options on the table are do nothing or do this. References Unicode Glossary: http://www.unicode.org/glossary/ Copyright This document has been placed in the public domain. Local Variables: mode: indented-text indent-tabs-mode: nil End: -- Take a recipe. Leave a recipe. Python Cookbook! http://www.ActiveState.com/pythoncookbook From thomas at xs4all.net Mon Jul 2 00:12:48 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Mon, 2 Jul 2001 00:12:48 +0200 Subject: [Python-Dev] Python 2.1.1 release 'schedule' Message-ID: <20010702001248.H8098@xs4all.nl> This is just a heads-up to everyone. I plan to release Python 2.1.1c1 (release candidate 1) somewhere on Friday the 13th (of July) and, barring any serious problems, the full release the friday following that, July 20. The python 2.1.1 CVS branch (tagged 'release21-maint') should be stable, and should contain most bugfixes that will be in 2.1.1. 
If you care about 2.1.1's stability and portability, or you found bugs in 2.1 and aren't sure they are fixed, and you can check things out of CVS, please give the CVS branch a try: just 'checkout' python with cvs co -rrelease21-maint python (with the -d option from the SourceForge CVS page that applies to you) and follow the normal compile procedure. Binaries for Windows as well as source tarballs will be provided for the release candidate and the final release (obviously) but the more bugs people point out before the final release, the more bugs will be fixed in 2.1.1 :-) Python 2.1.1 (as well as the CVS branch) will fall under the new GPL-compatible PSF licence, just like Python 2.0.1. The only notable thing missing from the CVS branch is an updated NEWS file -- I'm working on it. I'm also not done searching the open bugs for ones that might need to be addressed in 2.1.1, but feel free to point me to bugs you think are important! 2.1.1-Patch-Czar-ly y'rs, -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From greg at cosc.canterbury.ac.nz Mon Jul 2 04:06:50 2001 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Mon, 02 Jul 2001 14:06:50 +1200 (NZST) Subject: [Python-Dev] Support for "wide" Unicode characters In-Reply-To: <3B3EBEA4.3EC84EAF@ActiveState.com> Message-ID: <200107020206.OAA00427@s454.cosc.canterbury.ac.nz> David Ascher : > I'd limit the claim to stating that they _affect_ your life. If matter didn't have any rest energy, everything would fly about at the speed of light, which would make life very hectic. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc.
| greg at cosc.canterbury.ac.nz +--------------------------------------+ From greg at cosc.canterbury.ac.nz Mon Jul 2 04:36:39 2001 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Mon, 02 Jul 2001 14:36:39 +1200 (NZST) Subject: [Python-Dev] Support for "wide" Unicode characters In-Reply-To: <20010701164315-r01010600-c2d5b07d@213.84.27.177> Message-ID: <200107020236.OAA00432@s454.cosc.canterbury.ac.nz> Just van Rossum : > My difficulty with PEP 261 is that I'm afraid few people will actually enable > 32-bit support (*what*?! all unicode strings become 32 bits wide? no way!), > therefore making programs non-portable in very subtle ways. I agree. This can only be a stopgap measure. Ultimately the Unicode type needs to be made smarter. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg at cosc.canterbury.ac.nz +--------------------------------------+ From greg at cosc.canterbury.ac.nz Mon Jul 2 04:42:12 2001 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Mon, 02 Jul 2001 14:42:12 +1200 (NZST) Subject: [Python-Dev] Support for "wide" Unicode characters In-Reply-To: <3B3F5A3A.A88B54B2@ActiveState.com> Message-ID: <200107020242.OAA00436@s454.cosc.canterbury.ac.nz> David Ascher : > > And you just bought such a shiny, new glass, house. Pity. > > What kind of comma placement is that? Obviously it's only the glass that is new, not the whole house. :-) Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. 
| greg at cosc.canterbury.ac.nz +--------------------------------------+ From nhodgson at bigpond.net.au Mon Jul 2 04:42:11 2001 From: nhodgson at bigpond.net.au (Neil Hodgson) Date: Mon, 2 Jul 2001 12:42:11 +1000 Subject: [Python-Dev] Support for "wide" Unicode characters References: <200107011352.PAA27645@pandora.informatik.hu-berlin.de> Message-ID: <01d601c102a0$98671580$0acc8490@neil> Martin von Loewis: > > The problem I have with this PEP is that it is a compile time option > > which makes it hard to work with both 32 bit and 16 bit strings in > > one program. > > Can you elaborate why you think this is a problem? A common role for Python is to act as glue between various modules. If Paul produces some interesting code that depends on 32 bit strings and I want to use that in conjunction with some Win32 specific or COM dependent code that wants 16 bit strings then it may not be possible or may require difficult workarounds. > (*) Methinks that the primary difficulty still is translating all the > documentation, and messages. Actually, keeping the translations > up-to-date is even more challenging. Translation of documentation and strings can be performed by almost anyone who writes both languages ("even managers") and can be budgeted by working out the amount of text and applying a conversion rate. Code requires careful thought and can lead to the typical buggy software schedule blowouts. Neil From greg at cosc.canterbury.ac.nz Mon Jul 2 04:49:56 2001 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Mon, 02 Jul 2001 14:49:56 +1200 (NZST) Subject: [Python-Dev] Support for "wide" Unicode characters In-Reply-To: <200107011837.f61IbmZ03645@odiug.digicool.com> Message-ID: <200107020249.OAA00439@s454.cosc.canterbury.ac.nz> > It so happened that the Unicode support was written to make it very > easy to change the compile-time code unit size What about extension modules that deal with Unicode strings? Will they have to be recompiled too?
If so, is there anything to detect an attempt to import an extension module with an incompatible Unicode character width? Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg at cosc.canterbury.ac.nz +--------------------------------------+ From nhodgson at bigpond.net.au Mon Jul 2 04:52:45 2001 From: nhodgson at bigpond.net.au (Neil Hodgson) Date: Mon, 2 Jul 2001 12:52:45 +1000 Subject: [Python-Dev] Support for "wide" Unicode characters References: <3B3BEF21.63411C4C@ActiveState.com> <3B3C95D8.518E5175@egenix.com> <3B3D2869.5C1DDCF1@ActiveState.com> <3B3DBD86.81F80D06@egenix.com> <3B3EA161.1375F74C@ActiveState.com> <00dd01c1022d$c61e4160$0acc8490@neil> <200107011344.f61DiTM03548@odiug.digicool.com> Message-ID: <01ea01c102a2$128491c0$0acc8490@neil> Guido van Rossum: > > This wasn't usefully true in the past for DBCS strings and is > > not the right way to think of either narrow or wide strings > > now. The idea that strings are arrays of characters gets in > > the way of dealing with many encodings and is the primary > > difficulty in localising software for Japanese. > > Can you explain the kind of problems encountered in some more detail? Programmers used to working with character == indexable code unit will often split double wide characters when performing an action. For example searching for a particular double byte character "bc" may match "abcd" incorrectly where "ab" and "cd" are the characters. DBCS is not normally self synchronising although UTF-8 is. Another common problem is counting characters, for example when filling a line, hitting the line width and forcing half a character onto the next line. > I think it's a good idea to provide a set of higher-level tools as > well. However nobody seems to know what these higher-level tools > should do yet. 
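Neil's false-match problem shows up concretely in Shift-JIS, where the second byte of a double-byte character can collide with an ASCII value; the katakana "ソ" (bytes 0x83 0x5C) famously contains the backslash byte. A sketch, not from the thread:

```python
# Naive byte-level search in a DBCS encoding that is not
# self-synchronising: the trail byte of a double-byte character can
# falsely match an ASCII character. Shift-JIS KATAKANA LETTER SO
# (U+30BD) encodes as 0x83 0x5C, and 0x5C is '\' in ASCII.
data = u"\u30bd".encode("shift_jis")
print(data)                  # b'\x83\\'
assert b"\\" in data         # false positive: no backslash character exists

# UTF-8 is self-synchronising: continuation bytes are always >= 0x80,
# so they can never collide with an ASCII byte.
assert b"\\" not in u"\u30bd".encode("utf-8")
```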
PEP 261 is specifically focused on getting the > lower-level foundations right (i.e. the objects that represent arrays > of code units), so that the authors of higher level tools will have a > solid base. If you want to help author a PEP for such higher-level > tools, you're welcome! Its more likely I'll publish some of the low level pieces of Scintilla/SinkWorld as a Python extension providing some of these facilities in an editable-text class. Then we can see if anyone else finds the code worthwhile. Neil From nhodgson at bigpond.net.au Mon Jul 2 05:00:41 2001 From: nhodgson at bigpond.net.au (Neil Hodgson) Date: Mon, 2 Jul 2001 13:00:41 +1000 Subject: [Python-Dev] Support for "wide" Unicode characters References: Message-ID: <020b01c102a3$2dd23440$0acc8490@neil> Tim Peters: > Well, they still do -- fancy editors use fancy data structures, so that, > e.g., inserting characters at the start of the file doesn't cause a 50Mb > memmove each time. Response time is still important, but I'd wager > relatively insensitive to basic character size (you need tricks that cut > factors of 1000s off potential worst cases to give the appearance of > instantaneous results; a factor of 2 or 4 is in the noise compared to what's > needed regardless). I actually have some numbers here. Early versions of some new editor buffer code used UCS-2 on .NET and the JVM. Moving to an 8 bit buffer saved 10-20% of execution time on the insert string, delete string and global replace benchmarks using strings that fit into ASCII. These buffers did have some other overhead for line management and other features but I expect these did not affect the proportions much. Neil From tim.one at home.com Mon Jul 2 06:36:20 2001 From: tim.one at home.com (Tim Peters) Date: Mon, 2 Jul 2001 00:36:20 -0400 Subject: [Python-Dev] RE: Python 2.1.1 release 'schedule' In-Reply-To: <20010702001248.H8098@xs4all.nl> Message-ID: Woo hoo! [Thomas Wouters] > ... 
> Binaries for Windows as well as source tarballs will be provided ... Building a Windows installer isn't straightforward, so you'd better let us do that part (e.g., you need the Wise installer program, Fred needs to supply appropriate HTML docs for the Windows installer to zip up, Tcl/Tk has to get unpacked and rearranged, etc). I just checked in 2.1.1c1 changes to the Windows part of the release21-maint tree, but the rest of it isn't in CVS. From thomas at xs4all.net Mon Jul 2 08:27:24 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Mon, 2 Jul 2001 08:27:24 +0200 Subject: [Python-Dev] Re: Python 2.1.1 release 'schedule' In-Reply-To: References: Message-ID: <20010702082724.K32419@xs4all.nl> On Mon, Jul 02, 2001 at 12:36:20AM -0400, Tim Peters wrote: > [Thomas Wouters] > > ... > > Binaries for Windows as well as source tarballs will be provided ... > Building a Windows installer isn't straightforward, so you'd better let us > do that part (e.g., you need the Wise installer program, Fred needs to > supply appropriate HTML docs for the Windows installer to zip up, Tcl/Tk has > to get unpacked and rearranged, etc). I just checked in 2.1.1c1 changes to > the Windows part of the release21-maint tree, but the rest of it isn't in > CVS. Oh yeah, I was entirely going to let you guys do it, or at least find another set of wintendows-weenies to do it :) That's part of why I posted the tentative release dates. -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! 
From loewis at informatik.hu-berlin.de Mon Jul 2 09:25:18 2001 From: loewis at informatik.hu-berlin.de (Martin von Loewis) Date: Mon, 2 Jul 2001 09:25:18 +0200 (MEST) Subject: [Python-Dev] Support for "wide" Unicode characters In-Reply-To: <01d601c102a0$98671580$0acc8490@neil> (nhodgson@bigpond.net.au) References: <200107011352.PAA27645@pandora.informatik.hu-berlin.de> <01d601c102a0$98671580$0acc8490@neil> Message-ID: <200107020725.JAA25925@pandora.informatik.hu-berlin.de> > > > The problem I have with this PEP is that it is a compile time option > > > which makes it hard to work with both 32 bit and 16 bit strings in > > > one program. > > > > Can you elaborate why you think this is a problem? > > A common role for Python is to act as glue between various modules. If > Paul produces some interesting code that depends on 32 bit strings and I > want to use that in conjunction with some Win32 specific or COM dependent > code that wants 16 bit strings then it may not be possible or may require > difficult workarounds. Neither nor. All it will require is for you to recompile your Python installation to use wide Unicode. On Win32 APIs, this will mean that you cannot directly interpret PyUnicode object representations as WCHAR_T pointers. This is no problem, as you can transparently copy unicode objects into wchar_t strings; it's a matter of coming up with a good C API for doing so conveniently. Regards, Martin From fredrik at pythonware.com Mon Jul 2 10:20:09 2001 From: fredrik at pythonware.com (Fredrik Lundh) Date: Mon, 2 Jul 2001 10:20:09 +0200 Subject: [Python-Dev] Support for "wide" Unicode characters References: <200107020236.OAA00432@s454.cosc.canterbury.ac.nz> Message-ID: <03b301c102cf$e0e3dd00$0900a8c0@spiff> greg wrote: > I agree. This can only be a stopgap measure. Ultimately the > Unicode type needs to be made smarter. PIL uses 8 bits per pixel to store bilevel images, and 32 bits per pixel to store 16- and 24-bit images.
back in 1995, some people claimed that the image type had to be made smarter to be usable. these days, nobody ever notices... From fredrik at pythonware.com Mon Jul 2 10:08:10 2001 From: fredrik at pythonware.com (Fredrik Lundh) Date: Mon, 2 Jul 2001 10:08:10 +0200 Subject: [Python-Dev] Support for "wide" Unicode characters References: <3B3BEF21.63411C4C@ActiveState.com> <3B3C95D8.518E5175@egenix.com> <3B3D2869.5C1DDCF1@ActiveState.com> <3B3DBD86.81F80D06@egenix.com> <3B3EA161.1375F74C@ActiveState.com> <00dd01c1022d$c61e4160$0acc8490@neil> Message-ID: <03b201c102cf$e0dab540$0900a8c0@spiff> Neil Hodgson wrote: > > u[i] is a character. If u is Unicode, then u[i] is a Python Unicode > > character. > > This wasn't usefully true in the past for DBCS strings and is not the > right way to think of either narrow or wide strings now. The idea that > strings are arrays of characters gets in the way if you stop confusing binary buffers with text strings, all such problems will go away. From mal at egenix.com Mon Jul 2 11:39:55 2001 From: mal at egenix.com (M.-A. Lemburg) Date: Mon, 02 Jul 2001 11:39:55 +0200 Subject: [Python-Dev] Support for "wide" Unicode characters References: <200107020249.OAA00439@s454.cosc.canterbury.ac.nz> Message-ID: <3B40416B.6438D1F7@egenix.com> Greg Ewing wrote: > > > It so happened that the Unicode support was written to make it very > > easy to change the compile-time code unit size > > What about extension modules that deal with Unicode strings? > Will they have to be recompiled too? If so, is there anything > to detect an attempt to import an extension module with an > incompatible Unicode character width? That's a good question ! The answer is: yes, extensions which use Unicode will have to be recompiled for narrow and wide builds of Python. The question is however, how to detect cases where the user imports an extension built for narrow Python into a wide build and vice versa. The standard way of looking at the API level won't help. 
We'd need some form of introspection API at the C level... hmm, perhaps looking at the sys module will do the trick for us ?! In any case, this is certainly going to cause trouble one of these days... -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From mal at lemburg.com Mon Jul 2 12:13:59 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Mon, 02 Jul 2001 12:13:59 +0200 Subject: [Python-Dev] PEP 261, Rev 1.3 - Support for "wide" Unicode characters References: <3B3F8095.8D58631D@ActiveState.com> Message-ID: <3B404967.14FE180F@lemburg.com> Paul Prescod wrote: > > PEP: 261 > Title: Support for "wide" Unicode characters > Version: $Revision: 1.3 $ > Author: paulp at activestate.com (Paul Prescod) > Status: Draft > Type: Standards Track > Created: 27-Jun-2001 > Python-Version: 2.2 > Post-History: 27-Jun-2001 > > Abstract > > Python 2.1 unicode characters can have ordinals only up to 2**16 > -1. > This range corresponds to a range in Unicode known as the Basic > Multilingual Plane. There are now characters in Unicode that live > on other "planes". The largest addressable character in Unicode > has the ordinal 17 * 2**16 - 1 (0x10ffff). For readability, we > will call this TOPCHAR and call characters in this range "wide > characters". > > Glossary > > Character > > Used by itself, means the addressable units of a Python > Unicode string. Please add: also known as "code unit". > Code point > > A code point is an integer between 0 and TOPCHAR. > If you imagine Unicode as a mapping from integers to > characters, each integer is a code point. But the > integers between 0 and TOPCHAR that do not map to > characters are also code points. Some will someday > be used for characters. Some are guaranteed never > to be used for characters. 
> > Codec > > A set of functions for translating between physical > encodings (e.g. on disk or coming in from a network) > into logical Python objects. > > Encoding > > Mechanism for representing abstract characters in terms of > physical bits and bytes. Encodings allow us to store > Unicode characters on disk and transmit them over networks > in a manner that is compatible with other Unicode software. > > Surrogate pair > > Two physical characters that represent a single logical Eeek... two code units (or have you ever seen a physical character walking around ;-) > character. Part of a convention for representing 32-bit > code points in terms of two 16-bit code points. > > Unicode string > > A Python type representing a sequence of code points with > "string semantics" (e.g. case conversions, regular > expression compatibility, etc.) Constructed with the > unicode() function. > > Proposed Solution > > One solution would be to merely increase the maximum ordinal > to a larger value. Unfortunately the only straightforward > implementation of this idea is to use 4 bytes per character. > This has the effect of doubling the size of most Unicode > strings. In order to avoid imposing this cost on every > user, Python 2.2 will allow the 4-byte implementation as a > build-time option. Users can choose whether they care about > wide characters or prefer to preserve memory. > > The 4-byte option is called "wide Py_UNICODE". The 2-byte option > is called "narrow Py_UNICODE". > > Most things will behave identically in the wide and narrow worlds. > > * unichr(i) for 0 <= i < 2**16 (0x10000) always returns a > length-one string. > > * unichr(i) for 2**16 <= i <= TOPCHAR will return a > length-one string on wide Python builds. On narrow builds it will > raise ValueError. > > ISSUE > > Python currently allows \U literals that cannot be > represented as a single Python character. It generates two > Python characters known as a "surrogate pair". 
Should this > be disallowed on future narrow Python builds? > > Pro: > > Python already allows the construction of a surrogate pair > for a large unicode literal character escape sequence. > This is basically designed as a simple way to construct > "wide characters" even in a narrow Python build. It is also > somewhat logical considering that the Unicode-literal syntax > is basically a short-form way of invoking the unicode-escape > codec. > > Con: > > Surrogates could be easily created this way but the user > still needs to be careful about slicing, indexing, printing > etc. Therefore some have suggested that Unicode > literals should not support surrogates. > > ISSUE > > Should Python allow the construction of characters that do > not correspond to Unicode code points? Unassigned Unicode > code points should obviously be legal (because they could > be assigned at any time). But code points above TOPCHAR are > guaranteed never to be used by Unicode. Should we allow > access > to them anyhow? > > Pro: > > If a Python user thinks they know what they're doing why > should we try to prevent them from violating the Unicode > spec? After all, we don't stop 8-bit strings from > containing non-ASCII characters. > > Con: > > Codecs and other Unicode-consuming code will have to be > careful of these characters which are disallowed by the > Unicode specification. > > * ord() is always the inverse of unichr() > > * There is an integer value in the sys module that describes the > largest ordinal for a character in a Unicode string on the current > interpreter. sys.maxunicode is 2**16-1 (0xffff) on narrow builds > of Python and TOPCHAR on wide builds. > > ISSUE: Should there be distinct constants for accessing > TOPCHAR and the real upper bound for the domain of > unichr (if they differ)? There has also been a > suggestion of sys.unicodewidth which can take the > values 'wide' and 'narrow'. > > * every Python Unicode character represents exactly one Unicode code > point (i.e.
Python Unicode Character = Abstract Unicode > character). > > * codecs will be upgraded to support "wide characters" > (represented directly in UCS-4, and as variable-length sequences > in UTF-8 and UTF-16). This is the main part of the implementation > left to be done. > > * There is a convention in the Unicode world for encoding a 32-bit > code point in terms of two 16-bit code points. These are known > as "surrogate pairs". Python's codecs will adopt this convention > and encode 32-bit code points as surrogate pairs on narrow Python > builds. > > ISSUE > > Should there be a way to tell codecs not to generate > surrogates and instead treat wide characters as > errors? > > Pro: > > I might want to write code that works only with > fixed-width characters and does not have to worry about > surrogates. > > Con: > > No clear proposal of how to communicate this to codecs. No need to pass this information to the codec: simply write a new one and give it a clear name, e.g. "ucs-2" will generate errors while "utf-16-le" converts them to surrogates. > * there are no restrictions on constructing strings that use > code points "reserved for surrogates" improperly. These are > called "isolated surrogates". The codecs should disallow reading > these from files, but you could construct them using string > literals or unichr(). > > Implementation > > There is a new (experimental) define: > > #define PY_UNICODE_SIZE 2 > > There is a new configure option: > > --enable-unicode=ucs2 configures a narrow Py_UNICODE, and uses > wchar_t if it fits > --enable-unicode=ucs4 configures a wide Py_UNICODE, and uses > wchar_t if it fits > --enable-unicode same as "=ucs2" > > The intention is that --disable-unicode, or --enable-unicode=no > removes the Unicode type altogether; this is not yet implemented. > > It is also proposed that one day --enable-unicode will just > default to the width of your platform's wchar_t.
> > Windows builds will be narrow for a while based on the fact that > there have been few requests for wide characters, those requests > are mostly from hard-core programmers with the ability to buy > their own Python and Windows itself is strongly biased towards > 16-bit characters. > > Notes > > This PEP does NOT imply that people using Unicode need to use a > 4-byte encoding for their files on disk or sent over the network. > It only allows them to do so. For example, ASCII is still a > legitimate (7-bit) Unicode-encoding. > > It has been proposed that there should be a module that handles > surrogates in narrow Python builds for programmers. If someone > wants to implement that, it will be another PEP. It might also be > combined with features that allow other kinds of character-, > word- and line- based indexing. > > Rejected Suggestions > > More or less the status-quo > > We could officially say that Python characters are 16-bit and > require programmers to implement wide characters in their > application logic by combining surrogate pairs. This is a heavy > burden because emulating 32-bit characters is likely to be > very inefficient if it is coded entirely in Python. Plus these > abstracted pseudo-strings would not be legal as input to the > regular expression engine. > > "Space-efficient Unicode" type > > Another class of solution is to use some efficient storage > internally but present an abstraction of wide characters to > the programmer. Any of these would require a much more complex > implementation than the accepted solution. For instance consider > the impact on the regular expression engine. In theory, we could > move to this implementation in the future without breaking > Python > code. A future Python could "emulate" wide Python semantics on > narrow Python. Guido is not willing to undertake the > implementation right now. > > Two types > > We could introduce a 32-bit Unicode type alongside the 16-bit > type. 
There is a lot of code that expects there to be only a > single Unicode type. > > This PEP represents the least-effort solution. Over the next > several years, 32-bit Unicode characters will become more common > and that may either convince us that we need a more sophisticated > solution or (on the other hand) convince us that simply > mandating wide Unicode characters is an appropriate solution. > Right now the two options on the table are do nothing or do > this. > > References > > Unicode Glossary: http://www.unicode.org/glossary/ Plus perhaps the Mark Davis paper at: http://www-106.ibm.com/developerworks/unicode/library/utfencodingforms/ > Copyright > > This document has been placed in the public domain. Good work, Paul ! -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From mal at lemburg.com Mon Jul 2 12:08:53 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Mon, 02 Jul 2001 12:08:53 +0200 Subject: [Python-Dev] Support for "wide" Unicode characters References: <3B3BEF21.63411C4C@ActiveState.com> <3B3C95D8.518E5175@egenix.com> <3B3D2869.5C1DDCF1@ActiveState.com> <3B3DBD86.81F80D06@egenix.com> <3B3EA161.1375F74C@ActiveState.com> Message-ID: <3B404835.4CE77C60@lemburg.com> Paul Prescod wrote: > > "M.-A. Lemburg" wrote: > > > >... > > > > The term "character" in Python should really only be used for > > the 8-bit strings. > > Are we going to change chr() and unichr() to one_element_string() and > unicode_one_element_string() No. I am just suggesting to make use of the crispy clear definitions which the Unicode Consortium has developed for us. > u[i] is a character. If u is Unicode, then u[i] is a Python Unicode > character. No Python user will find that confusing no matter how Unicode > knuckle-dragging, mouth-breathing, wife-by-hair-dragging they are. 
Except that u[i] maps to a code unit which may or may not be a code point. Whether a code point matches a grapheme (this is what users tend to regard as a character) is yet another story due to combining code points. > > In Unicode a "character" can mean any of: > > Mark Davis said that "people" can use the word to mean any of those > things. He did not say that it was imprecisely defined in Unicode. > Nevertheless I'm not using the Unicode definition any more than our > standard library uses an ancient Greek definition of integer. Python has > a concept of integer and a concept of character. Ok, I'll stop whining. Just as final remark, let me say that our little discussion is a perfect example of how people can misunderstand each other by using the terms in different ways (Kant tried to solve this for Philosophy and did not succeed; so I guess the Unicode Consortium doesn't stand a chance either ;-) > > > It has been proposed that there should be a module for working > > > with UTF-16 strings in narrow Python builds through some sort of > > > abstraction that handles surrogates for you. If someone wants > > > to implement that, it will be another PEP. > > > > Uhm, narrow builds don't support UTF-16... it's UCS-2 which > > is supported (basically: store everything in range(0x10000)); > > the codecs can map code points to surrogates, but it is solely > > their responsibility and the responsibility of the application > > using them to take care of dealing with surrogates. > > The user can view the data as UCS-2, UTF-16, Base64, ROT-13, XML, .... > Just as we have a base64 module, we could have a UTF-16 module that > interprets the data in the string as UTF-16 and does surrogate > manipulation for you. > > Anyhow, if any of those is the "real" encoding of the data, it is > UTF-16. After all, if the codec reads in four non-BMP characters in, > let's say, UTF-8, we represent them as 8 narrow-build Python characters. > That's the definition of UTF-16!
But it's easy enough for me to take > that word out so I will. u[i] gives you a code unit and whether this maps to a code point or not is dependent on the implementation which in turn depends on the narrow/wide choice. In UCS-2, I believe, surrogates are regarded as two code points; in UTF-16 they always have to come in pairs. There's a semantic difference here which is for the codecs and these additional tools to be aware of -- not the Unicode type implementation. > >... > > Also, the module will be useful for both narrow and wide builds, > > since the notion of an encoded character can involve multiple code > > points. In that sense Unicode is always a variable length > > encoding for characters and that's the application field of > > this module. > > I wouldn't advise that you do all different types of normalization in a > single module but I'll wait for your PEP. I'll see if I find some time at the Bordeaux Python Meeting next week. > > Here's the adjusted text: > > > > It has been proposed that there should be a module for working > > with Unicode objects using character-, word- and line- based > > indexing. The details of the implementation are left to > > another PEP. > > It has been proposed that there should be a module that handles > surrogates in narrow Python builds for programmers. If someone > wants to implement that, it will be another PEP. It might also be > combined with features that allow other kinds of character-, > word- and line- based indexing. Hmm, I liked my version better, but what the heck ;-) -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From mal at lemburg.com Mon Jul 2 12:43:38 2001 From: mal at lemburg.com (M.-A.
Lemburg) Date: Mon, 02 Jul 2001 12:43:38 +0200 Subject: [Python-Dev] Unicode Maintenance References: <3B39CD51.406C28F0@lemburg.com> <200106271611.f5RGBn819631@odiug.digicool.com> <3B3AF307.6496AFB4@lemburg.com> <200106281225.f5SCPIr20874@odiug.digicool.com> Message-ID: <3B40505A.2F03EEC4@lemburg.com> Guido van Rossum wrote: > > Hi Marc-Andre, > > I'm dropping the i18n-sig from the distribution list. > > I hear you: > > > You didn't get my point. I feel responsible for the Unicode > > implementation design and would like to see it become a continued > > success. > > I'm sure we all share this goal! > > > In that sense and taking into account that I am the > > maintainer of all this stuff, I think it is very reasonable to > > ask me before making any significant changes to the implementation > > and also respect any comments I put forward. > > I understand you feel that we've rushed this in without waiting for > your comments. > > Given how close your implementation was, I still feel that the changes > weren't that significant, but I understand that you get nervous. If > Christian were to check in his speed hack changes to the guts of > ceval.c I would be nervous too! (Heck, I got nervous when Eric > checked in his library-wide string method changes without asking.) > > Next time I'll try to be more sensitive to situations that require > your review before going forward. Good. > > Currently, I have to watch the checkins list very closely > > to find out who changed what in the implementation and then to > > take actions only after the fact. Since I'm not supporting Unicode > > as my full-time job this is simply impossible. We have the SF manager > > and there is really no need to rush anything around here. > > Hm, apart from the fact that you ought to be left in charge, I think > that in this case the live checkins were a big win over the usual SF
At least two people were making changes, sometimes to each > other's code, and many others on at least three continents were > checking out the changes on many different platforms and immediately > reporting problems. We would definitely not have a patch as solid as > the code that's now checked in, after two days of using SF! (We > could've used a branch, but I've found that getting people to actually > check out the branch is not easy.) True, but I was thinking of the concept and design questions which should be resolved *before* taking the direct checkin approach. > So I think that the net result was favorable. Sometimes you just have > to let people work in the spur of the moment to get the results of > their best thinking, otherwise they lose interest or their train of > thought. Understood, but then I'd like to at least receive a summary of the changes in some way, so that I continue to understand how the implementation works after the checkins and which corners to keep in mind for future additions, changes, etc. > > If I am offline or too busy with other things for a day or two, > > then I want to see patches on SF and not find new versions of > > the implementation already checked in. > > That's still the general rule, but in our enthousiasm (and mine was > definitely part of this!) we didn't want to wait. Also, I have to > admit that I mistook your silence for consent -- I didn't think the > main proposed changes (making the size of Py_UNICODE a config choice) > were controversial at all, so I didn't realize you would have a problem > with it. I don't have a problem with it; I was just seeing things slip my fingers and getting worried about this. > > This has worked just fine during the last year, so I can only explain > > the latest actions in this direction with an urge to bypass my comments > > and any discussion this might cause. > > I think you're projecting your own stuff here. Not really. I have processed many patches on SF, gave comments etc. 
and did the final checkin. This has worked great over the last months and I intend to keep working this way since it is by far the best way to both manage and document the issues and questions which arise during the process. E.g. I'm currently processing a patch by Walter Dörwald which adds support for callback error handlers. He has done some great work there which was the result of many lively discussions. Working like this is fun while staying manageable at the same time... and again, there's really no need to rush things ! > I honestly didn't > think there was much disagreement on your part and thought we were > doing you a favor by implementing the consensus. IMO, Martin and > Fredrik are familiar enough with both the code and the issues to do a > good job. Well, the above was my interpretation of how things went. I may have been wrong (and honestly do hope that I am wrong), but my gut feeling simply said: hey, what are these guys doing there... is this some kind of > > Needless to say that > > quality control is not possible anymore. > > Unclear. Lots of other people looked over the changes in your > absence. And CVS makes code review after it's checked in easy enough. > (Hey, in many other open source projects that's the normal procedure > once the rough characteristics of a feature have been agreed upon: > check in first and review later!) That was not my point: quality control also includes checking the design approach. This is something which should normally be done in design/implementation/design/... phases -- just like I worked with you on the Unicode implementation late in 1999. > > Conclusion: > > I am not going to continue this work if this does not change. > > That would be sad, and I hope you will stay with us. We certainly > don't plan to ignore your comments! > > > Another problem for me is the continued hostility I feel on i18n > > against parts of the design and some of my decisions.
I am > > not talking about your feedback and the feedback from many other > > people on the list which was excellent and to high standards. > > But reading the postings of the last few months you will > > find notices of what I am referring to here (no, I don't want > > to be specific). > > I don't know what to say about this, and obviously nobody has the time > to go back and read the archives. I'm sure it's not you as a person > that was attacked. If the design isn't perfect -- and hey, since > Python is the 80 percent language, few things in it are quite perfect! > -- then (positive) criticism is an attempt to help, to move it closer > to perfection. > > If people have at times said "the Unicode support sucks", well, that > may hurt. You can't always stay friends with everybody. I get flames > occasionally for features in Python that folks don't like. I get used > to them, and it doesn't affect my confidence any more. Be the same! I'll try. > But sometimes, after saying "it sucks", people make specific > suggestions for improvements, and it's important to be open for those > even from sources that use offending language. (Within reason, of > course. I don't ask you to listen to somebody who is persistently > hostile to you as a person.) Ok. > > If people don't respect my comments or decision, then how can > > I defend the design and how can I stop endless discussions which > > simply don't lead anywhere ? So either I am missing something > > or there is a need for a clear statement from you about > > my status in all this. > > Do you really *want* to be the Unicode BDFL? Being something's BDFL is a > full-time job, and you've indicated you're too busy. (Or is that > temporary?) I am currently doing a lot of consulting work, so things sometimes tighten up and are less work intense at other times. Given this setup, I think that I will be able to play the BD (without the FL) for Unicode for some time.
I will certainly pass on the flag to someone else if I find myself not spending enough time on it. The only thing I'm asking for is some more professional work mentality at times. If people make it hard for me to follow the development, then I cannot manage this task in a satisfying way. > I see you as the original coder, which means that you know that > section of the code better than anyone, and whenever there's a > question that others can't answer about its design, implementation, or > restrictions, I refer to you. But given that you've said you wouldn't > be able to work much on it, I welcome contributions by others as long > as they seem knowledgeable. Same here. > > If I don't have the right to comment on proposals and patches, > > possibly even rejecting them, then I simply don't see any > > ground for keeping the implementation in a state which I can > > maintain. > > Nobody said you couldn't comment, and you know that. If I don't get a chance to comment on a summary of changes (be it before or after a batch of checkins), how am I supposed to follow up on them ? Keeping a close eye on the checkin mailing list doesn't help: it simply doesn't always give you the big picture. We are all professional quality programmers and I respect Fredrik and Martin for their coding quality and ideas. What I am asking for is some more teamwork. > When it comes to rejecting or accepting, I feel that I am still the > final arbiter, even for Unicode, until I get hit by a bus. Since I > don't always understand the implementation or the issues, I'll of > course defer to you in cases where I think I can't make the decision, > but I do reserve the right to be convinced by others to override your > judgement, occasionally, if there's a good reason. And when you're > not responsive, I may try to channel you. (I'll try to be more > explicit about that.) That's perfectly OK (and indeed can be very useful at times).
> > And last but not least: The fun-factor has faded which was > the main motor driving me into working on Unicode in the first > place. Nothing much you can do about this, though :-/ > > Yes, that happens to all of us at times. The fun factor goes up and > down, and sometimes we must look for fun elsewhere for a while. Then > the fun may come back where it appeared lost. Go on vacation, read a > book, tackle a new project in a totally different area! Then come > back and see if you can find some fun in the old stuff again. I'll visit the Bordeaux Python conference later this week. That should give me some time to breathe (and hopefully to write some more PEPs :=). > > > Paul Prescod offered to write a PEP on this issue. My cynical half > > > believes that we'll never hear from him again, but my optimistic half > > > hopes that he'll actually write one, so that we'll be able to discuss > > > the various issues for the users with the users. I encourage you to > > > co-author the PEP, since you have a lot of background knowledge about > > > the issues. > > > > I guess your optimistic half won :-) I think Paul already did all the > > work, so I'll simply comment on what he wrote. > > Your suggestions were very valuable. My opinion of Paul also went up > a notch! > > > > BTW, I think that Misc/unicode.txt should be converted to a PEP, for > > > the historic record. It was very much a PEP before the PEP process > > > was invented. Barry, how much work would this be? No editing needed, > > > just formatting, and assignment of a PEP number (the lower the better). > > > > Thanks for converting the text to PEP format, Barry. > > > > Thanks for reading this far, > > You're welcome, and likewise. > > Just one more thing, Marc-Andre. Please know that I respect your work > very much even if we don't always agree. We would get by without you, > but Python would be hurt if you turned your back on us. Thanks.
Be assured that I'll stay around for quite some time -- you won't get by that easily ;-) -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From mal at lemburg.com Mon Jul 2 12:56:00 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Mon, 02 Jul 2001 12:56:00 +0200 Subject: [Python-Dev] Bordeaux Python Meeting 04.07.-07.07. Message-ID: <3B405340.31C5AA11@lemburg.com> Hi everybody, I think nobody has posted an announcement for the conference yet, so I'll at least provide a pointer: http://www.lsm.abul.org/program/topic19/ Marc Poinot, who also organized the "First Python Day" in France, is chair of this subtopic at the "Debian One" conference in Bordeaux: http://www.lsm.abul.org/ Cheers, -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From fredrik at pythonware.com Mon Jul 2 13:41:51 2001 From: fredrik at pythonware.com (Fredrik Lundh) Date: Mon, 2 Jul 2001 13:41:51 +0200 Subject: [Python-Dev] Unicode Maintenance References: <3B39CD51.406C28F0@lemburg.com> <200106271611.f5RGBn819631@odiug.digicool.com> <3B3AF307.6496AFB4@lemburg.com> <200106281225.f5SCPIr20874@odiug.digicool.com> <3B40505A.2F03EEC4@lemburg.com> Message-ID: <001e01c102eb$fe4995d0$4ffa42d5@hagrid> mal wrote: > The only thing I'm asking for is some more professional > work mentality at times. for the record, your recent posts under this subject don't strike me as very professional. think about it.
From paulp at ActiveState.com Mon Jul 2 16:25:55 2001 From: paulp at ActiveState.com (Paul Prescod) Date: Mon, 02 Jul 2001 07:25:55 -0700 Subject: [I18n-sig] Re: [Python-Dev] PEP 261, Rev 1.3 - Support for "wide" Unicode characters References: <3B3F8095.8D58631D@ActiveState.com> <3B404967.14FE180F@lemburg.com> Message-ID: <3B408473.77AB6C8@ActiveState.com> "M.-A. Lemburg" wrote: > >... > > Character > > > > Used by itself, means the addressable units of a Python > > Unicode string. > > Please add: also known as "code unit". I'm not entirely comfortable with that. As you yourself pointed out, the same Python Unicode object can be interpreted as either a series of single-width code points *or* as a UTF-16 string where the characters are code units. You could also interpret it as a BASE64'd region or an XML document... It all depends on how you look at it. > .... > > Surrogate pair > > > > Two physical characters that represent a single logical > > Eeek... two code units (or have you ever seen a physical character > walking around ;-) No, that's sort of my point. The user can decide to adopt the convention of looking at the two characters as code units or they can ignore that interpretation and look at them as two code points. It's all relative, man. Dig it? That's why I use the word "convention" below: > > character. Part of a convention for representing 32-bit > > code points in terms of two 16-bit code points. "Surrogates are all in your head. Python doesn't know or care about them!" I'll change this to: Surrogate pair Two Python Unicode characters that represent a single logical Unicode code point. Part of a convention for representing 32-bit code points in terms of two 16-bit code points. Python has limited support for reading, writing and constructing strings that use this convention (described below). Otherwise Python ignores the convention. > No need to pass this information to the codec: simply write > a new one and give it a clear name, e.g.
"ucs-2" will generate > errors while "utf-16-le" converts them to surrogates. That's a good point, but what if I want a UTF-8 codec that doesn't generate surrogates? Or even a UCS4 one? > Plus perhaps the Mark Davis paper at: > > http://www-106.ibm.com/developerworks/unicode/library/utfencodingforms/ Okay. > > Copyright > > > > This document has been placed in the public domain. > > Good work, Paul ! Thanks for your help. You did help me to clarify many things even though I argued with you as I was doing it. -- Take a recipe. Leave a recipe. Python Cookbook! http://www.ActiveState.com/pythoncookbook From guido at digicool.com Mon Jul 2 17:23:56 2001 From: guido at digicool.com (Guido van Rossum) Date: Mon, 02 Jul 2001 11:23:56 -0400 Subject: [Python-Dev] Unicode Maintenance In-Reply-To: Your message of "Mon, 02 Jul 2001 12:43:38 +0200." <3B40505A.2F03EEC4@lemburg.com> References: <3B39CD51.406C28F0@lemburg.com> <200106271611.f5RGBn819631@odiug.digicool.com> <3B3AF307.6496AFB4@lemburg.com> <200106281225.f5SCPIr20874@odiug.digicool.com> Message-ID: <200107021523.f62FNun01807@odiug.digicool.com> Thanks for your response, Marc-Andre. I'd like to close this topic now. I'm not sure how to get you a "summary of changes", but I think you can ask Fredrik directly (Martin announced he's away on vacation). One thing you can do is pipe the output of "cvs log" through tools/scripts/logmerge.py -- this gives you the checkin messages in (reverse?) chronological order. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at digicool.com Mon Jul 2 17:29:39 2001 From: guido at digicool.com (Guido van Rossum) Date: Mon, 02 Jul 2001 11:29:39 -0400 Subject: [Python-Dev] Support for "wide" Unicode characters In-Reply-To: Your message of "Mon, 02 Jul 2001 11:39:55 +0200."
<3B40416B.6438D1F7@egenix.com> References: <200107020249.OAA00439@s454.cosc.canterbury.ac.nz> <3B40416B.6438D1F7@egenix.com> Message-ID: <200107021529.f62FTdx01823@odiug.digicool.com> > Greg Ewing wrote: > > > > > It so happened that the Unicode support was written to make it very > > > easy to change the compile-time code unit size > > > > What about extension modules that deal with Unicode strings? > > Will they have to be recompiled too? If so, is there anything > > to detect an attempt to import an extension module with an > > incompatible Unicode character width? > > That's a good question ! > > The answer is: yes, extensions which use Unicode will have to > be recompiled for narrow and wide builds of Python. The question > is however, how to detect cases where the user imports an > extension built for narrow Python into a wide build and > vice versa. > > The standard way of looking at the API level won't help. We'd > need some form of introspection API at the C level... hmm, > perhaps looking at the sys module will do the trick for us ?! > > In any case, this is certainly going to cause trouble one > of these days... Here are some alternative ways to deal with this: (1) Use the preprocessor to rename all the Unicode APIs to get "Wide" appended to their name in wide mode. This makes any use of a Unicode API in an extension compiled for the wrong Py_UNICODE_SIZE fail with a link-time error. (Which should cause an ImportError for shared libraries.) (2) Ditto but only rename the PyModule_Init function. This is much less work but more coarse: a module that doesn't use any Unicode APIs (and I expect these will be a large majority) still would not be accepted. (3) Change the interpretation of PYTHON_API_VERSION so that a low bit of '1' means wide Unicode. Then you only get a warning (followed by a core dump when actually trying to use Unicode). I mentioned (1) and (3) in an earlier post. 
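Marc-Andre's suggestion of "looking at the sys module" is in fact workable at the Python level: the build width is visible as sys.maxunicode (added during the checkins discussed later in this thread). A minimal sketch — the helper name is mine, and note this only helps pure-Python code, not a mis-compiled extension at link time, which is what Guido's options (1)-(3) address:

```python
import sys

def unicode_build():
    # On a narrow (UTF-16) build sys.maxunicode is 0xFFFF; on a
    # wide (UCS-4) build -- and on every Python 3.3+ interpreter,
    # where PEP 393 removed the distinction -- it is 0x10FFFF.
    return "wide" if sys.maxunicode > 0xFFFF else "narrow"

print(unicode_build())
```

On any modern Python this prints "wide"; on a 2.x narrow build it would print "narrow".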
--Guido van Rossum (home page: http://www.python.org/~guido/) From fdrake at beowolf.digicool.com Mon Jul 2 17:37:45 2001 From: fdrake at beowolf.digicool.com (Fred Drake) Date: Mon, 2 Jul 2001 11:37:45 -0400 (EDT) Subject: [Python-Dev] [maintenance doc updates] Message-ID: <20010702153745.B304B28929@beowolf.digicool.com> The development version of the documentation has been updated: http://python.sourceforge.net/maint-docs/ Updated to reflect the current state of the Python 2.1.1 maintenance release branch. From mal at lemburg.com Mon Jul 2 18:51:58 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Mon, 02 Jul 2001 18:51:58 +0200 Subject: [Python-Dev] Support for "wide" Unicode characters References: <200107020249.OAA00439@s454.cosc.canterbury.ac.nz> <3B40416B.6438D1F7@egenix.com> <200107021529.f62FTdx01823@odiug.digicool.com> Message-ID: <3B40A6AE.EDE30857@lemburg.com> Guido van Rossum wrote: > > > Greg Ewing wrote: > > > > > > > It so happened that the Unicode support was written to make it very > > > > easy to change the compile-time code unit size > > > > > > What about extension modules that deal with Unicode strings? > > > Will they have to be recompiled too? If so, is there anything > > > to detect an attempt to import an extension module with an > > > incompatible Unicode character width? > > > > That's a good question ! > > > > The answer is: yes, extensions which use Unicode will have to > > be recompiled for narrow and wide builds of Python. The question > > is however, how to detect cases where the user imports an > > extension built for narrow Python into a wide build and > > vice versa. > > > > The standard way of looking at the API level won't help. We'd > > need some form of introspection API at the C level... hmm, > > perhaps looking at the sys module will do the trick for us ?! > > > > In any case, this is certainly going to cause trouble one > > of these days... 
> > Here are some alternative ways to deal with this: > > (1) Use the preprocessor to rename all the Unicode APIs to get "Wide" > appended to their name in wide mode. This makes any use of a > Unicode API in an extension compiled for the wrong Py_UNICODE_SIZE > fail with a link-time error. (Which should cause an ImportError > for shared libraries.) > > (2) Ditto but only rename the PyModule_Init function. This is much > less work but more coarse: a module that doesn't use any Unicode > APIs (and I expect these will be a large majority) still would not > be accepted. > > (3) Change the interpretation of PYTHON_API_VERSION so that a low bit > of '1' means wide Unicode. Then you only get a warning (followed > by a core dump when actually trying to use Unicode). > > I mentioned (1) and (3) in an earlier post. (4) Add a feature flag to PyModule_Init() which then looks up the features in the sys module and uses this as basis for processing the import request. In this case, I think that (5) would be the best solution, since old code will notice the change in width too. -- Marc-Andre Lemburg ________________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From paulp at ActiveState.com Mon Jul 2 20:15:41 2001 From: paulp at ActiveState.com (Paul Prescod) Date: Mon, 02 Jul 2001 11:15:41 -0700 Subject: [Python-Dev] Support for "wide" Unicode characters References: <200107020249.OAA00439@s454.cosc.canterbury.ac.nz> <3B40416B.6438D1F7@egenix.com> <200107021529.f62FTdx01823@odiug.digicool.com> <3B40A6AE.EDE30857@lemburg.com> Message-ID: <3B40BA4D.9C85A202@ActiveState.com> "M.-A. Lemburg" wrote: > >... > > (4) Add a feature flag to PyModule_Init() which then looks up the > features in the sys module and uses this as basis for > processing the import request. Could an extension be carefully written so that a single binary could be compatible with both types of Python build?
I'm thinking that it would pass data buffers with the "right width" based on checking a runtime flag... -- Take a recipe. Leave a recipe. Python Cookbook! http://www.ActiveState.com/pythoncookbook From just at letterror.com Mon Jul 2 20:20:38 2001 From: just at letterror.com (Just van Rossum) Date: Mon, 2 Jul 2001 20:20:38 +0200 Subject: [Python-Dev] Support for "wide" Unicode characters In-Reply-To: <3B40BA4D.9C85A202@ActiveState.com> Message-ID: <20010702202041-r01010600-d5c62b95@213.84.27.177> Paul Prescod wrote: > Could an extension be carefully written so that a single binary could be > compatible with both types of Python build? I'm thinking that it would > pass data buffers with the "right width" based on checking a runtime > flag... But then it would also be compatible with a unicode object using different internal storage units per string, so I'm sure this is a dead end ;-) Just From mal at lemburg.com Mon Jul 2 20:59:06 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Mon, 02 Jul 2001 20:59:06 +0200 Subject: [Python-Dev] Support for "wide" Unicode characters References: <20010702202041-r01010600-d5c62b95@213.84.27.177> Message-ID: <3B40C47A.94317663@lemburg.com> Just van Rossum wrote: > > Paul Prescod wrote: > > > Could an extension be carefully written so that a single binary could be > > compatible with both types of Python build? I'm thinking that it would > > pass data buffers with the "right width" based on checking a runtime > > flag... > > But then it would also be compatible with a unicode object using different > internal storage units per string, so I'm sure this is a dead end ;-) Agreed :-) Extension writer will have to provide two versions of the binary. -- Marc-Andre Lemburg ________________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal at lemburg.com Mon Jul 2 21:12:45 2001 From: mal at lemburg.com (M.-A. 
Lemburg) Date: Mon, 02 Jul 2001 21:12:45 +0200 Subject: [I18n-sig] Re: [Python-Dev] PEP 261, Rev 1.3 - Support for "wide" Unicodecharacters References: <3B3F8095.8D58631D@ActiveState.com> <3B404967.14FE180F@lemburg.com> <3B408473.77AB6C8@ActiveState.com> Message-ID: <3B40C7AD.F2646D56@lemburg.com> Paul Prescod wrote: > > "M.-A. Lemburg" wrote: > > > >... > > > Character > > > > > > Used by itself, means the addressable units of a Python > > > Unicode string. > > > > Please add: also known as "code unit". > > I'm not entirely comfortable with that. As you yourself pointed out, the > same Python Unicode object can be interpreted as either a series of > single-width code points *or* as a UTF-16 string where the characters > are code units. You could also interpet it as a BASE64'd region or an > XML document... It all depends on how you look at it. Well, that's what code unit tries to capture too: it's the basic storage unit used by the implementation for storing characters. Never mind, it's just a detail... > > .... > > > Surrogate pair > > > > > > Two physical characters that represent a single logical > > > > Eeek... two code units (or have you ever seen a physical character > > walking around ;-) > > No, that's sort of my point. The user can decide to adopt the convention > of looking at the two characters as code units or they can ignore that > interpretation and look at them as two code points. It's all relative, > man. Dig it? That's why I use the word "convention" below: Ok. > > > character. Part of a convention for representing 32-bit > > > code points in terms of two 16-bit code points. > > "Surrogates are all in your head. Python doesn't know or care about > them!" > > I'll change this to: > > Surrogate pair > > Two Python Unicode characters that represent a single logical > Unicode code point. Part of a convention for representing > 32-bit code points in terms of two 16-bit code points. 
Python > has limited support for reading, writing and constructing > strings > that use this convention (described below). Otherwise Python > ignores the convention. Good. > > No need to pass this information to the codec: simply write > > a new one and give it a clear name, e.g. "ucs-2" will generate > > errors while "utf-16-le" converts them to surrogates. > > That's a good point, but what if I want a UTF-8 codec that doesn't > generate surrogates? Or even a UCS4 one? With Walter's patch for callback error handlers, you should be able to provide handlers which implement whatever you see fit. I think that codecs should work the same on all platforms and always apply the needed conversion for the platform in question; could be wrong though... it's really only a minor issue. > > Plus perhaps the Mark Davis paper at: > > > > http://www-106.ibm.com/developerworks/unicode/library/utfencodingforms/ > > Okay. > > > > Copyright > > > > > > This document has been placed in the public domain. > > > > Good work, Paul ! > > Thanks for your help. You did help me to clarify many things even though > I argued with you as I was doing it. Thank you for taking the suggestions into account. -- Marc-Andre Lemburg ________________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From fredrik at pythonware.com Mon Jul 2 21:41:33 2001 From: fredrik at pythonware.com (Fredrik Lundh) Date: Mon, 2 Jul 2001 21:41:33 +0200 Subject: [Python-Dev] Unicode Maintenance References: <3B39CD51.406C28F0@lemburg.com> <200106271611.f5RGBn819631@odiug.digicool.com> <3B3AF307.6496AFB4@lemburg.com> <200106281225.f5SCPIr20874@odiug.digicool.com> <3B40505A.2F03EEC4@lemburg.com> <200107021523.f62FNun01807@odiug.digicool.com> Message-ID: <013101c1032f$022770d0$4ffa42d5@hagrid> guido wrote: > I'm not sure how to get you a "summary of changes", but I think you > can ask Fredrik directly (Martin annonced he's away on vacation). 
summary:

- portability: made unicode object behave properly also if sizeof(Py_UNICODE) > 2 and >= sizeof(long) (FL)
- same for unicode codecs and the unicode database (MvL)
- base unicode feature selection on unicode defines, not platform (FL)
- wrap surrogate handling in #ifdef Py_UNICODE_WIDE (MvL, FL)
- tweaked unit tests to work with wide unicode, by replacing explicit surrogates with \U escapes (MvL)
- configure options for narrow/wide unicode (MvL)
- removed bogus const and register from some scalars (GvR, FL)
- default unicode configuration for PC (Tim, FL)
- default unicode configuration for Mac (Jack)
- added sys.maxunicode (MvL)

most changes were really trivial (e.g. ~0xFC00 => 0x3FF). martin's big patch was reviewed and tested by both me and him before checkin (tim managed to check out and build before I'd gotten around to check in my windows tweaks, but that's what makes distributed egoless development so fun ;-) From greg at cosc.canterbury.ac.nz Tue Jul 3 02:20:37 2001 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Tue, 03 Jul 2001 12:20:37 +1200 (NZST) Subject: [Python-Dev] Support for "wide" Unicode characters In-Reply-To: <03b301c102cf$e0e3dd00$0900a8c0@spiff> Message-ID: <200107030020.MAA00584@s454.cosc.canterbury.ac.nz> Fredrik Lundh : > back in 1995, some people claimed that the image type had > to be made smarter to be usable. But at least you can use more than one depth of image in the same program... Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg at cosc.canterbury.ac.nz +--------------------------------------+ From mal at lemburg.com Tue Jul 3 10:31:50 2001 From: mal at lemburg.com (M.-A.
Lemburg) Date: Tue, 03 Jul 2001 10:31:50 +0200 Subject: [Python-Dev] Unicode Maintenance References: <3B39CD51.406C28F0@lemburg.com> <200106271611.f5RGBn819631@odiug.digicool.com> <3B3AF307.6496AFB4@lemburg.com> <200106281225.f5SCPIr20874@odiug.digicool.com> <3B40505A.2F03EEC4@lemburg.com> <200107021523.f62FNun01807@odiug.digicool.com> <013101c1032f$022770d0$4ffa42d5@hagrid> Message-ID: <3B4182F6.DAC4C1@lemburg.com> Fredrik Lundh wrote: > > guido wrote: > > I'm not sure how to get you a "summary of changes", but I think you > > can ask Fredrik directly (Martin annonced he's away on vacation). > > summary: > > - portability: made unicode object behave properly also if > sizeof(Py_UNICODE) > 2 and >= sizeof(long) (FL) > - same for unicode codecs and the unicode database (MvL) > - base unicode feature selection on unicode defines, not platform (FL) > - wrap surrogate handling in #ifdef Py_UNICODE_WIDE (MvL, FL) > - tweaked unit tests to work with wide unicode, by replacing explicit > surrogates with \U escapes (MvL) > - configure options for narrow/wide unicode (MvL) > - removed bogus const and register from some scalars (GvR, FL) > - default unicode configuration for PC (Tim, FL) > - default unicode configuration for Mac (Jack) > - added sys.maxunicode (MvL) Thank you for the summary. Please let me suggest that for the next coding party you prepare a patch which spans all party checkins and upload that patch with a summary like the above to SF. That way we can keep the documentation of the overall changes in one place and make the process more transparent for everybody. Now let's get on with business... 
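[Editorially, for readers following along: the surrogate handling and the ~0xFC00 => 0x3FF style tweaks in the summary quoted above are just the UTF-16 convention debated earlier in the thread, expressed as arithmetic. A sketch in Python, with helper names of my own choosing:]

```python
def to_surrogate_pair(cp):
    # Split a supplementary code point (above 0xFFFF) into the two
    # 16-bit code units of the UTF-16 convention: subtract 0x10000,
    # then the high 10 bits select the high surrogate and the low
    # 10 bits (cp & 0x3FF) select the low surrogate.
    if not 0xFFFF < cp <= 0x10FFFF:
        raise ValueError("not a supplementary code point")
    cp -= 0x10000
    return 0xD800 + (cp >> 10), 0xDC00 + (cp & 0x3FF)

def from_surrogate_pair(high, low):
    # Inverse: recombine two surrogate code units into one code point.
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)
```

For example, to_surrogate_pair(0x10000) gives (0xD800, 0xDC00) -- exactly the two code units a narrow build stores for u"\U00010000".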
Thanks, -- Marc-Andre Lemburg ________________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From fredrik at pythonware.com Tue Jul 3 12:21:27 2001 From: fredrik at pythonware.com (Fredrik Lundh) Date: Tue, 3 Jul 2001 12:21:27 +0200 Subject: [Python-Dev] Unicode Maintenance References: <3B39CD51.406C28F0@lemburg.com> <200106271611.f5RGBn819631@odiug.digicool.com> <3B3AF307.6496AFB4@lemburg.com> <200106281225.f5SCPIr20874@odiug.digicool.com> <3B40505A.2F03EEC4@lemburg.com> <200107021523.f62FNun01807@odiug.digicool.com> <013101c1032f$022770d0$4ffa42d5@hagrid> <3B4182F6.DAC4C1@lemburg.com> Message-ID: <05aa01c103a9$ec29e710$0900a8c0@spiff> mal wrote: > Please let me suggest that for the next coding party you prepare a patch > which spans all party checkins and upload that patch with a summary > like the above to SF. That way we can keep the documentation of the overall > changes in one place and make the process more transparent for everybody. Sorry, but as long as Guido wants an open development approach based on collective code ownership (aka "egoless programming"), that's what he gets. The current environment provides several tools to track changes to the code base. The python-checkins list provides instant info on every single change to the code base; the investment to track tha list is a few minutes per day. The CVS history is also easy to access; you can reach it via the viewcvs interface, or from the command line. Using both CVS and SF's patch manager to track development history is a waste of time. A development project manned by volunteers doesn't need bureaucrats; the version control system provides all the accountability we'll ever need. (commercial development projects doesn't need bureaucrats either, and usually don't have them, but that's another story). 
I'd also argue that using many incremental checkins improves quality -- the smaller a change is, the easier it is to understand, and the more likely it is that also non-experts will notice simple mistakes or portability issues. (I regularly comment on checkin messages that look suspicious codewise, even if I don't know anything about the problem area. I'm even right, sometimes). Reviewing big patches on SF is really hard, even for experts. And every hour a patch sits on sourceforge instead of in the code repository is ten hours less burn-in in a heterogeneous testing environment. That's worth a lot. Finally, my experience from this and other projects is that the "visible heartbeat" you get from a continuous flow of checkin messages improves team productivity and team morale. Nothing is more inspiring than seeing others working for a common goal. It's the final product that matters, not who's in charge of what part of it. The end user couldn't care less. I'd prefer if you didn't feel the need to play miniboss on the Python project (I'm sure you have plenty of 'mx' projects that you can use that approach, if you have to). And I'd rather see you at the next party than out there whining over how you missed the last one. Cheers /F From mal at lemburg.com Tue Jul 3 13:30:05 2001 From: mal at lemburg.com (M.-A.
Lemburg) Date: Tue, 03 Jul 2001 13:30:05 +0200 Subject: [Python-Dev] Unicode Maintenance References: <3B39CD51.406C28F0@lemburg.com> <200106271611.f5RGBn819631@odiug.digicool.com> <3B3AF307.6496AFB4@lemburg.com> <200106281225.f5SCPIr20874@odiug.digicool.com> <3B40505A.2F03EEC4@lemburg.com> <200107021523.f62FNun01807@odiug.digicool.com> <013101c1032f$022770d0$4ffa42d5@hagrid> <3B4182F6.DAC4C1@lemburg.com> <05aa01c103a9$ec29e710$0900a8c0@spiff> Message-ID: <3B41ACBD.9FA8FB25@lemburg.com> Fredrik Lundh wrote: > > > Please let me suggest that for the next coding party you prepare a patch > > which spans all party checkins and upload that patch with a summary > > like the above to SF. That way we can keep the documentation of the overall > > changes in one place and make the process more transparent for everybody. > > Sorry, but as long as Guido wants an open development approach > based on collective code ownership (aka "egoless programming"), > that's what he gets. > > The current environment provides several tools to track changes > to the code base. The python-checkins list provides instant info > on every single change to the code base; the investment to track > that list is a few minutes per day. The CVS history is also easy to > access; you can reach it via the viewcvs interface, or from the > command line. I think you misunderstood my suggestion: I didn't say you can't have a coding party with lots of small checkins, I just suggested that *after* the party someone does a diff before-and-after-the-party.diff and uploads this diff to SF with a description of the overall changes. You simply don't get the big picture from looking at various small checkin messages which are sometimes spread across multiple files/checkins. > Using both CVS and SF's patch manager to track development history > is a waste of time. A development project manned by volunteers > doesn't need bureaucrats; the version control system provides > all the accountability we'll ever need.
> > (commercial development projects doesn't need bureaucrats > either, and usually don't have them, but that's another story). Wasn't talking about bureaucrats... > I'd also argue that using many incremental checkins improves > quality -- the smaller a change is, the easier it is to understand, > and the more likely it is that also non-experts will notice simple > mistakes or portability issues. (I regularily comment on checkin > messages that look suspicious codewise, even if I don't know > anything about the problem area. I'm even right, sometimes). > Reviewing big patches on SF is really hard, even for experts. It's just for keeping a combined record of changes. Following up on dozens of checkins spanning another dozen files using CVS is harder, IMHO, than looking at one single before/after diff. > And every hour a patch sits on sourceforge instead of in the code > repository is ten hours less burn-in in a heterogenous testing en- > vironment. That's worth a lot. Agreed. > Finally, my experience from this and other projects is that the > "visible heartbeat" you get from a continuous flow of checkin > messages improves team productivity and team morale. No- > thing is more inspiring than seeing others working for a common > goal. It's the final product that matters, not who's in charge of > what part of it. The end user couldn't care less. > > I'd prefer if you didn't feel the need to play miniboss on the Python > project (I'm sure you have plenty of 'mx' projects that you can use > that approach, if you have to). I have no intention of playing "miniboss" (I have enough of that being the boss of a small company), I'm just trying to keep the task of a code maintainer manageable; that's all. 'nuff said. > And I'd rather see you at the next > party than out there whining over how you missed the last one. Perhaps you can send around invitations first, before starting the party next time ?! BTW, do you have plans to update the Unicode database to the 3.1 version ? 
If not, I'll look into this next week. -- Marc-Andre Lemburg ________________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From thomas at xs4all.net Tue Jul 3 13:41:51 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Tue, 3 Jul 2001 13:41:51 +0200 Subject: [Python-Dev] CVS Message-ID: <20010703134151.P8098@xs4all.nl> Slightly off-topic, but I've depleted all my other sources :) I'm trying to get CVS to give me all logentries for all checkins in a specific branch (the 2.1.1 branch) so I can pipe it through logmerge. It seems the one thing I'm missing now is a branchpoint tag (which should translate to a revision with an even number of dots, apparently) but 'release21' and 'release21-maint' both don't qualify. Even the usage logmerge suggests (cvs log -rrelease21) doesn't work, gives me a bunch of "no revision `release21' in " warnings and just all logentries for those files. Am I missing something simple, here, or should I hack logmerge to parse the symbolic names, figure out the even-dotted revision for each file from the uneven-dotted branch-tag, and filter out stuff outside that range ? :P -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From gregor at mediasupervision.de Tue Jul 3 14:09:51 2001 From: gregor at mediasupervision.de (Gregor Hoffleit) Date: Tue, 3 Jul 2001 14:09:51 +0200 Subject: [Python-Dev] PEP 250, site-python, site-packages Message-ID: <20010703140951.A27647@mediasupervision.de> PEP 250 talks about adopting site-packages for Windows systems. I'd like to discuss the sitedirs as a whole.
Currently, site.py appends the following sitedirs to sys.path:

* <prefix>/lib/python<version>/site-packages
* <prefix>/lib/site-python

If exec-prefix is different from prefix, then also

* <exec-prefix>/lib/python<version>/site-packages
* <exec-prefix>/lib/site-python

From jepler at mail.inetnebr.com Tue Jul 3 14:38:00 2001 From: jepler at mail.inetnebr.com (Jeff Epler) Date: Tue, 3 Jul 2001 07:38:00 -0500 Subject: [Python-Dev] PEP 250, site-python, site-packages In-Reply-To: <20010703140951.A27647@mediasupervision.de>; from gregor@mediasupervision.de on Tue, Jul 03, 2001 at 02:09:51PM +0200 References: <20010703140951.A27647@mediasupervision.de> Message-ID: <20010703073759.A4972@localhost.localdomain> On Tue, Jul 03, 2001 at 02:09:51PM +0200, Gregor Hoffleit wrote: > Due to Python's good tradition of compatibility, this is the vast > majority of packages; only packages with binary modules necessarily need > to be recompiled anyway for each major new <version>. Aren't there bytecode changes in 1.6, 2.0, and 2.1, compared to 1.5.2? If so, this either means that each version of Python does need a separate copy (for the .pyc/.pyo file), or if all versions are compatible with 1.5.2 bytecodes (and I don't know that they are) then all packages would need to be bytecompiled with 1.5.2. For instance, it appears that between 1.5.2 and 2.1, the UNPACK_LIST and UNPACK_TUPLE bytecode instructions were removed and replaced with a single UNPACK_SEQUENCE opcode. Information gathered by executing:

python -c 'import dis
for name in dis.opname:
    if name[0] != "<": print name' | sort -u > opcodes-1.5.2

and similarly for python2. Jeff From tim.one at home.com Sun Jul 1 03:58:29 2001 From: tim.one at home.com (Tim Peters) Date: Sat, 30 Jun 2001 21:58:29 -0400 Subject: [Python-Dev] Support for "wide" Unicode characters In-Reply-To: <3B3E4487.40054EAE@ActiveState.com> Message-ID: [Paul Prescod] > "The Energy is the mass of the object times the speed of light times > two."
At least in my universe =) This is something for Guido to Pronounce on, then. Who's going to write the PEP? The threat of nuclear war seems almost laughable in Paul's universe, so it's certainly got attractions. OTOH, it's got to be a lot colder too. energy-will-do-what-guido-tells-it-to-do-ly y'rs - tim