Showing content from https://mail.python.org/pipermail/python-dev/2010-June.txt below:

tag. Similarly, older English had a significantly different glyph for 's', which looks more like a modern 'f'. If IBM's EBCDIC had codes for these glyph variants, IBM might have insisted that unicode also have such so char for char round-tripping would be possible. It does not and unicode does not. (Wordstar and other 1980s editor publishers were mostly defunct or weak and not in a position to make such demands.) If one wants to write on the history of glyph evolution, say of latin chars, one must either number the variants 'e-0', 'e-1', etc, or resort to the user area. In either case, proprietary software would be needed to actually print the variations with other text. > I know, it's a hard thing to wrap > one's head around, since on the surface it sounds like unicode is the > programmer's savior. Unfortunately, real-world text data exists which > cannot be safely roundtripped to unicode, I do not believe that. Digital information can always be recoded one way or another. As it is, the rules were bent for Japanese, in a way that they were not for English, to aid round-tripping of the major public encodings. I can, however, believe that there were private encodings for which round-tripping is more difficult. But there are also difficulties for old proprietary and even private English encodings. -- Terry Jan Reedy From stephen at xemacs.org Tue Jun 22 04:06:36 2010 From: stephen at xemacs.org (Stephen J.
Turnbull) Date: Tue, 22 Jun 2010 11:06:36 +0900 Subject: [Python-Dev] email package status in 3.X In-Reply-To: <20100621160105.25ae602f@heresy> References: <87sk4jcejy.fsf@uwakimon.sk.tsukuba.ac.jp> <201006201204.30795.steve@pearwood.info> <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100621015824.6A84E3A4099@sparrow.telecommunity.com> <20100621114307.48735698@heresy> <871vc045sl.fsf@uwakimon.sk.tsukuba.ac.jp> <20100621160105.25ae602f@heresy> Message-ID: <87sk4f3jo3.fsf@uwakimon.sk.tsukuba.ac.jp> Barry Warsaw writes: > I'm still not sure ebytes solves the problem, I don't see how it can. If you have an encoding to stuff into ebytes, you could just convert to Unicode and guarantee that all internal string operations will succeed. If you use ebytes instead, every string operation has to be wrapped in "try ... except EBytesError", to no gain that I can see. If you don't have an encoding, then you just have bytes, which strictly speaking shouldn't be operated on (in the sense of slicing, dicing, or stir-frying) at all if you're in an environment where they are a carrier for formatted information such as non-ASCII characters or PNG images. > but it avoids one I'm most concerned about seeing proposed. I > really really do not want to add encoding=blah arguments to > boatloads of function signatures. Agreed. But ebytes isn't a solution to that; it's a regression to one of the hardest problems in Python 2. OTOH, it seems to me that there's only one boatload to worry about. That's the boatload containing protocol-less APIs, ie, Unix OS data (names in the filesystem, content of environment variables). Other platforms (Windows, Mac) are standardizing on protocols for these things and enforcing them in the OS, and free Unices are going to the convention that everything is non-normalized UTF-8. What other boats are you worried about? 
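For the protocol-less Unix data mentioned above (file names, environment variables), one existing mechanism is the PEP 383 "surrogateescape" error handler, available from Python 3.1. The following is only a minimal sketch under stated assumptions: the file name bytes are made up for illustration, and UTF-8 is assumed to be the nominal encoding.

```python
# Hypothetical file name as raw bytes: Latin-1 encoded, so invalid as UTF-8.
raw_name = b"caf\xe9.txt"

# PEP 383: undecodable bytes become lone surrogates instead of raising.
name = raw_name.decode("utf-8", errors="surrogateescape")

# Ordinary str operations work on the result.
stem, dot, ext = name.rpartition(".")

# Encoding with the same handler restores the original bytes exactly.
round_tripped = name.encode("utf-8", errors="surrogateescape")
```

The decoded name is not fit for display as-is, but it can be handled as str and handed back to the OS without loss.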
From alexander.belopolsky at gmail.com Tue Jun 22 04:21:36 2010 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Mon, 21 Jun 2010 22:21:36 -0400 Subject: [Python-Dev] red buildbots on 2.7 In-Reply-To: <1180.1277170019@parc.com> References: <73196.1277143019@parc.com> <75635.1277147585@parc.com> <20100621212904.7bec83f6@pitrou.net> <77297.1277150242@parc.com> <1277150570.3369.1.camel@localhost.localdomain> <4C1FC7E6.5070707@voidspace.org.uk> <4C1FD5D6.7070007@v.loewis.de> <4C1FD84B.3030202@voidspace.org.uk> <4C1FDB65.4020503@v.loewis.de> <4C1FDF1C.2060308@voidspace.org.uk> <4C1FE030.7020700@voidspace.org.uk> <1180.1277170019@parc.com> Message-ID: On Mon, Jun 21, 2010 at 9:26 PM, Bill Janssen wrote: .. > Though, isn't that behavior of urllib.proxy_bypass another bug? I don't know. Ask Ronald. From stephen at xemacs.org Tue Jun 22 04:58:57 2010 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Tue, 22 Jun 2010 11:58:57 +0900 Subject: [Python-Dev] bytes / unicode In-Reply-To: <20100621165611.GW5787@unaka.lan> References: <201006201204.30795.steve@pearwood.info> <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100620234723.600ad4a8@pitrou.net> <87wrtsd44p.fsf@uwakimon.sk.tsukuba.ac.jp> <87631c4bca.fsf@uwakimon.sk.tsukuba.ac.jp> <20100621165611.GW5787@unaka.lan> Message-ID: <87r5jz3h8u.fsf@uwakimon.sk.tsukuba.ac.jp> Toshio Kuratomi writes: > One comment here -- you can also have uri's that aren't decodable into their > true textual meaning using a single encoding. > > Apache will happily serve out uris that have utf-8, shift-jis, and > euc-jp components inside of their path but the textual > representation that was intended will be garbled (or be represented > by escaped byte sequences). For that matter, apache will serve > requests that have no true textual representation as it is working > on the byte level rather than the character level. Sure. I've never seen that combination, but I have seen Shift JIS and KOI8-R in the same path. 
But in that case, just using 'latin-1' as the encoding allows you to use the (unicode) string operations internally, and then spew your mess out into the world for someone else to clean up, just as using bytes would. > So a complete solution really should allow the programmer to pass > in uris as bytes when the programmer knows that they need it. Other than passing bytes into a constructor, I would argue if a complete solution requires, eg, an interface that allows urljoin(base,subdir) where the types of base and subdir are not required to match, then it doesn't belong in the stdlib. For stdlib usage, that's premature optimization IMO. The RFC says that URIs are text, and therefore they can (and IMO should) be operated on as text in the stdlib. It's not just a matter of manipulating the URIs themselves, where working directly on bytes will work just as well and with the same string operations (as long as everything is bytes). It's also a question of API complexity (eg, Barry's bugaboo of proliferation of encoding= parameters) and of debugging (if URIs are internally str, then they will display sanely in tracebacks and the interpreter). The cases where URIs can't be sanely treated as text are garbage input, and the stdlib should not try to provide a solution. Just passing in bytes and getting out bytes is GIGO. Trying to do "some" error-checking is going to be insufficient much of the time and overly strict most of the rest of the time. The programmer in the trenches is going to need to decide what to allow and what not; I don't think there are general answers because we know that allowing random URLs on the web leads to various kinds of problems. Some sites will need to address some of them. Note also that the "complete solution" argument cuts both ways. Eg, a "complete" solution should implement UTS 39 "confusables detection"[1] and IDNA[2]. Good luck doing that with bytes!
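The 'latin-1' approach described earlier in this message can be sketched roughly as follows. This is an illustration only: the path segments and the appended "index.html" are invented sample data, and the point is just that a latin-1 decode followed by a latin-1 encode preserves every byte.

```python
# Build a path whose segments use different encodings (illustrative data).
segment_sjis = "日本".encode("shift_jis")
segment_koi8 = "русский".encode("koi8_r")
raw_path = b"/" + segment_sjis + b"/" + segment_koi8

# latin-1 maps every byte 0-255 to the code point with the same value ...
text_path = raw_path.decode("latin-1")

# ... so str operations work, although the non-ASCII parts are mojibake.
joined = "/".join(text_path.split("/") + ["index.html"])

# Encoding back with latin-1 reproduces the original bytes untouched.
wire_path = joined.encode("latin-1")
```

The intermediate str is not meaningful text, so it should not be mixed with properly decoded strings or shown to users.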
If you *need* bytes (rather than simply trying to avoid conversion overhead), you're in a hazmat handling situation. Passing bytes in to stdlib APIs here is the equivalent of carrying around kilograms of fissionables in an open bucket. While the Tokaimura comparison is hyperbole, it can't be denied that use of bytes here shortcuts a lot of processing strongly suggested by the RFCs, and prevents use of various programming conveniences (such as reasonable display of URI values in debugging). Does the efficiency really justify including that in the stdlib? I dunno, I'm not a web programmer in the trenches. But I take my cue from MvL and MAL who don't seem real enthusiastic about this. And as Martin says, there is as yet no evidence offered that the overhead of conversion is a general problem. Footnotes: [1] http://www.unicode.org/reports/tr39/ [2] http://www.rfc-editor.org/rfc/rfc3490.txt From stephen at xemacs.org Tue Jun 22 06:15:19 2010 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Tue, 22 Jun 2010 13:15:19 +0900 Subject: [Python-Dev] bytes / unicode In-Reply-To: References: <87sk4jcejy.fsf@uwakimon.sk.tsukuba.ac.jp> <201006201204.30795.steve@pearwood.info> <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100620234723.600ad4a8@pitrou.net> <87wrtsd44p.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <87ocf33dpk.fsf@uwakimon.sk.tsukuba.ac.jp> Robert Collins writes: > Perhaps you mean 3986 ? :) Thank you for the correction. > > ? ?A URI is an identifier consisting of a sequence of characters > > ? ?matching the syntax rule named in Section 3. > > > > (where the phrase "sequence of characters" appears in all ancestors I > > found back to RFC 1738), and > > Sure, ok, let me unpack what I meant just a little. An abstract URI is > neither unicode nor bytes per se - see section 1.2.1 " A URI is a > sequence of characters from a very limited set: the letters of the > basic Latin alphabet, digits, and a few special characters. 
" My position is that this describes the network protocol, not the abstract URI. It in no way suggests that uri-encoded forms should be handled internally. And the RFC explicitly says this is text, and therefore sanctions the user- and programmer-friendly practice of doing internal processing as text. Note that in a hypothetical bytes-oriented API base = convert_uri_to_wire_format('http://www.example.org/') formuri = uri_join(base,b'home/steve/public_html') the bytes literal b'/home/steve/public_html' clearly is intended as readable text. This is mixing types in the programmer's mind, even though base is internally in bytes format and the relative URI is also in bytes format. This is un-Pythonic IMO. > URI interpretation is fairly strictly separated between producers and > consumers. A consumer can manipulate a url with other url fragments - > e.g. doing urljoin. But it needs to keep the url as a url and not try > to decode it to a unicode representation. -------------- next part -------------- Unfortunately, outside of Kansas and Canberra, it don't work that way. How do you propose to uri_join base as above and '/home/?????/public_html'? Encoding and/or decoding must be done somewhere, and it would be damn unfriendly to make the browser user do it! In the bytes-oriented API, the programmer must be continually making decisions about whether and how to handle non-ASCII components from "outside" (or, more likely, cursing the existence of the damned foreigners, and then ignoring the possibility ... let them eat UnicodeException!) -------------- next part -------------- > As an example, if I give the uri "http://server/%c3%83", rendering > that as http://server/? is able to lead to transcription errors and > reinterpretation problems unless you know - out of band - that the > server is using utf8 to encode. Conversely if someone enters in > http://server/? 
in their browser window, choosing utf8 or their local > encoding is quite arbitrary and able to not match how the server would > represent that resource. Sure. Using bytes doesn't solve either problem. It just allows you to wash your hands of it and pass it on to someone else, who probably has even less information than you do. Eg, in the case of passing the uri "http://server/%c3%83" to someone else without telling them the encoding means that effectively they're limited to ASCII if they want to append meaningful relative paths without guessing the encoding. In the case of the user entering "http://server/?", you have to do *something* to produce bytes eventually. When was the last time you typed "%c3%83" at the end of a URL in a browser address field? > > ? ?2. ?Characters > > > > ? ?The URI syntax provides a method of encoding data, presumably for > > ? ?the sake of identifying a resource, as a sequence of characters. > > ? ?The URI characters are, in turn, frequently encoded as octets for > > ? ?transport or presentation. ?This specification does not mandate any > > ? ?particular character encoding for mapping between URI characters > > ? ?and the octets used to store or transmit those characters. ?When a > > ? ?URI appears in a protocol element, the character encoding is > > ? ?defined by that protocol; without such a definition, a URI is > > ? ?assumed to be in the same character encoding as the surrounding > > ? ?text. > > Thats true, but its been taken out of context; the set of characters > permitted in a URL is a strict subset of characters found in ASCII; No. Again, you're confounding "the URL" with its network format. There's no question that the network format is in bytes, and before putting the URI into a wire protocol, you need to encode non-URI characters. However, the abstract URI is text, and may not even be represented by octets or Unicode at all (eg, represented by carbon residue on recycled wood pulp). 
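The ambiguity around "%c3%83" discussed above can be made concrete with urllib.parse from the Python 3 standard library. This is a small sketch, not a recommendation: the two encodings are chosen only to show that neither bytes nor text removes the need to know the encoding out of band.

```python
from urllib.parse import quote, unquote, unquote_to_bytes

escaped = "%c3%83"

# On the wire there are only octets.
octets = unquote_to_bytes(escaped)

# The same octets read as different text depending on the assumed encoding.
as_utf8 = unquote(escaped, encoding="utf-8")      # one character, U+00C3
as_latin1 = unquote(escaped, encoding="latin-1")  # two characters

# Going the other way, a typed character maps to different octets.
typed = "\u00c3"
utf8_form = quote(typed, encoding="utf-8")
latin1_form = quote(typed, encoding="latin-1")
```

Whichever internal representation is used, something has to pick the encoding at the boundary.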
> See also the section on comparing URL's - Unicode isn't at all relevant. Not to the RFC, which talks about *characters* and gives examples that imply transcoding (eg, between EBCDIC and UTF-16), see the section you cite. However, Unicode is the canonical representation of text inside Python, and therefore TOOWTDI for URL comparison in Python. Thank you for that killer argument for my position; I hadn't thought of it. > I wish it would. The problem is not in Python here though - and > casually handwaving will exacerbate it, not fix it. Using bytes "because we just don't know" is exactly casual handwaving. Well, maybe not casual; I'm aware that many programmers are driven to it by the recognition that only the extremes (all bytes vs. all text) make sense, and they choose bytes for efficiency reasons. I believe that focus on efficiency is un-Pythonic; that in Python 3 text should be chosen (in the stdlib) because it makes writing programs more fun (you can use literal notation for non-ASCII string constants, for example) and debuggable. Sure, in some cases you'll need to punt to 'latin-1' (ie, 'binary') or perhaps PEP 383 lone surrogates (this would require special handling to get reasonably friendly presentation to users and debuggers, I suppose), but for the many cases where you know that everything is in the same encoding life is a lot better. And of course I have no objection to an additional API for efficiency for those who want it, and maybe that even belongs in the stdlib. But IMO the TOOWTDI should use text (ie, Python 3 str = Unicode) by default. > Modelling URL's as string like things is great from a convenience > perspective, but, like file paths, they are much more complex > difficult. No. Like file paths, it is the key to any real solution to the problem. Users, both server admins, URN specifiers, and browsers, think about the URI as text and expect inputting text to work. As does the RFC. 
Machines, on the other hand, think of both as bytes (at least in the general Unix world). It is the programmer's job to do the best she can to identify the correct encoding to bridge the mismatch. She can abdicate that job, of course, but if she chooses *not* to abdicate, (1) treating the URI as text encourages her to confront the issue early, and (2) ensures that to the extent possible the URI will maintain its quality of intelligible text. With bytes, your only sane choice is to abdicate. N.B. STD 66 refrains from redefining HTTP URLs to be UTF-8 because *it would not work*. Practically, Nippon Tel & Tel will continue to use Shift JIS URIs for cellphone-oriented sites because its handset browsers only understand Shift JIS (or some such nonsense). > If Unicode was relevant to HTTP, Again, Unicode is relevant not because of the wire protocols, but because of Python's and because of the intent of the RFCs. > I'd agree, but its not; we should put fragile heuristics at the > outer layer of the API and work as robustly and mechanically as > possible at the core. Where we need to guess, we need worker > functions that won't guess at all - for the sanity of folk writing > servers and protocol implementations. A worker function that doesn't guess must error in the absence of out-of-band information about the encoding. This is true whether you represent URIs internally as bytes or as text. Refusing to error constitutes a guess, because in a bytes-internal system, eventually text from outside will find its way into the system, and must be encoded to bytes, and in the case of a text-internal system, obviously bytes from outside are coming in and must be decoded to text. From stephen at xemacs.org Tue Jun 22 07:17:10 2010 From: stephen at xemacs.org (Stephen J. 
Turnbull) Date: Tue, 22 Jun 2010 14:17:10 +0900 Subject: [Python-Dev] email package status in 3.X In-Reply-To: <20100621191432.710993A404D@sparrow.telecommunity.com> References: <87sk4jcejy.fsf@uwakimon.sk.tsukuba.ac.jp> <201006201204.30795.steve@pearwood.info> <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100621015824.6A84E3A4099@sparrow.telecommunity.com> <20100621114307.48735698@heresy> <871vc045sl.fsf@uwakimon.sk.tsukuba.ac.jp> <20100621191432.710993A404D@sparrow.telecommunity.com> Message-ID: <87mxun3auh.fsf@uwakimon.sk.tsukuba.ac.jp> P.J. Eby writes: > In Kagoshima, you'd use pass in an ebytes with your encoding to a > stdlib API, and *get back an ebytes with the right encoding*, > rather than an (incorrect and useless) unicode object which has > lost data you need. How does the stdlib do that? Unless it guesses which encoding for Japanese is being used? And even if this ebytes uses Shift JIS, what makes that the "right" encoding for anything? On the other hand, I know when *I* need some encoding, and when I figure it out I will store it in an appropriate place in my program. The problem is that for some programs it is not unlikely that I will see all of Shift JIS, EUC-JP, ISO-2022-JP, UTF-8, and UTF-16, and on a very bad day, RFC 2047, GB 2312, and Big5, too, used to encode Japanese. It's not totally unlikely for a browser to send URLs to a server expecting UTF-8 to recover a message/rfc822 object containing ISO-2022-JP in the mail header and EUC-JP in the body. So I need to know which encoding was used by the server that sent the reply, but the ebytes can't tell me that if it fishes an URL in EUC-JP out of the message body. I need to convert that URL to UTF-8, or most servers will 404. > But this is not the case at all, for use cases where "no, really, you > *have to* work with bytes-encoded text streams". 
The mere release of > Python 3.x will not cause all the world's applications, libraries, > and protocols to suddenly work with unicode, where they did not before. Sure. That's what .encode() and .decode() are for. The problem is what to do when you don't know what to put in the parentheses, and I can't think of a use case offhand where ebytes(stuff,'garbage') does better than PEP 383-enabled str for: > Being explicit about the encoding of the bytes you're flinging > around is actually an *increase* in specificity, explicitness, > robustness, and error-checking ability over the status quo for > either 2.x *or* 3.x... *and* it improves these qualities for > essentially *all* string-handling code, without requiring that code > to be rewritten to do so. A well-spoken piece. But, you see, most of those encodings are *only* interesting so that you can transcode characters to the encoding of interest. What's the e.o.i.? That is easily found in the context or has an obvious default, if you're lucky, or otherwise a hard problem that ebytes does nothing to help solve as far as I can see. Cf. Robert Collins' post , where he makes it quite explicit that a bytes interface is all about punting in the face of missing encoding information. > >and (2) you really want this under control of higher level objects > >that have access to some knowledge of the environment, rather than > >the lowest level. > > This proposal actually has such a higher-level object: an > ebytes. I don't see how that can be true. An ebytes is a very low-level object that has no idea whether its encoding is interesting (eg, the one that an RFC or a server specifies), or a technical detail of use only until the ebytes is decoded, then can be thrown away. I just don't see, in the case where there is a real encoding in the ebytes, what harm is done by decoding the ebytes to str. 
If context indicates that the encoding is an interesting one (eg, it should be the default for encoding on output), then you want to save that in an appropriate place that preserves not just the encoding itself, but the context that gives it its importance. From glyph at twistedmatrix.com Tue Jun 22 07:22:22 2010 From: glyph at twistedmatrix.com (Glyph Lefkowitz) Date: Tue, 22 Jun 2010 01:22:22 -0400 Subject: [Python-Dev] bytes / unicode In-Reply-To: <20100621181750.267933A404D@sparrow.telecommunity.com> References: <87sk4jcejy.fsf@uwakimon.sk.tsukuba.ac.jp> <201006201204.30795.steve@pearwood.info> <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100620234723.600ad4a8@pitrou.net> <20100621023005.EE17E3A4099@sparrow.telecommunity.com> <20100621164650.16A093A414B@sparrow.telecommunity.com> <20100621181750.267933A404D@sparrow.telecommunity.com> Message-ID: On Jun 21, 2010, at 2:17 PM, P.J. Eby wrote: > One issue I remember from my "enterprise" days is some of the Asian-language developers at NTT/Verio explaining to me that unicode doesn't actually solve certain issues -- that there are use cases where you really *do* need "bytes plus encoding" in order to properly express something. The thing that I have heard in passing from a couple of folks with experience in this area is that some older software in asia would present characters differently if they were originally encoded in a "japanese" encoding versus a "chinese" encoding, even though they were really "the same" characters. I do know that Han Unification is a giant political mess ( makes for some interesting reading), but my understanding is that it has handled enough of the cases by now that one can write software to display asian languages and it will basically work with a modern version of unicode. (And of course, there's always the private use area, as Stephen Turnbull pointed out.) Regardless, this is another example where keeping around a string isn't really enough. 
If you need to display a japanese character in a distinct way because you are operating in the japanese *script*, you need a tag surrounding your data that is a hint to its presentation. The fact that these presentation hints were sometimes determined by their encoding is an unfortunate historical accident. -------------- next part -------------- An HTML attachment was scrubbed... URL: From glyph at twistedmatrix.com Tue Jun 22 07:31:16 2010 From: glyph at twistedmatrix.com (Glyph Lefkowitz) Date: Tue, 22 Jun 2010 01:31:16 -0400 Subject: [Python-Dev] bytes / unicode In-Reply-To: <87r5jz3h8u.fsf@uwakimon.sk.tsukuba.ac.jp> References: <201006201204.30795.steve@pearwood.info> <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100620234723.600ad4a8@pitrou.net> <87wrtsd44p.fsf@uwakimon.sk.tsukuba.ac.jp> <87631c4bca.fsf@uwakimon.sk.tsukuba.ac.jp> <20100621165611.GW5787@unaka.lan> <87r5jz3h8u.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <39CFC9B3-E55A-41BB-9718-1457E20ACECC@twistedmatrix.com> On Jun 21, 2010, at 10:58 PM, Stephen J. Turnbull wrote: > The RFC says that URIs are text, and therefore they can (and IMO > should) be operated on as text in the stdlib. No, *blue* is the best color for a shed. Oops, wait, let me try that again. While I broadly agree with this statement, it is really an oversimplification. An URI is a structured object, with many different parts, which are transformed from bytes to ASCII (or something latin1-ish, which is really just bytes with a nice face on them) to real, honest-to-goodness text via the IRI specification: . > Note also that the "complete solution" argument cuts both ways. Eg, a > "complete" solution should implement UTS 39 "confusables detection"[1] > and IDNA[2]. Good luck doing that with bytes! And good luck doing that with just characters, too. You need a parsed representation of the URI that you can encode different parts of in different ways. 
(My understanding is that you should only really implement confusables detection in the netloc... while that may be a bogus example, you're certainly only supposed to do IDNA in the netloc!) You can just call urlsplit() all over the place to emulate this, but this does not give you the ability to go back to the original bytes, and thereby preserve things like brokenly-encoded segments, which seems to be what a lot of this hand-wringing is about. To put it another way, there is no possible information-preserving string or bytes type that will make everyone happy as a result from urljoin(). The only return-type that gives you *everything* is "URI". > just using 'latin-1' as the encoding allows you to > use the (unicode) string operations internally, and then spew your > mess out into the world for someone else to clean up, just as using > bytes would. This is the limitation that everyone seems to keep dancing around. If you are using the stdlib, with functions that operate on sequences like 'str' or 'bytes', you need to choose from one of three options: 1. "decode" everything to latin1 (although I prefer to call it "charmap" when used in this way) so that you can have some mojibake that will fool a function that needs a unicode object, but not lose any information about your input so that it can be transformed back into exact bytes (and be very careful to never pass it somewhere that it will interact with real text!), 2. actually decode things to an appropriate encoding to be displayed to the user and manipulated with proper text-manipulation tools, and throw away information about the bytes, 3. keep both the bytes and the characters together (perhaps in a data structure) so that you can both display the data and encode it in situationally-appropriate ways. The stdlib as it is today is not going to handle the 3rd case for anyone. I think that's fine; it is not the stdlib's job to solve everyone's problems. 
I've been happy with it providing correctly-functioning pieces that can be used to build more elaborate solutions. This is what I meant when I said I agree with Stephen's first point: the stdlib *should* just keep operating entirely on strings, because URIs are defined, by the spec, to be sequences of ASCII characters. But that's not the whole story. PJE's "bstr" and "ebytes" proposals set my teeth on edge. I can totally understand the motivation for them, but I think it would be a big step backwards for python 3 to succumb to that temptation, even in the form of a third-party library. It is really trying to cram more information into a pile of bytes than truly exists there. (Also, if we're going to have encodings attached to bytes objects, I would very much like to add "JPEG" and "FLAC" to the list of possibilities.) The real tension there is that WSGI is desperately trying to avoid defining any data structures (i.e. classes), while still trying to work with structured data. An URI class with a 'child' method could handily solve this problem. You could happily call IRI(...).join(some bytes).join(some text) and then just say "give me some bytes, it's time to put this on the network", or "give me some characters, I have to show something to the user", or even "give me some characters appropriate for an 'href=' target in some HTML I'm generating" - although that last one could be left to the HTML generator, provided it could get enough information from the URI/IRI object's various parts itself. I don't mean to pick on WSGI, either. This is a common pain-point for porting software to 3.x - you had a string, it kinda worked most of the time before, but now you need to keep track of text too and the functions which seemed to work on bytes no longer do. -------------- next part -------------- An HTML attachment was scrubbed... URL: From stephen at xemacs.org Tue Jun 22 07:28:57 2010 From: stephen at xemacs.org (Stephen J. 
Turnbull) Date: Tue, 22 Jun 2010 14:28:57 +0900 Subject: [Python-Dev] email package status in 3.X In-Reply-To: References: <87sk4jcejy.fsf@uwakimon.sk.tsukuba.ac.jp> <201006201204.30795.steve@pearwood.info> <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100621015824.6A84E3A4099@sparrow.telecommunity.com> <20100621145133.7F5333A404D@sparrow.telecommunity.com> Message-ID: <87lja73aau.fsf@uwakimon.sk.tsukuba.ac.jp> Michael Urman writes: > It is somewhat troublesome that there doesn't appear to be an obvious > built-in idempotent-when-possible function that gives back the > provided bytes/str, If you want something idempotent, it's already the case that bytes(b'abc') => b'abc'. What might be desirable is to make bytes('abc') work and return b'abc', but only if 'abc' is pure ASCII (or maybe ISO 8859/1). Unfortunately, str(b'abc') already does work, but

    steve at uwakimon ~ $ python3.1
    Python 3.1.2 (release31-maint, May 12 2010, 20:15:06)
    [GCC 4.3.4] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> str(b'abc')
    "b'abc'"
    >>>

Oops. You can see why that probably "should" be the case. From a.badger at gmail.com Tue Jun 22 07:50:40 2010 From: a.badger at gmail.com (Toshio Kuratomi) Date: Tue, 22 Jun 2010 01:50:40 -0400 Subject: [Python-Dev] bytes / unicode In-Reply-To: <87r5jz3h8u.fsf@uwakimon.sk.tsukuba.ac.jp> References: <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100620234723.600ad4a8@pitrou.net> <87wrtsd44p.fsf@uwakimon.sk.tsukuba.ac.jp> <87631c4bca.fsf@uwakimon.sk.tsukuba.ac.jp> <20100621165611.GW5787@unaka.lan> <87r5jz3h8u.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <20100622055040.GE5787@unaka.lan> On Tue, Jun 22, 2010 at 11:58:57AM +0900, Stephen J. Turnbull wrote: > Toshio Kuratomi writes: > > > One comment here -- you can also have uri's that aren't decodable into their > > true textual meaning using a single encoding.
> > > > Apache will happily serve out uris that have utf-8, shift-jis, and > > euc-jp components inside of their path but the textual > > representation that was intended will be garbled (or be represented > > by escaped byte sequences). For that matter, apache will serve > > requests that have no true textual representation as it is working > > on the byte level rather than the character level. > > Sure. I've never seen that combination, but I have seen Shift JIS and > KOI8-R in the same path. > > But in that case, just using 'latin-1' as the encoding allows you to > use the (unicode) string operations internally, and then spew your > mess out into the world for someone else to clean up, just as using > bytes would. > This is true. I'm giving this as a real-world counter example to the assertion that URIs are "text". In fact, I think you're confusing things a little by asserting that the RFC says that URIs are text. I'll address that in two sections down. > > So a complete solution really should allow the programmer to pass > > in uris as bytes when the programmer knows that they need it. > > Other than passing bytes into a constructor, I would argue if a > complete solution requires, eg, an interface that allows > urljoin(base,subdir) where the types of base and subdir are not > required to match, then it doesn't belong in the stdlib. For stdlib > usage, that's premature optimization IMO. > I'll definitely buy that. Would urljoin(b_base, b_subdir) => bytes and urljoin(u_base, u_subdir) => unicode be acceptable though? (I think, given other options, I'd rather see two separate functions, though. It seems more discoverable and less prone to taking bad input some of the time to have two functions that clearly only take one type of data apiece.) > The RFC says that URIs are text, and therefore they can (and IMO > should) be operated on as text in the stdlib. If I'm reading the RFC correctly, you're actually operating on two different levels here. 
Here's the section 2 that you quoted earlier, now in its entirety::

    2.  Characters

    The URI syntax provides a method of encoding data, presumably for the
    sake of identifying a resource, as a sequence of characters.  The URI
    characters are, in turn, frequently encoded as octets for transport or
    presentation.  This specification does not mandate any particular
    character encoding for mapping between URI characters and the octets
    used to store or transmit those characters.  When a URI appears in a
    protocol element, the character encoding is defined by that protocol;
    without such a definition, a URI is assumed to be in the same character
    encoding as the surrounding text.

    The ABNF notation defines its terminal values to be non-negative
    integers (codepoints) based on the US-ASCII coded character set
    [ASCII].  Because a URI is a sequence of characters, we must invert
    that relation in order to understand the URI syntax.  Therefore, the
    integer values used by the ABNF must be mapped back to their
    corresponding characters via US-ASCII in order to complete the syntax
    rules.

    A URI is composed from a limited set of characters consisting of
    digits, letters, and a few graphic symbols.  A reserved subset of those
    characters may be used to delimit syntax components within a URI while
    the remaining characters, including both the unreserved set and those
    reserved characters not acting as delimiters, define each component's
    identifying data.

So here's some data that matches those terms up to actual steps in the
process::

    # We start off with some arbitrary data that defines a resource.  This is
    # not necessarily text.  It's the data from the first sentence:
    data = b"\xff\xf0\xef\xe0"

    # We encode that into text and combine it with the scheme and host to form
    # a complete uri.  This is the "URI characters" mentioned in section #2.
    # It's also the "sequence of characters" mentioned in 1.1, as it is not
    # until this point that we actually have a URI.
    uri = b"http://host/" + percentencoded(data)
    #
    # Note1: percentencoded() needs to take any bytes or characters outside of
    # the characters listed in section 2.3 (ALPHA / DIGIT / "-" / "." / "_"
    # / "~") and percent encode them.  The URI can only consist of characters
    # from this set and the reserved character set (2.2).
    #
    # Note2: in this simplistic example, we're only dealing with one piece of
    # data.  With multiple pieces, we'd need to combine them with separators,
    # for instance like this:
    #   uri = b'http://host/' + percentencoded(data1) + b'/'
    #   + percentencoded(data2)
    #
    # Note3: at this point, the uri could be stored as unicode or bytes in
    # python3.  It doesn't matter.  It will be a subset of ASCII in either
    # case.

    # Then we take this and encode it for presentation inside of a data
    # file.  If we're saving in any encoding that has ASCII as a subset and we
    # had bytes returned from the previous step, all we need to do is save to
    # a file.  If we had unicode from the previous step, we need to transform
    # to the encoding we're using and output it.
    u_uri.encode('utf8')

With all this in mind... URIs are text according to the RFC if you want to
deal with URIs that are percent encoded.  In other words, things like this::

    http://host/%ff%f0%ef%e0

If you want to deal with things like this::

    http://host/café

Then you are going one step further; back to the original data that was
encoded in the RFC.  At that point you are no longer dealing with the
sequence of characters talked about in the RFC.  You are dealing with data
which may or may not be text.  As Robert Collins says, this is bytes by
definition which I pretty much agree with.  It's very very convenient to
work with this data as text most of the time but the RFC does not mandate
that it is text so operating on it as bytes is perfectly reasonable.
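A minimal runnable sketch of the steps above, assuming a current Python 3 standard library rather than the 3.1 of this thread. `urllib.parse.quote_from_bytes` stands in for the hypothetical `percentencoded()` helper; `build_uri` and `recover_data` are illustrative names that do not appear in the thread.

```python
from urllib.parse import quote_from_bytes, unquote_to_bytes

def build_uri(data, prefix="http://host/"):
    """Percent-encode arbitrary bytes and append them to an ASCII prefix."""
    # safe="" means every byte outside the unreserved set is escaped,
    # including "/", so the data cannot be mistaken for a path separator.
    return prefix + quote_from_bytes(data, safe="")

def recover_data(uri, prefix="http://host/"):
    """Undo build_uri(): return the original bytes, which need not be text."""
    return unquote_to_bytes(uri[len(prefix):])

data = b"\xff\xf0\xef\xe0"
uri = build_uri(data)

# The URI itself is plain ASCII, so storing it as str or bytes is equivalent.
uri.encode("ascii")

# The data behind it round-trips as bytes, with no text encoding implied.
assert recover_data(uri) == data
```

The URI layer is always representable as ASCII text; only the decoded payload raises the bytes-versus-text question.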
> It's not just a matter
> of manipulating the URIs themselves, where working directly on bytes
> will work just as well and with the same string operations (as
> long as everything is bytes).  It's also a question of API complexity
> (eg, Barry's bugaboo of proliferation of encoding= parameters) and of
> debugging (if URIs are internally str, then they will display sanely
> in tracebacks and the interpreter).

The proliferation of encoding I agree is a thing that is ugly.  Although,
if I'm thinking correctly, that only matters when you want to allow mixing
bytes and unicode, correct?  One of these cases:

* I take in some mix of parameters with at least one unicode and output
  bytes
* I take in some mix of parameters with at least one bytes and output
  unicode
* I take in either bytes or unicode and transform them internally to the
  other type before operating on them.  Then I transform the output to the
  input type before returning.

For debugging, I'm either not understanding or you're wrong.  If I'm given
an arbitrary sequence of bytes how do I sanely store them as str internally?
If I transform them using an encoding that anticipates the full range of
bytes I may be able to display some representation of them but it's not
necessarily the sanest method of display (for instance, if I know that path
element 1 is always going to be a utf8 encoded string and path element 2 is
always shift-jis encoded, and path element 3 is binary data, I could
construct a much saner display method than treating the whole thing as
latin1).

> The cases where URIs can't be sanely treated as text are garbage
> input, and the stdlib should not try to provide a solution.  Just
> passing in bytes and getting out bytes is GIGO.  Trying to do "some"
> error-checking is going to be insufficient much of the time and overly
> strict most of the rest of the time.
The programmer in the trenches
> is going to need to decide what to allow and what not; I don't think
> there are general answers because we know that allowing random URLs on
> the web leads to various kinds of problems.  Some sites will need to
> address some of them.
>
What is your basis for asserting that URIs that aren't sanely treated as
text are garbage?  It's definitely not in the RFC.

> Note also that the "complete solution" argument cuts both ways.  Eg, a
> "complete" solution should implement UTS 39 "confusables detection"[1]
> and IDNA[2].  Good luck doing that with bytes!
>
Note that IDNA and confusables detection operate on a different portion of
the uri than the need for bytes.  Those operate on the domain name (looks
like it's called the authority in the rfc) whereas bytes are useful for the
path, query, and fragment portions.

Note: I'm not sure precisely what Philip is looking to do but the little
I've read sounds like it's contrary to the design principles of the python3
unicode handling redesign.  I'm stating my reading of the RFC not to defend
the use case Philip has, but because I think that the outlook that non-text
uris (before being percentencoded) are violations of the RFC is wrong and
will lead to interoperability problems/warts (since you could turn them into
latin1 and from there into bytes and from there into the proper values) if
allowed to predominate the thinking.

-Toshio

From raymond.hettinger at gmail.com Tue Jun 22 08:21:51 2010
From: raymond.hettinger at gmail.com (Raymond Hettinger)
Date: Mon, 21 Jun 2010 23:21:51 -0700
Subject: [Python-Dev] UserDict in 2.7
Message-ID: <58CEF265-1B25-4FD6-9C45-88353A0AF0E7@gmail.com>

There's an entry in whatsnew for 2.7 to the effect of "The UserDict class is now a new-style class".
I had thought there was a conscious decision to not change any existing
classes from old-style to new-style.  IIRC, Martin had championed this idea
and had rejected all proposals to make existing classes inherit from object.

Raymond

From ronaldoussoren at mac.com Tue Jun 22 08:39:19 2010
From: ronaldoussoren at mac.com (Ronald Oussoren)
Date: Tue, 22 Jun 2010 08:39:19 +0200
Subject: [Python-Dev] red buildbots on 2.7
In-Reply-To: <1277151926.3369.6.camel@localhost.localdomain>
References: <73196.1277143019@parc.com> <75635.1277147585@parc.com> <20100621212904.7bec83f6@pitrou.net> <77297.1277150242@parc.com> <1277150570.3369.1.camel@localhost.localdomain> <4C1FC7E6.5070707@voidspace.org.uk> <1277151926.3369.6.camel@localhost.localdomain>
Message-ID: <30F79991-F933-44C6-A884-5A8D5671DB8C@mac.com>

On 21 Jun, 2010, at 22:25, Antoine Pitrou wrote:

> On Monday 21 June 2010 at 21:13 +0100, Michael Foord wrote:
>>
>> If OS X is a supported and important platform for Python then fixing all
>> problems that it reveals (or being willing to) should definitely not be
>> a pre-requisite of providing a buildbot (which is already a service to
>> the Python developer community). Fixing bugs / failures revealed by
>> Bill's buildbot is not fixing them "for Bill" it is fixing them for Python.
>
> I didn't say it was a prerequisite. I was merely pointing out that when
> platform-specific bugs appear, people using the specific platform should
> be helping if they want to actually encourage the fixing of these bugs.
>
> OS X is only "a supported and important platform" if we have dedicated
> core developers diagnosing or even fixing issues for it (like we
> obviously have for Windows and Linux). Otherwise, I don't think we have
> any moral obligation to support it.

I look into and fix OSX issues, but do so in my spare time.  This means it
can take a while until I get around to doing so.

Ronald

P.S.
Please file bugs for issues on OSX and set the component to Macintosh
instead of discussing them on python-dev.  I don't read python-dev on a
daily basis and almost missed this thread.

From raymond.hettinger at gmail.com Tue Jun 22 08:47:46 2010
From: raymond.hettinger at gmail.com (Raymond Hettinger)
Date: Mon, 21 Jun 2010 23:47:46 -0700
Subject: [Python-Dev] bytes / unicode
In-Reply-To: <39CFC9B3-E55A-41BB-9718-1457E20ACECC@twistedmatrix.com>
References: <201006201204.30795.steve@pearwood.info> <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100620234723.600ad4a8@pitrou.net> <87wrtsd44p.fsf@uwakimon.sk.tsukuba.ac.jp> <87631c4bca.fsf@uwakimon.sk.tsukuba.ac.jp> <20100621165611.GW5787@unaka.lan> <87r5jz3h8u.fsf@uwakimon.sk.tsukuba.ac.jp> <39CFC9B3-E55A-41BB-9718-1457E20ACECC@twistedmatrix.com>
Message-ID: <89AE7ED6-FB94-45DA-9432-7FCBA25A56BF@gmail.com>

On Jun 21, 2010, at 10:31 PM, Glyph Lefkowitz wrote:

> This is a common pain-point for porting software to 3.x - you had a string, it kinda worked most of the time before, but now you need to keep track of text too and the functions which seemed to work on bytes no longer do.

Thanks Glyph.  That is a nice summary of one kind of challenge facing
programmers.

Raymond

From stephen at xemacs.org Tue Jun 22 08:49:01 2010
From: stephen at xemacs.org (Stephen J.
Turnbull) Date: Tue, 22 Jun 2010 15:49:01 +0900 Subject: [Python-Dev] email package status in 3.X In-Reply-To: <20100621184700.BAD7F3A404D@sparrow.telecommunity.com> References: <87sk4jcejy.fsf@uwakimon.sk.tsukuba.ac.jp> <201006201204.30795.steve@pearwood.info> <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100621015824.6A84E3A4099@sparrow.telecommunity.com> <20100621145133.7F5333A404D@sparrow.telecommunity.com> <8739wg469t.fsf@uwakimon.sk.tsukuba.ac.jp> <20100621184700.BAD7F3A404D@sparrow.telecommunity.com> Message-ID: <87k4pr36le.fsf@uwakimon.sk.tsukuba.ac.jp> P.J. Eby writes: > I know, it's a hard thing to wrap one's head around, since on the > surface it sounds like unicode is the programmer's savior. I don't need to wrap my head around it. It's been deeply embedded, point first, and the nasty barbs ensure that I have no desire to pull it back out. To wit, I've been dealing with Japanese encoding issues on a daily basis for 20 years, and I'm well aware that programmers have several good reasons (and a lot more bad ones) for avoiding them, and even for avoiding Unicode when they must deal with encodings at all. I don't think any of the good reasons have been offered here yet, that's all. > Unfortunately, real-world text data exists which cannot be safely > roundtripped to unicode, and must be handled in "bytes with > encoding" form for certain operations. Or "Unicode with encoding" form. See below for why this makes sense in the context of Python. > I personally do not have to deal with this *particular* use case any > more -- I haven't been at NTT/Verio for six years now. As mentioned, I have a bit of understanding of the specific problems of Japanese-language computing. In particular, roundtripping Japanese from *any* encoding to *any other* encoding is problematic, because the national standards provide a proper subset of the repertoire actually used by the Japanese people. (Even JIS X 0213.) > My current needs are simpler, thank goodness. 
;-) However, they > *do* involve situations where I'm dealing with *other* > encoding-restricted legacy systems, such as software for interfacing > with the US Postal Service that only works with a restricted subset > of latin1, while receiving mangled ASCII from an ecommerce provider, > and storing things in what's effectively a latin-1 database. Yes, I know of similar issues in other applications. For example, TeX error messages do not respect UTF-8 character boundaries, so Emacs has to handle them specially (basically a mechanism similar in spirit to PEP 383 is used). > Being able to easily assert what kind of bytes I've got would > actually let me catch errors sooner, *if* those assertions were > being checked when different kinds of strings or bytes were being > combined. i.e., at coercion time). I see that this would make life a little easier for you in maintaining without refactoring. I'd say it's a kludge, but without a full list of requirements I'm in no position to claim any authority . Eg, for a non-kludgey suggestion, how about defining a codec which takes Latin-1 bytes, checks (with error on failure) for the restricted subset, and converts to str? Then you can manipulate these things as str with abandon internally. Finally you get another check in the outgoing codec which converts from str to "effective Latin-1 bytes", however that is defined. But OK, maybe I'm just being naive. You need this unlovely artifice so you can put in asserts in appropriate places. Now, does it belong in the stdlib? It seems to me that in the case of Japanese roundtripping, *most* of the time encoding back to a standard Japanese encoding will work. If you run into one of the problematic characters that JIS doesn't allow but Japanese like to use because they prefer the glyph to the JIS-standard glyph, you get an occasional error on encoding to a standard Japanese encoding, which you handle specially with a database of such characters. 
Knowing the specific encoding originally used *normally does not help unless you're replying to that person and **only** that person*, because the extended repertoires vary widely and the only standard is Japanese. I conclude ebytes does *no* good here. For the ecommerce/USPS case, well, actually you need special-purpose encodings anyway (ISTM). 'latin-1' loses, the USPS is allergic to some valid 'latin-1' characters. 'ascii' loses, apparently you need some of the Latin-1 repertoire, and anyway AIUI the ecommerce provider munges the ASCII. So what does ebytes actually buy you here, unless you write the codecs? If you've got the codecs, what additional benefit do you get from ebytes? Note that you would *also* need to do explicit transcoding anyway if you were dealing with Japan Post instead of the USPS, although I grant your code is probably general enough to deal with Deutsche Telecom (but the German equivalent of your ecommerce provider probably has its own ways of munging Latin-1). I conclude that there may be genuine benefits to ebytes here, but they're probably not general enough to put in the stdlib (or the Python language). > Which works if and only if your outputs are truly unicode-able. With PEP 383, they always are, as long as you allow Unicode to be decoded to the same garbage your bytes-based program would have produced anyway. > If you work with legacy systems (e.g. those Asian email clients and > US postal software), you are really working with a *character set*, > not unicode, I think you're missing something. Namely, Unicode is a standard for handling character objects as integers, and a registry for mapping characters to integers. It includes over 100,000 points for making up your own mappings, and recent Python also provides (as an internal extension) for embedding non-characters in a str. Unicode does not define a repertoire, however. 
That's up to the application, and Python 2+ provides a convenient way to restrict repertoires by defining special purpose codecs in Python. It is then up to the program to ensure that all candidates claiming to be text pass through the cleansing fire of a codec before being allowed into the Pure Land of str. This can be something of a problem; there are a few ways for textual data to get into Python, and not all of them were obvious to me. But this problem would be even worse for mechanisms like ebytes, where it's up to the programmer to decide which things are put into ebytes. > and so putting your data in unicode form is actually *wrong* > -- an expedient lie. > > Heresy, I know, but there you go. ;-) It's not heresy, it's simply assuming a restriction on use of Unicode that just isn't true. It *is* true that mapping the data to Unicode according to some encoding is not always sufficient. It *is* often the case that further information must be provided to ensure semantic correctness. However, given the mapping (== properly defined codecs), roundtripping *is* always possible, at least up to the size of private space, which is big enough to hold the Post Office's repertoire, for sure. And that mapping is a Python object which will fit into a variable for later use. From stephen at xemacs.org Tue Jun 22 09:33:53 2010 From: stephen at xemacs.org (Stephen J. 
Turnbull) Date: Tue, 22 Jun 2010 16:33:53 +0900 Subject: [Python-Dev] bytes / unicode In-Reply-To: <39CFC9B3-E55A-41BB-9718-1457E20ACECC@twistedmatrix.com> References: <201006201204.30795.steve@pearwood.info> <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100620234723.600ad4a8@pitrou.net> <87wrtsd44p.fsf@uwakimon.sk.tsukuba.ac.jp> <87631c4bca.fsf@uwakimon.sk.tsukuba.ac.jp> <20100621165611.GW5787@unaka.lan> <87r5jz3h8u.fsf@uwakimon.sk.tsukuba.ac.jp> <39CFC9B3-E55A-41BB-9718-1457E20ACECC@twistedmatrix.com> Message-ID: <87hbkv34im.fsf@uwakimon.sk.tsukuba.ac.jp> Glyph Lefkowitz writes: > On Jun 21, 2010, at 10:58 PM, Stephen J. Turnbull wrote: > > Note also that the "complete solution" argument cuts both ways. Eg, a > > "complete" solution should implement UTS 39 "confusables detection"[1] > > and IDNA[2]. Good luck doing that with bytes! > > And good luck doing that with just characters, too. I agree with you, sorry. I meant to cast doubt on the idea of complete solutions, or at least claims that completeness is an excuse for putting it in the stdlib. > This is the limitation that everyone seems to keep dancing around. > If you are using the stdlib, with functions that operate on > sequences like 'str' or 'bytes', you need to choose from one of > three options: There's a *fourth* way: specially designed codecs to preserve as much metainformation as you need, while always using the str format internally. This can be done for at least 100,000 separate (character, encoding) pairs by multiplexing into private space with an auxiliary table of encodings and equivalences. That's probably overkill. In many cases, adding simple PEP 383 mechanism (to preserve uninterpreted bytes) might be enough though, and that's pretty plausible IMO. 
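As a rough illustration of the "simple PEP 383 mechanism" mentioned above, the sketch below decodes bytes that are not valid UTF-8 with the `surrogateescape` error handler, manipulates the result as str, and encodes it back without loss. The helper names `decode_lossless` and `encode_lossless` are made up for this example; only the error handler itself comes from PEP 383.

```python
def decode_lossless(raw, encoding="utf-8"):
    """Decode bytes to str, smuggling undecodable bytes as lone surrogates."""
    return raw.decode(encoding, errors="surrogateescape")

def encode_lossless(text, encoding="utf-8"):
    """Inverse of decode_lossless(): restore the smuggled bytes exactly."""
    return text.encode(encoding, errors="surrogateescape")

# A path whose components are Latin-1 and Shift JIS, so not valid UTF-8.
raw = b"caf\xe9/\x93\xfa\x96\x7b"
text = decode_lossless(raw)

# Ordinary str operations work on the decoded form...
head, tail = text.split("/")
swapped = tail + "/" + head

# ...and the uninterpreted bytes come back out unchanged.
assert encode_lossless(text) == raw
assert encode_lossless(swapped) == b"\x93\xfa\x96\x7b/caf\xe9"
```

The decoded text still displays as garbage for the escaped bytes; the point is only that string operations succeed and nothing is lost on the way back out.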
From lesni.bleble at gmail.com Tue Jun 22 11:08:56 2010
From: lesni.bleble at gmail.com (lesni bleble)
Date: Tue, 22 Jun 2010 11:08:56 +0200
Subject: [Python-Dev] adding new function
Message-ID:

hello,

how can i simply add new functions to module after its initialization
(Py_InitModule())?  I'm missing something like
PyModule_AddCFunction().

thank you

L.

From fetchinson at googlemail.com Tue Jun 22 11:44:38 2010
From: fetchinson at googlemail.com (Daniel Fetchinson)
Date: Tue, 22 Jun 2010 11:44:38 +0200
Subject: [Python-Dev] adding new function
In-Reply-To:
References:
Message-ID:

> how can i simply add new functions to module after its initialization
> (Py_InitModule())?  I'm missing something like
> PyModule_AddCFunction().

This type of question really belongs to python-list aka
comp.lang.python which I CC-d now. Please keep the discussion on that
list.

Cheers,
Daniel

--
Psss, psss, put it down! - http://www.cafepress.com/putitdown

From ncoghlan at gmail.com Tue Jun 22 12:41:39 2010
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Tue, 22 Jun 2010 20:41:39 +1000
Subject: [Python-Dev] email package status in 3.X
In-Reply-To: <87k4pr36le.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <87sk4jcejy.fsf@uwakimon.sk.tsukuba.ac.jp> <201006201204.30795.steve@pearwood.info> <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100621015824.6A84E3A4099@sparrow.telecommunity.com> <20100621145133.7F5333A404D@sparrow.telecommunity.com> <8739wg469t.fsf@uwakimon.sk.tsukuba.ac.jp> <20100621184700.BAD7F3A404D@sparrow.telecommunity.com> <87k4pr36le.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID:

On Tue, Jun 22, 2010 at 4:49 PM, Stephen J. Turnbull wrote:
>  > Which works if and only if your outputs are truly unicode-able.
>
> With PEP 383, they always are, as long as you allow Unicode to be
> decoded to the same garbage your bytes-based program would have
> produced anyway.
Could it be that part of the problem here is that we need to better advertise "errors='surrogateescape'" as a mechanism for decoding incorrectly encoded data according to a nominal codec without throwing UnicodeDecode and UnicodeEncode errors all over the place? Currently it only garners a mention in the docs in the context of the os module, the list of error handlers in the codecs module and as a default error handler argument in the tarfile module. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From steve at pearwood.info Tue Jun 22 12:52:39 2010 From: steve at pearwood.info (Steven D'Aprano) Date: Tue, 22 Jun 2010 20:52:39 +1000 Subject: [Python-Dev] [OT] glyphs [was Re: email package status in 3.X] In-Reply-To: References: <20100621184700.BAD7F3A404D@sparrow.telecommunity.com> Message-ID: <201006222052.39734.steve@pearwood.info> On Tue, 22 Jun 2010 11:46:27 am Terry Reedy wrote: > 3. Unicode disclaims direct representation of glyphic variants > (though again, exceptions were made for asian acceptance). For > example, in English, mechanically printed 'a' and 'g' are different > from manually printed 'a' and 'g'. Representing both by the same > codepoint, in itself, loses information. One who wishes to preserve > the distinction must instead use a font tag or perhaps a > tag. Similarly, older English had a significantly > different glyph for 's', which looks more like a modern 'f'. An unfortunate example, as the old English long-s gets its own Unicode codepoint. http://en.wikipedia.org/wiki/Long_s -- Steven D'Aprano From stephen at xemacs.org Tue Jun 22 13:31:13 2010 From: stephen at xemacs.org (Stephen J. 
Turnbull)
Date: Tue, 22 Jun 2010 20:31:13 +0900
Subject: [Python-Dev] bytes / unicode
In-Reply-To: <20100622055040.GE5787@unaka.lan>
References: <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100620234723.600ad4a8@pitrou.net> <87wrtsd44p.fsf@uwakimon.sk.tsukuba.ac.jp> <87631c4bca.fsf@uwakimon.sk.tsukuba.ac.jp> <20100621165611.GW5787@unaka.lan> <87r5jz3h8u.fsf@uwakimon.sk.tsukuba.ac.jp> <20100622055040.GE5787@unaka.lan>
Message-ID: <87d3vj2tj2.fsf@uwakimon.sk.tsukuba.ac.jp>

Toshio Kuratomi writes:

 > I'll definitely buy that.  Would urljoin(b_base, b_subdir) => bytes and
 > urljoin(u_base, u_subdir) => unicode be acceptable though?

Probably.  But it doesn't matter what I say, since Guido has defined
that as "polymorphism" and approved it in principle.

 > (I think, given other options, I'd rather see two separate
 > functions, though.

Yes.

 > If you want to deal with things like this::
 > http://host/café

Yes.

 > At that point you are no longer dealing with the sequence of
 > characters talked about in the RFC.  You are dealing with data
 > which may or may not be text.

That's right, and I think that in most cases that is what programmers
want to be dealing with.  Let the library make sure that what goes on
the wire conforms to the RFC.  I don't want to know about it, I want
to work with the content of the URI.

 > The proliferation of encoding I agree is a thing that is ugly.
 > Although, if I'm thinking correctly, that only matters when you
 > want to allow mixing bytes and unicode, correct?

Well you need to know a fair amount about the encoding: that the
reserved bytes are used as defined in the RFC, for example.

 > For debugging, I'm either not understanding or you're wrong.  If I'm given
 > an arbitrary sequence of bytes how do I sanely store them as str internally?

If it's really arbitrary, you use either a mapping to private space or
PEP 383, and accept that it won't make sense.  But in most cases you
should be able to achieve a fair degree of sanity.
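To make the "fair degree of sanity" concrete for the mixed-encoding path discussed in this thread (one UTF-8 component, one Shift JIS component, one binary component), here is a small sketch of a per-component display routine. The function name `display_path` and the convention of passing `None` for binary components are assumptions made for this example, not an existing API.

```python
from urllib.parse import unquote_to_bytes

def display_path(path, encodings):
    """Return a readable form of each path component.

    `encodings` gives the known encoding of each component, or None when
    the component is binary data that should be shown as hex.
    """
    shown = []
    for part, encoding in zip(path.strip("/").split("/"), encodings):
        raw = unquote_to_bytes(part)
        if encoding is None:
            shown.append(raw.hex())
        else:
            # backslashreplace keeps the display going if a byte is invalid.
            shown.append(raw.decode(encoding, errors="backslashreplace"))
    return shown

parts = display_path("/%E6%97%A5/%93%FA/%FF%00", ["utf-8", "shift_jis", None])
assert parts == ["日", "日", "ff00"]
```

This only works when the per-component encodings are known in advance, which is exactly the knowledge the thread argues the application, not the stdlib, has to supply.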
 > If I transform them using an encoding that anticipates the full range of
 > bytes I may be able to display some representation of them but it's not
 > necessarily the sanest method of display (for instance, if I know that path
 > element 1 is always going to be a utf8 encoded string and path element 2 is
 > always shift-jis encoded, and path element 3 is binary data, I could
 > construct a much saner display method than treating the whole thing as
 > latin1).

And I think in most cases you will know, although the cases where
you'll know will be because of a system-wide encoding.

 > What is your basis for asserting that URIs that aren't sanely treated as
 > text are garbage?

I don't mean we can throw them away, I mean we can't do any sensible
processing on them.  You at least need to know about the reserved
delimiters.  In the same way that Philip used 'garbage' for the
"unknown" encoding.  And in the sense of "garbage in, garbage out".

 > unicode handling redesign.  I'm stating my reading of the RFC not to defend
 > the use case Philip has, but because I think that the outlook that non-text
 > uris (before being percentencoded) are violations of the RFC

That's not what I'm saying.  What I'm trying to point out is that
manipulating a bytes object as an URI sort of presumes a lot about its
encoding as text.  Since many of the URIs we deal with are more or
less textual, why not take advantage of that?

From stephen at xemacs.org Tue Jun 22 13:55:41 2010
From: stephen at xemacs.org (Stephen J.
Turnbull)
Date: Tue, 22 Jun 2010 20:55:41 +0900
Subject: [Python-Dev] email package status in 3.X
In-Reply-To:
References: <87sk4jcejy.fsf@uwakimon.sk.tsukuba.ac.jp> <201006201204.30795.steve@pearwood.info> <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100621015824.6A84E3A4099@sparrow.telecommunity.com> <20100621145133.7F5333A404D@sparrow.telecommunity.com> <8739wg469t.fsf@uwakimon.sk.tsukuba.ac.jp> <20100621184700.BAD7F3A404D@sparrow.telecommunity.com> <87k4pr36le.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: <87aaqn2sea.fsf@uwakimon.sk.tsukuba.ac.jp>

Nick Coghlan writes:

 > On Tue, Jun 22, 2010 at 4:49 PM, Stephen J. Turnbull wrote:
 > >  > Which works if and only if your outputs are truly unicode-able.
 > >
 > > With PEP 383, they always are, as long as you allow Unicode to be
 > > decoded to the same garbage your bytes-based program would have
 > > produced anyway.
 >
 > Could it be that part of the problem here is that we need to better
 > advertise "errors='surrogateescape'" as a mechanism for decoding
 > incorrectly encoded data according to a nominal codec without throwing
 > UnicodeDecode and UnicodeEncode errors all over the place?

Yes, I think that would make the "use str internally to urllib"
strategy a lot more palatable.  But it still needs to be combined with
a program architecture of decode-process-encode, which might require
substantial refactoring for some existing modules.

From fdrake at acm.org Tue Jun 22 14:40:29 2010
From: fdrake at acm.org (Fred Drake)
Date: Tue, 22 Jun 2010 08:40:29 -0400
Subject: [Python-Dev] UserDict in 2.7
In-Reply-To: <58CEF265-1B25-4FD6-9C45-88353A0AF0E7@gmail.com>
References: <58CEF265-1B25-4FD6-9C45-88353A0AF0E7@gmail.com>
Message-ID:

On Tue, Jun 22, 2010 at 2:21 AM, Raymond Hettinger wrote:
> I had thought there was a conscious decision to not change any existing
> classes from old-style to new-style.

I thought so as well.
Changing any class from old-style to new-style
risks breaking applications in obscure & mysterious ways.  (Yes, we've
been bitten by this before; it's a real problem.)

  -Fred

--
Fred L. Drake, Jr.
"A storm broke loose in my mind."  --Albert Einstein

From benjamin at python.org Tue Jun 22 14:48:25 2010
From: benjamin at python.org (Benjamin Peterson)
Date: Tue, 22 Jun 2010 07:48:25 -0500
Subject: [Python-Dev] UserDict in 2.7
In-Reply-To: <58CEF265-1B25-4FD6-9C45-88353A0AF0E7@gmail.com>
References: <58CEF265-1B25-4FD6-9C45-88353A0AF0E7@gmail.com>
Message-ID:

2010/6/22 Raymond Hettinger:
> There's an entry in whatsnew for 2.7 to the effect of "The UserDict class is
> now a new-style class".
> I had thought there was a conscious decision to not change any existing
> classes from old-style to new-style.  IIRC, Martin had championed this idea
> and had rejected all of proposals to make existing classes inherit from
> object.

IIRC this was because UserDict tries to be a MutableMapping but abcs
require new style classes.

--
Regards,
Benjamin

From lvh at laurensvh.be Tue Jun 22 15:23:36 2010
From: lvh at laurensvh.be (Laurens Van Houtven)
Date: Tue, 22 Jun 2010 15:23:36 +0200
Subject: [Python-Dev] UserDict in 2.7
In-Reply-To:
References: <58CEF265-1B25-4FD6-9C45-88353A0AF0E7@gmail.com>
Message-ID:

On Tue, Jun 22, 2010 at 2:40 PM, Fred Drake wrote:
> On Tue, Jun 22, 2010 at 2:21 AM, Raymond Hettinger
> wrote:
>> I had thought there was a conscious decision to not change any existing
>> classes from old-style to new-style.
>
> I thought so as well.  Changing any class from old-style to new-style
> risks breaking applications in obscure & mysterious ways.  (Yes, we've
> been bitten by this before; it's a real problem.)
>
>
>   -Fred

+1. I've been bitten by this more than once in some of the more obscure
old(-style) classes in twisted.python.
Laurens

From murman at gmail.com Tue Jun 22 15:24:28 2010
From: murman at gmail.com (Michael Urman)
Date: Tue, 22 Jun 2010 08:24:28 -0500
Subject: [Python-Dev] email package status in 3.X
In-Reply-To: <87lja73aau.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <87sk4jcejy.fsf@uwakimon.sk.tsukuba.ac.jp> <201006201204.30795.steve@pearwood.info> <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100621015824.6A84E3A4099@sparrow.telecommunity.com> <20100621145133.7F5333A404D@sparrow.telecommunity.com> <87lja73aau.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID:

On Tue, Jun 22, 2010 at 00:28, Stephen J. Turnbull wrote:
> Michael Urman writes:
>
>  > It is somewhat troublesome that there doesn't appear to be an obvious
>  > built-in idempotent-when-possible function that gives back the
>  > provided bytes/str,
>
> If you want something idempotent, it's already the case that
> bytes(b'abc') => b'abc'.  What might be desirable is to make
> bytes('abc') work and return b'abc', but only if 'abc' is pure ASCII
> (or maybe ISO 8859/1).

By idempotent-when-possible, I mean to_bytes(str_or_bytes, encoding,
errors) that would pass an instance of bytes through, or encode an
instance of str.  And of course a to_str that performs similarly,
passing str through and decoding bytes.  While bytes(b'abc') will give
me b'abc', neither bytes('abc') nor bytes(b'abc', 'latin-1') get me
the b'abc' I want to see.

These are trivial functions; I just don't fully understand why the
capability isn't baked in.  A one argument call is idempotent capable;
a two argument call isn't as it only converts.

It's not a completely made-up requirement either.  A cross-platform
piece of software may need to present to a user items that are
sometimes str and sometimes bytes - particularly filenames.
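The two "trivial functions" described above might look like the following sketch. The names `to_bytes` and `to_str` and the argument order come from the message; the default encoding and the `TypeError` for other types are assumptions added here.

```python
def to_bytes(value, encoding="utf-8", errors="strict"):
    """Pass bytes through unchanged; encode str with the given codec."""
    if isinstance(value, bytes):
        return value
    if isinstance(value, str):
        return value.encode(encoding, errors)
    raise TypeError("expected str or bytes, got %s" % type(value).__name__)

def to_str(value, encoding="utf-8", errors="strict"):
    """Pass str through unchanged; decode bytes with the given codec."""
    if isinstance(value, str):
        return value
    if isinstance(value, bytes):
        return value.decode(encoding, errors)
    raise TypeError("expected str or bytes, got %s" % type(value).__name__)

# Both spellings that fail with the bytes() constructor work here.
assert to_bytes("abc", "latin-1") == b"abc"
assert to_bytes(b"abc", "latin-1") == b"abc"

# Applying the function twice changes nothing, which is the idempotence asked for.
assert to_str(to_str(b"abc", "ascii"), "ascii") == "abc"
```

For filenames specifically, combining `to_str` with `errors="surrogateescape"` keeps undecodable names round-trippable.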
> Unfortunately, str(b'abc') already does work, but > > steve at uwakimon ~ $ python3.1 > Python 3.1.2 (release31-maint, May 12 2010, 20:15:06) > [GCC 4.3.4] on linux2 > Type "help", "copyright", "credits" or "license" for more information. >>>> str(b'abc') > "b'abc'" >>>> > > Oops. You can see why that probably "should" be the case Sure, and I love having this there for debugging. But this is hardly good enough for presenting to a user once you leave ASCII.

>>> u = '日本語'
>>> sjis = bytes(u, 'shift-jis')
>>> utf8 = bytes(u, 'utf-8')
>>> str(sjis), str(utf8)
("b'\\x93\\xfa\\x96{\\x8c\\xea'", "b'\\xe6\\x97\\xa5\\xe6\\x9c\\xac\\xe8\\xaa\\x9e'")

When I happen to know the encoding, I can reverse it much more cleanly.

>>> str(sjis, 'shift-jis'), str(utf8, 'utf-8')
('日本語', '日本語')

But I can't mix this approach with str instances without writing a different invocation.

>>> str(u, 'argh')
TypeError: decoding str is not supported

-- Michael Urman From guido at python.org Tue Jun 22 18:17:31 2010 From: guido at python.org (Guido van Rossum) Date: Tue, 22 Jun 2010 09:17:31 -0700 Subject: [Python-Dev] bytes / unicode In-Reply-To: <20100622055040.GE5787@unaka.lan> References: <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100620234723.600ad4a8@pitrou.net> <87wrtsd44p.fsf@uwakimon.sk.tsukuba.ac.jp> <87631c4bca.fsf@uwakimon.sk.tsukuba.ac.jp> <20100621165611.GW5787@unaka.lan> <87r5jz3h8u.fsf@uwakimon.sk.tsukuba.ac.jp> <20100622055040.GE5787@unaka.lan> Message-ID: [Just addressing one little issue here; generally I'm just happy that we're discussing this issue in such detail from so many points of view.] On Mon, Jun 21, 2010 at 10:50 PM, Toshio Kuratomi wrote: >[...] Would urljoin(b_base, b_subdir) => bytes and > urljoin(u_base, u_subdir) => unicode be acceptable though? (I think, given > other options, I'd rather see two separate functions, though.
It seems more > discoverable and less prone to taking bad input some of the time to have two > functions that clearly only take one type of data apiece.) Hm. I'd rather see a single function (it would be "polymorphic" in my earlier terminology). After all a large number of string method calls (and some other utility function calls) already look the same regardless of whether they are handling bytes or text (as long as it's uniform). If the building blocks are all polymorphic it's easier to create additional polymorphic functions. FWIW, there are two problems with polymorphic functions, though they can be overcome: (1) Literals. If you write something like x.split('&') you are implicitly assuming x is text. I don't see a very clean way to overcome this; you'll have to implement some kind of type check e.g.

    x.split('&') if isinstance(x, str) else x.split(b'&')

A handy helper function can be written:

    def literal_as(constant, variable):
        if isinstance(variable, str):
            return constant
        else:
            return constant.encode('utf-8')

So now you can write x.split(literal_as('&', x)). (2) Data sources. These can be functions that produce new data from non-string data, e.g. str( ), read it from a named file, etc. An example is read() vs. write(): it's easy to create a (hypothetical) polymorphic stream object that accepts both f.write('booh') and f.write(b'booh'); but you need some other hack to make read() return something that matches a desired return type. I don't have a generic suggestion for a solution; for streams in particular, the existing distinction between binary and text streams works, of course, but there are other situations where this doesn't generalize (I think some XML interfaces have this awkwardness in their API for converting a tree to a string).
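A minimal, runnable sketch of the literal_as helper described under (1), exercised on both types. The helper follows the message; split_query is a hypothetical caller invented here only to show the call site.

```python
def literal_as(constant, variable):
    """Return the str literal `constant` in the same type as `variable`."""
    if isinstance(variable, str):
        return constant
    # Anything that is not str is treated as bytes, as in the message.
    return constant.encode("utf-8")


def split_query(query):
    """Hypothetical polymorphic caller: split on '&' for str or bytes."""
    return query.split(literal_as("&", query))


print(split_query("a=1&b=2"))    # ['a=1', 'b=2']
print(split_query(b"a=1&b=2"))   # [b'a=1', b'b=2']
```

The type check happens once per literal per call, so a function with many literals pays for many isinstance calls; that cost is the trade-off for keeping a single code path.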
-- --Guido van Rossum (python.org/~guido) From tseaver at palladion.com Tue Jun 22 18:37:14 2010 From: tseaver at palladion.com (Tres Seaver) Date: Tue, 22 Jun 2010 12:37:14 -0400 Subject: [Python-Dev] email package status in 3.X In-Reply-To: <609CF661-AB50-49FC-BAA9-B8898C1E9A19@gmail.com> References: <20100618204831.A8F2A3A40A5@sparrow.telecommunity.com> <609CF661-AB50-49FC-BAA9-B8898C1E9A19@gmail.com> Message-ID: -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Jesse Noller wrote: > > On Jun 19, 2010, at 10:13 AM, Tres Seaver wrote: >>> Nothing is set in stone; if something is incredibly painful, or worse >>> yet broken, then someone needs to file a bug, bring it to this list, >>> or bring up a patch. >> Or walk away. >> > > Ok. If you want. I specifically said I *didn't* want to walk away. I'm pointing out that in the general case, the ordinary user who finds something incredibly painful or broken is far more likely to walk away from the platform than try to fix it, especially if there are available alternatives (e.g., Ruby, Python 2) where the pain level for that user's application is lower. >>> I guess tutorial welcome, rather than patch welcome then ;) >> The only folks who can write the tutorial are the ones who have >> already drunk the koolaid. Note that I've been making my living with Python >> for about twelve years now, and would *like* to use Python3, but can't, >> yet, and therefore haven't taken the first sip. > > Why can't you? Is it a bug? It's not *a* bug, it is that I do my day to day work on very large applications which depend on a large number of not-yet-ported libraries. This barrier is the negative "network effect" which is the whole point of this thread: there is nothing wrong with Python3 except that, to use it, I have to stop doing the work which pays to do an indeterminately-large amount of "hobby" work (of which I already do quite a lot). > Let's file it and fix it. Is it that you > need a dependency ported? 
I need dozens of them ported, and am working on some of them in the aforementioned "copious spare time." > Cool - let's bring it up to the maintainers, > or this list, or ask the PSF to push resources into helping port. > Anything but nothing. Nothing is the default: I am already successful with Python 2, and can't be successful with Python 3 (in the sense of delivering timely, cost-effective solutions to my customers) until *all* those dependencies are ported and stable there. > If what you're saying is that python 3 is a completely unsuitable > platform, well, then yeah - we can all "fix" it or walk away. I didn't say that: I said that Python 3 is unsuitable *today* for the work I'm doing, and that the relative wins it provides over Python 2 are dwarfed by the effort required to do all those ports myself. >>>> IOW, 3.x has broken TOOOWTDI for me in some areas. There may >>>> be obvious ways to do it, but, as per the Zen of Python, "that >>>> way may not be obvious at first unless you're Dutch". ;-) OT: The Dutch smiley there doesn't actually help anything but undercut any point to having TOOOWTDI in the list at all. >>> What areas. We need specifics which can either be: >>> >>> 1> Shot down. >>> 2> Turned into bugs, so they can be fixed >>> 3> Documented in the core documentation. >> That's bloody ironic in a thread which had pointed at reasons why >> people are not even considering Py3 for their projects: those folks won't >> even find the issues due to the lack of confidence in the suitability of >> the platform. > > What I saw was a thread about some issues in email, and cgi. We have > some work being done to address the issue. This will help resolve some > of the issues. > > If there are other issues, then we should step up and either help, or > get out of the way. Arguing about the viability of a platform we knew > would take a bit for adoption is silly and breeds ill will.
I'm not arguing about viability: there are obviously users for whom Python 3 is not only viable, but superior to Python 2. However, I am quite confident that many pro-Python 3 folks arguing here underestimate the scope of the issues which have generated the (self-fulfilling) "not yet" perception. > It's not a turd, and it's not hopeless, in fact rumor has it NumPy > will be ported soon which is a major stepping stone. Sure, for the (far from trivial) subset of the community doing numerical work. > The only way to counteract this meme that python 3 is horribly > broken is to prove that it's not, fix bugs, and move on. There's no > point debating relative turdiness here. Any "turdiness" (which I am *not* arguing for) is a natural consequence of the kinds of backward incompatibilities which were *not* ruled out for Python 3, along with the (early, now waning) "build it and they will come" optimism about adoption rates. Tres. - -- =================================================================== Tres Seaver +1 540-429-0999 tseaver at palladion.com Palladion Software "Excellence by Design" http://palladion.com -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.9 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iEYEARECAAYFAkwg5rIACgkQ+gerLs4ltQ6J7wCdFkQL7XeKtBM407Z5D2rSKk8n EWYAoJUfW+JgURUz7NJcWmqFw3PkNYde =WZEv -----END PGP SIGNATURE----- From ronaldoussoren at mac.com Tue Jun 22 18:39:03 2010 From: ronaldoussoren at mac.com (Ronald Oussoren) Date: Tue, 22 Jun 2010 18:39:03 +0200 Subject: [Python-Dev] red buildbots on 2.7 In-Reply-To: References: <73196.1277143019@parc.com> <75635.1277147585@parc.com> <20100621212904.7bec83f6@pitrou.net> <77297.1277150242@parc.com> <1277150570.3369.1.camel@localhost.localdomain> <4C1FC7E6.5070707@voidspace.org.uk> <4C1FD5D6.7070007@v.loewis.de> <4C1FD84B.3030202@voidspace.org.uk> <4C1FDB65.4020503@v.loewis.de> <4C1FDF1C.2060308@voidspace.org.uk> <4C1FE4AF.80009@v.loewis.de> Message-ID: On 22 Jun, 2010, at
3:38, Alexander Belopolsky wrote: > On Mon, Jun 21, 2010 at 6:16 PM, "Martin v. Löwis" wrote: >>> The test_posix failure is a regression from 2.6 (but it only shows up on >>> some machines - it is caused by a fairly braindead implementation of a >>> couple of posix apis by Apple apparently). >>> >>> http://bugs.python.org/issue7900 >> >> Ah, that one. I definitely think this should *not* block the release: > > I agree that this is nowhere near being a release blocker, but I think > it would be nice to do something about it before the final release. > >> a) there is no clear solution in sight. So if we wait for it to be resolved, >> it could take months until we get a 2.7 release. > > The ideal solution will have to wait until Apple gets its act together > and fixes the problem on their end. I would say "months" is an overly > optimistic time estimate for that. I'd say there is no chance at all that this will be fixed in OSX 10.6; with some luck they'll change this in 10.7. > However, the issue is a regression > from prior versions. In 2.5 getgroups would truncate the list to 16 > groups, but won't crash. More importantly the 16 groups returned > would be correct per-process groups and not something immune to > setgroup changes. > > I proposed a very simple fix: > > http://bugs.python.org/file16326/no-darwin-ext.diff > > which simply minimally reverts the change that introduced the regression. That is one way to fix it; another, just as valid, fix is to change posix.getgroups to be able to return more than 16 groups on OSX (see my patch in issue7900). Both are valid fixes, and each has advantages and disadvantages.
Your proposal:
* Reverts to the behavior in 2.6
* Ensures that posix.getgroups and posix.setgroups are internally consistent

My proposal:
* Uses the newer ABI, which is more likely to be the one Apple wants you to use
* Is compatible with system tools (that is, posix.getgroups() agrees with id(1))
* Is compatible with /usr/bin/python
* Results in posix.getgroups not reflecting results of posix.setgroups

What I haven't done yet, and probably should, is to check how either implementation of getgroups interacts with groups in the System Preferences panel and with groups in a managed environment (using OSX Server). My gut feeling is that the second option (my proposal) would give more useful semantics, but that said: I almost never write code where I need os.setgroups. Ronald From dirkjan at ochtman.nl Tue Jun 22 18:54:21 2010 From: dirkjan at ochtman.nl (Dirkjan Ochtman) Date: Tue, 22 Jun 2010 18:54:21 +0200 Subject: [Python-Dev] State of json in 2.7 Message-ID: It looks like simplejson 2.1.0 and 2.1.1 have been released: http://bob.pythonmac.org/archives/2010/03/10/simplejson-210/ http://bob.pythonmac.org/archives/2010/03/31/simplejson-211/ It looks like any changes that didn't come from the Python tree didn't go into the Python tree, either. I guess we can't put these changes into 2.7 anymore? How can we make this better next time? Cheers, Dirkjan From benjamin at python.org Tue Jun 22 18:56:09 2010 From: benjamin at python.org (Benjamin Peterson) Date: Tue, 22 Jun 2010 11:56:09 -0500 Subject: [Python-Dev] State of json in 2.7 In-Reply-To: References: Message-ID: 2010/6/22 Dirkjan Ochtman : > I guess we can't put these changes into 2.7 anymore? How can we make > this better next time? Never have externally maintained packages.
-- Regards, Benjamin From raymond.hettinger at gmail.com Tue Jun 22 18:24:42 2010 From: raymond.hettinger at gmail.com (Raymond Hettinger) Date: Tue, 22 Jun 2010 09:24:42 -0700 Subject: [Python-Dev] UserDict in 2.7 In-Reply-To: References: <58CEF265-1B25-4FD6-9C45-88353A0AF0E7@gmail.com> Message-ID: <4FBE15CB-2397-46B2-8417-588F8785BA20@gmail.com> On Jun 22, 2010, at 5:48 AM, Benjamin Peterson wrote: > 2010/6/22 Raymond Hettinger : >> There's an entry in whatsnew for 2.7 to the effect of "The UserDict class is >> now a new-style class". >> I had thought there was a conscious decision to not change any existing >> classes from old-style to new-style. IIRC, Martin had championed this idea >> and had rejected all of proposals to make existing classes inherit from >> object. > > IIRC this was because UserDict tries to be a MutableMapping but abcs > require new style classes. ISTM, this change should be reverted to the way it was in 2.6. The registration was already working fine:

Python 2.6.4 (r264:75821M, Oct 27 2009, 19:48:32)
[GCC 4.0.1 (Apple Inc. build 5493)] on darwin
>>> import UserDict
>>> import collections
>>> collections.MutableMapping.register(UserDict.UserDict)
>>> issubclass(UserDict.UserDict, collections.MutableMapping)
True

We didn't have any problems with this registration, nor did there seem to be an issue with UserDict not implementing dictviews. Please revert this change. UserDicts have a long history and are used by a lot of code, so we need to avoid unnecessary breakage.
Thank you, Raymond From ianb at colorstudy.com Tue Jun 22 19:03:29 2010 From: ianb at colorstudy.com (Ian Bicking) Date: Tue, 22 Jun 2010 12:03:29 -0500 Subject: [Python-Dev] bytes / unicode In-Reply-To: <87d3vj2tj2.fsf@uwakimon.sk.tsukuba.ac.jp> References: <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100620234723.600ad4a8@pitrou.net> <87wrtsd44p.fsf@uwakimon.sk.tsukuba.ac.jp> <87631c4bca.fsf@uwakimon.sk.tsukuba.ac.jp> <20100621165611.GW5787@unaka.lan> <87r5jz3h8u.fsf@uwakimon.sk.tsukuba.ac.jp> <20100622055040.GE5787@unaka.lan> <87d3vj2tj2.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On Tue, Jun 22, 2010 at 6:31 AM, Stephen J. Turnbull wrote: > Toshio Kuratomi writes: > > > I'll definitely buy that. Would urljoin(b_base, b_subdir) => bytes and > > urljoin(u_base, u_subdir) => unicode be acceptable though? > > Probably. > > But it doesn't matter what I say, since Guido has defined that as > "polymorphism" and approved it in principle. > > > (I think, given other options, I'd rather see two separate > > functions, though. > > Yes. > > > If you want to deal with things like this:: > > http://host/café > > Yes. > Just for perspective, I don't know if I've ever wanted to deal with a URL like that. I know how it is supposed to work, and I know what a browser does with that, but so many tools will clean that URL up *or* won't be able to deal with it at all that it's not something I'll be passing around. So from a practical point of view this really doesn't come up, and if it did it would be in a situation where you could easily do something ad hoc (though there is not currently a routine to quote unsafe characters in a URL... that would be helpful, though maybe urllib.quote(url.encode('utf8'), '%/:') would do it).
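As a rough Python 3 counterpart of the ad hoc call suggested above: urllib.quote lives at urllib.parse.quote in Python 3, and it encodes str input as UTF-8 by default, so the explicit .encode('utf8') is not needed. The wrapper name quote_unsafe is hypothetical, and the safe set '%/:' is taken from the message as-is; it is a sketch, not a general URL normalizer.

```python
from urllib.parse import quote


def quote_unsafe(url):
    """Percent-encode characters outside the safe set, leaving '%', '/'
    and ':' alone so that an already-quoted URL passes through unchanged."""
    return quote(url, safe="%/:")


print(quote_unsafe("http://host/café"))        # http://host/caf%C3%A9
print(quote_unsafe("http://host/caf%C3%A9"))   # unchanged
```

Note that with this safe set, characters such as '?', '=' and '&' are also percent-encoded, so a URL carrying a query string would need a wider safe set or per-component quoting.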
Also while it is problematic to treat the URL-unquoted value as text (because it has an unknown encoding, no encoding, or regularly a mixture of encodings), the URL-quoted value is pretty easy to pass around, and normalization (in this case to http://host/caf%C3%A9) is generally fine. While it's nice to be correct about encodings, sometimes it is impractical. And it is far nicer to avoid the situation entirely. That is, decoding content you don't care about isn't just inefficient, it's complicated and can introduce errors. The encoding of the underlying bytes of a %-decoded URL is largely uninteresting. Browsers (whose behavior drives a lot of convention) don't touch any of that encoding except lately occasionally to *display* some data in a more friendly way. But it's only display, and errors just make it revert to the old encoded display. Similarly I'd expect (from experience) a programmer using Python to want to take the same approach, sticking with unencoded data in nearly all situations. -- Ian Bicking | http://blog.ianbicking.org From alexander.belopolsky at gmail.com Tue Jun 22 19:05:38 2010 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Tue, 22 Jun 2010 13:05:38 -0400 Subject: [Python-Dev] red buildbots on 2.7 In-Reply-To: References: <73196.1277143019@parc.com> <75635.1277147585@parc.com> <20100621212904.7bec83f6@pitrou.net> <77297.1277150242@parc.com> <1277150570.3369.1.camel@localhost.localdomain> <4C1FC7E6.5070707@voidspace.org.uk> <4C1FD5D6.7070007@v.loewis.de> <4C1FD84B.3030202@voidspace.org.uk> <4C1FDB65.4020503@v.loewis.de> <4C1FDF1C.2060308@voidspace.org.uk> <4C1FE4AF.80009@v.loewis.de> Message-ID: On Tue, Jun 22, 2010 at 12:39 PM, Ronald Oussoren wrote: .. > Both are valid fixes, both have both advantages and disadvantages.
> > Your proposal: > * Reverts to the behavior in 2.6 > * Ensures that posix.getgroups and posix.setgroups are internally consistent > It is also very simple and since the posix module worked fine on OSX for years without _DARWIN_C_SOURCE, I think this is a very low risk change. > My proposal: > * Uses the newer ABI, which is more likely to be the one Apple wants you to use I don't think so. In getgroups(2) I see LEGACY DESCRIPTION If _DARWIN_C_SOURCE is defined, getgroups() can return more than {NGROUPS_MAX} groups. This suggests that this is legacy behavior. Newer applications should use getgrouplist instead. > * Is compatible with system tools (that is, posix.getgroups() agrees with id(1)) I have not tested this recently, but I think if you exec id from a program after a call to setgroups(), it will return process groups, not user groups. > * Is compatible with /usr/bin/python I am sure that once this issue is fixed upstream, Apple will pick it up with the next version. > * results in posix.getgroups not reflecting results of posix.setgroups > This effectively substitutes getgrouplist called on the current user for getgroups. In 3.x, I believe the correct action will be to provide direct access to getgrouplist, which, while not POSIX (yet?), is widely available. From benjamin at python.org Tue Jun 22 19:08:02 2010 From: benjamin at python.org (Benjamin Peterson) Date: Tue, 22 Jun 2010 12:08:02 -0500 Subject: [Python-Dev] UserDict in 2.7 In-Reply-To: <4FBE15CB-2397-46B2-8417-588F8785BA20@gmail.com> References: <58CEF265-1B25-4FD6-9C45-88353A0AF0E7@gmail.com> <4FBE15CB-2397-46B2-8417-588F8785BA20@gmail.com> Message-ID: 2010/6/22 Raymond Hettinger : > > On Jun 22, 2010, at 5:48 AM, Benjamin Peterson wrote: > >> 2010/6/22 Raymond Hettinger : >>> There's an entry in whatsnew for 2.7 to the effect of "The UserDict class is >>> now a new-style class". >>> I had thought there was a conscious decision to not change any existing >>> classes from old-style to new-style.
IIRC, Martin had championed this idea >>> and had rejected all of proposals to make existing classes inherit from >>> object. >> >> IIRC this was because UserDict tries to be a MutableMapping but abcs >> require new style classes. > > ISTM, this change should be reverted to the way it was in 2.6. > > The registration was already working fine: Actually I believe it was an error that it could. There was a typo in abc.py which prevented it from raising errors when non new-style class objects were passed in. -- Regards, Benjamin From janssen at parc.com Tue Jun 22 19:17:01 2010 From: janssen at parc.com (Bill Janssen) Date: Tue, 22 Jun 2010 10:17:01 PDT Subject: [Python-Dev] red buildbots on 2.7 In-Reply-To: References: <73196.1277143019@parc.com> <75635.1277147585@parc.com> <20100621212904.7bec83f6@pitrou.net> <77297.1277150242@parc.com> <1277150570.3369.1.camel@localhost.localdomain> <4C1FC7E6.5070707@voidspace.org.uk> <4C1FD5D6.7070007@v.loewis.de> <4C1FD84B.3030202@voidspace.org.uk> <4C1FDB65.4020503@v.loewis.de> <4C1FDF1C.2060308@voidspace.org.uk> <4C1FE030.7020700@voidspace.org.uk> <1180.1277170019@parc.com> Message-ID: <1422.1277227021@parc.com> Alexander Belopolsky wrote: > On Mon, Jun 21, 2010 at 9:26 PM, Bill Janssen wrote: > .. > > Though, isn't that behavior of urllib.proxy_bypass another bug? > > I don't know. Ask Ronald. Hmmm. I brought up the System Preferences panel on my Mac, and sure enough, there's a checkbox, "Exclude simple hostnames". So I guess it's not a bug, though none of my Macs are configured that way.
Bill From a.badger at gmail.com Tue Jun 22 19:21:23 2010 From: a.badger at gmail.com (Toshio Kuratomi) Date: Tue, 22 Jun 2010 13:21:23 -0400 Subject: [Python-Dev] bytes / unicode In-Reply-To: <87d3vj2tj2.fsf@uwakimon.sk.tsukuba.ac.jp> References: <20100620234723.600ad4a8@pitrou.net> <87wrtsd44p.fsf@uwakimon.sk.tsukuba.ac.jp> <87631c4bca.fsf@uwakimon.sk.tsukuba.ac.jp> <20100621165611.GW5787@unaka.lan> <87r5jz3h8u.fsf@uwakimon.sk.tsukuba.ac.jp> <20100622055040.GE5787@unaka.lan> <87d3vj2tj2.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <20100622172123.GG5787@unaka.lan> On Tue, Jun 22, 2010 at 08:31:13PM +0900, Stephen J. Turnbull wrote: > Toshio Kuratomi writes: > > unicode handling redesign. I'm stating my reading of the RFC not to defend > > the use case Philip has, but because I think that the outlook that non-text > > uris (before being percentencoded) are violations of the RFC > > That's not what I'm saying. What I'm trying to point out is that > manipulating a bytes object as an URI sort of presumes a lot about its > encoding as text. I think we're more or less in agreement now but here I'm not sure. What manipulations are you thinking about? Which stage of URI construction are you considering? I've just taken a quick look at python3.1's urllib module and I see that there is a bit of confusion there. But it's not about unicode vs bytes but about whether a URI should be operated on at the real URI level or the data-that-makes-a-uri level. * all functions I looked at take python3 str rather than bytes so there's no confusing stuff here * urllib.request.urlopen takes a strict uri. That means that you must have a percent encoded uri at this point * urllib.parse.urljoin takes regular string values * urllib.parse and urllib.unparse take regular string values > Since many of the URIs we deal with are more or > less textual, why not take advantage of that? > Cool, so to summarize what I think we agree on: * Percent encoded URIs are text according to the RFC. 
* The data that is used to construct the URI is not defined as text by the RFC. * However, it is very often text in an unspecified encoding * It is extremely convenient for programmers to be able to treat the data that is used to form a URI as text in nearly all common cases. -Toshio From guido at python.org Tue Jun 22 18:53:00 2010 From: guido at python.org (Guido van Rossum) Date: Tue, 22 Jun 2010 09:53:00 -0700 Subject: [Python-Dev] bytes / unicode In-Reply-To: <89AE7ED6-FB94-45DA-9432-7FCBA25A56BF@gmail.com> References: <201006201204.30795.steve@pearwood.info> <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100620234723.600ad4a8@pitrou.net> <87wrtsd44p.fsf@uwakimon.sk.tsukuba.ac.jp> <87631c4bca.fsf@uwakimon.sk.tsukuba.ac.jp> <20100621165611.GW5787@unaka.lan> <87r5jz3h8u.fsf@uwakimon.sk.tsukuba.ac.jp> <39CFC9B3-E55A-41BB-9718-1457E20ACECC@twistedmatrix.com> <89AE7ED6-FB94-45DA-9432-7FCBA25A56BF@gmail.com> Message-ID: On Mon, Jun 21, 2010 at 11:47 PM, Raymond Hettinger wrote: > > On Jun 21, 2010, at 10:31 PM, Glyph Lefkowitz wrote: > > This is a common pain-point for porting software to 3.x - you had a > string, it kinda worked most of the time before, but now you need to keep > track of text too and the functions which seemed to work on bytes no longer > do. > > Thanks Glyph. That is a nice summary of one kind of challenge facing > programmers. Ironically, Glyph also described the pain in 2.x: it only "kinda" worked.
-- --Guido van Rossum (python.org/~guido) From raymond.hettinger at gmail.com Tue Jun 22 19:31:36 2010 From: raymond.hettinger at gmail.com (Raymond Hettinger) Date: Tue, 22 Jun 2010 10:31:36 -0700 Subject: [Python-Dev] UserDict in 2.7 In-Reply-To: References: <58CEF265-1B25-4FD6-9C45-88353A0AF0E7@gmail.com> <4FBE15CB-2397-46B2-8417-588F8785BA20@gmail.com> Message-ID: On Jun 22, 2010, at 10:08 AM, Benjamin Peterson wrote: > . There was a typo in > abc.py which prevented it from raising errors when non new-style class > objects were passed in. For 2.x, that was probably a good thing, a happy accident that made it possible to register existing mapping classes as a MutableMapping. "Fixing" that typo will break code that currently uses ABCs with old-style classes. I believe we are better off leaving this as it was released in 2.6. Raymond From guido at python.org Tue Jun 22 18:49:27 2010 From: guido at python.org (Guido van Rossum) Date: Tue, 22 Jun 2010 09:49:27 -0700 Subject: [Python-Dev] email package status in 3.X In-Reply-To: <87lja73aau.fsf@uwakimon.sk.tsukuba.ac.jp> References: <87sk4jcejy.fsf@uwakimon.sk.tsukuba.ac.jp> <201006201204.30795.steve@pearwood.info> <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100621015824.6A84E3A4099@sparrow.telecommunity.com> <20100621145133.7F5333A404D@sparrow.telecommunity.com> <87lja73aau.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On Mon, Jun 21, 2010 at 10:28 PM, Stephen J. Turnbull wrote: > Michael Urman writes: > > > It is somewhat troublesome that there doesn't appear to be an obvious > > built-in idempotent-when-possible function that gives back the > > provided bytes/str, > > If you want something idempotent, it's already the case that > bytes(b'abc') => b'abc'. What might be desirable is to make > bytes('abc') work and return b'abc', but only if 'abc' is pure ASCII > (or maybe ISO 8859/1). No, no, no! That's just what Python 2 did.
> Unfortunately, str(b'abc') already does work, but > > steve at uwakimon ~ $ python3.1 > Python 3.1.2 (release31-maint, May 12 2010, 20:15:06) > [GCC 4.3.4] on linux2 > Type "help", "copyright", "credits" or "license" for more information. >>>> str(b'abc') > "b'abc'" >>>> > > Oops. You can see why that probably "should" be the case. There is a near-contract that str() of pretty much anything returns a "printable" version of that thing. -- --Guido van Rossum (python.org/~guido) From foom at fuhm.net Tue Jun 22 20:07:18 2010 From: foom at fuhm.net (James Y Knight) Date: Tue, 22 Jun 2010 14:07:18 -0400 Subject: [Python-Dev] bytes / unicode In-Reply-To: References: <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100620234723.600ad4a8@pitrou.net> <87wrtsd44p.fsf@uwakimon.sk.tsukuba.ac.jp> <87631c4bca.fsf@uwakimon.sk.tsukuba.ac.jp> <20100621165611.GW5787@unaka.lan> <87r5jz3h8u.fsf@uwakimon.sk.tsukuba.ac.jp> <20100622055040.GE5787@unaka.lan> <87d3vj2tj2.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <0D1D2134-2CF9-4F93-BE82-912C5297D36F@fuhm.net> On Jun 22, 2010, at 1:03 PM, Ian Bicking wrote: > Similarly I'd expect (from experience) a programmer using > Python to want to take the same approach, sticking with unencoded > data in nearly all situations. Yeah. This is a real issue I have with the direction Python3 went: it pushes you into decoding everything to unicode early, even when you don't care -- all you really wanted to do is pass it from one API to another, with some well-defined transformations, which don't actually depend on it having been decoded properly. (For example, extracting the path from the URL and attempting to open it as a file on the filesystem.) This means that Python3 programs can become *more* fragile in the face of random data you encounter out in the real world, rather than less fragile, which was the goal of the whole exercise.
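For the pass-through case described above (data that only needs to travel from one API to another), one option in Python 3.1 and later is the surrogateescape error handler from PEP 383. The sketch below is illustrative only: the path is made up, and it assumes the input is "probably UTF-8" with occasional stray bytes.

```python
raw = b"/srv/files/caf\xe9.txt"   # hypothetical path; \xe9 is Latin-1, not valid UTF-8

# Strict decoding rejects the stray byte outright.
try:
    raw.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# surrogateescape smuggles each undecodable byte through as a lone surrogate.
text = raw.decode("utf-8", "surrogateescape")

# Ordinary str operations still work on the decoded value...
name = text.rsplit("/", 1)[-1]

# ...and encoding with the same handler restores the original bytes exactly.
restored = text.encode("utf-8", "surrogateescape")
print(strict_ok, restored == raw, name.encode("utf-8", "surrogateescape"))
```

The escaped characters are not printable text, so a string produced this way is safe to slice and pass along but should be re-encoded with the same handler before it is written out or handed to an API that expects valid Unicode.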
The surrogateescape method is a nice workaround for this, but I can't help thinking that it might've been better to just treat stuff as possibly-invalid-but-probably-utf8 byte-strings from input, through processing, to output. It seems kinda too late for that, though: next time someone designs a language, they can try that. :) James From mal at egenix.com Tue Jun 22 20:09:24 2010 From: mal at egenix.com (M.-A. Lemburg) Date: Tue, 22 Jun 2010 20:09:24 +0200 Subject: [Python-Dev] bytes / unicode In-Reply-To: References: <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100620234723.600ad4a8@pitrou.net> <87wrtsd44p.fsf@uwakimon.sk.tsukuba.ac.jp> <87631c4bca.fsf@uwakimon.sk.tsukuba.ac.jp> <20100621165611.GW5787@unaka.lan> <87r5jz3h8u.fsf@uwakimon.sk.tsukuba.ac.jp> <20100622055040.GE5787@unaka.lan> Message-ID: <4C20FC54.9000608@egenix.com> Guido van Rossum wrote: > [Just addressing one little issue here; generally I'm just happy that > we're discussing this issue in such detail from so many points of > view.] > > On Mon, Jun 21, 2010 at 10:50 PM, Toshio Kuratomi wrote: >> [...] Would urljoin(b_base, b_subdir) => bytes and >> urljoin(u_base, u_subdir) => unicode be acceptable though? (I think, given >> other options, I'd rather see two separate functions, though. It seems more >> discoverable and less prone to taking bad input some of the time to have two >> functions that clearly only take one type of data apiece.) > > Hm. I'd rather see a single function (it would be "polymorphic" in my > earlier terminology). After all a large number of string method calls > (and some other utility function calls) already look the same > regardless of whether they are handling bytes or text (as long as it's > uniform). If the building blocks are all polymorphic it's easier to > create additional polymorphic functions.
> > FWIW, there are two problems with polymorphic functions, though they > can be overcome: > > (1) Literals. > > If you write something like x.split('&') you are implicitly assuming x > is text. I don't see a very clean way to overcome this; you'll have to > implement some kind of type check e.g. > > x.split('&') if isinstance(x, str) else x.split(b'&') > > A handy helper function can be written: > > def literal_as(constant, variable): > if isinstance(variable, str): > return constant > else: > return constant.encode('utf-8') > > So now you can write x.split(literal_as('&', x)). This polymorphism is what we used in Python2 a lot to write code that works for both Unicode and 8-bit strings. Unfortunately, this no longer works as easily in Python3 due to the literals sometimes having the wrong type and using such a helper function slows things down a lot. It would be great if we could have something like the above as builtin method: x.split('&'.as(x)) Perhaps something to discuss on the language summit at EuroPython. Too bad we can't add such porting enhancements to Python2 anymore. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Jun 22 2010) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ 2010-07-19: EuroPython 2010, Birmingham, UK 26 days to go ::: Try our new mxODBC.Connect Python Database Interface for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. 
Marc-Andre Lemburg
Registered at Amtsgericht Duesseldorf: HRB 46611
http://www.egenix.com/company/contact/

From a.badger at gmail.com Tue Jun 22 20:44:44 2010
From: a.badger at gmail.com (Toshio Kuratomi)
Date: Tue, 22 Jun 2010 14:44:44 -0400
Subject: [Python-Dev] email package status in 3.X
In-Reply-To:
References: <20100620184120.10EFB3A4099@sparrow.telecommunity.com>
 <20100621015824.6A84E3A4099@sparrow.telecommunity.com>
 <20100621145133.7F5333A404D@sparrow.telecommunity.com>
 <87lja73aau.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: <20100622184444.GJ5787@unaka.lan>

On Tue, Jun 22, 2010 at 08:24:28AM -0500, Michael Urman wrote:
> On Tue, Jun 22, 2010 at 00:28, Stephen J. Turnbull wrote:
> > Michael Urman writes:
> >
> >  > It is somewhat troublesome that there doesn't appear to be an obvious
> >  > built-in idempotent-when-possible function that gives back the
> >  > provided bytes/str,
> >
> > If you want something idempotent, it's already the case that
> > bytes(b'abc') => b'abc'. What might be desirable is to make
> > bytes('abc') work and return b'abc', but only if 'abc' is pure ASCII
> > (or maybe ISO 8859/1).
>
> By idempotent-when-possible, I mean to_bytes(str_or_bytes, encoding,
> errors) that would pass an instance of bytes through, or encode an
> instance of str. And of course a to_str that performs similarly,
> passing str through and decoding bytes. While bytes(b'abc') will give
> me b'abc', neither bytes('abc') nor bytes(b'abc', 'latin-1') get me
> the b'abc' I want to see.
>
A month or so ago, I finally broke down and wrote a python2 library that had
these functions in it (along with a bunch of other trivial boilerplate
functions that I found myself writing over and over in different projects)

https://fedorahosted.org/releases/k/i/kitchen/docs/api-text-converters.html#unicode-and-byte-str-conversion

I suppose I could port this to python3 and we could see if it gains adoption
as a thirdparty addon.
I have been hesitating over doing that since I don't use python3 for everyday work and I have a vague feeling that 2to3 won't understand what that code needs to do. -Toshio -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 198 bytes Desc: not available URL: From brett at python.org Tue Jun 22 21:27:49 2010 From: brett at python.org (Brett Cannon) Date: Tue, 22 Jun 2010 12:27:49 -0700 Subject: [Python-Dev] State of json in 2.7 In-Reply-To: References: Message-ID: [cc'ing Bob on his gmail address; didn't have any other address handy so I don't know if this will actually get to him] On Tue, Jun 22, 2010 at 09:54, Dirkjan Ochtman wrote: > It looks like simplejson 2.1.0 and 2.1.1 have been released: > > http://bob.pythonmac.org/archives/2010/03/10/simplejson-210/ > http://bob.pythonmac.org/archives/2010/03/31/simplejson-211/ > > It looks like any changes that didn't come from the Python tree didn't > go into the Python tree, either. Has anyone asked Bob why he did this? There might be a logical reason. -Brett From bob at redivi.com Tue Jun 22 22:11:10 2010 From: bob at redivi.com (Bob Ippolito) Date: Tue, 22 Jun 2010 13:11:10 -0700 Subject: [Python-Dev] State of json in 2.7 In-Reply-To: References: Message-ID: On Tuesday, June 22, 2010, Brett Cannon wrote: > [cc'ing Bob on his gmail address; didn't have any other address handy > so I don't know if this will actually get to him] > > On Tue, Jun 22, 2010 at 09:54, Dirkjan Ochtman wrote: >> It looks like simplejson 2.1.0 and 2.1.1 have been released: >> >> http://bob.pythonmac.org/archives/2010/03/10/simplejson-210/ >> http://bob.pythonmac.org/archives/2010/03/31/simplejson-211/ >> >> It looks like any changes that didn't come from the Python tree didn't >> go into the Python tree, either. > > Has anyone asked Bob why he did this? There might be a logical reason. I've just been busy. 
It's not trivial to move patches from one to the other, so it's not something that has been easy for me to get around to actually doing. It seems that more often than not when I have had time to look at something, it didn't line up well with python's release schedule. (and speaking of busy I'm en route for a week long honeymoon so don't expect much else from me on this thread) -bob From tjreedy at udel.edu Tue Jun 22 22:19:45 2010 From: tjreedy at udel.edu (Terry Reedy) Date: Tue, 22 Jun 2010 16:19:45 -0400 Subject: [Python-Dev] bytes / unicode In-Reply-To: References: <87sk4jcejy.fsf@uwakimon.sk.tsukuba.ac.jp> <201006201204.30795.steve@pearwood.info> <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100620234723.600ad4a8@pitrou.net> <20100621023005.EE17E3A4099@sparrow.telecommunity.com> <20100621164650.16A093A414B@sparrow.telecommunity.com> <20100621181750.267933A404D@sparrow.telecommunity.com> Message-ID: On 6/22/2010 1:22 AM, Glyph Lefkowitz wrote: > The thing that I have heard in passing from a couple of folks with > experience in this area is that some older software in asia would > present characters differently if they were originally encoded in a > "japanese" encoding versus a "chinese" encoding, even though they were > really "the same" characters. As I tried to say in another post, that to me is similar to wanting to present English text is different fonts depending on whether spoken by an American or Brit, or a modern person versus a Renaissance person. > I do know that Han Unification is a giant political mess > ( makes for some Thanks, I will take a look. > interesting reading), but my understanding is that it has handled enough > of the cases by now that one can write software to display asian > languages and it will basically work with a modern version of unicode. > (And of course, there's always the private use area, as Stephen Turnbull > pointed out.) 
> > Regardless, this is another example where keeping around a string isn't > really enough. If you need to display a japanese character in a distinct > way because you are operating in the japanese *script*, you need a tag > surrounding your data that is a hint to its presentation. The fact that > these presentation hints were sometimes determined by their encoding is > an unfortunate historical accident. Yes. The asian languages I know anything about seems to natively have almost none of the symbols English has, many borrowed from math, that have been pressed into service for text markup. -- Terry Jan Reedy From tjreedy at udel.edu Tue Jun 22 22:32:40 2010 From: tjreedy at udel.edu (Terry Reedy) Date: Tue, 22 Jun 2010 16:32:40 -0400 Subject: [Python-Dev] email package status in 3.X In-Reply-To: References: <87sk4jcejy.fsf@uwakimon.sk.tsukuba.ac.jp> <201006201204.30795.steve@pearwood.info> <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100621015824.6A84E3A4099@sparrow.telecommunity.com> <20100621145133.7F5333A404D@sparrow.telecommunity.com> <87lja73aau.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On 6/22/2010 9:24 AM, Michael Urman wrote: > By idempotent-when-possible, I mean to_bytes(str_or_bytes, encoding, > errors) that would pass an instance of bytes through, or encode an > instance of str. And of course a to_str that performs similarly, > passing str through and decoding bytes. While bytes(b'abc') will give > me b'abc', neither bytes('abc') nor bytes(b'abc', 'latin-1') get me > the b'abc' I want to see. > > These are trivial functions; > I just don't fully understand why the capability isn't baked in. Possible reasons: They are special purpose functions easily built on the basic functions provided. Fine for a 3rd party library. Most people do not need them. Some might be mislead by them. As other have said, "Not every one-liner should be builtin". 
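The to_bytes/to_str pair Michael Urman describes really is only a few lines on top of the built-ins. A minimal Python 3 sketch follows. The function names and the (value, encoding, errors) arguments come from his message; the UTF-8 default and the TypeError for other types are assumptions of this sketch, not anything agreed in the thread.

```python
def to_bytes(value, encoding="utf-8", errors="strict"):
    """Pass bytes through unchanged; encode str with the given codec."""
    if isinstance(value, bytes):
        return value
    if isinstance(value, str):
        return value.encode(encoding, errors)
    raise TypeError("expected str or bytes, got %s" % type(value).__name__)


def to_str(value, encoding="utf-8", errors="strict"):
    """Pass str through unchanged; decode bytes with the given codec."""
    if isinstance(value, str):
        return value
    if isinstance(value, bytes):
        return value.decode(encoding, errors)
    raise TypeError("expected str or bytes, got %s" % type(value).__name__)
```

Calling either function on its own output returns that output unchanged, which is the "idempotent-when-possible" property asked for above.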
-- Terry Jan Reedy From tjreedy at udel.edu Tue Jun 22 22:41:54 2010 From: tjreedy at udel.edu (Terry Reedy) Date: Tue, 22 Jun 2010 16:41:54 -0400 Subject: [Python-Dev] bytes / unicode In-Reply-To: References: <201006201204.30795.steve@pearwood.info> <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100620234723.600ad4a8@pitrou.net> <87wrtsd44p.fsf@uwakimon.sk.tsukuba.ac.jp> <87631c4bca.fsf@uwakimon.sk.tsukuba.ac.jp> <20100621165611.GW5787@unaka.lan> <87r5jz3h8u.fsf@uwakimon.sk.tsukuba.ac.jp> <39CFC9B3-E55A-41BB-9718-1457E20ACECC@twistedmatrix.com> <89AE7ED6-FB94-45DA-9432-7FCBA25A56BF@gmail.com> Message-ID: On 6/22/2010 12:53 PM, Guido van Rossum wrote: > On Mon, Jun 21, 2010 at 11:47 PM, Raymond Hettinger > wrote: >> >> On Jun 21, 2010, at 10:31 PM, Glyph Lefkowitz wrote: >> >> This is a common pain-point for porting software to 3.x - you had a >> string, it kinda worked most of the time before, but now you need to keep >> track of text too and the functions which seemed to work on bytes no longer >> do. >> >> Thanks Glyph. That is a nice summary of one kind of challenge facing >> programmers. > > Ironically, Glyph also described the pain in 2.x: it only "kinda" worked. The people with problematic code to convert must imclude some who managed to tolerate and perhaps suppress the pain. I suspect that conversion attempts brings it back to the surface. It is natural to blame the re-surfacer rather than the original source. (As in 'blame the messenger'). -- Terry Jan Reedy From tjreedy at udel.edu Tue Jun 22 22:47:58 2010 From: tjreedy at udel.edu (Terry Reedy) Date: Tue, 22 Jun 2010 16:47:58 -0400 Subject: [Python-Dev] [OT] glyphs [was Re: email package status in 3.X] In-Reply-To: <201006222052.39734.steve@pearwood.info> References: <20100621184700.BAD7F3A404D@sparrow.telecommunity.com> <201006222052.39734.steve@pearwood.info> Message-ID: On 6/22/2010 6:52 AM, Steven D'Aprano wrote: > On Tue, 22 Jun 2010 11:46:27 am Terry Reedy wrote: >> 3. 
Unicode disclaims direct representation of glyphic variants >> (though again, exceptions were made for asian acceptance). For >> example, in English, mechanically printed 'a' and 'g' are different >> from manually printed 'a' and 'g'. Representing both by the same >> codepoint, in itself, loses information. One who wishes to preserve >> the distinction must instead use a font tag or perhaps a >> tag. Similarly, older English had a significantly >> different glyph for 's', which looks more like a modern 'f'. > > An unfortunate example, as the old English long-s gets its own Unicode > codepoint. Whoops. I suppose I should thank you for the correction so I never make the same error again. Thank you. > http://en.wikipedia.org/wiki/Long_s Very interesting to find out the source of both the integral sign and shilling symbols. -- Terry Jan Reedy From cyounkins at gmail.com Tue Jun 22 23:14:45 2010 From: cyounkins at gmail.com (Craig Younkins) Date: Tue, 22 Jun 2010 17:14:45 -0400 Subject: [Python-Dev] Use of cgi.escape can lead to XSS vulnerabilities Message-ID: Hello, The method in question: http://docs.python.org/library/cgi.html#cgi.escape http://svn.python.org/view/python/tags/r265/Lib/cgi.py?view=markup # at the bottom "Convert the characters '&', '<' and '>' in string s to HTML-safe sequences. Use this if you need to display text that might contain such characters in HTML. If the optional flag quote is true, the quotation mark character ('"') is also translated; this helps for inclusion in an HTML attribute value, as in . If the value to be quoted might include single- or double-quote characters, or both, consider using the quoteattr() function in the xml.sax.saxutils module instead." cgi.escape never escapes single quote characters, which can easily lead to a Cross-Site Scripting (XSS) vulnerability. This seems to be known by many, but a quick search reveals many are using cgi.escape for HTML attribute escaping. The intended use of this method is unclear to me. 
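A small sketch of the gap described above. Here cgi_style_escape is a hypothetical stand-in written from the documented behaviour quoted in this message (it is not the stdlib source), and the div/title markup is an invented example rather than the template from the report. It assumes Python 3, where html.escape with its default quote=True also escapes the single quote.

```python
import html


def cgi_style_escape(s, quote=False):
    """Escape &, < and >, plus the double quote when quote is true.

    Hypothetical re-implementation of the documented behaviour;
    note that the single quote is never touched.
    """
    s = s.replace("&", "&amp;")  # must come first
    s = s.replace("<", "&lt;")
    s = s.replace(">", "&gt;")
    if quote:
        s = s.replace('"', "&quot;")
    return s


payload = "' onload='alert(1);' id='"

# Single-quoted attribute: the payload closes it and injects a new one.
unsafe = "<div title='%s'>" % cgi_style_escape(payload, quote=True)

# html.escape also rewrites "'" as "&#x27;", so the attribute stays closed.
safe = "<div title='%s'>" % html.escape(payload)
```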
Up to and including the latest published version of Mako (0.3.3), this method was the HTML escaping method. Used in this manner, single-quoted attributes with user-supplied data are easily susceptible to cross-site scripting vulnerabilities. Proof of concept in Mako: >>> from mako.template import Template >>> print Template("

", default_filters=['h']).render(data="' onload='alert(1);' id='")

I've emailed Michael Bayer, the creator of Mako, and this will be fixed in version 0.3.4. While the documentation says "if the value to be quoted might include single- or double-quote characters... [use the] xml.sax.saxutils module instead," it also implies that this method will make input safe for HTML. Because this method escapes 4 of the 5 key XML characters, it is reasonable to expect some will use it in the manner Mako did. I suggest rewording the documentation for the method making it more clear what it should and should not be used for. I would like to see the method changed to properly escape single-quotes, but if it is not changed, the documentation should explicitly say this method does not make input safe for inclusion in HTML. Shameless plug: http://www.PythonSecurity.org/ Craig Younkins -------------- next part -------------- An HTML attachment was scrubbed... URL: From ianb at colorstudy.com Tue Jun 22 22:46:45 2010 From: ianb at colorstudy.com (Ian Bicking) Date: Tue, 22 Jun 2010 15:46:45 -0500 Subject: [Python-Dev] bytes / unicode In-Reply-To: <0D1D2134-2CF9-4F93-BE82-912C5297D36F@fuhm.net> References: <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100620234723.600ad4a8@pitrou.net> <87wrtsd44p.fsf@uwakimon.sk.tsukuba.ac.jp> <87631c4bca.fsf@uwakimon.sk.tsukuba.ac.jp> <20100621165611.GW5787@unaka.lan> <87r5jz3h8u.fsf@uwakimon.sk.tsukuba.ac.jp> <20100622055040.GE5787@unaka.lan> <87d3vj2tj2.fsf@uwakimon.sk.tsukuba.ac.jp> <0D1D2134-2CF9-4F93-BE82-912C5297D36F@fuhm.net> Message-ID: On Tue, Jun 22, 2010 at 1:07 PM, James Y Knight wrote: > The surrogateescape method is a nice workaround for this, but I can't help > thinking that it might've been better to just treat stuff as > possibly-invalid-but-probably-utf8 byte-strings from input, through > processing, to output. It seems kinda too late for that, though: next time > someone designs a language, they can try that. 
:) > surrogateescape does help a lot, my only problem with it is that it's out-of-band information. That is, if you have data that went through data.decode('utf8', 'surrogateescape') you can restore it to bytes or transcode it to another encoding, but you have to know that it was decoded specifically that way. And of course if you did have to transcode it (e.g., text.encode('utf8', 'surrogateescape').decode('latin1')) then if you had actually handled the text in any way you may have broken it; you don't *really* have valid text. A lazier solution feels like it would be easier and more transparent to work with. But... I also don't see any major language constraint to having another kind of string that is bytes+encoding. I think PJE brought up a problem with a couple coercion aspects. -- Ian Bicking | http://blog.ianbicking.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From tjreedy at udel.edu Tue Jun 22 23:21:53 2010 From: tjreedy at udel.edu (Terry Reedy) Date: Tue, 22 Jun 2010 17:21:53 -0400 Subject: [Python-Dev] email package status in 3.X In-Reply-To: References: <20100618204831.A8F2A3A40A5@sparrow.telecommunity.com> <609CF661-AB50-49FC-BAA9-B8898C1E9A19@gmail.com> Message-ID: Tres, I am a Python3 enthusiast and realist. I did not expect major adoption for about 3 years (more optimistic than the 5 years of some). If you are feeling pressured to 'move' to Python3, it is not from me. I am sure you will do so on your own, perhaps even with enthusiasm, when it will be good for *you* to do so. If someone wants to contribute while sticking to Python2, its easy. The tracker has perhaps 2000 open 2.x issues, hundreds with no responses. If more Python2 people worked on making 2.7 as bug-free as possible, the developers would be freer to make 3.2 as good as possible (which is what *I* want). The porting of numpy (which I suspect has gotten some urging) will not just benefit 'nemerical' computing. 
For instance, there cannot be a 3.x version of pygame until there is a 3.x version of numpy, its main Python dependency. (The C Simple Directmedia Llibrary it also wraps and builds upon does not care.) -- Terry Jan Reedy From guido at python.org Tue Jun 22 19:03:29 2010 From: guido at python.org (Guido van Rossum) Date: Tue, 22 Jun 2010 10:03:29 -0700 Subject: [Python-Dev] email package status in 3.X In-Reply-To: References: <20100618204831.A8F2A3A40A5@sparrow.telecommunity.com> <609CF661-AB50-49FC-BAA9-B8898C1E9A19@gmail.com> Message-ID: On Tue, Jun 22, 2010 at 9:37 AM, Tres Seaver wrote: > Any "turdiness" (which I am *not* arguing for) is a natural consequence > of the kinds of backward incompatibilities which were *not* ruled out > for Python 3, along with the (early, now waning) "build it and they will > ?come" optimism about adoption rates. FWIW, my optimisim is *not* waning. I think it's good that we're having this discussion and I expect something useful will come out of it; I also expect in general that the (admittedly serious) problem of having to port all dependencies will be solved in the next few years. Not by magic, but because many people are taking small steps in the right direction, and there will be light eventually. In the mean time I don't blame anyone for sticking with 2.x or being too busy to help port stuff to 3.x. Python 3 has been a long time in the making -- it will be a bit longer still, which was expected. -- --Guido van Rossum (python.org/~guido) From janssen at parc.com Tue Jun 22 23:29:50 2010 From: janssen at parc.com (Bill Janssen) Date: Tue, 22 Jun 2010 14:29:50 PDT Subject: [Python-Dev] Use of cgi.escape can lead to XSS vulnerabilities In-Reply-To: References: Message-ID: <10286.1277242190@parc.com> Craig Younkins wrote: > cgi.escape never escapes single quote characters, which can easily lead to a > Cross-Site Scripting (XSS) vulnerability. 
This seems to be known by many, > but a quick search reveals many are using cgi.escape for HTML attribute > escaping. Did you file a bug report? Bill From robertc at robertcollins.net Tue Jun 22 23:40:45 2010 From: robertc at robertcollins.net (Robert Collins) Date: Wed, 23 Jun 2010 09:40:45 +1200 Subject: [Python-Dev] bytes / unicode In-Reply-To: <4C20FC54.9000608@egenix.com> References: <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100620234723.600ad4a8@pitrou.net> <87wrtsd44p.fsf@uwakimon.sk.tsukuba.ac.jp> <87631c4bca.fsf@uwakimon.sk.tsukuba.ac.jp> <20100621165611.GW5787@unaka.lan> <87r5jz3h8u.fsf@uwakimon.sk.tsukuba.ac.jp> <20100622055040.GE5787@unaka.lan> <4C20FC54.9000608@egenix.com> Message-ID: On Wed, Jun 23, 2010 at 6:09 AM, M.-A. Lemburg wrote: >> ? ? ? ? ? return constant.encode('utf-8') >> >> So now you can write x.split(literal_as('&', x)). > > This polymorphism is what we used in Python2 a lot to write > code that works for both Unicode and 8-bit strings. > > Unfortunately, this no longer works as easily in Python3 due > to the literals sometimes having the wrong type and using > such a helper function slows things down a lot. I didn't work in 2 either - see for instance the traceback module with an Exception with unicode args and a non-ascii file path - the file path is in its bytes form, the string joining logic triggers an implicit upcast and *boom*. > Too bad we can't add such porting enhancements to Python2 anymore Perhaps a 'py3compat' module on pypi, with things like the py._builtin reraise helper and so forth ? 
-Rob From martin at v.loewis.de Tue Jun 22 23:50:49 2010 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Tue, 22 Jun 2010 23:50:49 +0200 Subject: [Python-Dev] red buildbots on 2.7 In-Reply-To: References: <73196.1277143019@parc.com> <75635.1277147585@parc.com> <20100621212904.7bec83f6@pitrou.net> <77297.1277150242@parc.com> <1277150570.3369.1.camel@localhost.localdomain> <4C1FC7E6.5070707@voidspace.org.uk> <4C1FD5D6.7070007@v.loewis.de> <4C1FD84B.3030202@voidspace.org.uk> <4C1FDB65.4020503@v.loewis.de> <4C1FDF1C.2060308@voidspace.org.uk> <4C1FE4AF.80009@v.loewis.de> Message-ID: <4C213039.5090300@v.loewis.de> > This effectively substitutes getgrouplist called on the current user > for getgroups. In 3.x, I believe the correct action will be to > provide direct access to getgrouplist which is while not POSIX (yet?), > is widely available. As a policy, adding non-POSIX functions to the posix module is perfectly fine, as long as there is an autoconf test for it (plain ifdefs are gruntingly accepted also). Regards, Martin From fdrake at acm.org Tue Jun 22 21:23:13 2010 From: fdrake at acm.org (Fred Drake) Date: Tue, 22 Jun 2010 15:23:13 -0400 Subject: [Python-Dev] State of json in 2.7 In-Reply-To: References: Message-ID: On Tue, Jun 22, 2010 at 12:56 PM, Benjamin Peterson wrote: > Never have externally maintained packages. Seriously! I concur with this. Fortunately, it's not a real problem in this case. There's the (maintained) simplejson package, and the unmaintained json package. And simplejson works with older versions of Python, too, :-) -Fred -- Fred L. Drake, Jr. "A storm broke loose in my mind." 
--Albert Einstein From ncoghlan at gmail.com Tue Jun 22 23:41:51 2010 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 23 Jun 2010 07:41:51 +1000 Subject: [Python-Dev] bytes / unicode In-Reply-To: References: <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100620234723.600ad4a8@pitrou.net> <87wrtsd44p.fsf@uwakimon.sk.tsukuba.ac.jp> <87631c4bca.fsf@uwakimon.sk.tsukuba.ac.jp> <20100621165611.GW5787@unaka.lan> <87r5jz3h8u.fsf@uwakimon.sk.tsukuba.ac.jp> <20100622055040.GE5787@unaka.lan> Message-ID: On Wed, Jun 23, 2010 at 2:17 AM, Guido van Rossum wrote: > (1) Literals. > > If you write something like x.split('&') you are implicitly assuming x > is text. I don't see a very clean way to overcome this; you'll have to > implement some kind of type check e.g. > > ? ?x.split('&') if isinstance(x, str) else x.split(b'&') > > A handy helper function can be written: > > ?def literal_as(constant, variable): > ? ? ?if isinstance(variable, str): > ? ? ? ? ?return constant > ? ? ?else: > ? ? ? ? ?return constant.encode('utf-8') > > So now you can write x.split(literal_as('&', x)). I think this is a key point. In checking the behaviour of the os module bytes APIs (see below), I used a simple filter along the lines of: [x for x in seq if x.endswith("b")] It would be nice if code along those lines could easily be made polymorphic. 
Maybe what we want is a new class method on bytes and str (this idea is
similar to what MAL suggests later in the thread):

  def coerce(cls, obj, encoding=None, errors='surrogateescape'):
      if isinstance(obj, cls):
          return obj
      if encoding is None:
          encoding = sys.getdefaultencoding()
      # This is the str version, bytes.coerce would use obj.encode() instead
      return obj.decode(encoding, errors)

Then my example above could be made polymorphic (for ASCII compatible
encodings) by writing:

  [x for x in seq if x.endswith(x.coerce("b"))]

I'm trying to see downsides to this idea, and I'm not really seeing
any (well, other than 2.7 being almost out the door and the fact we'd
have to grant ourselves an exception to the language moratorium)

> (2) Data sources.
>
> These can be functions that produce new data from non-string data,
> e.g. str( ), read it from a named file, etc. An example is read()
> vs. write(): it's easy to create a (hypothetical) polymorphic stream
> object that accepts both f.write('booh') and f.write(b'booh'); but you
> need some other hack to make read() return something that matches a
> desired return type. I don't have a generic suggestion for a solution;
> for streams in particular, the existing distinction between binary and
> text streams works, of course, but there are other situations where
> this doesn't generalize (I think some XML interfaces have this
> awkwardness in their API for converting a tree to a string).

We may need to use the os and io modules as the precedents here:

os: normal API is text using the surrogateescape error handler,
parallel bytes API exposes raw bytes. Parallel API is polymorphic if
possible (e.g. os.listdir), but appends a 'b' to the name if the
polymorphic approach isn't practical (e.g. os.environb, os.getcwdb,
os.getenvb).

io: layered API, where both the raw bytes of the wire protocol and the
decoded bytes of the text layer are available

Regards,
Nick.
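The os precedent mentioned above can be seen directly in Python 3. This is only a minimal sketch: the file name is made up, and the explicit UTF-8 codec is an assumption chosen for illustration (the os module itself uses the filesystem encoding).

```python
import os

# Polymorphic API: the result type follows the argument type.
text_names = os.listdir(".")    # list of str
byte_names = os.listdir(b".")   # list of bytes

# Parallel 'b' API where a polymorphic spelling is not practical.
cwd_text = os.getcwd()          # str
cwd_bytes = os.getcwdb()        # bytes

# surrogateescape carries undecodable bytes through str and back again.
raw = b"caf\xe9.txt"            # Latin-1 bytes, not valid UTF-8
name = raw.decode("utf-8", "surrogateescape")
round_tripped = name.encode("utf-8", "surrogateescape")
```

The round trip only works if the same codec and error handler are used in both directions, which is the out-of-band knowledge Ian Bicking points to elsewhere in this thread.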
--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From ncoghlan at gmail.com Wed Jun 23 00:07:07 2010
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Wed, 23 Jun 2010 08:07:07 +1000
Subject: [Python-Dev] bytes / unicode
In-Reply-To: <4C20FC54.9000608@egenix.com>
References: <20100620184120.10EFB3A4099@sparrow.telecommunity.com>
 <20100620234723.600ad4a8@pitrou.net>
 <87wrtsd44p.fsf@uwakimon.sk.tsukuba.ac.jp>
 <87631c4bca.fsf@uwakimon.sk.tsukuba.ac.jp>
 <20100621165611.GW5787@unaka.lan>
 <87r5jz3h8u.fsf@uwakimon.sk.tsukuba.ac.jp>
 <20100622055040.GE5787@unaka.lan>
 <4C20FC54.9000608@egenix.com>
Message-ID:

On Wed, Jun 23, 2010 at 4:09 AM, M.-A. Lemburg wrote:
> It would be great if we could have something like the above as
> builtin method:
>
> x.split('&'.as(x))

As per my other message, another possible (and reasonably intuitive)
spelling would be:

  x.split(x.coerce('&'))

Writing it as a helper function is also possible, although it would be
trickier to remember the correct argument ordering:

  import sys

  def coerce_to(target, obj, encoding=None, errors='surrogateescape'):
      if isinstance(obj, type(target)):
          return obj
      if encoding is None:
          encoding = sys.getdefaultencoding()
      try:
          convert = obj.decode
      except AttributeError:
          convert = obj.encode
      return convert(encoding, errors)

  x.split(coerce_to(x, '&'))

> Perhaps something to discuss on the language summit at EuroPython.
>
> Too bad we can't add such porting enhancements to Python2 anymore.

Well, we can if we really want to, it just entails convincing Benjamin
to reschedule the 2.7 final release. Given the UserDict/ABC/old-style
classes issue, there's a fair chance there's going to be at least one
more 2.7 RC anyway.
That said, since this kind of coercion can be done in a helper function, that should be adequate for the 2.x to 3.x conversion case (for 2.x, the helper function can be defined to just return the second argument since bytes and str are the same type, while the 3.x version would look something like the code above) Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From greg.ewing at canterbury.ac.nz Wed Jun 23 01:03:06 2010 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Wed, 23 Jun 2010 11:03:06 +1200 Subject: [Python-Dev] UserDict in 2.7 In-Reply-To: References: <58CEF265-1B25-4FD6-9C45-88353A0AF0E7@gmail.com> Message-ID: <4C21412A.9030709@canterbury.ac.nz> Benjamin Peterson wrote: > IIRC this was because UserDict tries to be a MutableMapping but abcs > require new style classes. Are there any use cases for UserList and UserDict in new code, now that list and dict can be subclassed? If not, I don't think it would be a big problem if they were left out of the ABC ecosystem. No worse than what happens to any other existing user-defined class that predates ABCs -- if people want them to inherit from ABCs, they have to update their code. In this case, the update would consist of changing subclasses to inherit from list or dict instead. -- Greg From fuzzyman at voidspace.org.uk Wed Jun 23 00:59:12 2010 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Tue, 22 Jun 2010 23:59:12 +0100 Subject: [Python-Dev] UserDict in 2.7 In-Reply-To: <4C21412A.9030709@canterbury.ac.nz> References: <58CEF265-1B25-4FD6-9C45-88353A0AF0E7@gmail.com> <4C21412A.9030709@canterbury.ac.nz> Message-ID: <4C214040.20304@voidspace.org.uk> On 23/06/2010 00:03, Greg Ewing wrote: > Benjamin Peterson wrote: > >> IIRC this was because UserDict tries to be a MutableMapping but abcs >> require new style classes. > > Are there any use cases for UserList and UserDict in new > code, now that list and dict can be subclassed? 
Inheriting from list or dict isn't very useful as you have to override
*every* method to control behaviour. (For example with the dict if you
override __setitem__ then update and setdefault (etc) don't go through
your new __setitem__ and if you override __getitem__ then pop and
friends don't go through your new __getitem__.)

In 2.6+ you can of course use the collections.MutableMapping abc, but if
you want to write cross-Python version code UserDict is still useful. If
you want abc support then you are *already* on 2.6+ though I guess.

All the best,

Michael

>
> If not, I don't think it would be a big problem if they
> were left out of the ABC ecosystem. No worse than what
> happens to any other existing user-defined class that
> predates ABCs -- if people want them to inherit from
> ABCs, they have to update their code. In this case, the
> update would consist of changing subclasses to inherit
> from list or dict instead.
>

--
http://www.ironpythoninaction.com/
http://www.voidspace.org.uk/blog

READ CAREFULLY. By accepting and reading this email you agree, on behalf of your employer, to release me from all obligations and waivers arising from any and all NON-NEGOTIATED agreements, licenses, terms-of-service, shrinkwrap, clickwrap, browsewrap, confidentiality, non-disclosure, non-compete and acceptable use policies ("BOGUS AGREEMENTS") that I have entered into with your employer, its partners, licensors, agents and assigns, in perpetuity, without prejudice to my ongoing rights and privileges. You further represent that you have the authority to release me from any BOGUS AGREEMENTS on behalf of your employer.
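Michael's point about dict subclasses is easy to check. The sketch below assumes Python 3, where UserDict lives in collections; the class names and the seen list are illustrative only. On CPython, dict.update and dict.setdefault do not call an overridden __setitem__, whereas UserDict inherits those methods from MutableMapping, which does go through it.

```python
from collections import UserDict


class RecordingDict(dict):
    """dict subclass that records every key passed to __setitem__."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def __setitem__(self, key, value):
        self.seen.append(key)
        super().__setitem__(key, value)


class RecordingUserDict(UserDict):
    """Same idea, built on UserDict instead of dict."""

    def __init__(self):
        self.seen = []
        super().__init__()

    def __setitem__(self, key, value):
        self.seen.append(key)
        super().__setitem__(key, value)


def exercise(mapping):
    """Store three keys by three different routes; return what was recorded."""
    mapping["a"] = 1             # direct item assignment
    mapping.update(b=2)          # bulk update
    mapping.setdefault("c", 3)   # insert-if-missing
    return mapping.seen


plain_seen = exercise(RecordingDict())      # only the direct assignment is seen
user_seen = exercise(RecordingUserDict())   # all three stores are seen
```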
From fuzzyman at voidspace.org.uk Wed Jun 23 01:04:15 2010 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Wed, 23 Jun 2010 00:04:15 +0100 Subject: [Python-Dev] bytes / unicode In-Reply-To: References: <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100620234723.600ad4a8@pitrou.net> <87wrtsd44p.fsf@uwakimon.sk.tsukuba.ac.jp> <87631c4bca.fsf@uwakimon.sk.tsukuba.ac.jp> <20100621165611.GW5787@unaka.lan> <87r5jz3h8u.fsf@uwakimon.sk.tsukuba.ac.jp> <20100622055040.GE5787@unaka.lan> <4C20FC54.9000608@egenix.com> Message-ID: <4C21416F.2040009@voidspace.org.uk> On 22/06/2010 22:40, Robert Collins wrote: > On Wed, Jun 23, 2010 at 6:09 AM, M.-A. Lemburg wrote: > > >>> return constant.encode('utf-8') >>> >>> So now you can write x.split(literal_as('&', x)). >>> >> This polymorphism is what we used in Python2 a lot to write >> code that works for both Unicode and 8-bit strings. >> >> Unfortunately, this no longer works as easily in Python3 due >> to the literals sometimes having the wrong type and using >> such a helper function slows things down a lot. >> > I didn't work in 2 either - see for instance the traceback module with > an Exception with unicode args and a non-ascii file path - the file > path is in its bytes form, the string joining logic triggers an > implicit upcast and *boom*. > > Yeah, there are still a few places in unittest where a unicode exception can cause the whole test run to bomb out. No-one has *yet* reported these as bugs and I try and ferret them out as I find them. All the best, Michael >> Too bad we can't add such porting enhancements to Python2 anymore >> > Perhaps a 'py3compat' module on pypi, with things like the py._builtin > reraise helper and so forth ? 
> > -Rob > -- http://www.ironpythoninaction.com/ http://www.voidspace.org.uk/blog From raymond.hettinger at gmail.com Wed Jun 23 01:17:54 2010 From: raymond.hettinger at gmail.com (Raymond Hettinger) Date: Tue, 22 Jun 2010 16:17:54 -0700 Subject: [Python-Dev] UserDict in 2.7 In-Reply-To: <4C214040.20304@voidspace.org.uk> References: <58CEF265-1B25-4FD6-9C45-88353A0AF0E7@gmail.com> <4C21412A.9030709@canterbury.ac.nz> <4C214040.20304@voidspace.org.uk> Message-ID: On Jun 22, 2010, at 3:59 PM, Michael Foord wrote: > On 23/06/2010 00:03, Greg Ewing wrote: >> Benjamin Peterson wrote: >> >>> IIRC this was because UserDict tries to be a MutableMapping but abcs >>> require new style classes. >> >> Are there any use cases for UserList and UserDict in new >> code, now that list and dict can be subclassed? > > Inheriting from list or dict isn't very useful as you to have to override *every* method to control behaviour. Benjamin fixed the UserDict and ABC problem earlier today in r82155. It is now the same as it was in Py2.6. Nothing to see here. Move along.
Raymond From fuzzyman at voidspace.org.uk Wed Jun 23 01:18:29 2010 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Wed, 23 Jun 2010 00:18:29 +0100 Subject: [Python-Dev] bytes / unicode In-Reply-To: <0D1D2134-2CF9-4F93-BE82-912C5297D36F@fuhm.net> References: <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100620234723.600ad4a8@pitrou.net> <87wrtsd44p.fsf@uwakimon.sk.tsukuba.ac.jp> <87631c4bca.fsf@uwakimon.sk.tsukuba.ac.jp> <20100621165611.GW5787@unaka.lan> <87r5jz3h8u.fsf@uwakimon.sk.tsukuba.ac.jp> <20100622055040.GE5787@unaka.lan> <87d3vj2tj2.fsf@uwakimon.sk.tsukuba.ac.jp> <0D1D2134-2CF9-4F93-BE82-912C5297D36F@fuhm.net> Message-ID: <4C2144C5.2070902@voidspace.org.uk> On 22/06/2010 19:07, James Y Knight wrote: > > On Jun 22, 2010, at 1:03 PM, Ian Bicking wrote: >> Similarly I'd expect (from experience) that a programmer using Python >> to want to take the same approach, sticking with unencoded data in >> nearly all situations. > > Yeah. This is a real issue I have with the direction Python3 went: it > pushes you into decoding everything to unicode early, Well, both .NET and Java take this approach as well. I wonder how they cope with the particular issues that have been mentioned for web applications - both platforms are used extensively for web apps. Having used IronPython, which has .NET unicode strings (although it does a lot of magic to *allow* you to store binary data in strings for compatibility with CPython), I have to say that this approach makes a lot of programming *so* much more pleasant. We did a lot of I/O (can you do useful programming without I/O?) including working with databases, but I didn't work *much* with wire protocols (fetching a fair bit of data from the web though now I think about it). I think wire protocols can present particular problems; sometimes having mixed encodings in the same data it seems. 
Where you don't have these problems, keeping bytes data and all Unicode text data separate and encoding / decoding at the boundaries is really much more sane and pleasant.

It would be a real shame if we decided that the way forward for Python 3 was to try and move closer to how bytes/text was handled in Python 2.

All the best,

Michael

> even when you don't care -- all you really wanted to do is pass it
> from one API to another, with some well-defined transformations, which
> don't actually depend on it having being decoded properly. (For
> example, extracting the path from the URL and attempting to open it as
> a file on the filesystem.)
>
> This means that Python3 programs can become *more* fragile in the face
> of random data you encounter out in the real world, rather than less
> fragile, which was the goal of the whole exercise.
>
> The surrogateescape method is a nice workaround for this, but I can't
> help thinking that it might've been better to just treat stuff as
> possibly-invalid-but-probably-utf8 byte-strings from input, through
> processing, to output. It seems kinda too late for that, though: next
> time someone designs a language, they can try that. :)
>
> James
>

--
http://www.ironpythoninaction.com/
http://www.voidspace.org.uk/blog

From ianb at colorstudy.com Wed Jun 23 01:23:40 2010
From: ianb at colorstudy.com (Ian Bicking)
Date: Tue, 22 Jun 2010 18:23:40 -0500
Subject: [Python-Dev] bytes / unicode
In-Reply-To:
References: <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100620234723.600ad4a8@pitrou.net> <87wrtsd44p.fsf@uwakimon.sk.tsukuba.ac.jp> <87631c4bca.fsf@uwakimon.sk.tsukuba.ac.jp> <20100621165611.GW5787@unaka.lan> <87r5jz3h8u.fsf@uwakimon.sk.tsukuba.ac.jp> <20100622055040.GE5787@unaka.lan>
Message-ID:

On Tue, Jun 22, 2010 at 11:17 AM, Guido van Rossum wrote:

> (2) Data sources.
>
> These can be functions that produce new data from non-string data,
> e.g. str( ), read it from a named file, etc. An example is read()
> vs. write(): it's easy to create a (hypothetical) polymorphic stream
> object that accepts both f.write('booh') and f.write(b'booh'); but you
> need some other hack to make read() return something that matches a
> desired return type. I don't have a generic suggestion for a solution;
> for streams in particular, the existing distinction between binary and
> text streams works, of course, but there are other situations where
> this doesn't generalize (I think some XML interfaces have this
> awkwardness in their API for converting a tree to a string).
>

This reminds me of the optimization ElementTree and lxml made in Python 2 (not sure what they do in Python 3?) where they use str when a string is ASCII to avoid the memory and performance overhead of unicode. At least lxml is also dealing with the divide between the internal libxml2 string representation and the Python representation. This is a place where bytes+encoding might also have some benefit. XML is someplace where you might load a bunch of data but only touch a little bit of it, and the amount of data is frequently large enough that the efficiencies are important.

--
Ian Bicking | http://blog.ianbicking.org

From pje at telecommunity.com Wed Jun 23 01:55:11 2010
From: pje at telecommunity.com (P.J. Eby)
Date: Tue, 22 Jun 2010 19:55:11 -0400
Subject: [Python-Dev] bytes / unicode
In-Reply-To:
References: <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100620234723.600ad4a8@pitrou.net> <87wrtsd44p.fsf@uwakimon.sk.tsukuba.ac.jp> <87631c4bca.fsf@uwakimon.sk.tsukuba.ac.jp> <20100621165611.GW5787@unaka.lan> <87r5jz3h8u.fsf@uwakimon.sk.tsukuba.ac.jp> <20100622055040.GE5787@unaka.lan>
Message-ID: <20100622235514.7B3FC3A4099@sparrow.telecommunity.com>

At 07:41 AM 6/23/2010 +1000, Nick Coghlan wrote:
>Then my example above could be made polymorphic (for ASCII compatible
>encodings) by writing:
>
>  [x for x in seq if x.endswith(x.coerce("b"))]
>
>I'm trying to see downsides to this idea, and I'm not really seeing
>any (well, other than 2.7 being almost out the door and the fact we'd
>have to grant ourselves an exception to the language moratorium)

Notice, however, that if multi-string operations used a coercion protocol (they currently have to do type checks already for byte/unicode mixes), then you could make the entire stdlib polymorphic by default, even for other kinds of strings that don't exist yet.
If you invent a new numeric type, generally speaking you can pass it to existing stdlib functions taking numbers, as long as it implements the appropriate protocols. Why not do the same for strings? From glyph at twistedmatrix.com Wed Jun 23 02:23:56 2010 From: glyph at twistedmatrix.com (Glyph Lefkowitz) Date: Tue, 22 Jun 2010 20:23:56 -0400 Subject: [Python-Dev] bytes / unicode In-Reply-To: References: <201006201204.30795.steve@pearwood.info> <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100620234723.600ad4a8@pitrou.net> <87wrtsd44p.fsf@uwakimon.sk.tsukuba.ac.jp> <87631c4bca.fsf@uwakimon.sk.tsukuba.ac.jp> <20100621165611.GW5787@unaka.lan> <87r5jz3h8u.fsf@uwakimon.sk.tsukuba.ac.jp> <39CFC9B3-E55A-41BB-9718-1457E20ACECC@twistedmatrix.com> <89AE7ED6-FB94-45DA-9432-7FCBA25A56BF@gmail.com> Message-ID: On Jun 22, 2010, at 12:53 PM, Guido van Rossum wrote: > On Mon, Jun 21, 2010 at 11:47 PM, Raymond Hettinger > wrote: >> >> On Jun 21, 2010, at 10:31 PM, Glyph Lefkowitz wrote: >> >> This is a common pain-point for porting software to 3.x - you had a >> string, it kinda worked most of the time before, but now you need to keep >> track of text too and the functions which seemed to work on bytes no longer >> do. >> >> Thanks Glyph. That is a nice summary of one kind of challenge facing >> programmers. > > Ironically, Glyph also described the pain in 2.x: it only "kinda" worked. It was not my intention to be ironic about it - that was exactly what I meant :). 3.x is forcing you to confront an issue that you _should_ have confronted for 2.x anyway. (And, I hope, most libraries doing a 3.x migration will take the opportunity to make their 2.x APIs unicode-clean while still in 2to3 mode, and jump ship to 3.x source only _after_ there's a nice transition path for their clients that can be taken in 2 steps.) 
From glyph at twistedmatrix.com Wed Jun 23 02:25:31 2010 From: glyph at twistedmatrix.com (Glyph Lefkowitz) Date: Tue, 22 Jun 2010 20:25:31 -0400 Subject: [Python-Dev] bytes / unicode In-Reply-To: <0D1D2134-2CF9-4F93-BE82-912C5297D36F@fuhm.net> References: <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100620234723.600ad4a8@pitrou.net> <87wrtsd44p.fsf@uwakimon.sk.tsukuba.ac.jp> <87631c4bca.fsf@uwakimon.sk.tsukuba.ac.jp> <20100621165611.GW5787@unaka.lan> <87r5jz3h8u.fsf@uwakimon.sk.tsukuba.ac.jp> <20100622055040.GE5787@unaka.lan> <87d3vj2tj2.fsf@uwakimon.sk.tsukuba.ac.jp> <0D1D2134-2CF9-4F93-BE82-912C5297D36F@fuhm.net> Message-ID: <94700B9C-25B4-4A75-BA43-20FEA3FDE772@twistedmatrix.com> On Jun 22, 2010, at 2:07 PM, James Y Knight wrote: > Yeah. This is a real issue I have with the direction Python3 went: it pushes you into decoding everything to unicode early, even when you don't care -- all you really wanted to do is pass it from one API to another, with some well-defined transformations, which don't actually depend on it having being decoded properly. (For example, extracting the path from the URL and attempting to open it as a file on the filesystem.) But you _do_ need to decode it in this case. If you got your URL from some funky UTF-32 datasource, b"\x00\x00\x00/" is not a path separator, "/" is. Plus, you should really be separating path segments and looking at them individually so that you don't fall victim to "%2F" bugs. And if you want your code to be portable, you need a Unicode representation of your pathname anyway for Windows; plus, there, you need to care about "\" as well as "/". The fact that your wire-bytes were probably ASCII(-ish) and your filesystem probably encodes pathnames as UTF-8 and so everything looks like it lines up is no excuse not to be explicit about your expectations there. 
You may want to transcode your characters into some other characters later, but that shouldn't stop you from treating them as characters of some variety in the meanwhile. > The surrogateescape method is a nice workaround for this, but I can't help thinking that it might've been better to just treat stuff as possibly-invalid-but-probably-utf8 byte-strings from input, through processing, to output. It seems kinda too late for that, though: next time someone designs a language, they can try that. :) I can think of lots of optimizations that might be interesting for Python (or perhaps some other runtime less concerned with cleverness overload, like PyPy) to implement, like a UTF-8 combining-characters overlay that would allow for fast indexing, lazily populated as random access dictates. But this could all be implemented as smartness inside .encode() and .decode() and the str and bytes types without changing the way the API works. I realize that there are implications at the C level, but as long as you can squeeze a function call in to certain places, it could still work. I can also appreciate what's been said in this thread a bunch of times: to my knowledge, nobody has actually shown a profile of an application where encoding is significant overhead. I believe that encoding _will_ be a significant overhead for some applications (and actually I think it will be very significant for some applications that I work on), but optimizations should really be implemented once that's been demonstrated, so that there's a better understanding of what the overhead is, exactly. Is memory a big deal? Is CPU? Is it both? Do you want to tune for the tradeoff? etc, etc. Clever data-structures seem premature until someone has a good idea of all those things. 
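
Glyph's point about the URL example can be sketched in Python 3 roughly as follows. This is a minimal illustration and not code from the thread: the helper name `path_segments` and its `encoding` parameter are assumptions made for the example, while `urlsplit` and `unquote` are the standard `urllib.parse` functions. The sketch decodes the wire bytes explicitly, splits the path, and percent-decodes each segment on its own so that an encoded "%2F" cannot masquerade as a path separator.

```python
from urllib.parse import urlsplit, unquote


def path_segments(url_bytes, encoding="ascii"):
    """Decode a URL received as bytes and return its decoded path segments.

    Hypothetical helper for illustration; the caller states the wire
    encoding instead of assuming the bytes are ASCII-compatible.
    """
    url = url_bytes.decode(encoding)      # be explicit about the wire encoding
    raw_path = urlsplit(url).path
    segments = []
    for raw in raw_path.split("/"):
        if not raw:
            continue                      # skip empty segments (leading or doubled slashes)
        segment = unquote(raw)            # percent-decode one segment at a time
        # A decoded separator or a dot-segment must not reach the filesystem layer.
        if "/" in segment or "\\" in segment or segment in (".", ".."):
            raise ValueError("unsafe path segment: %r" % segment)
        segments.append(segment)
    return segments
```

Under these assumptions, `path_segments(b"http://example.com/a%2Fb")` raises `ValueError` rather than letting the decoded "/" be treated as a directory boundary, and a UTF-32 source works as long as the caller passes `encoding="utf-32"`.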
From glyph at twistedmatrix.com Wed Jun 23 02:34:31 2010 From: glyph at twistedmatrix.com (Glyph Lefkowitz) Date: Tue, 22 Jun 2010 20:34:31 -0400 Subject: [Python-Dev] bytes / unicode In-Reply-To: References: <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100620234723.600ad4a8@pitrou.net> <87wrtsd44p.fsf@uwakimon.sk.tsukuba.ac.jp> <87631c4bca.fsf@uwakimon.sk.tsukuba.ac.jp> <20100621165611.GW5787@unaka.lan> <87r5jz3h8u.fsf@uwakimon.sk.tsukuba.ac.jp> <20100622055040.GE5787@unaka.lan> Message-ID: <5A4340BB-7B64-4C76-81FF-8A43F179AA7A@twistedmatrix.com> On Jun 22, 2010, at 7:23 PM, Ian Bicking wrote: > This is a place where bytes+encoding might also have some benefit. XML is someplace where you might load a bunch of data but only touch a little bit of it, and the amount of data is frequently large enough that the efficiencies are important. Different encodings have different characteristics, though, which makes them amenable to different types of optimizations. If you've got an ASCII string or a latin1 string, the optimizations of unicode are pretty obvious; if you've got one in UTF-16 with no multi-code-unit sequences, you could also hypothetically cheat for a while if you're on a UCS4 build of Python. I suspect the practical problem here is that there's no CharacterString ABC in the collections module for third-party libraries to provide their own peculiarly-optimized implementations that could lazily turn into real 'str's as needed. I'd volunteer to write a PEP if I thought I could actually get it done :-\. If someone else wants to be the primary author though, I'll try to help out. 
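
The kind of third-party string type Glyph describes, one that keeps encoded bytes and only turns into a real `str` when needed, might look something like the sketch below. Everything here is hypothetical: `LazyText` is not an existing class, no `CharacterString` ABC exists in the standard library, and the single-byte shortcut in `__len__` trusts the declared encoding without validating the data.

```python
class LazyText:
    """Hypothetical wrapper: keeps encoded bytes and decodes only on demand."""

    def __init__(self, data, encoding):
        self._data = data
        self._encoding = encoding
        self._text = None                 # cache for the decoded str

    def __str__(self):
        # Decode once, on first use, then reuse the cached result.
        if self._text is None:
            self._text = self._data.decode(self._encoding)
        return self._text

    def __len__(self):
        # ASCII and latin-1 map one byte to one character, so the length
        # is known without decoding (assumes the declared encoding is right).
        if self._encoding in ("ascii", "latin-1"):
            return len(self._data)
        return len(str(self))

    def encode(self, encoding):
        if encoding == self._encoding:
            return self._data             # round-trip without decoding at all
        return str(self).encode(encoding)
```

For data that is loaded but mostly passed through untouched, such as the XML case mentioned earlier in the thread, a wrapper like this would avoid the decode entirely; whether that saving matters is exactly the kind of thing a profile would have to show.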
From murman at gmail.com Wed Jun 23 02:38:00 2010
From: murman at gmail.com (Michael Urman)
Date: Tue, 22 Jun 2010 19:38:00 -0500
Subject: [Python-Dev] email package status in 3.X
In-Reply-To:
References: <87sk4jcejy.fsf@uwakimon.sk.tsukuba.ac.jp> <201006201204.30795.steve@pearwood.info> <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100621015824.6A84E3A4099@sparrow.telecommunity.com> <20100621145133.7F5333A404D@sparrow.telecommunity.com> <87lja73aau.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID:

On Tue, Jun 22, 2010 at 15:32, Terry Reedy wrote:
> On 6/22/2010 9:24 AM, Michael Urman wrote:
>> These are trivial functions;
>> I just don't fully understand why the capability isn't baked in.
>
> Possible reasons: They are special purpose functions easily built on the
> basic functions provided. Fine for a 3rd party library. Most people do not
> need them. Some might be misled by them. As others have said, "Not every
> one-liner should be builtin".

Perhaps the two-argument constructions on bytes and str should have been removed in favor of the .decode and .encode methods on their respective classes. Or vice versa; I don't have the history to know in which order they originated, and which is theoretically preferred these days.

--
Michael Urman

From mike.klaas at gmail.com Wed Jun 23 02:39:04 2010
From: mike.klaas at gmail.com (Mike Klaas)
Date: Tue, 22 Jun 2010 17:39:04 -0700
Subject: [Python-Dev] bytes / unicode
In-Reply-To:
References: <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100620234723.600ad4a8@pitrou.net> <87wrtsd44p.fsf@uwakimon.sk.tsukuba.ac.jp> <87631c4bca.fsf@uwakimon.sk.tsukuba.ac.jp> <20100621165611.GW5787@unaka.lan> <87r5jz3h8u.fsf@uwakimon.sk.tsukuba.ac.jp> <20100622055040.GE5787@unaka.lan>
Message-ID:

On Tue, Jun 22, 2010 at 4:23 PM, Ian Bicking wrote:
> This reminds me of the optimization ElementTree and lxml made in Python 2
> (not sure what they do in Python 3?) where they use str when a string is
> ASCII to avoid the memory and performance overhead of unicode.

An optimization that forces me to typecheck the return value of the function and that I only discovered after code started breaking. I can't say I was enthused about that decision when I discovered it.

-Mike

From robertc at robertcollins.net Wed Jun 23 02:57:48 2010
From: robertc at robertcollins.net (Robert Collins)
Date: Wed, 23 Jun 2010 12:57:48 +1200
Subject: [Python-Dev] bytes / unicode
In-Reply-To: <94700B9C-25B4-4A75-BA43-20FEA3FDE772@twistedmatrix.com>
References: <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100620234723.600ad4a8@pitrou.net> <87wrtsd44p.fsf@uwakimon.sk.tsukuba.ac.jp> <87631c4bca.fsf@uwakimon.sk.tsukuba.ac.jp> <20100621165611.GW5787@unaka.lan> <87r5jz3h8u.fsf@uwakimon.sk.tsukuba.ac.jp> <20100622055040.GE5787@unaka.lan> <87d3vj2tj2.fsf@uwakimon.sk.tsukuba.ac.jp> <0D1D2134-2CF9-4F93-BE82-912C5297D36F@fuhm.net> <94700B9C-25B4-4A75-BA43-20FEA3FDE772@twistedmatrix.com>
Message-ID:

On Wed, Jun 23, 2010 at 12:25 PM, Glyph Lefkowitz wrote:
> I can also appreciate what's been said in this thread a bunch of times: to my knowledge, nobody has actually shown a profile of an application where encoding is significant overhead. I believe that encoding _will_ be a significant overhead for some applications (and actually I think it will be very significant for some applications that I work on), but optimizations should really be implemented once that's been demonstrated, so that there's a better understanding of what the overhead is, exactly. Is memory a big deal? Is CPU? Is it both? Do you want to tune for the tradeoff? etc, etc. Clever data-structures seem premature until someone has a good idea of all those things.

bzr has a cache of decoded strings in it precisely because decode is slow.
We accept slowness encoding to the user's locale because that's typically much less data to examine than we've examined while generating the commit/diff/whatever. We also face memory pressure on a regular basis, and that has been, at least partly, due to UCS4 - our translation cache helps there because we have fewer duplicate UCS4 strings.

You're welcome to dig deeper into this, but I don't have more detail paged into my head at the moment.

-Rob

From janssen at parc.com Wed Jun 23 03:56:51 2010
From: janssen at parc.com (Bill Janssen)
Date: Tue, 22 Jun 2010 18:56:51 PDT
Subject: [Python-Dev] red buildbots on 2.7
In-Reply-To: <73196.1277143019@parc.com>
References: <73196.1277143019@parc.com>
Message-ID: <14929.1277258211@parc.com>

Bill Janssen wrote:

> Considering that we've just released 2.7rc2, there are an awful lot of
> red buildbots for 2.7. In fact, I don't remember having seen a green
> buildbot for OS X and 2.7. Shouldn't these be fixed?

Thanks to some action by Ronald, my two PPC OS X buildbots are now showing green for the trunk.

Bill

From fdrake at acm.org Wed Jun 23 03:58:07 2010
From: fdrake at acm.org (Fred Drake)
Date: Tue, 22 Jun 2010 21:58:07 -0400
Subject: [Python-Dev] UserDict in 2.7
In-Reply-To:
References: <58CEF265-1B25-4FD6-9C45-88353A0AF0E7@gmail.com> <4C21412A.9030709@canterbury.ac.nz> <4C214040.20304@voidspace.org.uk>
Message-ID:

On Tue, Jun 22, 2010 at 7:17 PM, Raymond Hettinger wrote:
> Benjamin fixed the UserDict and ABC problem earlier today in r82155.
> It is now the same as it was in Py2.6.

Thanks, Benjamin!

-Fred

--
Fred L. Drake, Jr.
"A storm broke loose in my mind." --Albert Einstein

From stephen at xemacs.org Wed Jun 23 08:44:28 2010
From: stephen at xemacs.org (Stephen J.
Turnbull) Date: Wed, 23 Jun 2010 15:44:28 +0900 Subject: [Python-Dev] bytes / unicode In-Reply-To: References: <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100620234723.600ad4a8@pitrou.net> <87wrtsd44p.fsf@uwakimon.sk.tsukuba.ac.jp> <87631c4bca.fsf@uwakimon.sk.tsukuba.ac.jp> <20100621165611.GW5787@unaka.lan> <87r5jz3h8u.fsf@uwakimon.sk.tsukuba.ac.jp> <20100622055040.GE5787@unaka.lan> <87d3vj2tj2.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <871vbyp7sj.fsf@uwakimon.sk.tsukuba.ac.jp> Ian Bicking writes: > Just for perspective, I don't know if I've ever wanted to deal with a URL > like that. Ditto, I do many times a day for Japanese media sites and Wikipedia. > I know how it is supposed to work, and I know what a browser does > with that, but so many tools will clean that URL up *or* won't be > able to deal with it at all that it's not something I'll be passing > around. I'm not suggesting that is something you want to be "passing around"; it's a presentation form, and I prefer that the internal form use Unicode. > While it's nice to be correct about encodings, sometimes it is > impractical. And it is far nicer to avoid the situation entirely. But you cannot avoid it entirely. Processing bytes mean you are assuming ASCII compatibility. Granted, this is a pretty good assumption, especially if you got the bytes off the wire, but it's not universally so. Maybe it's a YAGNI, but one reason I prefer the decode-process-encode paradigm is that choice of codec is a specification of the assumptions you're making about encoding. So the Know-Nothing codec described above assumes just enough ASCII compatibility to parse the scheme. You could also have codecs which assume just enough ASCII compatibility to parse a hierarchical scheme, etc. > That is, decoding content you don't care about isn't just > inefficient, it's complicated and can introduce errors. That depends on the codec(s) used. 
> Similarly I'd expect (from experience) that a programmer using > Python to want to take the same approach, sticking with unencoded > data in nearly all situations. Indeed, a programmer using Python 2 would want to do so, because all her literal strings are bytes by default (ie, if she doesn't mark them with `u'), and interactive input is, too. This is no longer so obvious in Python 3 which takes the attitude that things that are expected to be human-readable should be processed as str. The obvious example in URI space is the file:/// URL, which you'll typically build up from a user string or a file browser, which will call the os.path stuff which returns str. Text editors and viewers will also use str for their buffers, and if they provide a way to fish out URIs for their users, they'll probably return str. I won't pretend to judge the relative importance of such use cases. But use cases for urllib which naturally favor str until you put the URI on the wire do exist, as does the debugging presentation aspect. From ronaldoussoren at mac.com Wed Jun 23 08:08:13 2010 From: ronaldoussoren at mac.com (Ronald Oussoren) Date: Wed, 23 Jun 2010 08:08:13 +0200 Subject: [Python-Dev] red buildbots on 2.7 In-Reply-To: References: <73196.1277143019@parc.com> <75635.1277147585@parc.com> <20100621212904.7bec83f6@pitrou.net> <77297.1277150242@parc.com> <1277150570.3369.1.camel@localhost.localdomain> <4C1FC7E6.5070707@voidspace.org.uk> <4C1FD5D6.7070007@v.loewis.de> <4C1FD84B.3030202@voidspace.org.uk> <4C1FDB65.4020503@v.loewis.de> <4C1FDF1C.2060308@voidspace.org.uk> <4C1FE4AF.80009@v.loewis.de> Message-ID: On 22 Jun, 2010, at 19:05, Alexander Belopolsky wrote: > On Tue, Jun 22, 2010 at 12:39 PM, Ronald Oussoren > wrote: > .. >> Both are valid fixes, both have both advantages and disadvantages. 
>>
>> Your proposal:
>> * Reverts to the behavior in 2.6
>> * Ensures that posix.getgroups and posix.setgroups are internally consistent
>>
> It is also very simple and since posix module worked fine on OSX for
> years without _DARWIN_C_SOURCE, I think this is a very low risk
> change.

I don't agree. The patch itself is pretty simple, but it does make a rather significant change to the build process: the compile-time environment in configure would be different than during the compilation of posixmodule. That is, the configure tests that check for features (the HAVE_FOOBAR macros in pyconfig.h) would use _DARWIN_C_SOURCE while posixmodule itself wouldn't. This may lead to subtle bugs, or even compile errors (because some function definitions change when _DARWIN_C_SOURCE is active).

And man compat(5) says:

32-BIT COMPILATION

Defining _NONSTD_SOURCE causes library and kernel calls to behave as closely to Mac OS X 10.3's library and kernel calls as possible. Any behavioral changes in this mode are documented in the LEGACY sections of the individual function calls.

Defining _POSIX_C_SOURCE or _DARWIN_C_SOURCE causes library and kernel calls to conform to the SUSv3 standards even if doing so would alter the behavior of functions used in 10.3. Defining _POSIX_C_SOURCE also removes functions, types, and other interfaces that are not part of SUSv3 from the normal C namespace, unless _DARWIN_C_SOURCE is also defined (i.e., _DARWIN_C_SOURCE is _POSIX_C_SOURCE with non-POSIX extensions). In any of these cases, the _DARWIN_FEATURE_UNIX_CONFORMANCE feature macro will be defined to the SUS conformance level (it is undefined otherwise).

Starting in Mac OS X 10.5, if none of the macros _NONSTD_SOURCE, _POSIX_C_SOURCE or _DARWIN_C_SOURCE are defined, and the environment variable MACOSX_DEPLOYMENT_TARGET is either undefined or set to 10.5 or greater (or equivalently, the gcc(1) option -mmacosx-version-min is either not specified or set to 10.5 or greater), then UNIX conformance will be on by default, and non-POSIX extensions will also be available (this is the equivalent of defining _DARWIN_C_SOURCE). For version values less than 10.5, UNIX conformance will be off (the equivalent of defining _NONSTD_SOURCE).

My interpretation of that is that _DARWIN_C_SOURCE should be used to get SUSv3 APIs while keeping access to darwin-specific APIs as well. When you deploy to 10.5 or later the compiler will set _DARWIN_C_SOURCE for you unless you set one of the other feature selecting defines.

>
>> My proposal:
>> * Uses the newer ABI, which is more likely to be the one Apple wants you to use
>
> I don't think so. In getgroups(2) I see
>
> LEGACY DESCRIPTION
> If _DARWIN_C_SOURCE is defined, getgroups() can return more than
> {NGROUPS_MAX} groups.
>
> This suggests that this is legacy behavior. Newer applications should
> use getgrouplist instead.

I honestly don't know why this is in the LEGACY DESCRIPTION. But as the functionality you get with _DARWIN_C_SOURCE was added later I'd say that the behavior is intentional and not legacy.

By not defining _DARWIN_C_SOURCE we don't necessarily get full UNIX behavior for other APIs.

>
>> * Is compatible with system tools (that is, posix.getgroups() agrees with id(1))
>
> I have not tested this recently, but I think if you exec id from a
> program after a call to setgroups(), it will return process groups,
> not user groups.
>
>> * Is compatible with /usr/bin/python
>
> I am sure that once this issue is fixed upstream, Apple will pick it up
> with the next version.

Haha.
Apple explicitly added patches to get the current behavior instead of the default, so what makes you think that they'll revert to the older behavior?

>
>> * results in posix.getgroups not reflecting results of posix.setgroups
>>
>
> This effectively substitutes getgrouplist called on the current user
> for getgroups. In 3.x, I believe the correct action will be to
> provide direct access to getgrouplist which is while not POSIX (yet?),
> is widely available.

I don't mind adding getgrouplist, but that issue is separate from this one. BTW, apparently getgrouplist is posix ( ), although this isn't a requirement for being added to the posix module.

It is still my opinion that the second option is preferable for better compatibility with system tools, even if the patch is more complicated and the library function we use can be considered to be broken.

Ronald

From stephen at xemacs.org Wed Jun 23 09:07:50 2010
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Wed, 23 Jun 2010 16:07:50 +0900
Subject: [Python-Dev] bytes / unicode
In-Reply-To: <0D1D2134-2CF9-4F93-BE82-912C5297D36F@fuhm.net>
References: <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100620234723.600ad4a8@pitrou.net> <87wrtsd44p.fsf@uwakimon.sk.tsukuba.ac.jp> <87631c4bca.fsf@uwakimon.sk.tsukuba.ac.jp> <20100621165611.GW5787@unaka.lan> <87r5jz3h8u.fsf@uwakimon.sk.tsukuba.ac.jp> <20100622055040.GE5787@unaka.lan> <87d3vj2tj2.fsf@uwakimon.sk.tsukuba.ac.jp> <0D1D2134-2CF9-4F93-BE82-912C5297D36F@fuhm.net>
Message-ID: <87zkymns55.fsf@uwakimon.sk.tsukuba.ac.jp>

James Y Knight writes:

> The surrogateescape method is a nice workaround for this, but I can't
> help thinking that it might've been better to just treat stuff as
> possibly-invalid-but-probably-utf8 byte-strings from input, through
> processing, to output.
This is the world we already have, modulo s/utf8/ascii + random GR charset/. It doesn't work, and it can't, in Japan or China or Korea, and probably not in Russia or Kazakhstan, for some time yet. That's not to say that byte-oriented processing doesn't have its place. And in many cases it's reasonable (but not secure or bulletproof!) to assume ASCII compatibility of the byte stream, passing through syntactically unimportant bytes verbatim. Syntactic analysis of such streams will surely have a lot in common with that for text streams, so the same tools should be available. (That's the point of Guido's endorsement of polymorphism, AIUI.) But it's just not reasonable to assume that will work in a context where text streams from various sources are mixed with byte streams. In that case, the byte streams need to be converted to text before mixing. (You can't do it the other way around because there is no guarantee that the text is compatible with the current encoding of the byte stream, nor that all the byte streams have the same encoding.) We do need str-based implementations of modules like urllib. From mal at egenix.com Wed Jun 23 11:18:23 2010 From: mal at egenix.com (M.-A. Lemburg) Date: Wed, 23 Jun 2010 11:18:23 +0200 Subject: [Python-Dev] bytes / unicode In-Reply-To: References: <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100620234723.600ad4a8@pitrou.net> <87wrtsd44p.fsf@uwakimon.sk.tsukuba.ac.jp> <87631c4bca.fsf@uwakimon.sk.tsukuba.ac.jp> <20100621165611.GW5787@unaka.lan> <87r5jz3h8u.fsf@uwakimon.sk.tsukuba.ac.jp> <20100622055040.GE5787@unaka.lan> <4C20FC54.9000608@egenix.com> Message-ID: <4C21D15F.8070304@egenix.com> Nick Coghlan wrote: > On Wed, Jun 23, 2010 at 4:09 AM, M.-A. 
Lemburg wrote:
>> It would be great if we could have something like the above as
>> builtin method:
>>
>>   x.split('&'.as(x))
>
> As per my other message, another possible (and reasonably intuitive)
> spelling would be:
>
>   x.split(x.coerce('&'))

You are right: there are two ways to adapt one object to another. You can either adapt object 1 to object 2 or object 2 to object 1. This is what the Python2 coercion protocol does for operators. I just wanted to avoid using that term, since Python3 removes the coercion protocol.

> Writing it as a helper function is also possible, although it would be
> trickier to remember the correct argument ordering:
>
>   def coerce_to(target, obj, encoding=None, errors='surrogateescape'):
>       if isinstance(obj, type(target)):
>           return obj
>       if encoding is None:
>           encoding = sys.getdefaultencoding()
>       try:
>           convert = obj.decode
>       except AttributeError:
>           convert = obj.encode
>       return convert(encoding, errors)
>
>   x.split(coerce_to(x, '&'))
>
>> Perhaps something to discuss on the language summit at EuroPython.
>>
>> Too bad we can't add such porting enhancements to Python2 anymore.
>
> Well, we can if we really want to, it just entails convincing Benjamin
> to reschedule the 2.7 final release. Given the UserDict/ABC/old-style
> classes issue, there's a fair chance there's going to be at least one
> more 2.7 RC anyway.
>
> That said, since this kind of coercion can be done in a helper
> function, that should be adequate for the 2.x to 3.x conversion case
> (for 2.x, the helper function can be defined to just return the second
> argument since bytes and str are the same type, while the 3.x version
> would look something like the code above)

True.

Note that the point of using a builtin method was to get better performance. Such type adaptations are often needed in loops, so adding a few extra Python function calls just to convert a str object to a bytes object or vice-versa is a bit too much overhead.
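
For reference, the helper quoted above can be written as a self-contained Python 3 snippet along the following lines. This is a sketch, not the thread's final word: it keeps the quoted signature and defaults, but swaps the attribute probing for an explicit `isinstance` check on `bytes`, and the `split_query` wrapper is a hypothetical name added only to show the call site.

```python
import sys


def coerce_to(target, obj, encoding=None, errors="surrogateescape"):
    """Return obj converted to the same string type (str or bytes) as target."""
    if isinstance(obj, type(target)):
        return obj                            # already the right type
    if encoding is None:
        encoding = sys.getdefaultencoding()   # 'utf-8' on Python 3
    if isinstance(obj, bytes):
        return obj.decode(encoding, errors)   # bytes -> str
    return obj.encode(encoding, errors)       # str -> bytes


def split_query(x):
    """Hypothetical caller: split on '&' whether x is str or bytes."""
    return x.split(coerce_to(x, "&"))
```

The same `split_query` then accepts both `"a=1&b=2"` and `b"a=1&b=2"`, at the cost of one extra Python-level call per operation, which is the overhead the message above is concerned about.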
--
Marc-Andre Lemburg
eGenix.com

From cesare.di.mauro at gmail.com Wed Jun 23 12:12:36 2010
From: cesare.di.mauro at gmail.com (Cesare Di Mauro)
Date: Wed, 23 Jun 2010 12:12:36 +0200
Subject: [Python-Dev] WPython 1.1 was released
Message-ID:

I've released WPython 1.1, which brings many optimizations and
refactorings.

The project is hosted at Google Code: http://code.google.com/p/wpython2/
and is available as a Mercurial repository:
http://code.google.com/p/wpython2/source/checkout?repo=wpython11 .

In the download section, http://code.google.com/p/wpython2/downloads/list ,
there are the slides from the last Italian PyCon, where I presented the
project and illustrated the changes.

You can also download the binaries for Windows (compressed in 7-Zip
format: http://www.7-zip.org/ ) and the sources (for Unix users,
Parser/Python.asdl and the configure files need to be chmod +x).

Attached are some benchmarks run with the Unladen Swallow test suite
(against Python 2.6.4).

Regards,
Cesare

-------------- next part --------------
An HTML attachment was scrubbed...
-------------- next part -------------- Report on Darwin iMac-di-Mirco.local 10.3.0 Darwin Kernel Version 10.3.0: Fri Feb 26 11:57:13 PST 2010; root:xnu-1504.3.12~1/RELEASE_X86_64 x86_64 i386 Total CPU cores: 2 ### 2to3 ### 29.085133 -> 25.601404: 1.1361x faster ### bzr_startup ### Min: 0.204419 -> 0.096856: 2.1105x faster Avg: 0.213686 -> 0.113666: 1.8799x faster Significant (t=71.767819) Stddev: 0.01277 -> 0.00559: 2.2833x smaller Timeline: http://tinyurl.com/y7qgndp ### call_method ### Min: 0.644754 -> 0.622001: 1.0366x faster Avg: 0.806862 -> 0.725472: 1.1122x faster Significant (t=11.301638) Stddev: 0.07300 -> 0.04951: 1.4744x smaller Timeline: http://tinyurl.com/y3rfsnu ### call_method_slots ### Min: 0.626559 -> 0.589525: 1.0628x faster Avg: 0.761122 -> 0.680558: 1.1184x faster Significant (t=11.706336) Stddev: 0.06496 -> 0.05371: 1.2093x smaller Timeline: http://tinyurl.com/y7kkg9m ### call_method_unknown ### Min: 0.669814 -> 0.593711: 1.1282x faster Avg: 0.883463 -> 0.746100: 1.1841x faster Significant (t=8.601215) Stddev: 0.13619 -> 0.14039: 1.0308x larger Timeline: http://tinyurl.com/y6u5qut ### call_simple ### Min: 0.486911 -> 0.435191: 1.1188x faster Avg: 0.700634 -> 0.590928: 1.1857x faster Significant (t=9.030587) Stddev: 0.12218 -> 0.08491: 1.4390x smaller Timeline: http://tinyurl.com/y2pnbfz ### float ### Min: 0.126226 -> 0.097072: 1.3003x faster Avg: 0.174486 -> 0.164656: 1.0597x faster Significant (t=2.822244) Stddev: 0.02922 -> 0.04668: 1.5976x larger Timeline: http://tinyurl.com/y3o7gko ### hg_startup ### Min: 0.057444 -> 0.042930: 1.3381x faster Avg: 0.067769 -> 0.050515: 1.3416x faster Significant (t=109.019677) Stddev: 0.00293 -> 0.00199: 1.4687x smaller Timeline: http://tinyurl.com/y5ss3l9 ### html5lib ### Min: 16.410586 -> 15.971322: 1.0275x faster Avg: 16.579096 -> 16.119135: 1.0285x faster Significant (t=5.554462) Stddev: 0.13844 -> 0.12297: 1.1258x smaller Timeline: http://tinyurl.com/yya44oj ### html5lib_warmup ### Min: 17.765242
-> 15.582871: 1.1400x faster Avg: 17.968972 -> 16.065290: 1.1185x faster Significant (t=10.236030) Stddev: 0.28980 -> 0.29826: 1.0292x larger Timeline: http://tinyurl.com/y7osmkp ### iterative_count ### Min: 0.156827 -> 0.084917: 1.8468x faster Avg: 0.166389 -> 0.090218: 1.8443x faster Significant (t=26.855602) Stddev: 0.01766 -> 0.00950: 1.8586x smaller Timeline: http://tinyurl.com/y2kz25f ### nbody ### Min: 0.498760 -> 0.427710: 1.1661x faster Avg: 0.515754 -> 0.445318: 1.1582x faster Significant (t=22.964790) Stddev: 0.01500 -> 0.01566: 1.0442x larger Timeline: http://tinyurl.com/y7b92bm ### normal_startup ### Min: 0.534059 -> 0.817747: 1.5312x slower Avg: 0.547493 -> 0.838141: 1.5309x slower Significant (t=-127.297104) Stddev: 0.00799 -> 0.01403: 1.7567x larger Timeline: http://tinyurl.com/y5tfkm3 ### nqueens ### Min: 0.583106 -> 0.573619: 1.0165x faster Avg: 0.611182 -> 0.595222: 1.0268x faster Significant (t=3.893252) Stddev: 0.02367 -> 0.01674: 1.4142x smaller Timeline: http://tinyurl.com/y79zhpz ### pickle ### Min: 1.660705 -> 1.576223: 1.0536x faster Avg: 1.757750 -> 1.672262: 1.0511x faster Significant (t=9.284162) Stddev: 0.04774 -> 0.04427: 1.0785x smaller Timeline: http://tinyurl.com/y2f3eee ### pickle_dict ### Min: 1.389026 -> 1.468648: 1.0573x slower Avg: 1.479180 -> 1.551554: 1.0489x slower Significant (t=-7.056610) Stddev: 0.05664 -> 0.04529: 1.2507x smaller Timeline: http://tinyurl.com/y2kl4no ### pickle_list ### Min: 0.802236 -> 0.780976: 1.0272x faster Avg: 0.843450 -> 0.822717: 1.0252x faster Significant (t=3.353898) Stddev: 0.02861 -> 0.03305: 1.1554x larger Timeline: http://tinyurl.com/y2csxb9 ### pybench ### Min: 4906 -> 4344: 1.1294x faster Avg: 5235 -> 4618: 1.1336x faster ### regex_compile ### Min: 0.757385 -> 0.663902: 1.1408x faster Avg: 0.807480 -> 0.698190: 1.1565x faster Significant (t=20.304562) Stddev: 0.03027 -> 0.02308: 1.3116x smaller Timeline: http://tinyurl.com/y5vmu5y ### regex_effbot ### Min: 0.102901 -> 0.095138: 1.0816x 
faster Avg: 0.109344 -> 0.102460: 1.0672x faster Significant (t=5.515715) Stddev: 0.00574 -> 0.00670: 1.1678x larger Timeline: http://tinyurl.com/yyhbuzh ### regex_v8 ### Min: 0.123948 -> 0.106031: 1.1690x faster Avg: 0.128534 -> 0.111830: 1.1494x faster Significant (t=16.677634) Stddev: 0.00436 -> 0.00558: 1.2787x larger Timeline: http://tinyurl.com/y2zrssn ### richards ### Min: 0.354665 -> 0.287113: 1.2353x faster Avg: 0.381205 -> 0.306374: 1.2442x faster Significant (t=23.417400) Stddev: 0.01926 -> 0.01182: 1.6294x smaller Timeline: http://tinyurl.com/yyzqb7v ### slowpickle ### Min: 0.753230 -> 0.664495: 1.1335x faster Avg: 0.801162 -> 0.708291: 1.1311x faster Significant (t=17.994391) Stddev: 0.02267 -> 0.02860: 1.2612x larger Timeline: http://tinyurl.com/y4z6poh ### slowspitfire ### Min: 0.868708 -> 0.872393: 1.0042x slower Avg: 0.971014 -> 0.919428: 1.0561x faster Significant (t=4.503573) Stddev: 0.07780 -> 0.02253: 3.4529x smaller Timeline: http://tinyurl.com/y64sn8p ### slowunpickle ### Min: 0.337317 -> 0.299357: 1.1268x faster Avg: 0.353311 -> 0.313929: 1.1254x faster Significant (t=19.034627) Stddev: 0.01066 -> 0.01002: 1.0629x smaller Timeline: http://tinyurl.com/y3symau ### startup_nosite ### Min: 0.317232 -> 0.224719: 1.4117x faster Avg: 0.333151 -> 0.235118: 1.4170x faster Significant (t=95.671333) Stddev: 0.00851 -> 0.00571: 1.4919x smaller Timeline: http://tinyurl.com/yyvr8m5 ### threaded_count ### Min: 0.194147 -> 0.116080: 1.6725x faster Avg: 0.216559 -> 0.139140: 1.5564x faster Significant (t=50.972602) Stddev: 0.00765 -> 0.00753: 1.0162x smaller Timeline: http://tinyurl.com/y38bz5h ### unpack_sequence ### Min: 0.000093 -> 0.000082: 1.1337x faster Avg: 0.000098 -> 0.000086: 1.1343x faster Significant (t=25.434812) Stddev: 0.00007 -> 0.00008: 1.1129x larger Timeline: http://tinyurl.com/y5hv9ck ### unpickle ### Min: 1.102754 -> 1.015811: 1.0856x faster Avg: 1.138448 -> 1.052802: 1.0814x faster Significant (t=18.018135) Stddev: 0.02248 -> 0.02499: 
1.1118x larger Timeline: http://tinyurl.com/y49x4pk ### unpickle_list ### Min: 0.990238 -> 0.881112: 1.1239x faster Avg: 1.043900 -> 0.933968: 1.1177x faster Significant (t=21.205782) Stddev: 0.02977 -> 0.02139: 1.3913x smaller Timeline: http://tinyurl.com/y49pm9p Report on Linux cionci-desktop 2.6.27-17-generic #1 SMP Fri Mar 12 02:08:25 UTC 2010 x86_64 Total CPU cores: 2 ### 2to3 ### 27.729733 -> 25.521595: 1.0865x faster ### bzr_startup ### Min: 0.072004 -> 0.068004: 1.0588x faster Avg: 0.094326 -> 0.091926: 1.0261x faster Not significant Stddev: 0.00883 -> 0.00958: 1.0851x larger Timeline: http://tinyurl.com/y5zc5ca ### call_method ### Min: 0.630349 -> 0.566228: 1.1132x faster Avg: 0.655913 -> 0.574280: 1.1421x faster Significant (t=54.712328) Stddev: 0.01462 -> 0.01096: 1.3344x smaller Timeline: http://tinyurl.com/y6eg77c ### call_method_slots ### Min: 0.635804 -> 0.511669: 1.2426x faster Avg: 0.660014 -> 0.528936: 1.2478x faster Significant (t=69.342882) Stddev: 0.01859 -> 0.01380: 1.3470x smaller Timeline: http://tinyurl.com/y7p9esb ### call_method_unknown ### Min: 0.766309 -> 0.562713: 1.3618x faster Avg: 0.774030 -> 0.585773: 1.3214x faster Significant (t=90.713925) Stddev: 0.00759 -> 0.02426: 3.1937x larger Timeline: http://tinyurl.com/y6y6w7a ### call_simple ### Min: 0.498106 -> 0.451661: 1.1028x faster Avg: 0.502283 -> 0.460072: 1.0917x faster Significant (t=62.530336) Stddev: 0.00738 -> 0.00373: 1.9763x smaller Timeline: http://tinyurl.com/y5gt8qa ### float ### Min: 0.117934 -> 0.102821: 1.1470x faster Avg: 0.129057 -> 0.117482: 1.0985x faster Significant (t=12.577691) Stddev: 0.00811 -> 0.01208: 1.4897x larger Timeline: http://tinyurl.com/y2pc4wj ### hg_startup ### Min: 0.012000 -> 0.012001: 1.0001x slower Avg: 0.033594 -> 0.032258: 1.0414x faster Significant (t=3.596547) Stddev: 0.00597 -> 0.00578: 1.0320x smaller Timeline: http://tinyurl.com/y449a8r ### html5lib ### Min: 16.581036 -> 15.668980: 1.0582x faster Avg: 16.823451 -> 15.946597: 1.0550x 
faster Significant (t=4.738181) Stddev: 0.22787 -> 0.34542: 1.5159x larger Timeline: http://tinyurl.com/y3wx52k ### html5lib_warmup ### Min: 16.436294 -> 15.664941: 1.0492x faster Avg: 16.810495 -> 15.983748: 1.0517x faster Significant (t=2.827967) Stddev: 0.43953 -> 0.48388: 1.1009x larger Timeline: http://tinyurl.com/y74vue8 ### iterative_count ### Min: 0.189088 -> 0.083317: 2.2695x faster Avg: 0.191612 -> 0.088073: 2.1756x faster Significant (t=65.385891) Stddev: 0.00501 -> 0.01001: 1.9975x larger Timeline: http://tinyurl.com/y65yy5c ### nbody ### Min: 0.568523 -> 0.426052: 1.3344x faster Avg: 0.580190 -> 0.428620: 1.3536x faster Significant (t=72.626477) Stddev: 0.01450 -> 0.00273: 5.3178x smaller Timeline: http://tinyurl.com/y5hbwsy ### normal_startup ### Min: 0.420100 -> 0.408876: 1.0275x faster Avg: 0.475876 -> 0.489076: 1.0277x slower Not significant Stddev: 0.04082 -> 0.05543: 1.3579x larger Timeline: http://tinyurl.com/y5jdfgq ### nqueens ### Min: 0.585605 -> 0.577289: 1.0144x faster Avg: 0.603038 -> 0.594904: 1.0137x faster Significant (t=2.026307) Stddev: 0.01851 -> 0.02152: 1.1629x larger Timeline: http://tinyurl.com/yydzdhw ### pickle ### Min: 1.592286 -> 1.584492: 1.0049x faster Avg: 1.611001 -> 1.606726: 1.0027x faster Not significant Stddev: 0.01343 -> 0.03570: 2.6586x larger Timeline: http://tinyurl.com/yyax7wc ### pickle_dict ### Min: 1.316577 -> 1.298239: 1.0141x faster Avg: 1.320249 -> 1.311228: 1.0069x faster Significant (t=3.270732) Stddev: 0.00367 -> 0.01915: 5.2196x larger Timeline: http://tinyurl.com/y2smb8n ### pickle_list ### Min: 0.734164 -> 0.727414: 1.0093x faster Avg: 0.749225 -> 0.738023: 1.0152x faster Significant (t=3.523434) Stddev: 0.01996 -> 0.01035: 1.9291x smaller Timeline: http://tinyurl.com/yybbuct ### pybench ### Min: 5133 -> 4264: 1.2038x faster Avg: 5370 -> 4448: 1.2073x faster ### regex_compile ### Min: 0.783521 -> 0.706420: 1.1091x faster Avg: 0.805385 -> 0.743189: 1.0837x faster Significant (t=14.697890) Stddev: 
0.01900 -> 0.02312: 1.2168x larger Timeline: http://tinyurl.com/y4ng9oz ### regex_effbot ### Min: 0.106946 -> 0.108064: 1.0105x slower Avg: 0.108937 -> 0.112714: 1.0347x slower Significant (t=-4.189386) Stddev: 0.00158 -> 0.00618: 3.9173x larger Timeline: http://tinyurl.com/y2xs6yp ### regex_v8 ### Min: 0.114305 -> 0.110961: 1.0301x faster Avg: 0.119100 -> 0.113885: 1.0458x faster Significant (t=6.210478) Stddev: 0.00525 -> 0.00278: 1.8876x smaller Timeline: http://tinyurl.com/y5q2nlh ### richards ### Min: 0.376030 -> 0.309641: 1.2144x faster Avg: 0.389031 -> 0.314998: 1.2350x faster Significant (t=29.499544) Stddev: 0.01745 -> 0.00325: 5.3669x smaller Timeline: http://tinyurl.com/y5rh4av ### slowpickle ### Min: 0.800369 -> 0.711095: 1.1255x faster Avg: 0.824734 -> 0.735770: 1.1209x faster Significant (t=19.434640) Stddev: 0.02554 -> 0.01989: 1.2842x smaller Timeline: http://tinyurl.com/y79lh35 ### slowspitfire ### Min: 0.813913 -> 0.761560: 1.0687x faster Avg: 0.829754 -> 0.841118: 1.0137x slower Not significant Stddev: 0.01202 -> 0.05522: 4.5958x larger Timeline: http://tinyurl.com/y4y6f4x ### slowunpickle ### Min: 0.369238 -> 0.296829: 1.2439x faster Avg: 0.384044 -> 0.300151: 1.2795x faster Significant (t=32.788791) Stddev: 0.01766 -> 0.00391: 4.5186x smaller Timeline: http://tinyurl.com/y84c2bp ### startup_nosite ### Min: 0.173227 -> 0.183291: 1.0581x slower Avg: 0.234029 -> 0.235226: 1.0051x slower Not significant Stddev: 0.02222 -> 0.01951: 1.1389x smaller Timeline: http://tinyurl.com/y2esfmd ### threaded_count ### Min: 0.203453 -> 0.084667: 2.4030x faster Avg: 0.263979 -> 0.105661: 2.4984x faster Significant (t=26.001645) Stddev: 0.03833 -> 0.01960: 1.9552x smaller Timeline: http://tinyurl.com/y74qvbf ### unpack_sequence ### Min: 0.000116 -> 0.000108: 1.0728x faster Avg: 0.000121 -> 0.000118: 1.0261x faster Significant (t=13.346440) Stddev: 0.00004 -> 0.00004: 1.0544x larger Timeline: http://tinyurl.com/y6rld7k ### unpickle ### Min: 0.919231 -> 0.922668: 
1.0037x slower Avg: 0.936096 -> 0.947798: 1.0125x slower Significant (t=-3.379601) Stddev: 0.01505 -> 0.01931: 1.2834x larger Timeline: http://tinyurl.com/y3ymn85 ### unpickle_list ### Min: 0.690399 -> 0.690025: 1.0005x faster Avg: 0.729519 -> 0.698789: 1.0440x faster Significant (t=11.660568) Stddev: 0.01430 -> 0.01195: 1.1965x smaller Timeline: http://tinyurl.com/y38lfuh Report on Linux sauron 2.6.33-ARCH #1 SMP PREEMPT Sun Apr 4 10:27:30 CEST 2010 x86_64 AMD Athlon(tm) 64 X2 Dual Core Processor 5200+ Total CPU cores: 2 ### 2to3 ### 29.598071 -> 23.691789: 1.2493x faster ### bzr_startup ### Min: 0.083328 -> 0.076661: 1.0870x faster Avg: 0.100727 -> 0.094061: 1.0709x faster Significant (t=5.464159) Stddev: 0.00863 -> 0.00863: 1.0000x larger Timeline: http://tinyurl.com/y6mng7k ### call_method ### Min: 0.796609 -> 0.538237: 1.4800x faster Avg: 0.816184 -> 0.547101: 1.4918x faster Significant (t=92.212665) Stddev: 0.03177 -> 0.01636: 1.9417x smaller Timeline: http://tinyurl.com/yygle37 ### call_method_slots ### Min: 0.780177 -> 0.535730: 1.4563x faster Avg: 0.797951 -> 0.544117: 1.4665x faster Significant (t=104.627536) Stddev: 0.02414 -> 0.01733: 1.3926x smaller Timeline: http://tinyurl.com/y76hawm ### call_method_unknown ### Min: 0.808852 -> 0.610603: 1.3247x faster Avg: 0.821008 -> 0.614395: 1.3363x faster Significant (t=109.946891) Stddev: 0.02158 -> 0.00800: 2.6994x smaller Timeline: http://tinyurl.com/y43e5fl ### call_simple ### Min: 0.602984 -> 0.484837: 1.2437x faster Avg: 0.627628 -> 0.508925: 1.2332x faster Significant (t=56.792486) Stddev: 0.02009 -> 0.01587: 1.2658x smaller Timeline: http://tinyurl.com/yyrerh8 ### float ### Min: 0.145489 -> 0.120753: 1.2048x faster Avg: 0.157275 -> 0.131557: 1.1955x faster Significant (t=29.200486) Stddev: 0.01020 -> 0.00948: 1.0763x smaller Timeline: http://tinyurl.com/y5h4frq ### hg_startup ### Min: 0.013332 -> 0.016666: 1.2501x slower Avg: 0.030811 -> 0.033631: 1.0915x slower Significant (t=-7.625262) Stddev: 0.00610 
-> 0.00558: 1.0933x smaller Timeline: http://tinyurl.com/y7c2vbv ### html5lib ### Min: 16.772239 -> 13.632444: 1.2303x faster Avg: 17.400199 -> 13.809100: 1.2601x faster Significant (t=19.710438) Stddev: 0.35648 -> 0.19722: 1.8075x smaller Timeline: http://tinyurl.com/y52q84h ### html5lib_warmup ### Min: 17.155307 -> 13.597860: 1.2616x faster Avg: 17.758442 -> 14.069391: 1.2622x faster Significant (t=12.638530) Stddev: 0.58006 -> 0.29922: 1.9386x smaller Timeline: http://tinyurl.com/y5ragx4 ### iterative_count ### Min: 0.272019 -> 0.144380: 1.8841x faster Avg: 0.321844 -> 0.155405: 2.0710x faster Significant (t=23.655896) Stddev: 0.04319 -> 0.02469: 1.7493x smaller Timeline: http://chart.apis.google.com/chart?cht=lc&chs=700x400&chxt=x,y,x,y&chxr=1,0,1.46044492722&chco=FF0000,0000FF&chdl=/usr/bin/python|../wpython11/python&chds=0,1.46044492722&chd=t:0.28,0.28,0.28,0.28,0.33,0.33,0.31,0.31,0.29,0.3,0.32,0.35,0.29,0.3,0.29,0.28,0.27,0.27,0.27,0.29,0.32,0.35,0.31,0.28,0.27,0.3,0.35,0.3,0.29,0.28,0.3,0.29,0.31,0.31,0.33,0.32,0.34,0.41,0.34,0.33,0.33,0.34,0.34,0.36,0.4,0.43,0.46,0.41,0.38,0.35|0.3,0.15,0.15,0.15,0.17,0.15,0.15,0.15,0.16,0.15,0.14,0.14,0.2,0.16,0.17,0.19,0.16,0.16,0.16,0.2,0.14,0.14,0.14,0.14,0.14,0.14,0.14,0.14,0.14,0.14,0.14,0.15,0.16,0.16,0.14,0.14,0.14,0.14,0.14,0.14,0.14,0.14,0.14,0.15,0.14,0.14,0.15,0.14,0.17,0.15&chxl=0:|1|10|20|30|40|50|2:||Iteration|3:||Time+(secs)&chtt=iterative_count ### nbody ### Min: 0.639303 -> 0.496505: 1.2876x faster Avg: 0.663221 -> 0.507123: 1.3078x faster Significant (t=42.102614) Stddev: 0.01815 -> 0.01892: 1.0424x larger Timeline: http://tinyurl.com/y64lglq ### normal_startup ### Min: 0.374472 -> 0.461435: 1.2322x slower Avg: 0.413358 -> 0.515210: 1.2464x slower Significant (t=-17.591195) Stddev: 0.02972 -> 0.02815: 1.0558x smaller Timeline: http://tinyurl.com/y7qj6zz ### nqueens ### Min: 0.698012 -> 0.507417: 1.3756x faster Avg: 0.748165 -> 0.559723: 1.3367x faster Significant (t=21.603119) Stddev: 0.03138 -> 
0.05310: 1.6921x larger Timeline: http://tinyurl.com/y3xv95e ### pickle ### Min: 1.584518 -> 1.526627: 1.0379x faster Avg: 1.673835 -> 1.658376: 1.0093x faster Not significant Stddev: 0.06500 -> 0.07568: 1.1644x larger Timeline: http://tinyurl.com/y4224pp ### pickle_dict ### Min: 1.568636 -> 1.498363: 1.0469x faster Avg: 1.618752 -> 1.575946: 1.0272x faster Significant (t=4.120055) Stddev: 0.04758 -> 0.05598: 1.1767x larger Timeline: http://tinyurl.com/yyzl6b5 ### pickle_list ### Min: 0.771403 -> 0.752089: 1.0257x faster Avg: 0.797367 -> 0.778438: 1.0243x faster Significant (t=3.157783) Stddev: 0.02620 -> 0.03332: 1.2721x larger Timeline: http://tinyurl.com/yyp5cjx ### pybench ### Min: 5994 -> 4470: 1.3409x faster Avg: 6250 -> 4781: 1.3073x faster ### regex_compile ### Min: 0.838116 -> 0.664657: 1.2610x faster Avg: 0.846488 -> 0.691629: 1.2239x faster Significant (t=31.710076) Stddev: 0.01236 -> 0.03224: 2.6085x larger Timeline: http://tinyurl.com/y65ceh8 ### regex_effbot ### Min: 0.169898 -> 0.152830: 1.1117x faster Avg: 0.179772 -> 0.158301: 1.1356x faster Significant (t=13.100118) Stddev: 0.00746 -> 0.00887: 1.1895x larger Timeline: http://tinyurl.com/yyazgxh ### regex_v8 ### Min: 0.152255 -> 0.134914: 1.1285x faster Avg: 0.159778 -> 0.144822: 1.1033x faster Significant (t=10.310186) Stddev: 0.00598 -> 0.00834: 1.3944x larger Timeline: http://tinyurl.com/y4znhxx ### richards ### Min: 0.361250 -> 0.281802: 1.2819x faster Avg: 0.384307 -> 0.294562: 1.3047x faster Significant (t=27.621845) Stddev: 0.02043 -> 0.01052: 1.9419x smaller Timeline: http://tinyurl.com/y3hx8w2 ### slowpickle ### Min: 0.826115 -> 0.610384: 1.3534x faster Avg: 0.872314 -> 0.627799: 1.3895x faster Significant (t=43.041072) Stddev: 0.03384 -> 0.02165: 1.5626x smaller Timeline: http://tinyurl.com/y4dr42c ### slowspitfire ### Min: 0.820168 -> 0.697804: 1.1754x faster Avg: 0.840062 -> 0.736274: 1.1410x faster Significant (t=20.687150) Stddev: 0.02540 -> 0.02477: 1.0256x smaller Timeline: 
http://tinyurl.com/y6cn2c7 ### slowunpickle ### Min: 0.423866 -> 0.306436: 1.3832x faster Avg: 0.431624 -> 0.308273: 1.4001x faster Significant (t=103.485543) Stddev: 0.00781 -> 0.00318: 2.4556x smaller Timeline: http://tinyurl.com/y7p5ugb ### startup_nosite ### Min: 0.182274 -> 0.166099: 1.0974x faster Avg: 0.201290 -> 0.185015: 1.0880x faster Significant (t=8.405736) Stddev: 0.01255 -> 0.01474: 1.1748x larger Timeline: http://tinyurl.com/y26jqjm ### threaded_count ### Min: 0.292005 -> 0.174754: 1.6710x faster Avg: 0.345331 -> 0.191805: 1.8004x faster Significant (t=48.856578) Stddev: 0.02041 -> 0.00877: 2.3267x smaller Timeline: http://tinyurl.com/y6dl2e6 ### unpack_sequence ### Min: 0.000106 -> 0.000091: 1.1684x faster Avg: 0.000114 -> 0.000099: 1.1433x faster Significant (t=21.367174) Stddev: 0.00009 -> 0.00012: 1.2958x larger Timeline: http://tinyurl.com/y2sujno ### unpickle ### Min: 0.908351 -> 0.803020: 1.1312x faster Avg: 0.984448 -> 0.856525: 1.1494x faster Significant (t=19.812585) Stddev: 0.03248 -> 0.03209: 1.0122x smaller Timeline: http://tinyurl.com/y4zmlaj ### unpickle_list ### Min: 0.754476 -> 0.719254: 1.0490x faster Avg: 0.802729 -> 0.759628: 1.0567x faster Significant (t=6.699951) Stddev: 0.03771 -> 0.02544: 1.4821x smaller Timeline: http://tinyurl.com/y6tv2us Report on Linux raffaello 2.6.31.12-0.2-desktop #1 SMP PREEMPT 2010-03-16 21:25:39 +0100 i686 athlon Total CPU cores: 1 ### 2to3 ### 43.432397 -> 43.283420: 1.0034x faster ### bzr_startup ### Min: 0.140979 -> 0.144978: 1.0284x slower Avg: 0.159606 -> 0.157596: 1.0128x faster Significant (t=2.709326) Stddev: 0.00578 -> 0.00465: 1.2418x smaller Timeline: 
http://chart.apis.google.com/chart?cht=lc&chs=700x400&chxt=x,y,x,y&chxr=1,0,1.175973&chco=FF0000,0000FF&chdl=/btrfs/src/Python-2.6.4/python|/btrfs/src/wpython2-wpython11/python&chds=0,1.175973&chd=t:0.16,0.16,0.16,0.16,0.16,0.16,0.15,0.18,0.16,0.16,0.17,0.17,0.17,0.16,0.16,0.17,0.16,0.16,0.16,0.15,0.16,0.16,0.16,0.16,0.16,0.16,0.17,0.15,0.16,0.16,0.15,0.16,0.16,0.15,0.16,0.16,0.16,0.16,0.16,0.17,0.16,0.15,0.16,0.16,0.16,0.16,0.15,0.16,0.15,0.16,0.16,0.16,0.17,0.16,0.16,0.16,0.16,0.15,0.16,0.16,0.15,0.16,0.17,0.16,0.15,0.15,0.14,0.16,0.16,0.16,0.16,0.17,0.16,0.15,0.16,0.16,0.17,0.16,0.16,0.16,0.15,0.15,0.17,0.16,0.16,0.16,0.15,0.15,0.16,0.14,0.15,0.17,0.16,0.15,0.16,0.16,0.15,0.16,0.15,0.15|0.16,0.15,0.15,0.15,0.16,0.15,0.16,0.16,0.16,0.15,0.15,0.15,0.16,0.15,0.16,0.15,0.16,0.15,0.16,0.16,0.16,0.16,0.16,0.16,0.16,0.17,0.15,0.15,0.16,0.16,0.16,0.15,0.16,0.16,0.16,0.16,0.16,0.16,0.16,0.16,0.15,0.16,0.16,0.15,0.16,0.16,0.16,0.16,0.16,0.16,0.16,0.17,0.16,0.16,0.16,0.16,0.16,0.16,0.15,0.16,0.16,0.15,0.15,0.15,0.16,0.16,0.15,0.15,0.16,0.16,0.15,0.15,0.14,0.16,0.17,0.16,0.15,0.15,0.16,0.16,0.16,0.16,0.15,0.16,0.16,0.16,0.15,0.15,0.16,0.16,0.16,0.16,0.15,0.15,0.15,0.16,0.16,0.16,0.16,0.17&chxl=0:|1|20|40|60|80|100|2:||Iteration|3:||Time+(secs)&chtt=bzr_startup ### call_method ### Min: 1.158909 -> 1.059187: 1.0941x faster Avg: 1.161172 -> 1.113055: 1.0432x faster Significant (t=22.522125) Stddev: 0.00131 -> 0.02613: 19.9944x larger Timeline: 
http://chart.apis.google.com/chart?cht=lc&chs=700x400&chxt=x,y,x,y&chxr=1,0.07623100281,2.1763420105&chco=FF0000,0000FF&chdl=/btrfs/src/Python-2.6.4/python|/btrfs/src/wpython2-wpython11/python&chds=0.07623100281,2.1763420105&chd=t:1.16,1.16,1.16,1.16,1.16,1.16,1.16,1.16,1.16,1.16,1.16,1.16,1.16,1.17,1.16,1.16,1.16,1.16,1.16,1.16,1.16,1.17,1.16,1.16,1.16,1.16,1.16,1.16,1.16,1.16,1.16,1.16,1.16,1.16,1.16,1.16,1.16,1.16,1.16,1.16,1.16,1.16,1.16,1.16,1.16,1.16,1.16,1.16,1.16,1.16,1.16,1.16,1.16,1.16,1.16,1.16,1.16,1.16,1.16,1.16,1.16,1.16,1.16,1.16,1.16,1.16,1.16,1.16,1.16,1.16,1.16,1.16,1.16,1.16,1.16|1.13,1.16,1.13,1.11,1.11,1.14,1.16,1.08,1.1,1.14,1.14,1.14,1.1,1.13,1.15,1.15,1.13,1.14,1.17,1.11,1.1,1.11,1.14,1.11,1.13,1.11,1.14,1.11,1.11,1.15,1.13,1.18,1.16,1.1,1.1,1.1,1.12,1.08,1.11,1.09,1.09,1.09,1.12,1.16,1.08,1.1,1.08,1.12,1.13,1.15,1.14,1.16,1.13,1.14,1.16,1.09,1.14,1.15,1.13,1.11,1.1,1.09,1.1,1.1,1.11,1.11,1.11,1.14,1.15,1.12,1.13,1.14,1.16,1.14,1.09&chxl=0:|1|15|30|45|60|75|2:||Iteration|3:||Time+(secs)&chtt=call_method ### call_method_slots ### Min: 1.149059 -> 1.078626: 1.0653x faster Avg: 1.151797 -> 1.143283: 1.0074x faster Significant (t=3.330294) Stddev: 0.00124 -> 0.03128: 25.1750x larger Timeline: 
http://chart.apis.google.com/chart?cht=lc&chs=700x400&chxt=x,y,x,y&chxr=1,0.09424901009,2.22079586983&chco=FF0000,0000FF&chdl=/btrfs/src/Python-2.6.4/python|/btrfs/src/wpython2-wpython11/python&chds=0.09424901009,2.22079586983&chd=t:1.15,1.16,1.15,1.15,1.15,1.15,1.15,1.15,1.15,1.15,1.15,1.15,1.16,1.16,1.15,1.15,1.15,1.15,1.15,1.15,1.15,1.15,1.15,1.15,1.15,1.15,1.15,1.15,1.15,1.15,1.15,1.15,1.15,1.15,1.15,1.15,1.15,1.15,1.15,1.15,1.15,1.15,1.15,1.15,1.15,1.15,1.15,1.15,1.15,1.15,1.15,1.15,1.15,1.15,1.15,1.15,1.15,1.15,1.15,1.15,1.15,1.15,1.15,1.15,1.15,1.15,1.15,1.15,1.15,1.15,1.15,1.15,1.15,1.15,1.15|1.16,1.13,1.2,1.16,1.16,1.15,1.17,1.13,1.14,1.13,1.16,1.19,1.13,1.17,1.17,1.14,1.11,1.14,1.19,1.2,1.17,1.22,1.2,1.14,1.14,1.15,1.21,1.16,1.19,1.1,1.15,1.13,1.15,1.13,1.09,1.18,1.18,1.14,1.13,1.13,1.12,1.15,1.18,1.17,1.19,1.21,1.19,1.19,1.22,1.18,1.18,1.17,1.16,1.16,1.18,1.18,1.16,1.16,1.16,1.16,1.17,1.16,1.14,1.13,1.12,1.14,1.15,1.19,1.14,1.15,1.15,1.15,1.15,1.13,1.1&chxl=0:|1|15|30|45|60|75|2:||Iteration|3:||Time+(secs)&chtt=call_method_slots ### call_method_unknown ### Min: 1.170848 -> 1.155544: 1.0132x faster Avg: 1.180379 -> 1.201501: 1.0179x slower Significant (t=-9.479015) Stddev: 0.01125 -> 0.02487: 2.2110x larger Timeline: 
http://chart.apis.google.com/chart?cht=lc&chs=700x400&chxt=x,y,x,y&chxr=1,0.17149400711,2.26189613342&chco=FF0000,0000FF&chdl=/btrfs/src/Python-2.6.4/python|/btrfs/src/wpython2-wpython11/python&chds=0.17149400711,2.26189613342&chd=t:1.19,1.2,1.21,1.19,1.2,1.2,1.18,1.17,1.18,1.17,1.17,1.19,1.21,1.2,1.17,1.17,1.17,1.17,1.19,1.19,1.17,1.18,1.17,1.18,1.17,1.2,1.17,1.22,1.2,1.19,1.2,1.19,1.2,1.19,1.19,1.21,1.2,1.19,1.2,1.2,1.19,1.19,1.19,1.19,1.19,1.2,1.21,1.19,1.17,1.19,1.18,1.17,1.19,1.17,1.17,1.19,1.17,1.18,1.17,1.17,1.17,1.17,1.18,1.17,1.19,1.17,1.18,1.18,1.17,1.18,1.17,1.17,1.17,1.19,1.2|1.26,1.24,1.21,1.2,1.22,1.23,1.22,1.21,1.22,1.24,1.23,1.2,1.21,1.25,1.23,1.2,1.19,1.19,1.2,1.19,1.2,1.24,1.2,1.19,1.2,1.21,1.24,1.22,1.24,1.19,1.18,1.2,1.21,1.18,1.2,1.21,1.2,1.17,1.19,1.19,1.22,1.2,1.2,1.19,1.2,1.2,1.18,1.2,1.23,1.24,1.25,1.23,1.21,1.19,1.2,1.24,1.24,1.21,1.23,1.24,1.23,1.24,1.18,1.2,1.19,1.21,1.23,1.24,1.25,1.24,1.23,1.24,1.23,1.2,1.2&chxl=0:|1|15|30|45|60|75|2:||Iteration|3:||Time+(secs)&chtt=call_method_unknown ### call_simple ### Min: 0.905800 -> 0.908177: 1.0026x slower Avg: 0.911217 -> 0.942381: 1.0342x slower Significant (t=-18.575059) Stddev: 0.00579 -> 0.01972: 3.4054x larger Timeline: 
http://chart.apis.google.com/chart?cht=lc&chs=700x400&chxt=x,y,x,y&chxr=1,0,1.98918581009&chco=FF0000,0000FF&chdl=/btrfs/src/Python-2.6.4/python|/btrfs/src/wpython2-wpython11/python&chds=0,1.98918581009&chd=t:0.91,0.91,0.92,0.91,0.92,0.91,0.91,0.91,0.91,0.92,0.92,0.92,0.91,0.93,0.91,0.91,0.91,0.92,0.92,0.91,0.91,0.91,0.91,0.91,0.91,0.91,0.93,0.91,0.92,0.93,0.92,0.91,0.92,0.91,0.91,0.91,0.91,0.91,0.91,0.91,0.91,0.91,0.91,0.92,0.93,0.91,0.93,0.91,0.91,0.91,0.91,0.91,0.91,0.91,0.92,0.92,0.91,0.93,0.92,0.91,0.91,0.91,0.91,0.91,0.91,0.92,0.93,0.93,0.93,0.91,0.91,0.91,0.91,0.91,0.91|0.99,0.95,0.93,0.95,0.95,0.94,0.94,0.96,0.93,0.97,0.96,0.95,0.97,0.94,0.96,0.95,0.95,0.94,0.97,0.96,0.94,0.96,0.98,0.93,0.94,0.96,0.97,0.94,0.97,0.97,0.95,0.95,0.94,0.96,0.96,0.93,0.92,0.95,0.96,0.97,0.92,0.95,0.96,0.94,0.91,0.96,0.97,0.95,0.94,0.95,0.92,0.95,0.95,0.97,0.93,0.94,0.95,0.96,0.97,0.94,0.96,0.96,0.95,0.94,0.99,0.99,0.97,0.94,0.97,0.97,0.96,0.95,0.96,0.98,0.95&chxl=0:|1|15|30|45|60|75|2:||Iteration|3:||Time+(secs)&chtt=call_simple ### float ### Min: 0.222201 -> 0.224009: 1.0081x slower Avg: 0.232227 -> 0.239783: 1.0325x slower Significant (t=-9.550820) Stddev: 0.00855 -> 0.00913: 1.0688x larger Timeline: 
http://chart.apis.google.com/chart?cht=lc&chs=700x400&chxt=x,y,x,y&chxr=1,0,1.26341700554&chco=FF0000,0000FF&chdl=/btrfs/src/Python-2.6.4/python|/btrfs/src/wpython2-wpython11/python&chds=0,1.26341700554&chd=t:0.24,0.24,0.25,0.25,0.24,0.24,0.25,0.24,0.25,0.24,0.24,0.24,0.24,0.23,0.25,0.24,0.25,0.23,0.25,0.23,0.24,0.25,0.24,0.23,0.24,0.25,0.25,0.25,0.24,0.25,0.24,0.24,0.23,0.25,0.24,0.25,0.23,0.25,0.24,0.24,0.25,0.25,0.23,0.24,0.24,0.24,0.25,0.24,0.24,0.24,0.25,0.23,0.25,0.24,0.24,0.23,0.25,0.23,0.24,0.25,0.24,0.23,0.24,0.24,0.24,0.25,0.24,0.25,0.24,0.25,0.24,0.25,0.24,0.25,0.23,0.25,0.24,0.25,0.25,0.25,0.23,0.24,0.25,0.24|0.25,0.25,0.26,0.26,0.24,0.25,0.26,0.25,0.26,0.25,0.25,0.25,0.25,0.24,0.26,0.25,0.25,0.24,0.25,0.24,0.25,0.26,0.25,0.24,0.25,0.25,0.26,0.26,0.25,0.25,0.25,0.26,0.24,0.26,0.25,0.25,0.25,0.26,0.25,0.26,0.26,0.25,0.23,0.24,0.25,0.24,0.25,0.24,0.25,0.24,0.24,0.23,0.25,0.24,0.26,0.24,0.25,0.25,0.25,0.25,0.25,0.23,0.25,0.25,0.24,0.25,0.25,0.25,0.25,0.26,0.24,0.26,0.25,0.25,0.24,0.26,0.24,0.25,0.26,0.25,0.24,0.25,0.25,0.25&chxl=0:|1|17|34|51|68|84|2:||Iteration|3:||Time+(secs)&chtt=float ### hg_startup ### Min: 0.045993 -> 0.048992: 1.0652x slower Avg: 0.057321 -> 0.056441: 1.0156x faster Significant (t=4.488042) Stddev: 0.00319 -> 0.00301: 1.0620x smaller Timeline: 
http://chart.apis.google.com/chart?cht=lc&chs=700x400&chxt=x,y,x,y&chxr=1,0,1.06599&chco=FF0000,0000FF&chdl=/btrfs/src/Python-2.6.4/python|/btrfs/src/wpython2-wpython11/python&chds=0,1.06599&chd=t:0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06|0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.07,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.05,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06&chxl=0:|1|20|40|60|80|100|2:||Iteration|3:||Time+(secs)&chtt=hg_startup ### html5lib ### Min: 26.507970 -> 25.616106: 1.0348x faster Avg: 26.597557 -> 25.732688: 1.0336x faster Significant (t=9.827764) Stddev: 0.09216 -> 0.17386: 1.8865x larger Timeline: http://chart.apis.google.com/chart?cht=lc&chs=700x400&chxt=x,y,x,y&chxr=1,24.616106,27.70594&chco=FF0000,0000FF&chdl=/btrfs/src/Python-2.6.4/python|/btrfs/src/wpython2-wpython11/python&chds=24.616106,27.70594&chd=t:26.51,26.71,26.54,26.69,26.55|25.68,25.62,25.68,26.04,25.65&chxl=0:|1|2|3|4|5|2:||Iteration|3:||Time+(secs)&chtt=html5lib ### html5lib_warmup ### Min: 25.655162 -> 25.466228: 1.0074x faster Avg: 26.110781 -> 25.898441: 1.0082x faster Not significant Stddev: 0.26144 -> 0.25576: 1.0222x smaller Timeline: 
http://chart.apis.google.com/chart?cht=lc&chs=700x400&chxt=x,y,x,y&chxr=1,24.4662280083,27.2955319881&chco=FF0000,0000FF&chdl=/btrfs/src/Python-2.6.4/python|/btrfs/src/wpython2-wpython11/python&chds=24.4662280083,27.2955319881&chd=t:25.66,26.3,26.27,26.18,26.16|25.47,26.04,26.12,25.89,25.97&chxl=0:|1|2|3|4|5|2:||Iteration|3:||Time+(secs)&chtt=html5lib_warmup ### iterative_count ### Min: 0.369361 -> 0.223053: 1.6559x faster Avg: 0.371506 -> 0.240774: 1.5430x faster Significant (t=72.130793) Stddev: 0.00198 -> 0.01266: 6.3935x larger Timeline: http://chart.apis.google.com/chart?cht=lc&chs=700x400&chxt=x,y,x,y&chxr=1,0,1.38339400291&chco=FF0000,0000FF&chdl=/btrfs/src/Python-2.6.4/python|/btrfs/src/wpython2-wpython11/python&chds=0,1.38339400291&chd=t:0.37,0.37,0.37,0.37,0.37,0.37,0.37,0.37,0.37,0.37,0.37,0.37,0.37,0.37,0.37,0.37,0.37,0.37,0.37,0.37,0.37,0.37,0.37,0.37,0.37,0.37,0.37,0.37,0.37,0.37,0.37,0.37,0.37,0.37,0.37,0.37,0.37,0.37,0.37,0.37,0.37,0.37,0.37,0.37,0.37,0.37,0.37,0.37,0.38,0.37|0.25,0.25,0.23,0.22,0.22,0.22,0.22,0.22,0.22,0.22,0.22,0.22,0.22,0.22,0.23,0.22,0.23,0.23,0.23,0.23,0.24,0.25,0.25,0.25,0.24,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.24,0.25,0.25,0.25,0.24,0.25,0.25,0.25,0.24,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.24,0.25&chxl=0:|1|10|20|30|40|50|2:||Iteration|3:||Time+(secs)&chtt=iterative_count ### nbody ### Min: 0.935157 -> 0.931795: 1.0036x faster Avg: 0.946445 -> 0.943684: 1.0029x faster Significant (t=2.384189) Stddev: 0.00409 -> 0.00709: 1.7332x larger Timeline: 
http://chart.apis.google.com/chart?cht=lc&chs=700x400&chxt=x,y,x,y&chxr=1,0,1.95390200615&chco=FF0000,0000FF&chdl=/btrfs/src/Python-2.6.4/python|/btrfs/src/wpython2-wpython11/python&chds=0,1.95390200615&chd=t:0.94,0.95,0.94,0.95,0.95,0.95,0.95,0.95,0.95,0.95,0.95,0.95,0.95,0.95,0.95,0.95,0.95,0.94,0.95,0.94,0.95,0.95,0.95,0.95,0.95,0.95,0.95,0.95,0.95,0.94,0.95,0.95,0.95,0.95,0.94,0.94,0.94,0.94,0.94,0.94,0.94,0.94,0.95,0.94,0.95,0.94,0.95,0.95,0.95,0.95|0.94,0.93,0.94,0.95,0.94,0.95,0.94,0.93,0.93,0.94,0.95,0.93,0.94,0.94,0.95,0.95,0.95,0.95,0.95,0.95,0.95,0.95,0.94,0.94,0.93,0.93,0.95,0.94,0.93,0.94,0.94,0.94,0.95,0.95,0.95,0.94,0.94,0.95,0.93,0.94,0.94,0.95,0.95,0.95,0.93,0.95,0.95,0.94,0.95,0.94&chxl=0:|1|10|20|30|40|50|2:||Iteration|3:||Time+(secs)&chtt=nbody ### normal_startup ### Min: 0.685616 -> 0.676500: 1.0135x faster Avg: 0.686916 -> 0.678582: 1.0123x faster Significant (t=31.273550) Stddev: 0.00078 -> 0.00171: 2.1897x larger Timeline: http://chart.apis.google.com/chart?cht=lc&chs=700x400&chxt=x,y,x,y&chxr=1,0,1.69004797935&chco=FF0000,0000FF&chdl=/btrfs/src/Python-2.6.4/python|/btrfs/src/wpython2-wpython11/python&chds=0,1.69004797935&chd=t:0.69,0.69,0.69,0.69,0.69,0.69,0.69,0.69,0.69,0.69,0.69,0.69,0.69,0.69,0.69,0.69,0.69,0.69,0.69,0.69,0.69,0.69,0.69,0.69,0.69,0.69,0.69,0.69,0.69,0.69,0.69,0.69,0.69,0.69,0.69,0.69,0.69,0.69,0.69,0.69,0.69,0.69,0.69,0.69,0.69,0.69,0.69,0.69,0.69,0.69|0.68,0.68,0.68,0.68,0.68,0.68,0.68,0.68,0.68,0.68,0.68,0.68,0.68,0.68,0.68,0.68,0.68,0.68,0.68,0.68,0.68,0.68,0.68,0.68,0.68,0.68,0.68,0.68,0.68,0.68,0.68,0.68,0.68,0.68,0.68,0.68,0.68,0.68,0.68,0.68,0.68,0.68,0.68,0.68,0.68,0.68,0.68,0.68,0.68,0.68&chxl=0:|1|10|20|30|40|50|2:||Iteration|3:||Time+(secs)&chtt=normal_startup ### nqueens ### Min: 0.980723 -> 0.947436: 1.0351x faster Avg: 0.989169 -> 0.954421: 1.0364x faster Significant (t=46.434070) Stddev: 0.00394 -> 0.00353: 1.1181x smaller Timeline: 
http://chart.apis.google.com/chart?cht=lc&chs=700x400&chxt=x,y,x,y&chxr=1,0,1.99711680412&chco=FF0000,0000FF&chdl=/btrfs/src/Python-2.6.4/python|/btrfs/src/wpython2-wpython11/python&chds=0,1.99711680412&chd=t:0.99,0.99,0.99,0.99,0.99,0.99,1.0,0.99,0.99,0.99,0.99,0.99,1.0,0.99,0.98,0.99,0.99,0.99,0.99,0.99,0.98,0.98,0.98,0.98,0.99,0.99,0.99,0.99,0.99,0.99,0.99,0.99,0.99,0.99,0.98,0.98,0.99,0.99,0.99,0.99,0.99,0.99,0.99,0.99,0.99,0.99,0.98,0.99,0.99,0.99|0.95,0.96,0.95,0.96,0.95,0.96,0.95,0.96,0.95,0.96,0.96,0.95,0.95,0.95,0.95,0.95,0.96,0.95,0.96,0.96,0.95,0.95,0.95,0.95,0.96,0.96,0.96,0.95,0.95,0.95,0.95,0.96,0.96,0.95,0.96,0.96,0.95,0.96,0.95,0.95,0.95,0.96,0.96,0.96,0.95,0.96,0.95,0.96,0.95,0.96&chxl=0:|1|10|20|30|40|50|2:||Iteration|3:||Time+(secs)&chtt=nqueens ### pickle ### Min: 3.346728 -> 3.398232: 1.0154x slower Avg: 3.367508 -> 3.415437: 1.0142x slower Significant (t=-28.797501) Stddev: 0.00840 -> 0.00824: 1.0186x smaller Timeline: http://chart.apis.google.com/chart?cht=lc&chs=700x400&chxt=x,y,x,y&chxr=1,2.34672808647,4.43019509315&chco=FF0000,0000FF&chdl=/btrfs/src/Python-2.6.4/python|/btrfs/src/wpython2-wpython11/python&chds=2.34672808647,4.43019509315&chd=t:3.37,3.37,3.38,3.37,3.36,3.38,3.36,3.35,3.37,3.37,3.37,3.36,3.38,3.36,3.38,3.37,3.36,3.37,3.37,3.37,3.35,3.37,3.36,3.38,3.37,3.37,3.38,3.37,3.36,3.37,3.36,3.36,3.37,3.36,3.37,3.36,3.37,3.36,3.38,3.38,3.36,3.37,3.36,3.37,3.37,3.36,3.37,3.39,3.38,3.36|3.4,3.4,3.41,3.43,3.41,3.42,3.41,3.42,3.41,3.41,3.41,3.41,3.41,3.4,3.41,3.43,3.42,3.41,3.41,3.41,3.41,3.42,3.41,3.41,3.4,3.41,3.42,3.42,3.43,3.43,3.42,3.42,3.41,3.42,3.41,3.41,3.42,3.42,3.43,3.41,3.43,3.41,3.42,3.42,3.43,3.42,3.42,3.41,3.41,3.41&chxl=0:|1|10|20|30|40|50|2:||Iteration|3:||Time+(secs)&chtt=pickle ### pickle_dict ### Min: 3.395274 -> 3.338732: 1.0169x faster Avg: 3.513604 -> 3.359646: 1.0458x faster Significant (t=16.225759) Stddev: 0.06605 -> 0.01182: 5.5896x smaller Timeline: 
http://chart.apis.google.com/chart?cht=lc&chs=700x400&chxt=x,y,x,y&chxr=1,2.33873200417,4.60737299919&chco=FF0000,0000FF&chdl=/btrfs/src/Python-2.6.4/python|/btrfs/src/wpython2-wpython11/python&chds=2.33873200417,4.60737299919&chd=t:3.49,3.56,3.59,3.57,3.51,3.46,3.59,3.55,3.4,3.52,3.49,3.59,3.57,3.43,3.55,3.6,3.5,3.43,3.43,3.56,3.42,3.44,3.52,3.53,3.6,3.6,3.41,3.46,3.4,3.46,3.4,3.54,3.57,3.54,3.55,3.6,3.58,3.48,3.48,3.42,3.6,3.57,3.4,3.61,3.5,3.51,3.45,3.54,3.54,3.57|3.36,3.35,3.34,3.36,3.38,3.37,3.36,3.35,3.35,3.35,3.36,3.36,3.34,3.36,3.37,3.36,3.36,3.35,3.38,3.34,3.36,3.37,3.39,3.38,3.35,3.36,3.35,3.35,3.36,3.36,3.36,3.34,3.37,3.36,3.38,3.37,3.38,3.35,3.35,3.36,3.36,3.38,3.37,3.34,3.35,3.35,3.35,3.36,3.35,3.36&chxl=0:|1|10|20|30|40|50|2:||Iteration|3:||Time+(secs)&chtt=pickle_dict ### pickle_list ### Min: 1.720434 -> 1.708855: 1.0068x faster Avg: 1.762757 -> 1.719942: 1.0249x faster Significant (t=11.198322) Stddev: 0.02604 -> 0.00727: 3.5808x smaller Timeline: http://chart.apis.google.com/chart?cht=lc&chs=700x400&chxt=x,y,x,y&chxr=1,0.70885491371,2.81176018715&chco=FF0000,0000FF&chdl=/btrfs/src/Python-2.6.4/python|/btrfs/src/wpython2-wpython11/python&chds=0.70885491371,2.81176018715&chd=t:1.77,1.77,1.8,1.79,1.76,1.75,1.77,1.8,1.8,1.78,1.74,1.79,1.81,1.77,1.74,1.76,1.8,1.79,1.78,1.73,1.78,1.8,1.77,1.76,1.79,1.78,1.75,1.72,1.72,1.72,1.72,1.76,1.74,1.79,1.8,1.75,1.74,1.72,1.75,1.73,1.73,1.79,1.73,1.74,1.75,1.77,1.74,1.76,1.77,1.78|1.72,1.71,1.71,1.71,1.72,1.73,1.71,1.71,1.71,1.71,1.73,1.72,1.71,1.71,1.71,1.73,1.73,1.73,1.74,1.73,1.72,1.71,1.71,1.72,1.71,1.73,1.72,1.71,1.72,1.72,1.73,1.73,1.71,1.72,1.72,1.73,1.72,1.72,1.73,1.73,1.73,1.73,1.73,1.73,1.71,1.71,1.72,1.72,1.72,1.72&chxl=0:|1|10|20|30|40|50|2:||Iteration|3:||Time+(secs)&chtt=pickle_list ### pybench ### Min: 8937 -> 8141: 1.0978x faster Avg: 9069 -> 8266: 1.0971x faster ### regex_compile ### Min: 1.297481 -> 1.230614: 1.0543x faster Avg: 1.303290 -> 1.235283: 1.0551x faster Significant (t=120.657667) 
Stddev: 0.00304 -> 0.00257: 1.1834x smaller Timeline: http://chart.apis.google.com/chart?cht=lc&chs=700x400&chxt=x,y,x,y&chxr=1,0.23061418533,2.31539511681&chco=FF0000,0000FF&chdl=/btrfs/src/Python-2.6.4/python|/btrfs/src/wpython2-wpython11/python&chds=0.23061418533,2.31539511681&chd=t:1.31,1.3,1.3,1.3,1.3,1.3,1.3,1.3,1.3,1.3,1.31,1.3,1.31,1.3,1.3,1.3,1.31,1.31,1.3,1.31,1.3,1.3,1.3,1.3,1.3,1.31,1.3,1.3,1.31,1.3,1.3,1.3,1.3,1.3,1.3,1.3,1.3,1.3,1.3,1.3,1.3,1.3,1.31,1.3,1.3,1.3,1.3,1.3,1.3,1.32|1.23,1.24,1.24,1.23,1.24,1.23,1.24,1.24,1.24,1.23,1.23,1.23,1.24,1.23,1.23,1.23,1.24,1.24,1.24,1.24,1.24,1.24,1.24,1.25,1.24,1.24,1.24,1.24,1.24,1.23,1.23,1.23,1.23,1.23,1.23,1.23,1.24,1.24,1.23,1.23,1.24,1.23,1.24,1.23,1.23,1.23,1.24,1.23,1.23,1.23&chxl=0:|1|10|20|30|40|50|2:||Iteration|3:||Time+(secs)&chtt=regex_compile ### regex_effbot ### Min: 0.238711 -> 0.234200: 1.0193x faster Avg: 0.239331 -> 0.236123: 1.0136x faster Significant (t=19.737486) Stddev: 0.00050 -> 0.00104: 2.0828x larger Timeline: http://chart.apis.google.com/chart?cht=lc&chs=700x400&chxt=x,y,x,y&chxr=1,0,1.24141407013&chco=FF0000,0000FF&chdl=/btrfs/src/Python-2.6.4/python|/btrfs/src/wpython2-wpython11/python&chds=0,1.24141407013&chd=t:0.24,0.24,0.24,0.24,0.24,0.24,0.24,0.24,0.24,0.24,0.24,0.24,0.24,0.24,0.24,0.24,0.24,0.24,0.24,0.24,0.24,0.24,0.24,0.24,0.24,0.24,0.24,0.24,0.24,0.24,0.24,0.24,0.24,0.24,0.24,0.24,0.24,0.24,0.24,0.24,0.24,0.24,0.24,0.24,0.24,0.24,0.24,0.24,0.24,0.24|0.24,0.24,0.24,0.24,0.24,0.24,0.24,0.24,0.24,0.24,0.24,0.24,0.24,0.24,0.24,0.24,0.24,0.24,0.24,0.24,0.24,0.24,0.24,0.24,0.24,0.24,0.23,0.23,0.23,0.23,0.23,0.23,0.23,0.23,0.23,0.23,0.24,0.24,0.24,0.24,0.24,0.24,0.24,0.24,0.24,0.24,0.24,0.24,0.24,0.24&chxl=0:|1|10|20|30|40|50|2:||Iteration|3:||Time+(secs)&chtt=regex_effbot ### regex_v8 ### Min: 0.229685 -> 0.217755: 1.0548x faster Avg: 0.232979 -> 0.219208: 1.0628x faster Significant (t=36.278688) Stddev: 0.00217 -> 0.00157: 1.3824x smaller Timeline: 
http://chart.apis.google.com/chart?cht=lc&chs=700x400&chxt=x,y,x,y&chxr=1,0,1.23589801788&chco=FF0000,0000FF&chdl=/btrfs/src/Python-2.6.4/python|/btrfs/src/wpython2-wpython11/python&chds=0,1.23589801788&chd=t:0.23,0.24,0.23,0.24,0.23,0.23,0.23,0.23,0.23,0.23,0.23,0.24,0.23,0.23,0.23,0.24,0.23,0.24,0.23,0.23,0.23,0.24,0.23,0.24,0.23,0.23,0.23,0.24,0.23,0.23,0.23,0.24,0.23,0.24,0.23,0.23,0.23,0.23,0.23,0.23,0.23,0.23,0.23,0.23,0.23,0.23,0.23,0.24,0.23,0.24|0.22,0.22,0.22,0.22,0.22,0.22,0.22,0.22,0.23,0.22,0.22,0.22,0.22,0.22,0.22,0.22,0.22,0.22,0.22,0.22,0.22,0.22,0.22,0.22,0.22,0.22,0.22,0.22,0.22,0.22,0.22,0.22,0.22,0.22,0.22,0.22,0.22,0.22,0.22,0.22,0.22,0.22,0.22,0.22,0.22,0.22,0.22,0.22,0.22,0.22&chxl=0:|1|10|20|30|40|50|2:||Iteration|3:||Time+(secs)&chtt=regex_v8 ### richards ### Min: 0.543314 -> 0.504176: 1.0776x faster Avg: 0.550139 -> 0.542886: 1.0134x faster Significant (t=3.118548) Stddev: 0.00397 -> 0.01596: 4.0203x larger Timeline: http://chart.apis.google.com/chart?cht=lc&chs=700x400&chxt=x,y,x,y&chxr=1,0,1.57444500923&chco=FF0000,0000FF&chdl=/btrfs/src/Python-2.6.4/python|/btrfs/src/wpython2-wpython11/python&chds=0,1.57444500923&chd=t:0.55,0.55,0.55,0.55,0.55,0.55,0.55,0.55,0.56,0.55,0.56,0.55,0.55,0.54,0.54,0.55,0.55,0.55,0.54,0.55,0.55,0.55,0.55,0.55,0.55,0.55,0.55,0.56,0.56,0.56,0.56,0.55,0.55,0.55,0.55,0.55,0.55,0.55,0.55,0.55,0.55,0.55,0.55,0.55,0.55,0.55,0.55,0.55,0.55,0.55|0.53,0.55,0.55,0.55,0.55,0.54,0.53,0.53,0.55,0.54,0.55,0.55,0.56,0.55,0.55,0.55,0.55,0.55,0.55,0.55,0.55,0.55,0.55,0.55,0.55,0.55,0.55,0.54,0.56,0.53,0.51,0.56,0.51,0.56,0.52,0.56,0.51,0.54,0.57,0.53,0.57,0.52,0.57,0.54,0.54,0.53,0.53,0.52,0.54,0.5&chxl=0:|1|10|20|30|40|50|2:||Iteration|3:||Time+(secs)&chtt=richards ### slowpickle ### Min: 1.453602 -> 1.361336: 1.0678x faster Avg: 1.459776 -> 1.370334: 1.0653x faster Significant (t=102.747004) Stddev: 0.00249 -> 0.00563: 2.2567x larger Timeline: 
http://chart.apis.google.com/chart?cht=lc&chs=700x400&chxt=x,y,x,y&chxr=1,0.36133599281,2.46742391586&chco=FF0000,0000FF&chdl=/btrfs/src/Python-2.6.4/python|/btrfs/src/wpython2-wpython11/python&chds=0.36133599281,2.46742391586&chd=t:1.46,1.46,1.46,1.46,1.46,1.45,1.46,1.46,1.46,1.46,1.46,1.46,1.46,1.46,1.46,1.46,1.46,1.46,1.46,1.46,1.46,1.45,1.47,1.46,1.46,1.46,1.46,1.46,1.46,1.46,1.46,1.46,1.46,1.46,1.46,1.46,1.46,1.46,1.46,1.46,1.46,1.46,1.46,1.46,1.46,1.46,1.46,1.46,1.46,1.46|1.37,1.38,1.37,1.36,1.38,1.38,1.38,1.37,1.37,1.37,1.37,1.38,1.38,1.38,1.38,1.37,1.37,1.36,1.36,1.36,1.36,1.36,1.36,1.36,1.37,1.37,1.36,1.37,1.37,1.37,1.37,1.37,1.37,1.38,1.38,1.38,1.38,1.37,1.37,1.37,1.37,1.37,1.38,1.38,1.37,1.37,1.37,1.36,1.36,1.36&chxl=0:|1|10|20|30|40|50|2:||Iteration|3:||Time+(secs)&chtt=slowpickle ### slowspitfire ### Min: 1.507587 -> 1.393345: 1.0820x faster Avg: 1.512317 -> 1.405533: 1.0760x faster Significant (t=83.955024) Stddev: 0.00415 -> 0.00798: 1.9254x larger Timeline: http://chart.apis.google.com/chart?cht=lc&chs=700x400&chxt=x,y,x,y&chxr=1,0.39334487915,2.53158593178&chco=FF0000,0000FF&chdl=/btrfs/src/Python-2.6.4/python|/btrfs/src/wpython2-wpython11/python&chds=0.39334487915,2.53158593178&chd=t:1.51,1.51,1.51,1.51,1.51,1.51,1.51,1.51,1.51,1.51,1.51,1.51,1.51,1.51,1.51,1.51,1.51,1.53,1.52,1.51,1.51,1.51,1.51,1.51,1.51,1.51,1.52,1.51,1.51,1.51,1.51,1.51,1.51,1.51,1.51,1.51,1.51,1.51,1.51,1.51,1.53,1.51,1.51,1.51,1.51,1.52,1.51,1.51,1.51,1.51|1.41,1.41,1.42,1.39,1.41,1.41,1.4,1.42,1.4,1.39,1.4,1.39,1.4,1.4,1.42,1.41,1.4,1.4,1.42,1.4,1.4,1.41,1.42,1.4,1.42,1.41,1.41,1.41,1.41,1.41,1.4,1.4,1.4,1.4,1.41,1.41,1.41,1.4,1.41,1.4,1.4,1.4,1.41,1.41,1.42,1.41,1.4,1.41,1.4,1.4&chxl=0:|1|10|20|30|40|50|2:||Iteration|3:||Time+(secs)&chtt=slowspitfire ### slowunpickle ### Min: 0.692674 -> 0.645382: 1.0733x faster Avg: 0.695322 -> 0.648033: 1.0730x faster Significant (t=102.284826) Stddev: 0.00177 -> 0.00275: 1.5551x larger Timeline: 
http://chart.apis.google.com/chart?cht=lc&chs=700x400&chxt=x,y,x,y&chxr=1,0,1.70394492149&chco=FF0000,0000FF&chdl=/btrfs/src/Python-2.6.4/python|/btrfs/src/wpython2-wpython11/python&chds=0,1.70394492149&chd=t:0.69,0.69,0.7,0.69,0.69,0.69,0.7,0.69,0.7,0.7,0.69,0.7,0.69,0.7,0.69,0.7,0.69,0.7,0.7,0.7,0.7,0.69,0.7,0.69,0.7,0.69,0.7,0.69,0.69,0.7,0.69,0.7,0.7,0.69,0.69,0.7,0.7,0.7,0.69,0.7,0.7,0.7,0.7,0.69,0.7,0.7,0.69,0.7,0.7,0.7|0.65,0.65,0.65,0.65,0.65,0.65,0.65,0.65,0.65,0.65,0.65,0.65,0.65,0.65,0.65,0.65,0.65,0.65,0.65,0.65,0.65,0.65,0.65,0.65,0.65,0.65,0.65,0.65,0.65,0.65,0.65,0.65,0.65,0.65,0.65,0.65,0.65,0.65,0.65,0.65,0.65,0.65,0.65,0.65,0.65,0.66,0.65,0.65,0.65,0.65&chxl=0:|1|10|20|30|40|50|2:||Iteration|3:||Time+(secs)&chtt=slowunpickle ### startup_nosite ### Min: 0.247376 -> 0.246369: 1.0041x faster Avg: 0.249051 -> 0.248113: 1.0038x faster Significant (t=6.716428) Stddev: 0.00109 -> 0.00088: 1.2345x smaller Timeline: http://chart.apis.google.com/chart?cht=lc&chs=700x400&chxt=x,y,x,y&chxr=1,0,1.25523996353&chco=FF0000,0000FF&chdl=/btrfs/src/Python-2.6.4/python|/btrfs/src/wpython2-wpython11/python&chds=0,1.25523996353&chd=t:0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.26,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25|0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.
25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25&chxl=0:|1|20|40|60|80|100|2:||Iteration|3:||Time+(secs)&chtt=startup_nosite ### threaded_count ### Min: 0.373155 -> 0.227307: 1.6416x faster Avg: 0.374912 -> 0.234906: 1.5960x faster Significant (t=224.886947) Stddev: 0.00110 -> 0.00426: 3.8673x larger Timeline: http://chart.apis.google.com/chart?cht=lc&chs=700x400&chxt=x,y,x,y&chxr=1,0,1.37840795517&chco=FF0000,0000FF&chdl=/btrfs/src/Python-2.6.4/python|/btrfs/src/wpython2-wpython11/python&chds=0,1.37840795517&chd=t:0.37,0.37,0.38,0.37,0.37,0.37,0.38,0.38,0.37,0.38,0.37,0.37,0.37,0.38,0.38,0.38,0.37,0.38,0.38,0.38,0.38,0.38,0.37,0.37,0.37,0.38,0.37,0.38,0.37,0.37,0.37,0.37,0.38,0.37,0.37,0.37,0.37,0.37,0.37,0.37,0.37,0.38,0.37,0.38,0.37,0.37,0.38,0.38,0.37,0.37|0.24,0.24,0.24,0.24,0.23,0.23,0.23,0.24,0.24,0.24,0.24,0.24,0.24,0.24,0.23,0.23,0.23,0.23,0.23,0.23,0.23,0.23,0.23,0.23,0.24,0.24,0.23,0.23,0.24,0.24,0.24,0.24,0.24,0.24,0.24,0.23,0.23,0.23,0.23,0.23,0.23,0.23,0.25,0.24,0.25,0.24,0.24,0.23,0.24,0.23&chxl=0:|1|10|20|30|40|50|2:||Iteration|3:||Time+(secs)&chtt=threaded_count ### unpack_sequence ### Min: 0.000150 -> 0.000159: 1.0605x slower Avg: 0.000153 -> 0.000161: 1.0550x slower Significant (t=-450.521988) Stddev: 0.00000 -> 0.00000: 1.2070x larger Timeline: 
http://chart.apis.google.com/chart?cht=lc&chs=700x400&chxt=x,y,x,y&chxr=1,0,1.00053215027&chco=FF0000,0000FF&chdl=/btrfs/src/Python-2.6.4/python|/btrfs/src/wpython2-wpython11/python&chds=0,1.00053215027&chd=t:0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0|0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0&chxl=0:|1|20|40|60|80|100|2:||Iteration|3:||Time+(secs)&chtt=unpack_sequence ### unpickle ### Min: 2.042838 -> 2.023408: 1.0096x faster Avg: 2.054084 -> 2.037836: 1.0080x faster Significant (t=13.396235) Stddev: 0.00551 -> 0.00657: 1.1931x larger Timeline: 
http://chart.apis.google.com/chart?cht=lc&chs=700x400&chxt=x,y,x,y&chxr=1,1.0234079361,3.0667848587&chco=FF0000,0000FF&chdl=/btrfs/src/Python-2.6.4/python|/btrfs/src/wpython2-wpython11/python&chds=1.0234079361,3.0667848587&chd=t:2.06,2.06,2.05,2.05,2.05,2.05,2.05,2.05,2.05,2.05,2.05,2.05,2.05,2.05,2.05,2.05,2.06,2.06,2.06,2.06,2.05,2.06,2.06,2.06,2.06,2.05,2.04,2.05,2.05,2.05,2.05,2.05,2.05,2.05,2.04,2.04,2.05,2.06,2.06,2.06,2.06,2.06,2.06,2.06,2.07,2.06,2.06,2.06,2.05,2.06|2.04,2.04,2.04,2.04,2.04,2.02,2.04,2.03,2.02,2.04,2.04,2.04,2.03,2.03,2.04,2.04,2.04,2.05,2.05,2.05,2.05,2.05,2.05,2.05,2.05,2.05,2.03,2.03,2.03,2.03,2.03,2.03,2.03,2.04,2.03,2.03,2.04,2.04,2.04,2.04,2.04,2.04,2.04,2.04,2.04,2.04,2.04,2.03,2.03,2.03&chxl=0:|1|10|20|30|40|50|2:||Iteration|3:||Time+(secs)&chtt=unpickle ### unpickle_list ### Min: 1.542357 -> 1.645569: 1.0669x slower Avg: 1.554601 -> 1.654697: 1.0644x slower Significant (t=-93.061602) Stddev: 0.00647 -> 0.00400: 1.6147x smaller Timeline: http://chart.apis.google.com/chart?cht=lc&chs=700x400&chxt=x,y,x,y&chxr=1,0.54235696793,2.66085600853&chco=FF0000,0000FF&chdl=/btrfs/src/Python-2.6.4/python|/btrfs/src/wpython2-wpython11/python&chds=0.54235696793,2.66085600853&chd=t:1.56,1.55,1.55,1.56,1.55,1.56,1.56,1.56,1.56,1.55,1.55,1.54,1.56,1.55,1.55,1.54,1.55,1.55,1.55,1.54,1.54,1.55,1.55,1.54,1.55,1.54,1.55,1.55,1.56,1.55,1.55,1.55,1.56,1.55,1.56,1.56,1.56,1.56,1.56,1.56,1.56,1.56,1.56,1.57,1.56,1.56,1.56,1.56,1.57,1.56|1.66,1.65,1.65,1.65,1.65,1.65,1.65,1.65,1.65,1.65,1.65,1.65,1.65,1.65,1.65,1.65,1.65,1.65,1.65,1.66,1.66,1.65,1.66,1.66,1.65,1.66,1.66,1.66,1.66,1.66,1.66,1.65,1.66,1.65,1.66,1.66,1.66,1.66,1.66,1.65,1.66,1.66,1.65,1.66,1.66,1.66,1.66,1.66,1.65,1.66&chxl=0:|1|10|20|30|40|50|2:||Iteration|3:||Time+(secs)&chtt=unpickle_list Report on Darwin unknown-00-1e-c2-bc-ea-b3.config 10.3.0 Darwin Kernel Version 10.3.0: Fri Feb 26 11:58:09 PST 2010; root:xnu-1504.3.12~1/RELEASE_I386 i386 i386 Total CPU cores: 2 ### 2to3 ### 25.590659 -> 
23.666681: 1.0813x faster ### bzr_startup ### Min: 0.102069 -> 0.099751: 1.0232x faster Avg: 0.102827 -> 0.100411: 1.0241x faster Significant (t=20.360035) Stddev: 0.00072 -> 0.00094: 1.3152x larger Timeline: http://tinyurl.com/y6yjv5w ### call_method ### Min: 0.606348 -> 0.548343: 1.1058x faster Avg: 0.609875 -> 0.556685: 1.0955x faster Significant (t=54.742949) Stddev: 0.00303 -> 0.01151: 3.7924x larger Timeline: http://tinyurl.com/y7wkkmp ### call_method_slots ### Min: 0.641415 -> 0.549939: 1.1663x faster Avg: 0.648512 -> 0.571999: 1.1338x faster Significant (t=66.043832) Stddev: 0.01162 -> 0.00815: 1.4253x smaller Timeline: http://tinyurl.com/y7mlu86 ### call_method_unknown ### Min: 0.675142 -> 0.613596: 1.1003x faster Avg: 0.685377 -> 0.616531: 1.1117x faster Significant (t=35.991776) Stddev: 0.02328 -> 0.00260: 8.9669x smaller Timeline: http://tinyurl.com/y6p65wk ### call_simple ### Min: 0.443526 -> 0.425943: 1.0413x faster Avg: 0.447255 -> 0.442844: 1.0100x faster Significant (t=4.469438) Stddev: 0.00569 -> 0.01066: 1.8738x larger Timeline: http://tinyurl.com/y8xbq2f ### float ### Min: 0.102775 -> 0.096776: 1.0620x faster Avg: 0.110484 -> 0.102809: 1.0747x faster Significant (t=13.220150) Stddev: 0.00738 -> 0.00546: 1.3507x smaller Timeline: http://tinyurl.com/yyhutwh ### hg_startup ### Min: 0.045108 -> 0.043234: 1.0433x faster Avg: 0.046845 -> 0.043972: 1.0653x faster Significant (t=28.354118) Stddev: 0.00206 -> 0.00095: 2.1622x smaller Timeline: http://tinyurl.com/y5b9xx5 ### html5lib ### Min: 15.549443 -> 14.847499: 1.0473x faster Avg: 15.582542 -> 14.859007: 1.0487x faster Significant (t=64.534012) Stddev: 0.02167 -> 0.01261: 1.7190x smaller Timeline: http://tinyurl.com/y3g6t44 ### html5lib_warmup ### Min: 15.770884 -> 15.074864: 1.0462x faster Avg: 16.133120 -> 15.319287: 1.0531x faster Significant (t=4.375747) Stddev: 0.30506 -> 0.28266: 1.0793x smaller Timeline: http://tinyurl.com/y2xcn3m ### iterative_count ### Min: 0.147178 -> 0.085756: 1.7162x 
faster Avg: 0.151184 -> 0.088620: 1.7060x faster Significant (t=49.925293) Stddev: 0.00651 -> 0.00601: 1.0834x smaller Timeline: http://tinyurl.com/yybv496 ### nbody ### Min: 0.471700 -> 0.463253: 1.0182x faster Avg: 0.483086 -> 0.475017: 1.0170x faster Significant (t=3.488633) Stddev: 0.01129 -> 0.01183: 1.0477x larger Timeline: http://tinyurl.com/y6lrfst ### normal_startup ### Min: 0.811946 -> 0.789491: 1.0284x faster Avg: 0.854893 -> 0.819687: 1.0430x faster Significant (t=5.095698) Stddev: 0.03899 -> 0.02943: 1.3249x smaller Timeline: http://tinyurl.com/yydc2u4 ### nqueens ### Min: 0.597376 -> 0.570333: 1.0474x faster Avg: 0.606725 -> 0.588271: 1.0314x faster Significant (t=5.653285) Stddev: 0.00920 -> 0.02117: 2.3015x larger Timeline: http://tinyurl.com/y3n2fg3 ### pickle ### Min: 1.651874 -> 1.574163: 1.0494x faster Avg: 1.680315 -> 1.612453: 1.0421x faster Significant (t=10.340275) Stddev: 0.02313 -> 0.04023: 1.7395x larger Timeline: http://tinyurl.com/y7r55ms ### pickle_dict ### Min: 1.308464 -> 1.275010: 1.0262x faster Avg: 1.318127 -> 1.296507: 1.0167x faster Significant (t=4.484688) Stddev: 0.00605 -> 0.03355: 5.5471x larger Timeline: http://tinyurl.com/y4j9v5q ### pickle_list ### Min: 0.743117 -> 0.803173: 1.0808x slower Avg: 0.751905 -> 0.810111: 1.0774x slower Significant (t=-44.249464) Stddev: 0.00663 -> 0.00652: 1.0172x smaller Timeline: http://tinyurl.com/y633yb6 ### pybench ### Min: 4763 -> 4342: 1.0970x faster Avg: 4988 -> 4463: 1.1176x faster ### regex_compile ### Min: 0.740278 -> 0.661458: 1.1192x faster Avg: 0.764527 -> 0.685639: 1.1151x faster Significant (t=15.011621) Stddev: 0.02380 -> 0.02854: 1.1995x larger Timeline: http://tinyurl.com/y524doe ### regex_effbot ### Min: 0.096349 -> 0.096083: 1.0028x faster Avg: 0.100523 -> 0.099285: 1.0125x faster Not significant Stddev: 0.00504 -> 0.00327: 1.5444x smaller Timeline: http://tinyurl.com/y3e6z2j ### regex_v8 ### Min: 0.107875 -> 0.104745: 1.0299x faster Avg: 0.114243 -> 0.109286: 1.0454x 
faster Significant (t=2.325803) Stddev: 0.01377 -> 0.00612: 2.2522x smaller Timeline: http://tinyurl.com/y4qvh3d ### richards ### Min: 0.329455 -> 0.286851: 1.1485x faster Avg: 0.340571 -> 0.298913: 1.1394x faster Significant (t=13.324069) Stddev: 0.01252 -> 0.01822: 1.4556x larger Timeline: http://tinyurl.com/y3d8zxk ### slowpickle ### Min: 0.717864 -> 0.646023: 1.1112x faster Avg: 0.748511 -> 0.659941: 1.1342x faster Significant (t=17.041455) Stddev: 0.03039 -> 0.02067: 1.4701x smaller Timeline: http://tinyurl.com/y5ht5y5 ### slowspitfire ### Min: 0.797233 -> 0.762146: 1.0460x faster Avg: 0.839011 -> 0.812074: 1.0332x faster Significant (t=4.203713) Stddev: 0.02803 -> 0.03560: 1.2699x larger Timeline: http://tinyurl.com/y7owc3g ### slowunpickle ### Min: 0.320963 -> 0.289625: 1.1082x faster Avg: 0.325532 -> 0.293422: 1.1094x faster Significant (t=17.014061) Stddev: 0.00791 -> 0.01075: 1.3598x larger Timeline: http://tinyurl.com/y5dcwdj ### startup_nosite ### Min: 0.210807 -> 0.219255: 1.0401x slower Avg: 0.222933 -> 0.232971: 1.0450x slower Significant (t=-4.776980) Stddev: 0.01592 -> 0.01372: 1.1601x smaller Timeline: http://tinyurl.com/y2cexr7 ### threaded_count ### Min: 0.195203 -> 0.113455: 1.7205x faster Avg: 0.225064 -> 0.176248: 1.2770x faster Significant (t=12.769360) Stddev: 0.00850 -> 0.02566: 3.0192x larger Timeline: http://tinyurl.com/y74c4w3 ### unpack_sequence ### Min: 0.000092 -> 0.000083: 1.1095x faster Avg: 0.000094 -> 0.000085: 1.1058x faster Significant (t=61.506288) Stddev: 0.00002 -> 0.00002: 1.1541x smaller Timeline: http://tinyurl.com/yykzcrg ### unpickle ### Min: 1.026543 -> 1.018970: 1.0074x faster Avg: 1.048295 -> 1.042098: 1.0059x faster Not significant Stddev: 0.01646 -> 0.03854: 2.3408x larger Timeline: http://tinyurl.com/y786tft ### unpickle_list ### Min: 0.908621 -> 0.905129: 1.0039x faster Avg: 0.926660 -> 0.928462: 1.0019x slower Not significant Stddev: 0.01631 -> 0.01509: 1.0806x smaller Timeline: http://tinyurl.com/y5m6s3u From 
ncoghlan at gmail.com Wed Jun 23 12:58:00 2010
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Wed, 23 Jun 2010 20:58:00 +1000
Subject: [Python-Dev] bytes / unicode
In-Reply-To: <4C21D15F.8070304@egenix.com>
References: <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100620234723.600ad4a8@pitrou.net> <87wrtsd44p.fsf@uwakimon.sk.tsukuba.ac.jp> <87631c4bca.fsf@uwakimon.sk.tsukuba.ac.jp> <20100621165611.GW5787@unaka.lan> <87r5jz3h8u.fsf@uwakimon.sk.tsukuba.ac.jp> <20100622055040.GE5787@unaka.lan> <4C20FC54.9000608@egenix.com> <4C21D15F.8070304@egenix.com>
Message-ID:

On Wed, Jun 23, 2010 at 7:18 PM, M.-A. Lemburg wrote:
> Note that the point of using a builtin method was to get
> better performance. Such type adaptions are often needed in
> loops, so adding a few extra Python function calls just to
> convert a str object to a bytes object or vice-versa is a
> bit much overhead.

I actually agree with that, I just think we need more real world experience as to what works with the Python 3 text model before we start messing with the APIs for the builtin objects (fair point that "coerce" is a loaded term given the existence of the old coercion protocol. It's the right word for the task though).

One of the key points coming out of this thread (to my mind) is the lack of a Text ABC or other way of making an object that can be passed to functions expecting a str instance with a reasonable expectation of having it work. Are there some core string capabilities that can be identified and then expanded out to a full str-compatible API? (i.e. something along the lines of what collections.MutableMapping now provides for dict-alikes).

However, even if something like that was added, PJE is correct in pointing out that builtin strings still don't play well with others in many cases (usually due to underlying optimisations or other sound reasons, but perhaps sometimes gratuitously).
Most of the string binary operations can be dealt with through their reflected forms, but str.__mod__ will never return NotImplemented, __contains__ has no reflected form and the actual method calls are of course right out (e.g. the arguments to str.join() or str.split() calls have no ability to affect the type of the result). Third party number implementations couldn't provide comparable functionality to builtin int and long objects until the __index__ protocol was added.

Perhaps PJE is right that what this is really crying out for is a way to have third party "real string" implementations so that there can actually be genuine experimentation in the Unicode handling space outside the language core (comparable to the difference between the "you can turn me into an int" __int__ method and the "I am an int equivalent" __index__ method).

That may be tapping in a nail with a sledgehammer (and would raise significant moratorium questions if pursued further), but I think it's a valid question to at least ask.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From steve at pearwood.info Wed Jun 23 13:12:40 2010
From: steve at pearwood.info (Steven D'Aprano)
Date: Wed, 23 Jun 2010 21:12:40 +1000
Subject: [Python-Dev] WPython 1.1 was released
In-Reply-To:
References:
Message-ID: <201006232112.41047.steve@pearwood.info>

On Wed, 23 Jun 2010 08:12:36 pm Cesare Di Mauro wrote:
> I've released WPython 1.1, which brings many optimizations and
> refactorings.

For those of us who don't know what WPython is, and are too lazy, too busy, or reading their email off-line, could you give us a short one-paragraph description of what it is?

Actually, since I'm none of the above, I'll answer my own question: WPython is an implementation of Python that uses 16-bit wordcodes instead of byte code, and claims to have various performance benefits from doing so.

It looks like good work, thank you.
--
Steven D'Aprano

From cesare.di.mauro at gmail.com Wed Jun 23 13:28:58 2010
From: cesare.di.mauro at gmail.com (Cesare Di Mauro)
Date: Wed, 23 Jun 2010 13:28:58 +0200
Subject: [Python-Dev] WPython 1.1 was released
In-Reply-To: <201006232112.41047.steve@pearwood.info>
References: <201006232112.41047.steve@pearwood.info>
Message-ID:

2010/6/23 Steven D'Aprano

> On Wed, 23 Jun 2010 08:12:36 pm Cesare Di Mauro wrote:
> > I've released WPython 1.1, which brings many optimizations and
> > refactorings.
>
> For those of us who don't know what WPython is, and are too lazy, too
> busy, or reading their email off-line, could you give us a short
> one-paragraph description of what it is?
>
> Actually, since I'm none of the above, I'll answer my own question:
> WPython is an implementation of Python that uses 16-bit wordcodes
> instead of byte code, and claims to have various performance benefits
> from doing so.
>
> It looks like good work, thank you.
>
> --
> Steven D'Aprano
>

Hi Steven,

Sorry, I made a mistake in assuming that the project was known.

WPython is a CPython 2.6.4 implementation that uses "wordcodes" instead of bytecodes. A wordcode is a word (16 bits, two bytes, in this case) used to represent VM opcodes. This new encoding made it possible to simplify the virtual machine's main loop, which makes it easier to understand, maintain, and extend; on average less space is required, and execution speed improves too.

Cesare
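
To make the wordcode idea above concrete, here is a minimal sketch in Python 3. It is not WPython's actual encoding: the wordcode layout (opcode in the low byte, a one-byte argument in the high byte) and all function names are assumptions made for illustration. The bytecode side follows the classic CPython 2.x format of a one-byte opcode followed, for opcodes numbered 90 or higher, by a two-byte argument.

```python
import struct

HAVE_ARGUMENT = 90  # CPython 2.x: opcodes >= 90 are followed by an argument


def encode_bytecode(instructions):
    """CPython 2.x style: 1-byte opcode, then a 2-byte little-endian
    argument only if the opcode takes one."""
    out = bytearray()
    for opcode, arg in instructions:
        out.append(opcode)
        if opcode >= HAVE_ARGUMENT:
            out += struct.pack("<H", arg)
    return bytes(out)


def decode_bytecode(code):
    """Variable-length decode: the step size depends on the opcode just read."""
    i = 0
    while i < len(code):
        opcode = code[i]
        if opcode >= HAVE_ARGUMENT:
            (arg,) = struct.unpack_from("<H", code, i + 1)
            i += 3
        else:
            arg = 0
            i += 1
        yield opcode, arg


def encode_wordcode(instructions):
    """ASSUMED layout (not taken from WPython): one 16-bit word per
    instruction, opcode in the low byte, argument in the high byte."""
    out = bytearray()
    for opcode, arg in instructions:
        if not 0 <= arg <= 0xFF:
            raise ValueError("this sketch only handles one-byte arguments")
        out += struct.pack("<H", opcode | (arg << 8))
    return bytes(out)


def decode_wordcode(code):
    """Fixed-width decode: every instruction is exactly one 16-bit word."""
    for (word,) in struct.iter_unpack("<H", code):
        yield word & 0xFF, word >> 8


# "return a + b" with CPython 2.x opcode numbers:
# LOAD_FAST (124) 0, LOAD_FAST (124) 1, BINARY_ADD (23), RETURN_VALUE (83)
program = [(124, 0), (124, 1), (23, 0), (83, 0)]
```

With a fixed-width word the decoder advances by the same amount for every instruction, whereas the bytecode decoder has to inspect each opcode to know how far to step. Whether the word form is also smaller depends on the instruction mix; for this four-instruction program both encodings take 8 bytes.
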
From steve at holdenweb.com Wed Jun 23 14:17:20 2010
From: steve at holdenweb.com (Steve Holden)
Date: Wed, 23 Jun 2010 08:17:20 -0400
Subject: [Python-Dev] email package status in 3.X
In-Reply-To:
References: <20100618204831.A8F2A3A40A5@sparrow.telecommunity.com> <609CF661-AB50-49FC-BAA9-B8898C1E9A19@gmail.com>
Message-ID: <4C21FB50.1080905@holdenweb.com>

Guido van Rossum wrote:
> On Tue, Jun 22, 2010 at 9:37 AM, Tres Seaver wrote:
>> Any "turdiness" (which I am *not* arguing for) is a natural consequence
>> of the kinds of backward incompatibilities which were *not* ruled out
>> for Python 3, along with the (early, now waning) "build it and they will
>> come" optimism about adoption rates.
>
> FWIW, my optimism is *not* waning. I think it's good that we're
> having this discussion and I expect something useful will come out of
> it; I also expect in general that the (admittedly serious) problem of
> having to port all dependencies will be solved in the next few years.
> Not by magic, but because many people are taking small steps in the
> right direction, and there will be light eventually. In the meantime
> I don't blame anyone for sticking with 2.x or being too busy to help
> port stuff to 3.x. Python 3 has been a long time in the making -- it
> will be a bit longer still, which was expected.
>
+1

The important thing is to avoid bigotry and FUD, and deal with things the way they are. The #python IRC team have just helped us make a major step forward.

This won't be a campaign with a victorious charge over some imaginary finish line.

regards
 Steve
--
Steve Holden +1 571 484 6266 +1 800 494 3119
See Python Video!
http://python.mirocommunity.org/
Holden Web LLC http://www.holdenweb.com/
UPCOMING EVENTS: http://holdenweb.eventbrite.com/
"All I want for my birthday is another birthday" - Ian Dury, 1942-2000

From alexander.belopolsky at gmail.com Wed Jun 23 16:06:27 2010
From: alexander.belopolsky at gmail.com (Alexander Belopolsky)
Date: Wed, 23 Jun 2010 10:06:27 -0400
Subject: [Python-Dev] red buildbots on 2.7
In-Reply-To:
References: <73196.1277143019@parc.com> <75635.1277147585@parc.com> <20100621212904.7bec83f6@pitrou.net> <77297.1277150242@parc.com> <1277150570.3369.1.camel@localhost.localdomain> <4C1FC7E6.5070707@voidspace.org.uk> <4C1FD5D6.7070007@v.loewis.de> <4C1FD84B.3030202@voidspace.org.uk> <4C1FDB65.4020503@v.loewis.de> <4C1FDF1C.2060308@voidspace.org.uk> <4C1FE4AF.80009@v.loewis.de>
Message-ID:

On Wed, Jun 23, 2010 at 2:08 AM, Ronald Oussoren wrote:
..
> I don't agree. The patch itself is pretty simple, but it does make a rather significant change to the build process: the
> compile-time environment in configure would be different than during the compilation of posixmodule. That is, in functions
> that check for features (the HAVE_FOOBAR macros in pyconfig.h) would use _DARWIN_C_SOURCE while posixmodule
> itself wouldn't. This may lead to subtle bugs, or even compile errors (because some function definitions change when
> _DARWIN_C_SOURCE active).

I agree. Messing with compatibility macros outside of pyconfig.h is not a good idea. Martin's hack, while likely to work in most cases, is still a hack.

I believe, however, we can undefine _DARWIN_C_SOURCE globally at least on 10.4 and higher. I grepped through the headers on my 10.6 system and I notice that the majority of checks for _DARWIN_C_SOURCE are in the form of

#if !defined(_POSIX_C_SOURCE) || defined(_DARWIN_C_SOURCE)

According to a comment in configure,

# On Mac OS X 10.4, defining _POSIX_C_SOURCE or _XOPEN_SOURCE
# disables platform specific features beyond repair.
# On Mac OS X 10.3, defining _POSIX_C_SOURCE or _XOPEN_SOURCE
# has no effect, don't bother defining them

_POSIX_C_SOURCE is already undefined in python headers, so undefining _DARWIN_C_SOURCE will have no effect on the majority of checks.

I was able to find very few exceptions: some cases check _XOPEN_SOURCE instead or in addition to _POSIX_C_SOURCE before ignoring _DARWIN_C_SOURCE:

/usr/include/grp.h:#if !defined(_XOPEN_SOURCE) || defined(_DARWIN_C_SOURCE)
/usr/include/pwd.h:#if (!defined(_POSIX_C_SOURCE) && !defined(_XOPEN_SOURCE)) || defined(_DARWIN_C_SOURCE)
..

Since _XOPEN_SOURCE is similarly undefined in python headers, these cases are unaffected as well.

This leaves a handful of cases where Apple provides additional macros for fine grained control:

/usr/include/stdio.h:#if defined(__DARWIN_10_6_AND_LATER) && (defined(_DARWIN_UNLIMITED_STREAMS) || defined(_DARWIN_C_SOURCE))
/usr/include/unistd.h:#if defined(_DARWIN_UNLIMITED_GETGROUPS) || defined(_DARWIN_C_SOURCE)

The second line above is our dear friend and the _DARWIN_C_SOURCE behavior conditioned on the first line can be enabled by defining the _DARWIN_UNLIMITED_STREAMS macro.

I believe _DARWIN_C_SOURCE casts its net too wide and more targeted macros should be used instead.

..
> Defining _POSIX_C_SOURCE or _DARWIN_C_SOURCE causes library and kernel calls to conform
> to the SUSv3 standards even if doing so would alter the behavior of functions used in 10.3.

I cannot reconcile this with the !defined(_POSIX_C_SOURCE) || defined(_DARWIN_C_SOURCE) logic that I see in the headers.

From pje at telecommunity.com Wed Jun 23 16:24:18 2010
From: pje at telecommunity.com (P.J. Eby)
Date: Wed, 23 Jun 2010 10:24:18 -0400
Subject: [Python-Dev] bytes / unicode
In-Reply-To: <5A4340BB-7B64-4C76-81FF-8A43F179AA7A@twistedmatrix.com>
References: <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100620234723.600ad4a8@pitrou.net> <87wrtsd44p.fsf@uwakimon.sk.tsukuba.ac.jp> <87631c4bca.fsf@uwakimon.sk.tsukuba.ac.jp> <20100621165611.GW5787@unaka.lan> <87r5jz3h8u.fsf@uwakimon.sk.tsukuba.ac.jp> <20100622055040.GE5787@unaka.lan> <5A4340BB-7B64-4C76-81FF-8A43F179AA7A@twistedmatrix.com>
Message-ID: <20100623142422.36F873A404D@sparrow.telecommunity.com>

At 08:34 PM 6/22/2010 -0400, Glyph Lefkowitz wrote:
>I suspect the practical problem here is that there's no CharacterString ABC

That, and the absence of a string coercion protocol so that mixing your custom string with standard strings will do the right thing for your intended use.

From alexander.belopolsky at gmail.com Wed Jun 23 16:48:24 2010
From: alexander.belopolsky at gmail.com (Alexander Belopolsky)
Date: Wed, 23 Jun 2010 10:48:24 -0400
Subject: [Python-Dev] os.getgroups() on MacOS X Was: red buildbots on 2.7
Message-ID:

On Wed, Jun 23, 2010 at 2:08 AM, Ronald Oussoren wrote:
..
>>
>>> * [Ronald's proposal] results in posix.getgroups not reflecting results of posix.setgroups
>>>
>>
>> This effectively substitutes getgrouplist called on the current user
>> for getgroups. In 3.x, I believe the correct action will be to
>> provide direct access to getgrouplist which, while not POSIX (yet?),
>> is widely available.
>
> I don't mind adding getgrouplist, but that issue is separate from this one. BTW. Apparently getgrouplist is posix
> ( ), although this isn't a
> requirement for being added to the posix module.
>

(The link you provided leads to "Linux Standard Base Core Specification," which is different from POSIX, but the distinction is not relevant for our discussion.)
> > It is still my opinion that the second option is preferable for better compatibility with system tools, even if the patch > is more complicated and the library function we use can be considered to be broken. Let me try to formulate what the disagreement is. There are two different group lists that can be associated with a running process: 1) The list of current supplementary group IDs maintained by the system for each process and stored in per-process system tables; and 2) The list of the groups that include the uid under which the process is running as a member. The first list is returned by a system call getgroups and the second can be obtained using system database access functions as follows: pw = getpwuid(getuid()) getgrouplist(pw->pw_name, ..) The first list can be modified by privileged processes using setgroups system call, while the second changes when system databases change. The problem that _DARWIN_C_SOURCE introduces is that it replaces system getgroups with a database query effectively making the true process' list of supplementary group IDs inaccessible to programs. See source code at . The problem is complicated by the fact that OSX true getgroups call appears to truncate the list of groups to NGROUPS_MAX=16. Note, however that it is not clear whether the system call truncates the list or the underlying process tables are limited to 16 entries and additional groups are ignored when the process is created. In my view, getgroups and getgrouplist are two fundamentally different operations and both should be provided by the os module. Redefining os.getgroups to invoke getgrouplist instead of system getgroups on one particular platform to work around that platform's system call limitation is not right. 
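
To make the distinction between the two lists concrete, here is a minimal Python sketch. It is not from the thread: it assumes a Unix system and a Python that provides os.getgrouplist (added in 3.3, well after this discussion), and the function names process_groups and database_groups are made up for illustration.

```python
import os
import pwd


def process_groups():
    """List 1: supplementary group IDs the kernel holds for this process."""
    # Thin wrapper over the getgroups() system call.
    return sorted(set(os.getgroups()))


def database_groups(uid=None):
    """List 2: group IDs the user database associates with a uid."""
    uid = os.getuid() if uid is None else uid
    entry = pwd.getpwuid(uid)  # raises KeyError if the uid has no passwd entry
    # os.getgrouplist(user, gid) wraps the C getgrouplist() function and
    # always includes the gid passed in.
    return sorted(set(os.getgrouplist(entry.pw_name, entry.pw_gid)))
```

For an ordinary login session the two results usually agree; they can differ after a privileged setgroups call or after the group database changes, which is the situation the message above is concerned with.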
From ronaldoussoren at mac.com Wed Jun 23 17:03:39 2010
From: ronaldoussoren at mac.com (ronaldoussoren)
Date: Wed, 23 Jun 2010 08:03:39 -0700 (PDT)
Subject: [Python-Dev] red buildbots on 2.7
In-Reply-To:
Message-ID: <91321b7f-d5a2-6f2f-8ecd-813636aaa3bd@me.com>

On 23 Jun, 2010, at 04:06 PM, Alexander Belopolsky wrote:

On Wed, Jun 23, 2010 at 2:08 AM, Ronald Oussoren wrote:
..
> I don't agree. The patch itself is pretty simple, but it does make a rather significant change to the build process: the
> compile-time environment in configure would be different than during the compilation of posixmodule. That is, in functions
> that check for features (the HAVE_FOOBAR macros in pyconfig.h) would use _DARWIN_C_SOURCE while posixmodule
> itself wouldn't. This may lead to subtle bugs, or even compile errors (because some function definitions change when
> _DARWIN_C_SOURCE active).

I agree. Messing with compatibility macros outside of pyconfig.h is not a good idea. Martin's hack, while likely to work in most cases, is still a hack.

I believe, however, we can undefine _DARWIN_C_SOURCE globally at least on 10.4 and higher. I grepped through the headers on my 10.6 system and I notice that the majority of checks for _DARWIN_C_SOURCE are in the form of

As I wrote, the system will assume _DARWIN_C_SOURCE is set when you don't set _POSIX_C_SOURCE or other feature macros. Working around that is a hack that I don't wish to support.

..
> Defining _POSIX_C_SOURCE or _DARWIN_C_SOURCE causes library and kernel calls to conform
> to the SUSv3 standards even if doing so would alter the behavior of functions used in 10.3.

I cannot reconcile this with !defined(_POSIX_C_SOURCE) || defined(_DARWIN_C_SOURCE) logic that I see in the headers.

This seems to be arranged in sys/cdefs.h. I honestly don't care how this is done, the documentation clearly says that this happens and that indicates that _DARWIN_C_SOURCE selects the API Apple would like you to use.
Anyway, why is this discusion on python-dev instead of in the issue tracker? BTW. IMHO resolution of this issue can wait until after 2.7.0, there is always 2.7.1 and I don't think we need to rush this (the issue has been dormant for quite a while) Ronald -------------- next part -------------- An HTML attachment was scrubbed... URL: From tseaver at palladion.com Wed Jun 23 17:30:23 2010 From: tseaver at palladion.com (Tres Seaver) Date: Wed, 23 Jun 2010 11:30:23 -0400 Subject: [Python-Dev] bytes / unicode In-Reply-To: <87zkymns55.fsf@uwakimon.sk.tsukuba.ac.jp> References: <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100620234723.600ad4a8@pitrou.net> <87wrtsd44p.fsf@uwakimon.sk.tsukuba.ac.jp> <87631c4bca.fsf@uwakimon.sk.tsukuba.ac.jp> <20100621165611.GW5787@unaka.lan> <87r5jz3h8u.fsf@uwakimon.sk.tsukuba.ac.jp> <20100622055040.GE5787@unaka.lan> <87d3vj2tj2.fsf@uwakimon.sk.tsukuba.ac.jp> <0D1D2134-2CF9-4F93-BE82-912C5297D36F@fuhm.net> <87zkymns55.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Stephen J. Turnbull wrote: > We do need str-based implementations of modules like urllib. Why would that be? URLs aren't text, and never will be. The fact that to the eye they may seem to be text-ish doesn't make them text. This *is* a case where "dont make me think" is a losing propsition: programmers who work with URLs in any non-opaque way as text are eventually going to be bitten by this issue no matter how hard we wave our hands. Tres. 
- -- =================================================================== Tres Seaver +1 540-429-0999 tseaver at palladion.com Palladion Software "Excellence by Design" http://palladion.com -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.9 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iEYEARECAAYFAkwiKI4ACgkQ+gerLs4ltQ56/QCbBPdj8jaPbcvPIDPb7ys04oHg fLIAnR+kA2udazsnpzTp2INGz2CoWgzj =Swjw -----END PGP SIGNATURE----- From alexander.belopolsky at gmail.com Wed Jun 23 17:37:12 2010 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Wed, 23 Jun 2010 11:37:12 -0400 Subject: [Python-Dev] os.getgroups() on MacOS X Was: red buildbots on 2.7 In-Reply-To: References: Message-ID: In my previous post, I forgot to include the link to the tracker issue where this problem is being worked on. http://bugs.python.org/issue7900 I'll repost my message there as an issue comment, so that a more detailed technical discussion can continue there. From tseaver at palladion.com Wed Jun 23 17:37:53 2010 From: tseaver at palladion.com (Tres Seaver) Date: Wed, 23 Jun 2010 11:37:53 -0400 Subject: [Python-Dev] os.getgroups() on MacOS X Was: red buildbots on 2.7 In-Reply-To: References: Message-ID: -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Alexander Belopolsky wrote: > In my view, getgroups and getgrouplist are two fundamentally different > operations and both should be provided by the os module. Redefining > os.getgroups to invoke getgrouplist instead of system getgroups on one > particular platform to work around that platform's system call > limitation is not right. +1. syscall wrappers should err on the side of thinness, even to the point of anorexia. Tres. 
- --
===================================================================
Tres Seaver +1 540-429-0999 tseaver at palladion.com
Palladion Software "Excellence by Design" http://palladion.com
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAkwiKlEACgkQ+gerLs4ltQ4vKwCg3JwpWvivq8Dk7PYy2iPrKq/E
88gAn1lfeEcDJlfGm+F0jEbxsv1BfQJW
=JzHS
-----END PGP SIGNATURE-----

From guido at python.org Wed Jun 23 17:43:46 2010
From: guido at python.org (Guido van Rossum)
Date: Wed, 23 Jun 2010 08:43:46 -0700
Subject: [Python-Dev] bytes / unicode
In-Reply-To:
References: <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100620234723.600ad4a8@pitrou.net> <87wrtsd44p.fsf@uwakimon.sk.tsukuba.ac.jp> <87631c4bca.fsf@uwakimon.sk.tsukuba.ac.jp> <20100621165611.GW5787@unaka.lan> <87r5jz3h8u.fsf@uwakimon.sk.tsukuba.ac.jp> <20100622055040.GE5787@unaka.lan> <87d3vj2tj2.fsf@uwakimon.sk.tsukuba.ac.jp> <0D1D2134-2CF9-4F93-BE82-912C5297D36F@fuhm.net> <87zkymns55.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID:

On Wed, Jun 23, 2010 at 8:30 AM, Tres Seaver wrote:
> Stephen J. Turnbull wrote:
>
>> We do need str-based implementations of modules like urllib.
>
> Why would that be? URLs aren't text, and never will be. The fact that
> to the eye they may seem to be text-ish doesn't make them text. This
> *is* a case where "dont make me think" is a losing propsition:
> programmers who work with URLs in any non-opaque way as text are
> eventually going to be bitten by this issue no matter how hard we wave
> our hands.

This has been asserted and contested several times now, and I don't see the two positions getting any closer. So I propose that we drop the discussion "are URLs text or bytes" and try to find something more pragmatic to discuss.
For example: how we can make the suite of functions used for URL processing more polymorphic, so that each developer can choose for herself how URLs need to be treated in her application. -- --Guido van Rossum (python.org/~guido) From cyounkins at gmail.com Wed Jun 23 17:51:31 2010 From: cyounkins at gmail.com (Craig Younkins) Date: Wed, 23 Jun 2010 11:51:31 -0400 Subject: [Python-Dev] Use of cgi.escape can lead to XSS vulnerabilities In-Reply-To: <10286.1277242190@parc.com> References: <10286.1277242190@parc.com> Message-ID: http://bugs.python.org/issue9061 On Tue, Jun 22, 2010 at 5:29 PM, Bill Janssen wrote: > Craig Younkins wrote: > > > cgi.escape never escapes single quote characters, which can easily lead > to a > > Cross-Site Scripting (XSS) vulnerability. This seems to be known by many, > > but a quick search reveals many are using cgi.escape for HTML attribute > > escaping. > > Did you file a bug report? > > Bill > -------------- next part -------------- An HTML attachment was scrubbed... URL: From barry at python.org Wed Jun 23 18:03:27 2010 From: barry at python.org (Barry Warsaw) Date: Wed, 23 Jun 2010 12:03:27 -0400 Subject: [Python-Dev] bytes / unicode In-Reply-To: References: <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100620234723.600ad4a8@pitrou.net> <87wrtsd44p.fsf@uwakimon.sk.tsukuba.ac.jp> <87631c4bca.fsf@uwakimon.sk.tsukuba.ac.jp> <20100621165611.GW5787@unaka.lan> <87r5jz3h8u.fsf@uwakimon.sk.tsukuba.ac.jp> <20100622055040.GE5787@unaka.lan> <87d3vj2tj2.fsf@uwakimon.sk.tsukuba.ac.jp> <0D1D2134-2CF9-4F93-BE82-912C5297D36F@fuhm.net> <87zkymns55.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <20100623120327.3bd030e9@heresy> On Jun 23, 2010, at 08:43 AM, Guido van Rossum wrote: >So I propose that we drop the discussion "are URLs text or bytes" and >try to find something more pragmatic to discuss. email has exactly the same question, and the answer is "yes". 
>For example: how we can make the suite of functions used for URL >processing more polymorphic, so that each developer can choose for >herself how URLs need to be treated in her application. I think email package hackers should watch this effort closely. RDM has written some stuff up on how we think we're going to handle this, though it's probably pretty email package specific. Maybe there's a better, general, or conventional approach lurking around somewhere. http://wiki.python.org/moin/Email%20SIG -Barry -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 836 bytes Desc: not available URL: From janssen at parc.com Wed Jun 23 18:11:05 2010 From: janssen at parc.com (Bill Janssen) Date: Wed, 23 Jun 2010 09:11:05 PDT Subject: [Python-Dev] bytes / unicode In-Reply-To: References: <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100620234723.600ad4a8@pitrou.net> <87wrtsd44p.fsf@uwakimon.sk.tsukuba.ac.jp> <87631c4bca.fsf@uwakimon.sk.tsukuba.ac.jp> <20100621165611.GW5787@unaka.lan> <87r5jz3h8u.fsf@uwakimon.sk.tsukuba.ac.jp> <20100622055040.GE5787@unaka.lan> <87d3vj2tj2.fsf@uwakimon.sk.tsukuba.ac.jp> <0D1D2134-2CF9-4F93-BE82-912C5297D36F@fuhm.net> <87zkymns55.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <13070.1277309465@parc.com> Tres Seaver wrote: > Stephen J. Turnbull wrote: > > > We do need str-based implementations of modules like urllib. > > Why would that be? URLs aren't text, and never will be. The fact that > to the eye they may seem to be text-ish doesn't make them text. This URLs are exactly text (strings, representable as Unicode strings in Py3K), and were designed as such from the start. The fact that some of the things tunneled or carried in URLs are string representations of non-string data shouldn't obscure that point. They're not "text-ish", they're text. They're not opaque, either; they break down in well-specified ways, mainly into strings. 
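
As a rough illustration of that point (a sketch using Python 3's urllib.parse, with a made-up example URL that does not come from the thread), generic splitting yields only strings, and the %-escapes stay ordinary ASCII text until some later, scheme-aware step chooses to decode them:

```python
from urllib.parse import urlsplit, unquote_to_bytes

# Hypothetical URL whose path carries escaped bytes that are not valid UTF-8.
url = "http://example.com/files/%FF%FEraw?q=1#top"

# Generic parsing: every component comes back as a str.
parts = urlsplit(url)
scheme, host, path = parts.scheme, parts.netloc, parts.path
query, fragment = parts.query, parts.fragment

# The escapes in `path` are still plain text at this point. Turning them
# back into raw bytes is a separate, deliberate step.
raw_path = unquote_to_bytes(path)
```

Here path remains the text '/files/%FF%FEraw', while raw_path is a bytes object that no longer has to be decodable in any particular encoding.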
The trouble comes in when we try to go beyond the spec, or handle things that don't conform to the spec. Sure, a path component of a URI might actually be a %-escaped sequence of arbitrary bytes, even bytes that don't represent a string in any known encoding, but that's only *after* reversing the %-escapes, which should happen in a scheme-specific piece of code, not in generic URL parsing or manipulation. Bill From ianb at colorstudy.com Wed Jun 23 18:30:51 2010 From: ianb at colorstudy.com (Ian Bicking) Date: Wed, 23 Jun 2010 11:30:51 -0500 Subject: [Python-Dev] bytes / unicode In-Reply-To: References: <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100620234723.600ad4a8@pitrou.net> <87wrtsd44p.fsf@uwakimon.sk.tsukuba.ac.jp> <87631c4bca.fsf@uwakimon.sk.tsukuba.ac.jp> <20100621165611.GW5787@unaka.lan> <87r5jz3h8u.fsf@uwakimon.sk.tsukuba.ac.jp> <20100622055040.GE5787@unaka.lan> <87d3vj2tj2.fsf@uwakimon.sk.tsukuba.ac.jp> <0D1D2134-2CF9-4F93-BE82-912C5297D36F@fuhm.net> <87zkymns55.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On Wed, Jun 23, 2010 at 10:30 AM, Tres Seaver wrote: > Stephen J. Turnbull wrote: > > > We do need str-based implementations of modules like urllib. > > > Why would that be? URLs aren't text, and never will be. The fact that > to the eye they may seem to be text-ish doesn't make them text. This > *is* a case where "dont make me think" is a losing propsition: > programmers who work with URLs in any non-opaque way as text are > eventually going to be bitten by this issue no matter how hard we wave > our hands. > HTML is text, and URLs are embedded in that text, so it's easy to get a URL that is text. Though, with a little testing, I notice that text alone can't tell you what the right URL really is (at least the intended URL when unsafe characters are embedded in HTML). 
To test I created two pages, one in Latin-1 another in UTF-8, and put in the link:

    ./test.html?param=Réunion

On a Latin-1 page it created a link to test.html?param=R%E9union and on a UTF-8 page it created a link to test.html?param=R%C3%A9union (the second link displays in the URL bar as test.html?param=Réunion but copies with percent encoding). Though if you link to ./Réunion.html then both pages create UTF-8 links. And both pages also link http://Réunion.com to http://xn--runion-bva.com/.

So really neither bytes nor text works completely; query strings receive the encoding of the page, which would be handled transparently if you worked on the page's bytes. Path and domain are consistently encoded with UTF-8 and punycode respectively and so would be handled best when treated as text. And of course if you are a page with a non-ASCII-compatible encoding you really must handle encodings before the URL is sensible.

Another issue here is that there's no "encoding" for turning a URL into bytes if the URL is not already ASCII. A proper way to encode a URL would be:

(Totally as an aside, as I remind myself of new module names I notice it's not easy to google specifically for Python 3 docs, e.g. "python 3 urlsplit" gives me 2.6 docs)

    from urllib.parse import urlsplit, urlunsplit
    import encodings.idna

    def encode_http_url(url, page_encoding='ASCII', errors='strict'):
        scheme, netloc, path, query, fragment = urlsplit(url)
        scheme = scheme.encode('ASCII', errors)
        auth = port = None
        if '@' in netloc:
            auth, netloc = netloc.split('@', 1)
        if ':' in netloc:
            netloc, port = netloc.split(':', 1)
        netloc = encodings.idna.ToASCII(netloc)
        if port:
            netloc = netloc + b':' + port.encode('ASCII', errors)
        if auth:
            netloc = auth.encode('UTF-8', errors) + b'@' + netloc
        path = path.encode('UTF-8', errors)
        query = query.encode(page_encoding, errors)
        fragment = fragment.encode('UTF-8', errors)
        return urlunsplit_bytes((scheme, netloc, path, query, fragment))

Where urlunsplit_bytes handles bytes (urlunsplit does not). It's helpful for me at least to look at that code specifically:

    def urlunsplit(components):
        scheme, netloc, url, query, fragment = components
        if netloc or (scheme and scheme in uses_netloc and url[:2] != '//'):
            if url and url[:1] != '/': url = '/' + url
            url = '//' + (netloc or '') + url
        if scheme:
            url = scheme + ':' + url
        if query:
            url = url + '?' + query
        if fragment:
            url = url + '#' + fragment
        return url

In this case it really would be best to have Python 2's system where things are coerced to ASCII implicitly. Or, more specifically, if all those string literals in that routine could be implicitly converted to bytes using ASCII. Conceptually I think this is reasonable, as for URLs (at least with HTTP, but in practice I think this applies to all URLs) the ASCII bytes really do have meaning. That is, '/' (*in the context of urlunsplit*) really is \x2f specifically. Or another example, making a GET request really means sending the bytes \x47\x45\x54 and there is no other set of bytes that has that meaning.
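
The browser behaviour reported earlier in this message can be reproduced one component at a time with Python 3's urllib.parse.quote and the standard idna codec. This is only a sketch under the assumptions stated in the message (query in the page encoding, path in UTF-8, host as punycode); the variable names are illustrative and not part of any proposed API.

```python
from urllib.parse import quote

word = "Réunion"

# Query string: the browser used the encoding of the containing page.
query_latin1 = "param=" + quote(word, encoding="latin-1")  # page served as Latin-1
query_utf8 = "param=" + quote(word, encoding="utf-8")      # page served as UTF-8

# Path: UTF-8 regardless of the page encoding.
path = "/" + quote(word + ".html", encoding="utf-8")

# Host name: IDNA (punycode) via the standard "idna" codec.
host = "réunion.com".encode("idna")

# Protocol tokens such as the HTTP method are fixed ASCII byte sequences.
method = "GET".encode("ascii")
```

The two query strings differ even though they came from the same text, which is why the text of a link alone cannot say what the intended URL is.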
The WebSockets specification for instance defines things like "colon": http://tools.ietf.org/html/draft-hixie-thewebsocketprotocol-76#page-5 -- in an earlier version they even used bytes to describe HTTP ( http://tools.ietf.org/html/draft-hixie-thewebsocketprotocol-54#page-13), though this annoyed many people. -- Ian Bicking | http://blog.ianbicking.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From janssen at parc.com Wed Jun 23 18:46:48 2010 From: janssen at parc.com (Bill Janssen) Date: Wed, 23 Jun 2010 09:46:48 PDT Subject: [Python-Dev] bytes / unicode In-Reply-To: References: <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100620234723.600ad4a8@pitrou.net> <87wrtsd44p.fsf@uwakimon.sk.tsukuba.ac.jp> <87631c4bca.fsf@uwakimon.sk.tsukuba.ac.jp> <20100621165611.GW5787@unaka.lan> <87r5jz3h8u.fsf@uwakimon.sk.tsukuba.ac.jp> <20100622055040.GE5787@unaka.lan> <87d3vj2tj2.fsf@uwakimon.sk.tsukuba.ac.jp> <0D1D2134-2CF9-4F93-BE82-912C5297D36F@fuhm.net> <87zkymns55.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <13837.1277311608@parc.com> Guido van Rossum wrote: > So I propose that we drop the discussion "are URLs text or bytes" and > try to find something more pragmatic to discuss. > > For example: how we can make the suite of functions used for URL > processing more polymorphic, so that each developer can choose for > herself how URLs need to be treated in her application. While I agree with "find something more pragmatic to discuss", it also seems to me that introducing polymorphic URL processing might make things more confusing and error-prone. The bigger problem seems to be that we're revisiting the design discussion about urllib.parse from the summer of 2008. See http://bugs.python.org/issue3300 if you want to recall how we hashed this out 2 years ago. I didn't particularly like that design, but I had to go off on vacation :-), and things got settled while I was away. 
I haven't heard much from Matt Giuca since he stopped by and lobbed that patch into the standard library.

But since Guido is the one who settled it, why are we talking about it again?

Bill

From ianb at colorstudy.com Wed Jun 23 18:49:13 2010
From: ianb at colorstudy.com (Ian Bicking)
Date: Wed, 23 Jun 2010 11:49:13 -0500
Subject: [Python-Dev] bytes / unicode
In-Reply-To:
References: <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100620234723.600ad4a8@pitrou.net> <87wrtsd44p.fsf@uwakimon.sk.tsukuba.ac.jp> <87631c4bca.fsf@uwakimon.sk.tsukuba.ac.jp> <20100621165611.GW5787@unaka.lan> <87r5jz3h8u.fsf@uwakimon.sk.tsukuba.ac.jp> <20100622055040.GE5787@unaka.lan> <87d3vj2tj2.fsf@uwakimon.sk.tsukuba.ac.jp> <0D1D2134-2CF9-4F93-BE82-912C5297D36F@fuhm.net> <87zkymns55.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID:

Oops, I forgot some important quoting (important for the algorithm, maybe not actually for the discussion)...

    from urllib.parse import urlsplit, urlunsplit
    import encodings.idna

    # urllib.parse.quote both always returns str, and is not as conservative in quoting as required here...
    def quote_unsafe_bytes(b):
        result = []
        for c in b:
            if c < 0x20 or c >= 0x80:
                result.extend(('%%%02X' % c).encode('ASCII'))
            else:
                result.append(c)
        return bytes(result)

    def encode_http_url(url, page_encoding='ASCII', errors='strict'):
        scheme, netloc, path, query, fragment = urlsplit(url)
        scheme = scheme.encode('ASCII', errors)
        auth = port = None
        if '@' in netloc:
            auth, netloc = netloc.split('@', 1)
        if ':' in netloc:
            netloc, port = netloc.split(':', 1)
        netloc = encodings.idna.ToASCII(netloc)
        if port:
            netloc = netloc + b':' + port.encode('ASCII', errors)
        if auth:
            netloc = quote_unsafe_bytes(auth.encode('UTF-8', errors)) + b'@' + netloc
        path = quote_unsafe_bytes(path.encode('UTF-8', errors))
        query = quote_unsafe_bytes(query.encode(page_encoding, errors))
        fragment = quote_unsafe_bytes(fragment.encode('UTF-8', errors))
        return urlunsplit_bytes((scheme, netloc, path, query, fragment))

--
Ian Bicking | http://blog.ianbicking.org

From glyph at twistedmatrix.com Wed Jun 23 03:01:17 2010
From: glyph at twistedmatrix.com (Glyph Lefkowitz)
Date: Tue, 22 Jun 2010 21:01:17 -0400
Subject: [Python-Dev] bytes / unicode
In-Reply-To:
References: <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100620234723.600ad4a8@pitrou.net> <87wrtsd44p.fsf@uwakimon.sk.tsukuba.ac.jp> <87631c4bca.fsf@uwakimon.sk.tsukuba.ac.jp> <20100621165611.GW5787@unaka.lan> <87r5jz3h8u.fsf@uwakimon.sk.tsukuba.ac.jp> <20100622055040.GE5787@unaka.lan> <87d3vj2tj2.fsf@uwakimon.sk.tsukuba.ac.jp> <0D1D2134-2CF9-4F93-BE82-912C5297D36F@fuhm.net> <94700B9C-25B4-4A75-BA43-20FEA3FDE772@twistedmatrix.com>
Message-ID:

On Jun 22, 2010, at 8:57 PM, Robert Collins wrote:

> bzr has a cache of decoded strings in it precisely because decode is
> slow. We accept slowness encoding to the users locale because thats
> typically much less data to examine than we've examined while
> generating the commit/diff/whatever. We also face memory pressure on a
> regular basis, and that has been, at least partly, due to UCS4 - our
> translation cache helps there because we have less duplicate UCS4
> strings.

Thanks for setting the record straight - apologies if I missed this earlier in the thread. It does seem vaguely familiar.

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From tjreedy at udel.edu Wed Jun 23 19:38:05 2010
From: tjreedy at udel.edu (Terry Reedy)
Date: Wed, 23 Jun 2010 13:38:05 -0400
Subject: [Python-Dev] WPython 1.1 was released
In-Reply-To:
References: <201006232112.41047.steve@pearwood.info>
Message-ID:

On 6/23/2010 7:28 AM, Cesare Di Mauro wrote:
> sorry, I made a mistake, assuming that the project was known.

A common mistake of people who announce their projects ;-)
Someone recently made the same mistake on python-list with respect to a 'BDD' package (Wikipedia suggests about 6 possible expansions of the acronym).

>
> WPython is a CPython 2.6.4 implementation that uses "wordcodes" instead
> of bytecodes. A wordcode is a word (16 bits, two bytes, in this case)

I suggest you specify the base version (2.6.4) on the project page as that would be very relevant to many who visit. One should not have to download and look at the source to discover if they should bother downloading the code. Perhaps also add a sentence as to the choice (why not 3.1?).

--
Terry Jan Reedy

From cesare.di.mauro at gmail.com Wed Jun 23 19:53:46 2010
From: cesare.di.mauro at gmail.com (Cesare Di Mauro)
Date: Wed, 23 Jun 2010 19:53:46 +0200
Subject: [Python-Dev] WPython 1.1 was released
In-Reply-To:
References: <201006232112.41047.steve@pearwood.info>
Message-ID:

2010/6/23 Terry Reedy

> On 6/23/2010 7:28 AM, Cesare Di Mauro wrote:
> WPython is a CPython 2.6.4 implementation that uses "wordcodes" instead
> of bytecodes. A wordcode is a word (16 bits, two bytes, in this case)
>
> I suggest you specify the base version (2.6.4) on the project page as that
> would be very relevant to many who visit. One should not have to download
> and look at the source to discover if they should bother
> downloading the code. Perhaps also add a sentence as to the choice (why not
> 3.1?).
>
> --
> Terry Jan Reedy

Thanks for the suggestions. I've updated the main project accordingly. :)

Cesare

-------------- next part --------------
An HTML attachment was scrubbed...
URL: From tseaver at palladion.com Wed Jun 23 20:23:33 2010 From: tseaver at palladion.com (Tres Seaver) Date: Wed, 23 Jun 2010 14:23:33 -0400 Subject: [Python-Dev] bytes / unicode In-Reply-To: <13837.1277311608@parc.com> References: <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100620234723.600ad4a8@pitrou.net> <87wrtsd44p.fsf@uwakimon.sk.tsukuba.ac.jp> <87631c4bca.fsf@uwakimon.sk.tsukuba.ac.jp> <20100621165611.GW5787@unaka.lan> <87r5jz3h8u.fsf@uwakimon.sk.tsukuba.ac.jp> <20100622055040.GE5787@unaka.lan> <87d3vj2tj2.fsf@uwakimon.sk.tsukuba.ac.jp> <0D1D2134-2CF9-4F93-BE82-912C5297D36F@fuhm.net> <87zkymns55.fsf@uwakimon.sk.tsukuba.ac.jp> <13837.1277311608@parc.com> Message-ID: -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Bill Janssen wrote: > The bigger problem seems to be that we're revisiting the design > discussion about urllib.parse from the summer of 2008. See > http://bugs.python.org/issue3300 if you want to recall how we hashed > this out 2 years ago. I didn't particularly like that design, but I had > to go off on vacation :-), and things got settled while I was away. I > haven't heard much from Matt Giuca since he stopped by and lobbed that > patch into the standard library. > > But since Guido is the one who settled it, why are we talking about it > again? Perhaps such decisions need revisiting in light of subsequent experience / pain / learning. E.g: - - the repeated inability of the web-sig to converge on appropriate semantics for a Python3-compatible version of the WSGI spec; - - the subsequent quirkiness of the Python3 wsgiref implementation; - - the breakage in cgi.py which prevents handling file uploads in a web application; - - the slow adoption / porting rate of major web frameworks and libraries to Python 3. Tres. 
- --
===================================================================
Tres Seaver +1 540-429-0999 tseaver at palladion.com
Palladion Software "Excellence by Design" http://palladion.com
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAkwiUSAACgkQ+gerLs4ltQ49EwCeLYwrZs6QfairPP5zpeeUlxao
qg8An37kRz1CrzGc3kScvSqVx8FPnO1M
=lR6R
-----END PGP SIGNATURE-----

From martin at v.loewis.de Wed Jun 23 20:29:44 2010
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Wed, 23 Jun 2010 20:29:44 +0200
Subject: [Python-Dev] os.getgroups() on MacOS X Was: red buildbots on 2.7
In-Reply-To:
References:
Message-ID: <4C225298.9010701@v.loewis.de>

> The problem that _DARWIN_C_SOURCE introduces is that it replaces
> system getgroups with a database query effectively making the true
> process' list of supplementary group IDs inaccessible to programs.
> See source code at
> .

If that is true (i.e. the file is really the one that is being used), I think this is a severe flaw in OSX's implementation of the POSIX specification.

Then, I agree that Python, in turn, should make sure that posix.getgroups is really the POSIX version of getgroups, not the Apple version.

This is a general principle: if the system has two competing implementations of some API, the Python posix module should strive to call the POSIX version of the API. If the vendor's version of the API is also useful, it can be exposed under a different name (if, in turn, this is technically possible).

Just my 0.02€.
Regards, Martin From glyph at twistedmatrix.com Wed Jun 23 20:31:41 2010 From: glyph at twistedmatrix.com (Glyph Lefkowitz) Date: Wed, 23 Jun 2010 14:31:41 -0400 Subject: [Python-Dev] email package status in 3.X In-Reply-To: <4C21FB50.1080905@holdenweb.com> References: <20100618204831.A8F2A3A40A5@sparrow.telecommunity.com> <609CF661-AB50-49FC-BAA9-B8898C1E9A19@gmail.com> <4C21FB50.1080905@holdenweb.com> Message-ID: <9A9D719C-0ED5-4061-B314-06450CC965BB@twistedmatrix.com> On Jun 23, 2010, at 8:17 AM, Steve Holden wrote: > Guido van Rossum wrote: >> On Tue, Jun 22, 2010 at 9:37 AM, Tres Seaver wrote: >>> Any "turdiness" (which I am *not* arguing for) is a natural consequence >>> of the kinds of backward incompatibilities which were *not* ruled out >>> for Python 3, along with the (early, now waning) "build it and they will >>> come" optimism about adoption rates. >> >> FWIW, my optimisim is *not* waning. I think it's good that we're >> having this discussion and I expect something useful will come out of >> it; I also expect in general that the (admittedly serious) problem of >> having to port all dependencies will be solved in the next few years. >> Not by magic, but because many people are taking small steps in the >> right direction, and there will be light eventually. In the mean time >> I don't blame anyone for sticking with 2.x or being too busy to help >> port stuff to 3.x. Python 3 has been a long time in the making -- it >> will be a bit longer still, which was expected. >> > +1 > > The important thing is to avoid bigotry and FUD, and deal with things > the way they are. The #python IRC team have just helped us make a major > step forward. This won't be a campaign with a victorious charge over > some imaginary finish line. For sure. I don't speak for Tres, but I don't think he wasn't talking about optimism about *adoption*, overall, but optimism about adoption *rates*. And I don't think he was talking about it coming from Guido :). 
There has definitely been some "irrational exuberance" from some quarters. The form it usually takes is someone making a blog post which assumes, because the author could port their smallish library or application without too much hassle, that Python 2.x is already dead and everyone should be off of it in a couple of weeks. I've never heard this position from the core team or any official communication or documentation. Far from it: the realistic attitude that the Python 3 migration is something that will take a while has significantly reduced my own concerns. Even the aforementioned blog posts have been encouraging in some ways, because a lot of people are reporting surprisingly easy transitions. -------------- next part -------------- An HTML attachment was scrubbed... URL: From tseaver at palladion.com Wed Jun 23 20:40:47 2010 From: tseaver at palladion.com (Tres Seaver) Date: Wed, 23 Jun 2010 14:40:47 -0400 Subject: [Python-Dev] email package status in 3.X In-Reply-To: <9A9D719C-0ED5-4061-B314-06450CC965BB@twistedmatrix.com> References: <20100618204831.A8F2A3A40A5@sparrow.telecommunity.com> <609CF661-AB50-49FC-BAA9-B8898C1E9A19@gmail.com> <4C21FB50.1080905@holdenweb.com> <9A9D719C-0ED5-4061-B314-06450CC965BB@twistedmatrix.com> Message-ID: -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Glyph Lefkowitz wrote: > I don't speak for Tres, but I don't think he wasn't talking about > optimism about *adoption*, overall, but optimism about adoption > *rates*. And I don't think he was talking about it coming from Guido > :). You channel me correctly here. In particular, the phrase "build it and they will come" was meant to address the idea that the only thing needed to drive adoption was the release of the new, shiny Python3. 
That particular bit of optimism is what I meant to describe as waning: the community on the whole seems to be more realistic now than two or three years ago about the kind of extra effort required from both core developers and from existing Python 2 folks to get to Python 3. > There has definitely been some "irrational exuberance" from some > quarters. The form it usually takes is someone making a blog post > which assumes, because the author could port their smallish library > or application without too much hassle, that Python 2.x is already > dead and everyone should be off of it in a couple of weeks. > > I've never heard this position from the core team or any official > communication or documentation. Far from it: the realistic attitude > that the Python 3 migration is something that will take a while has > significantly reduced my own concerns. > > Even the aforementioned blog posts have been encouraging in some > ways, because a lot of people are reporting surprisingly easy > transitions. Indeed. Tres. 
- -- =================================================================== Tres Seaver +1 540-429-0999 tseaver at palladion.com Palladion Software "Excellence by Design" http://palladion.com -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.9 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iEYEARECAAYFAkwiVS8ACgkQ+gerLs4ltQ4kQgCeJ9nwU8XyiWzOTpHSbWg21bzU 0/IAnjVOj5SlgA9mnAsx4/wMad5lNkqq =HObh -----END PGP SIGNATURE----- From solipsis at pitrou.net Wed Jun 23 21:36:45 2010 From: solipsis at pitrou.net (Antoine Pitrou) Date: Wed, 23 Jun 2010 21:36:45 +0200 Subject: [Python-Dev] bytes / unicode References: <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100620234723.600ad4a8@pitrou.net> <87wrtsd44p.fsf@uwakimon.sk.tsukuba.ac.jp> <87631c4bca.fsf@uwakimon.sk.tsukuba.ac.jp> <20100621165611.GW5787@unaka.lan> <87r5jz3h8u.fsf@uwakimon.sk.tsukuba.ac.jp> <20100622055040.GE5787@unaka.lan> <87d3vj2tj2.fsf@uwakimon.sk.tsukuba.ac.jp> <0D1D2134-2CF9-4F93-BE82-912C5297D36F@fuhm.net> <87zkymns55.fsf@uwakimon.sk.tsukuba.ac.jp> <13837.1277311608@parc.com> Message-ID: <20100623213645.658517d7@pitrou.net> On Wed, 23 Jun 2010 14:23:33 -0400 Tres Seaver wrote: > > Perhaps such decisions need revisiting in light of subsequent experience > / pain / learning. E.g: > > - - the repeated inability of the web-sig to converge on appropriate > semantics for a Python3-compatible version of the WSGI spec; > > - - the subsequent quirkiness of the Python3 wsgiref implementation; The way wsgiref was adapted is admittedly suboptimal. It was totally broken at first, and PJE didn't want to look very deeply into it. We therefore had to settle on a series of small modifications that seemed rather reasonable, but without any in-depth discussion of what WSGI had to look like under Python 3 (since it was not our job and responsibility). Therefore, I don't think wsgiref should be taken as a guide to what a cleaned up, Python 3-specific WSGI must look like. 
> - - the slow adoption / porting rate of major web frameworks and libraries
> to Python 3.

Some of the major web frameworks and libraries have a ton of dependencies, which would explain why they really haven't bothered yet.

I don't think you can claim, though, that Python 3 makes things significantly harder for these frameworks. The proof is that many of them already give the user unicode strings in Python 2.x. They must have somehow got the decoding right.

Regards

Antoine.

From ronaldoussoren at mac.com Wed Jun 23 22:31:42 2010
From: ronaldoussoren at mac.com (Ronald Oussoren)
Date: Wed, 23 Jun 2010 22:31:42 +0200
Subject: [Python-Dev] os.getgroups() on MacOS X Was: red buildbots on 2.7
In-Reply-To:
References:
Message-ID: <02EFB202-505A-405E-AE00-ABC0A2234DDA@mac.com>

On 23 Jun, 2010, at 16:48, Alexander Belopolsky wrote:

> On Wed, Jun 23, 2010 at 2:08 AM, Ronald Oussoren wrote:
> ..
>>>
>>>> * [Ronald's proposal] results in posix.getgroups not reflecting results of posix.setgroups
>>>>
>>>
>>> This effectively substitutes getgrouplist called on the current user
>>> for getgroups. In 3.x, I believe the correct action will be to
>>> provide direct access to getgrouplist which, while not POSIX (yet?),
>>> is widely available.
>>
>> I don't mind adding getgrouplist, but that issue is separate from this one. BTW. Apparently getgrouplist is posix
>> ( ), although this isn't a
>> requirement for being added to the posix module.
>>
>
> (The link you provided leads to "Linux Standard Base Core
> Specification," which is different from POSIX, but the distinction is
> not relevant for our discussion.)

I know, but the page claims getgrouplist is in SUS. I've since looked at what claims to be a copy of SUS: http://www.unix.org/single_unix_specification/ and that does not contain getgrouplist.
> >> >> It is still my opinion that the second option is preferable for better compatibility with system tools, even if the patch >> is more complicated and the library function we use can be considered to be broken. > > Let me try to formulate what the disagreement is. There are two > different group lists that can be associated with a running process: > 1) The list of current supplementary group IDs maintained by the > system for each process and stored in per-process system tables; and > 2) The list of the groups that include the uid under which the process > is running as a member. > > The first list is returned by a system call getgroups and the second > can be obtained using system database access functions as follows: > > pw = getpwuid(getuid()) > getgrouplist(pw->pw_name, ..) > > The first list can be modified by privileged processes using setgroups > system call, while the second changes when system databases change. > > The problem that _DARWIN_C_SOURCE introduces is that it replaces > system getgroups with a database query effectively making the true > process' list of supplementary group IDs inaccessible to programs. > See source code at > . > > The problem is complicated by the fact that OSX true getgroups call > appears to truncate the list of groups to NGROUPS_MAX=16. Note, > however that it is not clear whether the system call truncates the > list or the underlying process tables are limited to 16 entries and > additional groups are ignored when the process is created. > > In my view, getgroups and getgrouplist are two fundamentally different > operations and both should be provided by the os module. Redefining > os.getgroups to invoke getgrouplist instead of system getgroups on one > particular platform to work around that platform's system call > limitation is not right. But we don't redefine os.getgroups to call getgrouplist, it is the system library that seems to implement getgroups(3) using getgrouplist(3). 
I agree that that is odd at best, but it is IMHO functioning as designed by Apple (that is, Apple chose the current behavior; they didn't accidentally break this).

The previous paragraph is nitpicky, but this is IMO an important distinction.

I've done some more experimentation:

* compat(5) lies: not setting _DARWIN_C_SOURCE is not the same as setting _DARWIN_C_SOURCE when the deployment target is 10.5. With _DARWIN_C_SOURCE, getgroups is translated to the symbol "_getgroups$DARWIN_EXTSN" in the object file; without it, the symbol is "_getgroups".

* the id(1) command uses the version of getgroups that does not reflect setgroups. Given this script:

    import os
    os.system("id")
    os.setgroups([1])
    os.system("id")

  Running it gives an unexpected output:

    # /usr/bin/python doit.py
    uid=0(root) gid=0(wheel) groups=0(wheel),204(_developer),100(_lpoperator),98(_lpadmin),80(admin),61(localaccounts),29(certusers),20(staff),12(everyone),9(procmod),8(procview),5(operator),4(tty),3(sys),2(kmem),1(daemon),401(com.apple.access_screensharing)
    uid=0(root) gid=0(wheel) groups=0(wheel),204(_developer),100(_lpoperator),98(_lpadmin),80(admin),61(localaccounts),29(certusers),20(staff),12(everyone),9(procmod),8(procview),5(operator),4(tty),3(sys),2(kmem),1(daemon),401(com.apple.access_screensharing)

* when I add a group in the Accounts panel in System Preferences and add my account to it, the id(1) command immediately reflects the change (as expected given the previous result)

* adding a non-administrator account to a newly created group does not affect filesystem access for existing processes (that is, I created a file that's only readable for the new group and the test user couldn't read that file until I logged out and in again), which means the Accounts panel doesn't magically alter kernel state for running processes.
* Setting or unsetting _DARWIN_C_SOURCE doesn't affect the contents of pyconfig.h beyond that setting:

    $ diff pyconfig.h-DARWIN_C_SOURCE pyconfig.h-NO_DARWIN_SOURCE
    1124c1124
    < #define _DARWIN_C_SOURCE 1
    ---
    > /* #undef _DARWIN_C_SOURCE */

  "pyconfig.h-DARWIN_C_SOURCE" is generated by the current configure script; the other one is generated by a configure script that was patched to not set _DARWIN_C_SOURCE (by removing "AC_DEFINE(_DARWIN_C_SOURCE, 1, [Define on Darwin to activate all library features])" from configure.in and regenerating configure). Both were generated using "configure MACOSX_DEPLOYMENT_TARGET=10.5".

* setgroups(3) cannot set more than 16 groups, that is "setgroups(17, gidset)" will always fail with EINVAL (this is on OSX 10.6.4). I've verified this using a C program that directly calls the right APIs.

I'm busy with projects for the rest of the week and won't be able to do anything python-dev related until Sunday.

Ronald
-------------- next part --------------
A non-text attachment was scrubbed...
Name: smime.p7s
Type: application/pkcs7-signature
Size: 3567 bytes
Desc: not available
URL:

From a.badger at gmail.com Wed Jun 23 23:30:22 2010
From: a.badger at gmail.com (Toshio Kuratomi)
Date: Wed, 23 Jun 2010 17:30:22 -0400
Subject: [Python-Dev] bytes / unicode
In-Reply-To: <20100623213645.658517d7@pitrou.net>
References: <20100622055040.GE5787@unaka.lan> <87d3vj2tj2.fsf@uwakimon.sk.tsukuba.ac.jp> <0D1D2134-2CF9-4F93-BE82-912C5297D36F@fuhm.net> <87zkymns55.fsf@uwakimon.sk.tsukuba.ac.jp> <13837.1277311608@parc.com> <20100623213645.658517d7@pitrou.net>
Message-ID: <20100623213022.GB3470@unaka.lan>

On Wed, Jun 23, 2010 at 09:36:45PM +0200, Antoine Pitrou wrote:
> On Wed, 23 Jun 2010 14:23:33 -0400
> Tres Seaver wrote:
> > - - the slow adoption / porting rate of major web frameworks and libraries
> > to Python 3.
>
> Some of the major web frameworks and libraries have a ton of
> dependencies, which would explain why they really haven't bothered yet.
> > I don't think you can't claim, though, that Python 3 makes things > significantly harder for these frameworks. The proof is that many of > them already give the user unicode strings in Python 2.x. They must > have somehow got the decoding right. > Note that this assumption seems optimistic to me. I started talking to Graham Dumpleton, author of mod_wsgi a couple years back because mod_wsgi and paste do decoding of bytes to unicode at different layers which caused problems for application level code that should otherwise run fine when being served by mod_wsgi or paste httpserver. That was the beginning of Graham starting to talk about what the wsgi spec really should look like under python3 instead of the broken way that the appendix to the current wsgi spec states. -Toshio -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 198 bytes Desc: not available URL: From solipsis at pitrou.net Wed Jun 23 23:35:12 2010 From: solipsis at pitrou.net (Antoine Pitrou) Date: Wed, 23 Jun 2010 23:35:12 +0200 Subject: [Python-Dev] bytes / unicode In-Reply-To: <20100623213022.GB3470@unaka.lan> References: <20100622055040.GE5787@unaka.lan> <87d3vj2tj2.fsf@uwakimon.sk.tsukuba.ac.jp> <0D1D2134-2CF9-4F93-BE82-912C5297D36F@fuhm.net> <87zkymns55.fsf@uwakimon.sk.tsukuba.ac.jp> <13837.1277311608@parc.com> <20100623213645.658517d7@pitrou.net> <20100623213022.GB3470@unaka.lan> Message-ID: <20100623233512.50b5b710@pitrou.net> On Wed, 23 Jun 2010 17:30:22 -0400 Toshio Kuratomi wrote: > Note that this assumption seems optimistic to me. I started talking to Graham > Dumpleton, author of mod_wsgi a couple years back because mod_wsgi and paste > do decoding of bytes to unicode at different layers which caused problems > for application level code that should otherwise run fine when being served > by mod_wsgi or paste httpserver. 
That was the beginning of Graham starting > to talk about what the wsgi spec really should look like under python3 > instead of the broken way that the appendix to the current wsgi spec states. Ok, but the reason would be that the WSGI spec is broken. Not Python 3 itself. Regards Antoine. From henry at precheur.org Wed Jun 23 23:35:38 2010 From: henry at precheur.org (Henry Precheur) Date: Wed, 23 Jun 2010 14:35:38 -0700 Subject: [Python-Dev] [Web-SIG] bytes / unicode In-Reply-To: <20100623213645.658517d7@pitrou.net> References: <20100622055040.GE5787@unaka.lan> <87d3vj2tj2.fsf@uwakimon.sk.tsukuba.ac.jp> <0D1D2134-2CF9-4F93-BE82-912C5297D36F@fuhm.net> <87zkymns55.fsf@uwakimon.sk.tsukuba.ac.jp> <13837.1277311608@parc.com> <20100623213645.658517d7@pitrou.net> Message-ID: <20100623213538.GB9501@banane.novuscom.net> On Wed, Jun 23, 2010 at 09:36:45PM +0200, Antoine Pitrou wrote: > I don't think you can't claim, though, that Python 3 makes things > significantly harder for these frameworks. The proof is that many of > them already give the user unicode strings in Python 2.x. They must > have somehow got the decoding right. Well... Frameworks usually 'simplify' the problem by partly ignoring it. By default they assume the data in the request in UTF-8. You can specify an alternative encoding in most of them. Django [1], Werkzeug [2], and WebOb [3] do that. The problem with this approach is that you still have to deal with weird requests where one thing is unicode, and another is latin-1. Sometime you can even have 2 different encodings in a single header like Cookies. There's no solution to this problem, it has to be solved on a case by case basis. There was a big discussion a while ago on web-sig. I think the consensus was that WSGI for Python 3 should assume that the data is encoded in latin-1 since it's the default encoding according to the RFC. 
[1] http://docs.djangoproject.com/en/dev/ref/request-response/#django.http.HttpRequest.encoding [2] http://werkzeug.pocoo.org/documentation/dev/unicode.html#request-and-response-objects [3] http://pythonpaste.org/webob/reference.html#unicode-variables -- Henry Pr?cheur From tullarisc256 at gmail.com Wed Jun 23 21:08:52 2010 From: tullarisc256 at gmail.com (tullarisc) Date: Wed, 23 Jun 2010 12:08:52 -0700 (PDT) Subject: [Python-Dev] swig/python and intel's threadedbuildginblocks Message-ID: <28975580.post@talk.nabble.com> Hi, I've compiled intel's OSS threadedbuidlingblocks library on OpenBSD and put everything in some swig interfaces. Here you go: http://tullarisc.xtreemhost.com/swig.ttb.tgz Love, tullarisc. -- View this message in context: http://old.nabble.com/swig-python-and-intel%27s-threadedbuildginblocks-tp28975580p28975580.html Sent from the Python - python-dev mailing list archive at Nabble.com. From brett at python.org Wed Jun 23 23:53:36 2010 From: brett at python.org (Brett Cannon) Date: Wed, 23 Jun 2010 14:53:36 -0700 Subject: [Python-Dev] what environment variable should contain compiler warning suppression flags? Message-ID: I finally realized why clang has not been silencing its warnings about unused return values: I have -Wno-unused-value set in CFLAGS which comes before OPT (which defines -Wall) as set in PY_CFLAGS in Makefile.pre.in. I could obviously set OPT in my environment, but that would override the default OPT settings Python uses. I could put it in EXTRA_CFLAGS, but the README says that's for stuff that tweak binary compatibility. So basically what I am asking is what environment variable should I use? If CFLAGS is correct then does anyone have any issues if I change the order of things for PY_CFLAGS in the Makefile so that CFLAGS comes after OPT? 
From a.badger at gmail.com Thu Jun 24 00:57:40 2010 From: a.badger at gmail.com (Toshio Kuratomi) Date: Wed, 23 Jun 2010 18:57:40 -0400 Subject: [Python-Dev] bytes / unicode In-Reply-To: <20100623233512.50b5b710@pitrou.net> References: <0D1D2134-2CF9-4F93-BE82-912C5297D36F@fuhm.net> <87zkymns55.fsf@uwakimon.sk.tsukuba.ac.jp> <13837.1277311608@parc.com> <20100623213645.658517d7@pitrou.net> <20100623213022.GB3470@unaka.lan> <20100623233512.50b5b710@pitrou.net> Message-ID: <20100623225740.GC3470@unaka.lan> On Wed, Jun 23, 2010 at 11:35:12PM +0200, Antoine Pitrou wrote: > On Wed, 23 Jun 2010 17:30:22 -0400 > Toshio Kuratomi wrote: > > Note that this assumption seems optimistic to me. I started talking to Graham > > Dumpleton, author of mod_wsgi a couple years back because mod_wsgi and paste > > do decoding of bytes to unicode at different layers which caused problems > > for application level code that should otherwise run fine when being served > > by mod_wsgi or paste httpserver. That was the beginning of Graham starting > > to talk about what the wsgi spec really should look like under python3 > > instead of the broken way that the appendix to the current wsgi spec states. > > Ok, but the reason would be that the WSGI spec is broken. Not Python 3 > itself. > Agreed. Neither python2 nor python3 is broken. It's the wsgi spec and the implementation of that spec where things fall down. From your first post, I thought you were claiming that python3 was broken since web frameworks got decoding right on python2 and I just wanted to defend python3 by showing that python2 wasn't all sunshine and roses. -Toshio -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: application/pgp-signature Size: 198 bytes Desc: not available URL: From foom at fuhm.net Thu Jun 24 02:26:25 2010 From: foom at fuhm.net (James Y Knight) Date: Wed, 23 Jun 2010 20:26:25 -0400 Subject: [Python-Dev] Use of cgi.escape can lead to XSS vulnerabilities In-Reply-To: References: Message-ID: <09E6BE78-066E-4BCF-AA34-C6286CF8AB98@fuhm.net> On Jun 22, 2010, at 5:14 PM, Craig Younkins wrote: > I suggest rewording the documentation for the method making it more > clear what it should and should not be used for. I would like to see > the method changed to properly escape single-quotes, but if it is > not changed, the documentation should explicitly say this method > does not make input safe for inclusion in HTML. Well, it *does* make the input safe for inclusion in HTML...in a double-quoted attribute. The docs could make it clearer that you should always use double- quotes around your attribute values when using it, though, I agree. From janssen at parc.com Thu Jun 24 03:26:46 2010 From: janssen at parc.com (Bill Janssen) Date: Wed, 23 Jun 2010 18:26:46 PDT Subject: [Python-Dev] os.getgroups() on MacOS X Was: red buildbots on 2.7 In-Reply-To: <02EFB202-505A-405E-AE00-ABC0A2234DDA@mac.com> References: <02EFB202-505A-405E-AE00-ABC0A2234DDA@mac.com> Message-ID: <1366.1277342806@parc.com> See also http://gimper.net/viewtopic.php?f=18&t=3185. Bill From ronaldoussoren at mac.com Thu Jun 24 08:10:42 2010 From: ronaldoussoren at mac.com (Ronald Oussoren) Date: Thu, 24 Jun 2010 08:10:42 +0200 Subject: [Python-Dev] os.getgroups() on MacOS X Was: red buildbots on 2.7 In-Reply-To: <1366.1277342806@parc.com> References: <02EFB202-505A-405E-AE00-ABC0A2234DDA@mac.com> <1366.1277342806@parc.com> Message-ID: On 24 Jun, 2010, at 3:26, Bill Janssen wrote: > See also http://gimper.net/viewtopic.php?f=18&t=3185. That's because setgroups(3) is limited to 16 groups (that is, the kernel doesn't support more than 16 groups at all). 
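The two group lists discussed in this thread can be compared from Python. This is a minimal sketch, assuming a Unix system and a Python version that provides os.getgrouplist (which did not exist when this thread was written); the helper names process_groups and database_groups are made up for illustration. On OS X, what os.getgroups() returns depends on how the interpreter was built, which is exactly the issue under discussion, so the output there should be read with care.

```python
import os
import pwd


def process_groups():
    """Supplementary group IDs reported for this process by getgroups."""
    return sorted(set(os.getgroups()))


def database_groups():
    """Group IDs the user database associates with the current uid.

    Returns None when the uid has no passwd entry.
    """
    try:
        user = pwd.getpwuid(os.getuid())
    except KeyError:
        return None
    # Mirrors the C sequence getpwuid(getuid()) + getgrouplist(name, gid).
    return sorted(set(os.getgrouplist(user.pw_name, user.pw_gid)))


kernel_view = process_groups()
database_view = database_groups()
print("process: ", kernel_view)
print("database:", database_view)
```

The two lists can legitimately differ, for example after a privileged process has called setgroups, or after the group database has changed while the process was running.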
Ronald -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 3567 bytes Desc: not available URL: From greg.ewing at canterbury.ac.nz Thu Jun 24 09:20:34 2010 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Thu, 24 Jun 2010 19:20:34 +1200 Subject: [Python-Dev] os.getgroups() on MacOS X Was: red buildbots on 2.7 In-Reply-To: References: <02EFB202-505A-405E-AE00-ABC0A2234DDA@mac.com> <1366.1277342806@parc.com> Message-ID: <4C230742.40103@canterbury.ac.nz> Ronald Oussoren wrote: > That's because setgroups(3) is limited to 16 groups > (that is, the kernel doesn't support more than 16 groups at all). So how does an account being a member of 18 groups ever work? -- Greg From stephen at xemacs.org Thu Jun 24 10:12:13 2010 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Thu, 24 Jun 2010 17:12:13 +0900 Subject: [Python-Dev] bytes / unicode In-Reply-To: References: <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100620234723.600ad4a8@pitrou.net> <87wrtsd44p.fsf@uwakimon.sk.tsukuba.ac.jp> <87631c4bca.fsf@uwakimon.sk.tsukuba.ac.jp> <20100621165611.GW5787@unaka.lan> <87r5jz3h8u.fsf@uwakimon.sk.tsukuba.ac.jp> <20100622055040.GE5787@unaka.lan> <87d3vj2tj2.fsf@uwakimon.sk.tsukuba.ac.jp> <0D1D2134-2CF9-4F93-BE82-912C5297D36F@fuhm.net> <87zkymns55.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <87mxukonmq.fsf@uwakimon.sk.tsukuba.ac.jp> Guido van Rossum writes: > For example: how we can make the suite of functions used for URL > processing more polymorphic, so that each developer can choose for > herself how URLs need to be treated in her application. While you have come down on the side of polymorphism (as opposed to separate functions), I'm a little nervous about it. Specifically, Philip Eby expressed a desire for earlier type errors, while polymorphism seems to ensure that you'll need to Look Before You Leap to get early error detection. 
From regebro at gmail.com Thu Jun 24 11:05:03 2010
From: regebro at gmail.com (Lennart Regebro)
Date: Thu, 24 Jun 2010 11:05:03 +0200
Subject: [Python-Dev] bytes / unicode
In-Reply-To: <0D1D2134-2CF9-4F93-BE82-912C5297D36F@fuhm.net>
References: <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100620234723.600ad4a8@pitrou.net> <87wrtsd44p.fsf@uwakimon.sk.tsukuba.ac.jp> <87631c4bca.fsf@uwakimon.sk.tsukuba.ac.jp> <20100621165611.GW5787@unaka.lan> <87r5jz3h8u.fsf@uwakimon.sk.tsukuba.ac.jp> <20100622055040.GE5787@unaka.lan> <87d3vj2tj2.fsf@uwakimon.sk.tsukuba.ac.jp> <0D1D2134-2CF9-4F93-BE82-912C5297D36F@fuhm.net>
Message-ID:

On Tue, Jun 22, 2010 at 20:07, James Y Knight wrote:
> Yeah. This is a real issue I have with the direction Python3 went: it pushes
> you into decoding everything to unicode early, even when you don't care --

Well, yes, maybe even if *you* don't care. But often the functions you need to call must care, and then you need to decode to unicode, even if you personally don't care. And in those cases, you should decode as early as possible.

In the cases where neither you nor the functions you call care, then you don't have to decode, and you can happily pass binary data from one function to another.

So this is not really a question of the direction Python 3 went. It's more a case that some methods that *could* do their transformations in a well defined way on bytes don't, and then force you to decode to unicode. But that's not a problem with direction, it's just a missing feature in the stdlib.

--
Lennart Regebro: http://regebro.wordpress.com/
Python 3 Porting: http://python3porting.com/
+33 661 58 14 64

From mal at egenix.com Thu Jun 24 12:58:23 2010
From: mal at egenix.com (M.-A.
Lemburg) Date: Thu, 24 Jun 2010 12:58:23 +0200 Subject: [Python-Dev] bytes / unicode In-Reply-To: References: <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100620234723.600ad4a8@pitrou.net> <87wrtsd44p.fsf@uwakimon.sk.tsukuba.ac.jp> <87631c4bca.fsf@uwakimon.sk.tsukuba.ac.jp> <20100621165611.GW5787@unaka.lan> <87r5jz3h8u.fsf@uwakimon.sk.tsukuba.ac.jp> <20100622055040.GE5787@unaka.lan> <87d3vj2tj2.fsf@uwakimon.sk.tsukuba.ac.jp> <0D1D2134-2CF9-4F93-BE82-912C5297D36F@fuhm.net> Message-ID: <4C233A4F.2030607@egenix.com> Lennart Regebro wrote: > On Tue, Jun 22, 2010 at 20:07, James Y Knight wrote: >> Yeah. This is a real issue I have with the direction Python3 went: it pushes >> you into decoding everything to unicode early, even when you don't care -- > > Well, yes, maybe even if *you* don't care. But often the functions you > need to call must care, and then you need to decode to unicode, even > if you personally don't care. And in those cases, you should deocde as > early as possible. > > In the cases where neither you nor the functions you call care, then > you don't have to decode, and you can happily pass binary data from > one function to another. > > So this is not really a question of the direction Python 3 went. It's > more a case that some methods that *could* do their transformations in > a well defined way on bytes don't, and then force you to decode to > unicode. But that's not a problem with direction, it's just a missing > feature in the stdlib. The discussion is showing that in at least a few application spaces, the stdlib should be able to work on both bytes and Unicode, preferably using the same interfaces using polymorphism, i.e. some_function(bytes) -> bytes some_function(str) -> str In Python2 this partially works due to the automatic bytes->str conversion (in some cases you get some_function(bytes) -> str), the codec base class implementations being a prime example. 
In Python3, things have to be done explicitly and I think we need to add a few helpers to make writing such str/bytes interfaces easier.

We've already had some suggestions in that area, but probably need to collect a few more ideas based on real-life porting attempts.

I'd like to make this a topic at the upcoming language summit in Birmingham, if Michael agrees.

--
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source (#1, Jun 24 2010)
>>> Python/Zope Consulting and Support ... http://www.egenix.com/
>>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/
________________________________________________________________________
2010-07-19: EuroPython 2010, Birmingham, UK 24 days to go

::: Try our new mxODBC.Connect Python Database Interface for free ! ::::

eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48
D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
Registered at Amtsgericht Duesseldorf: HRB 46611
http://www.egenix.com/company/contact/

From fuzzyman at voidspace.org.uk Thu Jun 24 13:00:12 2010
From: fuzzyman at voidspace.org.uk (Michael Foord)
Date: Thu, 24 Jun 2010 12:00:12 +0100
Subject: [Python-Dev] bytes / unicode
In-Reply-To: <4C233A4F.2030607@egenix.com>
References: <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100620234723.600ad4a8@pitrou.net> <87wrtsd44p.fsf@uwakimon.sk.tsukuba.ac.jp> <87631c4bca.fsf@uwakimon.sk.tsukuba.ac.jp> <20100621165611.GW5787@unaka.lan> <87r5jz3h8u.fsf@uwakimon.sk.tsukuba.ac.jp> <20100622055040.GE5787@unaka.lan> <87d3vj2tj2.fsf@uwakimon.sk.tsukuba.ac.jp> <0D1D2134-2CF9-4F93-BE82-912C5297D36F@fuhm.net> <4C233A4F.2030607@egenix.com>
Message-ID: <4C233ABC.40702@voidspace.org.uk>

On 24/06/2010 11:58, M.-A. Lemburg wrote:
> Lennart Regebro wrote:
>
>> On Tue, Jun 22, 2010 at 20:07, James Y Knight wrote:
>>
>>> Yeah.
This is a real issue I have with the direction Python3 went: it pushes >>> you into decoding everything to unicode early, even when you don't care -- >>> >> Well, yes, maybe even if *you* don't care. But often the functions you >> need to call must care, and then you need to decode to unicode, even >> if you personally don't care. And in those cases, you should deocde as >> early as possible. >> >> In the cases where neither you nor the functions you call care, then >> you don't have to decode, and you can happily pass binary data from >> one function to another. >> >> So this is not really a question of the direction Python 3 went. It's >> more a case that some methods that *could* do their transformations in >> a well defined way on bytes don't, and then force you to decode to >> unicode. But that's not a problem with direction, it's just a missing >> feature in the stdlib. >> > The discussion is showing that in at least a few application spaces, > the stdlib should be able to work on both bytes and Unicode, preferably > using the same interfaces using polymorphism, i.e. > > some_function(bytes) -> bytes > some_function(str) -> str > > In Python2 this partially works due to the automatic bytes->str > conversion (in some cases you get some_function(bytes) -> str), > the codec base class implementations being a prime example. > > In Python3, things have to be done explicity and I think we need > to add a few helpers to make writing such str/bytes interfaces > easier. > > We've already had some suggestions in that area, but probably need > to collect a few more ideas based on real-life porting attempts. > > I'd like to make this a topic at the upcoming language summit > in Birmingham, if Michael agrees. > > Yep, it sounds like a great topic for the language summit. 
Michael

--
http://www.ironpythoninaction.com/

From guido at python.org Thu Jun 24 16:33:42 2010
From: guido at python.org (Guido van Rossum)
Date: Thu, 24 Jun 2010 07:33:42 -0700
Subject: [Python-Dev] bytes / unicode
In-Reply-To: <87mxukonmq.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100620234723.600ad4a8@pitrou.net> <87wrtsd44p.fsf@uwakimon.sk.tsukuba.ac.jp> <87631c4bca.fsf@uwakimon.sk.tsukuba.ac.jp> <20100621165611.GW5787@unaka.lan> <87r5jz3h8u.fsf@uwakimon.sk.tsukuba.ac.jp> <20100622055040.GE5787@unaka.lan> <87d3vj2tj2.fsf@uwakimon.sk.tsukuba.ac.jp> <0D1D2134-2CF9-4F93-BE82-912C5297D36F@fuhm.net> <87zkymns55.fsf@uwakimon.sk.tsukuba.ac.jp> <87mxukonmq.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID:

On Thu, Jun 24, 2010 at 1:12 AM, Stephen J. Turnbull wrote:
> Guido van Rossum writes:
>
>  > For example: how we can make the suite of functions used for URL
>  > processing more polymorphic, so that each developer can choose for
>  > herself how URLs need to be treated in her application.
>
> While you have come down on the side of polymorphism (as opposed to
> separate functions), I'm a little nervous about it.  Specifically,
> Philip Eby expressed a desire for earlier type errors, while
> polymorphism seems to ensure that you'll need to Look Before You Leap
> to get early error detection.

Understood, but both the majority of str/bytes methods and several existing APIs (e.g. many in the os module, like os.listdir()) do it this way.

Also, IMO a polymorphic function should *not* accept *mixed* bytes/text input -- join('x', b'y') should be rejected. But join('x', 'y') -> 'x/y' and join(b'x', b'y') -> b'x/y' make sense to me.

So, actually, I *don't* understand what you mean by needing LBYL.
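
[Editorial sketch] The "all text or all bytes, never mixed" rule described in this message could look roughly like the following. The join() helper here is hypothetical and written only for illustration; it is not os.path.join or any other stdlib function, and the '/' separator is simply the one used in the message's example.

```python
def join(*parts):
    """Join components with '/', accepting all-str or all-bytes input.

    Hypothetical helper: mixed str/bytes input is rejected up front.
    """
    if not parts:
        raise TypeError("join() needs at least one component")
    kind = type(parts[0])
    if kind not in (str, bytes):
        raise TypeError("components must be str or bytes")
    for part in parts:
        # Exact type match: no implicit promotion between text and bytes.
        if type(part) is not kind:
            raise TypeError("cannot mix str and bytes components")
    sep = "/" if kind is str else b"/"
    return sep.join(parts)


print(join("x", "y"))    # x/y
print(join(b"x", b"y"))  # b'x/y'
```

With this shape the type error surfaces at the call that mixes the two kinds, which may be early enough for many callers without a separate look-before-you-leap check.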
-- --Guido van Rossum (python.org/~guido) From ncoghlan at gmail.com Thu Jun 24 17:25:18 2010 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 25 Jun 2010 01:25:18 +1000 Subject: [Python-Dev] bytes / unicode In-Reply-To: References: <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100620234723.600ad4a8@pitrou.net> <87wrtsd44p.fsf@uwakimon.sk.tsukuba.ac.jp> <87631c4bca.fsf@uwakimon.sk.tsukuba.ac.jp> <20100621165611.GW5787@unaka.lan> <87r5jz3h8u.fsf@uwakimon.sk.tsukuba.ac.jp> <20100622055040.GE5787@unaka.lan> <87d3vj2tj2.fsf@uwakimon.sk.tsukuba.ac.jp> <0D1D2134-2CF9-4F93-BE82-912C5297D36F@fuhm.net> <87zkymns55.fsf@uwakimon.sk.tsukuba.ac.jp> <87mxukonmq.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On Fri, Jun 25, 2010 at 12:33 AM, Guido van Rossum wrote: > Also, IMO a polymorphic function should *not* accept *mixed* > bytes/text input -- join('x', b'y') should be rejected. But join('x', > 'y') -> 'x/y' and join(b'x', b'y') -> b'x/y' make sense to me. A policy of allowing arguments to be either str or bytes, but not a mixture, actually avoids one of the more painful aspects of the 2.x "promote mixed operations to unicode" approach. Specifically, you either had to scan all the arguments up front to check for unicode, or else you had to stop what you were doing and start again with the unicode version if you encountered unicode partway through. Neither was particularly nice to implement. As you noted elsewhere, literals and string methods are still likely to be a major sticking point with that approach - common operations like ''.join(seq) and b''.join(seq) aren't polymorphic, so functions that use them won't be polymorphic either. (It's only the str->unicode promotion behaviour in 2.x that works around this problem there). Would it be heretical to suggest that sum() be allowed to work on strings to at least eliminate ''.join() as something that breaks bytes processing? 
It already works for bytes, although it then fails with a confusing message for bytearray:

>>> sum(b"a b c".split(), b'')
b'abc'

>>> sum(bytearray(b"a b c").split(), bytearray(b''))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: sum() can't sum bytes [use b''.join(seq) instead]

>>> sum("a b c".split(), '')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: sum() can't sum strings [use ''.join(seq) instead]

Cheers,
Nick.

--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From guido at python.org Thu Jun 24 17:41:14 2010
From: guido at python.org (Guido van Rossum)
Date: Thu, 24 Jun 2010 08:41:14 -0700
Subject: [Python-Dev] bytes / unicode
In-Reply-To:
References: <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100620234723.600ad4a8@pitrou.net> <87wrtsd44p.fsf@uwakimon.sk.tsukuba.ac.jp> <87631c4bca.fsf@uwakimon.sk.tsukuba.ac.jp> <20100621165611.GW5787@unaka.lan> <87r5jz3h8u.fsf@uwakimon.sk.tsukuba.ac.jp> <20100622055040.GE5787@unaka.lan> <87d3vj2tj2.fsf@uwakimon.sk.tsukuba.ac.jp> <0D1D2134-2CF9-4F93-BE82-912C5297D36F@fuhm.net> <87zkymns55.fsf@uwakimon.sk.tsukuba.ac.jp> <87mxukonmq.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID:

On Thu, Jun 24, 2010 at 8:25 AM, Nick Coghlan wrote:
> On Fri, Jun 25, 2010 at 12:33 AM, Guido van Rossum wrote:
>> Also, IMO a polymorphic function should *not* accept *mixed*
>> bytes/text input -- join('x', b'y') should be rejected. But join('x',
>> 'y') -> 'x/y' and join(b'x', b'y') -> b'x/y' make sense to me.
>
> A policy of allowing arguments to be either str or bytes, but not a
> mixture, actually avoids one of the more painful aspects of the 2.x
> "promote mixed operations to unicode" approach. Specifically, you
> either had to scan all the arguments up front to check for unicode, or
> else you had to stop what you were doing and start again with the
> unicode version if you encountered unicode partway through. Neither
> was particularly nice to implement.

Right.
Polymorphic functions should *not* allow mixing text and bytes. It's all text or all bytes.

> As you noted elsewhere, literals and string methods are still likely
> to be a major sticking point with that approach - common operations
> like ''.join(seq) and b''.join(seq) aren't polymorphic, so functions
> that use them won't be polymorphic either. (It's only the str->unicode
> promotion behaviour in 2.x that works around this problem there).
>
> Would it be heretical to suggest that sum() be allowed to work on
> strings to at least eliminate ''.join() as something that breaks bytes
> processing? It already works for bytes, although it then fails with a
> confusing message for bytearray:
>
>>>> sum(b"a b c".split(), b'')
> b'abc'
>
>>>> sum(bytearray(b"a b c").split(), bytearray(b''))
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> TypeError: sum() can't sum bytes [use b''.join(seq) instead]
>
>>>> sum("a b c".split(), '')
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> TypeError: sum() can't sum strings [use ''.join(seq) instead]

I don't think we should abuse sum for this. A simple idiom to get the *empty* string of a particular type is x[:0], so you could write something like this to concatenate a list of strings or bytes: xs[0][:0].join(xs). Note that if xs is empty we wouldn't know what to do anyway, so this should be disallowed.

--
--Guido van Rossum (python.org/~guido)

From barry at python.org Thu Jun 24 17:50:48 2010
From: barry at python.org (Barry Warsaw)
Date: Thu, 24 Jun 2010 11:50:48 -0400
Subject: [Python-Dev] versioned .so files for Python 3.2
Message-ID: <20100624115048.4fd152e3@heresy>

This is a follow up to PEP 3147. That PEP, already implemented in Python 3.2, allows for Python source files from different Python versions to live together in the same directory. It does this by putting a magic tag in the .pyc file name and placing the .pyc file in a __pycache__ directory.
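
[Editorial sketch] The PEP 3147 naming scheme mentioned above can be inspected from a running interpreter. This is a small illustration only, assuming a current CPython where the helper lives in importlib.util (in the 3.2 era discussed here the equivalent was in the imp module); "foo.py" is a hypothetical file name and does not need to exist.

```python
import importlib.util
import sys

source = "foo.py"  # hypothetical source file, used only to compute a name

# The "magic tag" that identifies the interpreter and version.
tag = sys.implementation.cache_tag

# Where the byte-compiled file for that source would be written.
cached = importlib.util.cache_from_source(source)

print(tag)
print(cached)
```

The printed path sits under a __pycache__ directory and embeds the tag, which is what lets several Python versions share one source tree.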
Distros such as Debian and Ubuntu will use this to greatly simplify deploying Python, and Python applications and libraries. Debian and Ubuntu usually ship more than one version of Python, and currently have to play complex games with symlinks to make this work. PEP 3147 will go a long way to eliminating the need for extra directories and symlinks.

One more thing I've found we need though, is a way to handle shared libraries for extension modules. Just as we can get name collisions on foo.pyc, we can get collisions on foo.so. We obviously cannot install foo.so built for Python 3.2 and foo.so built for Python 3.3 in the same location. So symlink nightmare's mini-me is back.

I have a fairly simple fix for this. I'd actually be surprised if this hasn't been discussed before, but teh Googles hasn't turned up anything.

The idea is to put the Python version number in the shared library file name, and extend .so lookup to find these extended file names. So for example, we'd see foo.3.2.so instead, and Python would know how to dynload both that and the traditional foo.so file too (for backward compatibility).

(On file naming: the original patch used foo.so.3.2 and that works just as well, but I thought there might be tools that expect exactly a '.so' suffix, so I changed it to put the Major.Minor version number to the left of the extension. The exact naming scheme is of course open to debate.)

This is a much simpler patch than PEP 3147, though I'm not 100% sure it's the right approach. The way this works is by modifying configure and Makefile.pre.in to put the version number in the $SO make variable. Python parses its (generated) Makefile to find $SO and it uses this deep in the bowels of distutils to decide what suffix to use when writing shared libraries built by 'python setup.py build_ext'.

This means the patched Python only writes versioned .so files by default.
I personally don't see that as a problem, and it does not affect the test suite, with the exception of one easily tweaked test. I don't know if third party tools will care. The fact that traditional foo.so shared libraries will still satisfy the import should be enough, I think.

The patch is currently Linux only, since I need this for Debian and Ubuntu and wanted to keep the change narrow.

Other possible approaches:

* Extend the distutils API so that the .so file extension can be passed in, instead of being essentially hardcoded to what Python's Makefile contains.

* Keep the dynload_shlib.c change, but modify the Debian/Ubuntu build environment to pass in $SO to make (though the configure.in warning and sleep is a little annoying).

* Add a ./configure option to enable this, which Debuntu's build would use.

The patch is available here:

http://pastebin.ubuntu.com/454512/

and my working branch is here:

https://code.edge.launchpad.net/~barry/python/sovers

Please let me know what you think. I'm happy to just commit this to the py3k branch if there are no objections. I don't think a new PEP is in order, but an update to PEP 3147 might make sense.

Cheers,
-Barry

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 836 bytes
Desc: not available
URL:

From benjamin at python.org Thu Jun 24 17:58:09 2010
From: benjamin at python.org (Benjamin Peterson)
Date: Thu, 24 Jun 2010 10:58:09 -0500
Subject: [Python-Dev] versioned .so files for Python 3.2
In-Reply-To: <20100624115048.4fd152e3@heresy>
References: <20100624115048.4fd152e3@heresy>
Message-ID:

2010/6/24 Barry Warsaw:
> Please let me know what you think.  I'm happy to just commit this to the py3k
> branch if there are no objections.  I don't think a new PEP is in
> order, but an update to PEP 3147 might make sense.

How will this interact with PEP 384 if that is implemented?
-- Regards, Benjamin From daniel at stutzbachenterprises.com Thu Jun 24 18:05:29 2010 From: daniel at stutzbachenterprises.com (Daniel Stutzbach) Date: Thu, 24 Jun 2010 11:05:29 -0500 Subject: [Python-Dev] versioned .so files for Python 3.2 In-Reply-To: <20100624115048.4fd152e3@heresy> References: <20100624115048.4fd152e3@heresy> Message-ID: On Thu, Jun 24, 2010 at 10:50 AM, Barry Warsaw wrote: > The idea is to put the Python version number in the shared library file > name, > and extend .so lookup to find these extended file names. So for example, > we'd > see foo.3.2.so instead, and Python would know how to dynload both that and > the > traditional foo.so file too (for backward compatibility). > What use case does this address? PEP 3147 addresses the fact that the user may have different versions of Python installed and each wants to write a .pyc file when loading a module. .so files are not generated simply by running the Python interpreter, ergo .so files are not an issue for that use case. If you want to make it so a system can install a package in just one location to be used by multiple Python installations, then the version number isn't enough. You also need to distinguish debug builds, profiling builds, Unicode width (see issue8654), and probably several other ./configure options. -- Daniel Stutzbach, Ph.D. President, Stutzbach Enterprises, LLC -------------- next part -------------- An HTML attachment was scrubbed... 
URL:

From baptiste13z at free.fr Thu Jun 24 18:58:59 2010
From: baptiste13z at free.fr (Baptiste Carvello)
Date: Thu, 24 Jun 2010 18:58:59 +0200
Subject: [Python-Dev] bytes / unicode
In-Reply-To: <20100621181750.267933A404D@sparrow.telecommunity.com>
References: <87sk4jcejy.fsf@uwakimon.sk.tsukuba.ac.jp> <201006201204.30795.steve@pearwood.info> <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100620234723.600ad4a8@pitrou.net> <20100621023005.EE17E3A4099@sparrow.telecommunity.com> <20100621164650.16A093A414B@sparrow.telecommunity.com> <20100621181750.267933A404D@sparrow.telecommunity.com>
Message-ID:

P.J. Eby wrote:
> [...] stdlib constants are almost always ASCII,
> and the main use cases for ebytes would involve ascii-extended encodings.)

Then, how about a new "ascii string" literal? This would produce a special kind of string that would coerce to a normal string when mixed with a str, and to a bytes using ascii codec when mixed with a bytes.

Then you could write

>>> a"/".join([base, path])

and not worry if base and path are both str, or both bytes (mixed being of course forbidden).

B.

From pje at telecommunity.com Thu Jun 24 19:07:01 2010
From: pje at telecommunity.com (P.J. Eby)
Date: Thu, 24 Jun 2010 13:07:01 -0400
Subject: [Python-Dev] bytes / unicode
In-Reply-To: <87mxukonmq.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100620234723.600ad4a8@pitrou.net> <87wrtsd44p.fsf@uwakimon.sk.tsukuba.ac.jp> <87631c4bca.fsf@uwakimon.sk.tsukuba.ac.jp> <20100621165611.GW5787@unaka.lan> <87r5jz3h8u.fsf@uwakimon.sk.tsukuba.ac.jp> <20100622055040.GE5787@unaka.lan> <87d3vj2tj2.fsf@uwakimon.sk.tsukuba.ac.jp> <0D1D2134-2CF9-4F93-BE82-912C5297D36F@fuhm.net> <87zkymns55.fsf@uwakimon.sk.tsukuba.ac.jp> <87mxukonmq.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: <20100624170856.0853D3A4099@sparrow.telecommunity.com>

At 05:12 PM 6/24/2010 +0900, Stephen J.
Turnbull wrote: >Guido van Rossum writes: > > > For example: how we can make the suite of functions used for URL > > processing more polymorphic, so that each developer can choose for > > herself how URLs need to be treated in her application. > >While you have come down on the side of polymorphism (as opposed to >separate functions), I'm a little nervous about it. Specifically, >Philip Eby expressed a desire for earlier type errors, while >polymorphism seems to ensure that you'll need to Look Before You Leap >to get early error detection. This doesn't have to be in the functions; it can be in the *types*. Mixed-type string operations have to do type checking and upcasting already, but if the protocol were open, you could make an encoded-bytes type that would handle the error checking. (Btw, in some earlier emails, Stephen, you implied that this could be fixed with codecs -- but it can't, because the problem isn't with the bytes containing invalid Unicode, it's with the Unicode containing invalid bytes -- i.e., characters that can't be encoded to the ultimate codec target.) From janssen at parc.com Thu Jun 24 19:38:19 2010 From: janssen at parc.com (Bill Janssen) Date: Thu, 24 Jun 2010 10:38:19 PDT Subject: [Python-Dev] thoughts on the bytes/string discussion Message-ID: <11597.1277401099@parc.com> Here are a couple of ideas I'm taking away from the bytes/string discussion. First, it would probably be a good idea to have a String ABC. Secondly, maybe the string situation in 2.x wasn't as broken as we thought it was. In particular, those who deal with lots of encoded strings seemed to find it handy, and miss it in 3.x. Perhaps strings are more like numbers than we think. We have separate types for int, float, Decimal, etc. But they're all numbers, and they all cross-operate. 
In 2.x, it seems there were two missing features: no encoding attribute on str, which should have been there and should have been required, and the default encoding being "ASCII" (I can't tell you how many times I've had to fix that issue when a non-ASCII encoded str was passed to some output function).

So maybe having a second string type in 3.x that consists of an encoded sequence of bytes plus the encoding, call it "estr", wouldn't have been a bad idea. It would probably have made sense to have estr cooperate with the str type, in the same way that two different kinds of numbers cooperate, "promoting" the result of an operation only when necessary. This would automatically achieve the kind of polymorphic functionality that Guido is suggesting, but without losing the ability to do

    x = e(ASCII)"bar"
    a = ''.join(["foo", x])

(or whatever the syntax for such an encoded string literal would be -- I'm not claiming this is a good one) which I presume would bind "a" to a Unicode string "foobar" -- have to work out what gets promoted to what.

The language moratorium kind of makes this all theoretical, but building a String ABC still would be a good start, and presumably isn't forbidden by the moratorium.

Bill

From brett at python.org Thu Jun 24 19:48:56 2010
From: brett at python.org (Brett Cannon)
Date: Thu, 24 Jun 2010 10:48:56 -0700
Subject: [Python-Dev] versioned .so files for Python 3.2
In-Reply-To: <20100624115048.4fd152e3@heresy>
References: <20100624115048.4fd152e3@heresy>
Message-ID:

On Thu, Jun 24, 2010 at 08:50, Barry Warsaw wrote:
> This is a follow up to PEP 3147.  That PEP, already implemented in Python 3.2,
> allows for Python source files from different Python versions to live together
> in the same directory.  It does this by putting a magic tag in the .pyc file
> name and placing the .pyc file in a __pycache__ directory.
>
> Distros such as Debian and Ubuntu will use this to greatly simplifying
> deploying Python, and Python applications and libraries.
?Debian and Ubuntu > usually ship more than one version of Python, and currently have to play > complex games with symlinks to make this work. ?PEP 3147 will go a long way to > eliminating the need for extra directories and symlinks. > > One more thing I've found we need though, is a way to handled shared libraries > for extension modules. ?Just as we can get name collisions on foo.pyc, we can > get collisions on foo.so. ?We obviously cannot install foo.so built for Python > 3.2 and foo.so built for Python 3.3 in the same location. ?So symlink > nightmare's mini-me is back. > > I have a fairly simple fix for this. ?I'd actually be surprised if this hasn't > been discussed before, but teh Googles hasn't turned up anything. > > The idea is to put the Python version number in the shared library file name, > and extend .so lookup to find these extended file names. ?So for example, we'd > see foo.3.2.so instead, and Python would know how to dynload both that and the > traditional foo.so file too (for backward compatibility). > > (On file naming: the original patch used foo.so.3.2 and that works just as > well, but I thought there might be tools that expect exactly a '.so' suffix, > so I changed it to put the Major.Minor version number to the left of the > extension. ?The exact naming scheme is of course open to debate.) > While the idea is fine with me since I won't have any of my directories cluttered with multiple .so files, I would still want to add some moniker showing that the version number represents the interpreter and not the .so file. If I read "foo.3.2.so", that naively seems to mean to mean the foo module's 3.2 release is what is in installed, not that it's built for CPython 3.2. So even though it might be redundant, I would still want the VM name added. Adding the VM name also doesn't make extension modules the exclusive domain of CPython either. 
If some other VM decides to make its own .so files that are not binary compatible, then we should not preclude that, since all this solution costs is a string comparison that has to look at 7 more characters.

-Brett

P.S.: I wish we could drop use of the 'module.so' variant at the same time, for consistency's sake and to cut out a stat call, but I know that is asking too much.

From barry at python.org Thu Jun 24 19:51:19 2010
From: barry at python.org (Barry Warsaw)
Date: Thu, 24 Jun 2010 13:51:19 -0400
Subject: [Python-Dev] versioned .so files for Python 3.2
In-Reply-To:
References: <20100624115048.4fd152e3@heresy>
Message-ID: <20100624135119.00b9ac5c@heresy>

On Jun 24, 2010, at 10:58 AM, Benjamin Peterson wrote:

>2010/6/24 Barry Warsaw:
>> Please let me know what you think.  I'm happy to just commit this to the
>> py3k branch if there are no objections.  I don't think a new PEP is
>> in order, but an update to PEP 3147 might make sense.
>
>How will this interact with PEP 384 if that is implemented?

Good question, I'd forgotten to mention that PEP. I think the PEP is a good idea, and worth working on, but it is a longer term solution to the problem of extension source code compatibility. It's longer term because extensions will have to be rewritten to use the new API defined in PEP 384. It will take a long time to get this into practice, and supporting it will be a case-by-case basis.

I'm trying to come up with something that will work immediately while PEP 384 is being adopted.

-Barry

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 836 bytes
Desc: not available
URL:

From benjamin at python.org Thu Jun 24 20:00:54 2010
From: benjamin at python.org (Benjamin Peterson)
Date: Thu, 24 Jun 2010 13:00:54 -0500
Subject: [Python-Dev] versioned .so files for Python 3.2
In-Reply-To: <20100624135119.00b9ac5c@heresy>
References: <20100624115048.4fd152e3@heresy> <20100624135119.00b9ac5c@heresy>
Message-ID:

2010/6/24 Barry Warsaw:
> On Jun 24, 2010, at 10:58 AM, Benjamin Peterson wrote:
>
>>2010/6/24 Barry Warsaw:
>>> Please let me know what you think.  I'm happy to just commit this to the
>>> py3k branch if there are no objections.  I don't think a new PEP is
>>> in order, but an update to PEP 3147 might make sense.
>>
>>How will this interact with PEP 384 if that is implemented?
> I'm trying to come up with something that will work immediately while PEP 384
> is being adopted.

But how will modules specify that they support multiple ABIs then?

--
Regards,
Benjamin

From brett at python.org Thu Jun 24 20:11:07 2010
From: brett at python.org (Brett Cannon)
Date: Thu, 24 Jun 2010 11:11:07 -0700
Subject: [Python-Dev] thoughts on the bytes/string discussion
In-Reply-To: <11597.1277401099@parc.com>
References: <11597.1277401099@parc.com>
Message-ID:

On Thu, Jun 24, 2010 at 10:38, Bill Janssen wrote:
[SNIP]
> The language moratorium kind of makes this all theoretical, but building
> a String ABC still would be a good start, and presumably isn't forbidden
> by the moratorium.

Because a new ABC would go into the stdlib (I assume in collections or string) the moratorium does not apply.
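
[Editorial sketch] The "String ABC" idea from this sub-thread could be prototyped along these lines. The String class below is hypothetical: no such ABC is named in the thread beyond the idea itself, and which types should be registered is an assumption made here for illustration.

```python
from abc import ABC


class String(ABC):
    """Hypothetical ABC grouping text-like and bytes-like string types."""


# Virtual subclass registration: the built-in types are not modified.
String.register(str)
String.register(bytes)
String.register(bytearray)


def is_stringish(obj):
    """Return True for any object registered under the String ABC."""
    return isinstance(obj, String)


print(is_stringish("text"))   # True
print(is_stringish(b"data"))  # True
print(is_stringish(42))       # False
```

Because registration happens in library code rather than in the language core, a sketch like this is the kind of change that stays outside the moratorium discussed above.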
From guido at python.org Thu Jun 24 20:27:37 2010 From: guido at python.org (Guido van Rossum) Date: Thu, 24 Jun 2010 11:27:37 -0700 Subject: [Python-Dev] versioned .so files for Python 3.2 In-Reply-To: References: <20100624115048.4fd152e3@heresy> Message-ID: On Thu, Jun 24, 2010 at 10:48 AM, Brett Cannon wrote: > On Thu, Jun 24, 2010 at 08:50, Barry Warsaw wrote: >> This is a follow up to PEP 3147. ?That PEP, already implemented in Python 3.2, >> allows for Python source files from different Python versions to live together >> in the same directory. ?It does this by putting a magic tag in the .pyc file >> name and placing the .pyc file in a __pycache__ directory. >> >> Distros such as Debian and Ubuntu will use this to greatly simplifying >> deploying Python, and Python applications and libraries. ?Debian and Ubuntu >> usually ship more than one version of Python, and currently have to play >> complex games with symlinks to make this work. ?PEP 3147 will go a long way to >> eliminating the need for extra directories and symlinks. >> >> One more thing I've found we need though, is a way to handled shared libraries >> for extension modules. ?Just as we can get name collisions on foo.pyc, we can >> get collisions on foo.so. ?We obviously cannot install foo.so built for Python >> 3.2 and foo.so built for Python 3.3 in the same location. ?So symlink >> nightmare's mini-me is back. >> >> I have a fairly simple fix for this. ?I'd actually be surprised if this hasn't >> been discussed before, but teh Googles hasn't turned up anything. >> >> The idea is to put the Python version number in the shared library file name, >> and extend .so lookup to find these extended file names. ?So for example, we'd >> see foo.3.2.so instead, and Python would know how to dynload both that and the >> traditional foo.so file too (for backward compatibility). 
>> >> (On file naming: the original patch used foo.so.3.2 and that works just as >> well, but I thought there might be tools that expect exactly a '.so' suffix, >> so I changed it to put the Major.Minor version number to the left of the >> extension. ?The exact naming scheme is of course open to debate.) >> > > While the idea is fine with me since I won't have any of my > directories cluttered with multiple .so files, I would still want to > add some moniker showing that the version number represents the > interpreter and not the .so file. If I read "foo.3.2.so", that naively > seems to mean to mean the foo module's 3.2 release is what is in > installed, not that it's built for CPython 3.2. So even though it > might be redundant, I would still want the VM name added. Well, for versions of the .so itself, traditionally version numbers are appended *after* the .so suffix (check your /lib directory :-). > Adding the VM name also doesn't make extension modules the exclusive > domain of CPython either. If some other VM decides to make their own > .so files that are not binary compatible then we should not preclude > that as this solution it is nothing more than it makes a string > comparison have to look at 7 more characters. > > -Brett > > P.S.: I wish we could drop use of the 'module.so' variant at the same > time, for consistency sake and to cut out a stat call, but I know that > is asking too much. I wish so too. IIRC there used to be some modules that on Windows were wrappers around 3rd party DLLs and you can't have foo.dll as the module wrapping foo.dll the 3rd party DLL. (On Unix this problem doesn't exist because the 3rd party .so would be named libfoo.so, not foo.so.) 
-- --Guido van Rossum (python.org/~guido) From barry at python.org Thu Jun 24 20:28:30 2010 From: barry at python.org (Barry Warsaw) Date: Thu, 24 Jun 2010 14:28:30 -0400 Subject: [Python-Dev] versioned .so files for Python 3.2 In-Reply-To: References: <20100624115048.4fd152e3@heresy> <20100624135119.00b9ac5c@heresy> Message-ID: <20100624142830.4c859faf@limelight.wooz.org> On Jun 24, 2010, at 01:00 PM, Benjamin Peterson wrote: >2010/6/24 Barry Warsaw : >> On Jun 24, 2010, at 10:58 AM, Benjamin Peterson wrote: >> >>>2010/6/24 Barry Warsaw : >>>> Please let me know what you think. ?I'm happy to just commit this to the >>>> py3k branch if there are no objections . ?I don't think a new PEP is >>>> in order, but an update to PEP 3147 might make sense. >>> >>>How will this interact with PEP 384 if that is implemented? >> I'm trying to come up with something that will work immediately while PEP 384 >> is being adopted. > >But how will modules specify that they support multiple ABIs then? I didn't understand, so asked Benjamin for clarification in IRC. barry: if python 3.3 will only load x.3.3.so, but x.3.2.so supports the stable abi, will it load it? [14:25] gutworth: thanks, now i get it :) [14:26] gutworth: i think it should, but it wouldn't under my scheme. let me think about it -Barry -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 836 bytes Desc: not available URL: From brett at python.org Thu Jun 24 20:47:14 2010 From: brett at python.org (Brett Cannon) Date: Thu, 24 Jun 2010 11:47:14 -0700 Subject: [Python-Dev] versioned .so files for Python 3.2 In-Reply-To: References: <20100624115048.4fd152e3@heresy> Message-ID: On Thu, Jun 24, 2010 at 11:27, Guido van Rossum wrote: > On Thu, Jun 24, 2010 at 10:48 AM, Brett Cannon wrote: >> On Thu, Jun 24, 2010 at 08:50, Barry Warsaw wrote: >>> This is a follow up to PEP 3147. 
?That PEP, already implemented in Python 3.2, >>> allows for Python source files from different Python versions to live together >>> in the same directory. ?It does this by putting a magic tag in the .pyc file >>> name and placing the .pyc file in a __pycache__ directory. >>> >>> Distros such as Debian and Ubuntu will use this to greatly simplifying >>> deploying Python, and Python applications and libraries. ?Debian and Ubuntu >>> usually ship more than one version of Python, and currently have to play >>> complex games with symlinks to make this work. ?PEP 3147 will go a long way to >>> eliminating the need for extra directories and symlinks. >>> >>> One more thing I've found we need though, is a way to handled shared libraries >>> for extension modules. ?Just as we can get name collisions on foo.pyc, we can >>> get collisions on foo.so. ?We obviously cannot install foo.so built for Python >>> 3.2 and foo.so built for Python 3.3 in the same location. ?So symlink >>> nightmare's mini-me is back. >>> >>> I have a fairly simple fix for this. ?I'd actually be surprised if this hasn't >>> been discussed before, but teh Googles hasn't turned up anything. >>> >>> The idea is to put the Python version number in the shared library file name, >>> and extend .so lookup to find these extended file names. ?So for example, we'd >>> see foo.3.2.so instead, and Python would know how to dynload both that and the >>> traditional foo.so file too (for backward compatibility). >>> >>> (On file naming: the original patch used foo.so.3.2 and that works just as >>> well, but I thought there might be tools that expect exactly a '.so' suffix, >>> so I changed it to put the Major.Minor version number to the left of the >>> extension. ?The exact naming scheme is of course open to debate.) 
>>>
>>
>> While the idea is fine with me since I won't have any of my
>> directories cluttered with multiple .so files, I would still want to
>> add some moniker showing that the version number represents the
>> interpreter and not the .so file. If I read "foo.3.2.so", that naively
>> seems to mean the foo module's 3.2 release is what is
>> installed, not that it's built for CPython 3.2. So even though it
>> might be redundant, I would still want the VM name added.
>
> Well, for versions of the .so itself, traditionally version numbers
> are appended *after* the .so suffix (check your /lib directory :-).
>

Second thing you taught me today (first was the x[:0] trick)! I've also been on OS X too long; /usr/lib is just .dylib and that puts the version number before the extension.

>> Adding the VM name also doesn't make extension modules the exclusive
>> domain of CPython either. If some other VM decides to make their own
>> .so files that are not binary compatible then we should not preclude
>> that, since all this solution costs is a string comparison that has
>> to look at 7 more characters.
>>
>> -Brett
>>
>> P.S.: I wish we could drop use of the 'module.so' variant at the same
>> time, for consistency's sake and to cut out a stat call, but I know that
>> is asking too much.
>
> I wish so too. IIRC there used to be some modules that on Windows were
> wrappers around 3rd party DLLs and you can't have foo.dll as the
> module wrapping foo.dll the 3rd party DLL. (On Unix this problem
> doesn't exist because the 3rd party .so would be named libfoo.so, not
> foo.so.)

Wouldn't Barry's proposed solution actually fill this need, since it will give the file a custom Python suffix that more-or-less guarantees no name clash with a third-party DLL?
From merwok at netwok.org Thu Jun 24 20:50:41 2010
From: merwok at netwok.org (Éric Araujo)
Date: Thu, 24 Jun 2010 20:50:41 +0200
Subject: [Python-Dev] versioned .so files for Python 3.2
In-Reply-To: <20100624115048.4fd152e3@heresy>
References: <20100624115048.4fd152e3@heresy>
Message-ID: <4C23A901.7060100@netwok.org>

On 24/06/2010 17:50, Barry Warsaw (FLUFL) wrote:
> Other possible approaches:
> * Extend the distutils API so that the .so file extension can be passed in,
> instead of being essentially hardcoded to what Python's Makefile contains.

Third-party code relies on Distutils' internal quirks, so Distutils is frozen. Feel free to open a bug against Distutils2 on the Python tracker if that would be generally useful.

Regards

From merwok at netwok.org Thu Jun 24 20:53:02 2010
From: merwok at netwok.org (Éric Araujo)
Date: Thu, 24 Jun 2010 20:53:02 +0200
Subject: [Python-Dev] versioned .so files for Python 3.2
In-Reply-To:
References: <20100624115048.4fd152e3@heresy>
Message-ID: <4C23A98E.4080303@netwok.org>

On 24/06/2010 19:48, Brett Cannon wrote:
> P.S.: I wish we could drop use of the 'module.so' variant at the same
> time, for consistency sake and to cut out a stat call, but I know that
> is asking too much.

At least, looking for spam/__init__module.so could be avoided. It seems to me that the package definition does not allow that. The tradeoff would be code complication for one less stat call. Worth a bug report?
Regards

From fuzzyman at voidspace.org.uk Thu Jun 24 21:07:41 2010
From: fuzzyman at voidspace.org.uk (Michael Foord)
Date: Thu, 24 Jun 2010 20:07:41 +0100
Subject: [Python-Dev] thoughts on the bytes/string discussion
In-Reply-To:
References: <11597.1277401099@parc.com>
Message-ID: <4C23ACFD.6040506@voidspace.org.uk>

On 24/06/2010 19:11, Brett Cannon wrote:
> On Thu, Jun 24, 2010 at 10:38, Bill Janssen wrote:
> [SNIP]
>
>> The language moratorium kind of makes this all theoretical, but building
>> a String ABC still would be a good start, and presumably isn't forbidden
>> by the moratorium.
>>
> Because a new ABC would go into the stdlib (I assume in collections or
> string) the moratorium does not apply.
>

Although it would require changes for builtin types like file to work with a new string ABC, right?

Michael

--
http://www.ironpythoninaction.com/
http://www.voidspace.org.uk/blog

READ CAREFULLY. By accepting and reading this email you agree, on behalf of your employer, to release me from all obligations and waivers arising from any and all NON-NEGOTIATED agreements, licenses, terms-of-service, shrinkwrap, clickwrap, browsewrap, confidentiality, non-disclosure, non-compete and acceptable use policies ("BOGUS AGREEMENTS") that I have entered into with your employer, its partners, licensors, agents and assigns, in perpetuity, without prejudice to my ongoing rights and privileges. You further represent that you have the authority to release me from any BOGUS AGREEMENTS on behalf of your employer.
From brett at python.org Thu Jun 24 21:10:38 2010 From: brett at python.org (Brett Cannon) Date: Thu, 24 Jun 2010 12:10:38 -0700 Subject: [Python-Dev] thoughts on the bytes/string discussion In-Reply-To: <4C23ACFD.6040506@voidspace.org.uk> References: <11597.1277401099@parc.com> <4C23ACFD.6040506@voidspace.org.uk> Message-ID: On Thu, Jun 24, 2010 at 12:07, Michael Foord wrote: > On 24/06/2010 19:11, Brett Cannon wrote: >> >> On Thu, Jun 24, 2010 at 10:38, Bill Janssen ?wrote: >> [SNIP] >> >>> >>> The language moratorium kind of makes this all theoretical, but building >>> a String ABC still would be a good start, and presumably isn't forbidden >>> by the moratorium. >>> >> >> Because a new ABC would go into the stdlib (I assume in collections or >> string) the moratorium does not apply. >> > > Although it would require changes for builtin types like file to work with a > new string ABC, right? Only if they wanted to rely on some concrete implementation of a method contained within the ABC. Otherwise that's what abc.register exists for. From ianb at colorstudy.com Thu Jun 24 21:49:33 2010 From: ianb at colorstudy.com (Ian Bicking) Date: Thu, 24 Jun 2010 14:49:33 -0500 Subject: [Python-Dev] thoughts on the bytes/string discussion In-Reply-To: <11597.1277401099@parc.com> References: <11597.1277401099@parc.com> Message-ID: On Thu, Jun 24, 2010 at 12:38 PM, Bill Janssen wrote: > Here are a couple of ideas I'm taking away from the bytes/string > discussion. > > First, it would probably be a good idea to have a String ABC. > > Secondly, maybe the string situation in 2.x wasn't as broken as we > thought it was. In particular, those who deal with lots of encoded > strings seemed to find it handy, and miss it in 3.x. Perhaps strings > are more like numbers than we think. We have separate types for int, > float, Decimal, etc. But they're all numbers, and they all > cross-operate. 
In 2.x, it seems there were two missing features: no
> encoding attribute on str, which should have been there and should have
> been required, and the default encoding being "ASCII" (I can't tell you
> how many times I've had to fix that issue when a non-ASCII encoded str
> was passed to some output function).
>

I've started to form a conceptual notion that I think fits these cases.

We've set up a system where we think of text as natively unicode, with encodings to put that unicode into a byte form. This is certainly appropriate in a lot of cases. But there's a significant class of problems where bytes are the native structure. Network protocols are what we've been discussing, and are a notable case of that. That is, b'/' is the most native sense of a path separator in a URL, or b':' is the most native sense of what separates a header name from a header value in HTTP. To disallow unicode URLs or unicode HTTP headers would be rather anti-social, especially because unicode is now the "native" string type in Python 3 (as an aside, for the WSGI spec we've been talking about using "native" strings in some positions like dictionary keys, meaning Python 2 str and Python 3 str, while being more exacting in other areas such as a response body, which would always be bytes).

The HTTP spec and other network protocols seem a little fuzzy on this, because they were written before unicode even existed, and even later activity happened at a point when "unicode" and "text" weren't widely considered the same thing like they are now. But I think the original intention is revealed in a more modern specification like WebSockets, where they are very explicit that ':' is just shorthand for a particular byte; it is not "text" in our new modern notion of the term.

So with this idea in mind it makes more sense to me that *specific pieces of text* can be reasonably treated as both bytes and text. All the string literals in urllib.parse.urlunsplit(), for example.
The semantics I imagine are that special('/')+b'x'==b'/x' (i.e., it does not become special('/x')) and special('/')+x=='/x' (again it becomes str). This avoids some of the cases of unicode or str infecting a system as they did in Python 2 (where you might pass in unicode and everything works fine until some non-ASCII is introduced). The one place where this might be tricky is if you have an encoding that is not ASCII compatible. But we can't guard against every possibility. So it would be entirely wrong to take a string encoded with UTF-16 and start to use b'/' with it. But there are other nonsensical combinations already possible, especially with polymorphic functions, we can't guard against all of them. Also I'm unsure if something like UTF-16 is in any way compatible with the kind of legacy systems that use bytes. Can you encode your filesystem with UTF-16? I don't think you could encode a cookie with it. So maybe having a second string type in 3.x that consists of an encoded > sequence of bytes plus the encoding, call it "estr", wouldn't have been > a bad idea. It would probably have made sense to have estr cooperate > with the str type, in the same way that two different kinds of numbers > cooperate, "promoting" the result of an operation only when necessary. > This would automatically achieve the kind of polymorphic functionality > that Guido is suggesting, but without losing the ability to do > > x = e(ASCII)"bar" > a = ''.join("foo", x) > > (or whatever the syntax for such an encoded string literal would be -- > I'm not claiming this is a good one) which presume would bind "a" to a > Unicode string "foobar" -- have to work out what gets promoted to what. > I would be entirely happy without a literal syntax. But as Phillip has noted, this can't be implemented *entirely* in a library as there are some constraints with the current str/bytes implementations. Reading PEP 3003 I'm not clear if such changes are part of the moratorium? 
They seem like they would be (sadly), but it doesn't seem clearly noted. I think there's a *different* use case for things like bytes-in-a-utf8-encoding (e.g., to allow XML data to be decoded lazily), but that could be yet another class, and maybe shouldn't be polymorphicly usable as bytes (i.e., treat it as an optimized str representation that is otherwise semantically equivalent). A String ABC would formalize these things. -- Ian Bicking | http://blog.ianbicking.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From barry at python.org Thu Jun 24 22:46:37 2010 From: barry at python.org (Barry Warsaw) Date: Thu, 24 Jun 2010 16:46:37 -0400 Subject: [Python-Dev] versioned .so files for Python 3.2 In-Reply-To: <20100624142830.4c859faf@limelight.wooz.org> References: <20100624115048.4fd152e3@heresy> <20100624135119.00b9ac5c@heresy> <20100624142830.4c859faf@limelight.wooz.org> Message-ID: <20100624164637.22fd9160@heresy> On Jun 24, 2010, at 02:28 PM, Barry Warsaw wrote: >On Jun 24, 2010, at 01:00 PM, Benjamin Peterson wrote: > >>2010/6/24 Barry Warsaw : >>> On Jun 24, 2010, at 10:58 AM, Benjamin Peterson wrote: >>> >>>>2010/6/24 Barry Warsaw : >>>>> Please let me know what you think. ?I'm happy to just commit this to the >>>>> py3k branch if there are no objections . ?I don't think a new PEP is >>>>> in order, but an update to PEP 3147 might make sense. >>>> >>>>How will this interact with PEP 384 if that is implemented? >>> I'm trying to come up with something that will work immediately while PEP 384 >>> is being adopted. >> >>But how will modules specify that they support multiple ABIs then? > >I didn't understand, so asked Benjamin for clarification in IRC. > > barry: if python 3.3 will only load x.3.3.so, but x.3.2.so supports > the stable abi, will it load it? [14:25] > gutworth: thanks, now i get it :) [14:26] > gutworth: i think it should, but it wouldn't under my scheme. 
let me
> think about it

So, we could say that PEP 384 compliant extension modules would get written without a version specifier. IOW, we'd treat foo.so as using the stable ABI. It would then be up to the Python runtime to throw ImportErrors if in fact we were loading a legacy, non-PEP 384 compliant extension.

-Barry

From barry at python.org Thu Jun 24 22:53:36 2010
From: barry at python.org (Barry Warsaw)
Date: Thu, 24 Jun 2010 16:53:36 -0400
Subject: [Python-Dev] versioned .so files for Python 3.2
In-Reply-To:
References: <20100624115048.4fd152e3@heresy>
Message-ID: <20100624165336.27fc7cc9@heresy>

On Jun 24, 2010, at 10:48 AM, Brett Cannon wrote:

>While the idea is fine with me since I won't have any of my
>directories cluttered with multiple .so files, I would still want to
>add some moniker showing that the version number represents the
>interpreter and not the .so file. If I read "foo.3.2.so", that naively
>seems to mean the foo module's 3.2 release is what is
>installed, not that it's built for CPython 3.2. So even though it
>might be redundant, I would still want the VM name added.

I have a new version of my patch that steals the "magic tag" idea from PEP 3147. Note that it does not use the *actual* same piece of information to compose the file name, but for now it does match the pyc tag string. E.g.

    % find . -name \*.so
    ./build/lib.linux-x86_64-3.2/math.cpython-32.so
    ./build/lib.linux-x86_64-3.2/select.cpython-32.so
    ./build/lib.linux-x86_64-3.2/_struct.cpython-32.so
    ...

Further, by default, ./configure doesn't add this tag, so you would have to build Python with:

    % SOABI=cpython-32 ./configure

to get anything between the module name and the extension. I could of course make this a configure switch instead, and could default it to some other magic string instead of the empty string.
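
[Editorial illustration, not part of the original message.] The naming scheme described above can be sketched roughly as follows. This is only a sketch under stated assumptions: the helper name extension_candidates is hypothetical, and the lookup order (tagged name first, plain foo.so as a fallback) is inferred from the thread, not taken from the actual patch or from CPython's import code.

```python
def extension_candidates(module_name, soabi=""):
    """Return the shared-library file names to try, most specific first.

    `soabi` stands in for the tag chosen at configure time
    (e.g. "cpython-32"); an empty tag means only the plain name is tried.
    Hypothetical helper for illustration only.
    """
    candidates = []
    if soabi:
        # Tagged name, e.g. math.cpython-32.so
        candidates.append("{0}.{1}.so".format(module_name, soabi))
    # Traditional untagged name, kept for backward compatibility
    candidates.append("{0}.so".format(module_name))
    return candidates


print(extension_candidates("math", "cpython-32"))
print(extension_candidates("math"))
```

With a tag of "cpython-32" the sketch tries math.cpython-32.so before math.so; with no tag it degrades to today's behaviour of looking only for math.so.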
>Adding the VM name also doesn't make extension modules the exclusive
>domain of CPython either. If some other VM decides to make their own
>.so files that are not binary compatible then we should not preclude
>that, since all this solution costs is a string comparison that has
>to look at 7 more characters.
>
>-Brett
>
>P.S.: I wish we could drop use of the 'module.so' variant at the same
>time, for consistency's sake and to cut out a stat call, but I know that
>is asking too much.

I think you're right that with the $SOABI trick above, you wouldn't get the name collisions Guido recalls, and you could get rid of module.so. OTOH, as I am currently only targeting Linux, it seems like the module.so stat is wasted anyway on that platform.

-Barry

From barry at python.org Thu Jun 24 22:55:33 2010
From: barry at python.org (Barry Warsaw)
Date: Thu, 24 Jun 2010 16:55:33 -0400
Subject: [Python-Dev] versioned .so files for Python 3.2
In-Reply-To:
References: <20100624115048.4fd152e3@heresy>
Message-ID: <20100624165533.46a5fb8e@heresy>

On Jun 24, 2010, at 11:27 AM, Guido van Rossum wrote:

>On Thu, Jun 24, 2010 at 10:48 AM, Brett Cannon wrote:
>> While the idea is fine with me since I won't have any of my
>> directories cluttered with multiple .so files, I would still want to
>> add some moniker showing that the version number represents the
>> interpreter and not the .so file. If I read "foo.3.2.so", that naively
>> seems to mean the foo module's 3.2 release is what is
>> installed, not that it's built for CPython 3.2. So even though it
>> might be redundant, I would still want the VM name added.
>
>Well, for versions of the .so itself, traditionally version numbers
>are appended *after* the .so suffix (check your /lib directory :-).
Which is probably another reason not to use foo.so.X.Y for Python extension modules. I think it would be confusing, and foo.<tag>.so looks nice and is consistent with foo.<tag>.pyc.

(Ref to updated patch coming...)

-Barry

From guido at python.org Thu Jun 24 22:59:09 2010
From: guido at python.org (Guido van Rossum)
Date: Thu, 24 Jun 2010 13:59:09 -0700
Subject: [Python-Dev] thoughts on the bytes/string discussion
In-Reply-To:
References: <11597.1277401099@parc.com>
Message-ID:

I see it a little differently (though there is probably a common concept lurking in here).

The protocols you mention are intentionally designed to be encoding-neutral as long as the encoding is an ASCII superset. This covers ASCII itself, Latin-1, Latin-N for other values of N, MacRoman, Microsoft's code pages (most of them anyways), UTF-8, presumably at least some of the Japanese encodings, and probably a host of others. But it does not cover UTF-16, EBCDIC, and others. (Encodings that have "shift bytes" that change the meaning of some or all ordinary ASCII characters also aren't covered, unless such an encoding happens to exclude the special characters that the protocol spec cares about).

The protocol specs typically go out of their way to specify what byte values they use for syntactically significant positions (e.g. ':' in headers, or '/' in URLs), while hand-waving about the meaning of "what goes in between" since it is all typically treated as "not of syntactic significance". So you can write a parser that looks at bytes exclusively, and looks for a bunch of ASCII punctuation characters (e.g. '<', '>', '/', '&'), and doesn't know or care whether the stuff in between is encoded in Latin-15, MacRoman or UTF-8 -- it never looks "inside" stretches of characters between the special characters and just copies them.
(Sometimes there may be *some* sections that are required to be ASCII, and in those sections the equivalence of a-z and A-Z is well defined.)

But I wouldn't go so far as to claim that interpreting the protocols as text is wrong. After all we're talking exclusively about protocols that are designed intentionally to be directly "human readable" (albeit as a fall-back option) -- the only tool you need to debug the traffic on the wire or socket is something that knows which subset of ASCII is considered "printable" and which renders everything else safely as a hex escape or even a special "unknown" character (like Unicode's "?" inside a black diamond).

Depending on the requirements of a specific app (or framework) it may be entirely reasonable to convert everything to Unicode and process the resulting text; in other contexts it makes more sense to keep everything as bytes. It also makes sense to have an interface library to deal with a specific protocol that treats the protocol side as bytes but interacts with the application using text, since that is often how the application programmer wants to treat it anyway.

Of course, some protocols require the application programmer to be aware of bytes as well in *some* cases -- examples are email and HTTP which can be used to transfer text as well as binary data (e.g. images). There is also the bootstrap problem where the wire data must be partially parsed in order to find out the encoding to be used to convert it to text. But that doesn't mean it's invalid to think about it as text in many application contexts.

Regarding the proposal of a String ABC, I hope this isn't going to become a backdoor to reintroduce the Python 2 madness of allowing equivalency between text and bytes for *some* strings of bytes and not others.

Finally, I do think that we should not introduce changes to the fundamental behavior of text and bytes while the moratorium is in place. Changes to specific stdlib APIs are fine however.
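
[Editorial illustration, not part of the original message.] The bytes-only parsing style described above can be sketched as below. It is a minimal sketch, assuming a header-like "name: value" line; the function name split_header is hypothetical and this is not how any particular stdlib module does it. The parser only looks for the ASCII byte b':' and never interprets the bytes in between, so the same code works whether the value is Latin-1 or UTF-8.

```python
def split_header(line):
    """Split a raw header line at the first b':' without decoding it.

    Only the syntactically significant ASCII byte is inspected; the
    value is copied through untouched, whatever its encoding.
    Hypothetical helper for illustration only.
    """
    name, sep, value = line.partition(b":")
    if not sep:
        raise ValueError("not a header line: %r" % (line,))
    # bytes.lower() only folds ASCII A-Z, matching the a-z/A-Z
    # equivalence that applies to the ASCII-only header name.
    return name.strip().lower(), value.strip()


name, raw_value = split_header(b"Subject: caf\xc3\xa9")
# Decoding is a separate, later decision made by the caller.
print(name, raw_value.decode("utf-8"))
```

The design point is the separation of concerns: the protocol layer stays in bytes, and an application (or an interface library) decides afterwards whether and how to decode the pieces it was handed.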
--Guido On Thu, Jun 24, 2010 at 12:49 PM, Ian Bicking wrote: > On Thu, Jun 24, 2010 at 12:38 PM, Bill Janssen wrote: >> >> Here are a couple of ideas I'm taking away from the bytes/string >> discussion. >> >> First, it would probably be a good idea to have a String ABC. >> >> Secondly, maybe the string situation in 2.x wasn't as broken as we >> thought it was. ?In particular, those who deal with lots of encoded >> strings seemed to find it handy, and miss it in 3.x. ?Perhaps strings >> are more like numbers than we think. ?We have separate types for int, >> float, Decimal, etc. ?But they're all numbers, and they all >> cross-operate. ?In 2.x, it seems there were two missing features: no >> encoding attribute on str, which should have been there and should have >> been required, and the default encoding being "ASCII" (I can't tell you >> how many times I've had to fix that issue when a non-ASCII encoded str >> was passed to some output function). > > I've started to form a conceptual notion that I think fits these cases. > > We've setup a system where we think of text as natively unicode, with > encodings to put that unicode into a byte form.? This is certainly > appropriate in a lot of cases.? But there's a significant class of problems > where bytes are the native structure.? Network protocols are what we've been > discussing, and are a notable case of that.? That is, b'/' is the most > native sense of a path separator in a URL, or b':' is the most native sense > of what separates a header name from a header value in HTTP.? To disallow > unicode URLs or unicode HTTP headers would be rather anti-social, especially > because unicode is now the "native" string type in Python 3 (as an aside for > the WSGI spec we've been talking about using "native" strings in some > positions like dictionary keys, meaning Python 2 str and Python 3 str, while > being more exacting in other areas such as a response body which would > always be bytes). 
> > The HTTP spec and other network protocols seems a little fuzzy on this, > because it was written before unicode even existed, and even later activity > happened at a point when "unicode" and "text" weren't widely considered the > same thing like they are now.? But I think the original intention is > revealed in a more modern specification like WebSockets, where they are very > explicit that ':' is just shorthand for a particular byte, it is not "text" > in our new modern notion of the term. > > So with this idea in mind it makes more sense to me that *specific pieces of > text* can be reasonably treated as both bytes and text.? All the string > literals in urllib.parse.urlunspit() for example. > > The semantics I imagine are that special('/')+b'x'==b'/x' (i.e., it does not > become special('/x')) and special('/')+x=='/x' (again it becomes str).? This > avoids some of the cases of unicode or str infecting a system as they did in > Python 2 (where you might pass in unicode and everything works fine until > some non-ASCII is introduced). > > The one place where this might be tricky is if you have an encoding that is > not ASCII compatible.? But we can't guard against every possibility.? So it > would be entirely wrong to take a string encoded with UTF-16 and start to > use b'/' with it.? But there are other nonsensical combinations already > possible, especially with polymorphic functions, we can't guard against all > of them.? Also I'm unsure if something like UTF-16 is in any way compatible > with the kind of legacy systems that use bytes.? Can you encode your > filesystem with UTF-16?? I don't think you could encode a cookie with it. > >> So maybe having a second string type in 3.x that consists of an encoded >> sequence of bytes plus the encoding, call it "estr", wouldn't have been >> a bad idea. 
?It would probably have made sense to have estr cooperate >> with the str type, in the same way that two different kinds of numbers >> cooperate, "promoting" the result of an operation only when necessary. >> This would automatically achieve the kind of polymorphic functionality >> that Guido is suggesting, but without losing the ability to do >> >> ?x = e(ASCII)"bar" >> ?a = ''.join("foo", x) >> >> (or whatever the syntax for such an encoded string literal would be -- >> I'm not claiming this is a good one) which presume would bind "a" to a >> Unicode string "foobar" -- have to work out what gets promoted to what. > > I would be entirely happy without a literal syntax.? But as Phillip has > noted, this can't be implemented *entirely* in a library as there are some > constraints with the current str/bytes implementations.? Reading PEP 3003 > I'm not clear if such changes are part of the moratorium?? They seem like > they would be (sadly), but it doesn't seem clearly noted. > > I think there's a *different* use case for things like > bytes-in-a-utf8-encoding (e.g., to allow XML data to be decoded lazily), but > that could be yet another class, and maybe shouldn't be polymorphicly usable > as bytes (i.e., treat it as an optimized str representation that is > otherwise semantically equivalent).? A String ABC would formalize these > things. 
>
> --
> Ian Bicking  |  http://blog.ianbicking.org

--
--Guido van Rossum (python.org/~guido)

From brett at python.org Thu Jun 24 23:08:14 2010
From: brett at python.org (Brett Cannon)
Date: Thu, 24 Jun 2010 14:08:14 -0700
Subject: [Python-Dev] versioned .so files for Python 3.2
In-Reply-To: <4C23A98E.4080303@netwok.org>
References: <20100624115048.4fd152e3@heresy> <4C23A98E.4080303@netwok.org>
Message-ID:

On Thu, Jun 24, 2010 at 11:53, Éric Araujo wrote:
> On 24/06/2010 19:48, Brett Cannon wrote:
>> P.S.: I wish we could drop use of the 'module.so' variant at the same
>> time, for consistency sake and to cut out a stat call, but I know that
>> is asking too much.
>
> At least, looking for spam/__init__module.so could be avoided. It seems
> to me that the package definition does not allow that.

I thought no one had bothered to change import.c to allow for extension modules to act as a package's __init__? As for not being allowed, I don't agree with that assessment. If you treat a package's __init__ module as simply that, a module that would be named __init__ when imported, then __init__module.c would be valid (and that's what importlib does).

> The tradeoff
> would be code complication for one less stat call. Worth a bug report?

Nah.
From barry at python.org Thu Jun 24 23:09:44 2010
From: barry at python.org (Barry Warsaw)
Date: Thu, 24 Jun 2010 17:09:44 -0400
Subject: [Python-Dev] versioned .so files for Python 3.2
In-Reply-To:
References: <20100624115048.4fd152e3@heresy>
Message-ID: <20100624170944.7e68ad21@heresy>

On Jun 24, 2010, at 11:05 AM, Daniel Stutzbach wrote:

>On Thu, Jun 24, 2010 at 10:50 AM, Barry Warsaw wrote:
>
>> The idea is to put the Python version number in the shared library file
>> name, and extend .so lookup to find these extended file names.  So for
>> example, we'd see foo.3.2.so instead, and Python would know how to dynload
>> both that and the traditional foo.so file too (for backward compatibility).
>>
>
>What use case does this address?

Specifically, it's the use case where we (Debian/Ubuntu) plan on installing all Python 3.x packages into /usr/lib/python3/dist-packages. As of PEP 3147, we can do that without collisions on the pyc files, but would still have to symlink for extension module .so files, because they are always named foo.so and Python 3.2's foo.so won't (modulo PEP 384) be compatible with Python 3.3's foo.so.

So using the same trick as in PEP 3147, if we can name Python 3.2's foo extension differently than the incompatible Python 3.3's foo extension, we can have them live in the same directory without symlink tricks.

>PEP 3147 addresses the fact that the user may have different versions of
>Python installed and each wants to write a .pyc file when loading a module.
>.so files are not generated simply by running the Python interpreter, ergo
>.so files are not an issue for that use case.

See above. It doesn't matter whether the pyc or so is created at run time by the user or by the distro build system. If the files for different Python versions end up in the same directory, they must be named differently too.
>If you want to make it so a system can install a package in just one
>location to be used by multiple Python installations, then the version
>number isn't enough. You also need to distinguish debug builds, profiling
>builds, Unicode width (see issue8654), and probably several other
>./configure options.

This is a good point, but more easily addressed. Let's say a distro makes three Python 3.2 variants available, one "normal" build, a debug build, and UCS2 and UCS4 versions of the above. All we need to do is choose a different .so ABI tag (see my previous follow-up) for each of those builds. My updated patch (coming soon) allows you to pass that tag to configure. So e.g.

    Normal build UCSX: SOABI=cpython-32 ./configure
    Debug build UCSX:  SOABI=cpython-32-d ./configure
    Normal build UCSY: SOABI=cpython-32-w ./configure
    Debug build UCSY:  SOABI=cpython-32-dw ./configure

Mix and match for any other build options you care about. Because the distro controls how Python is configured, this should be fairly easy to achieve.

-Barry

From fdrake at acm.org Thu Jun 24 23:12:21 2010
From: fdrake at acm.org (Fred Drake)
Date: Thu, 24 Jun 2010 17:12:21 -0400
Subject: [Python-Dev] versioned .so files for Python 3.2
In-Reply-To: <20100624165533.46a5fb8e@heresy>
References: <20100624115048.4fd152e3@heresy> <20100624165533.46a5fb8e@heresy>
Message-ID:

On Thu, Jun 24, 2010 at 4:55 PM, Barry Warsaw wrote:
> Which is probably another reason not to use foo.so.X.Y for Python extension
> modules.

Clearly, foo.so.3.2 is the man page for the foo.so.3 system call. The ABI ident definitely has to be elsewhere.

-Fred

--
Fred L. Drake, Jr.
"A storm broke loose in my mind."
--Albert Einstein

From barry at python.org Thu Jun 24 23:23:02 2010
From: barry at python.org (Barry Warsaw)
Date: Thu, 24 Jun 2010 17:23:02 -0400
Subject: [Python-Dev] versioned .so files for Python 3.2
In-Reply-To: <4C23A901.7060100@netwok.org>
References: <20100624115048.4fd152e3@heresy> <4C23A901.7060100@netwok.org>
Message-ID: <20100624172302.024687ef@heresy>

On Jun 24, 2010, at 08:50 PM, Éric Araujo wrote:

>On 24/06/2010 17:50, Barry Warsaw (FLUFL) wrote:
>> Other possible approaches:
>> * Extend the distutils API so that the .so file extension can be passed in,
>> instead of being essentially hardcoded to what Python's Makefile contains.
>
>Third-party code rely on Distutils internal quirks, so it's frozen. Feel
>free to open a bug against Distutils2 on the Python tracker if that
>would be generally useful.

Depending on how strict this constraint is, it could make things more difficult. I can control what shared library file names Python will load statically, but in order to support PEP 384 I think I need to be able to control what file extensions build_ext writes.

My updated patch does this in a backward compatible way. Of course, distutils hacks have their tentacles all up in the distutils internals, so maybe my patch will break something after all. I can think of a few even hackier ways to work around that if necessary.

My updated patch:

* Adds an optional argument to build_ext.get_ext_fullpath() and build_ext.get_ext_filename(). This extra argument is the Extension instance being built. (Boy, just in case anyone's already playing with the time machine, it sure would have been nice if these methods had originally just taken the Extension instance and dug out ext.name instead of passing the string in.)

* Adds an optional new keyword argument to the Extension class, called so_abi_tag. If given, this overrides the Makefile $SO variable extension.
What this means is that with no changes, a non-PEP 384 compliant extension module wouldn't have to change anything: setup( name='stupid', version='0.0', packages=['stupid', 'stupid.tests'], ext_modules=[Extension('_stupid', ['src/stupid.c'], )], test_suite='stupid.tests', ) With a Python built like so: % SOABI=cpython-32 ./configure you'd end up with a _stupid.cpython-32.so module. However, if you knew your extension module was PEP 384 compliant, and could be shared on >=Python 3.2, you would do: setup( name='stupid', version='0.0', packages=['stupid', 'stupid.tests'], ext_modules=[Extension('_stupid', ['src/stupid.c'], so_abi_tag='', )], test_suite='stupid.tests', ) and now you'd end up with _stupid.so, which I propose to mean it's PEP 384 ABI compliant. (There may not be any other use case than so_abi_tag='' or so_abi_tag=None, in which case, the Extension keyword *might* be better off as a boolean.) Now of course PEP 384 isn't implemented, so it's a bit of a moot point. But if some form of versioned .so file naming is accepted for Python 3.2, I'll update PEP 384 with possible solutions. -Barry -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 836 bytes Desc: not available URL: From barry at python.org Thu Jun 24 23:27:00 2010 From: barry at python.org (Barry Warsaw) Date: Thu, 24 Jun 2010 17:27:00 -0400 Subject: [Python-Dev] versioned .so files for Python 3.2 In-Reply-To: <20100624115048.4fd152e3@heresy> References: <20100624115048.4fd152e3@heresy> Message-ID: <20100624172700.0b837222@heresy> On Jun 24, 2010, at 11:50 AM, Barry Warsaw wrote: >Please let me know what you think. I'm happy to just commit this to the py3k >branch if there are no objections . I don't think a new PEP is in >order, but an update to PEP 3147 might make sense. Thanks for all the quick feedback. I've made some changes based on the comments so far. 
The bzr branch is updated, and a new patch is available here:

http://pastebin.ubuntu.com/454688/

If reception continues to be mildly approving, I'll open an issue on
bugs.python.org and attach the patch to that.

-Barry
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 836 bytes
Desc: not available
URL: 

From merwok at netwok.org Thu Jun 24 23:37:10 2010
From: merwok at netwok.org (=?UTF-8?B?w4lyaWMgQXJhdWpv?=)
Date: Thu, 24 Jun 2010 23:37:10 +0200
Subject: [Python-Dev] versioned .so files for Python 3.2
In-Reply-To: <20100624172302.024687ef@heresy>
References: <20100624115048.4fd152e3@heresy> <4C23A901.7060100@netwok.org> <20100624172302.024687ef@heresy>
Message-ID: <4C23D006.6080800@netwok.org>

Your plan seems good. Adding keyword arguments should not create
compatibility issues, and I suspect the impact on the code of build_ext
may be actually quite small. I'll try to review your patch even though I
don't know C or compiler oddities, but Tarek will have the best insight
and the final word.

In case the time machine's not available, your suggestion about getting
the filename from the Extension instance instead of passing in a string
can most certainly land in distutils2.

Regards

From ianb at colorstudy.com Thu Jun 24 23:44:12 2010
From: ianb at colorstudy.com (Ian Bicking)
Date: Thu, 24 Jun 2010 16:44:12 -0500
Subject: [Python-Dev] thoughts on the bytes/string discussion
In-Reply-To: 
References: <11597.1277401099@parc.com>
Message-ID: 

On Thu, Jun 24, 2010 at 3:59 PM, Guido van Rossum wrote:
> The protocol specs typically go out of their way to specify what byte
> values they use for syntactically significant positions (e.g. ':' in
> headers, or '/' in URLs), while hand-waving about the meaning of "what
> goes in between" since it is all typically treated as "not of
> syntactic significance".
So you can write a parser that looks at bytes > exclusively, and looks for a bunch of ASCII punctuation characters > (e.g. '<', '>', '/', '&'), and doesn't know or care whether the stuff > in between is encoded in Latin-15, MacRoman or UTF-8 -- it never looks > "inside" stretches of characters between the special characters and > just copies them. (Sometimes there may be *some* sections that are > required to be ASCII and there equivalence of a-z and A-Z is well > defined.) > Yes, these are the specific characters that I think we can handle specially. For instance, the list of all string literals used by urlsplit and urlunsplit: '//' '/' ':' '?' '#' '' 'http' A list of all valid scheme characters (a-z etc) Some lists for scheme-specific parsing (which all contain valid scheme characters) All of these are constrained to ASCII, and must be constrained to ASCII, and everything else in a URL is treated as basically opaque. So if we turned these characters into byte-or-str objects I think we'd basically be true to the intent of the specs, and in a practical sense we'd be able to make these functions polymorphic. I suspect this same pattern will be present most places where people want polymorphic behavior. For now we could do something incomplete and just avoid using operators we can't overload (is it possible to at least make them produce a readable exception?) I think we'll avoid a lot of the confusion that was present with Python 2 by not making the coercions transitive. For instance, here's something that would work in Python 2: urlunsplit(('http', 'example.com', '/foo', u'bar=baz', '')) And you'd get out a unicode string, except that would break the first time that query string (u'bar=baz') was not ASCII (but not until then!) 
Here's the urlunsplit code:

    def urlunsplit(components):
        scheme, netloc, url, query, fragment = components
        if netloc or (scheme and scheme in uses_netloc and url[:2] != '//'):
            if url and url[:1] != '/': url = '/' + url
            url = '//' + (netloc or '') + url
        if scheme:
            url = scheme + ':' + url
        if query:
            url = url + '?' + query
        if fragment:
            url = url + '#' + fragment
        return url

If all those literals were this new special kind of string, if you call:

    urlunsplit((b'http', b'example.com', b'/foo', 'bar=baz', b''))

You'd end up constructing the URL b'http://example.com/foo' and then running:

    url = url + special('?') + query

And that would fail because b'http://example.com/foo' + special('?') would
be b'http://example.com/foo?' and you cannot add that to the str 'bar=baz'.
So we'd be avoiding the Python 2 craziness.

--
Ian Bicking | http://blog.ianbicking.org
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From solipsis at pitrou.net Thu Jun 24 23:50:56 2010
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Thu, 24 Jun 2010 23:50:56 +0200
Subject: [Python-Dev] thoughts on the bytes/string discussion
References: <11597.1277401099@parc.com> <4C23ACFD.6040506@voidspace.org.uk>
Message-ID: <20100624235056.5a9930e6@pitrou.net>

On Thu, 24 Jun 2010 20:07:41 +0100 Michael Foord wrote:
> > Although it would require changes for builtin types like file to work
> with a new string ABC, right?

There is no builtin file type in 3.x. Besides, it is not an ABC-level
problem; the IO layer is written in C (although there's still the Python
implementation to play with), which would mandate an abstract C API to
access unicode-like objects (similarly as there's already the buffer API
to access bytes-like objects).

Regards

Antoine.
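Returning to Ian Bicking's urlunsplit example above: the fail-early behaviour he describes can be observed with the urllib.parse module in a recent Python 3, which accepts either all-str or all-bytes components. The sketch below does not implement the hypothetical special() string type from the message; it only shows that mixing a non-empty str into bytes components is rejected immediately, even when the str is pure ASCII.

```python
from urllib.parse import urlunsplit

# All-bytes components give a bytes URL; all-str components give a str URL.
as_bytes = urlunsplit((b'http', b'example.com', b'/foo', b'bar=baz', b''))
as_text = urlunsplit(('http', 'example.com', '/foo', 'bar=baz', ''))

# Mixing a non-empty str into bytes components fails right away,
# rather than working until the first non-ASCII query string shows up.
try:
    urlunsplit((b'http', b'example.com', b'/foo', 'bar=baz', b''))
    mixed_failed = False
except TypeError:
    mixed_failed = True
```

The contrast with Python 2 is that the error does not depend on the data: the type mismatch alone is enough to trigger it.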
From scott+python-dev at scottdial.com Thu Jun 24 23:53:06 2010 From: scott+python-dev at scottdial.com (Scott Dial) Date: Thu, 24 Jun 2010 17:53:06 -0400 Subject: [Python-Dev] versioned .so files for Python 3.2 In-Reply-To: <20100624170944.7e68ad21@heresy> References: <20100624115048.4fd152e3@heresy> <20100624170944.7e68ad21@heresy> Message-ID: <4C23D3C2.1060500@scottdial.com> On 6/24/2010 5:09 PM, Barry Warsaw wrote: >> What use case does this address? > > Specifically, it's the use case where we (Debian/Ubuntu) plan on installing > all Python 3.x packages into /usr/lib/python3/dist-packages. As of PEP 3147, > we can do that without collisions on the pyc files, but would still have to > symlink for extension module .so files, because they are always named foo.so > and Python 3.2's foo.so won't (modulo PEP 384) be compatible with Python 3.3's > foo.so. If the package has .so files that aren't compatible with other version of python, then what is the motivation for placing that in a shared location (since it can't actually be shared)? > So using the same trick as in PEP 3147, if we can name Python 3.2's foo > extension differently than the incompatible Python 3.3's foo extension, we can > have them live in the same directory without symlink tricks. Why would a symlink trick even be necessary if there is a version-unspecific directory and a version-specific directory on the search path? >> PEP 3147 addresses the fact that the user may have different versions of >> Python installed and each wants to write a .pyc file when loading a module. >> .so files are not generated simply by running the Python interpreter, ergo >> .so files are not an issue for that use case. > > See above. It doesn't matter whether the pyc or so is created at run time by > the user or by the distro build system. If the files for different Python > versions end up in the same directory, they must be named differently too. 
But the only motivation for doing this with .pyc files is that the .py files are able to be shared, since the .pyc is an on-demand-generated, version-specific artifact (and not the source). The .so file is created offline by another toolchain, is version-specific, and presumably you are not suggesting that Python generate it on-demand. > >> If you want to make it so a system can install a package in just one >> location to be used by multiple Python installations, then the version >> number isn't enough. You also need to distinguish debug builds, profiling >> builds, Unicode width (see issue8654), and probably several other >> ./configure options. > > This is a good point, but more easily addressed. Let's say a distro makes > three Python 3.2 variants available, one "normal" build, a debug build, and > UCS2 and USC4 versions of the above. All we need to do is choose a different > .so ABI tag (see previous follow) for each of those builds. My updated patch > (coming soon) allows you to define that tag to configure. So e.g. Why is this use case not already addressed by having independent directories? And why is there an incentive to co-mingle these version-punned files with version-agnostic ones? > Mix and match for any other build options you care about. Because the distro > controls how Python is configured, this should be fairly easy to achieve. For packages that have .so files, won't the distro already have to build multiple copies of that package for all version of Python? So, why can't it place them in separate directories that are version-specific at that time? This is not the same as placing .py files that are version-agnostic into a version-agnostic location. 
--
Scott Dial
scott at scottdial.com
scodial at cs.indiana.edu

From tjreedy at udel.edu Fri Jun 25 00:00:30 2010
From: tjreedy at udel.edu (Terry Reedy)
Date: Thu, 24 Jun 2010 18:00:30 -0400
Subject: [Python-Dev] thoughts on the bytes/string discussion
In-Reply-To: <11597.1277401099@parc.com>
References: <11597.1277401099@parc.com>
Message-ID: 

On 6/24/2010 1:38 PM, Bill Janssen wrote:
> > Secondly, maybe the string situation in 2.x wasn't as broken as we
> thought it was. In particular, those who deal with lots of encoded
> strings seemed to find it handy, and miss it in 3.x. Perhaps strings
> are more like numbers than we think. We have separate types for int,
> float, Decimal, etc. But they're all numbers, and they all
> cross-operate.

No they do not. Decimal only mixes properly with ints, but not with
anything else, sometimes in surprising and havoc-creating ways:

>>> Decimal(0) == float(0)
False

I believe that and other comparisons may be fixed in 3.2, but I know there
was lots of discussion of whether float + decimal should return a float or
decimal, with good arguments both ways. To put it another way, there are
potential problems with either choice. Automatic mixed-mode arithmetic is
not always a slam-dunk, no-problem choice.

That aside, there are a couple of places where I think the comparison
breaks down. If one adds a thousand ints and then a float, there is only
the final number to convert. If one adds a thousand bytes and then a
unicode, there is the concatenation of the thousand bytes to convert. Or
should the result be the concatenation of a thousand unicode conversions?
This brings up the distributivity (or not) of conversion over summation.
In general, float(i) + float(j) = float(i+j), for i,j ints. I am not sure
the same is true if i,j are bytes with some encoding and the conversion is
unicode. Does it depend on the encoding?
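The closing question can be explored with a short experiment. The sketch below, which uses an example string chosen only for illustration, suggests the answer is yes: decoding distributes over concatenation for a single-byte encoding such as Latin-1, but not for UTF-8 when a piece boundary falls inside a multi-byte character.

```python
text = "caf\u00e9"
encoded = text.encode("utf-8")            # the final character takes two bytes
head, tail = encoded[:-1], encoded[-1:]   # split inside that character

# Converting after concatenation works.
whole = (head + tail).decode("utf-8")

# Converting each piece first does not: neither piece is valid UTF-8 alone.
try:
    pieces = head.decode("utf-8") + tail.decode("utf-8")
except UnicodeDecodeError:
    pieces = None

# With a single-byte encoding such as Latin-1, every split point is safe.
latin = text.encode("latin-1")
latin_distributes = (
    latin[:-1].decode("latin-1") + latin[-1:].decode("latin-1")
    == latin.decode("latin-1")
)
```

So whether piecewise conversion equals conversion of the whole depends both on the encoding and on where the pieces were cut.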
-- Terry Jan Reedy From ncoghlan at gmail.com Fri Jun 25 00:01:38 2010 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 25 Jun 2010 08:01:38 +1000 Subject: [Python-Dev] bytes / unicode In-Reply-To: <20100624170856.0853D3A4099@sparrow.telecommunity.com> References: <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100620234723.600ad4a8@pitrou.net> <87wrtsd44p.fsf@uwakimon.sk.tsukuba.ac.jp> <87631c4bca.fsf@uwakimon.sk.tsukuba.ac.jp> <20100621165611.GW5787@unaka.lan> <87r5jz3h8u.fsf@uwakimon.sk.tsukuba.ac.jp> <20100622055040.GE5787@unaka.lan> <87d3vj2tj2.fsf@uwakimon.sk.tsukuba.ac.jp> <0D1D2134-2CF9-4F93-BE82-912C5297D36F@fuhm.net> <87zkymns55.fsf@uwakimon.sk.tsukuba.ac.jp> <87mxukonmq.fsf@uwakimon.sk.tsukuba.ac.jp> <20100624170856.0853D3A4099@sparrow.telecommunity.com> Message-ID: On Fri, Jun 25, 2010 at 3:07 AM, P.J. Eby wrote: > (Btw, in some earlier emails, Stephen, you implied that this could be fixed > with codecs -- but it can't, because the problem isn't with the bytes > containing invalid Unicode, it's with the Unicode containing invalid bytes > -- i.e., characters that can't be encoded to the ultimate codec target.) That's what the surrogateescape error handler is for though - it will happily accept mojibake on input (putting invalid bytes into the PUA), and happily generate mojibake on output (recreating the invalid bytes from the PUA) as well. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From guido at python.org Fri Jun 25 00:01:46 2010 From: guido at python.org (Guido van Rossum) Date: Thu, 24 Jun 2010 15:01:46 -0700 Subject: [Python-Dev] thoughts on the bytes/string discussion In-Reply-To: References: <11597.1277401099@parc.com> Message-ID: On Thu, Jun 24, 2010 at 2:44 PM, Ian Bicking wrote: > I think we'll avoid a lot of the confusion that was present with Python 2 by > not making the coercions transitive.? For instance, here's something that > would work in Python 2: > > ? 
urlunsplit(('http', 'example.com', '/foo', u'bar=baz', '')) > > And you'd get out a unicode string, except that would break the first time > that query string (u'bar=baz') was not ASCII (but not until then!) Actually, that wouldn't be a problem. The problem would be this: urlunsplit(('http', 'example.com', u'/foo', 'bar=baz', '')) (I moved the "u" prefix from bar=baz to /foo.) And this would break when instead of baz there was some non-ASCII UTF-8, e.g. urlunsplit(('http', 'example.com', u'/foo', 'bar=\xe1\x88\xb4', '')) -- --Guido van Rossum (python.org/~guido) From ncoghlan at gmail.com Fri Jun 25 00:15:02 2010 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 25 Jun 2010 08:15:02 +1000 Subject: [Python-Dev] bytes / unicode In-Reply-To: References: <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100620234723.600ad4a8@pitrou.net> <87wrtsd44p.fsf@uwakimon.sk.tsukuba.ac.jp> <87631c4bca.fsf@uwakimon.sk.tsukuba.ac.jp> <20100621165611.GW5787@unaka.lan> <87r5jz3h8u.fsf@uwakimon.sk.tsukuba.ac.jp> <20100622055040.GE5787@unaka.lan> <87d3vj2tj2.fsf@uwakimon.sk.tsukuba.ac.jp> <0D1D2134-2CF9-4F93-BE82-912C5297D36F@fuhm.net> <87zkymns55.fsf@uwakimon.sk.tsukuba.ac.jp> <87mxukonmq.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On Fri, Jun 25, 2010 at 1:41 AM, Guido van Rossum wrote: > I don't think we should abuse sum for this. A simple idiom to get the > *empty* string of a particular type is x[:0] so you could write > something like this to concatenate a list or strings or bytes: > xs[:0].join(xs). Note that if xs is empty we wouldn't know what to do > anyway so this should be disallowed. That's a good trick, although there's a "[0]" missing from your join example ("type(xs[0])()" is another way to spell the same idea, but the subscripting version would likely be faster since it skips the builtin lookup). 
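A minimal sketch of the corrected idiom, with the "[0]" included. The function name concat is an assumption made for illustration; the empty-input check reflects the remark that an empty list gives no type to work from.

```python
def concat(xs):
    """Join a non-empty list of str, or of bytes, without naming either type."""
    if not xs:
        # There is no element to take the empty string from.
        raise ValueError("cannot infer the element type of an empty list")
    empty = xs[0][:0]   # '' for str input, b'' for bytes input
    return empty.join(xs)


text_result = concat(["ab", "cd", "ef"])
bytes_result = concat([b"ab", b"cd", b"ef"])
```

Because the separator is derived from the data, the same function serves str and bytes callers alike.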
Promoting that over explicit use of empty str and bytes literals is
probably step 1 in eliminating gratuitous breakage of bytes/str
polymorphism (this trick also has the benefit of working with non-builtin
character sequence types).

Use of non-empty bytes/str literals is going to be harder to handle -
actually trying to apply a polymorphic philosophy to the Python 3 URL
parsing libraries may be a good way to learn more on that front.

Cheers,
Nick.

P.S. I'm off to Sydney for PyconAU this evening, so I'm not sure how much
time I'll get to follow python-dev until next week.

--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From tjreedy at udel.edu Fri Jun 25 00:20:52 2010
From: tjreedy at udel.edu (Terry Reedy)
Date: Thu, 24 Jun 2010 18:20:52 -0400
Subject: [Python-Dev] thoughts on the bytes/string discussion
In-Reply-To: 
References: <11597.1277401099@parc.com>
Message-ID: 

On 6/24/2010 4:59 PM, Guido van Rossum wrote:
> But I wouldn't go so far as to claim that interpreting the protocols
> as text is wrong. After all we're talking exclusively about protocols
> that are designed intentionally to be directly "human readable"

I agree that the claim "':' is just a byte" is a bit shortsighted. If the
designers of the protocols had intended to use uninterpreted bytes as
protocol markers, they could and I suspect would have used unused control
codes, of which there are several. Then there would have been no need for
escape mechanisms to put things like :<> into content text.

I am very sure that the reason for specifying *ascii* byte values was to
be crystal clear as to what *character* was meant and to *exclude* use on
the internet of the main incompatible competitor encoding -- IBM's EBCDIC
-- which IBM used in all of *its* networks. Until the IBM PC came out in
the early 1980s (and IBM originally saw that as a minor sideline and
something of a toy), there was a battle over byte encodings between IBM
and everyone else.
-- Terry Jan Reedy From mal at egenix.com Fri Jun 25 00:35:05 2010 From: mal at egenix.com (M.-A. Lemburg) Date: Fri, 25 Jun 2010 00:35:05 +0200 Subject: [Python-Dev] versioned .so files for Python 3.2 In-Reply-To: <4C23D3C2.1060500@scottdial.com> References: <20100624115048.4fd152e3@heresy> <20100624170944.7e68ad21@heresy> <4C23D3C2.1060500@scottdial.com> Message-ID: <4C23DD99.9050604@egenix.com> Scott Dial wrote: > On 6/24/2010 5:09 PM, Barry Warsaw wrote: >>> What use case does this address? >> >>> If you want to make it so a system can install a package in just one >>> location to be used by multiple Python installations, then the version >>> number isn't enough. You also need to distinguish debug builds, profiling >>> builds, Unicode width (see issue8654), and probably several other >>> ./configure options. >> >> This is a good point, but more easily addressed. Let's say a distro makes >> three Python 3.2 variants available, one "normal" build, a debug build, and >> UCS2 and USC4 versions of the above. All we need to do is choose a different >> .so ABI tag (see previous follow) for each of those builds. My updated patch >> (coming soon) allows you to define that tag to configure. So e.g. > > Why is this use case not already addressed by having independent > directories? And why is there an incentive to co-mingle these > version-punned files with version-agnostic ones? I don't think this is a good idea. After a while your Python lib directories would need some serious dusting off to make them maintainable again. Disk space is cheap so setting up dedicated directories for each variant will result in a much easier to manage installation. If you want a really clever setup, use hard links between those directory (you can also use symlinks if you like). Then a change in one Python file will automatically propagate to all other variant dirs without any maintenance effort. Together with PYTHONHOME this makes a really nice virtualenv-like environment. 
-- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Jun 25 2010) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ 2010-07-19: EuroPython 2010, Birmingham, UK 23 days to go ::: Try our new mxODBC.Connect Python Database Interface for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From ncoghlan at gmail.com Fri Jun 25 00:35:07 2010 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 25 Jun 2010 08:35:07 +1000 Subject: [Python-Dev] versioned .so files for Python 3.2 In-Reply-To: <20100624115048.4fd152e3@heresy> References: <20100624115048.4fd152e3@heresy> Message-ID: On Fri, Jun 25, 2010 at 1:50 AM, Barry Warsaw wrote: > Please let me know what you think. ?I'm happy to just commit this to the py3k > branch if there are no objections . ?I don't think a new PEP is in > order, but an update to PEP 3147 might make sense. I like the idea, but I think summarising the rest of this discussion in its own (relatively short) PEP would be good (there are a few things that are tricky - exact versioning scheme, PEP 384 forward compatibility, impact on distutils, articulating the benefits for distro packaging, etc). Cheers, Nick. 
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From stephen at thorne.id.au Fri Jun 25 01:28:21 2010 From: stephen at thorne.id.au (Stephen Thorne) Date: Fri, 25 Jun 2010 09:28:21 +1000 Subject: [Python-Dev] "2 or 3" link on python.org Message-ID: <20100624232821.GB10805@thorne.id.au> Steve Holden Wrote: > Given the amount of interest this thread has generated I can't help > wondering why it isn't more prominent in python.org content. Is the > developer community completely disjoint with the web content editor > community? > > If there is such a disconnect we should think about remedying it: a > large "Python 2 or 3?" button could link to a reasoned discussion of the > pros and cons as evinced in this thread. That way people will end up > with the right version more often (and be writing Python 2 that will > more easily migrate to Python 3, if they cannot yet use 3). > > There seems to be a perception that the PSF can help fund developments, > and indeed Jesse Noller has made a small start with his sprint funding > proposal (which now has some funding behind it). I think if it is to do > so the Foundation will have to look for substantial new funding. I do > not currently understand where this funding would come from, and would > like to tap your developer creativity in helping to define how the > Foundation can effectively commit more developer time to Python. > > GSoC and GHOP are great examples, but there is plenty of room for all > sorts of initiatives that result in development opportunities. I'd like > to help. I am extremely keen for this to happen. Does anyone have ownership of this project? There was some discussion of it up-list but the discussion fizzled. 
-- Regards, Stephen Thorne Development Engineer Netbox Blue From martin at v.loewis.de Fri Jun 25 02:00:45 2010 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Fri, 25 Jun 2010 02:00:45 +0200 Subject: [Python-Dev] "2 or 3" link on python.org In-Reply-To: <20100624232821.GB10805@thorne.id.au> References: <20100624232821.GB10805@thorne.id.au> Message-ID: <4C23F1AD.9040809@v.loewis.de> Am 25.06.2010 01:28, schrieb Stephen Thorne: > Steve Holden Wrote: >> Given the amount of interest this thread has generated I can't help >> wondering why it isn't more prominent in python.org content. Is the >> developer community completely disjoint with the web content editor >> community? >> >> If there is such a disconnect we should think about remedying it: a >> large "Python 2 or 3?" button could link to a reasoned discussion of the >> pros and cons as evinced in this thread. That way people will end up >> with the right version more often (and be writing Python 2 that will >> more easily migrate to Python 3, if they cannot yet use 3). >> >> There seems to be a perception that the PSF can help fund developments, >> and indeed Jesse Noller has made a small start with his sprint funding >> proposal (which now has some funding behind it). I think if it is to do >> so the Foundation will have to look for substantial new funding. I do >> not currently understand where this funding would come from, and would >> like to tap your developer creativity in helping to define how the >> Foundation can effectively commit more developer time to Python. >> >> GSoC and GHOP are great examples, but there is plenty of room for all >> sorts of initiatives that result in development opportunities. I'd like >> to help. > > I am extremely keen for this to happen. Does anyone have ownership of this > project? There was some discussion of it up-list but the discussion fizzled. Can you please explain what "this project" is, in the context of your message? GSoC? GHOP? 
Regards, Martin From foom at fuhm.net Fri Jun 25 02:23:51 2010 From: foom at fuhm.net (James Y Knight) Date: Thu, 24 Jun 2010 20:23:51 -0400 Subject: [Python-Dev] versioned .so files for Python 3.2 In-Reply-To: <4C23D3C2.1060500@scottdial.com> References: <20100624115048.4fd152e3@heresy> <20100624170944.7e68ad21@heresy> <4C23D3C2.1060500@scottdial.com> Message-ID: On Jun 24, 2010, at 5:53 PM, Scott Dial wrote: > On 6/24/2010 5:09 PM, Barry Warsaw wrote: >>> What use case does this address? >> >> Specifically, it's the use case where we (Debian/Ubuntu) plan on >> installing >> all Python 3.x packages into /usr/lib/python3/dist-packages. As of >> PEP 3147, >> we can do that without collisions on the pyc files, but would still >> have to >> symlink for extension module .so files, because they are always >> named foo.so >> and Python 3.2's foo.so won't (modulo PEP 384) be compatible with >> Python 3.3's >> foo.so. > > If the package has .so files that aren't compatible with other version > of python, then what is the motivation for placing that in a shared > location (since it can't actually be shared) Because python looks for .so files in the same place it looks for the .py files of the same package. E.g., given a module like lxml, it contains the following files (among others): lxml/ lxml/__init__.py lxml/__init__.pyc lxml/builder.py lxml/builder.pyc lxml/etree.so And you can only put it in one place. Really, python should store the .py files in /usr/share/python/, the .so files in /usr/lib/x86_64- linux-gnu/python2.5-debug/, and the .pyc files in /var/lib/python2.5- debug. But python doesn't work like that. 
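The constraint described here, that an extension module is looked up in the same directory as its package's .py files, can be illustrated with the list of suffixes a later Python 3 interpreter accepts for extension modules (importlib.machinery.EXTENSION_SUFFIXES). This is a sketch only: the helper name candidate_extension_paths is hypothetical, and the actual suffixes vary by platform and build.

```python
import importlib.machinery
import os


def candidate_extension_paths(package_dir, module_name):
    """List the file paths that could satisfy an import of an extension
    module named `module_name` inside `package_dir` (hypothetical helper)."""
    return [
        os.path.join(package_dir, module_name + suffix)
        for suffix in importlib.machinery.EXTENSION_SUFFIXES
    ]


# Every candidate lives next to lxml/__init__.py; only the suffix differs.
paths = candidate_extension_paths("lxml", "etree")
```

All candidates share one directory, which is why distinguishing builds has to happen in the file name rather than in the location.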
James From stephen at thorne.id.au Fri Jun 25 02:31:49 2010 From: stephen at thorne.id.au (Stephen Thorne) Date: Fri, 25 Jun 2010 10:31:49 +1000 Subject: [Python-Dev] "2 or 3" link on python.org In-Reply-To: <4C23F1AD.9040809@v.loewis.de> References: <20100624232821.GB10805@thorne.id.au> <4C23F1AD.9040809@v.loewis.de> Message-ID: <20100625003149.GA16084@thorne.id.au> On 2010-06-25, "Martin v. L?wis" wrote: > Am 25.06.2010 01:28, schrieb Stephen Thorne: > > Steve Holden Wrote: > >> Given the amount of interest this thread has generated I can't help > >> wondering why it isn't more prominent in python.org content. Is the > >> developer community completely disjoint with the web content editor > >> community? > >> > >> If there is such a disconnect we should think about remedying it: a > >> large "Python 2 or 3?" button could link to a reasoned discussion of the > >> pros and cons as evinced in this thread. That way people will end up > >> with the right version more often (and be writing Python 2 that will > >> more easily migrate to Python 3, if they cannot yet use 3). > >> > >> There seems to be a perception that the PSF can help fund developments, > >> and indeed Jesse Noller has made a small start with his sprint funding > >> proposal (which now has some funding behind it). I think if it is to do > >> so the Foundation will have to look for substantial new funding. I do > >> not currently understand where this funding would come from, and would > >> like to tap your developer creativity in helping to define how the > >> Foundation can effectively commit more developer time to Python. > >> > >> GSoC and GHOP are great examples, but there is plenty of room for all > >> sorts of initiatives that result in development opportunities. I'd like > >> to help. > > > > I am extremely keen for this to happen. Does anyone have ownership of this > > project? There was some discussion of it up-list but the discussion fizzled. 
> > Can you please explain what "this project" is, in the context of your > message? GSoC? GHOP? Oh, I thought this was quite clear. I was specifically meaning the large "Python 2 or 3" button on python.org. It would help users who want to know what version of python to use if they had a clear guide as to what version to download. It doesn't help if someone goes to do greenfield development in python if a library they depend upon has yet to be ported, and they're trying to use python 3. (As an addendum add pygtk to the list of libs that python 3 users on #python are alarmed to find haven't been ported yet) -- Regards, Stephen Thorne Development Engineer Netbox Blue From healey.rich at gmail.com Fri Jun 25 02:51:18 2010 From: healey.rich at gmail.com (Rich Healey) Date: Fri, 25 Jun 2010 10:51:18 +1000 Subject: [Python-Dev] docs - Copy Message-ID: http://docs.python.org/library/copy.html Just near the bottom it reads: """Shallow copies of dictionaries can be made using?dict.copy(), and of lists by assigning a slice of the entire list, for example, copied_list?=?original_list[:].""" Surely this is a typo? To my understanding, copied_list = original_list[:] gives you a clean copy (slicing returns a new object....) Can this be updated? Or someone explain to me why it's correct? 
Cheers Example: >>> t = [1, 2, 3] >>> y = t >>> u = t[:] >>> y[1] = "rawr" >>> t [1, 'rawr', 3] >>> u [1, 2, 3] >>> From ben+python at benfinney.id.au Fri Jun 25 02:54:30 2010 From: ben+python at benfinney.id.au (Ben Finney) Date: Fri, 25 Jun 2010 10:54:30 +1000 Subject: [Python-Dev] FHS compliance of Python installation (was: versioned .so files for Python 3.2) References: <20100624115048.4fd152e3@heresy> <20100624170944.7e68ad21@heresy> <4C23D3C2.1060500@scottdial.com> Message-ID: <876318lynt.fsf_-_@benfinney.id.au> James Y Knight writes: > Really, python should store the .py files in /usr/share/python/, the > .so files in /usr/lib/x86_64- linux-gnu/python2.5-debug/, and the .pyc > files in /var/lib/python2.5- debug. But python doesn't work like that. +1 So who's going to draft the ?Filesystem Hierarchy Standard compliance? PEP? :-) -- \ ?Having sex with Rachel is like going to a concert. She yells a | `\ lot, and throws frisbees around the room; and when she wants | _o__) more, she lights a match.? ?Steven Wright | Ben Finney From steve at holdenweb.com Fri Jun 25 02:58:41 2010 From: steve at holdenweb.com (Steve Holden) Date: Thu, 24 Jun 2010 20:58:41 -0400 Subject: [Python-Dev] "2 or 3" link on python.org In-Reply-To: <20100625003149.GA16084@thorne.id.au> References: <20100624232821.GB10805@thorne.id.au> <4C23F1AD.9040809@v.loewis.de> <20100625003149.GA16084@thorne.id.au> Message-ID: <4C23FF41.5020006@holdenweb.com> Stephen Thorne wrote: > On 2010-06-25, "Martin v. L?wis" wrote: >> Am 25.06.2010 01:28, schrieb Stephen Thorne: >>> Steve Holden Wrote: >>>> Given the amount of interest this thread has generated I can't help >>>> wondering why it isn't more prominent in python.org content. Is the >>>> developer community completely disjoint with the web content editor >>>> community? >>>> >>>> If there is such a disconnect we should think about remedying it: a >>>> large "Python 2 or 3?" 
button could link to a reasoned discussion of the >>>> pros and cons as evinced in this thread. That way people will end up >>>> with the right version more often (and be writing Python 2 that will >>>> more easily migrate to Python 3, if they cannot yet use 3). >>>> >>>> There seems to be a perception that the PSF can help fund developments, >>>> and indeed Jesse Noller has made a small start with his sprint funding >>>> proposal (which now has some funding behind it). I think if it is to do >>>> so the Foundation will have to look for substantial new funding. I do >>>> not currently understand where this funding would come from, and would >>>> like to tap your developer creativity in helping to define how the >>>> Foundation can effectively commit more developer time to Python. >>>> >>>> GSoC and GHOP are great examples, but there is plenty of room for all >>>> sorts of initiatives that result in development opportunities. I'd like >>>> to help. >>> I am extremely keen for this to happen. Does anyone have ownership of this >>> project? There was some discussion of it up-list but the discussion fizzled. >> Can you please explain what "this project" is, in the context of your >> message? GSoC? GHOP? > > Oh, I thought this was quite clear. I was specifically meaning the large > "Python 2 or 3" button on python.org. It would help users who want to know > what version of python to use if they had a clear guide as to what version > to download. > > It doesn't help if someone goes to do greenfield development in python > if a library they depend upon has yet to be ported, and they're trying to > use python 3. > > (As an addendum add pygtk to the list of libs that python 3 users on #python > are alarmed to find haven't been ported yet) > This topic really needs to go to the pydotorg list, as the guys there maintain the site content. I know that Michael Foord is on both lists, so he may be a good candidate for leading the charge, so to speak. 
This topic is likely to assume increasing importance. regards Steve -- Steve Holden +1 571 484 6266 +1 800 494 3119 See Python Video! http://python.mirocommunity.org/ Holden Web LLC http://www.holdenweb.com/ UPCOMING EVENTS: http://holdenweb.eventbrite.com/ "All I want for my birthday is another birthday" - Ian Dury, 1942-2000 From steve at holdenweb.com Fri Jun 25 03:04:03 2010 From: steve at holdenweb.com (Steve Holden) Date: Thu, 24 Jun 2010 21:04:03 -0400 Subject: [Python-Dev] docs - Copy In-Reply-To: References: Message-ID: Rich Healey wrote: > http://docs.python.org/library/copy.html > > Just near the bottom it reads: > > """Shallow copies of dictionaries can be made using dict.copy(), and > of lists by assigning a slice of the entire list, for example, > copied_list = original_list[:].""" > > > Surely this is a typo? To my understanding, copied_list = > original_list[:] gives you a clean copy (slicing returns a new > object....) > Yes, but it's a shallow copy: the new object references exactly the same objects as the original list (not copies of those objects). A deep copy would need to copy any referenced lists, and so on. > Can this be updated? Or someone explain to me why it's correct? > It sounds correct to me. regards Steve > Cheers > > Example: > > >>>> t = [1, 2, 3] >>>> y = t >>>> u = t[:] >>>> y[1] = "rawr" >>>> t > [1, 'rawr', 3] >>>> u > [1, 2, 3] -- Steve Holden +1 571 484 6266 +1 800 494 3119 See Python Video! 
http://python.mirocommunity.org/ Holden Web LLC http://www.holdenweb.com/ UPCOMING EVENTS: http://holdenweb.eventbrite.com/ "All I want for my birthday is another birthday" - Ian Dury, 1942-2000 From alexander.belopolsky at gmail.com Fri Jun 25 03:05:09 2010 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Thu, 24 Jun 2010 21:05:09 -0400 Subject: [Python-Dev] docs - Copy In-Reply-To: References: Message-ID: On Thu, Jun 24, 2010 at 8:51 PM, Rich Healey wrote: > http://docs.python.org/library/copy.html > > Just near the bottom it reads: > > """Shallow copies of dictionaries can be made using dict.copy(), and > of lists by assigning a slice of the entire list, for example, > copied_list = original_list[:].""" > > > Surely this is a typo? To my understanding, copied_list = > original_list[:] gives you a clean copy (slicing returns a new > object....) > If you read the doc excerpt carefully, you will realize that it says the same thing. I agree that the language can be improved, though. There is no need to bring in assignment to explain that a[:] makes a copy of list a. Please create a documentation issue at http://bugs.python.org . If you can suggest a better formulation, it is likely to be accepted. From greg.ewing at canterbury.ac.nz Fri Jun 25 03:18:18 2010 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Fri, 25 Jun 2010 13:18:18 +1200 Subject: [Python-Dev] versioned .so files for Python 3.2 In-Reply-To: <4C23D3C2.1060500@scottdial.com> References: <20100624115048.4fd152e3@heresy> <20100624170944.7e68ad21@heresy> <4C23D3C2.1060500@scottdial.com> Message-ID: <4C2403DA.5000907@canterbury.ac.nz> Scott Dial wrote: > But the only motivation for doing this with .pyc files is that the .py > files are able to be shared, In an application made up of a mixture of pure Python and extension modules, the .py files are able to be shared too. Seems to me that a similar motivation exists here as well. Not exactly the same, but closely related. 
-- Greg From healey.rich at gmail.com Fri Jun 25 03:14:39 2010 From: healey.rich at gmail.com (Rich Healey) Date: Fri, 25 Jun 2010 11:14:39 +1000 Subject: [Python-Dev] docs - Copy In-Reply-To: References: Message-ID: On Fri, Jun 25, 2010 at 11:04 AM, Steve Holden wrote: > Rich Healey wrote: >> http://docs.python.org/library/copy.html >> >> Just near the bottom it reads: >> >> """Shallow copies of dictionaries can be made using dict.copy(), and >> of lists by assigning a slice of the entire list, for example, >> copied_list = original_list[:].""" >> >> >> Surely this is a typo? To my understanding, copied_list = >> original_list[:] gives you a clean copy (slicing returns a new >> object....) >> > Yes, but it's a shallow copy: the new object references exactly the same > objects as the original list (not copies of those objects). A deep copy > would need to copy any referenced lists, and so on. > My apologies guys, I see now. I will see if I can think of a less ambiguous way to word this and submit a bug. Thank you! From tjreedy at udel.edu Fri Jun 25 03:18:13 2010 From: tjreedy at udel.edu (Terry Reedy) Date: Thu, 24 Jun 2010 21:18:13 -0400 Subject: [Python-Dev] "2 or 3" link on python.org In-Reply-To: <20100625003149.GA16084@thorne.id.au> References: <20100624232821.GB10805@thorne.id.au> <4C23F1AD.9040809@v.loewis.de> <20100625003149.GA16084@thorne.id.au> Message-ID: On 6/24/2010 8:31 PM, Stephen Thorne wrote: > Oh, I thought this was quite clear. I was specifically meaning the large > "Python 2 or 3" button on python.org. It would help users who want to know > what version of python to use if they had a clear guide as to what version > to download. I think everyone on pydev agrees that that would be good, but I do not believe anyone has taken ownership of the issue as yet. I am not sure who currently maintains the site and whether they are aware of the proposal. 
I believe there is material on the wiki as well as the two existing pages on other sites that were discussed here. So a new page on python.org could consist of a few links. Someone just has to write it. > > It doesn't help if someone goes to do greenfield development in python > if a library they depend upon has yet to be ported, and they're trying to > use python 3. > > (As an addendum add pygtk to the list of libs that python 3 users on #python > are alarmed to find haven't been ported yet) The list, if it exists, should be on the wiki, where any registered user can edit it, rather than on the .org page. I suspect that the feedback about Python on #python is somewhat different from that on python-list. I also suspect that some of it could be used to improve python, the docs, and the site. Is that happening much? I know I regularly open tracker issues (such as 6507, 8824, and 8945) based on python-list discussions , and I know others have made wiki edits. -- Terry Jan Reedy From greg.ewing at canterbury.ac.nz Fri Jun 25 03:28:14 2010 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Fri, 25 Jun 2010 13:28:14 +1200 Subject: [Python-Dev] thoughts on the bytes/string discussion In-Reply-To: References: <11597.1277401099@parc.com> Message-ID: <4C24062E.7040105@canterbury.ac.nz> Terry Reedy wrote: > On 6/24/2010 1:38 PM, Bill Janssen wrote: > >> We have separate types for int, >> float, Decimal, etc. But they're all numbers, and they all >> cross-operate. > > No they do not. Decimal only mixes properly with ints, but not with > anything else I think there are also some important differences between numbers and strings concerning how they interact with C code. In C there are really only two choices for representing a Python number in a way that C code can directly operate on -- long or double -- and there is a set of functions for coercing a Python object into one of these that C code almost universally uses. 
So a new number type only has to implement the appropriate conversion methods to be usable by all of that C code. On the other hand, the existing C code that operates on Python strings often assumes that it has a particular internal representation. A new abstract string-access API would have to be devised, and all existing C code updated to use it. Also, this new API would not be as easy to use as the number API, because it would involve asking for the data in some specified encoding, which would require memory allocation and management. -- Greg From ncoghlan at gmail.com Fri Jun 25 05:34:33 2010 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 25 Jun 2010 13:34:33 +1000 Subject: [Python-Dev] "2 or 3" link on python.org In-Reply-To: References: <20100624232821.GB10805@thorne.id.au> <4C23F1AD.9040809@v.loewis.de> <20100625003149.GA16084@thorne.id.au> Message-ID: On Fri, Jun 25, 2010 at 11:18 AM, Terry Reedy wrote: > I believe there is material on the wiki as well as the two existing pages on > other sites that were discussed here. So a new page on python.org could > consist of a few links. Someone just has to write it. There's material on the wiki *now* (the Python2orPython3 page), but there wasn't before the recent discussion started. The whole Beginner's Guide on the wiki could actually use some TLC to bring it up to speed with the existence of Python 3.x. Cheers, Nick. 
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From orsenthil at gmail.com Fri Jun 25 06:54:07 2010 From: orsenthil at gmail.com (Senthil Kumaran) Date: Fri, 25 Jun 2010 10:24:07 +0530 Subject: [Python-Dev] docs - Copy In-Reply-To: References: Message-ID: <20100625045407.GA3191@remy> On Thu, Jun 24, 2010 at 09:05:09PM -0400, Alexander Belopolsky wrote: > On Thu, Jun 24, 2010 at 8:51 PM, Rich Healey wrote: > > http://docs.python.org/library/copy.html > > > > Just near the bottom it reads: > > > > """Shallow copies of dictionaries can be made using dict.copy(), and > > of lists by assigning a slice of the entire list, for example, > > copied_list = original_list[:].""" > > > > > > Surely this is a typo? To my understanding, copied_list = > > original_list[:] gives you a clean copy (slicing returns a new > > object....) > > > If you read the doc excerpt carefully, you will realize that it says > the same thing. I agree that the language can be improved, though. > There is no need to bring in assignment to explain that a[:] makes a > copy of list a. Please create a documentation issue at Better still, add your doc change suggestion (possible explanation) to this issue: http://bugs.python.org/issue9021 -- Senthil From stephen at xemacs.org Fri Jun 25 09:05:43 2010 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Fri, 25 Jun 2010 16:05:43 +0900 Subject: [Python-Dev] bytes / unicode In-Reply-To: References: <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100620234723.600ad4a8@pitrou.net> <87wrtsd44p.fsf@uwakimon.sk.tsukuba.ac.jp> <87631c4bca.fsf@uwakimon.sk.tsukuba.ac.jp> <20100621165611.GW5787@unaka.lan> <87r5jz3h8u.fsf@uwakimon.sk.tsukuba.ac.jp> <20100622055040.GE5787@unaka.lan> <87d3vj2tj2.fsf@uwakimon.sk.tsukuba.ac.jp> <0D1D2134-2CF9-4F93-BE82-912C5297D36F@fuhm.net> <87zkymns55.fsf@uwakimon.sk.tsukuba.ac.jp> <87mxukonmq.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <878w63oam0.fsf@uwakimon.sk.tsukuba.ac.jp> Guido van Rossum writes: > On Thu, Jun 24, 2010 at 1:12 AM, Stephen J. 
Turnbull wrote: > Understood, but both the majority of str/bytes methods and several > existing APIs (e.g. many in the os module, like os.listdir()) do it > this way. Understood. > Also, IMO a polymorphic function should *not* accept *mixed* > bytes/text input -- join('x', b'y') should be rejected. Agreed. > But join('x', 'y') -> 'x/y' and join(b'x', b'y') -> b'x/y' make > sense to me. > > So, actually, I *don't* understand what you mean by needing LBYL. Consider docutils. Some folks assert that URIs *are* bytes and should be manipulated as such. So base URIs should be bytes. But there are various ways to refer to a base URI and combine it with relative URI taken from literal text in reST. That literal text will be represented as str. So you want to use urljoin, but this usage isn't polymorphic. If you forget to do a conversion here, urljoin will raise, of course. But late conversion may not be appropriate. AIUI Philip at least wants ways to raise exceptions earlier than that on some code paths. That's LBYL, no? From stephen at xemacs.org Fri Jun 25 09:49:16 2010 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Fri, 25 Jun 2010 16:49:16 +0900 Subject: [Python-Dev] bytes / unicode In-Reply-To: <20100624170856.0853D3A4099@sparrow.telecommunity.com> References: <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100620234723.600ad4a8@pitrou.net> <87wrtsd44p.fsf@uwakimon.sk.tsukuba.ac.jp> <87631c4bca.fsf@uwakimon.sk.tsukuba.ac.jp> <20100621165611.GW5787@unaka.lan> <87r5jz3h8u.fsf@uwakimon.sk.tsukuba.ac.jp> <20100622055040.GE5787@unaka.lan> <87d3vj2tj2.fsf@uwakimon.sk.tsukuba.ac.jp> <0D1D2134-2CF9-4F93-BE82-912C5297D36F@fuhm.net> <87zkymns55.fsf@uwakimon.sk.tsukuba.ac.jp> <87mxukonmq.fsf@uwakimon.sk.tsukuba.ac.jp> <20100624170856.0853D3A4099@sparrow.telecommunity.com> Message-ID: <877hlno8lf.fsf@uwakimon.sk.tsukuba.ac.jp> P.J. Eby writes: > This doesn't have to be in the functions; it can be in the > *types*. 
Mixed-type string operations have to do type checking and > upcasting already, but if the protocol were open, you could make an > encoded-bytes type that would handle the error checking. Don't you realize that "encoded-bytes" is equivalent to use of a very limited profile of ISO 2022 coding extensions? Such as Emacs/MULE internal encoding or TRON code? It has been tried. It does not work. I understand how types can do such checking; my point is that the encoded-bytes type doesn't have enough information to do it in the cases where you think it is better than converting to str. There are *no useful operations* that can be done on two encoded-bytes with different encodings unless you know the ultimate target codec. The only sensible way to define the concatenation of ('ascii', 'English') with ('euc-jp','??????') is something like ('ascii', 'English', 'euc-jp','??????'), and *not* ('euc-jp','English??????'), because you don't know that the ultimate target codec is 'euc-jp'-compatible. Worse, you need to build in all the information about which codecs are mutually compatible into the encoded-bytes type. For example, if the ultimate target is known to be 'shift_jis', it's trivially compatible with 'ascii' and 'euc-jp' requires a conversion, but latin-9 you can't have. > (Btw, in some earlier emails, Stephen, you implied that this could be > fixed with codecs -- but it can't, because the problem isn't with the > bytes containing invalid Unicode, it's with the Unicode containing > invalid bytes -- i.e., characters that can't be encoded to the > ultimate codec target.) No, the problem is not with the Unicode, it is with the code that allows characters not encodable with the target codec. If you don't have a target codec, there are ascii-safe source codecs, such as 'latin-1' or 'ascii' with surrogateescape, that will work any time that bytes-oriented processing can work. 
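The "ascii-safe source codecs" mentioned at the end of the message above can be illustrated with a short sketch. It assumes Python 3.1 or later (where the surrogateescape error handler from PEP 383 exists); the sample bytes and the helper name encodes_strictly are made up for illustration and are not from the thread.

```python
# Sketch: two ASCII-safe ways to carry arbitrary bytes through str
# when no real source encoding is known. The sample data is invented.
raw = b"caf\xe9 /tmp/\xff\xfe name"   # neither valid ASCII nor valid UTF-8

# 1. latin-1 maps every byte 0-255 to the code point with the same number,
#    so decoding never fails and encoding back is lossless.
as_latin1 = raw.decode("latin-1")
assert as_latin1.encode("latin-1") == raw

# 2. ascii + surrogateescape turns each non-ASCII byte into a lone
#    surrogate (U+DC80..U+DCFF) and restores it on the way out.
as_escaped = raw.decode("ascii", errors="surrogateescape")
assert as_escaped.encode("ascii", errors="surrogateescape") == raw

# ASCII-level text processing (splitting on "/") works on the escaped form,
# and the untouched bytes survive the round trip.
head, sep, tail = as_escaped.partition("/")
rebuilt = (head + sep + tail).encode("ascii", errors="surrogateescape")
assert rebuilt == raw


def encodes_strictly(text, codec):
    """Hypothetical helper: True if text encodes to codec without escapes."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


# The escaped form refuses a strict encode, so smuggled bytes cannot leak
# silently into output that claims to be UTF-8.
assert not encodes_strictly(as_escaped, "utf-8")
# The latin-1 form encodes "successfully", possibly as the wrong characters.
assert encodes_strictly(as_latin1, "utf-8")
```

The difference between the two is what happens later: the latin-1 form silently becomes ordinary text, while the surrogateescape form keeps the unknown bytes marked and fails loudly under a strict encode.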
From scott+python-dev at scottdial.com Fri Jun 25 10:53:21 2010 From: scott+python-dev at scottdial.com (Scott Dial) Date: Fri, 25 Jun 2010 04:53:21 -0400 Subject: [Python-Dev] versioned .so files for Python 3.2 In-Reply-To: References: <20100624115048.4fd152e3@heresy> <20100624170944.7e68ad21@heresy> <4C23D3C2.1060500@scottdial.com> Message-ID: <4C246E81.3020302@scottdial.com> On 6/24/2010 8:23 PM, James Y Knight wrote: > On Jun 24, 2010, at 5:53 PM, Scott Dial wrote: >> If the package has .so files that aren't compatible with other versions >> of python, then what is the motivation for placing that in a shared >> location (since it can't actually be shared) > > Because python looks for .so files in the same place it looks for the > .py files of the same package. My suggestion was that a package that contains .so files should not be shared (e.g., the entire lxml package should be placed in a version-specific path). The motivation for this PEP was to simplify the installation of Python packages for distros; it was not to reduce the number of .py files on the disk. Placing .so files together does not simplify that install process in any way. You will still have to handle such packages in a special way. You must still compile the package multiple times for each relevant version of python (with special tagging that I imagine distutils can take care of) and, worse yet, you have created a trickier install than merely having multiple search paths (e.g., installing/uninstalling lxml for *one* version of python is actually more difficult in this scheme). Either the motivation for this PEP is inaccurate or I am failing to understand how this is *simpler*. In the case of pure-python, this PEP is clearly a win, but I have not seen an argument that it is a win for .so files. Moreover, the PEP itself is titled "PYC Repository Directories" (not "shared site-packages") and makes no mention of .so files at all. 
-- Scott Dial scott at scottdial.com scodial at cs.indiana.edu From scott+python-dev at scottdial.com Fri Jun 25 11:02:24 2010 From: scott+python-dev at scottdial.com (Scott Dial) Date: Fri, 25 Jun 2010 05:02:24 -0400 Subject: [Python-Dev] versioned .so files for Python 3.2 In-Reply-To: <4C2403DA.5000907@canterbury.ac.nz> References: <20100624115048.4fd152e3@heresy> <20100624170944.7e68ad21@heresy> <4C23D3C2.1060500@scottdial.com> <4C2403DA.5000907@canterbury.ac.nz> Message-ID: <4C2470A0.4000802@scottdial.com> On 6/24/2010 9:18 PM, Greg Ewing wrote: > Scott Dial wrote: > >> But the only motivation for doing this with .pyc files is that the .py >> files are able to be shared, > > In an application made up of a mixture of pure Python and > extension modules, the .py files are able to be shared too. > Seems to me that a similar motivation exists here as well. > Not exactly the same, but closely related. > If I recall Barry's motivation correctly, the PEP was intended to simplify the installation of packages for multiple versions of Python, although the PEP states that in a less direct way. In the case of pure-python packages, this is merely about avoiding .pyc collisions. But, in the case of packages with .so files, I fail to see how this is simpler (in fact, I believe it to be more complicated). So, I am not sure the PEP supports this feature being proposed (since it makes no mention of .so files), and more importantly, I am not sure it actually makes anything better for anyone (still requires multiple compilations and un/install gymnastics). 
-- Scott Dial scott at scottdial.com scodial at cs.indiana.edu From lvh at laurensvh.be Fri Jun 25 11:18:18 2010 From: lvh at laurensvh.be (Laurens Van Houtven) Date: Fri, 25 Jun 2010 11:18:18 +0200 Subject: [Python-Dev] "2 or 3" link on python.org In-Reply-To: References: <20100624232821.GB10805@thorne.id.au> <4C23F1AD.9040809@v.loewis.de> <20100625003149.GA16084@thorne.id.au> Message-ID: On Fri, Jun 25, 2010 at 5:34 AM, Nick Coghlan wrote: > On Fri, Jun 25, 2010 at 11:18 AM, Terry Reedy wrote: >> I believe there is material on the wiki as well as the two existing pages on >> other sites that were discussed here. So a new page on python.org could >> consist of a few links. Someone just has to write it. > > There's material on the wiki *now* (the Python2orPython3 page), but > there wasn't before the recent discussion started. The whole > Beginner's Guide on the wiki could actually use some TLC to bring it > up to speed with the existence of Python 3.x. > > Cheers, > Nick. > +1, this definitely sounds like a good idea to me. cheers, Laurens From stephen at xemacs.org Fri Jun 25 12:06:33 2010 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Fri, 25 Jun 2010 19:06:33 +0900 Subject: [Python-Dev] thoughts on the bytes/string discussion In-Reply-To: References: <11597.1277401099@parc.com> Message-ID: <876317o28m.fsf@uwakimon.sk.tsukuba.ac.jp> Ian Bicking writes: > We've setup a system where we think of text as natively unicode, with > encodings to put that unicode into a byte form. This is certainly > appropriate in a lot of cases. But there's a significant class of problems > where bytes are the native structure. Network protocols are what we've been > discussing, and are a notable case of that. That is, b'/' is the most > native sense of a path separator in a URL, or b':' is the most native sense > of what separates a header name from a header value in HTTP. IMHO, URIs don't have a native language in this sense. Network programmers do, however, and it is bytes. 
Text-handling programmers also do, and it is str. > So with this idea in mind it makes more sense to me that *specific pieces of > text* can be reasonably treated as both bytes and text. All the string > literals in urllib.parse.urlunsplit() for example. > > The semantics I imagine are that special('/')+b'x'==b'/x' (i.e., it does not > become special('/x')) and special('/')+'x'=='/x' (again it becomes str). This > avoids some of the cases of unicode or str infecting a system as they did in > Python 2 (where you might pass in unicode and everything works fine until > some non-ASCII is introduced). I think you need to give explicit examples where this actually helps in terms of "type contagion". I expect that it doesn't help at all, especially not for the people whose native language for URIs is bytes. These specials are still going to flip to unicode as soon as it comes in, and that will be incompatible with the bytes they'll need later. So they're still going to need to filter out unicode on input. It looks like it would be useful for programmers of polymorphic functions, though. From pje at telecommunity.com Fri Jun 25 15:07:46 2010 From: pje at telecommunity.com (P.J. Eby) Date: Fri, 25 Jun 2010 09:07:46 -0400 Subject: [Python-Dev] bytes / unicode Message-ID: <20100625130801.1E9A83A4099@sparrow.telecommunity.com> At 04:49 PM 6/25/2010 +0900, Stephen J. Turnbull wrote: >P.J. Eby writes: > > > This doesn't have to be in the functions; it can be in the > > *types*. Mixed-type string operations have to do type checking and > > upcasting already, but if the protocol were open, you could make an > > encoded-bytes type that would handle the error checking. > >Don't you realize that "encoded-bytes" is equivalent to use of a very >limited profile of ISO 2022 coding extensions? Such as Emacs/MULE >internal encoding or TRON code? It has been tried. It does not work. 
> >I understand how types can do such checking; my point is that the >encoded-bytes type doesn't have enough information to do it in the >cases where you think it is better than converting to str. There are >*no useful operations* that can be done on two encoded-bytes with >different encodings unless you know the ultimate target codec. I do know the ultimate target codec -- that's the point. IOW, I want to be able to do all my operations by passing target-encoded strings to polymorphic functions. Then, the moment something creeps in that won't go to the target codec, I'll be able to track down the hole in the legacy code that's letting bad data creep in. > The >only sensible way to define the concatenation of ('ascii', 'English') >with ('euc-jp','??????') is something like ('ascii', 'English', >'euc-jp','??????'), and *not* ('euc-jp','English??????'), because you >don't know that the ultimate target codec is 'euc-jp'-compatible. >Worse, you need to build in all the information about which codecs are >mutually compatible into the encoded-bytes type. For example, if the >ultimate target is known to be 'shift_jis', it's trivially compatible >with 'ascii' and 'euc-jp' requires a conversion, but latin-9 you can't >have. The interaction won't be with other encoded bytes, it'll be with other *unicode* strings. Ones coming from other code, and literals embedded in the stdlib. >No, the problem is not with the Unicode, it is with the code that >allows characters not encodable with the target codec. And which code that is, precisely, is the thing that may be very difficult to find, unless I can identify it at the first point it enters (and corrupts) my output data. When dealing with a large code base, this may be a nontrivial problem. 
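The "catch it at the first point it enters" goal in the message above can be sketched in a few lines. This is only an illustration under stated assumptions: TargetStr is a hypothetical name, it is a str subclass rather than the bytes-plus-encoding type debated in the thread, and it guards only concatenation, not every str method.

```python
class TargetStr(str):
    """Hypothetical str subclass that must stay encodable in one target codec."""

    def __new__(cls, value="", codec="ascii"):
        # Fail at construction time, i.e. at the point bad data arrives,
        # instead of much later when the output is finally encoded.
        str(value).encode(codec)
        self = super().__new__(cls, value)
        self.codec = codec
        return self

    def __add__(self, other):
        if not isinstance(other, str):
            return NotImplemented      # mixing with bytes stays a TypeError
        return TargetStr(str.__add__(self, other), self.codec)

    def __radd__(self, other):
        if not isinstance(other, str):
            return NotImplemented
        return TargetStr(other + str(self), self.codec)


path = TargetStr("/index", "ascii")
ok = path + ".html"             # plain literal, still encodable in ASCII
prefixed = "http://host" + path # reflected add keeps the checked type

try:
    path + "/caf\xe9"           # a non-ASCII piece creeps in here...
    caught = None
except UnicodeEncodeError as exc:
    caught = exc                # ...and is reported at this concatenation
```

Because the check runs on every concatenation, the traceback points at the call that introduced the unencodable character. Operations this sketch does not override (slicing, str.join, formatting) would return plain str and lose the check, which is one reason the thread treats a complete solution as a larger design question.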
From ianb at colorstudy.com Fri Jun 25 17:35:44 2010 From: ianb at colorstudy.com (Ian Bicking) Date: Fri, 25 Jun 2010 10:35:44 -0500 Subject: [Python-Dev] bytes / unicode In-Reply-To: <878w63oam0.fsf@uwakimon.sk.tsukuba.ac.jp> References: <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100620234723.600ad4a8@pitrou.net> <87wrtsd44p.fsf@uwakimon.sk.tsukuba.ac.jp> <87631c4bca.fsf@uwakimon.sk.tsukuba.ac.jp> <20100621165611.GW5787@unaka.lan> <87r5jz3h8u.fsf@uwakimon.sk.tsukuba.ac.jp> <20100622055040.GE5787@unaka.lan> <87d3vj2tj2.fsf@uwakimon.sk.tsukuba.ac.jp> <0D1D2134-2CF9-4F93-BE82-912C5297D36F@fuhm.net> <87zkymns55.fsf@uwakimon.sk.tsukuba.ac.jp> <87mxukonmq.fsf@uwakimon.sk.tsukuba.ac.jp> <878w63oam0.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On Fri, Jun 25, 2010 at 2:05 AM, Stephen J. Turnbull wrote: > > But join('x', 'y') -> 'x/y' and join(b'x', b'y') -> b'x/y' make > > sense to me. > > > > So, actually, I *don't* understand what you mean by needing LBYL. > > Consider docutils. Some folks assert that URIs *are* bytes and should > be manipulated as such. So base URIs should be bytes. I don't get what you are arguing against. Are you worried that if we make URL code polymorphic that this will mean some code will treat URLs as bytes, and that code will be incompatible with URLs as text? No one is arguing we remove text support from any of these functions, only that we allow bytes. -- Ian Bicking | http://blog.ianbicking.org From ianb at colorstudy.com Fri Jun 25 17:40:56 2010 From: ianb at colorstudy.com (Ian Bicking) Date: Fri, 25 Jun 2010 10:40:56 -0500 Subject: [Python-Dev] thoughts on the bytes/string discussion In-Reply-To: <876317o28m.fsf@uwakimon.sk.tsukuba.ac.jp> References: <11597.1277401099@parc.com> <876317o28m.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On Fri, Jun 25, 2010 at 5:06 AM, Stephen J. 
Turnbull wrote: > > So with this idea in mind it makes more sense to me that *specific pieces of > > text* can be reasonably treated as both bytes and text. All the string > > literals in urllib.parse.urlunsplit() for example. > > > > The semantics I imagine are that special('/')+b'x'==b'/x' (i.e., it does not > > become special('/x')) and special('/')+'x'=='/x' (again it becomes str). This > > avoids some of the cases of unicode or str infecting a system as they did in > > Python 2 (where you might pass in unicode and everything works fine until > > some non-ASCII is introduced). > > I think you need to give explicit examples where this actually helps > in terms of "type contagion". I expect that it doesn't help at all, > especially not for the people whose native language for URIs is bytes. > These specials are still going to flip to unicode as soon as it comes > in, and that will be incompatible with the bytes they'll need later. > So they're still going to need to filter out unicode on input. > > It looks like it would be useful for programmers of polymorphic > functions, though. > I'm proposing these specials would be used in polymorphic functions, like the functions in urllib.parse. I would not personally use them in my own code (unless of course I was writing my own polymorphic functions). This also makes it less important that the objects be a full stand-in for text, as their use should be isolated to specific functions, they aren't objects that should be passed around much. So you can easily identify and quickly detect if you use unsupported operations on those text-like objects. (This is all a very different use case from bytes+encoding, I think) -- Ian Bicking | http://blog.ianbicking.org 
From status at bugs.python.org Fri Jun 25 18:08:26 2010 From: status at bugs.python.org (Python tracker) Date: Fri, 25 Jun 2010 18:08:26 +0200 (CEST) Subject: [Python-Dev] Summary of Python tracker Issues Message-ID: <20100625160826.0C34078182@psf.upfronthosting.co.za> ACTIVITY SUMMARY (2010-06-18 - 2010-06-25) Python tracker at http://bugs.python.org/ To view or respond to any of the issues listed below, click on the issue number. Do NOT respond to this message. 2795 open (+38) / 18104 closed (+14) / 20899 total (+52) Open issues with patches: 1130 Average duration of open issues: 712 days. Median duration of open issues: 503 days. Open Issues Breakdown open 2765 (+38) languishing 13 ( +0) pending 16 ( +0) Issues Created Or Reopened (55) _______________________________ os.path.normcase documentation/behaviour unclear on Mac OS X 2010-06-25 http://bugs.python.org/issue3485 reopened ezio.melotti patch uuid.uuid4() generates non-unique values on OSX 2010-06-21 http://bugs.python.org/issue8621 reopened skrah patch test_support.run_unittest cmdline options and arguments 2010-06-20 http://bugs.python.org/issue9028 reopened techtonik errors='replace' works in IDLE, fails at Windows command line. 
2010-06-18 http://bugs.python.org/issue9029 created jvanpraag ctypes variable limits 2010-06-18 http://bugs.python.org/issue9030 created kumma distutils uses invalid "-Wstrict-prototypes" flag when compili 2010-06-18 http://bugs.python.org/issue9031 created matteo.vescovi xmlrpc: Transport.request() should also catch socket.error(EPI 2010-06-18 http://bugs.python.org/issue9032 created haypo patch cmd module tab misbehavior 2010-06-19 http://bugs.python.org/issue9033 created slcott datetime module should use int32_t for date/time components 2010-06-20 http://bugs.python.org/issue9034 created belopolsky os.path.ismount on windows doesn't support windows mount point 2010-06-20 http://bugs.python.org/issue9035 created Oren_Held Simplify Py_CHARMASK 2010-06-20 http://bugs.python.org/issue9036 created skrah patch, needs review Add explanation as to how to raise a custom exception in the e 2010-06-20 http://bugs.python.org/issue9037 created jonathan.underwood patch test_distutils failure 2010-06-20 http://bugs.python.org/issue9038 created pitrou IDLE and module Doc 2010-06-20 http://bugs.python.org/issue9039 created Yoda_Uchiha using MIMEApplication to attach a PDF raises a TypeError excep 2010-06-21 http://bugs.python.org/issue9040 created Enrico.Sartori raised exception is misleading 2010-06-21 http://bugs.python.org/issue9041 created kumma Gettext cache and classes 2010-06-21 http://bugs.python.org/issue9042 created v_peter patch 2to3 doesn't handle byte comparison well 2010-06-21 CLOSED http://bugs.python.org/issue9043 created vdupras [optparse] confusion over an option and its value without any 2010-06-21 http://bugs.python.org/issue9044 created kszawala 2.7rc1: 64-bit OSX installer is not built with 64-bit tkinter 2010-06-21 http://bugs.python.org/issue9045 created srid Python 2.7rc2 doesn't build on Mac OS X 10.4 2010-06-21 http://bugs.python.org/issue9046 created lemburg Python 2.7rc2 includes -isysroot twice on each gcc command lin 2010-06-21 
http://bugs.python.org/issue9047 created lemburg no OS X buildbots in the stable list 2010-06-21 http://bugs.python.org/issue9048 created janssen buildbot UnboundLocalError in nested function 2010-06-21 CLOSED http://bugs.python.org/issue9049 created Andreas Hofmeister UnboundLocalError in nested function 2010-06-21 CLOSED http://bugs.python.org/issue9050 created Andreas Hofmeister Improve pickle format for aware datetime instances 2010-06-21 http://bugs.python.org/issue9051 created belopolsky 2.7rc2 fails test_urllib_localnet tests on OS X 2010-06-21 CLOSED http://bugs.python.org/issue9052 created janssen distutils compiles extensions so that Python.h cannot be found 2010-06-21 http://bugs.python.org/issue9053 created exarkun pyexpat configured with "--with-system-expat" is incompatible 2010-06-21 http://bugs.python.org/issue9054 created dmalcolm patch test_issue_8959_b fails when run from a service 2010-06-21 http://bugs.python.org/issue9055 created pmoore buildbot Adding additional level of bookmarks and section numbers in py 2010-06-22 http://bugs.python.org/issue9056 created pengyu.ut Distutils2 needs a home page 2010-06-22 http://bugs.python.org/issue9057 created dabrahams PyUnicodeDecodeError_Create asserts that various arguments are 2010-06-22 CLOSED http://bugs.python.org/issue9058 created dmalcolm patch Backwards compatibility 2010-06-23 CLOSED http://bugs.python.org/issue9059 created Raven Python/dup2.c doesn't compile on (at least) newlib 2010-06-23 http://bugs.python.org/issue9060 created torne patch cgi.escape Can Lead To XSS Vulnerabilities 2010-06-23 http://bugs.python.org/issue9061 created Craig.Younkins urllib.urlopen crashes when launched from a thread 2010-06-23 CLOSED http://bugs.python.org/issue9062 created olivier-berten TZ examples in datetime.rst are incorrect 2010-06-23 http://bugs.python.org/issue9063 created belopolsky pdb enhancement up/down traversals 2010-06-23 http://bugs.python.org/issue9064 created vandyswa patch tarfile: default 
root:root ownership is incorrect. 2010-06-23 http://bugs.python.org/issue9065 created jsbronder patch Standard type codes for array.array, same as struct 2010-06-24 http://bugs.python.org/issue9066 created cmcqueen1975 Use macros from pyctype.h 2010-06-24 http://bugs.python.org/issue9067 created skrah "from . import *" 2010-06-24 CLOSED http://bugs.python.org/issue9068 created bhy test_float failure on Solaris 2010-06-24 http://bugs.python.org/issue9069 created mark.dickinson Timestamps are rounded differently in py3k and trunk 2010-06-24 CLOSED http://bugs.python.org/issue9070 created belopolsky TarFile doesn't support member files with a leading "./" 2010-06-24 CLOSED http://bugs.python.org/issue9071 created free.ekanayaka Unloading modules - memleaks? 2010-06-24 CLOSED http://bugs.python.org/issue9072 created yappie Tkinter module missing from install on OS X 10.6.4 2010-06-24 http://bugs.python.org/issue9073 created RolandJ [includes patch] subprocess module closes standard file descri 2010-06-24 http://bugs.python.org/issue9074 created kr patch ssl module sets "debug" flag on SSL struct 2010-06-24 CLOSED http://bugs.python.org/issue9075 created pitrou Add C-API documentation for PyUnicode_AsDecodedObject/Unicode 2010-06-24 http://bugs.python.org/issue9076 created haypo patch argparse does not handle arguments correctly after -- 2010-06-24 CLOSED http://bugs.python.org/issue9077 created iElectric Fix C API documentation of unicode 2010-06-24 http://bugs.python.org/issue9078 created haypo patch Make gettimeofday available in time module 2010-06-25 http://bugs.python.org/issue9079 created belopolsky patch, needs review Provide list prepend method (even though it's not efficient) 2010-06-25 CLOSED http://bugs.python.org/issue9080 created andybuckley Issues Now Closed (43) ______________________ MultiMethods with type annotations in 3000 1035 days http://bugs.python.org/issue1004 benjamin.peterson patch subprocess.list2cmdline doesn't do pipe symbols 975 days 
http://bugs.python.org/issue1300 chops at demiurgestudios.com easy Popen.poll always returns None 816 days http://bugs.python.org/issue2475 tjreedy Python interpreter uses Unicode surrogate pairs only before th 713 days http://bugs.python.org/issue3297 haypo patch py3k shouldn't use -fno-strict-aliasing anymore 712 days http://bugs.python.org/issue3326 benjamin.peterson patch create a numbits() method for int and long types 699 days http://bugs.python.org/issue3439 mark.dickinson patch, needs review os.path.realpath() get the wrong result 554 days http://bugs.python.org/issue4654 r.david.murray Compiling python 2.5.2 under Wine on linux. 527 days http://bugs.python.org/issue4883 BreamoreBoy 3.0 sqlite doc: most examples refer to pysqlite2, use 2.x synt 516 days http://bugs.python.org/issue5005 tjreedy Implement a way to change the python process name 448 days http://bugs.python.org/issue5672 piro patch setup build with Platform SDK, finding vcvarsall.bat 407 days http://bugs.python.org/issue5969 georg.brandl Failing test_signal.py on Redhat 4.1.2-44 407 days http://bugs.python.org/issue5972 georg.brandl datetime.strptime doesn't support %z format ? 
1 days http://bugs.python.org/issue6641 merwok patch webbrowser.get("firefox") does not work on Mac with installed 243 days http://bugs.python.org/issue7192 ronaldoussoren patch Backport 3.x nonlocal keyword to 2.7 117 days http://bugs.python.org/issue8018 mark.dickinson test_heapq interfering with test_import on py3k 65 days http://bugs.python.org/issue8440 tim.golden enumerate() test cases do not cover optional start argument 46 days http://bugs.python.org/issue8636 merwok patch _ssl.c uses PyWeakref_GetObject but doesn't incref result 45 days http://bugs.python.org/issue8682 pitrou patch Remove "w" format of PyParse_ParseTuple() 27 days http://bugs.python.org/issue8850 haypo patch msvc9compiler.py: find_vcvarsall() doesn't work with VS2008 on 23 days http://bugs.python.org/issue8854 lemburg patch, 64bit execfile does not work with UNC paths 21 days http://bugs.python.org/issue8869 tim.golden getargs.c: release the buffer on error 18 days http://bugs.python.org/issue8926 haypo patch PyArg_Parse*(): "z" should not accept bytes 16 days http://bugs.python.org/issue8949 haypo patch PyArg_Parse*(): factorize code of 's' and 'z' formats, and 'u' 16 days http://bugs.python.org/issue8951 haypo patch WINFUNCTYPE wrapped ctypes callbacks not functioning correctly 12 days http://bugs.python.org/issue8959 theller Year range in timetuple 5 days http://bugs.python.org/issue9005 belopolsky patch os.path.normcase(None) does not raise an error on linux and sh 8 days http://bugs.python.org/issue9018 ezio.melotti patch, easy 2to3 doesn't handle byte comparison well 0 days http://bugs.python.org/issue9043 merwok UnboundLocalError in nested function 1 days http://bugs.python.org/issue9049 mark.dickinson UnboundLocalError in nested function 0 days http://bugs.python.org/issue9050 merwok 2.7rc2 fails test_urllib_localnet tests on OS X 0 days http://bugs.python.org/issue9052 belopolsky PyUnicodeDecodeError_Create asserts that various arguments are 0 days http://bugs.python.org/issue9058 
benjamin.peterson patch Backwards compatibility 0 days http://bugs.python.org/issue9059 ezio.melotti urllib.urlopen crashes when launched from a thread 0 days http://bugs.python.org/issue9062 orsenthil "from . import *" 0 days http://bugs.python.org/issue9068 brett.cannon Timestamps are rounded differently in py3k and trunk 0 days http://bugs.python.org/issue9070 belopolsky TarFile doesn't support member files with a leading "./" 1 days http://bugs.python.org/issue9071 free.ekanayaka Unloading modules - memleaks? 0 days http://bugs.python.org/issue9072 yappie ssl module sets "debug" flag on SSL struct 0 days http://bugs.python.org/issue9075 pitrou argparse does not handle arguments correctly after -- 1 days http://bugs.python.org/issue9077 iElectric Provide list prepend method (even though it's not efficient) 0 days http://bugs.python.org/issue9080 andybuckley webbrowser.open_new() opens in an existing browser window 2463 days http://bugs.python.org/issue812089 r.david.murray mbcs encoding ignores errors 2394 days http://bugs.python.org/issue850997 haypo patch Top Issues Most Discussed (10) ______________________________ 19 Non-uniformity in randrange for large arguments. 
7 days open http://bugs.python.org/issue9025 19 2.7: eval hangs on AIX 8 days open http://bugs.python.org/issue9020 17 Python 2.7rc2 doesn't build on Mac OS X 10.4 4 days open http://bugs.python.org/issue9046 14 test_float failure on Solaris 1 days open http://bugs.python.org/issue9069 13 msvc9compiler.py: find_vcvarsall() doesn't work with VS2008 on 23 days closed http://bugs.python.org/issue8854 10 no OS X buildbots in the stable list 4 days open http://bugs.python.org/issue9048 10 os.path.normcase(None) does not raise an error on linux and sho 8 days closed http://bugs.python.org/issue9018 8 Provide list prepend method (even though it's not efficient) 0 days closed http://bugs.python.org/issue9080 8 Improve quality of Python/dtoa.c 9 days open http://bugs.python.org/issue9009 8 Add Mercurial support to patchcheck 10 days open http://bugs.python.org/issue8999 From barry at python.org Fri Jun 25 18:18:47 2010 From: barry at python.org (Barry Warsaw) Date: Fri, 25 Jun 2010 12:18:47 -0400 Subject: [Python-Dev] Schedule for Python 2.6.6 Message-ID: <20100625121847.60331d9e@heresy> Benjamin is still planning to release Python 2.7 final on 2010-07-03, so it's time for me to work out the release schedule for Python 2.6.6 - likely the last maintenance release for Python 2.6. Because summer schedules are crazy, and I want to leave two weeks between 2.6.6 rc1 and 2.6.6 final, my current schedule looks like: * Python 2.6.6 rc 1 on Monday 2010-08-02 * Python 2.6.6 final on Monday 2010-08-16 This should give folks plenty of time to relax after 2.7 final, and still be able to get those last minute fixes into the 2.6 tree. Let me know if these dates don't work for you. -Barry -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 836 bytes Desc: not available URL: From stephen at xemacs.org Fri Jun 25 18:18:33 2010 From: stephen at xemacs.org (Stephen J. 
Turnbull)
Date: Sat, 26 Jun 2010 01:18:33 +0900
Subject: [Python-Dev] bytes / unicode
In-Reply-To: <20100625130801.1E9A83A4099@sparrow.telecommunity.com>
References: <20100625130801.1E9A83A4099@sparrow.telecommunity.com>
Message-ID: <8739wbnl0m.fsf@uwakimon.sk.tsukuba.ac.jp>

P.J. Eby writes:

 > I do know the ultimate target codec -- that's the point.
 >
 > IOW, I want to be able to do all my operations by passing
 > target-encoded strings to polymorphic functions.

IOW, you *do* have text and (ignoring efficiency issues) could just as
well use str.  But That Other Code is unreliable, so you need a marker
for your own internal strings indicating that they are validated, while
other strings are not.

This has nothing to do with bytes vs. str as string types, then; it's
all about validated (which your architecture indicates by using the
bytes type) vs. unvalidated (which your architecture indicates with
unicode).  Eg, in the case of your USPS vs. ecommerce example, you can't
even handle all bytes, so not all possible bytes objects are valid.  And
other applications might not be able to handle all Japanese, but only a
subset, so having valid EUC-JP wouldn't be enough, you'd have to check
repertoire -- might as well use str.

It seems to me what is wanted here is something like Perl's taint
mechanism, for *both* kinds of strings.  Am I missing something?

But with your architecture, it seems to me that you actually don't want
polymorphic functions in the stdlib.  You want the stdlib functions to
be bytes-oriented if and only if they are reliable.  (This is what I was
saying to Guido elsewhere.)

BTW, this was a little unclear to me:

 > [Collisions will] be with other *unicode* strings.  Ones coming
 > from other code, and literals embedded in the stdlib.

What about the literals in the stdlib?  Are you saying they contain
invalid code points for your known output encoding?  Or are you saying
that with non-polymorphic unicode stdlib, you get lots of false
positives when combining with your validated bytes?

From barry at python.org  Fri Jun 25 18:28:29 2010
From: barry at python.org (Barry Warsaw)
Date: Fri, 25 Jun 2010 12:28:29 -0400
Subject: [Python-Dev] Schedule for Python 2.6.6
In-Reply-To: <20100625121847.60331d9e@heresy>
References: <20100625121847.60331d9e@heresy>
Message-ID: <20100625122829.30b20e67@heresy>

On Jun 25, 2010, at 12:18 PM, Barry Warsaw wrote:

>* Python 2.6.6 rc 1 on Monday 2010-08-02
>* Python 2.6.6 final on Monday 2010-08-16

I've also updated the Google calendar of Python releases:

http://www.google.com/calendar/ical/b6v58qvojllt0i6ql654r1vh00%40group.calendar.google.com/public/basic.ics

-Barry

From stephen at xemacs.org  Fri Jun 25 18:30:08 2010
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Sat, 26 Jun 2010 01:30:08 +0900
Subject: [Python-Dev] thoughts on the bytes/string discussion
In-Reply-To:
References: <11597.1277401099@parc.com>
    <876317o28m.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: <871vbvnkhb.fsf@uwakimon.sk.tsukuba.ac.jp>

Ian Bicking writes:

 > I'm proposing these specials would be used in polymorphic functions,
 > like the functions in urllib.parse.  I would not personally use them
 > in my own code (unless of course I was writing my own polymorphic
 > functions).
 >
 > This also makes it less important that the objects be a full stand-in
 > for text, as their use should be isolated to specific functions, they
 > aren't objects that should be passed around much.  So you can easily
 > identify and quickly detect if you use unsupported operations on
 > those text-like objects.

OK.  That sounds reasonable to me, but I don't see any need for a
builtin type for it.  Inclusion in the stdlib is not quite a no-brainer,
but given Guido's endorsement of polymorphism, I can't bring myself to
go lower than +0.9.

 > (This is all a very different use case from bytes+encoding, I think)

Very much so.

From stephen at xemacs.org  Fri Jun 25 18:37:58 2010
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Sat, 26 Jun 2010 01:37:58 +0900
Subject: [Python-Dev] bytes / unicode
In-Reply-To:
References: <20100620184120.10EFB3A4099@sparrow.telecommunity.com>
    <20100620234723.600ad4a8@pitrou.net>
    <87wrtsd44p.fsf@uwakimon.sk.tsukuba.ac.jp>
    <87631c4bca.fsf@uwakimon.sk.tsukuba.ac.jp>
    <20100621165611.GW5787@unaka.lan>
    <87r5jz3h8u.fsf@uwakimon.sk.tsukuba.ac.jp>
    <20100622055040.GE5787@unaka.lan>
    <87d3vj2tj2.fsf@uwakimon.sk.tsukuba.ac.jp>
    <0D1D2134-2CF9-4F93-BE82-912C5297D36F@fuhm.net>
    <87zkymns55.fsf@uwakimon.sk.tsukuba.ac.jp>
    <87mxukonmq.fsf@uwakimon.sk.tsukuba.ac.jp>
    <878w63oam0.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: <87zkyjm5jt.fsf@uwakimon.sk.tsukuba.ac.jp>

Ian Bicking writes:

 > I don't get what you are arguing against.  Are you worried that if
 > we make URL code polymorphic that this will mean some code will
 > treat URLs as bytes, and that code will be incompatible with URLs
 > as text?  No one is arguing we remove text support from any of
 > these functions, only that we allow bytes.

No, I understand what Guido means by "polymorphic".  I'm arguing that as
I understand one of Philip Eby's use cases, "bytes" is a misspelling of
"validated" and "unicode" is a misspelling of "unvalidated".  In case of
some kind of bug, polymorphic stdlib functions would allow propagation
of unvalidated/unicode within the validated zone, aka "errors passing
silently".

Now that I understand that that use case doesn't actually care about
bytes vs. unicode *string* semantics at all, the argument becomes moot,
I guess.
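
[Editorial illustration, not part of the original thread.]  The kind of
polymorphism discussed above, where join('x', 'y') gives 'x/y' and
join(b'x', b'y') gives b'x/y', can be sketched roughly as follows.  The
function name join and the choice to raise TypeError on mixed input are
assumptions made for this sketch; it is not the urllib.parse or os.path
implementation, and it is not the "special" string type proposed in the
thread.

```python
def join(base, name):
    """Join two segments with '/', returning the same type as the inputs.

    Hypothetical helper for illustration only; not a stdlib API.
    """
    if isinstance(base, str) and isinstance(name, str):
        sep = '/'            # text in, text out
    elif isinstance(base, bytes) and isinstance(name, bytes):
        sep = b'/'           # bytes in, bytes out
    else:
        # Assumption of this sketch: mixed input is rejected up front
        # rather than silently coerced, so a stray str cannot leak
        # into a bytes-only code path (or the other way round).
        raise TypeError(
            "cannot mix str and bytes: %r, %r" % (base, name))
    return base + sep + name
```

Rejecting mixed input is one possible answer to the "errors passing
silently" concern raised above; a function that coerced instead would
behave differently.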
From ianb at colorstudy.com  Fri Jun 25 18:54:05 2010
From: ianb at colorstudy.com (Ian Bicking)
Date: Fri, 25 Jun 2010 11:54:05 -0500
Subject: [Python-Dev] thoughts on the bytes/string discussion
In-Reply-To: <871vbvnkhb.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <11597.1277401099@parc.com>
    <876317o28m.fsf@uwakimon.sk.tsukuba.ac.jp>
    <871vbvnkhb.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID:

On Fri, Jun 25, 2010 at 11:30 AM, Stephen J. Turnbull wrote:

> Ian Bicking writes:
>
>  > I'm proposing these specials would be used in polymorphic
>  > functions, like the functions in urllib.parse.  I would not
>  > personally use them in my own code (unless of course I was writing
>  > my own polymorphic functions).
>  >
>  > This also makes it less important that the objects be a full
>  > stand-in for text, as their use should be isolated to specific
>  > functions, they aren't objects that should be passed around much.
>  > So you can easily identify and quickly detect if you use
>  > unsupported operations on those text-like objects.
>
> OK.  That sounds reasonable to me, but I don't see any need for
> a builtin type for it.  Inclusion in the stdlib is not quite a
> no-brainer, but given Guido's endorsement of polymorphism, I can't
> bring myself to go lower than +0.9 .

Agreed on a builtin; I think it would be fine to put something in the
strings module, and then in these examples code that used '/' would
instead use strings.ascii('/') (not so sure of what the name should be
though).

--
Ian Bicking  |  http://blog.ianbicking.org

From tjreedy at udel.edu  Fri Jun 25 18:57:50 2010
From: tjreedy at udel.edu (Terry Reedy)
Date: Fri, 25 Jun 2010 12:57:50 -0400
Subject: [Python-Dev] docs - Copy
In-Reply-To:
References:
Message-ID:

On 6/24/2010 8:51 PM, Rich Healey wrote:

> http://docs.python.org/library/copy.html

Discussion of the wording of current docs should go to python-list.
Py-dev is for development of future Python.

--
Terry Jan Reedy

From fuzzyman at voidspace.org.uk  Fri Jun 25 20:35:35 2010
From: fuzzyman at voidspace.org.uk (Michael Foord)
Date: Fri, 25 Jun 2010 19:35:35 +0100
Subject: [Python-Dev] Creating APIs that work as both decorators and context
    managers
Message-ID: <4C24F6F7.4040200@voidspace.org.uk>

Hello all,

I've put a recipe up on the Python cookbook for creating APIs that work
as both decorators and context managers and wonder if it would be
considered a useful addition to the functools module.

http://code.activestate.com/recipes/577273-decorator-and-context-manager-from-a-single-api/

I wrote this after writing almost identical code the second time for
"patch" in the mock module.  (The patch decorator can be used as a
decorator or as a context manager and I was writing a new variant.)
Both py.test and django have similar code in places, so it is not an
uncommon pattern.

It is only 40 odd lines (ignore the ugly Python 2 & 3 compatibility
hack), so I'm fine with it living on the cookbook - but it is at least
slightly fiddly to write and has the added niceness of providing the
optional exception handling semantics of __exit__ for decorators as
well.

Example use (really hope email doesn't swallow the whitespace - my
apologies in advance if it does):

    from context import Context

    class mycontext(Context):

        def __init__(self, *args):
            """Normal initialiser"""

        def start(self):
            """
            Called on entering the with block or starting the decorated
            function.

            If used in a with statement whatever this method returns will
            be the context manager.
            """

        def finish(self, *exc):
            """
            Called on exit. Arguments and return value of this method have
            the same meaning as the __exit__ method of a normal context
            manager.
            """

    @mycontext('some', 'args')
    def function():
        pass

    with mycontext('some', 'args') as something:
        pass

I'm not entirely happy with the name of the class or the start and
finish methods, so open to suggestions there.
start and finish *could* be __enter__ and __exit__ - but that would make the class you implement *look* like a normal context manager and I thought it was better to distinguish them. Perhaps before and after? All the best, Michael Foord -- http://www.ironpythoninaction.com/ http://www.voidspace.org.uk/blog READ CAREFULLY. By accepting and reading this email you agree, on behalf of your employer, to release me from all obligations and waivers arising from any and all NON-NEGOTIATED agreements, licenses, terms-of-service, shrinkwrap, clickwrap, browsewrap, confidentiality, non-disclosure, non-compete and acceptable use policies ("BOGUS AGREEMENTS") that I have entered into with your employer, its partners, licensors, agents and assigns, in perpetuity, without prejudice to my ongoing rights and privileges. You further represent that you have the authority to release me from any BOGUS AGREEMENTS on behalf of your employer. -------------- next part -------------- An HTML attachment was scrubbed... URL: From brett at python.org Fri Jun 25 20:58:42 2010 From: brett at python.org (Brett Cannon) Date: Fri, 25 Jun 2010 11:58:42 -0700 Subject: [Python-Dev] versioned .so files for Python 3.2 In-Reply-To: <4C246E81.3020302@scottdial.com> References: <20100624115048.4fd152e3@heresy> <20100624170944.7e68ad21@heresy> <4C23D3C2.1060500@scottdial.com> <4C246E81.3020302@scottdial.com> Message-ID: On Fri, Jun 25, 2010 at 01:53, Scott Dial wrote: > On 6/24/2010 8:23 PM, James Y Knight wrote: >> On Jun 24, 2010, at 5:53 PM, Scott Dial wrote: >>> If the package has .so files that aren't compatible with other version >>> of python, then what is the motivation for placing that in a shared >>> location (since it can't actually be shared) >> >> Because python looks for .so files in the same place it looks for the >> .py files of the same package. 
> > My suggestion was that a package that contains .so files should not be > shared (e.g., the entire lxml package should be placed in a > version-specific path). The motivation for this PEP was to simplify the > installation python packages for distros; it was not to reduce the > number of .py files on the disk. I assume you are talking about PEP 3147. You're right that the PEP was for pyc files and that's it. No one is talking about rewriting the PEP. The motivation Barry is using is an overarching one of distros wanting to use a single directory install location for all installed Python versions. That led to PEP 3147 and now this work. > > Placing .so files together does not simplify that install process in any > way. You will still have to handle such packages in a special way. You > must still compile the package multiple times for each relevant version > of python (with special tagging that I imagine distutils can take care > of) and, worse yet, you have created a more trick install than merely > having multiple search paths (e.g., installing/uninstalling lxml for > *one* version of python is actually more difficult in this scheme). This is meant to be used by distros in a programmatic fashion, so my response is "so what?" Their package management system is going to maintain the directory, not a person. You and I are not going to be using this for anything. This is purely meant for Linux OS vendors (maybe OS X) to manage their installs through their package software. I honestly do not expect human beings to be mucking around with these installs (and I suspect Barry doesn't either). > > Either the motivation for this PEP is inaccurate or I am failing to > understand how this is *simpler*. In the case of pure-python, this PEP > is clearly a win, but I have not seen an argument that it is a win for > .so files. Moreover, the PEP itself is titled "PYC Repository > Directories" (not "shared site-packages") and makes no mention of .so > files at all. 
You're conflating what is being discussed with PEP 3147. That PEP is independent of this. PEP 3147 just empowered this work to be relevant. -Brett > > -- > Scott Dial > scott at scottdial.com > scodial at cs.indiana.edu > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/brett%40python.org > From scott+python-dev at scottdial.com Fri Jun 25 21:42:38 2010 From: scott+python-dev at scottdial.com (Scott Dial) Date: Fri, 25 Jun 2010 15:42:38 -0400 Subject: [Python-Dev] versioned .so files for Python 3.2 In-Reply-To: References: <20100624115048.4fd152e3@heresy> <20100624170944.7e68ad21@heresy> <4C23D3C2.1060500@scottdial.com> <4C246E81.3020302@scottdial.com> Message-ID: <4C2506AE.3060002@scottdial.com> On 6/25/2010 2:58 PM, Brett Cannon wrote: > I assume you are talking about PEP 3147. You're right that the PEP was > for pyc files and that's it. No one is talking about rewriting the > PEP. Yes, I am making reference to PEP 3147. I make reference to that PEP because this change is of the same order of magnitude as the .pyc change, and we asked for a PEP for that, and if this .so stuff is an extension of that thought process, then it should either be reflected by that PEP or a new PEP. > The motivation Barry is using is an overarching one of distros > wanting to use a single directory install location for all installed > Python versions. That led to PEP 3147 and now this work. It's unclear to me that that is the correct motivation, which you are divining. As I understand it, the motivation to be to *simplify installation* for distros, which may or may not be achieved by using a single directory. In the case of pure-python packages, a single directory is an obvious win. In the case of mixed-python packages, I remain to be persuaded there is any improvement achieved. 
> This is meant to be used by distros in a programmatic fashion, so my > response is "so what?" Their package management system is going to > maintain the directory, not a person. Then why is the status quo unacceptable? I have already explained how this will still require programmatic steps of at least the same difficulty as the status quo requires, so why should we change anything? I am skeptical that this is a simple programmatic problem either: take any random package on PyPI and tell me whether or not it has a .so file that must be compiled. If such a .so file exists, then this package must be special-cased and compiled for each version of Python on the system (or will ever be on the system?). Such a package yields an arbitrary number of .so files due to the number of version of Python on the machine, and I can't imagine how it is simpler to manage all of those files than it is to manage multiple site-packages. > You're conflating what is being discussed with PEP 3147. That PEP is > independent of this. PEP 3147 just empowered this work to be relevant. Without a PEP (be it PEP 3147 or some other), what is the justification for doing this? The burden should be on "you" to explain why this is a good idea and not just a clever idea. -- Scott Dial scott at scottdial.com scodial at cs.indiana.edu From dickinsm at gmail.com Fri Jun 25 22:02:36 2010 From: dickinsm at gmail.com (Mark Dickinson) Date: Fri, 25 Jun 2010 21:02:36 +0100 Subject: [Python-Dev] Creating APIs that work as both decorators and context managers In-Reply-To: <4C24F6F7.4040200@voidspace.org.uk> References: <4C24F6F7.4040200@voidspace.org.uk> Message-ID: On Fri, Jun 25, 2010 at 7:35 PM, Michael Foord wrote: > Hello all, > > I've put a recipe up on the Python cookbook for creating APIs that work as > both decorators and context managers and wonder if it would be considered a > useful addition to the functools module. 
> http://code.activestate.com/recipes/577273-decorator-and-context-manager-from-a-single-api/ It's an interesting idea. I wanted almost exactly this a little while ago, while doing some experiments to add an IEEE 754-compliance wrapper to the decimal module (for my own use). It seems quite natural that one might want to wrap both functions and blocks in the same way. [1] In case anyone wants the details, this was for a 'delay-exceptions' operation, that allows you to execute some number of arithmetic operations, keeping track of the floating-point signals that they produce but not raising the corresponding exceptions until the end of the block; obviously this idea applies equally well to functions as to blocks. It's one of the recommended exception handling modes from section 8 of IEEE 754-2008. Mark From foom at fuhm.net Fri Jun 25 22:12:34 2010 From: foom at fuhm.net (James Y Knight) Date: Fri, 25 Jun 2010 16:12:34 -0400 Subject: [Python-Dev] versioned .so files for Python 3.2 In-Reply-To: <4C246E81.3020302@scottdial.com> References: <20100624115048.4fd152e3@heresy> <20100624170944.7e68ad21@heresy> <4C23D3C2.1060500@scottdial.com> <4C246E81.3020302@scottdial.com> Message-ID: On Jun 25, 2010, at 4:53 AM, Scott Dial wrote: > On 6/24/2010 8:23 PM, James Y Knight wrote: >> On Jun 24, 2010, at 5:53 PM, Scott Dial wrote: >>> If the package has .so files that aren't compatible with other >>> version >>> of python, then what is the motivation for placing that in a shared >>> location (since it can't actually be shared) >> >> Because python looks for .so files in the same place it looks for the >> .py files of the same package. > > My suggestion was that a package that contains .so files should not be > shared (e.g., the entire lxml package should be placed in a > version-specific path). The motivation for this PEP was to simplify > the > installation python packages for distros; it was not to reduce the > number of .py files on the disk. 
>
> Placing .so files together does not simplify that install process in any
> way. You will still have to handle such packages in a special way.

This is a good point, but I think it still falls short of a solution. For a package like lxml, indeed you are correct. Since Debian needs to build it once per version, it could just put the entire package (.py files and .so files) into a different per-python-version directory.

However, then you have to also consider python packages made up of multiple distro packages -- like twisted or zope. Twisted includes some C extensions in the core package. But then there are other twisted modules (installed under a "twisted.foo" name) which do not include C extensions. If the base twisted package is installed under a version-specific directory, then all of the submodule packages need to also be installed under the same version-specific directory (and thus built for all versions).

In the past, it has proven somewhat tricky to coordinate which directory the modules for package "foo" should be installed in, because you need to know whether *any* of the related packages includes a native ".so" file, not just the current package.

The converse situation, where a base package did *not* get installed into a version-specific directory because it includes no native code, but a submodule *does* include a ".so" file, is even trickier.

James

From martin at v.loewis.de Fri Jun 25 22:27:31 2010
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Fri, 25 Jun 2010 22:27:31 +0200
Subject: [Python-Dev] "2 or 3" link on python.org
In-Reply-To: <20100625003149.GA16084@thorne.id.au>
References: <20100624232821.GB10805@thorne.id.au> <4C23F1AD.9040809@v.loewis.de> <20100625003149.GA16084@thorne.id.au>
Message-ID: <4C251133.2090505@v.loewis.de>

>>> I am extremely keen for this to happen. Does anyone have ownership of this
>>> project? There was some discussion of it up-list but the discussion fizzled.
>>
>> Can you please explain what "this project" is, in the context of your
>> message? GSoC? GHOP?
>
> Oh, I thought this was quite clear. I was specifically meaning the large
> "Python 2 or 3" button on python.org. It would help users who want to know
> what version of python to use if they had a clear guide as to what version
> to download.

Ah, ok. No, nobody has taken ownership of that project, and likely, nobody actually will - unless you volunteer.

Regards,
Martin

From martin at v.loewis.de Fri Jun 25 22:30:34 2010
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Fri, 25 Jun 2010 22:30:34 +0200
Subject: [Python-Dev] docs - Copy
In-Reply-To:
References:
Message-ID: <4C2511EA.3000200@v.loewis.de>

On 25.06.2010 18:57, Terry Reedy wrote:
> On 6/24/2010 8:51 PM, Rich Healey wrote:
>> http://docs.python.org/library/copy.html
>
> Discussion of the wording of current docs should go to python-list.
> Py-dev is for development of future Python.

No no no. Mis-worded documentation is a bug, just like any other bug, and deserves being discussed here.

Furthermore, a sufficient condition for mis-wording is if a user read it in full, and still managed to misunderstand (as happened here).

Regards,
Martin

From martin at v.loewis.de Fri Jun 25 22:31:28 2010
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Fri, 25 Jun 2010 22:31:28 +0200
Subject: [Python-Dev] docs - Copy
In-Reply-To:
References:
Message-ID: <4C251220.3050106@v.loewis.de>

> My apologies guys, I see now.
>
> I will see if I can think of a less ambiguous way to word this and submit a bug.

Please don't take out or rephrase the word "shallow", though. This has a long CS tradition of meaning exactly what is meant here.
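
For readers who have not met the term, here is a minimal sketch of what "shallow" means in practice. It uses only the standard copy module; the variable names and values are made up for illustration and are not taken from the documentation under discussion.

```python
import copy

inner = [1, 2]
original = {"values": inner}

# A shallow copy builds a new outer container but reuses the inner objects.
shallow = copy.copy(original)

# A deep copy recursively copies the inner objects as well.
deep = copy.deepcopy(original)

# Mutating the shared inner list is visible through the shallow copy only.
inner.append(3)

print(shallow["values"])  # [1, 2, 3]
print(deep["values"])     # [1, 2]
```

The shallow copy is a distinct dict, so rebinding one of its keys would not affect the original; only mutation of the shared inner objects shows through.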
Regards,
Martin

From martin at v.loewis.de Fri Jun 25 22:33:38 2010
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Fri, 25 Jun 2010 22:33:38 +0200
Subject: [Python-Dev] Schedule for Python 2.6.6
In-Reply-To: <20100625121847.60331d9e@heresy>
References: <20100625121847.60331d9e@heresy>
Message-ID: <4C2512A2.1040404@v.loewis.de>

On 25.06.2010 18:18, Barry Warsaw wrote:
> Benjamin is still planning to release Python 2.7 final on 2010-07-03, so it's
> time for me to work out the release schedule for Python 2.6.6 - likely the
> last maintenance release for Python 2.6.
>
> Because summer schedules are crazy, and I want to leave two weeks between
> 2.6.6 rc1 and 2.6.6 final, my current schedule looks like:
>
> * Python 2.6.6 rc 1 on Monday 2010-08-02
> * Python 2.6.6 final on Monday 2010-08-16

That would barely work for me. If the schedule slips in any way, we'll have to move the release into end-of-September (but the days as proposed are fine).

Regards,
Martin

From glyph at twistedmatrix.com Fri Jun 25 22:43:55 2010
From: glyph at twistedmatrix.com (Glyph Lefkowitz)
Date: Fri, 25 Jun 2010 16:43:55 -0400
Subject: [Python-Dev] thoughts on the bytes/string discussion
In-Reply-To:
References: <11597.1277401099@parc.com>
Message-ID: <96ADD4CE-3A24-45A7-B219-2940195DC3D0@twistedmatrix.com>

On Jun 24, 2010, at 4:59 PM, Guido van Rossum wrote:

> Regarding the proposal of a String ABC, I hope this isn't going to
> become a backdoor to reintroduce the Python 2 madness of allowing
> equivalency between text and bytes for *some* strings of bytes and not
> others.

For my part, what I want out of a string ABC is simply the ability to do application-specific optimizations.

There are many applications where all input and output is text, but _must_ be UTF-8. Even GTK uses UTF-8 as its native text representation, so "output" could just be display.
Right now, in Python 3, the only way to be "correct" about this is to copy every byte of input into 4 bytes of output, then copy each code point *back* into a single byte of output. If all your application does is rewrite the occasional XML attribute, for example, this cost can be significant, if not overwhelming.

I'd like a version of 'decode' which would give me a type that was, in every respect, unicode, and responded to all protocols exactly as other unicode objects (or "str objects", if you prefer py3 nomenclature ;-)) do, but wouldn't actually copy any of that memory unless it really needed to (for example, to pass to a C API that expected native wide characters), and that would hold on to the original bytes so that it could produce them on demand if encoded to the same encoding again. So, as others in this thread have mentioned, the 'ABC' really implies some stuff about C APIs as well.

I'm not sure about the exact performance impact of such a class, which is why I'd like the ability to implement it *outside* of the stdlib and see how it works on a project, and return with a proposal along with some data. There are also different ways to implement this, and other optimizations (like ropes) which might be better.

You can almost do this today, but the lack of things like the hypothetical "__rcontains__" does make it impossible to be totally transparent about it.

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From barry at python.org Fri Jun 25 22:59:44 2010
From: barry at python.org (Barry Warsaw)
Date: Fri, 25 Jun 2010 16:59:44 -0400
Subject: [Python-Dev] versioned .so files for Python 3.2
In-Reply-To: <4C23D006.6080800@netwok.org>
References: <20100624115048.4fd152e3@heresy> <4C23A901.7060100@netwok.org> <20100624172302.024687ef@heresy> <4C23D006.6080800@netwok.org>
Message-ID: <20100625165944.2cac0053@heresy>

On Jun 24, 2010, at 11:37 PM, Éric Araujo wrote:

>Your plan seems good. Adding keyword arguments should not create
>compatibility issues, and I suspect the impact on the code of build_ext
>may be actually quite small. I'll try to review your patch even though I
>don't know C or compiler oddities, but Tarek will have the best insight
>and the final word.

The C and configure/Makefile bits are pretty trivial. It basically extends the list of shared library extensions searched for on *nix machines, and allows that to be set on the ./configure command.

As for the impact on distutils, with updated tests, it's less than 100 lines of diff. Again there it essentially allows us to pass the extension that build_ext writes to from the setup.py, via the Extension class. Because distutils' default is to use the $SO variable from the system-installed Makefile, with the change to dynload_shlib.c, configure.in, and Makefile.pre.in, we would get distutils writing the versioned .so files for free.

I'll note further that if you *don't* specify this to ./configure, nothing much changes[1]. The distutils part of the patch is only there to disable or override the default, and *that's* only there to support proposed semantics that foo.so be used for PEP 384-compliant ABI extension modules.

IOW, until PEP 384 is actually implemented, the distutils part of the patch is unnecessary. However, if the other changes are accepted, then I will add a discussion of this issue to PEP 384, and we can figure out the best semantics and implementation at that point. I honestly don't know if I am going to get to work on PEP 384 before 3.2 beta.

>In case the time machine's not available, your suggestion about getting
>the filename from the Extension instance instead of passing in a string
>can most certainly land in distutils2.

Cool.

-Barry

[1] Well, I now realize you'll get an extra useless stat call, but I will fix that.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 836 bytes
Desc: not available
URL:

From guido at python.org Fri Jun 25 23:02:05 2010
From: guido at python.org (Guido van Rossum)
Date: Fri, 25 Jun 2010 14:02:05 -0700
Subject: [Python-Dev] thoughts on the bytes/string discussion
In-Reply-To: <96ADD4CE-3A24-45A7-B219-2940195DC3D0@twistedmatrix.com>
References: <11597.1277401099@parc.com> <96ADD4CE-3A24-45A7-B219-2940195DC3D0@twistedmatrix.com>
Message-ID:

On Fri, Jun 25, 2010 at 1:43 PM, Glyph Lefkowitz wrote:
>
> On Jun 24, 2010, at 4:59 PM, Guido van Rossum wrote:
>
> Regarding the proposal of a String ABC, I hope this isn't going to
> become a backdoor to reintroduce the Python 2 madness of allowing
> equivalency between text and bytes for *some* strings of bytes and not
> others.
>
> For my part, what I want out of a string ABC is simply the ability to do
> application-specific optimizations.
> There are many applications where all input and output is text, but _must_
> be UTF-8. Even GTK uses UTF-8 as its native text representation, so
> "output" could just be display.
> Right now, in Python 3, the only way to be "correct" about this is to copy
> every byte of input into 4 bytes of output, then copy each code point *back*
> into a single byte of output. If all your application does is rewrite the
> occasional XML attribute, for example, this cost can be significant, if not
> overwhelming.
> I'd like a version of 'decode' which would give me a type that was, in every
> respect, unicode, and responded to all protocols exactly as other
> unicode objects (or "str objects", if you prefer py3 nomenclature ;-)) do,
> but wouldn't actually copy any of that memory unless it really needed to
> (for example, to pass to a C API that expected native wide characters), and
> that would hold on to the original bytes so that it could produce them on
> demand if encoded to the same encoding again. So, as others in this thread
> have mentioned, the 'ABC' really implies some stuff about C APIs as well.
> I'm not sure about the exact performance impact of such a class, which is
> why I'd like the ability to implement it *outside* of the stdlib and see how
> it works on a project, and return with a proposal along with some data.
> There are also different ways to implement this, and other optimizations
> (like ropes) which might be better.
> You can almost do this today, but the lack of things like the hypothetical
> "__rcontains__" does make it impossible to be totally transparent about it.

But you'd still have to validate it, right? You wouldn't want to go on using what you thought was wrapped UTF-8 if it wasn't actually valid UTF-8 (or you'd be worse off than in Python 2). So you're really just worried about space consumption. I'd like to see a lot of hard memory profiling data before I got overly worried about that.

--
--Guido van Rossum (python.org/~guido)

From barry at python.org Fri Jun 25 23:03:22 2010
From: barry at python.org (Barry Warsaw)
Date: Fri, 25 Jun 2010 17:03:22 -0400
Subject: [Python-Dev] Schedule for Python 2.6.6
In-Reply-To: <4C2512A2.1040404@v.loewis.de>
References: <20100625121847.60331d9e@heresy> <4C2512A2.1040404@v.loewis.de>
Message-ID: <20100625170322.5ece724f@heresy>

On Jun 25, 2010, at 10:33 PM, Martin v. Löwis wrote:

>On 25.06.2010 18:18, Barry Warsaw wrote:
>> Benjamin is still planning to release Python 2.7 final on 2010-07-03, so it's
>> time for me to work out the release schedule for Python 2.6.6 - likely the
>> last maintenance release for Python 2.6.
>>
>> Because summer schedules are crazy, and I want to leave two weeks between
>> 2.6.6 rc1 and 2.6.6 final, my current schedule looks like:
>>
>> * Python 2.6.6 rc 1 on Monday 2010-08-02
>> * Python 2.6.6 final on Monday 2010-08-16
>
>That would barely work for me. If schedule slips in any way, we'll have
>to move the release into end-of-September (but the days as proposed are
>fine).

Would that be bad or good (slipping into September)? I'd like to get a release out as soon after 2.7 final as possible, but it's an entirely self-imposed deadline. There's no reason why we can't push the whole 2.6.6 thing later if that works better for you. OTOH, I can't go much earlier so if September is bad for you, then we'll stick to the above dates.

-Barry

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 836 bytes
Desc: not available
URL:

From fuzzyman at voidspace.org.uk Fri Jun 25 23:06:00 2010
From: fuzzyman at voidspace.org.uk (Michael Foord)
Date: Fri, 25 Jun 2010 22:06:00 +0100
Subject: [Python-Dev] "2 or 3" link on python.org
In-Reply-To: <4C251133.2090505@v.loewis.de>
References: <20100624232821.GB10805@thorne.id.au> <4C23F1AD.9040809@v.loewis.de> <20100625003149.GA16084@thorne.id.au> <4C251133.2090505@v.loewis.de>
Message-ID: <4C251A38.3090205@voidspace.org.uk>

On 25/06/2010 21:27, "Martin v. Löwis" wrote:
>>>> I am extremely keen for this to happen. Does anyone have ownership of this
>>>> project? There was some discussion of it up-list but the discussion fizzled.
>>>>
>>> Can you please explain what "this project" is, in the context of your
>>> message? GSoC? GHOP?
>>>
>> Oh, I thought this was quite clear. I was specifically meaning the large
>> "Python 2 or 3" button on python.org. It would help users who want to know
>> what version of python to use if they had a clear guide as to what version
>> to download.
>>
> Ah, ok. No, nobody has taken ownership of that project, and likely,
> nobody actually will - unless you volunteer.
>

What page were we suggesting linking to? IIRC someone made a good start in the wiki.

I'll move the discussion to pydotorg-www (still need the question about answering) and see if we can get it done.

All the best,

Michael

> Regards,
> Martin
> _______________________________________________
> Python-Dev mailing list
> Python-Dev at python.org
> http://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe: http://mail.python.org/mailman/options/python-dev/fuzzyman%40voidspace.org.uk
>

--
http://www.ironpythoninaction.com/
http://www.voidspace.org.uk/blog

READ CAREFULLY. By accepting and reading this email you agree, on behalf of your employer, to release me from all obligations and waivers arising from any and all NON-NEGOTIATED agreements, licenses, terms-of-service, shrinkwrap, clickwrap, browsewrap, confidentiality, non-disclosure, non-compete and acceptable use policies ("BOGUS AGREEMENTS") that I have entered into with your employer, its partners, licensors, agents and assigns, in perpetuity, without prejudice to my ongoing rights and privileges. You further represent that you have the authority to release me from any BOGUS AGREEMENTS on behalf of your employer.

From martin at v.loewis.de Fri Jun 25 23:14:53 2010
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Fri, 25 Jun 2010 23:14:53 +0200
Subject: [Python-Dev] "2 or 3" link on python.org
In-Reply-To: <4C251A38.3090205@voidspace.org.uk>
References: <20100624232821.GB10805@thorne.id.au> <4C23F1AD.9040809@v.loewis.de> <20100625003149.GA16084@thorne.id.au> <4C251133.2090505@v.loewis.de> <4C251A38.3090205@voidspace.org.uk>
Message-ID: <4C251C4D.50806@v.loewis.de>

> What page were we suggesting linking to?

I don't think anybody proposed anything specific. Steve Holden suggested it should go to "reasoned discussion of the pros and cons as evinced in this thread". Stephen Thorne didn't propose anything specific but to have a large button.

> I'll move the discussion to pydotorg-www

I'll predict that this is its death :-(

Regards,
Martin

From martin at v.loewis.de Fri Jun 25 23:16:23 2010
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Fri, 25 Jun 2010 23:16:23 +0200
Subject: [Python-Dev] Schedule for Python 2.6.6
In-Reply-To: <20100625170322.5ece724f@heresy>
References: <20100625121847.60331d9e@heresy> <4C2512A2.1040404@v.loewis.de> <20100625170322.5ece724f@heresy>
Message-ID: <4C251CA7.3070902@v.loewis.de>

> Would that be bad or good (slipping into September)? I'd like to get a
> release out as soon after 2.7 final as possible, but it's an entirely
> self-imposed deadline. There's no reason why we can't push the whole 2.6.6
> thing later if that works better for you. OTOH, I can't go much earlier so if
> September is bad for you, then we'll stick to the above dates.

I think we can strive for your original proposal. If it slips, we let it slip by a month or two.
Regards,
Martin

From fuzzyman at voidspace.org.uk Fri Jun 25 23:31:45 2010
From: fuzzyman at voidspace.org.uk (Michael Foord)
Date: Fri, 25 Jun 2010 22:31:45 +0100
Subject: [Python-Dev] Creating APIs that work as both decorators and context managers
In-Reply-To: <4C24F6F7.4040200@voidspace.org.uk>
References: <4C24F6F7.4040200@voidspace.org.uk>
Message-ID: <4C252041.7000808@voidspace.org.uk>

On 25/06/2010 19:35, Michael Foord wrote:
> Hello all,
>
> I've put a recipe up on the Python cookbook for creating APIs that
> work as both decorators and context managers and wonder if it would be
> considered a useful addition to the functools module.
>
> http://code.activestate.com/recipes/577273-decorator-and-context-manager-from-a-single-api/

Actually contextlib would be a much more sensible home for it.

Michael

>
> I wrote this after writing almost identical code the second time for
> "patch" in the mock module. (The patch decorator can be used as a
> decorator or as a context manager and I was writing a new variant.)
> Both py.test and django have similar code in places, so it is not an
> uncommon pattern.
>
> It is only 40 odd lines (ignore the ugly Python 2 & 3 compatibility
> hack), so I'm fine with it living on the cookbook - but it is at least
> slightly fiddly to write and has the added niceness of providing the
> optional exception handling semantics of __exit__ for decorators as well.
>
> Example use (really hope email doesn't swallow the whitespace - my
> apologies in advance if it does):
>
> from context import Context
>
> class mycontext(Context):
>     def __init__(self, *args):
>         """Normal initialiser"""
>
>     def start(self):
>         """
>         Called on entering the with block or starting the decorated
>         function.
>
>         If used in a with statement whatever this method returns will
>         be the context manager.
>         """
>
>     def finish(self, *exc):
>         """
>         Called on exit. Arguments and return value of this method have
>         the same meaning as the __exit__ method of a normal context
>         manager.
>         """
>
> @mycontext('some', 'args')
> def function():
>     pass
>
> with mycontext('some', 'args') as something:
>     pass
>
> I'm not entirely happy with the name of the class or the start and
> finish methods, so open to suggestions there. start and finish *could*
> be __enter__ and __exit__ - but that would make the class you
> implement *look* like a normal context manager and I thought it was
> better to distinguish them. Perhaps before and after?
>
> All the best,
>
> Michael Foord
>

--
http://www.ironpythoninaction.com/
http://www.voidspace.org.uk/blog

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From glyph at twistedmatrix.com Fri Jun 25 23:40:34 2010
From: glyph at twistedmatrix.com (Glyph Lefkowitz)
Date: Fri, 25 Jun 2010 17:40:34 -0400
Subject: [Python-Dev] thoughts on the bytes/string discussion
In-Reply-To:
References: <11597.1277401099@parc.com> <96ADD4CE-3A24-45A7-B219-2940195DC3D0@twistedmatrix.com>
Message-ID: <51EFE211-DBCA-497E-9BC5-CC0D2256173E@twistedmatrix.com>

On Jun 25, 2010, at 5:02 PM, Guido van Rossum wrote:

> But you'd still have to validate it, right? You wouldn't want to go on
> using what you thought was wrapped UTF-8 if it wasn't actually valid
> UTF-8 (or you'd be worse off than in Python 2). So you're really just
> worried about space consumption.

So, yes, I am mainly worried about memory consumption, but don't underestimate the pure CPU cost of doing all the copying. It's quite a bit faster to simply scan through a string than to scan and while you're scanning, keep faulting out the L2 cache while you're accessing some other area of memory to store the copy.

Plus, if I am decoding with the surrogateescape error handler (or its effective equivalent), then no, I don't need to validate it in advance; interpretation can be done lazily as necessary.
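
A minimal sketch of the round-trip property that makes deferred validation possible, assuming Python 3.1 or later where the 'surrogateescape' handler from PEP 383 is available. The sample bytes are invented, and this still performs an ordinary decode: it does not implement the copy-free string type being asked for.

```python
# Valid UTF-8 for "café", a space, then one stray byte 0xFF that is not valid UTF-8.
raw = b"caf\xc3\xa9 \xff tail"

# With 'surrogateescape' the decode never raises: each undecodable byte 0xNN
# is mapped to the lone surrogate U+DCNN instead.
text = raw.decode("utf-8", "surrogateescape")
print(ascii(text))

# Encoding with the same codec and handler restores the original bytes exactly.
assert text.encode("utf-8", "surrogateescape") == raw

# The invalid byte only surfaces later, when something insists on strict UTF-8.
try:
    text.encode("utf-8")
except UnicodeEncodeError as exc:
    print("invalid data detected late:", exc.reason)
```

The bad byte travels through string operations untouched and is reported only at the strict boundary, which is the garbage-in, garbage-out trade being described.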
I realize that this is just GIGO, but I wouldn't be doing this on data that didn't have an explicitly declared or required encoding in the first place.

> I'd like to see a lot of hard memory profiling data before I got overly worried about that.

I know of several Python applications that are already constrained by memory. I don't have a lot of hard memory profiling data, but in an environment where you're spawning as many processes as you can in order to consume _all_ the physically available RAM for string processing, it stands to reason that properly decoding everything and thereby exploding everything out into 4x as much data (or 2x, if you're lucky) would result in a commensurate decrease in throughput.

I don't think I could even reasonably _propose_ that such a project stop treating textual data as bytes, because there's no optimization strategy once that sort of architecture has been put into place. If your function says "this takes unicode", then you just have to bite the bullet and decode it, or rewrite it again to have a different requirement.

So, right now, I don't know where I'd get the data with which to make the argument in the first place :). If there were some abstraction in the core's treatment of strings, though, and I could decode things and note their encoding without immediately paying this cost (or alternately, paying the cost to see if it's so bad, but with the option of managing it or optimizing it separately). This is why I'm asking for a way for me to implement my own string type, and not for a change of behavior or an optimization in the stdlib itself: I could be wrong, I don't have a particularly high level of certainty in my performance estimates, but I think that my concerns are realistic enough that I don't want to embark on a big re-architecture of text-handling only to have it become a performance nightmare that needs to be reverted.

As Robert Collins pointed out, they already have performance issues related to encoding in Bazaar.
I know they've done a lot of profiling in that area, so I hope eventually someone from that project will show up with some data to demonstrate it :). And I've definitely heard many, many anecdotes (some of them in this thread) about people distorting their data structures in various ways to avoid paying decoding cost in the ASCII/latin1 case, whether it's *actually* a significant performance issue or not. I would very much like to tell those people "Just call .decode(), and if it turns out to actually be a performance issue, you can always deal with it later, with a custom string type." I'm confident that in *most* cases, it would not be.

Anyway, this may be a serious issue, but I increasingly feel like I'm veering into python-ideas territory, so perhaps I'll just have to burn this bridge when I come to it. Hopefully after the moratorium.

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From barry at python.org Fri Jun 25 23:53:06 2010
From: barry at python.org (Barry Warsaw)
Date: Fri, 25 Jun 2010 17:53:06 -0400
Subject: [Python-Dev] Schedule for Python 2.6.6
In-Reply-To: <4C251CA7.3070902@v.loewis.de>
References: <20100625121847.60331d9e@heresy> <4C2512A2.1040404@v.loewis.de> <20100625170322.5ece724f@heresy> <4C251CA7.3070902@v.loewis.de>
Message-ID: <20100625175306.6fa9e1eb@heresy>

On Jun 25, 2010, at 11:16 PM, Martin v. Löwis wrote:

>> Would that be bad or good (slipping into September)? I'd like to get a
>> release out as soon after 2.7 final as possible, but it's an entirely
>> self-imposed deadline. There's no reason why we can't push the whole 2.6.6
>> thing later if that works better for you. OTOH, I can't go much earlier so if
>> September is bad for you, then we'll stick to the above dates.
>
>I think we can strive for your original proposal. If it slips, we let it
>slip by a month or two.

Cool, thanks Martin.

-Barry

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 836 bytes
Desc: not available
URL:

From fuzzyman at voidspace.org.uk Fri Jun 25 23:53:29 2010
From: fuzzyman at voidspace.org.uk (Michael Foord)
Date: Fri, 25 Jun 2010 22:53:29 +0100
Subject: [Python-Dev] "2 or 3" link on python.org
In-Reply-To: <4C251C4D.50806@v.loewis.de>
References: <20100624232821.GB10805@thorne.id.au> <4C23F1AD.9040809@v.loewis.de> <20100625003149.GA16084@thorne.id.au> <4C251133.2090505@v.loewis.de> <4C251A38.3090205@voidspace.org.uk> <4C251C4D.50806@v.loewis.de>
Message-ID: <4C252559.5060800@voidspace.org.uk>

On 25/06/2010 22:14, "Martin v. Löwis" wrote:
>> What page were we suggesting linking to?
>>
> I don't think anybody proposed anything specific. Steve Holden
> suggested it should go to "reasoned discussion of the
> pros and cons as evinced in this thread". Stephen Thorne didn't
> propose anything specific but to have a large button.
>
>

Earlier in this discussion *someone* did start a page on the wiki, with this use case in mind... You forced me to actually look it up:

http://wiki.python.org/moin/Python2orPython3

>> I'll move the discussion to pydotorg-www
>>
> I'll predict that this is its death :-(
>

Heh.

Michael

> Regards,
> Martin
>

--
http://www.ironpythoninaction.com/
http://www.voidspace.org.uk/blog

From tseaver at palladion.com Sat Jun 26 00:12:10 2010
From: tseaver at palladion.com (Tres Seaver)
Date: Fri, 25 Jun 2010 18:12:10 -0400
Subject: [Python-Dev] thoughts on the bytes/string discussion
In-Reply-To:
References: <11597.1277401099@parc.com> <96ADD4CE-3A24-45A7-B219-2940195DC3D0@twistedmatrix.com>
Message-ID:

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Guido van Rossum wrote:

> But you'd still have to validate it, right? You wouldn't want to go on
> using what you thought was wrapped UTF-8 if it wasn't actually valid
> UTF-8 (or you'd be worse off than in Python 2). So you're really just
> worried about space consumption. I'd like to see a lot of hard memory
> profiling data before I got overly worried about that.

I do know for a fact that using a UCS2-compiled Python instead of the system's UCS4-compiled Python leads to a measurable, noticeable drop in memory consumption of long-running webserver processes using Unicode (Zope, repoze.bfg, etc). We routinely build Python from source for deployments precisely because of this fact (in part -- the absurd choices made by packagers to exclude crucial bits on various platforms is the other part).

Tres.
- --
===================================================================
Tres Seaver +1 540-429-0999 tseaver at palladion.com
Palladion Software "Excellence by Design" http://palladion.com
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAkwlKbQACgkQ+gerLs4ltQ4TfACdHgLXPHeGw42GidhQdzABkQaR
+nEAoLE1sd+g1aJuxSn6swvvX0g52EU4
=MSwx
-----END PGP SIGNATURE-----

From ianb at colorstudy.com Sat Jun 26 00:26:20 2010
From: ianb at colorstudy.com (Ian Bicking)
Date: Fri, 25 Jun 2010 17:26:20 -0500
Subject: [Python-Dev] thoughts on the bytes/string discussion
In-Reply-To:
References: <11597.1277401099@parc.com> <96ADD4CE-3A24-45A7-B219-2940195DC3D0@twistedmatrix.com>
Message-ID:

On Fri, Jun 25, 2010 at 4:02 PM, Guido van Rossum wrote:

> On Fri, Jun 25, 2010 at 1:43 PM, Glyph Lefkowitz
> > I'd like a version of 'decode' which would give me a type that was, in every
> > respect, unicode, and responded to all protocols exactly as other
> > unicode objects (or "str objects", if you prefer py3 nomenclature ;-)) do,
> > but wouldn't actually copy any of that memory unless it really needed to
> > (for example, to pass to a C API that expected native wide characters), and
> > that would hold on to the original bytes so that it could produce them on
> > demand if encoded to the same encoding again. So, as others in this thread
> > have mentioned, the 'ABC' really implies some stuff about C APIs as well.
> > I'm not sure about the exact performance impact of such a class, which is
> > why I'd like the ability to implement it *outside* of the stdlib and see how
> > it works on a project, and return with a proposal along with some data.
> > There are also different ways to implement this, and other optimizations
> > (like ropes) which might be better.
> > You can almost do this today, but the lack of things like the > hypothetical > > "__rcontains__" does make it impossible to be totally transparent about > it. > > But you'd still have to validate it, right? You wouldn't want to go on > using what you thought was wrapped UTF-8 if it wasn't actually valid > UTF-8 (or you'd be worse off than in Python 2). So you're really just > worried about space consumption. I'd like to see a lot of hard memory > profiling data before I got overly worried about that. > It wasn't my profiling, but I seem to recall that Fredrik Lundh specifically benchmarked ElementTree with all-unicode and sometimes-ascii-bytes, and found that using Python 2 strs in some cases provided notable advantages. I know Stefan copied ElementTree in this regard in lxml, maybe he also did a benchmark or knows of one? -- Ian Bicking | http://blog.ianbicking.org From pje at telecommunity.com Sat Jun 26 00:27:04 2010 From: pje at telecommunity.com (P.J. Eby) Date: Fri, 25 Jun 2010 18:27:04 -0400 Subject: [Python-Dev] bytes / unicode In-Reply-To: <8739wbnl0m.fsf@uwakimon.sk.tsukuba.ac.jp> References: <20100625130801.1E9A83A4099@sparrow.telecommunity.com> <8739wbnl0m.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <20100625222722.594D23A4099@sparrow.telecommunity.com> At 01:18 AM 6/26/2010 +0900, Stephen J. Turnbull wrote: >It seems to me what is wanted here is something like Perl's taint >mechanism, for *both* kinds of strings. Am I missing something? You could certainly view it as a kind of tainting. The part where the type would be bytes-based is indeed somewhat incidental to the actual use case -- it's just that if you already have the bytes, and all you want to do is tag them (e.g. the WSGI headers case), the extra encoding step seems pointless. A string coercion protocol (that would be used by .join(), .format(), __contains__, __mod__, etc.) 
would allow you to do whatever sort of tainted-string or tainted-bytes implementations one might wish to have. I suppose that tainting user inputs (as in Perl) would be just as useful of an application of the same coercion protocol. Actually, I have another use case for this custom string coercion, which is that I once wrote a string subclass whose purpose was to track the original file and line number of some text. Even though only my code was manipulating the strings, it was very difficult to get the tainting to work correctly without extreme care as to the string methods used. (For example, I had to use string addition rather than %-formatting.) >But with your architecture, it seems to me that you actually don't >want polymorphic functions in the stdlib. You want the stdlib >functions to be bytes-oriented if and only if they are reliable. (This >is what I was saying to Guido elsewhere.) I'm not sure I follow you. What I want is for the stdlib to create stringlike objects of a type determined by the types of the inputs -- where the logic for deciding this coercion can be controlled by the input objects' types, rather than putting this in the hands of the stdlib function. And of course, this applies to non-stdlib functions, too -- anything that simply manipulates user-defined string classes, should allow the user-defined classes to determine the coercion of the result. >BTW, this was a little unclear to me: > > > [Collisions will] be with other *unicode* strings. Ones coming > > from other code, and literals embedded in the stdlib. > >What about the literals in the stdlib? Are you saying they contain >invalid code points for your known output encoding? Or are you saying >that with non-polymorphic unicode stdlib, you get lots of false >positives when combining with your validated bytes? No, I mean that the current string coercion rules cause everything to be converted to unicode, thereby discarding the tainting information, so to speak. 
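
The coercion behaviour described here is easy to reproduce. The following is a minimal sketch, assuming a hypothetical `Tagged` subclass of `str` that records where its text came from (the class name and its `origin` attribute are illustrative only, not part of any library). Built-in operations such as %-formatting and `join()` return a plain `str` and drop the subclass, while `+` keeps the tag only because the subclass overrides `__add__` and `__radd__`:

```python
class Tagged(str):
    """Hypothetical str subclass that remembers where its text came from."""

    def __new__(cls, value, origin=None):
        obj = super().__new__(cls, value)
        obj.origin = origin
        return obj

    def __add__(self, other):
        # Keep the tag when something is appended to a Tagged string.
        return Tagged(str.__add__(self, other), self.origin)

    def __radd__(self, other):
        # Keep the tag when a plain str is the left-hand operand.
        return Tagged(str.__add__(other, self), self.origin)


line = Tagged("spam", origin=("config.ini", 12))

added = line + " and eggs"        # dispatched to Tagged.__add__
prefixed = "more " + line         # dispatched to Tagged.__radd__
formatted = "%s and eggs" % line  # str.__mod__ builds a plain str
joined = ", ".join([line, line])  # str.join builds a plain str

print(type(added).__name__, added.origin)
print(type(prefixed).__name__, prefixed.origin)
print(type(formatted).__name__)
print(type(joined).__name__)
```

With only the current rules, the tag survives concatenation but is lost by formatting and joining, which matches the "string addition rather than %-formatting" workaround mentioned above.
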
This applies equally to other tainting use cases, and other uses for custom stringlike objects. From steve at holdenweb.com Sat Jun 26 00:38:38 2010 From: steve at holdenweb.com (Steve Holden) Date: Fri, 25 Jun 2010 18:38:38 -0400 Subject: [Python-Dev] Signs of neglect? Message-ID: I was pretty stunned when I tried this. Remember that the Tools subdirectory is distributed with Windows, so this means we got through almost two releases without anyone realizing that 2to3 does not appear to have touched this code. Yes, I have: http://bugs.python.org/issue9083 When's 3.2 due out? regards Steve -- Steve Holden +1 571 484 6266 +1 800 494 3119 See Python Video! http://python.mirocommunity.org/ Holden Web LLC http://www.holdenweb.com/ UPCOMING EVENTS: http://holdenweb.eventbrite.com/ "All I want for my birthday is another birthday" - Ian Dury, 1942-2000 From janssen at parc.com Sat Jun 26 00:40:52 2010 From: janssen at parc.com (Bill Janssen) Date: Fri, 25 Jun 2010 15:40:52 PDT Subject: [Python-Dev] thoughts on the bytes/string discussion In-Reply-To: References: <11597.1277401099@parc.com> <96ADD4CE-3A24-45A7-B219-2940195DC3D0@twistedmatrix.com> Message-ID: <26215.1277505652@parc.com> Guido van Rossum wrote: > On Fri, Jun 25, 2010 at 1:43 PM, Glyph Lefkowitz > wrote: > > > > On Jun 24, 2010, at 4:59 PM, Guido van Rossum wrote: > > > > Regarding the proposal of a String ABC, I hope this isn't going to > > become a backdoor to reintroduce the Python 2 madness of allowing > > equivalency between text and bytes for *some* strings of bytes and not > > others. I never actually replied to this... Absolutely right, which is why you might really want another kind of string, rather than a way to treat some bytes values as strings in some places. Both Python 2 and Python 3 are missing one of the three types. Python 1 and 2 didn't have "bytes", and this caused problems because "str" was pressed into use to hold arbitrary byte sequences. 
(Python 2 "str" has other problems as well, like losing track of the encoding.) Python 3 doesn't have Python 2's "str" (encoded string), and bytes are being pressed into use for that. Each of these uses is an ad hoc hijack of an inappropriate type, and additional frameworks not directly supported by the Python language are being jury-rigged to try to support the uses. On the other hand, this is all in the eye of the beholder. Both byte sequences and strings are horrible formless things; they remind me of BLISS. You seldom really have a byte sequence; what you have is an XDR float or an encoded string or an IP header or an email message. Similarly for strings; they are really file names or city names or English sentences or URIs or other things with significant semantic constraints not captured by the typical type system. So, yes, there *is* an inescapable equivalency between text and bytes for *some* sequences of bytes (those that represent encoded strings) and not others (those that contain the XDR float, for instance). Creating a separate encoded string type would be one way to keep that straight. > > For my part, what I want out of a string ABC is simply the ability to do > > application-specific optimizations. > > There are many applications where all input and output is text, but _must_ > > be UTF-8. Even GTK uses UTF-8 as its native text representation, so > > "output" could just be display. > > Right now, in Python 3, the only way to be "correct" about this is to copy > > every byte of input into 4 bytes of output, then copy each code point *back* > > into a single byte of output. If all your application does is rewrite the > > occasional XML attribute, for example, this cost can be significant, if not > > overwhelming. 
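
The 4x figure quoted above follows from fixed-width storage of code points. The following is a rough sketch that uses the UTF-32 codec as a stand-in for the internal buffer of a UCS-4 build. That is an assumption for illustration: it does not measure the interpreter's real memory use, and CPython 3.3 and later store ASCII-only strings more compactly under PEP 393. The sample payload is invented.

```python
# ASCII-only payload, as is typical for XML markup.
payload = b"<item id='42' status='ok'/>" * 1000

text = payload.decode("utf-8")       # bytes -> code points
as_ucs4 = text.encode("utf-32-le")   # 4 bytes per code point, no BOM
round_trip = text.encode("utf-8")    # code points -> bytes again

print(len(payload))                  # size of the original UTF-8 data
print(len(as_ucs4))                  # size at 4 bytes per code point
print(len(as_ucs4) // len(payload))  # expansion factor for ASCII text
print(round_trip == payload)         # the data itself is unchanged
```

For text that is mostly non-ASCII the ratio is smaller, since UTF-8 itself then needs two to four bytes per code point.
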
> > I'd like a version of 'decode' which would give me a type that was, in every > > respect, unicode, and responded to all protocols exactly as other > > unicode objects (or "str objects", if you prefer py3 nomenclature ;-)) do, > > but wouldn't actually copy any of that memory unless it really needed to > > (for example, to pass to a C API that expected native wide characters), and > > that would hold on to the original bytes so that it could produce them on > > demand if encoded to the same encoding again. So, as others in this thread > > have mentioned, the 'ABC' really implies some stuff about C APIs as well. Seems like it. > > I'm not sure about the exact performance impact of such a class, which is > > why I'd like the ability to implement it *outside* of the stdlib and see how > > it works on a project, and return with a proposal along with some data. Yes, exactly. > > There are also different ways to implement this, and other optimizations > > (like ropes) which might be better. > > You can almost do this today, but the lack of things like the hypothetical > > "__rcontains__" does make it impossible to be totally transparent about it. > > But you'd still have to validate it, right? You wouldn't want to go on > using what you thought was wrapped UTF-8 if it wasn't actually valid > UTF-8 (or you'd be worse off than in Python 2). Yes, but there are different ways to validate it that have different performance impacts. Simply trusting the source of the string, for example, would be appropriate in some cases. > So you're really just worried about space consumption. I'd like to see > a lot of hard memory profiling data before I got overly worried about > that. While I've seen some big Web pages, I think the email folks, who often have to process messages with attachments measuring in the tens of megabytes, have the stronger problems here, and I think speed may be more important than memory. 
I've built both a Web server and an IMAP server in Python, and the IMAP server is where the issues of storage management really prevail. If you have to convert a 20 MB encoded string into a Unicode string just to look at the headers as strings, you have issues. (The Python email package doesn't do that, by the way.) Bill From steve at holdenweb.com Sat Jun 26 00:51:53 2010 From: steve at holdenweb.com (Steve Holden) Date: Fri, 25 Jun 2010 18:51:53 -0400 Subject: [Python-Dev] thoughts on the bytes/string discussion In-Reply-To: <51EFE211-DBCA-497E-9BC5-CC0D2256173E@twistedmatrix.com> References: <11597.1277401099@parc.com> <96ADD4CE-3A24-45A7-B219-2940195DC3D0@twistedmatrix.com> <51EFE211-DBCA-497E-9BC5-CC0D2256173E@twistedmatrix.com> Message-ID: Glyph Lefkowitz wrote: > > On Jun 25, 2010, at 5:02 PM, Guido van Rossum wrote: > >> But you'd still have to validate it, right? You wouldn't want to go on >> using what you thought was wrapped UTF-8 if it wasn't actually valid >> UTF-8 (or you'd be worse off than in Python 2). So you're really just >> worried about space consumption. > > So, yes, I am mainly worried about memory consumption, but don't > underestimate the pure CPU cost of doing all the copying. It's quite a > bit faster to simply scan through a string than to scan and while you're > scanning, keep faulting out the L2 cache while you're accessing some > other area of memory to store the copy. > Yes, but you are already talking about optimizations that might be significant for large-ish strings (where large-ish depends on exactly where Moore's Law is currently delivering computational performance) - the amount of cache consumed by a ten-byte string will slip by unnoticed, but at L2 levels megabytes would effectively flush the cache. > Plus, If I am decoding with the surrogateescape error handler (or its > effective equivalent), then no, I don't need to validate it in advance; > interpretation can be done lazily as necessary. 
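
The lazy approach mentioned here can be sketched with the `surrogateescape` error handler from PEP 383 (available since Python 3.1). The sample bytes are made up for illustration. Undecodable bytes are carried through as lone surrogates, so the original data can be reproduced exactly, and strict validation happens only if and when it is requested:

```python
raw = b"Subject: caf\xe9 menu"   # Latin-1 byte 0xE9 is not valid UTF-8 here

# Decoding does not fail: the bad byte becomes a lone surrogate code point.
text = raw.decode("utf-8", errors="surrogateescape")

# Encoding with the same handler restores the original bytes exactly.
restored = text.encode("utf-8", errors="surrogateescape")

# Validation can be deferred until someone actually needs clean text.
try:
    text.encode("utf-8")         # strict mode rejects lone surrogates
    is_clean = True
except UnicodeEncodeError:
    is_clean = False

print(restored == raw, is_clean, hex(ord(text[12])))
```

This is still garbage in, garbage out: the escaped code points are not meaningful characters, they only preserve the bytes.
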
I realize that this is > just GIGO, but I wouldn't be doing this on data that didn't have an > explicitly declared or required encoding in the first place. > >> I'd like to see a lot of hard memory profiling data before I got >> overly worried about that. > > I know of several Python applications that are already constrained by > memory. I don't have a lot of hard memory profiling data, but in an > environment where you're spawning as many processes as you can in order > to consume _all_ the physically available RAM for string processing, it > stands to reason that properly decoding everything and thereby exploding > everything out into 4x as much data (or 2x, if you're lucky) would > result in a commensurate decrease in throughput. > Yes, UCS-4's impact does seem like it could be horrible for these use cases. But "knowing of several Python applications that are already constrained by memory" doesn't mean that it's a bad general decision. Most users will never notice the difference, so we should try to accommodate those who do notice a difference without inconveniencing the rest too much. > I don't think I could even reasonably _propose_ that such a project stop > treating textual data as bytes, because there's no optimization strategy > once that sort of architecture has been put into place. If your function > says "this takes unicode", then you just have to bite the bullet and > decode it, or rewrite it again to have a different requirement. > That has always been my understanding. I regard it as a sort of intellectual tax on the United States (and its Western collaborators) for being too dim to realise that eventually they would end up selling computers to people with more than 256 characters in their alphabet. Sorry guys, but your computers are only as fast as you think they are when you only talk to each other. > So, right now, I don't know where I'd get the data with which to make the > argument in the first place :). 
If there were some abstraction in the > core's treatment of strings, though, and I could decode things and note > their encoding without immediately paying this cost (or alternately, > paying the cost to see if it's so bad, but with the option of managing > it or optimizing it separately). This is why I'm asking for a way for > me to implement my own string type, and not for a change of behavior or > an optimization in the stdlib itself: I could be wrong, I don't have a > particularly high level of certainty in my performance estimates, but I > think that my concerns are realistic enough that I don't want to embark > on a big re-architecture of text-handling only to have it become a > performance nightmare that needs to be reverted. > Recent experience with the thoroughness of the Python 3 release preparations leads me to believe that *anything* new needs to prove its worth outside the stdlib for a while. > As Robert Collins pointed out, they already have performance issues > related to encoding in Bazaar. I know they've done a lot of profiling > in that area, so I hope eventually someone from that project will show > up with some data to demonstrate it :). And I've definitely heard many, > many anecdotes (some of them in this thread) about people distorting > their data structures in various ways to avoid paying decoding cost in > the ASCII/latin1 case, whether it's *actually* a significant performance > issue or not. I would very much like to tell those people "Just call > .decode(), and if it turns out to actually be a performance issue, you > can always deal with it later, with a custom string type." I'm > confident that in *most* cases, it would not be. > Well that would be a nice win. > Anyway, this may be a serious issue, but I increasingly feel like I'm > veering into python-ideas territory, so perhaps I'll just have to burn > this bridge when I come to it. Hopefully after the moratorium. > Sounds like it's worth pursuing, though. 
I mean after all, we don't want to leave *all* the bit-twiddling to the low-level language users ;-). regards Steve -- Steve Holden +1 571 484 6266 +1 800 494 3119 See Python Video! http://python.mirocommunity.org/ Holden Web LLC http://www.holdenweb.com/ UPCOMING EVENTS: http://holdenweb.eventbrite.com/ "All I want for my birthday is another birthday" - Ian Dury, 1942-2000 From steve at holdenweb.com Sat Jun 26 00:57:10 2010 From: steve at holdenweb.com (Steve Holden) Date: Fri, 25 Jun 2010 18:57:10 -0400 Subject: [Python-Dev] docs - Copy In-Reply-To: <4C2511EA.3000200@v.loewis.de> References: <4C2511EA.3000200@v.loewis.de> Message-ID: Martin v. Löwis wrote: > On 25.06.2010 18:57, Terry Reedy wrote: >> On 6/24/2010 8:51 PM, Rich Healey wrote: >>> http://docs.python.org/library/copy.html >> Discussion of the wording of current docs should go to python-list. >> Py-dev is for development of future Python. > > No no no. [...] It isn't always easy to tell, but I think Martin meant "no". regards Steve -- Steve Holden +1 571 484 6266 +1 800 494 3119 See Python Video! http://python.mirocommunity.org/ Holden Web LLC http://www.holdenweb.com/ UPCOMING EVENTS: http://holdenweb.eventbrite.com/ "All I want for my birthday is another birthday" - Ian Dury, 1942-2000 From steve at holdenweb.com Sat Jun 26 00:54:24 2010 From: steve at holdenweb.com (Steve Holden) Date: Fri, 25 Jun 2010 18:54:24 -0400 Subject: [Python-Dev] Schedule for Python 2.6.6 In-Reply-To: <4C2512A2.1040404@v.loewis.de> References: <20100625121847.60331d9e@heresy> <4C2512A2.1040404@v.loewis.de> Message-ID: Martin v. Löwis wrote: > On 25.06.2010 18:18, Barry Warsaw wrote: >> Benjamin is still planning to release Python 2.7 final on 2010-07-03, so it's >> time for me to work out the release schedule for Python 2.6.6 - likely the >> last maintenance release for Python 2.6. 
>> >> Because summer schedules are crazy, and I want to leave two weeks between >> 2.6.6 rc1 and 2.6.6 final, my current schedule looks like: >> >> * Python 2.6.6 rc 1 on Monday 2010-08-02 >> * Python 2.6.6 final on Monday 2010-08-16 > > That would barely work for me. If the schedule slips in any way, we'll have > to move the release into end-of-September (but the days as proposed are > fine). > > Regards, > Martin A six-week slippage wouldn't be good. What's the relevant chaos theory when a one- or two-day hold leads to a six-week delivery slippage? Let's hope things don't slip! regards Steve -- Steve Holden +1 571 484 6266 +1 800 494 3119 See Python Video! http://python.mirocommunity.org/ Holden Web LLC http://www.holdenweb.com/ UPCOMING EVENTS: http://holdenweb.eventbrite.com/ "All I want for my birthday is another birthday" - Ian Dury, 1942-2000 From steve at holdenweb.com Sat Jun 26 01:00:19 2010 From: steve at holdenweb.com (Steve Holden) Date: Fri, 25 Jun 2010 19:00:19 -0400 Subject: [Python-Dev] "2 or 3" link on python.org In-Reply-To: <4C251133.2090505@v.loewis.de> References: <20100624232821.GB10805@thorne.id.au> <4C23F1AD.9040809@v.loewis.de> <20100625003149.GA16084@thorne.id.au> <4C251133.2090505@v.loewis.de> Message-ID: <4C253503.6080300@holdenweb.com> Martin v. Löwis wrote: >>>> I am extremely keen for this to happen. Does anyone have ownership of this >>>> project? There was some discussion of it up-list but the discussion fizzled. >>> Can you please explain what "this project" is, in the context of your >>> message? GSoC? GHOP? >> Oh, I thought this was quite clear. I was specifically meaning the large >> "Python 2 or 3" button on python.org. It would help users who want to know >> what version of python to use if they had a clear guide as to what version >> to download. > > Ah, ok. No, nobody has taken ownership of that project, and likely, > nobody actually will - unless you volunteer. 
> Or perhaps spur the pydotorg community on with some well-placed encouragement. Nobody ever seems to say "thanks" to those guys except the jobs posters - *they* seem pretty happy. regards Steve -- Steve Holden +1 571 484 6266 +1 800 494 3119 See Python Video! http://python.mirocommunity.org/ Holden Web LLC http://www.holdenweb.com/ UPCOMING EVENTS: http://holdenweb.eventbrite.com/ "All I want for my birthday is another birthday" - Ian Dury, 1942-2000 From steve at holdenweb.com Sat Jun 26 00:55:16 2010 From: steve at holdenweb.com (Steve Holden) Date: Fri, 25 Jun 2010 18:55:16 -0400 Subject: [Python-Dev] Schedule for Python 2.6.6 In-Reply-To: <4C251CA7.3070902@v.loewis.de> References: <20100625121847.60331d9e@heresy> <4C2512A2.1040404@v.loewis.de> <20100625170322.5ece724f@heresy> <4C251CA7.3070902@v.loewis.de> Message-ID: Martin v. Löwis wrote: >> Would that be bad or good (slipping into September)? I'd like to get a >> release out as soon after 2.7 final as possible, but it's an entirely >> self-imposed deadline. There's no reason why we can't push the whole 2.6.6 >> thing later if that works better for you. OTOH, I can't go much earlier so if >> September is bad for you, then we'll stick to the above dates. > > I think we can strive for your original proposal. If it slips, we let it > slip by a month or two. > > Regards, > Martin I suppose for 2.6.6 it's not really critical. regards Steve -- Steve Holden +1 571 484 6266 +1 800 494 3119 See Python Video! 
http://python.mirocommunity.org/ Holden Web LLC http://www.holdenweb.com/ UPCOMING EVENTS: http://holdenweb.eventbrite.com/ "All I want for my birthday is another birthday" - Ian Dury, 1942-2000 From benjamin at python.org Sat Jun 26 01:23:02 2010 From: benjamin at python.org (Benjamin Peterson) Date: Fri, 25 Jun 2010 18:23:02 -0500 Subject: [Python-Dev] Signs of neglect? 
In-Reply-To: References: Message-ID: 2010/6/25 Steve Holden : > I was pretty stunned when I tried this. Remember that the Tools > subdirectory is distributed with Windows, so this means we got through > almost two releases without anyone realizing that 2to3 does not appear > to have touched this code. I would call it more a sign of no tests rather than one of neglect and perhaps also an indication of the usefulness of those tools. > > Yes, I have: http://bugs.python.org/issue9083 > > When's 3.2 due out? PEP 392. -- Regards, Benjamin From fijall at gmail.com Sat Jun 26 01:27:52 2010 From: fijall at gmail.com (Maciej Fijalkowski) Date: Fri, 25 Jun 2010 17:27:52 -0600 Subject: [Python-Dev] PyPy 1.3 released Message-ID: ======================= PyPy 1.3: Stabilization ======================= Hello. We're pleased to announce the release of PyPy 1.3. This release has two major improvements. First of all, we stabilized the JIT compiler since the 1.2 release, answered user issues, fixed bugs, and generally improved speed. We're also pleased to announce alpha support for loading CPython extension modules written in C. While the main purpose of this release is increased stability, this feature is in alpha stage and it is not yet suited for production environments. Highlights of this release ========================== * We introduced support for CPython extension modules written in C. As of now, this support is in alpha, and it's very unlikely unaltered C extensions will work out of the box, due to missing functions or refcounting details. The support is disabled by default, so you have to do:: import cpyext before trying to import any .so file. Also, libraries are source-compatible and not binary-compatible. That means you need to recompile binaries, using for example:: python setup.py build Details may vary, depending on your build system. Make sure you include the above line at the beginning of setup.py or put it in your PYTHONSTARTUP. This is an alpha feature. It'll likely segfault. 
You have been warned! * JIT bugfixes. A lot of bugs reported for the JIT have been fixed, and its stability greatly improved since 1.2 release. * Various small improvements have been added to the JIT code, as well as a great speedup of compiling time. Cheers, Maciej Fijalkowski, Armin Rigo, Alex Gaynor, Amaury Forgeot d'Arc and the PyPy team From ncoghlan at gmail.com Sat Jun 26 02:19:51 2010 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 26 Jun 2010 10:19:51 +1000 Subject: [Python-Dev] versioned .so files for Python 3.2 In-Reply-To: References: <20100624115048.4fd152e3@heresy> <20100624170944.7e68ad21@heresy> <4C23D3C2.1060500@scottdial.com> <4C246E81.3020302@scottdial.com> Message-ID: On Sat, Jun 26, 2010 at 6:12 AM, James Y Knight wrote: > However, then you have to also consider python packages made up of multiple > distro packages -- like twisted or zope. Twisted includes some C extensions > in the core package. But then there are other twisted modules (installed > under a "twisted.foo" name) which do not include C extensions. If the base > twisted package is installed under a version-specific directory, then all of > the submodule packages need to also be installed under the same > version-specific directory (and thus built for all versions). > > In the past, it has proven somewhat tricky to coordinate which directory the > modules for package "foo" should be installed in, because you need to know > whether *any* of the related packages includes a native ".so" file, not just > the current package. > > The converse situation, where a base package did *not* get installed into a > version-specific directory because it includes no native code, but a > submodule *does* include a ".so" file, is even trickier. 
I think there are two major ways to tackle this: - allow multiple versions of a .so file within a single directory (i.e. Barry's current suggestion) - enhanced namespace packages, allowing a single package to be spread across multiple directories, some of which may be Python version specific (i.e. modifications to PEP 382 to support references to version-specific directories) I think a new PEP is definitely in order, especially to explain why enhancing PEP 382 to support saying "look over here for the .so files for this version" isn't a preferable approach. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From stephen at thorne.id.au Sat Jun 26 02:41:34 2010 From: stephen at thorne.id.au (Stephen Thorne) Date: Sat, 26 Jun 2010 10:41:34 +1000 Subject: [Python-Dev] "2 or 3" link on python.org In-Reply-To: <4C251C4D.50806@v.loewis.de> References: <20100624232821.GB10805@thorne.id.au> <4C23F1AD.9040809@v.loewis.de> <20100625003149.GA16084@thorne.id.au> <4C251133.2090505@v.loewis.de> <4C251A38.3090205@voidspace.org.uk> <4C251C4D.50806@v.loewis.de> Message-ID: <20100626004134.GB16084@thorne.id.au> On 2010-06-25, "Martin v. Löwis" wrote: > > What page were we suggesting linking to? > > I don't think anybody proposed anything specific. Steve Holden > suggested it should go to "reasoned discussion of the > pros and cons as evinced in this thread". Stephen Thorne didn't > propose anything specific but to have a large button. I didn't propose anything, I heard a good idea that I'd like to see followed through. 
-- Regards, Stephen Thorne From martin at v.loewis.de Sat Jun 26 02:49:49 2010 From: martin at v.loewis.de (Martin v. Löwis) Date: Sat, 26 Jun 2010 02:49:49 +0200 Subject: [Python-Dev] "2 or 3" link on python.org In-Reply-To: <20100626004134.GB16084@thorne.id.au> References: <20100624232821.GB10805@thorne.id.au> <4C23F1AD.9040809@v.loewis.de> <20100625003149.GA16084@thorne.id.au> <4C251133.2090505@v.loewis.de> <4C251A38.3090205@voidspace.org.uk> <4C251C4D.50806@v.loewis.de> <20100626004134.GB16084@thorne.id.au> Message-ID: <4C254EAD.4060006@v.loewis.de> On 26.06.2010 02:41, Stephen Thorne wrote: > On 2010-06-25, "Martin v. Löwis" wrote: >>> What page were we suggesting linking to? >> >> I don't think anybody proposed anything specific. Steve Holden >> suggested it should go to "reasoned discussion of the >> pros and cons as evinced in this thread". Stephen Thorne didn't >> propose anything specific but to have a large button. > > I didn't propose anything, I heard a good idea that I'd like to see followed > through. Ah, ok. I thought "I am extremely keen for this to happen" indicated that you would be willing to volunteer time to make it happen. Regards, Martin From ncoghlan at gmail.com Sat Jun 26 04:59:31 2010 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 26 Jun 2010 12:59:31 +1000 Subject: [Python-Dev] Signs of neglect? In-Reply-To: References: Message-ID: On Sat, Jun 26, 2010 at 9:23 AM, Benjamin Peterson wrote: > 2010/6/25 Steve Holden : > I would call it more a sign of no tests rather than one of neglect and > perhaps also an indication of the usefulness of those tools. Less than useful tools with no tests probably qualify as neglected... An assessment of the contents of the Py3k tools directory is probably in order, with at least a basic "will it run?" check added for those we decide to keep. Cheers, Nick. 
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From stephen at xemacs.org Sat Jun 26 05:42:25 2010 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Sat, 26 Jun 2010 12:42:25 +0900 Subject: [Python-Dev] bytes / unicode In-Reply-To: <20100625222722.594D23A4099@sparrow.telecommunity.com> References: <20100625130801.1E9A83A4099@sparrow.telecommunity.com> <8739wbnl0m.fsf@uwakimon.sk.tsukuba.ac.jp> <20100625222722.594D23A4099@sparrow.telecommunity.com> Message-ID: <87oceympcu.fsf@uwakimon.sk.tsukuba.ac.jp> P.J. Eby writes: > it's just that if you already have the bytes, and all you want to > do is tag them (e.g. the WSGI headers case), the extra encoding > step seems pointless. Well, I'll have to concede that unless and until I get involved in the WSGI development effort. > >But with your architecture, it seems to me that you actually don't > >want polymorphic functions in the stdlib. You want the stdlib > >functions to be bytes-oriented if and only if they are reliable. (This > >is what I was saying to Guido elsewhere.) > > I'm not sure I follow you. What I'm saying here is that if bytes are the signal of validity, and the stdlib functions preserve validity, then it's better to have the stdlib functions object to unicode data as an argument. Compare the alternative: it returns a unicode object which might get passed around for a while before one of your functions receives it and identifies it as unvalidated data. But you agree that there are better mechanisms for validation (although not available in Python yet), so I don't see this as a potential obstacle to polymorphism now. > What I want is for the stdlib to create stringlike objects of a > type determined by the types of the inputs -- In general this is a hard problem, though. 
Polymorphism, OK, one-way tainting OK, but in general combining related types is pretty arbitrary, and as in the encoded-bytes case, the result type often varies depending on expectations of callers, not the types of the data. From greg.ewing at canterbury.ac.nz Sat Jun 26 09:58:17 2010 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sat, 26 Jun 2010 19:58:17 +1200 Subject: [Python-Dev] thoughts on the bytes/string discussion In-Reply-To: References: <11597.1277401099@parc.com> <96ADD4CE-3A24-45A7-B219-2940195DC3D0@twistedmatrix.com> Message-ID: <4C25B319.8040804@canterbury.ac.nz> Tres Seaver wrote: > I do know for a fact that using a UCS2-compiled Python instead of the > system's UCS4-compiled Python leads to a measurable, noticeable drop in > memory consumption of long-running webserver processes using Unicode Would there be any sanity in having an option to compile Python with UTF-8 as the internal string representation? -- Greg From stefan_ml at behnel.de Sat Jun 26 11:34:56 2010 From: stefan_ml at behnel.de (Stefan Behnel) Date: Sat, 26 Jun 2010 11:34:56 +0200 Subject: [Python-Dev] thoughts on the bytes/string discussion In-Reply-To: References: <11597.1277401099@parc.com> <96ADD4CE-3A24-45A7-B219-2940195DC3D0@twistedmatrix.com> Message-ID: Ian Bicking, 26.06.2010 00:26: > On Fri, Jun 25, 2010 at 4:02 PM, Guido van Rossum wrote: >> On Fri, Jun 25, 2010 at 1:43 PM, Glyph Lefkowitz >>> I'd like a version of 'decode' which would give me a type that was, in >> every >>> respect, unicode, and responded to all protocols exactly as other >>> unicode objects (or "str objects", if you prefer py3 nomenclature ;-)) >> do, >>> but wouldn't actually copy any of that memory unless it really needed to >>> (for example, to pass to a C API that expected native wide characters), >> and >>> that would hold on to the original bytes so that it could produce them on >>> demand if encoded to the same encoding again. 
So, as others in this >> thread >>> have mentioned, the 'ABC' really implies some stuff about C APIs as well. Well, there's the buffer API, so you can already create something that refers to an existing C buffer. However, with respect to a string, you will have to make sure the underlying buffer doesn't get freed while the string is still in use. That will be hard and sometimes impossible to do at the C-API level, even if the string is allowed to keep a reference to something that holds the buffer. At least in lxml, such a feature would be completely worthless, as text is never held by any ref-counted Python wrapper object. It's only part of the XML tree, which is allowed to change at (more or less) any time, so the underlying char* buffer could just get freed without further notice. Adding a guard against that would likely have a larger impact on the performance than the decoding operations. >>> I'm not sure about the exact performance impact of such a class, which is >>> why I'd like the ability to implement it *outside* of the stdlib and see >> how >>> it works on a project, and return with a proposal along with some data. >>> There are also different ways to implement this, and other optimizations >>> (like ropes) which might be better. >>> You can almost do this today, but the lack of things like the >> hypothetical >>> "__rcontains__" does make it impossible to be totally transparent about >> it. >> >> But you'd still have to validate it, right? You wouldn't want to go on >> using what you thought was wrapped UTF-8 if it wasn't actually valid >> UTF-8 (or you'd be worse off than in Python 2). So you're really just >> worried about space consumption. I'd like to see a lot of hard memory >> profiling data before I got overly worried about that. 
> > It wasn't my profiling, but I seem to recall that Fredrik Lundh specifically > benchmarked ElementTree with all-unicode and sometimes-ascii-bytes, and > found that using Python 2 strs in some cases provided notable advantages. I > know Stefan copied ElementTree in this regard in lxml, maybe he also did a > benchmark or knows of one? Actually, bytes vs. unicode doesn't make that a big difference in Py2 for lxml. ElementTree is a lot older, so I guess it made a larger difference when its code was written (and I even think I recall seeing numbers for lxml where it seemed to make a notable difference). In lxml, text content is stored in the C tree of libxml2 as UTF-8 encoded char* text. On request, lxml creates a string object from it and returns it. In Py2, it checks for plain ASCII content first and returns a byte string for that. Only non-ASCII strings are returned as decoded unicode strings. In Py3, it always returns unicode strings. When I run a little benchmark on lxml in Py2.6.5 that just reads some short text content from an Element object, I only see a tiny difference between unicode strings and byte strings. The gap obviously increases when the text gets longer, e.g. when I serialise the complete text content of an XML document to either a byte string or a unicode string. But even for documents in the megabyte range we are still talking about single milliseconds here, and the difference stays well below 10%. It's seriously hard to make that the performance bottleneck in an XML application. Also, since the string objects are only instantiated at request, memory isn't an issue either. That's different for (c)ElementTree again, where string content is stored as Python objects. Four times the size even for plain ASCII strings (e.g. numbers, IDs or even trailing whitespace!) can well become a problem there, and can easily dominate the overall size of the in-memory tree. Plain ASCII content is surprisingly common in XML documents. 
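To make the Py2 behaviour described above concrete (a byte string for plain ASCII content, a decoded unicode string otherwise), here is a minimal sketch in present-day Python. It is not lxml's code, which does this in C against libxml2's char* buffers; the function name `text_from_utf8` is made up for illustration, and `bytes` stands in for the Py2 byte string.

```python
def text_from_utf8(raw):
    """Return raw unchanged if it is pure ASCII, else decode it as UTF-8.

    Hypothetical helper mirroring the described Py2 policy; raw is the
    UTF-8 encoded content as stored in the underlying tree.
    """
    # Iterating over bytes yields integers; ASCII is the range 0..127.
    if all(byte < 0x80 for byte in raw):
        return raw
    return raw.decode("utf-8")
```

Under this policy an ID such as `b"id-42"` is handed back as-is, while anything containing a non-ASCII character comes back as a decoded string. The Py3 behaviour described above would correspond to always taking the `decode` branch.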
Stefan From stefan_ml at behnel.de Sat Jun 26 11:41:48 2010 From: stefan_ml at behnel.de (Stefan Behnel) Date: Sat, 26 Jun 2010 11:41:48 +0200 Subject: [Python-Dev] thoughts on the bytes/string discussion In-Reply-To: <4C25B319.8040804@canterbury.ac.nz> References: <11597.1277401099@parc.com> <96ADD4CE-3A24-45A7-B219-2940195DC3D0@twistedmatrix.com> <4C25B319.8040804@canterbury.ac.nz> Message-ID: Greg Ewing, 26.06.2010 09:58: > Tres Seaver wrote: > >> I do know for a fact that using a UCS2-compiled Python instead of the >> system's UCS4-compiled Python leads to measurable, noticeable drop in >> memory consumption of long-running webserver processes using Unicode > > Would there be any sanity in having an option to compile > Python with UTF-8 as the internal string representation? It would break Py_UNICODE, because the internal size of a unicode character would no longer be fixed. Stefan From steve at holdenweb.com Sat Jun 26 13:18:37 2010 From: steve at holdenweb.com (Steve Holden) Date: Sat, 26 Jun 2010 07:18:37 -0400 Subject: [Python-Dev] Signs of neglect? In-Reply-To: References: Message-ID: <4C25E20D.2040007@holdenweb.com> Nick Coghlan wrote: > On Sat, Jun 26, 2010 at 9:23 AM, Benjamin Peterson wrote: >> 2010/6/25 Steve Holden : >> I would call it more a sign of no tests rather than one of neglect and >> perhaps also an indication of the usefulness of those tools. > > Less than useful tools with no tests probably qualify as neglected... > > An assessment of the contents of the Py3k tools directory is probably > in order, with at least a basic "will it run?" check added for those > we decide to keep. > Neither webchecker nor wcgui.py will run - the former breaks because sgmllib is missing, the latter because it uses the wrong name for "tkinter" (but overcoming this will throw it back to an sgmllib dependency too). Guido thinks it's OK to abandon at least some of them, so I don't see the rest getting much love in the future.
They do need sorting through - I don't see anyone wanting xxci.py, for example ("check in files for which rcsdiff returns nonzero exit status"). But I'm grateful you agree with my diagnosis of neglect (not that a diagnosis in itself is going to help in fixing things). regards Steve -- Steve Holden +1 571 484 6266 +1 800 494 3119 See Python Video! http://python.mirocommunity.org/ Holden Web LLC http://www.holdenweb.com/ UPCOMING EVENTS: http://holdenweb.eventbrite.com/ "All I want for my birthday is another birthday" - Ian Dury, 1942-2000 From nagle at animats.com Sat Jun 26 08:11:49 2010 From: nagle at animats.com (John Nagle) Date: Fri, 25 Jun 2010 23:11:49 -0700 Subject: [Python-Dev] [ANN]: "newthreading" - an approach to simplified thread usage, and a path to getting rid of the GIL Message-ID: <4C259A25.1060705@animats.com> We have just released a proof-of-concept implementation of a new approach to thread management - "newthreading". It is available for download at https://sourceforge.net/projects/newthreading/ The user's guide is at http://www.animats.com/papers/languages/newthreadingintro.html This is a pure Python implementation of synchronized objects, along with a set of restrictions which make programs race-condition free, even without a Global Interpreter Lock. The basic idea is that classes derived from SynchronizedObject are automatically locked at entry and unlocked at exit. They're also unlocked when a thread blocks within the class. So at no time can two threads be active in such a class at one time. In addition, only "frozen" objects can be passed in and out of synchronized objects. (This is somewhat like the multiprocessing module, where you can only pass objects that can be "pickled". But it's not as restrictive; multiple threads can access the same synchronized object, one at a time.) This pure Python implementation is usable, but does not improve performance.
It's a proof of concept implementation so that programmers can try out synchronized classes and see what it's like to work within those restrictions. The semantics of Python don't change for single-thread programs. But when the program forks off the first new thread, the rules change, and some of the dynamic features of Python are disabled. Some of the ideas are borrowed from Java, and some are from "safethreading". The point is to come up with a set of liveable restrictions which would allow getting rid of the GIL. This is becoming essential as Unladen Swallow starts to work and the number of processors per machine keeps climbing. This may in time become a Python Enhancement Proposal. We'd like to get some experience with it first. Try it out and report back. The SourceForge forum for the project is the best place to report problems. John Nagle From arigo at tunes.org Sat Jun 26 10:34:57 2010 From: arigo at tunes.org (Armin Rigo) Date: Sat, 26 Jun 2010 10:34:57 +0200 Subject: [Python-Dev] [pypy-dev] PyPy 1.3 released In-Reply-To: References: Message-ID: <20100626083457.GA14816@code0.codespeak.net> Hi, On Fri, Jun 25, 2010 at 05:27:52PM -0600, Maciej Fijalkowski wrote: > python setup.py build As corrected on the blog (http://morepypy.blogspot.com/), this line should read: pypy setup.py build Armin. From fuzzyman at voidspace.org.uk Sat Jun 26 15:29:24 2010 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Sat, 26 Jun 2010 14:29:24 +0100 Subject: [Python-Dev] [ANN]: "newthreading" - an approach to simplified thread usage, and a path to getting rid of the GIL In-Reply-To: <4C259A25.1060705@animats.com> References: <4C259A25.1060705@animats.com> Message-ID: <4C2600B4.5020503@voidspace.org.uk> On 26/06/2010 07:11, John Nagle wrote: > We have just released a proof-of-concept implementation of a new > approach to thread management - "newthreading". 
It is available > for download at > > https://sourceforge.net/projects/newthreading/ > > The user's guide is at > > http://www.animats.com/papers/languages/newthreadingintro.html The user guide says: The suggested import is from newthreading import * The import * form is considered bad practise in *general* and should not be recommended unless there is a good reason. This is slightly off-topic for python-dev, although I appreciate that you want feedback with the eventual goal of producing a PEP - however the introduction of free-threading in Python has not been hampered by lack of synchronization primitives but by the difficulty of changing the interpreter without unduly impacting single threaded code. Providing an alternative garbage collection mechanism other than reference counting would be a more interesting first-step as far as I can see, as that removes the locking required around every access to an object (which currently touches the reference count). Introducing free-threading by *changing* the threading semantics (so you can't share non-frozen objects between threads) would not be acceptable. That comment is likely to be based on a misunderstanding of your future intentions though. :-) All the best, Michael Foord > > This is a pure Python implementation of synchronized objects, along > with a set of restrictions which make programs race-condition free, > even without a Global Interpreter Lock. The basic idea is that > classes derived from SynchronizedObject are automatically locked > at entry and unlocked at exit. They're also unlocked when a thread > blocks within the class. So at no time can two threads be active > in such a class at one time. > > In addition, only "frozen" objects can be passed in and out of > synchronized objects. (This is somewhat like the multiprocessing > module, where you can only pass objects that can be "pickled". > But it's not as restrictive; multiple threads can access the > same synchronized object, one at a time. 
> > This pure Python implementation is usable, but does not improve > performance. It's a proof of concept implementation so that > programmers can try out synchronized classes and see what it's > like to work within those restrictions. > > The semantics of Python don't change for single-thread programs. > But when the program forks off the first new thread, the rules > change, and some of the dynamic features of Python are disabled. > > Some of the ideas are borrowed from Java, and some are from > "safethreading". The point is to come up with a set of liveable > restrictions which would allow getting rid of the GIL. This > is becoming essential as Unladen Swallow starts to work and the > number of processors per machine keeps climbing. > > This may in time become a Python Enhancement Proposal. We'd like > to get some experience with it first. Try it out and report back. > The SourceForge forum for the project is the best place to report > problems. > > John Nagle > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > http://mail.python.org/mailman/options/python-dev/fuzzyman%40voidspace.org.uk > -- http://www.ironpythoninaction.com/ http://www.voidspace.org.uk/blog READ CAREFULLY. By accepting and reading this email you agree, on behalf of your employer, to release me from all obligations and waivers arising from any and all NON-NEGOTIATED agreements, licenses, terms-of-service, shrinkwrap, clickwrap, browsewrap, confidentiality, non-disclosure, non-compete and acceptable use policies (?BOGUS AGREEMENTS?) that I have entered into with your employer, its partners, licensors, agents and assigns, in perpetuity, without prejudice to my ongoing rights and privileges. You further represent that you have the authority to release me from any BOGUS AGREEMENTS on behalf of your employer. 
From jnoller at gmail.com Sat Jun 26 16:28:50 2010 From: jnoller at gmail.com (Jesse Noller) Date: Sat, 26 Jun 2010 10:28:50 -0400 Subject: [Python-Dev] [ANN]: "newthreading" - an approach to simplified thread usage, and a path to getting rid of the GIL In-Reply-To: <4C2600B4.5020503@voidspace.org.uk> References: <4C259A25.1060705@animats.com> <4C2600B4.5020503@voidspace.org.uk> Message-ID: On Sat, Jun 26, 2010 at 9:29 AM, Michael Foord wrote: > On 26/06/2010 07:11, John Nagle wrote: >> >> We have just released a proof-of-concept implementation of a new >> approach to thread management - "newthreading". It is available >> for download at >> >> https://sourceforge.net/projects/newthreading/ >> >> The user's guide is at >> >> http://www.animats.com/papers/languages/newthreadingintro.html > > The user guide says: > > The suggested import is > > from newthreading import * > > The import * form is considered bad practise in *general* and should not be > recommended unless there is a good reason. This is slightly off-topic for > python-dev, although I appreciate that you want feedback with the eventual > goal of producing a PEP - however the introduction of free-threading in > Python has not been hampered by lack of synchronization primitives but by > the difficulty of changing the interpreter without unduly impacting single > threaded code. > I asked John to drop a message here for this project - so feel free to flame me if anyone. This *is* relevant, and I'd guess fairly interesting to the group as a whole. 
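For readers who have not opened the user's guide, the "locked at entry, unlocked at exit" idea from the announcement can be approximated with a short sketch. This is not the newthreading implementation: the names `SynchronizedSketch`, `_locked`, `Counter` and `run_demo` are hypothetical, the sketch does not release the lock when a thread blocks, and it makes no attempt to enforce the "frozen objects only" rule.

```python
import functools
import inspect
import threading


def _locked(method):
    """Wrap a method so it runs while holding the instance's lock."""
    @functools.wraps(method)
    def wrapper(self, *args, **kwargs):
        with self._lock:
            return method(self, *args, **kwargs)
    return wrapper


class SynchronizedSketch:
    """Hypothetical base class: public methods of subclasses share one lock."""

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # Wrap every public plain function defined directly on the subclass.
        for name, attr in list(vars(cls).items()):
            if inspect.isfunction(attr) and not name.startswith("_"):
                setattr(cls, name, _locked(attr))

    def __init__(self):
        # Re-entrant, so one locked method may call another on the same object.
        self._lock = threading.RLock()


class Counter(SynchronizedSketch):
    def __init__(self):
        super().__init__()
        self.value = 0

    def increment(self):
        # Read-modify-write; runs under the lock because of the wrapper.
        self.value += 1


def run_demo(threads=8, steps=1000):
    """Increment one shared Counter from several threads; return the total."""
    counter = Counter()

    def work():
        for _ in range(steps):
            counter.increment()

    workers = [threading.Thread(target=work) for _ in range(threads)]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return counter.value
```

Calling `run_demo()` increments a single `Counter` from eight threads. Because every public method holds the object's lock, only one thread is inside the object at a time, which is the property the announcement describes.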
jesse From solipsis at pitrou.net Sat Jun 26 16:34:12 2010 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sat, 26 Jun 2010 16:34:12 +0200 Subject: [Python-Dev] [ANN]: "newthreading" - an approach to simplified thread usage, and a path to getting rid of the GIL References: <4C259A25.1060705@animats.com> <4C2600B4.5020503@voidspace.org.uk> Message-ID: <20100626163412.25b68be6@pitrou.net> On Sat, 26 Jun 2010 14:29:24 +0100 Michael Foord wrote: > > the introduction of > free-threading in Python has not been hampered by lack of > synchronization primitives but by the difficulty of changing the > interpreter without unduly impacting single threaded code. Exactly what I think too. cheers Antoine. From jnoller at gmail.com Sat Jun 26 16:44:15 2010 From: jnoller at gmail.com (Jesse Noller) Date: Sat, 26 Jun 2010 10:44:15 -0400 Subject: [Python-Dev] [ANN]: "newthreading" - an approach to simplified thread usage, and a path to getting rid of the GIL In-Reply-To: <4C2600B4.5020503@voidspace.org.uk> References: <4C259A25.1060705@animats.com> <4C2600B4.5020503@voidspace.org.uk> Message-ID: On Sat, Jun 26, 2010 at 9:29 AM, Michael Foord wrote: > On 26/06/2010 07:11, John Nagle wrote: >> >> We have just released a proof-of-concept implementation of a new >> approach to thread management - "newthreading". It is available >> for download at >> >> https://sourceforge.net/projects/newthreading/ >> >> The user's guide is at >> >> http://www.animats.com/papers/languages/newthreadingintro.html > > The user guide says: > > The suggested import is > > from newthreading import * > > The import * form is considered bad practise in *general* and should not be > recommended unless there is a good reason. 
This is slightly off-topic for > python-dev, although I appreciate that you want feedback with the eventual > goal of producing a PEP - however the introduction of free-threading in > Python has not been hampered by lack of synchronization primitives but by > the difficulty of changing the interpreter without unduly impacting single > threaded code. > > Providing an alternative garbage collection mechanism other than reference > counting would be a more interesting first-step as far as I can see, as that > removes the locking required around every access to an object (which > currently touches the reference count). Introducing free-threading by > *changing* the threading semantics (so you can't share non-frozen objects > between threads) would not be acceptable. That comment is likely to be based > on a misunderstanding of your future intentions though. :-) > > All the best, > > Michael Foord I'd also like to point out that one of the projects John cites is Adam Olsen's Safethread work: http://code.google.com/p/python-safethread/ Which, in and of itself, is a good read. From stephen at xemacs.org Sat Jun 26 19:24:50 2010 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Sun, 27 Jun 2010 02:24:50 +0900 Subject: [Python-Dev] thoughts on the bytes/string discussion In-Reply-To: <4C25B319.8040804@canterbury.ac.nz> References: <11597.1277401099@parc.com> <96ADD4CE-3A24-45A7-B219-2940195DC3D0@twistedmatrix.com> <4C25B319.8040804@canterbury.ac.nz> Message-ID: <87d3vdn1ul.fsf@uwakimon.sk.tsukuba.ac.jp> Greg Ewing writes: > Would there be any sanity in having an option to compile > Python with UTF-8 as the internal string representation? Losing Py_UNICODE as mentioned by Stefan Behnel (IIRC) is just the beginning of the pain. If Emacs's experience is any guide, the cost in speed and complexity of a variable-width internal representation is high.
There are a number of tricks you can use, but basically everything becomes O(n) for the natural implementation of most operations (such as indexing by character). You can get around that with a position cache, of course, but that adds complexity, and really cuts into the space saving (and worse, adds another chunk that may or may not be paged in when you need it). What we're considering is a system where buffers come in 1-, 2-, and 4-octet widechars, with automatic translation depending on content. But the buffer is the primary random-access structure in Emacsen, so optimizing it is probably worth our effort. I doubt it would be worth it for Python, but my intuitions here are not reliable. From nagle at animats.com Sat Jun 26 18:39:19 2010 From: nagle at animats.com (John Nagle) Date: Sat, 26 Jun 2010 09:39:19 -0700 Subject: [Python-Dev] [ANN]: "newthreading" - an approach to simplified thread usage, and a path to getting rid of the GIL In-Reply-To: References: <4C259A25.1060705@animats.com> <4C2600B4.5020503@voidspace.org.uk> Message-ID: <4C262D37.7020807@animats.com> On 6/26/2010 7:44 AM, Jesse Noller wrote: > On Sat, Jun 26, 2010 at 9:29 AM, Michael Foord > wrote: >> On 26/06/2010 07:11, John Nagle wrote: >>> >>> We have just released a proof-of-concept implementation of a new >>> approach to thread management - "newthreading". .... >> The import * form is considered bad practise in *general* and >> should not be recommended unless there is a good reason. I agree. I just did that to make the examples cleaner. >> however the introduction of free-threading in Python has not been >> hampered by lack of synchronization primitives but by the >> difficulty of changing the interpreter without unduly impacting >> single threaded code. That's what I'm trying to address here. 
>> Providing an alternative garbage collection mechanism other than >> reference counting would be a more interesting first-step as far as >> I can see, as that removes the locking required around every access >> to an object (which currently touches the reference count). >> Introducing free-threading by *changing* the threading semantics >> (so you can't share non-frozen objects between threads) would not >> be acceptable. That comment is likely to be based on a >> misunderstanding of your future intentions though. :-) This work comes out of a discussion a few of us had at a restaurant in Palo Alto after a Stanford talk by the group at Facebook which is building a JIT compiler for PHP. We were discussing how to make threading both safe for the average programmer and efficient. Javascript and PHP don't have threads at all; Python has safe threading, but it's slow. C/C++/Java all have race condition problems, of course. The Facebook guy pointed out that you can't redefine a function dynamically in PHP, and they get a performance win in their JIT by exploiting this. I haven't gone into the memory model in enough detail in the technical paper. The memory model I envision for this has three memory zones: 1. Shared fully-immutable objects: primarily strings, numbers, and tuples, all of whose elements are fully immutable. These can be shared without locking, and reclaimed by a concurrent garbage collector like Boehm's. They have no destructors, so finalization is not an issue. 2. Local objects. These are managed as at present, and require no locking. These can either be thread-local, or local to a synchronized object. There are no links between local objects under different "ownership". Whether each thread and object has its own private heap, or whether there's a common heap with locks at the allocator is an implementation decision. 3. 
Shared mutable objects: mostly synchronized objects, but also immutable objects like tuples which contain references to objects that aren't fully immutable. These are the high-overhead objects, and require locking during reference count updates, or atomic reference count operations if supported by the hardware. The general idea is to minimize the number of objects in this zone. The zone of an object is determined when the object is created, and never changes. This is relatively simple to implement. Tuples (and frozensets, frozendicts, etc.) are normally zone 2 objects. Only "freeze" creates collections in zones 1 and 3. Synchronized objects are always created in zone 3. There are no difficult handoffs, where an object that was previously thread-local now has to be shared and has to acquire locks during the transition. Existing interlinked data structures, like parse trees and GUIs, are by default zone 2 objects, with the same semantics as at present. They can be placed inside a SynchronizedObject if desired, which makes them usable from multiple threads. That's optional; they're thread-local otherwise. The rationale behind "freezing" some of the language semantics when the program goes multi-thread comes from two sources - Adam Olsen's Safethread work, and the acceptance of the multiprocessing module. Olsen tried to retain all the dynamism of the language in a multithreaded environment, but locking all the underlying dictionaries was a boat-anchor on the whole system, and slowed things down so much that he abandoned the project. The Unladen Swallow documentation indicates that early thinking on the project was that Olsen's approach would allow getting rid of the GIL, but later notes indicate that no path to a GIL-free JIT system is currently in development. The multiprocessing module provides semantics similar to threading with "freezing". Data passed between processes is "frozen" by pickling. Processes can't modify each other's code. 
Restrictive though the multiprocessing module is, it appears to be useful. It is sometimes recommended as the Pythonic approach to multi-core CPUs. This is an indication that "freezing" is not unacceptable to the user community. Most of the real-world use cases for extreme dynamism involve events that happen during startup. Configuration files are read, modules are selectively included, functions are overridden, tables of references to functions are set up, regular expressions are compiled, and the code is brought into the appropriately configured state. Then the worker threads are started and the real work starts. The "newthreading" approach allows all that. After two decades of failed attempts to remove the Global Interpreter Lock without making performance worse, it is perhaps time to take a harder look at scalable threading semantics. John Nagle Animats From pje at telecommunity.com Sat Jun 26 20:17:44 2010 From: pje at telecommunity.com (P.J. Eby) Date: Sat, 26 Jun 2010 14:17:44 -0400 Subject: [Python-Dev] bytes / unicode In-Reply-To: <87oceympcu.fsf@uwakimon.sk.tsukuba.ac.jp> References: <20100625130801.1E9A83A4099@sparrow.telecommunity.com> <8739wbnl0m.fsf@uwakimon.sk.tsukuba.ac.jp> <20100625222722.594D23A4099@sparrow.telecommunity.com> <87oceympcu.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <20100626181753.601473A4108@sparrow.telecommunity.com> At 12:42 PM 6/26/2010 +0900, Stephen J. Turnbull wrote: >What I'm saying here is that if bytes are the signal of validity, and >the stdlib functions preserve validity, then it's better to have the >stdlib functions object to unicode data as an argument. Compare the >alternative: it returns a unicode object which might get passed around >for a while before one of your functions receives it and identifies it >as unvalidated data. I still don't follow, since passing in bytes should return bytes. Returning unicode would be an error, in the case of a "polymorphic" function (per Guido).
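A minimal sketch of what "bytes in, bytes out" means for a polymorphic function, assuming a hypothetical helper `join_segments` that is not part of the stdlib. The only thing that varies with the input type is the literal used for the separator; mixing the two types is rejected rather than silently coerced.

```python
def join_segments(base, leaf):
    """Join two path-like segments with "/", preserving the input type.

    Hypothetical illustration: str arguments give a str result, bytes
    arguments give a bytes result, and a mixture raises TypeError.
    """
    if isinstance(base, bytes) != isinstance(leaf, bytes):
        raise TypeError("cannot mix bytes and str arguments")
    # Pick the separator literal that matches the argument type.
    sep = b"/" if isinstance(base, bytes) else "/"
    return base.rstrip(sep) + sep + leaf.lstrip(sep)
```

In later Python 3 releases `posixpath.join` behaves along these lines: it returns bytes for bytes arguments and raises `TypeError` when str and bytes are mixed.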
>But you agree that there are better mechanisms for validation >(although not available in Python yet), so I don't see this as an >potential obstacle to polymorphism now. Nope. I'm just saying that, given two bytestrings to url-join or path join or whatever, a polymorph should hand back a bytestring. This seems pretty uncontroversial. > > What I want is for the stdlib to create stringlike objects of a > > type determined by the types of the inputs -- > >In general this is a hard problem, though. Polymorphism, OK, one-way >tainting OK, but in general combining related types is pretty >arbitrary, and as in the encoded-bytes case, the result type often >varies depending on expectations of callers, not the types of the >data. But the caller can enforce those expectations by passing in arguments whose types do what they want in such cases, as long as the string literals used by the function don't get to override the relevant parts of the string protocol(s). The idea that I'm proposing is that the basic string and byte types should defer to "user-defined" string types for mixed type operations, so that polymorphism of string-manipulation functions is the *default* case, rather than a *special* case. This makes tainting easier to implement, as well as optimizing and other special cases (like my "source string w/file and line info", or a string with font/formatting attributes). 
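As a rough illustration of the deferral idea, the sketch below uses a hypothetical `Tainted` subclass of `str`. It relies only on behaviour Python already has: when the right-hand operand of `+` is an instance of a subclass that defines the reflected method, that method gets the first chance to handle the operation. The proposal above goes further than this (it is not limited to subclasses or to `+`), so treat the sketch as showing the flavour of the mechanism, not the proposal itself.

```python
class Tainted(str):
    """Hypothetical str subclass marking text that has not been validated."""

    def __add__(self, other):
        # Tainted + anything stays tainted.
        return Tainted(str(self) + str(other))

    def __radd__(self, other):
        # plain str + Tainted: the literal on the left defers to us.
        return Tainted(str(other) + str(self))


def label(value):
    """A 'polymorphic' helper: its own literal does not decide the result type."""
    return "value: " + value
```

With these definitions `label(Tainted("x"))` is a `Tainted` instance, while `label("x")` is a plain `str`. Other operations do not defer in this way; for example `", ".join([...])` hands back a plain `str` even when every element is `Tainted`, which is the kind of gap the message is describing.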
From doko at ubuntu.com Sat Jun 26 22:06:30 2010 From: doko at ubuntu.com (Matthias Klose) Date: Sat, 26 Jun 2010 22:06:30 +0200 Subject: [Python-Dev] versioned .so files for Python 3.2 In-Reply-To: References: <20100624115048.4fd152e3@heresy> <20100624170944.7e68ad21@heresy> <4C23D3C2.1060500@scottdial.com> <4C246E81.3020302@scottdial.com> Message-ID: <4C265DC6.4080600@ubuntu.com> On 25.06.2010 22:12, James Y Knight wrote: > > On Jun 25, 2010, at 4:53 AM, Scott Dial wrote: > >> On 6/24/2010 8:23 PM, James Y Knight wrote: >>> On Jun 24, 2010, at 5:53 PM, Scott Dial wrote: >>>> If the package has .so files that aren't compatible with other version >>>> of python, then what is the motivation for placing that in a shared >>>> location (since it can't actually be shared) >>> >>> Because python looks for .so files in the same place it looks for the >>> .py files of the same package. >> >> My suggestion was that a package that contains .so files should not be >> shared (e.g., the entire lxml package should be placed in a >> version-specific path). The motivation for this PEP was to simplify the >> installation python packages for distros; it was not to reduce the >> number of .py files on the disk. >> >> Placing .so files together does not simplify that install process in any >> way. You will still have to handle such packages in a special way. > > > This is a good point, but I think still falls short of a solution. For a > package like lxml, indeed you are correct. Since debian needs to build > it once per version, it could just put the entire package (.py files and > .so files) into a different per-python-version directory. This is what is currently done.
This will increase the size of packages by duplicating the .py files, or you have to install the .py in a common location (irrelevant to sys.path), and provide (sym)links to the expected location. A "different per-python-version directory" also has the disadvantage that file conflicts between (distribution) packages cannot be detected. > However, then you have to also consider python packages made up of > multiple distro packages -- like twisted or zope. Twisted includes some > C extensions in the core package. But then there are other twisted > modules (installed under a "twisted.foo" name) which do not include C > extensions. If the base twisted package is installed under a > version-specific directory, then all of the submodule packages need to > also be installed under the same version-specific directory (and thus > built for all versions). > > In the past, it has proven somewhat tricky to coordinate which directory > the modules for package "foo" should be installed in, because you need > to know whether *any* of the related packages includes a native ".so" > file, not just the current package. > > The converse situation, where a base package did *not* get installed > into a version-specific directory because it includes no native code, > but a submodule *does* include a ".so" file, is even trickier. I don't think that installation into different locations based on the presence of extension will work. Should a location really change if an extension is added as an optimization? Splitting a (python) package into different installation locations should be avoided. 
Matthias From doko at ubuntu.com Sat Jun 26 22:14:54 2010 From: doko at ubuntu.com (Matthias Klose) Date: Sat, 26 Jun 2010 22:14:54 +0200 Subject: [Python-Dev] versioned .so files for Python 3.2 In-Reply-To: References: <20100624115048.4fd152e3@heresy> <20100624170944.7e68ad21@heresy> <4C23D3C2.1060500@scottdial.com> <4C246E81.3020302@scottdial.com> Message-ID: <4C265FBE.9070809@ubuntu.com> On 26.06.2010 02:19, Nick Coghlan wrote: > On Sat, Jun 26, 2010 at 6:12 AM, James Y Knight wrote: >> However, then you have to also consider python packages made up of multiple >> distro packages -- like twisted or zope. Twisted includes some C extensions >> in the core package. But then there are other twisted modules (installed >> under a "twisted.foo" name) which do not include C extensions. If the base >> twisted package is installed under a version-specific directory, then all of >> the submodule packages need to also be installed under the same >> version-specific directory (and thus built for all versions). >> >> In the past, it has proven somewhat tricky to coordinate which directory the >> modules for package "foo" should be installed in, because you need to know >> whether *any* of the related packages includes a native ".so" file, not just >> the current package. >> >> The converse situation, where a base package did *not* get installed into a >> version-specific directory because it includes no native code, but a >> submodule *does* include a ".so" file, is even trickier. > > I think there are two major ways to tackle this: > - allow multiple versions of a .so file within a single directory (i.e > Barry's current suggestion) we already do this, see the naming of the extensions of a python debug build on Windows. Several distributions (Debian, Fedora, Ubuntu) do use this as well to provide extensions for python debug builds. 
> - enhanced namespace packages, allowing a single package to be spread > across multiple directories, some of which may be Python version > specific (i.e. modifications to PEP 382 to support references to > version-specific directories) this is not what I want to use in a distribution. package management systems like rpm and dpkg do handle conflicts and replacements of files pretty well, having the same file in potentially different locations in the file system doesn't help detecting conflicts and duplicate packages. Matthias From doko at ubuntu.com Sat Jun 26 22:22:29 2010 From: doko at ubuntu.com (Matthias Klose) Date: Sat, 26 Jun 2010 22:22:29 +0200 Subject: [Python-Dev] versioned .so files for Python 3.2 In-Reply-To: <20100624164637.22fd9160@heresy> References: <20100624115048.4fd152e3@heresy> <20100624135119.00b9ac5c@heresy> <20100624142830.4c859faf@limelight.wooz.org> <20100624164637.22fd9160@heresy> Message-ID: <4C266185.7080509@ubuntu.com> On 24.06.2010 22:46, Barry Warsaw wrote: > On Jun 24, 2010, at 02:28 PM, Barry Warsaw wrote: > >> On Jun 24, 2010, at 01:00 PM, Benjamin Peterson wrote: >> >>> 2010/6/24 Barry Warsaw : >>>> On Jun 24, 2010, at 10:58 AM, Benjamin Peterson wrote: >>>> >>>>> 2010/6/24 Barry Warsaw : >>>>>> Please let me know what you think. I'm happy to just commit this to the >>>>>> py3k branch if there are no objections . I don't think a new PEP is >>>>>> in order, but an update to PEP 3147 might make sense. >>>>> >>>>> How will this interact with PEP 384 if that is implemented? >>>> I'm trying to come up with something that will work immediately while PEP 384 >>>> is being adopted. >>> >>> But how will modules specify that they support multiple ABIs then? >> >> I didn't understand, so asked Benjamin for clarification in IRC. >> >> barry: if python 3.3 will only load x.3.3.so, but x.3.2.so supports >> the stable abi, will it load it? 
[14:25] >> gutworth: thanks, now i get it :) [14:26] >> gutworth: i think it should, but it wouldn't under my scheme. let me >> think about it > > So, we could say that PEP 384 compliant extension modules would get written > without a version specifier. IOW, we'd treat foo.so as using the ABI. It > would then be up to the Python runtime to throw ImportErrors if in fact we > were loading a legacy, non-PEP 384 compliant extension. Is it realistic to never break the ABI? I would think of having the ABI encoded in the file name as well, and only bump the ABI if it does change. With the "versioned .so files" proposal an ABI bump is necessary with every python version, with PEP 384 the ABI bump will be decoupled from the python version. Matthias From doko at ubuntu.com Sat Jun 26 22:25:28 2010 From: doko at ubuntu.com (Matthias Klose) Date: Sat, 26 Jun 2010 22:25:28 +0200 Subject: [Python-Dev] FHS compliance of Python installation In-Reply-To: <876318lynt.fsf_-_@benfinney.id.au> References: <20100624115048.4fd152e3@heresy> <20100624170944.7e68ad21@heresy> <4C23D3C2.1060500@scottdial.com> <876318lynt.fsf_-_@benfinney.id.au> Message-ID: <4C266238.2020107@ubuntu.com> On 25.06.2010 02:54, Ben Finney wrote: > James Y Knight writes: > >> Really, python should store the .py files in /usr/share/python/, the >> .so files in /usr/lib/x86_64-linux-gnu/python2.5-debug/, and the .pyc >> files in /var/lib/python2.5-debug. But python doesn't work like that. > > +1 > > So who's going to draft the "Filesystem Hierarchy Standard compliance" > PEP? :-) This has nothing to do with the FHS. The FHS talks about data, not code. From ctb at msu.edu Sat Jun 26 22:30:27 2010 From: ctb at msu.edu (C.
Titus Brown) Date: Sat, 26 Jun 2010 13:30:27 -0700 Subject: [Python-Dev] FHS compliance of Python installation In-Reply-To: <4C266238.2020107@ubuntu.com> References: <20100624115048.4fd152e3@heresy> <20100624170944.7e68ad21@heresy> <4C23D3C2.1060500@scottdial.com> <876318lynt.fsf_-_@benfinney.id.au> <4C266238.2020107@ubuntu.com> Message-ID: <20100626203024.GA19754@idyll.org> On Sat, Jun 26, 2010 at 10:25:28PM +0200, Matthias Klose wrote: > On 25.06.2010 02:54, Ben Finney wrote: >> James Y Knight writes: >> >>> Really, python should store the .py files in /usr/share/python/, the >>> .so files in /usr/lib/x86_64-linux-gnu/python2.5-debug/, and the .pyc >>> files in /var/lib/python2.5-debug. But python doesn't work like that. >> >> +1 >> >> So who's going to draft the "Filesystem Hierarchy Standard compliance" >> PEP? :-) > > This has nothing to do with the FHS. The FHS talks about data, not code. Really? It has some guidelines here for object files, etc., at least as of 2004. http://www.pathname.com/fhs/pub/fhs-2.3.html A quick scan suggests /usr/lib is the right place to look: http://www.pathname.com/fhs/pub/fhs-2.3.html#USRLIBLIBRARIESFORPROGRAMMINGANDPA cheers, --titus -- C. Titus Brown, ctb at msu.edu From doko at ubuntu.com Sat Jun 26 22:35:40 2010 From: doko at ubuntu.com (Matthias Klose) Date: Sat, 26 Jun 2010 22:35:40 +0200 Subject: [Python-Dev] FHS compliance of Python installation In-Reply-To: <20100626203024.GA19754@idyll.org> References: <20100624115048.4fd152e3@heresy> <20100624170944.7e68ad21@heresy> <4C23D3C2.1060500@scottdial.com> <876318lynt.fsf_-_@benfinney.id.au> <4C266238.2020107@ubuntu.com> <20100626203024.GA19754@idyll.org> Message-ID: <4C26649C.1000507@ubuntu.com> On 26.06.2010 22:30, C.
Titus Brown wrote: > On Sat, Jun 26, 2010 at 10:25:28PM +0200, Matthias Klose wrote: >> On 25.06.2010 02:54, Ben Finney wrote: >>> James Y Knight writes: >>> >>>> Really, python should store the .py files in /usr/share/python/, the >>>> .so files in /usr/lib/x86_64-linux-gnu/python2.5-debug/, and the .pyc >>>> files in /var/lib/python2.5-debug. But python doesn't work like that. >>> >>> +1 >>> >>> So who's going to draft the "Filesystem Hierarchy Standard compliance" >>> PEP? :-) >> >> This has nothing to do with the FHS. The FHS talks about data, not code. > > Really? It has some guidelines here for object files, etc., at least as > of 2004. > > http://www.pathname.com/fhs/pub/fhs-2.3.html > > A quick scan suggests /usr/lib is the right place to look: > > http://www.pathname.com/fhs/pub/fhs-2.3.html#USRLIBLIBRARIESFORPROGRAMMINGANDPA agreed for object files, but http://www.pathname.com/fhs/pub/fhs-2.3.html#USRSHAREARCHITECTUREINDEPENDENTDATA explicitly states "The /usr/share hierarchy is for all read-only architecture independent *data* files". From doko at ubuntu.com Sat Jun 26 22:45:54 2010 From: doko at ubuntu.com (Matthias Klose) Date: Sat, 26 Jun 2010 22:45:54 +0200 Subject: [Python-Dev] versioned .so files for Python 3.2 In-Reply-To: References: <20100624115048.4fd152e3@heresy> <20100624170944.7e68ad21@heresy> <4C23D3C2.1060500@scottdial.com> <4C246E81.3020302@scottdial.com> Message-ID: <4C266702.4010102@ubuntu.com> On 25.06.2010 20:58, Brett Cannon wrote: > On Fri, Jun 25, 2010 at 01:53, Scott Dial >> Placing .so files together does not simplify that install process in any >> way. You will still have to handle such packages in a special way.
You >> must still compile the package multiple times for each relevant version >> of python (with special tagging that I imagine distutils can take care >> of) and, worse yet, you have created a more tricky install than merely >> having multiple search paths (e.g., installing/uninstalling lxml for >> *one* version of python is actually more difficult in this scheme). > > This is meant to be used by distros in a programmatic fashion, so my > response is "so what?" Their package management system is going to > maintain the directory, not a person. You and I are not going to be > using this for anything. This is purely meant for Linux OS vendors > (maybe OS X) to manage their installs through their package software. > I honestly do not expect human beings to be mucking around with these > installs (and I suspect Barry doesn't either). Placing files for a distribution in a version-independent path does help distributions with handling file conflicts, detecting duplicates and moving files between different (distribution) packages. Having non-conflicting extension names is a scheme which is already used on some platforms (debug builds on Windows). The question for me is whether just a renaming of the .so files is acceptable for upstream, or whether distributors should implement this on their own, as something like:

    if ext_path.startswith('/usr/') and not ext_path.startswith('/usr/local/'):
        load_ext('foo.2.6.so')
    else:
        load_ext('foo.so')

I fear this will cause issues when e.g. virtualenv environments start copying parts from the system installation instead of symlinking it. Matthias From bugtrack at roumenpetrov.info Sat Jun 26 22:40:07 2010 From: bugtrack at roumenpetrov.info (Roumen Petrov) Date: Sat, 26 Jun 2010 23:40:07 +0300 Subject: [Python-Dev] what environment variable should contain compiler warning suppression flags?
In-Reply-To: References: Message-ID: <4C2665A7.6080601@roumenpetrov.info> Brett Cannon wrote: > I finally realized why clang has not been silencing its warnings about > unused return values: I have -Wno-unused-value set in CFLAGS which > comes before OPT (which defines -Wall) as set in PY_CFLAGS in > Makefile.pre.in. > > I could obviously set OPT in my environment, but that would override > the default OPT settings Python uses. I could put it in EXTRA_CFLAGS, > but the README says that's for stuff that tweak binary compatibility. > > So basically what I am asking is what environment variable should I > use? If CFLAGS is correct then does anyone have any issues if I change > the order of things for PY_CFLAGS in the Makefile so that CFLAGS comes > after OPT? It is not important to me as flags set to BASECFLAGS, CFLAGS, OPT or EXTRA_CFLAGS will set makefile macros CFLAGS and after distribution python distutil will use them to build extension modules. So all variable are equal for builds. Also after configure without OPT variable set we could check what script select for build platform and to rerun configure with OPT+own_flags set on command line (! ;) ) . Roumen From foom at fuhm.net Sat Jun 26 23:10:42 2010 From: foom at fuhm.net (James Y Knight) Date: Sat, 26 Jun 2010 17:10:42 -0400 Subject: [Python-Dev] FHS compliance of Python installation In-Reply-To: <4C26649C.1000507@ubuntu.com> References: <20100624115048.4fd152e3@heresy> <20100624170944.7e68ad21@heresy> <4C23D3C2.1060500@scottdial.com> <876318lynt.fsf_-_@benfinney.id.au> <4C266238.2020107@ubuntu.com> <20100626203024.GA19754@idyll.org> <4C26649C.1000507@ubuntu.com> Message-ID: On Jun 26, 2010, at 4:35 PM, Matthias Klose wrote: > On 26.06.2010 22:30, C. 
Titus Brown wrote: >> On Sat, Jun 26, 2010 at 10:25:28PM +0200, Matthias Klose wrote: >>> On 25.06.2010 02:54, Ben Finney wrote: >>>> James Y Knight writes: >>>> >>>>> Really, python should store the .py files in /usr/share/python/, >>>>> the >>>>> .so files in /usr/lib/x86_64-linux-gnu/python2.5-debug/, and >>>>> the .pyc >>>>> files in /var/lib/python2.5-debug. But python doesn't work like >>>>> that. >>>> >>>> +1 >>>> >>>> So who's going to draft the "Filesystem Hierarchy Standard >>>> compliance" >>>> PEP? :-) >>> >>> This has nothing to do with the FHS. The FHS talks about data, >>> not code. >> >> Really? It has some guidelines here for object files, etc., at >> least as >> of 2004. >> >> http://www.pathname.com/fhs/pub/fhs-2.3.html >> >> A quick scan suggests /usr/lib is the right place to look: >> >> http://www.pathname.com/fhs/pub/fhs-2.3.html#USRLIBLIBRARIESFORPROGRAMMINGANDPA > > agreed for object files, but > http://www.pathname.com/fhs/pub/fhs-2.3.html#USRSHAREARCHITECTUREINDEPENDENTDATA > explicitly states "The /usr/share hierarchy is for all read-only > architecture independent *data* files". I always figured the "read-only architecture independent" bit was the important part there, and "code is data". Emacs's el files go into /usr/share/emacs, for instance. James From tjreedy at udel.edu Sun Jun 27 00:11:03 2010 From: tjreedy at udel.edu (Terry Reedy) Date: Sat, 26 Jun 2010 18:11:03 -0400 Subject: [Python-Dev] thoughts on the bytes/string discussion In-Reply-To: <51EFE211-DBCA-497E-9BC5-CC0D2256173E@twistedmatrix.com> References: <11597.1277401099@parc.com> <96ADD4CE-3A24-45A7-B219-2940195DC3D0@twistedmatrix.com> <51EFE211-DBCA-497E-9BC5-CC0D2256173E@twistedmatrix.com> Message-ID: The several posts in this and other threads got me thinking about text versus number computing (which I am more familiar with). For numbers, we have in Python three builtins, the general purpose ints and floats and the more specialized complex.
Two other rational types can be imported for specialized uses. And then there are 3rd-party libraries like mpz and numpy with more number and array of number types. What makes these all potentially work together is the special method system, including, in particular, the rather complete set of __rxxx__ number methods. The latter allow non-commutative operations to be mixed either way and ease mixed commutative operations. For text, we have general purpose str and encoded bytes (and bytearry). I think these are sufficient for general use and I am not sure there should even be anything else in the stdlib. But I think it should be possible to experiment with and use specialized 3rd-party text classes just as one can with number classes. I can imagine that inter-operation, when appropriate, might work better with addition of a couple of missing __rxxx__ methods, such as the mentioned __rcontains__. Although adding such would affect the implementation of a core syntax feature, it would not affect syntax as such as seen by the user. -- Terry Jan Reedy From brett at python.org Sun Jun 27 00:30:43 2010 From: brett at python.org (Brett Cannon) Date: Sat, 26 Jun 2010 15:30:43 -0700 Subject: [Python-Dev] what environment variable should contain compiler warning suppression flags? In-Reply-To: References: Message-ID: On Wed, Jun 23, 2010 at 14:53, Brett Cannon wrote: > I finally realized why clang has not been silencing its warnings about > unused return values: I have -Wno-unused-value set in CFLAGS which > comes before OPT (which defines -Wall) as set in PY_CFLAGS in > Makefile.pre.in. > > I could obviously set OPT in my environment, but that would override > the default OPT settings Python uses. I could put it in EXTRA_CFLAGS, > but the README says that's for stuff that tweak binary compatibility. > > So basically what I am asking is what environment variable should I > use? 
If CFLAGS is correct then does anyone have any issues if I change > the order of things for PY_CFLAGS in the Makefile so that CFLAGS comes > after OPT? > Since no one objected I swapped the order in r82259. In case anyone else uses clang to compile Python, this means that -Wno-unused-value will now work to silence the warning about unused return values that is caused by some macros. Probably using -Wno-empty-body is also good to avoid all the warnings triggered by the UCS4 macros in cjkcodecs. From scott+python-dev at scottdial.com Sun Jun 27 00:50:27 2010 From: scott+python-dev at scottdial.com (Scott Dial) Date: Sat, 26 Jun 2010 18:50:27 -0400 Subject: [Python-Dev] versioned .so files for Python 3.2 In-Reply-To: <4C265DC6.4080600@ubuntu.com> References: <20100624115048.4fd152e3@heresy> <20100624170944.7e68ad21@heresy> <4C23D3C2.1060500@scottdial.com> <4C246E81.3020302@scottdial.com> <4C265DC6.4080600@ubuntu.com> Message-ID: <4C268433.30405@scottdial.com> On 6/26/2010 4:06 PM, Matthias Klose wrote: > On 25.06.2010 22:12, James Y Knight wrote: >> On Jun 25, 2010, at 4:53 AM, Scott Dial wrote: >>> Placing .so files together does not simplify that install process in any >>> way. You will still have to handle such packages in a special way. >> >> This is a good point, but I think still falls short of a solution. For a >> package like lxml, indeed you are correct. Since debian needs to build >> it once per version, it could just put the entire package (.py files and >> .so files) into a different per-python-version directory. > > This is what is currently done. This will increase the size of packages > by duplicating the .py files, or you have to install the .py in a common > location (irrelevant to sys.path), and provide (sym)links to the > expected location. "This is what is currently done" and "provide (sym)links to the expected location" are conflicting statements. 
If you are symlinking .py files from a shared location, then that is not the same as "just install the package into a version-specific location". What motivation is there for preferring symlinks? Who cares if a ditro package install yields duplicate .py files? Nor am I motivated by having to carry duplicate .py files in a distribution package (I imagine the compression of duplicate .py files is amazing). > A "different per-python-version directory" also has the disadvantage > that file conflicts between (distribution) packages cannot be detected. Why? That sounds like a broken tool, maybe I am naive, please explain. If two packages install /usr/lib/python2.6/foo.so that should be just as detectable two installing /usr/lib/python-shared/foo.cpython-26.so If you *must* compile .so files for every supported version of python at packaging time, then you are already saying the set of python versions is known. I fail to see the difference between a package that installs .py and .so files into many directories than having many .so files in a single directory; except that many directories *already* works. The only gain I can see is that you save duplicate .py files in the package and on the filesystem, and I don't feel that gain alone warrants this fundamental change. I would appreciate a proper explanation of why/how a single directory is better for your distribution. Also, I haven't heard anyone that wasn't using debian tools chime in with support for any of this, so I would like to know how this can help RPMs and ebuilds and the like. > I don't think that installation into different locations based on the > presence of extension will work. Should a location really change if an > extension is added as an optimization? Splitting a (python) package > into different installation locations should be avoided. 
I'm not sure why changing paths would matter; any package that writes data in its install location would be considered broken by your distro already, so what harm is there in having the packaging tool move it later? Your tool will remove the old path and place it in a new path. All of these shenanigans seem to manifest from your distro's python-support/-central design, which seems to be entirely motivated by reducing duplicate files and *not* simplifying the packaging. While this plan works rather well with .py files, the devil is in the details. I don't think Python should be getting involved in what I believe is a flawed design. What happens to the distro packaging if a python package splits the codebase between 2.x and 3.x (meaning they have distinct .py files)? As someone else mentioned, how is virtualenv going to interact with packages that install like this? -- Scott Dial scott at scottdial.com scodial at cs.indiana.edu From mal at egenix.com Sun Jun 27 01:37:02 2010 From: mal at egenix.com (M.-A. Lemburg) Date: Sun, 27 Jun 2010 01:37:02 +0200 Subject: [Python-Dev] what environment variable should contain compiler warning suppression flags? In-Reply-To: References: Message-ID: <4C268F1E.5070506@egenix.com> Brett Cannon wrote: > On Wed, Jun 23, 2010 at 14:53, Brett Cannon wrote: >> I finally realized why clang has not been silencing its warnings about >> unused return values: I have -Wno-unused-value set in CFLAGS which >> comes before OPT (which defines -Wall) as set in PY_CFLAGS in >> Makefile.pre.in. >> >> I could obviously set OPT in my environment, but that would override >> the default OPT settings Python uses. I could put it in EXTRA_CFLAGS, >> but the README says that's for stuff that tweak binary compatibility. >> >> So basically what I am asking is what environment variable should I >> use? If CFLAGS is correct then does anyone have any issues if I change >> the order of things for PY_CFLAGS in the Makefile so that CFLAGS comes >> after OPT? 
>> > > Since no one objected I swapped the order in r82259. In case anyone > else uses clang to compile Python, this means that -Wno-unused-value > will now work to silence the warning about unused return values that > is caused by some macros. Probably using -Wno-empty-body is also good > to avoid all the warnings triggered by the UCS4 macros in cjkcodecs. I think you need to come up with a different solution and revert the change... OPT has historically been the only variable to use for adjusting the Python C compiler settings. As the name implies this was usually used to adjust the optimizer settings, including raising the optimization level from the default or disabling it. With your change CFLAGS will always override OPT and thus any optimization definitions made in OPT will no longer have an effect. Note that CFLAGS defines -O2 on many platforms. In your particular case, you should try setting OPT to "... -Wno-unused-value ..." (ie. replace -Wall with your setting). -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Jun 27 2010) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ 2010-07-19: EuroPython 2010, Birmingham, UK 21 days to go ::: Try our new mxODBC.Connect Python Database Interface for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From brett at python.org Sun Jun 27 02:13:20 2010 From: brett at python.org (Brett Cannon) Date: Sat, 26 Jun 2010 17:13:20 -0700 Subject: [Python-Dev] what environment variable should contain compiler warning suppression flags? 
In-Reply-To: <4C268F1E.5070506@egenix.com> References: <4C268F1E.5070506@egenix.com> Message-ID: On Sat, Jun 26, 2010 at 16:37, M.-A. Lemburg wrote: > Brett Cannon wrote: >> On Wed, Jun 23, 2010 at 14:53, Brett Cannon wrote: >>> I finally realized why clang has not been silencing its warnings about >>> unused return values: I have -Wno-unused-value set in CFLAGS which >>> comes before OPT (which defines -Wall) as set in PY_CFLAGS in >>> Makefile.pre.in. >>> >>> I could obviously set OPT in my environment, but that would override >>> the default OPT settings Python uses. I could put it in EXTRA_CFLAGS, >>> but the README says that's for stuff that tweak binary compatibility. >>> >>> So basically what I am asking is what environment variable should I >>> use? If CFLAGS is correct then does anyone have any issues if I change >>> the order of things for PY_CFLAGS in the Makefile so that CFLAGS comes >>> after OPT? >>> >> >> Since no one objected I swapped the order in r82259. In case anyone >> else uses clang to compile Python, this means that -Wno-unused-value >> will now work to silence the warning about unused return values that >> is caused by some macros. Probably using -Wno-empty-body is also good >> to avoid all the warnings triggered by the UCS4 macros in cjkcodecs. > > I think you need to come up with a different solution and revert > the change... > > OPT has historically been the only variable to use for > adjusting the Python C compiler settings. Just found the relevant section in the README. > > As the name implies this was usually used to adjust the > optimizer settings, including raising the optimization level > from the default or disabling it. It meant optional to me, not optimization. I hate abbreviations sometimes. > > With your change CFLAGS will always override OPT and thus > any optimization definitions made in OPT will no longer > have an effect. 
That was the point; OPT defines defaults through configure.in and I simply wanted to add to those instead of having OPT completely overwritten by me. > > Note that CFLAGS defines -O2 on many platforms. So then wouldn't that mean they want that to be the optimization level? Or is the historical reason that default exists is so that some default exists but to expect the application to override as desired? > > In your particular case, you should try setting OPT to > "... -Wno-unused-value ..." (ie. replace -Wall with your > setting). So what is CFLAGS for then? ``configure -h`` says it's for "C compiler flags"; that's extremely ambiguous. And it doesn't help that OPT is not mentioned by ``configure -h`` as that is what I have always gone by to know what flags are available for compilation. -Brett > > -- > Marc-Andre Lemburg > eGenix.com > > Professional Python Services directly from the Source (#1, Jun 27 2010) >>>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ > ________________________________________________________________________ > 2010-07-19: EuroPython 2010, Birmingham, UK 21 days to go > > ::: Try our new mxODBC.Connect Python Database Interface for free ! :::: > > > eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 > D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg > Registered at Amtsgericht Duesseldorf: HRB 46611 >
http://www.egenix.com/company/contact/ > From ncoghlan at gmail.com Sun Jun 27 04:43:23 2010 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 27 Jun 2010 12:43:23 +1000 Subject: [Python-Dev] bytes / unicode In-Reply-To: <20100626181753.601473A4108@sparrow.telecommunity.com> References: <20100625130801.1E9A83A4099@sparrow.telecommunity.com> <8739wbnl0m.fsf@uwakimon.sk.tsukuba.ac.jp> <20100625222722.594D23A4099@sparrow.telecommunity.com> <87oceympcu.fsf@uwakimon.sk.tsukuba.ac.jp> <20100626181753.601473A4108@sparrow.telecommunity.com> Message-ID: On Sun, Jun 27, 2010 at 4:17 AM, P.J. Eby wrote: > The idea that I'm proposing is that the basic string and byte types should > defer to "user-defined" string types for mixed type operations, so that > polymorphism of string-manipulation functions is the *default* case, rather > than a *special* case. This makes tainting easier to implement, as well as > optimizing and other special cases (like my "source string w/file and line > info", or a string with font/formatting attributes). Rather than building this into the base string type, perhaps it would be better (at least initially) to add in a polymorphic str subtype that worked along the following lines:
1. Has an encoded argument in the constructor (e.g. poly_str("/", encoded=b"/"))
2. If given objects with an encode() method, assumes they're strings and uses its own parent class methods
3. If given objects with a decode() method, assumes they're encoded and delegates to the encoded attribute
str/bytes agnostic functions would need to invoke poly_str deliberately, while bytes-only and text-only algorithms could just use the appropriate literals.
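A minimal sketch of the poly_str idea in the three points above, assuming Python 3 semantics. The class body, the `_pick` helper and the choice of which methods to override are illustrative assumptions; this is not an existing stdlib type and not a complete design:

```python
class poly_str(str):
    """Hypothetical str subtype that carries a pre-encoded bytes twin."""

    def __new__(cls, text, encoded):
        self = super().__new__(cls, text)
        self.encoded = encoded  # bytes form used when mixing with encoded data
        return self

    def _pick(self, other):
        # Objects with decode() are treated as encoded data: use the bytes twin.
        if hasattr(other, "decode"):
            return self.encoded
        # Objects with encode() are treated as text: use plain str behaviour.
        if hasattr(other, "encode"):
            return str(self)
        raise TypeError("expected a str-like or bytes-like operand")

    def __add__(self, other):
        return self._pick(other) + other

    def __radd__(self, other):
        return other + self._pick(other)

    def join(self, parts):
        parts = list(parts)
        # Decide between text and bytes from the first item; empty input is text.
        sample = parts[0] if parts else ""
        return self._pick(sample).join(parts)


SEP = poly_str("/", encoded=b"/")

def join_path(segments):
    """str/bytes agnostic helper: the result type follows the input type."""
    return SEP.join(segments)
```

With this sketch, `join_path(["a", "b"])` yields text and `join_path([b"a", b"b"])` yields bytes, while every other str method would still need the same treatment, which is where the design effort lies.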
Third party types would be supported to some degree (by having either encode or decode methods), although they could still run into trouble with some operations (while full support for third party strings and byte sequence implementations is an interesting idea, I think it's overkill for the specific problem of making it easier to write str/bytes agnostic functions for tasks like URL parsing). Regards, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Sun Jun 27 04:59:07 2010 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 27 Jun 2010 12:59:07 +1000 Subject: [Python-Dev] thoughts on the bytes/string discussion In-Reply-To: References: <11597.1277401099@parc.com> <96ADD4CE-3A24-45A7-B219-2940195DC3D0@twistedmatrix.com> <51EFE211-DBCA-497E-9BC5-CC0D2256173E@twistedmatrix.com> Message-ID: On Sun, Jun 27, 2010 at 8:11 AM, Terry Reedy wrote: > I can imagine that inter-operation, when appropriate, might work better with > addition of a couple of missing __rxxx__ methods, such as the mentioned > __rcontains__. Although adding such would affect the implementation of a > core syntax feature, it would not affect syntax as such as seen by the user. The problem with strings isn't really the binary operations like __contains__ - adding __rcontains__ would be a fairly simple extrapolation of the existing approaches. Where it gets really messy for strings is the fact that whereas invoking named methods directly on numbers is rare, invoking them on strings is very common, and some of those methods (e.g. split(), join(), __mod__()) allow or require an iterable rather than a single object. This extends the range of use cases to be covered beyond those with syntactic support to potentially include all string methods that take arguments. Creating minimally surprising semantics for the methods which accept iterables is also rather challenging.
It's an interesting idea, but I think it's overkill for the specific problem of making it easier to perform more text-like manipulations in a bytes-only domain. Cheers, NIck. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From pje at telecommunity.com Sun Jun 27 05:49:11 2010 From: pje at telecommunity.com (P.J. Eby) Date: Sat, 26 Jun 2010 23:49:11 -0400 Subject: [Python-Dev] bytes / unicode In-Reply-To: References: <20100625130801.1E9A83A4099@sparrow.telecommunity.com> <8739wbnl0m.fsf@uwakimon.sk.tsukuba.ac.jp> <20100625222722.594D23A4099@sparrow.telecommunity.com> <87oceympcu.fsf@uwakimon.sk.tsukuba.ac.jp> <20100626181753.601473A4108@sparrow.telecommunity.com> Message-ID: <20100627034922.31A663A4108@sparrow.telecommunity.com> At 12:43 PM 6/27/2010 +1000, Nick Coghlan wrote: >While full support for third party strings and >byte sequence implementations is an interesting idea, I think it's >overkill for the specific problem of making it easier to write >str/bytes agnostic functions for tasks like URL parsing. OTOH, to write your partial implementation is almost as complex - it still must take into account joining and formatting, and so by that point, you've just proposed a new protocol for coercion... so why not just make the coercion protocol explicit in the first place, rather than hardwiring a third type's worth of special cases? Remember, bytes and strings already have to detect mixed-type operations. If there was an API for that, then the hardcoded special cases would just be replaced, or supplemented with type slot checks and calls after the special cases. To put it another way, if you already have two types special-casing their interactions with each other, then rather than add a *third* type to that mix, maybe it's time to have a protocol instead, so that the types that care can do the special-casing themselves, and you generalize to N user types. 
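For comparison, a minimal sketch of how far the existing reflected-operator machinery already carries a user-defined string type in Python 3, and where it stops. The `Tainted` class is hypothetical and only illustrates the kind of deferral under discussion; it is not part of any proposed API:

```python
class Tainted:
    """Hypothetical string-like wrapper marking text as untrusted."""

    def __init__(self, text):
        self.text = text

    def __add__(self, other):
        return Tainted(self.text + str(other))

    def __radd__(self, other):
        # Reached for `plain_str + Tainted(...)`: str does not handle the
        # mix itself, so the right-hand operand gets to decide the result.
        return Tainted(str(other) + self.text)

    def __str__(self):
        return self.text


# Operators already defer to the user-defined type via the reflected method.
mixed = "user=" + Tainted("bob")

# Named methods do not defer: str.join() only accepts real str items.
try:
    joined = ", ".join(["alice", Tainted("bob")])
except TypeError:
    joined = None
```

Here `mixed` stays a `Tainted` instance, whereas the `join()` call is rejected outright; closing that gap for named methods is what a general coercion protocol would have to cover.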
(Btw, those who are saying that the resulting potential for N*N interaction makes the feature unworkable seem to be overlooking metaclasses and custom numeric types -- two Python features that in principle have the exact same problem, when you use them beyond a certain scope. At least with those features, though, you can generally mix your user-defined metaclasses or numeric types with the Python-supplied basic ones and call arbitrary Python functions on them, without as much heartbreak as you'll get with a from-scratch stringlike object.) All that having been said, a new protocol probably falls under the heading of the language moratorium, unless it can be considered "new methods on builtins"? (But that seems like a stretch even to me.) I just hate the idea that functions taking strings should have to be *rewritten* to be explicitly type-agnostic. It seems *so* un-Pythonic... like if all the bitmasking functions you'd ever written using 32-bit int constants had to be rewritten just because we added longs to the language, and you had to upcast them to be compatible or something. Sounds too much like C or Java or some other non-Python language, where dynamism and polymorphy are the special case, instead of the general rule. From jyasskin at gmail.com Sun Jun 27 07:46:24 2010 From: jyasskin at gmail.com (Jeffrey Yasskin) Date: Sat, 26 Jun 2010 22:46:24 -0700 Subject: [Python-Dev] what environment variable should contain compiler warning suppression flags? In-Reply-To: <4C268F1E.5070506@egenix.com> References: <4C268F1E.5070506@egenix.com> Message-ID: On Sat, Jun 26, 2010 at 4:37 PM, M.-A. Lemburg wrote: > Brett Cannon wrote: >> On Wed, Jun 23, 2010 at 14:53, Brett Cannon wrote: >>> I finally realized why clang has not been silencing its warnings about >>> unused return values: I have -Wno-unused-value set in CFLAGS which >>> comes before OPT (which defines -Wall) as set in PY_CFLAGS in >>> Makefile.pre.in. 
>>> >>> I could obviously set OPT in my environment, but that would override >>> the default OPT settings Python uses. I could put it in EXTRA_CFLAGS, >>> but the README says that's for stuff that tweak binary compatibility. >>> >>> So basically what I am asking is what environment variable should I >>> use? If CFLAGS is correct then does anyone have any issues if I change >>> the order of things for PY_CFLAGS in the Makefile so that CFLAGS comes >>> after OPT? >>> >> >> Since no one objected I swapped the order in r82259. In case anyone >> else uses clang to compile Python, this means that -Wno-unused-value >> will now work to silence the warning about unused return values that >> is caused by some macros. Probably using -Wno-empty-body is also good >> to avoid all the warnings triggered by the UCS4 macros in cjkcodecs. > > I think you need to come up with a different solution and revert > the change... > > OPT has historically been the only variable to use for > adjusting the Python C compiler settings. > > As the name implies this was usually used to adjust the > optimizer settings, including raising the optimization level > from the default or disabling it. > > With your change CFLAGS will always override OPT and thus > any optimization definitions made in OPT will no longer > have an effect. > > Note that CFLAGS defines -O2 on many platforms. > > In your particular case, you should try setting OPT to > "... -Wno-unused-value ..." (ie. replace -Wall with your > setting). The python configure environment variables are really confused. If OPT is intended to be user-overridden for optimization settings, it shouldn't be used to set -Wall and -Wstrict-prototypes. If it's intended to set warning options, it shouldn't also set optimization options. Setting the user-visible customization option on the configure command line shouldn't stomp unrelated defaults. 
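The ordering concern quoted above can be sketched in a few lines of Python. This is a toy model, not part of Python's build system: it assumes the gcc rule that when several -O options appear on one command line the last one is the one that takes effect, and the flag lists and the function name are illustrative assumptions.

```python
def effective_opt_level(flags):
    """Return the last -O<level> flag on a command line, or None.

    Models the assumed gcc rule that the last -O option wins.
    """
    level = None
    for flag in flags:
        if flag.startswith("-O"):
            level = flag
    return level


# Illustrative values only: a debug-style OPT and an autoconf-style CFLAGS.
opt = ["-g", "-O0", "-Wall"]
cflags = ["-g", "-O2"]

cflags_then_opt = effective_opt_level(cflags + opt)  # OPT comes last
opt_then_cflags = effective_opt_level(opt + cflags)  # CFLAGS comes last
```

With CFLAGS placed before OPT the model yields `-O0`; with the order swapped it yields `-O2`, which is the kind of silent override the objection describes.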
In configure-based systems, CFLAGS is traditionally (http://sources.redhat.com/automake/automake.html#Flag-Variables-Ordering) the way to tack options onto the end of the command line. Python breaks this by threading flags through CFLAGS in the makefile, which means they all get stomped if the user sets CFLAGS on the make command line. We should instead use another spelling ("CFlags"?) for the internal variable, and append $(CFLAGS) to it. AC_PROG_CC is the macro that sets CFLAGS to -g -O2 on gcc-based systems (http://www.gnu.org/software/hello/manual/autoconf/C-Compiler.html#index-AC_005fPROG_005fCC-842). If Python's configure.in sets an otherwise-empty CFLAGS to -g before calling AC_PROG_CC, AC_PROG_CC won't change it. Or we could just preserve the users CFLAGS setting across AC_PROG_CC regardless of whether it's set, to let the user set CFLAGS on the configure line without stomping any defaults. From ncoghlan at gmail.com Sun Jun 27 07:53:59 2010 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 27 Jun 2010 15:53:59 +1000 Subject: [Python-Dev] bytes / unicode In-Reply-To: <20100627034922.31A663A4108@sparrow.telecommunity.com> References: <20100625130801.1E9A83A4099@sparrow.telecommunity.com> <8739wbnl0m.fsf@uwakimon.sk.tsukuba.ac.jp> <20100625222722.594D23A4099@sparrow.telecommunity.com> <87oceympcu.fsf@uwakimon.sk.tsukuba.ac.jp> <20100626181753.601473A4108@sparrow.telecommunity.com> <20100627034922.31A663A4108@sparrow.telecommunity.com> Message-ID: On Sun, Jun 27, 2010 at 1:49 PM, P.J. Eby wrote: > I just hate the idea that functions taking strings should have to be > *rewritten* to be explicitly type-agnostic. It seems *so* un-Pythonic... > like if all the bitmasking functions you'd ever written using 32-bit int > constants had to be rewritten just because we added longs to the language, > and you had to upcast them to be compatible or something.
Sounds too much > like C or Java or some other non-Python language, where dynamism and > polymorphy are the special case, instead of the general rule.

The difference is that we have three classes of algorithm here:
- those that work only on octet sequences
- those that work only on character sequences
- those that can work on either

Python 2 lumped all 3 classes of algorithm together through the multi-purpose 8-bit str type. The unicode type provided some scope to separate out the second category, but the divisions were rather blurry. Python 3 forces the first two to be separated by using either octets (bytes/bytearray) or characters (str). There are a *very small* number of APIs where it is appropriate to be polymorphic, but this is currently difficult due to the need to supply literals of the appropriate type for the objects being operated on. This isn't ever going to happen automagically due to the need to explicitly provide two literals (one for octet sequences, one for character sequences).

The virtues of a separate poly_str type are that:
1. It can be simple and implemented in Python, dispatching to str or bytes as appropriate (probably in the strings module)
2. No chance of impacting the performance of the core interpreter (as builtins are not affected)
3. Lower impact if it turns out to have been a bad idea

We could talk about this even longer, but the most effective way forward is going to be a patch that improves the URL parsing situation. Cheers, Nick.
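The "two literals" difficulty can be seen in a small sketch. This is only an illustration of the third class of algorithm (one that can work on either type); it is not the poly_str proposal and not the stdlib's URL-parsing code, and the function name `split_scheme` is hypothetical.

```python
def split_scheme(url):
    """Split 'scheme:rest' into (scheme, rest) for str or bytes input.

    The separator literal has to be supplied once per type, which is the
    duplication a polymorphic helper type would aim to hide.
    """
    if isinstance(url, str):
        sep = ":"
    elif isinstance(url, (bytes, bytearray)):
        sep = b":"
    else:
        raise TypeError("expected str or bytes, got %s" % type(url).__name__)

    scheme, found, rest = url.partition(sep)
    if not found:
        # url[:0] is an empty value of the same type as the input.
        return url[:0], url
    return scheme, rest


text_parts = split_scheme("http://python.org/")
byte_parts = split_scheme(b"http://python.org/")
```

The result type follows the argument type: `text_parts` holds str values and `byte_parts` holds bytes values, with no implicit decoding or encoding in between.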
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From solipsis at pitrou.net Sun Jun 27 11:10:59 2010 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sun, 27 Jun 2010 11:10:59 +0200 Subject: [Python-Dev] bytes / unicode References: <20100625130801.1E9A83A4099@sparrow.telecommunity.com> <8739wbnl0m.fsf@uwakimon.sk.tsukuba.ac.jp> <20100625222722.594D23A4099@sparrow.telecommunity.com> <87oceympcu.fsf@uwakimon.sk.tsukuba.ac.jp> <20100626181753.601473A4108@sparrow.telecommunity.com> <20100627034922.31A663A4108@sparrow.telecommunity.com> Message-ID: <20100627111059.49cdb698@pitrou.net> On Sat, 26 Jun 2010 23:49:11 -0400 "P.J. Eby" wrote: > > Remember, bytes and strings already have to detect mixed-type > operations. Not in Python 3. They just raise a TypeError on bad ("mixed-type") arguments. Regards Antoine. From greg.ewing at canterbury.ac.nz Sun Jun 27 11:48:22 2010 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sun, 27 Jun 2010 21:48:22 +1200 Subject: [Python-Dev] thoughts on the bytes/string discussion In-Reply-To: References: <11597.1277401099@parc.com> <96ADD4CE-3A24-45A7-B219-2940195DC3D0@twistedmatrix.com> <4C25B319.8040804@canterbury.ac.nz> Message-ID: <4C271E66.5050902@canterbury.ac.nz> Stefan Behnel wrote: > Greg Ewing, 26.06.2010 09:58: > >> Would there be any sanity in having an option to compile >> Python with UTF-8 as the internal string representation? > > It would break Py_UNICODE, because the internal size of a unicode > character would no longer be fixed. It's not fixed anyway with the 2-char build -- some characters are represented using a pair of surrogates. -- Greg From g.brandl at gmx.net Sun Jun 27 11:41:56 2010 From: g.brandl at gmx.net (Georg Brandl) Date: Sun, 27 Jun 2010 11:41:56 +0200 Subject: [Python-Dev] Adopt A Demo [was: Signs of neglect?] In-Reply-To: References: Message-ID: Am 26.06.2010 00:38, schrieb Steve Holden: > I was pretty stunned when I tried this. 
Remember that the Tools > subdirectory is distributed with Windows, so this means we got through > almost two releases without anyone realizing that 2to3 does not appear > to have touched this code. > > Yes, I have: http://bugs.python.org/issue9083 > > When's 3.2 due out? The alpha stage is beginning next week; still enough time to fix the Tools and Demos. I can do some of the work, however, if I have to do it all, I'll just throw out the majority of that stuff. So -- if every dev "adopted" a Tool or Demo, that would be quite a manageable piece of work, and maybe a few demos can be brought up to scratch instead of be deleted. I'll go ahead and promise to care for the "Demo/classes" subdir. Georg -- Thus spake the Lord: Thou shalt indent with four spaces. No more, no less. Four shall be the number of spaces thou shalt indent, and the number of thy indenting shall be four. Eight shalt thou not indent, nor either indent thou two, excepting that thou then proceed to four. Tabs are right out. From g.brandl at gmx.net Sun Jun 27 11:44:31 2010 From: g.brandl at gmx.net (Georg Brandl) Date: Sun, 27 Jun 2010 11:44:31 +0200 Subject: [Python-Dev] #Python3 ! ? (was Python Library Support in 3.x) In-Reply-To: References: <20100618050712.GC20639@thorne.id.au> <20100619121256.2412.244251859.divmod.xquotient.130@localhost.localdomain> <63486FB9-866D-47D3-AF04-0A621AB416A4@ikanobori.jp> Message-ID: Am 22.06.2010 01:01, schrieb Terry Reedy: > On 6/21/2010 3:59 PM, Steve Holden wrote: >> Terry Reedy wrote: >>> On 6/21/2010 8:33 AM, Nick Coghlan wrote: >>> >>>> P.S. (We're going to have a tough decision to make somewhere along the >>>> line where docs.python.org is concerned, too - when do we flick the >>>> switch and make a 3.x version of the docs the default? >>> >>> Easy. When 3.2 is released. When 2.7 is released, 3.2 becomes 'trunk'. >>> Trunk released always take over docs.python.org. 
To do otherwise would >>> be to say that 3.2 is not a real trunk release and not yet ready for >>> real use -- a major slam. >>> >>> Actually, I thought this was already discussed and decided ;-). >>> >> This also gives the 2.7 release it's day in the sun before relegation to >> maintenance status. > > Every new version (except 3.0 and 3.1) has gone to maintenance status > *and* becomes the featured release on docs.python.org the day it was > released. 2.7 would just spend less time as the featured release on > that page. I'm not sure 3.2 should take over in December just yet. (There's also docs3.python.org that always lands at the latest 3.x documentation). However, there will be enough time to discuss this when 3.2 is actually about to be released. Georg -- Thus spake the Lord: Thou shalt indent with four spaces. No more, no less. Four shall be the number of spaces thou shalt indent, and the number of thy indenting shall be four. Eight shalt thou not indent, nor either indent thou two, excepting that thou then proceed to four. Tabs are right out. From dickinsm at gmail.com Sun Jun 27 11:57:08 2010 From: dickinsm at gmail.com (Mark Dickinson) Date: Sun, 27 Jun 2010 10:57:08 +0100 Subject: [Python-Dev] Adopt A Demo [was: Signs of neglect?] In-Reply-To: References: Message-ID: On Sun, Jun 27, 2010 at 10:41 AM, Georg Brandl wrote: > So -- if every dev "adopted" a Tool or Demo, that would be quite a > manageable piece of work, and maybe a few demos can be brought up > to scratch instead of be deleted. > > I'll go ahead and promise to care for the "Demo/classes" subdir. Bagsy the Demo/parser subdirectory. Fixing up unparse.py looks like it could be fun. 
Mark From eric at trueblade.com Sun Jun 27 12:53:00 2010 From: eric at trueblade.com (Eric Smith) Date: Sun, 27 Jun 2010 06:53:00 -0400 Subject: [Python-Dev] thoughts on the bytes/string discussion In-Reply-To: <4C271E66.5050902@canterbury.ac.nz> References: <11597.1277401099@parc.com> <96ADD4CE-3A24-45A7-B219-2940195DC3D0@twistedmatrix.com> <4C25B319.8040804@canterbury.ac.nz> <4C271E66.5050902@canterbury.ac.nz> Message-ID: <4C272D8C.6010406@trueblade.com> On 6/27/2010 5:48 AM, Greg Ewing wrote: > Stefan Behnel wrote: >> Greg Ewing, 26.06.2010 09:58: >> >>> Would there be any sanity in having an option to compile >>> Python with UTF-8 as the internal string representation? >> >> It would break Py_UNICODE, because the internal size of a unicode >> character would no longer be fixed. > > It's not fixed anyway with the 2-char build -- some > characters are represented using a pair of surrogates. > But isn't this currently ignored everywhere in python's code? Eric. From stephen at xemacs.org Sun Jun 27 16:03:06 2010 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Sun, 27 Jun 2010 23:03:06 +0900 Subject: [Python-Dev] bytes / unicode In-Reply-To: <20100626181753.601473A4108@sparrow.telecommunity.com> References: <20100625130801.1E9A83A4099@sparrow.telecommunity.com> <8739wbnl0m.fsf@uwakimon.sk.tsukuba.ac.jp> <20100625222722.594D23A4099@sparrow.telecommunity.com> <87oceympcu.fsf@uwakimon.sk.tsukuba.ac.jp> <20100626181753.601473A4108@sparrow.telecommunity.com> Message-ID: <877hlkmv39.fsf@uwakimon.sk.tsukuba.ac.jp> P.J. Eby writes: > At 12:42 PM 6/26/2010 +0900, Stephen J. Turnbull wrote: > >What I'm saying here is that if bytes are the signal of validity, and > >the stdlib functions preserve validity, then it's better to have the > >stdlib functions object to unicode data as an argument. 
Compare the > >alternative: it returns a unicode object which might get passed around > >for a while before one of your functions receives it and identifies it > >as unvalidated data. > > I still don't follow, OK, I give up, since it was your use case that concerned me. I obviously misunderstood. Sorry for the confusion. Sign me, +1 on polymorphic functions in Tsukuba Japan > >In general this is a hard problem, though. Polymorphism, OK, one-way > >tainting OK, but in general combining related types is pretty > >arbitrary, and as in the encoded-bytes case, the result type often > >varies depending on expectations of callers, not the types of the > >data. > > But the caller can enforce those expectations by passing in arguments > whose types do what they want in such cases, as long as the string > literals used by the function don't get to override the relevant > parts of the string protocol(s). This simply isn't true for encoded bytes as proposed. For encoded text, the current encoding has no deterministic relationship to the desired encoding (at the level of generality of the stdlib; of course in specific applications it may be mandated by a standard or private convention). I will have to pass on your other user-defined string types. I've never tried to implement one. I only wanted to point out that a user-controllable tainted string type would be preferable to confounding "unicode" with "tainted". From alexander.belopolsky at gmail.com Sun Jun 27 16:47:08 2010 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Sun, 27 Jun 2010 10:47:08 -0400 Subject: [Python-Dev] Adopt A Demo [was: Signs of neglect?] In-Reply-To: References: Message-ID: On Sun, Jun 27, 2010 at 5:57 AM, Mark Dickinson wrote: > On Sun, Jun 27, 2010 at 10:41 AM, Georg Brandl wrote: >> So -- if every dev "adopted" a Tool or Demo, that would be quite a >> manageable piece of work, and maybe a few demos can be brought up >> to scratch instead of be deleted. 
>> >> I'll go ahead and promise to care for the "Demo/classes" subdir. > > Bagsy the Demo/parser subdirectory. Fixing up unparse.py looks like > it could be fun. I have a patch for pybench attached to a not so related issue at http://bugs.python.org/issue5180 . All it took was a 2to3 run and a one line change. Of course it need a review before it can go in, but I am surprised that something like pybench was not updated long time ago. Is it supposed to be single source? That would make sense given the nature of the tool. > > Mark From mal at egenix.com Sun Jun 27 18:33:53 2010 From: mal at egenix.com (M.-A. Lemburg) Date: Sun, 27 Jun 2010 18:33:53 +0200 Subject: [Python-Dev] Adopt A Demo [was: Signs of neglect?] In-Reply-To: References: Message-ID: <4C277D71.1010802@egenix.com> Alexander Belopolsky wrote: > On Sun, Jun 27, 2010 at 5:57 AM, Mark Dickinson wrote: >> On Sun, Jun 27, 2010 at 10:41 AM, Georg Brandl wrote: >>> So -- if every dev "adopted" a Tool or Demo, that would be quite a >>> manageable piece of work, and maybe a few demos can be brought up >>> to scratch instead of be deleted. >>> >>> I'll go ahead and promise to care for the "Demo/classes" subdir. >> >> Bagsy the Demo/parser subdirectory. Fixing up unparse.py looks like >> it could be fun. > > I have a patch for pybench attached to a not so related issue at > http://bugs.python.org/issue5180 . All it took was a 2to3 run and a > one line change. Of course it need a review before it can go in, but > I am surprised that something like pybench was not updated long time > ago. Is it supposed to be single source? Yes, the idea was to keep the number of changes to a minimum and to have the Python3 version work with Python 2.6, 2.7 and 3.x.
Antoine worked on that, AFAIR. The Python2 version of pybench needs to work with more than just Python 2.6 and 2.7 to be able to compare performance of the various releases back to version 2.3. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Jun 27 2010) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ 2010-07-19: EuroPython 2010, Birmingham, UK 21 days to go ::: Try our new mxODBC.Connect Python Database Interface for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From pje at telecommunity.com Sun Jun 27 19:02:28 2010 From: pje at telecommunity.com (P.J. Eby) Date: Sun, 27 Jun 2010 13:02:28 -0400 Subject: [Python-Dev] bytes / unicode In-Reply-To: References: <20100625130801.1E9A83A4099@sparrow.telecommunity.com> <8739wbnl0m.fsf@uwakimon.sk.tsukuba.ac.jp> <20100625222722.594D23A4099@sparrow.telecommunity.com> <87oceympcu.fsf@uwakimon.sk.tsukuba.ac.jp> <20100626181753.601473A4108@sparrow.telecommunity.com> <20100627034922.31A663A4108@sparrow.telecommunity.com> Message-ID: <20100627170805.1785F3A4099@sparrow.telecommunity.com> At 03:53 PM 6/27/2010 +1000, Nick Coghlan wrote: >We could talk about this even longer, but the most effective way >forward is going to be a patch that improves the URL parsing >situation. Certainly, it's the only practical solution for the immediate problems in 3.2. I only mentioned that I "hate the idea" because I'd be more comfortable if it was explicitly declared to be a temporary hack to work around the absence of a string coercion protocol, due to the moratorium on language changes. 
But, since the moratorium *is* in effect, I'll try to make this my last post on string protocols for a while... and maybe wait until I've looked at the code (str/bytes C implementations) in more detail and can make a more concrete proposal for what the protocol would be and how it would work. (Not to mention closer to the end of the moratorium.) >There are a *very small* number of APIs where it is appropriate to >be polymorphic This is only true if you focus exclusively on bytes vs. unicode, rather than the general issue that it's currently impractical to pass *any* sort of user-defined string type through code that you don't directly control (stdlib or third-party). >The virtues of a separate poly_str type are that: >1. It can be simple and implemented in Python, dispatching to str or >bytes as appropriate (probably in the strings module) >2. No chance of impacting the performance of the core interpreter (as >builtins are not affected) Note that adding a string coercion protocol isn't going to change core performance for existing cases, since any place where the protocol would be invoked would be a code branch that either throws an error or *already* falls back to some other protocol (e.g. the buffer protocol). >3. Lower impact if it turns out to have been a bad idea How many protocols have been added that turned out to be bad ideas? The only ones that have been removed in 3.x, IIRC, are three-way compare, slice-specific operations, and __coerce__... and I'm going to miss __cmp__. ;-) However, IIUC, the reason these protocols were dropped isn't because they were "bad ideas". Rather, they're things that can be implemented in terms of a finer-grained protocol. i.e., if you want __cmp__ or __getslice__ or __coerce__, you can always implement them via a mixin that converts the newer fine-grained protocols into invocations of the older protocol. (As I plan to do for __cmp__ in the handful of places I use it.) 
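A minimal sketch of the mixin approach mentioned above, for the `__cmp__` case: the six rich comparison methods are derived from a single three-way `__cmp__` method. The class names `CmpMixin` and `Version` are hypothetical, and this is one possible shape rather than the code referred to in the message.

```python
class CmpMixin:
    """Derive the six rich comparisons from one three-way __cmp__ method.

    Note: defining __eq__ without __hash__ makes instances unhashable.
    """

    def __eq__(self, other):
        c = self.__cmp__(other)
        return c if c is NotImplemented else c == 0

    def __ne__(self, other):
        c = self.__cmp__(other)
        return c if c is NotImplemented else c != 0

    def __lt__(self, other):
        c = self.__cmp__(other)
        return c if c is NotImplemented else c < 0

    def __le__(self, other):
        c = self.__cmp__(other)
        return c if c is NotImplemented else c <= 0

    def __gt__(self, other):
        c = self.__cmp__(other)
        return c if c is NotImplemented else c > 0

    def __ge__(self, other):
        c = self.__cmp__(other)
        return c if c is NotImplemented else c >= 0


class Version(CmpMixin):
    """Example user type that only implements the old-style protocol."""

    def __init__(self, major, minor):
        self.key = (major, minor)

    def __cmp__(self, other):
        if not isinstance(other, Version):
            return NotImplemented
        # Classic three-way result: -1, 0 or 1.
        return (self.key > other.key) - (self.key < other.key)


ordered = sorted([Version(3, 2), Version(2, 7), Version(3, 1)])
```

Returning `NotImplemented` from `__cmp__` is passed straight through, so comparisons against unrelated types fall back to Python's default handling instead of raising inside the mixin.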
At the moment, however, this isn't possible for multi-string operations outside of __add__/__radd__ and comparison -- the coercion rules are hard-wired and can't be overridden by user-defined types. From solipsis at pitrou.net Sun Jun 27 19:50:33 2010 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sun, 27 Jun 2010 19:50:33 +0200 Subject: [Python-Dev] pybench References: Message-ID: <20100627195033.224713c2@pitrou.net> On Sun, 27 Jun 2010 10:47:08 -0400 Alexander Belopolsky wrote: > > I have a patch for pybench attached to a not so related issue at > http://bugs.python.org/issue5180 . All it took was a 2to3 run and a > one line change. Of course it need a review before it can go in, but > I am surprised that something like pybench was not updated long time > ago. Why do you say that? pybench works fine under Python 3 (the py3k branch version of pybench, that is). The patch doesn't look necessary to me. From tjreedy at udel.edu Sun Jun 27 21:03:31 2010 From: tjreedy at udel.edu (Terry Reedy) Date: Sun, 27 Jun 2010 15:03:31 -0400 Subject: [Python-Dev] #Python3 ! ? (was Python Library Support in 3.x) In-Reply-To: References: <20100618050712.GC20639@thorne.id.au> <20100619121256.2412.244251859.divmod.xquotient.130@localhost.localdomain> <63486FB9-866D-47D3-AF04-0A621AB416A4@ikanobori.jp> Message-ID: On 6/27/2010 5:44 AM, Georg Brandl wrote: > Am 22.06.2010 01:01, schrieb Terry Reedy: >> On 6/21/2010 3:59 PM, Steve Holden wrote: >>> Terry Reedy wrote: >>>> On 6/21/2010 8:33 AM, Nick Coghlan wrote: >>>> >>>>> P.S. (We're going to have a tough decision to make somewhere along the >>>>> line where docs.python.org is concerned, too - when do we flick the >>>>> switch and make a 3.x version of the docs the default? >>>> >>>> Easy. When 3.2 is released. When 2.7 is released, 3.2 becomes 'trunk'. >>>> Trunk released always take over docs.python.org. 
To do otherwise would >>>> be to say that 3.2 is not a real trunk release and not yet ready for >>>> real use -- a major slam. >>>> >>>> Actually, I thought this was already discussed and decided ;-). >>>> >>> This also gives the 2.7 release it's day in the sun before relegation to >>> maintenance status. >> >> Every new version (except 3.0 and 3.1) has gone to maintenance status >> *and* becomes the featured release on docs.python.org the day it was >> released. 2.7 would just spend less time as the featured release on >> that page. > > I'm not sure 3.2 should take over in December just yet. (There's also > docs3.python.org that always lands at the latest 3.x documentation). > > However, there will be enough time to discuss this when 3.2 is actually > about to be released. Sure. Since I expect that the argument for treating 3.2 as a regular production-use-ready release will be stronger then than now, I agree on differing discussion. -- Terry Jan Reedy From martin at v.loewis.de Sun Jun 27 21:25:06 2010 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sun, 27 Jun 2010 21:25:06 +0200 Subject: [Python-Dev] #Python3 ! ? (was Python Library Support in 3.x) In-Reply-To: <201006211113.06767.stephan.richter@gmail.com> References: <20100618050712.GC20639@thorne.id.au> <201006211113.06767.stephan.richter@gmail.com> Message-ID: <4C27A592.8010206@v.loewis.de> Am 21.06.2010 17:13, schrieb Stephan Richter: > On Monday, June 21, 2010, Nick Coghlan wrote: >> A decent listing of major packages that already support Python 3 would >> be very handy for the new Python2orPython3 page I created on the wiki, >> and easier to keep up-to-date. (the old Early2to3Migrations page >> didn't look particularly up to date, but hopefully we can keep the new >> list in a happier state). > > I really just want to be able to go to PyPI, Click on "Browse packages" and > then select "Python 3" (it can currently be accomplished by clicking "Python" > and then "3"). 
Or you can use the link "Python 3 packages" on PyPI's main menu. Regards, Martin From bugtrack at roumenpetrov.info Sun Jun 27 21:25:16 2010 From: bugtrack at roumenpetrov.info (Roumen Petrov) Date: Sun, 27 Jun 2010 22:25:16 +0300 Subject: [Python-Dev] what environment variable should contain compiler warning suppression flags? In-Reply-To: References: <4C268F1E.5070506@egenix.com> Message-ID: <4C27A59C.6040005@roumenpetrov.info> Brett Cannon wrote: > On Sat, Jun 26, 2010 at 16:37, M.-A. Lemburg wrote: >> Brett Cannon wrote: >>> On Wed, Jun 23, 2010 at 14:53, Brett Cannon wrote: [SKIP] >>> Since no one objected I swapped the order in r82259. In case anyone >>> else uses clang to compile Python, this means that -Wno-unused-value >>> will now work to silence the warning about unused return values that >>> is caused by some macros. Probably using -Wno-empty-body is also good >>> to avoid all the warnings triggered by the UCS4 macros in cjkcodecs. Right now you cannot change order of CFLAGS and OPT >> I think you need to come up with a different solution and revert >> the change... >> >> OPT has historically been the only variable to use for >> adjusting the Python C compiler settings. > > Just found the relevant section in the README. > >> >> As the name implies this was usually used to adjust the >> optimizer settings, including raising the optimization level >> from the default or disabling it. > > It meant optional to me, not optimization. I hate abbreviations sometimes. > >> >> With your change CFLAGS will always override OPT and thus >> any optimization definitions made in OPT will no longer >> have an effect. > > That was the point; OPT defines defaults through configure.in and I > simply wanted to add to those instead of having OPT completely > overwritten by me. Now if you confirm that (see configure.in ) : # Optimization messes up debuggers, so turn it off for # debug builds. 
OPT="-g -O0 -Wall $STRICT_PROTO" is not issue for py3k then left you commit as is (Note that Mark point this). But if optimization "messes up debuggers" you may revert change. I know that is difficult to reach consensus on compiler/preprocessor flags for python build process. Next is a shot list with issues about this: - "Python 2.5 64 bit compile fails on Solaris 10/gcc 4.1.1" : http://bugs.python.org/issue1628484 (committed/rejected) - "Python does not honor "CFLAGS" environment variable" : http://bugs.python.org/issue1453 (wont fix) - "configure: allow user-provided CFLAGS to override AC_PROG_CC defaults" : http://bugs.python.org/issue8211 (fixed) This is still open "configure doesn't set up CFLAGS properly" ( http://bugs.python.org/issue1104249 ) - must be closed as fixed. >> Note that CFLAGS defines -O2 on many platforms. > > So then wouldn't that mean they want that to be the optimization > level? Or is the historical reason that default exists is so that some > default exists but to expect the application to override as desired? > >> >> In your particular case, you should try setting OPT to >> "... -Wno-unused-value ..." (ie. replace -Wall with your >> setting). > > So what is CFLAGS for then? ``configure -h`` says it's for "C compiler > flags"; that's extremely ambiguous. And it doesn't help that OPT is > not mentioned by ``configure -h`` as that is what I have always gone > by to know what flags are available for compilation. > > -Brett If you like to see some flags the could you look into http://bugs.python.org/issue3718 how to define an option to be visible by configure --help. In addition AC_ARG_VAR will allow environment variable to be cached for subsequent run of config.status otherwise you must specify only on configure command line. About all XXflags variables if is good configure script to be simplified to use only CPPFLAGS and CFLAGS to minimize configuration troubles and other build falures. 
A good sample if configure set preprocessor/compiler flags other then CPPFLAGS/CFLAGS is this issue "OSX: duplicate -arch flags in CFLAGS breaks sysconfig" ( http://bugs.python.org/issue8607 ) Roumen From dickinsm at gmail.com Sun Jun 27 21:43:34 2010 From: dickinsm at gmail.com (Mark Dickinson) Date: Sun, 27 Jun 2010 20:43:34 +0100 Subject: [Python-Dev] what environment variable should contain compiler warning suppression flags? In-Reply-To: <4C268F1E.5070506@egenix.com> References: <4C268F1E.5070506@egenix.com> Message-ID: On Sun, Jun 27, 2010 at 12:37 AM, M.-A. Lemburg wrote: > Brett Cannon wrote: >> On Wed, Jun 23, 2010 at 14:53, Brett Cannon wrote: >>> I finally realized why clang has not been silencing its warnings about >>> unused return values: I have -Wno-unused-value set in CFLAGS which >>> comes before OPT (which defines -Wall) as set in PY_CFLAGS in >>> Makefile.pre.in. >>> >>> I could obviously set OPT in my environment, but that would override >>> the default OPT settings Python uses. I could put it in EXTRA_CFLAGS, >>> but the README says that's for stuff that tweak binary compatibility. >>> >>> So basically what I am asking is what environment variable should I >>> use? If CFLAGS is correct then does anyone have any issues if I change >>> the order of things for PY_CFLAGS in the Makefile so that CFLAGS comes >>> after OPT? >>> >> >> Since no one objected I swapped the order in r82259. In case anyone >> else uses clang to compile Python, this means that -Wno-unused-value >> will now work to silence the warning about unused return values that >> is caused by some macros. Probably using -Wno-empty-body is also good >> to avoid all the warnings triggered by the UCS4 macros in cjkcodecs. > > I think you need to come up with a different solution and revert > the change... Agreed; this needs more thought. 
For one thing, Brett's change has the result that --with-pydebug builds end up being built with -O2 instead of -O0, which can make debugging (e.g., with gdb) somewhat awkward. Mark From dickinsm at gmail.com Sun Jun 27 22:04:56 2010 From: dickinsm at gmail.com (Mark Dickinson) Date: Sun, 27 Jun 2010 21:04:56 +0100 Subject: [Python-Dev] what environment variable should contain compiler warning suppression flags? In-Reply-To: References: <4C268F1E.5070506@egenix.com> Message-ID: On Sun, Jun 27, 2010 at 6:46 AM, Jeffrey Yasskin wrote: > AC_PROG_CC is the macro that sets CFLAGS to -g -O2 on gcc-based > systems (http://www.gnu.org/software/hello/manual/autoconf/C-Compiler.html#index-AC_005fPROG_005fCC-842). > If Python's configure.in sets an otherwise-empty CFLAGS to -g before > calling AC_PROG_CC, AC_PROG_CC won't change it. Or we could just > preserve the users CFLAGS setting across AC_PROG_CC regardless of > whether it's set, to let the user set CFLAGS on the configure line > without stomping any defaults. I think saving and restoring CFLAGS across AC_PROG_CC was attempted in http://bugs.python.org/issue8211 . It turned out that it broke OS X universal builds. I'm not sure I understand the importance of allowing AC_PROG_CC to set CFLAGS (if CFLAGS is undefined at the point of the AC_PROG_CC); can someone give an example of why this is necessary? Mark From tjreedy at udel.edu Sun Jun 27 22:07:56 2010 From: tjreedy at udel.edu (Terry Reedy) Date: Sun, 27 Jun 2010 16:07:56 -0400 Subject: [Python-Dev] #Python3 ! ? (was Python Library Support in 3.x) In-Reply-To: References: <20100618050712.GC20639@thorne.id.au> <20100619121256.2412.244251859.divmod.xquotient.130@localhost.localdomain> <63486FB9-866D-47D3-AF04-0A621AB416A4@ikanobori.jp> Message-ID: > Sure. Since I expect that the argument for treating 3.2 as a regular > production-use-ready release will be stronger then than now, I agree on > differing discussion. 
I meant 'deferring' -- Terry Jan Reedy From jyasskin at gmail.com Sun Jun 27 22:37:48 2010 From: jyasskin at gmail.com (Jeffrey Yasskin) Date: Sun, 27 Jun 2010 13:37:48 -0700 Subject: [Python-Dev] what environment variable should contain compiler warning suppression flags? In-Reply-To: References: <4C268F1E.5070506@egenix.com> Message-ID: On Sun, Jun 27, 2010 at 1:04 PM, Mark Dickinson wrote: > On Sun, Jun 27, 2010 at 6:46 AM, Jeffrey Yasskin wrote: >> AC_PROG_CC is the macro that sets CFLAGS to -g -O2 on gcc-based >> systems (http://www.gnu.org/software/hello/manual/autoconf/C-Compiler.html#index-AC_005fPROG_005fCC-842). >> If Python's configure.in sets an otherwise-empty CFLAGS to -g before >> calling AC_PROG_CC, AC_PROG_CC won't change it. Or we could just >> preserve the users CFLAGS setting across AC_PROG_CC regardless of >> whether it's set, to let the user set CFLAGS on the configure line >> without stomping any defaults. > > I think saving and restoring CFLAGS across AC_PROG_CC was attempted in > http://bugs.python.org/issue8211 . It turned out that it broke OS X > universal builds. Thanks for the link to the issue. http://bugs.python.org/issue8366 says Ronald Oussoren fixed the universal builds without reverting the CFLAGS propagation. > I'm not sure I understand the importance of allowing AC_PROG_CC to set > CFLAGS (if CFLAGS is undefined at the point of the AC_PROG_CC); can > someone give an example of why this is necessary? Marc-Andre's argument seems to be "it's possible that AC_PROG_CC adds other flags as well (it currently doesn't, but that may well change in future versions of autoconf)." That seems a little weak to constrain fixing actual problems today. If it ever adds more arguments, we'll need to inspect them anyway to see if they're more like -g or -O2 (wanted or harmful).
Jeffrey From brett at python.org Sun Jun 27 22:50:23 2010 From: brett at python.org (Brett Cannon) Date: Sun, 27 Jun 2010 13:50:23 -0700 Subject: [Python-Dev] what environment variable should contain compiler warning suppression flags? In-Reply-To: References: <4C268F1E.5070506@egenix.com> Message-ID: On Sun, Jun 27, 2010 at 13:37, Jeffrey Yasskin wrote: > On Sun, Jun 27, 2010 at 1:04 PM, Mark Dickinson wrote: >> On Sun, Jun 27, 2010 at 6:46 AM, Jeffrey Yasskin wrote: >>> AC_PROG_CC is the macro that sets CFLAGS to -g -O2 on gcc-based >>> systems (http://www.gnu.org/software/hello/manual/autoconf/C-Compiler.html#index-AC_005fPROG_005fCC-842). >>> If Python's configure.in sets an otherwise-empty CFLAGS to -g before >>> calling AC_PROG_CC, AC_PROG_CC won't change it. Or we could just >>> preserve the users CFLAGS setting across AC_PROG_CC regardless of >>> whether it's set, to let the user set CFLAGS on the configure line >>> without stomping any defaults. >> >> I think saving and restoring CFLAGS across AC_PROG_CC was attempted in >> http://bugs.python.org/issue8211 . It turned out that it broke OS X >> universal builds. > > Thanks for the link to the issue. http://bugs.python.org/issue8366 > says Ronald Oussoren fixed the universal builds without reverting the > CFLAGS propagation. > >> I'm not sure I understand the importance of allowing AC_PROG_CC to set >> CFLAGS (if CFLAGS is undefined at the point of the AC_PROG_CC); can >> someone give an example of why this is necessary? > > Marc-Andre's argument seems to be "it's possible that AC_PROG_CC adds > other flags as well (it currently doesn't, but that may well change in > future versions of autoconf)." That seems a little weak to constrain > fixing actual problems today. If it ever adds more arguments, we'll > need to inspect them anyway to see if they're more like -g or -O2 > (wanted or harmful). > I went ahead and reverted the change, but it does seem like the build environment could use a cleanup.
From dickinsm at gmail.com Sun Jun 27 22:54:06 2010 From: dickinsm at gmail.com (Mark Dickinson) Date: Sun, 27 Jun 2010 21:54:06 +0100 Subject: [Python-Dev] what environment variable should contain compiler warning suppression flags? In-Reply-To: References: <4C268F1E.5070506@egenix.com> Message-ID: On Sun, Jun 27, 2010 at 9:37 PM, Jeffrey Yasskin wrote: > On Sun, Jun 27, 2010 at 1:04 PM, Mark Dickinson wrote: >> I think saving and restoring CFLAGS across AC_PROG_CC was attempted in >> http://bugs.python.org/issue8211 . It turned out that it broke OS X >> universal builds. > > Thanks for the link to the issue. http://bugs.python.org/issue8366 > says Ronald Oussoren fixed the universal builds without reverting the > CFLAGS propagation. Yes, you're right (of course). Thanks. Looking at the current configure.in, CFLAGS *does* get saved and restored across the AC_PROG_CC call if it's non-empty; I'm not sure whether this actually (currently) has any effect, since as I understand the documentation CFLAGS won't be touched by AC_PROG_CC if it's already set. >> I'm not sure I understand the importance of allowing AC_PROG_CC to set >> CFLAGS (if CFLAGS is undefined at the point of the AC_PROG_CC); can >> someone give an example of why this is necessary? > > Marc-Andre's argument seems to be "it's possible that AC_PROG_CC adds > other flags as well (it currently doesn't, but that may well change in > future versions of autoconf)." That seems a little weak to constrain > fixing actual problems today. If it ever adds more arguments, we'll > need to inspect them anyway to see if they're more like -g or -O2 > (wanted or harmful). Okay; thanks for the explanation.
Mark From greg.ewing at canterbury.ac.nz Mon Jun 28 00:35:36 2010 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Mon, 28 Jun 2010 10:35:36 +1200 Subject: [Python-Dev] thoughts on the bytes/string discussion In-Reply-To: <4C272D8C.6010406@trueblade.com> References: <11597.1277401099@parc.com> <96ADD4CE-3A24-45A7-B219-2940195DC3D0@twistedmatrix.com> <4C25B319.8040804@canterbury.ac.nz> <4C271E66.5050902@canterbury.ac.nz> <4C272D8C.6010406@trueblade.com> Message-ID: <4C27D238.3060100@canterbury.ac.nz> Eric Smith wrote: > But isn't this currently ignored everywhere in python's code? It's true that code using a utf-8 build would have to be aware of the fact much more often. But I'm thinking of applications that would otherwise want to keep all their strings encoded to save memory. If they do that, they also need to deal with sequence items not corresponding to characters. If they can handle that, they may be able to handle utf-8 just as well. -- Greg From rdmurray at bitdance.com Mon Jun 28 01:31:21 2010 From: rdmurray at bitdance.com (R. David Murray) Date: Sun, 27 Jun 2010 19:31:21 -0400 Subject: [Python-Dev] bytes / unicode In-Reply-To: References: <20100625130801.1E9A83A4099@sparrow.telecommunity.com> <8739wbnl0m.fsf@uwakimon.sk.tsukuba.ac.jp> <20100625222722.594D23A4099@sparrow.telecommunity.com> <87oceympcu.fsf@uwakimon.sk.tsukuba.ac.jp> <20100626181753.601473A4108@sparrow.telecommunity.com> <20100627034922.31A663A4108@sparrow.telecommunity.com> Message-ID: <20100627233121.E1E0821948D@kimball.webabinitio.net> I've been watching this discussion with intense interest, but have been so lagged in following the thread that I haven't replied. I got caught up today.... 
On Sun, 27 Jun 2010 15:53:59 +1000, Nick Coghlan wrote: > The difference is that we have three classes of algorithm here: > - those that work only on octet sequences > - those that work only on character sequences > - those that can work on either > > Python 2 lumped all 3 classes of algorithm together through the > multi-purpose 8-bit str type. The unicode type provided some scope to > separate out the second category, but the divisions were rather > blurry. > > Python 3 forces the first two to be separated by using either octets > (bytes/bytearray) or characters (str). There are a *very small* number > of APIs where it is appropriate to be polymorphic, but this is > currently difficult due to the need to supply literals of the > appropriate type for the objects being operated on. > > This isn't ever going to happen automagically due to the need to > explicitly provide two literals (one for octet sequences, one for > character sequences). In email6 I'm currently handling this by putting the algorithm on a base class and the literals on 'Bytes...' and 'String...' subclasses as class variables. Slightly ugly, but it works. The current design also speaks to an earlier point someone made about the fact that we are really dealing with more complex, and domain specific, data, not simply "byte strings". A "BytesMessage" contains lots of structured encoding information as well as the possibility of 'garbage' bytes. A StringMessage contains text and data decoded into objects (ex: an image object), possibly with some PEP 383 surrogates included (haven't quite figured that part out yet). So, a BytesMessage object isn't just a byte string, it's a load of structured data that requires the associated algorithms to convert into meaningful text and objects. Going the other way, the decisions made about character encodings need to be encoded into the structured bytes representation that could ultimately go out on the wire. 
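The base-class-plus-literals arrangement described above can be illustrated with a minimal sketch. This is not code from email6: the class names, the split_header method, and the COLON and WHITESPACE attributes are all hypothetical, chosen only to show how one algorithm can serve both bytes and str when each subclass supplies literals of its own type.

```python
class _HeaderSplitterBase:
    """Shared algorithm; hypothetical names, not the email6 API."""

    # Subclasses must provide literals of the type they operate on.
    COLON = None
    WHITESPACE = None

    def split_header(self, line):
        # partition() and strip() exist on both bytes and str, so the
        # same code works as long as the literals match the input type.
        name, sep, value = line.partition(self.COLON)
        if not sep:
            raise ValueError("not a header line")
        return name.strip(self.WHITESPACE), value.strip(self.WHITESPACE)


class BytesHeaderSplitter(_HeaderSplitterBase):
    # Literals for octet sequences.
    COLON = b":"
    WHITESPACE = b" \t\r\n"


class StringHeaderSplitter(_HeaderSplitterBase):
    # Literals for character sequences.
    COLON = ":"
    WHITESPACE = " \t\r\n"
```

Passing a str to the bytes variant (or the reverse) raises TypeError rather than silently mixing the two types, which is the separation Python 3 is meant to enforce.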
I suspect that the same thing needs to be done for URIs/IRIs, and html/MIME and the corresponding text and objects. It is my hope that the email6 work will lay a firm foundation for the latter, but URI/IRI is a whole different protocol that I'm glad I don't have to deal with :) > The virtues of a separate poly_str type are that: Having such a poly_str type would probably make my life easier. I also would like just vent a little frustration at having to use single-character-slice notation when I want to index a character in a string in my algorithms.... -- R. David Murray www.bitdance.com From rdmurray at bitdance.com Mon Jun 28 01:41:48 2010 From: rdmurray at bitdance.com (R. David Murray) Date: Sun, 27 Jun 2010 19:41:48 -0400 Subject: [Python-Dev] thoughts on the bytes/string discussion In-Reply-To: <26215.1277505652@parc.com> References: <11597.1277401099@parc.com> <96ADD4CE-3A24-45A7-B219-2940195DC3D0@twistedmatrix.com> <26215.1277505652@parc.com> Message-ID: <20100627234148.9618021948F@kimball.webabinitio.net> On Fri, 25 Jun 2010 15:40:52 -0700, Bill Janssen wrote: > Guido van Rossum wrote: > > So you're really just worried about space consumption. I'd like to see > > a lot of hard memory profiling data before I got overly worried about > > that. > > While I've seen some big Web pages, I think the email folks, who often > have to process messages with attachments measuring in the tens of > megabytes, have the stronger problems here, and I think speed may be > more important than memory. I've built both a Web server and an IMAP > server in Python, and the IMAP server is where the issues of storage > management really prevail. If you have to convert a 20 MB encoded > string into a Unicode string just to look at the headers as strings, you > have issues. (The Python email package doesn't do that, by the way.) Unfortunately in the current Python3 email package (email5), this is no longer true. 
You have to decode everything *first* in order to pass it through email (which presents a few problems when dealing with 8bit data, as has been mentioned here before). email6 intends to fix this, and once again allow you to decode to text only the bits you actually need to access and manipulate. -- R. David Murray www.bitdance.com From rdmurray at bitdance.com Mon Jun 28 02:00:17 2010 From: rdmurray at bitdance.com (R. David Murray) Date: Sun, 27 Jun 2010 20:00:17 -0400 Subject: [Python-Dev] email package status in 3.X In-Reply-To: References: Message-ID: <20100628000017.F3B732194BA@kimball.webabinitio.net> On Fri, 18 Jun 2010 18:52:45 -0000, lutz at rmi.net wrote: > What I'm suggesting is that extreme caution be exercised from > this point forward with all things 3.X-related. Whether you > wish to accept this or not, 3.X has a negative image to many. > This suggestion specifically includes not abandoning current > 3.X email package users as a case in point. Ripping the rug > out from new 3.X users after they took the time to port seems > like it may be just enough to tip the scales altogether. Catching up on my python-dev email, I just want to clarify this with respect to email. (1) I suspect that the new API will be enough of a carrot that they won't mind converting to it, BUT, (2) the plan is to provide a compatibility API that will fully support the current Python3 email5 API (but with fewer bugs in areas such as header folding and unfolding). -- R. David Murray www.bitdance.com From greg at krypto.org Mon Jun 28 06:33:36 2010 From: greg at krypto.org (Gregory P. Smith) Date: Sun, 27 Jun 2010 21:33:36 -0700 Subject: [Python-Dev] [ANN]: "newthreading" - an approach to simplified thread usage, and a path to getting rid of the GIL In-Reply-To: <4C262D37.7020807@animats.com> References: <4C259A25.1060705@animats.com> <4C2600B4.5020503@voidspace.org.uk> <4C262D37.7020807@animats.com> Message-ID: fyi - newthreading has been picked up by lwn.
http://lwn.net/Articles/393822/#Comments -------------- next part -------------- An HTML attachment was scrubbed... URL: From greg.ewing at canterbury.ac.nz Mon Jun 28 10:28:45 2010 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Mon, 28 Jun 2010 20:28:45 +1200 Subject: [Python-Dev] bytes / unicode In-Reply-To: <20100627233121.E1E0821948D@kimball.webabinitio.net> References: <20100625130801.1E9A83A4099@sparrow.telecommunity.com> <8739wbnl0m.fsf@uwakimon.sk.tsukuba.ac.jp> <20100625222722.594D23A4099@sparrow.telecommunity.com> <87oceympcu.fsf@uwakimon.sk.tsukuba.ac.jp> <20100626181753.601473A4108@sparrow.telecommunity.com> <20100627034922.31A663A4108@sparrow.telecommunity.com> <20100627233121.E1E0821948D@kimball.webabinitio.net> Message-ID: <4C285D3D.80907@canterbury.ac.nz> R. David Murray wrote: > Having such a poly_str type would probably make my life easier. A thought on this poly_str type: perhaps it could be called "ascii", since that's what it would have to be restricted to, and have a'xxx' as a literal syntax for it, seeing as literals seem to be one of its main use cases. > I also would like just vent a little frustration at having to > use single-character-slice notation when I want to index a character > in a string in my algorithms....
Thinking way outside the square, and probably the pale as well, maybe @ could be pressed into service as an infix operator, with s at i being equivalent to s[i:i+1] -- Greg From orsenthil at gmail.com Mon Jun 28 10:25:26 2010 From: orsenthil at gmail.com (Senthil Kumaran) Date: Mon, 28 Jun 2010 13:55:26 +0530 Subject: [Python-Dev] bytes / unicode In-Reply-To: <4C285D3D.80907@canterbury.ac.nz> References: <20100625130801.1E9A83A4099@sparrow.telecommunity.com> <8739wbnl0m.fsf@uwakimon.sk.tsukuba.ac.jp> <20100625222722.594D23A4099@sparrow.telecommunity.com> <87oceympcu.fsf@uwakimon.sk.tsukuba.ac.jp> <20100626181753.601473A4108@sparrow.telecommunity.com> <20100627034922.31A663A4108@sparrow.telecommunity.com> <20100627233121.E1E0821948D@kimball.webabinitio.net> <4C285D3D.80907@canterbury.ac.nz> Message-ID: <20100628082526.GA6509@remy> On Mon, Jun 28, 2010 at 08:28:45PM +1200, Greg Ewing wrote: > A thought on this poly_str type: perhaps it could be > called "ascii", since that's what it would have to be > restricted to, and have > > a'xxx' > > as a literal syntax for it, seeing as literals seem to > be one of its main use cases. This seems like a good idea. > > Thinking way outside the square, and probably the pale > as well, maybe @ could be pressed into service as an > infix operator, with > > s at i > > being equivalent to > > s[i:i+1] > And this is way beyond being intuitive. -- Senthil From rdmurray at bitdance.com Mon Jun 28 13:24:48 2010 From: rdmurray at bitdance.com (R. 
David Murray) Date: Mon, 28 Jun 2010 07:24:48 -0400 Subject: [Python-Dev] bytes / unicode In-Reply-To: <20100628082526.GA6509@remy> References: <20100625130801.1E9A83A4099@sparrow.telecommunity.com> <8739wbnl0m.fsf@uwakimon.sk.tsukuba.ac.jp> <20100625222722.594D23A4099@sparrow.telecommunity.com> <87oceympcu.fsf@uwakimon.sk.tsukuba.ac.jp> <20100626181753.601473A4108@sparrow.telecommunity.com> <20100627034922.31A663A4108@sparrow.telecommunity.com> <20100627233121.E1E0821948D@kimball.webabinitio.net> <4C285D3D.80907@canterbury.ac.nz> <20100628082526.GA6509@remy> Message-ID: <20100628112448.348771FD0CD@kimball.webabinitio.net> On Mon, 28 Jun 2010 13:55:26 +0530, Senthil Kumaran wrote: > On Mon, Jun 28, 2010 at 08:28:45PM +1200, Greg Ewing wrote: > > Thinking way outside the square, and probably the pale > > as well, maybe @ could be pressed into service as an > > infix operator, with > > > > s at i > > > > being equivalent to > > > > s[i:i+1] > > > > And this is way beyond being intuitive. Agreed, -1 on that. Like I said, I was just venting. The decision to have indexing bytes return an int is set in stone now and I just have to live with it. -- R. David Murray www.bitdance.com From mal at egenix.com Mon Jun 28 13:38:31 2010 From: mal at egenix.com (M.-A. Lemburg) Date: Mon, 28 Jun 2010 13:38:31 +0200 Subject: [Python-Dev] what environment variable should contain compiler warning suppression flags? In-Reply-To: References: <4C268F1E.5070506@egenix.com> Message-ID: <4C2889B7.2060105@egenix.com> Brett Cannon wrote: > On Sun, Jun 27, 2010 at 13:37, Jeffrey Yasskin wrote: >> On Sun, Jun 27, 2010 at 1:04 PM, Mark Dickinson wrote: >>> On Sun, Jun 27, 2010 at 6:46 AM, Jeffrey Yasskin wrote: >>>> AC_PROG_CC is the macro that sets CFLAGS to -g -O2 on gcc-based >>>> systems (http://www.gnu.org/software/hello/manual/autoconf/C-Compiler.html#index-AC_005fPROG_005fCC-842). 
>>>> If Python's configure.in sets an otherwise-empty CFLAGS to -g before >>>> calling AC_PROG_CC, AC_PROG_CC won't change it. Or we could just >>>> preserve the users CFLAGS setting across AC_PROG_CC regardless of >>>> whether it's set, to let the user set CFLAGS on the configure line >>>> without stomping any defaults. >>> >>> I think saving and restoring CFLAGS across AC_PROG_CC was attempted in >>> http://bugs.python.org/issue8211 . It turned out that it broke OS X >>> universal builds. >> >> Thanks for the link to the issue. http://bugs.python.org/issue8366 >> says Ronald Oussoren fixed the universal builds without reverting the >> CFLAGS propagation. >> >>> I'm not sure I understand the importance of allowing AC_PROG_CC to set >>> CFLAGS (if CFLAGS is undefined at the point of the AC_PROG_CC); can >>> someone give an example of why this is necessary? >> >> Marc-Andre's argument seems to be "it's possible that AC_PROG_CC adds >> other flags as well (it currently doesn't, but that may well change in >> future versions of autoconf)." That seems a little weak to constrain >> fixing actual problems today. If it ever adds more arguments, we'll >> need to inspect them anyway to see if they're more like -g or -O2 >> (wanted or harmful). Please see the discussion on the ticket for details. AC_PROG_CC provides the basic defaults for the CFLAGS compiler settings depending on which compiler is chosen/found: http://www.gnu.org/software/hello/manual/autoconf/C-Compiler.html > I went ahead and reverted the change, but it does seem like the build > environment could use a cleanup. Thanks and, indeed, the build system environment variable usage does need a cleanup. It's a larger project, though, and one that will likely break existing build setups. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Jun 28 2010) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... 
http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ 2010-07-19: EuroPython 2010, Birmingham, UK 20 days to go ::: Try our new mxODBC.Connect Python Database Interface for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From ncoghlan at gmail.com Mon Jun 28 14:13:53 2010 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 28 Jun 2010 22:13:53 +1000 Subject: [Python-Dev] bytes / unicode In-Reply-To: <4C285D3D.80907@canterbury.ac.nz> References: <20100625130801.1E9A83A4099@sparrow.telecommunity.com> <8739wbnl0m.fsf@uwakimon.sk.tsukuba.ac.jp> <20100625222722.594D23A4099@sparrow.telecommunity.com> <87oceympcu.fsf@uwakimon.sk.tsukuba.ac.jp> <20100626181753.601473A4108@sparrow.telecommunity.com> <20100627034922.31A663A4108@sparrow.telecommunity.com> <20100627233121.E1E0821948D@kimball.webabinitio.net> <4C285D3D.80907@canterbury.ac.nz> Message-ID: On Mon, Jun 28, 2010 at 6:28 PM, Greg Ewing wrote: > R. David Murray wrote: > >> Having such a poly_str type would probably make my life easier. > > A thought on this poly_str type: perhaps it could be > called "ascii", since that's what it would have to be > restricted to, and have > > a'xxx' > > as a literal syntax for it, seeing as literals seem to > be one of its main use cases. One of the virtues of doing this as a helper type in a module somewhere (probably string) is that we can defer that kind of decision until later. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From dickinsm at gmail.com Mon Jun 28 15:50:37 2010 From: dickinsm at gmail.com (Mark Dickinson) Date: Mon, 28 Jun 2010 14:50:37 +0100 Subject: [Python-Dev] what environment variable should contain compiler warning suppression flags?
In-Reply-To: <4C2889B7.2060105@egenix.com> References: <4C268F1E.5070506@egenix.com> <4C2889B7.2060105@egenix.com> Message-ID: On Mon, Jun 28, 2010 at 12:38 PM, M.-A. Lemburg wrote: >> On Sun, Jun 27, 2010 at 13:37, Jeffrey Yasskin wrote: >>> On Sun, Jun 27, 2010 at 1:04 PM, Mark Dickinson wrote: >>>> I'm not sure I understand the importance of allowing AC_PROG_CC to set >>>> CFLAGS (if CFLAGS is undefined at the point of the AC_PROG_CC); can >>>> someone give an example of why this is necessary? >>> >>> Marc-Andre's argument seems to be "it's possible that AC_PROG_CC adds >>> other flags as well (it currently doesn't, but that may well change in >>> future versions of autoconf)." That seems a little weak to constrain >>> fixing actual problems today. If it ever adds more arguments, we'll >>> need to inspect them anyway to see if they're more like -g or -O2 >>> (wanted or harmful). > > Please see the discussion on the ticket for details. Yes, I've done that. It's repeatedly asserted in that discussion that AC_PROG_CC should be allowed to initialize an otherwise empty CFLAGS, but nowhere in that discussion does it explain *why* this is desirable. What would be so bad about not allowing AC_PROG_CC to initialize CFLAGS? (E.g., by setting an otherwise empty CFLAGS to '-g' before the AC_PROG_CC invocation.) That would fix the issue of the unwanted -O2 flag that AC_PROG_CC otherwise adds. Mark From mal at egenix.com Mon Jun 28 16:04:04 2010 From: mal at egenix.com (M.-A. Lemburg) Date: Mon, 28 Jun 2010 16:04:04 +0200 Subject: [Python-Dev] what environment variable should contain compiler warning suppression flags? In-Reply-To: References: <4C268F1E.5070506@egenix.com> <4C2889B7.2060105@egenix.com> Message-ID: <4C28ABD4.1030000@egenix.com> Mark Dickinson wrote: > On Mon, Jun 28, 2010 at 12:38 PM, M.-A.
Lemburg wrote: >>> On Sun, Jun 27, 2010 at 13:37, Jeffrey Yasskin wrote: >>>> On Sun, Jun 27, 2010 at 1:04 PM, Mark Dickinson wrote: >>>>> I'm not sure I understand the importance of allowing AC_PROG_CC to set >>>>> CFLAGS (if CFLAGS is undefined at the point of the AC_PROG_CC); can >>>>> someone give an example of why this is necessary? >>>> >>>> Marc-Andre's argument seems to be "it's possible that AC_PROG_CC adds >>>> other flags as well (it currently doesn't, but that may well change in >>>> future versions of autoconf)." That seems a little weak to constrain >>>> fixing actual problems today. If it ever adds more arguments, we'll >>>> need to inspect them anyway to see if they're more like -g or -O2 >>>> (wanted or harmful). >> >> Please see the discussion on the ticket for details. > > Yes, I've done that. It's repeatedly asserted in that discussion that > AC_PROG_CC should be allowed to initialize an otherwise empty CFLAGS, > but nowhere in that discussion does it explain *why* this is > desirable. What would be so bad about not allowing AC_PROG_CC to > initialize CFLAGS? (E.g., by setting an otherwise empty CFLAGS to > '-g' before the AC_PROG_CC invocation.) That would fix the issue of > the unwanted -O2 flag that AC_PROG_CC otherwise adds. Why do you think that the default -O2 is unwanted and how do you know whether the compiler accepts -g as option ? -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Jun 28 2010) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ 2010-07-19: EuroPython 2010, Birmingham, UK 20 days to go ::: Try our new mxODBC.Connect Python Database Interface for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. 
Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From dickinsm at gmail.com Mon Jun 28 16:22:19 2010 From: dickinsm at gmail.com (Mark Dickinson) Date: Mon, 28 Jun 2010 15:22:19 +0100 Subject: [Python-Dev] what environment variable should contain compiler warning suppression flags? In-Reply-To: <4C28ABD4.1030000@egenix.com> References: <4C268F1E.5070506@egenix.com> <4C2889B7.2060105@egenix.com> <4C28ABD4.1030000@egenix.com> Message-ID: On Mon, Jun 28, 2010 at 3:04 PM, M.-A. Lemburg wrote: > Why do you think that the default -O2 is unwanted Because it can cause debug builds of Python to be built with optimization enabled, as we've already seen at least twice. > and how do you know > whether the compiler accepts -g as option ? I don't. It could easily be tested for, though. Alternatively, setting an empty CFLAGS to '-g' could be done just for gcc, since this is the only compiler for which AC_PROG_CC adds -O2. Mark From mal at egenix.com Mon Jun 28 17:28:03 2010 From: mal at egenix.com (M.-A. Lemburg) Date: Mon, 28 Jun 2010 17:28:03 +0200 Subject: [Python-Dev] what environment variable should contain compiler warning suppression flags? In-Reply-To: References: <4C268F1E.5070506@egenix.com> <4C2889B7.2060105@egenix.com> <4C28ABD4.1030000@egenix.com> Message-ID: <4C28BF83.9080903@egenix.com> Mark Dickinson wrote: > On Mon, Jun 28, 2010 at 3:04 PM, M.-A. Lemburg wrote: >> Why do you think that the default -O2 is unwanted > > Because it can cause debug builds of Python to be built with > optimization enabled, as we've already seen at least twice. Then let me put it this way: How many Python users will compile Python in debug mode ? The point is that the default build of Python should use the correct production settings for the C compiler out of the box and that's what AC_PROG_CC is all about. 
I'm pretty sure that Python developers who want to use a debug build have enough code foo to get the -O2 turned into a -O0 either by adjust OPT and/or by providing their own CFLAGS env var. Also note that in some cases you may actually want to have a debug build with optimizations turned on, e.g. to track down a compiler optimization bug. >> and how do you know >> whether the compiler accepts -g as option ? > > I don't. It could easily be tested for, though. Alternatively, > setting an empty CFLAGS to '-g' could be done just for gcc, since this > is the only compiler for which AC_PROG_CC adds -O2. ... and then end up with default Python builds which don't have debug symbols available to track down core dumps, etc. ? AC_PROG_CC checks whether the compiler supports -g and always uses it in that case. The option is supported by more compilers than just GCC. E.g. IBM's xlC and Intel's icl compilers support that option as well. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Jun 28 2010) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ 2010-07-19: EuroPython 2010, Birmingham, UK 20 days to go ::: Try our new mxODBC.Connect Python Database Interface for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From mal at egenix.com Mon Jun 28 17:31:40 2010 From: mal at egenix.com (M.-A. Lemburg) Date: Mon, 28 Jun 2010 17:31:40 +0200 Subject: [Python-Dev] what environment variable should contain compiler warning suppression flags? 
In-Reply-To: <4C28BF83.9080903@egenix.com> References: <4C268F1E.5070506@egenix.com> <4C2889B7.2060105@egenix.com> <4C28ABD4.1030000@egenix.com> <4C28BF83.9080903@egenix.com> Message-ID: <4C28C05C.80008@egenix.com> M.-A. Lemburg wrote: > Mark Dickinson wrote: >> On Mon, Jun 28, 2010 at 3:04 PM, M.-A. Lemburg wrote: >>> Why do you think that the default -O2 is unwanted >> >> Because it can cause debug builds of Python to be built with >> optimization enabled, as we've already seen at least twice. > > Then let me put it this way: > > How many Python users will compile Python in debug mode ? > > The point is that the default build of Python should use > the correct production settings for the C compiler out of > the box and that's what AC_PROG_CC is all about. > > I'm pretty sure that Python developers who want to use a > debug build have enough code foo to get the -O2 turned into a -O0 > either by adjust OPT and/or by providing their own CFLAGS env var. > > Also note that in some cases you may actually want to have > a debug build with optimizations turned on, e.g. to track down > a compiler optimization bug. > >>> and how do you know >>> whether the compiler accepts -g as option ? >> >> I don't. It could easily be tested for, though. Alternatively, >> setting an empty CFLAGS to '-g' could be done just for gcc, since this >> is the only compiler for which AC_PROG_CC adds -O2. > > ... and then end up with default Python builds which don't have > debug symbols available to track down core dumps, etc. ? > > AC_PROG_CC checks whether the compiler supports -g and always > uses it in that case. The option is supported by more compilers > than just GCC. E.g. IBM's xlC and Intel's icl compilers support > that option as well. 
Sorry, Intel's compiler is called "icc", not "icl": http://software.intel.com/sites/products/documentation/hpc/compilerpro/en-us/cpp/mac/man/icc.txt IBM's compiler: http://publib.boulder.ibm.com/infocenter/macxhelp/v6v81/index.jsp?topic=/com.ibm.vacpp6m.doc/compiler/ref/ruoptlst.htm -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Jun 28 2010) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ 2010-07-19: EuroPython 2010, Birmingham, UK 20 days to go ::: Try our new mxODBC.Connect Python Database Interface for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From guido at python.org Mon Jun 28 17:39:22 2010 From: guido at python.org (Guido van Rossum) Date: Mon, 28 Jun 2010 08:39:22 -0700 Subject: [Python-Dev] [ANN]: "newthreading" - an approach to simplified thread usage, and a path to getting rid of the GIL In-Reply-To: References: <4C259A25.1060705@animats.com> <4C2600B4.5020503@voidspace.org.uk> <4C262D37.7020807@animats.com> Message-ID: On Sun, Jun 27, 2010 at 9:33 PM, Gregory P. Smith wrote: > fyi - newthreading has been picked up by lwn. > http://lwn.net/Articles/393822/#Comments Do you know if any of the commenters is Nagle himself (and if so, which)? The discussion is hard to follow since the context of replies isn't always clear. There also seems to be a bunch of C++ thinking (and some knee-jerk responses by people who aren't actually all that familiar with Python) although I admit I don't have much of an intuition about memory models for fully free threading myself. It's a brave new world... 
--Guido

--
--Guido van Rossum (python.org/~guido)

From dickinsm at gmail.com Mon Jun 28 17:44:00 2010
From: dickinsm at gmail.com (Mark Dickinson)
Date: Mon, 28 Jun 2010 16:44:00 +0100
Subject: [Python-Dev] what environment variable should contain compiler warning suppression flags?
In-Reply-To: <4C28BF83.9080903@egenix.com>
References: <4C268F1E.5070506@egenix.com> <4C2889B7.2060105@egenix.com> <4C28ABD4.1030000@egenix.com> <4C28BF83.9080903@egenix.com>
Message-ID:

On Mon, Jun 28, 2010 at 4:28 PM, M.-A. Lemburg wrote:
> Mark Dickinson wrote:
>> On Mon, Jun 28, 2010 at 3:04 PM, M.-A. Lemburg wrote:
>>> Why do you think that the default -O2 is unwanted
>>
>> Because it can cause debug builds of Python to be built with
>> optimization enabled, as we've already seen at least twice.
>
> Then let me put it this way:
>
> How many Python users will compile Python in debug mode ?
>
> The point is that the default build of Python should use
> the correct production settings for the C compiler out of
> the box and that's what AC_PROG_CC is all about.
>
> I'm pretty sure that Python developers who want to use a
> debug build have enough code foo to get the -O2 turned into a -O0
> either by adjust OPT and/or by providing their own CFLAGS env var.

Shrug. Clearly someone at some point in the past thought it was a
good idea to have --with-pydebug builds use -O0. If there's going to
be a deliberate decision to drop that now, then that's fine with me.

>> I don't. It could easily be tested for, though. Alternatively,
>> setting an empty CFLAGS to '-g' could be done just for gcc, since this
>> is the only compiler for which AC_PROG_CC adds -O2.
>
> ... and then end up with default Python builds which don't have
> debug symbols available to track down core dumps, etc. ?

No, I don't see how that follows. I was suggesting that *for gcc
only*, an empty CFLAGS be set to '-g' before calling AC_PROG_CC.
The *only* effect this would have would be that for gcc, if the user hasn't specified CFLAGS, then CFLAGS ends up being '-g' rather than '-g -O2' after the AC_PROG_CC call. But I'm really not looking for an argument here; I just wanted to understand why you thought AC_PROG_CC setting CFLAGS was important, and you've explained that. Thanks. Mark From mal at egenix.com Mon Jun 28 18:03:23 2010 From: mal at egenix.com (M.-A. Lemburg) Date: Mon, 28 Jun 2010 18:03:23 +0200 Subject: [Python-Dev] what environment variable should contain compiler warning suppression flags? In-Reply-To: References: <4C268F1E.5070506@egenix.com> <4C2889B7.2060105@egenix.com> <4C28ABD4.1030000@egenix.com> <4C28BF83.9080903@egenix.com> Message-ID: <4C28C7CB.8030600@egenix.com> Mark Dickinson wrote: > On Mon, Jun 28, 2010 at 4:28 PM, M.-A. Lemburg wrote: >> Mark Dickinson wrote: >>> On Mon, Jun 28, 2010 at 3:04 PM, M.-A. Lemburg wrote: >>>> Why do you think that the default -O2 is unwanted >>> >>> Because it can cause debug builds of Python to be built with >>> optimization enabled, as we've already seen at least twice. >> >> Then let me put it this way: >> >> How many Python users will compile Python in debug mode ? >> >> The point is that the default build of Python should use >> the correct production settings for the C compiler out of >> the box and that's what AC_PROG_CC is all about. >> >> I'm pretty sure that Python developers who want to use a >> debug build have enough code foo to get the -O2 turned into a -O0 >> either by adjust OPT and/or by providing their own CFLAGS env var. > > Shrug. Clearly someone at some point in the past thought it was a > good idea to have --with-pydebug builds use -O0. If there's going to > be a deliberate decision to drop that now, then that's fine with me. Ah right, the time machine again :-) OPT already uses -O0 if --with-pydebug is used and the compiler supports -g. Since OPT gets added after CFLAGS, the override already happens... >>> I don't. 
It could easily be tested for, though. Alternatively, >>> setting an empty CFLAGS to '-g' could be done just for gcc, since this >>> is the only compiler for which AC_PROG_CC adds -O2. >> >> ... and then end up with default Python builds which don't have >> debug symbols available to track down core dumps, etc. ? > > No, I don't see how that follows. I was suggesting that *for gcc > only*, an empty CFLAGS be set to '-g' before calling AC_PROG_CC. The > *only* effect this would have would be that for gcc, if the user > hasn't specified CFLAGS, then CFLAGS ends up being '-g' rather than > '-g -O2' after the AC_PROG_CC call. But I'm really not looking for an > argument here; I just wanted to understand why you thought AC_PROG_CC > setting CFLAGS was important, and you've explained that. Thanks. Sorry, that was a misunderstand on my part. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Jun 28 2010) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ 2010-07-19: EuroPython 2010, Birmingham, UK 20 days to go ::: Try our new mxODBC.Connect Python Database Interface for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From techtonik at gmail.com Mon Jun 28 18:05:13 2010 From: techtonik at gmail.com (anatoly techtonik) Date: Mon, 28 Jun 2010 19:05:13 +0300 Subject: [Python-Dev] WPython 1.1 was released In-Reply-To: References: <201006232112.41047.steve@pearwood.info> Message-ID: It would be interesting to see benchmark diagrams inline on one page with overall summaries. 
I've posted an enhancement request at
http://code.google.com/p/unladen-swallow/issues/detail?id=145 if
somebody is going to look at that.

I wonder whether a 32-bit version could bring more speedups?
--
anatoly t.

From techtonik at gmail.com Mon Jun 28 20:09:56 2010
From: techtonik at gmail.com (anatoly techtonik)
Date: Mon, 28 Jun 2010 21:09:56 +0300
Subject: [Python-Dev] Pickle security and remote logging
Message-ID:

Hello,

I need to send logging module output over the network. The module has
everything to make this happen, except security. The SocketHandler and
DatagramHandler examples use the pickle module, which is said to be
insecure. The SocketHandler and DatagramHandler docs should at least
contain a warning about the danger of exposing unpickling interfaces to
insecure networks.

The pickle documentation mentions that it is possible to control what
gets unpickled, but there is no example, nor any security analysis of
whether the proposed solution is secure.

Is there any way to implement secure network logging? I do not care
about data encryption - I just do not want my server exploited by
malformed data.
--
anatoly t.

From zohair_ms at hotmail.com Mon Jun 28 20:09:35 2010
From: zohair_ms at hotmail.com (Zohair)
Date: Mon, 28 Jun 2010 11:09:35 -0700 (PDT)
Subject: [Python-Dev] Access a function
Message-ID: <29008798.post@talk.nabble.com>

I am very new to Python and have a small question.

I have a function:
set_time_at_next_pps(self, *args, **kwargs) but don't know how to use it...
Asking for your help, please.

Cheers,

Zoh
--
View this message in context: http://old.nabble.com/Access-a-function-tp29008798p29008798.html
Sent from the Python - python-dev mailing list archive at Nabble.com.
From fuzzyman at voidspace.org.uk Mon Jun 28 20:39:08 2010
From: fuzzyman at voidspace.org.uk (Michael Foord)
Date: Mon, 28 Jun 2010 19:39:08 +0100
Subject: [Python-Dev] Access a function
In-Reply-To: <29008798.post@talk.nabble.com>
References: <29008798.post@talk.nabble.com>
Message-ID: <4C28EC4C.7030905@voidspace.org.uk>

On 28/06/2010 19:09, Zohair wrote:
> I am very new to Python and have a small question.
>
> I have a function:
> set_time_at_next_pps(self, *args, **kwargs) but don't know how to use it...
> Asking for your help, please.
>

Hi Zoh,

This mailing list is for the development *of* Python, not for questions
about developing *with* Python. You should ask your question on a
mailing list / newsgroup like python-list or python-tutor.

python-list is available via google groups:

https://groups.google.com/group/comp.lang.python/topics

You haven't given enough information to answer the question, however.
The first argument 'self' means that the function is probably a method
of a class, and should be called on a class instance. The *args /
**kwargs means that the function can take any number of arguments or
keyword arguments, which doesn't tell us anything about how the function
should be used.

You can find out more on Python functions in the tutorial:

http://docs.python.org/tutorial/controlflow.html#more-on-defining-functions

All the best,

Michael Foord

> Cheers,
>
> Zoh
>

--
http://www.ironpythoninaction.com/
http://www.voidspace.org.uk/blog

READ CAREFULLY. By accepting and reading this email you agree, on behalf
of your employer, to release me from all obligations and waivers arising
from any and all NON-NEGOTIATED agreements, licenses, terms-of-service,
shrinkwrap, clickwrap, browsewrap, confidentiality, non-disclosure,
non-compete and acceptable use policies ("BOGUS AGREEMENTS") that I have
entered into with your employer, its partners, licensors, agents and
assigns, in perpetuity, without prejudice to my ongoing rights and
privileges.
You further represent that you have the authority to release me from any
BOGUS AGREEMENTS on behalf of your employer.

From phd at phd.pp.ru Mon Jun 28 20:42:28 2010
From: phd at phd.pp.ru (Oleg Broytman)
Date: Mon, 28 Jun 2010 22:42:28 +0400
Subject: [Python-Dev] Access a function
In-Reply-To: <29008798.post@talk.nabble.com>
References: <29008798.post@talk.nabble.com>
Message-ID: <20100628184228.GA17475@phd.pp.ru>

Hello.

We're sorry, but we cannot help you. This mailing list is for work on
developing Python (fixing bugs and adding new features to Python
itself); if you're having problems using Python, please find another
forum. The python-list (comp.lang.python) newsgroup/mailing list is
probably the best place. See http://www.python.org/community/lists/ for
other lists/newsgroups/fora. Thank you for understanding.

On Mon, Jun 28, 2010 at 11:09:35AM -0700, Zohair wrote:
>
> I am very new to Python and have a small question.
>
> I have a function:
> set_time_at_next_pps(self, *args, **kwargs) but don't know how to use it...
> Asking for your help, please.
>
> Cheers,
>
> Zoh
> --
> View this message in context: http://old.nabble.com/Access-a-function-tp29008798p29008798.html
> Sent from the Python - python-dev mailing list archive at Nabble.com.
>
> _______________________________________________
> Python-Dev mailing list
> Python-Dev at python.org
> http://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe: http://mail.python.org/mailman/options/python-dev/phd%40phd.pp.ru

Oleg.
--
Oleg Broytman http://phd.pp.ru/ phd at phd.pp.ru
Programmers don't die, they just GOSUB without RETURN.

From alexander.belopolsky at gmail.com Mon Jun 28 21:59:00 2010
From: alexander.belopolsky at gmail.com (Alexander Belopolsky)
Date: Mon, 28 Jun 2010 15:59:00 -0400
Subject: [Python-Dev] How to spell PyInstance_NewRaw in py3k?
Message-ID:

Issue #5180 [1] presented an interesting challenge: how to unpickle
instances of old-style classes when a pickle created with 2.x is loaded
in 3.x Python? The problem is that the pickle protocol requires that
unpickled instances be created without calling the __init__ method. This
is necessary because a pickle file may not contain information about how
the __init__ method should be invoked. Instead, implementations are
required to bypass __init__ and populate the instance's __dict__
directly using data found in the pickle.

The pure Python implementation uses the following trick that happens to
work in 3.x:

    class Empty:
        pass

    pickled = Empty()
    pickled.__class__ = Pickled

This, of course, creates an instance of a new-style class in 3.x, but if
the 3.x version of Pickled behaves similarly to its 2.x predecessor, it
should work.

The cPickle implementation, on the other hand, uses a 2.x C API that is
not available in 3.x, namely the PyInstance_NewRaw function. In order to
fix the bug described in issue #5180, I had to emulate PyInstance_NewRaw
using type->tp_alloc. I considered and rejected the idea of using tp_new
instead. [2]

Is this the right way to proceed? The patch is attached to the issue. [3]

[1] http://bugs.python.org/issue5180
[2] http://bugs.python.org/issue5180#msg108846
[3] http://bugs.python.org/file17792/issue5180.diff

From lvh at laurensvh.be Mon Jun 28 23:33:05 2010
From: lvh at laurensvh.be (Laurens Van Houtven)
Date: Mon, 28 Jun 2010 23:33:05 +0200
Subject: [Python-Dev] Access a function
In-Reply-To: <20100628184228.GA17475@phd.pp.ru>
References: <29008798.post@talk.nabble.com> <20100628184228.GA17475@phd.pp.ru>
Message-ID:

Of course I concur with the two posters above me, but in order to
advertise for my own shop...

If you're stuck with a lot of newbie questions like these you might want
to try #python (the IRC channel on irc.freenode.net). You're more likely
to get quick successive responses there than on other media (which are
more suitable for bigger, more complex questions).
cheers Laurens From guido at python.org Tue Jun 29 01:09:55 2010 From: guido at python.org (Guido van Rossum) Date: Mon, 28 Jun 2010 16:09:55 -0700 Subject: [Python-Dev] [ANN]: "newthreading" - an approach to simplified thread usage, and a path to getting rid of the GIL In-Reply-To: <4C262D37.7020807@animats.com> References: <4C259A25.1060705@animats.com> <4C2600B4.5020503@voidspace.org.uk> <4C262D37.7020807@animats.com> Message-ID: I'm moving this thread to python-ideas, where it belongs. I've looked at the implementation code (even stepped through it with pdb!), read the sample/test code, and read the two papers on animats.com fairly closely (they have a lot of overlap, and the memory model described below seems copied verbatim from http://www.animats.com/papers/languages/pythonconcurrency.html version 0.8). Some reactions (trying to hide my responses to the details of the code): - First of all, I'm very happy to see radical ideas proposed, even if they are at present unrealistic. We need a big brainstorm to come up with ideas from which an eventual solution to the multicore problem might be chosen. (Jesse Noller's multiprocessing is another; Adam Olsen's work yet another, at a different end of the spectrum.) - The proposed new semantics (frozen objects, memory model, auto-freezing of globals, enforcement of naming conventions) are radically different from Python's current semantics. They will break every 3rd party library in many more ways than Python 3. This is not surprising given the goals of the proposal (and its roots in Adam Olsen's work) but places a huge roadblock for acceptance. I see no choice but to keep trying to come up with a compromise that is more palatable and compatible without throwing away all the advantages. As it now stands, the proposal might as well be a new and different language. 
- SynchronizedObject looks like a mixture of a Java synchronized class (a non-standard concept in Java but easily understood as a class all whose public methods are synchronized) and a condition variable (which has the same semantics of releasing the lock while waiting but without crawling the stack for other locks to release). It looks like the examples showing off SynchronizedObject could be implemented just as elegantly using a condition variable (and voluntary abstention from using shared mutable objects). - If the goal is to experiment with new control structures, I recommend decoupling them from the memory model and frozen objects, instead relying (as is traditional in Python) on programmer caution to avoid races. This would make it much easier to see how programmers respond to the new control structures. - You could add the freeze() function for voluntary use, and you could even add automatic wrapping of arguments and return values for certain classes using a class decorator or a metaclass, but the performance overhead makes this unlikely to win over many converts. I don't see much use for the "whole program freezing" done by the current prototype -- there are way too many backdoors in Python for the prototype approach to be anywhere near foolproof, and if we want a non-foolproof approach, voluntary constraint (and, in some cases, voluntary, i.e. explicit, wrapping of modules or classes) would work just as well. - For a larger-scale experiment with the new memory model and semantic restrictions (or would it be better to call them syntactic restrictions? -- after all they are about statically detectable properties like naming conventions) I recommend looking at PyPy, which has as one of its explicitly stated project goals easy experimentation with different object models. - I'm sure I've forgotten something, but I wanted to keep my impressions fresh. - Again, John, thanks for taking the time to come up with an implementation of your idea! 
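
The remark above, that the SynchronizedObject examples could be written with a condition variable, can be sketched with the standard threading.Condition. This is only a minimal illustration under stated assumptions: the BoundedQueue class and the run_demo helper are hypothetical names, not taken from the newthreading examples, and safety relies on the programmer voluntarily not sharing other mutable objects between threads.

```python
import threading
from collections import deque


class BoundedQueue:
    """Hypothetical bounded FIFO guarded by one condition variable."""

    def __init__(self, maxsize):
        self._items = deque()
        self._maxsize = maxsize
        self._cond = threading.Condition()

    def put(self, item):
        with self._cond:
            # wait() releases the lock while blocked and reacquires it
            # before returning, so the predicate is rechecked in a loop.
            while len(self._items) >= self._maxsize:
                self._cond.wait()
            self._items.append(item)
            self._cond.notify_all()

    def get(self):
        with self._cond:
            while not self._items:
                self._cond.wait()
            item = self._items.popleft()
            self._cond.notify_all()
            return item


def run_demo(n=20):
    """Pass n integers from one producer thread to one consumer thread."""
    queue = BoundedQueue(maxsize=4)
    results = []

    def producer():
        for i in range(n):
            queue.put(i)

    def consumer():
        for _ in range(n):
            results.append(queue.get())

    threads = [threading.Thread(target=producer),
               threading.Thread(target=consumer)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results
```

With a single producer and a single consumer, run_demo returns the integers in the order they were put. Nothing here enforces immutability or freezing; it only shows that the lock-release-while-waiting behaviour is available from an ordinary condition variable.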
--Guido

On Sat, Jun 26, 2010 at 9:39 AM, John Nagle wrote:
> On 6/26/2010 7:44 AM, Jesse Noller wrote:
>>
>> On Sat, Jun 26, 2010 at 9:29 AM, Michael Foord
>> wrote:
>>>
>>> On 26/06/2010 07:11, John Nagle wrote:
>>>>
>>>> We have just released a proof-of-concept implementation of a new
>>>> approach to thread management - "newthreading".
>
> ....
>
>>> The import * form is considered bad practise in *general* and
>>> should not be recommended unless there is a good reason.
>
> I agree. I just did that to make the examples cleaner.
>
>>> however the introduction of free-threading in Python has not been
>>> hampered by lack of synchronization primitives but by the
>>> difficulty of changing the interpreter without unduly impacting
>>> single threaded code.
>
> That's what I'm trying to address here.
>
>>> Providing an alternative garbage collection mechanism other than
>>> reference counting would be a more interesting first-step as far as
>>> I can see, as that removes the locking required around every access
>>> to an object (which currently touches the reference count).
>>> Introducing free-threading by *changing* the threading semantics
>>> (so you can't share non-frozen objects between threads) would not
>>> be acceptable. That comment is likely to be based on a
>>> misunderstanding of your future intentions though. :-)
>
> This work comes out of a discussion a few of us had at a restaurant
> in Palo Alto after a Stanford talk by the group at Facebook which
> is building a JIT compiler for PHP. We were discussing how to
> make threading both safe for the average programmer and efficient.
> Javascript and PHP don't have threads at all; Python has safe
> threading, but it's slow. C/C++/Java all have race condition
> problems, of course. The Facebook guy pointed out that you
> can't redefine a function dynamically in PHP, and they get
> a performance win in their JIT by exploiting this.
>
> I haven't gone into the memory model in enough detail in the
> technical paper. The memory model I envision for this has three
> memory zones:
>
> 1. Shared fully-immutable objects: primarily strings, numbers,
> and tuples, all of whose elements are fully immutable. These can
> be shared without locking, and reclaimed by a concurrent garbage
> collector like Boehm's. They have no destructors, so finalization
> is not an issue.
>
> 2. Local objects. These are managed as at present, and
> require no locking. These can either be thread-local, or local
> to a synchronized object. There are no links between local
> objects under different "ownership". Whether each thread and
> object has its own private heap, or whether there's a common heap with
> locks at the allocator is an implementation decision.
>
> 3. Shared mutable objects: mostly synchronized objects, but
> also immutable objects like tuples which contain references
> to objects that aren't fully immutable. These are the high-overhead
> objects, and require locking during reference count updates, or
> atomic reference count operations if supported by the hardware.
> The general idea is to minimize the number of objects in this
> zone.
>
> The zone of an object is determined when the object is created,
> and never changes. This is relatively simple to implement.
> Tuples (and frozensets, frozendicts, etc.) are normally zone 2
> objects. Only "freeze" creates collections in zones 1 and 3.
> Synchronized objects are always created in zone 3.
> There are no difficult handoffs, where an object that was previously
> thread-local now has to be shared and has to acquire locks during
> the transition.
>
> Existing interlinked data structures, like parse trees and GUIs,
> are by default zone 2 objects, with the same semantics as at
> present. They can be placed inside a SynchronizedObject if
> desired, which makes them usable from multiple threads.
> That's optional; they're thread-local otherwise.
>
> The rationale behind "freezing" some of the language semantics
> when the program goes multi-thread comes from two sources -
> Adam Olsen's Safethread work, and the acceptance of the
> multiprocessing module. Olsen tried to retain all the dynamism of
> the language in a multithreaded environment, but locking all the
> underlying dictionaries was a boat-anchor on the whole system,
> and slowed things down so much that he abandoned the project.
> The Unladen Swallow documentation indicates that early thinking
> on the project was that Olsen's approach would allow getting
> rid of the GIL, but later notes indicate that no path to a
> GIL-free JIT system is currently in development.
>
> The multiprocessing module provides semantics similar to
> threading with "freezing". Data passed between processes is "frozen"
> by pickling. Processes can't modify each other's code. Restrictive
> though the multiprocessing module is, it appears to be useful.
> It is sometimes recommended as the Pythonic approach to multi-core CPUs.
> This is an indication that "freezing" is not unacceptable to the
> user community.
>
> Most of the real-world use cases for extreme dynamism
> involve events that happen during startup. Configuration files are
> read, modules are selectively included, functions are overridden, tables
> of references to functions are set up, regular expressions are compiled,
> and the code is brought into the appropriately configured state. Then
> the worker threads are started and the real work starts. The
> "newthreading" approach allows all that.
>
> After two decades of failed attempts remove the Global
> Interpreter Lock without making performance worse, it is perhaps
> time to take a harder look at scaleable threading semantics.
>
> John Nagle
> Animats

--
--Guido van Rossum (python.org/~guido)

From steve at holdenweb.com Tue Jun 29 15:56:11 2010
From: steve at holdenweb.com (Steve Holden)
Date: Tue, 29 Jun 2010 09:56:11 -0400
Subject: [Python-Dev] Mailbox module - timings and functionality changes
Message-ID:

I hope this is an appropriate dev topic. It seems to me that the unicode
discussions of recent days are well highlighted by difficulties I am
having using the mailbox module (hardly surprising given the
difficulties of handling email generally) even though it passes its
tests. I can't find anything related in the issue tracker (symptoms: one
program that works fine under Python 2 in under twenty seconds takes
forever (over ten minutes) to fail while creating the (start, stop)
index to the mailbox).

My code reads Thunderbird mailboxen from file store on my Windows Vista
system under 3.1. The failures I am experiencing could easily be
encoding issues so I won't post any detail yet, but I am concerned about
the timing - even when the code is "fixed", if it needs to be, the
performance may still make the module of dubious value.

Can someone who is set up to do easily just do a timing of test_mailbox
under 2.6 and 3.2, to verify they see the same disparity as me? The test
takes about twice as long under 3.1 here (and I am concerned that
unexercised aspects of the code may extend real-world problem run times
by an order of magnitude or more).

regards
 Steve
--
Steve Holden +1 571 484 6266 +1 800 494 3119
See Python Video!
http://python.mirocommunity.org/ Holden Web LLC http://www.holdenweb.com/ UPCOMING EVENTS: http://holdenweb.eventbrite.com/ "All I want for my birthday is another birthday" - Ian Dury, 1942-2000 From miki.tebeka at gmail.com Tue Jun 29 16:10:20 2010 From: miki.tebeka at gmail.com (Miki Tebeka) Date: Tue, 29 Jun 2010 07:10:20 -0700 Subject: [Python-Dev] Mailbox module - timings and functionality changes In-Reply-To: References: Message-ID: Hello Steve, > Can someone who is set up to do easily just do a timing of test_mailbox > under 2.6 and 3.2, to verify they see the same disparity as me? The test > takes about twice as long under 3.1 here On Ubuntu timing was: Python 2.6.5: 23.8sec Python 2.7rc2: 32.7sec Python 3.1.2: 32.3sec All the best, -- Miki From orsenthil at gmail.com Tue Jun 29 16:11:20 2010 From: orsenthil at gmail.com (Senthil Kumaran) Date: Tue, 29 Jun 2010 19:41:20 +0530 Subject: [Python-Dev] Mailbox module - timings and functionality changes In-Reply-To: References: Message-ID: <20100629141120.GA7448@remy> On Tue, Jun 29, 2010 at 09:56:11AM -0400, Steve Holden wrote: > Can someone who is set up to do easily just do a timing of test_mailbox > under 2.6 and 3.2, to verify they see the same disparity as me? The test Actually, No. Python 2.7b2+ (trunk:81685M, Jun 4 2010, 21:52:06) Ran 274 tests in 27.231s OK real 0m27.769s user 0m1.110s sys 0m0.440s Python 3.2a0 (py3k:82364M, Jun 29 2010, 19:37:27 Ran 268 tests in 24.444s OK real 0m25.126s user 0m2.810s sys 0m0.270s 07:39 PM:senthil@:~/python/py3k This is under Ubuntu 64 Bit. Perhaps, the problem you are observing is Windows Only? -- Senthil Banectomy, n.: The removal of bruises on a banana. 
-- Rich Hall, "Sniglets" From ncoghlan at gmail.com Tue Jun 29 16:14:31 2010 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 30 Jun 2010 00:14:31 +1000 Subject: [Python-Dev] Mailbox module - timings and functionality changes In-Reply-To: References: Message-ID: Command line: ./python -m test.regrtest -v test_mailbox trunk: Ran 274 tests in 25.239s py3k: Ran 268 tests in 26.263s So I don't see any substantial difference on a Kubuntu 10.04 box (both builds are recent'ish, but not completely up to date). However, the underlying IO access is significantly different between POSIX and Windows, so there could still be something pathological happening at the filesystem manipulation layer. My comparisons are also 2.7 vs 3.2 rather than 2.6 vs 3.1. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From steve at holdenweb.com Tue Jun 29 16:26:28 2010 From: steve at holdenweb.com (Steve Holden) Date: Tue, 29 Jun 2010 10:26:28 -0400 Subject: [Python-Dev] Mailbox module - timings and functionality changes In-Reply-To: References: Message-ID: <4C2A0294.3070806@holdenweb.com> Nick Coghlan wrote: > Command line: ./python -m test.regrtest -v test_mailbox > > trunk: Ran 274 tests in 25.239s > py3k: Ran 268 tests in 26.263s > > So I don't see any substantial difference on a Kubuntu 10.04 box (both > builds are recent'ish, but not completely up to date). > > However, the underlying IO access is significantly different between > POSIX and Windows, so there could still be something pathological > happening at the filesystem manipulation layer. My comparisons are > also 2.7 vs 3.2 rather than 2.6 vs 3.1. > > Cheers, > Nick. > Thanks for all the timings! If a Windows user could do the same thing that would help ... regards Steve -- Steve Holden +1 571 484 6266 +1 800 494 3119 See Python Video! 
http://python.mirocommunity.org/ Holden Web LLC http://www.holdenweb.com/ UPCOMING EVENTS: http://holdenweb.eventbrite.com/ "All I want for my birthday is another birthday" - Ian Dury, 1942-2000 From steve at holdenweb.com Tue Jun 29 16:49:00 2010 From: steve at holdenweb.com (Steve Holden) Date: Tue, 29 Jun 2010 10:49:00 -0400 Subject: [Python-Dev] Mailbox module - timings and functionality changes In-Reply-To: <4C2A0294.3070806@holdenweb.com> References: <4C2A0294.3070806@holdenweb.com> Message-ID: Steve Holden wrote: > Nick Coghlan wrote: >> Command line: ./python -m test.regrtest -v test_mailbox >> >> trunk: Ran 274 tests in 25.239s >> py3k: Ran 268 tests in 26.263s >> >> So I don't see any substantial difference on a Kubuntu 10.04 box (both >> builds are recent'ish, but not completely up to date). >> >> However, the underlying IO access is significantly different between >> POSIX and Windows, so there could still be something pathological >> happening at the filesystem manipulation layer. My comparisons are >> also 2.7 vs 3.2 rather than 2.6 vs 3.1. >> >> Cheers, >> Nick. >> > Thanks for all the timings! If a Windows user could do the same thing > that would help ... > And there is *definitely a performance issue. I created a Thunderbird folder of 26 Google alerts and just parsed then all after reading them in from the mailbox. 2.5 (!): 0.78 sec 3.1 : 42.80 sec Rather than debate the code here perhaps I should just open an issue for this? I can then provide both a program and some data, which can be added to the tests if appropriate. The issue can clearly stand some investigation. regards Steve -- Steve Holden +1 571 484 6266 +1 800 494 3119 See Python Video! 
http://python.mirocommunity.org/ Holden Web LLC http://www.holdenweb.com/ UPCOMING EVENTS: http://holdenweb.eventbrite.com/ "All I want for my birthday is another birthday" - Ian Dury, 1942-2000 From barry at python.org Tue Jun 29 16:50:12 2010 From: barry at python.org (Barry Warsaw) Date: Tue, 29 Jun 2010 10:50:12 -0400 Subject: [Python-Dev] what environment variable should contain compiler warning suppression flags? In-Reply-To: <4C28BF83.9080903@egenix.com> References: <4C268F1E.5070506@egenix.com> <4C2889B7.2060105@egenix.com> <4C28ABD4.1030000@egenix.com> <4C28BF83.9080903@egenix.com> Message-ID: <20100629105012.341adc7b@heresy> On Jun 28, 2010, at 05:28 PM, M.-A. Lemburg wrote: >How many Python users will compile Python in debug mode ? How many Python users compile Python at all? :) >The point is that the default build of Python should use >the correct production settings for the C compiler out of >the box and that's what AC_PROG_CC is all about. Sure. >I'm pretty sure that Python developers who want to use a >debug build have enough code foo to get the -O2 turned into a -O0 >either by adjust OPT and/or by providing their own CFLAGS env var. Yes, but it's a PITA for several reasons, IMO: * It's pretty underdocumented * It's obscure * It's hard to remember the exact fu needed because you do it infrequently * I usually only remember my mistake when gdb acts funny I strongly suggest that --with-pydebug should be all you need to ensure the best debugging environment, which means turning off compiler optimization. Last time I tried, the -O0 was added and it worked well. (I know this has been in flux though.) >Also note that in some cases you may actually want to have >a debug build with optimizations turned on, e.g. to track down >a compiler optimization bug. Yes, but that's *much* more rare than wanting to step through some bit of C code without going crazy. -Barry -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 836 bytes Desc: not available URL: From mail at timgolden.me.uk Tue Jun 29 16:51:00 2010 From: mail at timgolden.me.uk (Tim Golden) Date: Tue, 29 Jun 2010 15:51:00 +0100 Subject: [Python-Dev] Mailbox module - timings and functionality changes In-Reply-To: <4C2A0294.3070806@holdenweb.com> References: <4C2A0294.3070806@holdenweb.com> Message-ID: <4C2A0854.5060004@timgolden.me.uk> On 29/06/2010 15:26, Steve Holden wrote: > Nick Coghlan wrote: >> Command line: ./python -m test.regrtest -v test_mailbox >> >> trunk: Ran 274 tests in 25.239s >> py3k: Ran 268 tests in 26.263s >> >> So I don't see any substantial difference on a Kubuntu 10.04 box (both >> builds are recent'ish, but not completely up to date). >> >> However, the underlying IO access is significantly different between >> POSIX and Windows, so there could still be something pathological >> happening at the filesystem manipulation layer. My comparisons are >> also 2.7 vs 3.2 rather than 2.6 vs 3.1. >> >> Cheers, >> Nick. >> > Thanks for all the timings! If a Windows user could do the same thing > that would help ... WinXP SP3 2.6 Ran 272 tests in 13.172s 3.1 Ran 267 tests in 15.735s py3k A *lot* of ERROR and FAIL tests WinXP SP3 TJG From barry at python.org Tue Jun 29 16:51:35 2010 From: barry at python.org (Barry Warsaw) Date: Tue, 29 Jun 2010 10:51:35 -0400 Subject: [Python-Dev] what environment variable should contain compiler warning suppression flags? In-Reply-To: <4C28C7CB.8030600@egenix.com> References: <4C268F1E.5070506@egenix.com> <4C2889B7.2060105@egenix.com> <4C28ABD4.1030000@egenix.com> <4C28BF83.9080903@egenix.com> <4C28C7CB.8030600@egenix.com> Message-ID: <20100629105135.245bf5d7@heresy> On Jun 28, 2010, at 06:03 PM, M.-A. Lemburg wrote: >OPT already uses -O0 if --with-pydebug is used and the >compiler supports -g. Since OPT gets added after CFLAGS, the override >already happens... So nobody's proposing to drop that? Good! 
Ignore my last message then. :)

-Barry
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 836 bytes
Desc: not available
URL:

From guido at python.org Tue Jun 29 16:56:22 2010
From: guido at python.org (Guido van Rossum)
Date: Tue, 29 Jun 2010 07:56:22 -0700
Subject: [Python-Dev] Mailbox module - timings and functionality changes
In-Reply-To:
References: <4C2A0294.3070806@holdenweb.com>
Message-ID:

On Tue, Jun 29, 2010 at 7:49 AM, Steve Holden wrote:
> Steve Holden wrote:
>> Nick Coghlan wrote:
>>> Command line: ./python -m test.regrtest -v test_mailbox
>>>
>>> trunk: Ran 274 tests in 25.239s
>>> py3k: Ran 268 tests in 26.263s
>>>
>>> So I don't see any substantial difference on a Kubuntu 10.04 box (both
>>> builds are recent'ish, but not completely up to date).
>>>
>>> However, the underlying IO access is significantly different between
>>> POSIX and Windows, so there could still be something pathological
>>> happening at the filesystem manipulation layer. My comparisons are
>>> also 2.7 vs 3.2 rather than 2.6 vs 3.1.
>>>
>>> Cheers,
>>> Nick.
>>>
>> Thanks for all the timings! If a Windows user could do the same thing
>> that would help ...
>>
> And there is *definitely a performance issue. I created a Thunderbird
> folder of 26 Google alerts and just parsed then all after reading them
> in from the mailbox.
>
> 2.5 (!):  0.78 sec
> 3.1    : 42.80 sec
>
> Rather than debate the code here perhaps I should just open an issue for
> this? I can then provide both a program and some data, which can be
> added to the tests if appropriate. The issue can clearly stand some
> investigation.

Since you have such a great reproducible test case, could you point
the profiler at it? (Perhaps on a reduced dataset... The profiler
multiples your run time by some number between 2 and 10 IIRC.)
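A minimal sketch of the profiling step suggested above. It assumes a tiny
hypothetical two-message mbox written to a temporary file, standing in for
the real Thunderbird folder; parse_all, profile_parse and write_sample are
illustrative names and are not taken from the test program discussed in
this thread.

```python
import cProfile
import io
import mailbox
import os
import pstats
import tempfile

# Hypothetical sample data: two very small messages in mbox format.
SAMPLE = (
    "From alice@example.com Tue Jun 29 10:00:00 2010\n"
    "Subject: first\n"
    "\n"
    "body one\n"
    "\n"
    "From bob@example.com Tue Jun 29 10:05:00 2010\n"
    "Subject: second\n"
    "\n"
    "body two\n"
)


def write_sample():
    """Write SAMPLE to a temporary file and return its path."""
    fd, path = tempfile.mkstemp(suffix=".mbox")
    with os.fdopen(fd, "w", newline="\n") as f:
        f.write(SAMPLE)
    return path


def parse_all(path):
    """Read every message out of the mbox and return the subjects."""
    box = mailbox.mbox(path)
    try:
        return [box[key]["Subject"] for key in box.iterkeys()]
    finally:
        box.close()


def profile_parse(path, limit=10):
    """Run parse_all() under cProfile; return (result, report text)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(parse_all, path)
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    stats.sort_stats("cumulative").print_stats(limit)
    return result, out.getvalue()


if __name__ == "__main__":
    sample_path = write_sample()
    try:
        subjects, report = profile_parse(sample_path)
        print(subjects)
        print(report)
    finally:
        os.remove(sample_path)
```

Sorting by cumulative time puts the callers that dominate the run near the
top of the report, which is usually the quickest way to see where the time
goes. Running "python -m cProfile script.py" gives a similar report without
any code changes.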
-- --Guido van Rossum (python.org/~guido) From mail at timgolden.me.uk Tue Jun 29 17:04:48 2010 From: mail at timgolden.me.uk (Tim Golden) Date: Tue, 29 Jun 2010 16:04:48 +0100 Subject: [Python-Dev] Mailbox module - timings and functionality changes In-Reply-To: <4C2A0854.5060004@timgolden.me.uk> References: <4C2A0294.3070806@holdenweb.com> <4C2A0854.5060004@timgolden.me.uk> Message-ID: <4C2A0B90.9020705@timgolden.me.uk> On 29/06/2010 15:51, Tim Golden wrote: > On 29/06/2010 15:26, Steve Holden wrote: >> Nick Coghlan wrote: >>> Command line: ./python -m test.regrtest -v test_mailbox >>> >>> trunk: Ran 274 tests in 25.239s >>> py3k: Ran 268 tests in 26.263s >>> >>> So I don't see any substantial difference on a Kubuntu 10.04 box (both >>> builds are recent'ish, but not completely up to date). >>> >>> However, the underlying IO access is significantly different between >>> POSIX and Windows, so there could still be something pathological >>> happening at the filesystem manipulation layer. My comparisons are >>> also 2.7 vs 3.2 rather than 2.6 vs 3.1. >>> >>> Cheers, >>> Nick. >>> >> Thanks for all the timings! If a Windows user could do the same thing >> that would help ... > > WinXP SP3 > > 2.6 Ran 272 tests in 13.172s > 3.1 Ran 267 tests in 15.735s > py3k A *lot* of ERROR and FAIL tests py3k HEAD on Win7 Ran 268 tests in 34.055s TJG From vinay_sajip at yahoo.co.uk Tue Jun 29 17:15:22 2010 From: vinay_sajip at yahoo.co.uk (Vinay Sajip) Date: Tue, 29 Jun 2010 15:15:22 +0000 (UTC) Subject: [Python-Dev] Pickle security and remote logging References: Message-ID: anatoly techtonik gmail.com> writes: > insecure. SocketHandler and DatagramHandler docs should at least > contain a warning about danger of exposing unpickling interfaces to > insecure networks. I've updated the documentation of SocketHandler.makePickle to mention security concerns, and that the method can be overridden to use a more secure implementation (e.g. HMAC-signed pickles). 
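A minimal sketch of what "HMAC-signed pickles" could look like, assuming a
secret key shared by sender and receiver. The names sign_pickle and
load_signed_pickle are hypothetical; wiring them into
SocketHandler.makePickle and into the receiving end (including the length
prefix used on the wire) is not shown. Note that this only authenticates
the sender: anyone who holds the key can still send an arbitrary pickle.

```python
import hashlib
import hmac
import pickle

DIGEST = hashlib.sha256
DIGEST_SIZE = DIGEST().digest_size  # 32 bytes for SHA-256


def sign_pickle(obj, key):
    """Pickle obj and prepend an HMAC computed over the pickled bytes."""
    payload = pickle.dumps(obj, protocol=2)
    mac = hmac.new(key, payload, DIGEST).digest()
    return mac + payload


def load_signed_pickle(data, key):
    """Verify the HMAC first; unpickle only if the signature matches."""
    mac, payload = data[:DIGEST_SIZE], data[DIGEST_SIZE:]
    expected = hmac.new(key, payload, DIGEST).digest()
    # compare_digest avoids leaking how many leading bytes matched.
    if not hmac.compare_digest(mac, expected):
        raise ValueError("bad signature; refusing to unpickle")
    return pickle.loads(payload)
```

The important design point is the order of operations: the signature is
checked before pickle.loads() ever sees the data.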
Regards, Vinay Sajip From steve at holdenweb.com Tue Jun 29 17:29:55 2010 From: steve at holdenweb.com (Steve Holden) Date: Tue, 29 Jun 2010 11:29:55 -0400 Subject: [Python-Dev] what environment variable should contain compiler warning suppression flags? In-Reply-To: <20100629105012.341adc7b@heresy> References: <4C268F1E.5070506@egenix.com> <4C2889B7.2060105@egenix.com> <4C28ABD4.1030000@egenix.com> <4C28BF83.9080903@egenix.com> <20100629105012.341adc7b@heresy> Message-ID: Barry Warsaw wrote: > On Jun 28, 2010, at 05:28 PM, M.-A. Lemburg wrote: > >> How many Python users will compile Python in debug mode ? > > How many Python users compile Python at all? :) > >> The point is that the default build of Python should use >> the correct production settings for the C compiler out of >> the box and that's what AC_PROG_CC is all about. > > Sure. > >> I'm pretty sure that Python developers who want to use a >> debug build have enough code foo to get the -O2 turned into a -O0 >> either by adjust OPT and/or by providing their own CFLAGS env var. > > Yes, but it's a PITA for several reasons, IMO: > > * It's pretty underdocumented > * It's obscure > * It's hard to remember the exact fu needed because you do it infrequently > * I usually only remember my mistake when gdb acts funny > > I strongly suggest that --with-pydebug should be all you need to ensure the > best debugging environment, which means turning off compiler optimization. > Last time I tried, the -O0 was added and it worked well. (I know this has > been in flux though.) > >> Also note that in some cases you may actually want to have >> a debug build with optimizations turned on, e.g. to track down >> a compiler optimization bug. > > Yes, but that's *much* more rare than wanting to step through some bit of C > code without going crazy. I agree - trying to step through -O2 optimized code isn't going to help debug your code, it's going to help you debug the optimizer. That's a very rare use case. 
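One way to check which flags a given interpreter was actually built with is
to ask sysconfig. This is only a sketch: the variable names below come from
the Makefile on Unix-like builds, and any of them may be None on other
platforms. build_flags is an illustrative helper name.

```python
import sysconfig


def build_flags():
    """Return the build-time values of a few compiler-related variables."""
    names = ("OPT", "CFLAGS", "Py_DEBUG")
    return {name: sysconfig.get_config_var(name) for name in names}


if __name__ == "__main__":
    for name, value in build_flags().items():
        print(name, "=", value)
```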
regards Steve -- Steve Holden +1 571 484 6266 +1 800 494 3119 See Python Video! http://python.mirocommunity.org/ Holden Web LLC http://www.holdenweb.com/ UPCOMING EVENTS: http://holdenweb.eventbrite.com/ "All I want for my birthday is another birthday" - Ian Dury, 1942-2000 From steve at holdenweb.com Tue Jun 29 17:40:50 2010 From: steve at holdenweb.com (Steve Holden) Date: Tue, 29 Jun 2010 11:40:50 -0400 Subject: [Python-Dev] Mailbox module - timings and functionality changes In-Reply-To: References: <4C2A0294.3070806@holdenweb.com> Message-ID: Guido van Rossum wrote: > On Tue, Jun 29, 2010 at 7:49 AM, Steve Holden wrote: >> Steve Holden wrote: >>> Nick Coghlan wrote: >>>> Command line: ./python -m test.regrtest -v test_mailbox >>>> >>>> trunk: Ran 274 tests in 25.239s >>>> py3k: Ran 268 tests in 26.263s >>>> >>>> So I don't see any substantial difference on a Kubuntu 10.04 box (both >>>> builds are recent'ish, but not completely up to date). >>>> >>>> However, the underlying IO access is significantly different between >>>> POSIX and Windows, so there could still be something pathological >>>> happening at the filesystem manipulation layer. My comparisons are >>>> also 2.7 vs 3.2 rather than 2.6 vs 3.1. >>>> >>>> Cheers, >>>> Nick. >>>> >>> Thanks for all the timings! If a Windows user could do the same thing >>> that would help ... >>> >> And there is *definitely a performance issue. I created a Thunderbird >> folder of 26 Google alerts and just parsed then all after reading them >> in from the mailbox. >> >> 2.5 (!): 0.78 sec >> 3.1 : 42.80 sec >> >> Rather than debate the code here perhaps I should just open an issue for >> this? I can then provide both a program and some data, which can be >> added to the tests if appropriate. The issue can clearly stand some >> investigation. > > Since you have such a great reproducible test case, could you point > the profiler at it? (Perhaps on a reduced dataset... 
The profiler > multiples your run time by some number between 2 and 10 IIRC.) > Sure. I attach the outputs of both files, as well as the program and the data. With profiling (python -m cProfile test3.py) the run took less than a third of a second under 2.5, and 168 seconds under 3.1. I'd say that was problematical :) I will leave the profiler output to speak for itself, since I can find nothing much to say about it except that there's a hell of a lot of decoding going on inside mailbox.iterkeys(). regards Steve -- Steve Holden +1 571 484 6266 +1 800 494 3119 See Python Video! http://python.mirocommunity.org/ Holden Web LLC http://www.holdenweb.com/ UPCOMING EVENTS: http://holdenweb.eventbrite.com/ "All I want for my birthday is another birthday" - Ian Dury, 1942-2000 -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: test3.1.out URL: -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: test2.5.out URL: -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: test3.py URL: -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: test.mailbox URL: From solipsis at pitrou.net Tue Jun 29 18:34:22 2010 From: solipsis at pitrou.net (Antoine Pitrou) Date: Tue, 29 Jun 2010 18:34:22 +0200 Subject: [Python-Dev] Mailbox module - timings and functionality changes References: <4C2A0294.3070806@holdenweb.com> Message-ID: <20100629183422.00f1997d@pitrou.net> On Tue, 29 Jun 2010 11:40:50 -0400 Steve Holden wrote: > Sure. I attach the outputs of both files, as well as the program and the > data. With profiling (python -m cProfile test3.py) the run took less > than a third of a second under 2.5, and 168 seconds under 3.1. 
I'd say > that was problematical :) > > I will leave the profiler output to speak for itself, since I can find > nothing much to say about it except that there's a hell of a lot of > decoding going on inside mailbox.iterkeys(). Ok, a lot of time is spent in cp1252 decoding. Somewhat less time, but still too much of it, is spent in TextIOWrapper.tell(). This seems to imply that mailbox files are opened in text mode, which sounds wrong to me. Perhaps Andrew can shed more light on this? From amk at amk.ca Tue Jun 29 18:34:42 2010 From: amk at amk.ca (A.M. Kuchling) Date: Tue, 29 Jun 2010 12:34:42 -0400 Subject: [Python-Dev] Mailbox module - timings and functionality changes In-Reply-To: References: <4C2A0294.3070806@holdenweb.com> Message-ID: <20100629163442.GA5051@amk-desktop.matrixgroup.net> On Tue, Jun 29, 2010 at 07:56:22AM -0700, Guido van Rossum wrote: > Since you have such a great reproducible test case, could you point > the profiler at it? (Perhaps on a reduced dataset... The profiler > multiples your run time by some number between 2 and 10 IIRC.) Let me underline Guido's suggestion. Steve, I've done a lot of mailbox.py stuff and can look at your problem, but off the top of my head, my suspicion would be that I/O is the culprit, and a profile could confirm that. My thought is that mailbox.py is opening the file in some reading mode that ends up doing a lot more processing on Windows than on Unix because of universal newlines or something like that. --amk From amk at amk.ca Tue Jun 29 18:52:28 2010 From: amk at amk.ca (A.M. 
Kuchling)
Date: Tue, 29 Jun 2010 12:52:28 -0400
Subject: [Python-Dev] Mailbox module - timings and functionality changes
In-Reply-To:
References: <4C2A0294.3070806@holdenweb.com>
Message-ID: <20100629165228.GA5350@amk-desktop.matrixgroup.net>

On Tue, Jun 29, 2010 at 11:40:50AM -0400, Steve Holden wrote:
> I will leave the profiler output to speak for itself, since I can find
> nothing much to say about it except that there's a hell of a lot of
> decoding going on inside mailbox.iterkeys().

The problem is actually in _generate_toc(), which is reading through
the entire file to figure out where all the 'From' lines that start
messages are located.  TextIOWrapper()'s tell() method seems to be
very slow, so one help is to only call tell() when necessary; patch:

-> svn diff Lib/
Index: Lib/mailbox.py
===================================================================
--- Lib/mailbox.py      (revision 82346)
+++ Lib/mailbox.py      (working copy)
@@ -775,13 +775,14 @@
         starts, stops = [], []
         self._file.seek(0)
         while True:
-            line_pos = self._file.tell()
             line = self._file.readline()
             if line.startswith('From '):
+                line_pos = self._file.tell()
                 if len(stops) < len(starts):
                     stops.append(line_pos - len(os.linesep))
                 starts.append(line_pos)
             elif not line:
+                line_pos = self._file.tell()
                 stops.append(line_pos)
                 break
         self._toc = dict(enumerate(zip(starts, stops)))

But should mailboxes really be opened in a UTF-8 encoding, or should
they be treated as 7-bit text?  I'll have to think about this.

--amk

From rdmurray at bitdance.com Tue Jun 29 19:20:35 2010
From: rdmurray at bitdance.com (R.
David Murray) Date: Tue, 29 Jun 2010 13:20:35 -0400 Subject: [Python-Dev] Mailbox module - timings and functionality changes In-Reply-To: <20100629183422.00f1997d@pitrou.net> References: <4C2A0294.3070806@holdenweb.com> <20100629183422.00f1997d@pitrou.net> Message-ID: <20100629172035.8348D21A2AF@kimball.webabinitio.net> On Tue, 29 Jun 2010 18:34:22 +0200, Antoine Pitrou wrote: > On Tue, 29 Jun 2010 11:40:50 -0400 > Steve Holden wrote: > > Sure. I attach the outputs of both files, as well as the program and the > > data. With profiling (python -m cProfile test3.py) the run took less > > than a third of a second under 2.5, and 168 seconds under 3.1. I'd say > > that was problematical :) > > > > I will leave the profiler output to speak for itself, since I can find > > nothing much to say about it except that there's a hell of a lot of > > decoding going on inside mailbox.iterkeys(). > > Ok, a lot of time is spent in cp1252 decoding. Somewhat less time, but > still too much of it, is spent in TextIOWrapper.tell(). This seems to > imply that mailbox files are opened in text mode, which sounds wrong to > me. Perhaps Andrew can shed more light on this? Given the current state of the email package for python3, it makes sense that it would open them in text mode. email can't currently process bytes, only text. -- R. David Murray www.bitdance.com From solipsis at pitrou.net Tue Jun 29 19:30:53 2010 From: solipsis at pitrou.net (Antoine Pitrou) Date: Tue, 29 Jun 2010 19:30:53 +0200 Subject: [Python-Dev] Mailbox module - timings and functionality changes References: <4C2A0294.3070806@holdenweb.com> <20100629165228.GA5350@amk-desktop.matrixgroup.net> Message-ID: <20100629193053.750991e1@pitrou.net> On Tue, 29 Jun 2010 12:52:28 -0400 "A.M. Kuchling" wrote: > > But should mailboxes really be opened in a UTF-8 encoding, or should > they be treated as 7-bit text? I'll have to think about this. 
I don't see how you can assume UTF-8 for mailbox files, given that each message will have its particular encoding. Besides, Steve's profile results show that you are not using UTF-8, but rather the local encoding, which is cp1252 under his Windows setup. Regards Antoine. From steve at holdenweb.com Tue Jun 29 19:54:09 2010 From: steve at holdenweb.com (Steve Holden) Date: Tue, 29 Jun 2010 13:54:09 -0400 Subject: [Python-Dev] Mailbox module - timings and functionality changes In-Reply-To: <20100629165228.GA5350@amk-desktop.matrixgroup.net> References: <4C2A0294.3070806@holdenweb.com> <20100629165228.GA5350@amk-desktop.matrixgroup.net> Message-ID: <4C2A3341.4010705@holdenweb.com> A.M. Kuchling wrote: > On Tue, Jun 29, 2010 at 11:40:50AM -0400, Steve Holden wrote: >> I will leave the profiler output to speak for itself, since I can find >> nothing much to say about it except that there's a hell of a lot of >> decoding going on inside mailbox.iterkeys(). > > The problem is actually in _generate_toc(), which is reading through > the entire file to figure out where all the 'From' lines that start > messages are located. TextIOWrapper()'s tell() method seems to be > very slow, so one help is to only call tell() when necessary; patch: > > -> svn diff Lib/ > Index: Lib/mailbox.py > =================================================================== > --- Lib/mailbox.py (revision 82346) > +++ Lib/mailbox.py (working copy) > @@ -775,13 +775,14 @@ > starts, stops = [], [] > self._file.seek(0) > while True: > - line_pos = self._file.tell() > line = self._file.readline() > if line.startswith('From '): > + line_pos = self._file.tell() > if len(stops) < len(starts): > stops.append(line_pos - len(os.linesep)) > starts.append(line_pos) > elif not line: > + line_pos = self._file.tell() > stops.append(line_pos) > break > self._toc = dict(enumerate(zip(starts, stops))) > > But should mailboxes really be opened in a UTF-8 encoding, or should > they be treated as 7-bit text? 
I'll have to think about this. Neither! You can't open them as 7-bit text, because real-world email does contain bytes whose ordinal value exceeds 127. You can't open them using a text encoding because theoretically there might be ASCII headers that indicate that parts of the content are in specific character sets or encodings. If only we had a data structure that easily allowed us to manipulate 8-bit characters ... regards Steve -- Steve Holden +1 571 484 6266 +1 800 494 3119 See Python Video! http://python.mirocommunity.org/ Holden Web LLC http://www.holdenweb.com/ UPCOMING EVENTS: http://holdenweb.eventbrite.com/ "All I want for my birthday is another birthday" - Ian Dury, 1942-2000 From guido at python.org Tue Jun 29 21:26:31 2010 From: guido at python.org (Guido van Rossum) Date: Tue, 29 Jun 2010 12:26:31 -0700 Subject: [Python-Dev] Mailbox module - timings and functionality changes In-Reply-To: <4C2A3341.4010705@holdenweb.com> References: <4C2A0294.3070806@holdenweb.com> <20100629165228.GA5350@amk-desktop.matrixgroup.net> <4C2A3341.4010705@holdenweb.com> Message-ID: It should probably be opened in binary mode. Binary files do have a .readline() method (returning a bytes object), and bytes objects have a .startswith() method. The tell positions computed this way are even compatible with those used by the text file. So you could do it this way: - open binary stream - compute TOC by reading through it using .readline() and .tell() - rewind (don't close) - wrap the binary stream in a text stream - use that for the rest of the code --Guido On Tue, Jun 29, 2010 at 10:54 AM, Steve Holden wrote: > A.M. Kuchling wrote: >> On Tue, Jun 29, 2010 at 11:40:50AM -0400, Steve Holden wrote: >>> I will leave the profiler output to speak for itself, since I can find >>> nothing much to say about it except that there's a hell of a lot of >>> decoding going on inside mailbox.iterkeys(). 
>>
>> The problem is actually in _generate_toc(), which is reading through
>> the entire file to figure out where all the 'From' lines that start
>> messages are located.  TextIOWrapper()'s tell() method seems to be
>> very slow, so one help is to only call tell() when necessary; patch:
>>
>> -> svn diff Lib/
>> Index: Lib/mailbox.py
>> ===================================================================
>> --- Lib/mailbox.py      (revision 82346)
>> +++ Lib/mailbox.py      (working copy)
>> @@ -775,13 +775,14 @@
>>          starts, stops = [], []
>>          self._file.seek(0)
>>          while True:
>> -            line_pos = self._file.tell()
>>              line = self._file.readline()
>>              if line.startswith('From '):
>> +                line_pos = self._file.tell()
>>                  if len(stops) < len(starts):
>>                      stops.append(line_pos - len(os.linesep))
>>                  starts.append(line_pos)
>>              elif not line:
>> +                line_pos = self._file.tell()
>>                  stops.append(line_pos)
>>                  break
>>          self._toc = dict(enumerate(zip(starts, stops)))
>>
>> But should mailboxes really be opened in a UTF-8 encoding, or should
>> they be treated as 7-bit text?  I'll have to think about this.
>
> Neither! You can't open them as 7-bit text, because real-world email
> does contain bytes whose ordinal value exceeds 127. You can't open them
> using a text encoding because theoretically there might be ASCII headers
> that indicate that parts of the content are in specific character sets
> or encodings.
>
> If only we had a data structure that easily allowed us to manipulate
> 8-bit characters ...
>
> regards
>  Steve
> --
> Steve Holden           +1 571 484 6266   +1 800 494 3119
> See Python Video!       http://python.mirocommunity.org/
> Holden Web LLC                 http://www.holdenweb.com/
> UPCOMING EVENTS:        http://holdenweb.eventbrite.com/
> "All I want for my birthday is another birthday" -
>                                      Ian Dury, 1942-2000
> _______________________________________________
> Python-Dev mailing list
> Python-Dev at python.org
> http://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe: http://mail.python.org/mailman/options/python-dev/guido%40python.org
>

--
--Guido van Rossum (python.org/~guido)

From steve at holdenweb.com Tue Jun 29 23:02:14 2010
From: steve at holdenweb.com (Steve Holden)
Date: Tue, 29 Jun 2010 17:02:14 -0400
Subject: [Python-Dev] Mailbox module - timings and functionality changes
In-Reply-To:
References: <4C2A0294.3070806@holdenweb.com>
 <20100629165228.GA5350@amk-desktop.matrixgroup.net>
 <4C2A3341.4010705@holdenweb.com>
Message-ID: <4C2A5F56.2010700@holdenweb.com>

Guido van Rossum wrote:
> It should probably be opened in binary mode. Binary files do have a
> .readline() method (returning a bytes object), and bytes objects have
> a .startswith() method. The tell positions computed this way are even
> compatible with those used by the text file. So you could do it this
> way:
>
> - open binary stream
> - compute TOC by reading through it using .readline() and .tell()
> - rewind (don't close)

Because closing is inefficient, or because it breaks the algorithm?

> - wrap the binary stream in a text stream

"wrap" how? The ultimate destiny of the text is twofold:

1) To be stored as some kind of LOB in a database, and

2) Therefrom to be reconstituted and parsed into email.Message objects.

Is the wrapping a one-off operation or a software layer? Sorry, being a
bit dense here, I know.

regards
 Steve

> - use that for the rest of the code
>
> --Guido
>
> On Tue, Jun 29, 2010 at 10:54 AM, Steve Holden wrote:
>> A.M.
Kuchling wrote: >>> On Tue, Jun 29, 2010 at 11:40:50AM -0400, Steve Holden wrote: >>>> I will leave the profiler output to speak for itself, since I can find >>>> nothing much to say about it except that there's a hell of a lot of >>>> decoding going on inside mailbox.iterkeys(). >>> The problem is actually in _generate_toc(), which is reading through >>> the entire file to figure out where all the 'From' lines that start >>> messages are located. TextIOWrapper()'s tell() method seems to be >>> very slow, so one help is to only call tell() when necessary; patch: >>> >>> -> svn diff Lib/ >>> Index: Lib/mailbox.py >>> =================================================================== >>> --- Lib/mailbox.py (revision 82346) >>> +++ Lib/mailbox.py (working copy) >>> @@ -775,13 +775,14 @@ >>> starts, stops = [], [] >>> self._file.seek(0) >>> while True: >>> - line_pos = self._file.tell() >>> line = self._file.readline() >>> if line.startswith('From '): >>> + line_pos = self._file.tell() >>> if len(stops) < len(starts): >>> stops.append(line_pos - len(os.linesep)) >>> starts.append(line_pos) >>> elif not line: >>> + line_pos = self._file.tell() >>> stops.append(line_pos) >>> break >>> self._toc = dict(enumerate(zip(starts, stops))) >>> >>> But should mailboxes really be opened in a UTF-8 encoding, or should >>> they be treated as 7-bit text? I'll have to think about this. >> Neither! You can't open them as 7-bit text, because real-world email >> does contain bytes whose ordinal value exceeds 127. You can't open them >> using a text encoding because theoretically there might be ASCII headers >> that indicate that parts of the content are in specific character sets >> or encodings. >> >> If only we had a data structure that easily allowed us to manipulate >> 8-bit characters ... >> >> regards >> Steve -- Steve Holden +1 571 484 6266 +1 800 494 3119 See Python Video! 
http://python.mirocommunity.org/ Holden Web LLC http://www.holdenweb.com/ UPCOMING EVENTS: http://holdenweb.eventbrite.com/ "All I want for my birthday is another birthday" - Ian Dury, 1942-2000 From techtonik at gmail.com Wed Jun 30 01:22:59 2010 From: techtonik at gmail.com (anatoly techtonik) Date: Wed, 30 Jun 2010 02:22:59 +0300 Subject: [Python-Dev] Pickle security and remote logging In-Reply-To: References: Message-ID: On Tue, Jun 29, 2010 at 6:15 PM, Vinay Sajip wrote: > > I've updated the documentation of SocketHandler.makePickle to mention security > concerns, and that the method can be overridden to use a more secure > implementation (e.g. HMAC-signed pickles). Thanks. But I doubt HMAC complication helps to protect logging server. If shared key is compromised -server becomes vulnerable. I would prefer approach when no code execution is possible. Some alternative serialization way for transmitting log data structures over network. Protocol buffers first come in mind, but they seem to be an overkill, and stdlib doesn't include any implementation. -- anatoly t. From guido at python.org Wed Jun 30 01:41:52 2010 From: guido at python.org (Guido van Rossum) Date: Tue, 29 Jun 2010 16:41:52 -0700 Subject: [Python-Dev] Pickle security and remote logging In-Reply-To: References: Message-ID: On Tue, Jun 29, 2010 at 4:22 PM, anatoly techtonik wrote: > On Tue, Jun 29, 2010 at 6:15 PM, Vinay Sajip wrote: >> >> I've updated the documentation of SocketHandler.makePickle to mention security >> concerns, and that the method can be overridden to use a more secure >> implementation (e.g. HMAC-signed pickles). > > Thanks. But I doubt HMAC complication helps to protect logging server. > If shared key is compromised -server becomes vulnerable. I would > prefer approach when no code execution is possible. Some alternative > serialization way for transmitting log data structures over network. 
> Protocol buffers first come in mind, but they seem to be an overkill, > and stdlib doesn't include any implementation. You could use marshal by default. It does not execute code when unmarshalling. A limitation is that it only supports built-in types like list, dict, string etc. but that might be just fine for logging data. Another option would be JSON. (Or XML, if you want bulky. :-) As for protocol buffers, assuming its absence (so far :-) from the stdlib is the only objection, how hard would it be to make the logging package "prepared" so that if one *did* have protocol buffers installed, it would be a one-line config setting to use them? -- --Guido van Rossum (python.org/~guido) From rdmurray at bitdance.com Wed Jun 30 01:56:30 2010 From: rdmurray at bitdance.com (R. David Murray) Date: Tue, 29 Jun 2010 19:56:30 -0400 Subject: [Python-Dev] Mailbox module - timings and functionality changes In-Reply-To: <4C2A3341.4010705@holdenweb.com> References: <4C2A0294.3070806@holdenweb.com> <20100629165228.GA5350@amk-desktop.matrixgroup.net> <4C2A3341.4010705@holdenweb.com> Message-ID: <20100629235630.E02B61FDDBE@kimball.webabinitio.net> On Tue, 29 Jun 2010 13:54:09 -0400, Steve Holden wrote: > A.M. Kuchling wrote: > > But should mailboxes really be opened in a UTF-8 encoding, or should > > they be treated as 7-bit text? I'll have to think about this. > > Neither! You can't open them as 7-bit text, because real-world email > does contain bytes whose ordinal value exceeds 127. You can't open them > using a text encoding because theoretically there might be ASCII headers > that indicate that parts of the content are in specific character sets > or encodings. > > If only we had a data structure that easily allowed us to manipulate > 8-bit characters ... email6 *will* handle this use case. When it exists :) But note that it is *not* just a matter of easily handling 8 bit characters. There are a whole bunch of algorithms needed for interpreting that 7 and 8 bit data. 
All the info is there in the email headers, but being able to do string operations on 8 bit byte strings doesn't get you the answers you need by itself. It really is the case that the Python3 bytes/unicode split forces us to redo most of the algorithms so that they handle bytes and text *correctly*. This isn't a trivial undertaking, but the end result will be well worth it. -- R. David Murray www.bitdance.com From rdmurray at bitdance.com Wed Jun 30 02:05:29 2010 From: rdmurray at bitdance.com (R. David Murray) Date: Tue, 29 Jun 2010 20:05:29 -0400 Subject: [Python-Dev] Mailbox module - timings and functionality changes In-Reply-To: <4C2A5F56.2010700@holdenweb.com> References: <4C2A0294.3070806@holdenweb.com> <20100629165228.GA5350@amk-desktop.matrixgroup.net> <4C2A3341.4010705@holdenweb.com> <4C2A5F56.2010700@holdenweb.com> Message-ID: <20100630000529.3AA351FF08C@kimball.webabinitio.net> On Tue, 29 Jun 2010 17:02:14 -0400, Steve Holden wrote: > Guido van Rossum wrote: > > > - wrap the binary stream in a text stream > > "wrap" how? The ultimate destiny of the text is twofold: I would imagine Guido is talking about an io.TextIOWrapper...in other words, take the binary file you've just finished grabbing info from, and reread it as a text file in order to grab the actual message content. If you have messages in your files that are using an 8bit content transfer encoding, then you (currently) will have some problems unless the charset happens to be the one you use when you wrap the binary stream as a text stream. -- R. 
David Murray www.bitdance.com From steve at holdenweb.com Wed Jun 30 02:31:59 2010 From: steve at holdenweb.com (Steve Holden) Date: Tue, 29 Jun 2010 20:31:59 -0400 Subject: [Python-Dev] Mailbox module - timings and functionality changes In-Reply-To: <20100629235630.E02B61FDDBE@kimball.webabinitio.net> References: <4C2A0294.3070806@holdenweb.com> <20100629165228.GA5350@amk-desktop.matrixgroup.net> <4C2A3341.4010705@holdenweb.com> <20100629235630.E02B61FDDBE@kimball.webabinitio.net> Message-ID: <4C2A907F.1010409@holdenweb.com> R. David Murray wrote: > On Tue, 29 Jun 2010 13:54:09 -0400, Steve Holden wrote: >> A.M. Kuchling wrote: >>> But should mailboxes really be opened in a UTF-8 encoding, or should >>> they be treated as 7-bit text? I'll have to think about this. >> Neither! You can't open them as 7-bit text, because real-world email >> does contain bytes whose ordinal value exceeds 127. You can't open them >> using a text encoding because theoretically there might be ASCII headers >> that indicate that parts of the content are in specific character sets >> or encodings. >> >> If only we had a data structure that easily allowed us to manipulate >> 8-bit characters ... > > email6 *will* handle this use case. When it exists :) But note that it > is *not* just a matter of easily handling 8 bit characters. There are > a whole bunch of algorithms needed for interpreting that 7 and 8 bit data. > All the info is there in the email headers, but being able to do string > operations on 8 bit byte strings doesn't get you the answers you need > by itself. > > It really is the case that the Python3 bytes/unicode split forces us > to redo most of the algorithms so that they handle bytes and text > *correctly*. This isn't a trivial undertaking, but the end result > will be well worth it. > I completely agree. The unusual thing here is that I of all people should find himself running into these issues, since my use of Python is normally pretty conservative. 
Since the course I am currently writing is already overdue I have to
find answers now to problems that were present in the initial 3.0
release and have not received much attention since.

You know that I support your work to revise the email package. I hope
that we can eventually have it incorporate mailbox readers as well.

regards
 Steve
--
Steve Holden           +1 571 484 6266   +1 800 494 3119
See Python Video!       http://python.mirocommunity.org/
Holden Web LLC                 http://www.holdenweb.com/
UPCOMING EVENTS:        http://holdenweb.eventbrite.com/
"All I want for my birthday is another birthday" -
                                     Ian Dury, 1942-2000

From janssen at parc.com Wed Jun 30 04:55:12 2010
From: janssen at parc.com (Bill Janssen)
Date: Tue, 29 Jun 2010 19:55:12 PDT
Subject: [Python-Dev] OS X buildbots: why am I skipping these tests?
Message-ID: <71728.1277866512@parc.com>

My Leopard and Tiger PPC buildbots are momentarily green!  But I'm
looking into why I'm skipping some tests.  My buildbots are up-to-date
OS-wise and very vanilla, with the latest applicable Xcode.

4 skips unexpected on darwin:
    test_gdb test_ioctl test_readline test_ttk_guionly

Three of these (gdb, readline, ttk_guionly) are just bad predictions of
which tests should skip on Darwin, I think -- gdb is only version 6, so
that test won't run, readline doesn't get built, ttk doesn't work
without Tcl/Tk 8.5.  But the skip of test_ioctl baffles me.

"test_ioctl skipped -- Unable to open /dev/tty"

But when I log in via ssh and try it with the system python:

~ wjanssen$ python
python
Python 2.5.1 (r251:54863, Jun 17 2009, 20:37:34)
[GCC 4.0.1 (Apple Inc. build 5465)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> open("/dev/tty")
open("/dev/tty")
>>>

Seems to work fine.  So this I don't understand.  Any ideas, anyone?

Bill

From stephen at xemacs.org Wed Jun 30 04:55:02 2010
From: stephen at xemacs.org (Stephen J.
Turnbull) Date: Wed, 30 Jun 2010 11:55:02 +0900 Subject: [Python-Dev] what environment variable should contain compiler warning suppression flags? In-Reply-To: References: <4C268F1E.5070506@egenix.com> <4C2889B7.2060105@egenix.com> <4C28ABD4.1030000@egenix.com> <4C28BF83.9080903@egenix.com> <20100629105012.341adc7b@heresy> Message-ID: <87y6dxb56h.fsf@uwakimon.sk.tsukuba.ac.jp> Steve Holden writes: > I agree - trying to step through -O2 optimized code isn't going to > help debug your code, it's going to help you debug the > optimizer. That's a very rare use case. Not really. I don't have a lot of practice in debugging at that level, so take it with a grain of salt, but what I've found with XEmacs code is that debugging at -O0 is less often helpful than debugging at -O2. Quite often a naive compilation strategy is used which basically turns those C statements into macros for the underlying assembler, and the code works the way the author thinks it should. But his assumptions are invalid, and when optimized it fails. So I guess you can call that "debugging the optimizer" if you like.... From guido at python.org Wed Jun 30 05:57:09 2010 From: guido at python.org (Guido van Rossum) Date: Tue, 29 Jun 2010 20:57:09 -0700 Subject: [Python-Dev] OS X buildbots: why am I skipping these tests? In-Reply-To: <71728.1277866512@parc.com> References: <71728.1277866512@parc.com> Message-ID: On Tue, Jun 29, 2010 at 7:55 PM, Bill Janssen wrote: > My Leopard and Tiger PPC buildbots are momentarily green! But I'm > looking into why I'm skipping some tests. My buildbots are up-to-date > OS-wise and very vanilla, with the latest applicable Xcode. > > 4 skips unexpected on darwin: > test_gdb test_ioctl test_readline test_ttk_guionly > > Three of these (gdb, readline, ttk_guionly) are just bad predictions of > which tests should skip on Darwin, I think -- gdb is only version 6, so > that test won't run, readline doesn't get built, ttk doesn't work > without Tcl/Tk 8.5.
So it looks like you could get readline and ttk to run and pass by separately downloading and installing readline (I've done this many times before) and Tcl/Tk (no idea but I suppose it should work). > But the skip of test_ioctl baffles me. > > "test_ioctl skipped -- Unable to open /dev/tty" > > But when I log in via ssh and try it with the system python: > > ~ wjanssen$ python > python > Python 2.5.1 (r251:54863, Jun 17 2009, 20:37:34) > [GCC 4.0.1 (Apple Inc. build 5465)] on darwin > Type "help", "copyright", "credits" or "license" for more information. >>>> open("/dev/tty") > open("/dev/tty") > >>>> > > Seems to work fine. So this I don't understand. Any ideas, anyone? Maybe the buildbot runs the tests as a tty-less daemon process. If you ask me it's pretty crazy to have a test that requires a tty. But there you have it -- and it's the same in Python 3. (But then again, who knows, I might have written that test. ;-) -- --Guido van Rossum (python.org/~guido) From martin at v.loewis.de Wed Jun 30 07:24:33 2010 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Wed, 30 Jun 2010 07:24:33 +0200 Subject: [Python-Dev] OS X buildbots: why am I skipping these tests? In-Reply-To: <71728.1277866512@parc.com> References: <71728.1277866512@parc.com> Message-ID: <4C2AD511.5020709@v.loewis.de> > Seems to work fine. So this I don't understand. Any ideas, anyone? Didn't we discuss this before? The buildbot slave has no controlling terminal anymore, hence it cannot open /dev/tty. If you are curious, just patch your checkout to output the exact errno (e.g. to stdout), and trigger a build through the web.
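The errno check suggested above could be sketched roughly as follows, assuming a POSIX system. The helper name `tty_open_errno` is hypothetical and is not part of test_ioctl; on a process without a controlling terminal the reported value is often ENXIO, but that may vary by platform.

```python
import errno
import os


def tty_open_errno(path="/dev/tty"):
    """Return None if `path` can be opened, else the errno of the failure."""
    try:
        fd = os.open(path, os.O_RDONLY)
    except OSError as exc:
        # exc.errno is the exact error code a buildbot log would want to show.
        return exc.errno
    os.close(fd)
    return None


if __name__ == "__main__":
    err = tty_open_errno()
    if err is None:
        print("opened /dev/tty: a controlling terminal is available")
    else:
        name = errno.errorcode.get(err, "unknown")
        print("cannot open /dev/tty: errno=%d (%s)" % (err, name))
```

Run from an interactive ssh session this should take the first branch; run from a daemonized build slave it should print the errno instead.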
Regards, Martin From martin at v.loewis.de Wed Jun 30 07:37:18 2010 From: martin at v.loewis.de (=?ISO-8859-15?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Wed, 30 Jun 2010 07:37:18 +0200 Subject: [Python-Dev] Taking over the Mercurial Migration Message-ID: <4C2AD80E.9010404@v.loewis.de> It seems that both Dirkjan and Brett are very caught up with real life for the coming months. So I suggest that some other committer who favors the Mercurial transition steps forward and takes over this project. If nobody volunteers, I propose that we release 3.2 from Subversion, and reconsider Mercurial migration next year. Regards, Martin From stephen at xemacs.org Wed Jun 30 08:19:37 2010 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Wed, 30 Jun 2010 15:19:37 +0900 Subject: [Python-Dev] Taking over the Mercurial Migration In-Reply-To: <4C2AD80E.9010404@v.loewis.de> References: <4C2AD80E.9010404@v.loewis.de> Message-ID: <87sk45avpi.fsf@uwakimon.sk.tsukuba.ac.jp> "Martin v. Löwis" writes: > It seems that both Dirkjan and Brett are very caught up > with real life for the coming months. So I suggest that > some other committer who favors the Mercurial transition > steps forward and takes over this project. I am not a committer, and am not intimately familiar with PEP 385, so not appropriate to become the proponent, I think. However, I am one of the PEP 374 co-authors, and have experience with previous transition to Mercurial of similar scale (XEmacs). I can promise to devote time to the transition in July and August, in support of whoever might step forward. I hope someone does. > If nobody volunteers, I propose that we release 3.2 > from Subversion, and reconsider Mercurial migration > next year. In the absence of a volunteer, I think that's probably necessary.
From g.brandl at gmx.net Wed Jun 30 10:41:51 2010 From: g.brandl at gmx.net (Georg Brandl) Date: Wed, 30 Jun 2010 10:41:51 +0200 Subject: [Python-Dev] Taking over the Mercurial Migration In-Reply-To: <4C2AD80E.9010404@v.loewis.de> References: <4C2AD80E.9010404@v.loewis.de> Message-ID: On 30.06.2010 07:37, "Martin v. Löwis" wrote: > It seems that both Dirkjan and Brett are very caught up > with real life for the coming months. So I suggest that > some other committer who favors the Mercurial transition > steps forward and takes over this project. > > If nobody volunteers, I propose that we release 3.2 > from Subversion, and reconsider Mercurial migration > next year. IIUC, Dirkjan is only caught up for another month. I have no problems with releasing a first 3.2 alpha from SVN and then switching, so I propose that we target the migration for August -- I can help in the second half of August if needed. Georg From vinay_sajip at yahoo.co.uk Wed Jun 30 11:23:37 2010 From: vinay_sajip at yahoo.co.uk (Vinay Sajip) Date: Wed, 30 Jun 2010 09:23:37 +0000 (UTC) Subject: [Python-Dev] Pickle security and remote logging References: Message-ID: Guido van Rossum writes: > As for protocol buffers, assuming its absence (so far) from the > stdlib is the only objection, how hard would it be to make the logging > package "prepared" so that if one *did* have protocol buffers > installed, it would be a one-line config setting to use them? I envisage that if protocol buffers were available, and if support for them in logging was to be added, this could be done via an optional keyword arg to the SocketHandler which sets a handler attribute, which would then be used in makePickle to make the required serialized form.
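The idea of a keyword argument that sets a handler attribute consulted by makePickle might look roughly like the sketch below. It is written as a user-side subclass, not as the stdlib change itself; the class name `SerializingSocketHandler`, the `serializer` keyword and the `json_serializer` helper are all hypothetical. JSON stands in for protocol buffers only because it ships with Python. The 4-byte big-endian length prefix mirrors what the stock makePickle emits.

```python
import json
import logging
import logging.handlers
import struct


class SerializingSocketHandler(logging.handlers.SocketHandler):
    """SocketHandler with a pluggable serializer (hypothetical sketch)."""

    def __init__(self, host, port, serializer=None):
        super().__init__(host, port)
        # Callable taking a LogRecord and returning bytes, or None for pickle.
        self.serializer = serializer

    def makePickle(self, record):
        if self.serializer is None:
            # Fall back to the standard pickled form.
            return super().makePickle(record)
        payload = self.serializer(record)
        # Keep the length-prefixed framing used by the default implementation.
        return struct.pack(">L", len(payload)) + payload


def json_serializer(record):
    """Example serializer: a few LogRecord fields as UTF-8 encoded JSON."""
    fields = {
        "name": record.name,
        "levelno": record.levelno,
        "msg": record.getMessage(),
    }
    return json.dumps(fields).encode("utf-8")
```

A receiver would of course have to understand the chosen format; the stock socket receiver examples expect pickles.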
@anatoly: The documentation just mentions HMAC as an example; the levels of paranoia to be applied are different for different people, different times and different situations ;-) I assume that someone reading the docs could readily see that they could substitute "sign the pickle" with some alternative strategy in makePickle. You could implement marshal, protocol buffers etc. right now just by overriding SocketHandler.makePickle in your custom class. An alternative strategy would be to provide an optional serializer=None callable in the SocketHandler constructor. If specified, then makePickle would call this serializer with the LogRecord instance as the only argument, and use the return value as the serialized form, instead of calling pickle.dumps. Regards, Vinay Sajip From exarkun at twistedmatrix.com Wed Jun 30 13:32:32 2010 From: exarkun at twistedmatrix.com (exarkun at twistedmatrix.com) Date: Wed, 30 Jun 2010 11:32:32 -0000 Subject: [Python-Dev] OS X buildbots: why am I skipping these tests? In-Reply-To: <4C2AD511.5020709@v.loewis.de> References: <71728.1277866512@parc.com> <4C2AD511.5020709@v.loewis.de> Message-ID: <20100630113232.1937.151974582.divmod.xquotient.556@localhost.localdomain> On 05:24 am, martin at v.loewis.de wrote: >>Seems to work fine. So this I don't understand. Any ideas, anyone? > >Didn't we discuss this before? The buildbot slave has no controlling >terminal anymore, hence it cannot open /dev/tty. If you are curious, >just patch your checkout to output the exact errno (e.g. to stdout), >and trigger a build through the web. Could the test be rewritten (or supplemented) to use a pty? Most or perhaps all of the same operations should be supported. 
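As a rough illustration of the pty idea, the sketch below runs an ioctl that fills a Python array against a freshly allocated pseudo-terminal, so no controlling terminal is needed. It swaps in TIOCGWINSZ for the TIOCGPGRP request used by the real test, on the assumption that a process-group query would not be meaningful on a pty that is not the controlling terminal. The function name `pty_window_size` is hypothetical.

```python
import array
import fcntl
import os
import termios


def pty_window_size():
    """Query (rows, cols) of a new pty via an ioctl that mutates an array."""
    master_fd, slave_fd = os.openpty()
    try:
        # struct winsize holds four unsigned shorts: rows, cols, xpixel, ypixel.
        buf = array.array("H", [0, 0, 0, 0])
        # mutate_flag=True asks ioctl to write the result into buf in place.
        fcntl.ioctl(slave_fd, termios.TIOCGWINSZ, buf, True)
        return buf[0], buf[1]
    finally:
        os.close(slave_fd)
        os.close(master_fd)


if __name__ == "__main__":
    print("pty window size (rows, cols):", pty_window_size())
```

Whether this still meets the original purpose of the test is a separate question that the sketch does not settle.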
Jean-Paul From steve at holdenweb.com Wed Jun 30 14:42:05 2010 From: steve at holdenweb.com (Steve Holden) Date: Wed, 30 Jun 2010 08:42:05 -0400 Subject: [Python-Dev] Mailbox module - timings and functionality changes In-Reply-To: <20100630000529.3AA351FF08C@kimball.webabinitio.net> References: <4C2A0294.3070806@holdenweb.com> <20100629165228.GA5350@amk-desktop.matrixgroup.net> <4C2A3341.4010705@holdenweb.com> <4C2A5F56.2010700@holdenweb.com> <20100630000529.3AA351FF08C@kimball.webabinitio.net> Message-ID: <4C2B3B9D.3080200@holdenweb.com> R. David Murray wrote: > On Tue, 29 Jun 2010 17:02:14 -0400, Steve Holden wrote: >> Guido van Rossum wrote: >> >>> - wrap the binary stream in a text stream >> "wrap" how? The ultimate destiny of the text is twofold: > > I would imagine Guido is talking about an io.TextIOWrapper...in other > words, take the binary file you've just finished grabbing info > from, and reread it as a text file in order to grab the actual > message content. > > If you have messages in your files that are using an 8bit content > transfer encoding, then you (currently) will have some problems > unless the charset happens to be the one you use when you wrap > the binary stream as a text stream. > http://bugs.python.org/issue9124 regards Steve -- Steve Holden +1 571 484 6266 +1 800 494 3119 DjangoCon US September 7-9, 2010 http://djangocon.us/ See Python Video! 
http://python.mirocommunity.org/ Holden Web LLC http://www.holdenweb.com/ From janssen at parc.com Wed Jun 30 18:00:09 2010 From: janssen at parc.com (Bill Janssen) Date: Wed, 30 Jun 2010 09:00:09 PDT Subject: [Python-Dev] OS X buildbots: why am I skipping these tests? In-Reply-To: References: <71728.1277866512@parc.com> Message-ID: <68796.1277913609@parc.com> Guido van Rossum wrote: > On Tue, Jun 29, 2010 at 7:55 PM, Bill Janssen wrote: > > My Leopard and Tiger PPC buildbots are momentarily green!
But I'm > > looking into why I'm skipping some tests. My buildbots are up-to-date > > OS-wise and very vanilla, with the latest applicable Xcode. > > > > 4 skips unexpected on darwin: > > test_gdb test_ioctl test_readline test_ttk_guionly > > > > Three of these (gdb, readline, ttk_guionly) are just bad predictions of > > which tests should skip on Darwin, I think -- gdb is only version 6, so > > that test won't run, readline doesn't get built, ttk doesn't work > > without Tcl/Tk 8.5. > > So it looks like you could get readline and ttk to run and pass by > separately downloading and installing readline (I've done this many > times before) and Tcl/Tk (no idea but I suppose it should work). Sure. But the skips should be expected "on Darwin", since a vanilla OS X system apparently won't have the necessary bits. At the very least, regrtest.py should test for these conditions and add them to the "expected skips" list if necessary. I'll work up a patch. > > But the skip of test_ioctl baffles me. > > > > "test_ioctl skipped -- Unable to open /dev/tty" > > > > But when I log in via ssh and try it with the system python: > > > > ~ wjanssen$ python > > python > > Python 2.5.1 (r251:54863, Jun 17 2009, 20:37:34) > > [GCC 4.0.1 (Apple Inc. build 5465)] on darwin > > Type "help", "copyright", "credits" or "license" for more information. > >>>> open("/dev/tty") > > open("/dev/tty") > > > >>>> > > > > Seems to work fine. So this I don't understand. Any ideas, anyone? > > Maybe the buildbot runs the tests as a tty-less daemon process. If you > ask me it's pretty crazy to have a test that requires a tty. But there > you have it -- and it's the same in Python 3. (But then again, who > knows, I might have written that test. ;-) So, my question then is, why are these skips "unexpected"? Seems to me that if this is the case, this test will never run on any platform.
Bill From janssen at parc.com Wed Jun 30 18:03:15 2010 From: janssen at parc.com (Bill Janssen) Date: Wed, 30 Jun 2010 09:03:15 PDT Subject: [Python-Dev] OS X buildbots: why am I skipping these tests? In-Reply-To: <4C2AD511.5020709@v.loewis.de> References: <71728.1277866512@parc.com> <4C2AD511.5020709@v.loewis.de> Message-ID: <68821.1277913795@parc.com> Martin v. Löwis wrote: > > Seems to work fine. So this I don't understand. Any ideas, anyone? > > Didn't we discuss this before? Possibly, but I don't recall doing so. > The buildbot slave has no controlling > terminal anymore, hence it cannot open /dev/tty. If you are curious, > just patch your checkout to output the exact errno (e.g. to stdout), > and trigger a build through the web. So, why is skipping this test "unexpected"? I see "x86 Tiger" is also showing this as an unexpected skip. Should I just add it to the list of expected skips on Darwin? Actually, will it run on any platform? Bill From janssen at parc.com Wed Jun 30 18:26:24 2010 From: janssen at parc.com (Bill Janssen) Date: Wed, 30 Jun 2010 09:26:24 PDT Subject: [Python-Dev] OS X buildbots: why am I skipping these tests? In-Reply-To: <20100630113232.1937.151974582.divmod.xquotient.556@localhost.localdomain> References: <71728.1277866512@parc.com> <4C2AD511.5020709@v.loewis.de> <20100630113232.1937.151974582.divmod.xquotient.556@localhost.localdomain> Message-ID: <69334.1277915184@parc.com> exarkun at twistedmatrix.com wrote: > Could the test be rewritten (or supplemented) to use a pty? Most or > perhaps all of the same operations should be supported. Buildbot seems to be explicitly not using a PTY. From the top of the test output: make buildbottest in dir /Users/buildbot/buildarea/trunk.parc-leopard-1/build (timeout 1800 secs) watching logfiles {} argv: ['make', 'buildbottest'] [...] closing stdin using PTY: False I believe this is specified by the build master. This test seems to work on Ubuntu and FreeBSD, though.
Bill From solipsis at pitrou.net Wed Jun 30 18:42:58 2010 From: solipsis at pitrou.net (Antoine Pitrou) Date: Wed, 30 Jun 2010 18:42:58 +0200 Subject: [Python-Dev] Mailbox module - timings and functionality changes References: <4C2A0294.3070806@holdenweb.com> <20100629165228.GA5350@amk-desktop.matrixgroup.net> <4C2A3341.4010705@holdenweb.com> <4C2A5F56.2010700@holdenweb.com> <20100630000529.3AA351FF08C@kimball.webabinitio.net> Message-ID: <20100630184258.473d8535@pitrou.net> On Tue, 29 Jun 2010 20:05:29 -0400 "R. David Murray" wrote: > > I would imagine Guido is talking about an io.TextIOWrapper...in other > words, take the binary file you've just finished grabbing info > from, and reread it as a text file in order to grab the actual > message content. This sounds a bit suboptimal to me (and introduces race conditions if e.g. the file is replaced with another one before you reopen it). You could instead decode the binary data by yourself, especially if you have already stored that data somewhere. Also, please note that values used by seek() and tell() on text I/O are "opaque cookies". While they can happen to match the raw binary file position, it is a mere coincidence (or an implementation detail, at your will). Therefore, reusing tell() values of a binary file to seek() a TextIOWrapper accessing the same file is wrong. From solipsis at pitrou.net Wed Jun 30 18:44:57 2010 From: solipsis at pitrou.net (Antoine Pitrou) Date: Wed, 30 Jun 2010 18:44:57 +0200 Subject: [Python-Dev] OS X buildbots: why am I skipping these tests? References: <71728.1277866512@parc.com> <68796.1277913609@parc.com> Message-ID: <20100630184457.10067764@pitrou.net> On Wed, 30 Jun 2010 09:00:09 PDT Bill Janssen wrote: > > So, my question then is, why are these skips "unexpected"? Seems to me > that if this is the case, this test will never run on any platform. You can change the value of the "usepty" option in your buildbot.tac. 
(you will also have to restart the buildslave process) Regards Antoine. From guido at python.org Wed Jun 30 19:03:49 2010 From: guido at python.org (Guido van Rossum) Date: Wed, 30 Jun 2010 10:03:49 -0700 Subject: [Python-Dev] Mailbox module - timings and functionality changes In-Reply-To: <20100630184258.473d8535@pitrou.net> References: <4C2A0294.3070806@holdenweb.com> <20100629165228.GA5350@amk-desktop.matrixgroup.net> <4C2A3341.4010705@holdenweb.com> <4C2A5F56.2010700@holdenweb.com> <20100630000529.3AA351FF08C@kimball.webabinitio.net> <20100630184258.473d8535@pitrou.net> Message-ID: On Wed, Jun 30, 2010 at 9:42 AM, Antoine Pitrou wrote: > On Tue, 29 Jun 2010 20:05:29 -0400 > "R. David Murray" wrote: >> >> I would imagine Guido is talking about an io.TextIOWrapper...in other >> words, take the binary file you've just finished grabbing info >> from, and reread it as a text file in order to grab the actual >> message content. > > This sounds a bit suboptimal to me (and introduces race conditions if > e.g. the file is replaced with another one before you reopen it). You > could instead decode the binary data by yourself, especially if you > have already stored that data somewhere. That's why I proposed not reopening but wrapping. Of course the contents of the file could still change, but that's a limitation of how the mailbox module works -- it builds a TOC and expects the file not to change. > Also, please note that values used by seek() and tell() on > text I/O are "opaque cookies". While they can happen to match the > raw binary file position, it is a mere coincidence (or an > implementation detail, at your will). Therefore, reusing tell() values > of a binary file to seek() a TextIOWrapper accessing the same file > is wrong. Well, um, I actually designed it carefully so that bytes offsets *would* work as text offsets in those cases where they make sense at all. 
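The wrapping approach under discussion can be sketched as below: build a table of byte offsets on the binary stream, wrap that same stream in an io.TextIOWrapper, and seek the wrapper to one of the recorded offsets. This relies on the CPython behaviour described in the thread, which the documentation does not promise, and it assumes a single-byte encoding and offsets that fall on line starts. The sample data and variable names are made up for the illustration.

```python
import io

# Stand-in for an mbox file: two tiny "messages" held in memory.
raw = io.BytesIO(b"From alice\nfirst body\nFrom bob\nsecond body\n")

# Pass 1: record the byte offset of every line that starts a message.
toc = []
while True:
    pos = raw.tell()
    line = raw.readline()
    if not line:
        break
    if line.startswith(b"From "):
        toc.append(pos)

# Pass 2: wrap (not reopen) the same binary stream as text and reuse
# a byte offset from the table of contents as the seek position.
text = io.TextIOWrapper(raw, encoding="ascii", newline="")
text.seek(toc[1])
header = text.readline()
body = text.readline()

print(toc, repr(header), repr(body))
```

With a multibyte encoding, an offset that lands inside a character would not decode sensibly, which is the caveat raised in the thread.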
-- --Guido van Rossum (python.org/~guido) From solipsis at pitrou.net Wed Jun 30 19:20:34 2010 From: solipsis at pitrou.net (Antoine Pitrou) Date: Wed, 30 Jun 2010 19:20:34 +0200 Subject: [Python-Dev] TextIOWrapper.tell() In-Reply-To: References: <4C2A0294.3070806@holdenweb.com> <20100629165228.GA5350@amk-desktop.matrixgroup.net> <4C2A3341.4010705@holdenweb.com> <4C2A5F56.2010700@holdenweb.com> <20100630000529.3AA351FF08C@kimball.webabinitio.net> <20100630184258.473d8535@pitrou.net> Message-ID: <20100630192034.5740825b@pitrou.net> On Wed, 30 Jun 2010 10:03:49 -0700 Guido van Rossum wrote: > > > Also, please note that values used by seek() and tell() on > > text I/O are "opaque cookies". While they can happen to match the > > raw binary file position, it is a mere coincidence (or an > > implementation detail, at your will). Therefore, reusing tell() values > > of a binary file to seek() a TextIOWrapper accessing the same file > > is wrong. > > Well, um, I actually designed it carefully so that bytes offsets > *would* work as text offsets in those cases where they make sense at > all. Ah, this is embarrassing. I always assumed it was an implementation detail since neither the PEP nor the module docs say otherwise. PEP 3116 clearly says: "Unlike with raw I/O, the units for .seek() are not specified - some implementations (e.g. StringIO) use characters and others (e.g. TextIOWrapper) use bytes." And also: ".seek(pos: object, whence: int = 0) -> int Seek to position pos. If pos is non-zero, it must be a cookie returned from .tell() and whence must be zero." "it must be a cookie returned from .tell()" here seems to imply that non-zero values of other origin should not be used. Regards Antoine.
From guido at python.org Wed Jun 30 19:28:10 2010 From: guido at python.org (Guido van Rossum) Date: Wed, 30 Jun 2010 10:28:10 -0700 Subject: [Python-Dev] TextIOWrapper.tell() In-Reply-To: <20100630192034.5740825b@pitrou.net> References: <4C2A0294.3070806@holdenweb.com> <20100629165228.GA5350@amk-desktop.matrixgroup.net> <4C2A3341.4010705@holdenweb.com> <4C2A5F56.2010700@holdenweb.com> <20100630000529.3AA351FF08C@kimball.webabinitio.net> <20100630184258.473d8535@pitrou.net> <20100630192034.5740825b@pitrou.net> Message-ID: On Wed, Jun 30, 2010 at 10:20 AM, Antoine Pitrou wrote: > On Wed, 30 Jun 2010 10:03:49 -0700 > Guido van Rossum wrote: >> >> > Also, please note that values used by seek() and tell() on >> > text I/O are "opaque cookies". While they can happen to match the >> > raw binary file position, it is a mere coincidence (or an >> > implementation detail, at your will). Therefore, reusing tell() values >> > of a binary file to seek() a TextIOWrapper accessing the same file >> > is wrong. >> >> Well, um, I actually designed it carefully so that bytes offsets >> *would* work as text offsets in those cases where they make sense at >> all. > > Ah, this is embarrassing. I always assumed it was an implementation > detail since neither the PEP nor the module docs say otherwise. > > PEP 3116 clearly says: > > "Unlike with raw I/O, the units for .seek() are not specified - some > implementations (e.g. StringIO) use characters and others (e.g. > TextIOWrapper) use bytes." > > And also: > > ".seek(pos: object, whence: int = 0) -> int > > Seek to position pos. If pos is non-zero, it must be a cookie > returned from .tell() and whence must be zero." > > "it must be a cookie returned from .tell()" here seems to imply that > non-zero values of other origin should not be used. Guilty as charged. I really did take care that it would work, but forgot to mention it.
I guess we can depend on this property *inside* the stdlib (as long as there are tests for each piece of code depending on it that would break if it ever changed) but should not advertise it widely. Note that it doesn't go the other way -- due to encoding state, text streams can certainly return cookies that make no sense to binary streams. But text streams take byte offsets too and do the best they can. (Obviously if a byte offset points in the middle of a multibyte character all bets are off.) The C stdlib has a similar thing -- while AFAIK POSIX lseek() really is required to return and take byte offsets, this is not required for fseek() and ftell() according to the C std -- but I think it's still a pretty safe bet, and I betcha lots of apps are making this assumption. -- --Guido van Rossum (python.org/~guido) From martin at v.loewis.de Wed Jun 30 19:29:36 2010 From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=) Date: Wed, 30 Jun 2010 19:29:36 +0200 Subject: [Python-Dev] OS X buildbots: why am I skipping these tests? In-Reply-To: <20100630113232.1937.151974582.divmod.xquotient.556@localhost.localdomain> References: <71728.1277866512@parc.com> <4C2AD511.5020709@v.loewis.de> <20100630113232.1937.151974582.divmod.xquotient.556@localhost.localdomain> Message-ID: <4C2B7F00.2010602@v.loewis.de> On 30.06.2010 13:32, exarkun at twistedmatrix.com wrote: > On 05:24 am, martin at v.loewis.de wrote: >>> Seems to work fine. So this I don't understand. Any ideas, anyone? >> >> Didn't we discuss this before? The buildbot slave has no controlling >> terminal anymore, hence it cannot open /dev/tty. If you are curious, >> just patch your checkout to output the exact errno (e.g. to stdout), >> and trigger a build through the web. > > Could the test be rewritten (or supplemented) to use a pty? Most or > perhaps all of the same operations should be supported. I'm not sure.
It uses TIOCGPGRP, basically to establish that ioctl can also put results into a Python array (IIUC). This goes back to http://bugs.python.org/555817 Somebody rewriting it would need to make sure the original test purpose is still met. Regards, Martin From barry at python.org Wed Jun 30 20:16:14 2010 From: barry at python.org (Barry Warsaw) Date: Wed, 30 Jun 2010 14:16:14 -0400 Subject: [Python-Dev] versioned .so files for Python 3.2 In-Reply-To: <4C23D3C2.1060500@scottdial.com> References: <20100624115048.4fd152e3@heresy> <20100624170944.7e68ad21@heresy> <4C23D3C2.1060500@scottdial.com> Message-ID: <20100630141614.10dbccde@heresy> I'm trying to catch up on this thread, so I may collapse some responses or refer to points others have brought up. On Jun 24, 2010, at 05:53 PM, Scott Dial wrote: >If the package has .so files that aren't compatible with other version >of python, then what is the motivation for placing that in a shared >location (since it can't actually be shared)? I think Matthias has described the motivation for the Debian/Ubuntu case, and James describes Python's current search algorithm for a package's .py[c] and .so files. There are a few points that you've made that I want to respond to. You claim that the versioned .so files scheme is "more complicated" than multiple version-specific search paths (if I understand your counter proposal correctly). It all depends on your point of view. From mine, a 100 line patch that almost nobody but (some) distros will care about or be affected by, and that only changes a fairly obscure build-time configuration, is much simpler than trying to make version-specific search paths work. If you build Python from source, you do not care about this patch and you'll never see its effects. If you get Python on a distribution that only gives you one version of Python at a time, you also will probably never care or see the effects of this patch.
If you're a Debian or Ubuntu user who wants to use Python 3.2 and 3.3, you *might* care about it, but most likely it'll just work behind the scenes. If you're a Python packager or work on the Python infrastructure for one of those platforms, then you will care. About just sharing the py files. You say that would be acceptable to you, but it's actually a pretty big deal. If you're supporting two versions of Python, then every distro Python package doubles in size. Even with compression, you're talking longer download times and probably more critically, you've greatly increased CDROM space pressures. The Ubuntu CDROM is already essentially at capacity so doubling the size of all Python packages (most of which btw do not have extension modules) makes such an approach impossible. Moving to a DVD image has been discussed, but it is currently believed not in the best interest of users, especially on slow links, to do so at this time. The versioned .so approach will of course increase the size of packages by twice the contained .so file size, and that's already an uncomfortable but acceptable increase. It's acceptable because of the gain users get by having multiple versions of Python available and the fact that there aren't nearly as many extension modules as there are Python files. Doubling the size of .py files as well isn't acceptable. >But the only motivation for doing this with .pyc files is that the .py >files are able to be shared, since the .pyc is an on-demand-generated, >version-specific artifact (and not the source). The .so file is created >offline by another toolchain, is version-specific, and presumably you >are not suggesting that Python generate it on-demand. Definitely not. pyc files are generated upon installation of the distro package, but of course the .so files must be compiled on a build machine and included in the distro package. The whole process is much simpler if the versioned .so files can just live in the same directory. 
>For packages that have .so files, won't the distro already have to build >multiple copies of that package for all version of Python? So, why can't >it place them in separate directories that are version-specific at that >time? This is not the same as placing .py files that are >version-agnostic into a version-agnostic location. It's not a matter of "could", it's a matter of simplicity, and I think versioned .so files are the simplest solution given all the constraints. -Barry -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 836 bytes Desc: not available URL: From barry at python.org Wed Jun 30 20:31:05 2010 From: barry at python.org (Barry Warsaw) Date: Wed, 30 Jun 2010 14:31:05 -0400 Subject: [Python-Dev] versioned .so files for Python 3.2 In-Reply-To: <4C266185.7080509@ubuntu.com> References: <20100624115048.4fd152e3@heresy> <20100624135119.00b9ac5c@heresy> <20100624142830.4c859faf@limelight.wooz.org> <20100624164637.22fd9160@heresy> <4C266185.7080509@ubuntu.com> Message-ID: <20100630143105.37e1225e@heresy> On Jun 26, 2010, at 10:22 PM, Matthias Klose wrote: >On 24.06.2010 22:46, Barry Warsaw wrote: >> So, we could say that PEP 384 compliant extension modules would get written >> without a version specifier. IOW, we'd treat foo.so as using the ABI. It >> would then be up to the Python runtime to throw ImportErrors if in fact we >> were loading a legacy, non-PEP 384 compliant extension. > >Is it realistic to never break the ABI? I would think of having the ABI >encoded in the file name as well, and only bump the ABI if it does change. >With the "versioned .so files" proposal an ABI bump is necessary with every >python version, with PEP 384 the ABI bump will be decoupled from the python >version. 
You're right that the ABI will break, requiring a bump, and I think you're right that this means that PEP 384 compliant shared libraries would have to have a version number in their file name (assuming the versioned .so proposal is accepted). The problem is that we would need two version numbers, one for extension modules that are not PEP 384 compliant (and thus get bumped for every new Python version), and one for modules that are PEP 384 compliant (and thus only get bumped once in a while). The reason is that I think it will always be the case that we will have PEP 384 compliant and non-compliant extension modules. Perhaps identifying the underlying problems will lead to a more acceptable patch for Python. My patch tries to take a simple (perhaps too simplistic) solution, and I'm not married to it, but I think the general idea of versioned .so files is the right one. 1. The file name extensions that Python searches for are hardcoded and compiled in. dyload_shlib.c hard codes the file name pattern that extension modules must have in order for Python to load them. They must be .so or module.so. This gets compiled into Python at build time and there's no way for a distro (or anyone else who builds Python from source) to extend the file name patterns without modifying the source code. 2. The extension that distutils writes for shared libraries is dictated by build-time options and cannot be overridden. When you ./configure Python, autoconf figures out what shared library extension your platform uses. It substitutes this into a Makefile variable. That Makefile gets installed into your system with the base Python package and distutils parses the Makefile looking for this variable. When distutils calls your platform compiler, it uses this Makefile variable as the file name extension to use for your shared library. You cannot change this or override it to get distutils to write some other file name extension, well.
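For readers on a current CPython, the two values described in points 1 and 2 can be inspected from Python itself. The sketch below is a present-day illustration and does not reflect the 3.2-era code under discussion: it prints the compiled-in list of extension module suffixes the import system will try, and the suffix recorded in the build configuration. Both names used are existing standard-library APIs; the output varies by platform and version, so none is shown.

```python
import importlib.machinery
import sysconfig

# Suffixes the import machinery searches for when loading extension modules.
suffixes = importlib.machinery.EXTENSION_SUFFIXES

# Suffix recorded at build time and used when building extension modules;
# may be None on builds that do not define it.
build_suffix = sysconfig.get_config_var("EXT_SUFFIX")

print("import search suffixes:", suffixes)
print("build-time suffix:", build_suffix)
```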
Of these two problems, #1 is more serious because we have to modify the Python source code to hack in additional shared library search suffixes. #2 can be worked around by renaming the .so file after the build. The disadvantage of this, though, is that if you're a local packager, you'll have to remember to do the same thing if you want multiple Python version support, because distutils won't take care of it for you. Maybe that's okay, in which case it would still be good to address #1.

-Barry

From barry at python.org Wed Jun 30 20:39:50 2010
From: barry at python.org (Barry Warsaw)
Date: Wed, 30 Jun 2010 14:39:50 -0400
Subject: [Python-Dev] versioned .so files for Python 3.2
In-Reply-To: <4C246E81.3020302@scottdial.com>
References: <20100624115048.4fd152e3@heresy> <20100624170944.7e68ad21@heresy> <4C23D3C2.1060500@scottdial.com> <4C246E81.3020302@scottdial.com>
Message-ID: <20100630143950.60da41f7@heresy>

On Jun 25, 2010, at 04:53 AM, Scott Dial wrote:

>My suggestion was that a package that contains .so files should not be
>shared (e.g., the entire lxml package should be placed in a
>version-specific path).

Matthias outlined some of the pitfalls with this approach.

>The motivation for this PEP was to simplify the installation python packages
>for distros; it was not to reduce the number of .py files on the disk.

As others have pointed out, versioned .so files are not part of PEP 3147. That PEP does reduce the number of .py files on disk, which, as I explained in a previous follow-up, is an important consideration.

>Placing .so files together does not simplify that install process in any
>way.

I disagree of course. :)

>You will still have to handle such packages in a special way.
You must still >compile the package multiple times for each relevant version of python (with >special tagging that I imagine distutils can take care of) and, worse yet, No, distutils cannot take care of this. There is no way currently to tell distutils to generate a .so file with anything but the platform-specific way of spelling "shared library". >you have created a more trick install than merely having multiple search >paths (e.g., installing/uninstalling lxml for *one* version of python is >actually more difficult in this scheme). That's not a use case we care about. If you have Python 3.2 and 3.3 installed on your system, why would you want lxml installed for one but not the other? And even if for some reason you did, the only way to do that would be in a way similar to handling the PEP 3147 pyc files. -Barry -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 836 bytes Desc: not available URL: From exarkun at twistedmatrix.com Wed Jun 30 20:46:02 2010 From: exarkun at twistedmatrix.com (exarkun at twistedmatrix.com) Date: Wed, 30 Jun 2010 18:46:02 -0000 Subject: [Python-Dev] OS X buildbots: why am I skipping these tests? In-Reply-To: <20100630184457.10067764@pitrou.net> References: <71728.1277866512@parc.com> <68796.1277913609@parc.com> <20100630184457.10067764@pitrou.net> Message-ID: <20100630184602.1937.1550858232.divmod.xquotient.569@localhost.localdomain> On 04:44 pm, solipsis at pitrou.net wrote: >On Wed, 30 Jun 2010 09:00:09 PDT >Bill Janssen wrote: >> >>So, my question then is, why are these skips "unexpected"? Seems to >>me >>that if this is the case, this test will never run on any platform. > >You can change the value of the "usepty" option in your buildbot.tac. >(you will also have to restart the buildslave process) But don't do this. The usepty option is completely unrelated to the suggestion I was making. 
Flipping it to True will only cause other things to break and have no impact on this test. Jean-Paul From exarkun at twistedmatrix.com Wed Jun 30 20:49:54 2010 From: exarkun at twistedmatrix.com (exarkun at twistedmatrix.com) Date: Wed, 30 Jun 2010 18:49:54 -0000 Subject: [Python-Dev] OS X buildbots: why am I skipping these tests? In-Reply-To: <69334.1277915184@parc.com> References: <71728.1277866512@parc.com> <4C2AD511.5020709@v.loewis.de> <20100630113232.1937.151974582.divmod.xquotient.556@localhost.localdomain> <69334.1277915184@parc.com> Message-ID: <20100630184954.1937.1956849777.divmod.xquotient.577@localhost.localdomain> On 04:26 pm, janssen at parc.com wrote: >exarkun at twistedmatrix.com wrote: >>Could the test be rewritten (or supplemented) to use a pty? Most or >>perhaps all of the same operations should be supported. > >Buildbot seems to be explicitly not using a PTY. From the the top of >the test output: > >make buildbottest >in dir /Users/buildbot/buildarea/trunk.parc-leopard-1/build (timeout >1800 secs) >watching logfiles {} >argv: ['make', 'buildbottest'] >[...] >closing stdin >using PTY: False This output is telling you that the build slave isn't giving the child processes it creates a pty. What I had in mind was writing the test to create a new pty, instead of trying to use the controlling tty. So basically, the two things are completely unrelated and this buildbot configuration isn't hurting anything (and in fact is likely helping quite a few things, so I suggest leaving it alone). > >I believe this is specified by the build master. > >This test seems to work on Ubuntu and FreeBSD, though. That's interesting. I wonder if those slaves are able to open /dev/tty for some reason? The slave is supposed to detach from the controlling terminal when it daemonizes. There could be a bug in that code, I suppose, or the slaves could be running without daemonization for some reason. The operators would have to tell us about that, I think. 
Or, another possibility is that /dev/tty doesn't work how I expect it to and on Ubuntu and FreeBSD it can be opened even if you don't have a controlling terminal. Hopefully not, though. Jean-Paul From barry at python.org Wed Jun 30 20:53:29 2010 From: barry at python.org (Barry Warsaw) Date: Wed, 30 Jun 2010 14:53:29 -0400 Subject: [Python-Dev] versioned .so files for Python 3.2 In-Reply-To: <4C268433.30405@scottdial.com> References: <20100624115048.4fd152e3@heresy> <20100624170944.7e68ad21@heresy> <4C23D3C2.1060500@scottdial.com> <4C246E81.3020302@scottdial.com> <4C265DC6.4080600@ubuntu.com> <4C268433.30405@scottdial.com> Message-ID: <20100630145329.736f2aab@heresy> On Jun 26, 2010, at 06:50 PM, Scott Dial wrote: >On 6/26/2010 4:06 PM, Matthias Klose wrote: >> On 25.06.2010 22:12, James Y Knight wrote: >>> On Jun 25, 2010, at 4:53 AM, Scott Dial wrote: >>>> Placing .so files together does not simplify that install process in any >>>> way. You will still have to handle such packages in a special way. >>> >>> This is a good point, but I think still falls short of a solution. For a >>> package like lxml, indeed you are correct. Since debian needs to build >>> it once per version, it could just put the entire package (.py files and >>> .so files) into a different per-python-version directory. >> >> This is what is currently done. This will increase the size of packages >> by duplicating the .py files, or you have to install the .py in a common >> location (irrelevant to sys.path), and provide (sym)links to the >> expected location. > >"This is what is currently done" and "provide (sym)links to the >expected location" are conflicting statements. I think Matthias was referring to "what is currently done" to your statement "debian needs to build it once per version". Providing symlinks is how we are able to make it appear that there are version-specific py files without actually doing so. 
>If you are symlinking .py files from a shared location, then that is not the >same as "just install the package into a version-specific location". What >motivation is there for preferring symlinks? This reduces .py file duplications in distro packages. >Who cares if a ditro package install yields duplicate .py files? Nor am >I motivated by having to carry duplicate .py files in a distribution >package (I imagine the compression of duplicate .py files is amazing). It might be amazing, but it's still a significant overhead. As I've described, multiply that by all the py files in all the distro packages containing Python source code, and then still try to fit it on a CDROM. >What happens to the distro packaging if a python package splits the >codebase between 2.x and 3.x (meaning they have distinct .py files)? The Debian/Ubuntu approach to Python 2/3 support is to provide them in separate distro packages. E.g. for Python package foo, you would have Debuntu package python-foo (for the Python 2.x version) and python3-foo. We do not share source between Python 2 and 3 versions, at least not yet . This doesn't hurt us much because the number of Python packages that are source compatible between the two is pretty low (Benjamin's 'six' package might change that :), and not much depends on Python 3 yet. >As someone else mentioned, how is virtualenv going to interact with packages >that install like this? This is a good question, but I *think* it won't affect it much at all. To test for sure I'd either need a Python 3 compatible virtualenv or backport my patch to Python 2.6 and 2.7. But still, I'm not sure it would matter since the same shared library import suffix is used in either case. I actually think version-specific search paths would have a greater impact on virtualenv. -Barry -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 836 bytes Desc: not available URL: From barry at python.org Wed Jun 30 20:55:16 2010 From: barry at python.org (Barry Warsaw) Date: Wed, 30 Jun 2010 14:55:16 -0400 Subject: [Python-Dev] versioned .so files for Python 3.2 In-Reply-To: References: <20100624115048.4fd152e3@heresy> <20100624170944.7e68ad21@heresy> <4C23D3C2.1060500@scottdial.com> <4C246E81.3020302@scottdial.com> Message-ID: <20100630145516.08b5b2ec@heresy> On Jun 25, 2010, at 11:58 AM, Brett Cannon wrote: >> Placing .so files together does not simplify that install process in any >> way. You will still have to handle such packages in a special way. You >> must still compile the package multiple times for each relevant version >> of python (with special tagging that I imagine distutils can take care >> of) and, worse yet, you have created a more trick install than merely >> having multiple search paths (e.g., installing/uninstalling lxml for >> *one* version of python is actually more difficult in this scheme). > >This is meant to be used by distros in a programmatic fashion, so my >response is "so what?" Their package management system is going to >maintain the directory, not a person. You and I are not going to be >using this for anything. This is purely meant for Linux OS vendors >(maybe OS X) to manage their installs through their package software. >I honestly do not expect human beings to be mucking around with these >installs (and I suspect Barry doesn't either). Spot on. -Barry -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 836 bytes Desc: not available URL: From barry at python.org Wed Jun 30 20:58:00 2010 From: barry at python.org (Barry Warsaw) Date: Wed, 30 Jun 2010 14:58:00 -0400 Subject: [Python-Dev] versioned .so files for Python 3.2 In-Reply-To: <4C266702.4010102@ubuntu.com> References: <20100624115048.4fd152e3@heresy> <20100624170944.7e68ad21@heresy> <4C23D3C2.1060500@scottdial.com> <4C246E81.3020302@scottdial.com> <4C266702.4010102@ubuntu.com> Message-ID: <20100630145800.7658936e@heresy> On Jun 26, 2010, at 10:45 PM, Matthias Klose wrote: >Having non-conflicting extension names is a schema which already is used on >some platforms (debug builds on Windows). The question for me is, if just a >renaming of the .so files is acceptable for upstream, or if distributors >should implement this on their own, as something like: > > if ext_path.startswith('/usr/') and not ext_path.startswith('/usr/local/'): > load_ext('foo.2.6.so') > else: > load_ext('foo.so') > >I fear this will cause issues when e.g. virtualenv environments start copying >parts from the system installation instead of symlinking it. I concur. I think my patch will have much less impact on virtualenv and similar tools because there's nothing much magical about it. It just says "oh there's another file suffix you should consider when looking for a shared library", which as you point out is already done on Windows. -Barry -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 836 bytes Desc: not available URL: From barry at python.org Wed Jun 30 21:03:28 2010 From: barry at python.org (Barry Warsaw) Date: Wed, 30 Jun 2010 15:03:28 -0400 Subject: [Python-Dev] versioned .so files for Python 3.2 In-Reply-To: <4C2506AE.3060002@scottdial.com> References: <20100624115048.4fd152e3@heresy> <20100624170944.7e68ad21@heresy> <4C23D3C2.1060500@scottdial.com> <4C246E81.3020302@scottdial.com> <4C2506AE.3060002@scottdial.com> Message-ID: <20100630150328.281f5d5f@heresy> On Jun 25, 2010, at 03:42 PM, Scott Dial wrote: >On 6/25/2010 2:58 PM, Brett Cannon wrote: >> I assume you are talking about PEP 3147. You're right that the PEP was >> for pyc files and that's it. No one is talking about rewriting the >> PEP. > >Yes, I am making reference to PEP 3147. I make reference to that PEP >because this change is of the same order of magnitude as the .pyc >change, and we asked for a PEP for that, and if this .so stuff is an >extension of that thought process, then it should either be reflected by >that PEP or a new PEP. I think it's not nearly on the order of magnitude as PEP 3147. One way to measure that is the size of the patch required to implement the feature and ensure the test suite still works. My versioned so patch is *way* smaller. I actually think because this is almost exclusively an extension to a build-time configuration option, and doesn't really change the language, a PEP shouldn't be necessary. But by the same token, I'm willing to write a new one (and *not* touch PEP 3147) just so that we have a point of reference to record the discussion and decision. So I'll do that. -Barry -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 836 bytes Desc: not available URL: From barry at python.org Wed Jun 30 21:06:10 2010 From: barry at python.org (Barry Warsaw) Date: Wed, 30 Jun 2010 15:06:10 -0400 Subject: [Python-Dev] versioned .so files for Python 3.2 In-Reply-To: <4C23DD99.9050604@egenix.com> References: <20100624115048.4fd152e3@heresy> <20100624170944.7e68ad21@heresy> <4C23D3C2.1060500@scottdial.com> <4C23DD99.9050604@egenix.com> Message-ID: <20100630150610.7ae4ac6a@heresy> On Jun 25, 2010, at 12:35 AM, M.-A. Lemburg wrote: >Scott Dial wrote: >> On 6/24/2010 5:09 PM, Barry Warsaw wrote: >>>> What use case does this address? >>> >>>> If you want to make it so a system can install a package in just one >>>> location to be used by multiple Python installations, then the version >>>> number isn't enough. You also need to distinguish debug builds, profiling >>>> builds, Unicode width (see issue8654), and probably several other >>>> ./configure options. >>> >>> This is a good point, but more easily addressed. Let's say a distro makes >>> three Python 3.2 variants available, one "normal" build, a debug build, and >>> UCS2 and USC4 versions of the above. All we need to do is choose a different >>> .so ABI tag (see previous follow) for each of those builds. My updated patch >>> (coming soon) allows you to define that tag to configure. So e.g. >> >> Why is this use case not already addressed by having independent >> directories? And why is there an incentive to co-mingle these >> version-punned files with version-agnostic ones? > >I don't think this is a good idea. After a while your Python >lib directories would need some serious dusting off to make them >maintainable again. > >Disk space is cheap so setting up dedicated directories for each >variant will result in a much easier to manage installation. > >If you want a really clever setup, use hard links between those >directory (you can also use symlinks if you like). 
>Then a change in one Python file will automatically >propagate to all other variant dirs without any maintenance >effort. Together with PYTHONHOME this makes a really nice >virtualenv-like environment. Note that I do believe there is a difference between what users maintaining their own Python installations might want, and what a distro needs to maintain its entire Python stack. So while dedicated directories might make more sense if you're maintaining your own Python built from source, it doesn't make as much sense for a distro, as described in previous responses by Matthias. -Barry -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 836 bytes Desc: not available URL: From exarkun at twistedmatrix.com Wed Jun 30 21:10:05 2010 From: exarkun at twistedmatrix.com (exarkun at twistedmatrix.com) Date: Wed, 30 Jun 2010 19:10:05 -0000 Subject: [Python-Dev] OS X buildbots: why am I skipping these tests? In-Reply-To: <4C2B7F00.2010602@v.loewis.de> References: <71728.1277866512@parc.com> <4C2AD511.5020709@v.loewis.de> <20100630113232.1937.151974582.divmod.xquotient.556@localhost.localdomain> <4C2B7F00.2010602@v.loewis.de> Message-ID: <20100630191005.1937.1474314461.divmod.xquotient.617@localhost.localdomain> On 05:29 pm, martin at v.loewis.de wrote: >Am 30.06.2010 13:32, schrieb exarkun at twistedmatrix.com: >>On 05:24 am, martin at v.loewis.de wrote: >>>>Seems to work fine. So this I don't understand. Any ideas, anyone? >>> >>>Didn't we discuss this before? The buildbot slave has no controlling >>>terminal anymore, hence it cannot open /dev/tty. If you are curious, >>>just patch your checkout to output the exact errno (e.g. to stdout), >>>and trigger a build through the web. >> >>Could the test be rewritten (or supplemented) to use a pty? Most or >>perhaps all of the same operations should be supported. > >I'm not sure. 
It uses TIOCGPGRP, basically to establish that ioctl >can also put results into a Python array (IIUC). This goes back to >http://bugs.python.org/555817 > >Somebody rewriting it would need to make sure the original test purpose >is still met. Absolutely. And even so, it may still make sense to run the test against both /dev/tty and a pty (or whatever subset of those things can be acquired in the testing environment). You can do a TIOCGPGRP on a new pty (created by os.openpty) but it produces somewhat less interesting results than doing it on /dev/tty. FIONREAD might be a nice alternative. It produces interesting (ie, non- zero) values in an easily predictable/controllable way (it tells you how many bytes are in the read buffer). Jean-Paul From exarkun at twistedmatrix.com Wed Jun 30 21:11:22 2010 From: exarkun at twistedmatrix.com (exarkun at twistedmatrix.com) Date: Wed, 30 Jun 2010 19:11:22 -0000 Subject: [Python-Dev] OS X buildbots: why am I skipping these tests? In-Reply-To: <20100630184602.1937.1550858232.divmod.xquotient.569@localhost.localdomain> References: <71728.1277866512@parc.com> <68796.1277913609@parc.com> <20100630184457.10067764@pitrou.net> <20100630184602.1937.1550858232.divmod.xquotient.569@localhost.localdomain> Message-ID: <20100630191122.1937.493523511.divmod.xquotient.619@localhost.localdomain> On 06:46 pm, exarkun at twistedmatrix.com wrote: > >On 04:44 pm, solipsis at pitrou.net wrote: >>On Wed, 30 Jun 2010 09:00:09 PDT >>Bill Janssen wrote: >>> >>>So, my question then is, why are these skips "unexpected"? Seems to >>>me >>>that if this is the case, this test will never run on any platform. >> >>You can change the value of the "usepty" option in your buildbot.tac. >>(you will also have to restart the buildslave process) > >But don't do this. The usepty option is completely unrelated to the >suggestion I was making. Flipping it to True will only cause other >things to break and have no impact on this test. Ah, sorry. I confused myself. 
The option is related. But it will also break other things, so I still would recommend looking for other solutions.

Jean-Paul

From brett at python.org Wed Jun 30 21:28:03 2010
From: brett at python.org (Brett Cannon)
Date: Wed, 30 Jun 2010 12:28:03 -0700
Subject: [Python-Dev] OS X buildbots: why am I skipping these tests?
In-Reply-To: <68821.1277913795@parc.com>
References: <71728.1277866512@parc.com> <4C2AD511.5020709@v.loewis.de> <68821.1277913795@parc.com>
Message-ID:

On Wed, Jun 30, 2010 at 09:03, Bill Janssen wrote:
> Martin v. Löwis wrote:
>
>> > Seems to work fine. So this I don't understand. Any ideas, anyone?
>>
>> Didn't we discuss this before?
>
> Possibly, but I don't recall doing so.
>
>> The buildbot slave has no controlling
>> terminal anymore, hence it cannot open /dev/tty. If you are curious,
>> just patch your checkout to output the exact errno (e.g. to stdout),
>> and trigger a build through the web.
>
> So, why is skipping this test "unexpected"? I see "x86 Tiger" is also
> showing this as an unexpected skip. Should I just add it to the list of
> expected skips on Darwin? Actually, will it run on any platform?

The whole "unexpected" skipping is somewhat of a mess. In an ideal situation modules that are optionally built should be allowed to skip, and on a per-platform basis certain OS-specific tests (whether they be exclusive to a specific OS or run on all OSs except Windows) should be skipped. Otherwise any import failure should be a test failure.

The "unexpected" test skipping was meant to solve both of these situations, but in an imperfect way. My PSF grant proposal to work on Python full-time for two to three months after my Ph.D. is complete (assuming the PSF gives me the grant this would start most likely in November or December) includes cleaning up the test suite and this would be the first thing I tackle.
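A minimal sketch of the pty-based approach Jean-Paul describes earlier in this thread: create a fresh pty with os.openpty and issue FIONREAD on it, so the ioctl call does not depend on /dev/tty or on a controlling terminal. The helper name pending_bytes and the one-second select timeout are assumptions made for illustration; this is not the test in Python's test suite, and it is only expected to work on Unix-like platforms that provide ptys.

```python
import array
import fcntl
import os
import select
import termios


def pending_bytes(fd):
    """Return the number of unread bytes on fd, as reported by FIONREAD."""
    buf = array.array("i", [0])
    # With the mutate flag set, the kernel's answer is written into buf,
    # which is the ioctl-into-an-array behaviour the test cares about.
    fcntl.ioctl(fd, termios.FIONREAD, buf, True)
    return buf[0]


master, slave = os.openpty()
try:
    before = pending_bytes(master)
    os.write(slave, b"abc")
    # Delivery to the master side can be asynchronous, so wait until
    # the master is readable before asking how much is buffered.
    select.select([master], [], [], 1.0)
    after = pending_bytes(master)
finally:
    os.close(slave)
    os.close(master)

print("bytes pending before write:", before)
print("bytes pending after write: ", after)
```

Because the test controls how many bytes it writes, it can predict the FIONREAD value, which is harder to arrange with TIOCGPGRP on a new pty.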
From martin at v.loewis.de Wed Jun 30 21:53:08 2010
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Wed, 30 Jun 2010 21:53:08 +0200
Subject: [Python-Dev] OS X buildbots: why am I skipping these tests?
In-Reply-To:
References: <71728.1277866512@parc.com> <4C2AD511.5020709@v.loewis.de> <68821.1277913795@parc.com>
Message-ID: <4C2BA0A4.9070002@v.loewis.de>

> The whole "unexpected" skipping is somewhat of a mess. In an ideal
> situation modules that are optionally built should be allowed to skip,

While this may be the wide-spread interpretation, it is definitely *not* the original intention of the feature.

When Tim Peters added it, he wanted it to tell him whether he did the Windows build correctly, INCLUDING ALL OPTIONAL PACKAGES that can possibly work on Windows. If you try to generalize this beyond Windows, then the only skips that are expected are the ones for tests that absolutely cannot work on the platform - i.e. Unix tests on Windows, and Windows tests on Unix. Otherwise, if you can get it to pass by installing additional software, Tim did *not* mean this to be an expected skip.

Regards,
Martin

From janssen at parc.com Wed Jun 30 22:21:51 2010
From: janssen at parc.com (Bill Janssen)
Date: Wed, 30 Jun 2010 13:21:51 PDT
Subject: [Python-Dev] OS X buildbots: why am I skipping these tests?
In-Reply-To: <4C2BA0A4.9070002@v.loewis.de>
References: <71728.1277866512@parc.com> <4C2AD511.5020709@v.loewis.de> <68821.1277913795@parc.com> <4C2BA0A4.9070002@v.loewis.de>
Message-ID: <76469.1277929311@parc.com>

Martin v. Löwis wrote:

> > The whole "unexpected" skipping is somewhat of a mess. In an ideal
> > situation modules that are optionally built should be allowed to skip,
>
> While this may be the wide-spread interpretation, it is definitely *not*
> the original intention of the feature.
> > When Tim Peters added it, he wanted it to tell him whether he did the > Windows build correctly, INCLUDING ALL OPTIONAL PACKAGES that can > possibly work on Windows. If you try to generalize this beyond Windows, > then the only skips that are expected are the ones for tests that > absolutely cannot work on the platform - i.e. Unix tests on Windows, > and Windows tests on Unix. Otherwise, if you can get it to pass by > installing additional software, Tim did *not* mean this to be an > expected skip. Perfectly reasonable, good to know. So on my OS X buildbots I should update gdb, tcl/tk, and readline, so that those tests can run. Probably be good to put a note in the regrtest.py comments to this effect, as I don't see a PEP about testing or buildbots. Bill From mal at egenix.com Wed Jun 30 22:35:56 2010 From: mal at egenix.com (M.-A. Lemburg) Date: Wed, 30 Jun 2010 22:35:56 +0200 Subject: [Python-Dev] versioned .so files for Python 3.2 In-Reply-To: <20100630150610.7ae4ac6a@heresy> References: <20100624115048.4fd152e3@heresy> <20100624170944.7e68ad21@heresy> <4C23D3C2.1060500@scottdial.com> <4C23DD99.9050604@egenix.com> <20100630150610.7ae4ac6a@heresy> Message-ID: <4C2BAAAC.5090101@egenix.com> Barry Warsaw wrote: > On Jun 25, 2010, at 12:35 AM, M.-A. Lemburg wrote: > >> Scott Dial wrote: >>> On 6/24/2010 5:09 PM, Barry Warsaw wrote: >>>>> What use case does this address? >>>> >>>>> If you want to make it so a system can install a package in just one >>>>> location to be used by multiple Python installations, then the version >>>>> number isn't enough. You also need to distinguish debug builds, profiling >>>>> builds, Unicode width (see issue8654), and probably several other >>>>> ./configure options. >>>> >>>> This is a good point, but more easily addressed. Let's say a distro makes >>>> three Python 3.2 variants available, one "normal" build, a debug build, and >>>> UCS2 and USC4 versions of the above. 
All we need to do is choose a different >>>> .so ABI tag (see previous follow) for each of those builds. My updated patch >>>> (coming soon) allows you to define that tag to configure. So e.g. >>> >>> Why is this use case not already addressed by having independent >>> directories? And why is there an incentive to co-mingle these >>> version-punned files with version-agnostic ones? >> >> I don't think this is a good idea. After a while your Python >> lib directories would need some serious dusting off to make them >> maintainable again. >> >> Disk space is cheap so setting up dedicated directories for each >> variant will result in a much easier to manage installation. >> >> If you want a really clever setup, use hard links between those >> directory (you can also use symlinks if you like). >> Then a change in one Python file will automatically >> propagate to all other variant dirs without any maintenance >> effort. Together with PYTHONHOME this makes a really nice >> virtualenv-like environment. > > Note that I do believe there is a difference between what users maintaining > their own Python installations might want, and what a distro needs to maintain > its entire Python stack. So while dedicated directories might make more sense > if you're maintaining your own Python built from source, it doesn't make as > much sense for a distro, as described in previous responses by Matthias. Fair enough. I haven't followed the thread closely, so Matthias will probably already have answered this: The Python default installation dir for libs (including site-packages) is $prefix/lib/pythonX.X, so you already have separate and properly versioned directory paths. What difference would the extra version on the .so file make in such a setup ? -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Jun 30 2010) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... 
http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ 2010-07-19: EuroPython 2010, Birmingham, UK 18 days to go ::: Try our new mxODBC.Connect Python Database Interface for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From brett at python.org Wed Jun 30 23:12:59 2010 From: brett at python.org (Brett Cannon) Date: Wed, 30 Jun 2010 14:12:59 -0700 Subject: [Python-Dev] OS X buildbots: why am I skipping these tests? In-Reply-To: <4C2BA0A4.9070002@v.loewis.de> References: <71728.1277866512@parc.com> <4C2AD511.5020709@v.loewis.de> <68821.1277913795@parc.com> <4C2BA0A4.9070002@v.loewis.de> Message-ID: On Wed, Jun 30, 2010 at 12:53, "Martin v. L?wis" wrote: >> The whole "unexpected" skipping is somewhat of a mess. In an ideal >> situation modules that are optionally built should be allowed to skip, > > While this may be the wide-spread interpretation, it is definitely *not* > the original intention of the feature. > > When Tim Peters added it, he wanted it to tell him whether he did the > Windows build correctly, INCLUDING ALL OPTIONAL PACKAGES that can > possibly work on Windows. If you try to generalize this beyond Windows, > then the only skips that are expected are the ones for tests that > absolutely cannot work on the platform - i.e. Unix tests on Windows, > and Windows tests on Unix. Otherwise, if you can get it to pass by > installing additional software, Tim did *not* mean this to be an > expected skip. Interesting. Do you use it that way when you make the Windows build? From ncoghlan at gmail.com Wed Jun 30 23:52:30 2010 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 1 Jul 2010 07:52:30 +1000 Subject: [Python-Dev] OS X buildbots: why am I skipping these tests? 
In-Reply-To: <4C2BA0A4.9070002@v.loewis.de> References: <71728.1277866512@parc.com> <4C2AD511.5020709@v.loewis.de> <68821.1277913795@parc.com> <4C2BA0A4.9070002@v.loewis.de> Message-ID: On Thu, Jul 1, 2010 at 5:53 AM, "Martin v. L?wis" wrote: > When Tim Peters added it, he wanted it to tell him whether he did the > Windows build correctly, INCLUDING ALL OPTIONAL PACKAGES that can > possibly work on Windows. If you try to generalize this beyond Windows, > then the only skips that are expected are the ones for tests that > absolutely cannot work on the platform - i.e. Unix tests on Windows, > and Windows tests on Unix. Otherwise, if you can get it to pass by > installing additional software, Tim did *not* mean this to be an > expected skip. Note that it works this way on Linux as well. On Kubuntu (for example) you need another half dozen or so additional *-dev packages installed to avoid unexpected test skips. Cheers, Nick. P.S. For anyone curious, I posted the list of extra packages you need here: http://boredomandlaziness.blogspot.com/2010/01/kubuntu-dev-packages-to-build-python.html -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From brett at python.org Wed Jun 30 23:55:14 2010 From: brett at python.org (Brett Cannon) Date: Wed, 30 Jun 2010 14:55:14 -0700 Subject: [Python-Dev] OS X buildbots: why am I skipping these tests? In-Reply-To: References: <71728.1277866512@parc.com> <4C2AD511.5020709@v.loewis.de> <68821.1277913795@parc.com> <4C2BA0A4.9070002@v.loewis.de> Message-ID: On Wed, Jun 30, 2010 at 14:52, Nick Coghlan wrote: > On Thu, Jul 1, 2010 at 5:53 AM, "Martin v. L?wis" wrote: >> When Tim Peters added it, he wanted it to tell him whether he did the >> Windows build correctly, INCLUDING ALL OPTIONAL PACKAGES that can >> possibly work on Windows. If you try to generalize this beyond Windows, >> then the only skips that are expected are the ones for tests that >> absolutely cannot work on the platform - i.e. 
Unix tests on Windows,
>> and Windows tests on Unix. Otherwise, if you can get it to pass by
>> installing additional software, Tim did *not* mean this to be an
>> expected skip.
>
> Note that it works this way on Linux as well. On Kubuntu (for example)
> you need another half dozen or so additional *-dev packages installed
> to avoid unexpected test skips.

So it isn't that it's "unexpected", it's that a dependency is missing. So it seems the terminology needs to get tweaked.

