Showing content from https://mail.python.org/pipermail/python-dev/2008-July.txt below:
From guido at python.org Tue Jul 1 01:12:03 2008 From: guido at python.org (Guido van Rossum) Date: Mon, 30 Jun 2008 16:12:03 -0700 Subject: [Python-Dev] [Python-checkins] r64424 - in python/trunk: Include/object.h Lib/test/test_sys.py Misc/NEWS Objects/intobject.c Objects/longobject.c Objects/typeobject.c Python/bltinmodule.c In-Reply-To: <5c6f2a5d0806300931x7635cee5t597c09ff7b06fc6f@mail.gmail.com> References: <20080620041816.4D5E81E4002@bag.python.org> <5c6f2a5d0806261317m3c8b848dm6e8071d8b841fa59@mail.gmail.com> <4863FC7B.6070903@v.loewis.de> <5c6f2a5d0806261346o7af44dc6g449d6bece2d75842@mail.gmail.com> <04BCC25BF0EC4DFBB06FEEA568199FB1@RaymondLaptop1> <5c6f2a5d0806261453n6ebe20b7yb26ca69c27f75517@mail.gmail.com> <58A6CFFCB6AC4A84A6C86809A50295FF@RaymondLaptop1> <5c6f2a5d0806291726o77bcd4ffvcf08c4ab2be539a4@mail.gmail.com> <5c6f2a5d0806300931x7635cee5t597c09ff7b06fc6f@mail.gmail.com> Message-ID: On Mon, Jun 30, 2008 at 9:31 AM, Mark Dickinson wrote: > On Mon, Jun 30, 2008 at 4:53 PM, Guido van Rossum wrote: >> FWIW, I'm fine with making these methods on float -- a class method >> float.fromhex(...) echoes e.g. dict.fromkeys(...) and >> datetime.fromordinal(...). The to-hex conversion could be x.hex() -- >> we don't tend to use ".toxyz()" as a naming convention much in Python. > > Would it be totally outrageous for the float constructor to accept > hex strings directly? int('0x10') raises a ValueError as well. You might propose float('0x...p...', 16) but since the format is so specifically different I think that's not completely kosher. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From barry at python.org Tue Jul 1 22:16:37 2008 From: barry at python.org (Barry Warsaw) Date: Tue, 1 Jul 2008 16:16:37 -0400 Subject: [Python-Dev] Second betas tomorrow Message-ID: <2865D095-DA45-4875-AE40-8A5F8C81C299@python.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Wow, I bet this one crept up on you as quickly as it did me! 
We have our second planned beta releases for 2.6 and 3.0 tomorrow. As usual I will start looking at blockers and buildbots tomorrow afternoon (UTC-4 time) with a plan to start building things at about 6pm. Also, I will of course be in #python-dev on freenode to answer any questions, or get second opinions. PEP 361 claims that these will be the last betas. Whether that's true or not depends on how well the beta2's go. Please help review code or fix bugs. If you know of things that absolutely must go into beta2, be sure there is an open release-blocker bug on the issue. Thanks, - -Barry -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.9 (Darwin) iQCVAwUBSGqQpnEjvBPtnXfVAQLdagP/VooK8+AoPrb1bR7xAxGqg0vC1HOKw5qZ 8VQArzgldz1OnoG24PuKGdaEw7PbHjCMkD0/CyZWjH8/yWawcxV7hKl6RYHJ3GX9 keroo7wz3/NaptJtA9ldoKA5ekV8WVVC5OElgtjKr+v6HorPQSHzUgJiDHYUS1FW A8fdHipyZds= =vwYy -----END PGP SIGNATURE----- From guido at python.org Tue Jul 1 22:25:27 2008 From: guido at python.org (Guido van Rossum) Date: Tue, 1 Jul 2008 13:25:27 -0700 Subject: [Python-Dev] [Python-3000] Second betas tomorrow In-Reply-To: <2865D095-DA45-4875-AE40-8A5F8C81C299@python.org> References: <2865D095-DA45-4875-AE40-8A5F8C81C299@python.org> Message-ID: I think we should put this one off. The previous betas were done on June 18, and IMO the next beta should be about a month afterwards, not 2 weeks. On Tue, Jul 1, 2008 at 1:16 PM, Barry Warsaw wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > Wow, I bet this one crept up on you as quickly as it did me! > > We have our second planned beta releases for 2.6 and 3.0 tomorrow. As > usual I will start looking at blockers and buildbots tomorrow afternoon > (UTC-4 time) with a plan to start building things at about 6pm. Also, I > will of course be in #python-dev on freenode to answer any questions, or get > second opinions. > > PEP 361 claims that these will be the last betas. Whether that's true or > not depends on how well the beta2's go. 
Please help review code or fix > bugs. If you know of things that absolutely must go into beta2, be sure > there is an open release-blocker bug on the issue. > > Thanks, > - -Barry > > -----BEGIN PGP SIGNATURE----- > Version: GnuPG v1.4.9 (Darwin) > > iQCVAwUBSGqQpnEjvBPtnXfVAQLdagP/VooK8+AoPrb1bR7xAxGqg0vC1HOKw5qZ > 8VQArzgldz1OnoG24PuKGdaEw7PbHjCMkD0/CyZWjH8/yWawcxV7hKl6RYHJ3GX9 > keroo7wz3/NaptJtA9ldoKA5ekV8WVVC5OElgtjKr+v6HorPQSHzUgJiDHYUS1FW > A8fdHipyZds= > =vwYy > -----END PGP SIGNATURE----- > _______________________________________________ > Python-3000 mailing list > Python-3000 at python.org > http://mail.python.org/mailman/listinfo/python-3000 > Unsubscribe: > http://mail.python.org/mailman/options/python-3000/guido%40python.org > -- --Guido van Rossum (home page: http://www.python.org/~guido/) -------------- next part -------------- An HTML attachment was scrubbed... URL: From jnoller at gmail.com Tue Jul 1 22:28:11 2008 From: jnoller at gmail.com (Jesse Noller) Date: Tue, 1 Jul 2008 16:28:11 -0400 Subject: [Python-Dev] [Python-3000] Second betas tomorrow In-Reply-To: References: <2865D095-DA45-4875-AE40-8A5F8C81C299@python.org> Message-ID: <4222a8490807011328v4df3eb9fl7de44be415b334ff@mail.gmail.com> On Tue, Jul 1, 2008 at 4:23 PM, Georg Brandl wrote: > Barry Warsaw schrieb: >> >> -----BEGIN PGP SIGNED MESSAGE----- >> Hash: SHA1 >> >> Wow, I bet this one crept up on you as quickly as it did me! >> >> We have our second planned beta releases for 2.6 and 3.0 tomorrow. As >> usual I will start looking at blockers and buildbots tomorrow afternoon >> (UTC-4 time) with a plan to start building things at about 6pm. Also, I >> will of course be in #python-dev on freenode to answer any questions, or >> get second opinions. >> >> PEP 361 claims that these will be the last betas. Whether that's true or >> not depends on how well the beta2's go. Please help review code or fix >> bugs. 
If you know of things that absolutely must go into beta2, be sure >> there is an open release-blocker bug on the issue. > > May I ask if it really makes sense to release the beta tomorrow? Looking > at the Misc/NEWS files for 2.6 and 3.0, there are around 3-5 entries > for each release. I know it's good to follow the release plan, but it > also may save you, the release manager, work for the third beta (which > I think will be necessary if beta2 is released tomorrow). > > Georg > Speaking from my minor perspective - I've been sick and MIA, so there has not been a lot of movement on the pep 371 issues / multiprocessing bugs since Beta 1, there's still a fair amount of issues to close out. -jesse From barry at python.org Tue Jul 1 22:40:35 2008 From: barry at python.org (Barry Warsaw) Date: Tue, 1 Jul 2008 16:40:35 -0400 Subject: [Python-Dev] [Python-3000] Second betas tomorrow In-Reply-To: References: <2865D095-DA45-4875-AE40-8A5F8C81C299@python.org> Message-ID: -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Jul 1, 2008, at 4:25 PM, Guido van Rossum wrote: > I think we should put this one off. The previous betas were done on > June 18, and IMO the next beta should be about a month afterwards, > not 2 weeks. I will not be able to make releases the weeks of July 21st and 28th. The next scheduled beta is August 6th. There are two options. I could shift everything forward 2 weeks and do the next betas on July 16th. Or we could wait until August 6th. That would mean 6 weeks between betas. It's fine with me either way. 
- -Barry -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.9 (Darwin) iQCVAwUBSGqWQ3EjvBPtnXfVAQK5owP/Yd1pwWtwelbstnb6xh/dEtILirAyhfyo kcfQSSFBX+GgkDIx99cxgmJ7nB+xSNSy1MlkXukDj41O2m+dCqcQaxhyim4yqBYC r/Zc7IIiPT/nNQ/l97z8w0FqBoS/bmk9pqckBzrJfRRW14LZD8m2E/aU+OZeGi6z 0GZn/zwQbYk= =yC2a -----END PGP SIGNATURE----- From musiccomposition at gmail.com Tue Jul 1 22:45:21 2008 From: musiccomposition at gmail.com (Benjamin Peterson) Date: Tue, 1 Jul 2008 15:45:21 -0500 Subject: [Python-Dev] [Python-3000] Second betas tomorrow In-Reply-To: References: <2865D095-DA45-4875-AE40-8A5F8C81C299@python.org> Message-ID: <1afaf6160807011345n650a645fq661908f75a1f6a03@mail.gmail.com> On Tue, Jul 1, 2008 at 3:40 PM, Barry Warsaw wrote: > > There are two options. I could shift everything forward 2 weeks and do the > next betas on July 16th. Or we could wait until August 6th. That would > mean 6 weeks between betas. It's fine with me either way. I vote for shifting things 2 weeks forward. -- Cheers, Benjamin Peterson "There's no place like 127.0.0.1." From python at rcn.com Tue Jul 1 22:51:17 2008 From: python at rcn.com (Raymond Hettinger) Date: Tue, 1 Jul 2008 13:51:17 -0700 Subject: [Python-Dev] [Python-3000] Second betas tomorrow References: <2865D095-DA45-4875-AE40-8A5F8C81C299@python.org> Message-ID: <9B1D90667C03488BB57E09808F4C9B64@RaymondLaptop1> From: "Barry Warsaw" > There are two options. I could shift everything forward 2 weeks and > do the next betas on July 16th. Or we could wait until August 6th. > That would mean 6 weeks between betas. It's fine with me either way. +1 for six weeks to allow the code to be more thoroughly exercised. 
Raymond From guido at python.org Tue Jul 1 22:54:54 2008 From: guido at python.org (Guido van Rossum) Date: Tue, 1 Jul 2008 13:54:54 -0700 Subject: [Python-Dev] [Python-3000] Second betas tomorrow In-Reply-To: <9B1D90667C03488BB57E09808F4C9B64@RaymondLaptop1> References: <2865D095-DA45-4875-AE40-8A5F8C81C299@python.org> <9B1D90667C03488BB57E09808F4C9B64@RaymondLaptop1> Message-ID: On Tue, Jul 1, 2008 at 1:51 PM, Raymond Hettinger wrote: > From: "Barry Warsaw" > >> There are two options. I could shift everything forward 2 weeks and do >> the next betas on July 16th. Or we could wait until August 6th. That >> would mean 6 weeks between betas. It's fine with me either way. >> > > +1 for six weeks to allow the code to be more thoroughly exercised. > In that case I'd rather insert an extra beta -- one in 2 weeks and one in 6 weeks. -- --Guido van Rossum (home page: http://www.python.org/~guido/) -------------- next part -------------- An HTML attachment was scrubbed... URL: From barry at python.org Wed Jul 2 00:55:16 2008 From: barry at python.org (Barry Warsaw) Date: Tue, 1 Jul 2008 18:55:16 -0400 Subject: [Python-Dev] [Python-3000] Second betas tomorrow In-Reply-To: References: <2865D095-DA45-4875-AE40-8A5F8C81C299@python.org> <9B1D90667C03488BB57E09808F4C9B64@RaymondLaptop1> Message-ID: <8A43F3E7-BEE8-4656-833D-867328D16D52@python.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Jul 1, 2008, at 4:54 PM, Guido van Rossum wrote: > On Tue, Jul 1, 2008 at 1:51 PM, Raymond Hettinger > wrote: > From: "Barry Warsaw" > > There are two options. I could shift everything forward 2 weeks > and do the next betas on July 16th. Or we could wait until August > 6th. That would mean 6 weeks between betas. It's fine with me > either way. > > +1 for six weeks to allow the code to be more thoroughly exercised. > > In that case I'd rather insert an extra beta -- one in 2 weeks and > one in 6 weeks. Okay. 
I can't actually do it on July 16th, so the revised schedule will be: 15-Jul-2008 beta 2 23-Aug-2008 beta 3 03-Sep-2008 rc1 17-Sep-2008 rc2 01-Oct-2008 final releases I will update PEP 361 now. - -Barry -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.9 (Darwin) iQCVAwUBSGq11HEjvBPtnXfVAQJdYgP8DFVvCHeDzIDliY0bQuw+DXxMuGAxHWFO BZR2b4sEGFzMRfbGCJOi7wVubc4imwYDIpFXgzFHpWFMfUdBHGaSpnZJhGDxURqp 0vQQ3/nJLy7lpWfDYBy0Sps6XjANQF5SaqeW8KMVsa3X6Spw0fHTmF4xBIjiUaBy MvydyLNszY4= =9/s1 -----END PGP SIGNATURE----- From guido at python.org Wed Jul 2 01:04:04 2008 From: guido at python.org (Guido van Rossum) Date: Tue, 1 Jul 2008 16:04:04 -0700 Subject: [Python-Dev] [Python-3000] Second betas tomorrow In-Reply-To: <8A43F3E7-BEE8-4656-833D-867328D16D52@python.org> References: <2865D095-DA45-4875-AE40-8A5F8C81C299@python.org> <9B1D90667C03488BB57E09808F4C9B64@RaymondLaptop1> <8A43F3E7-BEE8-4656-833D-867328D16D52@python.org> Message-ID: On Tue, Jul 1, 2008 at 3:55 PM, Barry Warsaw wrote: > On Jul 1, 2008, at 4:54 PM, Guido van Rossum wrote: >> On Tue, Jul 1, 2008 at 1:51 PM, Raymond Hettinger wrote: >> From: "Barry Warsaw" >> >> There are two options. I could shift everything forward 2 weeks and do >> the next betas on July 16th. Or we could wait until August 6th. That >> would mean 6 weeks between betas. It's fine with me either way. >> >> +1 for six weeks to allow the code to be more thoroughly exercised. >> >> In that case I'd rather insert an extra beta -- one in 2 weeks and one in >> 6 weeks. > > Okay. I can't actually do it on July 16th, so the revised schedule will be: > > 15-Jul-2008 beta 2 > 23-Aug-2008 beta 3 > 03-Sep-2008 rc1 > 17-Sep-2008 rc2 > 01-Oct-2008 final releases > > I will update PEP 361 now. +1 Thanks for being flexible! 
-- --Guido van Rossum (home page: http://www.python.org/~guido/) From barry at python.org Wed Jul 2 01:13:57 2008 From: barry at python.org (Barry Warsaw) Date: Tue, 1 Jul 2008 19:13:57 -0400 Subject: [Python-Dev] [Python-3000] Second betas tomorrow In-Reply-To: References: <2865D095-DA45-4875-AE40-8A5F8C81C299@python.org> <9B1D90667C03488BB57E09808F4C9B64@RaymondLaptop1> <8A43F3E7-BEE8-4656-833D-867328D16D52@python.org> Message-ID: <535961AE-BCFE-4563-96E0-B883D97A1188@python.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Jul 1, 2008, at 7:04 PM, Guido van Rossum wrote: > On Tue, Jul 1, 2008 at 3:55 PM, Barry Warsaw wrote: >> On Jul 1, 2008, at 4:54 PM, Guido van Rossum wrote: >>> On Tue, Jul 1, 2008 at 1:51 PM, Raymond Hettinger >>> wrote: >>> From: "Barry Warsaw" >>> >>> There are two options. I could shift everything forward 2 weeks >>> and do >>> the next betas on July 16th. Or we could wait until August 6th. >>> That >>> would mean 6 weeks between betas. It's fine with me either way. >>> >>> +1 for six weeks to allow the code to be more thoroughly exercised. >>> >>> In that case I'd rather insert an extra beta -- one in 2 weeks and >>> one in >>> 6 weeks. >> >> Okay. I can't actually do it on July 16th, so the revised schedule >> will be: >> >> 15-Jul-2008 beta 2 >> 23-Aug-2008 beta 3 >> 03-Sep-2008 rc1 >> 17-Sep-2008 rc2 >> 01-Oct-2008 final releases >> >> I will update PEP 361 now. > > +1 > > Thanks for being flexible! Anything for a great release! 
- -Barry -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.9 (Darwin) iQCVAwUBSGq6NnEjvBPtnXfVAQKOlQP/RYlj6vxHEmlW/mVNIWqBYy/SmmMA6Qw4 hE3Bhb9QYGC5F0kEKyY5BmBVwETe70ahE1X3AOgmLrnHh5XwvGh8sNrFka/3s9sh vt6XAZh9IoXekZBIOGO4Gz0EtcURVUvAbCzCSXkHCQyL3qoV1r+mxsXVLRV2S4q0 UifMzkOm6WI= =wDrk -----END PGP SIGNATURE----- From brett at python.org Wed Jul 2 01:27:03 2008 From: brett at python.org (Brett Cannon) Date: Tue, 1 Jul 2008 16:27:03 -0700 Subject: [Python-Dev] [Python-3000] Second betas tomorrow In-Reply-To: <8A43F3E7-BEE8-4656-833D-867328D16D52@python.org> References: <2865D095-DA45-4875-AE40-8A5F8C81C299@python.org> <9B1D90667C03488BB57E09808F4C9B64@RaymondLaptop1> <8A43F3E7-BEE8-4656-833D-867328D16D52@python.org> Message-ID: On Tue, Jul 1, 2008 at 3:55 PM, Barry Warsaw wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > On Jul 1, 2008, at 4:54 PM, Guido van Rossum wrote: > >> On Tue, Jul 1, 2008 at 1:51 PM, Raymond Hettinger wrote: >> From: "Barry Warsaw" >> >> There are two options. I could shift everything forward 2 weeks and do >> the next betas on July 16th. Or we could wait until August 6th. That >> would mean 6 weeks between betas. It's fine with me either way. >> >> +1 for six weeks to allow the code to be more thoroughly exercised. >> >> In that case I'd rather insert an extra beta -- one in 2 weeks and one in >> 6 weeks. > > Okay. I can't actually do it on July 16th, so the revised schedule will be: > > 15-Jul-2008 beta 2 > 23-Aug-2008 beta 3 > 03-Sep-2008 rc1 > 17-Sep-2008 rc2 > 01-Oct-2008 final releases > > I will update PEP 361 now. Is a Google Calendar kept by anyone that lists stuff like planned release dates, etc.? 
-Brett From musiccomposition at gmail.com Wed Jul 2 01:28:59 2008 From: musiccomposition at gmail.com (Benjamin Peterson) Date: Tue, 1 Jul 2008 18:28:59 -0500 Subject: [Python-Dev] [Python-3000] Second betas tomorrow In-Reply-To: References: <2865D095-DA45-4875-AE40-8A5F8C81C299@python.org> <9B1D90667C03488BB57E09808F4C9B64@RaymondLaptop1> <8A43F3E7-BEE8-4656-833D-867328D16D52@python.org> Message-ID: <1afaf6160807011628m43738e85x3f03064a6df307ef@mail.gmail.com> On Tue, Jul 1, 2008 at 6:27 PM, Brett Cannon wrote: > Is a Google Calendar kept by anyone that lists stuff like planned > release dates, etc.? It's on my personal one. :) -- Cheers, Benjamin Peterson "There's no place like 127.0.0.1." From barry at python.org Wed Jul 2 03:44:16 2008 From: barry at python.org (Barry Warsaw) Date: Tue, 1 Jul 2008 21:44:16 -0400 Subject: [Python-Dev] [Python-3000] Second betas tomorrow In-Reply-To: References: <2865D095-DA45-4875-AE40-8A5F8C81C299@python.org> <9B1D90667C03488BB57E09808F4C9B64@RaymondLaptop1> <8A43F3E7-BEE8-4656-833D-867328D16D52@python.org> Message-ID: <79F76ED3-D21C-4D1D-B6B0-ECEBDFCEDDE3@python.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Jul 1, 2008, at 7:27 PM, Brett Cannon wrote: > On Tue, Jul 1, 2008 at 3:55 PM, Barry Warsaw wrote: >> -----BEGIN PGP SIGNED MESSAGE----- >> Hash: SHA1 >> >> On Jul 1, 2008, at 4:54 PM, Guido van Rossum wrote: >> >>> On Tue, Jul 1, 2008 at 1:51 PM, Raymond Hettinger >>> wrote: >>> From: "Barry Warsaw" >>> >>> There are two options. I could shift everything forward 2 weeks >>> and do >>> the next betas on July 16th. Or we could wait until August 6th. >>> That >>> would mean 6 weeks between betas. It's fine with me either way. >>> >>> +1 for six weeks to allow the code to be more thoroughly exercised. >>> >>> In that case I'd rather insert an extra beta -- one in 2 weeks and >>> one in >>> 6 weeks. >> >> Okay. 
I can't actually do it on July 16th, so the revised schedule >> will be: >> >> 15-Jul-2008 beta 2 >> 23-Aug-2008 beta 3 >> 03-Sep-2008 rc1 >> 17-Sep-2008 rc2 >> 01-Oct-2008 final releases >> >> I will update PEP 361 now. > > Is a Google Calendar kept by anyone that lists stuff like planned > release dates, etc.? http://www.google.com/calendar/ical/b6v58qvojllt0i6ql654r1vh00%40group.calendar.google.com/public/basic.ics - -Barry -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.9 (Darwin) iQCVAwUBSGrdcHEjvBPtnXfVAQJYxgQAh/+j8pF21H0k1vp+1znOh57MohU7gVP6 7fMnLSzOoA+9w7+pVvJVzWbr09vg41kO6OzqEAoMUPV2BK8ZHePuHZkLDwhCAAYk nixu2vRZZEGmT6aC0jejwOCY7vy5giTHelX442drKZcuSdNl4x1kvyohBnm0flIH 6B7HRL3Oo2Q= =5yqD -----END PGP SIGNATURE----- From brett at python.org Wed Jul 2 04:01:33 2008 From: brett at python.org (Brett Cannon) Date: Tue, 1 Jul 2008 19:01:33 -0700 Subject: [Python-Dev] [Python-3000] Second betas tomorrow In-Reply-To: <79F76ED3-D21C-4D1D-B6B0-ECEBDFCEDDE3@python.org> References: <2865D095-DA45-4875-AE40-8A5F8C81C299@python.org> <9B1D90667C03488BB57E09808F4C9B64@RaymondLaptop1> <8A43F3E7-BEE8-4656-833D-867328D16D52@python.org> <79F76ED3-D21C-4D1D-B6B0-ECEBDFCEDDE3@python.org> Message-ID: On Tue, Jul 1, 2008 at 6:44 PM, Barry Warsaw wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > On Jul 1, 2008, at 7:27 PM, Brett Cannon wrote: > >> On Tue, Jul 1, 2008 at 3:55 PM, Barry Warsaw wrote: >>> >>> -----BEGIN PGP SIGNED MESSAGE----- >>> Hash: SHA1 >>> >>> On Jul 1, 2008, at 4:54 PM, Guido van Rossum wrote: >>> >>>> On Tue, Jul 1, 2008 at 1:51 PM, Raymond Hettinger >>>> wrote: >>>> From: "Barry Warsaw" >>>> >>>> There are two options. I could shift everything forward 2 weeks and do >>>> the next betas on July 16th. Or we could wait until August 6th. That >>>> would mean 6 weeks between betas. It's fine with me either way. >>>> >>>> +1 for six weeks to allow the code to be more thoroughly exercised. 
>>>> >>>> In that case I'd rather insert an extra beta -- one in 2 weeks and one >>>> in >>>> 6 weeks. >>> >>> Okay. I can't actually do it on July 16th, so the revised schedule will >>> be: >>> >>> 15-Jul-2008 beta 2 >>> 23-Aug-2008 beta 3 >>> 03-Sep-2008 rc1 >>> 17-Sep-2008 rc2 >>> 01-Oct-2008 final releases >>> >>> I will update PEP 361 now. >> >> Is a Google Calendar kept by anyone that lists stuff like planned >> release dates, etc.? > > http://www.google.com/calendar/ical/b6v58qvojllt0i6ql654r1vh00%40group.calendar.google.com/public/basic.ics Thanks, Barry! -Brett From brett at python.org Wed Jul 2 04:04:44 2008 From: brett at python.org (Brett Cannon) Date: Tue, 1 Jul 2008 19:04:44 -0700 Subject: [Python-Dev] Can someone check my lib2to3 change for fix_imports? Message-ID: I just committed r64651 which is my attempt to add support to fix_imports so that modules that have been split up in 3.0 can be properly fixed. 2to3's test suite passes and all, but I am not sure if I botched it somehow since I did the change slightly blind. Can someone just do a quick check to make sure I did it properly? Also, what order should renames be declared to give priority to certain renames (e.g., urllib should probably be renamed to urllib.request over urllib.error when not used in a ``from ... import`` statement). -Brett From musiccomposition at gmail.com Wed Jul 2 04:38:13 2008 From: musiccomposition at gmail.com (Benjamin Peterson) Date: Tue, 1 Jul 2008 21:38:13 -0500 Subject: [Python-Dev] Can someone check my lib2to3 change for fix_imports? In-Reply-To: References: Message-ID: <1afaf6160807011938r44a58880n60e696b25c098617@mail.gmail.com> On Tue, Jul 1, 2008 at 9:04 PM, Brett Cannon wrote: > I just committed r64651 which is my attempt to add support to > fix_imports so that modules that have been split up in 3.0 can be > properly fixed. 2to3's test suite passes and all, but I am not sure if > I botched it somehow since I did the change slightly blind. 
Can > someone just do a quick check to make sure I did it properly? Also, > what order should renames be declared to give priority to certain > renames (e.g., urllib should probably be renamed to urllib.request > over urllib.error when not used in a ``from ... import`` statement). Well for starters, you know the test for fix_imports is disabled, right? -- Cheers, Benjamin Peterson "There's no place like 127.0.0.1." From musiccomposition at gmail.com Wed Jul 2 04:42:33 2008 From: musiccomposition at gmail.com (Benjamin Peterson) Date: Tue, 1 Jul 2008 21:42:33 -0500 Subject: [Python-Dev] [Python-3000] Second betas tomorrow In-Reply-To: <79F76ED3-D21C-4D1D-B6B0-ECEBDFCEDDE3@python.org> References: <2865D095-DA45-4875-AE40-8A5F8C81C299@python.org> <9B1D90667C03488BB57E09808F4C9B64@RaymondLaptop1> <8A43F3E7-BEE8-4656-833D-867328D16D52@python.org> <79F76ED3-D21C-4D1D-B6B0-ECEBDFCEDDE3@python.org> Message-ID: <1afaf6160807011942r738aae8u6c84aab03d463d76@mail.gmail.com> On Tue, Jul 1, 2008 at 8:44 PM, Barry Warsaw wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > On Jul 1, 2008, at 7:27 PM, Brett Cannon wrote: >> >> Is a Google Calendar kept by anyone that lists stuff like planned >> release dates, etc.? > > http://www.google.com/calendar/ical/b6v58qvojllt0i6ql654r1vh00%40group.calendar.google.com/public/basic.ics Can I get the non-iCal version? -- Cheers, Benjamin Peterson "There's no place like 127.0.0.1." 
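[Editorial sketch, not part of the original thread.] Brett's question about rename priority in the fix_imports thread arises because the single Python 2 urllib module was split across urllib.request, urllib.parse and urllib.error in 3.0. A plain "import urllib" gives a fixer no hint about which new module is meant, whereas "from urllib import name" can be resolved name by name. The snippet below only illustrates that distinction. URLLIB_SPLIT, DEFAULT_MODULE, new_module_for and rewrite_from_import are hypothetical names, and nothing here reflects how lib2to3 is actually implemented; the choice of urllib.request as the fallback simply follows Brett's suggestion.

```python
# Hypothetical lookup table for illustration only; this is NOT
# lib2to3's real data structure. Each key is a Python 2 urllib name,
# each value is the Python 3 module that provides it.
URLLIB_SPLIT = {
    "urlopen": "urllib.request",
    "urlretrieve": "urllib.request",
    "quote": "urllib.parse",
    "unquote": "urllib.parse",
    "urlencode": "urllib.parse",
    "ContentTooShortError": "urllib.error",
}

# Assumed fallback for a bare "import urllib", per Brett's suggestion.
DEFAULT_MODULE = "urllib.request"


def new_module_for(name=None):
    """Return the 3.0 module for a urllib name, or the fallback."""
    if name is None:
        return DEFAULT_MODULE
    return URLLIB_SPLIT.get(name, DEFAULT_MODULE)


def rewrite_from_import(names):
    """Regroup the names of a 'from urllib import ...' by new module."""
    grouped = {}
    for name in names:
        grouped.setdefault(new_module_for(name), []).append(name)
    return [
        "from %s import %s" % (module, ", ".join(members))
        for module, members in sorted(grouped.items())
    ]
```

With this table, a statement importing both quote and urlopen would have to become two separate import statements, which is why a one-to-one rename mapping is not enough for split modules.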
From barry at python.org Wed Jul 2 05:29:10 2008 From: barry at python.org (Barry Warsaw) Date: Tue, 1 Jul 2008 23:29:10 -0400 Subject: [Python-Dev] [Python-3000] Second betas tomorrow In-Reply-To: <1afaf6160807011942r738aae8u6c84aab03d463d76@mail.gmail.com> References: <2865D095-DA45-4875-AE40-8A5F8C81C299@python.org> <9B1D90667C03488BB57E09808F4C9B64@RaymondLaptop1> <8A43F3E7-BEE8-4656-833D-867328D16D52@python.org> <79F76ED3-D21C-4D1D-B6B0-ECEBDFCEDDE3@python.org> <1afaf6160807011942r738aae8u6c84aab03d463d76@mail.gmail.com> Message-ID: -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Jul 1, 2008, at 10:42 PM, Benjamin Peterson wrote: > On Tue, Jul 1, 2008 at 8:44 PM, Barry Warsaw wrote: >> -----BEGIN PGP SIGNED MESSAGE----- >> Hash: SHA1 >> >> On Jul 1, 2008, at 7:27 PM, Brett Cannon wrote: >>> >>> Is a Google Calendar kept by anyone that lists stuff like planned >>> release dates, etc.? >> >> http://www.google.com/calendar/ical/b6v58qvojllt0i6ql654r1vh00%40group.calendar.google.com/public/basic.ics > > Can I get the non-iCal version? http://www.google.com/calendar/feeds/b6v58qvojllt0i6ql654r1vh00%40group.calendar.google.com/public/basic http://www.google.com/calendar/embed?src=b6v58qvojllt0i6ql654r1vh00%40group.calendar.google.com&ctz=America/New_York - -Barry -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.9 (Darwin) iQCVAwUBSGr2BnEjvBPtnXfVAQJKBQP/bme7XNFS74SSmNNYX6Wz7Dq83VSQ8J6A hZf6k7tTx6I3qv0Xgc2jD9NnNuLmqG+Rw8Ag5CjBtZXgzAoyszluzddJfz3G0032 zPofZx/ekp22u4XJo9iQyrDKinp+qTlDqlQntsscY5l+KXR5P9ahWeWWM9aQw707 VYkxQ2yAA7g= =fzdc -----END PGP SIGNATURE----- From brett at python.org Wed Jul 2 05:36:59 2008 From: brett at python.org (Brett Cannon) Date: Tue, 1 Jul 2008 20:36:59 -0700 Subject: [Python-Dev] Can someone check my lib2to3 change for fix_imports? 
In-Reply-To: <1afaf6160807011938r44a58880n60e696b25c098617@mail.gmail.com> References: <1afaf6160807011938r44a58880n60e696b25c098617@mail.gmail.com> Message-ID: On Tue, Jul 1, 2008 at 7:38 PM, Benjamin Peterson wrote: > On Tue, Jul 1, 2008 at 9:04 PM, Brett Cannon wrote: >> I just committed r64651 which is my attempt to add support to >> fix_imports so that modules that have been split up in 3.0 can be >> properly fixed. 2to3's test suite passes and all, but I am not sure if >> I botched it somehow since I did the change slightly blind. Can >> someone just do a quick check to make sure I did it properly? Also, >> what order should renames be declared to give priority to certain >> renames (e.g., urllib should probably be renamed to urllib.request >> over urllib.error when not used in a ``from ... import`` statement). > > Well for starters, you know the test for fix_imports is disabled, right? > Nope, I forgot and turning it on has it failing running under 2.5. -Brett From ismail at namtrac.org Wed Jul 2 07:25:19 2008 From: ismail at namtrac.org (İsmail Dönmez) Date: Wed, 2 Jul 2008 08:25:19 +0300 Subject: [Python-Dev] py3k branch still using -fno-strict-aliasing Message-ID: <19e566510807012225v6da0e8b2jac05ef407caee1b4@mail.gmail.com> Hi, I remember discussing this before and coming to the conclusion that -fno-strict-aliasing would be removed from py3k CFLAGS. But as of now it's still used. I tested with gcc 4.3.1 on Linux x86_64 and there is no strict aliasing warning when this flag is removed. Also make testall passes. Is there any reason to keep this flag? If not, see the attached patch. Regards, ismail -- Programmer Excuses number 45: I do object-oriented programming - if the customer objects, I do more programming. -------------- next part -------------- A non-text attachment was scrubbed... 
Name: strict-aliasing.patch Type: text/x-diff Size: 1064 bytes Desc: not available URL: From paddy3118 at googlemail.com Wed Jul 2 08:08:22 2008 From: paddy3118 at googlemail.com (Paddy 3118) Date: Wed, 2 Jul 2008 07:08:22 +0100 Subject: [Python-Dev] [issue3214] Suggest change to glossary explanation: "Duck Typing" Message-ID: <3f7cdd360807012308y6eb6f018l6341cb0ace73e4e@mail.gmail.com> Hi, I'd like extra opinions on this issue please: http://bugs.python.org/issue3214 It's about changing the definition of Duck typing to remove hasattr and leave just EAFP in the enablers - more detail is in the issue log. Thanks, Paddy. From brett at python.org Wed Jul 2 08:32:54 2008 From: brett at python.org (Brett Cannon) Date: Tue, 1 Jul 2008 23:32:54 -0700 Subject: [Python-Dev] Can someone check my lib2to3 change for fix_imports? In-Reply-To: References: <1afaf6160807011938r44a58880n60e696b25c098617@mail.gmail.com> Message-ID: On Tue, Jul 1, 2008 at 8:36 PM, Brett Cannon wrote: > On Tue, Jul 1, 2008 at 7:38 PM, Benjamin Peterson > wrote: >> On Tue, Jul 1, 2008 at 9:04 PM, Brett Cannon wrote: >>> I just committed r64651 which is my attempt to add support to >>> fix_imports so that modules that have been split up in 3.0 can be >>> properly fixed. 2to3's test suite passes and all, but I am not sure if >>> I botched it somehow since I did the change slightly blind. Can >>> someone just do a quick check to make sure I did it properly? Also, >>> what order should renames be declared to give priority to certain >>> renames (e.g., urllib should probably be renamed to urllib.request >>> over urllib.error when not used in a ``from ... import`` statement). >> >> Well for starters, you know the test for fix_imports is disabled, right? >> > > Nope, I forgot and turning it on has it failing running under 2.5. 
> And refactor.py cannot be run directly from 2.5 because of a relative import and in 2.6 (where runpy has extra smarts) it still doesn't work thanks to main() not being passed an argument it needs (Issue3131). Looks like 2to3 needs some TLC. -Brett From steve at holdenweb.com Wed Jul 2 13:23:48 2008 From: steve at holdenweb.com (Steve Holden) Date: Wed, 02 Jul 2008 07:23:48 -0400 Subject: [Python-Dev] [issue3214] Suggest change to glossary explanation: "Duck Typing" In-Reply-To: <3f7cdd360807012308y6eb6f018l6341cb0ace73e4e@mail.gmail.com> References: <3f7cdd360807012308y6eb6f018l6341cb0ace73e4e@mail.gmail.com> Message-ID: Paddy 3118 wrote: > Hi, > I'd like extra opinions on this issue please: > http://bugs.python.org/issue3214 > > It's about changing the definition of Duck typing to remove hasattr and > leave just EAFP in the enablers - more detail is in the issue log. > The change seems to make sense. Use of hasattr() to determine method availability, while not strictly "look before you leap" because it doesn't test for a specific type, certainly isn't EAFP either. regards Steve -- Steve Holden +1 571 484 6266 +1 800 494 3119 Holden Web LLC http://www.holdenweb.com/ From duncan.booth at suttoncourtenay.org.uk Wed Jul 2 13:47:02 2008 From: duncan.booth at suttoncourtenay.org.uk (Duncan Booth) Date: Wed, 2 Jul 2008 11:47:02 +0000 (UTC) Subject: [Python-Dev] repeated keyword arguments References: <1d85506f0806271207n49542b91x35b5c565378c4124@mail.gmail.com> <48659116.10302@canterbury.ac.nz> <200806281158.41558.steve@pearwood.info> Message-ID: "Steven D'Aprano" wrote:
> It would be nice to be able to do this:
>
> defaults = dict(a=5, b=7)
> f(**defaults, a=8) # override the value of a in defaults
>
> but unfortunately that gives a syntax error. Reversing the order would
> override the wrong value. So as Python exists now, no, it's not
> terribly useful. But it's not inherently a stupid idea.
There is already an easy way to do that using functools.partial, and it is documented and therefore presumably deliberate behaviour: "If additional keyword arguments are supplied, they extend and override keywords."
>>> from functools import partial
>>> def f(a=1, b=2, c=3):
...     print a, b, c
...
>>> g = partial(f, b=99)
>>> g()
1 99 3
>>> g(a=100, b=101)
100 101 3
From ncoghlan at gmail.com Wed Jul 2 14:31:33 2008 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 02 Jul 2008 22:31:33 +1000 Subject: [Python-Dev] Review needed: proposed fix for 2.6 __hash__ incompatibility (issue 2235) Message-ID: <486B7525.6070502@gmail.com> I've posted a possible fix for the __hash__ backwards incompatibilities described in issue 2235 [1]. The patch uses a model similar to that used in Py3k (using None to indicate "don't inherit __hash__"), but extends it to allow Py_None to be explicitly stored in the tp_hash slot. The major downside is that we suffer the cost of an extra pointer comparison on every call to PyObject_Hash, but I wasn't able to come up with another solution that preserved backwards compatibility while still allowing collections.Hashable to function correctly. The patch involves a few changes to fairly deep components in typeobject.c though, so I'd like at least some kind of sanity check before I commit it. Cheers, Nick. 
[1] http://bugs.python.org/issue2235 -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org From asmodai at in-nomine.org Wed Jul 2 16:13:28 2008 From: asmodai at in-nomine.org (Jeroen Ruigrok van der Werven) Date: Wed, 2 Jul 2008 16:13:28 +0200 Subject: [Python-Dev] UCS2/UCS4 default Message-ID: <20080702141328.GW62693@nexus.in-nomine.org> Guido (and others of course), back in 2001 you pointed out that you wanted to move to UCS4 completely as the ideal situation (http://mail.python.org/pipermail/i18n-sig/2001-June/001107.html) over the current default UCS2. Given 3.0 will use Unicode strings as the default, would it not make sense to make the switch at this point as well? The current situation with UCS2 is particularly bad now that the CJK ideographs Extension B has been produced (and C is under ballot and D is under development). Personally, I have used nothing but UCS4-compiled Python binaries in recent years. See also http://www.python.org/dev/peps/pep-0261/ for background for the 2001 options. -- Jeroen Ruigrok van der Werven / asmodai http://www.in-nomine.org/ | http://www.rangaku.org/ | GPG: 2EAC625B Expansion of happiness is the purpose of life... From guido at python.org Wed Jul 2 16:13:59 2008 From: guido at python.org (Guido van Rossum) Date: Wed, 2 Jul 2008 07:13:59 -0700 Subject: [Python-Dev] Review needed: proposed fix for 2.6 __hash__ incompatibility (issue 2235) In-Reply-To: <486B7525.6070502@gmail.com> References: <486B7525.6070502@gmail.com> Message-ID: On Wed, Jul 2, 2008 at 5:31 AM, Nick Coghlan wrote: > I've posted a possible fix for the __hash__ backwards incompatibilities > described in issue 2235 [1]. > > The patch uses a model similar to that used in Py3k (using None to indicate > "don't inherit __hash__"), but extends it to allow Py_None to be > explicitly stored in the tp_hash slot. 
The major downside is that we suffer > the cost of an extra pointer comparison on every call to PyObject_Hash, but > I wasn't able to come up with another solution that preserved backwards > compatibility while still allowing collections.Hashable to function > correctly. From your description it seems storing Py_None in the slot acts as a magic value meaning "this is defined but not usable". However it used to be pretty common for various code around to call various slots directly (after a NULL check). That would have disastrous results if the slot value was Py_None. Would it be terribly inconvenient if the magic value was in fact another function (with a public name), whose sole purpose was to raise an exception? > The patch involves a few changes to fairly deep components in typeobject.c > though, so I'd like at least some kind of sanity check before I commit it. > > Cheers, > Nick. > > [1] http://bugs.python.org/issue2235 I can't promise I'll have time to look at this before my EuroPython keynote, but it's important for me to get it right, so if nobody else jumps in, remind me Tuesday. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From ncoghlan at gmail.com Wed Jul 2 16:36:18 2008 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 03 Jul 2008 00:36:18 +1000 Subject: [Python-Dev] Review needed: proposed fix for 2.6 __hash__ incompatibility (issue 2235) In-Reply-To: References: <486B7525.6070502@gmail.com> Message-ID: <486B9262.80107@gmail.com> Guido van Rossum wrote: > On Wed, Jul 2, 2008 at 5:31 AM, Nick Coghlan wrote: >> I've posted a possible fix for the __hash__ backwards incompatibilities >> described in issue 2235 [1]. >> >> The patch uses a model similar to that used in Py3k (using None to indicate >> "don't inherit __hash__"), but extends it to allow Py_None to be >> explicitly stored in the tp_hash slot. 
The major downside is that we suffer >> the cost of an extra pointer comparison on every call to PyObject_Hash, but >> I wasn't able to come up with another solution that preserved backwards >> compatibility while still allowing collections.Hashable to function >> correctly. > > From your description it seems storing Py_None in the slot acts as a > magic value meaning "this is defined but not usable". However it used > to be pretty common for various code around to call various slots > directly (after a NULL check). That would have disastrous results if > the slot value was Py_None. Would it be terribly inconvenient if the > magic value was in fact another function (with a public name), whose > sole purpose was to raise an exception? Not only not inconvenient, but a significant improvement - as well as addressing your concern that I missed some code that calls tp_hash directly (a concern that I share, particularly since it could be an extension module we don't control that ends up doing it), it also gets rid of that extra pointer comparison in PyObject_Hash that was bothering me. >> The patch involves a few changes to fairly deep components in typeobject.c >> though, so I'd like at least some kind of sanity check before I commit it. >> >> Cheers, >> Nick. >> >> [1] http://bugs.python.org/issue2235 > > I can't promise I'll have time to look at this before my EuroPython > keynote, but it's important for me to get it right, so if nobody else > jumps in, remind me Tuesday. I'd now advise waiting until I have a chance to implement your idea anyway :) Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org From collinw at gmail.com Wed Jul 2 18:30:14 2008 From: collinw at gmail.com (Collin Winter) Date: Wed, 2 Jul 2008 09:30:14 -0700 Subject: [Python-Dev] Can someone check my lib2to3 change for fix_imports? 
In-Reply-To: <1afaf6160807011938r44a58880n60e696b25c098617@mail.gmail.com> References: <1afaf6160807011938r44a58880n60e696b25c098617@mail.gmail.com> Message-ID: <43aa6ff70807020930j32ab1564n33ea41b9db487400@mail.gmail.com> On Tue, Jul 1, 2008 at 7:38 PM, Benjamin Peterson wrote: > On Tue, Jul 1, 2008 at 9:04 PM, Brett Cannon wrote: >> I just committed r64651 which is my attempt to add support to >> fix_imports so that modules that have been split up in 3.0 can be >> properly fixed. 2to3's test suite passes and all, but I am not sure if >> I botched it somehow since I did the change slightly blind. Can >> someone just do a quick check to make sure I did it properly? Also, >> what order should renames be declared to give priority to certain >> renames (e.g., urllib should probably be renamed to urllib.request >> over urllib.error when not used in a ``from ... import`` statement). > > Well for starters, you know the test for fix_imports is disabled, right? Why was this test disabled, rather than fixed? That seems a rather poor solution to the problem of it taking longer than desired to run. Collin From musiccomposition at gmail.com Wed Jul 2 18:34:00 2008 From: musiccomposition at gmail.com (Benjamin Peterson) Date: Wed, 2 Jul 2008 11:34:00 -0500 Subject: [Python-Dev] Can someone check my lib2to3 change for fix_imports? In-Reply-To: <43aa6ff70807020930j32ab1564n33ea41b9db487400@mail.gmail.com> References: <1afaf6160807011938r44a58880n60e696b25c098617@mail.gmail.com> <43aa6ff70807020930j32ab1564n33ea41b9db487400@mail.gmail.com> Message-ID: <1afaf6160807020934h37f6e989i772da74b68ced414@mail.gmail.com> On Wed, Jul 2, 2008 at 11:30 AM, Collin Winter wrote: > On Tue, Jul 1, 2008 at 7:38 PM, Benjamin Peterson >> Well for starters, you know the test for fix_imports is disabled, right? > > Why was this test disabled, rather than fixed? That seems a rather > poor solution to the problem of it taking longer than desired to run. I believe Martin was the one who disabled it. 
> > Collin > -- Cheers, Benjamin Peterson "There's no place like 127.0.0.1." From collinw at gmail.com Wed Jul 2 18:36:30 2008 From: collinw at gmail.com (Collin Winter) Date: Wed, 2 Jul 2008 09:36:30 -0700 Subject: [Python-Dev] Can someone check my lib2to3 change for fix_imports? In-Reply-To: References: <1afaf6160807011938r44a58880n60e696b25c098617@mail.gmail.com> Message-ID: <43aa6ff70807020936j6bfa578cm9b260d363f34e5f@mail.gmail.com> On Tue, Jul 1, 2008 at 11:32 PM, Brett Cannon wrote: > On Tue, Jul 1, 2008 at 8:36 PM, Brett Cannon wrote: >> On Tue, Jul 1, 2008 at 7:38 PM, Benjamin Peterson >> wrote: >>> On Tue, Jul 1, 2008 at 9:04 PM, Brett Cannon wrote: >>>> I just committed r64651 which is my attempt to add support to >>>> fix_imports so that modules that have been split up in 3.0 can be >>>> properly fixed. 2to3's test suite passes and all, but I am not sure if >>>> I botched it somehow since I did the change slightly blind. Can >>>> someone just do a quick check to make sure I did it properly? Also, >>>> what order should renames be declared to give priority to certain >>>> renames (e.g., urllib should probably be renamed to urllib.request >>>> over urllib.error when not used in a ``from ... import`` statement). >>> >>> Well for starters, you know the test for fix_imports is disabled, right? >>> >> >> Nope, I forgot and turning it on has it failing running under 2.5. >> > > And refactor.py cannot be run directly from 2.5 because of a relative > import and in 2.6 (where runpy has extra smarts) it still doesn't work > thanks to main() not being passed an argument it needs (Issue3131). Why are you trying to run refactor.py directly, rather than using 2to3 (http://svn.python.org/view/sandbox/trunk/2to3/2to3) as an entry point? > Looks like 2to3 needs some TLC. Agreed. A lot of the pending bugs seem to be related to the version of lib2to3 in the stdlib, rather than the stand-alone product. 
Neal Norwitz and I have been working to turn parts of 2to3 into a more general refactoring library; once that's done (or even preferably before), lib2to3 should be removed from the stdlib. It's causing far more trouble than it's worth. Collin From guido at python.org Wed Jul 2 19:08:01 2008 From: guido at python.org (Guido van Rossum) Date: Wed, 2 Jul 2008 10:08:01 -0700 Subject: [Python-Dev] UCS2/UCS4 default In-Reply-To: <20080702141328.GW62693@nexus.in-nomine.org> References: <20080702141328.GW62693@nexus.in-nomine.org> Message-ID: I think we should continue to leave this up to the distribution. AFAIK many Linux distros already use UCS4 for everything anyway. The alternative (no matter what the configure flag is called) is UTF-16, not UCS-2 though: there is support for surrogate pairs in various places, including the \U escape and the UTF-8 codec. I don't want to rule out UTF-16 as internal representation from the Python language spec, because JVM- and .NET-based implementations pretty much have no choice in the matter if they want to be compatible with the native string type (which is very important for performance and compatibility with other languages on those platforms). For that reason I think it's also better that the configure script continues to default to UTF-16 -- this will give the UTF-16 support code the necessary exercise. (It is mostly a superset of the UCS-4 support code, so I'm less worried about the latter getting enough exercise.) --Guido On Wed, Jul 2, 2008 at 7:13 AM, Jeroen Ruigrok van der Werven wrote: > > Guido (and others of course), > > back in 2001 you pointed out that you wanted to move to UCS4 completely as > the ideal situation > (http://mail.python.org/pipermail/i18n-sig/2001-June/001107.html) over the > current default UCS2. > > Given 3.0 will use Unicode strings as the default, would it also not make > sense to make the switch at this point as well? 
> > The current situation with UCS2 is particularly bad now that the CJK > ideographs Extension B. has been produced (and C is under ballot and D is > under development). > > Personally I use nothing else but UCS4 compiled Python binaries for the past > years. > > See also http://www.python.org/dev/peps/pep-0261/ for background for the > 2001 options. > > -- > Jeroen Ruigrok van der Werven / asmodai > ????? ?????? ??? ?? ?????? > http://www.in-nomine.org/ | http://www.rangaku.org/ | GPG: 2EAC625B > Expansion of happiness is the purpose of life... > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/guido%40python.org -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Wed Jul 2 19:13:45 2008 From: guido at python.org (Guido van Rossum) Date: Wed, 2 Jul 2008 10:13:45 -0700 Subject: [Python-Dev] Who wants to work with Klocwork? Message-ID: I've got an offer from Klocwork (a static source code analysis company, www.klocwork.com) to give some developers free access to their findings from running their bug-finding software over Python source code. I don't have the bandwidth to deal with this myself, but I think it would be valuable if we could get some folks to look at their findings. We have a similar relationship with one of Klocwork's competitors. In my experience, each vendor's tool has a different strength, and it is likely that each will find some important bugs that the other didn't flag. So IMO it's useful to do this with each vendor that offers... 
-- --Guido van Rossum (home page: http://www.python.org/~guido/) From asmodai at in-nomine.org Wed Jul 2 19:19:57 2008 From: asmodai at in-nomine.org (Jeroen Ruigrok van der Werven) Date: Wed, 2 Jul 2008 19:19:57 +0200 Subject: [Python-Dev] UCS2/UCS4 default In-Reply-To: References: <20080702141328.GW62693@nexus.in-nomine.org> Message-ID: <20080702171957.GY62693@nexus.in-nomine.org> -On [20080702 19:08], Guido van Rossum (guido at python.org) wrote: >I think we should continue to leave this up to the distribution. AFAIK >many Linux distros already use UCS4 for everything anyway. FreeBSD's ports makes it a configure option. >For that reason I think it's also better that the configure script >continues to default to UTF-16 -- this will give the UTF-16 support >code the necessary exercise. (It is mostly a superset of the UCS-4 >support code, so I'm less worried about the latter getting enough >exercise.) I was under the impression that it was still UCS2 and thus limiting things to the BMP only. So you are saying it's UTF-16 nowadays? For both 2.6 and 3.0? -- Jeroen Ruigrok van der Werven / asmodai ????? ?????? ??? ?? ?????? http://www.in-nomine.org/ | http://www.rangaku.org/ | GPG: 2EAC625B Nature does nothing uselessly... From guido at python.org Wed Jul 2 19:42:13 2008 From: guido at python.org (Guido van Rossum) Date: Wed, 2 Jul 2008 10:42:13 -0700 Subject: [Python-Dev] UCS2/UCS4 default In-Reply-To: <20080702171957.GY62693@nexus.in-nomine.org> References: <20080702141328.GW62693@nexus.in-nomine.org> <20080702171957.GY62693@nexus.in-nomine.org> Message-ID: On Wed, Jul 2, 2008 at 10:19 AM, Jeroen Ruigrok van der Werven wrote: > -On [20080702 19:08], Guido van Rossum (guido at python.org) wrote: >>I think we should continue to leave this up to the distribution. AFAIK >>many Linux distros already use UCS4 for everything anyway. > > FreeBSD's ports makes it a configure option. 
> >>For that reason I think it's also better that the configure script >>continues to default to UTF-16 -- this will give the UTF-16 support >>code the necessary exercise. (It is mostly a superset of the UCS-4 >>support code, so I'm less worried about the latter getting enough >>exercise.) > > I was under the impression that it was still UCS2 and thus limiting things > to the BMP only. So you are saying it's UTF-16 nowadays? For both 2.6 and > 3.0? Yes. At least in the sense that \Uxxxxxxxx gets translated to a surrogate pair, and that the UTF-8 codec supports surrogate pairs in both directions. It's been like this for a long time. What else would you expect from UTF-16 support? -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Wed Jul 2 20:17:16 2008 From: guido at python.org (Guido van Rossum) Date: Wed, 2 Jul 2008 11:17:16 -0700 Subject: [Python-Dev] Who wants to work with Klocwork? In-Reply-To: References: Message-ID: Followup: Neal Norwitz will coordinate this, send mail to him if you're interested, not to me. :-) On Wed, Jul 2, 2008 at 10:13 AM, Guido van Rossum wrote: > I've got an offer from Klocwork (a static source code analysis > company, www.klocwork.com) to give some developers free access to > their findings from running their bug-finding software over Python > source code. I don't have the bandwidth to deal with this myself, but > I think it would be valuable if we could get some folks to look at > their findings. > > We have a similar relationship with one of Klocwork's competitors. In > my experience, each vendor's tool has a different strength, and it is > likely that each will find some important bugs that the other didn't > flag. So IMO it's useful to do this with each vendor that offers... 
> > -- > --Guido van Rossum (home page: http://www.python.org/~guido/) > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From asmodai at in-nomine.org Wed Jul 2 20:22:15 2008 From: asmodai at in-nomine.org (Jeroen Ruigrok van der Werven) Date: Wed, 2 Jul 2008 20:22:15 +0200 Subject: [Python-Dev] UCS2/UCS4 default In-Reply-To: References: <20080702141328.GW62693@nexus.in-nomine.org> <20080702171957.GY62693@nexus.in-nomine.org> Message-ID: <20080702182215.GA62693@nexus.in-nomine.org> -On [20080702 19:42], Guido van Rossum (guido at python.org) wrote: >Yes. At least in the sense that \Uxxxxxxxx gets translated to a >surrogate pair, and that the UTF-8 codec supports surrogate pairs in >both directions. It's been like this for a long time. What else would >you expect from UTF-16 support? Well, unless I misunderstand things, a Python 3 compiled with the default Unicode option gives this: >>> len("\N{MUSICAL SYMBOL G CLEF}") 2 Whereas a Python 3 with --with-wide-unicode gives: >>> len("\N{MUSICAL SYMBOL G CLEF}") 1 This, of course, causes problems with splitting, finding, and so on. So that means that a Python 3 with only 2 byte Unicode support is not to be used/recommended for Unicode outside of the BMP. -- Jeroen Ruigrok van der Werven / asmodai ????? ?????? ??? ?? ?????? http://www.in-nomine.org/ | http://www.rangaku.org/ | GPG: 2EAC625B Tomorrow's battle is won during today's practice... From guido at python.org Wed Jul 2 20:27:42 2008 From: guido at python.org (Guido van Rossum) Date: Wed, 2 Jul 2008 11:27:42 -0700 Subject: [Python-Dev] UCS2/UCS4 default In-Reply-To: <20080702182215.GA62693@nexus.in-nomine.org> References: <20080702141328.GW62693@nexus.in-nomine.org> <20080702171957.GY62693@nexus.in-nomine.org> <20080702182215.GA62693@nexus.in-nomine.org> Message-ID: On Wed, Jul 2, 2008 at 11:22 AM, Jeroen Ruigrok van der Werven wrote: > -On [20080702 19:42], Guido van Rossum (guido at python.org) wrote: >>Yes. 
At least in the sense that \Uxxxxxxxx gets translated to a >>surrogate pair, and that the UTF-8 codec supports surrogate pairs in >>both directions. It's been like this for a long time. What else would >>you expect from UTF-16 support? > > Well, unless I misunderstand things, a Python 3 compiled with the default > Unicode option gives this: > >>>> len("\N{MUSICAL SYMBOL G CLEF}") > 2 > > Whereas a Python 3 with --with-wide-unicode gives: > > >>>> len("\N{MUSICAL SYMBOL G CLEF}") > 1 > > This, of course, causes problems with splitting, finding, and so on. Understood. > So that > means that a Python 3 with only 2 byte Unicode support is not to be > used/recommended for Unicode outside of the BMP. I disagree. Instead, I would say that such code needs to be aware of surrogates. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From asmodai at in-nomine.org Wed Jul 2 20:35:41 2008 From: asmodai at in-nomine.org (Jeroen Ruigrok van der Werven) Date: Wed, 2 Jul 2008 20:35:41 +0200 Subject: [Python-Dev] UCS2/UCS4 default In-Reply-To: References: <20080702141328.GW62693@nexus.in-nomine.org> <20080702171957.GY62693@nexus.in-nomine.org> <20080702182215.GA62693@nexus.in-nomine.org> Message-ID: <20080702183541.GB62693@nexus.in-nomine.org> -On [20080702 20:27], Guido van Rossum (guido at python.org) wrote: >I disagree. Instead, I would say that such code needs to be aware of >surrogates. Just to make sure I understood you: Python's code needs to be made aware of surrogates? If so, do you want me to log issues for the things encountered? -- Jeroen Ruigrok van der Werven / asmodai ????? ?????? ??? ?? ?????? http://www.in-nomine.org/ | http://www.rangaku.org/ | GPG: 2EAC625B Learn from the past -- don't wear it like a yoke around your neck... 
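The len() difference between narrow and wide builds discussed in the messages above can be reproduced on a current interpreter. What follows is a minimal sketch, not code from the thread or from CPython. It assumes a modern Python 3, where str always indexes by code point, and simulates the 16-bit view by encoding to UTF-16. The helper names utf16_code_units and count_code_points are hypothetical and exist only for this illustration.

```python
import struct


def utf16_code_units(text):
    """Return the 16-bit code units a narrow (UTF-16) build would store."""
    data = text.encode("utf-16-le")
    return list(struct.unpack("<%dH" % (len(data) // 2), data))


def count_code_points(units):
    """Count code points, treating a high+low surrogate pair as one item."""
    count = 0
    i = 0
    while i < len(units):
        is_high = 0xD800 <= units[i] <= 0xDBFF
        has_low = i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF
        # A well-formed pair consumes two units; anything else consumes one.
        i += 2 if (is_high and has_low) else 1
        count += 1
    return count


clef = utf16_code_units("\N{MUSICAL SYMBOL G CLEF}")
print(len(clef))                # length as a narrow build would report it
print(count_code_points(clef))  # length when surrogate pairs are merged
```

This is the kind of surrogate awareness an application would need if it wanted code-point counts from 16-bit storage; whether the interpreter itself should do this is exactly what the thread is debating.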
From guido at python.org Wed Jul 2 20:47:02 2008 From: guido at python.org (Guido van Rossum) Date: Wed, 2 Jul 2008 11:47:02 -0700 Subject: [Python-Dev] UCS2/UCS4 default In-Reply-To: <20080702183541.GB62693@nexus.in-nomine.org> References: <20080702141328.GW62693@nexus.in-nomine.org> <20080702171957.GY62693@nexus.in-nomine.org> <20080702182215.GA62693@nexus.in-nomine.org> <20080702183541.GB62693@nexus.in-nomine.org> Message-ID: On Wed, Jul 2, 2008 at 11:35 AM, Jeroen Ruigrok van der Werven wrote: > -On [20080702 20:27], Guido van Rossum (guido at python.org) wrote: >>I disagree. Instead, I would say that such code needs to be aware of >>surrogates. > > Just to make sure I understood you: > > Python's code needs to be made aware of surrogates? No, Python already is aware of surrogates. I meant applications processing non-BMP text should beware of them. > If so, do you want me to log issues for the things encountered? If you find places where the Python core or standard library is doing Unicode processing that would break when surrogates are present you should file a bug. However this does not mean that every bit of code that slices a string at an arbitrary point (and hence risks slicing in the middle of a surrogate) is incorrect -- it all depends on what is done next with the slice. I'd also prefer to receive bug reports about breakages actually encountered in the wild than purely theoretical issues. And in all cases a fragment of test code to reproduce the problem would be appreciated. > -- > Jeroen Ruigrok van der Werven / asmodai > ????? ?????? ??? ?? ?????? > http://www.in-nomine.org/ | http://www.rangaku.org/ | GPG: 2EAC625B > Learn from the past -- don't wear it like a yoke around your neck... > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From brett at python.org Wed Jul 2 21:02:46 2008 From: brett at python.org (Brett Cannon) Date: Wed, 2 Jul 2008 12:02:46 -0700 Subject: [Python-Dev] Can someone check my lib2to3 change for fix_imports? 
In-Reply-To: <43aa6ff70807020930j32ab1564n33ea41b9db487400@mail.gmail.com> References: <1afaf6160807011938r44a58880n60e696b25c098617@mail.gmail.com> <43aa6ff70807020930j32ab1564n33ea41b9db487400@mail.gmail.com> Message-ID: On Wed, Jul 2, 2008 at 9:30 AM, Collin Winter wrote: > On Tue, Jul 1, 2008 at 7:38 PM, Benjamin Peterson > wrote: >> On Tue, Jul 1, 2008 at 9:04 PM, Brett Cannon wrote: >>> I just committed r64651 which is my attempt to add support to >>> fix_imports so that modules that have been split up in 3.0 can be >>> properly fixed. 2to3's test suite passes and all, but I am not sure if >>> I botched it somehow since I did the change slightly blind. Can >>> someone just do a quick check to make sure I did it properly? Also, >>> what order should renames be declared to give priority to certain >>> renames (e.g., urllib should probably be renamed to urllib.request >>> over urllib.error when not used in a ``from ... import`` statement). >> >> Well for starters, you know the test for fix_imports is disabled, right? > > Why was this test disabled, rather than fixed? That seems a rather > poor solution to the problem of it taking longer than desired to run. I think it may have been to turn off a failing test just before a release and it was just never switched back on. -Brett From brett at python.org Wed Jul 2 21:06:01 2008 From: brett at python.org (Brett Cannon) Date: Wed, 2 Jul 2008 12:06:01 -0700 Subject: [Python-Dev] Can someone check my lib2to3 change for fix_imports? 
In-Reply-To: <43aa6ff70807020936j6bfa578cm9b260d363f34e5f@mail.gmail.com> References: <1afaf6160807011938r44a58880n60e696b25c098617@mail.gmail.com> <43aa6ff70807020936j6bfa578cm9b260d363f34e5f@mail.gmail.com> Message-ID: On Wed, Jul 2, 2008 at 9:36 AM, Collin Winter wrote: > On Tue, Jul 1, 2008 at 11:32 PM, Brett Cannon wrote: >> On Tue, Jul 1, 2008 at 8:36 PM, Brett Cannon wrote: >>> On Tue, Jul 1, 2008 at 7:38 PM, Benjamin Peterson >>> wrote: >>>> On Tue, Jul 1, 2008 at 9:04 PM, Brett Cannon wrote: >>>>> I just committed r64651 which is my attempt to add support to >>>>> fix_imports so that modules that have been split up in 3.0 can be >>>>> properly fixed. 2to3's test suite passes and all, but I am not sure if >>>>> I botched it somehow since I did the change slightly blind. Can >>>>> someone just do a quick check to make sure I did it properly? Also, >>>>> what order should renames be declared to give priority to certain >>>>> renames (e.g., urllib should probably be renamed to urllib.request >>>>> over urllib.error when not used in a ``from ... import`` statement). >>>> >>>> Well for starters, you know the test for fix_imports is disabled, right? >>>> >>> >>> Nope, I forgot and turning it on has it failing running under 2.5. >>> >> >> And refactor.py cannot be run directly from 2.5 because of a relative >> import and in 2.6 (where runpy has extra smarts) it still doesn't work >> thanks to main() not being passed an argument it needs (Issue3131). > > Why are you trying to run refactor.py directly, rather than using 2to3 > (http://svn.python.org/view/sandbox/trunk/2to3/2to3) as an entry > point? > Because I honestly did not see it yesterday in my terminal. I blame it on Canada Day. =) >> Looks like 2to3 needs some TLC. > > Agreed. A lot of the pending bugs seem to be related to the version of > lib2to3 in the stdlib, rather than the stand-alone product. 
Neal > Norwitz and I have been working to turn parts of 2to3 into a more > general refactoring library; once that's done (or even preferably > before), lib2to3 should be removed from the stdlib. It's causing far > more trouble than it's worth. Fine by me. -Brett From martin at v.loewis.de Wed Jul 2 21:51:49 2008 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Wed, 02 Jul 2008 21:51:49 +0200 Subject: [Python-Dev] Can someone check my lib2to3 change for fix_imports? In-Reply-To: <43aa6ff70807020930j32ab1564n33ea41b9db487400@mail.gmail.com> References: <1afaf6160807011938r44a58880n60e696b25c098617@mail.gmail.com> <43aa6ff70807020930j32ab1564n33ea41b9db487400@mail.gmail.com> Message-ID: <486BDC55.6070506@v.loewis.de> > Why was this test disabled, rather than fixed? That seems a rather > poor solution to the problem of it taking longer than desired to run. I disabled it because I didn't know how to fix it, and created bug reports 2968 and 2969 in return. It is policy that tests that break get disabled, rather than keeping them broken. Regards, Martin From martin at v.loewis.de Wed Jul 2 22:09:57 2008 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Wed, 02 Jul 2008 22:09:57 +0200 Subject: [Python-Dev] Can someone check my lib2to3 change for fix_imports? In-Reply-To: <43aa6ff70807020936j6bfa578cm9b260d363f34e5f@mail.gmail.com> References: <1afaf6160807011938r44a58880n60e696b25c098617@mail.gmail.com> <43aa6ff70807020936j6bfa578cm9b260d363f34e5f@mail.gmail.com> Message-ID: <486BE095.2030801@v.loewis.de> > Agreed. A lot of the pending bugs seem to be related to the version of > lib2to3 in the stdlib, rather than the stand-alone product. Neal > Norwitz and I have been working to turn parts of 2to3 into a more > general refactoring library; once that's done (or even preferably > before), lib2to3 should be removed from the stdlib. It's causing far > more trouble than it's worth. I disagree. 
I think it is quite useful that distutils is able to invoke it, and other people also asked for that feature on PyCon. Why do you think the trouble wouldn't be caused if it wasn't a standard library package? Regards, Martin From collinw at gmail.com Wed Jul 2 22:39:58 2008 From: collinw at gmail.com (Collin Winter) Date: Wed, 2 Jul 2008 13:39:58 -0700 Subject: [Python-Dev] Can someone check my lib2to3 change for fix_imports? In-Reply-To: <486BDC55.6070506@v.loewis.de> References: <1afaf6160807011938r44a58880n60e696b25c098617@mail.gmail.com> <43aa6ff70807020930j32ab1564n33ea41b9db487400@mail.gmail.com> <486BDC55.6070506@v.loewis.de> Message-ID: <43aa6ff70807021339j56d81522hbbfcb754fa027150@mail.gmail.com> On Wed, Jul 2, 2008 at 12:51 PM, "Martin v. Löwis" wrote: >> Why was this test disabled, rather than fixed? That seems a rather >> poor solution to the problem of it taking longer than desired to run. > > I disabled it because I didn't know how to fix it, and created bug > reports 2968 and 2969 in return. So you did. I didn't notice them, sorry. > It is policy that tests that break > get disabled, rather than keeping them broken. From collinw at gmail.com Wed Jul 2 22:40:42 2008 From: collinw at gmail.com (Collin Winter) Date: Wed, 2 Jul 2008 13:40:42 -0700 Subject: [Python-Dev] Can someone check my lib2to3 change for fix_imports? In-Reply-To: <486BE095.2030801@v.loewis.de> References: <1afaf6160807011938r44a58880n60e696b25c098617@mail.gmail.com> <43aa6ff70807020936j6bfa578cm9b260d363f34e5f@mail.gmail.com> <486BE095.2030801@v.loewis.de> Message-ID: <43aa6ff70807021340w76804278rd0b90c31a8175faa@mail.gmail.com> On Wed, Jul 2, 2008 at 1:09 PM, "Martin v. Löwis" wrote: >> Agreed. A lot of the pending bugs seem to be related to the version of >> lib2to3 in the stdlib, rather than the stand-alone product. 
Neal >> Norwitz and I have been working to turn parts of 2to3 into a more >> general refactoring library; once that's done (or even preferably >> before), lib2to3 should be removed from the stdlib. It's causing far >> more trouble than it's worth. > > I disagree. I think it is quite useful that distutils is able to > invoke it, and other people also asked for that feature on PyCon. But distutils currently *doesn't* invoke it, AFAICT (unless that support is implemented somewhere other than trunk/Lib/distutils/), and no-one has stepped up to make that happen in the months since PyCon. Moreover, as I told those people who asked for this at PyCon, 2to3 is not and never will be perfect, meaning that at best, distutils/2to3 integration would look like "python setup.py run2to3", where distutils is just a long-hand way of running 2to3 over your code. This strikes me as a waste of time. > Why do you think the trouble wouldn't be caused if it wasn't > a standard library package? Problems with the current setup:

1) People are currently confused as to where they should commit fixes.
2) Changes to the sandbox version have to be manually merged into the stdlib version, which is more overhead than I think it's worth. In addition, the stdlib version lags the sandbox version.
3) At least one bug report (issue3131) has mentioned the stdlib 2to3 exhibiting problems that the stand-alone version does not. This is again extra overhead.
4) The test_imports test was commented out because of stdlib test policy. I'd rather not have that policy imposed on 2to3.

Collin From martin at v.loewis.de Thu Jul 3 00:36:59 2008 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Thu, 03 Jul 2008 00:36:59 +0200 Subject: [Python-Dev] Can someone check my lib2to3 change for fix_imports? 
In-Reply-To: <43aa6ff70807021340w76804278rd0b90c31a8175faa@mail.gmail.com> References: <1afaf6160807011938r44a58880n60e696b25c098617@mail.gmail.com> <43aa6ff70807020936j6bfa578cm9b260d363f34e5f@mail.gmail.com> <486BE095.2030801@v.loewis.de> <43aa6ff70807021340w76804278rd0b90c31a8175faa@mail.gmail.com> Message-ID: <486C030B.6060405@v.loewis.de> > But distutils currently *doesn't* invoke it, AFAICT Sure. In 3k, look at Lib/distutils/command/build_py.py:build_py_2to3. That's how I ported Django to Py3k. > 1) People are currently confused as to where they should commit fixes. Sure, but it only happens rarely. > 2) Changes to the sandbox version have to be manually merged into the > stdlib version, which is more overhead than I think it's worth. In > addition, the stdlib version lags the sandbox version. It's not a real problem, IMO, using svnmerge is fairly straight-forward. > 3) At least one bug report (issue3131) has mentioned problems with the > stdlib 2to3 exhibiting problems that the stand-alone version does not. > This is again extra overhead. I think the 2to3 packaging issue is otherwise unresolved. Do you want 2to3 to be excluded completely from 2.6 and 3.0 releases? If not, how do you want them packaged? Will it work if packaged in that way? > 4) The test_imports test was commented out because of stdlib test > policy. I'd rather not have that policy imposed on 2to3. It would be possible to comment out the test only in the copy in the stdlib version, or to omit testing 2to3 in the stdlib altogether, if that helps. 
Regards, Martin From asmodai at in-nomine.org Thu Jul 3 12:48:13 2008 From: asmodai at in-nomine.org (Jeroen Ruigrok van der Werven) Date: Thu, 3 Jul 2008 12:48:13 +0200 Subject: [Python-Dev] UCS2/UCS4 default In-Reply-To: References: <20080702141328.GW62693@nexus.in-nomine.org> <20080702171957.GY62693@nexus.in-nomine.org> <20080702182215.GA62693@nexus.in-nomine.org> <20080702183541.GB62693@nexus.in-nomine.org> Message-ID: <20080703104813.GF62693@nexus.in-nomine.org> My apologies for hammering on this, but I think it is quite important and currently Python 3.0 seems confused about UCS-2 versus UTF-16. -On [20080702 20:47], Guido van Rossum (guido at python.org) wrote: >No, Python already is aware of surrogates. I meant applications >processing non-BMP text should beware of them. Just to make sure people are fully aware of the distinctions: UCS-2 uses 16 bits to encode Unicode data, does NOT support surrogate pairs and therefore CANNOT represent data beyond U+FFFF (thus only supporting the Basic Multilingual Plane, BMP). It is a fixed-length character encoding. UTF-16 also uses 16 bits to encode Unicode data, but DOES support surrogate pairs and therefore CAN represent data beyond U+FFFF by using said surrogate pairs (thus supporting all planes). It is a variable-length character encoding. So a string representation in UCS-2 means every character occupies 16 bits. A string representation in UTF-16 means characters can occupy 16 bits or 32-bits. If one stays within the BMP than all is well, but when you move beyond the BMP (U+10000 - U+10FFFF) then Python needs to correctly check the string for surrogate pairs and deal with them internally. >If you find places where the Python core or standard library is doing >Unicode processing that would break when surrogates are present you >should file a bug. 
>However this does not mean that every bit of code
>that slices a string at an arbitrary point (and hence risks slicing in
>the middle of a surrogate) is incorrect -- it all depends on what is
>done next with the slice.

Basically everything but string forming or string printing seems to be
broken for surrogate pairs, from what I can tell. Also, I think you are
confused about slicing in the middle of a surrogate pair: from a UTF-16
perspective this is 1 codepoint! And as such Python needs to treat it as one
character/codepoint in a string, dealing with slicing as appropriate. The
way you currently describe it is that UTF-16 strings will be treated as
UCS-2 when it comes to slicing and the like. From a UTF-16 point of view
such slicing can NEVER occur unless you are bit or byte slicing instead of
character/codepoint slicing.

The documentation for len() says:

Return the length (the number of items) of an object.

I think it can be fairly said that an item in a string is a character or
codepoint. Take for example the following string:

a = '\U00020045\u942a' # Two hanzi/kanji/hanja

From a Unicode perspective we are looking at two characters/codepoints.
When we use a 4-byte Python 3.0 binary we get (as expected):

>>> len(a)
2

When we use a 2-byte Python 3.0 binary (the default) we get (not as
expected):

>>> len(a)
3

From a UTF-16 perspective a surrogate pair is one character/codepoint and as
such len() should have reported 2 as well. That the sequence is stored
internally as 0xd840 0xdc45 0x942a and occupies three 16-bit code units is
not interesting. But it seems as if len() is treating the string as being in
UCS-2 (fixed-length), which is the only logical explanation for the number
3, instead of treating it as UTF-16 (variable-length) and reporting the
number 2.

Subsequently doing a: print(a[1]) to get the 0x942a character actually
requires a[2] on the 2-byte Python 3.0.
As such the code you write for 2-byte and 4-byte Python 3.0 is *different*
when you have to deal with the same Unicode strings! This cannot be the
desired situation, can it?

Two more examples:

>>> a.find('\u942a') # 4-byte
1
>>> a.find('\u942a') # 2-byte
2

>>> import re # 4-byte
>>> m = re.search('\u942a', a)
>>> m.start()
1
>>> import re # 2-byte
>>> m = re.search('\u942a', a)
>>> m.start()
2

This, in my opinion, has nothing to do with the application writers, but
more with Python's internals being confused about UCS-2 and UTF-16.

We accept full 32-bit codepoints with the \U escape in strings, and we may
even store it as UTF-16 internally, but we clearly do not deal with it
properly as UTF-16, but rather as UCS-2, when it comes to using said strings
with core functions and modules.

--
Jeroen Ruigrok van der Werven / asmodai
http://www.in-nomine.org/ | http://www.rangaku.org/ | GPG: 2EAC625B
For wouldst thou not carve at my Soul with thine sword of Supreme Truth?

From solipsis at pitrou.net Thu Jul 3 13:58:17 2008
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Thu, 3 Jul 2008 11:58:17 +0000 (UTC)
Subject: [Python-Dev] UCS2/UCS4 default
References: <20080702141328.GW62693@nexus.in-nomine.org>
 <20080702171957.GY62693@nexus.in-nomine.org>
 <20080702182215.GA62693@nexus.in-nomine.org>
 <20080702183541.GB62693@nexus.in-nomine.org>
 <20080703104813.GF62693@nexus.in-nomine.org>
Message-ID:

Hi,

> Subsequently doing a: print a[1] to get the 0x942a character actually
> requires a[2] on the 2-byte Python 3.0.

How is it annoying *in practice*? In actual code the index, instead of being
a constant, will be retrieved through various means such as .find() or
re.search().start()... as you show yourself later in your message.

What is essential is that Python shows a consistent behaviour, and it does,
since indices returned by .find() et al. have the same meaning as indices
you can use with the [] operator.
AFAIK that's why Guido asked for real-world rather than theoretical examples. Regards Antoine. From ncoghlan at gmail.com Thu Jul 3 14:39:29 2008 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 03 Jul 2008 22:39:29 +1000 Subject: [Python-Dev] UCS2/UCS4 default In-Reply-To: <20080703104813.GF62693@nexus.in-nomine.org> References: <20080702141328.GW62693@nexus.in-nomine.org> <20080702171957.GY62693@nexus.in-nomine.org> <20080702182215.GA62693@nexus.in-nomine.org> <20080702183541.GB62693@nexus.in-nomine.org> <20080703104813.GF62693@nexus.in-nomine.org> Message-ID: <486CC881.5090902@gmail.com> Jeroen Ruigrok van der Werven wrote: > The documentation for len() says: > Return the length (the number of items) of an object. So what this tells us is that in a UCS-2 build of Python, the "items" in a unicode string are not, strictly speaking, Unicode code points or characters. Instead, they are successive 16-bit fragments of a UTF-16 encoded string (which correspond to characters only if there are no surrogate pairs present in the string). Let's look at the options here: 1. System is NOT memory limited (i.e. most desktops): use a UCS-4 Python build, which is what most Linux distributions do (I'm not sure about the pydotorg provided Windows or Mac OS X builds). 2. System is memory limited, only BMP Unicode code points are used: use a UCS-2 Python build, limit yourself to characters on the BMP (possibly enforced by use of an appropriate codec to decode input text). 3. System is memory limited, but needs to support characters beyond the BMP: use a UCS-2 Python build, handling any codepoints outside the BMP in application code. The current Python approach handles all three cases relatively gracefully and with minimal overhead. Dealing natively with surrogate pair issues could easily result in pointless complexity for cases 1 and 2, while completely disallowing codepoints beyond the BMP in a UCS-2 build would needlessly rule out option 3. So here's the challenge: 1. 
If you are advocating disallowing the use of characters outside the BMP in a
UCS-2 build, enumerate the advantages of doing so (paying particular
attention to any advantages which cannot be obtained simply by using an
appropriate codec that disallows non-BMP characters).

2. If you are advocating making the "items" in a Unicode string code points
even in a UCS-2 build, enumerate all of the string behaviours that would
have to change, as well as indicating how to avoid causing a reduction in
speed for cases 1 and 2 above.

Sure, option 2 might be nice to have, but the purity argument isn't going to
be anywhere near enough motivation to justify the additional code
complexity - there need to be practical benefits that aren't better met just
by sacrificing a bit of memory efficiency and switching to a UCS-4 build.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia
---------------------------------------------------------------
http://www.boredomandlaziness.org

From mal at egenix.com Thu Jul 3 15:00:22 2008
From: mal at egenix.com (M.-A. Lemburg)
Date: Thu, 03 Jul 2008 15:00:22 +0200
Subject: [Python-Dev] UCS2/UCS4 default
In-Reply-To: <486CC881.5090902@gmail.com>
References: <20080702141328.GW62693@nexus.in-nomine.org>
 <20080702171957.GY62693@nexus.in-nomine.org>
 <20080702182215.GA62693@nexus.in-nomine.org>
 <20080702183541.GB62693@nexus.in-nomine.org>
 <20080703104813.GF62693@nexus.in-nomine.org>
 <486CC881.5090902@gmail.com>
Message-ID: <486CCD66.70906@egenix.com>

I think the discussion is going in the wrong direction:

The choice between UCS2 and UCS4 builds is really only meant to enhance the
possibility to interface to native OS or application APIs, e.g. Windows LIBC
and Java use UTF-16, glibc on Unix uses UCS4.

The problem of slicing Unicode objects is far more complicated than just
breaking a surrogate pair. Unicode is full of combining code points - if you
break such a sequence, the output will be just as wrong; regardless of UCS2
vs. UCS4.
A long time ago we had a discussion about these problems. I had suggested a
new module (unicodeindex IIRC) which takes care of indexing Unicode strings
based on code points (with support for surrogates), glyphs (taking combining
code points into account) and words (with support for various
breaking/non-breaking separation code points).

Trying to solve such issues at the storage level is the wrong approach,
since the problem is application specific and thus requires a higher-level
set of possible solutions.

--
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source (#1, Jul 03 2008)
>>> Python/Zope Consulting and Support ... http://www.egenix.com/
>>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/
________________________________________________________________________
2008-07-07: EuroPython 2008, Vilnius, Lithuania 3 days to go

:::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,MacOSX for free ! ::::

eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48
D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
Registered at Amtsgericht Duesseldorf: HRB 46611

From djarb at highenergymagic.org Thu Jul 3 15:14:47 2008
From: djarb at highenergymagic.org (Daniel Arbuckle)
Date: Thu, 3 Jul 2008 06:14:47 -0700
Subject: [Python-Dev] UCS2/UCS4 default
In-Reply-To: <486CC881.5090902@gmail.com>
References: <20080702141328.GW62693@nexus.in-nomine.org>
 <20080702171957.GY62693@nexus.in-nomine.org>
 <20080702182215.GA62693@nexus.in-nomine.org>
 <20080702183541.GB62693@nexus.in-nomine.org>
 <20080703104813.GF62693@nexus.in-nomine.org>
 <486CC881.5090902@gmail.com>
Message-ID:

On Thu, Jul 3, 2008 at 5:39 AM, Nick Coghlan wrote:
> 1. If you are advocating disallowing the use of characters outside the BMP
> in a UCS-2 build, enumerate the advantages of doing so (paying particular
> attention to any advantages which cannot be obtained simply by using an
> appropriate codec that disallows non-BMP characters).

Right now, the same Python code has different meaning, depending on a
compile-time option that most users didn't even set for themselves.
Moreover, the errors caused by this semantic difference are not reported.
There's just no way to justify that.

You can't solve this problem by saying 'programmers should choose a codec
that limits them to the BMP when they target 2-byte Python,' because the
problem specifically arises when code that works correctly in a 4-byte
Python is placed into a 2-byte Python, an operation performed by the users
rather than by programmers.

Since 2-byte Python is apparently a holdover for memory-limited (and
presumably CPU-limited as well) systems, it doesn't make sense to impose on
it the requirement of correctly dealing with surrogate pairs. Given that, it
seems to me that the best solution would be to make 4-byte Python the
default, and also to make 2-byte Python raise an exception when it
encounters characters outside the BMP. This way, a mysterious and unreported
semantic error becomes an explicit syntactic error.

For programmers who want to target a 2-byte format (for win32 compatibility,
for example), the correct choice of codec is a superior solution to forcing
a 2-byte internal representation on Python.
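
A minimal sketch of the "raise an exception on non-BMP characters" idea from
the message above. The helper name require_bmp is hypothetical and is not
part of any Python release; the sketch also assumes that each item of a str
is a full code point (i.e. it is not running on a narrow build), so it only
illustrates the proposed behaviour, not how an interpreter would enforce it.

```python
def require_bmp(text):
    """Return text unchanged if every code point lies in the BMP.

    Hypothetical helper for illustration only. It assumes each item of
    `text` is a full code point, and raises ValueError for anything
    above U+FFFF instead of silently accepting it.
    """
    for index, char in enumerate(text):
        if ord(char) > 0xFFFF:
            raise ValueError(
                "non-BMP character U+%06X at index %d" % (ord(char), index)
            )
    return text


# BMP-only text passes through unchanged.
bmp_only = require_bmp("\u942a")

# Text containing a supplementary character is rejected explicitly.
try:
    require_bmp("\U00020045\u942a")
except ValueError as exc:
    message = str(exc)
```

Whether such a check belongs in a codec, at input boundaries, or in the
interpreter itself is exactly the point under discussion in this thread.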
From asmodai at in-nomine.org Thu Jul 3 15:21:46 2008
From: asmodai at in-nomine.org (Jeroen Ruigrok van der Werven)
Date: Thu, 3 Jul 2008 15:21:46 +0200
Subject: [Python-Dev] UCS2/UCS4 default
In-Reply-To: <486CCD66.70906@egenix.com>
References: <20080702171957.GY62693@nexus.in-nomine.org>
 <20080702182215.GA62693@nexus.in-nomine.org>
 <20080702183541.GB62693@nexus.in-nomine.org>
 <20080703104813.GF62693@nexus.in-nomine.org>
 <486CC881.5090902@gmail.com>
 <486CCD66.70906@egenix.com>
Message-ID: <20080703132146.GI62693@nexus.in-nomine.org>

-On [20080703 15:00], M.-A. Lemburg (mal at egenix.com) wrote:
>Unicode is full of combining code points - if you break such a sequence,
>the output will be just as wrong; regardless of UCS2 vs. UCS4.

In my opinion you are confusing two related, but very separate things here.
Combining characters have nothing to do with breaking up the encoding of a
single codepoint. Sure enough, if you arbitrarily slice up codepoints that
consist of combining characters then your result is indeed odd looking.

I never said that nor is that the point I am making.

Guido points out that Python supports surrogate pairs and says that if
Python is dealing wrongly with this in the core then it needs to be fixed.
I am pointing out that given the fact we allow surrogate pairs we deal
rather simplistically with them in the core. In fact, we do not consider
them at all. In essence: though we may accept full 21-bit codepoints in the
form of \U00000000 escape sequences and store them internally as UTF-16
(which I still need to verify) we subsequently deal with them
programmatically as UCS-2, which is plain silly.

You either commit yourself fully to UTF-16 and surrogate pairs or not. Not
some form in-between, because that will ultimately lead to more confusion
due to the difference in results when dealing with Unicode.

--
Jeroen Ruigrok van der Werven / asmodai
http://www.in-nomine.org/ | http://www.rangaku.org/ | GPG: 2EAC625B Believe in Angels... From mhammond at skippinet.com.au Thu Jul 3 15:42:58 2008 From: mhammond at skippinet.com.au (Mark Hammond) Date: Thu, 3 Jul 2008 23:42:58 +1000 Subject: [Python-Dev] UCS2/UCS4 default In-Reply-To: References: <20080702141328.GW62693@nexus.in-nomine.org> <20080702171957.GY62693@nexus.in-nomine.org> <20080702182215.GA62693@nexus.in-nomine.org> <20080702183541.GB62693@nexus.in-nomine.org> <20080703104813.GF62693@nexus.in-nomine.org> <486CC881.5090902@gmail.com> Message-ID: <031901c8dd12$b7258b60$2570a220$@com.au> > For programmers who want to target a 2-byte format (for win32 > compatibility, for example) As MAL said, this is taking the discussion in the wrong direction. For people on Windows, win32 isn't a "compatibility" consideration. I suspect most users of the other platforms MAL mentioned and all others with their own native unicode implementations would agree. Cheers, Mark From mal at egenix.com Thu Jul 3 15:57:41 2008 From: mal at egenix.com (M.-A. Lemburg) Date: Thu, 03 Jul 2008 15:57:41 +0200 Subject: [Python-Dev] UCS2/UCS4 default In-Reply-To: <20080703132146.GI62693@nexus.in-nomine.org> References: <20080702171957.GY62693@nexus.in-nomine.org> <20080702182215.GA62693@nexus.in-nomine.org> <20080702183541.GB62693@nexus.in-nomine.org> <20080703104813.GF62693@nexus.in-nomine.org> <486CC881.5090902@gmail.com> <486CCD66.70906@egenix.com> <20080703132146.GI62693@nexus.in-nomine.org> Message-ID: <486CDAD5.1060506@egenix.com> On 2008-07-03 15:21, Jeroen Ruigrok van der Werven wrote: > -On [20080703 15:00], M.-A. Lemburg (mal at egenix.com) wrote: >> Unicode if full of combining code points - if you break such a sequence, >> the output will be just as wrong; regardless of UCS2 vs. UCS4. > > In my opinion you are confusing two related, but very separated things here. > Combining characters have nothing to do with breaking up the encoding of a > single codepoint. 
> Sure enough, if you arbitrarily slice up codepoints that
> consist of combining characters then your result is indeed odd looking.
>
> I never said that nor is that the point I am making.

Please remember that lone surrogate pair code points are perfectly valid
Unicode code points, nevertheless. Just as a lone combining code point is
valid on its own.

> Guido points out that Python supports surrogate pairs and says that if
> Python is dealing wrongly with this in the core then it needs to be fixed.
> I am pointing out that given the fact we allow surrogate pairs we deal
> rather simplistically with them in the core. In fact, we do not consider
> them at all. In essence: though we may accept full 21-bit codepoints in
> the form of \U00000000 escape sequences and store them internally as
> UTF-16 (which I still need to verify) we subsequently deal with them
> programmatically as UCS-2, which is plain silly.

Python applies conversion from non-BMP code points to surrogates for UCS2
builds in a few places and I agree that we should probably do that at a few
more places.

However, these are mainly conversion issues of encoded Unicode
representations vs. the internal Unicode storage where you want to avoid
exceptions in favor of finding a solution that preserves data.

To make it clear: UCS2 builds of Python do not support non-BMP code points
out of the box. A programmer will always have to use a codec to map the
internal storage on these builds to the full Unicode code point range. The
following codecs support surrogates on UCS2 builds:

 * UTF-8
 * UTF-16
 * UTF-32
 * unicode-escape
 * raw-unicode-escape

> You either commit yourself fully to UTF-16 and surrogate pairs or not. Not
> some form in-between, because that will ultimately lead to more confusion
> due to the difference in results when dealing with Unicode.

Programmers will have to be aware of the fact that on UCS2 builds of Python
non-BMP code points will have to be treated differently than on UCS4 builds.
I don't see that as a problem. It is in a way similar to 32-bit vs. 64-bit
builds of Python or the fact that floating point numbers work differently
depending on the Python platform or compiler being used.

BTW: Have you ever run into any problems with UCS2 vs. UCS4 in practice that
were not easy to solve?

--
Marc-Andre Lemburg
eGenix.com

From guido at python.org Thu Jul 3 15:58:26 2008
From: guido at python.org (Guido van Rossum)
Date: Thu, 3 Jul 2008 06:58:26 -0700
Subject: [Python-Dev] UCS2/UCS4 default
In-Reply-To: <20080703104813.GF62693@nexus.in-nomine.org>
References: <20080702141328.GW62693@nexus.in-nomine.org>
 <20080702171957.GY62693@nexus.in-nomine.org>
 <20080702182215.GA62693@nexus.in-nomine.org>
 <20080702183541.GB62693@nexus.in-nomine.org>
 <20080703104813.GF62693@nexus.in-nomine.org>
Message-ID:

On Thu, Jul 3, 2008 at 3:48 AM, Jeroen Ruigrok van der Werven wrote:
> My apologies for hammering on this, but I think it is quite important and
> currently Python 3.0 seems confused about UCS-2 versus UTF-16.
[...]

You seem to be suggesting that len(u"\U00012345") should return 1 on a
system that internally uses UTF-16 and hence represents this string as a
surrogate pair.

This is not going to happen. You may as well complain to the authors of the
Java standard about the corresponding problem there.
-- --Guido van Rossum (home page: http://www.python.org/~guido/) From djarb at highenergymagic.org Thu Jul 3 16:38:47 2008 From: djarb at highenergymagic.org (Daniel Arbuckle) Date: Thu, 3 Jul 2008 07:38:47 -0700 Subject: [Python-Dev] UCS2/UCS4 default In-Reply-To: <031901c8dd12$b7258b60$2570a220$@com.au> References: <20080702141328.GW62693@nexus.in-nomine.org> <20080702182215.GA62693@nexus.in-nomine.org> <20080702183541.GB62693@nexus.in-nomine.org> <20080703104813.GF62693@nexus.in-nomine.org> <486CC881.5090902@gmail.com> <031901c8dd12$b7258b60$2570a220$@com.au> Message-ID: On Thu, Jul 3, 2008 at 6:42 AM, Mark Hammond wrote: > For people on Windows, win32 isn't a "compatibility" consideration. I > suspect most users of the other platforms MAL mentioned and all others with > their own native unicode implementations would agree. I'm sorry, but you're wrong. Interfacing python to interoperate with the underlying system is compatibility. Surely your own win32 extensions already address this necessity. Regardless, as I said before, nothing justifies silently changing the meaning of a program based on an option that most users don't set for themselves and are not aware of. When such a change would take place, it should be reported explicitly as an error. 
From asmodai at in-nomine.org Thu Jul 3 16:46:48 2008
From: asmodai at in-nomine.org (Jeroen Ruigrok van der Werven)
Date: Thu, 3 Jul 2008 16:46:48 +0200
Subject: [Python-Dev] UCS2/UCS4 default
In-Reply-To:
References: <20080702141328.GW62693@nexus.in-nomine.org>
 <20080702171957.GY62693@nexus.in-nomine.org>
 <20080702182215.GA62693@nexus.in-nomine.org>
 <20080702183541.GB62693@nexus.in-nomine.org>
 <20080703104813.GF62693@nexus.in-nomine.org>
Message-ID: <20080703144648.GA34192@nexus.in-nomine.org>

-On [20080703 15:58], Guido van Rossum (guido at python.org) wrote:
>You seem to be suggesting that len(u"\U00012345") should return 1 on
>a system that internally uses UTF-16 and hence represents this string
>as a surrogate pair.

From a Unicode and UTF-16 point of view that makes the most sense. So yes, I
am suggesting that.

>This is not going to happen. You may as well complain to the authors
>of the Java standard about the corresponding problem there.

Why would I need to complain to them? They already fixed it since 1.5.0.

Java 1.5.0's release notes
(http://java.sun.com/developer/technicalArticles/releases/j2se15/):

Supplementary Character Support

32-bit supplementary character support has been carefully added to the
platform as part of the transition to Unicode 4.0 support. Supplementary
characters are encoded as a special pair of UTF16 values to generate a
different character, or codepoint. A surrogate pair is a combination of a
high UTF16 value and a following low UTF16 value. The high and low values
are from a special range of UTF16 values.

In general, when using a String or sequence of characters, the core API
libraries will transparently handle the new supplementary characters for
you.

See also http://java.sun.com/j2se/1.5.0/docs/api/java/lang/Character.html

The methods that accept an int value support all Unicode characters,
including supplementary characters.
For example, Character.isLetter(0x2F81A) returns true because the code point value represents a letter (a CJK ideograph). -- Jeroen Ruigrok van der Werven / asmodai ????? ?????? ??? ?? ?????? http://www.in-nomine.org/ | http://www.rangaku.org/ | GPG: 2EAC625B Life can only be understood backwards, but it must be lived forwards... From guido at python.org Thu Jul 3 17:03:55 2008 From: guido at python.org (Guido van Rossum) Date: Thu, 3 Jul 2008 08:03:55 -0700 Subject: [Python-Dev] UCS2/UCS4 default In-Reply-To: <20080703144648.GA34192@nexus.in-nomine.org> References: <20080702141328.GW62693@nexus.in-nomine.org> <20080702171957.GY62693@nexus.in-nomine.org> <20080702182215.GA62693@nexus.in-nomine.org> <20080702183541.GB62693@nexus.in-nomine.org> <20080703104813.GF62693@nexus.in-nomine.org> <20080703144648.GA34192@nexus.in-nomine.org> Message-ID: On Thu, Jul 3, 2008 at 7:46 AM, Jeroen Ruigrok van der Werven wrote: > -On [20080703 15:58], Guido van Rossum (guido at python.org) wrote: >>Your seem to be suggesting that len(u"\U00012345") should return 1 on >>a system that internally uses UTF-16 and hence represents this string >>as a surrogate pair. > > From a Unicode and UTF-16 point of view that makes the most sense. So yes, I > am suggesting that. > >>This is not going to happen. You may as well complain to the authors >>of the Java standard about the corresponding problem there. > > Why would I need to complain to them? They already fixed it since 1.5.0. > > Java 1.5.0's release notes > (http://java.sun.com/developer/technicalArticles/releases/j2se15/): > > Supplementary Character Support > > 32-bit supplementary character support has been carefully added to the > platform as part of the transition to Unicode 4.0 support. Supplementary > characters are encoded as a special pair of UTF16 values to generate a > different character, or codepoint. A surrogate pair is a combination of a > high UTF16 value and a following low UTF16 value. 
The high and low values > are from a special range of UTF16 values. > > In general, when using a String or sequence of characters, the core API > libraries will transparently handle the new supplementary characters for > you. > > See also http://java.sun.com/j2se/1.5.0/docs/api/java/lang/Character.html > > The methods that accept an int value support all Unicode characters, > including supplementary characters. For example, Character.isLetter(0x2F81A) > returns true because the code point value represents a letter (a CJK > ideograph). I don't see an answer there to the question of whether the length() method of a Java String object containing a single surrogate pair returns 1 or 2; I suspect it returns 2. Python 3 supports things like chr(0x12345) and ord("\U00012345"). (And so does Python 2, using unichr and unicode literals.) The one thing that may be missing from Python is things like interpretation of surrogates by functions like isalpha() and I'm okay with adding that (since those have to loop over the entire string anyway). -- --Guido van Rossum (home page: http://www.python.org/~guido/) From amauryfa at gmail.com Thu Jul 3 17:31:57 2008 From: amauryfa at gmail.com (Amaury Forgeot d'Arc) Date: Thu, 3 Jul 2008 17:31:57 +0200 Subject: [Python-Dev] UCS2/UCS4 default In-Reply-To: References: <20080702141328.GW62693@nexus.in-nomine.org> <20080702182215.GA62693@nexus.in-nomine.org> <20080702183541.GB62693@nexus.in-nomine.org> <20080703104813.GF62693@nexus.in-nomine.org> <20080703144648.GA34192@nexus.in-nomine.org> Message-ID: Hello, 2008/7/3 Guido van Rossum : > I don't see an answer there to the question of whether the length() > method of a Java String object containing a single surrogate pair > returns 1 or 2; I suspect it returns 2. Python 3 supports things like > chr(0x12345) and ord("\U00012345"). (And so does Python 2, using > unichr and unicode literals.) 
python2.6 support for supplementary characters is not ideal:

>>> unichr(0x2f81a)
ValueError: unichr() arg not in range(0x10000) (narrow Python build)
>>> ord(u'\U0002F81A')
TypeError: ord() expected a character, but string of length 2 found.

\Uxxxxxxxx seems the only way to enter these characters.
3.0 is much better and passes the two tests above.

The unicodedata module gives good results in both versions:

>>> unicodedata.name(u'\U0002F81A')
'CJK COMPATIBILITY IDEOGRAPH-2F81A'
[34311 refs]
>>> unicodedata.category(u'\U0002F81A')
'Lo'

With python 3.0, I found only two places that refuse large code points on
narrow builds: the "%c" format, and Py_BuildValue('C'). They should be
fixed.

> The one thing that may be missing from Python is things like
> interpretation of surrogates by functions like isalpha() and I'm okay
> with adding that (since those have to loop over the entire string
> anyway).

In this case, a new .isascii() method would be needed for some uses.

--
Amaury Forgeot d'Arc

From p.f.moore at gmail.com Thu Jul 3 17:32:37 2008
From: p.f.moore at gmail.com (Paul Moore)
Date: Thu, 3 Jul 2008 16:32:37 +0100
Subject: [Python-Dev] UCS2/UCS4 default
In-Reply-To:
References: <20080702141328.GW62693@nexus.in-nomine.org>
 <20080702182215.GA62693@nexus.in-nomine.org>
 <20080702183541.GB62693@nexus.in-nomine.org>
 <20080703104813.GF62693@nexus.in-nomine.org>
 <20080703144648.GA34192@nexus.in-nomine.org>
Message-ID: <79990c6b0807030832k48bb408bmddb3911dc931a839@mail.gmail.com>

On 03/07/2008, Guido van Rossum wrote:
> I don't see an answer there to the question of whether the length()
> method of a Java String object containing a single surrogate pair
> returns 1 or 2; I suspect it returns 2.

It appears you're right:

>type testucs.java
class testucs {
    public static void main(String[] args) {
        StringBuilder s = new StringBuilder("Hello, ");
        s.appendCodePoint(0x2F81A);
        System.out.println(s); // Display the string.
        System.out.println(s.length());
    }
}

>java testucs
Hello, ?
9

>java -version
java version "1.6.0_05"
Java(TM) SE Runtime Environment (build 1.6.0_05-b13)
Java HotSpot(TM) Client VM (build 10.0-b19, mixed mode, sharing)

> Python 3 supports things like
> chr(0x12345) and ord("\U00012345"). (And so does Python 2, using
> unichr and unicode literals.)

And Java doesn't appear to - that appendCodePoint() method was wonderfully
hard to find :-)

Paul.

From armin.ronacher at active-4.com Thu Jul 3 18:30:09 2008
From: armin.ronacher at active-4.com (Armin Ronacher)
Date: Thu, 3 Jul 2008 16:30:09 +0000 (UTC)
Subject: [Python-Dev] UCS2/UCS4 default
References: <20080702141328.GW62693@nexus.in-nomine.org>
 <20080702171957.GY62693@nexus.in-nomine.org>
 <20080702182215.GA62693@nexus.in-nomine.org>
 <20080702183541.GB62693@nexus.in-nomine.org>
 <20080703104813.GF62693@nexus.in-nomine.org>
 <20080703144648.GA34192@nexus.in-nomine.org>
Message-ID:

Guido van Rossum (guido at python.org) writes:
> The one thing that may be missing from Python is things like
> interpretation of surrogates by functions like isalpha() and I'm okay
> with adding that (since those have to loop over the entire string
> anyway).

That and methods to safely iterate and slice strings by codepoint. Java
supports that via String.codePointCount / String.codePointAt /
String.codePointBefore / String.offsetByCodePoints. Maybe not on the
unicode/str object itself but as part of unicodedata that would make sense
for applications that have to deal with unicode on that level.
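
A rough sketch of what iterating "by codepoint" over 16-bit storage could
look like. The function names utf16_units and iter_code_points are invented
for this illustration and do not exist in unicodedata or on str. The sketch
works on an explicit list of UTF-16 code units, so it does not depend on how
the running interpreter stores strings; yielding a lone surrogate unchanged
is an assumption, not a settled policy.

```python
import struct


def utf16_units(text):
    """Return the UTF-16 code units of text as a list of ints."""
    data = text.encode("utf-16-le")
    return list(struct.unpack("<%dH" % (len(data) // 2), data))


def iter_code_points(units):
    """Yield code points from UTF-16 code units, joining surrogate pairs.

    A high surrogate (0xD800-0xDBFF) followed by a low surrogate
    (0xDC00-0xDFFF) is combined into one supplementary code point.
    Anything else, including a lone surrogate, is yielded as-is
    (an assumption made for this sketch).
    """
    i = 0
    n = len(units)
    while i < n:
        unit = units[i]
        if (0xD800 <= unit <= 0xDBFF and i + 1 < n
                and 0xDC00 <= units[i + 1] <= 0xDFFF):
            low = units[i + 1]
            yield 0x10000 + ((unit - 0xD800) << 10) + (low - 0xDC00)
            i += 2
        else:
            yield unit
            i += 1


# The two-character example string used earlier in this thread.
units = utf16_units("\U00020045\u942a")
code_points = list(iter_code_points(units))
print([hex(u) for u in units])         # ['0xd840', '0xdc45', '0x942a']
print([hex(cp) for cp in code_points])  # ['0x20045', '0x942a']
```

Because the walk is sequential, finding the n-th code point this way is
linear in the length of the string, which is the cost raised elsewhere in
the thread.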
Regards,
Armin

From steve at holdenweb.com Thu Jul 3 18:35:29 2008
From: steve at holdenweb.com (Steve Holden)
Date: Thu, 03 Jul 2008 12:35:29 -0400
Subject: [Python-Dev] UCS2/UCS4 default
In-Reply-To: <79990c6b0807030832k48bb408bmddb3911dc931a839@mail.gmail.com>
References: <20080702141328.GW62693@nexus.in-nomine.org>
 <20080702182215.GA62693@nexus.in-nomine.org>
 <20080702183541.GB62693@nexus.in-nomine.org>
 <20080703104813.GF62693@nexus.in-nomine.org>
 <20080703144648.GA34192@nexus.in-nomine.org>
 <79990c6b0807030832k48bb408bmddb3911dc931a839@mail.gmail.com>
Message-ID:

Paul Moore wrote:
> On 03/07/2008, Guido van Rossum wrote:
>> I don't see an answer there to the question of whether the length()
>> method of a Java String object containing a single surrogate pair
>> returns 1 or 2; I suspect it returns 2.
>
> It appears you're right:
>
>> type testucs.java
> class testucs {
>     public static void main(String[] args) {
>         StringBuilder s = new StringBuilder("Hello, ");
>         s.appendCodePoint(0x2F81A);
>         System.out.println(s); // Display the string.
>         System.out.println(s.length());
>     }
> }
>
>> java testucs
> Hello, ?
> 9
>
>> java -version
> java version "1.6.0_05"
> Java(TM) SE Runtime Environment (build 1.6.0_05-b13)
> Java HotSpot(TM) Client VM (build 10.0-b19, mixed mode, sharing)
>
>> Python 3 supports things like
>> chr(0x12345) and ord("\U00012345"). (And so does Python 2, using
>> unichr and unicode literals.)
>
> And Java doesn't appear to - that appendCodePoint() method was
> wonderfully hard to find :-)
>
There's also the issue of indexing the Unicode strings. If we are going to
insist that len(u) counts surrogate pairs as one character then random
access to the characters of a string is going to be an extremely inefficient
operation.

Surely it's desirable under all circumstances that

len(u) == sum(1 for c in u)

and that

[c for c in u] == [u[i] for i in range(len(u))]

How would that play under Jeroen's proposed change?
regards Steve -- Steve Holden +1 571 484 6266 +1 800 494 3119 Holden Web LLC http://www.holdenweb.com/ From asmodai at in-nomine.org Thu Jul 3 18:41:32 2008 From: asmodai at in-nomine.org (Jeroen Ruigrok van der Werven) Date: Thu, 3 Jul 2008 18:41:32 +0200 Subject: [Python-Dev] UCS2/UCS4 default In-Reply-To: <79990c6b0807030832k48bb408bmddb3911dc931a839@mail.gmail.com> References: <20080702182215.GA62693@nexus.in-nomine.org> <20080702183541.GB62693@nexus.in-nomine.org> <20080703104813.GF62693@nexus.in-nomine.org> <20080703144648.GA34192@nexus.in-nomine.org> <79990c6b0807030832k48bb408bmddb3911dc931a839@mail.gmail.com> Message-ID: <20080703164132.GB34192@nexus.in-nomine.org> -On [20080703 17:32], Paul Moore (p.f.moore at gmail.com) wrote: > System.out.println(s.length()); I think you want to use codePointCount() to count the Unicode code points. length() returns Unicode code units. As http://java.sun.com/j2se/1.5.0/docs/api/java/lang/Character.html explains: In the J2SE API documentation, Unicode code point is used for character values in the range between U+0000 and U+10FFFF, and Unicode code unit is used for 16-bit char values that are code units of the UTF-16 encoding. -- Jeroen Ruigrok van der Werven / asmodai ????? ?????? ??? ?? ?????? http://www.in-nomine.org/ | http://www.rangaku.org/ | GPG: 2EAC625B Man is the measure of all things... 
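
The code unit versus code point distinction drawn above can be sketched in
Python as follows. The names code_unit_count and code_point_count are
hypothetical, chosen to mirror Java's length() and codePointCount(). Both
counts are derived from explicit encodings rather than from len(), so the
result should not depend on how the interpreter stores strings internally.

```python
def code_unit_count(text):
    """Number of 16-bit UTF-16 code units needed for text."""
    # "utf-16-le" is used so that no byte order mark is counted.
    return len(text.encode("utf-16-le")) // 2


def code_point_count(text):
    """Number of Unicode code points in text."""
    # Every code point takes exactly four bytes in UTF-32.
    return len(text.encode("utf-32-le")) // 4


# The example string from earlier in the thread: one supplementary
# character followed by one BMP character.
a = "\U00020045\u942a"
print(code_unit_count(a))   # 3
print(code_point_count(a))  # 2
```

This only shows the two ways of counting; it does not settle which of them
len() ought to report.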
From guido at python.org Thu Jul 3 18:48:38 2008 From: guido at python.org (Guido van Rossum) Date: Thu, 3 Jul 2008 09:48:38 -0700 Subject: [Python-Dev] UCS2/UCS4 default In-Reply-To: References: <20080702141328.GW62693@nexus.in-nomine.org> <20080702183541.GB62693@nexus.in-nomine.org> <20080703104813.GF62693@nexus.in-nomine.org> <20080703144648.GA34192@nexus.in-nomine.org> <79990c6b0807030832k48bb408bmddb3911dc931a839@mail.gmail.com> Message-ID: On Thu, Jul 3, 2008 at 9:35 AM, Steve Holden wrote: > Paul Moore wrote: >> >> On 03/07/2008, Guido van Rossum wrote: >>> >>> I don't see an answer there to the question of whether the length() >>> method of a Java String object containing a single surrogate pair >>> returns 1 or 2; I suspect it returns 2. >> >> It appears you're right: >> >>> type testucs.java >> >> class testucs { >> public static void main(String[] args) { >> StringBuilder s = new StringBuilder("Hello, "); >> s.appendCodePoint(0x2F81A); >> System.out.println(s); // Display the string. >> System.out.println(s.length()); >> } >> } >> >>> java testucs >> >> Hello, ? >> 9 >> >>> java -version >> >> java version "1.6.0_05" >> Java(TM) SE Runtime Environment (build 1.6.0_05-b13) >> Java HotSpot(TM) Client VM (build 10.0-b19, mixed mode, sharing) >> >>> Python 3 supports things like >>> chr(0x12345) and ord("\U00012345"). (And so does Python 2, using >>> unichr and unicode literals.) >> >> And Java doesn't appear to - that appendCodePoint() method was >> wonderfully hard to find :-) >> > There's also the issue of indexing the Unicode strings. If we are going to > insist that len(u) counts surrogate pairs as one character then random > access to the characters of a string is going to be an extremely inefficient > operation. But my whole point is that len(u) should count surrogate pairs as TWO! 
> Surely it's desirable under all circumstances that > > len(u) == sum(1 for c in u) > > and that > > [c for c in u] == [c[i] for i in range(*len(u))] > > How would that play under Jeroen's proposed change? I am not considering such a change. At best there will be some helper function in unicodedata, or perhaps a helper method on the 3.0 str type to iterate over characters instead of 16-bit values. Whether that iterator should yield 21-bit integer values or strings containing one character (i.e. perhaps a surrogate pair) and what it would do with lone surrogate halves is up to the committee to design this API. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From asmodai at in-nomine.org Thu Jul 3 18:51:40 2008 From: asmodai at in-nomine.org (Jeroen Ruigrok van der Werven) Date: Thu, 3 Jul 2008 18:51:40 +0200 Subject: [Python-Dev] UCS2/UCS4 default In-Reply-To: References: <20080702171957.GY62693@nexus.in-nomine.org> <20080702182215.GA62693@nexus.in-nomine.org> <20080702183541.GB62693@nexus.in-nomine.org> <20080703104813.GF62693@nexus.in-nomine.org> <20080703144648.GA34192@nexus.in-nomine.org> Message-ID: <20080703165140.GD34192@nexus.in-nomine.org> -On [20080703 17:03], Guido van Rossum (guido at python.org) wrote: >I don't see an answer there to the question of whether the length() >method of a Java String object containing a single surrogate pair >returns 1 or 2; I suspect it returns 2. As http://java.sun.com/j2se/1.5.0/docs/api/java/lang/CharSequence.html#length() states: int length() Returns the length of this character sequence. The length is the number of 16-bit chars in the sequence. But since Java switched to full UTF-16 support in 1.5.0 they extended their API since the existing methods have probably come too ingrained. E.g. 
codePointCount() http://java.sun.com/j2se/1.5.0/docs/api/java/lang/Character.html#codePointCount(char[],%20int,%20int) >The one thing that may be missing from Python is things like >interpretation of surrogates by functions like isalpha() and I'm okay >with adding that (since those have to loop over the entire string >anyway). Those would be welcome already, yes. I'll see if I can help out. -- Jeroen Ruigrok van der Werven / asmodai ????? ?????? ??? ?? ?????? http://www.in-nomine.org/ | http://www.rangaku.org/ | GPG: 2EAC625B Fallen into ever-mourn, with these wings so torn, after your day my dawn... From asmodai at in-nomine.org Thu Jul 3 19:01:30 2008 From: asmodai at in-nomine.org (Jeroen Ruigrok van der Werven) Date: Thu, 3 Jul 2008 19:01:30 +0200 Subject: [Python-Dev] UCS2/UCS4 default In-Reply-To: References: <20080702171957.GY62693@nexus.in-nomine.org> <20080702182215.GA62693@nexus.in-nomine.org> <20080702183541.GB62693@nexus.in-nomine.org> <20080703104813.GF62693@nexus.in-nomine.org> <20080703144648.GA34192@nexus.in-nomine.org> Message-ID: <20080703170130.GE34192@nexus.in-nomine.org> -On [20080703 18:45], James Y Knight (foom at fuhm.net) wrote: >I think this is misguided. Only trying to at least correct the current situation, which I consider a bit of a mess, personally. (Although it seems others share my view.) >I'd like to have 3 levels of access available: >1) "byte"-level. In a new implementation I'd probably choose to make >all my strings stored in UTF-8, but UTF-16 is fine too. >2) codepoint-level. >3) grapheme-level. Sounds interesting as well and I can very much see the advantages of such levels and their methods. Especially in the i18n/l10n work I do. >You should be able to iterate over the string at any of the levels, >ask for the nearest codepoint/grapheme boundary to the left or right >of an index at a different level, etc. [snip] Actually it seems Java already has a lot of similar methods. 
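A rough sketch of the kind of code point level access being discussed, written against a plain list of UTF-16 code units. Everything here is an assumption for illustration: the function names only loosely follow the quoted proposal, stepping is forward by one code point only, and none of this is an existing str method.

```python
def to_code_units(text):
    """Encode text as a list of 16-bit UTF-16 code units (lone surrogates kept)."""
    raw = text.encode("utf-16-le", "surrogatepass")
    return [int.from_bytes(raw[i:i + 2], "little") for i in range(0, len(raw), 2)]


def next_codepoint(units, pos, end=None):
    """Return the code unit index where the code point after pos starts."""
    end = len(units) if end is None else end
    is_pair = (
        pos + 1 < end
        and 0xD800 <= units[pos] <= 0xDBFF      # high (lead) surrogate
        and 0xDC00 <= units[pos + 1] <= 0xDFFF  # low (trail) surrogate
    )
    return pos + 2 if is_pair else pos + 1


def codepoints(units, from_pos=0, to_pos=None):
    """Yield code points as integers; a lone surrogate is yielded unchanged."""
    end = len(units) if to_pos is None else to_pos
    pos = from_pos
    while pos < end:
        nxt = next_codepoint(units, pos, end)
        if nxt == pos + 2:
            # Combine a surrogate pair into one supplementary code point.
            yield 0x10000 + ((units[pos] - 0xD800) << 10) + (units[pos + 1] - 0xDC00)
        else:
            yield units[pos]
        pos = nxt


units = to_code_units("A\U00012345B")
print(len(units))                           # 4 code units
print([hex(cp) for cp in codepoints(units)])  # ['0x41', '0x12345', '0x42']
```

If from_pos lands in the middle of a pair, this sketch simply yields the trailing surrogate on its own; a real API would have to specify that case.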
>There are a few more desirable operations, to manipulate strings at >the grapheme level (because unlike for UTF-8/UTF-16 codepoints, >graphemes don't have the nice property of not containing prefixes >which are themselves valid graphemes). So, you want a find (and >everything else that implicitly does a find operation, like split, >replace, strip, etc) which requires that both endpoints of its match >are on a grapheme-boundary. [[Probably the easiest way to implement >this would be in the regexp engine.]] Well, your ideas and seeing Java's stuff actually got me excited to work on these kind of ideas, next to my datetime revamp. What would the chances for inclusion in Python be if such a PEP + code would be presented Guido? -- Jeroen Ruigrok van der Werven / asmodai ????? ?????? ??? ?? ?????? http://www.in-nomine.org/ | http://www.rangaku.org/ | GPG: 2EAC625B Beware of the fury of the patient man... From foom at fuhm.net Thu Jul 3 18:45:39 2008 From: foom at fuhm.net (James Y Knight) Date: Thu, 3 Jul 2008 12:45:39 -0400 Subject: [Python-Dev] UCS2/UCS4 default In-Reply-To: <20080703144648.GA34192@nexus.in-nomine.org> References: <20080702141328.GW62693@nexus.in-nomine.org> <20080702171957.GY62693@nexus.in-nomine.org> <20080702182215.GA62693@nexus.in-nomine.org> <20080702183541.GB62693@nexus.in-nomine.org> <20080703104813.GF62693@nexus.in-nomine.org> <20080703144648.GA34192@nexus.in-nomine.org> Message-ID: On Jul 3, 2008, at 10:46 AM, Jeroen Ruigrok van der Werven wrote: > -On [20080703 15:58], Guido van Rossum (guido at python.org) wrote: >> Your seem to be suggesting that len(u"\U00012345") should return 1 on >> a system that internally uses UTF-16 and hence represents this string >> as a surrogate pair. > > From a Unicode and UTF-16 point of view that makes the most sense. > So yes, I > am suggesting that. I think this is misguided. IMO, basically every programming language gets string handling wrong. (maybe with the exception of the unreleased perl6? 
it had some interesting moves in this area, but I haven't really been
paying attention.)

Everyone treats strings as arrays, but they are used quite
differently. For a string, there is hardly ever a time when a
programmer needs to index it with an arbitrary offset in number of
codepoints, and the length-in-codepoints is pretty non-useful as well.
Constant-time access to arbitrary codepoints in a string is pretty
much unimportant. What *is* of utmost importance is constant-time
access to previously-returned points in the string.

I'd like to have 3 levels of access available:
1) "byte"-level. In a new implementation I'd probably choose to make
all my strings stored in UTF-8, but UTF-16 is fine too.
2) codepoint-level.
3) grapheme-level.

You should be able to iterate over the string at any of the levels,
ask for the nearest codepoint/grapheme boundary to the left or right
of an index at a different level, etc.

Python could probably still be made to work kinda like this. I think a
language designed as such in the first place could be nicer, with
opaque index objects into the string rather than integers, and such,
but...whatever.

Let's assume python is changed to always store strings in UTF-16. All
it would take is adding a few more functions to the str object to
operate on the higher levels. Wherever I say "pos" I mean an integer
index into the string, at the UTF-16 level. That may sometimes be
unaligned with the boundary of the representation you're asking about,
and behavior in that case needs to be specified as well.

.nextcodepoint(curpos, how_many=1) -> returns an index into the string
how_many codepoints to the right (or left if negative) of the index
curpos.

.nextgrapheme(curpos, how_many=1) -> returns an index into the string
how_many graphemes to the right (or left if negative) of the index
curpos.

.codepoints(from_pos=0, to_pos=None) -> return an iterator of
codepoints from 'from_pos' to 'to_pos'.
I think codepoints could be represented as strings themselves (so usually one character, sometimes two character strings). .graphemes(from_pos=0, to_pos=None) -> return an iterator of graphemes from 'from_pos' to 'to_pos'. Also could be represented by strings. The returned graphemes should probably be normalized. There are a few more desirable operations, to manipulate strings at the grapheme level (because unlike for UTF-8/UTF-16 codepoints, graphemes don't have the nice property of not containing prefixes which are themselves valid graphemes). So, you want a find (and everything else that implicitly does a find operation, like split, replace, strip, etc) which requires that both endpoints of its match are on a grapheme-boundary. [[Probably the easiest way to implement this would be in the regexp engine.]] A concrete example of that: u'A\N{COMBINING TILDE}\N{COMBINING MACRON BELOW}'.find(u'A\N{COMBINING TILDE}') returns 0. But you want a way to ask for only a *actual* "A with tilde", not an "A with tilde and macron". Anyhow, I'm not going to tackle this issue or try to push it further, but if someone does tackle it, python could grow to have the best unicode available. :) James From guido at python.org Thu Jul 3 19:10:07 2008 From: guido at python.org (Guido van Rossum) Date: Thu, 3 Jul 2008 10:10:07 -0700 Subject: [Python-Dev] UCS2/UCS4 default In-Reply-To: <20080703170130.GE34192@nexus.in-nomine.org> References: <20080702171957.GY62693@nexus.in-nomine.org> <20080702182215.GA62693@nexus.in-nomine.org> <20080702183541.GB62693@nexus.in-nomine.org> <20080703104813.GF62693@nexus.in-nomine.org> <20080703144648.GA34192@nexus.in-nomine.org> <20080703170130.GE34192@nexus.in-nomine.org> Message-ID: On Thu, Jul 3, 2008 at 10:01 AM, Jeroen Ruigrok van der Werven wrote: > What would the chances for inclusion in Python be if such a PEP + code would > be presented Guido? 
As long as it is clear that the len() function and the basic slicing and indexing operations on strings continue to work in code units (i.e. 16-bit quantities) and the APIs for dealing with code points (i.e. treating surrogate pairs as a single character) are a separate API, there is a chance. Existing code using the existing APIs should not change its behavior (even if you consider the existing behavior broken), with the exception of isalpha() and similar APIs, which can IMO safely be extended to consider surrogate pairs. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From facundobatista at gmail.com Thu Jul 3 19:12:41 2008 From: facundobatista at gmail.com (Facundo Batista) Date: Thu, 3 Jul 2008 14:12:41 -0300 Subject: [Python-Dev] us.pycon.org down? Message-ID: (sorry for the crossposting) Do you know what happened with "http://us.pycon.org/"? Thank you! -- . Facundo Blog: http://www.taniquetil.com.ar/plog/ PyAr: http://www.python.org/ar/ From rhamph at gmail.com Thu Jul 3 19:21:24 2008 From: rhamph at gmail.com (Adam Olsen) Date: Thu, 3 Jul 2008 11:21:24 -0600 Subject: [Python-Dev] UCS2/UCS4 default In-Reply-To: <486CDAD5.1060506@egenix.com> References: <20080702182215.GA62693@nexus.in-nomine.org> <20080702183541.GB62693@nexus.in-nomine.org> <20080703104813.GF62693@nexus.in-nomine.org> <486CC881.5090902@gmail.com> <486CCD66.70906@egenix.com> <20080703132146.GI62693@nexus.in-nomine.org> <486CDAD5.1060506@egenix.com> Message-ID: On Thu, Jul 3, 2008 at 7:57 AM, M.-A. Lemburg wrote: > On 2008-07-03 15:21, Jeroen Ruigrok van der Werven wrote: >> >> -On [20080703 15:00], M.-A. Lemburg (mal at egenix.com) wrote: >>> >>> Unicode if full of combining code points - if you break such a sequence, >>> the output will be just as wrong; regardless of UCS2 vs. UCS4. >> >> In my opinion you are confusing two related, but very separated things >> here. >> Combining characters have nothing to do with breaking up the encoding of a >> single codepoint. 
Sure enough, if you arbitrary slice up codepoints that >> consist of combining characters then your result is indeed odd looking. >> >> I never said that nor is that the point I am making. > > Please remember that lone surrogate pair code points are perfectly > valid Unicode code points, nevertheless. Just as a lone combining > code point is valid on its own. That is a big part of these problems. For all practical purposes, a surrogate is like a UTF-8 code unit, and must be handled the same way, so why the heck do they confuse everybody by saying "oh, it's a code point too!"? -- Adam Olsen, aka Rhamphoryncus From martin at v.loewis.de Thu Jul 3 19:31:14 2008 From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=) Date: Thu, 03 Jul 2008 19:31:14 +0200 Subject: [Python-Dev] UCS2/UCS4 default In-Reply-To: <20080703104813.GF62693@nexus.in-nomine.org> References: <20080702141328.GW62693@nexus.in-nomine.org> <20080702171957.GY62693@nexus.in-nomine.org> <20080702182215.GA62693@nexus.in-nomine.org> <20080702183541.GB62693@nexus.in-nomine.org> <20080703104813.GF62693@nexus.in-nomine.org> Message-ID: <486D0CE2.6010909@v.loewis.de> > Basically everything but string forming or string printing seems to be > broken for surrogate pairs, from what I can tell. We probably disagree what "it works correctly" means. I think everything works correctly. > Also, I think you are confused about slicing in the middle of a surrogate > pair, from a UTF-16 perspective this is 1 codepoint! Yes, but it is two code units. Python's UTF-16 implementation operates on code units, not code points. > And as such Python > needs to treat it as one character/codepoint in a string, dealing with > slicing as appropriate. It does. However, functions such as len, and all indexing, operate in code units, not code points. > The way you currently describe it is that UTF-16 > strings will be treated as UCS-2 when it comes to slicing and the likes. No. 
In UCS-2, the surrogate range is reserved (for UTF-16). In Python, it's not reserved, but interpreted as UTF-16. > From a UTF-16 point of view such slicing can NEVER occur unless you are bit > or byte slicing instead of character/codepoint slicing. It most certainly can. UTF-16 is not a character set, but a character encoding form (unlike UCS-2, which is a coded character set). Slicing *can* occur at the code unit level. UTF-16 is also understood as a character encoding scheme (by means of the BOM), then slicing can occur even on the byte level. > I think it can be fairly said that an item in a string is a character or > codepoint. Not in Python - it's a code unit. Regards, Martin From goodger at python.org Thu Jul 3 19:32:41 2008 From: goodger at python.org (David Goodger) Date: Thu, 3 Jul 2008 13:32:41 -0400 Subject: [Python-Dev] [PyCon-Organizers] us.pycon.org down? In-Reply-To: References: Message-ID: <4335d2c40807031032v6804a6e0pc75f02204de9bb3d@mail.gmail.com> On Thu, Jul 3, 2008 at 13:12, Facundo Batista wrote: > (sorry for the crossposting) > > Do you know what happened with "http://us.pycon.org/"? Not sure. The machine is still up (it serves www.pycon.org as well). Either something is misconfigured, or a process can't start, or something... I'll ask Jeff Rush (whose machine it's on) and Doug Napoleone (who knows more about the server than I, and has admin access) to look into it. -- David Goodger From martin at v.loewis.de Thu Jul 3 19:33:23 2008 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Thu, 03 Jul 2008 19:33:23 +0200 Subject: [Python-Dev] UCS2/UCS4 default In-Reply-To: <486CC881.5090902@gmail.com> References: <20080702141328.GW62693@nexus.in-nomine.org> <20080702171957.GY62693@nexus.in-nomine.org> <20080702182215.GA62693@nexus.in-nomine.org> <20080702183541.GB62693@nexus.in-nomine.org> <20080703104813.GF62693@nexus.in-nomine.org> <486CC881.5090902@gmail.com> Message-ID: <486D0D63.3090807@v.loewis.de> > 1. 
System is NOT memory limited (i.e. most desktops): use a UCS-4 Python
> build, which is what most Linux distributions do (I'm not sure about the
> pydotorg provided Windows or Mac OS X builds).

The Windows builds must continue to use a two-byte representation,
as otherwise PythonWin will break (and anything else that tries to
pass Unicode strings directly to a Win32 *W function).

Regards,
Martin

From asmodai at in-nomine.org  Thu Jul  3 19:35:45 2008
From: asmodai at in-nomine.org (Jeroen Ruigrok van der Werven)
Date: Thu, 3 Jul 2008 19:35:45 +0200
Subject: [Python-Dev] UCS2/UCS4 default
In-Reply-To:
References: <20080702182215.GA62693@nexus.in-nomine.org>
	<20080702183541.GB62693@nexus.in-nomine.org>
	<20080703104813.GF62693@nexus.in-nomine.org>
	<486CC881.5090902@gmail.com>
	<486CCD66.70906@egenix.com>
	<20080703132146.GI62693@nexus.in-nomine.org>
	<486CDAD5.1060506@egenix.com>
Message-ID: <20080703173545.GF34192@nexus.in-nomine.org>

-On [20080703 19:21], Adam Olsen (rhamph at gmail.com) wrote:
>On Thu, Jul 3, 2008 at 7:57 AM, M.-A. Lemburg wrote:
>> Please remember that lone surrogate pair code points are perfectly
>> valid Unicode code points, nevertheless. Just as a lone combining
>> code point is valid on its own.
>
>That is a big part of these problems. For all practical purposes, a
>surrogate is like a UTF-8 code unit, and must be handled the same way,
>so why the heck do they confuse everybody by saying "oh, it's a code
>point too!"?

Because surrogate code points are not Unicode scalar values, isolated UTF-16
code units in the range 0xd800-0xdfff are ill-formed. (D91 from Unicode
5.0/5.1, section 3.9)

So, no, it is not a code point too.

-- 
Jeroen Ruigrok van der Werven / asmodai
http://www.in-nomine.org/ | http://www.rangaku.org/ | GPG: 2EAC625B
Als men blijft geloven kan de zwaarste steen niet zinken...
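The D91 rule about isolated surrogates can be expressed as a small check over a sequence of 16-bit code units. This is an illustrative sketch only; the function name is_well_formed_utf16 is hypothetical, and the sample values are the code units for "M", U+0430, U+4E8C and the pair D800 DF02 (which encodes U+10302).

```python
def is_well_formed_utf16(units):
    """True if every surrogate code unit in units is part of a high/low pair."""
    i = 0
    while i < len(units):
        unit = units[i]
        if 0xD800 <= unit <= 0xDBFF:
            # A high surrogate must be followed immediately by a low surrogate.
            if i + 1 >= len(units) or not 0xDC00 <= units[i + 1] <= 0xDFFF:
                return False
            i += 2
        elif 0xDC00 <= unit <= 0xDFFF:
            # A low surrogate with no high surrogate before it is isolated.
            return False
        else:
            i += 1
    return True


print(is_well_formed_utf16([0x004D, 0x0430, 0x4E8C, 0xD800, 0xDF02]))  # True
print(is_well_formed_utf16([0x004D, 0xD800]))                          # False
```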
From martin at v.loewis.de Thu Jul 3 19:36:03 2008 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Thu, 03 Jul 2008 19:36:03 +0200 Subject: [Python-Dev] UCS2/UCS4 default In-Reply-To: <486CDAD5.1060506@egenix.com> References: <20080702171957.GY62693@nexus.in-nomine.org> <20080702182215.GA62693@nexus.in-nomine.org> <20080702183541.GB62693@nexus.in-nomine.org> <20080703104813.GF62693@nexus.in-nomine.org> <486CC881.5090902@gmail.com> <486CCD66.70906@egenix.com> <20080703132146.GI62693@nexus.in-nomine.org> <486CDAD5.1060506@egenix.com> Message-ID: <486D0E03.8020007@v.loewis.de> > Please remember that lone surrogate pair code points are perfectly > valid Unicode code points, nevertheless. Just as a lone combining > code point is valid on its own. Actually, I think they aren't (not any more than an invalid codepoint, or an unassigned codepoint). They are reserved for UTF-16 only. I would have to lookup the exact Unicode terminology, but "valid" is probably not a predicate that they would use. Regards, Martin From martin at v.loewis.de Thu Jul 3 19:39:00 2008 From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=) Date: Thu, 03 Jul 2008 19:39:00 +0200 Subject: [Python-Dev] UCS2/UCS4 default In-Reply-To: <20080703164132.GB34192@nexus.in-nomine.org> References: <20080702182215.GA62693@nexus.in-nomine.org> <20080702183541.GB62693@nexus.in-nomine.org> <20080703104813.GF62693@nexus.in-nomine.org> <20080703144648.GA34192@nexus.in-nomine.org> <79990c6b0807030832k48bb408bmddb3911dc931a839@mail.gmail.com> <20080703164132.GB34192@nexus.in-nomine.org> Message-ID: <486D0EB4.10901@v.loewis.de> > I think you want to use codePointCount() to count the Unicode code points. > length() returns Unicode code units. 
> > As http://java.sun.com/j2se/1.5.0/docs/api/java/lang/Character.html explains: > > In the J2SE API documentation, Unicode code point is used for character > values in the range between U+0000 and U+10FFFF, and Unicode code unit is > used for 16-bit char values that are code units of the UTF-16 encoding. So you would like to contribute a function codePointCount to Python's standard library? Go ahead. Regards, Martin From janssen at parc.com Thu Jul 3 19:43:58 2008 From: janssen at parc.com (Bill Janssen) Date: Thu, 3 Jul 2008 10:43:58 PDT Subject: [Python-Dev] UCS2/UCS4 default In-Reply-To: References: <20080702141328.GW62693@nexus.in-nomine.org> <20080702182215.GA62693@nexus.in-nomine.org> <20080702183541.GB62693@nexus.in-nomine.org> <20080703104813.GF62693@nexus.in-nomine.org> <20080703144648.GA34192@nexus.in-nomine.org> <79990c6b0807030832k48bb408bmddb3911dc931a839@mail.gmail.com> Message-ID: <08Jul3.104407pdt."58698"@synergy1.parc.xerox.com> > Surely it's desirable under all circumstances that > > len(u) == sum(1 for c in u) > > and that > > [c for c in u] == [c[i] for i in range(*len(u))] > > How would that play under Jeroen's proposed change? Yes, but I think the argument is about what "c" is -- a character or a codepoint. Your point about efficiency is well-taken; I doubt that random access to a particular character in a string has to be efficient -- kind of a dying technique these days -- but slices and regexp performance need efficiency guarantees. 
Bill From tjreedy at udel.edu Thu Jul 3 19:44:19 2008 From: tjreedy at udel.edu (Terry Reedy) Date: Thu, 03 Jul 2008 13:44:19 -0400 Subject: [Python-Dev] UCS2/UCS4 default In-Reply-To: References: <20080702141328.GW62693@nexus.in-nomine.org> <20080702182215.GA62693@nexus.in-nomine.org> <20080702183541.GB62693@nexus.in-nomine.org> <20080703104813.GF62693@nexus.in-nomine.org> <486CC881.5090902@gmail.com> <031901c8dd12$b7258b60$2570a220$@com.au> Message-ID: Daniel Arbuckle wrote: > Regardless, as I said before, nothing justifies silently changing the > meaning of a program based on an option that most users don't set for > themselves and are not aware of. The premise of this thread seems to be that the majority should suffer for the benefit of a few. That is not Python's philosophy. Python hides many system differences. It is gradually hiding more. For instance, float('nan') works uniformly in 2.6 (with little performance hit), whereas it was system specific in 2.5 But Python does not promise to hide all system differences. If the possible effects of (unicode) string build choice are not properly documented, then I agree that they should be, just as they are (or, in some cases, I presume are) the effects of underlying OS, processor integer and pointer size, float scheme, garbage collection scheme, and perhaps something I forgot. Suggested documentation changes can be submitted to the tracker as specific ascii text targeted at a specific location. If accepted, the doc maintainers will adapt submitted text to 'doc style' and add the needed markup. Current response time is usually under a week, perhaps even a day. Documented effects are not 'silent'. But I am sure they could be made a bit louder. Perhaps someday someone will volunteer to contribute a chapter to Using Python on Possible Semantic Variations that would run through the issues listed above so they are gathered together in one place as well as scattered throughout the Language and Library Reference manuals. 
> When such a change would take place, > it should be reported explicitly as an error. No, possible changes should be documented so that they are not silent. (But I am curious, by 'would' do you mean 'would with the current data' or 'theoretically could with chosen data'?) Terry Jan Reedy From guido at python.org Thu Jul 3 19:55:24 2008 From: guido at python.org (Guido van Rossum) Date: Thu, 3 Jul 2008 10:55:24 -0700 Subject: [Python-Dev] UCS2/UCS4 default In-Reply-To: References: <20080702141328.GW62693@nexus.in-nomine.org> <20080702183541.GB62693@nexus.in-nomine.org> <20080703104813.GF62693@nexus.in-nomine.org> <486CC881.5090902@gmail.com> <031901c8dd12$b7258b60$2570a220$@com.au> Message-ID: On Thu, Jul 3, 2008 at 10:44 AM, Terry Reedy wrote: > The premise of this thread seems to be that the majority should suffer for > the benefit of a few. That is not Python's philosophy. Who are the many here? Who are the few? I'd venture that (at least for the foreseeable future, say, until China will finally have taken over the role of the US as the de-facto dominant super power :-) the many are people whose app will never see a Unicode character outside the BMP, or who do such minimal string processing that their code doesn't care whether it's handling UTF-16-encoded data. Python's philosophy is also Practicality Beats Purity. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From doug.napoleone at gmail.com Thu Jul 3 19:53:46 2008 From: doug.napoleone at gmail.com (doug.napoleone at gmail.com) Date: Thu, 3 Jul 2008 13:53:46 -0400 Subject: [Python-Dev] [PyCon-Organizers] us.pycon.org down? In-Reply-To: <4335d2c40807031032v6804a6e0pc75f02204de9bb3d@mail.gmail.com> References: <4335d2c40807031032v6804a6e0pc75f02204de9bb3d@mail.gmail.com> Message-ID: In Montana visiting. Will be back at the hotel in about 4 hours. Looks like base site include is missing or has wrong permissions. 
On 7/3/08, David Goodger wrote:
> On Thu, Jul 3, 2008 at 13:12, Facundo Batista
> wrote:
>> (sorry for the crossposting)
>>
>> Do you know what happened with "http://us.pycon.org/"?
>
> Not sure. The machine is still up (it serves www.pycon.org as well).
> Either something is misconfigured, or a process can't start, or
> something...
>
> I'll ask Jeff Rush (whose machine it's on) and Doug Napoleone (who
> knows more about the server than I, and has admin access) to look into
> it.
>
> --
> David Goodger
> _______________________________________________
> PyCon-organizers mailing list
> PyCon-organizers at python.org
> http://mail.python.org/mailman/listinfo/pycon-organizers
>

From asmodai at in-nomine.org  Thu Jul  3 20:39:02 2008
From: asmodai at in-nomine.org (Jeroen Ruigrok van der Werven)
Date: Thu, 3 Jul 2008 20:39:02 +0200
Subject: [Python-Dev] UCS2/UCS4 default
In-Reply-To: <486D0CE2.6010909@v.loewis.de>
References: <20080702141328.GW62693@nexus.in-nomine.org>
	<20080702171957.GY62693@nexus.in-nomine.org>
	<20080702182215.GA62693@nexus.in-nomine.org>
	<20080702183541.GB62693@nexus.in-nomine.org>
	<20080703104813.GF62693@nexus.in-nomine.org>
	<486D0CE2.6010909@v.loewis.de>
Message-ID: <20080703183902.GG34192@nexus.in-nomine.org>

-On [20080703 19:31], "Martin v. Löwis" (martin at v.loewis.de) wrote:
>Yes, but it is two code units. Python's UTF-16 implementation operates
>on code units, not code points.

Thank you, that is the single most important piece of information I got
about this entire thing because it does change the entire approach.

-- 
Jeroen Ruigrok van der Werven / asmodai
http://www.in-nomine.org/ | http://www.rangaku.org/ | GPG: 2EAC625B
Knowledge comes, but Wisdom lingers...

From goodger at python.org  Thu Jul  3 20:40:35 2008
From: goodger at python.org (David Goodger)
Date: Thu, 3 Jul 2008 14:40:35 -0400
Subject: [Python-Dev] [PyCon-Organizers] us.pycon.org down?
In-Reply-To: <4335d2c40807031032v6804a6e0pc75f02204de9bb3d@mail.gmail.com> References: <4335d2c40807031032v6804a6e0pc75f02204de9bb3d@mail.gmail.com> Message-ID: <4335d2c40807031140o61bed415s2588f1e97f2aa655@mail.gmail.com> On Thu, Jul 3, 2008 at 13:32, David Goodger wrote: > On Thu, Jul 3, 2008 at 13:12, Facundo Batista wrote: >> (sorry for the crossposting) >> >> Do you know what happened with "http://us.pycon.org/"? > > Not sure. The machine is still up (it serves www.pycon.org as well). > Either something is misconfigured, or a process can't start, or > something... > > I'll ask Jeff Rush (whose machine it's on) and Doug Napoleone (who > knows more about the server than I, and has admin access) to look into > it. Jeff fixed it. URL rewriting was off by mistake. -- David Goodger From rhamph at gmail.com Thu Jul 3 20:50:57 2008 From: rhamph at gmail.com (Adam Olsen) Date: Thu, 3 Jul 2008 12:50:57 -0600 Subject: [Python-Dev] UCS2/UCS4 default In-Reply-To: <20080703173545.GF34192@nexus.in-nomine.org> References: <20080702182215.GA62693@nexus.in-nomine.org> <20080702183541.GB62693@nexus.in-nomine.org> <20080703104813.GF62693@nexus.in-nomine.org> <486CC881.5090902@gmail.com> <486CCD66.70906@egenix.com> <20080703132146.GI62693@nexus.in-nomine.org> <486CDAD5.1060506@egenix.com> <20080703173545.GF34192@nexus.in-nomine.org> Message-ID: On Thu, Jul 3, 2008 at 11:35 AM, Jeroen Ruigrok van der Werven wrote: > -On [20080703 19:21], Adam Olsen (rhamph at gmail.com) wrote: >>On Thu, Jul 3, 2008 at 7:57 AM, M.-A. Lemburg wrote: >>> Please remember that lone surrogate pair code points are perfectly >>> valid Unicode code points, nevertheless. Just as a lone combining >>> code point is valid on its own. >> >>That is a big part of these problems. For all practical purposes, a >>surrogate is like a UTF-8 code unit, and must be handled the same way, >>so why the heck do they confuse everybody by saying "oh, it's a code >>point too!"? 
>
> Because surrogate code points are not Unicode scalar values, isolated UTF-16
> code units in the range 0xd800-0xdfff are ill-formed. (D91 from Unicode
> 5.0/5.1, section 3.9)
>
> So, no, it is not a code point too.

UTF-16
D91 UTF-16 encoding form: The Unicode encoding form that assigns each Unicode
scalar value in the ranges U+0000..U+D7FF and U+E000..U+FFFF to a single
unsigned 16-bit code unit with the same numeric value as the Unicode scalar
value, and that assigns each Unicode scalar value in the range
U+10000..U+10FFFF to a surrogate pair, according to Table 3-5.
* In UTF-16, the code point sequence <004D, 0430, 4E8C, 10302> is represented
  as <004D 0430 4E8C D800 DF02>, where <D800 DF02> corresponds to U+10302.
* Because surrogate code points are not Unicode scalar values, isolated UTF-16
  code units in the range 0xD800..0xDFFF are ill-formed.

In the context of UTF-8 or UTF-32, a Unicode scalar value is a single
code point of a valid character (more or less) and a code unit is the
base unit (1 and 4 bytes respectively) of which 1 or more combine to
form a code point.

In UTF-16, code point becomes synonymous with code unit and Unicode
scalar value becomes one or more code points. WTF?

-- 
Adam Olsen, aka Rhamphoryncus

From mal at egenix.com  Thu Jul  3 21:07:04 2008
From: mal at egenix.com (M.-A. Lemburg)
Date: Thu, 03 Jul 2008 21:07:04 +0200
Subject: [Python-Dev] UCS2/UCS4 default
In-Reply-To:
References: <20080702182215.GA62693@nexus.in-nomine.org>
	<20080702183541.GB62693@nexus.in-nomine.org>
	<20080703104813.GF62693@nexus.in-nomine.org>
	<486CC881.5090902@gmail.com>
	<486CCD66.70906@egenix.com>
	<20080703132146.GI62693@nexus.in-nomine.org>
	<486CDAD5.1060506@egenix.com>
Message-ID: <486D2358.1010604@egenix.com>

On 2008-07-03 19:21, Adam Olsen wrote:
> On Thu, Jul 3, 2008 at 7:57 AM, M.-A. Lemburg wrote:
>> On 2008-07-03 15:21, Jeroen Ruigrok van der Werven wrote:
>>> -On [20080703 15:00], M.-A.
Lemburg (mal at egenix.com) wrote: >>>> Unicode is full of combining code points - if you break such a sequence, >>>> the output will be just as wrong; regardless of UCS2 vs. UCS4. >>> In my opinion you are confusing two related, but very separate things >>> here. >>> Combining characters have nothing to do with breaking up the encoding of a >>> single codepoint. Sure enough, if you arbitrarily slice up codepoints that >>> consist of combining characters then your result is indeed odd looking. >>> >>> I never said that nor is that the point I am making. >> Please remember that lone surrogate pair code points are perfectly >> valid Unicode code points, nevertheless. Just as a lone combining >> code point is valid on its own. > > That is a big part of these problems. For all practical purposes, a > surrogate is like a UTF-8 code unit, and must be handled the same way, > so why the heck do they confuse everybody by saying "oh, it's a code > point too!"? You have to take that up with the Unicode consortium :-) It would have been better not to add surrogates to the standard at all. To be fair, I don't think that anybody seriously assumed at the time that more than 16 bits would be needed. In practice, you do need to be able to build Unicode strings that contain half a surrogate pair (i.e. a single code point) or a combining code point without its anchor code point, so trying to be smart about detecting surrogates is going to create more confusion than it does good, e.g.

>>> x1 = u'\udbc0'
>>> x2 = u'\udc00'
>>> x1
u'\udbc0'
>>> x2
u'\udc00'
>>> len(x1)
1
>>> len(x2)
1

Having len(x1+x2) == 1 wouldn't be right and would break all sorts of assumptions you normally make about string concatenation. Which is why len(x1+x2) gives 2 in both UCS2 and UCS4 builds.
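For readers who want to try this today, here is a minimal sketch of the same point. It assumes a current Python 3 interpreter (3.4 or later) rather than the 2.x builds discussed in this thread, and the variable names are only illustrative. It shows that concatenating two lone surrogates yields a string of length 2, and that the pair only becomes a single code point when it is pushed through a UTF-16 round trip.

```python
# Assumption: Python 3.4+, where str literals are Unicode and the
# 'surrogatepass' error handler works with the UTF-16 codecs.
high = '\udbc0'   # lone high (lead) surrogate
low = '\udc00'    # lone low (trail) surrogate

# Concatenation does not merge the two halves.
pair = high + low
assert len(high) == 1 and len(low) == 1
assert len(pair) == 2

# A UTF-16 round trip is what combines the pair into one code point.
# 'surrogatepass' lets the encoder accept the surrogates at all.
combined = pair.encode('utf-16-le', 'surrogatepass').decode('utf-16-le')
assert combined == '\U00100000'
assert len(combined) == 1
```

The round trip is used here only to make the pairing explicit; it is not something the interpreter does implicitly on concatenation.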
The fact that u'\U00100000' can map to a length 1 Unicode string in UCS4 builds and a length 2 string in UCS2 builds is merely due to the fact that the unicode-escape codec (which converts the escaped string literal to a Unicode object) does know about surrogates and uses them to avoid exceptions. Programmers need to be aware of this fact, that's all... just like they need to be aware of differences between integer and float division, different behavior of classic and new-style classes, etc. etc. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Jul 03 2008) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ 2008-07-07: EuroPython 2008, Vilnius, Lithuania 3 days to go :::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,MacOSX for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 From mal at egenix.com Thu Jul 3 21:16:03 2008 From: mal at egenix.com (M.-A. Lemburg) Date: Thu, 03 Jul 2008 21:16:03 +0200 Subject: [Python-Dev] UCS2/UCS4 default In-Reply-To: <20080703173545.GF34192@nexus.in-nomine.org> References: <20080702182215.GA62693@nexus.in-nomine.org> <20080702183541.GB62693@nexus.in-nomine.org> <20080703104813.GF62693@nexus.in-nomine.org> <486CC881.5090902@gmail.com> <486CCD66.70906@egenix.com> <20080703132146.GI62693@nexus.in-nomine.org> <486CDAD5.1060506@egenix.com> <20080703173545.GF34192@nexus.in-nomine.org> Message-ID: <486D2573.3070301@egenix.com> On 2008-07-03 19:35, Jeroen Ruigrok van der Werven wrote: > -On [20080703 19:21], Adam Olsen (rhamph at gmail.com) wrote: >> On Thu, Jul 3, 2008 at 7:57 AM, M.-A.
Lemburg wrote: >>> Please remember that lone surrogate pair code points are perfectly >>> valid Unicode code points, nevertheless. Just as a lone combining >>> code point is valid on its own. >> That is a big part of these problems. For all practical purposes, a >> surrogate is like a UTF-8 code unit, and must be handled the same way, >> so why the heck do they confuse everybody by saying "oh, it's a code >> point too!"? > > Because surrogate code points are not Unicode scalar values, isolated UTF-16 > code units in the range 0xd800-0xdfff are ill-formed. (D91 from Unicode > 5.0/5.1, section 3.9) True. They are not valid UTF-16 code units, but a code unit is just a storage byte representation of a Unicode transformation... """ Code Unit. The minimal bit combination that can represent a unit of encoded text for processing or interchange. The Unicode Standard uses 8-bit code units in the UTF-8 encoding form, 16-bit code units in the UTF-16 encoding form, and 32-bit code units in the UTF-32 encoding form. (See definition D77 in Section 3.9, Unicode Encoding Forms.) """ That's not the same thing as a code point which is an assignment of a slot in the Unicode character set... """ Code Point. Any value in the Unicode codespace; that is, the range of integers from 0 to 10FFFF[16]. (See definition D10 in Section 3.4, Characters and Encoding.) """ Reference: http://www.unicode.org/glossary/ Also see Chapter 3.4 (http://www.unicode.org/versions/Unicode5.0.0/ch03.pdf#G2212): """ Surrogate code points and noncharacters are considered assigned code points, but not assigned characters. """ -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Jul 03 2008) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ...
http://python.egenix.com/ ________________________________________________________________________ 2008-07-07: EuroPython 2008, Vilnius, Lithuania 3 days to go :::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,MacOSX for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 From facundobatista at gmail.com Thu Jul 3 21:16:07 2008 From: facundobatista at gmail.com (Facundo Batista) Date: Thu, 3 Jul 2008 16:16:07 -0300 Subject: [Python-Dev] [PyCon-Organizers] us.pycon.org down? In-Reply-To: <4335d2c40807031140o61bed415s2588f1e97f2aa655@mail.gmail.com> References: <4335d2c40807031032v6804a6e0pc75f02204de9bb3d@mail.gmail.com> <4335d2c40807031140o61bed415s2588f1e97f2aa655@mail.gmail.com> Message-ID: 2008/7/3 David Goodger : > Jeff fixed it. URL rewriting was off by mistake. Thanks! :) -- . Facundo Blog: http://www.taniquetil.com.ar/plog/ PyAr: http://www.python.org/ar/ From mal at egenix.com Thu Jul 3 21:24:40 2008 From: mal at egenix.com (M.-A. Lemburg) Date: Thu, 03 Jul 2008 21:24:40 +0200 Subject: [Python-Dev] UCS2/UCS4 default In-Reply-To: References: <20080702141328.GW62693@nexus.in-nomine.org> <20080702182215.GA62693@nexus.in-nomine.org> <20080702183541.GB62693@nexus.in-nomine.org> <20080703104813.GF62693@nexus.in-nomine.org> <486CC881.5090902@gmail.com> <031901c8dd12$b7258b60$2570a220$@com.au> Message-ID: <486D2778.8090503@egenix.com> On 2008-07-03 19:44, Terry Reedy wrote: > The premise of this thread seems to be that the majority should suffer > for the benefit of a few. That is not Python's philosophy. In reality, most Unixes ship with UCS4 builds of Python. Windows and Mac OS X ship with UCS2 builds. Still, anyone is free to build their own favorite version - that's freedom of choice, which is good. Programmers just need to be made aware of the differences in UCS2 and UCS4 builds and deal with it. 
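One way to become aware of the difference at run time is to look at sys.maxunicode. The sketch below is only an illustration: unicode_build is a hypothetical helper name, not something from this thread. On the interpreters discussed here, narrow (UCS2) builds report 0xFFFF and wide (UCS4) builds report 0x10FFFF; note that since Python 3.3 the value is always 0x10FFFF, so the distinction only matters on older interpreters.

```python
import sys


def unicode_build(maxunicode=sys.maxunicode):
    """Classify a build by its largest supported code point.

    Hypothetical helper: the parameter defaults to the running
    interpreter's value and can be overridden to see both outcomes.
    """
    # Narrow builds stop at the end of the BMP; wide builds cover
    # the full code space up to U+10FFFF.
    return "UCS4" if maxunicode > 0xFFFF else "UCS2"


print(unicode_build())
```

Code that must behave identically on both kinds of build would branch on this once, at import time, rather than probing string lengths.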
Here's a talk I've given many, many times over the years which explains some of the details that a Python programmer needs to know when dealing with Unicode: http://www.egenix.com/files/python/PyConUK2007-Developing-Unicode-aware-applications-in-Python.pdf Perhaps I should add a section on UCS2 vs. UCS4 the next time around ;-) -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Jul 03 2008) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ 2008-07-07: EuroPython 2008, Vilnius, Lithuania 3 days to go :::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,MacOSX for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 From jeremy at 54oaks.com Thu Jul 3 21:35:33 2008 From: jeremy at 54oaks.com (Jeremy Link) Date: Thu, 3 Jul 2008 12:35:33 -0700 Subject: [Python-Dev] problems compiling ctypes Message-ID: <010601c8dd43$f71ab890$8101a8c0@bocaron> I've grabbed the latest libffi that contains support for the ARM processor. I then enable FFI_CLOSURES in the arm/ffi.c file. When I do this, I get compilation errors that it is missing ffi_prep_closure. Is ffi.c up to date for supporting the ARM platform? Not sure if there is a simple configuration change in one of the files that will fix *everything* or if ffi.c just doesn't support ARM yet and so it needs to be developed/revamped. Thanks for any help. -------------- next part -------------- An HTML attachment was scrubbed...
URL: From steve at holdenweb.com Thu Jul 3 21:59:08 2008 From: steve at holdenweb.com (Steve Holden) Date: Thu, 03 Jul 2008 15:59:08 -0400 Subject: [Python-Dev] UCS2/UCS4 default In-Reply-To: <486D2778.8090503@egenix.com> References: <20080702141328.GW62693@nexus.in-nomine.org> <20080702182215.GA62693@nexus.in-nomine.org> <20080702183541.GB62693@nexus.in-nomine.org> <20080703104813.GF62693@nexus.in-nomine.org> <486CC881.5090902@gmail.com> <031901c8dd12$b7258b60$2570a220$@com.au> <486D2778.8090503@egenix.com> Message-ID: M.-A. Lemburg wrote: > On 2008-07-03 19:44, Terry Reedy wrote: >> The premise of this thread seems to be that the majority should suffer >> for the benefit of a few. That is not Python's philosophy. > > In reality, most Unixes ship with UCS4 builds of Python. Windows > and Mac OS X ship with UCS2 builds. Still, anyone is free to build > their own favorite version - that's freedom of choice, which is good. > > Programmers just need to be made aware of the differences in UCS2 > and UCS4 builds and deal with it. > > Here's talk I've given many many times over the years which explains > some of the details that a Python programmer needs to know when dealing > with Unicode: > > http://www.egenix.com/files/python/PyConUK2007-Developing-Unicode-aware-applications-in-Python.pdf > > > Perhaps I should add a section on UCS2 vs. UCS4 the next time around ;-) > The indications are that would be helpful to many people (including myself). 
regards Steve -- Steve Holden +1 571 484 6266 +1 800 494 3119 Holden Web LLC http://www.holdenweb.com/ From tjreedy at udel.edu Thu Jul 3 23:01:48 2008 From: tjreedy at udel.edu (Terry Reedy) Date: Thu, 03 Jul 2008 17:01:48 -0400 Subject: [Python-Dev] UCS2/UCS4 default In-Reply-To: References: <20080702141328.GW62693@nexus.in-nomine.org> <20080702183541.GB62693@nexus.in-nomine.org> <20080703104813.GF62693@nexus.in-nomine.org> <486CC881.5090902@gmail.com> <031901c8dd12$b7258b60$2570a220$@com.au> Message-ID: Guido van Rossum wrote: > On Thu, Jul 3, 2008 at 10:44 AM, Terry Reedy wrote: >> The premise of this thread seems to be that the majority should suffer for >> the benefit of a few. That is not Python's philosophy. The premise is the OP's idea that Python should switch to all UCS4 to create a more pure ('ideal') situation or the idea that len(s) should count codepoints (correct term?) for all builds as a matter of purity even though it would be time-costly on 16-bit builds as a matter of practicality. > Who are the many here? Those who are happy with 3.0 strings as they are for their systems and who would not benefit from the proposed change. In other words, what you say below. > Who are the few? Those who are stuck with 16-bit builds and who would benefit from 32-bit builds because they need to use non-basic-plane chars and need to use the operations for which a change would make a positive difference. In my opinion, such people with Windows should at least install Linux + UCS4 Python as an alternate install. > I'd venture that (at least for > the foreseeable future, say, until China will finally have taken over > the role of the US as the de-facto dominant super power :-) the many > are people whose app will never see a Unicode character outside the > BMP, or who do such minimal string processing that their code doesn't > care whether it's handling UTF-16-encoded data. Just what I meant. > Python's philosophy is also Practicality Beats Purity.
Just what I meant, in the form 'Purity does not beat Practicality'. Having summarized, perhaps too briefly, why Python's basic unicode implementation would not change in the near future, I went on to my main point, which is that better docs might be an alternative solution to the problems raised. tjr From martin at v.loewis.de Thu Jul 3 23:15:40 2008 From: martin at v.loewis.de ("Martin v. Löwis") Date: Thu, 03 Jul 2008 23:15:40 +0200 Subject: [Python-Dev] problems compiling ctypes In-Reply-To: <010601c8dd43$f71ab890$8101a8c0@bocaron> References: <010601c8dd43$f71ab890$8101a8c0@bocaron> Message-ID: <486D417C.9060503@v.loewis.de> > Thanks for any help. This list (python-dev) is not for getting help, but for providing it. So if you have patches that you would like to discuss, please go ahead. As you are seeking help, please use python-list at python.org (aka news:comp.lang.python) instead. Regards, Martin From rhamph at gmail.com Fri Jul 4 00:00:56 2008 From: rhamph at gmail.com (Adam Olsen) Date: Thu, 3 Jul 2008 16:00:56 -0600 Subject: [Python-Dev] UCS2/UCS4 default In-Reply-To: References: <20080702141328.GW62693@nexus.in-nomine.org> <20080703104813.GF62693@nexus.in-nomine.org> <486CC881.5090902@gmail.com> <031901c8dd12$b7258b60$2570a220$@com.au> Message-ID: On Thu, Jul 3, 2008 at 3:01 PM, Terry Reedy wrote: > > The premise is the OP's idea that Python should switch to all UCS4 to create > a more pure ('ideal') situation or the idea that len(s) should count > codepoints (correct term?) for all builds as a matter of purity even though > it would be time-costly on 16-bit builds as a matter of practicality. Wrong term - code units and code points are equivalent in UTF-16 and UTF-32. What you're looking for is unicode scalar values.
-- Adam Olsen, aka Rhamphoryncus From guido at python.org Fri Jul 4 00:21:46 2008 From: guido at python.org (Guido van Rossum) Date: Thu, 3 Jul 2008 15:21:46 -0700 Subject: [Python-Dev] UCS2/UCS4 default In-Reply-To: References: <20080702141328.GW62693@nexus.in-nomine.org> <20080703104813.GF62693@nexus.in-nomine.org> <486CC881.5090902@gmail.com> <031901c8dd12$b7258b60$2570a220$@com.au> Message-ID: On Thu, Jul 3, 2008 at 3:00 PM, Adam Olsen wrote: > On Thu, Jul 3, 2008 at 3:01 PM, Terry Reedy wrote: >> >> The premise is the OP's idea that Python should switch to all UCS4 to create >> a more pure ('ideal') situation or the idea that len(s) should count >> codepoints (correct term?) for all builds as a matter of purity even though >> it would be time-costly on 16-bit builds as a matter of practicality. > > Wrong term - code units and code points are equivalent in UTF-16 and > UTF-32. What you're looking for is unicode scalar values. I don't think so. I have in my lap the Unicode 5.0 standard, which on page 102, under UTF-16, states (amongst others): """ * In UTF-16, the code point sequence <004D, 0430, 4E8C, 10302> is represented as <004D 0430 4E8C D800 DF02>, where <D800 DF02> corresponds to U+10302. * Because surrogate code points are not Unicode scalar values, isolated UTF-16 code units in the range D800[16]..DFFF[16] are ill-formed. """ From this I understand they distinguish carefully between code points and code units -- D800 is a code unit but not a code point, 10302 is a code point but not a (UTF-16) code unit. OTOH outside the context of UTF-8, the surrogates are also referred to as "reserved code points" (e.g. in Table 2-3 on page 27, "Types of Code Points"). I think the best thing we can do is to use "code points" to refer to characters and "code units" to the individual 16-bit values in the UTF-16 encoding; this seems compatible with usage elsewhere in this thread by most folks. Also see http://unicode.org/glossary/: """ Code Point.
Any value in the Unicode codespace; that is, the range of integers from 0 to 10FFFF[16]. (See definition D10 in Section 3.4, Characters and Encoding.) . . . Code Unit. The minimal bit combination that can represent a unit of encoded text for processing or interchange. The Unicode Standard uses 8-bit code units in the UTF-8 encoding form, 16-bit code units in the UTF-16 encoding form, and 32-bit code units in the UTF-32 encoding form. (See definition D77 in Section 3.9, Unicode Encoding Forms.) """ -- --Guido van Rossum (home page: http://www.python.org/~guido/) From martin at v.loewis.de Fri Jul 4 00:31:49 2008 From: martin at v.loewis.de ("Martin v. Löwis") Date: Fri, 04 Jul 2008 00:31:49 +0200 Subject: [Python-Dev] UCS2/UCS4 default In-Reply-To: References: <20080702141328.GW62693@nexus.in-nomine.org> <20080703104813.GF62693@nexus.in-nomine.org> <486CC881.5090902@gmail.com> <031901c8dd12$b7258b60$2570a220$@com.au> Message-ID: <486D5355.4000104@v.loewis.de> > Wrong term - code units and code points are equivalent in UTF-16 and > UTF-32. What you're looking for is unicode scalar values. How so? Section 2.5, UTF-16 says "code points in the supplementary planes, in the range U+10000..U+10FFFF, are represented as pairs of 16-bit code units." So clearly, code points in Unicode range from U+0000..U+10FFFF, independent of encoding form. In UTF-16, code units range from 0..65535. OTOH, "unicode scalar value" is nearly synonymous to "code point": D76 Unicode Scalar Value. Any Unicode code point except high-surrogate and low-surrogate code points. So codepoint in Terry's message was the right term.
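The definitions quoted above can be written down as a few integer range checks. This is only a sketch of the terminology, with hypothetical function names that do not come from any library: a code point is any integer in 0..0x10FFFF, a scalar value is a code point outside the surrogate range 0xD800..0xDFFF, and UTF-16 needs one 16-bit code unit for a scalar value up to 0xFFFF and two above it.

```python
def is_code_point(value):
    """True for any value in the Unicode codespace (D10)."""
    return 0 <= value <= 0x10FFFF


def is_surrogate(value):
    """True for high- and low-surrogate code points."""
    return 0xD800 <= value <= 0xDFFF


def is_scalar_value(value):
    """True for any code point that is not a surrogate (D76)."""
    return is_code_point(value) and not is_surrogate(value)


def utf16_code_units(value):
    """Number of 16-bit code units UTF-16 uses for one scalar value."""
    if not is_scalar_value(value):
        raise ValueError("not a Unicode scalar value: %#x" % value)
    return 1 if value <= 0xFFFF else 2
```

Under these definitions 0xD800 is a code point but not a scalar value, and U+10302 is a scalar value that takes two UTF-16 code units.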
Regards, Martin From rhamph at gmail.com Fri Jul 4 01:50:51 2008 From: rhamph at gmail.com (Adam Olsen) Date: Thu, 3 Jul 2008 17:50:51 -0600 Subject: [Python-Dev] UCS2/UCS4 default In-Reply-To: References: <20080702141328.GW62693@nexus.in-nomine.org> <486CC881.5090902@gmail.com> <031901c8dd12$b7258b60$2570a220$@com.au> Message-ID: On Thu, Jul 3, 2008 at 4:21 PM, Guido van Rossum wrote: > On Thu, Jul 3, 2008 at 3:00 PM, Adam Olsen wrote: >> On Thu, Jul 3, 2008 at 3:01 PM, Terry Reedy wrote: >>> >>> The premise is the OP's idea that Python should switch to all UCS4 to create >>> a more pure ('ideal') situation or the idea that len(s) should count >>> codepoints (correct term?) for all builds as a matter of purity even though >>> it would be time-costly on 16-bit builds as a matter of practicality. >> >> Wrong term - code units and code points are equivalent in UTF-16 and >> UTF-32. What you're looking for is unicode scalar values. > > I don't think so. I have in my lap the Unicode 5.0 standard, which on > page 102, under UTF-16, states (amongst others): > > """ > * In UTF-16, the code point sequence <004D, 0430, 4E8C, 10302> is > represented as <004D 0430 4E8C D800 DF02>, where <D800 DF02> > corresponds to U+10302. The literal interpretation is that the U+10302 code point should get expanded into <D800 DF02>. It doesn't say if <D800 DF02> is a pair of code units or a pair of code points. > * Because surrogate code points are not Unicode scalar values, > isolated UTF-16 code units in the range D800[16]..DFFF[16] are > ill-formed. > """ So a lone surrogate code unit is not a valid scalar. It also implies surrogate code points exist, rather than ruling them out. > From this I understand they distinguish carefully between code points > and code units -- D800 is a code unit but not a code point, 10302 is a > code point but not a (UTF-16) code unit. I disagree. They switch between code point and code unit arbitrarily, rather than saying surrogate code points don't exist.
> OTOH outside the context of UTF-8, the surrogates are also referred to > as "reserved code points" (e.g. in Table 2-3 on page 27, "Types of > Code Points"). You mean outside the context of UTF-16? Regarding them as reserved and lone surrogates as ill-formed code units would have been simpler, but alas, is not the case. Regarding changes in 5.1 (http://www.unicode.org/versions/Unicode5.1.0/), I can find this bit to give some context: Rendering Default Ignorable Code Points Update the last paragraph on p. 192 of The Unicode Standard, Version 5.0, in Section 5.20, Default Ignorable Code Points, to read as follows: Replacement Text An implementation should ignore all default ignorable code points in rendering whenever it does not support those code points, whether they are assigned or not. In previous versions of the Unicode Standard, surrogate code points, private use code points, and some control characters were also default ignorable code points. However, to avoid security problems, such characters always should be displayed with a missing glyph, so that there is a visible indication of their presence in the text. In Unicode 5.1 these code points are no longer default ignorable code points. For more information, see UTR #36, "Unicode Security Considerations." Clearly they act as if surrogate code points exist. Finally, we find this in the glossary: Unicode Scalar Value. Any Unicode code point except high-surrogate and low-surrogate code points. In other words, the ranges of integers 0 to D7FF[16] and E000[16] to 10FFFF[16] inclusive. (See definition D76 in Section 3.9, Unicode Encoding Forms.) Clearly, each surrogate is a valid code point, regardless of encoding. A surrogate pair simultaneously represents both one code point (the scalar value) and two code points (the surrogate code points). To be unambiguous you must instead use either code units (always 2 for UTF-16) or scalar values (always 1 in any encoding).
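To make the two counts concrete, here is a small sketch that reproduces the example sequence from the standard. It assumes Python 3.3 or later, where len() of a str counts code points regardless of build; utf16_units is a hypothetical helper name. It lists the UTF-16 code units of a string, so the count of code units and the count of scalar values can be compared side by side.

```python
def utf16_units(text):
    """Return the UTF-16 code units of text as a list of integers.

    Hypothetical helper: encodes to big-endian UTF-16 (no BOM) and
    reads the result back two bytes at a time.
    """
    data = text.encode('utf-16-be')
    return [int.from_bytes(data[i:i + 2], 'big')
            for i in range(0, len(data), 2)]


# The code point sequence <004D, 0430, 4E8C, 10302> from the standard.
sample = '\u004d\u0430\u4e8c\U00010302'
units = utf16_units(sample)

# Five code units, because U+10302 becomes the pair <D800 DF02> ...
assert units == [0x004D, 0x0430, 0x4E8C, 0xD800, 0xDF02]
# ... but four scalar values.
assert len(sample) == 4
```

For a single supplementary character the same helper gives two code units and one scalar value, which is the "always 2" versus "always 1" distinction drawn above.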
The OP wanted it to always be 1, so the correct unambiguous term is scalar value. > I think the best thing we can do is to use "code points" to refer to > characters and "code units" to the individual 16-bit values in the > UTF-16 encoding; this seems compatible with usage elsewhere in this > thread by most folks. > > Also see http://unicode.org/glossary/: > > """ > Code Point. Any value in the Unicode codespace; that is, the range of > integers from 0 to 10FFFF16. (See definition D10 in Section 3.4, > Characters and Encoding.) > . > . > . > Code Unit. The minimal bit combination that can represent a unit of > encoded text for processing or interchange. The Unicode Standard uses > 8-bit code units in the UTF-8 encoding form, 16-bit code units in the > UTF-16 encoding form, and 32-bit code units in the UTF-32 encoding > form. (See definition D77 in Section 3.9, Unicode Encoding Forms.) > """ > > -- > --Guido van Rossum (home page: http://www.python.org/~guido/) > -- Adam Olsen, aka Rhamphoryncus From guido at python.org Fri Jul 4 05:26:16 2008 From: guido at python.org (Guido van Rossum) Date: Thu, 3 Jul 2008 20:26:16 -0700 Subject: [Python-Dev] UCS2/UCS4 default In-Reply-To: References: <20080702141328.GW62693@nexus.in-nomine.org> <031901c8dd12$b7258b60$2570a220$@com.au> Message-ID: On Thu, Jul 3, 2008 at 4:50 PM, Adam Olsen wrote: > Clearly, each surrogate is a valid code point, regardless of encoding. > A surrogate pair simultaneously represents both one code point (the > scalar value) and two code points (the surrogate code points). To be > unambiguous you must instead use either code units (always 2 for > UTF-16) or scalar values (always 1 in any encoding). > > The OP wanted it to always be 1, so the correct unambiguous term is > scalar value. 
Fine, if you want to be completely unambiguous, apparently you can't use the word code point but you have to use either scalar values (always Unicode characters) or code units (always part of an encoding, and 8, 16 or 32 bits). Regardless of what the OP might want, len() of a surrogate pair will return 2 (since it counts code units), and we'll have to provide another API to count scalar values / characters that sees a surrogate pair as one. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From dickinsm at gmail.com Fri Jul 4 11:39:55 2008 From: dickinsm at gmail.com (Mark Dickinson) Date: Fri, 4 Jul 2008 10:39:55 +0100 Subject: [Python-Dev] [Python-checkins] r64424 - inpython/trunk:Include/object.h Lib/test/test_sys.pyMisc/NEWSObjects/intobject.c Objects/longobject.cObjects/typeobject.cPython/bltinmodule.c In-Reply-To: References: <20080620041816.4D5E81E4002@bag.python.org> <4863F43C.2080904@v.loewis.de> <5c6f2a5d0806261317m3c8b848dm6e8071d8b841fa59@mail.gmail.com> <4863FC7B.6070903@v.loewis.de> <5c6f2a5d0806261346o7af44dc6g449d6bece2d75842@mail.gmail.com> <04BCC25BF0EC4DFBB06FEEA568199FB1@RaymondLaptop1> <5c6f2a5d0806261453n6ebe20b7yb26ca69c27f75517@mail.gmail.com> <58A6CFFCB6AC4A84A6C86809A50295FF@RaymondLaptop1> Message-ID: <5c6f2a5d0807040239s22fc6b7cx31f27e916b6ba6a2@mail.gmail.com> On Sun, Jun 29, 2008 at 3:12 AM, Alex Martelli wrote: > On Sat, Jun 28, 2008 at 4:46 PM, Raymond Hettinger wrote: >> Is everyone agreed on a tohex/fromhex pair using the C99 notation as >> recommended in 754R? > > Dunno about everyone, but I'm +1 on that. > > >> Are you thinking of math module functions or as a method and classmethod on >> floats? > > I'd prefer math module functions. I'm halfway through implementing this as a pair of float methods. Are there compelling reasons to prefer math module functions over float methods, or vice versa?
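For readers following along later: the float-method spelling discussed here is available in current Python as the instance method float.hex() and the class method float.fromhex(). The short sketch below assumes such an interpreter and only illustrates the C99-style notation and the exact round trip; the variable names are illustrative.

```python
x = 0.1

# hex() gives an exact C99-style hexadecimal representation of the float.
text = x.hex()

# fromhex() is a class method and recovers exactly the same value.
assert float.fromhex(text) == x

# The notation is mantissa 'p' binary exponent: 1.5 * 2**1 == 3.0.
assert float.fromhex('0x1.8p+1') == 3.0
assert (1.0).hex() == '0x1.0000000000000p+0'
```

Because the representation is exact, it avoids the rounding questions that come with decimal repr() round trips.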
Personally, I'm leaning slightly towards float methods: for me, these conversions are important enough to belong in the core language. But I don't have strong feelings either way. Mark From mal at egenix.com Fri Jul 4 12:08:14 2008 From: mal at egenix.com (M.-A. Lemburg) Date: Fri, 04 Jul 2008 12:08:14 +0200 Subject: [Python-Dev] UCS2/UCS4 default In-Reply-To: References: <20080702141328.GW62693@nexus.in-nomine.org> <20080702182215.GA62693@nexus.in-nomine.org> <20080702183541.GB62693@nexus.in-nomine.org> <20080703104813.GF62693@nexus.in-nomine.org> <486CC881.5090902@gmail.com> <031901c8dd12$b7258b60$2570a220$@com.au> <486D2778.8090503@egenix.com> Message-ID: <486DF68E.7090000@egenix.com> On 2008-07-03 21:59, Steve Holden wrote: > M.-A. Lemburg wrote: >> On 2008-07-03 19:44, Terry Reedy wrote: >>> The premise of this thread seems to be that the majority should >>> suffer for the benefit of a few. That is not Python's philosophy. >> >> In reality, most Unixes ship with UCS4 builds of Python. Windows >> and Mac OS X ship with UCS2 builds. Still, anyone is free to build >> their own favorite version - that's freedom of choice, which is good. >> >> Programmers just need to be made aware of the differences in UCS2 >> and UCS4 builds and deal with it. >> >> Here's talk I've given many many times over the years which explains >> some of the details that a Python programmer needs to know when dealing >> with Unicode: >> >> http://www.egenix.com/files/python/PyConUK2007-Developing-Unicode-aware-applications-in-Python.pdf >> >> >> Perhaps I should add a section on UCS2 vs. UCS4 the next time around ;-) > > The indications are that would be helpful to many people (including > myself). Ok, I'll add one for one of the next conferences. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Jul 04 2008) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... 
http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ 2008-07-07: EuroPython 2008, Vilnius, Lithuania 2 days to go :::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,MacOSX for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 From guido at python.org Fri Jul 4 15:49:07 2008 From: guido at python.org (Guido van Rossum) Date: Fri, 4 Jul 2008 06:49:07 -0700 Subject: [Python-Dev] [Python-checkins] r64424 - inpython/trunk:Include/object.h Lib/test/test_sys.pyMisc/NEWSObjects/intobject.c Objects/longobject.cObjects/typeobject.cPython/bltinmodule.c In-Reply-To: <5c6f2a5d0807040239s22fc6b7cx31f27e916b6ba6a2@mail.gmail.com> References: <20080620041816.4D5E81E4002@bag.python.org> <4863F43C.2080904@v.loewis.de> <5c6f2a5d0806261317m3c8b848dm6e8071d8b841fa59@mail.gmail.com> <4863FC7B.6070903@v.loewis.de> <5c6f2a5d0806261346o7af44dc6g449d6bece2d75842@mail.gmail.com> <04BCC25BF0EC4DFBB06FEEA568199FB1@RaymondLaptop1> <5c6f2a5d0806261453n6ebe20b7yb26ca69c27f75517@mail.gmail.com> <58A6CFFCB6AC4A84A6C86809A50295FF@RaymondLaptop1> <5c6f2a5d0807040239s22fc6b7cx31f27e916b6ba6a2@mail.gmail.com> Message-ID: Float methods are fine. On Fri, Jul 4, 2008 at 2:39 AM, Mark Dickinson wrote: > On Sun, Jun 29, 2008 at 3:12 AM, Alex Martelli wrote: >> On Sat, Jun 28, 2008 at 4:46 PM, Raymond Hettinger wrote: >>> Is everyone agreed on a tohex/fromhex pair using the C99 notation as >>> recommended in 754R? >> >> Dunno about everyone, but I'm +1 on that. >> >> >>> Are you thinking of math module functions or as a method and classmethod on >>> floats? >> >> I'd prefer math modules functions. > > I'm halfway through implementing this as a pair of float methods. Are there > compelling reasons to prefer math module functions over float methods, or > vice versa? 
> > Personally, I'm leaning slightly towards float methods: for me, these > conversions are important enough to belong in the core language. But I > don't have strong feelings either way. > > Mark > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/guido%40python.org > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From mail at timgolden.me.uk Fri Jul 4 17:24:32 2008 From: mail at timgolden.me.uk (Tim Golden) Date: Fri, 04 Jul 2008 16:24:32 +0100 Subject: [Python-Dev] ctypes assertion failure Message-ID: <486E40B0.4020608@timgolden.me.uk> This problem was raised on the comtypes-users list as it prevents comtypes from being imported on Python 2.6 at the moment. http://bugs.python.org/issue3258 I'll try to find the time to step through the code to work out what's going on, but it's inside the innards of ctypes which I've never looked into before. Could someone confirm at a glance whether this should be given a high priority, please? It results in an assertion error in debug mode, a SystemError in non-debug referring to a NULL return without an Exception set. Thanks TJG From status at bugs.python.org Fri Jul 4 18:06:28 2008 From: status at bugs.python.org (Python tracker) Date: Fri, 4 Jul 2008 18:06:28 +0200 (CEST) Subject: [Python-Dev] Summary of Python tracker Issues Message-ID: <20080704160628.7A783780B1@psf.upfronthosting.co.za> ACTIVITY SUMMARY (06/27/08 - 07/04/08) Python tracker at http://bugs.python.org/ To view or respond to any of the issues listed below, click on the issue number. Do NOT respond to this message. 1941 open (+33) / 13165 closed (+34) / 15106 total (+67) Open issues with patches: 607 Average duration of open issues: 704 days. Median duration of open issues: 1570 days.
Open Issues Breakdown
   open      1915 (+33)
   pending     26 ( +0)

Issues Created Or Reopened (71)
_______________________________

isinstance(anything, MetaclassThatDefinesInstancecheck) raises i 06/30/08
       http://bugs.python.org/issue2325    reopened jyasskin    patch
test_multiprocessing hangs on OS X 10.5.3 07/02/08
       http://bugs.python.org/issue3088    reopened jnoller    patch
test_multiprocessing causes test_ctypes to fail 07/02/08
       http://bugs.python.org/issue3125    reopened jnoller    patch
"Quick search" box renders too long on FireFox 3 06/27/08
       http://bugs.python.org/issue3154    reopened benjamin.peterson
make text is broken 06/27/08 CLOSED
       http://bugs.python.org/issue3217    created benjamin.peterson
2to3 Fix_imports optimization 06/27/08
       http://bugs.python.org/issue3218    created nedds    patch
repeated keyword arguments 06/27/08 CLOSED
       http://bugs.python.org/issue3219    created gangesmaster    patch
Improve Bytes and Byte Array Methods doc 06/27/08 CLOSED
       http://bugs.python.org/issue3220    created tjreedy
SystemError: Parent module 'foo' not loaded on import statement 06/27/08
       http://bugs.python.org/issue3221    created schmir
inf*inf gives inf, but inf**2 gives overflow error 06/27/08 CLOSED
       http://bugs.python.org/issue3222    created ms
py3k warn on use of frame.f_exc* 06/27/08
       http://bugs.python.org/issue3223    created benjamin.peterson    easy
Small typo in 2.6 what's new 06/28/08 CLOSED
       http://bugs.python.org/issue3224    created catlee
backport python 3.0 language functionality to python 2.5 by addi 06/28/08 CLOSED
       http://bugs.python.org/issue3225    created kaizhu
can't install on OSX 10.4 06/28/08
       http://bugs.python.org/issue3226    created benjamin.peterson
os.environ.clear has no effect on child processes 06/28/08 CLOSED
       http://bugs.python.org/issue3227    created joe.p.cool
mailbox.mbox creates files with execute bit set 06/28/08
       http://bugs.python.org/issue3228    created pl
Language reference, class definitions: missing text in "Programm 06/28/08 CLOSED
       http://bugs.python.org/issue3229    created oefe
dictobject.c: inappropriate use of PySet_GET_SIZE? 06/28/08 CLOSED
       http://bugs.python.org/issue3230    created oefe
re.compile fails with some bytes patterns 06/28/08
       http://bugs.python.org/issue3231    created pitrou    patch
Wrong str->bytes conversion in Lib/encodings/idna.py 06/29/08
       http://bugs.python.org/issue3232    created pitrou
Timestamp stored in ZIP file not correct ? 06/29/08 CLOSED
       http://bugs.python.org/issue3233    created pythonmeister
subprocess.py strips last character when raising an AttributeErr 06/29/08 CLOSED
       http://bugs.python.org/issue3234    created mmokrejs
Improve subprocess module usage 06/29/08 CLOSED
       http://bugs.python.org/issue3235    created mmokrejs
ints contructed from strings don't use the smallint constants 06/29/08 CLOSED
       http://bugs.python.org/issue3236    created pitrou
idlelib3.0 still using xrange 06/29/08 CLOSED
       http://bugs.python.org/issue3237    created tjreedy
backport python 3.0 language functionality to python 2.6 by addi 06/29/08
       http://bugs.python.org/issue3238    created kaizhu
curses/textpad.py incorrectly and redundantly imports ascii 06/30/08
       http://bugs.python.org/issue3239    reopened facundobatista    patch
IDLE environment corrupts string.letters 06/30/08 CLOSED
       http://bugs.python.org/issue3240    created rupole
warnings module prints garbage 06/30/08 CLOSED
       http://bugs.python.org/issue3241    created schmir
Segfault in PyFile_SoftSpace/PyEval_EvalFrameEx with sys.stdout 06/30/08 CLOSED
       http://bugs.python.org/issue3242    reopened benjamin.peterson    patch
Support iterable bodies in httplib 06/30/08
       http://bugs.python.org/issue3243    created catlee
multipart/form-data encoding 06/30/08
       http://bugs.python.org/issue3244    created catlee
Memory leak on OS X 06/30/08 CLOSED
       http://bugs.python.org/issue3245    created fiddlerwoaroof
configure: WARNING: sys/socket.h: present but cannot be compiled 06/30/08
       http://bugs.python.org/issue3246    created rrochele
dir of an "_sre.SRE_Match" object not working 07/01/08 CLOSED
       http://bugs.python.org/issue3247    created vizcayno
ScrolledText can't be placed in a PanedWindow 07/01/08
       http://bugs.python.org/issue3248    created gpolo    patch
bug adding datetime.timedelta to datetime.date 07/01/08 CLOSED
       http://bugs.python.org/issue3249    created cjw296
datetime.time does not support arithmetic 07/01/08
       http://bugs.python.org/issue3250    created cjw296    patch
references are case insensitive 07/01/08 CLOSED
       http://bugs.python.org/issue3251    created tds333
str.tobytes() and bytes/bytearray.tostr() 07/01/08 CLOSED
       http://bugs.python.org/issue3252    created mark
shutil.move bahave unexpected in fat32 07/01/08
       http://bugs.python.org/issue3253    created grissiom
Suggestion: change default behavior of __ne__ 07/01/08 CLOSED
       http://bugs.python.org/issue3254    created cvp
[proposal] alternative for re.sub 07/02/08
       http://bugs.python.org/issue3255    created ocean-city
Multiprocessing docs are not 3.0-ready 07/02/08
       http://bugs.python.org/issue3256    created mishok13
"#define socklen_t int" in pyconfig.h 07/02/08 CLOSED
       http://bugs.python.org/issue3257    created fgoujeon
ctypes assertion failure in trunk 07/02/08
       http://bugs.python.org/issue3258    created tim.golden
fix_imports needs to be using the 'as' keyword 07/02/08 CLOSED
       http://bugs.python.org/issue3259    created brett.cannon
fix_imports does not handle intra-package renames 07/02/08
       http://bugs.python.org/issue3260    created brett.cannon
Lib/test/test_cookielib declares utf-8 encoding, but contains no 07/02/08 CLOSED
       http://bugs.python.org/issue3261    created leosoto
re.split doesn't split with zero-width regex 07/02/08
       http://bugs.python.org/issue3262    created mrabarnett    patch
Odd code fragment in ABC definitions 07/02/08 CLOSED
       http://bugs.python.org/issue3263    created rhettinger
Use -lcrypto instead of -lcrypt on Solaris 2.6 when available 07/02/08 CLOSED
       http://bugs.python.org/issue3264    created mmokrejs
Python-2.5.2/Modules/_ctypes/malloc_closure.c:70: error: `MAP_AN 07/03/08
       http://bugs.python.org/issue3265    created mmokrejs
Python-2.5.2/Modules/mmapmodule.c:915: error: `O_RDWR'
undeclare 07/03/08 http://bugs.python.org/issue3266 created mmokrejs yield in list comprehensions possibly broken in 3.0 07/03/08 CLOSED http://bugs.python.org/issue3267 created erickt Cleanup of tp_basicsize inheritance 07/03/08 http://bugs.python.org/issue3268 created Rhamphoryncus patch strptime() makes an error concerning second in arg 07/03/08 CLOSED http://bugs.python.org/issue3269 created nevgor test_multiprocessing: test_listener_client flakiness 07/03/08 http://bugs.python.org/issue3270 created jnoller patch iter.next() or iter.__next__() ? 07/03/08 CLOSED http://bugs.python.org/issue3271 created vizcayno Multiprocessing hangs when multiprocessing.Pool methods are call 07/03/08 http://bugs.python.org/issue3272 created mishok13 multiprocessing and meaningful errors 07/03/08 http://bugs.python.org/issue3273 created mishok13 Py_CLEAR(tmp) seg faults 07/03/08 http://bugs.python.org/issue3274 created stutzbach Control flow not optimized 07/03/08 CLOSED http://bugs.python.org/issue3275 created quotemstr httplib.HTTPConnection._send_request should not blindly assume d 07/04/08 http://bugs.python.org/issue3276 created ludvig.ericson patch socket's OOB data management is broken on FreeBSD 07/04/08 http://bugs.python.org/issue3277 created giampaolo.rodola socket's SO_OOBINLINE option does not work on FreeBSD 07/04/08 http://bugs.python.org/issue3278 created giampaolo.rodola import of site.py fails on startup 07/04/08 http://bugs.python.org/issue3279 created rupole %c format does not accept large numbers on ucs-2 builds 07/04/08 http://bugs.python.org/issue3280 created amaury.forgeotdarc support r"\" 07/04/08 CLOSED http://bugs.python.org/issue3281 created lidaobing Undefined unicode characters should be non-printable 07/04/08 CLOSED http://bugs.python.org/issue3282 created amaury.forgeotdarc multiprocessing.connection doesn't import AuthenticationError, w 07/04/08 http://bugs.python.org/issue3283 created mishok13 patch Issues Now Closed (61) ______________________ 
DeprecationWarning in zipfile.py while zipping 113000 files 216 days http://bugs.python.org/issue1526 loewis zipfile hangs on certain zip files 202 days http://bugs.python.org/issue1622 loewis patch Move Demo/classes/Rat.py to Lib/fractions.py and fix it up. 6 days http://bugs.python.org/issue1682 marketdickinson patch ZIP files with archive comments longer than 4k not recognized as 179 days http://bugs.python.org/issue1746 loewis test_audioop.py converted to unittest 142 days http://bugs.python.org/issue2042 benjamin.peterson patch urlparse() does not handle URLs with port numbers properly 127 days http://bugs.python.org/issue2195 facundobatista subprocess.Popen.communicate takes bytes, not str 68 days http://bugs.python.org/issue2683 georg.brandl patch pickling of large recursive structures crashes cPickle 4 days http://bugs.python.org/issue2702 facundobatista patch asynchat forgets packets when push is called from a thread 54 days http://bugs.python.org/issue2808 josiahcarlson Copy cgi.parse_qs() to urllib.parse 50 days http://bugs.python.org/issue2829 benjamin.peterson tests for sys.getsizeof fail on win64 9 days http://bugs.python.org/issue3147 schuppenies 3.0b1 doesn't seem to install on macs 8 days http://bugs.python.org/issue3174 benjamin.peterson Pydoc should ignore __package__ attributes 8 days http://bugs.python.org/issue3190 ncoghlan round docstring is inaccurate 7 days http://bugs.python.org/issue3191 georg.brandl Documentation for fractions module needs work 2 days http://bugs.python.org/issue3197 marketdickinson patch Can't import sqlite3 in Python 2.6b1 3 days http://bugs.python.org/issue3215 loewis make text is broken 4 days http://bugs.python.org/issue3217 georg.brandl repeated keyword arguments 4 days http://bugs.python.org/issue3219 benjamin.peterson patch Improve Bytes and Byte Array Methods doc 4 days http://bugs.python.org/issue3220 georg.brandl inf*inf gives inf, but inf**2 gives overflow error 1 days http://bugs.python.org/issue3222 
marketdickinson Small typo in 2.6 what's new 0 days http://bugs.python.org/issue3224 benjamin.peterson backport python 3.0 language functionality to python 2.5 by addi 0 days http://bugs.python.org/issue3225 loewis os.environ.clear has no effect on child processes 0 days http://bugs.python.org/issue3227 benjamin.peterson Language reference, class definitions: missing text in "Programm 0 days http://bugs.python.org/issue3229 benjamin.peterson dictobject.c: inappropriate use of PySet_GET_SIZE? 0 days http://bugs.python.org/issue3230 rhettinger Timestamp stored in ZIP file not correct ? 0 days http://bugs.python.org/issue3233 loewis subprocess.py strips last character when raising an AttributeErr 0 days http://bugs.python.org/issue3234 benjamin.peterson Improve subprocess module usage 3 days http://bugs.python.org/issue3235 mmokrejs ints contructed from strings don't use the smallint constants 1 days http://bugs.python.org/issue3236 loewis idlelib3.0 still using xrange 0 days http://bugs.python.org/issue3237 benjamin.peterson IDLE environment corrupts string.letters 2 days http://bugs.python.org/issue3240 loewis warnings module prints garbage 0 days http://bugs.python.org/issue3241 brett.cannon Segfault in PyFile_SoftSpace/PyEval_EvalFrameEx with sys.stdout 1 days http://bugs.python.org/issue3242 amaury.forgeotdarc patch Memory leak on OS X 0 days http://bugs.python.org/issue3245 benjamin.peterson dir of an "_sre.SRE_Match" object not working 2 days http://bugs.python.org/issue3247 amaury.forgeotdarc bug adding datetime.timedelta to datetime.date 3 days http://bugs.python.org/issue3249 cjw296 references are case insensitive 0 days http://bugs.python.org/issue3251 georg.brandl str.tobytes() and bytes/bytearray.tostr() 0 days http://bugs.python.org/issue3252 lemburg Suggestion: change default behavior of __ne__ 0 days http://bugs.python.org/issue3254 cvp "#define socklen_t int" in pyconfig.h 0 days http://bugs.python.org/issue3257 amaury.forgeotdarc fix_imports needs to 
be using the 'as' keyword 0 days http://bugs.python.org/issue3259 brett.cannon Lib/test/test_cookielib declares utf-8 encoding, but contains no 0 days http://bugs.python.org/issue3261 brett.cannon Odd code fragment in ABC definitions 0 days http://bugs.python.org/issue3263 gvanrossum Use -lcrypto instead of -lcrypt on Solaris 2.6 when available 0 days http://bugs.python.org/issue3264 mmokrejs yield in list comprehensions possibly broken in 3.0 0 days http://bugs.python.org/issue3267 brett.cannon strptime() makes an error concerning second in arg 0 days http://bugs.python.org/issue3269 nevgor iter.next() or iter.__next__() ? 0 days http://bugs.python.org/issue3271 benjamin.peterson Control flow not optimized 1 days http://bugs.python.org/issue3275 georg.brandl support r"\" 0 days http://bugs.python.org/issue3281 facundobatista Undefined unicode characters should be non-printable 0 days http://bugs.python.org/issue3282 georg.brandl rlcompleter add "(" to callables feature 2520 days http://bugs.python.org/issue449227 facundobatista patch, easy Docs don't define sequence-ness very well 1979 days http://bugs.python.org/issue678464 benjamin.peterson asyncore.dispactcher: incorrect connect 1613 days http://bugs.python.org/issue889153 josiahcarlson catch invalid chunk length in httplib read routine 1590 days http://bugs.python.org/issue900744 pdorrell patch asyncore fixes and improvements 1583 days http://bugs.python.org/issue909005 josiahcarlson patch asyncore.file_dispatcher should not take fd as argument 1393 days http://bugs.python.org/issue1025525 josiahcarlson asyncore should handle ECONNRESET in send 1331 days http://bugs.python.org/issue1063924 josiahcarlson Add notes to the manual about `is` and methods 893 days http://bugs.python.org/issue1410739 georg.brandl patch, easy 2.4.2 file.read caches EOF state 715 days http://bugs.python.org/issue1523853 georg.brandl asyncore/asynchat patches 387 days http://bugs.python.org/issue1736190 josiahcarlson patch asynchat 
should call "handle_close" 379 days http://bugs.python.org/issue1740572 josiahcarlson

Top Issues Most Discussed (10)
______________________________

 35 test_multiprocessing hangs on OS X 10.5.3 2 days open http://bugs.python.org/issue3088
 24 math test fails on Solaris 10 13 days open http://bugs.python.org/issue3167
 18 cmath test fails on Solaris 10 13 days open http://bugs.python.org/issue3168
 11 Let bin/oct/hex show floats 10 days open http://bugs.python.org/issue3008
 10 Use -lcrypto instead of -lcrypt on Solaris 2.6 when available 0 days closed http://bugs.python.org/issue3264
 10 __eq__ / __hash__ check doesn't take inheritance into account 122 days open http://bugs.python.org/issue2235
  8 Segfault in PyFile_SoftSpace/PyEval_EvalFrameEx with sys.stdout 1 days closed http://bugs.python.org/issue3242
  8 "Quick search" box renders too long on FireFox 3 7 days pending http://bugs.python.org/issue3154
  7 re.IGNORECASE not Unicode-ready 53 days open http://bugs.python.org/issue2834
  6 ints contructed from strings don't use the smallint constants 1 days closed http://bugs.python.org/issue3236

From unknown_kev_cat at hotmail.com Sat Jul 5 01:20:34 2008
From: unknown_kev_cat at hotmail.com (Joe Smith)
Date: Fri, 4 Jul 2008 23:20:34 +0000 (UTC)
Subject: [Python-Dev] UCS2/UCS4 default
References: <20080702141328.GW62693@nexus.in-nomine.org> <20080703104813.GF62693@nexus.in-nomine.org> <486CC881.5090902@gmail.com> <031901c8dd12$b7258b60$2570a220$@com.au> <486D5355.4000104@v.loewis.de>
Message-ID:

Martin v. Löwis <martin at v.loewis.de> writes:
>
> > Wrong term - code units and code points are equivalent in UTF-16 and
> > UTF-32. What you're looking for is unicode scalar values.
>
> How so? Section 2.5, UTF-16 says
>
> "code points in the supplementary planes, in the range
> U+10000..U+10FFFF, are represented as pairs of 16-bit code units."
>
> So clearly, code points in Unicode range from U+0000..U+10FFFF,
> independent of encoding form.
>
> In UTF-16, code units range from 0..65535.
>
> OTOH, "unicode scalar value" is nearly synonymous to "code point":
>
> D76 Unicode Scalar Value. Any Unicode code point except high-surrogate
> and low-surrogate code points.
>
> So codepoint in Terry's message was the right term.
>

No, Terry did definitely mean Unicode scalar values. He was describing the "pure" but impractical "len()" that would count a surrogate pair as "1", not 2, even in the 32-bit builds.

For what it is worth:

Code point: a number between 0 and 1114111.
Scalar value: a code point, except the surrogate code points.
Code unit: the basic unit of the encoding. One code unit is always sufficient to encode some Unicode scalar values. However, other Unicode scalar values may require multiple code units.

Note that a scalar value is a code point. A code point may or may not be a scalar value.

Practical len() returns the number of code units of the internal storage format.

Pure len() allegedly would return the number of Unicode scalar values (obviously a surrogate pair would be considered a single Unicode scalar value).

Please keep in mind that encodings encode Unicode scalar values. Thus a UTF-8 code unit sequence (or UTF-32 code unit) that would give a code point in the surrogate sections is technically in error.
(Although Python would do well to ignore this restriction, as there may be valid reasons to have a UTF-8 sequence that is not a valid encoded Unicode text sequence)

From martin at v.loewis.de Sat Jul 5 07:35:18 2008
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Sat, 05 Jul 2008 07:35:18 +0200
Subject: [Python-Dev] UCS2/UCS4 default
In-Reply-To:
References: <20080702141328.GW62693@nexus.in-nomine.org> <20080703104813.GF62693@nexus.in-nomine.org> <486CC881.5090902@gmail.com> <031901c8dd12$b7258b60$2570a220$@com.au> <486D5355.4000104@v.loewis.de>
Message-ID: <486F0816.8070707@v.loewis.de>

>> The premise is the OP's idea that Python should switch to all UCS4 to
>> create a more pure ('ideal') situation or the idea that len(s) should
>> count codepoints (correct term?) for all builds as a matter of purity
>> even though it would be time-costly on 16-bit builds as a matter
>> of practicality.

> No, Terry did definitely mean Unicode scalar values.

True. However, using the word "code point" to refer to "Unicode scalar values" is also correct. He (rather, the OP) wanted to count code points (i.e. not count code units).

> Practical len() returns the number of code units of the internal storage format.

No, it returns the number of code units.

> Pure len() allegedly would return the number of Unicode scalar values (obviously
> a surrogate pair would be considered a single Unicode scalar value).

Perhaps-not-so-obviously-but-still-intended, a pure len counting surrogate pairs as one would *also* count code points.

> Please keep in mind that encodings encode Unicode scalar values.

A "coded character set" is "a character set in which each character is assigned a numeric code point". So clearly, a character encoding form encodes code points.

> Thus a utf-8
> code unit sequence (or UTF-32 code unit) that would give a code point in the
> surrogate sections is technically in error.

Sure, but this has nothing to do with Terry's terminology use.

Regards,
Martin

From dickinsm at gmail.com Sat Jul 5 11:39:34 2008
From: dickinsm at gmail.com (Mark Dickinson)
Date: Sat, 5 Jul 2008 10:39:34 +0100
Subject: [Python-Dev] C99 code in the Python core?
Message-ID: <5c6f2a5d0807050239s61e00dd9ub315bcac59299bdc@mail.gmail.com>

I have a general question and a specific question. First the general one:

(1) When is it okay to use C99 code in the Python core? More particularly, is it considered acceptable to use widely-implemented library functions that are specified in C99 but not ANSI C, or widely-implemented features that are new to C99?

Or is C99 code now acceptable pretty much anywhere? If so, should PEP 7 be updated? It currently says: """Use ANSI/ISO standard C (the 1989 version of the standard)."""

I think there are some C99 features that still aren't implemented everywhere, even on major platforms. (Examples are the inverse hyperbolic trig functions in math.h.)

And the specific question:

(2) Is it okay to use the '%a' format specifier for sprintf, sscanf and friends? Are there major platforms where this isn't implemented? (Using '%a' would make the issue implementation much simpler.)

Mark

From matthieu.brucher at gmail.com Sat Jul 5 11:59:13 2008
From: matthieu.brucher at gmail.com (Matthieu Brucher)
Date: Sat, 5 Jul 2008 11:59:13 +0200
Subject: [Python-Dev] C99 code in the Python core?
In-Reply-To: <5c6f2a5d0807050239s61e00dd9ub315bcac59299bdc@mail.gmail.com>
References: <5c6f2a5d0807050239s61e00dd9ub315bcac59299bdc@mail.gmail.com>
Message-ID:

2008/7/5 Mark Dickinson :
> I have a general question and a specific question. First the general one:
>
> (1) When is it okay to use C99 code in the Python core? More particularly,
> is it considered acceptable to use widely-implemented library functions that
> are specified in C99 but not ANSI C, or widely-implemented features that
> are new to C99?
>
> Or is C99 code now acceptable pretty much anywhere? If so, should
> PEP 7 be updated?
> It currently says: """Use ANSI/ISO standard C
> (the 1989 version of the standard)."""
>
> I think there are some C99 features that still aren't implemented
> everywhere, even on major platforms. (Examples are the inverse hyperbolic
> trig functions in math.h.)

Hi,

I don't think that C99 is supported by Visual Studio, and there are no plans for Microsoft to do so.

Matthieu
--
French PhD student
Website : http://matthieu-brucher.developpez.com/
Blogs : http://matt.eifelle.com and http://blog.developpez.com/?blog=92
LinkedIn : http://www.linkedin.com/in/matthieubrucher

From martin at v.loewis.de Sat Jul 5 12:46:53 2008
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Sat, 05 Jul 2008 12:46:53 +0200
Subject: [Python-Dev] C99 code in the Python core?
In-Reply-To: <5c6f2a5d0807050239s61e00dd9ub315bcac59299bdc@mail.gmail.com>
References: <5c6f2a5d0807050239s61e00dd9ub315bcac59299bdc@mail.gmail.com>
Message-ID: <486F511D.50809@v.loewis.de>

> (1) When is it okay to use C99 code in the Python core? More particularly,
> is it considered acceptable to use widely-implemented library functions that
> are specified in C99 but not ANSI C, or widely-implemented features that
> are new to C99?

[C99 is also ANSI C, IIUC. ANSI has adopted ISO/IEC 9899:1999 as a U.S. national standard.]

It's ok to use functions of the C99 standard library if you have a configure test and a fall-back implementation, or if you know that the function is available on all systems we care about.

> Or is C99 code now acceptable pretty much anywhere?

No. As others have pointed out, Microsoft still hasn't implemented it in Visual C.

> If so, should
> PEP 7 be updated? It currently says: """Use ANSI/ISO standard C
> (the 1989 version of the standard)."""

No.

> (2) Is it okay to use the '%a' format specifier for sprintf, sscanf and friends?
> Are there major platforms where this isn't implemented? (Using
> '%a' would make the issue implementation much simpler.)

It's implemented in VS 2008, see http://msdn.microsoft.com/en-us/library/hf4y5e3w.aspx

On the other hand, people still might try to run Python on older versions of Solaris, such as Solaris 2.6 (which was released in 1997). I don't know when Solaris' CRT first started to support this.

I'd add a configure test, and, at run-time, raise an exception if the C library doesn't support it and somebody tries to use it anyway.

Regards,
Martin

From greg at krypto.org Sat Jul 5 21:00:35 2008
From: greg at krypto.org (Gregory P. Smith)
Date: Sat, 5 Jul 2008 12:00:35 -0700
Subject: [Python-Dev] [Python-3000] Second betas tomorrow
In-Reply-To:
References: <2865D095-DA45-4875-AE40-8A5F8C81C299@python.org> <9B1D90667C03488BB57E09808F4C9B64@RaymondLaptop1> <8A43F3E7-BEE8-4656-833D-867328D16D52@python.org> <79F76ED3-D21C-4D1D-B6B0-ECEBDFCEDDE3@python.org> <1afaf6160807011942r738aae8u6c84aab03d463d76@mail.gmail.com>
Message-ID: <52dc1c820807051200v9bb3673g967aa6d596520f6a@mail.gmail.com>

On Tue, Jul 1, 2008 at 8:29 PM, Barry Warsaw wrote:

> On Jul 1, 2008, at 10:42 PM, Benjamin Peterson wrote:
>
>> On Tue, Jul 1, 2008 at 8:44 PM, Barry Warsaw wrote:
>>
>>> On Jul 1, 2008, at 7:27 PM, Brett Cannon wrote:
>>>
>>>> Is a Google Calendar kept by anyone that lists stuff like planned
>>>> release dates, etc.?
>>>>
>>>
>>> http://www.google.com/calendar/ical/b6v58qvojllt0i6ql654r1vh00%40group.calendar.google.com/public/basic.ics
>>>
>>
>> Can I get the non-iCal version?
>>
>
> http://www.google.com/calendar/feeds/b6v58qvojllt0i6ql654r1vh00%40group.calendar.google.com/public/basic
>
> http://www.google.com/calendar/embed?src=b6v58qvojllt0i6ql654r1vh00%40group.calendar.google.com&ctz=America/New_York
>
> - -Barry
>

And for anyone who hasn't already figured it out: you can just add b6v58qvojllt0i6ql654r1vh00 at group.calendar.google.com as a friend in your existing Google Calendar to see the release schedule calendar alongside your own.

From solipsis at pitrou.net Sat Jul 5 21:10:37 2008
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Sat, 5 Jul 2008 19:10:37 +0000 (UTC)
Subject: [Python-Dev] bytearray and array.array are not thread-safe
Message-ID:

Hello,

Short story: bytearray and array.array by construction allow user code to reallocate their internal memory buffer. But a raw pointer to the said buffer can also be obtained by another thread, and used after releasing the GIL (for CPU-intensive operations like compression). As a consequence, the interpreter crashes.

Was it envisioned? I see no warning in the docs for the array.array type (although it has been there for quite some time).

See http://bugs.python.org/issue3139 (reported by Amaury)

Regards

Antoine.

From martin at v.loewis.de Sun Jul 6 00:28:43 2008
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Sun, 06 Jul 2008 00:28:43 +0200
Subject: [Python-Dev] bytearray and array.array are not thread-safe
In-Reply-To:
References:
Message-ID: <486FF59B.6090609@v.loewis.de>

> Short story: bytearray and array.array by construction allow user code to
> reallocate their internal memory buffer. But a raw pointer to the said buffer
> can also be obtained by another thread, and used after releasing the GIL (for
> CPU-intensive operations like compression). As a consequence, the interpreter
> crashes.
>
> Was it envisioned?

I guess this wasn't considered. For t#, there is a comment from Travis that it really shouldn't release the buffer yet, but it does, anyway.

I propose that new codes s*, t*, w* are added, and that s#, t#, w# refuse objects which implement a releasebuffer procedure (alternatively, s# etc. might be removed altogether right away). Users of s* then need to pass in a Py_buffer view pointer that gets filled, and need to explicitly release the buffer.
For convenience, it might help if the Py_buffer structure includes a borrowed PyObject* to the underlying object, along with a PyBuffer_Release procedure/macro.

Regards,
Martin

From grig.gheorghiu at gmail.com Sun Jul 6 03:02:31 2008
From: grig.gheorghiu at gmail.com (Grig Gheorghiu)
Date: Sat, 5 Jul 2008 18:02:31 -0700
Subject: [Python-Dev] Community buildbots and Python release quality metrics
In-Reply-To: <20080626151855.25821.815972320.divmod.xquotient.10384@joule.divmod.com>
References: <20080626151855.25821.815972320.divmod.xquotient.10384@joule.divmod.com>
Message-ID: <3f09d5a00807051802y20a86ceax2b3d0267ae6b6682@mail.gmail.com>

On Thu, Jun 26, 2008 at 8:18 AM, wrote:
> Today on planetpython.org, Doug Hellman announced the June issue of Python
> magazine. The cover story this month is about Pybots, "the fantastic
> automation system that has been put in place to make sure new releases of
> Python software are as robust and stable as possible".
>
> Last week, there was a "beta" release of Python which, according to the
> community buildbots, cannot run any existing python software. Normally I'd
> be complaining here about Twisted, but in fact Twisted is doing relatively
> well right now; only 80 failing tests. Django apparently cannot even be
> imported.
>
> The community buildbots have been in a broken state for months now[1]. I've
> been restraining myself from whinging about this, but now that it's getting
> close to release, it's time to get these into shape, or it's time to get rid
> of them.

Hi all,

Sorry for not replying sooner, I was on vacation when this thread started and I only got back in town yesterday.

To bring my $0.02 to this discussion: the Pybots 'community buildbots' turned out largely to be a failure. Why? Because there was never really a 'community' around it, especially a community of project leaders who would be interested in the state of their projects' tests.
All the machines donated for the Pybots farm belong to people who just happen to be interested in given projects, but are not really the leaders of those projects. The only project that consistently stayed on top of the buildbot status was Twisted, represented by JP Calderone (although even there the tests were running on my machine, and not on a machine contributed by the Twisted folks.)

I still haven't given up, and I hope this thread will spur project leaders into donating time, or resources, to the Pybots project. It has been my bitter observation about the Open Source world that people just LOVE to get stuff for free. As soon as you mention more involvement from them in the form of time, money, hardware resources, etc., the same brave proponents of cool things to be done are nowhere to be found.

To come back to this thread, I don't think it's reasonable to expect the Python core developers to be that interested in the status of the community buildbots. It is again up to the project leaders to step up to the plate, donate machines to Pybots, and stay on top of any breakages that result from Python core checkins. It seems to me that the Python core developers have always responded promptly and favorably to reports of breakages coming from the Pybots farm.

Grig

From josiah.carlson at gmail.com Sun Jul 6 03:52:26 2008
From: josiah.carlson at gmail.com (Josiah Carlson)
Date: Sat, 5 Jul 2008 18:52:26 -0700
Subject: [Python-Dev] Packing and unpacking integers
Message-ID:

A few years ago (yes, it's been that long), I proposed adding a new format code to struct that would pack integers as strings, similar to the 's' format code. In particular, struct.pack('>60G', v) would be a 60-byte big-endian unsigned integer as a string. The feature request is http://bugs.python.org/issue1023290 .

Shortly thereafter, it was decided that it wouldn't become a struct format code, but instead would find itself as part of binhex.
Raymond Hettinger was supposed to write the function a couple of years ago for, I believe, Python 2.4. It never happened. It still hasn't happened for Python 2.5 or 2.6.

I believe there is still a need for packing integers as strings and unpacking strings as integers, more specifically, offering to Python an interface to _PyLong_FromByteArray() and _PyLong_AsByteArray(). I would be happy to write the functionality and unittests this coming week for 2.6 and 3.0 if I get the ok. If not, I can write it for 2.7 and 3.1.

 - Josiah

From solipsis at pitrou.net Sun Jul 6 13:17:23 2008
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Sun, 06 Jul 2008 13:17:23 +0200
Subject: [Python-Dev] bytearray and array.array are not thread-safe
In-Reply-To: <486FF59B.6090609@v.loewis.de>
References: <486FF59B.6090609@v.loewis.de>
Message-ID: <1215343043.5983.7.camel@fsol>

On Sunday 06 July 2008 at 00:28 +0200, "Martin v. Löwis" wrote:
> I propose that new codes s*, t*, w* are added, and that s#,t#,w# refuses
> objects which implement a releasebuffer procedure (alternatively, s# etc
> might be removed altogether right away). Users of s* then need to pass
> in a Py_buffer view pointer that gets filled, and need to explicitly
> release the buffer. For convenience, it might help if the Py_buffer
> structure includes a borrowed PyObject* to the underlying object, along
> with a PyBuffer_Release procedure/macro.

Why a borrowed reference rather than a new one? It could be decref'ed as part of the proposed PyBuffer_Release procedure.

Overall it sounds like a clean resolution of the problem.

Regards

Antoine.
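
To illustrate the hazard and the explicit-release idea discussed above, here is a minimal Python-level sketch. It assumes a later Python 3 (3.2 or newer, where memoryview.release() exists) rather than the 2.6/3.0 betas under discussion, and the function name try_resize_while_exported is hypothetical. In such versions, CPython refuses to resize a bytearray while a buffer export is outstanding and raises BufferError instead of leaving a dangling pointer.

```python
# Sketch only: shows that an outstanding buffer export blocks resizing.
# Assumes Python 3.2+ (memoryview.release() was added in 3.2).

def try_resize_while_exported():
    """Hypothetical helper: attempt to resize a bytearray under a live view."""
    data = bytearray(b"abc")
    view = memoryview(data)      # takes a buffer export on `data`
    try:
        data.extend(b"def")      # would force a reallocation of the buffer
    except BufferError:
        blocked = True           # resize refused while the view is alive
    else:
        blocked = False
    view.release()               # explicit release of the export
    data.extend(b"def")          # resizing is permitted again
    return blocked, bytes(data)


blocked, final = try_resize_while_exported()
print(blocked, final)
```

The explicit view.release() call plays roughly the role that a PyBuffer_Release step would play at the C level: once the consumer has released its view, the exporter is free to reallocate its memory again.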

From glyph at divmod.com Sun Jul 6 17:46:30 2008
From: glyph at divmod.com (glyph at divmod.com)
Date: Sun, 06 Jul 2008 15:46:30 -0000
Subject: [Python-Dev] Community buildbots and Python release quality metrics
In-Reply-To: <3f09d5a00807051802y20a86ceax2b3d0267ae6b6682@mail.gmail.com>
References: <20080626151855.25821.815972320.divmod.xquotient.10384@joule.divmod.com> <3f09d5a00807051802y20a86ceax2b3d0267ae6b6682@mail.gmail.com>
Message-ID: <20080706154630.25821.930077863.divmod.xquotient.12261@joule.divmod.com>

On 01:02 am, grig.gheorghiu at gmail.com wrote:
>To bring my $0.02 to this discussion: the Pybots 'community buildbots'
>turned out largely to be a failure.

Let's not say it's a failure. Let's instead say that it hasn't yet become a success :-).

>I still haven't given up, and I hope this thread will spur project
>leaders into donating time, or resources, to the Pybots project. It
>has been my bitter observation about the Open Source world that people
>just LOVE to get stuff for free. As soon as you mention more
>involvement from them in the form of time, money, hardware resources,
>etc., the same brave proponents of cool things to be done are nowhere
>to be found.

I think this list is the wrong place to go to reach the people who need to be reached. It's python core developers and other people already involved in and aware of core development. That said I'm not sure what the *right* place is; I think your blog is syndicated on the unofficial planet python, so maybe that's a good place to start.

Sadly, the right thing to do in terms of drumming up support is to get someone interested in PR and have them go to each project individually, but that might be more effort than setting up the buildbots themselves, at least initially...

However, let's say that this were tremendously successful, and lots of people start paying attention.
I think pybots.org needs to be updated to say exactly what a participant interested in python testing needs to do, beyond "here's how you set up a buildbot" (a page that is actually a daunting-looking blog post which admits it may be somewhat outdated), because setting up a buildbot might not be the only thing that the project needs. It's one thing to tell people that they need to be helping out (and I'm sure you're right) but it's much more useful to get the message out that "we really need people to do X, Y, and Z". One thing I will definitely commit to is that if you make a "cry for help" page, I'll blog about it to drive attention to it, and I'll encourage the other, perhaps better-read Python bloggers I know to do so as well. My personal interest at the moment is to get all of the irrelevant red off of the community builders page. Whether or not you believe in an XP "green bar" philosophy, the large number of spurious failures is distracting. Who is it that is capable of making appropriate changes? Is there something I could do to help with that? Note that I'm committing to say that I can do *that*, but, at least you could shut me up by making it my fault ;-). (I'd also like to improve the labels of the build slaves. What exactly is "x86 Red Hat 9 trunk" testing? Trunk of what? What project?) It would be good to remove the perception that it's somebody else's problem as much as possible. Right now, all these dead buildbots suggest to the various communities, "oh, I guess that guy who runs that buildbot needs to fix it". The dead bots should just be killed off, and their projects removed from the list, so that if someone wants to get involved and set up a bot for lxml, they're not put off of it by the fact that it might be rude to the guy who is currently (allegedly) running it. 
From martin at v.loewis.de Sun Jul 6 19:25:34 2008 From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=) Date: Sun, 06 Jul 2008 19:25:34 +0200 Subject: [Python-Dev] bytearray and array.array are not thread-safe In-Reply-To: <1215343043.5983.7.camel@fsol> References: <486FF59B.6090609@v.loewis.de> <1215343043.5983.7.camel@fsol> Message-ID: <4871000E.2060800@v.loewis.de> > Why a borrowed reference rather than a new one? It could be decref'ed as > part as the proposed PyBuffer_Release procedure. The question is a) whether a Py_Buffer remains valid even if the object goes away. That seems not to be the case, i.e. the caller of getbuffer needs to hold onto the object, anyway. b) whether it would still be correct to call releasebuffer explicitly. Of course, as getbuffer would have to fill the object into the view, releasebuffer could also DECREF the included object. Alternatively, there could be a pair of functions PyBuffer_Get and PyBuffer_Release, which would fill the object into the view itself. So I withdraw issue b; the real question remains whether it is desired that a buffer will remain alive as long as there is a view to it. That is a question for the buffer experts to answer; it may also have impacts on cyclic garbage collection (as inclusion of a view into an object will mean that the tp_traverse function must also Py_VISIT the embedded object). > Overall it sounds like a clean resolution of the problem. Unfortunately, it's also a significant change at this point. I personally won't have time to provide a patch, but I think a patch is needed before the last beta. IOW, the issue should become a release blocker. Regards, Martin From stephen at xemacs.org Sun Jul 6 19:40:14 2008 From: stephen at xemacs.org (Stephen J. 
Turnbull) Date: Mon, 07 Jul 2008 02:40:14 +0900 Subject: [Python-Dev] Community buildbots and Python release quality metrics In-Reply-To: <20080706154630.25821.930077863.divmod.xquotient.12261@joule.divmod.com> References: <20080626151855.25821.815972320.divmod.xquotient.10384@joule.divmod.com> <3f09d5a00807051802y20a86ceax2b3d0267ae6b6682@mail.gmail.com> <20080706154630.25821.930077863.divmod.xquotient.12261@joule.divmod.com> Message-ID: <87y74enc75.fsf@uwakimon.sk.tsukuba.ac.jp> glyph at divmod.com writes: > On 01:02 am, grig.gheorghiu at gmail.com wrote: > >To bring my $0.02 to this discussion: the Pybots 'community buildbots' > >turned out largely to be a failure. > > Let's not say it's a failure. Let's instead say that it hasn't yet > become a success :-). +1 > >I still haven't given up, and I hope this thread will spur project > >leaders into donating time, or resources, to the Pybots project. > I think this list is the wrong place to go to reach the people who need > to be reached. It's python core developers and other people already > involved in and aware of core development. That said I'm not sure what > the *right* place is; I think your blog is syndicated on the unofficial > planet python, so maybe that's a good place to start. Sadly, the right > thing to do in terms of drumming up support is to get someone interested > in PR and have them go to each project individually, but that might be > more effort than setting up the buildbots themselves, at least > initially... Exactly, and that's why nobody should be "bitter" about it. The problem is that the while overall the effort and rewards look to be balanced in favor of the rewards, the startup costs for individuals are quite high. I think this *is* the place to start, though. The project leaders "should" be, and probably generally are, "here". They have the strongest interest in any individual 'bot, while Guido is quite correct in saying python-dev can't afford to have strong interest in all the bots. 
> However, let's say that this were tremendously successful, and lots of > people start paying attention. I think pybots.org needs to be updated > to say exactly what a participant interested in python testing needs to > do, beyond "here's how you set up a buildbot" (a page that is actually a > daunting-looking blog post which admits it may be somewhat outdated), > because setting up a buildbot might not be the only thing that the > project needs. It's one thing to tell people that they need to be > helping out (and I'm sure you're right) but it's much more useful to get > the message out that "we really need people to do X, Y, and Z". One > thing I will definitely commit to is that if you make a "cry for help" > page, I'll blog about it to drive attention to it, and I'll encourage > the other, perhaps better-read Python bloggers I know to do so as > well. Two suggestions in this vein: First, I think it's established that some but not all "red community bots" *are* of interest to Python core development. While I'm not aware of the technical details, I estimate that triaging the community 'bot failures is probably similar to reviewing bugs in the Python tracker. Perhaps Martin van Loewis and others who have offered the 5-for-1 review deal would be willing to extend the definition of "review" to include initial bug reports based on a red community bot (ie, you review the community 'bot failure and decide it is something that should concern Python core development). Perhaps that's not appropriate, but a similar system could be set up. Second, something intermediate between the occasional half-hour of triaging bugs and a full-blown PR campaign at the projects would be documenting the criteria for reporting a failure on a community 'bot to the Python tracker as a bug, etc. This would also serve as a basis for talking to project lurker who might have the odd half-hour to do some "red bot" triaging. 
(By criteria I mean the kinds of things that Python core considers necessary breakage in new versions that downstream must address in their own code, vs. regressions in an x.y.z patchlevel, etc. The kind of thing that glyph and Guido were discussing earlier.) > It would be good to remove the perception that it's somebody else's > problem as much as possible. To the extent that a 'bot is running a prerelease project against prerelease Python, this is probably not very doable. If Python is stable and the project version is prerelease, it's the project's bug until proven otherwise, and vice versa. If both are stable, again some expertise is probably needed for triage. I guess that means that one important task is to classify the bots in a two-by-two matrix according to stability of project and Python respectively. From martin at v.loewis.de Sun Jul 6 19:29:34 2008 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sun, 06 Jul 2008 19:29:34 +0200 Subject: [Python-Dev] Community buildbots and Python release quality metrics In-Reply-To: <20080706154630.25821.930077863.divmod.xquotient.12261@joule.divmod.com> References: <20080626151855.25821.815972320.divmod.xquotient.10384@joule.divmod.com> <3f09d5a00807051802y20a86ceax2b3d0267ae6b6682@mail.gmail.com> <20080706154630.25821.930077863.divmod.xquotient.12261@joule.divmod.com> Message-ID: <487100FE.508@v.loewis.de> > (I'd also like to improve the labels of the build slaves. What exactly > is "x86 Red Hat 9 trunk" testing? Trunk of what? What project?) It seems like you would like to edit the master configuration file. That can be arranged fairly easily. 
Regards, Martin From martin at v.loewis.de Sun Jul 6 19:34:06 2008 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sun, 06 Jul 2008 19:34:06 +0200 Subject: [Python-Dev] Packing and unpacking integers In-Reply-To: References: Message-ID: <4871020E.5070802@v.loewis.de> > I believe there is still a need for packing integers as strings and > unpacking strings as integers, more specifically, offering to Python > an interface to _PyLong_FromByteArray() and _PyLong_AsByteArray(). I > would be happy to write the functionality and unittests this coming > week for 2.6 and 3.0 if I get the ok. If not, I can write it for 2.7 > and 3.1 . I think it needs to be deferred to the next releases, given that the beta release already happened. If you have any spare time, please look into some of the real serious, release-blocking bug reports. Regards, Martin From grig.gheorghiu at gmail.com Sun Jul 6 23:09:37 2008 From: grig.gheorghiu at gmail.com (Grig Gheorghiu) Date: Sun, 6 Jul 2008 14:09:37 -0700 Subject: [Python-Dev] Community buildbots and Python release quality metrics In-Reply-To: <20080706154630.25821.930077863.divmod.xquotient.12261@joule.divmod.com> References: <20080626151855.25821.815972320.divmod.xquotient.10384@joule.divmod.com> <3f09d5a00807051802y20a86ceax2b3d0267ae6b6682@mail.gmail.com> <20080706154630.25821.930077863.divmod.xquotient.12261@joule.divmod.com> Message-ID: <3f09d5a00807061409y42f17e8lba152f2bfe5025e4@mail.gmail.com> On Sun, Jul 6, 2008 at 8:46 AM, wrote: > > However, let's say that this were tremendously successful, and lots of > people start paying attention. I think pybots.org needs to be updated to > say exactly what a participant interested in python testing needs to do, > beyond "here's how you set up a buildbot" (a page that is actually a > daunting-looking blog post which admits it may be somewhat outdated), > because setting up a buildbot might not be the only thing that the project > needs. 
It's one thing to tell people that they need to be helping out (and > I'm sure you're right) but it's much more useful to get the message out that > "we really need people to do X, Y, and Z". One thing I will definitely > commit to is that if you make a "cry for help" page, I'll blog about it to > drive attention to it, and I'll encourage the other, perhaps better-read > Python bloggers I know to do so as well. I have posted 'cries for help' repeatedly on my blog, with generally little success. See http://agiletesting.blogspot.com/search?q=pybots . But I will post again. If you can include here a paragraph of what your vision of the 'X, Y and Z' above is, that'd help too. I think I've been pretty clear about the benefits that the Pybots farm can bring to a given project, so all project leaders on this list should be aware of them IMO. If not, I'd be happy to rehash them. But the home page of pybots.org is pretty self-explanatory I think. > > My personal interest at the moment is to get all of the irrelevant red off > of the community builders page. Whether or not you believe in an XP "green > bar" philosophy, the large number of spurious failures is distracting. Who > is it that is capable of making appropriate changes? Is there something I > could do to help with that? Note that I'm committing to say that I can do > *that*, but, at least you could shut me up by making it my fault ;-). > I'll send a message to the pybots mailing list asking people whose buildbots are turned off if they're still interested in running them. Negative or no answers will mean we can remove them from the farm. > (I'd also like to improve the labels of the build slaves. What exactly is > "x86 Red Hat 9 trunk" testing? Trunk of what? What project?) > It's not only a question of changing a static label in this case. A given buildslave can run the tests for multiple projects, in which case it becomes tricky to change the label on the fly accordingly. 
As an aside, the slave you mention was running on my machine, and I used it to run the Twisted tests, but I shut it down a while ago because the buildbot process was taking too many resources. If the Twisted project can donate a machine, I'd be happy to include it in the Pybots farm ASAP.

> It would be good to remove the perception that it's somebody else's problem
> as much as possible. Right now, all these dead buildbots suggest to the
> various communities, "oh, I guess that guy who runs that buildbot needs to
> fix it". The dead bots should just be killed off, and their projects
> removed from the list, so that if someone wants to get involved and set up a
> bot for lxml, they're not put off of it by the fact that it might be rude to
> the guy who is currently (allegedly) running it.

As I said, I'll see what the current owners have to say, and then I'll report back to this list.

Thanks for offering your help!

Grig

From victor.stinner at haypocalc.com Mon Jul 7 01:11:52 2008
From: victor.stinner at haypocalc.com (Victor Stinner)
Date: Mon, 7 Jul 2008 01:11:52 +0200
Subject: [Python-Dev] Play with fuzzing
Message-ID: <200807070111.52959.victor.stinner@haypocalc.com>

Hi,

I wrote a fuzzing "framework" called Fusil and this week I wrote a fuzzer for Python. The idea is quite simple: for a module,
- list all functions, classes and class methods
- call a function with random arguments (of random types)
- instantiate a class with random arguments
- if the class is created correctly, call methods with random arguments

Example:
--------------------- 8< -----------------------------------
print "Call 39/40: linuxaudiodev.open()"
try:
    linuxaudiodev.open(
        # argument 1/2
        u"\u62C0\uFBD7\uB46A\u55E0\uFB7B\uD392\u7CEE",
        # argument 2/2
        52.682,
    )
except Exception, err:
    print >>stderr, "ERROR: %s" % err
--------------------- 8< -----------------------------------

I tried it on CPython 2.5 and then on CPython trunk (future 2.6).
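The idea described above (list a module's callables, then call them with random arguments of random types) can be sketched in a few lines. This is only an illustration in Python 3 and makes no claim about how Fusil itself is implemented: the names random_argument and fuzz_module and the tiny demo module are hypothetical. It also runs everything in-process, so it only observes Python-level exceptions; a real segmentation fault would take the sketch down with it.

```python
import inspect
import random
import types


def random_argument(rng):
    """Return one value of a randomly chosen type."""
    makers = [
        lambda: rng.randint(-1000, 1000),
        lambda: rng.uniform(-1e6, 1e6),
        lambda: "".join(chr(rng.randint(32, 0x2FFF))
                        for _ in range(rng.randint(0, 8))),
        lambda: bytes(rng.randint(0, 255) for _ in range(rng.randint(0, 8))),
        lambda: None,
        lambda: [rng.randint(0, 9) for _ in range(rng.randint(0, 4))],
    ]
    return rng.choice(makers)()


def fuzz_module(module, calls=100, seed=0):
    """Call public functions of `module` with random arguments."""
    rng = random.Random(seed)
    # Only plain callables here; classes and their methods are left out.
    targets = [
        obj
        for name, obj in inspect.getmembers(module, callable)
        if not name.startswith("_") and not inspect.isclass(obj)
    ]
    outcomes = {"ok": 0, "error": 0}
    for _ in range(calls):
        func = rng.choice(targets)
        args = [random_argument(rng) for _ in range(rng.randint(0, 3))]
        try:
            func(*args)
        except Exception:
            # An ordinary exception is the expected, harmless outcome.
            outcomes["error"] += 1
        else:
            outcomes["ok"] += 1
    return outcomes


# Hypothetical target module, built on the fly to keep the sketch safe.
demo = types.ModuleType("demo")
demo.double = lambda x: x * 2
demo.first = lambda seq: seq[0]

result = fuzz_module(demo, calls=50)
print(result)
```

A fixed seed keeps a run reproducible, which matters when a failing call has to be replayed for a bug report.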
I found some bugs, see the latest entries in the Python bug tracker. Just an example:

   http://bugs.python.org/issue3304
   -> invalid call to PyMem_Free() in fileio_init()

Most of the bugs result in a segmentation fault, an abort or a denial of service.

If you would like to try my fuzzer, use:
(1) svn co http://fusil.hachoir.org/svn/trunk fusil
(2) cd fusil
(3) ./run_fusil.sh -p projects/python.py --fast --remove ALL

The --fast option makes the run faster, --remove removes the session directory even if Python generated some files, and "ALL" tests all modules.

FUSIL IS NOT SAFE! So run it under a different user to avoid dangerous calls to os.unlink().

The module list is hardcoded: it's the list of CPython modules written in C.

More information about Fusil: http://fusil.hachoir.org/trac

--
Victor Stinner aka haypo
http://www.haypocalc.com/blog/

From brett at python.org Mon Jul 7 01:33:14 2008
From: brett at python.org (Brett Cannon)
Date: Sun, 6 Jul 2008 16:33:14 -0700
Subject: [Python-Dev] Play with fuzzing
In-Reply-To: <200807070111.52959.victor.stinner@haypocalc.com>
References: <200807070111.52959.victor.stinner@haypocalc.com>
Message-ID:

On Sun, Jul 6, 2008 at 4:11 PM, Victor Stinner wrote:
> Hi,
>
> I wrote a fuzzing "framework" called Fusil and this week I wrote a fuzzer for
> Python. The idea is quite simple: for a module,
> - list all functions, classes and class methods
> - call a function with random arguments (of random types)
> - instantiate a class with random arguments
> - if the class is created correctly, call methods with random arguments
>
> Example:
> --------------------- 8< -----------------------------------
> print "Call 39/40: linuxaudiodev.open()"
> try:
>     linuxaudiodev.open(
>         # argument 1/2
>         u"\u62C0\uFBD7\uB46A\u55E0\uFB7B\uD392\u7CEE",
>         # argument 2/2
>         52.682,
>     )
> except Exception, err:
>     print >>stderr, "ERROR: %s" % err
> --------------------- 8< -----------------------------------
>
> I tried it on CPython 2.5 and then on CPython trunk (future 2.6).
I found some > bugs, see last bug entries in Python bugtracker. Just an example: > > http://bugs.python.org/issue3304 > -> invalid call to PyMem_Free() in fileio_init() > You can use http://bugs.python.org/issue?%40search_text=&title=&%40columns=title&id=&%40columns=id&creation=&creator=haypo&activity=2008-07-06&%40columns=activity&%40sort=activity&actor=&nosy=&type=&components=&versions=&dependencies=&assignee=&keywords=&priority=&%40group=priority&status=1&%40columns=status&resolution=&%40pagesize=50&%40startwith=0&%40queryname=&%40old-queryname=&%40action=search to see all of the bugs Victor has filed today (looks like eight). -Brett From martin at v.loewis.de Mon Jul 7 06:37:10 2008 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Mon, 07 Jul 2008 06:37:10 +0200 Subject: [Python-Dev] Community buildbots and Python release quality metrics In-Reply-To: <3f09d5a00807061409y42f17e8lba152f2bfe5025e4@mail.gmail.com> References: <20080626151855.25821.815972320.divmod.xquotient.10384@joule.divmod.com> <3f09d5a00807051802y20a86ceax2b3d0267ae6b6682@mail.gmail.com> <20080706154630.25821.930077863.divmod.xquotient.12261@joule.divmod.com> <3f09d5a00807061409y42f17e8lba152f2bfe5025e4@mail.gmail.com> Message-ID: <48719D76.7040806@v.loewis.de> > It's not only a question of changing a static label in this case. A > given buildslave can run the tests for multiple projects, in which > case it becomes tricky to change the label on the fly accordingly. I think you could set up different builders for a single slave in that case (use a slave lock to make them run sequentially). > As > an aside, the slave you mention was running on my machine, and I used > it to run the Twisted tests, but I shut it down a while ago because > the buildbot process was taking too many resources. If the Twisted > project can donate a machine, I'd be happy to include it in the Pybots > farm ASAP. Please talk to Trent Nelson. 
He has a Windows machine that he donated precisely for that kind of activity. Regards, Martin From martin at v.loewis.de Mon Jul 7 06:38:48 2008 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Mon, 07 Jul 2008 06:38:48 +0200 Subject: [Python-Dev] Play with fuzzing In-Reply-To: <200807070111.52959.victor.stinner@haypocalc.com> References: <200807070111.52959.victor.stinner@haypocalc.com> Message-ID: <48719DD8.4070807@v.loewis.de> > I wrote a fuzzing "framework" called Fusil and this week I wrote a fuzzer for > Python. The idea is quite simple: for a module, > - list all functions, classes and class methods > - call a function with random arguments (of random types) > - instanciate a class with random arguments > - if the class is created correctly, call methods with random arguments I was already wondering how you found out all these things. It's quite amazing! Thanks, Martin From solipsis at pitrou.net Mon Jul 7 13:57:57 2008 From: solipsis at pitrou.net (Antoine) Date: Mon, 7 Jul 2008 13:57:57 +0200 (CEST) Subject: [Python-Dev] bytearray and array.array are not thread-safe In-Reply-To: <4871000E.2060800@v.loewis.de> References: <486FF59B.6090609@v.loewis.de> <1215343043.5983.7.camel@fsol> <4871000E.2060800@v.loewis.de> Message-ID: <2463911e698f87e837b4296cb13810ea.squirrel@webmail.nerim.net> > Unfortunately, it's also a significant change at this point. I > personally won't have time to provide a patch, but I think a patch > is needed before the last beta. IOW, the issue should become a > release blocker. Agreed. Unfortunately I don't have much time to write a patch either. Perhaps in one or two weeks, but it would be better if someone beats me to it. Regards Antoine. 
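For readers following the bytearray thread: the behaviour being asked for (a buffer that cannot be resized while someone holds a view on it, plus an explicit release step) can be observed from Python code in later releases. A minimal sketch, assuming Python 3.2 or newer, where memoryview.release() exists; the helper name resize_is_blocked is hypothetical.

```python
def resize_is_blocked(buf):
    """Return True if growing `buf` fails while a view on it is exported."""
    view = memoryview(buf)          # takes an export on the buffer
    try:
        buf.extend(b"x")            # a resize would invalidate the view
    except BufferError:
        return True
    finally:
        view.release()              # explicit release, as in the proposal
    return False


data = bytearray(b"hello")
blocked = resize_is_blocked(data)

# Once the view has been released, resizing works again.
data.extend(b" world")
```

The view here plays the role that the filled-in view structure plays for C callers: as long as it is held, the underlying memory stays put.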
From solipsis at pitrou.net Mon Jul 7 14:39:33 2008 From: solipsis at pitrou.net (Antoine Pitrou) Date: Mon, 7 Jul 2008 12:39:33 +0000 (UTC) Subject: [Python-Dev] buildbots References: <20080626151855.25821.815972320.divmod.xquotient.10384@joule.divmod.com> <3f09d5a00807051802y20a86ceax2b3d0267ae6b6682@mail.gmail.com> Message-ID: Hello, As someone who could (perhaps) (potentially) provide a buildbot machine, there are several questions which need answering before I take a decision: - are more buildbots needed and if so, which kinds of platforms/architectures? - for which software? Python itself? third-party apps and libraries? - how resource-consuming is it? CPU? memory? disk space? can it run along other services fine or does it need the whole machine for itself? - how time-consuming is it (in terms of human work)? I may spend a bit of time at the start to set it up but I'd like it to it run quite flawlessly afterward. I'm really not a sysadmin at heart... I suppose other interested people could ask themselves the same questions... Just my 2 cents. Antoine. From grig.gheorghiu at gmail.com Mon Jul 7 20:49:05 2008 From: grig.gheorghiu at gmail.com (Grig Gheorghiu) Date: Mon, 7 Jul 2008 11:49:05 -0700 Subject: [Python-Dev] buildbots In-Reply-To: References: <20080626151855.25821.815972320.divmod.xquotient.10384@joule.divmod.com> <3f09d5a00807051802y20a86ceax2b3d0267ae6b6682@mail.gmail.com> Message-ID: <3f09d5a00807071149r42b7b7ddl4eb5a91d21e662b0@mail.gmail.com> On Mon, Jul 7, 2008 at 5:39 AM, Antoine Pitrou wrote: > > > - are more buildbots needed and if so, which kinds of platforms/architectures? I can't really answer that question for the python code buildbot farm, but for the Pybots community project, the platforms we currently have are in a table on this page: http://pybots.org/ If you are able to offer something that's not on the list, that'd be good. But any help at all is appreciated. 
I believe Windows has traditionally been under-represented in all buildbot farms, and it's likely to stay that way... > - for which software? Python itself? third-party apps and libraries? For Pybots, we're testing third-party apps and libraries against changes made to Python core. If you're interested in a 3rd party project, and you're willing to stay on top of that project's buildbot status, and notify both the project leaders and the Python core devs whenever you notice an ugly breakage -- then you're exactly the kind of guy we need on the Pybots project :-) > - how resource-consuming is it? CPU? memory? disk space? can it run along other > services fine or does it need the whole machine for itself? In my experience, buildbot runs fine on newer hardware. It does consume CPU, so if you have a slow machine, it might start impacting your other processes. > - how time-consuming is it (in terms of human work)? I may spend a bit of time > at the start to set it up but I'd like it to it run quite flawlessly afterward. > I'm really not a sysadmin at heart... The initial learning curve can be a bit steep, but I'm here to help. Once you add your buildslave to the buildbot farm, things should run fairly smoothly. You will get notified via email / RSS about breakages, and then you'll have to invest the time to see what kind of breakage it is, and to notify the interested parties. > > I suppose other interested people could ask themselves the same questions... > > Just my 2 cents. > > Antoine. Thanks for the questions, they really help IMO. I also hope the answers helped. 
Grig From grig.gheorghiu at gmail.com Mon Jul 7 21:54:23 2008 From: grig.gheorghiu at gmail.com (Grig Gheorghiu) Date: Mon, 7 Jul 2008 12:54:23 -0700 Subject: [Python-Dev] Community buildbots and Python release quality metrics In-Reply-To: <3f09d5a00807061409y42f17e8lba152f2bfe5025e4@mail.gmail.com> References: <20080626151855.25821.815972320.divmod.xquotient.10384@joule.divmod.com> <3f09d5a00807051802y20a86ceax2b3d0267ae6b6682@mail.gmail.com> <20080706154630.25821.930077863.divmod.xquotient.12261@joule.divmod.com> <3f09d5a00807061409y42f17e8lba152f2bfe5025e4@mail.gmail.com> Message-ID: <3f09d5a00807071254x5b5ee091ye24781a1dd3afa07@mail.gmail.com> On Sun, Jul 6, 2008 at 2:09 PM, Grig Gheorghiu wrote: > I'll send a message to the pybots mailing list asking people whose > buildbots are turned off if they're still interested in running them. > Negative or no answers will mean we can remove them from the farm. > OK, I posted a message to the pybots mailing list and I removed 2 slaves. Out of the 6 remaining, 4 are currently active, and one will hopefully soon be active starting next week. This leaves just one unanswered for so far. I also got an email from another person volunteering a buildslave, so we'll soon have 7 machines. As I said, if anybody else wants to participate in the Pybots project, please let me know! I'll also post a blog entry on this soon. Grig From priyarp.tech at gmail.com Mon Jul 7 22:48:45 2008 From: priyarp.tech at gmail.com (Pree Raj) Date: Mon, 7 Jul 2008 13:48:45 -0700 Subject: [Python-Dev] __module__ not found on ported Python Message-ID: <8bb4faa80807071348u3c296026w303f6fe433bedf67@mail.gmail.com> Hi, I am trying to port Python to ThreadX. I have managed to get the prompt. However when I try "import sys" or any built in module I get an error __import__ not found. initmain() in the Initialization code is commented out at present because of some errors. Could it be because of this ? 
Also, I would like to know which are the MUST HAVE built in modules to be included for normal working of my ported version of Python. Thanks, Priya From brett at python.org Mon Jul 7 23:15:32 2008 From: brett at python.org (Brett Cannon) Date: Mon, 7 Jul 2008 14:15:32 -0700 Subject: [Python-Dev] __module__ not found on ported Python In-Reply-To: <8bb4faa80807071348u3c296026w303f6fe433bedf67@mail.gmail.com> References: <8bb4faa80807071348u3c296026w303f6fe433bedf67@mail.gmail.com> Message-ID: On Mon, Jul 7, 2008 at 1:48 PM, Pree Raj wrote: [SNIP] > Also, I would like to know which are the MUST HAVE built in modules to be > included for normal working of my ported version of Python. You can look at sys.builtin_module_names to see what CPython compiles in. Otherwise you just have to go based on what error messages say. =) -Brett From priyarp.tech at gmail.com Tue Jul 8 01:39:39 2008 From: priyarp.tech at gmail.com (Pree Raj) Date: Mon, 7 Jul 2008 16:39:39 -0700 Subject: [Python-Dev] __module__ not found on ported Python In-Reply-To: References: <8bb4faa80807071348u3c296026w303f6fe433bedf67@mail.gmail.com> Message-ID: <8bb4faa80807071639l7879595ejb2a49affa9536442@mail.gmail.com> Thanks Brett. I have been able to do initmain() now. However, if I do "import sys" from the python prompt I still get ImportError: __import__ not found I am not sure where the initialization is going wrong for this error to show up. Can someone please help. On Mon, Jul 7, 2008 at 2:15 PM, Brett Cannon wrote: > On Mon, Jul 7, 2008 at 1:48 PM, Pree Raj wrote: > [SNIP] > > Also, I would like to know which are the MUST HAVE built in modules to be > > included for normal working of my ported version of Python. > > You can look at sys.builtin_module_names to see what CPython compiles > in. Otherwise you just have to go based on what error messages say. 
=) > > -Brett > From martin at v.loewis.de Tue Jul 8 07:32:29 2008 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Tue, 08 Jul 2008 07:32:29 +0200 Subject: [Python-Dev] __module__ not found on ported Python In-Reply-To: <8bb4faa80807071639l7879595ejb2a49affa9536442@mail.gmail.com> References: <8bb4faa80807071348u3c296026w303f6fe433bedf67@mail.gmail.com> <8bb4faa80807071639l7879595ejb2a49affa9536442@mail.gmail.com> Message-ID: <4872FBED.50402@v.loewis.de> > ImportError: __import__ not found > I am not sure where the initialization is going wrong for this error to > show up. > Can someone please help. This isn't really the right list to ask for help, at least without studying some source code prior to posting. The specific error message is produced in ceval.c, IMPORT_NAME. Use debugging technologies to trace through the code to find out what went wrong. Regards, Martin From solipsis at pitrou.net Tue Jul 8 11:33:02 2008 From: solipsis at pitrou.net (Antoine) Date: Tue, 8 Jul 2008 11:33:02 +0200 (CEST) Subject: [Python-Dev] buildbots In-Reply-To: <3f09d5a00807071149r42b7b7ddl4eb5a91d21e662b0@mail.gmail.com> References: <20080626151855.25821.815972320.divmod.xquotient.10384@joule.divmod.com> <3f09d5a00807051802y20a86ceax2b3d0267ae6b6682@mail.gmail.com> <3f09d5a00807071149r42b7b7ddl4eb5a91d21e662b0@mail.gmail.com> Message-ID: <3baff3efe0bbb965dd60c3b8a996e9f3.squirrel@webmail.nerim.net> Hi and thanks for your answers, > If you are able to offer something that's not on the list, that'd be > good. But any help at all is appreciated. > > I believe Windows has traditionally been under-represented in all > buildbot farms, and it's likely to stay that way... Well what I could provide is a 32-bit x86 Debian stable. Rather common I fear... > For Pybots, we're testing third-party apps and libraries against > changes made to Python core. 
If you're interested in a 3rd party > project, and you're willing to stay on top of that project's buildbot > status, and notify both the project leaders and the Python core devs > whenever you notice an ugly breakage Not interested /enough/ I think... by your description it sounds the job should really be done by a core developer of each of those packages (even if the machine is donated by someone else). What I could be interested in is to provide a buildbot for Python itself, but I don't know if that's needed right now. Especially for such a common platform as a x86 Debian. Regards Antoine. From jeremy at alum.mit.edu Tue Jul 8 14:49:02 2008 From: jeremy at alum.mit.edu (Jeremy Hylton) Date: Tue, 8 Jul 2008 08:49:02 -0400 Subject: [Python-Dev] [Python-checkins] buildbot failure in sparc Debian 3.0 In-Reply-To: <20080702202345.DAD571E400A@bag.python.org> References: <20080702202345.DAD571E400A@bag.python.org> Message-ID: Does anyone have a clue about why this test fails only on this platform? The test is question is verifying that URLError gets raised. From the traceback, it appears that there is an uncaught exception (URLError) but it fails in an assertRaises() check for URLError. That doesn't make much sense unless the variable URLError refers to different objects, but both modules use "from urllib.error import URLError". And, of course, the test works fine on other platforms. Jeremy On Wed, Jul 2, 2008 at 4:23 PM, wrote: > The Buildbot has detected a new failure of sparc Debian 3.0. 
> Full details are available at: > http://www.python.org/dev/buildbot/all/sparc%20Debian%203.0/builds/330 > > Buildbot URL: http://www.python.org/dev/buildbot/all/ > > Buildslave for this Build: klose-debian-sparc > > Build Reason: > Build Source Stamp: [branch branches/py3k] HEAD > Blamelist: benjamin.peterson > > BUILD FAILED: failed test > > Excerpt from the test logfile: > 1 test failed: > test_urllib2 > > ====================================================================== > ERROR: test_badly_named_methods (test.test_urllib2.OpenerDirectorTests) > ---------------------------------------------------------------------- > > Traceback (most recent call last): > File "/home/pybot/buildarea-sid/3.0.klose-debian-sparc/build/Lib/test/test_urllib2.py", line 408, in test_badly_named_methods > self.assertRaises(URLError, o.open, scheme+"://example.com/") > File "/home/pybot/buildarea-sid/3.0.klose-debian-sparc/build/Lib/unittest.py", line 311, in failUnlessRaises > callableObj(*args, **kwargs) > File "/home/pybot/buildarea-sid/3.0.klose-debian-sparc/build/Lib/urllib/request.py", line 356, in open > response = self._open(req, data) > File "/home/pybot/buildarea-sid/3.0.klose-debian-sparc/build/Lib/urllib/request.py", line 379, in _open > 'unknown_open', req) > File "/home/pybot/buildarea-sid/3.0.klose-debian-sparc/build/Lib/urllib/request.py", line 334, in _call_chain > result = func(*args) > File "/home/pybot/buildarea-sid/3.0.klose-debian-sparc/build/Lib/urllib/request.py", line 1102, in unknown_open > raise URLError('unknown url type: %s' % type) > urllib.error.URLError: > > make: *** [buildbottest] Error 1 > > sincerely, > -The Buildbot > > _______________________________________________ > Python-checkins mailing list > Python-checkins at python.org > http://mail.python.org/mailman/listinfo/python-checkins > From martin at v.loewis.de Tue Jul 8 21:15:08 2008 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Tue, 08 Jul 2008 21:15:08 
+0200 Subject: [Python-Dev] [Python-checkins] buildbot failure in sparc Debian 3.0 In-Reply-To: References: <20080702202345.DAD571E400A@bag.python.org> Message-ID: <4873BCBC.9040104@v.loewis.de> Jeremy Hylton wrote: > Does anyone have a clue about why this test fails only on this > platform? The test in question is verifying that URLError gets > raised. From the traceback, it appears that there is an uncaught > exception (URLError) but it fails in an assertRaises() check for > URLError. That doesn't make much sense unless the variable URLError > refers to different objects, but both modules use "from urllib.error > import URLError". And, of course, the test works fine on other > platforms. It might be a transient error, and a complete cleanup of the tree might fix it. To do so, build a non-existent branch through the web ui, then build the original branch again; this will cause a fresh checkout. If the error then persists, my guess is that it's some kind of compiler issue, which can be investigated only with access to the machine. Regards, Martin From glyph at divmod.com Tue Jul 8 21:30:04 2008 From: glyph at divmod.com (glyph at divmod.com) Date: Tue, 08 Jul 2008 19:30:04 -0000 Subject: [Python-Dev] Community buildbots and Python release quality metrics In-Reply-To: <487100FE.508@v.loewis.de> References: <20080626151855.25821.815972320.divmod.xquotient.10384@joule.divmod.com> <3f09d5a00807051802y20a86ceax2b3d0267ae6b6682@mail.gmail.com> <20080706154630.25821.930077863.divmod.xquotient.12261@joule.divmod.com> <487100FE.508@v.loewis.de> Message-ID: <20080708193004.25821.1534535725.divmod.xquotient.12490@joule.divmod.com> On 6 Jul, 05:29 pm, martin at v.loewis.de wrote: >>(I'd also like to improve the labels of the build slaves. What >>exactly >>is "x86 Red Hat 9 trunk" testing? Trunk of what? What project?) > >It seems like you would like to edit the master configuration file. >That can be arranged fairly easily. How shall we arrange it, then?
:) Whoever is interested, I've got a recent SSH key on https://launchpad.net/~glyph/+sshkeys From glyph at divmod.com Tue Jul 8 21:56:56 2008 From: glyph at divmod.com (glyph at divmod.com) Date: Tue, 08 Jul 2008 19:56:56 -0000 Subject: [Python-Dev] Community buildbots and Python release quality metrics In-Reply-To: <3f09d5a00807061409y42f17e8lba152f2bfe5025e4@mail.gmail.com> References: <20080626151855.25821.815972320.divmod.xquotient.10384@joule.divmod.com> <3f09d5a00807051802y20a86ceax2b3d0267ae6b6682@mail.gmail.com> <20080706154630.25821.930077863.divmod.xquotient.12261@joule.divmod.com> <3f09d5a00807061409y42f17e8lba152f2bfe5025e4@mail.gmail.com> Message-ID: <20080708195656.25821.1158092892.divmod.xquotient.12536@joule.divmod.com> On 6 Jul, 09:09 pm, grig.gheorghiu at gmail.com wrote: >On Sun, Jul 6, 2008 at 8:46 AM, wrote: >> >>It's one thing to tell people that they need to be helping out (and >>I'm sure you're right) but it's much more useful to get the message >>out that >>"we really need people to do X, Y, and Z". >I have posted 'cries for help' repeatedly on my blog, with generally >little success. See http://agiletesting.blogspot.com/search?q=pybots . >But I will post again. If you can include here a paragraph of what >your vision of the 'X, Y and Z' above is, that'd help too It looks to me like the main thing that Pybots needs is help with maintenance. Getting someone to set up an individual buildbot is one thing, but keeping it working is the important bit and it looks like people are not doing that. The project would also be greatly aided by having dedicated people diagnose errors, report bugs against Python core if they're real and report bugs against Pybots if they're spurious. It would be good to have this effort be centralized and directed because it would otherwise be too easy to file duplicate bug reports, or to assume "oh, this has been failing for months, someone must have filed a bug already". 
>It's not only a question of changing a static label in this case. A >given buildslave can run the tests for multiple projects, in which >case it becomes tricky to change the label on the fly accordingly. I'm a bit confused about how the projects being tested changes on the fly... but then, this level of specifics is probably best left to the pybots mailing list. Hopefully sometime soon I'll have the time to add yet another subscription. Thanks for cleaning up the buildbots though! I can see a lot more tests actually running now :). From grig.gheorghiu at gmail.com Tue Jul 8 22:47:28 2008 From: grig.gheorghiu at gmail.com (Grig Gheorghiu) Date: Tue, 8 Jul 2008 13:47:28 -0700 Subject: [Python-Dev] Community buildbots and Python release quality metrics In-Reply-To: <20080708195656.25821.1158092892.divmod.xquotient.12536@joule.divmod.com> References: <20080626151855.25821.815972320.divmod.xquotient.10384@joule.divmod.com> <3f09d5a00807051802y20a86ceax2b3d0267ae6b6682@mail.gmail.com> <20080706154630.25821.930077863.divmod.xquotient.12261@joule.divmod.com> <3f09d5a00807061409y42f17e8lba152f2bfe5025e4@mail.gmail.com> <20080708195656.25821.1158092892.divmod.xquotient.12536@joule.divmod.com> Message-ID: <3f09d5a00807081347x631b7ca3w37effe6e75f6fe6@mail.gmail.com> On Tue, Jul 8, 2008 at 12:56 PM, wrote: > It looks to me like the main thing that Pybots needs is help with > maintenance. Getting someone to set up an individual buildbot is one thing, > but keeping it working is the important bit and it looks like people are not > doing that. The project would also be greatly aided by having dedicated > people diagnose errors, report bugs against Python core if they're real and > report bugs against Pybots if they're spurious. > > It would be good to have this effort be centralized and directed because it > would otherwise be too easy to file duplicate bug reports, or to assume "oh, > this has been failing for months, someone must have filed a bug already". 
I agree with all you're saying here. As usual, the devil is in the details. Finding those 'dedicated people' and also people who would act as the central point of contact for bug reports etc. turns out to be very hard in practice. If you have any ideas, I'd be glad to hear them. Grig From tseaver at palladion.com Fri Jul 11 05:08:11 2008 From: tseaver at palladion.com (Tres Seaver) Date: Thu, 10 Jul 2008 23:08:11 -0400 Subject: [Python-Dev] [Python-checkins] buildbot failure in sparc Debian 3.0 In-Reply-To: <4873BCBC.9040104@v.loewis.de> References: <20080702202345.DAD571E400A@bag.python.org> <4873BCBC.9040104@v.loewis.de> Message-ID: <4876CE9B.8050402@palladion.com> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Martin v. Löwis wrote: > Jeremy Hylton wrote: >> Does anyone have a clue about why this test fails only on this >> platform? The test in question is verifying that URLError gets >> raised. From the traceback, it appears that there is an uncaught >> exception (URLError) but it fails in an assertRaises() check for >> URLError. That doesn't make much sense unless the variable URLError >> refers to different objects, but both modules use "from urllib.error >> import URLError". And, of course, the test works fine on other >> platforms. > > It might be a transient error, and a complete cleanup of the tree > might fix it. To do so, build a non-existent branch through the web ui, > then build the original branch again; this will cause a fresh checkout. > > If the error then persists, my guess is that it's some kind of compiler issue, > which can be investigated only with access to the machine. I would also be on the lookout for stale .pyc / .pyo files: I saw a similar failure recently while testing third-party code, and suspected both causes: cleaning out the .pyc files and carefully removing aliased imports eventually got the problem to go away (at which point I could no longer reproduce it at all). Tres.
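The symptom discussed in this thread (a URLError escaping from an assertRaises(URLError, ...) check) is what one would expect if two distinct class objects happened to share the name URLError, for example when the same module ends up loaded twice. The following is only a minimal sketch of that effect under this assumption, not a diagnosis of the sparc buildbot failure. The module names error_fresh and error_stale and the helpers load_copy, raiser and caught_by are hypothetical and were invented for illustration.

```python
import types

# Source of a tiny stand-in module; it is NOT the real urllib.error.
SOURCE = "class URLError(IOError):\n    pass\n"


def load_copy(name):
    """Execute SOURCE in a fresh module object, as if imported under `name`."""
    module = types.ModuleType(name)
    exec(SOURCE, module.__dict__)
    return module


# Two hypothetical copies of the "same" module (say, a fresh and a stale one).
fresh = load_copy("error_fresh")
stale = load_copy("error_stale")


def raiser():
    """Raise the exception class that belongs to the stale copy."""
    raise stale.URLError("unknown url type")


def caught_by(expected_class, func):
    """Return True if func() raises an exception matched by expected_class."""
    try:
        func()
    except expected_class:
        return True
    except Exception:
        # Same class *name*, but a different class *object*: no match.
        return False
    return False
```

Here caught_by(stale.URLError, raiser) matches, while caught_by(fresh.URLError, raiser) does not, even though both classes print with the same name. That would be consistent with Tres's experience that clearing stale .pyc files and removing aliased imports made a similar failure disappear.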
- -- =================================================================== Tres Seaver +1 540-429-0999 tseaver at palladion.com Palladion Software "Excellence by Design" http://palladion.com -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.6 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFIds6b+gerLs4ltQ4RAmIRAJ4pxs0sWLDrpOAilqV+Mx8vKJzeEQCeLMoX gsFhfjJ4bxwAxgBji7/Jzvw= =bMRD -----END PGP SIGNATURE----- From dickinsm at gmail.com Fri Jul 11 10:37:36 2008 From: dickinsm at gmail.com (Mark Dickinson) Date: Fri, 11 Jul 2008 09:37:36 +0100 Subject: [Python-Dev] patch review request: float.hex and float.fromhex Message-ID: <5c6f2a5d0807110137l3fffd090mbc2c9c80d1f7b05b@mail.gmail.com> Does anyone have time to review the patch http://bugs.python.org/file10876/hex_float5.patch for issue 3008 (float <-> hexadecimal string conversion): http://bugs.python.org/issue3008 ? It would be really good if this could go in before next week's beta. Guido's approved the idea in principle, but I still need to: - get permission from Barry to check in a new feature this late in the release cycle, and - persuade some other developer to review the patch. I'll gladly 'pay' for a patch review by reviewing one or more of someone else's patches. Mark From python at rcn.com Fri Jul 11 11:06:51 2008 From: python at rcn.com (Raymond Hettinger) Date: Fri, 11 Jul 2008 12:06:51 +0300 Subject: [Python-Dev] patch review request: float.hex and float.fromhex References: <5c6f2a5d0807110137l3fffd090mbc2c9c80d1f7b05b@mail.gmail.com> Message-ID: <3A2DC3264D7A4DCD9D64416A9441B8A1@RaymondLaptop1> From: "Mark Dickinson" > Does anyone have time to review the patch > > http://bugs.python.org/file10876/hex_float5.patch > > for issue 3008 (float <-> hexadecimal string conversion): I'll look at it today and tomorrow. 
Raymond From python at rcn.com Fri Jul 11 13:24:39 2008 From: python at rcn.com (Raymond Hettinger) Date: Fri, 11 Jul 2008 14:24:39 +0300 Subject: [Python-Dev] Running Py2.6 with the -3 option Message-ID: <731F1430F8054EC1A183F121F84B4504@RaymondLaptop1> Some effort needs to be made to clear the standard library of -3 warnings. Running -3 on production code usually involves exercising library code so the useful result is obscured by Python complaining about itself. Since that use case involves the users own tests, I don't think the effort needs to be extended to our own unittest suite. But the rest of the library could likely benefit from a good -3 cleanup. Raymond From kirkshorts at hotmail.co.uk Fri Jul 11 13:27:44 2008 From: kirkshorts at hotmail.co.uk (Andy Scott) Date: Fri, 11 Jul 2008 11:27:44 +0000 Subject: [Python-Dev] A proposed solution for Issue 502236: Asyncrhonous exceptions between threads Message-ID: [OK so a newbie post here so many apologies if I am doing this wrong] Quick Synopsis: A child thread in an executing Python program can not safely shutdown the program. The issue URL is: http://bugs.python.org/issue502236 So my proposal is: Example: We have three threads - t0 - Main system thread t1 - Worker thread t2 - Worker thread t1 encounters an issue that means it wants to shut down the application in as safe a way as possible A Solution: 1. Put in place a new function call sys.exitapplication, what this would do is: a. Mark a flag in t0's data structure saying a request to shutdown has been made b. Raise a new exception, SystemShuttingDown, in t1. 2. As the main interpreter executes it checks the "shutting down flag" in the per thread data and follows one of two paths: If it is t0: a. Stops execution of the current code sequence b. Iterates over all extant threads setting the "system shutdown" flag in the per thread data structure. Setting this flag is a one time deal - it can not be undone once set. 
(And to avoid issues with multiple threads setting it - it can only ever be a single fixed value so setting it multiple times results in the same answer) c. Enters a timed wait loop where it will allow the other threads time to see the signal. It will iterate this loop a set number of times to avoid being blocked on any given thread. d. When all threads have exited, or been forcefully closed, raise the SystemShuttingDown exception If it is not t0: a. Stops execution of the current code sequence b. Raises the exception, SystemShuttingDown. There are problems with this approach, as I see it they are (but please see the assumptions I have made): P1. If the thread is in a tight loop will it see the exception? Or more generally: when should the exception be raised? P2. When should the interpreter check this flag? I think the answer to both of these problems is to: Check the flag, and hence raise the exception, in the following circumstances: - When the interpreter executes a back loop. So this should catch the jump back to the top of a "while True:" loop - Just before the interpreter makes a call to a hooked in non-Python system function, e.g. file I/O, networking &c. Checking at these points should be the minimal required, I think, to ensure that a given thread can not ignore the exception. It may be possible, or even required, to perform the check every time a Python function call is made. I think this approach would then allow for the finally handlers to be called. Assumptions: [Here I must admit to a large amount of ignorance of the internals of Python at this time. So if my assumptions are incorrect I would greatly appreciate being told so :-) Preferably as polite as possible and any code pointers while welcome unless they point to some very esoteric and arcane area would be best kept general so I feel more of a spur to go learn the code base] 1. The Python interpreter has per thread information. 2. 
The Python interpreter can tell if the system, t0, thread is running. 3. The Python engine has (or can easily obtain) a list of all threads it created. 4. It is possible to raise exceptions as the byte code is executing. I am mailing this out as: A. I have no idea if my thoughts are correct or total un-mitigated rubbish :-) B. I believe the introduction of this proposal (if I am correct) will require a PEP being raised, which aiui requires building community support (which is very fair imo) so this is me trying to do so :-) So apologies if this post has been total spam (but no eggs) or too long - give a little whistle and it will all be OK again. Andy --------------------------------------Brain chemistry is not just for Christmas From musiccomposition at gmail.com Fri Jul 11 14:19:18 2008 From: musiccomposition at gmail.com (Benjamin Peterson) Date: Fri, 11 Jul 2008 07:19:18 -0500 Subject: [Python-Dev] Running Py2.6 with the -3 option In-Reply-To: <731F1430F8054EC1A183F121F84B4504@RaymondLaptop1> References: <731F1430F8054EC1A183F121F84B4504@RaymondLaptop1> Message-ID: <1afaf6160807110519w16b97efbt8a7b13b1276df135@mail.gmail.com> On Fri, Jul 11, 2008 at 6:24 AM, Raymond Hettinger wrote: > Some effort needs to be made to clear the standard library of -3 warnings. > Running -3 on production code usually involves exercising library code so > the useful result is obscured by Python complaining about itself. Since > that use case involves the users own tests, I don't think the effort needs > to be extended to our own unittest suite. But the rest of the library could > likely benefit from a good -3 cleanup. Yes, indeed.
We should make sure, however, that the changes in the 2.6 libraries are the absolute minimum to get the job done. (I'm trying to pretend like this isn't violating the prohibition on all-inclusive overhauls in the stdlib.) -- Cheers, Benjamin Peterson "There's no place like 127.0.0.1." From steve at holdenweb.com Fri Jul 11 15:02:21 2008 From: steve at holdenweb.com (Steve Holden) Date: Fri, 11 Jul 2008 09:02:21 -0400 Subject: [Python-Dev] Running Py2.6 with the -3 option In-Reply-To: <1afaf6160807110519w16b97efbt8a7b13b1276df135@mail.gmail.com> References: <731F1430F8054EC1A183F121F84B4504@RaymondLaptop1> <1afaf6160807110519w16b97efbt8a7b13b1276df135@mail.gmail.com> Message-ID: Benjamin Peterson wrote: > On Fri, Jul 11, 2008 at 6:24 AM, Raymond Hettinger wrote: >> Some effort needs to be made to clear the standard library of -3 warnings. >> Running -3 on production code usually involves exercising library code so >> the useful result is obscured by Python complaining about itself. Since >> that use case involves the users own tests, I don't think the effort needs >> to be extended to our own unittest suite. But the rest of the library could >> likely benefit from a good -3 cleanup. > > Yes, indeed. We should make sure, however, that the changes in the 2.6 > libraries are the absolute minimum to get the job done. (I'm trying to > pretend like this isn't violating the prohibition on all-inclusive > overhauls in the stdlib.) > The prohibition is on *gratuitous* changes, basically along the lines of "if it ain't broke, don't fix it". The stdlib is definitely broken if it raises warnings of that kind. 
regards Steve -- Steve Holden +1 571 484 6266 +1 800 494 3119 Holden Web LLC http://www.holdenweb.com/ From musiccomposition at gmail.com Fri Jul 11 16:50:21 2008 From: musiccomposition at gmail.com (Benjamin Peterson) Date: Fri, 11 Jul 2008 09:50:21 -0500 Subject: [Python-Dev] Running Py2.6 with the -3 option In-Reply-To: References: <731F1430F8054EC1A183F121F84B4504@RaymondLaptop1> <1afaf6160807110519w16b97efbt8a7b13b1276df135@mail.gmail.com> Message-ID: <1afaf6160807110750m45ccf92ek37a255fef025832b@mail.gmail.com> On Fri, Jul 11, 2008 at 8:02 AM, Steve Holden wrote: > Benjamin Peterson wrote: >> >> Yes, indeed. We should make sure, however, that the changes in the 2.6 >> libraries are the absolute minimum to get the job done. (I'm trying to >> pretend like this isn't violating the prohibition on all-inclusive >> overhauls in the stdlib.) >> > The prohibition is on *gratuitous* changes, basically along the lines of "if > it ain't broke, don't fix it". The stdlib is definitely broken if it raises > warnings of that kind. Just because it's massive breakage fixage doesn't mean that it's unlikely to break something else. :) > > regards > Steve > -- > Steve Holden +1 571 484 6266 +1 800 494 3119 > Holden Web LLC http://www.holdenweb.com/ > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > http://mail.python.org/mailman/options/python-dev/musiccomposition%40gmail.com > -- Cheers, Benjamin Peterson "There's no place like 127.0.0.1." 
From steve at holdenweb.com Fri Jul 11 17:08:41 2008 From: steve at holdenweb.com (Steve Holden) Date: Fri, 11 Jul 2008 11:08:41 -0400 Subject: [Python-Dev] Running Py2.6 with the -3 option In-Reply-To: <1afaf6160807110750m45ccf92ek37a255fef025832b@mail.gmail.com> References: <731F1430F8054EC1A183F121F84B4504@RaymondLaptop1> <1afaf6160807110519w16b97efbt8a7b13b1276df135@mail.gmail.com> <1afaf6160807110750m45ccf92ek37a255fef025832b@mail.gmail.com> Message-ID: <48777779.70302@holdenweb.com> Benjamin Peterson wrote: > On Fri, Jul 11, 2008 at 8:02 AM, Steve Holden wrote: >> Benjamin Peterson wrote: >>> Yes, indeed. We should make sure, however, that the changes in the 2.6 >>> libraries are the absolute minimum to get the job done. (I'm trying to >>> pretend like this isn't violating the prohibition on all-inclusive >>> overhauls in the stdlib.) >>> >> The prohibition is on *gratuitous* changes, basically along the lines of "if >> it ain't broke, don't fix it". The stdlib is definitely broken if it raises >> warnings of that kind. > > Just because it's massive breakage fixage doesn't mean that it's > unlikely to break something else. :) > I agree but, contrariwise, just because we are likely to break other things doesn't mean we shouldn't fix the massive breakage. 
regards Steve -- Steve Holden +1 571 484 6266 +1 800 494 3119 Holden Web LLC http://www.holdenweb.com/ From status at bugs.python.org Fri Jul 11 18:06:49 2008 From: status at bugs.python.org (Python tracker) Date: Fri, 11 Jul 2008 18:06:49 +0200 (CEST) Subject: [Python-Dev] Summary of Python tracker Issues Message-ID: <20080711160649.5EDC0782BC@psf.upfronthosting.co.za> ACTIVITY SUMMARY (07/04/08 - 07/11/08) Python tracker at http://bugs.python.org/ To view or respond to any of the issues listed below, click on the issue number. Do NOT respond to this message.
1967 open (+43) / 13199 closed (+17) / 15166 total (+60) Open issues with patches: 621 Average duration of open issues: 700 days. Median duration of open issues: 1604 days. Open Issues Breakdown open 1939 (+42) pending 28 ( +1) Issues Created Or Reopened (60) _______________________________ Delete obsolete 'Unicode' in Py3 str docstrings; related fixes 07/04/08 CLOSED http://bugs.python.org/issue3284 created tjreedy easy Fraction.from_any() 07/04/08 CLOSED http://bugs.python.org/issue3285 created rhettinger patch IDLE opens window too low on Windows 07/04/08 http://bugs.python.org/issue3286 created tjreedy Fraction constructor should raise TypeError instead of Attribute 07/04/08 CLOSED http://bugs.python.org/issue3287 created rhettinger patch float.as_integer_ratio method is not documented 07/05/08 http://bugs.python.org/issue3288 created marketdickinson unnecessary call to time and localtime slows time.mktime 07/05/08 CLOSED http://bugs.python.org/issue3289 created nother_jnelson python-config --cflags includes irrelevant flags 07/05/08 http://bugs.python.org/issue3290 created sacha rlcompleter doesn't work anymore 07/05/08 CLOSED http://bugs.python.org/issue3291 created pitrou patch Position index limit; s.insert(i,x) not same as s[i:i]=[x] 07/05/08 http://bugs.python.org/issue3292 created tjreedy incorrect comments for PyObject_ReleaseBuffer 07/05/08 http://bugs.python.org/issue3293 created pitrou SVN repository contains an incorrect symbolic link 07/05/08 CLOSED http://bugs.python.org/issue3294 created pitrou PyExc_BufferError is declared but nowhere defined 07/05/08 CLOSED http://bugs.python.org/issue3295 created pitrou patch print function not executed in python 3.0 tutorial 07/05/08 CLOSED http://bugs.python.org/issue3296 created segfaulthunter Python interpreter uses Unicode surrogate pairs only before the 07/06/08 http://bugs.python.org/issue3297 created ezio.melotti Multiline string with quotes is not parsed correctly. 
07/06/08 CLOSED http://bugs.python.org/issue3298 created Stavros invalid object destruction in re.finditer() 07/06/08 http://bugs.python.org/issue3299 created haypo patch urllib.quote and unquote - Unicode issues 07/06/08 http://bugs.python.org/issue3300 created mgiuca patch DoS when lo is negative in bisect.insort_right() / _left() 07/06/08 CLOSED http://bugs.python.org/issue3301 created haypo patch segfault on gettext(None) 07/06/08 http://bugs.python.org/issue3302 created haypo patch invalid ref count on locale.strcoll() error 07/06/08 http://bugs.python.org/issue3303 created haypo patch invalid call to PyMem_Free() in fileio_init() 07/06/08 http://bugs.python.org/issue3304 created haypo patch, easy Use Py_XDECREF() instead of Py_DECREF() in MultibyteCodec and Mu 07/06/08 http://bugs.python.org/issue3305 created haypo patch audioop.findmax() crashs with negative length 07/06/08 CLOSED http://bugs.python.org/issue3306 created haypo patch invalid check of _bsddb creation failure 07/06/08 http://bugs.python.org/issue3307 created haypo patch MinGW built extensions do not load (specified procedure cannot b 07/07/08 CLOSED http://bugs.python.org/issue3308 created rogerbinns missing lock release in BZ2File_iternext() 07/07/08 http://bugs.python.org/issue3309 created haypo patch, easy Out-of-date example 3.0b1 Tutorial Classes page, 'issubclass' 07/07/08 http://bugs.python.org/issue3310 created alexis.layton block operation on closed socket/pipe for multiprocessing 07/07/08 http://bugs.python.org/issue3311 created haypo patch bugs in _sqlite module 07/07/08 http://bugs.python.org/issue3312 created haypo patch dlopen() error with no error message from dlerror() 07/07/08 http://bugs.python.org/issue3313 created haypo patch urllib.parse doesn't import sys 07/07/08 CLOSED http://bugs.python.org/issue3314 created mgiuca patch abc.rst little error 07/07/08 CLOSED http://bugs.python.org/issue3315 created mishok13 patch Proposal for fix_urllib 07/07/08 
http://bugs.python.org/issue3316 created nedds patch duplicate lines in zipfile.py 07/07/08 http://bugs.python.org/issue3317 created amaury.forgeotdarc patch Documentation: timeit: "lower bound" should read "upper bound" 07/08/08 http://bugs.python.org/issue3318 created unutbu pystone.main(10) causes ZeroDivisionError 07/08/08 http://bugs.python.org/issue3319 created mokeefe patch various doc typos 07/08/08 http://bugs.python.org/issue3320 created dsm001 patch _multiprocessing.Connection() doesn't check handle 07/08/08 http://bugs.python.org/issue3321 created haypo patch bugs in scanstring_str() and scanstring_unicode() of _json modul 07/08/08 http://bugs.python.org/issue3322 created haypo Clarify __slots__ behaviour when inheriting 07/09/08 http://bugs.python.org/issue3323 created strangefeatures Broken link in online doc 07/09/08 http://bugs.python.org/issue3324 created ThomasH use of cPickle in multiprocessing 07/09/08 http://bugs.python.org/issue3325 created mishok13 patch py3k shouldn't use -fno-strict-aliasing anymore 07/09/08 http://bugs.python.org/issue3326 created cartman patch NULL member in modules_by_index 07/09/08 http://bugs.python.org/issue3327 created krisvale When PyObject_CallMethod fails, refcount is incorrect 07/09/08 http://bugs.python.org/issue3328 created dominic.lavoie API for setting the memory allocator used by Python 07/09/08 http://bugs.python.org/issue3329 created jlaurila webbrowser module doesn't correctly handle '|' character. 07/09/08 http://bugs.python.org/issue3330 created AdrianP Possible inconsistency in behavior of list comprehensions vs. ge 07/10/08 http://bugs.python.org/issue3331 created carlj DocTest and dict sort. 
07/10/08 CLOSED http://bugs.python.org/issue3332 created jedie Need -3 warning for exec statement becoming a function 07/10/08 CLOSED http://bugs.python.org/issue3333 created rhettinger 2to3 looses indentation on import fix 07/10/08 http://bugs.python.org/issue3334 created ctheune subprocess lib - opening same command fails 07/10/08 http://bugs.python.org/issue3335 created gtg944q datetime weekday() function 07/10/08 http://bugs.python.org/issue3336 created ryanboesch Fixer for dbm is failing 07/11/08 CLOSED http://bugs.python.org/issue3337 created brett.cannon cPickle segfault with deep recursion 07/11/08 http://bugs.python.org/issue3338 created esrever_otua dummy_thread LockType.acquire() always returns None, should be T 07/11/08 http://bugs.python.org/issue3339 created toymachine patch optparse print_usage(),.. methods are not documented 07/11/08 http://bugs.python.org/issue3340 created techtonik "Suggest a change" link 07/11/08 http://bugs.python.org/issue3341 created techtonik Tracebacks are not properly indented 07/11/08 http://bugs.python.org/issue3342 created amaury.forgeotdarc patch Py_DisplaySourceLine is not documented 07/11/08 http://bugs.python.org/issue3343 created amaury.forgeotdarc Issues Now Closed (36) ______________________ async_chat.__init__() parameters 221 days http://bugs.python.org/issue1519 josiahcarlson Error when printing an exception containing a Unicode string 99 days http://bugs.python.org/issue2517 ncoghlan patch performance problem in socket._fileobject.read 82 days http://bugs.python.org/issue2632 gregory.p.smith patch shutil.copytree glob-style filtering [patch] 76 days http://bugs.python.org/issue2663 georg.brandl patch "Report bug" links 61 days http://bugs.python.org/issue2823 techtonik cleanup of freelist management 52 days http://bugs.python.org/issue2862 gregory.p.smith patch, patch By default, HTTPSConnection should send header "Host: somehost" 24 days http://bugs.python.org/issue3094 gregory.p.smith patch, easy glob.py 
improvements 20 days http://bugs.python.org/issue3159 facundobatista patch cmath test fails on Solaris 10 14 days http://bugs.python.org/issue3168 MrJean1 patch sha modules & Modules/Setup.dist 13 days http://bugs.python.org/issue3183 gregory.p.smith float('infinity') should be valid 11 days http://bugs.python.org/issue3188 marketdickinson patch Improve subprocess module usage 6 days http://bugs.python.org/issue3235 georg.brandl curses/textpad.py incorrectly and redundantly imports ascii 5 days http://bugs.python.org/issue3239 facundobatista patch socket's OOB data management is broken on OS X and FreeBSD 3 days http://bugs.python.org/issue3277 gregory.p.smith socket's SO_OOBINLINE option does not work 3 days http://bugs.python.org/issue3278 gregory.p.smith %c format does not accept large numbers on ucs-2 builds 1 days http://bugs.python.org/issue3280 amaury.forgeotdarc Delete obsolete 'Unicode' in Py3 str docstrings; related fixes 0 days http://bugs.python.org/issue3284 benjamin.peterson easy Fraction.from_any() 6 days http://bugs.python.org/issue3285 rhettinger patch Fraction constructor should raise TypeError instead of Attribute 6 days http://bugs.python.org/issue3287 rhettinger patch unnecessary call to time and localtime slows time.mktime 0 days http://bugs.python.org/issue3289 facundobatista rlcompleter doesn't work anymore 0 days http://bugs.python.org/issue3291 benjamin.peterson patch SVN repository contains an incorrect symbolic link 0 days http://bugs.python.org/issue3294 benjamin.peterson PyExc_BufferError is declared but nowhere defined 0 days http://bugs.python.org/issue3295 benjamin.peterson patch print function not executed in python 3.0 tutorial 0 days http://bugs.python.org/issue3296 benjamin.peterson Multiline string with quotes is not parsed correctly. 
0 days http://bugs.python.org/issue3298 Stavros DoS when lo is negative in bisect.insort_right() / _left() 4 days http://bugs.python.org/issue3301 rhettinger patch audioop.findmax() crashs with negative length 1 days http://bugs.python.org/issue3306 facundobatista patch MinGW built extensions do not load (specified procedure cannot b 1 days http://bugs.python.org/issue3308 loewis urllib.parse doesn't import sys 0 days http://bugs.python.org/issue3314 facundobatista patch abc.rst little error 0 days http://bugs.python.org/issue3315 benjamin.peterson patch DocTest and dict sort. 0 days http://bugs.python.org/issue3332 rhettinger Need -3 warning for exec statement becoming a function 1 days http://bugs.python.org/issue3333 rhettinger Fixer for dbm is failing 0 days http://bugs.python.org/issue3337 brett.cannon asyncore.py and "handle_error" 1839 days http://bugs.python.org/issue760475 josiahcarlson asyncore misses socket closes when poll is used 1515 days http://bugs.python.org/issue953599 josiahcarlson asyncore should handle also ECONNABORTED in recv 390 days http://bugs.python.org/issue1736101 josiahcarlson patch Top Issues Most Discussed (10) ______________________________ 14 threading module can deadlock after fork 1643 days open http://bugs.python.org/issue874900 11 MinGW built extensions do not load (specified procedure cannot 1 days closed http://bugs.python.org/issue3308 10 urllib.quote and unquote - Unicode issues 5 days open http://bugs.python.org/issue3300 9 Crash in PyObject_Malloc 356 days open http://bugs.python.org/issue1758146 8 urllib2 header capitalization 122 days open http://bugs.python.org/issue2275 7 bytearrays are not thread safe 22 days open http://bugs.python.org/issue3139 7 test_multiprocessing hangs intermittently on POSIX platforms 9 days open http://bugs.python.org/issue3088 6 API for setting the memory allocator used by Python 2 days open http://bugs.python.org/issue3329 6 duplicate lines in zipfile.py 4 days open 
http://bugs.python.org/issue3317 6 Let bin/oct/hex show floats 17 days open http://bugs.python.org/issue3008 From rhamph at gmail.com Fri Jul 11 21:26:33 2008 From: rhamph at gmail.com (Adam Olsen) Date: Fri, 11 Jul 2008 13:26:33 -0600 Subject: [Python-Dev] Running Py2.6 with the -3 option In-Reply-To: References: <731F1430F8054EC1A183F121F84B4504@RaymondLaptop1> <1afaf6160807110519w16b97efbt8a7b13b1276df135@mail.gmail.com> Message-ID: On Fri, Jul 11, 2008 at 7:02 AM, Steve Holden wrote: > Benjamin Peterson wrote: >> >> On Fri, Jul 11, 2008 at 6:24 AM, Raymond Hettinger wrote: >>> >>> Some effort needs to be made to clear the standard library of -3 >>> warnings. >>> Running -3 on production code usually involves exercising library code >>> so >>> the useful result is obscured by Python complaining about itself. Since >>> that use case involves the users own tests, I don't think the effort >>> needs >>> to be extended to our own unittest suite. But the rest of the library >>> could >>> likely benefit from a good -3 cleanup. >> >> Yes, indeed. We should make sure, however, that the changes in the 2.6 >> libraries are the absolute minimum to get the job done. (I'm trying to >> pretend like this isn't violating the prohibition on all-inclusive >> overhauls in the stdlib.) >> > The prohibition is on *gratuitous* changes, basically along the lines of "if > it ain't broke, don't fix it". The stdlib is definitely broken if it raises > warnings of that kind. Is the stdlib broken or is it the warnings that are broken? The code is just fine in 2.6. Adding pragmas to disable warnings would be just fine. Or we could hardcode some warnings as "already seen". 
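The closest existing mechanism to a "pragma" or an "already seen" marker is a warnings filter. What follows is only a minimal sketch, written for current Python 3 rather than 2.6; `legacy_helper` is a made-up stand-in for library code that triggers a warning, not anything in the stdlib.

```python
import warnings


def legacy_helper():
    # Hypothetical stand-in for library code that emits a warning.
    warnings.warn("legacy_helper() is deprecated", DeprecationWarning, stacklevel=2)
    return 42


with warnings.catch_warnings(record=True) as caught:
    # Start from "report everything", as someone hunting for warnings would.
    warnings.simplefilter("always")
    # Then silence the one known warning. Filters added later are consulted
    # first, so this "ignore" wins over the blanket "always" above.
    warnings.filterwarnings(
        "ignore",
        message=r"legacy_helper\(\)",
        category=DeprecationWarning,
    )
    result = legacy_helper()

# The call still works; the warning was filtered out and never recorded.
print(result, len(caught))
```

The filter only hides the message; the code that triggers the warning is unchanged, which is the trade-off the thread is arguing about.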
-- Adam Olsen, aka Rhamphoryncus From brett at python.org Fri Jul 11 22:16:30 2008 From: brett at python.org (Brett Cannon) Date: Fri, 11 Jul 2008 13:16:30 -0700 Subject: [Python-Dev] Running Py2.6 with the -3 option In-Reply-To: References: <731F1430F8054EC1A183F121F84B4504@RaymondLaptop1> <1afaf6160807110519w16b97efbt8a7b13b1276df135@mail.gmail.com> Message-ID: On Fri, Jul 11, 2008 at 12:26 PM, Adam Olsen wrote: > On Fri, Jul 11, 2008 at 7:02 AM, Steve Holden wrote: >> Benjamin Peterson wrote: >>> >>> On Fri, Jul 11, 2008 at 6:24 AM, Raymond Hettinger wrote: >>>> >>>> Some effort needs to be made to clear the standard library of -3 >>>> warnings. >>>> Running -3 on production code usually involves exercising library code >>>> so >>>> the useful result is obscured by Python complaining about itself. Since >>>> that use case involves the users own tests, I don't think the effort >>>> needs >>>> to be extended to our own unittest suite. But the rest of the library >>>> could >>>> likely benefit from a good -3 cleanup. >>> >>> Yes, indeed. We should make sure, however, that the changes in the 2.6 >>> libraries are the absolute minimum to get the job done. (I'm trying to >>> pretend like this isn't violating the prohibition on all-inclusive >>> overhauls in the stdlib.) >>> >> The prohibition is on *gratuitous* changes, basically along the lines of "if >> it ain't broke, don't fix it". The stdlib is definitely broken if it raises >> warnings of that kind. > > Is the stdlib broken or is it the warnings that are broken? Nothing is broken, per se, but the stdlib emits a ton of warnings through basic usage for Py3K-related changes. We are telling people to run their code in 2.6 with -3 and to eliminate all warnings in order to have 2to3 work to transition to 3.0. Having the stdlib itself emit warnings is just not reasonable. > The code > is just fine in 2.6. Adding pragmas to disable warnings would be just > fine. Or we could hardcode some warnings as "already seen". 
> No, we should eat our own dog food and transition the code over. If anything it will help with code maintenance between 2.x and 3.x. -Brett From josiah.carlson at gmail.com Sat Jul 12 04:51:58 2008 From: josiah.carlson at gmail.com (Josiah Carlson) Date: Fri, 11 Jul 2008 19:51:58 -0700 Subject: [Python-Dev] A proposed solution for Issue 502236: Asyncrhonous exceptions between threads In-Reply-To: References: Message-ID: This doesn't need to be an interpreter thing; it's easy to implement by the user (I've done it about a dozen times using a single global flag). If you want it to be automatic, it's even possible to make it happen automatically using sys.settrace() and friends (you can even make it reasonably fast if you use a C callback). - Josiah On Fri, Jul 11, 2008 at 4:27 AM, Andy Scott wrote: > [OK so a newbie post here so many apologies if I am doing this wrong] > > Quick Synopsis: > > A child thread in an executing Python program can not safely shutdown the > program. The issue URL is: http://bugs.python.org/issue502236 > > So my proposal is: > > Example: > > We have three threads - > t0 - Main system thread > t1 - Worker thread > t2 - Worker thread > > t1 encounters an issue that means it wants to shut down the application in > as safe a way as possible > > > A Solution: > > 1. Put in place a new function call sys.exitapplication, what this would do > is: > a. Mark a flag in t0's data structure saying a request to shutdown has > been made > b. Raise a new exception, SystemShuttingDown, in t1. > 2. As the main interpreter executes it checks the "shutting down flag" in > the per thread data and follows one of two paths: > If it is t0: > a. Stops execution of the current code sequence > b. Iterates over all extant threads setting the "system shutdown" flag > in the per thread data structure. Setting this flag is a one time deal - it > can not be undone once set. 
(And to avoid issues with multiple threads > setting it - it can only ever be a single fixed value so setting it multiple > times results in the same answer) > c. Enters a timed wait loop where it will allow the other threads time > to see the signal. It will iterate this loop a set number of times to avoid > being blocked on any given thread. > d. When all threads have exited, or been forcefully closed, raise the > SystemShuttingDown exception > > If it is not t0: > a. Stops execution of the current code sequence > b. Raises the exception, SystemShuttingDown. > > There are problems with this approach, as I see it they are (but please see > the assumptions I have made): > > P1. If the thread is in a tight loop will it see the exception? Or more > generally: when should the exception be raised? > P2. When should the interpreter check this flag? > > I think the answer to both of these problems is to: > > Check the flag, and hence raise the exception, in the following > circumstances: > > - When the interpreter executes a back loop. So this should catch the jump > back to the top of a "while True:" loop > - Just before the interpreter makes a call to a hooked in non-Python > system function, e.g. file I/O, networking &c. > > Checking at these points should be the minimal required, I think, to ensure > that a given thread can not ignore the exception. It may be possible, or > even required, to perform the check every time a Python function call is > made. > > I think this approach would then allow for the finally handlers to be > called. > > Assumptions: > > [Here I must admit to a large amount of ignorance of the internals of Python > at this time. So if my assumptions are incorrect I would greatly appreciate > being told so :-) Preferably as polite as possible and any code pointers > while welcome unless they point to some very esoteric and arcane area would > be best kept general so I feel more of a spur to go learn the code base] > > 1. 
The Python interpreter has per thread information. > 2. The Python interpreter can tell if the system, t0, thread is running. > 3. The Python engine has (or can easily obtain) a list of all threads it > created. > 4. It is possible to raise exceptions as the byte code is executing. > > I am mailing this out as: > > A. I have no idea if my thoughts are correct or total un-mitigated rubbish > :-) > B. I believe the introduction of this proposal (if I am correct) will > require a PEP being raised, which aiui requires building community support > (which is very fair imo) so this is me trying to do so :-) > > So apologies if this post has been total spam (but no eggs) or too long - > give a little whistle and it will all be OK again. > > Andy > -------------------------------------- > Brain chemistry is not just for Christmas > > > ________________________________ > Get Messenger on your Mobile! Get it now! > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > http://mail.python.org/mailman/options/python-dev/josiah.carlson%40gmail.com > > From fumanchu at aminus.org Sat Jul 12 19:08:31 2008 From: fumanchu at aminus.org (Robert Brewer) Date: Sat, 12 Jul 2008 10:08:31 -0700 Subject: [Python-Dev] A proposed solution for Issue 502236: Asyncrhonousexceptions between threads In-Reply-To: References: Message-ID: Josiah Carlson wrote: > This doesn't need to be an interpreter thing; it's easy to implement > by the user (I've done it about a dozen times using a single global > flag). If you want it to be automatic, it's even possible to make it > happen automatically using sys.settrace() and friends (you can even > make it reasonably fast if you use a C callback). Agreed. If someone wants a small library to help do this, especially in web servers, the latest version of Cherrpy includes a 'process' subpackage under a generous license. 
It does all the things Andy describes via a Bus object: > Andy Scott wrote: > > 1. Put in place a new function call sys.exitapplication, what this > > would do is: > > a. Mark a flag in t0's data structure saying a request to > > shutdown has been made This is bus.exit(), which publishes a 'stop' message to all subscribed 'stop' listeners, and then an 'exit' message to any 'exit' listeners. > > b. Raise a new exception, SystemShuttingDown, in t1. That's up to the listener. > > 2. As the main interpreter executes it checks the "shutting down > > flag" in the per thread data and follows one of two paths: > > If it is t0: > > a. Stops execution of the current code sequence > > b. Iterates over all extant threads ... > > c. Enters a timed wait loop where it will allow the other > > threads time to see the signal. It will iterate this loop > > a set number of times to avoid being blocked on any given > > thread. This is implemented as [t.join() for t in threading.enumerate()] in the main thread. > > d. When all threads have exited, or been forcefully closed, > > raise the SystemShuttingDown exception The bus just lets the main thread exit at this point. > > P1. If the thread is in a tight loop will it see the exception? Or > > more generally: when should the exception be raised? That's dependent enough on what work the thread is doing that a completely generic approach is generally not sufficient. Therefore, the process.bus sends a 'stop' message, and leaves the implementation of the receiver up to the author of that thread's logic. Presumably, one wouldn't register a listener for the 'stop' message unless one knew how to actually stop. > > P2. When should the interpreter check this flag? > > > > I think the answer to both of these problems is to check the flag, > > and hence raise the exception, in the following circumstances: > > - When the interpreter executes a back loop. 
So this should catch > > the jump back to the top of a "while True:" loop > > - Just before the interpreter makes a call to a hooked in non- > > Python system function, e.g. file I/O, networking &c. This is indeed how most well-written apps do it already. > > Checking at these points should be the minimal required, I think, to > > ensure that a given thread can not ignore the exception. It may be > > possible, or even required, to perform the check every time a Python > > function call is made. PLEASE don't make Python function calls slower. > > 1. The Python interpreter has per thread information. > > 2. The Python interpreter can tell if the system, t0, thread is > > running. > > 3. The Python engine has (or can easily obtain) a list of all > > threads it created. > > 4. It is possible to raise exceptions as the byte code is executing. Replace 'Python interpreter' with 'your application' and those become relatively simple architectural issues: maintain a list of threads, have them expose an interface to determine if they're running, and make them monitor a flag to know when another thread is asking them to stop. Robert Brewer fumanchu at aminus.org From matt.giuca at gmail.com Sat Jul 12 19:27:16 2008 From: matt.giuca at gmail.com (Matt Giuca) Date: Sun, 13 Jul 2008 03:27:16 +1000 Subject: [Python-Dev] urllib.quote and unquote - Unicode issues Message-ID: Hi all, My first post to the list. In fact, first time Python hacker, long-time Python user though. (Melbourne, Australia). Some of you may have seen for the past week or so my bug report on Roundup, http://bugs.python.org/issue3300 I've spent a heap of effort on this patch now so I'd really like to get some more opinions and have this patch considered for Python 3.0. Basically, urllib.quote and unquote seem not to have been updated since Python 2.5, and because of this they implicitly perform Latin-1 encoding and decoding (with respect to percent-encoded characters). 
I think they should default to UTF-8 for a number of reasons, including that's what other software such as web browsers use. I've submitted a patch which fixes quote and unquote to use UTF-8 by default. I also added extra arguments allowing the caller to choose the encoding (after discussion, there was some consensus that this would be beneficial). I have now completed updating the documentation, writing extensive test cases, and testing the rest of the standard library for code breakage - with the result being there wasn't really any, everything seems to just work nicely with UTF-8. You can read the sordid details of my investigation in the tracker. Firstly, it'd be nice to hear if people think this is desirable behaviour. Secondly, if it's feasible to get this patch in Python 3.0. (I think if it were delayed to Python 3.1, the code breakage wouldn't justify it). And thirdly, if the first two are positive, if anyone would like to review this patch and check it in. I have extensively tested it, and am now pretty confident that it won't cause any grief if it's checked in. Thanks very much, Matt Giuca -------------- next part -------------- An HTML attachment was scrubbed... URL: From brett at python.org Sat Jul 12 20:46:48 2008 From: brett at python.org (Brett Cannon) Date: Sat, 12 Jul 2008 11:46:48 -0700 Subject: [Python-Dev] urllib.quote and unquote - Unicode issues In-Reply-To: References: Message-ID: On Sat, Jul 12, 2008 at 10:27 AM, Matt Giuca wrote: > Hi all, > > My first post to the list. In fact, first time Python hacker, long-time > Python user though. (Melbourne, Australia). > Welcome! > Some of you may have seen for the past week or so my bug report on Roundup, > http://bugs.python.org/issue3300 > > I've spent a heap of effort on this patch now so I'd really like to get some > more opinions and have this patch considered for Python 3.0. > Hopefully we can get to it in the near future. 
Since we are having two more betas (one of these is next week), hopefully there is enough time before hitting a release candidate to have this looked at.

> Basically, urllib.quote and unquote seem not to have been updated since
> Python 2.5, and because of this they implicitly perform Latin-1 encoding and
> decoding (with respect to percent-encoded characters). I think they should
> default to UTF-8 for a number of reasons, including that's what other
> software such as web browsers use.
>
> I've submitted a patch which fixes quote and unquote to use UTF-8 by
> default. I also added extra arguments allowing the caller to choose the
> encoding (after discussion, there was some consensus that this would be
> beneficial). I have now completed updating the documentation, writing
> extensive test cases, and testing the rest of the standard library for code
> breakage - with the result being there wasn't really any, everything seems
> to just work nicely with UTF-8. You can read the sordid details of my
> investigation in the tracker.
>
> Firstly, it'd be nice to hear if people think this is desirable behaviour.

Based on what is said in this email, it sounds reasonable.

> Secondly, if it's feasible to get this patch in Python 3.0. (I think if it
> were delayed to Python 3.1, the code breakage wouldn't justify it).

If what you are saying is true, then it can probably go in as a bug fix (unless someone else knows something about Latin-1 on the Net that makes this not true).

> And
> thirdly, if the first two are positive, if anyone would like to review this
> patch and check it in.
>

That I can't say I can necessarily do; have my own bug reports to work through this weekend.
=)

-Brett

From janssen at parc.com Sat Jul 12 23:07:09 2008
From: janssen at parc.com (Bill Janssen)
Date: Sat, 12 Jul 2008 14:07:09 PDT
Subject: [Python-Dev] urllib.quote and unquote - Unicode issues
In-Reply-To:
References:
Message-ID: <08Jul12.140711pdt."58698"@synergy1.parc.xerox.com>

> Basically, urllib.quote and unquote seem not to have been updated since
> Python 2.5, and because of this they implicitly perform Latin-1 encoding and
> decoding (with respect to percent-encoded characters). I think they should
> default to UTF-8 for a number of reasons, including that's what other
> software such as web browsers use.

The standard here is RFC 3986, from Jan 2005, which says,

``When a new URI scheme defines a component that represents textual data consisting of characters from the Universal Character Set [UCS], the data should first be encoded as octets according to the UTF-8 character encoding [STD63]; then only those octets that do not correspond to characters in the unreserved set should be percent-encoded.''

The "unreserved set" consists of the following ASCII characters:

``Characters that are allowed in a URI but do not have a reserved purpose are called unreserved. These include uppercase and lowercase letters, decimal digits, hyphen, period, underscore, and tilde.

    unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
''

There are a few other wrinkles; it's worth reading section 2.5 carefully.

I'd say, treat the incoming data as either Unicode (if it's a Unicode string), or some unknown superset of ASCII (which includes both Latin-1 and UTF-8) if it's a byte-string (and thus in some unknown encoding), and apply the appropriate transformation.
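The two-step rule quoted from RFC 3986 (encode to UTF-8 octets, then percent-encode every octet outside the unreserved set) can be sketched in a few lines. This is an illustration for current Python 3 only, not the patch under discussion; the name `percent_encode` and its `encoding` parameter are invented for the example.

```python
# The RFC 3986 "unreserved" set: ALPHA / DIGIT / "-" / "." / "_" / "~"
UNRESERVED = frozenset(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789-._~"
)


def percent_encode(text, encoding="utf-8"):
    """Hypothetical helper: encode text to octets, then escape each octet
    that does not correspond to an unreserved character."""
    pieces = []
    for octet in text.encode(encoding):  # iterating bytes yields ints
        char = chr(octet)
        if char in UNRESERVED:
            pieces.append(char)
        else:
            pieces.append("%{:02X}".format(octet))
    return "".join(pieces)


print(percent_encode("a b~"))
print(percent_encode("\u00e9"))             # e-acute as UTF-8 octets
print(percent_encode("\u00e9", "latin-1"))  # the same character as Latin-1
```

With UTF-8 the character U+00E9 becomes two escaped octets (`%C3%A9`); with Latin-1 it becomes one (`%E9`), which is the difference this thread is about.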
Bill

From asmodai at in-nomine.org Sun Jul 13 00:28:01 2008
From: asmodai at in-nomine.org (Jeroen Ruigrok van der Werven)
Date: Sun, 13 Jul 2008 00:28:01 +0200
Subject: [Python-Dev] urllib.quote and unquote - Unicode issues
In-Reply-To:
References:
Message-ID: <20080712222801.GC27106@nexus.in-nomine.org>

-On [20080712 19:27], Matt Giuca (matt.giuca at gmail.com) wrote:
>Basically, urllib.quote and unquote seem not to have been updated since Python
>2.5, and because of this they implicitly perform Latin-1 encoding and decoding
>(with respect to percent-encoded characters). I think they should default to
>UTF-8 for a number of reasons, including that's what other software such as web
>browsers use.

Very nice, I had this somewhere on my todo list to work on. I'm very much in favour, especially since it synchronizes us with the RFCs (for all I remember reading about it last time).

--
Jeroen Ruigrok van der Werven / asmodai
http://www.in-nomine.org/ | http://www.rangaku.org/ | GPG: 2EAC625B
Can your hear the Dolphin's cry..?

From martin at v.loewis.de Sun Jul 13 01:10:01 2008
From: martin at v.loewis.de (Martin v. Löwis)
Date: Sun, 13 Jul 2008 01:10:01 +0200
Subject: [Python-Dev] urllib.quote and unquote - Unicode issues
In-Reply-To: <20080712222801.GC27106@nexus.in-nomine.org>
References: <20080712222801.GC27106@nexus.in-nomine.org>
Message-ID: <487939C9.3040703@v.loewis.de>

> Very nice, I had this somewhere on my todo list to work on. I'm very much
> in favour, especially since it synchronizes us with the RFCs (for all I
> remember reading about it last time).

I still think that it doesn't. The RFCs haven't changed, and can't change for compatibility reasons. The encoding of non-ASCII characters in URLs remains as underspecified as it always was.

Now, with IRIs, the situation is different, but I don't think the patch claims to implement IRIs (and if so, it perhaps shouldn't change URL processing in doing so).
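The underspecification is easy to see on the decoding side: a percent-encoded URL carries octets, not a statement of which character encoding produced them. A small sketch, assuming current Python 3, where `urllib.parse.quote` and `unquote` accept an `encoding` argument (the API as it exists today, which may differ in detail from the patch discussed here):

```python
from urllib.parse import quote, unquote

text = "caf\u00e9"  # "cafe" with an acute accent on the e

# The same text yields different URLs depending on the encoding chosen.
as_utf8 = quote(text, encoding="utf-8")
as_latin1 = quote(text, encoding="latin-1")
print(as_utf8, as_latin1)

# Decoding only round-trips if the receiver guesses the same encoding.
print(unquote(as_latin1, encoding="latin-1") == text)
print(unquote(as_latin1, encoding="utf-8") == text)
```

The last comparison is false: `%E9` is not valid UTF-8 on its own, so `unquote` substitutes a replacement character rather than recovering the original text.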
Regards, Martin From matt.giuca at gmail.com Sun Jul 13 01:15:18 2008 From: matt.giuca at gmail.com (Matt Giuca) Date: Sun, 13 Jul 2008 09:15:18 +1000 Subject: [Python-Dev] urllib.quote and unquote - Unicode issues In-Reply-To: <20080712222801.GC27106@nexus.in-nomine.org> References: <20080712222801.GC27106@nexus.in-nomine.org> Message-ID: Thanks for all the replies, and making me feel welcome :) > > If what you are saying is true, then it can probably go in as a bug > fix (unless someone else knows something about Latin-1 on the Net that > makes this not true). > Well from what I've seen, the only time Latin-1 naturally appears on the net is when you have a web page in Latin-1 (either explicit or inferred; and note that a browser like Firefox will infer Latin-1 if it sees only ASCII characters) with a form in it. Submitting the form, the browser will use Latin-1 to percent-encode the query string. So if you write a web app and you don't have any non-ASCII characters or mention the charset, chances are you'll get Latin-1. But I would argue you're leaving things to chance and you deserve to get funny behaviour. If you do any of the following: - Use a non-ASCII character, encoded as UTF-8 on the page. - Send a Content-Type: xxxx; charset=utf-8. - In HTML, set a . - In the form itself, set