Apologies for the long message. There are a lot of issues to address:

There was a clear consensus at the XML-SIG developer's day discussion that Expat should become part of the standard distribution. Admittedly the audience was biased and Fredrik wasn't in the room at that point but it was clear that everyone was in agreement (in contrast to the doc-sig discussion!). I think Andrew had some reservations (he was probably subconsciously channeling Guido) but almost everyone in the room was strongly behind the idea -- and the room was overfull.

Insofar as this is not a democracy, I feel the need to channel some of the crowd's opinions and some of my own.

The crowd (and I) obviously thought that XML support is an important part of coming with "batteries included" on the modern Web. There are four basic specs maintained by the W3C and IETF that underlie the Web: URLs, HTTP, HTML and now XML. In fact, modern versions of HTML (XHTML) and HTTP (WebDAV) depend upon XML. Microsoft is also trying to establish XML-based protocols as replacements for CORBA and as the basis of their entire Web object model.

Luckily, the important things to know about XML are very simple. It's a way of encoding hierarchical structures in text using a standard, language-independent syntax that happens to be compatible with document markup syntaxes. Some other things to know about it are:

* it is very rigorously defined
* there are test suites to verify implementations
* it has enough nooks and crannies to be hard to implement
* xmllib doesn't implement enough of it
* and thus isn't a conforming XML parser

xmllib was pretty cool when it was the first XML parser in a general purpose language. Now it is out of date. It is, however, what we present to the world as our "XML support." Whatever we do about expat, we need to decide what to do about the fact that xmllib is not a real XML processor (plus it is slow as hell!).
Writing an XML processor is harder than it should be and very few people have the patience to pore over the spec and get it right.

Okay, so of course you know where I am leading. Perl, Apache, Mozilla and most other C-coded open source software projects embed expat. This is because expat is blazingly fast, Unicode aware and highly conformant. It's written in ANSI C and seems stable as a rock. It changes slowly and doesn't have a lot of extra features. Best of all, someone else maintains it and we have wrapped it in a pretty thin C layer which is easy to maintain. The layer is roughly the same size as xmllib.

Guido astonished me at IPC8 with a level of humility and honesty that is very rare in this business -- especially coming from a successful language designer. He said that part of why Python didn't grab a bigger part of the CGI market was because he didn't understand the importance of CGI to the Web in the early days. He has also not been shy in saying he doesn't know much about XML. Many of us think that it will be much bigger than CGI.

One opinion expressed during the meeting is that XML is a big draw for business, development money and publicity.

Okay, having XML in a separate package is not the same as ignoring it altogether but people expect these fundamental technologies to be built in. As soon as you split them out you run into versioning and distribution issues. Yes, distutils will help, but I don't think it will do everything. I don't know of any package management system that can automatically correct version skew problems. The only "system" that works is full-distribution testing.

Some feel that we should install PyExpat but not expat. The problem is installation, especially on Windows. It is demonstrably the case that Windows programmers are ALREADY nervous about installing the XML toolkit. I got two personal emails about how to install last week (where do people get my email address??) and the XML-SIG list got one or two also.
If we install pyexpat without expat, we'll have versioning problems, path problems, multiple DLL problems and so forth. If we statically bind expat and pyexpat the problems go away (on Windows at least). There are rumours that some Unixes are not smart in the same situation. This can be solved by renaming symbols before building, which can be accomplished with the C pre-processor.

Expat+Pyexpat is about 100K. My Python directory is 35MB so I'm not too worried. I think that the compressed Python tarball is more than 5MB now, isn't it?

I'm not big on the idea of multiple Python "distributions" because in practice there will be only two: the portable one and the Windows-specific one. We'll still have to write emails like this imploring the (two) maintainers to support XML or whatever and we may have divergence between the two versions.

Distributions make sense in the Linux case because there is a lot of money going around, there is money to be made on shrink-wrapped boxes and it is important to optimize for different cases. For Python, the FreeBSD model of "the same everywhere" is more appropriate. If that means a more distributed standard library maintenance mechanism, then fine, let's work that out. I don't expect Guido to maintain PyExpat or Expat any more than Larry Wall maintains the Perl XML parser layer or Brian B. maintains the XML support in Apache himself.

If we can get consensus on this issue, I will approach James Clark for a more Pythonic license. Right now expat has an MPL license but I suspect that James will be flexible.
Therefore the concrete proposal is:

* add expat, pyexpat and a thin SAX layer to the standard Python distribution
* rename symbols in expat if necessary
* deprecate xmllib
* continue development of the XML toolkit for non-core tools

--
Paul Prescod - ISOGEN Consulting Engineer speaking for himself

The calculus and the rich body of mathematical analysis to which it gave rise made modern science possible, but it was the algorithm that made the modern world possible.
    - The Advent of the Algorithm (pending), by David Berlinski