Fredrik Lundh wrote:
> M.-A. Lemburg wrote:
>
>>The whole point in adding Unicode to the language was to make
>>the difference between text and binary data clear and visible
>>at the type level.
>
> well, when I wrote the Unicode type, the whole point was to be able to
> make it easy to handle Unicode text. no more, no less.

... and the Unicode integration made that a reality :-)

In today's globalized world, the only sane way to deal with
different scripts is through Unicode, which is why I believe
that text data should eventually always be stored in Unicode
objects - regardless of whether it takes more memory or not.
(If you compare development time to prices of a few GB extra
RAM, the effort needed to maintain text in non-Unicode formats
simply doesn't pay off anymore.)

>>If we start to store text data in Unicode now and leave binary
>>data in 8-bit strings, then the move to Unicode string literals
>>will be much smoother in P3k.
>
> hopefully, the P3K string design will take a lot more into account than
> text-vs-binary; there are many ways to represent text, and many ways
> to store binary data, and many usage patterns for them both. a good
> design should take most of this into account. (google for "stringlib" for
> some work I'm doing in this area)

Ah, now I know where you're coming from :-)

Shift tables don't work well in the Unicode world with its large
alphabet. BTW, you might want to look at the BMS implementation
I did for mxTextTools.

Here's a nice reference for pattern matching:

http://www-igm.univ-mlv.fr/~lecroq/string/index.html

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Sep 08 2004)
>>> Python/Zope Consulting and Support ...        http://www.egenix.com/
>>> mxODBC.Zope.Database.Adapter ...             http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/
________________________________________________________________________

::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,FreeBSD for free ! ::::
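
To illustrate the remark about shift tables and large alphabets: a
classic skip-table search keeps one shift entry per possible character,
which is 256 entries for 8-bit strings but over a million if indexed by
Unicode code point. The sketch below is a plain Boyer-Moore-Horspool
search with a sparse (dict-based) table, written only for illustration.
It is an assumption about one way to sidestep the problem, not the
mxTextTools or stringlib implementation, and the function names are
hypothetical.

```python
import sys

# Entries a dense, array-style shift table would need per alphabet.
DENSE_8BIT_ENTRIES = 256
DENSE_UNICODE_ENTRIES = sys.maxunicode + 1  # one slot per code point


def build_shift_table(pattern):
    """Return a sparse Horspool shift table for pattern.

    Only characters that occur in the pattern (excluding its last
    position) get an entry, so the table size depends on the pattern
    length, not on the size of the alphabet.
    """
    m = len(pattern)
    # Later occurrences overwrite earlier ones, leaving the smallest shift.
    return {ch: m - 1 - i for i, ch in enumerate(pattern[:-1])}


def horspool_find(text, pattern):
    """Return the index of the first match of pattern in text, or -1."""
    m, n = len(pattern), len(text)
    if m == 0:
        return 0
    shifts = build_shift_table(pattern)
    pos = 0
    while pos + m <= n:
        if text[pos:pos + m] == pattern:
            return pos
        # Shift by the entry for the text character aligned with the
        # pattern's last position; characters not in the table allow a
        # full shift of m.
        pos += shifts.get(text[pos + m - 1], m)
    return -1
```

The dict lookup is slower per character than indexing a 256-entry
array, which is the trade-off a large alphabet forces on this family
of algorithms.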