Fredrik Lundh wrote:
>
> CT:
> > How do I build a dist that doesn't need to change a lot of
> > stuff in the user's installation?
>
> somewhere in this thread, Guido wrote:
>
> > BTW, I added a tag "pre-unicode" to the CVS tree to the revisions
> > before the Unicode changes were made.
>
> maybe you could base SLP on that one?

I have no idea how this works. Would this mean that I cannot
get patches which come after unicode?

Meanwhile, I've looked into the sources. It is easy for me
to get rid of the problem by supplying my own unicodedata.c,
where I replace all functions by some unimplemented exception.

Furthermore, I wondered about the data format. Is the unicode
database used in your re package as well? Otherwise, I see
only references from unicodedata.c, and that means the data
structure can be massively enhanced.
At the moment, that baby is 64k entries long, with four bytes
and an optional string.
This is a big waste. The strings are almost all some distinct
<xxx> prefixes, together with a list of hex smallwords. This
is done as strings, which probably makes up 80 percent of the space.

The only function that uses the "decomposition" field (namely
the string) is unicodedata_decomposition. It does nothing
more than wrap it into a PyObject.
We can do a little better here. I guess I can bring it down
to a third of this space without much effort, just by using
- binary encoding for the <xxx> tags as enumeration
- binary encoding of the hexed entries
- omission of the spaces

Instead of 64k structures which contain pointers anyway,
I can use a 64k pointer array with offsets into one packed
table.

The unicodedata access functions would change *slightly*,
just building some hex strings and so on. I guess this is
not a time critical section?

Should I try this evening? :-)

cheers - chris

-- 
Christian Tismer             :^)   <mailto:tismer@appliedbiometrics.com>
Applied Biometrics GmbH      :     Have a break! Take a ride on Python's
Kaunstr. 26                  :    *Starship* http://starship.python.net
14163 Berlin                 :     PGP key -> http://wwwkeys.pgp.net
PGP Fingerprint       E182 71C7 1A9D 66E9 9D15  D3CC D4D7 93E2 1FAE F6DF
     we're tired of banana software - shipped green, ripens at home
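
The packed layout proposed in the message (enumerated <xxx> tags, binary
code points, no spaces, and an offset array into one packed table) can be
sketched roughly as follows. This is only an illustration in Python, not
the actual unicodedata.c change. The tag list, the one-byte tag and count
fields, the 16-bit code points, and all function names are assumptions
made for the example.

```python
import struct

# Hypothetical tag enumeration; index 0 means "no <xxx> prefix".
TAGS = ["", "<compat>", "<font>", "<noBreak>", "<super>", "<sub>"]


def pack_decomposition(text):
    """Encode e.g. '<compat> 0020 0308' as tag byte, count byte, 16-bit words."""
    parts = text.split()
    tag = 0
    if parts and parts[0].startswith("<"):
        tag = TAGS.index(parts[0])
        parts = parts[1:]
    codes = [int(p, 16) for p in parts]
    # ">" selects big-endian with no padding between fields.
    return struct.pack(">BB%dH" % len(codes), tag, len(codes), *codes)


def build_table(decompositions):
    """Map {code point: decomposition string} to (offsets, packed blob).

    offsets has 64k entries; offsets[cp] == 0 means "no decomposition".
    """
    blob = bytearray(b"\x00")      # offset 0 is reserved for "empty"
    offsets = [0] * 0x10000
    for cp, text in sorted(decompositions.items()):
        offsets[cp] = len(blob)
        blob += pack_decomposition(text)
    return offsets, bytes(blob)


def decomposition(offsets, blob, cp):
    """Rebuild the original string form from the packed record."""
    off = offsets[cp]
    if off == 0:
        return ""
    tag, count = struct.unpack_from(">BB", blob, off)
    codes = struct.unpack_from(">%dH" % count, blob, off + 2)
    words = [TAGS[tag]] if tag else []
    words += ["%04X" % c for c in codes]
    return " ".join(words)


sample = {0x00A8: "<compat> 0020 0308", 0x00C0: "0041 0300"}
offsets, blob = build_table(sample)
assert decomposition(offsets, blob, 0x00A8) == "<compat> 0020 0308"
assert decomposition(offsets, blob, 0x0041) == ""
```

The access function only has to rebuild the hex string on demand, which
matches the "change *slightly*" remark: lookups cost a little more, and
the per-entry strings disappear from the table.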