Bill Tutt wrote:
>
> MAL wrote:
> > Andrew M. Kuchling wrote:
> >>
> >> Paul Prescod writes:
> >>> The new \N escape interpolates named characters within strings. For
> >>> example, "Hi! \N{WHITE SMILING FACE}" evaluates to a string with a
> >>> unicode smiley face at the end.
> >>
> >> Cute idea, and it certainly means you can avoid looking up Unicode
> >> numbers. (You can look up names instead. :) ) Note that this means the
> >> Unicode database is no longer optional if this is done; it has to be
> >> around at code-parsing time. Python could import it automatically, as
> >> exceptions.py is imported. Christian's work on compressing
> >> unicodedatabase.c is therefore really important. (Is Perl5.6 actually
> >> dragging around the Unicode database in the binary, or is it read out
> >> of some external file or data structure?)
> >
> > Sorry to disappoint you guys, but the Unicode name and comments
> > are *not* included in the unicodedatabase.c file Christian
> > is currently working on. The reason is simple: it would add
> > huge amounts of string data to the file. So this is a no-no
> > for the core distribution...
> >
>
> Ok, now you're just being silly. It's possible to put the character names in
> a separate structure so that they don't automatically get paged in with the
> normal unicode character property data. If you never use it, it won't get
> paged in, it's that simple....

Sure, but it would still cause the interpreter binary or DLL to
increase in size considerably... that caused some major noise
a few days ago due to the fact that the unicodedata module adds
some 600kB to the interpreter -- even though it would only
get swapped in when needed (the interpreter itself doesn't
use it).

> Looking up the Unicode code value from the Unicode character name smells
> like a good time to use gperf to generate a perfect hash function for the
> character names. Esp. for the Unicode 3.0 character namespace. Then you can
> just store the hashkey -> Unicode character mapping, and hardly ever need to
> page in the actual full character name string itself.

Great idea, but why not put this into a separate codec module?

> I haven't looked at what the comment field contains, so I have no idea how
> useful that info is.

Probably not worth looking at...

> *waits while gperf crunches through the ~10,550 Unicode characters where
> this would be useful*

--
Marc-Andre Lemburg
______________________________________________________________________
Business:                                      http://www.lemburg.com/
Python Pages:                           http://www.lemburg.com/python/
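
The hashkey -> character mapping discussed above can be illustrated with a rough sketch. This is not gperf output and not the code anyone in the thread wrote: it assumes a seeded CRC32 (zlib.crc32) as a stand-in for a generated perfect hash, searches for a seed that gives no collisions, and covers only a small, arbitrarily chosen code point range. The helper names name_hash, build_table and lookup are hypothetical, and the names come from whatever Unicode version the running Python's unicodedata module ships.

```python
import unicodedata
import zlib


def name_hash(name, seed):
    # Seeded CRC32 as a stand-in for a gperf-generated hash function (assumption).
    return zlib.crc32(name.encode("ascii"), seed)


def build_table(first=0x0020, last=0x26FF):
    # Collect the named characters in an illustrative range.
    names = {}
    for cp in range(first, last + 1):
        name = unicodedata.name(chr(cp), None)
        if name is not None:
            names[name] = cp
    # Try seeds until the hash is collision-free ("perfect") for this name set.
    for seed in range(1000):
        table = {}
        for name, cp in names.items():
            key = name_hash(name, seed)
            if key in table:
                break  # collision: give up on this seed
            table[key] = cp
        else:
            return seed, table
    raise RuntimeError("no collision-free seed found")


SEED, TABLE = build_table()


def lookup(name):
    # Only the hash key is consulted; the full name strings are never stored.
    return chr(TABLE[name_hash(name.upper(), SEED)])


print(repr(lookup("WHITE SMILING FACE")))
```

Because the table keeps only hash keys, a name outside the table usually raises KeyError but could in principle hit another character's key. Rejecting unknown names reliably would mean consulting the name strings after all, which is the one case where they would have to be paged in.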