A RetroSearch Logo

Home - News ( United States | United Kingdom | Italy | Germany ) - Football scores

Search Query:

Showing content from http://mail.python.org/pipermail/python-dev/2005-October/057359.html below:

[Python-Dev] Unicode charmap decoders slow

[Python-Dev] Unicode charmap decoders slow [Python-Dev] Unicode charmap decoders slow"Martin v. Löwis" martin at v.loewis.de
Sun Oct 16 18:53:24 CEST 2005
Tony Nelson wrote:
> Umm, 0 (NUL) is a valid output character in most of the 8-bit character
> sets.  It could be handled by having a separate "exceptions" string of the
> unicode code points that actually map to the exception char.

Yes. But only U+0000 should normally map to 0. It could be special-cased
altogether.

> As you are concerned about pathological cases for hashing (that would make
> the hash chains long), it is worth noting that in such cases this data
> structure could take 64K bytes.  Of course, no such case occurs in standard
> encodings, and 64K is affordable as long is it is rare.

I'm not concerned with long hash chains, I dislike having collisions in 
the first place (even if they are only for two code points). Having to
deal with collisions makes the code more complicated, and less predictable.

It's primarily a matter of taste: avoid hashtables if you can :-)

Regards,
Martin
More information about the Python-Dev mailing list

RetroSearch is an open source project built by @garambo | Open a GitHub Issue

Search and Browse the WWW like it's 1997 | Search results from DuckDuckGo

HTML: 3.2 | Encoding: UTF-8 | Version: 0.7.4