> unicodedatabase.c has 64K lines of the form:
>
>     /* U+009a */ { 13, 0, 15, 0, 0 },
>
> Each struct getting initialized there takes 8 bytes on most machines
> (4 unsigned chars + a char*).
>
> However, there are only 3,567 unique structs (54,919 of them are all
> 0's!).  So a braindead-easy mechanical "compression" scheme would
> simply be to create one vector with the 3,567 unique structs, and
> replace the 64K record constructors with 2-byte indices into that
> vector.  Data size goes down from
>
>     64K * 8b = 512Kb
>
> to
>
>     3567 * 8b + 64K * 2b ~= 156Kb
>
> at once; the source-code transformation is easy to do via a Python
> program; the compiler warnings on my platform (due to
> unicodedatabase.c's sheer size) can go away; and one indirection is
> added to access (which remains utterly uniform).
>
> Previous objections to compression were, as far as I could tell, based
> on fear of elaborate schemes that rendered the code unreadable and the
> access code excruciating.  But if we can get more than a factor of 3
> with little work and one new uniform indirection, do people still
> object?
>
> If nobody objects by the end of today, I intend to do it.

Go for it!  I recall seeing that file and thinking the same thing.

(Isn't the VC++ compiler warning about line numbers > 64K?  Then you'd
have to put two pointers on one line to make it go away, regardless of
the size of the generated object code.)

--Guido van Rossum (home page: http://www.pythonlabs.com/~guido/)
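For anyone curious what the proposed two-table layout would look like in C, here is a rough sketch.  The struct field names, array names, and the get_record helper are illustrative assumptions, not the actual code the transformation script would generate; only the sizes (3,567 unique records, 64K two-byte indices) come from the message above.

    /* Sketch only: hypothetical names, same shape as the scheme described. */

    typedef struct {
        unsigned char category;       /* assumed field names for illustration */
        unsigned char combining;
        unsigned char bidirectional;
        unsigned char mirrored;
        const char *decomposition;
    } _PyUnicode_Record;

    /* One copy of each of the 3,567 distinct records; index 0 is the
     * all-zero record shared by the 54,919 code points with no data. */
    static const _PyUnicode_Record unique_records[3567] = {
        { 0, 0, 0, 0, 0 },
        /* ... remaining unique records, emitted by the generator ... */
    };

    /* 64K two-byte indices into unique_records, one per code point:
     * 3567 * 8 + 65536 * 2 bytes ~= 156KB instead of 65536 * 8 = 512KB. */
    static const unsigned short record_index[65536] = {
        /* ... emitted by the generator ... */
    };

    /* Access gains exactly one uniform indirection. */
    static const _PyUnicode_Record *
    get_record(unsigned int code_point)
    {
        return &unique_records[record_index[code_point & 0xFFFF]];
    }

Packing several initializers per source line, as Guido suggests for the VC++ line-count warning, would not change this layout at all; it only affects how the generator formats the two arrays.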