> unicodedatabase.c has 64K lines of the form:
>
>     /* U+009a */ { 13, 0, 15, 0, 0 },
>
> Each struct getting initialized there takes 8 bytes on most machines
> (4 unsigned chars + a char*).
>
> However, there are only 3,567 unique structs (54,919 of them are all
> 0's!).  So a braindead-easy mechanical "compression" scheme would
> simply be to create one vector with the 3,567 unique structs, and
> replace the 64K record constructors with 2-byte indices into that
> vector.  Data size goes down from
>
>     64K * 8b = 512Kb
>
> to
>
>     3567 * 8b + 64K * 2b ~= 156Kb
>
> at once; the source-code transformation is easy to do via a Python
> program; the compiler warnings on my platform (due to
> unicodedatabase.c's sheer size) can go away; and one indirection is
> added to access (which remains utterly uniform).
>
> Previous objections to compression were, as far as I could tell, based
> on fear of elaborate schemes that rendered the code unreadable and the
> access code excruciating.  But if we can get more than a factor of 3
> with little work and one new uniform indirection, do people still
> object?
>
> If nobody objects by the end of today, I intend to do it.

Go for it!  I recall seeing that file and thinking the same thing.

(Isn't the VC++ compiler warning about line numbers > 64K?  Then you'd
have to put two pointers on one line to make it go away, regardless of
the size of the generated object code.)

--Guido van Rossum (home page: http://www.pythonlabs.com/~guido/)
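For anyone curious what the proposed two-table layout would look like in C, here is a rough sketch.  The struct field names, array names, and the get_record helper are illustrative assumptions, not the actual code the transformation script would generate; only the sizes (3,567 unique records, 64K two-byte indices) come from the message above.

    /* Sketch only: hypothetical names, same shape as the scheme described. */

    typedef struct {
        unsigned char category;       /* assumed field names for illustration */
        unsigned char combining;
        unsigned char bidirectional;
        unsigned char mirrored;
        const char *decomposition;
    } _PyUnicode_Record;

    /* One copy of each of the 3,567 distinct records; index 0 is the
     * all-zero record shared by the 54,919 code points with no data. */
    static const _PyUnicode_Record unique_records[3567] = {
        { 0, 0, 0, 0, 0 },
        /* ... remaining unique records, emitted by the generator ... */
    };

    /* 64K two-byte indices into unique_records, one per code point:
     * 3567 * 8 + 65536 * 2 bytes ~= 156KB instead of 65536 * 8 = 512KB. */
    static const unsigned short record_index[65536] = {
        /* ... emitted by the generator ... */
    };

    /* Access gains exactly one uniform indirection. */
    static const _PyUnicode_Record *
    get_record(unsigned int code_point)
    {
        return &unique_records[record_index[code_point & 0xFFFF]];
    }

Packing several initializers per source line, as Guido suggests for the VC++ line-count warning, would not change this layout at all; it only affects how the generator formats the two arrays.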