RetroSearch Browse

Home - News ( United States | United Kingdom | Italy | Germany ) - Football scores

Showing content from https://mail.python.org/pipermail/python-dev/2002-August/027792.html below:

[Python-Dev] PEP 277 (unicode filenames): please review

[Python-Dev] PEP 277 (unicode filenames): please reviewJack Jansen Jack.Jansen@oratrix.com
Wed, 14 Aug 2002 11:46:52 +0200

Previous message: [Python-Dev] PEP 277 (unicode filenames): please review
Next message: [Python-Dev] PEP 277 (unicode filenames): please review
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On Wednesday, August 14, 2002, at 08:33 , Martin v. Loewis wrote:
>> Do I misunderstand something, or this this a bug (limitation?) in the
>> unicode->latin-1 decoder?
>
> It's a limitation, in all codecs. Contributions of normalization code
> are welcome. Since this is hard work, this is unlikely to be fixed in
> Python 2.3 - unless somebody has a really good incentive for fixing
> it.

Why is this hard work? I would guess that a simple table lookup would 
suffice, after all there are only a finite number of unicode characters 
that can be split up, and each one can be split up in only a small 
number of ways.

Wouldn't something like
for c in input:
	if not canbestartofcombiningsequence.has_key(c):
		output.append(c)
      nlookahead = MAXCHARSTOCOMBINE
      while nlookahead > 1:
		attempt = lookahead next nlookahead bytes from input
		if combine.has_key(attempt):
			output.append(combine[attempt])
			skip the lookahead in input
			break
	else:
		output.append(c)
do the trick, if the two dictionaries are initialized intelligently?
		
--
- Jack Jansen        <Jack.Jansen@oratrix.com>        
http://www.cwi.nl/~jack -
- If I can't dance I don't want to be part of your revolution -- Emma 
Goldman -

Previous message: [Python-Dev] PEP 277 (unicode filenames): please review
Next message: [Python-Dev] PEP 277 (unicode filenames): please review
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

RetroSearch is an open source project built by @garambo | Open a GitHub Issue

Search and Browse the WWW like it's 1997 | Search results from DuckDuckGo

HTML: 3.2 | Encoding: UTF-8 | Version: 0.7.4