Thank you very much for your persistent help. I was able to get the 8th
bit characters to act as keys...with a somewhat complex construction:
chr(int(linesplit[0])). Linesplit had decimal numbers in text format.

    linesplit = split('\t',encodingline)
    if (len(linesplit) > 5):
        try:
            templist = linesplit[2:4]
            templist.append(split(';|:',linesplit[4]))
            templist.append(strip(linesplit[5]))
            encodedict[chr(int(linesplit[0]))] = templist
            print templist
        except ValueError:
            logerror('My error', linesplit[0])
    else:
        logerror('Not >5 fields long', linesplit)

D-Man wrote:

> On Fri, Apr 20, 2001 at 09:45:33PM +0100, Maurice Bauhahn wrote:
> | Thank you for the suggestion, D-Man.
> |
> | However, I doubt that this is a problem with the display, because I
> | can see all these unusual characters when I print a line of text to
> | the screen. The problem becomes obvious when I try one of those
> | upper ASCII characters as a key of the dictionary...it does not
> | work. My hope is to compare each character from a text file...and
>
> How do you know it doesn't work? I have heard that all strings in
> Jython are Unicode because all Java strings are Unicode (or something
> like that).
>
> Say...I just tried it again, using Jython 2.0 and CPython 2.1. If I
> type
>
>     print chr( 233 )
>
> I get an accented e in CPython and something else from Jython, but not
> the '\351' from before. Actually in CPython I get '\xe9' if I just
> call chr. It might be a difference between str() and repr().
>
> If you can enter the character into your file, putting a 'u' in front
> of the string specifies it as unicode. Ex :
>
>     print u'é'
>
> Say, what if you use the 'unichr' function? There might be a
> difference between chr and unichr (in CPython there is).
>
> Here is a snippet, CPython first, then Jython :
>
> >>> unichr( 8218 )
> u'\u201a'
> >>> print unichr( 8218 )
>
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
> UnicodeError: ASCII encoding error: ordinal not in range(128)
>
> >>> ord( 'é' )
> 8218
> >>> unichr( 233 )
> '\351'
> >>> unichr( 8218 )
> u'\u201A'
> >>> print unichr( 8218 )
> é
> >>> print chr( 8218 )
> é
>
> | use the dictionary to assist in translation of those characters to
> | Unicode (the Cambodian script...so standard Java code converters are
> | not useful).
> |
> | Maybe I will have to call a Java function to accomplish my desired
> | task, right?
>
> Maybe. I really don't have much experience with using Unicode or
> locale specific stuff.
>
> I hope my results give you some thoughts on how to solve your problem.
> -D

--
Maurice Bauhahn
2 Meadow Way
Dorney Reach
MAIDENHEAD
SL6 0DS
United Kingdom

Home Tel: +44(0)1628 626068
Work Tel: +44(0)1932 878404
Home Email: bauhahnm at clara.net
Work Email: mbauhahn at brio.com
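
For anyone following the thread, here is a minimal, self-contained sketch of the chr(int(...)) construction described at the top of this message. It is written in present-day Python 3 syntax rather than the Jython 2.0 / CPython 2.1 of the thread, so it is an illustration and not the code that was actually run. The function name build_encoding_table, the errors list (standing in for logerror) and the sample rows are all hypothetical. Note also that chr() in Python 3 accepts any Unicode code point, whereas the Python 2 chr() used in the thread only accepts 0-255.

```python
import re


def build_encoding_table(lines):
    """Build a dict keyed by single characters from tab-separated lines.

    Field 0 holds the character code as decimal text; fields 2-5 carry
    the data kept for that character (layout assumed from the thread).
    """
    table = {}
    errors = []  # hypothetical stand-in for the thread's logerror()
    for line in lines:
        fields = line.split('\t')
        if len(fields) <= 5:
            errors.append(('Not >5 fields long', fields))
            continue
        try:
            # Decimal text -> integer -> one-character dictionary key
            key = chr(int(fields[0]))
        except ValueError:
            errors.append(('Bad character code', fields[0]))
            continue
        entry = fields[2:4]
        entry.append(re.split(';|:', fields[4]))  # split on ';' or ':'
        entry.append(fields[5].strip())
        table[key] = entry
    return table, errors


# Made-up sample rows, only to exercise the three code paths
sample = [
    "233\tx\tname-a\tname-b\t1;2:3\t note \n",
    "abc\tx\ta\tb\tc\td",
    "65\ttoo\tshort",
]
table, errors = build_encoding_table(sample)
```

With these sample rows, the first line produces one entry keyed by the character with code 233, and the other two lines end up in errors instead of raising.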