Thank you, Steve. Maybe it would help if I could explain what I am doing.
I'm trying to write a programme to transcode eight-bit encodings to Unicode
encodings (the Cambodian/Khmer language) and then do letter-pair frequency
studies. Since I will be comparing characters (not integers) against the
keys, I need to have characters as the keys. That effort has in fact now been
successful. My next problem, discussed elsewhere in comp.lang.python, is to
import Unicode-escaped characters/strings from another 8-bit encoded file
into a Jython dictionary. (Presumably the solution is in the codecs module.)

Steve Holden wrote:

> "Maurice Bauhahn" <bauhahnm at clara.net> wrote in message
> news:mailman.988158281.19384.python-list at python.org...
> > Thank you very much for your persistent help.
> >
> > I was able to get the 8th bit characters to act as keys...with a somewhat
> > complex construction: chr(int(linesplit[0])). Linesplit had decimal
> > numbers in text format.
> >
> Would this shed any light on your original question or help in solving your
> problem more compactly? Note that this is CPython, not Jython, but
> portability should make all this work in both implementations. From what
> I've read, it seems to be your need to see decimal numbers in the source
> which led you to these contortions.
>

This solved the problem I first encountered (which was probably an artifact
of something on the same line!).

>
> Your original assertion that
>
> """
> >>> chr(127)
> '?' (in fact a character like a house)
> """
>
> is quite correct, but I don't see why a weird printable representation makes
> a character unsuitable for use as a dictionary key. Maybe I missed your
> point. Anyway ...
>

I do not worry about the shape of the characters (why, those of Khmer are
much more novel in any case ;-)).

>
> >>> # Construct a string of all chars from 0 to 255
> >>> chars = "".join(map(chr, [i for i in range(256)]))
> >>> # Use decimal value to access single characters
> >>> # and use them as dictionary keys
> >>> dict = {}
> >>> dict[chars[233]] = "Two hundred thirty-three"
> >>> dict[chars[27]] = "escape"
> >>> dict["\033"]
> 'escape'
> >>> dict["\351"]
> 'Two hundred thirty-three'
> >>> dict
> {'\033': 'escape', '\351': 'Two hundred thirty-three'}
> >>>
>
> In other words, having constructed the chars[] list, you can index it with
> decimal numbers to get the characters you want. chars could equally have
> been a list of single-character strings, with the same effect.

Yes, it is the list of single-character strings that I am using (now
successfully).

>
> If this doesn't help you at all, please feel free to ignore my rantings.
>

Thank you for the questions and desire to help!

>
> regards
>  Steve
>
> > linesplit = split('\t',encodingline)
> > if (len(linesplit) > 5):
> >     try:
> >         templist = linesplit[2:4]
> >         templist.append(split(';|:',linesplit[4]))
> >         templist.append(strip(linesplit[5]))
> >         encodedict[chr(int(linesplit[0]))] = templist
> >         print templist
> >     except ValueError:
> >         logerror('My error', linesplit[0])
> > else:
> >     logerror('Not >5 fields long', linesplit)
>

--
Maurice Bauhahn
2 Meadow Way
Dorney Reach
MAIDENHEAD
SL6 0DS
United Kingdom

Home Tel: +44(0)1628 626068
Work Tel: +44(0)1932 878404
Home Email: bauhahnm at clara.net
Work Email: mbauhahn at brio.com
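
The workflow discussed in this message (decimal codes turned into single-character dictionary keys, eight-bit text mapped to Unicode, then letter-pair counts) might be sketched in present-day Python 3 roughly as follows. This is an illustration only, not the code used in the thread: the two-field input format, the sample Khmer code points, and the names build_table, transcode and pair_frequencies are all assumptions made for the example, and Jython or Python of that era would need different string handling.

```python
from collections import Counter


def build_table(lines):
    """Map each legacy 8-bit character to a Unicode replacement string.

    Assumed (hypothetical) line format: '<decimal code>\t<escaped string>',
    for example '233\t\\u1780'. Malformed lines are skipped.
    """
    table = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # too few fields to be a mapping line
        try:
            key = chr(int(fields[0]))  # decimal text -> one-character key
        except ValueError:
            continue  # first field was not a usable decimal number
        # Turn an ASCII escape such as \u1780 into the character itself.
        table[key] = fields[1].encode("ascii").decode("unicode_escape")
    return table


def transcode(text, table):
    """Replace every character found in the table; leave the rest alone."""
    return "".join(table.get(ch, ch) for ch in text)


def pair_frequencies(text):
    """Count adjacent letter pairs (overlapping) in the text."""
    return Counter(a + b for a, b in zip(text, text[1:]))


# Hypothetical mapping data: codes 233 and 234 map to two Khmer letters.
table = build_table(["233\t\\u1780", "234\t\\u1781", "bad line"])
converted = transcode("\xe9\xea\xe9\xea", table)
pairs = pair_frequencies(converted)
```

Keeping the keys as one-character strings, as in the thread, means each character of the input can be looked up directly without converting it back to a number first.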