On 11:59 am, eckhardt at satorlaser.com wrote:
>Sorry, I wasn't clear enough. I'll try to explain further...
>
>Let's assume we have a filename like this:
>
>  0xc2 0xa9 0x2f 0x7f
>
>The first two bytes are the copyright sign encoded in UTF-8, followed by a
>slash (0x2f, path separator) and a character encoded in an unknown codepage
>(0x7f is not ASCII!).

Originally I thought that this was a valid idea, but then it became clear that this could be a problem. Consider a filename which includes a UTF-8 encoding of a PUA code point.

>I'm not sure if the use I proposed is correct according to the intended use of
>the PUA. I know that ideally no such string would escape from Python, i.e. it
>should only be visible internally. I would guess that that is something the
>PUA was intended for.

Viewing the PUA with GNOME charmap, I can see that many code points there have character renderings on my Ubuntu system. I have to assume, therefore, that there are other (and potentially conflicting) uses for this unicode feature.
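
To make the concern concrete, here is a minimal sketch of the kind of collision I mean. It is only an illustration under assumptions of my own: the escaping rule (undecodable byte b becomes code point U+E000 + b), the handler name `pua_escape_sketch` and the helper `decode_filename` are all hypothetical and not taken from the proposal. I also use 0xff as the undecodable byte, since it can never appear in valid UTF-8.

```python
import codecs

# Hypothetical escaping rule for this sketch only: an undecodable byte b
# is represented by the private-use code point U+E000 + b.
PUA_BASE = 0xE000


def _pua_escape(exc):
    """Error handler: map each undecodable byte into the PUA."""
    if not isinstance(exc, UnicodeDecodeError):
        raise exc
    bad = exc.object[exc.start:exc.end]
    return "".join(chr(PUA_BASE + b) for b in bad), exc.end


codecs.register_error("pua_escape_sketch", _pua_escape)


def decode_filename(raw: bytes) -> str:
    """Decode a byte filename as UTF-8, escaping bad bytes into the PUA."""
    return raw.decode("utf-8", errors="pua_escape_sketch")


# Filename 1: copyright sign, slash, then a byte that is not valid UTF-8.
raw_escaped = b"\xc2\xa9/\xff"

# Filename 2: copyright sign, slash, then a genuine, correctly encoded
# PUA code point (U+E0FF) that some other application chose to use.
raw_genuine = "\u00a9/\ue0ff".encode("utf-8")

escaped = decode_filename(raw_escaped)
genuine = decode_filename(raw_genuine)

# Two different byte strings on disk, one and the same str in Python:
# the decoded name no longer tells us which bytes to write back.
collision = (raw_escaped != raw_genuine) and (escaped == genuine)
```

With this rule the two distinct on-disk names decode to the same string, so the original bytes cannot be recovered from the decoded name alone. Any scheme that borrows PUA code points would need some way to tell an escaped byte apart from a PUA character that was really in the filename.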