I'm working on an extension of the genericwiki.py plugin for PyBlosxom, and I have problems with UTF-8 and regular expressions. When I have this wiki line, it breaks the URL too early:

    [http://en.wikipedia.org/wiki/Petr_Chelčický Petr Chelčický's] work(s) into English.

and creates

    [<a href="http://en.wikipedia.org/wiki/Petr_Chel">http://en.wikipedia.org/wiki/Petr_Chel</a>čický Petr Chelčický's]

The regular expressions genericwiki uses for parsing this are:

    # WikiName pattern used in your wiki
    wikinamepattern = r'\b(([A-Z]\w+){2,})\b' # original
    mailurlpattern = r'mailto\:[\"\-\_\.\w]+\@[\-\_\.\w]+\w'
    newsurlpattern = r'news\:(?:\w+\.){1,}\w+'
    fileurlpattern = r'(?:http|https|file|ftp):[/-_.\w-]+[\/\w][?&+=%\w/-_.#]*'
    [...]
    # Turn '[xxx:address label]' into labeled link
    body = re.sub(r'\[(' + fileurlpattern + '|' + mailurlpattern + '|' + newsurlpattern + ')\s+(.+?)\]', r'<a href="\1">\2</a>', body,re.U)

I have tried to test regular expressions and UTF-8 in Python generally, and the results are even more confusing (done with locale cs_CZ.UTF-8 in Konsole):

    >>> locale.getpreferredencoding()
    'UTF-8'
    >>> print re.sub("(\w*)","X","[Chelčický]",re.L)
    X[X?Xý]
    >>> print re.sub("(\w*)","X","[Chelčický]",re.UNICODE)
    X[X?X?X]X
    >>>

I would expect both print commands to give just plain X, but apparently Python doesn't understand that. What's the problem?

Thanks for any reply,
Matej
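For comparison, here is a minimal sketch that assumes Python 3. The post itself uses Python 2, where plain string literals are UTF-8 bytes. The sketch reuses the post's URL patterns unchanged (only the joining literal is made raw) and runs them on decoded text instead of bytes. It also shows that the fourth positional parameter of `re.sub` is `count`, not `flags`, so a flag passed there only caps the number of replacements. The sample strings and variable names are illustrative.

```python
import re

# The post's patterns, reused verbatim; only the joined literal is made raw.
mailurlpattern = r'mailto\:[\"\-\_\.\w]+\@[\-\_\.\w]+\w'
newsurlpattern = r'news\:(?:\w+\.){1,}\w+'
fileurlpattern = r'(?:http|https|file|ftp):[/-_.\w-]+[\/\w][?&+=%\w/-_.#]*'
labeled_link = (r'\[(' + fileurlpattern + '|' + mailurlpattern + '|'
                + newsurlpattern + r')\s+(.+?)\]')

# Decoded text (str), not UTF-8 bytes: in Python 3, \w in a str pattern
# matches Unicode letters such as č and ý without any extra flag.
body = "[http://en.wikipedia.org/wiki/Petr_Chelčický Petr Chelčický's] work(s) into English."
linked = re.sub(labeled_link, r'<a href="\1">\2</a>', body)
print(linked)

# The signature is re.sub(pattern, repl, string, count=0, flags=0), so a flag
# given as the 4th positional argument is read as a replacement count.
# re.LOCALE has the value 4, so a positional re.LOCALE stops after 4 matches:
capped = re.sub(r"\w", "X", "abcdefgh", count=int(re.LOCALE))
print(capped)  # XXXXefgh

# Matching the raw UTF-8 bytes instead: \w is ASCII-only for bytes patterns,
# so the two-byte sequences for č and ý split the word into pieces.
raw = re.sub(rb"\w+", b"X", "[Chelčický]".encode("utf-8"))
print(raw.decode("utf-8"))  # [XčXý]

# The same substitution on decoded text treats the whole name as one word.
text_result = re.sub(r"\w+", "X", "[Chelčický]")
print(text_result)  # [X]
```

The bytes case reproduces the same kind of split the post describes (the match stops right before `č`), while the str case and the labeled-link substitution treat the name as a single word.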