Heiko Wundram <me+python-dev at modelnine.org> wrote:
>>> Don't get me wrong, I personally find this functionality very, very
>>> interesting (I'm +0.5 on adding it in some way or another),
>>> especially as a part of the standard library (not necessarily as an
>>> extension to .split()).
>>
>> It's already there. It's called shlex.split(), and follows the
>> semantics of a standard UNIX shell, including escaping and other
>> things.
>
> I knew about *nix shell escaping, but that isn't necessarily what I
> find in input I have to process (although generally it's what you
> see, yeah). That's why I said that it would be interesting to have a
> generalized method, sort of like the csv module but only for string
> "interpretation", which takes a dialect and parses a string for the
> specified dialect.
>
> Remember, there is also escaping by doubling the end-of-string marker
> (for example, '""this is not a single argument""'.split() should be
> parsed as ['"this','is','not','a',....]), and I know programs that
> use exactly this format for file storage.

I never met this one. Anyway, I don't think it's harder than:

    >>> import shlex
    >>> def mysplit(s):
    ...     """Allow doubled quotes to escape a quote"""
    ...     return shlex.split(s.replace(r'""', r'\"'))
    ...
    >>> mysplit('""This is not a single argument""')
    ['"This', 'is', 'not', 'a', 'single', 'argument"']

> Maybe one could simply export the function the csv module uses to
> parse the actual data fields as a more prominent method, which
> accepts keyword arguments, instead of a Dialect-derived class.

I think you're over-generalizing a very simple problem. I believe that
str.split, shlex.split, and some simple variation like the one above
(maybe using regular expressions to do the substitution if you have
slightly more complex cases) can handle 99.99% of the splitting cases.
They surely handle 100% of those I myself had to parse. I believe the
standard library already covers common usage.
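Part of the "dialect" idea can be approximated with knobs that shlex already exposes. This is a minimal sketch under stated assumptions: `split_dialect` and its parameters are hypothetical names made up for illustration, not an existing API. It only sets documented `shlex.shlex` attributes (`quotes`, `whitespace`, `whitespace_split`, `commenters`) and reuses the `""` to `\"` rewrite from `mysplit` for the doubled-quote case.

```python
import shlex

def split_dialect(s, quotes='"\'', whitespace=' \t\r\n', doublequote=False):
    """Hypothetical dialect-style splitter built on shlex.shlex (not a stdlib API)."""
    if doublequote:
        # Assumption: "" stands for one literal quote character. Rewrite it to
        # \" so shlex's POSIX backslash escaping yields the quote (as mysplit does).
        s = s.replace('""', '\\"')
    lex = shlex.shlex(s, posix=True)
    lex.whitespace_split = True  # split on separator characters only, like shlex.split()
    lex.commenters = ''          # shlex.split() also disables '#' comments by default
    lex.quotes = quotes          # characters that open and close a quoted section
    lex.whitespace = whitespace  # characters that separate fields
    return list(lex)

print(split_dialect('cp "my file.txt" backup'))     # ['cp', 'my file.txt', 'backup']
print(split_dialect('a,"b c",d', whitespace=','))   # ['a', 'b c', 'd']
print(split_dialect('""This is not a single argument""', doublequote=True))
# ['"This', 'is', 'not', 'a', 'single', 'argument"']
```

One visible difference from a csv-style dialect: shlex collapses runs of separators, so `split_dialect('a,,b', whitespace=',')` returns `['a', 'b']`, whereas `csv.reader` would produce an empty middle field.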
There will surely be cases where a custom lexer/splitter will have to
be written, but that's life :)

Giovanni Bajo