In article mailman.988058444.8527.python-list at python.org, Sean 'Shaleh' Perry at shaleh at valinux.com wrote on 4/23/01 4:40 PM:

> what about simple cases like 'I' or 'V'? 'IV'?

This is a good idea. Maybe all the cases where the integer translates to a single-character Roman numeral (1, 5, 10, 50, &c). And the numbers from 1 to 10. The set of known values was just a random sampling throughout the domain; perhaps a little less randomness is in order.

> Also, I find
>
> #Define pattern to detect valid Roman numerals
> romanNumeralPattern = \
>     re.compile('^M?M?M?(CM|CD|D?C?C?C?)(XC|XL|L?X?X?X?)(IX|IV|V?I?I?I?)$')
>
> confusing.

"Some people, when confronted with a problem, think 'I know, I'll use regular expressions.' Now they have two problems." (Jamie Zawinski)

> As I read this, I see 'begins with optionally up to 3 M's, then
> either CM, CD, or some permutation of D and C'. Yet this appears to match
> something as simply as V. Why?

It matches 'V' because every group before the last can match the empty string: 'M?M?M?' matches '', then 'D?C?C?C?' matches '', then 'L?X?X?X?' matches '', and finally 'V?I?I?I?' matches 'V'.

The pattern is meant to match all and only the valid Roman numerals for the numbers 1..3999. (It does, in fact, match all of them; I believe it matches only them.) It exists so we can validate the input to fromRoman() up front.

> Also, perhaps use of the {} operator in the regex's might help. Maybe not.

You're absolutely right. The regular expression can be rewritten as

    romanNumeralPattern = \
        re.compile('^M{0,3}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})$')

Is this more efficient? Is it more elegant (as regular expressions go)? Any regular expression experts care to comment?

-M
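A quick way to compare the two spellings of the pattern is to run both against every numeral from 1 to 3999 and against a handful of malformed strings. The sketch below is only an illustration: `to_roman`, `ROMAN_MAP`, `OLD`, and `NEW` are hypothetical names introduced here, not the names used in the code under discussion, and the list of invalid strings is an arbitrary sample rather than an exhaustive check.

```python
import re

# The two patterns from the post, under hypothetical names.
OLD = re.compile(r'^M?M?M?(CM|CD|D?C?C?C?)(XC|XL|L?X?X?X?)(IX|IV|V?I?I?I?)$')
NEW = re.compile(r'^M{0,3}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})$')

# Hypothetical helper table: numeral/value pairs in descending order.
ROMAN_MAP = (('M', 1000), ('CM', 900), ('D', 500), ('CD', 400),
             ('C', 100), ('XC', 90), ('L', 50), ('XL', 40),
             ('X', 10), ('IX', 9), ('V', 5), ('IV', 4), ('I', 1))

def to_roman(n):
    """Convert an integer in 1..3999 to a Roman numeral (greedy method)."""
    parts = []
    for numeral, value in ROMAN_MAP:
        while n >= value:
            parts.append(numeral)
            n -= value
    return ''.join(parts)

valid = [to_roman(n) for n in range(1, 4000)]
invalid = ['IIII', 'VV', 'IC', 'MMMM', 'XCX', 'IVI']  # arbitrary sample

# Both patterns should accept every valid numeral...
all_valid_match = all(OLD.search(s) and NEW.search(s) for s in valid)
# ...and reject each malformed sample.
none_invalid_match = not any(OLD.search(s) or NEW.search(s) for s in invalid)
# Every group is optional, so the empty string is accepted as well.
empty_matches = bool(OLD.search('')) and bool(NEW.search(''))

print(all_valid_match, none_invalid_match, empty_matches)
```

One thing this makes visible: because every part of the pattern can match nothing, both versions also accept the empty string, so a validator built on either one would need a separate check for empty input. The sketch says nothing about which version is faster; that would need a timing run.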