
Showing content from https://github.com/mrabarnett/mrab-regex/issues/87 below:

Allow duplicate names of groups · Issue #87 · mrabarnett/mrab-regex · GitHub

Original report by Marcin Wojnarski (Bitbucket: mwojnars, GitHub: mwojnars).

Hi,

Currently, duplicate names are not allowed, for example this code raises an exception because group "a" is defined twice:

>>> regex.match(r'(?<a>here)? or (?<a>here)?', "here or here")
error: duplicate group

I suspect this design is a legacy of the standard 're' module, which doesn't keep multiple values per group, so it was natural for it to reject duplicate group names as well. But the 'regex' module can capture repeated values, so it would be natural for it to also accept duplicate group names and merge the values extracted from all same-named groups into one list.

This enhancement would allow parsing loose formats, where a given value may appear in any of several places in the text and the regex must have a group in each of those places. Usually we would expect only one place to match (the groups are optional, as in the regex above), but we can't say in advance which one, and for convenience we'd like to use the same name for all of them, to avoid manually merging several groups afterwards. In other use cases, more than one group may match and we want to extract all the matched values as a single list.
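For illustration, the manual merging mentioned above might look like the following sketch. It uses the standard 're' module and assumes the groups are given distinct names sharing a prefix ("a1", "a2"); the helper `merged_values` and the naming scheme are hypothetical, not part of 're' or 'regex'.

```python
import re

# Workaround sketch: each optional group gets a distinct name (a1, a2),
# because duplicate names are rejected. The names are illustrative only.
PATTERN = re.compile(r'(?P<a1>here)? or (?P<a2>here)?')

def merged_values(match, prefix):
    """Return the values of all matched named groups whose name starts
    with `prefix`, in the order the groups appear in the pattern."""
    groupindex = match.re.groupindex  # maps group name -> group number
    names = sorted(groupindex, key=groupindex.get)
    return [match.group(name) for name in names
            if name.startswith(prefix) and match.group(name) is not None]

m = PATTERN.match("here or here")
print(merged_values(m, "a"))
```

With duplicate names allowed, this helper would be unnecessary: both places could simply be named "a". Note that the prefix test is crude and would also pick up any unrelated group whose name happens to start with "a".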

I think this enhancement would fit very well with the concept of repeated captures that's already present in 'regex'.

Do any other regex implementations have something like this?

I don't know.

