Showing content from https://learnbyexample.github.io/mini/regexp-gotcha-1/ below:

Regexp gotcha 1: grouping common portions

Similar to a(b+c)d = abd+acd in maths, you get a(b|c)d = abd|acd in regular expressions. However, you'll have to be careful if quantifiers are involved.
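As a quick sanity check of the quantifier-free case, here's a minimal sketch comparing the grouped and expanded forms. The `samples` list and the variable names are made up for illustration.

```python
import re

# Hypothetical sample strings, chosen only for this illustration
samples = ['abd', 'acd', 'ad', 'abcd', 'abd acd', '']

grouped = re.compile(r'a(b|c)d')    # common portions factored out
expanded = re.compile(r'abd|acd')   # every alternative spelled out in full

# fullmatch() succeeds only if the entire string matches the pattern
grouped_hits = [s for s in samples if grouped.fullmatch(s)]
expanded_hits = [s for s in samples if expanded.fullmatch(s)]

print(grouped_hits)                   # ['abd', 'acd']
print(grouped_hits == expanded_hits)  # True
```

Both forms accept the same strings here. The only difference is that the grouped version also creates a capture group.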

For example, (a*|b*) isn't the same as (a|b)*. Can you reason out why? Here's a railroad diagram to help you out:

[Railroad diagram not shown. Credit: debuggex.com]

The difference is that (a*|b*) only matches sequences made up of a single repeated letter, like a, bb, aaaaaa, etc. (as well as the empty string). But (a|b)* can match mixed sequences like ababbba too. You can also simplify (a|b)* to [ab]* since this particular example is just an alternation between single characters.

Here's an illustration using Python:

>>> import re
>>> test = ['aa', 'abbaba', 'aaabbb', 'bbbbb', 'abc']

>>> [s for s in test if re.fullmatch(r'(a*|b*)', s)]
['aa', 'bbbbb']

>>> [s for s in test if re.fullmatch(r'(a|b)*', s)]
['aa', 'abbaba', 'aaabbb', 'bbbbb']
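To go a step beyond hand-picked test strings, here's a rough sketch that enumerates every string over the letters a and b up to length 3 and compares the three patterns. The length limit and the variable names are arbitrary choices for this illustration.

```python
import re
from itertools import product

# Every string over {a, b} with length 0 to 3 (15 strings in total)
candidates = [''.join(p) for n in range(4) for p in product('ab', repeat=n)]

same_letter = [s for s in candidates if re.fullmatch(r'(a*|b*)', s)]
mixed = [s for s in candidates if re.fullmatch(r'(a|b)*', s)]
char_class = [s for s in candidates if re.fullmatch(r'[ab]*', s)]

print(same_letter)          # ['', 'a', 'b', 'aa', 'bb', 'aaa', 'bbb']
print(mixed == candidates)  # True: (a|b)* accepts every candidate
print(mixed == char_class)  # True: [ab]* accepts the same strings
```

Note that the empty string shows up for all three patterns, since * allows zero repetitions.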

Want to learn regular expressions from the basics with plenty of examples and exercises? I've written regexp ebooks for Python, JavaScript, Ruby and CLI tools.

