Similar to a(b+c)d = abd+acd in maths, you get a(b|c)d = abd|acd in regular expressions. However, you'll have to be careful if quantifiers are involved. For example, (a*|b*) isn't the same as (a|b)*. Can you reason out why? Here's a railroad diagram to help you out:
Credit: debuggex.com
The difference is that (a*|b*) only matches same-letter sequences like a, bb, aaaaaa, etc. But (a|b)* can match mixed sequences like ababbba too. You can also simplify (a|b)* to [ab]* since it is just single-character alternation in this particular example.
Here's an illustration using Python:
>>> import re
>>> test = ['aa', 'abbaba', 'aaabbb', 'bbbbb', 'abc']
>>> [s for s in test if re.fullmatch(r'(a*|b*)', s)]
['aa', 'bbbbb']
>>> [s for s in test if re.fullmatch(r'(a|b)*', s)]
['aa', 'abbaba', 'aaabbb', 'bbbbb']
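The two equivalences mentioned above can be checked the same way. This is only a rough sketch: the sample lists are made up for illustration, and agreeing on a few strings is a sanity check rather than a proof of equivalence.

```python
import re

# Without quantifiers, alternation distributes:
# a(b|c)d and abd|acd accept the same strings.
samples = ['abd', 'acd', 'ad', 'abcd', 'abdacd']  # illustrative inputs
factored = [s for s in samples if re.fullmatch(r'a(b|c)d', s)]
expanded = [s for s in samples if re.fullmatch(r'abd|acd', s)]
print(factored == expanded, factored)

# Single-character alternation can be written as a character class:
# (a|b)* and [ab]* accept the same strings.
test = ['aa', 'abbaba', 'aaabbb', 'bbbbb', 'abc', '']
grouped = [s for s in test if re.fullmatch(r'(a|b)*', s)]
char_class = [s for s in test if re.fullmatch(r'[ab]*', s)]
print(grouped == char_class, char_class)
```

Both comparisons print True for these inputs. Note that the empty string is matched too, since * allows zero repetitions.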