The current workaround is to call `str.replace(<compiled_re>.pattern, flags=<compiled_re>.flags)`, which is relatively ugly and verbose in my opinion.
Here's a contrived example of removing stopwords and normalizing whitespace afterwards:
```python
import pandas as pd
import re

some_names = pd.Series([
    "three weddings and a funeral",
    "the big lebowski",
    "florence and the machine",
])
stopwords = ["the", "a", "and"]
stopwords_re = re.compile(r"(\s+)?\b({})\b(\s+)?".format("|".join(stopwords)), re.IGNORECASE)
whitespace_re = re.compile(r"\s+")

# desired code:
# some_names.str.replace(stopwords_re, " ").str.strip().str.replace(whitespace_re, " ")

# actual code:
some_names.\
    str.replace(stopwords_re.pattern, " ", flags=stopwords_re.flags).\
    str.strip().\
    str.replace(whitespace_re.pattern, " ", flags=whitespace_re.flags)
```
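In the meantime, the desired dispatch can be emulated with a small wrapper that unpacks `.pattern` and `.flags` when given a pre-compiled pattern. A minimal sketch (`replace_compiled` is a hypothetical helper written for this issue, not an existing pandas API):

```python
import re

import pandas as pd


def replace_compiled(series, pat, repl):
    """Replace `pat` with `repl` in a string Series.

    Accepts either a plain string pattern or a compiled re.Pattern,
    mimicking the behavior this issue proposes for Series.str.replace.
    """
    if isinstance(pat, re.Pattern):
        # Unpack the compiled pattern so its flags (e.g. re.IGNORECASE)
        # are not silently dropped.
        return series.str.replace(pat.pattern, repl, flags=pat.flags, regex=True)
    return series.str.replace(pat, repl, regex=True)
```

With this helper, the stopword example above collapses to a single readable chain: `replace_compiled(some_names, stopwords_re, " ").str.strip()` followed by the whitespace normalization.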
Why do I think this is better?
Is there a good reason not to implement this?