
Source: https://github.com/pandas-dev/pandas/issues/15446

Should str.replace accept a compiled expression? · Issue #15446 · pandas-dev/pandas · GitHub

Currently, the workaround is to call str.replace(<compiled_re>.pattern, <repl>, flags=<compiled_re>.flags), which is rather ugly and verbose in my opinion.

Here's a contrived example of removing stopwords and normalizing whitespace afterwards:

import re

import pandas as pd

some_names = pd.Series(["three weddings and a funeral", "the big lebowski", "florence and the machine"])

# Match a whole-word stopword together with any whitespace around it.
stopwords = ["the", "a", "and"]
stopwords_re = re.compile(r"(\s+)?\b({})\b(\s+)?".format("|".join(stopwords)), re.IGNORECASE)
whitespace_re = re.compile(r"\s+")

# desired code:
# some_names.str.replace(stopwords_re, " ").str.strip().str.replace(whitespace_re, " ")

# actual code (on pandas >= 2.0 also pass regex=True, since the default there is regex=False):
(
    some_names
    .str.replace(stopwords_re.pattern, " ", flags=stopwords_re.flags)
    .str.strip()
    .str.replace(whitespace_re.pattern, " ", flags=whitespace_re.flags)
)
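To make the proposal concrete, here is a minimal sketch of the dispatch it asks for, written with only the standard re module so it runs without pandas. replace_each is a hypothetical helper, not a pandas API: it accepts either a pattern string plus flags or an already-compiled pattern. Rejecting extra flags for a compiled pattern is an assumption about sensible behavior, since the compiled object already carries its own flags.

```python
from __future__ import annotations

import re
from typing import Iterable


def replace_each(values: Iterable[str], pat: str | re.Pattern[str], repl: str, flags: int = 0) -> list[str]:
    """Hypothetical helper: run re.sub over each string, accepting either a
    pattern string plus flags or an already-compiled pattern."""
    if isinstance(pat, re.Pattern):
        # A compiled pattern already carries its flags, so extra flags are ambiguous.
        if flags:
            raise ValueError("flags cannot be set when pat is a compiled regex")
        regex = pat
    else:
        # Plain strings are compiled with whatever flags the caller supplies.
        regex = re.compile(pat, flags)
    return [regex.sub(repl, value) for value in values]


names = ["three weddings and a funeral", "the big lebowski", "florence and the machine"]
stopwords_re = re.compile(r"(\s+)?\b(the|a|and)\b(\s+)?", re.IGNORECASE)
whitespace_re = re.compile(r"\s+")

# Compiled patterns are passed directly, mirroring the "desired code" chain.
without_stopwords = replace_each(names, stopwords_re, " ")
cleaned = replace_each([s.strip() for s in without_stopwords], whitespace_re, " ")
print(cleaned)  # ['three weddings funeral', 'big lebowski', 'florence machine']

# The string-plus-flags form still works and gives the same intermediate result.
assert replace_each(names, stopwords_re.pattern, " ", flags=stopwords_re.flags) == without_stopwords
```

Accepting both forms keeps the existing string-plus-flags calls working while letting a compiled pattern speak for itself.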

Why do I think this is better?

It's nice to keep commonly used regular expressions compiled so that they carry their flags around with them. This also makes it easy to use "verbose" regular expressions (re.VERBOSE).
It's not that compiled regular expressions should quack like strings. Rather, str.replace already makes strings quack like compiled regular expressions, yet it won't let actual compiled regular expressions quack their own quack.
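As an illustration of the flags point, here is a sketch using only the standard re module with a verbose rewrite of the stopword pattern (the verbose layout and comments are illustrative, not from the original post). The pattern only works when re.VERBOSE travels with it: passing .pattern without .flags silently turns the layout whitespace and comments into literal text to match.

```python
import re

# Under re.VERBOSE, whitespace and "#" comments inside the pattern are ignored.
stopwords_re = re.compile(
    r"""
    (\s+)?              # optional leading whitespace
    \b(the|a|and)\b     # a whole-word stopword
    (\s+)?              # optional trailing whitespace
    """,
    re.VERBOSE | re.IGNORECASE,
)

text = "The Big Lebowski"

# Forwarding both .pattern and .flags reproduces the compiled behavior.
with_flags = re.sub(stopwords_re.pattern, " ", text, flags=stopwords_re.flags)
# Forwarding only .pattern drops VERBOSE and IGNORECASE, so nothing matches.
without_flags = re.sub(stopwords_re.pattern, " ", text)

print(repr(with_flags))     # ' Big Lebowski'
print(repr(without_flags))  # 'The Big Lebowski'
```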

Is there a good reason not to implement this?
