Replace values given in to_replace with value.
Values of the Series/DataFrame are replaced with other values dynamically. This differs from updating with .loc
or .iloc
, which require you to specify a location to update with some value.
How to find the values that will be replaced.
numeric, str or regex:
numeric: numeric values equal to to_replace will be replaced with value
str: string exactly matching to_replace will be replaced with value
regex: regexs matching to_replace will be replaced with value
list of str, regex, or numeric:
First, if to_replace and value are both lists, they must be the same length.
Second, if
regex=True
then all of the strings in both lists will be interpreted as regexs otherwise they will match directly. This doesnât matter much for value since there are only a few possible substitution regexes you can use.str, regex and numeric rules apply as above.
dict:
Dicts can be used to specify different replacement values for different existing values. For example,
{'a': 'b', 'y': 'z'}
replaces the value âaâ with âbâ and âyâ with âzâ. To use a dict in this way, the optional value parameter should not be given.For a DataFrame a dict can specify that different values should be replaced in different columns. For example,
{'a': 1, 'b': 'z'}
looks for the value 1 in column âaâ and the value âzâ in column âbâ and replaces these values with whatever is specified in value. The value parameter should not beNone
in this case. You can treat this as a special case of passing two lists except that you are specifying the column to search in.For a DataFrame nested dictionaries, e.g.,
{'a': {'b': np.nan}}
, are read as follows: look in column âaâ for the value âbâ and replace it with NaN. The optional value parameter should not be specified to use a nested dict in this way. You can nest regular expressions as well. Note that column names (the top-level dictionary keys in a nested dictionary) cannot be regular expressions.
None:
This means that the regex argument must be a string, compiled regular expression, or list, dict, ndarray or Series of such elements. If value is also
None
then this must be a nested dictionary or Series.
See the examples section for examples of each of these.
Value to replace any values matching to_replace with. For a DataFrame a dict of values can be used to specify which value to use for each column (columns not in the dict will not be filled). Regular expressions, strings and lists or dicts of such objects are also allowed.
If True, performs operation inplace and returns None.
Maximum size gap to forward or backward fill.
Deprecated since version 2.1.0.
Whether to interpret to_replace and/or value as regular expressions. Alternatively, this could be a regular expression or a list, dict, or array of regular expressions in which case to_replace must be None
.
The method to use when for replacement, when to_replace is a scalar, list or tuple and value is None
.
Deprecated since version 2.1.0.
Object after replacement.
If regex is not a bool
and to_replace is not None
.
If to_replace is not a scalar, array-like, dict
, or None
If to_replace is a dict
and value is not a list
, dict
, ndarray
, or Series
If to_replace is None
and regex is not compilable into a regular expression or is a list, dict, ndarray, or Series.
When replacing multiple bool
or datetime64
objects and the arguments to to_replace does not match the type of the value being replaced
If a list
or an ndarray
is passed to to_replace and value but they are not the same length.
Notes
Regex substitution is performed under the hood with re.sub
. The rules for substitution for re.sub
are the same.
Regular expressions will only substitute on strings, meaning you cannot provide, for example, a regular expression matching floating point numbers and expect the columns in your frame that have a numeric dtype to be matched. However, if those floating point numbers are strings, then you can do this.
This method has a lot of options. You are encouraged to experiment and play with this method to gain intuition about how it works.
When dict is used as the to_replace value, it is like key(s) in the dict are the to_replace part and value(s) in the dict are the value parameter.
Examples
Scalar `to_replace` and `value`
>>> s = pd.Series([1, 2, 3, 4, 5]) >>> s.replace(1, 5) 0 5 1 2 2 3 3 4 4 5 dtype: int64
>>> df = pd.DataFrame({'A': [0, 1, 2, 3, 4], ... 'B': [5, 6, 7, 8, 9], ... 'C': ['a', 'b', 'c', 'd', 'e']}) >>> df.replace(0, 5) A B C 0 5 5 a 1 1 6 b 2 2 7 c 3 3 8 d 4 4 9 e
List-like `to_replace`
>>> df.replace([0, 1, 2, 3], 4) A B C 0 4 5 a 1 4 6 b 2 4 7 c 3 4 8 d 4 4 9 e
>>> df.replace([0, 1, 2, 3], [4, 3, 2, 1]) A B C 0 4 5 a 1 3 6 b 2 2 7 c 3 1 8 d 4 4 9 e
>>> s.replace([1, 2], method='bfill') 0 3 1 3 2 3 3 4 4 5 dtype: int64
dict-like `to_replace`
>>> df.replace({0: 10, 1: 100}) A B C 0 10 5 a 1 100 6 b 2 2 7 c 3 3 8 d 4 4 9 e
>>> df.replace({'A': 0, 'B': 5}, 100) A B C 0 100 100 a 1 1 6 b 2 2 7 c 3 3 8 d 4 4 9 e
>>> df.replace({'A': {0: 100, 4: 400}}) A B C 0 100 5 a 1 1 6 b 2 2 7 c 3 3 8 d 4 400 9 e
Regular expression `to_replace`
>>> df = pd.DataFrame({'A': ['bat', 'foo', 'bait'], ... 'B': ['abc', 'bar', 'xyz']}) >>> df.replace(to_replace=r'^ba.$', value='new', regex=True) A B 0 new abc 1 foo new 2 bait xyz
>>> df.replace({'A': r'^ba.$'}, {'A': 'new'}, regex=True) A B 0 new abc 1 foo bar 2 bait xyz
>>> df.replace(regex=r'^ba.$', value='new') A B 0 new abc 1 foo new 2 bait xyz
>>> df.replace(regex={r'^ba.$': 'new', 'foo': 'xyz'}) A B 0 new abc 1 xyz new 2 bait xyz
>>> df.replace(regex=[r'^ba.$', 'foo'], value='new') A B 0 new abc 1 new new 2 bait xyz
Compare the behavior of s.replace({'a': None})
and s.replace('a', None)
to understand the peculiarities of the to_replace parameter:
>>> s = pd.Series([10, 'a', 'a', 'b', 'a'])
When one uses a dict as the to_replace value, it is like the value(s) in the dict are equal to the value parameter. s.replace({'a': None})
is equivalent to s.replace(to_replace={'a': None}, value=None, method=None)
:
>>> s.replace({'a': None}) 0 10 1 None 2 None 3 b 4 None dtype: object
When value
is not explicitly passed and to_replace is a scalar, list or tuple, replace uses the method parameter (default âpadâ) to do the replacement. So this is why the âaâ values are being replaced by 10 in rows 1 and 2 and âbâ in row 4 in this case.
>>> s.replace('a') 0 10 1 10 2 10 3 b 4 b dtype: object
Deprecated since version 2.1.0: The âmethodâ parameter and padding behavior are deprecated.
On the other hand, if None
is explicitly passed for value
, it will be respected:
>>> s.replace('a', None) 0 10 1 None 2 None 3 b 4 None dtype: object
Changed in version 1.4.0: Previously the explicit
None
was silently ignored.
When regex=True
, value
is not None
and to_replace is a string, the replacement will be applied in all columns of the DataFrame.
>>> df = pd.DataFrame({'A': [0, 1, 2, 3, 4], ... 'B': ['a', 'b', 'c', 'd', 'e'], ... 'C': ['f', 'g', 'h', 'i', 'j']})
>>> df.replace(to_replace='^[a-g]', value='e', regex=True) A B C 0 0 e e 1 1 e e 2 2 e h 3 3 e i 4 4 e j
If value
is not None
and to_replace is a dictionary, the dictionary keys will be the DataFrame columns that the replacement will be applied.
>>> df.replace(to_replace={'B': '^[a-c]', 'C': '^[h-j]'}, value='e', regex=True) A B C 0 0 e f 1 1 e g 2 2 e e 3 3 d e 4 4 e e
RetroSearch is an open source project built by @garambo | Open a GitHub Issue
Search and Browse the WWW like it's 1997 | Search results from DuckDuckGo
HTML:
3.2
| Encoding:
UTF-8
| Version:
0.7.4