Apply a function along an axis of the DataFrame.
Objects passed to the function are Series objects whose index is either the DataFrame's index (axis=0) or the DataFrame's columns (axis=1). By default (result_type=None), the final return type is inferred from the return type of the applied function. Otherwise, it depends on the result_type argument. The return type of the applied function is inferred from the first result computed when applying the function to a Series object.
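As a rough illustration of which labels the passed Series carries, the sketch below logs the index of every Series handed to the function for each axis. The helper record_labels and the variable names are hypothetical and not part of pandas.

```python
import pandas as pd

df = pd.DataFrame([[4, 9]] * 3, columns=["A", "B"])


def record_labels(log):
    """Build a function that logs the index labels of each Series it receives."""
    def func(series):
        log.append(list(series.index))
        return series.sum()
    return func


column_labels = []  # filled with axis=0: the function sees one column at a time
row_labels = []     # filled with axis=1: the function sees one row at a time

column_result = df.apply(record_labels(column_labels), axis=0)
row_result = df.apply(record_labels(row_labels), axis=1)

print(column_labels[0])  # labels taken from the DataFrame's index
print(row_labels[0])     # labels taken from the DataFrame's columns
```

Because the function returns a scalar for each Series, both calls come back as a Series here.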
Function to apply to each column or row.
Axis along which the function is applied:
0 or 'index': apply function to each column.
1 or 'columns': apply function to each row.
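A small sketch of the string aliases, assuming the same toy DataFrame used in the Examples section. The helper total is hypothetical.

```python
import pandas as pd

df = pd.DataFrame([[4, 9]] * 3, columns=["A", "B"])


def total(series):
    # Reduce each Series passed by apply to a single number.
    return series.sum()


# axis="index" behaves like axis=0: the function is applied to each column.
per_column = df.apply(total, axis="index")

# axis="columns" behaves like axis=1: the function is applied to each row.
per_row = df.apply(total, axis="columns")

print(per_column)
print(per_row)
```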
Determines if row or column is passed as a Series or ndarray object:
False: passes each row or column as a Series to the function.
True: the passed function will receive ndarray objects instead. If you are just applying a NumPy reduction function, this will achieve much better performance.
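The sketch below checks what the function receives when this flag is set. It assumes the parameter is spelled raw, as in the pandas signature. The helper total and the list types_seen are hypothetical.

```python
import pandas as pd

df = pd.DataFrame([[4, 9]] * 3, columns=["A", "B"])

types_seen = []  # type name of every object handed to the function


def total(values):
    # With raw=True, values is expected to be a NumPy array, not a Series.
    types_seen.append(type(values).__name__)
    return values.sum()


result = df.apply(total, raw=True)

print(set(types_seen))
print(result)
```

Since an ndarray has no index, a function used this way cannot rely on labels such as values["A"].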
The following result_type options only act when axis=1 (columns):
'expand': list-like results will be turned into columns.
'reduce': returns a Series if possible rather than expanding list-like results. This is the opposite of 'expand'.
'broadcast': results will be broadcast to the original shape of the DataFrame; the original index and columns will be retained.
The default behaviour (None) depends on the return value of the applied function: list-like results will be returned as a Series of those. However, if the applied function returns a Series, these are expanded to columns.
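To contrast 'reduce' with 'expand', here is a minimal sketch using a function that returns a two-element list for every row. The variable names are illustrative only.

```python
import pandas as pd

df = pd.DataFrame([[4, 9]] * 3, columns=["A", "B"])


def pair(row):
    # Return a list-like result for each row.
    return [1, 2]


# 'reduce' keeps each list as a single element of a Series.
reduced = df.apply(pair, axis=1, result_type="reduce")

# 'expand' spreads each list across new columns of a DataFrame.
expanded = df.apply(pair, axis=1, result_type="expand")

print(type(reduced).__name__, type(expanded).__name__)
```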
Positional arguments to pass to func in addition to the array/series.
Only has an effect when func is a list-like or dict-like of funcs and the func isn't a string. If 'compat', will if possible first translate the func into pandas methods (e.g. Series().apply(np.sum) will be translated to Series().sum()). If that doesn't work, will try to call apply again with by_row=True, and if that fails, will call apply again with by_row=False (backward compatible). If False, the funcs will be passed the whole Series at once.
Added in version 2.1.0.
Choose the execution engine to use. If not provided the function will be executed by the regular Python interpreter.
Other options include JIT compilers such as Numba and Bodo, which in some cases can speed up the execution. To use an executor you can provide the decorators numba.jit, numba.njit or bodo.jit. You can also provide the decorator with parameters, like numba.jit(nogil=True).
Not all functions can be executed with all execution engines. In general, JIT compilers will require type stability in the function (no variable should change data type during the execution). And not all pandas and NumPy APIs are supported. Check the engine documentation [1] and [2] for limitations.
Warning
String parameters will stop being supported in a future pandas version.
Added in version 2.2.0.
Pass keyword arguments to the engine. This is currently only used by the numba engine, see the documentation for the engine argument for more information.
Additional keyword arguments to pass to func.
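A minimal sketch of forwarding extra positional and keyword arguments to func. The function scale and its parameters factor and offset are hypothetical names chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame([[4, 9]] * 3, columns=["A", "B"])


def scale(series, factor, offset=0):
    # series is supplied by apply; factor and offset are forwarded by the caller.
    return series * factor + offset


# factor is passed positionally through args, offset as a keyword argument.
scaled = df.apply(scale, args=(2,), offset=1)

print(scaled)
```

Note that args must be a tuple, so a single extra positional argument is written as (2,).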
Result of applying func along the given axis of the DataFrame.
Notes
Functions that mutate the passed object can produce unexpected behavior or errors and are not supported. See Mutating with User Defined Function (UDF) methods for more details.
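One way to stay clear of this, sketched under the assumption that a modified copy is what the caller wants, is to copy the passed object before changing it. The helper bump_first is hypothetical.

```python
import pandas as pd

df = pd.DataFrame([[4, 9]] * 3, columns=["A", "B"])


def bump_first(series):
    # Work on a copy so the object passed in by apply is never modified.
    bumped = series.copy()
    bumped.iloc[0] += 1
    return bumped


result = df.apply(bump_first)

print(result)
print(df)  # the original DataFrame is left as it was
```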
References
[2] Bodo documentation <https://docs.bodo.ai/latest/>
Examples
>>> df = pd.DataFrame([[4, 9]] * 3, columns=["A", "B"])
>>> df
   A  B
0  4  9
1  4  9
2  4  9
Using a NumPy universal function (in this case the same as np.sqrt(df)):
>>> df.apply(np.sqrt)
     A    B
0  2.0  3.0
1  2.0  3.0
2  2.0  3.0
Using a reducing function on either axis:
>>> df.apply(np.sum, axis=0)
A    12
B    27
dtype: int64
>>> df.apply(np.sum, axis=1)
0    13
1    13
2    13
dtype: int64
Returning a list-like will result in a Series:
>>> df.apply(lambda x: [1, 2], axis=1)
0    [1, 2]
1    [1, 2]
2    [1, 2]
dtype: object
Passing result_type='expand' will expand list-like results to columns of a DataFrame:
>>> df.apply(lambda x: [1, 2], axis=1, result_type="expand")
   0  1
0  1  2
1  1  2
2  1  2
Returning a Series inside the function is similar to passing result_type='expand'. The resulting column names will be the Series index.
>>> df.apply(lambda x: pd.Series([1, 2], index=["foo", "bar"]), axis=1)
   foo  bar
0    1    2
1    1    2
2    1    2
Passing result_type='broadcast' will ensure the same shape result, whether list-like or scalar is returned by the function, and broadcast it along the axis. The resulting column names will be the originals.
>>> df.apply(lambda x: [1, 2], axis=1, result_type="broadcast")
   A  B
0  1  2
1  1  2
2  1  2
Advanced users can speed up their code by using a just-in-time (JIT) compiler with apply. The main JIT compilers available for pandas are Numba and Bodo. In general, JIT compilation is only possible when the function passed to apply has type stability (variables in the function do not change their type during the execution).
>>> import bodo
>>> df.apply(lambda x: x.A + x.B, axis=1, engine=bodo.jit)
Note that JIT compilation is only recommended for functions that take a significant amount of time to run. Fast functions are unlikely to run faster with JIT compilation.