These are the changes in pandas 2.1.0. See Release notes for a full changelog including other versions of pandas.
Enhancements#
PyArrow will become a required dependency with pandas 3.0#
PyArrow will become a required dependency of pandas starting with pandas 3.0. This decision was made based on PDEP 10.
This will enable more changes that are hugely beneficial to pandas users, including but not limited to:
Inferring strings as PyArrow-backed strings by default, enabling a significant reduction of the memory footprint and large performance improvements.
Inferring more complex dtypes with PyArrow by default, such as Decimal, lists, bytes, structured data and more.
Better interoperability with other libraries that depend on Apache Arrow.
We are collecting feedback on this decision here.
Avoid NumPy object dtype for strings by default#
Previously, all strings were stored in columns with NumPy object dtype by default. This release introduces an option future.infer_string that infers all strings as PyArrow backed strings with dtype "string[pyarrow_numpy]" instead. This is a new string dtype implementation that follows NumPy semantics in comparison operations and will return np.nan as the missing value indicator. Setting the option will also infer the dtype "string" as a StringDtype with storage set to "pyarrow_numpy", ignoring the value behind the option mode.string_storage.
This option only works if PyArrow is installed. PyArrow backed strings have a significantly reduced memory footprint and provide a big performance improvement compared to NumPy object (GH 54430).
The option can be enabled with:
pd.options.future.infer_string = True
This behavior will become the default with pandas 3.0.
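To try the option without changing global state, it can be scoped with pd.option_context. The following is a minimal sketch, assuming pandas 2.1 or later; the PyArrow guard and the variable names are illustrative, and the exact dtype name printed depends on the pandas version.

```python
import pandas as pd

try:
    import pyarrow  # noqa: F401  (pandas 2.1 needs PyArrow for this option)
    HAVE_PYARROW = True
except ImportError:
    HAVE_PYARROW = False

# Dtype inferred with the current global settings.
default_dtype = pd.Series(["a", "b"]).dtype

inferred_dtype = None
if HAVE_PYARROW:
    # option_context restores the previous setting when the block exits.
    with pd.option_context("future.infer_string", True):
        inferred_dtype = pd.Series(["a", "b"]).dtype

print(default_dtype, inferred_dtype)
```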
DataFrame reductions preserve extension dtypes#
In previous versions of pandas, the results of DataFrame reductions (DataFrame.sum(), DataFrame.mean(), etc.) had NumPy dtypes, even when the DataFrames were of extension dtypes. pandas can now keep the dtypes when doing reductions over DataFrame columns with a common dtype (GH 52788).
Old Behavior
In [1]: df = pd.DataFrame({"a": [1, 1, 2, 1], "b": [np.nan, 2.0, 3.0, 4.0]}, dtype="Int64")
In [2]: df.sum()
Out[2]:
a    5
b    9
dtype: int64
In [3]: df = df.astype("int64[pyarrow]")
In [4]: df.sum()
Out[4]:
a    5
b    9
dtype: int64
New Behavior
In [1]: df = pd.DataFrame({"a": [1, 1, 2, 1], "b": [np.nan, 2.0, 3.0, 4.0]}, dtype="Int64")
In [2]: df.sum()
Out[2]:
a    5
b    9
dtype: Int64
In [3]: df = df.astype("int64[pyarrow]")
In [4]: df.sum()
Out[4]:
a    5
b    9
dtype: int64[pyarrow]
Notice that the dtype is now a masked dtype and PyArrow dtype, respectively, while previously it was a NumPy integer dtype.
To allow DataFrame reductions to preserve extension dtypes, ExtensionArray._reduce() has gained a new keyword parameter keepdims. Calling ExtensionArray._reduce() with keepdims=True should return an array of length 1 along the reduction axis. To maintain backward compatibility, the parameter is not yet required, but it will become required in the future. If the parameter is not found in the signature, DataFrame reductions cannot preserve extension dtypes. In that case a FutureWarning will also be emitted, and type checkers like mypy may complain about the signature not being compatible with ExtensionArray._reduce().
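The shape of a _reduce() implementation that honours keepdims can be sketched as follows. ToyArray is a hypothetical stand-in that wraps a NumPy array and only shows the signature and return contract; a real implementation would subclass pandas.api.extensions.ExtensionArray and implement its full interface.

```python
import numpy as np


class ToyArray:
    """Hypothetical array type used only to illustrate the keepdims contract."""

    def __init__(self, values):
        self._data = np.asarray(values, dtype="float64")

    def __len__(self):
        return len(self._data)

    def _reduce(self, name, *, skipna=True, keepdims=False, **kwargs):
        # Pick e.g. np.nansum when skipping missing values, np.sum otherwise.
        func = getattr(np, f"nan{name}" if skipna else name)
        result = func(self._data)
        if keepdims:
            # Return a length-1 array of the same type instead of a scalar.
            return type(self)([result])
        return result


arr = ToyArray([1.0, np.nan, 3.0])
scalar = arr._reduce("sum")
kept = arr._reduce("sum", keepdims=True)
print(scalar, len(kept))
```

Returning the array's own type with keepdims=True is what lets a DataFrame reduction keep the extension dtype of the column.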
Copy-on-Write improvements#
Series.transform() not respecting Copy-on-Write when func modifies Series inplace (GH 53747)
Calling Index.values will now return a read-only NumPy array (GH 53704)
Setting a Series
into a DataFrame
now creates a lazy instead of a deep copy (GH 53142)
The DataFrame
constructor, when constructing a DataFrame from a dictionary of Index objects and specifying copy=False
, will now use a lazy copy of those Index objects for the columns of the DataFrame (GH 52947)
A shallow copy of a Series or DataFrame (df.copy(deep=False)
) will now also return a shallow copy of the rows/columns Index
objects instead of only a shallow copy of the data, i.e. the index of the result is no longer identical (df.copy(deep=False).index is df.index
is no longer True) (GH 53721)
DataFrame.head()
and DataFrame.tail()
will now return deep copies (GH 54011)
Add lazy copy mechanism to DataFrame.eval()
(GH 53746)
Trying to operate inplace on a temporary column selection (for example, df["a"].fillna(100, inplace=True)
) will now always raise a warning when Copy-on-Write is enabled. In this mode, operating inplace like this will never work, since the selection behaves as a temporary copy. This holds true for:
DataFrame.update / Series.update
DataFrame.fillna / Series.fillna
DataFrame.replace / Series.replace
DataFrame.clip / Series.clip
DataFrame.where / Series.where
DataFrame.mask / Series.mask
DataFrame.interpolate / Series.interpolate
DataFrame.ffill / Series.ffill
DataFrame.bfill / Series.bfill
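The warning above can be avoided by assigning the result back or by calling the method on the DataFrame itself. A minimal sketch with illustrative column names and fill values:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [np.nan, 5.0, 6.0]})

# Instead of df["a"].fillna(100, inplace=True), assign the result back ...
df["a"] = df["a"].fillna(100)

# ... or operate inplace on the DataFrame, restricted to one column.
df.fillna({"b": 0}, inplace=True)

print(df)
```

Both forms modify df directly rather than a temporary selection, so they behave the same with and without Copy-on-Write.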
DataFrame.map() method and support for ExtensionArrays#
DataFrame.map() has been added and DataFrame.applymap() has been deprecated. DataFrame.map() has the same functionality as DataFrame.applymap(), but the new name better communicates that this is the DataFrame version of Series.map() (GH 52353).
When given a callable, Series.map()
applies the callable to all elements of the Series
. Similarly, DataFrame.map()
applies the callable to all elements of the DataFrame
, while Index.map()
applies the callable to all elements of the Index
.
Frequently, it is not desirable to apply the callable to NaN-like values of the array. To avoid doing that, the map method can be called with na_action="ignore", i.e. ser.map(func, na_action="ignore"). However, na_action="ignore" was not implemented for many ExtensionArray and Index types, and it did not work correctly for any ExtensionArray subclass except the nullable numeric ones (i.e. with dtype Int64 etc.).
na_action="ignore" now works for all array types (GH 52219, GH 51645, GH 51809, GH 51936, GH 52033, GH 52096).
Previous behavior:
In [1]: ser = pd.Series(["a", "b", np.nan], dtype="category")
In [2]: ser.map(str.upper, na_action="ignore")
NotImplementedError
In [3]: df = pd.DataFrame(ser)
In [4]: df.applymap(str.upper, na_action="ignore")  # worked for DataFrame
     0
0    A
1    B
2  NaN
In [5]: idx = pd.Index(ser)
In [6]: idx.map(str.upper, na_action="ignore")
TypeError: CategoricalIndex.map() got an unexpected keyword argument 'na_action'
New behavior:
In [5]: ser = pd.Series(["a", "b", np.nan], dtype="category")
In [6]: ser.map(str.upper, na_action="ignore")
Out[6]:
0      A
1      B
2    NaN
dtype: category
Categories (2, str): ['A', 'B']
In [7]: df = pd.DataFrame(ser)
In [8]: df.map(str.upper, na_action="ignore")
Out[8]:
     0
0    A
1    B
2  NaN
In [9]: idx = pd.Index(ser)
In [10]: idx.map(str.upper, na_action="ignore")
Out[10]: CategoricalIndex(['A', 'B', nan], categories=['A', 'B'], ordered=False, dtype='category')
Also, note that Categorical.map() implicitly has had its na_action set to "ignore" by default. This has been deprecated and the default for Categorical.map() will change to na_action=None, consistent with all the other array types.
DataFrame.stack()#
pandas has reimplemented DataFrame.stack(). To use the new implementation, pass the argument future_stack=True. This will become the only option in pandas 3.0.
The previous implementation had two main behavioral downsides.
The previous implementation would unnecessarily introduce NA values into the result. The user could have NA values automatically removed by passing dropna=True (the default), but doing this could also remove NA values from the result that existed in the input. See the examples below.
The previous implementation with sort=True (the default) would sometimes sort part of the resulting index, and sometimes not. If the input's columns are not a MultiIndex, then the resulting index would never be sorted. If the columns are a MultiIndex, then in most cases the level(s) in the resulting index that come from stacking the column level(s) would be sorted. In rare cases such level(s) would be sorted in a non-standard order, depending on how the columns were created.
The new implementation (future_stack=True) will no longer unnecessarily introduce NA values when stacking multiple levels and will never sort. As such, the arguments dropna and sort are not utilized and must remain unspecified when using future_stack=True. These arguments will be removed in the next major release.
In [11]: columns = pd.MultiIndex.from_tuples([("B", "d"), ("A", "c")])
In [12]: df = pd.DataFrame([[0, 2], [1, 3]], index=["z", "y"], columns=columns)
In [13]: df
Out[13]:
   B  A
   d  c
z  0  2
y  1  3
In the previous version (future_stack=False), the default of dropna=True would remove unnecessarily introduced NA values but still coerce the dtype to float64 in the process. In the new version, no NAs are introduced and so there is no coercion of the dtype.
In [14]: df.stack([0, 1], future_stack=False, dropna=True)
Out[14]:
z  A  c    2.0
   B  d    0.0
y  A  c    3.0
   B  d    1.0
dtype: float64
In [15]: df.stack([0, 1], future_stack=True)
Out[15]:
z  B  d    0
   A  c    2
y  B  d    1
   A  c    3
dtype: int64
If the input contains NA values, the previous version would drop those as well with dropna=True or introduce new NA values with dropna=False. The new version persists all values from the input.
In [16]: df = pd.DataFrame([[0, 2], [np.nan, np.nan]], columns=columns)
In [17]: df
Out[17]:
     B    A
     d    c
0  0.0  2.0
1  NaN  NaN
In [18]: df.stack([0, 1], future_stack=False, dropna=True)
Out[18]:
0  A  c    2.0
   B  d    0.0
dtype: float64
In [19]: df.stack([0, 1], future_stack=False, dropna=False)
Out[19]:
0  A  d    NaN
      c    2.0
   B  d    0.0
      c    NaN
1  A  d    NaN
      c    NaN
   B  d    NaN
      c    NaN
dtype: float64
In [20]: df.stack([0, 1], future_stack=True)
Out[20]:
0  B  d    0.0
   A  c    2.0
1  B  d    NaN
   A  c    NaN
dtype: float64
Other enhancements#
Series.ffill()
and Series.bfill()
are now supported for objects with IntervalDtype
(GH 54247)
Added filters
parameter to read_parquet()
to filter out data, compatible with both engines
(GH 53212)
Categorical.map()
and CategoricalIndex.map()
now have a na_action
parameter. Categorical.map()
implicitly had a default value of "ignore"
for na_action
. This has formally been deprecated and will be changed to None
in the future. Also notice that Series.map()
has default na_action=None
and calls to series with categorical data will now use na_action=None
unless explicitly set otherwise (GH 44279)
api.extensions.ExtensionArray
now has a map()
method (GH 51809)
DataFrame.applymap()
now uses the map()
method of underlying api.extensions.ExtensionArray
instances (GH 52219)
MultiIndex.sort_values()
now supports na_position
(GH 51612)
MultiIndex.sortlevel()
and Index.sortlevel()
gained a new keyword na_position
(GH 51612)
arrays.DatetimeArray.map()
, arrays.TimedeltaArray.map()
and arrays.PeriodArray.map()
can now take a na_action
argument (GH 51644)
arrays.SparseArray.map()
now supports na_action
(GH 52096).
pandas.read_html()
now supports the storage_options
keyword when used with a URL, allowing users to add headers to the outbound HTTP request (GH 49944)
Add Index.diff()
and Index.round()
(GH 19708)
Add "latex-math"
as an option to the escape
argument of Styler
which will not escape all characters between "\("
and "\)"
during formatting (GH 51903)
Add dtype of categories to repr
information of CategoricalDtype
(GH 52179)
Added engine_kwargs parameter to read_excel() (GH 52214)
Classes that are useful for type-hinting have been added to the public API in the new submodule pandas.api.typing
(GH 48577)
Implemented Series.dt.is_month_start
, Series.dt.is_month_end
, Series.dt.is_year_start
, Series.dt.is_year_end
, Series.dt.is_quarter_start
, Series.dt.is_quarter_end
, Series.dt.days_in_month
, Series.dt.unit
, Series.dt.normalize
, Series.dt.day_name()
, Series.dt.month_name()
, Series.dt.tz_convert()
for ArrowDtype
with pyarrow.timestamp
(GH 52388, GH 51718)
DataFrameGroupBy.agg()
and DataFrameGroupBy.transform()
now support grouping by multiple keys when the index is not a MultiIndex
for engine="numba"
(GH 53486)
SeriesGroupBy.agg()
and DataFrameGroupBy.agg()
now support passing in multiple functions for engine="numba"
(GH 53486)
SeriesGroupBy.transform()
and DataFrameGroupBy.transform()
now support passing in a string as the function for engine="numba"
(GH 53579)
DataFrame.stack()
gained the sort
keyword to dictate whether the resulting MultiIndex
levels are sorted (GH 15105)
DataFrame.unstack()
gained the sort
keyword to dictate whether the resulting MultiIndex
levels are sorted (GH 15105)
Series.explode()
now supports PyArrow-backed list types (GH 53602)
Series.str.join()
now supports ArrowDtype(pa.string())
(GH 53646)
Add validate
parameter to Categorical.from_codes()
(GH 50975)
Added ExtensionArray.interpolate()
used by Series.interpolate()
and DataFrame.interpolate()
(GH 53659)
Added engine_kwargs
parameter to DataFrame.to_excel()
(GH 53220)
Implemented api.interchange.from_dataframe()
for DatetimeTZDtype
(GH 54239)
Implemented __from_arrow__
on DatetimeTZDtype
(GH 52201)
Implemented __pandas_priority__
to allow custom types to take precedence over DataFrame
, Series
, Index
, or ExtensionArray
for arithmetic operations, see the developer guide (GH 48347)
Improve error message when having incompatible columns using DataFrame.merge()
(GH 51861)
Improve error message when setting DataFrame
with wrong number of columns through DataFrame.isetitem()
(GH 51701)
Improved error handling when using DataFrame.to_json()
with incompatible index
and orient
arguments (GH 52143)
Improved error message when creating a DataFrame with empty data (0 rows), no index and an incorrect number of columns (GH 52084)
Improved error message when providing an invalid index
or offset
argument to VariableOffsetWindowIndexer
(GH 54379)
Let DataFrame.to_feather()
accept a non-default Index
and non-string column names (GH 51787)
Added a new parameter by_row
to Series.apply()
and DataFrame.apply()
. When set to False
the supplied callables will always operate on the whole Series or DataFrame (GH 53400, GH 53601).
DataFrame.shift()
and Series.shift()
now allow shifting by multiple periods by supplying a list of periods (GH 44424)
Groupby aggregations with numba
(such as DataFrameGroupBy.sum()
) now can preserve the dtype of the input instead of casting to float64
(GH 44952)
Improved error message when DataFrameGroupBy.agg()
failed (GH 52930)
Many read/to_* functions, such as DataFrame.to_pickle()
and read_csv()
, support forwarding compression arguments to lzma.LZMAFile
(GH 52979)
Reductions Series.argmax()
, Series.argmin()
, Series.idxmax()
, Series.idxmin()
, Index.argmax()
, Index.argmin()
, DataFrame.idxmax()
, DataFrame.idxmin()
are now supported for object-dtype (GH 4279, GH 18021, GH 40685, GH 43697)
DataFrame.to_parquet()
and read_parquet()
will now write and read attrs
respectively (GH 54346)
Index.all()
and Index.any()
with floating dtypes and timedelta64 dtypes no longer raise TypeError
, matching the Series.all()
and Series.any()
behavior (GH 54566)
Series.cummax()
, Series.cummin()
and Series.cumprod()
are now supported for pyarrow dtypes with pyarrow version 13.0 and above (GH 52085)
Added support for the DataFrame Consortium Standard (GH 54383)
Performance improvement in DataFrameGroupBy.quantile()
and SeriesGroupBy.quantile()
(GH 51722)
PyArrow-backed integer dtypes now support bitwise operations (GH 54495)
pandas 2.1.0 supports Python 3.9 and higher.
Increased minimum versions for dependencies#
Some minimum supported versions of dependencies were updated. If installed, we now require:
For optional libraries the general recommendation is to use the latest version.
See Dependencies and Optional dependencies for more.
Other API changes#
arrays.PandasArray has been renamed NumpyExtensionArray and the attached dtype name changed from PandasDtype to NumpyEADtype; importing PandasArray still works until the next major version (GH 53694)
PDEP-6: https://pandas.pydata.org/pdeps/0006-ban-upcasting.html
Setitem-like operations on Series (or DataFrame columns) which silently upcast the dtype are deprecated and show a warning. Examples of affected operations are:
ser.fillna('foo', inplace=True)
ser.where(ser.isna(), 'foo', inplace=True)
ser.iloc[indexer] = 'foo'
ser.loc[indexer] = 'foo'
df.iloc[indexer, 0] = 'foo'
df.loc[indexer, 'a'] = 'foo'
ser[indexer] = 'foo'
where ser is a Series, df is a DataFrame, and indexer could be a slice, a mask, a single value, a list or array of values, or any other allowed indexer.
In a future version, these will raise an error and you should cast to a common dtype first.
Previous behavior:
In [1]: ser = pd.Series([1, 2, 3])
In [2]: ser
Out[2]:
0    1
1    2
2    3
dtype: int64
In [3]: ser[0] = 'not an int64'
In [4]: ser
Out[4]:
0    not an int64
1               2
2               3
dtype: object
New behavior:
In [1]: ser = pd.Series([1, 2, 3])
In [2]: ser
Out[2]:
0    1
1    2
2    3
dtype: int64
In [3]: ser[0] = 'not an int64'
FutureWarning:
  Setting an item of incompatible dtype is deprecated and will raise an error in a future version of pandas.
  Value 'not an int64' has dtype incompatible with int64, please explicitly cast to a compatible dtype first.
In [4]: ser
Out[4]:
0    not an int64
1               2
2               3
dtype: object
To retain the current behaviour, in the case above you could cast ser to object dtype first:
In [21]: ser = pd.Series([1, 2, 3])
In [22]: ser = ser.astype('object')
In [23]: ser[0] = 'not an int64'
In [24]: ser
Out[24]:
0    not an int64
1               2
2               3
dtype: object
Depending on the use-case, it might be more appropriate to cast to a different dtype. In the following, for example, we cast to float64:
In [25]: ser = pd.Series([1, 2, 3])
In [26]: ser = ser.astype('float64')
In [27]: ser[0] = 1.1
In [28]: ser
Out[28]:
0    1.1
1    2.0
2    3.0
dtype: float64
For further reading, please see https://pandas.pydata.org/pdeps/0006-ban-upcasting.html.
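The same cast-first approach applies to the DataFrame forms listed above, such as df.loc[indexer, 'a']. A minimal sketch with an illustrative column:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3]})

# Cast the column to a dtype that can hold the new value before assigning,
# so the setitem does not need to upcast silently.
df["a"] = df["a"].astype("float64")
df.loc[0, "a"] = 1.5

print(df["a"])
```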
Deprecated parsing datetimes with mixed time zones#
Parsing datetimes with mixed time zones is deprecated and shows a warning unless the user passes utc=True to to_datetime() (GH 50887)
Previous behavior:
In [7]: data = ["2020-01-01 00:00:00+06:00", "2020-01-01 00:00:00+01:00"]
In [8]: pd.to_datetime(data, utc=False)
Out[8]:
Index([2020-01-01 00:00:00+06:00, 2020-01-01 00:00:00+01:00], dtype='object')
New behavior:
In [9]: pd.to_datetime(data, utc=False)
FutureWarning:
  In a future version of pandas, parsing datetimes with mixed time zones will raise an error unless `utc=True`.
  Please specify `utc=True` to opt in to the new behaviour and silence this warning.
  To create a `Series` with mixed offsets and `object` dtype, please use `apply` and `datetime.datetime.strptime`.
Index([2020-01-01 00:00:00+06:00, 2020-01-01 00:00:00+01:00], dtype='object')
In order to silence this warning and avoid an error in a future version of pandas, please specify utc=True:
In [29]: data = ["2020-01-01 00:00:00+06:00", "2020-01-01 00:00:00+01:00"]
In [30]: pd.to_datetime(data, utc=True)
Out[30]: DatetimeIndex(['2019-12-31 18:00:00+00:00', '2019-12-31 23:00:00+00:00'], dtype='datetime64[s, UTC]', freq=None)
To create a Series with mixed offsets and object dtype, please use apply and datetime.datetime.strptime:
In [31]: import datetime as dt
In [32]: data = ["2020-01-01 00:00:00+06:00", "2020-01-01 00:00:00+01:00"]
In [33]: pd.Series(data).apply(lambda x: dt.datetime.strptime(x, '%Y-%m-%d %H:%M:%S%z'))
Out[33]:
0    2020-01-01 00:00:00+06:00
1    2020-01-01 00:00:00+01:00
dtype: object
Other Deprecations#
Deprecated DataFrameGroupBy.dtypes
, check dtypes
on the underlying object instead (GH 51045)
Deprecated DataFrame._data
and Series._data
, use public APIs instead (GH 33333)
Deprecated concat()
behavior when any of the objects being concatenated have length 0; in the past the dtypes of empty objects were ignored when determining the resulting dtype, in a future version they will not (GH 39122)
Deprecated Categorical.to_list()
, use obj.tolist()
instead (GH 51254)
Deprecated DataFrameGroupBy.all()
and DataFrameGroupBy.any()
with datetime64 or PeriodDtype
values, matching the Series
and DataFrame
deprecations (GH 34479)
Deprecated axis=1
in DataFrame.ewm()
, DataFrame.rolling()
, DataFrame.expanding()
, transpose before calling the method instead (GH 51778)
Deprecated axis=1
in DataFrame.groupby()
and in Grouper
constructor, do frame.T.groupby(...)
instead (GH 51203)
Deprecated broadcast_axis
keyword in Series.align()
and DataFrame.align()
, upcast before calling align
with left = DataFrame({col: left for col in right.columns}, index=right.index)
(GH 51856)
Deprecated downcast
keyword in Index.fillna()
(GH 53956)
Deprecated fill_method
and limit
keywords in DataFrame.pct_change()
, Series.pct_change()
, DataFrameGroupBy.pct_change()
, and SeriesGroupBy.pct_change()
, explicitly call e.g. DataFrame.ffill()
or DataFrame.bfill()
before calling pct_change
instead (GH 53491)
Deprecated method
, limit
, and fill_axis
keywords in DataFrame.align()
and Series.align()
, explicitly call DataFrame.fillna()
or Series.fillna()
on the alignment results instead (GH 51856)
Deprecated quantile
keyword in Rolling.quantile()
and Expanding.quantile()
, renamed to q
instead (GH 52550)
Deprecated accepting slices in DataFrame.take()
, call obj[slicer]
or pass a sequence of integers instead (GH 51539)
Deprecated behavior of DataFrame.idxmax(), DataFrame.idxmin(), Series.idxmax(), Series.idxmin() with all-NA entries or any-NA and skipna=False; in a future version these will raise ValueError (GH 51276)
Deprecated explicit support for subclassing Index
(GH 45289)
Deprecated making functions given to Series.agg()
attempt to operate on each element in the Series
and only operate on the whole Series
if the elementwise operations failed. In the future, functions given to Series.agg()
will always operate on the whole Series
only. To keep the current behavior, use Series.transform()
instead (GH 53325)
Deprecated making the functions in a list of functions given to DataFrame.agg()
attempt to operate on each element in the DataFrame
and only operate on the columns of the DataFrame
if the elementwise operations failed. To keep the current behavior, use DataFrame.transform()
instead (GH 53325)
Deprecated passing a DataFrame
to DataFrame.from_records()
, use DataFrame.set_index()
or DataFrame.drop()
instead (GH 51353)
Deprecated silently dropping unrecognized timezones when parsing strings to datetimes (GH 18702)
Deprecated the axis
keyword in DataFrame.ewm()
, Series.ewm()
, DataFrame.rolling()
, Series.rolling()
, DataFrame.expanding()
, Series.expanding()
(GH 51778)
Deprecated the axis
keyword in DataFrame.resample()
, Series.resample()
(GH 51778)
Deprecated the downcast
keyword in Series.interpolate()
, DataFrame.interpolate()
, Series.fillna()
, DataFrame.fillna()
, Series.ffill()
, DataFrame.ffill()
, Series.bfill()
, DataFrame.bfill()
(GH 40988)
Deprecated the behavior of concat() with len(keys) != len(objs); in a future version this will raise instead of truncating to the shorter of the two sequences (GH 43485)
Deprecated the behavior of Series.argsort()
in the presence of NA values; in a future version these will be sorted at the end instead of giving -1 (GH 54219)
Deprecated the default of observed=False
in DataFrame.groupby()
and Series.groupby()
; this will default to True
in a future version (GH 43999)
Deprecated pinning group.name to each group in SeriesGroupBy.aggregate() aggregations; if your operation requires utilizing the groupby keys, iterate over the groupby object instead (GH 41090)
Deprecated the axis
keyword in DataFrameGroupBy.idxmax()
, DataFrameGroupBy.idxmin()
, DataFrameGroupBy.fillna()
, DataFrameGroupBy.take()
, DataFrameGroupBy.skew()
, DataFrameGroupBy.rank()
, DataFrameGroupBy.cumprod()
, DataFrameGroupBy.cumsum()
, DataFrameGroupBy.cummax()
, DataFrameGroupBy.cummin()
, DataFrameGroupBy.pct_change()
, DataFrameGroupBy.diff()
, DataFrameGroupBy.shift()
, and DataFrameGroupBy.corrwith()
; for axis=1
operate on the underlying DataFrame
instead (GH 50405, GH 51046)
Deprecated DataFrameGroupBy
with as_index=False
not including groupings in the result when they are not columns of the DataFrame (GH 49519)
Deprecated is_categorical_dtype()
, use isinstance(obj.dtype, pd.CategoricalDtype)
instead (GH 52527)
Deprecated is_datetime64tz_dtype()
, check isinstance(dtype, pd.DatetimeTZDtype)
instead (GH 52607)
Deprecated is_int64_dtype()
, check dtype == np.dtype(np.int64)
instead (GH 52564)
Deprecated is_interval_dtype()
, check isinstance(dtype, pd.IntervalDtype)
instead (GH 52607)
Deprecated is_period_dtype()
, check isinstance(dtype, pd.PeriodDtype)
instead (GH 52642)
Deprecated is_sparse()
, check isinstance(dtype, pd.SparseDtype)
instead (GH 52642)
Deprecated Styler.applymap_index()
. Use the new Styler.map_index()
method instead (GH 52708)
Deprecated Styler.applymap()
. Use the new Styler.map()
method instead (GH 52708)
Deprecated DataFrame.applymap()
. Use the new DataFrame.map()
method instead (GH 52353)
Deprecated DataFrame.swapaxes()
and Series.swapaxes()
, use DataFrame.transpose()
or Series.transpose()
instead (GH 51946)
Deprecated freq
parameter in PeriodArray
constructor, pass dtype
instead (GH 52462)
Deprecated allowing non-standard inputs in take()
, pass either a numpy.ndarray
, ExtensionArray
, Index
, or Series
(GH 52981)
Deprecated allowing non-standard sequences for isin(), value_counts(), unique(), factorize(); cast to one of numpy.ndarray, Index, ExtensionArray, or Series before calling (GH 52986)
Deprecated behavior of DataFrame
reductions sum
, prod
, std
, var
, sem
with axis=None
, in a future version this will operate over both axes returning a scalar instead of behaving like axis=0
; note this also affects numpy functions e.g. np.sum(df)
(GH 21597)
Deprecated behavior of concat()
when DataFrame
has columns that are all-NA, in a future version these will not be discarded when determining the resulting dtype (GH 40893)
Deprecated behavior of Series.dt.to_pydatetime()
, in a future version this will return a Series
containing python datetime
objects instead of an ndarray
of datetimes; this matches the behavior of other Series.dt
properties (GH 20306)
Deprecated logical operations (|
, &
, ^
) between pandas objects and dtype-less sequences (e.g. list
, tuple
), wrap a sequence in a Series
or NumPy array before operating instead (GH 51521)
Deprecated parameter convert_type
in Series.apply()
(GH 52140)
Deprecated passing a dictionary to SeriesGroupBy.agg()
; pass a list of aggregations instead (GH 50684)
Deprecated the fastpath
keyword in Categorical
constructor, use Categorical.from_codes()
instead (GH 20110)
Deprecated the behavior of is_bool_dtype()
returning True
for object-dtype Index
of bool objects (GH 52680)
Deprecated the methods Series.bool()
and DataFrame.bool()
(GH 51749)
Deprecated unused closed
and normalize
keywords in the DatetimeIndex
constructor (GH 52628)
Deprecated unused closed
keyword in the TimedeltaIndex
constructor (GH 52628)
Deprecated logical operation between two non boolean Series
with different indexes always coercing the result to bool dtype. In a future version, this will maintain the return type of the inputs (GH 52500, GH 52538)
Deprecated Period
and PeriodDtype
with BDay
freq, use a DatetimeIndex
with BDay
freq instead (GH 53446)
Deprecated the top-level pandas.value_counts(), use pd.Series(obj).value_counts() instead (GH 47862)
Deprecated Series.first()
and DataFrame.first()
; create a mask and filter using .loc
instead (GH 45908)
Deprecated Series.interpolate()
and DataFrame.interpolate()
for object-dtype (GH 53631)
Deprecated Series.last()
and DataFrame.last()
; create a mask and filter using .loc
instead (GH 53692)
Deprecated allowing arbitrary fill_value
in SparseDtype
, in a future version the fill_value
will need to be compatible with the dtype.subtype
, either a scalar that can be held by that subtype or NaN
for integer or bool subtypes (GH 23124)
Deprecated allowing bool dtype in DataFrameGroupBy.quantile()
and SeriesGroupBy.quantile()
, consistent with the Series.quantile()
and DataFrame.quantile()
behavior (GH 51424)
Deprecated behavior of testing.assert_series_equal() and testing.assert_frame_equal() considering NA-like values (e.g. NaN vs None) as equivalent (GH 52081)
Deprecated bytes input to read_excel()
. To read a file path, use a string or path-like object (GH 53767)
Deprecated constructing SparseArray
from scalar data, pass a sequence instead (GH 53039)
Deprecated falling back to filling when value
is not specified in DataFrame.replace()
and Series.replace()
with non-dict-like to_replace
(GH 33302)
Deprecated literal json input to read_json()
. Wrap literal json string input in io.StringIO
instead (GH 53409)
Deprecated literal string input to read_xml()
. Wrap literal string/bytes input in io.StringIO
/ io.BytesIO
instead (GH 53767)
Deprecated literal string/bytes input to read_html()
. Wrap literal string/bytes input in io.StringIO
/ io.BytesIO
instead (GH 53767)
Deprecated option mode.use_inf_as_na, convert inf entries to NaN beforehand instead (GH 51684)
Deprecated parameter obj
in DataFrameGroupBy.get_group()
(GH 53545)
Deprecated positional indexing on Series
with Series.__getitem__()
and Series.__setitem__()
, in a future version ser[item]
will always interpret item
as a label, not a position (GH 50617)
Deprecated replacing builtin and NumPy functions in .agg
, .apply
, and .transform
; use the corresponding string alias (e.g. "sum"
for sum
or np.sum
) instead (GH 53425)
Deprecated strings T
, t
, L
and l
denoting units in to_timedelta()
(GH 52536)
Deprecated the "method" and "limit" keywords in ExtensionArray.fillna, implement _pad_or_backfill instead (GH 53621)
Deprecated the method
and limit
keywords in DataFrame.replace()
and Series.replace()
(GH 33302)
Deprecated the method
and limit
keywords on Series.fillna()
, DataFrame.fillna()
, SeriesGroupBy.fillna()
, DataFrameGroupBy.fillna()
, and Resampler.fillna()
, use obj.bfill()
or obj.ffill()
instead (GH 53394)
Deprecated the behavior of Series.__getitem__()
, Series.__setitem__()
, DataFrame.__getitem__()
, DataFrame.__setitem__()
with an integer slice on objects with a floating-dtype index, in a future version this will be treated as positional indexing (GH 49612)
Deprecated the use of non-supported datetime64 and timedelta64 resolutions with pandas.array(). Supported resolutions are "s", "ms", "us" and "ns" (GH 53058)
Deprecated values "pad"
, "ffill"
, "bfill"
, "backfill"
for Series.interpolate()
and DataFrame.interpolate()
, use obj.ffill()
or obj.bfill()
instead (GH 53581)
Deprecated the behavior of Index.argmax()
, Index.argmin()
, Series.argmax()
, Series.argmin()
with either all-NAs and skipna=True
or any-NAs and skipna=False
returning -1; in a future version this will raise ValueError
(GH 33941, GH 33942)
Deprecated allowing non-keyword arguments in DataFrame.to_sql()
except name
and con
(GH 54229)
Deprecated silently ignoring fill_value
when passing both freq
and fill_value
to DataFrame.shift()
, Series.shift()
and DataFrameGroupBy.shift()
; in a future version this will raise ValueError
(GH 53832)
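Several of the replacements suggested in the deprecations above can be shown side by side. This is a sketch with illustrative data and variable names; each line uses the spelling the corresponding entry recommends.

```python
import io

import pandas as pd

ser = pd.Series([1.0, None, 3.0], index=["x", "y", "z"])
df = pd.DataFrame(
    {
        "key": pd.Categorical(["u", "v", "u"], categories=["u", "v", "w"]),
        "val": [1, 2, 3],
    }
)

# is_categorical_dtype(...) -> isinstance check on the dtype
is_cat = isinstance(df["key"].dtype, pd.CategoricalDtype)

# ser[0] used as a position -> explicit positional indexing with .iloc
first = ser.iloc[0]

# fillna(method="ffill") -> ffill()
filled = ser.ffill()

# agg(np.sum) -> string alias; observed is passed explicitly so the result
# does not depend on the changing default
totals = df.groupby("key", observed=True)["val"].agg("sum")

# read_json("<literal json>") -> wrap the literal in io.StringIO
parsed = pd.read_json(io.StringIO('[{"a": 1, "b": 2}]'))

print(is_cat, first, filled.tolist(), totals.tolist(), parsed.shape)
```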
Performance improvements#
Performance improvement in concat()
with homogeneous np.float64
or np.float32
dtypes (GH 52685)
Performance improvement in factorize()
for object columns not containing strings (GH 51921)
Performance improvement in read_orc()
when reading a remote URI file path (GH 51609)
Performance improvement in read_parquet()
and DataFrame.to_parquet()
when reading a remote file with engine="pyarrow"
(GH 51609)
Performance improvement in read_parquet()
on string columns when using use_nullable_dtypes=True
(GH 47345)
Performance improvement in DataFrame.clip()
and Series.clip()
(GH 51472)
Performance improvement in DataFrame.filter()
when items
is given (GH 52941)
Performance improvement in DataFrame.first_valid_index()
and DataFrame.last_valid_index()
for extension array dtypes (GH 51549)
Performance improvement in DataFrame.where()
when cond
is backed by an extension dtype (GH 51574)
Performance improvement in MultiIndex.set_levels()
and MultiIndex.set_codes()
when verify_integrity=True
(GH 51873)
Performance improvement in MultiIndex.sortlevel()
when ascending
is a list (GH 51612)
Performance improvement in Series.combine_first()
(GH 51777)
Performance improvement in fillna()
when array does not contain nulls (GH 51635)
Performance improvement in isna()
when array has zero nulls or is all nulls (GH 51630)
Performance improvement when parsing strings to boolean[pyarrow]
dtype (GH 51730)
Performance improvement when searching an Index
sliced from other indexes (GH 51738)
Period
's default formatter (period_format
) is now significantly (~twice) faster. This improves performance of str(Period)
, repr(Period)
, and Period.strftime(fmt=None)
, as well as .PeriodArray.strftime(fmt=None)
, .PeriodIndex.strftime(fmt=None)
and .PeriodIndex.format(fmt=None)
. to_csv
operations involving PeriodArray
or PeriodIndex
with default date_format
are also significantly accelerated (GH 51459)
Performance improvement when accessing arrays.IntegerArray.dtype
& arrays.FloatingArray.dtype
(GH 52998)
Performance improvement for DataFrameGroupBy
/SeriesGroupBy
aggregations (e.g. DataFrameGroupBy.sum()
) with engine="numba"
(GH 53731)
Performance improvement in DataFrame
reductions with axis=1
and extension dtypes (GH 54341)
Performance improvement in DataFrame
reductions with axis=None
and extension dtypes (GH 54308)
Performance improvement in MultiIndex
and multi-column operations (e.g. DataFrame.sort_values()
, DataFrame.groupby()
, Series.unstack()
) when index/column values are already sorted (GH 53806)
Performance improvement in concat()
when axis=1
and objects have different indexes (GH 52541)
Performance improvement in concat()
when the concatenation axis is a MultiIndex
(GH 53574)
Performance improvement in merge()
for PyArrow backed strings (GH 54443)
Performance improvement in read_csv()
with engine="c"
(GH 52632)
Performance improvement in ArrowExtensionArray.to_numpy()
(GH 52525)
Performance improvement in DataFrameGroupBy.groups()
(GH 53088)
Performance improvement in DataFrame.astype()
when dtype
is an extension dtype (GH 54299)
Performance improvement in DataFrame.iloc()
when the input is a single integer and the DataFrame is backed by extension dtypes (GH 54508)
Performance improvement in DataFrame.isin()
for extension dtypes (GH 53514)
Performance improvement in DataFrame.loc()
when selecting rows and columns (GH 53014)
Performance improvement in DataFrame.transpose()
when transposing a DataFrame with a single PyArrow dtype (GH 54224)
Performance improvement in DataFrame.transpose()
when transposing a DataFrame with a single masked dtype, e.g. Int64
(GH 52836)
Performance improvement in Series.add()
for PyArrow string and binary dtypes (GH 53150)
Performance improvement in Series.corr()
and Series.cov()
for extension dtypes (GH 52502)
Performance improvement in Series.drop_duplicates()
for ArrowDtype
(GH 54667).
Performance improvement in Series.ffill()
, Series.bfill()
, DataFrame.ffill()
, DataFrame.bfill()
with PyArrow dtypes (GH 53950)
Performance improvement in Series.str.get_dummies()
for PyArrow-backed strings (GH 53655)
Performance improvement in Series.str.get()
for PyArrow-backed strings (GH 53152)
Performance improvement in Series.str.split()
with expand=True
for PyArrow-backed strings (GH 53585)
Performance improvement in Series.to_numpy()
when dtype is a NumPy float dtype and na_value
is np.nan
(GH 52430)
Performance improvement in astype()
when converting from a PyArrow timestamp or duration dtype to NumPy (GH 53326)
Performance improvement in various MultiIndex
set and indexing operations (GH 53955)
Performance improvement when doing various reshaping operations on arrays.IntegerArray
& arrays.FloatingArray
by avoiding doing unnecessary validation (GH 53013)
Performance improvement when indexing with PyArrow timestamp and duration dtypes (GH 53368)
Performance improvement when passing an array to RangeIndex.take()
, DataFrame.loc()
, or DataFrame.iloc()
and the DataFrame is using a RangeIndex (GH 53387)
Bug in CategoricalIndex.remove_categories()
where ordered categories would not be maintained (GH 53935).
Bug in Series.astype()
with dtype="category"
for nullable arrays with read-only null value masks (GH 53658)
Bug in Series.map()
, where the value of the na_action
parameter was not used if the series held a Categorical
(GH 22527).
DatetimeIndex.map()
with na_action="ignore"
now works as expected (GH 51644)
DatetimeIndex.slice_indexer()
now raises KeyError
for non-monotonic indexes if either of the slice bounds is not in the index; this behaviour was previously deprecated but inconsistently handled (GH 53983)
Bug in DateOffset
which had inconsistent behavior when multiplying a DateOffset
object by a constant (GH 47953)
Bug in date_range()
when freq
was a DateOffset
with nanoseconds
(GH 46877)
Bug in to_datetime()
converting Series
or DataFrame
containing arrays.ArrowExtensionArray
of PyArrow timestamps to numpy datetimes (GH 52545)
Bug in DatetimeArray.map()
and DatetimeIndex.map()
, where the supplied callable operated array-wise instead of element-wise (GH 51977)
Bug in DataFrame.to_sql()
raising ValueError
for PyArrow-backed date like dtypes (GH 53854)
Bug in Timestamp.date()
, Timestamp.isocalendar()
, Timestamp.timetuple()
, and Timestamp.toordinal()
returning incorrect results for inputs outside the range supported by the Python standard library's datetime module (GH 53668)
Bug in Timestamp.round()
with values close to the implementation bounds returning incorrect results instead of raising OutOfBoundsDatetime
(GH 51494)
Bug in constructing a Series
or DataFrame
from a datetime or timedelta scalar always inferring nanosecond resolution instead of inferring from the input (GH 52212)
Bug in constructing a Timestamp
from a string representing a time without a date inferring an incorrect unit (GH 54097)
Bug in constructing a Timestamp
with ts_input=pd.NA
raising TypeError
(GH 45481)
Bug in parsing datetime strings with weekday but no day e.g. "2023 Sept Thu" incorrectly raising AttributeError
instead of ValueError
(GH 52659)
Bug in the repr for Series
when dtype is a timezone aware datetime with non-nanosecond resolution raising OutOfBoundsDatetime
(GH 54623)
Bug in TimedeltaIndex
division or multiplication leading to .freq
of "0 Days" instead of None
(GH 51575)
Bug in Timedelta
with NumPy timedelta64
objects not properly raising ValueError
(GH 52806)
Bug in to_timedelta()
converting Series
or DataFrame
containing ArrowDtype
of pyarrow.duration
to NumPy timedelta64
(GH 54298)
Bug in Timedelta.__hash__()
, raising an OutOfBoundsTimedelta
on certain large values of second resolution (GH 54037)
Bug in Timedelta.round()
with values close to the implementation bounds returning incorrect results instead of raising OutOfBoundsTimedelta
(GH 51494)
Bug in TimedeltaIndex.map()
with na_action="ignore"
(GH 51644)
Bug in arrays.TimedeltaArray.map()
and TimedeltaIndex.map()
, where the supplied callable operated array-wise instead of element-wise (GH 51977)
Bug in infer_freq()
that raises TypeError
for Series
of timezone-aware timestamps (GH 52456)
Bug in DatetimeTZDtype.base()
that always returns a NumPy dtype with nanosecond resolution (GH 52705)
Bug in RangeIndex
setting step
incorrectly when being the subtrahend with minuend a numeric value (GH 53255)
Bug in Series.corr()
and Series.cov()
raising AttributeError
for masked dtypes (GH 51422)
Bug when calling Series.kurt()
and Series.skew()
on NumPy data of all zero returning a Python type instead of a NumPy type (GH 53482)
Bug in Series.mean()
, DataFrame.mean()
with object-dtype values containing strings that can be converted to numbers (e.g. "2") returning incorrect numeric results; these now raise TypeError
(GH 36703, GH 44008)
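If numeric strings are expected in an object-dtype column, one option is to convert explicitly before reducing. A minimal sketch, assuming every string in the illustrative Series parses as a number:

```python
import pandas as pd

raw = pd.Series(["1", "2", "3"], dtype=object)

# Convert the strings to a numeric dtype first, then reduce.
numeric = pd.to_numeric(raw)
average = numeric.mean()

print(average)  # 2.0
```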
Bug in DataFrame.corrwith()
raising NotImplementedError
for PyArrow-backed dtypes (GH 52314)
Bug in DataFrame.size()
and Series.size()
returning 64-bit integer instead of a Python int (GH 52897)
Bug in DataFrame.dot()
returning object
dtype for ArrowDtype
data (GH 53979)
Bug in Series.any()
, Series.all()
, DataFrame.any()
, and DataFrame.all()
had the default value of bool_only
set to None
instead of False
; this change should have no impact on users (GH 53258)
Bug in Series.median()
and DataFrame.median()
with object-dtype values containing strings that can be converted to numbers (e.g. "2") returning incorrect numeric results; these now raise TypeError
(GH 34671)
Bug in Series.sum()
converting dtype uint64
to int64
(GH 53401)
Bug in DataFrame.style.to_latex()
and DataFrame.style.to_html()
if the DataFrame contains integers with more digits than can be represented by floating point double precision (GH 52272)
Bug in array()
when given a datetime64
or timedelta64
dtype with unit of "s", "us", or "ms" returning NumpyExtensionArray
instead of DatetimeArray
or TimedeltaArray
(GH 52859)
Bug in array()
when given an empty list and no dtype returning NumpyExtensionArray
instead of FloatingArray
(GH 54371)
Bug in ArrowDtype.numpy_dtype()
returning nanosecond units for non-nanosecond pyarrow.timestamp
and pyarrow.duration
types (GH 51800)
Bug in DataFrame.__repr__()
incorrectly raising a TypeError
when the dtype of a column is np.record
(GH 48526)
Bug in DataFrame.info()
raising ValueError
when use_numba
is set (GH 51922)
Bug in DataFrame.insert()
raising TypeError
if loc
is np.int64
(GH 53193)
Bug in HDFStore.select()
losing precision of large integers when stored and retrieved (GH 54186)
Bug in Series.astype()
not supporting object_
(GH 54251)
Bug in Series.str()
that did not raise a TypeError
when iterated (GH 54173)
Bug in repr
for DataFrame
with string-dtype columns (GH 54797)
IntervalIndex.get_indexer()
and IntervalIndex.get_indexer_nonunique()
raising if target
is read-only array (GH 53703)
Bug in IntervalDtype
where the object could be kept alive when deleted (GH 54184)
Bug in interval_range()
where a float step
would produce incorrect intervals from floating point artifacts (GH 54477)
Bug in DataFrame.__setitem__()
losing dtype when setting a DataFrame
into duplicated columns (GH 53143)
Bug in DataFrame.__setitem__()
with a boolean mask and DataFrame.putmask()
with mixed non-numeric dtypes and a value other than NaN
incorrectly raising TypeError
(GH 53291)
Bug in DataFrame.iloc()
when using nan
as the only element (GH 52234)
Bug in Series.loc()
casting Series
to np.ndarray
when assigning Series
at predefined index of object
dtype Series
(GH 48933)
Bug in DataFrame.interpolate()
failing to fill across data when method
is "pad"
, "ffill"
, "bfill"
, or "backfill"
(GH 53898)
Bug in DataFrame.interpolate()
ignoring inplace
when DataFrame
is empty (GH 53199)
Bug in Series.idxmin()
, Series.idxmax()
, DataFrame.idxmin()
, DataFrame.idxmax()
with a DatetimeIndex
index containing NaT
incorrectly returning NaN
instead of NaT
(GH 43587)
Bug in Series.interpolate()
and DataFrame.interpolate()
failing to raise on invalid downcast
keyword, which can be only None
or "infer"
(GH 53103)
Bug in Series.interpolate()
and DataFrame.interpolate()
with complex dtype incorrectly failing to fill NaN
entries (GH 53635)
Bug in MultiIndex.set_levels()
not preserving dtypes for Categorical
(GH 52125)
Bug in displaying a MultiIndex
with a long element (GH 52960)
DataFrame.to_orc()
now raising ValueError
when non-default Index
is given (GH 51828)
DataFrame.to_sql()
now raising ValueError
when the name param is left empty while using SQLAlchemy to connect (GH 52675)
Bug in json_normalize()
failing to parse metadata fields of list type (GH 37782)
Bug in read_csv()
where it would error when parse_dates
was set to a list or dictionary with engine="pyarrow"
(GH 47961)
Bug in read_csv()
with engine="pyarrow"
raising when specifying a dtype
with index_col
(GH 53229)
Bug in read_hdf()
not properly closing store after an IndexError
is raised (GH 52781)
Bug in read_html()
where style elements were read into DataFrames (GH 52197)
Bug in read_html()
where tail texts were removed together with elements containing display:none
style (GH 51629)
Bug in read_sql_table()
raising an exception when reading a view (GH 52969)
Bug in read_sql()
when reading multiple timezone aware columns with the same column name (GH 44421)
Bug in read_xml()
stripping whitespace in string data (GH 53811)
Bug in DataFrame.to_html()
where col_space
was incorrectly applied in case of multi index columns (GH 53885)
Bug in DataFrame.to_html()
where conversion for an empty DataFrame
with complex dtype raised a ValueError
(GH 54167)
Bug in DataFrame.to_json()
where DatetimeArray
/DatetimeIndex
with non nanosecond precision could not be serialized correctly (GH 53686)
Bug when writing and reading empty Stata dta files where dtype information was lost (GH 46240)
Bug where bz2
was treated as a hard requirement (GH 53857)
Bug in PeriodDtype
constructor failing to raise TypeError
when no argument is passed or when None
is passed (GH 27388)
Bug in PeriodDtype
constructor incorrectly returning the same normalize
for different DateOffset
freq
inputs (GH 24121)
Bug in PeriodDtype
constructor raising ValueError
instead of TypeError
when an invalid type is passed (GH 51790)
Bug in PeriodDtype
where the object could be kept alive when deleted (GH 54184)
Bug in read_csv()
not processing empty strings as a null value, with engine="pyarrow"
(GH 52087)
Bug in read_csv()
returning object
dtype columns instead of float64
dtype columns for columns that are all null with engine="pyarrow"
(GH 52087)
Bug in Period.now()
not accepting the freq
parameter as a keyword argument (GH 53369)
Bug in PeriodIndex.map()
with na_action="ignore"
(GH 51644)
Bug in arrays.PeriodArray.map()
and PeriodIndex.map()
, where the supplied callable operated array-wise instead of element-wise (GH 51977)
Bug in incorrectly allowing construction of Period
or PeriodDtype
with CustomBusinessDay
freq; use BusinessDay
instead (GH 52534)
Bug in Series.plot()
when invoked with color=None
(GH 51953)
Fixed UserWarning in DataFrame.plot.scatter()
when invoked with c="b"
(GH 53908)
Bug in DataFrameGroupBy.idxmin()
, SeriesGroupBy.idxmin()
, DataFrameGroupBy.idxmax()
, SeriesGroupBy.idxmax()
returning the wrong dtype when used on an empty DataFrameGroupBy or SeriesGroupBy (GH 51423)
Bug in DataFrame.groupby.rank()
on nullable datatypes when passing na_option="bottom"
or na_option="top"
(GH 54206)
Bug in DataFrame.resample()
and Series.resample()
incorrectly allowing non-fixed freq
when resampling on a TimedeltaIndex
(GH 51896)
Bug in DataFrame.resample()
and Series.resample()
losing time zone when resampling empty data (GH 53664)
Bug in DataFrame.resample()
and Series.resample()
where origin
had no effect when values are outside of the axis (GH 53662)
Bug in weighted rolling aggregations when specifying min_periods=0
(GH 51449)
Bug in DataFrame.groupby()
and Series.groupby()
where, when the index of the grouped Series
or DataFrame
was a DatetimeIndex
, TimedeltaIndex
or PeriodIndex
, and the groupby
method was given a function as its first argument, the function operated on the whole index rather than each element of the index (GH 51979)
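To illustrate the element-wise behavior, here is a small sketch with illustrative data in which the grouping function receives one Timestamp at a time and returns its month:

```python
import pandas as pd

idx = pd.to_datetime(["2023-01-15", "2023-01-20", "2023-02-01"])
s = pd.Series([1, 2, 3], index=idx)

# The function is applied to each index element (a Timestamp).
monthly = s.groupby(lambda ts: ts.month).sum()

print(monthly.to_dict())  # {1: 3, 2: 3}
```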
Bug in DataFrameGroupBy.agg()
with lists not respecting as_index=False
(GH 52849)
Bug in DataFrameGroupBy.apply()
causing an error to be raised when the input DataFrame
was subset as a DataFrame
after groupby ([['a']]
and not ['a']
) and the given callable returned Series
that were not all indexed the same (GH 52444)
Bug in DataFrameGroupBy.apply()
raising a TypeError
when selecting multiple columns and providing a function that returns np.ndarray
results (GH 18930)
Bug in DataFrameGroupBy.groups()
and SeriesGroupBy.groups()
with a datetime key in conjunction with another key producing an incorrect number of group keys (GH 51158)
Bug in DataFrameGroupBy.quantile()
and SeriesGroupBy.quantile()
may implicitly sort the result index with sort=False
(GH 53009)
Bug in SeriesGroupBy.size()
where the dtype would be np.int64
for data with ArrowDtype
or masked dtypes (e.g. Int64
) (GH 53831)
Bug in DataFrame.groupby()
with column selection on the resulting groupby object not returning names as tuples when grouping by a list consisting of a single element (GH 53500)
Bug in DataFrameGroupBy.var()
and SeriesGroupBy.var()
failing to raise TypeError
when called with datetime64, timedelta64 or PeriodDtype
values (GH 52128, GH 53045)
Bug in DataFrameGroupBy.resample()
with kind="period"
raising AttributeError
(GH 24103)
Bug in Resampler.ohlc()
with empty object returning a Series
instead of empty DataFrame
(GH 42902)
Bug in SeriesGroupBy.count()
and DataFrameGroupBy.count()
where the dtype would be np.int64
for data with ArrowDtype
or masked dtypes (e.g. Int64
) (GH 53831)
Bug in SeriesGroupBy.nth()
and DataFrameGroupBy.nth()
after performing column selection when using dropna="any"
or dropna="all"
would not subset columns (GH 53518)
Bug in SeriesGroupBy.nth()
and DataFrameGroupBy.nth()
raising after performing column selection when using dropna="any"
or dropna="all"
resulted in rows being dropped (GH 53518)
Bug in SeriesGroupBy.sum()
and DataFrameGroupBy.sum()
summing np.inf + np.inf
and (-np.inf) + (-np.inf)
to np.nan
instead of np.inf
and -np.inf
respectively (GH 53606)
Bug in Series.groupby()
raising an error when grouped Series
has a DatetimeIndex
index and a Series
with a name that is a month is given to the by
argument (GH 48509)
Bug in concat()
coercing to object
dtype when one column has pa.null()
dtype (GH 53702)
Bug in crosstab()
when dropna=False
would not keep np.nan
in the result (GH 10772)
Bug in melt()
where the variable
column would lose extension dtypes (GH 54297)
Bug in merge_asof()
raising KeyError
for extension dtypes (GH 52904)
Bug in merge_asof()
raising ValueError
for data backed by read-only ndarrays (GH 53513)
Bug in merge_asof()
with left_index=True
or right_index=True
with mismatched index dtypes giving incorrect results in some cases instead of raising MergeError
(GH 53870)
Bug in merge()
when merging on integer ExtensionDtype
and float NumPy dtype raising TypeError
(GH 46178)
Bug in DataFrame.agg()
and Series.agg()
on non-unique columns would return an incorrect type when a dict-like argument is passed in (GH 51099)
Bug in DataFrame.combine_first()
ignoring other's columns if other
is empty (GH 53792)
Bug in DataFrame.idxmin()
and DataFrame.idxmax()
, where the axis dtype would be lost for empty frames (GH 53265)
Bug in DataFrame.merge()
not merging correctly when having MultiIndex
with single level (GH 52331)
Bug in DataFrame.stack()
losing extension dtypes when columns is a MultiIndex
and frame contains mixed dtypes (GH 45740)
Bug in DataFrame.stack()
sorting columns lexicographically (GH 53786)
Bug in DataFrame.transpose()
inferring dtype for object column (GH 51546)
Bug in Series.combine_first()
converting int64
dtype to float64
and losing precision on very large integers (GH 51764)
Bug when joining empty DataFrame
objects, where the joined index would be a RangeIndex
instead of the joined index type (GH 52777)
Bug in SparseDtype
constructor failing to raise TypeError
when given an incompatible dtype
for its subtype, which must be a NumPy dtype (GH 53160)
Bug in arrays.SparseArray.map()
allowed the fill value to be included in the sparse values (GH 52095)
Bug in ArrowStringArray
constructor raises ValueError
with dictionary types of strings (GH 54074)
Bug in DataFrame
constructor not copying Series
with extension dtype when given in dict (GH 53744)
Bug in ArrowExtensionArray
converting pandas non-nanosecond temporal objects from non-zero values to zero values (GH 53171)
Bug in Series.quantile()
for PyArrow temporal types raising ArrowInvalid
(GH 52678)
Bug in Series.rank()
returning wrong order for small values with Float64
dtype (GH 52471)
Bug in Series.unique()
for boolean ArrowDtype
with NA
values (GH 54667)
Bug in __iter__()
and __getitem__()
returning python datetime and timedelta objects for non-nano dtypes (GH 53326)
Bug in factorize()
returning incorrect uniques for a pyarrow.dictionary
type pyarrow.chunked_array
with more than one chunk (GH 54844)
Bug when passing an ExtensionArray
subclass to dtype
keywords. This will now raise a UserWarning
to encourage passing an instance instead (GH 31356, GH 54592)
Bug where the DataFrame
repr would not work when a column had an ArrowDtype
with a pyarrow.ExtensionDtype
(GH 54063)
Bug where the __from_arrow__
method of masked ExtensionDtypes (e.g. Float64Dtype
, BooleanDtype
) would not accept PyArrow arrays of type pyarrow.null()
(GH 52223)
Fixed metadata propagation in DataFrame.max()
, DataFrame.min()
, DataFrame.prod()
, DataFrame.mean()
, Series.mode()
, DataFrame.median()
, DataFrame.sem()
, DataFrame.skew()
, DataFrame.kurt()
(GH 28283)
Fixed metadata propagation in DataFrame.squeeze()
, and DataFrame.describe()
(GH 28283)
Fixed metadata propagation in DataFrame.std()
(GH 28283)
Bug in FloatingArray.__contains__
with NaN
item incorrectly returning False
when NaN
values are present (GH 52840)
Bug in DataFrame
and Series
raising for data of complex dtype when NaN
values are present (GH 53627)
Bug in DatetimeIndex
where repr
of an index passed with time did not print the time when the time is midnight and freq is not day-based (GH 53470)
Bug in testing.assert_frame_equal()
and testing.assert_series_equal()
not raising AssertionError for two unequal sets; they now raise (GH 51727)
Bug in testing.assert_frame_equal()
checking category dtypes even when asked not to check index type (GH 52126)
Bug in api.interchange.from_dataframe()
not respecting the allow_copy
argument (GH 54322)
Bug in api.interchange.from_dataframe()
raising when interchanging from non-pandas tz-aware data containing null values (GH 54287)
Bug in api.interchange.from_dataframe()
when converting an empty DataFrame object (GH 53155)
Bug in from_dummies()
where the resulting Index
did not match the original Index
(GH 54300)
Bug in from_dummies()
where the resulting data would always be object
dtype instead of the dtype of the columns (GH 54300)
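A short round-trip sketch, assuming a single illustrative "color" column and the default "_" separator produced by get_dummies, shows the kind of call these two fixes affect:

```python
import pandas as pd

df = pd.DataFrame({"color": ["red", "green", "red"]}, index=["x", "y", "z"])

# Encode to indicator columns: color_green, color_red.
dummies = pd.get_dummies(df)

# Decode back; sep tells from_dummies how to split prefix and category.
restored = pd.from_dummies(dummies, sep="_")

print(restored["color"].tolist())  # ['red', 'green', 'red']
```

With the fixes described above, the restored frame should also carry the original index labels rather than a fresh default index.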
Bug in DataFrameGroupBy.first()
, DataFrameGroupBy.last()
, SeriesGroupBy.first()
, and SeriesGroupBy.last()
where an empty group would return np.nan
instead of the corresponding ExtensionArray
NA value (GH 39098)
Bug in DataFrame.pivot_table()
with casting the mean of ints back to an int (GH 16676)
Bug in DataFrame.reindex()
with a fill_value
that should be inferred with an ExtensionDtype
incorrectly inferring object
dtype (GH 52586)
Bug in DataFrame.shift()
with axis=1
on a DataFrame
with a single ExtensionDtype
column giving incorrect results (GH 53832)
Bug in Index.sort_values()
when a key
is passed (GH 52764)
Bug in Series.align()
, DataFrame.align()
, Series.reindex()
, DataFrame.reindex()
, Series.interpolate()
, DataFrame.interpolate()
, incorrectly failing to raise with method="asfreq" (GH 53620)
Bug in Series.argsort()
failing to raise when an invalid axis
is passed (GH 54257)
Bug in Series.map()
when giving a callable to an empty series, the returned series had object
dtype. It now keeps the original dtype (GH 52384)
Bug in Series.memory_usage()
when deep=True
raising an error for a Series of objects, and returning an incorrect value because GC corrections were not taken into account (GH 51858)
Bug in period_range()
where the default behavior when freq was not passed as an argument was incorrect (GH 53687)
Fixed incorrect __name__
attribute of pandas._libs.json
(GH 52898)
A total of 266 people contributed patches to this release. People with a "+" by their names contributed a patch for the first time.
AG +
Aarni Koskela
Adrian D'Alessandro +
Adrien RUAULT +
Ahmad +
Aidos Kanapyanov +
Alex Malins
Alexander Seiler +
Ali Asgar +
Allison Kwan
Amanda Bizzinotto +
Andres Algaba +
Angela Seo +
Anirudh Hegde +
Antony Evmorfopoulos +
Anushka Bishnoi
ArnaudChanoine +
Artem Vorobyev +
Arya Sarkar +
Ashwin Srinath
Austin Au-Yeung +
Austin Burnett +
Bear +
Ben Mangold +
Bernardo Gameiro +
Boyd Kane +
Brayan Alexander Muñoz B +
Brock
Chetan0402 +
Chris Carini
ChristofKaufmann
Clark-W +
Conrad Mcgee Stocks
Corrie Bartelheimer +
Coulton Theuer +
D067751 +
Daniel Isaac
Daniele Nicolodi +
David Samuel +
David Seifert +
Dea Leon +
Dea María Léon
Deepyaman Datta
Denis Sapozhnikov +
Dharani Akurathi +
DimiGrammatikakis +
Dirk Ulbricht +
Dmitry Shemetov +
Dominik Berger
Efkan S. Goktepe +
Ege Özgüroğlu
Eli Schwartz
Erdi +
Fabrizio Primerano +
Facundo Batista +
Fangchen Li
Felipe Maion +
Francis +
Future Programmer +
Gabriel Kabbe +
Gaétan Ramet +
Gianluca Ficarelli
Godwill Agbehonou +
Guillaume Lemaitre
Guo Ci
Gustavo Vargas +
Hamidreza Sanaee +
HappyHorse +
Harald Husum +
Hugo van Kemenade
Ido Ronen +
Irv Lustig
JHM Darbyshire
JHM Darbyshire (iMac)
JJ +
Jarrod Millman
Jay +
Jeff Reback
Jessica Greene +
Jiawei Zhang +
Jinli Xiao +
Joanna Ge +
Jona Sassenhagen +
Jonas Haag
Joris Van den Bossche
Joshua Shew +
Julian Badillo
Julian Ortiz +
Julien Palard +
Justin Tyson +
Justus Magin
Kabiir Krishna +
Kang Su Min
Ketu Patel +
Kevin +
Kevin Anderson
Kevin Jan Anker
Kevin Klein +
Kevin Sheppard
Kostya Farber
LM +
Lars Lien Ankile +
Lawrence Mitchell
Liwei Cai +
Loic Diridollou
Luciana Solorzano +
Luke Manley
Lumberbot (aka Jack)
Marat Kopytjuk +
Marc Garcia
Marco Edward Gorelli
MarcoGorelli
Maria Telenczuk +
MarvinGravert +
Mateusz Sokół +
Matt Richards
Matthew Barber +
Matthew Roeschke
Matus Valo +
Mia Reimer +
Michael Terry +
Michael Tiemann +
Milad Maani Jou +
Miles Cranmer +
MirijaH +
Miyuu +
Natalia Mokeeva
Nathan Goldbaum +
Nicklaus Roach +
Nicolas Camenisch +
Nikolay Boev +
Nirav
Nishu Choudhary
Noa Tamir
Noy Hanan +
Numan +
Numan Ijaz +
Omar Elbaz +
Pandas Development Team
Parfait Gasana
Parthi
Patrick Hoefler
Patrick Schleiter +
Pawel Kranzberg +
Philip
Philip Meier +
Pranav Saibhushan Ravuri
PrathumP +
Rahul Siloniya +
Rajasvi Vinayak +
Rajat Subhra Mukherjee +
Ralf Gommers
RaphSku
Rebecca Chen +
Renato Cotrim Maciel +
Reza (Milad) Maanijou +
Richard Shadrach
Rithik Reddy +
Robert Luce +
Ronalido +
Rylie Wei +
SOUMYADIP MAL +
Sanjith Chockan +
Sayed Qaiser Ali +
Scott Harp +
Se +
Shashwat Agrawal
Simar Bassi +
Simon Brugman +
Simon Hawkins
Simon Høxbro Hansen
Snorf Yang +
Sortofamudkip +
Stefan Krawczyk
Stefanie Molin
Stefanie Senger
Stelios Petrakis +
Stijn Van Hoey
Sven
Sylvain MARIE
Sylvain Marié
Terji Petersen
Thierry Moisan
Thomas
Thomas A Caswell
Thomas Grainger
Thomas Li
Thomas Vranken +
Tianye Song +
Tim Hoffmann
Tim Loderhose +
Tim Swast
Timon Jurschitsch +
Tolker-KU +
Tomas Pavlik +
Toroi +
Torsten Wörtwein
Travis Gibbs +
Umberto Fasci +
Valerii +
VanMyHu +
Victor Momodu +
Vijay Vaidyanathan +
VomV +
William Andrea
William Ayd
Wolf Behrenhoff +
Xiao Yuan
Yao Xiao
Yasin Tatar
Yaxin Li +
Yi Wei +
Yulia +
Yusharth Singh +
Zach Breger +
Zhengbo Wang
abokey1 +
ahmad2901 +
assafam +
auderson
august-tengland +
bunardsheng +
cmmck +
cnguyen-03 +
coco +
dependabot[bot]
giplessis +
github-actions[bot]
gmaiwald +
gmollard +
jbrockmendel
kathleenhang
kevx82 +
lia2710 +
liang3zy22 +
ltartaro +
lusolorz +
m-ganko +
mKlepsch +
mattkeanny +
mrastgoo +
nabdoni +
omar-elbaz +
paulreece +
penelopeysm +
potap75 +
pre-commit-ci[bot] +
raanasn +
raj-thapa +
ramvikrams +
rebecca-palmer
reddyrg1 +
rmhowe425 +
segatrade +
shteken +
sweisss +
taytzehao
tntmatthews +
tpaxman +
tzehaoo +
v-mcoutinho +
wcgonzal +
yonashub
yusharth +
Ádám Lippai
Štěpán Műller +