Showing content from https://github.com/pandas-dev/pandas/issues/42597 below:

Support min/max on ArrowStringArray · Issue #42597 · pandas-dev/pandas · GitHub

Motivation

To perform large shuffles (set_index, join on a non-index column, ...) on a column, Dask needs to be able to compute quantiles for that column.

Computing the column's min/max values is a useful step toward that.

What actually breaks

When I try to do this on columns of type string[pyarrow], I get the following exception:

import pandas as pd
s = pd.Series(["a", "b", "c"]).astype("string[pyarrow]")
s.min()
~/miniconda/lib/python3.8/site-packages/pandas/core/generic.py in min(self, axis, skipna, level, numeric_only, **kwargs)
  10825         )
  10826         def min(self, axis=None, skipna=None, level=None, numeric_only=None, **kwargs):
> 10827             return NDFrame.min(self, axis, skipna, level, numeric_only, **kwargs)
  10828 
  10829         setattr(cls, "min", min)

~/miniconda/lib/python3.8/site-packages/pandas/core/generic.py in min(self, axis, skipna, level, numeric_only, **kwargs)
  10348 
  10349     def min(self, axis=None, skipna=None, level=None, numeric_only=None, **kwargs):
> 10350         return self._stat_function(
  10351             "min", nanops.nanmin, axis, skipna, level, numeric_only, **kwargs
  10352         )

~/miniconda/lib/python3.8/site-packages/pandas/core/generic.py in _stat_function(self, name, func, axis, skipna, level, numeric_only, **kwargs)
  10343                 name, axis=axis, level=level, skipna=skipna, numeric_only=numeric_only
  10344             )
> 10345         return self._reduce(
  10346             func, name=name, axis=axis, skipna=skipna, numeric_only=numeric_only
  10347         )

~/miniconda/lib/python3.8/site-packages/pandas/core/series.py in _reduce(self, op, name, axis, skipna, numeric_only, filter_type, **kwds)
   4380         if isinstance(delegate, ExtensionArray):
   4381             # dispatch to ExtensionArray interface
-> 4382             return delegate._reduce(name, skipna=skipna, **kwds)
   4383 
   4384         else:

~/miniconda/lib/python3.8/site-packages/pandas/core/arrays/string_arrow.py in _reduce(self, name, skipna, **kwargs)
    377     def _reduce(self, name: str, skipna: bool = True, **kwargs):
    378         if name in ["min", "max"]:
--> 379             return getattr(self, name)(skipna=skipna)
    380 
    381         raise TypeError(f"Cannot perform reduction '{name}' with string dtype")

AttributeError: 'ArrowStringArray' object has no attribute 'min'

Solution

I am hopeful that Arrow already has a min/max implementation and that it just hasn't been hooked up to ArrowStringArray yet.

