Showing content from https://github.com/pandas-dev/pandas/issues/42613 below:

PyArrow StringDtype / StringArray fallback policy · Issue #42613 · pandas-dev/pandas · GitHub

In #35169 / #42597 we discussed the desired behavior of PyArrow-backed StringArray when a certain method is not implemented in pyarrow.compute.

String methods such as str_normalize aren't currently implemented in pyarrow.compute. For these, I believe we silently cast from string[pyarrow] to an object-dtype ndarray of Python str objects at

    mask = isna(self)
    arr = np.asarray(self)

That's going to be slow, and it more than doubles the memory usage of the array.

These kinds of performance cliffs are difficult for users to debug. I don't think we should do that conversion on behalf of the user. If something isn't implemented yet, then I think we should raise with a message saying they should convert to string[python] dtype first.

If we don't want to raise, we could emit a PerformanceWarning, similar to what we do for SparseArray when converting to dense.
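To make the two options concrete, here is a minimal sketch of a dispatcher that never falls back silently. It is not pandas code: `dispatch_str_method`, `NATIVE_METHODS`, and the `policy` argument are hypothetical names invented for illustration, and the local `PerformanceWarning` class is a stand-in for `pandas.errors.PerformanceWarning` so the example needs only the standard library.

```python
import warnings


class PerformanceWarning(Warning):
    """Stand-in for pandas.errors.PerformanceWarning (keeps the sketch stdlib-only)."""


# Hypothetical: names of string methods assumed to have a native pyarrow.compute kernel.
NATIVE_METHODS = {"upper", "lower"}


def dispatch_str_method(name, policy="raise"):
    """Report how a string method would run under the given fallback policy.

    Returns "native" or "object-fallback". The object fallback is only
    reachable with policy="warn", so it is never taken silently.
    """
    if name in NATIVE_METHODS:
        return "native"
    if policy == "raise":
        # Option 1: refuse, and tell the user how to opt in to the slow path.
        raise NotImplementedError(
            f"str.{name} is not implemented for string[pyarrow]; "
            "convert to string[python] dtype first."
        )
    if policy == "warn":
        # Option 2: fall back, but make the performance cliff visible.
        warnings.warn(
            f"str.{name} falls back to an object-dtype ndarray; "
            "this is slow and copies the data.",
            PerformanceWarning,
            stacklevel=2,
        )
        return "object-fallback"
    raise ValueError(f"unknown policy: {policy!r}")
```

With `policy="raise"` the user has to make the conversion explicit in their own code; with `policy="warn"` existing code keeps working, and the warning can be escalated to an error through the standard `warnings` filters.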

