Showing content from https://github.com/pandas-dev/pandas/issues/40652 below:

Pandas StructDType · Issue #40652 · pandas-dev/pandas · GitHub

I searched for a while for existing discussions of nested structures in Pandas, but I couldn't find anything relevant.

Is your feature request related to a problem?

Currently, there is no way to work with arbitrary nested data types in Pandas.
In Spark and PyArrow, one can have StructTypes. In NumPy, we have compound (structured) dtypes.
However, when we try something like this:

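import numpy as np
import pandas as pd
df = pd.DataFrame({"pos": [1, 2, 3], "val": ["a", "b", "c"]})
# View the record array as a plain ndarray with a compound dtype
struct = df.to_records(index=False).view(type=np.ndarray, dtype=list(df.dtypes.items()))
pd.Series(struct)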

Then we only get an error:

ValueError: Cannot construct a Series from an ndarray with compound dtype.  Use DataFrame instead.
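
For comparison, here is a minimal sketch of what NumPy itself allows with a compound dtype, and of the DataFrame fallback that the error message suggests. The field names and dtypes are chosen for illustration only, and whether the Series constructor raises may depend on the pandas version, so the error is caught rather than assumed.

```python
import numpy as np
import pandas as pd

# A NumPy structured (compound) array with two named fields.
struct = np.array(
    [(1, "a"), (2, "b"), (3, "c")],
    dtype=[("pos", "i8"), ("val", "U1")],
)

# Fields can be accessed by name, much like the members of a struct.
positions = struct["pos"]

# Building a Series from the compound array is expected to fail,
# so capture the message instead of letting it propagate.
message = None
try:
    pd.Series(struct)
except ValueError as exc:
    message = str(exc)

# The suggested fallback: each field becomes its own DataFrame column.
frame = pd.DataFrame(struct)
print(message)
print(frame)
```

The fallback flattens the struct into separate columns, which is exactly the nesting that a struct dtype would preserve inside a single column.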

This would be useful for a number of use cases:

Describe the solution you'd like

My wish would be to have a generic Pandas StructDType that:

A perfect example is the IntervalDType/IntervalArray that is already implemented in Pandas:

class IntervalArray(IntervalMixin, ExtensionArray):

In my opinion, its implementation is a special case of a Struct dtype.

It also supports conversion to and from PyArrow (see 2198f51).

Therefore, by generalizing the IntervalDType to use any number of subtypes, we would have the StructDType implementation ready.
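
To illustrate the analogy, the sketch below shows that an IntervalArray already behaves like a struct with two fields, `left` and `right`, that share one subtype. It uses only the public pandas API. The `as_fields` frame is a hypothetical flattening for illustration, not part of any proposed StructDType.

```python
import pandas as pd

# An IntervalArray stores two parallel sequences of bounds.
intervals = pd.arrays.IntervalArray.from_arrays([0, 1, 2], [1, 2, 3])

# The two "fields" of the struct-like value.
left = intervals.left
right = intervals.right

# Both fields share one subtype, carried by the IntervalDtype.
subtype = intervals.dtype.subtype

# Hypothetical flattening: one column per field.
as_fields = pd.DataFrame({"left": left, "right": right})
print(subtype)
print(as_fields)
```

A general struct dtype would differ mainly in allowing any number of named fields, each with its own subtype.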

API breaking implications

to_csv(), etc. could have difficulties with storing nested data.
That may be a follow-up problem to solve.

Describe alternatives you've considered

One can try to construct the Series as a list of tuples.
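
A minimal sketch of that workaround, assuming the same example values as above: the tuples are stored as generic Python objects, and individual fields have to be extracted element by element.

```python
import pandas as pd

# Each element is a plain Python tuple standing in for a struct.
records = [(1, "a"), (2, "b"), (3, "c")]
series = pd.Series(records)

# pandas stores the tuples with the generic object dtype.
dtype_name = str(series.dtype)

# Extracting one "field" requires mapping over every element.
positions = series.map(lambda item: item[0])
print(dtype_name)
print(positions.tolist())
```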
However, this has two drawbacks:

