Showing content from https://github.com/pandas-dev/pandas/issues/42600 below:

Serialize view of ArrowStringArray · Issue #42600 · pandas-dev/pandas · GitHub

Currently, pandas serializes a view of an ArrowStringArray by serializing the entire underlying array rather than only the subset the view covers. Here is an example:

In [1]: import pandas as pd

In [2]: s = pd.Series([c * 1000 for c in "abcdefghijklmnopqrstuvwxyz"])

In [3]: s
Out[3]: 
0     aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa...
1     bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb...
2     cccccccccccccccccccccccccccccccccccccccccccccc...
3     dddddddddddddddddddddddddddddddddddddddddddddd...
4     eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee...
5     ffffffffffffffffffffffffffffffffffffffffffffff...
6     gggggggggggggggggggggggggggggggggggggggggggggg...
7     hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh...
8     iiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiii...
9     jjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjj...
10    kkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkk...
11    llllllllllllllllllllllllllllllllllllllllllllll...
12    mmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmm...
13    nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn...
14    oooooooooooooooooooooooooooooooooooooooooooooo...
15    pppppppppppppppppppppppppppppppppppppppppppppp...
16    qqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqq...
17    rrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrr...
18    ssssssssssssssssssssssssssssssssssssssssssssss...
19    tttttttttttttttttttttttttttttttttttttttttttttt...
20    uuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuu...
21    vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv...
22    wwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwww...
23    xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx...
24    yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy...
25    zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz...
dtype: object

In [4]: import pickle

In [5]: len(pickle.dumps(s))
Out[5]: 26758

In [6]: len(pickle.dumps(s.astype("string[pyarrow]")))
Out[6]: 26891

In [7]: len(pickle.dumps(s.head(5)))
Out[7]: 5632

In [8]: len(pickle.dumps(s.astype("string[pyarrow]").head(5)))
Out[8]: 26891

This negatively affects dask dataframe operations, which cut pandas dataframes into small pieces, move them around to different computers, and then piece them back together again. As the last result above shows, each small piece currently serializes at the size of the whole array.
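To illustrate the general mechanism, here is a minimal toy sketch using only the standard library. It is not pandas or pyarrow code: the tuple layout `(buffer, offset, length)` and the helpers `head`, `compact`, and `to_bytes` are hypothetical names invented for this example. It assumes that a view is a zero-copy reference into a shared buffer, so pickling the view drags the whole buffer along unless the visible range is copied out first.

```python
import pickle

# Toy model (an assumption for illustration, not the pandas/pyarrow internals):
# a "view" is a tuple of (shared buffer, offset, length).


def head(view, n):
    """Return a zero-copy view of the first n bytes; the full buffer is still referenced."""
    buffer, offset, length = view
    return (buffer, offset, min(n, length))


def compact(view):
    """Copy only the visible bytes into a fresh buffer and rebase the offset to 0."""
    buffer, offset, length = view
    data = buffer[offset:offset + length]
    return (data, 0, len(data))


def to_bytes(view):
    """Materialize the bytes that the view actually covers."""
    buffer, offset, length = view
    return buffer[offset:offset + length]


# Same shape of data as the example above: 26 strings of 1000 characters each.
full_buffer = "".join(c * 1000 for c in "abcdefghijklmnopqrstuvwxyz").encode("ascii")
whole = (full_buffer, 0, len(full_buffer))
first_five = head(whole, 5000)  # covers only the "a".."e" strings

# Pickling the view as-is serializes the entire shared buffer.
naive_size = len(pickle.dumps(first_five))

# Compacting first serializes only the bytes the view covers.
compact_size = len(pickle.dumps(compact(first_five)))

# The compacted view round-trips to the same visible data.
restored = pickle.loads(pickle.dumps(compact(first_five)))

print(naive_size, compact_size)
```

In this toy model the naive pickle is roughly the size of the full buffer, while the compacted one is roughly the size of the slice. Whether the right place for an equivalent copy-before-serialize step is in pandas, pyarrow, or dask is a separate question that this sketch does not answer.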

