Currently, Pandas serializes a view (slice) of an ArrowStringArray by serializing the entire underlying array rather than just the visible subset. Here is an example:
```python
In [1]: import pandas as pd

In [2]: s = pd.Series([c * 1000 for c in "abcdefghijklmnopqrstuvwxyz"])

In [3]: s
Out[3]:
0     aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa...
1     bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb...
2     cccccccccccccccccccccccccccccccccccccccccccccc...
3     dddddddddddddddddddddddddddddddddddddddddddddd...
4     eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee...
5     ffffffffffffffffffffffffffffffffffffffffffffff...
6     gggggggggggggggggggggggggggggggggggggggggggggg...
7     hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh...
8     iiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiii...
9     jjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjj...
10    kkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkk...
11    llllllllllllllllllllllllllllllllllllllllllllll...
12    mmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmm...
13    nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn...
14    oooooooooooooooooooooooooooooooooooooooooooooo...
15    pppppppppppppppppppppppppppppppppppppppppppppp...
16    qqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqq...
17    rrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrr...
18    ssssssssssssssssssssssssssssssssssssssssssssss...
19    tttttttttttttttttttttttttttttttttttttttttttttt...
20    uuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuu...
21    vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv...
22    wwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwww...
23    xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx...
24    yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy...
25    zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz...
dtype: object

In [4]: import pickle

In [5]: len(pickle.dumps(s))
Out[5]: 26758

In [6]: len(pickle.dumps(s.astype("string[pyarrow]")))
Out[6]: 26891

In [7]: len(pickle.dumps(s.head(5)))
Out[7]: 5632

In [8]: len(pickle.dumps(s.astype("string[pyarrow]").head(5)))
Out[8]: 26891
```
This hurts dask dataframe operations that cut pandas dataframes into small pieces, move them to different machines, and then piece them back together again: as `Out[8]` shows, each small piece of an Arrow-backed string column pickles to the size of the whole original column.
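To make the mechanism concrete, here is a toy model in plain Python; it is not pandas' or pyarrow's code. Arrow stores a string array as one contiguous data buffer plus an offsets buffer, and slicing only adjusts a start position and length, so a view such as `.head(5)` still references the full data buffer. The dict layout and the helper names `build`, `slice_view`, `dumps_naive`, and `dumps_compact` are hypothetical. `dumps_naive` mimics shipping every buffer a view references (which is what `Out[8]` suggests happens today), while `dumps_compact` copies only the visible strings before pickling.

```python
import pickle


def build(strings):
    """Pack strings into one UTF-8 data buffer plus an offsets list (Arrow-like layout)."""
    encoded = [s.encode("utf-8") for s in strings]
    offsets = [0]
    for chunk in encoded:
        offsets.append(offsets[-1] + len(chunk))
    return {"data": b"".join(encoded), "offsets": offsets, "start": 0, "length": len(encoded)}


def slice_view(arr, offset, n):
    """Zero-copy view of n strings starting at `offset`; buffers are shared, not copied."""
    length = max(0, min(n, arr["length"] - offset))
    return {**arr, "start": arr["start"] + offset, "length": length}


def values(arr):
    """Decode only the strings visible through this view."""
    data, offsets, start = arr["data"], arr["offsets"], arr["start"]
    return [
        data[offsets[i]:offsets[i + 1]].decode("utf-8")
        for i in range(start, start + arr["length"])
    ]


def dumps_naive(arr):
    # Pickles every buffer the view references, including bytes it cannot see.
    return pickle.dumps(arr)


def dumps_compact(arr):
    # Copies just the visible strings into fresh buffers, then pickles those.
    return pickle.dumps(build(values(arr)))


full = build([c * 1000 for c in "abcdefghijklmnopqrstuvwxyz"])
head5 = slice_view(full, 0, 5)
print(len(dumps_naive(full)), len(dumps_naive(head5)), len(dumps_compact(head5)))

# Dask-style partitioning: cut the column into pieces of 5 and ship each one.
pieces = [slice_view(full, i, 5) for i in range(0, full["length"], 5)]
print(sum(len(dumps_naive(p)) for p in pieces))    # every piece carries all 26 strings
print(sum(len(dumps_compact(p)) for p in pieces))  # roughly the size of the column once
```

In this model the naive total grows with the number of pieces times the full column size, while the compacting variant pays one copy of each piece's own data, which matches the trade-off the issue describes for dask.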