Showing content from https://github.com/pandas-dev/pandas/issues/43689 below:

`DataFrame().to_parquet()` does not write Parquet compliant data for nested arrays · Issue #43689 · pandas-dev/pandas · GitHub

Reproducible Example
```python
import pyarrow as pa
import pyarrow.parquet as pq
import pandas as pd


df = pd.DataFrame({'int_array_col': [[[1, 2, 3]], [[4, 5, 6]]]})
df.to_parquet('/tmp/test', engine='pyarrow')  # written through pandas

pandas_parquet_table = pq.read_table('/tmp/test')

pyarrow_table = pa.Table.from_pandas(df)  # in-memory Arrow table, never written to Parquet
writer = pa.BufferOutputStream()
pq.write_table(
    pyarrow_table,
    writer,
    use_compliant_nested_type=True
)
reader = pa.BufferReader(writer.getvalue())
parquet_table = pq.read_table(reader)  # written by PyArrow with the compliant flag

print("Pandas:", pandas_parquet_table.schema.types)
print("Non-compliant Parquet:", pyarrow_table.schema.types)
print("Compliant Parquet:", parquet_table.schema.types)
assert pandas_parquet_table.schema.types == pyarrow_table.schema.types
assert pandas_parquet_table.schema.types == parquet_table.schema.types
```


```python-traceback
Pandas: [ListType(list<item: list<item: int64>>)]
Non-compliant Parquet: [ListType(list<item: list<item: int64>>)]
Compliant Parquet: [ListType(list<element: list<element: int64>>)]
Traceback (most recent call last):
  File "/Users/judahrand/test_dir/pandas_parquet.py", line 25, in <module>
    assert pandas_parquet_table.schema.types == parquet_table.schema.types
AssertionError
```

Issue Description

This method currently does not write compliant Parquet logical types for nested arrays as defined in the Parquet format specification. This can cause problems when using Parquet as an intermediate format, for example when loading data into BigQuery, which expects compliant data.

This was an issue in PyArrow itself, but it was fixed in ARROW-11497, which added the `use_compliant_nested_type` flag. I believe that this flag should be set in pandas if we are to claim that the pandas `.to_parquet()` method actually outputs Parquet.

Expected Behavior

Output compliant Parquet.

