I have data in the form of a list of dicts (see the MRE below). To keep everything strictly typed, I always want to pass in the expected schema (dtypes) when I read in this data. The `pl.DataFrame` constructor supports this through either `schema` or `schema_overrides`. However, I frequently run into trouble with the Datetime columns in the schema, especially when the dates are stored as strings in the dictionaries:
polars.exceptions.ComputeError: could not append value: "2020-02-11" of type: str to the builder; make sure that all rows have the same schema or consider increasing `infer_schema_length`
Question
Is there a way to "automatically" parse datetime strings when I construct the DataFrame (or when I use `pl.from_dicts()`)? Something comparable to the solution implemented in early 2024 for data stored as integer timestamps in the dictionaries (github issue)?
Is there something similar for date information stored as strings (e.g. `2022-01-01`)?
Or do I have to drop every `pl.Datetime` key from my `schema_override` and then convert those columns manually afterwards via `with_columns(pl.col(list_dropped_datetime_cols).cast(pl.Datetime))`?
MRE
```python
import polars as pl
schema_override = {
"some_int_override": pl.Int8,
"some_date_override": pl.Datetime,
}
dict_data = [
{
"some_int_override": 1,
"some_date_override": "2020-02-11",
"some_date": "2025-02-11",
}
]
df_naiive = pl.DataFrame(dict_data)
print(df_naiive)
# Raises the ComputeError shown above
df_schema_override = pl.DataFrame(dict_data, schema_overrides=schema_override)
print(df_schema_override)
```