I cannot upload a Pandas DataFrame with a repeated field to BigQuery. This is closely related to an issue I reported earlier: googleapis/google-cloud-python#8093
Since that one has been resolved (by being able to specify the schema), I've created a separate issue. I also couldn't find any existing issues related to REPEATED fields.
Environment details
Mac OS X 10.14.5
Python 3.6.8
Packages:
google-api-core==1.14.2
google-auth==1.6.3
google-cloud-bigquery==1.19.0
google-cloud-core==1.0.3
google-cloud-iam==0.2.1
google-cloud-logging==1.12.1
google-resumable-media==0.3.3
googleapis-common-protos==1.6.0
Steps to reproduce
Also: JobConfig doesn't change the error.

import pandas as pd
from google.cloud import bigquery

PROJECT = "MY-PROJECT"
DATASET = "MY_DATASET"
TABLE = "MY_TABLE"

# My table schema
schema = [
    bigquery.SchemaField("foo", "INTEGER", mode="REQUIRED"),
    bigquery.SchemaField("bar", "FLOAT", mode="REPEATED"),
]

# Set everything up
client = bigquery.Client(PROJECT)
dataset_ref = client.dataset(DATASET)
table_ref = dataset_ref.table(TABLE)

# Delete the table if exists
print("Deleting table if exists...")
client.delete_table(table_ref, not_found_ok=True)

# Create the table
print("Creating table...")
table = bigquery.Table(table_ref, schema=schema)
table.time_partitioning = bigquery.TimePartitioning(
    type_=bigquery.TimePartitioningType.DAY
)
table = client.create_table(table, exists_ok=True)

print("Table schema:")
print(table.schema)
print("Table partitioning:")
print(table.time_partitioning)

# Upload data to partition
table_partition = TABLE + "$20190522"
table_ref = dataset_ref.table(table_partition)

df = pd.DataFrame({"foo": [1, 2, 3], "bar": [[2.0, 3.0], [3.0, 4.0], [4.0, 5.0]]})

job_config = bigquery.LoadJobConfig(schema=schema)
client.load_table_from_dataframe(df, table_ref, job_config=job_config).result()

Stack trace
Traceback (most recent call last):
  File "test.py", line 51, in <module>
    client.load_table_from_dataframe(df, table_ref, job_config=job_config).result()
  File "google/cloud/bigquery/job.py", line 734, in result
    return super(_AsyncJob, self).result(timeout=timeout)
  File "google/api_core/future/polling.py", line 127, in result
    raise self._exception
google.api_core.exceptions.BadRequest: 400 Error while reading data, error message:
Provided schema is not compatible with the file 'prod-scotty-******'.
Field 'bar' is specified as REPEATED in provided schema
which does not match REQUIRED as specified in the file.
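
The error above complains about how the repeated column is described in the uploaded file. One way to test whether the file format is the cause is to serialize the same rows as newline-delimited JSON, where a REPEATED field is simply a JSON array. The following is a minimal sketch of that idea, not a confirmed fix: `rows_to_ndjson` is a hypothetical helper, and the commented-out upload lines assume the `client`, `table_ref` and `schema` objects from the script above.

```python
import io
import json


def rows_to_ndjson(rows):
    """Serialize an iterable of dict rows to newline-delimited JSON bytes."""
    return "".join(json.dumps(row) + "\n" for row in rows).encode("utf-8")


# Same data as the DataFrame in the report, one dict per row.
# With pandas, such dicts could come from df.to_dict(orient="records").
rows = [
    {"foo": 1, "bar": [2.0, 3.0]},
    {"foo": 2, "bar": [3.0, 4.0]},
    {"foo": 3, "bar": [4.0, 5.0]},
]

payload = rows_to_ndjson(rows)
buffer = io.BytesIO(payload)  # file-like object suitable for a load job

# Not executed here; assumes client, table_ref and schema from the script above:
# job_config = bigquery.LoadJobConfig(schema=schema)
# job_config.source_format = bigquery.SourceFormat.NEWLINE_DELIMITED_JSON
# client.load_table_from_file(buffer, table_ref, job_config=job_config).result()
```

Each output line holds one row, with `bar` written as a plain JSON array, so no file-level schema is involved. Whether this avoids the error has not been verified here.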