In batch-load scenarios, an application writes data and commits it as a single atomic transaction. When using the Storage Write API to batch load data, create one or more streams in pending type. Pending type supports stream-level transactions. Records are buffered in a pending state until you commit the stream.
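As a minimal sketch, assuming the Python client library (google-cloud-bigquery-storage) and placeholder project, dataset, and table names, creating a pending-type stream looks like this:

```python
from google.cloud import bigquery_storage_v1
from google.cloud.bigquery_storage_v1 import types

write_client = bigquery_storage_v1.BigQueryWriteClient()
# Placeholder project, dataset, and table names.
parent = write_client.table_path("my-project", "my_dataset", "my_table")

# A pending-type stream buffers appended records until it is committed.
write_stream = types.WriteStream()
write_stream.type_ = types.WriteStream.Type.PENDING
write_stream = write_client.create_write_stream(
    parent=parent, write_stream=write_stream
)
```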
For batch workloads, also consider using the Storage Write API through the Apache Spark SQL connector for BigQuery using Dataproc, rather than writing custom Storage Write API code.
The Storage Write API is well-suited to a data pipeline architecture. A main process creates a number of streams. For each stream, it assigns a worker thread or a separate process to write a portion of the batch data. Each worker creates a connection to its stream, writes data, and finalizes its stream when it's done. After all of the workers signal successful completion to the main process, the main process commits the data. If a worker fails, its assigned portion of the data will not show up in the final results, and the whole worker can be safely retried. In a more sophisticated pipeline, workers checkpoint their progress by reporting the last offset written to the main process. This approach can result in a robust pipeline that is resilient to failures.
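The following is a rough Python sketch of that fan-out pattern, not a definitive implementation: each worker thread creates its own pending stream, writes its shard, and finalizes the stream, and the main thread collects the stream names for the commit step. The shard list and the append step are hypothetical placeholders.

```python
import concurrent.futures

from google.cloud import bigquery_storage_v1
from google.cloud.bigquery_storage_v1 import types


def write_shard(write_client, parent, shard):
    """Hypothetical worker: create a pending stream, write one shard, finalize."""
    stream = types.WriteStream()
    stream.type_ = types.WriteStream.Type.PENDING
    stream = write_client.create_write_stream(parent=parent, write_stream=stream)
    # ... append the rows in `shard` to stream.name here (see the Python
    # example later in this section for the append calls) ...
    write_client.finalize_write_stream(name=stream.name)
    return stream.name


write_client = bigquery_storage_v1.BigQueryWriteClient()
parent = write_client.table_path("my-project", "my_dataset", "my_table")
shards = []  # hypothetical partitioning of the batch data

# Main process: one worker per shard. Commit only after every worker has
# finalized its stream successfully; a failed worker's stream is simply
# never committed, and that worker can be retried.
with concurrent.futures.ThreadPoolExecutor() as executor:
    futures = [executor.submit(write_shard, write_client, parent, s) for s in shards]
    stream_names = [f.result() for f in futures]  # raises if any worker failed
```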
Committing is an atomic operation, and you can commit multiple streams at once. A stream can only be committed once, so if the commit operation fails, it is safe to retry it. Until you commit a stream, the data is pending and not visible to reads.
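As a sketch of the commit step, continuing from the fan-out example above (write_client, parent, and stream_names are the names used there):

```python
from google.cloud.bigquery_storage_v1 import types

# Commit every finalized stream in one atomic operation.
commit_request = types.BatchCommitWriteStreamsRequest()
commit_request.parent = parent
commit_request.write_streams = stream_names

response = write_client.batch_commit_write_streams(commit_request)

# A failed commit reports per-stream errors and can safely be retried;
# on success a commit time is returned and the rows become readable.
if response.stream_errors:
    for stream_error in response.stream_errors:
        print(stream_error)
else:
    print(f"Streams committed at {response.commit_time}.")
```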
After the stream is finalized and before it is committed, the data can remain in the buffer for up to 4 hours. Pending streams must be committed within 24 hours. There is a quota limit on the total size of the pending stream buffer.
C#
To learn how to install and use the client library for BigQuery, see BigQuery client libraries. For more information, see the BigQuery C# API reference documentation.
To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.
Go
To learn how to install and use the client library for BigQuery, see BigQuery client libraries. For more information, see the BigQuery Go API reference documentation.
To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.
Java
To learn how to install and use the client library for BigQuery, see BigQuery client libraries. For more information, see the BigQuery Java API reference documentation.
To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.
Node.js
To learn how to install and use the client library for BigQuery, see BigQuery client libraries. For more information, see the BigQuery Node.js API reference documentation.
To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.
Python
This example shows a simple record with two fields. For a longer example that shows how to send different data types, including STRUCT types, see the append_rows_proto2 sample on GitHub.
To learn how to install and use the client library for BigQuery, see BigQuery client libraries. For more information, see the BigQuery Python API reference documentation.
To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.
This code example depends on a compiled protocol module, customer_record_pb2.py. To create the compiled module, execute protoc --python_out=. customer_record.proto, where protoc is the protocol buffer compiler. The customer_record.proto file defines the format of the messages used in the Python example.
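The sketch below follows the flow described in this section, assuming customer_record.proto defines a CustomerRecord message with two hypothetical fields, customer_name (STRING) and row_num (INT64), matching columns of the same names in the target table. Project, dataset, and table names are placeholders.

```python
from google.cloud import bigquery_storage_v1
from google.cloud.bigquery_storage_v1 import types, writer
from google.protobuf import descriptor_pb2

import customer_record_pb2  # generated by protoc from customer_record.proto


def create_row_data(row_num: int, name: str) -> bytes:
    # Serialize one record using the assumed two-field message.
    row = customer_record_pb2.CustomerRecord()
    row.row_num = row_num
    row.customer_name = name
    return row.SerializeToString()


write_client = bigquery_storage_v1.BigQueryWriteClient()
parent = write_client.table_path("my-project", "my_dataset", "my_table")

# Create a pending-type stream; rows stay buffered until the stream is committed.
write_stream = types.WriteStream()
write_stream.type_ = types.WriteStream.Type.PENDING
write_stream = write_client.create_write_stream(
    parent=parent, write_stream=write_stream
)

# The first request on a connection must carry the stream name and the
# proto schema that describes the serialized rows.
request_template = types.AppendRowsRequest()
request_template.write_stream = write_stream.name

proto_schema = types.ProtoSchema()
proto_descriptor = descriptor_pb2.DescriptorProto()
customer_record_pb2.CustomerRecord.DESCRIPTOR.CopyToProto(proto_descriptor)
proto_schema.proto_descriptor = proto_descriptor
proto_data = types.AppendRowsRequest.ProtoData()
proto_data.writer_schema = proto_schema
request_template.proto_rows = proto_data

# AppendRowsStream manages the bidirectional connection to the stream.
append_rows_stream = writer.AppendRowsStream(write_client, request_template)

# Append a batch of serialized rows starting at offset 0.
proto_rows = types.ProtoRows()
proto_rows.serialized_rows.append(create_row_data(1, "Alice"))
proto_rows.serialized_rows.append(create_row_data(2, "Bob"))

request = types.AppendRowsRequest()
request.offset = 0
proto_data = types.AppendRowsRequest.ProtoData()
proto_data.rows = proto_rows
request.proto_rows = proto_data

response_future = append_rows_stream.send(request)
response_future.result()  # block until the append is acknowledged

# No more rows: shut down the connection and finalize the stream.
append_rows_stream.close()
write_client.finalize_write_stream(name=write_stream.name)

# The data becomes visible only after the stream is committed, for example
# with batch_commit_write_streams as shown earlier in this section.
```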