Showing content from https://cloud.google.com/dataflow/docs/guides/read-from-cloud-storage below:

Read from Cloud Storage to Dataflow


To read data from Cloud Storage to Dataflow, use the Apache Beam TextIO or AvroIO I/O connector.

Note: Depending on your scenario, consider using one of the Google-provided Dataflow templates. Several of these templates read from Cloud Storage.

Include the Google Cloud Platform library dependency

To use the TextIO or AvroIO connector with Cloud Storage, include the following dependency. This library provides a scheme handler for "gs://" filenames.

Java
<dependency>
  <groupId>org.apache.beam</groupId>
  <artifactId>beam-sdks-java-io-google-cloud-platform</artifactId>
  <version>${beam.version}</version>
</dependency>
Python
apache-beam[gcp]==VERSION
Go
import _ "github.com/apache/beam/sdks/v2/go/pkg/beam/io/filesystem/gcs"

For more information, see Install the Apache Beam SDK.

Enable gRPC on Apache Beam I/O connector on Dataflow

You can connect to Cloud Storage using gRPC through the Apache Beam I/O connector on Dataflow. gRPC is a high-performance, open-source remote procedure call (RPC) framework developed by Google that you can use to interact with Cloud Storage.

To speed up your Dataflow job's read requests to Cloud Storage, you can enable the Apache Beam I/O connector on Dataflow to use gRPC.

Command line
  1. Ensure that you use the Apache Beam SDK version 2.55.0 or later.
  2. To run a Dataflow job, use the --additional-experiments=use_grpc_for_gcs pipeline option. For information about the different pipeline options, see Optional flags.
Apache Beam SDK
  1. Ensure that you use the Apache Beam SDK version 2.55.0 or later.
  2. To run a Dataflow job, use the --experiments=use_grpc_for_gcs pipeline option. For information about the different pipeline options, see Basic options.
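
The two steps above can be sketched in Python as a version check plus a helper that appends the experiment flag to a pipeline's argument list. This is an illustrative sketch, not part of the original page: the helper names (`parse_version`, `supports_grpc`, `with_grpc_experiment`) are hypothetical, and only the flag value and the 2.55.0 minimum come from the steps above. Whether the experiment actually takes effect may depend on the SDK language and runner, which this sketch does not check.

```python
import re
from typing import List, Sequence, Tuple

GRPC_EXPERIMENT = "use_grpc_for_gcs"  # experiment name from the steps above
MIN_BEAM_VERSION = (2, 55, 0)         # minimum SDK version from the steps above


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn '2.55.0' into (2, 55, 0), ignoring suffixes such as 'rc1' or '.dev'."""
    parts = []
    for piece in version.split(".")[:3]:
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def supports_grpc(version: str) -> bool:
    """Return True if the given SDK version is 2.55.0 or later."""
    return parse_version(version) >= MIN_BEAM_VERSION


def with_grpc_experiment(argv: Sequence[str]) -> List[str]:
    """Return a copy of argv that includes the gRPC experiment flag exactly once."""
    args = list(argv)
    flag = f"--experiments={GRPC_EXPERIMENT}"
    if flag not in args:
        args.append(flag)
    return args
```

The returned list can then be passed to `apache_beam.options.pipeline_options.PipelineOptions`, for example `PipelineOptions(with_grpc_experiment(sys.argv[1:]))`.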

You can configure the Apache Beam I/O connector on Dataflow to generate gRPC-related metrics in Cloud Monitoring. The gRPC-related metrics can help you to do the following:

For information about how to configure the Apache Beam I/O connector on Dataflow to generate gRPC-related metrics, see Use client-side metrics. If gathering metrics isn't necessary for your use case, you can opt out of metrics collection. For instructions, see Opt-out of client-side metrics.

Parallelism

The TextIO and AvroIO connectors support two levels of parallelism:

Performance

The following table shows performance metrics for reading from Cloud Storage. The workloads were run on one e2-standard-2 worker, using the Apache Beam SDK 2.49.0 for Java. They did not use Runner v2.

Workload: 100 M records | 1 kB | 1 column

Operation    Throughput (bytes)    Throughput (elements)
Read         320 MBps              320,000 elements per second

These metrics are based on simple batch pipelines. They are intended to compare performance between I/O connectors, and are not necessarily representative of real-world pipelines. Dataflow pipeline performance is complex, and is a function of VM type, the data being processed, the performance of external sources and sinks, and user code. Metrics are based on running the Java SDK, and aren't representative of the performance characteristics of other language SDKs. For more information, see Beam IO Performance.
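
The two throughput figures in the table are consistent with each other given the 1 kB record size, which the short calculation below checks. It assumes decimal units (1 kB = 1,000 bytes, 1 MBps = 1,000,000 bytes per second), which the source does not state. The read duration it prints is a derived estimate for the 100 M record workload, not a measured result from the benchmark.

```python
# Assumed units (not stated in the source): decimal kB and MB.
RECORD_SIZE_BYTES = 1_000               # 1 kB per record
THROUGHPUT_BYTES_PER_S = 320 * 1_000_000  # 320 MBps
TOTAL_RECORDS = 100_000_000             # 100 M records


def elements_per_second(throughput_bytes_per_s: float, record_size_bytes: float) -> float:
    """Convert byte throughput into record throughput."""
    return throughput_bytes_per_s / record_size_bytes


def estimated_read_seconds(total_records: float, records_per_s: float) -> float:
    """Estimate how long reading all records takes at a constant rate."""
    return total_records / records_per_s


rate = elements_per_second(THROUGHPUT_BYTES_PER_S, RECORD_SIZE_BYTES)
duration = estimated_read_seconds(TOTAL_RECORDS, rate)

print(f"{rate:,.0f} elements per second")
print(f"about {duration:.1f} seconds to read {TOTAL_RECORDS:,} records")
```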

Best practices

Example

The following example shows how to read from Cloud Storage.
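
As one illustration, here is a minimal Python sketch of such a read. It is not necessarily the sample this page originally showed. It uses `ReadFromText`, the Python SDK counterpart of the TextIO connector, and assumes the `apache-beam[gcp]` package is installed so that "gs://" paths resolve. The function names `parse_args` and `run`, the `--input` flag, and any bucket path you pass are hypothetical placeholders.

```python
import argparse
from typing import List, Optional, Tuple


def parse_args(argv: Optional[List[str]] = None) -> Tuple[argparse.Namespace, List[str]]:
    """Split this script's own flags from the flags meant for the Beam pipeline."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--input",
        required=True,
        help="File pattern to read, for example gs://<bucket>/<path>/*.txt",
    )
    # Unrecognized flags (such as --runner or --project) are returned untouched.
    return parser.parse_known_args(argv)


def run(argv: Optional[List[str]] = None) -> None:
    """Read text lines from the given file pattern and print each one."""
    # Imported here so that parse_args can be used without Beam installed.
    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    known_args, pipeline_args = parse_args(argv)
    with beam.Pipeline(options=PipelineOptions(pipeline_args)) as pipeline:
        (
            pipeline
            # Each element produced by ReadFromText is one line of text.
            | "ReadFromCloudStorage" >> beam.io.ReadFromText(known_args.input)
            | "PrintLines" >> beam.Map(print)
        )
```

To try it, call `run(["--input", "gs://<bucket>/<path>/*.txt", "--runner=DirectRunner"])` with a pattern you have access to. Flags that `parse_args` does not recognize are forwarded to the pipeline options unchanged.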

What's next

Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.

Last updated 2025-10-13 UTC.
