A RetroSearch Logo

Home - News ( United States | United Kingdom | Italy | Germany ) - Football scores

Search Query:

Showing content from https://developers.google.com/bigquery/docs/glue-federated-datasets below:

Create AWS Glue federated datasets | BigQuery

Stay organized with collections Save and categorize content based on your preferences.

Create AWS Glue federated datasets

This document describes how to create a federated dataset in BigQuery that's linked to an existing database in AWS Glue.

A federated dataset is a connection between BigQuery and an external data source at the dataset level. The tables in a federated dataset are automatically populated from the tables in the corresponding external data source. You can query these tables directly in BigQuery, but you cannot make modifications, additions, or deletions. However, any updates that you make in the external data source are automatically reflected in BigQuery.

Before you begin

Ensure that you have a connection to access AWS Glue data.

Required permissions

To get the permissions that you need to create a federated dataset, ask your administrator to grant you the BigQuery Admin (roles/bigquery.admin) IAM role. For more information about granting roles, see Manage access to projects, folders, and organizations.

This predefined role contains the permissions required to create a federated dataset. To see the exact permissions that are required, expand the Required permissions section:

Required permissions

The following permissions are required to create a federated dataset:

You might also be able to get these permissions with custom roles or other predefined roles.

For more information about IAM roles and permissions in BigQuery, see Introduction to IAM.

Create a federated dataset

To create a federated dataset, do the following:

Console
  1. Open the BigQuery page in the Google Cloud console.

    Go to the BigQuery page

  2. In the Explorer panel, select the project where you want to create the dataset.

  3. Expand the more_vert Actions option and click Create dataset.

  4. On the Create dataset page, do the following:

  5. Click Create dataset.

SQL

Use the CREATE EXTERNAL SCHEMA data definition language (DDL) statement.

  1. In the Google Cloud console, go to the BigQuery page.

    Go to BigQuery

  2. In the query editor, enter the following statement:

    CREATE EXTERNAL SCHEMA DATASET_NAME
    WITH CONNECTION PROJECT_ID.CONNECTION_LOCATION.CONNECTION_NAME
      OPTIONS (
        external_source = 'AWS_GLUE_SOURCE',
        location = 'LOCATION');

    Replace the following:

  3. Click play_circle Run.

For more information about how to run queries, see Run an interactive query.

bq

In a command-line environment, create a dataset by using the bq mk command:

bq --location=LOCATION mk --dataset \
    --external_source aws-glue://AWS_GLUE_SOURCE \
    --connection_id PROJECT_ID.CONNECTION_LOCATION.CONNECTION_NAME \
    DATASET_NAME

Replace the following:

Terraform

Use the google_bigquery_dataset resource.

Note: To create BigQuery objects using Terraform, you must enable the Cloud Resource Manager API.

To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.

The following example creates an AWS Glue federated dataset:

resource "google_bigquery_dataset" "dataset" {
  provider                    = google-beta
  dataset_id                  = "example_dataset"
  friendly_name               = "test"
  description                 = "This is a test description."
  location                    = "aws-us-east-1"

external_dataset_reference {
  external_source = "aws-glue://arn:aws:glue:us-east-1:999999999999:database/database"
  connection      = "projects/project/locations/aws-us-east-1/connections/connection"
  }
}

To apply your Terraform configuration in a Google Cloud project, complete the steps in the following sections.

Prepare Cloud Shell
  1. Launch Cloud Shell.
  2. Set the default Google Cloud project where you want to apply your Terraform configurations.

    You only need to run this command once per project, and you can run it in any directory.

    export GOOGLE_CLOUD_PROJECT=PROJECT_ID

    Environment variables are overridden if you set explicit values in the Terraform configuration file.

Prepare the directory

Each Terraform configuration file must have its own directory (also called a root module).

  1. In Cloud Shell, create a directory and a new file within that directory. The filename must have the .tf extension—for example main.tf. In this tutorial, the file is referred to as main.tf.
    mkdir DIRECTORY && cd DIRECTORY && touch main.tf
  2. If you are following a tutorial, you can copy the sample code in each section or step.

    Copy the sample code into the newly created main.tf.

    Optionally, copy the code from GitHub. This is recommended when the Terraform snippet is part of an end-to-end solution.

  3. Review and modify the sample parameters to apply to your environment.
  4. Save your changes.
  5. Initialize Terraform. You only need to do this once per directory.
    terraform init

    Optionally, to use the latest Google provider version, include the -upgrade option:

    terraform init -upgrade
Apply the changes
  1. Review the configuration and verify that the resources that Terraform is going to create or update match your expectations:
    terraform plan

    Make corrections to the configuration as necessary.

  2. Apply the Terraform configuration by running the following command and entering yes at the prompt:
    terraform apply

    Wait until Terraform displays the "Apply complete!" message.

  3. Open your Google Cloud project to view the results. In the Google Cloud console, navigate to your resources in the UI to make sure that Terraform has created or updated them.
Note: Terraform samples typically assume that the required APIs are enabled in your Google Cloud project. API

Call the datasets.insert method with a defined dataset resource and externalDatasetReference field for your AWS Glue database.

List tables in a federated dataset

To list the tables that are available for query in your federated dataset, see Listing datasets.

Get table information

To get information on the tables in your federated dataset, such as schema details, see Get table information.

Control access to tables

To manage access to the tables in your federated dataset, see Control access to resources with IAM.

Row-level security, column-level security, and data masking are also supported for tables in federated datasets.

Schema operations that might invalidate security policies, such as deleting a column in AWS Glue, can cause jobs to fail until the policies are updated. Additionally, if you delete a table in AWS Glue and recreate it, your security policies no longer apply to the recreated table.

Query AWS Glue data

Querying tables in federated datasets is the same as querying tables in any other BigQuery dataset.

You can query AWS Glue tables in the following formats:

Table mapping details

Every table that you grant access to in your AWS Glue database appears as an equivalent table in your BigQuery dataset.

Format

The format of each BigQuery table is determined by the following fields of the respective AWS Glue table:

The only exception is Iceberg tables, which use the TableType (Table.Parameters["table_type"]) field.

For example, an AWS Glue table with the following fields is mapped to an ORC table in BigQuery:

Location

The location of each BigQuery table is determined by the following:

Other properties

Additionally, some AWS Glue table properties are automatically mapped to format-specific options in BigQuery:

Format SerializationLib AWS Glue table value BigQuery option CSV LazySimpleSerDe Table.StorageDescriptor.SerdeInfo.Parameters["field.delim"] CsvOptions.fieldDelimiter CSV LazySimpleSerDe Table.StorageDescriptor.Parameters["serialization.encoding"] CsvOptions.encoding CSV LazySimpleSerDe Table.StorageDescriptor.Parameters["skip.header.line.count"] CsvOptions.skipLeadingRows CSV OpenCsvSerDe Table.StorageDescriptor.SerdeInfo.Parameters["separatorChar"] CsvOptions.fieldDelimiter CSV OpenCsvSerDe Table.StorageDescriptor.SerdeInfo.Parameters["quoteChar"] CsvOptions.quote CSV OpenCsvSerDe Table.StorageDescriptor.Parameters["serialization.encoding"] CsvOptions.encoding CSV OpenCsvSerDe Table.StorageDescriptor.Parameters["skip.header.line.count"] CsvOptions.skipLeadingRows JSON Hive JsonSerDe Table.StorageDescriptor.Parameters["serialization.encoding"] JsonOptions.encoding Create a view in a federated dataset

You can't create a view in a federated dataset. However, you can create a view in a standard dataset that's based on a table in a federated dataset. For more information, see Create views.

Delete a federated dataset

Deleting a federated dataset is the same as deleting any other BigQuery dataset. For more information, see Delete datasets.

Pricing

For information about pricing, see BigQuery Omni pricing.

Limitations What's next

Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.

Last updated 2025-08-07 UTC.

[[["Easy to understand","easyToUnderstand","thumb-up"],["Solved my problem","solvedMyProblem","thumb-up"],["Other","otherUp","thumb-up"]],[["Hard to understand","hardToUnderstand","thumb-down"],["Incorrect information or sample code","incorrectInformationOrSampleCode","thumb-down"],["Missing the information/samples I need","missingTheInformationSamplesINeed","thumb-down"],["Other","otherDown","thumb-down"]],["Last updated 2025-08-07 UTC."],[[["Federated datasets in BigQuery establish a connection with an external AWS Glue database, automatically populating tables within BigQuery that reflect the data from the external source, allowing for direct querying."],["Creating a federated dataset requires specific permissions, such as the BigQuery Admin IAM role, and setting up a connection to access AWS Glue data with the appropriate Identity and Access Management policy statement."],["You can create a federated dataset through the Google Cloud console, SQL, the bq command-line tool, or Terraform, each requiring you to specify the AWS Glue source and connection details."],["While federated datasets allow for querying data in AWS Glue, they do not support direct modifications, additions, or deletions, as updates must occur in the external data source to be reflected in BigQuery."],["Federated datasets are subject to certain limitations, including restrictions on data modification, table creation, and support for certain BigQuery features, all while being priced under the BigQuery Omni model."]]],[]]


RetroSearch is an open source project built by @garambo | Open a GitHub Issue

Search and Browse the WWW like it's 1997 | Search results from DuckDuckGo

HTML: 3.2 | Encoding: UTF-8 | Version: 0.7.4