This document describes how to create datasets in BigQuery.
To see steps for copying a dataset, including across regions, see Copying datasets.
This document describes how to work with regular datasets that store data in BigQuery. To learn how to work with Spanner external datasets, see Create Spanner external datasets. To learn how to work with AWS Glue federated datasets, see Create AWS Glue federated datasets.
Grant Identity and Access Management (IAM) roles that give users the necessary permissions to perform each task in this document.
To create a dataset, you need the bigquery.datasets.create
IAM permission.
Each of the following predefined IAM roles includes the permissions that you need in order to create a dataset:
Console
Note: If you choose EU or an EU-based region for the dataset location, your Core BigQuery Customer Data resides in the EU. Core BigQuery Customer Data is defined in the Service Specific Terms.
Alternatively, click Select scope to search for a resource or to see a list of current resources.
Enter the PROJECT_ID or the ORGANIZATION_ID, and then click Save.
When you change a dataset's billing model, it takes 24 hours for the change to take effect.
Once you change a dataset's storage billing model, you must wait 14 days before you can change the storage billing model again.
SQL
Use the CREATE SCHEMA statement. To create a dataset in a project other than your default project, add the project ID to the dataset ID in the following format: PROJECT_ID.DATASET_ID.
In the Google Cloud console, go to the BigQuery page.
In the query editor, enter the following statement:
CREATE SCHEMA PROJECT_ID.DATASET_ID
OPTIONS (
  default_kms_key_name = 'KMS_KEY_NAME',
  default_partition_expiration_days = PARTITION_EXPIRATION,
  default_table_expiration_days = TABLE_EXPIRATION,
  description = 'DESCRIPTION',
  labels = [('KEY_1','VALUE_1'),('KEY_2','VALUE_2')],
  location = 'LOCATION',
  max_time_travel_hours = HOURS,
  storage_billing_model = BILLING_MODEL);
Replace the following:
PROJECT_ID: your project ID
DATASET_ID: the ID of the dataset that you're creating
KMS_KEY_NAME: the name of the default Cloud Key Management Service key used to protect newly created tables in this dataset unless a different key is supplied at the time of creation. You cannot create a Google-encrypted table in a dataset with this parameter set.
PARTITION_EXPIRATION: the default lifetime (in days) for partitions in newly created partitioned tables. The default partition expiration has no minimum value. The expiration time evaluates to the partition's date plus the integer value. Any partition created in a partitioned table in the dataset is deleted PARTITION_EXPIRATION days after the partition's date. If you supply the time_partitioning_expiration option when you create or update a partitioned table, the table-level partition expiration takes precedence over the dataset-level default partition expiration.
TABLE_EXPIRATION: the default lifetime (in days) for newly created tables. The minimum value is 0.042 days (one hour). The expiration time evaluates to the current time plus the integer value. Any table created in the dataset is deleted TABLE_EXPIRATION days after its creation time. This value is applied if you do not set a table expiration when you create the table.
DESCRIPTION: a description of the dataset
KEY_1:VALUE_1: the key-value pair that you want to set as the first label on this dataset
KEY_2:VALUE_2: the key-value pair that you want to set as the second label
LOCATION: the dataset's location. After a dataset is created, the location can't be changed. Note: If you choose EU or an EU-based region for the dataset location, your Core BigQuery Customer Data resides in the EU. Core BigQuery Customer Data is defined in the Service Specific Terms.
HOURS: the duration in hours of the time travel window for the new dataset. The HOURS value must be an integer expressed in multiples of 24 (48, 72, 96, 120, 144, 168) between 48 (2 days) and 168 (7 days). 168 hours is the default if this option isn't specified.
BILLING_MODEL: sets the storage billing model for the dataset. Set the BILLING_MODEL value to PHYSICAL to use physical bytes when calculating storage charges, or to LOGICAL to use logical bytes. LOGICAL is the default.
When you change a dataset's billing model, it takes 24 hours for the change to take effect.
Once you change a dataset's storage billing model, you must wait 14 days before you can change the storage billing model again.
Click Run.
For more information about how to run queries, see Run an interactive query.
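To make the placeholder substitution concrete, the following Python sketch assembles a CREATE SCHEMA statement from a few sample option values and prints the resulting DDL. The project and dataset IDs are hypothetical, and only a subset of the available options is shown.

```python
# Fill the CREATE SCHEMA placeholders with sample values and print the DDL.
# "my-project" and "mydataset" are hypothetical; string-typed options keep
# their SQL quotes, while numeric and enum options are emitted as-is.
options = {
    "default_table_expiration_days": 1,
    "description": "'This is my dataset.'",
    "location": "'US'",
    "max_time_travel_hours": 168,
    "storage_billing_model": "LOGICAL",
}

option_lines = ",\n  ".join(f"{key} = {value}" for key, value in options.items())
ddl = f"CREATE SCHEMA `my-project.mydataset`\nOPTIONS (\n  {option_lines});"
print(ddl)
```

The printed statement can then be pasted into the query editor with your own project and dataset IDs substituted in.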
bq
To create a new dataset, use the bq mk command with the --location flag. For a full list of possible parameters, see the bq mk --dataset command reference.
To create a dataset in a project other than your default project, add the project ID to the dataset name in the following format: PROJECT_ID:DATASET_ID.
bq --location=LOCATION mk \
    --dataset \
    --default_kms_key=KMS_KEY_NAME \
    --default_partition_expiration=PARTITION_EXPIRATION \
    --default_table_expiration=TABLE_EXPIRATION \
    --description="DESCRIPTION" \
    --label=KEY_1:VALUE_1 \
    --label=KEY_2:VALUE_2 \
    --add_tags=KEY_3:VALUE_3[,...] \
    --max_time_travel_hours=HOURS \
    --storage_billing_model=BILLING_MODEL \
    PROJECT_ID:DATASET_ID
Replace the following:
LOCATION: the dataset's location. After a dataset is created, the location can't be changed. You can set a default value for the location by using the .bigqueryrc file.
KMS_KEY_NAME: the name of the default Cloud Key Management Service key used to protect newly created tables in this dataset unless a different key is supplied at the time of creation. You cannot create a Google-encrypted table in a dataset with this parameter set.
PARTITION_EXPIRATION: the default lifetime (in seconds) for partitions in newly created partitioned tables. The default partition expiration has no minimum value. The expiration time evaluates to the partition's date plus the integer value. Any partition created in a partitioned table in the dataset is deleted PARTITION_EXPIRATION seconds after the partition's date. If you supply the --time_partitioning_expiration flag when you create or update a partitioned table, the table-level partition expiration takes precedence over the dataset-level default partition expiration.
TABLE_EXPIRATION: the default lifetime (in seconds) for newly created tables. The minimum value is 3600 seconds (one hour). The expiration time evaluates to the current time plus the integer value. Any table created in the dataset is deleted TABLE_EXPIRATION seconds after its creation time. This value is applied if you don't set a table expiration when you create the table.
DESCRIPTION: a description of the dataset
KEY_1:VALUE_1: the key-value pair that you want to set as the first label on this dataset
KEY_2:VALUE_2: the key-value pair that you want to set as the second label
KEY_3:VALUE_3: the key-value pair that you want to set as a tag on the dataset. Add multiple tags under the same flag with commas between key:value pairs.
HOURS: the duration in hours of the time travel window for the new dataset. The HOURS value must be an integer expressed in multiples of 24 (48, 72, 96, 120, 144, 168) between 48 (2 days) and 168 (7 days). 168 hours is the default if this option isn't specified.
BILLING_MODEL: sets the storage billing model for the dataset. Set the BILLING_MODEL value to PHYSICAL to use physical bytes when calculating storage charges, or to LOGICAL to use logical bytes. LOGICAL is the default.
When you change a dataset's billing model, it takes 24 hours for the change to take effect.
Once you change a dataset's storage billing model, you must wait 14 days before you can change the storage billing model again.
PROJECT_ID: your project ID
DATASET_ID: the ID of the dataset that you're creating
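Note one easy-to-miss difference from the SQL path: the CREATE SCHEMA options take expirations in days, while the bq flags take seconds. A small, unofficial conversion helper:

```python
SECONDS_PER_DAY = 86400  # 24 * 60 * 60

def days_to_seconds(days: float) -> int:
    """Convert a SQL-style expiration in days to a bq-style value in seconds."""
    return round(days * SECONDS_PER_DAY)

# The SQL minimum of 0.042 days is roughly one hour (3600 seconds).
print(days_to_seconds(0.042))  # 3629
print(days_to_seconds(1))      # 86400
```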
For example, the following command creates a dataset named mydataset with data location set to US, a default table expiration of 3600 seconds (1 hour), and a description of This is my dataset. Instead of using the --dataset flag, the command uses the -d shortcut. If you omit -d and --dataset, the command defaults to creating a dataset.
bq --location=US mk -d \
    --default_table_expiration 3600 \
    --description "This is my dataset." \
    mydataset
To confirm that the dataset was created, enter the bq ls command. Also, you can create a table when you create a new dataset by using the following format: bq mk -t dataset.table. For more information about creating tables, see Creating a table.
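The time travel and storage billing options accept only a narrow set of values, whichever interface you use. As an illustrative, unofficial client-side pre-check of the rules stated above:

```python
def validate_dataset_options(max_time_travel_hours: int,
                             storage_billing_model: str) -> None:
    """Raise ValueError if either option violates the documented constraints."""
    # Time travel window: a multiple of 24 hours, between 48 and 168 inclusive.
    if max_time_travel_hours % 24 != 0 or not 48 <= max_time_travel_hours <= 168:
        raise ValueError(
            "max_time_travel_hours must be a multiple of 24 between 48 and 168")
    # Storage billing model: only the two documented values are accepted.
    if storage_billing_model not in ("LOGICAL", "PHYSICAL"):
        raise ValueError("storage_billing_model must be LOGICAL or PHYSICAL")

validate_dataset_options(168, "LOGICAL")  # the defaults: no error raised
```

Running such a check locally before issuing the command can save a round trip, since the service rejects invalid values anyway.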
Terraform
Use the google_bigquery_dataset resource.
To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.
Create a dataset
The following example creates a dataset named mydataset
:
When you create a dataset using the google_bigquery_dataset
resource, it automatically grants access to the dataset to all accounts that are members of project-level basic roles. If you run the terraform show
command after creating the dataset, the access
block for the dataset looks similar to the following:
To grant access to the dataset, we recommend that you use one of the google_bigquery_iam
resources, as shown in the following example, unless you plan to create authorized objects, such as authorized views, within the dataset. In that case, use the google_bigquery_dataset_access
resource. Refer to that documentation for examples.
Create a dataset and grant access to it
The following example creates a dataset named mydataset
, then uses the google_bigquery_dataset_iam_policy
resource to grant access to it.
If you plan to create authorized objects, such as authorized views, within the dataset, use the google_bigquery_dataset_access resource. For examples, see google_bigquery_dataset_access.
Create a dataset with a customer-managed encryption key
The following example creates a dataset named mydataset
, and also uses the google_kms_crypto_key
and google_kms_key_ring
resources to specify a Cloud Key Management Service key for the dataset. You must enable the Cloud Key Management Service API before running this example.
To apply your Terraform configuration in a Google Cloud project, complete the steps in the following sections.
Prepare Cloud Shell
Set the default Google Cloud project where you want to apply your Terraform configurations.
You only need to run this command once per project, and you can run it in any directory.
export GOOGLE_CLOUD_PROJECT=PROJECT_ID
Environment variables are overridden if you set explicit values in the Terraform configuration file.
Each Terraform configuration file must have its own directory (also called a root module). Create a directory and a new file within that directory. The filename must have the .tf extension, for example main.tf. In this tutorial, the file is referred to as main.tf.
mkdir DIRECTORY && cd DIRECTORY && touch main.tf
If you are following a tutorial, you can copy the sample code in each section or step.
Copy the sample code into the newly created main.tf.
Optionally, copy the code from GitHub. This is recommended when the Terraform snippet is part of an end-to-end solution.
Initialize Terraform by running the following command:
terraform init
Optionally, to use the latest Google provider version, include the -upgrade
option:
terraform init -upgrade
Review the proposed changes by running the following command:
terraform plan
Make corrections to the configuration as necessary.
To apply the Terraform configuration, run the following command and enter yes at the prompt:
terraform apply
Wait until Terraform displays the "Apply complete!" message.
API
Call the datasets.insert method with a defined dataset resource.
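As a hedged sketch rather than an official sample, the Dataset resource passed to datasets.insert can be represented as a plain dictionary before serialization. The project and dataset IDs below are placeholders, and only a few of the resource's fields are shown.

```python
import json

# Minimal Dataset resource body for the datasets.insert REST method.
# "my-project" and "mydataset" are placeholders, not real resources.
dataset_body = {
    "datasetReference": {
        "projectId": "my-project",
        "datasetId": "mydataset",
    },
    "location": "US",  # the location can't be changed after creation
    "description": "This is my dataset.",
    # int64 fields are serialized as strings in the REST API.
    "defaultTableExpirationMs": str(3600 * 1000),
}

print(json.dumps(dataset_body, indent=2))
```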
C#
Before trying this sample, follow the C# setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery C# API reference documentation.
To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.
Go
Before trying this sample, follow the Go setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Go API reference documentation.
To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.
Java
Before trying this sample, follow the Java setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Java API reference documentation.
To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.
Node.js
Before trying this sample, follow the Node.js setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Node.js API reference documentation.
To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.
PHP
Before trying this sample, follow the PHP setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery PHP API reference documentation.
To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.
Python
Before trying this sample, follow the Python setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Python API reference documentation.
To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.
Ruby
Before trying this sample, follow the Ruby setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Ruby API reference documentation.
To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.
When you create a dataset in BigQuery, the dataset name must be unique for each project. The dataset name can contain the following:
Up to 1,024 characters
Letters (uppercase or lowercase), numbers, and underscores
Dataset names are case-sensitive by default. mydataset and MyDataset can coexist in the same project, unless one of them has case-sensitivity turned off. For examples, see Creating a case-insensitive dataset and Resource: Dataset.
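As an unofficial illustration of these naming rules (letters, numbers, and underscores, up to 1,024 characters), a simple checker might look like the following. Treat the regular expression as an assumption derived from the rules above, not as the authoritative rule set.

```python
import re

# Assumed rule set: letters, numbers, and underscores; 1 to 1,024 characters.
_DATASET_ID_RE = re.compile(r"^[A-Za-z0-9_]{1,1024}$")

def is_valid_dataset_id(dataset_id: str) -> bool:
    """Check a candidate dataset ID against the assumed naming rules."""
    return _DATASET_ID_RE.match(dataset_id) is not None

print(is_valid_dataset_id("mydataset"))   # True
print(is_valid_dataset_id("my-dataset"))  # False: hyphens aren't allowed
```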
A hidden dataset is a dataset whose name begins with an underscore. You can query tables and views in hidden datasets the same way you would in any other dataset. Hidden datasets have the following restrictions: