A RetroSearch Logo

Home - News ( United States | United Kingdom | Italy | Germany ) - Football scores

Search Query:

Showing content from https://developers.google.com/bigquery/docs/metadata-caching below:

Metadata caching for external tables | BigQuery

Stay organized with collections Save and categorize content based on your preferences.

This document describes how to use metadata caching (also known as column metadata indexing) to improve query performance on object tables and some types of BigLake tables.

Object tables and some types of BigLake tables can cache metadata information about files in external datastores—for example, Cloud Storage. The following types of BigLake tables support metadata caching:

The metadata includes file names, partitioning information, and metadata for files such as row counts. You can choose whether to enable metadata caching on a table. Queries with a large number of files and with Hive partition filters benefit the most from metadata caching.

If you don't enable metadata caching, queries on the table must read the external data source to get object metadata. Reading this data increases the query latency; listing millions of files from the external data source can take several minutes. If you enable metadata caching, queries can avoid listing files from the external data source and can partition and prune files more quickly.

You can enable metadata caching on a BigLake or object table when you create the table. For more information about creating object tables, see Create object tables. For more information about creating BigLake tables, see one of the following topics:

There are two properties that control the behavior of this feature:

When you have metadata caching enabled, you specify the maximum interval of metadata staleness that is acceptable for operations against the table. For example, if you specify an interval of 1 hour, then operations against the table use cached metadata if it has been refreshed within the past hour. If the cached metadata is older than that, the operation falls back to retrieving metadata from the datastore (Amazon S3 or Cloud Storage) instead. You can specify a staleness interval between 30 minutes and 7 days.

You can choose to refresh the cache either automatically or manually:

Both manual and automatic cache refreshes are executed with INTERACTIVE query priority.

If you choose to use automatic refreshes, we recommend that you create a reservation, and then create an assignment with a BACKGROUND job type for the project that runs the metadata cache refresh jobs. This prevents the refresh jobs from competing with user queries for resources, and potentially failing if there aren't sufficient resources available for them.

You should consider how the staleness interval and metadata caching mode values will interact before you set them. Consider the following examples:

For more information on setting metadata caching options for BigLake tables, see Create Amazon S3 BigLake external tables or Create BigLake external tables for Cloud Storage.

For more information on setting metadata caching options for object tables, see Create object tables.

To find information about metadata cache refresh jobs, query the INFORMATION_SCHEMA.JOBS view, as shown in the following example:

SELECT *
FROM `region-us.INFORMATION_SCHEMA.JOBS`
WHERE job_id LIKE '%metadata_cache_refresh%'
AND creation_time > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 6 HOUR)
ORDER BY start_time DESC
LIMIT 10;

Cached metadata is protected by the customer-managed encryption key (CMEK) used for the table that the cached metadata is associated with. This might be a CMEK applied directly to the table, or a CMEK that the table inherits from the dataset or project.

If a default CMEK is set for the project or dataset, or if the existing CMEK for the project or dataset is changed, this doesn't affect existing tables or their cached metadata. You must change the key for the table to apply the new key to both the table and its cached metadata.

CMEKs created in BigQuery don't apply to the Cloud Storage files that are used by BigLake and object tables. To obtain end-to-end CMEK encryption, configure CMEKs in Cloud Storage for those files.

To get information about metadata cache usage for a query job, call the jobs.get method for that job and look at the MetadataCacheStatistics field in the JobStatistics2 section of the Job resource. This field provides information on which metadata cache-enabled tables were used by the query, whether the metadata cache was used by the query, and if not, the reason why not.

Table statistics

For BigLake tables that are based on Parquet files, table statistics are collected when the metadata cache is refreshed. Table statistic collection happens during both automatic and manual refreshes, and the statistics are kept for the same period as the metadata cache.

The table statistics collected include file information like row counts, physical and uncompressed file sizes, and cardinality of columns. When you run a query on a Parquet-based BigLake table, these statistics are supplied to the query optimizer to enable better query planning and potentially improve query performance for some types of queries. For example, a common query optimization is dynamic constraint propagation, where the query optimizer dynamically infers predicates on the larger fact tables in a join from the smaller dimension tables. While this optimization can speed up queries by using normalized table schemas, it requires accurate table statistics. The table statistics collected by metadata caching enable greater optimization of query plans in both BigQuery and Apache Spark.

Limitations

The following limitations apply to the metadata cache:

What's next

Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.

Last updated 2025-08-07 UTC.

[[["Easy to understand","easyToUnderstand","thumb-up"],["Solved my problem","solvedMyProblem","thumb-up"],["Other","otherUp","thumb-up"]],[["Hard to understand","hardToUnderstand","thumb-down"],["Incorrect information or sample code","incorrectInformationOrSampleCode","thumb-down"],["Missing the information/samples I need","missingTheInformationSamplesINeed","thumb-down"],["Other","otherDown","thumb-down"]],["Last updated 2025-08-07 UTC."],[],[]]


RetroSearch is an open source project built by @garambo | Open a GitHub Issue

Search and Browse the WWW like it's 1997 | Search results from DuckDuckGo

HTML: 3.2 | Encoding: UTF-8 | Version: 0.7.4