Stay organized with collections Save and categorize content based on your preferences.
Table samplingPreview
This product or feature is subject to the "Pre-GA Offerings Terms" in the General Service Terms section of the Service Specific Terms. Pre-GA products and features are available "as is" and might have limited support. For more information, see the launch stage descriptions.
Table sampling lets you query random subsets of data from large BigQuery tables. Sampling returns a variety of records while avoiding the costs associated with scanning and processing an entire table.
Using table samplingTo use table sampling in a query, include the TABLESAMPLE
clause. For example, the following query selects approximately 10% of a table's data:
SELECT * FROM dataset.my_table TABLESAMPLE SYSTEM (10 PERCENT)
Unlike the LIMIT
clause, TABLESAMPLE
returns a random subset of data from a table. Also, BigQuery does not cache the results of queries that include a TABLESAMPLE
clause, so the query might return different results each time.
You can combine the TABLESAMPLE
clause with other selection conditions. The following example samples about 50% of the table and then applies a WHERE
clause:
SELECT *
FROM dataset.my_table TABLESAMPLE SYSTEM (50 PERCENT)
WHERE customer_id = 1
The next example combines a TABLESAMPLE
clause with a JOIN
clause:
SELECT *
FROM dataset.table1 T1 TABLESAMPLE SYSTEM (10 PERCENT)
JOIN dataset.table2 T2 TABLESAMPLE SYSTEM (20 PERCENT) USING (customer_id)
For smaller tables, if you join two samples and none of the sampled rows meet the join condition, then you might receive an empty result.
You can specify the percentage as a query parameter. The next example shows how to pass the percentage to a query by using the bq command-line tool:
bq query --use_legacy_sql=false --parameter=percent:INT64:29 \
'SELECT * FROM `dataset.my_table` TABLESAMPLE SYSTEM (@percent PERCENT)`
BigQuery tables are organized into data blocks. The TABLESAMPLE
clause works by randomly selecting a percentage of data blocks from the table and reading all of the rows in the selected blocks. The sampling granularity is limited by the number of data blocks.
Typically, BigQuery splits tables or table partitions into blocks if they are larger than about 1 GB. Smaller tables might consist of a single data block. In that case, the TABLESAMPLE
clause reads the entire table. If the sampling percentage is greater than zero and the table is not empty, then table sampling always returns some results.
Blocks can be different sizes, so the exact fraction of rows that are sampled might vary. If you want to sample individual rows, rather than data blocks, then you can use a WHERE rand() < K
clause instead. However, this approach requires BigQuery to scan the entire table. To save costs but still benefit from row-level sampling, you can combine both techniques.
The following example reads approximately 20% of the data blocks from storage and then randomly selects 10% of the rows in those blocks:
SELECT * FROM dataset.my_table TABLESAMPLE SYSTEM (20 PERCENT)
WHERE rand() < 0.1
External tables
You can use the TABLESAMPLE
clause with external tables that store data in a collection of files. BigQuery samples a subset of the external files that the table references. For some file formats, BigQuery can split individual files into blocks for sampling. Some external data, such as data in Google Sheets, consists of a single file that is sampled as one block of data.
If you use table sampling with streaming inserts, then BigQuery samples data from the write-optimized storage. In some cases, all the data in the write-optimized storage is represented as a single block. When that happens, either all the data in the write-optimized storage appears in the results, or none of it does.
Partitioned and clustered tablesPartitioning and clustering produce blocks where all rows within a specific block have either the same partitioning key or have clustering attributes with close values. Therefore, sample sets from these tables tend to be more biased than sample sets from non-partitioned, non-clustered tables.
LimitationsUNNEST
operator, is not supported.IN
subquery is not supported.If you use on-demand billing, then you are charged for reading the data that is sampled. BigQuery does not cache the results of a query that includes a TABLESAMPLE
clause, so each execution incurs the cost of reading the data from storage.
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2025-08-07 UTC.
[[["Easy to understand","easyToUnderstand","thumb-up"],["Solved my problem","solvedMyProblem","thumb-up"],["Other","otherUp","thumb-up"]],[["Hard to understand","hardToUnderstand","thumb-down"],["Incorrect information or sample code","incorrectInformationOrSampleCode","thumb-down"],["Missing the information/samples I need","missingTheInformationSamplesINeed","thumb-down"],["Other","otherDown","thumb-down"]],["Last updated 2025-08-07 UTC."],[[["Table sampling allows querying random subsets of data from large BigQuery tables, providing a representative sample without the cost of scanning the entire table."],["The `TABLESAMPLE` clause, when used in a query, selects a specified percentage of data blocks from a table, returning all rows within those selected blocks."],["Unlike `LIMIT`, `TABLESAMPLE` returns a random selection, and results are not cached, which means subsequent queries may return different subsets of data."],["Table sampling can be combined with other SQL clauses like `WHERE` and `JOIN`, as well as with parameterized queries to dynamically adjust the sampling percentage."],["Table sampling is supported for external tables, but may behave differently based on file type or structure, and it's subject to several limitations, including restrictions on views, subqueries, array scans, and row-level security."]]],[]]
RetroSearch is an open source project built by @garambo | Open a GitHub Issue
Search and Browse the WWW like it's 1997 | Search results from DuckDuckGo
HTML:
3.2
| Encoding:
UTF-8
| Version:
0.7.4