Stay organized with collections Save and categorize content based on your preferences.
Best practices for functionsThis document describes how to optimize queries that use SQL functions.
Optimize string comparisonBest practice: When possible, use LIKE
instead of REGEXP_CONTAINS
.
In BigQuery, you can use the REGEXP_CONTAINS
function or the LIKE
operator to compare strings. REGEXP_CONTAINS
provides more functionality, but also has a slower execution time. Using LIKE
instead of REGEXP_CONTAINS
is faster, particularly if you don't need the full power of regular expressions that REGEXP_CONTAINS
provides, for example wildcard matching.
Consider the following use of the REGEXP_CONTAINS
function:
SELECT dim1 FROM `dataset.table1` WHERE REGEXP_CONTAINS(dim1, '.*test.*');
You can optimize this query as follows:
SELECT dim1 FROM `dataset.table` WHERE dim1 LIKE '%test%';Optimize aggregation functions
Best practice: If your use case supports it, use an approximate aggregation function.
If the SQL aggregation function you're using has an equivalent approximation function, the approximation function yields faster query performance. For example, instead of using COUNT(DISTINCT)
, use APPROX_COUNT_DISTINCT
. For more information, see approximate aggregation functions.
You can also use HyperLogLog++
functions to do approximations (including custom approximate aggregations). For more information, see HyperLogLog++ functions in the GoogleSQL reference.
Consider the following use of the COUNT
function:
SELECT dim1, COUNT(DISTINCT dim2) FROM `dataset.table` GROUP BY 1;
You can optimize this query as follows:
SELECT dim1, APPROX_COUNT_DISTINCT(dim2) FROM `dataset.table` GROUP BY 1;Optimize quantile functions
Best practice: When possible, use APPROX_QUANTILE
instead of NTILE
.
Running a query that contains the NTILE
function can fail with a Resources exceeded
error if there are too many elements to ORDER BY
in a single partition, which causes data volume to grow. The analytic window isn't partitioned, so the NTILE
computation requires a global ORDER BY
for all rows in the table to be processed by a single worker/slot.
Try using APPROX_QUANTILES
instead. This function allows the query to run more efficiently because it doesn't require a global ORDER BY
for all rows in the table.
Consider the following use of the NTILE
function:
SELECT individual_id, NTILE(nbuckets) OVER (ORDER BY sales desc) AS sales_third FROM `dataset.table`;
You can optimize this query as follows:
WITH QuantInfo AS ( SELECT o, qval FROM UNNEST(( SELECT APPROX_QUANTILES(sales, nbuckets) FROM `dataset.table` )) AS qval WITH offset o WHERE o > 0 ) SELECT individual_id, (SELECT (nbuckets + 1) - MIN(o) FROM QuantInfo WHERE sales <= QuantInfo.qval ) AS sales_third FROM `dataset.table`;
The optimized version gives similar but not identical results to the original query, because APPROX_QUANTILES
:
Best practice: Use SQL UDFs for simple calculations because the query optimizer can apply optimizations to SQL UDF definitions. Use Javascript UDFs for complex calculations that are not supported by SQL UDF.
Calling a JavaScript UDF requires the instantiation of a subprocess. Spinning up this process and running the UDF directly impacts query performance. If possible, use a native (SQL) UDF instead.
Persistent UDFsIt is better to create persistent user-defined SQL and JavaScript functions in a centralized BigQuery dataset that can be invoked across queries and in logical views, as opposed to creating and calling a UDF in code each time. Creating org-wide libraries of business logic within shared datasets helps optimize performance and use fewer resources.
The following example shows how a temporary UDF is invoked in a query:
CREATE TEMP FUNCTION addFourAndDivide(x INT64, y INT64) AS ((x + 4) / y); WITH numbers AS (SELECT 1 as val UNION ALL SELECT 3 as val UNION ALL SELECT 4 as val UNION ALL SELECT 5 as val) SELECT val, addFourAndDivide(val, 2) AS result FROM numbers;
You can optimize this query by replacing the temporary UDF with a persistent one:
WITH numbers AS (SELECT 1 as val UNION ALL SELECT 3 as val UNION ALL SELECT 4 as val UNION ALL SELECT 5 as val) SELECT val, `your_project.your_dataset.addFourAndDivide`(val, 2) AS result FROM numbers;
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2025-08-07 UTC.
[[["Easy to understand","easyToUnderstand","thumb-up"],["Solved my problem","solvedMyProblem","thumb-up"],["Other","otherUp","thumb-up"]],[["Hard to understand","hardToUnderstand","thumb-down"],["Incorrect information or sample code","incorrectInformationOrSampleCode","thumb-down"],["Missing the information/samples I need","missingTheInformationSamplesINeed","thumb-down"],["Other","otherDown","thumb-down"]],["Last updated 2025-08-07 UTC."],[[["For string comparisons, favor using `LIKE` over `REGEXP_CONTAINS` whenever possible, as it offers faster execution, especially if you don't require the full capabilities of regular expressions."],["If your requirements allow for it, utilize approximate aggregation functions like `APPROX_COUNT_DISTINCT` instead of `COUNT(DISTINCT)` to improve query performance."],["Replace the `NTILE` function with `APPROX_QUANTILES` when dealing with large datasets to avoid `Resources exceeded` errors and improve query efficiency, recognizing that `APPROX_QUANTILES` provides approximate results."],["When creating user-defined functions (UDFs), opt for SQL UDFs over JavaScript UDFs for simple calculations, as SQL UDFs are more readily optimized by the query optimizer; reserve JavaScript UDFs for more complex calculations."],["It is better to create persistent UDFs, and make use of them in a centralized dataset to reduce the amount of resources needed, and optimize performance."]]],[]]
RetroSearch is an open source project built by @garambo | Open a GitHub Issue
Search and Browse the WWW like it's 1997 | Search results from DuckDuckGo
HTML:
3.2
| Encoding:
UTF-8
| Version:
0.7.4