Stay organized with collections Save and categorize content based on your preferences.
Continuous materialized view queriesPreview
This product or feature is subject to the "Pre-GA Offerings Terms" in the General Service Terms section of the Service Specific Terms. Pre-GA products and features are available "as is" and might have limited support. For more information, see the launch stage descriptions.
To create a continuous materialized view of a Bigtable table, you run a SQL query that defines the continuous materialized view.
This document describes concepts and patterns to help you prepare your continuous materialized view SQL query. Before you read this document, you should be familiar with Continuous materialized views and GoogleSQL for Bigtable.
Continuous materialized views use a restricted SQL syntax. Queries must use the following pattern:
SELECT
expression AS alias [, ...]
FROM from_item
[ WHERE bool_expression ]
GROUP BY expression [, ...]
from_item:
{
table_name [ as_alias ]
| field_path
}
as_alias:
[ AS ] alias
Query limitations
The following rules apply to a SQL query used to create a continuous materialized view:
SELECT
statementGROUP BY
clauseYou can use the following aggregation functions in a SQL query that defines a continuous materialized view:
COUNT
SUM
MIN
MAX
HLL_COUNT.INIT
HLL_COUNT.MERGE
HLL_COUNT.MERGE_PARTIAL
ANY_VALUE
BIT_AND
BIT_OR
BIT_XOR
AVG
If you SELECT COUNT(*)
you must define a row key, like in the following example:
SELECT
'*' AS _key,
COUNT(*) AS count
FROM
foo
GROUP BY
_key;
Unsupported SQL features
You can't use the following SQL features:
ARRAY
ARRAY_AGG
ARRAY_CONCAT_AGG
COUNT_IF
CURRENT_TIME
and other non-deterministic functionsDATE
, DATETIME
as output columns (Use TIMESTAMP
or store a string.)DESC
sort in the outputDISTINCT
option, as in SUM(*DISTINCT* value)
)LIMIT/OFFSET
ORDER BY
SELECT *
OVER
clause to create a windowing aggregationSTRUCT
You also can't nest GROUP BY
clauses or create map columns. For additional limitations, see Limitations.
Input rows are excluded from a continuous materialized view in the following circumstances:
SELECT apple AS apples , SUM(banana) AS sum_bananas FROM my_table GROUP BY apples
, then any row that contains more than 1MiB of data in the apple
and banana
columns is excluded from the continuous materialized view.SELECT REPEAT(apple, 1000)
or use large constants.Excluded rows increment the user errors metric when they are first processed. For more information about metrics that can help you monitor your continuous materialized views, see Metrics.
Query detailsThis section describes a continuous materialized view query and how the results might look when the view is queried. Data in the source table is the input, and the result data in the continuous materialized view is the output. Output data is either aggregated or unaggregated (in the defined key).
SELECT statementThe select clause configures the columns and aggregations used in the continuous materialized view and must use a GROUP BY
clause.
SELECT *
is not supported, but SELECT COUNT(*)
is.
As in a typical SELECT
statement, you can have multiple aggregations per a grouped set of data. The ungrouped columns must be an aggregation result.
This is an example of a standard GROUP BY
aggregation query in SQL:
SELECT
myfamily["node"] AS node,
myfamily["type"] AS type,
COUNT(clicks) AS clicks_per_key
FROM
mytable
GROUP BY
node,
type
Row keys and unaggregated data
You can optionally specify a _key
output column as when you define your continuous materialized view. This is different from the _key
column that you get when you execute a SQL query on a Bigtable table. If you specify a _key
, the following rules apply:
_key
, and you can't group by anything else except (optionally) by _timestamp
. For more information, see Timestamps._key
column must be of type BYTES
.Specifying a _key
is useful if you plan to read the view with ReadRows
rather than with SQL, because it gives you control over the row key format. On the other hand, a SQL query to a view with a defined _key
might need to decode the _key
explicitly instead of just returning structured key columns.
If you're not using _key
, the unaggregated columns in your SELECT statement become the key in the continuous materialized view, and you can assign the key columns any names supported by SQL conventions.
Your SQL filter must eliminate potential NULL
or other invalid values that can cause errors. An invalid row is omitted from the results and is counted in the materialized_view/user_errors
metric. To debug user errors, try running the SQL query outside of a continuous materialized view.
Unaggregated output columns must be in the GROUP BY
clause. The order in which columns are written in the GROUP BY
clause is the order in which the data is stored in the continuous materialized view row key. For example, GROUP BY a, b, c
is implicitly ORDER BY a ASC, b ASC, c ASC
.
Aggregate columns in the query define the calculations that generate the data in the continuous materialized view.
The alias for an aggregate column is treated as a column qualifier in the continuous materialized view.
Consider the following example:
SELECT
fam["baz"] AS baz,
SUM(fam["foo"]) AS sum_foo,
SUM(fam["bar"]) AS sum_bar
FROM
TABLE
GROUP BY
baz;
The query output has the following characteristics:
baz
is in a separate row in baz ASC
order.baz
has at least one foo
, then the output row's sum_foo
is a non-NULL value.baz
has at least one bar
, then the output row's sum_bar
is a non-NULL value.baz
has no value for either column, it's omitted from the results.Then if you query the view with SELECT *
, the result looks similar to the following:
The default timestamp for an output cell in a continuous materialized view is 0 (1970-01-01 00:00:00Z
). This is visible when you read the view with ReadRows
and not when you query it with SQL.
To use a different timestamp in the output, you can add a column of the TIMESTAMP
type to the SELECT
list of the query and name it _timestamp
. If you query the continuous materialized view using ReadRows
, _timestamp
becomes the timestamp for the other cells in the row.
A timestamp must not be NULL
, must be greater than or equal to zero, and must be a multiple of 1,000 (millisecond precision). Bigtable doesn't support cell timestamps earlier than the Unix epoch (1970-01-01T00:00:00Z).
Consider the following example, which resamples aggregate data by day. The query uses the UNPACK
function.
SELECT
_key,
TIMESTAMP_TRUNC(_timestamp, DAY) AS _timestamp,
SUM(sum_family["sum_column"]) AS sum_column,
SUM(sum_family["foo"]) AS second_sum_column
FROM
UNPACK(
SELECT
*
FROM
my_table(with_history => TRUE))
GROUP BY
1,
2
If a given SUM
has non-empty input for a given day, then the output row contains an aggregated value with a timestamp that matches the truncated day.
If you query the view with SELECT *
, the result looks similar to the following:
If you query your continuous materialized view with SQL, you don't need to be aware of how aggregated values are encoded because SQL exposes the results as typed columns.
If you read from the view using ReadRows
, you need to decode the aggregated data in your read request. For more information on ReadRows
requests, see Reads.
Aggregated values in a continuous materialized view are stored using encoding described in the following table, based on the output type of the column from the view definition.
Type Encoding BOOL 1 byte value, 1 = true, 0 = false BYTES No encoding INT64 (or INT, SMALLINT, INTEGER, BIGINT, TINYINT, BYTEINT) 64-bit big-endian FLOAT64 64-bit IEEE 754, excluding NaN and +/-inf STRING UTF-8 TIME/TIMESTAMP 64-bit integer representing the number of microseconds since the Unix epoch (consistent with GoogleSQL)For more information, see
Encodingin the Data API reference.
What's nextRetroSearch is an open source project built by @garambo | Open a GitHub Issue
Search and Browse the WWW like it's 1997 | Search results from DuckDuckGo
HTML:
3.2
| Encoding:
UTF-8
| Version:
0.7.3