Stay organized with collections Save and categorize content based on your preferences.
Working with geospatial dataGeospatial analytics let you analyze geographic data in BigQuery. Geographic data is also known as geospatial data.
Common types of objects when working with geospatial data include the following:
In BigQuery, the GEOGRAPHY
data type represents a geometry value or geometry collection. To represent spatial features, create a table with a GEOGRAPHY
column for the geometry plus additional columns for the attributes. Each row of the table is a spatial feature, and the entire table represents a spatial feature collection.
The GEOGRAPHY
data type describes a point set on the Earth's surface. A point set is a set of points, lines, and polygons on the WGS84 reference spheroid, with geodesic edges. You can use the GEOGRAPHY
data type by calling one of the GoogleSQL geography functions.
Single points on Earth can be described by just a longitude, latitude pair. For example, you can load a CSV file that contains longitude and latitude values and then use the ST_GEOGPOINT
function to convert them into GEOGRAPHY
values.
For more complex geographies, you can load the following geospatial data formats into a GEOGRAPHY
column:
WKT is a text format for describing individual geometry shapes using points, lines, polygons with optional holes, or a collection of points, lines, or polygons. WKB is the binary version of the WKT format. WKB can be hex encoded for formats that don't support binary data, like JSON.
For example, the following defines a point in WKT:
POINT(-121 41)
To describe a spatial feature, WKT is usually embedded in a container file format, such as a CSV file, or in a database table. A file row or a table row usually corresponds to the spatial feature. The whole file or the whole table corresponds to the feature collection. To load WKT data into BigQuery, provide a schema that specifies a GEOGRAPHY
column for the geospatial data.
GEOGRAPHY
value. If auto-detection is enabled, then BigQuery loads the data as a STRING
value.
For example, you might have a CSV file that contains the following data:
"POLYGON((-124.49 47.35,-124.49 40.73,-116.49 40.73,-116.49 47.35,-124.49 47.35))",poly1
"POLYGON((-85.6 31.66,-85.6 24.29,-78.22 24.29,-78.22 31.66,-85.6 31.66))",poly2
"POINT(1 2)",point1
You can load this file by running the bq command-line tool load
command:
bq load --source_format=CSV \
--schema="geography:GEOGRAPHY,name:STRING" \
mydataset.mytable filename1.csv
For more information about loading data in BigQuery, see Introduction to loading data.
To stream WKT data to an existing BigQuery table with a GEOGRAPHY
column, serialize the data as a string in the API request.
For more information about streaming data in BigQuery, see Streaming data into BigQuery.
You can also convert a WKT text string into a GEOGRAPHY
value by using the ST_GEOGFROMTEXT
function.
GeoJSON is a JSON-based format for geometries and spatial features. For example, the following defines a point in GeoJSON:
{ "type": "Point", "coordinates": [-121,41] }
GeoJSON data can contain any of the following object types:
There are two ways to load GeoJSON data into BigQuery:
A newline-delimited GeoJSON file contains a list of GeoJSON feature objects, one per line in the file. A GeoJSON feature object is a JSON object with the following members:
type
. For feature objects, the value must be Feature
. BigQuery validates the value but does not include it in the table schema.
geometry
. The value is a GeoJSON Geometry
object or null
. BigQuery converts this member into a GEOGRAPHY
value.
properties
. The value is any JSON object or null. If the value isn't null
, then BigQuery loads each member of the JSON object as a separate table column. For more information about how BigQuery parses JSON data types, see Details of loading JSON data.
id
. Optional. If present, the value is either a string or a number. BigQuery loads this value into a column named id
.
If the feature object contains other members that are not listed here, then BigQuery converts those members directly into table columns.
You can load a newline-delimited GeoJSON file by using the bq command-line tool's bq load
command, as follows:
bq load \ --source_format=NEWLINE_DELIMITED_JSON \ --json_extension=GEOJSON \ --autodetect \ DATASET.TABLE \ FILE_PATH_OR_URI
Replace the following:
DATASET
is the name of your dataset.TABLE
is the name of the destination table.FILE_PATH_OR_URI
is a path to a local file or a Cloud Storage URI.The previous example enables schema auto-detection. For more control over how BigQuery converts the values inside the properties
object, you can provide an explicit schema instead. For more information, see Specify schemas. If you provide an explicit schema, then don't include a top-level type
column in the schema definition. For each member of the properties
member, define separate columns, not a single nested column.
As defined by RFC 7946, a complete GeoJSON data structure is a single JSON object. Many systems export GeoJSON data as a single FeatureCollection
object that contains all of the geometries. To load this format into BigQuery, you must convert the file by removing the root-level FeatureCollection
object and splitting the individual feature objects into separate lines. For example, the following command uses the jq
command-line tool to split a GeoJSON file into newline- delimited format:
cat ~/file1.json | jq -c '.features[]' > converted.json
Creating an external table from a newline-delimited GeoJSON file
You can query a newline-delimited GeoJSON file stored in Cloud Storage by creating an external table. To create the external table, use the CREATE EXTERNAL TABLE
DDL statement. In the OPTIONS
clause, set the format
option to NEWLINE_DELIMITED_JSON
and the json_extension
option to GEOJSON
.
Example:
CREATE EXTERNAL TABLE mydataset.table1 OPTIONS (
format="NEWLINE_DELIMITED_JSON",
json_extension = 'GEOJSON',
uris = ['gs://mybucket/geofile.json']
);
Loading GeoJSON geometry data
Geospatial analytics supports loading individual GeoJSON geometry objects that are embedded as text strings in other file types. For example, you can load a CSV file where one of the columns contains a GeoJSON geometry object.
To load this type of GeoJSON data into BigQuery, provide a schema that specifies a GEOGRAPHY
column for the GeoJSON data. You must manually provide the schema. Otherwise, if auto-detection is enabled, then BigQuery loads the data as a STRING
value.
Geospatial analytics does not support loading GeoJSON feature objects or feature collections using this approach. If you need to load feature objects, then consider using newline-delimited GeoJSON files.
To stream GeoJSON data to an existing BigQuery table with a GEOGRAPHY
column, serialize the data as a string in the API request.
You can also convert a GeoJSON geometry object into a GEOGRAPHY
value by using the ST_GEOGFROMGEOJSON
function. For example, you can store the geometries as STRING
values and then run a query that calls ST_GEOGFROMGEOJSON
.
GeoParquet is a specification that adds geospatial types to the Parquet file format. GeoParquet includes metadata that provides definite semantics to the contained geospatial data, avoiding the interpretation issues that occur with other geospatial data formats.
When loading Parquet files, BigQuery checks for GeoParquet metadata. If GeoParquet metadata exists, BigQuery loads all of the columns it describes into a corresponding GEOGRAPHY
column by default. For more information about loading Parquet files, see Loading Parquet data.
GEOGRAPHY
columns, contact support. Creating an external table from GeoParquet data
External tables that reference GeoParquet files map relevant columns to the GEOGRAPHY
type.
Statistics available in the GeoParquet file (bbox
, covering
) aren't used to accelerate queries over an external table.
In geospatial analytics, points are positions on the surface of a WGS84 spheroid, expressed as longitude and geodetic latitude. An edge is a spherical geodesic between two endpoints. (That is, edges are the shortest path on the surface of a sphere.)
The WKT format does not provide a coordinate system. When loading WKT data, geospatial analytics assumes the data uses WGS84 coordinates with spherical edges. Make sure your source data matches that coordinate system, unless the geographies are small enough that the difference between spherical and planar edges can be ignored.
GeoJSON explicitly uses WGS84 coordinates with planar edges. When loading GeoJSON data, geospatial analytics converts planar edges to spherical edges. Geospatial analytics adds additional points to the line as necessary, so that the converted sequence of edges remains within 10 meters of the original line. This process is known as tessellation or non-uniform densification. You cannot directly control the tessellation process.
To load geographies with spherical edges, use WKT. To load geographies with planar edges, often called geometries, it's simplest to use GeoJSON. However, if your geometry data is already in WKT format, another option is to load the data as a STRING
type and then use the ST_GEOGFROMTEXT
function to convert to GEOGRAPHY
values. Set the planar
parameter to TRUE
to interpret the data as planar.
GeoParquet files include metadata about the coordinate system and edges that were used to create the data. When reading GeoParquet files with planar edges, geospatial analytics converts planar edges to spherical edges. GeoParquet files with coordinate systems other than WGS84 are rejected.
When choosing an interchange format, be sure to understand the coordinate system used by your source data. Most systems either explicitly support parsing geography (as opposed to geometry) from WKT, or else they assume planar edges.
Your coordinates should be longitude first, latitude second. If the geography has any long segments or edges then they must be tessellated, because geospatial analytics interprets them as spherical geodesics, which may not correspond to the coordinate system where your data originated.
Polygon orientationOn a sphere, every polygon has a complementary polygon. For example, a polygon that describes the Earth's continents would have a complementary polygon that describes the Earth's oceans. Because the two polygons are described by the same boundary rings, rules are required to resolve the ambiguity around which of the two polygons is described by a given WKT string.
When you load WKT and WKB strings from files or by using streaming ingestion, geospatial analytics assumes the polygons in the input are oriented as follows: If you traverse the boundary of the polygon in the order of the input vertices, the interior of the polygon is on the left. Geospatial analytics uses the same rule when exporting geography objects to WKT and WKB strings.
If you use the ST_GEOGFROMTEXT
function to convert a WKT string to a GEOGRAPHY
value, the oriented
parameter specifies how the function determines the polygon:
FALSE
: Interpret the input as the polygon with the smaller area. This is the default behavior.
TRUE
: Use the left-hand orientation rule described previously. This option allows you to load polygons with an area larger than a hemisphere.
Because GeoJSON strings are defined on a planar map, the orientation can be determined without ambiguity, even if the input does not follow the orientation rule defined in the GeoJSON format specification, RFC 7946.
Handling improperly formatted spatial dataWhen you load spatial data from other tools into BigQuery, you might encounter conversion errors due to invalid WKT or GeoJSON data. For example, an error such as Edge K has duplicate vertex with edge N
indicates that the polygon has duplicate vertices (besides the first and last).
To avoid formatting issues, you can use a function that generates standards-compliant output. For example, when you export data from PostGIS, you can use the PostGIS ST_MakeValid
function to standardize the output. Alternatively, import your data as text and then convert it by calling ST_GEOGFROMTEXT
or ST_GEOGFROMGEOJSON
with the make_valid
parameter. When make_valid
is TRUE
, these functions attempt to repair invalid polygons.
To find or to ignore the improperly formatted data, use the SAFE
function prefix to output the problematic data. For example, the following query uses the SAFE
prefix to retrieve improperly formatted spatial data.
SELECT geojson AS bad_geojson FROM mytable WHERE geojson IS NOT NULL AND SAFE.ST_GEOGFROMGEOJSON(geojson) IS NULLConstraints
Geospatial analytics does not support the following features in geospatial formats:
See ST_GEOGFROMGEOJSON
and ST_GEOGFROMTEXT
for constraints specific to GeoJson and WKT input formats.
Geospatial insights are often presented as grid-based, or raster, data. Raster data organizes regionally continuous data, such as satellite imagery, weather forecasts, and land cover, into a grid of pixels. Although BigQuery primarily specializes in tabular vector data, representing features with boundaries and points, you can integrate raster data into your BigQuery analyses by using the ST_REGIONSTATS
function. This function uses Earth Engine, Google's raster analysis platform, to perform computations and aggregations on raster data for enhanced geospatial analysis. For more information, see Work with raster data.
For information about exporting Earth Engine data to BigQuery, see Exporting to BigQuery. For more information about integrations between Earth Engine and BigQuery, see BigQuery integration in the Earth Engine documentation.
Transforming geospatial dataIf your table contains separate columns for longitude and latitude, you can transform the values into geographies by using GoogleSQL geography functions such as ST_GEOGPOINT
. For example, if you have two DOUBLE
columns for longitude and latitude, you can create a geography column with the following query:
SELECT *, ST_GEOGPOINT(longitude, latitude) AS g FROM mytable
BigQuery can convert WKT and GeoJSON strings to geography types. If your data is in another format such as Shapefiles, use an external tool to convert the data to a supported input file format, such as a CSV file, with GEOGRAPHY
columns encoded as WKT or GeoJSON strings.
You can partition and cluster tables that contain GEOGRAPHY
columns. You can use a GEOGRAPHY
column as a clustering column, but you cannot use a GEOGRAPHY
column as a partitioning column.
If you store GEOGRAPHY
data in a table and your queries filter data by using a spatial predicate, ensure that the table is clustered by the GEOGRAPHY
column. This typically improves query performance and might reduce cost. A spatial predicate calls a boolean geography function and has a GEOGRAPHY
column as one of the arguments. The following sample shows a spatial predicate that uses the ST_DWITHIN
function:
WHERE ST_DWITHIN(geo, ST_GeogPoint(longitude, latitude), 100)Using JOINs with spatial data
Spatial JOINs are joins of two tables with a predicate geographic function in the WHERE
clause. For example:
-- how many stations within 1 mile range of each zip code? SELECT zip_code AS zip, ANY_VALUE(zip_code_geom) AS polygon, COUNT(*) AS bike_stations FROM `bigquery-public-data.new_york.citibike_stations` AS bike_stations, `bigquery-public-data.geo_us_boundaries.zip_codes` AS zip_codes WHERE ST_DWITHIN( zip_codes.zip_code_geom, ST_GEOGPOINT(bike_stations.longitude, bike_stations.latitude), 1609.34) GROUP BY zip ORDER BY bike_stations DESC
Spatial joins perform better when your geography data is persisted. The example above creates the geography values in the query. It is more performant to store the geography values in a BigQuery table.
For example, the following query retrieves longitude, latitude pairs and converts them to geographic points. When you run this query, you specify a new destination table to store the query results:
SELECT *, ST_GEOGPOINT(pLongitude, pLatitude) AS p FROM mytable
BigQuery implements optimized spatial JOINs for INNER JOIN and CROSS JOIN operators with the following GoogleSQL predicate functions:
Spatial joins are not optimized:
LEFT
, RIGHT
or FULL OUTER
joinsA JOIN
that uses the ST_DWITHIN
predicate is optimized only when the distance parameter is a constant expression.
When you export spatial data from BigQuery, GEOGRAPHY
column values are always formatted as WKT strings. To export data in GeoJSON format, use the ST_ASGEOJSON
function.
If the tools you're using to analyze the exported data don't understand the GEOGRAPHY
data type, you can convert the column values to strings using a geographic function such as ST_ASTEXT
or ST_ASGEOJSON
. Geospatial analytics adds additional points to the line where necessary so that the converted sequence of edges remains within 10 meters of the original geodesic line.
For example, the following query uses ST_ASGEOJSON
to convert GeoJSON values to strings.
SELECT ST_ASGEOJSON(ST_MAKELINE(ST_GEOGPOINT(1,1), ST_GEOGPOINT(3,2)))
The resulting data would look like the following:
{ "type": "LineString", "coordinates": [ [1, 1], [1.99977145571783, 1.50022838764041], [2.49981908082299, 1.75018082434274], [3, 2] ] }
The GeoJSON line has two additional points. Geospatial analytics adds these points so that the GeoJSON line closely follows the same path on the ground as the original line.
What's nextRetroSearch is an open source project built by @garambo | Open a GitHub Issue
Search and Browse the WWW like it's 1997 | Search results from DuckDuckGo
HTML:
3.2
| Encoding:
UTF-8
| Version:
0.7.4