This page provides guidance on migrating an open source PostgreSQL database to Spanner.
Migration involves the following tasks:

- Mapping your PostgreSQL schema to a Spanner schema
- Creating a Spanner instance and database
- Refactoring your applications and data access layers
- Migrating your data
- Moving to the new system
This page also provides some example schemas using tables from the MusicBrainz PostgreSQL database.
Map your PostgreSQL schema to Spanner

Your first step in moving a database from PostgreSQL to Spanner is to determine what schema changes you must make. Use pg_dump to create Data Definition Language (DDL) statements that define the objects in your PostgreSQL database, and then modify the statements as described in the following sections. After you update the DDL statements, use them to create your database in a Spanner instance.
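For example, a typical invocation (the database name mydb is a placeholder) that dumps only the schema, without ownership or privilege statements:

pg_dump --schema-only --no-owner --no-privileges mydb > schema.sql

You can then edit schema.sql by hand, applying the type mappings and key changes described in the following sections.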
The following table describes how PostgreSQL data types map to Spanner data types. Update the data types in your DDL statements from PostgreSQL data types to Spanner data types.
PostgreSQL | Spanner
---------- | -------
bigint, int8 | INT64
bigserial, serial8 | INT64. Note: There is no auto-increment capability in Spanner.
bit [ (n) ] | ARRAY<BOOL>
bit varying [ (n) ], varbit [ (n) ] | ARRAY<BOOL>
boolean, bool | BOOL
box | ARRAY<FLOAT64>
bytea | BYTES
character [ (n) ], char [ (n) ] | STRING
character varying [ (n) ], varchar [ (n) ] | STRING
cidr | STRING, using standard CIDR notation.
circle | ARRAY<FLOAT64>
date | DATE
double precision, float8 | FLOAT64
inet | STRING
integer, int, int4 | INT64
interval [ fields ] [ (p) ] | INT64 if storing the value in milliseconds, or STRING if storing the value in an application-defined interval format.
json | STRING
jsonb | JSON
line | ARRAY<FLOAT64>
lseg | ARRAY<FLOAT64>
macaddr | STRING, using standard MAC address notation.
money | INT64, or STRING for arbitrary precision numbers.
numeric [ (p, s) ], decimal [ (p, s) ] | NUMERIC. In PostgreSQL, the NUMERIC and DECIMAL data types support up to 2^17 digits of precision and up to 2^14 - 1 digits of scale, as defined in the column declaration. The Spanner NUMERIC data type supports up to 38 digits of precision and 9 decimal digits of scale. If you require greater precision, see Storing arbitrary precision numeric data for alternative mechanisms.
path | ARRAY<FLOAT64>
pg_lsn | This data type is PostgreSQL-specific, so there isn't a Spanner equivalent.
point | ARRAY<FLOAT64>
polygon | ARRAY<FLOAT64>
real, float4 | FLOAT64
smallint, int2 | INT64
smallserial, serial2 | INT64
serial, serial4 | INT64
text | STRING
time [ (p) ] [ without time zone ] | STRING, using HH:MM:SS.sss notation.
time [ (p) ] with time zone, timetz | STRING, using HH:MM:SS.sss+ZZZZ notation. Alternatively, split the value into two columns, one of type TIMESTAMP and another holding the time zone.
timestamp [ (p) ] [ without time zone ] | No equivalent. You can store the value as a STRING or a TIMESTAMP at your discretion.
timestamp [ (p) ] with time zone, timestamptz | TIMESTAMP
tsquery | No equivalent. Define a storage mechanism in your application instead.
tsvector | No equivalent. Define a storage mechanism in your application instead.
txid_snapshot | No equivalent. Define a storage mechanism in your application instead.
uuid | STRING or BYTES
xml | STRING
Primary keys
For tables in your Spanner database that you frequently append to, avoid using primary keys that monotonically increase or decrease, as this approach causes hotspots during writes. Instead, modify the DDL CREATE TABLE
statements so that they use supported primary key strategies. If you are using a PostgreSQL feature such as a UUID
data type or function, SERIAL
data types, IDENTITY
column, or sequence, you can use the auto-generated key migration strategies that we recommend.
Note that after you designate your primary key, you can't add or remove a primary key column, or change a primary key value later without deleting and recreating the table. For more information on how to designate your primary key, see Schema and data model - primary keys.
During migration, you might need to keep some existing monotonically increasing integer keys. If you need to keep these kinds of keys on a frequently updated table with a lot of operations on these keys, you can avoid creating hotspots by prefixing the existing key with a pseudo-random number. This technique causes Spanner to redistribute the rows. See What DBAs need to know about Spanner, part 1: Keys and indexes for more information on using this approach.
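As a sketch of this technique, assuming a hypothetical orders table whose existing monotonically increasing order_id values must be kept, a stored generated column can compute the pseudo-random prefix from the key itself (FARM_FINGERPRINT, MOD, and stored generated columns are GoogleSQL features; the table and the shard count of 16 are illustrative):

CREATE TABLE orders (
  shard_id INT64 NOT NULL AS (MOD(ABS(FARM_FINGERPRINT(CAST(order_id AS STRING))), 16)) STORED,
  order_id INT64 NOT NULL,  -- existing monotonically increasing key, kept as-is
  customer STRING(MAX),
) PRIMARY KEY (shard_id, order_id);

The example schemas at the end of this page apply the same idea by prefixing each id with a hashed_id column. Note that lookups by order_id alone must either recompute the prefix or fan out across all 16 shard values.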
Foreign keys and referential integrity

Learn about foreign key support in Spanner.
Indexes

PostgreSQL b-tree indexes are similar to secondary indexes in Spanner. In a Spanner database, you use secondary indexes to index commonly searched columns for better performance, and to replace any UNIQUE constraints specified in your tables. For example, if your PostgreSQL DDL has this statement:
CREATE TABLE customer (
id CHAR (5) PRIMARY KEY,
first_name VARCHAR (50),
last_name VARCHAR (50),
email VARCHAR (50) UNIQUE
);
You would use these statements in your Spanner DDL:
CREATE TABLE customer (
id STRING(5),
first_name STRING(50),
last_name STRING(50),
email STRING(50)
) PRIMARY KEY (id);
CREATE UNIQUE INDEX customer_emails ON customer(email);
You can find the indexes for any of your PostgreSQL tables by running the \di meta-command in psql.
After you determine the indexes that you need, add CREATE INDEX statements to create them. Follow the guidance at Creating indexes.
Spanner implements indexes as tables, so indexing monotonically increasing columns (like those containing TIMESTAMP data) can cause a hotspot. See What DBAs need to know about Spanner, part 1: Keys and indexes for more information on methods to avoid hotspots.
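The same sharding idea applies to indexes. A sketch, assuming a hypothetical events table: lead the index with a computed shard column so that rows with adjacent timestamps land in different index ranges (FARM_FINGERPRINT and stored generated columns are GoogleSQL features; the shard count of 16 is illustrative):

CREATE TABLE events (
  event_id STRING(36) NOT NULL,
  event_time TIMESTAMP NOT NULL,
  shard_id INT64 NOT NULL AS (MOD(ABS(FARM_FINGERPRINT(event_id)), 16)) STORED,
  payload STRING(MAX),
) PRIMARY KEY (event_id);

CREATE INDEX events_by_time ON events(shard_id, event_time);

Queries over a time range must then include every shard value, for example WHERE shard_id BETWEEN 0 AND 15 AND event_time > @since.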
Check constraints

Learn about CHECK constraint support in Spanner.
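For example, the length check on the recording table shown later on this page could be retained in Spanner with a named constraint (a sketch; the constraint name is arbitrary):

ALTER TABLE recording
    ADD CONSTRAINT length_is_positive CHECK (length IS NULL OR length > 0);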
You must create the functionality of the following objects in your application logic:

- serial data types as sequence generators
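As one sketch of such a replacement, you could switch from serial values to UUID keys generated by the database; GENERATE_UUID() and column DEFAULT expressions are GoogleSQL features, and the table here is hypothetical:

CREATE TABLE customer_account (
  id STRING(36) NOT NULL DEFAULT (GENERATE_UUID()),  -- replaces a PostgreSQL serial/sequence value
  name STRING(MAX) NOT NULL,
) PRIMARY KEY (id);

Because UUIDs are not ordered, they also avoid the write hotspots described in the primary keys section.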
After you update your DDL statements to conform to Spanner schema requirements, use them to create your database in Spanner.
Create a Spanner instance. Follow the guidance in Instances to determine the correct regional configuration and compute capacity to support your performance goals.
Create the database by using either the Google Cloud console or the gcloud command-line tool. With the gcloud CLI, use the gcloud spanner databases create command:

gcloud spanner databases create DATABASE_NAME --instance=INSTANCE_NAME --ddl='DDL1' --ddl='DDL2'
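For example, using the artist_credit schema from the examples at the end of this page (the instance name test-instance and database name musicbrainz are placeholders):

gcloud spanner databases create musicbrainz \
    --instance=test-instance \
    --ddl='CREATE TABLE artist_credit (
      hashed_id STRING(4),
      id INT64,
      name STRING(MAX) NOT NULL,
      artist_count INT64 NOT NULL,
      ref_count INT64,
      created TIMESTAMP OPTIONS (allow_commit_timestamp = true)
    ) PRIMARY KEY (hashed_id, id)'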
After you create the database, follow the instructions in Apply IAM roles to create user accounts and grant permissions to the Spanner instance and database.
Refactor the applications and data access layers

In addition to the code needed to replace the preceding database objects, you must add application logic to handle the following functionality:

- CHECK constraints
After you create your Spanner database and refactor your application code, you can migrate your data to Spanner.
Use the PostgreSQL COPY command to dump data to .csv files.

Upload the .csv files to Cloud Storage.
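A sketch of these two steps for a single table, assuming a source database named musicbrainz and a Cloud Storage bucket named my-migration-bucket:

psql -d musicbrainz -c "\copy artist_credit TO 'artist_credit.csv' WITH (FORMAT csv)"
gsutil cp artist_credit.csv gs://my-migration-bucket/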
Create an application to import data into Spanner. This application could use Dataflow or it could use the client libraries directly. Make sure to follow the guidance in Bulk data loading best practices to get the best performance.
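As one possible Dataflow-based import, sketched under the assumption that the GCS_Text_to_Cloud_Spanner template and its instanceId, databaseId, and importManifest parameters fit your data layout (the job name, region, and paths are placeholders; the manifest file maps each .csv file to a table):

gcloud dataflow jobs run import-musicbrainz \
    --gcs-location gs://dataflow-templates/latest/GCS_Text_to_Cloud_Spanner \
    --region us-central1 \
    --parameters instanceId=test-instance,databaseId=musicbrainz,importManifest=gs://my-migration-bucket/manifest.json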
Test all application functions against the Spanner instance to verify that they work as expected. Run production-level workloads to ensure the performance meets your needs. Update the compute capacity as needed to meet your performance goals.
Move to the new system

After you complete the initial application testing, turn up the new system using one of the following processes. Offline migration is the simplest way to migrate. However, this approach makes your application unavailable for a period of time, and it provides no rollback path if you find data issues later on. To perform an offline migration:
Start up the application that targets the Spanner database.
Live migration is possible but requires extensive changes to your application to support the migration.
Schema migration examples

These examples show the CREATE TABLE statements for several tables in the MusicBrainz PostgreSQL database schema. Each example includes both the PostgreSQL schema and the Spanner schema.

artist_credit table

GoogleSQL
CREATE TABLE artist_credit (
hashed_id STRING(4),
id INT64,
name STRING(MAX) NOT NULL,
artist_count INT64 NOT NULL,
ref_count INT64,
created TIMESTAMP OPTIONS (
allow_commit_timestamp = true
),
) PRIMARY KEY(hashed_id, id);
PostgreSQL
CREATE TABLE artist_credit (
id SERIAL,
name VARCHAR NOT NULL,
artist_count SMALLINT NOT NULL,
ref_count INTEGER DEFAULT 0,
created TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);
recording table

GoogleSQL
CREATE TABLE recording (
hashed_id STRING(36),
id INT64,
gid STRING(36) NOT NULL,
name STRING(MAX) NOT NULL,
artist_credit_hid STRING(36) NOT NULL,
artist_credit_id INT64 NOT NULL,
length INT64,
comment STRING(255) NOT NULL,
edits_pending INT64 NOT NULL,
last_updated TIMESTAMP OPTIONS (
allow_commit_timestamp = true
),
video BOOL NOT NULL,
) PRIMARY KEY(hashed_id, id);
PostgreSQL
CREATE TABLE recording (
id SERIAL,
gid UUID NOT NULL,
name VARCHAR NOT NULL,
artist_credit INTEGER NOT NULL, -- references artist_credit.id
length INTEGER CHECK (length IS NULL OR length > 0),
comment VARCHAR(255) NOT NULL DEFAULT '',
edits_pending INTEGER NOT NULL DEFAULT 0 CHECK (edits_pending >= 0),
last_updated TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
video BOOLEAN NOT NULL DEFAULT FALSE
);
recording_alias table

GoogleSQL
CREATE TABLE recording_alias (
hashed_id STRING(36) NOT NULL,
id INT64 NOT NULL,
alias_id INT64,
name STRING(MAX) NOT NULL,
locale STRING(MAX),
edits_pending INT64 NOT NULL,
last_updated TIMESTAMP NOT NULL OPTIONS (
allow_commit_timestamp = true
),
type INT64,
sort_name STRING(MAX) NOT NULL,
begin_date_year INT64,
begin_date_month INT64,
begin_date_day INT64,
end_date_year INT64,
end_date_month INT64,
end_date_day INT64,
primary_for_locale BOOL NOT NULL,
ended BOOL NOT NULL,
) PRIMARY KEY(hashed_id, id, alias_id),
INTERLEAVE IN PARENT recording ON DELETE NO ACTION;
PostgreSQL
CREATE TABLE recording_alias (
id SERIAL, --PK
recording INTEGER NOT NULL, -- references recording.id
name VARCHAR NOT NULL,
locale TEXT,
edits_pending INTEGER NOT NULL DEFAULT 0 CHECK (edits_pending >=0),
last_updated TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
type INTEGER, -- references recording_alias_type.id
sort_name VARCHAR NOT NULL,
begin_date_year SMALLINT,
begin_date_month SMALLINT,
begin_date_day SMALLINT,
end_date_year SMALLINT,
end_date_month SMALLINT,
end_date_day SMALLINT,
primary_for_locale BOOLEAN NOT NULL DEFAULT false,
ended BOOLEAN NOT NULL DEFAULT FALSE
-- CHECK constraint skipped for brevity
);