
Showing content from https://docs.databricks.com/aws/en/oltp/sync-data/sync-table below:

Sync data from Unity Catalog tables to a database instance


Preview

This feature is in Public Preview in the following regions: us-east-1, us-west-2, eu-west-1, ap-southeast-1, ap-southeast-2, eu-central-1, us-east-2, ap-south-1.

This page describes how to create and manage a synced table. A synced table is a Unity Catalog read-only Postgres table that automatically synchronizes data from a Unity Catalog table to your Lakebase database instance. Syncing a Unity Catalog table into Postgres enables low-latency read queries and supports query-time joins with other Postgres tables.

The synchronization is handled by a managed Lakeflow declarative pipeline that continuously updates the Postgres table with changes from the source table. After creation, synced tables can be queried directly using Postgres tools.

The key characteristics of synced tables are as follows:

Before you begin

Create a synced table

To sync a Unity Catalog table into Postgres, do the following:

  1. Click Catalog in the workspace sidebar.

  2. Find and select the Unity Catalog table you want to create a synced table on.

  3. Click Create > Synced table.

  4. Select your catalog, schema, and enter a table name for the new synced table.

  5. Select a database instance and enter the name of the Postgres database in which to create the synced table. The Postgres database field defaults to the currently selected target catalog. If no Postgres database exists with that name, Databricks creates a new one.

  6. Select a Primary Key. A primary key is required as it enables efficient access to rows for reads, updates, and deletes.

  7. If the source table can contain multiple rows with the same primary key, select a Timeseries Key to configure deduplication. When a Timeseries Key is specified, the synced table contains only the row with the latest timeseries key value for each primary key.

  8. Select the sync mode from Snapshot, Triggered, and Continuous. In all sync modes, the entire source table is initially read and written to Postgres.

    Snapshot: The pipeline runs once to take a snapshot of the source table and copy it to the synced table. Subsequent changes to the source table are automatically reflected in the synced table by taking a new snapshot of the source and creating a new copy. The content of the synced table is updated atomically.

    Triggered: The pipeline runs once to create an initial snapshot copy of the source table in the synced table. Unlike the Snapshot sync mode, when the synced table is refreshed, only changes since the last pipeline execution are retrieved and applied to the synced table. The incremental refresh can be triggered manually or automatically according to a schedule.

    Continuous: The pipeline runs continuously. Subsequent changes to the source table are incrementally applied to the synced table in real-time streaming mode. No manual refresh is necessary.

    note

    To support Triggered or Continuous sync mode, the source table must have Change data feed enabled.

  9. Choose if you want to create this synced table from a new or existing pipeline.

  10. After the synced table's status is Online, log in to your database instance and query the newly created table. Query your table using the SQL editor, external tools, or notebooks.
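The Timeseries Key deduplication described in the steps above can be sketched in Python. This is an illustrative stand-in for the pipeline's behavior, not its actual implementation; `deduplicate`, the column names, and the sample rows are all assumptions:

```python
# Sketch of Timeseries Key semantics: for rows sharing a primary key,
# only the row with the latest timeseries value is kept.

def deduplicate(rows, primary_key, timeseries_key):
    """Keep one row per primary key: the one with the greatest timeseries value."""
    latest = {}
    for row in rows:
        pk = row[primary_key]
        if pk not in latest or row[timeseries_key] > latest[pk][timeseries_key]:
            latest[pk] = row
    return list(latest.values())

# Hypothetical source rows where id 1 appears twice:
rows = [
    {"id": 1, "updated_at": "2024-01-01", "v": "old"},
    {"id": 1, "updated_at": "2024-02-01", "v": "new"},
    {"id": 2, "updated_at": "2024-01-15", "v": "only"},
]
deduped = deduplicate(rows, "id", "updated_at")
```

With `updated_at` as the Timeseries Key, only the `"new"` version of id 1 survives in the synced table.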

Create a synced table in a database catalog.

Bash

export CATALOG_NAME="<Database catalog>"
export SRC_TBL="source_catalog.source_schema.source_table"
export DEST_TBL="$CATALOG_NAME.some_schema.synced_table"
export PKS='["id"]'
export ST_CATALOG="storage_catalog"
export ST_SCHEMA="storage_schema"

curl -X POST "https://$WORKSPACE/api/2.0/database/synced_tables" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ${DATABRICKS_TOKEN}" \
  --data-binary @- << EOF
{
  "name": "$DEST_TBL",
  "spec": {
    "source_table_full_name": "$SRC_TBL",
    "primary_key_columns": $PKS,
    "scheduling_policy": "TRIGGERED"
  },
  "new_pipeline_spec": {
    "storage_catalog": "$ST_CATALOG",
    "storage_schema": "$ST_SCHEMA"
  }
}
EOF
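If you prefer to assemble the request body programmatically rather than with a shell heredoc, a minimal Python sketch follows. The field names mirror the API call above; `synced_table_request` and the placeholder values are assumptions for illustration:

```python
import json

def synced_table_request(name, source_table, primary_keys,
                         storage_catalog, storage_schema,
                         scheduling_policy="TRIGGERED"):
    """Build the JSON body for the synced-table creation call shown above."""
    body = {
        "name": name,
        "spec": {
            "source_table_full_name": source_table,
            "primary_key_columns": primary_keys,
            "scheduling_policy": scheduling_policy,
        },
        "new_pipeline_spec": {
            "storage_catalog": storage_catalog,
            "storage_schema": storage_schema,
        },
    }
    return json.dumps(body)

# Hypothetical names, matching the placeholders in the curl example:
payload = synced_table_request(
    "pg_catalog.some_schema.synced_table",
    "source_catalog.source_schema.source_table",
    ["id"],
    "storage_catalog",
    "storage_schema",
)
```

Serializing with `json.dumps` avoids the trailing-comma pitfalls of hand-written JSON in a heredoc.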

Create a synced table in a standard Unity Catalog catalog.

Bash

export CATALOG_NAME="<Standard catalog>"
export DATABASE_INSTANCE="<database instance>"
export POSTGRES_DATABASE=$CATALOG_NAME
export SRC_TBL="source_catalog.source_schema.source_table"
export DEST_TBL="$CATALOG_NAME.some_schema.synced_table"
export PKS='["id"]'
export ST_CATALOG="storage_catalog"
export ST_SCHEMA="storage_schema"

curl -X POST "https://$WORKSPACE/api/2.0/database/synced_tables" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ${DATABRICKS_TOKEN}" \
  --data-binary @- << EOF
{
  "name": "$DEST_TBL",
  "database_instance_name": "$DATABASE_INSTANCE",
  "logical_database_name": "$POSTGRES_DATABASE",
  "spec": {
    "source_table_full_name": "$SRC_TBL",
    "primary_key_columns": $PKS,
    "scheduling_policy": "TRIGGERED"
  },
  "new_pipeline_spec": {
    "storage_catalog": "$ST_CATALOG",
    "storage_schema": "$ST_SCHEMA"
  }
}
EOF

Check the status of a synced table.

Bash

export SYNCED_TABLE="pg_db.silver.sbtest1_online"

curl -X GET \
  "https://$WORKSPACE/api/2.0/database/synced_tables/$SYNCED_TABLE" \
  -H "Authorization: Bearer ${DATABRICKS_TOKEN}"
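Creation is asynchronous, so a client typically polls the status endpoint until the synced table reports Online. A minimal Python sketch, with `fetch_status` standing in for the GET request above; the helper, its defaults, and the state strings other than ONLINE are assumptions:

```python
import time

def wait_until_online(fetch_status, max_attempts=120, interval_s=5, sleep=time.sleep):
    """Poll until the synced table reports ONLINE, or give up after max_attempts."""
    for _ in range(max_attempts):
        if fetch_status() == "ONLINE":
            return True
        sleep(interval_s)
    return False

# Example with a stubbed status sequence instead of a real API call:
statuses = iter(["PROVISIONING", "SYNCING", "ONLINE"])
online = wait_until_online(lambda: next(statuses), sleep=lambda s: None)
```

Injecting `sleep` keeps the sketch testable; in practice `fetch_status` would parse the JSON returned by the status endpoint.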
Supported operations

Only a limited set of operations are supported on the Postgres side for synced tables:

Although you are able to modify this table in other ways, doing so interferes with the synchronization pipeline.

Delete a synced table

To delete a synced table, you must delete it from Unity Catalog and then drop the table in the database instance. Deleting the synced table from Unity Catalog deregisters the table and stops any data refreshes. However, the table remains in the underlying Postgres database. To free up space in your database instance, connect to the instance and use the DROP TABLE command.

  1. Click Catalog in the workspace sidebar.
  2. Find and select the synced table you want to delete.
  3. Click the kebab menu > Delete.
  4. Connect to the instance with psql, the SQL editor, or from a notebook.
  5. Drop the table using PostgreSQL.

    PostgreSQL

    DROP TABLE synced_table_database.synced_table_schema.synced_table;
Ownership and permissions

If you create a new Postgres database, schema, or table, Postgres ownership is set as follows:

Manage synced table access

After a synced table is created, the databricks_superuser can READ a synced table from Postgres. The databricks_superuser has pg_read_all_data and pg_write_all_data privileges:

Manage synced table operations

The databricks_superuser can manage which users are authorized to perform specific operations on a synced table. The supported operations for synced tables are:

All other DDL operations are denied for synced tables.

To grant these privileges to additional users, the databricks_superuser must first create the databricks_auth extension:

PostgreSQL

CREATE EXTENSION IF NOT EXISTS databricks_auth;

Then the databricks_superuser can add a user to manage a synced table:

PostgreSQL

SELECT databricks_synced_table_add_manager('"synced_table_schema"."synced_table"'::regclass, '[user]');

The databricks_superuser can remove a user from managing a synced table:

PostgreSQL

SELECT databricks_synced_table_remove_manager('[table]', '[user]');

The databricks_superuser can view all managers:

PostgreSQL

SELECT * FROM databricks_synced_table_managers;
Handle invalid characters

Certain characters, such as the null byte (0x00), are allowed in STRING, ARRAY, MAP, or STRUCT columns in Delta tables but are not supported in Postgres TEXT or JSONB columns. As a result, syncing such data from Delta to Postgres can fail on insert with errors such as:

org.postgresql.util.PSQLException: ERROR: invalid byte sequence for encoding "UTF8": 0x00

org.postgresql.util.PSQLException: ERROR: unsupported Unicode escape sequence DETAIL: \u0000 cannot be converted to text.

You can address this issue in one of the following ways:
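One possible approach is to strip the unsupported characters from string values before syncing. A minimal Python sketch, assuming you can preprocess the source data; `strip_null_bytes` is a hypothetical helper, not part of any Databricks API:

```python
# Recursively remove null bytes (0x00), which Postgres TEXT/JSONB columns
# reject, from strings nested in lists and dicts (mirroring ARRAY, MAP,
# and STRUCT values).
def strip_null_bytes(value):
    if isinstance(value, str):
        return value.replace("\x00", "")
    if isinstance(value, list):
        return [strip_null_bytes(v) for v in value]
    if isinstance(value, dict):
        return {k: strip_null_bytes(v) for k, v in value.items()}
    return value

row = {"name": "abc\x00def", "tags": ["x\x00", "y"], "n": 3}
clean = strip_null_bytes(row)
```

In a Delta pipeline the same replacement would typically be applied per string column before the table is synced.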

Limitations and considerations

Source table's Securable Kinds

The source table's Securable Kind (shown in the Catalog Explorer Details tab) must be one of the supported options below:

Naming and identifier limitations

Performance and synchronization

Capacity and limits

Catalog integration
