RetroSearch Browse

Home - News ( United States | United Kingdom | Italy | Germany ) - Football scores

Showing content from https://docs.databricks.com/aws/en/query-federation/database-federation below:

What is query federation? | Databricks Documentation

What is query federation?

With query federation, queries are pushed down to the foreign database using JDBC APIs. The query is executed both in Databricks and using remote compute. Query federation is used for sources like MySQL, PostgreSQL, BigQuery, Redshift, Teradata, and more.

Why use Lakehouse Federation?â

The lakehouse emphasizes central storage of data to reduce data redundancy and isolation. Your organization might have numerous data systems in production, and you might want to query data in connected systems for a number of reasons:

On-demand reporting.
Proof-of-concept work.
The exploratory phase of new ETL pipelines or reports.
Supporting workloads during incremental migration.

In each of these scenarios, query federation gets you to insights faster, because you can query the data in place and avoid complex and time-consuming ETL processing.

Query federation is meant for use cases when:

You don't want to ingest data into Databricks.
You want your queries to take advantage of compute in the external database system.
You want the advantages of Unity Catalog interfaces and data governance, including fine-grained access control, data lineage, and search.

Query federation vs. Lakeflow Connectâ

Query federation allows you to query external data sources without moving your data. Databricks recommends ingestion using managed connectors because they scale to accommodate high data volumes, low-latency querying, and third-party API limits. However, you might want to query your data without moving it. When you have a choice between managed ingestion connectors and query federation, choose query federation for ad hoc reporting or proof-of-concept work on your ETL pipelines.

Overview of query federation setupâ

To make a dataset available for read-only querying using Lakehouse Federation, you create the following:

A connection, a securable object in Unity Catalog that specifies a path and credentials for accessing an external database system.
A foreign catalog, a securable object in Unity Catalog that mirrors a database in an external data system, enabling you to perform read-only queries on that data system in your Databricks workspace, managing access using Unity Catalog.

Supported data sourcesâ

Query federation supports connections to the following sources:

Connection requirementsâ

Workspace requirements:

Workspace enabled for Unity Catalog.

Compute requirements:

Network connectivity from your compute resource to the target database systems. See Networking recommendations for Lakehouse Federation.
Databricks compute must use Databricks Runtime 13.3 LTS or above and Standard or Dedicated access mode.
SQL warehouses must be pro or serverless and must use 2023.40 or above.

Permissions required:

To create a connection, you must be a metastore admin or a user with the CREATE CONNECTION privilege on the Unity Catalog metastore attached to the workspace.
To create a foreign catalog, you must have the CREATE CATALOG permission on the metastore and be either the owner of the connection or have the CREATE FOREIGN CATALOG privilege on the connection.

Additional permission requirements are specified in each task-based section that follows.

Create a connectionâ

A connection specifies a path and credentials for accessing an external database system. To create a connection, you can use Catalog Explorer or the CREATE CONNECTION SQL command in a Databricks notebook or the Databricks SQL query editor.

Permissions required: Metastore admin or user with the CREATE CONNECTION privilege.

CatalogÂ Explorer
SQL

In your Databricks workspace, click Catalog.
At the top of the Catalog pane, click the Add icon and select Add a connection from the menu.

Alternatively, from the Quick access page, click the External data > button, go to the Connections tab, and click Create connection.
Enter a user-friendly Connection name.
Select the Connection type (database provider, like MySQL or PostgreSQL).
(Optional) Add a comment.
Click Next.
Enter the connection properties (such as host information, path, and access credentials).

Each connection type requires different connection information. See the article for your connection type, listed in the table of contents to the left.
Click Create connection.
Enter a name for the foreign catalog.
(Optional) Click Test connection to confirm that it works.
Click Create catalog.
Select the workspaces in which users can access the catalog you created. You can select All workspaces have access, or click Assign to workspaces, select the workspaces, and then click Assign.
Change the Owner who will be able to manage access to all objects in the catalog. Start typing a principal in the text box, and then click the principal in the returned results.
Grant Privileges on the catalog. Click Grant:
1. Specify the Principals who will have access to objects in the catalog. Start typing a principal in the text box, and then click the principal in the returned results.
2. Select the Privilege presets to grant to each principal. All account users are granted BROWSE by default.
  - Select Data Reader from the drop-down menu to grant read privileges on objects in the catalog.
  - Select Data Editor from the drop-down menu to grant read and modify privileges on objects in the catalog.
  - Manually select the privileges to grant.
3. Click Grant.
4. Click Next.
5. On the Metadata page, specify tags key-value pairs. For more information, see Apply tags to Unity Catalog securable objects.
6. (Optional) Add a comment.
7. Click Save.

Run the following command in a notebook or the SQL query editor. This example is for connections to a PostgreSQL database. The options differ by connection type. See the article for your connection type, listed in the table of contents to the left.

SQL

CREATE CONNECTION <connection-name> TYPE postgresql
OPTIONS (
  host '<hostname>',
  port '<port>',
  user '<user>',
  password '<password>'
);

We recommend that you use Databricks secrets instead of plaintext strings for sensitive values like credentials. For example:

SQL

CREATE CONNECTION <connection-name> TYPE postgresql
OPTIONS (
  host '<hostname>',
  port '<port>',
  user secret ('<secret-scope>','<secret-key-user>'),
  password secret ('<secret-scope>','<secret-key-password>')
)

For information about setting up secrets, see Secret management.

For information about managing existing connections, see Manage connections for Lakehouse Federation.

Create a foreign catalogâ

note

If you use the UI to create a connection to the data source, foreign catalog creation is included and you can skip this step.

A foreign catalog mirrors a database in an external data system so that you can query and manage access to data in that database using Databricks and Unity Catalog. To create a foreign catalog, you use a connection to the data source that has already been defined.

To create a foreign catalog, you can use Catalog Explorer or the CREATE FOREIGN CATALOG SQL command in a Databricks notebook or the SQL query editor. You can also use the Unity Catalog API. See Databricks reference documentation.

Foreign catalog metadata is synced into Unity Catalog on each interaction with the catalog. For the data type mapping between Unity Catalog and the data source, check the Data Type Mappings section of each data source's documentation.

Permissions required: CREATE CATALOG permission on the metastore and either ownership of the connection or the CREATE FOREIGN CATALOG privilege on the connection.

CatalogÂ Explorer
SQL

In your Databricks workspace, click Catalog to open Catalog Explorer.
At the top of the Catalog pane, click the Add icon and select Add a catalog from the menu.

Alternatively, from the Quick access page, click the Catalogs button, and then click the Create catalog button.
Follow the instructions for creating foreign catalogs in Create catalogs.

Run the following SQL command in a notebook or the SQL query editor. Items in brackets are optional. Replace the placeholder values:

<catalog-name>: Name for the catalog in Databricks.
<connection-name>: The connection object that specifies the data source, path, and access credentials.
<database-name>: Name of the database you want to mirror as a catalog in Databricks. Not required for MySQL, which uses a two-layer namespace.
<external-catalog-name>: Databricks-to-Databricks only: Name of the catalog in the external Databricks workspace that you are mirroring. See Create a foreign catalog.

SQL

CREATE FOREIGN CATALOG [IF NOT EXISTS] <catalog-name> USING CONNECTION <connection-name>
OPTIONS (database '<database-name>');

For information about managing and working with foreign catalogs, see Manage and work with foreign catalogs.

Load data from foreign tables with materialized viewsâ

Databricks recommends loading external data using query federation when you are creating materialized views. See Materialized views.

When you use query federation, users can reference the federated data as follows:

SQL

CREATE MATERIALIZED VIEW xyz AS SELECT * FROM federated_catalog.federated_schema.federated_table;

View system-generated federated queriesâ

Query federation translates Databricks SQL statements into statements that can be pushed down into the federated data source. To view the generated SQL statement, click the foreign data source scan node on the graph view of Query Profile, or run the EXPLAIN FORMATTED SQL statement. See the Supported Pushdown section of each data source's documentation for coverage.

Limitationsâ

Queries are read-only.

The only exception is when Lakehouse Federation is used to federate a workspace's legacy Hive metastore (catalog federation). Foreign tables in that scenario are writeable. See What does it mean to write to a foreign catalog in a federated Hive metastore?.
Throttling of connections is determined using the Databricks SQL concurrent query limit. There is no limit across warehouses per connection. See Queueing and autoscaling for pro and classic SQL warehouses.
Tables and schemas with names that are invalid in Unity Catalog are not supported and are ignored by Unity Catalog upon creation of a foreign catalog. See the list of naming rules and limitations in Limitations.
Table names and schema names are converted to lowercase in Unity Catalog. If this causes name collisions, Databricks cannot guarantee which object is imported into the foreign catalog.
For each foreign table referenced, Databricks schedules a subquery in the remote system to return a subset of data from that table and then returns the result to one Databricks executor task over a single stream. If the result set is too large, the executor might run out of memory.
Dedicated access mode (formerly single user access mode) is only available for users that own the connection.
Lakehouse Federation cannot federate foreign tables with case-sensitive identifiers for Azure Synapse connections or Redshift connections.

Resource quotasâ

Databricks enforces resource quotas on all Unity Catalog securable objects. These quotas are listed in Resource limits. Foreign catalogs and all objects that they contain are included in your total quota usage.

If you expect to exceed these resource limits, contact your Databricks account team.

You can monitor your quota usage using the Unity Catalog resource quotas APIs. See Monitor your usage of Unity Catalog resource quotas.

Additional resourcesâ

Federated queries (Lakehouse Federation) in the SQL language reference
What is catalog federation?
Hive metastore federation: enable Unity Catalog to govern tables registered in a Hive metastore

RetroSearch is an open source project built by @garambo | Open a GitHub Issue

Search and Browse the WWW like it's 1997 | Search results from DuckDuckGo

HTML: 3.2 | Encoding: UTF-8 | Version: 0.7.4