note
This article covers Databricks Connect for Databricks Runtime 13.3 LTS and above.
In this article, you configure properties to establish a connection between Databricks Connect and your Databricks cluster or serverless compute. This information applies to the Python and Scala versions of Databricks Connect unless stated otherwise.
Databricks Connect enables you to connect popular IDEs such as Visual Studio Code, PyCharm, RStudio Desktop, IntelliJ IDEA, notebook servers, and other custom applications to Databricks clusters. See What is Databricks Connect?.
Requirements
To configure a connection to Databricks compute, you must have:
Databricks Connect installed. For installation requirements and steps for specific language versions of Databricks Connect, see the installation instructions for Databricks Connect for Python or Databricks Connect for Scala.
A Databricks account and workspace that have Unity Catalog enabled. See Get started with Unity Catalog and Enable a workspace for Unity Catalog.
The Databricks Runtime version of your compute must be equal to, or above, the Databricks Connect package version. Databricks recommends that you use the most recent package of Databricks Connect that matches the Databricks Runtime version. For compute version requirements, see the version support matrix for Databricks Connect for Python or Databricks Connect for Scala.
To use features that are available in later versions of the Databricks Runtime, you must upgrade the Databricks Connect package. See the Databricks Connect release notes for a list of available Databricks Connect releases. For Databricks Runtime version release notes, see Databricks Runtime release notes versions and compatibility.
If you are using classic compute, the cluster must use a cluster access mode of Assigned or Shared. See Access modes.
Configure a connection to a cluster
There are multiple ways to configure the connection to your cluster. Databricks Connect searches for configuration properties in the following order, and uses the first configuration it finds. For advanced configuration information, see Advanced usage of Databricks Connect for Python.
The DatabricksSession class's remote() method
For this option, which applies to Databricks personal access token authentication only, specify the workspace instance name, the Databricks personal access token, and the ID of the cluster.
You can initialize the DatabricksSession class in several ways:

- Set the host, token, and cluster_id fields in DatabricksSession.builder.remote().
- Use the Databricks SDK's Config class.
- Specify a Databricks configuration profile along with the cluster_id field.

Instead of specifying these connection properties in your code, Databricks recommends configuring properties through environment variables or configuration files, as described throughout this section. The following code examples assume that you provide some implementation of the proposed retrieve_* functions to get the necessary properties from the user or from some other configuration store, such as AWS Systems Manager Parameter Store.
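For example, a minimal sketch of these retrieve_* helpers might simply read the values from environment variables. The helper names come from the examples below; the MY_* variable names are placeholders, deliberately distinct from the reserved DATABRICKS_* variables described later in this section:
Python
import os

# Hypothetical helpers: replace the bodies with lookups against your own
# configuration store, such as AWS Systems Manager Parameter Store.
def retrieve_workspace_instance_name() -> str:
    # Returns the bare instance name, for example: my-workspace.cloud.databricks.com
    return os.environ["MY_WORKSPACE_INSTANCE_NAME"]

def retrieve_token() -> str:
    return os.environ["MY_DATABRICKS_TOKEN"]

def retrieve_cluster_id() -> str:
    return os.environ["MY_CLUSTER_ID"]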
The code for each of these approaches is as follows:
Python
from databricks.connect import DatabricksSession
spark = DatabricksSession.builder.remote(
host = f"https://{retrieve_workspace_instance_name()}",
token = retrieve_token(),
cluster_id = retrieve_cluster_id()
).getOrCreate()
Scala
import com.databricks.connect.DatabricksSession
val spark = DatabricksSession.builder()
.host(retrieveWorkspaceInstanceName())
.token(retrieveToken())
.clusterId(retrieveClusterId())
.getOrCreate()
Python
from databricks.connect import DatabricksSession
from databricks.sdk.core import Config
config = Config(
host = f"https://{retrieve_workspace_instance_name()}",
token = retrieve_token(),
cluster_id = retrieve_cluster_id()
)
spark = DatabricksSession.builder.sdkConfig(config).getOrCreate()
Scala
import com.databricks.connect.DatabricksSession
import com.databricks.sdk.core.DatabricksConfig
val config = new DatabricksConfig()
.setHost(retrieveWorkspaceInstanceName())
.setToken(retrieveToken())
val spark = DatabricksSession.builder()
.sdkConfig(config)
.clusterId(retrieveClusterId())
.getOrCreate()
Python
from databricks.connect import DatabricksSession
from databricks.sdk.core import Config
config = Config(
profile = "<profile-name>",
cluster_id = retrieve_cluster_id()
)
spark = DatabricksSession.builder.sdkConfig(config).getOrCreate()
Scala
import com.databricks.connect.DatabricksSession
import com.databricks.sdk.core.DatabricksConfig
val config = new DatabricksConfig()
.setProfile("<profile-name>")
val spark = DatabricksSession.builder()
.sdkConfig(config)
.clusterId(retrieveClusterId())
.getOrCreate()
A Databricks configuration profile
For this option, create or identify a Databricks configuration profile containing the field cluster_id and any other fields that are necessary for the Databricks authentication type that you want to use.

The required configuration profile fields for each authentication type are as follows:

- For Databricks personal access token authentication: host and token.
- For OAuth machine-to-machine (M2M) authentication: host, client_id, and client_secret.
- For OAuth user-to-machine (U2M) authentication: host.

Then set the name of this configuration profile through the configuration class.
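For example, a profile for personal access token authentication might look like the following in your .databrickscfg file (all values shown are placeholders):

[my-profile]
host = https://my-workspace.cloud.databricks.com
token = dapi123...
cluster_id = 1234-567890-abcde123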
note
You can use the auth login command's --configure-cluster option to automatically add the cluster_id field to a new or existing configuration profile. For more information, run the command databricks auth login -h.
You can specify cluster_id in a couple of ways:

- Include the cluster_id field in your configuration profile, and then just specify the configuration profile's name.
- Specify the configuration profile name along with the cluster_id field.

If you have already set the DATABRICKS_CLUSTER_ID environment variable with the cluster's ID, you do not also need to specify cluster_id.
The code for each of these approaches is as follows:
Python
from databricks.connect import DatabricksSession
spark = DatabricksSession.builder.profile("<profile-name>").getOrCreate()
Scala
import com.databricks.connect.DatabricksSession
import com.databricks.sdk.core.DatabricksConfig
val config = new DatabricksConfig()
.setProfile("<profile-name>")
val spark = DatabricksSession.builder()
.sdkConfig(config)
.getOrCreate()
Python
from databricks.connect import DatabricksSession
from databricks.sdk.core import Config
config = Config(
profile = "<profile-name>",
cluster_id = retrieve_cluster_id()
)
spark = DatabricksSession.builder.sdkConfig(config).getOrCreate()
Scala
import com.databricks.connect.DatabricksSession
import com.databricks.sdk.core.DatabricksConfig
val config = new DatabricksConfig()
.setProfile("<profile-name>")
val spark = DatabricksSession.builder()
.sdkConfig(config)
.clusterId(retrieveClusterId())
.getOrCreate()
The DATABRICKS_CONFIG_PROFILE environment variable
For this option, create or identify a Databricks configuration profile containing the field cluster_id and any other fields that are necessary for the Databricks authentication type that you want to use.

If you have already set the DATABRICKS_CLUSTER_ID environment variable with the cluster's ID, you do not also need to specify cluster_id.

The required configuration profile fields for each authentication type are as follows:

- For Databricks personal access token authentication: host and token.
- For OAuth machine-to-machine (M2M) authentication: host, client_id, and client_secret.
- For OAuth user-to-machine (U2M) authentication: host.

note
You can use the auth login command's --configure-cluster option to automatically add the cluster_id field to a new or existing configuration profile. For more information, run the command databricks auth login -h.
Set the DATABRICKS_CONFIG_PROFILE environment variable to the name of this configuration profile.
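For example, in a bash or zsh shell (the profile name shown is a placeholder):

export DATABRICKS_CONFIG_PROFILE=<profile-name>

Then initialize the DatabricksSession class: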
Python
from databricks.connect import DatabricksSession
spark = DatabricksSession.builder.getOrCreate()
Scala
import com.databricks.connect.DatabricksSession
val spark = DatabricksSession.builder().getOrCreate()
An environment variable for each configuration property
For this option, set the DATABRICKS_CLUSTER_ID environment variable and any other environment variables that are necessary for the Databricks authentication type that you want to use.
The required environment variables for each authentication type are as follows:

- For Databricks personal access token authentication: DATABRICKS_HOST and DATABRICKS_TOKEN.
- For OAuth machine-to-machine (M2M) authentication: DATABRICKS_HOST, DATABRICKS_CLIENT_ID, and DATABRICKS_CLIENT_SECRET.
- For OAuth user-to-machine (U2M) authentication: DATABRICKS_HOST.
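For example, for Databricks personal access token authentication in a bash or zsh shell (all values shown are placeholders):

export DATABRICKS_HOST=https://my-workspace.cloud.databricks.com
export DATABRICKS_TOKEN=dapi123...
export DATABRICKS_CLUSTER_ID=1234-567890-abcde123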
Then initialize the DatabricksSession class:
Python
from databricks.connect import DatabricksSession
spark = DatabricksSession.builder.getOrCreate()
Scala
import com.databricks.connect.DatabricksSession
val spark = DatabricksSession.builder().getOrCreate()
A Databricks configuration profile named DEFAULT
For this option, create or identify a Databricks configuration profile containing the field cluster_id and any other fields that are necessary for the Databricks authentication type that you want to use.

If you have already set the DATABRICKS_CLUSTER_ID environment variable with the cluster's ID, you do not also need to specify cluster_id.

The required configuration profile fields for each authentication type are as follows:

- For Databricks personal access token authentication: host and token.
- For OAuth machine-to-machine (M2M) authentication: host, client_id, and client_secret.
- For OAuth user-to-machine (U2M) authentication: host.

Name this configuration profile DEFAULT.
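For example, a DEFAULT profile for personal access token authentication might look like the following in your .databrickscfg file (all values shown are placeholders):

[DEFAULT]
host = https://my-workspace.cloud.databricks.com
token = dapi123...
cluster_id = 1234-567890-abcde123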
note
You can use the auth login command's --configure-cluster option to automatically add the cluster_id field to the DEFAULT configuration profile. For more information, run the command databricks auth login -h.
Then initialize the DatabricksSession class:
Python
from databricks.connect import DatabricksSession
spark = DatabricksSession.builder.getOrCreate()
Scala
import com.databricks.connect.DatabricksSession
val spark = DatabricksSession.builder().getOrCreate()
Configure a connection to serverless compute
Databricks Connect for Python supports connecting to serverless compute. To use this feature, version requirements for connecting to serverless must be met. See Requirements.
You can configure a connection to serverless compute in one of the following ways:
Set the local environment variable DATABRICKS_SERVERLESS_COMPUTE_ID to auto. If this environment variable is set, Databricks Connect ignores the cluster_id.
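For example, in a bash or zsh shell:

export DATABRICKS_SERVERLESS_COMPUTE_ID=auto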
In a local Databricks configuration profile, set serverless_compute_id = auto, then reference that profile from your code:
[DEFAULT]
host = https://my-workspace.cloud.databricks.com/
serverless_compute_id = auto
token = dapi123...
Or use either of the following options:
Python
from databricks.connect import DatabricksSession
spark = DatabricksSession.builder.serverless(True).getOrCreate()
Python
from databricks.connect import DatabricksSession
spark = DatabricksSession.builder.remote(serverless=True).getOrCreate()
note
The serverless compute session times out after 10 minutes of inactivity. After this, a new Spark session should be created using getOrCreate() to connect to serverless compute.
To validate that your environment, default credentials, and connection to compute are correctly set up for Databricks Connect, run the databricks-connect test command. The command fails with a non-zero exit code and a corresponding error message when it detects any incompatibility in the setup.
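For example, from a shell in the environment where Databricks Connect is installed:

databricks-connect test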
In Databricks Connect 14.3 and above, you can also validate your environment using validateSession():

from databricks.connect import DatabricksSession

DatabricksSession.builder.validateSession(True).getOrCreate()
Disabling Databricks Connect
The Databricks Connect (and underlying Spark Connect) service can be disabled on any given cluster.
To disable the Databricks Connect service, set the following Spark configuration on the cluster.
spark.databricks.service.server.enabled false