note
This article covers Databricks Connect for Databricks Runtime 13.3 LTS and above.
This article describes how to use Databricks Utilities with Databricks Connect for Python. Databricks Connect enables you to connect popular IDEs, notebook servers, and custom applications to Databricks clusters. See What is Databricks Connect? For the Scala version of this article, see Databricks Utilities with Databricks Connect for Scala.
You use Databricks Connect to access Databricks Utilities as follows:
- Use the WorkspaceClient class's dbutils variable to access Databricks Utilities. The WorkspaceClient class belongs to the Databricks SDK for Python and is included in Databricks Connect.
- Use dbutils.fs to access the Databricks Utilities fs utility.
- Use dbutils.secrets to access the Databricks Utilities secrets utility.
- No Databricks Utilities functionality other than the preceding utilities is available through dbutils.
tip
You can also use the included Databricks SDK for Python to access any available Databricks REST API, not just the preceding Databricks Utilities APIs. See databricks-sdk on PyPI.
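As a rough sketch of the access pattern described above, the following helper reads a secret through any object that exposes the secrets utility, such as the dbutils variable of a WorkspaceClient. The names read_secret, example, my-scope, and my-key are hypothetical and used only for illustration. The example function assumes that authentication is already configured through environment variables.

```python
def read_secret(dbutils, scope: str, key: str) -> str:
    """Return the secret stored under `key` in the secret scope `scope`.

    `dbutils` is expected to be the `dbutils` variable of a WorkspaceClient.
    """
    return dbutils.secrets.get(scope=scope, key=key)


def example() -> None:
    # Imported here so read_secret can be reused without importing the SDK.
    from databricks.sdk import WorkspaceClient

    # Assumes DATABRICKS_HOST and DATABRICKS_TOKEN are already set.
    w = WorkspaceClient()

    # "my-scope" and "my-key" are hypothetical names; replace them with your own.
    value = read_secret(w.dbutils, "my-scope", "my-key")

    # Print only the length so the secret itself is not written to the console.
    print(len(value))
```

Passing dbutils in as a parameter keeps the helper independent of how the client was created, which also makes it easy to exercise with a stand-in object.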
To initialize WorkspaceClient, you must provide enough information to authenticate a Databricks SDK with the workspace. For example, you can:
- Hard-code the workspace URL and your access token directly within your code, and then initialize WorkspaceClient as follows. Although this option is supported, Databricks does not recommend it, because it can expose sensitive information, such as access tokens, if your code is checked into version control or otherwise shared:
  Python
  from databricks.sdk import WorkspaceClient

  # Placeholders: supply your own functions that return these values.
  w = WorkspaceClient(host=f"https://{retrieve_workspace_instance_name()}",
                      token=retrieve_token())
- Create or specify a configuration profile that contains the fields host and token, and then initialize WorkspaceClient as follows:
  Python
  from databricks.sdk import WorkspaceClient

  w = WorkspaceClient(profile="<profile-name>")
- Set the environment variables DATABRICKS_HOST and DATABRICKS_TOKEN in the same way you set them for Databricks Connect, and then initialize WorkspaceClient as follows:
  Python
  from databricks.sdk import WorkspaceClient

  w = WorkspaceClient()
  The Databricks SDK for Python does not recognize the SPARK_REMOTE environment variable for Databricks Connect.
For additional Databricks authentication options for the Databricks SDK for Python, as well as how to initialize AccountClient within the Databricks SDKs to access available Databricks REST APIs at the account level instead of at the workspace level, see databricks-sdk on PyPI.
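One way to combine the profile and environment variable options above is sketched below. The helper names client_kwargs and make_client are hypothetical. The check for both environment variables is deliberately stricter than the SDK itself, which supports additional authentication options, so treat this as an illustration and not as the SDK's own resolution logic.

```python
import os
from typing import Mapping, Optional


def client_kwargs(profile: Optional[str] = None,
                  env: Mapping[str, str] = os.environ) -> dict:
    """Choose keyword arguments for WorkspaceClient.

    Prefers a named configuration profile; otherwise relies on the
    DATABRICKS_HOST and DATABRICKS_TOKEN environment variables.
    """
    if profile:
        return {"profile": profile}

    missing = [name for name in ("DATABRICKS_HOST", "DATABRICKS_TOKEN")
               if not env.get(name)]
    if missing:
        raise ValueError("Missing environment variables: " + ", ".join(missing))

    # With both variables set, WorkspaceClient() needs no arguments.
    return {}


def make_client(profile: Optional[str] = None):
    # Imported lazily so client_kwargs can be used without importing the SDK.
    from databricks.sdk import WorkspaceClient

    return WorkspaceClient(**client_kwargs(profile))
```

Calling make_client("<profile-name>") uses the named profile, while make_client() falls back to the environment variables and fails early with a readable message if either one is missing.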
The following example shows how to use the Databricks SDK for Python to automate Databricks Utilities. This example creates a file named zzz_hello.txt in a Unity Catalog volume's path within the workspace, reads the data from the file, and then deletes the file. This example assumes that the environment variables DATABRICKS_HOST and DATABRICKS_TOKEN have already been set:
Python
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

file_path = "/Volumes/main/default/my-volume/zzz_hello.txt"
file_data = "Hello, Databricks!"

fs = w.dbutils.fs

# Create (or overwrite) the file in the volume.
fs.put(
    file=file_path,
    contents=file_data,
    overwrite=True,
)

# Read the file back, then delete it.
print(fs.head(file_path))

fs.rm(file_path)
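If the read step fails in the example above, the test file stays in the volume. A minimal sketch of the same put, head, and rm sequence with guaranteed cleanup follows. The function name round_trip is hypothetical, and fs is assumed to be w.dbutils.fs or any object with the same three methods.

```python
def round_trip(fs, file_path: str, file_data: str) -> str:
    """Write file_data to file_path, read it back, and always delete the file."""
    fs.put(file_path, file_data, True)  # True = overwrite an existing file
    try:
        return fs.head(file_path)
    finally:
        # Runs even if head() raises, so the test file is not left behind.
        fs.rm(file_path)
```

With a client created as shown earlier, the call would look like round_trip(w.dbutils.fs, "/Volumes/main/default/my-volume/zzz_hello.txt", "Hello, Databricks!").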
See also Interaction with dbutils in the Databricks SDK for Python documentation.