If you use Databricks configuration profiles or Databricks-specific environment variables for Databricks authentication, the only code required to start working with a Databricks workspace is the following code snippet, which instructs the Databricks SDK for Python to use its default authentication flow:
    from databricks.sdk import WorkspaceClient

    w = WorkspaceClient()
    w.  # press <TAB> for autocompletion
The conventional name for the variable that holds the workspace-level client of the Databricks SDK for Python is w, which is shorthand for workspace.
If you initialise WorkspaceClient without any arguments in a notebook, credentials are picked up automatically from the notebook context. If the same code runs outside the notebook environment, for example in CI/CD, you must supply environment variables for authentication to work.

databricks.sdk.AccountClient does not support notebook-native authentication.
If you run the Databricks Terraform Provider, the Databricks SDK for Go, the Databricks CLI, or applications that target the Databricks SDKs for other languages, they will most likely all interoperate.

By default, the Databricks SDK for Python tries the following authentication methods, in the following order, until it succeeds:

1. Databricks native authentication
2. Azure native authentication

If the SDK is unsuccessful at this point, it returns an authentication error and stops running.
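The try-in-order behaviour described above can be pictured as a chain of credential providers. The following is a minimal sketch of that idea, not the SDK's implementation: `authenticate`, `AuthenticationError`, and the two example providers are hypothetical names introduced only for illustration.

```python
from typing import Callable, Dict, Optional, Sequence, Tuple

Headers = Dict[str, str]
# A provider returns auth headers, or None when it is not configured.
Provider = Callable[[], Optional[Headers]]


class AuthenticationError(Exception):
    """Raised when no provider in the chain yields credentials."""


def authenticate(providers: Sequence[Tuple[str, Provider]]) -> Tuple[str, Headers]:
    """Try each provider in order and return the first one that succeeds."""
    for name, provider in providers:
        headers = provider()
        if headers is not None:
            return name, headers
    tried = ", ".join(name for name, _ in providers)
    raise AuthenticationError(f"no usable credentials (tried: {tried})")


# Hypothetical providers: the first is not configured, the second is.
def no_token() -> Optional[Headers]:
    return None


def cli_token() -> Optional[Headers]:
    return {"Authorization": "Bearer example-token"}


chosen, headers = authenticate([("pat", no_token), ("azure-cli", cli_token)])
print(chosen)  # azure-cli
```

The chain stops at the first provider that returns credentials, so later providers are never consulted once an earlier one succeeds.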
You can instruct the Databricks SDK for Python to use a specific authentication method by setting the auth_type argument as described in the following sections.
For each authentication method, the SDK searches for compatible authentication credentials in the following locations, in the following order. Once the SDK finds a compatible set of credentials that it can use, it stops searching:

1. Credentials that are hard-coded into configuration arguments.

   Caution: Databricks does not recommend hard-coding credentials into arguments, as they can be exposed in plain text in version control systems. Use environment variables or configuration profiles instead.

2. Credentials in Databricks-specific environment variables.
3. For Databricks native authentication, credentials in the .databrickscfg file's DEFAULT configuration profile from its default file location (~ for Linux or macOS, and %USERPROFILE% for Windows).
4. For Azure native authentication, the SDK searches for credentials through the Azure CLI as needed.
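The first three search locations can be sketched for token authentication as follows. This is an illustration under stated assumptions, not the SDK's own code: `resolve_token_credentials` is a hypothetical helper, and it makes the simplifying assumption that a complete host/token pair must come from a single location.

```python
import configparser
import os
from pathlib import Path
from typing import Dict, Mapping, Optional


def resolve_token_credentials(
    host: Optional[str] = None,
    token: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
    config_file: Optional[Path] = None,
) -> Optional[Dict[str, str]]:
    """Return the first complete host/token pair, searching in order."""
    env = os.environ if env is None else env
    config_file = config_file or Path.home() / ".databrickscfg"

    # 1. Explicit arguments (not recommended for real credentials).
    if host and token:
        return {"host": host, "token": token, "source": "arguments"}

    # 2. Databricks-specific environment variables.
    env_host, env_token = env.get("DATABRICKS_HOST"), env.get("DATABRICKS_TOKEN")
    if env_host and env_token:
        return {"host": env_host, "token": env_token, "source": "environment"}

    # 3. The DEFAULT profile of the .databrickscfg file.
    parser = configparser.ConfigParser()
    parser.read(config_file)  # a missing file is silently skipped
    defaults = parser.defaults()
    if defaults.get("host") and defaults.get("token"):
        return {"host": defaults["host"], "token": defaults["token"],
                "source": "config file"}

    return None  # nothing found; the caller decides how to report it
```

Passing `env` and `config_file` explicitly keeps the helper easy to exercise in tests without touching the real environment or home directory.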
Depending on the Databricks authentication method, the SDK uses the following information. Presented are the WorkspaceClient and AccountClient arguments (which have corresponding .databrickscfg file fields), their descriptions, and any corresponding environment variables.
By default, the Databricks SDK for Python initially tries Databricks token authentication (auth_type='pat' argument). If the SDK is unsuccessful, it then tries Databricks basic (username/password) authentication (auth_type='basic' argument).
For Databricks token authentication, you must provide host and token, or their environment variable or .databrickscfg file field equivalents.

For Databricks basic authentication, you must provide host, username, and password (for AWS workspace-level operations); or host, account_id, username, and password (for AWS, Azure, or GCP account-level operations); or their environment variable or .databrickscfg file field equivalents.
| Argument | Description | Environment variable |
|---|---|---|
| host | (String) The Databricks host URL for either the Databricks workspace endpoint or the Databricks accounts endpoint. | DATABRICKS_HOST |
| account_id | (String) The Databricks account ID for the Databricks accounts endpoint. Only has effect when host is either https://accounts.cloud.databricks.com/ (AWS), https://accounts.azuredatabricks.net/ (Azure), or https://accounts.gcp.databricks.com/ (GCP). | DATABRICKS_ACCOUNT_ID |
| token | (String) The Databricks personal access token (PAT) (AWS, Azure, and GCP) or Azure Active Directory (Azure AD) token (Azure). | DATABRICKS_TOKEN |
| username | (String) The Databricks username part of basic authentication. Only possible when host is *.cloud.databricks.com (AWS). | DATABRICKS_USERNAME |
| password | (String) The Databricks password part of basic authentication. Only possible when host is *.cloud.databricks.com (AWS). | DATABRICKS_PASSWORD |
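When the client runs outside a notebook, for example in CI/CD, it can help to check up front which of these environment variables are still unset. The sketch below uses only the argument-to-variable mapping and the workspace-level requirements stated above; `missing_env_vars` is a hypothetical helper, not part of the SDK.

```python
import os
from typing import List, Mapping, Optional

# Argument name -> environment variable, as listed in the table above.
ENV_VARS = {
    "host": "DATABRICKS_HOST",
    "account_id": "DATABRICKS_ACCOUNT_ID",
    "token": "DATABRICKS_TOKEN",
    "username": "DATABRICKS_USERNAME",
    "password": "DATABRICKS_PASSWORD",
}

# Fields each method needs for workspace-level operations.
REQUIRED_FIELDS = {
    "pat": ("host", "token"),
    "basic": ("host", "username", "password"),
}


def missing_env_vars(auth_type: str,
                     env: Optional[Mapping[str, str]] = None) -> List[str]:
    """List the environment variables still unset for the given auth type."""
    env = os.environ if env is None else env
    return [
        ENV_VARS[field]
        for field in REQUIRED_FIELDS[auth_type]
        if not env.get(ENV_VARS[field])
    ]


print(missing_env_vars("pat", {"DATABRICKS_HOST": "https://example.cloud.databricks.com"}))
# ['DATABRICKS_TOKEN']
```

An empty list means every variable required for that method is present, so a default `WorkspaceClient()` should be able to find its credentials in the environment.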
For example, to use Databricks token authentication:
    from databricks.sdk import WorkspaceClient

    w = WorkspaceClient(host=input('Databricks Workspace URL: '),
                        token=input('Token: '))

Azure native authentication
By default, the Databricks SDK for Python first tries Azure client secret authentication (auth_type='azure-client-secret' argument). If the SDK is unsuccessful, it then tries Azure CLI authentication (auth_type='azure-cli' argument). See Manage service principals.

The Databricks SDK for Python picks up an Azure CLI token if you have previously authenticated as an Azure user by running az login on your machine. See Get Azure AD tokens for users by using the Azure CLI.
To authenticate as an Azure Active Directory (Azure AD) service principal, you must provide one of the following. See also Add a service principal to your Azure Databricks account:

- azure_workspace_resource_id, azure_client_secret, azure_client_id, and azure_tenant_id; or their environment variable or .databrickscfg file field equivalents.
- azure_workspace_resource_id and azure_use_msi; or their environment variable or .databrickscfg file field equivalents.
| Argument | Description | Environment variable |
|---|---|---|
| azure_workspace_resource_id | (String) The Azure Resource Manager ID for the Azure Databricks workspace, which is exchanged for a Databricks host URL. | DATABRICKS_AZURE_RESOURCE_ID |
| azure_use_msi | (Boolean) true to use Azure Managed Service Identity passwordless authentication flow for service principals. | ARM_USE_MSI |
| azure_client_secret | (String) The Azure AD service principal's client secret. | ARM_CLIENT_SECRET |
| azure_client_id | (String) The Azure AD service principal's application ID. | ARM_CLIENT_ID |
| azure_tenant_id | (String) The Azure AD service principal's tenant ID. | ARM_TENANT_ID |
| azure_environment | (String) The Azure environment type (such as Public, UsGov, China, and Germany) for a specific set of API endpoints. Defaults to PUBLIC. | ARM_ENVIRONMENT |
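The two ways of supplying service principal credentials can be told apart from the environment alone. The following sketch is illustrative only: `azure_service_principal_mode` is a hypothetical helper, and treating the string "true" (case-insensitive) as the enabled value of ARM_USE_MSI is an assumption about how the Boolean is spelled in an environment variable.

```python
import os
from typing import Mapping, Optional

# Variables for the client-secret combination listed above.
CLIENT_SECRET_VARS = (
    "DATABRICKS_AZURE_RESOURCE_ID",
    "ARM_CLIENT_SECRET",
    "ARM_CLIENT_ID",
    "ARM_TENANT_ID",
)


def azure_service_principal_mode(
    env: Optional[Mapping[str, str]] = None,
) -> Optional[str]:
    """Report which service principal combination the environment satisfies."""
    env = os.environ if env is None else env

    # Option 1: workspace resource ID plus client secret, client ID, tenant ID.
    if all(env.get(name) for name in CLIENT_SECRET_VARS):
        return "client-secret"

    # Option 2: workspace resource ID plus managed identity.
    # Assumption: "true" is the enabled spelling of ARM_USE_MSI.
    use_msi = env.get("ARM_USE_MSI", "").lower() == "true"
    if env.get("DATABRICKS_AZURE_RESOURCE_ID") and use_msi:
        return "msi"

    return None  # neither combination is complete
```

A result of `None` means neither combination is complete in the environment, in which case the Azure CLI login described earlier is the remaining Azure option.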
For example, to use Azure client secret authentication:
    from databricks.sdk import WorkspaceClient

    w = WorkspaceClient(host=input('Databricks Workspace URL: '),
                        azure_workspace_resource_id=input('Azure Resource ID: '),
                        azure_tenant_id=input('AAD Tenant ID: '),
                        azure_client_id=input('AAD Client ID: '),
                        azure_client_secret=input('AAD Client Secret: '))

Overriding .databrickscfg
For Databricks native authentication, you can override the default behavior for using .databrickscfg as follows:
| Argument | Description | Environment variable |
|---|---|---|
| profile | (String) A connection profile specified within .databrickscfg to use instead of DEFAULT. | DATABRICKS_CONFIG_PROFILE |
| config_file | (String) A non-default location of the Databricks CLI credentials file. | DATABRICKS_CONFIG_FILE |
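To make the two overrides concrete, the sketch below resolves which file and which profile would be read, then loads that profile with the standard library's INI parser. It is an illustration rather than the SDK's loader: `load_profile` is a hypothetical helper, and letting an explicit argument win over the environment variable is an assumption based on the search order described earlier.

```python
import configparser
import os
from pathlib import Path
from typing import Dict, Mapping, Optional, Union


def load_profile(
    profile: Optional[str] = None,
    config_file: Optional[Union[str, Path]] = None,
    env: Optional[Mapping[str, str]] = None,
) -> Dict[str, str]:
    """Return the fields of the selected profile as a plain dict."""
    env = os.environ if env is None else env

    # Assumption: argument first, then environment variable, then default.
    name = profile or env.get("DATABRICKS_CONFIG_PROFILE") or "DEFAULT"
    path = Path(
        config_file
        or env.get("DATABRICKS_CONFIG_FILE")
        or Path.home() / ".databrickscfg"
    )

    parser = configparser.ConfigParser()
    parser.read(path)  # a missing file is silently skipped
    if name != "DEFAULT" and not parser.has_section(name):
        raise KeyError(f"profile {name!r} not found in {path}")

    # Note: configparser fills keys missing from a named section
    # with the values from [DEFAULT].
    return dict(parser[name])
```

A file that this helper can read might contain a `[DEFAULT]` section and a `[MYPROFILE]` section, each with its own `host` and `token` lines.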
For example, to use a profile named MYPROFILE instead of DEFAULT:

    from databricks.sdk import WorkspaceClient

    w = WorkspaceClient(profile='MYPROFILE')
    # Now call the Databricks workspace APIs as desired...

Additional configuration options
For all authentication methods, you can override the default behavior in client arguments as follows:
| Argument | Description | Environment variable |
|---|---|---|
| auth_type | (String) When multiple auth attributes are available in the environment, use the auth type specified by this argument. This argument also holds the currently selected auth. | DATABRICKS_AUTH_TYPE |
| http_timeout_seconds | (Integer) Number of seconds for HTTP timeout. Default is 60. | (None) |
| retry_timeout_seconds | (Integer) Number of seconds to keep retrying HTTP requests. Default is 300 (5 minutes). | (None) |
| debug_truncate_bytes | (Integer) Truncate JSON fields in debug logs above this limit. Default is 96. | DATABRICKS_DEBUG_TRUNCATE_BYTES |
| debug_headers | (Boolean) true to debug HTTP headers of requests made by the application. Default is false, as headers contain sensitive data, such as access tokens. | DATABRICKS_DEBUG_HEADERS |
| rate_limit | (Integer) Maximum number of requests per second made to Databricks REST API. | DATABRICKS_RATE_LIMIT |
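The difference between a per-request timeout and an overall retry budget, as suggested by the descriptions of http_timeout_seconds and retry_timeout_seconds, can be sketched as a loop that retries until a deadline passes. This is a generic illustration, not the SDK's retry policy: `call_with_retries`, the fixed pause, and the choice to retry only on `ConnectionError` are all assumptions.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(
    operation: Callable[[], T],
    retry_timeout_seconds: float = 300,
    pause_seconds: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry `operation` on ConnectionError until the deadline passes."""
    deadline = clock() + retry_timeout_seconds
    while True:
        try:
            # Each attempt would carry its own per-request HTTP timeout.
            return operation()
        except ConnectionError:
            # Give up if the next pause would overrun the retry budget.
            if clock() + pause_seconds > deadline:
                raise TimeoutError(
                    f"gave up after {retry_timeout_seconds} seconds"
                )
            sleep(pause_seconds)
```

The `clock` and `sleep` parameters exist only so the loop can be exercised with a fake clock instead of real waiting.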
For example, to turn on debug HTTP headers:
    from databricks.sdk import WorkspaceClient

    w = WorkspaceClient(debug_headers=True)
    # Now call the Databricks workspace APIs as desired...