This article shows how to set up Databricks Git folders for version control. After you set up Git folders in your Databricks workspace, you can perform common Git operations such as clone, checkout, commit, push, pull, and branch management from the Databricks UI. You can also see diffs for your changes as you develop in Databricks.
Configure user settings
Databricks Git folders use a personal access token (PAT) or equivalent OAuth credentials to authenticate with your Git provider when performing Git operations. To use Git folders, you must first configure your Git credentials in Databricks. See Configure Git credentials & connect a remote repo to Databricks.
You can clone public remote repositories without Git credentials. To modify a public remote repository or to clone or modify a private remote repository, you must have a Git credential with Write (or greater) permissions for the remote repository.
Git folders are enabled by default. For more details on enabling or disabling Git folder support, see Enable or disable the Databricks Git folder feature.
Add or edit Git credentials in Databricks
Select the down arrow next to the account name at the upper-right of your screen, and then select Settings.
Select the Linked accounts tab.
If you're adding credentials for the first time, follow the on-screen instructions.
If you have previously entered credentials, click Config > Edit and go to the next step.
In the Git provider drop-down, select the provider name.
Depending on the provider selected, you might have both an OAuth option and a personal access token (PAT) option. If you choose the OAuth option, complete the web authentication flow. If you choose the PAT option, enter your Git username or email, and then add the PAT from your Git provider in the Token field. For details, see Configure Git credentials & connect a remote repo to Databricks.
important
Databricks recommends that you use OAuth Git credentials. If you must use personal access tokens, you must set an expiration date.
If your organization has SAML SSO enabled in GitHub, authorize your personal access token for SSO.
You can also save a Git PAT and username to Databricks using the Databricks Repos API.
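Saving a credential through the API can be sketched in Python as follows. This is a minimal illustration, not an official client: the endpoint path `/api/2.0/git-credentials`, the JSON field names, and the provider value `gitHub` are assumptions that should be checked against the Databricks REST API reference, and the workspace URL and tokens are placeholders. The function only builds the request, so nothing is sent until you pass it to `urllib.request.urlopen`.

```python
import json
import urllib.request


def build_git_credential_request(host, api_token, provider, username, pat):
    """Build (but do not send) a request that stores a Git credential.

    The endpoint path and field names below are assumptions; verify them
    against the Databricks REST API reference before use.
    """
    payload = {
        "git_provider": provider,        # assumed value, e.g. "gitHub"
        "git_username": username,
        "personal_access_token": pat,
    }
    return urllib.request.Request(
        url=f"{host.rstrip('/')}/api/2.0/git-credentials",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder values for illustration only.
request = build_git_credential_request(
    host="https://example.cloud.databricks.com/",
    api_token="<databricks-token>",
    provider="gitHub",
    username="user@example.com",
    pat="<git-provider-pat>",
)
# To actually send it: urllib.request.urlopen(request)
```

Keeping request construction separate from sending makes it easy to inspect the payload first and to avoid logging the token by accident.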
Multiple Git credentials per user (Public Preview)
Databricks supports multiple Git credentials per user, which enables easy switching between credentials for users on teams working with multiple Git providers or using multiple Git accounts for the same provider.
Explicit credential selection for Git folders
In addition to using your default Git credential, individual Git folders in Databricks can be configured to use a specific credential for Git operations.
You can change the Git credential used by a Git folder:
Each Git provider (e.g., GitHub, GitLab) supports one default Git credential per user. This default is used automatically for jobs, Repo APIs and Git folder operations.
The first credential you create for a provider automatically becomes the default for that provider. To change it, go to User Settings > Linked accounts, click the 3-dot menu next to the credential you want, and choose "Set as default."
Limitations
Git folders require network connectivity to your Git provider to function. Ordinarily, this connection is made over the internet and works without further configuration. However, you might have set up additional restrictions on your Git provider to control access. For example, you might have an IP allowlist in place, or you might host your own on-premises Git server using services like GitHub Enterprise (GHE), Bitbucket Server (BBS), or GitLab Self-managed. Depending on your network hosting and configuration, your Git server might not be accessible via the internet.
Security features in Git folders
Databricks Git folders have many security features. The following sections walk you through their setup and use:
You can use AWS Key Management Service to encrypt a Git personal access token (PAT) or other Git credentials. Using a key from an encryption service is referred to as a customer-managed key (CMK) or bring your own key (BYOK).
For more information, see Customer-managed keys for managed services.
Restrict usage to URLs in an allowlist
A workspace admin can limit which remote repositories users can clone from and commit & push to. This helps prevent exfiltration of your code; for example, users cannot push code to an arbitrary repository if you have turned on the allowlist restrictions. You can also prevent users from using unlicensed code by restricting the clone operation to a list of allowed repositories.
To set up an allowlist:
Go to the settings page.
Click the Workspace admin tab (it is open by default).
In the Development section, choose an option from Git URL allowlist permission:
Click the Edit button next to Git URL allowlist: Empty list and enter a comma-separated list of URL prefixes.
Click Save.
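To see how a comma-separated list of URL prefixes restricts repositories, the check can be sketched in Python. This only illustrates prefix matching and is not Databricks' implementation: the function names, the sample prefixes, and the case-sensitive comparison are all assumptions, and how Databricks treats an empty list is not modeled here.

```python
def parse_allowlist(raw):
    """Split a comma-separated allowlist string into trimmed URL prefixes."""
    return [prefix.strip() for prefix in raw.split(",") if prefix.strip()]


def is_url_allowed(url, prefixes):
    """Return True if the repository URL starts with any allowed prefix.

    Assumption: matching is a plain, case-sensitive prefix comparison.
    With no prefixes, this sketch matches nothing.
    """
    return any(url.startswith(prefix) for prefix in prefixes)


# Hypothetical allowlist, in the comma-separated form the setting accepts.
allowlist = parse_allowlist("https://github.com/my-org/, https://gitlab.com/team-a/")

print(is_url_allowed("https://github.com/my-org/pipelines.git", allowlist))
print(is_url_allowed("https://github.com/someone-else/fork.git", allowlist))
```

Ending each prefix with a trailing slash, as in the sample, keeps `https://github.com/my-org/` from also matching a differently named organization such as `my-org-mirror`.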
note
To disable an existing allowlist and allow access to all repositories:
Set permissions for a repo to control access. Permissions for a repo apply to all content in that repo. You can assign five permission levels to files: NO PERMISSIONS, CAN READ, CAN RUN, CAN EDIT, and CAN MANAGE.
For more details on Git folder permissions, see Git folder ACLs.
Audit logging
When audit logging is enabled, audit events are logged when you interact with a Git folder. For example, an audit event is logged when you create, update, or delete a Git folder, when you list all Git folders associated with a workspace, and when you sync changes between your Git folder and the remote Git repo.
Secrets detection
Git folders scan code for access key IDs that begin with the prefix AKIA and warn the user before committing.
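A comparable check can be run locally before you commit. The Python sketch below is an approximation, not the scanner Git folders use: the source states only that keys with the AKIA prefix are detected, so the pattern's requirement of 16 further uppercase letters or digits is an assumption, and the function name is hypothetical.

```python
import re

# Assumption: an access key ID is "AKIA" followed by 16 uppercase
# letters or digits. The source only specifies the "AKIA" prefix.
ACCESS_KEY_PATTERN = re.compile(r"\bAKIA[0-9A-Z]{16}\b")


def find_access_key_ids(text):
    """Return (line_number, key_id) pairs for every suspected key in text."""
    findings = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for match in ACCESS_KEY_PATTERN.finditer(line):
            findings.append((line_number, match.group(0)))
    return findings


# The key below is a commonly used documentation placeholder, not a real key.
sample = 'region = "us-east-1"\naws_key = "AKIAIOSFODNN7EXAMPLE"\n'
for line_number, key_id in find_access_key_ids(sample):
    print(f"line {line_number}: possible access key ID {key_id}")
```

Reporting line numbers rather than only a yes/no result makes it easier to find and remove the key before committing.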
To delete a Git folder from your workspace:
Right-click the Git folder, and then select Move to trash.
In the dialog box, type the name of the Git folder you want to delete. Then, click Confirm & move to trash.