Learn about and configure the Git server proxy for Git folders, which enables you to proxy Git commands from Databricks Git folders to your on-premises Git repositories served by GitHub Enterprise Server, Bitbucket Server, and GitLab self-managed.
note
Users with a Databricks Git server proxy configured during preview should upgrade their cluster permissions for best performance. See Remove global CAN_ATTACH_TO permissions.
The Databricks Git server proxy is specifically designed to work with the version of the Databricks Runtime included in the configuration notebook. Users are discouraged from updating the Databricks Runtime version of the proxy cluster.
What is Git server proxy for Databricks Git folders?
Databricks Git server proxy for Git folders is a feature that allows you to proxy Git commands from your Databricks workspace to an on-premises Git server.
Databricks Git folders (formerly Repos) represent your connected Git repositories as folders. The contents of these folders are version-controlled by syncing them to the connected Git repository. By default, Git folders can synchronize only with public Git providers (like public GitHub, GitLab, Azure DevOps, and others). However, if you host your own on-premises Git server (such as GitHub Enterprise Server, Bitbucket Server, or GitLab self-managed), you must use Git server proxy with Git folders to provide Databricks access to your Git server. Your Git server must be accessible from your Databricks data plane (driver node).
If your corporate network is private (VPN) access only (no public access), you must run a Git server proxy to access Git repositories located outside of it and to add Git folders to your workspaces.
How does Git server proxy for Databricks Git folders work?
Git server proxy for Databricks Git folders proxies Git commands from the Databricks control plane to a proxy cluster running in your Databricks workspace's compute plane. In this context, the proxy cluster is a cluster configured to run a proxy service for Git commands from Databricks Git folders to your self-hosted Git repository. This proxy service receives Git commands from the Databricks control plane and forwards them to your Git server instance.
The diagram below illustrates the overall system architecture:
A Git server proxy no longer requires CAN_ATTACH_TO permission for all users. Admins with existing proxy clusters can now modify the cluster ACL permission to enable this feature. To enable it:
Select Compute from the sidebar, and then click the kebab menu next to the Compute entry for the Git Server Proxy you're running:
From the dialog, remove the Can Attach To entry for All Users:
This section describes how to prepare your Git server instance for Git server proxy for Databricks Git folders, create the proxy, and validate your configuration.
Before you begin
Before enabling the proxy, ensure that:
note
Git server proxy for Databricks works in all regions supported by your VPC.
Step 1: Prepare your Git server instance
important
You must be an admin on the workspace with access rights to create a compute resource and complete this task.
To configure your Git server instance:
Give the proxy cluster's driver node access to your Git server.
Your enterprise Git server can have an allowlist of IP addresses from which access is permitted.
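If your Git server enforces such an allowlist, it can help to check whether the driver node's IP address is already covered before you change firewall rules. The following is a minimal sketch using Python's standard `ipaddress` module. The function name and the sample addresses and ranges are hypothetical placeholders, not values from your environment.

```python
import ipaddress


def is_ip_allowed(ip, allowlist):
    """Return True if `ip` matches any address or CIDR range in `allowlist`."""
    address = ipaddress.ip_address(ip)
    # strict=False accepts entries such as "10.139.0.25/16" that have host bits set.
    return any(
        address in ipaddress.ip_network(entry, strict=False)
        for entry in allowlist
    )


# Hypothetical allowlist: one CIDR range and one single address.
allowlist = ["10.139.0.0/16", "192.0.2.10"]

# Hypothetical driver node IP addresses to test against the allowlist.
for candidate in ["10.139.0.25", "203.0.113.7"]:
    status = "allowed" if is_ip_allowed(candidate, allowlist) else "blocked"
    print(candidate, status)
```

If the driver node's address is reported as blocked, add it (or its subnet) to the allowlist on the Git server side.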
To enable the proxy:
Log into your Databricks workspace as a workspace admin with access rights to create a cluster.
Import this notebook, which chooses the smallest instance type available from your cloud provider to run the Git proxy:
Click Run All to run the notebook, which performs the following tasks:
As a best practice, consider creating a simple job to run the Git proxy compute resource. This can be a simple notebook that prints or logs status such as "The Git proxy service is running." Set the job to run on regular time intervals to ensure the Git proxy service is always available for your users.
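A notebook for such a job can be very small. The sketch below prints a timestamped status line; the function name `proxy_status_message` is a hypothetical helper, and the assumption is that the job is scheduled to run on the proxy compute resource so that the resource stays active.

```python
from datetime import datetime, timezone


def proxy_status_message(now=None):
    """Build a timestamped status line for the scheduled keep-alive job."""
    now = now or datetime.now(timezone.utc)
    return f"{now.isoformat()} The Git proxy service is running."


# The scheduled job only needs to execute this cell on the proxy cluster.
print(proxy_status_message())
```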
note
Running an additional long-running compute resource to host the proxy software incurs extra DBUs. To minimize costs, the notebook configures the proxy to use a single-node compute resource with an inexpensive node type. However, you might want to modify the compute options to suit your needs. For more information on compute instance pricing, see the Databricks pricing calculator.
Step 3: Validate your Git server configuration
To validate your Git server configuration, try to clone a repository hosted on your private Git server via the proxy cluster. A successful clone means that you have successfully enabled Git server proxy for your workspace.
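You can perform the clone from the workspace UI. If you prefer to script the check, one option is the Databricks Repos REST API (`POST /api/2.0/repos`). The sketch below only builds the request and does not send it. The workspace URL, token, repository URL, and target path are hypothetical placeholders, and the provider value `gitHubEnterprise` is assumed for a GitHub Enterprise Server instance.

```python
import json
import urllib.request


def build_clone_request(workspace_url, token, repo_url, provider, path):
    """Build (but do not send) a Repos API request that clones `repo_url`."""
    payload = {"url": repo_url, "provider": provider, "path": path}
    return urllib.request.Request(
        url=f"{workspace_url.rstrip('/')}/api/2.0/repos",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# All values below are placeholders; replace them with your own.
request = build_clone_request(
    workspace_url="https://example-workspace.cloud.databricks.com",
    token="<databricks-personal-access-token>",
    repo_url="https://git.company.com/org/repo-name.git",
    provider="gitHubEnterprise",
    path="/Repos/user@example.com/repo-name",
)
print(request.get_method(), request.full_url)

# To send the request, uncomment the next line. An HTTP 200 response
# indicates that the clone through the proxy succeeded.
# urllib.request.urlopen(request)
```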
Step 4: Create proxy-enabled Git repositories
After users configure their Git credentials, no further steps are required to create or synchronize your repos. To configure credentials and access the repositories for your Git folders programmatically, see Configure Git credentials & connect a remote repo to Databricks.
Remove global CAN_ATTACH_TO permissions
Admins with existing proxy clusters can now modify the cluster ACL permission to leverage generally available Git server proxy behavior.
If you previously configured Databricks Git server proxy with CAN_ATTACH_TO privileges, use the following steps to remove these permissions:
Select Compute from the sidebar, and then click the kebab menu next to the Compute entry for the Git server proxy you're running:
From the dialog, remove the Can Attach To entry for All Users:
Did you encounter an error while configuring Git server proxy for Databricks Git folders? Here are some common issues and ways to diagnose them more effectively.
Checklist for common problems
Before you start diagnosing an error, confirm that you've completed the following steps:
If your Git proxy service is not working with the default configuration, you can set specific environment variables to make changes to it to better support your network infrastructure.
Use the following environment variables to update the configuration for your Git proxy service:
To set these environment variables, go to the Compute tab in your Databricks workspace and select the compute configuration for your Git proxy service. At the bottom of the Configuration pane, expand Advanced and select the Spark tab under it. Set one or more of these environment variables by adding them to the Environment variables text area.
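As an illustration of what goes into that text area, the sketch below renders settings as one `KEY=value` line per variable, which is assumed here to be the format the Environment variables field expects. The helper name `render_env_vars` is hypothetical. Only the two variable names mentioned elsewhere on this page are used, and their values are placeholders.

```python
def render_env_vars(settings):
    """Render a mapping as KEY=value lines, one variable per line."""
    return "\n".join(f"{name}={value}" for name, value in settings.items())


# Placeholder values; replace them with the ones for your environment.
settings = {
    "GIT_PROXY_CA_CERT_PATH": "/dbfs/path/to/root-ca.pem",
    "GIT_PROXY_CUSTOM_HTTP_PORT": 8443,
}

# Paste the printed lines into the Environment variables text area.
print(render_env_vars(settings))
```

Restart the proxy compute resource after changing environment variables so that the proxy service picks up the new values.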
Inspect logs on the proxy cluster
The file at /databricks/git-proxy/git-proxy.log on the proxy cluster contains logs that are useful for debugging purposes.
The log file should start with the line Data-plane proxy server binding to ('', 8000)…. If it does not, this means that the proxy server did not start properly. Try restarting the cluster, or delete the cluster you created and run the enablement notebook again.
If the log file does start with this line, review the log statements that follow it for each Git request initiated by a Git operation in Databricks Git folders.
For example:
do_GET: https://server-address/path/to/repo/info/refs?service=git-upload-pack 10.139.0.25 - - [09/Jun/2021 06:53:02] /
"GET /server-address/path/to/repo/info/refs?service=git-upload-pack HTTP/1.1" 200
Error logs written to this file can be useful to help you or Databricks Support debug issues.
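The two checks described above (the startup line, then the per-request entries) can be scripted. The following is a minimal sketch that assumes request entries look like the example shown on this page, with a quoted HTTP request line followed by a status code. The function name and the regular expression are illustrative and might need adjusting for other log entries.

```python
import re

STARTUP_PREFIX = "Data-plane proxy server binding to"
# Matches entries such as: "GET /server-address/... HTTP/1.1" 200
REQUEST_LINE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) HTTP/[\d.]+" (?P<status>\d{3})'
)


def summarize_proxy_log(text):
    """Report whether the proxy started and list the HTTP requests it logged."""
    lines = text.splitlines()
    started = bool(lines) and lines[0].startswith(STARTUP_PREFIX)
    requests = []
    for line in lines:
        match = REQUEST_LINE.search(line)
        if match:
            requests.append(
                (match.group("method"), match.group("path"), int(match.group("status")))
            )
    # Treat any 4xx or 5xx status as a failed Git request.
    failures = [entry for entry in requests if entry[2] >= 400]
    return {"started": started, "requests": requests, "failures": failures}


# Sample text assembled from the log lines quoted on this page.
sample_log = (
    "Data-plane proxy server binding to ('', 8000)\n"
    '"GET /server-address/path/to/repo/info/refs?service=git-upload-pack HTTP/1.1" 200\n'
)
print(summarize_proxy_log(sample_log))

# On the proxy cluster, you could read the real file instead:
# with open("/databricks/git-proxy/git-proxy.log") as handle:
#     print(summarize_proxy_log(handle.read()))
```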
Common error messages and their resolution
Secure connection could not be established because of SSL problems
You might see the following error:
https://git.consult-prodigy.com/Prodigy/databricks_test: Secure connection to https://git.consult-prodigy.com/Prodigy/databricks_test could not be established because of SLL problems
Often this means that you are using a repository that requires special SSL certificates. Check the content of the /databricks/git-proxy/git-proxy.log file on the proxy cluster. If it says that certificate validation failed, then you must add the certificate authority's root certificate to the system certificate chain. First, extract the root certificate (using the browser or other option) and upload it to DBFS. Then, edit the Git proxy cluster to use the GIT_PROXY_CA_CERT_PATH environment variable to point to the root certificate file. For more information about editing cluster environment variables, see Environment variables.
After you have completed that step, restart the cluster.
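Before you point GIT_PROXY_CA_CERT_PATH at the uploaded file, a quick sanity check can confirm that the file is PEM-encoded. The sketch below uses `ssl.PEM_cert_to_DER_cert` from the Python standard library, which checks only the PEM header, footer, and base64 body of a single certificate. It does not verify that the content is a valid X.509 certificate or that it is the right root for your Git server. The function name and the sample input are hypothetical.

```python
import ssl


def looks_like_pem_certificate(pem_text):
    """Return True if `pem_text` has a PEM certificate header, footer, and base64 body."""
    try:
        ssl.PEM_cert_to_DER_cert(pem_text.strip())
    except ValueError:
        return False
    return True


# Placeholder input that only mimics the PEM envelope; it is not a real certificate.
sample_pem = ssl.DER_cert_to_PEM_cert(b"placeholder bytes, not a real certificate")
print(looks_like_pem_certificate(sample_pem))
print(looks_like_pem_certificate("this is not PEM"))

# For an uploaded file, read its text first, for example:
# with open("<path-to-your-root-certificate>") as handle:
#     print(looks_like_pem_certificate(handle.read()))
```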
Failure to clone repository with error "Missing/Invalid Git credentials"
First, check that you have configured your Git credentials in User Settings.
You might encounter this error:
Error: Invalid Git credentials. Go to User Settings -> Git Integration and check that your personal access token or app password has the correct repository access.
If your organization is using SAML SSO, make sure the token has been authorized (this can be done from your Git server's Personal Access Token (PAT) management page).
Import and run the Git proxy debug notebook. The results of the notebook run show if there are issues with the Git proxy service.
What are the security implications of the Git server proxy?
Yes. Your Databricks workspace does not differentiate between proxied and non-proxied repositories.
Does the Git proxy feature work with other Git enterprise server providers?
Databricks Git folders supports GitHub Enterprise, Bitbucket Server, Azure DevOps Server, and GitLab self-managed. Other enterprise Git server providers should work as well if they conform to common Git specifications.
Do Databricks Git folders support GPG signing of commits?
No.
Do Databricks Git folders support SSH transport for Git operations?
No. Only HTTPS is supported.
Is the use of a non-default HTTPS port on the Git server supported?
Currently, the enablement notebook assumes that your Git server uses the default HTTPS port 443. You can set the environment variable GIT_PROXY_CUSTOM_HTTP_PORT to override the port value with a preferred one.
You need one proxy cluster per Databricks workspace.
Can Databricks hide Git server URLs that are proxied? Could users enter the original Git server URLs rather than proxied URLs?
Yes to both questions. Users do not need to adjust their behavior for the proxy. With the current proxy implementation, all Git traffic for Databricks Git folders is routed through the proxy. Users enter the normal Git repository URL such as https://git.company.com/org/repo-name.git.
Yes, the proxy uses the user account's Git server token to authenticate to the Git server.
Is there Databricks access to Git server code?
The Databricks proxy service accesses the Git repository on the Git server using the user-provided credential and synchronizes any code files in the repository with the Git folder. Access is restricted by the permissions specified in the user-provided personal access token (PAT).