Showing content from https://docs.databricks.com/aws/en/repos/git-proxy below:

Set up private Git connectivity for Databricks Git folders (Repos)

Learn about and configure the Git server proxy for Git folders, which enables you to proxy Git commands from Databricks Git folders to your on-premises Git repositories served by GitHub Enterprise Server, Bitbucket Server, and GitLab self-managed.

note

Users with a Databricks Git server proxy configured during preview should upgrade their cluster permissions for best performance. See Remove global CAN_ATTACH_TO permissions.

The Databricks Git server proxy is specifically designed to work with the version of the Databricks Runtime included in the configuration notebook. Users are discouraged from updating the Databricks Runtime version of the proxy cluster.

What is Git server proxy for Databricks Git folders?​

Databricks Git server proxy for Git folders is a feature that allows you to proxy Git commands from your Databricks workspace to an on-premises Git server.

Databricks Git folders (formerly Repos) represent your connected Git repositories as folders. The contents of these folders are version-controlled by syncing them to the connected Git repository. By default, Git folders can synchronize only with public Git providers (like public GitHub, GitLab, Azure DevOps, and others). However, if you host your own on-premises Git server (such as GitHub Enterprise Server, Bitbucket Server, or GitLab self-managed), you must use Git server proxy with Git folders to provide Databricks access to your Git server. Your Git server must be accessible from your Databricks data plane (driver node).

If your corporate network allows private (VPN) access only, with no public access, you must run a Git server proxy to access Git repositories located outside of it and to add Git folders to your workspaces.

How does Git Server Proxy for Databricks Git folders work?​

Git server proxy for Databricks Git folders proxies Git commands from the Databricks control plane to a proxy cluster running in your Databricks workspace's compute plane. In this context, the proxy cluster is a cluster configured to run a proxy service for Git commands from Databricks Git folders to your self-hosted Git repository. This proxy service receives Git commands from the Databricks control plane and forwards them to your Git server instance.

The diagram below illustrates the overall system architecture:

A Git server proxy no longer requires CAN_ATTACH_TO permission for all users. Admins with existing proxy clusters can now modify the cluster ACL permissions to enable this behavior. To enable it:

  1. Select Compute from the sidebar, and then click the kebab menu next to the Compute entry for the Git Server Proxy you're running:

  2. From the dialog, remove the Can Attach To entry for All Users:

How do I set up Git Server Proxy for Databricks Git folders?​

This section describes how to prepare your Git server instance for Git server proxy for Databricks Git folders, create the proxy, and validate your configuration.

Before you begin​

Before enabling the proxy, ensure that:

note

Git server proxy for Databricks works in all regions supported by your VPC.

Step 1: Prepare your Git server instance​

important

You must be an admin on the workspace with access rights to create a compute resource and complete this task.

To configure your Git server instance:

  1. Give the proxy cluster's driver node access to your Git server.

    Your enterprise Git server can have an allowlist of IP addresses from which access is permitted.

    1. Associate a static outbound IP address for traffic that originates from your proxy cluster. You can do this by proxying traffic through a NAT gateway.
    2. Add the IP address from the previous step to your Git server's allowlist.
  2. Set your Git server instance to allow HTTPS transport.

Step 2: Run the enablement notebook

To enable the proxy:

  1. Log into your Databricks workspace as a workspace admin with access rights to create a cluster.

  2. Import this notebook, which chooses the smallest instance type available from your cloud provider to run the Git proxy:

    Notebook: Enable Git server proxy for Databricks Git folders for private Git server connectivity in Git folders.

  3. Click Run All to run the notebook, which performs the following tasks:

    As a best practice, consider creating a simple job to run the Git proxy compute resource. This can be a simple notebook that prints or logs status such as “The Git proxy service is running.” Set the job to run on regular time intervals to ensure the Git proxy service is always available for your users.
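A minimal sketch of such a status notebook, assuming it is scheduled as a job on the Git proxy compute resource. The function name `proxy_status_message` is hypothetical; only the message text comes from the suggestion above. Note that printing this line shows that the job ran on the compute resource, not that the proxy service itself is healthy.

```python
from datetime import datetime, timezone


def proxy_status_message(now=None):
    """Build a timestamped status line for the job's output (hypothetical helper)."""
    now = now or datetime.now(timezone.utc)
    return f"{now.isoformat()} The Git proxy service is running."


# Schedule the notebook containing this cell as a job on the proxy compute resource.
print(proxy_status_message())
```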

note

Running an additional long-running compute resource to host the proxy software incurs extra DBUs. To minimize costs, the notebook configures the proxy to use a single-node compute resource with an inexpensive node type. However, you might want to modify the compute options to suit your needs. For more information on compute instance pricing, see the Databricks pricing calculator.

Step 3: Validate your Git server configuration​

To validate your Git server configuration, try to clone a repository hosted on your private Git server via the proxy cluster. A successful clone confirms that Git server proxy is enabled for your workspace.
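If the clone fails, it can help to first rule out basic network reachability. The following is a rough sketch, assuming you run it from a notebook attached to the proxy cluster. It only checks that a TCP connection to the Git server's HTTPS port can be opened. The helper name `can_reach` and the hostname in the comment are illustrative, not part of the Databricks tooling.

```python
import socket


def can_reach(host, port=443, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False


# Illustrative hostname; replace it with your own Git server.
# print(can_reach("git.company.com"))
```

A `True` result shows only that the port is reachable from the driver node. It does not validate TLS, credentials, or the proxy service itself.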

Step 4: Create proxy-enabled Git repositories​

After users configure their Git credentials, no further steps are required to create or synchronize your repos. To configure credentials and access the repositories for your Git folders programmatically, see Configure Git credentials & connect a remote repo to Databricks.
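As a hedged illustration of programmatic credential setup, the sketch below builds (but does not send) a request to the Databricks Git credentials REST endpoint. The endpoint path `/api/2.0/git-credentials` and the field names reflect that API as I understand it, so verify them against the current API reference before use. The helper name and all example values are hypothetical.

```python
import json
import urllib.request


def build_git_credential_request(workspace_url, databricks_token,
                                 git_provider, git_username, git_pat):
    """Build a POST request that would register a Git credential for a user."""
    payload = {
        "git_provider": git_provider,      # for example "gitHubEnterprise" (assumed value)
        "git_username": git_username,
        "personal_access_token": git_pat,  # PAT issued by your Git server
    }
    return urllib.request.Request(
        url=f"{workspace_url.rstrip('/')}/api/2.0/git-credentials",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {databricks_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

To send the request, pass the returned object to `urllib.request.urlopen`. Avoid hard-coding tokens in notebooks; read them from a secret store instead.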

Remove global CAN_ATTACH_TO permissions​

Admins with existing proxy clusters can now modify the cluster ACL permissions to use the generally available Git server proxy behavior.

If you previously configured Databricks Git server proxy with CAN_ATTACH_TO privileges, use the following steps to remove these permissions:

  1. Select Compute from the sidebar, and then click the kebab menu next to the Compute entry for the Git server proxy you're running:

  2. From the dialog, remove the Can Attach To entry for All Users:

Troubleshooting​

Did you encounter an error while configuring Git server proxy for Databricks Git folders? Here are some common issues and ways to diagnose them more effectively.

Checklist for common problems​

Before you start diagnosing an error, confirm that you've completed the following steps:

Change your Git proxy configuration​

If your Git proxy service does not work with the default configuration, you can set environment variables to adjust it for your network infrastructure.

Use the following environment variables to update the configuration for your Git proxy service:

To set these environment variables, go to the Compute tab in your Databricks workspace and select the compute configuration for your Git proxy service. At the bottom of the Configuration pane, expand Advanced and select the Spark tab under it. Set one or more of these environment variables by adding them to the Environment variables text area.

Inspect logs on the proxy cluster​

The file at /databricks/git-proxy/git-proxy.log on the proxy cluster contains logs that are useful for debugging purposes.

The log file should start with the line Data-plane proxy server binding to ('', 8000)…. If it does not, this means that the proxy server did not start properly. Try restarting the cluster, or delete the cluster you created and run the enablement notebook again.
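The first-line check can be scripted. This is a minimal sketch, assuming it runs in a notebook attached to the proxy cluster so that the log path is readable. The helper name `proxy_started` is hypothetical, and the check matches only the prefix of the expected line because the rest of that line is elided above.

```python
from pathlib import Path

LOG_PATH = "/databricks/git-proxy/git-proxy.log"
EXPECTED_PREFIX = "Data-plane proxy server binding to"


def proxy_started(log_path=LOG_PATH):
    """Return True if the first log line begins with the expected startup prefix."""
    path = Path(log_path)
    if not path.is_file():
        return False
    with path.open("r", encoding="utf-8", errors="replace") as handle:
        first_line = handle.readline()
    return first_line.startswith(EXPECTED_PREFIX)
```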

If the log file does start with this line, review the log statements that follow it for each Git request initiated by a Git operation in Databricks Git folders.

For example:

  do_GET: https://server-address/path/to/repo/info/refs?service=git-upload-pack 10.139.0.25 - - [09/Jun/2021 06:53:02] /
  "GET /server-address/path/to/repo/info/refs?service=git-upload-pack HTTP/1.1" 200

Error logs written to this file can be useful to help you or Databricks Support debug issues.
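When the log is long, a small filter can list which upstream URLs the proxy forwarded. The sketch below assumes request lines follow the `do_GET: <url>` shape shown in the example. Matching other methods with the same `do_<METHOD>:` pattern is an assumption, and the helper name `forwarded_requests` is hypothetical.

```python
import re

# Matches lines such as "do_GET: https://server-address/path/to/repo/...".
# Only do_GET appears in the documented sample; other methods are assumed.
REQUEST_LINE = re.compile(r"^\s*do_([A-Z]+):\s+(\S+)")


def forwarded_requests(log_lines):
    """Yield (method, url) for each request line found in the log."""
    for line in log_lines:
        match = REQUEST_LINE.match(line)
        if match:
            yield match.group(1), match.group(2)
```

For example, `list(forwarded_requests(open("/databricks/git-proxy/git-proxy.log")))` collects the forwarded requests from the proxy log.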

Common error messages and their resolution

Frequently asked questions

What's the easiest way to find out if the Git proxy server is running?

Import and run the Git proxy debug notebook. The results of the notebook run show if there are issues with the Git proxy service.

What are the security implications of the Git server proxy?​

Yes. Your Databricks workspace does not differentiate between proxied and non-proxied repositories.

Does the Git proxy feature work with other Git enterprise server providers?​

Databricks Git folders supports GitHub Enterprise, Bitbucket Server, Azure DevOps Server, and GitLab self-managed. Other enterprise Git server providers should work as well if they conform to common Git specifications.

Do Databricks Git folders support GPG signing of commits?​

No.

Do Databricks Git folders support SSH transport for Git operations?​

No. Only HTTPS is supported.

Is the use of a non-default HTTPS port on the Git server supported?​

Currently, the enablement notebook assumes that your Git server uses the default HTTPS port 443. To use a different port, set the environment variable GIT_PROXY_CUSTOM_HTTP_PORT to override the port value.
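To make the override semantics concrete, here is an illustrative sketch of how a service could resolve the port from this variable, falling back to 443 when it is unset. This is not the proxy's actual code. The helper name `resolve_git_port` and the validation rules are assumptions, and it can serve as a sanity check before you add a line such as `GIT_PROXY_CUSTOM_HTTP_PORT=8443` (a hypothetical value) to the cluster's Environment variables text area.

```python
import os

DEFAULT_HTTPS_PORT = 443


def resolve_git_port(env=None):
    """Return the port from GIT_PROXY_CUSTOM_HTTP_PORT, or 443 if it is unset."""
    env = os.environ if env is None else env
    raw = env.get("GIT_PROXY_CUSTOM_HTTP_PORT", "").strip()
    if not raw:
        return DEFAULT_HTTPS_PORT
    port = int(raw)  # raises ValueError for non-numeric values
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    return port
```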

You need one proxy cluster per Databricks workspace.

Can Databricks hide Git server URLs that are proxied? Could users enter the original Git server URLs rather than proxied URLs?​

Yes to both questions. Users do not need to adjust their behavior for the proxy. With the current proxy implementation, all Git traffic for Databricks Git folders is routed through the proxy. Users enter the normal Git repository URL such as https://git.company.com/org/repo-name.git.

Does the feature transparently proxy authentication data to the Git server?​

Yes, the proxy uses the user account's Git server token to authenticate to the Git server.

Is there Databricks access to Git server code?​

The Databricks proxy service accesses the Git repository on the Git server using the user-provided credential and synchronizes any code files in the repository with the Git folder. Access is restricted by the permissions specified in the user-provided personal access token (PAT).

