DQX issues (tickets) are tracked on GitHub here. You can view all open issues, including those actively being worked on and those available for contribution. Issues with an assigned username are already in progress, while unassigned ones are open for anyone to pick up.
If you'd like to work on an issue, please either assign it to yourself or leave a comment to indicate that you’re taking it on. Before starting work, it's a good idea to discuss the issue in the comments, especially if you have questions, need clarification, or want to confirm the proposed approach.
If you have a new idea, consider opening a new issue first to gather feedback and ensure alignment with the project’s goals. This helps avoid duplicated effort and ensures contributions fit well with the roadmap. You can also start broader conversations or ask questions here.
First Principles
Favoring standard libraries over external dependencies, especially in specific contexts like Databricks, is a best practice in software development.
There are several reasons why this approach is encouraged:
While minimizing external dependencies is essential, exceptions can be made case-by-case. There are situations where external dependencies are justified, such as when a well-established and actively maintained library provides significant benefits, like time savings, performance improvements, or specialized functionality unavailable in standard libraries.
First contribution
If you're interested in contributing, please create a PR, contact us, or open an issue to discuss your ideas.
Here are the example steps to submit your first contribution:
Fork the DQX repo. You can also create a branch if you are added as a writer to the repo.
Clone the repo locally: git clone <repository-url>
git checkout main (or gcm if you're using ohmyzsh).
git pull (or gl if you're using ohmyzsh).
git checkout -b FEATURENAME (or gcb FEATURENAME if you're using ohmyzsh).
.. do the work
make fmt (Note: if you have an issue with make fmt, ensure your IDE folder is ignored in .gitignore. Entries for .idea/ and .cursor/ are already added.)
make lint
.. fix if any issues are reported
make test and make integration, and optionally make coverage (generates a coverage report)
.. fix if any issues are reported
git commit -S -a -m "message"
Make sure to enter a meaningful commit message title. You need to sign commits with your GPG key (hence the -S option). To set up a GPG key in your GitHub account, follow these instructions. You can configure Git to sign all commits with your GPG key by default: git config --global commit.gpgsign true
If you did not sign your commits initially, you can re-apply all of them and sign them as follows:
git reset --soft HEAD~<how-many-commit-go-back>
git commit -S --reuse-message=ORIG_HEAD
git push -f origin <remote-branch-name>
git push origin FEATURENAME
To access the repository, you must use the HTTPS remote with a personal access token or SSH with an SSH key and passphrase that has been authorized for the databrickslabs organization.
Go to the GitHub UI and create a PR. Alternatively, run gh pr create (if you have the GitHub CLI installed). Use a meaningful pull request title because it will appear in the release notes. Use Resolves #NUMBER in the pull request description to automatically link it to an existing issue.
This section provides a step-by-step guide for setting up and starting work on the project. These steps will help you set up your project environment and dependencies for efficient development.
To begin, install Hatch, which is our build tool.
On MacOSX, this is achieved using the following:
Run the following commands in your project’s root directory to create the default environment and install development dependencies, assuming you've already cloned the GitHub repo.
hatch env create
make dev
Before every commit, run make fmt to apply consistent formatting, as we want our codebase to look consistent.
Before every commit, run make lint and make test (the automated bug detector and unit tests) to ensure that the automated pull request checks pass before your code is reviewed by others.
Running integration tests and code coverage
Integration tests and code coverage are run automatically when you create a pull request in GitHub. You can also trigger the tests from a local machine by configuring authentication to a Databricks workspace.
Using terminal
If you want to run the tests from your local machine in the terminal, you need to set up the following environment variables:
export DATABRICKS_HOST=https://<workspace-url>
export DATABRICKS_CLUSTER_ID=<cluster-id>
# Authenticate to Databricks using OAuth generated for a service principal (recommended)
export DATABRICKS_CLIENT_ID=<oauth-client-id>
export DATABRICKS_CLIENT_SECRET=<oauth-client-secret>
# Optionally enable serverless compute to be used for the tests.
# Note that we run integration tests on standard and serverless compute clusters as part of the CI/CD pipelines
export DATABRICKS_SERVERLESS_COMPUTE_ID=auto
We recommend authenticating with Databricks using an OAuth access token generated for a service principal, as presented above. Alternatively, you can authenticate using a personal access token (PAT) by setting the DATABRICKS_TOKEN environment variable. However, we do not recommend this method, as it is less secure than OAuth.
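Before starting a long test run, it can help to confirm that the variables listed above are actually present in the shell. The following is a minimal sketch, not part of DQX: the helper name missing_settings is hypothetical, and it assumes that either a cluster ID or a serverless compute ID is enough to select compute.

```python
import os

# Variable names come from the export lines above; the helper itself is hypothetical.
REQUIRED = ("DATABRICKS_HOST",)
OAUTH = ("DATABRICKS_CLIENT_ID", "DATABRICKS_CLIENT_SECRET")
COMPUTE = ("DATABRICKS_CLUSTER_ID", "DATABRICKS_SERVERLESS_COMPUTE_ID")


def missing_settings(env):
    """Return a list of problems; an empty list means the settings look complete."""
    problems = [f"{name} is not set" for name in REQUIRED if not env.get(name)]
    has_oauth = all(env.get(name) for name in OAUTH)
    has_pat = bool(env.get("DATABRICKS_TOKEN"))
    if not (has_oauth or has_pat):
        problems.append(
            "set DATABRICKS_CLIENT_ID and DATABRICKS_CLIENT_SECRET (or DATABRICKS_TOKEN)"
        )
    # Assumption: one of the two compute settings is sufficient.
    if not any(env.get(name) for name in COMPUTE):
        problems.append("set DATABRICKS_CLUSTER_ID or DATABRICKS_SERVERLESS_COMPUTE_ID")
    return problems


if __name__ == "__main__":
    for problem in missing_settings(os.environ):
        print(problem)
```

Running the script in the same shell prints one line per missing setting and nothing when the configuration looks complete.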
Run integration tests with make integration.
Calculate test coverage and generate the HTML report with make coverage.
Using IDE
If you want to run integration tests from your IDE, you must set up a .env or ~/.databricks/debug-env.json file (see instructions). The name of the debug environment that you must define is ws (see the debug_env_name fixture in conftest.py).
Minimal Configuration
Create the ~/.databricks/debug-env.json file with the following content, replacing the placeholders:
{
"ws": {
"DATABRICKS_CLIENT_ID": "<oauth-client-id>",
"DATABRICKS_CLIENT_SECRET": "<oauth-client-secret>",
"DATABRICKS_HOST": "https://<workspace-url>",
"DATABRICKS_CLUSTER_ID": "<databricks-cluster-id>"
}
}
You must provide an existing cluster that will auto-start for you as part of the tests.
We recommend authenticating with Databricks using an OAuth access token generated for a service principal, as presented above. Alternatively, you can authenticate using a personal access token (PAT) by providing the DATABRICKS_TOKEN field. However, we do not recommend this method, as it is less secure than OAuth.
Running Tests on Serverless Compute
Integration tests are executed on both standard and serverless compute clusters as part of the CI/CD pipelines. To run integration tests on serverless compute, add the DATABRICKS_SERVERLESS_COMPUTE_ID field to your debug configuration:
{
"ws": {
"DATABRICKS_CLIENT_ID": "<oauth-client-id>",
"DATABRICKS_CLIENT_SECRET": "<oauth-client-secret>",
"DATABRICKS_HOST": "https://<workspace-url>",
"DATABRICKS_CLUSTER_ID": "<databricks-cluster-id>",
"DATABRICKS_SERVERLESS_COMPUTE_ID": "auto"
}
}
When DATABRICKS_SERVERLESS_COMPUTE_ID is set, DATABRICKS_CLUSTER_ID is ignored, and tests run on serverless compute.
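To double-check which compute a given debug configuration would select, the precedence rule above can be expressed in a few lines. This is an illustrative sketch only: load_debug_env and describe_compute are hypothetical helpers, not part of DQX or its test fixtures, and the file layout is assumed to match the JSON shown above.

```python
import json
from pathlib import Path


def load_debug_env(path, name="ws"):
    """Read one named environment from a debug-env.json style file."""
    with Path(path).expanduser().open(encoding="utf-8") as handle:
        return json.load(handle)[name]


def describe_compute(settings):
    """Mirror the documented rule: a serverless ID takes precedence over a cluster ID."""
    if settings.get("DATABRICKS_SERVERLESS_COMPUTE_ID"):
        return "serverless"
    if settings.get("DATABRICKS_CLUSTER_ID"):
        return "cluster " + settings["DATABRICKS_CLUSTER_ID"]
    return "no compute configured"
```

For example, describe_compute(load_debug_env("~/.databricks/debug-env.json")) reports "serverless" for the second configuration above, even though a cluster ID is also present.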
All changes must be covered by unit tests and integration tests. A pull request (PR) will be blocked if the proposed change reduces code coverage. However, manual testing may still be useful before creating or merging a PR.
To test DQX from your feature branch, you can install it directly as follows:
pip install git+https://github.com/databrickslabs/dqx.git@feature_branch_name
Replace feature_branch_name with the name of your branch.
Once you have cloned the repo locally and installed the Databricks CLI, you can run labs CLI commands from the root of the repository. As with other Databricks CLI commands, you can specify the Databricks profile to use with --profile.
Build the project:
Authenticate your current machine to your Databricks Workspace:
databricks auth login --host <WORKSPACE_HOST>
Show info about the project:
Install dqx:
# use the current codebase
databricks labs install .
Show current installation username:
Uninstall DQX:
databricks labs uninstall dqx
Manual testing of the CLI commands from a pre-release version
In most cases, installing DQX directly from the current codebase is sufficient to test CLI commands. However, this approach may not be ideal in some cases because the CLI would use the current development virtual environment. When DQX is installed from a released version, it creates a fresh and isolated Python virtual environment locally and installs all the required packages, ensuring a clean setup. If you need to perform end-to-end testing of the CLI before an official release, follow the process outlined below.
Usage tips
This method is only available for GitHub accounts with write access to the repository. It is not available if you contribute from a fork.
# create new tag
git tag v0.1.12-alpha
# push the tag
git push origin v0.1.12-alpha
# specify the tag (pre-release version)
databricks labs install dqx@v0.1.12-alpha
Release
The release pipeline only triggers when a valid semantic version is provided (e.g. v0.1.12). Pre-release versions (e.g. v0.1.12-alpha) do not trigger the release pipeline, allowing you to test changes safely before making an official release.
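The distinction between release and pre-release tags can be illustrated with a small check. This is a sketch under a stated assumption: the pipeline's actual trigger rule is not shown here, so the pattern below (a "v" followed by MAJOR.MINOR.PATCH with no suffix) and the function name triggers_release are hypothetical.

```python
import re

# Assumed pattern: "v" followed by MAJOR.MINOR.PATCH and nothing else.
RELEASE_TAG = re.compile(r"v\d+\.\d+\.\d+")


def triggers_release(tag):
    """Return True only for tags that match the assumed release pattern."""
    return RELEASE_TAG.fullmatch(tag) is not None
```

Under this assumption, v0.1.12 counts as a release tag while v0.1.12-alpha does not, which matches the behavior described above.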
Troubleshooting
If you encounter any package dependency errors after git pull, run make clean.
mypy errors
See https://mypy.readthedocs.io/en/stable/cheat_sheet_py3.html for more details.
..., expression has type "None", variable has type "str"
Add assert ... is not None if it's in the body of a method. Example:
# error: Argument 1 to "delete" of "DashboardWidgetsAPI" has incompatible type "str | None"; expected "str"
self._ws.dashboard_widgets.delete(widget.id)
after:
assert widget.id is not None
self._ws.dashboard_widgets.delete(widget.id)
Add ... | None if it's in a dataclass. Example: cloud: str = None -> cloud: str | None = None
..., has incompatible type "Path"; expected "str"
Add .as_posix() to convert Path to str.
Argument 2 to "get" of "dict" has incompatible type "None"; expected ...
Add a valid default value for the dictionary return.
Example:
def viz_type(self) -> str:
    return self.viz.get("type", None)
after:
def viz_type(self) -> str:
    return self.viz.get("type", "UNKNOWN")
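The fixes above can be seen together in one small, self-contained module. This is only an illustrative sketch: Widget, delete_widget, and as_text are hypothetical names invented for the example and do not exist in DQX or the Databricks SDK.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class Widget:
    # "str | None" rather than "str", because the default value is None
    id: str | None = None
    options: dict[str, str] = field(default_factory=dict)

    def viz_type(self) -> str:
        # A str default keeps the return type "str" instead of "str | None"
        return self.options.get("type", "UNKNOWN")


def delete_widget(widget: Widget) -> str:
    # The assert narrows "str | None" to "str" for the type checker
    assert widget.id is not None
    return "deleted " + widget.id


def as_text(path: Path) -> str:
    # Converts a Path to a str using forward slashes
    return path.as_posix()
```

Each function corresponds to one of the error messages listed above, so the module should pass mypy without the reported incompatibilities.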