This article describes how to run tests with pytest using the Databricks extension for Visual Studio Code. See What is the Databricks extension for Visual Studio Code?.

You can run pytest on local code that does not need a connection to a cluster in a remote Databricks workspace. For example, you might use pytest to test functions that accept and return PySpark DataFrames in local memory. To get started with pytest and run it locally, see Get Started in the pytest documentation.
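For instance, a purely local test needs nothing Databricks-specific. The sketch below uses a hypothetical function and file name to show the shape of such a test; the same pattern applies to functions that transform PySpark DataFrames in local memory:

```python
# test_pricing.py -- a hypothetical local test file; run it with `pytest test_pricing.py`.

def apply_discount(price: float, percent: float) -> float:
    # Function under test: reduce the price by the given percentage.
    return price * (1 - percent / 100)

def test_apply_discount():
    # pytest discovers functions whose names start with `test_`
    # in files whose names start with `test_` (or end with `_test`).
    assert apply_discount(100.0, 25.0) == 75.0
```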
To run pytest on code in a remote Databricks workspace, do the following in your Visual Studio Code project:

Step 1: Create the tests

Add a Python file with the following code, which contains your tests to run. This example assumes that this file is named spark_test.py and is at the root of your Visual Studio Code project. This file contains a pytest fixture, which makes the cluster's SparkSession (the entry point to Spark functionality on the cluster) available to the tests. This file contains a single test that checks whether the specified cell in the table contains the specified value. You can add your own tests to this file as needed.
Python

from pyspark.sql import SparkSession
import pytest


@pytest.fixture
def spark() -> SparkSession:
    return SparkSession.builder.getOrCreate()


def test_spark(spark):
    spark.sql('USE default')
    data = spark.sql('SELECT * FROM diamonds')
    assert data.collect()[0][2] == 'Ideal'
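The fixture mechanism itself is not Spark-specific. As a minimal illustration of the pattern (hypothetical names, no cluster required), a fixture can supply any shared object to the tests that name it as a parameter:

```python
import pytest

@pytest.fixture
def table_config():
    # Built for each test that requests it; pytest injects the return value.
    return {"schema": "default", "table": "diamonds"}

def test_table_name(table_config):
    # The parameter name matches the fixture name, so pytest
    # passes the fixture's return value here automatically.
    assert table_config["table"] == "diamonds"
```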
Step 2: Create the pytest runner
Add a Python file with the following code, which instructs pytest to run your tests from the previous step. This example assumes that the file is named pytest_databricks.py and is at the root of your Visual Studio Code project.
Python

import pytest
import os
import sys

# Run pytest from the directory that contains this file, so that
# relative test paths (such as ".") resolve consistently.
dir_root = os.path.dirname(os.path.realpath(__file__))
os.chdir(dir_root)

# Skip writing .pyc bytecode files.
sys.dont_write_bytecode = True

# Forward any command-line arguments (for example ".") to pytest.
# pytest.main returns an exit code: 0 means all tests passed.
retcode = pytest.main(sys.argv[1:])
Step 3: Create a custom run configuration

To instruct pytest to run your tests, you must create a custom run configuration. Use the existing Databricks cluster-based run configuration to create your own custom run configuration, as follows:
On the main menu, click Run > Add configuration.
In the Command Palette, select Databricks.
Visual Studio Code adds a .vscode/launch.json file to your project, if this file does not already exist.
Change the starter run configuration as follows, and then save the file:

- Change this run configuration's name from Run on Databricks to some unique display name for this configuration, in this example Unit Tests (on Databricks).
- Change program from ${file} to the path in the project that contains the test runner, in this example ${workspaceFolder}/pytest_databricks.py.
- Change args from [] to the path in the project that contains the files with your tests, in this example ["."].

Your launch.json file should look like this:
JSON

{
  "version": "0.2.0",
  "configurations": [
    {
      "type": "databricks",
      "request": "launch",
      "name": "Unit Tests (on Databricks)",
      "program": "${workspaceFolder}/pytest_databricks.py",
      "args": ["."],
      "env": {}
    }
  ]
}
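Because args is passed straight through to pytest.main by the runner, you can also include ordinary pytest flags there. For example (an illustrative variant, not required by the extension), to get verbose per-test output:

```json
"args": ["-v", "."],
```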
Step 4: Run the tests

Make sure that pytest is already installed on the cluster first. For example, with the cluster's settings page open in your Databricks workspace, do the following:

- On the Libraries tab, check whether pytest is already installed.
- If pytest is not visible, click Install new and install pytest.

To run the tests, run the Unit Tests (on Databricks) configuration that you created above from your Visual Studio Code project.
The pytest results display in the Debug Console (View > Debug Console on the main menu). For example, these results show that at least one test was found in the spark_test.py file, and a dot (.) means that a single test was found and passed. (A failing test would show an F.)
<date>, <time> - Creating execution context on cluster <cluster-id> ...
<date>, <time> - Synchronizing code to /Workspace/path/to/directory ...
<date>, <time> - Running /pytest_databricks.py ...
============================= test session starts ==============================
platform linux -- Python <version>, pytest-<version>, pluggy-<version>
rootdir: /Workspace/path/to/directory
collected 1 item
spark_test.py . [100%]
============================== 1 passed in 3.25s ===============================
<date>, <time> - Done (took 10818ms)