Run tests with pytest using the Databricks extension for Visual Studio Code

This article describes how to run tests with pytest using the Databricks extension for Visual Studio Code. See What is the Databricks extension for Visual Studio Code?.

You can run pytest on local code that does not need a connection to a cluster in a remote Databricks workspace. For example, you might use pytest to test your functions that accept and return PySpark DataFrames in local memory. To get started with pytest and run it locally, see Get Started in the pytest documentation.
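For illustration, a local test of this kind might look like the following sketch; the `discount` function and its values are hypothetical, and no Spark or cluster connection is required. Running `pytest` in the file's directory discovers and runs it:

```python
# test_discount.py -- a local unit test that needs no cluster connection.

def discount(price: float, pct: float) -> float:
    """Apply a percentage discount to a price (hypothetical example)."""
    return round(price * (1 - pct / 100), 2)

def test_discount():
    # pytest discovers functions whose names start with "test_".
    assert discount(100.0, 25.0) == 75.0
```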

To run pytest on code in a remote Databricks workspace, do the following in your Visual Studio Code project:

Step 1: Create the tests

Add a Python file with the following code, which contains your tests to run. This example assumes that this file is named spark_test.py and is at the root of your Visual Studio Code project. The file contains a pytest fixture, which makes the cluster's SparkSession (the entry point to Spark functionality on the cluster) available to the tests, and a single test that checks whether the specified cell in the table contains the specified value. You can add your own tests to this file as needed.

Python

from pyspark.sql import SparkSession
import pytest

@pytest.fixture
def spark() -> SparkSession:
    # Create a SparkSession (the entry point to Spark functionality) on
    # the cluster in the remote Databricks workspace, or reuse the
    # existing one, and make it available to the tests.
    return SparkSession.builder.getOrCreate()

def test_spark(spark):
    spark.sql('USE default')
    data = spark.sql('SELECT * FROM diamonds')
    assert data.collect()[0][2] == 'Ideal'
Step 2: Create the pytest runner

Add a Python file with the following code, which instructs pytest to run your tests from the previous step. This example assumes that the file is named pytest_databricks.py and is at the root of your Visual Studio Code project.

Python

import pytest
import os
import sys

# Run all tests in the directory that contains this file, which is also
# the project's root directory, so that relative paths in the tests
# resolve correctly.
dir_root = os.path.dirname(os.path.realpath(__file__))
os.chdir(dir_root)

# Skip writing .pyc files to __pycache__ directories.
sys.dont_write_bytecode = True

# Run pytest, passing through any command-line arguments (such as the "."
# from the run configuration's args setting) so that pytest discovers
# tests starting from the current directory.
retcode = pytest.main(sys.argv[1:])
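pytest.main returns an exit code: 0 when all tests pass, 1 when at least one test fails, and 5 when no tests were collected. If you want the calling process to see failures, a hedged variant of the runner can propagate that code; the runner parameter below exists only to make the sketch self-contained and is not part of the pytest API:

```python
import sys

def run_tests(argv, runner=None):
    """Run pytest on the given arguments and return its integer exit code
    (0 means all tests passed; 1 means at least one test failed)."""
    if runner is None:
        import pytest  # pytest.main is the real runner
        runner = pytest.main
    return int(runner(argv))

# In pytest_databricks.py you could then write:
#     sys.exit(run_tests(sys.argv[1:]))
# so that the calling process sees a nonzero status when tests fail.
```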
Step 3: Create a custom run configuration

To instruct pytest to run your tests, you must create a custom run configuration. Use the existing Databricks cluster-based run configuration to create your own custom run configuration, as follows:

  1. On the main menu, click Run > Add configuration.

  2. In the Command Palette, select Databricks.

    Visual Studio Code adds a .vscode/launch.json file to your project, if this file does not already exist.

  3. Change the starter run configuration as follows, and then save the file. Your launch.json file should look like this:

    JSON

    {
      "version": "0.2.0",
      "configurations": [
        {
          "type": "databricks",
          "request": "launch",
          "name": "Unit Tests (on Databricks)",
          "program": "${workspaceFolder}/pytest_databricks.py",
          "args": ["."],
          "env": {}
        }
      ]
    }
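The args array is passed to the runner script and forwarded to pytest.main through sys.argv, so you can add pytest command-line flags there. For example, a hypothetical variation that enables verbose output:

```json
"args": [".", "-v"]
```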
Step 4: Run the tests

First, make sure that pytest is installed on the cluster. For example, with the cluster's settings page open in your Databricks workspace, do the following:

  1. On the Libraries tab, if pytest is visible, then pytest is already installed. If pytest is not visible, click Install new.
  2. For Library Source, click PyPI.
  3. For Package, enter pytest.
  4. Click Install.
  5. Wait until Status changes from Pending to Installed.
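Alternatively, pytest can be installed programmatically through the Databricks Libraries API (POST /api/2.0/libraries/install on your workspace URL). A sketch of the request body only; "<cluster-id>" is a placeholder for your cluster's ID, and authentication is omitted:

```python
import json

# Request body for POST /api/2.0/libraries/install. "<cluster-id>" is a
# placeholder; authentication headers are not shown in this sketch.
payload = {
    "cluster_id": "<cluster-id>",
    "libraries": [{"pypi": {"package": "pytest"}}],
}
body = json.dumps(payload)
print(body)
```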

To run the tests, do the following from your Visual Studio Code project:

  1. On the main menu, click View > Run.
  2. In the Run and Debug list, click Unit Tests (on Databricks), if it is not already selected.
  3. Click the green arrow (Start Debugging) icon.

The pytest results display in the Debug Console (View > Debug Console on the main menu). For example, these results show that one test was found in the spark_test.py file; the dot (.) means that the test passed. (A failing test would show an F instead.)

<date>, <time> - Creating execution context on cluster <cluster-id> ...
<date>, <time> - Synchronizing code to /Workspace/path/to/directory ...
<date>, <time> - Running /pytest_databricks.py ...
============================= test session starts ==============================
platform linux -- Python <version>, pytest-<version>, pluggy-<version>
rootdir: /Workspace/path/to/directory
collected 1 item

spark_test.py . [100%]

============================== 1 passed in 3.25s ===============================
<date>, <time> - Done (took 10818ms)
