You can run and debug notebooks, one cell at a time or all cells at once, and see their results in the Visual Studio Code UI by using the Databricks Connect integration in the Databricks extension for Visual Studio Code. All code runs locally, except code involving DataFrame operations, which runs on the cluster in the remote Databricks workspace; run responses are sent back to the local caller. Likewise, all code is debugged locally, while Spark code continues to run on the remote cluster. The core Spark engine code cannot be debugged directly from the client.
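The split between local and remote execution can be sketched as follows. This is a minimal illustration, not code from the extension: the function name summarize_ids is hypothetical, and spark is assumed to be a Databricks Connect session such as the preconfigured notebook global.

```python
def summarize_ids(spark, n: int) -> dict:
    """Mix remote DataFrame work with local Python (hypothetical helper)."""
    # DataFrame operations: these are sent to the remote cluster.
    df = spark.range(n).filter("id % 2 = 0")
    rows = df.collect()  # the results are returned to the local caller

    # Plain Python: this part runs locally and can be stepped through
    # with the Visual Studio Code debugger.
    ids = [row["id"] for row in rows]
    return {"count": len(ids), "max": max(ids) if ids else None}
```

A breakpoint on the list comprehension stops in local code. The filter itself is evaluated on the cluster and cannot be stepped into from the client.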
note
This feature works with Databricks Runtime 13.3 and above.
To enable the Databricks Connect integration for notebooks, you must first install Databricks Connect in the Databricks extension for Visual Studio Code. See Debug code using Databricks Connect for the Databricks extension for Visual Studio Code.
Run Python notebook cells
For notebooks with filenames that have a .py extension, when you open the notebook in the Visual Studio Code IDE, each cell displays Run Cell, Run Above, and Debug Cell buttons. As you run a cell, its results are shown in a separate tab in the IDE. As you debug, the cell being debugged displays Continue, Stop, and Step Over buttons. As you debug a cell, you can use Visual Studio Code debugging features such as watching variables' states and viewing the call stack and debug console.
For notebooks with filenames that have a .ipynb extension, when you open the notebook in the Visual Studio Code IDE, the notebook and its cells contain additional features. See Running cells and Work with code cells in the Notebook Editor.
For more information about notebook formats for filenames with the .py and .ipynb extensions, see Import and export Databricks notebooks.
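To show what a .py notebook with cells looks like, here is a small sketch. It assumes the Databricks source format, in which the file starts with a "# Databricks notebook source" header and cells are separated by "# COMMAND ----------" lines. Neither marker is described in this section, and the helper split_cells is hypothetical.

```python
# A tiny notebook in the assumed Databricks source (.py) format.
NOTEBOOK_SOURCE = """# Databricks notebook source
# MAGIC %md
# MAGIC # Example notebook

# COMMAND ----------

df = spark.range(5)

# COMMAND ----------

display(df)
"""

HEADER = "# Databricks notebook source"
CELL_SEPARATOR = "# COMMAND ----------"


def split_cells(source: str) -> list:
    """Split a source-format notebook into its cells (hypothetical helper)."""
    # Drop the header line if present, then split on the cell separator.
    body = source[len(HEADER):] if source.startswith(HEADER) else source
    return [cell.strip() for cell in body.split(CELL_SEPARATOR) if cell.strip()]


cells = split_cells(NOTEBOOK_SOURCE)
```

Each element of cells corresponds to one cell that would get its own Run Cell, Run Above, and Debug Cell buttons.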
To run or debug a Python Jupyter notebook (.ipynb):
In your project, open the Python Jupyter notebook that you want to run or debug. Make sure the Python file is in Jupyter notebook format and has the extension .ipynb.
tip
You can create a new Python Jupyter notebook by running the >Create: New Jupyter Notebook command from within the Command Palette.
Click Run All Cells to run all cells without debugging, Execute Cell to run an individual corresponding cell without debugging, or Run by Line to run an individual cell line-by-line with limited debugging, with variable values displayed in the Jupyter panel (View > Open View > Jupyter).
For full debugging within an individual cell, set breakpoints, and then click Debug Cell in the menu next to the cell's Run button.
After you click any of these options, you might be prompted to install missing Python Jupyter notebook package dependencies. Click to install.
For more information, see Jupyter Notebooks in VS Code.
The following notebook globals are also enabled:
spark, representing an instance of databricks.connect.DatabricksSession, is preconfigured to instantiate DatabricksSession by getting Databricks authentication credentials from the extension. If DatabricksSession is already instantiated in a notebook cell's code, that DatabricksSession's settings are used instead. See Code examples for Databricks Connect for Python.
udf, preconfigured as an alias for pyspark.sql.functions.udf, which creates Python UDFs. See pyspark.sql.functions.udf.
sql, preconfigured as an alias for spark.sql. spark, as described earlier, represents a preconfigured instance of databricks.connect.DatabricksSession. See Spark SQL.
dbutils, preconfigured as an instance of Databricks Utilities, which is imported from databricks-sdk and is instantiated by getting Databricks authentication credentials from the extension. See Use Databricks Utilities.
note
Only a subset of Databricks Utilities is supported for notebooks with Databricks Connect.
To enable dbutils.widgets, you must first install the Databricks SDK for Python by running the following command in your local development machine's terminal:
pip install 'databricks-sdk[notebook]'
display, preconfigured as an alias for the Jupyter builtin IPython.display.display. See IPython.display.display.
displayHTML, preconfigured as an alias for dbruntime.display.displayHTML, which is an alias for display.HTML from IPython. See IPython.display.HTML.
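As a rough illustration of what these globals correspond to, the sketch below builds similar objects by hand. It is not the extension's actual setup code: the function name build_notebook_globals is hypothetical, dbutils is left out, and displayHTML is approximated with IPython.display.HTML. It assumes that databricks-connect and IPython are installed and that Databricks authentication is already configured.

```python
def build_notebook_globals() -> dict:
    """Approximate the preconfigured notebook globals (hypothetical helper)."""
    # Imported lazily so that defining this function does not require
    # databricks-connect, pyspark, or IPython to be installed.
    from databricks.connect import DatabricksSession
    from IPython.display import HTML, display
    from pyspark.sql.functions import udf

    # Assumption: credentials are resolved from the local Databricks
    # configuration rather than supplied by the extension.
    spark = DatabricksSession.builder.getOrCreate()

    def display_html(html: str) -> None:
        # Approximation of displayHTML: render an HTML string in the notebook.
        display(HTML(html))

    return {
        "spark": spark,
        "sql": spark.sql,  # sql is described as an alias for spark.sql
        "udf": udf,
        "display": display,
        "displayHTML": display_html,
    }
```

In a notebook opened through the extension none of this is needed, because the globals already exist. The sketch only makes the aliases explicit.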
The following notebook magics are also enabled:
%fs, which is the same as making dbutils.fs calls. See Mix languages.
%sh, which runs a command by using the cell magic %%script on the local machine. This does not run the command in the remote Databricks workspace. See Mix languages.
%md and %md-sandbox, which run the cell magic %%markdown. See Mix languages.
%sql, which runs spark.sql. See Mix languages.
%pip, which runs pip install on the local machine. This does not run pip install in the remote Databricks workspace. See Manage libraries with %pip commands.
%run, which runs another notebook. See Orchestrate notebooks and modularize code in notebooks.
note
To enable %run, you must first install the nbformat library by running the following command in your local development machine's terminal:
pip install nbformat
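The difference between magics that run locally and magics that go through Spark can be sketched in plain Python. The helpers below are hypothetical analogies, not the extension's implementation: run_sh_locally uses subprocess in place of the %%script cell magic, and run_sql forwards a query to the spark.sql method of whatever session object it is given.

```python
import subprocess


def run_sh_locally(command: str) -> str:
    """Analogy for %sh: run a shell command on the local machine."""
    completed = subprocess.run(
        command, shell=True, capture_output=True, text=True, check=True
    )
    return completed.stdout


def run_sql(spark, query: str):
    """Analogy for %sql: pass the query to spark.sql."""
    return spark.sql(query)
```

For example, run_sh_locally("echo hello") returns output produced on the development machine, never on the remote cluster, whereas run_sql(spark, "SELECT 1") is evaluated through the Spark session.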
Additional features that are enabled include:
Limitations of running cells in notebooks in Visual Studio Code include:
%r and %scala are not supported and display an error if called. See Mix languages.
%sql does not support some DML commands, such as SHOW TABLES.