This tutorial walks you through setting up the Databricks extension for Visual Studio Code, and then running Python on a Databricks cluster and as a Databricks job in your remote workspace. See What is the Databricks extension for Visual Studio Code?.
Requirements
This tutorial requires that:
In this step, you create a new Databricks project and configure the connection with your remote Databricks workspace.
https://dbc-a1b2345c-d6e7.cloud.databricks.com
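The extension can also pick up credentials from a Databricks configuration profile instead of prompting for them each time. A minimal sketch of a `~/.databrickscfg` profile, assuming personal access token authentication; the host below reuses the example workspace URL above, and the token value is a placeholder you must replace with your own:

```ini
[DEFAULT]
host  = https://dbc-a1b2345c-d6e7.cloud.databricks.com
token = <your-personal-access-token>
```

Profiles other than `[DEFAULT]` can be selected by name when the extension asks how to authenticate.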
With the Configuration view already open, click Select a cluster or click the gear (Configure cluster) icon.
In the Command Palette, select the name of the cluster that you created previously.
Click the play icon (Start Cluster) if it is not already started.
Create a local Python code file: on the sidebar, click the folder (Explorer) icon.
On the main menu, click File > New File and choose a Python file. Name the file demo.py and save it to the project's root.
Add the following code to the file and then save it. This code creates and displays the contents of a basic PySpark DataFrame:
Python
from pyspark.sql import SparkSession
from pyspark.sql.types import *

spark = SparkSession.builder.getOrCreate()

schema = StructType([
    StructField('CustomerID', IntegerType(), False),
    StructField('FirstName', StringType(), False),
    StructField('LastName', StringType(), False)
])

data = [
    [ 1000, 'Mathijs', 'Oosterhout-Rijntjes' ],
    [ 1001, 'Joost', 'van Brunswijk' ],
    [ 1002, 'Stan', 'Bokenkamp' ]
]

customers = spark.createDataFrame(data, schema)
customers.show()
Output
# +----------+---------+-------------------+
# |CustomerID|FirstName| LastName|
# +----------+---------+-------------------+
# | 1000| Mathijs|Oosterhout-Rijntjes|
# | 1001| Joost| van Brunswijk|
# | 1002| Stan| Bokenkamp|
# +----------+---------+-------------------+
Click the Run on Databricks icon next to the list of editor tabs, and then click Upload and Run File. The output appears in the Debug Console view.
Alternatively, in the Explorer view, right-click the demo.py file, and then click Run on Databricks > Upload and Run File.
To run demo.py as a job, click the Run on Databricks icon next to the list of editor tabs, and then click Run File as Workflow. The output appears in a separate editor tab next to the demo.py file editor.
Alternatively, right-click the demo.py file in the Explorer panel, then select Run on Databricks > Run File as Workflow.
Now that you have successfully used the Databricks extension for Visual Studio Code to upload a local Python file and run it remotely, you can also:
Run tests by using pytest. See Run tests with pytest using the Databricks extension for Visual Studio Code.