note
This article covers Databricks Connect for Databricks Runtime 13.3 LTS and above.
This article provides code examples that use Databricks Connect for Python. Databricks Connect enables you to connect popular IDEs, notebook servers, and custom applications to Databricks clusters. See What is Databricks Connect?. For the Scala version of this article, see Code examples for Databricks Connect for Scala.
Databricks provides several additional example applications that show how to use Databricks Connect. See the example applications for Databricks Connect repository in GitHub, specifically:
You can also use the following simpler code examples to experiment with Databricks Connect. These examples assume that you are using default authentication for Databricks Connect client setup.
This simple code example queries the specified table and then shows the specified table's first 5 rows. To use a different table, adjust the call to spark.read.table
.
Python
from databricks.connect import DatabricksSession
spark = DatabricksSession.builder.getOrCreate()
df = spark.read.table("samples.nyctaxi.trips")
df.show(5)
This longer code example does the following:
zzz_demo_temps_table
within the default
schema. If the table with this name already exists, the table is deleted first. To use a different schema or table, adjust the calls to spark.sql
, temps.write.saveAsTable
, or both.SELECT
query on the table's contents.Python
from databricks.connect import DatabricksSession
from pyspark.sql.types import *
from datetime import date
spark = DatabricksSession.builder.getOrCreate()
schema = StructType([
StructField('AirportCode', StringType(), False),
StructField('Date', DateType(), False),
StructField('TempHighF', IntegerType(), False),
StructField('TempLowF', IntegerType(), False)
])
data = [
[ 'BLI', date(2021, 4, 3), 52, 43],
[ 'BLI', date(2021, 4, 2), 50, 38],
[ 'BLI', date(2021, 4, 1), 52, 41],
[ 'PDX', date(2021, 4, 3), 64, 45],
[ 'PDX', date(2021, 4, 2), 61, 41],
[ 'PDX', date(2021, 4, 1), 66, 39],
[ 'SEA', date(2021, 4, 3), 57, 43],
[ 'SEA', date(2021, 4, 2), 54, 39],
[ 'SEA', date(2021, 4, 1), 56, 41]
]
temps = spark.createDataFrame(data, schema)
spark.sql('USE default')
spark.sql('DROP TABLE IF EXISTS zzz_demo_temps_table')
temps.write.saveAsTable('zzz_demo_temps_table')
df_temps = spark.sql("SELECT * FROM zzz_demo_temps_table " \
"WHERE AirportCode != 'BLI' AND Date > '2021-04-01' " \
"GROUP BY AirportCode, Date, TempHighF, TempLowF " \
"ORDER BY TempHighF DESC")
df_temps.show()
spark.sql('DROP TABLE zzz_demo_temps_table')
note
The following example describes how to to write code that is portable between Databricks Connect for Databricks Runtime 13.3 LTS and above in environments where the DatabricksSession
class is unavailable.
The following example uses the DatabricksSession
class, or uses the SparkSession
class if the DatabricksSession
class is unavailable, to query the specified table and return the first 5 rows. This example uses the SPARK_REMOTE
environment variable for authentication.
Python
from pyspark.sql import SparkSession, DataFrame
def get_spark() -> SparkSession:
try:
from databricks.connect import DatabricksSession
return DatabricksSession.builder.getOrCreate()
except ImportError:
return SparkSession.builder.getOrCreate()
def get_taxis(spark: SparkSession) -> DataFrame:
return spark.read.table("samples.nyctaxi.trips")
get_taxis(get_spark()).show(5)
RetroSearch is an open source project built by @garambo | Open a GitHub Issue
Search and Browse the WWW like it's 1997 | Search results from DuckDuckGo
HTML:
3.2
| Encoding:
UTF-8
| Version:
0.7.4