This article describes how to compile and deploy Scala jobs as JAR files on a Unity Catalog-enabled cluster in standard access mode. It provides details to ensure that:
Your JDK and Scala versions match the versions on your cluster's Databricks Runtime.
You compile against Databricks Connect rather than OSS Spark.
Conflicting dependencies such as spark-core or hadoop-core are removed.
note
Unity Catalog clusters in standard access mode implement the new Spark Connect architecture, which separates client and server components. This separation allows you to efficiently share clusters while fully enforcing Unity Catalog governance with measures such as row filters and column masks. However, Unity Catalog clusters in standard access mode have some limitations, for example, lack of support for APIs such as SparkContext and RDDs. Limitations are listed in Compute access mode limitations for Unity Catalog.
Step 1: Ensure the Scala and JDK versions match
Before building your JARs, ensure that the versions of the Java Development Kit (JDK) and Scala you use to compile your code match the versions on the Databricks Runtime version running on your cluster. For information about compatible versions, see the version support matrix.
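For illustration, the following build.sbt sketch pins both versions. The specific versions shown (Scala 2.13 and JDK 17) are assumptions; confirm them against the version support matrix for your Databricks Runtime.
// build.sbt -- minimal sketch; the versions below are assumptions and must
// match your cluster's Databricks Runtime.
scalaVersion := "2.13.15"
// Emit bytecode for the JDK version running on the cluster.
javacOptions ++= Seq("--release", "17")
scalacOptions ++= Seq("-release", "17")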
Step 2: Add Databricks Connect as a dependency
Use Databricks Connect, not OSS Spark, to build Scala JARs. The Spark version running on the Databricks Runtime is more recent than the current OSS Spark release and includes performance and stability improvements.
In your Scala project's build file (build.sbt for sbt or pom.xml for Maven), add the following reference to Databricks Connect, and remove any dependency on OSS Spark.
In pom.xml:
<dependency>
  <groupId>com.databricks</groupId>
  <artifactId>databricks-connect</artifactId>
  <version>16.2.0</version>
</dependency>
In build.sbt:
libraryDependencies += "com.databricks" % "databricks-connect" % "16.2.+"
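In your application code, obtain the Spark session through the Databricks Connect session builder rather than creating an OSS SparkSession. The following sketch shows a minimal job entry point; the package and object names are illustrative.
// Minimal entry-point sketch for a JAR compiled against Databricks Connect.
// Package and object names are illustrative.
package com.example.myjob

import com.databricks.connect.DatabricksSession

object MyJob {
  def main(args: Array[String]): Unit = {
    // On a Unity Catalog standard cluster, this resolves to the cluster's
    // Spark Connect session.
    val spark = DatabricksSession.builder().getOrCreate()
    spark.range(10).show()
  }
}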
Step 3: Package as a single JAR and deploy
Databricks recommends packaging your application and all dependencies into a single JAR file, also known as an über or fat JAR. For sbt, use sbt-assembly, and for Maven, use maven-shade-plugin. See the official Maven Shade Plugin and sbt-assembly documentation for details.
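As an illustration, a minimal sbt-assembly setup might look like the following; the plugin version and merge strategy shown are assumptions to adapt to your project.
// project/plugins.sbt -- the plugin version shown is an assumption.
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "2.1.5")
// build.sbt -- a common merge strategy for duplicate files pulled in by dependencies.
assembly / assemblyMergeStrategy := {
  case PathList("META-INF", xs @ _*) => MergeStrategy.discard
  case _                             => MergeStrategy.first
}
Running sbt assembly then produces the fat JAR in the project's target directory.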
Alternatively, you can install dependencies as cluster-scoped libraries. See compute-scoped libraries for more information.
note
For Scala JARs installed as libraries on Unity Catalog standard clusters, classes in the JAR libraries must be in a named package, such as com.databricks.MyClass, or errors will occur when importing the library.
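For example, a library object declared in a named package (the package name below is illustrative) can be imported on the cluster, whereas a class left in the default package cannot.
// Declare library classes inside a named package; classes in the default
// (unnamed) package cannot be imported on Unity Catalog standard clusters.
package com.example.mylib

object Helpers {
  def greet(name: String): String = s"Hello, $name"
}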
Deploy your JAR file using a JAR task. See JAR task for jobs.
Step 4: Ensure your JAR is allowlisted
For security reasons, standard access mode requires an administrator to add Maven coordinates and paths for JAR libraries to an allowlist. See Allowlist libraries and init scripts on compute with standard access mode (formerly shared access mode).