dengwei@cluster-6a52-m:/usr/lib/spark$ bin/spark-shell
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
22/05/10 10:30:08 INFO org.apache.spark.SparkEnv: Registering MapOutputTracker
22/05/10 10:30:08 INFO org.apache.spark.SparkEnv: Registering BlockManagerMaster
22/05/10 10:30:08 INFO org.apache.spark.SparkEnv: Registering BlockManagerMasterHeartbeat
22/05/10 10:30:09 INFO org.apache.spark.SparkEnv: Registering OutputCommitCoordinator
Spark context Web UI available at http://cluster-6a52-m.us-central1-a.c.xed-project-237404.internal:34191
Spark context available as 'sc' (master = yarn, app id = application_1652096761512_0003).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 3.1.2
      /_/
Using Scala version 2.12.14 (OpenJDK 64-Bit Server VM, Java 1.8.0_322)
Type in expressions to have them evaluated.
Type :help for more information.
scala> import org.apache.doris.spark._
import org.apache.doris.spark._
scala> val dorisSparkRDD = sc.dorisRDD(
| tableIdentifier = Some("example_db.ttt"),
| cfg = Some(Map(
| "doris.fenodes" -> "test_host:8031",
| "doris.request.auth.user" -> "test",
| "doris.request.auth.password" -> "test"
| ))
| )
dorisSparkRDD: org.apache.spark.rdd.RDD[AnyRef] = ScalaDorisRDD[0] at RDD at AbstractDorisRDD.scala:32
scala> dorisSparkRDD.collect()
22/05/10 10:30:46 WARN org.apache.spark.scheduler.TaskSetManager: Lost task 0.0 in stage 0.0 (TID 2) (cluster-6a52-w-0.us-central1-a.c.xed-project-237404.internal executor 1): java.lang.ClassNotFoundException: org.apache.doris.spark.rdd.DorisPartition
at org.apache.spark.repl.ExecutorClassLoader.findClass(ExecutorClassLoader.scala:124)
at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
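For context, the stack trace shows the class being resolved by the executor-side class loader (org.apache.spark.repl.ExecutorClassLoader), so a quick check from the same shell (a sketch, not from the original session) can confirm whether the connector classes are visible on the executors at all:

scala> sc.parallelize(Seq(1), 1).map { _ =>
     |   // Resolved with the executor's class loader; this throws the same
     |   // ClassNotFoundException if the connector jar never reached the executor.
     |   Class.forName("org.apache.doris.spark.rdd.DorisPartition").getName
     | }.collect()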
Current environment:
Spark 3.1.2
doris-spark-connector: doris-spark-connector-3.1.2-2.12-1.0.0.jar
OS: Debian 10
Running on Google Cloud Dataproc.
The code above works fine in standalone mode, but fails with this error when run on the cluster (one possible difference is sketched below).
The connector was built with the official Docker build environment (CentOS), tag 1.0.0-rc03.
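In local mode (likely what "standalone" means here) having the connector jar on the driver classpath is enough, while on YARN the executors also need it; a typical way to ship it when starting the shell is sketched below (the jar path is only a placeholder for wherever the jar actually lives):

dengwei@cluster-6a52-m:/usr/lib/spark$ bin/spark-shell \
  --jars /path/to/doris-spark-connector-3.1.2-2.12-1.0.0.jar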
This blog describes a similar case, but the detailed error is different. That post attributed the problem to the connector only supporting Spark 2.x rather than 3.x back in 2021, but the official link here already lists support for 3.2.x.
Thanks in advance for any clue!