After compiling the code with sbt package, I submit it to Spark with:
sudo -u spark spark-submit --master yarn --deploy-mode client --executor-memory 2G --num-executors 6 --class viterbiAlgorithm.viterbiAlgo ./target/scala-2.11/vibertialgo_2.11-1.3.4.jar
I get this error:
Exception in thread "main" java.lang.NoSuchMethodError: breeze.linalg.DenseVector$.tabulate$mDc$sp(ILscala/Function1;Lscala/reflect/ClassTag;)Lbreeze/linalg/DenseVector;
at viterbiAlgorithm.User$$anonfun$eval$2.apply(viterbiAlgo.scala:84)
at viterbiAlgorithm.User$$anonfun$eval$2.apply(viterbiAlgo.scala:80)
at scala.collection.immutable.Range.foreach(Range.scala:160)
at viterbiAlgorithm.User.eval(viterbiAlgo.scala:80)
at viterbiAlgorithm.viterbiAlgo$.main(viterbiAlgo.scala:28)
at viterbiAlgorithm.viterbiAlgo.main(viterbiAlgo.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:851)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:167)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:195)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:926)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:935)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
The sbt build file is as follows:
name := "vibertiAlgo"
version := "1.3.4"
scalaVersion := "2.11.2"
libraryDependencies ++= Seq(
  "org.scalanlp" %% "breeze" % "1.0",
  "org.apache.spark" %% "spark-core" % "2.4.0",
  "org.apache.spark" %% "spark-sql" % "2.4.0")
I can run the code successfully on my local machine with sbt run, so I don't know what is wrong with my code. Also, the compile-time and runtime versions of Scala and Spark are the same.
The code of viterbiAlgo.scala is:
package viterbiAlgorithm

import breeze.linalg._
// import org.apache.spark.sql.SparkSession

object viterbiAlgo {
  def main(arg: Array[String]) {
    val A = DenseMatrix((0.5,0.2,0.3),
                        (0.3,0.5,0.2),
                        (0.2,0.3,0.5))
    val B = DenseMatrix((0.5,0.5),
                        (0.4,0.6),
                        (0.7,0.3))
    val pi = DenseVector(0.2,0.4,0.4)
    val o = DenseVector[Int](0,1,0) // Hive time + cell_id
    val model = new Model(A,B,pi)
    val user = new User("Jack", model, o) // Hive
    user.eval() // run algorithm
    user.printResult()

    // spark sql
    // val warehouseLocation = "spark-warehouse"
    // val spark = SparkSession.builder().appName("Spark.sql.warehouse.dir").config("spark.sql.warehouse.dir", warehouseLocation).enableHiveSupport().getOrCreate()
    // import spark.implicits._
    // import spark.sql
    // val usr = "1"
    // val model = new Model(A,B,pi)
    // val get_statement = "SELECT * FROM viterbi.observation"
    // val df = sql(get_statement)
    // val o = DenseVector(df.filter(df("usr")===usr).select(df("obs")).collect().map(_.getInt(0)))
    // val user = new User(usr, model, o)
    // user.eval()
    // user.printResult()
  }
}
class Model (val A: DenseMatrix[Double], val B: DenseMatrix[Double], val pi: DenseVector[Double]) {
  def info(): Unit = {
    println("The model is:")
    println("A:")
    println(A)
    println("B:")
    println(B)
    println("Pi:")
    println(pi)
  }
}
class User (val usr_name: String, val model: Model, val o: DenseVector[Int]) {
  val N = model.A.rows // state number
  val M = model.B.cols // observation state
  val T = o.length     // time
  val delta = DenseMatrix.zeros[Double](N,T)
  val psi = DenseMatrix.zeros[Int](N,T)
  val best_route = DenseVector.zeros[Int](T)

  def eval(): Unit = {
    // 1. Initialization
    delta(::,0) := model.pi * model.B(::, o(0))
    psi(::,0) := DenseVector.zeros[Int](N)

    // 2. Induction
    val tempDelta = DenseMatrix.zeros[Double](N,N) // Initialization
    val tempB = DenseMatrix.zeros[Double](N,N)     // Initialization
    for (t <- 1 to T-1) {
      // Delta
      tempDelta := DenseMatrix.tabulate(N, N){case (i, j) => delta(i,t-1)}
      tempB := DenseMatrix.tabulate(N, N){case (i, j) => model.B(j, o(t))}
      delta(::, t) := DenseVector.tabulate(N){i => max((tempDelta *:* model.A *:* tempB).t.t(::,i))}
    }

    // 3. Maximum
    val P_star = max(delta(::, T-1))
    val i_star_T = argmax(delta(::, T-1))
    best_route(T-1) = i_star_T

    // 4. Backward
    for (t <- T-2 to 0 by -1) {
      best_route(t) = psi(best_route(t+1),t+1)
    }
  }

  def printResult(): Unit = {
    println("User: " + usr_name)
    model.info()
    println
    println("Observed: ")
    printRoute(o)
    println("Best_route is: ")
    printRoute(best_route)
    println("delta is")
    println(delta)
    println("psi is: ")
    println(psi)
  }

  def printRoute(v: DenseVector[Int]): Unit = {
    for (i <- v(0 to -2)) {
      print(i + "->")
    }
    println(v(-1))
  }
}
I have also tried the --jars option, passing the location of the breeze library, but I got the same error.
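That attempt was presumably something like the following; the path to the breeze jar here is hypothetical:
sudo -u spark spark-submit --master yarn --deploy-mode client \
  --executor-memory 2G --num-executors 6 \
  --jars /path/to/breeze_2.11-1.0.jar \
  --class viterbiAlgorithm.viterbiAlgo ./target/scala-2.11/vibertialgo_2.11-1.3.4.jar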
I should mention that I tested the code "locally" on the server and tried all of the methods in spark-shell (I can import the breeze library in spark-shell on the server).
The Scala version on the server matches the one in the sbt build file. The Spark version there is 2.4.0-cdh6.2.1, however, and if I add "cdh6.2.1" after "2.4.0" in the build file, sbt will not compile.
I tried both of the possible solutions Viktor suggested, but neither worked. However, when I changed the breeze version in the sbt build file from 1.0 to 0.13.2, everything works. I still don't know what went wrong, though.
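For reference, the change that made the submitted job work is just the version bump described above; a sketch, with the rest of the build file unchanged:
libraryDependencies ++= Seq(
  "org.scalanlp" %% "breeze" % "0.13.2", // was "1.0"
  "org.apache.spark" %% "spark-core" % "2.4.0",
  "org.apache.spark" %% "spark-sql" % "2.4.0")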
1 Answer
If you can run the code locally but not on the server, it means you are not providing the libraries on the classpath of the submitted job.
You have two options:
1. Use the --jars option and pass the location of all the required libraries (in your case this appears to be the breeze library).
2. Use the sbt assembly plugin, which produces a fat jar containing all the required dependencies, and then submit that jar for the job (see the sketch after this list).
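A minimal sketch of the sbt assembly approach, assuming sbt-assembly 0.14.x; the plugin version and jar names below are illustrative, not taken from the question. Spark is marked "provided" so that only breeze (and its transitive dependencies) end up in the fat jar, since Spark itself is already on the cluster:

// project/plugins.sbt
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.10")

// build.sbt: keep breeze in the fat jar, mark Spark as provided
libraryDependencies ++= Seq(
  "org.scalanlp" %% "breeze" % "1.0",
  "org.apache.spark" %% "spark-core" % "2.4.0" % "provided",
  "org.apache.spark" %% "spark-sql" % "2.4.0" % "provided")

Then build with sbt assembly and submit the resulting jar (by default it is named <project>-assembly-<version>.jar):

sbt assembly
sudo -u spark spark-submit --master yarn --deploy-mode client \
  --executor-memory 2G --num-executors 6 \
  --class viterbiAlgorithm.viterbiAlgo \
  ./target/scala-2.11/vibertiAlgo-assembly-1.3.4.jar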