How to override Spark jars when running spark-submit in cluster mode? (okhttp3)

eit6fx6z  asked on 2023-06-04 in HDFS

The jars in my project conflict with the jars in the spark-2.4.0 jars folder. My Retrofit dependency brings in okhttp-3.13.1.jar (verified via mvn dependency:tree), but the Spark installation on the server ships okhttp-3.8.1.jar, and I get a NoSuchMethodError. So I am trying to supply my jar explicitly to override it.
When I run spark-submit in client mode, it picks up the jar I provide. But when I run in cluster mode, this fails to override the jar on the worker nodes, and the executors use Spark's older jar, which causes the NoSuchMethodError.
My jar is a fat jar, yet Spark's copy somehow takes precedence over it. Deleting the jar that Spark provides might work, but I cannot, because other services may be using it.
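
Before changing the submit command, it can help to confirm which copy of okhttp actually wins at runtime. A minimal probe sketch (okhttp3.HttpUrl is just a convenient class to inspect; run it on the driver and inside an executor):

import okhttp3.HttpUrl;

public class ClasspathProbe {
    public static void main(String[] args) {
        // Prints the physical jar that okhttp3.HttpUrl was loaded from,
        // e.g. .../spark-2.4.0/jars/okhttp-3.8.1.jar vs. the fat jar.
        System.out.println(
            HttpUrl.class.getProtectionDomain().getCodeSource().getLocation());
    }
}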
Here is my command:

./spark-submit --class com.myJob \
  --conf spark.yarn.appMasterEnv.ENV=uat \
  --conf spark.driver.memory=12g \
  --conf spark.executor.memory=40g \
  --conf spark.sql.warehouse.dir=/user/myuser/spark-warehouse \
  --conf "spark.driver.extraClassPath=/home/test/okhttp-3.13.1.jar" \
  --conf "spark.executor.extraClassPath=/home/test/okhttp-3.13.1.jar" \
  --jars /home/test/okhttp-3.13.1.jar \
  --conf spark.submit.deployMode=cluster \
  --conf spark.yarn.archive=hdfs://namenode/frameworks/spark/spark-2.4.0-archives/spark-2.4.0-archive.zip \
  --conf spark.master=yarn \
  --conf spark.executor.cores=4 \
  --queue public \
  file:///home/mytest/myjar-SNAPSHOT.jar

And the Retrofit code that fails:

import com.fasterxml.jackson.databind.ObjectMapper;
import retrofit2.Retrofit;
import retrofit2.converter.jackson.JacksonConverterFactory;

final Retrofit retrofit = new Retrofit.Builder()
                            .baseUrl(configuration.ApiUrl()) // this line throws the NoSuchMethodError
                            .addConverterFactory(JacksonConverterFactory.create(new ObjectMapper()))
                            .build();

My mvn dependency:tree does not show any other transitive copy of okhttp in my jar. Everything runs fine in IntelliJ and with mvn clean install.
I even tried providing the HDFS path of the jar (hdfs://users/myuser/myjars/okhttp-3.13.1.jar), but it did not work. Can anyone shed some light on this?
If I also add --conf "spark.driver.userClassPathFirst=true" --conf "spark.executor.userClassPathFirst=true", I get the following exception:

Exception in thread "main" java.lang.ExceptionInInitializerError
    at org.apache.spark.deploy.yarn.YarnSparkHadoopUtil$.<init>(YarnSparkHadoopUtil.scala:48)
    at org.apache.spark.deploy.yarn.YarnSparkHadoopUtil$.<clinit>(YarnSparkHadoopUtil.scala)
    at org.apache.spark.deploy.yarn.Client$$anonfun$1.apply$mcJ$sp(Client.scala:81)
    at org.apache.spark.deploy.yarn.Client$$anonfun$1.apply(Client.scala:81)
    at org.apache.spark.deploy.yarn.Client$$anonfun$1.apply(Client.scala:81)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.deploy.yarn.Client.<init>(Client.scala:80)
    at org.apache.spark.deploy.yarn.YarnClusterApplication.start(Client.scala:1526)
    at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:849)
    at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:167)
    at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:195)
    at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
    at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:924)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:933)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassCastException: org.apache.hadoop.yarn.api.records.impl.pb.PriorityPBImpl cannot be cast to org.apache.hadoop.yarn.api.records.Priority
    at org.apache.hadoop.yarn.api.records.Priority.newInstance(Priority.java:39)
    at org.apache.hadoop.yarn.api.records.Priority.<clinit>(Priority.java:34)
    ... 15 more

But with only --conf "spark.executor.userClassPathFirst=true", the job simply hangs. The ClassCastException above is the classic symptom of one class being visible through two classloaders: with userClassPathFirst, YARN's Priority and PriorityPBImpl end up loaded by different loaders, and the cast fails even though the class names match.
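
A self-contained sketch of that failure mode (the jar path is only illustrative): loading the same class through two unrelated classloaders yields two distinct Class objects, so instances of one are never castable to the other.

import java.net.URL;
import java.net.URLClassLoader;

public class LoaderConflictDemo {
    public static void main(String[] args) throws Exception {
        URL jar = new URL("file:///home/test/okhttp-3.13.1.jar"); // illustrative path
        // Two independent loaders, neither delegating to the other.
        try (URLClassLoader a = new URLClassLoader(new URL[]{jar}, null);
             URLClassLoader b = new URLClassLoader(new URL[]{jar}, null)) {
            Class<?> c1 = a.loadClass("okhttp3.HttpUrl");
            Class<?> c2 = b.loadClass(c1.getName());
            // Same fully-qualified name, different Class objects:
            System.out.println(c1 == c2); // prints false
        }
    }
}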

axkjgtzd1#

I solved this problem using the maven-shade-plugin.
Ignore Spark Cluster Own Jars
Reference video:
https://youtu.be/WyfHUNnMutg?t=23m1s
I followed the answer given there and added the configuration below. You can also see in the SparkSubmit source code that when you pass --jars, the jars are appended to the total jar list; they never override the existing ones, they are only added alongside them:
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala#L644

<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <relocations>
          <relocation>
            <pattern>okio</pattern>
            <shadedPattern>com.shaded.okio</shadedPattern>
          </relocation>
          <relocation>
            <pattern>okhttp3</pattern>
            <shadedPattern>com.shaded.okhttp3</shadedPattern>
          </relocation>
        </relocations>
        <filters>
          <filter>
            <artifact>*:*</artifact>
            <excludes>
              <exclude>META-INF/*.SF</exclude>
              <exclude>META-INF/*.DSA</exclude>
              <exclude>META-INF/*.RSA</exclude>
              <exclude>log4j.properties</exclude>
            </excludes>
          </filter>
        </filters>
      </configuration>
    </execution>
  </executions>
</plugin>
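
Relocation rewrites the bytecode in the fat jar so that every reference to okhttp3.* and okio.* points at com.shaded.okhttp3.* and com.shaded.okio.* instead; the executors then use the bundled copy regardless of which okhttp sits on Spark's classpath. A minimal sketch to verify the relocation took effect at runtime (the class name assumes the shadedPattern above):

public class ShadeCheck {
    public static void main(String[] args) throws Exception {
        // Resolves only if the shade plugin actually relocated okhttp3;
        // prints which jar the relocated class was loaded from.
        Class<?> relocated = Class.forName("com.shaded.okhttp3.OkHttpClient");
        System.out.println(relocated.getProtectionDomain().getCodeSource().getLocation());
    }
}

Running it with the fat jar on the classpath (for example, java -cp myjar-SNAPSHOT.jar ShadeCheck) should print the path of the fat jar itself.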
