Scala: sbt throws java.lang.NullPointerException when trying to run Spark

hts6caw3  posted on 2022-12-13 in Scala

I'm trying to compile Spark with sbt 1.7.2 on a Linux machine running CentOS 6.
When I run the clean command: ./build/sbt clean
I get the following output:

java.lang.NullPointerException
    at sun.net.util.URLUtil.urlNoFragString(URLUtil.java:50)
    at sun.misc.URLClassPath.getLoader(URLClassPath.java:526)
    at sun.misc.URLClassPath.getNextLoader(URLClassPath.java:498)
    at sun.misc.URLClassPath.getResource(URLClassPath.java:252)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:363)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:362)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:419)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:406)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:406)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:406)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:406)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:352)
    at sbt.internal.XMainConfiguration.run(XMainConfiguration.java:51)
    at sbt.xMain.run(Main.scala:46)
    at xsbt.boot.Launch$.$anonfun$run$1(Launch.scala:149)
    at xsbt.boot.Launch$.withContextLoader(Launch.scala:176)
    at xsbt.boot.Launch$.run(Launch.scala:149)
    at xsbt.boot.Launch$.$anonfun$apply$1(Launch.scala:44)
    at xsbt.boot.Launch$.launch(Launch.scala:159)
    at xsbt.boot.Launch$.apply(Launch.scala:44)
    at xsbt.boot.Launch$.apply(Launch.scala:21)
    at xsbt.boot.Boot$.runImpl(Boot.scala:78)
    at xsbt.boot.Boot$.run(Boot.scala:73)
    at xsbt.boot.Boot$.main(Boot.scala:21)
    at xsbt.boot.Boot.main(Boot.scala)
[error] [launcher] error during sbt launcher: java.lang.NullPointerException

The same thing happens with sbt 1.7.3, but with sbt 1.6.2 Spark cleans and compiles successfully.
What should I check first? I'd really appreciate any advice.

hmae6n7t1#

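A few pointers on debugging Spark and sbt.

How to build Spark in IntelliJ.

Clone https://github.com/apache/spark and open it in IntelliJ as an sbt project.
I had to run sbt compile and re-open the project before I could run my code in IntelliJ (before that I got the error object SqlBaseParser is not a member of package org.apache.spark.sql.catalyst.parser).
For example, an object like the following (placed in the sql module, since it is later run as sql/runMain MyMain) can then be run from IntelliJ: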

// scalastyle:off
import org.apache.spark.sql.{Dataset, SparkSession}

object MyMain extends App {
  val spark = SparkSession.builder()
    .master("local")
    .appName("SparkTestApp")
    .getOrCreate()

  case class Person(id: Long, name: String)

  import spark.implicits._

  val df: Dataset[Person] = spark.range(10).map(i => Person(i, i.toString))

  df.show()

//+---+----+
//| id|name|
//+---+----+
//|  0|   0|
//|  1|   1|
//|  2|   2|
//|  3|   3|
//|  4|   4|
//|  5|   5|
//|  6|   6|
//|  7|   7|
//|  8|   8|
//|  9|   9|
//+---+----+

}

I also clicked "Run npm install" and "Load Maven project" when those pop-ups appeared, but I didn't notice any difference.
Also, at one point I had to keep only one source root under sql/catalyst/target/scala-2.12/src_managed in Project Structure: sql/catalyst/target/scala-2.12/src_managed/main (and not sql/catalyst/target/scala-2.12/src_managed/main/antlr4).
Build Apache Spark Source Code with IntelliJ IDEA: https://yujheli-wordpress-com.translate.goog/2020/03/26/build-apache-spark-source-code-with-intellij-idea/?_x_tr_sl=auto&_x_tr_tl=en&_x_tr_hl=uk&_x_tr_pto=wapp (original in Chinese: https://yujheli.wordpress.com/2020/03/26/build-apache-spark-source-code-with-intellij-idea/)
Why does building Spark sources give "object sbt is not a member of package com.typesafe"?

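How to build sbt in IntelliJ.

sbt itself is tricky (https://www.lihaoyi.com/post/SowhatswrongwithSBT.html), and building it is a little tricky too.
Clone https://github.com/sbt/sbt and open it in IntelliJ. Let's try to run the Spark code above using this cloned sbt.
sbt doesn't seem to be designed to be run against a specified directory, so the object below sets the working directory by hand.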

object MyClient extends App {
  System.setProperty("user.dir", "../spark")
  sbt.client.Client.main(Array("sql/runMain MyMain"))
}

(Generally, mutating the system property user.dir is not recommended; see "How to use "cd" command using Java runtime?")
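
Since mutating user.dir is discouraged, one alternative is to resolve paths against an explicit base directory and to hand a child process its working directory through ProcessBuilder. The following is only a minimal sketch of that idea, assuming Unix-style paths; the object WorkingDir and its method names are made up for illustration and are not part of sbt.

```scala
import java.io.File
import java.nio.file.{Path, Paths}

// Hypothetical helper: work relative to a base directory without touching user.dir
object WorkingDir {
  // Resolve a relative path against an explicit base directory
  def resolveIn(base: String, relative: String): Path =
    Paths.get(base).resolve(relative).normalize()

  // Prepare (but do not start) a child process that will run inside `base`
  def processIn(base: String, command: String*): ProcessBuilder =
    new ProcessBuilder(command: _*).directory(new File(base))
}
```

For instance, WorkingDir.processIn("../spark", "./build/sbt", "clean") describes a process that would run in the Spark checkout while the current JVM's own working directory stays unchanged.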
I had to run sbt compile first (this includes the command sbt generateContrabands; sbt uses the sbt plugin sbt-contraband (ContrabandPlugin, JsonCodecPlugin), formerly sbt-datatype, for code generation: https://github.com/sbt/contraband https://www.scala-sbt.org/contraband/ https://www.scala-sbt.org/1.x/docs/Datatype.html https://github.com/eed3si9n/gigahorse/tree/develop/core/src/main/contraband). Before that I got the error not found: value ScalaKeywords.
The next error is type ExcludeItem is not a member of package sbt.internal.bsp. You can simply delete the files ExcludeItemFormats.scala, ExcludesItemFormats.scala, ExcludesParamsFormats.scala, and ExcludesResultFormats.scala in protocol/src/main/contraband-scala/sbt/internal/bsp/codec. They are outdated auto-generated files. You can check this: if you delete the contents of the directory protocol/src/main/contraband-scala (the root for auto-generated sources) and run sbt generateContrabands, all files except these four are restored. For some reason these files don't confuse sbt but do confuse IntelliJ.
Now, when run, MyClient produces

//[info] +---+----+
//[info] | id|name|
//[info] +---+----+
//[info] |  0|   0|
//[info] |  1|   1|
//[info] |  2|   2|
//[info] |  3|   3|
//[info] |  4|   4|
//[info] |  5|   5|
//[info] |  6|   6|
//[info] |  7|   7|
//[info] |  8|   8|
//[info] |  9|   9|
//[info] +---+----+

sbt.client.Client is called the thin client. It can also be published locally and used as a dependency:

  • build.sbt
lazy val sbtClientProj = (project in file("client"))
  .enablePlugins(NativeImagePlugin)
  .dependsOn(commandProj)
  .settings(
    commonBaseSettings,
    scalaVersion := "2.12.11",
    publish / skip := false, // change true to false
    name := "sbt-client",
    .......

sbt publishLocal
A new project:

  • build.sbt
scalaVersion := "2.12.17"

// ~/.ivy2/local/org.scala-sbt/sbt-client/1.8.1-SNAPSHOT/jars/sbt-client.jar
libraryDependencies += "org.scala-sbt" % "sbt-client" % "1.8.1-SNAPSHOT"
  • src/main/scala/Main.scala
object Main extends App {
  System.setProperty("user.dir", "../spark")
  sbt.client.Client.main(Array("sql/runMain MyMain"))
  
  //[info] +---+----+
  //[info] | id|name|
  //[info] +---+----+
  //[info] |  0|   0|
  //[info] |  1|   1|
  //[info] |  2|   2|
  //[info] |  3|   3|
  //[info] |  4|   4|
  //[info] |  5|   5|
  //[info] |  6|   6|
  //[info] |  7|   7|
  //[info] |  8|   8|
  //[info] |  9|   9|
  //[info] +---+----+
}

But the thin client is not how sbt normally runs. sbt.xMain from your stack trace is from https://github.com/sbt/sbt. It's here: https://github.com/sbt/sbt/blob/1.8.x/main/src/main/scala/sbt/Main.scala#L44. But xsbt.boot.Boot from the stack trace is not from this repo; it's from https://github.com/sbt/launcher, namely https://github.com/sbt/launcher/blob/1.x/launcher-implementation/src/main/scala/xsbt/boot/Boot.scala
sbt runs in two steps. The sbt executable (usually downloaded from https://www.scala-sbt.org/download.html#universal-packages) is a shell script. First, it runs sbt-launch.jar (object xsbt.boot.Boot):
https://github.com/sbt/sbt/blob/v1.8.0/sbt#L507-L512

execRunner "$java_cmd" \
  "${java_args[@]}" \
  "${sbt_options[@]}" \
  -jar "$sbt_jar" \
  "${sbt_commands[@]}" \
  "${residual_args[@]}"

Second, the latter reflectively calls sbt (class sbt.xMain):
https://github.com/sbt/launcher/blob/v1.4.1/launcher-implementation/src/main/scala/xsbt/boot/Launch.scala#L147-L149

val main = appProvider.newMain()
try {
  withContextLoader(appProvider.loader)(main.run(appConfig))

https://github.com/sbt/launcher/blob/v1.4.1/launcher-implementation/src/main/scala/xsbt/boot/Launch.scala#L496

// implementation of the above appProvider.newMain()
else if (AppMainClass.isAssignableFrom(entryPoint)) mainClass.newInstance

https://github.com/sbt/launcher/blob/v1.4.1/launcher-implementation/src/main/scala/xsbt/boot/PlainApplication.scala#L13

// implementation of the above main.run(appConfig)
mainMethod.invoke(null, configuration.arguments).asInstanceOf[xsbti.Exit]

Then xMain#run (the method of the class), via XMainConfiguration#run, reflectively calls xMain.run (the method of the companion object):
https://github.com/sbt/sbt/blob/v1.8.0/main/src/main/scala/sbt/Main.scala#L44-L47

class xMain extends xsbti.AppMain {
  def run(configuration: xsbti.AppConfiguration): xsbti.MainResult =
    new XMainConfiguration().run("xMain", configuration)
}

https://github.com/sbt/sbt/blob/v1.8.0/main/src/main/java/sbt/internal/XMainConfiguration.java#L51-L57

Class<?> clazz = loader.loadClass("sbt." + moduleName + "$");
Object instance = clazz.getField("MODULE$").get(null);
Method runMethod = clazz.getMethod("run", xsbti.AppConfiguration.class);
try {
  .....
  return (xsbti.MainResult) runMethod.invoke(instance, updatedConfiguration);
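
The MODULE$ lookup in this snippet is the usual way to reach a Scala object through Java reflection: the object compiles to a class whose name ends in "$" and whose singleton instance sits in the static field MODULE$. Here is a minimal, self-contained sketch of the same pattern; Greeter and ReflectiveCall are hypothetical names standing in for sbt.xMain and XMainConfiguration, and the sketch assumes Greeter is compiled as a top-level object.

```scala
// Hypothetical stand-in for a Scala object such as sbt.xMain
object Greeter {
  def run(name: String): String = s"hello, $name"
}

object ReflectiveCall {
  // className is the JVM name of the module class, i.e. the object's name followed by "$"
  def callRun(className: String, arg: String): String = {
    val loader = Greeter.getClass.getClassLoader
    val clazz = loader.loadClass(className)
    // the singleton instance of a Scala object lives in the static field MODULE$
    val instance = clazz.getField("MODULE$").get(null)
    val runMethod = clazz.getMethod("run", classOf[String])
    runMethod.invoke(instance, arg).asInstanceOf[String]
  }
}
```

ReflectiveCall.callRun(Greeter.getClass.getName, "sbt") goes through the same steps as the Java code: load the class, read MODULE$, look up run, invoke it. Note that loadClass is also the call at the top of the sbt frames in the stack trace from the question.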

After that, it downloads and runs the necessary version of Scala (specified in build.sbt) and the necessary version of the rest of sbt (specified in project/build.properties).
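
Because the sbt version comes from project/build.properties, that file is the first thing to check when one sbt version works and another does not. Below is a small sketch of reading the version from the file's contents; BuildProps is a hypothetical helper written for this answer, not sbt's own code, and it only assumes the standard sbt.version key.

```scala
import java.io.StringReader
import java.util.Properties

// Hypothetical helper: extract sbt.version from the text of project/build.properties
object BuildProps {
  def sbtVersion(contents: String): Option[String] = {
    val props = new Properties()
    props.load(new StringReader(contents))
    // getProperty returns null when the key is absent, so wrap it in Option
    Option(props.getProperty("sbt.version"))
  }
}
```

With the contents "sbt.version=1.6.2" this yields Some("1.6.2"); a file without the key yields None.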

What the launcher is.

Let's consider a hello-world for the launcher.
The launcher consists of a library (interfaces)
https://mvnrepository.com/artifact/org.scala-sbt/launcher-interface
https://github.com/sbt/launcher/tree/1.x/launcher-interface
and the launcher's runnable jar
https://mvnrepository.com/artifact/org.scala-sbt/launcher
https://github.com/sbt/launcher/tree/1.x/launcher-implementation/src
Create a project (depending on the launcher interfaces at compile time)

  • build.sbt
lazy val root = (project in file("."))
  .settings(
    name := "scalademo",
    organization := "com.example",
    version := "0.1.0-SNAPSHOT",
    scalaVersion := "2.13.10",
    libraryDependencies ++= Seq(
      "org.scala-sbt" % "launcher-interface" % "1.4.1" % Provided,
    ),
  )
  • src/main/scala/mypackage/Main.scala (this class will be the entry point when using the launcher)
package mypackage

import xsbti.{AppConfiguration, AppMain, Exit, MainResult}

class Main extends AppMain {
  def run(configuration: AppConfiguration): MainResult = {
    val scalaVersion = configuration.provider.scalaProvider.version

    println(s"Hello, World! Running Scala $scalaVersion")
    configuration.arguments.foreach(println)

    new Exit {
      override val code: Int = 0
    }
  }
}

Run sbt publishLocal. The project jar will be published at ~/.ivy2/local/com.example/scalademo_2.13/0.1.0-SNAPSHOT/jars/scalademo_2.13.jar
Download the launcher's runnable jar: https://repo1.maven.org/maven2/org/scala-sbt/launcher/1.4.1/launcher-1.4.1.jar
Create the launcher configuration

  • my.app.configuration
[scala]
  version: 2.13.10
[app]
  org: com.example
  name: scalademo
  version: 0.1.0-SNAPSHOT
  class: mypackage.Main
  cross-versioned: binary
[repositories]
  local
  maven-central
[boot]
  directory: ${user.home}/.myapp/boot

Then the command java -jar launcher-1.4.1.jar @my.app.configuration a b c produces

//Hello, World! Running Scala 2.13.10
//a
//b
//c

and the following files appear

~/.myapp/boot/scala-2.13.10/com.example/scalademo/0.1.0-SNAPSHOT
  scalademo_2.13.jar
  scala-library-2.13.10.jar
~/.myapp/boot/scala-2.13.10/lib
  java-diff-utils-4.12.jar
  jna-5.9.0.jar
  jline-3.21.0.jar
  scala-library.jar
  scala-compiler.jar
  scala-reflect.jar

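So the launcher helps to run an application in environments where only Java is installed (Scala is not necessary); Ivy dependency resolution is used. There are features to handle return codes, reboot the application with a different Scala version, launch servers, etc.
Alternatively, either of the following commands can be used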

java -Dsbt.boot.properties=my.app.configuration -jar launcher-1.4.1.jar
java -jar launcher-repacked.jar      # put my.app.configuration to sbt/sbt.boot.properties/ and repack the jar


https://www.scala-sbt.org/1.x/docs/Launcher-Getting-Started.html

How to run sbt with the launcher.

sbt (https://github.com/sbt/sbt) uses the sbt plugin SbtLauncherPlugin (https://github.com/sbt/sbt/blob/v1.8.0/project/SbtLauncherPlugin.scala) so that from the raw launcher, launcher
https://github.com/sbt/launcher/tree/1.x/launcher-implementation/src
https://mvnrepository.com/artifact/org.scala-sbt/launcher
it builds sbt-launch
https://github.com/sbt/sbt/tree/v1.8.0/launch
https://mvnrepository.com/artifact/org.scala-sbt/sbt-launch
Basically, sbt-launch differs from launcher in having the default config sbt.boot.properties injected.
If we'd like to run sbt with the launcher, we need a way to specify a working directory for sbt (similarly to what we did with the thin client).
The working directory can be set either 1) in sbt.xMain (sbt) or 2) in xsbt.boot.Boot (sbt-launcher).

1) Make sbt.xMain non-final so that it can be extended

/*final*/ class xMain extends xsbti.AppMain { 
...........

https://github.com/sbt/sbt/blob/v1.8.0/main/src/main/scala/sbt/Main.scala#L44
Put a new class into main/src/main/scala (a launcher-style entry point)

import sbt.xMain
import xsbti.{ AppConfiguration, AppProvider, MainResult }
import java.io.File

class MyXMain extends xMain {
  override def run(configuration: AppConfiguration): MainResult = {
    val args = configuration.arguments

    val (dir, rest) =
      if (args.length >= 1 && args(0).startsWith("dir=")) {
        (
          Some(args(0).stripPrefix("dir=")),
          args.drop(1)
        )
      } else {
        (None, args)
      }

    dir.foreach { dir =>
      System.setProperty("user.dir", dir)
    }

    // xMain.run(new AppConfiguration { // not ok
    // new xMain().run(new AppConfiguration { // not ok
    super[xMain].run(new AppConfiguration { // ok
      override val arguments: Array[String] = rest
      override val baseDirectory: File =
        dir.map(new File(_)).getOrElse(configuration.baseDirectory)
      override val provider: AppProvider = configuration.provider
    })
  }
}

sbt publishLocal

  • my.sbt.configuration
[scala]
  version: auto
  #version: 2.12.17
[app]
  org: org.scala-sbt
  name: sbt
  #name: main  # not ok
  version: 1.8.1-SNAPSHOT
  class: MyXMain
  #class: sbt.xMain
  components: xsbti,extra
  cross-versioned: false
  #cross-versioned: binary
[repositories]
  local
  maven-central
[boot]
  directory: ${user.home}/.mysbt/boot
[ivy]
  ivy-home: ${user.home}/.ivy2

Command:
java -jar launcher-1.4.1.jar @my.sbt.configuration dir=/path_to_spark/spark "sql/runMain MyMain"

Output:

//[info] +---+----+
//[info] | id|name|
//[info] +---+----+
//[info] |  0|   0|
//[info] |  1|   1|
//[info] |  2|   2|
//[info] |  3|   3|
//[info] |  4|   4|
//[info] |  5|   5|
//[info] |  6|   6|
//[info] |  7|   7|
//[info] |  8|   8|
//[info] |  9|   9|
//[info] +---+----+

(sbt-launch.jar can be taken from ~/.ivy2/local/org.scala-sbt/sbt-launch/1.8.1-SNAPSHOT/jars, or simply from https://mvnrepository.com/artifact/org.scala-sbt/sbt-launch, since we haven't modified the launcher yet.)
I had to copy scalastyle-config.xml from spark, otherwise it wasn't found.
I still get the warning fatal: Not a git repository (or any parent up to mount parent ...) Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).

2) Alternatively, set the working directory in the launcher.

  • project/Dependencies.scala (https://github.com/sbt/sbt/blob/v1.8.0/project/Dependencies.scala)

val launcherVersion = "1.4.2-SNAPSHOT" // modified

Clone https://github.com/sbt/launcher and make the following changes

  • build.sbt (https://github.com/sbt/launcher/blob/v1.4.1/build.sbt#L11)
ThisBuild / version := {
  val orig = (ThisBuild / version).value
  if (orig.endsWith("-SNAPSHOT")) "1.4.2-SNAPSHOT" // modified
  else orig
}

  • launcher-implementation/src/main/scala/xsbt/boot/Launch.scala (https://github.com/sbt/launcher/blob/v1.4.1/launcher-implementation/src/main/scala/xsbt/boot/Launch.scala)

class LauncherArguments(
    val args: List[String],
    val isLocate: Boolean,
    val isExportRt: Boolean,
    val dir: Option[String] = None // added
)

object Launch {
  def apply(arguments: LauncherArguments): Option[Int] =
    apply((new File(arguments.dir.getOrElse(""))).getAbsoluteFile, arguments) // modified

  .............

  • launcher-implementation/src/main/scala/xsbt/boot/Boot.scala

def parseArgs(args: Array[String]): LauncherArguments = {
    @annotation.tailrec
    def parse(
        args: List[String],
        isLocate: Boolean,
        isExportRt: Boolean,
        remaining: List[String],
        dir: Option[String] // added
    ): LauncherArguments =
      args match {
        ...................
        case "--locate" :: rest        => parse(rest, true, isExportRt, remaining, dir) // modified
        case "--export-rt" :: rest     => parse(rest, isLocate, true, remaining, dir) // modified
          // added
        case "--mydir" :: next :: rest => parse(rest, isLocate, isExportRt, remaining, Some(next))
        case next :: rest              => parse(rest, isLocate, isExportRt, next :: remaining, dir) // modified
        case Nil                       => new LauncherArguments(remaining.reverse, isLocate, isExportRt, dir) // modified
      }
    parse(args.toList, false, false, Nil, None)
  }
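
To see what the added --mydir case does without building the launcher, the parsing logic can be tried on its own. The sketch below is a simplified stand-in, not the launcher's actual code: ParsedArgs replaces LauncherArguments, ArgParser is a made-up name, and the cases elided above are left out.

```scala
// Simplified stand-in for the launcher's LauncherArguments (hypothetical name)
final case class ParsedArgs(
    args: List[String],
    isLocate: Boolean,
    isExportRt: Boolean,
    dir: Option[String]
)

object ArgParser {
  def parseArgs(args: Array[String]): ParsedArgs = {
    @annotation.tailrec
    def parse(
        rest: List[String],
        isLocate: Boolean,
        isExportRt: Boolean,
        remaining: List[String],
        dir: Option[String]
    ): ParsedArgs =
      rest match {
        case "--locate" :: tail        => parse(tail, true, isExportRt, remaining, dir)
        case "--export-rt" :: tail     => parse(tail, isLocate, true, remaining, dir)
        // consume the flag together with its value
        case "--mydir" :: next :: tail => parse(tail, isLocate, isExportRt, remaining, Some(next))
        // anything else is passed through to the application in its original order
        case next :: tail              => parse(tail, isLocate, isExportRt, next :: remaining, dir)
        case Nil                       => ParsedArgs(remaining.reverse, isLocate, isExportRt, dir)
      }
    parse(args.toList, false, false, Nil, None)
  }
}
```

The flag and its value are removed from the argument list, so the application still receives only the configuration reference and the sbt command. A trailing --mydir with no value falls through to the generic case and is passed on as an ordinary argument.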

Publish the modified sbt-launcher and then sbt locally (sbt publishLocal in each).

  • my.sbt.configuration
[scala]
  version: auto
[app]
  org: org.scala-sbt
  name: sbt
  version: 1.8.1-SNAPSHOT
  #class: MyXMain
  class: sbt.xMain
  components: xsbti,extra
  cross-versioned: false
[repositories]
  local
  maven-central
[boot]
  directory: ${user.home}/.mysbt/boot
[ivy]
  ivy-home: ${user.home}/.ivy2

Command:
java -jar launcher-1.4.2-SNAPSHOT.jar @my.sbt.configuration --mydir /path_to_spark/spark "sql/runMain MyMain"

(Here we're using either the modified launcher or a new sbt-launch built with this modified launcher.)
Alternatively, in IntelliJ we can specify the following "Program arguments" in the "Run configuration" for xsbt.boot.Boot:
@/path_to_sbt_config/my.sbt.configuration --mydir /path_to_spark/spark "sql/runMain MyMain"
It's also possible to specify the working directory /path_to_spark/spark in the "Run configuration" in IntelliJ.
I tried to use "org.scala-sbt" % "launcher" % "1.4.2-SNAPSHOT" or "org.scala-sbt" % "sbt-launch" % "1.8.1-SNAPSHOT" as a dependency, but got No RuntimeVisibleAnnotations in classfile with ScalaSignature attribute: class Boot.

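Your setting.

So we can run/debug sbt-launcher code in IntelliJ and/or with printlns, and run/debug sbt code with printlns (because there is no runnable object).
From your stack trace, I suspect that one of the classloader URLs is null
https://github.com/openjdk/jdk/blob/jdk8-b120/jdk/src/share/classes/sun/misc/URLClassPath.java#L82
Maybe you can add something like the following to sbt.xMain#run or MyXMain#run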

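import java.net.URLClassLoader // needed for the pattern match below

// walk up the classloader chain, printing the URLs of every URLClassLoader
var cl = getClass.getClassLoader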
while (cl != null) {
  println(s"classloader: ${cl.getClass.getName}")
  cl match {
    case cl: URLClassLoader =>
      println("classloader urls:")
      cl.getURLs.foreach(println)
    case _ =>
      println("not URLClassLoader")
  }
  cl = cl.getParent
}


in order to see which URL is null.
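
If the printed list is long, it may be easier to collect only the offending entries. The following is a rough sketch along the same lines; NullUrlCheck and its methods are names invented here, and it only inspects loaders that are instances of URLClassLoader.

```scala
import java.net.{URL, URLClassLoader}

// Hypothetical helper: find null entries among classloader URLs
object NullUrlCheck {
  // Positions of null entries in a URL array
  def nullUrlIndices(urls: Array[URL]): Seq[Int] =
    urls.zipWithIndex.collect { case (null, i) => i }.toSeq

  // Walk from `start` up through the parents; for every URLClassLoader,
  // report its class name and the positions of its null URLs
  def report(start: ClassLoader): List[(String, Seq[Int])] =
    Iterator
      .iterate(start)(_.getParent)
      .takeWhile(_ != null)
      .collect { case u: URLClassLoader =>
        (u.getClass.getName, nullUrlIndices(u.getURLs))
      }
      .toList
}
```

Calling NullUrlCheck.report(getClass.getClassLoader) at the same place as the loop above would list, per loader, the indices at which a null URL was found; an empty index list means that loader looks fine.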
https://www.scala-sbt.org/1.x/docs/Developers-Guide.html
https://github.com/sbt/sbt/blob/1.8.x/DEVELOPING.md
