My application runs successfully when I run it locally, but after I built a jar, deployed it to GCP, and tried to run it on a Dataproc cluster, it fails with an exception I don't understand. The stack trace from the error log is below.
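For context, the job submission looked roughly like this (a sketch — the cluster name, region, and bucket path are placeholders, not the real values):

    # Submit the application jar to the Dataproc cluster as a Spark job;
    # --class names the main class that appears in the stack trace below.
    gcloud dataproc jobs submit spark \
      --cluster=my-cluster \
      --region=us-central1 \
      --class=com.walmart.ei.EntryPoint \
      --jars=gs://my-bucket/ei-ingestion-data-1.0.0-SNAPSHOT.jar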
20/08/19 15:58:07 INFO org.spark_project.jetty.util.log: Logging initialized @4444ms
20/08/19 15:58:07 INFO org.spark_project.jetty.server.Server: jetty-9.3.z-SNAPSHOT, build timestamp: unknown, git hash: unknown
20/08/19 15:58:07 INFO org.spark_project.jetty.server.Server: Started @4520ms
20/08/19 15:58:07 INFO org.spark_project.jetty.server.AbstractConnector: Started ServerConnector@6015a4a5{HTTP/1.1,[http/1.1]}{0.0.0.0:35581}
20/08/19 15:58:08 INFO org.apache.hadoop.yarn.client.RMProxy: Connecting to ResourceManager at fc-available-picked-ingestor-m/10.22.166.101:8032
20/08/19 15:58:08 INFO org.apache.hadoop.yarn.client.AHSProxy: Connecting to Application History server at fc-available-picked-ingestor-m/10.22.166.101:10200
20/08/19 15:58:08 INFO org.apache.hadoop.conf.Configuration: resource-types.xml not found
20/08/19 15:58:08 INFO org.apache.hadoop.yarn.util.resource.ResourceUtils: Unable to find 'resource-types.xml'.
20/08/19 15:58:08 INFO org.apache.hadoop.yarn.util.resource.ResourceUtils: Adding resource type - name = memory-mb, units = Mi, type = COUNTABLE
20/08/19 15:58:08 INFO org.apache.hadoop.yarn.util.resource.ResourceUtils: Adding resource type - name = vcores, units = , type = COUNTABLE
20/08/19 15:58:10 INFO org.apache.hadoop.yarn.client.api.impl.YarnClientImpl: Submitted application application_1597777222711_0015
20/08/19 15:58:16 ERROR org.apache.spark.SparkContext: Error initializing SparkContext.
java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:135)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3241)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:121)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3291)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3259)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:470)
at org.apache.spark.util.Utils$.getHadoopFileSystem(Utils.scala:1866)
at org.apache.spark.scheduler.EventLoggingListener.<init>(EventLoggingListener.scala:71)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:522)
at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2520)
at org.apache.spark.sql.SparkSession$Builder.$anonfun$getOrCreate$1(SparkSession.scala:930)
at scala.Option.getOrElse(Option.scala:189)
at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:921)
at com.walmart.ei.dqa.Utils$.getSparkSession(Utils.scala:47)
at com.walmart.ei.EntryPoint$.main(EntryPoint.scala:27)
at com.walmart.ei.EntryPoint.main(EntryPoint.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:845)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
... 27 more
Caused by: java.lang.NoSuchMethodError: shaded.guava.common.base.Preconditions.checkState(ZLjava/lang/String;J)V
at shaded.guava.cloud.hadoop.gcsio.GoogleCloudStorageReadOptions$Builder.build(GoogleCloudStorageReadOptions.java:224)
at shaded.guava.cloud.hadoop.gcsio.GoogleCloudStorageReadOptions.<clinit>(GoogleCloudStorageReadOptions.java:60)
at shaded.guava.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemConfiguration.<clinit>(GoogleHadoopFileSystemConfiguration.java:423)
at shaded.guava.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemBase.<init>(GoogleHadoopFileSystemBase.java:227)
at shaded.guava.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem.<init>(GoogleHadoopFileSystem.java:54)
... 32 more
20/08/19 15:58:16 INFO org.spark_project.jetty.server.AbstractConnector: Stopped Spark@6015a4a5{HTTP/1.1,[http/1.1]}{0.0.0.0:0}
Here is my pom.xml:
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.example.ei</groupId>
  <artifactId>ei-ingestion-data</artifactId>
  <version>1.0.0-SNAPSHOT</version>
  <packaging>jar</packaging>
  <parent>
    <groupId>com.example.ei.dqa</groupId>
    <artifactId>dqa-parent</artifactId>
    <version>0.7</version>
  </parent>
  <properties>
    <spark.version>2.4.4</spark.version>
    <encoding>UTF-8</encoding>
    <shade.version>3.2.0</shade.version>
    <scala.binary.version>2.11</scala.binary.version>
    <ei.canonical.schema.version>0.32.1</ei.canonical.schema.version>
    <apache.commons.email.version>1.5</apache.commons.email.version>
    <!-- <maven.scala.version>2.15.2</maven.scala.version>-->
    <scalatest.version>3.1.1</scalatest.version>
    <!-- <scalatest.maven.plugin.version>1.0</scalatest.maven.plugin.version>-->
    <!-- <cucumber.version>4.2.0</cucumber.version>-->
    <!-- <scala.tools.version>2.11</scala.tools.version>-->
  </properties>
  <dependencyManagement>
    <dependencies>
      <dependency>
        <groupId>com.google.cloud</groupId>
        <artifactId>libraries-bom</artifactId>
        <version>7.0.0</version>
        <type>pom</type>
        <scope>import</scope>
      </dependency>
    </dependencies>
  </dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>com.walmart.ei.dqa</groupId>
      <artifactId>dqa-spark-utils</artifactId>
      <version>0.7</version>
    </dependency>
    <dependency>
      <groupId>org.scala-lang</groupId>
      <artifactId>scala-library</artifactId>
      <version>${scala.version}</version>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-sql-kafka-0-10_${scala.binary.version}</artifactId>
      <version>${spark.version}</version>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-streaming_${scala.binary.version}</artifactId>
      <version>${spark.version}</version>
    </dependency>
    <!-- https://mvnrepository.com/artifact/com.google.guava/guava -->
    <dependency>
      <groupId>com.google.guava</groupId>
      <artifactId>guava</artifactId>
      <version>28.0-jre</version>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_${scala.binary.version}</artifactId>
      <version>${spark.version}</version>
      <exclusions>
        <exclusion>
          <artifactId>avro</artifactId>
          <groupId>org.apache.avro</groupId>
        </exclusion>
        <exclusion>
          <artifactId>guava</artifactId>
          <groupId>com.google.guava</groupId>
        </exclusion>
      </exclusions>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-sql_${scala.binary.version}</artifactId>
      <version>${spark.version}</version>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-avro_${scala.binary.version}</artifactId>
      <version>2.4.4</version>
    </dependency>
    <!-- <dependency>-->
    <!--   <groupId>org.apache.spark</groupId>-->
    <!--   <artifactId>spark-avro_${scala.binary.version}</artifactId>-->
    <!--   <version>${spark.version}</version>-->
    <!-- </dependency>-->
    <dependency>
      <groupId>com.walmart.ei</groupId>
      <artifactId>ei-canonical-schema</artifactId>
      <version>${ei.canonical.schema.version}</version>
      <exclusions>
        <exclusion>
          <artifactId>log4j-over-slf4j</artifactId>
          <groupId>org.slf4j</groupId>
        </exclusion>
        <exclusion>
          <artifactId>slf4j-api</artifactId>
          <groupId>org.slf4j</groupId>
        </exclusion>
        <exclusion>
          <artifactId>slf4j-log4j12</artifactId>
          <groupId>org.slf4j</groupId>
        </exclusion>
      </exclusions>
    </dependency>
    <!-- https://mvnrepository.com/artifact/com.google.cloud/google-cloud-storage -->
    <dependency>
      <groupId>com.google.cloud</groupId>
      <artifactId>google-cloud-storage</artifactId>
      <exclusions>
        <exclusion>
          <artifactId>guava</artifactId>
          <groupId>com.google.guava</groupId>
        </exclusion>
      </exclusions>
    </dependency>
    <!-- <dependency>-->
    <!--   <groupId>org.apache.kafka</groupId>-->
    <!--   <artifactId>kafka_${scala.binary.version}</artifactId>-->
    <!--   <version>${kafka.version}</version>-->
    <!--   <exclusions>-->
    <!--     <exclusion>-->
    <!--       <artifactId>com.fasterxml.jackson.core</artifactId>-->
    <!--       <groupId>jackson-databind</groupId>-->
    <!--     </exclusion>-->
    <!--   </exclusions>-->
    <!-- </dependency>-->
    <!-- <dependency>-->
    <!--   <groupId>io.cucumber</groupId>-->
    <!--   <artifactId>cucumber-junit</artifactId>-->
    <!--   <version>${cucumber.version}</version>-->
    <!--   <scope>test</scope>-->
    <!-- </dependency>-->
    <!-- <dependency>-->
    <!--   <groupId>io.cucumber</groupId>-->
    <!--   <artifactId>cucumber-scala_${scala.tools.version}</artifactId>-->
    <!--   <version>${cucumber.version}</version>-->
    <!--   <scope>test</scope>-->
    <!-- </dependency>-->
    <dependency>
      <groupId>org.apache.commons</groupId>
      <artifactId>commons-email</artifactId>
      <version>${apache.commons.email.version}</version>
    </dependency>
    <dependency>
      <groupId>org.scalatest</groupId>
      <artifactId>scalatest_${scala.binary.version}</artifactId>
      <version>${scalatest.version}</version>
      <scope>test</scope>
    </dependency>
    <dependency>
      <groupId>com.google.cloud.bigdataoss</groupId>
      <artifactId>gcs-connector</artifactId>
      <version>hadoop2-2.0.0</version>
      <exclusions>
        <exclusion>
          <artifactId>guava</artifactId>
          <groupId>com.google.guava</groupId>
        </exclusion>
      </exclusions>
    </dependency>
  </dependencies>
  <build>
    <sourceDirectory>src/main/scala</sourceDirectory>
    <testSourceDirectory>src/test/scala</testSourceDirectory>
    <resources>
      <resource>
        <directory>src/main/resources</directory>
      </resource>
    </resources>
    <testResources>
      <testResource>
        <directory>src/test/resources</directory>
      </testResource>
    </testResources>
    <plugins>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-shade-plugin</artifactId>
        <version>${shade.version}</version>
        <configuration>
          <relocations>
            <relocation>
              <pattern>com.google.common</pattern>
              <shadedPattern>com.walmart.com.google.common</shadedPattern>
            </relocation>
          </relocations>
        </configuration>
        <executions>
          <execution>
            <phase>package</phase>
            <goals>
              <goal>shade</goal>
            </goals>
            <configuration>
              <filters>
                <filter>
                  <artifact>*</artifact>
                  <excludes>
                    <exclude>META-INF/*.SF</exclude>
                    <exclude>META-INF/*.DSA</exclude>
                    <exclude>META-INF/*.RSA</exclude>
                  </excludes>
                </filter>
              </filters>
              <artifactSet>
                <excludes>
                  <exclude>org.datanucleus</exclude>
                </excludes>
              </artifactSet>
              <promoteTransitiveDependencies>true</promoteTransitiveDependencies>
              <minimizeJar>false</minimizeJar>
              <relocations>
                <relocation>
                  <pattern>com.google</pattern>
                  <shadedPattern>shaded.guava</shadedPattern>
                  <includes>
                    <include>com.google.**</include>
                  </includes>
                  <excludes>
                    <exclude>com.google.common.base.Optional</exclude>
                    <exclude>com.google.common.base.Absent</exclude>
                    <exclude>com.google.common.base.Present</exclude>
                  </excludes>
                </relocation>
              </relocations>
              <transformers>
                <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer" />
                <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                  <mainClass>com.example.ei.dqa.StructuredStreaming</mainClass>
                </transformer>
              </transformers>
            </configuration>
          </execution>
        </executions>
      </plugin>
    </plugins>
  </build>
</project>
1 Answer
Most likely there is a misconfigured dependency in your application jar. The exception

java.lang.NoSuchMethodError: shaded.guava.common.base.Preconditions.checkState(ZLjava/lang/String;J)V

points out that you packaged a Guava version into your application jar that is incompatible with the gcsio library used by the GCS connector (i.e. it does not have the required method). To fix this, make sure that you use a Guava version compatible with the gcsio library; for GCS connector 2.0.0 that should be Guava 28.0-jre.
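For example, one way to force a single compatible Guava version across the whole build is to pin it in dependencyManagement (a sketch; it simply hard-codes the 28.0-jre version mentioned above):

    <dependencyManagement>
      <dependencies>
        <!-- Pin Guava to the version gcs-connector 2.0.0 is compatible with -->
        <dependency>
          <groupId>com.google.guava</groupId>
          <artifactId>guava</artifactId>
          <version>28.0-jre</version>
        </dependency>
      </dependencies>
    </dependencyManagement>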
To check which dependencies declared in your pom.xml can bring an earlier Guava version into your application jar, you can use the Maven Dependency Plugin, as sketched below.
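A minimal sketch of that check (assuming Maven is on the PATH; the includes filter just restricts the output to Guava):

    # Print the dependency tree restricted to Guava, so you can see
    # which dependency resolves to which Guava version:
    mvn dependency:tree -Dincludes=com.google.guava:guava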
Also, since you are using Spark, you should access Google Cloud Storage (GCS) through the Hadoop FileSystem implemented by the GCS connector, the same way you access files on HDFS — you only need to change the filesystem scheme from hdfs:// to gs://. For that reason you should not declare a dependency on gcs-connector, and you should not depend on other GCS clients such as google-cloud-storage either: the Dataproc cluster already has the GCS connector pre-installed, and if you need direct access to GCS objects you should go through the HCFS (Hadoop Compatible File System) interface.
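For illustration, a minimal Scala sketch of both access patterns (the bucket and object names are placeholders):

    import java.net.URI
    import org.apache.hadoop.fs.{FileSystem, Path}
    import org.apache.spark.sql.SparkSession

    object GcsAccessExample {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("gcs-example").getOrCreate()

        // Reading and writing GCS uses the normal Spark APIs; only the
        // filesystem scheme changes from hdfs:// to gs://.
        val df = spark.read.parquet("gs://my-bucket/input/")
        df.write.parquet("gs://my-bucket/output/")

        // Direct object access goes through the HCFS interface provided by
        // the preinstalled connector, not a separate GCS client library.
        val fs = FileSystem.get(new URI("gs://my-bucket"), spark.sparkContext.hadoopConfiguration)
        val found = fs.exists(new Path("gs://my-bucket/input/_SUCCESS"))
        println(s"marker file found: $found")

        spark.stop()
      }
    }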