We are developing a Spark application that will be hosted on an Azure HDInsight Spark cluster. Our use case is: we pull data from Azure Blob Storage, process it with Spark, and finally create new data or append data back to Azure Blob Storage. For this we use azure-storage-4.3.0.jar.
We use Maven in our Eclipse project and added the following dependency:
<dependency>
    <groupId>com.microsoft.azure</groupId>
    <artifactId>azure-storage</artifactId>
    <version>4.3.0</version>
</dependency>
Compilation succeeds, and the application also runs fine on a local machine with no execution problems.
So we built an uber/fat jar from Eclipse, copied it to our Azure HDInsight Spark cluster, and ran the following command:
spark-submit --class myClassName MyUberJar.jar --verbose
The application failed with the following error:
Exception in thread "main" java.lang.NoSuchMethodError: com.microsoft.azure.storage.blob.CloudBlockBlob.startCopy(Lcom/microsoft/azure/storage/blob/CloudBlockBlob;)Ljava/lang/String;
at com.lsy.airmon2.dao.blob.AzureStorageImpl.moveData(AzureStorageImpl.java:188)
at com.lsy.airmon2.processor.SurveyProcessor.stageData(SurveyProcessor.java:92)
at com.lsy.airmon2.processor.Processor.doJob(Processor.java:27)
at com.lsy.airmon2.entrypoint.AirMon2EntryPoint.runP(AirMon2EntryPoint.java:109)
at com.lsy.airmon2.entrypoint.AirMon2EntryPoint.run(AirMon2EntryPoint.java:82)
at com.lsy.airmon2.entrypoint.AirMon2EntryPoint.main(AirMon2EntryPoint.java:42)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
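For context, the call in our moveData that hits this error is essentially the one-argument startCopy overload on CloudBlockBlob. A simplified, self-contained sketch of that kind of call (the connection string, container, and blob names below are placeholders, not our real code):

import com.microsoft.azure.storage.CloudStorageAccount;
import com.microsoft.azure.storage.blob.CloudBlobClient;
import com.microsoft.azure.storage.blob.CloudBlobContainer;
import com.microsoft.azure.storage.blob.CloudBlockBlob;

public class StartCopySketch {
    public static void main(String[] args) throws Exception {
        // Placeholder connection string -- substitute real account credentials.
        CloudStorageAccount account = CloudStorageAccount.parse(
                "DefaultEndpointsProtocol=https;AccountName=myaccount;AccountKey=mykey");
        CloudBlobClient client = account.createCloudBlobClient();
        CloudBlobContainer container = client.getContainerReference("mycontainer");

        CloudBlockBlob source = container.getBlockBlobReference("staging/data.csv");
        CloudBlockBlob target = container.getBlockBlobReference("processed/data.csv");

        // startCopy(CloudBlockBlob) was added in azure-storage 3.0.0. Compiled
        // against 4.3.0 but resolved at runtime against an older jar, this call
        // throws java.lang.NoSuchMethodError.
        String copyId = target.startCopy(source);
        System.out.println("Copy started, id = " + copyId);
    }
}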
When we dug into the problem, we found that Azure HDInsight Spark already ships an older version of the Azure Storage SDK (azure-storage-2.2.0.jar) under /usr/hdp/current/hadoop-client/lib, and this older version does not have the startCopy method; it was only added in azure-storage-3.0.0.
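A quick way to confirm the missing method on a cluster node (assuming the jar path above) is to ask javap whether the bundled class exposes startCopy at all:

javap -classpath /usr/hdp/current/hadoop-client/lib/azure-storage-2.2.0.jar com.microsoft.azure.storage.blob.CloudBlockBlob | grep -i startcopy

This prints nothing against the 2.2.0 jar, while the same check against azure-storage-4.3.0.jar prints the startCopy signatures.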
So we replaced azure-storage-2.2.0.jar with azure-storage-3.0.0.jar on all driver and worker nodes. After this change, the application failed with a strange exception:
java.net.ConnectException: Call From hn0-FooBar/10.XXX.XXX.XXX to hn1-FooBar.xyzabcxyzabc.ax.internal.cloudapp.net:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:801)
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:732)
at org.apache.hadoop.ipc.Client.call(Client.java:1430)
at org.apache.hadoop.ipc.Client.call(Client.java:1363)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
at com.sun.proxy.$Proxy9.transitionToStandby(Unknown Source)
at org.apache.hadoop.ha.protocolPB.HAServiceProtocolClientSideTranslatorPB.transitionToStandby(HAServiceProtocolClientSideTranslatorPB.java:112)
at org.apache.hadoop.ha.FailoverController.tryGracefulFence(FailoverController.java:172)
at org.apache.hadoop.ha.ZKFailoverController.doFence(ZKFailoverController.java:514)
at org.apache.hadoop.ha.ZKFailoverController.fenceOldActive(ZKFailoverController.java:505)
at org.apache.hadoop.ha.ZKFailoverController.access$1100(ZKFailoverController.java:61)
at org.apache.hadoop.ha.ZKFailoverController$ElectorCallbacks.fenceOldActive(ZKFailoverController.java:892)
at org.apache.hadoop.ha.ActiveStandbyElector.fenceOldActive(ActiveStandbyElector.java:956)
at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:855)
at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:463)
at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:611)
at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:510)
Caused by: java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:744)
at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:495)
at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:617)
at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:715)
at org.apache.hadoop.ipc.Client$Connection.access$2900(Client.java:378)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1492)
at org.apache.hadoop.ipc.Client.call(Client.java:1402)
... 14 more
So we reverted all the changes and were back to square one.
Any suggestions on how to resolve this?
1 Answer
Try the --packages switch on your spark-submit command; I have used it in a previous application (though not with an uber jar). With --packages, Spark resolves the given Maven coordinates at submit time and adds the artifact to both the driver and executor classpaths, so you do not need to replace the jars that ship with the cluster.
So it should be something like the sketch below:
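A sketch of the command, reusing the Maven coordinates from the pom.xml in the question and the original class and jar names:

spark-submit --packages com.microsoft.azure:azure-storage:4.3.0 --class myClassName MyUberJar.jar

If the old 2.2.0 jar on the cluster classpath still wins at runtime, adding --conf spark.driver.userClassPathFirst=true and --conf spark.executor.userClassPathFirst=true (both experimental Spark options) tells Spark to prefer the user-supplied jars over its own.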