Unable to start an Apache Spark standalone cluster

zy1mlcev · posted 2021-05-29 in Hadoop

I am having trouble starting a standalone Spark cluster with one master and its workers. I downloaded and installed Hadoop 2.7.3 and Spark 2.0.0 on Ubuntu 16.04 LTS. I created a conf/slaves file with my slave's IP, and this is my spark-env.sh:

    #!/usr/bin/env bash
    export SPARK_DIST_CLASSPATH=$(hadoop classpath)
    export SPARK_WORKER_CORES=2
    export SPARK_MASTER_IP=192.168.1.6
    export SPARK_LOCAL_IP=192.168.1.6
    export SPARK_YARN_USER_ENV="JAVA_HOME=/usr/lib/jvm/java-8-oracle/jre"

I started the master with start-master.sh and everything went fine. The problems appeared when I tried to launch the workers.
I tried:

    (1) start-slave.sh spark://192.168.1.6:7077          (from the worker)
    (2) start-slaves.sh                                   (from the master)
    (3) ./bin/spark-class org.apache.spark.deploy.worker.Worker spark://192.168.1.6:7077   (from the worker)

With (1) and (2) the worker apparently starts, but it never shows up on master:8080. With (3), the following exception is thrown:

    16/08/31 14:17:03 INFO worker.Worker: Connecting to master master:7077...
    16/08/31 14:17:03 WARN worker.Worker: Failed to connect to master master:7077
    org.apache.spark.SparkException: Exception thrown in awaitResult
        at org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:77)
        at org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:75)
        at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
        at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
        at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
        at scala.PartialFunction$OrElse.apply(PartialFunction.scala:167)
        at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:83)
        at org.apache.spark.rpc.RpcEnv.setupEndpointRefByURI(RpcEnv.scala:88)
        at org.apache.spark.rpc.RpcEnv.setupEndpointRef(RpcEnv.scala:96)
        at org.apache.spark.deploy.worker.Worker$$anonfun$org$apache$spark$deploy$worker$Worker$$tryRegisterAllMasters$1$$anon$1.run(Worker.scala:216)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
    Caused by: java.io.IOException: Failed to connect to master/192.168.1.6:7077
        at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:228)
        at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:179)
        at org.apache.spark.rpc.netty.NettyRpcEnv.createClient(NettyRpcEnv.scala:197)
        at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:191)
        at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:187)
        ... 4 more
    Caused by: java.net.ConnectException: Connection refused: master/192.168.1.6:7077
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
        at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:224)
        at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:289)
        at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:528)
        at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
        at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
        at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
        ... 1 more
    16/08/31 14:17:40 ERROR worker.Worker: All masters are unresponsive! Giving up.

The master and the worker are hosted on VMware VMs running on the same Windows 10 host, connected with bridged networking.
I have also disabled the firewall.
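(For reference, these are the kinds of checks I can run on the two VMs; just a sketch, assuming net-tools and netcat are installed.)

    # on the master VM: is the master really listening on 192.168.1.6:7077?
    sudo netstat -tlnp | grep 7077

    # from the worker VM: basic reachability of the master's IP
    ping -c 3 192.168.1.6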
What can I do?
Thanks in advance.

r9f1avp5 #1

In the log:

    16/08/31 14:17:03 INFO worker.Worker: Connecting to master master:7077...

you can see that the worker is trying to connect to master:7077, so make sure the hostname master resolves to the expected IP (192.168.1.6).
You can check the hostname mapping in your /etc/hosts file.
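For example, a quick check from the worker machine (a sketch; getent and nc are just one way to verify this):

    # what does the hostname "master" resolve to? (should print 192.168.1.6)
    getent hosts master
    # is the master actually reachable on port 7077 from this machine?
    nc -zv 192.168.1.6 7077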

y1aodyip #2

Just to elaborate: since the worker is looking for a host called master, you have two options. Either edit the hosts file:

    # /etc/hosts
    # add the following anywhere in the file
    192.168.1.6 master

or go to the Spark config directory (probably /opt/spark/conf) and edit spark-defaults.conf:

    # you may just want to change //master:7077 to //192.168.1.6:7077
    # (or to the master's actual hostname)
    spark.master spark://master:7077
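Another option, not from the original setup but worth sketching: as far as I know, the Spark 2.x start scripts treat SPARK_MASTER_IP as deprecated in favor of SPARK_MASTER_HOST, so you can pin the master's bind address there and register the workers against that explicit URL:

    # conf/spark-env.sh on the master (replaces the deprecated SPARK_MASTER_IP)
    export SPARK_MASTER_HOST=192.168.1.6

    # restart the master, then register each worker against the explicit URL
    ./sbin/stop-master.sh && ./sbin/start-master.sh
    ./sbin/start-slave.sh spark://192.168.1.6:7077

Either way, the goal is the same: the address the workers use must match the address the master actually binds to and advertises.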
