Application report for application (state: ACCEPTED) repeats endlessly when submitting a Spark job (Spark 3.0.1)

wtzytmuj · posted 2021-07-15 in Hadoop

I am trying to run a Spark Streaming application on a virtual machine (the project simulates a cluster with a single node), like this:

    ./spark-submit --master yarn --deploy-mode cluster --driver-memory 3g /home/bigdata/PycharmProjects/pythonProject/venv/test.py

But when it runs, I get stuck in the following loop:

    ApplicationMaster host: N/A
    ApplicationMaster RPC port: -1
    queue: default
    start time: 1609436829638
    final status: UNDEFINED
    tracking URL: http://bigdata-VirtualBox:8088/proxy/application_1609416014621_0023/
    user: bigdata
    20/12/31 18:47:11 INFO yarn.Client: Application report for application_1609416014621_0023 (state: ACCEPTED)
    20/12/31 18:47:12 INFO yarn.Client: Application report for application_1609416014621_0023 (state: ACCEPTED)
    20/12/31 18:47:13 INFO yarn.Client: Application report for application_1609416014621_0023 (state: ACCEPTED)
    ...
    20/12/31 18:47:34 INFO yarn.Client: Application report for application_1609416014621_0023 (state: ACCEPTED)

This goes on for 4-5 minutes, and then it gives me an error:

    Failing this attempt.Diagnostics: [2020-12-31 18:59:34.338]Exception from container-launch.
    Container id: container_1609416014621_0024_02_000001
    Exit code: 13
    [2020-12-31 18:59:34.340]Container exited with a non-zero exit code 13. Error file: prelaunch.err.
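For reference: in yarn cluster mode the driver runs inside the ApplicationMaster container, so the actual Python traceback usually ends up in the YARN container logs rather than on the client console. It can be fetched with the YARN CLI, using the application ID that matches the failing container above:

    yarn logs -applicationId application_1609416014621_0024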

I am new to Apache Spark and Hadoop, so I don't know how to handle this situation. Here is the code of my test.py:

    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext
    from pyspark.sql import Row, SparkSession
    import socket

    # SparkSession singleton
    def getSparkSessionInstance(sparkConf):
        if "sparkSessionSingletonInstance" not in globals():
            globals()["sparkSessionSingletonInstance"] = SparkSession \
                .builder \
                .config(conf=sparkConf) \
                .getOrCreate()
        return globals()["sparkSessionSingletonInstance"]

    # Send the DataFrame values over UDP
    def send_df(df, serverPort, UDPClient):
        values = [str(t.value) for t in df.select("value").collect()]
        str1 = '-'.join(values)
        msg = str.encode(str1)
        UDPClient.sendto(msg, serverPort)

    # UDP connection info
    serverAddressPort = ("127.0.0.1", 3001)
    DPClientSocket = socket.socket(family=socket.AF_INET, type=socket.SOCK_DGRAM)

    sc = SparkContext("local[2]", "")
    sc.setLogLevel("ERROR")
    ssc = StreamingContext(sc, 2)
    lines = ssc.socketTextStream("localhost", 8000)
    splitted_val = lines.flatMap(lambda x: str(x).split('-'))
    values = splitted_val.map(lambda x: int(x))

    def process(time, rdd):
        print("========= %s =========" % str(time))
        try:
            # Get the SparkSession singleton
            spark = getSparkSessionInstance(rdd.context.getConf())
            # Convert RDD[String] to RDD[Row] to DataFrame
            rowRdd = rdd.map(lambda w: Row(value=w))
            DataFrame = spark.createDataFrame(rowRdd)
            # Create a temporary view
            DataFrame.createOrReplaceTempView("values")
            # Select the values via SQL and print them
            valueDataFrame = spark.sql("select value from values")
            valueDataFrame.show()
            send_df(valueDataFrame, serverAddressPort, DPClientSocket)
        except:
            pass

    values.foreachRDD(process)
    ssc.start()  # Start the computation
    ssc.awaitTermination()
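For reference, exit code 13 from a YARN container commonly indicates that the application hard-codes a Spark master that conflicts with the one passed to spark-submit, and test.py above creates its context with SparkContext("local[2]", "") while the job is submitted with --master yarn. A minimal sketch that leaves the master to spark-submit (assuming nothing beyond the script above; the app name is illustrative):

    # Minimal sketch: no hard-coded master, so "--master yarn --deploy-mode cluster"
    # passed to spark-submit is honored instead of "local[2]".
    from pyspark import SparkConf, SparkContext
    from pyspark.streaming import StreamingContext

    conf = SparkConf().setAppName("udp-streaming-test")  # illustrative app name
    sc = SparkContext(conf=conf)                         # master comes from spark-submit
    sc.setLogLevel("ERROR")
    ssc = StreamingContext(sc, 2)                        # 2-second batches, as in test.py

With the master left out of the code, the same script can still be run locally by passing --master local[2] on the spark-submit command line.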

How can I solve this problem?
