How do I find the cause of an error in a Spark job on Kubernetes?

Posted by wnrlj8wa on 2021-05-19 in Spark

I run the following command to submit a Spark job to Kubernetes.

./bin/spark-submit \
        --master k8s://https://192.168.0.91:6443 \
        --deploy-mode cluster \
        --name spark-steve-test \
        --class org.apache.spark.examples.Spark \
        --conf spark.executor.instances=2 \
        --conf spark.kubernetes.namespace=spark \
        --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
        --conf spark.kubernetes.container.image=sclee01/spark:v2.3.0 \
        local:///opt/spark/examples/jars/spark-examples_2.12-3.0.1.jar

However, I get the messages below, and it seems that for some reason the pod was not created.

20/10/22 12:00:36 INFO LoggingPodStatusWatcherImpl: Application status for spark-6a79e5b39a84403bb83dbf69ca20a02c (phase: Pending)
20/10/22 12:00:37 INFO LoggingPodStatusWatcherImpl: State changed, new state: 
         pod name: spark-steve-test-01-734603754e4038ed-driver
         namespace: spark
         labels: spark-app-selector -> spark-6a79e5b39a84403bb83dbf69ca20a02c, spark-role -> driver
         pod uid: 5afec1f7-b1cd-4bac-a6c2-be239e0efc30
         creation time: 2020-10-22T03:00:34Z
         service account name: spark
         volumes: spark-local-dir-1, spark-conf-volume, spark-token-bdwh9
         node name: bistelresearchdev-sm
         start time: 2020-10-22T03:00:34Z
         phase: Running
         container status: 
                 container name: spark-kubernetes-driver
                 container image: sclee01/spark:v2.3.0
                 container state: running
                 container started at: 2020-10-22T03:00:37Z
20/10/22 12:00:37 INFO LoggingPodStatusWatcherImpl: Application status for spark-6a79e5b39a84403bb83dbf69ca20a02c (phase: Running)
20/10/22 12:00:38 INFO LoggingPodStatusWatcherImpl: Application status for spark-6a79e5b39a84403bb83dbf69ca20a02c (phase: Running)
20/10/22 12:00:39 INFO LoggingPodStatusWatcherImpl: Application status for spark-6a79e5b39a84403bb83dbf69ca20a02c (phase: Running)
20/10/22 12:00:40 INFO LoggingPodStatusWatcherImpl: Application status for spark-6a79e5b39a84403bb83dbf69ca20a02c (phase: Running)
20/10/22 12:00:41 INFO LoggingPodStatusWatcherImpl: State changed, new state: 
         pod name: spark-steve-test-01-734603754e4038ed-driver
         namespace: spark
         labels: spark-app-selector -> spark-6a79e5b39a84403bb83dbf69ca20a02c, spark-role -> driver
         pod uid: 5afec1f7-b1cd-4bac-a6c2-be239e0efc30
         creation time: 2020-10-22T03:00:34Z
         service account name: spark
         volumes: spark-local-dir-1, spark-conf-volume, spark-token-bdwh9
         node name: bistelresearchdev-sm
         start time: 2020-10-22T03:00:34Z
         phase: Failed
         container status: 
                 container name: spark-kubernetes-driver
                 container image: sclee01/spark:v2.3.0
                 container state: terminated
                 container started at: 2020-10-22T03:00:37Z
                 container finished at: 2020-10-22T03:00:40Z
                 exit code: 1
                 termination reason: Error
20/10/22 12:00:41 INFO LoggingPodStatusWatcherImpl: Application status for spark-6a79e5b39a84403bb83dbf69ca20a02c (phase: Failed)
20/10/22 12:00:41 INFO LoggingPodStatusWatcherImpl: Container final statuses:

         container name: spark-kubernetes-driver
         container image: sclee01/spark:v2.3.0
         container state: terminated
         container started at: 2020-10-22T03:00:37Z
         container finished at: 2020-10-22T03:00:40Z
         exit code: 1
         termination reason: Error
20/10/22 12:00:41 INFO LoggingPodStatusWatcherImpl: Application spark-steve-test-01 with submission ID spark:spark-steve-test-01-734603754e4038ed-driver finished
20/10/22 12:00:41 INFO ShutdownHookManager: Shutdown hook called
20/10/22 12:00:41 INFO ShutdownHookManager: Deleting directory /tmp/spark-2e4e5f9a-c54d-4790-b4cb-f9b6cd1e2105

The only thing I can see is `Error`; I cannot see the specific cause. I ran the command below, but it gave me no further information.

bistel@BISTelResearchDev-NN:~/user/sclee/project/spark/spark-3.0.1-bin-hadoop2.7$  kubectl logs -p spark-steve-test-01-734603754e4038ed-driver
Error from server (NotFound): pods "spark-steve-test-01-734603754e4038ed-driver" not found
bistel@BISTelResearchDev-NN:~/user/sclee/project/spark/spark-3.0.1-bin-hadoop2.7$

Any help would be appreciated.
Thanks.

ilmyapht1#

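For more details, use `kubectl describe pod <pod-name>`. It prints a detailed description of the selected resource, including related resources such as events or controllers.
You can also use `kubectl get event | grep pod/<pod-name>`, which shows only the events for the selected pod.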
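
One thing worth checking first: the spark-submit output reports the driver pod in namespace `spark`, while the `kubectl logs` call in the question has no `-n` flag. If the current kubectl context does not default to that namespace, this alone could explain the `NotFound` error. The sketch below groups the commands from this answer with an explicit namespace. The function name `diagnose_driver` and the `KUBECTL` variable are illustrative assumptions, not part of Spark or kubectl; the `spark-role=driver` label comes from the log in the question.

```shell
#!/bin/sh
# KUBECTL can be overridden (for example with "echo" for a dry run).
# This variable and the function name are hypothetical helpers.
KUBECTL="${KUBECTL:-kubectl}"

diagnose_driver() {
    ns="$1"    # namespace, i.e. the value of spark.kubernetes.namespace
    pod="$2"   # driver pod name as printed by spark-submit

    # List driver pods in the namespace (labelled spark-role=driver in the log).
    "$KUBECTL" get pods -n "$ns" -l spark-role=driver

    # Output of the driver container; a Java stack trace usually shows up here.
    "$KUBECTL" logs -n "$ns" "$pod"

    # Pod details plus recent events (scheduling, image pull, volume mounts).
    "$KUBECTL" describe pod -n "$ns" "$pod"

    # Events that refer to this pod only.
    "$KUBECTL" get events -n "$ns" --field-selector "involvedObject.name=$pod"
}
```

With the values from the question, the call would be `diagnose_driver spark spark-steve-test-01-734603754e4038ed-driver`. Setting `KUBECTL=echo` beforehand prints the arguments of each command instead of contacting the cluster, which is a quick way to review them.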
