Kubernetes: executor pods keep being created and then deleted when submitting a Spark job to k8s

a6b3iqyw · asked 12 months ago · in Apache

I submitted a Spark job through Airflow using the KubernetesPodOperator as shown below. The driver pod is created, but the executor pods keep being created and then deleted.

spark_submit = KubernetesPodOperator(
    task_id='test_spark_k8s_submit',
    name='test_spark_k8s_submit',
    namespace='dev-spark',
    image='docker.io/vinhlq9/bitnami-spark-3.3',
    cmds=['/opt/spark/bin/spark-submit'],
    arguments=[
        '--master', k8s_url,
        '--deploy-mode', 'cluster',
        '--name', 'spark-job',
        '--conf', 'spark.kubernetes.namespace=dev-spark',
        '--conf', 'spark.kubernetes.container.image=docker.io/vinhlq9/bitnami-spark-3.3',
        '--conf', 'spark.kubernetes.authenticate.driver.serviceAccountName=spark-user',
        '--conf', 'spark.kubernetes.authenticate.executor.serviceAccountName=spark-user',
        '--conf', 'spark.kubernetes.driverEnv.SPARK_CONF_DIR=/opt/bitnami/spark/conf',
        '--conf', 'spark.kubernetes.driverEnv.SPARK_CONFIG_MAP=spark-config',
        '--conf', 'spark.kubernetes.file.upload.path=/opt/spark',
        '--conf', 'spark.kubernetes.driver.annotation.sidecar.istio.io/inject=false',
        '--conf', 'spark.kubernetes.executor.annotation.sidecar.istio.io/inject=false',
        '--conf', 'spark.eventLog.enabled=true',
        '--conf', 'spark.eventLog.dir=oss://spark/spark-log/',
        '--conf', 'spark.hadoop.fs.oss.accessKeyId=' + spark_user_access_key,
        '--conf', 'spark.hadoop.fs.oss.accessKeySecret=' + spark_user_secret_key,
        '--conf', 'spark.hadoop.fs.oss.endpoint=' + spark_user_endpoint,
        '--conf', 'spark.hadoop.fs.oss.impl=org.apache.hadoop.fs.aliyun.oss.AliyunOSSFileSystem',
        '--conf', 'spark.executor.instances=1',
        '--conf', 'spark.executor.memory=4g',
        '--conf', 'spark.executor.cores=2',
        '--conf', 'spark.driver.memory=2g',
        'oss://spark/job/test_spark_k8s_job_simple.py'
    ],
    is_delete_operator_pod=True,
    config_file='/opt/airflow/plugins/k8sconfig-spark-user.json',
    get_logs=True,
    dag=dag
)
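As a side note, hand-written `'--conf', 'key=value'` pairs are easy to get subtly wrong; the original snippet had a stray trailing space inside `spark.eventLog.enabled=true `, which becomes part of the config value. A small hypothetical helper (my own sketch, not part of the operator API) can build the arguments list from a dict and strip such whitespace:

```python
# Hypothetical helper: build spark-submit arguments from a dict of confs,
# stripping stray whitespace that would otherwise leak into config values.
def build_spark_args(master, app_file, confs, deploy_mode="cluster", name="spark-job"):
    args = ["--master", master, "--deploy-mode", deploy_mode, "--name", name]
    for key, value in confs.items():
        args += ["--conf", f"{key.strip()}={str(value).strip()}"]
    args.append(app_file)
    return args

# Example usage with a subset of the configuration above
# (the master URL is a placeholder):
args = build_spark_args(
    "k8s://https://example:6443",
    "oss://spark/job/test_spark_k8s_job_simple.py",
    {
        "spark.kubernetes.namespace": "dev-spark",
        "spark.executor.instances": 1,
        "spark.eventLog.enabled": "true ",  # trailing space is stripped
    },
)
```

The resulting list can be passed straight to the operator's `arguments=` parameter.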

And the logs from the driver pod:

spark 08:40:12.26
spark 08:40:12.26 Welcome to the Bitnami spark container
spark 08:40:12.27 Subscribe to project updates by watching https://github.com/bitnami/containers
spark 08:40:12.27 Submit issues and feature requests at https://github.com/bitnami/containers/issues
spark 08:40:12.27
23/05/16 08:40:14 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
23/05/16 08:40:16 INFO SparkContext: Running Spark version 3.3.2
23/05/16 08:40:16 INFO ResourceUtils: ==============================================================
23/05/16 08:40:16 INFO ResourceUtils: No custom resources configured for spark.driver.
23/05/16 08:40:16 INFO ResourceUtils: ==============================================================
23/05/16 08:40:16 INFO SparkContext: Submitted application: spark-read-csv
23/05/16 08:40:16 INFO ResourceProfile: Default ResourceProfile created, executor resources: Map(cores -> name: cores, amount: 2, script: , vendor: , memory -> name: memory, amount: 4096, script: , vendor: , offHeap -> name: offHeap, amount: 0, script: , vendor: ), task resources: Map(cpus -> name: cpus, amount: 1.0)
23/05/16 08:40:16 INFO ResourceProfile: Limiting resource is cpus at 2 tasks per executor
23/05/16 08:40:16 INFO ResourceProfileManager: Added ResourceProfile id: 0
23/05/16 08:40:16 INFO SecurityManager: Changing view acls to: spark,root
23/05/16 08:40:16 INFO SecurityManager: Changing modify acls to: spark,root
23/05/16 08:40:16 INFO SecurityManager: Changing view acls groups to:
23/05/16 08:40:16 INFO SecurityManager: Changing modify acls groups to:
23/05/16 08:40:16 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(spark, root); groups with view permissions: Set(); users with modify permissions: Set(spark, root); groups with modify permissions: Set()
23/05/16 08:40:16 INFO Utils: Successfully started service 'sparkDriver' on port 7078.
23/05/16 08:40:16 INFO SparkEnv: Registering MapOutputTracker
23/05/16 08:40:16 INFO SparkEnv: Registering BlockManagerMaster
23/05/16 08:40:16 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
23/05/16 08:40:16 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
23/05/16 08:40:16 INFO SparkEnv: Registering BlockManagerMasterHeartbeat
23/05/16 08:40:16 INFO DiskBlockManager: Created local directory at /var/data/spark-77a2ee41-2c8e-45c6-9df6-bb1f549d4566/blockmgr-5350fab4-8dd7-432e-80b3-fbc1924f0dea
23/05/16 08:40:16 INFO MemoryStore: MemoryStore started with capacity 912.3 MiB
23/05/16 08:40:16 INFO SparkEnv: Registering OutputCommitCoordinator
23/05/16 08:40:16 INFO Utils: Successfully started service 'SparkUI' on port 4040.
23/05/16 08:40:16 INFO SparkKubernetesClientFactory: Auto-configuring K8S client using current context from users K8S config file
23/05/16 08:40:18 INFO ExecutorPodsAllocator: Going to request 1 executors from Kubernetes for ResourceProfile Id: 0, target: 1, known: 0, sharedSlotFromPendingPods: 2147483647.
23/05/16 08:40:18 INFO KubernetesClientUtils: Spark configuration files loaded from Some(/opt/bitnami/spark/conf) : spark-env.sh
23/05/16 08:40:18 INFO KubernetesClientUtils: Spark configuration files loaded from Some(/opt/bitnami/spark/conf) : spark-env.sh
23/05/16 08:40:18 INFO BasicExecutorFeatureStep: Decommissioning not enabled, skipping shutdown script
23/05/16 08:40:18 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 7079.
23/05/16 08:40:18 INFO NettyBlockTransferService: Server created on spark-job-84e1f08823b7833d-driver-svc.dev-spark.svc:7079
23/05/16 08:40:18 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
23/05/16 08:40:18 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, spark-job-84e1f08823b7833d-driver-svc.dev-spark.svc, 7079, None)
23/05/16 08:40:18 INFO BlockManagerMasterEndpoint: Registering block manager spark-job-84e1f08823b7833d-driver-svc.dev-spark.svc:7079 with 912.3 MiB RAM, BlockManagerId(driver, spark-job-84e1f08823b7833d-driver-svc.dev-spark.svc, 7079, None)
23/05/16 08:40:18 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, spark-job-84e1f08823b7833d-driver-svc.dev-spark.svc, 7079, None)
23/05/16 08:40:18 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, spark-job-84e1f08823b7833d-driver-svc.baseline-dev-spark.svc, 7079, None)
23/05/16 08:40:18 INFO SingleEventLogFileWriter: Logging events to oss://spark/spark-log/spark-f6f3a41be773442dbc9a30781dffbc11.inprogress
23/05/16 08:40:21 INFO BlockManagerMaster: Removal of executor 1 requested
23/05/16 08:40:21 INFO KubernetesClusterSchedulerBackend$KubernetesDriverEndpoint: Asked to remove non-existent executor 1
23/05/16 08:40:21 INFO BlockManagerMasterEndpoint: Trying to remove executor 1 from BlockManagerMaster.
23/05/16 08:40:21 INFO ExecutorPodsAllocator: Going to request 1 executors from Kubernetes for ResourceProfile Id: 0, target: 1, known: 0, sharedSlotFromPendingPods: 2147483647.
23/05/16 08:40:21 INFO KubernetesClientUtils: Spark configuration files loaded from Some(/opt/bitnami/spark/conf) : spark-env.sh
23/05/16 08:40:21 INFO BasicExecutorFeatureStep: Decommissioning not enabled, skipping shutdown script
23/05/16 08:40:24 INFO ExecutorPodsAllocator: Going to request 1 executors from Kubernetes for ResourceProfile Id: 0, target: 1, known: 0, sharedSlotFromPendingPods: 2147483647.

The loop repeating for the executor pods:

23/05/16 08:40:25 INFO BlockManagerMaster: Removal of executor 2 requested
23/05/16 08:40:25 INFO BlockManagerMasterEndpoint: Trying to remove executor 2 from BlockManagerMaster.
23/05/16 08:40:25 INFO KubernetesClusterSchedulerBackend$KubernetesDriverEndpoint: Asked to remove non-existent executor 2
23/05/16 08:40:27 INFO ExecutorPodsAllocator: Going to request 1 executors from Kubernetes for ResourceProfile Id: 0, target: 1, known: 0, sharedSlotFromPendingPods: 2147483647.
23/05/16 08:40:27 INFO KubernetesClientUtils: Spark configuration files loaded from Some(/opt/bitnami/spark/conf) : spark-env.sh
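The telltale pattern here is repeated "Going to request 1 executors ... known: 0" lines followed by removals of executors the driver considers non-existent: the executor pods die before ever registering with the driver. A rough diagnostic sketch (my own helper, not part of Spark) that scans a driver log for this churn:

```python
import re

# Rough diagnostic: count allocation requests vs. executor removals in a
# Spark driver log. Repeated requests with "known: 0" plus removals of
# executors that never registered indicate an executor crash loop.
def executor_churn(log_text):
    requests = len(re.findall(r"Going to request \d+ executors", log_text))
    removals = len(re.findall(r"Removal of executor \d+ requested", log_text))
    never_known = len(re.findall(r"known: 0", log_text))
    return {"requests": requests, "removals": removals, "never_registered": never_known}

sample = """\
23/05/16 08:40:18 INFO ExecutorPodsAllocator: Going to request 1 executors from Kubernetes for ResourceProfile Id: 0, target: 1, known: 0, sharedSlotFromPendingPods: 2147483647.
23/05/16 08:40:21 INFO BlockManagerMaster: Removal of executor 1 requested
23/05/16 08:40:21 INFO ExecutorPodsAllocator: Going to request 1 executors from Kubernetes for ResourceProfile Id: 0, target: 1, known: 0, sharedSlotFromPendingPods: 2147483647.
"""
stats = executor_churn(sample)
# For the sample: 2 requests, 1 removal, 2 with "known: 0"
```

When this pattern appears, the executor pods' own logs (or `kubectl describe pod` on a short-lived executor) usually hold the real error.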

Has anyone run into this before? Any insight would be greatly appreciated.

nnt7mjpx1#

I have fixed this issue; it was caused by the Java version in the Spark image.
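For reference, Spark 3.3 supports Java 8, 11, and 17, so an image shipping another JRE can crash executors on startup. A minimal sketch (my own helper, not part of Spark) of checking the version string that `java -version` prints inside the image:

```python
import re

# Sketch: parse the major version from `java -version` output (printed on
# stderr) and check it against the Java releases Spark 3.3 supports.
def java_major_version(version_line):
    m = re.search(r'version "(\d+)(?:\.(\d+))?', version_line)
    if not m:
        raise ValueError("unrecognized java -version output")
    major = int(m.group(1))
    # Legacy "1.8.0_x" style reports "1" first; the real major is the second field.
    if major == 1 and m.group(2):
        major = int(m.group(2))
    return major

def supported_by_spark_33(version_line):
    return java_major_version(version_line) in (8, 11, 17)

# e.g. feed it the first line of: java -version 2>&1
ok = supported_by_spark_33('openjdk version "17.0.7" 2023-04-18')
```

Running `java -version` inside both the driver and executor image before submitting can catch this class of failure early.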
