What is the equivalent of spark-submit's --archives in Java's SparkLauncher?

4xy9mtcn · asked 2021-08-25 · Java

I am submitting a PySpark task together with a zipped environment that the task needs. I need --archives to ship the zip package containing the full environment; the #prediction_env suffix is the alias under which YARN unpacks the archive in each container's working directory, which is why the python paths below start with ./prediction_env. The working spark-submit command is as follows:

/my/spark/home/spark-submit \
--master yarn \
--deploy-mode cluster \
--driver-memory 10G \
--executor-memory 8G \
--executor-cores 4 \
--queue rnd \
--num-executors 8 \
--archives /data/me/ld_env.zip#prediction_env \
--conf spark.pyspark.python=./prediction_env/ld_env/bin/python \
--conf spark.pyspark.driver.python=./prediction_env/ld_env/bin/python \
--conf spark.executor.memoryOverhead=4096 \
--py-files dist/mylib-0.1.0-py3-none-any.whl my_task.py

I am trying to launch the Spark application programmatically with SparkLauncher:

import org.apache.spark.launcher.SparkAppHandle;
import org.apache.spark.launcher.SparkLauncher;

String pyPath = "my_task.py";
String archives = "/data/me/ld_env.zip#prediction_env";
SparkAppHandle handle = new SparkLauncher()
        .setSparkHome(sparkHome)
        .setAppResource(jarPath)
        .setMaster("yarn")
        .setDeployMode("cluster")
        .setConf(SparkLauncher.EXECUTOR_MEMORY, "8G")
        .setConf(SparkLauncher.EXECUTOR_CORES, "2")
        .setConf("spark.executor.instances", "8")
        .setConf("spark.yarn.queue", "rnd")
        .setConf("spark.pyspark.python", "./prediction_env/ld_env/bin/python")
        .setConf("spark.pyspark.driver.python", "./prediction_env/ld_env/bin/python")
        .setConf("spark.executor.memoryOverhead", "4096")
        .addPyFile(pyPath)
        // .addPyFile(archives)  // tried: ships the zip as a py-file, never unpacks it
        // .addFile(archives)    // tried: ships the zip via spark.files, never unpacks it
        .addAppArgs("--inputPath",
                inputPath,
                "--outputPath",
                outputPath,
                "--option",
                option)
        .startApplication(taskListener);
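Here sparkHome, jarPath, inputPath, outputPath, option, and taskListener are defined elsewhere in my code; taskListener is a SparkAppHandle.Listener. A minimal sketch of such a listener, just to make the snippet self-contained (hypothetical, not the listener I actually use), could simply log state transitions:

SparkAppHandle.Listener taskListener = new SparkAppHandle.Listener() {
    @Override
    public void stateChanged(SparkAppHandle handle) {
        // Called on every lifecycle transition (CONNECTED, RUNNING, FINISHED, ...).
        System.out.println("Spark app state: " + handle.getState());
    }

    @Override
    public void infoChanged(SparkAppHandle handle) {
        // Called when handle info other than state changes, e.g. the application ID.
    }
};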

I need a way to put my zip file on YARN, but I don't see any archives-related method on SparkLauncher.

tcomlyy6 · 1#

Use the configuration spark.yarn.dist.archives, as described in the Running Spark on YARN documentation:

String pyPath = "my_task.py";
String archives = "/data/me/ld_env.zip#prediction_env";
SparkAppHandle handle = new SparkLauncher()
        .setSparkHome(sparkHome)
        .setAppResource(jarPath)
        .setMaster("yarn")
        .setDeployMode("cluster")
        .setConf(SparkLauncher.EXECUTOR_MEMORY, "8G")
        .setConf(SparkLauncher.EXECUTOR_CORES, "2")
        .setConf("spark.executor.instances", "8")
        .setConf("spark.yarn.queue", "rnd")
        .setConf("spark.pyspark.python", "./prediction_env/ld_env/bin/python")
        .setConf("spark.pyspark.driver.python", "./prediction_env/ld_env/bin/python")
        .setConf("spark.executor.memoryOverhead", "4096")
        .setConf("spark.yarn.dist.archives", archives)
        .addPyFile(pyPath)
        .addAppArgs("--inputPath",
                inputPath,
                "--outputPath",
                outputPath,
                "--option",
                option)
        .startApplication(taskListener);

So adding .setConf("spark.yarn.dist.archives", archives) solves the problem; on YARN, this property is exactly what spark-submit's --archives flag sets under the hood.
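As a side note, a generic passthrough should also work here (my sketch, not part of the original answer): SparkLauncher.addSparkArg can forward the same --archives flag that spark-submit accepts, which avoids hard-coding the YARN-specific property name:

SparkAppHandle handle = new SparkLauncher()
        .setSparkHome(sparkHome)
        .setAppResource(jarPath)
        .setMaster("yarn")
        .setDeployMode("cluster")
        // Forward --archives exactly as it appears on the spark-submit command line.
        .addSparkArg("--archives", "/data/me/ld_env.zip#prediction_env")
        .addPyFile("my_task.py")
        .startApplication(taskListener);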
