pyspark: How to import a referenced file (XML) in an AWS Glue script

Posted by ryevplcw on 2023-03-01 in Spark

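I am trying to run a Glue job in fair scheduling mode. To do this, I created an XML file named fairschedular.xml.
I then uploaded this fairschedular.xml to an S3 bucket and added that location to the Glue job's referenced files path. The file contents are shown below: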

<?xml version="1.0"?>
<allocations>
 <pool name="1">
   <schedulingMode>FIFO</schedulingMode>
   <weight>1</weight>
   <minShare>2</minShare>
 </pool>
 <pool name="2">
   <schedulingMode>FIFO</schedulingMode>
   <weight>1</weight>
   <minShare>2</minShare>
 </pool>
</allocations>
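
As a sanity check before handing the file to Spark, the pool definitions can be parsed locally to confirm the XML is well-formed and the pool names are the expected ones. This is only a minimal sketch using the Python standard library; `list_pools` and `ALLOCATIONS` are hypothetical names introduced for illustration, and the XML is copied inline rather than read from the Glue job's referenced file.

```python
import xml.etree.ElementTree as ET

# Inline copy of the allocation file from the question (assumption: in a real
# job this text would be read from wherever the referenced file ends up).
ALLOCATIONS = """<?xml version="1.0"?>
<allocations>
 <pool name="1">
   <schedulingMode>FIFO</schedulingMode>
   <weight>1</weight>
   <minShare>2</minShare>
 </pool>
 <pool name="2">
   <schedulingMode>FIFO</schedulingMode>
   <weight>1</weight>
   <minShare>2</minShare>
 </pool>
</allocations>"""


def list_pools(xml_text):
    """Return {pool name: {setting: value}} for every <pool> element."""
    root = ET.fromstring(xml_text)
    if root.tag != "allocations":
        raise ValueError(f"expected <allocations> root, got <{root.tag}>")
    return {
        pool.get("name"): {child.tag: (child.text or "").strip() for child in pool}
        for pool in root.findall("pool")
    }


if __name__ == "__main__":
    for name, settings in list_pools(ALLOCATIONS).items():
        print(name, settings)
```

If this check passes but the pools still do not show up, the file content itself is probably not the issue, and the path given to `spark.scheduler.allocation.file` is the next thing to look at.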

I then use it in the script as follows:

from awsglue.context import GlueContext
from pyspark import SparkConf, SparkContext


class JobBase(object):

    # Allocation file passed to spark.scheduler.allocation.file
    fair_scheduler_config_file = "fairscheduler.xml"
    rowAsDict = {}
    Oracle_Username = None
    Oracle_Password = None
    Oracle_jdbc_url = None

    def __start_spark_glue_context(self):
        # Enable FAIR scheduling and point Spark at the pool definitions
        conf = (
            SparkConf()
            .setAppName("python_thread")
            .set("spark.scheduler.mode", "FAIR")
            .set("spark.scheduler.allocation.file", self.fair_scheduler_config_file)
        )
        self.sc = SparkContext(conf=conf)
        self.glueContext = GlueContext(self.sc)
        self.spark = self.glueContext.spark_session

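However, when the code runs, I do not see the fair scheduler pools in the Spark UI history server. I can see that fair scheduling is in use, but not the pools.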
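
One thing that may be worth checking: in Spark, jobs go to the `default` pool unless the submitting thread sets the `spark.scheduler.pool` local property. The sketch below shows one way to assign each worker thread to a pool named in the XML (`"1"` and `"2"`). It is an illustration under assumptions, not a confirmed fix for this job: `run_in_pool` and `run_all` are hypothetical helpers, `sc` is assumed to be the `SparkContext` created above, and whether local properties stay isolated per Python thread depends on the PySpark version and its thread-pinning behavior.

```python
import threading


def run_in_pool(sc, pool_name, work):
    """Run work() with this thread's Spark jobs assigned to pool_name."""
    # SparkContext.setLocalProperty sets a property for the calling thread.
    sc.setLocalProperty("spark.scheduler.pool", pool_name)
    try:
        return work()
    finally:
        # Clear the assignment so later jobs on this thread use the default pool.
        sc.setLocalProperty("spark.scheduler.pool", None)


def run_all(sc, tasks):
    """Run each (pool_name, callable) pair in its own thread; return results by pool."""
    results = {}

    def target(pool_name, work):
        results[pool_name] = run_in_pool(sc, pool_name, work)

    threads = [threading.Thread(target=target, args=task) for task in tasks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

With a real context this might be called as `run_all(self.sc, [("1", job_a), ("2", job_b)])`, where `job_a` and `job_b` are placeholder callables that trigger Spark actions.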