Spark program takes Hadoop configuration from an unspecified location

Posted by jogvjijk on 2021-05-31 in Hadoop

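I have a few test cases, such as reading and writing a file on HDFS, that I want to automate in Scala and run with Maven. I have put the Hadoop configuration files of the test environment in the resources directory of my Maven project. The project runs correctly against the intended cluster, whichever cluster I launch it from.
What I do not understand is how Spark picks up the Hadoop configuration from the resources directory even though I never specify it anywhere in the project. Below is a code snippet from the project.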

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.hadoop.security.UserGroupInformation
import org.apache.spark.{SparkConf, SparkContext}
import org.junit.Test

def getSparkContext(hadoopConfiguration: Configuration): SparkContext = {
    val conf = new SparkConf().setAppName("SparkTest").setMaster("local")
    // Path(parent, child) resolves child under parent, so the parent is the conf directory
    val hdfsCoreSitePath = new Path("/etc/hadoop/conf", "core-site.xml")
    val hdfsHDFSSitePath = new Path("/etc/hadoop/conf", "hdfs-site.xml")
    val hdfsYarnSitePath = new Path("/etc/hadoop/conf", "yarn-site.xml")
    val hdfsMapredSitePath = new Path("/etc/hadoop/conf", "mapred-site.xml")
    hadoopConfiguration.addResource(hdfsCoreSitePath)
    hadoopConfiguration.addResource(hdfsHDFSSitePath)
    hadoopConfiguration.addResource(hdfsYarnSitePath)
    hadoopConfiguration.addResource(hdfsMapredSitePath)
    hadoopConfiguration.set("hadoop.security.authentication", "Kerberos")
    // Log in from the keytab using the configuration built above
    UserGroupInformation.setConfiguration(hadoopConfiguration)
    UserGroupInformation.loginUserFromKeytab("alice", "/etc/security/keytab/alice.keytab")
    println("-----------------Logged-in via keytab---------------------")
    FileSystem.get(hadoopConfiguration)
    // hadoopConfiguration is never handed to the SparkContext created here
    new SparkContext(conf)
  }

@Test
def testCase(): Unit = {
    val hadoopConfiguration: Configuration = new Configuration()
    val sc = getSparkContext(hadoopConfiguration)
    //rest of the code
    //...
    //...
  }

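Here I have used the hadoopConfiguration object, but I have not passed it to the SparkContext anywhere, because that would run the tests on the cluster I use to run the project rather than on some remote test environment.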
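A possible explanation, not confirmed for this project: Maven puts the contents of the resources directory on the test classpath, Hadoop's `Configuration` loads `core-site.xml` from the classpath by default, and a `SparkContext` builds its own `Configuration` rather than using the object created in the test. The sketch below is a diagnostic only and not part of the original project; `HadoopConfLocator` and `locate` are hypothetical names. It reports where each site file resolves on the classpath.

```scala
import java.net.URL

// Diagnostic sketch (hypothetical helper, not from the original project):
// reports which Hadoop site files are visible on the current classpath.
object HadoopConfLocator {
  // File names that Hadoop-related classes commonly look up on the classpath.
  val resourceNames: Seq[String] =
    Seq("core-site.xml", "hdfs-site.xml", "yarn-site.xml", "mapred-site.xml")

  // Pairs each resource name with the URL it resolves to, or None if absent.
  def locate(loader: ClassLoader): Seq[(String, Option[URL])] =
    resourceNames.map(name => name -> Option(loader.getResource(name)))

  def main(args: Array[String]): Unit = {
    val loader = Thread.currentThread().getContextClassLoader
    locate(loader).foreach {
      case (name, Some(url)) => println(s"$name -> $url")
      case (name, None)      => println(s"$name -> not on classpath")
    }
  }
}
```

If a file resolves to a path under the project's `target` directory, the configuration in the resources directory is being found through the classpath rather than through any path set in code.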
If this is not the right approach, can anyone explain how I should go about running Spark test cases against a test environment on a remote cluster?
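One way to make the target cluster explicit, sketched under assumptions: Spark copies any property prefixed with `spark.hadoop.` into the Hadoop configuration it creates, so the entries of a hand-built `Configuration` can be forwarded through `SparkConf`. `RemoteClusterContext`, `loadSiteFiles`, `sparkConfFrom`, and the `confDir` argument are hypothetical names, and only `core-site.xml` and `hdfs-site.xml` are loaded here for brevity.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.spark.SparkConf

// Sketch only: the object and method names are hypothetical.
object RemoteClusterContext {
  // Loads the given site files; passing `false` skips the classpath defaults,
  // so only the files under confDir contribute settings.
  def loadSiteFiles(confDir: String): Configuration = {
    val hadoopConf = new Configuration(false)
    Seq("core-site.xml", "hdfs-site.xml").foreach { name =>
      hadoopConf.addResource(new Path(confDir, name))
    }
    hadoopConf
  }

  // Forwards every Hadoop entry to Spark under the "spark.hadoop." prefix,
  // which Spark applies to the Hadoop Configuration it builds internally.
  def sparkConfFrom(hadoopConf: Configuration, appName: String): SparkConf = {
    val conf = new SparkConf().setAppName(appName).setMaster("local")
    val entries = hadoopConf.iterator()
    while (entries.hasNext) {
      val entry = entries.next()
      conf.set("spark.hadoop." + entry.getKey, entry.getValue)
    }
    conf
  }
}
```

A test could then build its context with `new SparkContext(RemoteClusterContext.sparkConfFrom(RemoteClusterContext.loadSiteFiles("/etc/hadoop/conf"), "SparkTest"))`. Whether the remote cluster is reachable and accepts the Kerberos login still depends on the environment.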

hadoop scala apache-spark

Source: https://stackoverflow.com/questions/50085620/spark-program-taking-hadoop-configurations-from-an-unspecified-location
