Hadoop: invoking a Giraph job from a simple Java program

n7taea2i posted on 2021-05-29 in Hadoop

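I am new to Giraph and Hadoop. Following the Giraph quick start, I can run the example job from the command line using the jar built from source.
I would like to run this job from a simple Java program instead. This question is inspired by an earlier, similar question about MapReduce jobs. I am looking for a similar answer, including the Java dependencies that would be needed.
I have set up YARN locally, so what I need is a way to submit the job to it from a Java program.
Judging from https://giraph.apache.org/apidocs/org/apache/giraph/job/giraphjob.html there must be a way to do this, but I am finding it hard to locate an example that uses YARN.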

unhi4e5o #1

I found a way to do it by reading the GiraphRunner source code:

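// Imports needed by the snippet below (the enclosing test class is omitted).
import java.io.IOException;

import org.apache.giraph.conf.GiraphConfiguration;
import org.apache.giraph.conf.GiraphConstants;
import org.apache.giraph.io.formats.GiraphFileInputFormat;
import org.apache.giraph.io.formats.IdWithValueTextOutputFormat;
import org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat;
import org.apache.giraph.job.GiraphJob;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.junit.Test;
// PageRankComputation is not imported here: use the computation class from your own project.

@Test
public void testPageRank() throws IOException, ClassNotFoundException, InterruptedException {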

    GiraphConfiguration giraphConf = new GiraphConfiguration(getConf());
    giraphConf.setWorkerConfiguration(1,1,100);
    GiraphConstants.SPLIT_MASTER_WORKER.set(giraphConf, false);

    giraphConf.setVertexInputFormatClass(JsonLongDoubleFloatDoubleVertexInputFormat.class);
    GiraphFileInputFormat.setVertexInputPath(giraphConf,
                                             new Path("/input/tiny-graph.txt"));
    giraphConf.setVertexOutputFormatClass(IdWithValueTextOutputFormat.class);

    giraphConf.setComputationClass(PageRankComputation.class);

    GiraphJob giraphJob = new GiraphJob(giraphConf, "page-rank");       

    FileOutputFormat.setOutputPath(giraphJob.getInternalJob(),
                                   new Path("/output/page-rank2"));
    giraphJob.run(true);
}

private Configuration getConf() {
    Configuration conf = new Configuration();
    conf.set("fs.defaultFS", "hdfs://localhost:9000");

    conf.set("yarn.resourcemanager.address", "localhost:8032");
    conf.set("yarn.resourcemanager.hostname", "localhost");

    // the framework is "yarn"; this is normally defined in mapred-site.xml
    conf.set("mapreduce.framework.name", "yarn");
    return conf;
}
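
The job above reads /input/tiny-graph.txt through JsonLongDoubleFloatDoubleVertexInputFormat. As far as I understand that format, each line is a JSON array of the form [vertexId,vertexValue,[[targetId,edgeWeight],...]]. The following is a minimal sketch, using only the standard library, of how such lines could be produced before uploading the file to HDFS. The class name TinyGraphLines and the helper vertexLine are hypothetical and not part of Giraph.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not part of Giraph): builds input lines in the
// assumed layout [vertexId,vertexValue,[[targetId,edgeWeight],...]].
class TinyGraphLines {

    /** Formats one vertex and its outgoing edges as a single JSON array line. */
    static String vertexLine(long id, double value, Map<Long, Float> edges) {
        StringBuilder sb = new StringBuilder();
        sb.append('[').append(id).append(',').append(value).append(",[");
        boolean first = true;
        for (Map.Entry<Long, Float> e : edges.entrySet()) {
            if (!first) {
                sb.append(',');
            }
            // Each edge is a two-element array: [targetVertexId,edgeWeight]
            sb.append('[').append(e.getKey()).append(',').append(e.getValue()).append(']');
            first = false;
        }
        return sb.append("]]").toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the edges in insertion order.
        Map<Long, Float> edges = new LinkedHashMap<>();
        edges.put(1L, 1.0f);
        edges.put(3L, 3.0f);
        // prints [0,0.0,[[1,1.0],[3,3.0]]]
        System.out.println(vertexLine(0L, 0.0, edges));
    }
}
```

Writing one such line per vertex to a text file and copying it to the HDFS path used in setVertexInputPath should give the job something to read. Check the layout against the input format class of your Giraph version before relying on it.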
