Crunch SparkPipeline does not work as expected

Posted by wz1wpwve on 2021-05-29 in Hadoop


I am trying to migrate our code from Crunch MRPipeline to SparkPipeline. I tried a simple example like this:

SparkConf sc = new SparkConf().setAppName("Crunch Spark Count").setMaster("local");
JavaSparkContext jsc = new JavaSparkContext(sc);
SparkPipeline p = new SparkPipeline(jsc, "Crunch Spark Count");
PCollection<String> lines = p.read(From.textFile(new Path(fileUrl)));
PCollection<String> words = lines.parallelDo(new Tokenizer(), Writables.strings());
PTable<String, Long> counts = words.count();

My input consists of two files. file1 contains: hello world hello hadoop; file2 contains: hello spark
After running the Spark program, the output is always:

[hello, 1]
[hadoop, 1]
[world, 1]
[spark, 1]

The count for "hello" should actually be 3.
Is this a bug in Crunch's 'count' function?
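
For comparison, here is a minimal sketch of the counts I expect, computed with plain Java collections and no Crunch or Spark involved. The class name ExpectedWordCount and the method countWords are made up for illustration, and splitting on whitespace is only an assumption about what my Tokenizer does. It makes no claim about why SparkPipeline behaves differently; it only pins down the reference result.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of Crunch: computes reference word counts in memory.
class ExpectedWordCount {

    // Counts whitespace-separated tokens across all lines (assumed Tokenizer behavior).
    static Map<String, Long> countWords(List<String> lines) {
        Map<String, Long> counts = new TreeMap<>(); // sorted keys for stable output
        for (String line : lines) {
            for (String word : line.trim().split("\\s+")) {
                if (!word.isEmpty()) {
                    counts.merge(word, 1L, Long::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // One string per input file, as described in the question.
        List<String> lines = Arrays.asList("hello world hello hadoop", "hello spark");
        countWords(lines).forEach((w, c) -> System.out.println("[" + w + ", " + c + "]"));
        // prints:
        // [hadoop, 1]
        // [hello, 3]
        // [spark, 1]
        // [world, 1]
    }
}
```

Comparing this against the pipeline output shows that the keys match but every count stays at 1.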

hadoop apache-spark apache-crunch

Source: https://stackoverflow.com/questions/35218526/crunch-sparkpipeline-does-not-work-as-expected
