val charCounts = spark.sparkContext.textFile("path/to/file.txt")
  .flatMap(line => line.toList)          // one element per character
  .map(char => (char, 1))                // This is literally just wordcount, now
  .reduceByKey(_ + _)                    // (char, occurrences)

val total = charCounts.map { case (_, count) => count }.sum() // something like this ...
println(total)
Counting the characters in a file
Why would you use Hive for this? Spark is so flexible.
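The per-character pairing and reduction can be sanity-checked locally without a cluster. A minimal pure-Scala sketch of the same pipeline shape, with standard collections standing in for the RDD (`groupMapReduce` here plays the role of `reduceByKey`):

```scala
object CharCount {
  def main(args: Array[String]): Unit = {
    val lines = Seq("hello", "world") // stand-in for the file's lines

    // Same shape as the Spark pipeline: flatMap to chars, pair with 1, reduce by key.
    val counts: Map[Char, Int] = lines
      .flatMap(_.toList)
      .map(c => (c, 1))
      .groupMapReduce(_._1)(_._2)(_ + _)

    val total = counts.values.sum
    println(total) // 10 characters across both lines
  }
}
```

If all you need is the total (not per-character counts), `lines.map(_.length).sum` — or `.map(_.length.toLong).sum()` on the RDD — skips the shuffle entirely.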