How to save output to a specific path in SparkR

3pmvbmvn, posted on 2021-06-03 in Hadoop

I am running the pi.R example with spark-submit. The output is printed to the terminal, but I want to save it to an HDFS path.

/SparkR-pkg/lib/SparkR/sparkR-submit --master yarn-client examples/pi.R yarn-client 4

The command above is what I use to run the pi.R example.
This is the pi.R code:

library(SparkR)

# Parse command-line arguments: master URL and an optional number of slices
args <- commandArgs(trailing = TRUE)
if (length(args) < 1) {
  print("Usage: pi <master> [<slices>]")
  q("no")
}

sc <- sparkR.init(args[[1]], "PiR")
slices <- ifelse(length(args) > 1, as.integer(args[[2]]), 2)
n <- 100000 * slices

# Scalar version (not used below): 1 if a random point in [-1, 1]^2
# falls inside the unit circle, otherwise 0
piFunc <- function(elem) {
  rands <- runif(n = 2, min = -1, max = 1)
  val <- ifelse((rands[1]^2 + rands[2]^2) < 1, 1.0, 0.0)
  val
}

# Vectorized version applied to each partition: counts the random points
# that fall inside the unit circle
piFuncVec <- function(elems) {
  message(length(elems))
  rands1 <- runif(n = length(elems), min = -1, max = 1)
  rands2 <- runif(n = length(elems), min = -1, max = 1)
  val <- ifelse((rands1^2 + rands2^2) < 1, 1.0, 0.0)
  sum(val)
}

rdd <- parallelize(sc, 1:n, slices)
# Sum the per-partition counts
count <- reduce(lapplyPartition(rdd, piFuncVec), sum)
cat("Pi is roughly", 4.0 * count / n, "\n")
cat("Num elements in RDD ", count(rdd), "\n")

I want to save the output above to a location in HDFS. Any help would be appreciated.

vdgimpew #1

I have not tested this on a cluster, but I would guess that regular R code works here:

output <- paste("Pi is roughly", 4.0 * count / n, "\n")
output <- paste(output, "Num elements in RDD ", count(rdd), "\n")
write(output, file = "pathToFile")  # pathToFile can be any location
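
Note that base R's write() writes to the local filesystem of the machine running the script, so a file produced this way would still need to be copied into HDFS. One possible approach, sketched below and equally untested on a cluster, is to capture the job's standard output in a local file and then upload it with `hdfs dfs -put`. The helper name `run_and_save` and all paths are hypothetical placeholders, and the sketch assumes the `hdfs` client is on the PATH of the machine that launches the job.

```shell
#!/bin/sh
# Sketch only: run_and_save is a hypothetical helper, not part of SparkR or Hadoop.
# Usage: run_and_save <local_file> <hdfs_dest> <command> [args...]
run_and_save() {
    local_file="$1"   # local copy of the captured output
    hdfs_dest="$2"    # destination path in HDFS (placeholder)
    shift 2

    # Run the remaining arguments as the command and save its stdout locally.
    "$@" > "$local_file" || return 1

    # Upload only if the hdfs client is available; -f overwrites an existing file.
    if command -v hdfs >/dev/null 2>&1; then
        hdfs dfs -put -f "$local_file" "$hdfs_dest"
    else
        echo "hdfs client not found; output left in $local_file" >&2
    fi
}
```

With the command from the question, a call might look like `run_and_save /tmp/pi_output.txt /user/someuser/pi_output.txt /SparkR-pkg/lib/SparkR/sparkR-submit --master yarn-client examples/pi.R yarn-client 4`, where both output paths are made up for illustration. Only standard output is captured, so anything the job writes to standard error would not end up in the file.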
