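I am running a SparkR program and want to save its output in HDFS. The output is saved fine locally, but as soon as I give an HDFS path it throws an error. I am running the program from a shell script. This is my shell script: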
/SparkR-pkg/lib/SparkR/sparkR-submit --master yarn-client examples/pi.R yarn-client 4
This is my R code:
library(SparkR)
getwd()
setwd('hdfs://ip-172-31-41-199.us-west-2.compute.internal:8020/user/karun/output/')
args <- commandArgs(trailing = TRUE)
if (length(args) < 1) {
  print("Usage: pi <master> [<slices>]")
  q("no")
}
sc <- sparkR.init(args[[1]], "PiR")
slices <- ifelse(length(args) > 1, as.integer(args[[2]]), 2)
n <- 100000 * slices
piFunc <- function(elem) {
  rands <- runif(n = 2, min = -1, max = 1)
  val <- ifelse((rands[1]^2 + rands[2]^2) < 1, 1.0, 0.0)
  val
}
piFuncVec <- function(elems) {
  message(length(elems))
  rands1 <- runif(n = length(elems), min = -1, max = 1)
  rands2 <- runif(n = length(elems), min = -1, max = 1)
  val <- ifelse((rands1^2 + rands2^2) < 1, 1.0, 0.0)
  sum(val)
}
rdd <- parallelize(sc, 1:n, slices)
count <- reduce(lapplyPartition(rdd, piFuncVec), sum)
output <- paste("Pi is roughly", 4.0 * count / n, "\n")
output <- paste(output, "Num elements in RDD ", count(rdd), "\n")
writeLines(output, con = "file.txt", sep = "\n", useBytes = FALSE)
cat("Num elements in RDD ", count(rdd), "\n")
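I have tried many ways to save the output in HDFS, such as sink, write.data, writetype, etc. I also tried changing the working directory with setwd(), but that does not work either. It throws this error: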
Error in setwd("hdfs://ip-172-31-41-199.us-west-2.compute.internal:8020/user/karun/output/") : cannot change working directory
Execution halted
I have been troubleshooting this for 2 days. Any help would be greatly appreciated.
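For context on the error above: base R functions such as setwd() and writeLines() only work with paths on the local filesystem, so an hdfs:// URI is rejected. One possible workaround, sketched below, is to write the file locally on the driver and then copy it to HDFS with the Hadoop command-line client. This is a sketch under assumptions, not a confirmed fix: it assumes the `hdfs` client is on the PATH of the machine running the script and that the target HDFS directory already exists. The helper name `save_to_hdfs` and its `run` argument are hypothetical and not part of SparkR.

```r
# Sketch only: stage the output on local disk, then copy it into HDFS.
# `save_to_hdfs` is a hypothetical helper, not a SparkR function.
save_to_hdfs <- function(lines, hdfs_dir, file_name = "file.txt", run = TRUE) {
  # writeLines() needs a local path, so write into R's temporary directory first.
  local_path <- file.path(tempdir(), file_name)
  writeLines(lines, con = local_path, sep = "\n")

  # Arguments for `hdfs dfs -put -f <local file> <hdfs dir>`;
  # -f overwrites a file of the same name if one already exists.
  put_args <- c("dfs", "-put", "-f", local_path, hdfs_dir)

  # Only invoke the client when it is actually installed on this machine.
  if (run && nzchar(Sys.which("hdfs"))) {
    status <- system2("hdfs", put_args)
    if (status != 0) stop("hdfs dfs -put failed with exit status ", status)
  }

  # Return what was written and what would be executed, for inspection.
  invisible(list(local_path = local_path, put_args = put_args))
}
```

In the script above, this would stand in for the setwd() and writeLines() calls, for example save_to_hdfs(output, "hdfs://ip-172-31-41-199.us-west-2.compute.internal:8020/user/karun/output/"). Passing run = FALSE only writes the local file and returns the arguments, which is useful for checking the command before touching the cluster.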