Docker (HDFS, Spark, Shiny R)

q0qdq0h2 asked on 2021-05-29 in Hadoop

I have three containers on the same network: a Hadoop container, a Spark container, and a Shiny R container.
I want to read a folder on HDFS from my Shiny app. If Hadoop, Spark, and Shiny R run on the same server (without Docker containers), I can use the following:

    system(paste0("hdfs dfs -ls ", "/"), intern = TRUE)

If I use Docker containers, with Hadoop and Shiny R in separate containers, I cannot do this:

    system(paste0("hdfs dfs -ls ", "/"), intern = TRUE)

because they are isolated from each other.
Do you know how I can do this?
I tried to use sparklyr's invoke functions, but without success.

    > library(sparklyr)
    >
    > conf = spark_config()
    >
    > sc <- spark_connect(master = "local[*]", config = conf)
    Re-using existing Spark connection to local[*]
    >
    > hconf <- sc %>% spark_context() %>% invoke("hadoopConfiguration")
    >
    > path <- 'hdfs://namenode:9000/user/root/input2/'
    >
    > spath <- sparklyr::invoke_new(sc, 'org.apache.hadoop.fs.Path', path)
    > spath
    <jobj[30]>
      org.apache.hadoop.fs.Path
      hdfs://namenode:9000/user/root/input2
    > fs <- invoke_static(sc, "org.apache.hadoop.fs.FileSystem", "get", hconf)
    > fs
    <jobj[32]>
      org.apache.hadoop.fs.LocalFileSystem
      org.apache.hadoop.fs.LocalFileSystem@788cf1b0
    > lls <- invoke(fs, "globStatus", spath)
    Error: java.lang.IllegalArgumentException: Wrong FS: hdfs://namenode:9000/user/root/input2, expected: file:///
        at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:649)
        at org.apache.hadoop.fs.RawLocalFileSystem.pathToFile(RawLocalFileSystem.java:82)
        at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:606)
        at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:824)
        at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:601)
        at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:421)
        at org.apache.hadoop.fs.Globber.getFileStatus(Globber.java:57)
        at org.apache.hadoop.fs.Globber.glob(Globber.java:252)
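
For reference, the "Wrong FS" error above is the generic Hadoop complaint that the FileSystem object is a LocalFileSystem: fs.defaultFS in the Hadoop configuration still points at file:///, so FileSystem.get(hconf) ignores the hdfs:// scheme of the path. A minimal sketch of one way around this (assuming the namenode from the transcript is reachable at hdfs://namenode:9000) is to ask the Path for its own filesystem:

    library(sparklyr)

    sc    <- spark_connect(master = "local[*]")
    hconf <- sc %>% spark_context() %>% invoke("hadoopConfiguration")

    # the Path carries the hdfs:// scheme; Path.getFileSystem(conf) resolves a
    # DistributedFileSystem for it instead of the default LocalFileSystem
    spath <- invoke_new(sc, "org.apache.hadoop.fs.Path", "hdfs://namenode:9000/user/root/input2/")
    fs    <- invoke(spath, "getFileSystem", hconf)
    lls   <- invoke(fs, "globStatus", spath)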

Thanks for your help.

ax6ht2ek #1

I solved this using /var/run/docker.sock: I mounted the host's Docker socket into the Shiny container and changed my Docker setup accordingly. My service is:

    shiny:
      image: anaid/shiny:1.1
      volumes:
        - 'shiny_logs:/var/log/shiny-server'
        - '/var/run/docker.sock:/var/run/docker.sock'
      ports:
        - "3838:3838"

My full docker-compose file is:

    version: "2"
    services:
      namenode:
        image: anaid/hadoop-namenode:1.1
        container_name: namenode
        volumes:
          - hadoop_namenode:/hadoop/dfs/name
          - hadoop_namenode_files:/hadoop/dfs/files
        environment:
          - CLUSTER_NAME=test
        env_file:
          - ./hadoop.env
        ports:
          - 9899:9870
      datanode:
        image: anaid/hadoop-datanode:1.1
        container_name: datanode
        depends_on:
          - namenode
        environment:
          SERVICE_PRECONDITION: "namenode:9870"
        volumes:
          - hadoop_datanode1:/hadoop/dfs/data
          - hadoop_namenode_files1:/hadoop/dfs/files
        env_file:
          - ./hadoop.env
      mongodb:
        image: mongo
        container_name: mongodb
        ports:
          - "27020:27017"
      shiny:
        image: anaid/shiny:1.1
        volumes:
          - 'shiny_logs:/var/log/shiny-server'
          - /Users/anaid/Docker/hadoop_spark/hadoop-spark-master/shiny:/srv/shiny-server/
          - '/var/run/docker.sock:/var/run/docker.sock'
        ports:
          - "3838:3838"
      nodemanager:
        image: anaid/hadoop-nodemanager:1.1
        container_name: nodemanager
        depends_on:
          - namenode
          - datanode
        env_file:
          - ./hadoop.env
      historyserver:
        image: anaid/hadoop-historyserver:1.1
        container_name: historyserver
        depends_on:
          - namenode
          - datanode
        volumes:
          - hadoop_historyserver:/hadoop/yarn/timeline
        env_file:
          - ./hadoop.env
      spark-master:
        image: anaid/spark-master:1.1
        container_name: spark-master
        ports:
          - "9090:8080"
          - "7077:7077"
        volumes:
          - ./apps:/opt/spark-apps
          - ./data:/opt/spark-data
        environment:
          - "SPARK_LOCAL_IP=spark-master"
      spark-worker-1:
        image: anaid/spark-worker:1.1
        container_name: spark-worker-1
        depends_on:
          - spark-master
        environment:
          - SPARK_MASTER=spark://spark-master:7077
          - SPARK_WORKER_CORES=1
          - SPARK_WORKER_MEMORY=30G
          - SPARK_DRIVER_MEMORY=15G
          - SPARK_EXECUTOR_MEMORY=15G
        volumes:
          - ./apps:/opt/spark-apps
          - ./data:/opt/spark-data
        ports:
          - "8083:8081"
      spark-worker-2:
        image: anaid/spark-worker:1.1
        container_name: spark-worker-2
        depends_on:
          - spark-master
        environment:
          - SPARK_MASTER=spark://spark-master:7077
          - SPARK_WORKER_CORES=1
          - SPARK_WORKER_MEMORY=30G
          - SPARK_DRIVER_MEMORY=15G
          - SPARK_EXECUTOR_MEMORY=15G
        volumes:
          - ./apps:/opt/spark-apps
          - ./data:/opt/spark-data
        ports:
          - "8084:8081"
    volumes:
      hadoop_namenode:
      hadoop_datanode1:
      hadoop_namenode_files:
      hadoop_namenode_files1:
      hadoop_historyserver:
      shiny_logs:
      mongo-config:

Then I had to install Docker inside my Shiny container. I added the commands to the Dockerfile. My Shiny Dockerfile is:

    # get shiny server plus tidyverse packages image
    FROM rocker/shiny:3.6.1

    # system libraries of general use
    RUN apt-get update && apt-get install -y \
        sudo

    # Anaid: added for the V8 and sparklyr libraries
    RUN apt-get install -y \
        r-cran-xml \
        openjdk-8-jdk \
        libv8-dev \
        libxml2 \
        libxml2-dev \
        libssl-dev \
        libcurl4-openssl-dev \
        libcairo2-dev \
        libsasl2-dev \
        libssl-dev \
        vim

    RUN sudo apt-get install -y \
        apt-transport-https \
        ca-certificates \
        curl \
        gnupg2 \
        software-properties-common

    # For Docker inside the container
    # Add Docker's official GPG key:
    RUN curl -fsSL https://download.docker.com/linux/debian/gpg | sudo apt-key add -
    RUN sudo apt-key fingerprint 0EBFCD88
    RUN sudo add-apt-repository \
        "deb [arch=amd64] https://download.docker.com/linux/debian \
        $(lsb_release -cs) \
        stable"
    RUN sudo apt-get update

    # Install the latest version of Docker Engine, then pin a known version
    RUN sudo apt-get install -y \
        docker-ce \
        docker-ce-cli \
        containerd.io
    RUN sudo apt-get install -y \
        docker-ce=5:19.03.2~3-0~debian-stretch \
        docker-ce-cli=5:19.03.2~3-0~debian-stretch \
        containerd.io

    # Download and install R packages. They are saved in /usr/local/lib/R/site-library
    RUN R -e "install.packages(c('shiny', 'Rcpp', 'pillar', 'git2r', 'compiler', 'dbplyr', 'r2d3', 'base64enc', 'devtools', 'zeallot', 'digest', 'jsonlite', 'tibble', 'pkgconfig', 'rlang', 'DBI', 'cli', 'rstudioapi', 'yaml', 'parallel', 'withr', 'dplyr', 'httr', 'generics', 'htmlwidgets', 'vctrs', 'askpass', 'rprojroot', 'tidyselect', 'glue', 'forge', 'R6', 'fansi', 'purrr', 'magrittr', 'backports', 'htmltools', 'ellipsis', 'assertthat', 'config', 'utf8', 'openssl', 'crayon', 'shinydashboard', 'BBmisc', 'ggfortify', 'cluster', 'stringr', 'DT', 'plotly', 'ggplot2', 'shinyjs', 'dplyr', 'stats', 'graphics', 'grDevices', 'utils', 'datasets', 'methods', 'base', 'Rtools', 'XML', 'data.table', 'jsonlite', 'yaml'))"
    RUN R -e "install.packages(c('devtools', 'XML', 'data.table', 'jsonlite', 'yaml', 'rlist', 'V8', 'sparklyr'), repos='http://cran.rstudio.com/')"
    RUN R -e "install.packages(c('lattice', 'nlme', 'broom', 'sparklyr', 'shinyalert', 'mongolite', 'jtools'), repos='http://cran.rstudio.com/')"

    ## create directories
    ## RUN mkdir -p /myScripts
    ## copy files
    ## COPY /myScripts/installMissingPkgs.R /myScripts/installMissingPkgs.R
    ## COPY /myScripts/packageList /myScripts/packageList
    ## install R-packages
    ## RUN Rscript /myScripts/installMissingPkgs.R

    # copy the app to the image
    COPY app.R /srv/shiny-server/

    # select port
    EXPOSE 3838

    # allow permission
    RUN sudo chown -R shiny:shiny /srv/shiny-server

    # run app
    CMD ["/usr/bin/shiny-server.sh"]
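
Strictly speaking, only the Docker CLI needs to be present in the image, because the daemon it talks to is the host's, reached through the mounted socket. As a sketch of a sanity check after building the image (assuming the socket is mounted as shown above):

    system("docker --version", intern = TRUE)  # the CLI is installed in the image
    system("docker info", intern = TRUE)       # succeeds only if the host socket is reachable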

Using docker commands via the R system() function inside the Docker container
I then had a problem using the R system() function in the app. This is the error:

    Got permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock: Get http://%2Fvar%2Frun%2Fdocker.sock/v1.40/containers/namenode/json: dial unix /var/run/docker.sock: connect: permission denied
    Warning in system(paste0("docker exec -it namenode hdfs dfs -ls ", dir), :
      running command 'docker exec -it namenode hdfs dfs -ls /' had status 1

I solved this by running the following (in the Shiny container):

    sudo chmod 666 /var/run/docker.sock

Then, in the app, I added USER=root:

  1. system("USER=root")
  2. system("docker exec namenode hdfs dfs -ls /", intern = TRUE)

The code of my simple app using system():

    library(shiny)
    library(tools)
    library(stringi)

    ui <- fluidPage(
      h3(textOutput("system"))
    )

    server <- function(input, output, session) {
      rv <- reactiveValues(syst = NULL)
      observe({
        # pwd
        # docker ps working
        system("USER=root")
        rv$syst <- paste(system("docker exec namenode hdfs dfs -ls /", intern = TRUE),
                         system("ls", intern = TRUE))
      })
      output$system <- renderText({
        rv$syst
      })
    }

    shinyApp(ui, server)
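
As the warning earlier shows, system(..., intern = TRUE) raises a warning when the command exits with a non-zero status. A defensive variant of the docker call (an illustrative sketch, not from the original answer) surfaces the failure in the UI instead of breaking the observer:

    safe_hdfs_ls <- function(dir = "/") {
      tryCatch(
        paste(system(paste0("docker exec namenode hdfs dfs -ls ", dir), intern = TRUE),
              collapse = "\n"),
        warning = function(w) paste("hdfs dfs -ls failed:", conditionMessage(w)),
        error   = function(e) paste("hdfs dfs -ls failed:", conditionMessage(e))
      )
    }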

My Shiny app is now working (using system()).

