Docker (HDFS, Spark, Shiny R)

q0qdq0h2 posted on 2021-05-29 in Hadoop

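I have three containers on the same network: a Hadoop container, a Spark container, and a Shiny R container.
I want to read a folder on HDFS from my Shiny app. If Hadoop, Spark, and Shiny R are on the same server (without Docker containers), I can use this: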

system(paste0("hdfs dfs -ls ", "/"), intern = TRUE)

If I use Docker containers, with Hadoop and Shiny R in different containers, I cannot do this:

system(paste0("hdfs dfs -ls ", "/"), intern = TRUE)

because the containers are isolated from each other.
Do you know how I can do this?
I tried using sparklyr's invoke functions, but without success.

> library(sparklyr)
>
> conf = spark_config()
>
> sc <- spark_connect(master = "local[*]", config = conf)
Re-using existing Spark connection to local[*]
>
> hconf <- sc %>% spark_context() %>% invoke("hadoopConfiguration")
>
> path <- 'hdfs://namenode:9000/user/root/input2/'
>
> spath <- sparklyr::invoke_new(sc, 'org.apache.hadoop.fs.Path', path)
> spath
<jobj[30]>
  org.apache.hadoop.fs.Path
  hdfs://namenode:9000/user/root/input2
> fs <- invoke_static(sc, "org.apache.hadoop.fs.FileSystem", "get",  hconf)
> fs
<jobj[32]>
  org.apache.hadoop.fs.LocalFileSystem
  org.apache.hadoop.fs.LocalFileSystem@788cf1b0
> lls <- invoke(fs, "globStatus", spath)
Error: java.lang.IllegalArgumentException: Wrong FS: hdfs://namenode:9000/user/root/input2, expected: file:///
        at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:649)
        at org.apache.hadoop.fs.RawLocalFileSystem.pathToFile(RawLocalFileSystem.java:82)
        at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:606)
        at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:824)
        at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:601)
        at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:421)
        at org.apache.hadoop.fs.Globber.getFileStatus(Globber.java:57)
        at org.apache.hadoop.fs.Globber.glob(Globber.java:252)

Thank you for your help.

ax6ht2ek #1

I solved this by using /var/run/docker.sock, so I changed my docker-compose file. My shiny service is:

shiny:
        image: anaid/shiny:1.1
        volumes:
          - 'shiny_logs:/var/log/shiny-server'
          - '/var/run/docker.sock:/var/run/docker.sock'
        ports:
          - "3838:3838"

My full docker-compose file is:

    version: "2"

    services:
      namenode:
        image: anaid/hadoop-namenode:1.1
        container_name: namenode
        volumes:
          - hadoop_namenode:/hadoop/dfs/name
          - hadoop_namenode_files:/hadoop/dfs/files
        environment:
          - CLUSTER_NAME=test
        env_file:
          - ./hadoop.env
        ports:
          - 9899:9870

      datanode:
        image: anaid/hadoop-datanode:1.1
        container_name: datanode
        depends_on:
          - namenode
        environment:
          SERVICE_PRECONDITION: "namenode:9870"
        volumes:
          - hadoop_datanode1:/hadoop/dfs/data
          - hadoop_namenode_files1:/hadoop/dfs/files
        env_file:
          - ./hadoop.env      

      mongodb:
        image: mongo
        container_name: mongodb
        ports:
          - "27020:27017"

      shiny:
        image: anaid/shiny:1.1
        volumes:
          - 'shiny_logs:/var/log/shiny-server'
          - /Users/anaid/Docker/hadoop_spark/hadoop-spark-master/shiny:/srv/shiny-server/
          - '/var/run/docker.sock:/var/run/docker.sock'
        ports:
          - "3838:3838"

      nodemanager:
        image: anaid/hadoop-nodemanager:1.1
        container_name: nodemanager
        depends_on:
          - namenode
          - datanode
        env_file:
          - ./hadoop.env

      historyserver:
        image: anaid/hadoop-historyserver:1.1
        container_name: historyserver
        depends_on:
          - namenode
          - datanode
        volumes:
          - hadoop_historyserver:/hadoop/yarn/timeline
        env_file:
          - ./hadoop.env

      spark-master:
        image: anaid/spark-master:1.1
        container_name: spark-master
        ports:
          - "9090:8080"
          - "7077:7077"
        volumes:
           - ./apps:/opt/spark-apps
           - ./data:/opt/spark-data
        environment:
          - "SPARK_LOCAL_IP=spark-master"

      spark-worker-1:
        image: anaid/spark-worker:1.1
        container_name: spark-worker-1
        depends_on:
          - spark-master
        environment:
          - SPARK_MASTER=spark://spark-master:7077
          - SPARK_WORKER_CORES=1
          - SPARK_WORKER_MEMORY=30G
          - SPARK_DRIVER_MEMORY=15G
          - SPARK_EXECUTOR_MEMORY=15G
        volumes:
           - ./apps:/opt/spark-apps
           - ./data:/opt/spark-data
        ports:
          - "8083:8081"

      spark-worker-2:
        image: anaid/spark-worker:1.1
        container_name: spark-worker-2
        depends_on:
          - spark-master
        environment:
          - SPARK_MASTER=spark://spark-master:7077
          - SPARK_WORKER_CORES=1
          - SPARK_WORKER_MEMORY=30G
          - SPARK_DRIVER_MEMORY=15G
          - SPARK_EXECUTOR_MEMORY=15G
        volumes:
           - ./apps:/opt/spark-apps
           - ./data:/opt/spark-data
        ports:
          - "8084:8081"

    volumes:
      hadoop_namenode:
      hadoop_datanode1:
      hadoop_namenode_files:
      hadoop_namenode_files1:
      hadoop_historyserver:
      shiny_logs:
      mongo-config:

Then I had to install Docker in my Shiny container. I added the commands to the Dockerfile. My Shiny Dockerfile is:


# get shiny server plus tidyverse packages image

FROM rocker/shiny:3.6.1

# system libraries of general use

RUN apt-get update && apt-get install -y \
    sudo

# Anaid added for V8 and sparklyr library

RUN apt-get install -y \ 
        r-cran-xml \
        openjdk-8-jdk \
        libv8-dev \ 
        libxml2 \ 
        libxml2-dev \ 
        libssl-dev \
        libcurl4-openssl-dev \
        libcairo2-dev \
        libsasl2-dev \
        libssl-dev \
        vim 

RUN sudo apt-get install -y \ 
         apt-transport-https \
         ca-certificates \
         curl \
         gnupg2 \
         software-properties-common

# For docker inside the container

# Add Docker’s official GPG key:

RUN curl -fsSL https://download.docker.com/linux/debian/gpg | sudo apt-key add -

RUN sudo apt-key fingerprint 0EBFCD88

RUN sudo add-apt-repository \
   "deb [arch=amd64] https://download.docker.com/linux/debian \
   $(lsb_release -cs) \
   stable"

RUN sudo apt-get update

# Install the latest version of Docker Engine

RUN sudo apt-get install -y \ 
        docker-ce \ 
        docker-ce-cli \ 
        containerd.io

RUN sudo apt-get install -y \  
        docker-ce=5:19.03.2~3-0~debian-stretch \ 
        docker-ce-cli=5:19.03.2~3-0~debian-stretch \ 
        containerd.io

# Download and install library. They are saved here /usr/local/lib/R/site-library

RUN R -e "install.packages(c('shiny', 'Rcpp', 'pillar', 'git2r', 'dbplyr', 'r2d3', 'base64enc', 'devtools', 'zeallot', 'digest', 'jsonlite', 'tibble', 'pkgconfig', 'rlang', 'DBI', 'cli', 'rstudioapi', 'yaml', 'withr', 'dplyr', 'httr', 'generics', 'htmlwidgets', 'vctrs', 'askpass', 'rprojroot', 'tidyselect', 'glue', 'forge', 'R6', 'fansi', 'purrr', 'magrittr', 'backports', 'htmltools', 'ellipsis', 'assertthat', 'config', 'utf8', 'openssl', 'crayon', 'shinydashboard', 'BBmisc', 'ggfortify', 'cluster', 'stringr', 'DT', 'plotly', 'ggplot2', 'shinyjs', 'XML', 'data.table'))"

RUN R -e "install.packages(c('devtools', 'XML', 'data.table', 'jsonlite', 'yaml', 'rlist', 'V8', 'sparklyr'), repos='http://cran.rstudio.com/')"

RUN R -e "install.packages(c('lattice', 'nlme', 'broom', 'sparklyr', 'shinyalert', 'mongolite', 'jtools'), repos='http://cran.rstudio.com/')"

## create directories

## RUN mkdir -p /myScripts

## copy files

## COPY /myScripts/installMissingPkgs.R /myScripts/installMissingPkgs.R

## COPY /myScripts/packageList /myScripts/packageList

## install R-packages

## RUN Rscript /myScripts/installMissingPkgs.R

# copy the app to the image

COPY app.R /srv/shiny-server/

# select port

EXPOSE 3838

# allow permission

RUN sudo chown -R shiny:shiny /srv/shiny-server

# run app

CMD ["/usr/bin/shiny-server.sh"]

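Using the system function and docker commands inside the Docker container
Then I had some problems when using the R system function in the app. This is the error: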

Got permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock: Get http://%2Fvar%2Frun%2Fdocker.sock/v1.40/containers/namenode/json: dial unix /var/run/docker.sock: connect: permission denied
    Warning in system(paste0("docker exec -it namenode hdfs dfs -ls ", dir),  :
      running command 'docker exec -it namenode hdfs dfs -ls /' had status 1

I solved it by running the following (in the shiny container):

sudo chmod 666 /var/run/docker.sock
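
Before loosening the socket's permissions, it can help to confirm that the socket is really what the current user cannot reach. The sketch below is a hypothetical helper (the name `docker_sock_accessible` is not from the original answer) that reports whether the current user can read and write a given socket path. Note that `chmod 666` lets every user in the container talk to the host's Docker daemon, so adding the `shiny` user to a group that owns the socket may be a narrower alternative.

```shell
# Hypothetical helper: succeeds only if the path is a socket that the
# current user can both read and write.
docker_sock_accessible() {
  sock="${1:-/var/run/docker.sock}"
  [ -S "$sock" ] && [ -r "$sock" ] && [ -w "$sock" ]
}

# Example use inside the shiny container:
#   docker_sock_accessible && echo "socket OK" || echo "no access to socket"
```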

Then I added USER=root in the app:

system("USER=root")
    system("docker exec namenode hdfs dfs -ls /", intern = TRUE)
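
Two details in this step are worth spelling out. Each `system()` call starts its own shell, so the variable set by `system("USER=root")` disappears as soon as that call returns and should not affect the next one. The working command also drops the `-it` flags used in the failing one; `-t` asks for a terminal, which a Shiny server process normally does not have. A minimal shell sketch of the same call, where `hdfs_ls`, `DOCKER_BIN`, and `NAMENODE_CONTAINER` are names assumed for illustration:

```shell
# Hypothetical wrapper around the command from the answer.
# DOCKER_BIN and NAMENODE_CONTAINER are illustrative overrides, not part
# of the original setup; they default to "docker" and "namenode".
hdfs_ls() {
  path="${1:-/}"
  # No -i/-t: the caller is a non-interactive process without a TTY.
  "${DOCKER_BIN:-docker}" exec "${NAMENODE_CONTAINER:-namenode}" hdfs dfs -ls "$path"
}

# Dry run that only prints the arguments instead of calling Docker:
#   DOCKER_BIN=echo hdfs_ls /user/root
```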

The code of my simple app using system():

library(shiny)
library(tools)
library(stringi)

ui <- fluidPage(

  h3(textOutput("system"))

)

server <- function(input, output, session) {

  rv <- reactiveValues(syst = NULL)

  observe({
    # pwd
    # docker ps working
      system("USER=root")
      rv$syst <- paste(system("docker exec namenode hdfs dfs -ls /", intern = TRUE), system("ls", intern = TRUE) ) 
    })

  output$system <- renderText({ 
    rv$syst
  })
}

shinyApp(ui, server)

My Shiny app running (using system)
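
As a possible alternative that avoids mounting the Docker socket, the namenode's WebHDFS REST API can list a directory over HTTP from any container on the same network. This sketch assumes WebHDFS is enabled in the namenode image and reachable at `namenode:9870` (the web port that the compose file maps to 9899 on the host). `webhdfs_ls_url` and `webhdfs_ls` are hypothetical helper names, and `curl` is among the packages the Dockerfile above installs.

```shell
# Build the WebHDFS LISTSTATUS URL for an absolute HDFS path.
# Host and port defaults are assumptions taken from the compose file's
# namenode service.
webhdfs_ls_url() {
  path="${1:-/}"
  host="${2:-namenode}"
  port="${3:-9870}"
  printf 'http://%s:%s/webhdfs/v1%s?op=LISTSTATUS\n' "$host" "$port" "$path"
}

# Fetch the JSON directory listing (needs network access to the namenode).
webhdfs_ls() {
  curl -fsS "$(webhdfs_ls_url "$@")"
}

# Example use from the shiny container:
#   webhdfs_ls /user/root/input2
```

Depending on the HDFS permissions of the target directory, a `user.name` query parameter may also be needed.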
