I am trying to use the pyarrow Filesystem interface with HDFS. When calling the fs.HadoopFileSystem constructor I get a libhdfs.so not found error, even though libhdfs.so is clearly at the indicated location.
from pyarrow import fs
hfs = fs.HadoopFileSystem(host="10.10.0.167", port=9870)
OSError: Unable to load libhdfs: /hadoop-3.3.1/lib/native/libhdfs.so: cannot open shared object file: No such file or directory
I have already tried different Python and pyarrow versions and set ARROW_LIBHDFS_DIR. To reproduce the problem, I used the Dockerfile below on Linux Mint.
FROM openjdk:11
RUN apt-get update &&\
apt-get install wget -y
RUN wget -nv https://dlcdn.apache.org/hadoop/common/hadoop-3.3.1/hadoop-3.3.1-aarch64.tar.gz &&\
tar -xf hadoop-3.3.1-aarch64.tar.gz
ENV PATH=/miniconda/bin:${PATH}
RUN wget -nv https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O miniconda.sh &&\
bash miniconda.sh -b -p /miniconda &&\
conda init
RUN conda install -c conda-forge python=3.9.6
RUN conda install -c conda-forge pyarrow=4.0.1
ENV JAVA_HOME=/usr/local/openjdk-11
ENV HADOOP_HOME=/hadoop-3.3.1
RUN printf 'from pyarrow import fs\nhfs = fs.HadoopFileSystem(host="10.10.0.167", port=9870)\n' > test_arrow.py
# 'python test_arrow.py' fails with ...
# OSError: Unable to load libhdfs: /hadoop-3.3.1/lib/native/libhdfs.so: cannot open shared object file: No such file or directory
RUN python test_arrow.py || true
CMD ["/bin/bash"]
1 Answer
I have created a Dockerfile for the pyarrow fs.HadoopFileSystem client. An HDFS installation is required in order to use the libhdfs.so file.
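The referenced Dockerfile is not reproduced in this excerpt. Below is a minimal sketch of such an image; the x86_64 Hadoop tarball, ARROW_LIBHDFS_DIR, LD_LIBRARY_PATH, and the CLASSPATH export follow the Arrow/libhdfs documentation rather than the original answer, and the host and port are carried over from the question:

FROM openjdk:11
RUN apt-get update &&\
    apt-get install wget -y
# x86_64 Hadoop build, matching the x86_64 Miniconda environment below
RUN wget -nv https://dlcdn.apache.org/hadoop/common/hadoop-3.3.1/hadoop-3.3.1.tar.gz &&\
    tar -xf hadoop-3.3.1.tar.gz
ENV PATH=/miniconda/bin:${PATH}
RUN wget -nv https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O miniconda.sh &&\
    bash miniconda.sh -b -p /miniconda &&\
    conda init
RUN conda install -c conda-forge python=3.9.6 pyarrow=4.0.1
ENV JAVA_HOME=/usr/local/openjdk-11
ENV HADOOP_HOME=/hadoop-3.3.1
# Point pyarrow directly at the directory that contains libhdfs.so
ENV ARROW_LIBHDFS_DIR=${HADOOP_HOME}/lib/native
# libhdfs loads libjvm.so from the JDK at runtime
ENV LD_LIBRARY_PATH=${JAVA_HOME}/lib/server:${HADOOP_HOME}/lib/native
RUN printf 'from pyarrow import fs\nhfs = fs.HadoopFileSystem(host="10.10.0.167", port=9870)\n' > test_arrow.py
# The Hadoop jars must be on the JVM classpath when libhdfs starts the JVM;
# `|| true` keeps the build going if the NameNode is unreachable at build time
RUN bash -c 'export CLASSPATH=$(${HADOOP_HOME}/bin/hadoop classpath --glob) && python test_arrow.py' || true
CMD ["/bin/bash"]

Note that 9870 is the Hadoop 3.x NameNode web UI port; HadoopFileSystem speaks the HDFS RPC protocol, which usually listens on 8020 (or 9000), so the port carried over from the question may also need adjusting for a real cluster.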