I also configured the environment variables required for Python to read from HDFS:
export ARROW_LIBHDFS_DIR='/opt/hadoop/lib/native'
export HADOOP_COMMON_LIB_NATIVE_DIR='/opt/hadoop/lib/native'
export HADOOP_OPTS="-Djava.library.path=/opt/hadoop/lib/"
The output of ls $ARROW_LIBHDFS_DIR is:
libhadoop.a libhadooppipes.a libhdfs.so libnativetask.so
libhadoop.so libhadooputils.a libhdfs.so.0.0.0 libnativetask.so.1.0.0
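A quick preflight check of the environment can help rule out configuration gaps before calling pandas. The sketch below is only an illustration: `check_libhdfs_env` is a hypothetical helper, not part of pyarrow or pandas. It assumes, following the pyarrow HDFS documentation, that `JAVA_HOME`, `HADOOP_HOME` and `CLASSPATH` matter in addition to `ARROW_LIBHDFS_DIR`. Those variables may already be set in a part of the setup that is not shown here.

```python
import os
from pathlib import Path


def check_libhdfs_env(env):
    """Return a list of human-readable problems found in `env` (a dict-like).

    Hypothetical helper: it only inspects variables and files, it does not
    try to load libhdfs or start a JVM.
    """
    problems = []

    # ARROW_LIBHDFS_DIR should point at the directory that holds libhdfs.so.
    lib_dir = env.get("ARROW_LIBHDFS_DIR")
    if not lib_dir:
        problems.append("ARROW_LIBHDFS_DIR is not set")
    elif not (Path(lib_dir) / "libhdfs.so").is_file():
        problems.append("libhdfs.so not found in " + lib_dir)

    # Assumption: these are the other variables libhdfs relies on.
    for name in ("JAVA_HOME", "HADOOP_HOME", "CLASSPATH"):
        if not env.get(name):
            problems.append(name + " is not set")

    return problems


if __name__ == "__main__":
    for problem in check_libhdfs_env(os.environ) or ["no obvious problems found"]:
        print(problem)
```

If `CLASSPATH` is reported as missing, the usual way to populate it is from the output of `hadoop classpath --glob`. An empty result from this check does not guarantee that the read will succeed.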
My Python code:
import pandas as pd
pd.read_parquet('hdfs:///tmp/data/test.parquet', engine='pyarrow')
The error I get:
WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
hdfsGetPathInfo(hdfs:///tmp/data/test.parquet): getFileInfo error:
ClassCastException: org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$GetFileInfoRequestProto cannot be cast to com.google.protobuf.Message
java.lang.ClassCastException: org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$GetFileInfoRequestProto cannot be cast to com.google.protobuf.Message
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:225)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
at com.sun.proxy.$Proxy11.getFileInfo(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:900)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
at com.sun.proxy.$Proxy12.getFileInfo(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1654)
at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1579)
at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1576)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1591)
at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1734)
hdfsGetPathInfo(hdfs:///tmp/data/test.parquet): getFileInfo error:
IllegalStateException: java.lang.IllegalStateException
at com.google.common.base.Preconditions.checkState(Preconditions.java:129)
at org.apache.hadoop.ipc.Client.setCallIdAndRetryCount(Client.java:117)
at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:162)
at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
at com.sun.proxy.$Proxy12.getFileInfo(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1654)
at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1579)
at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1576)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1591)
at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1734)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python3.5/dist-packages/pandas/io/parquet.py", line 296, in read_parquet
return impl.read(path, columns=columns,**kwargs)
File "/usr/local/lib/python3.5/dist-packages/pandas/io/parquet.py", line 125, in read
path, columns=columns,**kwargs
File "/usr/local/lib/python3.5/dist-packages/pyarrow/parquet.py", line 1544, in read_table
partitioning=partitioning)
File "/usr/local/lib/python3.5/dist-packages/pyarrow/parquet.py", line 1173, in __init__
open_file_func=partial(_open_dataset_file, self._metadata)
File "/usr/local/lib/python3.5/dist-packages/pyarrow/parquet.py", line 1368, in _make_manifest
.format(path))
OSError: Passed non-file path: hdfs:///tmp/data/test.parquet
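A ClassCastException involving com.google.protobuf.Message is the kind of error that can appear when classes from incompatible Hadoop or protobuf jars are loaded into the same JVM. Nothing in the trace above confirms that this is the cause here, so the following is only a diagnostic sketch. `find_version_conflicts` is a hypothetical helper that scans a classpath string for Hadoop or protobuf artifacts present in more than one version. It assumes jars are named `<artifact>-<version>.jar` and that wildcard entries have already been expanded (for example by `hadoop classpath --glob`).

```python
import os
import re
from collections import defaultdict

# Matches "<artifact>-<version>[-classifier].jar", e.g. "protobuf-java-2.5.0.jar"
# or "hadoop-common-3.2.1-tests.jar". Assumption: dotted numeric versions.
_JAR_RE = re.compile(r"^(?P<name>.+?)-(?P<version>\d+(?:\.\d+)*)(?:-[\w.]+)?\.jar$")


def find_version_conflicts(classpath, prefixes=("hadoop-", "protobuf-")):
    """Map artifact name -> sorted versions, for artifacts seen in 2+ versions.

    Hypothetical helper: purely a file-name heuristic, it never opens the jars.
    """
    versions = defaultdict(set)
    for entry in classpath.split(os.pathsep):
        match = _JAR_RE.match(os.path.basename(entry))
        if match and match.group("name").startswith(prefixes):
            versions[match.group("name")].add(match.group("version"))
    return {name: sorted(found) for name, found in versions.items() if len(found) > 1}


if __name__ == "__main__":
    conflicts = find_version_conflicts(os.environ.get("CLASSPATH", ""))
    for name, found in sorted(conflicts.items()):
        print(name + ": " + ", ".join(found))
```

An empty result only means that no duplicate versions were visible in the file names. Directories and unexpanded `*` entries on the classpath are not inspected.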