Accessing an HDFS cluster from pydoop

az31mfrm asked on 2022-12-09 in HDFS

I have an HDFS cluster and Python on the same Google Cloud Platform. I want to access files stored on the HDFS cluster from Python. I found that this can be done with pydoop, but I am struggling to pass it the right arguments. Here is the code I have tried so far:

import pydoop.hdfs as hdfs
import pydoop

pydoop.hdfs.hdfs(host='url of the file system goes here',
                 port=9864, user=None, groups=None)

"""
 class pydoop.hdfs.hdfs(host='default', port=0, user=None, groups=None)

    A handle to an HDFS instance.

    Parameters

            host (str) – hostname or IP address of the HDFS NameNode. Set to an empty string (and port to 0) to connect to the local file system; set to 'default' (and port to 0) to connect to the default (i.e., the one defined in the Hadoop configuration files) file system.

            port (int) – the port on which the NameNode is listening

            user (str) – the Hadoop domain user name. Defaults to the current UNIX user. Note that, in MapReduce applications, since tasks are spawned by the JobTracker, the default user will be the one that started the JobTracker itself.

            groups (list) – ignored. Included for backwards compatibility.

"""

#print (hdfs.ls("/vs_co2_all_2019_v1.csv"))

It gives the following error:

RuntimeError: Hadoop config not found, try setting HADOOP_CONF_DIR
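
The error suggests that pydoop cannot locate the Hadoop configuration files. A common way to address this is to set HADOOP_CONF_DIR before importing pydoop; a minimal sketch, assuming the configuration lives at /etc/hadoop/conf (a typical location on a Dataproc node; adjust the path for your cluster):

import os

# Point pydoop at the directory containing core-site.xml and
# hdfs-site.xml. The path below is an assumption.
os.environ["HADOOP_CONF_DIR"] = "/etc/hadoop/conf"

import pydoop.hdfs as hdfs

# With the config in place, host='default' and port=0 connect to
# the file system defined in the Hadoop configuration files.
fs = hdfs.hdfs(host="default", port=0)
print(fs.list_directory("/"))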

And if I run this line on its own:

print (hdfs.ls("/vs_co2_all_2019_v1.csv"))

nothing happens at all. But this "vs_co2_all_2019_v1.csv" file does exist on the cluster; it just happened to be temporarily unavailable when I took the screenshot.
My HDFS screenshot is shown below:

The credentials I have look like this:

Can anyone tell me what I am doing wrong? Which credentials do I need to put where in the pydoop API? Or maybe there is another, simpler way to solve this; any help would be greatly appreciated!


zbwhf8kr #1

Have you tried the following?

import pydoop.hdfs as hdfs
import pydoop

hdfs_object = pydoop.hdfs.hdfs(host='url of the file system goes here',
                               port=9864, user=None, groups=None)
hdfs_object.list_directory("/vs_co2_all_2019_v1.csv")

Or simply:

hdfs_object.list_directory("/")

Keep in mind that the module-level functions in pydoop.hdfs (such as hdfs.ls) are not directly related to the hdfs class (your hdfs_object): the module functions open their own connection to the default file system, as illustrated below.
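
A minimal sketch of the distinction (host='default' relies on the Hadoop configuration files being found, e.g. via HADOOP_CONF_DIR):

import pydoop.hdfs as hdfs

# Module-level function: connects to the default file system from
# the Hadoop configuration and returns a list of path strings.
print(hdfs.ls("/"))

# Instance method: the connection is explicit, and list_directory
# returns a list of dicts with metadata (name, size, owner, ...).
fs = hdfs.hdfs(host="default", port=0)
print(fs.list_directory("/"))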
