Listing folders and files in HDFS using Java

dluptydi posted on 2021-06-03 in Hadoop

I am trying to list all directories and files in HDFS using Java.

Configuration configuration = new Configuration();
FileSystem fs = FileSystem.get(new URI("hdfs://ip address"), configuration);
FileStatus[] fileStatus = fs.listStatus(new Path("hdfs://ip address/user/uname/"));
Path[] paths = FileUtil.stat2Paths(fileStatus);
for(FileStatus status : fileStatus){
    System.out.println(status.getPath().toString());
}

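My code is able to create the fs object, but it gets stuck on the third line (the fs.listStatus(...) call), where it tries to read the folders and files. I am using AWS.
Please help me solve this problem.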
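One possible cause of a call that seems to hang, offered here as an assumption rather than something confirmed in this thread, is that the NameNode RPC port is not reachable from the client (for example, blocked by an AWS security group), so the client keeps retrying the connection. Creating the FileSystem object typically does not contact the NameNode yet, so the first real call such as listStatus is where a connectivity problem shows up. A quick way to rule this out is a plain TCP connect with a timeout. The sketch below uses only java.net; the class NameNodeReachability, the method canConnect, and the default host and port in main are hypothetical placeholders.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class NameNodeReachability {

    // Hypothetical diagnostic: can a plain TCP connection to host:port be opened
    // within timeoutMs? Uses only java.net; it does not speak the HDFS protocol.
    public static boolean canConnect(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            // connection refused, timed out, or host could not be resolved
            return false;
        }
    }

    public static void main(String[] args) {
        // placeholder values: pass your NameNode host and RPC port as arguments
        String host = args.length > 0 ? args[0] : "localhost";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 9000;
        System.out.println(host + ":" + port + " reachable: " + canConnect(host, port, 5000));
    }
}
```

If this returns false from the machine running the client, the problem is network access to the NameNode rather than the listing code itself.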


fcwjkofz1#

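Check the method below, which gets the list of files using either a recursive or a non-recursive approach. To get a list of directories instead, change the code so that it adds directory paths, rather than file paths, to the result list: look at the if/else clauses built on fs.isFile() and fs.isDirectory(), which decide whether a path is treated as a file or a directory. The FileStatus class also has an isDirectory() method that checks whether a FileStatus instance refers to a directory.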

import java.io.IOException;
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;
import java.util.Queue;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// helper method (place inside a class) to get the list of files from an HDFS path
public static List<String> listFilesFromHDFSPath(Configuration hadoopConfiguration,
                                                 String hdfsPath,
                                                 boolean recursive)
        throws IOException, IllegalArgumentException
{
    // resulting list of files
    List<String> filePaths = new ArrayList<String>();

    // get the path from the string and then its file system
    Path path = new Path(hdfsPath);  // throws IllegalArgumentException for an invalid path
    FileSystem fs = path.getFileSystem(hadoopConfiguration);

    // if the recursive approach is requested
    if (recursive)
    {
        // walk the tree with an explicit queue instead of recursive method calls,
        // so a very deep directory tree cannot overflow the call stack
        Queue<Path> fileQueue = new LinkedList<Path>();

        // add the starting path to the queue
        fileQueue.add(path);

        // while the queue is not empty
        while (!fileQueue.isEmpty())
        {
            // take the next path from the queue
            Path filePath = fileQueue.remove();

            // filePath refers to a file
            if (fs.isFile(filePath))
            {
                filePaths.add(filePath.toString());
            }
            else   // otherwise filePath refers to a directory
            {
                // list the entries of the directory and add them to the queue
                FileStatus[] fileStatuses = fs.listStatus(filePath);
                for (FileStatus fileStatus : fileStatuses)
                {
                    fileQueue.add(fileStatus.getPath());
                } // for
            } // else
        } // while
    } // if
    else   // non-recursive approach: only the immediate children are listed
    {
        // if the given hdfsPath is a directory
        if (fs.isDirectory(path))
        {
            FileStatus[] fileStatuses = fs.listStatus(path);

            // loop over all file statuses
            for (FileStatus fileStatus : fileStatuses)
            {
                // if the status is a file, add it to the resulting list
                if (fileStatus.isFile())
                    filePaths.add(fileStatus.getPath().toString());
            } // for
        } // if
        else   // otherwise it is a file
        {
            // return the one and only file path in the resulting list
            filePaths.add(path.toString());
        } // else
    } // else

    // do not call fs.close() here: getFileSystem() normally returns a cached
    // FileSystem instance shared by the whole JVM, and closing it breaks later callers

    // return the resulting list
    return filePaths;
} // listFilesFromHDFSPath
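As a rough illustration of the directory-listing variant described in this answer, here is a stdlib-only sketch of the same queue-based walk that collects directories instead of files. It is a stand-in, not the HDFS code: it runs on the local file system through java.nio.file rather than Hadoop's FileSystem, and the class QueueListing and the method listDirectories are hypothetical names. With the Hadoop API, the corresponding check would be FileStatus.isDirectory() on each entry returned by fs.listStatus().

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.List;

public class QueueListing {

    // Breadth-first walk with an explicit queue, like listFilesFromHDFSPath,
    // but collecting directory paths instead of file paths.
    // NOTE: runs on the local file system via java.nio.file, not on HDFS.
    public static List<String> listDirectories(Path root, boolean recursive) throws IOException {
        List<String> dirPaths = new ArrayList<>();
        if (!Files.isDirectory(root)) {
            return dirPaths; // a plain file (or missing path) has no sub-directories
        }
        Deque<Path> queue = new ArrayDeque<>();
        queue.add(root);
        while (!queue.isEmpty()) {
            Path dir = queue.remove();
            try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
                for (Path entry : entries) {
                    // do not follow symbolic links, so link cycles cannot loop forever
                    if (Files.isDirectory(entry, LinkOption.NOFOLLOW_LINKS)) {
                        dirPaths.add(entry.toString());
                        if (recursive) {
                            queue.add(entry); // visit its children later
                        }
                    }
                }
            }
        }
        Collections.sort(dirPaths); // directory iteration order is unspecified
        return dirPaths;
    }

    public static void main(String[] args) throws IOException {
        Path start = Paths.get(args.length > 0 ? args[0] : ".");
        for (String dir : listDirectories(start, true)) {
            System.out.println(dir);
        }
    }
}
```

Passing recursive = false mirrors the non-recursive branch of the answer: only the immediate sub-directories of the starting path are returned.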

azpvetkf2#

This works for me.

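import java.io.IOException;
import java.net.URI;
import java.net.URISyntaxException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public static void main(String[] args) throws IOException, URISyntaxException {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(new URI("hdfs://localhost:9000/"), conf);
    FileStatus[] fileStatus = fs.listStatus(new Path("hdfs://localhost:9000/"));
    for(FileStatus status : fileStatus){
        System.out.println(status.getPath().toString());
    }
}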

Output

hdfs://localhost:9000/All.txt
hdfs://localhost:9000/department.txt
hdfs://localhost:9000/emp.tsv
hdfs://localhost:9000/employee.txt
hdfs://localhost:9000/hbase

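I think the URI you are passing is incorrect; try it the way the code above does.
If the configuration is not set up, you have to add the resource files: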

conf.addResource(new Path("/home/kishore/BigData/hadoop/etc/hadoop/core-site.xml"));
conf.addResource(new Path("/home/kishore/BigData/hadoop/etc/hadoop/hdfs-site.xml"));
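To make the point about an incorrect URI concrete, the following hypothetical helper (HdfsUriCheck.problems) parses a URI string with java.net.URI and reports obvious problems before it is handed to FileSystem.get(...). It is a stdlib-only sketch, not part of Hadoop. A literal placeholder such as "hdfs://ip address" is not even a valid URI because of the space, while a URI without a port is flagged only as a hint: Hadoop then falls back to a default NameNode port, which may not be the one your cluster listens on.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.ArrayList;
import java.util.List;

public class HdfsUriCheck {

    // Hypothetical helper: lists obvious problems with a NameNode URI string.
    // An empty result means the URI has an hdfs scheme, a host, and a port.
    public static List<String> problems(String uri) {
        List<String> problems = new ArrayList<>();
        URI parsed;
        try {
            parsed = new URI(uri);
        } catch (URISyntaxException e) {
            problems.add("not a valid URI: " + e.getReason());
            return problems;
        }
        if (!"hdfs".equals(parsed.getScheme())) {
            problems.add("scheme is not hdfs");
        }
        if (parsed.getHost() == null) {
            problems.add("no host");
        }
        if (parsed.getPort() == -1) {
            problems.add("no port"); // a hint only: Hadoop would use a default port
        }
        return problems;
    }

    public static void main(String[] args) {
        String uri = args.length > 0 ? args[0] : "hdfs://localhost:9000/";
        System.out.println(uri + " -> " + problems(uri));
    }
}
```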
