Delete HDFS folder from Java

Posted by amrnrhlw on 2021-06-03 in Hadoop


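In a Java application running on an edge node, I need to delete an HDFS folder if it exists. I have to do this before running a MapReduce job (with Spark) that writes its output to that folder.
I found that I could use the method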

org.apache.hadoop.fs.FileUtil.fullyDelete(new File(url))

However, it only works with local folders (i.e., file URLs on the machine it runs on). I tried something like this:

url = "hdfs://hdfshost:port/the/folder/to/delete";

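with hdfs://hdfshost:port being the HDFS namenode IPC address. I use the same address for MapReduce, so it is correct. But the call does nothing.
So what URL should I use, or is there another way?
Note: here is the simple project.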
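A quick way to see why nothing happens: `FileUtil.fullyDelete` takes a `java.io.File`, and `java.io.File` has no notion of URL schemes, so the `hdfs://...` string is treated as an ordinary, relative local path name. The JDK-only sketch below illustrates this; the port `8020` is an assumed placeholder for the real namenode port, and `asLocalFile` is a hypothetical helper.

```java
import java.io.File;

public class LocalFileDemo {
    // Hypothetical helper: java.io.File does not parse URL schemes,
    // so the whole string simply becomes a local path name
    static File asLocalFile(String url) {
        return new File(url);
    }

    public static void main(String[] args) {
        // 8020 is an assumed placeholder for the namenode IPC port
        File f = asLocalFile("hdfs://hdfshost:8020/the/folder/to/delete");
        System.out.println("path      = " + f.getPath());
        System.out.println("absolute? = " + f.isAbsolute());
        // No local file has this name, so a local delete has nothing to act on
        System.out.println("exists?   = " + f.exists());
    }
}
```

On a Unix-like system the path is reported as relative and `exists()` is false, which matches the "does nothing" behaviour described above.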

Java hadoop hdfs

Source: https://stackoverflow.com/questions/28767607/delete-hdfs-folder-from-java

3 answers


Answer #1 (bybem2ql)

If you need to delete all the files in a directory:
1) Check how many files the directory contains.
2) Then delete them all.

// Assumed fields: String namenode = "hdfs://" + ip + ":" + port;
//                 FileSystem hdfsFileSystem (already connected to that namenode)
public void delete_archivos_dedirectorio() throws IOException {

    Path directorio = new Path(namenode + "/test/"); // the directory whose contents are deleted

    // 1) List what is currently in the directory before doing anything
    FileStatus[] fileStatus = hdfsFileSystem.listStatus(directorio);
    int archivos_basura = fileStatus.length; // number of entries to delete
    System.out.println("Deleting " + archivos_basura + " entries from " + directorio);

    // 2) Delete every listed entry, whatever its name
    for (FileStatus status : fileStatus) {
        try {
            hdfsFileSystem.delete(status.getPath(), true); // true: also removes subdirectories
        } catch (IOException ex) {
            // Report the failure and continue with the remaining entries
            System.out.println(ex.getMessage());
        }
    }
}

Good luck :)

Answered 2021-06-04

Answer #2 (5lwkijsr)

This is how I did it:

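import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.*;

Configuration conf = new Configuration();
// Explicit scheme bindings; helpful when a shaded/fat jar loses the FileSystem service registrations
conf.set("fs.hdfs.impl", org.apache.hadoop.hdfs.DistributedFileSystem.class.getName());
conf.set("fs.file.impl", org.apache.hadoop.fs.LocalFileSystem.class.getName());
FileSystem hdfs = FileSystem.get(URI.create("hdfs://<namenode-hostname>:<port>"), conf);
hdfs.delete(new Path("/path/to/your/file"), true); // true = recursive, required for a non-empty directory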

You don't need hdfs://hdfshost:port/ in the file path, because the FileSystem object is already bound to that namenode.
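To see why the path alone is enough: `FileSystem.get(URI, conf)` has already fixed the scheme (`hdfs`) and the authority (the namenode address), so only the path component of a full URL is needed, and Hadoop resolves a scheme-less absolute `Path` against the file system it is used with. The JDK-only sketch below just splits a full HDFS URL into those three parts with `java.net.URI`; the host, port and the `split` helper are hypothetical placeholders.

```java
import java.net.URI;

public class HdfsUriParts {
    // Hypothetical helper: returns {scheme, authority, path} for a URL like hdfs://host:port/dir
    static String[] split(String url) {
        URI uri = URI.create(url);
        return new String[] {uri.getScheme(), uri.getAuthority(), uri.getPath()};
    }

    public static void main(String[] args) {
        // Placeholder namenode host and port
        String[] parts = split("hdfs://hdfshost:8020/the/folder/to/delete");
        System.out.println("scheme    = " + parts[0]); // hdfs: selects the FileSystem implementation
        System.out.println("authority = " + parts[1]); // hdfshost:8020: the address given to FileSystem.get
        System.out.println("path      = " + parts[2]); // /the/folder/to/delete: what delete() needs
    }
}
```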

Answered 2021-06-04

Answer #3 (svmlkihl)

This worked for me.
I just added the following code to my WordCount program:

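import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.*;
import org.apache.hadoop.mapreduce.Job;

...
Configuration conf = new Configuration();

Path output = new Path("/the/folder/to/delete");
// Bind explicitly to HDFS; a bare path could otherwise resolve to the local file system
FileSystem hdfs = FileSystem.get(URI.create("hdfs://namenode:port"), conf);

// Delete the existing output directory (true = recursive) so the job can recreate it
if (hdfs.exists(output)) {
    hdfs.delete(output, true);
}

Job job = Job.getInstance(conf, "word count");
...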

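You need to pass hdfs://hdfshost:port explicitly to get the distributed file system. Otherwise, unless fs.defaultFS in a core-site.xml on the classpath already points to HDFS, the code works only against the local file system.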
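Which file system you get depends on the URI's scheme; when no scheme is given, Hadoop falls back to `fs.defaultFS`, whose built-in default is `file:///` unless a `core-site.xml` on the classpath overrides it. The sketch below is a simplified, JDK-only model of that selection rule, not Hadoop's actual code; `chooseScheme` and the plain `Map` standing in for `Configuration` are hypothetical.

```java
import java.net.URI;
import java.util.Map;

public class SchemeChoice {
    // Simplified model (hypothetical helper): use the URI's own scheme if present,
    // otherwise the scheme of fs.defaultFS, which defaults to file:/// (local disk)
    static String chooseScheme(String uriOrPath, Map<String, String> conf) {
        String scheme = URI.create(uriOrPath).getScheme();
        if (scheme != null) {
            return scheme;
        }
        String defaultFs = conf.getOrDefault("fs.defaultFS", "file:///");
        return URI.create(defaultFs).getScheme();
    }

    public static void main(String[] args) {
        Map<String, String> noSiteConfig = Map.of(); // no core-site.xml on the classpath
        Map<String, String> withSiteConfig = Map.of("fs.defaultFS", "hdfs://namenode:8020");

        System.out.println(chooseScheme("/the/folder/to/delete", noSiteConfig));   // file
        System.out.println(chooseScheme("/the/folder/to/delete", withSiteConfig)); // hdfs
        System.out.println(chooseScheme("hdfs://namenode:8020/the/folder/to/delete", noSiteConfig)); // hdfs
    }
}
```

The first line is the situation the answer warns about: a bare path with no site configuration lands on the local file system.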

Answered 2021-06-04
