Random reads/writes in HDFS

Posted by disbfnqx on 2021-06-03 in Hadoop


I read that there are no random reads or writes in Hadoop HDFS. However, the arguments to write in DFSOutputStream are

void write(byte buf[], int off, int len)
void write(int b)

Similarly, the arguments to read in DFSInputStream are

int read(byte buf[], int off, int len)

int read()

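An offset parameter appears in both the read and the write calls to HDFS. Why is it required if the MapReduce framework only ever adds data at the last position? How is the "offset" parameter used in HDFS? Are HDFS writes always append-only?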

Java hadoop hdfs FileSystems

Source: https://stackoverflow.com/questions/19192059/random-read-writes-in-hdfs

2 Answers


dohp0rv51#

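The parameter int off does not denote a random position in the input file. It is the offset within the byte[] at which the data starts being stored, for up to len bytes. For example, suppose you have written

byte[] buf = new byte[15];
read(buf, 5, 10);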

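This reads data from the beginning of the input file, not from the 5th byte of the file. The array buf, however, is filled starting at index 5 and ending at index 14 (that is, from off through off+len-1).
To cross-check, try a few different values for the parameter off. Whatever value you pass for off, the data is read from the stream's current position, which is the beginning of the file for a freshly opened stream unless you explicitly call seek.
One thing to note here is that the size of the array must not be less than off+len.
Run this example for a clear understanding: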

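import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReadHdfsFile {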

    public static void main(String[] args) throws IOException {

        Configuration conf = new Configuration();
        conf.addResource(new Path("/Users/miqbal1/hadoop-eco/hadoop-1.1.2/conf/core-site.xml"));
        conf.addResource(new Path("/Users/miqbal1/hadoop-eco/hadoop-1.1.2/conf/hdfs-site.xml"));
        FileSystem fs = FileSystem.get(conf);
        FSDataInputStream in = fs.open(new Path("/demo.txt"));

        //Filling the array b1 starting at index 5 (b1[0..4] stay zero)
        int charPos = 0;
        byte[] b1 = new byte[10];
        int bytesRead = in.read(b1, 5, 5);
        System.out.println("Bytes Read : " + bytesRead);
        String s = new String(b1, "UTF-8");
        System.out.println("Printing char by char(you'll see first 5 bytes as blank)...");
        for(char c : s.toCharArray()){
            System.out.println("Character " + ++charPos + " : " + c);

        }
        System.out.println();
        System.out.println("Changing offset value....");

        //Seek back to the start of the file, then fill the array b2 starting at index 10
        in.seek(0);
        charPos = 0;
        byte[] b2 = new byte[15];
        bytesRead = in.read(b2, 10, 5);
        System.out.println("Bytes Read : " + bytesRead);
        s = new String(b2, "UTF-8");
        System.out.println("Printing char by char(you'll see first 10 bytes as blank)...");
        for(char c : s.toCharArray()){
            System.out.println("Character " + ++charPos + " : " + c);
        }

        System.out.println("DONE!!!");
        in.close();
        fs.close();
    }
}
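
The same buffer-offset behaviour can be checked without a cluster. The sketch below is not HDFS code: it uses java.io.ByteArrayInputStream, whose read(byte[], int, int) follows the same general InputStream contract, and the class name BufferOffsetDemo, the helper readInto, and the sample data "HELLOWORLD" are made up for illustration. It shows that off only changes where the bytes land in the array, not where they are taken from in the source.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class BufferOffsetDemo {

    // Hypothetical helper: reads up to len bytes from the START of data into a
    // fresh buffer of size bufSize, storing them at index off, and returns the buffer.
    static byte[] readInto(byte[] data, int bufSize, int off, int len) {
        byte[] buf = new byte[bufSize];
        ByteArrayInputStream in = new ByteArrayInputStream(data);
        // off and len describe a region of buf, not a position in the source.
        in.read(buf, off, len);
        return buf;
    }

    public static void main(String[] args) {
        byte[] data = "HELLOWORLD".getBytes(StandardCharsets.US_ASCII);
        // In both calls the bytes of "HELLO" are read; only their index in the buffer differs.
        System.out.println(Arrays.toString(readInto(data, 10, 5, 5)));
        System.out.println(Arrays.toString(readInto(data, 15, 10, 5)));
    }
}
```

In both calls the untouched leading elements of the buffer keep their default value 0, which is why the Hadoop example above prints "blank" characters for them.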

HTH


vd8tlhqk2#

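"bytesRead = in.read(b2, 10, 5);" is just one of the read interfaces of FSDataInputStream. Another interface, read(position, buffer, offset, length), supports random reads. You can also refer to the random read case in TestDFSIO.
HDFS does not support random writes.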
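
To illustrate what a positional read means, here is a minimal local-file analogue. It is an assumption-laden sketch, not HDFS code: it uses java.nio.channels.FileChannel.read(ByteBuffer, long) on a temporary file in place of FSDataInputStream.read(long position, byte[] buffer, int offset, int length), and the class name PositionalReadDemo, the helper readAt, and the sample contents are invented for the example. The point it shows is that a positional read fetches bytes from an arbitrary file position without moving the stream's own current position.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class PositionalReadDemo {

    // Hypothetical helper: writes a small sample file, reads up to len bytes at
    // the given file position, and returns { text read, channel position afterwards }.
    static String[] readAt(long position, int len) {
        try {
            Path tmp = Files.createTempFile("positional-read", ".txt");
            try {
                Files.write(tmp, "0123456789ABCDEF".getBytes(StandardCharsets.US_ASCII));
                try (FileChannel ch = FileChannel.open(tmp, StandardOpenOption.READ)) {
                    ByteBuffer buf = ByteBuffer.allocate(len);
                    int total = 0;
                    // A single call may return fewer bytes than requested, so loop until full or EOF.
                    while (buf.hasRemaining()) {
                        int n = ch.read(buf, position + total); // positional read
                        if (n < 0) break;
                        total += n;
                    }
                    String text = new String(buf.array(), 0, total, StandardCharsets.US_ASCII);
                    return new String[] { text, Long.toString(ch.position()) };
                }
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String[] result = readAt(10, 4);
        System.out.println("Read: " + result[0] + ", channel position afterwards: " + result[1]);
    }
}
```

Against HDFS itself, the corresponding call would be the positional read on FSDataInputStream mentioned in this answer; for writes there is no equivalent, since data can only be added at the end of a file.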

赞(0）回复(0）举报 2021-06-03