A RawComparator for comparing TextPair byte representations

Posted by twh00eeo on 2021-06-02 in Hadoop


The book Hadoop: The Definitive Guide contains the following class:

public static class Comparator extends WritableComparator {
    private static final Text.Comparator TEXT_COMPARATOR = new Text.Comparator();

    public Comparator() {
        super(TextPair.class);
    }

    @Override
    public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        try {
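            // Bytes occupied by the first Text field: vint header size plus the payload length it encodes.
            int firstL1 = WritableUtils.decodeVIntSize(b1[s1]) + readVInt(b1, s1);
            int firstL2 = WritableUtils.decodeVIntSize(b2[s2]) + readVInt(b2, s2);
            // Compare the first fields; fall through to the second fields only on a tie.
            int cmp = TEXT_COMPARATOR.compare(b1, s1, firstL1, b2, s2, firstL2);
            if (cmp != 0) {
                return cmp;
            }
            return TEXT_COMPARATOR.compare(b1, s1 + firstL1, l1 - firstL1, b2, s2 + firstL2, l2 - firstL2);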
        } catch (IOException e) {
            throw new IllegalArgumentException(e);
        }
    }
}

static {
    WritableComparator.define(TextPair.class, new Comparator());
}

What I don't understand is this line:

int firstL1 = WritableUtils.decodeVIntSize(b1[s1]) + readVInt(b1, s1);

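As the book puts it:
"The subtle part of this code is calculating firstL1 and firstL2, the lengths of the first Text field in each byte stream. Each is made up of the length of the variable-length integer (returned by decodeVIntSize() on WritableUtils) and the value it is encoding (returned by readVInt())."
From my understanding, WritableUtils.decodeVIntSize(b1[s1]) is just the length of the first Text field (the number of bytes), and the expression readVInt(b1, s1) is the content of the field. This is where I am confused. Could somebody explain it to me? Thanks in advance.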

hadoop IO

Source: https://stackoverflow.com/questions/38342416/a-rawcomparator-for-comparing-textpair-byte-representations

2 Answers


ruarlubt1#

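Note that a VInt is used here rather than an int. The first byte of a VInt indicates how many bytes the encoding occupies (a small value fits entirely in that one byte). decodeVIntSize(b1[s1]) returns the number of bytes the VInt itself occupies, while readVInt(b1, s1) reads the value it encodes, which is the actual number of bytes in the string.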
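To make the header/value distinction concrete, here is a minimal sketch of a VInt-style encoding for non-negative lengths. It is a simplified stand-in written for illustration, not Hadoop's own code: the class VIntSketch and its methods encode, decodeSize and readValue are hypothetical names, and the marker-byte scheme (values up to 127 stored directly, longer values prefixed by a negative marker byte) is an assumption modelled on how WritableUtils is understood to behave.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical, simplified re-implementation for illustration only.
class VIntSketch {
    // Encode a non-negative value: one byte if it fits in 0..127,
    // otherwise a marker byte followed by big-endian payload bytes.
    public static byte[] encode(int value) {
        if (value < 0) {
            throw new IllegalArgumentException("sketch handles non-negative values only");
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        if (value <= 127) {
            out.write(value); // the single byte is the value itself
            return out.toByteArray();
        }
        int payload = 0;
        for (int tmp = value; tmp != 0; tmp >>>= 8) {
            payload++; // count the bytes needed for the value
        }
        out.write(-112 - payload); // marker: -113 = 1 payload byte, -114 = 2, ...
        for (int i = payload - 1; i >= 0; i--) {
            out.write((value >>> (8 * i)) & 0xFF);
        }
        return out.toByteArray();
    }

    // Total size of the encoding (marker plus payload), read from the first byte alone.
    public static int decodeSize(byte first) {
        if (first >= -112) {
            return 1;
        }
        if (first >= -120) {
            return -111 - first;
        }
        throw new IllegalArgumentException("negative multi-byte values are outside this sketch");
    }

    // The value that the encoding starting at b[start] represents.
    public static int readValue(byte[] b, int start) {
        int first = b[start];
        if (first >= -112) {
            return first;
        }
        int payload = decodeSize(b[start]) - 1;
        int value = 0;
        for (int i = 0; i < payload; i++) {
            value = (value << 8) | (b[start + 1 + i] & 0xFF);
        }
        return value;
    }

    public static void main(String[] args) {
        for (int length : new int[] {3, 300}) {
            byte[] header = encode(length);
            int size = decodeSize(header[0]);
            int value = readValue(header, 0);
            // A field with this header would occupy size + value bytes in total.
            System.out.println("header bytes=" + size + ", value=" + value
                    + ", field bytes=" + (size + value));
        }
    }
}
```

Under these assumptions a 3-byte string has a 1-byte header, while a 300-byte string has a 3-byte header, which is why the comparator cannot hard-code the header size and has to add both numbers.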

2021-06-02

ozxc1zmp2#

A simple test like the following makes this clear:

public static void main(String[] args) throws Exception {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    DataOutputStream outDat = new DataOutputStream(out);
    TextPair tp1 = new TextPair("Pig", "Li");
    Comparator cmp = new TextPair.Comparator();
    // Serialize the pair so the raw bytes can be inspected.
    tp1.write(outDat);
    byte[] b1 = out.toByteArray();
    outDat.close();
    // Size in bytes of the vint header in front of the first Text field.
    System.out.println(WritableUtils.decodeVIntSize(b1[0]));
    // Value stored in that header: the byte length of the first Text field.
    System.out.println(WritableComparator.readVInt(b1, 0));
}

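The output is:
1
3
This is because 1 is the length of the variable-length header (1 byte), since a single byte is enough to hold the number "3", and 3 is the length of the text: 3 characters, and therefore 3 bytes (each of the three characters fits in one byte). The first field therefore occupies 1 + 3 = 4 bytes in total, which is the value firstL1 holds.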
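The same arithmetic can be traced without Hadoop on the classpath. The sketch below builds by hand the byte layout that a pair ("Pig", "Li") is assumed to have: each field is a length header followed by its UTF-8 bytes, and both lengths are at most 127 so each header is a single byte equal to the length. The class TextPairLayoutSketch and its methods serialize and firstFieldLength are hypothetical names used only for this illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical stand-in for the serialized form of a two-field text pair.
class TextPairLayoutSketch {
    // Lay out both fields as [length header][UTF-8 bytes], one after the other.
    // Assumes each field is at most 127 bytes, so each header is one byte.
    public static byte[] serialize(String first, String second) {
        byte[] f = first.getBytes(StandardCharsets.UTF_8);
        byte[] s = second.getBytes(StandardCharsets.UTF_8);
        if (f.length > 127 || s.length > 127) {
            throw new IllegalArgumentException("sketch assumes one-byte headers");
        }
        byte[] out = new byte[2 + f.length + s.length];
        out[0] = (byte) f.length;
        System.arraycopy(f, 0, out, 1, f.length);
        out[1 + f.length] = (byte) s.length;
        System.arraycopy(s, 0, out, 2 + f.length, s.length);
        return out;
    }

    // Bytes occupied by the first field: header size plus the length it encodes.
    public static int firstFieldLength(byte[] b, int start) {
        int headerSize = 1;     // plays the role of decodeVIntSize(b[start])
        int payload = b[start]; // plays the role of readVInt(b, start)
        return headerSize + payload;
    }

    public static void main(String[] args) {
        byte[] b = serialize("Pig", "Li");
        System.out.println(Arrays.toString(b)); // [3, 80, 105, 103, 2, 76, 105]
        int firstL = firstFieldLength(b, 0);
        System.out.println("first field: offset 0, length " + firstL);
        System.out.println("second field: offset " + firstL + ", length " + (b.length - firstL));
    }
}
```

With this layout the first field spans bytes 0 to 3 and the second field starts at offset 4, which mirrors the s1 + firstL1 and l1 - firstL1 arguments passed to the second comparison in the question's comparator.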

2021-06-02
