HBase MapReduce job: all column values are null

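I am trying to create a MapReduce job in Java over a table in an HBase database. Using the examples here and other material from the internet, I managed to write a simple row counter. However, my attempt to write a job that actually does something with the data in a column was unsuccessful, because the bytes I receive are always null.
Part of my job driver is: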

/* Set main, map and reduce classes */
job.setJarByClass(Driver.class);
job.setMapperClass(Map.class);
job.setReducerClass(Reduce.class);

Scan scan = new Scan();
scan.setCaching(500);
scan.setCacheBlocks(false);

/* Get data only from the last 24h */
Timestamp timestamp = new Timestamp(System.currentTimeMillis());
try {
    long now = timestamp.getTime();
    scan.setTimeRange(now - 24 * 60 * 60 * 1000, now);
} catch (IOException e) {
    e.printStackTrace();
}

/* Initialize the initTableMapperJob */
TableMapReduceUtil.initTableMapperJob(
        "dnsr",
        scan,
        Map.class,
        Text.class,
        Text.class,
        job);

/* Set output parameters */
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(Text.class);
job.setOutputFormatClass(TextOutputFormat.class);

As you can see, the table is called dnsr. My mapper looks like this:

@Override
public void map(ImmutableBytesWritable row, Result value, Context context)
        throws InterruptedException, IOException {
    byte[] columnValue = value.getValue("d".getBytes(), "fqdn".getBytes());
    if (columnValue == null)
        return;

    byte[] firstSeen = value.getValue("d".getBytes(), "fs".getBytes());
    // if (firstSeen == null)
    //     return;

    String fqdn = new String(columnValue).toLowerCase();
    String fs = (firstSeen == null) ? "empty" : new String(firstSeen);

    context.write(new Text(fqdn), new Text(fs));
}

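Some notes:
The dnsr table has a single column family, d. It has multiple columns, among them fqdn and fs (firstSeen);
Even though the fqdn values come through correctly, fs is always the string "empty" (I added this check after getting errors saying that null cannot be converted to a new String);
If I replace the fs column name with something else, for example ls (lastSeen), it works;
The reducer does nothing, it just outputs everything it receives.
I wrote a simple table scanner in JavaScript that queries exactly the same table and columns, and I can clearly see that the values are there. Running the queries manually from the command line, I can also clearly see that the fs values are not null; they are bytes that can later be converted to a string (representing a date).
Why do I always get null?
Thanks!
Update: If I fetch all the columns in a specific column family, fs is not among them. However, the simple scanner implemented in JavaScript does return fs as a column of the dnsr table.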

@Override
public void map(ImmutableBytesWritable row, Result value, Context context)
        throws InterruptedException, IOException {
    byte[] columnValue = value.getValue(columnFamily, fqdnColumnName);
    if (columnValue == null)
        return;
    String fqdn = new String(columnValue).toLowerCase();

    /* Getting all the columns */
    String[] cns = getColumnsInColumnFamily(value, "d");
    StringBuilder sb = new StringBuilder();
    for (String s : cns) {
        sb.append(s).append(";");
    }

    context.write(new Text(fqdn), new Text(sb.toString()));
}

I used the answer here to get all the column names.

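In the end, I managed to find the "problem". HBase is a column-oriented data store. Data is stored and retrieved by column, so if you only need part of the data, only the relevant part is read. Each column family has one or more column qualifiers (columns), and each column has multiple cells. The interesting part is that every cell has its own timestamp.
Why is this a problem? Well, when you do a time-range scan, only the cells whose timestamp falls within that range are returned, so you may get a row with "missing cells". In my case, I have a DNS record with other fields such as firstSeen and lastSeen. lastSeen is updated every time I see the domain, while firstSeen stays unchanged after the first occurrence, so its cell is usually older than 24 hours and falls outside the scan's time range. When I changed the time-ranged MapReduce job to a plain MapReduce job (using all-time data), everything worked fine (but the job takes longer to complete).
Cheers!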
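The effect can be reproduced without a cluster. The sketch below is a toy model in plain Java, not the HBase client API: the class TimeRangeSketch, its Cell type, and the sample values are all hypothetical. It assumes that fqdn and ls are rewritten on every sighting while fs is written only once, and it applies the same half-open interval that a scan time range uses (minimum timestamp inclusive, maximum exclusive) to each cell separately.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy model (plain Java, not the HBase client API) of per-cell time-range filtering. */
public class TimeRangeSketch {

    /** One cell: column qualifier, write timestamp in milliseconds, and value. */
    static final class Cell {
        final String qualifier;
        final long timestamp;
        final String value;

        Cell(String qualifier, long timestamp, String value) {
            this.qualifier = qualifier;
            this.timestamp = timestamp;
            this.value = value;
        }
    }

    static final long HOUR_MS = 60L * 60 * 1000;
    static final long DAY_MS = 24 * HOUR_MS;

    /** Returns the cells of one row whose timestamp lies in [minStamp, maxStamp). */
    static Map<String, String> scanRow(List<Cell> row, long minStamp, long maxStamp) {
        Map<String, String> visible = new TreeMap<>();
        for (Cell cell : row) {
            if (cell.timestamp >= minStamp && cell.timestamp < maxStamp) {
                visible.put(cell.qualifier, cell.value);
            }
        }
        return visible;
    }

    /** Hypothetical row: fs written 30 days ago, fqdn and ls rewritten one hour ago. */
    static List<Cell> sampleRow(long now) {
        List<Cell> row = new ArrayList<>();
        row.add(new Cell("fqdn", now - HOUR_MS, "example.com"));
        row.add(new Cell("fs", now - 30 * DAY_MS, "first-seen"));
        row.add(new Cell("ls", now - HOUR_MS, "last-seen"));
        return row;
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        List<Cell> row = sampleRow(now);
        // Last 24 hours only: the old fs cell is filtered out of the row.
        System.out.println(scanRow(row, now - DAY_MS, now));
        // Full time range: every cell of the row is returned.
        System.out.println(scanRow(row, 0L, Long.MAX_VALUE));
    }
}
```

With the 24-hour window the row comes back without fs, which matches what the mapper saw; with the full range all three columns are present.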
