Finding the number of rows with empty values in HBase

ht4b089n  posted on 2021-05-29  in Hadoop

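I have populated an HBase table with a row ID and various pieces of information related to each tweet, such as the clean text, URL, hashtags, and so on, as shown below: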

    902221655086211073 column=clean-tweet:clean-text-cta, timestamp=1514793745304, value=democrat mayor order hurricane harvey stand houston

However, while populating the table I noticed that some rows have an empty value, for example:

    902487280543305728 column=clean-tweet:clean-text-cta, timestamp=1514622371008, value=

How can I find the number of rows that actually contain data?
Any help would be appreciated.

hts6caw3


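As of now, the HBase shell has no built-in way to do this. You could use a simple program like the one below to count the records that have no value for a given column qualifier. It scans only the requested column and counts the rows whose cell value is empty. Run it as:
CountAndFilter [table name] [column family] [column qualifier]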

    import java.io.IOException;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;

    public class CountAndFilter {

        // Usage: CountAndFilter <table name> <column family> <column qualifier>
        public static void main(String[] args) throws IOException {
            if (args.length != 3) {
                System.err.println("Usage: CountAndFilter <table name> <column family> <column qualifier>");
                System.exit(1);
            }
            // try-with-resources closes the connection even if the scan fails.
            try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create())) {
                long recordsWithoutValue = countEmpty(conn, args[0], args[1], args[2]);
                System.out.println("Records with empty value : " + recordsWithoutValue);
            }
        }

        // Scans a single column and counts the rows whose cell has a zero-length value.
        public static long countEmpty(Connection conn, String tableName, String columnFamily,
                String columnQualifier) throws IOException {
            byte[] family = Bytes.toBytes(columnFamily);
            byte[] qualifier = Bytes.toBytes(columnQualifier);
            long recordsWithoutValue = 0;
            // The table and the scanner are both closed automatically.
            try (Table table = conn.getTable(TableName.valueOf(tableName));
                    ResultScanner rs = table.getScanner(new Scan().addColumn(family, qualifier))) {
                for (Result res : rs) {
                    if (res.containsEmptyColumn(family, qualifier)) {
                        recordsWithoutValue++;
                    }
                }
            }
            return recordsWithoutValue;
        }
    }
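
The question asks for the number of rows that do contain data, while the program above counts the empty ones. The two counts are complementary: every row returned by the scan either has a zero-length value or it does not. The following is only a minimal sketch of that counting logic, using an in-memory map in place of real scan results so that it runs without a cluster. The class `EmptyValueCounter` and its `count` method are hypothetical names introduced for illustration and are not part of HBase; the sketch also assumes that a cell shown as `value=` holds a zero-length byte array.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of HBase: tallies empty and non-empty cell values.
class EmptyValueCounter {

    // Returns {rowsWithData, rowsWithEmptyValue} for a map of row key -> cell value.
    static long[] count(Map<String, byte[]> valuesByRow) {
        long withData = 0;
        long empty = 0;
        for (byte[] value : valuesByRow.values()) {
            // Assumption: an empty cell is a missing or zero-length byte array.
            if (value == null || value.length == 0) {
                empty++;
            } else {
                withData++;
            }
        }
        return new long[] {withData, empty};
    }

    public static void main(String[] args) {
        // The two sample rows from the question stand in for scan results.
        Map<String, byte[]> rows = new LinkedHashMap<>();
        rows.put("902221655086211073",
                "democrat mayor order hurricane harvey stand houston".getBytes(StandardCharsets.UTF_8));
        rows.put("902487280543305728", new byte[0]);

        long[] counts = count(rows);
        System.out.println("Rows with data : " + counts[0]);
        System.out.println("Rows with empty value : " + counts[1]);
    }
}
```

In the scanning program, the same idea amounts to keeping a second counter that is incremented whenever the empty-column check is false, so a single pass over the table yields both numbers.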