Convert a JavaPairRDD<ImmutableBytesWritable, Result> to a JavaRDD<String>

tktrz96b posted on 2021-06-08 in HBase

I am trying to read data from HBase using Apache Spark. I only want to scan one specific column. I create an RDD over the HBase data as follows:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableInputFormat;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;

SparkConf sparkConf = new SparkConf().setAppName("HBaseRead").setMaster("local[2]");
JavaSparkContext sc = new JavaSparkContext(sparkConf);
Configuration conf = HBaseConfiguration.create();
conf.set("hbase.zookeeper.quorum", "localhost:2181");

String tableName = "myTable";

conf.set(TableInputFormat.INPUT_TABLE, tableName);
conf.set(TableInputFormat.SCAN_COLUMN_FAMILY, "myCol");

JavaPairRDD<ImmutableBytesWritable, Result> hBaseRDD = sc.newAPIHadoopRDD(conf, TableInputFormat.class,
        ImmutableBytesWritable.class, Result.class);
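
A note on scanning just one column: SCAN_COLUMN_FAMILY restricts the scan to a whole column family, not to a single column. TableInputFormat also honors a SCAN_COLUMNS property that takes space-separated "family:qualifier" pairs, which narrows the scan to the column itself. A minimal sketch, assuming the family is myCol and using a hypothetical qualifier firstName:

// Scan only the myCol:firstName column rather than the whole myCol family.
// "firstName" is an illustrative qualifier, not from the original code.
conf.set(TableInputFormat.SCAN_COLUMNS, "myCol:firstName");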

What I want here is to convert this JavaPairRDD into a JavaRDD of String:

JavaRDD<String> rdd = ...

How can I do that?


ndh0cuux #1

You can get a JavaRDD<String> by applying a map function, as shown below.

import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.function.Function;
import scala.Tuple2;

JavaRDD<String> javaRDD = javaPairRdd.map(new Function<Tuple2<ImmutableBytesWritable, Result>, String>() {
    @Override
    public String call(Tuple2<ImmutableBytesWritable, Result> tuple) throws Exception {
        Result result = tuple._2;
        String rowKey = Bytes.toString(result.getRow()); // row key, in case you need it
        // Use your actual family/qualifier; the question scans the "myCol" family.
        String fName = Bytes.toString(result.getValue(Bytes.toBytes("myCol"), Bytes.toBytes("firstName")));
        return fName; // value of the firstName column
    }
});
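
With Java 8 lambdas (which Spark's Java API accepts, since Function is a functional interface), the same transformation is more compact. One caveat: Result.getValue returns null when a row has no cell for that column, and Bytes.toString passes that null through, so filtering such values out may be sensible. A sketch under the same assumptions (family myCol, hypothetical qualifier firstName):

// Lambda form of the same map; drops rows where the column was absent.
JavaRDD<String> javaRDD = javaPairRdd
        .map(tuple -> Bytes.toString(
                tuple._2.getValue(Bytes.toBytes("myCol"), Bytes.toBytes("firstName"))))
        .filter(value -> value != null); // getValue returned null for missing cells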
