Store Spark DF to HBase

Posted by c9qzyr3d on 2021-06-08 in HBase

I am trying to store a Spark Dataset to HBase efficiently. When we try to do something like this with a lambda in Java:

sparkDF.foreach(l->this.hBaseConnector.persistMappingToHBase(l,"name_of_hBaseTable") );

The function persistMappingToHBase uses the HBase Java client (Put) to store the row in HBase.

I get an exception: Exception in thread "main" org.apache.spark.SparkException: Task not serializable
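Why the first attempt fails, as a minimal sketch: Spark serializes the function passed to `foreach` with Java serialization before shipping it to the executors, and a lambda that reads `this.hBaseConnector` captures the enclosing object, so that object (and the connector it holds) would have to be serializable. The dependency-free example below reproduces the effect with plain `java.io` serialization only. `FakeHBaseConnector`, `DriverSide` and `SerializableConsumer` are hypothetical stand-ins, not Spark or HBase classes.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.function.Consumer;

public class ClosureSerializationDemo {

    // Hypothetical stand-in for the question's HBaseConnector. Like a real
    // client holding sockets, it does not implement Serializable.
    public static class FakeHBaseConnector {
        public void persist(String row, String table) {
            // no-op in this sketch
        }
    }

    // A serializable function type, similar in spirit to Spark's Java function interfaces.
    public interface SerializableConsumer<T> extends Consumer<T>, Serializable {}

    // Plays the role of `this` in the question: an object that owns the connector.
    public static class DriverSide {
        private final FakeHBaseConnector hBaseConnector = new FakeHBaseConnector();

        // Reading the field captures `this`, so serializing the lambda
        // means serializing DriverSide (and, through it, the connector).
        public SerializableConsumer<String> rowWriter() {
            return l -> this.hBaseConnector.persist(l, "name_of_hBaseTable");
        }

        // Nothing is captured here: the connector is created where the lambda runs.
        public SerializableConsumer<String> rowWriterWithLocalConnector() {
            return l -> new FakeHBaseConnector().persist(l, "name_of_hBaseTable");
        }
    }

    // Serializes an object with plain Java serialization; returns "ok" or the failure.
    public static String trySerialize(Object o) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return "ok";
        } catch (NotSerializableException e) {
            return "NotSerializableException: " + e.getMessage();
        } catch (IOException e) {
            return "IOException: " + e;
        }
    }

    public static void main(String[] args) {
        DriverSide driver = new DriverSide();
        System.out.println(trySerialize(driver.rowWriter()));
        System.out.println(trySerialize(driver.rowWriterWithLocalConnector()));
    }
}
```

Running `main` prints a `NotSerializableException` for the first lambda and `ok` for the second. Creating the client inside the function, as the `foreachPartition` attempt does, avoids capturing it.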

Then we tried this:

sparkDF.foreachPartition(partition -> {
    final HBaseConnector hBaseConnector = new HBaseConnector();
    hBaseConnector.connect(hbaseProps);
    while (partition.hasNext()) {
        hBaseConnector.persistMappingToHBase(partition.next());
    }
    hBaseConnector.closeConnection();
});

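This seems to work, but it seems rather inefficient; I guess that is because we create and close a connection for each row of the DataFrame.
What is a good way to store a Spark Dataset to HBase? I have seen the connector developed by IBM, but I have never used it.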
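One note on the `foreachPartition` attempt: as written, it opens one connection per partition, not one per row. If writes are still slow, a common next step is to buffer the Puts and send them in batches; HBase's Java client offers this through `Connection.getBufferedMutator(TableName)`, which buffers `mutate` calls and sends them when its write buffer fills or when `flush()` or `close()` is called. The sketch below only simulates that pattern without HBase or Spark: `FakeClient`, `Stats` and `run` are hypothetical names, and each inner list stands in for one partition.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class PartitionWriterSketch {

    // Counters for the simulation: clients opened and batches sent.
    public static final class Stats {
        public int opened;
        public int flushes;
    }

    // Hypothetical stand-in for an HBase connection plus a buffered writer.
    static final class FakeClient implements AutoCloseable {
        private final List<String> buffer = new ArrayList<>();
        private final int batchSize;
        private final Stats stats;

        FakeClient(int batchSize, Stats stats) {
            this.batchSize = batchSize;
            this.stats = stats;
            stats.opened++; // one "connection" opened
        }

        // Buffer a row; send the batch once it reaches batchSize.
        void mutate(String row) {
            buffer.add(row);
            if (buffer.size() >= batchSize) {
                flush();
            }
        }

        // Send whatever is buffered as one batch (one simulated round trip).
        void flush() {
            if (!buffer.isEmpty()) {
                stats.flushes++;
                buffer.clear();
            }
        }

        // Closing flushes the remaining rows, as BufferedMutator.close() does.
        @Override
        public void close() {
            flush();
        }
    }

    // What would run inside foreachPartition: one client for the whole partition.
    static void writePartition(Iterator<String> rows, int batchSize, Stats stats) {
        try (FakeClient client = new FakeClient(batchSize, stats)) {
            while (rows.hasNext()) {
                client.mutate(rows.next());
            }
        }
    }

    // Simulates a foreachPartition over the given partitions.
    public static Stats run(List<List<String>> partitions, int batchSize) {
        Stats stats = new Stats();
        for (List<String> partition : partitions) {
            writePartition(partition.iterator(), batchSize, stats);
        }
        return stats;
    }

    public static void main(String[] args) {
        List<List<String>> partitions = List.of(
                List.of("r1", "r2", "r3", "r4", "r5"),
                List.of("r6", "r7"),
                List.of("r8", "r9", "r10"));
        Stats stats = run(partitions, 2);
        System.out.println(stats.opened + " clients, " + stats.flushes + " flushes");
    }
}
```

With the three sample partitions (5, 2 and 3 rows) and a batch size of 2, `main` prints `3 clients, 6 flushes`: one client per partition, with each partition's rows sent in batches of at most two.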

Tags: java, sql, hbase, apache-spark

Source: https://stackoverflow.com/questions/47507607/store-spark-df-to-hbase

1 answer

wnavrhmk (answer #1)

The following can be used to save content to HBase:

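import java.util.UUID
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.Put
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.TableOutputFormat
import org.apache.hadoop.hbase.util.Bytes
import org.apache.hadoop.mapreduce.Job

val hbaseConfig = HBaseConfiguration.create()
hbaseConfig.set("hbase.zookeeper.quorum", "xx.xxx.xxx.xxx")
hbaseConfig.set("hbase.zookeeper.property.clientPort", "2181")
val job = Job.getInstance(hbaseConfig)
job.setOutputFormatClass(classOf[TableOutputFormat[_]])
job.getConfiguration.set(TableOutputFormat.OUTPUT_TABLE, "test_table")
// saveAsNewAPIHadoopDataset is defined on pair RDDs, so map over the DataFrame's RDD
val result = sparkDF.rdd.map(row => {
  //  Using UUID as my rowkey, you can use your own rowkey
  val put = new Put(Bytes.toBytes(UUID.randomUUID().toString))
  //  setting the value of each row to Put object
  // ....
  // ....
  (new ImmutableBytesWritable(), put)
})
//  save result to hbase table
result.saveAsNewAPIHadoopDataset(job.getConfiguration)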

I have the following dependencies in my build.sbt file:

libraryDependencies += "org.apache.hbase" % "hbase-common" % "1.3.0"
libraryDependencies += "org.apache.hbase" % "hbase-client" % "1.3.0"
libraryDependencies += "org.apache.hbase" % "hbase-server" % "1.3.0"

Answered 2021-06-09
