HBase serialization error when inserting data from an RDD

qnyhuwrf asked on 2021-06-08 in HBase

I'm running into a problem while trying to insert data into HBase. I'm running Scala code on the Google Cloud spark-shell and trying to insert data from an RDD into HBase (Bigtable).

Format of hbaseRDD: RDD[(String, Map[String, String])]
The String is the row key, and the Map holds the corresponding column qualifiers and their values.
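For reference, an RDD of this shape can be built in the spark-shell roughly as follows (the row keys and column values here are made-up placeholders, and sc is the SparkContext the shell provides):

import org.apache.spark.rdd.RDD

// Hypothetical sample rows, only to illustrate the RDD[(String, Map[String, String])] shape.
val hbaseRDD: RDD[(String, Map[String, String])] = sc.parallelize(Seq(
  ("row-001", Map("page" -> "home", "visits" -> "12")),
  ("row-002", Map("page" -> "checkout", "visits" -> "3"))
))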
My code is below:

import org.apache.hadoop.hbase.TableName
import org.apache.hadoop.hbase.client.Put
import org.apache.hadoop.hbase.util.Bytes
import com.google.cloud.bigtable.hbase.BigtableConfiguration

val tableName: String = "omniture"

val connection = BigtableConfiguration.connect("*******", "**********")
val admin = connection.getAdmin()
val table = connection.getTable(TableName.valueOf(tableName))

TRY 1:

hbaseRDD.foreach { w =>
  val put = new Put(Bytes.toBytes(w._1))
  val columnValue = w._2

  columnValue.foreach { x =>
    put.addColumn(Bytes.toBytes("u"), Bytes.toBytes(x._1), Bytes.toBytes(x._2))
  }
  table.put(put)
}

TRY 2:

hbaseRDD.map { w =>
  val put = new Put(Bytes.toBytes(w._1))
  val columnValue = w._2

  columnValue.map { x =>
    put.addColumn(Bytes.toBytes("u"), Bytes.toBytes(x._1), Bytes.toBytes(x._2))
  }
  table.put(put)
}

Here is the error I get:

org.apache.spark.SparkException: Task not serializable
Caused by: java.io.NotSerializableException: com.google.cloud.bigtable.hbase.BigtableTable
Serialization stack:
        - object not serializable (class: com.google.cloud.bigtable.hbase.BigtableTable, value: BigtableTable{hashCode=0x7d96618, project=cdp-dev-201706-01, instance=cdp-dev-cl-hbase-instance, table=omniture, host=bigtable.googleapis.com})
        - field (class: logic.ingestion.Ingestion$$anonfun$insertTransactionData$1, name: table$1, type: interface org.apache.hadoop.hbase.client.Table)
        - object (class logic.ingestion.Ingestion$$anonfun$insertTransactionData$1, <function1>)
        at org.apache.spark.serializer.SerializationDebugger$.improveException(SerializationDebugger.scala:40)
        at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:46)
        at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:100)
        at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:295)
        ... 27 more

Any help would be greatly appreciated. Thanks in advance.

yshpjwxd (answer 1)

Referenced from: Writing to HBase via Spark: Task not serializable
The correct way to do it is below:

hbaseRDD.foreachPartition { w =>
  val tableName: String = "omniture"

  val connection = BigtableConfiguration.connect("cdp-dev-201706-01", "cdp-dev-cl-hbase-instance")
  val admin = connection.getAdmin()
  val table = connection.getTable(TableName.valueOf(tableName))

  w.foreach { f =>
    val put = new Put(Bytes.toBytes(f._1))
    val columnValue = f._2

    columnValue.foreach { x =>
      put.addColumn(Bytes.toBytes("u"), Bytes.toBytes(x._1), Bytes.toBytes(x._2))
    }
    table.put(put)
  }
}

hbaseRDD.collect()

This is explained in more detail at the link above.
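One detail the snippet above leaves out is releasing the per-partition resources. A variation that closes the table and connection once the partition has been written might look like the following sketch (same imports as above; the project and instance IDs are the ones from the question):

hbaseRDD.foreachPartition { partition =>
  // The connection is created on the executor, inside the partition,
  // so nothing unserializable is captured from the driver.
  val connection = BigtableConfiguration.connect("cdp-dev-201706-01", "cdp-dev-cl-hbase-instance")
  val table = connection.getTable(TableName.valueOf("omniture"))

  try {
    partition.foreach { case (rowKey, columns) =>
      val put = new Put(Bytes.toBytes(rowKey))
      columns.foreach { case (qualifier, value) =>
        put.addColumn(Bytes.toBytes("u"), Bytes.toBytes(qualifier), Bytes.toBytes(value))
      }
      table.put(put)
    }
  } finally {
    // Release the per-partition resources once the writes are done.
    table.close()
    connection.close()
  }
}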
