I am running into a problem while trying to insert data into HBase. I am running Scala code in the Google Cloud Spark shell and trying to insert the data from an RDD into HBase (Bigtable).
Format of hbaseRDD: RDD[(String, Map[String, String])]
The String is the row id, and the Map contains the corresponding columns and their values.
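For illustration only, a hypothetical sample of that shape (the real row ids and column values are not shown here):

import org.apache.spark.rdd.RDD

// Hypothetical sample data: row id -> (column qualifier -> value)
val hbaseRDD: RDD[(String, Map[String, String])] = sc.parallelize(Seq(
  ("row-001", Map("pageName" -> "home", "visitNum" -> "3")),
  ("row-002", Map("pageName" -> "checkout", "visitNum" -> "1"))
))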
The code is as follows:
val tableName: String = "omniture";
val connection = BigtableConfiguration.connect("*******", "**********")
val admin = connection.getAdmin();
val table = connection.getTable(TableName.valueOf(tableName));
TRY 1:
hbaseRDD.foreach { w =>
  val put = new Put(Bytes.toBytes(w._1));
  var ColumnValue = w._2
  ColumnValue.foreach { x =>
    put.addColumn(Bytes.toBytes("u"), Bytes.toBytes(x._1), Bytes.toBytes(x._2));
  }
  table.put(put);
}
TRY 2:
hbaseRDD.map { w =>
  val put = new Put(Bytes.toBytes(w._1));
  var ColumnValue = w._2
  ColumnValue.map { x =>
    put.addColumn(Bytes.toBytes("u"), Bytes.toBytes(x._1), Bytes.toBytes(x._2));
  }
  table.put(put);
}
Below is the error I am getting:
org.apache.spark.SparkException: Task not serializable
Caused by: java.io.NotSerializableException: com.google.cloud.bigtable.hbase.BigtableTable
Serialization stack:
- object not serializable (class: com.google.cloud.bigtable.hbase.BigtableTable, value: BigtableTable{hashCode=0x7d96618, project=cdp-dev-201706-01, instance=cdp-dev-cl-hbase-instance, table=omniture, host=bigtable.googleapis.com})
- field (class: logic.ingestion.Ingestion$$anonfun$insertTransactionData$1, name: table$1, type: interface org.apache.hadoop.hbase.client.Table)
- object (class logic.ingestion.Ingestion$$anonfun$insertTransactionData$1, <function1>)
at org.apache.spark.serializer.SerializationDebugger$.improveException(SerializationDebugger.scala:40)
at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:46)
at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:100)
at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:295)
... 27 more
Any help would be greatly appreciated. Thanks in advance.
1 Answer
Quoted from: Writing to HBase via Spark: Task not Serializable
Here is the correct approach:
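A minimal sketch of that pattern, assuming the same Bigtable table ("omniture", column family "u") and the masked project/instance ids from the question ("project-id" / "instance-id" below are placeholders): the Connection and Table are created inside foreachPartition, so they live on the executors and nothing non-serializable is captured in the closure shipped from the driver.

import org.apache.hadoop.hbase.TableName
import org.apache.hadoop.hbase.client.Put
import org.apache.hadoop.hbase.util.Bytes
import com.google.cloud.bigtable.hbase.BigtableConfiguration

hbaseRDD.foreachPartition { partition =>
  // Open the connection on the executor, once per partition;
  // "project-id" / "instance-id" stand in for the masked values above.
  val connection = BigtableConfiguration.connect("project-id", "instance-id")
  val table = connection.getTable(TableName.valueOf("omniture"))

  partition.foreach { case (rowKey, columns) =>
    val put = new Put(Bytes.toBytes(rowKey))
    columns.foreach { case (qualifier, value) =>
      // Same column family "u" as in the question.
      put.addColumn(Bytes.toBytes("u"), Bytes.toBytes(qualifier), Bytes.toBytes(value))
    }
    table.put(put)
  }

  table.close()
  connection.close()
}

Using foreachPartition instead of foreach means one connection per partition rather than one per record; collecting the Puts of a partition into a java.util.List[Put] and calling table.put(list) once would cut down round trips further.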
The pattern is explained in detail in the link above.