Performing operations on RDD[(LongWritable, JsonObject)]

Posted by ny6fqffe on 2021-05-29 in Hadoop

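My task is basically:
1. Read data from Google Cloud BigQuery using Spark/Scala.
2. Perform some operations (such as updates) on the data.
3. Write the data back to BigQuery.

So far, I can read the data with newAPIHadoopRDD(), which returns an RDD[(LongWritable, JsonObject)].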

tableData.map(entry => (entry._1.toString(),entry._2.toString()))
  .take(10)
  .foreach(println)

Here is a sample of the data:

(341,{"id":"4","name":"Shahu","score":"100"})

I don't know which functions I should use on this RDD to meet the requirement.
Do I need to convert this RDD to a DataFrame/Dataset/JSON format? If so, how?
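
One direction that might work is to turn each JsonObject into its JSON string and let Spark infer a DataFrame schema from those strings. The following is only a sketch under assumptions: it needs Spark 2.2 or later for spark.read.json on a Dataset[String], the names JsonRddToDataFrame and bumpScores are made up for illustration, the "add 10 to score" update is a placeholder for the real operation, and a local Seq of JSON strings stands in for tableData.map(_._2.toString). Writing back to BigQuery is not shown.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

// Hypothetical helper object; not part of Spark or the BigQuery connector.
object JsonRddToDataFrame {

  // Parses JSON strings into a DataFrame and applies a placeholder update.
  def bumpScores(spark: SparkSession, jsonLines: Seq[String]): DataFrame = {
    import spark.implicits._

    // Stand-in for tableData.map(_._2.toString).toDS():
    // one JSON document per element.
    val jsonDs = spark.createDataset(jsonLines)

    // Schema is inferred from the JSON; in the sample every field is a string.
    val df = spark.read.json(jsonDs)

    // Placeholder "update": add 10 to score, keeping the column a string.
    df.withColumn("score", (col("score").cast("int") + 10).cast("string"))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("json-rdd-sketch")
      .getOrCreate()

    val sample = Seq("""{"id":"4","name":"Shahu","score":"100"}""")
    val updated = bumpScores(spark, sample)
    updated.show()

    // Back to one JSON string per row, which could then be parsed
    // into JsonObject again before writing.
    updated.toJSON.take(10).foreach(println)

    spark.stop()
  }
}
```

If the update is simple, staying on the RDD and modifying each JsonObject inside a map may be enough; the DataFrame route mainly helps when the operation is easier to express with column functions or SQL.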

hadoop scala apache-spark google-bigquery google-cloud-dataproc

Source: https://stackoverflow.com/questions/42786642/performing-operations-on-rdd-longwritable-jsonobject
