How do I correctly extract an array<bigint> column from a Hive table in Spark?

Posted by gz5pxeao on 2021-06-26 in Hive

I have a Hive table with a column (c4) of type array<bigint>. Now I want to extract this column with Spark. Here is the code snippet:

    val query = """select c1, c2, c3, c4 from
    some_table where some_condition"""
    val rddHive = hiveContext.sql(query).rdd.map { row =>
      // is there any other ways to extract wid_list (String here seems not work)
      // no compile error and no runtime error
      val w =
        if (row.isNullAt(3)) List()
        else row.getAs[scala.collection.mutable.WrappedArray[String]]("wid_list").toList
      w
    }
    -> rddHive: org.apache.spark.rdd.RDD[List[String]] = MapPartitionsRDD[7] at map at <console>:32
    rddHive.map(x => x(0).getClass.getSimpleName).take(1)
    -> Array[String] = Array(Long)

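So I call getAs[scala.collection.mutable.WrappedArray[String]] even though the underlying column type is array<bigint>. Yet there is no compile error and no runtime error, and the elements I get back are still bigint (Long) values. What is going on here (why is there neither a compiler error nor a runtime error)? And what is the proper way to extract an array<bigint> column as a List[String] in Spark?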
==================== More information ====================

    hiveContext.sql(query).printSchema
    root
     |-- c1: string (nullable = true)
     |-- c2: integer (nullable = true)
     |-- c3: string (nullable = true)
     |-- c4: array (nullable = true)
     |    |-- element: long (containsNull = true)
    hiveContext.sql(query).show(3)
    +--------+----+----------------+--------------------+
    |      c1|  c2|              c3|                  c4|
    +--------+----+----------------+--------------------+
    |   c1111|   1|5511798399.22222|[21772244666, 111...|
    |   c1112|   1|5511798399.88888|[11111111, 111111...|
    |   c1113|   2| 5555117114.3333|[77777777777, 112...|
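
A possible explanation, sketched without Spark so that it runs on its own: the type argument given to getAs is not checked against the schema, and JVM type erasure means that at runtime only "is this a Seq?" can be verified, not the element type. The elements therefore stay boxed Long values behind a List[String] label, and an error would only surface once an element is actually used as a String. In the sketch below, `ArrayExtractionSketch`, its local `getAs`, and `extractAsStrings` are hypothetical names that stand in for the row access; with a real Row, asking for the true element type (for example `row.getSeq[Long](3)`) and then mapping with `_.toString` should be the equivalent step, assuming c4 is at index 3 as in the query.

```scala
import scala.util.Try

object ArrayExtractionSketch {
  // Hypothetical stand-in for Row.getAs[T]: nothing more than an unchecked cast.
  def getAs[T](value: Any): T = value.asInstanceOf[T]

  // Ask for the real element type (Long), then convert each element explicitly.
  def extractAsStrings(value: Any): List[String] =
    if (value == null) List.empty[String]
    else getAs[Seq[Long]](value).map(_.toString).toList

  def main(args: Array[String]): Unit = {
    // Plays the role of one c4 cell: a sequence of boxed Long values.
    val cell: Any = Seq(21772244666L, 111L)

    // Wrong element type, yet no error: the element type is erased at runtime.
    val mislabeled: List[String] = getAs[Seq[String]](cell).toList

    // Viewed as List[Any], the elements turn out to still be java.lang.Long.
    val elements: List[Any] = mislabeled
    println(elements.head.getClass.getName)

    // Using an element as a String forces the cast, which fails.
    println(Try(mislabeled.head.length).isFailure)

    // Explicit conversion yields real String elements.
    println(extractAsStrings(cell))
  }
}
```

The schema above reports containsNull = true, so null elements inside the array would need separate handling; this sketch only covers a null cell.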
