Changing column data types in a Parquet file

cwtwac6a, posted on 2021-06-26 in Hive

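I have an external table pointing to an S3 location (Parquet files) in which every column is declared as string. I want to correct the data types of all the columns instead of reading everything as a string. When I drop the external table and recreate it with the new data types, SELECT queries always fail with the following error: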

    java.lang.UnsupportedOperationException: org.apache.parquet.column.values.dictionary.PlainValuesDictionary$PlainBinaryDictionary
        at org.apache.parquet.column.Dictionary.decodeToInt(Dictionary.java:48)
        at org.apache.spark.sql.execution.vectorized.OnHeapColumnVector.getInt(OnHeapColumnVector.java:233)
        at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
        at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
        at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:395)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:234)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:228)
        at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:827)
        at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:827)
Answer #1 (cgh8pdjw)

Specify the type as bigint, which is equivalent to a long. Hive does not have a long data type.

    hive> alter table table_name change col col bigint;
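
Since the question asks about correcting every column rather than just one, the same statement can be generated per column. The following is only a minimal sketch: the function name `build_alter_statements`, the table name `events`, and the column names and target types are all hypothetical placeholders, not taken from the original post. It only builds the HiveQL text and does not connect to Hive.

```python
def build_alter_statements(table, new_types):
    """Return one ALTER TABLE ... CHANGE statement per column.

    table:     name of the Hive table (hypothetical in the example below)
    new_types: mapping of column name -> desired Hive type
    """
    statements = []
    for column, hive_type in new_types.items():
        # CHANGE takes the old column name, the new column name, then the type;
        # repeating the name leaves the column name unchanged.
        statements.append(
            f"ALTER TABLE `{table}` CHANGE `{column}` `{column}` {hive_type};"
        )
    return statements


if __name__ == "__main__":
    # Hypothetical table and columns, for illustration only.
    for stmt in build_alter_statements(
        "events", {"user_id": "bigint", "amount": "double"}
    ):
        print(stmt)
```

The printed statements can then be pasted into the hive prompt or saved to a script file. Backticks are used around identifiers so that names which collide with reserved words are still accepted.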

Duplicate content, from the Hortonworks forum.
