SparkSQL cannot read a specific column from an ORC table in Hive

qlfbtfca posted on 2021-06-26 in Hive

I am using SparkSQL 2.1.1 to read an ORC table from Hive 1.2.1, stored in Google Cloud Storage. I can successfully select every column except one (called col1 here), which is of type smallint. If I select that specific column with this code:

// sc is the SparkContext provided by spark-shell
val hc = new org.apache.spark.sql.hive.HiveContext(sc)
val result = hc.sql("SELECT col1 FROM table")
result.collect().foreach(println)

it fails with the following exception:
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 24.0 failed 4 times, most recent failure: Lost task 0.3 in stage 24.0 (TID 378, executor 42): java.lang.ClassCastException: org.apache.hadoop.io.IntWritable cannot be cast to org.apache.hadoop.hive.serde2.io.ShortWritable
    at org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableShortObjectInspector.get(WritableShortObjectInspector.java:36)
    at org.apache.spark.sql.hive.HadoopTableReader$$anonfun$14$$anonfun$apply$4.apply(TableReader.scala:390)
    at org.apache.spark.sql.hive.HadoopTableReader$$anonfun$14$$anonfun$apply$4.apply(TableReader.scala:390)
    at org.apache.spark.sql.hive.HadoopTableReader$$anonfun$fillObject$2.apply(TableReader.scala:435)
    at org.apache.spark.sql.hive.HadoopTableReader$$anonfun$fillObject$2.apply(TableReader.scala:426)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:232)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:225)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:827)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:827)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
    at org.apache.spark.scheduler.Task.run(Task.scala:99)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:322)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:748)
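
The ClassCastException suggests a schema mismatch: the Hive metastore declares col1 as smallint, so Spark builds a ShortWritable object inspector, but the ORC files hand back IntWritable values, i.e. the files themselves appear to store the column as int. A minimal sketch to check what type the files actually declare, bypassing the metastore (the GCS path below is a hypothetical placeholder for the table's storage location):

// Read the ORC files directly so Spark infers the schema from the files,
// not from the Hive metastore. The path is a hypothetical placeholder.
val orcDf = hc.read.orc("gs://your-bucket/warehouse/table")
orcDf.printSchema()  // shows the type the ORC files actually recorded for col1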
I have already tried casting that column to short, without success:

val hc = new org.apache.spark.sql.hive.HiveContext(sc)
val result = hc.sql("SELECT cast(col1 as short) FROM table")
result.collect().foreach(println)
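
Note that the exception is thrown inside Spark's Hive table reader while each row is being materialized, before any SQL expression runs, so a cast in the query cannot help. A possible workaround, assuming the files really do store col1 as int, is to bypass the Hive SerDe path entirely by reading the ORC files directly and casting afterwards; this is a sketch under that assumption (hypothetical path again), not a verified fix:

// Bypass the Hive SerDe: read the files directly, then cast back to smallint.
val df = hc.read.orc("gs://your-bucket/warehouse/table")  // hypothetical path
val fixed = df.selectExpr("CAST(col1 AS SMALLINT) AS col1")
fixed.collect().foreach(println)

Alternatively, aligning the metastore with the files (for example, changing the column's declared type to int in Hive) would address the mismatch at the source.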

No answers yet.
