I'm integrating the Spark 2.3 Thrift Server with Hive 2.2.0 and running queries from Spark beeline. Inserting into a native Hive table works fine, but inserting into a Hive-HBase table (a Hive table with HBase as the storage backend) throws the following exception:
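For context, the setup looks roughly like the sketch below (the table name, column family, and columns are hypothetical, chosen only to illustrate the failing path; the actual DDL may differ):

```scala
// Hive table backed by HBase via the standard HBase storage handler.
spark.sql("""
  CREATE TABLE hbase_t (key STRING, value STRING)
  STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
  WITH SERDEPROPERTIES ('hbase.columns.mapping' = ':key,cf:value')
""")

// Inserting into a native Hive table succeeds, but this statement fails
// with the ClassCastException when executed through Spark:
spark.sql("INSERT INTO hbase_t VALUES ('k1', 'v1')")
```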
org.apache.spark.SparkException: Task failed while writing rows.
	at org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:285)
	at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:197)
	at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:196)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
	at org.apache.spark.scheduler.Task.run(Task.scala:109)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.ClassCastException: org.apache.hadoop.hive.hbase.HiveHBaseTableOutputFormat cannot be cast to org.apache.hadoop.hive.ql.io.HiveOutputFormat
	at org.apache.spark.sql.hive.execution.HiveFileFormat$$anon$1.outputFormat$lzycompute(HiveFileFormat.scala:93)
1 Answer
This seems to be a known issue. There are several pull requests addressing it, but I haven't found an actual fix yet.
https://issues.apache.org/jira/browse/spark-6628
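Until it is fixed, one possible workaround is to bypass the Hive insert path entirely and write to the underlying HBase table directly from Spark. A minimal sketch, assuming a hypothetical HBase table `hbase_t` with a column family `cf` (adjust names to your schema); this uses the standard HBase `TableOutputFormat` rather than Hive's output format, so it avoids the cast:

```scala
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.Put
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.TableOutputFormat
import org.apache.hadoop.hbase.util.Bytes
import org.apache.hadoop.mapreduce.Job

// Configure the job to write to the HBase table that backs the Hive table.
val conf = HBaseConfiguration.create()
conf.set(TableOutputFormat.OUTPUT_TABLE, "hbase_t") // hypothetical table name
val job = Job.getInstance(conf)
job.setOutputFormatClass(classOf[TableOutputFormat[ImmutableBytesWritable]])

// Turn each (key, value) pair into an HBase Put against column cf:value.
val rows = spark.sparkContext.parallelize(Seq(("k1", "v1")))
val puts = rows.map { case (k, v) =>
  val put = new Put(Bytes.toBytes(k))
  put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("value"), Bytes.toBytes(v))
  (new ImmutableBytesWritable(Bytes.toBytes(k)), put)
}

// Write through TableOutputFormat instead of the Hive insert path.
puts.saveAsNewAPIHadoopDataset(job.getConfiguration)
```

Since the rows land in HBase directly, they are still visible through the Hive-HBase table when queried afterwards.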