Hive on Spark: queries on Parquet tables fail with the Spark execution engine but work with the Tez and MR engines

Posted by falq053o on 2021-06-24 in Hive

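Hadoop version: hadoop-2.10.1
Hive version: hive-2.3.7
Spark version: spark-2.4.7
Tez engine: tez-0.9.2
I have been working with Hive using the Spark and Tez execution engines. I created text-format tables from .dat files, then created new Parquet tables from them: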

create table if not exists inventory(.......)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'  STORED AS TEXTFILE;

LOAD DATA LOCAL INPATH '/home/hadoop/tpcds-kit/dats/inventory.dat' INTO TABLE inventory;

create table if not exists inventory_p(.....)
STORED AS PARQUET;

insert overwrite table inventory_p select * from inventory;

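I can run queries on these tables with the MR and Tez engines, but not with the Spark engine. The error is shown below. It occurs only on the Parquet tables, not on the tables stored as text files.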

Query ID = hadoop_20201231231436_849a5c3e-6987-4fb5-be62-2a74258d77d4
Total jobs = 2
Launching Job 1 out of 2
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Spark Job = 7ce7367f-80a0-4bd3-8949-2acc1f7d6f96
Running with YARN Application = application_1609424642389_0009
Kill Command = /home/hadoop/hadoop/bin/yarn application -kill application_1609424642389_0009

Query Hive on Spark job[0] stages: [0]

Status: Running (Hive on Spark job[0])
--------------------------------------------------------------------------------------
          STAGES   ATTEMPT        STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED
--------------------------------------------------------------------------------------
Stage-0                  0       PENDING      4          0        0        4       0
--------------------------------------------------------------------------------------
STAGES: 00/01    [>>--------------------------] 0%    ELAPSED TIME: 2.04 s
--------------------------------------------------------------------------------------
Job failed with java.lang.NoSuchMethodError: org.apache.parquet.column.values.ValuesReader.initFromPage(I[BI)V
20/12/31 23:14:52 [HiveServer2-Background-Pool: Thread-41]: ERROR SessionState: Job failed with java.lang.NoSuchMethodError: org.apache.parquet.column.values.ValuesReader.initFromPage(I[BI)V
java.util.concurrent.ExecutionException: Exception thrown by job
        at org.apache.spark.JavaFutureActionWrapper.getImpl(FutureAction.scala:337)
        at org.apache.spark.JavaFutureActionWrapper.get(FutureAction.scala:342)
        at org.apache.hive.spark.client.RemoteDriver$JobWrapper.call(RemoteDriver.java:362)
        at org.apache.hive.spark.client.RemoteDriver$JobWrapper.call(RemoteDriver.java:323)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
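
The descriptor `(I[BI)V` in the error refers to a method taking `(int, byte[], int)` and returning `void`. A `NoSuchMethodError` of this kind usually means that the class loaded at runtime comes from a different library version than the one the calling code was compiled against. One way to check this is a small reflection probe, sketched below, that prints where a class was loaded from and which overloads of a method it actually has. The class name `MethodProbe` and its helper methods are hypothetical names introduced for illustration; they are not part of Hive, Spark, or Parquet.

```java
import java.lang.reflect.Method;
import java.security.CodeSource;
import java.util.ArrayList;
import java.util.List;

// Hypothetical diagnostic helper: not part of Hive, Spark, or Parquet.
final class MethodProbe {

    /** Returns the jar or directory a class was loaded from, or a placeholder if unknown. */
    static String locationOf(Class<?> cls) {
        CodeSource src = cls.getProtectionDomain().getCodeSource();
        return (src == null || src.getLocation() == null)
                ? "(bootstrap/unknown)"
                : src.getLocation().toString();
    }

    /** Lists the parameter signatures of every declared method with the given name. */
    static List<String> overloadsOf(Class<?> cls, String methodName) {
        List<String> found = new ArrayList<>();
        for (Method m : cls.getDeclaredMethods()) {
            if (!m.getName().equals(methodName)) {
                continue;
            }
            StringBuilder sb = new StringBuilder(methodName).append('(');
            Class<?>[] params = m.getParameterTypes();
            for (int i = 0; i < params.length; i++) {
                if (i > 0) {
                    sb.append(", ");
                }
                sb.append(params[i].getTypeName());
            }
            found.add(sb.append(')').toString());
        }
        return found;
    }

    public static void main(String[] args) {
        // Defaults match the class and method named in the error message above.
        String className = args.length > 0 ? args[0]
                : "org.apache.parquet.column.values.ValuesReader";
        String methodName = args.length > 1 ? args[1] : "initFromPage";
        try {
            Class<?> cls = Class.forName(className);
            System.out.println("Loaded from: " + locationOf(cls));
            System.out.println("Overloads:   " + overloadsOf(cls, methodName));
        } catch (ClassNotFoundException e) {
            System.out.println(className + " is not on this classpath");
        }
    }
}
```

Run it with the classpath under test (for example, the jar directory that the Spark executors use) and compare the result with the same run against the Hive libraries. If the reported jar differs, or the list has no `initFromPage(int, byte[], int)` entry, the two engines are probably picking up different Parquet versions.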
