Hive on Spark: unable to run queries on Parquet tables with the Spark execution engine, but the Tez and MR engines work

Posted by falq053o on 2021-06-24 in Hive


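Hadoop version: hadoop-2.10.1
Hive version: hive-2.3.7
Spark version: spark-2.4.7
Tez engine: tez-0.9.2
I have been working with Hive using the Spark and Tez execution engines. I created text-format tables from .dat files and then created new Parquet tables from them: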

create table if not exists inventory(.......)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'  STORED AS TEXTFILE;

LOAD DATA LOCAL INPATH '/home/hadoop/tpcds-kit/dats/inventory.dat' INTO TABLE inventory;

create table if not exists inventory_p(.....)
STORED AS PARQUET;

insert overwrite table inventory_p select * from inventory;

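I can run queries on these tables with the MR and Tez engines, but not with the Spark engine. The error is shown below. It occurs only with the Parquet tables, not with the tables stored as text files.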

Query ID = hadoop_20201231231436_849a5c3e-6987-4fb5-be62-2a74258d77d4
Total jobs = 2
Launching Job 1 out of 2
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Spark Job = 7ce7367f-80a0-4bd3-8949-2acc1f7d6f96
Running with YARN Application = application_1609424642389_0009
Kill Command = /home/hadoop/hadoop/bin/yarn application -kill application_1609424642389_0009

Query Hive on Spark job[0] stages: [0]

Status: Running (Hive on Spark job[0])
--------------------------------------------------------------------------------------
          STAGES   ATTEMPT        STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED
--------------------------------------------------------------------------------------
Stage-0                  0       PENDING      4          0        0        4       0
--------------------------------------------------------------------------------------
STAGES: 00/01    [>>--------------------------] 0%    ELAPSED TIME: 2.04 s
--------------------------------------------------------------------------------------
Job failed with java.lang.NoSuchMethodError: org.apache.parquet.column.values.ValuesReader.initFromPage(I[BI)V
20/12/31 23:14:52 [HiveServer2-Background-Pool: Thread-41]: ERROR SessionState: Job failed with java.lang.NoSuchMethodError: org.apache.parquet.column.values.ValuesReader.initFromPage(I[BI)V
java.util.concurrent.ExecutionException: Exception thrown by job
        at org.apache.spark.JavaFutureActionWrapper.getImpl(FutureAction.scala:337)
        at org.apache.spark.JavaFutureActionWrapper.get(FutureAction.scala:342)
        at org.apache.hive.spark.client.RemoteDriver$JobWrapper.call(RemoteDriver.java:362)
        at org.apache.hive.spark.client.RemoteDriver$JobWrapper.call(RemoteDriver.java:323)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
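
For anyone trying to reproduce this, the engine comparison described above can be sketched per session roughly as follows. The `hive.execution.engine` property is Hive's standard switch between engines; the `count(*)` queries are only hypothetical stand-ins, since the query that produced the log above is not shown.

```sql
-- Hypothetical stand-in queries; the original failing query is not shown in the post.

-- MapReduce engine: reported to work on both tables.
set hive.execution.engine=mr;
select count(*) from inventory;
select count(*) from inventory_p;

-- Tez engine: reported to work on both tables.
set hive.execution.engine=tez;
select count(*) from inventory;
select count(*) from inventory_p;

-- Spark engine: per the report, the text table works,
-- while the Parquet table fails with the NoSuchMethodError above.
set hive.execution.engine=spark;
select count(*) from inventory;
select count(*) from inventory_p;
```

Running the same statement against both `inventory` and `inventory_p` under each engine isolates the failure to the combination of the Spark engine and the Parquet storage format.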

Hive apache-spark apache-spark-sql parquet hiveql

Source: https://stackoverflow.com/questions/65524251/hive-on-spark-unable-to-run-query-on-parquet-tables-with-spark-execution-engine
