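I'm having a problem running simple queries with Hive on Spark. The query
select * from table_name
works fine in the Hive console, but when I execute
select count(*) from table_name
the query terminates with the following error: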
Query ID = ab_20160515134700_795fc14c-e89b-4172-bcc6-0cfcffadcd88
Total jobs = 1
Launching Job 1 out of 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Spark Job = d5e1856e-de67-4e2d-a914-ca1aae324b7f
Status: SENT
Failed to execute spark task, with exception 'java.lang.IllegalStateException(RPC channel is closed.)'
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.spark.SparkTask
Versions:
hadoop-2.7.2
apache-hive-2.0.0
spark-1.6.0-bin-hadoop2
scala: 2.11.8
I set spark.master in hive-site.xml, and now I get:
java.util.concurrent.ExecutionException: java.lang.RuntimeException: Cancel client '8ffe7ea3-aaf4-456c-ae18-23c572a766c5'. Error: Child process exited before connecting back
at io.netty.util.concurrent.AbstractFuture.get(AbstractFuture.java:37) ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
at org.apache.hive.spark.client.SparkClientImpl.<init>(SparkClientImpl.java:101) [hive-exec-2.0.0.jar:2.0.0]
at org.apache.hive.spark.client.SparkClientFactory.createClient(SparkClientFactory.java:80) [hive-exec-2.0.0.jar:2.0.0]
at org.apache.hadoop.hive.ql.exec.spark.RemoteHiveSparkClient.createRemoteClient(RemoteHiveSparkClient.java:98) [hive-exec-2.0.0.jar:2.0.0]
at org.apache.hadoop.hive.ql.exec.spark.RemoteHiveSparkClient.<init>(RemoteHiveSparkClient.java:94) [hive-exec-2.0.0.jar:2.0.0]
at org.apache.hadoop.hive.ql.exec.spark.HiveSparkClientFactory.createHiveSparkClient(HiveSparkClientFactory.java:63) [hive-exec-2.0.0.jar:2.0.0]
at org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionImpl.open(SparkSessionImpl.java:55) [hive-exec-2.0.0.jar:2.0.0]
at org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionManagerImpl.getSession(SparkSessionManagerImpl.java:114) [hive-exec-2.0.0.jar:2.0.0]
at org.apache.hadoop.hive.ql.exec.spark.SparkUtilities.getSparkSession(SparkUtilities.java:131) [hive-exec-2.0.0.jar:2.0.0]
at org.apache.hadoop.hive.ql.exec.spark.SparkTask.execute(SparkTask.java:106) [hive-exec-2.0.0.jar:2.0.0]
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:158) [hive-exec-2.0.0.jar:2.0.0]
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:101) [hive-exec-2.0.0.jar:2.0.0]
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1840) [hive-exec-2.0.0.jar:2.0.0]
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1584) [hive-exec-2.0.0.jar:2.0.0]
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1361) [hive-exec-2.0.0.jar:2.0.0]
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1184) [hive-exec-2.0.0.jar:2.0.0]
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1172) [hive-exec-2.0.0.jar:2.0.0]
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:233) [hive-cli-2.0.0.jar:2.0.0]
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:184) [hive-cli-2.0.0.jar:2.0.0]
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:400) [hive-cli-2.0.0.jar:2.0.0]
at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:778) [hive-cli-2.0.0.jar:2.0.0]
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:717) [hive-cli-2.0.0.jar:2.0.0]
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:645) [hive-cli-2.0.0.jar:2.0.0]
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_77]
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_77]
at org.apache.hadoop.util.RunJar.run(RunJar.java:221) [spark-assembly-1.6.0-hadoop2.6.0.jar:1.6.0]
at org.apache.hadoop.util.RunJar.main(RunJar.java:136) [spark-assembly-1.6.0-hadoop2.6.0.jar:1.6.0]
Caused by: java.lang.RuntimeException: Cancel client '8ffe7ea3-aaf4-456c-ae18-23c572a766c5'. Error: Child process exited before connecting back
at org.apache.hive.spark.client.rpc.RpcServer.cancelClient(RpcServer.java:180) ~[hive-exec-2.0.0.jar:2.0.0]
at org.apache.hive.spark.client.SparkClientImpl$3.run(SparkClientImpl.java:450) ~[hive-exec-2.0.0.jar:2.0.0]
at java.lang.Thread.run(Thread.java:745) ~[?:1.8.0_77]
16/05/16 18:00:33 [Driver]: WARN client.SparkClientImpl: Child process exited with code 1
I have since built Spark 1.6.1 and Hive 2.0.0, and the error has changed to:
Exception in thread "main" java.lang.NoClassDefFoundError: scala/collection/Iterable
at org.apache.hadoop.hive.ql.parse.spark.GenSparkProcContext.<init>(GenSparkProcContext.java:163)
at org.apache.hadoop.hive.ql.parse.spark.SparkCompiler.generateTaskTree(SparkCompiler.java:195)
at org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:258)
at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10861)
at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:239)
at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:250)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:437)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:329)
at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1158)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1253)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1084)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1072)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:232)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:183)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:399)
at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:776)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:714)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:641)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: java.lang.ClassNotFoundException: scala.collection.Iterable
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
1 answer
agxfikkp1#
I ran into the same problem as you with Hive 2.0.0 and Spark 1.6.1. As mentioned, it has already been discussed at issues.apache.org/jira/browse/hive-9970.
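That said, for Hive:
1. Download the Hive source package.
2. Set the correct Hadoop/Spark/Tez versions in pom.xml.
3. Raise Maven's memory limit. I used:
export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m"
4. Build Hive with Maven: mvn clean package -Pdist -DskipTests
5. The result is at packaging/target/apache-hive-2.x.y-bin
6. Configure hive-site.xml.
For Spark:
1. Download the Spark source package.
2. Set the correct Hadoop version in pom.xml.
3. Build Spark without Hive:
./make-distribution.sh --name "hadoop2-without-hive" --tgz "-Pyarn,hadoop-provided,hadoop-2.6,parquet-provided"
4. The result is at dist/
5. Configure spark-defaults.conf.
Because you built Spark without Hadoop, you need to add the path of the Hadoop package jars to $SPARK_DIST_CLASSPATH. See the documentation page on this. You can also read the Hive on Spark guide as a reference.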
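As a rough illustration of the $SPARK_DIST_CLASSPATH step, here is a minimal sketch of what could go into Spark's conf/spark-env.sh. It assumes the hadoop launcher script is on your PATH; the guard and the echo messages are my own additions for illustration, not something the answer above prescribes.

```shell
# Sketch for conf/spark-env.sh on a Spark build made with -Phadoop-provided.
# Assumption: the `hadoop` launcher is on PATH; if it is not, nothing is exported.
if command -v hadoop >/dev/null 2>&1; then
  # `hadoop classpath` prints the classpath entries of the local Hadoop install.
  export SPARK_DIST_CLASSPATH="$(hadoop classpath)"
  echo "SPARK_DIST_CLASSPATH set from 'hadoop classpath'"
else
  echo "hadoop not found on PATH; SPARK_DIST_CLASSPATH left unset" >&2
fi
```

If hadoop is not on your PATH, calling it by its full path inside your Hadoop installation should work the same way.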