Not able to read Hive partitions using Spark

Posted by webghufk on 2021-06-26 in Hive


I have a partitioned Hive ORC table, and I am trying to read it in Spark with this query:

spark.sql("select count(*) from test.puid_tuid where date = '20170316'").show

But I get this error:

Caused by: java.io.FileNotFoundException: File hdfs://localhost:8020/hive/warehouse/test.db/puid_tuid/date=20170316 does not exist.
    at org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.<init>(DistributedFileSystem.java:948)
    at org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.<init>(DistributedFileSystem.java:927)
    at org.apache.hadoop.hdfs.DistributedFileSystem$19.doCall(DistributedFileSystem.java:872)
    at org.apache.hadoop.hdfs.DistributedFileSystem$19.doCall(DistributedFileSystem.java:868)
    at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
    at org.apache.hadoop.hdfs.DistributedFileSystem.listLocatedStatus(DistributedFileSystem.java:886)
    at org.apache.hadoop.fs.FileSystem.listLocatedStatus(FileSystem.java:1696)
    at org.apache.hadoop.hive.shims.Hadoop23Shims.listLocatedStatus(Hadoop23Shims.java:667)
    at org.apache.hadoop.hive.ql.io.AcidUtils.getAcidState(AcidUtils.java:361)
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$FileGenerator.call(OrcInputFormat.java:634)
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$FileGenerator.call(OrcInputFormat.java:620)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

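So I checked this path in my HDFS, and it is not there.
Then I ran the same query in Hive, and it finished with zero records.
I also listed all the Hive partitions, and the list contains this same partition.
Is there any way in Spark to ignore all partitions that are not present in HDFS?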
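
For reference, one setting that may be relevant to this situation is Spark's `spark.sql.hive.verifyPartitionPath` option, which is off by default and makes Spark check partition paths on the file system when reading a Hive table. The lines below are only a minimal sketch, not something confirmed by the original post. It assumes a `spark-shell` session where `spark` is a Hive-enabled SparkSession, and that the table is read through the Hive SerDe path (as the `OrcInputFormat` frames in the trace suggest). Whether it resolves the error may depend on the Spark version in use.

```scala
// Assumption: `spark` is the Hive-enabled SparkSession provided by spark-shell.
// Ask Spark to verify each partition path before scanning, so that partitions
// registered in the metastore but missing on HDFS can be skipped.
spark.conf.set("spark.sql.hive.verifyPartitionPath", "true")

// Re-run the query from the question.
spark.sql("select count(*) from test.puid_tuid where date = '20170316'").show
```

An alternative worth considering is to remove the stale partition from the metastore itself, so that Hive and Spark agree on which partitions exist.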

Hive hdfs apache-spark apache-spark-sql

Source: https://stackoverflow.com/questions/44679551/not-able-to-read-hive-partitions-using-spark
