Spark SQL fails to read a Hive table

bvjxkvbb · published 2021-05-29 in Hadoop
Follow (0) | Answers (2) | Views (479)

I want to load an entire Hive table into Spark memory over a Hive JDBC connection, and I have already added hive-site.xml and hdfs-site.xml to my project. Spark is connected to Hive, because the column names (e.g. role_id) are fetched successfully. But Spark seems to load the column names as data and then throws an exception. Here is my code:

val df = spark.read.format("jdbc")
  .option("driver", CommonUtils.HIVE_DIRVER)
  .option("url", CommonUtils.HIVE_URL)
  .option("dbtable", "datasource_test.t_leave_map_base")
  .option("header", "true")
  .option("user", CommonUtils.HIVE_PASSWORD)
  .option("password", CommonUtils.HIVE_PASSWORD)
  .option("fetchsize", "20")
  .load()
df.registerTempTable("t_leave_map_base")
df.persist(StorageLevel.MEMORY_ONLY)
df.show()
df

The error I get:
java.lang.NumberFormatException: For input string: "t_leave_map_base.role_id"
    at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65) ~[na:1.8.0_25]
    at java.lang.Long.parseLong(Long.java:589) ~[na:1.8.0_25]
    at java.lang.Long.valueOf(Long.java:803) ~[na:1.8.0_25]
    at org.apache.hive.jdbc.HiveBaseResultSet.getLong(HiveBaseResultSet.java:366) ~[hive-jdbc-1.1.0-cdh5.12.0.jar:1.1.0-cdh5.12.0]
    at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$org$apache$spark$sql$execution$datasources$jdbc$JdbcUtils$$makeGetter$8.apply(JdbcUtils.scala:409) ~[spark-sql_2.11-2.2.0.cloudera2.jar:2.2.0.cloudera2]
    at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$org$apache$spark$sql$execution$datasources$jdbc$JdbcUtils$$makeGetter$8.apply(JdbcUtils.scala:408) ~[spark-sql_2.11-2.2.0.cloudera2.jar:2.2.0.cloudera2]
    at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anon$1.getNext(JdbcUtils.scala:330) ~[spark-sql_2.11-2.2.0.cloudera2.jar:2.2.0.cloudera2]
    at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anon$1.getNext(JdbcUtils.scala:312) ~[spark-sql_2.11-2.2.0.cloudera2.jar:2.2.0.cloudera2]
    at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73) ~[spark-core_2.11-2.2.0.cloudera2.jar:2.2.0.cloudera2]
    at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37) ~[spark-core_2.11-2.2.0.cloudera2.jar:2.2.0.cloudera2]
    at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:32) ~[spark-core_2.11-2.2.0.cloudera2.jar:2.2.0.cloudera2]
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source) ~[na:na]
    at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) ~[spark-sql_2.11-2.2.0.cloudera2.jar:2.2.0.cloudera2]
    at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:395) ~[spark-sql_2.11-2.2.0.cloudera2.jar:2.2.0.cloudera2]
    at org.apache.spark.sql.execution.columnar.InMemoryRelation$$anonfun$1$$anon$1.hasNext(InMemoryRelation.scala:133) ~[spark-sql_2.11-2.2.0.cloudera2.jar:2.2.0.cloudera2]
    at org.apache.spark.storage.memory.MemoryStore.putIteratorAsValues(MemoryStore.scala:215) ~[spark-core_2.11-2.2.0.cloudera2.jar:2.2.0.cloudera2]
    at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1038) ~[spark-core_2.11-2.2.0.cloudera2.jar:2.2.0.cloudera2]
    at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1029) ~[spark-core_2.11-2.2.0.cloudera2.jar:2.2.0.cloudera2]
    at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:969) ~[spark-core_2.11-2.2.0.cloudera2.jar:2.2.0.cloudera2]
    at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1029) ~[spark-core_2.11-2.2.0.cloudera2.jar:2.2.0.cloudera2]
    at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:760) ~[spark-core_2.11-2.2.0.cloudera2.jar:2.2.0.cloudera2]
    at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:334) ~[spark-core_2.11-2.2.0.cloudera2.jar:2.2.0.cloudera2]
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:285) ~[spark-core_2.11-2.2.0.cloudera2.jar:2.2.0.cloudera2]
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) ~[spark-core_2.11-2.2.0.cloudera2.jar:2.2.0.cloudera2]
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323) ~[spark-core_2.11-2.2.0.cloudera2.jar:2.2.0.cloudera2]
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:287) ~[spark-core_2.11-2.2.0.cloudera2.jar:2.2.0.cloudera2]
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) ~[spark-core_2.11-2.2.0.cloudera2.jar:2.2.0.cloudera2]
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323) ~[spark-core_2.11-2.2.0.cloudera2.jar:2.2.0.cloudera2]
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:287) ~[spark-core_2.11-2.2.0.cloudera2.jar:2.2.0.cloudera2]
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87) ~[spark-core_2.11-2.2.0.cloudera2.jar:2.2.0.cloudera2]
    at org.apache.spark.scheduler.Task.run(Task.scala:108) ~[spark-core_2.11-2.2.0.cloudera2.jar:2.2.0.cloudera2]
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338) ~[spark-core_2.11-2.2.0.cloudera2.jar:2.2.0.cloudera2]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[na:1.8.0_25]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) ~[na:1.8.0_25]
    at java.lang.Thread.run(Thread.java:745) [na:1.8.0_25]
Debugging the project shows that every fetched row contains the column names:

Does Spark SQL support loading a Hive table this way at all?
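
Update: a workaround I have seen suggested for this exact symptom is to register a custom JdbcDialect. By default Spark's JDBC source quotes column names with double quotes, which HiveQL parses as string literals, so the query returns the column names themselves as data. A minimal sketch, assuming Spark 2.x APIs (the object name HiveJdbcDialect is mine):

import org.apache.spark.sql.jdbc.{JdbcDialect, JdbcDialects}

// Quote identifiers with backticks so Hive treats them as column
// references rather than string literals.
object HiveJdbcDialect extends JdbcDialect {
  override def canHandle(url: String): Boolean = url.startsWith("jdbc:hive2")
  override def quoteIdentifier(colName: String): String = s"`$colName`"
}

// Register the dialect before calling spark.read.format("jdbc")...load()
JdbcDialects.registerDialect(HiveJdbcDialect)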


2vuwiymt 1#

You can try a simple exercise to see whether spark.sql can fetch data from Hive. In general, as I understand it, JDBC is not the way to connect to Hive from Spark.
Configure the spark-env.sh parameters so that Spark uses the metastore information to talk to Hive.
Open a spark-shell on your machine.
In the spark-shell, run statements like the following:

spark.sql("use <hive_db_name>");
   val df = spark.sql("select count(1) from table");
   df.show();
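
The same check in a standalone application would look roughly like this sketch, assuming hive-site.xml is on the application classpath (the app name is just a placeholder):

import org.apache.spark.sql.SparkSession

// enableHiveSupport() makes Spark talk to the Hive metastore directly,
// so no JDBC connection to HiveServer2 is involved.
val spark = SparkSession.builder()
  .appName("hive-read-check") // placeholder name
  .enableHiveSupport()
  .getOrCreate()

val df = spark.sql("select * from datasource_test.t_leave_map_base")
df.show()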

pieyvz9o 2#

I have seen various takes on this question.
Spark does not use JDBC to access Hive; Hive sits in the same built-in Hadoop/HDFS domain that Spark runs in.
Spark may use JDBC to Impala to access Kudu tables, because Kudu's own security is too coarse-grained. You could also go through Impala for Hive, but why would you want to do that?
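
If you do go through Impala for Kudu, the read is just the plain JDBC source again. A rough sketch; the driver class and URL shape depend on the Impala JDBC driver you deploy, so treat both as assumptions:

// Illustrative only: com.cloudera.impala.jdbc41.Driver and the
// jdbc:impala:// URL form are assumptions about the Cloudera driver.
val kuduDf = spark.read.format("jdbc")
  .option("driver", "com.cloudera.impala.jdbc41.Driver")
  .option("url", "jdbc:impala://impala-host:21050/default")
  .option("dbtable", "some_kudu_table")
  .load()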
