I am using the code below to import data from Hive into Spark over JDBC.
import java.util.HashMap;
import java.util.Map;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrameReader;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SQLContext;

final SparkConf sparkConf = new SparkConf().setAppName("HiveToSparkConnection");
final JavaSparkContext sc = new JavaSparkContext(sparkConf);
final SQLContext sqlContext = new SQLContext(sc);

// Ship the Hive JDBC driver to the executors.
System.out.println("Before loading jar");
sc.addJar("lib/hive-jdbc-2.3.2.jar");
System.out.println("After loading jar");

// JDBC connection options for the Hive source.
Map<String, String> dbConfig = new HashMap<>();
dbConfig.put("url", "<connection-string>");
dbConfig.put("dbtable", "<table-name>");
dbConfig.put("user", "<user>");
dbConfig.put("password", "<pass>");
dbConfig.put("driver", "org.apache.hive.jdbc.HiveDriver");
dbConfig.put("fetchsize", "20");
System.out.println("After connection");

DataFrameReader dFrameReader = sqlContext.read().format("jdbc").options(dbConfig);
Dataset<Row> dataSet = dFrameReader.load();

System.out.println("Print columns");
dataSet.show(5);
It connects to the server fine, and I can also see the schema correctly. But when calling
System.out.println("Print columns");
dataSet.show(5);
I consistently get the following error. Even when I try it with a table containing only string columns, it just shows empty rows.
java.sql.SQLException: Cannot convert column 1 to long: java.lang.NumberFormatException: For input string: "total_pkg.total"
at org.apache.hive.jdbc.HiveBaseResultSet.getLong(HiveBaseResultSet.java:372)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$org$apache$spark$sql$execution$datasources$jdbc$JdbcUtils$$makeGetter$8.apply(JdbcUtils.scala:426)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$org$apache$spark$sql$execution$datasources$jdbc$JdbcUtils$$makeGetter$8.apply(JdbcUtils.scala:425)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anon$1.getNext(JdbcUtils.scala:347)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anon$1.getNext(JdbcUtils.scala:329)
at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73)
at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:32)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$10$$anon$1.hasNext(WholeStageCodegenExec.scala:614)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:253)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:247)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:830)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:830)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:109)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NumberFormatException: For input string: "total_pkg.total"
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Long.parseLong(Long.java:589)
at java.lang.Long.valueOf(Long.java:803)
at org.apache.hive.jdbc.HiveBaseResultSet.getLong(HiveBaseResultSet.java:368)
I have also tried Thrift, but it did not connect to the server.
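For reference, the error message hints at a likely cause: the value being parsed as a long is the literal string "total_pkg.total", i.e. the column name itself. Spark's JDBC source quotes identifiers with double quotes by default (SELECT "total_pkg.total" FROM ...), and HiveQL treats a double-quoted token as a string literal, so every row returns the column name instead of the column value. A hedged sketch of the usual workaround is to register a custom JdbcDialect that quotes identifiers with backticks (the class name HiveJdbcDialect here is my own, not part of Spark):

```java
import org.apache.spark.sql.jdbc.JdbcDialect;
import org.apache.spark.sql.jdbc.JdbcDialects;

// Sketch of a dialect that makes Spark emit Hive-friendly identifier quoting.
public class HiveJdbcDialect extends JdbcDialect {

    @Override
    public boolean canHandle(String url) {
        // Apply this dialect only to HiveServer2 JDBC URLs.
        return url != null && url.startsWith("jdbc:hive2");
    }

    @Override
    public String quoteIdentifier(String colName) {
        // Use backticks instead of Spark's default double quotes,
        // so HiveQL sees a column reference, not a string literal.
        return "`" + colName + "`";
    }

    // Call this once before sqlContext.read().format("jdbc")...load().
    public static void register() {
        JdbcDialects.registerDialect(new HiveJdbcDialect());
    }
}
```

If this is indeed the cause, registering the dialect before the load should make show(5) return real values rather than column-name strings (or empty rows for string columns).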