I am trying to access Hive tables from a Java program, but the program does not seem to see any tables in the default database. However, I can see the same tables and query them through spark-shell. I have already copied hive-site.xml into the Spark conf directory. The only difference I can find is that spark-shell runs Spark version 1.6.0, while my Java program runs Spark 2.1.0.
package spark_210_test;

import java.util.List;

import org.apache.spark.SparkConf;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class SparkTest {
    private static SparkConf sparkConf;
    private static SparkSession sparkSession;

    public static void main(String[] args) {
        String warehouseLocation = "hdfs://quickstart.cloudera/user/hive/warehouse/";
        sparkConf = new SparkConf().setAppName("Hive Test").setMaster("local[*]")
                .set("spark.sql.warehouse.dir", warehouseLocation);
        sparkSession = SparkSession
                .builder()
                .config(sparkConf)
                .enableHiveSupport()
                .getOrCreate();

        Dataset<Row> df0 = sparkSession.sql("show tables");
        List<Row> currentTablesList = df0.collectAsList();
        if (currentTablesList.size() > 0) {
            for (int i = 0; i < currentTablesList.size(); i++) {
                String table = currentTablesList.get(i).getAs("name");
                System.out.printf("%s, ", table);
            }
        }
        else System.out.printf("No Table found for %s.\n", warehouseLocation);

        Dataset<Row> dfCount = sparkSession.sql("select count(*) from sample_07");
        System.out.println(dfCount.collect().toString());
    }
}
The output does not seem to read anything from the Hive warehouse, and the program fails with `Exception in thread "main" org.apache.spark.sql.AnalysisException: Table or view not found: sample_07; line 1 pos 21`.
The full output is shown below:
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/cloudera/workspace/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/cloudera/workspace/PortalHandlerTest.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/cloudera/workspace/SparkTest.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/cloudera/workspace/JARs/slf4j-log4j12-1.7.22.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/zookeeper/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
log4j:WARN No appenders could be found for logger (org.apache.hadoop.util.Shell).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
17/04/25 12:01:51 INFO SparkContext: Running Spark version 2.1.0
17/04/25 12:01:51 WARN SparkContext: Support for Java 7 is deprecated as of Spark 2.0.0
17/04/25 12:01:51 WARN SparkContext: Support for Scala 2.10 is deprecated as of Spark 2.1.0
17/04/25 12:01:52 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/04/25 12:01:52 INFO SecurityManager: Changing view acls to: cloudera
17/04/25 12:01:52 INFO SecurityManager: Changing modify acls to: cloudera
17/04/25 12:01:52 INFO SecurityManager: Changing view acls groups to:
17/04/25 12:01:52 INFO SecurityManager: Changing modify acls groups to:
17/04/25 12:01:52 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(cloudera); groups with view permissions: Set(); users with modify permissions: Set(cloudera); groups with modify permissions: Set()
17/04/25 12:01:53 INFO Utils: Successfully started service 'sparkDriver' on port 50644.
17/04/25 12:01:53 INFO SparkEnv: Registering MapOutputTracker
17/04/25 12:01:53 INFO SparkEnv: Registering BlockManagerMaster
17/04/25 12:01:53 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
17/04/25 12:01:53 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
17/04/25 12:01:53 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-f44e093c-d9a9-42ad-8f5f-9e21b99f0e45
17/04/25 12:01:53 INFO MemoryStore: MemoryStore started with capacity 375.7 MB
17/04/25 12:01:53 INFO SparkEnv: Registering OutputCommitCoordinator
17/04/25 12:01:54 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
17/04/25 12:01:54 INFO Utils: Successfully started service 'SparkUI' on port 4041.
17/04/25 12:01:54 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://10.0.2.15:4041
17/04/25 12:01:54 INFO Executor: Starting executor ID driver on host localhost
17/04/25 12:01:54 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 43409.
17/04/25 12:01:54 INFO NettyBlockTransferService: Server created on 10.0.2.15:43409
17/04/25 12:01:54 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
17/04/25 12:01:54 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 10.0.2.15, 43409, None)
17/04/25 12:01:54 INFO BlockManagerMasterEndpoint: Registering block manager 10.0.2.15:43409 with 375.7 MB RAM, BlockManagerId(driver, 10.0.2.15, 43409, None)
17/04/25 12:01:54 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 10.0.2.15, 43409, None)
17/04/25 12:01:54 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, 10.0.2.15, 43409, None)
17/04/25 12:01:54 INFO SharedState: Warehouse path is 'hdfs://quickstart.cloudera/user/hive/warehouse/'.
17/04/25 12:01:54 INFO HiveUtils: Initializing HiveMetastoreConnection version 1.2.1 using Spark classes.
17/04/25 12:01:55 INFO deprecation: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
17/04/25 12:01:55 INFO deprecation: mapred.min.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize
17/04/25 12:01:55 INFO deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative
17/04/25 12:01:55 INFO deprecation: mapred.min.split.size.per.node is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.node
17/04/25 12:01:55 INFO deprecation: mapred.input.dir.recursive is deprecated. Instead, use mapreduce.input.fileinputformat.input.dir.recursive
17/04/25 12:01:55 INFO deprecation: mapred.min.split.size.per.rack is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.rack
17/04/25 12:01:55 INFO deprecation: mapred.max.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.maxsize
17/04/25 12:01:55 INFO deprecation: mapred.committer.job.setup.cleanup.needed is deprecated. Instead, use mapreduce.job.committer.setup.cleanup.needed
17/04/25 12:01:57 INFO HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
17/04/25 12:01:57 INFO ObjectStore: ObjectStore, initialize called
17/04/25 12:01:57 INFO Persistence: Property datanucleus.cache.level2 unknown - will be ignored
17/04/25 12:01:57 INFO Persistence: Property hive.metastore.integral.jdo.pushdown unknown - will be ignored
17/04/25 12:02:01 INFO ObjectStore: Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
17/04/25 12:02:04 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
17/04/25 12:02:04 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
17/04/25 12:02:04 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
17/04/25 12:02:04 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
17/04/25 12:02:05 INFO Query: Reading in results for query "org.datanucleus.store.rdbms.query.SQLQuery@0" since the connection used is closing
17/04/25 12:02:05 INFO MetaStoreDirectSql: Using direct SQL, underlying DB is DERBY
17/04/25 12:02:05 INFO ObjectStore: Initialized ObjectStore
17/04/25 12:02:05 INFO HiveMetaStore: Added admin role in metastore
17/04/25 12:02:05 INFO HiveMetaStore: Added public role in metastore
17/04/25 12:02:05 INFO HiveMetaStore: No user is added in admin role, since config is empty
17/04/25 12:02:06 INFO HiveMetaStore: 0: get_all_databases
17/04/25 12:02:06 INFO audit: ugi=cloudera ip=unknown-ip-addr cmd=get_all_databases
17/04/25 12:02:06 INFO HiveMetaStore: 0: get_functions: db=default pat=*
17/04/25 12:02:06 INFO audit: ugi=cloudera ip=unknown-ip-addr cmd=get_functions: db=default pat=*
17/04/25 12:02:06 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MResourceUri" is tagged as "embedded-only" so does not have its own datastore table.
17/04/25 12:02:07 INFO SessionState: Created local directory: /tmp/135d2e8d-2300-4f62-b445-ec6e8b0461a7_resources
17/04/25 12:02:07 INFO SessionState: Created HDFS directory: /tmp/hive/cloudera/135d2e8d-2300-4f62-b445-ec6e8b0461a7
17/04/25 12:02:07 INFO SessionState: Created local directory: /tmp/cloudera/135d2e8d-2300-4f62-b445-ec6e8b0461a7
17/04/25 12:02:07 INFO SessionState: Created HDFS directory: /tmp/hive/cloudera/135d2e8d-2300-4f62-b445-ec6e8b0461a7/_tmp_space.db
17/04/25 12:02:07 INFO HiveClientImpl: Warehouse location for Hive client (version 1.2.1) is hdfs://quickstart.cloudera/user/hive/warehouse/
17/04/25 12:02:07 INFO HiveMetaStore: 0: get_database: default
17/04/25 12:02:07 INFO audit: ugi=cloudera ip=unknown-ip-addr cmd=get_database: default
17/04/25 12:02:07 INFO HiveMetaStore: 0: get_database: global_temp
17/04/25 12:02:07 INFO audit: ugi=cloudera ip=unknown-ip-addr cmd=get_database: global_temp
17/04/25 12:02:07 WARN ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
17/04/25 12:02:08 INFO SparkSqlParser: Parsing command: show tables
17/04/25 12:02:12 INFO HiveMetaStore: 0: get_database: default
17/04/25 12:02:12 INFO audit: ugi=cloudera ip=unknown-ip-addr cmd=get_database: default
17/04/25 12:02:12 INFO HiveMetaStore: 0: get_database: default
17/04/25 12:02:12 INFO audit: ugi=cloudera ip=unknown-ip-addr cmd=get_database: default
17/04/25 12:02:12 INFO HiveMetaStore: 0: get_tables: db=default pat=*
17/04/25 12:02:12 INFO audit: ugi=cloudera ip=unknown-ip-addr cmd=get_tables: db=default pat=*
No Table found for hdfs://quickstart.cloudera/user/hive/warehouse/.
17/04/25 12:02:13 INFO SparkSqlParser: Parsing command: select count(*) from sample_07
17/04/25 12:02:13 INFO HiveMetaStore: 0: get_table : db=default tbl=sample_07
17/04/25 12:02:13 INFO audit: ugi=cloudera ip=unknown-ip-addr cmd=get_table : db=default tbl=sample_07
17/04/25 12:02:13 INFO HiveMetaStore: 0: get_table : db=default tbl=sample_07
17/04/25 12:02:13 INFO audit: ugi=cloudera ip=unknown-ip-addr cmd=get_table : db=default tbl=sample_07
Exception in thread "main" org.apache.spark.sql.AnalysisException: Table or view not found: sample_07; line 1 pos 21
at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.org$apache$spark$sql$catalyst$analysis$Analyzer$ResolveRelations$$lookupTableFromCatalog(Analyzer.scala:459)
at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$9.applyOrElse(Analyzer.scala:478)
at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$9.applyOrElse(Analyzer.scala:463)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$resolveOperators$1.apply(LogicalPlan.scala:61)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$resolveOperators$1.apply(LogicalPlan.scala:61)
at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperators(LogicalPlan.scala:60)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$1.apply(LogicalPlan.scala:58)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$1.apply(LogicalPlan.scala:58)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$5.apply(TreeNode.scala:331)
at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:188)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformChildren(TreeNode.scala:329)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperators(LogicalPlan.scala:58)
at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.apply(Analyzer.scala:463)
at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.apply(Analyzer.scala:453)
at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:85)
at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:82)
at scala.collection.LinearSeqOptimized$class.foldLeft(LinearSeqOptimized.scala:111)
at scala.collection.immutable.List.foldLeft(List.scala:84)
at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:82)
at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:74)
at scala.collection.immutable.List.foreach(List.scala:318)
at org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:74)
at org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:64)
at org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:62)
at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:50)
at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:63)
at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:592)
at spark_210_test.SparkTest.main(SparkTest.java:35)
17/04/25 12:02:13 INFO SparkContext: Invoking stop() from shutdown hook
17/04/25 12:02:13 INFO SparkUI: Stopped Spark web UI at http://10.0.2.15:4041
17/04/25 12:02:13 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
17/04/25 12:02:13 INFO MemoryStore: MemoryStore cleared
17/04/25 12:02:13 INFO BlockManager: BlockManager stopped
17/04/25 12:02:13 INFO BlockManagerMaster: BlockManagerMaster stopped
17/04/25 12:02:13 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
17/04/25 12:02:13 INFO SparkContext: Successfully stopped SparkContext
17/04/25 12:02:13 INFO ShutdownHookManager: Shutdown hook called
17/04/25 12:02:14 INFO ShutdownHookManager: Deleting directory /tmp/spark-7c1cfc73-34b9-463d-b12a-5cbcb832b0f8
Just in case, my pom.xml is below:
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0
                             http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>spark_test_210</groupId>
  <artifactId>spark_test_210</artifactId>
  <version>0.0.1-SNAPSHOT</version>
  <dependencies>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-hive_2.10</artifactId>
      <version>2.1.0</version>
    </dependency>
  </dependencies>
  <build>
    <sourceDirectory>src</sourceDirectory>
  </build>
</project>
Any help would be greatly appreciated.
2 Answers

Answer from t1rydlwq:
A few steps are needed:

Use SparkSession.enableHiveSupport() instead of the deprecated SQLContext or HiveContext.

Copy hive-site.xml into the Spark conf directory (/usr/lib/spark/conf).

Add that same directory to the classpath when executing the jar (thanks to Paul and Samson above).
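The steps above can be sketched as shell commands. The paths are assumptions based on a Cloudera QuickStart VM layout; adjust them to your environment. Note that the `Using direct SQL, underlying DB is DERBY` line in the log above suggests Spark fell back to a local embedded Derby metastore, which is what happens when hive-site.xml is not on the classpath.

```
# Assumed paths for a Cloudera QuickStart VM -- adjust to your layout.

# 1. Make hive-site.xml visible to Spark:
cp /etc/hive/conf/hive-site.xml /usr/lib/spark/conf/

# 2. Put the conf directory on the classpath when running the jar, so the
#    real metastore settings are picked up instead of an embedded Derby one:
java -cp SparkTest.jar:/usr/lib/spark/conf spark_210_test.SparkTest
```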
Answer from xqk2d5yq:
After following the steps Joydeep described above, also prefix the table or view name with its database name, e.g. employee.emp.
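Applied to the question's code, this suggestion would look roughly like the sketch below (assuming sample_07 lives in the default database; the database name is an assumption):

```java
// Qualify the table with its database name explicitly:
Dataset<Row> dfCount = sparkSession.sql("select count(*) from default.sample_07");

// Or switch the current database first, then use the bare table name:
sparkSession.sql("use default");
Dataset<Row> dfCount2 = sparkSession.sql("select count(*) from sample_07");
```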