Why can't I connect to the Hive metastore using Apache Spark?

Posted by wkftcu5l on 2021-06-02 in Hadoop

I am trying to connect to Apache Hive using Apache Spark from a Java program. The program is as follows:

import org.apache.spark.sql.SparkSession;

public class queryhive {

    public static void main(String[] args) {
        // Relative directory passed to spark.sql.warehouse.dir
        String warehouseLocation = "spark-warehouse";

        // Local session with Hive support enabled
        SparkSession spark = SparkSession
                .builder()
                .appName("Java Spark Hive Example")
                .master("local[*]")
                .config("spark.sql.warehouse.dir", warehouseLocation)
                .enableHiveSupport()
                .getOrCreate();

        try {
            spark.sql("select count(*) from health1").show();
        } catch (Exception e) {
            // Note: this catches every exception, not only AnalysisException
            System.out.print("\nTable is not found\n");
        }
    }
}

In the Maven pom.xml I added the HDFS address and the <properties> tag.
I want to query a Hive table using Spark, but I cannot see the table; I get the table-not-found exception:

log4j:WARN No appenders could be found for logger (org.apache.hadoop.util.Shell).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
17/02/18 11:30:56 INFO SparkContext: Running Spark version 2.1.0
17/02/18 11:30:56 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/02/18 11:30:56 WARN Utils: Your hostname, aims resolves to a loopback address: 127.0.1.1; using 10.0.0.3 instead (on interface wlp2s0)
17/02/18 11:30:56 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
17/02/18 11:30:56 INFO SecurityManager: Changing view acls to: aims
17/02/18 11:30:56 INFO SecurityManager: Changing modify acls to: aims
17/02/18 11:30:56 INFO SecurityManager: Changing view acls groups to: 
17/02/18 11:30:56 INFO SecurityManager: Changing modify acls groups to: 
17/02/18 11:30:56 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(aims); groups with view permissions: Set(); users  with modify permissions: Set(aims); groups with modify permissions: Set()
17/02/18 11:30:57 INFO Utils: Successfully started service 'sparkDriver' on port 32975.
17/02/18 11:30:57 INFO SparkEnv: Registering MapOutputTracker
17/02/18 11:30:57 INFO SparkEnv: Registering BlockManagerMaster
17/02/18 11:30:57 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
17/02/18 11:30:57 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
17/02/18 11:30:57 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-6263f04a-5c65-4dda-9e9a-faafb32a066a
17/02/18 11:30:57 INFO MemoryStore: MemoryStore started with capacity 335.4 MB
17/02/18 11:30:57 INFO SparkEnv: Registering OutputCommitCoordinator
17/02/18 11:30:58 INFO Utils: Successfully started service 'SparkUI' on port 4040.
17/02/18 11:30:58 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://10.0.0.3:4040
17/02/18 11:30:58 INFO Executor: Starting executor ID driver on host localhost
17/02/18 11:30:58 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 43772.
17/02/18 11:30:58 INFO NettyBlockTransferService: Server created on 10.0.0.3:43772
17/02/18 11:30:58 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
17/02/18 11:30:58 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 10.0.0.3, 43772, None)
17/02/18 11:30:58 INFO BlockManagerMasterEndpoint: Registering block manager 10.0.0.3:43772 with 335.4 MB RAM, BlockManagerId(driver, 10.0.0.3, 43772, None)
17/02/18 11:30:58 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 10.0.0.3, 43772, None)
17/02/18 11:30:58 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, 10.0.0.3, 43772, None)
17/02/18 11:30:58 INFO SharedState: Warehouse path is 'hdfs://localhost:8020/user/hive/warehouse/default.db/spark-warehouse'.
17/02/18 11:30:58 INFO HiveUtils: Initializing HiveMetastoreConnection version 1.2.1 using Spark classes.
17/02/18 11:30:59 INFO HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
17/02/18 11:30:59 INFO ObjectStore: ObjectStore, initialize called
17/02/18 11:31:00 INFO Persistence: Property hive.metastore.integral.jdo.pushdown unknown - will be ignored
17/02/18 11:31:00 INFO Persistence: Property datanucleus.cache.level2 unknown - will be ignored
17/02/18 11:31:02 INFO ObjectStore: Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
17/02/18 11:31:03 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
17/02/18 11:31:03 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
17/02/18 11:31:03 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
17/02/18 11:31:03 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
17/02/18 11:31:03 INFO Query: Reading in results for query "org.datanucleus.store.rdbms.query.SQLQuery@0" since the connection used is closing
17/02/18 11:31:03 INFO MetaStoreDirectSql: Using direct SQL, underlying DB is DERBY
17/02/18 11:31:03 INFO ObjectStore: Initialized ObjectStore
17/02/18 11:31:05 INFO HiveMetaStore: Added admin role in metastore
17/02/18 11:31:05 INFO HiveMetaStore: Added public role in metastore
17/02/18 11:31:05 INFO HiveMetaStore: No user is added in admin role, since config is empty
17/02/18 11:31:05 INFO HiveMetaStore: 0: get_all_databases
17/02/18 11:31:05 INFO audit: ugi=aims  ip=unknown-ip-addr  cmd=get_all_databases   
17/02/18 11:31:05 INFO HiveMetaStore: 0: get_functions: db=default pat=*
17/02/18 11:31:05 INFO audit: ugi=aims  ip=unknown-ip-addr  cmd=get_functions: db=default pat=* 
17/02/18 11:31:05 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MResourceUri" is tagged as "embedded-only" so does not have its own datastore table.
17/02/18 11:31:06 INFO SessionState: Created local directory: /tmp/cac4110a-ebb3-47a6-b21e-682a12724ba2_resources
17/02/18 11:31:06 INFO SessionState: Created HDFS directory: /tmp/hive/aims/cac4110a-ebb3-47a6-b21e-682a12724ba2
17/02/18 11:31:06 INFO SessionState: Created local directory: /tmp/aims/cac4110a-ebb3-47a6-b21e-682a12724ba2
17/02/18 11:31:06 INFO SessionState: Created HDFS directory: /tmp/hive/aims/cac4110a-ebb3-47a6-b21e-682a12724ba2/_tmp_space.db
17/02/18 11:31:06 INFO HiveClientImpl: Warehouse location for Hive client (version 1.2.1) is hdfs://localhost:8020/user/hive/warehouse/default.db/spark-warehouse
17/02/18 11:31:06 INFO HiveMetaStore: 0: get_database: default
17/02/18 11:31:06 INFO audit: ugi=aims  ip=unknown-ip-addr  cmd=get_database: default   
17/02/18 11:31:06 INFO HiveMetaStore: 0: get_database: global_temp
17/02/18 11:31:06 INFO audit: ugi=aims  ip=unknown-ip-addr  cmd=get_database: global_temp   
17/02/18 11:31:06 WARN ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
17/02/18 11:31:06 INFO SparkSqlParser: Parsing command: select count(*) from health1
17/02/18 11:31:08 INFO HiveMetaStore: 0: get_table : db=default tbl=health1
17/02/18 11:31:08 INFO audit: ugi=aims  ip=unknown-ip-addr  cmd=get_table : db=default tbl=health1  
17/02/18 11:31:08 INFO HiveMetaStore: 0: get_table : db=default tbl=health1
17/02/18 11:31:08 INFO audit: ugi=aims  ip=unknown-ip-addr  cmd=get_table : db=default tbl=health1  

Table is not found
17/02/18 11:31:08 INFO SparkContext: Invoking stop() from shutdown hook
17/02/18 11:31:08 INFO SparkUI: Stopped Spark web UI at http://10.0.0.3:4040
17/02/18 11:31:08 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
17/02/18 11:31:08 INFO MemoryStore: MemoryStore cleared
17/02/18 11:31:08 INFO BlockManager: BlockManager stopped
17/02/18 11:31:08 INFO BlockManagerMaster: BlockManagerMaster stopped
17/02/18 11:31:08 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
17/02/18 11:31:08 INFO SparkContext: Successfully stopped SparkContext
17/02/18 11:31:08 INFO ShutdownHookManager: Shutdown hook called
17/02/18 11:31:08 INFO ShutdownHookManager: Deleting directory /tmp/spark-ea93f7ec-6151-43e9-b5d9-bedbba537d62

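I am using Apache Hive 1.2.0 and Spark 2.1.0.
I believe the problem is not caused by the versions. I am using Eclipse Neon as the IDE. Please tell me why I am facing this problem and how to solve it.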

Answer #1 (dw1jzc5e)

You need to specify the schema name. Either use select * from schemaName.tableName, or switch to the schema first, as shown below:

try {
    spark.sql("use schemaName");   // name of the schema
    spark.sql("select count(*) from health1").show();
} catch (Exception e) {
    System.out.print("\nTable is not found\n");
}
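
The first option, qualifying the table name, can be sketched in plain Java as below. This is only an illustration under assumptions: QualifiedTableQuery and its methods are hypothetical helpers that are not part of Spark or of the original post, "schemaName" is a placeholder for whatever schema actually holds the table, and the identifier check is a deliberately conservative rule chosen for this sketch. The helper only builds the SQL string; the commented line shows where it would be handed to the spark.sql call from the question.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of Spark or of the original post.
class QualifiedTableQuery {

    // Conservative rule for this sketch: letters, digits and underscores only.
    private static final Pattern IDENTIFIER = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");

    // Wraps a simple identifier in backticks, rejecting anything else.
    static String quote(String identifier) {
        if (identifier == null || !IDENTIFIER.matcher(identifier).matches()) {
            throw new IllegalArgumentException("Not a simple identifier: " + identifier);
        }
        return "`" + identifier + "`";
    }

    // Builds the schema-qualified name, e.g. `schemaName`.`health1`.
    static String qualify(String schema, String table) {
        return quote(schema) + "." + quote(table);
    }

    // Builds the count query against the qualified table name.
    static String countQuery(String schema, String table) {
        return "select count(*) from " + qualify(schema, table);
    }

    public static void main(String[] args) {
        // "schemaName" is a placeholder; replace it with the real schema.
        String query = countQuery("schemaName", "health1");
        System.out.println(query);
        // In the program from the question, the string would then be executed with:
        // spark.sql(query).show();
    }
}
```

Qualifying the name in the query itself avoids depending on which schema the session currently has selected, whereas the "use schemaName" variant changes the current schema for every later statement in the same session.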
