How to set up a Hive database connection inside Spark

Posted by xu3bshqb on 2021-06-02 in Hadoop


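I am new to Spark and Hive. I can currently run Spark 1.5.2, and I can also access Hive from the command line. I want to connect to a Hive database programmatically, run a query, and pull the result into a DataFrame, all from inside Spark. I imagine this workflow is fairly standard, but I don't know how to do it.
So far I know that I can get a HiveContext in Spark: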

import org.apache.spark.sql.hive.HiveContext;

And I can run all of my queries in Hive:

SHOW TABLES; 
>>customers
  students
  ...

Then I can fetch data from a table:

SELECT * FROM customers limit 100;

How do I tie these two together in Spark?
Thanks.

hadoop Hive apache-spark

Source: https://stackoverflow.com/questions/40349290/how-to-set-up-hive-database-connection-inside-spark

1 Answer


bcs8qyzn1#

// sc is an existing SparkContext.

val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)

// Queries are expressed in HiveQL.

val tablelist = sqlContext.sql("show tables")
val custdf = sqlContext.sql("SELECT * FROM customers limit 100")

tablelist.collect().foreach(println)
custdf.collect().foreach(println)
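
For a standalone application rather than the shell, the same steps could be packaged roughly as follows. This is only a sketch under some assumptions: Spark is built with Hive support, a hive-site.xml pointing at the metastore sits in Spark's conf/ directory, and the job is launched with spark-submit so the master URL comes from there. The object name HiveQueryExample and the helper limitedSelect are hypothetical names chosen for illustration; customers is the table from the question.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.hive.HiveContext

object HiveQueryExample {
  // Hypothetical helper: builds the HiveQL text for a row-limited SELECT.
  // The table name is spliced in as-is, so only pass trusted values.
  def limitedSelect(table: String, limit: Int): String =
    s"SELECT * FROM $table LIMIT $limit"

  def main(args: Array[String]): Unit = {
    // The app name is arbitrary; the master URL is expected from spark-submit.
    val conf = new SparkConf().setAppName("HiveQueryExample")
    val sc = new SparkContext(conf)
    try {
      // HiveContext picks up the metastore settings from hive-site.xml.
      val hiveContext = new HiveContext(sc)

      // List the tables visible in the current Hive database.
      val tables: DataFrame = hiveContext.sql("SHOW TABLES")
      tables.show()

      // Run the query from the question; the result is a DataFrame.
      val customers: DataFrame = hiveContext.sql(limitedSelect("customers", 100))
      customers.printSchema()
      customers.show(10)
    } finally {
      // Release cluster resources even if a query fails.
      sc.stop()
    }
  }
}
```

Using show() instead of collect().foreach(println) prints only a bounded number of rows on the driver, which tends to be safer for large tables. On Spark 2.0 and later, HiveContext is deprecated; there the equivalent entry point is SparkSession.builder().enableHiveSupport().getOrCreate(), and queries go through spark.sql(...).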

Posted 2021-06-03
