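From Data Science Experience I can connect to the Hive database in BigInsights and read the table schema, but Data Science Experience doesn't seem able to read the table's contents: the row count comes back as zero! Here are my settings: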
```
from pyspark import SparkConf
from pyspark.sql import SparkSession

# Pass the conf to the builder; otherwise it is created but never applied
conf = SparkConf().set("com.ibm.analytics.metadata.enabled", "false")
spark = SparkSession.builder.config(conf=conf).enableHiveSupport().getOrCreate()

dash = {
    'jdbcurl': 'jdbc:hive2://nnnnnnnnnnn:10000/;ssl=true;',
    'user': 'xxxxxxxxxx',
    'password': 'xxxxxxxxx',
}

offers = spark.read.jdbc(dash['jdbcurl'],
                         table='offers',
                         properties={"user": dash["user"],
                                     "password": dash["password"]})
offers.count()  # returns 0
offers.show()
```
`offers.show()` returns only the column headers, with no rows:
```
+-----------+----------+
|offers.name|offers.age|
+-----------+----------+
+-----------+----------+
```
Thanks.
Answer:
Yes, I see the same behavior with the Hive JDBC connector. I tried this Python connector instead, and it returned the correct count:
https://datascience.ibm.com/docs/content/analyze-data/python_load.html
```
from ingest.Connectors import Connectors

# Connection options for the Hive connector described in the docs link above
HiveloadOptions = {
    Connectors.Hive.HOST: 'bi-hadoop-prod-4222.bi.services.us-south.bluemix.net',
    Connectors.Hive.PORT: '10000',
    Connectors.Hive.SSL: True,
    Connectors.Hive.DATABASE: 'default',
    Connectors.Hive.USERNAME: 'charles',
    Connectors.Hive.PASSWORD: 'march14march',
    Connectors.Hive.SOURCE_TABLE_NAME: 'student',
}

HiveDF = sqlContext.read.format("com.ibm.spark.discover").options(**HiveloadOptions).load()

HiveDF.printSchema()
HiveDF.show()
HiveDF.count()
```
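If you already have the JDBC URL from the question, its parts map onto the connector's host, port, database and SSL options. The following is a minimal plain-Python sketch, assuming the HiveServer2 URL form `jdbc:hive2://host:port/db;key=value;...`. The helper `parse_hive_jdbc_url` and its string keys are hypothetical names for illustration, not part of the `ingest` module.

```python
from urllib.parse import urlsplit

HIVESERVER2_DEFAULT_PORT = 10000  # HiveServer2's default Thrift port


def parse_hive_jdbc_url(jdbc_url):
    """Split a HiveServer2 JDBC URL into host, port, database and SSL flag.

    Hypothetical helper: the keys below are plain strings chosen for this
    sketch, not the Connectors.Hive constants from the DSX ingest module.
    """
    prefix = "jdbc:"
    if not jdbc_url.startswith(prefix):
        raise ValueError("not a JDBC URL: " + jdbc_url)

    # Drop "jdbc:" so urlsplit sees "hive2://host:port/db;key=value;..."
    parts = urlsplit(jdbc_url[len(prefix):])
    if parts.scheme != "hive2":
        raise ValueError("not a HiveServer2 URL: " + jdbc_url)

    # The path looks like "/db;key=value;...": the database name comes
    # before the first ';' and session settings follow it.
    database, _, session = parts.path.lstrip("/").partition(";")

    settings = {}
    for item in session.split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            settings[key.strip().lower()] = value.strip()

    return {
        "host": parts.hostname,
        "port": str(parts.port or HIVESERVER2_DEFAULT_PORT),
        # An empty database segment means Hive's "default" database
        "database": database or "default",
        "ssl": settings.get("ssl", "false").lower() == "true",
    }


opts = parse_hive_jdbc_url("jdbc:hive2://nnnnnnnnnnn:10000/;ssl=true;")
print(opts)
# {'host': 'nnnnnnnnnnn', 'port': '10000', 'database': 'default', 'ssl': True}
```

To use these with the connector, copy each value into the matching `Connectors.Hive.*` key (`HOST`, `PORT`, `DATABASE`, `SSL`) and add `USERNAME`, `PASSWORD` and `SOURCE_TABLE_NAME` as in the answer.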
Thanks, Charles.