How do I access Hive from Python?

xzlaal3s posted on 2021-06-03 in Hadoop

https://cwiki.apache.org/confluence/display/hive/hiveclient#hiveclient-python appears to be outdated.
When I add this to /etc/profile:

export PYTHONPATH=$PYTHONPATH:/usr/lib/hive/lib/py

I can then do the imports as listed in the link, except that from hive import ThriftHive actually needs to be:

from hive_service import ThriftHive

The port in the next example was 10000, which caused the program to hang when I tried it. The default Hive Thrift port is 9083, which stopped the hanging.
So I set it up like this:

from thrift import Thrift
from thrift.transport import TSocket
from thrift.transport import TTransport
from thrift.protocol import TBinaryProtocol
try:
    transport = TSocket.TSocket('<node-with-metastore>', 9083)
    transport = TTransport.TBufferedTransport(transport)
    protocol = TBinaryProtocol.TBinaryProtocol(transport)
    client = ThriftHive.Client(protocol)
    transport.open()
    client.execute("CREATE TABLE test(c1 int)")

    transport.close()
except Thrift.TException, tx:
    print '%s' % (tx.message)

I get the following error:

Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/hive/lib/py/hive_service/ThriftHive.py", line 68, in execute
self.recv_execute()
File "/usr/lib/hive/lib/py/hive_service/ThriftHive.py", line 84, in recv_execute
raise x
thrift.Thrift.TApplicationException: Invalid method name: 'execute'

But inspecting the ThriftHive.py file reveals an execute method within the Client class.
How can I use Python to access Hive?


pxy2qtax1#

You can use the Python JayDeBeApi package to create a DB-API connection from the Hive or Impala JDBC driver, then pass the connection to the pandas.read_sql function to return the data in a pandas DataFrame.

import jaydebeapi

# Apparently need to load the jar files for the first time for impala jdbc driver to work

conn = jaydebeapi.connect('com.cloudera.hive.jdbc41.HS2Driver',
['jdbc:hive2://host:10000/db;AuthMech=1;KrbHostFQDN=xxx.com;KrbServiceName=hive;KrbRealm=xxx.COM', "",""],
jars=['/hadp/opt/jdbc/hive_jdbc_2.5.18.1050/2.5.18.1050 GA/Cloudera_HiveJDBC41_2.5.18.1050/HiveJDBC41.jar',
'/hadp/opt/jdbc/hive_jdbc_2.5.18.1050/2.5.18.1050 GA/Cloudera_HiveJDBC41_2.5.18.1050/TCLIServiceClient.jar',
'/hadp/opt/jdbc/hive_jdbc_2.5.18.1050/2.5.18.1050 GA/Cloudera_HiveJDBC41_2.5.18.1050/commons-codec-1.3.jar',
'/hadp/opt/jdbc/hive_jdbc_2.5.18.1050/2.5.18.1050 GA/Cloudera_HiveJDBC41_2.5.18.1050/commons-logging-1.1.1.jar',
'/hadp/opt/jdbc/hive_jdbc_2.5.18.1050/2.5.18.1050 GA/Cloudera_HiveJDBC41_2.5.18.1050/hive_metastore.jar',
'/hadp/opt/jdbc/hive_jdbc_2.5.18.1050/2.5.18.1050 GA/Cloudera_HiveJDBC41_2.5.18.1050/hive_service.jar',
'/hadp/opt/jdbc/hive_jdbc_2.5.18.1050/2.5.18.1050 GA/Cloudera_HiveJDBC41_2.5.18.1050/httpclient-4.1.3.jar',
'/hadp/opt/jdbc/hive_jdbc_2.5.18.1050/2.5.18.1050 GA/Cloudera_HiveJDBC41_2.5.18.1050/httpcore-4.1.3.jar',
'/hadp/opt/jdbc/hive_jdbc_2.5.18.1050/2.5.18.1050 GA/Cloudera_HiveJDBC41_2.5.18.1050/libfb303-0.9.0.jar',
'/hadp/opt/jdbc/hive_jdbc_2.5.18.1050/2.5.18.1050 GA/Cloudera_HiveJDBC41_2.5.18.1050/libthrift-0.9.0.jar',
'/hadp/opt/jdbc/hive_jdbc_2.5.18.1050/2.5.18.1050 GA/Cloudera_HiveJDBC41_2.5.18.1050/log4j-1.2.14.jar',
'/hadp/opt/jdbc/hive_jdbc_2.5.18.1050/2.5.18.1050 GA/Cloudera_HiveJDBC41_2.5.18.1050/ql.jar',
'/hadp/opt/jdbc/hive_jdbc_2.5.18.1050/2.5.18.1050 GA/Cloudera_HiveJDBC41_2.5.18.1050/slf4j-api-1.5.11.jar',
'/hadp/opt/jdbc/hive_jdbc_2.5.18.1050/2.5.18.1050 GA/Cloudera_HiveJDBC41_2.5.18.1050/slf4j-log4j12-1.5.11.jar',
'/hadp/opt/jdbc/hive_jdbc_2.5.18.1050/2.5.18.1050 GA/Cloudera_HiveJDBC41_2.5.18.1050/zookeeper-3.4.6.jar',
'/hadp/opt/jdbc/impala_jdbc_2.5.35/2.5.35.1055 GA/Cloudera_ImpalaJDBC41_2.5.35/ImpalaJDBC41.jar',
'/hadp/opt/jdbc/impala_jdbc_2.5.35/2.5.35.1055 GA/Cloudera_ImpalaJDBC41_2.5.35/TCLIServiceClient.jar',
'/hadp/opt/jdbc/impala_jdbc_2.5.35/2.5.35.1055 GA/Cloudera_ImpalaJDBC41_2.5.35/commons-codec-1.3.jar',
'/hadp/opt/jdbc/impala_jdbc_2.5.35/2.5.35.1055 GA/Cloudera_ImpalaJDBC41_2.5.35/commons-logging-1.1.1.jar',
'/hadp/opt/jdbc/impala_jdbc_2.5.35/2.5.35.1055 GA/Cloudera_ImpalaJDBC41_2.5.35/hive_metastore.jar',
'/hadp/opt/jdbc/impala_jdbc_2.5.35/2.5.35.1055 GA/Cloudera_ImpalaJDBC41_2.5.35/hive_service.jar',
'/hadp/opt/jdbc/impala_jdbc_2.5.35/2.5.35.1055 GA/Cloudera_ImpalaJDBC41_2.5.35/httpclient-4.1.3.jar',
'/hadp/opt/jdbc/impala_jdbc_2.5.35/2.5.35.1055 GA/Cloudera_ImpalaJDBC41_2.5.35/httpcore-4.1.3.jar',
'/hadp/opt/jdbc/impala_jdbc_2.5.35/2.5.35.1055 GA/Cloudera_ImpalaJDBC41_2.5.35/libfb303-0.9.0.jar',
'/hadp/opt/jdbc/impala_jdbc_2.5.35/2.5.35.1055 GA/Cloudera_ImpalaJDBC41_2.5.35/libthrift-0.9.0.jar',
'/hadp/opt/jdbc/impala_jdbc_2.5.35/2.5.35.1055 GA/Cloudera_ImpalaJDBC41_2.5.35/log4j-1.2.14.jar',
'/hadp/opt/jdbc/impala_jdbc_2.5.35/2.5.35.1055 GA/Cloudera_ImpalaJDBC41_2.5.35/ql.jar',
'/hadp/opt/jdbc/impala_jdbc_2.5.35/2.5.35.1055 GA/Cloudera_ImpalaJDBC41_2.5.35/slf4j-api-1.5.11.jar',
'/hadp/opt/jdbc/impala_jdbc_2.5.35/2.5.35.1055 GA/Cloudera_ImpalaJDBC41_2.5.35/slf4j-log4j12-1.5.11.jar',
'/hadp/opt/jdbc/impala_jdbc_2.5.35/2.5.35.1055 GA/Cloudera_ImpalaJDBC41_2.5.35/zookeeper-3.4.6.jar'
])

# the previous call has initialized the jar files; technically this call need not include them again

impala_conn = jaydebeapi.connect('com.cloudera.impala.jdbc41.Driver',
['jdbc:impala://host:21050/db;AuthMech=1;KrbHostFQDN=xxx.com;KrbServiceName=impala;KrbRealm=xxx.COM',"",""],
jars=['/hadp/opt/jdbc/hive_jdbc_2.5.18.1050/2.5.18.1050 GA/Cloudera_HiveJDBC41_2.5.18.1050/HiveJDBC41.jar',
'/hadp/opt/jdbc/hive_jdbc_2.5.18.1050/2.5.18.1050 GA/Cloudera_HiveJDBC41_2.5.18.1050/TCLIServiceClient.jar',
'/hadp/opt/jdbc/hive_jdbc_2.5.18.1050/2.5.18.1050 GA/Cloudera_HiveJDBC41_2.5.18.1050/commons-codec-1.3.jar',
'/hadp/opt/jdbc/hive_jdbc_2.5.18.1050/2.5.18.1050 GA/Cloudera_HiveJDBC41_2.5.18.1050/commons-logging-1.1.1.jar',
'/hadp/opt/jdbc/hive_jdbc_2.5.18.1050/2.5.18.1050 GA/Cloudera_HiveJDBC41_2.5.18.1050/hive_metastore.jar',
'/hadp/opt/jdbc/hive_jdbc_2.5.18.1050/2.5.18.1050 GA/Cloudera_HiveJDBC41_2.5.18.1050/hive_service.jar',
'/hadp/opt/jdbc/hive_jdbc_2.5.18.1050/2.5.18.1050 GA/Cloudera_HiveJDBC41_2.5.18.1050/httpclient-4.1.3.jar',
'/hadp/opt/jdbc/hive_jdbc_2.5.18.1050/2.5.18.1050 GA/Cloudera_HiveJDBC41_2.5.18.1050/httpcore-4.1.3.jar',
'/hadp/opt/jdbc/hive_jdbc_2.5.18.1050/2.5.18.1050 GA/Cloudera_HiveJDBC41_2.5.18.1050/libfb303-0.9.0.jar',
'/hadp/opt/jdbc/hive_jdbc_2.5.18.1050/2.5.18.1050 GA/Cloudera_HiveJDBC41_2.5.18.1050/libthrift-0.9.0.jar',
'/hadp/opt/jdbc/hive_jdbc_2.5.18.1050/2.5.18.1050 GA/Cloudera_HiveJDBC41_2.5.18.1050/log4j-1.2.14.jar',
'/hadp/opt/jdbc/hive_jdbc_2.5.18.1050/2.5.18.1050 GA/Cloudera_HiveJDBC41_2.5.18.1050/ql.jar',
'/hadp/opt/jdbc/hive_jdbc_2.5.18.1050/2.5.18.1050 GA/Cloudera_HiveJDBC41_2.5.18.1050/slf4j-api-1.5.11.jar',
'/hadp/opt/jdbc/hive_jdbc_2.5.18.1050/2.5.18.1050 GA/Cloudera_HiveJDBC41_2.5.18.1050/slf4j-log4j12-1.5.11.jar',
'/hadp/opt/jdbc/hive_jdbc_2.5.18.1050/2.5.18.1050 GA/Cloudera_HiveJDBC41_2.5.18.1050/zookeeper-3.4.6.jar',
'/hadp/opt/jdbc/impala_jdbc_2.5.35/2.5.35.1055 GA/Cloudera_ImpalaJDBC41_2.5.35/ImpalaJDBC41.jar',
'/hadp/opt/jdbc/impala_jdbc_2.5.35/2.5.35.1055 GA/Cloudera_ImpalaJDBC41_2.5.35/TCLIServiceClient.jar',
'/hadp/opt/jdbc/impala_jdbc_2.5.35/2.5.35.1055 GA/Cloudera_ImpalaJDBC41_2.5.35/commons-codec-1.3.jar',
'/hadp/opt/jdbc/impala_jdbc_2.5.35/2.5.35.1055 GA/Cloudera_ImpalaJDBC41_2.5.35/commons-logging-1.1.1.jar',
'/hadp/opt/jdbc/impala_jdbc_2.5.35/2.5.35.1055 GA/Cloudera_ImpalaJDBC41_2.5.35/hive_metastore.jar',
'/hadp/opt/jdbc/impala_jdbc_2.5.35/2.5.35.1055 GA/Cloudera_ImpalaJDBC41_2.5.35/hive_service.jar',
'/hadp/opt/jdbc/impala_jdbc_2.5.35/2.5.35.1055 GA/Cloudera_ImpalaJDBC41_2.5.35/httpclient-4.1.3.jar',
'/hadp/opt/jdbc/impala_jdbc_2.5.35/2.5.35.1055 GA/Cloudera_ImpalaJDBC41_2.5.35/httpcore-4.1.3.jar',
'/hadp/opt/jdbc/impala_jdbc_2.5.35/2.5.35.1055 GA/Cloudera_ImpalaJDBC41_2.5.35/libfb303-0.9.0.jar',
'/hadp/opt/jdbc/impala_jdbc_2.5.35/2.5.35.1055 GA/Cloudera_ImpalaJDBC41_2.5.35/libthrift-0.9.0.jar',
'/hadp/opt/jdbc/impala_jdbc_2.5.35/2.5.35.1055 GA/Cloudera_ImpalaJDBC41_2.5.35/log4j-1.2.14.jar',
'/hadp/opt/jdbc/impala_jdbc_2.5.35/2.5.35.1055 GA/Cloudera_ImpalaJDBC41_2.5.35/ql.jar',
'/hadp/opt/jdbc/impala_jdbc_2.5.35/2.5.35.1055 GA/Cloudera_ImpalaJDBC41_2.5.35/slf4j-api-1.5.11.jar',
'/hadp/opt/jdbc/impala_jdbc_2.5.35/2.5.35.1055 GA/Cloudera_ImpalaJDBC41_2.5.35/slf4j-log4j12-1.5.11.jar',
'/hadp/opt/jdbc/impala_jdbc_2.5.35/2.5.35.1055 GA/Cloudera_ImpalaJDBC41_2.5.35/zookeeper-3.4.6.jar'
])

import pandas as pd
df1 = pd.read_sql("SELECT * FROM tablename", conn)
df2 = pd.read_sql("SELECT * FROM tablename", impala_conn)

conn.close()
impala_conn.close()

beq87vna2#

Similar to eycheu's solution, but a little more detailed.
Here is an alternative solution, specific to HiveServer2, that does not require PyHive or installing system-wide packages. I am working in a Linux environment where I do not have root access, so installing the SASL dependencies mentioned in Tristin's post was not an option for me:
If you're on Linux, you may need to install SASL separately before running the above. Install the package libsasl2-dev using apt-get or yum or whatever package manager works for your distribution.
Specifically, this solution focuses on leveraging the Python package JayDeBeApi. In my experience, installing this one extra package on top of a Python Anaconda 2.7 installation was all I needed. The package leverages Java (JDK), which I assume is already set up.
Step 1: Install JayDeBeApi

pip install jaydebeapi

Step 2: Download the appropriate drivers for your environment:
Here is a link to the jars required for an enterprise CDH environment.
Another post discusses where to find JDBC drivers for Apache Hive.
Store all the .jar files in one directory. I will refer to this directory as /path/to/jar/files/.
Step 3: Identify your system's authentication mechanism:
In the PyHive solutions listed, I have seen PLAIN listed as the authentication mechanism as well as Kerberos. Note that your JDBC connection URL will depend on the authentication mechanism you are using. I will explain the Kerberos solution, which does not pass a username/password. Here is more information on Kerberos authentication and options.
Create a Kerberos ticket if one has not already been created:

$ kinit

Tickets can be viewed via klist.
You can now establish a connection via Python:

import jaydebeapi
import glob

# Creates a list of jar files in the /path/to/jar/files/ directory

jar_files = glob.glob('/path/to/jar/files/*.jar')

host='localhost'
port='10000'
database='default'

# note: your driver will depend on your environment and drivers you've

# downloaded in step 2

# this is the driver for my environment (jdbc3, hive2, cloudera enterprise)

driver='com.cloudera.hive.jdbc3.HS2Driver'

conn_hive = jaydebeapi.connect(driver,
        'jdbc:hive2://'+host+':' +port+'/'+database+';AuthMech=1;KrbHostFQDN='+host+';KrbServiceName=hive'
                           ,jars=jar_files)
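
If your cluster uses username/password authentication instead of Kerberos, the connection URL changes with the auth mechanism. A minimal sketch, assuming AuthMech=3 (username and password, per the Cloudera JDBC driver documentation) and following the same calling convention as the example above; the credentials are hypothetical placeholders:

import glob
import jaydebeapi

jar_files = glob.glob('/path/to/jar/files/*.jar')

# a sketch only: AuthMech=3 means username/password in the Cloudera driver;
# 'my_user' and 'my_password' are hypothetical placeholders
conn_hive = jaydebeapi.connect('com.cloudera.hive.jdbc3.HS2Driver',
        ['jdbc:hive2://localhost:10000/default;AuthMech=3',
         'my_user', 'my_password'],
        jars=jar_files)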

If you only care about reading, you can read it directly into a pandas DataFrame via eycheu's solution:

import pandas as pd
df = pd.read_sql("select * from table", conn_hive)

Otherwise, here is a more versatile communication option:

cursor = conn_hive.cursor()
sql_expression = "select * from table"
cursor.execute(sql_expression)
results = cursor.fetchall()

As you can imagine, if you wanted to create a table, you would not need to "fetch" the results; instead, you could submit a CREATE TABLE query, as sketched below.
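
A minimal sketch of such a DDL call (the table and column names are hypothetical):

# DDL returns no rows, so there is nothing to fetch
cursor = conn_hive.cursor()
cursor.execute("CREATE TABLE IF NOT EXISTS test_table (c1 INT, c2 STRING)")
cursor.close()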


0h4hbjxa3#

I believe the easiest way is to use PyHive.
To install you'll need these libraries:

pip install sasl
pip install thrift
pip install thrift-sasl
pip install PyHive

Please note that although you install the library as PyHive, you import the module as pyhive, all lowercase.
If you're on Linux, you may need to install SASL separately before running the above. Install the package libsasl2-dev using apt-get or yum or whatever package manager works for your distribution. For Windows there are some options on GNU.org where you can download a binary installer. On a Mac, SASL should be available if you've installed the Xcode developer tools (xcode-select --install in Terminal).
After installation, you can connect to Hive like this:

from pyhive import hive
conn = hive.Connection(host="YOUR_HIVE_HOST", port=PORT, username="YOU")

Now that you have the Hive connection, you have options for how to use it. You can just straight-up query:

cursor = conn.cursor()
cursor.execute("SELECT cool_stuff FROM hive_table")
for result in cursor.fetchall():
  use_result(result)

...or use the connection to make a pandas DataFrame:

import pandas as pd
df = pd.read_sql("SELECT cool_stuff FROM hive_table", conn)
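
PyHive also supports DB-API parameter binding (its paramstyle is pyformat), which avoids interpolating values into the SQL string yourself. A small sketch, where the partition_date column and the filter value are hypothetical:

cursor = conn.cursor()
# %(name)s placeholders are filled from the dict (pyformat paramstyle)
cursor.execute(
    "SELECT cool_stuff FROM hive_table WHERE partition_date = %(d)s",
    {"d": "2021-06-01"},
)
print(cursor.fetchall())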

7z5jn7bk4#

None of the answers demonstrate how to fetch and print the table headers. Here is a modified version of the standard PyHive example, which is widely used and actively maintained.

from pyhive import hive
cursor = hive.connect(host="localhost", 
                      port=10000, 
                      username="shadan", 
                      auth="KERBEROS", 
                      kerberos_service_name="hive"
                      ).cursor()
cursor.execute("SELECT * FROM my_dummy_table LIMIT 10")
columnList = [desc[0] for desc in cursor.description]
headerStr = ",".join(columnList)
headerTuple = tuple(headerStr.split(","))
print(headerTuple)
print(cursor.fetchone())
print(cursor.fetchall())
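
Once you have the header list, it is easy to pair it with the rows, e.g. to build dictionaries. A small follow-up sketch that reuses the cursor and columnList from the example above:

# re-run the query, since the previous fetch calls consumed the cursor
cursor.execute("SELECT * FROM my_dummy_table LIMIT 10")
rows_as_dicts = [dict(zip(columnList, row)) for row in cursor.fetchall()]
print(rows_as_dicts)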

pod7payv5#

This is a quick way to connect Hive and Python:

from pyhive import hive

cursor = hive.connect('YOUR_HOST_NAME').cursor()
# note: the old async=True flag is a syntax error on Python 3.7+,
# where async is a reserved word; recent PyHive versions spell it async_=True
cursor.execute('SELECT * from table_name LIMIT 5')
print(cursor.fetchall())

Output: a list of tuples


jvidinwx6#

You can use the hive library; for that you import the ThriftHive class with from hive import ThriftHive.
Try the example below:

import sys

from hive import ThriftHive
from hive.ttypes import HiveServerException

from thrift import Thrift
from thrift.transport import TSocket
from thrift.transport import TTransport
from thrift.protocol import TBinaryProtocol

try:
  transport = TSocket.TSocket('localhost', 10000)
  transport = TTransport.TBufferedTransport(transport)
  protocol = TBinaryProtocol.TBinaryProtocol(transport)
  client = ThriftHive.Client(protocol)
  transport.open()
  client.execute("CREATE TABLE r(a STRING, b INT, c DOUBLE)")
  client.execute("LOAD TABLE LOCAL INPATH '/path' INTO TABLE r")
  client.execute("SELECT * FROM r")
  while (1):
    row = client.fetchOne()
    if (row == None):
       break
    print row

  client.execute("SELECT * FROM r")
  print client.fetchAll()
  transport.close()
except Thrift.TException, tx:
  print '%s' % (tx.message)

zi8p0yeb7#

I ran into the same problem as you; here is my operating environment (System: Linux, Version: Python 3.6, Package: PyHive). Please refer to my answer below:

from pyhive import hive
conn = hive.Connection(host='149.129.***.**', port=10000, username='*', database='*',password="*",auth='LDAP')

The key is to add the password parameter and set auth to 'LDAP'. That makes it work; if you have any questions, please let me know.
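
Once connected, usage follows the usual DB-API pattern; a minimal sketch (the table name is hypothetical):

cursor = conn.cursor()
cursor.execute('SELECT * FROM some_table LIMIT 10')  # hypothetical table
print(cursor.fetchall())
cursor.close()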


kyvafyod8#

It is common practice to prohibit users from downloading and installing packages and libraries on cluster nodes. In that case the solutions of @python-starter and @goks work perfectly, if Hive runs on the same node. Otherwise, you can use the beeline client instead of the hive command-line tool. See the details below:


# python 2

import commands

cmd = 'beeline -u "jdbc:hive2://node07.foo.bar:10000/...<your connect string>" -e "SELECT * FROM db_name.table_name LIMIT 1;"'

status, output = commands.getstatusoutput(cmd)

if status == 0:
   print output
else:
   print "error"



# python 3

import subprocess

cmd = 'beeline -u "jdbc:hive2://node07.foo.bar:10000/...<your connect string>" -e "SELECT * FROM db_name.table_name LIMIT 1;"'

status, output = subprocess.getstatusoutput(cmd)

if status == 0:
   print(output)
else:
   print("error")

6ie5vjzr9#

pyhs2 is no longer maintained. A better alternative is impyla.
Don't be confused that some of the examples below are about Impala; just change the port to 10000 (the default) for HiveServer2, and it'll work the same way as with the Impala examples. It's the same protocol (Thrift) that is used for both Impala and Hive.
https://github.com/cloudera/impyla
It has many more features than pyhs2; for example, it has Kerberos authentication, which is a must for us (see the Kerberos sketch after the example below).

from impala.dbapi import connect
conn = connect(host='my.host.com', port=10000)
cursor = conn.cursor()
cursor.execute('SELECT * FROM mytable LIMIT 100')
print(cursor.description)  # prints the result set's schema
results = cursor.fetchall()

# or iterate over the rows directly:

cursor.execute('SELECT * FROM mytable LIMIT 100')
for row in cursor:
    process(row)
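
Since Kerberos support was the selling point here, a sketch of a Kerberized HiveServer2 connection via impyla; it assumes a valid ticket from kinit, and the hostname and service name are placeholders:

from impala.dbapi import connect

# GSSAPI is impyla's auth mechanism for Kerberos; requires the
# thrift_sasl package and an existing ticket (run kinit first)
conn = connect(host='my.host.com', port=10000,
               auth_mechanism='GSSAPI',
               kerberos_service_name='hive')
cursor = conn.cursor()
cursor.execute('SELECT * FROM mytable LIMIT 10')
print(cursor.fetchall())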

Cloudera is now putting more effort into the hs2client (https://github.com/cloudera/hs2client), a C/C++ HiveServer2/Impala client. It might be a better option if you push a lot of data to or from Python (it also has Python bindings: https://github.com/cloudera/hs2client/tree/master/python).
More information on impyla:
http://blog.cloudera.com/blog/2014/04/a-new-python-client-for-impala/
https://github.com/cloudera/impyla/blob/master/readme.md


tmb3ates10#

The Python program below should work for accessing Hive tables from Python:

import commands

cmd = "hive -S -e 'SELECT * FROM db_name.table_name LIMIT 1;' "

status, output = commands.getstatusoutput(cmd)

if status == 0:
   print output
else:
   print "error"

icnyk63a11#

I assume you are using HiveServer2, which is why your code doesn't work.
You can use pyhs2 to access your Hive correctly; the example code is as follows:

import pyhs2

with pyhs2.connect(host='localhost',
               port=10000,
               authMechanism="PLAIN",
               user='root',
               password='test',
               database='default') as conn:
    with conn.cursor() as cur:
        #Show databases
        print cur.getDatabases()

        #Execute query
        cur.execute("select * from table")

        #Return column info from query
        print cur.getSchema()

        #Fetch table results
        for i in cur.fetch():
            print i

Note that you may need to install python-devel.x86_64 and cyrus-sasl-devel.x86_64 before installing pyhs2 with pip.
Hope this helps you.
Reference: https://cwiki.apache.org/confluence/display/hive/setting+up+hiveserver2#settinguphiveserver2-PythonClientDriver


41ik7eoe12#

By using the Python client driver:

pip install pyhs2

Then:

import pyhs2

with pyhs2.connect(host='localhost',
                   port=10000,
                   authMechanism="PLAIN",
                   user='root',
                   password='test',
                   database='default') as conn:
    with conn.cursor() as cur:
        #Show databases
        print cur.getDatabases()

        #Execute query
        cur.execute("select * from table")

        #Return column info from query
        print cur.getSchema()

        #Fetch table results
        for i in cur.fetch():
            print i

Reference: https://cwiki.apache.org/confluence/display/hive/setting+up+hiveserver2#settinguphiveserver2-PythonClientDriver


jmp7cifd13#

To connect using a username/password and specifying a port, the code looks like this:

from pyhive import presto

cursor = presto.connect(host='host.example.com',
                    port=8081,
                    username='USERNAME:PASSWORD').cursor()

sql = 'select * from table limit 10'

cursor.execute(sql)

print(cursor.fetchone())
print(cursor.fetchall())

e0uiprwp14#

Similar to the @python-starter solution above. However, the commands package is not available on Python 3.x, so an alternative is to use subprocess in Python 3.x:

import subprocess

cmd = "hive -S -e 'SELECT * FROM db_name.table_name LIMIT 1;' "

status, output = subprocess.getstatusoutput(cmd)

if status == 0:
   print(output)
else:
   print("error")

46scxncf15#

The examples above are a bit out of date. Here is a newer example:

import pyhs2 as hive
import getpass
DEFAULT_DB = 'default'
DEFAULT_SERVER = '10.37.40.1'
DEFAULT_PORT = 10000
DEFAULT_DOMAIN = 'PAM01-PRD01.IBM.COM'

u = raw_input('Enter PAM username: ')
s = getpass.getpass()
connection = hive.connect(host=DEFAULT_SERVER, port= DEFAULT_PORT, authMechanism='LDAP', user=u + '@' + DEFAULT_DOMAIN, password=s)
statement = "select * from user_yuti.Temp_CredCard where pir_post_dt = '2014-05-01' limit 100"
cur = connection.cursor()

cur.execute(statement)
df = cur.fetchall()

In addition to the standard Python program, a few libraries need to be installed to allow Python to make the connection to the Hadoop database:
1. pyhs2, the Python HiveServer2 client driver
2. sasl, Cyrus-SASL bindings for Python
3. thrift, Python bindings for the Apache Thrift RPC system
4. PyHive, a Python interface to Hive
Remember to change the permission of the executable:

chmod +x test_hive2.py
./test_hive2.py

Hope it helps you. Reference: https://sites.google.com/site/tingyusz/home/blogs/hiveinpython
