I am running a series of queries against Hive on Spark from Python using the impyla package, via impyla's SQLAlchemy support. SQLAlchemy automatically creates and closes a DBAPI cursor for each SQL statement it executes. Because the impyla HiveServer2Cursor implementation closes the underlying Hive session when the cursor is closed, each SQL statement ends up running as a separate Spark job. I would like to avoid the overhead of starting a new Spark job for every SQL statement, while still using SQLAlchemy rather than the raw DBAPI interface.
Reusing a single DBAPI cursor does work, of course, but I would prefer to use a SQLAlchemy engine and its connection pooling and automatic cursor management features.
# this version uses raw dbapi and only one cursor, and therefore one hive session
from impala.dbapi import connect

con = connect(host='cdh-dn8.ec2.internal', port=10000, kerberos_service_name='hive', auth_mechanism='GSSAPI')
cur = con.cursor()
cur.execute('set hive.execution.engine=spark')
cur.execute("select * from reference.zipcode where zip = '55112'")
rows = cur.fetchall()
# use data from result and execute more queries ...
cur.close()
con.close()
# this version uses sqlalchemy and one cursor per statement executed, resulting in multiple hive sessions
from sqlalchemy import create_engine

sqlalchemyengine = create_engine('impala://cdh-dn8.ec2.internal:10000',
                                 connect_args={'kerberos_service_name': 'hive',
                                               'auth_mechanism': 'GSSAPI'})
conn = sqlalchemyengine.connect()
conn.execute('set hive.execution.engine=spark')
result = conn.execute("select * from reference.zipcode where zip = '55112'")
# use data from result and execute more queries ...
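The one-cursor-per-statement behaviour described above is general SQLAlchemy behaviour, not something impyla adds. It can be observed with any dialect; the sketch below uses an in-memory SQLite engine (chosen only so it runs anywhere, it is not impyla-specific) and the `before_cursor_execute` event to record which DBAPI cursor each statement ran on:

```python
from sqlalchemy import create_engine, event, text

engine = create_engine("sqlite://")
cursors = []  # keep references so cursor objects stay distinct

@event.listens_for(engine, "before_cursor_execute")
def track_cursor(conn, cursor, statement, parameters, context, executemany):
    # fires once per statement, with the DBAPI cursor SQLAlchemy created for it
    cursors.append(cursor)

with engine.connect() as conn:
    conn.execute(text("create table t (x integer)"))
    conn.execute(text("insert into t values (1)"))
    conn.execute(text("select x from t"))

# three statements on one Connection, each executed on a fresh DBAPI cursor
print(len(cursors), len({id(c) for c in cursors}))
```

With impyla's dialect, each of those short-lived cursors also tears down a Hive session, which is what turns every statement into its own Spark job.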
I am wondering whether there is a good reason for impyla to open and close a Hive session with each cursor, rather than closing the Hive session only when the connection itself is closed.
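Until that changes, one possible middle ground, sketched here with SQLite purely for illustration, is to keep the SQLAlchemy engine for its pooling but check a raw DBAPI connection out of the pool with `Engine.raw_connection()` and manage a single long-lived cursor by hand. Whether this actually keeps one Hive session open with the impyla dialect is an assumption to verify, not something this sketch demonstrates:

```python
from sqlalchemy import create_engine

engine = create_engine("sqlite://")

raw = engine.raw_connection()  # DBAPI connection checked out of the engine's pool
cur = raw.cursor()             # one cursor, reused for every statement
cur.execute("create table t (x integer)")
cur.execute("insert into t values (1)")
cur.execute("select x from t")
rows = cur.fetchall()
cur.close()
raw.close()                    # returns the connection to the pool
```

This trades away SQLAlchemy's automatic cursor management, but keeps the engine as the single place that owns connections.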