Load a table from Hive and save it as CSV?

y0u0uwnf  posted on 2021-06-26  in Hive

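I have a huge table with 440 million rows and 30 columns, and I need to save it locally.
I have been using the sqlalchemy package to do this, but it takes a very long time.
How can I make it faster? Should I use dask? Here is my code: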

import csv
import pandas as pd
from pyhive import hive
from sqlalchemy import *
from sqlalchemy.engine import create_engine
from sqlalchemy.schema import *
from odo import odo

################################################################################ 

# Using SQLAlchemy:

# Creating engine:

engine = create_engine('hive://er123.company.test:10000/db')

# Connecting to the engine:

con = engine.connect()

# Reflect the table definition from the database into a SQLAlchemy Table object:

metadata = MetaData()
tbl_name = "tbl"
Table(tbl_name, metadata, autoload=True, autoload_with=engine)

# Create statement:

stmt = "SELECT * FROM tbl"

# Fetch the data:

data = pd.read_sql(stmt, con = con)

# To save to a csv:

# Using SQL statement:

pd.read_sql(stmt, con = con).to_csv('data.csv', index=False)
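
One direction that might help is to avoid building a 440-million-row DataFrame in memory and instead stream the result set to disk in batches. The sketch below is an assumption-laden illustration, not a benchmarked fix: the helper name `export_query_to_csv` and the batch size are hypothetical, and it relies only on the standard DB-API cursor methods (`execute`, `description`, `fetchmany`), which PyHive cursors provide. Whether it is actually faster on a given cluster is untested here.

```python
import csv


def export_query_to_csv(conn, query, path, batch_size=100_000):
    """Stream the result of `query` to a CSV file in fixed-size batches.

    `conn` is any DB-API 2.0 connection (for Hive, e.g. the object returned
    by pyhive.hive.connect). Returns the number of data rows written.
    """
    cursor = conn.cursor()
    try:
        cursor.execute(query)
        rows_written = 0
        with open(path, "w", newline="", encoding="utf-8") as f:
            writer = csv.writer(f)
            # Header row: the first field of each description entry is the column name.
            writer.writerow([col[0] for col in cursor.description])
            while True:
                # Only `batch_size` rows are held in memory at any time.
                batch = cursor.fetchmany(batch_size)
                if not batch:
                    break
                writer.writerows(batch)
                rows_written += len(batch)
        return rows_written
    finally:
        cursor.close()


# Hypothetical usage against the server from the question (not run here):
# from pyhive import hive
# conn = hive.connect(host="er123.company.test", port=10000, database="db")
# export_query_to_csv(conn, "SELECT * FROM tbl", "data.csv")
```

Note that the listing above calls `pd.read_sql` twice, so the whole table is fetched twice; writing once, in batches, at least removes that duplicate transfer. For a table of this size, having Hive itself write the result to files and then copying them out may be faster still, but that depends on the cluster setup.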
