I have a huge table, 440M rows by 30 columns, and I need to save it locally.
I have been using the sqlalchemy package for this, but it takes a very long time.
How can I do this faster? Should I use dask? Here is my code:
import pandas as pd
# pyhive must be installed: it provides the hive:// dialect used below.
from sqlalchemy import create_engine
################################################################################
# Using SQLAlchemy:
# Creating engine:
engine = create_engine('hive://er123.company.test:10000/db')
# Connecting to the engine:
con = engine.connect()
# Query statement:
stmt = "SELECT * FROM tbl"
# Fetch the data into a single DataFrame (all 440M rows are read into memory):
data = pd.read_sql(stmt, con=con)
# Save it to a csv:
data.to_csv('data.csv', index=False)
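A minimal sketch of a faster, memory-bounded variant (assuming the same engine URL and table; the 500,000-row chunk size is an arbitrary value to tune for your memory budget): pd.read_sql accepts a chunksize argument, in which case it returns an iterator of DataFrames, so the result set can be streamed straight to disk instead of materializing all 440M rows at once.
import pandas as pd
# pyhive must be installed: it provides the hive:// dialect used below.
from sqlalchemy import create_engine

engine = create_engine('hive://er123.company.test:10000/db')
stmt = "SELECT * FROM tbl"

# Stream the result set in chunks so the whole table never sits in memory at once:
with open('data.csv', 'w', newline='') as f:
    for i, chunk in enumerate(pd.read_sql(stmt, con=engine, chunksize=500_000)):
        # Write the header only once, with the first chunk.
        chunk.to_csv(f, index=False, header=(i == 0))
As for Dask: dask.dataframe.read_sql_table can read partitions in parallel, but it needs a sortable (ideally indexed, numeric) column to split on via its index_col argument, which a Hive table may not have; chunked streaming as above is often the simpler first step.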