Copying MySQL query results to a temporary file in Python

6pp0gazn  posted on 2023-01-20 in MySQL

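I'm still fairly new to SQL, but I previously followed a tutorial called Optimizing pandas.read_sql for Postgres. The problem is that I'm working with a large dataset, similar to the example in the tutorial, and I need a faster way to run a query and convert the result into a DataFrame. There, they use the following function: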

def read_sql_tmpfile(query, db_engine):
    with tempfile.TemporaryFile() as tmpfile:
        copy_sql = "COPY ({query}) TO STDOUT WITH CSV {head}".format(
           query=query, head="HEADER"
        )
        conn = db_engine.raw_connection()
        cur = conn.cursor()
        cur.copy_expert(copy_sql, tmpfile)  # I want to replicate this
        tmpfile.seek(0)
        df = pandas.read_csv(tmpfile)
        return df

I tried to replicate it like this:

def read_sql_tmpfile(query, connection):
    with tempfile.TemporaryFile() as tmpfile:
        copy_sql = "COPY ({query}) TO STDOUT WITH CSV {head}".format(
           query=query, head="HEADER"
        )

        cur = connection.cursor()
        cur.copy_expert(copy_sql, tmpfile)
        tmpfile.seek(0)
        df = pandas.read_csv(tmpfile)
        return df

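The problem is that cursor.copy_expert comes from psycopg2, the PostgreSQL library, and I can't find a way to do the same thing with pymysql. Is there any way to do this? How should I go about it? Thanks.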


t98cgbkg  #1

As mentioned in the comments, and in this answer, you are looking for SELECT ... INTO OUTFILE.
Here is a small example based on your question (untested):

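import os
import shutil
import tempfile

import pandas

def read_sql_tmpfile(query, connection):
    # Choose a file name without creating the file: INTO OUTFILE fails if the
    # file already exists. The MySQL server writes it, so the server must run
    # on this host, be allowed to write to this directory (secure_file_priv),
    # and the user needs the FILE privilege.
    tmp_dir = tempfile.mkdtemp()
    tmp_file_name = os.path.join(tmp_dir, "query_result.csv")

    # Copy data into temporary file as comma-separated values
    copy_sql = (
        "{query} INTO OUTFILE '{outfile}' "
        "FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"'"
    ).format(query=query.rstrip().rstrip(";"), outfile=tmp_file_name)
    try:
        cur = connection.cursor()
        cur.execute(copy_sql)

        # Read data from file (INTO OUTFILE writes no header row)
        df = pandas.read_csv(tmp_file_name, header=None)
    finally:
        # Cleanup
        shutil.rmtree(tmp_dir, ignore_errors=True)
    return df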
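
Because SELECT ... INTO OUTFILE creates the file on the MySQL server host, it may not be usable when the database runs on a different machine. A possible client-side alternative, sketched below, streams the rows through the cursor into a local temporary file and then lets pandas parse it, which mirrors the shape of the copy_expert version. The helper name copy_query_to_file and the chunk_size parameter are my own inventions, the sketch assumes the cursor returns rows as tuples (the pymysql default), and I have not measured whether it is actually faster than pandas.read_sql for your data.

```python
import csv
import tempfile


def copy_query_to_file(query, connection, fileobj, chunk_size=10_000):
    """Write the result of `query` to `fileobj` as CSV with a header row.

    Hypothetical helper: works with any DB-API connection whose cursor
    returns rows as sequences (not dicts).
    """
    cur = connection.cursor()
    try:
        cur.execute(query)
        writer = csv.writer(fileobj)
        # cursor.description has one entry per column; item 0 is the column name
        writer.writerow([col[0] for col in cur.description])
        while True:
            # Fetch in chunks so the whole result is never held in a Python list
            rows = cur.fetchmany(chunk_size)
            if not rows:
                break
            writer.writerows(rows)
    finally:
        cur.close()


def read_sql_tmpfile(query, connection):
    # Imported here so the CSV helper above needs only the standard library
    import pandas

    with tempfile.TemporaryFile(mode="w+", newline="", encoding="utf-8") as tmpfile:
        copy_query_to_file(query, connection, tmpfile)
        tmpfile.seek(0)
        return pandas.read_csv(tmpfile)
```

With pymysql, opening the connection with cursorclass=pymysql.cursors.SSCursor should keep the driver from buffering the full result set on the client, which is probably what you want for a large table.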

bwleehnv  #2

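You can simply write your file to /tmp, which is cleared between reboots. You could also add your own decorator/context manager to get behaviour similar to tempfile.TemporaryFile. A simple example would look something like this...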

import os

class SQLGeneratedTemporaryFile:

    def __init__(self, filename):
        self.filename = filename

    def __enter__(self):
        # run your query and write to your file with the name `self.filename`
        return self.filename

    def __exit__(self, *exc):
        # remove the file when the with-block ends, if it was created
        if os.path.exists(self.filename):
            os.unlink(self.filename)
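
The same idea can also be written with contextlib.contextmanager, which is roughly the "decorator" variant mentioned above. This is only a sketch: sql_generated_tmpfile and its write_query_result callback are hypothetical names, and the callback stands in for whatever code runs your query and writes the output to the given path.

```python
import contextlib
import os
import shutil
import tempfile


@contextlib.contextmanager
def sql_generated_tmpfile(write_query_result, directory=None):
    """Yield the path of a file produced by `write_query_result`, then delete it.

    `write_query_result` is a hypothetical callback that takes a file path and
    writes the query output there.
    """
    # A private directory avoids name clashes with other processes
    tmp_dir = tempfile.mkdtemp(dir=directory)
    path = os.path.join(tmp_dir, "result.csv")
    try:
        write_query_result(path)
        yield path
    finally:
        # Runs on normal exit and on exceptions alike
        shutil.rmtree(tmp_dir, ignore_errors=True)
```

Usage would then look like `with sql_generated_tmpfile(run_my_query) as path: df = pandas.read_csv(path)`, where run_my_query is your own function.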
