Get a lot of data (200M rows) from SQL Server in Python

kjthegm6 · posted 2023-03-22 · in SQL Server

I am facing the following issue: I have a really large dataset (200M+ rows, dozens of columns) that I have to retrieve from SQL Server.

The current way I'm doing it is this way:

from pyodbc import connect
from pandas import read_sql

def get_table(query, server, database="",
              driver="SQL Server Native Client 11.0"):
    cnxn_str = ("Driver={" + driver + "};"
                "Server=" + server + ";"
                "Database=" + database + ";"
                "Trusted_Connection=yes;")
    cnxn = connect(cnxn_str)
    return read_sql(query, cnxn)

This function runs fine for up to a few hundred thousand rows, but I can't make it return even 1M rows. Every time I query a very large dataset, the following error is returned (translated from Portuguese):

OperationalError: ('08S01', '[08S01] [Microsoft][SQL Server Native Client 11.0]TCP Provider: A connection attempt failed because the connected party did not properly respond after a period of time.')

Note that the local code uses Windows authentication to connect to SQL Server, which is why I don't pass a user or password.

I also tried to use chunksize, but got the same result.
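For reference, chunked reading with read_sql normally looks like the sketch below. It is illustrated here with an in-memory SQLite database standing in for the SQL Server connection (an assumption made so the example is self-contained); the pattern is identical with a pyodbc connection.

```python
import sqlite3
import pandas as pd

# Hypothetical stand-in for the SQL Server connection.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, val TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)",
                 [(i, f"row{i}") for i in range(10)])

# With chunksize, read_sql returns an iterator of DataFrames instead of one
# giant frame, so each chunk can be processed and discarded in turn.
total = 0
for chunk in pd.read_sql("SELECT * FROM t", conn, chunksize=4):
    total += len(chunk)

print(total)  # 10
```

The key point is that the iterator must be consumed chunk by chunk; collecting all chunks and concatenating them rebuilds the full DataFrame in memory and defeats the purpose.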

Is there any way I can do this efficiently?


xlpyo6sf1#

Try using a cursor. A cursor lets you fetch rows in batches instead of pulling the whole result set through pandas at once.

conn = pyodbc.connect(
    r'DRIVER={SQL Server Native Client 11.0};'
    r'SERVER=' + server + ';'
    r'DATABASE=' + database + ';'
    r'UID=' + user + ';'
    r'PWD=' + password
)
cursor = conn.cursor()
cursor.execute("select * from " + table)
rows = cursor.fetchall()

# ... process the rows

or try:

conn = pyodbc.connect(
    r'DRIVER={SQL Server Native Client 11.0};'
    r'SERVER=' + server + ';'
    r'DATABASE=' + database + ';'
    r'UID=' + user + ';'
    r'PWD=' + password
)
cursor = conn.cursor()
cursor.execute("select * from " + table)
while True:
    rows = cursor.fetchmany(chunksize)
    if not rows:
        break
    for row in rows:
        yield row
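The fetchmany loop above is a generator, so it has to be wrapped in a function and consumed by the caller. The sketch below shows one way to do that, again using an in-memory SQLite database as a stand-in for the SQL Server connection (the name `iter_rows` is my own; sqlite3 cursors support the same DB-API `fetchmany` as pyodbc):

```python
import sqlite3

def iter_rows(conn, table, chunksize=1000):
    # Stream rows in batches; at most `chunksize` rows are held at once.
    cursor = conn.cursor()
    cursor.execute("SELECT * FROM " + table)  # table name assumed trusted
    while True:
        rows = cursor.fetchmany(chunksize)
        if not rows:
            break
        yield from rows

# Demo: in-memory SQLite standing in for SQL Server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(7)])

rows = list(iter_rows(conn, "t", chunksize=3))
print(len(rows))  # 7
```

Collecting everything into a list, as the demo does, is only for illustration; for 200M rows you would process each batch (aggregate, write to disk, etc.) inside the loop instead.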
