I am facing the following issue: I have a really large dataset (200M+ rows, dozens of columns) that I have to retrieve from SQL Server.
This is how I'm currently doing it:
from pyodbc import connect
from pandas import read_sql

def get_table(query, server,
              database="", driver="SQL Server Native Client 11.0"):
    # Build the ODBC connection string; Trusted_Connection=yes uses
    # Windows authentication, so no user/password is needed.
    cnxn_str = ("Driver={" + driver + "};"
                "Server=" + server + ";"
                "Database=" + database + ";"
                "Trusted_Connection=yes;")
    cnxn = connect(cnxn_str)
    # Load the full result set into a single DataFrame.
    return read_sql(query, cnxn)
This function works fine for up to a few hundred thousand rows, but I can't get it to return even 1M rows. Every time I query a very large dataset, I get the following error (translated from Portuguese):
OperationalError: ('08S01', '[08S01] [Microsoft][SQL Server Native Client 11.0]TCP Provider: A connection attempt failed because a connected component did not respond correctly after a period of time.
Note that the code connects to SQL Server using Windows authentication, which is why I don't have to pass a user or password.
I also tried passing chunksize to read_sql, but got the same error.
Is there any way I can do this efficiently?
1 Answer
Try using a cursor. A cursor fetches one row at a time, so the whole result set never has to be pulled over the connection in a single read.
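A minimal sketch of that idea, assuming the same trusted-connection setup as in the question; the name get_table_chunked and the batch_size value are just for illustration, and it uses pyodbc's fetchmany to pull rows in batches rather than strictly one at a time, which is usually much faster:

import pyodbc
import pandas as pd

def get_table_chunked(query, server, database="",
                      driver="SQL Server Native Client 11.0",
                      batch_size=50_000):
    # Hypothetical helper: same Windows-authentication connection string
    # as in the question.
    cnxn_str = ("Driver={" + driver + "};"
                "Server=" + server + ";"
                "Database=" + database + ";"
                "Trusted_Connection=yes;")
    cnxn = pyodbc.connect(cnxn_str)
    cursor = cnxn.cursor()
    cursor.execute(query)

    # Column names are available from the cursor after execute().
    columns = [col[0] for col in cursor.description]

    frames = []
    while True:
        # Fetch a batch of rows instead of the entire result set at once.
        rows = cursor.fetchmany(batch_size)
        if not rows:
            break
        frames.append(pd.DataFrame.from_records([tuple(r) for r in rows],
                                                columns=columns))

    cursor.close()
    cnxn.close()
    return pd.concat(frames, ignore_index=True) if frames else pd.DataFrame(columns=columns)

Note that concatenating at the end still requires the full result to fit in memory; if that is the real constraint with 200M+ rows, process or write each batch inside the loop (to CSV or Parquet, for example) instead of accumulating the frames.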