SQL Server How to compare data of Dev, Test and Prod of Azure SQL with Python?

juzqafwq  于 2023-04-28  发布在  Python
关注(0)|答案(1)|浏览(223)

I have need to compare amount rows of Azure SQL databases to check data quality. There are about 50 database tables and occasionally new tables are added. I would like to build with Python script that connects to 3 environments and build reports if tables are ok.

I am wondering where to store results with Python during data comparing process. And I am wondering if comparison should be done 2 tables in once or all 3 in once.

Example of report:

"", "Dev","Test","Prod"
"Table A", "120","120","120"
"Table B", "150","150","50"

Let say that I would like to execute this statement to each DB, what architecture to compare values and create reports?

Possibly this query can be used.

SELECT (SCHEMA_NAME(A.schema_id) + '.' + A.Name) AS TableName  
, SUM(B.rows) AS RecordCount  
FROM sys.objects A  
INNER JOIN sys.partitions B ON A.object_id = B.object_id  
WHERE A.type = 'U'  
GROUP BY A.schema_id, A.Name

Python code for fetching dev:

import pyodbc 
 import pandas as pd

 conn = pyodbc.connect('Driver={SQL Server};'
                  'Server=server_name;'
                  'Database=database_name;'
                  'Trusted_Connection=yes;')

 cursor = conn.cursor()
 sqlquery = "SELECT (SCHEMA_NAME(A.schema_id) + '.' + A.Name) AS 
 TableName  , SUM(B.rows) AS RecordCount FROM sys.objects A  INNER JOIN 
 sys.partitions B ON A.object_id = B.object_id  WHERE A.type = 'U' GROUP 
 BY A.schema_id, A.Name"

 df = pd.read_sql_query(sqlquery, conn)

 print(df)
 print(type(df))

Results from Dev

TableName  RecordCount
0    TableA         120
1    TableB         150

What can I try next?

6psbrbz9

6psbrbz91#

Firstly, we have multiple third-party tools where we can achieve our task. Also, we have few ways to compare the number of rows from Python script.

  1. We can initiate the same block of code which you have done for Dev environment for Test and Prod by changing the variable names for conn and cursor.
connProd = pyodbc.connect('Driver={SQL Server};'
                  'Server=server_name;'
                  'Database=database_name;'
                  'Trusted_Connection=yes;')

cursorProd = connProd.cursor()
sqlqueryProd = "SELECT (SCHEMA_NAME(A.schema_id) + '.' + A.Name) AS 
TableName  , SUM(B.rows) AS RecordCount FROM sys.objects A  INNER JOIN 
sys.partitions B ON A.object_id = B.object_id  WHERE A.type = 'U' GROUP 
BY A.schema_id, A.Name"

dfProd = pd.read_sql_query(sqlqueryProd, connProd)

And we can print them as below:

print(dfProd)
print(type(dfProd))
  1. We can pull the database information to Excel using Pandas and can compare them, please refer to this blog article.

相关问题