I need to compare the row counts of tables across Azure SQL databases to check data quality. There are about 50 tables, and new tables are occasionally added. I would like to build a Python script that connects to 3 environments and reports whether the tables are OK.
I am wondering where to store the results during the comparison process, and whether the comparison should be done 2 environments at a time or all 3 at once.
Example of report:
"", "Dev","Test","Prod"
"Table A", "120","120","120"
"Table B", "150","150","50"
Let's say I execute this statement against each DB; what architecture should I use to compare the values and create the report? Possibly this query can be used:
SELECT SCHEMA_NAME(A.schema_id) + '.' + A.Name AS TableName
     , SUM(B.rows) AS RecordCount
FROM sys.objects A
INNER JOIN sys.partitions B ON A.object_id = B.object_id
WHERE A.type = 'U'
  AND B.index_id IN (0, 1)  -- heap or clustered index only, so rows are not double-counted per nonclustered index
GROUP BY A.schema_id, A.Name
Python code for fetching Dev:
import pyodbc
import pandas as pd

# '{SQL Server}' is a legacy driver name; a current ODBC driver is more reliable with Azure SQL.
conn = pyodbc.connect(
    'Driver={ODBC Driver 17 for SQL Server};'
    'Server=server_name;'
    'Database=database_name;'
    'Trusted_Connection=yes;'
)

# Triple-quoted string so the query can span multiple lines.
sqlquery = """
SELECT SCHEMA_NAME(A.schema_id) + '.' + A.Name AS TableName
     , SUM(B.rows) AS RecordCount
FROM sys.objects A
INNER JOIN sys.partitions B ON A.object_id = B.object_id
WHERE A.type = 'U' AND B.index_id IN (0, 1)
GROUP BY A.schema_id, A.Name
"""

df = pd.read_sql_query(sqlquery, conn)
print(df)
print(type(df))
Results from Dev
TableName RecordCount
0 TableA 120
1 TableB 150
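One way to extend this to all three environments is to run the same query against each connection and tag the count column with the environment name. A minimal sketch, assuming pyodbc and pandas are available; the server names, database name, and driver version in the connection strings are placeholders you would replace with your own:

```python
import pandas as pd

ROW_COUNT_QUERY = """
SELECT SCHEMA_NAME(A.schema_id) + '.' + A.Name AS TableName
     , SUM(B.rows) AS RecordCount
FROM sys.objects A
INNER JOIN sys.partitions B ON A.object_id = B.object_id
WHERE A.type = 'U' AND B.index_id IN (0, 1)
GROUP BY A.schema_id, A.Name
"""

# Placeholder connection strings -- substitute your real servers and credentials.
ENVIRONMENTS = {
    "Dev":  "Driver={ODBC Driver 17 for SQL Server};Server=dev_server;Database=db;Trusted_Connection=yes;",
    "Test": "Driver={ODBC Driver 17 for SQL Server};Server=test_server;Database=db;Trusted_Connection=yes;",
    "Prod": "Driver={ODBC Driver 17 for SQL Server};Server=prod_server;Database=db;Trusted_Connection=yes;",
}

def fetch_counts(env_name, conn_str):
    """Fetch TableName/RecordCount for one environment; the count column is
    renamed to the environment so the frames can later be merged side by side."""
    import pyodbc  # imported here so the rest of the script loads without a driver installed
    with pyodbc.connect(conn_str) as conn:
        df = pd.read_sql_query(ROW_COUNT_QUERY, conn)
    return df.rename(columns={"RecordCount": env_name})

# frames = [fetch_counts(name, cs) for name, cs in ENVIRONMENTS.items()]
```

Holding each environment's result as an in-memory DataFrame is usually enough at this scale (50 tables x 3 environments); there is no need for an intermediate database.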
What can I try next?
1 Answer
Firstly, there are multiple third-party tools that can achieve this task. There are also a few ways to compare the row counts directly from a Python script, and we can print the results as below:
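A minimal pandas sketch of the comparison: outer-merge the three per-environment frames on TableName (so tables missing from some environments still appear), then flag a table as OK only when every environment reports the same, non-missing count. The hard-coded frames below mirror the sample report from the question and stand in for real query results:

```python
from functools import reduce
import pandas as pd

def build_report(frames):
    """Merge per-environment count frames on TableName and add an OK flag."""
    report = reduce(
        lambda left, right: left.merge(right, on="TableName", how="outer"),
        frames,
    )
    env_cols = [c for c in report.columns if c != "TableName"]
    # OK when every environment has a count and all counts are identical.
    report["OK"] = (
        report[env_cols].nunique(axis=1).eq(1)
        & report[env_cols].notna().all(axis=1)
    )
    return report

# Hypothetical frames standing in for fetch results from the three databases.
dev  = pd.DataFrame({"TableName": ["TableA", "TableB"], "Dev":  [120, 150]})
test = pd.DataFrame({"TableName": ["TableA", "TableB"], "Test": [120, 150]})
prod = pd.DataFrame({"TableName": ["TableA", "TableB"], "Prod": [120, 50]})

report = build_report([dev, test, prod])
print(report)
# report.to_csv("row_count_report.csv", index=False)  # matches the CSV layout in the question
```

Doing the merge across all 3 environments at once (rather than pairwise) keeps the logic simple and produces the wide report layout shown in the question directly.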