My client uses a table to create monthly financial reports. I have already optimized it: there were around 40 aggregation columns and it took around 40 minutes to produce the output. I created a process that stores the aggregation values in a table, so when the columns are selected the aggregations are already calculated, and it now runs in 3-4 minutes. Now I'm trying to create an index, but (I guess due to the large volume of data) SQL Server is not using it. The query is pretty simple:
Select *
From Table
Where year(data_doc) = 2023
and month(data_doc) = 10
I created a non-clustered index on just the column 'data_doc', but SQL Server still scanned the whole table. I altered the index and included the columns that make up the primary key, like this:
CREATE INDEX ix_Data_Doc ON Table(data_doc, OperationDetail_Id, OperationTraceOut_Id, OperationTraceIn_Id)
It's important to say that the client is using a view, and 'select * from view' returns around 57 million rows, with 57 columns in total. Filtering by year and month returns around 2 to 3 million rows. The column 'data_doc' has datatype 'datetime'.
How can I create the indexes to optimize the process?
4 Answers
h7appiyu1#
Your WHERE clause is not SARGable (it doesn't contain searchable arguments), so the index is "useless" for that query: it can't be used. You need to use date ranges, not functions on your columns, in the WHERE clause.

Also, your SELECT uses SELECT *, and I doubt that the table only has those 3 columns in it. As a result, even if the query were SARGable, it may still result in a scan if the data engine estimates that the cost of doing (many) key lookups is more expensive than a scan. If that's the case, you'll likely need to change your index to INCLUDE the other columns in the table, or better, only return the columns from the table you actually need and ensure at least those (other) columns are in the INCLUDE.
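A sketch of what the SARGable version could look like, using [Table] as a stand-in for the real table name and an illustrative column list (replace it with the columns you actually need):

```sql
-- Range filter directly on data_doc, so an index on data_doc can seek
SELECT OperationDetail_Id, OperationTraceOut_Id, OperationTraceIn_Id, data_doc
FROM [Table]
WHERE data_doc >= '20231001'   -- inclusive start of October 2023
  AND data_doc <  '20231101';  -- exclusive start of November 2023
```

The half-open range (>= start, < next month) also handles the datetime time portion correctly, which a BETWEEN on dates would not.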
50few1ms2#
Because you are applying a function to the date column, SQL Server can't use the index. Instead, you can rewrite the query, for example:
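One such rewrite, using a half-open date range so an index on data_doc can be used ([Table] stands in for the real table name):

```sql
SELECT *
FROM [Table]
WHERE data_doc >= '20231001'  -- 1 Oct 2023, inclusive
  AND data_doc <  '20231101'; -- 1 Nov 2023, exclusive
```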
Or like this with a date parameter:
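A parameterized variant might look like this sketch (the parameter name @d is illustrative):

```sql
DECLARE @d date = '20231015';  -- any date inside the target month

SELECT *
FROM [Table]
WHERE data_doc >= DATEFROMPARTS(YEAR(@d), MONTH(@d), 1)
  AND data_doc <  DATEADD(month, 1, DATEFROMPARTS(YEAR(@d), MONTH(@d), 1));
```

This computes the first day of @d's month and the first day of the following month, keeping the filter SARGable.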
1sbrub3j3#
The query prevents the server from using the index. This isn't a SQL Server problem: indexes, in all databases, are built on the actual field values. When the query uses

Where year(date) = 2023 and month(date) = 10

it's trying to filter on calculated values that aren't stored in any index.

The best way to handle this is to use a Calendar (or Date Dimension) table: a table with precalculated dates for, e.g., 10 to 20 years, and additional indexed columns for Year, Month, Day, WeekDay, Week Number, Semester, Quarter, etc., or whatever reporting period you want. This is common in all data warehouse designs.
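Such a Calendar table could be sketched like this (all table and column names are illustrative, not from the original):

```sql
-- Illustrative Calendar table: one row per day for the reporting horizon
CREATE TABLE Calendar (
    CalendarDate date NOT NULL PRIMARY KEY,
    [Year]       int  NOT NULL,
    [Month]      int  NOT NULL,
    [Day]        int  NOT NULL,
    [WeekDay]    int  NOT NULL,
    WeekNumber   int  NOT NULL,
    [Quarter]    int  NOT NULL,
    Semester     int  NOT NULL
);

-- Index the reporting-period columns used for filtering
CREATE INDEX ix_Calendar_Year_Month ON Calendar([Year], [Month]);
```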
Using a Calendar table, you can easily filter or group by any period you want. It's also possible to perform period-over-period calculations using windowing functions like LEAD, LAG, SUM() OVER(), etc. Your particular query would become, e.g.:
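A sketch of the rewritten query, assuming a Calendar table with CalendarDate, [Year], and [Month] columns, with [Table] standing in for the real table name:

```sql
SELECT t.*
FROM [Table] AS t
INNER JOIN Calendar AS c
        ON c.CalendarDate = CAST(t.data_doc AS date)
WHERE c.[Year]  = 2023
  AND c.[Month] = 10;
```

(CAST to date is one of the few column expressions SQL Server can still match to an index seek on a datetime column.)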
Extra columns per reporting period can be used for filtering, grouping and labeling:
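For example, grouping by a precomputed label column (MonthName and t.Amount are illustrative names, not from the original):

```sql
SELECT c.[Year], c.MonthName, SUM(t.Amount) AS Total
FROM [Table] AS t
INNER JOIN Calendar AS c
        ON c.CalendarDate = CAST(t.data_doc AS date)
WHERE c.[Year] = 2023
GROUP BY c.[Year], c.MonthName;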
A calendar table isn't very large, which means it's easy to index all the columns you need. It's also a great candidate for compression and columnstore indexing.
> 40 aggregations columns

> 57 millions rows and that there are 57 columns
That's a roundabout way of describing a Fact table in a Data Warehouse. It's quite possible there's already a Date dimension. Even if there isn't, a Calendar table with a columnstore index can speed up querying significantly, especially if the fact table is also a columnstore.
ibps3vxo4#
A computed column can be persisted and indexed. You could add an indexed computed column that contains the year/month in sorting order, i.e. YYYYMM, using the expression Year(data_doc) * 100 + Month(data_doc).

Indexing the computed column would let a query use the index, e.g.:

... where YearMonth = 202310;

or:

... where 202301 <= YearMonth and YearMonth <= 202312;
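Put together, the computed-column approach might look like this sketch (the index name is illustrative and [Table] stands in for the real table name):

```sql
-- Persisted computed column holding YYYYMM, then an index on it
ALTER TABLE [Table]
  ADD YearMonth AS (YEAR(data_doc) * 100 + MONTH(data_doc)) PERSISTED;

CREATE INDEX ix_YearMonth ON [Table](YearMonth);

-- This filter can now seek on the new index:
SELECT *
FROM [Table]
WHERE YearMonth = 202310;
```

YEAR() and MONTH() are deterministic, which is what allows the computed column to be persisted and indexed.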