SQL Server is not using my index. How to having it using my index?

yhived7q  于 12个月前  发布在  SQL Server
关注(0)|答案(4)|浏览(220)

My client is using a table to create financial reports per month. I have optimized it as there were around 40 aggregations columns and it was running for around 40 minutes to see the output. I created a process that stores the values for the aggregations into a tables and now, when selecting the columns, the aggregations are already calculated and it runs in 3-4 minutes. Now I'm trying to create an index but due to the big volume of data (I guess) SQL Server is not using it. The query is pretty simple

Select * 
From Table
Where year(data_doc) = 2023
and month(data_doc) = 10

I created a non-clustered index only for the column 'data_doc' but the SQL server still scanned the whole table. I altered the index an included the columns that make the primary key like this:

CREATE INDEX ix_Data_Doc ON Tabel(data_doc,OperationDetail_Id, OperationTraceOut_Id,OperationTraceIn_Id).

It's important to say that the client is using a view and the 'select * from view' returns around 57 millions rows and that there are 57 columns in total. Using the filter per year and month, it returns around 2/3 millions of rows. The column 'data_doc' datatype is 'datetime'.

How can I create the indexes to optimize the process?

h7appiyu

h7appiyu1#

Your WHERE is not SARGable (it doesn't contain searchable arguements), so the index is "useless" for that query as it can't be used. You need to use date ranges, not functions on your columns in the WHERE .

SELECT <Explicit List of Columns you need>
FROM dbo.YourTable
WHERE Data_doc >= '20231001'
  AND Data_doc < '20231101';

Also, however, your SELECT had SELECT * and I doubt that the table only has those 3 columns in it. As a result even if the query is SARGable, it may result in a scan if the data engine estimates the cost of doing (many) key lookups is more expensive than a scan. If that's the case, you'll likely need to change your index to INCLUDE the other columns in the table, or better only return the columns from the table you need (as I suggest in the code) and ensure at least those (other) columns are in the INCLUDE .

50few1ms

50few1ms2#

Because you are applying a function to the date column, SQL Server can't use the index. Instead, you can rewrite the query, for example:

SELECT * 
FROM Table 
WHERE date >= '2023-10-01' AND date < '2023-11-01'

Or like this with a date parameter:

DECLARE @date DATETIME = '2023-10-01'

SELECT * 
FROM Table 
WHERE date >= @date AND date < DATEADD(month, 1, @date)
1sbrub3j

1sbrub3j3#

The query prevents the server from using the index. This isn't a SQL Server problem. Indexes, in all databases, are built using the actual field values. When the query uses Where year(date) = 2023 and month(date) = 10 it's trying to filter using calculated values that aren't stored in any index.

The best way to handle this is to use a Calendar or Dimension table , a table with precalculated dates for eg 10-20 years and additional indexed columns for Year, Month, Day, WeekDay, Week Number, Semester, Quarter, etc or whatever reporting period you want. This is common in all data warehouse designs.

Using a Calendar table you can filter or group by any period you want easily. It's also possible to perform period-over-period calculations using windowing functions like LEAD , LAG , SUM() OVER() etc.

Your particular query would become eg:

Select * 
From Sales s INNER JOIN Calendar c on s.Date=c.Date
Where c.Year = 2023
and c.Month = 10

Extra columns per reporting period can be used for filtering, grouping and labeling :

Select Category,YearMonth,sum(Amount) 
From Sales s INNER JOIN Calendar c on s.Date=c.Date
Where c.YearMonth IN ('2023M10','2022M10')
GROUP BY Category,YearMonth

A calendar table isn't very large, which means it's easy to index all the columns you need. It's also a great candidate for compression and columnstore indexing.

40 aggregations columns

57 millions rows and that there are 57 columns

That's a roundabout way of describing a Fact table in a Data Warehouse. It's quite possible there's a Date dimension already. Even if it isn't, a Calendar table with a Columnstore index can speed up querying significantly, especially if the fact table is also a Columnstore

ibps3vxo

ibps3vxo4#

A computed column can be persisted and indexed. You could add an indexed computed column that contains the year/month in sorting order, i.e. YYYYMM , using the expression Year( data_doc ) * 100 + Month( data_doc ) .

Indexing the computed column would let you perform a query using the index, e.g. ... where YearMonth = 202310; or ... where 202301 <= YearMonth and YearMonth <= 202312; , that could benefit from the index.

相关问题