Why is my SQL Server ORDER BY slow despite the ordered column being indexed?

ryevplcw  于 2023-11-16  发布在  SQL Server
关注(0)|答案(5)|浏览(155)

I have an SQL query (generated by LINQ to Entities) which is roughly like the following:

SELECT * FROM [mydb].[dbo].[employees]
JOIN [mydb].[dbo].[industry]
  ON jobs.industryId = industry.id
JOIN [mydb].[dbo].[state]
  ON jobs.stateId = state.id
JOIN [mydb].[dbo].[positionType]
  ON jobs.positionTypeId = positionType.id
JOIN [mydb].[dbo].[payPer]
  ON jobs.salaryPerId = payPer.id
JOIN [mydb].[dbo].[country]
  ON jobs.countryId = country.id
WHERE countryName = 'US'
ORDER BY startDatetime

The query returns about 1200 rows, which I don't think is a huge amount. Unfortunately it also takes ~16 seconds. Without the ORDER BY, the query takes <1 second.

I've used SQL Server Management Studio to put an index on the startDatetime column, and also a clustered index on "cityId, industryId, startDatetime, positionTypeId, payPerId, stateId" (i.e. all of the columns in "jobs" that we use in JOINs and on the column we use ORDER BY on). I already have individual indexes on each of the columns we use in JOINs. Unfortunately this hasn't made the query any faster.

I ran a showplan and got:

|--Nested Loops(Inner Join, OUTER REFERENCES:([mydb].[dbo].[jobs].[cityId]))
       |--Nested Loops(Inner Join, OUTER REFERENCES:([mydb].[dbo].[jobs].[stateId]))
       |    |--Nested Loops(Inner Join, OUTER REFERENCES:([mydb].[dbo].[jobs].[industryId]))
       |    |    |--Nested Loops(Inner Join, OUTER REFERENCES:([mydb].[dbo].[jobs].[positionTypeId]))
       |    |    |    |--Nested Loops(Inner Join, OUTER REFERENCES:([mydb].[dbo].[jobs].[salaryPerId]))
       |    |    |    |    |--Sort(ORDER BY:([mydb].[dbo].[jobs].[issueDatetime] ASC))
       |    |    |    |    |    |--Hash Match(Inner Join, HASH:([mydb].[dbo].[currency].[id])=([mydb].[dbo].[jobs].[salaryCurrencyId]))
       |    |    |    |    |         |--Index Scan(OBJECT:([mydb].[dbo].[currency].[IX_currency]))
       |    |    |    |    |         |--Nested Loops(Inner Join, WHERE:([mydb].[dbo].[jobs].[countryId]=[mydb].[dbo].[country].[id]))
       |    |    |    |    |              |--Index Seek(OBJECT:([mydb].[dbo].[country].[IX_country]), SEEK:([mydb].[dbo].[country].[countryName]='US') ORDERED FORWARD)
       |    |    |    |    |              |--Clustered Index Scan(OBJECT:([mydb].[dbo].[jobs].[PK_jobs]))
       |    |    |    |    |--Clustered Index Seek(OBJECT:([mydb].[dbo].[payPer].[PK_payPer]), SEEK:([mydb].[dbo].[payPer].[id]=[mydb].[dbo].[jobs].[salaryPerId]) ORDERED FORWARD)
       |    |    |    |--Clustered Index Seek(OBJECT:([mydb].[dbo].[positionType].[PK_positionType]), SEEK:([mydb].[dbo].[positionType].[id]=[mydb].[dbo].[jobs].[positionTypeId]) ORDERED FORWARD)
       |    |    |--Clustered Index Seek(OBJECT:([mydb].[dbo].[industry].[PK_industry]), SEEK:([mydb].[dbo].[industry].[id]=[mydb].[dbo].[jobs].[industryId]) ORDERED FORWARD)
       |    |--Clustered Index Seek(OBJECT:([mydb].[dbo].[state].[PK_state]), SEEK:([mydb].[dbo].[state].[id]=[mydb].[dbo].[jobs].[stateId]) ORDERED FORWARD)
       |--Clustered Index Seek(OBJECT:([mydb].[dbo].[city].[PK_city]), SEEK:([mydb].[dbo].[city].[id]=[mydb].[dbo].[jobs].[cityId]) ORDERED FORWARD)

The important line seems to be "|--Sort(ORDER BY:([mydb].[dbo].[jobs].[issueDatetime] ASC))" — without any mention of an index on that column.

Why is my ORDER BY making my query so much slower, and how can I speed up my query?

c7rzv4ha

c7rzv4ha1#

If your query does not contain an order by then it will return the data in whatever order it was found. There is no guarantee that the data will even be returned in the same order when you run the query again.

When you include an order by clause, the database has to build a list of the rows in the correct order and then return the data in that order. This can take a lot of extra processing which translates into extra time.

It probably takes longer to sort a large number of columns, which your query might be returning. At some point you will run out of buffer space and the database will have to start swapping and performance will go downhill.

Try returning less columns (specify the columns you need instead of select *) and see if the query runs faster.

ddhy6vgd

ddhy6vgd2#

Because your query projects all the columns ( * ), it needs 5 columns for the join conditions and has an unselective WHERE clause on what is likely a joined table column, it causes it to hit the Index Tipping Point : the optimizer decides that it is less costly to scan the entire table, filter it and sort it that it would be to range scan the index and then lookup each key in the table to retrieve the needed extra columns (the 5 for the joins and the rest for the * ).

A better index to partially cover this query could be:

CREATE INDEX ... ON .. (countryId, startDatetime);

Jeffrey's suggestion to make the clustered index would cover the query 100% and would definitely improve performance, but changing the clustered index has many side effects. I would start with a non-clustered index as above. Unless they are needed by other queries, you can drop all the other non-clustered indexes you created, they won't help this query.

vptzau2j

vptzau2j3#

You should try below code also

Insert the records into temporary table Without using the Order by clause

SELECT * into #temp FROM [mydb].[dbo].[employees]
JOIN [mydb].[dbo].[industry]
  ON jobs.industryId = industry.id
JOIN [mydb].[dbo].[state]
  ON jobs.stateId = state.id
JOIN [mydb].[dbo].[positionType]
  ON jobs.positionTypeId = positionType.id
JOIN [mydb].[dbo].[payPer]
  ON jobs.salaryPerId = payPer.id
JOIN [mydb].[dbo].[country]
  ON jobs.countryId = country.id
WHERE countryName = 'US'

Now run the statement using Order By Clause

Select * from #temp ORDER BY startDatetime
eni9jsuy

eni9jsuy4#

Indexing a column doesn't help make the sort faster.

If you want to make your query a lot faster, then reverse the order of your tables. Specifically, list table country first in your joined tables.

The reason why this helps is that the where clause can filter rows from the first table instead of having to make all those joins, then filtering the rows.

4c8rllxm

4c8rllxm5#

What order are the fields in the clustered index included in? You'll want to put the startDateTime field first in order for the ORDER BY to match it, or in this case (countryId, startDateTime) up front in that order since you want to select a single countryId (indirectly, via countryName ) and then order by startDateTime .

相关问题