I understand why for Clustered Indexes (maintain sorted order).
For Non-Clustered, though, couldn't you apply a hash function be applied to find the page containing the row? Let's say you have
- 1519 rows
- 2100 bytes per row
- 64 kilobytes per page
Then the number of pages is 1519 * 2100 / (64 * 1024) = 48 [+ remainder].
You can just % 49 the row id to find the page containing the row.
(Yes, I know it's a little more complicated because pages could overflow because hashing will not be perfectly uniform, that we will need to sometimes split pages similar to adding new node halfway between existing nodes in a hash ring, etc. Still, it seems better than B-Tree.)
2条答案
按热度按时间tktrz96b1#
Why is a B-Tree even needed?
It isn't - but for SQL Server that is how non clustered indexes on rowstore on-disc tables is implemented. Potentially they could have provided an option for hash indexes here but they haven't.
Postgres does have hash indexes
The in memory OLTP implementation in SQL Server also supports hash indexes or range indexes (BW tree).
Hash indexes are only good for equality type operations on the whole key.
B trees are significantly more versatile than just this simple key/value type use case and can also support range searches or searches on leading columns of a composite key or bringing back results in a particular sort order.
axr492tv2#
Indexes are rarely well understood by developers. Here is a general explanation of the history of indexes and their nature...
In the Middle Ages, the notion of address (to get to a specific place, a house for example) was very vague. The peasants did not move, alone, the aristocracy traveled, but only went to other aristocrats whose residence was well known. Only the villages had a name. It's already a first form of index: name of the village plus map provides quicker access to the desired location!
Nevertheless, in the 17th century in Paris, people were convinced that it was necessary to give a name to the streets of the big cities to better identify themselves. In 1728 plaques were placed on the houses to indicate the name of each street. The indexing of places then became more precise.
In 1805, it was decided in Paris to number the houses. Not randomly, in a specific order. The odd numbers on the left, the even numbers on the right and starting at the end of the street closest to the Seine. This algorithm then made it possible to quickly find the exact location of a dwelling. It is still used today...
By adding the postal code in 1964, the mail became faster to deliver...
All this information is useless to find a specific place. They are just essential to find it quickly. That's all the point of the index...
Note that there is not one type of index, but tens.
In the case of our addresses, a geographic coordinate with its latitude and longitude is also a form of address, less easy to interpret for humans, but very easy to manipulate for GPS!
The same thing happened on the phone. At first, a switchboard was called and a local operator connected you with the central of the city in which the recipient of your call was located. The remote operator then asked you for the name of the subscriber with whom you wanted to talk...
With the democratization of the telephone, numbers arrived. It was impossible for the operators to know all the names of the subscribers. Then the operators were replaced by electromechanical control units, then by electronic switches... Today everything is done electronically with index tables...
In fact we are surrounded by indexes without knowing it. Your address, your email, your phone numbers, your social security code are forms of index. You don't absolutely need them, but they are just essential so you don't waste time.
In computer science, a strict definition of the notion of index is as follows:
Specific data structure, containing redundant information, specially organized to speed up certain searches
There are therefore many types of index more or less adapted to the information that we want to find quickly.
Mainly the most common algorithms in databases are:
BTree is a perfect index for atomic data. This index type can be used with many operators like :
but it is not effective on the following operators:
And it can be used in a set of columns... But it take a lot of place, because the data is a strict copy of the original one.
Hash indexes are only suitable for :
Bitmap indexes replace the collections of the differents values by a bit into a binary word. It can be used the same ways as hash indexes :
You must understand that any indexes is compound of a special data structure and a specific algorithm, to use this special structure, and find quickly the data you are searching...
So every index needs a data structure and CLUSTERED or NONCLUSTERED SQL Server indexes both uses a BTree+, + is a refinement of BTree index that add a linked list of index pages at evry level.