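I am trying to query the top N rows from a join of several tables. The WHERE clause references a list of columns in one table, while the ORDER BY clause references columns in another table. It looks like MySQL picks the table involved in the WHERE clause for the first filtering pass (which does not filter much), whereas it is the ORDER BY, once the LIMIT is applied, that determines which rows are returned. If I force MySQL to use a covering index for the ORDER BY, the query returns the required rows immediately. Unfortunately, I cannot pass index hints to MySQL through JPA, and rewriting everything as native queries would be a lot of work. Here is an example: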
CREATE TABLE person (
id INTEGER PRIMARY KEY AUTO_INCREMENT,
first_name VARCHAR(255),
last_name VARCHAR(255)
) engine=InnoDB;
CREATE TABLE membership (
id INTEGER PRIMARY KEY AUTO_INCREMENT,
name VARCHAR(255) NOT NULL
) engine=InnoDB;
CREATE TABLE employee (
id INTEGER PRIMARY KEY AUTO_INCREMENT,
membership_id INTEGER NOT NULL,
type VARCHAR(15),
enabled BIT NOT NULL,
person_id INTEGER NOT NULL REFERENCES person ( id ),
CONSTRAINT fk_employee_membership_id FOREIGN KEY ( membership_id ) REFERENCES membership ( id ),
CONSTRAINT fk_employee_person_id FOREIGN KEY ( person_id ) REFERENCES person ( id )
) engine=InnoDB;
CREATE UNIQUE INDEX uk_employee_person_id ON employee ( person_id );
CREATE INDEX idx_person_first_name_last_name ON person ( first_name, last_name );
I wrote a script that prints a batch of INSERT statements to populate the tables with 200000 rows:
#!/bin/bash
#
echo "INSERT INTO membership ( id, name ) VALUES ( 1, 'Default Membership' );"
for seq in {1..200000}; do
echo "INSERT INTO person ( id, first_name, last_name ) VALUES ( $seq, 'firstName$seq', 'lastName$seq' );"
echo "INSERT INTO employee ( id, membership_id, type, enabled, person_id ) VALUES ( $seq, 1, 'INDIVIDUAL', 1, $seq );"
done
My first attempt:
SELECT e.*
FROM person p INNER JOIN employee e ON p.id = e.person_id
WHERE e.membership_id = 1 AND type = 'INDIVIDUAL' AND enabled = 1
ORDER BY p.first_name ASC, p.last_name ASC, p.id ASC
LIMIT 100;
-- 100 rows in set (1.43 sec)
And the EXPLAIN output:
+----+-------------+-------+------------+--------+-------------------------------------------------+---------------------------+---------+--------------------+-------+----------+----------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+--------+-------------------------------------------------+---------------------------+---------+--------------------+-------+----------+----------------------------------------------+
| 1 | SIMPLE | e | NULL | ref | uk_employee_person_id,fk_employee_membership_id | fk_employee_membership_id | 4 | const | 99814 | 5.00 | Using where; Using temporary; Using filesort |
| 1 | SIMPLE | p | NULL | eq_ref | PRIMARY | PRIMARY | 4 | qsuite.e.person_id | 1 | 100.00 | NULL |
+----+-------------+-------+------------+--------+-------------------------------------------------+---------------------------+---------+--------------------+-------+----------+----------------------------------------------+
Now I force MySQL to use the (first_name, last_name) index on person:
SELECT e.*
FROM person p USE INDEX ( idx_person_first_name_last_name )
INNER JOIN employee e ON p.id = e.person_id
WHERE e.membership_id = 1 AND type = 'INDIVIDUAL' AND enabled = 1
ORDER BY p.first_name ASC, p.last_name ASC, p.id ASC
LIMIT 100;
-- 100 rows in set (0.00 sec)
It returns immediately. And the EXPLAIN output:
+----+-------------+-------+------------+--------+-------------------------------------------------+---------------------------------+---------+-------------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+--------+-------------------------------------------------+---------------------------------+---------+-------------+------+----------+-------------+
| 1 | SIMPLE | p | NULL | index | NULL | idx_person_first_name_last_name | 2046 | NULL | 100 | 100.00 | Using index |
| 1 | SIMPLE | e | NULL | eq_ref | uk_employee_person_id,fk_employee_membership_id | uk_employee_person_id | 4 | qsuite.p.id | 1 | 5.00 | Using where |
+----+-------------+-------+------------+--------+-------------------------------------------------+---------------------------------+---------+-------------+------+----------+-------------+
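Note that the WHERE clause in the example does not actually filter out any rows. This is largely representative of the data I have and of most of the queries against this table. Is there a way to coax MySQL into using this index, or some less disruptive way to restructure things to improve performance?
Thanks.
Edit: I dropped the original covering index and added one on each table: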
CREATE INDEX idx_person_id_first_name_last_name ON person ( id, first_name, last_name );
CREATE INDEX idx_employee_etc ON employee ( membership_id, type, enabled, person_id );
This seems to speed things up a little, but MySQL still insists on reading the employee table first:
+----+-------------+-------+------------+--------+--------------------------------------------+------------------+---------+--------------------+-------+----------+----------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+--------+--------------------------------------------+------------------+---------+--------------------+-------+----------+----------------------------------------------+
| 1 | SIMPLE | e | NULL | ref | uk_employee_person_id,idx_employee_etc | idx_employee_etc | 68 | const,const,const | 97311 | 100.00 | Using index; Using temporary; Using filesort |
| 1 | SIMPLE | p | NULL | eq_ref | PRIMARY,idx_person_id_first_name_last_name | PRIMARY | 4 | qsuite.e.person_id | 1 | 100.00 | NULL |
+----+-------------+-------+------------+--------+--------------------------------------------+------------------+---------+--------------------+-------+----------+----------------------------------------------+
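2 answers

#1 (h9a6wy2h)
I would put the index on the person table on
(id, first_name, last_name)
and drop the (first_name, last_name) index, unless you really do query primarily by a person's first name. For the employee table, have an index on
(membership_id, type, enabled, person_id)
Having the proper index on the employee table helps retrieve all the qualifying records. Having the person's name and id in the index keeps the engine from going to the raw data pages to fetch those columns for the final sort and limit.

#2 (8zzbczxx)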
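Storing the first and last name redundantly in the employee table is an option, but it has drawbacks. You have to manage the redundancy. To guarantee consistency, you can make those columns part of the foreign key; ON UPDATE CASCADE will then do some of the work for you. You would still need to rewrite your INSERT statements or use triggers. With first_name and last_name being part of the employee table, you would be able to create an optimal index for your query. The table definition and the query would both have to change accordingly.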
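A minimal sketch of that denormalization, assuming the schema from the question. The index and constraint names (uk_person_id_name, fk_employee_person_name, idx_employee_filter_sort) are hypothetical, and the statements are an illustration of the idea rather than a tested migration:

```sql
-- The parent columns referenced by a composite foreign key must be indexed.
-- A UNIQUE index is used here; since id is the primary key, uniqueness is trivial.
ALTER TABLE person
  ADD UNIQUE INDEX uk_person_id_name (id, first_name, last_name);

-- Redundant copies of the name columns in employee.
ALTER TABLE employee
  ADD COLUMN first_name VARCHAR(255),
  ADD COLUMN last_name VARCHAR(255);

-- Backfill the copies from person before adding the constraint.
UPDATE employee e
JOIN person p ON p.id = e.person_id
SET e.first_name = p.first_name,
    e.last_name  = p.last_name;

-- Composite foreign key: a name change in person cascades to employee.
-- Caveat: rows where any of these columns is NULL are not checked.
ALTER TABLE employee
  ADD CONSTRAINT fk_employee_person_name
    FOREIGN KEY (person_id, first_name, last_name)
    REFERENCES person (id, first_name, last_name)
    ON UPDATE CASCADE;

-- Equality columns first, then the sort columns in ORDER BY order.
CREATE INDEX idx_employee_filter_sort
  ON employee (membership_id, type, enabled, first_name, last_name, person_id);

-- The query then needs only the employee table.
SELECT e.*
FROM employee e
WHERE e.membership_id = 1 AND e.type = 'INDIVIDUAL' AND e.enabled = 1
ORDER BY e.first_name ASC, e.last_name ASC, e.person_id ASC
LIMIT 100;
```

Because the three leading index columns are compared for equality, the remaining index columns already deliver rows in the requested order, so the engine should be able to stop after 100 index entries instead of sorting. Running EXPLAIN on the final query is the way to confirm that no filesort remains.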
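That said, I would avoid such changes if possible. There may be other ways to get an index used for the ORDER BY. I would first try moving the WHERE conditions into a correlated EXISTS subquery.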
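A minimal sketch of such a rewrite, assuming the schema from the question; the alias e1 is illustrative. Since employee.person_id is unique, the subquery can only match the same row that the join produces, so the result should be the same as for the original query:

```sql
SELECT e.*
FROM person p
INNER JOIN employee e ON p.id = e.person_id
WHERE EXISTS (
    -- The filter conditions moved out of the outer WHERE clause.
    SELECT 1
    FROM employee e1
    WHERE e1.person_id = p.id
      AND e1.membership_id = 1
      AND e1.type = 'INDIVIDUAL'
      AND e1.enabled = 1
)
ORDER BY p.first_name ASC, p.last_name ASC, p.id ASC
LIMIT 100;
```

Newer MySQL versions can transform EXISTS subqueries into semijoins and may then reorder the tables again, so it is worth checking the EXPLAIN output on the actual server rather than assuming the plan.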
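Now, to evaluate the subquery, the engine needs p.id, so it has to read from the person table first (you will see this in the execution plan). I would expect it to be smart enough to read it from the index. Note that in InnoDB the primary key is always part of every secondary key, so the idx_person_first_name_last_name index is effectively on (first_name, last_name, id).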
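One way to check this on the schema from the question, assuming idx_person_first_name_last_name still exists, is to EXPLAIN a query that touches only the indexed columns plus the primary key:

```sql
-- If the secondary index implicitly ends with the primary key, this query can be
-- answered from the index alone, including the trailing ORDER BY p.id.
EXPLAIN
SELECT p.id, p.first_name, p.last_name
FROM person p
ORDER BY p.first_name ASC, p.last_name ASC, p.id ASC
LIMIT 100;
```

This matches the hinted plan in the question, where the person row shows "Using index" on idx_person_first_name_last_name and no filesort even though the ORDER BY ends with p.id.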