MariaDB GROUP BY is very slow

Posted by xvw2m8pv on 2023-10-20 in Other

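I have a scenario where I'm trying to fetch a user's latest messages from more than 2 million rows and group them by their parent (thread) ID. However, the grouping pushes the query time to about 1 second, roughly 1000 times slower than without the GROUP BY.
Here is the table: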

CREATE TABLE `msg` (
`msg_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`msg_to` int(10) unsigned NOT NULL,
`msg_from` int(10) unsigned NOT NULL,
`msg` varchar(500) COLLATE utf8mb4_unicode_ci NOT NULL,
`date` timestamp NOT NULL DEFAULT current_timestamp(),
`parent` int(10) unsigned NOT NULL,
 PRIMARY KEY (`msg_id`),
 KEY `msg_toIX` (`msg_to`) USING BTREE,
 KEY `msg_fromIX` (`msg_from`) USING BTREE,
 KEY `parentIX` (`parent`) USING BTREE ) 
ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci

Here is my query:

SELECT a.msg_id, a.msg_from, a.msg FROM msg a 
JOIN(SELECT MAX(msg_id) maxid FROM msg WHERE msg_to = 23 GROUP BY parent ORDER BY msg_id DESC LIMIT 10) b
ON a.msg_id IN (b.maxid)
ORDER BY a.msg_id DESC LIMIT 10

EXPLAIN output (image not preserved)

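Is this the best it gets? Or should I expect better performance, given that I can pull 10,000 rows in 0.001 s without the GROUP BY clause? Am I going about this the wrong way?
Thanks for your continued guidance and support.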

    • O. Jones hit the nail on the head below. The composite index was the missing piece.

6bc51xsx1#

Create a composite index on parent and msg_id so that finding the maximum ID for each parent can be optimized.

CREATE TABLE `msg` (
`msg_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`msg_to` int(10) unsigned NOT NULL,
`msg_from` int(10) unsigned NOT NULL,
`msg` varchar(500) COLLATE utf8mb4_unicode_ci NOT NULL,
`date` timestamp NOT NULL DEFAULT current_timestamp(),
`parent` int(10) unsigned NOT NULL,
 PRIMARY KEY (`msg_id`),
 KEY `msg_toIX` (`msg_to`) USING BTREE,
 KEY `msg_fromIX` (`msg_from`) USING BTREE,
 KEY `parentIX` (`parent`, msg_id) USING BTREE ) 
ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci
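A rough way to picture why the composite (parent, msg_id) index helps: a B-tree keeps its entries sorted by (parent, msg_id), so the highest msg_id for each parent is simply the last entry in that parent's run, and a scan can hop from one parent to the next with one seek each instead of reading every row. The Python below is a conceptual model under that assumption, not MariaDB's actual code; `max_id_per_parent` is a hypothetical name and a plain sorted list stands in for the index. Note that this index does not contain msg_to, so on its own it does not cover the question's WHERE msg_to = 23 filter.

```python
from bisect import bisect_left


def max_id_per_parent(index_entries):
    """Conceptual model of reading MAX(msg_id) per parent from a sorted
    (parent, msg_id) index. index_entries must already be sorted, as a
    B-tree index keeps them; each group costs one binary-search "seek"."""
    result = {}
    i = 0
    while i < len(index_entries):
        parent = index_entries[i][0]
        # First position whose parent is larger than the current one:
        # (parent + 1,) sorts before every (parent + 1, msg_id) tuple.
        end = bisect_left(index_entries, (parent + 1,))
        # Within one parent the entries are ordered by msg_id, so the last is the max.
        result[parent] = index_entries[end - 1][1]
        i = end
    return result


# Hypothetical index contents: (parent, msg_id) pairs in B-tree order.
entries = sorted([(1, 1), (2, 2), (3, 3), (1, 4), (2, 5), (3, 6)])
print(max_id_per_parent(entries))  # {1: 4, 2: 5, 3: 6}
```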

rsaldnfx2#

Try this composite index. It covers, I believe, your inner query.

CREATE INDEX to_parent_id ON msg (msg_to, parent, msg_id);

The inner query should then be satisfiable with a range scan on this index.
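One way to check the "covers the inner query" idea without a 2-million-row MariaDB table is to ask SQLite for its plan. This is a sketch using Python's built-in `sqlite3` module as a stand-in: SQLite's planner is not MariaDB's, and the table below is a simplified, assumed translation of the question's DDL (`inner_query_plan` is a hypothetical name). If the index covers the inner query, the plan should describe a search on `to_parent_id` as a covering index, meaning no table rows need to be read.

```python
import sqlite3


def inner_query_plan():
    """Return SQLite's plan lines for the question's inner query when only the
    (msg_to, parent, msg_id) index from this answer exists."""
    con = sqlite3.connect(":memory:")
    # Simplified SQLite version of the question's table (assumption, not the original DDL).
    con.execute(
        """CREATE TABLE msg (
               msg_id   INTEGER PRIMARY KEY,
               msg_to   INTEGER NOT NULL,
               msg_from INTEGER NOT NULL,
               msg      TEXT    NOT NULL,
               parent   INTEGER NOT NULL)"""
    )
    con.execute("CREATE INDEX to_parent_id ON msg (msg_to, parent, msg_id)")
    rows = con.execute(
        "EXPLAIN QUERY PLAN "
        "SELECT MAX(msg_id) AS maxid FROM msg WHERE msg_to = 23 GROUP BY parent"
    ).fetchall()
    con.close()
    # The last column of each EXPLAIN QUERY PLAN row is the readable description.
    return [row[-1] for row in rows]


for step in inner_query_plan():
    print(step)
```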


fquxozlt3#

CREATE TABLE `msg` (
`msg_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`msg_to` int(10) unsigned NOT NULL,
`msg_from` int(10) unsigned NOT NULL,
`msg` varchar(500) COLLATE utf8mb4_unicode_ci NOT NULL,
`date` timestamp NOT NULL DEFAULT current_timestamp(),
`parent` int(10) unsigned NOT NULL,
 PRIMARY KEY (`msg_id`),
 INDEX (`msg_to`, `parent`, `date`, `msg_id`) USING BTREE
)
ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
INSERT INTO
  msg (
    msg_to,
    msg_from,
    msg,
    date,
    parent
  )
VALUES
  (23, 11, 'a', '2023-01-01 10:34:56.78', 1),
  (23, 11, 'b', '2023-01-01 11:34:56.78', 2),
  (23, 11, 'c', '2023-01-01 12:34:56.78', 3),

  (23, 22, 'd', '2023-01-01 11:34:56.78', 1),
  (23, 22, 'e', '2023-01-01 12:34:56.78', 2),
  (23, 22, 'f', '2023-01-01 13:34:56.78', 3),

  (23, 23, 'g', '2023-01-01 12:34:56.78', 1),
  (23, 23, 'h', '2023-01-01 13:34:56.78', 2),
  (23, 23, 'i', '2023-01-01 14:34:56.78', 3),

  (23, 11, 'j', '2023-01-01 13:34:56.78', 1),
  (23, 11, 'k', '2023-01-01 14:34:56.78', 2),
  (23, 11, 'l', '2023-01-01 15:34:56.78', 3),

  (23, 22, 'm', '2023-01-01 14:34:56.78', 1),
  (23, 22, 'n', '2023-01-01 15:34:56.78', 2),
  (23, 22, 'o', '2023-01-01 16:34:56.78', 3)
;
Records: 15  Duplicates: 0  Warnings: 0
WITH
  sorted AS
(
  SELECT
    *,
    ROW_NUMBER() 
      OVER (
        PARTITION BY msg_to, parent
            ORDER BY date DESC, msg_id DESC
      )
        AS thread_msg_id
  FROM
    msg
)
SELECT
  *
FROM
  sorted
WHERE
      msg_to        = 23
  AND thread_msg_id = 1
ORDER BY
  date DESC, msg_id DESC
LIMIT
  2

| msg_id | msg_to | msg_from | msg | date | parent | thread_msg_id |
| -- | -- | -- | -- | -- | -- | -- |
| 15 | 23 | 22 | o | 2023-01-01 16:34:56 | 3 | 1 |
| 14 | 23 | 22 | n | 2023-01-01 15:34:56 | 2 | 1 |
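The window-function approach can also be tried locally with Python's `sqlite3` module, since SQLite 3.25 and later supports ROW_NUMBER() OVER (...). This is a sketch, not the answerer's fiddle: the table is a simplified SQLite version of the DDL, the index name is made up (the answer leaves it unnamed), and `sample_rows` / `latest_threads` are hypothetical helpers. It rebuilds the same 15 sample rows (without the fractional seconds) and returns the newest message of each of user 23's threads.

```python
import sqlite3


def sample_rows():
    """Rebuild the answer's 15 sample rows: five batches of messages to user 23,
    each batch posting once to threads (parents) 1, 2 and 3, one hour apart."""
    senders = [11, 22, 23, 11, 22]
    rows = []
    for batch, sender in enumerate(senders):
        for parent in (1, 2, 3):
            letter = chr(ord("a") + 3 * batch + parent - 1)
            hour = 10 + batch + parent - 1
            rows.append((23, sender, letter, f"2023-01-01 {hour}:34:56", parent))
    return rows


def latest_threads(limit):
    """Return (msg_id, msg, parent) for the newest message in each of user 23's threads."""
    con = sqlite3.connect(":memory:")
    con.execute(
        """CREATE TABLE msg (
               msg_id   INTEGER PRIMARY KEY,
               msg_to   INTEGER NOT NULL,
               msg_from INTEGER NOT NULL,
               msg      TEXT    NOT NULL,
               date     TEXT    NOT NULL,
               parent   INTEGER NOT NULL)"""
    )
    # Hypothetical name; mirrors the answer's (msg_to, parent, date, msg_id) index.
    con.execute("CREATE INDEX msg_to_parent_date_id ON msg (msg_to, parent, date, msg_id)")
    con.executemany(
        "INSERT INTO msg (msg_to, msg_from, msg, date, parent) VALUES (?, ?, ?, ?, ?)",
        sample_rows(),
    )
    query = """
        WITH sorted AS (
            SELECT *,
                   ROW_NUMBER() OVER (PARTITION BY msg_to, parent
                                      ORDER BY date DESC, msg_id DESC) AS thread_msg_id
            FROM msg
        )
        SELECT msg_id, msg, parent
        FROM sorted
        WHERE msg_to = 23 AND thread_msg_id = 1
        ORDER BY date DESC, msg_id DESC
        LIMIT ?"""
    result = con.execute(query, (limit,)).fetchall()
    con.close()
    return result


print(latest_threads(2))  # [(15, 'o', 3), (14, 'n', 2)]
```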


v1uwarro4#

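With a little input from everyone, here is a complete answer to this question.
I originally put a LIMIT in the JOIN subquery because it helped speed things up. O. Jones's suggestion of a three-column composite index cut the query time from 1 s to 0.04 s, but I noticed that the results were missing a lot of rows. Putting the ORDER BY back into the subquery corrected the results, but pushed the query time up to 0.2 s.
Building on O. Jones's answer, switching the index column order from (msg_to, parent, msg_id) to (parent, msg_to, msg_id) made the new index usable for the GROUP BY. That cut the rows examined for the DERIVED table from over 1 million to 61 and the query time to 0.000 s, and it also allowed the ORDER BY and LIMIT in the subquery to be dropped.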
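Why the column order matters can be modeled in the same spirit. The sketch below is a conceptual Python model, not MariaDB's implementation, of what MySQL and MariaDB call a loose index scan (EXPLAIN reports it as "Using index for group-by"): with entries sorted by (parent, msg_to, msg_id), each parent costs a seek to the end of its (parent, 23) block, and the entry just before that point, if it belongs to user 23, holds MAX(msg_id). Work then grows roughly with the number of threads rather than the number of messages, which fits the drop to 61 examined rows reported above. `latest_per_parent` and the sample entries are hypothetical.

```python
from bisect import bisect_left


def latest_per_parent(entries, msg_to):
    """Conceptual loose index scan over a sorted (parent, msg_to, msg_id) index.

    Returns {parent: MAX(msg_id)} for rows addressed to msg_to, using two
    binary-search "seeks" per parent instead of reading every entry.
    """
    result = {}
    i = 0
    while i < len(entries):
        parent = entries[i][0]
        # Seek just past this parent's (parent, msg_to) block: the key
        # (parent, msg_to + 1) sorts after every (parent, msg_to, msg_id) entry.
        j = bisect_left(entries, (parent, msg_to + 1))
        # The entry before j is this parent's largest entry with msg_to <= target;
        # keep it only if it is really addressed to the target user.
        if j > i and entries[j - 1][1] == msg_to:
            result[parent] = entries[j - 1][2]
        # Seek to the first entry of the next parent.
        i = bisect_left(entries, (parent + 1,))
    return result


# Hypothetical index contents: (parent, msg_to, msg_id) in B-tree order.
entries = sorted([(1, 23, 1), (2, 23, 2), (3, 23, 3), (1, 23, 4), (2, 23, 5),
                  (3, 23, 6), (1, 99, 7), (2, 7, 8), (4, 5, 9)])
print(latest_per_parent(entries, 23))  # {1: 4, 2: 5, 3: 6}
```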
So the correct query is:

SELECT a.msg_id, a.msg_from, a.msg FROM msg a 
JOIN(SELECT MAX(msg_id) maxid FROM msg WHERE msg_to = 23 GROUP BY parent) b
ON a.msg_id = b.maxid
ORDER BY a.msg_id DESC 
LIMIT 10

With the index:

CREATE INDEX parent_to_id ON msg (parent, msg_to, msg_id);
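Because the earlier LIMIT-in-subquery version silently dropped threads, it is worth checking that this final query returns the newest message of every thread. A minimal sketch, assuming Python's built-in `sqlite3` as a stand-in for MariaDB: it loads random messages (the row count, value ranges, and seed are arbitrary assumptions, and the unused `date` column is left out), creates `parent_to_id`, runs the final query, and compares it with a brute-force result computed in Python. This only checks the result set; it says nothing about MariaDB's plan or timings. `final_query_vs_bruteforce` is a hypothetical name.

```python
import random
import sqlite3


def final_query_vs_bruteforce(n_rows=5000, seed=1):
    """Run the final query in SQLite on random data and return (got, expected)."""
    rng = random.Random(seed)
    # Random (msg_to, msg_from, msg, parent) rows; ranges are arbitrary assumptions.
    rows = [
        (rng.randint(1, 50), rng.randint(1, 50), f"m{i}", rng.randint(1, 300))
        for i in range(n_rows)
    ]
    con = sqlite3.connect(":memory:")
    con.execute(
        """CREATE TABLE msg (
               msg_id   INTEGER PRIMARY KEY,
               msg_to   INTEGER NOT NULL,
               msg_from INTEGER NOT NULL,
               msg      TEXT    NOT NULL,
               parent   INTEGER NOT NULL)"""
    )
    con.execute("CREATE INDEX parent_to_id ON msg (parent, msg_to, msg_id)")
    con.executemany(
        "INSERT INTO msg (msg_to, msg_from, msg, parent) VALUES (?, ?, ?, ?)", rows
    )
    got = con.execute(
        """SELECT a.msg_id, a.msg_from, a.msg FROM msg a
           JOIN (SELECT MAX(msg_id) maxid FROM msg WHERE msg_to = 23 GROUP BY parent) b
           ON a.msg_id = b.maxid
           ORDER BY a.msg_id DESC
           LIMIT 10"""
    ).fetchall()
    con.close()

    # Brute force: msg_id follows insertion order starting at 1, so a later row
    # for the same parent always replaces an earlier one.
    newest = {}
    for msg_id, (msg_to, msg_from, msg, parent) in enumerate(rows, start=1):
        if msg_to == 23:
            newest[parent] = (msg_id, msg_from, msg)
    expected = sorted(newest.values(), reverse=True)[:10]
    return got, expected


got, expected = final_query_vs_bruteforce()
print(got == expected, len(got))
```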

Thanks, everyone. Honestly, this is a really good question and deserves some upvotes. I'm sure someone will find it useful in the future :)
