mysql SQL GROUP BY，同时选择最新记录+额外条件

zlhcx6iw 于 12个月前发布在 Mysql

关注(0)|答案(3)|浏览(129)

我试图找出实现以下目标的最有效方法：
我有两张table：

word_translations(id, word_id, language_id)

字符串
和

translations(id, text, word_translation_id, created_at, updated_at, user_id)

型
因此word_translations和translations表之间存在一对多的关系。如果创建的翻译user_id不为null，则用户已经提供了自己的翻译，因此覆盖了默认翻译（每个用户的每个word_translation最多只能有一个（用户）翻译覆盖，但每个word_translation有多个默认翻译）。
我正在尝试编写SQL查询，它返回一个具有以下条件的翻译ID列表：

每个word_translation_id只有一个翻译id
translation id指向特定word_translation（max（id））的最新翻译，除非有其他具有相同word_translation_id且user_id不为null的翻译。在这种情况下，它应该指向translation id，其中user_id不为null。

我已经写了下面的查询来这样做，但我不确定它的性能：

SELECT IF(t2.id IS NULL, t1.id, t2.id) as id, t1.word_translation_id
FROM (
    SELECT MAX(t.id) as id, t.word_translation_id
    FROM translations t
    WHERE t.user_id IS NULL
    GROUP BY t.word_translation_id) as t1
LEFT JOIN (
    SELECT t.id, t.word_translation_id
    FROM translations t
    WHERE t.user_id IS NOT NULL) t2 
ON t1.word_translation_id = t2.word_translation_id;

型
有没有更有效的方法来做到这一点？（另外，你看到上面的查询有任何问题吗？）

mysql

来源：https://stackoverflow.com/questions/77628872/sql-group-by-while-selecting-the-latest-record-extra-conditions

3条答案

按热度按时间

mqkwyuun1#

使用两个与UNION结合使用的合并查询。
查询1返回没有任何用户翻译的单词的最新默认翻译。

SELECT MAX(t1.id), t1.word_translation_id
FROM translations AS t1
LEFT JOIN translations AS t2 ON t1.word_translation_id = t2.word_translation_id AND t2.user_id IS NOT NULL
WHERE t2.id IS NULL
GROUP BY t1.word_translation_id

字符串
查询2返回用户翻译。

SELECT DISTINCT id, word_translation_id
FROM translations
WHERE user_id IS NOT NULL

型
最终查询：

SELECT MAX(t1.id) AS id, t1.word_translation_id
FROM translations AS t1
LEFT JOIN translations AS t2 ON t1.word_translation_id = t2.word_translation_id AND t2.user_id IS NOT NULL
WHERE t2.id IS NULL
GROUP BY t1.word_translation_id

UNION ALL

SELECT id, word_translation_id
FROM translations
WHERE user_id IS NOT NULL

型

赞(0）回复(0）举报 12个月前

wlsrxk512#

您的第一个子查询（derived table）应该很快，索引在(user_id, word_translation_id, id)上。
一个LEFT JOIN到一个 derived table 可能会非常慢，因为没有索引来支持连接。（在LEFT JOIN中）使用外部表的索引。但某些派生表可以有索引。这是当子查询无论如何都必须排序时的情况，例如，在GROUP BY查询或带有ORDER BY和LIMIT的查询中。例如，您的第一个子查询将在GROUP BY列上具有索引。
尝试将第二子查询更改为

LEFT JOIN (
    SELECT t.id, t.word_translation_id
    FROM translations t
    WHERE t.user_id IS NOT NULL
    ORDER BY t.word_translation_id, t.id
    LIMIT 99999999999999 ) t2 
ON t1.word_translation_id = t2.word_translation_id;

字符串
这应该会导致派生表在(word_translation_id, id )上有一个索引，这应该能够支持ON子句。但是这也会导致 filesort，因为我们在WHERE子句中有一个范围条件。
在虚拟列上使用索引的技巧甚至可以摆脱 filsort。您需要的是在WHERE子句中的相等条件，如WHERE t.user_isset = 1。
因此，让我们创建虚拟列和索引：

alter table translations
  add column user_isset tinyint as (user_id is not null),
  add index (user_isset, word_translation_id, id);

型
这是一个完美的索引

WHERE t.user_isset = 1
ORDER BY t.word_translation_id, t.id

型
然而-在你的情况下，第二个子查询甚至是不必要的。你也可以这样写查询：

SELECT COALESCE(t2.id, t1.id) as id, t1.word_translation_id
FROM (
    SELECT MAX(t.id) as id, t.word_translation_id
    FROM translations t
    WHERE t.user_id IS NULL
    GROUP BY t.word_translation_id) as t1
LEFT JOIN translations t2
  ON  t1.word_translation_id = t2.word_translation_id
  AND t2.user_id IS NOT NULL

型
但是在这里，t2.user_id IS NOT NULL也有同样的问题，这是一个范围条件。所以你需要另一个索引，以word_translation_id开始，理想情况下是在(word_translation_id, user_id, id)上。或者你可以使用上面的虚拟列和索引，并将条件t2.user_id IS NOT NULL改为t2.user_isset = 1，使用一个完美的覆盖索引。
另一个简单的解决方案是使用LIMIT 1的相关子查询来为每个word_translations.id获取所需的translations.id。

select w.id as word_translation_id,
(
  select t.id
  from translations t
  where t.word_translation_id = w.id
  order by t.user_id desc, t.id desc
  limit 1
) as id
from word_translations w

型
这个查询的最佳索引是(word_translation_id, user_id, id)。但这依赖于NULL总是先排序。如果你想避免它，你可以再次使用一个带索引的虚拟列：

alter table translations
  add column user_isset tinyint as (user_id is not null),
  add index (word_translation_id, user_isset, id);

型
而查询：

select w.id as word_translation_id,
(
  select t.id
  from translations t
  where t.word_translation_id = w.id
  order by t.user_isset desc, t.id desc
  limit 1
) as id
from word_translations w

型
order by t.user_isset desc, t.id desc在这里工作，因为我们要么得到唯一一行user_isset = 1，或者如果不存在-行user_isset = 0和最高的id。
这个查询应该是相当快的，在许多数据集上很难超越性能。它是 O（m * log（n）），其中 m = word_translations中的行，n = translations中的行。

赞(0）回复(0）举报 12个月前

rryofs0p3#

我想你可以使用anti-join来简化它：

SELECT MAX(t.id) AS id, t.word_translation_id
FROM translations t
LEFT JOIN translations t2 ON
    t.user_id IS NULL
    AND t.word_translation_id = t2.word_translation_id
    AND NOT (t.id = t2.id)
    AND (t2.id > t.id OR t2.user_id IS NOT NULL) 
WHERE t2.id IS NULL
GROUP BY t.word_translation_id

字符串
对于每一个翻译，查找同一个单词的另一个翻译，该翻译的id或user_id不是null（JOIN条件）。然后排除任何为true的行（WHERE子句）。

赞(0）回复(0）举报 12个月前

我来回答

mysql SQL GROUP BY，同时选择最新记录+额外条件

3条答案

相关问题

热门标签

最新问答