PostgreSQL: index strategy for different WHERE clause combinations, including text patterns

This is a follow-up to another question:
How to get date_part query to hit index?
When I run the following query, it hits the composite index I created on the datelocal, views, impressions, gender, and agegroup columns:

SELECT date_part('hour', datelocal)                AS hour
     , SUM(views) FILTER (WHERE gender = 'male')   AS male
     , SUM(views) FILTER (WHERE gender = 'female') AS female
FROM   reportimpression
WHERE  datelocal >= '2019-02-01' AND datelocal <  '2019-03-01'
GROUP  BY 1
ORDER  BY 1;

However, I would like to be able to filter this query with additional clauses in the WHERE, for example:

SELECT date_part('hour', datelocal)                AS hour
     , SUM(views) FILTER (WHERE gender = 'male')   AS male
     , SUM(views) FILTER (WHERE gender = 'female') AS female
FROM   reportimpression
WHERE  datelocal >= '2019-02-01' AND datelocal <  '2019-03-01'
AND network LIKE '%'
GROUP  BY 1
ORDER  BY 1;

The second query is much slower than the first, even though it should operate on fewer records, and it does not hit my index.
Table schema:

CREATE TABLE reportimpression (
    datelocal timestamp without time zone,
    devicename text,
    network text,
    sitecode text,
    advertisername text,
    mediafilename text,
    gender text,
    agegroup text,
    views integer,
    impressions integer,
    dwelltime numeric
);

-- Indices -------------------------------------------------------

CREATE INDEX reportimpression_datelocal_index ON reportimpression(datelocal timestamp_ops);
CREATE INDEX reportimpression_viewership_index ON reportimpression(datelocal timestamp_ops,views int4_ops,impressions int4_ops,gender text_ops,agegroup text_ops);
CREATE INDEX reportimpression_test_index ON reportimpression(datelocal timestamp_ops,(date_part('hour'::text, datelocal)) float8_ops);

EXPLAIN ANALYZE output:

Finalize GroupAggregate  (cost=1005368.37..1005385.70 rows=3151 width=24) (actual time=70615.636..70615.649 rows=24 loops=1)
  Group Key: (date_part('hour'::text, datelocal))
  ->  Sort  (cost=1005368.37..1005369.94 rows=3151 width=24) (actual time=70615.631..70615.634 rows=48 loops=1)
        Sort Key: (date_part('hour'::text, datelocal))
        Sort Method: quicksort  Memory: 28kB
        ->  Gather  (cost=1005005.62..1005331.75 rows=3151 width=24) (actual time=70615.456..70641.208 rows=48 loops=1)
              Workers Planned: 1
              Workers Launched: 1
              ->  Partial HashAggregate  (cost=1004005.62..1004016.65 rows=3151 width=24) (actual time=70613.132..70613.152 rows=24 loops=2)
                    Group Key: date_part('hour'::text, datelocal)
                    ->  Parallel Seq Scan on reportimpression  (cost=0.00..996952.63 rows=2821195 width=17) (actual time=0.803..69876.914 rows=2429159 loops=2)
                          Filter: ((datelocal >= '2019-02-01 00:00:00'::timestamp without time zone) AND (datelocal < '2019-03-01 00:00:00'::timestamp without time zone) AND (network ~~ '%'::text))
                          Rows Removed by Filter: 6701736
Planning time: 0.195 ms
Execution time: 70641.349 ms

Do I need to create additional indexes, tweak my SELECT, or do something else entirely?

The predicate you added uses the LIKE operator:

AND network LIKE '%'

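The actual query plan depends on what you pass in place of '%'. Generally, though, plain btree indexes are useless for this. You will need a trigram index, the text search infrastructure, or something similar, depending on the patterns you might be looking for.
See: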

PostgreSQL LIKE query performance variations
Pattern matching with LIKE, SIMILAR TO or regular expressions
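
As a rough illustration of the trigram option mentioned above, here is a minimal sketch. It assumes the pg_trgm extension can be installed in the database, and the index name is hypothetical. Whether the planner actually uses such an index depends on the pattern passed. The literal pattern '%' from the question matches every non-null value, so no index can narrow it down.

```sql
-- Assumption: the pg_trgm extension is available on this server.
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- Hypothetical index name. A GIN trigram index can support LIKE / ILIKE
-- patterns that are not left-anchored, such as '%foo%'.
CREATE INDEX reportimpression_network_trgm_idx
    ON reportimpression USING gin (network gin_trgm_ops);

-- Example pattern (placeholder value) that a trigram index can help with:
SELECT count(*)
FROM   reportimpression
WHERE  network LIKE '%foo%';
```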

You can even combine multiple index strategies. Example:

PostgreSQL: Find sentences closest to a given sentence

If that should be:

AND network = '<input_string>'

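Then, by all means, actually use the = operator, not LIKE. Reasons, in ascending order of importance:
1. Shorter
2. Less confusing
3. Makes the job of the Postgres planner simpler (very slightly cheaper)
4. Correct
If you inadvertently pass a string containing special characters, you might get incorrect results. See: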

Escape function for regular expression or LIKE patterns
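
To make the correctness point concrete, here is a small sketch of how LIKE and = differ when the input happens to contain a wildcard character. The string values are made up for illustration, and the last two lines assume the default LIKE escape character (backslash) with standard_conforming_strings turned on.

```sql
-- '_' is a LIKE wildcard matching any single character,
-- so an input of 'a_b' also matches unrelated values:
SELECT 'axb' LIKE 'a_b';    -- true

-- Plain equality compares the strings literally:
SELECT 'axb' = 'a_b';       -- false

-- With LIKE, the wildcard has to be escaped to be taken literally:
SELECT 'a_b' LIKE 'a\_b';   -- true
SELECT 'axb' LIKE 'a\_b';   -- false
```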
