PostgreSQL、JSONB、嵌套查询和数据库索引的问题

vlf7wbxs  于 2023-01-25  发布在  PostgreSQL
关注(0)|答案(1)|浏览(181)

目前我正纠结于PostgreSQL 11、JSONB和数据库索引。
我们有一个客户表(简称):

create table public.customers (   
   id uuid not null,   
   created timestamp without time zone,   
   lastmodified timestamp without time zone,   
   data jsonb,   
   primary key (id, mandantid) 
);

数据列中的JSON如下所示(已缩短):

{
  "id": "...",
  "$encrypted": {
    "iv": "...",
    "data": "...",
    "keyRef": "key1"
  }
}

我想做的是创建一个查询来计算所有的数据库条目,即没有'key1'作为'keyRef'。我不是一个PostgreSQLMaven,所以我尝试了几种方法。
我搜索了几个文档/stackoverflow/blog,找到了以下方法:

方法1

CREATE INDEX idx_customer_encryption_key ON customers(( (data->'$encrypted')::jsonb ->>'keyRef'::text));
COUNT(id) FROM customers WHERE data->'$encrypted' ->> 'keyRef' != 'key1';

查询可以工作,但速度较慢(未使用索引)。

Finalize Aggregate  (cost=163614.73..163614.74 rows=1 width=8) (actual time=6604.325..6610.718 rows=1 loops=1)
  Output: count(id)
  Buffers: shared hit=6550991 read=731452
  ->  Gather  (cost=163614.51..163614.72 rows=2 width=8) (actual time=6604.315..6610.709 rows=3 loops=1)
        Output: (PARTIAL count(id))
        Workers Planned: 2
        Workers Launched: 2
        Buffers: shared hit=6550991 read=731452
        ->  Partial Aggregate  (cost=162614.51..162614.52 rows=1 width=8) (actual time=6594.083..6594.083 rows=1 loops=3)
              Output: PARTIAL count(id)
              Buffers: shared hit=6550991 read=731452
              Worker 0:  actual time=6588.966..6588.967 rows=1 loops=1
                JIT:
                  Functions: 5
"                  Options: Inlining false, Optimization false, Expressions true, Deforming true"
"                  Timing: Generation 0.314 ms, Inlining 0.000 ms, Optimization 0.164 ms, Emission 3.179 ms, Total 3.657 ms"
                Buffers: shared hit=2180408 read=243504
              Worker 1:  actual time=6589.102..6589.102 rows=1 loops=1
                JIT:
                  Functions: 5
"                  Options: Inlining false, Optimization false, Expressions true, Deforming true"
"                  Timing: Generation 0.321 ms, Inlining 0.000 ms, Optimization 0.165 ms, Emission 3.174 ms, Total 3.661 ms"
                Buffers: shared hit=2142345 read=238322
              ->  Parallel Seq Scan on public.customers  (cost=0.00..160368.33 rows=898475 width=16) (actual time=3.728..6539.198 rows=722392 loops=3)
"                    Output: id, created, lastmodified, mandantid, data, customernumberpseudonym, internetnumberpseudonym, deletiondate"
                    Filter: (((customers.data -> '$encrypted'::text) ->> 'keyRef'::text) <> 'key1'::text)
                    Buffers: shared hit=6550991 read=731452
                    Worker 0:  actual time=3.343..6533.485 rows=720314 loops=1
                      Buffers: shared hit=2180408 read=243504
                    Worker 1:  actual time=3.808..6535.494 rows=706508 loops=1
                      Buffers: shared hit=2142345 read=238322
Planning:
  Buffers: shared hit=20 read=2 dirtied=1
Planning Time: 0.561 ms
JIT:
  Functions: 17
"  Options: Inlining false, Optimization false, Expressions true, Deforming true"
"  Timing: Generation 1.021 ms, Inlining 0.000 ms, Optimization 0.569 ms, Emission 10.153 ms, Total 11.744 ms"
Execution Time: 6611.164 ms

方法2

CREATE INDEX idx_customer_encryption_key ON customers (jsonb_extract_path_text(data, '$encrypted', 'keyRef'));
SELECT COUNT(id) FROM customers WHERE jsonb_extract_path_text(data, '$encrypted', 'keyRef') != 'key1';

查询可以工作,但速度较慢(未使用索引)。

Finalize Aggregate  (cost=161357.25..161357.26 rows=1 width=8) (actual time=6531.578..6538.081 rows=1 loops=1)
  Output: count(id)
  Buffers: shared hit=6550991 read=731452 written=2
  ->  Gather  (cost=161357.04..161357.25 rows=2 width=8) (actual time=6531.568..6538.072 rows=3 loops=1)
        Output: (PARTIAL count(id))
        Workers Planned: 2
        Workers Launched: 2
        Buffers: shared hit=6550991 read=731452 written=2
        ->  Partial Aggregate  (cost=160357.04..160357.05 rows=1 width=8) (actual time=6521.380..6521.381 rows=1 loops=3)
              Output: PARTIAL count(id)
              Buffers: shared hit=6550991 read=731452 written=2
              Worker 0:  actual time=6516.406..6516.407 rows=1 loops=1
                JIT:
                  Functions: 5
"                  Options: Inlining false, Optimization false, Expressions true, Deforming true"
"                  Timing: Generation 0.297 ms, Inlining 0.000 ms, Optimization 0.186 ms, Emission 3.301 ms, Total 3.784 ms"
                Buffers: shared hit=2185669 read=243926
              Worker 1:  actual time=6516.327..6516.328 rows=1 loops=1
                JIT:
                  Functions: 5
"                  Options: Inlining false, Optimization false, Expressions true, Deforming true"
"                  Timing: Generation 0.331 ms, Inlining 0.000 ms, Optimization 0.161 ms, Emission 3.384 ms, Total 3.876 ms"
                Buffers: shared hit=2155723 read=241152
              ->  Parallel Seq Scan on public.customers  (cost=0.00..158110.85 rows=898475 width=16) (actual time=3.594..6466.590 rows=722392 loops=3)
"                    Output: id, created, lastmodified, mandantid, data, customernumberpseudonym, internetnumberpseudonym, deletiondate"
"                    Filter: (jsonb_extract_path_text(customers.data, VARIADIC '{$encrypted,keyRef}'::text[]) <> 'key1'::text)"
                    Buffers: shared hit=6550991 read=731452 written=2
                    Worker 0:  actual time=3.490..6460.749 rows=724558 loops=1
                      Buffers: shared hit=2185669 read=243926
                    Worker 1:  actual time=3.937..6460.802 rows=712789 loops=1
                      Buffers: shared hit=2155723 read=241152
Planning:
  Buffers: shared hit=18 read=2 dirtied=1
Planning Time: 0.453 ms
JIT:
  Functions: 17
"  Options: Inlining false, Optimization false, Expressions true, Deforming true"
"  Timing: Generation 0.931 ms, Inlining 0.000 ms, Optimization 0.516 ms, Emission 9.871 ms, Total 11.319 ms"
Execution Time: 6538.429 ms

方法3

CREATE INDEX idx_customer_encryption_key ON customers ( ((data #> '{$encrypted,keyRef}')::varchar));
SELECT COUNT(id) FROM customers WHERE (data #> '{$encrypted,keyRef}')::varchar <> 'initialKey';

查询可以工作,但速度较慢(未使用索引)。

Finalize Aggregate  (cost=165872.20..165872.21 rows=1 width=8) (actual time=6914.265..6920.442 rows=1 loops=1)
  Output: count(id)
  Buffers: shared hit=6550978 read=731465 written=13
  ->  Gather  (cost=165871.99..165872.20 rows=2 width=8) (actual time=6914.144..6920.432 rows=3 loops=1)
        Output: (PARTIAL count(id))
        Workers Planned: 2
        Workers Launched: 2
        Buffers: shared hit=6550978 read=731465 written=13
        ->  Partial Aggregate  (cost=164871.99..164872.00 rows=1 width=8) (actual time=6904.453..6904.454 rows=1 loops=3)
              Output: PARTIAL count(id)
              Buffers: shared hit=6550978 read=731465 written=13
              Worker 0:  actual time=6899.759..6899.759 rows=1 loops=1
                JIT:
                  Functions: 5
"                  Options: Inlining false, Optimization false, Expressions true, Deforming true"
"                  Timing: Generation 0.349 ms, Inlining 0.000 ms, Optimization 0.173 ms, Emission 3.345 ms, Total 3.867 ms"
                Buffers: shared hit=2172611 read=241759 written=7
              Worker 1:  actual time=6899.599..6899.600 rows=1 loops=1
                JIT:
                  Functions: 5
"                  Options: Inlining false, Optimization false, Expressions true, Deforming true"
"                  Timing: Generation 0.347 ms, Inlining 0.000 ms, Optimization 0.174 ms, Emission 3.352 ms, Total 3.874 ms"
                Buffers: shared hit=2175840 read=243601 written=3
              ->  Parallel Seq Scan on public.customers  (cost=0.00..162625.80 rows=898475 width=16) (actual time=3.750..6848.950 rows=722392 loops=3)
"                    Output: id, created, lastmodified, mandantid, data, customernumberpseudonym, internetnumberpseudonym, deletiondate"
"                    Filter: ((((customers.data #> '{$encrypted,keyRef}'::text[]))::character varying)::text <> 'initialKey'::text)"
                    Buffers: shared hit=6550978 read=731465 written=13
                    Worker 0:  actual time=3.532..6844.562 rows=716749 loops=1
                      Buffers: shared hit=2172611 read=241759 written=7
                    Worker 1:  actual time=4.014..6843.198 rows=721111 loops=1
                      Buffers: shared hit=2175840 read=243601 written=3
Planning:
  Buffers: shared hit=18 read=2 dirtied=1
Planning Time: 0.367 ms
JIT:
  Functions: 17
"  Options: Inlining false, Optimization false, Expressions true, Deforming true"
"  Timing: Generation 1.042 ms, Inlining 0.000 ms, Optimization 0.531 ms, Emission 10.215 ms, Total 11.788 ms"
Execution Time: 6920.835 ms

方法4

CREATE INDEX idx_customer_encryption_key ON customers USING gin( (data -> '$encrypted') jsonb_path_ops);
SELECT COUNT(id) FROM customers WHERE data -> '$encrypted' @> '{"$encrypted": { "keyRef": "key1"}}';

不完全是我想要的,只是测试索引是否有效。使用索引但结果总是0。

Aggregate  (cost=58652.33..58652.34 rows=1 width=8) (actual time=0.020..0.021 rows=1 loops=1)
  Output: count(data)
  Buffers: shared hit=1 read=3
  ->  Bitmap Heap Scan on public.customers  (cost=215.96..58598.15 rows=21672 width=341) (actual time=0.018..0.018 rows=0 loops=1)
"        Output: id, created, lastmodified, mandantid, data, customernumberpseudonym, internetnumberpseudonym, deletiondate"
"        Recheck Cond: ((customers.data -> '$encrypted'::text) @> '{""$encrypted"": {""keyRef"": ""key1""}}'::jsonb)"
        Buffers: shared hit=1 read=3
        ->  Bitmap Index Scan on idx_customer_encryption_key  (cost=0.00..210.54 rows=21672 width=0) (actual time=0.016..0.016 rows=0 loops=1)
"              Index Cond: ((customers.data -> '$encrypted'::text) @> '{""$encrypted"": {""keyRef"": ""key1""}}'::jsonb)"
              Buffers: shared hit=1 read=3
Planning:
  Buffers: shared read=1
Planning Time: 0.680 ms
Execution Time: 0.062 ms
SELECT COUNT(data) FROM customers WHERE data -> '$encrypted' @> '{ "keyRef": "key1"}';

计数正确,但速度慢(不使用索引)。

Aggregate  (cost=58652.33..58652.34 rows=1 width=8) (actual time=32023.053..32023.054 rows=1 loops=1)
  Output: count(data)
  Buffers: shared hit=6550821 read=731787
  ->  Bitmap Heap Scan on public.customers  (cost=215.96..58598.15 rows=21672 width=341) (actual time=112.830..31779.441 rows=2167122 loops=1)
"        Output: id, created, lastmodified, mandantid, data, customernumberpseudonym, internetnumberpseudonym, deletiondate"
"        Recheck Cond: ((customers.data -> '$encrypted'::text) @> '{""keyRef"": ""initialKey""}'::jsonb)"
        Rows Removed by Index Recheck: 49
        Heap Blocks: exact=45746 lossy=98820
        Buffers: shared hit=6550821 read=731787
        ->  Bitmap Index Scan on idx_customer_encryption_key  (cost=0.00..210.54 rows=21672 width=0) (actual time=106.353..106.354 rows=2167122 loops=1)
"              Index Cond: ((customers.data -> '$encrypted'::text) @> '{""keyRef"": ""initialKey""}'::jsonb)"
              Buffers: shared hit=2 read=303
Planning:
  Buffers: shared hit=1
Planning Time: 0.068 ms
Execution Time: 32023.422 ms
伯吉斯的评论

一个12b1x一个13b1x

方法5

CREATE INDEX idx_customer_encryption_key ON customers USING GIN( ((data -> '$encrypted')::jsonb));
SELECT COUNT(data) FROM customers WHERE data -> '$encrypted' @> '{"keyRef" : "key1"}';

查询可以工作,但速度较慢(未使用索引)。

Aggregate  (cost=58676.33..58676.34 rows=1 width=8) (actual time=27662.270..27662.272 rows=1 loops=1)
  Output: count(data)
  Buffers: shared hit=6551064 read=731894 written=13274
  ->  Bitmap Heap Scan on public.customers  (cost=239.96..58622.15 rows=21672 width=341) (actual time=145.119..27449.141 rows=2167122 loops=1)
"        Output: id, created, lastmodified, mandantid, data, customernumberpseudonym, internetnumberpseudonym, deletiondate"
"        Recheck Cond: ((customers.data -> '$encrypted'::text) @> '{""keyRef"": ""initialKey""}'::jsonb)"
        Rows Removed by Index Recheck: 49
        Heap Blocks: exact=45746 lossy=98820
        Buffers: shared hit=6551064 read=731894 written=13274
        ->  Bitmap Index Scan on idx_customer_encryption_key  (cost=0.00..234.54 rows=21672 width=0) (actual time=138.368..138.369 rows=2167122 loops=1)
"              Index Cond: ((customers.data -> '$encrypted'::text) @> '{""keyRef"": ""initialKey""}'::jsonb)"
              Buffers: shared hit=240 read=415
Planning:
  Buffers: shared hit=22 read=4 dirtied=1
Planning Time: 0.575 ms
Execution Time: 27662.936 ms

方法6

CREATE INDEX idx_customer_encryption_key ON customers USING gin( (data #> '{$encrypted,keyRef}') );
SELECT COUNT(data) from customers WHERE (data #>> '{$encrypted,keyRef}')::text = 'key1';

查询可以工作,但速度较慢(未使用索引)。

Finalize Aggregate  (cost=159122.35..159122.36 rows=1 width=8) (actual time=7124.276..7130.794 rows=1 loops=1)
  Output: count(data)
  Buffers: shared hit=6550972 read=731471 written=307
  ->  Gather  (cost=159122.14..159122.35 rows=2 width=8) (actual time=7124.001..7130.783 rows=3 loops=1)
        Output: (PARTIAL count(data))
        Workers Planned: 2
        Workers Launched: 2
        Buffers: shared hit=6550972 read=731471 written=307
        ->  Partial Aggregate  (cost=158122.14..158122.15 rows=1 width=8) (actual time=7108.420..7108.420 rows=1 loops=3)
              Output: PARTIAL count(data)
              Buffers: shared hit=6550972 read=731471 written=307
              Worker 0:  actual time=7100.765..7100.765 rows=1 loops=1
                JIT:
                  Functions: 5
"                  Options: Inlining false, Optimization false, Expressions true, Deforming true"
"                  Timing: Generation 0.632 ms, Inlining 0.000 ms, Optimization 0.411 ms, Emission 9.283 ms, Total 10.327 ms"
                Buffers: shared hit=2175813 read=242587 written=85
              Worker 1:  actual time=7100.660..7100.661 rows=1 loops=1
                JIT:
                  Functions: 5
"                  Options: Inlining false, Optimization false, Expressions true, Deforming true"
"                  Timing: Generation 0.620 ms, Inlining 0.000 ms, Optimization 0.393 ms, Emission 8.127 ms, Total 9.140 ms"
                Buffers: shared hit=2180525 read=244599 written=117
              ->  Parallel Seq Scan on public.customers  (cost=0.00..158110.85 rows=4515 width=341) (actual time=8.311..7054.700 rows=722374 loops=3)
"                    Output: id, created, lastmodified, mandantid, data, customernumberpseudonym, internetnumberpseudonym, deletiondate"
"                    Filter: ((customers.data #>> '{$encrypted,keyRef}'::text[]) = 'initialKey'::text)"
                    Rows Removed by Filter: 18
                    Buffers: shared hit=6550972 read=731471 written=307
                    Worker 0:  actual time=10.292..7046.967 rows=718818 loops=1
                      Buffers: shared hit=2175813 read=242587 written=85
                    Worker 1:  actual time=8.559..7046.322 rows=726031 loops=1
                      Buffers: shared hit=2180525 read=244599 written=117
Planning:
  Buffers: shared hit=17 dirtied=1
Planning Time: 0.187 ms
JIT:
  Functions: 17
"  Options: Inlining false, Optimization false, Expressions true, Deforming true"
"  Timing: Generation 1.666 ms, Inlining 0.000 ms, Optimization 1.118 ms, Emission 23.162 ms, Total 25.945 ms"
Execution Time: 7131.263 ms

方法7

现在我不知道为什么查询不使用索引了。
有什么想法吗?任何帮助都是非常感谢的!

rsaldnfx

rsaldnfx1#

虽然理论上一个常规的B树索引可以通过对〈和〉进行分段扫描并合并它们来实现!=,但这并没有实现,所以在当前的系统中,您需要一个过滤索引,它的WHERE与查询中使用的WHERE匹配。

create index on customers (id) where data->'$encrypted' ->> 'keyRef' != 'key1';

select COUNT(id) FROM customers WHERE data->'$encrypted' ->> 'keyRef' != 'key1';

当然,除非这种情况非常罕见,否则索引可能就没有用了,而且,如果“data”没有路径'$encrypted' -> 'keyRef',这可能就不能达到你想要的效果,因为你会测试NULL != 'key1',结果不会是true。

相关问题