目前我正纠结于PostgreSQL 11、JSONB和数据库索引。
我们有一个客户表(简称):
create table public.customers (
id uuid not null,
created timestamp without time zone,
lastmodified timestamp without time zone,
data jsonb,
primary key (id, mandantid)
);
数据列中的JSON如下所示(已缩短):
{
"id": "...",
"$encrypted": {
"iv": "...",
"data": "...",
"keyRef": "key1"
}
}
我想做的是创建一个查询来计算所有的数据库条目,即没有'key1'作为'keyRef'。我不是一个PostgreSQLMaven,所以我尝试了几种方法。
我搜索了几个文档/stackoverflow/blog,找到了以下方法:
方法1
CREATE INDEX idx_customer_encryption_key ON customers(( (data->'$encrypted')::jsonb ->>'keyRef'::text));
COUNT(id) FROM customers WHERE data->'$encrypted' ->> 'keyRef' != 'key1';
查询可以工作,但速度较慢(未使用索引)。
Finalize Aggregate (cost=163614.73..163614.74 rows=1 width=8) (actual time=6604.325..6610.718 rows=1 loops=1)
Output: count(id)
Buffers: shared hit=6550991 read=731452
-> Gather (cost=163614.51..163614.72 rows=2 width=8) (actual time=6604.315..6610.709 rows=3 loops=1)
Output: (PARTIAL count(id))
Workers Planned: 2
Workers Launched: 2
Buffers: shared hit=6550991 read=731452
-> Partial Aggregate (cost=162614.51..162614.52 rows=1 width=8) (actual time=6594.083..6594.083 rows=1 loops=3)
Output: PARTIAL count(id)
Buffers: shared hit=6550991 read=731452
Worker 0: actual time=6588.966..6588.967 rows=1 loops=1
JIT:
Functions: 5
" Options: Inlining false, Optimization false, Expressions true, Deforming true"
" Timing: Generation 0.314 ms, Inlining 0.000 ms, Optimization 0.164 ms, Emission 3.179 ms, Total 3.657 ms"
Buffers: shared hit=2180408 read=243504
Worker 1: actual time=6589.102..6589.102 rows=1 loops=1
JIT:
Functions: 5
" Options: Inlining false, Optimization false, Expressions true, Deforming true"
" Timing: Generation 0.321 ms, Inlining 0.000 ms, Optimization 0.165 ms, Emission 3.174 ms, Total 3.661 ms"
Buffers: shared hit=2142345 read=238322
-> Parallel Seq Scan on public.customers (cost=0.00..160368.33 rows=898475 width=16) (actual time=3.728..6539.198 rows=722392 loops=3)
" Output: id, created, lastmodified, mandantid, data, customernumberpseudonym, internetnumberpseudonym, deletiondate"
Filter: (((customers.data -> '$encrypted'::text) ->> 'keyRef'::text) <> 'key1'::text)
Buffers: shared hit=6550991 read=731452
Worker 0: actual time=3.343..6533.485 rows=720314 loops=1
Buffers: shared hit=2180408 read=243504
Worker 1: actual time=3.808..6535.494 rows=706508 loops=1
Buffers: shared hit=2142345 read=238322
Planning:
Buffers: shared hit=20 read=2 dirtied=1
Planning Time: 0.561 ms
JIT:
Functions: 17
" Options: Inlining false, Optimization false, Expressions true, Deforming true"
" Timing: Generation 1.021 ms, Inlining 0.000 ms, Optimization 0.569 ms, Emission 10.153 ms, Total 11.744 ms"
Execution Time: 6611.164 ms
方法2
CREATE INDEX idx_customer_encryption_key ON customers (jsonb_extract_path_text(data, '$encrypted', 'keyRef'));
SELECT COUNT(id) FROM customers WHERE jsonb_extract_path_text(data, '$encrypted', 'keyRef') != 'key1';
查询可以工作,但速度较慢(未使用索引)。
Finalize Aggregate (cost=161357.25..161357.26 rows=1 width=8) (actual time=6531.578..6538.081 rows=1 loops=1)
Output: count(id)
Buffers: shared hit=6550991 read=731452 written=2
-> Gather (cost=161357.04..161357.25 rows=2 width=8) (actual time=6531.568..6538.072 rows=3 loops=1)
Output: (PARTIAL count(id))
Workers Planned: 2
Workers Launched: 2
Buffers: shared hit=6550991 read=731452 written=2
-> Partial Aggregate (cost=160357.04..160357.05 rows=1 width=8) (actual time=6521.380..6521.381 rows=1 loops=3)
Output: PARTIAL count(id)
Buffers: shared hit=6550991 read=731452 written=2
Worker 0: actual time=6516.406..6516.407 rows=1 loops=1
JIT:
Functions: 5
" Options: Inlining false, Optimization false, Expressions true, Deforming true"
" Timing: Generation 0.297 ms, Inlining 0.000 ms, Optimization 0.186 ms, Emission 3.301 ms, Total 3.784 ms"
Buffers: shared hit=2185669 read=243926
Worker 1: actual time=6516.327..6516.328 rows=1 loops=1
JIT:
Functions: 5
" Options: Inlining false, Optimization false, Expressions true, Deforming true"
" Timing: Generation 0.331 ms, Inlining 0.000 ms, Optimization 0.161 ms, Emission 3.384 ms, Total 3.876 ms"
Buffers: shared hit=2155723 read=241152
-> Parallel Seq Scan on public.customers (cost=0.00..158110.85 rows=898475 width=16) (actual time=3.594..6466.590 rows=722392 loops=3)
" Output: id, created, lastmodified, mandantid, data, customernumberpseudonym, internetnumberpseudonym, deletiondate"
" Filter: (jsonb_extract_path_text(customers.data, VARIADIC '{$encrypted,keyRef}'::text[]) <> 'key1'::text)"
Buffers: shared hit=6550991 read=731452 written=2
Worker 0: actual time=3.490..6460.749 rows=724558 loops=1
Buffers: shared hit=2185669 read=243926
Worker 1: actual time=3.937..6460.802 rows=712789 loops=1
Buffers: shared hit=2155723 read=241152
Planning:
Buffers: shared hit=18 read=2 dirtied=1
Planning Time: 0.453 ms
JIT:
Functions: 17
" Options: Inlining false, Optimization false, Expressions true, Deforming true"
" Timing: Generation 0.931 ms, Inlining 0.000 ms, Optimization 0.516 ms, Emission 9.871 ms, Total 11.319 ms"
Execution Time: 6538.429 ms
方法3
CREATE INDEX idx_customer_encryption_key ON customers ( ((data #> '{$encrypted,keyRef}')::varchar));
SELECT COUNT(id) FROM customers WHERE (data #> '{$encrypted,keyRef}')::varchar <> 'initialKey';
查询可以工作,但速度较慢(未使用索引)。
Finalize Aggregate (cost=165872.20..165872.21 rows=1 width=8) (actual time=6914.265..6920.442 rows=1 loops=1)
Output: count(id)
Buffers: shared hit=6550978 read=731465 written=13
-> Gather (cost=165871.99..165872.20 rows=2 width=8) (actual time=6914.144..6920.432 rows=3 loops=1)
Output: (PARTIAL count(id))
Workers Planned: 2
Workers Launched: 2
Buffers: shared hit=6550978 read=731465 written=13
-> Partial Aggregate (cost=164871.99..164872.00 rows=1 width=8) (actual time=6904.453..6904.454 rows=1 loops=3)
Output: PARTIAL count(id)
Buffers: shared hit=6550978 read=731465 written=13
Worker 0: actual time=6899.759..6899.759 rows=1 loops=1
JIT:
Functions: 5
" Options: Inlining false, Optimization false, Expressions true, Deforming true"
" Timing: Generation 0.349 ms, Inlining 0.000 ms, Optimization 0.173 ms, Emission 3.345 ms, Total 3.867 ms"
Buffers: shared hit=2172611 read=241759 written=7
Worker 1: actual time=6899.599..6899.600 rows=1 loops=1
JIT:
Functions: 5
" Options: Inlining false, Optimization false, Expressions true, Deforming true"
" Timing: Generation 0.347 ms, Inlining 0.000 ms, Optimization 0.174 ms, Emission 3.352 ms, Total 3.874 ms"
Buffers: shared hit=2175840 read=243601 written=3
-> Parallel Seq Scan on public.customers (cost=0.00..162625.80 rows=898475 width=16) (actual time=3.750..6848.950 rows=722392 loops=3)
" Output: id, created, lastmodified, mandantid, data, customernumberpseudonym, internetnumberpseudonym, deletiondate"
" Filter: ((((customers.data #> '{$encrypted,keyRef}'::text[]))::character varying)::text <> 'initialKey'::text)"
Buffers: shared hit=6550978 read=731465 written=13
Worker 0: actual time=3.532..6844.562 rows=716749 loops=1
Buffers: shared hit=2172611 read=241759 written=7
Worker 1: actual time=4.014..6843.198 rows=721111 loops=1
Buffers: shared hit=2175840 read=243601 written=3
Planning:
Buffers: shared hit=18 read=2 dirtied=1
Planning Time: 0.367 ms
JIT:
Functions: 17
" Options: Inlining false, Optimization false, Expressions true, Deforming true"
" Timing: Generation 1.042 ms, Inlining 0.000 ms, Optimization 0.531 ms, Emission 10.215 ms, Total 11.788 ms"
Execution Time: 6920.835 ms
方法4
CREATE INDEX idx_customer_encryption_key ON customers USING gin( (data -> '$encrypted') jsonb_path_ops);
SELECT COUNT(id) FROM customers WHERE data -> '$encrypted' @> '{"$encrypted": { "keyRef": "key1"}}';
不完全是我想要的,只是测试索引是否有效。使用索引但结果总是0。
Aggregate (cost=58652.33..58652.34 rows=1 width=8) (actual time=0.020..0.021 rows=1 loops=1)
Output: count(data)
Buffers: shared hit=1 read=3
-> Bitmap Heap Scan on public.customers (cost=215.96..58598.15 rows=21672 width=341) (actual time=0.018..0.018 rows=0 loops=1)
" Output: id, created, lastmodified, mandantid, data, customernumberpseudonym, internetnumberpseudonym, deletiondate"
" Recheck Cond: ((customers.data -> '$encrypted'::text) @> '{""$encrypted"": {""keyRef"": ""key1""}}'::jsonb)"
Buffers: shared hit=1 read=3
-> Bitmap Index Scan on idx_customer_encryption_key (cost=0.00..210.54 rows=21672 width=0) (actual time=0.016..0.016 rows=0 loops=1)
" Index Cond: ((customers.data -> '$encrypted'::text) @> '{""$encrypted"": {""keyRef"": ""key1""}}'::jsonb)"
Buffers: shared hit=1 read=3
Planning:
Buffers: shared read=1
Planning Time: 0.680 ms
Execution Time: 0.062 ms
SELECT COUNT(data) FROM customers WHERE data -> '$encrypted' @> '{ "keyRef": "key1"}';
计数正确,但速度慢(不使用索引)。
Aggregate (cost=58652.33..58652.34 rows=1 width=8) (actual time=32023.053..32023.054 rows=1 loops=1)
Output: count(data)
Buffers: shared hit=6550821 read=731787
-> Bitmap Heap Scan on public.customers (cost=215.96..58598.15 rows=21672 width=341) (actual time=112.830..31779.441 rows=2167122 loops=1)
" Output: id, created, lastmodified, mandantid, data, customernumberpseudonym, internetnumberpseudonym, deletiondate"
" Recheck Cond: ((customers.data -> '$encrypted'::text) @> '{""keyRef"": ""initialKey""}'::jsonb)"
Rows Removed by Index Recheck: 49
Heap Blocks: exact=45746 lossy=98820
Buffers: shared hit=6550821 read=731787
-> Bitmap Index Scan on idx_customer_encryption_key (cost=0.00..210.54 rows=21672 width=0) (actual time=106.353..106.354 rows=2167122 loops=1)
" Index Cond: ((customers.data -> '$encrypted'::text) @> '{""keyRef"": ""initialKey""}'::jsonb)"
Buffers: shared hit=2 read=303
Planning:
Buffers: shared hit=1
Planning Time: 0.068 ms
Execution Time: 32023.422 ms
伯吉斯的评论
一个12b1x一个13b1x
方法5
CREATE INDEX idx_customer_encryption_key ON customers USING GIN( ((data -> '$encrypted')::jsonb));
SELECT COUNT(data) FROM customers WHERE data -> '$encrypted' @> '{"keyRef" : "key1"}';
查询可以工作,但速度较慢(未使用索引)。
Aggregate (cost=58676.33..58676.34 rows=1 width=8) (actual time=27662.270..27662.272 rows=1 loops=1)
Output: count(data)
Buffers: shared hit=6551064 read=731894 written=13274
-> Bitmap Heap Scan on public.customers (cost=239.96..58622.15 rows=21672 width=341) (actual time=145.119..27449.141 rows=2167122 loops=1)
" Output: id, created, lastmodified, mandantid, data, customernumberpseudonym, internetnumberpseudonym, deletiondate"
" Recheck Cond: ((customers.data -> '$encrypted'::text) @> '{""keyRef"": ""initialKey""}'::jsonb)"
Rows Removed by Index Recheck: 49
Heap Blocks: exact=45746 lossy=98820
Buffers: shared hit=6551064 read=731894 written=13274
-> Bitmap Index Scan on idx_customer_encryption_key (cost=0.00..234.54 rows=21672 width=0) (actual time=138.368..138.369 rows=2167122 loops=1)
" Index Cond: ((customers.data -> '$encrypted'::text) @> '{""keyRef"": ""initialKey""}'::jsonb)"
Buffers: shared hit=240 read=415
Planning:
Buffers: shared hit=22 read=4 dirtied=1
Planning Time: 0.575 ms
Execution Time: 27662.936 ms
方法6
CREATE INDEX idx_customer_encryption_key ON customers USING gin( (data #> '{$encrypted,keyRef}') );
SELECT COUNT(data) from customers WHERE (data #>> '{$encrypted,keyRef}')::text = 'key1';
查询可以工作,但速度较慢(未使用索引)。
Finalize Aggregate (cost=159122.35..159122.36 rows=1 width=8) (actual time=7124.276..7130.794 rows=1 loops=1)
Output: count(data)
Buffers: shared hit=6550972 read=731471 written=307
-> Gather (cost=159122.14..159122.35 rows=2 width=8) (actual time=7124.001..7130.783 rows=3 loops=1)
Output: (PARTIAL count(data))
Workers Planned: 2
Workers Launched: 2
Buffers: shared hit=6550972 read=731471 written=307
-> Partial Aggregate (cost=158122.14..158122.15 rows=1 width=8) (actual time=7108.420..7108.420 rows=1 loops=3)
Output: PARTIAL count(data)
Buffers: shared hit=6550972 read=731471 written=307
Worker 0: actual time=7100.765..7100.765 rows=1 loops=1
JIT:
Functions: 5
" Options: Inlining false, Optimization false, Expressions true, Deforming true"
" Timing: Generation 0.632 ms, Inlining 0.000 ms, Optimization 0.411 ms, Emission 9.283 ms, Total 10.327 ms"
Buffers: shared hit=2175813 read=242587 written=85
Worker 1: actual time=7100.660..7100.661 rows=1 loops=1
JIT:
Functions: 5
" Options: Inlining false, Optimization false, Expressions true, Deforming true"
" Timing: Generation 0.620 ms, Inlining 0.000 ms, Optimization 0.393 ms, Emission 8.127 ms, Total 9.140 ms"
Buffers: shared hit=2180525 read=244599 written=117
-> Parallel Seq Scan on public.customers (cost=0.00..158110.85 rows=4515 width=341) (actual time=8.311..7054.700 rows=722374 loops=3)
" Output: id, created, lastmodified, mandantid, data, customernumberpseudonym, internetnumberpseudonym, deletiondate"
" Filter: ((customers.data #>> '{$encrypted,keyRef}'::text[]) = 'initialKey'::text)"
Rows Removed by Filter: 18
Buffers: shared hit=6550972 read=731471 written=307
Worker 0: actual time=10.292..7046.967 rows=718818 loops=1
Buffers: shared hit=2175813 read=242587 written=85
Worker 1: actual time=8.559..7046.322 rows=726031 loops=1
Buffers: shared hit=2180525 read=244599 written=117
Planning:
Buffers: shared hit=17 dirtied=1
Planning Time: 0.187 ms
JIT:
Functions: 17
" Options: Inlining false, Optimization false, Expressions true, Deforming true"
" Timing: Generation 1.666 ms, Inlining 0.000 ms, Optimization 1.118 ms, Emission 23.162 ms, Total 25.945 ms"
Execution Time: 7131.263 ms
方法7
现在我不知道为什么查询不使用索引了。
有什么想法吗?任何帮助都是非常感谢的!
1条答案
按热度按时间rsaldnfx1#
虽然理论上一个常规的B树索引可以通过对〈和〉进行分段扫描并合并它们来实现!=,但这并没有实现,所以在当前的系统中,您需要一个过滤索引,它的WHERE与查询中使用的WHERE匹配。
当然,除非这种情况非常罕见,否则索引可能就没有用了,而且,如果“data”没有路径
'$encrypted' -> 'keyRef'
,这可能就不能达到你想要的效果,因为你会测试NULL != 'key1'
,结果不会是true。