PostgreSQL version: 13.10 (AWS RDS)
Index-only scan query:
explain (analyse, verbose, buffers)
SELECT block_timestamp
FROM archive_tokentransfer
WHERE (from_addr = 72387 AND block_timestamp >= 0)
ORDER BY block_timestamp
LIMIT 10000
Query plan:
"QUERY PLAN"
"Limit (cost=0.71..4279.42 rows=10000 width=8) (actual time=0.483..60.586 rows=10000 loops=1)"
" Output: block_timestamp"
" Buffers: shared hit=4909 read=1659 written=200"
" I/O Timings: read=37.730 write=13.824"
" -> Index Only Scan Backward using token_transfer_from_addr on public.archive_tokentransfer (cost=0.71..245647.39 rows=574114 width=8) (actual time=0.482..59.544 rows=10000 loops=1)"
" Output: block_timestamp"
" Index Cond: ((archive_tokentransfer.from_addr = 72387) AND (archive_tokentransfer.block_timestamp >= 0))"
" Heap Fetches: 1145"
" Buffers: shared hit=4909 read=1659 written=200"
" I/O Timings: read=37.730 write=13.824"
"Planning:"
" Buffers: shared hit=235 read=7"
" I/O Timings: read=0.036"
"Planning Time: 29.771 ms"
"Execution Time: 61.209 ms"
Last vacuum details:
[
{
"schemaname": "public",
"relname": "archive_tokentransfer",
"last_vacuum": null,
"last_autovacuum": "2023-10-19 02:51:55.087541+00",
"last_analyze": null,
"last_autoanalyze": "2023-11-06 18:53:41.891353+00"
}
]
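For reference, these timestamps presumably came from the pg_stat_user_tables statistics view; a query along these lines (the WHERE filter is my assumption) reproduces that output:

-- vacuum/analyze bookkeeping for the table, from the cumulative statistics views
SELECT schemaname, relname,
       last_vacuum, last_autovacuum,
       last_analyze, last_autoanalyze
FROM pg_stat_user_tables
WHERE relname = 'archive_tokentransfer';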
I can't figure out why an index scan that fetches 10k rows has to go through 6k buffers. Also, the number of heap fetches (~1k) seems high, because we update very few rows in the table.
The index has 3 IntegerFields (4 bytes) and 2 BigIntegerFields (8 bytes) -> 28 bytes/row plus some overhead. Dividing the index size by the number of rows gives an average of about 50 bytes per row. Since each Postgres page is 8 kB, we should have around 100 rows per page even if we assume the pages are only half full. For 10k rows, that should take roughly 100 page reads.
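A quick sanity check of that size-per-entry arithmetic; this is only a sketch, since reltuples is the planner's estimate and is refreshed by (auto)analyze:

-- rough average bytes per index entry = index size / estimated row count
SELECT pg_relation_size('token_transfer_from_addr')
       / NULLIF((SELECT reltuples::bigint
                 FROM pg_class
                 WHERE relname = 'archive_tokentransfer'), 0) AS avg_bytes_per_index_entry;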
I tried recreating the index with PostgreSQL REINDEX CONCURRENTLY. There was no difference in the number of buffer reads or heap fetches, but the index size did drop from roughly 1200 GB to 850 GB.
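For completeness, the rebuild was of this form (available since PostgreSQL 12, so fine on 13.10):

REINDEX INDEX CONCURRENTLY token_transfer_from_addr;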
Also, we don't see similar behaviour when we try different values of from_addr, even ones with more than 10k rows. For those, buffer reads are around 100 and heap fetches are 0.
I also tried deleting and re-inserting the affected rows - buffer hits dropped from 6k to 3.5k, but heap fetches went up from 975 to 10,001.
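A minimal sketch of that experiment, assuming the rows were staged through a temporary table (the temp table name is illustrative):

BEGIN;
-- stage the rows for this from_addr, then delete and re-insert them
CREATE TEMP TABLE tmp_transfer_rows AS
    SELECT * FROM archive_tokentransfer WHERE from_addr = 72387;
DELETE FROM archive_tokentransfer WHERE from_addr = 72387;
INSERT INTO archive_tokentransfer SELECT * FROM tmp_transfer_rows;
COMMIT;
-- the re-inserted rows are not marked all-visible until the next vacuum,
-- which is why Heap Fetches jumps to roughly the full row count afterwards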
UPDATE: table schema
devapp=> \d+ archive_tokentransfer;
Table "public.archive_tokentransfer"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
-------------------+------------------+-----------+----------+---------------------------------------------------+----------+--------------+-------------
id | bigint | | not null | nextval('archive_tokentransfer_id_seq'::regclass) | plain | |
block_timestamp | bigint | | not null | | plain | |
value | double precision | | not null | | plain | |
value_usd | double precision | | | | plain | |
block_number | integer | | not null | | plain | |
tx_from_address | integer | | not null | | plain | |
tx_to_address | integer | | not null | | plain | |
emitting_contract | integer | | not null | | plain | |
from_addr | integer | | not null | | plain | |
to_addr | integer | | not null | | plain | |
chain_id | smallint | | not null | | plain | |
tx_idx | smallint | | not null | | plain | |
type | smallint | | not null | | plain | |
pricing_strategy | smallint | | not null | | plain | |
token_id | bytea | | | | extended | |
Indexes:
"archive_tokentransfer_pkey" PRIMARY KEY, btree (id)
"block_data_idx" btree (chain_id, block_number, tx_idx)
"token_transfer_from_addr" btree (from_addr, block_timestamp DESC) INCLUDE (to_addr, emitting_contract)
"token_transfer_nft_transfers" btree (emitting_contract, token_id, block_timestamp DESC) INCLUDE (from_addr, to_addr, value) WHERE token_id IS NOT NULL
"token_transfer_participants" btree (emitting_contract, block_timestamp DESC) INCLUDE (from_addr, to_addr)
"token_transfer_to_addr" btree (to_addr, block_timestamp DESC) INCLUDE (from_addr, emitting_contract)
"token_transfer_value_usd" btree (block_timestamp, value_usd DESC) INCLUDE (from_addr, to_addr, emitting_contract) WHERE value_usd IS NOT NULL
Check constraints:
"archive_tokentransfer_chain_id_check" CHECK (chain_id >= 0)
"archive_tokentransfer_pricing_strategy_check" CHECK (pricing_strategy >= 0)
"archive_tokentransfer_tx_idx_check" CHECK (tx_idx >= 0)
"archive_tokentransfer_type_check" CHECK (type >= 0)
"no_cross_chain_data_bt" CHECK (NOT chain_id = 0)
"one_fungibletoken_only_bt" CHECK (NOT (token_id IS NOT NULL AND type = 4))
"only_allow_token_address_types_bt" CHECK (type = ANY (ARRAY[4, 5, 6]))
Access method: heap
1 Answer
I can't figure out why an index scan that fetches 10k rows has to go through 6k buffers.
An index-only scan needs to consult the visibility map for every tuple. The visibility map pages go through shared buffers, so they count toward the buffer counts. Every time a tuple is covered by a different visibility map page than the previous tuple was, it counts as another buffer access. Those accesses are almost certainly hits, not reads (which is consistent with what you show). There is no code to deduplicate the buffer hit counts: if 4 visibility map pages are being cycled through repeatedly, every time the "current" page changes it is counted as another buffer hit, even if it is just changing back to the one it was two changes ago.
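If you want to see how much of the table the visibility map currently covers, one rough check (relallvisible is only refreshed by vacuum/analyze, so treat it as approximate):

SELECT relpages,
       relallvisible,
       round(100.0 * relallvisible / NULLIF(relpages, 0), 1) AS pct_all_visible
FROM pg_class
WHERE relname = 'archive_tokentransfer';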
Also, the number of heap fetches (~1k) seems high, because we update very few rows in the table.
For a table whose index is 850 GB, 1000 updates could easily count as "very few". Maybe you do have some updates concentrated on that particular from_addr. But there is little point in speculating about it. Just vacuum the table manually. Either it will fix the problem or it won't, and then you will know which it is.
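A minimal form of that manual vacuum (VERBOSE is optional, but its output shows how many pages were skipped or newly marked all-visible):

-- re-run the EXPLAIN above afterwards; if stale visibility-map bits were the
-- cause, Heap Fetches should drop back toward 0
VACUUM (VERBOSE) archive_tokentransfer;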