Cassandra paged query on a secondary index gets slower as data grows

uqzxnwby, posted on 2021-06-15 in Cassandra

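When I query a secondary index with paging, the query gets slower as the data grows.
I expected that, with paging, fetching one page would take the same time no matter how large the data grows. Is that true? Why is my query getting slower?
My simplified table is: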

    CREATE TABLE closed_executions (
      domain_id uuid,
      workflow_id text,
      start_time timestamp,
      workflow_type_name text,
      PRIMARY KEY ((domain_id), start_time)
    ) WITH CLUSTERING ORDER BY (start_time DESC)
      AND COMPACTION = {
        'class': 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'
      }
      AND GC_GRACE_SECONDS = 172800;

I created a secondary index:

    CREATE INDEX closed_by_type ON closed_executions (workflow_type_name);

I query with the following CQL:

    SELECT workflow_id, start_time, workflow_type_name
    FROM closed_executions
    WHERE domain_id = ?
    AND start_time >= ?
    AND start_time <= ?
    AND workflow_type_name = ?

and this code:

    query := v.session.Query(templateGetClosedWorkflowExecutionsByType,
        request.DomainUUID,
        common.UnixNanoToCQLTimestamp(request.EarliestStartTime),
        common.UnixNanoToCQLTimestamp(request.LatestStartTime),
        request.WorkflowTypeName).Consistency(gocql.One)
    iter := query.PageSize(request.PageSize).PageState(request.NextPageToken).Iter()
    // PageSize is 10 here, but could be in the thousands

Environment:
MacBook Pro
Cassandra: 3.11.0
gocql: github.com/gocql/gocql master
Observations:
10K rows: under a second
100K rows: about 3 seconds
1M rows: about 17 seconds
Debug log:

    INFO [ScheduledTasks:1] 2018-09-11 16:29:48,349 NoSpamLogger.java:91 - Some operations were slow, details available at debug level (debug.log)
    DEBUG [ScheduledTasks:1] 2018-09-11 16:29:48,357 MonitoringTask.java:173 - 1 operations were slow in the last 5005 msecs:
    <SELECT * FROM cadence_visibility.closed_executions WHERE workflow_type_name = code.uber.internal/devexp/cadence-bench/load/basic.stressWorkflowExecute AND token(domain_id, domain_partition) >= token(d3138e78-abe7-48a0-adb9-8c466a9bb3fa, 0) AND token(domain_id, domain_partition) <= token(d3138e78-abe7-48a0-adb9-8c466a9bb3fa, 0) AND start_time >= 2018-09-11 16:29-0700 AND start_time <= 1969-12-31 16:00-0800 LIMIT 10>, time 2747 msec - slow timeout 500 msec
    DEBUG [COMMIT-LOG-ALLOCATOR] 2018-09-11 16:31:47,774 AbstractCommitLogSegmentManager.java:107 - No segments in reserve; creating a fresh one
    DEBUG [ScheduledTasks:1] 2018-09-11 16:40:22,922 ColumnFamilyStore.java:899 - Enqueuing flush of size_estimates: 23.997MiB (2%) on-heap, 0.000KiB (0%) off-heap

Related references (none of them answers my question):
https://lists.apache.org/thread.html/%3ccaaikobidknhvoz8oqqmnczfzhdfidfw6hts63vxxcohisqyzgg@mail.gmail.com%3e
https://www.datastax.com/dev/blog/cassandra-native-secondary-index-deep-dive
https://docs.datastax.com/en/developer/java-driver/3.2/manual/paging/
Edit: tablestats output

    Total number of tables: 105
    ----------------
    Keyspace : cadence_visibility
    Read Count: 19
    Read Latency: 0.5125263157894736 ms.
    Write Count: 3220964
    Write Latency: 0.04900822269357869 ms.
    Pending Flushes: 0
    Table: closed_executions
    SSTable count: 1
    SSTables in each level: [1, 0, 0, 0, 0, 0, 0, 0, 0]
    Space used (live): 20.3 MiB
    Space used (total): 20.3 MiB
    Space used by snapshots (total): 0 bytes
    Off heap memory used (total): 6.35 KiB
    SSTable Compression Ratio: 0.40192660515179696
    Number of keys (estimate): 3
    Memtable cell count: 28667
    Memtable data size: 7.35 MiB
    Memtable off heap memory used: 0 bytes
    Memtable switch count: 9
    Local read count: 9
    Local read latency: NaN ms
    Local write count: 327024
    Local write latency: NaN ms
    Pending flushes: 0
    Percent repaired: 0.0
    Bloom filter false positives: 0
    Bloom filter false ratio: 0.00000
    Bloom filter space used: 16 bytes
    Bloom filter off heap memory used: 8 bytes
    Index summary off heap memory used: 38 bytes
    Compression metadata off heap memory used: 6.3 KiB
    Compacted partition minimum bytes: 150
    Compacted partition maximum bytes: 62479625
    Compacted partition mean bytes: 31239902
    Average live cells per slice (last five minutes): NaN
    Maximum live cells per slice (last five minutes): 0
    Average tombstones per slice (last five minutes): NaN
    Maximum tombstones per slice (last five minutes): 0
    Dropped Mutations: 0 bytes
    ----------------
qkf9rpyu1#

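Why doesn't paging scale the way it does on the main table?
The data in your secondary index is scattered. Paging only applies its logic until the page is filled, and because your data is not clustered by time, Cassandra still has to sift through a large number of rows before it finds your first 10.
Query tracing does show that paging is applied at a very late stage.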
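
To make the point above concrete, here is a minimal Go sketch of a page limit that is applied only after filtering. It is a toy model, not Cassandra's actual read path: the `row` type, `firstPage`, and `buildCandidates` are hypothetical names, and the worst-case layout (all rows inside the time window come last in scan order) is an assumption chosen for illustration.

```go
package main

import "fmt"

// row is a hypothetical stand-in for one candidate entry produced by the index.
type row struct {
	startTime int64
}

// firstPage walks the candidates in the given order, keeps those whose
// startTime lies in [from, to], and stops once pageSize rows are collected.
// It returns the page and the number of candidates examined to fill it.
func firstPage(candidates []row, from, to int64, pageSize int) ([]row, int) {
	page := make([]row, 0, pageSize)
	examined := 0
	for _, r := range candidates {
		if len(page) == pageSize {
			break
		}
		examined++
		if r.startTime >= from && r.startTime <= to {
			page = append(page, r)
		}
	}
	return page, examined
}

// buildCandidates returns n candidates of which only the last `matches`
// fall inside the window [0, 10] (assumed worst case).
func buildCandidates(n, matches int) []row {
	rows := make([]row, 0, n)
	for i := 0; i < n; i++ {
		t := int64(-1) // outside the window
		if i >= n-matches {
			t = 1 // inside the window
		}
		rows = append(rows, row{startTime: t})
	}
	return rows
}

func main() {
	for _, n := range []int{10000, 100000, 1000000} {
		page, examined := firstPage(buildCandidates(n, 10), 0, 10, 10)
		fmt.Printf("rows=%d examined=%d page=%d\n", n, examined, len(page))
	}
}
```

In this model the page size stays at 10, yet the number of examined candidates grows with the total row count, which is the shape of the slowdown reported in the question.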
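Why is the secondary index slow?
First, Cassandra reads the index table to retrieve the primary keys of all matching rows; then, for each of them, it reads the original table to fetch the data. This is a known anti-pattern with low-cardinality indexes (see https://www.datastax.com/dev/blog/cassandra-native-secondary-index-deep-dive).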
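
The two-step read described above can be sketched as a small in-memory model. This is an illustration under assumptions, not Cassandra's implementation: `store`, `insert`, and `queryByType` are hypothetical names, and plain Go maps stand in for the base table and the index table.

```go
package main

import "fmt"

// primaryKey mirrors the question's PRIMARY KEY ((domain_id), start_time).
type primaryKey struct {
	domainID  string
	startTime int64
}

type execution struct {
	workflowID       string
	workflowTypeName string
}

// store is a toy model: a base table keyed by primary key, plus an index
// that maps each workflow type to the primary keys of rows carrying it.
type store struct {
	base  map[primaryKey]execution
	index map[string][]primaryKey
}

func newStore() *store {
	return &store{
		base:  make(map[primaryKey]execution),
		index: make(map[string][]primaryKey),
	}
}

func (s *store) insert(pk primaryKey, e execution) {
	s.base[pk] = e
	s.index[e.workflowTypeName] = append(s.index[e.workflowTypeName], pk)
}

// queryByType first reads the index to get the matching primary keys, then
// performs one base-table read per key. It returns the rows and the number
// of base-table reads that were needed.
func (s *store) queryByType(typeName string) ([]execution, int) {
	keys := s.index[typeName] // step 1: index lookup
	out := make([]execution, 0, len(keys))
	baseReads := 0
	for _, pk := range keys { // step 2: one base read per matching key
		baseReads++
		out = append(out, s.base[pk])
	}
	return out, baseReads
}

func main() {
	s := newStore()
	types := []string{"typeA", "typeB"} // low cardinality: two distinct values
	for i := 0; i < 100000; i++ {
		pk := primaryKey{domainID: "d1", startTime: int64(i)}
		s.insert(pk, execution{workflowID: fmt.Sprintf("wf-%d", i), workflowTypeName: types[i%2]})
	}
	rows, reads := s.queryByType("typeA")
	fmt.Printf("matched=%d baseReads=%d\n", len(rows), reads)
}
```

With only two distinct type values, each index entry points at half of the table, so the number of base-table reads grows in step with the table itself.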
