我在kubernetes有3个节点的cassandra集群。使用比特纳米/Cassandra Helm 图部署Cassandra。
一段时间后,由于请求数过多而导致获取错误
WARN [GossipTasks:1] 2020-01-09 11:39:33,070 FailureDetector.java:278 - Not marking nodes down due to local pause of 8206335128 > 5000000000
WARN [GossipTasks:1] 2020-01-09 11:39:42,238 FailureDetector.java:278 - Not marking nodes down due to local pause of 6668041401 > 5000000000
WARN [GossipTasks:1] 2020-01-09 11:40:03,341 FailureDetector.java:278 - Not marking nodes down due to local pause of 15041441083 > 5000000000
WARN [PERIODIC-COMMIT-LOG-SYNCER] 2020-01-09 11:41:55,606 NoSpamLogger.java:94 - Out of 1 commit log syncs over the past 0.00s with average duration of 11850.79ms, 1 have exceeded the configured commit interval by an average of 1850.79ms
WARN [GossipTasks:1] 2020-01-09 11:42:20,019 Gossiper.java:783 - Gossip stage has 1 pending tasks; skipping status check (no nodes will be marked down)
NFO [RequestResponseStage-1] 2020-01-09 11:45:36,329 Gossiper.java:1011 - InetAddress /100.96.7.7 is now UP
INFO [RequestResponseStage-1] 2020-01-09 11:45:36,330 Gossiper.java:1011 - InetAddress /100.96.7.7 is now UP
INFO [ScheduledTasks:1] 2020-01-09 11:45:55,931 MessagingService.java:1236 - MUTATION messages were dropped in last 5000 ms: 0 internal and 45 cross node. Mean internal dropped latency: 0 ms and Mean cross-node dropped latency: 2874 ms
INFO [ScheduledTasks:1] 2020-01-09 11:45:55,933 StatusLogger.java:47 - Pool Name Active Pending Completed Blocked All Time Blocked
INFO [ScheduledTasks:1] 2020-01-09 11:45:55,949 StatusLogger.java:51 - MutationStage 0 0 226236 0 0
INFO [ScheduledTasks:1] 2020-01-09 11:45:55,950 StatusLogger.java:51 - ViewMutationStage 0 0 0 0 0
INFO [ScheduledTasks:1] 2020-01-09 11:45:55,950 StatusLogger.java:51 - ReadStage 0 0 244468 0 0
INFO [ScheduledTasks:1] 2020-01-09 11:45:55,951 StatusLogger.java:51 - RequestResponseStage 0 0 341270 0 0
INFO [ScheduledTasks:1] 2020-01-09 11:45:55,952 StatusLogger.java:51 - ReadRepairStage 0 0 5395 0 0
INFO [ScheduledTasks:1] 2020-01-09 11:45:55,953 StatusLogger.java:51 - CounterMutationStage 0 0 0 0 0
INFO [ScheduledTasks:1] 2020-01-09 11:45:55,958 StatusLogger.java:51 - MiscStage 0 0 0 0 0
INFO [ScheduledTasks:1] 2020-01-09 11:45:55,959 StatusLogger.java:51 - CompactionExecutor 0 0 686641 0 0
INFO [ScheduledTasks:1] 2020-01-09 11:45:55,960 StatusLogger.java:51 - MemtableReclaimMemory 0 0 689 0 0
INFO [ScheduledTasks:1] 2020-01-09 11:45:55,962 StatusLogger.java:51 - PendingRangeCalculator 0 0 9 0 0
INFO [ScheduledTasks:1] 2020-01-09 11:45:55,964 StatusLogger.java:51 - GossipStage 0 0 3093860 0 0
INFO [ScheduledTasks:1] 2020-01-09 11:45:55,966 StatusLogger.java:51 - SecondaryIndexManagement 0 0 0 0 0
INFO [ScheduledTasks:1] 2020-01-09 11:45:55,970 StatusLogger.java:51 - HintsDispatcher 0 0 10 0 0
INFO [ScheduledTasks:1] 2020-01-09 11:45:55,973 StatusLogger.java:51 - MigrationStage 0 0 6 0 0
INFO [ScheduledTasks:1] 2020-01-09 11:45:55,973 StatusLogger.java:51 - MemtablePostFlush 0 0 717 0 0
INFO [ScheduledTasks:1] 2020-01-09 11:45:55,974 StatusLogger.java:51 - PerDiskMemtableFlushWriter_0 0 0 689 0 0
:
INFO [ScheduledTasks:1] 2020-01-09 11:45:55,974 StatusLogger.java:51 - PerDiskMemtableFlushWriter_0 0 0 689 0 0
INFO [ScheduledTasks:1] 2020-01-09 11:45:55,974 StatusLogger.java:51 - ValidationExecutor 0 0 0 0 0
INFO [ScheduledTasks:1] 2020-01-09 11:45:55,975 StatusLogger.java:51 - Sampler 0 0 0 0 0
INFO [ScheduledTasks:1] 2020-01-09 11:45:55,975 StatusLogger.java:51 - MemtableFlushWriter 0 0 689 0 0
INFO [ScheduledTasks:1] 2020-01-09 11:45:55,976 StatusLogger.java:51 - InternalResponseStage 0 0 869 0 0
INFO [ScheduledTasks:1] 2020-01-09 11:45:55,977 StatusLogger.java:51 - AntiEntropyStage 0 0 0 0 0
INFO [ScheduledTasks:1] 2020-01-09 11:45:55,978 StatusLogger.java:51 - CacheCleanupExecutor 0 0 0 0 0
INFO
INFO [Service Thread] 2020-01-09 12:11:49,292 GCInspector.java:284 - ParNew GC in 659ms. CMS Old Gen: 2056877512 -> 2057740336; Par Eden Space: 671088640 -> 0; Par Survivor Space: 2636992 -> 6187520
试图解决一些基于参考文献的问题,但没有给出针对kubernetes-cassandra的错误消息:由于局部暂停而没有标记节点。为什么?
Pool Name Active Pending Completed Blocked All time blocked
ReadStage 0 0 245904 0 0
MiscStage 0 0 0 0 0
CompactionExecutor 0 0 696906 0 0
MutationStage 0 0 244820 0 0
MemtableReclaimMemory 0 0 697 0 0
PendingRangeCalculator 0 0 9 0 0
GossipStage 0 0 3138625 0 0
SecondaryIndexManagement 0 0 0 0 0
HintsDispatcher 0 0 10 0 0
RequestResponseStage 0 0 364305 0 0
Native-Transport-Requests 0 0 11089339 0 241
ReadRepairStage 0 0 5395 0 0
CounterMutationStage 0 0 0 0 0
MigrationStage 0 0 6 0 0
MemtablePostFlush 0 0 725 0 0
PerDiskMemtableFlushWriter_0 0 0 697 0 0
ValidationExecutor 0 0 0 0 0
Sampler 0 0 0 0 0
MemtableFlushWriter 0 0 697 0 0
InternalResponseStage 0 0 869 0 0
ViewMutationStage 0 0 0 0 0
AntiEntropyStage 0 0 0 0 0
CacheCleanupExecutor 0 0 0 0 0
Message type Dropped
READ 0
RANGE_SLICE 0
_TRACE 0
HINT 0
MUTATION 45
COUNTER_MUTATION 0
BATCH_STORE 0
BATCH_REMOVE 0
REQUEST_RESPONSE 0
PAGED_RANGE 0
READ_REPAIR 0
2条答案
按热度按时间ymdaylpp1#
根据您在上面发布的日志条目,节点被重载,使它们没有响应。由于commitlog磁盘无法跟上写操作的速度,因此会删除突变。
您需要检查集群的大小,因为您可能需要添加更多节点来增加容量。干杯!
4zcjmb1e2#
从上面的“tpstats”度量看起来不错,但是我们可以看到一些变异,所以它表明集群正在过载。一些请求也被阻止了。委员会成员似乎不接受这封信。您应该计划集群扩展或开始调试节点过载的原因。