How does one figure out the approximate number of keys in a table in Cassandra?

Posted by vx6bjr1n on 2021-06-14 in Cassandra


I've seen people mention "Number of keys (estimate)" from running nodetool cfstats, but at least on my system (Cassandra version 3.11.3), I don't see it:

Table: XXXXXX
            SSTable count: 4
            Space used (live): 2393755943
            Space used (total): 2393755943
            Space used by snapshots (total): 0
            Off heap memory used (total): 2529880
            SSTable Compression Ratio: 0.11501749368144083
            Number of partitions (estimate): 1146
            Memtable cell count: 296777
            Memtable data size: 147223380
            Memtable off heap memory used: 0
            Memtable switch count: 127
            Local read count: 9
            Local read latency: NaN ms
            Local write count: 44951572
            Local write latency: 0.043 ms
            Pending flushes: 0
            Percent repaired: 0.0
            Bloom filter false positives: 0
            Bloom filter false ratio: 0.00000
            Bloom filter space used: 2144
            Bloom filter off heap memory used: 2112
            Index summary off heap memory used: 240
            Compression metadata off heap memory used: 2527528
            Compacted partition minimum bytes: 447
            Compacted partition maximum bytes: 43388628
            Compacted partition mean bytes: 13547448
            Average live cells per slice (last five minutes): NaN
            Maximum live cells per slice (last five minutes): 0
            Average tombstones per slice (last five minutes): NaN
            Maximum tombstones per slice (last five minutes): 0
            Dropped Mutations: 0

Is there any way to estimate select count(*) from XXXXXX with this version of Cassandra?

cassandra

Source: https://stackoverflow.com/questions/57962082/how-does-one-figure-out-the-approximate-number-of-keys-in-a-table-in-cassandra

2 Answers


wztqucjr1#

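"Number of keys" is the same thing as "number of partitions", and again it is an estimate. If your partition key is the entire primary key (no clustering columns), then you have an estimate of the number of rows on that node. Otherwise, it is simply an estimate of the number of partition key values on that node.
-Jim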
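A minimal sketch of reading that estimate in a script, assuming you have already captured the nodetool cfstats output for a single table into a string (the helper name partition_estimate is hypothetical). The pattern accepts both the 3.11 label shown in the question and the older "Number of keys (estimate)" label, since both report the same per-node partition count.

```python
import re
from typing import Optional

# Matches the 3.11 label "Number of partitions (estimate)" and the older
# "Number of keys (estimate)" label; both are the same per-node estimate.
_ESTIMATE_RE = re.compile(r"Number of (?:partitions|keys) \(estimate\):\s*(\d+)")


def partition_estimate(stats_text: str) -> Optional[int]:
    """Return the partition-count estimate from nodetool cfstats text.

    Pass the output for a single table: only the first match is used.
    Returns None if the line is not present.
    """
    match = _ESTIMATE_RE.search(stats_text)
    return int(match.group(1)) if match else None
```

Given the cfstats output from the question it returns 1146. For a table without clustering columns, that number doubles as the row estimate for that node.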

2021-06-14

zz2j4svz2#

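CASSANDRA-13722 changed that. The "Number of keys" estimate always meant "number of partitions", so the new label just makes that obvious.
To estimate the number of rows in a large table, you can use that value (number of partitions) as a starting point. Then estimate the average number of clustering key combinations (rows) per partition, and you should be able to make an educated guess.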
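That educated guess can be sketched as below. It assumes you can obtain row counts for a handful of partitions, for example by running SELECT COUNT(*) restricted to a single partition key for a few known keys (which reads only that partition, unlike an unrestricted count). The function name and the sample numbers are hypothetical.

```python
from statistics import mean
from typing import Sequence


def estimate_rows(partition_estimate: int, sampled_rows_per_partition: Sequence[int]) -> int:
    """Estimate rows on one node as partitions x mean rows per sampled partition."""
    if not sampled_rows_per_partition:
        raise ValueError("need at least one sampled partition")
    # The sample mean stands in for the average number of clustering
    # key combinations (rows) per partition.
    return round(partition_estimate * mean(sampled_rows_per_partition))
```

With the 1146 partitions from the question and three hypothetical sampled partitions holding 10, 20 and 30 rows, estimate_rows(1146, [10, 20, 30]) gives 22920. A small sample can be badly off when partition sizes are skewed.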
Another thought would be to figure out the size (in bytes) of one row. Then look at the P50 of the output of nodetool tablehistograms keyspacename.tablename:

Percentile  SSTables     Write Latency      Read Latency    Partition Size        Cell Count
                              (micros)          (micros)           (bytes)
50%             2.00             35.43           4866.32               124                 1
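To read that value in a script, a minimal sketch, assuming the column order shown above (Percentile, SSTables, Write Latency, Read Latency, Partition Size, Cell Count). The function name p50_partition_bytes is hypothetical; it simply looks for the row whose first field is 50%.

```python
def p50_partition_bytes(histogram_text: str) -> float:
    """Return the 50th-percentile Partition Size (bytes) from tablehistograms output."""
    for line in histogram_text.splitlines():
        fields = line.split()
        # Data rows start with the percentile label, e.g. "50%".
        if fields and fields[0] == "50%":
            # Index 4 is "Partition Size" in the column order shown above.
            return float(fields[4])
    raise ValueError("no 50% row found in tablehistograms output")
```

For the output above it returns 124.0.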

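Divide the P50 (50th percentile) of Partition Size by the size of one row. That should give you the typical (median) number of rows per partition for that table. Then multiply that by the "Number of partitions", and you should have your number for that node.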
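The calculation described above, as a short sketch (function name hypothetical). The example numbers combine the 124-byte P50 from the histogram above with the 1146 partitions from the question's cfstats output and a hypothetical 62-byte row; they come from different tables, so the result only illustrates the arithmetic.

```python
def estimate_node_rows(partition_estimate: int, p50_partition_bytes: float, row_bytes: float) -> int:
    """Rows on one node ~= partitions x (P50 partition size / size of one row)."""
    if row_bytes <= 0:
        raise ValueError("row_bytes must be positive")
    # Median partition size divided by row size = typical rows per partition.
    rows_per_partition = p50_partition_bytes / row_bytes
    return round(partition_estimate * rows_per_partition)
```

estimate_node_rows(1146, 124, 62) returns 2292 rows for that node. Because P50 is a median rather than a mean, a table with a few very large partitions can hold far more rows than this suggests.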
How does one get the size of a row in Cassandra?

$ bin/cqlsh 127.0.0.1 -u aaron -p yourPasswordSucks -e "SELECT * FROM system.local WHERE key='local';" > local.txt
$ ls -al local.txt
-rw-r--r--  1 z001mj8  DHC\Domain Users  2321 Sep 16 15:08 local.txt

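Obviously you'll want to strip out the pipe delimiters and the row header (not to mention account for size differences between strings and numbers), but the final byte size of the file should put you in the ballpark.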
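A rough sketch of that cleanup, assuming the default tabular output that cqlsh -e prints (a column-header line, a ----+---- separator, one line per row and a "(N rows)" footer); the helper name approx_row_bytes is hypothetical. It averages the UTF-8 length of the values over all returned rows. Like the ls approach, it measures the text form of each value, so numbers, timestamps and blobs will not match their stored size, and a value that itself contains "|" would be split incorrectly.

```python
import re


def approx_row_bytes(cqlsh_output: str) -> int:
    """Average UTF-8 byte length of the values in cqlsh tabular output.

    Drops the column header, the ----+---- separator, the "(N rows)"
    footer, the pipe delimiters and the padding around each value.
    """
    lines = [line for line in cqlsh_output.splitlines() if line.strip()]
    total = 0
    rows = 0
    for line in lines[1:]:  # lines[0] is the column header
        stripped = line.strip()
        if re.fullmatch(r"[-+]+", stripped) or re.fullmatch(r"\(\d+ rows?\)", stripped):
            continue  # separator line or row-count footer
        values = [value.strip() for value in line.split("|")]
        total += sum(len(value.encode("utf-8")) for value in values)
        rows += 1
    if rows == 0:
        raise ValueError("no data rows found")
    return total // rows
```

For a one-row result whose values are local, Test Cluster and 3.11.3, it returns 23.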

2021-06-14
