Problem inserting new entries into Astra Cassandra

bqf10yzr  posted on 2022-09-27 in Cassandra

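I am migrating from Cassandra on an AWS machine to Astra Cassandra, and I have run into a problem:
I cannot insert a value into a column in Astra Cassandra when it is around 2 million characters and 1.77 MB (and I have bigger data to insert, around 20 million characters). Does anyone know how to address this?
I am inserting it from a Python app (cassandra-driver==3.17.0), and this is the error stack I get: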

start.sh[5625]: [2022-07-12 15:14:39,336] 
INFO in db_ops: error = Error from server: code=1500
[Replica(s) failed to execute write] 
message="Operation failed - received 0 responses and 2 failures: UNKNOWN from 0.0.0.125:7000, UNKNOWN from 0.0.0.181:7000" 
info={'consistency': 'LOCAL_QUORUM', 'required_responses': 2, 'received_responses': 0, 'failures': 2}

If I use half as many characters, it works.
Table description from the new Astra Cassandra CQL console:

token@cqlsh> describe mykeyspace.series;

CREATE TABLE mykeyspace.series (
    type text,
    name text,
    as_of timestamp,
    data text,
    hash text,
    PRIMARY KEY ((type, name, as_of))
) WITH additional_write_policy = '99PERCENTILE'
    AND bloom_filter_fp_chance = 0.01
    AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
    AND comment = ''
    AND compaction = {'class': 'org.apache.cassandra.db.compaction.UnifiedCompactionStrategy'}
    AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND crc_check_chance = 1.0
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair = 'BLOCKING'
    AND speculative_retry = '99PERCENTILE';

Old Cassandra table description:

ansible@cqlsh> describe mykeyspace.series;

CREATE TABLE mykeyspace.series (
    type text,
    name text,
    as_of timestamp,
    data text,
    hash text,
    PRIMARY KEY ((type, name, as_of))
) WITH bloom_filter_fp_chance = 0.01
    AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
    AND comment = ''
    AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
    AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND crc_check_chance = 1.0
    AND dclocal_read_repair_chance = 0.1
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99PERCENTILE';

Data sample:

{"type": "OP", "name": "book", "as_of": "2022-03-17", "data": [{"year": 2022, "month": 3, "day": 17, "hour": 0, "quarter": 1, "week": 11, "wk_year": 2022, "is_peak": 0, "value": 1.28056854009628e-08}, .... ], "hash": "84421b8d934b06488e1ac464bd46e83ccd2beea5eb2f9f2c52428b706a9b2a10"}

where this JSON contains 27,000 entries inside the data array, like:

{"year": 2022, "month": 3, "day": 17, "hour": 0, "quarter": 1, "week": 11, "wk_year": 2022, "is_peak": 0, "value": 1.28056854009628e-08}

Python part of the code:

def insert_to_table(self, table_name, **kwargs):
        try:
            ...
            elif table_name == "series":
                self.session.execute(
                    self.session.prepare("INSERT INTO series (type, name, as_of, data, hash) VALUES (?, ?, ?, ?, ?)"),
                    (
                        kwargs["type"],
                        kwargs["name"],
                        kwargs["as_of"],
                        kwargs["data"],
                        kwargs["hash"],
                    ),
                )
            return True
        except Exception as error:
            current_app.logger.error('src/db/db_ops.py insert_to_table() table_name = %s error = %s', table_name, error)
            return False

Thanks a lot!

cyej8jka  #1

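You are hitting the configured limit for the maximum mutation size. On Cassandra this defaults to 16 MB, while on Astra DB it is currently 4 MB (it may be raised, but performing inserts with very large cell sizes is still strongly discouraged).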
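As a rough pre-check, the text payload of a row can be measured before it is sent. This is only a sketch: the helper names are hypothetical, the 4 MB figure is simply the one quoted above, and the result is a lower bound because the real mutation also carries serialization overhead, so the effective limit on a given database may be hit earlier.

```python
# Figure quoted in this answer; treat it as an assumption, not a guarantee.
ASSUMED_LIMIT_BYTES = 4 * 1024 * 1024


def text_payload_bytes(values):
    """Sum the UTF-8 encoded size of every str value in a row."""
    return sum(len(v.encode("utf-8")) for v in values if isinstance(v, str))


def fits_mutation_limit(values, limit_bytes=ASSUMED_LIMIT_BYTES):
    """Return True if the text payload alone stays within limit_bytes."""
    return text_payload_bytes(values) <= limit_bytes
```

Logging `text_payload_bytes(...)` next to each failed insert would at least show at which payload size the writes start to fail.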
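A more agile approach to storing this data would be to revise your data model and split the big row with its huge string into several rows, each containing a single one of the 27,000 or so entries. With proper use of partitioning, you can still retrieve the whole contents with a single query (paginated between the database and the driver for your convenience, which helps avoid the annoying timeouts that can arise when reading such large individual rows).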
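A minimal sketch of that split, under assumptions of mine: the table name `series_points`, the clustering column `idx`, and the helper `explode_series` are hypothetical and do not come from the question, and the `hash` column is left out for brevity. The partition key stays `(type, name, as_of)`, so all entries of one series live in the same partition.

```python
import json
from typing import Any, Dict, Iterator, Tuple

# Hypothetical table: one row per entry, clustered by position in the array.
CREATE_TABLE_CQL = """
CREATE TABLE IF NOT EXISTS mykeyspace.series_points (
    type text,
    name text,
    as_of timestamp,
    idx int,
    point text,
    PRIMARY KEY ((type, name, as_of), idx)
)
"""

INSERT_POINT_CQL = (
    "INSERT INTO series_points (type, name, as_of, idx, point) "
    "VALUES (?, ?, ?, ?, ?)"
)


def explode_series(doc: Dict[str, Any]) -> Iterator[Tuple[Any, Any, Any, int, str]]:
    """Yield one (type, name, as_of, idx, point) tuple per entry in doc["data"]."""
    for idx, entry in enumerate(doc["data"]):
        # Each entry is serialized on its own, so every cell stays small.
        yield (doc["type"], doc["name"], doc["as_of"], idx, json.dumps(entry))
```

Each tuple can then be bound to a statement prepared from `INSERT_POINT_CQL`, and the whole series read back with a single `SELECT ... WHERE type = ? AND name = ? AND as_of = ?`, which the driver fetches page by page.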
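Incidentally, I suggest you create the prepared statement only once, outside the insert_to_table function (cache it or something). In the insert function you then just call self.session.execute(already_prepared_statement, (value1, value2, ...)), which noticeably improves performance.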
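One way to arrange this, as a sketch: the class name `SeriesWriter` and its method are hypothetical, and `session` is assumed to be the driver's `Session` object (anything exposing `prepare()` and `execute()` in the same shape works).

```python
class SeriesWriter:
    """Prepares the INSERT once and reuses it for every write."""

    INSERT_CQL = (
        "INSERT INTO series (type, name, as_of, data, hash) "
        "VALUES (?, ?, ?, ?, ?)"
    )

    def __init__(self, session):
        self.session = session
        # Prepared a single time, when the writer is created.
        self._insert_series = session.prepare(self.INSERT_CQL)

    def insert_series(self, **kwargs):
        # Only binds values and executes; no prepare() call per insert.
        self.session.execute(
            self._insert_series,
            (
                kwargs["type"],
                kwargs["name"],
                kwargs["as_of"],
                kwargs["data"],
                kwargs["hash"],
            ),
        )
```

Creating the writer once at application start-up and sharing it keeps `prepare()` out of the per-insert path.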
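A last point: I believe the drivers can connect to Astra DB only starting from version 3.24.0, so I'm not sure how you are using version 3.17. I don't think version 3.17 knows about the cloud argument to the Cluster constructor. In any case, I suggest you upgrade the driver to the latest version (currently 3.25.0).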

5vf7fwbs  #2

Some of the details you posted in your question don't quite add up.
In the schema you posted, the data column is of type text:

data text,

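But the sample data you posted looks like you are inserting key/value pairs which, oddly, seem to be formatted like a CQL collection type.
If it really were a string, it would be formatted as: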

... "data": "key1: value1, key2: value2, ...", ...
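If the value held by the application is actually a list of dicts rather than a string, one possible way to make it match a text column is to serialize it first. This is a sketch under that assumption; the helper name `to_text_column` is hypothetical.

```python
import json


def to_text_column(entries):
    """Serialize a list of entry dicts into one JSON string for a text column."""
    return json.dumps(entries)


entries = [
    {"year": 2022, "month": 3, "day": 17, "hour": 0, "is_peak": 0},
    {"year": 2022, "month": 3, "day": 17, "hour": 1, "is_peak": 0},
]
data_text = to_text_column(entries)  # a str, suitable for binding to `data text`
```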

Check your data and code, then retry. Cheers
