MongoDB - Best way to delete documents by query based on the results of another query

Posted by neskvpey on 2022-11-03 in Go


I have a collection that can contain several million documents. For simplicity, let's assume they look like this:

{'_id': '1', 'user_id': 1, 'event_type': 'a', 'name': 'x'}
{'_id': '2', 'user_id': 1, 'event_type': 'b', 'name': 'x'}
{'_id': '3', 'user_id': 1, 'event_type': 'c', 'name': 'x'}
{'_id': '4', 'user_id': 2, 'event_type': 'a', 'name': 'x'}
{'_id': '5', 'user_id': 2, 'event_type': 'b', 'name': 'x'}
{'_id': '6', 'user_id': 3, 'event_type': 'a', 'name': 'x'}
{'_id': '7', 'user_id': 3, 'event_type': 'b', 'name': 'x'}
{'_id': '8', 'user_id': 4, 'event_type': 'a', 'name': 'x'}
{'_id': '9', 'user_id': 4, 'event_type': 'b', 'name': 'x'}
{'_id': '10', 'user_id': 4, 'event_type': 'c', 'name': 'x'}

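I want a daily job that deletes all documents belonging to a user_id whenever that user_id has at least one document with event_type 'c'.
The resulting collection would therefore be: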

{'_id': '4', 'user_id': 2, 'event_type': 'a', 'name': 'x'}
{'_id': '5', 'user_id': 2, 'event_type': 'b', 'name': 'x'}
{'_id': '6', 'user_id': 3, 'event_type': 'a', 'name': 'x'}
{'_id': '7', 'user_id': 3, 'event_type': 'b', 'name': 'x'}

I managed to do this with the mongo shell:

var cur = db.my_collection.find({'event_type': 'c'})
ids = [];
while (cur.hasNext()) {
  ids.push(cur.next()['user_id']);
  if (ids.length == 5){
    print('deleting for user_ids', ids);
    print(db.my_collection.deleteMany({user_id: {$in: ids}}));
    ids = [];
  }
}
if (ids.length){db.my_collection.deleteMany({user_id: {$in: ids}})}

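This creates a cursor over all documents with event_type 'c', collects their user_ids into batches of 5, and then deletes every document with those user_ids.
It works, but it seems very slow, as if each cur.next() call fetched only one document at a time.
I would like to know whether there is a better or more correct way to do this. If this were Elasticsearch, I would create a sliced scroll, scan each slice in parallel, and submit parallel deleteByQuery requests with 1000 ids each.
In terms of scale, I expect the collection to hold millions of documents (about 10M), of which roughly 300K match the query and roughly 700K should be deleted.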

mongodb

Source: https://stackoverflow.com/questions/74209592/mongodb-best-way-to-delete-documents-by-query-based-on-results-of-another-quer

1 Answer


7hiiyaii1#

It sounds like you could just use deleteMany with the original query:

db.my_collection.deleteMany({
    event_type: 'c'
})

It has no size limit; depending on your instance size it might just take a couple of minutes to run.
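One thing worth checking before relying on this filter: it only matches the documents whose own event_type is 'c', not the other documents of the same user. A small in-memory sketch illustrates the difference on the sample data from the question. It uses plain arrays as a stand-in for the collection, not MongoDB calls, and the variable names are illustrative:

```javascript
// Sample documents from the question (plain objects, no MongoDB needed).
const docs = [
  { _id: '1', user_id: 1, event_type: 'a' },
  { _id: '2', user_id: 1, event_type: 'b' },
  { _id: '3', user_id: 1, event_type: 'c' },
  { _id: '4', user_id: 2, event_type: 'a' },
  { _id: '5', user_id: 2, event_type: 'b' },
  { _id: '6', user_id: 3, event_type: 'a' },
  { _id: '7', user_id: 3, event_type: 'b' },
  { _id: '8', user_id: 4, event_type: 'a' },
  { _id: '9', user_id: 4, event_type: 'b' },
  { _id: '10', user_id: 4, event_type: 'c' },
];

// Stand-in for deleteMany({event_type: 'c'}): only the 'c' documents go away.
const afterEventFilter = docs.filter(d => d.event_type !== 'c');

// What the question asks for: drop every document of any user with a 'c' event.
const usersWithC = new Set(
  docs.filter(d => d.event_type === 'c').map(d => d.user_id)
);
const afterUserFilter = docs.filter(d => !usersWithC.has(d.user_id));

console.log(afterEventFilter.length);          // 8
console.log(afterUserFilter.map(d => d._id));  // [ '4', '5', '6', '7' ]
```

Only the second result matches the expected collection shown in the question.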
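Edit:
Personally, I would try the distinct function first, since it gives the cleanest and simplest code. distinct does have a 16 MB limit on its result, though, and roughly 300K unique ids per day (depending on the size of the user_id field) sounds a bit close to that threshold, or past it.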

const userIds = db.my_collection.distinct('user_id', { event_type: 'c'});
db.my_collection.deleteMany({user_id: {$in: userIds}})
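To get a feel for how close 300K ids come to the 16 MB limit, here is a rough back-of-the-envelope estimate. It assumes the ids come back as a single BSON array of fixed-size values (8 bytes each, as for a double or a 64-bit integer); the function name estimateBsonArrayBytes is made up for this sketch and is not part of any MongoDB API:

```javascript
// Rough size of a BSON array holding `count` values of `valueBytes` bytes each.
// Assumption: fixed-size values (8 bytes for a double or 64-bit integer).
function estimateBsonArrayBytes(count, valueBytes) {
  let total = 4 + 1; // int32 length prefix + trailing null byte of the array
  for (let i = 0; i < count; i++) {
    // Each element: 1 type byte, the index as a null-terminated string key,
    // then the value itself.
    total += 1 + String(i).length + 1 + valueBytes;
  }
  return total;
}

const limit = 16 * 1024 * 1024; // 16 MB BSON document limit
const bytes = estimateBsonArrayBytes(300000, 8);
console.log(bytes + ' bytes, ' + (bytes / limit * 100).toFixed(1) + '% of the limit');
```

Under these assumptions, numeric ids stay well below the limit. String ids cost more per element (a length prefix plus the characters), so the same number of long string ids would land noticeably closer to it.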

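If you expect the scale to grow, or if this fails in your tests, then the best way is something similar to your approach, just with much larger batches. For example: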

// Each pass also deletes the matched 'c' documents themselves,
// so the next find() returns a fresh set of users.
const batchSize = 100000;
const count = db.my_collection.countDocuments({event_type: 'c'});
let iteration = 0;
while (iteration * batchSize < count) {
    // In the mongo shell, the projection is the second argument of find().
    const batch = db.my_collection.find({event_type: 'c'}, {user_id: 1, _id: 0}).limit(batchSize).toArray();
    if (batch.length === 0) {
        break;
    }
    db.my_collection.deleteMany({user_id: {$in: batch.map(v => v.user_id)}});
    iteration++;
}
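As a variation on the same batching idea, the distinct user ids can also be streamed from an aggregation cursor and deleted in chunks, which avoids both the count query and the repeated find. This is only a sketch: deleteUsersWithEvent is a hypothetical helper, and it assumes `collection` exposes the mongo shell's aggregate() and deleteMany() (for example db.my_collection):

```javascript
// Hypothetical helper: `collection` is any object with mongo-shell-style
// aggregate() and deleteMany(), e.g. db.my_collection.
function deleteUsersWithEvent(collection, eventType, batchSize) {
  // One document per distinct user_id. The results are streamed through a
  // cursor, so only each individual document is subject to the 16 MB limit.
  const cursor = collection.aggregate([
    { $match: { event_type: eventType } },
    { $group: { _id: '$user_id' } },
  ]);

  let ids = [];
  let deleted = 0;

  // Delete everything belonging to the ids collected so far.
  const flush = () => {
    if (ids.length === 0) return;
    deleted += collection.deleteMany({ user_id: { $in: ids } }).deletedCount;
    ids = [];
  };

  while (cursor.hasNext()) {
    ids.push(cursor.next()._id);
    if (ids.length >= batchSize) flush();
  }
  flush(); // remaining partial batch

  return deleted; // total number of documents removed
}
```

In the mongo shell this could be called as deleteUsersWithEvent(db.my_collection, 'c', 1000). The batch size is a tuning knob to experiment with, not a recommended value.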


