mongoose 优化聚合管道

ehxuflar  于 2022-11-11  发布在  Go
关注(0)|答案(1)|浏览(181)

我有一个mongo聚合管道,同时运行类似下面的代码

{
                $match: {
                    "cohortId": payload.cohortId, "template": req.query.template
                },
            }, {
                $lookup: {
                    from: "messages", localField: "messageId", foreignField: "metaData.messageId", as: "message"
                }
            }, {
                $lookup: {
                    from: "webhook", localField: "messageId", foreignField: "messageId", as: "webhook"
                }
            }, {
                $unwind: {
                    path: "$webhook"
                }
            }, {
                $unwind: {
                    path: "$message"
                }
            },

            {
                $project: {
                    "_id": 0,
                    "messageId": 1,
                    "cohortId": 1,
                    "template": 1,
                    "origin": 1,
                    "WBA_AccountId": 1,
                    "WBA_PhoneId": 1,
                    "clientPhone": "$phone",
                    "messageDirection": "$message.metaData.direction",
                    "messageTime": "$message.metaData.time",
                    "messageSent": "$webhook.status.sentFlag",
                    "messageDelivered": "$webhook.status.deliveredFlag",
                    "messageRead": "$webhook.status.readFlag",
                    "messageFailed": "$webhook.status.failedFlag",
                    "messageFailedReason": "$webhook.status.failedReason",
                    "messageSentTime": "$webhook.status.sentTimestamp",
                    "messageDeliveredTime": "$webhook.status.deliveredTimestamp",
                    "messageReadTime": "$webhook.status.readTimestamp",
                    "messageFailedTime": "$webhook.status.failedTimestamp"
                }
            },

其中Web挂接中的数据为

{
  "_id": {
    "$oid": "6374d59618bff45fa34f08f5"
  },
  "messageId": "<Message ID as String>",
  "conversationExpiry": 1668687660,
  "conversationId": "<Conversation ID as String>",
  "billableFlag": "true",
  "WBA_PhoneId": 1234567890, //masked
  "WBA_AccountId": 1234567890, //masked
  "WBA_DisplayPhone": 1234567890, //masked
  "phone": 1234567890, //masked
  "status": {
    "sentFlag": true,
    "sentTimestamp": 1668601237,
    "deliveredFlag": true,
    "readFlag": true,
    "failedFlag": false,
    "deliveredTimestamp": 1668601238,
    "readTimestamp": 1668601250,
    "failedTimestamp": 0,
    "errorMessage": "",
    "errorCode": ""
  },
  "createdAt": {
    "$date": {
      "$numberLong": "1668601238086"
    }
  },
  "updatedAt": {
    "$date": {
      "$numberLong": "1668601250918"
    }
  }
}

受众数据看起来像

{
  "_id": {
    "$oid": "635a840b97405992d3cb794d"
  },
  "WBA_AccountId": 1234567890, //masked
  "WBA_PhoneId": 1234567890, //masked
  "messageId": "<Message ID as String>",
  "phone": 1234567890, //masked
  "cohortId": "<String Value, I refer this in Aggregation Pipeline to trigger it>",
  "createdAt": {
    "$date": {
      "$numberLong": "1666876427333"
    }
  },
  "end": "2022-10-27",
  "origin": "Clevertap_API_Campaigns",
  "start": "2022-10-27",
  "template": "<String Value, I refer this in Aggregation Pipeline to trigger it>",
  "updatedAt": {
    "$date": {
      "$numberLong": "1666876427333"
    }
  }
}

问题

当我点击控制器触发管道时,使用队列和模板,迭代6000多个文档需要3分钟,网络调用超时。
如何优化管道?
正在使用M10 Atlas集群
编辑:添加实时指标Cluster Usage的屏幕截图

hfwmuf9z

hfwmuf9z1#

在此聚合管道中有3次使用索引的机会,但当前数据库根本无法这样做。
第一个机会是audiances集合上的初始$match。有两个字段要筛选,cohortIdtemplate。这两个字段都有相等 predicate ,因此索引键可以按任意顺序排列:

db.getSiblingDB('Production').audiances.createIndex({ cohortId:1, template:1 })

类似地,数据库可以尝试为两个$lookup操作使用索引。

{
                "$lookup": 
                ...
                "totalDocsExamined": 97236382,
                "collectionScans": 9908,
                "indexesUsed": [],
            },
            {
                "$lookup": 
                ...
                "totalDocsExamined": 95408871,
                "collectionScans": 9906,
                "indexesUsed": [],
            },

您在评论中提到您“* 在系统中有一个类似这样的索引 ”,并且您“ 尝试在metaData.messageId上创建一个text索引 *"。后一个索引不合适,因为您没有执行a $text search,所以它不能使用。屏幕截图中显示的第一个索引是{ phone: 1, metadata.messageid: 1 }。由于 first 键(phone)不是查询($lookup)的一部分。
由于$lookup将进行直接比较,因此以下两个索引将(极大地)提高操作的性能和效率:

db.getSiblingDB('Production').messages.createIndex({ "metaData.messageId": 1 })
db.getSiblingDB('Production').webhook.createIndex ({ "messageId": 1 })

相关问题