mongoose Mongo聚合管道-在另一个集合中的多个查找

nafvub8i  于 11个月前  发布在  Go
关注(0)|答案(1)|浏览(98)

我在理解聚合管道时遇到了一些麻烦,当我与另一个集合进行看似复杂的匹配时,目标是获得一个特定用户在分析集合中没有video_impression条目的视频列表。
我的数据看起来像这样:

db={
  "videos": [
    {
      "_id": "1",
      "name": "1's Video",
      "status": "complete",
      "privacy": "public"
    },
    {
      "_id": "2",
      "name": "2's Video",
      "status": "complete",
      "privacy": "public"
    },
    {
      "_id": "3",
      "name": "3's Video",
      "status": "complete",
      "privacy": "public"
    },
    {
      "_id": "4",
      "name": "4's Video",
      "status": "complete",
      "privacy": "private"
    },
    {
      "_id": "5",
      "name": "5's Video",
      "status": "flagged",
      "privacy": "public"
    }
  ],
  "analytics": [
    {
      "_id": "1",
      "user": "1",
      "event": "video_impression",
      "data": {
        "video": "1"
      }
    },
    {
      "_id": "2",
      "user": "2",
      "event": "video_impression",
      "data": {
        "video": "2"
      }
    }
  ]
}

字符串
我已经设法得到一个匹配工作,但它的作品 “全球” 即。它不考虑到考虑用户ID,所以它的回馈文件,不匹配任何人。

db.videos.aggregate([
  {
    $match: {
      "status": "complete",
      "privacy": "public"
    }
  },
  {
    $lookup: {
      from: "analytics",
      localField: "_id",
      foreignField: "data.video",
      as: "matched_docs"
    }
  },
  {
    $match: {
      "matched_docs": {
        $eq: []
      }
    }
  }
])


我尝试在管道中添加另一个$lookup阶段来查找user字段,但似乎也不起作用,因为数据总是空的。我遇到的问题的Here's a Mongo Playground可能有助于进一步解释它。

h7wcgrx3

h7wcgrx31#

**1.**首先,从analytics集合运行这个聚合会比从videos集合运行更好。如果有users集合的话,使用users集合会更好。
**2.**根据jQueeny的评论,“analytics”集合示例有点不完整。我假设每个事件只存在一次,所以如果用户观看两个视频,analytics中将有两个条目,而不是只有一个带有视频数组的条目。PS。我建议您更改此设置,以便它是data中每个事件类型的对象id数组,每个用户都有单独的记录,甚至可以将它们全部合并为一个,这取决于您计划以后如何使用它。

anaylytics集合:

[
  { "_id": "1", "user": "1", "event": "video_impression", "data": { "video": "1" } },
  { "_id": "2", "user": "2", "event": "video_impression", "data": { "video": "2" } },
  { "_id": "3", "user": "2", "event": "video_impression", "data": { "video": "3" } },
  { "_id": "4", "user": "2", "event": "liked_video", "data": { "video": "2" } }
]

字符串

3.这里的策略是使用$lookup的流水线语法,通过一个Uncorrelated子查询获取所有的视频id,它的优点是只运行一次,然后使用该高速缓存:

  • MongoDB只需要在缓存查询之前运行一次$lookup子查询,因为源集合和外部集合之间没有关系。$lookup子查询不基于源集合中的任何值。此行为提高了此查询的后续执行性能。*

但是,如果视频集合太大,并且每个阶段的文档变得>100 MB,则此管道将失败,您将需要使用相关子查询。

**4.**这里使用的方法是:

a)使用analytics集合,过滤到仅video_impressions事件,按user_id分组,然后创建他们观看过/有印象的视频的set(唯一数组)。
B)使用查找将“公共+完整”视频的所有视频ID放入一个数组中
c)做所有视频和视频之间的差异与印象.

**5.**顺便说一句,如果你想一次只为一个用户做这件事,比如网页/FE,那么在第一个匹配阶段用event添加user: <user_id>

db.analytics.aggregate([
  {
    // select only the video_impression events
    $match: { event: "video_impression" }
  },
  {
    // first uniquify your users but you should
    // probably run this from the users collection
    $group: {
      _id: "$user",
      impressioned_vids: { "$addToSet": "$data.video" },
      // remove this if you want the user as _id
      user: { "$first": "$user" }
    }
  },
  { $project: { _id: 0 } },
  {
    // uncorrelated subquery which should only run once
    // and then is cached
    $lookup: {
      from: "videos",
      pipeline: [
        {
          $match: {
            status: "complete",
            privacy: "public"
          }
        },
        {
          $group: {
            _id: null,
            video_ids: { $push: "$_id" }
          }
        }
      ],
      as: "all_vids"
    }
  },
  {
    // put it conveniently into a single list
    $set: { all_vids: { $first: "$all_vids.video_ids" } }
  },
  {
    // these are the public-complete videos which that user has not seen
    $set: {
      unimpressed: {
        $setDifference: [ "$all_vids", "$impressioned_vids" ]
      }
    }
  },
  {
    // get rid of the other fields, uncomment to debug
    $project: {
      user: 1,
      unimpressed: 1
    }
  }
])


使用我修改后的analytics集合和您原始的videos集合,结果如下:

[
  {
    "unimpressed": ["1"],
    "user": "2"
  },
  {
    "unimpressed": ["2", "3"],
    "user": "1"
  }
]


Mongo Playground

备选方案2

如果all_vids的列表对于 *uncorrelated $lookup * 来说太大,那么它将需要一个correlated $lookup,它对每个文档执行一次。这需要将“seen videos”数组赋给let中的一个变量,然后在查找管道中使用它。

{
    // correlated subquery which executes per record
    $lookup: {
      from: "videos",
      let: { seen_vid_ids: "$impressioned_vids" },
      pipeline: [
        {
          $match: {
            status: "complete",
            privacy: "public",
            $expr: {
              $not: {
                $in: ["$_id", "$$seen_vid_ids"],
              },
            },
          }
        }
      ],
      as: "unseen_vids"
    }
  },
  {
    // these are the public-complete videos which that user has not seen
    $set: {
      unimpressed: "$unseen_vids._id"
    }
  },


Mongo Playground with the full aggregation

相关问题