Why can a MongoDB time series collection contain duplicate _id values?

pn9klfpd  posted on 2023-01-08  in Go

I just realized that two or more documents in a MongoDB time series collection can have the same _id.
Is this normal?

ohtdti5x  #1

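I could not find this stated explicitly in the documentation, but no unique index on the _id field is created automatically for a time series collection.
Time series collections behave like normal collections. You can insert and query data as usual. MongoDB treats a time series collection as a writable non-materialized view backed by an internal collection. When you insert data, the internal collection automatically organizes the time series data into an optimized storage format. When you create a time series collection, MongoDB automatically creates an internal clustered index on the time field.
Example: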

db.createCollection(
    "weather",
    {
       timeseries: {
          timeField: "timestamp",
          metaField: "metadata",
          granularity: "hours"
       }
    }
)
    
db.weather.insertMany( [
   {
      "metadata": "temperature",
      "timestamp": ISODate("2022-05-18T00:00:00.000Z"),
      "temp": 12
   },
   {
      "metadata": "temperature",
      "timestamp": ISODate("2022-05-18T02:00:00.000Z"),
      "temp": 11
   },
   {
      "metadata": "temperature",
      "timestamp": ISODate("2022-05-18T04:00:00.000Z"),
      "temp": 9
   }
])

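Now, if you query the system.buckets.weather collection, you can see that the three measurements are stored in a single bucket document, and that their _id values are just another data column: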

db.getCollection('system.buckets.weather').find({})

{
    "_id" : ObjectId("62843700f921421b34e56d1f"),
    "control" : {
        "version" : 1,
        "min" : {
            "_id" : ObjectId("63b7a7460a8571fbefcb480b"),
            "timestamp" : ISODate("2022-05-17T17:30:00.000-06:30"),
            "temp" : 9.0
        },
        "max" : {
            "_id" : ObjectId("63b7a7460a8571fbefcb480d"),
            "timestamp" : ISODate("2022-05-17T21:30:00.000-06:30"),
            "temp" : 12.0
        }
    },
    "meta" : "temperature",
    "data" : {
        "timestamp" : {
            "0" : ISODate("2022-05-17T17:30:00.000-06:30"),
            "1" : ISODate("2022-05-17T19:30:00.000-06:30"),
            "2" : ISODate("2022-05-17T21:30:00.000-06:30")
        },
        "_id" : {
            "0" : ObjectId("63b7a7460a8571fbefcb480b"),
            "1" : ObjectId("63b7a7460a8571fbefcb480c"),
            "2" : ObjectId("63b7a7460a8571fbefcb480d")
        },
        "temp" : {
            "0" : 12.0,
            "1" : 11.0,
            "2" : 9.0
        }
    }
}
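
To make the bucket layout above easier to read, here is a minimal sketch in plain JavaScript that rebuilds the individual measurements from one bucket document. It is only an illustration, not how the server does it: unpackBucket is a hypothetical helper, plain strings stand in for ObjectId and ISODate so that it runs outside mongosh, the field names "timestamp" and "metadata" are taken from the example, and it assumes the uncompressed layout shown above (control.version 1).

```javascript
// Hypothetical helper: turn one bucket document back into per-measurement documents.
// Assumes the uncompressed column layout shown above (control.version 1).
function unpackBucket(bucket) {
  const columns = bucket.data;
  // Every column shares the same row keys ("0", "1", ...); the time column is used to enumerate them.
  const rowKeys = Object.keys(columns.timestamp);
  return rowKeys.map((key) => {
    // The bucket-level "meta" value becomes the metaField of every measurement.
    const doc = { metadata: bucket.meta };
    for (const [field, values] of Object.entries(columns)) {
      doc[field] = values[key];
    }
    return doc;
  });
}

// Strings stand in for ObjectId and ISODate values.
const bucket = {
  _id: "62843700f921421b34e56d1f",
  meta: "temperature",
  data: {
    timestamp: { 0: "2022-05-18T00:00:00Z", 1: "2022-05-18T02:00:00Z", 2: "2022-05-18T04:00:00Z" },
    _id: { 0: "63b7a7460a8571fbefcb480b", 1: "63b7a7460a8571fbefcb480c", 2: "63b7a7460a8571fbefcb480d" },
    temp: { 0: 12, 1: 11, 2: 9 },
  },
};

const measurements = unpackBucket(bucket);
console.log(measurements);
```

Note that only the bucket document has a top-level _id. The _id of each measurement lives inside data._id next to the other columns, so nothing in this layout forces those values to be distinct.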

This comment explains it as follows:
The primary key index of a Time Series collection is an automatically created clustered index on a server generated unique _id value for a group of documents with a unique metaField for a time span. This index and value can be seen in the corresponding system.buckets.foo collection. The _id of the document cannot currently be indexed and cannot be the primary key index for a Time Series collection like a regular collection.
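
Since the server does not enforce uniqueness of the document _id here, one option is to check for duplicates on the application side. The following is a minimal sketch under that assumption: findDuplicateIds is a hypothetical helper, it works on an array of documents that has already been fetched, and short strings stand in for real ObjectId values.

```javascript
// Hypothetical helper: report _id values that occur more than once in an array of documents.
function findDuplicateIds(docs) {
  const counts = new Map();
  for (const doc of docs) {
    // String() gives a comparable key; assumes distinct _id values stringify differently.
    const key = String(doc._id);
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  // Keep only the ids that were seen at least twice.
  return [...counts].filter(([, n]) => n > 1).map(([id]) => id);
}

// Strings stand in for ObjectId values.
const docs = [
  { _id: "a1", temp: 12 },
  { _id: "a2", temp: 11 },
  { _id: "a1", temp: 9 },
];

const duplicates = findDuplicateIds(docs);
console.log(duplicates);
```

For a large collection you would probably not want to load every document into memory like this; the sketch only shows the idea.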
