Elasticsearch query to return the first event of each day

Posted by thtygnil on 2023-10-17 in ElasticSearch

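Suppose I have an Elasticsearch index containing documents that represent events, each with a startTimestamp and an endTimestamp. For each day in October, I want to know:

  • how many events overlap that day
  • which of the events overlapping that day starts earliest

Is there a more efficient way to do this than running 31 separate queries (one per day in October), even when each day has so many events that paging through them is impractical?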
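To pin down what "overlaps" means here, the following is a minimal brute-force sketch in Python that computes both quantities in memory. It is only a reference for checking results on small data, not a substitute for the Elasticsearch query. It assumes UTC days, an inclusive endTimestamp, and events held as dicts with datetime values; the function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def overlaps_day(start, end, day_start):
    """True if the interval [start, end] intersects the day beginning at day_start."""
    day_end = day_start + timedelta(days=1)
    return start < day_end and end >= day_start


def per_day_summary(events, first_day, num_days):
    """For each day, return (count, earliest-starting overlapping event or None)."""
    summary = {}
    for i in range(num_days):
        day_start = first_day + timedelta(days=i)
        hits = [
            e for e in events
            if overlaps_day(e["startTimestamp"], e["endTimestamp"], day_start)
        ]
        # Earliest start among the events that touch this day, if any.
        first = min(hits, key=lambda e: e["startTimestamp"]) if hits else None
        summary[day_start.date().isoformat()] = (len(hits), first)
    return summary
```

Calling `per_day_summary(events, datetime(2023, 10, 1, tzinfo=timezone.utc), 31)` yields one entry per day of October, keyed by ISO date.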

smdncfj3 #1

If you are willing to change the schema slightly, this is easy to do in a single query using a date_range field and a date histogram.

DELETE test
PUT test
{
  "mappings": {
    "properties": {
      "event_range": {
        "type": "date_range"
      }
    }
  }
}

PUT test/_bulk?refresh
{"index": {}}
{"event_range":{ "gte":  "2023-10-01T10:00:10Z", "lte": "2023-10-01T10:00:11Z" }}
{"index": {}}
{"event_range":{ "gte":  "2023-10-02T00:00:10Z", "lte": "2023-10-04T22:59:59Z" }}
{"index": {}}
{"event_range":{ "gte":  "2023-10-03T00:00:10Z", "lte": "2023-10-03T10:59:59Z" }}
{"index": {}}
{"event_range":{ "gte":  "2023-10-06T00:00:10Z", "lte": "2023-11-05T22:59:59Z" }}
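The bulk request above indexes ranges directly. Existing documents that carry startTimestamp and endTimestamp (the field names from the question) would need to be converted first. A minimal Python sketch of that conversion follows. It assumes the timestamps are already strings in a format the date_range mapping accepts and that endTimestamp is inclusive; the helper names `to_event_range` and `bulk_lines` are hypothetical.

```python
import json


def to_event_range(doc):
    """Return a copy of doc with startTimestamp/endTimestamp folded into event_range."""
    converted = {
        k: v for k, v in doc.items()
        if k not in ("startTimestamp", "endTimestamp")
    }
    converted["event_range"] = {
        "gte": doc["startTimestamp"],
        "lte": doc["endTimestamp"],  # assumption: the end timestamp is inclusive
    }
    return converted


def bulk_lines(docs):
    """Build an NDJSON body for the _bulk endpoint (one action line per document)."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {}}))
        lines.append(json.dumps(to_event_range(doc)))
    # The bulk body has to end with a newline.
    return "\n".join(lines) + "\n"
```

The resulting string can be sent as the body of the same `test/_bulk` request.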

GET test/_search
{
  "size": 0, 
  "query": {
    "range": {
      "event_range": {
        "gte": "2023-10-01",
        "lt": "2023-11-01"
      }
    }
  },
  "aggs": {
    "by_day": {
      "date_histogram": {
        "field": "event_range",
        "calendar_interval": "1d",
        "hard_bounds": {
          "min": "2023-10-01",
          "max": "2023-10-31"
        }
      },
      "aggs": {
        "first_event": {
          "top_hits": {
            "size": 1,
            "sort": {"event_range": {"order": "asc", "mode": "min"}}
          }
        }
      }
    }
  }
}

For each day in October, the search result looks like this:

        {
          "key_as_string": "2023-10-03T00:00:00.000Z",  <---- This is the day
          "key": 1696291200000,
          "doc_count": 2,  <---- Number of events intersecting with this day
          "first_event": {
            "hits": {
              "total": {
                "value": 2,
                "relation": "eq"
              },
              "max_score": null,
              "hits": [
                {
                  "_index": "test",
                  "_id": "q0C744oBImXKNSrQtMis",
                  "_score": null,
                  "_source": {   <---- The source of the earliest event
                    "event_range": {
                      "gte": "2023-10-02T00:00:10Z",
                      "lte": "2023-10-04T22:59:59Z"
                    }
                  },
                  "sort": [
                    "AamK7a9nEKmK/OthmA=="
                  ]
                }
              ]
            }
          }
        }
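To turn buckets like the one above into a per-day summary on the client, something along these lines may work. It is a sketch that assumes the response has already been parsed into a Python dict and that the aggregation names match the query (`by_day`, `first_event`); the function name is hypothetical.

```python
def summarize_buckets(response):
    """Map each day (YYYY-MM-DD) to (doc_count, _source of the first hit or None)."""
    result = {}
    for bucket in response["aggregations"]["by_day"]["buckets"]:
        hits = bucket["first_event"]["hits"]["hits"]
        # Days with no overlapping events come back with an empty hits list.
        first = hits[0]["_source"] if hits else None
        day = bucket["key_as_string"][:10]  # keep only the date part
        result[day] = (bucket["doc_count"], first)
    return result
```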

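You can achieve something similar with a filters aggregation instead of the date histogram, but that requires you to write all 31 filters by hand to check for intersection. The rest is much the same.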
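The 31 filters do not have to be typed out; they can be generated. Below is a minimal Python sketch that builds the `aggs` part of such a request as a dict, with one named range filter per day and the same `top_hits` sub-aggregation as in the query above. The function name and the choice of ISO dates as filter names are assumptions made for illustration.

```python
from datetime import date, timedelta


def daily_filters_agg(first_day, num_days, field="event_range"):
    """Build a filters aggregation with one intersecting range filter per day."""
    filters = {}
    for i in range(num_days):
        day = first_day + timedelta(days=i)
        next_day = day + timedelta(days=1)
        filters[day.isoformat()] = {
            "range": {
                field: {
                    "gte": day.isoformat(),
                    "lt": next_day.isoformat(),
                    # "intersects" is the default relation; spelled out for clarity.
                    "relation": "intersects",
                }
            }
        }
    return {
        "by_day": {
            "filters": {"filters": filters},
            "aggs": {
                "first_event": {
                    "top_hits": {
                        "size": 1,
                        "sort": {field: {"order": "asc", "mode": "min"}},
                    }
                }
            },
        }
    }
```

For October 2023, `daily_filters_agg(date(2023, 10, 1), 31)` produces the value for the `aggs` key of the search body; the `query` and `size` parts stay as in the original request.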
