elasticsearch: Optimize ES query with too many terms elements

Posted by oknwwptz on 2022-12-11 in ElasticSearch


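We are working with a dataset of billions of records. All of the data is currently stored in ElasticSearch, and all queries and aggregations are executed with ElasticSearch.
A simplified query body is shown below. We put the device IDs in terms clauses and combine those clauses with should to get around the 1024 limit on terms. The total number of terms elements reaches 100,000, and the query has become very slow.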

{
    "_source": {
        "excludes": [
            "raw_msg"
        ]
    },
    "query": {
        "bool": {
            "filter": {
                "bool": {
                    "must": [
                        {
                            "range": {
                                "create_ms": {
                                    "gte": 1664985600000,
                                    "lte": 1665071999999
                                }
                            }
                        }
                    ],
                    "should": [
                        {
                            "terms": {
                                "device_id": [
                                    "1328871",
                                    "1328899",
                                    "1328898",
                                    "1328934",
                                    "1328919",
                                    "1328976",
                                    "1328977",
                                    "1328879",
                                    "1328910",
                                    "1328902",
                                    ...       # more values; since terms does not support more than 1024 values, we concatenate all of them with should
                                ]
                            }
                        },
                        {
                            "terms": {
                                "device_id": [
                                    "1428871",
                                    "1428899",
                                    "1428898",
                                    "1428934",
                                    "1428919",
                                    "1428976",
                                    "1428977",
                                    "1428879",
                                    "1428910",
                                    "1428902",
                                    ...
                                ]
                            }
                        },
                        ...  # add more terms clauses until all of the 100,000 values are included
                    ],
                    "minimum_should_match": 1
                }
            }
        }
    },
    "aggs": {
        "create_ms": {
            "date_histogram": {
                "field": "create_ms",
                "interval": "hour"
            }
        }
    },
    "size": 0
}
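
For illustration, a request body of this shape could be generated as in the following minimal Python sketch. The helper names `chunked` and `build_chunked_query` are hypothetical, the chunk size of 1024 comes from the limit mentioned in the question, and wrapping the filter in a top-level `bool` query is an assumption about how the simplified body is meant to be nested.

```python
from typing import Dict, List, Sequence


def chunked(values: Sequence[str], size: int) -> List[List[str]]:
    """Split values into consecutive lists of at most `size` items."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [list(values[i:i + size]) for i in range(0, len(values), size)]


def build_chunked_query(device_ids: Sequence[str], gte_ms: int, lte_ms: int,
                        chunk_size: int = 1024) -> Dict:
    """Build a request body with one `terms` clause per chunk of device IDs."""
    # One terms clause per chunk, OR-ed together through `should`.
    should = [{"terms": {"device_id": chunk}}
              for chunk in chunked(device_ids, chunk_size)]
    return {
        "_source": {"excludes": ["raw_msg"]},
        "query": {
            "bool": {
                "filter": {
                    "bool": {
                        "must": [
                            {"range": {"create_ms": {"gte": gte_ms, "lte": lte_ms}}}
                        ],
                        "should": should,
                        "minimum_should_match": 1,
                    }
                }
            }
        },
        "aggs": {
            "create_ms": {
                "date_histogram": {"field": "create_ms", "interval": "hour"}
            }
        },
        "size": 0,
    }
```

With 100,000 IDs and a chunk size of 1024, this produces 98 `terms` clauses in a single request, which gives a sense of how large the body becomes.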

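My question: is there a way to optimize this case, or is there a better option for this kind of search?
Real-time or near-real-time results are a must. Other engines are also acceptable.
Simplified schema of the data: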

"id" : {
    "type" : "long"
},
"content" : {
    "type" : "text"
},
"device_id" : {
    "type" : "keyword"
},
"create_ms" : {
    "type" : "date"
},
... # more fields

elasticsearch

Source: https://stackoverflow.com/questions/74744149/optimize-es-query-with-too-many-terms-elements

1 Answer


rdrgkggo1#

You can use a terms query with a terms lookup to specify a larger list of values, as shown below.
Store your IDs in a specific document with the ID "device_ids".

"should": [
  {
    "terms": {
      "device_id": {
        "index": "your-index-name",
        "id": "device_ids",
        "path": "field-name"
      }
    }
  }
]
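
As a rough sketch of how this could be wired up, the Python below builds the lookup document bodies and the matching request body as plain dictionaries, without calling any client library. Everything named here is an assumption for illustration: the helper functions, the `ids` field used as `path`, the `device_ids_0`, `device_ids_1`, ... document IDs, and the top-level `bool` wrapper around the filter. One thing to keep in mind is that Elasticsearch by default caps a terms query at 65,536 terms (the `index.max_terms_count` setting), and that cap also covers terms fetched through a lookup, so 100,000 IDs would need either a raised setting or more than one lookup document. The sketch takes the second route.

```python
from typing import Dict, List, Sequence

# Default value of the index.max_terms_count index setting.
DEFAULT_MAX_TERMS = 65536


def build_lookup_docs(device_ids: Sequence[str],
                      max_terms: int = DEFAULT_MAX_TERMS) -> Dict[str, Dict[str, List[str]]]:
    """Map hypothetical document IDs to lookup document bodies.

    Each body stores at most `max_terms` IDs under the assumed field "ids".
    """
    if max_terms <= 0:
        raise ValueError("max_terms must be positive")
    docs = {}
    for n, start in enumerate(range(0, len(device_ids), max_terms)):
        docs[f"device_ids_{n}"] = {"ids": list(device_ids[start:start + max_terms])}
    return docs


def build_lookup_query(lookup_index: str, doc_ids: Sequence[str],
                       gte_ms: int, lte_ms: int, path: str = "ids") -> Dict:
    """Build a request body with one terms-lookup clause per lookup document."""
    should = [
        {"terms": {"device_id": {"index": lookup_index, "id": doc_id, "path": path}}}
        for doc_id in doc_ids
    ]
    return {
        "query": {
            "bool": {
                "filter": {
                    "bool": {
                        "must": [
                            {"range": {"create_ms": {"gte": gte_ms, "lte": lte_ms}}}
                        ],
                        "should": should,
                        "minimum_should_match": 1,
                    }
                }
            }
        },
        "size": 0,
    }
```

Each entry returned by `build_lookup_docs` would be indexed once under its document ID in the lookup index, and the request body then stays small no matter how many device IDs are involved. Whether this is actually faster than inlining the IDs depends on the cluster and should be measured.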

Answered on 2022-12-11
