Conditionally filtering an array in an Elasticsearch aggregation

vbopmzt1 posted on 2023-01-01 in ElasticSearch

My data looks like this:

[
    {
        "name": "Scott",
        "origin": "London",
        "travel": [
            {
                "active": false,
                "city": "Berlin",
                "visited": "2020-02-01"
            },
            {
                "active": true,
                "city": "Prague",
                "visited": "2020-02-15"
            }
        ]
    },
    {
        "name": "Lilly",
        "origin": "London",
        "travel": [
            {
                "active": true,
                "city": "Scotland",
                "visited": "2020-02-01"
            }
        ]
    }
]

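I want to run an aggregation where each top-level origin is a bucket, followed by a sub-aggregation that shows how many people are currently visiting each city, so I only care about the city *if* active is true.
A filter searches the travel array and returns the whole array (both objects) if either object has active set to true, but I don't want to include cities where active is false.
Expected output: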

{
  "aggregations": {
    "origin": {
      "buckets": [
        {
          "key": "London",
          "buckets": [
            {
              "key": "travel",
              "doc_count": 2555,
              "buckets": [
                {
                  "key": "Scotland",
                  "doc_count": 1
                },
                {
                  "key": "Prague",
                  "doc_count": 1
                }
              ]
            }
          ]
        }
      ]
    }
  }
}

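Above, there are only two counts under the travel aggregation, because only two travel objects have active set to true.
Currently, my aggregation is set up as follows: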

{
  "from": 0,
  "aggs": {
    "origin": {
      "terms": {
        "field": "origin"
      },
      "aggs": {
        "travel": {
          "filter": {
            "term": {
              "travel.active": true
            }
          },
          "aggs": {
            "city": {
              "terms": {
                "field": "travel.city"
              }
            }
          }
        }
      }
    }
  }
}

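I have a top-level aggregation on origin, then a sub-aggregation on the travel array with a filter on travel.active = true, and then another sub-aggregation that creates a bucket for each city.
In my aggregation, Berlin still shows up as a city, even though I filter on active = true.
My guess is that this happens because active: true holds for one of the objects in the array.
How can I completely filter out the entries with active: false from the aggregation?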
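The guess above matches how Elasticsearch handles an array of objects under the default `object` mapping: the array is flattened into multi-valued fields such as `travel.active` and `travel.city`, so the pairing between each `active` flag and its `city` is lost. The Python sketch below is a simplified in-memory model of that behavior, not Elasticsearch code; the helpers `flatten` and `object_mapping_agg` are hypothetical, and the `visited` dates are omitted.

```python
from collections import Counter

# Sample documents from the question ("visited" dates omitted for brevity).
DOCS = [
    {"name": "Scott", "origin": "London", "travel": [
        {"active": False, "city": "Berlin"},
        {"active": True, "city": "Prague"},
    ]},
    {"name": "Lilly", "origin": "London", "travel": [
        {"active": True, "city": "Scotland"},
    ]},
]


def flatten(doc):
    """Hypothetical model of object-mapping indexing: each sub-field becomes
    one multi-valued field, and the pairing between values is lost."""
    return {
        "origin": [doc["origin"]],
        "travel.active": [entry["active"] for entry in doc["travel"]],
        "travel.city": [entry["city"] for entry in doc["travel"]],
    }


def object_mapping_agg(docs):
    """Model of terms(origin) -> filter(travel.active: true) -> terms(travel.city)
    on flattened documents."""
    result = {}
    for doc in map(flatten, docs):
        # The term filter matches the whole document if ANY active value is true.
        if True not in doc["travel.active"]:
            continue
        for origin in doc["origin"]:
            counts = result.setdefault(origin, Counter())
            # A terms aggregation counts each distinct value once per document,
            # so every city of a matching document is counted, Berlin included.
            counts.update(set(doc["travel.city"]))
    return result


print(object_mapping_agg(DOCS))
```

Because Scott's document contains at least one `true` in `travel.active`, the whole document passes the filter and all of its cities reach the terms aggregation.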


3df52oht1#

You have to use a **nested aggregation**; see the official Elasticsearch documentation for reference.
Here is an example query:

Mapping:

PUT /city_index
{
  "mappings": {
    "properties": {
      "name" : { "type" : "keyword" },
      "origin" : { "type" : "keyword" },
      "travel": { 
        "type": "nested",
        "properties": {
          "active": {
            "type": "boolean"
          },
          "city": {
            "type": "keyword"
          },
          "visited" : {
            "type":"date"
          }
        }
      }
    }
  }
}

Index the documents:

PUT /city_index/_doc/1
{
  "name": "Scott", 
  "origin" : "London",
  "travel": [
    {
      "active": false,
      "city": "Berlin",
      "visited" : "2020-02-01"
    },
    {
      "active": true,
      "city": "Prague",
      "visited": "2020-02-15"
    }
  ]
}

PUT /city_index/_doc/2
{
  "name": "Lilly",
  "origin": "London",
  "travel": [
    {
      "active": true,
      "city": "Scotland",
      "visited": "2020-02-01"
    }
  ]
}

Query:

GET /city_index/_search
{
  "size": 0,
  "aggs": {
    "origin": {
      "terms": {
        "field": "origin"
      },
      "aggs": {
        "city": {
          "nested": {
            "path": "travel"
          },
          "aggs": {
            "travel": {
              "filter": {
                "term": {
                  "travel.active": true
                }
              },
              "aggs": {
                "city": {
                  "terms": {
                    "field": "travel.city"
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}

Output:

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 2,
      "relation": "eq"
    },
    "max_score": null,
    "hits": []
  },
  "aggregations": {
    "origin": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "London",
          "doc_count": 2,
          "city": {
            "doc_count": 3,
            "travel": {
              "doc_count": 2,
              "city": {
                "doc_count_error_upper_bound": 0,
                "sum_other_doc_count": 0,
                "buckets": [
                  {
                    "key": "Prague",
                    "doc_count": 1
                  },
                  {
                    "key": "Scotland",
                    "doc_count": 1
                  }
                ]
              }
            }
          }
        }
      ]
    }
  }
}
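With `"type": "nested"`, each `travel` entry is indexed as its own hidden document, so the `filter` inside the `nested` aggregation tests `active` and `city` of one entry at a time. The Python sketch below is a simplified in-memory model of the query above (the helper `nested_filtered_agg` is hypothetical, not an Elasticsearch API) that reproduces the counts in this output.

```python
from collections import Counter

# Sample documents from the answer ("visited" dates omitted for brevity).
DOCS = [
    {"name": "Scott", "origin": "London", "travel": [
        {"active": False, "city": "Berlin"},
        {"active": True, "city": "Prague"},
    ]},
    {"name": "Lilly", "origin": "London", "travel": [
        {"active": True, "city": "Scotland"},
    ]},
]


def nested_filtered_agg(docs):
    """Hypothetical model of terms(origin) -> nested(travel)
    -> filter(travel.active: true) -> terms(travel.city)."""
    result = {}
    for doc in docs:
        bucket = result.setdefault(doc["origin"], {
            "doc_count": 0,          # root documents in the origin bucket
            "nested_doc_count": 0,   # travel entries seen by the nested agg
            "filter_doc_count": 0,   # travel entries with active == true
            "cities": Counter(),     # terms buckets on travel.city
        })
        bucket["doc_count"] += 1
        # With a nested mapping, each travel entry is its own hidden document,
        # so the filter sees "active" and "city" of one entry at a time.
        for entry in doc["travel"]:
            bucket["nested_doc_count"] += 1
            if entry["active"] is True:
                bucket["filter_doc_count"] += 1
                bucket["cities"][entry["city"]] += 1
    return result


print(nested_filtered_agg(DOCS))
```

Here `doc_count` counts root documents (2), `nested_doc_count` counts the travel entries seen by the `nested` aggregation (3), and `filter_doc_count` counts the active entries (2), matching the `origin`, `city` and `travel` levels of the response.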

4dbbbstv2#

@karthick's suggestion is good, but I also added the filter to the query, so that fewer documents reach the aggregation stage.

GET idx_travel/_search
{
  "size": 0,
  "query": {
    "nested": {
      "path": "travel",
      "query": {
        "term": {
          "travel.active": {
            "value": true
          }
        }
      }
    }
  },
  "aggs": {
    "origin": {
      "terms": {
        "field": "origin"
      },
      "aggs": {
        "city": {
          "nested": {
            "path": "travel"
          },
          "aggs": {
            "travel": {
              "filter": {
                "term": {
                  "travel.active": true
                }
              },
              "aggs": {
                "city": {
                  "terms": {
                    "field": "travel.city"
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}
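A nested query on `travel.active` selects root documents, not individual travel entries: Scott's document matches because of Prague and is returned with all of its travel entries, Berlin included. The Python sketch below is a simplified in-memory model (the helpers `query_matches` and `city_counts` are hypothetical, not an Elasticsearch API) comparing the nested aggregation with and without a `filter` aggregation on `travel.active` inside it.

```python
from collections import Counter

# Sample documents from the question ("visited" dates omitted for brevity).
DOCS = [
    {"name": "Scott", "origin": "London", "travel": [
        {"active": False, "city": "Berlin"},
        {"active": True, "city": "Prague"},
    ]},
    {"name": "Lilly", "origin": "London", "travel": [
        {"active": True, "city": "Scotland"},
    ]},
]


def query_matches(doc):
    """Model of a nested query on travel.active: true. The ROOT document
    matches if at least one of its travel entries is active."""
    return any(entry["active"] is True for entry in doc["travel"])


def city_counts(docs, filter_in_agg):
    """Model of terms(origin) -> nested(travel) [-> filter(travel.active: true)]
    -> terms(travel.city), run only over documents that matched the query."""
    result = {}
    for doc in filter(query_matches, docs):
        counts = result.setdefault(doc["origin"], Counter())
        # The nested aggregation visits ALL travel entries of a matching
        # document, not only the entries that made the query match.
        for entry in doc["travel"]:
            if filter_in_agg and entry["active"] is not True:
                continue
            counts[entry["city"]] += 1
    return result


print(city_counts(DOCS, filter_in_agg=False))  # Berlin is still counted
print(city_counts(DOCS, filter_in_agg=True))   # only Prague and Scotland
```

The query-level filter can still help by pruning documents that have no active trips at all; it cannot replace the per-entry filter inside the nested aggregation.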
