Sorting in Elasticsearch by values in an array

Posted by ryoqjall on 2022-11-22 in ElasticSearch

Every record I have in Elasticsearch contains an array of objects that looks like this:

{
  "counts_by_year": [
    {
      "year": 2022,
      "works_count": 22523,
      "cited_by_count": 18054
    },
    {
      "year": 2021,
      "works_count": 32059,
      "cited_by_count": 24817
    },
    {
      "year": 2020,
      "works_count": 27210,
      "cited_by_count": 30238
    },
    {
      "year": 2019,
      "works_count": 22592,
      "cited_by_count": 33631
    }
  ]
}

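What I want to do is sort the records by the average of works_count for the years 2022 and 2021. Can I use a script-based sort in this case? Or should I copy these values into a separate field and sort on that field instead?
Edit: the mapping is: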

{
  "mappings": {
    "_doc": {
      "properties": {
        "@timestamp": {
          "type": "date"
        },
        .
        .
        .
        "counts_by_year": {
          "properties": {
            "cited_by_count": {
              "type": "integer"
            },
            "works_count": {
              "type": "integer"
            },
            "year": {
              "type": "integer"
            }
          }
        },
        .
        .
        .
      }
    }
  }
}

f45qwnt8 #1

TL;DR

It depends. Most likely yes, a script-based sort will work, unless counts_by_year is a nested field.

Solution

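Something along these lines should do the trick. Note that with a plain object mapping this script averages works_count over every year in the array, because the link between each works_count and its year is not available to the script.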

GET /_search
{
  "sort": {
    "_script": {
      "type": "number",
      "script": {
        "lang": "painless",
        "source": "doc['counts_by_year.works_count'].stream().mapToLong(x -> x).average().orElse(0);"
      }
    }
  }
}
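
To see why the script above cannot restrict itself to 2021 and 2022, here is a small Python sketch of how a non-nested object array ends up as one multi-value list per leaf field. The helper `flatten_object_array` is hypothetical and only imitates the indexing behaviour for illustration; it is not an Elasticsearch API.

```python
# Sample document from the question.
doc = {
    "counts_by_year": [
        {"year": 2022, "works_count": 22523, "cited_by_count": 18054},
        {"year": 2021, "works_count": 32059, "cited_by_count": 24817},
        {"year": 2020, "works_count": 27210, "cited_by_count": 30238},
        {"year": 2019, "works_count": 22592, "cited_by_count": 33631},
    ]
}


def flatten_object_array(source, field):
    """Hypothetical helper: imitate a non-nested object array, where each
    leaf field becomes an independent, sorted list of values."""
    flat = {}
    for entry in source[field]:
        for key, value in entry.items():
            flat.setdefault(f"{field}.{key}", []).append(value)
    return {key: sorted(values) for key, values in flat.items()}


flat = flatten_object_array(doc, "counts_by_year")

# What a doc-values script can compute: the average over all years.
works = flat["counts_by_year.works_count"]
avg_all_years = sum(works) / len(works)

# What the question asks for: the average over 2021 and 2022 only,
# which needs the original objects with year and works_count together.
recent = [c["works_count"] for c in doc["counts_by_year"] if c["year"] in (2021, 2022)]
avg_2021_2022 = sum(recent) / len(recent)

print(avg_all_years, avg_2021_2022)
```

For the sample document the two averages differ, so the flattened version only answers the question if averaging over all years is acceptable.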

Solution (nested field)

PUT 74404793-2
{
  "mappings": {
      "properties": {
        "counts_by_year": {
          "type": "nested", 
          "properties": {
            "cited_by_count": {
              "type": "long"
            },
            "works_count": {
              "type": "long"
            },
            "year": {
              "type": "long"
            }
          }
        }
      }
    }
}

POST /74404793-2/_doc/
{
  "counts_by_year": [
    {
      "year": 2022,
      "works_count": 22523,
      "cited_by_count": 18054
    },
    {
      "year": 2021,
      "works_count": 32059,
      "cited_by_count": 24817
    },
    {
      "year": 2020,
      "works_count": 27210,
      "cited_by_count": 30238
    },
    {
      "year": 2019,
      "works_count": 22592,
      "cited_by_count": 33631
    }
  ]
}

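Here I access the document through _source, which keeps each year together with its works_count. If your documents are large, this can seriously hurt performance.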

GET 74404793-2/_search
{
  "sort": {
    "_script": {
      "type": "number",
      "script": {
        "lang": "painless",
        "source": """
        params._source['counts_by_year']
        .stream()
        .filter(x -> x['year'] > 2020)
        .mapToLong(x -> x['works_count'])
        .average().orElse(0);"""
      }
    }
  }
}
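
If reading _source at query time turns out to be too slow, the other option raised in the question is to compute the average once, before indexing, and sort on a regular numeric field. Below is a minimal Python sketch of that idea. The function `add_recent_average` and the field name `avg_works_count_recent` are assumptions made for this example and appear nowhere in the original mapping; the client call that would send the document to Elasticsearch is left out.

```python
def add_recent_average(doc, years=(2021, 2022), target="avg_works_count_recent"):
    """Return a copy of doc with the average works_count of the given years
    stored under a hypothetical top-level field."""
    counts = [
        entry["works_count"]
        for entry in doc.get("counts_by_year", [])
        if entry["year"] in years
    ]
    enriched = dict(doc)  # shallow copy, the input document is left untouched
    enriched[target] = sum(counts) / len(counts) if counts else 0.0
    return enriched


doc = {
    "counts_by_year": [
        {"year": 2022, "works_count": 22523, "cited_by_count": 18054},
        {"year": 2021, "works_count": 32059, "cited_by_count": 24817},
        {"year": 2020, "works_count": 27210, "cited_by_count": 30238},
        {"year": 2019, "works_count": 22592, "cited_by_count": 33631},
    ]
}

enriched = add_recent_average(doc)

# Search body that sorts on the precomputed field instead of running a script.
sort_body = {"sort": [{"avg_works_count_recent": {"order": "desc"}}]}

print(enriched["avg_works_count_recent"])
```

The trade-off is that the stored value has to be recomputed whenever counts_by_year changes or the set of years of interest moves on, whereas the script-based sort always reflects the current document.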
