How can I improve the score when there is an exact match in a list in Elasticsearch?

Posted by uqdfh47h on 2023-11-17 in ElasticSearch

I am new to Elasticsearch and have the following problem.
I have these two documents:

POST test/_doc/1
{
  "id": 1,
  "authors": [
    {
      "name": "Test Name",
      "url": "/url/1/"
    }
  ]
}

POST test/_doc/2
{
  "id": 2,
  "authors": [
    {
      "name": "Test Name",
      "url": "/url/1/"
    },
    {
      "name": "Another author",
      "url": "/url/another/"
    }
  ]
}

And this query:

GET test/_search
{
  "query": {
    "function_score": {
      "query": {
        "bool": {
          "should": [
            {
              "match_phrase": {
                "authors.name": {
                  "_name": "exact match in authors",
                  "query": "Test Name",
                  "boost": 100,
                  "slop": 1
                }
              }
            }
          ]
        }
      }
    }
  }
}

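Why is the score lower when a document has more than one author? How can I make it higher than, or at least equal to, the score of the document with a single author?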

{
  ...
  "hits": {
    "max_score": 42.221836,
    "hits": [
      {
        "_score": 42.221836,
        "_source": {
          "id": 1,
          "authors": [
            {
              "name": "Test Name",
              "url": "/url/1/"
            }
          ]
        },
        "matched_queries": [
          "exact match in authors"
        ]
      },
      {
        "_score": 32.088596,
        "_source": {
          "id": 2,
          "authors": [
            {
              "name": "Test Name",
              "url": "/url/1/"
            },
            {
              "name": "Another author",
              "url": "/url/another/"
            }
          ]
        },
        "matched_queries": [
          "exact match in authors"
        ]
      }
    ]
  }
}

I could not find anything about this in the documentation.
The details below are only here to make sure Stack Overflow does not show the following error: It looks like your post is mostly code; please add some more details.

elasticsearch

Source: https://stackoverflow.com/questions/77469343/how-to-improve-score-when-having-a-exact-match-in-a-list-on-elasticsearch

2 Answers

kqqjbcuj1#

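TLDR;

This happens because the authors.name field of your second document is longer. You may want to look at:

constant score
filter
function score

To understand

What does this mean?
When Elasticsearch indexes an array of objects, it stores them flattened, like this.
Initially: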

{
  "authors": [
    {
      "name": "A0"
    },
    {
      "name": "A1"
    }
  ]
}

Stored as:

{
  "authors.name": ["A0", "A1"]
}

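The document score is computed with BM25, a TF/IDF variant whose TF component is normalized by the length of the field, so the same match in a longer field scores lower.

In doc 1, authors.name has a length of 2
In doc 2, authors.name has a length of 4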
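
The flattening and the two field lengths can be reproduced with a small Python sketch. It is only an approximation: `flatten_field` and `field_length` are hypothetical helpers, and lowercasing plus whitespace splitting stands in for the real analyzer, which is an assumption.

```python
# Sketch only: whitespace splitting stands in for the real analyzer (an assumption).

def flatten_field(doc, array_field, sub_field):
    """Collect sub_field from every object in array_field, the way a
    non-nested array of objects ends up as one multi-valued field."""
    return [obj[sub_field] for obj in doc[array_field]]


def field_length(values):
    """Total number of tokens across all values of the flattened field."""
    return sum(len(value.lower().split()) for value in values)


# The two documents from the question.
doc1 = {"id": 1, "authors": [{"name": "Test Name", "url": "/url/1/"}]}
doc2 = {
    "id": 2,
    "authors": [
        {"name": "Test Name", "url": "/url/1/"},
        {"name": "Another author", "url": "/url/another/"},
    ],
}

lengths = [field_length(flatten_field(d, "authors", "name")) for d in (doc1, doc2)]
avgdl = sum(lengths) / len(lengths)  # average field length over both documents

print(lengths, avgdl)
```

Under these assumptions the lengths come out as 2 and 4 with an average of 3, which matches the field lengths listed above.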

Investigation:

You can use the _explain API:

GET 77469343/_explain/1
{
  "query": {
    "bool": {
      "should": [
        {
          "match_phrase": {
            "authors.name": {
              "_name": "exact match in authors",
              "query": "Test Name",
              "boost": 100,
              "slop": 1
            }
          }
        }
      ]
    }
  }
}

This gives you the following results:

Doc 1

{
  "_index": "77469343",
  "_id": "1",
  "matched": true,
  "explanation": {
    "value": 42.221836,
    "description": """weight(authors.name:"test name"~1 in 0) [PerFieldSimilarity], result of:""",
    "details": [
      {
        "value": 42.221836,
        "description": "score(freq=1.0), computed as boost * idf * tf from:",
        "details": [
          {
            "value": 220,
            "description": "boost",
            "details": []
          },
          {
            "value": 0.36464313,
            "description": "idf, sum of:",
            "details": [
              {
                "value": 0.18232156,
                "description": "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:",
                "details": [
                  {
                    "value": 2,
                    "description": "n, number of documents containing term",
                    "details": []
                  },
                  {
                    "value": 2,
                    "description": "N, total number of documents with field",
                    "details": []
                  }
                ]
              },
              {
                "value": 0.18232156,
                "description": "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:",
                "details": [
                  {
                    "value": 2,
                    "description": "n, number of documents containing term",
                    "details": []
                  },
                  {
                    "value": 2,
                    "description": "N, total number of documents with field",
                    "details": []
                  }
                ]
              }
            ]
          },
          {
            "value": 0.5263158,
            "description": "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:",
            "details": [
              {
                "value": 1,
                "description": "phraseFreq=1.0",
                "details": []
              },
              {
                "value": 1.2,
                "description": "k1, term saturation parameter",
                "details": []
              },
              {
                "value": 0.75,
                "description": "b, length normalization parameter",
                "details": []
              },
              {
                "value": 2,
                "description": "dl, length of field",
                "details": []
              },
              {
                "value": 3,
                "description": "avgdl, average length of field",
                "details": []
              }
            ]
          }
        ]
      }
    ]
  }
}

Doc 2

{
  "_index": "77469343",
  "_id": "2",
  "matched": true,
  "explanation": {
    "value": 32.088596,
    "description": """weight(authors.name:"test name"~1 in 1) [PerFieldSimilarity], result of:""",
    "details": [
      {
        "value": 32.088596,
        "description": "score(freq=1.0), computed as boost * idf * tf from:",
        "details": [
          {
            "value": 220,
            "description": "boost",
            "details": []
          },
          {
            "value": 0.36464313,
            "description": "idf, sum of:",
            "details": [
              {
                "value": 0.18232156,
                "description": "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:",
                "details": [
                  {
                    "value": 2,
                    "description": "n, number of documents containing term",
                    "details": []
                  },
                  {
                    "value": 2,
                    "description": "N, total number of documents with field",
                    "details": []
                  }
                ]
              },
              {
                "value": 0.18232156,
                "description": "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:",
                "details": [
                  {
                    "value": 2,
                    "description": "n, number of documents containing term",
                    "details": []
                  },
                  {
                    "value": 2,
                    "description": "N, total number of documents with field",
                    "details": []
                  }
                ]
              }
            ]
          },
          {
            "value": 0.40000004,
            "description": "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:",
            "details": [
              {
                "value": 1,
                "description": "phraseFreq=1.0",
                "details": []
              },
              {
                "value": 1.2,
                "description": "k1, term saturation parameter",
                "details": []
              },
              {
                "value": 0.75,
                "description": "b, length normalization parameter",
                "details": []
              },
              {
                "value": 4,
                "description": "dl, length of field",
                "details": []
              },
              {
                "value": 3,
                "description": "avgdl, average length of field",
                "details": []
              }
            ]
          }
        ]
      }
    ]
  }
}
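
The two explain outputs differ only in `dl`, and therefore in `tf`. As a check, the arithmetic can be replayed in Python using the formulas and values printed by explain. The function names `bm25_idf` and `bm25_tf` are made up for this sketch. The boost of 220 is taken as reported; it appears to be the query boost of 100 multiplied by k1 + 1, but that reading is an assumption.

```python
import math


def bm25_idf(n, N):
    """idf as printed by explain: log(1 + (N - n + 0.5) / (n + 0.5))."""
    return math.log(1 + (N - n + 0.5) / (n + 0.5))


def bm25_tf(freq, dl, avgdl, k1=1.2, b=0.75):
    """tf as printed by explain: freq / (freq + k1 * (1 - b + b * dl / avgdl))."""
    return freq / (freq + k1 * (1 - b + b * dl / avgdl))


boost = 220                   # value reported by explain, used as-is
idf = 2 * bm25_idf(n=2, N=2)  # "idf, sum of": one identical entry per phrase term

score_doc1 = boost * idf * bm25_tf(freq=1, dl=2, avgdl=3)
score_doc2 = boost * idf * bm25_tf(freq=1, dl=4, avgdl=3)

print(score_doc1, score_doc2)
```

Up to floating-point rounding, the results agree with the 42.221836 and 32.088596 reported above, which confirms that the field length is the only source of the difference.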

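Fixes

Constant score

If you still want a score, you may want to look at the constant_score query. It gives every matching document the same score, equal to its own boost (1.2 here), regardless of field length. The boost of 100 inside the filter has no effect, because filters do not compute a score: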

GET 77469343/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "constant_score": {
            "filter": {
              "match_phrase": {
                "authors.name": {
                  "_name": "exact match in authors",
                  "query": "Test Name",
                  "boost": 100,
                  "slop": 1
                }
              }
            },
            "boost": 1.2
          }
        }
      ]
    }
  }
}

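Filter instead of should?

If you use filter, the clause still selects the matching documents but does not contribute to their score: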

GET 77469343/_search
{
  "query": {
    "bool": {
      "filter": [
        {
          "match_phrase": {
            "authors.name": {
              "_name": "exact match in authors",
              "query": "Test Name",
              "boost": 100,
              "slop": 1
            }
          }
        }
      ]
    }
  }
}

Answered on 2023-11-17

o4tp2gmn2#

I tried @paulo's solution, but it did not quite work for me, so I ended up mapping authors as a nested field:

"authors": {
  "type": "nested",
  "properties": {
    "name": {
      "type": "text",
      "fields": {
        "keyword": {
          "type": "keyword",
          "ignore_above": 256
        }
      }
    },
    "url": {
      "type": "text",
      "fields": {
        "keyword": {
          "type": "keyword",
          "ignore_above": 256
        }
      }
    }
  }
}

And used this query:

{
    "nested": {
        "path": "authors",
        "_name": "exact match in authors",
        "query": {
            "bool": {
                "must": {
                    "match_phrase": {
                        "authors.name": {
                            "query": "Test Name",
                            "boost": 100,
                            "slop": 1
                        }
                    }
                }
            }
        }
    }
}

Elasticsearch documentation: https://www.elastic.co/guide/en/elasticsearch/reference/current/nested.html
After these changes, it works well!
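
A rough Python sketch of why the nested mapping evens out the scores. It assumes that each nested author object is scored as its own hidden document with its own field length, and that the average length is taken over all nested objects. The `nested_names` dictionary and the `bm25_tf` helper are made up for the illustration, and whitespace splitting again stands in for the analyzer. Boost and idf are the same for both parents, so only tf is compared.

```python
def bm25_tf(freq, dl, avgdl, k1=1.2, b=0.75):
    """BM25 tf with the default k1 and b: freq / (freq + k1 * (1 - b + b * dl / avgdl))."""
    return freq / (freq + k1 * (1 - b + b * dl / avgdl))


# Parent document id -> author names, one nested object per name (assumption).
nested_names = {
    1: ["Test Name"],
    2: ["Test Name", "Another author"],
}

# Each nested object has its own length instead of one combined length per parent.
lengths = [len(name.split()) for names in nested_names.values() for name in names]
avgdl = sum(lengths) / len(lengths)

phrase = "test name"
tf_by_parent = {}
for parent_id, names in nested_names.items():
    for name in names:
        if name.lower() == phrase:
            # Only the matching author object contributes; its length is 2 in both parents.
            tf_by_parent[parent_id] = bm25_tf(1, len(name.split()), avgdl)

print(tf_by_parent)
```

Under these assumptions both parents get the same tf, because the second author of document 2 no longer lengthens the field that "Test Name" is matched against.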

Answered on 2023-11-17
