Elasticsearch exact search with fuzzy search

Posted by ukdjmx9f on 2021-06-13 in ElasticSearch

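I have an index that contains company names, company abbreviations, and a description of each company's business (index schema below). An example document in this index looks like this: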

{
  "abbreviation": "APPL",
  "name": "Apple",
  "description": "Computer software and hardware"
}

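Users usually search for documents by abbreviation. Sometimes they mistype it, and Elasticsearch handles that case very well. Most of the time, however, users type the abbreviation exactly, and although the best match appears at the top of the response, some junk with low (but greater than 0) scores comes back as well. I tried fiddling with min_score, but because the scores fluctuate a lot, it is hard to pick a value for this parameter.
Is there a way to drop documents that do not exactly match the abbreviation field, while still keeping fuzzy matching as a fallback in case there is no exact match or the user is searching on other fields (e.g. name and description)?
Here are a few examples:
Querying AAPL returns 3 results. The first two match the query exactly and therefore score quite high, while ADP is still somewhat similar but is clearly not what the user searched for.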

{
  "abbreviation": "APPL",
  "name": "Apple, Inc.",
  "description": "Computer software and hardware"
},
{
  "abbreviation": "APPL",
  "name": "Apple, Inc.",
  "description": "Computer software and hardware"
},
{
  "abbreviation": "ADP",
  "name": "Automatic Data Processing, Inc",
  "description": "Computer software and hardware"
}

Querying Apple, the first few entries are again highly relevant, but then some other companies' names show up.

{
  "abbreviation": "APPL",
  "name": "Apple, Inc.",
  "description": "Computer software and hardware"
},
{
  "abbreviation": "APPL",
  "name": "Apple, Inc.",
  "description": "Computer software and hardware"
},
{
  "abbreviation": "CSCO",
  "name": "AppDynamics (Cisco subsidiary)",
  "description": "Computer software"
}

Document schema:

{
  "settings": {
    "index": {
      "requests.cache.enable": true
    }
  },
  "mappings": {
    "properties": {
      "abbreviation_and_name": {
        "type": "text",
        "boost": 2
      },
      "abbreviation": { "type": "text", "copy_to": "abbreviation_and_name", "boost": 20 },
      "name": { "type": "text", "copy_to": "abbreviation_and_name" },
      "description": { "type": "text" }
    }
  }
}

elasticsearch elasticsearch-5 elasticsearch-dsl

Source: https://stackoverflow.com/questions/65323561/elasticsearch-exact-search-with-fuzzy-search

1 Answer

rks48beu1#

First, I would ask why searching for AAPL should bring back the following document at all:

{
  "abbreviation": "ADP",
  "name": "Automatic Data Processing, Inc",
  "description": "Computer software and hardware"
}

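Second, I suggest removing the boost settings from the index mapping and boosting at query level instead.
Overall, though, I believe you may only need an OR query (a bool query with should clauses):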

{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "abbreviation": {
              "query": "AAPL",
              "boost": 2
            }
          }
        },
        {
          "multi_match": {
            "query": "AAPL",
            "fields": ["name", "description"],
            "fuzziness": "AUTO"
          }
        }
      ]
    }
  }
}
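
The suggestion above to move boosts out of the mapping and into the query could look roughly like the following. This is only a sketch: the field names and boost values (20 and 2) are taken from the question's mapping, while `MAPPINGS`, `build_boosted_query`, and its parameters are hypothetical names introduced here for illustration. It relies on the `field^boost` syntax that `multi_match` accepts in its `fields` list.

```python
import json

# The question's mapping with the "boost" entries removed.
# Field names come from the question; nothing else is changed.
MAPPINGS = {
    "properties": {
        "abbreviation_and_name": {"type": "text"},
        "abbreviation": {"type": "text", "copy_to": "abbreviation_and_name"},
        "name": {"type": "text", "copy_to": "abbreviation_and_name"},
        "description": {"type": "text"},
    }
}


def build_boosted_query(term, abbreviation_boost=20, combined_boost=2):
    """Build a multi_match body that applies the boosts at query time.

    Hypothetical helper: the boost values mirror the question's mapping
    and are assumptions, not tuned recommendations.
    """
    return {
        "query": {
            "multi_match": {
                "query": term,
                "fields": [
                    f"abbreviation^{abbreviation_boost}",
                    f"abbreviation_and_name^{combined_boost}",
                    "name",
                    "description",
                ],
            }
        }
    }


if __name__ == "__main__":
    # Print the request body that would be sent to the _search endpoint.
    print(json.dumps(build_boosted_query("AAPL"), indent=2))
```

Keeping the boosts in the query means they can be adjusted per request without reindexing.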

This may not produce exactly the results you describe, but I believe it should work well for your use case.
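
If the junk results must be removed entirely rather than just ranked lower, one possible alternative (not part of the query above) is to run a non-fuzzy query on abbreviation first and only fall back to a fuzzy query when it returns nothing. The sketch below assumes a caller-supplied `run_search` function that sends a request body to Elasticsearch and returns the list of hits; that function and the helper names are hypothetical. Note that "exact" here means a non-fuzzy match on the analyzed text field, so it is case-insensitive with the default analyzer.

```python
from typing import Callable, Dict, List


def exact_query(term: str) -> Dict:
    """Non-fuzzy match on the abbreviation field only."""
    return {"query": {"match": {"abbreviation": {"query": term}}}}


def fuzzy_query(term: str) -> Dict:
    """Fuzzy match across all three fields, used only as a fallback."""
    return {
        "query": {
            "multi_match": {
                "query": term,
                "fields": ["abbreviation", "name", "description"],
                "fuzziness": "AUTO",
            }
        }
    }


def search_with_fallback(
    run_search: Callable[[Dict], List[Dict]], term: str
) -> List[Dict]:
    """Return exact abbreviation hits if any exist, otherwise fuzzy hits.

    run_search is a hypothetical callable that executes a request body
    against the index and returns its hits as a list.
    """
    hits = run_search(exact_query(term))
    if hits:
        return hits
    return run_search(fuzzy_query(term))
```

The trade-off is a second round trip whenever the exact query comes back empty, in exchange for never mixing fuzzy junk into an exact result set.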

2021-06-13
