gram, fuzziness]

4jb9z9bj posted on 2021-06-14 in ElasticSearch

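I am using a tokenizer, fuzziness, and an edge n-gram filter, and I have three documents:
"Star Trek I"
"Star Trekian"
"Star Trakian: A Star Trek Documentary"
Searching for "Star Trek" with fuzziness gives "Star Trekian" a higher score than "Star Trek I", because an additional token derived from "Trekian" also matches "Trek". Is adding matches with less or no fuzziness the best way to counteract this?
In addition, "Star Trakian: A Star Trek Documentary" gets a higher score because it matches on both "Trakian" and "Trek". Is there a way to match only on the best token, or any other approach that gives it the same score as "Star Trek I" (since both contain "Star Trek")?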
Edit:
Mapping and settings:

PUT /stackoverflow
{
  "settings": {
    "number_of_shards": 1,
    "analysis": {
      "filter": {
        "edge_n_gram": {
          "type": "edge_ngram",
          "min_gram": "1",
          "max_gram": "50"
        }
      },
      "analyzer": {
        "autocomplete": {
          "filter": [
            "lowercase",
            "asciifolding",
            "edge_n_gram"
          ],
          "type": "custom",
          "tokenizer": "autocomplete"
        },
        "autocomplete_search": {
          "filter": [
            "lowercase",
            "asciifolding"
          ],
          "type": "custom",
          "tokenizer": "char_group"
        },
        "full_word": {
          "filter": [
            "lowercase",
            "asciifolding"
          ],
          "type": "custom",
          "tokenizer": "char_group"
        }
      },
      "tokenizer": {
        "autocomplete": {
          "type": "standard"
        },
        "char_group": {
          "type": "char_group",
          "tokenize_on_chars": [
            "whitespace",
            "-",
            "."
          ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "search_field_full": {
        "type": "text",
        "similarity": "boolean",
        "fields": {
          "raw": {
            "type": "text",
            "similarity": "boolean",
            "analyzer": "full_word",
            "search_analyzer": "autocomplete_search"
          }
        },
        "analyzer": "autocomplete",
        "search_analyzer": "autocomplete_search"
      }
    }
  }
}

Documents:

POST stackoverflow/_doc/
{
  "search_field_full": "Star Trek I"
}

POST stackoverflow/_doc/
{
  "search_field_full": "Star Trakian: A Star Trek Documentary"
}

POST stackoverflow/_doc/
{
  "search_field_full": "Star Trekian"
}

Query:

GET stackoverflow/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "multi_match": {
            "fields": [
              "search_field_full"
            ],
            "fuzziness": "AUTO:4,7",
            "max_expansions": 500,
            "minimum_should_match": 2,
            "operator": "or",
            "query": "Star Trek",
            "type": "best_fields"
          }
        }
      ],
      "should": [
        {
          "multi_match": {
            "fields": [
              "search_field_full.raw^30"
            ],
            "fuzziness": 0,
            "operator": "or",
            "query": "Star Trek",
            "type": "best_fields"
          }
        },
        {
          "multi_match": {
            "fields": [
              "search_field_full.raw^20"
            ],
            "fuzziness": 1,
            "operator": "or",
            "query": "Star Trek",
            "type": "best_fields"
          }
        }
      ]
    }
  }
}
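
To make the token-matching argument above concrete, here is a rough Python sketch. It is not Elasticsearch's scoring. It only approximates the `autocomplete` analyzer (split on non-word characters, lowercase, edge n-grams; `asciifolding` is skipped) and counts how many indexed tokens fall within the edit distance that `AUTO:4,7` would allow for each query term. It uses plain Levenshtein distance without transpositions, and all function names are hypothetical helpers for illustration.

```python
import re


def edge_ngrams(word, min_gram=1, max_gram=50):
    """Return the prefixes of `word`, like an edge n-gram filter with these bounds."""
    return [word[:n] for n in range(min_gram, min(len(word), max_gram) + 1)]


def analyze(text):
    """Crude stand-in for the `autocomplete` analyzer (assumption, not the real one)."""
    tokens = set()
    for word in re.findall(r"\w+", text.lower()):
        tokens.update(edge_ngrams(word))
    return tokens


def edit_distance(a, b):
    """Plain Levenshtein distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def auto_fuzziness(term, low=4, high=7):
    """Allowed edits for a term under AUTO:low,high."""
    if len(term) < low:
        return 0
    if len(term) < high:
        return 1
    return 2


def fuzzy_hits(term, tokens):
    """Indexed tokens that lie within the allowed edit distance of `term`."""
    max_edits = auto_fuzziness(term)
    return sorted(t for t in tokens if edit_distance(term, t) <= max_edits)


DOCS = [
    "Star Trek I",
    "Star Trekian",
    "Star Trakian: A Star Trek Documentary",
]

if __name__ == "__main__":
    for doc in DOCS:
        tokens = analyze(doc)
        hits = {term: fuzzy_hits(term, tokens) for term in ("star", "trek")}
        total = sum(len(h) for h in hits.values())
        print(doc, hits, total)
```

Under these assumptions the sketch finds 4 matching tokens for "Star Trek I" and 5 each for the other two titles, which is consistent with the ranking described in the question. The boosted `should` clauses on `search_field_full.raw` are not modeled here.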
