Synonym search in Elasticsearch

Posted by ymdaylpp on 2022-11-02 in Elasticsearch

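I want to use synonyms to retrieve data from an index. When I search for title A, I also want to retrieve documents whose title contains B. To do that, I set up the following mapping: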

{
    "settings": {
        "index" : {
            "analysis" : {
                "filter" : {
                    "synonym_filter" : {
                        "type" : "synonym",
                        "synonyms" : [
                            "A=>A,B"
                        ]
                    }
                },
                "analyzer" : {
                    "synonym_analyzer" : {
                        "tokenizer" : "keyword",
                        "filter" : ["synonym_filter"] 
                    }
                }
            }
        }
    },
    "mappings": {
            "properties": {
              "title": { 
                "type": "text",
                "analyzer" : "synonym_analyzer"
              }
            }
    }
}

Then I added three documents to the index:

{
  "title": "C"
}
{
  "title": "B"
}
{
  "title": "A"
}

Then I used the Analyze API to check that it works (everything looks fine):

curl -X GET "localhost:9200/my_custom_index_title/_analyze?pretty" -H 'Content-Type: application/json' -d'
{
  "analyzer": "synonym_analyzer", 
  "text":     "A"
}
'
{
  "tokens" : [
    {
      "token" : "A",
      "start_offset" : 0,
      "end_offset" : 1,
      "type" : "SYNONYM",
      "position" : 0
    },
    {
      "token" : "B",
      "start_offset" : 0,
      "end_offset" : 1,
      "type" : "SYNONYM",
      "position" : 0
    }
  ]
}

curl -X GET "localhost:9200/my_custom_index_title/_analyze?pretty" -H 'Content-Type: application/json' -d'
{
  "analyzer": "synonym_analyzer", 
  "text":     "B"
}
'
{
  "tokens" : [
    {
      "token" : "B",
      "start_offset" : 0,
      "end_offset" : 1,
      "type" : "word",
      "position" : 0
    }
  ]
}

When I search for title A, the result is correct:

{
    "query": {
    "match": {
      "title": {
        "query": "A"
      }
    }
  }
}

{
    "took": 1,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 2,
            "relation": "eq"
        },
        "max_score": 0.6951314,
        "hits": [
            {
                "_index": "my_custom_index_title",
                "_id": "i5bb_4IBqFAXxSLAgrDj",
                "_score": 0.6951314,
                "_source": {
                    "title": "A"
                }
            },
            {
                "_index": "my_custom_index_title",
                "_id": "jJbb_4IBqFAXxSLAlLBj",
                "_score": 0.52354836,
                "_source": {
                    "title": "B"
                }
            }
        ]
    }
}

But when I search for B, the result is wrong. I want only the results that contain B, not A:

{
    "query": {
    "match": {
      "title": {
        "query": "B"
      }
    }
  }
}

{
    "took": 1,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 2,
            "relation": "eq"
        },
        "max_score": 0.52354836,
        "hits": [
            {
                "_index": "my_custom_index_title",
                "_id": "i5bb_4IBqFAXxSLAgrDj",
                "_score": 0.52354836,
                "_source": {
                    "title": "A"
                }
            },
            {
                "_index": "my_custom_index_title",
                "_id": "jJbb_4IBqFAXxSLAlLBj",
                "_score": 0.52354836,
                "_source": {
                    "title": "B"
                }
            }
        ]
    }
}

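For example, when I search for "computer", I want to get "laptop", "computer", and "mac". But when I search for "mac", I want only the results for mac (not laptop or computer).
I don't understand why the search for B doesn't return just one result.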

Answer #1, by lokaqttq

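As I understand it, in this example synonym_analyzer is the field's analyzer, so the synonyms were applied at index time: document A was indexed with both tokens A and B, which is why a search for B also matches it. To fix this, apply synonyms only at search time by adding the "search_analyzer" parameter. Note that I added the lowercase filter to synonym_analyzer, because the standard analyzer lowercases tokens by default.
So that term B produces only its own token at search time, use the following settings and mapping: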
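The difference between index-time and search-time synonyms can be illustrated with a toy inverted index. This is only a rough sketch in plain Python, not Elasticsearch itself: the names analyze, build_index and search are made up for illustration, the rule table mirrors "A=>A,B", and lowercasing is ignored for simplicity.

```python
from collections import defaultdict

# Toy stand-in for the synonym filter with the rule "A=>A,B" (illustrative only).
SYNONYMS = {"A": ["A", "B"]}

def analyze(text, use_synonyms):
    """Keyword-style analysis: the whole text is one token, optionally expanded."""
    if use_synonyms:
        return SYNONYMS.get(text, [text])
    return [text]

def build_index(docs, index_synonyms):
    """Map each indexed token to the set of document titles that produced it."""
    index = defaultdict(set)
    for title in docs:
        for token in analyze(title, index_synonyms):
            index[token].add(title)
    return index

def search(index, query, search_synonyms):
    """Return the titles matching any token produced from the query."""
    hits = set()
    for token in analyze(query, search_synonyms):
        hits |= index.get(token, set())
    return sorted(hits)

docs = ["A", "B", "C"]

# Mapping from the question: synonyms applied at index time and at search time.
both = build_index(docs, index_synonyms=True)
print(search(both, "A", True))   # ['A', 'B']
print(search(both, "B", True))   # ['A', 'B'] because document A was indexed with token B

# Search-time only: plain tokens in the index, synonyms expanded in the query.
plain = build_index(docs, index_synonyms=False)
print(search(plain, "A", True))  # ['A', 'B']
print(search(plain, "B", True))  # ['B']
```

In the first case the token B already points at document A inside the index, so no query-side setting can exclude it. In the second case the expansion happens only for the query text, so the rule stays one-directional.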

{
  "settings": {
    "index": {
      "analysis": {
        "filter": {
          "synonym_filter": {
            "type": "synonym",
            "expand":"false",
            "synonyms": [
              "A=>A,B"
            ]
          }
        },
        "analyzer": {
          "synonym_analyzer": {
            "tokenizer": "keyword",
            "filter": [
              "lowercase",
              "synonym_filter"
            ]
          }
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "standard",
        "search_analyzer": "synonym_analyzer"
      }
    }
  }
}
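Because the analyzer of an existing text field cannot be changed in place, and the documents already in the index were stored with the expanded tokens, the index most likely has to be recreated and the documents indexed again before the new behaviour shows up. A rough sketch using only the Python standard library follows. It assumes an unsecured cluster at localhost:9200 and the index name from the question; the helper names (match_query, call, rebuild_and_check) are hypothetical, and index_body is the settings and mappings JSON shown above.

```python
import json
import urllib.error
import urllib.request

# Assumptions: local unsecured cluster, index name taken from the question.
BASE_URL = "http://localhost:9200"
INDEX = "my_custom_index_title"

def match_query(term):
    """Build the body of a match query on the title field."""
    return {"query": {"match": {"title": {"query": term}}}}

def call(method, path, body=None):
    """Send one JSON request to Elasticsearch and return the parsed response."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(
        f"{BASE_URL}{path}",
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

def rebuild_and_check(index_body):
    """Recreate the index, index the three titles again, and search for B."""
    try:
        call("DELETE", f"/{INDEX}")  # destructive: drops the existing index
    except urllib.error.HTTPError:
        pass  # the index did not exist yet
    call("PUT", f"/{INDEX}", index_body)
    for title in ("A", "B", "C"):
        # refresh=true makes each document searchable immediately
        call("POST", f"/{INDEX}/_doc?refresh=true", {"title": title})
    result = call("POST", f"/{INDEX}/_search", match_query("B"))
    return [hit["_source"]["title"] for hit in result["hits"]["hits"]]
```

Calling rebuild_and_check with the JSON body above returns the titles matched by a search for B; if the search-time setup behaves as described, that list should contain only B. Note that the DELETE call removes all existing data in the index, so this is only suitable for a test index.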
