Match all exact words in a query

Posted by z4bn682m on 2021-06-10 in ElasticSearch

I want to use the Elasticsearch Java API to build a query that matches only (1) whole words and (2) all words in the search query. Here is an example:
Text: hello wonderful world
These should match:

hello
hello wonderful
hello world
wonderful world
hello wonderful world
wonderful
world

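These should not match:

hell
hello fniefsgbsugbs

I tried the following parameters for the match query, but it still matches both of the examples above.
This is the code that generates the query with the Elasticsearch 7.7.1 Java API: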

import org.elasticsearch.index.query.Operator
import org.elasticsearch.index.query.QueryBuilders
...
QueryBuilders.matchQuery(field, query)
            .autoGenerateSynonymsPhraseQuery(false)
            .fuzziness(0)
            .prefixLength(0)
            .fuzzyTranspositions(false)
            .operator(Operator.AND)
            .minimumShouldMatch("100%")

This generates the following query:

{
  "size": 100,
  "query": {
    "bool": {
      "filter": [
        {
          "match": {
            "searchableText": {
              "query": "hell",
              "operator": "AND",
              "fuzziness": "0",
              "prefix_length": 0,
              "max_expansions": 50,
              "minimum_should_match": "100%",
              "fuzzy_transpositions": false,
              "lenient": false,
              "zero_terms_query": "NONE",
              "auto_generate_synonyms_phrase_query": false,
              "boost": 1
            }
          }
        }
      ]
    }
  }
}

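Can anyone help me find a good solution to this problem?
Edit: here are the settings and mapping (I removed everything not related to searchableText to keep it as small as possible):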

{
    "settings": {
      "analysis": {
        "normalizer": {
          "lowercase_normalizer": {
            "type": "custom",
            "filter": [
              "lowercase"
            ]
          }
        },
        "filter": {
          "german_stemmer": {
            "type": "stemmer",
            "language": "light_german"
          },
          "ngram_filter": {
            "type": "shingle",
            "max_shingle_size": 4,
            "min_shingle_size": 2,
            "output_unigrams": false,
            "output_unigrams_if_no_shingles": false
          }
        },
        "analyzer": {
          "german": {
            "tokenizer": "standard",
            "filter": [
              "lowercase",
              "german_synonyms",
              "german_stop",
              "german_keywords",
              "german_no_stemming",
              "german_stemmer"
            ]
          },
          "german_ngram": {
            "tokenizer": "standard",
            "filter": [
              "lowercase",
              "german_synonyms",
              "german_keywords",
              "german_no_stemming",
              "german_stemmer",
              "ngram_filter"
            ]
          }
        }
      }
    },
    "mappings": {
      "properties": {
        "description": {
          "type": "text",
          "copy_to": "searchableText",
          "analyzer": "german"
        },
        "name": {
          "type": "text",
          "copy_to": "searchableText",
          "analyzer": "german"
        },
        "userTags": {
          "type": "keyword",
          "copy_to": "searchableText",
          "normalizer": "lowercase_normalizer"
        },
        "searchableText": {
          "type": "text",
          "analyzer": "german",
          "fields": {
            "ngram": {
              "type": "text",
              "analyzer": "german_ngram"
            }
          }
        },
        "searches": {
          "type": "keyword",
          "copy_to": "searchableText",
          "normalizer": "lowercase_normalizer"
        }
      }
    }
  }

Edit 2: these are the filters mentioned above:

"filter": {
    "german_stop": {
      "type": "stop",
      "stopwords": "_german_"
    },
    "german_stemmer": {
      "type": "stemmer",
      "language": "light_german"
    },
    "ngram_filter": {
      "type": "shingle",
      "max_shingle_size": 4,
      "min_shingle_size": 2,
      "output_unigrams": false,
      "output_unigrams_if_no_shingles": false
    }
}

elasticsearch elasticsearch-java-api elasticsearch-7

Source: https://stackoverflow.com/questions/64878011/match-all-exact-words-in-a-query

2 Answers

ipakzgxi1#

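I tried to create the index with your settings and mapping, but got an error because the following filters were not provided. After removing them, the index was created.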

"german_synonyms",
"german_stop",
"german_keywords",
"german_no_stemming",

After that, I indexed your sample document hello wonderful world and ran your search query. It works as you expect: no results are returned for hell or hello fniefsgbsugbs, as shown below:

{
    "size": 100,
    "query": {
        "bool": {
            "filter": [
                {
                    "match": {
                        "searchableText": {
                            "query": "hello fniefsgbsugbs",
                            "operator": "AND",
                            "fuzziness": "0",
                            "prefix_length": 0,
                            "max_expansions": 50,
                            "minimum_should_match": "100%",
                            "fuzzy_transpositions": false,
                            "lenient": false,
                            "zero_terms_query": "NONE",
                            "auto_generate_synonyms_phrase_query": false,
                            "boost": 1
                        }
                    }
                }
            ]
        }
    }
}

It returned:

"hits": {
        "total": {
            "value": 0,
            "relation": "eq"
        },
        "max_score": null,
        "hits": []
    }

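The same goes for hell, while hello, hello wonderful, and the other queries that are expected to match do return results.
Edit: You are using a match query, which is analyzed. That is, it analyzes the search terms with the same analyzer that was applied to the field at index time, then matches the search-time tokens against the index-time tokens.
To debug this kind of issue properly, use the Analyze API and compare the tokens of the indexed document with the tokens of the search terms.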
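
To make the token-matching idea concrete, here is a minimal plain-Java sketch. It is not Elasticsearch code. The `analyze` method is a hypothetical stand-in that only splits on non-letter, non-digit characters and lowercases the pieces; it ignores the synonym, stop-word, and stemmer filters of the `german` analyzer from the question. `matchesAllTerms` (also a made-up name) models a match query with operator AND: every search-time token must equal some index-time token.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

public class TokenMatchSketch {

    // Hypothetical stand-in for an analyzer: split on anything that is not a
    // letter or digit, lowercase each piece, and drop empty pieces.
    public static Set<String> analyze(String text) {
        Set<String> tokens = new LinkedHashSet<>();
        for (String piece : text.split("[^\\p{L}\\p{N}]+")) {
            if (!piece.isEmpty()) {
                tokens.add(piece.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    // Models operator AND: true only if every query token is also a document token.
    public static boolean matchesAllTerms(String document, String query) {
        Set<String> queryTokens = analyze(query);
        // In this sketch an empty query matches nothing.
        return !queryTokens.isEmpty() && analyze(document).containsAll(queryTokens);
    }

    public static void main(String[] args) {
        String document = "hello wonderful world";
        for (String query : Arrays.asList("hello", "wonderful world", "hell", "hello fniefsgbsugbs")) {
            System.out.println(query + " -> " + matchesAllTerms(document, query));
        }
    }
}
```

Under this simplified model, hell and hello fniefsgbsugbs do not match. If they do match against a real index, comparing the Analyze API output for the document and for the query should show which token the two have in common.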

2021-06-11

jchrr9hc2#

For fields indexed as "keyword", I usually prefer the query_string query DSL over the match query. For example:

{
    "query" : {
        "query_string" : {
            "query" : "my_field:('hello', 'wonderful', 'world')"
        }
    }
}

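This will match all the combinations you listed as expected matches, and none of the ones you don't want. The terms in parentheses relate to each other like SQL "IN", so a document matches if any one of them appears in the field. This format also gives you a lot of flexibility when building complex searches. Let me know if this helps.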
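
The IN-like behaviour described above can be contrasted with the all-words requirement from the question in a small plain-Java sketch. It only models terms as sets. It does not call Elasticsearch and makes no claim about how query_string parses the example string; the class and method names are made up for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class AnyVersusAllSketch {

    // IN-like semantics: the document matches if at least one query term occurs in it.
    public static boolean matchesAny(Set<String> documentTerms, List<String> queryTerms) {
        return queryTerms.stream().anyMatch(documentTerms::contains);
    }

    // All-of semantics: the document matches only if every query term occurs in it.
    public static boolean matchesAll(Set<String> documentTerms, List<String> queryTerms) {
        return !queryTerms.isEmpty() && documentTerms.containsAll(queryTerms);
    }

    public static void main(String[] args) {
        Set<String> documentTerms = new HashSet<>(Arrays.asList("hello", "wonderful", "world"));
        List<String> queryTerms = Arrays.asList("hello", "fniefsgbsugbs");
        System.out.println("any: " + matchesAny(documentTerms, queryTerms)); // prints "any: true"
        System.out.println("all: " + matchesAll(documentTerms, queryTerms)); // prints "all: false"
    }
}
```

With any-of semantics, a query that contains one known term and one unknown term still matches, which is not what the question asks for. The query_string query has a default_operator parameter that can be set to AND when every term must be present.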

2021-06-10
