Elasticsearch bool query with should clauses does not work as expected when using the keyword_repeat filter

Posted by t98cgbkg on 2023-06-21 in ElasticSearch

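If I use the keyword_repeat filter in the index settings, then a bool query with should clauses appears to search only the field named in the first match clause. Elasticsearch version: 8.7.1
Creating the index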

curl -X PUT "elasticsearch:9200/my-test-index?pretty" -H 'Content-Type: application/json' -d'
{
  "settings": {
    "analysis": {
      "analyzer": {
        "default": {          
          "tokenizer": "default_tokenizer",
          "filter": [
            "lowercase",
            "keyword_repeat",
            "default_stemmer"
          ]
        }
      },
      "tokenizer": {
        "default_tokenizer": {
          "type": "standard"
        }
      },
      "filter": {
        "default_stemmer": {
          "type": "stemmer",
          "language": "english"
        },
        "unique_stem": {
          "type": "unique",
          "only_on_same_position": true
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "field1": {
        "type": "text"
      },
      "field2": {
        "type": "text"
      }
    }
  }
}
'
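
Note that the settings above define unique_stem, but the default analyzer's filter list never references it. As a rough sketch, the `_analyze` API can preview what the chain would emit with a `unique` filter appended after the stemmer. The inline filter objects below mirror the index settings; whether adding this filter changes the search behavior described in this question is an untested assumption.

```shell
# Sketch only: preview the token stream with a unique filter appended after the stemmer.
# The host name follows the question; the inline filter objects mirror the index settings.
curl -X GET "elasticsearch:9200/_analyze?pretty" -H 'Content-Type: application/json' -d'
{
  "tokenizer": "standard",
  "filter": [
    "lowercase",
    "keyword_repeat",
    { "type": "stemmer", "language": "english" },
    { "type": "unique", "only_on_same_position": true }
  ],
  "text": "running man"
}
'
```

Comparing this output with the same request minus the last filter shows which same-position duplicates the `unique` filter drops.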

Adding a document

curl -X POST "elasticsearch:9200/my-test-index/_doc/1?pretty" -H 'Content-Type: application/json' -d'
{
  "field1": "running man",
  "field2": "other text"
}
'

Searching documents

curl -X GET "elasticsearch:9200/my-test-index/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "query": {
    "bool": {
      "should": [
        { "match": { "field2":  "running" }},
        { "match": { "field1": "running" }}
      ]
    }
  }
}
'

Response:

{
  "took" : 243,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 0,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  }
}

I expected the document to be found.
However, a request with the fields in a different order (field1, then field2)

curl -X GET "elasticsearch:9200/my-test-index/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "query": {
    "bool": {
      "should": [
        { "match": { "field1":  "running" }},
        { "match": { "field2": "running" }}
      ]
    }
  }
}
'

finds the document:

{
  "took" : 62,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 0.92058265,
    "hits" : [
      {
        "_index" : "my-test-index",
        "_id" : "1",
        "_score" : 0.92058265,
        "_source" : {
          "field1" : "running man",
          "field2" : "other text"
        }
      }
    ]
  }
}

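I expect should clauses to behave like an OR condition, so both queries should return the document regardless of the order of the fields in the query. If I remove keyword_repeat from the index settings, everything works as expected and both queries find the document.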
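
One way to see why a given clause order does or does not match is the explain API. The sketch below runs the failing clause order against document 1, reusing the host and index name from above. `minimum_should_match` is written out only for illustration, since 1 is already the default when a bool query contains nothing but should clauses.

```shell
# Sketch: ask Elasticsearch to explain how document 1 scores against the failing query.
# minimum_should_match is spelled out for clarity; it equals the default for a should-only bool.
curl -X GET "elasticsearch:9200/my-test-index/_explain/1?pretty" -H 'Content-Type: application/json' -d'
{
  "query": {
    "bool": {
      "should": [
        { "match": { "field2": "running" }},
        { "match": { "field1": "running" }}
      ],
      "minimum_should_match": 1
    }
  }
}
'
```

The `matched` flag and the `explanation` tree in the response indicate which clauses contributed to the result.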
Term vectors of the document in the index that uses the keyword_repeat filter

curl -X GET "elasticsearch:9200/my-test-index/_termvectors/1?pretty&fields=field1,field2"

{
  "_index" : "my-test-index",
  "_id" : "1",
  "_version" : 1,
  "found" : true,
  "took" : 116,
  "term_vectors" : {
    "field2" : {
      "field_statistics" : {
        "sum_doc_freq" : 2,
        "doc_count" : 1,
        "sum_ttf" : 4
      },
      "terms" : {
        "other" : {
          "term_freq" : 2,
          "tokens" : [
            {
              "position" : 0,
              "start_offset" : 0,
              "end_offset" : 5
            },
            {
              "position" : 0,
              "start_offset" : 0,
              "end_offset" : 5
            }
          ]
        },
        "text" : {
          "term_freq" : 2,
          "tokens" : [
            {
              "position" : 1,
              "start_offset" : 6,
              "end_offset" : 10
            },
            {
              "position" : 1,
              "start_offset" : 6,
              "end_offset" : 10
            }
          ]
        }
      }
    },
    "field1" : {
      "field_statistics" : {
        "sum_doc_freq" : 3,
        "doc_count" : 1,
        "sum_ttf" : 4
      },
      "terms" : {
        "man" : {
          "term_freq" : 2,
          "tokens" : [
            {
              "position" : 1,
              "start_offset" : 8,
              "end_offset" : 11
            },
            {
              "position" : 1,
              "start_offset" : 8,
              "end_offset" : 11
            }
          ]
        },
        "run" : {
          "term_freq" : 1,
          "tokens" : [
            {
              "position" : 0,
              "start_offset" : 0,
              "end_offset" : 7
            }
          ]
        },
        "running" : {
          "term_freq" : 1,
          "tokens" : [
            {
              "position" : 0,
              "start_offset" : 0,
              "end_offset" : 7
            }
          ]
        }
      }
    }
  }
}
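
The query side can be inspected in the same spirit. As a small sketch, the `_analyze` API can run the search text through the analyzer that field1 uses; given the term vectors above, it would be expected to return both the original and the stemmed form of the word at the same position.

```shell
# Sketch: show how the query text "running" is tokenized by the analyzer of field1.
curl -X GET "elasticsearch:9200/my-test-index/_analyze?pretty" -H 'Content-Type: application/json' -d'
{
  "field": "field1",
  "text": "running"
}
'
```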

I tested several Elasticsearch versions and got the following results:
8.8.1 - works as expected
8.8.0 - works as expected
8.7.1 - issue present
8.7.0 - issue present
8.6.2 - works as expected
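
When repeating such a comparison, it helps to confirm which version a node is actually running before each test. A minimal sketch, assuming the same host name as in the requests above:

```shell
# Sketch: the root endpoint reports the node's version under version.number.
curl -X GET "elasticsearch:9200/?pretty"
```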

Answer 1, by zbq4xfa0

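The order of the clauses does not matter, so both queries below should return the same result. The empty result you saw the first time may have been caused by refresh_interval: the document may not yet have been visible to search. The indexing request below uses the refresh parameter to rule that out.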

PUT test_should_index?pretty
{
  "settings": {
    "analysis": {
      "analyzer": {
        "default": {          
          "tokenizer": "default_tokenizer",
          "filter": [
            "lowercase",
            "keyword_repeat",
            "default_stemmer"
          ]
        }
      },
      "tokenizer": {
        "default_tokenizer": {
          "type": "standard"
        }
      },
      "filter": {
        "default_stemmer": {
          "type": "stemmer",
          "language": "english"
        },
        "unique_stem": {
          "type": "unique",
          "only_on_same_position": true
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "field1": {
        "type": "text"
      },
      "field2": {
        "type": "text"
      }
    }
  }
}
### Adding a document
POST test_should_index/_doc/1?pretty&refresh
{
  "field1": "running man",
  "field2": "other text"
}
### Searching documents
GET test_should_index/_search?pretty
{
  "query": {
    "bool": {
      "should": [
        { "match": { "field2":  "running" }},
        { "match": { "field1": "running" }}
      ]
    }
  }
}

GET test_should_index/_search?pretty
{
  "query": {
    "bool": {
      "should": [
        { "match": { "field1":  "running" }},
        { "match": { "field2": "running" }}
      ]
    }
  }
}

Result:

# GET test_should_index/_search?pretty 200 OK
{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 0.46029133,
    "hits": [
      {
        "_index": "test_should_index",
        "_id": "1",
        "_score": 0.46029133,
        "_source": {
          "field1": "running man",
          "field2": "other text"
        }
      }
    ]
  }
}
# GET test_should_index/_search?pretty 200 OK
{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 0.46029133,
    "hits": [
      {
        "_index": "test_should_index",
        "_id": "1",
        "_score": 0.46029133,
        "_source": {
          "field1": "running man",
          "field2": "other text"
        }
      }
    ]
  }
}
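
To rule out refresh timing on the original index from the question, a refresh can be forced before searching. A minimal sketch, reusing the host and index name from the question:

```shell
# Sketch: make all indexed documents visible to search, then rerun the failing clause order.
curl -X POST "elasticsearch:9200/my-test-index/_refresh?pretty"

curl -X GET "elasticsearch:9200/my-test-index/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "query": {
    "bool": {
      "should": [
        { "match": { "field2": "running" }},
        { "match": { "field1": "running" }}
      ]
    }
  }
}
'
```

If this clause order still returns no hits after an explicit refresh, refresh_interval is unlikely to be the cause.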
