Elasticsearch auto-suggestion based on prefix and a custom tokenizer

wljmcqd8 · posted 2021-06-10 in ElasticSearch

I am currently developing an auto-suggest feature using n-grams.
I have the following filter and analyzer:

"nGram_filter": {
          "type": "nGram",
          "min_gram": 3,
          "max_gram": 10,
          "token_chars": [
            "letter",
            "digit",
            "punctuation",
            "symbol"
          ]
        }
"nGram_analyzer": {
          "type": "custom",
          "tokenizer": "whitespace",
          "filter": [
            "lowercase",
            "asciifolding",
            "nGram_filter"
          ]
        }
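
(A minimal sketch of how these two fragments combine into complete index settings. The index name my_index is a placeholder, and the index.max_ngram_diff line is an assumption: recent Elasticsearch versions require it to be at least max_gram - min_gram, which is 7 here.)

PUT /my_index
{
    "settings": {
        "analysis": {
            "filter": {
                "nGram_filter": {
                    "type": "nGram",
                    "min_gram": 3,
                    "max_gram": 10,
                    "token_chars": ["letter", "digit", "punctuation", "symbol"]
                }
            },
            "analyzer": {
                "nGram_analyzer": {
                    "type": "custom",
                    "tokenizer": "whitespace",
                    "filter": ["lowercase", "asciifolding", "nGram_filter"]
                }
            }
        },
        "index.max_ngram_diff": 7
    }
}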

Now when I index the sample data test_table_for analyzers and search for the strings test, table, or analyzers, I get that record back. My understanding is that the tokens are created according to the filter I specified, which is why this works.
But I need to add another feature: I also want prefix matching. For example, when I search for test_table (10 characters) I get results, because the max n-gram is 10; but when I try test_table_for it returns zero results, because the record test_table_for analyzers contains no such token.
How can I add a prefix-based filter on top of the existing n-gram analyzer? That is, I should still get results when the search term matches up to 10 characters (which currently works), and I should also get suggestions when the search string matches a record from its beginning.

wpcxdonn 1#

This is not possible with a single analyzer. You have to create another field whose tokens are suited to prefix search; below is an index mapping that does this and also includes your current analyzer.
Index mapping

{
    "settings": {
        "analysis": {
            "filter": {
                "autocomplete_filter": {
                    "type": "edge_ngram",
                    "min_gram": 1,
                    "max_gram": 30
                },
                "nGram_filter": {
                    "type": "nGram",
                    "min_gram": 3,
                    "max_gram": 10,
                    "token_chars": [
                        "letter",
                        "digit",
                        "punctuation",
                        "symbol"
                    ]
                }
            },
            "analyzer": {
                "prefixanalyzer": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": [
                        "lowercase",
                        "autocomplete_filter"
                    ]
                },
                "ngramanalyzer": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": [
                        "lowercase",
                        "nGram_filter"
                    ]
                }
            }
        },
        "index.max_ngram_diff" : 30
    },
    "mappings": {
        "properties": {
            "title_prefix": {
                "type": "text",
                "analyzer": "prefixanalyzer",
                "search_analyzer": "standard"
            },
            "title" :{
                "type": "text",
                "analyzer": "ngramanalyzer",
                "search_analyzer": "standard"
            }
        }
    }
}
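
To make the example end-to-end, here is a short sketch of indexing the sample document into both fields once the index has been created with the mapping above (the index name so_63981157 and document id 1 are taken from the search result further below):

PUT /so_63981157/_doc/1
{
    "title_prefix": "test_table_for analyzers",
    "title": "test_table_for analyzers"
}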

Now you can use the _analyze API to confirm the prefix tokens:

{
    "analyzer": "prefixanalyzer",
    "text" : "test_table_for analyzers"
}

Your token test_table_for is also present, as shown below:

{"tokens":[{"token":"t","start_offset":0,"end_offset":14,"type":"<ALPHANUM>","position":0},{"token":"te","start_offset":0,"end_offset":14,"type":"<ALPHANUM>","position":0},{"token":"tes","start_offset":0,"end_offset":14,"type":"<ALPHANUM>","position":0},{"token":"test","start_offset":0,"end_offset":14,"type":"<ALPHANUM>","position":0},{"token":"test_","start_offset":0,"end_offset":14,"type":"<ALPHANUM>","position":0},{"token":"test_t","start_offset":0,"end_offset":14,"type":"<ALPHANUM>","position":0},{"token":"test_ta","start_offset":0,"end_offset":14,"type":"<ALPHANUM>","position":0},{"token":"test_tab","start_offset":0,"end_offset":14,"type":"<ALPHANUM>","position":0},{"token":"test_tabl","start_offset":0,"end_offset":14,"type":"<ALPHANUM>","position":0},{"token":"test_table","start_offset":0,"end_offset":14,"type":"<ALPHANUM>","position":0},{"token":"test_table_","start_offset":0,"end_offset":14,"type":"<ALPHANUM>","position":0},{"token":"test_table_f","start_offset":0,"end_offset":14,"type":"<ALPHANUM>","position":0},{"token":"test_table_fo","start_offset":0,"end_offset":14,"type":"<ALPHANUM>","position":0},{"token":"test_table_for","start_offset":0,"end_offset":14,"type":"<ALPHANUM>","position":0},{"token":"a","start_offset":15,"end_offset":24,"type":"<ALPHANUM>","position":1},{"token":"an","start_offset":15,"end_offset":24,"type":"<ALPHANUM>","position":1},{"token":"ana","start_offset":15,"end_offset":24,"type":"<ALPHANUM>","position":1},{"token":"anal","start_offset":15,"end_offset":24,"type":"<ALPHANUM>","position":1},{"token":"analy","start_offset":15,"end_offset":24,"type":"<ALPHANUM>","position":1},{"token":"analyz","start_offset":15,"end_offset":24,"type":"<ALPHANUM>","position":1},{"token":"analyze","start_offset":15,"end_offset":24,"type":"<ALPHANUM>","position":1},{"token":"analyzer","start_offset":15,"end_offset":24,"type":"<ALPHANUM>","position":1},{"token":"analyzers","start_offset":15,"end_offset":24,"type":"<ALPHANUM>","position":1}]}

Now you can use a multi_match query, which will give you the desired search results, as shown below:
Search query

{
    "query": {
        "multi_match": {
            "query": "test_table_for",
            "fields": [
                "title",
                "title_prefix"
            ]
        }
    }
}

Search result

"hits": [
            {
                "_index": "so_63981157",
                "_type": "_doc",
                "_id": "1",
                "_score": 0.45920232,
                "_source": {
                    "title_prefix": "test_table_for analyzers",
                    "title": "test_table_for analyzers"
                }
            }
        ]
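
Shorter inputs continue to match through the n-gram field with the same query shape. For example, a search for test_table (10 characters, within the filter's max_gram) is still found via the title field, so this single multi_match covers both requirements from the question:

{
    "query": {
        "multi_match": {
            "query": "test_table",
            "fields": [
                "title",
                "title_prefix"
            ]
        }
    }
}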
