How do I customize score calculation in Elasticsearch?

Posted by bbmckpt7 on 2021-06-10 in ElasticSearch

I have the following request.

{'bool':
         {'must': [
              {"terms": {"state.keyword": ["Alaska", "Alabama"]}}
          ],
          'should': [
              {'match': {'abstract': 'Spill and Overfill Prevention 18 AAC 78.045'}},
              {'match': {'title': 'Spill and Overfill Prevention 18 AAC 78.045'}},
              {'constant_score': {
                  'filter': {
                      'match': {'title': 'Spill and Overfill Prevention 18 AAC 78.045'}
                  }
              }}
          ]}
     }

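I need the score to be calculated from the match on title.
For this I tried using constant_score.
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-constant-score-query.html
However, this did not have the desired effect. It simply increases the score of every hit by exactly 1.
Here is the search result: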

{'took': 21, 'timed_out': False, '_shards': {'total': 1, 'successful': 1, 'skipped': 0, 'failed': 0}, 'hits': {'total': {'value': 6, 'relation': 'eq'}, 'max_score'
: 4.754379, 'hits': [{'_index': 'articles', '_type': '_doc', '_id': '483703', '_score': 4.754379, '_source':

Here is the explain output:

{'_index': 'articles', '_type': '_doc', '_id': '483703', 'matched': True, 'explanation': {'value': 6.6602507, 'description': 'sum of:', 'details': [{'value': 0.150
05009, 'description': 'weight(legal_language:and in 2) [PerFieldSimilarity], result of:', 'details': [{'value': 0.15005009, 'description': 'score(freq=14.0), compu
ted as boost * idf * tf from:', 'details': [{'value': 2.2, 'description': 'boost', 'details': []}, {'value': 0.074107975, 'description': 'idf, computed as log(1 +
(N - n + 0.5) / (n + 0.5)) from:', 'details': [{'value': 6, 'description': 'n, number of documents containing term', 'details': []}, {'value': 6, 'description': 'N
, total number of documents with field', 'details': []}]}, {'value': 0.92034066, 'description': 'tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from
:', 'details': [{'value': 14.0, 'description': 'freq, occurrences of term within document', 'details': []}, {'value': 1.2, 'description': 'k1, term saturation para
meter', 'details': []}, {'value': 0.75, 'description': 'b, length normalization parameter', 'details': []}, {'value': 504.0, 'description': 'dl, length of field (a
pproximate)', 'details': []}, {'value': 497.5, 'description': 'avgdl, average length of field', 'details': []}]}]}]}, {'value': 0.3779109, 'description': 'weight(l
egal_language:18 in 2) [PerFieldSimilarity], result of:', 'details': [{'value': 0.3779109, 'description': 'score(freq=3.0), computed as boost * idf * tf from:', 'd
etails': [{'value': 2.2, 'description': 'boost', 'details': []}, {'value': 0.24116206, 'description': 'idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:',
'details': [{'value': 5, 'description': 'n, number of documents containing term', 'details': []}, {'value': 6, 'description': 'N, total number of documents with fi
eld', 'details': []}]}, {'value': 0.7122915, 'description': 'tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:', 'details': [{'value': 3.0, 'desc
ription': 'freq, occurrences of term within document', 'details': []}, {'value': 1.2, 'description': 'k1, term saturation parameter', 'details': []}, {'value': 0.7
5, 'description': 'b, length normalization parameter', 'details': []}, {'value': 504.0, 'description': 'dl, length of field (approximate)', 'details': []}, {'value
': 497.5, 'description': 'avgdl, average length of field', 'details': []}]}]}]}, {'value': 0.3779109, 'description': 'weight(legal_language:aac in 2) [PerFieldSimi
larity], result of:', 'details': [{'value': 0.3779109, 'description': 'score(freq=3.0), computed as boost * idf * tf from:', 'details': [{'value': 2.2, 'descriptio
n': 'boost', 'details': []}, {'value': 0.24116206, 'description': 'idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:', 'details': [{'value': 5, 'descriptio
n': 'n, number of documents containing term', 'details': []}, {'value': 6, 'description': 'N, total number of documents with field', 'details': []}]}, {'value': 0.
7122915, 'description': 'tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:', 'details': [{'value': 3.0, 'description': 'freq, occurrences of term
within document', 'details': []}, {'value': 1.2, 'description': 'k1, term saturation parameter', 'details': []}, {'value': 0.75, 'description': 'b, length normali
zation parameter', 'details': []}, {'value': 504.0, 'description': 'dl, length of field (approximate)', 'details': []}, {'value': 497.5, 'description': 'avgdl, ave
rage length of field', 'details': []}]}]}]}, {'value': 1.0089812, 'description': 'weight(title:spill in 2) [PerFieldSimilarity], result of:', 'details': [{'value':
1.0089812, 'description': 'score(freq=1.0), computed as boost * idf * tf from:', 'details': [{'value': 2.2, 'description': 'boost', 'details': []}, {'value': 1.02
96195, 'description': 'idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:', 'details': [{'value': 2, 'description': 'n, number of documents containing term'
, 'details': []}, {'value': 6, 'description': 'N, total number of documents with field', 'details': []}]}, {'value': 0.44543427, 'description': 'tf, computed as fr
eq / (freq + k1 * (1 - b + b * dl / avgdl)) from:', 'details': [{'value': 1.0, 'description': 'freq, occurrences of term within document', 'details': []}, {'value'
: 1.2, 'description': 'k1, term saturation parameter', 'details': []}, {'value': 0.75, 'description': 'b, length normalization parameter', 'details': []}, {'value'
: 7.0, 'description': 'dl, length of field', 'details': []}, {'value': 6.6666665, 'description': 'avgdl, average length of field', 'details': []}]}]}]}, {'value':
0.072622515, 'description': 'weight(title:and in 2) [PerFieldSimilarity], result of:', 'details': [{'value': 0.072622515, 'description': 'score(freq=1.0), computed
as boost * idf * tf from:', 'details': [{'value': 2.2, 'description': 'boost', 'details': []}, {'value': 0.074107975, 'description': 'idf, computed as log(1 + (N
- n + 0.5) / (n + 0.5)) from:', 'details': [{'value': 6, 'description': 'n, number of documents containing term', 'details': []}, {'value': 6, 'description': 'N, t
otal number of documents with field', 'details': []}]}, {'value': 0.44543427, 'description': 'tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:',
'details': [{'value': 1.0, 'description': 'freq, occurrences of term within document', 'details': []}, {'value': 1.2, 'description': 'k1, term saturation paramete
r', 'details': []}, {'value': 0.75, 'description': 'b, length normalization parameter', 'details': []}, {'value': 7.0, 'description': 'dl, length of field', 'detai
ls': []}, {'value': 6.6666665, 'description': 'avgdl, average length of field', 'details': []}]}]}]}, {'value': 1.0089812, 'description': 'weight(title:overfill in
2) [PerFieldSimilarity], result of:', 'details': [{'value': 1.0089812, 'description': 'score(freq=1.0), computed as boost * idf * tf from:', 'details': [{'value':
2.2, 'description': 'boost', 'details': []}, {'value': 1.0296195, 'description': 'idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:', 'details': [{'value'
: 2, 'description': 'n, number of documents containing term', 'details': []}, {'value': 6, 'description': 'N, total number of documents with field', 'details': []}
]}, {'value': 0.44543427, 'description': 'tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:', 'details': [{'value': 1.0, 'description': 'freq, oc
currences of term within document', 'details': []}, {'value': 1.2, 'description': 'k1, term saturation parameter', 'details': []}, {'value': 0.75, 'description': '
b, length normalization parameter', 'details': []}, {'value': 7.0, 'description': 'dl, length of field', 'details': []}, {'value': 6.6666665, 'description': 'avgdl
, average length of field', 'details': []}]}]}]}, {'value': 1.0089812, 'description': 'weight(title:prevention in 2) [PerFieldSimilarity], result of:', 'details':
[{'value': 1.0089812, 'description': 'score(freq=1.0), computed as boost * idf * tf from:', 'details': [{'value': 2.2, 'description': 'boost', 'details': []}, {'va
lue': 1.0296195, 'description': 'idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:', 'details': [{'value': 2, 'description': 'n, number of documents contai
ning term', 'details': []}, {'value': 6, 'description': 'N, total number of documents with field', 'details': []}]}, {'value': 0.44543427, 'description': 'tf, comp
uted as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:', 'details': [{'value': 1.0, 'description': 'freq, occurrences of term within document', 'details': []}
, {'value': 1.2, 'description': 'k1, term saturation parameter', 'details': []}, {'value': 0.75, 'description': 'b, length normalization parameter', 'details': []}
, {'value': 7.0, 'description': 'dl, length of field', 'details': []}, {'value': 6.6666665, 'description': 'avgdl, average length of field', 'details': []}]}]}]},
{'value': 0.072622515, 'description': 'weight(title:18 in 2) [PerFieldSimilarity], result of:', 'details': [{'value': 0.072622515, 'description': 'score(freq=1.0),
computed as boost * idf * tf from:', 'details': [{'value': 2.2, 'description': 'boost', 'details': []}, {'value': 0.074107975, 'description': 'idf, computed as lo
g(1 + (N - n + 0.5) / (n + 0.5)) from:', 'details': [{'value': 6, 'description': 'n, number of documents containing term', 'details': []}, {'value': 6, 'descriptio
n': 'N, total number of documents with field', 'details': []}]}, {'value': 0.44543427, 'description': 'tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)
) from:', 'details': [{'value': 1.0, 'description': 'freq, occurrences of term within document', 'details': []}, {'value': 1.2, 'description': 'k1, term saturation
parameter', 'details': []}, {'value': 0.75, 'description': 'b, length normalization parameter', 'details': []}, {'value': 7.0, 'description': 'dl, length of field
', 'details': []}, {'value': 6.6666665, 'description': 'avgdl, average length of field', 'details': []}]}]}]}, {'value': 0.072622515, 'description': 'weight(title:
aac in 2) [PerFieldSimilarity], result of:', 'details': [{'value': 0.072622515, 'description': 'score(freq=1.0), computed as boost * idf * tf from:', 'details': [{
'value': 2.2, 'description': 'boost', 'details': []}, {'value': 0.074107975, 'description': 'idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:', 'details':
[{'value': 6, 'description': 'n, number of documents containing term', 'details': []}, {'value': 6, 'description': 'N, total number of documents with field', 'det
ails': []}]}, {'value': 0.44543427, 'description': 'tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:', 'details': [{'value': 1.0, 'description':
'freq, occurrences of term within document', 'details': []}, {'value': 1.2, 'description': 'k1, term saturation parameter', 'details': []}, {'value': 0.75, 'descr
iption': 'b, length normalization parameter', 'details': []}, {'value': 7.0, 'description': 'dl, length of field', 'details': []}, {'value': 6.6666665, 'descriptio
n': 'avgdl, average length of field', 'details': []}]}]}]}, {'value': 1.5095675, 'description': 'weight(title:78.045 in 2) [PerFieldSimilarity], result of:', 'deta
ils': [{'value': 1.5095675, 'description': 'score(freq=1.0), computed as boost * idf * tf from:', 'details': [{'value': 2.2, 'description': 'boost', 'details': []}
, {'value': 1.5404451, 'description': 'idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:', 'details': [{'value': 1, 'description': 'n, number of documents
containing term', 'details': []}, {'value': 6, 'description': 'N, total number of documents with field', 'details': []}]}, {'value': 0.44543427, 'description': 'tf
, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:', 'details': [{'value': 1.0, 'description': 'freq, occurrences of term within document', 'details
': []}, {'value': 1.2, 'description': 'k1, term saturation parameter', 'details': []}, {'value': 0.75, 'description': 'b, length normalization parameter', 'details
': []}, {'value': 7.0, 'description': 'dl, length of field', 'details': []}, {'value': 6.6666665, 'description': 'avgdl, average length of field', 'details': []}]}
]}]}, {'value': 1.0, 'description': 'ConstantScore(title.keyword:Spill and Overfill Prevention 18 AAC 78.045)', 'details': []}]}}
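
The per-term numbers in this explain output can be reproduced by hand from the formulas it prints. As a minimal sketch (the function name `bm25_term_score` is illustrative, not an Elasticsearch API), the following recomputes the weight(title:spill) entry from its listed inputs and then adds up the top-level values. The last value in that sum, 1.0, is the ConstantScore clause, which is why every matching hit ends up exactly 1 higher.

```python
import math

def bm25_term_score(boost, n, N, freq, k1, b, dl, avgdl):
    """Recompute one term score as boost * idf * tf, using the formulas printed by explain."""
    idf = math.log(1 + (N - n + 0.5) / (n + 0.5))
    tf = freq / (freq + k1 * (1 - b + b * dl / avgdl))
    return boost * idf * tf

# Inputs copied from the weight(title:spill in 2) entry of the explain output.
spill = bm25_term_score(boost=2.2, n=2, N=6, freq=1.0, k1=1.2, b=0.75,
                        dl=7.0, avgdl=6.6666665)

# Top-level 'value' entries of the explanation, in the order they appear.
top_level = [
    0.15005009, 0.3779109, 0.3779109,   # legal_language: and, 18, aac
    1.0089812, 0.072622515, 1.0089812,  # title: spill, and, overfill
    1.0089812, 0.072622515, 0.072622515, 1.5095675,  # title: prevention, 18, aac, 78.045
    1.0,                                # ConstantScore clause
]
total = sum(top_level)

print(spill, total)
```

Up to floating-point rounding, the two printed numbers should agree with the 1.0089812 and 6.6602507 reported by explain.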

Attempt with script_score:

{'query': {
    'function_score': {
        'query': {
            'bool': {
                'should': [
                    {'match': {'legal_language': 'inspections and testing 691'}},
                    {'match': {'title': 'inspections and testing 691'}}
                ]
            }
        },
        'script_score': {
            'script': {'source': "doc['title'].value"}
        }
    }
}}

Mapping

{
"articles" : {
"mappings" : {
"properties" : {
"abstract" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"categories" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"cfr40_part280" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"citation" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"effective_date" : {
"type" : "date"
},
"id" : {
"type" : "long"
},
"legal_language" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"local_regulation" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"reference_images" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"state" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"tags" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"title" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"unique_id" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
}
}
}
}
}

Traceback

Traceback (most recent call last):
  File "D:\work_projects\dewey_project\webapp\articles\services\elasticsearch_service.py", line 103, in retrieve_articles
    result = current_app.elasticsearch.search(
  File "d:\work_projects\dewey_project\venv\lib\site-packages\elasticsearch\client\utils.py", line 84, in wrapped
    return func(*args, params=params,**kwargs)
  File "d:\work_projects\dewey_project\venv\lib\site-packages\elasticsearch\client\__init__.py", line 1547, in search
    return self.transport.perform_request(
  File "d:\work_projects\dewey_project\venv\lib\site-packages\elasticsearch\transport.py", line 351, in perform_request
    status, headers_response, data = connection.perform_request(
  File "d:\work_projects\dewey_project\venv\lib\site-packages\elasticsearch\connection\http_urllib3.py", line 261, in perform_request
    self._raise_error(response.status, raw_data)
  File "d:\work_projects\dewey_project\venv\lib\site-packages\elasticsearch\connection\base.py", line 181, in _raise_error
    raise HTTP_EXCEPTIONS.get(status_code, TransportError)(
elasticsearch.exceptions.RequestError: RequestError(400, 'search_phase_execution_exception', 'runtime error')
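
The 'runtime error' text in this traceback is only the top-level summary. The response body of a failed search usually nests the real cause under `caused_by`. Below is a minimal sketch for digging it out. The helper name `collect_reasons` and the sample body are hypothetical: the sample only mimics the general shape of an Elasticsearch error response and is not the actual error from this question. With elasticsearch-py 7.x, the real body should be available as the `info` attribute of the caught `RequestError`.

```python
def collect_reasons(node):
    """Walk a nested error body and return every (type, reason) pair with a string reason."""
    found = []
    if isinstance(node, dict):
        if isinstance(node.get("reason"), str):
            found.append((node.get("type"), node["reason"]))
        for value in node.values():
            found.extend(collect_reasons(value))
    elif isinstance(node, list):
        for item in node:
            found.extend(collect_reasons(item))
    return found

# Hypothetical sample, shaped like the body of a failed search response.
sample_info = {
    "error": {
        "type": "search_phase_execution_exception",
        "reason": "all shards failed",
        "failed_shards": [
            {
                "shard": 0,
                "index": "articles",
                "reason": {
                    "type": "script_exception",
                    "reason": "runtime error",
                    "caused_by": {
                        "type": "illegal_argument_exception",
                        "reason": "placeholder for the real cause",
                    },
                },
            }
        ],
    },
    "status": 400,
}

reasons = collect_reasons(sample_info)
for err_type, reason in reasons:
    print(f"{err_type}: {reason}")
```

In the code from the question this would be called as `collect_reasons(e.info)` inside an `except RequestError as e:` block around the search call. One plausible cause, which the traceback alone does not confirm, is that `doc['title']` refers to a text field, and that script_score expects the script to return a number rather than a string.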

zvokhttg1#

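I'm not quite sure what you are trying to achieve, but it looks like you want documents to get a tf/idf score based only on matches in the title field, while also applying other constraints to the query. If so, you should use the filter clause of the bool query. Filter clauses do not modify the score; they only filter the results according to whether they match.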

{
    "bool": {
        "should": [
            {"match": {"title": "Spill and Overfill Prevention 18 AAC 78.045"}}
        ],
        "filter": [
            {"match": {"abstract": "Spill and Overfill Prevention 18 AAC 78.045"}},
            {"terms": {"state.keyword": ["Alaska", "Alabama"]}}
        ]
    }
}

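This returns slightly different results from your original query, because it requires the abstract field to match the query Spill and Overfill Prevention 18 AAC 78.045. If you want to keep the behavior of the original query, move that clause into should as a constant_score query: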

{
    "query": {
        "bool": {
            "should": [
                {"match": {"title": "Spill and Overfill Prevention 18 AAC 78.045"}},
                {"constant_score": {
                    "filter": {
                        "match": {"legal_language": "Spill and Overfill Prevention 18 AAC 78.045"}
                    }
                }}
            ],
            "filter": [
                {"terms": {"state.keyword": ["Alaska", "Alabama"]}}
            ]
        }
    }
}

Then subtract 1 from the resulting score.
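
Subtracting 1 is only correct for hits that actually matched the constant_score clause; a document that matched on title alone never received the extra point. One way to tell the two cases apart, sketched below, is to label the clause with `_name` and check each hit's `matched_queries`. The label `legal_const`, the helper names, and the sample hits are made up for illustration, and it is worth confirming on your Elasticsearch version that `_name` is accepted at this position.

```python
CONST_NAME = "legal_const"  # hypothetical label for the constant_score clause

def named_constant_clause(text):
    """Build the constant_score clause from the answer, tagged with a query name."""
    return {
        "constant_score": {
            "filter": {"match": {"legal_language": text}},
            "_name": CONST_NAME,
        }
    }

def title_only_scores(hits):
    """Return {doc_id: score}, removing the constant 1.0 where the named clause matched."""
    adjusted = {}
    for hit in hits:
        score = hit["_score"]
        if CONST_NAME in hit.get("matched_queries", []):
            score -= 1.0  # default constant_score boost
        adjusted[hit["_id"]] = score
    return adjusted

# Hypothetical hits: "a" matched the constant clause, "b" matched on title only.
sample_hits = [
    {"_id": "a", "_score": 3.5, "matched_queries": ["legal_const"]},
    {"_id": "b", "_score": 2.0},
]
adjusted = title_only_scores(sample_hits)
print(adjusted)
```

In real use, `hits` would be the `response['hits']['hits']` list returned by the search, and `named_constant_clause(...)` would replace the unnamed constant_score entry in should.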

e0bqpujr2#

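If you need to control the scoring process, there is the function_score query, which lets you customize or replace the _score of the original query. See the function_score query documentation.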
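
As a rough sketch of what that could look like for the query in the question: the bool query does the matching, and a function with a filter applies a fixed weight to documents whose title matches. The weight of 5, the choice of boost_mode "multiply", and the helper name `boosted_title_query` are assumptions for illustration, not something given in the question or the answer.

```python
def boosted_title_query(text, states, title_weight=5):
    """Build a function_score body: a bool query for matching, a weight function for title matches."""
    return {
        "query": {
            "function_score": {
                "query": {
                    "bool": {
                        "should": [
                            {"match": {"title": text}},
                            {"match": {"abstract": text}},
                        ],
                        "filter": [{"terms": {"state.keyword": states}}],
                    }
                },
                # The function only applies to documents matching its filter.
                "functions": [
                    {"filter": {"match": {"title": text}}, "weight": title_weight}
                ],
                # Combine the query score and the function value by multiplication.
                "boost_mode": "multiply",
            }
        }
    }

body = boosted_title_query(
    "Spill and Overfill Prevention 18 AAC 78.045", ["Alaska", "Alabama"]
)
print(body["query"]["function_score"]["functions"])
```

The resulting dict can be passed as the request body of the same search call the question uses. Running explain on a hit is a reasonable way to check how the weight is actually combined with the query score.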
