How to boost the score when there is an exact match in a list in Elasticsearch?

uqdfh47h posted on 2023-11-17 in ElasticSearch
Answers (2) | Views (225)

I'm new to Elasticsearch and I have the following problem.
I have these two documents:

    POST test/_doc/1
    {
      "id": 1,
      "authors": [
        {
          "name": "Test Name",
          "url": "/url/1/"
        }
      ]
    }

    POST test/_doc/2
    {
      "id": 2,
      "authors": [
        {
          "name": "Test Name",
          "url": "/url/1/"
        },
        {
          "name": "Another author",
          "url": "/url/another/"
        }
      ]
    }

And this query:

    GET test/_search
    {
      "query": {
        "function_score": {
          "query": {
            "bool": {
              "should": [
                {
                  "match_phrase": {
                    "authors.name": {
                      "_name": "exact match in authors",
                      "query": "Test Name",
                      "boost": 100,
                      "slop": 1
                    }
                  }
                }
              ]
            }
          }
        }
      }
    }
  23. }


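Why is the score lower when there are multiple authors? How can I make it higher, or at least the same as for the document with only one author?

Response: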

    {
      ...
      "hits": {
        "max_score": 42.221836,
        "hits": [
          {
            "_score": 42.221836,
            "_source": {
              "id": 1,
              "authors": [
                {
                  "name": "Test Name",
                  "url": "/url/1/"
                }
              ]
            },
            "matched_queries": [
              "exact match in authors"
            ]
          },
          {
            "_score": 32.088596,
            "_source": {
              "id": 2,
              "authors": [
                {
                  "name": "Test Name",
                  "url": "/url/1/"
                },
                {
                  "name": "Another author",
                  "url": "/url/another/"
                }
              ]
            },
            "matched_queries": [
              "exact match in authors"
            ]
          }
        ]
      }
    }


I couldn't find anything about this in the documentation.


kqqjbcuj1#

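TL;DR

This happens because the authors.name field of your second document is longer. You may want to look at:

  • constant_score
  • filter
  • function_score

Understanding the problem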

What does this mean?
When Elasticsearch indexes an array of objects, it flattens them like this.
Originally:

    {
      "authors": [
        {
          "name": "A0"
        },
        {
          "name": "A1"
        }
      ]
    }

Becomes:

    {
      "authors.name": ["A0", "A1"]
    }


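The document score is computed with BM25, Elasticsearch's default similarity (a TF/IDF-style function). Its TF component is normalized by the length of the field, so the same single phrase match counts for less in a longer field.

  • Doc 1: authors.name has a length of 2 tokens (test, name)
  • Doc 2: authors.name has a length of 4 tokens (test, name, another, author)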
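The flattening and the resulting field lengths can be sketched in a few lines of Python. This is a simplified model, not Elasticsearch's code. The names flatten, tokenize and field_length are hypothetical helpers. The tokenizer (lowercase, split on non-alphanumeric characters) only approximates the standard analyzer. For a multi-valued text field, the length is the number of tokens summed across all values.

```python
import re
from typing import Any


def flatten(obj: Any, prefix: str = "") -> dict[str, list[Any]]:
    """Flatten nested objects/arrays into dotted paths, like an `object` field is indexed."""
    out: dict[str, list[Any]] = {}
    if isinstance(obj, dict):
        for key, value in obj.items():
            path = f"{prefix}.{key}" if prefix else key
            for p, vals in flatten(value, path).items():
                out.setdefault(p, []).extend(vals)
    elif isinstance(obj, list):
        # Array elements keep the parent path, so their values are merged together.
        for item in obj:
            for p, vals in flatten(item, prefix).items():
                out.setdefault(p, []).extend(vals)
    else:
        out[prefix] = [obj]
    return out


def tokenize(text: str) -> list[str]:
    # Rough stand-in for the standard analyzer (assumption): lowercase, split on non-alphanumerics.
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]


def field_length(values: list[str]) -> int:
    # Length of a multi-valued text field: tokens summed over all of its values.
    return sum(len(tokenize(v)) for v in values)


doc1 = {"id": 1, "authors": [{"name": "Test Name", "url": "/url/1/"}]}
doc2 = {"id": 2, "authors": [{"name": "Test Name", "url": "/url/1/"},
                             {"name": "Another author", "url": "/url/another/"}]}

for doc in (doc1, doc2):
    names = flatten(doc)["authors.name"]
    print(doc["id"], names, field_length(names))
```

Both documents contain the phrase "test name" exactly once. Only the total length of authors.name differs, and that is the dl value the TF formula uses.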

Investigating

You can use the _explain API:

    GET 77469343/_explain/1
    {
      "query": {
        "bool": {
          "should": [
            {
              "match_phrase": {
                "authors.name": {
                  "_name": "exact match in authors",
                  "query": "Test Name",
                  "boost": 100,
                  "slop": 1
                }
              }
            }
          ]
        }
      }
    }


This gives the following results:

Document 1

    {
      "_index": "77469343",
      "_id": "1",
      "matched": true,
      "explanation": {
        "value": 42.221836,
        "description": """weight(authors.name:"test name"~1 in 0) [PerFieldSimilarity], result of:""",
        "details": [
          {
            "value": 42.221836,
            "description": "score(freq=1.0), computed as boost * idf * tf from:",
            "details": [
              { "value": 220, "description": "boost", "details": [] },
              {
                "value": 0.36464313,
                "description": "idf, sum of:",
                "details": [
                  {
                    "value": 0.18232156,
                    "description": "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:",
                    "details": [
                      { "value": 2, "description": "n, number of documents containing term", "details": [] },
                      { "value": 2, "description": "N, total number of documents with field", "details": [] }
                    ]
                  },
                  {
                    "value": 0.18232156,
                    "description": "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:",
                    "details": [
                      { "value": 2, "description": "n, number of documents containing term", "details": [] },
                      { "value": 2, "description": "N, total number of documents with field", "details": [] }
                    ]
                  }
                ]
              },
              {
                "value": 0.5263158,
                "description": "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:",
                "details": [
                  { "value": 1, "description": "phraseFreq=1.0", "details": [] },
                  { "value": 1.2, "description": "k1, term saturation parameter", "details": [] },
                  { "value": 0.75, "description": "b, length normalization parameter", "details": [] },
                  { "value": 2, "description": "dl, length of field", "details": [] },
                  { "value": 3, "description": "avgdl, average length of field", "details": [] }
                ]
              }
            ]
          }
        ]
      }
    }

Document 2

    {
      "_index": "77469343",
      "_id": "2",
      "matched": true,
      "explanation": {
        "value": 32.088596,
        "description": """weight(authors.name:"test name"~1 in 1) [PerFieldSimilarity], result of:""",
        "details": [
          {
            "value": 32.088596,
            "description": "score(freq=1.0), computed as boost * idf * tf from:",
            "details": [
              { "value": 220, "description": "boost", "details": [] },
              {
                "value": 0.36464313,
                "description": "idf, sum of:",
                "details": [
                  {
                    "value": 0.18232156,
                    "description": "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:",
                    "details": [
                      { "value": 2, "description": "n, number of documents containing term", "details": [] },
                      { "value": 2, "description": "N, total number of documents with field", "details": [] }
                    ]
                  },
                  {
                    "value": 0.18232156,
                    "description": "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:",
                    "details": [
                      { "value": 2, "description": "n, number of documents containing term", "details": [] },
                      { "value": 2, "description": "N, total number of documents with field", "details": [] }
                    ]
                  }
                ]
              },
              {
                "value": 0.40000004,
                "description": "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:",
                "details": [
                  { "value": 1, "description": "phraseFreq=1.0", "details": [] },
                  { "value": 1.2, "description": "k1, term saturation parameter", "details": [] },
                  { "value": 0.75, "description": "b, length normalization parameter", "details": [] },
                  { "value": 4, "description": "dl, length of field", "details": [] },
                  { "value": 3, "description": "avgdl, average length of field", "details": [] }
                ]
              }
            ]
          }
        ]
      }
    }
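The numbers in both explanations can be reproduced by hand. The Python sketch below plugs the values reported by _explain (n, N, dl, avgdl, k1, b) into the formulas printed in the description fields. The helper names are hypothetical. One assumption: the boost of 220 in the output is taken to be the query boost of 100 multiplied by (k1 + 1) = 2.2, which matches the value shown.

```python
import math


def bm25_idf(n: int, N: int) -> float:
    # "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5))"
    return math.log(1 + (N - n + 0.5) / (n + 0.5))


def bm25_tf(freq: float, dl: float, avgdl: float, k1: float = 1.2, b: float = 0.75) -> float:
    # "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl))"
    return freq / (freq + k1 * (1 - b + b * dl / avgdl))


def phrase_score(query_boost: float, term_stats: list[tuple[int, int]],
                 freq: float, dl: float, avgdl: float, k1: float = 1.2) -> float:
    # Assumption: the reported boost (220) is query_boost * (k1 + 1) = 100 * 2.2.
    boost = query_boost * (k1 + 1)
    # For a phrase, the idf is the sum of the idf of each term ("test" and "name").
    idf = sum(bm25_idf(n, N) for n, N in term_stats)
    return boost * idf * bm25_tf(freq, dl, avgdl, k1=k1)


stats = [(2, 2), (2, 2)]  # (n, N) for "test" and "name", as reported by _explain
avgdl = (2 + 4) / 2       # 3.0, the "avgdl" value in the explain output

print(round(phrase_score(100, stats, freq=1, dl=2, avgdl=avgdl), 4))  # document 1
print(round(phrase_score(100, stats, freq=1, dl=4, avgdl=avgdl), 4))  # document 2
```

The only input that differs between the two documents is dl (2 vs 4). That change alone moves tf from 1/1.9 ≈ 0.526 to 1/2.5 = 0.4, which explains the gap between 42.22 and 32.09.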

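Fix

Constant score

If you still want a score, you may want to look at the constant_score query. Every matching document then gets the same score (the constant_score boost, 1.2 here), regardless of field length. The boost of 100 inside the filter has no effect, because queries in filter context are not scored: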

    GET 77469343/_search
    {
      "query": {
        "bool": {
          "should": [
            {
              "constant_score": {
                "filter": {
                  "match_phrase": {
                    "authors.name": {
                      "_name": "exact match in authors",
                      "query": "Test Name",
                      "boost": 100,
                      "slop": 1
                    }
                  }
                },
                "boost": 1.2
              }
            }
          ]
        }
      }
    }

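Filter instead of should?

If you use a filter, matching documents don't contribute to the score at all (every hit gets a score of 0):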

    GET 77469343/_search
    {
      "query": {
        "bool": {
          "filter": [
            {
              "match_phrase": {
                "authors.name": {
                  "_name": "exact match in authors",
                  "query": "Test Name",
                  "boost": 100,
                  "slop": 1
                }
              }
            }
          ]
        }
      }
    }
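To compare the three options side by side, here is a small Python model of how each one scores the two documents. It is an illustration under simplifying assumptions, not Elasticsearch's implementation, and the function names are hypothetical. The should clause is modeled with the BM25 formula from the explain output. constant_score returns its own boost for every match. A filter-only bool returns matching documents with a score of 0. None stands for "not returned at all".

```python
import math
from typing import Optional

K1, B = 1.2, 0.75


def bm25_phrase(dl: float, avgdl: float, idf: float, boost: float = 100.0) -> float:
    """Simplified BM25 score for a phrase that occurs once (mirrors the _explain breakdown)."""
    tf = 1 / (1 + K1 * (1 - B + B * dl / avgdl))
    return boost * (K1 + 1) * idf * tf


def constant_score(matches: bool, boost: float = 1.2) -> Optional[float]:
    # Every matching doc gets exactly `boost`; the wrapped query is only used as a filter.
    return boost if matches else None


def filter_only(matches: bool) -> Optional[float]:
    # bool query with only `filter` clauses: matches are returned with score 0.
    return 0.0 if matches else None


idf = 2 * math.log(1 + (2 - 2 + 0.5) / (2 + 0.5))  # "test" + "name", with n = N = 2
docs = {1: 2, 2: 4}  # doc id -> length of authors.name in tokens
avgdl = sum(docs.values()) / len(docs)

for doc_id, dl in docs.items():
    print(doc_id, round(bm25_phrase(dl, avgdl, idf), 4), constant_score(True), filter_only(True))
```

Only the BM25 column depends on dl. The other two give both documents the same score, so the ordering between them falls back to other clauses or to index order.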


o4tp2gmn2#

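I tried @paulo's solution, but it didn't quite work for me, so I ended up mapping authors as a nested field. The nested type has to be set in the mapping up front: an existing object field can't be changed to nested in place, so the index has to be recreated and the data reindexed.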

    PUT test
    {
      "mappings": {
        "properties": {
          "authors": {
            "type": "nested",
            "properties": {
              "name": {
                "type": "text",
                "fields": {
                  "keyword": {
                    "type": "keyword",
                    "ignore_above": 256
                  }
                }
              },
              "url": {
                "type": "text",
                "fields": {
                  "keyword": {
                    "type": "keyword",
                    "ignore_above": 256
                  }
                }
              }
            }
          }
        }
      }
    }

And used this query:

    GET test/_search
    {
      "query": {
        "nested": {
          "path": "authors",
          "_name": "exact match in authors",
          "query": {
            "bool": {
              "must": {
                "match_phrase": {
                  "authors.name": {
                    "query": "Test Name",
                    "boost": 100,
                    "slop": 1
                  }
                }
              }
            }
          }
        }
      }
    }


Elasticsearch documentation: https://www.elastic.co/guide/en/elasticsearch/reference/current/nested.html
After these changes, it works well!
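Why the nested mapping helps: each object in a nested array is indexed as its own hidden document. The matching author's authors.name therefore has a length of 2 no matter how many authors the parent has.

The Python sketch below models this under simplifying assumptions:

  • Each child is scored with the same BM25 formula as the explain output.
  • N, n and avgdl are computed over the three author sub-documents only.
  • Slop is ignored.
  • The parent score is the average over matching children, which is the nested query's default score_mode (avg).

The helper names are hypothetical, and the absolute numbers are illustrative rather than what Elasticsearch will print.

```python
import math
from statistics import mean
from typing import Optional

K1, B = 1.2, 0.75


def tokenize(text: str) -> list[str]:
    # Simplified analyzer (assumption): lowercase and split on whitespace.
    return text.lower().split()


def contains_phrase(tokens: list[str], phrase: list[str]) -> bool:
    # Exact contiguous match; slop is ignored in this sketch.
    k = len(phrase)
    return any(tokens[i:i + k] == phrase for i in range(len(tokens) - k + 1))


def bm25_phrase(phrase: list[str], dl: int, docs: list[list[str]], boost: float = 100.0) -> float:
    """Simplified BM25 for a phrase occurring once; N, n and avgdl come from `docs`."""
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    idf = 0.0
    for term in phrase:
        n = sum(1 for d in docs if term in d)
        idf += math.log(1 + (N - n + 0.5) / (n + 0.5))
    tf = 1 / (1 + K1 * (1 - B + B * dl / avgdl))
    return boost * (K1 + 1) * idf * tf


def nested_score(names: list[str], query: str, children: list[list[str]]) -> Optional[float]:
    """Score each author sub-document on its own, then average the matching ones (score_mode avg)."""
    phrase = tokenize(query)
    scores = [bm25_phrase(phrase, len(toks), children)
              for toks in map(tokenize, names) if contains_phrase(toks, phrase)]
    return mean(scores) if scores else None


parents = {1: ["Test Name"], 2: ["Test Name", "Another author"]}
# Every author object becomes its own hidden document: 3 children in total.
children = [tokenize(name) for names in parents.values() for name in names]

for doc_id, names in parents.items():
    print(doc_id, nested_score(names, "Test Name", children))
```

In this model both parents get the same score. The "Another author" child never matches, so it neither lengthens the field nor enters the average.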
