How to get the distance score when searching a DenseVectorField in Solr 9

2sbarzqh  posted on 2024-01-07 in Solr

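I created a Solr index (version 9.3.0) of some poems and nursery rhymes. I am trying to search for related poems and nursery rhymes, and I would like to get the dot-product distance for each matching document. I cannot find any way to get that information. Here are the fields I added to Solr in the managed-schema file: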

    <fieldType name="knn_vector" class="solr.DenseVectorField" vectorDimension="384"
               similarityFunction="dot_product" knnAlgorithm="hnsw"
               hnswMaxConnections="16" hnswBeamWidth="50"/>
    <field name="bge_small_vector" type="knn_vector" indexed="true" stored="true"/>

Here is the Python code I use to query the Solr index:

    import pprint

    import pysolr
    from encoder import Encoder  # the asker's local module; not used below
    from sentence_transformers import SentenceTransformer

    pp = pprint.PrettyPrinter(indent=4, width=100)
    solr = pysolr.Solr('http://localhost:8983/solr/docindex')
    model = SentenceTransformer('BAAI/bge-small-en-v1.5')

    document = '''Three blind mice. Three blind mice.
    See how they run. See how they run.
    They all ran after the farmer's wife,
    Who cut off their tails with a carving knife.
    Did you ever see such a sight in your life
    As three blind mice?'''

    # Encode the query text as a unit-length 384-dimensional vector.
    embedding = model.encode(document, normalize_embeddings=True, convert_to_numpy=True)

    # Build the {!knn} query: the vector is passed as a bracketed list of floats.
    solr_response = solr.search(
        q=r'{!knn f=bge_small_vector topK=10}[' + ",".join([f'{a:.12f}' for a in embedding]) + ']',
        rows=10,
        start=0,
        debugQuery="true",
        wt='json')

    for item in solr_response:
        pp.pprint(item)
    pp.pprint(solr_response.debug)


The only reference to a distance that I can find is in the debug response, and it is not tied to any particular document:

    {   'QParser': 'KnnQParser',
        'explain': {'': '\n**0.81944466 = within top 10**\n'},
        'parsedquery': 'KnnVectorQuery(KnnVectorQuery:bge_small_vector[-0.02721269,...][10])',
        'parsedquery_toString': 'KnnVectorQuery:bge_small_vector[-0.02721269,...][10]',
        ...
    }


Does anyone know how to make Solr return the distance for each document in a DenseVectorField query?

9bfwbjaz  #1

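The documentation at https://opensearch.org/docs/latest/search-plugins/knn/approximate-knn/ shows how OpenSearch converts distances to scores. I just tested this for L2 distance: score = 1 / (1 + distance), so distance = (1 / score) - 1. For Euclidean distance, you may need to take the square root of the result.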
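For the dot_product field in the question, the same idea can be sketched in Python. This is a minimal sketch, not a confirmed Solr recipe, and it rests on two assumptions: that adding the standard `score` pseudo-field to `fl` returns the per-document kNN score, and that Solr uses Lucene's scoring formulas, score = (1 + dot) / 2 for dot_product and score = 1 / (1 + squared distance) for euclidean. The helper names (`knn_query`, `score_to_dot_product`, `score_to_l2_distance`, `search_with_scores`) are made up for illustration.

```python
import math


def knn_query(field, vector, top_k=10):
    """Build a Solr {!knn} query string for a dense vector field."""
    values = ",".join(f"{v:.12f}" for v in vector)
    return f"{{!knn f={field} topK={top_k}}}[{values}]"


def score_to_dot_product(score):
    """Invert score = (1 + dot) / 2 (assumed Lucene dot_product scoring)."""
    return 2.0 * score - 1.0


def score_to_l2_distance(score):
    """Invert score = 1 / (1 + d^2) (assumed Lucene euclidean scoring)."""
    squared = 1.0 / score - 1.0
    # Clamp tiny negative values caused by floating-point rounding.
    return math.sqrt(max(squared, 0.0))


def search_with_scores(solr, field, vector, top_k=10):
    """Run a kNN search and attach the recovered dot product to each hit.

    `solr` is expected to behave like pysolr.Solr: search(q, **params)
    returns an iterable of dict-like documents.
    """
    results = solr.search(
        knn_query(field, vector, top_k),
        rows=top_k,
        fl="*,score",  # ask Solr to include the score with every document
    )
    return [
        dict(doc, dot_product=score_to_dot_product(doc["score"]))
        for doc in results
    ]
```

With the objects from the question, the call would be `search_with_scores(solr, 'bge_small_vector', embedding)`. If the 0.81944466 in the debug output is a score of this kind, it would correspond to a dot product of roughly 0.639 under the assumed formula.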
