Solr/Lucene similarity class for short text

9fkzdhlc posted on 2021-07-12 in Java

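Suppose I have two documents with an indexed field "name" whose values are:
Doc 1: Manfred Müller
Doc 2: Manfred Maximilian Müller
When I search for "Müller" on a Solr instance with two shards/replicas, doc1 sometimes scores higher than doc2 and sometimes the reverse. Which similarity class reliably scores doc1 higher than doc2 (shorter is better)?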
Query:

rows=100&fl=*,score&q=_query_:"{!complexphrase+df%3Dname}Müller"+OR+_query_:"{!complexphrase+df%3Dname_exakt}Müller"&fq=(quelle:GP)&sort=score+DESC

Fields:

<field name="name" type="komplexes_textfeld" indexed="true" stored="true"/>
<field name="name_exakt" type="einfaches_textfeld" indexed="true" stored="true"/>
<copyField source="name" dest="name_exakt"/>

Request handler:

<requestHandler name="/select" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="echoParams">explicit</str>
      <int name="rows">10</int>
      <str name="qf">name_exakt^3 name^2</str>

Field types:

<fieldType name="einfaches_textfeld" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
        <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-chars.txt"/>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.ASCIIFoldingFilterFactory" preserveOriginal="false"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <!-- <filter class="solr.GermanNormalizationFilterFactory"/> -->
    </analyzer>

    <similarity class="org.apache.lucene.search.similarities.BM25Similarity"/>
</fieldType>

<fieldType name="komplexes_textfeld" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
        <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-chars.txt"/>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.ASCIIFoldingFilterFactory" preserveOriginal="false"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
        <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="50" preserveOriginal="true"/>
    </analyzer>

    <analyzer type="query">
        <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-chars.txt"/>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.ASCIIFoldingFilterFactory" preserveOriginal="false"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    </analyzer>

    <similarity class="org.apache.lucene.search.similarities.BM25Similarity"/>
</fieldType>
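
For reference, the "shorter is better" behaviour of BM25 comes from its length normalization, governed by the parameter b (0 disables it, 1 applies it fully), while k1 controls term-frequency saturation. Below is a minimal sketch of how the two could be set explicitly. It assumes Solr's solr.BM25SimilarityFactory is used in place of the raw Lucene class in the field types above. The field type name is hypothetical and the values are the Lucene defaults, not a recommendation. Whether tuning them alone makes the ranking stable is not established here, since each shard may compute its length and document-frequency statistics from its own local index.

```xml
<!-- Sketch only: "einfaches_textfeld_bm25" is a hypothetical name, not part of the schema above -->
<fieldType name="einfaches_textfeld_bm25" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.ASCIIFoldingFilterFactory" preserveOriginal="false"/>
        <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>

    <!-- k1: term-frequency saturation; b: length normalization (0 = off, 1 = full) -->
    <similarity class="solr.BM25SimilarityFactory">
        <float name="k1">1.2</float>
        <float name="b">0.75</float>
    </similarity>
</fieldType>
```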
