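I'm new to Lucene and may well be doing something wrong; if so, please correct me. I've been looking for an answer for several days and am not sure how to proceed.
The goal is to use Lucene.NET to search user names by partial match (like StartsWith) and to highlight only the part that was found.
Here is how I approached it.
First, index creation: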
using var indexDir = FSDirectory.Open(Path.Combine(IndexDirectory, IndexName));
using var standardAnalyzer = new StandardAnalyzer(CurrentVersion);
var indexConfig = new IndexWriterConfig(CurrentVersion, standardAnalyzer);
indexConfig.OpenMode = OpenMode.CREATE_OR_APPEND;
using var indexWriter = new IndexWriter(indexDir, indexConfig);
if (indexWriter.NumDocs == 0)
{
    //fill the index with Documents
}
Documents are built as follows:
static Document BuildClientDocument(int id, string surname, string name)
{
    var document = new Document()
    {
        new StringField("Id", id.ToString(), Field.Store.YES),
        new TextField("Surname", surname, Field.Store.YES),
        new TextField("Surname_sort", surname.ToLower(), Field.Store.NO),
        new TextField("Name", name, Field.Store.YES),
        new TextField("Name_sort", name.ToLower(), Field.Store.NO),
    };
    return document;
}
The search looks like this:
using var multiReader = new MultiReader(indexWriter.GetReader(true)); //the plan was to use multiple indexes per entity types
var indexSearcher = new IndexSearcher(multiReader);
var queryString = "abc"; //just as a sample
var queryWords = queryString.SplitWords();
var query = new BooleanQuery();
queryWords
    .Process((word, index) =>
    {
        var boolean = new BooleanQuery()
        {
            { new PrefixQuery(new Term("Surname", word)) { Boost = 100 }, Occur.SHOULD }, //surnames are most important to match
            { new PrefixQuery(new Term("Name", word)) { Boost = 50 }, Occur.SHOULD }, //names are less important
        };
        boolean.Boost = (queryWords.Count() - index); //first words in a search query are more important than others
        query.Add(boolean, Occur.MUST);
    });
var topDocs = indexSearcher.Search(query, 50, new Sort( //sort by relevance and then in lexicographical order
    SortField.FIELD_SCORE,
    new SortField("Surname_sort", SortFieldType.STRING),
    new SortField("Name_sort", SortFieldType.STRING)
));
And the highlighting:
var htmlFormatter = new SimpleHTMLFormatter();
var queryScorer = new QueryScorer(query);
var highlighter = new Highlighter(htmlFormatter, queryScorer);
foreach (var found in topDocs.ScoreDocs)
{
    var document = indexSearcher.Doc(found.Doc);
    var surname = document.Get("Surname"); //just for simplicity
    var surnameFragment = highlighter.GetBestFragment(standardAnalyzer, "Surname", surname);
    Console.WriteLine(surnameFragment);
}
The problem is that the highlighter returns results like this:
<b>abc</b>
<b>abcd</b>
<b>abcde</b>
<b>abcdef</b>
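So it "highlights" the whole word even though I am searching for only part of it. Explain returned NON-MATCH for every hit, so I'm not sure whether it is of any help here.
Is it possible to highlight only the searched part, as in my example?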
1 answer
Answer #1, by kpbwa7wx
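After looking into this further, I came to the conclusion that for this kind of highlighting to work, the index generation has to be changed so that words are split into parts at index time and the offsets are calculated correctly. Otherwise the highlighter will only highlight the whole surrounding words (fragments).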
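To make "splitting by parts" concrete, here is a minimal, library-free sketch of what such index-time splitting would produce: every prefix of every word becomes its own term, and each term carries the character offsets of just that prefix. `EdgePrefixSketch`, `EdgePrefixes`, and the tuple layout are hypothetical names used for illustration only; they are not Lucene.NET APIs, and splitting on whitespace is a simplifying assumption rather than what StandardAnalyzer does.

```csharp
using System;
using System.Collections.Generic;

public static class EdgePrefixSketch
{
    // Hypothetical helper (not a Lucene.NET API): emits every prefix of every
    // whitespace-separated word, lower-cased, together with the character
    // offsets [Start, End) of that prefix in the original text.
    public static List<(string Term, int Start, int End)> EdgePrefixes(string text, int minLength = 1)
    {
        var result = new List<(string Term, int Start, int End)>();
        int i = 0;
        while (i < text.Length)
        {
            // Skip whitespace between words.
            if (char.IsWhiteSpace(text[i])) { i++; continue; }

            // Find the end of the current word.
            int wordStart = i;
            while (i < text.Length && !char.IsWhiteSpace(text[i])) i++;
            int wordLength = i - wordStart;

            // Emit one term per prefix length; the end offset grows with the prefix.
            for (int len = minLength; len <= wordLength; len++)
            {
                string term = text.Substring(wordStart, len).ToLowerInvariant();
                result.Add((term, wordStart, wordStart + len));
            }
        }
        return result;
    }
}
```

For the text "Abcdef", the term "abc" comes out with offsets 0 to 3, which is exactly the span a highlighter would need to wrap. A whole-word token, by contrast, only ever offers offsets 0 to 6.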
Based on that conclusion, I managed to build a simple highlighter of my own.
Usage is as follows:
The output looks like this:
First
I'm open to other, better solutions.
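This is not the highlighter from the answer above. As a rough illustration of the same idea, here is a minimal sketch that bypasses the Lucene highlighter entirely and marks only the leading part of each word that starts with one of the query words. `PrefixHighlighterSketch` and `HighlightPrefixes` are hypothetical names; the sketch assumes words are separated by whitespace and that matching is case-insensitive, which may not line up exactly with how the analyzer tokenized the indexed text.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text;

public static class PrefixHighlighterSketch
{
    // Hypothetical helper: wraps the leading part of each word in <b>...</b>
    // when the word starts with one of the query words (case-insensitive).
    public static string HighlightPrefixes(string text, IEnumerable<string> queryWords)
    {
        var output = new StringBuilder();
        int i = 0;
        while (i < text.Length)
        {
            // Copy whitespace through unchanged.
            if (char.IsWhiteSpace(text[i])) { output.Append(text[i]); i++; continue; }

            // Cut out the current word.
            int wordStart = i;
            while (i < text.Length && !char.IsWhiteSpace(text[i])) i++;
            string word = text.Substring(wordStart, i - wordStart);

            // Pick the longest query word that is a prefix of this word.
            int matchLength = 0;
            foreach (var q in queryWords)
            {
                if (q.Length > matchLength && word.StartsWith(q, StringComparison.OrdinalIgnoreCase))
                    matchLength = q.Length;
            }

            // Emit the matched prefix in bold, then the rest of the word as-is.
            if (matchLength > 0)
            {
                output.Append("<b>")
                      .Append(WebUtility.HtmlEncode(word.Substring(0, matchLength)))
                      .Append("</b>");
            }
            output.Append(WebUtility.HtmlEncode(word.Substring(matchLength)));
        }
        return output.ToString();
    }
}
```

With the question's sample, `PrefixHighlighterSketch.HighlightPrefixes("abcdef", new[] { "abc" })` returns `<b>abc</b>def`. In the search loop it could be applied to the stored value from `document.Get("Surname")`, using the same query words that were fed into the PrefixQuery instances.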