Lucene grouping collector does not support SORTED_NUMERIC fields

Posted by lhcgjxsq on 2023-03-12 in Lucene

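The following function collects the available filter options from the search results, but it only works for NUMERIC doc values fields, not for SORTED_NUMERIC: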

fun IndexSearcher.collectNumericFilterOptions(
    query: Query,
    sort: Sort = Sort(),
    field: String,
    topNGroups: Int = 128,
    mapper: Function<Int?, Int?> = Function { it },
): Set<Int?> {
    val firstGroupSelector = LongRangeGroupSelector(LongValuesSource.fromIntField(field), LongRangeFactory(1, 1, topNGroups.toLong()))
    val firstPassGroupingCollector = FirstPassGroupingCollector(firstGroupSelector, sort, topNGroups)
    search(query, firstPassGroupingCollector)
    val topGroups = firstPassGroupingCollector.getTopGroups(0) ?: return emptySet()
    val groupSelector = firstPassGroupingCollector.groupSelector
    val distinctValuesCollector = DistinctValuesCollector(groupSelector, topGroups, groupSelector)
    search(query, distinctValuesCollector)
    return distinctValuesCollector.groups.map { mapper.apply(it.groupValue.min.toInt()) }.toSet()
}

It works when the field is indexed like this:

document.apply {
    add(IntPoint(ITERATION.name, iteration ?: -1))
    add(StoredField(ITERATION.name, iteration ?: -1))
    add(NumericDocValuesField(ITERATION.name, iteration?.toLong() ?: -1L)) 
}

But it fails when the field is indexed like this:

document.apply {
    add(IntPoint(ITERATION.name, iteration ?: -1))
    add(StoredField(ITERATION.name, iteration ?: -1))
    add(SortedNumericDocValuesField(ITERATION.name, iteration?.toLong() ?: -1L))
}

The following exception is thrown:

java.lang.IllegalStateException: unexpected docvalues type SORTED_NUMERIC for field 'ITERATION' (expected=NUMERIC). Re-index with correct docvalues type.

    at org.apache.lucene.index.DocValues.checkField(DocValues.java:218)
    at org.apache.lucene.index.DocValues.getNumeric(DocValues.java:237)
    at org.apache.lucene.search.LongValuesSource$FieldValuesSource.getValues(LongValuesSource.java:254)
    at org.apache.lucene.search.grouping.LongRangeGroupSelector.setScorer(LongRangeGroupSelector.java:63)
    at org.apache.lucene.search.grouping.FirstPassGroupingCollector.setScorer(FirstPassGroupingCollector.java:157)

Is there a way to make this work for SORTED_NUMERIC fields as well?

Answer #1 (c0vxltue)

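I implemented a custom LongFieldValuesSource that reads the field through DocValues.getSortedNumeric. It can be passed to LongRangeGroupSelector in place of LongValuesSource.fromIntField(field):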

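import java.io.IOException
import org.apache.lucene.index.DocValues
import org.apache.lucene.index.LeafReaderContext
import org.apache.lucene.index.SortedNumericDocValues
import org.apache.lucene.search.DoubleValues
import org.apache.lucene.search.IndexSearcher
import org.apache.lucene.search.LongValues
import org.apache.lucene.search.LongValuesSource

// LongValuesSource backed by SORTED_NUMERIC doc values instead of NUMERIC.
data class LongFieldValuesSource(val field: String) : LongValuesSource() {
    // `scores` is declared nullable: it may be null when needsScores() is false.
    @Throws(IOException::class)
    override fun getValues(ctx: LeafReaderContext, scores: DoubleValues?) =
        toLongValues(DocValues.getSortedNumeric(ctx.reader(), field))

    override fun isCacheable(ctx: LeafReaderContext) = DocValues.isCacheable(ctx, field)

    override fun needsScores() = false

    @Throws(IOException::class)
    override fun rewrite(searcher: IndexSearcher) = this

    private fun toLongValues(sortedNumericDocValues: SortedNumericDocValues) =
        object : LongValues() {
            // Returns the first (smallest) value of the current document.
            // Call it at most once per successful advanceExact().
            @Throws(IOException::class)
            override fun longValue() = sortedNumericDocValues.nextValue()

            @Throws(IOException::class)
            override fun advanceExact(target: Int) = sortedNumericDocValues.advanceExact(target)
        }
}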
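
To show how this could be wired into the function from the question, here is a minimal sketch. The name collectFilterOptions and the source parameter are hypothetical (they are not in the original code); the grouping calls are the same ones the question already uses. Taking the LongValuesSource as a parameter is just one possible design that lets the same function serve both doc values types.

import org.apache.lucene.search.IndexSearcher
import org.apache.lucene.search.LongValuesSource
import org.apache.lucene.search.Query
import org.apache.lucene.search.Sort
import org.apache.lucene.search.grouping.DistinctValuesCollector
import org.apache.lucene.search.grouping.FirstPassGroupingCollector
import org.apache.lucene.search.grouping.LongRangeFactory
import org.apache.lucene.search.grouping.LongRangeGroupSelector

// Hypothetical variant of collectNumericFilterOptions: the caller chooses
// the LongValuesSource, so the doc values type is no longer hard-coded.
fun IndexSearcher.collectFilterOptions(
    query: Query,
    source: LongValuesSource,
    sort: Sort = Sort(),
    topNGroups: Int = 128,
): Set<Int> {
    // Same range layout as in the question: buckets of width 1 starting at 1.
    val selector = LongRangeGroupSelector(source, LongRangeFactory(1, 1, topNGroups.toLong()))

    // First pass: find the top groups for the query.
    val firstPass = FirstPassGroupingCollector(selector, sort, topNGroups)
    search(query, firstPass)
    val topGroups = firstPass.getTopGroups(0) ?: return emptySet()

    // Second pass: collect the distinct values, as in the question.
    val groupSelector = firstPass.groupSelector
    val distinct = DistinctValuesCollector(groupSelector, topGroups, groupSelector)
    search(query, distinct)
    return distinct.groups.map { it.groupValue.min.toInt() }.toSet()
}

For a SORTED_NUMERIC field the call would be searcher.collectFilterOptions(query, LongFieldValuesSource(field)); for a NUMERIC field, LongValuesSource.fromIntField(field) can still be passed as before.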
