Lucene/Luwak cannot match numeric values in NOT queries

xxb16uws  posted on 2022-11-07  in Lucene

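I have built a small Lucene/Luwak prototype: I register a query in Lucene syntax, and afterwards I supply an InputDocument, which should give me a result that matches that query.
For TextFields everything seems to work. However, when I try the same thing with numbers (DoublePoint), I never get a match (for NOT queries / reverse search).
With a text value it works: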

storeRuleQuery("ruleID_1" , "textA:* -textA:A");
textValues.put("textA" , "B");
And in console: Match in Luwak: ruleID_1:textA:* -textA:A

versus

storeRuleQuery("ruleID_1" , "numberA:* -numberA:500");
numberValues.put("numberA" , 900d);
And in console: No Match

Let me explain the code I am using.
First, I create a RAMDirectory for my monitor:

fsDirectory = new RAMDirectory();

I also define a field type:

private static final FieldType FIELD_TYPE = new FieldType();
static {
    // configure the shared field type once, when the class is loaded
    FIELD_TYPE.setStored(false);
    FIELD_TYPE.setIndexOptions(IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS);
}

Then I create my monitor:

QueryIndexConfiguration config = new QueryIndexConfiguration();
        config.storeQueries(true);
        monitor = new Monitor(new LuwakQueryParser(null, new KeywordAnalyzer(), number, text), new TermFilteredPresearcher(), fsDirectory, config);

To be able to use DoublePoints, I wrote my own query parser (LuwakQueryParser):

public class LuwakQueryParser implements MonitorQueryParser {

    private QueryParser parser = null;

    /**
     * Creates a parser with a given default field and analyzer
     * @param defaultField the default field
     * @param analyzer an analyzer to use to analyze query terms
     */
    public LuwakQueryParser(String defaultField, Analyzer analyzer, List<String> numbers, List<String> text) {
        this.parser = new RangeQueryParser(defaultField, analyzer, numbers, text);
        this.parser.setLowercaseExpandedTerms(false);
        this.parser.setAllowLeadingWildcard(true);
        this.parser.setDefaultOperator(Operator.OR);
    }

    @Override
    public Query parse(String query, Map<String, String> metadata) throws Exception {
        return parser.parse(query);
    }
}

As you can see, it delegates to a custom RangeQueryParser, which does the actual parsing of the query:

public class RangeQueryParser extends QueryParser {

    private final List<String> numbers;
    private final List<String> text;

    public RangeQueryParser(String f, Analyzer a, List<String> numbers, List<String> text) {
        super(f, a);
        this.numbers = numbers;
        this.text = text;
    }

    @Override
    protected Query newFieldQuery(Analyzer analyzer, String field, String queryText, boolean quoted) throws ParseException {
        if (StringUtils.isNotBlank(queryText) && isNumber(field) && NumberUtils.isNumber(queryText)) {
            //needed for single value, transforms it to a range (e.g. [500 TO 500])
            return (DoublePoint.newExactQuery(field, Double.parseDouble(queryText)));
        } else if(isText(field)){
            return (super.newFieldQuery(analyzer, field, queryText, quoted));
        }
        return (super.newFieldQuery(analyzer, field, queryText, quoted));
    }

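I have removed the code that is not needed for this example.
As you can see, the newFieldQuery method checks whether the field holds a text value or a numeric value and adjusts the query accordingly. Text is kept as a regular field query, while a number is converted with DoublePoint.newExactQuery. For example, "numberA:500" becomes "numberA:[500 TO 500]".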
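To make the clause semantics concrete, here is a minimal sketch in plain Java (no Lucene or Luwak involved). It models an exact numeric match as a closed range whose two bounds coincide, and a query of the form "positive -negative" as "the optional clause matches and the prohibited clause does not". The class and method names (ClauseModel, range, exact, matches) are hypothetical and only serve the illustration; they are not part of any library.

```java
import java.util.function.DoublePredicate;

/** Plain-Java model of the query clauses; hypothetical names, no Lucene involved. */
final class ClauseModel {

    // A closed range [lower TO upper] over double values.
    static DoublePredicate range(double lower, double upper) {
        return v -> v >= lower && v <= upper;
    }

    // An exact match is a range whose bounds coincide, e.g. [500 TO 500].
    static DoublePredicate exact(double value) {
        return range(value, value);
    }

    // "positive -negative": with no required clauses, the optional clause
    // has to match and the prohibited clause must not match.
    static boolean matches(DoublePredicate should, DoublePredicate mustNot, double docValue) {
        return should.test(docValue) && !mustNot.test(docValue);
    }

    public static void main(String[] args) {
        DoublePredicate is500 = exact(500d);
        DoublePredicate anyDouble = range(-Double.MAX_VALUE, Double.MAX_VALUE);
        DoublePredicate matchesNothing = v -> false;

        System.out.println(matches(anyDouble, is500, 900d));      // positive clause matches, value is not 500
        System.out.println(matches(anyDouble, is500, 500d));      // prohibited clause matches
        System.out.println(matches(matchesNothing, is500, 900d)); // positive clause never matches
    }
}
```

The last call shows why the positive half of the query matters: if it matches nothing, the prohibited clause is irrelevant and the document can never match.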
Then I add a query to the monitor:

//input: storeRuleQuery("ruleID_1" , "numberA:* -numberA:500");

public void storeRuleQuery(String ruleID, String query) throws IOException, UpdateException {
        String queryString = query;
        if (queryString.trim().length() > 0) {
            MonitorQuery monitorQuery = new MonitorQuery(ruleID, queryString);
            monitor.deleteById(ruleID);
            monitor.update(monitorQuery);
        }
    }

This is the BooleanQuery created by the monitor.update() call:

Next, I want to match ruleID_1 by supplying an InputDocument, like this:

Map<String, Double> numberValues = new HashMap<>();
Map<String, String> textValues = new HashMap<>();

numberValues.put("numberA" , 900d);
InputDocument.Builder builder = InputDocument.builder("document_1");
        for(String numberField : numberValues.keySet()){
            builder.addField(new DoublePoint(numberField, (numberValues.get(numberField))));
        }

        for(String textField : textValues.keySet()){
            builder.addField(new Field(textField, (textValues.get(textField)), FIELD_TYPE));
        }

        // single-element list; avoids the raw ArrayList and double-brace initialization
        List<InputDocument> documents = Collections.singletonList(builder.build());
        DocumentBatch batch = DocumentBatch.of(documents);
        Matches<HighlightsMatch> matches;
        matches = monitor.match(batch, HighlightingMatcher.FACTORY);

This is the BooleanQuery created from this input document by our monitor.match() call:
Then I retrieve the matches (in this case I get 0 matches):

Set<Map<String, String>> matchingIds = new HashSet<>();
        for (DocumentMatches<HighlightsMatch> docMatches : matches) {
            for (HighlightsMatch match : docMatches) {
                MonitorQuery mq = monitor.getQuery(match.getQueryId());
                HashMap<String, String> q = new HashMap<>();
                q.put(match.getQueryId(), mq.getQuery());
                matchingIds.add(q);
            }
        }

        Map<String, String> results = new HashMap<>();

        for (Map<String, String> v : matchingIds) {
            results.put(v.keySet().iterator().next(), v.values().iterator().next());
        }

        for(String key : results.keySet()){
            System.out.println("Match in Luwak: " + key + ":" + results.get(key));
        }

The Luwak version I am using:

<dependency>
       <groupId>com.github.flaxsearch</groupId>
       <artifactId>luwak</artifactId>
       <version>1.5.0</version>
 </dependency>

Answer #1 (5lhxktic)

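Short answer: wildcards only match characters; they cannot be used on numbers. A DoublePoint field is indexed as a point value rather than as a term, so numberA:* has nothing to match, and without a matching positive clause the NOT query never matches. I now express the numeric wildcard as a range instead, e.g. numberA:[-Double.MAX_VALUE TO Double.MAX_VALUE]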
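One way to apply this without changing the stored rules by hand is to rewrite the query string before it is passed to the monitor. The following is a minimal sketch under stated assumptions: the helper NumericWildcardRewriter is hypothetical (not part of Lucene or Luwak), the list of numeric field names is the same one passed to the parser, and the parser is assumed to turn a range on a numeric field into a DoublePoint range query (that part of RangeQueryParser is not shown in the question). The regular expression is deliberately simple and does not handle quoted phrases or escaped characters.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper, not part of Lucene or Luwak. */
final class NumericWildcardRewriter {

    // Matches "<field>:*" where the lone "*" is the whole term, i.e. it is
    // followed by whitespace, a closing parenthesis, or the end of the input.
    private static final Pattern FIELD_STAR =
            Pattern.compile("(?<![\\w.])([A-Za-z_][A-Za-z0-9_]*):\\*(?=\\s|\\)|$)");

    // Replaces "<field>:*" with a range over all finite doubles, but only
    // for fields listed in numericFields; everything else is left untouched.
    static String rewrite(String query, List<String> numericFields) {
        String fullRange = "[" + (-Double.MAX_VALUE) + " TO " + Double.MAX_VALUE + "]";
        Matcher m = FIELD_STAR.matcher(query);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String field = m.group(1);
            String replacement = numericFields.contains(field)
                    ? field + ":" + fullRange
                    : m.group();
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        List<String> numericFields = Arrays.asList("numberA");
        // numeric field: the wildcard becomes an explicit range
        System.out.println(rewrite("numberA:* -numberA:500", numericFields));
        // text field: the query is returned unchanged
        System.out.println(rewrite("textA:* -textA:A", numericFields));
    }
}
```

In the question's code, the natural place to call such a helper would be storeRuleQuery, just before the MonitorQuery is created. Whether the rewritten range then matches still depends on how the parser builds range queries for numeric fields.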
