Custom prefix-remover token filter in Lucene produces dirty tokens

vsaztqbk · posted 2022-11-07 in Lucene

I'm trying to implement a Lucene filter that removes a certain prefix from the terms of a query. It looks like the filter is reused across queries, so after several queries the char buffer is dirty.
The code below is simplified; in reality the prefix is an external parameter.

public static class PrefixFilter extends TokenFilter {

    private final PackedTokenAttributeImpl termAtt = (PackedTokenAttributeImpl) addAttribute(CharTermAttribute.class);

    public PrefixFilter(TokenStream in) {
      super(in);
    }

    @Override
    public final boolean incrementToken() throws IOException {
      if (!input.incrementToken()) {
        return false;
      }
      String value = new String(termAtt.buffer());
      value = value.trim();
      value = value.toLowerCase();
      value = StringUtils.removeStart(value, "prefix_");
      if (value.isBlank()) {
        termAtt.setEmpty();
      } else {
        termAtt.copyBuffer(value.toCharArray(), 0, value.length());
        termAtt.setLength(value.length());
      }
      return true;
    }
  }

So after 10 or 12 queries, the value "prefix_a" turns into something like "abcde".
I then tried to also set the term buffer offset/end values this way:

termAtt.setEmpty();
termAtt.resizeBuffer(value.length());
termAtt.copyBuffer(value.toCharArray(), 0, value.length());
termAtt.setLength(value.length());
termAtt.setOffset(0, value.length());

But I don't know whether this is right. Can anyone help me?

  • Thanks, thank you
ki0zmccv

ki0zmccv · answer #1

See if this helps you:

/**
 * Standard number token filter.
 */
public class StandardnumberTokenFilter extends TokenFilter {

    private final LinkedList<PackedTokenAttributeImpl> tokens;

    private final StandardnumberService service;

    private final Settings settings;

    private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);

    private final PositionIncrementAttribute posIncAtt = addAttribute(PositionIncrementAttribute.class);

    private State current;

    protected StandardnumberTokenFilter(TokenStream input, StandardnumberService service, Settings settings) {
        super(input);
        this.tokens = new LinkedList<>();
        this.service = service;
        this.settings = settings;
    }

    @Override
    public final boolean incrementToken() throws IOException {
        if (!tokens.isEmpty()) {
            if (current == null) {
                throw new IllegalArgumentException("current is null");
            }
            PackedTokenAttributeImpl token = tokens.removeFirst();
            restoreState(current);
            termAtt.setEmpty().append(token);
            posIncAtt.setPositionIncrement(0);
            return true;
        }
        if (input.incrementToken()) {
            detect();
            if (!tokens.isEmpty()) {
                current = captureState();
            }
            return true;
        } else {
            return false;
        }
    }

    private void detect() throws CharacterCodingException {
        CharSequence term = new String(termAtt.buffer(), 0, termAtt.length());
        Collection<CharSequence> variants = service.lookup(settings, term);
        for (CharSequence ch : variants) {
            if (ch != null) {
                PackedTokenAttributeImpl token = new PackedTokenAttributeImpl();
                token.append(ch);
                tokens.add(token);
            }
        }
    }

    @Override
    public void reset() throws IOException {
        super.reset();
        tokens.clear();
        current = null;
    }

    @Override
    public boolean equals(Object object) {
        return object instanceof StandardnumberTokenFilter &&
                service.equals(((StandardnumberTokenFilter)object).service) &&
                settings.equals(((StandardnumberTokenFilter)object).settings);
    }

    @Override
    public int hashCode() {
        return service.hashCode() ^ settings.hashCode();
    }
}

https://github.com/jprante/elasticsearch-plugin-bundle/blob/f63690f877cc7f50360faffbac827622c9d404ef/src/main/java/org/xbib/elasticsearch/plugin/bundle/index/analysis/standardnumber/StandardnumberTokenFilter.java
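
For comparison, here is a minimal, untested sketch of the filter from the question rewritten to read only the valid portion of the term buffer (the class name PrefixStripFilter and the hard-coded "prefix_" are placeholders for your external parameter). The usual cause of this kind of "dirty token" symptom is that termAtt.buffer() is an over-allocated array that Lucene reuses between tokens, so new String(termAtt.buffer()) can pick up leftover characters from earlier, longer tokens; only the first termAtt.length() characters belong to the current token.

import java.io.IOException;
import java.util.Locale;

import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

/**
 * Sketch of a prefix-stripping filter that reads only the valid part of the
 * reused term buffer. "prefix_" stands in for the external parameter.
 */
public final class PrefixStripFilter extends TokenFilter {

    private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);

    public PrefixStripFilter(TokenStream in) {
        super(in);
    }

    @Override
    public boolean incrementToken() throws IOException {
        if (!input.incrementToken()) {
            return false;
        }
        // buffer() is over-allocated; only the first length() chars belong to
        // the current token, the rest may be left over from previous tokens.
        String value = new String(termAtt.buffer(), 0, termAtt.length())
                .trim()
                .toLowerCase(Locale.ROOT);
        if (value.startsWith("prefix_")) {
            value = value.substring("prefix_".length());
        }
        // setEmpty() resets the length before the new value is appended, so no
        // manual resizeBuffer()/setLength()/setOffset() bookkeeping is needed.
        termAtt.setEmpty().append(value);
        return true;
    }
}

If the values still come out dirty with buffer reads bounded by length(), the problem is more likely in how the analyzer or filter chain is being reused than in the buffer handling itself.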
