Usage and code examples for the org.apache.lucene.util.fst.Util class

x33g5p2x, reposted on 2022-02-01 under "Other"

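This article collects code examples for the Java class org.apache.lucene.util.fst.Util and shows how the class is used in practice. The examples come mainly from platforms such as GitHub, Stack Overflow, and Maven, and were extracted from selected projects, so they should be useful as a reference. Details of the Util class:
Package path: org.apache.lucene.util.fst.Util
Class name: Util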

About Util

Static helper methods.
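Many of the examples below call Util.toIntsRef to turn a byte sequence into the sequence of int labels that an FST builder accepts. As a rough illustration of that conversion only, the following stdlib-only sketch maps each byte to its unsigned value. The class and method names (`ByteLabels`, `toLabels`) are hypothetical, and this is not Lucene's implementation.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical stand-in: shows the byte -> unsigned int label mapping without Lucene.
final class ByteLabels {
    private ByteLabels() {}

    /** Maps each byte to an int in the range 0..255, one label per byte. */
    static int[] toLabels(byte[] bytes) {
        int[] labels = new int[bytes.length];
        for (int i = 0; i < bytes.length; i++) {
            // Mask with 0xFF so bytes >= 0x80 do not turn into negative ints.
            labels[i] = bytes[i] & 0xFF;
        }
        return labels;
    }

    public static void main(String[] args) {
        // "\u00e9" is encoded as the two UTF-8 bytes 0xC3 0xA9; "!" is 0x21.
        byte[] utf8 = "\u00e9!".getBytes(StandardCharsets.UTF_8);
        System.out.println(Arrays.toString(toLabels(utf8))); // prints [195, 169, 33]
    }
}
```

The mask matters because Java bytes are signed: without it, any non-ASCII UTF-8 byte would produce a negative label.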

Code examples

Code example source: org.apache.lucene/lucene-core

    final Builder<BytesRef> indexBuilder = new Builder<>(FST.INPUT_TYPE.BYTE1,
        0, 0, true, false, Integer.MAX_VALUE,
        outputs, true, 15);
    assert bytes.length > 0;
    scratchBytes.writeTo(bytes, 0);
    indexBuilder.add(Util.toIntsRef(prefix, scratchIntsRef), new BytesRef(bytes, 0, bytes.length));
    scratchBytes.reset();
    index = indexBuilder.finish();

Code example source: org.apache.lucene/lucene-core

    SegmentTermsEnumFrame f = getFrame(ord);
    assert f != null;
    final BytesRef prefix = new BytesRef(term.get().bytes, 0, f.prefix);
    if (f.nextEnt == -1) {
    out.println(" frame " + (isSeekFrame ? "(seek)" : "(next)") + " ord=" + ord + " fp=" + f.fp
        + (f.isFloor ? (" (fpOrig=" + f.fpOrig + ")") : "") + " prefixLen=" + f.prefix + " prefix=" + prefix
        + (f.nextEnt == -1 ? "" : (" (of " + f.entCount + ")")) + " hasTerms=" + f.hasTerms + " isFloor=" + f.isFloor
        + " code=" + ((f.fp << BlockTreeTermsReader.OUTPUT_FLAGS_NUM_BITS)
            + (f.hasTerms ? BlockTreeTermsReader.OUTPUT_FLAG_HAS_TERMS : 0)
            + (f.isFloor ? BlockTreeTermsReader.OUTPUT_FLAG_IS_FLOOR : 0))
        + " isLastInFloor=" + f.isLastInFloor + " mdUpto=" + f.metaDataUpto + " tbOrd=" + f.getTermBlockOrd());
    if (f.prefix > 0 && isSeekFrame && f.arc.label != (term.byteAt(f.prefix-1)&0xFF)) {
    out.println(" broken seek state: arc.label=" + (char) f.arc.label + " vs term byte=" + (char) (term.byteAt(f.prefix-1)&0xFF));
    throw new RuntimeException("seek state is broken");
    BytesRef output = Util.get(fr.index, prefix);
    if (output == null) {
    out.println(" broken seek state: prefix is not final in index");

Code example source: org.apache.lucene/lucene-core

    /** Reverse lookup (lookup by output instead of by input),
     *  in the special case when your FST's outputs are
     *  strictly ascending. This locates the input/output
     *  pair where the output is equal to the target, and will
     *  return null if that output does not exist.
     *
     *  <p>NOTE: this only works with {@code FST<Long>}, only
     *  works when the outputs are ascending in order with
     *  the inputs.
     *  For example, simple ordinals (0, 1,
     *  2, ...), or file offsets (when appending to a file)
     *  fit this. */
    public static IntsRef getByOutput(FST<Long> fst, long targetOutput) throws IOException {
      final BytesReader in = fst.getBytesReader();
      // TODO: would be nice not to alloc this on every lookup
      FST.Arc<Long> arc = fst.getFirstArc(new FST.Arc<Long>());
      FST.Arc<Long> scratchArc = new FST.Arc<>();
      final IntsRefBuilder result = new IntsRefBuilder();
      return getByOutput(fst, targetOutput, in, arc, scratchArc, result);
    }
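The constraint stated in the comment above (outputs ascending in the same order as the inputs) is what makes a reverse lookup practical. The idea can be illustrated without an FST: when outputs are strictly ascending, a target output can be located by binary search. The sketch below is only an analogy over two parallel arrays, not how Util.getByOutput walks the FST; `AscendingOutputs` and `inputForOutput` are hypothetical names.

```java
import java.util.Arrays;

// Hypothetical analogy: parallel arrays stand in for an FST whose outputs ascend with its inputs.
final class AscendingOutputs {
    private AscendingOutputs() {}

    /**
     * Returns the input paired with the given output, or null if no input has that output.
     * Assumes ascendingOutputs is strictly ascending and index-aligned with sortedInputs.
     */
    static String inputForOutput(String[] sortedInputs, long[] ascendingOutputs, long target) {
        int idx = Arrays.binarySearch(ascendingOutputs, target);
        return idx >= 0 ? sortedInputs[idx] : null;
    }

    public static void main(String[] args) {
        String[] inputs = {"a.com", "b.com", "c.com"};
        long[] outputs = {0L, 7L, 19L}; // e.g. file offsets recorded while appending
        System.out.println(inputForOutput(inputs, outputs, 7L)); // prints b.com
        System.out.println(inputForOutput(inputs, outputs, 8L)); // prints null
    }
}
```

If the outputs were not ascending, the search could not discard half of the candidates at each step, which is why the method documents that restriction.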

Code example source: org.apache.lucene/lucene-core

    emitDotState(out, "initial", "point", "white", "");
    emitDotState(out, Long.toString(startArc.target), isFinal ? finalStateShape : stateShape, stateColor, finalOutput == null ? "" : fst.outputs.outputToString(finalOutput));
    emitDotState(out, Long.toString(arc.target), stateShape, stateColor, finalOutput);
    out.write(" " + node + " -> " + arc.target + " [label=\"" + printableLabel(arc.label) + outs + "\"" + (arc.isFinal() ? " style=\"bold\"" : "" ) + " color=\"" + arcColor + "\"]\n");

Code example source: org.elasticsearch/elasticsearch

    BytesRefBuilder scratch = new BytesRefBuilder();
    new LimitedFiniteStringsIterator(toAutomaton(surfaceForm, ts2a), maxGraphExpansions);
    for (IntsRef string; (string = finiteStrings.next()) != null; count++) {
    Util.toBytesRef(string, scratch);
    if (scratch.length() > Short.MAX_VALUE-2) {
    throw new IllegalArgumentException(
        "cannot handle analyzed forms > " + (Short.MAX_VALUE-2) + " in length (got " + scratch.length() + ")");
    short analyzedLength = (short) scratch.length();
    Builder<Pair<Long,BytesRef>> builder = new Builder<>(FST.INPUT_TYPE.BYTE1, outputs);
    analyzed.append((byte) dedup);
    Util.toIntsRef(analyzed.get(), scratchInts);
    builder.add(scratchInts.get(), outputs.newPair(cost, BytesRef.deepCopyOf(surface)));
    } else {
    int payloadOffset = input.getPosition() + surface.length;
    System.arraycopy(bytes.bytes, payloadOffset, br.bytes, surface.length+1, payloadLength);
    br.length = br.bytes.length;
    builder.add(scratchInts.get(), outputs.newPair(cost, br));

Code example source: org.apache.lucene/lucene-analyzers-common

    BytesRefBuilder flagsScratch = new BytesRefBuilder();
    IntsRefBuilder scratchInts = new IntsRefBuilder();
    BytesRef scratch1 = new BytesRef();
    BytesRef scratch2 = new BytesRef();
    IntsRefBuilder currentOrds = new IntsRefBuilder();
    } else {
    encodeFlags(flagsScratch, wordForm);
    int ord = flagLookup.add(flagsScratch.get());
    if (ord < 0) {
    Util.toUTF32(currentEntry, scratchInts);
    words.add(scratchInts.get(), currentOrds.get());
    Util.toUTF32(currentEntry, scratchInts);
    words.add(scratchInts.get(), currentOrds.get());
    success2 = true;
    } finally {

Code example source: harbby/presto-connectors

    BytesRefBuilder b = new BytesRefBuilder();
    b.append(tokenBytes);
    lastTokens[gramCount-1] = b;
    for(int i=token.length()-1;i>=0;i--) {
    if (token.byteAt(i) == separator) {
    BytesRef context = new BytesRef(token.bytes(), 0, i);
    Long output = Util.get(fst, Util.toIntsRef(context, new IntsRefBuilder()));
    assert output != null;
    contextCount = decodeWeight(output);
    lastTokenFragment = new BytesRef(token.bytes(), i + 1, token.length() - i - 1);
    break;
    searcher.addStartPaths(arc, prefixOutput, true, new IntsRefBuilder());
    token.setLength(prefixLength);
    Util.toBytesRef(completion.input, suffix);
    token.append(suffix);

Code example source: harbby/presto-connectors

    BytesRef scratch = new BytesRef();
    InputIterator iter = new WFSTInputIterator(iterator);
    IntsRefBuilder scratchInts = new IntsRefBuilder();
    BytesRefBuilder previous = null;
    PositiveIntOutputs outputs = PositiveIntOutputs.getSingleton();
    Builder<Long> builder = new Builder<>(FST.INPUT_TYPE.BYTE1, outputs);
    while ((scratch = iter.next()) != null) {
    long cost = iter.weight();
    previous = new BytesRefBuilder();
    } else if (scratch.equals(previous.get())) {
    continue; // for duplicate suggestions, the best weight is actually
    Util.toIntsRef(scratch, scratchInts);
    builder.add(scratchInts.get(), cost);
    previous.copyBytes(scratch);
    count++;
    fst = builder.finish();

Code example source: org.elasticsearch/elasticsearch

    public void finishTerm(long defaultWeight) throws IOException {
      ArrayUtil.timSort(surfaceFormsAndPayload, 0, count);
      int deduplicator = 0;
      analyzed.append((byte) 0);
      analyzed.setLength(analyzed.length() + 1);
      analyzed.grow(analyzed.length());
      for (int i = 0; i < count; i++) {
        analyzed.setByteAt(analyzed.length() - 1, (byte) deduplicator++);
        Util.toIntsRef(analyzed.get(), scratchInts);
        SurfaceFormAndPayload candiate = surfaceFormsAndPayload[i];
        long cost = candiate.weight == -1 ? encodeWeight(Math.min(Integer.MAX_VALUE, defaultWeight)) : candiate.weight;
        builder.add(scratchInts.get(), outputs.newPair(cost, candiate.payload));
      }
      seenSurfaceForms.clear();
      count = 0;
    }

Code example source: lintool/warcbase

    public String getUrl(int id) {
      BytesRef scratchBytes = new BytesRef();
      IntsRef key = null;
      try {
        key = Util.getByOutput(fst, id);
      } catch (IOException e) {
        LOG.error("Error id " + id);
        e.printStackTrace();
        return null;
      }
      if (key == null) {
        return null;
      }
      return Util.toBytesRef(key, scratchBytes).utf8ToString();
    }

Code example source: org.apache.lucene/lucene-analyzers-common

    new org.apache.lucene.util.fst.Builder<>(FST.INPUT_TYPE.BYTE4, outputs);
    BytesRefBuilder scratch = new BytesRefBuilder();
    ByteArrayDataOutput scratchOutput = new ByteArrayDataOutput();
    Arrays.sort(sortedKeys, CharsRef.getUTF16SortedAsUTF8Comparator());
    final IntsRefBuilder scratchIntsRef = new IntsRefBuilder();
    scratch.grow(estimatedSize);
    scratchOutput.reset(scratch.bytes());
    scratch.setLength(scratchOutput.getPosition());
    builder.add(Util.toUTF32(input, scratchIntsRef), scratch.toBytesRef());
    FST<BytesRef> fst = builder.finish();
    return new SynonymMap(fst, words, maxHorizontalContext);

Code example source: org.apache.lucene/lucene-spellchecker

    /**
     * Builds the final automaton from a list of entries.
     */
    private FST<Object> buildAutomaton(BytesRefSorter sorter) throws IOException {
      // Build the automaton.
      final Outputs<Object> outputs = NoOutputs.getSingleton();
      final Object empty = outputs.getNoOutput();
      final Builder<Object> builder = new Builder<Object>(
          FST.INPUT_TYPE.BYTE1, 0, 0, true, true,
          shareMaxTailLength, outputs, null, false);
      BytesRef scratch = new BytesRef();
      BytesRef entry;
      final IntsRef scratchIntsRef = new IntsRef();
      int count = 0;
      BytesRefIterator iter = sorter.iterator();
      while((entry = iter.next()) != null) {
        count++;
        if (scratch.compareTo(entry) != 0) {
          builder.add(Util.toIntsRef(entry, scratchIntsRef), empty);
          scratch.copyBytes(entry);
        }
      }
      return count == 0 ? null : builder.finish();
    }

Code example source: org.apache.lucene/lucene-codecs

    OrdsSegmentTermsEnumFrame f = getFrame(ord);
    assert f != null;
    final BytesRef prefix = new BytesRef(term.bytes(), 0, f.prefix);
    if (f.nextEnt == -1) {
    out.println(" frame " + (isSeekFrame ? "(seek)" : "(next)") + " ord=" + ord + " fp=" + f.fp
        + (f.isFloor ? (" (fpOrig=" + f.fpOrig + ")") : "") + " prefixLen=" + f.prefix + " prefix=" + brToString(prefix)
        + (f.nextEnt == -1 ? "" : (" (of " + f.entCount + ")")) + " hasTerms=" + f.hasTerms + " isFloor=" + f.isFloor
        + " code=" + ((f.fp << OrdsBlockTreeTermsWriter.OUTPUT_FLAGS_NUM_BITS)
            + (f.hasTerms ? OrdsBlockTreeTermsWriter.OUTPUT_FLAG_HAS_TERMS : 0)
            + (f.isFloor ? OrdsBlockTreeTermsWriter.OUTPUT_FLAG_IS_FLOOR : 0))
        + " isLastInFloor=" + f.isLastInFloor + " mdUpto=" + f.metaDataUpto + " tbOrd=" + f.getTermBlockOrd()
        + " termOrd=" + f.termOrd);
    if (f.prefix > 0 && isSeekFrame && f.arc.label != (term.byteAt(f.prefix-1)&0xFF)) {
    out.println(" broken seek state: arc.label=" + (char) f.arc.label + " vs term byte=" + (char) (term.byteAt(f.prefix-1)&0xFF));
    throw new RuntimeException("seek state is broken");
    Output output = Util.get(fr.index, prefix);
    if (output == null) {
    out.println(" broken seek state: prefix is not final in index");

Code example source: org.apache.lucene/lucene-spellchecker

    @Override
    public void build(TermFreqIterator iterator) throws IOException {
      BytesRef scratch = new BytesRef();
      TermFreqIterator iter = new WFSTTermFreqIteratorWrapper(iterator,
          BytesRef.getUTF8SortedAsUnicodeComparator());
      IntsRef scratchInts = new IntsRef();
      BytesRef previous = null;
      PositiveIntOutputs outputs = PositiveIntOutputs.getSingleton(true);
      Builder<Long> builder = new Builder<Long>(FST.INPUT_TYPE.BYTE1, outputs);
      while ((scratch = iter.next()) != null) {
        long cost = iter.weight();
        if (previous == null) {
          previous = new BytesRef();
        } else if (scratch.equals(previous)) {
          continue; // for duplicate suggestions, the best weight is actually
                    // added
        }
        Util.toIntsRef(scratch, scratchInts);
        builder.add(scratchInts, cost);
        previous.copyBytes(scratch);
      }
      fst = builder.finish();
    }
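The loop above skips an entry when it equals the previous one, which only removes all duplicates if the iterator returns entries in sorted order so that equal entries are adjacent. A stdlib-only sketch of that adjacent-duplicate filter follows; `SortedDedup` and `firstOfEachRun` are hypothetical names, and plain strings stand in for the BytesRef entries.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of the "compare with previous, skip if equal" pattern.
final class SortedDedup {
    private SortedDedup() {}

    /** Keeps the first element of each run of equal adjacent keys; the input must already be sorted. */
    static List<String> firstOfEachRun(List<String> sortedKeys) {
        List<String> kept = new ArrayList<>();
        String previous = null;
        for (String key : sortedKeys) {
            if (key.equals(previous)) {
                continue; // same as the previous key: skip, as the loop above does
            }
            kept.add(key);
            previous = key;
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> sorted = Arrays.asList("apple", "apple", "bar", "foo", "foo");
        System.out.println(firstOfEachRun(sorted)); // prints [apple, bar, foo]
    }
}
```

Only one previous key has to be remembered, so the filter runs in a single pass without a set of seen entries.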

Code example source: org.apache.lucene/lucene-analyzers

    new org.apache.lucene.util.fst.Builder<BytesRef>(FST.INPUT_TYPE.BYTE4, outputs);
    BytesRef scratch = new BytesRef(64);
    ByteArrayDataOutput scratchOutput = new ByteArrayDataOutput();
    scratch.grow(estimatedSize);
    scratchOutput.reset(scratch.bytes, scratch.offset, scratch.bytes.length);
    assert scratch.offset == 0;
    builder.add(Util.toUTF32(input, scratchIntsRef), BytesRef.deepCopyOf(scratch));
    FST<BytesRef> fst = builder.finish();
    return new SynonymMap(fst, words, maxHorizontalContext);

Code example source: org.apache.lucene/lucene-codecs

    private void writeFST(FieldInfo field, Iterable<BytesRef> values) throws IOException {
      meta.writeVInt(field.number);
      meta.writeByte(FST);
      meta.writeLong(data.getFilePointer());
      PositiveIntOutputs outputs = PositiveIntOutputs.getSingleton();
      Builder<Long> builder = new Builder<>(INPUT_TYPE.BYTE1, outputs);
      IntsRefBuilder scratch = new IntsRefBuilder();
      long ord = 0;
      for (BytesRef v : values) {
        builder.add(Util.toIntsRef(v, scratch), ord);
        ord++;
      }
      FST<Long> fst = builder.finish();
      if (fst != null) {
        fst.save(data);
      }
      meta.writeVLong(ord);
    }

Code example source: org.apache.lucene/lucene-codecs

    final PairOutputs<Long,PairOutputs.Pair<Long,Long>> outputs = new PairOutputs<>(posIntOutputs,
        outputsInner);
    b = new Builder<>(FST.INPUT_TYPE.BYTE1, outputs);
    IndexInput in = SimpleTextFieldsReader.this.in.clone();
    in.seek(termsStart);
    final BytesRefBuilder lastTerm = new BytesRefBuilder();
    long lastDocsStart = -1;
    int docFreq = 0;
    long totalTermFreq = 0;
    FixedBitSet visitedDocs = new FixedBitSet(maxDoc);
    final IntsRefBuilder scratchIntsRef = new IntsRefBuilder();
    while(true) {
    SimpleTextUtil.readLine(in, scratch);
    if (scratch.get().equals(END) || StringHelper.startsWith(scratch.get(), FIELD)) {
    if (lastDocsStart != -1) {
    b.add(Util.toIntsRef(lastTerm.get(), scratchIntsRef),
        outputs.newPair(lastDocsStart,
            outputsInner.newPair((long) docFreq, totalTermFreq)));
    } else if (StringHelper.startsWith(scratch.get(), TERM)) {
    if (lastDocsStart != -1) {
    b.add(Util.toIntsRef(lastTerm.get(), scratchIntsRef), outputs.newPair(lastDocsStart,
        outputsInner.newPair((long) docFreq, totalTermFreq)));

Code example source: lintool/warcbase

    public int getID(String url) {
      Long id = null;
      try {
        id = Util.get(fst, new BytesRef(url));
      } catch (IOException e) {
        // Log error, but assume that URL doesn't exist.
        LOG.error("Error fetching " + url);
        e.printStackTrace();
        return -1;
      }
      return id == null ? -1 : id.intValue();
    }

Code example source: org.apache.lucene/lucene-analyzers-common

    private FST<CharsRef> parseConversions(LineNumberReader reader, int num) throws IOException, ParseException {
      Map<String,String> mappings = new TreeMap<>();
      for (int i = 0; i < num; i++) {
        String line = reader.readLine();
        String parts[] = line.split("\\s+");
        if (parts.length != 3) {
          throw new ParseException("invalid syntax: " + line, reader.getLineNumber());
        }
        if (mappings.put(parts[1], parts[2]) != null) {
          throw new IllegalStateException("duplicate mapping specified for: " + parts[1]);
        }
      }
      Outputs<CharsRef> outputs = CharSequenceOutputs.getSingleton();
      Builder<CharsRef> builder = new Builder<>(FST.INPUT_TYPE.BYTE2, outputs);
      IntsRefBuilder scratchInts = new IntsRefBuilder();
      for (Map.Entry<String,String> entry : mappings.entrySet()) {
        Util.toUTF16(entry.getKey(), scratchInts);
        builder.add(scratchInts.get(), new CharsRef(entry.getValue()));
      }
      return builder.finish();
    }
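The example above pairs Util.toUTF16 with a BYTE2 builder, whereas the SynonymMap examples pair Util.toUTF32 with BYTE4. The difference lies in the label unit: UTF-16 code units in one case, Unicode code points in the other. The stdlib-only sketch below shows that distinction on a string containing a supplementary character; `LabelUnits` and its methods are hypothetical names, not Lucene code.

```java
import java.util.Arrays;

// Hypothetical illustration: the same text as UTF-16 code units versus Unicode code points.
final class LabelUnits {
    private LabelUnits() {}

    /** One label per UTF-16 code unit; a supplementary character yields two labels. */
    static int[] utf16Labels(CharSequence s) {
        return s.chars().toArray();
    }

    /** One label per Unicode code point; a surrogate pair collapses into a single label. */
    static int[] utf32Labels(CharSequence s) {
        return s.codePoints().toArray();
    }

    public static void main(String[] args) {
        String s = "a\uD83D\uDE00"; // 'a' followed by U+1F600, stored as a surrogate pair
        System.out.println(Arrays.toString(utf16Labels(s))); // prints [97, 55357, 56832]
        System.out.println(Arrays.toString(utf32Labels(s))); // prints [97, 128512]
    }
}
```

For text within the Basic Multilingual Plane the two label sequences are identical; they diverge only when surrogate pairs appear.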

Code example source: harbby/presto-connectors

    /**
     * Builds the final automaton from a list of entries.
     */
    private FST<Object> buildAutomaton(BytesRefSorter sorter) throws IOException {
      // Build the automaton.
      final Outputs<Object> outputs = NoOutputs.getSingleton();
      final Object empty = outputs.getNoOutput();
      final Builder<Object> builder = new Builder<>(
          FST.INPUT_TYPE.BYTE1, 0, 0, true, true,
          shareMaxTailLength, outputs, false,
          PackedInts.DEFAULT, true, 15);
      BytesRefBuilder scratch = new BytesRefBuilder();
      BytesRef entry;
      final IntsRefBuilder scratchIntsRef = new IntsRefBuilder();
      int count = 0;
      BytesRefIterator iter = sorter.iterator();
      while((entry = iter.next()) != null) {
        count++;
        if (scratch.get().compareTo(entry) != 0) {
          builder.add(Util.toIntsRef(entry, scratchIntsRef), empty);
          scratch.copyBytes(entry);
        }
      }
      return count == 0 ? null : builder.finish();
    }
