Usage of org.apache.lucene.util.automaton.Operations.determinize() with code examples

x33g5p2x · 2022-01-26 · repost

This article collects code examples of the Java method org.apache.lucene.util.automaton.Operations.determinize() and shows how it is used in practice. The examples were extracted from selected projects hosted on platforms such as GitHub, Stack Overflow, and Maven, and should serve as useful references. Details of Operations.determinize():
Package path: org.apache.lucene.util.automaton.Operations
Class name: Operations
Method name: determinize

Overview of Operations.determinize

Determinizes the given automaton.

Worst-case complexity: exponential in the number of states.

Code examples

Example source: org.apache.lucene/lucene-core

    public GraphTokenStreamFiniteStrings(TokenStream in) throws IOException {
      Automaton aut = build(in);
      this.det = Operations.removeDeadStates(Operations.determinize(aut, DEFAULT_MAX_DETERMINIZED_STATES));
    }

Example source: org.apache.lucene/lucene-core

    /**
     * Returns the longest BytesRef that is a suffix of all accepted strings.
     * Worst case complexity: exponential in number of states (this calls
     * determinize).
     * @param maxDeterminizedStates maximum number of states determinizing the
     *   automaton can result in.  Set higher to allow more complex queries and
     *   lower to prevent memory exhaustion.
     * @return common suffix, which can be an empty (length 0) BytesRef (never null)
     */
    public static BytesRef getCommonSuffixBytesRef(Automaton a, int maxDeterminizedStates) {
      // reverse the language of the automaton, then reverse its common prefix.
      Automaton r = Operations.determinize(reverse(a), maxDeterminizedStates);
      BytesRef ref = getCommonPrefixBytesRef(r);
      reverseBytes(ref);
      return ref;
    }

Example source: org.apache.lucene/lucene-core

    a = Operations.determinize(a, maxDeterminizedStates);
    this.automaton = a;
    points = a.getStartPoints();

Example source: org.apache.lucene/lucene-core

    /**
     * Returns a (deterministic) automaton that accepts the complement of the
     * language of the given automaton.
     * <p>
     * Complexity: linear in number of states if already deterministic and
     * exponential otherwise.
     * @param maxDeterminizedStates maximum number of states determinizing the
     *   automaton can result in.  Set higher to allow more complex queries and
     *   lower to prevent memory exhaustion.
     */
    static public Automaton complement(Automaton a, int maxDeterminizedStates) {
      a = totalize(determinize(a, maxDeterminizedStates));
      int numStates = a.getNumStates();
      for (int p = 0; p < numStates; p++) {
        a.setAccept(p, !a.isAccept(p));
      }
      return removeDeadStates(a);
    }

Example source: org.apache.lucene/lucene-core

    a = Operations.determinize(a, maxDeterminizedStates);

Example source: org.elasticsearch/elasticsearch

    protected Automaton convertAutomaton(Automaton a) {
      if (queryPrefix != null) {
        a = Operations.concatenate(Arrays.asList(queryPrefix, a));
        // This automaton should not blow up during determinize:
        a = Operations.determinize(a, Integer.MAX_VALUE);
      }
      return a;
    }

Example source: org.apache.lucene/lucene-core

    automaton = Operations.determinize(automaton, maxDeterminizedStates);

Example source: org.apache.lucene/lucene-analyzers-common

    /** Creates a new SimplePatternSplitTokenizerFactory */
    public SimplePatternSplitTokenizerFactory(Map<String,String> args) {
      super(args);
      maxDeterminizedStates = getInt(args, "maxDeterminizedStates", Operations.DEFAULT_MAX_DETERMINIZED_STATES);
      dfa = Operations.determinize(new RegExp(require(args, PATTERN)).toAutomaton(), maxDeterminizedStates);
      if (args.isEmpty() == false) {
        throw new IllegalArgumentException("Unknown parameters: " + args);
      }
    }

Example source: org.apache.lucene/lucene-analyzers-common

    /** Creates a new SimplePatternTokenizerFactory */
    public SimplePatternTokenizerFactory(Map<String,String> args) {
      super(args);
      maxDeterminizedStates = getInt(args, "maxDeterminizedStates", Operations.DEFAULT_MAX_DETERMINIZED_STATES);
      dfa = Operations.determinize(new RegExp(require(args, PATTERN)).toAutomaton(), maxDeterminizedStates);
      if (args.isEmpty() == false) {
        throw new IllegalArgumentException("Unknown parameters: " + args);
      }
    }

Example source: org.elasticsearch/elasticsearch

    final Automaton toLookupAutomaton(final CharSequence key) throws IOException {
      // TODO: is there a Reader from a CharSequence?
      // Turn tokenstream into automaton:
      Automaton automaton = null;
      try (TokenStream ts = queryAnalyzer.tokenStream("", key.toString())) {
        automaton = getTokenStreamToAutomaton().toAutomaton(ts);
      }
      automaton = replaceSep(automaton);
      // TODO: we can optimize this somewhat by determinizing
      // while we convert
      // This automaton should not blow up during determinize:
      automaton = Operations.determinize(automaton, Integer.MAX_VALUE);
      return automaton;
    }

Example source: org.elasticsearch/elasticsearch

    @Override
    protected Automaton convertAutomaton(Automaton a) {
      if (unicodeAware) {
        // FLORIAN EDIT: get converted Automaton from superclass
        Automaton utf8automaton = new UTF32ToUTF8().convert(super.convertAutomaton(a));
        // This automaton should not blow up during determinize:
        utf8automaton = Operations.determinize(utf8automaton, Integer.MAX_VALUE);
        return utf8automaton;
      } else {
        return super.convertAutomaton(a);
      }
    }

Example source: org.elasticsearch/elasticsearch

    return Operations.determinize(a, DEFAULT_MAX_DETERMINIZED_STATES);

Example source: org.apache.lucene/lucene-analyzers-common

    /**
     * Converts the tokenStream to an automaton.  Does *not* close it.
     */
    public Automaton toAutomaton(boolean unicodeAware) throws IOException {
      // TODO refactor this
      // maybe we could hook up a modified automaton from TermAutomatonQuery here?

      // Create corresponding automaton: labels are bytes
      // from each analyzed token, with byte 0 used as
      // separator between tokens:
      final TokenStreamToAutomaton tsta;
      if (preserveSep) {
        tsta = new EscapingTokenStreamToAutomaton(SEP_LABEL);
      } else {
        // When we're not preserving sep, we don't steal 0xff
        // byte, so we don't need to do any escaping:
        tsta = new TokenStreamToAutomaton();
      }
      tsta.setPreservePositionIncrements(preservePositionIncrements);
      tsta.setUnicodeArcs(unicodeAware);

      Automaton automaton = tsta.toAutomaton(inputTokenStream);

      // TODO: we can optimize this somewhat by determinizing
      // while we convert
      automaton = replaceSep(automaton, preserveSep, SEP_LABEL);

      // This automaton should not blow up during determinize:
      return Operations.determinize(automaton, maxGraphExpansions);
    }

Example source: harbby/presto-connectors

    @Override
    protected Automaton convertAutomaton(Automaton a) {
      if (unicodeAware) {
        Automaton utf8automaton = new UTF32ToUTF8().convert(a);
        utf8automaton = Operations.determinize(utf8automaton, DEFAULT_MAX_DETERMINIZED_STATES);
        return utf8automaton;
      } else {
        return a;
      }
    }

Example source: wikimedia/search-highlighter

    private Factory(String regexString, int maxDeterminizedStates) {
      Automaton automaton = new RegExp(regexString).toAutomaton(maxDeterminizedStates);
      forward = new OffsetReturningRunAutomaton(automaton, false);
      if (hasLeadingWildcard(automaton)) {
        Automaton reversed = Operations.determinize(Operations.reverse(
            new RegExp("(" + regexString + ").*").toAutomaton(maxDeterminizedStates)), maxDeterminizedStates);
        reverse = new AcceptReturningReverseRunAutomaton(reversed);
      } else {
        reverse = null;
      }
    }
