How do I match text exactly in a Lucene search?

wz1wpwve  posted on 2022-11-07  in  Lucene

I am trying to match the text Config migration from ASA5505 8.2 to ASA5516 in the TITLE column.
My program looks like this:

    // indexDir is a java.io.File pointing at the index directory
    Directory directory = FSDirectory.open(indexDir);
    MultiFieldQueryParser queryParser = new MultiFieldQueryParser(
            Version.LUCENE_35, new String[] {"TITLE"}, new StandardAnalyzer(Version.LUCENE_35));
    IndexReader reader = IndexReader.open(directory);
    IndexSearcher searcher = new IndexSearcher(reader);
    queryParser.setPhraseSlop(0);
    queryParser.setLowercaseExpandedTerms(true);
    Query query = queryParser.parse("TITLE:Config migration from ASA5505 8.2 to ASA5516");
    System.out.println(query); // print the parsed query
    TopDocs topDocs = searcher.search(query, 100);
    System.out.println(topDocs.totalHits);
    ScoreDoc[] hits = topDocs.scoreDocs;
    System.out.println(hits.length + " Record(s) Found");
    for (int i = 0; i < hits.length; i++) {
        int docId = hits[i].doc;
        Document d = searcher.doc(docId);
        System.out.println("\"Title :\" " + d.get("TITLE"));
    }

But it returns:

    "Title :" Config migration from ASA5505 8.2 to ASA5516
    "Title :" Firewall migration from ASA5585 to ASA5555
    "Title :" Firewall migration from ASA5585 to ASA5555

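The last two results are not expected. What changes are needed to match exactly the text Config migration from ASA5505 8.2 to ASA5516?
My indexing code looks like this: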

    import java.io.File;
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;
    import java.util.Date;
    import org.apache.lucene.analysis.SimpleAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.store.FSDirectory;
    import org.apache.lucene.util.Version;

    public class Lucene {
        public static final String INDEX_DIR = "./Lucene";
        private static final String JDBC_DRIVER = "oracle.jdbc.OracleDriver";
        private static final String CONNECTION_URL = "jdbc:oracle:thin:xxxxxxx";
        private static final String USER_NAME = "localhost";
        private static final String PASSWORD = "localhost";
        private static final String QUERY = "select * from TITLE_TABLE";

        public static void main(String[] args) throws Exception {
            File indexDir = new File(INDEX_DIR);
            Lucene indexer = new Lucene();
            try {
                Date start = new Date();
                Class.forName(JDBC_DRIVER).newInstance();
                Connection conn = DriverManager.getConnection(CONNECTION_URL, USER_NAME, PASSWORD);
                // Index-time analyzer
                SimpleAnalyzer analyzer = new SimpleAnalyzer(Version.LUCENE_35);
                IndexWriterConfig indexWriterConfig = new IndexWriterConfig(Version.LUCENE_35, analyzer);
                IndexWriter indexWriter = new IndexWriter(FSDirectory.open(indexDir), indexWriterConfig);
                System.out.println("Indexing to directory '" + indexDir + "'...");
                int indexedDocumentCount = indexer.indexDocs(indexWriter, conn);
                indexWriter.close();
                System.out.println(indexedDocumentCount + " records have been indexed successfully");
                System.out.println("Total Time:" + (new Date().getTime() - start.getTime()) / (1000));
            } catch (Exception e) {
                e.printStackTrace();
            }
        }

        // Adds one document per row of the result set and returns the number of rows indexed
        int indexDocs(IndexWriter writer, Connection conn) throws Exception {
            String sql = QUERY;
            Statement stmt = conn.createStatement();
            stmt.setFetchSize(100000);
            ResultSet rs = stmt.executeQuery(sql);
            int i = 0;
            while (rs.next()) {
                System.out.println("Adding Doc No:" + i);
                Document d = new Document();
                System.out.println(rs.getString("TITLE"));
                d.add(new Field("TITLE", rs.getString("TITLE"), Field.Store.YES, Field.Index.ANALYZED));
                d.add(new Field("NAME", rs.getString("NAME"), Field.Store.YES, Field.Index.ANALYZED));
                writer.addDocument(d);
                i++;
            }
            return i;
        }
    }

dz6r00yl #1

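PVR is right that a phrase query is probably the correct solution here, but they left out how to use the PhraseQuery class. Since you are already using QueryParser, though, you can simply use the query parser syntax and enclose the search text in quotes: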

    Query query = queryParser.parse("TITLE:\"Config migration from ASA5505 8.2 to ASA5516\"");

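Based on your update, you are using different analyzers at index time and at query time. SimpleAnalyzer and StandardAnalyzer do not do the same things. Unless you have a very good reason not to, you should analyze text the same way when indexing and when querying.
So, change the analyzer in your indexing code to StandardAnalyzer (or the other way around, use SimpleAnalyzer when querying), and you should see better results.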
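
To make the mismatch concrete, here is a minimal plain-Java sketch that only approximates the two tokenization styles. It does not call Lucene: `lettersOnly` imitates the letters-only, lowercasing behaviour of SimpleAnalyzer, and `lettersAndDigits` is a rough stand-in for an analyzer that keeps alphanumeric tokens, as StandardAnalyzer does (stop-word removal is ignored here). The class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Plain-Java approximation; these helpers are hypothetical and are NOT Lucene APIs.
class AnalyzerMismatchSketch {

    // Roughly what a letters-only analyzer does: lowercase, split on every non-letter.
    static List<String> lettersOnly(String text) {
        return split(text.toLowerCase(Locale.ROOT), "[^\\p{L}]+");
    }

    // Rough stand-in for an analyzer that keeps digits: lowercase, split on anything
    // that is not a letter, a digit, or a dot (stop words are not removed).
    static List<String> lettersAndDigits(String text) {
        List<String> tokens = new ArrayList<>();
        for (String piece : split(text.toLowerCase(Locale.ROOT), "[^\\p{L}\\p{N}.]+")) {
            String trimmed = piece.replaceAll("^\\.+|\\.+$", ""); // drop leading/trailing dots
            if (!trimmed.isEmpty()) {
                tokens.add(trimmed);
            }
        }
        return tokens;
    }

    // Splits on the given delimiter regex and discards empty pieces.
    private static List<String> split(String text, String delimiterRegex) {
        List<String> tokens = new ArrayList<>();
        for (String piece : text.split(delimiterRegex)) {
            if (!piece.isEmpty()) {
                tokens.add(piece);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String title = "Config migration from ASA5505 8.2 to ASA5516";
        System.out.println("letters only:       " + lettersOnly(title));
        System.out.println("letters and digits: " + lettersAndDigits(title));
    }
}
```

With letters-only tokenization, `ASA5505` ends up in the index as `asa` and `8.2` disappears entirely, so a term such as `asa5505` produced at query time has nothing to match. That is why a quoted phrase alone is unlikely to help until both sides use the same analyzer.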

f0ofjuux #2

Here is a complete working example I wrote for you.
Use: queryParser.parse("\"Config migration from ASA5505 8.2 to ASA5516\"");
1. To build the index

    import java.io.File;
    import java.io.IOException;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;
    import org.apache.lucene.util.Version;

    // The class declaration was missing from the snippet; the name is illustrative.
    public class IndexExample {
        public static void main(String[] args) {
            IndexWriter writer = getIndexWriter();
            Document doc = new Document();
            Document doc1 = new Document();
            Document doc2 = new Document();
            doc.add(new Field("TITLE", "Config migration from ASA5505 8.2 to ASA5516", Field.Store.YES, Field.Index.ANALYZED));
            doc1.add(new Field("TITLE", "Firewall migration from ASA5585 to ASA5555", Field.Store.YES, Field.Index.ANALYZED));
            doc2.add(new Field("TITLE", "Firewall migration from ASA5585 to ASA5555", Field.Store.YES, Field.Index.ANALYZED));
            try {
                writer.addDocument(doc);
                writer.addDocument(doc1);
                writer.addDocument(doc2);
                writer.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }

        // Opens (creating it if necessary) the index directory with a StandardAnalyzer
        public static IndexWriter getIndexWriter() {
            IndexWriter indexWriter = null;
            try {
                File file = new File("D://index//");
                if (!file.exists())
                    file.mkdir();
                IndexWriterConfig conf = new IndexWriterConfig(Version.LUCENE_34, new StandardAnalyzer(Version.LUCENE_34));
                Directory directory = FSDirectory.open(file);
                indexWriter = new IndexWriter(directory, conf);
            } catch (IOException e) {
                e.printStackTrace();
            }
            return indexWriter;
        }
    }
2. To search for the string

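    import java.io.File;
    import java.io.IOException;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.queryParser.QueryParser;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.ScoreDoc;
    import org.apache.lucene.search.TopDocs;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;
    import org.apache.lucene.util.Version;

    // The class declaration was missing from the snippet; the name is illustrative.
    public class SearchExample {
        public static void main(String[] args) {
            IndexReader reader = getIndexReader();
            IndexSearcher searcher = new IndexSearcher(reader);
            // Same analyzer as at index time
            QueryParser parser = new QueryParser(Version.LUCENE_34, "TITLE", new StandardAnalyzer(Version.LUCENE_34));
            Query query;
            try {
                // The escaped quotes turn the text into a phrase query
                query = parser.parse("\"Config migration from ASA5505 8.2 to ASA5516\"");
                TopDocs hits = searcher.search(query, 3);
                ScoreDoc[] document = hits.scoreDocs;
                for (int i = 0; i < document.length; i++) {
                    // Look up each hit by its document id, not by the loop index
                    Document doc = searcher.doc(document[i].doc);
                    System.out.println("TITLE=" + doc.get("TITLE"));
                }
                searcher.close();
            } catch (Exception e) {
                e.printStackTrace();
            }
        }

        public static IndexReader getIndexReader() {
            IndexReader reader = null;
            Directory dir;
            try {
                dir = FSDirectory.open(new File("D://index//"));
                reader = IndexReader.open(dir);
            } catch (IOException e) {
                e.printStackTrace();
            }
            return reader;
        }
    }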

fzwojiic #3

Try PhraseQuery, as follows:

    BooleanQuery mainQuery = new BooleanQuery();
    String searchTerm = "config migration from asa5505 8.2 to asa5516";
    String[] strArray = searchTerm.split(" ");
    for (int index = 0; index < strArray.length; index++) {
        // One single-term PhraseQuery per word; MUST makes every word required.
        // Note: this requires all words to be present but does not enforce their order.
        PhraseQuery query1 = new PhraseQuery();
        query1.add(new Term("TITLE", strArray[index]));
        mainQuery.add(query1, BooleanClause.Occur.MUST);
    }

Then execute mainQuery.
Check this thread on Stack Overflow; it may help you use PhraseQuery for exact searches.
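
The three matching behaviours discussed in this thread can be compared with a small plain-Java sketch. It does not use Lucene; it only illustrates, on whitespace-split lowercase tokens, the difference between matching any term (the query parser's default OR behaviour for an unquoted query), matching all terms in any order (a BooleanQuery whose clauses are all MUST), and matching the terms consecutively and in order (a phrase with slop 0). The class, the helper names and the reordered title are hypothetical.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Plain-Java illustration; hypothetical helpers, NOT Lucene APIs.
class PhraseMatchSketch {

    // Lowercases and splits on whitespace.
    static List<String> tokens(String text) {
        return Arrays.asList(text.toLowerCase(Locale.ROOT).trim().split("\\s+"));
    }

    // OR semantics: at least one query term occurs in the title.
    static boolean matchesAnyTerm(List<String> title, List<String> query) {
        return !Collections.disjoint(title, query);
    }

    // AND semantics: every query term occurs, in any order.
    static boolean matchesAllTerms(List<String> title, List<String> query) {
        return title.containsAll(query);
    }

    // Phrase semantics with slop 0: the terms occur consecutively and in order.
    static boolean matchesPhrase(List<String> title, List<String> query) {
        return !query.isEmpty() && Collections.indexOfSubList(title, query) >= 0;
    }

    public static void main(String[] args) {
        List<String> query = tokens("Config migration from ASA5505 8.2 to ASA5516");
        String[] titles = {
            "Config migration from ASA5505 8.2 to ASA5516",
            "Firewall migration from ASA5585 to ASA5555",
            "ASA5516 to 8.2 ASA5505 from migration Config" // made-up reordered title
        };
        for (String t : titles) {
            List<String> title = tokens(t);
            System.out.println(t + " -> any=" + matchesAnyTerm(title, query)
                    + ", all=" + matchesAllTerms(title, query)
                    + ", phrase=" + matchesPhrase(title, query));
        }
    }
}
```

The Firewall title shares `migration`, `from` and `to` with the query, so it matches under OR semantics, which is consistent with the unexpected hits in the question. Only the phrase check rejects both the Firewall title and the reordered one.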
