How to match exact text in Lucene search?

Posted by wz1wpwve on 2022-11-07 in Lucene


I'm trying to match the text "Config migration from ASA5505 8.2 to ASA5516" in the TITLE column.
My program looks like this:

Directory directory = FSDirectory.open(indexDir);
MultiFieldQueryParser queryParser = new MultiFieldQueryParser(Version.LUCENE_35,new String[] {"TITLE"}, new StandardAnalyzer(Version.LUCENE_35));        
IndexReader reader = IndexReader.open(directory);
IndexSearcher searcher = new IndexSearcher(reader);       
queryParser.setPhraseSlop(0);
queryParser.setLowercaseExpandedTerms(true);
Query query = queryParser.parse("TITLE:Config migration from ASA5505 8.2 to ASA5516");
System.out.println(query);
TopDocs topDocs = searcher.search(query,100);
System.out.println(topDocs.totalHits);
ScoreDoc[] hits = topDocs.scoreDocs;
System.out.println(hits.length + " Record(s) Found");
for (int i = 0; i < hits.length; i++) {
    int docId = hits[i].doc;
    Document d = searcher.doc(docId);
    System.out.println("\"Title :\" " +d.get("TITLE") );
}

But it returns:

"Title :" Config migration from ASA5505 8.2 to ASA5516
"Title :" Firewall  migration from ASA5585 to  ASA5555
"Title :" Firewall  migration from ASA5585 to  ASA5555

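The last 2 results are not what I expected. What changes do I need to make to match exactly the text "Config migration from ASA5505 8.2 to ASA5516"?
My indexing function looks like this: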

public class Lucene {
public static final String INDEX_DIR = "./Lucene";
private static final String JDBC_DRIVER = "oracle.jdbc.OracleDriver";
private static final String CONNECTION_URL = "jdbc:oracle:thin:xxxxxxx";
private static final String USER_NAME = "localhost";
private static final String PASSWORD = "localhost";
private static final String QUERY = "select * from TITLE_TABLE";
public static void main(String[] args) throws Exception {
    File indexDir = new File(INDEX_DIR);
    Lucene indexer = new Lucene();
    try {
        Date start = new Date();
        Class.forName(JDBC_DRIVER).newInstance();
        Connection conn = DriverManager.getConnection(CONNECTION_URL, USER_NAME, PASSWORD);
        SimpleAnalyzer analyzer = new SimpleAnalyzer(Version.LUCENE_35);
        IndexWriterConfig indexWriterConfig = new IndexWriterConfig(Version.LUCENE_35, analyzer);
        IndexWriter indexWriter = new IndexWriter(FSDirectory.open(indexDir), indexWriterConfig);
        System.out.println("Indexing to directory '" + indexDir + "'...");
        int indexedDocumentCount = indexer.indexDocs(indexWriter, conn);
        indexWriter.close();
        System.out.println(indexedDocumentCount + " records have been indexed successfully");
        System.out.println("Total Time:" + (new Date().getTime() - start.getTime()) / (1000));
    } catch (Exception e) {
        e.printStackTrace();
    }
}
int indexDocs(IndexWriter writer, Connection conn) throws Exception {
    String sql = QUERY;
    Statement stmt = conn.createStatement();
    stmt.setFetchSize(100000);
    ResultSet rs = stmt.executeQuery(sql);
    int i = 0;
    while (rs.next()) {
        System.out.println("Adding Doc No:" + i);
        Document d = new Document();
        System.out.println(rs.getString("TITLE"));
        d.add(new Field("TITLE", rs.getString("TITLE"), Field.Store.YES, Field.Index.ANALYZED));
        d.add(new Field("NAME", rs.getString("NAME"), Field.Store.YES, Field.Index.ANALYZED));
        writer.addDocument(d);
        i++;
    }
    return i;
}
}


Source: https://stackoverflow.com/questions/37495639/how-to-match-exact-text-in-lucene-search

3 Answers


Answer 1 (dz6r00yl)

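PVR is right that a phrase query is probably the correct solution, but they left out how to use the PhraseQuery class. Since you are already using QueryParser, though, you can simply use the query parser syntax and wrap the search text in quotes: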

Query query = queryParser.parse("TITLE:\"Config migration from ASA5505 8.2 to ASA5516\"");
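Why the Firewall titles came back for the unquoted query: in TITLE:Config migration from ASA5505 8.2 to ASA5516, the TITLE: prefix binds only to Config. The remaining words go to the fields passed to MultiFieldQueryParser (only TITLE here), and QueryParser joins the clauses with OR by default, so any title containing migration or from is a hit. Quoting the text turns it into one phrase. The toy program below models the two behaviours with the Java standard library only; analyze, matchesAnyTerm and matchesPhrase are hypothetical helpers, and the tokenization is a deliberate simplification, not Lucene's.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

public class OrVersusPhrase {
    // Toy analyzer: lowercase, split on whitespace, drop the stop word "to".
    // (StandardAnalyzer does remove "to"; this is not Lucene's real tokenizer.)
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            if (!t.isEmpty() && !t.equals("to")) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Unquoted query with the default OR operator: one shared term is enough.
    static boolean matchesAnyTerm(List<String> doc, List<String> query) {
        for (String q : query) {
            if (doc.contains(q)) {
                return true;
            }
        }
        return false;
    }

    // Quoted query with slop 0: the terms must appear in order, with nothing between them.
    static boolean matchesPhrase(List<String> doc, List<String> query) {
        return Collections.indexOfSubList(doc, query) >= 0;
    }

    public static void main(String[] args) {
        List<String> query = analyze("Config migration from ASA5505 8.2 to ASA5516");
        String[] titles = {
            "Config migration from ASA5505 8.2 to ASA5516",
            "Firewall  migration from ASA5585 to  ASA5555"
        };
        for (String title : titles) {
            List<String> doc = analyze(title);
            // Prints: <matches unquoted query> <matches quoted query>  <title>
            System.out.println(matchesAnyTerm(doc, query) + " " + matchesPhrase(doc, query) + "  " + title);
        }
    }
}
```

Running it should print true true for the Config title and true false for the Firewall title, which mirrors the extra hits in the question.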

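Based on your update, you are using different analyzers at index time and at query time. SimpleAnalyzer and StandardAnalyzer tokenize text differently: SimpleAnalyzer keeps only runs of letters, so ASA5505 is indexed as asa and 8.2 is dropped entirely, while StandardAnalyzer keeps asa5505 and 8.2 as tokens and removes English stop words such as to. Unless you have a very good reason not to, use the same analysis at index time and at query time.
So change the analyzer in your indexing code to StandardAnalyzer (or, conversely, use SimpleAnalyzer at query time), and you should see better results.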
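To see why mixing analyzers breaks the phrase, here is a rough standard-library model of both tokenizations applied to the title. simpleLike approximates SimpleAnalyzer (letters only, lowercased); standardLike is only a crude stand-in for StandardAnalyzer that happens to give the same tokens for this particular title. Both names, and the short stop-word list, are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class AnalyzerMismatch {
    // Small subset of the English stop words StandardAnalyzer removes; the real list is longer.
    private static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "and", "of", "the", "to"));

    // Rough model of SimpleAnalyzer: a token is a run of letters, lowercased.
    // Digits and punctuation only separate tokens and are thrown away.
    static List<String> simpleLike(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (Character.isLetter(c)) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    // Crude stand-in for StandardAnalyzer, adequate for this title only: lowercase,
    // split on whitespace, drop stop words. The real tokenizer follows Unicode
    // word-break rules, which also keep "asa5505" and "8.2" whole.
    static List<String> standardLike(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            if (!t.isEmpty() && !STOP_WORDS.contains(t)) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String title = "Config migration from ASA5505 8.2 to ASA5516";
        List<String> indexed = simpleLike(title);   // index side in the question
        List<String> queried = standardLike(title); // query side in the question
        System.out.println(indexed); // [config, migration, from, asa, to, asa]
        System.out.println(queried); // [config, migration, from, asa5505, 8.2, asa5516]
        // Mixed analyzers: the query-side phrase never occurs in the index-side tokens.
        System.out.println(Collections.indexOfSubList(indexed, queried) >= 0); // false
        // Same analyzer on both sides: the phrase is found.
        System.out.println(Collections.indexOfSubList(standardLike(title), queried) >= 0); // true
    }
}
```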

2022-11-07

Answer 2 (f0ofjuux)

Here is code I wrote for you that works perfectly:
Use: queryParser.parse("\"Config migration from ASA5505 8.2 to ASA5516\"");
1. Building the index

import java.io.File;
import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

// Hypothetical class name: the original snippet left out the class header.
public class TitleIndexer {

public static void main(String[] args)
{
    IndexWriter writer = getIndexWriter();
    Document doc = new Document();
    Document doc1 = new Document();
    Document doc2 = new Document();
    doc.add(new Field("TITLE", "Config migration from ASA5505 8.2 to ASA5516",Field.Store.YES,Field.Index.ANALYZED));
    doc1.add(new Field("TITLE", "Firewall  migration from ASA5585 to ASA5555",Field.Store.YES,Field.Index.ANALYZED));
    doc2.add(new Field("TITLE", "Firewall  migration from ASA5585 to ASA5555",Field.Store.YES,Field.Index.ANALYZED));
    try
    {
        writer.addDocument(doc);
        writer.addDocument(doc1);
        writer.addDocument(doc2);
        writer.close();
    } catch (IOException e) {
        e.printStackTrace();
    }
}
public static IndexWriter getIndexWriter()
{
    IndexWriter indexWriter=null;
    try
    {
        File file=new File("D://index//");
        if(!file.exists())
            file.mkdir();
        // StandardAnalyzer here must match the analyzer used by the search code
        IndexWriterConfig conf=new IndexWriterConfig(Version.LUCENE_34, new StandardAnalyzer(Version.LUCENE_34));
        Directory directory=FSDirectory.open(file);
        indexWriter=new IndexWriter(directory, conf);
    } catch (IOException e) {
        e.printStackTrace();
    }
    return indexWriter;
}

}
2. Searching for the string

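import java.io.File;
import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

// Hypothetical class name: the original snippet left out the class header.
public class TitleSearcher {

public static void main(String[] args)
{
    IndexReader reader=getIndexReader();
    IndexSearcher searcher = new IndexSearcher(reader);
    // Same analyzer as at index time
    QueryParser parser = new QueryParser(Version.LUCENE_34, "TITLE" ,new StandardAnalyzer(Version.LUCENE_34));
    Query query;
    try
    {
        // The quotes make the parser build a phrase query
        query = parser.parse("\"Config migration from ASA5505 8.2 to ASA5516\"");
        TopDocs hits = searcher.search(query,3);
        ScoreDoc[] document = hits.scoreDocs;
        for(int i=0;i<document.length;i++)
        {
            // Load each hit by its document ID, not by the loop index
            Document doc = searcher.doc(document[i].doc);
            System.out.println("TITLE=" + doc.get("TITLE"));
        }
        searcher.close();
        // A searcher built from a reader does not close that reader itself
        reader.close();
    }
    catch (Exception e)
    {
        e.printStackTrace();
    }
}
public static IndexReader getIndexReader()
{
    IndexReader reader=null;
    Directory dir;
    try
    {
        dir = FSDirectory.open(new File("D://index//"));
        reader=IndexReader.open(dir);
    } catch (IOException e)
    {
        e.printStackTrace();
    }
    return reader;
}

}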



Answer 3 (fzwojiic)

Try PhraseQuery, like this:

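import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.PhraseQuery;

// One PhraseQuery holding every term in order. (A separate single-term PhraseQuery
// per word would only require all words to appear, in any order.)
// Terms must equal the indexed tokens: lowercased, and stop words such as "to"
// are not in the index if StandardAnalyzer was used when indexing.
BooleanQuery mainQuery = new BooleanQuery();
PhraseQuery phraseQuery = new PhraseQuery();
String searchTerm = "config migration from asa5505 8.2 to asa5516";
String[] strArray = searchTerm.split(" ");
for (int index = 0; index < strArray.length; index++) {
    phraseQuery.add(new Term("TITLE", strArray[index]));
}
mainQuery.add(phraseQuery, BooleanClause.Occur.MUST);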

Then execute mainQuery.
Also check this thread on Stack Overflow; it may help you do exact searches with PhraseQuery.
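Whether the words go into one PhraseQuery or into separate MUST clauses makes a real difference: separate single-term clauses only require every word to occur somewhere in the title, in any order, while a single PhraseQuery with slop 0 requires them in order and adjacent. The standard-library sketch below models both checks; tokens, allTermsPresent and exactPhrase are hypothetical helpers, and the second title is an invented example with the same words reordered.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

public class AllTermsVersusPhrase {
    // Toy tokenizer: lowercase and split on whitespace (no stop-word removal).
    static List<String> tokens(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out;
    }

    // One MUST clause per word: every word must occur somewhere, order ignored.
    static boolean allTermsPresent(List<String> doc, List<String> query) {
        return doc.containsAll(query);
    }

    // One phrase with slop 0: the words must occur in order, next to each other.
    static boolean exactPhrase(List<String> doc, List<String> query) {
        return Collections.indexOfSubList(doc, query) >= 0;
    }

    public static void main(String[] args) {
        List<String> query = tokens("config migration from asa5505 8.2 to asa5516");
        String[] titles = {
            "Config migration from ASA5505 8.2 to ASA5516",
            "Migration to ASA5516 from ASA5505 8.2 config" // hypothetical reordered title
        };
        for (String title : titles) {
            List<String> doc = tokens(title);
            // Prints: <all terms present> <exact phrase>  <title>
            System.out.println(allTermsPresent(doc, query) + " " + exactPhrase(doc, query) + "  " + title);
        }
    }
}
```

Both titles satisfy the all-terms check, but only the first satisfies the phrase check.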

