Usage and code examples for the org.htmlparser.Parser.visitAllNodesWith() method

Reposted by x33g5p2x on 2022-01-26, category: Other

This article collects code examples for the Java method org.htmlparser.Parser.visitAllNodesWith() and shows how it is used in practice. The examples come mainly from GitHub, Stack Overflow, and Maven, and were extracted from a selection of projects. Details of the method:
Qualified class name: org.htmlparser.Parser
Class name: Parser
Method name: visitAllNodesWith

About Parser.visitAllNodesWith

Apply the given visitor to the current page. The visitor is passed to the accept() method of each node in the page in a depth-first traversal. The visitor's beginParsing() method is called before the page is processed, and finishedParsing() is called afterwards.
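The call order described above can be illustrated with a small, library-free model. This is only a sketch under stated assumptions: `SketchNode`, `Visitor`, and the static `visitAllNodesWith` below are hypothetical stand-ins written for this illustration, not the org.htmlparser classes, and the real library distinguishes more callbacks (tags, end tags, text, remarks) than the single `visit` used here.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified stand-in for the traversal contract; NOT the org.htmlparser classes. */
public class VisitorTraversalSketch {

    /** Hypothetical visitor interface mirroring the callbacks named in the description. */
    interface Visitor {
        void beginParsing();
        void visit(SketchNode node);
        void finishedParsing();
    }

    /** Hypothetical node type: a name plus child nodes. */
    static final class SketchNode {
        final String name;
        final List<SketchNode> children = new ArrayList<>();

        SketchNode(String name, SketchNode... kids) {
            this.name = name;
            for (SketchNode kid : kids) {
                children.add(kid);
            }
        }

        /** Hands this node to the visitor, then recurses: parent before children. */
        void accept(Visitor visitor) {
            visitor.visit(this);
            for (SketchNode child : children) {
                child.accept(visitor);
            }
        }
    }

    /** Models the described order: beginParsing, depth-first accept, finishedParsing. */
    static void visitAllNodesWith(List<SketchNode> topLevel, Visitor visitor) {
        visitor.beginParsing();
        for (SketchNode node : topLevel) {
            node.accept(visitor);
        }
        visitor.finishedParsing();
    }

    /** Runs the sketch on a tiny tree and returns the recorded callback order. */
    public static List<String> run() {
        final List<String> events = new ArrayList<>();
        SketchNode html = new SketchNode("html",
                new SketchNode("head", new SketchNode("title")),
                new SketchNode("body", new SketchNode("p")));
        List<SketchNode> page = new ArrayList<>();
        page.add(html);
        visitAllNodesWith(page, new Visitor() {
            @Override public void beginParsing() { events.add("beginParsing"); }
            @Override public void visit(SketchNode node) { events.add(node.name); }
            @Override public void finishedParsing() { events.add("finishedParsing"); }
        });
        return events;
    }

    public static void main(String[] args) {
        // prints [beginParsing, html, head, title, body, p, finishedParsing]
        System.out.println(run());
    }
}
```

The recorded order shows the point of the contract: a visitor can set up state in beginParsing(), accumulate results node by node, and finalize them in finishedParsing(), which is the pattern the examples below rely on.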

Code examples

Example source: com.bbossgroups/bboss-htmlparser

    /**
     * Extract the text from a page.
     * @return The textual contents of the page.
     * @exception ParserException If a parse error occurs.
     */
    protected String extractStrings() throws ParserException {
        String ret;
        // This object is passed as the visitor for every node in the page.
        mParser.visitAllNodesWith(this);
        ret = mBuffer.toString();
        // Start with a fresh buffer for the next extraction.
        mBuffer = new StringBuilder(4096);
        return ret;
    }

Example source: org.htmlparser/htmlparser

    /**
     * Extract the text from a page.
     * @return The textual contents of the page.
     * @exception ParserException If a parse error occurs.
     */
    protected String extractStrings() throws ParserException {
        String ret;
        mCollapseState = 0;
        // This object is passed as the visitor for every node in the page.
        mParser.visitAllNodesWith(this);
        ret = mBuffer.toString();
        // Start with a fresh buffer for the next extraction.
        mBuffer = new StringBuffer(4096);
        return ret;
    }

Example source: fhopf/akka-crawler-example

    @Override
    public PageContent fetchPageContent(String url) {
        logger.debug("Fetching {}", url);
        try {
            Parser parser = new Parser(url);
            PageContentVisitor visitor = new PageContentVisitor(baseUrl, url);
            parser.visitAllNodesWith(visitor);
            return visitor.getContent();
        } catch (ParserException ex) {
            throw new IllegalStateException(ex);
        }
    }

Example source: omegat-org/omegat

    @Override
    public void processFile(BufferedReader infile, BufferedWriter outfile, FilterContext fc)
            throws IOException, TranslationException {
        StringBuilder all = null;
        try {
            all = new StringBuilder();
            char[] cbuf = new char[1000];
            int len = -1;
            while ((len = infile.read(cbuf)) > 0) {
                all.append(cbuf, 0, len);
            }
        } catch (OutOfMemoryError e) {
            // out of memory?
            all = null;
            System.gc();
            throw new IOException(OStrings.getString("HHC__FILE_TOO_BIG"));
        }
        Parser parser = new Parser();
        try {
            parser.setInputHTML(all.toString());
            parser.visitAllNodesWith(new HHCFilterVisitor(this, outfile));
        } catch (ParserException pe) {
            System.out.println(pe);
        }
    }

Example source: oaqa/knn4qa

    public PostCleaner(String html, int minCodeChars, boolean excludeCode) {
        try {
            Parser htmlParser = Parser.createParser(html, "utf8");
            PostCleanerVisitor res = new PostCleanerVisitor(minCodeChars, excludeCode);
            htmlParser.visitAllNodesWith(res);
            mText = res.getText();
        } catch (ParserException e) {
            System.err.println(" Parser exception: " + e + " trying simple conversion");
            // Plan B!!!
            mText = PostCleanerVisitor.simpleProc(html);
        }
    }

Example source: org.htmlparser/htmlparser

    mParser.visitAllNodesWith(this);
    updateStrings(mBuffer.toString());
    mBuffer = new StringBuffer(4096);
    mCollapseState = 0;
    mParser.visitAllNodesWith(this);
    updateStrings(mBuffer.toString());

Example source: org.exoplatform.core/exo.core.component.document

    parser.visitAllNodesWith(sb);

Example source: com.bbossgroups/bboss-htmlparser

    mParser.visitAllNodesWith(this);
    updateStrings(mBuffer.toString());
    mParser.visitAllNodesWith(this);
    updateStrings(mBuffer.toString());

Example source: com.bbossgroups.pdp/pdp-cms

    parser.visitAllNodesWith(this);

Example source: org.opencms/opencms-core

    /**
     * @see org.opencms.util.I_CmsHtmlNodeVisitor#process(java.lang.String, java.lang.String)
     */
    public String process(String html, String encoding) throws ParserException {
        m_result = new StringBuffer();
        Parser parser = new Parser();
        Lexer lexer = new Lexer();
        // initialize the page with the given char set
        Page page = new Page(html, encoding);
        lexer.setPage(page);
        parser.setLexer(lexer);
        if ((m_noAutoCloseTags != null) && (m_noAutoCloseTags.size() > 0)) {
            // Degrade composite tags that have children in the DOM tree to simple
            // single tags. This allows the tag to finish with opened HTML tags
            // without the parser generating the closing tags.
            PrototypicalNodeFactory factory = configureNoAutoCorrectionTags();
            lexer.setNodeFactory(factory);
        }
        // process the page using the given visitor
        parser.visitAllNodesWith(this);
        // return the result
        return getResult();
    }

Example source: org.opencms/opencms-solr

    /**
     * @see org.opencms.util.I_CmsHtmlNodeVisitor#process(java.lang.String, java.lang.String)
     */
    public String process(String html, String encoding) throws ParserException {
        m_result = new StringBuffer();
        Parser parser = new Parser();
        Lexer lexer = new Lexer();
        // initialize the page with the given char set
        Page page = new Page(html, encoding);
        lexer.setPage(page);
        parser.setLexer(lexer);
        if (m_noAutoCloseTags != null && m_noAutoCloseTags.size() > 0) {
            // Degrade composite tags that have children in the DOM tree to simple
            // single tags. This allows the tag to finish with opened HTML tags
            // without the parser generating the closing tags.
            PrototypicalNodeFactory factory = configureNoAutoCorrectionTags();
            lexer.setNodeFactory(factory);
        }
        // process the page using the given visitor
        parser.visitAllNodesWith(this);
        // return the result
        return getResult();
    }

Example source: dbiir/rainbow

    HtmlPage page = new HtmlPage(parser);
    try {
        parser.visitAllNodesWith(page);
    } catch (ParserException e) {
        log.error("visit page error:", e);
    }

Example source: org.opencms/org.opencms.workplace.tools.content

    parser.setLexer(lexer);
    parser.visitAllNodesWith(this);

Example source: org.opencms/opencms-core

    /**
     * Extract the text from a HTML page.<p>
     *
     * @param in the html content input stream
     * @param encoding the encoding of the content
     *
     * @return the extracted text from the page
     * @throws ParserException if the parsing of the HTML failed
     * @throws UnsupportedEncodingException if the given encoding is not supported
     */
    public static String extractText(InputStream in, String encoding)
            throws ParserException, UnsupportedEncodingException {
        Parser parser = new Parser();
        Lexer lexer = new Lexer();
        Page page = new Page(in, encoding);
        lexer.setPage(page);
        parser.setLexer(lexer);
        StringBean stringBean = new StringBean();
        parser.visitAllNodesWith(stringBean);
        String result = stringBean.getStrings();
        return result == null ? "" : result;
    }

Example source: org.opencms/opencms-solr

    /**
     * Extract the text from a HTML page.<p>
     *
     * @param in the html content input stream
     * @param encoding the encoding of the content
     *
     * @return the extracted text from the page
     * @throws ParserException if the parsing of the HTML failed
     * @throws UnsupportedEncodingException if the given encoding is not supported
     */
    public static String extractText(InputStream in, String encoding)
            throws ParserException, UnsupportedEncodingException {
        Parser parser = new Parser();
        Lexer lexer = new Lexer();
        Page page = new Page(in, encoding);
        lexer.setPage(page);
        parser.setLexer(lexer);
        StringBean stringBean = new StringBean();
        parser.visitAllNodesWith(stringBean);
        String result = stringBean.getStrings();
        return result == null ? "" : result;
    }

Example source: com.bbossgroups.pdp/pdp-cms

    /**
     * Extract the text from a HTML page.<p>
     *
     * @param in the html content input stream
     * @param encoding the encoding of the content
     *
     * @return the extracted text from the page
     * @throws ParserException if the parsing of the HTML failed
     * @throws UnsupportedEncodingException if the given encoding is not supported
     */
    public static String extractText(InputStream in, String encoding)
            throws ParserException, UnsupportedEncodingException {
        Parser parser = new Parser();
        Lexer lexer = new Lexer();
        Page page = new Page(in, encoding);
        lexer.setPage(page);
        parser.setLexer(lexer);
        StringBean stringBean = new StringBean();
        parser.visitAllNodesWith(stringBean);
        return stringBean.getStrings();
    }

Example source: fhopf/akka-crawler-example

    @Test
    public void testLinkExtraction() throws ParserException {
        Parser parser = new Parser("http://synyx.de");
        ObjectFindingVisitor visitor = new ObjectFindingVisitor(LinkTag.class);
        parser.visitAllNodesWith(visitor);
        Node[] links = visitor.getTags();
        // TODO this could use some more meaningful assertions
        assertTrue(links.length > 0);
        for (int i = 0; i < links.length; i++) {
            LinkTag linkTag = (LinkTag) links[i];
            System.out.print("\"" + linkTag.getLinkText() + "\" => ");
            System.out.println(linkTag.getLink());
        }
    }

Example source: org.opencms/opencms-core

    parser.visitAllNodesWith(visitor);

Example source: org.opencms/opencms-solr

    parser.visitAllNodesWith(visitor);
