Usage of the org.jsoup.Jsoup.parseBodyFragment() method, with code examples

Reposted by x33g5p2x on 2022-01-21 under Other

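This article collects Java code examples for the org.jsoup.Jsoup.parseBodyFragment() method and shows how Jsoup.parseBodyFragment() is used in real projects. The examples come mainly from GitHub, Stack Overflow and Maven artifacts. They were extracted from selected projects and should serve as a practical reference. Method details:
Fully qualified class name: org.jsoup.Jsoup
Class name: Jsoup
Method name: parseBodyFragment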

Jsoup.parseBodyFragment overview

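Javadoc: "Parse a fragment of HTML, with the assumption that it forms the body of the HTML."
In practice, jsoup builds a complete Document shell (<html>, <head>, <body>) and places the parsed fragment inside <body>. The two-argument overload, parseBodyFragment(String bodyHtml, String baseUri), also records a base URI that is later used to resolve relative URLs.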
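The following minimal sketch makes the description concrete. It assumes jsoup is on the classpath, and the class and method names are made up for illustration. It parses a small fragment and reports what the returned Document looks like: the generated <head> is empty, the fragment's first element is a direct child of <body>, and body().text() returns the fragment's text.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

class BodyFragmentDemo {
    // Parses a fragment and reports where jsoup placed it inside the generated shell.
    static String describe(String fragment) {
        Document doc = Jsoup.parseBodyFragment(fragment);
        int headNodes = doc.head().childNodeSize();      // the generated <head> stays empty
        String firstTag = doc.body().child(0).tagName(); // the fragment's first element sits directly under <body>
        return headNodes + "|" + firstTag + "|" + doc.body().text();
    }

    public static void main(String[] args) {
        System.out.println(describe("<p>Hello <b>jsoup</b></p>")); // prints 0|p|Hello jsoup
    }
}
```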

Code examples

Source: JpressProjects/jpress

    public static List<String> getImageSrcs(String html) {
        if (StrUtils.isBlank(html)) {
            return null;
        }
        List<String> list = new ArrayList<String>();
        Document doc = Jsoup.parseBodyFragment(html);
        Elements es = doc.select("img");
        if (es != null && es.size() > 0) {
            for (Element e : es) {
                String src = e.attr("src");
                if (StrUtils.isNotBlank(src)) list.add(src);
            }
        }
        return list.isEmpty() ? null : list;
    }

Source: JpressProjects/jpress

    public static String getFirstImageSrc(String html) {
        if (StrUtils.isBlank(html))
            return null;
        Elements es = Jsoup.parseBodyFragment(html).select("img");
        if (es != null && es.size() > 0) {
            String src = es.first().attr("src");
            return StrUtils.isBlank(src) ? null : src;
        }
        return null;
    }

Source: RipMeApp/ripme

    @Override
    public List<String> getURLsFromJSON(JSONObject json) {
        List<String> imageURLs = new ArrayList<>();
        JSONArray results = json.getJSONObject("content").getJSONArray("results");
        for (int i = 0; i < results.length(); i++) {
            Document doc = Jsoup.parseBodyFragment(results.getJSONObject(i).getString("html"));
            if (doc.html().contains("ismature")) {
                LOGGER.info("Downloading nsfw image");
                String nsfwImage = getFullsizedNSFWImage(doc.select("span").attr("href"));
                if (nsfwImage != null && nsfwImage.startsWith("http")) {
                    imageURLs.add(nsfwImage);
                }
            }
            try {
                String imageURL = doc.select("span").first().attr("data-super-full-img");
                if (!imageURL.isEmpty() && imageURL.startsWith("http")) {
                    imageURLs.add(imageURL);
                }
            } catch (NullPointerException e) {
                LOGGER.info(i + " does not contain any images");
            }
        }
        return imageURLs;
    }
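The snippet above catches a NullPointerException when select("span").first() finds nothing. The sketch below uses a null check instead. It assumes a jsoup version that provides Element.selectFirst. FullImageLookup and fullImageUrl are illustrative names, not part of ripme.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

class FullImageLookup {
    // Returns the data-super-full-img URL of the first <span>, or null if there is no usable one.
    static String fullImageUrl(String html) {
        Document doc = Jsoup.parseBodyFragment(html);
        Element span = doc.selectFirst("span"); // null (not an exception) when nothing matches
        if (span == null) {
            return null;
        }
        String url = span.attr("data-super-full-img"); // "" when the attribute is absent
        return url.startsWith("http") ? url : null;
    }

    public static void main(String[] args) {
        System.out.println(fullImageUrl("<span data-super-full-img=\"https://example.com/a.jpg\"></span>"));
        System.out.println(fullImageUrl("<div>no images here</div>")); // prints null
    }
}
```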

Source: ankidroid/Anki-Android

    /**
     * Returns the list of text snippets contained in the given HTML fragment that should be read
     * using the Android text-to-speech engine, together with the languages they are in.
     * <p>
     * Each returned LocalisedText object contains the text extracted from a &lt;tts&gt; element
     * whose 'service' attribute is set to 'android', and the localeCode taken from the 'voice'
     * attribute of that element. This holds unless the HTML fragment contains no such &lt;tts&gt;
     * elements; in that case the function returns a single LocalisedText object containing the
     * text extracted from the whole HTML fragment, with the localeCode set to an empty string.
     */
    public static List<LocalisedText> getTextsToRead(String html) {
        List<LocalisedText> textsToRead = new ArrayList<>();
        Element elem = Jsoup.parseBodyFragment(html).body();
        parseTtsElements(elem, textsToRead);
        if (textsToRead.size() == 0) {
            // No <tts service="android"> elements found: return the text of the whole HTML fragment
            textsToRead.add(new LocalisedText(elem.text()));
        }
        return textsToRead;
    }
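The Javadoc describes the selection rule clearly enough to express it with a CSS attribute selector. This is a hypothetical simplification and not AnkiDroid's parseTtsElements, which is not shown. It returns "voice|text" strings instead of LocalisedText objects, and it ignores any nesting rules the real implementation may apply.

```java
import java.util.ArrayList;
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;

class TtsSketch {
    // Hypothetical sketch: "voice|text" pairs stand in for LocalisedText objects.
    static List<String> textsToRead(String html) {
        Element body = Jsoup.parseBodyFragment(html).body();
        List<String> result = new ArrayList<>();
        for (Element tts : body.select("tts[service=android]")) {
            result.add(tts.attr("voice") + "|" + tts.text());
        }
        if (result.isEmpty()) {
            // No <tts service="android"> element: fall back to the whole fragment with an empty locale
            result.add("|" + body.text());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(textsToRead("<tts service=\"android\" voice=\"en_US\">Hello</tts> and more"));
        System.out.println(textsToRead("<p>Just text</p>"));
    }
}
```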

Source: RipMeApp/ripme

    // Excerpt from a larger method; doc, body and the surrounding logic are defined elsewhere
    doc = Jsoup.parseBodyFragment(body);
    List<Element> elements = doc.select("a");
    Set<String> photoIDsToGet = new HashSet<>();

Source: org.jsoup/jsoup

    /**
     * Get safe HTML from untrusted input HTML, by parsing input HTML and filtering it through a white-list of permitted
     * tags and attributes.
     * @param bodyHtml input untrusted HTML (body fragment)
     * @param baseUri URL to resolve relative URLs against
     * @param whitelist white-list of permitted HTML elements
     * @return safe HTML (body fragment)
     * @see Cleaner#clean(Document)
     */
    public static String clean(String bodyHtml, String baseUri, Whitelist whitelist) {
        Document dirty = parseBodyFragment(bodyHtml, baseUri);
        Cleaner cleaner = new Cleaner(whitelist);
        Document clean = cleaner.clean(dirty);
        return clean.body().html();
    }
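Jsoup.clean also has a shorter overload that takes only the HTML and the list of permitted tags. The sketch below assumes a newer jsoup release (1.14.1 or later), in which the Whitelist class has been replaced by the equivalent Safelist class. CleanSketch is an illustrative name.

```java
import org.jsoup.Jsoup;
import org.jsoup.safety.Safelist;

class CleanSketch {
    // Keeps only tags/attributes allowed by Safelist.basic(); the <script> and the onclick handler are dropped.
    static String sanitize(String untrustedHtml) {
        return Jsoup.clean(untrustedHtml, Safelist.basic());
    }

    public static void main(String[] args) {
        System.out.println(sanitize("<p onclick=\"steal()\">Hi <script>alert(1)</script><b>there</b></p>"));
    }
}
```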

Source: JpressProjects/jpress

    if (/* condition not shown in the source */) {
        document = Jsoup.parseBodyFragment(html, params.getBaseUri());
    } else {
        document = Jsoup.parseBodyFragment(html);
    }
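The base URI passed to the two-argument overload is what lets relative links be resolved later. Here is a minimal sketch using Element.absUrl. BaseUriSketch is a made-up name, and example.com is a placeholder host.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

class BaseUriSketch {
    // Returns the href of the first <a>, resolved against baseUri ("" if it cannot be made absolute).
    static String absoluteHref(String html, String baseUri) {
        Document doc = Jsoup.parseBodyFragment(html, baseUri);
        Element link = doc.selectFirst("a");
        return link == null ? "" : link.absUrl("href");
    }

    public static void main(String[] args) {
        // prints https://example.com/docs/intro.html
        System.out.println(absoluteHref("<a href=\"/docs/intro.html\">Intro</a>", "https://example.com/blog/"));
    }
}
```

With an empty base URI, which is what the one-argument overload uses, absUrl cannot resolve a relative href and returns an empty string.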

Source: org.jsoup/jsoup

    /**
     * Get safe HTML from untrusted input HTML, by parsing input HTML and filtering it through a white-list of
     * permitted tags and attributes.
     * <p>The HTML is treated as a body fragment; it's expected the cleaned HTML will be used within the body of an
     * existing document. If you want to clean full documents, use {@link Cleaner#clean(Document)} instead, and add
     * structural tags (<code>html, head, body</code> etc) to the whitelist.
     *
     * @param bodyHtml input untrusted HTML (body fragment)
     * @param baseUri URL to resolve relative URLs against
     * @param whitelist white-list of permitted HTML elements
     * @param outputSettings document output settings; use to control pretty-printing and entity escape modes
     * @return safe HTML (body fragment)
     * @see Cleaner#clean(Document)
     */
    public static String clean(String bodyHtml, String baseUri, Whitelist whitelist, Document.OutputSettings outputSettings) {
        Document dirty = parseBodyFragment(bodyHtml, baseUri);
        Cleaner cleaner = new Cleaner(whitelist);
        Document clean = cleaner.clean(dirty);
        clean.outputSettings(outputSettings);
        return clean.body().html();
    }
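The outputSettings parameter controls how the cleaned body is serialized. The sketch below again assumes a jsoup release that has Safelist. It turns pretty-printing off so that sibling list items stay on one line instead of being re-indented. CompactCleanSketch is an illustrative name.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.safety.Safelist;

class CompactCleanSketch {
    // Cleans untrusted HTML with pretty-printing disabled, so no newlines or indentation are added.
    static String cleanCompact(String untrustedHtml) {
        Document.OutputSettings settings = new Document.OutputSettings().prettyPrint(false);
        return Jsoup.clean(untrustedHtml, "", Safelist.basic(), settings);
    }

    public static void main(String[] args) {
        System.out.println(cleanCompact("<ul><li>one</li><li onclick=\"x()\">two</li></ul>"));
    }
}
```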

Source: TakWolf/CNode-Material-Design

    public static Document handleHtml(String html) {
        // Make sure html is not null
        html = TextUtils.isEmpty(html) ? "" : html;
        // Filter out XSS
        return cleaner.clean(Jsoup.parseBodyFragment(html, ApiDefine.HOST_BASE_URL));
    }

Source: jbake-org/jbake

    /**
     * Image paths are specified with respect to the assets folder. This function prefixes the site host to every img src
     * except those that start with http:// or https://.
     * <p>
     * If an image path starts with "./", i.e. it is relative to the source file, the function first replaces that with the
     * output file directory and then adds the site host.
     *
     * @param fileContents Map representing file contents
     * @param configuration Configuration object
     */
    public static void fixImageSourceUrls(Map<String, Object> fileContents, JBakeConfiguration configuration) {
        String htmlContent = fileContents.get(Attributes.BODY).toString();
        boolean prependSiteHost = configuration.getImgPathPrependHost();
        String siteHost = configuration.getSiteHost();
        String uri = getDocumentUri(fileContents);
        Document document = Jsoup.parseBodyFragment(htmlContent);
        Elements allImgs = document.getElementsByTag("img");
        for (Element img : allImgs) {
            transformImageSource(img, uri, siteHost, prependSiteHost);
        }
        // Use body().html() to avoid emitting the <body></body> wrapper added when parsing the fragment.
        fileContents.put(Attributes.BODY, document.body().html());
    }
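The comment about body().html() is the key point whenever a fragment goes in and a fragment must come out. The sketch below is a simplified, hypothetical stand-in for jbake's transformImageSource, which is not shown in the excerpt. It only prefixes a host to img src values that are not already absolute. ImagePrefixSketch and prefixImageSources are illustrative names.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

class ImagePrefixSketch {
    // Hypothetical helper: prepends host to every img src that does not start with http:// or https://.
    static String prefixImageSources(String html, String host) {
        Document doc = Jsoup.parseBodyFragment(html);
        for (Element img : doc.getElementsByTag("img")) {
            String src = img.attr("src");
            if (!src.startsWith("http://") && !src.startsWith("https://")) {
                img.attr("src", host + src);
            }
        }
        // body().html() returns just the fragment; doc.html() would also emit <html>, <head> and <body>
        return doc.body().html();
    }

    public static void main(String[] args) {
        System.out.println(prefixImageSources(
                "<p><img src=\"img/a.png\"><img src=\"https://cdn.example.org/b.png\"></p>",
                "https://example.com/"));
    }
}
```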

Source: HubSpot/jinjava

    @Override
    public Object filter(Object object, JinjavaInterpreter interpreter, String... arg) {
        if (!(object instanceof String)) {
            return object;
        }
        String val = interpreter.renderFlat((String) object);
        String strippedVal = Jsoup.parseBodyFragment(val).text();
        String normalizedVal = WHITESPACE.matcher(strippedVal).replaceAll(" ");
        return normalizedVal;
    }
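Because text() discards markup and decodes entities, parseBodyFragment(...).text() works as a simple tag stripper. This sketch leaves out the jinjava rendering step. The WHITESPACE pattern is assumed to be \s+, because the excerpt does not show its definition.

```java
import java.util.regex.Pattern;
import org.jsoup.Jsoup;

class StripTagsSketch {
    // Assumed definition; the real WHITESPACE constant in jinjava is not shown in the excerpt.
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    static String stripTags(String html) {
        String text = Jsoup.parseBodyFragment(html).text(); // drops tags and decodes entities such as &amp;
        return WHITESPACE.matcher(text).replaceAll(" ");
    }

    public static void main(String[] args) {
        System.out.println(stripTags("<p>Fish &amp; <b>chips</b></p>")); // prints Fish & chips
    }
}
```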

Source: andriusvelykis/reflow-maven-skin

    /**
     * Parses body fragment to the {@code <body>} element.
     *
     * @param content the HTML body fragment to parse
     * @return the {@code body} element of the parsed content
     */
    private Element parseContent(String content) {
        Document doc = Jsoup.parseBodyFragment(content);
        doc.outputSettings().charset(outputEncoding);
        return doc.body();
    }

Source: HubSpot/jinjava

    @Test
    public void urlizeText() {
        Document dom = Jsoup.parseBodyFragment(jinjava.render("{{ txt|urlize }}", new HashMap<String, Object>()));
        assertThat(dom.select("a")).hasSize(3);
        assertThat(dom.select("a").get(0).attr("href")).isEqualTo("http://www.espn.com");
        assertThat(dom.select("a").get(1).attr("href")).isEqualTo("http://yahoo.com");
        assertThat(dom.select("a").get(2).attr("href")).isEqualTo("https://hubspot.com");
    }

Source: IQSS/dataverse

    public static String prettyPrint(String ugly) {
        Document doc = Jsoup.parseBodyFragment(ugly);
        doc.outputSettings().indentAmount(2);
        return doc.body().html();
    }
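The sketch below is a variant of the same idea that takes the indent width as a parameter. IndentSketch is an illustrative name. Pretty-printing is enabled by default, so only the indent amount needs to be set. The exact line layout may differ between jsoup versions, so treat the printed output as indicative.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

class IndentSketch {
    // Same approach as prettyPrint above, with a configurable indent width.
    static String prettyPrint(String ugly, int indent) {
        Document doc = Jsoup.parseBodyFragment(ugly);
        doc.outputSettings().indentAmount(indent); // pretty-printing itself is on by default
        return doc.body().html();
    }

    public static void main(String[] args) {
        System.out.println(prettyPrint("<ul><li>a</li><li>b</li></ul>", 4));
    }
}
```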

Source: HubSpot/jinjava

    @Test
    public void testSimpleSlice() throws Exception {
        Document dom = Jsoup.parseBodyFragment(
                jinjava.render(
                        Resources.toString(Resources.getResource("filter/slice-filter.jinja"), StandardCharsets.UTF_8),
                        ImmutableMap.of("items", (Object) Lists.newArrayList("a", "b", "c", "d", "e", "f", "g"))));
        assertThat(dom.select(".columwrapper ul")).hasSize(3);
        assertThat(dom.select(".columwrapper .column-1 li")).hasSize(3);
        assertThat(dom.select(".columwrapper .column-2 li")).hasSize(3);
        assertThat(dom.select(".columwrapper .column-3 li")).hasSize(3);
    }

Source: com.hubspot.jinjava/jinjava

    @Test
    public void testSimpleFn() {
        Document dom = Jsoup.parseBodyFragment(interpreter.render(fixture("simple")));
        assertThat(dom.select("div h2").text().trim()).isEqualTo("Hello World");
        assertThat(dom.select("div.contents").text().trim()).isEqualTo("This is a simple dialog rendered by using a macro and a call block.");
    }

Source: dstl/baleen

    @Test
    public void testMixedEmpty() {
        Document doc = Jsoup.parseBodyFragment("<p></p><div></div><p>Hello</p>");
        m.manipulate(doc);
        // expected value first, actual value second
        assertEquals(1, doc.body().select("p").size());
    }

Source: HubSpot/jinjava

    @Test
    public void forLoopUsingLoopLastVar() {
        context.put("the_list", Lists.newArrayList(1L, 2L, 3L, 7L));
        TagNode tagNode = (TagNode) fixture("loop-last-var");
        Document dom = Jsoup.parseBodyFragment(tag.interpret(tagNode, interpreter));
        assertThat(dom.select("h3")).hasSize(3);
    }

Source: dstl/baleen

    @Test
    public void testSubjectHeading() {
        Document document =
                Jsoup.parseBodyFragment(
                        "<p><b>THIS IS A SUBJECT HEADING</b></p><p>THIS IS A NOT SUBJECT HEADING</p><p>THIS IS not a SUBJECT HEADING</p><p>THIS IS NOT A SUBJECT HEADING EITHER.</p>");
        manipulator.manipulate(document);
        Elements h1s = document.select("h1");
        assertEquals(1, h1s.size());
        assertEquals("THIS IS A SUBJECT HEADING", h1s.first().text());
    }

Source: dstl/baleen

    @Test
    public void test() {
        Document doc =
                Jsoup.parseBodyFragment(
                        "<header>this</header><header></header><p>This is some text</p><footer>other</footer>");
        m.manipulate(doc);
        assertTrue(doc.body().select("header,footer").isEmpty());
    }
