Usage and code examples for the org.jsoup.Jsoup.parseBodyFragment() method

x33g5p2x, reposted on 2022-01-21 under Other

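This article collects code examples for the Java method org.jsoup.Jsoup.parseBodyFragment() and shows how Jsoup.parseBodyFragment() is used in practice. The examples were extracted from selected projects found on platforms such as GitHub, Stack Overflow, and Maven, so they serve as practical reference material. The details of Jsoup.parseBodyFragment() are as follows:
Package: org.jsoup
Class name: Jsoup
Method name: parseBodyFragment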

Introduction to Jsoup.parseBodyFragment

Parse a fragment of HTML, with the assumption that it forms the body of the HTML.
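
As a rough illustration of that description, the sketch below parses a fragment and reads it back through the body of the resulting Document. It assumes jsoup is on the classpath. The class name ParseBodyFragmentSketch, its two helper methods, and the sample markup are invented for this example and are not taken from any of the projects quoted in this article.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class ParseBodyFragmentSketch {

    // Hypothetical helper: text content of the fragment, with tags stripped.
    static String fragmentText(String fragment) {
        Document doc = Jsoup.parseBodyFragment(fragment);
        // The parsed fragment is reachable through the body of the new Document.
        return doc.body().text();
    }

    // Hypothetical helper: number of <img> elements found in the fragment.
    static int countImages(String fragment) {
        return Jsoup.parseBodyFragment(fragment).select("img").size();
    }

    public static void main(String[] args) {
        String fragment = "<p>Hello <b>world</b></p><img src=\"a.png\"><img src=\"b.png\">";
        System.out.println(fragmentText(fragment));
        System.out.println(countImages(fragment));
    }
}
```

Because the input is treated as body content, callers usually read results from doc.body() or from a selector, as most of the examples in this article do.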

Code examples

Code example source: JpressProjects/jpress

public static List<String> getImageSrcs(String html) {
  if (StrUtils.isBlank(html)) {
    return null;
  }
  List<String> list = new ArrayList<String>();
  Document doc = Jsoup.parseBodyFragment(html);
  Elements es = doc.select("img");
  if (es != null && es.size() > 0) {
    for (Element e : es) {
      String src = e.attr("src");
      if (StrUtils.isNotBlank(src)) list.add(src);
    }
  }
  return list.isEmpty() ? null : list;
}

Code example source: JpressProjects/jpress

public static String getFirstImageSrc(String html) {
  if (StrUtils.isBlank(html))
    return null;
  Elements es = Jsoup.parseBodyFragment(html).select("img");
  if (es != null && es.size() > 0) {
    String src = es.first().attr("src");
    return StrUtils.isBlank(src) ? null : src;
  }
  return null;
}

Code example source: RipMeApp/ripme

@Override
public List<String> getURLsFromJSON(JSONObject json) {
  List<String> imageURLs = new ArrayList<>();
  JSONArray results = json.getJSONObject("content").getJSONArray("results");
  for (int i = 0; i < results.length(); i++) {
    Document doc = Jsoup.parseBodyFragment(results.getJSONObject(i).getString("html"));
    if (doc.html().contains("ismature")) {
      LOGGER.info("Downloading nsfw image");
      String nsfwImage = getFullsizedNSFWImage(doc.select("span").attr("href"));
      if (nsfwImage != null && nsfwImage.startsWith("http")) {
        imageURLs.add(nsfwImage);
      }
    }
    try {
      String imageURL = doc.select("span").first().attr("data-super-full-img");
      if (!imageURL.isEmpty() && imageURL.startsWith("http")) {
        imageURLs.add(imageURL);
      }
    } catch (NullPointerException e) {
      LOGGER.info(i + " does not contain any images");
    }
  }
  return imageURLs;
}

Code example source: ankidroid/Anki-Android

/**
 * Returns the list of text snippets contained in the given HTML fragment that should be read
 * using the Android text-to-speech engine, together with the languages they are in.
 * <p>
 * Each returned LocalisedText object contains the text extracted from a &lt;tts&gt; element
 * whose 'service' attribute is set to 'android', and the localeCode taken from the 'voice'
 * attribute of that element. This holds unless the HTML fragment contains no such &lt;tts&gt;
 * elements; in that case the function returns a single LocalisedText object containing the
 * text extracted from the whole HTML fragment, with the localeCode set to an empty string.
 */
public static List<LocalisedText> getTextsToRead(String html) {
  List<LocalisedText> textsToRead = new ArrayList<>();
  Element elem = Jsoup.parseBodyFragment(html).body();
  parseTtsElements(elem, textsToRead);
  if (textsToRead.size() == 0) {
    // No <tts service="android"> elements found: return the text of the whole HTML fragment
    textsToRead.add(new LocalisedText(elem.text()));
  }
  return textsToRead;
}

Code example source: RipMeApp/ripme

doc = Jsoup.parseBodyFragment(body);
List<Element> elements = doc.select("a");
Set<String> photoIDsToGet = new HashSet<>();

Code example source: org.jsoup/jsoup

/**
 Get safe HTML from untrusted input HTML, by parsing input HTML and filtering it through a white-list of permitted
 tags and attributes.
 @param bodyHtml  input untrusted HTML (body fragment)
 @param baseUri   URL to resolve relative URLs against
 @param whitelist white-list of permitted HTML elements
 @return safe HTML (body fragment)
 @see Cleaner#clean(Document)
 */
public static String clean(String bodyHtml, String baseUri, Whitelist whitelist) {
  Document dirty = parseBodyFragment(bodyHtml, baseUri);
  Cleaner cleaner = new Cleaner(whitelist);
  Document clean = cleaner.clean(dirty);
  return clean.body().html();
}
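
The baseUri parameter documented above matters once the fragment contains relative links. The following is a minimal sketch, assuming jsoup is on the classpath. The class BaseUriSketch, the helper firstAbsoluteHref, and the example.com URLs are illustrative only and are not part of jsoup or of the projects quoted here.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class BaseUriSketch {

    // Hypothetical helper: absolute URL of the first link in the fragment,
    // or an empty string if the fragment has no <a href> element.
    static String firstAbsoluteHref(String fragment, String baseUri) {
        Document doc = Jsoup.parseBodyFragment(fragment, baseUri);
        Element link = doc.select("a[href]").first();
        // absUrl resolves the attribute value against the document's base URI.
        return link == null ? "" : link.absUrl("href");
    }

    public static void main(String[] args) {
        String fragment = "<a href=\"page.html\">next</a>";
        System.out.println(firstAbsoluteHref(fragment, "https://example.com/docs/"));
    }
}
```

With the single-argument overload there is no base URI to resolve against, so relative href values cannot be turned into absolute URLs this way.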

Code example source: JpressProjects/jpress

document = Jsoup.parseBodyFragment(html, params.getBaseUri());
} else {
  document = Jsoup.parseBodyFragment(html);

Code example source: org.jsoup/jsoup

/**
 * Get safe HTML from untrusted input HTML, by parsing input HTML and filtering it through a white-list of
 * permitted tags and attributes.
 * <p>The HTML is treated as a body fragment; it's expected the cleaned HTML will be used within the body of an
 * existing document. If you want to clean full documents, use {@link Cleaner#clean(Document)} instead, and add
 * structural tags (<code>html, head, body</code> etc) to the whitelist.
 *
 * @param bodyHtml input untrusted HTML (body fragment)
 * @param baseUri URL to resolve relative URLs against
 * @param whitelist white-list of permitted HTML elements
 * @param outputSettings document output settings; use to control pretty-printing and entity escape modes
 * @return safe HTML (body fragment)
 * @see Cleaner#clean(Document)
 */
public static String clean(String bodyHtml, String baseUri, Whitelist whitelist, Document.OutputSettings outputSettings) {
  Document dirty = parseBodyFragment(bodyHtml, baseUri);
  Cleaner cleaner = new Cleaner(whitelist);
  Document clean = cleaner.clean(dirty);
  clean.outputSettings(outputSettings);
  return clean.body().html();
}

Code example source: TakWolf/CNode-Material-Design

public static Document handleHtml(String html) {
  // Ensure html is not null
  html = TextUtils.isEmpty(html) ? "" : html;
  // Filter out XSS
  return cleaner.clean(Jsoup.parseBodyFragment(html, ApiDefine.HOST_BASE_URL));
}

Code example source: jbake-org/jbake

/**
 * Image paths are specified relative to the assets folder. This function prefixes the site host to every img src
 * except those that start with http:// or https://.
 * <p>
 * If an image path starts with "./", i.e. is relative to the source file, the function first replaces that prefix with the output file directory and then adds the site host.
 *
 * @param fileContents  Map representing file contents
 * @param configuration Configuration object
 */
public static void fixImageSourceUrls(Map<String, Object> fileContents, JBakeConfiguration configuration) {
  String htmlContent = fileContents.get(Attributes.BODY).toString();
  boolean prependSiteHost = configuration.getImgPathPrependHost();
  String siteHost = configuration.getSiteHost();
  String uri = getDocumentUri(fileContents);
  Document document = Jsoup.parseBodyFragment(htmlContent);
  Elements allImgs = document.getElementsByTag("img");
  for (Element img : allImgs) {
    transformImageSource(img, uri, siteHost, prependSiteHost);
  }
  //Use body().html() to prevent adding <body></body> from parsed fragment.
  fileContents.put(Attributes.BODY, document.body().html());
}

Code example source: HubSpot/jinjava

@Override
public Object filter(Object object, JinjavaInterpreter interpreter, String... arg) {
 if (!(object instanceof String)) {
  return object;
 }
 String val = interpreter.renderFlat((String) object);
 String strippedVal = Jsoup.parseBodyFragment(val).text();
 String normalizedVal = WHITESPACE.matcher(strippedVal).replaceAll(" ");
 return normalizedVal;
}

Code example source: andriusvelykis/reflow-maven-skin

/**
 * Parses body fragment to the {@code <body>} element.
 * 
 * @param content
 * @return the {@code body} element of the parsed content
 */
private Element parseContent(String content) {
  Document doc = Jsoup.parseBodyFragment(content);
  doc.outputSettings().charset(outputEncoding);
  return doc.body();
}

Code example source: HubSpot/jinjava

@Test
public void urlizeText() {
 Document dom = Jsoup.parseBodyFragment(jinjava.render("{{ txt|urlize }}", new HashMap<String, Object>()));
 assertThat(dom.select("a")).hasSize(3);
 assertThat(dom.select("a").get(0).attr("href")).isEqualTo("http://www.espn.com");
 assertThat(dom.select("a").get(1).attr("href")).isEqualTo("http://yahoo.com");
 assertThat(dom.select("a").get(2).attr("href")).isEqualTo("https://hubspot.com");
}

Code example source: IQSS/dataverse

public static String prettyPrint(String ugly) {
  Document doc = Jsoup.parseBodyFragment(ugly);
  doc.outputSettings().indentAmount(2);
  return doc.body().html();
}

Code example source: HubSpot/jinjava

@Test
public void testSimpleSlice() throws Exception {
 Document dom = Jsoup.parseBodyFragment(
   jinjava.render(
     Resources.toString(Resources.getResource("filter/slice-filter.jinja"), StandardCharsets.UTF_8),
     ImmutableMap.of("items", (Object) Lists.newArrayList("a", "b", "c", "d", "e", "f", "g"))));
 assertThat(dom.select(".columwrapper ul")).hasSize(3);
 assertThat(dom.select(".columwrapper .column-1 li")).hasSize(3);
 assertThat(dom.select(".columwrapper .column-2 li")).hasSize(3);
 assertThat(dom.select(".columwrapper .column-3 li")).hasSize(3);
}

Code example source: com.hubspot.jinjava/jinjava

@Test
public void testSimpleFn() {
 Document dom = Jsoup.parseBodyFragment(interpreter.render(fixture("simple")));
 assertThat(dom.select("div h2").text().trim()).isEqualTo("Hello World");
 assertThat(dom.select("div.contents").text().trim()).isEqualTo("This is a simple dialog rendered by using a macro and a call block.");
}

Code example source: dstl/baleen

@Test
public void testMixedEmpty() {
 Document doc = Jsoup.parseBodyFragment("<p></p><div></div><p>Hello</p>");
 m.manipulate(doc);
 assertEquals(1, doc.body().select("p").size());
}

Code example source: HubSpot/jinjava

@Test
public void forLoopUsingLoopLastVar() {
 context.put("the_list", Lists.newArrayList(1L, 2L, 3L, 7L));
 TagNode tagNode = (TagNode) fixture("loop-last-var");
 Document dom = Jsoup.parseBodyFragment(tag.interpret(tagNode, interpreter));
 assertThat(dom.select("h3")).hasSize(3);
}

Code example source: dstl/baleen

@Test
public void testSubjectHeading() {
 Document document =
   Jsoup.parseBodyFragment(
     "<p><b>THIS IS A SUBJECT HEADING</b></p><p>THIS IS A NOT SUBJECT HEADING</p><p>THIS IS not a SUBJECT HEADING</p><p>THIS IS NOT A SUBJECT HEADING EITHER.</p>");
 manipulator.manipulate(document);
 Elements h1s = document.select("h1");
 assertEquals(1, h1s.size());
 assertEquals("THIS IS A SUBJECT HEADING", h1s.first().text());
}

Code example source: dstl/baleen

@Test
public void test() {
 Document doc =
   Jsoup.parseBodyFragment(
     "<header>this</header><header></header><p>This is some text</p><footer>other</footer>");
 m.manipulate(doc);

 assertTrue(doc.body().select("header,footer").isEmpty());
}
