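This article collects code examples for the Java method org.jsoup.Jsoup.parseBodyFragment() and shows how Jsoup.parseBodyFragment() is used in practice. The examples come mainly from GitHub, Stack Overflow, and Maven, and were extracted from selected open-source projects, so they can serve as a practical reference. Details of Jsoup.parseBodyFragment() are as follows: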
Package: org.jsoup
Class name: Jsoup
Method name: parseBodyFragment
Description: Parse a fragment of HTML, with the assumption that it forms the body of the HTML.
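As a minimal sketch of the description above: parseBodyFragment() returns a complete Document, and the parsed fragment becomes the children of its body element. The class name BodyFragmentDemo, the helper parseToBody, and the sample markup are illustrative assumptions, not taken from any of the projects below; the jsoup library is assumed to be on the classpath.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// Hypothetical demo class (not from the source projects); requires jsoup on the classpath.
class BodyFragmentDemo {

    // Parses the fragment and returns the <body> element that wraps it.
    static Element parseToBody(String fragmentHtml) {
        Document doc = Jsoup.parseBodyFragment(fragmentHtml);
        return doc.body();
    }

    public static void main(String[] args) {
        Element body = parseToBody("<p>Hello</p><p>World</p>");
        // Number of top-level elements that the fragment contributed to <body>.
        System.out.println(body.children().size());
        // Text of the first paragraph in the fragment.
        System.out.println(body.select("p").first().text());
    }
}
```

Working from doc.body() rather than the whole Document keeps the html/head/body shell that jsoup adds out of the result, which is why several of the examples below call body().html().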
Code example source: JpressProjects/jpress
public static List<String> getImageSrcs(String html) {
    if (StrUtils.isBlank(html)) {
        return null;
    }
    List<String> list = new ArrayList<String>();
    Document doc = Jsoup.parseBodyFragment(html);
    Elements es = doc.select("img");
    if (es != null && es.size() > 0) {
        for (Element e : es) {
            String src = e.attr("src");
            if (StrUtils.isNotBlank(src)) list.add(src);
        }
    }
    return list.isEmpty() ? null : list;
}
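The JPress helper above depends on a project-specific StrUtils class and returns null when nothing is found. A dependency-light variant might look like the sketch below; the class name ImageSrcDemo and the method extractImageSrcs are hypothetical, and returning an empty list instead of null is a design choice made here, not the JPress behavior.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;

// Hypothetical demo class (not part of JPress); requires jsoup on the classpath.
class ImageSrcDemo {

    // Collects the non-blank src attributes of all <img> elements in an HTML fragment.
    // Returns an empty list (never null) when the input is blank or has no usable images.
    static List<String> extractImageSrcs(String html) {
        List<String> srcs = new ArrayList<>();
        if (html == null || html.trim().isEmpty()) {
            return srcs;
        }
        // "img[src]" matches only <img> elements that carry a src attribute.
        for (Element img : Jsoup.parseBodyFragment(html).select("img[src]")) {
            String src = img.attr("src");
            if (!src.trim().isEmpty()) {
                srcs.add(src);
            }
        }
        return srcs;
    }

    public static void main(String[] args) {
        System.out.println(extractImageSrcs("<p><img src=\"a.png\"><img><img src=\"b.png\"></p>"));
    }
}
```

Returning an empty list lets callers iterate over the result without a null check.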
Code example source: JpressProjects/jpress
public static String getFirstImageSrc(String html) {
    if (StrUtils.isBlank(html))
        return null;
    Elements es = Jsoup.parseBodyFragment(html).select("img");
    if (es != null && es.size() > 0) {
        String src = es.first().attr("src");
        return StrUtils.isBlank(src) ? null : src;
    }
    return null;
}
Code example source: RipMeApp/ripme
@Override
public List<String> getURLsFromJSON(JSONObject json) {
    List<String> imageURLs = new ArrayList<>();
    JSONArray results = json.getJSONObject("content").getJSONArray("results");
    for (int i = 0; i < results.length(); i++) {
        Document doc = Jsoup.parseBodyFragment(results.getJSONObject(i).getString("html"));
        if (doc.html().contains("ismature")) {
            LOGGER.info("Downloading nsfw image");
            String nsfwImage = getFullsizedNSFWImage(doc.select("span").attr("href"));
            if (nsfwImage != null && nsfwImage.startsWith("http")) {
                imageURLs.add(nsfwImage);
            }
        }
        try {
            String imageURL = doc.select("span").first().attr("data-super-full-img");
            if (!imageURL.isEmpty() && imageURL.startsWith("http")) {
                imageURLs.add(imageURL);
            }
        } catch (NullPointerException e) {
            LOGGER.info(i + " does not contain any images");
        }
    }
    return imageURLs;
}
Code example source: ankidroid/Anki-Android
/**
 * Returns the list of text snippets contained in the given HTML fragment that should be read
 * using the Android text-to-speech engine, together with the languages they are in.
 * <p>
 * Each returned LocalisedText object contains the text extracted from a <tts> element
 * whose 'service' attribute is set to 'android', and the localeCode taken from the 'voice'
 * attribute of that element. This holds unless the HTML fragment contains no such <tts>
 * elements; in that case the function returns a single LocalisedText object containing the
 * text extracted from the whole HTML fragment, with the localeCode set to an empty string.
 */
public static List<LocalisedText> getTextsToRead(String html) {
    List<LocalisedText> textsToRead = new ArrayList<>();
    Element elem = Jsoup.parseBodyFragment(html).body();
    parseTtsElements(elem, textsToRead);
    if (textsToRead.size() == 0) {
        // No <tts service="android"> elements found: return the text of the whole HTML fragment
        textsToRead.add(new LocalisedText(elem.text()));
    }
    return textsToRead;
}
Code example source: RipMeApp/ripme
doc = Jsoup.parseBodyFragment(body);
List<Element> elements = doc.select("a");
Set<String> photoIDsToGet = new HashSet<>();
Code example source: org.jsoup/jsoup
/**
 Get safe HTML from untrusted input HTML, by parsing input HTML and filtering it through a white-list of permitted
 tags and attributes.
 @param bodyHtml input untrusted HTML (body fragment)
 @param baseUri URL to resolve relative URLs against
 @param whitelist white-list of permitted HTML elements
 @return safe HTML (body fragment)
 @see Cleaner#clean(Document)
 */
public static String clean(String bodyHtml, String baseUri, Whitelist whitelist) {
    Document dirty = parseBodyFragment(bodyHtml, baseUri);
    Cleaner cleaner = new Cleaner(whitelist);
    Document clean = cleaner.clean(dirty);
    return clean.body().html();
}
Code example source: JpressProjects/jpress
    document = Jsoup.parseBodyFragment(html, params.getBaseUri());
} else {
    document = Jsoup.parseBodyFragment(html);
Code example source: org.jsoup/jsoup
/**
 * Get safe HTML from untrusted input HTML, by parsing input HTML and filtering it through a white-list of
 * permitted tags and attributes.
 * <p>The HTML is treated as a body fragment; it's expected the cleaned HTML will be used within the body of an
 * existing document. If you want to clean full documents, use {@link Cleaner#clean(Document)} instead, and add
 * structural tags (<code>html, head, body</code> etc) to the whitelist.
 *
 * @param bodyHtml input untrusted HTML (body fragment)
 * @param baseUri URL to resolve relative URLs against
 * @param whitelist white-list of permitted HTML elements
 * @param outputSettings document output settings; use to control pretty-printing and entity escape modes
 * @return safe HTML (body fragment)
 * @see Cleaner#clean(Document)
 */
public static String clean(String bodyHtml, String baseUri, Whitelist whitelist, Document.OutputSettings outputSettings) {
    Document dirty = parseBodyFragment(bodyHtml, baseUri);
    Cleaner cleaner = new Cleaner(whitelist);
    Document clean = cleaner.clean(dirty);
    clean.outputSettings(outputSettings);
    return clean.body().html();
}
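The baseUri parameter documented above ("URL to resolve relative URLs against") can be illustrated with the two-argument overload of parseBodyFragment() and Element.absUrl(). This is only a sketch: the class name BaseUriDemo, the helper firstAbsoluteLink, and the example.com URLs are hypothetical.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// Hypothetical demo class (not from the source projects); requires jsoup on the classpath.
class BaseUriDemo {

    // Returns the absolute URL of the first link in the fragment,
    // or an empty string if the fragment contains no <a href> element.
    static String firstAbsoluteLink(String fragmentHtml, String baseUri) {
        Document doc = Jsoup.parseBodyFragment(fragmentHtml, baseUri);
        Element link = doc.select("a[href]").first();
        // absUrl resolves the attribute value against the document's base URI.
        return link == null ? "" : link.absUrl("href");
    }

    public static void main(String[] args) {
        System.out.println(firstAbsoluteLink("<a href=\"/docs/intro\">Intro</a>", "https://example.com/"));
    }
}
```

When the fragment is parsed with the single-argument overload, no base URI is available, so relative links cannot be resolved this way.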
Code example source: TakWolf/CNode-Material-Design
public static Document handleHtml(String html) {
    // Make sure html is not null
    html = TextUtils.isEmpty(html) ? "" : html;
    // Filter out XSS
    return cleaner.clean(Jsoup.parseBodyFragment(html, ApiDefine.HOST_BASE_URL));
}
Code example source: jbake-org/jbake
/**
 * Image paths are specified w.r.t. the assets folder. This function prefixes the site host to all img src
 * values except the ones that start with http:// or https://.
 * <p>
 * If an image path starts with "./", i.e. relative to the source file, it first replaces that with the output file directory and then adds the site host.
 *
 * @param fileContents Map representing file contents
 * @param configuration Configuration object
 */
public static void fixImageSourceUrls(Map<String, Object> fileContents, JBakeConfiguration configuration) {
    String htmlContent = fileContents.get(Attributes.BODY).toString();
    boolean prependSiteHost = configuration.getImgPathPrependHost();
    String siteHost = configuration.getSiteHost();
    String uri = getDocumentUri(fileContents);
    Document document = Jsoup.parseBodyFragment(htmlContent);
    Elements allImgs = document.getElementsByTag("img");
    for (Element img : allImgs) {
        transformImageSource(img, uri, siteHost, prependSiteHost);
    }
    // Use body().html() to prevent adding <body></body> from parsed fragment.
    fileContents.put(Attributes.BODY, document.body().html());
}
Code example source: HubSpot/jinjava
@Override
public Object filter(Object object, JinjavaInterpreter interpreter, String... arg) {
    if (!(object instanceof String)) {
        return object;
    }
    String val = interpreter.renderFlat((String) object);
    String strippedVal = Jsoup.parseBodyFragment(val).text();
    String normalizedVal = WHITESPACE.matcher(strippedVal).replaceAll(" ");
    return normalizedVal;
}
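The tag-stripping step in the filter above, parseBodyFragment(...).text(), can be tried in isolation. The sketch below leaves out the jinjava interpreter and the WHITESPACE pattern; the class name StripTagsDemo and the method stripTags are hypothetical.

```java
import org.jsoup.Jsoup;

// Hypothetical demo class (not part of jinjava); requires jsoup on the classpath.
class StripTagsDemo {

    // Parses the input as a body fragment and returns only its text content:
    // markup is dropped and character entities are decoded.
    static String stripTags(String html) {
        return Jsoup.parseBodyFragment(html).text();
    }

    public static void main(String[] args) {
        System.out.println(stripTags("<p>Hello <b>world</b>!</p>"));
        System.out.println(stripTags("Tom &amp; Jerry"));
    }
}
```

Because the result is plain text with entities decoded, it has to be escaped again before being written back into an HTML page.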
Code example source: andriusvelykis/reflow-maven-skin
/**
 * Parses body fragment to the {@code <body>} element.
 *
 * @param content
 * @return the {@code body} element of the parsed content
 */
private Element parseContent(String content) {
    Document doc = Jsoup.parseBodyFragment(content);
    doc.outputSettings().charset(outputEncoding);
    return doc.body();
}
Code example source: HubSpot/jinjava
@Test
public void urlizeText() {
    Document dom = Jsoup.parseBodyFragment(jinjava.render("{{ txt|urlize }}", new HashMap<String, Object>()));
    assertThat(dom.select("a")).hasSize(3);
    assertThat(dom.select("a").get(0).attr("href")).isEqualTo("http://www.espn.com");
    assertThat(dom.select("a").get(1).attr("href")).isEqualTo("http://yahoo.com");
    assertThat(dom.select("a").get(2).attr("href")).isEqualTo("https://hubspot.com");
}
Code example source: IQSS/dataverse
public static String prettyPrint(String ugly) {
    Document doc = Jsoup.parseBodyFragment(ugly);
    doc.outputSettings().indentAmount(2);
    return doc.body().html();
}
Code example source: HubSpot/jinjava
@Test
public void testSimpleSlice() throws Exception {
    Document dom = Jsoup.parseBodyFragment(
        jinjava.render(
            Resources.toString(Resources.getResource("filter/slice-filter.jinja"), StandardCharsets.UTF_8),
            ImmutableMap.of("items", (Object) Lists.newArrayList("a", "b", "c", "d", "e", "f", "g"))));
    assertThat(dom.select(".columwrapper ul")).hasSize(3);
    assertThat(dom.select(".columwrapper .column-1 li")).hasSize(3);
    assertThat(dom.select(".columwrapper .column-2 li")).hasSize(3);
    assertThat(dom.select(".columwrapper .column-3 li")).hasSize(3);
}
Code example source: com.hubspot.jinjava/jinjava
@Test
public void testSimpleFn() {
    Document dom = Jsoup.parseBodyFragment(interpreter.render(fixture("simple")));
    assertThat(dom.select("div h2").text().trim()).isEqualTo("Hello World");
    assertThat(dom.select("div.contents").text().trim()).isEqualTo("This is a simple dialog rendered by using a macro and a call block.");
}
Code example source: dstl/baleen
@Test
public void testMixedEmpty() {
    Document doc = Jsoup.parseBodyFragment("<p></p><div></div><p>Hello</p>");
    m.manipulate(doc);
    // JUnit's assertEquals takes the expected value first, then the actual value.
    assertEquals(1, doc.body().select("p").size());
}
Code example source: HubSpot/jinjava
@Test
public void forLoopUsingLoopLastVar() {
    context.put("the_list", Lists.newArrayList(1L, 2L, 3L, 7L));
    TagNode tagNode = (TagNode) fixture("loop-last-var");
    Document dom = Jsoup.parseBodyFragment(tag.interpret(tagNode, interpreter));
    assertThat(dom.select("h3")).hasSize(3);
}
Code example source: dstl/baleen
@Test
public void testSubjectHeading() {
    Document document =
        Jsoup.parseBodyFragment(
            "<p><b>THIS IS A SUBJECT HEADING</b></p><p>THIS IS A NOT SUBJECT HEADING</p><p>THIS IS not a SUBJECT HEADING</p><p>THIS IS NOT A SUBJECT HEADING EITHER.</p>");
    manipulator.manipulate(document);
    Elements h1s = document.select("h1");
    assertEquals(1, h1s.size());
    assertEquals("THIS IS A SUBJECT HEADING", h1s.first().text());
}
Code example source: dstl/baleen
@Test
public void test() {
    Document doc =
        Jsoup.parseBodyFragment(
            "<header>this</header><header></header><p>This is some text</p><footer>other</footer>");
    m.manipulate(doc);
    assertTrue(doc.body().select("header,footer").isEmpty());
}