Using DOMDocument, how can I get all the content that sits between the "h1 h2 h3 h4 h5 h6" elements in the DOM? I need the HTML content between the h1 to h6 headings.
First attempt
Output:
string(1) "p"
string(6) "txt1"
-----
string(2) "br"
string(0) ""
-----
string(2) "br"
string(0) ""
-----
string(2) "br"
string(0) ""
txt1txt3
txt4
txt5txt7...
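Text that is not wrapped in a tag is not included. How can I get it?
Full example: with if(1) (1 = test) the query is the negated contains: not(contains("h1 h2 h3 ..."))
With if(0), $query = '//*[contains("h1 h2 h3 h4 h5 h6", name())]'; generates a TOC from the headings, but I also need the HTML between the headings.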
<?php
$html = <<<'HTML'
txt1
<h2>h2 txt2</h2>
txt3<br>
txt4<br>
txt5
<h3>h3 txt6</h3>
txt7
<h3>h3 txt8</h3>
txt9<br>
<h2>h2 txt10</h2>
txt11
<h2>h2 txt12</h2>
txt13
HTML;
libxml_use_internal_errors(true);
$dom = new DOMDocument;
$dom->loadHTML($html);
$xp = new DOMXPath($dom);
#$query = '//*[contains("h1 h2 h3 h4 h5 h6", name())]';
# 1 = test
if (1) {
    $query = '//*[not(contains("h1 h2 h3 h4 h5 h6 html body", name()))]';
    $nodes = $xp->query($query);
    //Using DOMDocument, how to get all code that exists between the "h1 h2 h3 h4 h5 h6" elements in the DOM?
    //I need the html content between the "h1 h2 h3 h4 h5 h6" + I can query DOM "h1 h2 h3 h4 h5 h6" elements
    echo '<pre>';
    #var_dump($nodes); exit;
    foreach ($nodes as $node) {
        echo '<hr>';
        var_dump($node->localName);
        var_dump($node->nodeValue);
    }
    echo '</pre>'; // close the <pre> opened above
    // var_dump output of the loop above, kept for reference
    $vardumpis = <<<'VARDU'
string(1) "p"
string(6) "txt1"
-----
string(2) "br"
string(0) ""
-----
string(2) "br"
string(0) ""
-----
string(2) "br"
string(0) ""
VARDU;
    exit;
}
# end test
$query = '//*[contains("h1 h2 h3 h4 h5 h6", name())]';
$nodes = $xp->query($query);
//generate TOC from headlines result1:
$currentLevel = ['level' => 0, 'count' => 0];
$stack = [];
$format = '<li>%s</li>';
$result1 = '';
foreach ($nodes as $node) {
    $level = (int)$node->tagName[1]; // extract the digit after h
    while ($level < $currentLevel['level']) {
        $currentLevel = array_pop($stack);
        $result1 .= '</ul>';
    }
    if ($level === $currentLevel['level']) {
        $currentLevel['count']++;
    } else {
        $stack[] = $currentLevel;
        $currentLevel = ['level' => $level, 'count' => 1];
        $result1 .= '<ul>';
    }
    $result1 .= sprintf($format, $node->nodeValue);
}
$result1 .= str_repeat('</ul>', count($stack));
//THIS is what I need result2:
$target2 = <<<'TARG'
txt1<br>
</ul><h2>h2 txt2</h2><ul>
txt3<br>
txt4<br>
txt5
<h3>h3 txt6</h3><ul>
txt7
</ul><h3>h3 txt8</h3><ul>
txt9
</ul>
</ul><h2>h2 txt10</h2><ul>
txt11
</ul><h2>h2 txt12</h2><ul>
txt13
</ul>
TARG;
file_put_contents('toc15.htm', 'This I have: TOC result1:<br>'. $result1 .'<br><br><hr>This I need: target2 with content between headlines tags <br>'. $target2);
//help php DOM: https://3v4l.org/aDSrK https://schlitt.info/opensource/blog/0704_xpath.html#node-relations https://www.php.net/manual/en/class.domdocument.php https://www.abdulibrahim.com/php-scraping-using-dom-and-xpath-tutorial/#xpath_conditions https://www.lambdatest.com/blog/complete-guide-for-using-xpath-in-selenium-with-examples/
This I have: TOC result1:
h2 txt2
______h3 txt6
______h3 txt8
h2 txt10
h2 txt12
1 Answer
The string value of the whole document can be obtained with this simple XPath: string(/)
All text nodes in the document are selected by: //text()
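A minimal sketch of how these two expressions might be tried from PHP. The sample markup is a shortened, hypothetical version of the question's input, and the third query (filtering on name(..)) is an added illustration, not part of the original answer. Note that string(/) yields a string rather than a node list, so it is assumed here that DOMXPath::evaluate() is used instead of query().

```php
<?php
// Hypothetical sample markup, shortened from the question's input.
$html = "txt1<h2>h2 txt2</h2>txt3<br>txt4<h3>h3 txt6</h3>txt7";

libxml_use_internal_errors(true);
$dom = new DOMDocument;
$dom->loadHTML($html);
$xp = new DOMXPath($dom);

// string(/) returns a string, so evaluate() is used instead of query().
$allText = $xp->evaluate('string(/)');

// //text() selects every text node, including the ones inside headings.
$texts = [];
foreach ($xp->query('//text()') as $textNode) {
    $texts[] = $textNode->nodeValue;
}

// Added illustration: only text nodes whose parent is NOT a heading element.
$outside = [];
$query = '//text()[not(contains("h1 h2 h3 h4 h5 h6", name(..)))]';
foreach ($xp->query($query) as $textNode) {
    $outside[] = $textNode->nodeValue;
}

var_dump($allText);
var_dump($texts);
var_dump($outside);
```

Unlike `//*`, which matches element nodes only, `text()` also reaches the bare text that sits directly in the body, which is why the untagged text shows up here.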
So, using:
I get:
So now the recognized text has to be written into the file.
At least that is a little further along.
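To get from the text nodes to the HTML between the headings, one possible approach is to walk the direct children of the body and start a new section at every heading. The following is only a sketch under assumptions: splitByHeadings() is a hypothetical helper name, the sample markup is a shortened version of the question's input, and it assumes the headings are direct children of the body (as in the question). libxml may wrap the leading bare text in a p element, which matches the "p" seen in the question's var_dump output, so the first section can contain that wrapper.

```php
<?php
// Hypothetical helper, not from the original post: splits the children of
// <body> into sections, each one starting at a heading (h1..h6).
function splitByHeadings(string $html): array
{
    libxml_use_internal_errors(true);
    $dom = new DOMDocument;
    $dom->loadHTML($html);
    $xp = new DOMXPath($dom);

    // Content before the first heading lands in a section without a heading.
    $sections = [['heading' => null, 'level' => 0, 'content' => '']];

    // node() matches elements AND bare text nodes; * would match elements only.
    foreach ($xp->query('/html/body/node()') as $node) {
        $isHeading = $node instanceof DOMElement
            && preg_match('/^h([1-6])$/', $node->nodeName, $m);
        if ($isHeading) {
            $sections[] = [
                'heading' => $node->textContent,
                'level'   => (int) $m[1],
                'content' => '',
            ];
        } else {
            // saveHTML($node) serializes this single node including its markup.
            $sections[count($sections) - 1]['content'] .= $dom->saveHTML($node);
        }
    }
    return $sections;
}

$sections = splitByHeadings("txt1<h2>h2 txt2</h2>txt3<br>txt4<h3>h3 txt6</h3>txt7");
foreach ($sections as $s) {
    printf("[h%d %s] %s\n", $s['level'], $s['heading'] ?? '(no heading)', trim($s['content']));
}
```

Each section keeps its heading level, so the same stack logic used for the TOC could be reused to nest the content; that part is left out of the sketch.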