python-3.x: How to get all elements before the <p><strong> tag using BeautifulSoup?

Posted by mxg2im7a on 2022-11-19 in Python

I want to find all the elements before the first <p><strong> and exit the loop once it is found.

from bs4 import BeautifulSoup

example = """This should be in before section<p>Content before</p><p><strong>First Title</strong></p>Content of first title1<p>Content of first title2</p><p><strong>Second title</strong></p><p>Content of second title</p></strong>"""
soup = BeautifulSoup(example, 'html.parser')

for data in soup:
    print(data.previous_sibling)
    nxt = data.next_sibling  # None once the last node is reached
    if nxt is None:
        break
    print(getattr(nxt, 'name', None))
    # Attempted check: .name holds only the tag name (e.g. 'p'), so this never matches
    if getattr(nxt, 'name', None) == '<p><strong>':
        print('found and add before content in variable')

The output variable should contain:

This should be in before section<p>Content before</p>

Edit: I also tried the following code:

res = []
for sibling in soup.find('p').previous_siblings:
    res.append(sibling.text)
    
res.reverse()
res = ' '.join(res)

print(res)

It should check for <p><strong>, not just <p>, and I don't know how to do that.

python-3.x

Source: https://stackoverflow.com/questions/74455762/how-to-get-all-elements-before-pstrong-tag-using-beautifulsoup

2 Answers

a6b3iqyw #1

I'm posting the solution I found here in case others find it useful:

from bs4 import BeautifulSoup

example = """<span>output1</span>This should be in overview section<span>output1</span><p>output 2</p><p><strong>First Title</strong></p>Content of first title1<p>Content of first title2</p><p><strong>Second title</strong></p><p>Content of second title</p></strong>"""
soup = BeautifulSoup(example, 'html.parser')

res = []
# Walk backwards from the first <p> that contains a <strong>
for sibling in soup.select_one('p:has(strong)').previous_siblings:
    res.append(sibling.text)

# previous_siblings yields nodes nearest-first, so restore document order
res.reverse()
res = ' '.join(res)

print(res)

This uses the p:has(strong) selector, which I took from @HedgeHog's answer. Thank you.
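
Note that sibling.text drops the markup, so the <p> tags shown in the expected output of the question are lost. Below is a minimal sketch, assuming the goal is to keep the markup: each sibling is serialized with str() instead. The function name markup_before is a hypothetical helper for illustration, not part of BeautifulSoup.

```python
from bs4 import BeautifulSoup

example = """This should be in before section<p>Content before</p><p><strong>First Title</strong></p>Content of first title1<p>Content of first title2</p>"""


def markup_before(html, selector="p:has(strong)"):
    """Return the markup of every sibling preceding the first match (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    target = soup.select_one(selector)
    if target is None:
        # Nothing matched the selector, so there is no "before" section
        return ""
    # previous_siblings walks backwards from the target; reverse to restore document order
    parts = [str(sibling) for sibling in target.previous_siblings]
    return "".join(reversed(parts))


result = markup_before(example)
print(result)
```

Joining with an empty string rather than a space keeps the serialized nodes exactly as they appeared in the source.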

Answered on 2022-11-19

iq0todco #2

You can also go the other way and use find_previous:

e = soup.select_one('p:has(strong)')
# previous_element is the node parsed immediately before the preceding <p>
print(e.find_previous('p').previous_element, e.find_previous('p'))

Example

from bs4 import BeautifulSoup

example = """This should be in before section<p>Content before</p><p><strong>First Title</strong></p>Content of first title1<p>Content of first title2</p><p><strong>Second title</strong></p><p>Content of second title</p></strong>"""
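# Name the parser explicitly so the result does not depend on which parsers are installed
soup = BeautifulSoup(example, 'html.parser')

e = soup.select_one('p:has(strong)')
print(e.find_previous('p').previous_element, e.find_previous('p'))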

Output

This should be in before section <p>Content before</p>
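
find_previous returns only the single nearest match, so the snippet above picks up one <p> and one text node. If more nodes may precede the target, one alternative is to walk the top-level nodes and stop at the first <p> containing a <strong>, which is closer to the loop in the question. The following is a sketch under the assumption that the target <p> is a top-level node; collect_until is a hypothetical name.

```python
from bs4 import BeautifulSoup, Tag

example = """This should be in before section<p>Content before</p><p><strong>First Title</strong></p>Content of first title1<p>Content of first title2</p>"""


def collect_until(html):
    """Collect top-level nodes up to the first <p> containing <strong> (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    before = []
    for node in soup.contents:
        # Text nodes are NavigableString objects; only Tag objects can be a <p>
        if isinstance(node, Tag) and node.name == "p" and node.find("strong"):
            break  # exit the loop as soon as the target is found
        before.append(node)
    return before


nodes = collect_until(example)
print("".join(str(n) for n in nodes))
```

The isinstance check keeps the loop from calling find on plain text nodes.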

Answered on 2022-11-19
