Obtaining HREF with Python BeautifulSoup (BS4) with no apparent tags

Posted by 9cbw7uwe on 2023-03-07 in Python


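I'm trying to get the URL from this subset of HTML code. Basically, I find all the <item> tags with soup.find_all('item', class_="sale-item"), and I want to extract the URL from the first href. Can anyone help?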

<item class="sale-item" style="margin:6px; padding: 0; text-decoration:none;" tabindex="0" title="">
<style="text-decoration:none; display:block"="" height:100%;="" href="http://www.assus.com/12165456ALPHA.html">
<header>
<div class="smaht-header" style="width:100%; color:#000000; background-color:#EEEEEE;"><span style="color:#2a6293!important">ONLINE-ONLY</span></div>
<div style="color:#2a6293; width:100%; float:left">
....code continues to </item>

Many BS4 solutions assume the href is wrapped in an <a> tag. I'm not sure how to proceed with this setup... Thanks in advance!


Source: https://stackoverflow.com/questions/75645404/obtaining-href-with-python-beautifulsoup-bs4-with-no-apparent-tags

1 answer


djmepvbi1#

If you just want a list of the first href in each item tag (item_links below), you can use select (with a CSS selector) and a list comprehension:

item_links = [i.find(href=True)['href'] for i in soup.select('item.sale-item:has([href])')]
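A minimal, self-contained run of the list comprehension above, assuming the third-party bs4 package (which uses soupsieve for select) and the built-in "html.parser" backend. The markup is a hypothetical, simplified stand-in for the question's HTML: each href sits on a non-<a> tag, and the third item has no href at all, so :has([href]) drops it.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup: hrefs on <div>/<span>, plus one item with no href.
sample_html = """
<item class="sale-item">
  <div href="http://www.assus.com/12165456ALPHA.html">
    <header><div class="smaht-header"><span>ONLINE-ONLY</span></div></header>
  </div>
</item>
<item class="sale-item">
  <span href="http://example.com/second.html">Second</span>
  <a href="http://example.com/ignored.html">not the first href</a>
</item>
<item class="sale-item"><p>No link here</p></item>
"""

soup = BeautifulSoup(sample_html, "html.parser")

# :has([href]) keeps only items containing some tag with an href;
# find(href=True) returns the first such descendant in document order.
item_links = [i.find(href=True)["href"] for i in soup.select("item.sale-item:has([href])")]
print(item_links)
```

The second item's <a> link is skipped because the <span> before it already carries an href.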

If you want to loop through the items and do more than just get the link, you can use a for loop like this:

for item in soup.select('item.sale-item:has([href])'):
    item_link = item.find(href=True)['href'] # item.select_one('[href]')['href']
    ## WHATEVER YOU WANT TO DO WITH item AND item_link ##
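A small sketch of that loop, checking that the commented-out item.select_one('[href]') alternative picks the same tag as item.find(href=True). The markup is hypothetical; both calls return the first matching descendant in document order, so the later <a> tag is not used.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup: the first href is on a <div>, a later one on an <a>.
sample_html = """
<item class="sale-item">
  <header><div class="smaht-header"><span>ONLINE-ONLY</span></div></header>
  <div href="http://www.assus.com/12165456ALPHA.html">first</div>
  <a href="http://example.com/later.html">later</a>
</item>
"""
soup = BeautifulSoup(sample_html, "html.parser")

for item in soup.select("item.sale-item:has([href])"):
    via_find = item.find(href=True)["href"]         # keyword-argument attribute filter
    via_select = item.select_one("[href]")["href"]  # CSS attribute selector
    print(via_find, via_select)
```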

If you want to loop through the items regardless of whether they contain any tag with an href:

for item in soup.find_all('item', class_="sale-item"): # soup.select('item.sale-item'):
    item_link = item.find(href=True) ## no Tags with href in item --> item_link=None
    if item_link:
        item_link = item_link['href']
    ## WHATEVER YOU WANT TO DO WITH item AND item_link ##
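To see the None case, here is a sketch of the find_all loop over two hypothetical items, one with an href and one without. Collecting (text, link) pairs merely stands in for "whatever you want to do with item and item_link".

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup: the second item contains no tag with an href.
sample_html = """
<item class="sale-item"><div href="http://www.assus.com/12165456ALPHA.html">Product A</div></item>
<item class="sale-item"><p>No link here</p></item>
"""
soup = BeautifulSoup(sample_html, "html.parser")

results = []
for item in soup.find_all("item", class_="sale-item"):
    link_tag = item.find(href=True)  # None when no descendant has an href
    item_link = link_tag["href"] if link_tag else None
    results.append((item.get_text(strip=True), item_link))

print(results)
```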

All of these find the first href inside each item (in document order), regardless of which type of tag the attribute is on.
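If installing BeautifulSoup is not an option, the same rule (first href inside each sale-item, whatever tag carries it) can be sketched with the standard library's html.parser.HTMLParser. This is a swapped-in technique, not part of the answer above: the class name FirstHrefPerItem is hypothetical, it assumes <item> tags are not nested, and since it builds no tree it cannot run CSS selectors.

```python
from html.parser import HTMLParser


class FirstHrefPerItem(HTMLParser):
    """Hypothetical helper: records the first href inside each
    <item class="sale-item">, no matter which tag carries it."""

    def __init__(self):
        super().__init__()
        self.in_item = False  # True between <item class="sale-item"> and </item>
        self.links = []       # one entry per sale-item: its first href, or None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "item" and "sale-item" in classes:
            self.in_item = True
            self.links.append(None)
        elif self.in_item and self.links[-1] is None and attrs.get("href") is not None:
            # Any tag counts, not just <a>; later hrefs in the same item are ignored.
            self.links[-1] = attrs["href"]

    def handle_endtag(self, tag):
        if tag == "item":
            self.in_item = False


# Hypothetical sample markup: href on a <div>, and one item without any href.
sample_html = """
<item class="sale-item"><div href="http://www.assus.com/12165456ALPHA.html">
<header><span>ONLINE-ONLY</span></header></div>
<a href="http://example.com/later.html">later</a></item>
<item class="sale-item"><p>No link here</p></item>
"""

parser = FirstHrefPerItem()
parser.feed(sample_html)
parser.close()
print(parser.links)
```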

Answered 2023-03-07
