How to get an attribute value from an li tag in Python BS4

mwg9r5ms  posted on 2023-02-21  in Python

How do I get the src attribute of this link tag using the BS4 library?
Right now I am using the code below to try to get this result, but I can't.

<li class="active" id="server_0" data-embed="<iframe src='https://vk.com/video_ext.php?oid=757563422&id=456240701&hash=1d8fcd32c5b5f28b' scrolling='no' frameborder='0' width='100%' height='100%' allowfullscreen='true' webkitallowfullscreen='true' mozallowfullscreen='true' ></iframe>"><a><span><i class="fa fa-eye"></i></span> <strong>vk</strong></a></li>

I want this value: src='https://vk.com/video_ext.php?oid=757563422&id=456240701&hash=1d8fcd32c5b5f28b'
This is my code. I can access ['data-embed'], but I don't know how to extract the link from it.

from bs4 import BeautifulSoup as bs
import cloudscraper

scraper = cloudscraper.create_scraper()
access = "https://w.mycima.cc/play.php?vid=d4d8322b9"
response = scraper.get(access)
doc2 = bs(response.content, "lxml")
container2 = doc2.find("div", id='player').find("ul", class_="list_servers list_embedded col-sec").find("li")
link = container2['data-embed']
print(link)

Result

<Response [200]>
https://w.mycima.cc/play.php?vid=d4d8322b9
<iframe src='https://vk.com/video_ext.php?oid=757563422&id=456240701&hash=1d8fcd32c5b5f28b' scrolling='no' frameborder='0' width='100%' height='100%' allowfullscreen='true' webkitallowfullscreen='true' mozallowfullscreen='true' ></iframe>

Process finished with exit code 0

kokeuurv1#

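From the Beautiful Soup documentation:
You can access a tag's attributes by treating the tag like a dictionary.
They give an example like this: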

from bs4 import BeautifulSoup

tag = BeautifulSoup('<b id="boldest">bold</b>', 'html.parser').b
tag['id']
# 'boldest'
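
As a small extension of the documentation example, the sketch below puts dictionary-style access next to `Tag.get`, which returns `None` instead of raising `KeyError` when an attribute is missing. The markup and the `data-src` attribute name are made up for illustration and are not taken from the page in the question.

```python
from bs4 import BeautifulSoup

# Illustrative markup only, not the real page from the question.
soup = BeautifulSoup('<li id="server_0" data-embed="hello">vk</li>', "html.parser")
li = soup.find("li")

print(li["data-embed"])    # dictionary-style access to an existing attribute
print(li.get("data-src"))  # missing attribute: None rather than KeyError
print(li.attrs)            # all attributes of the tag as a dict
```

`li["data-src"]` would raise `KeyError` here, so `get` is the safer choice when an attribute may be absent.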

For reference and more details, see: https://www.crummy.com/software/BeautifulSoup/bs4/doc/#attributes
So, in your case, you could write

print(link.find("iframe")['src'])

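If link turns out to be plain text rather than a soup object (which, judging by the comments, is probably the case in your specific example), you can fall back on string searching, a regex, or more soup'ing, for example: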

import re
from bs4 import BeautifulSoup

link = """<Response [200]>https://w.mycima.cc/play.php?vid=d4d8322b9<iframe src='https://vk.com/video_ext.php?oid=757563422&id=456240701&hash=1d8fcd32c5b5f28b'></iframe>"""
# Match from the opening <iframe to the last ">" in the string
iframe = re.search(r"<iframe.*>", link)
if iframe:
    soup = BeautifulSoup(iframe.group(0), "html.parser")
    print("src=" + soup.find("iframe")['src'])
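
Since the `data-embed` value is itself a fragment of HTML stored as a string, another option is to hand that string to BeautifulSoup a second time and skip the regex step. This is only a sketch under a few assumptions: the `html` literal is a shortened copy of the `li` element from the question, `html.parser` is used instead of `lxml`, and `extract_embed_src` is a hypothetical helper name.

```python
from bs4 import BeautifulSoup


def extract_embed_src(li_html):
    """Return the iframe src stored in an li's data-embed attribute, or None."""
    li = BeautifulSoup(li_html, "html.parser").find("li")
    if li is None or not li.has_attr("data-embed"):
        return None
    # The attribute value is plain text that happens to contain HTML,
    # so it has to be parsed again before the iframe can be found.
    iframe = BeautifulSoup(li["data-embed"], "html.parser").find("iframe")
    return iframe.get("src") if iframe else None


# Shortened copy of the element from the question.
html = (
    '<li class="active" id="server_0" data-embed="<iframe '
    "src='https://vk.com/video_ext.php?oid=757563422&id=456240701&hash=1d8fcd32c5b5f28b' "
    "scrolling='no'></iframe>\"><a><strong>vk</strong></a></li>"
)
print(extract_embed_src(html))
```

In the question's own code the same idea would amount to parsing `container2['data-embed']` a second time and reading `['src']` from the resulting iframe tag.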
