Extract specific data output from BeautifulSoup

Posted by nuypyhwy on 2021-08-20 in Java


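I am using the script below. It gives me the data I want, but all I need is the "Updated" date part. I am trying to strip off the text that follows it.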


# Import libraries
from bs4 import BeautifulSoup
import requests

# Request the page and download its HTML contents
url = 'https://data.ed.gov/dataset/college-scorecard-all-data-files-through-6-2020/resources'
req = requests.get(url)
content = req.text

# Name the parser explicitly so bs4 does not pick one and emit a warning
soup = BeautifulSoup(content, "html.parser")
raw = soup.findAll(class_="module-content")[3].text
print(raw.strip())

This is the output I get:

Updated 1-19-2021

There are no views created for this resource yet.

The first line, "Updated 1-19-2021" (shown in bold italics in the original post), is what I want to get, not the rest.
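If the text has already been extracted, one option is to keep only its first non-empty line. This is a minimal sketch, assuming the date always sits on the first non-empty line of the block. The `raw` string is a hard-coded copy of the output above, and `first_nonempty_line` is a hypothetical helper name, not part of BeautifulSoup.

```python
def first_nonempty_line(text):
    """Return the first non-blank line of text, stripped, or None if there is none."""
    for line in text.splitlines():
        stripped = line.strip()
        if stripped:
            return stripped
    return None


# Hard-coded stand-in for soup.findAll(class_="module-content")[3].text
raw = """
Updated 1-19-2021

There are no views created for this resource yet.
"""

updated = first_nonempty_line(raw)
print(updated)
```

This works on the string only, so it breaks if the page ever puts other text ahead of the date. Selecting the text node in the parsed tree is usually more robust.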

python beautifulsoup

Source: https://stackoverflow.com/questions/68307876/extract-specific-data-output-from-beautifulsoup

2 answers


zujrkrfu1#

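You can use the find_next() method, which returns the first match that follows the element. With text=True, that match is the next text node: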

raw=soup.findAll(class_="module-content")[3].find_next(text=True)

Full example:

from bs4 import BeautifulSoup
import requests

# Request the page and download its HTML contents
url = 'https://data.ed.gov/dataset/college-scorecard-all-data-files-through-6-2020/resources'
req = requests.get(url)
content = req.text
soup = BeautifulSoup(content, "html.parser")

# Take the first text node that follows the fourth "module-content" element
raw = soup.findAll(class_="module-content")[3].find_next(text=True)
print(raw.strip())

Output:

Updated 1-19-2021
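To try find_next() without fetching the live page, here is a minimal offline sketch. `SAMPLE_HTML` is a hypothetical, simplified stand-in for the page's markup, not the real HTML served by data.ed.gov, and `first_text_after` is an illustrative helper name. It uses `string=True`, which is the newer name for the `text=True` argument in Beautiful Soup 4.4.0 and later.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup (an assumption, not the real page):
# a text node followed by a child element inside the target block.
SAMPLE_HTML = """
<div class="module-content">
  Updated 1-19-2021
  <p>There are no views created for this resource yet.</p>
</div>
"""


def first_text_after(tag):
    """Return the first text node found after the tag's opening, stripped."""
    node = tag.find_next(string=True)
    return node.strip() if node is not None else None


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
block = soup.find(class_="module-content")
updated = first_text_after(block)
print(updated)
```

Because find_next() walks forward through the document, the first text node it meets here is the one directly inside the block, before the `<p>` element.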

Answered 2021-08-20

bogh5gae2#

Try:

import requests
from bs4 import BeautifulSoup

url = "https://data.ed.gov/dataset/college-scorecard-all-data-files-through-6-2020/resources"
soup = BeautifulSoup(requests.get(url).content, "html.parser")

print(soup.select_one(".inner-primary .module-content").contents[0].strip())

Prints:

Updated 1-19-2021
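The selector approach can also be checked offline, with a guard for the case where nothing matches. This is a sketch under assumptions: `SAMPLE_HTML` is a hypothetical simplification of the page structure, and `first_child_text` is an illustrative helper, not a BeautifulSoup API.

```python
from bs4 import BeautifulSoup, NavigableString

# Hypothetical, simplified markup (an assumption, not the real page).
SAMPLE_HTML = """
<div class="inner-primary">
  <div class="module-content">Updated 1-19-2021
    <p>There are no views created for this resource yet.</p>
  </div>
</div>
"""


def first_child_text(soup, selector):
    """Return the stripped first child of the selected element if it is text, else None."""
    block = soup.select_one(selector)  # None when the selector matches nothing
    if block is None or not block.contents:
        return None
    first = block.contents[0]
    if not isinstance(first, NavigableString):
        return None
    return first.strip()


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
updated = first_child_text(soup, ".inner-primary .module-content")
print(updated)
```

Selecting by CSS path avoids relying on a positional index such as `[3]`. The helper returns None instead of raising if the page layout changes.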

Answered 2021-08-20
