Python: how to get the content behind each title using beautifulsoup4 and requests

zu0ti5jz, posted on 2021-09-29 in Java

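I have scraped the names of the medicines from this link: medicine list
Now I want to get the content for each medicine. Each medicine has its own link, for example: medicine example
How can I use beautifulsoup4 and the requests library to get the content for each medicine?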

import requests
from bs4 import BeautifulSoup

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:90.0) Gecko/20100101 Firefox/90.0'
}


def main(url):
    r = requests.get(url, headers=headers)
    soup = BeautifulSoup(r.text, 'lxml')
    # Medicine names are the texts of links whose class ends in "section__item-link".
    titles = [x.text for x in soup.select('a[class$=section__item-link]')]
    # Print a numbered list of the names.
    for count, title in enumerate(titles, start=1):
        print("{0}. {1}\n".format(count, title))


main('https://www.klikdokter.com/obat')
zzwlnbp8 #1

Based on what I can see at https://www.klikdokter.com/obat, you should be able to do something like this:

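import requests
from urllib.parse import urljoin
from bs4 import BeautifulSoup

AGENT = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 11_5_1) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/14.1.2 Safari/605.1.15'
BASEURL = 'https://www.klikdokter.com/obat'
headers = {'User-Agent': AGENT}

# Fetch the index page that lists every medicine.
response = requests.get(BASEURL, headers=headers)
response.raise_for_status()
soup = BeautifulSoup(response.text, 'html.parser')

# Each medicine name is a link carrying this class.
for tag in soup.find_all('a', class_='topics-index--section__item-link'):
    href = tag.get('href')
    if href is not None:
        # Resolve relative links against the index URL; absolute links pass through unchanged.
        url = urljoin(BASEURL, href)
        print(url)
        # Use a separate name so the index response is not overwritten.
        page = requests.get(url, headers=headers)
        page.raise_for_status()
        # Do your processing here, e.g. parse page.text with BeautifulSoup.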

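For the processing step left open in the loop above, here is a minimal sketch of one way to pull text out of a medicine page. The helper name `extract_content`, the sample HTML, and the `div.article p` selector are all assumptions made for illustration; I have not checked the real markup of the detail pages, so inspect one in your browser's developer tools and adjust the selector accordingly.

```python
from bs4 import BeautifulSoup


def extract_content(html, selector="p"):
    """Return the first <h1> text and the text of every element matching selector.

    Hypothetical helper: the default selector is a guess, not the site's
    actual structure.
    """
    soup = BeautifulSoup(html, "html.parser")
    heading = soup.find("h1")
    title = heading.get_text(strip=True) if heading else None

    paragraphs = []
    for element in soup.select(selector):
        # Join nested text nodes with a space and trim surrounding whitespace.
        text = element.get_text(" ", strip=True)
        if text:  # skip empty blocks
            paragraphs.append(text)
    return {"title": title, "paragraphs": paragraphs}


# Made-up stand-in for a detail page; the real pages may look different.
SAMPLE_HTML = """
<html><body>
  <h1>Paracetamol</h1>
  <div class="article">
    <p>Pain reliever.</p>
    <p> </p>
    <p>Take <b>after</b> meals.</p>
  </div>
</body></html>
"""

result = extract_content(SAMPLE_HTML, selector="div.article p")
print(result["title"])
for paragraph in result["paragraphs"]:
    print("-", paragraph)
```

Inside the loop you would pass the text of the response you got for each medicine link to `extract_content`, with whatever selector matches the page. Keeping the parsing in a function that takes an HTML string means you can test it against a saved page without hitting the site on every run.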