How can I scrape multiple items inside <tr> tags with Python and split them into 3 variables?

Posted by btqmn9zl on 2022-10-22 in Python

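I am trying to scrape a website whose data sits in multiple nested tags. My plan is to have 3 variables (oem, model, leadtime) to generate the desired output. However, I cannot figure out how to split the scraped page content into these 3 variables. As I am new to Python and BeautifulSoup, I would highly appreciate your feedback.
Desired output with 3 variables and the command:
print(oem, model, leadtime)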

Renault, Mégane E-Tech, 12 Monate
Nissan, Ariya, 6 Monate
...
Volvo, XC90, 10-12 Monate

Output so far:

Renault Mégane E-Tech12 Monate
Nissan Ariya6 Monate
Peugeot e-2086-7 Monate
KIA Sportage5-6 Monate6-7 Monate (Hybrid)
Jeep Compass3-5 Monate3-5 Monate (Hybrid)
VW Taigo3-6 Monate
...
XC9010-12 Monate

Code so far:

from bs4 import BeautifulSoup
import requests

# Inputs/URLs to scrape:

URL = ('https://www.carwow.de/neuwagen-lieferzeiten#gref')
(response := requests.get(URL)).raise_for_status()
soup = BeautifulSoup(response.text, 'lxml')
overview = soup.find()

for card in overview.find_all('tbody'):
    for model2 in card.find_all('tr'):
        model = model2.text.replace('Angebote vergleichen', '')
        #oem?-->this needs to be defined
        #leadtime?--> this needs to defined
        print(model)

Answer #1 (zdwk9cvp)

The brand name is inside an h3 tag. You can reach it through the parent container of each table, like this:

from bs4 import BeautifulSoup
import requests

# Inputs/URLs to scrape:

URL = ('https://www.carwow.de/neuwagen-lieferzeiten#gref')
(response := requests.get(URL)).raise_for_status()
soup = BeautifulSoup(response.text, 'lxml')
overview = soup.find()

for el in overview.find_all("div", {"class": "expandable-content-container"}):
    header = el.find("h3").text.strip()
    if not header.startswith("Top 10") and not header.endswith("?"):
        for row in el.find_all("tr")[1:]:
            # Join the text of every cell except the last one (the "compare offers" link)
            model_monate = ", ".join(td.text for td in row.find_all("td")[:-1])
            print(f"{header}, {model_monate}")
        print("----")
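
To end up with three separate variables rather than one joined string, each row can be unpacked into a tuple. The following is a minimal sketch that runs offline against a small inline HTML sample. The markup in `SAMPLE_HTML` and the helper name `extract_rows` are assumptions modelled on the selectors used above; the real page may be structured differently.

```python
from bs4 import BeautifulSoup

# Hypothetical markup modelled on the selectors used above; the real page may differ.
SAMPLE_HTML = """
<div class="expandable-content-container">
  <h3>Renault</h3>
  <table><tbody>
    <tr><th>Modell</th><th>Lieferzeit</th><th></th></tr>
    <tr><td>Mégane E-Tech</td><td>12 Monate</td><td>Angebote vergleichen</td></tr>
  </tbody></table>
</div>
<div class="expandable-content-container">
  <h3>Volvo</h3>
  <table><tbody>
    <tr><th>Modell</th><th>Lieferzeit</th><th></th></tr>
    <tr><td>XC90</td><td>10-12 Monate</td><td>Angebote vergleichen</td></tr>
  </tbody></table>
</div>
"""


def extract_rows(html):
    """Return a list of (oem, model, leadtime) tuples (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for container in soup.find_all("div", class_="expandable-content-container"):
        # The brand name is the heading of the container that holds the table.
        oem = container.find("h3").get_text(strip=True)
        for tr in container.find_all("tr"):
            cells = tr.find_all("td")
            if len(cells) < 2:
                continue  # the sample's header row uses <th>, so it has no <td> cells
            model = cells[0].get_text(strip=True)
            leadtime = cells[1].get_text(strip=True)
            rows.append((oem, model, leadtime))
    return rows


for oem, model, leadtime in extract_rows(SAMPLE_HTML):
    print(f"{oem}, {model}, {leadtime}")
```

Reading the cells by index keeps the model and the lead time apart, so they no longer run together as in "Ariya6 Monate".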

Answer #2 (rryofs0p)

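The pieces of model information you are trying to get are actually stored in separate td tags, which means you can access them by index to get the corresponding values. Try the code below.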

import requests
from bs4 import BeautifulSoup

response = requests.get("https://www.carwow.de/neuwagen-lieferzeiten#gref").text
soup = BeautifulSoup(response, 'html.parser')

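for tbody in soup.select('tbody'):
    # Select the <tr> elements explicitly; iterating over tbody itself
    # can also yield whitespace text nodes, which have no select() method
    for tr in tbody.select('tr'):
        links = tr.select('td > a')
        if not links:
            continue  # skip rows without a model link
        # Segments 3 and 4 of the split href hold the brand and model slugs
        parts = links[0].get('href').split('/')
        brand, model = parts[3].capitalize(), parts[4].capitalize()
        monate = tr.select('td')[1].get_text(strip=True)
        print(f'{brand}, {model}, {monate}')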
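
The fixed indices 3 and 4 used above only line up when the href is an absolute URL with the brand and model as its first two path segments. As a hedged alternative, the path can be isolated with the standard library first. This is a sketch only: `brand_and_model` is a hypothetical helper and the example links are assumed, not taken from the page.

```python
from urllib.parse import urlparse


def brand_and_model(href):
    """Split a link into (brand, model) slugs (hypothetical helper)."""
    # urlparse isolates the path, so absolute and relative links are handled alike.
    segments = [s for s in urlparse(href).path.split("/") if s]
    if len(segments) < 2:
        raise ValueError(f"unexpected link format: {href!r}")
    return segments[0], segments[1]


# Assumed example links, for illustration only.
print(brand_and_model("https://www.carwow.de/volvo/xc90"))
print(brand_and_model("/renault/megane-e-tech"))
```

Note that slugs are not display names: calling capitalize() on "megane-e-tech" gives "Megane-e-tech", not "Mégane E-Tech", so reading the cell text is likely closer to the desired output when the exact spelling matters.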
