How do I scrape multiple items inside a <tr> tag and split them into 3 variables with Python?

Posted by btqmn9zl on 2022-10-22 in Python


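I'm trying to scrape a website whose table rows contain several nested tags. My plan is to end up with 3 variables (oem, model, leadtime) that produce the desired output. However, I can't figure out how to split the scraped page into those 3 variables. I'm new to Python and BeautifulSoup, so I'd really appreciate your feedback.
Desired output, using 3 variables and this command:
print(oem, model, leadtime)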

Renault, Mégane E-Tech, 12 Monate
Nissan, Ariya, 6 Monate
...
Volvo, XC90, 10-12 Monate

Output so far:

Renault Mégane E-Tech12 Monate
Nissan Ariya6 Monate
Peugeot e-2086-7 Monate
KIA Sportage5-6 Monate6-7 Monate (Hybrid)
Jeep Compass3-5 Monate3-5 Monate (Hybrid)
VW Taigo3-6 Monate
...
XC9010-12 Monate

Code so far:

from bs4 import BeautifulSoup
import requests

# Inputs/URLs to scrape:

URL = ('https://www.carwow.de/neuwagen-lieferzeiten#gref')
(response := requests.get(URL)).raise_for_status()
soup = BeautifulSoup(response.text, 'lxml')
overview = soup.find()

for card in overview.find_all('tbody'):
    for model2 in card.find_all('tr'):
        model = model2.text.replace('Angebote vergleichen', '')
        # oem? --> this needs to be defined
        # leadtime? --> this needs to be defined
        print(model)


Source: https://stackoverflow.com/questions/73233342/how-to-webscrape-multiple-items-in-tr-bracket-and-split-them-in-3-variables-wi

2 Answers


zdwk9cvp #1

The brand name is inside the h3 tag. You can reach it through the parent container like this:

from bs4 import BeautifulSoup
import requests

# Inputs/URLs to scrape:

URL = ('https://www.carwow.de/neuwagen-lieferzeiten#gref')
(response := requests.get(URL)).raise_for_status()
soup = BeautifulSoup(response.text, 'lxml')
overview = soup.find()

for el in overview.find_all("div", {"class": "expandable-content-container"}):
    header = el.find("h3").text.strip()
    if not header.startswith("Top 10") and not header.endswith("?"):
        for row in el.find_all("tr")[1:]:
            # Join the text of every cell except the last one (the "Angebote vergleichen" link)
            model_monate = ", ".join(
                td.text for td in row.find_all("td")[:-1]
            )
            print(f"{header}, {model_monate}")
        print("----")
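The loop above prints one joined string per row. If you want three separate variables, as the question asks, the same idea can be sketched roughly as follows. Note that SAMPLE_HTML is a made-up stand-in for the page (my assumption about the layout, not markup copied from carwow.de), and extract_rows is a hypothetical helper name.

```python
from bs4 import BeautifulSoup

# Made-up sample markup: an assumption about the page layout, not copied from carwow.de.
SAMPLE_HTML = """
<div class="expandable-content-container">
  <h3>Renault</h3>
  <table>
    <tr><th>Modell</th><th>Lieferzeit</th><th></th></tr>
    <tr><td>Mégane E-Tech</td><td>12 Monate</td><td>Angebote vergleichen</td></tr>
  </table>
</div>
<div class="expandable-content-container">
  <h3>Volvo</h3>
  <table>
    <tr><th>Modell</th><th>Lieferzeit</th><th></th></tr>
    <tr><td>XC90</td><td>10-12 Monate</td><td>Angebote vergleichen</td></tr>
  </table>
</div>
"""


def extract_rows(html):
    """Return a list of (oem, model, leadtime) tuples from the sample layout."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for block in soup.find_all("div", class_="expandable-content-container"):
        heading = block.find("h3")
        if heading is None:
            continue  # no brand heading in this block
        oem = heading.get_text(strip=True)
        for tr in block.find_all("tr"):
            cells = tr.find_all("td")
            if len(cells) < 2:
                continue  # header rows use <th>, so they have no <td> cells
            model = cells[0].get_text(strip=True)
            leadtime = cells[1].get_text(strip=True)
            rows.append((oem, model, leadtime))
    return rows


for oem, model, leadtime in extract_rows(SAMPLE_HTML):
    print(f"{oem}, {model}, {leadtime}")
```

On the real page you would pass response.text instead of SAMPLE_HTML. You will probably still need the "Top 10" and "?" heading filter from the code above, since the sample here does not include those sections.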

Answered on 2022-10-22

rryofs0p #2

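The pieces of model information you are after are actually stored in separate td tags, which means you can access them by index to get the corresponding values. Try the code below.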

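import requests
from bs4 import BeautifulSoup

response = requests.get("https://www.carwow.de/neuwagen-lieferzeiten#gref")
response.raise_for_status()
soup = BeautifulSoup(response.text, 'html.parser')

for tbody in soup.select('tbody'):
    # Iterate over <tr> tags only; looping over tbody directly also yields whitespace text nodes
    for tr in tbody.find_all('tr'):
        link = tr.select_one('td > a')
        cells = tr.select('td')
        if link is None or len(cells) < 2:
            continue  # skip header rows and rows without a model link
        # The link path carries the brand and model slugs
        parts = link.get('href').split('/')
        brand = parts[3].capitalize()
        model = parts[4].capitalize()
        monate = cells[1].get_text(strip=True)
        print(f'{brand}, {model}, {monate}')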
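One caveat with taking the names from the link: str.capitalize() lowercases everything after the first character, so a slug like xc90 comes out as Xc90, and slugs lose accents and spacing. If you still want to go through the href, here is a slightly more defensive sketch. The helper names split_brand_model and prettify are hypothetical, and the URL shape (brand, then model, as the first two path segments) is only inferred from the indices 3 and 4 used above; I have not verified it against the site.

```python
from urllib.parse import urlparse


def split_brand_model(href):
    """Return (brand_slug, model_slug) from a link, or None if the path is too short.

    Assumes links look like https://www.carwow.de/<brand>/<model>, which is an
    unverified guess based on the split('/') indices in the answer above.
    """
    segments = [s for s in urlparse(href).path.split("/") if s]
    if len(segments) < 2:
        return None
    return segments[0], segments[1]


def prettify(slug):
    """Capitalize each hyphen-separated word of a slug and join them with spaces."""
    return " ".join(word.capitalize() for word in slug.split("-"))


result = split_brand_model("https://www.carwow.de/renault/megane-e-tech")
if result is not None:
    brand_slug, model_slug = result
    print(prettify(brand_slug), "|", prettify(model_slug))
```

Because slugs drop the accent in Mégane and the hyphen in E-Tech, reading the visible text of the first td is likely the more faithful source for the model name.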

Answered on 2022-10-22
