Web scraping ValueError with BeautifulSoup and pandas

roejwanj, posted on 2022-11-27 in Other

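I am scraping a [website][1]. I want to retrieve the web pages from these URLs and convert each page into a BeautifulSoup object,
then extract each car's year of manufacture, engine, price, dealer information (if available), and the URL (href) that leads to the detailed car information.
When I run the code, I get the error "ValueError: not enough values to unpack (expected 4, got 3)". When I remove one value, so that instead of make, model, year and price I unpack only make, model and price, I get a different error: "too many values to unpack (expected 3)".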

import requests
import pandas as pd
from bs4 import BeautifulSoup

url = "https://jammer.ie/used-cars?page={}&per-page=12"

all_data = []

for page in range(1, 3):  # <-- increase number of pages here
    soup = BeautifulSoup(requests.get(url.format(page)).text, "html.parser")

    for car in soup.select(".car"):
        info = car.select_one(".top-info").get_text(strip=True, separator="|")
        make, model, year, price = info.split("|")
        dealer_name = car.select_one(".dealer-name h6").get_text(
            strip=True, separator=" "
        )
        address = car.select_one(".address").get_text(strip=True)

        features = {}
        for feature in car.select(".car--features li"):
            k = feature.img["src"].split("/")[-1].split(".")[0]
            v = feature.span.text
            features[f"feature_{k}"] = v

        all_data.append(
            {
                "make": make,
                "model": model,
                "year": year,
                "price": price,
                "dealer_name": dealer_name,
                "address": address,
                "url": "https://jammer.ie"
                + car.select_one("a[href*=vehicle]")["href"],
                **features,
            }
        )

df = pd.DataFrame(all_data)
# prints sample data to screen:
print(df.tail().to_markdown(index=False))
# saves all data to CSV
df.to_csv('data.csv', index=False)
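
Both errors can be reproduced without touching the network. The sketch below is only an illustration: the two strings are hypothetical stand-ins for the text that `.top-info` yields, one with a model and one without, which would explain why neither a four-name nor a three-name unpacking works for every listing.

```python
# Hypothetical header strings (assumed shapes, not scraped data).
with_model = "Toyota|Verso|2012|€8,250"
without_model = "Toyota|2012|€8,250"

# Four names, four fields: this unpacking succeeds.
make, model, year, price = with_model.split("|")

# Four names, three fields: raises "not enough values to unpack".
try:
    make, model, year, price = without_model.split("|")
except ValueError as exc:
    error_four = str(exc)

# Three names, four fields: raises "too many values to unpack".
try:
    make, model, price = with_model.split("|")
except ValueError as exc:
    error_three = str(exc)

print(error_four)
print(error_three)
```

If the listings really do mix both shapes, a single fixed-length unpacking will always fail on one of them, so the number of fields has to be checked first.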

Answer #1, by 9jyewag0

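Some listings do not include a model, so the split returns three fields instead of four. You can check whether the car has a model: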

import requests
import pandas as pd
from bs4 import BeautifulSoup

url = "https://jammer.ie/used-cars?page={}&per-page=12"

all_data = []

for page in range(1, 3):  # <-- increase number of pages here
    soup = BeautifulSoup(requests.get(url.format(page)).text, "html.parser")

    for car in soup.select(".car"):
        info = car.select_one(".top-info").get_text(strip=True, separator="|")
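        # Some listings omit the model, so the split yields 3 fields instead of 4.
        info = info.split("|")
        if len(info) == 4:
            make, model, year, price = info
        else:
            # No model in the listing: fill in a placeholder.
            make, year, price = info
            model = "N/A"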
        dealer_name = car.select_one(".dealer-name h6").get_text(
            strip=True, separator=" "
        )
        address = car.select_one(".address").get_text(strip=True)

        features = {}
        for feature in car.select(".car--features li"):
            k = feature.img["src"].split("/")[-1].split(".")[0]
            v = feature.span.text
            features[f"feature_{k}"] = v

        all_data.append(
            {
                "make": make,
                "model": model,
                "year": year,
                "price": price,
                "dealer_name": dealer_name,
                "address": address,
                "url": "https://jammer.ie"
                + car.select_one("a[href*=vehicle]")["href"],
                **features,
            }
        )

df = pd.DataFrame(all_data)
# prints sample data to screen:
print(df.tail().to_markdown(index=False))
# saves all data to CSV
df.to_csv("data.csv", index=False)

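Prints:
| make | model | year | price | dealer_name | address | url | feature_speed | feature_engine | feature_transmission | feature_door-icon1 | feature_petrol5 | feature_hatchback | feature_owner | feature_paint |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| SEAT | Leon | 2015 | Price on application | McNamara Motors | Co. Cork | https://jammer.ie/vehicle/166591-seat-leon-2015 | 45000 miles | 1.2 litres | Manual | 5 doors | Petrol | Hatchback | 2 previous owners | Grey |
| Toyota | Verso | 2012 | €8,250 | ACS | Co. Dublin | https://jammer.ie/vehicle/166590-toyota-verso-2012 | 98179 miles | 1.5 litres | Automatic | 4 doors | Petrol | MPV | nan | Purple |
| Mazda | Demio | 2012 | €7,950 | ACS | Co. Dublin | https://jammer.ie/vehicle/166589-mazda-demio-2012 | 82644 miles | 1.3 litres | Automatic | 4 doors | Petrol | Hatchback | nan | Red |
| Toyota | Corolla | 2017 | €14,950 | ACS | Co. Dublin | https://jammer.ie/vehicle/166588-toyota-corolla-2017 | 78916 miles | 1.5 litres | Automatic | 4 doors | nan | Estate | nan | Silver |
| Mazda | Demio | 2013 | €8,950 | ACS | Co. Dublin | https://jammer.ie/vehicle/166587-mazda-demio-2013 | 53439 miles | 1.3 litres | Automatic | 4 doors | Petrol | Hatchback | nan | Grey |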
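
The length check in the answer could also be written with starred unpacking, which copes with headers that have more than four fields too. This is only a sketch under an assumption the site may not always satisfy: that the make comes first and the year and price come last, as the three-field branch above implies. `parse_top_info` is a hypothetical helper name, not part of the original code.

```python
def parse_top_info(text, separator="|"):
    """Split a listing header into make, model, year and price.

    Assumes the make is the first field and the year and price are the
    last two; whatever sits in between is treated as the model.
    """
    fields = [part.strip() for part in text.split(separator) if part.strip()]
    if len(fields) < 3:
        # Too little information to identify make, year and price.
        raise ValueError(f"unexpected listing header: {text!r}")
    # Starred target collects zero or more middle fields into a list.
    make, *middle, year, price = fields
    model = " ".join(middle) if middle else "N/A"
    return {"make": make, "model": model, "year": year, "price": price}


print(parse_top_info("Toyota|Verso|2012|€8,250"))
print(parse_top_info("Toyota|2012|€8,250"))
```

Inside the scraping loop, the returned dictionary could be merged into the row with `**parse_top_info(info)` in place of the four separate keys.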
