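I'm scraping a website, and the code gives me all the information I want, but the scraped price comes with a "€" sign attached. I want the price as an integer, without the "€" sign, so that I can calculate the average car price per year. I get ValueError: invalid literal for int() with base 10: 'price', and the answers to other questions on this site didn't work for me. Year is also a string, so would it make sense to convert year to int as well, so that I can do the calculation?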
import requests
import pandas as pd
from bs4 import BeautifulSoup

url = "https://jammer.ie/used-cars?page={}&per-page=12"

all_data = []

for page in range(1, 4):  # <-- increase number of pages here
    soup = BeautifulSoup(requests.get(url.format(page)).text, "html.parser")

    for car in soup.select(".car"):
        info = car.select_one(".top-info").get_text(strip=True, separator="|")
        info = info.split("|")
        if len(info) == 4:
            make, model, year, price = info
        else:
            make, year, price = info
            model = "N/A"

        dealer_name = car.select_one(".dealer-name h6").get_text(
            strip=True, separator=" "
        )
        address = car.select_one(".address").get_text(strip=True)

        features = {}
        for feature in car.select(".car--features li"):
            k = feature.img["src"].split("/")[-1].split(".")[0]
            v = feature.span.text
            features[f"feature_{k}"] = v

        all_data.append(
            {
                "make": make,
                "model": model,
                "year": year,
                "price": price,
                "dealer_name": dealer_name,
                "address": address,
                "url": "https://jammer.ie"
                + car.select_one("a[href*=vehicle]")["href"],
                **features,
            }
        )

df = pd.DataFrame(all_data)

# prints sample data to screen:
print(df.tail().to_markdown(index=False))

# saves all data to CSV
df.to_csv("data.csv", index=False)
I tried using
df = pd.read_csv('data.csv', usecols= ['price','year'])
print(type("price"))
print(int("price"))
But this doesn't work for me. I also tried converting it to a float, and that didn't work either.
2 Answers
svmlkihl1#
Once you have the data in a pandas DataFrame, you can do:
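A minimal sketch of one way to do this, not necessarily the exact code this answer used. Note that `int("price")` in the question converts the literal string "price", not the column, which is why it raises the ValueError. The sample rows below are hypothetical stand-ins for data.csv, and the "€12,950" price format (euro sign, comma as thousands separator, whole euros) is an assumption about what the site returns.

```python
import pandas as pd

# Hypothetical sample rows standing in for data.csv;
# the "€12,950" price format is an assumption.
df = pd.DataFrame(
    {
        "year": ["2015", "2015", "2018"],
        "price": ["€12,950", "€10,050", "€21,000"],
    }
)

# Remove the euro sign and the thousands separator from the whole column.
cleaned = (
    df["price"]
    .str.replace("€", "", regex=False)
    .str.replace(",", "", regex=False)
)

# Convert to numbers; values that still cannot be parsed become NaN.
df["price"] = pd.to_numeric(cleaned, errors="coerce")
df["year"] = pd.to_numeric(df["year"], errors="coerce")

# Average price per year.
avg_price = df.groupby("year")["price"].mean()
print(avg_price)
```

Converting year to a number as well is reasonable: it lets the grouped result sort numerically and makes later arithmetic on the column possible.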
Prints:
xfyts7mz2#
You can define a custom function for that and apply it to a new or existing column, like so:
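A sketch of what such a function might look like. The name `parse_price`, the new column name `price_eur`, and the sample rows are hypothetical, and it assumes prices are whole euros such as "€12,950"; a price with a decimal part would need different handling.

```python
import re

import pandas as pd


def parse_price(value):
    """Return the digits in value as an int, or None if there are none.

    Hypothetical helper; assumes whole-euro prices such as "€12,950".
    """
    digits = re.sub(r"[^0-9]", "", str(value))
    return int(digits) if digits else None


# Hypothetical sample rows standing in for data.csv.
df = pd.DataFrame({"year": ["2015", "2018"], "price": ["€12,950", "€21,000"]})

# Apply the function to every value, storing the result in a new column.
df["price_eur"] = df["price"].apply(parse_price)

# The year strings are plain digits, so astype(int) is enough here.
df["year"] = df["year"].astype(int)

print(df)
```

Writing to a new column keeps the original strings around for checking; assigning back to `df["price"]` instead would overwrite the existing column.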