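I want to collect data from Vivino.com: all the information about each wine (name, country, rating, description, price, etc.) together with the wine's reviews.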
import requests
import pandas as pd

r = requests.get(
    "https://www.vivino.com/api/explore/explore",
    params={
        "country_code": "FR",
        "country_codes[]": "pt",
        "currency_code": "EUR",
        "grape_filter": "varietal",
        "min_rating": "1",
        "order_by": "price",
        "order": "asc",
        "page": 1,
        "price_range_max": "500",
        "price_range_min": "0",
        "wine_type_ids[]": "1",
    },
    headers={
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:66.0) Gecko/20100101 Firefox/66.0"
    },
)
results = [
    (
        t["vintage"]["wine"]["winery"]["name"],
        f'{t["vintage"]["wine"]["name"]} {t["vintage"]["year"]}',
        t["vintage"]["statistics"]["ratings_average"],
        t["vintage"]["statistics"]["ratings_count"],
    )
    for t in r.json()["explore_vintage"]["matches"]
]
dataframe = pd.DataFrame(results, columns=["Winery", "Wine", "Rating", "num_review"])
print(dataframe)
With this code I can collect ['Winery', 'Wine', 'Rating', 'num_review'].
With the code below I can collect the reviews:
import re
import json
import requests

headers = {
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:89.0) Gecko/20100101 Firefox/89.0",
}
url = "https://www.vivino.com/FR/en/dauprat-pauillac/w/3823873?year=2017&price_id=24797287"
api_url = (
    "https://www.vivino.com/api/wines/{id}/reviews?per_page=9999&year={year}"
)  # <-- increased the number of reviews to 9999
id_ = re.search(r"/(\d{5,})", url).group(1)
year = re.search(r"year=(\d+)", url).group(1)
data = requests.get(api_url.format(id=id_, year=year), headers=headers).json()
# uncomment this to print all data:
# print(json.dumps(data, indent=4))
for r in data["reviews"]:
    print(r["note"])
    print("-" * 80)
Can anyone help me combine all of this, so that each wine's information comes with its corresponding reviews?
1 Answer
To get all the reviews for the wines in the first DataFrame, you can use the following example:
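A minimal sketch of one way to do this, not a definitive implementation. It reuses the two endpoints from the question and rests on some assumptions that the question does not confirm: that each match exposes the wine id at t["vintage"]["wine"]["id"], that the reviews endpoint accepts a "page" parameter and returns an empty "reviews" list after the last page, and that each review may carry a "rating" field. The function and column names are illustrative.

```python
import time

import pandas as pd
import requests

HEADERS = {
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:89.0) Gecko/20100101 Firefox/89.0",
}
REVIEWS_URL = "https://www.vivino.com/api/wines/{id}/reviews"


def wines_to_frame(matches):
    """Flatten the "matches" list of the explore response into one row per wine."""
    rows = [
        (
            t["vintage"]["wine"]["id"],  # assumed field, not shown in the question
            t["vintage"]["year"],
            t["vintage"]["wine"]["winery"]["name"],
            f'{t["vintage"]["wine"]["name"]} {t["vintage"]["year"]}',
            t["vintage"]["statistics"]["ratings_average"],
            t["vintage"]["statistics"]["ratings_count"],
        )
        for t in matches
    ]
    columns = ["Wine ID", "Year", "Winery", "Wine", "Rating", "num_review"]
    return pd.DataFrame(rows, columns=columns)


def fetch_reviews(wine_id, year, per_page=50, pause=0.5):
    """Yield the reviews of one wine, page by page, until a page comes back empty."""
    page = 1
    while True:
        resp = requests.get(
            REVIEWS_URL.format(id=wine_id),
            # "page" is an assumed parameter; the question only uses per_page and year
            params={"per_page": per_page, "year": year, "page": page},
            headers=HEADERS,
            timeout=30,
        )
        resp.raise_for_status()
        reviews = resp.json()["reviews"]
        if not reviews:
            break
        yield from reviews
        page += 1
        time.sleep(pause)  # short pause between requests


def combine(wines, reviews_by_wine):
    """Join one row per review onto the matching wine row.

    reviews_by_wine maps (wine_id, year) to a list of review dicts.
    Wines without reviews are kept, with empty review columns.
    """
    rows = [
        (wine_id, year, r.get("rating"), r["note"])  # "rating" is an assumed field
        for (wine_id, year), reviews in reviews_by_wine.items()
        for r in reviews
    ]
    notes = pd.DataFrame(rows, columns=["Wine ID", "Year", "User Rating", "Note"])
    return wines.merge(notes, on=["Wine ID", "Year"], how="left")


def build_dataset(matches, path="data.csv"):
    """Fetch the reviews of every wine in matches and write the joined table to path."""
    wines = wines_to_frame(matches)
    reviews = {
        (wine_id, year): list(fetch_reviews(wine_id, year))
        for wine_id, year in zip(wines["Wine ID"], wines["Year"])
    }
    combined = combine(wines, reviews)
    combined.to_csv(path, index=False)
    return combined
```

With r being the response of the first request in the question, build_dataset(r.json()["explore_vintage"]["matches"]) would write data.csv with one row per review. The left join is a design choice: it keeps wines that have no reviews, so the wine list stays complete.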
This creates data.csv (about 40k reviews; screenshot from LibreOffice).
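To check the file without a spreadsheet, it can be loaded back with pandas. The sketch below counts the non-empty review notes per wine; the column names "Wine" and "Note" are assumptions about how data.csv was written and should be adjusted to the actual header.

```python
import pandas as pd


def summarize_reviews(path, wine_column="Wine", note_column="Note"):
    """Return the number of non-empty review notes per wine, most reviewed first."""
    df = pd.read_csv(path)
    # count() ignores missing values, so wines without notes get 0
    counts = df.groupby(wine_column)[note_column].count()
    return counts.sort_values(ascending=False)
```

For example, summarize_reviews("data.csv").head(10) would list the ten wines with the most review notes.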