csv: Problem with a Python script for scraping Trustpilot reviews

Posted by nc1teljy on 2023-11-14 in Python

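I'm new to Python and to coding in general, but I'm trying to create a script that pulls customer reviews from Trustpilot. I think I have something working, and I tested it in Google Bard. I can get Bard to return results, but when I run the same script in PyCharm CE on my Mac, it creates a .csv file with the correct headers but no data.
I'm sure I'm missing something obvious. Why can Google Bard run the script and return results, while on my machine I only get the header row in the CSV file?
Any help would be greatly appreciated. I get no errors when I run it locally. I have Python 3.12 and all the required modules installed.
Thanks, Justin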

from selenium import webdriver
from selenium.webdriver.common.by import By
from bs4 import BeautifulSoup
import csv
import datetime

# Create a new Selenium webdriver instance
driver = webdriver.Chrome()

# Navigate to the given page
driver.get("https://uk.trustpilot.com/review/www.whsmith.co.uk")

# Wait for the page to load
driver.implicitly_wait(10)

# Get the HTML source code of the page
html = driver.page_source

# Create a BeautifulSoup object from the HTML source code
soup = BeautifulSoup(html, "html.parser")

# Extract all of the reviews from the page
reviews = soup.findAll("div", class_="review")

# Create a new CSV file to store the reviews
with open("whsmith_reviews.csv", "w", newline="") as f:
    writer = csv.writer(f)

    # Write the header row
    writer.writerow(["Review Title", "Review Text", "Rating", "Review Date"])

    # Iterate over the reviews and write them to the CSV file
    for review in reviews:
        title = review.find("h2", class_="review-title").text
        text = review.find("p", class_="review-text").text
        rating = review.find("span", class_="review-rating").text
        date_str = review.find("span", class_="review-date").text
        date = datetime.datetime.strptime(date_str, "%d %b %Y")

        # Add the review to the CSV file
        writer.writerow([title, text, rating, date])

# Close the Selenium webdriver instance
driver.quit()


Answer by sqougxex:

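The main problem is how you select the reviews: there is no div with the class review on the page. You could focus on the article elements instead:

soup.select('article')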
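This also explains why the script in the question produces a file with only the header row and no error: the header is written unconditionally, and a for loop over an empty result list simply never runs. A minimal stdlib-only sketch of that behavior, where an in-memory buffer stands in for whsmith_reviews.csv and an empty list stands in for what findAll("div", class_="review") returned (write_reviews is a hypothetical helper):

```python
import csv
import io


def write_reviews(reviews):
    """Hypothetical helper mimicking the question's CSV step: header first, then one row per review."""
    buffer = io.StringIO()  # stands in for whsmith_reviews.csv
    writer = csv.writer(buffer)
    writer.writerow(["Review Title", "Review Text", "Rating", "Review Date"])
    for title, text, rating, date in reviews:
        writer.writerow([title, text, rating, date])
    return buffer.getvalue()


# No matching <div class="review"> elements -> empty list -> header only, no exception
print(repr(write_reviews([])))  # 'Review Title,Review Text,Rating,Review Date\r\n'

# With one (made-up) review, a data row appears below the header
print(repr(write_reviews([("Great", "Fast delivery", "5", "2023-11-14")])))
```

So an empty CSV with correct headers almost always means the selector matched nothing, not that the file writing failed.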


  • In newer code, avoid the old syntax findAll(); use find_all() or select() with CSS selectors instead. For more information, take a minute to check the documentation.
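A small offline sketch of the difference between these calls, run against a hand-written HTML fragment. The markup is an assumption that only imitates the article / data-attribute structure targeted by the selectors in the code below; it is not copied from Trustpilot, and the real page may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical markup imitating the targeted structure (not real Trustpilot HTML)
html = """
<article>
  <h2>Great service</h2>
  <p data-service-review-text-typography="true">Arrived the next day.</p>
</article>
<article>
  <h2>Late order</h2>
</article>
"""

soup = BeautifulSoup(html, "html.parser")

# The question's lookup, written with find_all(): no <div class="review"> exists, so nothing matches
print(len(soup.find_all("div", class_="review")))  # 0

# The same idea with CSS selectors via select() / select_one()
articles = soup.select("article")
print(len(articles))  # 2
for article in articles:
    body = article.select_one("[data-service-review-text-typography]")
    # select_one() returns None when nothing matches, so guard before reading .text
    print(article.h2.text, "|", body.text if body else None)
```

With this fragment the loop prints "Great service | Arrived the next day." and "Late order | None"; the guard is what keeps a review without body text from raising AttributeError.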

You also don't need Selenium in this case; just take a look:

from bs4 import BeautifulSoup
import requests, csv

data = []

from_page = 1
to_page = 5

for i in range(from_page, to_page + 1):
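    # request page i (Trustpilot paginates with ?page=N; without it every iteration fetches page 1)
    response = requests.get(f"https://uk.trustpilot.com/review/www.whsmith.co.uk?page={i}")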
    web_page = response.text
    soup = BeautifulSoup(web_page, "html.parser")

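    # each review is rendered inside its own <article> element
    for e in soup.select('article'):
        text_tag = e.select_one('[data-service-review-text-typography]')
        data.append({
            'review_title': e.h2.text,
            # the text carries a label before ': ', so keep only the date part after it
            'review_date_original': e.select_one('[data-service-review-date-of-experience-typography]').text.split(': ')[-1],
            # the star rating is read from the alt text of the rating image
            'review_rating': e.select_one('[data-service-review-rating] img').get('alt'),
            # not every review has body text, so fall back to None
            'review_text': text_tag.text if text_tag else None,
            'page_number': i
        })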


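# fixed column names: an empty result still yields a header row instead of an IndexError on data[0]
fieldnames = ['review_title', 'review_date_original', 'review_rating', 'review_text', 'page_number']

with open('zzz_my_result.csv', 'w', newline='', encoding='utf-8') as output_file:
    dict_writer = csv.DictWriter(output_file, fieldnames=fieldnames)
    dict_writer.writeheader()
    dict_writer.writerows(data)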
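The code above keeps review_date_original as the raw text after ': '. If you still want a real date, as the question's strptime call intended, note that %b only matches abbreviated month names such as "Nov"; a full name such as "November" needs %B. A minimal sketch, assuming the extracted text looks like "14 November 2023" (this format is an assumption, so check what the page actually shows); parse_review_date is a hypothetical helper:

```python
from datetime import datetime


def parse_review_date(date_text):
    """Hypothetical helper: turn e.g. '14 November 2023' or '14 Nov 2023' into a date."""
    for fmt in ("%d %B %Y", "%d %b %Y"):  # full month name first, then abbreviated
        try:
            return datetime.strptime(date_text.strip(), fmt).date()
        except ValueError:
            continue  # this format did not match, try the next one
    return None  # unknown format: keep the row, leave the date empty


print(parse_review_date("14 November 2023"))   # 2023-11-14
print(parse_review_date("14 Nov 2023"))        # 2023-11-14
print(parse_review_date("November 14, 2023"))  # None
```

Returning None instead of raising means one oddly formatted date does not abort the whole scrape; month names are matched in the current LC_TIME locale, which is English unless the program changes it.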
