Python: extracting Yelp review ratings

Posted by xmjla07d on 2023-03-16 in Python

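I'm very new to web scraping and recently started scraping a restaurant's page on Yelp. I'm having trouble extracting the rating from each user review: every time I try, there seem to be no matches.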

import re

rating = review.find('img', attrs={'class': 'offscreen__373c0__1KofL'}).get('alt')
rating_list = re.findall(r'\d+', rating)  # raw string, so \d is not treated as a string escape

if len(rating_list) > 0:
   rating_float = float(rating_list[0])
   print(rating_float)
else:
   print("No matches have been found")

Is my "review.find('img', attrs={'class': 'offscreen__373c0__1KofL'}).get('alt')" wrong?
I've added selenium as suggested in the comments, but it still comes up short.
URL: https://www.yelp.com/biz/el-farolito-san-francisco-2?osq=Mexican%20Food
Rating image URL: src="https://s3-media0.fl.yelpcdn.com/assets/public/stars_v2.yji-52d3d7a328db670d4402843cbddeed89.png"
Any other suggestions?
Thank you.

62lalag4 (answer #1)

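You can try a third-party API such as the Yelp Reviews API from SerpApi, which is a paid API with a free plan. It bypasses blocks (including CAPTCHA) on its backend, so you don't need to build and maintain a parser.
With the API we can paginate through the search to get all available results.
To do that, we first need to get the list of restaurant IDs: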

from serpapi import GoogleSearch

params = {
  "engine": "yelp",                      # SerpApi search engine
  "find_desc": "Mexican Food",           # query
  "find_loc": "San Francisco, CA, USA",  # location of search
  "api_key": "...",                      # serpapi key, https://serpapi.com/manage-api-key
  "start": 0                             # pagination
}

search = GoogleSearch(params)         # where data extraction happens on the backend
results = search.get_dict()           # JSON -> Python dict

# example from the first page only
organic_results_data = [
    (result['title'], result['place_ids'][0]) 
    for result in results['organic_results']
]
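
If a search result happens to lack `place_ids`, the list comprehension above would raise a `KeyError` or `IndexError`. A more defensive variant might look like the sketch below. The helper name `extract_place_ids` and the sample dictionary are made up for illustration; only the `organic_results`, `title`, and `place_ids` keys come from the snippet above.

```python
def extract_place_ids(results):
    """Return (title, first place ID) pairs, skipping results without IDs."""
    pairs = []
    for result in results.get('organic_results', []):
        place_ids = result.get('place_ids') or []
        if place_ids:
            pairs.append((result.get('title'), place_ids[0]))
    return pairs


# hypothetical response shape, for illustration only
sample_results = {
    'organic_results': [
        {'title': 'El Farolito', 'place_ids': ['el-farolito-san-francisco-2']},
        {'title': 'No IDs here'},  # skipped: no place_ids key
    ]
}

print(extract_place_ids(sample_results))
# [('El Farolito', 'el-farolito-san-francisco-2')]
```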

Then we iterate over all restaurants using the "place_ids" values and extract the reviews with a while loop:

for title, place_id in organic_results_data:
    reviews_params = {
        'api_key': "...",                   # serpapi key, https://serpapi.com/manage-api-key
        'engine': 'yelp_reviews',           # SerpApi search engine 
        'place_id': place_id,               # Yelp ID of a place
        'start': 0                          # pagination
    }
    
    reviews_search = GoogleSearch(reviews_params)
    
    reviews_page_limit = 20                 # 2 pages

    reviews = []
    
    # pagination
    while True:
        new_reviews_page_results = reviews_search.get_dict()

        if 'error' in new_reviews_page_results:
            break
        
        reviews.extend(new_reviews_page_results['reviews'])
        
        reviews_params['start'] += 10
        
        # stop reviews pagination after 2 pages of reviews
        # (>= also stops if the limit is not a multiple of the page size)
        if reviews_params['start'] >= reviews_page_limit:
            break
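
The pagination logic above can also be separated from the API call, which makes it easy to try out without a key. This is only a sketch under assumptions: `collect_reviews` and `fake_fetch_page` are hypothetical names, and the fake fetcher just imitates a response that has either a `reviews` list or an `error` key.

```python
def collect_reviews(fetch_page, page_size=10, max_results=20):
    """Collect reviews page by page until an error, an empty page, or the cap."""
    reviews = []
    start = 0
    while start < max_results:
        page = fetch_page(start)
        if 'error' in page:
            break
        page_reviews = page.get('reviews', [])
        if not page_reviews:
            break
        reviews.extend(page_reviews)
        start += page_size
    return reviews


# fake fetcher standing in for the API call: 25 reviews, 10 per page
def fake_fetch_page(start):
    data = [{'rating': 5, 'position': i} for i in range(25)]
    chunk = data[start:start + 10]
    return {'reviews': chunk} if chunk else {'error': 'no more results'}


print(len(collect_reviews(fake_fetch_page)))                   # 20
print(len(collect_reviews(fake_fetch_page, max_results=100)))  # 25
```

With the real client, `fetch_page` could be something like `lambda start: GoogleSearch({**reviews_params, 'start': start}).get_dict()`, which builds a fresh search for each page instead of mutating the shared dictionary.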

See the full code in the online IDE.

from serpapi import GoogleSearch
import os, json

params = {
  "engine": "yelp",                      # SerpApi search engine
  "find_desc": "Mexican Food",           # query
  "find_loc": "San Francisco, CA, USA",  # location of search
  "api_key": "...",                      # serpapi key, https://serpapi.com/manage-api-key
  "start": 0                             # pagination
}

search = GoogleSearch(params)         # where data extraction happens on the backend
results = search.get_dict()           # JSON -> Python dict

organic_results_data = [
    (result['title'], result['place_ids'][0]) 
    for result in results['organic_results']
]

yelp_reviews = []

for title, place_id in organic_results_data:
    reviews_params = {
        # tip: the key can be read from an environment variable instead of being hard-coded,
        # see https://docs.python.org/3/library/os.html#os.getenv
        'api_key': '...',                   # your serpapi api key
        'engine': 'yelp_reviews',           # SerpApi search engine 
        'place_id': place_id,               # Yelp ID of a place
        'start': 0                          # pagination
    }
    
    reviews_search = GoogleSearch(reviews_params)
    reviews_page_limit = 20                 # 2 pages
    reviews = []
    
    # pagination
    while True:
        new_reviews_page_results = reviews_search.get_dict()

        if 'error' in new_reviews_page_results:
            break
        
        reviews.extend(new_reviews_page_results['reviews'])
        
        reviews_params['start'] += 10
        
        # stop reviews pagination after 2 pages of reviews
        if reviews_params['start'] >= reviews_page_limit:
            break

    # append once per restaurant, after its pagination loop has finished
    yelp_reviews.append({
        'title': title,
        'reviews': reviews
    })

print(json.dumps(yelp_reviews, indent=2, ensure_ascii=False))

Example output:

[
    {
        "user": {
          "name": "Cindy H.",
          "user_id": "WwhMBjrtxbwLCdZCZPx8rA",
          "link": "https://www.yelp.com/user_details?userid=WwhMBjrtxbwLCdZCZPx8rA",
          "thumbnail": "https://s3-media0.fl.yelpcdn.com/photo/28KGXzhU1YE1vjHDCcKGwg/60s.jpg",
          "address": "Elk Grove, CA",
          "friends": 52,
          "photos": 392,
          "reviews": 227
        },
        "comment": {
          "text": "By far my favorite fast food Mexican spot! My boyfriend and I just finished a concert it's about to be midnight and of course I've been told by many to try this place and now I only wish this place was next door to where I live. We shared a Quesadilla Suiza with Al Pastor and a Carne Asada Super Burrito. Both were huge portions so perfect for sharing and super mouthwatering good. If we had to pick our favorite it would be the Quesadilla Suiza, you have meat choice but Al Pastor for the win!",
          "language": "en"
        },
        "date": "3/30/2019",
        "rating": 5,
        "tags": [
          "1 photo"
        ],
        "photos": [
          {
            "link": "https://s3-media0.fl.yelpcdn.com/bphoto/6XcZ9jT8GhchV2_hC7GqeA/o.jpg",
            "caption": "Quesadilla Suiza with Al Pastor",
            "uploaded": "March 30, 2019"
          }
        ]
      },
      {
        "user": {
          "name": "Jessica L.",
          "user_id": "6b_AT0JndXFB9xDd31ilBA",
          "link": "https://www.yelp.com/user_details?userid=6b_AT0JndXFB9xDd31ilBA",
          "thumbnail": "https://s3-media0.fl.yelpcdn.com/photo/1u31AhXlc5eSSP5-ufEOeQ/60s.jpg",
          "address": "Los Altos, CA",
          "friends": 175,
          "photos": 705,
          "reviews": 84
        },
        "comment": {
          "text": "Personally I'm a bigger fan of la taqueria, but this place is definitely a close second.",
          "language": "en"
        },
        "date": "1/20/2019",
        "rating": 4,
        "tags": [
          "3 photos"
        ],
        "photos": [
          {
            "link": "https://s3-media0.fl.yelpcdn.com/bphoto/Qs_2Q838VmTTUAoqjL4dGA/o.jpg",
            "uploaded": "January 19, 2019"
          },
          {
            "link": "https://s3-media0.fl.yelpcdn.com/bphoto/JQKGPayP0XbZNJ67zBqRfA/o.jpg",
            "uploaded": "January 19, 2019"
          },
          {
            "link": "https://s3-media0.fl.yelpcdn.com/bphoto/AaSiy3bJf9SIIM5svx_XBw/o.jpg",
            "uploaded": "January 19, 2019"
          }
        ]
      },
    
    other results ...
]
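
Since the original question was about ratings, here is a small sketch of pulling them out of review dictionaries shaped like the output above. The `sample_reviews` list is trimmed from that output, and `summarize_ratings` is a hypothetical helper name, not part of any library.

```python
def summarize_ratings(reviews):
    """Return (name, rating) pairs and the average rating (None if no ratings)."""
    pairs = [
        (review.get('user', {}).get('name'), review['rating'])
        for review in reviews
        if 'rating' in review
    ]
    average = sum(rating for _, rating in pairs) / len(pairs) if pairs else None
    return pairs, average


# trimmed from the example output above
sample_reviews = [
    {'user': {'name': 'Cindy H.'}, 'rating': 5},
    {'user': {'name': 'Jessica L.'}, 'rating': 4},
]

pairs, average = summarize_ratings(sample_reviews)
print(pairs)    # [('Cindy H.', 5), ('Jessica L.', 4)]
print(average)  # 4.5
```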

There is a blog post, Scrape Yelp Reviews Results with SerpApi and Python, where you can find a more detailed explanation of the code.
Disclaimer: I work for SerpApi.
