Python: How do I fix my code that scrapes the Zomato website?

Posted by r6l8ljro on 2022-12-10 in Python

I wrote the code below, but I get the error "IndexError: list index out of range". How can I fix this?

import requests
from bs4 import BeautifulSoup

headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36'}
response = requests.get("https://www.zomato.com/bangalore/top-restaurants", headers=headers)

content = response.content
soup = BeautifulSoup(content, "html.parser")

top_rest = soup.find_all("div", attrs={"class": "sc-bblaLu dOXFUL"})
list_tr = top_rest[0].find_all("div", attrs={"class": "sc-gTAwTn cKXlHE"})

list_rest = []
for tr in list_tr:
    dataframe = {}
    dataframe["rest_name"] = (tr.find("div", attrs={"class": "res_title zblack bold nowrap"})).text.replace('\n', ' ')
    dataframe["rest_address"] = (tr.find("div", attrs={"class": "nowrap grey-text fontsize5 ttupper"})).text.replace('\n', ' ')
    dataframe["cuisine_type"] = (tr.find("div", attrs={"class": "nowrap grey-text"})).text.replace('\n', ' ')
    list_rest.append(dataframe)
list_rest

Answer #1 (avwztpqn)

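You get this error because top_rest is empty when you try to take its first element with **top_rest[0]**. It is empty because the first class you reference is dynamically named: if you refresh the page, the div in the same position will no longer carry the same class name, so the search returns nothing.
Another approach is to fetch all the divs and then narrow them down to the elements you want. Keep the dynamic div naming pattern in mind, because the results can differ from one request to the next: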

import requests
from bs4 import BeautifulSoup

headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36'}
response = requests.get("https://www.zomato.com/bangalore/top-restaurants",headers=headers)

content = response.content
soup = BeautifulSoup(content,"html.parser")

top_rest = soup.find_all("div")
list_tr = top_rest[0].find_all("div",attrs={"class": "bke1zw-1 eMsYsc"})
list_tr
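
The two ideas in this answer, matching on a stable part of the class and checking for an empty result before indexing, can be sketched offline. This is only an illustration under stated assumptions: the HTML is made up, "res-card" is a hypothetical stable class token, and the standard library's html.parser is used so the sketch runs without network access or extra packages. The real Zomato markup will differ.

```python
from html.parser import HTMLParser

# Made-up markup: the "sc-..." names stand in for generated class names that
# change between page loads; "res-card" is a hypothetical stable token.
SAMPLE_HTML = """
<div class="sc-abc123 xYz res-card"><div class="title">Cafe One</div></div>
<div class="sc-def456 qRs res-card"><div class="title">Cafe Two</div></div>
<div class="sc-ghi789 footer">About</div>
"""


class TokenDivCollector(HTMLParser):
    """Collect the text of every <div> whose class list contains a token."""

    def __init__(self, token):
        super().__init__()
        self.token = token
        self.depth = 0      # nesting depth inside a matching div
        self.matches = []   # one text string per matching div

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        if self.depth > 0:
            self.depth += 1  # a div nested inside a match
        elif self.token in (dict(attrs).get("class") or "").split():
            self.depth = 1
            self.matches.append("")

    def handle_endtag(self, tag):
        if tag == "div" and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0:
            self.matches[-1] += data.strip()


def find_cards(html, token):
    """Return the text of each div that has `token` among its classes."""
    parser = TokenDivCollector(token)
    parser.feed(html)
    return parser.matches


cards = find_cards(SAMPLE_HTML, "res-card")
missing = find_cards(SAMPLE_HTML, "sc-bblaLu")  # a class that is not on the page

# Guard before indexing: an empty list is exactly what raised the IndexError.
first_card = cards[0] if cards else None
first_missing = missing[0] if missing else None
print(first_card, first_missing)
```

With BeautifulSoup the same guard applies: test whether find_all returned anything before taking element [0].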

Answer #2 (jslywgbw)

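I recently worked on a project that had me look into scraping Zomato's website for Manila, Philippines. I used the geopy library to get the latitude and longitude of Manila City, then used those values to fetch restaurant details. You can get your own API key on the Zomato website, which allows up to 1,000 calls per day.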

# Use geopy library to get the latitude and longitude values of Manila City.
import numpy as np
import pandas as pd
import requests
from geopy.geocoders import Nominatim

address = 'Manila City, Philippines'
geolocator = Nominatim(user_agent = 'Makati_explorer')
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Manila City are {}, {}.'.format(latitude, longitude))

# Use Zomato's API to make call
headers = {'user-key': '617e6e315c6ec2ad5234e884957bfa4d'}
venues_information = []

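# foursquare_venues is a DataFrame from the author's project (not defined here);
# the loop expects it to have 'name', 'lat' and 'lng' columns.
for index, row in foursquare_venues.iterrows():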
    print("Fetching data for venue: {}".format(index + 1))
    venue = []
    url = ('https://developers.zomato.com/api/v2.1/search?q={}' + 
          '&start=0&count=1&lat={}&lon={}&sort=real_distance').format(row['name'], row['lat'], row['lng'])
    try:
        result = requests.get(url, headers = headers).json()
    except (requests.RequestException, ValueError):
        # The request failed or the body was not valid JSON; skip this venue
        # instead of reusing a stale or undefined result.
        print("There was an error...")
        continue
    try:
        if (len(result['restaurants']) > 0):
            restaurant = result['restaurants'][0]['restaurant']
            venue.append(restaurant['name'])
            venue.append(restaurant['location']['latitude'])
            venue.append(restaurant['location']['longitude'])
            venue.append(restaurant['average_cost_for_two'])
            venue.append(restaurant['price_range'])
            venue.append(restaurant['user_rating']['aggregate_rating'])
            venue.append(restaurant['location']['address'])
            venues_information.append(venue)
        else:
            # Placeholder row: one zero per DataFrame column (7, not 6).
            venues_information.append(np.zeros(7))
    except (KeyError, TypeError):
        # The response did not have the expected shape; skip this venue.
        pass

ZomatoVenues = pd.DataFrame(venues_information, 
                                  columns = ['venue', 'latitude', 
                                             'longitude', 'price_for_two', 
                                             'price_range', 'rating', 'address'])
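
As a complement to the loop above, here is a minimal offline sketch of two steps that are easy to get wrong: percent-encoding the query string and reading fields from the response without raising on missing keys. The helper names build_search_url and extract_row and the sample response are hypothetical. The field names are copied from the answer's code, and whether the endpoint still responds this way is an assumption.

```python
from urllib.parse import urlencode

# Endpoint copied from the answer above.
BASE_URL = "https://developers.zomato.com/api/v2.1/search"


def build_search_url(name, lat, lng):
    """Build the search URL with the query string percent-encoded."""
    params = {"q": name, "start": 0, "count": 1,
              "lat": lat, "lon": lng, "sort": "real_distance"}
    return BASE_URL + "?" + urlencode(params)


def extract_row(result):
    """Return a 7-item row matching the DataFrame columns, or all None."""
    restaurants = result.get("restaurants") or []
    if not restaurants:
        return [None] * 7
    restaurant = restaurants[0].get("restaurant", {})
    location = restaurant.get("location", {})
    return [
        restaurant.get("name"),
        location.get("latitude"),
        location.get("longitude"),
        restaurant.get("average_cost_for_two"),
        restaurant.get("price_range"),
        restaurant.get("user_rating", {}).get("aggregate_rating"),
        location.get("address"),
    ]


# Made-up response in the shape the answer's code expects.
sample = {"restaurants": [{"restaurant": {
    "name": "Cafe One",
    "location": {"latitude": "14.59", "longitude": "120.98", "address": "Manila"},
    "average_cost_for_two": 800,
    "price_range": 2,
    "user_rating": {"aggregate_rating": "4.1"},
}}]}

url = build_search_url("Joe's Bar & Grill", 14.5995, 120.9842)
row = extract_row(sample)
empty_row = extract_row({"restaurants": []})
print(url)
print(row)
```

Encoding the name matters because venue names often contain spaces, ampersands, or apostrophes, which would otherwise corrupt the query string.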

Answer #3 (rryofs0p)

Using Web Scraping Language (WSL), I can write the following:

GOTO https://www.zomato.com/bangalore/top-restaurants
EXTRACT {'rest_name': '//div[@class="res_title zblack bold nowrap"]', 
         'rest_address': '//div[@class="nowrap grey-text fontsize5 ttupper"]', 
         'cuisine_type': '//div[@class="nowrap grey-text"]'} IN //div[@class="bke1zw-1 eMsYsc"]

This iterates over each record element with the class bke1zw-1 eMsYsc and extracts the information for each restaurant.
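
For readers who do not use WSL, the same per-record iteration can be sketched in Python. This is an illustration only: it runs on a made-up, well-formed snippet using the limited XPath support in the standard library's ElementTree, and the class names are copied from the answers in this thread and may no longer match the live site. Real pages are rarely well-formed XML, so an HTML parser such as BeautifulSoup would be needed in practice.

```python
import xml.etree.ElementTree as ET

# Made-up, well-formed markup; class names are taken from this thread.
SAMPLE = """
<html><body>
  <div class="bke1zw-1 eMsYsc">
    <div class="res_title zblack bold nowrap">Cafe One</div>
    <div class="nowrap grey-text fontsize5 ttupper">Indiranagar</div>
    <div class="nowrap grey-text">Italian</div>
  </div>
  <div class="bke1zw-1 eMsYsc">
    <div class="res_title zblack bold nowrap">Cafe Two</div>
    <div class="nowrap grey-text fontsize5 ttupper">Koramangala</div>
  </div>
</body></html>
"""

FIELDS = {
    "rest_name": "res_title zblack bold nowrap",
    "rest_address": "nowrap grey-text fontsize5 ttupper",
    "cuisine_type": "nowrap grey-text",
}


def extract_records(markup, record_class, fields):
    """Return one dict per record div; missing fields become None."""
    root = ET.fromstring(markup.strip())
    records = []
    for node in root.findall(f".//div[@class='{record_class}']"):
        row = {}
        for key, cls in fields.items():
            # Exact match on the whole class attribute, as in the WSL query.
            hit = node.find(f"./div[@class='{cls}']")
            row[key] = hit.text.strip() if hit is not None and hit.text else None
        records.append(row)
    return records


records = extract_records(SAMPLE, "bke1zw-1 eMsYsc", FIELDS)
for record in records:
    print(record)
```

Returning None for a missing field, rather than calling .text on a failed lookup, avoids an AttributeError when a restaurant card lacks one of the expected divs.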
