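I am trying to scrape https://howlongtobeat.com, but I keep getting a 302 redirect. Using the browser's network monitor, I found that the site loads its search results via AJAX.
My code: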
import scrapy


class HltbSpider(scrapy.Spider):
    name = 'hltb'

    def start_requests(self):
        # Only page 1 is requested here; widen the range to fetch more pages.
        for i in range(1, 2):
            url = f'https://howlongtobeat.com/search_results?page={i}'
            # Form-encoded body copied from the AJAX request seen in the network monitor.
            payload = "queryString=&t=games&sorthead=popular&sortd=0&plat=&length_type=main&length_min=&length_max=&v=&f=&g=&detail=&randomize=0"
            headers = {
                "content-type": "application/x-www-form-urlencoded",
                "user-agent": "Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.4951.67 Mobile Safari/537.36"
            }
            # dont_redirect + handle_httpstatus_list pass the 302 to parse() instead of following it.
            yield scrapy.Request(url, meta={'dont_redirect': True, 'handle_httpstatus_list': [302]}, method='POST', body=payload, headers=headers, callback=self.parse)

    def parse(self, response):
        # Each search result is rendered as a div with class "search_list_details".
        cards = response.css('div[class="search_list_details"]')
        for card in cards:
            game_name = card.css('a[class=text_white]::attr(title)').get()
            game_dict = {"Game_name": game_name}
            yield game_dict
It used to work, then it suddenly stopped, and now I keep getting 302 redirects. What seems to be the problem?
1 Answer
Try setting the Referer request header:
"referer": "https://howlongtobeat.com/"
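As a minimal sketch of that suggestion, the headers from the question could be built with the extra "referer" entry as shown here. The helper name build_headers is hypothetical, introduced only for illustration, and whether the site really rejects requests that lack a Referer is the assumption behind this answer, not something verified here.

```python
# Hypothetical helper (not part of Scrapy): builds the request headers from
# the question, plus the "referer" entry suggested in this answer.
SITE_ROOT = "https://howlongtobeat.com/"

USER_AGENT = (
    "Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) "
    "AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/101.0.4951.67 Mobile Safari/537.36"
)


def build_headers(referer=SITE_ROOT):
    """Return the headers dict to pass to scrapy.Request(headers=...)."""
    return {
        "content-type": "application/x-www-form-urlencoded",
        "user-agent": USER_AGENT,
        # Assumption: the server redirects requests that do not appear to
        # originate from one of its own pages.
        "referer": referer,
    }


headers = build_headers()
```

In the spider, this dict would replace the headers literal inside start_requests and be passed as headers=headers to scrapy.Request. If the 302 persists, the site may have changed its endpoint or may require other headers or cookies, which this sketch does not cover.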