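I'm trying to scrape the viewer chart at https://twitchtracker.com/games/32982. I inspected the page, but I can't seem to find the chart's values.
I tried inspecting the chart element, but this is all I got.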
Here is an example of how to use scrapy-playwright to extract the data you need:
from collections import defaultdict

import scrapy

# Browser-like request headers sent with the page request.
headers = {
    'authority': 'api.twitch.tv',
    'accept': '*/*',
    'accept-language': 'en-GB,en-US;q=0.9,en;q=0.8',
    'authorization': 'Bearer a4xqvkgns8duyuudnnr3pmo5mjxl36',
    'client-id': '2eji4dzts0ppwwpprxwnao9d27sfwz',
    'origin': 'https://twitchtracker.com',
    'referer': 'https://twitchtracker.com/',
    'sec-ch-ua': '".Not/A)Brand";v="99", "Google Chrome";v="103", "Chromium";v="103"',
    'sec-ch-ua-mobile': '?0',
    'sec-ch-ua-platform': '"macOS"',
    'sec-fetch-dest': 'empty',
    'sec-fetch-mode': 'cors',
    'sec-fetch-site': 'cross-site',
    'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36',
}


class twitchSpider(scrapy.Spider):
    name = 'twitch'
    start_urls = ['https://twitchtracker.com/games/32982']

    def start_requests(self):
        for url in self.start_urls:
            yield scrapy.Request(
                url,
                headers=headers,
                callback=self.parse,
                # Render the page with Playwright so the chart's SVG is present in the response.
                meta={'playwright': True},
            )

    def parse(self, response):
        # Path data (`d`) and class of every element in the first Highcharts series group.
        values = response.xpath("(.//*[name()='g'][@class='highcharts-series-group'])[1]//*[name()='g']//*//@d").getall()
        class_values = response.xpath("(.//*[name()='g'][@class='highcharts-series-group'])[1]//*[name()='g']//*//@class").getall()
        chart_dict = defaultdict(list)
        for charts, names in zip(values, class_values):
            #chart_dict[names].append(charts)
            # Assuming tokens of the form 'M x y L x y ...': take every third token
            # starting at index 1, then every third token starting at index 2.
            chart_dict['big'].append(charts.split(' ')[1::3])
            chart_dict['small'].append(charts.split(' ')[2::3])
            chart_dict['classes'].append(names)
        yield chart_dict
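The `'playwright': True` meta key only takes effect once scrapy-playwright is installed (`pip install scrapy-playwright`, then `playwright install` for the browsers) and enabled in the project. A minimal sketch of the settings the scrapy-playwright README asks for, assuming they go in the project's `settings.py` (they could also be placed in the spider's `custom_settings`):

```python
# settings.py: route HTTP and HTTPS downloads through Playwright.
DOWNLOAD_HANDLERS = {
    "http": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
    "https": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
}

# scrapy-playwright needs the asyncio-based Twisted reactor.
TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"
```

With these in place, the spider can be run as usual, for example with `scrapy crawl twitch -O chart.json`.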
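Because the page overlays several line charts, this scrapes the path data for each series separately. Each path is then split into two lists of values, stored under big and small; the rest can be done by plotting the correct values.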
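To make the splitting step concrete, here is a small standalone sketch. It assumes each `d` attribute has the simple form `M x y L x y ...`; paths that use curve commands would need different handling. The helper names `parse_path` and `pixel_to_value`, the sample path, and the axis tick positions are all hypothetical and not taken from the page. The scraped numbers are SVG pixel coordinates, so recovering viewer counts would require two known axis ticks to interpolate between.

```python
def parse_path(d):
    """Split an SVG path of the form 'M x y L x y ...' into (x, y) float pairs."""
    tokens = d.split()
    xs = tokens[1::3]  # same slice the spider stores under 'big'
    ys = tokens[2::3]  # same slice the spider stores under 'small'
    return [(float(x), float(y)) for x, y in zip(xs, ys)]


def pixel_to_value(pixel, pixel_a, value_a, pixel_b, value_b):
    """Linearly interpolate an axis value from two known (pixel, value) ticks."""
    scale = (value_b - value_a) / (pixel_b - pixel_a)
    return value_a + (pixel - pixel_a) * scale


# Hypothetical path string, not real data from the page.
sample = "M 0 200 L 50 100 L 100 150"
points = parse_path(sample)

# Hypothetical axis: pixel 200 is the value 0 and pixel 0 is the value 1000.
values = [pixel_to_value(y, 200.0, 0.0, 0.0, 1000.0) for _, y in points]
```

The tick positions would have to be read from the chart's axis labels in the same SVG; the sketch does not attempt that.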