Scrapy: how to extract data from a table on Fangraphs

cbjzeqam  posted 12 months ago  in Other

I am trying to use Scrapy to extract the player names from this table.
I tried this:

rows = response.css('table tr.is-selected_invalid')

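That was only meant to get the rows, but even that doesn't work. I also tried using td.alight-left.fixed to get the data directly from the table, but that didn't help either. I'd appreciate any help.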

ijxebb2r  #1

Use this selector:

div.table-scroll table tbody tr.is-selected__invalid

Note: there are two underscores before "invalid".

0vvn1miw  #2

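The table you are trying to scrape is rendered by JavaScript and is not present in the HTML returned by a request to the URL you referenced in your question.
Because of that, no selector will give you access to the data through a request to that URL.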
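One way to check this for yourself is to count the matching rows in the raw response body before writing any selectors. The sketch below uses only the standard library; `RowCounter` and `count_rows` are hypothetical helper names, and the two HTML strings are made-up stand-ins for a rendered and an unrendered page, not markup copied from Fangraphs.

```python
from html.parser import HTMLParser


class RowCounter(HTMLParser):
    """Count <tr> start tags that carry a given CSS class."""

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.count = 0

    def handle_starttag(self, tag, attrs):
        if tag != "tr":
            return
        # The class attribute may be missing or empty, so fall back to "".
        classes = (dict(attrs).get("class") or "").split()
        if self.class_name in classes:
            self.count += 1


def count_rows(html, class_name):
    """Return how many <tr class="... class_name ..."> tags appear in html."""
    parser = RowCounter(class_name)
    parser.feed(html)
    parser.close()
    return parser.count


# Hypothetical markup: what the browser shows after JavaScript has run ...
rendered = '<table><tbody><tr class="is-selected__invalid"><td>A</td></tr></tbody></table>'
# ... versus what a plain HTTP request might receive before any script runs.
unrendered = '<div id="root"></div><script src="app.js"></script>'

print(count_rows(rendered, "is-selected__invalid"))    # 1
print(count_rows(unrendered, "is-selected__invalid"))  # 0
print(count_rows(rendered, "is-selected_invalid"))     # 0 (single underscore)
```

If the count for the real response body (for example `response.text` inside a spider) is zero, the rows are not in the HTML that Scrapy received, and no CSS selector will find them.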
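What you can do instead is reverse-engineer the API request the page makes: find it in the Network tab of your browser's developer tools, replicate it, and receive the whole table back as a JSON object.
For example: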

import scrapy
import json

# JSON payload for the API request, as seen in the browser's Network tab.
body = {
    "strPlayerId":"all",
    "strSplitArr":[],
    "strGroup":"season",
    "strPosition":"B",
    "strType":"1",
    "strStartDate":"2023-03-01",
    "strEndDate":"2023-11-01",
    "strSplitTeams":False,
    "dctFilters":[],
    "strStatType":"player",
    "strAutoPt":True,
    "arrPlayerId":[],
    "strSplitArrPitch":[],
    "arrWxTemperature":None,
    "arrWxPressure":None,
    "arrWxAirDensity":None,
    "arrWxElevation":None,
    "arrWxWindSpeed":None
}
headers = {
    "Content-Type":'application/json'
}

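class TableSpider(scrapy.Spider):
    name = "table"

    def start_requests(self):
        # POST the JSON payload to the API endpoint instead of requesting the HTML page.
        yield scrapy.Request(
            "https://www.fangraphs.com/api/leaders/splits/splits-leaders",
            method="POST",
            body=json.dumps(body),
            headers=headers,
        )

    def parse(self, response):
        # The response body is JSON; yield the decoded object as a single item.
        yield response.json()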

Partial output

2023-09-14 19:04:30 [scrapy.core.engine] DEBUG: Crawled (200) <POST https://www.fangraphs.com/api/leaders/splits/splits-leaders> (referer: https://www.fangraphs.com/leaders/splits-leaderboards?splitArr=&spl
itArrPitch=&position=B&autoPt=true&splitTeams=false&statType=player&statgroup=1&startDate=2023-03-01&endDate=2023-11-01&players=&filter=&groupBy=season&wxTemperature=&wxPressure=&wxAirDensity=&wxElevation=&
wxWindSpeed=&sort=22,1)
2023-09-14 19:04:31 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.fangraphs.com/api/leaders/splits/splits-leaders>
{'k': ['Season', 'playerName', 'TeamNameAbb', 'G', 'PA', 'AB', 'H', '1B', '2B', '3B', 'HR', 'R', 'RBI', 'BB', 'IBB', 'SO', 'HBP', 'SF', 'SH', 'GDP', 'SB', 'CS', 'AVG', 'playerid'], 'v': [['2023', 'Miguel Ca
brera', 'DET', '86', '320', '290', '73', '54', '16', '0', '3', '17', '29', '27', '0', '66', '1', '2', '0', '11', '0', '0', '0.25172414', '1744'], ['2023', 'David Peralta', 'LAD', '119', '378', '351', '94',
'66', '20', '1', '7', '42', '51', '19', '1', '65', '2', '6', '0', '7', '4', '1', '0.26780627', '2136'], ['2023', 'Carlos Santana', '2 Tms', '132', '559', '497', '116', '68', '29', '0', '19', '69', '73', '59
', '0', '98', '0', '3', '0', '6', '6', '0', '0.2334004', '2396'], ['2023', 'Tommy Pham', '2 Tms', '112', '420', '376', '100', '56', '26', '2', '16', '49', '62', '38', '1', '94', '2', '4', '0', '10', '20', '
3', '0.26595745', '2967'], ['2023', 'Anthony Rizzo', 'NYY', '99', '421', '373', '91', '65', '14', '0', '12', '45', '41', '35', '1', '97', '12', '1', '0', '10', '0', '3', '0.24396783', '3473'], ['2023', 'Mik
e Moustakas', '2 Tms', '107', '370', '337', '85', '58', '15', '0', '12', '43', '47', '23', '1', '90', '3', '7', '0', '7', '0', '0', '0.25222552', '4892'], ['2023', 'Jason Heyward', 'LAD', '108', '325', '283
', '76', '44', '18', '0', '14', '53', '38', '33', '1', '56', '3', '3', '0', '6', '2', '2', '0.26855124', '4940'], ['2023', 'Giancarlo Stanton', 'NYY', '91', '373', '333', '66', '32', '11', '0', '23', '42',
'57', '36', '3', '106', '2', '1', '0', '9', '0', '0', '0.1981982', '4949'], ['2023', 'Justin Turner', 'BOS', '132', '570', '505', '143', '92', '28', '0', '23', '84', '94', '49', '0', '96', '10', '6', '0', '
10', '4', '0', '0.28316832', '5235'], ['2023', 'Robbie Grossman', 'TEX', '102', '386', '328', '79', '45', '23', '1', '10', '56', '48', '48', '0', '90', '2', '8', '0', '7', '0', '0', '0.24085366', '5254'], [
'2023', 'Aaron Hicks', '2 Tms', '78', '267', '234', '61', '44', '8', '1', '8', '40', '34', '32', '1', '60', '0', '1', '0', '2', '3', '0', '0.26068376', '5297'], ........., 'pt': 260, 'auto': 'True', 'dev': 'BIS'}
2023-09-14 19:04:31 [scrapy.core.engine] INFO: Closing spider (finished)
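
In the output above, the column names arrive under 'k' and each player's values arrive as a list under 'v'. Since the question asks for player names, a minimal sketch of pairing the two follows; `rows_from_payload` is a hypothetical helper name, and `sample` is a trimmed copy of the output above (four of the columns, two of the rows).

```python
def rows_from_payload(payload):
    """Yield one dict per player by pairing column names ('k') with each row in 'v'."""
    keys = payload["k"]
    for values in payload["v"]:
        yield dict(zip(keys, values))


# Trimmed sample in the same shape as the API response shown above.
sample = {
    "k": ["Season", "playerName", "TeamNameAbb", "playerid"],
    "v": [
        ["2023", "Miguel Cabrera", "DET", "1744"],
        ["2023", "David Peralta", "LAD", "2136"],
    ],
}

names = [row["playerName"] for row in rows_from_payload(sample)]
print(names)  # ['Miguel Cabrera', 'David Peralta']
```

Inside the spider, the same helper could replace the single `yield response.json()` with `yield from rows_from_payload(response.json())`, so that each player becomes its own item.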
