I'm trying to use Scrapy to extract the player names from this table. I tried this:
rows = response.css('table tr.is-selected_invalid')
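That was only meant to get the rows, but even that doesn't work. I also tried using td.alight-left.fixed to pull the data straight from the table, but that didn't help either. I'd appreciate any help.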
ijxebb2r1#
Use this selector:
div.table-scroll table tbody tr.is-selected__invalid
Note: there are two underscores before "invalid".
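To illustrate why the single-underscore class in the question matches nothing, here is a minimal sketch using only the standard library's html.parser as a stand-in for Scrapy's selector engine. The SAMPLE_HTML markup, the RowCounter class, and the count_rows helper are all hypothetical and written for this illustration; the real page's markup may differ. The point is only that class names are matched as exact tokens.

```python
from html.parser import HTMLParser

# Hypothetical static markup for illustration; not copied from the real page.
SAMPLE_HTML = """
<div class="table-scroll">
  <table>
    <tbody>
      <tr class="is-selected__invalid"><td>Player A</td></tr>
      <tr class="other"><td>Player B</td></tr>
    </tbody>
  </table>
</div>
"""


class RowCounter(HTMLParser):
    """Counts <tr> elements whose class list contains an exact class name."""

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.count = 0

    def handle_starttag(self, tag, attrs):
        if tag != "tr":
            return
        # The class attribute is a space-separated list of tokens.
        classes = (dict(attrs).get("class") or "").split()
        if self.class_name in classes:
            self.count += 1


def count_rows(html, class_name):
    """Return how many <tr> elements carry exactly this class token."""
    parser = RowCounter(class_name)
    parser.feed(html)
    return parser.count


two_underscores = count_rows(SAMPLE_HTML, "is-selected__invalid")
one_underscore = count_rows(SAMPLE_HTML, "is-selected_invalid")
```

With this sample markup, the two-underscore name finds the row and the one-underscore name finds none. The same exact-match rule applies to the class part of a CSS selector passed to response.css.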
0vvn1miw2#
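The table you are trying to scrape is rendered with JavaScript, so it is not present in the HTML returned by a request to the URL you reference in your question. Because of that, no selector will give you access to the data through a request to that URL. What you can do instead is reproduce the API request the page itself makes, which you can see in the Network tab of your browser's dev tools, and receive the entire table back as a JSON object. For example: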
import json

import scrapy

# JSON payload copied from the API request shown in the browser's Network tab
body = {
    "strPlayerId": "all",
    "strSplitArr": [],
    "strGroup": "season",
    "strPosition": "B",
    "strType": "1",
    "strStartDate": "2023-03-01",
    "strEndDate": "2023-11-01",
    "strSplitTeams": False,
    "dctFilters": [],
    "strStatType": "player",
    "strAutoPt": True,
    "arrPlayerId": [],
    "strSplitArrPitch": [],
    "arrWxTemperature": None,
    "arrWxPressure": None,
    "arrWxAirDensity": None,
    "arrWxElevation": None,
    "arrWxWindSpeed": None,
}

headers = {"Content-Type": "application/json"}


class TableSpider(scrapy.Spider):
    name = "table"

    def start_requests(self):
        # POST the payload to the API endpoint instead of requesting the HTML page
        yield scrapy.Request(
            "https://www.fangraphs.com/api/leaders/splits/splits-leaders",
            method="POST",
            body=json.dumps(body),
            headers=headers,
        )

    def parse(self, response):
        # The response body is JSON; yield the decoded object as a single item
        yield response.json()
Partial output:
2023-09-14 19:04:30 [scrapy.core.engine] DEBUG: Crawled (200) <POST https://www.fangraphs.com/api/leaders/splits/splits-leaders> (referer: https://www.fangraphs.com/leaders/splits-leaderboards?splitArr=&splitArrPitch=&position=B&autoPt=true&splitTeams=false&statType=player&statgroup=1&startDate=2023-03-01&endDate=2023-11-01&players=&filter=&groupBy=season&wxTemperature=&wxPressure=&wxAirDensity=&wxElevation=&wxWindSpeed=&sort=22,1)
2023-09-14 19:04:31 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.fangraphs.com/api/leaders/splits/splits-leaders>
{'k': ['Season', 'playerName', 'TeamNameAbb', 'G', 'PA', 'AB', 'H', '1B', '2B', '3B', 'HR', 'R', 'RBI', 'BB', 'IBB', 'SO', 'HBP', 'SF', 'SH', 'GDP', 'SB', 'CS', 'AVG', 'playerid'],
 'v': [['2023', 'Miguel Cabrera', 'DET', '86', '320', '290', '73', '54', '16', '0', '3', '17', '29', '27', '0', '66', '1', '2', '0', '11', '0', '0', '0.25172414', '1744'],
       ['2023', 'David Peralta', 'LAD', '119', '378', '351', '94', '66', '20', '1', '7', '42', '51', '19', '1', '65', '2', '6', '0', '7', '4', '1', '0.26780627', '2136'],
       ['2023', 'Carlos Santana', '2 Tms', '132', '559', '497', '116', '68', '29', '0', '19', '69', '73', '59', '0', '98', '0', '3', '0', '6', '6', '0', '0.2334004', '2396'],
       ['2023', 'Tommy Pham', '2 Tms', '112', '420', '376', '100', '56', '26', '2', '16', '49', '62', '38', '1', '94', '2', '4', '0', '10', '20', '3', '0.26595745', '2967'],
       ['2023', 'Anthony Rizzo', 'NYY', '99', '421', '373', '91', '65', '14', '0', '12', '45', '41', '35', '1', '97', '12', '1', '0', '10', '0', '3', '0.24396783', '3473'],
       ['2023', 'Mike Moustakas', '2 Tms', '107', '370', '337', '85', '58', '15', '0', '12', '43', '47', '23', '1', '90', '3', '7', '0', '7', '0', '0', '0.25222552', '4892'],
       ['2023', 'Jason Heyward', 'LAD', '108', '325', '283', '76', '44', '18', '0', '14', '53', '38', '33', '1', '56', '3', '3', '0', '6', '2', '2', '0.26855124', '4940'],
       ['2023', 'Giancarlo Stanton', 'NYY', '91', '373', '333', '66', '32', '11', '0', '23', '42', '57', '36', '3', '106', '2', '1', '0', '9', '0', '0', '0.1981982', '4949'],
       ['2023', 'Justin Turner', 'BOS', '132', '570', '505', '143', '92', '28', '0', '23', '84', '94', '49', '0', '96', '10', '6', '0', '10', '4', '0', '0.28316832', '5235'],
       ['2023', 'Robbie Grossman', 'TEX', '102', '386', '328', '79', '45', '23', '1', '10', '56', '48', '48', '0', '90', '2', '8', '0', '7', '0', '0', '0.24085366', '5254'],
       ['2023', 'Aaron Hicks', '2 Tms', '78', '267', '234', '61', '44', '8', '1', '8', '40', '34', '32', '1', '60', '0', '1', '0', '2', '3', '0', '0.26068376', '5297'],
       .........,
 'pt': 260, 'auto': 'True', 'dev': 'BIS'}
2023-09-14 19:04:31 [scrapy.core.engine] INFO: Closing spider (finished)
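Since the original goal was the player names, here is a minimal sketch of how the JSON shown in the output could be unpacked. It assumes the payload keeps the shape seen above: a 'k' list of column names and a 'v' list of rows in the same column order. The helper names rows_to_dicts and player_names are made up for this illustration, and the sample dict is trimmed by hand to the first three columns of the first two rows.

```python
def rows_to_dicts(payload):
    """Pair each row in payload['v'] with the column names in payload['k']."""
    keys = payload["k"]
    return [dict(zip(keys, row)) for row in payload["v"]]


def player_names(payload):
    """Return the 'playerName' column, looked up by name rather than by position."""
    idx = payload["k"].index("playerName")
    return [row[idx] for row in payload["v"]]


# Hand-trimmed sample in the same shape as the API output (assumption:
# the real payload has more columns and rows, as in the log above).
sample = {
    "k": ["Season", "playerName", "TeamNameAbb"],
    "v": [
        ["2023", "Miguel Cabrera", "DET"],
        ["2023", "David Peralta", "LAD"],
    ],
}

names = player_names(sample)
records = rows_to_dicts(sample)
```

Inside the spider, parse could call player_names(response.json()) and yield one small item per name instead of yielding the whole payload. Looking the column up by name keeps the code working if the API reorders its columns.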