I'm trying to scrape transfermarkt.nl with Scrapy. The site was returning a 404 error, so I changed the settings to:
HTTPERROR_ALLOWED_CODES = [404]
USER_AGENT = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.51 Safari/537.36"
ROBOTSTXT_OBEY = False
Now when I run:
import scrapy

class TransferMarketScraper(scrapy.Spider):
    name = 'transfermarket'
    starts_urls = ['https://www.transfermarkt.nl/heracles-almelo/kader/verein/1304/saison_id/2022/plus/1']

    def parse(self, response):
        for player in response.css('div.grid-view table.items tbody tr').get():
            # player number
            try:
                player_number = int(
                    player.css('div.rn_nummer::text').get().strip()
                )
            except ValueError:
                player_number = 'NA'
            except AttributeError:
                continue
            yield {'player_number': player_number}
I get "Crawled 0 pages", even though the response does return values when I check it in scrapy shell. What could be wrong here?
2 answers
Answer 1 (xxls0lw8):
You aren't sending any request for the spider to parse; you need to add
Answer 2 (xoefb8l8):
You have a typo: you wrote starts_urls instead of start_urls. Edit:
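Another thing you may need to change is to remove the .get() call. When the selector matches, response.css('div.grid-view table.items tbody tr').get() returns only the first matching row serialized as a string, so the for loop walks over that string's characters; each player.css(...) call then raises AttributeError, which the except AttributeError: continue branch silently swallows, so nothing is ever yielded. Iterating over response.css(...) directly gives you one selector per row.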