I am trying to extract all the car links from this page: "https://www.euroncap.com/en/ratings-rewards/electric-vehicles/#?selectedMake=0&selectedMakeName=Select%20a%20make&selectedModel=0&selectedStar=&includeFullSafetyPackage=true&includeStandardSafetyPackage=true&selectedModelName=All&selectedProtocols=45155,41776&selectedClasses=1202,1199,1201,1196,1205,1203,1198,1179,40250,1197,1204,1180,34736,44997&allClasses=true&allProtocols=false&allDriverAssistanceTechnologies=false&selectedDriverAssistanceTechnologies=&thirdRowFitment=false". For example, I want the link for the "Volvo C40 Recharge". To extract it, I used Python Scrapy: response.css('div.rating-table-row-c.c9 a').xpath('@href').extract()
But the output I get is ['/en{{assessment.Url}}'], whereas the actual URL is "/en/results/volvo/c40-recharge/45878". How can I extract the real URL?
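This data is rendered with JavaScript, so you can't get it directly with Scrapy (unless you use scrapy-splash, scrapy-selenium, or similar). You can confirm this by disabling JavaScript in your browser and reloading the page: the links disappear.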
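A quick way to see what is going on: the value Scrapy returned still contains a client-side template placeholder (the {{...}} interpolation syntax used by AngularJS and similar frameworks), which only turns into a real URL after JavaScript runs in a browser. The small helper below is a hypothetical sketch, not part of Scrapy, that flags such unrendered hrefs so you can tell when a selector is reading raw template markup.

```python
import re

# Matches a client-side interpolation such as {{assessment.Url}}.
TEMPLATE_PLACEHOLDER = re.compile(r"\{\{.*?\}\}")


def unrendered_links(hrefs):
    """Return the hrefs that still contain a template placeholder,
    i.e. values that only become real URLs once JavaScript runs."""
    return [h for h in hrefs if TEMPLATE_PLACEHOLDER.search(h)]


# The value the question got back from response.css(...).xpath('@href')
print(unrendered_links(['/en{{assessment.Url}}']))  # → ['/en{{assessment.Url}}']
```

If this returns anything non-empty, the selector is fine but the data has to come from somewhere other than the static HTML.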
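If you open the "Network" tab in your browser's devtools, you can see that the page fetches its data from a JSON file, so you can get the data you want directly from that file.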
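Once you have the JSON body, turning it into links is plain Python. The sketch below rests on assumptions. The template {{assessment.Url}} suggests each car's path is stored in a field named "Url", and the page prefixes it with "/en". The rest of the JSON structure is not known here, so the function walks the whole document. The "Assessments" key and the sample payload are hypothetical, modelled on the template rather than on the real response. The actual JSON endpoint is whatever URL you find in the Network tab.

```python
import json
from urllib.parse import urljoin

SITE = "https://www.euroncap.com"


def collect_urls(node):
    """Recursively yield every string stored under a "Url" key.

    Assumption: the feed uses a "Url" field, as {{assessment.Url}} hints;
    the surrounding structure is unknown, so every dict and list is visited."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "Url" and isinstance(value, str):
                yield value
            else:
                yield from collect_urls(value)
    elif isinstance(node, list):
        for item in node:
            yield from collect_urls(item)


def car_links(json_text):
    """Parse the JSON text and build absolute car URLs."""
    data = json.loads(json_text)
    # The page renders href="/en{{assessment.Url}}", so prepend "/en".
    return [urljoin(SITE, "/en" + path) for path in collect_urls(data)]


# Hypothetical payload shaped after the template; not the real response.
sample = ('{"Assessments": [{"Name": "Volvo C40 Recharge", '
          '"Url": "/results/volvo/c40-recharge/45878"}]}')
print(car_links(sample))
# → ['https://www.euroncap.com/en/results/volvo/c40-recharge/45878']
```

In a Scrapy callback for the JSON request you would pass response.text to car_links and follow the resulting URLs as usual.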
Example using the Scrapy shell: