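Using Scrapy, I was unable to scrape book titles, authors, and links from the infinite-scrolling bookstore site www.aseeralkotb.com. Inspecting the page in devtools, I cannot find a link to the next page. Code: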
import scrapy


class booksSpider(scrapy.Spider):
    name = 'books'
    start_urls = [
        'https://www.aseeralkotb.com/categories/%D8%B3%D9%8A%D8%A7%D8%B3%D8%A9',
    ]

    def parse(self, response):
        for book in response.css('div.flex.flex-col.items-center'):
            yield {
                'title': book.css('a:not([itemprop="author"])::attr(title)').get(),
                'author': book.css('h5[itemprop=name]::text').get(),
                'detailslinks': book.css('a[title]::attr(href)').re(r'.*books.*')
            }
        for link in book:
            yield response.follow(link.get(), method='POST', callback=self.parse_links)
1 Answer
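The website sends Ajax requests, and the API responds with JSON that contains HTML. With Scrapy the request fails with response status 419, but with the requests module it works. Output: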
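Separately from the answer's own script and output, which are not reproduced here, the "JSON that contains HTML" response shape can be illustrated with a minimal offline sketch. It assumes the API body is a JSON object whose `html` key holds a fragment of book cards; that key name, the sample markup, and the `example.com` links are hypothetical, chosen only to mirror the selectors used in the question. The helper names `BookParser` and `extract_books` are likewise made up for this illustration.

```python
import json
from html.parser import HTMLParser

# Hypothetical payload: the real API's key names and markup are not shown in
# this thread, so "html" and the fragment below are assumptions.
SAMPLE_RESPONSE = json.dumps({
    "html": (
        '<div class="flex flex-col items-center">'
        '<a title="Book One" href="https://example.com/books/1">cover</a>'
        '<h5 itemprop="name">Author One</h5>'
        '</div>'
        '<div class="flex flex-col items-center">'
        '<a title="Book Two" href="https://example.com/books/2">cover</a>'
        '<h5 itemprop="name">Author Two</h5>'
        '</div>'
    )
})


class BookParser(HTMLParser):
    """Collect title, link, and author for each book card in an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.books = []
        self._in_author = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        href = attrs.get("href") or ""
        # Treat an <a> with a title and a "/books/" href as a book link.
        if tag == "a" and "title" in attrs and "/books/" in href:
            self.books.append(
                {"title": attrs["title"], "link": href, "author": None}
            )
        elif tag == "h5" and attrs.get("itemprop") == "name":
            self._in_author = True

    def handle_endtag(self, tag):
        if tag == "h5":
            self._in_author = False

    def handle_data(self, data):
        # Attach the author text to the most recent book link seen.
        if self._in_author and self.books and data.strip():
            self.books[-1]["author"] = data.strip()


def extract_books(raw_json):
    """Decode the JSON body, then parse the HTML fragment stored inside it."""
    fragment = json.loads(raw_json)["html"]
    parser = BookParser()
    parser.feed(fragment)
    parser.close()
    return parser.books


if __name__ == "__main__":
    for book in extract_books(SAMPLE_RESPONSE):
        print(book)
```

Against the live site, the string passed to `extract_books` would be the body returned by the Ajax call (for example `requests`' `response.text`), and the key lookup would need to match whatever the real API returns.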