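**I don't want to use the API to extract the data; I only want to learn this approach as a project.** The elements of the next page are not visible because the site uses infinite scrolling. I have scraped the first page, but I can't scrape further or build a loop that keeps extracting until the last page. The URL is https://www.futuretools.io/.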
```python
import scrapy
from scrapy_playwright.page import PageMethod


class ToolsSpider(scrapy.Spider):
    name = "tools"

    def start_requests(self):
        yield scrapy.Request(
            "https://www.futuretools.io",
            meta=dict(
                playwright=True,
                playwright_page_methods=[
                    # Wait until the first batch of tool cards has been rendered.
                    PageMethod("wait_for_selector", "div.jetboost-list-wrapper-n5zn > div.w-dyn-items div.tool"),
                ],
            ),
        )

    async def parse(self, response):
        # Only the cards present after the initial render are extracted here.
        for tool in response.css("div.jetboost-list-wrapper-n5zn > div.w-dyn-items div.tool"):
            yield {
                'title': tool.css('div.div-block-18 a.tool-item-link---new::text').get(),
                'description': tool.css('div.tool-item-description-box---new::text').get(),
                'total_votes': tool.css('div.list-upvote div.text-block-52::text').get(),
                'category': tool.css('div.collection-list-wrapper-9 div.text-block-53::text').get(),
            }
```
1 Answer
You can scroll until `div.tool.w-dyn-item.w-col.w-col-6:nth-child(number_of_items)` appears.

Edit: here I used the next_page button. You could take the scrolling approach instead, but this seems cleaner.
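For the scrolling alternative, a minimal sketch of the "scroll, then wait for the nth child" idea might look like the following. It only plans the page-method sequence and wraps it in `PageMethod` objects. The helper names `build_scroll_steps` and `to_page_methods`, and the `batch_size` and `batches` parameters, are hypothetical; how many cards the site loads per scroll is an assumption you would have to check in the browser.

```python
def build_scroll_steps(item_selector, batch_size, batches):
    """Return (method_name, argument) pairs: initial wait, then scroll + wait per batch.

    batch_size and batches are assumptions about the site, not known values.
    """
    steps = [("wait_for_selector", item_selector)]
    for batch in range(1, batches + 1):
        # Jump to the bottom so the page requests the next batch of cards.
        steps.append(("evaluate", "window.scrollBy(0, document.body.scrollHeight)"))
        # The first card of the new batch shows that the batch has been rendered.
        first_new = batch * batch_size + 1
        steps.append(("wait_for_selector", f"{item_selector}:nth-child({first_new})"))
    return steps


def to_page_methods(steps):
    """Wrap the planned pairs in scrapy-playwright PageMethod objects."""
    # Imported here so the planning helper works without scrapy-playwright installed.
    from scrapy_playwright.page import PageMethod

    return [PageMethod(name, arg) for name, arg in steps]
```

The result of `to_page_methods(build_scroll_steps(...))` would be passed as `playwright_page_methods` in the request's `meta`. Note that `:nth-child` counts siblings, so this only works if all cards share the same parent element.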
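If the total number of items is not known in advance, one possible way to scroll "until the end" is to keep scrolling until the item count stops growing. The sketch below assumes a Playwright async `Page` object; `scroll_until_stable`, `pause_ms`, and `max_rounds` are hypothetical names, and the fixed pause is a simplification that may need tuning for a slow connection.

```python
async def scroll_until_stable(page, item_selector, pause_ms=1000, max_rounds=100):
    """Scroll to the bottom until the number of matched items stops growing.

    `page` is expected to behave like a Playwright async Page.
    Returns the final number of matched items.
    """
    previous = await page.locator(item_selector).count()
    for _ in range(max_rounds):
        await page.evaluate("window.scrollBy(0, document.body.scrollHeight)")
        # Give the next batch some time to load (assumed to be long enough).
        await page.wait_for_timeout(pause_ms)
        current = await page.locator(item_selector).count()
        if current == previous:
            # Nothing new appeared, so assume the end of the list was reached.
            break
        previous = current
    return previous
```

With scrapy-playwright, the page can be obtained by setting `playwright_include_page=True` in the request's `meta` and reading `response.meta["playwright_page"]` inside `parse`. After scrolling, `scrapy.Selector(text=await page.content())` gives a selector over the fully loaded HTML, and the page should be closed with `await page.close()` when done.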