Basically, my goal is to scrape every product detail page, but I think my code is wrong and I don't know what other approach to use.
import scrapy


class AdamdentalSpider(scrapy.Spider):
    name = "adamdental"
    start_urls = ["https://www.adamdental.com.au/search?ProductSearch=%25"]

    def parse(self, response):
        products = response.css("div[data-role=product]")
        for product in products:
            title_item = products.css("span.widget-productlist-title a")[0]
            url = title_item.attrib['href']
            yield scrapy.Request(
                url=self.start_urls[0] + url,
                callback=self.parse_details
            )

    def parse_details(self, response):
        main = response.css("div.product-detail-right")
        yield {
            "title": main.css("h1.widget-product-title::text"),
            "sku": main.css("h4.subtitle::text"),
            "price": main.css("span.item-price"),
            "description": main.css("div.widget-product-field.info-group.widget-product-field-ProductDescription.description-gap"),
        }
1 answer
fslejnso #1
Combining one request with two responses and two yields is not the correct way to extract data with Scrapy.
Output:
... and so on
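Independently of how the callbacks are organised, one concrete problem in the question's code is the way the detail-page URL is built: `self.start_urls[0] + url` appends the product link to the full search URL, query string included. The following is only a minimal sketch of the difference, using the standard library's `urljoin`. The helper name `build_detail_url` and the value `/example-product` are hypothetical, chosen for illustration; the real `href` values on the site are not shown in the question.

```python
from urllib.parse import urljoin

# The search page from the question's start_urls.
SEARCH_URL = "https://www.adamdental.com.au/search?ProductSearch=%25"


def build_detail_url(page_url, href):
    """Resolve a link found on page_url into an absolute URL.

    Hypothetical helper for illustration; it only wraps urljoin.
    """
    return urljoin(page_url, href)


# Hypothetical href, standing in for title_item.attrib['href'].
example_href = "/example-product"

# What the question's code does: plain string concatenation.
# The result still contains the search path and its query string.
naive = SEARCH_URL + example_href

# Resolving the link against the page URL replaces the path and query.
joined = build_detail_url(SEARCH_URL, example_href)

print(naive)
print(joined)
```

Inside a Scrapy callback the same resolution is available as `response.urljoin(href)`, and `response.follow(href, callback=...)` builds the request from a relative link directly. Two other things in the question's code are probably worth checking as well: the loop calls `products.css(...)` rather than `product.css(...)`, so every iteration picks up the same first link, and the selectors in `parse_details` are yielded as selector objects; calling `.get()` on them returns the first match as a string.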