Scrapy: when the next-page link yields no results

Posted by xam8gpfp on 2023-10-20 in Other

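I'm trying to scrape the products on https://www.salewa.com/de-de/herren with the code below. The problem is that when next_page reaches /de-de/herren?p=4, it yields no items. In the browser the listing uses infinite scroll, which keeps loading pages up to p=9. As a result my code yields only 108 items instead of 295. At first I thought the problem was an empty page, so I tried to skip it with if len(products) > 0:, but now the crawl stops at page 3 and no further products are scraped.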

from scrapy import Spider


class Salewa_Spider(Spider):
    name = "salewa"
    allowed_domains = ["salewa.com"]
    start_urls = ["https://www.salewa.com/de-de/herren"]

    def parse(self, response):
        products = response.css("div.product--info")
        for product in products:
            yield {
                # get(default="") avoids calling .strip() on None
                "name": product.css("h2.product--title::text").get(default="").strip(),
                "price": product.css("span.price--default::text").get(default="").strip(),
                "url": product.css("a.product--information-box::attr(href)").get(),
            }
        # Only look for a "next" link if this page actually contained products
        if products:
            # .get() returns None instead of raising when the link is missing
            next_page = response.css('a[class^="listing-page--nav page--next"]::attr(href)').get()
            if next_page:
                # response.follow resolves relative hrefs against the current URL
                yield response.follow(next_page, callback=self.parse)
Answer by p8h8hvxi:

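This happens because the infinite scroll fetches the product data from a different URL and uses it to populate the listing.
You can find the URL of this intermediate request in the Network tab of your browser's developer tools. You need to find out what that URL is and replicate it in your Scrapy requests in order to get the remaining items that the infinite scroll loads.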
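One way to work out which query parameter acts as the page counter is to copy two consecutive scroll requests from the Network tab and compare their query strings. A minimal sketch using only the standard library; the helper name diff_query_params is hypothetical, and the two example URLs are the part=2 and part=3 requests that appear in the log output further down.

```python
from urllib.parse import urlsplit, parse_qs


def diff_query_params(url_a: str, url_b: str) -> dict:
    """Return {param: (values_in_a, values_in_b)} for query params that differ.

    Hypothetical helper: compare two XHR URLs captured while scrolling to
    spot the parameter that changes from one page to the next.
    """
    qa = parse_qs(urlsplit(url_a).query)
    qb = parse_qs(urlsplit(url_b).query)
    changed = {}
    for key in sorted(set(qa) | set(qb)):
        va, vb = qa.get(key), qb.get(key)
        if va != vb:
            changed[key] = (va, vb)
    return changed


base = "https://www.salewa.com/de-de/widgets/listing/listingCount/sCategory/316582"
first = base + "?p=1&c=316582&part=2&o=1&n=36&loadProducts=1"
second = base + "?p=1&c=316582&part=3&o=1&n=36&loadProducts=1"
print(diff_query_params(first, second))  # {'part': (['2'], ['3'])}
```

Only part changes between the two captured requests, which suggests it is the page number the spider needs to vary.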
For this site, the specific API URL is "https://www.salewa.com/de-de/widgets/listing/listingCount/sCategory/316582?p=1&c=316582&part={page number}&o=1&n=36&loadProducts=1", which returns a JSON object containing all of the HTML elements for that page.
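Building one URL per page from that template can be done with urllib.parse.urlencode, which keeps the parameters in insertion order. A minimal sketch, assuming all parameters except part stay fixed as in the URL above; build_listing_url is a hypothetical name. n=36 looks like the page size (the question's 108 items from 3 pages is consistent with 36 per page), but that is an inference, not something the site documents.

```python
from urllib.parse import urlencode

# Endpoint and fixed parameters copied from the answer; only "part" varies.
LISTING_ENDPOINT = "https://www.salewa.com/de-de/widgets/listing/listingCount/sCategory/316582"


def build_listing_url(part: int) -> str:
    """Hypothetical helper: URL for one infinite-scroll page."""
    params = {"p": 1, "c": 316582, "part": part, "o": 1, "n": 36, "loadProducts": 1}
    return LISTING_ENDPOINT + "?" + urlencode(params)


# Pages 2-9; page 1 is the regular HTML listing
urls = [build_listing_url(part) for part in range(2, 10)]
print(urls[0])
```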
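What you can do is send a separate request for each page, extract the HTML from the JSON object, turn it into a Scrapy Selector, and then parse the information the same way as on the first page. Using this strategy I was able to get 296 unique results.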
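The JSON-to-HTML step can be tried offline before wiring it into a spider. A minimal sketch, assuming the payload is a JSON object with the product HTML under a "listing" key (the key the answer's spider reads). To keep it runnable without Scrapy installed, it swaps in the standard library's html.parser for Scrapy's Selector and only collects product titles; listing_html, TitleCollector and the sample payload are all hypothetical.

```python
import json
from html.parser import HTMLParser


def listing_html(body: str) -> str:
    """Return the HTML fragment stored under "listing", or "" if absent."""
    try:
        data = json.loads(body)
    except ValueError:  # not JSON, e.g. the regular HTML first page
        return ""
    return data.get("listing", "") if isinstance(data, dict) else ""


class TitleCollector(HTMLParser):
    """Collects the text of <h2> elements whose class includes product--title."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        # attribute values can be None for valueless attributes
        classes = (dict(attrs).get("class") or "").split()
        if tag == "h2" and "product--title" in classes:
            self._in_title = True
            self.titles.append("")

    def handle_endtag(self, tag):
        if tag == "h2":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.titles[-1] += data


# Made-up payload shaped like what the answer's spider expects
sample = json.dumps({
    "listing": '<div class="product--info"><h2 class="product--title"> Rainbow Gürtel </h2></div>'
})
parser = TitleCollector()
parser.feed(listing_html(sample))
print([t.strip() for t in parser.titles])  # ['Rainbow Gürtel']
```

In the real spider, Selector(text=...) replaces the hand-written parser, so the same CSS selectors as on the first page keep working.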
Here is an example spider:

from scrapy import Spider, Request
from scrapy.selector import Selector

# Endpoint called by the infinite scroll; "part" is the page number
LISTING_URL = (
    "https://www.salewa.com/de-de/widgets/listing/listingCount/sCategory/316582"
    "?p=1&c=316582&part={}&o=1&n=36&loadProducts=1"
)


class Salewa_Spider(Spider):
    name = "salewa"
    allowed_domains = ["salewa.com"]

    def start_requests(self):
        # Page 1 is ordinary HTML
        yield Request("https://www.salewa.com/de-de/herren", callback=self.parse)
        # Pages 2-9 come from the JSON endpoint used by the infinite scroll
        for i in range(2, 10):
            yield Request(LISTING_URL.format(i), callback=self.parse_listing)

    def parse_listing(self, response):
        # The JSON response carries the product HTML under the "listing" key;
        # wrap it in a Selector so it can be parsed exactly like the first page
        html = Selector(text=response.json()["listing"])
        yield from self.parse(html)

    def parse(self, response):
        # Accepts either a response or a Selector: both support .css()
        for product in response.css("div.product--info"):
            yield {
                "name": product.css("h2.product--title::text").get(default="").strip(),
                "price": product.css("span.price--default::text").get(default="").strip(),
                "url": product.css("a.product--information-box::attr(href)").get(),
            }
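The answer reports 296 unique results. To check such a count yourself, one option is to deduplicate the scraped items by product URL. In Scrapy this is usually an item pipeline that raises DropItem for items it has already seen; the sketch below is a plain function so it runs without Scrapy. It assumes the query string (c=316582&listing=1) does not identify the product, so only the URL path is used as the key; unique_by_url is a hypothetical name.

```python
from urllib.parse import urlsplit


def unique_by_url(items):
    """Yield items whose URL path has not been seen before (query string ignored)."""
    seen = set()
    for item in items:
        url = item.get("url")
        if not url:
            yield item  # nothing to deduplicate on; keep it
            continue
        key = urlsplit(url).path
        if key in seen:
            continue
        seen.add(key)
        yield item


items = [
    {"name": "Rainbow Gürtel", "url": "https://www.salewa.com/de-de/rainbow-guertel-00-0000024812?c=316582&listing=1"},
    {"name": "Rainbow Gürtel", "url": "https://www.salewa.com/de-de/rainbow-guertel-00-0000024812?c=316582&listing=1"},
    {"name": "Approach Gamaschen", "url": "https://www.salewa.com/de-de/approach-gamaschen-00-0000002115?c=316582&listing=1"},
]
unique = list(unique_by_url(items))
print(len(unique))  # 2
```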

Output:

2023-08-31 00:24:37 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.salewa.com/de-de/widgets/listing/listingCount/sCategory/316582?p=1&c=316582&part=2&o=1&n=36&loadProducts=1>
{'name': 'PEDROC MERINO KURZE SOCKEN HERREN', 'price': '22,00\xa0€', 'url': 'https://www.salewa.com/de-de/pedroc-merino-kurze-socken-herren-00-0000069055?c=316582&listing=1'}
2023-08-31 00:24:37 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.salewa.com/de-de/widgets/listing/listingCount/sCategory/316582?p=1&c=316582&part=2&o=1&n=36&loadProducts=1>
{'name': 'Lavaredo Hemp Ripstop Hose Herren', 'price': '104,00\xa0€', 'url': 'https://www.salewa.com/de-de/lavaredo-hemp-ripstop-hose-herren--00-0000028550?c=316582&listing=1'}
2023-08-31 00:24:37 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.salewa.com/de-de/widgets/listing/listingCount/sCategory/316582?p=1&c=316582&part=2&o=1&n=36&loadProducts=1>
{'name': "Pedroc Dry'Ton Mesh T-Shirt Herren", 'price': '60,00\xa0€', 'url': 'https://www.salewa.com/de-de/pedroc-dryton-mesh-t-shirt-herren-00-0000028584?c=316582&listing=1'}
2023-08-31 00:24:37 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.salewa.com/de-de/widgets/listing/listingCount/sCategory/316582?p=1&c=316582&part=2&o=1&n=36&loadProducts=1>
{'name': 'Lavaredo Hemp Pullover Herren', 'price': '100,00\xa0€', 'url': 'https://www.salewa.com/de-de/lavaredo-hemp-pullover-herren--00-0000028547?c=316582&listing=1'}
2023-08-31 00:24:37 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.salewa.com/de-de/widgets/listing/listingCount/sCategory/316582?p=1&c=316582&part=2&o=1&n=36&loadProducts=1>
{'name': 'Fanes 3 Layers Powertex Hemp 2 in 1 Parka Herren', 'price': '700,00\xa0€', 'url': 'https://www.salewa.com/de-de/fanes-3-layers-powertex-hemp-2-in-1-parka-herren-00-0000028666?c=316582&listing=1'}
2023-08-31 00:24:37 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.salewa.com/de-de/widgets/listing/listingCount/sCategory/316582?p=1&c=316582&part=2&o=1&n=36&loadProducts=1>
{'name': 'Alp Trainer 2 Schuh Herren', 'price': '170,00\xa0€', 'url': 'https://www.salewa.com/de-de/alp-trainer-2-schuh-herren-00-0000061402?c=316582&listing=1'}
2023-08-31 00:24:37 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.salewa.com/de-de/widgets/listing/listingCount/sCategory/316582?p=1&c=316582&part=2&o=1&n=36&loadProducts=1>
{'name': 'Fanes Engineered Merino Logo Pullover Herren', 'price': '112,00\xa0€', 'url': 'https://www.salewa.com/de-de/fanes-engineered-merino-logo-pullover-herren-00-0000028355?c=316582&listing=1'}
2023-08-31 00:24:37 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.salewa.com/de-de/widgets/listing/listingCount/sCategory/316582?p=1&c=316582&part=2&o=1&n=36&loadProducts=1>
{'name': 'Ortles RDS Hybrid Daunenjacke Herren', 'price': '340,00\xa0€', 'url': 'https://www.salewa.com/de-de/ortles-rds-hybrid-daunenjacke-herren-00-0000028458?c=316582&listing=1'}
2023-08-31 00:24:37 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.salewa.com/de-de/widgets/listing/listingCount/sCategory/316582?p=1&c=316582&part=3&o=1&n=36&loadProducts=1>
{'name': 'Wildfire 2 Gore-Tex® Schuh Herren', 'price': '190,00\xa0€', 'url': 'https://www.salewa.com/de-de/wildfire-2-gore-tex-schuh-herren-00-0000061414?c=316582&listing=1'}
2023-08-31 00:24:37 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.salewa.com/de-de/widgets/listing/listingCount/sCategory/316582?p=1&c=316582&part=3&o=1&n=36&loadProducts=1>
{'name': 'Puez Dolomitic 2 Durastretch Regular Hose Herren', 'price': '100,00\xa0€', 'url': 'https://www.salewa.com/de-de/puez-dolomitic-2-durastretch-regular-hose-herren-00-0000028484?c=316582&listing=1'}
2023-08-31 00:24:37 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.salewa.com/de-de/widgets/listing/listingCount/sCategory/316582?p=1&c=316582&part=3&o=1&n=36&loadProducts=1>
{'name': 'Puez Polarlite Fleece Herren', 'price': '100,00\xa0€', 'url': 'https://www.salewa.com/de-de/puez-polarlite-fleece-herren-00-0000028478?c=316582&listing=1'}
2023-08-31 00:24:37 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.salewa.com/de-de/widgets/listing/listingCount/sCategory/316582?p=1&c=316582&part=3&o=1&n=36&loadProducts=1>
{'name': 'Puez Dolomitic 2 Durastretch Kurze Hose Herren', 'price': '100,00\xa0€', 'url': 'https://www.salewa.com/de-de/puez-dolomitic-2-durastretch-kurze-hose-herren-00-0000028486?c=316582&listing=1'}
2023-08-31 00:24:37 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.salewa.com/de-de/widgets/listing/listingCount/sCategory/316582?p=1&c=316582&part=3&o=1&n=36&loadProducts=1>
{'name': 'Puez Dolomitic 2 Durastretch Lange Hose Herren', 'price': '100,00\xa0€', 'url': 'https://www.salewa.com/de-de/puez-dolomitic-2-durastretch-lange-hose-herren-00-0000028485?c=316582&listing=1'}
2023-08-31 00:24:37 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.salewa.com/de-de/widgets/listing/listingCount/sCategory/316582?p=1&c=316582&part=3&o=1&n=36&loadProducts=1>
{'name': 'Puez Polarlite Half Zip Fleece Herren', 'price': '80,00\xa0€', 'url': 'https://www.salewa.com/de-de/puez-polarlite-half-zip-fleece-herren-00-0000028481?c=316582&listing=1'}
2023-08-31 00:24:37 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.salewa.com/de-de/widgets/listing/listingCount/sCategory/316582?p=1&c=316582&part=3&o=1&n=36&loadProducts=1>
{'name': 'Tognazza Polarlite Herren Jacke', 'price': '84,00\xa0€', 'url': 'https://www.salewa.com/de-de/tognazza-polarlite-herren-jacke-00-0000027918?c=316582&listing=1'}
2023-08-31 00:24:37 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.salewa.com/de-de/widgets/listing/listingCount/sCategory/316582?p=1&c=316582&part=3&o=1&n=36&loadProducts=1>
{'name': 'Woolen 2 Layers Kapuzenjacke Herren', 'price': '220,00\xa0€', 'url': 'https://www.salewa.com/de-de/woolen-2-layers-kapuzenjacke-herren-00-0000027331?c=316582&listing=1'}
2023-08-31 00:24:37 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.salewa.com/de-de/widgets/listing/listingCount/sCategory/316582?p=1&c=316582&part=3&o=1&n=36&loadProducts=1>
{'name': 'Fanes Sarner Down Hybrid Weste Herren', 'price': '250,00\xa0€', 'url': 'https://www.salewa.com/de-de/fanes-sarner-down-hybrid-weste-herren-00-0000028017?c=316582&listing=1'}
2023-08-31 00:24:37 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.salewa.com/de-de/widgets/listing/listingCount/sCategory/316582?p=1&c=316582&part=3&o=1&n=36&loadProducts=1>
{'name': 'Lagazuoi 3 Daunen Herren Jacke', 'price': '220,00\xa0€', 'url': 'https://www.salewa.com/de-de/lagazuoi-3-daunen-herren-jacke-00-0000026705?c=316582&listing=1'}
2023-08-31 00:24:37 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.salewa.com/de-de/widgets/listing/listingCount/sCategory/316582?p=1&c=316582&part=3&o=1&n=36&loadProducts=1>
{'name': 'Sarner Wolle Hoody Herren', 'price': '270,00\xa0€', 'url': 'https://www.salewa.com/de-de/sarner-wolle-hoody-herren--00-0000026162?c=316582&listing=1'}
2023-08-31 00:24:37 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.salewa.com/de-de/widgets/listing/listingCount/sCategory/316582?p=1&c=316582&part=3&o=1&n=36&loadProducts=1>
{'name': 'Solidlogo Dri-Release® T-Shirt Herren', 'price': '32,00\xa0€', 'url': 'https://www.salewa.com/de-de/solidlogo-dri-release-t-shirt-herren-00-0000027018?c=316582&listing=1'}
2023-08-31 00:24:37 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.salewa.com/de-de/widgets/listing/listingCount/sCategory/316582?p=1&c=316582&part=3&o=1&n=36&loadProducts=1>
{'name': 'Paganella Polarlite Herren Jacke', 'price': '63,00\xa0€', 'url': 'https://www.salewa.com/de-de/paganella-polarlite-herren-jacke-00-0000027924?c=316582&listing=1'}
2023-08-31 00:24:37 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.salewa.com/de-de/widgets/listing/listingCount/sCategory/316582?p=1&c=316582&part=3&o=1&n=36&loadProducts=1>
{'name': 'Wildfire Leather Gore-Tex® Schuh Herren', 'price': '180,00\xa0€', 'url': 'https://www.salewa.com/de-de/wildfire-leather-gore-tex-schuh-herren-00-0000061416?c=316582&listing=1'}
2023-08-31 00:24:37 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.salewa.com/de-de/widgets/listing/listingCount/sCategory/316582?p=1&c=316582&part=3&o=1&n=36&loadProducts=1>
{'name': 'Brenta RDS Daunenjacke Herren', 'price': '192,00\xa0€', 'url': 'https://www.salewa.com/de-de/brenta-rds-daunenjacke-herren-00-0000027883?c=316582&listing=1'}
2023-08-31 00:24:37 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.salewa.com/de-de/widgets/listing/listingCount/sCategory/316582?p=1&c=316582&part=3&o=1&n=36&loadProducts=1>
{'name': 'Zebru Responsive Langarm Herren T-Shirt', 'price': '90,00\xa0€', 'url': 'https://www.salewa.com/de-de/zebru-responsive-langarm-herren-t-shirt-00-0000027957?c=316582&listing=1'}
2023-08-31 00:24:37 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.salewa.com/de-de/widgets/listing/listingCount/sCategory/316582?p=1&c=316582&part=3&o=1&n=36&loadProducts=1>
{'name': 'Solidlogo Dry’Ton Langarm Shirt Herren', 'price': '50,00\xa0€', 'url': 'https://www.salewa.com/de-de/solidlogo-dryton-langarm-shirt-herren-00-0000027340?c=316582&listing=1'}
2023-08-31 00:24:37 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.salewa.com/de-de/widgets/listing/listingCount/sCategory/316582?p=1&c=316582&part=3&o=1&n=36&loadProducts=1>
{'name': 'Pedroc 3 Durastretch Hose Herren', 'price': '70,00\xa0€', 'url': 'https://www.salewa.com/de-de/pedroc-3-durastretch-hose-herren--00-0000026955?c=316582&listing=1'}
2023-08-31 00:24:37 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.salewa.com/de-de/widgets/listing/listingCount/sCategory/316582?p=1&c=316582&part=3&o=1&n=36&loadProducts=1>
{'name': 'Pure Merino Wollstirnband', 'price': '30,00\xa0€', 'url': 'https://www.salewa.com/de-de/pure-merino-wollstirnband-00-0000028769?c=316582&listing=1'}
2023-08-31 00:24:37 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.salewa.com/de-de/widgets/listing/listingCount/sCategory/316582?p=1&c=316582&part=9&o=1&n=36&loadProducts=1>
{'name': 'Sesvenna Gore® Windstopper® Grip Handschuhe', 'price': '70,00\xa0€', 'url': 'https://www.salewa.com/de-de/sesvenna-gore-windstopper-grip-handschuhe-00-0000026577?c=316582&listing=1'}
2023-08-31 00:24:37 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.salewa.com/de-de/widgets/listing/listingCount/sCategory/316582?p=1&c=316582&part=9&o=1&n=36&loadProducts=1>
{'name': 'Rainbow Gürtel', 'price': '21,00\xa0€', 'url': 'https://www.salewa.com/de-de/rainbow-guertel-00-0000024812?c=316582&listing=1'}
2023-08-31 00:24:37 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.salewa.com/de-de/widgets/listing/listingCount/sCategory/316582?p=1&c=316582&part=9&o=1&n=36&loadProducts=1>
{'name': 'Hiking Gamaschen Größe M', 'price': '45,00\xa0€', 'url': 'https://www.salewa.com/de-de/hiking-gamaschen-groee-m-00-0000002117?c=316582&listing=1'}
2023-08-31 00:24:37 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.salewa.com/de-de/widgets/listing/listingCount/sCategory/316582?p=1&c=316582&part=9&o=1&n=36&loadProducts=1>
{'name': 'Hiking Gamaschen Größe L', 'price': '45,00\xa0€', 'url': 'https://www.salewa.com/de-de/hiking-gamaschen-groee-l-00-0000002116?c=316582&listing=1'}
2023-08-31 00:24:37 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.salewa.com/de-de/widgets/listing/listingCount/sCategory/316582?p=1&c=316582&part=9&o=1&n=36&loadProducts=1>
{'name': 'Approach Gamaschen', 'price': '55,00\xa0€', 'url': 'https://www.salewa.com/de-de/approach-gamaschen-00-0000002115?c=316582&listing=1'}
2023-08-31 00:24:37 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.salewa.com/de-de/widgets/listing/listingCount/sCategory/316582?p=1&c=316582&part=9&o=1&n=36&loadProducts=1>
{'name': 'Trekking Gamaschen', 'price': '65,00\xa0€', 'url': 'https://www.salewa.com/de-de/trekking-gamaschen-00-0000002114?c=316582&listing=1'}
2023-08-31 00:24:37 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.salewa.com/de-de/widgets/listing/listingCount/sCategory/316582?p=1&c=316582&part=9&o=1&n=36&loadProducts=1>
{'name': 'Fanes Regenhut mit Krempe', 'price': '40,00\xa0€', 'url': 'https://www.salewa.com/de-de/fanes-regenhut-mit-krempe-00-0000027464?c=316582&listing=1'}
