Scrapy: extracting <li> items from a <ul>

Posted by zd287kbt on 2022-11-09 in Other

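I'm new to Scrapy and I'm having trouble building an accurate selector, starting from the code in the Scrapy tutorial. I'm trying to list all the businesses together with their addresses and their websites, but when I run the spider only one result comes out. If I switch every field to getall(), I do get all the values, but they are dumped together with no grouping. I need them grouped per business (name, address, and link together).
Here is the code I'm using: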

import scrapy


class RynekMainSpider(scrapy.Spider):
    name = "RynekMain"
    start_urls = [
        'https://rynekpierwotny.pl/deweloperzy/?page=1',
    ]

    def parse(self, response):
        for quote in response.css('ul.rp-1qtpzi4'):
            yield {
                'address': quote.css('address.rp-o9b83y::text').get(),
                'name': quote.css('h2.rp-69f2r4::text').get(),
                'link': quote.css('li.rp-np9kb1 a::attr(href)').get(),
            }

Thanks in advance.
chhkpiq4 #1

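You get only one result because the element selector (locator strategy) ul.rp-1qtpzi4 is incorrect: it does not select every list item on the page. A correct selector such as
.rp-y89gny.eboilu01 ul li selects all 24 items.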

import scrapy
from scrapy.crawler import CrawlerProcess

class RynekMainSpider(scrapy.Spider):
    name = "RynekMain"
    start_urls = [
        'https://rynekpierwotny.pl/deweloperzy/?page=1']

    def parse(self, response):
        for quote in response.css('.rp-y89gny.eboilu01 ul li'):
            yield {
                'address': quote.css('address.rp-o9b83y::text').get(),
                'name': quote.css('h2.rp-69f2r4::text').get(),
                'link': quote.css('li.rp-np9kb1 a::attr(href)').get(),
            }

if __name__ == "__main__":
    process = CrawlerProcess()
    process.crawl(RynekMainSpider)
    process.start()

Output:

{'address': 'mazowieckie, Warszawa', 'name': 'Dom Development S.A.', 'link': 'https://rynekpierwotny.pl/deweloperzy/dom-development-sa-955/'}
2022-06-24 02:16:21 [scrapy.core.scraper] DEBUG: Scraped from <200 https://rynekpierwotny.pl/deweloperzy/>
{'address': 'mazowieckie, Warszawa', 'name': 'Ronson Development Sp. z o.o.', 'link': 'https://rynekpierwotny.pl/deweloperzy/ronson-development-sp-z-oo-863/'}
2022-06-24 02:16:21 [scrapy.core.scraper] DEBUG: Scraped from <200 https://rynekpierwotny.pl/deweloperzy/>
{'address': 'mazowieckie, Warszawa', 'name': 'Echo Investment S.A.', 'link': 'https://rynekpierwotny.pl/deweloperzy/echo-investment-sa-7478/'}
2022-06-24 02:16:21 [scrapy.core.scraper] DEBUG: Scraped from <200 https://rynekpierwotny.pl/deweloperzy/>
{'address': 'dolnośląskie, Wrocław, Psie Pole', 'name': 'INTER-ES Deweloper', 'link': 'https://rynekpierwotny.pl/deweloperzy/inter-es-deweloper-928/'}
2022-06-24 02:16:21 [scrapy.core.scraper] DEBUG: Scraped from <200 https://rynekpierwotny.pl/deweloperzy/>
{'address': 'śląskie, Bielsko-Biała', 'name': 'Murapol S.A.', 'link': 'https://rynekpierwotny.pl/deweloperzy/murapol-sa-884/'}
2022-06-24 02:16:21 [scrapy.core.scraper] DEBUG: Scraped from <200 https://rynekpierwotny.pl/deweloperzy/>
{'address': 'mazowieckie, Warszawa', 'name': 'Robyg S.A.', 'link': 'https://rynekpierwotny.pl/deweloperzy/robyg-sa-888/'}
2022-06-24 02:16:21 [scrapy.core.scraper] DEBUG: Scraped from <200 https://rynekpierwotny.pl/deweloperzy/>
{'address': 'śląskie, cieszyński, Cieszyn', 'name': 'ATAL S.A.', 'link': 'https://rynekpierwotny.pl/deweloperzy/atal-sa-1084/'}
2022-06-24 02:16:21 [scrapy.core.scraper] DEBUG: Scraped from <200 https://rynekpierwotny.pl/deweloperzy/>
{'address': 'zachodniopomorskie, Szczecin', 'name': 'Assethome – Przedstawiciel Dewelopera', 'link': 'https://rynekpierwotny.pl/deweloperzy/asset-home-przedstawiciel-dewelopera-7429/'}    
2022-06-24 02:16:21 [scrapy.core.scraper] DEBUG: Scraped from <200 https://rynekpierwotny.pl/deweloperzy/>
{'address': 'mazowieckie, Warszawa', 'name': 'Hreit', 'link': 'https://rynekpierwotny.pl/deweloperzy/hreit-7892/'}
2022-06-24 02:16:21 [scrapy.core.scraper] DEBUG: Scraped from <200 https://rynekpierwotny.pl/deweloperzy/>
{'address': 'dolnośląskie, Wrocław', 'name': 'Develia', 'link': 'https://rynekpierwotny.pl/deweloperzy/develia-1048/'}
2022-06-24 02:16:21 [scrapy.core.scraper] DEBUG: Scraped from <200 https://rynekpierwotny.pl/deweloperzy/>
{'address': 'dolnośląskie, Wrocław, Fabryczna', 'name': 'PROFIT Development', 'link': 'https://rynekpierwotny.pl/deweloperzy/profit-development-940/'}
2022-06-24 02:16:21 [scrapy.core.scraper] DEBUG: Scraped from <200 https://rynekpierwotny.pl/deweloperzy/>
{'address': 'mazowieckie, Warszawa', 'name': 'Novisa Development Sp. z o.o. Sp. J.', 'link': 'https://rynekpierwotny.pl/deweloperzy/novisa-development-sp-z-oo-sp-j-484/'}
2022-06-24 02:16:21 [scrapy.core.scraper] DEBUG: Scraped from <200 https://rynekpierwotny.pl/deweloperzy/>
{'address': 'pomorskie, Gdańsk', 'name': 'Robyg', 'link': 'https://rynekpierwotny.pl/deweloperzy/robyg-grupa-deweloperska-4251/'}
2022-06-24 02:16:21 [scrapy.core.scraper] DEBUG: Scraped from <200 https://rynekpierwotny.pl/deweloperzy/>
{'address': 'mazowieckie, Warszawa', 'name': 'Arche S.A.', 'link': 'https://rynekpierwotny.pl/deweloperzy/arche-sp-z-oo-934/'}
2022-06-24 02:16:21 [scrapy.core.scraper] DEBUG: Scraped from <200 https://rynekpierwotny.pl/deweloperzy/>
{'address': 'warmińsko-mazurskie, ełcki, Ełk', 'name': 'Rutkowski Development Sp. J.', 'link': 'https://rynekpierwotny.pl/deweloperzy/rutkowski-development-sp-j-1846/'}
2022-06-24 02:16:21 [scrapy.core.scraper] DEBUG: Scraped from <200 https://rynekpierwotny.pl/deweloperzy/>
{'address': 'mazowieckie, Warszawa', 'name': 'Cordia Polska', 'link': 'https://rynekpierwotny.pl/deweloperzy/cordia-polska-3824/'}
2022-06-24 02:16:21 [scrapy.core.scraper] DEBUG: Scraped from <200 https://rynekpierwotny.pl/deweloperzy/>
{'address': 'mazowieckie, Warszawa', 'name': 'Budlex Sp. z o.o.', 'link': 'https://rynekpierwotny.pl/deweloperzy/budlex-sp-z-oo-1684/'}
2022-06-24 02:16:21 [scrapy.core.scraper] DEBUG: Scraped from <200 https://rynekpierwotny.pl/deweloperzy/>
{'address': 'pomorskie, Gdańsk', 'name': 'Euro Styl S.A.', 'link': 'https://rynekpierwotny.pl/deweloperzy/euro-styl-sa-964/'}
2022-06-24 02:16:21 [scrapy.core.scraper] DEBUG: Scraped from <200 https://rynekpierwotny.pl/deweloperzy/>
{'address': 'łódzkie, Skierniewice', 'name': 'JHM DEVELOPMENT S.A.', 'link': 'https://rynekpierwotny.pl/deweloperzy/jhm-development-sa-892/'}
2022-06-24 02:16:21 [scrapy.core.scraper] DEBUG: Scraped from <200 https://rynekpierwotny.pl/deweloperzy/>
{'address': 'dolnośląskie, Wrocław', 'name': 'Lokum Deweloper', 'link': 'https://rynekpierwotny.pl/deweloperzy/lokum-deweloper-948/'}
2022-06-24 02:16:21 [scrapy.core.scraper] DEBUG: Scraped from <200 https://rynekpierwotny.pl/deweloperzy/>
{'address': 'podlaskie, Łomża', 'name': 'Eldor Bud Sp. z o.o.', 'link': 'https://rynekpierwotny.pl/deweloperzy/eldor-bud-sp-z-oo-4355/'}
2022-06-24 02:16:21 [scrapy.core.scraper] DEBUG: Scraped from <200 https://rynekpierwotny.pl/deweloperzy/>
{'address': 'mazowieckie, Warszawa', 'name': 'Nexity Polska Sp. z o.o.', 'link': 'https://rynekpierwotny.pl/deweloperzy/nexity-polska-sp-z-oo-2856/'}
2022-06-24 02:16:21 [scrapy.core.scraper] DEBUG: Scraped from <200 https://rynekpierwotny.pl/deweloperzy/>
{'address': 'mazowieckie, Warszawa', 'name': 'Spravia', 'link': 'https://rynekpierwotny.pl/deweloperzy/spravia-1236/'}
2022-06-24 02:16:21 [scrapy.core.scraper] DEBUG: Scraped from <200 https://rynekpierwotny.pl/deweloperzy/>
{'address': 'małopolskie, Kraków', 'name': 'Bryksy', 'link': 'https://rynekpierwotny.pl/deweloperzy/bryksy-914/'}

 'item_scraped_count': 24,
7kjnsjlb #2

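response.css('ul.rp-1qtpzi4') selects the container of the items, not the items (the li tags) themselves. As a result, you loop over the container (once) and get only the first item.
Change it to: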

import scrapy

class RynekMainSpider(scrapy.Spider):
    name = "RynekMain"
    start_urls = [
        'https://rynekpierwotny.pl/deweloperzy/?page=1',
    ]

    def parse(self, response):
        for quote in response.css('ul.rp-1qtpzi4 li'):
            yield {
                'address': quote.css('address.rp-o9b83y::text').get(),
                'name': quote.css('h2.rp-69f2r4::text').get(),
                'link': quote.css('li.rp-np9kb1 a::attr(href)').get(),
            }
