Scrapy: how to create if and catch (exception) for this code

Posted by x3naxklr on 2022-11-09 in Other


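I want to scrape data from multiple URLs and retrieve all the information. With a single URL it works, but with more than one URL I only get data from one of them and the rest fail with an error (list index out of range). I was told to use try and catch. What should the syntax look like?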

import scrapy


class QuotesSpider(scrapy.Spider):
    name = "quotes"

    def start_requests(self):
        urls = [
            # 'https://jdih.kaltimprov.go.id/produk_hukum/detail/9ef7f994-9db4'
        ]
        for url in urls:
            yield scrapy.Request(url=url, callback=self.parse)

    def parse(self, response):
        yield {
            'Kategori': response.xpath('//*[@class="text-left"]/text()')[0].extract(),
            'Nomor': response.xpath('//*[@class="text-left"]/text()')[1].extract(),
            'Judul': response.xpath('//*[@class="text-left"]/text()')[2].extract().strip(),
            'Tanggal Diterapkan': response.xpath('//*[@class="text-left"]/text()')[3].extract(),
            'Tanggal Diundangkan': response.xpath('//*[@class="text-left"]/text()')[4].extract(),
            'Keterangan Status': response.xpath('//*[@class="text-left"]/p/text()')[0].extract(),
            'Statistik View': response.xpath('//*[@class="text-left"]/text()')[5].extract(),
            'Statistik Download': response.xpath('//*[@class="text-left"]/text()')[6].extract(),
            'Katalog': response.xpath('//*[@class="text-left"]/p/span/text()').extract(),
            'Abstraksi': response.xpath('//*[@class="text-left"]/p/text()')[1].extract(),
            'Lampiran': response.css('body > section > div > div > div > div.row > div.col-3 > a::attr(href)').extract(),
        }

scrapy

Source: https://stackoverflow.com/questions/71627068/how-to-create-if-and-catch-exceptionfor-this-code-of-scrapy

1 answer


8mmmxcuj1#

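This is not a problem with scraping multiple URLs; it is a problem with your XPath selectors. For every field, you run an XPath and then pick one element from the resulting list by index. If a page has no matching text, the list is empty (or shorter than you expect), and indexing into it raises the "list index out of range" error.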
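The "if" half of the question can be sketched as a length check before indexing. This is only an illustration: `text_if_present` is a hypothetical helper name, not part of Scrapy, and it assumes that a missing field should fall back to a default value rather than abort the item.

```python
def text_if_present(nodes, index, default=None):
    """Return the text at nodes[index] if that position exists, else `default`.

    `nodes` is assumed to be list-like and to hold objects with an
    .extract() method, e.g. the result of response.xpath(...).
    """
    if 0 <= index < len(nodes):
        return nodes[index].extract()
    # The XPath matched fewer nodes than expected on this page.
    return default
```

In the spider, a field such as 'Keterangan Status' could then be written as text_if_present(response.xpath('//*[@class="text-left"]/p/text()'), 0), which gives None on a page where the XPath matches nothing.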
I tried your code with two URLs:

import scrapy


class QuestionSpider(scrapy.Spider):
    name = 'question'
    allowed_domains = ['jdih.kaltimprov.go.id']
    start_urls = ['https://jdih.kaltimprov.go.id/produk_hukum/detail/9ef7f994-9db4',
                  'https://jdih.kaltimprov.go.id/produk_hukum/detail/5d0c7c0c-aa58']

    def parse(self, response):
        yield {
            'Kategori': response.xpath('//*[@class="text-left"]/text()')[0].extract(),
            'Nomor': response.xpath('//*[@class="text-left"]/text()')[1].extract(),
            'Judul': response.xpath('//*[@class="text-left"]/text()')[2].extract().strip(),
            'Tanggal Diterapkan': response.xpath('//*[@class="text-left"]/text()')[3].extract(),
            'Tanggal Diundangkan': response.xpath('//*[@class="text-left"]/text()')[4].extract(),
            'Keterangan Status': response.xpath('//*[@class="text-left"]/p/text()')[0].extract(),
            'Statistik View': response.xpath('//*[@class="text-left"]/text()')[5].extract(),
            'Statistik Download': response.xpath('//*[@class="text-left"]/text()')[6].extract(),
            'Katalog': response.xpath('//*[@class="text-left"]/p/span/text()').extract(),
            'Abstraksi': response.xpath('//*[@class="text-left"]/p/text()')[1].extract(),
            'Lampiran': response.css('body > section > div > div > div > div.row > div.col-3 > a::attr(href)').extract(),
        }

Running it gives me this error:

File "C:\Users\30463\desktop\quetsion3spider\quetsion3spider\spiders\question.py", line 17, in parse
'Keterangan Status':response.xpath('//*[@class="text-left"]/p/text()')[0].extract(),
File "D:\anaconda\lib\site-packages\parsel\selector.py", line 70, in __getitem__
o = super(SelectorList, self).__getitem__(pos)
IndexError: list index out of range

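The second line of the traceback shows which selector fails: on at least one of the pages, the XPath for 'Keterangan Status' matches nothing, so indexing it with [0] raises the IndexError. Hope this helps.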
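As for the syntax asked about: Python spells it try/except, not try/catch. Below is a minimal sketch of how the indexed lookups could be wrapped, assuming a missing field should become None instead of failing the whole item. `text_at` and `parse_detail` are hypothetical names, and unlike the original code the helper strips whitespace from every field, not only from 'Judul'.

```python
def text_at(nodes, index, default=None):
    """Return the stripped text of nodes[index], or `default` if it is missing."""
    try:
        return nodes[index].extract().strip()
    except IndexError:
        # The XPath matched fewer nodes than expected on this page.
        return default


def parse_detail(response):
    """Yield one item per page; fields that are absent come back as None."""
    # Run each XPath once and reuse the result for every index.
    cells = response.xpath('//*[@class="text-left"]/text()')
    paragraphs = response.xpath('//*[@class="text-left"]/p/text()')
    yield {
        'Kategori': text_at(cells, 0),
        'Nomor': text_at(cells, 1),
        'Judul': text_at(cells, 2),
        'Tanggal Diterapkan': text_at(cells, 3),
        'Tanggal Diundangkan': text_at(cells, 4),
        'Keterangan Status': text_at(paragraphs, 0),
        'Statistik View': text_at(cells, 5),
        'Statistik Download': text_at(cells, 6),
        'Abstraksi': text_at(paragraphs, 1),
    }
```

The spider's parse method could then simply do `yield from parse_detail(response)`. Catching only IndexError is deliberate: any other exception still surfaces, so unrelated bugs are not silently hidden.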

Answered on 2022-11-09
