I have a simple spider.
import scrapy
from scrapy.crawler import CrawlerProcess


class ScraperSpider(scrapy.Spider):
    name = "scraper"

    def start_requests(self):
        urls = [
            'https://api.ipify.org?format=json',
        ]
        for url in urls:
            yield scrapy.Request(url=url, callback=self.parse)

    def parse(self, response):
        self.logger.info('================Request: %s, IP address: %s' % (response.request, response.text))


if __name__ == "__main__":
    process = CrawlerProcess()
    process.crawl(ScraperSpider)
    process.start()
However, it fails with this error:
2023-12-18 23:56:34 [scrapy.core.engine] DEBUG: Crawled (400) <GET https://api.ipify.org?format=json> (referer: None)
2023-12-18 23:56:34 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <400 https://api.ipify.org?format=json>: HTTP status code is not handled or not allowed
2023-12-18 23:56:34 [scrapy.core.engine] INFO: Closing spider (finished)
But the same URL can be fetched without problems using curl or a browser.
1 answer
Add a / before the ? in the URL:
urls = [
    'https://api.ipify.org/?format=json',
]
Output:
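If the URLs come from somewhere else (a config file, a database) rather than being typed by hand, one option is to normalize them before building requests. The sketch below uses only Python's standard urllib.parse; `ensure_root_path` is a hypothetical helper name, not part of Scrapy. It assumes, as the answer implies, that the problem is a URL with an empty path component (nothing between the host and the ?), and it inserts / only in that case.

```python
from urllib.parse import urlsplit, urlunsplit


def ensure_root_path(url: str) -> str:
    """Hypothetical helper: give a URL an explicit '/' path if its path is empty."""
    parts = urlsplit(url)
    if parts.path:
        # The URL already has a path such as '/' or '/foo'; leave it unchanged.
        return url
    # Rebuild the URL with '/' as the path, keeping scheme, host, query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/", parts.query, parts.fragment))
```

In the spider this could be applied as `urls = [ensure_root_path(u) for u in urls]` inside `start_requests`. Returning URLs that already have a path unchanged means something like https://example.com/foo?x=1 is never rewritten.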