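How can I make sure I get a new IP for every Scrapy request? I tried StormProxy and SmartProxy, but the IP they give me stays the same for the whole session.
Each new run does get a new IP, but within a single session the IP never changes.
My code is as follows: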
```python
import json
import uuid

import scrapy
from scrapy.crawler import CrawlerProcess

# Placeholder: replace with your StormProxy or SmartProxy endpoint
MY_ROTATING_PROXY = 'http://USER:PASS@PROXY_HOST:PORT'


class IpTest(scrapy.Spider):
    name = 'IP_test'
    previous_ip = ''
    count = 1
    ip_url = 'https://ifconfig.me/all.json'

    def start_requests(self):
        yield scrapy.Request(
            self.ip_url,
            dont_filter=True,
            meta={
                'cookiejar': uuid.uuid4().hex,
                'proxy': MY_ROTATING_PROXY,  # either stormproxy or smartproxy
            },
        )

    def parse(self, response):
        ip_address = json.loads(response.text)['ip_addr']
        self.logger.info(f"IP: {ip_address}")
        if self.count < 10:
            self.count += 1
            # Re-issue the same request to check the exit IP again
            yield from self.start_requests()


settings = {
    'DOWNLOAD_DELAY': 1,
    'CONCURRENT_REQUESTS': 1,
}
process = CrawlerProcess(settings)
process.crawl(IpTest)
process.start()
```
Output log:
```
2020-12-27 21:15:52 [scrapy.core.engine] INFO: Spider opened
2020-12-27 21:15:52 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2020-12-27 21:15:52 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2020-12-27 21:15:55 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://ifconfig.me/all.json> (referer: None)
2020-12-27 21:15:55 [IP_test] INFO: IP: 190.239.69.94
2020-12-27 21:15:56 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://ifconfig.me/all.json> (referer: https://ifconfig.me/all.json)
2020-12-27 21:15:56 [IP_test] INFO: IP: 190.239.69.94
2020-12-27 21:15:57 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://ifconfig.me/all.json> (referer: https://ifconfig.me/all.json)
2020-12-27 21:15:57 [IP_test] INFO: IP: 190.239.69.94
2020-12-27 21:15:59 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://ifconfig.me/all.json> (referer: https://ifconfig.me/all.json)
2020-12-27 21:15:59 [IP_test] INFO: IP: 190.239.69.94
2020-12-27 21:16:00 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://ifconfig.me/all.json> (referer: https://ifconfig.me/all.json)
2020-12-27 21:16:00 [IP_test] INFO: IP: 190.239.69.94
2020-12-27 21:16:01 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://ifconfig.me/all.json> (referer: https://ifconfig.me/all.json)
2020-12-27 21:16:01 [IP_test] INFO: IP: 190.239.69.94
2020-12-27 21:16:03 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://ifconfig.me/all.json> (referer: https://ifconfig.me/all.json)
2020-12-27 21:16:03 [IP_test] INFO: IP: 190.239.69.94
2020-12-27 21:16:04 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://ifconfig.me/all.json> (referer: https://ifconfig.me/all.json)
2020-12-27 21:16:04 [IP_test] INFO: IP: 190.239.69.94
2020-12-27 21:16:05 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://ifconfig.me/all.json> (referer: https://ifconfig.me/all.json)
2020-12-27 21:16:06 [IP_test] INFO: IP: 190.239.69.94
2020-12-27 21:16:07 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://ifconfig.me/all.json> (referer: https://ifconfig.me/all.json)
2020-12-27 21:16:07 [IP_test] INFO: IP: 190.239.69.94
2020-12-27 21:16:07 [scrapy.core.engine] INFO: Closing spider (finished)
```
2020-12-27 21:16:07 [scrapy.core.engine] INFO: Closing spider (finished)
What am I doing wrong here? I even tried disabling cookies (COOKIES_ENABLED = False) and removing cookiejar from request.meta, but it made no difference.
1 answer
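This was tricky, but I found the answer. For Storm, you need to send the header 'Connection': 'close' with each request. Storm then closes the connection after each response, and you get a new IP for every request.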
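As a minimal sketch of how to apply that header to every request, assuming Scrapy's standard downloader-middleware `process_request` hook, a small middleware can set it. `ForceNewConnectionMiddleware` is a hypothetical name, not part of Scrapy. The explanation for why this helps is also an assumption rather than something the answer states: Scrapy normally keeps the connection to the proxy alive and reuses it, while a rotating proxy presumably picks the exit IP per new connection, so forcing the connection to close should force a new IP.

```python
# Hypothetical downloader middleware (not part of Scrapy) that sends
# "Connection: close" on every request. Assumption: the rotating proxy
# assigns a new exit IP per new connection, as the answer reports for Storm.


class ForceNewConnectionMiddleware:
    """Add 'Connection: close' to every outgoing request."""

    def process_request(self, request, spider):
        # Overwrite any existing value so keep-alive is never requested.
        request.headers["Connection"] = "close"
        # Returning None tells Scrapy to keep processing the request normally.
        return None


# Settings for the CrawlerProcess in the question. The "__main__." prefix
# assumes the spider and this middleware live in the script that is run
# directly; adjust the dotted path if they live in a project module.
settings = {
    "DOWNLOAD_DELAY": 1,
    "CONCURRENT_REQUESTS": 1,
    "DOWNLOADER_MIDDLEWARES": {
        "__main__.ForceNewConnectionMiddleware": 543,
    },
}
```

If you only need it for specific requests, the same header can instead be passed directly, e.g. `scrapy.Request(self.ip_url, dont_filter=True, headers={'Connection': 'close'}, meta={...})` inside `start_requests`. The middleware simply avoids repeating that argument on every request.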