The default usage is:
import scrapy
from twisted.internet import reactor
from scrapy.crawler import CrawlerRunner
from scrapy.utils.log import configure_logging
from scrapy.utils.project import get_project_settings

class MySpider1(scrapy.Spider):
    # Your first spider definition
    ...

class MySpider2(scrapy.Spider):
    # Your second spider definition
    ...

configure_logging()
settings = get_project_settings()
runner = CrawlerRunner(settings)
runner.crawl(MySpider1)
runner.crawl(MySpider2)
# join() returns a Deferred that fires once every scheduled crawl has finished
d = runner.join()
d.addBoth(lambda _: reactor.stop())
reactor.run()  # blocks here until all crawling jobs are finished
My code:
import scrapy
from scrapy.crawler import CrawlerRunner
from twisted.internet import reactor

runner1 = CrawlerRunner(settings={
    "FEEDS": {
        r"file:///C:\\Users\Messi\\1.json": {"format": "json", "overwrite": True}
    },
})
runner2 = CrawlerRunner(settings={
    "FEEDS": {
        r"file:///C:\\Users\Messi\\2.json": {"format": "json", "overwrite": True}
    },
})
runner3 = CrawlerRunner(settings={
    "FEEDS": {
        r"file:///C:\\Users\Messi\\3.json": {"format": "json", "overwrite": True}
    },
})

h = runner1.crawl(Live1)
h.addBoth(lambda _: reactor.stop())
a = runner2.crawl(Live2)
a.addBoth(lambda _: reactor.stop())
t = runner3.crawl(Live3)
t.addBoth(lambda _: reactor.stop())
reactor.run()
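The code above does not work! How can I run different spiders at the same time when each one needs different crawler settings? Because the settings differ, I created a separate runner for each spider: runner1, runner2, runner3, and so on. What is the correct usage? Any help on this topic would be appreciated. Thank you very much.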
1 Answer
As I said in the comments, I think using custom_settings is the better option.
Either way, this works for me:
Another way:
1.json:
2.json:
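I partly guessed my way to this answer, and I am not sure which approach is better. If anyone wants to correct, explain, or clarify anything in the comments, I would be glad.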