My Scrapy shell commands work, but the spider's output is empty

js5cn81o  posted on 2022-11-09  in  Shell

I tested the code in the Scrapy shell and it works fine.

fetch('https://www.livescores.com/?tz=3')
response.css('div.dh')
gununMaclari = response.css('div.dh')
gununMaclari.css('span.hh span.ih span.kh::text').get()
gununMaclari.css('span.hh span.jh span.kh::text').get()

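These commands show me the home and away teams. With getall() I can get all of the home and away data. But when I run the code below, the output is empty. This is the problem I can't solve. Can anyone help me find it? Thanks.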

import scrapy
from scrapy.crawler import CrawlerRunner

class LivescoresTodayList(scrapy.Spider):

    name = 'todayMatcheslist'
    custom_settings = {'CONCURRENT_REQUESTS': '1'}

    def start_requests(self):
        yield scrapy.Request('https://www.livescores.com/?tz=3')

    def parse(self, response):

        for gununMaclari in response.css('div.dh'):
            yield{
                'Home': gununMaclari.css('span.hh span.ih span.kh::text').get(),
                'Away': gununMaclari.css('span.hh span.jh span.kh::text').get()
            }

runnerTodayList = CrawlerRunner(settings = {
    "FEEDS": {
        "todayMatcheslist.json": {"format": "json", "overwrite": True},
    },
})
runnerTodayList.crawl(LivescoresTodayList)

swvgeqrz  #1

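The spider itself is fine. If you use CrawlerRunner, you have to configure logging and settings yourself and start the Twisted reactor. The script in the question only schedules the crawl and never runs the reactor, so it exits before any request is sent and nothing is scraped. CrawlerProcess takes care of those steps for you.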
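
Why a crawl that is only scheduled produces nothing can be illustrated with a rough analogy in the standard library's asyncio. This is not Scrapy's or Twisted's actual machinery: `fake_crawl`, `schedule_only`, `schedule_and_run` and the "Team A"/"Team B" item are made up for the illustration. Queuing work on an event loop does not execute it; the loop has to be run, just as the reactor has to be started.

```python
import asyncio


def fake_crawl(items):
    # Hypothetical stand-in for the spider: record one made-up item.
    items.append({"Home": "Team A", "Away": "Team B"})


def schedule_only():
    """Queue the work but never run the loop (like crawl() without reactor.run())."""
    items = []
    loop = asyncio.new_event_loop()
    loop.call_soon(fake_crawl, items)  # queued, not executed
    loop.close()                       # the script ends; the callback never ran
    return items


def schedule_and_run():
    """Queue the work, then run the loop until it is told to stop."""
    items = []
    loop = asyncio.new_event_loop()
    loop.call_soon(fake_crawl, items)
    loop.call_soon(loop.stop)          # plays the role of reactor.stop()
    loop.run_forever()                 # plays the role of reactor.run()
    loop.close()
    return items


if __name__ == "__main__":
    print(schedule_only())
    print(schedule_and_run())
```

The first function mirrors the script in the question, the second mirrors the CrawlerRunner pattern of stopping the reactor once the crawl's Deferred fires.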

CrawlerProcess example:

import scrapy
from scrapy.crawler import CrawlerProcess

class LivescoresTodayList(scrapy.Spider):
    name = 'todayMatcheslist'
    custom_settings = {'CONCURRENT_REQUESTS': '1'}

    def start_requests(self):
        yield scrapy.Request('https://www.livescores.com/?tz=3')

    def parse(self, response):
        for gununMaclari in response.css('div.dh'):
            yield{
                'Home': gununMaclari.css('span.hh span.ih span.kh::text').get(),
                'Away': gununMaclari.css('span.hh span.jh span.kh::text').get()
            }

process = CrawlerProcess(settings={
    "FEEDS": {
        "todayMatcheslist.json": {"format": "json", "overwrite": True},
    },
})

process.crawl(LivescoresTodayList)
process.start()

CrawlerRunner example:

import scrapy
from scrapy.crawler import CrawlerRunner
from scrapy.utils.log import configure_logging
from twisted.internet import reactor

class LivescoresTodayList(scrapy.Spider):
    name = 'todayMatcheslist'
    custom_settings = {'CONCURRENT_REQUESTS': '1'}

    def start_requests(self):
        yield scrapy.Request('https://www.livescores.com/?tz=3')

    def parse(self, response):
        for gununMaclari in response.css('div.dh'):
            yield{
                'Home': gununMaclari.css('span.hh span.ih span.kh::text').get(),
                'Away': gununMaclari.css('span.hh span.jh span.kh::text').get()
            }

configure_logging({'LOG_FORMAT': '%(levelname)s: %(message)s'})
runnerTodayList = CrawlerRunner(settings={
    "FEEDS": {
        "todayMatcheslist.json": {"format": "json", "overwrite": True},
    },
})
d = runnerTodayList.crawl(LivescoresTodayList)
d.addBoth(lambda _: reactor.stop())
reactor.run()
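
After either script has finished, a quick way to confirm that items were actually written is to load the feed file. The sketch below is only a convenience check, assuming the feed path `todayMatcheslist.json` from the FEEDS setting above; `load_feed` is a hypothetical helper, not part of Scrapy.

```python
import json
from pathlib import Path


def load_feed(path="todayMatcheslist.json"):
    """Return the scraped items as a list, or [] if the feed is missing or empty."""
    feed = Path(path)
    if not feed.exists() or feed.stat().st_size == 0:
        return []
    return json.loads(feed.read_text(encoding="utf-8"))


if __name__ == "__main__":
    items = load_feed()
    print(f"{len(items)} items scraped")
    # Show the first few matches using the keys yielded by the spider.
    for item in items[:5]:
        print(item.get("Home"), "-", item.get("Away"))
```

If this reports zero items even though the reactor was started, the log output enabled by configure_logging is the next place to look.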
