Scrapy works on the home page but not on any other page

Posted by ccrfmcuu on 2023-10-20 in Other


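I'm new to this. I'm trying to scrape data from https://www.aims.gov.au, or more specifically from https://weather.aims.gov.au/#/station/4. When I scrape the station/4 page I get nothing back, whereas on the aims.gov.au home page I can retrieve pretty much anything. Do you know why this happens? My code is below; I hope someone can show me where I'm going wrong.
The first snippet scrapes an arbitrary heading, just to show that I can scrape the site at all, but when I move to the page I actually want (the second snippet) I can't scrape anything.
All settings are the defaults from the other generated Scrapy files.
Home page code (a test to see whether I can scrape here):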

import scrapy


class GBRspider(scrapy.Spider):
    name = 'GBRspider'
    allowed_domains = ['weather.aims.gov.au']
    start_urls = ['https://www.aims.gov.au']

    def parse(self, response):
        data = response
        # The banner heading is present in the raw HTML of the home page
        yield {
            'temp': data.css('h1.banner-title::text').get()
        }

This gives me temp: "Australia's tropical marine research agency"
Code for the page I actually want:

import scrapy


class GBRspider(scrapy.Spider):
    name = 'GBRspider'
    allowed_domains = ['weather.aims.gov.au']
    start_urls = ['https://weather.aims.gov.au/#/station/4']

    def parse(self, response):
        data = response
        # This selector finds nothing in the downloaded HTML
        yield {
            'temp': data.css('h1.ng-binding::text').get()
        }

This gives me temp: None, when it should be Davies Reef.
Thank you.

scrapy

Source: https://stackoverflow.com/questions/76983033/scrapy-working-on-home-page-but-not-any-other-page

1 answer


hujrc8aj1#

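This happens because everything needed to render the home page is contained in the initial response to the HTTP request for the home page URL.
The other URL, https://weather.aims.gov.au/#/station/4, is different. The part after the # is a URL fragment, which is never sent to the server, so Scrapy receives only the bare page shell. The ng-binding class in your selector suggests an AngularJS template that is filled in by JavaScript, and Scrapy does not execute JavaScript. The information needed to render the page comes from an API request to https://api.aims.gov.au/weather/station/4, which returns a JSON response that the page's JavaScript then uses to render the content in the browser. So to get the information you are after, all you have to do is send your request to the API URL.
For example: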
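To see why the station page comes back empty, it can help to look at what is actually requested. The sketch below uses only the standard library to split the page URL into the part that goes over the wire and the fragment that stays in the browser. The helper `api_url_for_station` is hypothetical, and it assumes that other station IDs follow the same URL pattern as station 4, which this thread does not confirm.

```python
from urllib.parse import urldefrag

page_url = "https://weather.aims.gov.au/#/station/4"

# Split off the fragment: only request_url is ever sent to the server.
request_url, fragment = urldefrag(page_url)
print(request_url)  # https://weather.aims.gov.au/
print(fragment)     # /station/4


def api_url_for_station(station_id: int) -> str:
    """Hypothetical helper: build the API URL for a station.

    Assumes every station uses the same pattern as station 4.
    """
    return f"https://api.aims.gov.au/weather/station/{station_id}"


# The station ID is the last path-like segment of the fragment.
station_id = int(fragment.rsplit("/", 1)[-1])
print(api_url_for_station(station_id))
# https://api.aims.gov.au/weather/station/4
```

In other words, both start URLs in the question fetch a plain page root, and the station number only has meaning to the JavaScript running in the browser.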

import scrapy


class GBRspider(scrapy.Spider):
    name = 'GBRspider'
    allowed_domains = ['aims.gov.au']
    # Request the JSON API directly instead of the JavaScript-rendered page
    start_urls = ['https://api.aims.gov.au/weather/station/4']

    def parse(self, response):
        # Decode the JSON body into a Python dict
        data = response.json()
        yield {
            'temp': data["site_name"]
        }

Output:

2023-08-26 12:55:59 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://api.aims.gov.au/weather/station/4> (referer: None)
2023-08-26 12:55:59 [scrapy.core.scraper] DEBUG: Scraped from <200 https://api.aims.gov.au/weather/station/4>
{'temp': 'Davies Reef'}
2023-08-26 12:55:59 [scrapy.core.engine] INFO: Closing spider (finished)
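The parsing step can also be tried offline before running a crawl. This is a minimal sketch, assuming the API body is a JSON object with a `site_name` key, which is the only field confirmed by the output above. The function `parse_station` and the trimmed sample body are hypothetical and stand in for the real response.

```python
import json


def parse_station(body: str) -> dict:
    """Hypothetical helper: build the scraped item from an API response body."""
    data = json.loads(body)
    # .get() returns None instead of raising KeyError if the key is absent
    return {"temp": data.get("site_name")}


# Trimmed, hypothetical payload; the real response likely has more fields.
sample_body = '{"site_name": "Davies Reef"}'
print(parse_station(sample_body))  # {'temp': 'Davies Reef'}
```

Using `.get()` rather than indexing is a design choice: a station without a `site_name` yields `None` instead of stopping the callback with an exception.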

Answered 2023-10-20
