I want to use Scrapy to extract the titles of the different books at a URL, then output/store them as an array of dictionaries in a JSON file.
Here is my code:
import scrapy

class BooksSpider(scrapy.Spider):
    name = "books"
    star_urls = [
        "http://books.toscrape.com"
    ]

    def parse(self, response):
        titles = response.css("article.product_pod h3 a::attr(title)").getall()
        for title in titles:
            yield {"title": title}
Here is what I typed into the terminal:
scrapy crawl books -o books.json
The books.json file gets created, but it is empty.
I checked that I am in the correct directory and venv, but it still doesn't work.
However:
Earlier, I deployed this spider to scrape the entire HTML data and write it into a books.html file, and everything worked fine.
Here is that code:
import scrapy

class BooksSpider(scrapy.Spider):
    name = "books"
    star_urls = [
        "http://books.toscrape.com"
    ]

    def parse(self, response):
        with open("books.html", "wb") as file:
            file.write(response.body)
This is what I typed into the terminal:
scrapy crawl books
Any idea what I am doing wrong? Thanks.
EDIT:
Typing response.css('article.product_pod h3 a::attr(title)').getall() into the scrapy shell spits out:
['A Light in the Attic', 'Tipping the Velvet', 'Soumission', 'Sharp Objects', 'Sapiens: A Brief History of Humankind', 'The Requiem Red', 'The Dirty Little Secrets of Getting Your Dream Job', 'The Coming Woman: A Novel Based on the Life of the Infamous Feminist, Victoria Woodhull', 'The Boys in the Boat: Nine Americans and Their Epic Quest for Gold at the 1936 Berlin Olympics', 'The Black Maria', 'Starving Hearts (Triangular Trade Trilogy, #1)', "Shakespeare's Sonnets", 'Set Me Free', "Scott Pilgrim's Precious Little Life (Scott Pilgrim #1)", 'Rip it Up and Start Again', 'Our Band Could Be Your Life: Scenes from the American Indie Underground, 1981-1991', 'Olio', 'Mesaerion: The Best Science Fiction Stories 1800-1849', 'Libertarianism for Beginners', "It's Only the Himalayas"]
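For reference, that selector reads the title attribute of the &lt;a&gt; tag inside each product card's &lt;h3&gt;. The same extraction can be sketched with only the standard library; the markup below is illustrative, modeled on books.toscrape.com's product cards, not copied from the real page:

```python
from html.parser import HTMLParser

# Illustrative markup modeled on books.toscrape.com's product cards (an assumption,
# not the actual page source).
SAMPLE_HTML = """
<article class="product_pod">
  <h3><a href="a-light-in-the-attic" title="A Light in the Attic">A Light in ...</a></h3>
</article>
<article class="product_pod">
  <h3><a href="tipping-the-velvet" title="Tipping the Velvet">Tipping the ...</a></h3>
</article>
"""

class TitleExtractor(HTMLParser):
    """Collects the title attribute of <a> tags inside <h3> within article.product_pod,
    mirroring the CSS selector 'article.product_pod h3 a::attr(title)'."""

    def __init__(self):
        super().__init__()
        self.in_pod = False  # inside <article class="product_pod">
        self.in_h3 = False   # inside an <h3> within that article
        self.titles = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "article" and "product_pod" in attrs.get("class", ""):
            self.in_pod = True
        elif tag == "h3" and self.in_pod:
            self.in_h3 = True
        elif tag == "a" and self.in_h3 and "title" in attrs:
            self.titles.append(attrs["title"])

    def handle_endtag(self, tag):
        if tag == "article":
            self.in_pod = False
        elif tag == "h3":
            self.in_h3 = False

extractor = TitleExtractor()
extractor.feed(SAMPLE_HTML)
print(extractor.titles)  # ['A Light in the Attic', 'Tipping the Velvet']
```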
1 Answer
You have a typo: the attribute Scrapy reads is start_urls, not star_urls. With star_urls the spider has no start URLs, so it finishes without crawling anything and the feed export writes an empty file. Rename the attribute and run the code again. It should work.
Output: