I am trying to scrape the author name using Scrapy in Python, but the result is None or sometimes "\t\t\t\t\t\t\n\n\n\n\n\n\t" instead of the author name

Posted by zzzyeukh on 2022-11-23 in Python


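I am trying to scrape the author name using Scrapy in Python, but the result is None, or sometimes "\t\t\t\t\t\n\n\n\n\n\n\t" instead of the author name. I have tried many approaches, such as response.css and response.xpath. When I copied the XPath from the browser inspector, I had the same problem with the article headline. Copying the XPath with SelectorGadget fixed it for the headline, but the SelectorGadget XPath does not work for the author.
Here is my code: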

import scrapy


class NewsSpider(scrapy.Spider):
    name = "cruiseradio"

    def start_requests(self):
        url = input("Enter the article url: ")
        
        yield scrapy.Request(url, callback=self.parse_dir_contents)

    def parse_dir_contents(self, response):
        try:
            Author = response.css('span.elementor-post-info__item--type-author::text').get()
        except IndexError:
            Author = "NULL"
        yield{
            'Author': Author,
        }

Here is the URL of the website: https://cruiseradio.net/new-expedition-ship-delivered-atlas-ocean-voyages/

scrapy

Source: https://stackoverflow.com/questions/74061758/i-am-trying-to-scrape-author-name-using-scrapy-in-python-but-its-giving-result-n

1 Answer


kmbjn2e3 #1

Try this:

[i.strip() for i in response.css('[itemprop="author"] span[class*="item--type-author"] ::text').getall() if i.strip() and i.lower().strip() != "by"][0]

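This assumes each post has exactly one author. If a post can have several authors, remove the trailing [0] to get the full list of names.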
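The filtering in that one-liner can also be pulled out into a small helper, which avoids an IndexError when nothing matches. The following is only a sketch: the name clean_author, its default parameter, and the sample fragments are hypothetical and do not come from the original post.

```python
def clean_author(fragments, default=None):
    """Return the first meaningful text fragment, or `default` if there is none.

    `fragments` is a list of strings, for example the result of
    response.css('... ::text').getall() inside a Scrapy callback.
    """
    for fragment in fragments:
        text = fragment.strip()
        # Skip whitespace-only text nodes and the literal "By" label.
        if text and text.lower() != "by":
            return text
    return default


# Hypothetical fragments, shaped like the whitespace-heavy output in the question.
sample = ["\n\t\t\t\t\t", "By", "\n\t\t", " Jane Doe ", "\n\t"]
print(clean_author(sample))
print(clean_author(["\n\t", "\t"], default="NULL"))
```

In the spider it could be called as Author = clean_author(response.css('[itemprop="author"] span[class*="item--type-author"] ::text').getall(), default="NULL"). The space before ::text selects the text of descendant elements as well, whereas span::text only returns text nodes that are direct children of the span. That would explain the whitespace-only result if the name sits in a nested element, which is an assumption about the page markup. Note also that .get() returns None when nothing matches instead of raising, so the except IndexError branch in the question never runs.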

Answered on 2022-11-23
