response.css returns an empty array - Scrapy

vngu2lb8 posted on 2024-01-09 in Other

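I am new to web scraping and to Scrapy in general. I am trying to scrape Yellow Pages and have run into a problem. When I run fetch in the terminal, I get a 200 response. But when I try something like response.css('article.address-indicators'), I get an empty array. I tested the same approach against books.toscrape.com and it works fine.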

fetch("https://www.yellowpages.com/search?search_term=hairdressers%20&search_location=Los%20Angeles%2C%20CA&search_type=searchbox_top")



#1 bt1cpqcv

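By default, Scrapy obeys the rules in robots.txt. See the log below: the 200 is for the robots.txt request, and the search URL itself is then rejected as forbidden.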

>>> fetch('https://www.yellowpages.com/search?search_term=hairdressers%20&search_location=Los%20Angeles%2C%20CA&search_type=searchbox_top')
2023-12-31 11:09:26 [scrapy.core.engine] INFO: Spider opened
2023-12-31 11:09:27 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.yellowpages.com/robots.txt> (referer: None)
2023-12-31 11:09:27 [scrapy.downloadermiddlewares.robotstxt] DEBUG: Forbidden by robots.txt: <GET https://www.yellowpages.com/search?search_term=hairdressers%20&search_location=Los%20Angeles%2C%20CA&search_type=searchbox_top>
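The check behind the "Forbidden by robots.txt" line can be illustrated with a minimal sketch. It uses Python's standard-library urllib.robotparser rather than Scrapy's own robots.txt parser, and the rules in SAMPLE_ROBOTS_TXT, the example.com URLs, and the is_allowed helper are all hypothetical, made up for illustration; they are not the site's actual robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules for illustration only;
# this is NOT the real file served by any particular site.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /search
"""


def is_allowed(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Return True if the given robots.txt text permits fetching url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical URLs: a search page under /search and the site root.
search_url = "https://www.example.com/search?search_term=hairdressers"
home_url = "https://www.example.com/"

print(is_allowed(SAMPLE_ROBOTS_TXT, search_url))
print(is_allowed(SAMPLE_ROBOTS_TXT, home_url))
```

Under these assumed rules the search URL is disallowed while the root page is allowed, which mirrors what the log shows: robots.txt is fetched first, and the request that matches a Disallow rule is dropped before it is downloaded.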

You can override the default (at your own risk) by starting the shell with: scrapy shell --set ROBOTSTXT_OBEY=False
Then you can use response.css('....') or similar expressions.
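If the same override is needed outside the shell, one option is to set it in the project's settings module. This is a sketch that assumes a standard Scrapy project layout with a settings.py file; the same at-your-own-risk caveat applies.

```python
# settings.py of a Scrapy project (assumed standard project layout)

# Stop Scrapy from filtering requests based on the site's robots.txt.
# Only do this if you are sure you are permitted to crawl those pages.
ROBOTSTXT_OBEY = False
```

The --set flag on the command line changes the setting for a single run only, whereas a value in settings.py applies to every spider in the project.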
