Scrapy shell: extract text only from a div class element

Posted by 6ie5vjzr 12 months ago in Shell


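I am trying to pull only the date values from this site: http://www.nflweather.com/
I believe I have the right code, but I need to clean up the results a little.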

response.xpath('//div[@class="fw-bold text-wrap"]/text()').extract()

My results come back padded with \n and \t characters:

'\n\t\t\t12/28/23 08:15 PM EST\n\t\t'

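I was hoping to get just a clean date and time. I have seen other versions on here that do this in a script, but I would like to be able to do it from the Scrapy shell.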

scrapy

Source: https://stackoverflow.com/questions/77740515/scrapy-shell-extract-text-only-from-div-class-element

1 Answer


3qpi33ja1#

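You can use the Selector.re method to apply a regular expression to every selector in the matched set. If you only need the date, you can use a pattern like (?:\d{2}/?){3}, which matches three two-digit groups separated by optional slashes.
For example, in the Scrapy shell: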

In [1]: fetch('http://www.nflweather.com/')
2023-12-31 19:07:51 [scrapy.core.engine] INFO: Spider opened
2023-12-31 19:07:51 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (301) to <GET https://www.nflweather.com/> from <GET http://www.nflweather.com/>
2023-12-31 19:07:51 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.nflweather.com/> (referer: None)

In [2]: response.xpath('//div[@class="fw-bold text-wrap"]/text()').re(r"(?:\d{2}/?){3}")
Out[2]:
['12/28/23',
 '12/30/23',
 '12/31/23',
 '12/31/23',
 '12/31/23',
 '12/31/23',
 '12/31/23',
 '12/31/23',
 '12/31/23',
 '12/31/23',
 '12/31/23',
 '12/31/23',
 '12/31/23',
 '12/31/23',
 '12/31/23',
 '12/31/23']
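
To see what the pattern does without fetching the page, here is a minimal sketch using Python's standard re module on the sample value from the question. The DATE_TIME pattern is my own assumption about the format (two-digit date parts, 12-hour clock, short uppercase zone abbreviation) and is not part of the original answer; the live page may use other formats.

```python
import re

# Sample value copied from the question; the live page may differ.
raw = "\n\t\t\t12/28/23 08:15 PM EST\n\t\t"

# Pattern from the answer: three two-digit groups, slash optional after each.
DATE_ONLY = re.compile(r"(?:\d{2}/?){3}")

# Hypothetical stricter pattern (an assumption) that also keeps the
# time and the zone abbreviation.
DATE_TIME = re.compile(r"\d{2}/\d{2}/\d{2} \d{2}:\d{2} [AP]M [A-Z]{2,4}")

date_only = DATE_ONLY.findall(raw)
date_time = DATE_TIME.findall(raw)

print(date_only)  # ['12/28/23']
print(date_time)  # ['12/28/23 08:15 PM EST']
```

Because the slash is optional, the date-only pattern would also match six consecutive digits such as 122823. If that matters for your data, a pattern with mandatory slashes like \d{2}/\d{2}/\d{2} is the safer choice.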

Edit:
If you also want to keep the time, I would simply suggest calling strip() on each result of the selector query:

lst = response.xpath('//div[@class="fw-bold text-wrap"]/text()').getall()
[i.strip() for i in lst]
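
If you want real datetime objects rather than cleaned strings, a rough sketch follows. It assumes every value looks like the sample in the question (MM/DD/YY, 12-hour time, then a zone abbreviation); parse_kickoff is a hypothetical helper name, and the zone is kept as plain text because parsing abbreviations like EST with %Z is platform-dependent.

```python
from datetime import datetime

# Stand-in for the list returned by .getall(); value copied from the question.
raw_values = ["\n\t\t\t12/28/23 08:15 PM EST\n\t\t"]


def parse_kickoff(text):
    """Hypothetical helper: strip whitespace, split off the zone, parse the rest."""
    cleaned = text.strip()
    # Split on the last space: "12/28/23 08:15 PM" and "EST".
    stamp, _, zone = cleaned.rpartition(" ")
    # The result is a naive datetime; the zone abbreviation is returned separately.
    return datetime.strptime(stamp, "%m/%d/%y %I:%M %p"), zone


parsed = [parse_kickoff(value) for value in raw_values]
print(parsed)
```

In the shell you would pass lst from the snippet above instead of raw_values.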
