Scrapy: scraping content that has the same class name

Posted by bxgwgixi on 2022-11-09 in Other

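I am using Scrapy to scrape data from a particular website. The scraping works fine, but I run into a problem when scraping content from divs that share the same class name. For example: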

<div class="same_name">
 this is the 1st div
</div>
<div class="same_name">
 this is the 2nd div
</div>
<div class="same_name">
 this is the 3rd div
</div>

I only want to retrieve "this is the 1st div". The code I am using is:

desc = hxs.select('//div[@class = "same_name"]/text()').extract()

But it returns the contents of all three divs. Any help would be appreciated!

scrapy

Source: https://stackoverflow.com/questions/22981261/scrapy-scrape-content-having-same-class-name

3 Answers

kr98yfug1#

OK, this one worked for me.

print(desc[0])

extract() returns a list of all matches, so index 0 gives me "this is the 1st div", which is what I wanted.

ie3xauqp2#

You can use BeautifulSoup, which is a great HTML parser.

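# BeautifulSoup 4 import; the legacy form was "from BeautifulSoup import BeautifulSoup"
from bs4 import BeautifulSoup

html = """
<div class="same_name">
this is the 1st div
</div>
<div class="same_name">
this is the 2nd div
</div>
<div class="same_name">
this is the 3rd div
</div>
"""
soup = BeautifulSoup(html, "html.parser")
# soup.text would print the text of all three divs;
# find() returns only the first div with the given class
print(soup.find("div", class_="same_name").get_text(strip=True))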

That should do it.

nmpmafwu3#

With XPath you get all the divs that have the same class; you can then loop over them to get the results (for Scrapy):

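divs = response.xpath('//div[@class="full class name"]')
for div in divs:
    # "div.class" is a placeholder selector; the original snippet stopped at the if
    if div.css("div.class"):
        # Assumed body: print the whitespace-normalized text of the div
        print(div.xpath("normalize-space(.)").get())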
