Scrapy: selecting a group of elements and text using CSS selectors

Posted by fnatzsnv on 2022-11-09 in Other

I have an HTML page like this:

<div>
<a href='link'>
<u class>name</u>
</a>
text
<br>
<a href='link'>
<u class>name</u>
</a>
text
<br>
<a href='link'>
<u class>name</u>
</a>
text
<br>
<a href='link'>
<u class>name</u>
</a>
text
<br>
<a href='link'>
<u class>name</u>
</a>
text
<br>
</div>

I need to select groups like this:

<a href='link'>
<u class>name</u>
</a>
text
<br>

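From each group I need three values: link, name, and text. Is there a way to select such a group and extract these specific values from each one, using Scrapy, CSS selectors, XPath, or any other method?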

scrapy

Source: https://stackoverflow.com/questions/72935072/select-a-group-of-elements-and-text-using-css-selectors

2 Answers

vhmi4jdf #1

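Scrapy provides a mechanism for yielding multiple values from an HTML page: Items, which are Python objects that define key-value pairs.
You can extract the values separately and then yield them together as key-value pairs.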

To extract an element's attribute value, use ::attr(), for example a::attr(href).
To extract an element's text content, use ::text.

For example, you could define your parse function like this:

def parse(self, response):
    # All links (href attributes) inside the target div
    for_link = response.css('.row.no-gutters div:nth-child(3) div:nth-child(8) a::attr(href)').getall()

    # All names (text of the <u> inside each <a>)
    for_name = response.css('.row.no-gutters div:nth-child(3) div:nth-child(8) a u::text').getall()

    # All bare text nodes that are direct children of the target div
    for_text = response.css('.row.no-gutters div:nth-child(3) div:nth-child(8)::text').getall()

    # Yield all three lists together as one item
    yield {"link": for_link, "name": for_name, "text": for_text}
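The parse function above yields three parallel lists rather than one record per group. A minimal sketch of pairing them up afterwards is shown here. The helper name group_results and the sample values are hypothetical, chosen to resemble what the three getall() calls might return, including the whitespace-only text nodes that sit between tags.

```python
def group_results(links, names, texts):
    """Pair up three parallel lists into one dict per group."""
    # Text nodes between tags are often whitespace-only; drop those
    # and trim the rest.
    cleaned = [t.strip() for t in texts if t.strip()]
    return [
        {"link": link, "name": name, "text": text}
        for link, name, text in zip(links, names, cleaned)
    ]

# Hypothetical sample values, shaped like the output of the getall() calls
links = ["link1", "link2"]
names = ["name1", "name2"]
texts = ["\n", "\ntext1\n", "\n", "\ntext2\n", "\n"]

groups = group_results(links, names, texts)
print(groups)
```

This only lines up correctly when every group has all three values. If a link can appear without trailing text, the lists drift out of step, and extracting per link element is the safer route.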

Then open the items.py file:


# Define here the models for your scraped items

# Import the required library
import scrapy


# Define the fields for the Scrapy item in this class.
# Replace "YourSpider" with your own spider's name.
class YourSpiderItem(scrapy.Item):

    # Item key for the href of each <a>
    for_link = scrapy.Field()

    # Item key for the text of each <u>
    for_name = scrapy.Field()

    # Item key for the bare text that follows each <a>
    for_text = scrapy.Field()

For more details, see the tutorial linked from the original answer.

2022-11-09

e7arh2l6 #2

If you can wrap the text in a span, like this:

<a href='link'>
<u class>name</u>
</a>
<span>text</span>
<br>

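then you can select everything in CSS like this:
a, a + span {}
Or you can style the two items separately:
a {}
a + span {}
The + combinator means "immediately followed by": a + span matches a span that comes directly after an a element with the same parent.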

2022-11-09
