Parse info from tables with Scrapy and XPath

Posted by y53ybaqx on 2022-11-09 in Other


I'm trying to extract attributes from a website using Scrapy and XPath:

response.xpath('//section[@id="attributes"]/div/table/tbody/tr/td/text()').extract()

The attributes are nested like this:

<section id="attributes">
<h5>Attributes</h5>
    <div>
        <table>
            <tbody>
                <tr>
                    <td>Attribute 1</td>
                    <td>Value 1</td>
                </tr>           
                <tr>
                    <td>Attribute 2</td>
                    <td>Value 2</td>
                </tr>

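There are two problems:
1. Getting the contents of the td elements (the XPath expression above returns []).
2. Once the td values are retrieved, pairing them up somehow, for example "Attribute 1" = "Value 1".
I'm new to Python and Scrapy, so any help is much appreciated.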

scrapy

Source: https://stackoverflow.com/questions/53437448/parse-info-from-tables-with-scrapy-and-xpath

2 answers


wswtfjt71#

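First, try removing the tbody step from your XPath. Browsers insert <tbody> into tables when they build the DOM, so it appears in developer tools, but it is often missing from the raw page source that Scrapy actually downloads.
You can update your code as follows: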

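# '//tr' matches the rows whether or not a <tbody> is present
cells = response.xpath('//section[@id="attributes"]/div/table//tr/td/text()').extract()
# pair even-indexed cells (names) with odd-indexed cells (values)
att_values = [{first: second} for first, second in zip(cells[::2], cells[1::2])]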

You will get a list of attribute/value pairs:

[{attr_1: value_1}, {attr_2: value_2}, {attr_3: value_3}, ...]

Or

att_values = {first: second for first, second in zip(cells[::2], cells[1::2])}
# or:
# att_values = dict(zip(cells[::2], cells[1::2]))

to get a dictionary:

{attr_1: value_1, attr_2: value_2, attr_3: value_3, ...}
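The pairing works because `cells[::2]` takes every even-indexed item (the attribute names), `cells[1::2]` takes every odd-indexed item (the values), and `zip` walks the two slices in step. A small sketch on a hypothetical `cells` list, so it runs without Scrapy:

```python
# Hypothetical output of the .extract() call: names and values alternate.
cells = ["Attribute 1", "Value 1", "Attribute 2", "Value 2", "Attribute 3", "Value 3"]

names = cells[::2]    # indices 0, 2, 4 -> attribute names
values = cells[1::2]  # indices 1, 3, 5 -> attribute values

# List of single-entry dicts, one per table row.
att_list = [{name: value} for name, value in zip(names, values)]

# One dict mapping every attribute name to its value.
att_dict = dict(zip(names, values))

print(att_list)  # [{'Attribute 1': 'Value 1'}, {'Attribute 2': 'Value 2'}, {'Attribute 3': 'Value 3'}]
print(att_dict)  # {'Attribute 1': 'Value 1', 'Attribute 2': 'Value 2', 'Attribute 3': 'Value 3'}
```

This relies on every cell contributing exactly one text node. An empty `<td></td>`, or a cell whose text sits inside a child tag such as `<td><b>Value</b></td>`, yields nothing for `td/text()`, which shifts every later name/value pair by one. Extracting row by row avoids that.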

Answered 2022-11-09

eanckbw92#

Try this:

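# 'table tr' is a descendant match, so it works with or without <tbody>
for row in response.css('section#attributes table tr'):
    td1 = row.xpath('.//td[1]/text()').get()  # attribute name (None if missing)
    td2 = row.xpath('.//td[2]/text()').get()  # attribute value (None if missing)
    # further logic here, e.g. yield {td1: td2}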

Answered 2022-11-09
