Scrapy: I scraped a website but the data is not stored correctly, and I need a good explanation

Posted by shstlldc on 2022-11-09 in Other
import scrapy
from ..items import TrendyolItem
class Trendyolspider(scrapy.Spider):

    name = "trendy"
    start_urls = [
        'https://www.trendyol.com/sr?q=diz%C3%BCst%C3%BC+bilgisayar&qt=diz%C3%BCst%C3%BC+bilgisayar&st=diz%C3%BCst%C3%BC+bilgisayar&os=1&pi=1'
    ]

    def parse(self, response):
        items = TrendyolItem()
        all_data_pc = response.css('div.prdct-cntnr-wrppr')
        for trendy in all_data_pc:
            price = trendy.css('div.prc-box-dscntd::text').extract()
            brand = trendy.css('span.prdct-desc-cntnr-ttl::text').extract()
            features = trendy.css('span.prdct-desc-cntnr-name.hasRatings::text').extract()

            items['price'] = price
            items['brand'] = brand
            items['features'] = features

            yield items

This should give output like the following:

{'brand': ['Huawei'],
 'price': ['13.308,40 TL']}

But the output this code actually gives is different:

{'brand': ['Huawei',
           'Apple',
           'Monster',
           'LENOVO',
           'ASUS',
           'Apple',
           'ASUS',
           'Huawei',
           'Dell',
           'Huawei',
then price then features...

How can I fix what I am doing wrong? Thank you all.

yks3o0rb


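response.css('div.prdct-cntnr-wrppr') returns the container that holds all the items, so you loop only once, over the container itself rather than over the individual items, and extract() returns a list of every item's values. Select each product card instead (div.p-card-wrppr), create a new item inside the loop, and use get() to take a single value per field.
See the changes: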

import scrapy
from ..items import TrendyolItem

class Trendyolspider(scrapy.Spider):
    name = "trendy"
    start_urls = [
        'https://www.trendyol.com/sr?q=diz%C3%BCst%C3%BC+bilgisayar&qt=diz%C3%BCst%C3%BC+bilgisayar&st=diz%C3%BCst%C3%BC+bilgisayar&os=1&pi=1'
    ]

    def parse(self, response):
        all_data_pc = response.css('div.p-card-wrppr')
        for trendy in all_data_pc:
            items = TrendyolItem()
            price = trendy.css('div.prc-box-dscntd::text').get()
            brand = trendy.css('span.prdct-desc-cntnr-ttl::text').get()
            features = trendy.css('span.prdct-desc-cntnr-name.hasRatings::text').get()

            items['price'] = price
            items['brand'] = brand
            items['features'] = features
            yield items
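
To see why the original loop ran only once, here is a minimal sketch of the same idea. It is not Scrapy code: it uses the standard library's xml.etree.ElementTree in place of Scrapy selectors so that it runs offline, and the markup is a hypothetical, simplified stand-in for the real page (only the class names come from the spiders above; the second price is invented for illustration).

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup: ONE container holding TWO product cards.
# The real page is far larger; only the class names are taken from the thread.
HTML = """
<body>
  <div class="prdct-cntnr-wrppr">
    <div class="p-card-wrppr">
      <span class="prdct-desc-cntnr-ttl">Huawei</span>
      <div class="prc-box-dscntd">13.308,40 TL</div>
    </div>
    <div class="p-card-wrppr">
      <span class="prdct-desc-cntnr-ttl">Apple</span>
      <div class="prc-box-dscntd">21.999,00 TL</div>
    </div>
  </div>
</body>
"""

BRAND = ".//span[@class='prdct-desc-cntnr-ttl']"
PRICE = ".//div[@class='prc-box-dscntd']"


def scrape_by_container(root):
    """Loop over the container: one pass, every field is a list of all values."""
    items = []
    for container in root.findall(".//div[@class='prdct-cntnr-wrppr']"):
        items.append({
            "brand": [el.text for el in container.findall(BRAND)],
            "price": [el.text for el in container.findall(PRICE)],
        })
    return items


def scrape_by_card(root):
    """Loop over each product card: one item per product, one value per field."""
    items = []
    for card in root.findall(".//div[@class='p-card-wrppr']"):
        items.append({
            "brand": card.findtext(BRAND),  # first match only, like .get()
            "price": card.findtext(PRICE),
        })
    return items


root = ET.fromstring(HTML)
by_container = scrape_by_container(root)
by_card = scrape_by_card(root)
print(by_container)
print(by_card)
```

The first function mirrors the question's spider (one item whose fields hold every brand and price), and the second mirrors the corrected spider (one item per card). In Scrapy itself the queries stay relative to each card because they are called on the loop variable, as in trendy.css(...) above.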
