I wrote the Python script below to return the name, price, and link of the items listed at https://shop.doverstreetmarket.com/collections/shops-noah:
import scrapy


class DSMUKSpider(scrapy.Spider):
    name = 'dsmuk'
    start_urls = ['https://shop.doverstreetmarket.com/collections/shops-noah']

    def parse(self, response):
        for dsmuk_product in response.css('article.h-full'):
            try:
                yield {
                    'name': dsmuk_product.css('h2.font-display.text-xs.leading-2xs.md:text-sm.md:leading-xs.mb-2.5.a.span::text').get(),
                    'price': dsmuk_product.css('div.flex.flex-wrap.gap-x-2.uppercase.span::text').get().replace('£',''),
                    'link': dsmuk_product.css('h2.font-display.text-xs.leading-2xs.md:text-sm.md:leading-xs.mb-2.5.a').attrib['href'],
                }
            except:
                yield {
                }
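The desired output table looks like this:
| name | price | link |
| --|--|--|
| Keith Haring Polaroid Long Sleeve Tee | 58 | /collections/shops-noah/products/noah-mens-noah-x-keith-haring-polaroid-l-white |
| Keith Haring Leather Ornament | 48 | /collections/shops-noah/products/noah-noah-x-keith-haring-leather-or-brown |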
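However, running this spider with the scrapy crawl command produces a blank CSV, with no headers or cell values.
I suspect that the span nested inside the a element, which sits in the middle of the item name, is preventing the parse from returning the desired text. However, I am not entirely sure how to adjust the CSS notation to fix this, and I would appreciate any help. Below is the underlying HTML snippet I am trying to scrape: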
<li class="col-span-6 sm:col-span-3">
<article class="
h-full flex flex-col relative
">
<div class="mb-2" style="background-color: #F5F5F5;">
<img src="//shop.doverstreetmarket.com/cdn/shop/files/T160FW23_KEITH_HARRING_POLAROID_LONG_SLEEVE_0393.jpg?v=1702847626&width=600" alt="Noah - Keith Haring Polaroid Long Sleeve Tee - (White)" srcset="//shop.doverstreetmarket.com/cdn/shop/files/T160FW23_KEITH_HARRING_POLAROID_LONG_SLEEVE_0393.jpg?v=1702847626&width=160 160w, //shop.doverstreetmarket.com/cdn/shop/files/T160FW23_KEITH_HARRING_POLAROID_LONG_SLEEVE_0393.jpg?v=1702847626&width=175 175w, //shop.doverstreetmarket.com/cdn/shop/files/T160FW23_KEITH_HARRING_POLAROID_LONG_SLEEVE_0393.jpg?v=1702847626&width=238 238w, //shop.doverstreetmarket.com/cdn/shop/files/T160FW23_KEITH_HARRING_POLAROID_LONG_SLEEVE_0393.jpg?v=1702847626&width=273 273w, //shop.doverstreetmarket.com/cdn/shop/files/T160FW23_KEITH_HARRING_POLAROID_LONG_SLEEVE_0393.jpg?v=1702847626&width=320 320w, //shop.doverstreetmarket.com/cdn/shop/files/T160FW23_KEITH_HARRING_POLAROID_LONG_SLEEVE_0393.jpg?v=1702847626&width=350 350w, //shop.doverstreetmarket.com/cdn/shop/files/T160FW23_KEITH_HARRING_POLAROID_LONG_SLEEVE_0393.jpg?v=1702847626&width=374 374w, //shop.doverstreetmarket.com/cdn/shop/files/T160FW23_KEITH_HARRING_POLAROID_LONG_SLEEVE_0393.jpg?v=1702847626&width=400 400w, //shop.doverstreetmarket.com/cdn/shop/files/T160FW23_KEITH_HARRING_POLAROID_LONG_SLEEVE_0393.jpg?v=1702847626&width=476 476w, //shop.doverstreetmarket.com/cdn/shop/files/T160FW23_KEITH_HARRING_POLAROID_LONG_SLEEVE_0393.jpg?v=1702847626&width=546 546w" width="600" height="600" class=" block w-full h-full aspect-[4/5] object-contain mix-blend-multiply" sizes="(min-width: 1360px) calc(calc(((100 - 16.2) / 100 * 1360px) - 1.5rem) / 4 - (0.5rem * 3 / 4)), (min-width: 1280px) calc(calc(((100vw - (16.2 / 100 * 100vw)) - ((1.5rem * 2) - ((1.5rem * 2) * (16.2 / 100)))) - 1.5rem) / 4 - (0.5rem * 3 / 4)), (min-width: 744px) calc(calc(100vw - (1.5rem * 2)) / 4 - (0.5rem * 3 / 4)), (min-width: 640px) calc(calc(100vw - (1rem * 2)) / 4 - (0.5rem * 3 / 4)), calc(calc(100vw - (1rem * 2)) / 2 - 
(0.5rem / 2))">
</div>
<h2 class="font-display text-xs leading-2xs md:text-sm md:leading-xs mb-2.5" data-id="7153074733318">
<a class="
before:absolute before:inset-0 before:z-10
focus:before:focus-ring focus-visible:before:focus-ring
[&:focus:not(:focus-visible)]:before:not-focus-ring focus:not-focus-ring
" href="/collections/shops-noah/products/noah-mens-noah-x-keith-haring-polaroid-l-white" @click.prevent="setHistory(7153074733318, '/collections/shops-noah/products/noah-mens-noah-x-keith-haring-polaroid-l-white')">
<span class="block uppercase">
Noah
</span>
Keith Haring Polaroid Long Sleeve Tee
</a>
</h2>
<div class="relative mt-auto">
<div class="group-hover:invisible text-xs leading-2xs md:text-sm md:leading-xs uppercase">
<div class="
flex flex-wrap gap-x-2 uppercase
">
<span class="sr-only">
Regular price
</span>
<span class="
">
£58
</span>
</div>
</div>
</div>
</article>
</li>
<li class="col-span-6 sm:col-span-3">
<article class="
h-full flex flex-col relative
">
<div class="mb-2" style="background-color: #F5F5F5;">
<img src="//shop.doverstreetmarket.com/cdn/shop/files/A217FW23_KEITH_HARRING_ORNAMENT_0969.jpg?v=1702827766&width=600" alt="Noah - Keith Haring Leather Ornament - (Brown)" srcset="//shop.doverstreetmarket.com/cdn/shop/files/A217FW23_KEITH_HARRING_ORNAMENT_0969.jpg?v=1702827766&width=160 160w, //shop.doverstreetmarket.com/cdn/shop/files/A217FW23_KEITH_HARRING_ORNAMENT_0969.jpg?v=1702827766&width=175 175w, //shop.doverstreetmarket.com/cdn/shop/files/A217FW23_KEITH_HARRING_ORNAMENT_0969.jpg?v=1702827766&width=238 238w, //shop.doverstreetmarket.com/cdn/shop/files/A217FW23_KEITH_HARRING_ORNAMENT_0969.jpg?v=1702827766&width=273 273w, //shop.doverstreetmarket.com/cdn/shop/files/A217FW23_KEITH_HARRING_ORNAMENT_0969.jpg?v=1702827766&width=320 320w, //shop.doverstreetmarket.com/cdn/shop/files/A217FW23_KEITH_HARRING_ORNAMENT_0969.jpg?v=1702827766&width=350 350w, //shop.doverstreetmarket.com/cdn/shop/files/A217FW23_KEITH_HARRING_ORNAMENT_0969.jpg?v=1702827766&width=374 374w, //shop.doverstreetmarket.com/cdn/shop/files/A217FW23_KEITH_HARRING_ORNAMENT_0969.jpg?v=1702827766&width=400 400w, //shop.doverstreetmarket.com/cdn/shop/files/A217FW23_KEITH_HARRING_ORNAMENT_0969.jpg?v=1702827766&width=476 476w, //shop.doverstreetmarket.com/cdn/shop/files/A217FW23_KEITH_HARRING_ORNAMENT_0969.jpg?v=1702827766&width=546 546w" width="600" height="600" class=" block w-full h-full aspect-[4/5] object-contain mix-blend-darken" sizes="(min-width: 1360px) calc(calc(((100 - 16.2) / 100 * 1360px) - 1.5rem) / 4 - (0.5rem * 3 / 4)), (min-width: 1280px) calc(calc(((100vw - (16.2 / 100 * 100vw)) - ((1.5rem * 2) - ((1.5rem * 2) * (16.2 / 100)))) - 1.5rem) / 4 - (0.5rem * 3 / 4)), (min-width: 744px) calc(calc(100vw - (1.5rem * 2)) / 4 - (0.5rem * 3 / 4)), (min-width: 640px) calc(calc(100vw - (1rem * 2)) / 4 - (0.5rem * 3 / 4)), calc(calc(100vw - (1rem * 2)) / 2 - (0.5rem / 2))">
</div>
<h2 class="font-display text-xs leading-2xs md:text-sm md:leading-xs mb-2.5" data-id="7153075290374">
<a class="
before:absolute before:inset-0 before:z-10
focus:before:focus-ring focus-visible:before:focus-ring
[&:focus:not(:focus-visible)]:before:not-focus-ring focus:not-focus-ring
" href="/collections/shops-noah/products/noah-noah-x-keith-haring-leather-or-brown" @click.prevent="setHistory(7153075290374, '/collections/shops-noah/products/noah-noah-x-keith-haring-leather-or-brown')">
<span class="block uppercase">
Noah
</span>
Keith Haring Leather Ornament
</a>
</h2>
<div class="relative mt-auto">
<div class="group-hover:invisible text-xs leading-2xs md:text-sm md:leading-xs uppercase">
<div class="
flex flex-wrap gap-x-2 uppercase
">
<span class="sr-only">
Regular price
</span>
<span class="
">
£48
</span>
</div>
</div>
</div>
</article>
</li>
<li class="col-span-6 sm:col-span-3">
<article class="
h-full flex flex-col relative
">
<div class="mb-2" style="background-color: #F5F5F5;">
<img src="//shop.doverstreetmarket.com/cdn/shop/files/SS158FW23_THE_CURE_HOODIE_0045.jpg?v=1702827758&width=600" alt="Noah - The Cure Men's Raglan Hoodie - (Black)" srcset="//shop.doverstreetmarket.com/cdn/shop/files/SS158FW23_THE_CURE_HOODIE_0045.jpg?v=1702827758&width=160 160w, //shop.doverstreetmarket.com/cdn/shop/files/SS158FW23_THE_CURE_HOODIE_0045.jpg?v=1702827758&width=175 175w, //shop.doverstreetmarket.com/cdn/shop/files/SS158FW23_THE_CURE_HOODIE_0045.jpg?v=1702827758&width=238 238w, //shop.doverstreetmarket.com/cdn/shop/files/SS158FW23_THE_CURE_HOODIE_0045.jpg?v=1702827758&width=273 273w, //shop.doverstreetmarket.com/cdn/shop/files/SS158FW23_THE_CURE_HOODIE_0045.jpg?v=1702827758&width=320 320w, //shop.doverstreetmarket.com/cdn/shop/files/SS158FW23_THE_CURE_HOODIE_0045.jpg?v=1702827758&width=350 350w, //shop.doverstreetmarket.com/cdn/shop/files/SS158FW23_THE_CURE_HOODIE_0045.jpg?v=1702827758&width=374 374w, //shop.doverstreetmarket.com/cdn/shop/files/SS158FW23_THE_CURE_HOODIE_0045.jpg?v=1702827758&width=400 400w, //shop.doverstreetmarket.com/cdn/shop/files/SS158FW23_THE_CURE_HOODIE_0045.jpg?v=1702827758&width=476 476w, //shop.doverstreetmarket.com/cdn/shop/files/SS158FW23_THE_CURE_HOODIE_0045.jpg?v=1702827758&width=546 546w" width="600" height="600" loading="lazy" class=" block w-full h-full aspect-[4/5] object-contain mix-blend-darken" sizes="(min-width: 1360px) calc(calc(((100 - 16.2) / 100 * 1360px) - 1.5rem) / 4 - (0.5rem * 3 / 4)), (min-width: 1280px) calc(calc(((100vw - (16.2 / 100 * 100vw)) - ((1.5rem * 2) - ((1.5rem * 2) * (16.2 / 100)))) - 1.5rem) / 4 - (0.5rem * 3 / 4)), (min-width: 744px) calc(calc(100vw - (1.5rem * 2)) / 4 - (0.5rem * 3 / 4)), (min-width: 640px) calc(calc(100vw - (1rem * 2)) / 4 - (0.5rem * 3 / 4)), calc(calc(100vw - (1rem * 2)) / 2 - (0.5rem / 2))">
</div>
<h2 class="font-display text-xs leading-2xs md:text-sm md:leading-xs mb-2.5" data-id="7153073717510">
<a class="
before:absolute before:inset-0 before:z-10
focus:before:focus-ring focus-visible:before:focus-ring
[&:focus:not(:focus-visible)]:before:not-focus-ring focus:not-focus-ring
" href="/collections/shops-noah/products/noah-mens-noah-x-the-cure-raglan-hoodie-black" @click.prevent="setHistory(7153073717510, '/collections/shops-noah/products/noah-mens-noah-x-the-cure-raglan-hoodie-black')">
<span class="block uppercase">
Noah
</span>
The Cure Men's Raglan Hoodie
</a>
</h2>
<div class="relative mt-auto">
<div class="group-hover:invisible text-xs leading-2xs md:text-sm md:leading-xs uppercase">
<div class="
flex flex-wrap gap-x-2 uppercase
">
<span class="sr-only">
Regular price
</span>
<span class="
">
£198
</span>
</div>
</div>
</div>
</article>
</li>
2 Answers
#1 xienkqul
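This should achieve what you are looking for:
Explanation:
For the name attribute, we just need the inner text of every child element of the h2 element. To get it, we use the `getall()` method, which returns a list of strings, so we have to join the strings and then strip the whitespace. For the price attribute, we need the text content of the second child span element of div.flex, so we again use the `getall()` method, but this time we take only the item at index 1 of the resulting list and then strip the whitespace again. For the link attribute, we just need the href attribute of the a tag that is a child of the h2 element. Keep it simple.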
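As a rough sketch of the approach described above (not the answerer's original code), the helper below applies those three selectors to each `article.h-full` card. The names `parse_products` and `clean_text`, and the exact selector strings (`h2 ::text`, `div.flex span::text`, `h2 a`), are assumptions based on the HTML in the question. Structural selectors like these also sidestep the utility class names containing `:` and `.`, which would need escaping inside a CSS selector.

```python
def clean_text(fragments):
    """Join text fragments (as returned by getall()) and collapse whitespace."""
    return " ".join(" ".join(fragments).split())


def parse_products(response):
    """Yield one dict per product card.

    `response` is expected to be a Scrapy response, or any selector object
    that exposes the same .css() API.
    """
    for product in response.css("article.h-full"):
        # Every text node under the <h2>: the brand <span> plus the title.
        name_parts = product.css("h2 ::text").getall()
        # div.flex holds two <span> elements: the screen-reader label
        # ("Regular price") at index 0 and the actual price at index 1.
        price_parts = product.css("div.flex span::text").getall()
        yield {
            "name": clean_text(name_parts),
            # The slice avoids an IndexError if the price span is missing;
            # dropping the pound sign mirrors the question's own code.
            "price": clean_text(price_parts[1:2]).replace("£", ""),
            "link": product.css("h2 a").attrib["href"],
        }
```

Inside the spider, `parse` can then delegate with `yield from parse_products(response)`. Note that the name produced this way includes the brand text from the nested span, for example `Noah Keith Haring Polaroid Long Sleeve Tee`.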
Partial log output
#2 qgzx9mmu
If you prefer a pure Python approach, you can extract the JSON returned by the script:
To generate a CSV file (`.csv`), you can use:
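A minimal sketch of writing the scraped rows to CSV with the standard-library `csv` module. The helper name `write_products_csv`, the column order, the file name `products.csv` mentioned in the comment, and the sample row (taken from the HTML in the question) are illustrative assumptions, not the answerer's original code.

```python
import csv
import io

FIELDNAMES = ["name", "price", "link"]


def write_products_csv(rows, fileobj):
    """Write an iterable of product dicts to an open text file as CSV."""
    writer = csv.DictWriter(fileobj, fieldnames=FIELDNAMES)
    writer.writeheader()
    writer.writerows(rows)


# Sample row built from the HTML snippet in the question.
sample_rows = [
    {
        "name": "Keith Haring Polaroid Long Sleeve Tee",
        "price": "58",
        "link": "/collections/shops-noah/products/noah-mens-noah-x-keith-haring-polaroid-l-white",
    },
]

# Demonstrate with an in-memory buffer. For a real file, pass
# open("products.csv", "w", newline="", encoding="utf-8") instead.
buffer = io.StringIO()
write_products_csv(sample_rows, buffer)
print(buffer.getvalue())
```

Passing `newline=""` when opening a real file lets the `csv` module control line endings itself, which avoids blank rows on Windows.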
A variant using pandas' `to_csv`:
Output (*144 rows x 3 columns*):
| name | price | link |
| --|--|--|
| Keith Haring Men's Crewneck - (White) | 148 | /collections/shops-noah/products/noah-mens-noah-x-keith-haring-crewneck-white |
| Keith Haring Men's Crewneck - (Heather Grey) | 148 | /collections/shops-noah/products/noah-mens-noah-x-keith-haring-crewneck-heather |
| Keith Haring Polaroid Long Sleeve Tee - (White) | 58 | /collections/shops-noah/products/noah-mens-noah-x-keith-haring-polaroid-l-white |
| ... | ... | ... |
| Men's Sport Trucker - (Orange) | 58 | /collections/shops-noah/products/noah-mens-sport-trucker-orange |
| Men's Core Logo 5-Panel - (Black) | 64 | /collections/shops-noah/products/noah-men-core-logo-5-panel-black |
| Men's Core Logo 5-Panel - (Duck) | 64 | /collections/shops-noah/products/noah-mens-core-logo-5-panel-duck |