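I'm trying to extract image links from HTML saved in TXT files, mostly from Shopify sites. The `img` structure is the same on most Shopify sites, but for some reason I can't scrape the image links. I only need the first link.
Here is a sample of the HTML markup: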
```html
<div class="grid-product__content"><a class="grid-product__link" href="/products/ayla-ring-gold">
<div class="grid-product__image-mask"><div class="grid__image-ratio grid__image-ratio--square">
<img alt="Ayla Ring | Gold - Alexa Kelley" class="lazyload grid__image-contain" data-aspectratio="1.0" data-sizes="auto" data-src="//cdn.shopify.com/s/files/1/1351/4197/products/Ayla_Ring_Gold_Hero_{width}x.jpg?v=1660506192" data-widths="[360, 540, 720, 900, 1080]"/>
</div><div class="grid-product__secondary-image small--hide"><img alt="Ayla Ring | Gold - Alexa Kelley" class="lazyload" data-aspectratio="1.0" data-sizes="auto" data-src="//cdn.shopify.com/s/files/1/1351/4197/products/Ayla_Ring_Gold_2_{width}x.jpg?v=1660506192" data-widths="[360, 540, 720, 1000]"/>
</div></div>
<div class="grid-product__meta">
<div class="grid-product__title grid-product__title--body">Ayla Ring | Gold</div><div class="grid-product__price"><span class="money">$85.00 USD</span>
```
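The sample markup hints at why the image lookup comes back empty: the primary `<img>` has `class="lazyload grid__image-contain"` and a `data-src` attribute, but no `lazyautosizes`/`lazyloaded` classes and no `data-srcset`. Those are most likely added in the browser by the lazysizes lazy-loading script (the `lazyload` class and `data-sizes="auto"` are its conventions), so they exist in the rendered page but not in HTML saved from the server. A minimal sketch, assuming the TXT file holds that raw server HTML, comparing the lookup from the question's code with one that matches a single class present in the markup:

```python
from bs4 import BeautifulSoup

# Trimmed copy of the sample markup (assumption: the .txt file holds the raw
# server HTML, i.e. the markup before any lazy-loading script has run)
html = """
<div class="grid-product__content"><a class="grid-product__link" href="/products/ayla-ring-gold">
<div class="grid-product__image-mask"><div class="grid__image-ratio grid__image-ratio--square">
<img alt="Ayla Ring | Gold - Alexa Kelley" class="lazyload grid__image-contain" data-src="//cdn.shopify.com/s/files/1/1351/4197/products/Ayla_Ring_Gold_Hero_{width}x.jpg?v=1660506192" data-widths="[360, 540, 720, 900, 1080]"/>
</div><div class="grid-product__secondary-image small--hide"><img alt="Ayla Ring | Gold - Alexa Kelley" class="lazyload" data-src="//cdn.shopify.com/s/files/1/1351/4197/products/Ayla_Ring_Gold_2_{width}x.jpg?v=1660506192"/>
</div></div></a></div>
"""

soup = BeautifulSoup(html, "html.parser")

# A multi-word class_ string must equal the whole class attribute value;
# these runtime classes are not in the raw markup, so find() returns None
missing = soup.find('img', class_='grid__image-contain lazyautosizes lazyloaded')

# A single class name matches any <img> whose class list contains it; the
# secondary image lacks grid__image-contain, so this is the first (hero) image
img = soup.find('img', class_='grid__image-contain')
srcset = img.get('data-srcset')  # None: attribute not present in raw markup
first_url = 'https:' + img.get('data-src')  # data-src is protocol-relative

print(missing)  # None
print(srcset)   # None
print(first_url)
```

Calling `.get()` on `missing` is exactly what raises `AttributeError: 'NoneType' object has no attribute 'get'`.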
The error returned is `AttributeError: 'NoneType' object has no attribute 'get'`. I know what the error means; I just don't know how to get the link.
Here is my code:
```python
import pandas as pd
from bs4 import BeautifulSoup

baseurl = 'https://alexakelley.com'
protocol = 'https:'
dataset = []

with open(r'/run/user/759001103/gvfs/smb-share:server=192.168.0.150,share=indexserver/Country/USA/A/Alexakelley/alexakelley2.txt', "r") as f:
    soup = BeautifulSoup(f.read(), "html.parser")

# Iterate over the direct children of the product grid container
for e in soup.find('div', class_='grid grid--uniform'):
    dataset.append({
        # This is the line that fails: find() returns None, so .get() raises AttributeError
        'Field_01': protocol + e.find('img', class_='grid__image-contain lazyautosizes lazyloaded').get('data-srcset'),
        'Field_02': e.find('div', class_='grid-product__title grid-product__title--body').get_text(strip=True),
        'Field_03': baseurl + e.find('a', class_='grid-product__link').get('href'),
        'Field_04': e.find('span', class_='money').get_text(strip=True)
    })

# to_csv() with a path writes the file and returns None
pd.DataFrame(dataset).to_csv(r'/run/user/759001103/gvfs/smb-share:server=192.168.0.150,share=indexserver/Country/USA/A/Alexakelley/Alexakelley All.csv', index=False)
print(dataset)
```
If I leave out Field_01, Field_02 through Field_04 return results, so the rest of my code works. How should I handle the Field_01 line?
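One way to handle Field_01 is to look up each field inside its own product card and read the `data-src` attribute that the raw markup actually contains. A minimal sketch, assuming each product is a `div.grid-product__content` that holds the image, link, title and price (as in the sample markup); the helpers `text_or_none`, `attr_or_none` and `scrape_products` are hypothetical names, and missing elements become `None` instead of raising:

```python
from bs4 import BeautifulSoup

BASEURL = 'https://alexakelley.com'
PROTOCOL = 'https:'


def text_or_none(tag):
    """Return the tag's stripped text, or None if find() found nothing."""
    return tag.get_text(strip=True) if tag is not None else None


def attr_or_none(tag, name):
    """Return an attribute value, or None if the tag or attribute is missing."""
    return tag.get(name) if tag is not None else None


def scrape_products(html):
    soup = BeautifulSoup(html, "html.parser")
    dataset = []
    # Assumption: one div.grid-product__content per product, containing all
    # four fields, so each row's values always belong to the same product
    for card in soup.find_all('div', class_='grid-product__content'):
        # The first <img> with grid__image-contain is the primary image
        src = attr_or_none(card.find('img', class_='grid__image-contain'), 'data-src')
        href = attr_or_none(card.find('a', class_='grid-product__link'), 'href')
        dataset.append({
            'Field_01': PROTOCOL + src if src else None,
            'Field_02': text_or_none(card.find('div', class_='grid-product__title')),
            'Field_03': BASEURL + href if href else None,
            'Field_04': text_or_none(card.find('span', class_='money')),
        })
    return dataset
```

With the file from the question, this would be used as `dataset = scrape_products(f.read())` inside the `with open(...)` block, followed by the same `pd.DataFrame(dataset).to_csv(..., index=False)` call.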
1 answer
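I changed the approach: first get all the elements that match the classes you identified (*the elements are stored in ResultSets*), then loop over the items in each soup/ResultSet to pick out the element at each position and build the dataset list.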
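A minimal sketch of that ResultSet-and-position approach, assuming the class names from the question's sample markup and reading `data-src` for the image (the function name `scrape_by_resultsets` is hypothetical). Pairing by position only lines up when every ResultSet has exactly one element per product, in the same order. `zip()` stops at the shortest ResultSet, so a surplus element (such as a 24th price against 23 products) is not reported, and if it is not the last one, every row after it is shifted.

```python
from bs4 import BeautifulSoup


def scrape_by_resultsets(html, baseurl='https://alexakelley.com', protocol='https:'):
    soup = BeautifulSoup(html, "html.parser")
    # One ResultSet per field, collected across the whole page
    images = soup.find_all('img', class_='grid__image-contain')
    titles = soup.find_all('div', class_='grid-product__title')
    links = soup.find_all('a', class_='grid-product__link')
    prices = soup.find_all('span', class_='money')
    # zip() pairs elements by position and stops at the shortest ResultSet;
    # extra elements in longer ResultSets are silently ignored
    return [
        {
            'Field_01': protocol + img.get('data-src'),
            'Field_02': title.get_text(strip=True),
            'Field_03': baseurl + link.get('href'),
            'Field_04': price.get_text(strip=True),
        }
        for img, title, link, price in zip(images, titles, links, prices)
    ]
```

Comparing `len()` of the four ResultSets before zipping is a cheap way to notice a count mismatch like the one described in the note that follows.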
Note: in my testing I found 23 items, but the last ResultSet, "spans with money", has 24 elements, so account for that inconsistency. Here is the modified code:
Result:
| index | Field_01 | Field_02 | Field_03 | Field_04 |
| ------ | ------ | ------ | ------ | ------ |
| 0 | https://cdn.shopify.com/s/files/1/1351/4197/products/Ayla_Ring_Gold_Hero_{width}x.jpg?v=1660506192 | Natalie Ring \| Silver | /products/natalie-ring-silver | $80.00 USD |
| 1 | https://cdn.shopify.com/s/files/1/1351/4197/products/…_Silver_Hero_{width}x.jpg?v=1660506245 | Natalie Ring \| Silver | /products/natalie-ring-silver | $80.00 USD |
| 2 | https://cdn.shopify.com/s/files/1/1351/4197/products/Ayla_Ring_SilverY_Hero_5ee0a758-b4f7-471d-bcda-86a556cbc3d7_{width}x.jpg?v=1661231526 | Natalie Ring \| Silver | /products/natalie-ring-silver | $80.00 USD |
| 3 | https://cdn.shopify.com/s/files/1/1351/4197/products/Noemie_Ring_Gold_Hero_{width}x.jpg?v=1660506641 | Natalie Ring \| Silver | /products/natalie-ring-silver | $80.00 USD |
| 4 | https://cdn.shopify.com/s/files/1/1351/4197/products/Carolina_Ring_Gold_Hero_{width}x.jpg?v=1660506629 | Natalie Ring \| Silver | /products/natalie-ring-silver | $80.00 USD |
| 5 | https://cdn.shopify.com/s/files/1/1351/4197/products/Gisele_Ring_Gold_Hero_0b928b0a-542c-4ce5-bad5-dc535935a12f_{width}x.jpg?v=1670545956 | Natalie Ring \| Silver | /products/natalie-ring-silver | $80.00 USD |
| 6 | https://cdn.shopify.com/s/files/1/1351/4197/products/Vera_Ring_Hero_{width}x.jpg?v=1670543130 | Natalie Ring \| Silver | /products/natalie-ring-silver | $80.00 USD |
| 7 | https://cdn.shopify.com/s/files/1/1351/4197/products/Gisele_Ring_Silver_Hero_01fa927d-73d4-4236-87b2-28881347cd7f_{width}x.jpg?v=1670545866 | Natalie Ring \| Silver | /products/natalie-ring-silver | $80.00 USD |
| 8 | https://cdn.shopify.com/s/files/1/1351/4197/products/Elise_Ring_Gold_Hero_{width}x.jpg?v=1670544918 | Natalie Ring \| Silver | /products/natalie-ring-silver | $80.00 USD |
| 9 | https://cdn.shopify.com/s/files/1/1351/4197/products/Elise_Ring_Silver_Hero_{width}x.jpg?v=1670544859 | Natalie Ring \| Silver | /products/natalie-ring-silver | $80.00 USD |
| 10 | https://cdn.shopify.com/s/files/1/1351/4197/products/…_Ring_Gold_Hero_{width}x.jpg?v=1660504898 | Natalie Ring \| Silver | /products/natalie-ring-silver | $80.00 USD |
| 11 | https://cdn.shopify.com/s/files/1/1351/4197/products/Emeri_Gold_Hero_{width}x.jpg?v=1660504754 | Natalie Ring \| Silver | /products/natalie-ring-silver | $80.00 USD |
| 12 | https://cdn.shopify.com/s/files/1/1351/4197/products/Adele_Ring_Gold_Hero_{width}x.jpg?v=1660504305 | Natalie Ring \| Silver | /products/natalie-ring-silver | $80.00 USD |
| 13 | https://cdn.shopify.com/s/files/1/1351/4197/products/Maia_Ring_Gold_Hero_{width}x.jpg?v=1660505010 | Natalie Ring \| Silver | /products/natalie-ring-silver | $80.00 USD |
| 14 | https://cdn.shopify.com/s/files/1/1351/4197/products/Fiona_Ring_GC_Hero_{width}x.jpg?v=1660504768 | Natalie Ring \| Silver | /products/natalie-ring-silver | $80.00 USD |
| 15 | https://cdn.shopify.com/s/files/1/1351/4197/products/20210625_ALEXA.KELLEY_AKNG0003_0480copy_e2344c32-c893-4df9-9678-54a2d1e0f007_{width}x.jpg?v=1633294772 | Natalie Ring \| Silver | /products/natalie-ring-silver | $80.00 USD |
| 16 | https://cdn.shopify.com/s/files/1/1351/4197/products/Alaia_Bracelet_Gold_4mm_{width}x.jpg?v=1660504482 | Natalie Ring \| Silver | /products/natalie-ring-silver | $80.00 USD |
| 17 | https://cdn.shopify.com/s/files/1/1351/4197/products/20210625_ALEXA.KELLEY_AKNS0004_0479copy_0bdc78af-abc4-4867-bc22-1f9c517d520e_{width}x.jpg?v=1633294849 | Natalie Ring \| Silver | /products/natalie-ring-silver | $80.00 USD |
| 18 | https://cdn.shopify.com/s/files/1/1351/4197/products/Alaia_Bracelet_Silver_4mm_{width}x.jpg?v=1660504542 | Natalie Ring \| Silver | /products/natalie-ring-silver | $80.00 USD |
| 19 | https://cdn.shopify.com/s/files/1/1351/4197/products/…_Ring_Silver_Hero_{width}x.jpg?v=1660504875 | Natalie Ring \| Silver | /products/natalie-ring-silver | $80.00 USD |
| 20 | https://cdn.shopify.com/s/files/1/1351/4197/products/Noemie_Ring_Silver_Hero_{width}x.jpg?v=1660505177 | Natalie Ring \| Silver | /products/natalie-ring-silver | $80.00 USD |
| 21 | … | Natalie Ring \| Silver | /products/natalie-ring-silver | $80.00 USD |
| 22 | https://cdn.shopify.com/s/files/1/1351/4197/products/Natalie_Ring_Silver_Hero_{width}x.jpg?v=1660505083 | Natalie Ring \| Silver | /products/natalie-ring-silver | $80.00 USD |
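The Field_01 values in the result still contain the literal `{width}` placeholder, so they are URL templates rather than directly fetchable image URLs. In the sample markup, each image also carries a `data-widths` list, and the lazy-loading script appears to substitute one of those widths into `{width}` in the browser. A minimal sketch, assuming the CDN serves any of the listed widths, that fills in the largest one (`sized_image_url` is a hypothetical helper name):

```python
import json


def sized_image_url(data_src, data_widths, protocol='https:'):
    """Fill the {width} placeholder in a protocol-relative data-src URL.

    data_widths is the img's data-widths attribute, e.g. "[360, 540, 720, 900, 1080]";
    the largest listed width is used.
    """
    widths = json.loads(data_widths)  # the attribute value is a JSON-style list
    return protocol + data_src.replace('{width}', str(max(widths)))


src = '//cdn.shopify.com/s/files/1/1351/4197/products/Ayla_Ring_Gold_Hero_{width}x.jpg?v=1660506192'
print(sized_image_url(src, '[360, 540, 720, 900, 1080]'))
# https://cdn.shopify.com/s/files/1/1351/4197/products/Ayla_Ring_Gold_Hero_1080x.jpg?v=1660506192
```

In a scraping loop, the two arguments would come from `img.get('data-src')` and `img.get('data-widths')` on the same `<img>` tag.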