I am writing a Beautiful Soup script, as follows:
    import re
    import requests
    from bs4 import BeautifulSoup

    # urls, HEADERS and remove_extra_char_in_values() are not shown here
    for i in urls:
        url = remove_extra_char_in_values(i)
        response_get = requests.get(url)
        if response_get.status_code == 200:
            site = requests.get(url, headers=HEADERS).text
            bs = BeautifulSoup(site, 'html.parser')
            # keep only <img> tags whose src matches the pattern '.jpg'
            images = bs.find_all('img', {'src': re.compile('.jpg')})
            for image in images:
                name = image['alt']
                images_link = image['src']
                with open(name.replace(' ', '-') + '.jpg', 'wb') as f:
                    img = requests.get(images_link)
                    f.write(img.content)
                    print('Writing: ', name)
        else:
            print('could not download link: ', url)
I am trying to extract images from tags that look like this:
<img data-test-id="Img"
src="https://content.api.news/v3/images/bin/be02edb43dcdddb2e9d81eab1080cda1?width=1440"
srcset="https://content.api.news/v3/images/bin/be02edb43dcdddb2e9d81eab1080cda1?width=1440 100w,
https://content.api.news/v3/images/bin/be02edb43dcdddb2e9d81eab1080cda1?width=1440 200w,
https://content.api.news/v3/images/bin/be02edb43dcdddb2e9d81eab1080cda1?width=1440 320w,
https://content.api.news/v3/images/bin/be02edb43dcdddb2e9d81eab1080cda1?width=1440 360w,
https://content.api.news/v3/images/bin/be02edb43dcdddb2e9d81eab1080cda1?width=1440 375w,
https://content.api.news/v3/images/bin/be02edb43dcdddb2e9d81eab1080cda1?width=1440 400w,
https://content.api.news/v3/images/bin/be02edb43dcdddb2e9d81eab1080cda1?width=1440 414w,
https://content.api.news/v3/images/bin/be02edb43dcdddb2e9d81eab1080cda1?width=1440 640w,
https://content.api.news/v3/images/bin/be02edb43dcdddb2e9d81eab1080cda1?width=1440 720w,
https://content.api.news/v3/images/bin/be02edb43dcdddb2e9d81eab1080cda1?width=1440 750w,
https://content.api.news/v3/images/bin/be02edb43dcdddb2e9d81eab1080cda1?width=1440 768w,
https://content.api.news/v3/images/bin/be02edb43dcdddb2e9d81eab1080cda1?width=1440 828w,
https://content.api.news/v3/images/bin/be02edb43dcdddb2e9d81eab1080cda1?width=1440 1024w,
https://content.api.news/v3/images/bin/be02edb43dcdddb2e9d81eab1080cda1?width=1440 1280w,
https://content.api.news/v3/images/bin/be02edb43dcdddb2e9d81eab1080cda1?width=1440 1366w,
https://content.api.news/v3/images/bin/be02edb43dcdddb2e9d81eab1080cda1?width=1440 1440w,
https://content.api.news/v3/images/bin/be02edb43dcdddb2e9d81eab1080cda1?width=1440 1536w,
https://content.api.news/v3/images/bin/be02edb43dcdddb2e9d81eab1080cda1?width=1440 1920w,
https://content.api.news/v3/images/bin/be02edb43dcdddb2e9d81eab1080cda1?width=1440 2048w,
https://content.api.news/v3/images/bin/be02edb43dcdddb2e9d81eab1080cda1?width=1440 2560w,
https://content.api.news/v3/images/bin/be02edb43dcdddb2e9d81eab1080cda1?width=1440 2732w,
https://content.api.news/v3/images/bin/be02edb43dcdddb2e9d81eab1080cda1?width=1440 2880w,
https://content.api.news/v3/images/bin/be02edb43dcdddb2e9d81eab1080cda1?width=1440 3840w"
sizes="(min-width: 1400px) 100vw, (min-width: 1200px) 100vw, (min-width: 980px) 100vw, (min-width: 600px) 100vw, 100vw" class="Img-ifujty gIUKwk"
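The tag has img/src, but the script does not pick it up. Is data-test-id="Img" related to this, or the class at the end? If so, how do I include these together with img/src in find_all('img', {'src': re.compile('jpg')})? The page does not seem to use JS either.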
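A possible explanation, sketched under assumptions: the src in the tag above contains no "jpg" at all, so the regex filter on src rejects it, regardless of data-test-id or the class. The snippet below is only a minimal illustration on a shortened copy of that tag. The alt text and the helper name find_image_urls are made up for the example, and the live page may be structured differently.

```python
import re
from bs4 import BeautifulSoup

# Shortened copy of the tag from the question; the alt text is a made-up placeholder.
SAMPLE_HTML = '''
<img data-test-id="Img" alt="Example image"
     src="https://content.api.news/v3/images/bin/be02edb43dcdddb2e9d81eab1080cda1?width=1440"
     class="Img-ifujty gIUKwk">
'''

def find_image_urls(html):
    """Return the src of every <img> that carries data-test-id="Img" (hypothetical helper)."""
    soup = BeautifulSoup(html, 'html.parser')
    # Filter on attributes that are actually present instead of on a file extension.
    tags = soup.find_all('img', attrs={'data-test-id': 'Img', 'src': True})
    return [tag['src'] for tag in tags]

soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')
# The filter from the question: no match, because the src contains no "jpg".
old_matches = soup.find_all('img', {'src': re.compile('.jpg')})
new_matches = find_image_urls(SAMPLE_HTML)
print(len(old_matches), new_matches)
```

If some tags have no alt attribute, image['alt'] raises a KeyError, so image.get('alt') with a fallback file name may be safer when saving.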
While I am here, how can I get bs to go through all the pages and fetch images from the whole site, not just the page that is requested?
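Beautiful Soup only parses the HTML it is given, so walking a whole site needs a loop that follows links. One rough way to sketch that is a breadth-first crawl restricted to the starting host. The names same_site_links and crawl, the fetch callback, and the max_pages limit are all assumptions made for this example, not part of any library.

```python
from collections import deque
from urllib.parse import urljoin, urlparse, urldefrag

from bs4 import BeautifulSoup

def same_site_links(html, page_url):
    """Return absolute links found in html that stay on page_url's host."""
    host = urlparse(page_url).netloc
    soup = BeautifulSoup(html, 'html.parser')
    links = set()
    for a in soup.find_all('a', href=True):
        # Resolve relative hrefs against the page and drop any #fragment.
        absolute = urldefrag(urljoin(page_url, a['href'])).url
        if urlparse(absolute).netloc == host:
            links.add(absolute)
    return links

def crawl(start_url, fetch, max_pages=50):
    """Breadth-first walk of one site.

    fetch(url) is supplied by the caller and must return the page's HTML
    as text, or None if the page could not be downloaded.
    """
    seen = {start_url}
    queue = deque([start_url])
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        pages[url] = html
        # Queue only links that have not been seen before.
        for link in same_site_links(html, url) - seen:
            seen.add(link)
            queue.append(link)
    return pages
```

In the script above, fetch could be a small function that calls requests.get(url, headers=HEADERS) and returns response.text when the status code is 200, and the image extraction would then run on each entry of the returned dict. Keeping max_pages small while testing avoids hammering the site.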
Edit for clarity:
The script does not download any images from the site in question, which in this case is https://www.vogue.com.au/vogue-living
Regards,