[Selenium] I want to save all the images from my Pinterest board

j1dl9f46 posted on 2023-02-12 in Other

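I want to save all the images from a Pinterest board. I'm having trouble writing the part that, after downloading an image, goes back to the board and moves on to the next image. I would appreciate any help.
Example board: https://www.pinterest.jp/aku_ma/%E3%82%A2%E3%83%8B%E3%83%A1%E3%82%A2%E3%82%A4%E3%82%B3%E3%83%B3/
1. Log in
2. Open the board (← I have gotten this far)
3. Open the page of an image on the board
4. Press the download button and save to a specified path
5. Go back to the board and open the next image's page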


    import os
    import selenium
    import time
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.common.by import By
    from selenium import webdriver
    from webdriver_manager.chrome import ChromeDriverManager

    url = 'https://www.pinterest.jp/aku_ma/%E3%82%A2%E3%83%8B%E3%83%A1%E3%82%A2%E3%82%A4%E3%82%B3%E3%83%B3/'
    profilefolder = '--user-data-dir=' + '/Users/t/Library/Application Support/Google/Chrome/Default'
    emailAdress = 'xxxx@gmail.com'
    passwordNumber = 'xxxx'
    foldername = "/Users/t/Desktop/koreanLikeImages"
    speed = 1

    options = Options()
    # options.add_argument('--headless')
    DRIVER_PATH = "./chromedriver"  # Path to my ChromeDriver
    driver = webdriver.Chrome(options=options)
    driver.get(url)

    loginButton = driver.find_element(By.CSS_SELECTOR, "div[data-test-id='login-button']")
    loginButton.click()  # Click the login button
    time.sleep(1)

    # Enter the email address and password
    email = driver.find_element(By.ID, "email")
    email.send_keys(emailAdress)
    password = driver.find_element(By.ID, "password")
    password.send_keys(passwordNumber)

    # Click the red login button
    redLoginButton = driver.find_element(By.CLASS_NAME, "SignupButton")
    redLoginButton.click()
    time.sleep(3)

    # Reload the board now that we are logged in
    driver.get(url)
Answer #1 by bwitn5fc

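Steps 3, 4 and 5 are unnecessary, because the high-resolution links are already present in the HTML of the main board page. For example, this is the HTML for one image: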

    <img ... srcset="
    https://i.pinimg.com/236x/80/c8/ec/80c8ec56386197561bac4c4e40d331b8.jpg 1x,
    https://i.pinimg.com/474x/80/c8/ec/80c8ec56386197561bac4c4e40d331b8.jpg 2x,
    https://i.pinimg.com/736x/80/c8/ec/80c8ec56386197561bac4c4e40d331b8.jpg 3x,
    https://i.pinimg.com/originals/80/c8/ec/80c8ec56386197561bac4c4e40d331b8.jpg 4x">

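As you can see, each image has four URLs, each pointing to a different resolution, and 4x is the highest. With urllib.request.urlretrieve(url, filename) we can download the file a URL points to, so the high-quality images can be downloaded directly from the board page without opening each image's own page.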
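As a minimal sketch of how the highest-resolution entry could be picked from a `srcset` string like the one above, the helper below compares the descriptors (`1x`, `2x`, ...) instead of assuming the largest candidate comes last. The name `largest_srcset_url` is hypothetical, and the sketch assumes the URLs themselves contain no commas, which holds for the sample shown here.

```python
def largest_srcset_url(srcset):
    """Return the URL with the largest descriptor in a srcset string."""
    best_url, best_size = None, -1.0
    for candidate in srcset.split(','):
        parts = candidate.split()
        if not parts:
            continue  # skip empty entries, e.g. after a trailing comma
        url = parts[0]
        # A descriptor looks like "2x" or "474w"; a missing one means 1x.
        descriptor = parts[1] if len(parts) > 1 else '1x'
        try:
            size = float(descriptor[:-1])
        except ValueError:
            continue  # unknown descriptor, ignore this candidate
        if size > best_size:
            best_url, best_size = url, size
    return best_url


sample = """
https://i.pinimg.com/236x/80/c8/ec/80c8ec56386197561bac4c4e40d331b8.jpg 1x,
https://i.pinimg.com/474x/80/c8/ec/80c8ec56386197561bac4c4e40d331b8.jpg 2x,
https://i.pinimg.com/736x/80/c8/ec/80c8ec56386197561bac4c4e40d331b8.jpg 3x,
https://i.pinimg.com/originals/80/c8/ec/80c8ec56386197561bac4c4e40d331b8.jpg 4x
"""
print(largest_srcset_url(sample))
# prints https://i.pinimg.com/originals/80/c8/ec/80c8ec56386197561bac4c4e40d331b8.jpg
```

For the sample above this gives the same result as taking the last comma-separated entry; the descriptor comparison only matters if a page lists its candidates in a different order.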

    import time
    import urllib.request
    from selenium.common.exceptions import StaleElementReferenceException

    # `driver` and `By` come from the code in the question;
    # the board page must already be open.
    foldername = 'C:/Users/gtu/Desktop/folder/'  # must already exist
    urls = []           # URLs that have already been downloaded
    new_images = False  # set to True when a pass finds an unseen image

    while True:
        images = driver.find_elements(By.CSS_SELECTOR, 'img[srcset]')
        for img in images:
            try:
                # [-1] selects the last srcset entry, which has the largest resolution
                url = img.get_attribute('srcset').split(',')[-1].split()[0]
            except StaleElementReferenceException:
                # as you scroll down, old images are removed from the HTML,
                # so this error may be raised, but it is not a real problem
                continue
            if url not in urls:
                # scroll down so that new images are loaded
                driver.execute_script('arguments[0].scrollIntoView({block: "center", behavior: "smooth"});', img)
                urls.append(url)
                print(url)
                new_images = True
                file_name = url.split('/')[-1]
                # download the image
                urllib.request.urlretrieve(url, foldername + file_name)
                time.sleep(1)
        # if there are no new images, we have reached the bottom of the page
        if not new_images:
            break
        else:
            new_images = False
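The download step in the loop above can be separated out and hardened a little. The following is only a sketch under a few assumptions: the helper names `target_path` and `download_all` are hypothetical, the file name is taken from the last segment of the URL path, and the `fetch` parameter (defaulting to `urllib.request.urlretrieve`) exists only so the function can be tried without network access. It creates the target folder if needed and skips files that are already on disk, so an interrupted run can be resumed.

```python
import os
import urllib.request
from urllib.parse import urlparse


def target_path(folder, url):
    """Build the local path for a URL from the last segment of its path."""
    file_name = os.path.basename(urlparse(url).path)
    return os.path.join(folder, file_name)


def download_all(urls, folder, fetch=urllib.request.urlretrieve):
    """Download each URL into folder, skipping files that already exist.

    Returns the list of paths that were newly written.
    """
    os.makedirs(folder, exist_ok=True)  # create the folder on first use
    written = []
    for url in urls:
        path = target_path(folder, url)
        if os.path.exists(path):
            continue  # already downloaded in this or an earlier run
        fetch(url, path)
        written.append(path)
    return written
```

One possible way to use it would be to let the scrolling loop only collect `urls` and call `download_all(urls, foldername)` once at the end, which keeps the browser interaction and the file handling independent of each other.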