Why is Selenium WebDriver in Python not returning all image links?

Posted by jgovgodb on 2022-11-10 in Python


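I'm using Selenium WebDriver to collect image URLs from a website that is loaded with JavaScript. The code below seems to return only about 160 of roughly 240 links. Why might this be happening? Is it because of the JavaScript rendering?
Is there a way to adjust my code to get around this?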

driver = webdriver.Chrome(ChromeDriverManager().install(), options = chrome_options)
driver.get('https://www.politicsanddesign.com/')
img_url = driver.find_elements_by_xpath("//div[@class='responsive-image-wrapper']/img")

img_url2 = []
for element in img_url:
    new_srcset = 'https:' + element.get_attribute("srcset").split(' 400w', 1)[0]
    img_url2.append(new_srcset)

selenium

Source: https://stackoverflow.com/questions/74305657/why-is-selenium-webdriver-in-python-not-returning-all-image-links

1 Answer


lxkprmvk #1

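You need to wait for all of those elements to load.
The recommended approach is an explicit wait using WebDriverWait with expected_conditions.
This code gives me 760-880 elements in the img_url2 list: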

import time

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

options = Options()
options.add_argument("start-maximized")

# Raw string so the backslashes in the Windows path are not treated as escapes
webdriver_service = Service(r'C:\webdrivers\chromedriver.exe')
driver = webdriver.Chrome(options=options, service=webdriver_service)
wait = WebDriverWait(driver, 10)

url = "https://www.politicsanddesign.com/"

# Once the browser opens, turn off the year filter and scroll all the way to
# the bottom: the page does not load all elements on initial rendering.
driver.get(url)
wait.until(EC.presence_of_all_elements_located((By.XPATH, "//div[@class='responsive-image-wrapper']/img")))

# time.sleep(2)

img_url = driver.find_elements(By.XPATH, "//div[@class='responsive-image-wrapper']/img")

img_url2 = []
for element in img_url:
    new_srcset = 'https:' + element.get_attribute("srcset").split(' 400w', 1)[0]
    img_url2.append(new_srcset)

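I'm not sure this code is stable enough, so if needed you can activate the delay (the commented-out time.sleep(2)) between the wait line and the next line to get all of those img_url elements.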
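One possible source of fragility in the loop above is that get_attribute("srcset") returns None when an image has no srcset attribute yet, which would make the .split() call raise. The following is only a minimal sketch of a more defensive parser. The helper name first_srcset_url is hypothetical, and it assumes the first srcset candidate is the one you want (the original code assumes the same by splitting at ' 400w').

```python
def first_srcset_url(srcset, scheme="https:"):
    """Return the URL of the first srcset candidate, or None if there is none.

    Hypothetical helper: assumes the first candidate is the desired image.
    """
    if not srcset:
        return None  # attribute missing or empty
    tokens = srcset.split()
    if not tokens:
        return None  # whitespace-only value
    # The first whitespace-separated token is the first URL; drop a trailing
    # comma in case the candidate has no width/density descriptor.
    url = tokens[0].rstrip(",")
    # Protocol-relative URLs ("//host/path") need a scheme prepended.
    return scheme + url if url.startswith("//") else url
```

In the loop, you could then append only the non-None results, for example by calling first_srcset_url(element.get_attribute("srcset")) and skipping elements for which it returns None.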
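EDIT:
Once the browser opens, you will need to turn off the page's filter and then scroll all the way to the bottom of the page, because it does not automatically load all elements when it renders; it only does so after you have interacted with the page a little.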
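The manual scrolling could probably be automated. Below is a rough sketch, not something tested against this particular site: it scrolls to the bottom repeatedly and stops once the number of matching elements no longer grows. The function name scroll_until_stable and the pause and max_rounds parameters are my own assumptions, and it does not turn off the filter, which you would still have to do by hand or with a separate click.

```python
import time


def scroll_until_stable(driver, xpath, pause=1.0, max_rounds=30):
    """Scroll to the bottom until no new elements matching xpath appear.

    Hypothetical helper; returns the final list of matching elements.
    """
    previous_count = -1
    # "xpath" is the string value of selenium's By.XPATH constant
    elements = driver.find_elements("xpath", xpath)
    for _ in range(max_rounds):
        if len(elements) == previous_count:
            break  # nothing new was loaded by the last scroll
        previous_count = len(elements)
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # give lazily loaded content time to render
        elements = driver.find_elements("xpath", xpath)
    return elements
```

With the driver from the code above, you would call it after driver.get(url), e.g. img_url = scroll_until_stable(driver, "//div[@class='responsive-image-wrapper']/img"). Stopping after a single round with no growth is a heuristic; on a slow connection a larger pause may be needed.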

Answered 2022-11-10
