Click the "Download" button and download PDFs from Chrome using Selenium (Python)

Posted by k7fdbhmy on 2023-06-28 in Python

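I am trying to download a set of PDF files (as a zip) from a website. I managed to open the browser and click the correct button; however, the download does not happen.
I have tried changing the default download directory, adding more options, and so on, but nothing seems to work.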

from selenium import webdriver
import time
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.options import Options
from webdriver_manager.chrome import ChromeDriverManager

options = Options()
options.add_experimental_option('prefs',  {
    "download.default_directory": r"USING FULL PATH HERE\\",
    "download.prompt_for_download": False,
    "download.directory_upgrade": True,
    "plugins.always_open_pdf_externally": True
    }
)
service = Service(ChromeDriverManager().install())
driver = webdriver.Chrome(service=service, options=options)

driver.get('https://www.sciencedirect.com/journal/international-journal-of-greenhouse-gas-control/vol/127/suppl/C')

# Find the button using XPath
button = driver.find_element(By.XPATH, '//*[@id="react-root"]/div/div/div/main/section[1]/div/div/div/form/button/span')

# Click on the button
button.click()

time.sleep(5)

driver.quit()

Thank you for your feedback.

n1bvdmb6 (answer #1)

After clicking the button to start the download, you need to wait for the file to finish downloading before quitting the browser.

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Click on the button
button.click()

# Wait for the file to be downloaded
wait = WebDriverWait(driver, 10)
wait.until(EC.presence_of_element_located((By.TAG_NAME, 'html')))

driver.quit()

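This adds imports for WebDriverWait and expected_conditions, from selenium.webdriver.support.ui and selenium.webdriver.support respectively. These modules let you wait until a specific condition is met. In this example we wait until the html element is present. Note that this condition is already true once the page has loaded, so it does not by itself confirm that the download has finished; some wait tied to the download itself is still needed before driver.quit(). Hope this helps!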
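One way to wait for the download itself, rather than for a page element, is to poll the download directory until Chrome's partial files are gone. The following is only a minimal sketch: the helper name wait_for_downloads and its parameters are hypothetical (not part of Selenium), and it assumes the directory is empty before the click and that Chrome marks in-progress downloads with a .crdownload suffix.

```python
import time
from pathlib import Path


def wait_for_downloads(download_dir, timeout=60.0, poll_interval=0.5):
    """Hypothetical helper: block until download_dir contains at least one
    file and no Chrome partial-download (*.crdownload) files remain.

    Returns the sorted list of finished files, or raises TimeoutError if the
    download has not finished within `timeout` seconds.
    Assumes the directory was empty before the download started.
    """
    directory = Path(download_dir)
    deadline = time.monotonic() + timeout
    while True:
        # Snapshot the regular files currently in the directory.
        files = [p for p in directory.iterdir() if p.is_file()]
        # Chrome keeps the .crdownload suffix while a download is in progress.
        partial = [p for p in files if p.suffix == ".crdownload"]
        if files and not partial:
            return sorted(files)
        if time.monotonic() >= deadline:
            raise TimeoutError(f"download did not finish within {timeout} s")
        time.sleep(poll_interval)
```

In the script from the question, this would be called between button.click() and driver.quit(), passing the same path that was set as "download.default_directory". If the site blocks the download entirely, the helper simply times out.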

ufj5ltwl (answer #2)

Try the working code below:

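from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()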
driver.maximize_window()
driver.get('https://www.sciencedirect.com/journal/international-journal-of-greenhouse-gas-control/vol/127/suppl/C')
wait = WebDriverWait(driver, 30)
button = wait.until(EC.element_to_be_clickable((By.XPATH, "//span[text()='Download full issue']")))

# Click on the button
button.click()
driver.quit()

Note: I changed the XPath expression to make it more readable and less brittle.

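Update:

**Root cause:** The website detects the bot (your automation script) and blocks the download.
**Solution:** To work around this, use the third-party library undetected_chromedriver, which is designed to keep websites from detecting the automated browser. First install undetected_chromedriver (for example with pip install undetected-chromedriver).

Then use the working code below, which uses undetected_chromedriver: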

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import undetected_chromedriver as uc

driver = uc.Chrome()
driver.maximize_window()
driver.get('https://www.sciencedirect.com/journal/international-journal-of-greenhouse-gas-control/vol/127/suppl/C')
wait = WebDriverWait(driver, 30)
# Click on download button
wait.until(EC.element_to_be_clickable((By.XPATH, "//span[text()='Download full issue']"))).click()
# Close the pop-up window
wait.until(EC.element_to_be_clickable((By.XPATH, "(//span[@class='button-text'])[5]"))).click()
