Using Selenium (Python) to click the "Download" button and download a PDF from Chrome

Posted by k7fdbhmy on 2023-06-28 in Python


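I am trying to download a set of PDF files (in a zip) from a website. I managed to open the browser and click the correct button; however, the download does not start.
I have already tried changing the default download directory, adding more options, and so on, but nothing seems to work.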

from selenium import webdriver
import time
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.options import Options
from webdriver_manager.chrome import ChromeDriverManager

options = Options()
options.add_experimental_option('prefs',  {
    "download.default_directory": r"USING FULL PATH HERE\\",
    "download.prompt_for_download": False,
    "download.directory_upgrade": True,
    "plugins.always_open_pdf_externally": True
    }
)
service = Service(ChromeDriverManager().install())
driver = webdriver.Chrome(service=service, options=options)

driver.get('https://www.sciencedirect.com/journal/international-journal-of-greenhouse-gas-control/vol/127/suppl/C')

# Find the button using XPath
button = driver.find_element(By.XPATH, '//*[@id="react-root"]/div/div/div/main/section[1]/div/div/div/form/button/span')

# Click on the button
button.click()

time.sleep(5)

driver.quit()

Thank you for your help.

python

Source: https://stackoverflow.com/questions/76565744/using-selenium-python-to-click-download-button-and-download-pdf-from-chrome

2 Answers


Answer 1 (n1bvdmb6)

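After clicking the button to start the download, you need to wait until the file has finished downloading before quitting the browser.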

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Click on the button
button.click()

# Wait for the file to be downloaded
wait = WebDriverWait(driver, 10)
wait.until(EC.presence_of_element_located((By.TAG_NAME, 'html')))

driver.quit()

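This adds imports for WebDriverWait (from selenium.webdriver.support.ui) and expected_conditions (from selenium.webdriver.support). These modules let you wait until a specific condition is met. In this example, we wait until the html element is present. Note that this condition is already satisfied as soon as the page has loaded, so on its own it does not confirm that the download has finished; checking the download directory for the completed file is a more reliable signal. Hope this helps!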
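One way to actually wait for the file is to poll the download directory until Chrome's temporary files are gone. The following is a minimal sketch, not part of the original answer: the function name `wait_for_download` and its parameters are hypothetical, it assumes the directory is empty before the download starts, and it assumes in-progress downloads carry a `.crdownload` (or briefly `.tmp`) suffix, as Chrome normally does.

```python
import time
from pathlib import Path

# Assumption: suffixes Chrome uses for files that are still being written.
IN_PROGRESS_SUFFIXES = (".crdownload", ".tmp")


def wait_for_download(directory, timeout=60.0, poll_interval=0.5):
    """Return the finished files in `directory`, or raise TimeoutError.

    Hypothetical helper: assumes `directory` was empty before the download.
    """
    folder = Path(directory)
    deadline = time.monotonic() + timeout
    while True:
        files = [p for p in folder.iterdir() if p.is_file()]
        in_progress = [p for p in files if p.suffix in IN_PROGRESS_SUFFIXES]
        # Done when at least one file exists and none is still being written.
        if files and not in_progress:
            return sorted(files)
        if time.monotonic() >= deadline:
            raise TimeoutError(f"no finished download in {folder} after {timeout}s")
        time.sleep(poll_interval)
```

In the script above, this would be called between `button.click()` and `driver.quit()`, passing the same path that was given to `download.default_directory`.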

Answered 2023-06-28

Answer 2 (ufj5ltwl)

Check the working code below:

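from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
driver.maximize_window()
driver.get('https://www.sciencedirect.com/journal/international-journal-of-greenhouse-gas-control/vol/127/suppl/C')
# Wait up to 30 seconds for the button to become clickable
wait = WebDriverWait(driver, 30)
button = wait.until(EC.element_to_be_clickable((By.XPATH, "//span[text()='Download full issue']")))

# Click on the button
button.click()
driver.quit()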

Note: I changed the XPath expression to make it more readable and less brittle.
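To see why a text-based locator tends to be less brittle than a positional one, here is a small offline sketch. It is an illustration only: the two markup snippets are invented, simplified stand-ins for the real page, and it uses the limited XPath subset of Python's `xml.etree.ElementTree` (Python 3.7+) rather than a browser.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup; the real page is far more complex.
ORIGINAL = (
    "<div id='react-root'><main><section><form><button>"
    "<span>Download full issue</span>"
    "</button></form></section></main></div>"
)
# Same button after an assumed redesign that adds one wrapper <div>.
REDESIGNED = (
    "<div id='react-root'><main><div><section><form><button>"
    "<span>Download full issue</span>"
    "</button></form></section></div></main></div>"
)

POSITIONAL = "./main/section/form/button/span"    # depends on exact nesting
TEXT_BASED = ".//span[.='Download full issue']"   # depends only on the label


def count_matches(markup, path):
    """Return how many elements in `markup` match the XPath-like `path`."""
    root = ET.fromstring(markup)
    return len(root.findall(path))


for name, markup in (("original", ORIGINAL), ("redesigned", REDESIGNED)):
    print(name, count_matches(markup, POSITIONAL), count_matches(markup, TEXT_BASED))
```

The positional path stops matching as soon as a wrapper element is added, while the text-based path still finds the button. The trade-off is that the text-based locator breaks if the button label changes.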

Update:

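**Root cause:** The website detects the bot (your automation script) and blocks the download.
**Solution:** To work around this, use the third-party library undetected_chromedriver, which is designed to keep websites from detecting the automated browser. You need to install it first (for example with pip install undetected-chromedriver).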

Then use the working code below, which is based on undetected_chromedriver:

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import undetected_chromedriver as uc

driver = uc.Chrome()
driver.maximize_window()
driver.get('https://www.sciencedirect.com/journal/international-journal-of-greenhouse-gas-control/vol/127/suppl/C')
wait = WebDriverWait(driver, 30)
# Click on download button
wait.until(EC.element_to_be_clickable((By.XPATH, "//span[text()='Download full issue']"))).click()
# Close the pop-up window
wait.until(EC.element_to_be_clickable((By.XPATH, "(//span[@class='button-text'])[5]"))).click()

Answered 2023-06-28
