pandas: How can I use a DataFrame to scrape URL links from a website with Python, and translate them with Google Translate?

Posted by nhaq1z21 on 2023-03-06 in Python

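I am trying to build a DataFrame. What I have so far works: it scrapes the course titles from the website. Now I am trying to write functions that use the DataFrame to count how many URL links the website contains. After that, the text content from the website has to be translated (English to Hindi). Can anyone help me with this?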

`# Scrape the course links from the classcentral.com website.
# This script uses the Selenium driver to access the web pages.
#

from selenium import webdriver
from selenium.webdriver.common.by import By
import time

url = "https://www.classcentral.com/collection/top-free-online-courses"
driver = webdriver.Chrome()
driver.get(url)
time.sleep(2)

all_courses = driver.find_element(by=By.CLASS_NAME, value='catalog-grid__results')
course_titles = all_courses.find_elements(by=By.CSS_SELECTOR, value='[class="color-charcoal course-name"]')

for title in course_titles:
    print(title.text)
`
nc1teljy #1

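I'm not sure I understand correctly, but if you want to load all the courses, you have to keep clicking "Load more" until the button is no longer available. You can get each course's URL from its href attribute: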

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions
import pandas as pd
import time

chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--headless')
chrome_options.add_argument("window-size=1920,1080")
driver = webdriver.Chrome(options=chrome_options)
url = "https://www.classcentral.com/collection/top-free-online-courses"
driver.get(url)

try:
    while True:
        # wait until button is clickable
        WebDriverWait(driver, 1).until(
                expected_conditions.element_to_be_clickable((By.XPATH, "//button[@data-name='LOAD_MORE']"))
            ).click()
        time.sleep(0.5)
except Exception:
    # most likely the wait timed out because the button is gone,
    # which means every course has been loaded
    pass

all_courses = driver.find_element(by=By.CLASS_NAME, value='catalog-grid__results')
courses = all_courses.find_elements(by=By.CSS_SELECTOR, value='[class="color-charcoal course-name"]')

df = pd.DataFrame([[course.text, course.get_attribute('href')] for course in courses],
                    columns=['Title (eng)', 'Link'])

Output:

Title (eng)                                               Link
0                        Medical Parasitology | 医学寄生虫学  https://www.classcentral.com/course/edx-medica...
1    Understanding Medical Research: Your Facebook ...  https://www.classcentral.com/course/medical-re...
2    An Introduction to Interactive Programming in ...  https://www.classcentral.com/course/interactiv...
3                                        Mountains 101  https://www.classcentral.com/course/mountains-...
4                       Quantum Mechanics for Everyone  https://www.classcentral.com/course/edx-quantu...
..                                                 ...                                                ...
260                          Web Security Fundamentals  https://www.classcentral.com/course/edx-web-se...
261  Viral Marketing and How to Craft Contagious Co...  https://www.classcentral.com/course/wharton-co...
262                              Introduction to Linux  https://www.classcentral.com/course/edx-introd...
263            Bitcoin and Cryptocurrency Technologies  https://www.classcentral.com/course/bitcointec...
264  Machine Learning Foundations: A Case Study App...  https://www.classcentral.com/course/ml-foundat...

[265 rows x 2 columns]
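
The question also asks for a link count and an English-to-Hindi translation, which the code above does not cover. Here is a minimal sketch of both steps, not a definitive implementation. The helper names `count_links`, `translate_titles`, and `fake_translate` are hypothetical, the `example.com` URLs are placeholders, and the translator is passed in as a plain callable so the sketch runs without network access.

```python
from typing import Callable, Iterable, List, Optional, Tuple


def count_links(links: Iterable[Optional[str]]) -> Tuple[int, int]:
    """Return (total, unique) link counts, skipping missing hrefs (None or '')."""
    present = [link for link in links if link]
    return len(present), len(set(present))


def translate_titles(titles: Iterable[str],
                     translate: Callable[[str], str]) -> List[str]:
    """Translate every title, calling `translate` only once per distinct title."""
    cache = {}
    translated = []
    for title in titles:
        if title not in cache:
            cache[title] = translate(title)
        translated.append(cache[title])
    return translated


def fake_translate(text: str) -> str:
    """Stand-in for a real translator (assumption: replace with a real service)."""
    return "[hi] " + text


if __name__ == "__main__":
    # Hypothetical rows shaped like the DataFrame columns above.
    titles = ["Mountains 101", "Introduction to Linux", "Mountains 101"]
    links = ["https://example.com/a", "https://example.com/b", None]

    total, unique = count_links(links)
    print("links found:", total, "unique:", unique)
    print(translate_titles(titles, fake_translate))
```

With the DataFrame from the answer, this could be applied as `count_links(df['Link'].tolist())` and `df['Title (hi)'] = translate_titles(df['Title (eng)'].tolist(), translator)`. One possible `translator`, assuming the third-party deep-translator package is installed, is `GoogleTranslator(source='en', target='hi').translate`. Caching repeated titles keeps the number of requests to the translation service down.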
