I'm trying to scrape this page: https://www.semi.org/en/resources/member-directory
On its own, this line seems to work fine and returns the link I want:
link = browser.find_element(By.CLASS_NAME, "member-company__title").find_element(By.TAG_NAME, 'a').get_attribute('href')
But when I nest the code inside a for loop, I get an error saying the CSS selector can't find the element. I tried an XPath instead, but it only reaches the first container.
Here is my code:
import time

import pandas as pd
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.common.exceptions import NoSuchElementException

results_df = pd.DataFrame({'Company Name': [], 'Join Date': [], 'Company ID': [],
                           'Company Description': [], 'Link': [], 'Primary Industry': [],
                           'Primary Product Category': [], 'Primary Sub Product Category': [],
                           'Keywords': [], 'Address': []})

browser = webdriver.Chrome()

# Load the desired URL
another_url = "https://www.semi.org/en/resources/member-directory"
browser.get(another_url)
time.sleep(3)

containers = browser.find_elements(By.TAG_NAME, 'tr')
for i in range(len(containers)):
    container = containers[i]
    link = container.find_element(By.TAG_NAME, 'a').get_attribute('href')
    browser.get(link)
    print("Page navigated after click: " + browser.title)
    time.sleep(3)
    company_name = browser.find_element(By.CLASS_NAME, "page-title").text
    try:
        join_date = browser.find_element(By.CLASS_NAME, "member-company__join-date").find_element(By.TAG_NAME, 'span').text
    except NoSuchElementException:
        join_date = "None"
    try:
        c_ID = browser.find_element(By.CLASS_NAME, "member-company__company-id").find_element(By.TAG_NAME, 'span').text
    except NoSuchElementException:
        c_ID = "None"
    try:
        company_description = browser.find_element(By.CLASS_NAME, "member-company__description").text
    except NoSuchElementException:
        company_description = "None"
    try:
        company_link = browser.find_element(By.CLASS_NAME, "member-company__website").find_element(By.TAG_NAME, 'div').get_attribute('href')
    except NoSuchElementException:
        company_link = "None"
    try:
        primary_industry = browser.find_element(By.CLASS_NAME, "member-company__primary-industry").find_element(By.TAG_NAME, 'div').text
    except NoSuchElementException:
        primary_industry = "None"
    try:
        primary_product_cat = browser.find_element(By.CLASS_NAME, "member-company__primary-product-category").find_element(By.TAG_NAME, 'div').text
    except NoSuchElementException:
        primary_product_cat = "None"
    try:
        primary_sub_product_cat = browser.find_element(By.CLASS_NAME, "member-company__primary-product-subcategory").find_element(By.TAG_NAME, 'div').text
    except NoSuchElementException:
        primary_sub_product_cat = "None"
    try:
        keywords = browser.find_element(By.CLASS_NAME, "member-company__keywords").find_element(By.TAG_NAME, 'div').text
    except NoSuchElementException:
        keywords = "None"
    try:
        address = browser.find_element(By.CLASS_NAME, "member-company__address").text.replace("Street Address", "")
    except NoSuchElementException:
        address = "None"
    browser.get(another_url)
    time.sleep(5)
    result_df = pd.DataFrame({"Company Name": [company_name],
                              "Join Date": [join_date],
                              "Company ID": [c_ID],
                              "Company Description": [company_description],
                              "Company Website": [company_link],
                              "Primary Industry": [primary_industry],
                              "Primary Product Category": [primary_product_cat],
                              "Primary Sub Product Category": [primary_sub_product_cat],
                              "Keywords": [keywords],
                              "Address": [address]})
    results_df = pd.concat([results_df, result_df])

results_df.reset_index(drop=True, inplace=True)
results_df.to_csv('semi_test', index=False)
browser.close()
What is going on here?
1 Answer
This is mainly down to the statement
containers = browser.find_elements(By.TAG_NAME, 'tr')
If you print out containers, you will notice that the first row selected is the table header, which contains no link, so your script fails with exactly the exception you are seeing. You can patch that with containers = containers[1:], but you will then run into a StaleElementReferenceException, because the row elements you collected on the main page are invalidated as soon as you open a detail link and navigate back. Instead of returning to the main page again and again, you should scrape all of the links from the listing page in a single pass, then iterate over those URL strings and scrape each detail page.
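The fix can be sketched as a small helper that reads every href as a plain string before the browser ever leaves the listing page; strings cannot go stale, unlike WebElements. This is a minimal sketch, not tested against semi.org's live markup. It relies on the fact that Selenium's `By.TAG_NAME` locator is just the literal string `'tag name'`, so the helper itself has no hard dependency on a selenium import:

```python
def row_links(rows):
    """Return the href of the first <a> in each row, skipping rows
    (such as the header <tr>) that contain no link at all.

    `rows` can be any iterable of objects exposing Selenium's
    find_elements(by, value) / get_attribute(name) interface.
    Returning plain strings means nothing can go stale after the
    browser navigates away from the listing page.
    """
    links = []
    for row in rows:
        anchors = row.find_elements('tag name', 'a')  # By.TAG_NAME == 'tag name'
        if anchors:
            links.append(anchors[0].get_attribute('href'))
    return links
```

With that in place, the loop becomes `links = row_links(browser.find_elements(By.TAG_NAME, 'tr'))` followed by `for link in links: browser.get(link)` and the same per-page scraping as before; the `browser.get(another_url)` round trip at the bottom of the loop is no longer needed.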