Can't find element with Selenium CSS selector, even though it works fine on its own

m2xkgtsf · asked 2023-02-12

I'm trying to scrape this page: https://www.semi.org/en/resources/member-directory

On its own, this line seems to work fine: link = browser.find_element(By.CLASS_NAME, "member-company__title").find_element(By.TAG_NAME, 'a').get_attribute('href') returns the link I want. But when I nest the code inside a for loop, I get an error saying the CSS selector can't find the element. I also tried an XPath, but that only ever reaches the first container.

Here is my code:

    import time

    import pandas as pd
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.common.exceptions import NoSuchElementException

    results_df = pd.DataFrame({'Company Name': [], 'Join Date': [], 'Company ID': [],
                               'Company Description': [], 'Link': [], 'Primary Industry': [],
                               'Primary Product Category': [], 'Primary Sub Product Category': [],
                               'Keywords': [], 'Address': []})
    browser = webdriver.Chrome()
    # Load the desired URL
    another_url = "https://www.semi.org/en/resources/member-directory"
    browser.get(another_url)
    time.sleep(3)
    containers = browser.find_elements(By.TAG_NAME, 'tr')
    for i in range(len(containers)):
        container = containers[i]
        link = container.find_element(By.TAG_NAME, 'a').get_attribute('href')
        browser.get(link)
        print("Page navigated after click" + browser.title)
        time.sleep(3)
        company_name = browser.find_element(By.CLASS_NAME, "page-title").text
        try:
            join_date = browser.find_element(By.CLASS_NAME, "member-company__join-date").find_element(By.TAG_NAME, 'span').text
        except NoSuchElementException:
            join_date = "None"
        try:
            c_ID = browser.find_element(By.CLASS_NAME, "member-company__company-id").find_element(By.TAG_NAME, 'span').text
        except NoSuchElementException:
            c_ID = "None"
        try:
            company_description = browser.find_element(By.CLASS_NAME, "member-company__description").text
        except NoSuchElementException:
            company_description = "None"
        try:
            company_link = browser.find_element(By.CLASS_NAME, "member-company__website").find_element(By.TAG_NAME, 'div').get_attribute('href')
        except NoSuchElementException:
            company_link = "None"
        try:
            primary_industry = browser.find_element(By.CLASS_NAME, "member-company__primary-industry").find_element(By.TAG_NAME, 'div').text
        except NoSuchElementException:
            primary_industry = "None"
        try:
            primary_product_cat = browser.find_element(By.CLASS_NAME, "member-company__primary-product-category").find_element(By.TAG_NAME, 'div').text
        except NoSuchElementException:
            primary_product_cat = "None"
        try:
            primary_sub_product_cat = browser.find_element(By.CLASS_NAME, "member-company__primary-product-subcategory").find_element(By.TAG_NAME, 'div').text
        except NoSuchElementException:
            primary_sub_product_cat = "None"
        try:
            keywords = browser.find_element(By.CLASS_NAME, "member-company__keywords ").find_element(By.TAG_NAME, 'div').text
        except NoSuchElementException:
            keywords = "None"
        try:
            address = browser.find_element(By.CLASS_NAME, "member-company__address").text.replace("Street Address", "")
        except NoSuchElementException:
            address = "None"
        browser.get(another_url)
        time.sleep(5)
        result_df = pd.DataFrame({"Company Name": [company_name],
                                  "Join Date": [join_date],
                                  "Company ID": [c_ID],
                                  "Company Description": [company_description],
                                  "Company Website": [company_link],
                                  "Primary Industry": [primary_industry],
                                  "Primary Product Category": [primary_product_cat],
                                  "Primary Sub Product Category": [primary_sub_product_cat],
                                  "Keywords": [keywords],
                                  "Address": [address]})
        results_df = pd.concat([results_df, result_df])
    results_df.reset_index(drop=True, inplace=True)
    results_df.to_csv('semi_test', index=False)
    browser.close()

What is going wrong here?


ncgqoxb0 · Answer #1

This is mainly caused by the statement containers = browser.find_elements(By.TAG_NAME, 'tr'). If you print out containers, you will notice that the first row selected is the table header, which contains no link, so your script fails with exactly the exception you are seeing. You could patch that with containers = containers[1:], but you would then run into a StaleElementReferenceException, because after navigating to a company page and back to the main page, the element references you collected earlier no longer point at live DOM nodes. Instead of returning to the main page again and again, scrape all the links from the page once, then iterate over that list of plain URL strings to scrape each company page.
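The suggestion above can be sketched as follows. collect_links is a hypothetical helper name; it skips rows that contain no link, which handles the header row in the same pass. The broad except is used only so the sketch stays self-contained; in real Selenium code you would catch NoSuchElementException.

```python
# Sketch of the suggested fix, assuming the same page structure as in the
# question: turn the <tr> elements into plain URL strings up front, so later
# page loads cannot invalidate them.

def collect_links(rows):
    """Return the href of the first link in each row, skipping rows that
    contain no <a> at all (e.g. the table's header row)."""
    links = []
    for row in rows:
        try:
            # 'tag name' is the string value behind Selenium's By.TAG_NAME
            links.append(row.find_element('tag name', 'a').get_attribute('href'))
        except Exception:  # NoSuchElementException in real Selenium
            continue  # header or decorative row: nothing to follow
    return links

# Usage sketch against the question's page (not run here):
# browser.get("https://www.semi.org/en/resources/member-directory")
# time.sleep(3)
# links = collect_links(browser.find_elements(By.TAG_NAME, 'tr'))
# for link in links:        # plain strings: safe across page navigations
#     browser.get(link)
#     ...scrape one company page, then move on; no need to go back...
```

Because the loop then iterates over strings rather than WebElement references, there is nothing left on the old page to go stale.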
