Retrieving data from a web page with Selenium does not return all of the data

bpsygsoo, posted on 2023-05-29 in Other

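I am trying to retrieve data (coin name, price, market cap, and circulating supply) from coinmarketcap.com, but when I run the code below I only get 11 coin names. I also cannot retrieve any of the other fields. I have tried several selectors, none of which worked. My goal is to store the data in a DataFrame so that I can analyze it.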

    from selenium import webdriver
    # The next three imports are only needed for the commented-out wait near the end
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC

    driver = webdriver.Chrome(r'C:\Users\Ejer\PycharmProjects\pythonProject\chromedriver')
    driver.get('https://coinmarketcap.com/')
    Crypto = driver.find_elements_by_xpath("//div[contains(concat(' ', normalize-space(@class), ' '), 'sc-16r8icm-0 sc-1teo54s-1 lgwUsc')]")
    #price = driver.find_elements_by_xpath('//td[@class="cmc-link"]')
    #coincap = driver.find_elements_by_xpath('//td[@class="DAY"]')
    CMC_list = []
    for c in range(len(Crypto)):
        CMC_list.append(Crypto[c].text)
    print(CMC_list)
    #driver.get('https://coinmarketcap.com/')
    #print([my_elem.text for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//table[contains(@class, 'cmc-table')]//tbody//tr//td/a//p[@color='text']")))[:50]])
    driver.close()

xxslljrj #1

Try the following line of code to get all of the values on the page:

    cryptos = [name.text for name in driver.find_elements_by_xpath('//td[3]/a[@class="cmc-link" and starts-with(@href, "/currencies/")]//p[@color="text"]')]
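
One possible reason for seeing only about 11 names is that the page fills in the remaining rows as they scroll into view. That is an assumption about the site, not something confirmed in this thread. The sketch below scrolls in steps and gathers the matching text after each step. `collect_names`, `NAME_XPATH`, `steps`, and `pause` are hypothetical names introduced here, and the call uses the Selenium 4 form `find_elements(by, value)`, because the `find_elements_by_xpath` helpers used above were removed in Selenium 4.3.

```python
import time

# Selenium's By.XPATH constant is the plain string "xpath"; using the string
# directly keeps this helper importable even where Selenium is not installed.
XPATH = "xpath"

# XPath taken from the answer above.
NAME_XPATH = (
    '//td[3]/a[@class="cmc-link" and starts-with(@href, "/currencies/")]'
    '//p[@color="text"]'
)


def collect_names(driver, xpath=NAME_XPATH, steps=10, pause=0.5):
    """Scroll down `steps` times, collecting unique element texts along the way."""
    seen = []
    for step in range(steps + 1):
        if step:
            # Scroll one viewport further and give the page time to render.
            driver.execute_script("window.scrollBy(0, window.innerHeight);")
            time.sleep(pause)
        for element in driver.find_elements(XPATH, xpath):
            text = element.text.strip()
            if text and text not in seen:
                seen.append(text)
    return seen
```

With a Selenium 4 driver that has already loaded `https://coinmarketcap.com/`, this could be called as `names = collect_names(driver)`. The values of `steps` and `pause` are guesses and would likely need tuning, and the XPath depends on the site's markup at the time of the answer.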

tsm1rwdh #2

Try using BeautifulSoup to scrape the CoinMarketCap data:

    import requests
    from bs4 import BeautifulSoup

    data_list = []
    crypto_count = 0
    for page in range(1, 100):
        url = f'https://coinmarketcap.com/?page={page}'
        response = requests.get(url)
        soup = BeautifulSoup(response.text, 'html.parser')
        rows = soup.find('table', {'class': 'sc-beb003d5-3 ieTeVa cmc-table'}).find('tbody').find_all('tr')
        crypto_list = []
        for row in rows:
            dic = {}
            cells = row.find_all('td')
            # Keep only rows that contain every expected column
            if len(cells) >= 10:
                dic['Name'] = cells[2].text.strip()
                dic['Price'] = cells[3].text.strip().replace(',', '')
                dic['OneH'] = cells[4].text.strip()
                dic['TwentyfourH'] = cells[5].text.strip()
                dic['SevenD'] = cells[6].text.strip()
                dic['MarketCap'] = cells[7].text.strip().replace(',', '')
                dic['Volume'] = cells[8].text.strip().replace(',', '')
                dic['CirculatingSupply'] = cells[9].text.strip().replace(',', '')
                crypto_list.append(dic)
                crypto_count += 1
                if crypto_count == 1000:
                    break
        data_list.append(crypto_list)
        # The inner break only leaves the row loop, so stop paging here as well
        if crypto_count >= 1000:
            break
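
Since the goal in the question is to analyze the data in a DataFrame, the scraped strings would first need to become numbers. The following is a minimal sketch of that step, assuming each cell holds a single value such as `$27,123.45` or `19,386,118 BTC`; if a cell contains several numbers, only the first is kept. `to_number` and `clean_record` are hypothetical helpers, not part of the answer above.

```python
import re


def to_number(text):
    """Return the first number found in `text` as a float, or None if there is none."""
    match = re.search(r"-?\d+(?:\.\d+)?", text.replace(",", ""))
    return float(match.group()) if match else None


def clean_record(record, numeric_keys=("Price", "MarketCap", "Volume", "CirculatingSupply")):
    """Copy one scraped row, converting the listed fields from strings to floats."""
    cleaned = dict(record)
    for key in numeric_keys:
        if key in cleaned:
            cleaned[key] = to_number(cleaned[key])
    return cleaned
```

Because `data_list` holds one list of rows per page, the rows can be flattened and cleaned with `[clean_record(r) for page_rows in data_list for r in page_rows]`, and the resulting list of dictionaries can be passed to `pandas.DataFrame(...)`.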
