Python: How can I scrape all the data at once from a website that only shows more results after scrolling?

x7yiwoj4, posted on 2024-01-05 in Python

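I am trying to scrape the names and addresses of colleges from this website: https://www.collegenp.com/2-science-colleges/. The problem is that I only get data for the first 11 colleges in the list and nothing for the rest. I have tried everything I know, but nothing has worked.
My code is: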

    from selenium import webdriver
    import bs4
    from bs4 import BeautifulSoup
    import requests
    import pandas as pd
    from time import sleep

    driver=webdriver.Chrome('C:/Users/acer/Downloads/chromedriver.exe')
    driver.get('https://www.collegenp.com/2-science-colleges/')
    driver.refresh()
    sleep(20)
    page=requests.get("https://www.collegenp.com/2-science-colleges/")
    college = []
    location=[]
    soup= BeautifulSoup(page.content,'html.parser')
    for a in soup.find_all('div',attrs={'class':'media'}):
        name=a.find('h3',attrs={'class':'college-name'})
        college.append(name.text)
        loc=a.find('span',attrs={'class':'college-address'})
        location.append(loc.text)
    df=pd.DataFrame({'College name':college,'Locations':location})
    df.to_csv('hell.csv',index=False,encoding='utf-8')

Is there any way I can scrape all of the data?

Answer #1, by q35jwt9p

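You can use this code to get the information from the subsequent pages. It sends a POST request with the `X-Requested-With: XMLHttpRequest` header and raises `count` by 10 for each page: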

    import requests
    from bs4 import BeautifulSoup
    import pandas as pd

    url = "https://www.collegenp.com/2-science-colleges/"
    headers = {"X-Requested-With": "XMLHttpRequest"}
    data = {"state": "on", "action": "filter", "count": "0"}

    all_data = []
    for page in range(0, 5):  # <-- increase number of pages here
        print("Getting page {}..".format(page))
        data["count"] = page * 10
        soup = BeautifulSoup(
            requests.post(url, data=data, headers=headers).content,
            "html.parser",
        )
        for c in soup.select(".college-name"):
            all_data.append(
                {
                    "College name": c.get_text(strip=True),
                    "Location": c.find_next(class_="college-address").get_text(
                        strip=True
                    ),
                }
            )

    df = pd.DataFrame(all_data)
    print(df)
    df.to_csv("data.csv", index=False)

Prints:

        College name                                       Location
    0   Caspian Valley College,Lalitpur                    Kumaripati, Lalitpur
    1   Advance Academy and Republica College,Lalitpur     Kumaripati, Lalitpur
    2   Araniko International Academy,Lalitpur             Satdobato, Lalitpur
    3   Bagiswori Secondary School, Taulachhen, Bhakta...  Chyamhasing, Bhaktapur
    4   Bajra Barahi Secondary School,Lalitpur             Chapagaon, Lalitpur
    5   Bhanubhakta Memorial College,Kathmandu             Lazimpat, Kathmandu
    6   Damak Model Secondary School,Jhapa                 Damak, Jhapa
    7   Damak Multiple Campus,Jhapa                        Damak, Jhapa
    8   Einstein Academy,Lalitpur                          Thasikhel, Lalitpur
    9   Hari Khetan Multiple Campus,Parsa                  Birganj, Parsa
    10  Kankai Adarsha Campus,Morang                       Birtamode, Morang
    11  Lumbini Adarsh Degree College,Nawalparasi          Kawasoti, Nawalparasi
    12  Madhyabindu Multiple Campus,Nawalparasi            Kawasoti, Nawalparasi
    13  Marshyangdi Multiple Campus,Lamjung                Besishahar, Lamjung
    ...


and saves data.csv (screenshot from LibreOffice):

[Screenshot: data.csv opened in LibreOffice]
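If you would rather not guess the number of pages, a variation on the loop above is to keep requesting batches until one adds nothing new. The following is only a rough sketch, not part of the original answer: `fetch_page` and `scrape_all` are hypothetical helper names, and it assumes (unverified) that the site returns an empty or repeated batch once `count` runs past the last college. The `max_pages` cap is there in case that assumption is wrong.

```python
from typing import Callable, Dict, List

Row = Dict[str, str]


def fetch_page(offset: int) -> List[Row]:
    """Fetch one batch of colleges starting at `offset` (needs network access)."""
    # Imported here so scrape_all() below can be used without these packages.
    import requests
    from bs4 import BeautifulSoup

    url = "https://www.collegenp.com/2-science-colleges/"
    headers = {"X-Requested-With": "XMLHttpRequest"}
    data = {"state": "on", "action": "filter", "count": offset}

    response = requests.post(url, data=data, headers=headers, timeout=30)
    response.raise_for_status()
    soup = BeautifulSoup(response.content, "html.parser")

    rows = []
    for name in soup.select(".college-name"):
        address = name.find_next(class_="college-address")
        rows.append(
            {
                "College name": name.get_text(strip=True),
                "Location": address.get_text(strip=True) if address else "",
            }
        )
    return rows


def scrape_all(
    fetch: Callable[[int], List[Row]],
    page_size: int = 10,
    max_pages: int = 200,
) -> List[Row]:
    """Call `fetch` with growing offsets until a batch adds no new rows."""
    seen = set()
    all_rows: List[Row] = []
    for page in range(max_pages):  # safety cap (assumed value)
        added = 0
        for row in fetch(page * page_size):
            key = (row["College name"], row["Location"])
            if key not in seen:
                seen.add(key)
                all_rows.append(row)
                added += 1
        if added == 0:  # empty or fully repeated batch: assume we are done
            break
    return all_rows


# Usage (performs real HTTP requests):
# rows = scrape_all(fetch_page)
# pd.DataFrame(rows).to_csv("data.csv", index=False)
```

Passing the fetch function in as an argument keeps the stopping logic separate from the HTTP call, so you can try the loop against canned data before pointing it at the site.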

