How to scrape multiple pages into Excel with Python?

yftpprvb posted on 2023-08-02 in Python

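I am trying to extract the data from this table, but I cannot do it for the subsequent pages.
URL: https://securities.stanford.edu/filings.html?page=1
It only works for page=1.
I tried Beautiful Soup, but I get no response for page 2, page 3, and so on. I need some help converting all of the table data to Excel.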

    import requests
    from bs4 import BeautifulSoup

    def opencodezscraping(webpage, page_number):
        next_page = webpage + str(page_number)
        response = requests.get(next_page)
        soup = BeautifulSoup(response.content, "html.parser")
        soup_table = soup.find('table', {"class": "table table-bordered table-striped table-hover"})
        # Skip the header row and print the cell text of every data row
        for j in soup_table.find_all('tr')[1:]:
            row_data = j.find_all('td')
            row = [i.text for i in row_data]
            print(row)
        # Generating the next page url
        if page_number < 16:
            page_number = page_number + 1
            opencodezscraping(webpage, page_number)

    # calling the function with relevant parameters
    opencodezscraping('https://securities.stanford.edu/filings.html?page=', 2)


vulvrdjw (answer #1)

No need to make it hard for yourself. pandas has a .read_html() function that does what you want.

    >>> import pandas as pd
    >>>
    >>> dfs = pd.read_html('https://securities.stanford.edu/filings.html?page=1')
    >>>
    >>> len(dfs)
    1
    >>> df = dfs[0]
    >>> df
               Filing Name  Filing Date  District Court     Exchange  Ticker
    0            AT&T Inc.   07/28/2023   D. New Jersey  New York SE       T
    1  Syneos Health, Inc.   07/27/2023   S.D. New York       NASDAQ    SYNH
    ...

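Then you can combine the per-page DataFrames: append them to a list or dict and join them with pd.concat() (DataFrame.stack() reshapes a single frame rather than joining several), whatever suits you.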
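As a rough sketch of that last step, the loop below reads each page with pd.read_html(), keeps the first table, and joins everything with pd.concat(). The function name scrape_pages, its read parameter, the output file name filings.xlsx, and the page range 1 to 16 (taken from the limit in the question's code) are assumptions for illustration, not something confirmed against the site.

```python
import pandas as pd


def scrape_pages(base_url, first_page, last_page, read=pd.read_html):
    """Return one DataFrame holding the first table of every page in the range.

    `read` is a hypothetical hook so the fetch step can be swapped out
    (for example in tests); by default it is pandas.read_html.
    """
    frames = []
    for page in range(first_page, last_page + 1):
        # read_html returns a list of DataFrames, one per <table> on the page
        tables = read(f"{base_url}{page}")
        frames.append(tables[0])
    # ignore_index=True renumbers the rows 0..n-1 across all pages
    return pd.concat(frames, ignore_index=True)


# Example call (needs network access; to_excel needs an Excel writer such as openpyxl).
# The page range and file name are assumptions:
# df = scrape_pages("https://securities.stanford.edu/filings.html?page=", 1, 16)
# df.to_excel("filings.xlsx", index=False)
```

Collecting the frames in a list and concatenating once at the end is usually cheaper than growing a DataFrame inside the loop. If the site rejects or throttles requests, this sketch will simply raise, since it does no retrying or rate limiting.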
