Python: BeautifulSoup(response.content, 'html.parser') returns the wrong HTML structure

Posted by nhaq1z21 on 2024-01-05 in Python

Why does

    soup = BeautifulSoup(response.content, 'html.parser')

return

    <ul><li><li><li></li></li></li></ul>

instead of

    <ul><li></li><li></li><li></li></ul>

Full code:

    from datetime import datetime

    import requests
    from bs4 import BeautifulSoup


    def is_holiday_or_weekend():
        current_year = datetime.now().year
        today = datetime.now().strftime('%Y-%m-%d')
        url = f"https://www.kalendorius.today/nedarbo-dienos/{current_year}"

        # Start a session to maintain cookies
        session = requests.Session()
        try:
            # Send initial request to get PHPSESSID cookie
            session.get(url)
            headers = {
                'User-Agent': 'Mozilla/5.0',
                'Accept': 'application/json',
            }

            # Fetch the holiday data with headers and cookies
            response = session.get(url, headers=headers)
            response.raise_for_status()

            # Parse the HTML content
            soup = BeautifulSoup(response.content, 'html.parser')
            print(soup)  # the problem is here: wrong HTML structure

            # Extract <li> elements within <ul> of class 'calendar-items-list'
            holidays_elements = soup.find_all('ul', class_='calendar-items-list')
            holidays = {}
            for ul in holidays_elements:
                print(ul)
                for index, li in enumerate(ul.find_all('li')):
                    date, name = li.get_text().strip().split(' - ', 1)
                    if date not in holidays:  # Add this check to avoid duplicates
                        holidays[date] = name

            # Check if today is a holiday or a weekend
            if today in holidays or datetime.now().weekday() >= 5:
                return True
            return False
        except requests.RequestException as e:
            print(f"Error fetching holiday data: {e}")
            return None


    # Usage
    if is_holiday_or_weekend():
        print("Today is a holiday or weekend.")
    else:
        print("Today is a regular working day.")


How can I print each li element?
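One possible explanation for the nested output shown above, which the thread itself does not confirm, is that the page leaves its `<li>` tags unclosed. `html.parser` does not add implied closing tags, so each new `<li>` ends up inside the previous one. Under that assumption, here is a minimal sketch that sidesteps the tree shape by collecting each item's text with the standard library's event-based `HTMLParser` rather than BeautifulSoup. The `ListItemCollector` class and the `sample_html` string are made up for illustration and are not taken from the real page.

```python
from html.parser import HTMLParser


class ListItemCollector(HTMLParser):
    """Collect the text of every <li>, treating a new <li> as closing the previous one."""

    def __init__(self):
        super().__init__()
        self.items = []      # finished item texts
        self._parts = None   # text chunks of the <li> currently open, or None

    def _flush(self):
        # Store the open item (if any) and mark that no item is open.
        if self._parts is not None:
            text = "".join(self._parts).strip()
            if text:
                self.items.append(text)
        self._parts = None

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            self._flush()    # an unclosed previous <li> ends here
            self._parts = []

    def handle_endtag(self, tag):
        if tag in ("li", "ul"):
            self._flush()

    def handle_data(self, data):
        if self._parts is not None:
            self._parts.append(data)


# Hypothetical markup with unclosed <li> tags (an assumption, not the real page).
sample_html = (
    "<ul class='calendar-items-list'>"
    "<li>2024-01-01 - Naujieji metai"
    "<li>2024-02-16 - Lietuvos Valstybės atkūrimo diena"
    "<li>2024-03-11 - Nepriklausomybės atkūrimo diena"
    "</ul>"
)

collector = ListItemCollector()
collector.feed(sample_html)
collector.close()

for item in collector.items:
    print(item)
```

If installing another parser is an option, BeautifulSoup also accepts `'lxml'` or `'html5lib'` as the parser name. Those parsers generally close implied tags such as `<li>` much as a browser would, so switching parsers may be enough on its own.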

Answer #1, by tzcvj98z

Your code works fine for me and prints: "Today is a regular working day."
By the way, you can simplify how the holidays dictionary is built, like this:

    import json
    import re

    # Pull the JSON array out of the page's inline script and parse it
    script = soup.select_one("script[type='text/javascript']").text
    data = json.loads(re.search(r"return\s*(\[.+\])", script).group(1))
    holidays = {d["date"]: d["title"] for d in data}

Output:

    {
        '2024-01-01': 'Naujieji metai',
        '2024-02-13': 'Užgavėnės',
        '2024-02-16': ' Lietuvos Valstybės atkūrimo diena',
        '2024-03-11': 'Nepriklausomybės atkūrimo diena',
        '2024-03-31': 'Velykos',
        '2024-04-01': 'Velykų antroji diena',
        '2024-05-01': 'Tarptautinė darbo diena',
        '2024-05-05': 'Motinos diena',
        '2024-06-02': 'Tėvo diena',
        '2024-06-24': 'Joninės',
        '2024-07-06': 'Karaliaus Mindaugo karūnavimo diena',
        '2024-08-15': 'Žolinė',
        '2024-11-01': 'Visų šventųjų diena',
        '2024-11-02': 'Vėlinės',
        '2024-12-24': 'Šv. Kūčios',
        '2024-12-25': 'Šv. Kalėdos',
        '2024-12-26': 'Šv. Kalėdų antroji diena'
    }
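To try the regex-plus-JSON extraction without fetching the page, here is a small sketch that runs the same pattern on a made-up script body. The contents of `script_text` are an assumption (the real page's script is not shown in this thread), and the `None` check is an addition so that a layout change on the site produces an empty dictionary instead of an `AttributeError`.

```python
import json
import re

# Hypothetical inline script body; the real page's script is not shown in the thread.
script_text = """
function getEvents() {
    return [{"date": "2024-01-01", "title": "Naujieji metai"}, {"date": "2024-12-25", "title": "Šv. Kalėdos"}];
}
"""

# Same pattern as in the answer: capture the array that follows "return".
match = re.search(r"return\s*(\[.+\])", script_text)

holidays = {}
if match is not None:
    # The captured group is plain JSON, so json.loads can parse it directly.
    data = json.loads(match.group(1))
    holidays = {d["date"]: d["title"] for d in data}

print(holidays)
```

Because `.` does not match newlines by default, this pattern only works when the whole array sits on one line. Passing `re.DOTALL` as the flags argument would be one way to handle an array that spans several lines.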
