Python: BeautifulSoup(response.content, 'html.parser') returns the wrong HTML structure

nhaq1z21 posted 9 months ago in Python

Why does

soup = BeautifulSoup(response.content, 'html.parser')

return

<ul><li><li><li></li></li></li></ul>


instead of

<ul><li></li><li></li><li></li></ul>


Full code:

from datetime import datetime
import requests
from bs4 import BeautifulSoup

def is_holiday_or_weekend():
    current_year = datetime.now().year
    today = datetime.now().strftime('%Y-%m-%d')

    url = f"https://www.kalendorius.today/nedarbo-dienos/{current_year}"

    # Start a session to maintain cookies
    session = requests.Session()

    try:
        # Send initial request to get PHPSESSID cookie
        session.get(url)
        headers = {
            'User-Agent': 'Mozilla/5.0',
            'Accept': 'application/json',
        }

        # Fetch the holiday data with headers and cookies
        response = session.get(url, headers=headers)
        response.raise_for_status()

        # Parse the HTML content
        soup = BeautifulSoup(response.content, 'html.parser')
        print(soup)  # the problem shows up here: wrong HTML structure
        # Extract <li> elements within <ul> of class 'calendar-items-list'
        holidays_elements = soup.find_all('ul', class_='calendar-items-list')
        holidays = {}
        for ul in holidays_elements:
            print(ul)
            for index, li in enumerate(ul.find_all('li')):

                date, name = li.get_text().strip().split(' - ', 1)
                if date not in holidays:  # Add this check to avoid duplicates
                    holidays[date] = name

        # Check if today is a holiday or a weekend
        if today in holidays or datetime.now().weekday() >= 5:
            return True

        return False

    except requests.RequestException as e:
        print(f"Error fetching holiday data: {e}")
        return None

# Usage
if is_holiday_or_weekend():
    print("Today is a holiday or weekend.")
else:
    print("Today is a regular working day.")


How can I print each li element?

tzcvj98z (answer #1)

Your code works fine for me and prints: "Today is a regular working day."
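One possible explanation for the nested output, offered as an assumption since it was not reproduced here: the page markup may leave its `<li>` tags unclosed. Python's built-in parser, which BeautifulSoup's 'html.parser' builds on, reports tags exactly as written and does not add the implied `</li>`, so each new item ends up inside the previous one. The sketch below uses a made-up `sample` string and a hypothetical `TagCounter` class, not the site's real HTML, to show that no closing `</li>` is ever seen.

```python
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Hypothetical helper: counts <li> start and end tags as written in the markup."""

    def __init__(self):
        super().__init__()
        self.li_open = 0
        self.li_close = 0

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            self.li_open += 1

    def handle_endtag(self, tag):
        if tag == "li":
            self.li_close += 1


# Assumed markup: three <li> items, none of them explicitly closed.
sample = "<ul><li>2024-01-01 - A<li>2024-02-16 - B<li>2024-03-11 - C</ul>"

counter = TagCounter()
counter.feed(sample)
counter.close()

# The parser sees three openings and no closings for <li>.
print(counter.li_open, counter.li_close)
```

If that is what is happening, passing 'lxml' or 'html5lib' instead of 'html.parser' (each needs a separate install) should give sibling `<li>` elements, because those parsers apply HTML's implied end tag rules.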
By the way, you can simplify how the holidays dictionary is built, like this:

import re, json

data = json.loads(re.search(r"return\s*(\[.+\])",
    soup.select_one("script[type='text/javascript']").text).group(1))

holidays = {d["date"]: d["title"] for d in data}

Output:

{
    '2024-01-01': 'Naujieji metai',
    '2024-02-13': 'Užgavėnės',
    '2024-02-16': ' Lietuvos Valstybės atkūrimo diena',
    '2024-03-11': 'Nepriklausomybės atkūrimo diena',
    '2024-03-31': 'Velykos',
    '2024-04-01': 'Velykų antroji diena',
    '2024-05-01': 'Tarptautinė darbo diena',
    '2024-05-05': 'Motinos diena',
    '2024-06-02': 'Tėvo diena',
    '2024-06-24': 'Joninės',
    '2024-07-06': 'Karaliaus Mindaugo karūnavimo diena',
    '2024-08-15': 'Žolinė',
    '2024-11-01': 'Visų šventųjų diena',
    '2024-11-02': 'Vėlinės',
    '2024-12-24': 'Šv. Kūčios',
    '2024-12-25': 'Šv. Kalėdos',
    '2024-12-26': 'Šv. Kalėdų antroji diena'
}
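To try the regex-plus-JSON approach without hitting the site, here is a minimal offline sketch. The `script_text` string is a made-up stand-in for the page's inline script (the real script may be shaped differently), and `parse_holidays` and `is_day_off` are hypothetical helper names, not part of the original code.

```python
import json
import re
from datetime import date

# Assumed shape of the inline script; the real page content may differ.
script_text = (
    'function getEvents() { return [{"date": "2024-01-01", "title": "Naujieji metai"}, '
    '{"date": "2024-02-16", "title": "Lietuvos Valstybės atkūrimo diena"}]; }'
)


def parse_holidays(text):
    """Extract the JSON array after `return` and map each date to its title."""
    match = re.search(r"return\s*(\[.+\])", text)
    if match is None:
        # No embedded array found: return an empty mapping instead of failing.
        return {}
    return {item["date"]: item["title"] for item in json.loads(match.group(1))}


def is_day_off(day, holidays):
    """True if `day` is a listed holiday or falls on Saturday/Sunday."""
    return day.isoformat() in holidays or day.weekday() >= 5


holidays = parse_holidays(script_text)
print(holidays)
print(is_day_off(date(2024, 1, 1), holidays))   # listed holiday
print(is_day_off(date(2024, 2, 17), holidays))  # a Saturday
print(is_day_off(date(2024, 1, 2), holidays))   # an ordinary Tuesday
```

Checking `re.search` for `None` before calling `.group(1)` keeps the lookup from raising `AttributeError` if the page layout changes and the script no longer contains the array.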
