用漂亮的汤在Python中循环div

yqkkidmi  于 2022-12-17  发布在  Python
关注(0)|答案(1)|浏览(172)

尝试了解在python中循环非表格式数据的最佳方式(tr/td)
示例数据:https://www.nhlpa.com/the-pa/certified-agents?range=A-Z

尝试创建一个表的名称,头像URL,公司,地址,教育。
到目前为止,正在尝试执行以下操作,但似乎无法理解如何进入内容组件的div:

r=requests.get(url)
soup=BeautifulSoup(r.text, 'html5lib')
table = soup.find_all('div', attrs = {'class':'col-lg-6 agent'}) 
for a in table:
    if a.find('div', attrs = {'headshot'}):
        headshot_url=a.find('div', attrs = {'headshot'}).img```
5sxhfpxr

5sxhfpxr1#

希望这对〈3有帮助

r=requests.get(url)
print("fetched")
soup=BeautifulSoup(r.text, 'html.parser')
table = soup.find_all('div', attrs = {'class':'col-lg-6 agent'}) 
for a in table:
    headshots=a.find('div', attrs = {'headshot'})
    #find all divs with headshot class
    if headshots:
        #check if not None
        headshot_url=headshots.img["src"]
        #get the url
    else:
        headshot_url=None
        #So nothing gets wrong with our data sets
    
    content=a.find('div', attrs = {'content'})
    #find all divs with content class
    if content:
        #check if the div actually exist
        if content.h3:
            name=str(content.h3.contents[0]).replace("\xa0"," ")
        else:
            name=None
        if content.h5:
            company=content.h5.contents[0]
        else:
            company=None
    else:
        name,company=None,None
        #if content is None, then by default both of these None
    html_address=content.address
    
    if html_address:
        address=html_address.contents[0]
        #You might wanna edit this if you want
    else:
        address=None
    
    edu=a.find("div",attrs={'education'}).find("div",attrs={"class":None})
    #find all divs with education class
    
    if edu:
        education=edu.contents[0]
        
    else:
        education=None
    
    #YOUR FINAL DATA SET IS:
    data_set={"headshot_url":headshot_url,"name":name,"company":company,'address':address,'education':education}

相关问题