pandas: extracting values from a span

Posted by 93ze6v8z on 2024-01-04 in: Other

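I want to extract snow depth values from a weather site (https://www.yr.no/nb/sn%C3%B8dybder/NO-46/Norge/Vestland) into a grid, in particular the snow depth for the Jordalen - Nåsen area.
The closest I have got is printing all the values with the following code: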

import pandas as pd
import requests 
from bs4 import BeautifulSoup 

r=requests.get('https://www.yr.no/nb/sn%C3%B8dybder/NO-46/Norge/Vestland')
soup = BeautifulSoup(r.content, 'html.parser') 

result=soup.find_all("span", {"class": "snow-depth__value"})

print(result)

However, I have not found a way to get that specific value into a pandas DataFrame.
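To narrow the result down to the Jordalen - Nåsen station and put it straight into a DataFrame, one option is to find the table row whose first cell holds the station name and read the snow-depth__value span inside that row. The sketch below runs offline on a hypothetical, simplified copy of the markup: the row/cell layout, the second station name and all numbers are made up, and snow_depth_for is a hypothetical helper name. Only the snow-depth__value class comes from the question.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical, simplified markup: layout and values are made up;
# only the snow-depth__value class name comes from the question.
HTML = """
<table>
  <tr><td>Jordalen - Nåsen</td><td><span class="snow-depth__value">100</span></td></tr>
  <tr><td>Another station</td><td><span class="snow-depth__value">50</span></td></tr>
</table>
"""


def snow_depth_for(soup: BeautifulSoup, station: str) -> pd.DataFrame:
    """Hypothetical helper: one-row DataFrame with the snow depth of `station`."""
    for row in soup.find_all("tr"):
        first_cell = row.find("td")
        # Assumes the station name is the text of the row's first cell
        if first_cell is None or first_cell.get_text(strip=True) != station:
            continue
        span = row.find("span", class_="snow-depth__value")
        if span is None:
            raise ValueError(f"no snow depth value in row for {station!r}")
        # Non-numeric text (e.g. "-") becomes NaN instead of raising
        depth = pd.to_numeric(span.get_text(strip=True), errors="coerce")
        return pd.DataFrame({"station": [station], "snow_depth": [depth]})
    raise ValueError(f"station not found: {station!r}")


df = snow_depth_for(BeautifulSoup(HTML, "html.parser"), "Jordalen - Nåsen")
print(df)
```

Against the live page, HTML would be replaced by the downloaded page text; whether the station name really sits in the first cell of its row is an assumption to check in the browser's developer tools.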

Answer #1 (by ac1kyiln)

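This worked for me with bs4. The keyword argument to find_all is class_ (with a trailing underscore), because class is a reserved word in Python; passing {"class": ...} as the attrs dictionary, as in the question, also works. I also took the liberty of building the actual 2D table as it appears on the page: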

from typing import Generator, List

import requests
from bs4 import BeautifulSoup
from bs4.element import ResultSet
from pandas import DataFrame, to_numeric
from requests.models import Response

# Plain aliases instead of the `type` statement, so this also runs on Python < 3.12
LazyRow = Generator[str, None, None]
LazyRowTable = Generator[LazyRow, None, None]

response: Response = requests.get('https://www.yr.no/nb/sn%C3%B8dybder/NO-46/Norge/Vestland')
response.raise_for_status()  # fail early on HTTP errors
html: str = response.text  # .text is the decoded str; .content would be raw bytes
soup: BeautifulSoup = BeautifulSoup(html, 'html.parser')
table_headers: ResultSet = soup.find_all('th', class_='fluid-table__cell--heading')
table_rows: ResultSet = soup.find_all('tr', class_='fluid-table__row fluid-table__row--link')
# columns 1-3 contain short-term data and column 5 (last column) contains cumulative data
# (I think, I can't read norwegian)
colnames: List[str] = [th.text for th in table_headers]
# parentheses instead of square brackets make a generator expression
row_data: LazyRowTable = ((child.text for child in row.children) for row in table_rows)

df: DataFrame = DataFrame(row_data, columns=colnames)
# astype() returns a new frame, so its result must be assigned; to_numeric with
# errors='coerce' turns non-numeric cells into NaN instead of leaving the column as text.
# Assumes column 0 is the location name and the remaining columns are numeric.
df[colnames[1:]] = df[colnames[1:]].apply(to_numeric, errors='coerce')
print(df)


Update:

Completely refactored the code based on the comments and on the fact that this is a 2D table, not a 1D list.
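The approach in this answer can be tried without hitting the live site. Below is an offline sketch on a hypothetical, simplified copy of the table: the layout, station names and numbers are made up, and only the class names come from the answer. It also checks that class_= and an attrs dictionary select the same elements, and it reads only td cells with get_text(strip=True), because row.children can also yield whitespace text nodes between cells when the HTML is indented.

```python
from bs4 import BeautifulSoup
from pandas import DataFrame, to_numeric

# Hypothetical, simplified table markup with made-up values
HTML = """
<table>
  <tr>
    <th class="fluid-table__cell--heading">Station</th>
    <th class="fluid-table__cell--heading">Depth (cm)</th>
  </tr>
  <tr class="fluid-table__row fluid-table__row--link"><td>Station A</td><td>40</td></tr>
  <tr class="fluid-table__row fluid-table__row--link"><td>Station B</td><td>-</td></tr>
</table>
"""

soup = BeautifulSoup(HTML, "html.parser")

# class_ (keyword argument) and an attrs dict select the same header cells
by_kw = soup.find_all("th", class_="fluid-table__cell--heading")
by_dict = soup.find_all("th", {"class": "fluid-table__cell--heading"})

colnames = [th.get_text(strip=True) for th in by_kw]
# class_ matches any one of a tag's classes, so the header row (no class) is skipped
rows = soup.find_all("tr", class_="fluid-table__row")
# One list of cell texts per row; only td cells, so whitespace nodes are ignored
data = [[td.get_text(strip=True) for td in row.find_all("td")] for row in rows]

df = DataFrame(data, columns=colnames)
# Convert every column except the first (station name); "-" becomes NaN
df[colnames[1:]] = df[colnames[1:]].apply(to_numeric, errors="coerce")
print(df)
```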

Answer #2 (by w8ntj3qf)

You can use the .string attribute to get the inner content of an HTML node, like this:

result=[]
for i in soup.find_all("span", {"class": "snow-depth__value"}):
    result.append(i.string)

# or inline

result = [i.string for i in soup.find_all("span", {"class": "snow-depth__value"})]

With this you get a list that you can put into a DataFrame:

df = pd.DataFrame(result, columns=['Snow'])
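One caveat with .string: when a tag has more than one child, .string is None, whereas get_text() concatenates all the text inside the tag. A small sketch on hypothetical markup (made-up values; the nested <abbr> unit is only an assumption about what such a span might contain):

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical markup with made-up values; the second span has two children
# (the text "30 " and an <abbr> element), so its .string is None.
HTML = """
<span class="snow-depth__value">25</span>
<span class="snow-depth__value">30 <abbr>cm</abbr></span>
"""

soup = BeautifulSoup(HTML, "html.parser")
spans = soup.find_all("span", {"class": "snow-depth__value"})

# .string works for the first span but is None for the second
with_string = [s.string for s in spans]
# get_text(strip=True) joins all stripped text pieces: "25", "30cm"
with_text = [s.get_text(strip=True) for s in spans]

df = pd.DataFrame({"snow_depth": with_text})
# Keep only the leading digits ("30cm" -> "30") and convert them to numbers
df["snow_depth"] = pd.to_numeric(
    df["snow_depth"].str.extract(r"^(\d+)", expand=False), errors="coerce"
)
print(with_string)
print(df)
```

Using get_text() plus a numeric conversion keeps None entries out of the DataFrame and gives a numeric column instead of strings.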
