Python: how to scrape the value of a specific paragraph based on style

gr8qqesn  posted on 2023-01-29  in Python

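The page contains this paragraph:
The latest financial statements filed by X in the business register correspond to the year 2020 and show a turnover range of 'Between 6,000,000 and 30,000,000 Euros'.
On the page itself the text is in Italian: it says that the last statements filed by Euro P.a. - S.r.l. correspond to the year 2020 and report a range of 'Tra 6.000.000 e 30.000.000 Euro'.
I need to scrape the value between the quotes ('Between 6,000,000 and 30,000,000 Euros') and put it in a column named "range".
I tried this code, without success: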

import pandas as pd
from bs4 import BeautifulSoup
import requests

url = 'https://www.informazione-aziende.it/Azienda_EURO-PA-SRL'
resp = requests.get(url)
soup = BeautifulSoup(resp.text, 'html.parser')

turnover = soup.find("span", {"id": "turnover"}).text
year = soup.find("span", {"id": "year"}).text

data = {'turnover': turnover, 'year': year}
df = pd.DataFrame(data, index=[0])
print(df)

But I get: AttributeError: 'NoneType' object has no attribute 'text'

whhtz7ly  #1

First, scrape the whole text with BeautifulSoup and assign it to a variable, for example:

text = "The latest financial statements filed by x in the business register it corresponds to the year 2020 and shows a turnover range of 'Between 6,000,000 and 30,000,000 Euros'."
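
If the text has to come from the HTML rather than from a literal, a minimal sketch could look like the following. It also shows why the question's code fails: `soup.find(...)` returns `None` when no element matches, so calling `.text` on the result raises the AttributeError. The markup in `SAMPLE_HTML`, the helper name `find_paragraph_text`, and the keyword are assumptions made for illustration; the real page's tags should be checked in the browser.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page; the actual tags and
# attributes on the site may differ.
SAMPLE_HTML = """
<html><body>
  <p>Some unrelated paragraph.</p>
  <p>The latest financial statements filed by x in the business register
  it corresponds to the year 2020 and shows a turnover range of
  'Between 6,000,000 and 30,000,000 Euros'.</p>
</body></html>
"""

def find_paragraph_text(html, keyword):
    """Return the text of the first <p> containing keyword, or None."""
    soup = BeautifulSoup(html, "html.parser")
    for p in soup.find_all("p"):
        # Collapse the whitespace left over from line breaks in the HTML
        content = " ".join(p.get_text().split())
        if keyword in content:
            return content
    return None  # nothing matched, so the caller has to check for None

text = find_paragraph_text(SAMPLE_HTML, "turnover range")
missing = find_paragraph_text(SAMPLE_HTML, "no such phrase")
```

Here `text` holds the full sentence and `missing` is `None`, which the caller can test for instead of hitting an exception.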

Then run the following code:

import re

pattern = "'.+'"
result = re.search(pattern, text)
result = result[0].replace("'", "")

The output will be:
"Between 6,000,000 and 30,000,000 Euros"

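The regex approach above can be extended in a hedged way: a non-greedy pattern keeps the match from running on to a later quote, and the two bounds can be parsed as integers. The function name `extract_range` and the short sample strings are made up for this sketch, and it assumes that "." and "," inside the quoted value are thousands separators only (no decimals).

```python
import re

def extract_range(text):
    """Return (label, low, high) for the first single-quoted span, or None."""
    # Non-greedy, so a second quoted span later in the text is not swallowed
    quoted = re.search(r"'(.+?)'", text)
    if quoted is None:
        return None
    label = quoted.group(1)
    # Digit groups separated by "," or "." (English or Italian formatting)
    numbers = re.findall(r"\d{1,3}(?:[.,]\d{3})+|\d+", label)
    # Assumption: separators are thousands separators, so drop them
    values = [int(re.sub(r"[.,]", "", n)) for n in numbers]
    if len(values) != 2:
        return label, None, None
    return label, values[0], values[1]

# Sample inputs for illustration only
english = "shows a turnover range of 'Between 6,000,000 and 30,000,000 Euros'."
italian = "Range: 'Tra 6.000.000 e 30.000.000 Euro'"

print(extract_range(english))
print(extract_range(italian))
```

Returning `None` when no quoted span exists lets the caller decide how to handle pages where the sentence is missing.
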
t40tm48m  #2

An alternative could be:

  • Split the text on the single-quote character (') and take the element at index 1 of the resulting list.

Code:

text = "The latest financial statements filed by x in the business register it corresponds to the year 2020 and shows a turnover range of 'Between 6,000,000 and 30,000,000 Euros'."

# Get the text at position 1 of the list: 
desired_text = text.split("'")[1]

# Print the result: 
print(desired_text)

Result:

Between 6,000,000 and 30,000,000 Euros
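
One caveat with the split approach is that `split("'")[1]` raises an IndexError when the text has no quote at all. A small sketch of a guarded version follows; the helper name `quoted_part` and the second sample sentence are illustrative assumptions, not part of the original answer.

```python
def quoted_part(text, quote="'"):
    """Return the text between the first pair of quote characters, or None."""
    parts = text.split(quote)
    # A complete pair of quotes yields at least three pieces
    if len(parts) < 3:
        return None
    return parts[1]

texts = [
    "shows a turnover range of 'Between 6,000,000 and 30,000,000 Euros'.",
    "a sentence without any quoted value",
]

# One dict per input; the key "range" matches the column name asked for
rows = [{"range": quoted_part(t)} for t in texts]
print(rows)
```

A list of dicts like `rows` can be passed to `pd.DataFrame(rows)` to get the "range" column the question asks for.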
