This is similar to the thread and task "scrape with BS4 Wikipedia text (pair each heading with paragraphs associated)", with the output written to CSV format.
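**Question:** Can we apply this to a similar task, where the data being collected is about Digital Innovation Hubs? I have already applied a scraper to a single page this way, and it works. But how can I add CSV output to a scraper that iterates over several URLs, so that the results are written to CSV too, using the same technique?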
Title: (probably an h4 tag)
'Evolutionary Stage',
'Geographical Scope',
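from bs4 import BeautifulSoup
import requests
page_link = 'https://s3platform-legacy.jrc.ec.europa.eu/digital-innovation-hubs-tool/-/dih/3480/view'
# verify=False skips TLS certificate checks; timeout is in seconds
page_response = requests.get(page_link, verify=False, timeout=5)
page_content = BeautifulSoup(page_response.content, "html.parser")
textContent = []
# skip the first h4, then collect each heading followed by its paragraphs
for tag in page_content.find_all('h4')[1:]:
    texth4 = tag.text.strip()
    textContent.append(texth4)
    for item in tag.find_next_siblings('p'):
        # keep only paragraphs whose nearest preceding h4 is this heading
        if texth4 in item.find_previous_siblings('h4')[0].text.strip():
            textContent.append(item.text.strip())
print(textContent)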
Description', 'Link to national or regional initiatives for digitising industry', 'Market and Services', 'Service Examples', 'Leveraging the holding system "EndoTAIX" from scientific development to ready-to -market', 'For one of SurgiTAIX AG\'s products, the holding system "EndoTAIX" for surgical instrument fixation, the SurgiTAIX AG cooperated very closely with the RWTH University\'s Helmholtz institute. The services provided comprised the complete first phase of scientific development. Besides, after the first concepts of the holding system took shape, a prototype was successfully build in the scope of a feasibility study. In the role regarding the self-conception as a transfer service provider offering services itself, the SurgiTAIX AG refined the technology to market level and successfully performed all the steps necessary within the process to the approval and certification of the product. Afterwards, the product was delivered to another vendor with SurgiTAIX AG carrying out the production process as an OEM.', 'Development of a self-adapting robotic rehabilitation system', 'Based on the expertise of different partners of the hub, DIERS International GmbH (SME) was enabled to develop a self-adapting robotic rehabilitation system that allows patients after stroke to relearn motion patterns autonomously. The particular challenge of this cooperation was to adjust the robot to the individual and actual needs of the patient at any particular time of the exercise. Therefore, different sensors have been utilized to detect the actual movement performance of the patient. Feature extraction algorithms have been developed to identify the actual needs of the individual patient and intelligent predicting control algorithms enable the robot to independently adapt the movement task to the needs of the patient. 
These challenges could be solved only by the services provided by different partners of the hub which include the transfer of the newly developed technologies, access to patient data, acquisition of knowledge and demands from healthcare personal and coordinating the application for public funding.', 'Establishment of a robotic couch lab and test facility for radiotherapy', 'With the help of services provided by different partners of the hub, the robotic integrator SME BEC GmbH was given the opportunity to enhance their robotic patient positioning device "ExaMove" to allow for compensation of lung tumor movements during free breathing. The provided services solved the need to establish a test facility within the intended environment (the radiotherapy department) and provided the transfer of necessary innovative technologies such as new sensors and intelligent automatic control algorithms. Furthermore, the provided services included the coordination of the consortium, identifying, preparing and coordinating the application for public funding, provision of access to the hospital’s infrastructure and the acquisition of knowledge and demands from healthcare personal.', 'Organization', 'Evolutionary Stage', 'Geographical Scope', 'Funding', 'Partners', 'Technologies']
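The flat list above mixes headings and paragraphs, which makes it hard to turn into CSV columns. One possible way to keep each heading paired with its own paragraphs is sketched below. It is only an illustration under assumptions: `SAMPLE_HTML` is a made-up stand-in for a hub page (the real markup may differ), and `pair_headings_with_paragraphs` is a hypothetical helper name, not part of the original script.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for one hub page; the real markup may differ.
SAMPLE_HTML = """
<div>
  <h4>Hub Information</h4>
  <h4>Description</h4>
  <p>First paragraph.</p>
  <p>Second paragraph.</p>
  <h4>Organization</h4>
  <h4>Funding</h4>
  <p>Regional funding.</p>
</div>
"""


def pair_headings_with_paragraphs(html):
    """Map each <h4> text to the joined text of the <p> siblings before the next <h4>."""
    soup = BeautifulSoup(html, "html.parser")
    sections = {}
    for heading in soup.find_all("h4"):
        paragraphs = []
        # Walk the following h4/p siblings in document order.
        for sibling in heading.find_next_siblings(["h4", "p"]):
            if sibling.name == "h4":  # reached the next section, stop
                break
            paragraphs.append(sibling.get_text(strip=True))
        sections[heading.get_text(strip=True)] = "\n".join(paragraphs)
    return sections


sections = pair_headings_with_paragraphs(SAMPLE_HTML)
for title, text in sections.items():
    print(repr(title), "->", repr(text))
```

A dictionary like this has one key per heading, so it can serve directly as one CSV row per page, with headings that have no paragraphs ending up as empty cells.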
**Update:**
import requests
from bs4 import BeautifulSoup
import csv

# create a list of the URLs for each digital hub
urls = ['https://s3platform.jrc.ec.europa.eu/digital-innovation-hubs-tool/details/AL00106',
        # add the rest of the URLs here
        ]

# create an empty list to store the data for each digital hub
data = []

# iterate over each URL and extract the relevant information
for url in urls:
    # make a GET request to the webpage
    response = requests.get(url)
    # parse the HTML content using BeautifulSoup
    soup = BeautifulSoup(response.content, 'html.parser')
    # extract the relevant information from the HTML
    name = soup.find('h3', class_='mb-0').text.strip()
    country = soup.find('div', class_='col-12 col-md-6 col-lg-4 mb-3 mb-md-0').text.strip()
    website = soup.find('a', href=lambda href: href and 'http' in href).get('href')
    description = soup.find('div', class_='col-12 col-md-8').text.strip()
    # add the extracted information to the data list as a dictionary
    data.append({'Name': name, 'Country': country, 'Website': website, 'Description': description})

# write the data to a CSV file
with open('digital_hubs.csv', 'w', newline='') as csvfile:
    fieldnames = ['Name', 'Country', 'Website', 'Description']
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
    # write the header row, then one row per hub
    writer.writeheader()
    for hub in data:
        writer.writerow(hub)
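The script above uses a fixed list of field names. If each page is instead scraped into a heading-to-text dictionary, different hubs may have different headings, so the columns are not known in advance. A minimal sketch of one way to handle that with `csv.DictWriter` follows; the rows, the `example.org` URLs, and the helper name `write_rows` are assumptions made for illustration, and the output goes to an in-memory buffer rather than a real file.

```python
import csv
import io


def write_rows(rows, out):
    """Write dicts with differing keys as CSV; missing cells are left empty."""
    # Build the column list as the union of all keys, in order of first appearance.
    fieldnames = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)
    writer = csv.DictWriter(out, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return fieldnames


# Hypothetical rows, one per scraped URL.
rows = [
    {"URL": "https://example.org/hub/1", "Description": "Hub one", "Funding": "Regional"},
    {"URL": "https://example.org/hub/2", "Description": "Hub two", "Partners": "University"},
]

buffer = io.StringIO(newline="")
fieldnames = write_rows(rows, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

In a real run, `out` would be a file opened with `open('digital_hubs.csv', 'w', newline='')`, as in the script above, and each row would come from one scraped URL.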