Converting XML to RDF format in Python

Posted by 6ie5vjzr on 2023-06-28 in Python

I am new to reading XML files and extracting information from them. Below is a subset of my XML file.

<icd_10_v2019>
    <item type="chapter">
        <name>I</name>
        <description>Certain infectious and parasitic diseases</description>
        <item type="block">
            <name>A00-A09</name>
            <description>Intestinal infectious diseases</description>
            <item type="category">
                <name>A00</name>
                <description>Cholera</description>
                <item type="subcategory">
                    <name>A00.0</name>
                    <description>Cholera due to Vibrio cholerae 01, biovar cholerae</description>
                </item>
                <item type="subcategory">
                    <name>A00.1</name>
                    <description>Cholera due to Vibrio cholerae 01, biovar eltor</description>
                </item>
                <item type="subcategory">
                    <name>A00.9</name>
                    <description>Cholera, unspecified</description>
                </item>
            </item>
            <item type="category">
                <name>A01</name>
                <description>Typhoid and paratyphoid fevers</description>
                <item type="subcategory">
                    <name>A01.0</name>
                    <description>Typhoid fever</description>
                </item>
                <item type="subcategory">
                    <name>A01.1</name>
                    <description>Paratyphoid fever A</description>
                </item>
                <item type="subcategory">
                    <name>A01.2</name>
                    <description>Paratyphoid fever B</description>
                </item>
                <item type="subcategory">
                    <name>A01.3</name>
                    <description>Paratyphoid fever C</description>
                </item>
                <item type="subcategory">
                    <name>A01.4</name>
                    <description>Paratyphoid fever, unspecified</description>
                </item>
            </item>
            <item type="category">
                <name>A02</name>
                <description>Other salmonella infections</description>
                <item type="subcategory">
                    <name>A02.0</name>
                    <description>Salmonella enteritis</description>
                </item>
                <item type="subcategory">
                    <name>A02.1</name>
                    <description>Salmonella sepsis</description>
                </item>
                <item type="subcategory">
                    <name>A02.2</name>
                    <description>Localized salmonella infections</description>
                </item>
                <item type="subcategory">
                    <name>A02.8</name>
                    <description>Other specified salmonella infections</description>
                </item>
                <item type="subcategory">
                    <name>A02.9</name>
                    <description>Salmonella infection, unspecified</description>
                </item>
            </item>
            <item type="category">
                <name>A03</name>
                <description>Shigellosis</description>
                <item type="subcategory">
                    <name>A03.0</name>
                    <description>Shigellosis due to Shigella dysenteriae</description>
                </item>
                <item type="subcategory">
                    <name>A03.1</name>
                    <description>Shigellosis due to Shigella flexneri</description>
                </item>
                <item type="subcategory">
                    <name>A03.2</name>
                    <description>Shigellosis due to Shigella boydii</description>
                </item>
                <item type="subcategory">
                    <name>A03.3</name>
                    <description>Shigellosis due to Shigella sonnei</description>
                </item>
                <item type="subcategory">
                    <name>A03.8</name>
                    <description>Other shigellosis</description>
                </item>
                <item type="subcategory">
                    <name>A03.9</name>
                    <description>Shigellosis, unspecified</description>
                </item>
            </item>
            <item type="category">
                <name>A04</name>
                <description>Other bacterial intestinal infections</description>
                <item type="subcategory">
                    <name>A04.0</name>
                    <description>Enteropathogenic Escherichia coli infection</description>
                </item>
                <item type="subcategory">
                    <name>A04.1</name>
                    <description>Enterotoxigenic Escherichia coli infection</description>
                </item>
                <item type="subcategory">
                    <name>A04.2</name>
                    <description>Enteroinvasive Escherichia coli infection</description>
                </item>
                <item type="subcategory">
                    <name>A04.3</name>
                    <description>Enterohaemorrhagic Escherichia coli infection</description>
                </item>
                <item type="subcategory">
                    <name>A04.4</name>
                    <description>Other intestinal Escherichia coli infections</description>
                </item>
                <item type="subcategory">
                    <name>A04.5</name>
                    <description>Campylobacter enteritis</description>
                </item>
                <item type="subcategory">
                    <name>A04.6</name>
                    <description>Enteritis due to Yersinia enterocolitica</description>
                </item>
                <item type="subcategory">
                    <name>A04.7</name>
                    <description>Enterocolitis due to Clostridium difficile</description>
                </item>
                <item type="subcategory">
                    <name>A04.8</name>
                    <description>Other specified bacterial intestinal infections</description>
                </item>
                <item type="subcategory">
                    <name>A04.9</name>
                    <description>Bacterial intestinal infection, unspecified</description>
                </item>
            </item>
            <item type="category">
                <name>A05</name>
                <description>Other bacterial foodborne intoxications, not elsewhere classified</description>
                <item type="subcategory">
                    <name>A05.0</name>
                    <description>Foodborne staphylococcal intoxication</description>
                </item>
                <item type="subcategory">
                    <name>A05.1</name>
                    <description>Botulism</description>
                </item>
                <item type="subcategory">
                    <name>A05.2</name>
                    <description>Foodborne Clostridium perfringens [Clostridium welchii] intoxication</description>
                </item>
                <item type="subcategory">
                    <name>A05.3</name>
                    <description>Foodborne Vibrio parahaemolyticus intoxication</description>
                </item>
                <item type="subcategory">
                    <name>A05.4</name>
                    <description>Foodborne Bacillus cereus intoxication</description>
                </item>
                <item type="subcategory">
                    <name>A05.8</name>
                    <description>Other specified bacterial foodborne intoxications</description>
                </item>
                <item type="subcategory">
                    <name>A05.9</name>
                    <description>Bacterial foodborne intoxication, unspecified</description>
                </item>
            </item>
            <item type="category">
                <name>A06</name>
                <description>Amoebiasis</description>
                <item type="subcategory">
                    <name>A06.0</name>
                    <description>Acute amoebic dysentery</description>
                </item>
                <item type="subcategory">
                    <name>A06.1</name>
                    <description>Chronic intestinal amoebiasis</description>
                </item>
                <item type="subcategory">
                    <name>A06.2</name>
                    <description>Amoebic nondysenteric colitis</description>
                </item>
                <item type="subcategory">
                    <name>A06.3</name>
                    <description>Amoeboma of intestine</description>
                </item>
                <item type="subcategory">
                    <name>A06.4</name>
                    <description>Amoebic liver abscess</description>
                </item>
                <item type="subcategory">
                    <name>A06.5</name>
                    <description>Amoebic lung abscess</description>
                </item>
                <item type="subcategory">
                    <name>A06.6</name>
                    <description>Amoebic brain abscess</description>
                </item>
                <item type="subcategory">
                    <name>A06.7</name>
                    <description>Cutaneous amoebiasis</description>
                </item>
                <item type="subcategory">
                    <name>A06.8</name>
                    <description>Amoebic infection of other sites</description>
                </item>
                <item type="subcategory">
                    <name>A06.9</name>
                    <description>Amoebiasis, unspecified</description>
                </item>
            </item>
            <item type="category">
                <name>A07</name>
                <description>Other protozoal intestinal diseases</description>
                <item type="subcategory">
                    <name>A07.0</name>
                    <description>Balantidiasis</description>
                </item>
                <item type="subcategory">
                    <name>A07.1</name>
                    <description>Giardiasis [lambliasis]</description>
                </item>
                <item type="subcategory">
                    <name>A07.2</name>
                    <description>Cryptosporidiosis</description>
                </item>
                <item type="subcategory">
                    <name>A07.3</name>
                    <description>Isosporiasis</description>
                </item>
                <item type="subcategory">
                    <name>A07.8</name>
                    <description>Other specified protozoal intestinal diseases</description>
                </item>
                <item type="subcategory">
                    <name>A07.9</name>
                    <description>Protozoal intestinal disease, unspecified</description>
                </item>
            </item>
            <item type="category">
                <name>A08</name>
                <description>Viral and other specified intestinal infections</description>
                <item type="subcategory">
                    <name>A08.0</name>
                    <description>Rotaviral enteritis</description>
                </item>
                <item type="subcategory">
                    <name>A08.1</name>
                    <description>Acute gastroenteropathy due to Norovirus</description>
                </item>
                <item type="subcategory">
                    <name>A08.2</name>
                    <description>Adenoviral enteritis</description>
                </item>
                <item type="subcategory">
                    <name>A08.3</name>
                    <description>Other viral enteritis</description>
                </item>
                <item type="subcategory">
                    <name>A08.4</name>
                    <description>Viral intestinal infection, unspecified</description>
                </item>
                <item type="subcategory">
                    <name>A08.5</name>
                    <description>Other specified intestinal infections</description>
                </item>
            </item>
            <item type="category">
                <name>A09</name>
                <description>Other gastroenteritis and colitis of infectious and unspecified origin</description>
                <item type="subcategory">
                    <name>A09.0</name>
                    <description>Other and unspecified gastroenteritis and colitis of infectious origin</description>
                </item>
                <item type="subcategory">
                    <name>A09.9</name>
                    <description>Gastroenteritis and colitis of unspecified origin</description>
                </item>
            </item>
        </item>
        <item type="block">
            <name>A15-A19</name>
            <description>Tuberculosis</description>
            <item type="category">
                <name>A15</name>
                <description>Respiratory tuberculosis, bacteriologically and histologically confirmed</description>
                <item type="subcategory">
                    <name>A15.0</name>
                    <description>Tuberculosis of lung, confirmed by sputum microscopy with or without culture</description>
                </item>
                <item type="subcategory">
                    <name>A15.1</name>
                    <description>Tuberculosis of lung, confirmed by culture only</description>
                </item>
                <item type="subcategory">
                    <name>A15.2</name>
                    <description>Tuberculosis of lung, confirmed histologically</description>
                </item>
                <item type="subcategory">
                    <name>A15.3</name>
                    <description>Tuberculosis of lung, confirmed by unspecified means</description>
                </item>
                <item type="subcategory">
                    <name>A15.4</name>
                    <description>Tuberculosis of intrathoracic lymph nodes, confirmed bacteriologically and histologically</description>
                </item>
                <item type="subcategory">
                    <name>A15.5</name>
                    <description>Tuberculosis of larynx, trachea and bronchus, confirmed bacteriologically and histologically</description>
                </item>
                <item type="subcategory">
                    <name>A15.6</name>
                    <description>Tuberculous pleurisy, confirmed bacteriologically and histologically</description>
                </item>
                <item type="subcategory">
                    <name>A15.7</name>
                    <description>Primary respiratory tuberculosis, confirmed bacteriologically and histologically</description>
                </item>
                <item type="subcategory">
                    <name>A15.8</name>
                    <description>Other respiratory tuberculosis, confirmed bacteriologically and histologically</description>
                </item>
                <item type="subcategory">
                    <name>A15.9</name>
                    <description>Respiratory tuberculosis unspecified, confirmed bacteriologically and histologically</description>
                </item>
            </item>
        </item>
    </item>
</icd_10_v2019>

The hierarchy I expect is as follows:

Certain infectious and parasitic diseases (I)
 Intestinal infectious diseases (A00-A09)
  Cholera (A00)
    Cholera due to Vibrio cholerae 01, biovar cholerae (A00.0)
    Cholera due to Vibrio cholerae 01, biovar eltor (A00.1)
    Cholera, unspecified (A00.9)
  Typhoid and paratyphoid fevers (A01)
    Typhoid fever (A01.0)
    ..... (so on ....)

Finally, I want to save it in a graph format (RDF). How can I do this? Any help is highly appreciated.
Here is the code I have tried so far.

import xml.etree.ElementTree as ET
import rdflib
from rdflib import Graph, Namespace, URIRef, Literal

graph = Graph()

ICD_NS = Namespace("http://example.com/icd/")

#Load the XML data
tree = ET.parse('icd10_v19.xml')
root = tree.getroot()

#Create an rdf graph
graph = Graph()

def process_element(element, parent_uri):
    print (element)
    
    if element.find("name").text is not None:
        element_uri = parent_uri + element.find("name").text
        print (element_uri)
        graph.add((element_uri, ICD_NS['name'], Literal(element.find('name').text)))
        graph.add((element_uri, ICD_NS['description'], Literal(element.find('description').text)))
        graph.add((element_uri, ICD_NS['type'], Literal(element.attrib['type'])))
   
        #Recursively process child elements
        for child in element.findall("item"):
            process_element(child, element_uri + '/')
    
#Start processing from the root element
process_element(root, ICD_NS[''])

#Serialize the graph to RDF/XML format
rdf_data = graph.serialize(format='xml')

#Save the RDF/XML data to a file
with open("icd10_graph.rdf", "wb") as f:
    print (f)
    f.write(rdf_data)
Answer 1, by mwg9r5ms

To parse the provided XML document into RDF, you can use, for example, BeautifulSoup:

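from bs4 import BeautifulSoup
from rdflib import Graph, Namespace, URIRef, Literal

ICD_NS = Namespace("http://example.com/icd/")

def parse(graph, node, uri=None):
    # uri holds the names of all ancestor items, outermost first
    if uri is None:
        uri = []

    type_ = node.get('type')
    # node.name is the tag name in BeautifulSoup, so use find('name') here
    name = node.find('name').text
    description = node.description.text

    # e.g. "I/A00-A09/A00": a relative URI built from the ancestor names
    uri_ref = URIRef('/'.join(uri + [name]))

    graph.add((uri_ref, ICD_NS['name'], Literal(name)))
    graph.add((uri_ref, ICD_NS['description'], Literal(description)))
    graph.add((uri_ref, ICD_NS['type'], Literal(type_)))

    # recursive=False: visit only the direct <item> children
    for i in node.find_all('item', recursive=False):
        parse(graph, i, uri + [name])

# the 'xml' parser requires lxml to be installed
with open('icd10_v19.xml', 'r') as f_in:
    soup = BeautifulSoup(f_in.read(), 'xml')

graph = Graph()

# soup.item is the first <item> in the document, i.e. the first chapter
parse(graph, soup.item)

rdf_data = graph.serialize(format='xml')
print(rdf_data)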

This prints the following XML:

<?xml version="1.0" encoding="utf-8"?>
<rdf:RDF
   xmlns:ns1="http://example.com/icd/"
   xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
>
  <rdf:Description rdf:about="I/A00-A09/A04/A04.9">
    <ns1:name>A04.9</ns1:name>
    <ns1:description>Bacterial intestinal infection, unspecified</ns1:description>
    <ns1:type>subcategory</ns1:type>
  </rdf:Description>
  <rdf:Description rdf:about="I/A00-A09/A00/A00.1">
    <ns1:name>A00.1</ns1:name>
    <ns1:description>Cholera due to Vibrio cholerae 01, biovar eltor</ns1:description>
    <ns1:type>subcategory</ns1:type>
  </rdf:Description>
  <rdf:Description rdf:about="I/A00-A09/A04/A04.0">
    <ns1:name>A04.0</ns1:name>
    <ns1:description>Enteropathogenic Escherichia coli infection</ns1:description>
    <ns1:type>subcategory</ns1:type>
  </rdf:Description>
  <rdf:Description rdf:about="I/A00-A09/A05">
    <ns1:name>A05</ns1:name>
    <ns1:description>Other bacterial foodborne intoxications, not elsewhere classified</ns1:description>
    <ns1:type>category</ns1:type>
  </rdf:Description>
  <rdf:Description rdf:about="I/A00-A09/A06/A06.7">
    <ns1:name>A06.7</ns1:name>
    <ns1:description>Cutaneous amoebiasis</ns1:description>
    <ns1:type>subcategory</ns1:type>
  </rdf:Description>
  <rdf:Description rdf:about="I/A15-A19/A15/A15.0">
    <ns1:name>A15.0</ns1:name>
    <ns1:description>Tuberculosis of lung, confirmed by sputum microscopy with or without culture</ns1:description>
    <ns1:type>subcategory</ns1:type>
  </rdf:Description>

...etc.

Edit: If you want a tree-like RDF structure (rather than a flat one):

from bs4 import BeautifulSoup
from rdflib import Graph, Namespace, Literal, BNode

ICD_NS = Namespace("http://example.com/icd/")

def parse(graph, node, bs_node):
    # node is the RDF node that represents bs_node;
    # the top-level call passes None, so create a blank node for it
    if node is None:
        node = BNode()

    type_ = bs_node.get('type')
    name = bs_node.find('name').text
    description = bs_node.description.text

    graph.add((node, ICD_NS['name'], Literal(name)))
    graph.add((node, ICD_NS['description'], Literal(description)))
    graph.add((node, ICD_NS['type'], Literal(type_)))

    # link each direct <item> child to its parent through a new blank node
    for i in bs_node.find_all('item', recursive=False):
        new_node = BNode()
        graph.add((node, ICD_NS['child'], new_node))
        parse(graph, new_node, i)

# the 'xml' parser requires lxml to be installed
with open('icd10_v19.xml', 'r') as f_in:
    soup = BeautifulSoup(f_in.read(), 'xml')

graph = Graph()

parse(graph, None, soup.item)

rdf_data = graph.serialize(format='xml')
print(rdf_data)

This prints:

<?xml version="1.0" encoding="utf-8"?>
<rdf:RDF
   xmlns:ns1="http://example.com/icd/"
   xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
>
  <rdf:Description rdf:nodeID="Nacbbf7c42d89443bb32fa62ae819603b">
    <ns1:name>A03</ns1:name>
    <ns1:description>Shigellosis</ns1:description>
    <ns1:type>category</ns1:type>
    <ns1:child rdf:nodeID="Nc684d62c8c2d452b9f2e6e127487671c"/>
    <ns1:child rdf:nodeID="Nccf58081834140f39545c6bbd8bb9747"/>
    <ns1:child rdf:nodeID="Nf15557855d634d55ada31463e21d447b"/>
    <ns1:child rdf:nodeID="Nb6ff921fec5e4aebb03af96eb9f19348"/>
    <ns1:child rdf:nodeID="N57b0de8bf5bf4010b41a4a2335d3a0a3"/>
    <ns1:child rdf:nodeID="N20dabf2261e04298b82510160a955e31"/>
  </rdf:Description>
  <rdf:Description rdf:nodeID="N0276fe2299504854a4f1f2215abfc76e">
    <ns1:name>A00.0</ns1:name>
    <ns1:description>Cholera due to Vibrio cholerae 01, biovar cholerae</ns1:description>
    <ns1:type>subcategory</ns1:type>
  </rdf:Description>
  <rdf:Description rdf:nodeID="Nf78dc7f58cb24966b8c9c48594ccae8b">
    <ns1:name>A15.4</ns1:name>
    <ns1:description>Tuberculosis of intrathoracic lymph nodes, confirmed bacteriologically and histologically</ns1:description>
    <ns1:type>subcategory</ns1:type>
  </rdf:Description>
  <rdf:Description rdf:nodeID="N43b24a8b48f44ed7b5eb1f463f87527d">
    <ns1:name>A05.8</ns1:name>
    <ns1:description>Other specified bacterial foodborne intoxications</ns1:description>
    <ns1:type>subcategory</ns1:type>
  </rdf:Description>

...etc.
