How do I ingest an entire XML database into Elasticsearch?

bzzcjhmw · posted 2021-06-13 in ElasticSearch
Follow (0) | Answers (1) | Views (363)

Suppose I have 20 XML files that make up an entire database. Is it possible to ingest all 20 XML files into Elasticsearch? If so, what are the options?

krugob8w 1#

For Python 3, I suggest using xmltodict:

pip install xmltodict elasticsearch

I assume each XML file contains records:

<records>
    <record>...</record>
    ...
    <record>...</record>
</records>

so they have to be split into individual records.
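If xmltodict is not available, the same split into per-record dicts can be sketched with the standard library's xml.etree.ElementTree (a minimal sketch; the sample data is invented, and it assumes each record holds flat child elements with no nesting or attributes):

```python
import xml.etree.ElementTree as ET

def split_records(xml_text):
    """Parse <records><record>...</record>...</records> into a list of dicts.

    Assumes each <record> contains flat child elements (no nesting/attributes).
    """
    root = ET.fromstring(xml_text)
    return [{child.tag: child.text for child in record}
            for record in root.findall("record")]

docs = split_records(
    "<records>"
    "<record><id>1</id><name>alice</name></record>"
    "<record><id>2</id><name>bob</name></record>"
    "</records>"
)
print(docs)  # [{'id': '1', 'name': 'alice'}, {'id': '2', 'name': 'bob'}]
```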
Create a script named "load.py" with the following content:

import sys
import json

import xmltodict
from elasticsearch import Elasticsearch

INDEX = "xmlfiles"
TYPE = "record"

def xml_to_actions(xmlcontent):
    # xmltodict collects repeated <record> elements into a list under
    # xmlcontent["records"]["record"] (a single record parses to a plain dict)
    for record in xmlcontent["records"]["record"]:
        yield '{ "index" : { "_index" : "%s", "_type" : "%s" }}' % (INDEX, TYPE)
        yield json.dumps(record, default=int)

es = Elasticsearch()  # no args: connects to localhost:9200
if not es.indices.exists(index=INDEX):
    raise RuntimeError('index does not exist, use `curl -X PUT "localhost:9200/%s"` and try again' % INDEX)

for f in sys.argv[1:]:  # sys.argv[0] is the script name itself, skip it
    with open(f, "rt") as fin:
        body = "\n".join(xml_to_actions(xmltodict.parse(fin))) + "\n"
        r = es.bulk(body)  # returns a dict; r["errors"] is True if any action failed
        print(f, not r["errors"])

Run it with:

python load.py xml1.xml xml2.xml ... xml20.xml
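For context, the _bulk endpoint expects newline-delimited JSON: an action-metadata line followed by the document line, one pair per record, ending with a trailing newline. A stdlib-only sketch of the payload the generator above builds (the sample records are invented; the index name matches the script):

```python
import json

INDEX = "xmlfiles"
records = [{"id": 1, "title": "first"}, {"id": 2, "title": "second"}]

lines = []
for rec in records:
    lines.append(json.dumps({"index": {"_index": INDEX}}))  # action metadata
    lines.append(json.dumps(rec))                           # document source
payload = "\n".join(lines) + "\n"  # _bulk requires a trailing newline
print(payload)
```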
