Python: parsing deeply nested single-line data

62lalag4 posted on 2023-01-08 in Python

My input is a single line of data:

Row(contact=Row(officeAdd=None, homeAdd=(street='62 Crown Street', city='London', country='UK'), phone=Row(mobile=Row(primary='XXX-XXX-1234', alternate='XXX-XXX-1235'))))

I want to parse it into CSV while preserving the hierarchy of the field names, like this:

contact/officeAdd, contact/homeAdd/street, contact/homeAdd/city, contact/homeAdd/country, contact/phone/mobile/primary, contact/phone/mobile/alternate
None, 62 Crown Street, London, UK, XXX-XXX-1234, XXX-XXX-1235

So far I have not been able to recover the hierarchy correctly with a regex. Can this be done with a regex, or do I need a different approach?

zengzsys #1

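I wrote some code that extracts the hierarchy of the data (given as a string) into nested lists/dicts.
I couldn't write a decent regex to split the text around **the ','s that sit outside parentheses** (it may well be possible, I'm just not good enough at regexes), so I wrote a helper function for that instead.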

line = "Row(contact=Row(officeAdd=None, homeAdd=(street='62 Crown Street', city='London', country='UK'), phone=Row(mobile=Row(primary='XXX-XXX-1234', alternate='XXX-XXX-1235'))))"
**Line to nested dicts:**
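import re

def text_to_dicts(text):
    # One nested struct in the input is written "=(" instead of "=Row("; normalize it.
    text = text.replace("=(", "=Row(")
    # Leaf value: strip the surrounding quotes (None stays the string 'None').
    if not text.startswith('Row'):
        return text.strip("'") if text else None
    # The greedy match captures everything between the outermost parentheses.
    inside = re.findall(r'Row\((.*)\)', text)[0]
    entries = upper_level_split(inside, ',')
    result = []
    for entry in entries:
        key, value = entry.split('=', 1)
        result.append({key.strip(): text_to_dicts(value)})
    return result

def upper_level_split(text, sep):
    # Split on sep only where the parenthesis depth is 0.
    # Quotes are not tracked, so a sep inside a quoted value is also split.
    level, parsed = 0, ['']
    for letter in text:
        if letter == sep and level == 0:
            parsed.append('')
            continue
        if letter == '(':
            level += 1
        if letter == ')':
            level -= 1
        parsed[-1] += letter
    return parsed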
**Output:**
text_to_dicts(line)
# [{'contact': [{'officeAdd': 'None'},
#               {'homeAdd': [{'street': '62 Crown Street'},
#                            {'city': 'London'},
#                            {'country': 'UK'}]
#               },
#               {'phone': [{'mobile': [{'primary': 'XXX-XXX-1234'},
#                                      {'alternate': 'XXX-XXX-1235'}]
#                          }]
#               }]
#  }]
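To illustrate why a plain `str.split(',')` is not enough here, the sketch below compares it with a depth-tracking split on a shortened sample. `split_top_level` is a hypothetical, standalone restatement of the helper above, and the sample string is made up for illustration. Like the original, it does not track quotes, so a comma inside a quoted value would still be split.

```python
def split_top_level(text, sep=','):
    """Split text on sep, but only where the parenthesis depth is zero."""
    depth, parts = 0, ['']
    for ch in text:
        if ch == sep and depth == 0:
            parts.append('')  # start a new top-level entry
            continue
        if ch == '(':
            depth += 1
        elif ch == ')':
            depth -= 1
        parts[-1] += ch
    return parts

# Shortened, made-up sample in the same shape as the question's data.
sample = "officeAdd=None, phone=Row(primary='1234', alternate='1235')"

naive = sample.split(',')        # also cuts inside Row(...)
aware = split_top_level(sample)  # keeps Row(...) in one piece

print(len(naive))  # 3
print(len(aware))  # 2
```

The naive split yields three fragments because it cuts through the nested `Row(...)`, while the depth-tracking version keeps each top-level `key=value` entry intact.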
**Nested dicts to CSV-ready lists:**
def dicts_to_table(mydict, path='', mylist=None):
    # mylist[0] collects the slash-joined paths, mylist[1] the leaf values.
    if mylist is None:
        mylist = [[], []]
    for key, value in mydict.items():
        new_path = path + '/' + key if path else key
        if isinstance(value, list):
            # A list holds the child fields; recurse with the extended path.
            for d in value:
                dicts_to_table(d, new_path, mylist)
        else:
            mylist[0].append(new_path)
            mylist[1].append(value)

    return mylist
**Output:**
dicts_to_table(text_to_dicts(line)[0])
# [['contact/officeAdd', 'contact/homeAdd/street', 'contact/homeAdd/city', 'contact/homeAdd/country', 'contact/phone/mobile/primary', 'contact/phone/mobile/alternate'],
# ['None', '62 Crown Street', 'London', 'UK', 'XXX-XXX-1234', 'XXX-XXX-1235']]
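The two lists above map directly onto a header row and a data row. As a minimal sketch of the final step, they could be written out with the standard `csv` module. The lists are copied from the output above so the snippet runs on its own, and the `io.StringIO` buffer is only a stand-in for a real file (which would be opened with `newline=''`).

```python
import csv
import io

# Header and value rows, copied from the output shown above.
header = ['contact/officeAdd', 'contact/homeAdd/street', 'contact/homeAdd/city',
          'contact/homeAdd/country', 'contact/phone/mobile/primary',
          'contact/phone/mobile/alternate']
values = ['None', '62 Crown Street', 'London', 'UK', 'XXX-XXX-1234', 'XXX-XXX-1235']

buffer = io.StringIO()       # in-memory stand-in for an output file
writer = csv.writer(buffer)  # quotes fields only when they need it
writer.writerows([header, values])

csv_text = buffer.getvalue()
print(csv_text)
```

Using `csv.writer` rather than joining with `','` by hand means a value that itself contains a comma or a quote would be escaped correctly.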
