Problem with pandas code that handles multiple leaf nodes in a nested dictionary

gt0wga4j  posted on 2023-09-29 in Other

The following code:

import pandas as pd

data = [
    [8567, None, None, None],
    [8596, 8595, None, 5033],
    [8576, 8571, None, 447],
    [8576, 8571, -1879674.00, 152],
    [8576, 8571, 2971934.78, 152],
    [8576, 8571, -21044.15, 150],
    [8577, 8571, None, 5047],
    [8574, 8569, 7807810.50, 329688],
    [8575, 8569, None, 3734],
    [8573, 8568, None, 414397],
    [8572, 8568, 12234723.90, 336487],
    [8571, 8567, None, None],
    [8569, 8567, None, None],
    [8568, 8567, None, None],
    [8595, 8567, None, None]]

df = pd.DataFrame(data, columns=["HIERARCHYNODEID", "PARENTNODEID", "HVALUE", "IDs"])
df = df.fillna(0)

def build_dict(df):
    hierarchy_dict = {}
    root_nodes = df[df['PARENTNODEID'] == 0]
    for _, r in root_nodes.iterrows():
        hierarchy_dict[r['HIERARCHYNODEID']] = build_dict_helper(df, r['HIERARCHYNODEID'])
    return hierarchy_dict

def build_dict_helper(df, parent):
    children = df[df['PARENTNODEID'] == parent]
    node = {"HVALUE": 0, "IDs": []}
    child_nodes = {}
    for _, r in children.iterrows():
        child_node = build_dict_helper(df, r['HIERARCHYNODEID'])
        node["HVALUE"] += r["HVALUE"] + child_node["HVALUE"]
        node["IDs"].extend([r["IDs"]] + child_node["IDs"])  
        if r["HIERARCHYNODEID"] in child_nodes:
            existing_child = child_nodes[r["HIERARCHYNODEID"]]
            existing_child["HVALUE"] += r["HVALUE"] + child_node["HVALUE"]
            existing_child["IDs"].extend([r["IDs"]] + child_node["IDs"]) 
        else:
            child_nodes[r['HIERARCHYNODEID']] = child_node
    if child_nodes:
        node.update(child_nodes)
    return node

def create_named_dict(df):
    dct = build_dict(df)
    return dct

result = create_named_dict(df)
print(result)

The result of this code does not match what I need to achieve:
{8567.0: {'HVALUE': 21113751.03, 'IDs': [0.0, 447.0, 152.0, 152.0, 150.0, 5047.0, 0.0, 329688.0, 3734.0, 0.0, 414397.0, 336487.0, 0.0, 5033.0], 8571.0: {'HVALUE': 1071216.63, 'IDs': [447.0, 152.0, 152.0, 150.0, 5047.0], 8576.0: {'HVALUE': 1071216.63, 'IDs': [152, 152, 150]}, 8577.0: {'HVALUE': 0, 'IDs': [5047]}}, 8569.0: {'HVALUE': 7807810.5, 'IDs': [329688.0, 3734.0], 8574.0: {'HVALUE': 7807810.5, 'IDs': [329688]}, 8575.0: {'HVALUE': 0, 'IDs': [3734]}}, 8568.0: {'HVALUE': 12234723.9, 'IDs': [414397.0, 336487.0], 8573.0: {'HVALUE': 0, 'IDs': [414397.0]}, 8572.0: {'HVALUE': 12234723.9, 'IDs': [336487]}}, 8595.0: {'HVALUE': 0.0, 'IDs': [5033.0], 8596.0: {'HVALUE': 0, 'IDs': []}}}}
For example, this is the problem with what my code gives:
8568.0: {'HVALUE': 12234723.9, 'IDs': [414397.0, 336487.0], 8573.0: {'HVALUE': 0, 'IDs': []}, 8572.0: {'HVALUE': 0, 'IDs': []}}
But the expected value is:
8568.0: {'HVALUE': 12234723.9, 'IDs': [414397.0, 336487.0], 8573.0: {'HVALUE': 0, 'IDs': [414397.0]}, 8572.0: {'HVALUE': 12234723.9, 'IDs': [336487]}}
It is important to know that the data here is a small sample; we can have multiple leaf nodes and sub-dictionaries. The idea is to have a generic function that handles all kinds of nested dictionaries. Can anyone help? Thanks.

3wabscal #1

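I implemented what you want (if I understood correctly, a node's HVALUE sum should include its own value, and IDs should list its own ID), simplified the recursive helper a little, and removed the completely useless create_named_dict function: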

def build_dict(df):
    hierarchy_dict = {}
    root_nodes = df[df['PARENTNODEID'] == 0]
    for _, r in root_nodes.iterrows():
        hierarchy_dict[r['HIERARCHYNODEID']] = build_dict_helper(df, r)
    return hierarchy_dict

def build_dict_helper(df, parent):
    children = df[df['PARENTNODEID'] == parent['HIERARCHYNODEID']]
    node = {"HVALUE": parent['HVALUE'], "IDs": [parent['IDs']]}
    for _, r in children.iterrows():
        child_dict = build_dict_helper(df, r)
        node["HVALUE"] +=  child_dict["HVALUE"]
        node["IDs"].extend(child_dict["IDs"])
        existing_values = node.get(r['HIERARCHYNODEID'], {'HVALUE': 0, 'IDs': []})
        child_dict['HVALUE'] += existing_values['HVALUE']
        child_dict['IDs'].extend(existing_values['IDs'])
        node.setdefault(r['HIERARCHYNODEID'], {}).update(child_dict)
    return node

result = build_dict(df)
print(result)

Output:

{8567.0: 
   {'HVALUE': 21113751.03, 
    'IDs': [0.0, 0.0, 447.0, 152.0, 152.0, 150.0, 5047.0, 0.0, 329688.0, 3734.0, 0.0, 414397.0, 336487.0, 0.0, 5033.0], 
    8571.0: 
       {'HVALUE': 1071216.63, 
        'IDs': [0.0, 447.0, 152.0, 152.0, 150.0, 5047.0], 
        8576.0: {'HVALUE': 1071216.63, 'IDs': [150.0, 152.0, 152.0, 447.0]}, 
        8577.0: {'HVALUE': 0.0, 'IDs': [5047.0]}
       }, 
    8569.0: 
       {'HVALUE': 7807810.5, 
        'IDs': [0.0, 329688.0, 3734.0], 
        8574.0: {'HVALUE': 7807810.5, 'IDs': [329688.0]}, 
        8575.0: {'HVALUE': 0.0, 'IDs': [3734.0]}
       }, 
    8568.0: 
       {'HVALUE': 12234723.9, 
        'IDs': [0.0, 414397.0, 336487.0], 
        8573.0: {'HVALUE': 0.0, 'IDs': [414397.0]}, 
        8572.0: {'HVALUE': 12234723.9, 'IDs': [336487.0]}
       }, 
    8595.0: 
       {'HVALUE': 0.0, 
        'IDs': [0.0, 5033.0], 
        8596.0: {'HVALUE': 0.0, 'IDs': [5033.0]}
       }
   }
}
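
The merge of repeated node IDs in the helper above (8576 appears in four rows of the sample data) can be looked at in isolation. The sketch below is only an illustration and not part of the answer: `merge_child` is a hypothetical name, and the four (HVALUE, ID) pairs are the 8576 rows as they look after `fillna(0)`. It suggests why the IDs under 8576 come out in reverse row order.

```python
def merge_child(node, key, child):
    """Fold `child` into node[key]: sum HVALUE, append the IDs collected so far."""
    existing = node.get(key, {"HVALUE": 0, "IDs": []})
    child["HVALUE"] += existing["HVALUE"]
    child["IDs"].extend(existing["IDs"])  # earlier IDs land after the new one
    node.setdefault(key, {}).update(child)


# Hypothetical parent node (8571 before its children are added).
node = {"HVALUE": 0.0, "IDs": [0.0]}

# The four rows with HIERARCHYNODEID 8576, as (HVALUE, IDs) pairs.
rows_8576 = [(0.0, 447.0), (-1879674.00, 152.0), (2971934.78, 152.0), (-21044.15, 150.0)]

for hvalue, id_value in rows_8576:
    merge_child(node, 8576.0, {"HVALUE": hvalue, "IDs": [id_value]})

print(node[8576.0]["IDs"])
print(round(node[8576.0]["HVALUE"], 2))
```

Each new row's ID is placed in front of the IDs merged earlier, which matches the `[150.0, 152.0, 152.0, 447.0]` ordering in the output above.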

hjzp0vay #2

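Your base case seems muddled and wrong. When you construct the initial node in build_dict_helper, you should initialize HVALUE and the IDs list with that node's own two field values; otherwise nodes without children never get populated. As a result, you do not need to keep adding r's values to the child node's values inside the loop; you only need to add the child node's values.
Also, the IDs field is a bit quirky: it seems to contain only the children's values plus your own value, if your own value is not EMPTY.
Since I think both the explanation and the code matter, here is the code; only build_dict and build_dict_helper need to change: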

def build_dict(df):
    hierarchy_dict = {}
    root_nodes = df[df['PARENTNODEID'] == 0]
    for _, r in root_nodes.iterrows():
        hierarchy_dict[r['HIERARCHYNODEID']] = build_dict_helper(df, r) # passing whole 'r'
    return hierarchy_dict

def build_dict_helper(df, parent):
    # Children are the rows whose PARENTNODEID is this node's own ID.
    children = df[df['PARENTNODEID'] == parent['HIERARCHYNODEID']]
    # Assumption: after df.fillna(0) an empty ID shows up as 0, so both
    # NaN and 0 are treated as "no ID of its own" here.
    own_ids = [] if pd.isna(parent["IDs"]) or parent["IDs"] == 0 else [parent["IDs"]]
    # Base case: start from the node's own values.
    node = {"HVALUE": parent["HVALUE"], "IDs": own_ids}
    child_nodes = {}
    for _, r in children.iterrows():
        child_node = build_dict_helper(df, r)  # pass the whole row here as well
        node["HVALUE"] += child_node["HVALUE"]
        node["IDs"].extend(child_node["IDs"])  # extending with [] is a no-op
        if r["HIERARCHYNODEID"] in child_nodes:
            # Same node ID seen again: merge into the existing entry.
            existing_child = child_nodes[r["HIERARCHYNODEID"]]
            existing_child["HVALUE"] += child_node["HVALUE"]
            existing_child["IDs"].extend(child_node["IDs"])
        else:
            child_nodes[r['HIERARCHYNODEID']] = child_node
    if child_nodes:
        node.update(child_nodes)
    return node
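
The base-case point made in this answer can be seen on its own, without pandas. The following is a minimal sketch under stated assumptions, not the answerer's code: the rows are hypothetical plain tuples `(node_id, parent_id, hvalue, id_value)` copied from the 8568 subtree of the sample data, `build` is a made-up name, and repeated node IDs are not merged here.

```python
# Hypothetical plain-tuple rows: (node_id, parent_id, hvalue, id_value).
# Values are taken from the 8568 subtree of the question's sample data.
rows = [
    (8568, 8567, 0.0, 0),
    (8573, 8568, 0.0, 414397),
    (8572, 8568, 12234723.90, 336487),
]


def build(node_row, rows):
    node_id, _parent_id, hvalue, id_value = node_row
    # Base case: a node starts from its OWN values, so a leaf with no
    # children still returns a populated dict.
    node = {"HVALUE": hvalue, "IDs": [id_value] if id_value else []}
    for child_row in rows:
        if child_row[1] == node_id:
            child = build(child_row, rows)
            # Only the child's totals are added; the child already
            # includes its own row values.
            node["HVALUE"] += child["HVALUE"]
            node["IDs"].extend(child["IDs"])
            node[child_row[0]] = child
    return node


subtree = build(rows[0], rows)
print(subtree)
```

With these three rows, the leaves 8573 and 8572 keep their own IDs, which is the shape the question lists as the expected value for 8568.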
