The code below:
import pandas as pd

data = [
    [8567, None, None, None],
    [8596, 8595, None, 5033],
    [8576, 8571, None, 447],
    [8576, 8571, -1879674.00, 152],
    [8576, 8571, 2971934.78, 152],
    [8576, 8571, -21044.15, 150],
    [8577, 8571, None, 5047],
    [8574, 8569, 7807810.50, 329688],
    [8575, 8569, None, 3734],
    [8573, 8568, None, 414397],
    [8572, 8568, 12234723.90, 336487],
    [8571, 8567, None, None],
    [8569, 8567, None, None],
    [8568, 8567, None, None],
    [8595, 8567, None, None]]
df = pd.DataFrame(data, columns=["HIERARCHYNODEID", "PARENTNODEID", "HVALUE", "IDs"])
df = df.fillna(0)

def build_dict(df):
    hierarchy_dict = {}
    root_nodes = df[df['PARENTNODEID'] == 0]
    for _, r in root_nodes.iterrows():
        hierarchy_dict[r['HIERARCHYNODEID']] = build_dict_helper(df, r['HIERARCHYNODEID'])
    return hierarchy_dict

def build_dict_helper(df, parent):
    children = df[df['PARENTNODEID'] == parent]
    node = {"HVALUE": 0, "IDs": []}
    child_nodes = {}
    for _, r in children.iterrows():
        child_node = build_dict_helper(df, r['HIERARCHYNODEID'])
        node["HVALUE"] += r["HVALUE"] + child_node["HVALUE"]
        node["IDs"].extend([r["IDs"]] + child_node["IDs"])
        if r["HIERARCHYNODEID"] in child_nodes:
            existing_child = child_nodes[r["HIERARCHYNODEID"]]
            existing_child["HVALUE"] += r["HVALUE"] + child_node["HVALUE"]
            existing_child["IDs"].extend([r["IDs"]] + child_node["IDs"])
        else:
            child_nodes[r['HIERARCHYNODEID']] = child_node
    if child_nodes:
        node.update(child_nodes)
    return node

def create_named_dict(df):
    dct = build_dict(df)
    return dct

result = create_named_dict(df)
print(result)
The result of this code does not match what I need to achieve:
{8567.0: {'HVALUE': 21113751.03, 'IDs': [0.0, 447.0, 152.0, 152.0, 150.0, 5047.0, 0.0, 329688.0, 3734.0, 0.0, 414397.0, 336487.0, 0.0, 5033.0], 8571.0: {'HVALUE': 1071216.63, 'IDs': [447.0, 152.0, 152.0, 150.0, 5047.0], 8576.0: {'HVALUE': 1071216.63, 'IDs': [152, 152, 150]}, 8577.0: {'HVALUE': 0, 'IDs': [5047]}}, 8569.0: {'HVALUE': 7807810.5, 'IDs': [329688.0, 3734.0], 8574.0: {'HVALUE': 7807810.5, 'IDs': [329688]}, 8575.0: {'HVALUE': 0, 'IDs': [3734]}}, 8568.0: {'HVALUE': 12234723.9, 'IDs': [414397.0, 336487.0], 8573.0: {'HVALUE': 0, 'IDs': [414397.0]}, 8572.0: {'HVALUE': 12234723.9, 'IDs': [336487]}}, 8595.0: {'HVALUE': 0.0, 'IDs': [5033.0], 8596.0: {'HVALUE': 0, 'IDs': []}}}}
For example, this is the problem with what my code gives:
8568.0: {'HVALUE': 12234723.9, 'IDs': [414397.0, 336487.0], 8573.0: {'HVALUE': 0, 'IDs': []}, 8572.0: {'HVALUE': 0, 'IDs': []}}
But the expected value is:
8568.0: {'HVALUE': 12234723.9, 'IDs': [414397.0, 336487.0], 8573.0: {'HVALUE': 0, 'IDs': [414397.0]}, 8572.0: {'HVALUE': 12234723.9, 'IDs': [336487]}}
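It is important to know that the data here is only a small sample: there can be many leaf nodes and sub-dictionaries. The idea is to have a generic function that handles every kind of nested dictionary. Can anyone help? Thanks.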
2 answers
3wabscal1#
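I implemented what you want (if I understand correctly, a node's HVALUE total should include its own value, and IDs should list its own IDs), simplified the recursive helper a bit, and removed the completely useless create_named_dict function:
Test result: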
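Neither the code nor its output appears above, so the following is only a minimal sketch of the behaviour this answer describes, not the answerer's own implementation. It assumes the rows are plain tuples instead of a DataFrame, that None marks a missing value, and that a node spread over several rows contributes the sum of its HVALUE entries. The names build_tree and ROWS are hypothetical, and the data is the 8568 branch from the question.

```python
from collections import defaultdict

# Rows are (node_id, parent_id, hvalue, item_id); None marks a missing value.
# This is the 8568 branch of the sample data, in the question's row order.
ROWS = [
    (8567, None, None, None),
    (8573, 8568, None, 414397),
    (8572, 8568, 12234723.90, 336487),
    (8568, 8567, None, None),
]


def build_tree(rows):
    """Nest nodes so each one holds its own values plus its descendants'."""
    own_value = defaultdict(float)  # node -> sum of its own HVALUE entries
    own_ids = defaultdict(list)     # node -> its own non-empty IDs
    children = defaultdict(list)    # parent -> child ids, first-seen order
    roots = []

    for node_id, parent_id, hvalue, item_id in rows:
        own_value[node_id] += hvalue or 0.0
        if item_id is not None:
            own_ids[node_id].append(item_id)
        if parent_id is None:
            if node_id not in roots:
                roots.append(node_id)
        elif node_id not in children[parent_id]:
            children[parent_id].append(node_id)

    def build(node_id):
        # Start from the node's own values, so leaves are populated too.
        node = {"HVALUE": own_value[node_id], "IDs": list(own_ids[node_id])}
        for child_id in children[node_id]:
            child = build(child_id)
            node["HVALUE"] += child["HVALUE"]   # roll the child's total up
            node["IDs"].extend(child["IDs"])    # and its collected IDs
            node[child_id] = child              # nest the child under its id
        return node

    return {root: build(root) for root in roots}


tree = build_tree(ROWS)
print(tree[8567][8568])
```

On this branch, tree[8567][8568] carries HVALUE 12234723.9 and IDs [414397, 336487], while the leaves 8573 and 8572 keep their own values, which is the expected result stated in the question.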
hjzp0vay2#
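Your base case seems muddled and wrong. When you construct the initial node in build_dict_helper, you should initialize HVALUE and the IDs list with that node's own two field values; otherwise a node with no children never gets populated values. As a consequence, you do not need to keep adding r's values to the child node's values inside the loop; you only need to add the child node's values. Also, the IDs field is a bit quirky: it seems to contain only the children's values plus your own value, and your own value only if it is not EMPTY. Since I think both the explanation and the code matter, here is the code; only build_dict and build_dict_helper had to change: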
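The snippet itself is not included above, so what follows is only a minimal sketch of the changes described, not the answerer's exact code. It assumes that 0 marks an empty value after fillna(0), that a node appearing in several rows contributes the sum of its HVALUE entries, and that the helper receives the node's own id so it can look up its own rows. The data is the 8569 branch from the question.

```python
import pandas as pd

# The 8569 branch of the sample data, in the question's row order.
data = [
    [8567, None, None, None],
    [8574, 8569, 7807810.50, 329688],
    [8575, 8569, None, 3734],
    [8569, 8567, None, None],
]
df = pd.DataFrame(data, columns=["HIERARCHYNODEID", "PARENTNODEID", "HVALUE", "IDs"])
df = df.fillna(0)


def build_dict(df):
    # Roots are the rows with no parent (0 after fillna).
    roots = df[df["PARENTNODEID"] == 0]
    return {
        int(node_id): build_dict_helper(df, node_id)
        for node_id in roots["HIERARCHYNODEID"].unique()
    }


def build_dict_helper(df, node_id):
    # Initialize the node from its own rows, so leaves are populated too.
    own = df[df["HIERARCHYNODEID"] == node_id]
    node = {
        "HVALUE": float(own["HVALUE"].sum()),
        # Assumption: an ID of 0 means "empty" and is skipped.
        "IDs": [float(i) for i in own["IDs"] if i != 0],
    }
    children = df[df["PARENTNODEID"] == node_id]
    # unique() keeps first-seen order and visits a repeated child id once.
    for child_id in children["HIERARCHYNODEID"].unique():
        child = build_dict_helper(df, child_id)
        node["HVALUE"] += child["HVALUE"]   # only the child's total is added
        node["IDs"].extend(child["IDs"])
        node[int(child_id)] = child
    return node


result = build_dict(df)
print(result[8567][8569])
```

Because each node starts from its own row values, the loop no longer touches r at all, and the membership check on child_nodes from the original code is no longer needed.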