python-3.x: Group and aggregate a list of nested dictionaries by multiple keys

lg40wkob posted on 2022-12-20 in Python

I have a list of dictionaries (this is just an example; the real list is larger):

my_list = [
{'metric': {'account': '1', 'email_domain': 'gmail.com', 'version': 'a'}, 
 'values': [[1671256800, '100'], [1671260400, '100']]},
{'metric': {'account': '1', 'email_domain': 'gmail.com', 'version': 'a'}, 
 'values': [[1671256800, '100'], [1671260400, '100']]},
{'metric': {'account': '1', 'email_domain': 'gmail.com', 'version': 'b'}, 
 'values': [[1671256800, '300'], [1671260400, '300']]},
{'metric': {'account': '1', 'email_domain': 'gmail.com', 'version': 'b'}, 
 'values': [[1671256800, '300'], [1671260400, '300']]}]

I want to sum the values (per timestamp) for each combination of account, email_domain and version, and update my_list accordingly:

my_list = [
{'metric': {'account': '1', 'email_domain': 'gmail.com', 'version': 'a'}, 
 'values': [[1671256800, '200'], [1671260400, '200']]},
{'metric': {'account': '1', 'email_domain': 'gmail.com', 'version': 'b'}, 
 'values': [[1671256800, '600'], [1671260400, '600']]}]

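Notes:

  • In 'values': [[1671256800, '600'], [1671260400, '600']], the first element of each inner list is a timestamp (1671256800, 1671260400).
  • Before posting this question I searched through many threads on this site, but I could not find the right syntax for this use case with a list of nested dictionaries. Any help is much appreciated!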
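Assuming the timestamps are Unix epoch seconds (their magnitude suggests so), they decode to consecutive hours, so each [timestamp, value] pair is one hourly sample and the aggregation should add up values that share a timestamp. A quick check:

```python
from datetime import datetime, timezone

# Assumption: the first element of each pair is a Unix timestamp in seconds (UTC).
samples = [[1671256800, '600'], [1671260400, '600']]

decoded = [datetime.fromtimestamp(ts, tz=timezone.utc) for ts, _ in samples]
for dt in decoded:
    print(dt.isoformat())
# 2022-12-17T06:00:00+00:00
# 2022-12-17T07:00:00+00:00
```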

I tried to follow group-and-aggregate-a-list-of-dictionaries-by-multiple-keys and started with:

d = (pd.DataFrame(my_list)).groupby(['metric']['ebs_account'], ['metric']['version']).values.
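Note that the attempt above refers to an 'ebs_account' key, which does not exist in the sample data (the key is 'account'). Outside pandas, the same multi-key grouping can be done with itertools.groupby once the list is sorted on the nested key. A minimal standard-library sketch, where `metric_key` is a hypothetical helper name:

```python
from itertools import groupby

my_list = [
    {'metric': {'account': '1', 'email_domain': 'gmail.com', 'version': 'a'},
     'values': [[1671256800, '100'], [1671260400, '100']]},
    {'metric': {'account': '1', 'email_domain': 'gmail.com', 'version': 'a'},
     'values': [[1671256800, '100'], [1671260400, '100']]},
    {'metric': {'account': '1', 'email_domain': 'gmail.com', 'version': 'b'},
     'values': [[1671256800, '300'], [1671260400, '300']]},
    {'metric': {'account': '1', 'email_domain': 'gmail.com', 'version': 'b'},
     'values': [[1671256800, '300'], [1671260400, '300']]}]

FIELDS = ("account", "email_domain", "version")

def metric_key(record):
    """Hypothetical helper: the (account, email_domain, version) tuple of a record."""
    return tuple(record["metric"][f] for f in FIELDS)

result = []
# itertools.groupby only merges *adjacent* items, so sort on the same key first.
for key, group in groupby(sorted(my_list, key=metric_key), key=metric_key):
    totals = {}  # timestamp -> running integer sum (dicts keep insertion order)
    for record in group:
        for ts, value in record["values"]:
            totals[ts] = totals.get(ts, 0) + int(value)
    result.append({
        "metric": dict(zip(FIELDS, key)),
        "values": [[ts, str(total)] for ts, total in totals.items()],
    })

print(result)
```

Sorting costs O(n log n); the dictionary-based answers below avoid the sort and also keep the groups in first-seen order.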

vddsk6oq1#

You can use defaultdict together with frozenset:

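from collections import defaultdict

# A dict is unhashable, so a frozenset of its items serves as the group key.
group_dict = defaultdict(dict)

for record in my_list:
    key = frozenset(record['metric'].items())
    for ts, value in record['values']:
        # Sum the integer values per timestamp within each group.
        group_dict[key][ts] = group_dict[key].get(ts, 0) + int(value)

# Rebuild the original structure, turning the sums back into strings.
# frozenset is unordered, so the key order inside 'metric' may differ between runs.
res = [{'metric': dict(key), 'values': [[ts, str(total)] for ts, total in sums.items()]}
       for key, sums in group_dict.items()]

print(res)

Output:
[{'metric': {'account': '1', 'email_domain': 'gmail.com', 'version': 'a'},
  'values': [[1671256800, '200'], [1671260400, '200']]},
 {'metric': {'account': '1', 'email_domain': 'gmail.com', 'version': 'b'},
  'values': [[1671256800, '600'], [1671260400, '600']]}]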
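The frozenset is needed because a dict cannot be used as a dictionary key (it is unhashable), while a frozenset of its items can. It also ignores key order, so two metric dicts with the same contents land in the same group. This only works while every value inside 'metric' is itself hashable (plain strings here). A small demonstration, where `is_hashable` is a hypothetical helper:

```python
def is_hashable(obj):
    """Hypothetical helper: return True if obj can be used as a dict key."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True

a = {'account': '1', 'email_domain': 'gmail.com', 'version': 'a'}
b = {'version': 'a', 'account': '1', 'email_domain': 'gmail.com'}  # same items, different order

print(is_hashable(a))                     # False: dicts are mutable, hence unhashable
print(is_hashable(frozenset(a.items())))  # True: the items are tuples of strings

groups = {frozenset(a.items()): 'group a'}
print(groups[frozenset(b.items())])       # group a: key order does not matter
```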

dfddblmv2#

Here you go:

import pandas as pd

my_list = [
{'metric': {'account': '1', 'email_domain': 'gmail.com', 'version': 'a'}, 
 'values': [[1671256800, '100'], [1671260400, '100']]},
{'metric': {'account': '1', 'email_domain': 'gmail.com', 'version': 'a'}, 
 'values': [[1671256800, '100'], [1671260400, '100']]},
{'metric': {'account': '1', 'email_domain': 'gmail.com', 'version': 'b'}, 
 'values': [[1671256800, '300'], [1671260400, '300']]},
{'metric': {'account': '1', 'email_domain': 'gmail.com', 'version': 'b'}, 
 'values': [[1671256800, '300'], [1671260400, '300']]}]

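# Group on all three metric fields, as the question requires.
merged = {}
for item in my_list:
    m = item['metric']
    key = (m['account'], m['email_domain'], m['version'])
    if key not in merged:
        # Copy the record so that += below does not modify my_list itself.
        merged[key] = {'metric': dict(m), 'values': list(item['values'])}
    else:
        merged[key]['values'] += item['values']

output = []
for entry in merged.values():
    # One row per [timestamp, value] pair; sum the values per timestamp.
    df = pd.DataFrame(entry['values'], columns=['ts', 'value'])
    df['value'] = df['value'].astype(int)
    summed = df.groupby('ts', sort=False)['value'].sum()
    # Convert the numpy scalars back to a plain int timestamp and a string value.
    entry['values'] = [[int(ts), str(v)] for ts, v in summed.items()]
    output.append(entry)

print(output)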
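A pitfall when merging records this way: if a record from my_list is stored as-is and its 'values' list is then extended with +=, the change also shows up in my_list, because both names refer to the same list object. A minimal demonstration with toy records (`make_records` is a hypothetical helper), plus a copy as the fix:

```python
def make_records():
    """Hypothetical helper: two toy records shaped like the question's data."""
    return [{'values': [[1, '100']]}, {'values': [[1, '100']]}]

# Aliased: merged['x'] *is* aliased_input[0], so += extends its 'values' in place.
aliased_input = make_records()
merged = {'x': aliased_input[0]}
merged['x']['values'] += aliased_input[1]['values']
print(len(aliased_input[0]['values']))  # 2: the input itself was modified

# Copying the record and its 'values' list first leaves the input untouched.
# A shallow copy is enough because the inner [ts, value] pairs are never mutated.
copied_input = make_records()
safe = {'x': {**copied_input[0], 'values': list(copied_input[0]['values'])}}
safe['x']['values'] += copied_input[1]['values']
print(len(copied_input[0]['values']))   # 1: the input is unchanged
```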

z9smfwbn3#

Try this:

from collections import Counter

out = {}
for d in my_list:
    a, e, vr = (
        d["metric"]["account"],
        d["metric"]["email_domain"],
        d["metric"]["version"],
    )

    for t, v in d["values"]:
        out.setdefault((a, e, vr), Counter())[t] += int(v)

out = [
    {
        "metric": {"account": a, "email_domain": e, "version": vr},
        "values": [[kk, str(vv)] for kk, vv in v.items()],
    }
    for (a, e, vr), v in out.items()
]

print(out)

Prints:

[
    {
        "metric": {"account": "1", "email_domain": "gmail.com", "version": "a"},
        "values": [[1671256800, "200"], [1671260400, "200"]],
    },
    {
        "metric": {"account": "1", "email_domain": "gmail.com", "version": "b"},
        "values": [[1671256800, "600"], [1671260400, "600"]],
    },
]
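If the set of grouping fields may change, the same Counter approach can take them as a parameter instead of unpacking three fixed names. A sketch, where `aggregate_by` and its `keys` argument are hypothetical names:

```python
from collections import Counter

def aggregate_by(records, keys):
    """Hypothetical helper: sum 'values' per timestamp for each combination of the given metric keys."""
    totals = {}
    for rec in records:
        group = tuple(rec["metric"][k] for k in keys)
        counter = totals.setdefault(group, Counter())
        for ts, value in rec["values"]:
            counter[ts] += int(value)  # Counter returns 0 for unseen timestamps
    return [
        {
            "metric": dict(zip(keys, group)),
            "values": [[ts, str(total)] for ts, total in counter.items()],
        }
        for group, counter in totals.items()
    ]

my_list = [
    {'metric': {'account': '1', 'email_domain': 'gmail.com', 'version': 'a'},
     'values': [[1671256800, '100'], [1671260400, '100']]},
    {'metric': {'account': '1', 'email_domain': 'gmail.com', 'version': 'a'},
     'values': [[1671256800, '100'], [1671260400, '100']]},
    {'metric': {'account': '1', 'email_domain': 'gmail.com', 'version': 'b'},
     'values': [[1671256800, '300'], [1671260400, '300']]},
    {'metric': {'account': '1', 'email_domain': 'gmail.com', 'version': 'b'},
     'values': [[1671256800, '300'], [1671260400, '300']]}]

print(aggregate_by(my_list, ("account", "email_domain", "version")))
print(aggregate_by(my_list, ("account",)))  # versions a and b collapse into one group
```

Passing ("account",) alone sums across versions and domains, which gives 800 per timestamp for this sample.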
