pandas: converting a dictionary of tuples and float values to a DataFrame

hl0ma9xz  posted on 2023-02-11 in Other

I have data in the form dict[tuple[str, str], list[float]] and want to convert it to a pandas DataFrame.
Sample data: {("A","B"): [-0.008035100996494293, 0.008541940711438656]}
I tried some data manipulation using split functions. Expected:-

lymnna71 1#

import pandas as pd
data = {('A','B'): [-0.008035100996494293,0.008541940711438656], ('C','D'): [-0.008035100996494293,0.008541940711438656]}
# One list per output column
title = []
heading = []
num_col1 = []
num_col2 = []
for key, val in data.items():
    # key is the (title, heading) tuple, val is the list of two floats
    title.append(key[0])
    heading.append(key[1])
    num_col1.append(val[0])
    num_col2.append(val[1])
data_ = {'title':title, 'heading':heading, 'num_col1':num_col1, 'num_col2':num_col2}
pd.DataFrame(data_)

ergxz8rk 2#

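The best approach is to build the index by hand. Because the dictionary keys are stored as tuples, we can use pandas.MultiIndex.from_tuples; after that we only need to put the dictionary's values into the body of the DataFrame.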

import pandas as pd

data = {('A','B'): [-0.008035100996494293,0.008541940711438656]}
index = pd.MultiIndex.from_tuples(data.keys(), names=['title', 'heading'])
df = pd.DataFrame(data.values(), index=index).reset_index()

print(df)
  title heading         0         1
0     A       B -0.008035  0.008542

If you prefer method chaining, you can do the following:

import pandas as pd

data = {('A','B'): [-0.008035100996494293,0.008541940711438656]}
df = (
    pd.DataFrame.from_dict(data, orient='index')
    .pipe(lambda d:
        d.set_axis(pd.MultiIndex.from_tuples(d.index, names=['title', 'heading']))
    )
    .reset_index()
)

print(df)
  title heading         0         1
0     A       B -0.008035  0.008542
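
Both variants above leave the value columns labelled 0 and 1. As a minimal sketch, assuming you want the num_col1/num_col2 labels used in answer 1 (those names are an illustrative choice, not something the question specifies), the labels can be passed via columns= when the frame is built. The data is the two-key sample from answer 1.

```python
import pandas as pd

# Sample data from answer 1: tuple keys, two floats per key.
data = {('A', 'B'): [-0.008035100996494293, 0.008541940711438656],
        ('C', 'D'): [-0.008035100996494293, 0.008541940711438656]}

# Build the two-level index from the tuple keys.
index = pd.MultiIndex.from_tuples(list(data.keys()), names=['title', 'heading'])

# Name the value columns up front; these labels are an assumption.
df = pd.DataFrame(list(data.values()), index=index,
                  columns=['num_col1', 'num_col2']).reset_index()

print(df)
```

Wrapping the keys and values in list() is not strictly required here, but it avoids depending on how a given pandas version treats dictionary views.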

omjgkv6w 3#

Another possible solution, which also works when the tuples and lists have different lengths:

pd.concat([pd.DataFrame.from_records([x for x in d.keys()],
                                     columns=['title', 'h1', 'h2']),
           pd.DataFrame.from_records([x[1] for x in d.items()])], axis=1)

Output:

  title h1    h2         0         1    2
0     A  B  None -0.008035  0.008542  NaN
1     C  B     D -0.010351  1.008542  5.0

Input data:

d = {('A','B'): [-0.008035100996494293,0.008541940711438656],
     ('C','B', 'D'): [-0.01035100996494293,1.008541940711438656, 5]}
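
The snippet above hardcodes three key columns. As a rough sketch, assuming the longest key is not known in advance, the key column names can be derived from the data instead. The h1, h2, ... naming simply continues the pattern used above, and max_key_len and key_cols are hypothetical helper names.

```python
import pandas as pd

d = {('A', 'B'): [-0.008035100996494293, 0.008541940711438656],
     ('C', 'B', 'D'): [-0.01035100996494293, 1.008541940711438656, 5]}

# Length of the longest key tuple decides how many key columns are needed.
max_key_len = max(len(key) for key in d)
key_cols = ['title'] + [f'h{i}' for i in range(1, max_key_len)]

# Shorter tuples and lists are padded with missing values by pandas.
keys = pd.DataFrame(list(d.keys()), columns=key_cols)
values = pd.DataFrame(list(d.values()))

out = pd.concat([keys, values], axis=1)
print(out)
```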

6kkfgxo0 4#

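You can unpack the keys and values while iterating over the dictionary items. pandas then sees four values per entry and places them in a single row.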

>>> import pandas as pd
>>> data = {('A','B'): [-0.008035100996494293,0.008541940711438656]}
>>> pd.DataFrame(((*d[0], *d[1]) for d in data.items()), columns=("Title", "Heading", "Foo", "Bar"))
  Title Heading       Foo       Bar
0     A       B -0.008035  0.008542
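
To make the unpacking step visible on its own, here is a small sketch that builds the rows as plain tuples before handing anything to pandas. The names key, values and rows are only illustrative.

```python
data = {('A', 'B'): [-0.008035100996494293, 0.008541940711438656]}

# Each dictionary item becomes one flat tuple: the key parts followed by the list values.
rows = [(*key, *values) for key, values in data.items()]

print(rows)
```

Each tuple in rows has four elements, which is why four column names are passed to pd.DataFrame above.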
