Convert a multi-header CSV to a nested dictionary in Python

Posted by sirbozc5 in Python, 10 months ago

The contents of my CSV file are:

[user@localhost ~]$ cat tempcsv.csv
info,info,auth,req,req
name,desc,username,key1,key2
a,alphabet,admin,1,team

Here, the CSV has two header rows.
Reading the CSV with pandas:

>>> import pandas as pd
>>> pd.read_csv('tempcsv.csv', sep=',', header=[0,1], encoding = "utf-8-sig", skipinitialspace=True, tupleize_cols=True)
  (info, name) (info, desc) (auth, username)  (req, key1) (req, key2)
0            a     alphabet            admin            1        team
>>> df = pd.read_csv('tempcsv.csv', sep=',', header=[0,1], encoding = "utf-8-sig", skipinitialspace=True, tupleize_cols=True)
>>> df.to_dict()
{(u'req', u'key1'): {0: 1}, (u'req', u'key2'): {0: u'team'}, (u'info', u'name'): {0: u'a'}, (u'auth', u'username'): {0: u'admin'}, (u'info', u'desc'): {0: u'alphabet'}}
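The tupleize_cols argument above belongs to old pandas releases; it was later deprecated and no longer exists in recent versions. A minimal sketch for current pandas, assuming the same file contents are supplied through io.StringIO: header=[0, 1] on its own produces a two-level column MultiIndex.

```python
from io import StringIO

import pandas as pd

csv_data = """info,info,auth,req,req
name,desc,username,key1,key2
a,alphabet,admin,1,team"""

# header=[0, 1] combines the first two rows into a two-level column MultiIndex;
# no tupleize_cols argument is needed in current pandas
df = pd.read_csv(StringIO(csv_data), header=[0, 1], skipinitialspace=True)

print(list(df.columns))
# [('info', 'name'), ('info', 'desc'), ('auth', 'username'), ('req', 'key1'), ('req', 'key2')]
```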

Now, when I try to convert the DataFrame to a dict, I get a list of dicts whose keys are tuples:

>>> df.to_dict('records')
[{(u'req', u'key1'): 1, (u'req', u'key2'): u'team', (u'info', u'name'): u'a', (u'auth', u'username'): u'admin', (u'info', u'desc'): u'alphabet'}]

When I try to convert them into a proper nested dictionary:

>>> result = []
>>> row_data = {}
>>> for row in df.to_dict('records'):
...     for key,value in row.iteritems():
...             row_data.setdefault(key[0], {})[key[1]] = value
...
>>> row_data
{u'info': {u'name': u'a', u'desc': u'alphabet'}, u'req': {u'key2': u'team', u'key1': 1}, u'auth': {u'username': u'admin'}}

This is my expected output:

{u'info': {u'name': u'a', u'desc': u'alphabet'}, u'req': {u'key2': u'team', u'key1': 1}, u'auth': {u'username': u'admin'}}
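The loop above is Python 2 code (dict.iteritems()), and because row_data is created once outside the loop, every row would write into the same dict while result is never filled. A sketch of the same idea for Python 3 that builds one nested dict per row; the second data row is hypothetical, added only to show multi-row handling.

```python
from io import StringIO

import pandas as pd

# The second data row is hypothetical, added only to show multi-row handling
csv_data = """info,info,auth,req,req
name,desc,username,key1,key2
a,alphabet,admin,1,team
b,beta,guest,2,other"""

df = pd.read_csv(StringIO(csv_data), header=[0, 1])

result = []
for record in df.to_dict("records"):
    row_data = {}  # a fresh dict per row, so later rows don't overwrite earlier ones
    # Python 3 dicts have .items(); .iteritems() only existed on Python 2
    for (outer, inner), value in record.items():
        row_data.setdefault(outer, {})[inner] = value
    result.append(row_data)

print(result)
```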

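So my question is: is there a more direct way in Python to convert the DataFrame into this nested dict?
Alternatively, is there a way to do the same thing with the csv module?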
Pandas version:

>>> import pandas as pd
>>> pd.__version__
'0.14.1'
Answer #1

Tested with pandas 2.0.3:

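import pandas as pd
from io import StringIO

csv_data = """info,info,auth,req,req
name,desc,username,key1,key2
a,alphabet,admin,1,team"""

csv_stream = StringIO(csv_data)
# header=[0, 1] reads both header rows into a two-level column MultiIndex
df = pd.read_csv(csv_stream, header=[0, 1])
# the columns are already a MultiIndex; this line only makes that explicit
df.columns = pd.MultiIndex.from_tuples(df.columns)

formatted_dict = {}
# only the first record ([0]) is converted; each key is an (outer, inner) tuple
for (outer_key, inner_key), value in df.to_dict(orient='records')[0].items():
    formatted_dict.setdefault(outer_key, {})[inner_key] = value

print(formatted_dict)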

Output:
{'info': {'name': 'a', 'desc': 'alphabet'}, 'auth': {'username': 'admin'}, 'req': {'key1': 1, 'key2': 'team'}}
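The answer above converts only the first row. As a sketch of an alternative (not the answerer's code), the MultiIndex can do the grouping itself: indexing a row by a first-level label such as "info" returns a Series keyed by the second-level names, so every row can be converted without unpacking tuples by hand. nested_rows and top_levels are illustrative names.

```python
from io import StringIO

import pandas as pd

csv_data = """info,info,auth,req,req
name,desc,username,key1,key2
a,alphabet,admin,1,team"""

df = pd.read_csv(StringIO(csv_data), header=[0, 1])

# unique first-level labels, in the order they appear: info, auth, req
top_levels = df.columns.get_level_values(0).unique()

nested_rows = [
    # row is a Series indexed by the column MultiIndex; row[top] drops the
    # first level and leaves a Series keyed by the second-level names
    {top: row[top].to_dict() for top in top_levels}
    for _, row in df.iterrows()
]

print(nested_rows)
```

iterrows() is convenient here but slow on large frames; for big files the plain records loop is likely the better choice.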

Answer #2

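I don't think pandas can parse a CSV like this straight into that structure, but you can parse the data yourself with the built-in csv module, for example: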

import csv
import collections

# on Python 2.x use: open("tempcsv.csv", "rb")
with open("tempcsv.csv", "r", newline="") as f:
    reader = csv.reader(f)  # create a CSV reader
    header = next(reader)  # collect the primary header
    subheader = next(reader)  # collect the subheader
    rows = []
    for row in reader:  # iterate over the rest of the CSV file
        parsed_row = collections.defaultdict(dict)  # missing primary keys start as empty dicts
        # pair each primary header field with its subheader field and the cell value
        for top, sub, value in zip(header, subheader, row):
            parsed_row[top][sub] = value
        rows.append(dict(parsed_row))  # store a plain dict rather than the defaultdict

This builds a list, rows, containing every data row keyed by the fused headers. For your data, a row looks like this:

{'info': {'name': 'a', 'desc': 'alphabet'},
 'auth': {'username': 'admin'},
 'req': {'key1': '1', 'key2': 'team'}}
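To try this without a file on disk, the same logic can be wrapped in a function and fed an io.StringIO. This is a sketch; parse_two_header_csv is a hypothetical name, and it uses dict.setdefault in place of defaultdict. Note that the csv module returns every cell as a string, so key1 comes back as '1' rather than 1.

```python
import csv
from io import StringIO


def parse_two_header_csv(f):
    """Hypothetical helper: parse a CSV whose first two rows are a primary
    header and a subheader into a list of nested dicts, one per data row."""
    reader = csv.reader(f)
    header = next(reader)     # primary header row
    subheader = next(reader)  # subheader row
    rows = []
    for row in reader:
        nested = {}
        for top, sub, value in zip(header, subheader, row):
            nested.setdefault(top, {})[sub] = value
        rows.append(nested)
    return rows


sample = StringIO("info,info,auth,req,req\nname,desc,username,key1,key2\na,alphabet,admin,1,team\n")
rows = parse_two_header_csv(sample)
print(rows)
```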
