pandas 如何转换JSON文件并将其转换为CSV,使用Dataframes保留标题

yvt65v4c  于 2023-01-19  发布在  其他
关注(0)|答案(2)|浏览(145)

我有一个.gz类型的文件,其中包含JSON对象,如下所示:
input:

{ "name":"John", "age":21, "gender":"male" }
{ "name":"Mike", "age":29, "gender":"male" }
{ "name":"Tim", "age":20, "gender":"male" }
{ "name":"Kim", "age":39, "gender":"female" }

注意:注意每个JSON obj的末尾没有逗号。
我使用以下命令将其保存到 Dataframe :

import pandas as pd
data_location = 's3://myBucket/myFolder'
raw_json_data = pd.read_json(data_location, lines=True)
raw_json_data.head(2)

问:我想把它转换成CSV,可能是这样的:
expected output:

name, age, gender
John, 21, male
Mike, 29, male
Tim, 20, male
Kim, 39, female

我使用了这个,但是没有工作以给予预期的输出-我错过了什么吗?

df=pd.read_json(raw_json_data)
df.to_csv('results.csv')
fruv7luv

fruv7luv1#

首先,您可以创建包含一列字典的 Dataframe

import json
from io import StringIO

df = pd.read_csv(StringIO("""
{ "name":"John", "age":21, "gender":"male" }
{ "name":"Mike", "age":29, "gender":"male" }
{ "name":"Tim", "age":20, "gender":"male" }
{ "name":"Kim", "age":39, "gender":"female" } 
"""), delimiter='|', header=None)  # instead of StringIO part, you can have the path of input file

df    
                 0
0   { "name":"John", "age":21, "gender":"male" }
1   { "name":"Mike", "age":29, "gender":"male" }
2   { "name":"Tim", "age":20, "gender":"male" }
3   { "name":"Kim", "age":39, "gender":"female" }

您可以使用json_normalize将各个字典转换为 Dataframe

def func(x):
    result = pd.json_normalize(json.loads(x.iloc[0]))
    return result

result = df.apply(func, axis=1)
result
0       name  age gender
0  John  21   male 
1       name  age gender
0  Mike  29   male 
2      name  age gender
0  Tim  20   male   
3      name  age  gender
0  Kim  39   female
dtype: object

上述输出将是一系列 Dataframe ,要将其转换为单个 Dataframe ,您可以执行以下操作

pd.concat([r for r in result], ignore_index=True)

    name    age gender
0   John    21  male
1   Mike    29  male
2   Tim     20  male
3   Kim     39  female
7eumitmz

7eumitmz2#

    • 我有一个.gz类型的文件,其中有JSON对象 * 被认为是指有一个.gz文件,其中有一个.json文件。
  • 使用pathlib方法读入文件,然后将行拆分为liststrings
  • Path('test.json')'test.json()'可以是文件的路径(如果文件位于其他目录中)。
  • 使用ast.literal_evalstrings转换为dicts
import pandas as pd
from pathlib import Path
from ast import literal_eval

# read the file in using the pathlib methods
text = Path('test.json').read_text().split('\n')

# map the strings to dicts
text = map(literal_eval, text)

# load the list of dicts into a dataframe
df = pd.DataFrame(text)

# save to a csv
df.to_csv('results.csv', index=False)

.gz文件读取

  • 使用json模块阅读这些行会出现问题,因为数据不是正确格式的.json文件。
import gzip
import pandas as pd
from ast import literal_eval

# open the gzip file
with gzip.open('testing.json.gz', 'rt', encoding='UTF-8') as zipfile:
    data = [literal_eval(v.strip()) for v in zipfile]

# create the dataframe
df = pd.DataFrame(data)

# save to a csv
df.to_csv('results.csv', index=False)

相关问题