Loading nested JSON into a Pandas DataFrame and processing the data

Posted by xxhby3vn on 2022-12-20 in Other

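I want to load the nested JSON below into a Pandas DataFrame without duplicate values. I have written a Python script that puts the data into a DataFrame, but I'm having trouble removing the duplicate values in the inner lists.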

{
   "status" : "success",
   "data" : {
      "resultType" : "matrix",
      "result" : [
         {
            "metric" : {
               "__name__" : "up",
               "job" : "prometheus",
               "instance" : "localhost:9090"
            },
            "values" : [
               [ 1435781430.781, "1" ],
               [ 1435781445.781, "1" ],
               [ 1435781460.781, "1" ]
            ]
         },
         {
            "metric" : {
               "__name__" : "up",
               "job" : "node",
               "instance" : "localhost:9091"
            },
            "values" : [
               [ 1435781430.781, "0" ],
               [ 1435781445.781, "0" ],
               [ 1435781460.781, "1" ]
            ]
         }
      ]
   }
}

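This is the JSON file. Because the value repeats throughout the first "values" list, I only want to get this entry from it:

[ 1435781430.781, "1" ]

From the second list I only want to get these two rows: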

[ 1435781430.781, "0" ],
[ 1435781460.781, "1" ]
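
Stated as a rule, the desired output keeps, for each series, only the first sample of every distinct value. Here is a minimal plain-Python sketch of that reading. The helper name `first_per_value` is hypothetical, and it assumes "duplicate" means "same value string within one series":

```python
def first_per_value(samples):
    """Keep the first [timestamp, value] pair for each distinct value."""
    seen = set()
    kept = []
    for timestamp, value in samples:
        if value not in seen:
            seen.add(value)
            kept.append([timestamp, value])
    return kept


# The two "values" lists from the JSON above.
prometheus = [[1435781430.781, "1"], [1435781445.781, "1"], [1435781460.781, "1"]]
node = [[1435781430.781, "0"], [1435781445.781, "0"], [1435781460.781, "1"]]

print(first_per_value(prometheus))  # [[1435781430.781, '1']]
print(first_per_value(node))        # [[1435781430.781, '0'], [1435781460.781, '1']]
```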

This is the Python script I'm using, but it only gives the first two rows:

import pandas as pd
import json

response = """
{
   "status" : "success",
   "data" : {
      "resultType" : "matrix",
      "result" : [
         {
            "metric" : {
               "__name__" : "up",
               "job" : "prometheus",
               "instance" : "localhost:9090"
            },
            "values" : [
               [ 1435781430.781, "1" ],
               [ 1435781445.781, "1" ],
               [ 1435781460.781, "1" ]
            ]
         },
         {
            "metric" : {
               "__name__" : "up",
               "job" : "node",
               "instance" : "localhost:9091"
            },
            "values" : [
               [ 1435781430.781, "0" ],
               [ 1435781445.781, "0" ],
               [ 1435781460.781, "1" ]
            ]
         }
      ]
   }
}
"""

data_1 = json.loads(response)

df = pd.DataFrame(data_1["data"]["result"])
df['status'] = data_1['status']
df['resultType'] = data_1['data']['resultType']
df['value'] = data_1
df = pd.concat([df['status'],df['resultType'], df.pop("metric").apply(pd.Series), df.pop("value").apply(pd.Series)], axis=1)

print(df)

Answer #1, by lyfkaqu1

You can try something like this:

# data_1 is the dict parsed with json.loads(response) in the question
df = pd.json_normalize(data_1['data']['result'])
# Turn each item of the "values" list into its own row
df = df.explode("values")
# Each "values" cell is still a [timestamp, value] list; split it into two columns
# so the second element can be used to find duplicates.
df[['values', 'value1']] = df['values'].to_list()
# Drop duplicates; you might want to check that this is the correct subset for dropping
df = df.drop_duplicates(subset=["metric.instance", "metric.job", "value1"])

Result:

         values metric.__name__  metric.job metric.instance value1
0  1.435781e+09              up  prometheus  localhost:9090      1
1  1.435781e+09              up        node  localhost:9091      0
1  1.435781e+09              up        node  localhost:9091      1
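
As the comment on the last step hints, `drop_duplicates` keeps only the first occurrence of each value per series, so a series that goes "1", "0", "1" would lose its final "1". If the goal is instead to keep every point where the value changes, one possible sketch compares each sample with the previous one in the same series. The column names below and the second series ("demo" on localhost:9092) are invented for illustration and are not part of the original data:

```python
import pandas as pd

# Same shape as data_1["data"]["result"] in the question.
# The "demo" series is hypothetical: its value flips back from "0" to "1".
result = [
    {"metric": {"__name__": "up", "job": "node", "instance": "localhost:9091"},
     "values": [[1435781430.781, "0"], [1435781445.781, "0"], [1435781460.781, "1"]]},
    {"metric": {"__name__": "up", "job": "demo", "instance": "localhost:9092"},
     "values": [[1.0, "1"], [2.0, "0"], [3.0, "1"]]},
]

# One row per sample, with the series labels repeated on every row.
rows = [
    {"job": s["metric"]["job"], "instance": s["metric"]["instance"],
     "timestamp": ts, "value": val}
    for s in result
    for ts, val in s["values"]
]
df = pd.DataFrame(rows)

# Value of the previous sample within the same series (NaN for the first sample).
previous = df.groupby(["job", "instance"])["value"].shift()

# Keep a row when its value differs from the previous one in its series.
changes = df[df["value"] != previous].reset_index(drop=True)

# For comparison: the approach from the answer, first occurrence per value only.
first_only = df.drop_duplicates(subset=["job", "instance", "value"])

print(changes)
print(len(changes), len(first_only))  # 5 4
```

For the two series in the question both approaches give the same rows; they differ only when a value reappears later in the same series.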
