pandas 解析深度嵌套的JSON文件

kzipqqlq  于 2023-08-01  发布在  其他
关注(0)|答案(3)|浏览(101)

我很难用json_normalize获取我需要的信息。我已经看过文档和大概10个深嵌套JSON文件的示例,但是我不能很好地掌握函数的上下文来提取正确的信息。我试图构建一个 Dataframe ,其中包含每个传感器的时间戳值(值键)。1534023900是UTC秒的时间戳。
enter image description here
下面是JSON的一个简短示例。
有什么想法吗

{
    "created": "2020-05-12T15:10:37Z",
    "device": {
        "device_info": {
            "device_fw": 204,
            "device_sn": "06-02133",
            "device_trait": 2,
            "device_type": 190
        },
        "timeseries": [
            {
                "configuration": {
                    "sensors": [
                        {
                            "measurements": [
                                "BATTERY",
                                "BATTERY_MV"
                            ],
                            "port": 7,
                            "sensor_bonus_value": "Unavailable",
                            "sensor_firmware_ver": "Unavailable",
                            "sensor_number": 133,
                            "sensor_sn": "Unavailable"
                        },
                        {
                            "measurements": [
                                "REFERENCE_KPA",
                                "TEMPC_LOGGER"
                            ],
                            "port": 8,
                            "sensor_bonus_value": "Unavailable",
                            "sensor_firmware_ver": "Unavailable",
                            "sensor_number": 134,
                            "sensor_sn": "Unavailable"
                        }
                    ],
                    "valid_since": "2018-08-11T21:45:00Z",
                    "values": [
                        [
                            1534023900,
                            0,
                            19,
                            [
                                {
                                    "description": "Battery Percent",
                                    "error": false,
                                    "units": "%",
                                    "value": 100
                                },
                                {
                                    "description": "Battery Voltage",
                                    "error": false,
                                    "units": " mV",
                                    "value": 7864
                                }
                            ],
                            [
                                {
                                    "description": "Reference Pressure",
                                    "error": false,
                                    "units": " kPa",
                                    "value": 100.62
                                },
                                {
                                    "description": "Logger Temperature",
                                    "error": false,
                                    "units": " \u00b0C",
                                    "value": 28.34
                                }
                            ]
                        ]
					}
				}
			}
		}
	}	
}

字符串

bq3bfh9z

bq3bfh9z1#

jmespath可以帮助处理嵌套数据:docs非常健壮,但访问数据的基本功能如下:如果它是一个键,那么你可以使用.,如果它不是数据中的第一个条目,如果它是一个数组/列表,你可以使用[]
您的数据位置摘要:device -> timeseries(dict)->[](array)->configuration(dict)->values(key)->[](array)->[0](array and get the first value)
实际代码:

import jmespath
expression = jmespath.compile('device.timeseries[].configuration.values[][0]')
expression.search(data)
[1534023900]

字符串

ubby3x7f

ubby3x7f2#

修改后的JSON

{
    "created": "2020-05-12T15:10:37Z",
    "device": {
        "device_info": {
            "device_fw": 204,
            "device_sn": "06-02133",
            "device_trait": 2,
            "device_type": 190
        },
        "timeseries": [
            {
                "configuration": {
                    "sensors": [
                        {
                            "measurements": [
                                "BATTERY",
                                "BATTERY_MV"
                            ],
                            "port": 7,
                            "sensor_bonus_value": "Unavailable",
                            "sensor_firmware_ver": "Unavailable",
                            "sensor_number": 133,
                            "sensor_sn": "Unavailable"
                        },
                        {
                            "measurements": [
                                "REFERENCE_KPA",
                                "TEMPC_LOGGER"
                            ],
                            "port": 8,
                            "sensor_bonus_value": "Unavailable",
                            "sensor_firmware_ver": "Unavailable",
                            "sensor_number": 134,
                            "sensor_sn": "Unavailable"
                        }
                    ],
                    "valid_since": "2018-08-11T21:45:00Z",
                    "values": [
                        [
                            1534023900,
                            0,
                            19,
                            [
                                {
                                    "description": "Battery Percent",
                                    "error": false,
                                    "units": "%",
                                    "value": 100
                                },
                                {
                                    "description": "Battery Voltage",
                                    "error": false,
                                    "units": " mV",
                                    "value": 7864
                                }
                            ],
                            [
                                {
                                    "description": "Reference Pressure",
                                    "error": false,
                                    "units": " kPa",
                                    "value": 100.62
                                },
                                {
                                    "description": "Logger Temperature",
                                    "error": false,
                                    "units": " \u00b0C",
                                    "value": 28.34
                                }
                            ]
                        ]
                            ]
                    }
                }]
            }
}

字符串

vfh0ocws

vfh0ocws3#

我一直在研究一个类似的问题,json_normalize对我没有太大帮助。但是,我已经编写了下面的代码,对于大多数嵌套的JSON用例都很好地工作。我仍然处于开发状态,正在测试各种JSON文件。任何人,请随时发表评论或提供任何改进这一部分的建议。
我的主要目标是保留JSON的结构并将其提供在 Dataframe 中。

import json
import pandas as pd
import numpy as np

def flatten_outer (data):
full_list = []
def flatten_inner(sub_data,first_level_key='',index=0,tot_len=0):
    for k,v in sub_data.items():
        full_key = first_level_key+'.'+k if first_level_key !='' else k
        if isinstance(v, dict): 
            flatten_inner(v, full_key)
            
        elif isinstance(v, list):
            
            for i in range(0, len(v)): 

                if (isinstance(v[i], dict)):
                    
                    flatten_inner(v[i], full_key,index=i, tot_len=len(v))
                else: 
                    val_ls = value_list[full_key] if full_key in value_list.keys() else []
                    val_ls.append(v)
                    value_list[full_key] = val_ls
                    break
        else:

            if full_key in value_list.keys():
                placeholder_list = value_list[full_key]
                placeholder_list[index] = v
                value_list[full_key] = placeholder_list
            else:
                if index == 0:
                    if tot_len == 0:
                        value_list[full_key] = v
                    else:
                        placeholder_list = [None]*tot_len
                        placeholder_list[0] = v
                        value_list[full_key] = placeholder_list
                else:
                    
                    dif = tot_len - index - 1
                    placeholder_list = [None] * index
                    placeholder_list.append(v)
                    placeholder_list = placeholder_list + [None] * dif
                    value_list[full_key] = placeholder_list
            
    return value_list
    
for row in data:
    value_list = dict() #creating a value_list to store key value pairs(column values) for each record
    cv =  flatten_inner(row)
    full_list.append(cv)

return full_list

def df_create_clean(full_list):
    df = pd.DataFrame(full_list)
    df = df.where(pd.notnull(df),None)
    cols = df.columns
    for col in cols:
        df[col] = df[col].apply(lambda x: None if (isinstance(x,list) and len(x)==0) else x)
        df[col] = df[col].apply(lambda x: x[0] if (isinstance(x,list) and len(x)==1) else x)
    return df

def flatten_json(data):
    df = pd.DataFrame(flatten_outer(data))
    cleaned_df = df_create_clean(df)
    return cleaned_df

字符串
将上面的代码保存到文件flat_json.py中。运行以下代码。

import flatten_json as fj
data = [{
"created": "2020-05-12T15:10:37Z",
"device": {
    "device_info": {
        "device_fw": 204,
        "device_sn": "06-02133",
        "device_trait": 2,
        "device_type": 190
    },
    "timeseries": [
        {
            "configuration": {
                "sensors": [
                    {
                        "measurements": [
                            "BATTERY",
                            "BATTERY_MV"
                        ],
                        "port": 7,
                        "sensor_bonus_value": "Unavailable",
                        "sensor_firmware_ver": "Unavailable",
                        "sensor_number": 133,
                        "sensor_sn": "Unavailable"
                    },
                    {
                        "measurements": [
                            "REFERENCE_KPA",
                            "TEMPC_LOGGER"
                        ],
                        "port": 8,
                        "sensor_bonus_value": "Unavailable",
                        "sensor_firmware_ver": "Unavailable",
                        "sensor_number": 134,
                        "sensor_sn": "Unavailable"
                    }
                ],
                "valid_since": "2018-08-11T21:45:00Z",
                "values": [
                    [
                        1534023900,
                        0,
                        19,
                        [
                            {
                                "description": "Battery Percent",
                                "error": False,
                                "units": "%",
                                "value": 100
                            },
                            {
                                "description": "Battery Voltage",
                                "error": False,
                                "units": " mV",
                                "value": 7864
                            }
                        ],
                        [
                            {
                                "description": "Reference Pressure",
                                "error": False,
                                "units": " kPa",
                                "value": 100.62
                            },
                            {
                                "description": "Logger Temperature",
                                "error": False,
                                "units": " \u00b0C",
                                "value": 28.34
                            }
                        ]
                    ]
                        ]
                }
            }]
        }
}   ]

df = fj.flatten_json(data)

print(df.loc[0])


产出

created                                                                                     2020-05-12T15:10:37Z
device.device_info.device_fw                                                                                 204
device.device_info.device_sn                                                                            06-02133
device.device_info.device_trait                                                                                2
device.device_info.device_type                                                                               190
device.timeseries.configuration.sensors.measurements           [[BATTERY, BATTERY_MV], [REFERENCE_KPA, TEMPC_LOG...
device.timeseries.configuration.sensors.port                                                              [7, 8]
device.timeseries.configuration.sensors.sensor_bonus_value                            [Unavailable, Unavailable]
device.timeseries.configuration.sensors.sensor_firmware_ver                           [Unavailable, Unavailable]
device.timeseries.configuration.sensors.sensor_number                                                 [133, 134]
device.timeseries.configuration.sensors.sensor_sn                                     [Unavailable, Unavailable]
device.timeseries.configuration.valid_since                                                 2018-08-11T21:45:00Z
device.timeseries.configuration.values                         [1534023900, 0, 19, [{'description': 'Battery ...


现在,在这个df中,您可以使用device. timesies.configuration.values列对每个传感器数据做进一步的分析。

相关问题