Recursively profiling a JSON object with jq

Posted by yptwkmov on 2023-02-20 in Other

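I need to profile some huge JSON files so that I can convert them into tables. I have found jq very useful for inspecting these files, but there will be hundreds of them, and I am still new to jq.
I already have some very handy functions in my ~/.jq (many thanks to @mikehwang):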

def profile_object:
    to_entries | def parse_entry: {"key": .key, "value": .value | type}; map(parse_entry)
        | sort_by(.key) | from_entries;

def profile_array_objects:
    map(profile_object) | map(to_entries) | reduce .[] as $item ([]; . + $item) | sort_by(.key) | from_entries;

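I am sure I will have to modify them once I have described my problem.
I want a jq one-liner that profiles an object. If a key maps to an array of objects, collect the unique keys across those objects, and keep profiling if there are nested arrays of objects inside. If a value is an object, profile that object too.
Sorry the example is so long, but imagine this at a scale of several GB: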

{
    "name": "XYZ Company",
    "type": "Contractors",
    "reporting": [
        {
            "group_id": "660",
            "groups": [
                {
                    "ids": [
                        987654321,
                        987654321,
                        987654321
                    ],   
                    "market": {
                        "name": "Austin, TX",
                        "value": "873275"
                    }
                },
                {
                    "ids": [
                        987654321,
                        987654321,
                        987654321
                    ],   
                    "market": {
                        "name": "Nashville, TN",
                        "value": "2393287"
                    }
                }
            ]
        }
    ],
    "product_agreements": [
        {
            "negotiation_arrangement": "FFVII",
            "code": "84144",
            "type": "DJ",
            "type_version": "V10",
            "description": "DJ in a mask",
            "name": "Claptone",
            "negotiated_rates": [
                {
                    "company_references": [
                        1,
                        5,
                        458
                    ],
                    "negotiated_prices": [
                        {
                            "type": "negotiated",
                            "rate": 17.73,
                            "expiration_date": "9999-12-31",
                            "code": [
                                "11"
                            ],
                            "billing_class": "professional"
                        }
                    ]
                },
                {
                    "company_references": [
                        747
                    ],
                    "negotiated_prices": [
                        {
                            "type": "fee",
                            "rate": 28.42,
                            "expiration_date": "9999-12-31",
                            "code": [
                                "11"
                            ],
                            "billing_class": "professional"
                        }
                    ]
                }
            ]
        },
        {
            "negotiation_arrangement": "MGS3",
            "name": "David Byrne",
            "type": "Producer",
            "type_version": "V10",
            "code": "654321",
            "description": "Frontman from Talking Heads",
            "negotiated_rates": [
                {
                    "company_references": [
                        1,
                        9,
                        2344,
                        8456
                    ],
                    "negotiated_prices": [
                        {
                            "type": "negotiated",
                            "rate": 68.73,
                            "expiration_date": "9999-12-31",
                            "code": [
                                "11"
                            ],
                            "billing_class": "professional"
                        }
                    ]
                },
                {
                    "company_references": [
                        679
                    ],
                    "negotiated_prices": [
                        {
                            "type": "fee",
                            "rate": 89.25,
                            "expiration_date": "9999-12-31",
                            "code": [
                                "11"
                            ],
                            "billing_class": "professional"
                        }
                    ]
                }
            ]
        }
    ],
    "version": "1.3.1",
    "last_updated_on": "2023-02-01"
}

Expected output:

{
    "name": "string",
    "type": "string",
    "reporting": [
      {
        "group_id": "string",
        "groups": [
            {
                "ids": [
                    "number"
                ],
                "market": {
                    "name": "string",
                    "value": "string"
                }
            }
        ]
      }
    ],
    "product_agreements": [
      {
        "negotiation_arrangement": "string",
        "code": "string",
        "type": "string",
        "type_version": "string",
        "description": "string",
        "name": "string",
        "negotiated_rates": [
          {
            "company_references": [
                "number"
            ],
            "negotiated_prices": [
              {
                "type": "string",
                "rate": "number",
                "expiration_date": "string",
                "code": [
                  "string"
                ],
                "billing_class": "string"
              }
            ]
          }
        ]        
      }
    ],
    "version": "string",
    "last_updated_on": "string"
}

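Apologies if there are any mistakes; I tried to keep everything as consistent and simple as I could.
To restate: if a value is an object or an array, recursively profile every key in the JSON object. The solution needs to be independent of key names. Happy to clarify further if needed.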

nhaq1z21 #1

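The jq module schema.jq at https://gist.github.com/pkoppstein/a5abb4ebef3b0f72a6ed is designed to produce the kind of structural schema you describe.
It can be very slow on very large inputs, so if the JSON is regular enough, a hybrid strategy may work: profile enough of the data to derive a comprehensive structural schema, then check that it holds for the rest.
For a conformance-testing tool for structural schemas produced by schema.jq, see https://github.com/pkoppstein/JESS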
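
As a rough illustration of that hybrid strategy (this is not schema.jq or JESS), the sketch below derives a schema from a sample, here simply the first element of every array, and then checks the whole document against it. The names `check_sample_schema`, `schema`, and `conforms` are hypothetical, and treating keys that are absent from a value as optional is an assumption.

```shell
# Hypothetical helper: reads one JSON document on stdin and prints true if a
# schema sampled from the first element of every array fits the whole document.
check_sample_schema() {
  jq '
    # Cheap sample schema: look only at the first element of each array.
    def schema:
      if   type == "object" then .[] |= schema
      elif type == "array"  then [first | schema]
      else type
      end;

    # Does the input value fit schema $s? Keys absent from the value are
    # allowed (assumption); keys unknown to the schema are not.
    def conforms($s):
      if ($s | type) == "object" then
        if type == "object" then
          . as $v
          | all(keys_unsorted[];
                . as $k | ($s | has($k)) and ($v[$k] | conforms($s[$k])))
        else false end
      elif ($s | type) == "array" then
        if type == "array" then all(.[]; conforms($s[0])) else false end
      else type == $s
      end;

    schema as $s | conforms($s)
  '
}
```

It could be run as `check_sample_schema < input.json`. A result of `false` would mean the sample was not representative, so more of the data needs to be profiled before the schema can be trusted.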

iecba09b #2

Based on your input.json, here is one solution:

jq '
def schema:
    if   type == "object" then .[] |= schema
    elif type == "array"  then [first | schema]
    else type
    end;
schema
' input.json
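
Note that `first` profiles only the first element of each array, so keys that appear only in later elements are missed. If the unique keys of all elements need to be collected, as the question asks, one possible variation merges the schemas of every element. This is only a sketch: `profile_json` and `merge` are hypothetical names, and keeping the first-seen type when two elements disagree is an assumption made for simplicity.

```shell
# Hypothetical helper: reads JSON on stdin and prints a schema in which every
# array is described by the merged schema of all of its elements.
# Extra arguments (for example -c) are passed through to jq.
profile_json() {
  jq "$@" '
    # Merge two schemas; on a type conflict the first-seen schema wins.
    def merge($a; $b):
      if ($a | type) == "object" and ($b | type) == "object" then
        reduce ($b | keys_unsorted[]) as $k ($a;
          .[$k] = (if has($k) then merge(.[$k]; $b[$k]) else $b[$k] end))
      elif ($a | type) == "array" and ($b | type) == "array" then
        if   ($a | length) == 0 then $b
        elif ($b | length) == 0 then $a
        else [merge($a[0]; $b[0])]
        end
      else $a
      end;

    # Same shape as the one-element version, but arrays fold over all elements.
    def schema:
      if   type == "object" then .[] |= schema
      elif type == "array"  then
        map(schema)
        | if length == 0 then []
          else [reduce .[1:][] as $s (.[0]; merge(.; $s))]
          end
      else type
      end;

    schema
  '
}
```

It could be run as `profile_json < input.json`. Because every element is visited, this is slower than sampling the first element, which matters for inputs of several GB.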
