How to use 'select' within a jq --stream command?

Posted by egmofgnx on 2022-12-15 in Other


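I have a very large JSON document (~100 GB) and I'm trying to use jq to extract specific objects that meet a given condition. Because the file is so large, I can't read it into memory and need to use the --stream option.
I know how to write a select that extracts what I need when I'm not streaming, but I could use some help working out how to set up the command correctly for streaming.
Below is a sample of my document, example.json.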

{
  "reporting_entity_name" : "INSURANCE COMPANY",
  "reporting_entity_type" : "INSURER",
  "last_updated_on" : "2022-12-01",
  "version" : "1.0.0",
  "in_network" : [ {
    "negotiation_arrangement" : "ffs",
    "name" : "ER VISIT",
    "billing_code_type" : "CPT",
    "billing_code_type_version" : "2022",
    "billing_code" : "99285",
    "description" : "HIGHEST LEVEL ER VISIT",
    "negotiated_rates" : [ {
      "provider_groups" : [ {
        "npi" : [ 111111111, 222222222],
        "tin" : {
          "type" : "ein",
          "value" : "99-9999999"
        }
      } ],
      "negotiated_prices" : [ {
        "negotiated_type" : "negotiated",
        "negotiated_rate" : 550.50,
        "expiration_date" : "9999-12-31",
        "service_code" : [ "23" ],
        "billing_class" : "institutional"
      } ]
    } ]
  }
  ]
}

I'm trying to get the in_network objects whose billing_code equals 99285.
If I could do this without streaming, this is how I would approach it:

jq '.in_network[] | select(.billing_code == "99285")' example.json

Expected output:

{
  "negotiation_arrangement": "ffs",
  "name": "ER VISIT",
  "billing_code_type": "CPT",
  "billing_code_type_version": "2022",
  "billing_code": "99285",
  "description": "HIGHEST LEVEL ER VISIT",
  "negotiated_rates": [
    {
      "provider_groups": [
        {
          "npi": [
            111111111,
            222222222
          ],
          "tin": {
            "type": "ein",
            "value": "99-9999999"
          }
        }
      ],
      "negotiated_prices": [
        {
          "negotiated_type": "negotiated",
          "negotiated_rate": 550.5,
          "expiration_date": "9999-12-31",
          "service_code": [
            "23"
          ],
          "billing_class": "institutional"
        }
      ]
    }
  ]
}

Any help with how to set this up using the --stream option would be greatly appreciated!

Source: https://stackoverflow.com/questions/74745000/how-to-use-select-within-a-jq-stream-command

2 Answers

dxxyhpgq1#

If each object inside the .in_network array fits into memory on its own, truncate the stream at the array items (two levels deep):

jq --stream -n '
  fromstream(2|truncate_stream(inputs | select(.[0][0] == "in_network")))
  | select(.billing_code == "99285")
' example.json

{
  "negotiation_arrangement": "ffs",
  "name": "ER VISIT",
  "billing_code_type": "CPT",
  "billing_code_type_version": "2022",
  "billing_code": "99285",
  "description": "HIGHEST LEVEL ER VISIT",
  "negotiated_rates": [
    {
      "provider_groups": [
        {
          "npi": [
            111111111,
            222222222
          ],
          "tin": {
            "type": "ein",
            "value": "99-9999999"
          }
        }
      ],
      "negotiated_prices": [
        {
          "negotiated_type": "negotiated",
          "negotiated_rate": 550.5,
          "expiration_date": "9999-12-31",
          "service_code": [
            "23"
          ],
          "billing_class": "institutional"
        }
      ]
    }
  ]
}
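
To see why the truncation depth in the command above is 2, it can help to inspect the events that --stream produces on a tiny input. The following is a minimal sketch, assuming jq 1.6 or later; the file mini.json and its contents are made up for illustration and are not part of the question's data.

```shell
# Hypothetical miniature input; names and values are illustrative only.
tmp=$(mktemp -d)
cat > "$tmp/mini.json" <<'EOF'
{"version":"1.0.0","in_network":[{"billing_code":"99285","name":"ER VISIT"},{"billing_code":"00000","name":"OTHER"}]}
EOF

# Raw events: each leaf is emitted as [path, value];
# closing events carry only a path.
events=$(jq -c --stream . "$tmp/mini.json")
printf '%s\n' "$events"

# Keep only events under "in_network", then drop the first two path
# components (the "in_network" key and the array index).
truncated=$(jq -c --stream -n '
  2 | truncate_stream(inputs | select(.[0][0] == "in_network"))
' "$tmp/mini.json")
printf '%s\n' "$truncated"
```

In the raw events, the first item's billing_code appears with the path ["in_network",0,"billing_code"]. After truncation the same event has the path ["billing_code"], so fromstream can reassemble every array item as a separate top-level object, which select can then filter one at a time.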

Answered 2022-12-15

nle07wnf2#

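You will find that jq --stream is very slow even for 10 GB of data. Since jq is meant to complement other shell tools, I would recommend using jstream (https://github.com/bcicen/jstream), or my own jm or jm.py (https://github.com/pkoppstein/jm), to "splat" the array, and then piping the result to jq.
For example, to achieve the same effect as your jq filter: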

jm --pointer /in_network example.json |
  jq 'select(.billing_code == "99285")'
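
If neither jstream nor jm is at hand, the same two-stage shape (split the array into one item per line, then filter with an ordinary jq) can be tried out with jq alone. This is only a sketch for checking the pipeline on a small sample, assuming jq 1.6 or later; the first stage still uses --stream, so it does not bring the speed benefit described above. The file mini.json is a made-up stand-in for example.json.

```shell
# Hypothetical miniature input standing in for example.json.
tmp=$(mktemp -d)
cat > "$tmp/mini.json" <<'EOF'
{"version":"1.0.0","in_network":[{"billing_code":"99285","name":"ER VISIT"},{"billing_code":"00000","name":"OTHER"}]}
EOF

# Stage 1: emit each in_network item as one compact JSON line.
# Stage 2: filter those lines with a regular, non-streaming jq.
matches=$(
  jq -c --stream -n '
    fromstream(2 | truncate_stream(inputs | select(.[0][0] == "in_network")))
  ' "$tmp/mini.json" |
  jq -c 'select(.billing_code == "99285")'
)
printf '%s\n' "$matches"
```

Only the item whose billing_code is "99285" should be printed. Swapping stage 1 for a faster splitter leaves stage 2 unchanged.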

Answered 2022-12-15
