嵌套json数组上的spark sql

sbtkgmzw  于 2021-07-09  发布在  Spark
关注(0)|答案(1)|浏览(466)

我有以下两个json消息

[{
  "identifier": {
    "domain": "OFFICE ADDRESS",
    "id": "987654321",
    "version": 1
  },
  "payload": {

    "contactMethods": [
      {
        "faxDiallingNumber": "0000/11111",
        "objId": 8,
        "type": "Fax",
        "use": "Business Address"
      },
      {
        "objId": 9,
        "telephoneDiallingNumber": "0999/99999",
        "telephoneType": "Fixed",
        "type": "Telephone",
        "use": "Business Address"
      },
      {
        "addressLine1": "house no",
        "addressLine3": "street name",
        "addressLine4": "area name",
        "cityCode": "city name",
        "countryCode": "US",
        "objId": 10,
        "postalCode": "12345",
        "preferredContactMethodFlag": true,
        "type": "International Address",
        "use": "Registered"
      }
    ]
  }
},
{
  "identifier": {
    "domain": "HOME ADDRESS",
    "id": "123456789",
    "version": 1
  },
  "payload": {

    "contactMethods": [
      {
        "faxDiallingNumber": "0000/22222",
        "objId": 11,
        "type": "Fax",
        "use": "home Address"
      },
      {
        "addressLine1": "house no",
        "addressLine3": "street name",
        "addressLine4": "area name",
        "cityCode": null,
        "countryCode": "US",
        "objId": 12,
        "postalCode": "45678",
        "preferredContactMethodFlag": true,
        "type": "International Address",
        "use": "Registered"
      },
      {
        "objId": 13,
        "telephoneDiallingNumber": "0999/88888,
        "telephoneType": "Fixed",
        "type": "Telephone",
        "use": "home Address"
      }
    ]
  }
}
]

使用pysparksql,我尝试下面的方法来找出id,其中type是“international address”,citycode是null。
我的输出应该是

你能告诉我正确的语法吗。
我试过分解,数组包含,展平函数,但没有得到它的权利。

select 
identified.id ,
payload.contactMethods.type,
payload.contactMethods.cityCode
from sample_json_df -- (will create a dataframe on the json file using pyspark)
where
payload.contactMethods.type = 'International Address'
and payload.contactMethods.cityCode is null
xe55xuns

xe55xuns1#

explode应该是这样工作的:

SELECT
  id,
  contactMethods.type as type,
  contactMethods.cityCode
FROM
  (
    SELECT
      identifier.id,
      explode(payload.contactMethods) as contactMethods
    FROM
      sample_json_df
  )
WHERE
  contactMethods.type = 'International Address'
  AND contactMethods.cityCode is null

相关问题