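I need to take some complex nested JSON and convert it to tab-delimited output, with one unique output row for each ts and y pair in the input JSON. I know how to write tab-delimited output, but I am having trouble flattening the JSON correctly. Any suggestions, given the input and desired output below? I am using Elephant Bird to load the JSON.
I have the following input: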
{
  "gateway": [
    {"beer" : [
        {"change_date": "change_date"},
        {"type": "squirrel-pale-ale"},
        {"vendor": "foo-vendor"},
        {"size": "size"}
      ]
    },
    {"name": "SBS01"},
    {"hw_version": "1.1"}
  ],
  "sensors": [
    [
      {"info": {
          "name": "fake-sensor01",
          "serial_number": "fakies40911",
          "type": "temperature"
        }
      },
      {"values": [
          {"ts": 1400869261, "y": 998}, // "ts" is UNIX Epoch in UTC
          {"ts": 1400869276, "y": 1002}
        ]
      }
    ],
    [
      {"info": {
          "name": "fake-sensor02",
          "serial_number": "fakies40944",
          "type": "flow"
        }
      },
      {"values": [
          {"ts": 1400869294, "y": 54},
          {"ts": 1400869303, "y": 76}
        ]
      }
    ]
  ]
}
I can load it with this Pig script:
register 's3://path-to-scripts/elephant-bird-core-4.5.jar';
register 's3://path-to-scripts/elephant-bird-hadoop-compat-4.5.jar';
register 's3://path-to-scripts/elephant-bird-pig-4.5.jar';
register 's3://path-to-scripts/json-simple-1.1.1.jar';
data = load 's3://path-to-data/example_record.json' using com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad') AS (json:map[]);
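Now I want a flat tuple for each ts and y pair while keeping the other attributes. I have tried various sequences of GENERATE statements using FLATTEN and referencing key-value pairs from the map, but I am struggling. I am looking for suggestions on how to get this result: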
(SBS01, 1.1, change_date, squirrel-pale-ale, foo-vendor, size, fake-sensor01, fakies40911, temperature, 1400869261, 998)
(SBS01, 1.1, change_date, squirrel-pale-ale, foo-vendor, size, fake-sensor01, fakies40911, temperature, 1400869276, 1002)
(SBS01, 1.1, change_date, squirrel-pale-ale, foo-vendor, size, fake-sensor02, fakies40944, flow, 1400869294, 54)
(SBS01, 1.1, change_date, squirrel-pale-ale, foo-vendor, size, fake-sensor02, fakies40944, flow, 1400869303, 76)
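To make the target transformation concrete, here is a minimal sketch in plain Python (not Pig). It assumes the record has already been parsed into dicts and lists, and that the real data does not contain the `//` annotation shown in the sample, since that is not valid JSON. The helper names (`merge_single_key_maps`, `flatten_record`, `to_tsv`) are made up for illustration and are not part of Elephant Bird or Pig.

```python
import json


def merge_single_key_maps(items):
    """Merge a list of one-key dicts (as used by "gateway") into one dict."""
    merged = {}
    for item in items:
        merged.update(item)
    return merged


def flatten_record(record):
    """Return one flat tuple per (ts, y) pair, repeating the shared attributes."""
    gateway = merge_single_key_maps(record["gateway"])
    beer = merge_single_key_maps(gateway["beer"])
    # Attributes shared by every output row of this record.
    prefix = (
        gateway["name"],
        gateway["hw_version"],
        beer["change_date"],
        beer["type"],
        beer["vendor"],
        beer["size"],
    )
    rows = []
    for sensor in record["sensors"]:
        # Each sensor is a list of two one-key dicts: "info" and "values".
        parts = merge_single_key_maps(sensor)
        info = parts["info"]
        for value in parts["values"]:
            rows.append(
                prefix
                + (info["name"], info["serial_number"], info["type"])
                + (value["ts"], value["y"])
            )
    return rows


def to_tsv(rows):
    """Render the flat tuples as tab-delimited lines."""
    return "\n".join("\t".join(str(field) for field in row) for row in rows)


def flatten_json_text(text):
    """Parse one JSON document (without comments) and flatten it."""
    return flatten_record(json.loads(text))
```

The sketch shows the two steps that any Pig solution would also need: collapsing the lists of single-key maps into one lookup, then expanding sensors and their values so that each pair carries the shared gateway and sensor attributes.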