Loading a JSON array with Pig

5hcedyr0  posted on 2021-06-24  in  Pig

I have a file in which every line is a JSON array.
It looks like this:

["6400000000",{"status":"FINE","ok":"false","addresses":"00:00:00:00:00:00"}]
["4900000000",{"status":"FINE","ok":"true","addresses":"00:00:00:00:00:00"}]
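To make the shape of the data concrete, here is a minimal Python sketch (separate from the Pig job, and only an illustration). It shows that the top-level value on each line parses as a two-element array rather than an object, and how such a line could be re-shaped into a single object per line. The helper `to_object_line` and the key name `id` are assumptions introduced for this sketch; the data itself does not name the first element.

```python
import json

# Sample lines copied from the file shown above.
lines = [
    '["6400000000",{"status":"FINE","ok":"false","addresses":"00:00:00:00:00:00"}]',
    '["4900000000",{"status":"FINE","ok":"true","addresses":"00:00:00:00:00:00"}]',
]


def to_object_line(line, id_key="id"):
    """Re-shape one [id, {...}] array line into a single JSON object line.

    The key name "id" is an assumption made for this sketch.
    """
    record_id, fields = json.loads(line)  # top-level value is a 2-item list
    merged = {id_key: record_id}
    merged.update(fields)
    return json.dumps(merged)


parsed = [json.loads(line) for line in lines]
converted = [to_object_line(line) for line in lines]

for value, obj_line in zip(parsed, converted):
    # Print the top-level type, its length, and the re-shaped object line.
    print(type(value).__name__, len(value), obj_line)
```

Whether the loader used below accepts lines re-shaped this way was not verified here; the sketch only illustrates the structure of the input.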

I am running the following script on Amazon EMR:

register 's3://mybucket/jar/elephant-bird-core-4.9.jar';
register 's3://mybucket/jar/elephant-bird-pig-4.9.jar';
register 's3://mybucket/jar/elephant-bird-hadoop-compat-4.9.jar';
register 's3://mybucket/jar/json-simple-1.1.jar';

sample = load 's3://mybucket/data/sample.json' using com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad') as (json:map[]);

dump sample;

For every line in the JSON file, I get the following error:

java.lang.ClassCastException: org.json.simple.JSONArray cannot be cast to org.json.simple.JSONObject
    at com.twitter.elephantbird.pig.load.JsonLoader.parseStringToTuple(JsonLoader.java:158)
    at com.twitter.elephantbird.pig.load.JsonLoader.getNext(JsonLoader.java:129)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:204)
    at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:562)
    at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
    at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:151)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:793)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:171)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:166)

Am I missing something?
