Avro deserialization fails when a column is derived with sum, but succeeds when the same column is derived with count (data serialized to Kafka)

Posted by yrdbyhpb on 2021-07-09 in Spark

Here is my SQL that works (total_failure is derived with count):

select hostnetworkid,roamertype,carrierid, total_failure,total_count,date_format(timestamp(unix_timestamp(window.start)),\"yyyyMMdd\") as eventdate, date_format(timestamp(unix_timestamp(window.start)),\"HH:mm\") as start from (select hostnetworkid,roamertype,carrierid,window(event_timestamp, '#window'),count(case when status=1 then 1 else 0 end) as total_failure ,count(*) as total_count from #kpi group by hostnetworkid,roamertype,carrierid,window(event_timestamp, '#window'))

Here is my SQL that throws an ArrayIndexOutOfBoundsException in Avro (total_failure is derived with sum):

select hostnetworkid,roamertype,carrierid, total_failure,total_count,date_format(timestamp(unix_timestamp(window.start)),\"yyyyMMdd\") as eventdate, date_format(timestamp(unix_timestamp(window.start)),\"HH:mm\") as start from (select hostnetworkid,roamertype,carrierid,window(event_timestamp, '#window'),sum(case when status=1 then 1 else 0 end) as total_failure ,count(*) as total_count from #kpi group by hostnetworkid,roamertype,carrierid,window(event_timestamp, '#window'))
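The two queries differ only in the aggregate, and the two aggregates do not behave alike. The sketch below is a small illustration of the difference, using Python's built-in sqlite3 rather than Spark, on the assumption that both follow standard SQL semantics here. The table name kpi and its rows are made up. As far as I know, Spark likewise treats the result of sum as nullable and the result of count as non-nullable, even when a GROUP BY guarantees that no group is empty.

```python
import sqlite3

# In-memory table standing in for the #kpi source (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE kpi (status INTEGER)")

QUERY = """
SELECT count(CASE WHEN status = 1 THEN 1 ELSE 0 END) AS failures_via_count,
       sum(CASE WHEN status = 1 THEN 1 ELSE 0 END)   AS failures_via_sum,
       count(*)                                      AS total_count
FROM kpi
"""

# No input rows: count yields 0, sum yields NULL (None in Python).
empty_result = conn.execute(QUERY).fetchone()

conn.executemany("INSERT INTO kpi VALUES (?)", [(1,), (0,), (0,)])

# count(...) counts every non-NULL value, including the 0 from ELSE,
# so it equals count(*); sum(...) adds up only the 1s.
filled_result = conn.execute(QUERY).fetchone()

print(empty_result)
print(filled_result)
```

Note that the count version never produces a failure count at all: because the ELSE branch returns 0 rather than NULL, it counts every row.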

Can anyone help? Why does deserialization with the Avro schema below work for count but not for sum? This is my Avro schema file:

{"type":"record","name":"MapKpi7","namespace":"com.mobileum",
              "fields":[{"name":"hostnetworkid","type":["int","null"]},{"name":"roamertype","type":["int","null"]}, {"name":"carrierid","type":["int","null"]}, {"name":"total_failure","type":"long"},{"name":"total_count","type":"long"},{"name":"eventdate","type":["string","null"]},{"name":"start","type":["string","null"]}]}

Here is the stack trace:

java.lang.ArrayIndexOutOfBoundsException: 3
        at org.apache.avro.io.parsing.Symbol$Alternative.getSymbol(Symbol.java:402)
        at org.apache.avro.io.ResolvingDecoder.doAction(ResolvingDecoder.java:290)
        at org.apache.avro.io.parsing.Parser.advance(Parser.java:88)
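The trace shows the decoder failing while it picks a union branch. One plausible reading, which is an assumption and not something the trace proves, is that the bytes were written with total_failure as a union while the reader expected a plain long, so every later field is read one value off. The sketch below hand-rolls the relevant part of Avro's binary encoding (zigzag varints, union branch index before the value) for a trimmed three-field record. The function names and sample values are hypothetical, and this is not the Avro library's code.

```python
def encode_long(n):
    """Zigzag + varint encoding, as used for Avro int/long values."""
    z = (n << 1) ^ (n >> 63)
    out = bytearray()
    while True:
        b = z & 0x7F
        z >>= 7
        if z:
            out.append(b | 0x80)  # more bytes follow
        else:
            out.append(b)
            return bytes(out)


def decode_long(buf, pos):
    """Inverse of encode_long; returns (value, next_position)."""
    z, shift = 0, 0
    while True:
        b = buf[pos]
        pos += 1
        z |= (b & 0x7F) << shift
        shift += 7
        if not b & 0x80:
            return (z >> 1) ^ -(z & 1), pos


def read_branch(buf, pos, branch_count=2):
    """Read a union branch index and reject indexes outside the union."""
    branch, pos = decode_long(buf, pos)
    if not 0 <= branch < branch_count:
        raise IndexError(branch)
    return branch, pos


def write_record(total_failure, total_count, eventdate):
    """Writer side: total_failure is written as the union ["long","null"]."""
    raw = eventdate.encode("utf-8")
    return (encode_long(0) + encode_long(total_failure)  # branch 0 = long
            + encode_long(total_count)                   # plain long
            + encode_long(0) + encode_long(len(raw)) + raw)  # branch 0 = string


def read_record(buf, failure_is_union):
    """Reader side: optionally treats total_failure as a plain long."""
    pos, total_failure = 0, None
    if failure_is_union:
        branch, pos = read_branch(buf, pos)
        if branch == 0:
            total_failure, pos = decode_long(buf, pos)
    else:
        total_failure, pos = decode_long(buf, pos)
    total_count, pos = decode_long(buf, pos)
    branch, pos = read_branch(buf, pos)
    eventdate = None
    if branch == 0:
        length, pos = decode_long(buf, pos)
        eventdate = buf[pos:pos + length].decode("utf-8")
    return total_failure, total_count, eventdate


payload = write_record(7, 3, "20210709")
print(read_record(payload, failure_is_union=True))
try:
    read_record(payload, failure_is_union=False)
except IndexError as exc:
    print("union branch out of range:", exc)
```

With the mismatched reader, the writer's branch index is consumed as total_failure, the real total_failure as total_count, and the real total_count (3 in this made-up record) as the branch index of eventdate, which has only two branches.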

Answer #1, by 44u64gxh

I solved this by defining the total_failure field in the schema as a union: {"name":"total_failure","type":["long","null"]} instead of {"name":"total_failure","type":"long"}.
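If the schema is kept as a JSON file, the same change can be applied programmatically. The following is a minimal sketch using only the standard json module. The helper name make_nullable is hypothetical, and the schema is trimmed to two fields for brevity.

```python
import json

# Trimmed copy of the question's schema (only two of the seven fields).
SCHEMA_JSON = """
{"type": "record", "name": "MapKpi7", "namespace": "com.mobileum",
 "fields": [{"name": "total_failure", "type": "long"},
            {"name": "total_count", "type": "long"}]}
"""


def make_nullable(schema, field_name):
    """Return a copy of schema where field_name is a union that allows null."""
    fixed = json.loads(json.dumps(schema))  # simple deep copy via JSON
    for field in fixed["fields"]:
        if field["name"] == field_name:
            current = field["type"]
            branches = list(current) if isinstance(current, list) else [current]
            if "null" not in branches:
                branches.append("null")
            field["type"] = branches
    return fixed


original = json.loads(SCHEMA_JSON)
fixed = make_nullable(original, "total_failure")
print(json.dumps(fixed["fields"][0]))
```

Only total_failure is changed here, matching the answer; total_count is derived with count(*) and stays a plain long.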
