Avro deserialization fails when a column is derived using sum, but succeeds when the same column is derived using count (serialized data is in Kafka)

Posted by yrdbyhpb on 2021-07-09 in Spark


Here is my SQL that works (total_failure is derived with count):

select hostnetworkid,roamertype,carrierid, total_failure,total_count,date_format(timestamp(unix_timestamp(window.start)),\"yyyyMMdd\") as eventdate, date_format(timestamp(unix_timestamp(window.start)),\"HH:mm\") as start from (select hostnetworkid,roamertype,carrierid,window(event_timestamp, '#window'),count(case when status=1 then 1 else 0 end) as total_failure ,count(*) as total_count from #kpi group by hostnetworkid,roamertype,carrierid,window(event_timestamp, '#window'))

Here is my SQL that throws an ArrayIndexOutOfBoundsException in Avro (total_failure is derived with sum):

select hostnetworkid,roamertype,carrierid, total_failure,total_count,date_format(timestamp(unix_timestamp(window.start)),\"yyyyMMdd\") as eventdate, date_format(timestamp(unix_timestamp(window.start)),\"HH:mm\") as start from (select hostnetworkid,roamertype,carrierid,window(event_timestamp, '#window'),sum(case when status=1 then 1 else 0 end) as total_failure ,count(*) as total_count from #kpi group by hostnetworkid,roamertype,carrierid,window(event_timestamp, '#window'))
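A side note on the two queries above: they are not equivalent even before Avro is involved. count(expr) counts every row where expr is not NULL, and the CASE expression here never yields NULL, so the count version returns the same number as count(*). The sketch below uses Python's built-in sqlite3 module as a stand-in for Spark SQL; that substitution is an assumption made only so the example is runnable, and the kpi table and its rows are hypothetical.

```python
import sqlite3


def failure_counts(statuses):
    """Return (count_of_case, sum_of_case, count_star) for the given status values."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical one-column table standing in for #kpi.
    conn.execute("CREATE TABLE kpi (status INTEGER)")
    conn.executemany("INSERT INTO kpi VALUES (?)", [(s,) for s in statuses])
    row = conn.execute(
        "SELECT count(CASE WHEN status = 1 THEN 1 ELSE 0 END), "
        "       sum(CASE WHEN status = 1 THEN 1 ELSE 0 END), "
        "       count(*) "
        "FROM kpi"
    ).fetchone()
    conn.close()
    return row


print(failure_counts([1, 0, 0, 1, 0]))
print(failure_counts([]))
```

With five rows of which two have status = 1, the first call prints (5, 2, 5): only sum reports the two failures. On an empty table the second call prints (0, None, 0), which shows that sum can be NULL while count cannot.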

Can anyone help? Why does deserialization with the Avro schema below work for count but not for sum? This is my Avro schema file:

{"type":"record","name":"MapKpi7","namespace":"com.mobileum",
              "fields":[{"name":"hostnetworkid","type":["int","null"]},{"name":"roamertype","type":["int","null"]}, {"name":"carrierid","type":["int","null"]}, {"name":"total_failure","type":"long"},{"name":"total_count","type":"long"},{"name":"eventdate","type":["string","null"]},{"name":"start","type":["string","null"]}]}

Here is the stack trace:

java.lang.ArrayIndexOutOfBoundsException: 3
        at org.apache.avro.io.parsing.Symbol$Alternative.getSymbol(Symbol.java:402)
        at org.apache.avro.io.ResolvingDecoder.doAction(ResolvingDecoder.java:290)
        at org.apache.avro.io.parsing.Parser.advance(Parser.java:88)
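The post does not say how the records are written to Kafka, so the following is only one possible explanation, not a confirmed diagnosis. In Spark SQL, sum yields a nullable column while count does not. If the producer derives the Avro writer schema from the DataFrame, total_failure may be written as a union such as ["long","null"], which puts a branch index in front of the value. A reader that decodes the same bytes using only the posted schema (plain "long") then goes out of step and eventually reads an ordinary value as a union index. The sketch below is a hand-rolled subset of Avro's binary encoding in pure Python (zigzag varints, length-prefixed strings, unions as index plus value). The WRITER schema and the sample record are hypothetical; the values were chosen so that the misread index comes out as 3.

```python
def zigzag_encode(n):
    """Encode an integer as an Avro zigzag varint."""
    z = (n << 1) ^ (n >> 63)
    out = bytearray()
    while z & ~0x7F:
        out.append((z & 0x7F) | 0x80)
        z >>= 7
    out.append(z)
    return bytes(out)


def read_long(buf, pos):
    """Decode one zigzag varint starting at pos; return (value, next_pos)."""
    shift, z = 0, 0
    while True:
        b = buf[pos]
        pos += 1
        z |= (b & 0x7F) << shift
        if not b & 0x80:
            return (z >> 1) ^ -(z & 1), pos
        shift += 7


def encode(schema, record):
    out = bytearray()
    for name, ftype in schema:
        value = record[name]
        if isinstance(ftype, list):  # union: write branch index, then the value
            branch = "null" if value is None else next(t for t in ftype if t != "null")
            out += zigzag_encode(ftype.index(branch))
            ftype = branch
        if ftype in ("int", "long"):
            out += zigzag_encode(value)
        elif ftype == "string":
            data = value.encode("utf-8")
            out += zigzag_encode(len(data)) + data
        # "null" is encoded as zero bytes
    return bytes(out)


def decode(schema, buf):
    pos, record = 0, {}
    for name, ftype in schema:
        if isinstance(ftype, list):  # union: read branch index first
            index, pos = read_long(buf, pos)
            if not 0 <= index < len(ftype):
                raise IndexError(index)  # analogous to ArrayIndexOutOfBoundsException
            ftype = ftype[index]
        if ftype in ("int", "long"):
            record[name], pos = read_long(buf, pos)
        elif ftype == "string":
            length, pos = read_long(buf, pos)
            record[name] = buf[pos:pos + length].decode("utf-8")
            pos += length
        else:
            record[name] = None
    return record


def make_schema(total_failure_type):
    return [
        ("hostnetworkid", ["int", "null"]),
        ("roamertype", ["int", "null"]),
        ("carrierid", ["int", "null"]),
        ("total_failure", total_failure_type),
        ("total_count", "long"),
        ("eventdate", ["string", "null"]),
        ("start", ["string", "null"]),
    ]


READER = make_schema("long")            # field types from the posted schema
WRITER = make_schema(["long", "null"])  # ASSUMPTION: nullable sum column on the writer side

# Hypothetical sample row.
record = {"hostnetworkid": 1, "roamertype": 2, "carrierid": 7,
          "total_failure": 1, "total_count": 3,
          "eventdate": "20210709", "start": "10:00"}

payload = encode(WRITER, record)
try:
    decode(READER, payload)
except IndexError as exc:
    print("union index out of range:", exc)
```

In this sketch the reader takes the union index byte of total_failure as its value, reads the real total_failure as total_count, and then interprets the real total_count (3) as the branch index of eventdate, which has only two branches. If this assumption matches the actual pipeline, declaring total_failure as a union in the reader schema, or decoding with the actual writer schema, would be the things to try.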

avro apache-kafka apache-spark spark-avro

Source: https://stackoverflow.com/questions/66900848/avrodeserialisation-failing-when-deriving-a-col-using-sum-but-is-successful-when