Flink: Is it better to use Row or GenericRowData with the DataStream API?

wj8zmpe1 posted on 2022-12-09 in Apache

I'm using Flink 1.15.2. Should I use Row or GenericRowData for my own data types? I mostly use the DataStream API. Thanks.

ccgok5k5 #1

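In general, the DataStream API is very flexible about record types. POJO types are probably the most convenient. Essentially any Java class can be used, but you need to check which TypeInformation is extracted via reflection. Sometimes you have to override it manually.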
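To make the POJO point concrete: as far as I know, Flink's documented POJO rules are that the class is public, has a public no-argument constructor, and every non-static, non-transient field is either public and non-final or has a public getter and setter. Below is a minimal sketch of a class shaped that way, plus a plain-reflection check that only approximates those rules. The names `PojoShapeCheck`, `SensorReading`, `NotAPojo` and `looksLikePojo` are made up for illustration and are not part of Flink. The check ignores inherited fields, `isX()` getters for booleans, and whether the field types are serializable.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

// Holder class for this sketch only; none of these names come from Flink.
class PojoShapeCheck {

    /** Example record type shaped to follow Flink's documented POJO rules. */
    public static class SensorReading {
        public String sensorId;      // public, non-final field: no accessors needed
        private double temperature;  // private field: needs a getter and a setter

        public SensorReading() {}    // public no-argument constructor

        public double getTemperature() { return temperature; }
        public void setTemperature(double temperature) { this.temperature = temperature; }
    }

    /** Counter-example: no no-argument constructor, final field without a setter. */
    public static class NotAPojo {
        private final String value;
        public NotAPojo(String value) { this.value = value; }
        public String getValue() { return value; }
    }

    /** Rough approximation of the POJO rules (assumption: declared fields only). */
    public static boolean looksLikePojo(Class<?> clazz) {
        int classMods = clazz.getModifiers();
        if (!Modifier.isPublic(classMods)) {
            return false;
        }
        // A non-static inner class cannot be instantiated on its own.
        if (clazz.isMemberClass() && !Modifier.isStatic(classMods)) {
            return false;
        }
        try {
            clazz.getConstructor(); // finds public constructors only
        } catch (NoSuchMethodException e) {
            return false;
        }
        for (Field field : clazz.getDeclaredFields()) {
            int mods = field.getModifiers();
            if (Modifier.isStatic(mods) || Modifier.isTransient(mods) || field.isSynthetic()) {
                continue;
            }
            boolean publicNonFinal = Modifier.isPublic(mods) && !Modifier.isFinal(mods);
            if (!publicNonFinal && !hasAccessors(clazz, field)) {
                return false;
            }
        }
        return true;
    }

    /** True if the class has a public getX() and a public setX(type) for the field. */
    private static boolean hasAccessors(Class<?> clazz, Field field) {
        String name = field.getName();
        String suffix = Character.toUpperCase(name.charAt(0)) + name.substring(1);
        try {
            clazz.getMethod("get" + suffix);                  // public getter
            clazz.getMethod("set" + suffix, field.getType()); // public setter
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(looksLikePojo(SensorReading.class)); // true
        System.out.println(looksLikePojo(NotAPojo.class));      // false
    }
}
```

This is only a quick sanity check. Inside an actual Flink project, calling `TypeInformation.of(SensorReading.class)` and inspecting the result (a `PojoTypeInfo` versus a generic fallback type) tells you what Flink really extracted.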
For Row, you will always have to provide the type information manually, because reflection cannot infer much from the class signature.
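GenericRowData should be avoided: it is an internal class with many caveats (strings must be StringData, and array handling is not straightforward). In addition, a GenericRowData becomes a BinaryRowData after deserialization. TL;DR: this type is meant for the SQL engine.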

e4yzc0pl #2

The docs are actually helpful here, I was confused too.
The section at the top titled "All Known Implementing Classes" lists all the implementations. RowData and GenericRowData are described as internal data structures. If you can use a POJO, then great. But if you need something that implements RowData, take a look at BinaryRowData, BoxedWrapperRowData, ColumnarRowData, NestedRowData, or any of the implementations there that aren't listed as internal.
I'm personally using NestedRowData to map a DataStream[Row] into a DataStream[RowData] and I'm not at all sure that's a good idea :) Especially since I can't seem to add a string attribute
