Flink 1.12 fails to serialize an Avro GenericRecord to Kafka with com.esotericsoftware.kryo.KryoException: java.lang.UnsupportedOperationException

qgelzfjb · asked 2022-12-09 · Apache

I have a DataStream[GenericRecord]:

import org.apache.avro.Schema
import org.apache.avro.generic.{GenericData, GenericRecord}
import org.apache.flink.api.common.serialization.SimpleStringSchema
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer
import org.codehaus.jettison.json.{JSONArray, JSONObject}

// Load the schema string once, outside the map function, so it is also
// in scope when the Kafka sink is constructed below.
val schemaUrl = "" // avro schema link, standard .avsc file format
val schemaStr = scala.io.Source.fromURL(schemaUrl).mkString.stripLineEnd

val consumer = new FlinkKafkaConsumer[String]("input_csv_topic", new SimpleStringSchema(), properties)
val stream = senv.
    addSource(consumer).
    map(line => {
        val arr = line.split(",")

        val schemaFields: JSONArray = new JSONObject(schemaStr).optJSONArray("fields")

        val genericDevice: GenericRecord = new GenericData.Record(new Schema.Parser().parse(schemaStr))

        // Fill the record field by field, converting each CSV column to the
        // type declared in the schema (toInt/toLong are helpers returning Option).
        for (i <- 0 until arr.length) {
            val fieldObj: JSONObject = schemaFields.optJSONObject(i)
            val columnName = fieldObj.optString("name")
            val columnType = fieldObj.optString("type")

            if (columnType.contains("string")) {
                genericDevice.put(columnName, arr(i))
            } else if (columnType.contains("int")) {
                genericDevice.put(columnName, toInt(arr(i)).getOrElse(0))
            } else if (columnType.contains("long")) {
                genericDevice.put(columnName, toLong(arr(i)).getOrElse(0L))
            }
        }

        genericDevice
    })

val kafkaSink = new FlinkKafkaProducer[GenericRecord](
    "output_avro_topic",
    new MyKafkaAvroSerializationSchema[GenericRecord](classOf[GenericRecord], "output_avro_topic", "this is the key", schemaStr),
    properties,
    FlinkKafkaProducer.Semantic.AT_LEAST_ONCE)

stream.addSink(kafkaSink)

Here is the MyKafkaAvroSerializationSchema implementation:

import java.lang
import java.io.ByteArrayOutputStream

import org.apache.avro.Schema
import org.apache.avro.generic.{GenericData, GenericDatumWriter}
import org.apache.avro.io.{BinaryEncoder, EncoderFactory}
import org.apache.flink.streaming.connectors.kafka.KafkaSerializationSchema
import org.apache.kafka.clients.producer.ProducerRecord

class MyKafkaAvroSerializationSchema[T](avroType: Class[T], topic: String, key: String, schemaStr: String) extends KafkaSerializationSchema[T] {

    // Parse lazily on the task side, so only the schema string travels with the closure.
    lazy val schema: Schema = new Schema.Parser().parse(schemaStr)

    override def serialize(element: T, timestamp: lang.Long): ProducerRecord[Array[Byte], Array[Byte]] = {

        val cl = Thread.currentThread().getContextClassLoader()
        val genericData = new GenericData(cl)
        val writer = new GenericDatumWriter[T](schema, genericData)

        // val writer = new ReflectDatumWriter[T](schema)
        // val writer = new SpecificDatumWriter[T](schema)

        // Binary-encode the record into a byte array for the ProducerRecord value.
        val out = new ByteArrayOutputStream()
        val encoder: BinaryEncoder = EncoderFactory.get().binaryEncoder(out, null)
        writer.write(element, encoder)
        encoder.flush()
        out.close()

        new ProducerRecord[Array[Byte], Array[Byte]](topic, key.getBytes, out.toByteArray)
    }
}

Here's the stack trace:

com.esotericsoftware.kryo.KryoException: java.lang.UnsupportedOperationException
    Serialization trace:
    reserved (org.apache.avro.Schema$Field)
    fieldMap (org.apache.avro.Schema$RecordSchema)
    schema (org.apache.avro.generic.GenericData$Record)

How can I use Flink to serialize an Avro GenericRecord to Kafka? I have tried different writers, but I still get com.esotericsoftware.kryo.KryoException: java.lang.UnsupportedOperationException. Thanks for your input.

frebpwbc1#

You can simply add the flink-avro module to your project and use the AvroSerializationSchema it already provides, which works for both SpecificRecord and GenericRecord once you supply the schema.
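
A minimal sketch of that approach, assuming Flink 1.12 with the flink-avro artifact added to the build (the dependency version and the implicit type-information hint are assumptions, not part of the original answer). The type hint matters because the stack trace points at Flink's own record handling: the GenericRecord falls back to Kryo between the map and the sink, so changing the sink schema alone may not remove the exception.

// build.sbt (assumed; match the version to your Flink distribution):
// libraryDependencies += "org.apache.flink" % "flink-avro" % "1.12.7"

import org.apache.avro.Schema
import org.apache.avro.generic.GenericRecord
import org.apache.flink.api.common.typeinfo.TypeInformation
import org.apache.flink.formats.avro.AvroSerializationSchema
import org.apache.flink.formats.avro.typeutils.GenericRecordAvroTypeInfo
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer

val schema: Schema = new Schema.Parser().parse(schemaStr)

// Assumed addition: declare this implicit before the map(...) that builds
// the GenericRecord stream, so the Scala API serializes in-flight records
// with Avro instead of falling back to Kryo.
implicit val genericRecordTypeInfo: TypeInformation[GenericRecord] =
    new GenericRecordAvroTypeInfo(schema)

// AvroSerializationSchema.forGeneric binary-encodes each GenericRecord
// with the given schema; no hand-written DatumWriter is required.
val kafkaSink = new FlinkKafkaProducer[GenericRecord](
    "output_avro_topic",
    AvroSerializationSchema.forGeneric(schema),
    properties)

stream.addSink(kafkaSink)

If the message key matters, the KafkaSerializationSchema from the question can still be kept; the decisive change is that Flink no longer routes GenericRecord through Kryo.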
