Hi everyone, and thanks in advance for taking the time to read this :) I am trying to send a JSON object into a Hadoop cluster and process it with Spark; the JSON object is about 15 KB. I set up my Flume agent as follows:
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Describe/configure the source
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 41400
a1.sources.r1.max-line-length = 512000
a1.sources.r1.eventSize = 512000
# a1.sources.deserializer.maxLineLength = 512000
# Describe the sink
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = /hadoop/hdfs/data
a1.sinks.k1.hdfs.filePrefix = CDR
a1.sinks.k1.hdfs.callTimeout = 15000
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.writeFormat = Text
a1.sinks.k1.hdfs.rollSize = 0
a1.sinks.k1.hdfs.rollCount = 226
a1.sinks.k1.hdfs.rollInterval = 0
a1.sinks.k1.hdfs.batchSize = 226
# Use a channel which buffers events on disk (file channel)
a1.channels.c1.type = file
a1.channels.c1.capacity = 512000
a1.channels.c1.transactionCapacity = 512000
# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
In addition, I have a Perl script that sends the JSON object through a socket to the specified port, but when I start the Flume agent I get the following message:
WARN source.NetcatSource: Client sent event exceeding the maximum length
What I don't understand is that I set the maximum line length for my events to 512000 bytes, which is far larger than 15 KB. Can anyone help me? Thanks, and sorry for my poor English.
1 Answer
Can you verify that the JSON (in your Perl script) is terminated with a newline (EOL)?
Reference: https://flume.apache.org/FlumeUserGuide.html#netcat-source
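To make the suggestion concrete: the netcat source reads newline-delimited lines, so each event must end with `"\n"`. Below is a minimal sketch of such a client, written in Python rather than Perl (the original script isn't shown), with the host/port taken from the config above and a made-up example payload:

```python
import json
import socket

def send_json_event(payload, host="localhost", port=41400):
    """Send one JSON object as a single Flume netcat event.

    The netcat source delimits events by newline, so the trailing
    "\n" is what marks the end of the event; without it the source
    keeps reading and may report the line as exceeding the maximum
    length.
    """
    line = json.dumps(payload) + "\n"  # newline terminates the event
    with socket.create_connection((host, port)) as sock:
        sock.sendall(line.encode("utf-8"))
```

In the Perl script the equivalent fix is simply to append `"\n"` to the JSON string before printing it to the socket.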