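I have successfully configured Flume to transfer text files from a local folder to HDFS. My problem is that when a file is transferred into HDFS, some unwanted text ("hdfs.write.longwriter" plus binary characters) gets prefixed to my text file. Here is my flume.conf: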
agent.sources = flumedump
agent.channels = memoryChannel
agent.sinks = flumeHDFS
agent.sources.flumedump.type = spooldir
agent.sources.flumedump.spoolDir = /opt/test/flume/flumedump/
agent.sources.flumedump.channels = memoryChannel
# Each sink's type must be defined
agent.sinks.flumeHDFS.type = hdfs
agent.sinks.flumeHDFS.hdfs.path = hdfs://bigdata.ibm.com:9000/user/vin
agent.sinks.flumeHDFS.fileType = DataStream
# Format to be written
agent.sinks.flumeHDFS.hdfs.writeFormat = Text
agent.sinks.flumeHDFS.hdfs.maxOpenFiles = 10
# rollover file based on maximum size of 10 MB
agent.sinks.flumeHDFS.hdfs.rollSize = 10485760
# never rollover based on the number of events
agent.sinks.flumeHDFS.hdfs.rollCount = 0
# rollover file based on max time of 1 minute (60 seconds)
agent.sinks.flumeHDFS.hdfs.rollInterval = 60
# Specify the channel the sink should use
agent.sinks.flumeHDFS.channel = memoryChannel
# Each channel's type is defined.
agent.channels.memoryChannel.type = memory
# Other config values specific to each type of channel(sink or source)
# can be defined as well
# In this case, it specifies the capacity of the memory channel
agent.channels.memoryChannel.capacity = 100
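My source text file is very simple and contains the text: Hi, my name is Hadoop and this is file one.
The sink file I get in HDFS looks like this: SEQ!org.apache.hadoop.io.LongWritable org.apache.hadoop.io.Text, then a run of binary characters, then: Hi, my name is Hadoop and this is file one.
Please let me know what I am doing wrong.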
1 Answer
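I figured it out. I had to fix this line:
agent.sinks.flumeHDFS.fileType = DataStream
and change it to:
agent.sinks.flumeHDFS.hdfs.fileType = DataStream
Without the hdfs. prefix the setting is ignored, so the sink falls back to its default file type, SequenceFile, which is where the SEQ header and the LongWritable/Text class names come from. That fixed the issue.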
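For reference, here is a minimal sketch of how the sink section might look with the fix applied. It reuses the agent, sink, and channel names and the HDFS path from the question unchanged. The roll settings are left out only for brevity, and the comments are my own annotations.

```properties
# HDFS sink: type and channel sit directly under the sink name
agent.sinks.flumeHDFS.type = hdfs
agent.sinks.flumeHDFS.channel = memoryChannel
# HDFS-specific settings all carry the hdfs. prefix
agent.sinks.flumeHDFS.hdfs.path = hdfs://bigdata.ibm.com:9000/user/vin
# Previously written as agent.sinks.flumeHDFS.fileType, which the sink did not pick up
agent.sinks.flumeHDFS.hdfs.fileType = DataStream
agent.sinks.flumeHDFS.hdfs.writeFormat = Text
```

After restarting the agent, newly rolled files should contain only the event text. Files written before the change keep their SequenceFile header.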