sink.hdfs writer adds garbage in my text file

Posted by 0sgqnhkj on 2021-06-02 in Hadoop


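I have successfully configured Flume to transfer text files from a local folder to HDFS. My problem is that when a file is transferred into HDFS, some unwanted text ("hdfs.write.longwriter" plus binary characters) is prefixed to my text file. Here is my flume.conf: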

agent.sources = flumedump
agent.channels = memoryChannel
agent.sinks = flumeHDFS

agent.sources.flumedump.type = spooldir
agent.sources.flumedump.spoolDir = /opt/test/flume/flumedump/
agent.sources.flumedump.channels = memoryChannel

# Each sink's type must be defined
agent.sinks.flumeHDFS.type = hdfs
agent.sinks.flumeHDFS.hdfs.path = hdfs://bigdata.ibm.com:9000/user/vin
agent.sinks.flumeHDFS.fileType = DataStream

# Format to be written
agent.sinks.flumeHDFS.hdfs.writeFormat = Text
agent.sinks.flumeHDFS.hdfs.maxOpenFiles = 10

# rollover file based on maximum size of 10 MB
agent.sinks.flumeHDFS.hdfs.rollSize = 10485760

# never rollover based on the number of events
agent.sinks.flumeHDFS.hdfs.rollCount = 0

# rollover file based on max time of 1 min
agent.sinks.flumeHDFS.hdfs.rollInterval = 60

# Specify the channel the sink should use
agent.sinks.flumeHDFS.channel = memoryChannel

# Each channel's type is defined.
agent.channels.memoryChannel.type = memory

# Other config values specific to each type of channel(sink or source)
# can be defined as well
# In this case, it specifies the capacity of the memory channel
agent.channels.memoryChannel.capacity = 100

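My source text file is very simple and contains the text: Hi, my name is Hadoop and this is file one.
The sink file I get in HDFS looks like this: SEQ!org.apache.hadoop.io.LongWritable org.apache.hadoop.io.Text��5��>i<4H�ǥ�+Hi, my name is Hadoop and this is file one.
Please let me know what I am doing wrong.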
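
A possible explanation, offered as a reading of the config rather than a confirmed diagnosis: the `SEQ` marker followed by `org.apache.hadoop.io.LongWritable` and `org.apache.hadoop.io.Text` is what the header of a Hadoop SequenceFile looks like, and SequenceFile is the HDFS sink's default `hdfs.fileType`. In the flume.conf above, the file type is set under `agent.sinks.flumeHDFS.fileType`, without the `hdfs.` prefix that the other sink settings use, so the sink may be ignoring it and falling back to the default. A minimal sketch of the sink section with only that key changed (all other values are copied from the question; whether this fully resolves the issue on this cluster is an assumption):

```properties
# Each sink's type must be defined
agent.sinks.flumeHDFS.type = hdfs
agent.sinks.flumeHDFS.hdfs.path = hdfs://bigdata.ibm.com:9000/user/vin

# Assumed fix: the key carries the "hdfs." prefix like the other HDFS sink
# settings, so events are written as a plain stream, not a SequenceFile
agent.sinks.flumeHDFS.hdfs.fileType = DataStream

# Format to be written
agent.sinks.flumeHDFS.hdfs.writeFormat = Text
```

After restarting the agent, a newly rolled file in `/user/vin` should start directly with the source text if the prefix was indeed the cause; files written earlier keep their SequenceFile header.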

hadoop flume flume-ng

Source: https://stackoverflow.com/questions/26199991/sink-hdfs-writer-adds-garbage-in-my-text-file