I want to dump the exact files from the local file system to HDFS

5jdjgkvh asked on 2021-06-01 in Hadoop

I want to dump local files to HDFS, but Flume is merging the data from all the .gz files and writing it into a single file. Instead, I want to write the .gz files to HDFS based on the system's current timestamp.
Agent configuration:


# Identify the components on agent agent1:

agent1.sources = agent1_source
agent1.sinks = agent1_sink
agent1.channels = agent1_channel

# Configure the source:

agent1.sources.agent1_source.type = spooldir
agent1.sources.agent1_source.spoolDir = /data/
agent1.sources.agent1_source.fileSuffix = .COMPLETED
agent1.sources.agent1_source.deserializer = org.apache.flume.sink.solr.morphline.BlobDeserializer$Builder

# Describe the sink:

agent1.sinks.agent1_sink.type = hdfs
agent1.sinks.agent1_sink.hdfs.path = /user/hadoop/data
agent1.sinks.agent1_sink.hdfs.writeFormat = Text

# agent1.sinks.agent1_sink.hdfs.fileType = DataStream

agent1.sinks.agent1_sink.hdfs.rollInterval = 0
agent1.sinks.agent1_sink.hdfs.rollSize = 0
agent1.sinks.agent1_sink.hdfs.fileType = CompressedStream
agent1.sinks.agent1_sink.hdfs.codeC = gzip
agent1.sinks.agent1_sink.hdfs.rollCount = 0
agent1.sinks.agent1_sink.hdfs.idleTimeout = 1

# Configure a channel that buffers events in memory:

agent1.channels.agent1_channel.type = memory
agent1.channels.agent1_channel.capacity = 20000
agent1.channels.agent1_channel.transactionCapacity = 100

# Bind the source and sink to the channel:

agent1.sources.agent1_source.channels = agent1_channel
agent1.sinks.agent1_sink.channel = agent1_channel
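
For reference, assuming the configuration above is saved as conf/agent1.conf (the path is a placeholder), the agent can be started with the standard flume-ng launcher:

flume-ng agent --conf conf --conf-file conf/agent1.conf --name agent1 -Dflume.root.logger=INFO,console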

I have files in the following format: text1_2018-02-01.txt.gz, text2_2018-02-02.txt.gz.
I want to store them on HDFS like /user/hadoop/data/event_date=2018-02-01/text1.txt.gz and /user/hadoop/data/event_date=2018-02-02/text2.txt.gz. Thanks in advance.
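
A minimal, untested sketch of one way to get per-date output using only stock Flume 1.x properties: the spooldir source can put each input file's basename into an event header, and the HDFS sink path accepts date escapes that resolve against the agent's local clock when hdfs.useLocalTimeStamp is enabled. Note this partitions by ingestion date (the "current system timestamp" mentioned above), not by the date embedded in the file name; extracting 2018-02-01 from the name itself would need a custom interceptor, since the built-in regex_extractor matches event bodies rather than headers. The path below is illustrative.

# Sketch only: carry the source file name in the "basename" header
agent1.sources.agent1_source.basenameHeader = true
agent1.sources.agent1_source.basenameHeaderKey = basename

# Partition by the agent's local date and reuse the source file name
# as the HDFS file prefix (the sink still appends its own counter and
# the codec extension to the final file name)
agent1.sinks.agent1_sink.hdfs.path = /user/hadoop/data/event_date=%Y-%m-%d
agent1.sinks.agent1_sink.hdfs.filePrefix = %{basename}
agent1.sinks.agent1_sink.hdfs.useLocalTimeStamp = true

Because the resolved file prefix then differs per source file, events from different input files go to different HDFS writers, which should prevent the merging described above; the BlobDeserializer already delivers each input file as a single event.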

