flume hdfs sink不断生成.tmp文件

e4yzc0pl  于 2021-06-04  发布在  Flume
关注(0)|答案(1)|浏览(535)

某些hdfs接收器文件未关闭
有人说,如果接收器进程因超时条件等问题而失败,它不会再次尝试关闭文件。
我已经查看了我的flume日志文件,但没有错误。但是,日志文件显示,每个周期,flume生成两个tmp文件并只关闭一个tmp文件。。。
任何关于配置的建议都将不胜感激!谢谢!


# Name the components on this agent

a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Configure the Kafka Source

a1.sources.r1.type = org.apache.flume.source.kafka.KafkaSource
a1.sources.r1.batchSize = 1000

# a1.sources.r1.batchDurationMillis = 2000

a1.sources.r1.kafka.bootstrap.servers = 150.2.237.16:6667,150.2.237.17:6667
a1.sources.r1.kafka.topics = 1-sysmaster1-thread
a1.sources.r1.kafka.consumer.group.id = flume_hdfs_consumer

# Describe the sink

a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = /user/flume/kafka-data/1-sysmaster1-thread/%y%m%d
a1.sinks.k1.hdfs.filePrefix = 1-sysmaster1-thread-%H%M

# Describing sink with the problem of Encoding

a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.writeFormat = Text

# Describing sink with the problem of many hdfs files

### Roll a file after certain amount of events occurs  ###

a1.sinks.k1.hdfs.rollInterval = 0
a1.sinks.k1.hdfs.rollSize = 0
a1.sinks.k1.hdfs.rollCount = 10000
a1.sinks.k1.hdfs.batchSize = 1000

# Use a channel which buffers events in memory

a1.channels.c1.type = memory
a1.channels.c1.capacity = 10000
a1.channels.c1.transactionCapacity = 1000

# Use File channel

# a1.channels.c1.type = file

# a1.channels.cl.checkpointDir = /home/bigdata/flume/checkpoint

# a1.channels.c1.dataDirs = /home/bigdata/flume/data

# Bind the source and sink to the channel

a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
23 4월 2019 11:47:04,105 INFO  [SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.sink.hdfs.BucketWriter.open:246)  - Creating /user/flume/kafka-data/1-sysmaster1-thread/190423/1-sysmaster1-thread-1147.1555987622865.tmp
23 4월 2019 11:48:03,382 INFO  [SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.sink.hdfs.HDFSDataStream.configure:57)  - Serializer = TEXT, UseRawLocalFileSystem = false
23 4월 2019 11:48:03,457 INFO  [SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.sink.hdfs.BucketWriter.open:246)  - Creating /user/flume/kafka-data/1-sysmaster1-thread/190423/1-sysmaster1-thread-1148.1555987683383.tmp
23 4월 2019 11:48:08,664 INFO  [SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.sink.hdfs.BucketWriter.doClose:438)  - Closing /user/flume/kafka-data/1-sysmaster1-thread/190423/1-sysmaster1-thread-1148.1555987683383.tmp
23 4월 2019 11:48:08,689 INFO  [hdfs-k1-call-runner-8] (org.apache.flume.sink.hdfs.BucketWriter$7.call:681)  - Renaming /user/flume/kafka-data/1-sysmaster1-thread/190423/1-sysmaster1-thread-1148.1555987683383.tmp to /user/flume/kafka-data/1-sysmaster1-thread/190423/1-sysmaster1-thread-1148.1555987683383
23 4월 2019 11:48:08,712 INFO  [SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.sink.hdfs.BucketWriter.open:246)  - Creating /user/flume/kafka-data/1-sysmaster1-thread/190423/1-sysmaster1-thread-1148.1555987683384.tmp
23 4월 2019 11:49:03,711 INFO  [SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.sink.hdfs.HDFSDataStream.configure:57)  - Serializer = TEXT, UseRawLocalFileSystem = false
23 4월 2019 11:49:03,806 INFO  [SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.sink.hdfs.BucketWriter.open:246)  - Creating /user/flume/kafka-data/1-sysmaster1-thread/190423/1-sysmaster1-thread-1149.1555987743712.tmp
23 4월 2019 11:49:05,439 INFO  [SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.sink.hdfs.BucketWriter.doClose:438)  - Closing /user/flume/kafka-data/1-sysmaster1-thread/190423/1-sysmaster1-thread-1149.1555987743712.tmp
23 4월 2019 11:49:05,460 INFO  [hdfs-k1-call-runner-5] (org.apache.flume.sink.hdfs.BucketWriter$7.call:681)  - Renaming /user/flume/kafka-data/1-sysmaster1-thread/190423/1-sysmaster1-thread-1149.1555987743712.tmp to /user/flume/kafka-data/1-sysmaster1-thread/190423/1-sysmaster1-thread-1149.1555987743712
23 4월 2019 11:49:05,480 INFO  [SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.sink.hdfs.BucketWriter.open:246)  - Creating /user/flume/kafka-data/1-sysmaster1-thread/190423/1-sysmaster1-thread-1149.1555987743713.tmp
23 4월 2019 11:50:02,354 INFO  [SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.sink.hdfs.HDFSDataStream.configure:57)  - Serializer = TEXT, UseRawLocalFileSystem = false
23 4월 2019 11:50:02,387 INFO  [SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.sink.hdfs.BucketWriter.open:246)  - Creating /user/flume/kafka-data/1-sysmaster1-thread/190423/1-sysmaster1-thread-1150.1555987802355.tmp
23 4월 2019 11:50:03,015 INFO  [SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.sink.hdfs.BucketWriter.doClose:438)  - Closing /user/flume/kafka-data/1-sysmaster1-thread/190423/1-sysmaster1-thread-1150.1555987802355.tmp
23 4월 2019 11:50:03,032 INFO  [hdfs-k1-call-runner-4] (org.apache.flume.sink.hdfs.BucketWriter$7.call:681)  - Renaming /user/flume/kafka-data/1-sysmaster1-thread/190423/1-sysmaster1-thread-1150.1555987802355.tmp to /user/flume/kafka-data/1-sysmaster1-thread/190423/1-sysmaster1-thread-1150.1555987802355
[root@sd-mds-01 logs]# hdfs dfs -ls /user/flume/kafka-data/1-sysmaster1-thread/190423/
Found 163 items
-rw-r--r--   3 root hdfs    1781109 2019-04-23 11:20 /user/flume/kafka-data/1-sysmaster1-thread/190423/1-sysmaster1-thread-1120.1555986001199
-rw-r--r--   3 root hdfs     212118 2019-04-23 11:20 /user/flume/kafka-data/1-sysmaster1-thread/190423/1-sysmaster1-thread-1120.1555986001200.tmp
-rw-r--r--   3 root hdfs    1777270 2019-04-23 11:21 /user/flume/kafka-data/1-sysmaster1-thread/190423/1-sysmaster1-thread-1121.1555986062575
-rw-r--r--   3 root hdfs      54451 2019-04-23 11:21 /user/flume/kafka-data/1-sysmaster1-thread/190423/1-sysmaster1-thread-1121.1555986062576.tmp
-rw-r--r--   3 root hdfs    1781741 2019-04-23 11:22 /user/flume/kafka-data/1-sysmaster1-thread/190423/1-sysmaster1-thread-1122.1555986123181
-rw-r--r--   3 root hdfs      34735 2019-04-23 11:22 /user/flume/kafka-data/1-sysmaster1-thread/190423/1-sysmaster1-thread-1122.1555986123182.tmp
-rw-r--r--   3 root hdfs    1782315 2019-04-23 11:23 /user/flume/kafka-data/1-sysmaster1-thread/190423/1-sysmaster1-thread-1123.1555986183768
-rw-r--r--   3 root hdfs      28682 2019-04-23 11:23 /user/flume/kafka-data/1-sysmaster1-thread/190423/1-sysmaster1-thread-1123.1555986183769.tmp
-rw-r--r--   3 root hdfs    1782437 2019-04-23 11:24 /user/flume/kafka-data/1-sysmaster1-thread/190423/1-sysmaster1-thread-1124.1555986244304
-rw-r--r--   3 root hdfs     211547 2019-04-23 11:24 /user/flume/kafka-data/1-sysmaster1-thread/190423/1-sysmaster1-thread-1124.1555986244305.tmp
-rw-r--r--   3 root hdfs    1782775 2019-04-23 11:25 /user/flume/kafka-data/1-sysmaster1-thread/190423/1-sysmaster1-thread-1125.1555986302891
-rw-r--r--   3 root hdfs      35918 2019-04-23 11:25 /user/flume/kafka-data/1-sysmaster1-thread/190423/1-sysmaster1-thread-1125.1555986302892.tmp
-rw-r--r--   3 root hdfs    1781180 2019-04-23 11:26 /user/flume/kafka-data/1-sysmaster1-thread/190423/1-sysmaster1-thread-1126.1555986362097
-rw-r--r--   3 root hdfs      30967 2019-04-23 11:26 /user/flume/kafka-data/1-sysmaster1-thread/190423/1-sysmaster1-thread-1126.1555986362098.tmp
-rw-r--r--   3 root hdfs    1781682 2019-04-23 11:27 /user/flume/kafka-data/1-sysmaster1-thread/190423/1-sysmaster1-thread-1127.1555986423432
-rw-r--r--   3 root hdfs      41381 2019-04-23 11:27 /user/flume/kafka-data/1-sysmaster1-thread/190423/1-sysmaster1-thread-1127.1555986423433.tmp
-rw-r--r--   3 root hdfs    1781710 2019-04-23 11:28 /user/flume/kafka-data/1-sysmaster1-thread/190423/1-sysmaster1-thread-1128.1555986483928
-rw-r--r--   3 root hdfs     211240 2019-04-23 11:28 /user/flume/kafka-data/1-sysmaster1-thread/190423/1-sysmaster1-thread-1128.1555986483929.tmp
-rw-r--r--   3 root hdfs    1785456 2019-04-23 11:29 /user/flume/kafka-data/1-sysmaster1-thread/190423/1-sysmaster1-thread-1129.1555986542442
u2nhd7ah

u2nhd7ah1#

我发现了问题。。。
我将配置设置为滚动带有时间前缀的文件

a1.sinks.k1.hdfs.filePrefix = 1-sysmaster1-thread-%H%M

正如您看到的结果路径一样,每分钟结束时生成的每个文件都无法正确地汇总。
从配置文件中删除该行之后,它就可以正常工作了。

相关问题