Flume agent: add the host to messages, then publish to a Kafka topic

azpvetkf, posted 2021-06-03 in Hadoop

We are starting to consolidate event log data from our applications by publishing messages to a Kafka topic. Although we could write to Kafka directly from the applications, we decided to treat this as a general problem and use a Flume agent. That gives us some flexibility: if we want to capture something else from a server, we can tail a different source and publish to a different Kafka topic.
We created a Flume agent conf file to tail the log and publish to a Kafka topic:

tier1.sources  = source1
tier1.channels = channel1
tier1.sinks = sink1

tier1.sources.source1.type = exec
tier1.sources.source1.command = tail -F /var/log/some_log.log
tier1.sources.source1.channels = channel1

tier1.channels.channel1.type = memory
tier1.channels.channel1.capacity = 10000
tier1.channels.channel1.transactionCapacity = 1000

tier1.sinks.sink1.type = org.apache.flume.sink.kafka.KafkaSink
tier1.sinks.sink1.topic = some_log
tier1.sinks.sink1.brokerList = hadoop01:9092,hadoop02.com:9092,hadoop03.com:9092
tier1.sinks.sink1.channel = channel1
tier1.sinks.sink1.batchSize = 20
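
The flexibility mentioned above amounts to adding another source/channel/sink triple to the same agent. A rough sketch of what that could look like, with a hypothetical second log file and topic name:

tier1.sources  = source1 source2
tier1.channels = channel1 channel2
tier1.sinks = sink1 sink2

tier1.sources.source2.type = exec
tier1.sources.source2.command = tail -F /var/log/other_log.log
tier1.sources.source2.channels = channel2

tier1.channels.channel2.type = memory

tier1.sinks.sink2.type = org.apache.flume.sink.kafka.KafkaSink
tier1.sinks.sink2.topic = other_log
tier1.sinks.sink2.brokerList = hadoop01:9092,hadoop02.com:9092,hadoop03.com:9092
tier1.sinks.sink2.channel = channel2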

Unfortunately, the messages themselves do not specify the host that generated them. If an application is running on multiple hosts and an error occurs, there is no way to tell which host produced the message.
I notice that if Flume were writing directly to HDFS, we could use a Flume interceptor to write to a specific HDFS location. Although we could do something similar with Kafka, i.e. create a new topic for every server, that could become unwieldy; we would end up with thousands of topics.
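
For reference, the HDFS-side mechanism referred to here is typically Flume's host interceptor, which stamps each event with a host header that the HDFS sink can then expand in its output path. A minimal sketch against the source above, with an illustrative hdfsSink name:

tier1.sources.source1.interceptors = i1
tier1.sources.source1.interceptors.i1.type = host
tier1.sources.source1.interceptors.i1.useIP = false

tier1.sinks.hdfsSink.type = hdfs
tier1.sinks.hdfsSink.hdfs.path = /flume/events/%{host}
tier1.sinks.hdfsSink.channel = channel1

Here %{host} is replaced per event with the value of the host header set by the interceptor.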
Can Flume append/include the hostname of the originating host when it publishes to a Kafka topic?


5fjcxozz1#

You can create a custom TCP source that reads the client's address and adds it to the event headers.

package com.vishnu.flume.source;

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.apache.flume.Context;
import org.apache.flume.Event;
import org.apache.flume.EventDrivenSource;
import org.apache.flume.channel.ChannelProcessor;
import org.apache.flume.conf.Configurable;
import org.apache.flume.event.EventBuilder;
import org.apache.flume.source.AbstractSource;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class CustomFlumeTCPSource extends AbstractSource
        implements EventDrivenSource, Configurable {

    private static final Logger logger = LoggerFactory.getLogger(CustomFlumeTCPSource.class);

    private int port;
    private int buffer;
    private ServerSocket serverSocket;
    private Socket clientSocket;
    private BufferedReader receiveBuffer;

    @Override
    public void configure(Context context) {
        // Port to listen on and number of events to batch before handing them to the channel
        port = context.getInteger("port");
        buffer = context.getInteger("buffer");

        try {
            serverSocket = new ServerSocket(port);
            logger.info("FlumeTCP source initialized");
        } catch (Exception e) {
            logger.error("FlumeTCP source failed to initialize", e);
        }
    }

    @Override
    public void start() {
        try {
            clientSocket = serverSocket.accept();
            receiveBuffer = new BufferedReader(new InputStreamReader(clientSocket.getInputStream()));
            logger.info("Connection established with client : " + clientSocket.getRemoteSocketAddress());

            final ChannelProcessor channel = getChannelProcessor();
            // Capture the client's address once and attach it to every event as a header
            final Map<String, String> headers = new HashMap<String, String>();
            headers.put("hostname", clientSocket.getRemoteSocketAddress().toString());

            String line;
            List<Event> events = new ArrayList<Event>();

            while ((line = receiveBuffer.readLine()) != null) {
                Event event = EventBuilder.withBody(line, Charset.defaultCharset(), headers);
                logger.info("Event created");
                events.add(event);
                // Flush a full batch to the channel, then start a new one
                if (events.size() == buffer) {
                    channel.processEventBatch(events);
                    events.clear();
                }
            }
        } catch (Exception e) {
            logger.error("FlumeTCP source failed while reading from the client", e);
        }
        super.start();
    }
}

flume-conf.properties can be configured as follows:


# The configuration file needs to define the sources,
# the channels and the sinks.
# Sources, channels and sinks are defined per agent,
# in this case called 'agent'
agent.sources = CustomTcpSource
agent.channels = memoryChannel
agent.sinks = loggerSink

# For each one of the sources, the type is defined
agent.sources.CustomTcpSource.type = com.vishnu.flume.source.CustomFlumeTCPSource
agent.sources.CustomTcpSource.port = 4443
agent.sources.CustomTcpSource.buffer = 1

# The channel can be defined as follows.
agent.sources.CustomTcpSource.channels = memoryChannel

# Each sink's type must be defined
agent.sinks.loggerSink.type = logger

# Specify the channel the sink should use
agent.sinks.loggerSink.channel = memoryChannel

# Each channel's type is defined.
agent.channels.memoryChannel.type = memory

# Other config values specific to each type of channel (sink or source)
# can be defined as well
# In this case, it specifies the capacity of the memory channel
agent.channels.memoryChannel.capacity = 100

I sent a test message to try this out, and it looked like this:

Event: { headers:{hostname=/127.0.0.1:50999} body: 74 65 73 74 20 6D 65 73 73 61 67 65             test message }

I have uploaded the project to GitHub.
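
To publish to Kafka instead of the logger sink used in the config above, the same custom source could be pointed at the KafkaSink from the question. A minimal, untested sketch reusing the names already defined in the two configs:

agent.sinks = kafkaSink
agent.sinks.kafkaSink.type = org.apache.flume.sink.kafka.KafkaSink
agent.sinks.kafkaSink.topic = some_log
agent.sinks.kafkaSink.brokerList = hadoop01:9092,hadoop02.com:9092,hadoop03.com:9092
agent.sinks.kafkaSink.channel = memoryChannel

One caveat: by default the KafkaSink writes only the event body to Kafka, so the hostname header added by this source would either need to be folded into the body or preserved by enabling the sink's useFlumeEventFormat option (available in newer Flume releases).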


sy5wg1nm2#

Since you are using an exec source, nothing stops you from running a smarter command that prefixes the hostname to the contents of the log file.
Note: if the command uses things such as pipes, you also need to specify the shell, as follows:

tier1.sources.source1.type = exec
tier1.sources.source1.shell = /bin/sh -c
tier1.sources.source1.command =  tail -F /var/log/auth.log | sed --unbuffered "s/^/$(hostname) /"

The messages then look like this:

frb.hi.inet 2015-11-17 08:39:39.432 INFO [...]

... where frb.hi.inet is the name of the originating host.
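
One caveat with this approach: the exec source offers no delivery guarantees, and if the tail command dies the agent silently stops collecting until it is restarted. If that matters, the exec source's restart options can be enabled as well (property names as documented for the Flume exec source):

tier1.sources.source1.restart = true
tier1.sources.source1.restartThrottle = 10000
tier1.sources.source1.logStdErr = true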
