如何通过java代码更改apache flume的配置文件?

vwoqyblh  于 2021-06-01  发布在  Hadoop
关注(0)|答案(1)|浏览(354)

iam目前正在进行一个大数据项目,对twitter的热门主题进行情绪分析。我学习了cloudera的教程,了解了如何通过flume将tweet发送到hadoop。
http://blog.cloudera.com/blog/2012/09/analyzing-twitter-data-with-hadoop/
flume.conf(Flume形态):


# Licensed to the Apache Software Foundation (ASF) under one

# or more contributor license agreements. See the NOTICE file

# distributed with this work for additional information

# regarding copyright ownership. The ASF licenses this file

# to you under the Apache License, Version 2.0 (the

# "License"); you may not use this file except in compliance

# with the License. You may obtain a copy of the License at

# 

# http://www.apache.org/licenses/LICENSE-2.0

# 

# Unless required by applicable law or agreed to in writing,

# software distributed under the License is distributed on an

# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY

# KIND, either express or implied. See the License for the

# specific language governing permissions and limitations

# under the License.

# The configuration file needs to define the sources,

# the channels and the sinks.

# Sources, channels and sinks are defined per agent,

# in this case called 'TwitterAgent'

TwitterAgent.sources = Twitter

TwitterAgent.channels = MemChannel

TwitterAgent.sinks = HDFS

TwitterAgent.sources.Twitter.type = com.cloudera.flume.source.TwitterSource

TwitterAgent.sources.Twitter.channels = MemChannel

TwitterAgent.sources.Twitter.consumerKey = 

TwitterAgent.sources.Twitter.consumerSecret = 

TwitterAgent.sources.Twitter.accessToken =  

TwitterAgent.sources.Twitter.accessTokenSecret =  

TwitterAgent.sources.Twitter.keywords = hadoop, big data, analytics, bigdata, cloudera, data science, data scientiest, business intelligence, mapreduce, data warehouse, data warehousing, mahout, hbase, nosql, newsql, businessintelligence, cloudcomputing

TwitterAgent.sinks.HDFS.channel = MemChannel

TwitterAgent.sinks.HDFS.type = hdfs

TwitterAgent.sinks.HDFS.hdfs.path = hdfs://hadoop1:8020/user/flume/tweets/%Y/%m/%d/%H/

TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream

TwitterAgent.sinks.HDFS.hdfs.writeFormat = Text

TwitterAgent.sinks.HDFS.hdfs.batchSize = 1000

TwitterAgent.sinks.HDFS.hdfs.rollSize = 0

TwitterAgent.sinks.HDFS.hdfs.rollCount = 10000

TwitterAgent.channels.MemChannel.type = memory

TwitterAgent.channels.MemChannel.capacity = 10000

TwitterAgent.channels.MemChannel.transactionCapacity = 100

现在要扩展到我的应用程序中,我需要flume配置文件中的keywords部分来包含趋势主题,我找到了java代码来获取趋势主题,但是我现在有一个问题,我不知道如何将此代码连接到flume配置文件,或者如何在keywords部分添加实时趋势主题来创建一个新文件。我在网上搜索了很多,因为我是这个领域的初学者,如果你能提供一些信息或至少是其他的选择,这将是非常有帮助的。

xzabzqsa

xzabzqsa1#

一个非常有趣的问题。。!
我同意@cricket\u 007的评论-在不重新启动flume代理的情况下编辑配置是不可能的。
我将不能说太多,因为我还没有看到你的java代码,以获得关键字的趋势主题。不过,根据您提供的信息,我可以想到一种替代方法(或者我更愿意说是一种变通方法),但我自己还没有尝试过。
您可以这样修改twittersource.java类:

public void configure(Context context) {
consumerKey = context.getString(TwitterSourceConstants.CONSUMER_KEY_KEY);
consumerSecret = context.getString(TwitterSourceConstants.CONSUMER_SECRET_KEY);
accessToken = context.getString(TwitterSourceConstants.ACCESS_TOKEN_KEY);
accessTokenSecret = context.getString(TwitterSourceConstants.ACCESS_TOKEN_SECRET_KEY);

//MODIFY THE FOLLOWING PORTION
String keywordString = context.getString(TwitterSourceConstants.KEYWORDS_KEY, "");
if (keywordString.trim().length() == 0) {
    keywords = new String[0];
} else {
  keywords = keywordString.split(",");
  for (int i = 0; i < keywords.length; i++) {
    keywords[i] = keywords[i].trim();
  }
}
//UNTIL THIS POINT

ConfigurationBuilder cb = new ConfigurationBuilder();
cb.setOAuthConsumerKey(consumerKey);
cb.setOAuthConsumerSecret(consumerSecret);
cb.setOAuthAccessToken(accessToken);
cb.setOAuthAccessTokenSecret(accessTokenSecret);
cb.setJSONStoreEnabled(true);
cb.setIncludeEntitiesEnabled(true);

twitterStream = new TwitterStreamFactory(cb.build()).getInstance(); 
}

我在上面的注解中输入了keywordstring变量,您可以调用java代码(我假设这是一个方法,您可以从中返回逗号分隔的关键字字符串),而不是从flume.conf中的上下文中提取(只需删除context.getstring()部分)。
除此之外,只需从flume.conf中删除以下语句:

TwitterAgent.sources.Twitter.keywords = hadoop, big data, analytics, bigdata, cloudera, data science, data scientiest, business intelligence, mapreduce, data warehouse, data warehousing, mahout, hbase, nosql, newsql, businessintelligence, cloudcomputing

我希望这有帮助。

相关问题