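I am currently working on a big data project that performs sentiment analysis on Twitter trending topics. I followed the Cloudera tutorial and learned how to send tweets to Hadoop through Flume.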
http://blog.cloudera.com/blog/2012/09/analyzing-twitter-data-with-hadoop/
flume.conf:
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
# The configuration file needs to define the sources,
# the channels and the sinks.
# Sources, channels and sinks are defined per agent,
# in this case called 'TwitterAgent'
TwitterAgent.sources = Twitter
TwitterAgent.channels = MemChannel
TwitterAgent.sinks = HDFS
TwitterAgent.sources.Twitter.type = com.cloudera.flume.source.TwitterSource
TwitterAgent.sources.Twitter.channels = MemChannel
TwitterAgent.sources.Twitter.consumerKey =
TwitterAgent.sources.Twitter.consumerSecret =
TwitterAgent.sources.Twitter.accessToken =
TwitterAgent.sources.Twitter.accessTokenSecret =
TwitterAgent.sources.Twitter.keywords = hadoop, big data, analytics, bigdata, cloudera, data science, data scientiest, business intelligence, mapreduce, data warehouse, data warehousing, mahout, hbase, nosql, newsql, businessintelligence, cloudcomputing
TwitterAgent.sinks.HDFS.channel = MemChannel
TwitterAgent.sinks.HDFS.type = hdfs
TwitterAgent.sinks.HDFS.hdfs.path = hdfs://hadoop1:8020/user/flume/tweets/%Y/%m/%d/%H/
TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream
TwitterAgent.sinks.HDFS.hdfs.writeFormat = Text
TwitterAgent.sinks.HDFS.hdfs.batchSize = 1000
TwitterAgent.sinks.HDFS.hdfs.rollSize = 0
TwitterAgent.sinks.HDFS.hdfs.rollCount = 10000
TwitterAgent.channels.MemChannel.type = memory
TwitterAgent.channels.MemChannel.capacity = 10000
TwitterAgent.channels.MemChannel.transactionCapacity = 100
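Now, to extend this to my application, I need the keywords section of the Flume configuration file to contain the trending topics. I found Java code that fetches the trending topics, but I don't know how to connect that code to the Flume configuration file, or how to generate a new file with the real-time trending topics added to the keywords section. I have searched the web a lot, but I am a beginner in this field, so any information, or at least other options, would be very helpful.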
1 Answer
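A very interesting question!
I agree with @cricket_007's comment: it is not possible to edit the configuration without restarting the Flume agent.
I can't say much more, because I haven't seen your Java code that fetches the trending-topic keywords. Still, based on the information you provided, I can think of an alternative (or rather, a workaround), though I haven't tried it myself.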
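You could modify the TwitterSource.java class as follows: where the keywordString variable is assigned, call your Java code (I assume it is a method that returns a comma-separated string of keywords) instead of reading the value from the context populated by flume.conf (just remove the context.getString() part).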
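The change described above can be sketched roughly as follows. This is a minimal, self-contained illustration rather than the actual TwitterSource code: the class name `TrendingKeywordsSketch`, the methods `parseKeywords` and `resolveKeywords`, and the `Supplier<String>` standing in for your trending-topics method are all hypothetical. It assumes the source turns the keyword string into an array by splitting on commas and trimming each entry.

```java
import java.util.Arrays;
import java.util.function.Supplier;

// Hypothetical stand-in for the keyword handling inside TwitterSource;
// none of these names come from Flume or the Cloudera example.
public class TrendingKeywordsSketch {

    /** Splits a comma-separated keyword string, trims each entry, drops blanks. */
    public static String[] parseKeywords(String keywordString) {
        if (keywordString == null || keywordString.trim().isEmpty()) {
            return new String[0];
        }
        return Arrays.stream(keywordString.split(","))
                .map(String::trim)
                .filter(s -> !s.isEmpty())
                .toArray(String[]::new);
    }

    /**
     * Resolves the keywords from a supplier instead of the Flume context.
     * In the real source, this is where context.getString(...) would be
     * replaced by a call to your trending-topics method.
     */
    public static String[] resolveKeywords(Supplier<String> trendingTopics) {
        String keywordString = trendingTopics.get(); // was: context.getString(...)
        return parseKeywords(keywordString);
    }

    public static void main(String[] args) {
        // Assumption: a fixed string stands in for a live trending-topics lookup.
        Supplier<String> fakeTrends = () -> "#hadoop, big data, flume";
        System.out.println(Arrays.toString(resolveKeywords(fakeTrends)));
    }
}
```

Because the keywords would presumably be resolved once, when the source is configured, picking up newer trends would likely still require restarting the agent.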
Beyond that, just remove the TwitterAgent.sources.Twitter.keywords statement from flume.conf.
I hope this helps.