I'm trying to do Twitter analysis with Flume and Hive. To fetch tweets from Twitter, I set all the required parameters (consumerKey, consumerSecret, accessToken and accessTokenSecret) in my flume.conf file:
TwitterAgent.sources = Twitter
TwitterAgent.channels = MemChannel
TwitterAgent.sinks = HDFS
TwitterAgent.sources.Twitter.type = com.cloudera.flume.source.TwitterSource
TwitterAgent.sources.Twitter.channels = MemChannel
TwitterAgent.sources.Twitter.consumerKey = <consumerKey>
TwitterAgent.sources.Twitter.consumerSecret = <consumerSecret>
TwitterAgent.sources.Twitter.accessToken = <accessToken>
TwitterAgent.sources.Twitter.accessTokenSecret = <accessTokenSecret>
TwitterAgent.sources.Twitter.keywords = hadoop, big data, analytics, bigdata, cloudera, data science, data scientiest, business intelligence, mapreduce, data warehouse, data warehousing, mahout, hbase, nosql, newsql, businessintelligence, cloudcomputing
TwitterAgent.sinks.HDFS.channel = MemChannel
TwitterAgent.sinks.HDFS.type = hdfs
TwitterAgent.sinks.HDFS.hdfs.path = hdfs://localhost:9000/user/flume/tweets/
TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream
TwitterAgent.sinks.HDFS.hdfs.writeFormat = Text
TwitterAgent.sinks.HDFS.hdfs.batchSize = 1000
TwitterAgent.sinks.HDFS.hdfs.rollSize = 0
TwitterAgent.sinks.HDFS.hdfs.rollCount = 10000
TwitterAgent.channels.MemChannel.type = memory
TwitterAgent.channels.MemChannel.capacity = 10000
TwitterAgent.channels.MemChannel.transactionCapacity = 100
I have set the classpath for the Flume tarball and the Flume sources snapshot jar in my .bashrc:
export FLUME_HOME=/home/students/apache-flume-1.4.0-bin
export FLUME_SRC=/home/students/flume-sources-1.0-SNAPSHOT.jar
export PATH=$FLUME_HOME/bin:$FLUME_SRC/bin:$PATH
When I run the Flume agent:
flume-ng agent --conf-file twitter_flume.conf --name TwitterAgent -Dflume.root.logger=INFO,console -n TwitterAgent
I see the log trace below, and then nothing happens:
15/06/23 23:41:55 INFO source.DefaultSourceFactory: Creating instance of source Twitter, type com.cloudera.flume.source.TwitterSource
15/06/23 23:41:55 ERROR node.PollingPropertiesFileConfigurationProvider: Failed to load configuration data. Exception follows.
org.apache.flume.FlumeException: Unable to load source type: com.cloudera.flume.source.TwitterSource, class: com.cloudera.flume.source.TwitterSource
    at org.apache.flume.source.DefaultSourceFactory.getClass(DefaultSourceFactory.java:67)
    at org.apache.flume.source.DefaultSourceFactory.create(DefaultSourceFactory.java:40)
    at org.apache.flume.node.AbstractConfigurationProvider.loadSources(AbstractConfigurationProvider.java:327)
    at org.apache.flume.node.AbstractConfigurationProvider.getConfiguration(AbstractConfigurationProvider.java:102)
    at org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run(PollingPropertiesFileConfigurationProvider.java:140)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
Caused by: java.lang.ClassNotFoundException: com.cloudera.flume.source.TwitterSource
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:190)
    at org.apache.flume.source.DefaultSourceFactory.getClass(DefaultSourceFactory.java:65)
    ... 11 more
Why is this error thrown even though I have set up the Flume sources jar? Please help me solve this.
4 answers
Answer 1
You are not setting the classpath. You are setting PATH, which the shell uses to find executable binaries, not Java .jar files.
You can set the FLUME_CLASSPATH variable in the flume-env.sh file in Flume's conf directory, or add the
--classpath <path/to/the/jar>
option on the flume-ng command line.
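A minimal sketch of the flume-env.sh route, assuming the jar stays where the question's FLUME_SRC points. As far as I know, flume-ng only sources flume-env.sh from the directory given with --conf (-c), and the command in the question passes no --conf, so that option is needed as well.

```shell
# flume-env.sh, placed in the directory passed to flume-ng with --conf.
# flume-ng sources this file and adds FLUME_CLASSPATH to the agent's classpath.
# Path taken from the question's FLUME_SRC (assumption: adjust to your layout);
# separate several jars with ':'.
FLUME_CLASSPATH="/home/students/flume-sources-1.0-SNAPSHOT.jar"
```

The agent would then be started with something like flume-ng agent --conf $FLUME_HOME/conf --conf-file twitter_flume.conf --name TwitterAgent -Dflume.root.logger=INFO,console, assuming flume-env.sh lives in $FLUME_HOME/conf.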
Answer 2
Here is how I set up the Flume Twitter source on Cloudera:
1. The agent configuration file is /usr/lib/flume-ng/conf/flume.conf.
2. Create flume-env.sh from the flume-env.sh.template file:
~]$ sudo cp /usr/lib/flume-ng/conf/flume-env.sh.template /usr/lib/flume-ng/conf/flume-env.sh
3. Set JAVA_HOME and FLUME_CLASSPATH in flume-env.sh:
export JAVA_HOME=/usr/java/jdk1.7.0_67-cloudera
FLUME_CLASSPATH="/usr/lib/flume-ng/lib/flume-sources-1.0-SNAPSHOT.jar"
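4. If you cannot find /usr/lib/flume-ng/lib/flume-sources-1.0-SNAPSHOT.jar on your system, download apache-flume-1.6.0-bin (search for it on Google) and replace the current lib folder with its lib folder.
Make sure flume-sources-1.0-SNAPSHOT.jar is present in the lib folder.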
4.1. Rename the old lib folder.
4.2. Download the archive, extract it to the Cloudera desktop, and then run:
~]$ sudo mv /usr/lib/flume-ng/lib /usr/lib/flume-ng/lib_cloudera
~]$ sudo mv /home/cloudera/Desktop/apache-flume-1.6.0-bin/lib /usr/lib/flume-ng/lib
5. Now run the flume-ng agent command:
~]$ flume-ng agent --conf-file /usr/lib/flume-ng/conf/flume.conf --name TwitterAgent -Dflume.root.logger=INFO,console -n TwitterAgent
This should run successfully. Good luck.
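To confirm step 4 before starting the agent, you can check that the jar actually contains the class named in the source's type property; a ClassNotFoundException like the one in the question means no jar on the classpath provides it. A small sketch, assuming unzip is installed (a jar is a zip archive); the helper names class_entry and jar_has_class are hypothetical:

```shell
# Hypothetical helpers for checking whether a jar contains a given class.

# Convert a fully qualified class name to its entry path inside a jar, e.g.
# com.cloudera.flume.source.TwitterSource -> com/cloudera/flume/source/TwitterSource.class
class_entry() {
  printf '%s.class\n' "$(printf '%s' "$1" | tr '.' '/')"
}

# Succeed (exit 0) only if the jar's listing contains the class entry.
# Fails if the jar is missing, unreadable, or unzip is not installed.
jar_has_class() {
  local jar="$1" class="$2"
  unzip -l "$jar" 2>/dev/null | grep -qF "$(class_entry "$class")"
}

# Example: check the jar from step 4.
jar=/usr/lib/flume-ng/lib/flume-sources-1.0-SNAPSHOT.jar
if jar_has_class "$jar" com.cloudera.flume.source.TwitterSource; then
  echo "found TwitterSource in $jar"
else
  echo "TwitterSource missing from $jar"
fi
```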
Answer 3
I think com.cloudera.flume.source.TwitterSource no longer works. Try org.apache.flume.source.twitter.TwitterSource instead.
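Trying this only requires changing the source's type line. A sketch with sed, assuming the config file from the question (twitter_flume.conf) and that your Flume release actually ships org.apache.flume.source.twitter.TwitterSource; check your version's user guide first, because if the class is not in Flume's lib directory you will get the same ClassNotFoundException. That source may also not support every property of the Cloudera one (keywords, for example), so compare the rest of the config against its documentation. The function name is hypothetical:

```shell
# Hypothetical helper: point the Twitter source at the Apache class.
# Rewrites the value of TwitterAgent.sources.Twitter.type in place and
# keeps a backup of the original file with a .bak suffix.
use_apache_twitter_source() {
  local conf_file="$1"
  sed -i.bak \
    's/^\(TwitterAgent\.sources\.Twitter\.type[[:space:]]*=[[:space:]]*\).*$/\1org.apache.flume.source.twitter.TwitterSource/' \
    "$conf_file"
}

# Example:
# use_apache_twitter_source twitter_flume.conf
```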
Answer 4
Sorry, it does work, but make sure you have all the jars in your flume/lib directory. Follow all the steps in: http://bigdatanalysis.blogspot.com.es/2014/02/collecting-tweets-in-hadoop-using-flume.html
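For the tarball install in the question, one way to follow this advice is to copy the jar into Flume's own lib directory; as far as I know, flume-ng puts the jars in $FLUME_HOME/lib on the classpath by default. A minimal sketch; the function name install_flume_jar is hypothetical and the example paths come from the question's .bashrc:

```shell
# Hypothetical helper: copy a jar into a Flume lib directory and confirm it landed.
install_flume_jar() {
  local jar="$1" libdir="$2"
  if [ ! -f "$jar" ]; then
    echo "jar not found: $jar" >&2
    return 1
  fi
  # Copy, then verify the file now exists under the lib directory.
  cp "$jar" "$libdir/" && [ -f "$libdir/$(basename "$jar")" ]
}

# Example with the paths from the question:
# install_flume_jar /home/students/flume-sources-1.0-SNAPSHOT.jar "$FLUME_HOME/lib"
```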