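I'm relatively new to Giraph and am trying to get the Giraph edit-compile-deploy cycle working for our code. I can run various examples inspired by http://blog.cloudera.com/blog/2014/02/how-to-write-and-run-giraph-jobs-on-hadoop/ , but I get a ClassNotFoundException when running a modified version of the SimpleShortestPathsVertex Giraph example. I've tried various combinations of -libjars and HADOOP_CLASSPATH, but I'm out of ideas and would really appreciate your help. Details follow.
Versions
Hadoop: Hadoop 2.0.0-cdh4.4.0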
Giraph: giraph-examples-1.0.0-for-hadoop-2.0.0-alpha-jar-with-dependencies.jar
PageRankBenchmark runs fine
$ hadoop jar $GIRAPH_HOME/giraph-examples/target/giraph-examples-1.0.0-for-hadoop-2.0.0-alpha-jar-with-dependencies.jar \
org.apache.giraph.benchmark.PageRankBenchmark \
-Dgiraph.zkList=<myhost>:2181 \
-e 1 -s 3 -v -V 50 -w 1
...
14/08/01 11:42:44 INFO mapred.JobClient: Job complete: job_201407291058_0015
...
(full output is below)
GiraphRunner with SimpleShortestPathsVertex also runs fine
$ hadoop jar $GIRAPH_HOME/giraph-examples/target/giraph-examples-1.0.0-for-hadoop-2.0.0-alpha-jar-with-dependencies.jar \
org.apache.giraph.GiraphRunner \
-Dgiraph.zkList=<myhost>:2181 \
org.apache.giraph.examples.SimpleShortestPathsVertex \
-vif org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat \
-vip ginput/tiny_graph.txt \
-of org.apache.giraph.io.formats.IdWithValueTextOutputFormat \
-op goutput/shortestpathsC2 \
-ca SimpleShortestPathsVertex.source=2 \
-w 1
...
14/08/01 11:47:46 INFO mapred.JobClient: Job complete: job_201407291058_0017
...
(full output is below)
Bonus: the results are correct:
$ hadoop fs -cat goutput/shortestpathsC2/p*
0 1.0
2 2.0
1 0.0
3 1.0
4 5.0
But my modified version of SimpleShortestPathsVertex gets a ClassNotFoundException
The jar containing the modified vertex (KdlSimpleShortestPathsVertex, no package) looks fine:
$ jar -tf ~/kdl_hadoop_play.jar
META-INF/MANIFEST.MF
KdlSimpleShortestPathsVertex.class
META-INF/
But my run barfs:
$ hadoop jar $GIRAPH_HOME/giraph-core/target/giraph-1.0.0-for-hadoop-2.0.0-alpha-jar-with-dependencies.jar \
org.apache.giraph.GiraphRunner \
-Dgiraph.zkList=<myhost>:2181 \
-libjars ~/kdl_hadoop_play.jar \
KdlSimpleShortestPathsVertex \
-vif org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat \
-vip /user/cornell/ginput/tiny_graph.txt \
-of org.apache.giraph.io.formats.IdWithValueTextOutputFormat \
-op /user/cornell/goutput/shortestpathsC2 \
-ca KdlSimpleShortestPathsVertex.source=2 \
-w 1
Exception in thread "main" java.lang.ClassNotFoundException: KdlSimpleShortestPathsVertex
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:190)
at org.apache.giraph.utils.ConfigurationUtils.populateGiraphConfiguration(ConfigurationUtils.java:210)
at org.apache.giraph.utils.ConfigurationUtils.parseArgs(ConfigurationUtils.java:147)
at org.apache.giraph.GiraphRunner.run(GiraphRunner.java:74)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at org.apache.giraph.GiraphRunner.main(GiraphRunner.java:124)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
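My guess...
...after looking around, is that GiraphRunner may not be handling -libjars correctly, as described in http://grepalex.com/2013/02/25/hadoop-libjars/ ("make sure your code is using GenericOptionsParser"). Browsing the Giraph source code, I did not see that class being used. I also tried setting HADOOP_CLASSPATH to my jar, but that did not fix the problem.
Any help would be awesome!
PageRankBenchmark output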
14/08/01 11:42:27 INFO job.GiraphJob: run: Since checkpointing is disabled (default), do not allow any task retries (setting mapred.map.max.attempts = 0, old value = 4)
14/08/01 11:42:28 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
14/08/01 11:42:28 WARN bsp.BspOutputFormat: checkOutputSpecs: ImmutableOutputCommiter will not check anything
14/08/01 11:42:29 INFO mapred.JobClient: Running job: job_201407291058_0015
14/08/01 11:42:30 INFO mapred.JobClient: map 0% reduce 0%
14/08/01 11:42:40 INFO mapred.JobClient: map 50% reduce 0%
14/08/01 11:42:41 INFO mapred.JobClient: map 100% reduce 0%
14/08/01 11:42:44 INFO mapred.JobClient: Job complete: job_201407291058_0015
14/08/01 11:42:44 INFO mapred.JobClient: Counters: 39
14/08/01 11:42:44 INFO mapred.JobClient: File System Counters
14/08/01 11:42:44 INFO mapred.JobClient: FILE: Number of bytes read=0
14/08/01 11:42:44 INFO mapred.JobClient: FILE: Number of bytes written=369846
14/08/01 11:42:44 INFO mapred.JobClient: FILE: Number of read operations=0
14/08/01 11:42:44 INFO mapred.JobClient: FILE: Number of large read operations=0
14/08/01 11:42:44 INFO mapred.JobClient: FILE: Number of write operations=0
14/08/01 11:42:44 INFO mapred.JobClient: HDFS: Number of bytes read=88
14/08/01 11:42:44 INFO mapred.JobClient: HDFS: Number of bytes written=0
14/08/01 11:42:44 INFO mapred.JobClient: HDFS: Number of read operations=2
14/08/01 11:42:44 INFO mapred.JobClient: HDFS: Number of large read operations=0
14/08/01 11:42:44 INFO mapred.JobClient: HDFS: Number of write operations=1
14/08/01 11:42:44 INFO mapred.JobClient: Job Counters
14/08/01 11:42:44 INFO mapred.JobClient: Launched map tasks=2
14/08/01 11:42:44 INFO mapred.JobClient: Total time spent by all maps in occupied slots (ms)=15772
14/08/01 11:42:44 INFO mapred.JobClient: Total time spent by all reduces in occupied slots (ms)=0
14/08/01 11:42:44 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
14/08/01 11:42:44 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
14/08/01 11:42:44 INFO mapred.JobClient: Map-Reduce Framework
14/08/01 11:42:44 INFO mapred.JobClient: Map input records=2
14/08/01 11:42:44 INFO mapred.JobClient: Map output records=0
14/08/01 11:42:44 INFO mapred.JobClient: Input split bytes=88
14/08/01 11:42:44 INFO mapred.JobClient: Spilled Records=0
14/08/01 11:42:44 INFO mapred.JobClient: CPU time spent (ms)=2230
14/08/01 11:42:44 INFO mapred.JobClient: Physical memory (bytes) snapshot=411357184
14/08/01 11:42:44 INFO mapred.JobClient: Virtual memory (bytes) snapshot=2428895232
14/08/01 11:42:44 INFO mapred.JobClient: Total committed heap usage (bytes)=806027264
14/08/01 11:42:44 INFO mapred.JobClient: Giraph Stats
14/08/01 11:42:44 INFO mapred.JobClient: Aggregate edges=50
14/08/01 11:42:44 INFO mapred.JobClient: Aggregate finished vertices=50
14/08/01 11:42:44 INFO mapred.JobClient: Aggregate vertices=50
14/08/01 11:42:44 INFO mapred.JobClient: Current master task partition=0
14/08/01 11:42:44 INFO mapred.JobClient: Current workers=1
14/08/01 11:42:44 INFO mapred.JobClient: Last checkpointed superstep=0
14/08/01 11:42:44 INFO mapred.JobClient: Sent messages=0
14/08/01 11:42:44 INFO mapred.JobClient: Superstep=4
14/08/01 11:42:44 INFO mapred.JobClient: Giraph Timers
14/08/01 11:42:44 INFO mapred.JobClient: Input superstep (milliseconds)=238
14/08/01 11:42:44 INFO mapred.JobClient: Setup (milliseconds)=2903
14/08/01 11:42:44 INFO mapred.JobClient: Shutdown (milliseconds)=68
14/08/01 11:42:44 INFO mapred.JobClient: Superstep 0 (milliseconds)=77
14/08/01 11:42:44 INFO mapred.JobClient: Superstep 1 (milliseconds)=64
14/08/01 11:42:44 INFO mapred.JobClient: Superstep 2 (milliseconds)=45
14/08/01 11:42:44 INFO mapred.JobClient: Superstep 3 (milliseconds)=43
14/08/01 11:42:44 INFO mapred.JobClient: Total (milliseconds)=3442
SimpleShortestPathsVertex output
14/08/01 11:47:37 INFO utils.ConfigurationUtils: No edge input format specified. Ensure your InputFormat does not require one.
14/08/01 11:47:37 INFO utils.ConfigurationUtils: Setting custom argument [SimpleShortestPathsVertex.source] to [2] in GiraphConfiguration
14/08/01 11:47:37 WARN job.GiraphConfigurationValidator: Output format vertex index type is not known
14/08/01 11:47:37 WARN job.GiraphConfigurationValidator: Output format vertex value type is not known
14/08/01 11:47:37 WARN job.GiraphConfigurationValidator: Output format edge value type is not known
14/08/01 11:47:37 INFO job.GiraphJob: run: Since checkpointing is disabled (default), do not allow any task retries (setting mapred.map.max.attempts = 0, old value = 4)
14/08/01 11:47:37 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
14/08/01 11:47:38 INFO mapred.JobClient: Running job: job_201407291058_0017
14/08/01 11:47:39 INFO mapred.JobClient: map 0% reduce 0%
14/08/01 11:47:44 INFO mapred.JobClient: map 50% reduce 0%
14/08/01 11:47:45 INFO mapred.JobClient: map 100% reduce 0%
14/08/01 11:47:46 INFO mapred.JobClient: Job complete: job_201407291058_0017
14/08/01 11:47:46 INFO mapred.JobClient: Counters: 39
14/08/01 11:47:46 INFO mapred.JobClient: File System Counters
14/08/01 11:47:46 INFO mapred.JobClient: FILE: Number of bytes read=0
14/08/01 11:47:46 INFO mapred.JobClient: FILE: Number of bytes written=367068
14/08/01 11:47:46 INFO mapred.JobClient: FILE: Number of read operations=0
14/08/01 11:47:46 INFO mapred.JobClient: FILE: Number of large read operations=0
14/08/01 11:47:46 INFO mapred.JobClient: FILE: Number of write operations=0
14/08/01 11:47:46 INFO mapred.JobClient: HDFS: Number of bytes read=200
14/08/01 11:47:46 INFO mapred.JobClient: HDFS: Number of bytes written=30
14/08/01 11:47:46 INFO mapred.JobClient: HDFS: Number of read operations=5
14/08/01 11:47:46 INFO mapred.JobClient: HDFS: Number of large read operations=0
14/08/01 11:47:46 INFO mapred.JobClient: HDFS: Number of write operations=2
14/08/01 11:47:46 INFO mapred.JobClient: Job Counters
14/08/01 11:47:46 INFO mapred.JobClient: Launched map tasks=2
14/08/01 11:47:46 INFO mapred.JobClient: Total time spent by all maps in occupied slots (ms)=8538
14/08/01 11:47:46 INFO mapred.JobClient: Total time spent by all reduces in occupied slots (ms)=0
14/08/01 11:47:46 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
14/08/01 11:47:46 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
14/08/01 11:47:46 INFO mapred.JobClient: Map-Reduce Framework
14/08/01 11:47:46 INFO mapred.JobClient: Map input records=2
14/08/01 11:47:46 INFO mapred.JobClient: Map output records=0
14/08/01 11:47:46 INFO mapred.JobClient: Input split bytes=88
14/08/01 11:47:46 INFO mapred.JobClient: Spilled Records=0
14/08/01 11:47:46 INFO mapred.JobClient: CPU time spent (ms)=1590
14/08/01 11:47:46 INFO mapred.JobClient: Physical memory (bytes) snapshot=341344256
14/08/01 11:47:46 INFO mapred.JobClient: Virtual memory (bytes) snapshot=2363527168
14/08/01 11:47:46 INFO mapred.JobClient: Total committed heap usage (bytes)=504758272
14/08/01 11:47:46 INFO mapred.JobClient: Giraph Stats
14/08/01 11:47:46 INFO mapred.JobClient: Aggregate edges=12
14/08/01 11:47:46 INFO mapred.JobClient: Aggregate finished vertices=5
14/08/01 11:47:46 INFO mapred.JobClient: Aggregate vertices=5
14/08/01 11:47:46 INFO mapred.JobClient: Current master task partition=0
14/08/01 11:47:46 INFO mapred.JobClient: Current workers=1
14/08/01 11:47:46 INFO mapred.JobClient: Last checkpointed superstep=0
14/08/01 11:47:46 INFO mapred.JobClient: Sent messages=0
14/08/01 11:47:46 INFO mapred.JobClient: Superstep=4
14/08/01 11:47:46 INFO mapred.JobClient: Giraph Timers
14/08/01 11:47:46 INFO mapred.JobClient: Input superstep (milliseconds)=181
14/08/01 11:47:46 INFO mapred.JobClient: Setup (milliseconds)=313
14/08/01 11:47:46 INFO mapred.JobClient: Shutdown (milliseconds)=128
14/08/01 11:47:46 INFO mapred.JobClient: Superstep 0 (milliseconds)=57
14/08/01 11:47:46 INFO mapred.JobClient: Superstep 1 (milliseconds)=54
14/08/01 11:47:46 INFO mapred.JobClient: Superstep 2 (milliseconds)=36
14/08/01 11:47:46 INFO mapred.JobClient: Superstep 3 (milliseconds)=35
14/08/01 11:47:46 INFO mapred.JobClient: Total (milliseconds)=805
2 Answers
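I don't know why this doesn't work, but there is a quick and dirty way around it. Try putting your code in the
giraph-examples/src/main/java/org/apache/giraph/examples/
directory (where SimpleShortestPathsVertex lives). Then rebuild by running mvn -DskipTests --projects giraph-examples --also-make package
Then simply run the program as before, substituting your class name for SimpleShortestPathsVertex. I hope this helps.
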
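OK, after looking at the hadoop script and at the Hadoop and Giraph source code, I think I've figured it out. A big hint came from the post "Using the libjars option with Hadoop", together with the following line in the output:
WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
The cause seems to be that GiraphRunner uses its own ConfigurationUtils.parseArgs() to get an org.apache.commons.cli.CommandLine, rather than the recommended org.apache.hadoop.util.GenericOptionsParser.getCommandLine(), which accepts the "libjars" option. That led me to fall back on Hadoop's general classpath-handling facilities: CLASSPATH and/or HADOOP_CLASSPATH. Here's what worked:
Set HADOOP_CLASSPATH to include both the application jar and the giraph-core jar, using a colon separator.
Pass -libjars the same list of jars, but using a comma separator.
For example, on my machine:
gave the expected output and results.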
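A minimal sketch of the two steps above, assuming the jar locations from the question. The variable names APP_JAR, GIRAPH_JAR, and LIBJARS, and the /opt/giraph fallback, are illustrative assumptions rather than part of the answer; adjust the paths to your own machine.

```shell
# Hypothetical locations, modeled on the paths shown in the question.
APP_JAR="$HOME/kdl_hadoop_play.jar"
GIRAPH_JAR="${GIRAPH_HOME:-/opt/giraph}/giraph-core/target/giraph-1.0.0-for-hadoop-2.0.0-alpha-jar-with-dependencies.jar"

# Step 1: colon-separated list, picked up by the `hadoop` launcher script.
export HADOOP_CLASSPATH="$APP_JAR:$GIRAPH_JAR"

# Step 2: the same list with commas instead of colons, for -libjars.
# (Assumes none of the paths themselves contain a colon.)
LIBJARS=$(printf '%s' "$HADOOP_CLASSPATH" | tr ':' ',')

echo "HADOOP_CLASSPATH=$HADOOP_CLASSPATH"
echo "LIBJARS=$LIBJARS"
```

With both variables set, the failing GiraphRunner command from the question would presumably be rerun with -libjars "$LIBJARS" in place of -libjars ~/kdl_hadoop_play.jar; whether any other argument has to change is not confirmed here.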
More generally, it would help if the Giraph team changed the code to use the (apparently) more standard parser.
Hope that helps!