Mahout in Action, Chapter 06: Wikipedia job fails with java.lang.ArrayIndexOutOfBoundsException

Posted by mi7gmzs6 on 2021-06-04 in Hadoop

The Hadoop version I am using is

$ hadoop version
Hadoop 2.5.0-cdh5.2.0
Subversion http://github.com/cloudera/hadoop -r e1f20a08bde76a33b79df026d00a0c91b2298387
Compiled by jenkins on 2014-10-11T21:00Z
Compiled with protoc 2.5.0
From source with checksum 309bccd135b199bdfdd6df5f3f4153d
This command was run using /DCNFS/applications/cdh/5.2/app/hadoop-2.5.0-cdh5.2.0/share/hadoop/common/hadoop-common-2.5.0-cdh5.2.0.jar

My input.txt looks like this

$ hadoop dfs -cat input/input.txt | head -5
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

1: 1664968
2: 3 747213 1664968 1691047 4095634 5535664
3: 9 77935 79583 84707 564578 594898 681805 681886 835470 880698 1109091 1125108 1279972 1463445 1497566 1783284 1997564 2006526 2070954 2250217 2268713 2276203 2374802 2571397 2640902 2647217 2732378 2821237 3088028 3092827 3211549 3283735 3491412 3492254 3498305 3505664 3547201 3603437 3617913 3793767 3907547 4021634 4025897 4086017 4183126 4184025 4189168 4192731 4395141 4899940 4987592 4999120 5017477 5149173 5149311 5158741 5223097 5302153 5474252 5535280
4: 145
5: 8 57544 58089 60048 65880 284186 313376 564578 717529 729993 1097284 1204280 1204407 1255317 1670218 1720928 1850305 2269887 2333350 2359764 2640693 2743982 3303009 3322952 3492254 3573013 3721693 3797343 3797349 3797359 3849461 4033556 4173124 4189215 4207986 4669945 4817900 4901416 5010479 5062062 5072938 5098953 5292042 5429924 5599862 5599863 5689049

My users.txt looks like this

$ hadoop dfs -cat input/users.txt 
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

3: 9 77935 79583 84707 564578 594898 681805 681886 835470 880698 1109091
1125108 1279972 1463445 1497566 1783284 1997564 2006526 2070954 2250217
2268713 2276203 2374802 2571397 2640902 2647217 2732378 2821237 3088028
3092827 3211549 3283735 3491412 3492254 3498305 3505664 3547201 3603437
3617913 3793767 3907547 4021634 4025897 4086017 4183126 4184025 4189168
4192731 4395141 4899940 4987592 4999120 5017477 5149173 5149311 5158741
5223097 5302153 5474252 5535280

The job I run is

$ hadoop jar mahout-core-0.9-cdh5.2.0-job.jar org.apache.mahout.cf.taste.hadoop.item.RecommenderJob -Dmapred.input.dir=input/input.txt -Dmapred.output.dir=output --usersFile input/users.txt --booleanData -s SIMILARITY_COOCCURRENCE

But it fails with

15/02/07 16:48:44 INFO common.AbstractJob: Command line arguments: {--booleanData=[false], --endPhase=[2147483647], --maxPrefsInItemSimilarity=[500], --maxPrefsPerUser=[10], --maxSimilaritiesPerItem=[100], --minPrefsPerUser=[1], --numRecommendations=[10], --similarityClassname=[SIMILARITY_COOCCURRENCE], --startPhase=[0], --tempDir=[temp], --usersFile=[input/users.txt]}
15/02/07 16:48:44 INFO common.AbstractJob: Command line arguments: {--booleanData=[false], --endPhase=[2147483647], --input=[input/input.txt], --minPrefsPerUser=[1], --output=[temp/preparePreferenceMatrix], --ratingShift=[0.0], --startPhase=[0], --tempDir=[temp]}
15/02/07 16:48:44 INFO Configuration.deprecation: mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
15/02/07 16:48:44 INFO Configuration.deprecation: mapred.compress.map.output is deprecated. Instead, use mapreduce.map.output.compress
15/02/07 16:48:44 INFO Configuration.deprecation: mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
15/02/07 16:48:44 INFO client.RMProxy: Connecting to ResourceManager at name1.hadoop.dc.engr.scu.edu/10.128.0.201:8032
15/02/07 16:48:45 INFO input.FileInputFormat: Total input paths to process : 1
15/02/07 16:48:45 INFO mapreduce.JobSubmitter: number of splits:8
15/02/07 16:48:46 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1422500076160_0023
15/02/07 16:48:46 INFO impl.YarnClientImpl: Submitted application application_1422500076160_0023
15/02/07 16:48:46 INFO mapreduce.Job: The url to track the job: http://name1.hadoop.dc.engr.scu.edu:8088/proxy/application_1422500076160_0023/
15/02/07 16:48:46 INFO mapreduce.Job: Running job: job_1422500076160_0023
15/02/07 16:48:56 INFO mapreduce.Job: Job job_1422500076160_0023 running in uber mode : false
15/02/07 16:48:56 INFO mapreduce.Job:  map 0% reduce 0%
15/02/07 16:49:02 INFO mapreduce.Job: Task Id : attempt_1422500076160_0023_m_000006_0, Status : FAILED
Error: java.lang.ArrayIndexOutOfBoundsException: 1
    at org.apache.mahout.cf.taste.hadoop.item.ItemIDIndexMapper.map(ItemIDIndexMapper.java:50)
    at org.apache.mahout.cf.taste.hadoop.item.ItemIDIndexMapper.map(ItemIDIndexMapper.java:31)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)

15/02/07 16:49:02 INFO mapreduce.Job: Task Id : attempt_1422500076160_0023_m_000001_0, Status : FAILED
Error: java.lang.ArrayIndexOutOfBoundsException: 1
    at org.apache.mahout.cf.taste.hadoop.item.ItemIDIndexMapper.map(ItemIDIndexMapper.java:50)
    at org.apache.mahout.cf.taste.hadoop.item.ItemIDIndexMapper.map(ItemIDIndexMapper.java:31)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)

I suspect the data format is incorrect. Can someone help me resolve this? I am new to MapReduce and Hadoop. Thanks.

h9vpoimq (answer #1)

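I no longer work on this project, and the book is unsupported at this point. But it looks like you are running this job on the raw input, rather than on the output of the custom mapper shown in the book, which parses this format into the standard format.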
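
To make the mismatch concrete: each raw line has the form `userID: itemID itemID ...`, while the standard format is one preference per line, `userID,itemID`. The sketch below is plain Java with no Hadoop dependency. It is not the book's mapper, only an illustration of the same transformation. The class name `LinksToPrefs` and both method names are hypothetical. The comma-or-tab delimiter is an assumption based on my recollection of Mahout 0.9, so check it against your version. If that assumption holds, a raw line yields a single field, which would explain why reading index 1 fails.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Mahout or the book's source code.
class LinksToPrefs {
    // Assumption: the job splits each input line on a comma or a tab.
    private static final Pattern PREF_DELIMITER = Pattern.compile("[\t,]");
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    /** Number of fields a comma/tab splitter finds in the line. */
    static int fieldCount(String line) {
        return PREF_DELIMITER.split(line).length;
    }

    /** Converts "userID: item1 item2 ..." into "userID,itemID" lines. */
    static List<String> toPrefLines(String line) {
        List<String> prefs = new ArrayList<>();
        int colon = line.indexOf(':');
        if (colon < 0) {
            return prefs; // not in the raw links format; emit nothing
        }
        String userID = line.substring(0, colon).trim();
        String items = line.substring(colon + 1).trim();
        if (userID.isEmpty() || items.isEmpty()) {
            return prefs; // no user or no items on this line
        }
        for (String itemID : WHITESPACE.split(items)) {
            prefs.add(userID + "," + itemID);
        }
        return prefs;
    }

    public static void main(String[] args) {
        String raw = "2: 3 747213 1664968";
        // The raw line contains no comma or tab, so it is one field.
        System.out.println("fields in raw line: " + fieldCount(raw));
        for (String pref : toPrefLines(raw)) {
            System.out.println(pref);
        }
    }
}
```

Running a conversion like this over input.txt and passing the converted file to the job is one way to test whether the format is the cause. On the full dataset the same logic would normally run as a map-only job rather than locally.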
