Mahout random forest classifier example: ArrayIndexOutOfBoundsException

Posted by rdrgkggo on 2021-06-04 in Hadoop

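While trying to run the random forest example, I get a java.lang.ArrayIndexOutOfBoundsException: 100 error. The 100 here corresponds to the number of trees. The map phase reaches 100%, but reduce stays at 0%. I am using hadoop-1.2.1 and mahout-distribution-0.7. I also tried mahout-distribution-0.9 and got the same error.
Has anyone managed to run this example successfully?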

rkkpypqq (answer #1)

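Found the problem. When Hadoop runs with mapred.job.tracker=local, PartialBuilder cannot obtain the number of map tasks from mapred.map.tasks. As a result, it computes the wrong number of trees per map task.
Solution: do not pass the "-p" parameter when running the random forest job on a local Hadoop. Without "-p", BuildForest uses the in-memory implementation instead of PartialBuilder (the log below reports "InMem Mapred implementation").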
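The failure mode described above can be illustrated with a small, self-contained Java sketch. It is not Mahout's PartialBuilder code: the class and methods (TreeDistributionSketch, treesForPartition, collectTrees) are hypothetical, and the "even split plus remainder" scheme is an assumed way of dividing the -t trees among map tasks. It shows that if every mapper computes its share from an assumed map-task count smaller than the number of mappers that actually run, more trees are produced than -t requested, and storing them in an array sized for -t fails at exactly index 100 when -t is 100.

```java
/**
 * Hypothetical sketch, NOT Mahout source: shows how a wrong map-task count
 * makes the per-mapper tree shares add up to more than the requested total.
 */
public class TreeDistributionSketch {

    /** Trees assigned to one partition when numTrees is split across numMaps partitions (assumed scheme). */
    public static int treesForPartition(int numMaps, int numTrees, int partition) {
        int perMap = numTrees / numMaps;
        // The first (numTrees % numMaps) partitions each take one extra tree.
        if (partition < numTrees % numMaps) {
            perMap++;
        }
        return perMap;
    }

    /**
     * Simulates actualMappers mappers, each computing its share from assumedMaps,
     * and stores every tree id in an array sized for numTrees.
     * Returns the number of trees stored; throws ArrayIndexOutOfBoundsException
     * as soon as more than numTrees trees are produced.
     */
    public static int collectTrees(int assumedMaps, int actualMappers, int numTrees) {
        int[] treeIds = new int[numTrees];
        int next = 0;
        for (int p = 0; p < actualMappers; p++) {
            int share = treesForPartition(assumedMaps, numTrees, p);
            for (int i = 0; i < share; i++) {
                treeIds[next] = next; // fails once next == numTrees
                next++;
            }
        }
        return next;
    }

    public static void main(String[] args) {
        // Consistent counts: 4 assumed, 4 actual, 25 trees each, 100 in total.
        System.out.println(collectTrees(4, 4, 100));
        try {
            // Inconsistent counts: each of 4 mappers believes it is the only one
            // and builds all 100 trees, so the second mapper writes index 100.
            collectTrees(1, 4, 100);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("overflow: " + e.getMessage());
        }
    }
}
```

With consistent counts the 100 trees fit exactly; with the mismatched count the overflow happens on the first tree of the second mapper, which matches the "100" in the reported exception. Dropping "-p" sidesteps the per-mapper split entirely, since the in-memory builder in the log processes a single split with nbTrees:100.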
Details:

windiana@host:~/mahout/data/> hadoop jar $MAHOUT_HOME/examples/target/mahout-examples-0.9-job.jar org.apache.mahout.classifier.df.mapreduce.BuildForest -Dmapred.max.split.size=1874231 -d testdata/KDDTrain+.arff -ds testdata/KDDTrain+.info -sl 5 -t 100 -o nsl-forest
Warning: $HADOOP_HOME is deprecated.

14/08/07 11:25:18 INFO mapreduce.BuildForest: InMem Mapred implementation
14/08/07 11:25:18 INFO mapreduce.BuildForest: Building the forest...
14/08/07 11:25:18 INFO util.NativeCodeLoader: Loaded the native-hadoop library
14/08/07 11:25:19 INFO filecache.TrackerDistributedCacheManager: Creating KDDTrain+.info in /tmp/hadoop-martin/mapred/local/archive/-1415030653984777464_-1414908735_797966215/filetestdata-work-5026960219142699303 with rwxr-xr-x
14/08/07 11:25:19 INFO filecache.TrackerDistributedCacheManager: Cached testdata/KDDTrain+.info as /tmp/hadoop-martin/mapred/local/archive/-1415030653984777464_-1414908735_797966215/filetestdata/KDDTrain+.info
14/08/07 11:25:19 INFO filecache.TrackerDistributedCacheManager: Cached testdata/KDDTrain+.info as /tmp/hadoop-martin/mapred/local/archive/-1415030653984777464_-1414908735_797966215/filetestdata/KDDTrain+.info
14/08/07 11:25:19 INFO filecache.TrackerDistributedCacheManager: Creating KDDTrain+.arff in /tmp/hadoop-martin/mapred/local/archive/3941906571438652588_-1415143228_797959215/filetestdata-work-5750487161401524172 with rwxr-xr-x
14/08/07 11:25:19 INFO filecache.TrackerDistributedCacheManager: Cached testdata/KDDTrain+.arff as /tmp/hadoop-martin/mapred/local/archive/3941906571438652588_-1415143228_797959215/filetestdata/KDDTrain+.arff
14/08/07 11:25:19 INFO filecache.TrackerDistributedCacheManager: Cached testdata/KDDTrain+.arff as /tmp/hadoop-martin/mapred/local/archive/3941906571438652588_-1415143228_797959215/filetestdata/KDDTrain+.arff
14/08/07 11:25:19 INFO mapred.JobClient: Running job: job_local966281240_0001
14/08/07 11:25:19 INFO mapred.LocalJobRunner: Waiting for map tasks
14/08/07 11:25:19 INFO mapred.LocalJobRunner: Starting task: attempt_local966281240_0001_m_000000_0
14/08/07 11:25:19 INFO util.ProcessTree: setsid exited with exit code 0
14/08/07 11:25:19 INFO mapred.Task:  Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@2df8fdda
14/08/07 11:25:19 INFO mapred.MapTask: Processing split: [firstId:0, nbTrees:100, seed:null]
14/08/07 11:25:19 INFO inmem.InMemMapper: Loading the data...
14/08/07 11:25:20 INFO mapred.JobClient:  map 0% reduce 0%
14/08/07 11:25:21 INFO inmem.InMemMapper: Data loaded : 125973 instances
14/08/07 11:25:25 INFO mapred.LocalJobRunner: 
14/08/07 11:25:26 INFO mapred.JobClient:  map 1% reduce 0%

...

14/08/07 11:27:59 INFO mapred.JobClient:  map 98% reduce 0%
14/08/07 11:28:00 INFO mapred.Task: Task:attempt_local966281240_0001_m_000000_0 is done. And is in the process of commiting
14/08/07 11:28:00 INFO mapred.LocalJobRunner: 
14/08/07 11:28:00 INFO mapred.Task: Task attempt_local966281240_0001_m_000000_0 is allowed to commit now
14/08/07 11:28:00 INFO output.FileOutputCommitter: Saved output of task 'attempt_local966281240_0001_m_000000_0' to file:/home/martin/Programmieren/mahout/data/cut/nsl-forest
14/08/07 11:28:00 INFO mapred.LocalJobRunner: 
14/08/07 11:28:00 INFO mapred.Task: Task 'attempt_local966281240_0001_m_000000_0' done.
14/08/07 11:28:00 INFO mapred.LocalJobRunner: Finishing task: attempt_local966281240_0001_m_000000_0
14/08/07 11:28:00 INFO mapred.LocalJobRunner: Map task executor complete.
14/08/07 11:28:00 INFO mapred.JobClient:  map 99% reduce 0%
14/08/07 11:28:00 INFO mapred.JobClient: Job complete: job_local966281240_0001
14/08/07 11:28:00 INFO mapred.JobClient: Counters: 12
14/08/07 11:28:00 INFO mapred.JobClient:   File Output Format Counters 
14/08/07 11:28:00 INFO mapred.JobClient:     Bytes Written=2353226
14/08/07 11:28:00 INFO mapred.JobClient:   File Input Format Counters 
14/08/07 11:28:00 INFO mapred.JobClient:     Bytes Read=0
14/08/07 11:28:00 INFO mapred.JobClient:   FileSystemCounters
14/08/07 11:28:00 INFO mapred.JobClient:     FILE_BYTES_READ=61962918
14/08/07 11:28:00 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=45667235
14/08/07 11:28:00 INFO mapred.JobClient:   Map-Reduce Framework
14/08/07 11:28:00 INFO mapred.JobClient:     Map input records=100
14/08/07 11:28:00 INFO mapred.JobClient:     Physical memory (bytes) snapshot=0
14/08/07 11:28:00 INFO mapred.JobClient:     Spilled Records=0
14/08/07 11:28:00 INFO mapred.JobClient:     Total committed heap usage (bytes)=132120576
14/08/07 11:28:00 INFO mapred.JobClient:     CPU time spent (ms)=0
14/08/07 11:28:00 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=0
14/08/07 11:28:00 INFO mapred.JobClient:     SPLIT_RAW_BYTES=90
14/08/07 11:28:00 INFO mapred.JobClient:     Map output records=100
14/08/07 11:28:00 INFO common.HadoopUtil: Deleting file:/home/martin/Programmieren/mahout/data/cut/nsl-forest
14/08/07 11:28:00 INFO mapreduce.BuildForest: Build Time: 0h 2m 41s 702
14/08/07 11:28:00 INFO mapreduce.BuildForest: Forest num Nodes: 130056
14/08/07 11:28:00 INFO mapreduce.BuildForest: Forest mean num Nodes: 1300
14/08/07 11:28:00 INFO mapreduce.BuildForest: Forest mean max Depth: 19
14/08/07 11:28:00 INFO mapreduce.BuildForest: Storing the forest in: nsl-forest/forest.seq
