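I am new to the world of big data and Hadoop. I am trying to run some code I found through Google. It involves four steps, such as putting the data into the Hadoop file system, adding an index to the data, and then the main step, which uses map and reduce to produce a reduced dataset.
I was able to run the first two steps. The code uses an XML file to handle the locations.
The code I am using is from http://asterixdb.ics.uci.edu/fuzzyjoin/
When I run the last step, the fuzzy join, it gives me a series of errors.
I am attaching the trace here: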
hduser@ubuntu:/home/midhu/fuzzyjoin$ cd fuzzyjoin-hadoop
hduser@ubuntu:/home/midhu/fuzzyjoin/fuzzyjoin-hadoop$ hadoop jar target/fuzzyjoin-hadoop-0.0.2-SNAPSHOT.jar fuzzyjoin -conf src/main/resources/fuzzyjoin/dblp.quickstart.xml
16/04/03 13:55:38 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Complete-Job started: Sun Apr 03 13:55:42 IST 2016
Multi-Job started: Sun Apr 03 13:55:42 IST 2016
FuzzyJoinDriver(TokensBasic.phase1)
Input Path: {hdfs://localhost:54310/user/hduser/dblp-small/records-000}
Output Path: hdfs://localhost:54310/user/hduser/dblp-small/tokens.phase1-000
Map Jobs: 2
Reduce Jobs: 1
Properties: {fuzzyjoin.similarity.name=Jaccard
fuzzyjoin.similarity.threshold=.5
fuzzyjoin.tokenizer=Word
fuzzyjoin.tokens.package=Scalar
fuzzyjoin.tokens.lengthstats=false
fuzzyjoin.ridpairs.group.class=TokenIdentity
fuzzyjoin.ridpairs.group.factor=1
fuzzyjoin.data.tokens=
fuzzyjoin.data.joinindex=}
Job started: Sun Apr 03 13:55:42 IST 2016
16/04/03 13:55:42 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
16/04/03 13:55:42 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
16/04/03 13:55:42 INFO jvm.JvmMetrics: Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
16/04/03 13:55:43 INFO mapred.FileInputFormat: Total input paths to process : 1
16/04/03 13:55:43 INFO mapreduce.JobSubmitter: number of splits:1
16/04/03 13:55:44 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local1780986358_0001
16/04/03 13:55:44 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
16/04/03 13:55:44 INFO mapreduce.Job: Running job: job_local1780986358_0001
16/04/03 13:55:44 INFO mapred.LocalJobRunner: OutputCommitter set in config null
16/04/03 13:55:44 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapred.FileOutputCommitter
16/04/03 13:55:45 INFO mapred.LocalJobRunner: Waiting for map tasks
16/04/03 13:55:45 INFO mapred.LocalJobRunner: Starting task: attempt_local1780986358_0001_m_000000_0
16/04/03 13:55:46 INFO mapreduce.Job: Job job_local1780986358_0001 running in uber mode : false
16/04/03 13:55:46 INFO mapreduce.Job: map 0% reduce 0%
16/04/03 13:55:46 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ]
16/04/03 13:55:46 INFO mapred.MapTask: Processing split: hdfs://localhost:54310/user/hduser/dblp-small/records-000/part-00000:0+36687
16/04/03 13:55:46 INFO mapred.MapTask: numReduceTasks: 1
16/04/03 13:55:49 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
16/04/03 13:55:49 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
16/04/03 13:55:49 INFO mapred.MapTask: soft limit at 83886080
16/04/03 13:55:49 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
16/04/03 13:55:49 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
16/04/03 13:55:49 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
16/04/03 13:55:52 INFO mapred.LocalJobRunner: hdfs://localhost:54310/user/hduser/dblp-small/records-000/part-00000:0+36687 > map
16/04/03 13:55:54 INFO mapred.LocalJobRunner: hdfs://localhost:54310/user/hduser/dblp-small/records-000/part-00000:0+36687 > map
16/04/03 13:55:54 INFO mapred.MapTask: Starting flush of map output
16/04/03 13:55:54 INFO mapred.MapTask: Spilling map output
16/04/03 13:55:54 INFO mapred.MapTask: bufstart = 0; bufend = 15588; bufvoid = 104857600
16/04/03 13:55:54 INFO mapred.MapTask: kvstart = 26214396(104857584); kvend = 26209408(104837632); length = 4989/6553600
16/04/03 13:55:54 INFO mapred.MapTask: Finished spill 0
16/04/03 13:55:54 INFO mapred.Task: Task:attempt_local1780986358_0001_m_000000_0 is done. And is in the process of committing
16/04/03 13:55:54 INFO mapred.LocalJobRunner: hdfs://localhost:54310/user/hduser/dblp-small/records-000/part-00000:0+36687
16/04/03 13:55:54 INFO mapred.Task: Task 'attempt_local1780986358_0001_m_000000_0' done.
16/04/03 13:55:54 INFO mapred.LocalJobRunner: Finishing task: attempt_local1780986358_0001_m_000000_0
16/04/03 13:55:54 INFO mapred.LocalJobRunner: map task executor complete.
16/04/03 13:55:54 INFO mapred.LocalJobRunner: Waiting for reduce tasks
16/04/03 13:55:54 INFO mapred.LocalJobRunner: Starting task: attempt_local1780986358_0001_r_000000_0
16/04/03 13:55:54 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ]
16/04/03 13:55:54 INFO mapreduce.Job: map 100% reduce 0%
16/04/03 13:55:54 INFO mapred.ReduceTask: Using ShuffleConsumerPlugin: org.apache.hadoop.mapreduce.task.reduce.Shuffle@3209e0
16/04/03 13:55:54 INFO reduce.MergeManagerImpl: MergerManager: memoryLimit=363285696, maxSingleShuffleLimit=90821424, mergeThreshold=239768576, ioSortFactor=10, memToMemMergeOutputsThreshold=10
16/04/03 13:55:54 INFO reduce.EventFetcher: attempt_local1780986358_0001_r_000000_0 Thread started: EventFetcher for fetching Map Completion Events
16/04/03 13:55:56 INFO reduce.LocalFetcher: localfetcher#1 about to shuffle output of map attempt_local1780986358_0001_m_000000_0 decomp: 9062 len: 9066 to MEMORY
16/04/03 13:55:56 INFO reduce.InMemoryMapOutput: Read 9062 bytes from map-output for attempt_local1780986358_0001_m_000000_0
16/04/03 13:55:57 INFO reduce.MergeManagerImpl: closeInMemoryFile -> map-output of size: 9062, inMemoryMapOutputs.size() -> 1, commitMemory -> 0, usedMemory ->9062
16/04/03 13:55:57 INFO reduce.EventFetcher: EventFetcher is interrupted.. Returning
16/04/03 13:55:57 INFO mapred.LocalJobRunner: 1 / 1 copied.
16/04/03 13:55:57 INFO reduce.MergeManagerImpl: finalMerge called with 1 in-memory map-outputs and 0 on-disk map-outputs
16/04/03 13:55:57 INFO mapred.Merger: Merging 1 sorted segments
16/04/03 13:55:57 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 9056 bytes
16/04/03 13:55:57 INFO reduce.MergeManagerImpl: Merged 1 segments, 9062 bytes to disk to satisfy reduce memory limit
16/04/03 13:55:57 INFO reduce.MergeManagerImpl: Merging 1 files, 9066 bytes from disk
16/04/03 13:55:57 INFO reduce.MergeManagerImpl: Merging 0 segments, 0 bytes from memory into reduce
16/04/03 13:55:57 INFO mapred.Merger: Merging 1 sorted segments
16/04/03 13:55:57 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 9056 bytes
16/04/03 13:55:57 INFO mapred.LocalJobRunner: 1 / 1 copied.
16/04/03 13:56:00 INFO mapred.LocalJobRunner: reduce > reduce
16/04/03 13:56:00 INFO mapreduce.Job: map 100% reduce 100%
16/04/03 13:56:01 INFO mapred.Task: Task:attempt_local1780986358_0001_r_000000_0 is done. And is in the process of committing
16/04/03 13:56:01 INFO mapred.LocalJobRunner: reduce > reduce
16/04/03 13:56:01 INFO mapred.Task: Task attempt_local1780986358_0001_r_000000_0 is allowed to commit now
16/04/03 13:56:02 INFO output.FileOutputCommitter: Saved output of task 'attempt_local1780986358_0001_r_000000_0' to hdfs://localhost:54310/user/hduser/dblp-small/tokens.phase1-000/_temporary/0/task_local1780986358_0001_r_000000
16/04/03 13:56:02 INFO mapred.LocalJobRunner: reduce > reduce
16/04/03 13:56:02 INFO mapred.Task: Task 'attempt_local1780986358_0001_r_000000_0' done.
16/04/03 13:56:02 INFO mapred.LocalJobRunner: Finishing task: attempt_local1780986358_0001_r_000000_0
16/04/03 13:56:02 INFO mapred.LocalJobRunner: reduce task executor complete.
16/04/03 13:56:02 INFO mapreduce.Job: Job job_local1780986358_0001 completed successfully
16/04/03 13:56:03 INFO mapreduce.Job: Counters: 38
File System Counters
FILE: Number of bytes read=1080562
FILE: Number of bytes written=1589660
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=73374
HDFS: Number of bytes written=12847
HDFS: Number of read operations=15
HDFS: Number of large read operations=0
HDFS: Number of write operations=18
Map-Reduce Framework
Map input records=100
Map output records=1248
Map output bytes=15588
Map output materialized bytes=9066
Input split bytes=120
Combine input records=1248
Combine output records=597
Reduce input groups=597
Reduce shuffle bytes=9066
Reduce input records=597
Reduce output records=597
Spilled Records=1194
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=176
CPU time spent (ms)=0
Physical memory (bytes) snapshot=0
Virtual memory (bytes) snapshot=0
Total committed heap usage (bytes)=241836032
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=36687
File Output Format Counters
Bytes Written=12847
Job ended: Sun Apr 03 13:56:04 IST 2016
The job took 21.44 seconds.
FuzzyJoinDriver(TokensBasic.phase2)
Input Path: {hdfs://localhost:54310/user/hduser/dblp-small/tokens.phase1-000}
Output Path: hdfs://localhost:54310/user/hduser/dblp-small/tokens-000
Map Jobs: 2
Reduce Jobs: 1
Properties: {fuzzyjoin.similarity.name=Jaccard
fuzzyjoin.similarity.threshold=.5
fuzzyjoin.tokenizer=Word
fuzzyjoin.tokens.package=Scalar
fuzzyjoin.tokens.lengthstats=false
fuzzyjoin.ridpairs.group.class=TokenIdentity
fuzzyjoin.ridpairs.group.factor=1
fuzzyjoin.data.tokens=
fuzzyjoin.data.joinindex=}
Job started: Sun Apr 03 13:56:04 IST 2016
16/04/03 13:56:04 INFO jvm.JvmMetrics: Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
16/04/03 13:56:04 INFO jvm.JvmMetrics: Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
16/04/03 13:56:05 INFO mapred.FileInputFormat: Total input paths to process : 1
16/04/03 13:56:05 INFO mapreduce.JobSubmitter: number of splits:1
16/04/03 13:56:05 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local954589393_0002
16/04/03 13:56:05 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
16/04/03 13:56:05 INFO mapreduce.Job: Running job: job_local954589393_0002
16/04/03 13:56:05 INFO mapred.LocalJobRunner: OutputCommitter set in config null
16/04/03 13:56:05 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapred.FileOutputCommitter
16/04/03 13:56:05 INFO mapred.LocalJobRunner: Waiting for map tasks
16/04/03 13:56:05 INFO mapred.LocalJobRunner: Starting task: attempt_local954589393_0002_m_000000_0
16/04/03 13:56:05 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ]
16/04/03 13:56:05 INFO mapred.MapTask: Processing split: hdfs://localhost:54310/user/hduser/dblp-small/tokens.phase1-000/part-00000:0+12847
16/04/03 13:56:05 INFO mapred.MapTask: numReduceTasks: 1
16/04/03 13:56:06 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
16/04/03 13:56:06 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
16/04/03 13:56:06 INFO mapred.MapTask: soft limit at 83886080
16/04/03 13:56:06 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
16/04/03 13:56:06 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
16/04/03 13:56:06 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
16/04/03 13:56:06 INFO mapred.LocalJobRunner:
16/04/03 13:56:06 INFO mapred.MapTask: Starting flush of map output
16/04/03 13:56:06 INFO mapred.MapTask: Spilling map output
16/04/03 13:56:06 INFO mapred.MapTask: bufstart = 0; bufend = 7866; bufvoid = 104857600
16/04/03 13:56:06 INFO mapred.MapTask: kvstart = 26214396(104857584); kvend = 26212012(104848048); length = 2385/6553600
16/04/03 13:56:06 INFO mapred.MapTask: Finished spill 0
16/04/03 13:56:06 INFO mapred.Task: Task:attempt_local954589393_0002_m_000000_0 is done. And is in the process of committing
16/04/03 13:56:06 INFO mapred.LocalJobRunner: hdfs://localhost:54310/user/hduser/dblp-small/tokens.phase1-000/part-00000:0+12847
16/04/03 13:56:06 INFO mapred.Task: Task 'attempt_local954589393_0002_m_000000_0' done.
16/04/03 13:56:06 INFO mapred.LocalJobRunner: Finishing task: attempt_local954589393_0002_m_000000_0
16/04/03 13:56:06 INFO mapred.LocalJobRunner: map task executor complete.
16/04/03 13:56:06 INFO mapred.LocalJobRunner: Waiting for reduce tasks
16/04/03 13:56:06 INFO mapred.LocalJobRunner: Starting task: attempt_local954589393_0002_r_000000_0
16/04/03 13:56:06 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ]
16/04/03 13:56:06 INFO mapred.ReduceTask: Using ShuffleConsumerPlugin: org.apache.hadoop.mapreduce.task.reduce.Shuffle@4950dd
16/04/03 13:56:06 INFO reduce.MergeManagerImpl: MergerManager: memoryLimit=363285696, maxSingleShuffleLimit=90821424, mergeThreshold=239768576, ioSortFactor=10, memToMemMergeOutputsThreshold=10
16/04/03 13:56:06 INFO reduce.EventFetcher: attempt_local954589393_0002_r_000000_0 Thread started: EventFetcher for fetching Map Completion Events
16/04/03 13:56:06 INFO reduce.LocalFetcher: localfetcher#2 about to shuffle output of map attempt_local954589393_0002_m_000000_0 decomp: 9062 len: 9066 to MEMORY
16/04/03 13:56:06 INFO reduce.InMemoryMapOutput: Read 9062 bytes from map-output for attempt_local954589393_0002_m_000000_0
16/04/03 13:56:06 INFO reduce.MergeManagerImpl: closeInMemoryFile -> map-output of size: 9062, inMemoryMapOutputs.size() -> 1, commitMemory -> 0, usedMemory ->9062
16/04/03 13:56:06 INFO reduce.EventFetcher: EventFetcher is interrupted.. Returning
16/04/03 13:56:06 INFO mapred.LocalJobRunner: 1 / 1 copied.
16/04/03 13:56:06 INFO reduce.MergeManagerImpl: finalMerge called with 1 in-memory map-outputs and 0 on-disk map-outputs
16/04/03 13:56:06 INFO mapred.Merger: Merging 1 sorted segments
16/04/03 13:56:06 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 9056 bytes
16/04/03 13:56:06 INFO reduce.MergeManagerImpl: Merged 1 segments, 9062 bytes to disk to satisfy reduce memory limit
16/04/03 13:56:06 INFO reduce.MergeManagerImpl: Merging 1 files, 9066 bytes from disk
16/04/03 13:56:06 INFO reduce.MergeManagerImpl: Merging 0 segments, 0 bytes from memory into reduce
16/04/03 13:56:06 INFO mapred.Merger: Merging 1 sorted segments
16/04/03 13:56:06 INFO mapreduce.Job: Job job_local954589393_0002 running in uber mode : false
16/04/03 13:56:06 INFO mapreduce.Job: map 100% reduce 0%
16/04/03 13:56:06 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 9056 bytes
16/04/03 13:56:06 INFO mapred.LocalJobRunner: 1 / 1 copied.
16/04/03 13:56:06 INFO mapred.Task: Task:attempt_local954589393_0002_r_000000_0 is done. And is in the process of committing
16/04/03 13:56:06 INFO mapred.LocalJobRunner: 1 / 1 copied.
16/04/03 13:56:06 INFO mapred.Task: Task attempt_local954589393_0002_r_000000_0 is allowed to commit now
16/04/03 13:56:06 INFO output.FileOutputCommitter: Saved output of task 'attempt_local954589393_0002_r_000000_0' to hdfs://localhost:54310/user/hduser/dblp-small/tokens-000/_temporary/0/task_local954589393_0002_r_000000
16/04/03 13:56:06 INFO mapred.LocalJobRunner: reduce > reduce
16/04/03 13:56:06 INFO mapred.Task: Task 'attempt_local954589393_0002_r_000000_0' done.
16/04/03 13:56:06 INFO mapred.LocalJobRunner: Finishing task: attempt_local954589393_0002_r_000000_0
16/04/03 13:56:06 INFO mapred.LocalJobRunner: reduce task executor complete.
16/04/03 13:56:07 INFO mapreduce.Job: map 100% reduce 100%
16/04/03 13:56:07 INFO mapreduce.Job: Job job_local954589393_0002 completed successfully
16/04/03 13:56:07 INFO mapreduce.Job: Counters: 38
File System Counters
FILE: Number of bytes read=2179300
FILE: Number of bytes written=3182466
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=99068
HDFS: Number of bytes written=31172
HDFS: Number of read operations=45
HDFS: Number of large read operations=0
HDFS: Number of write operations=30
Map-Reduce Framework
Map input records=597
Map output records=597
Map output bytes=7866
Map output materialized bytes=9066
Input split bytes=126
Combine input records=0
Combine output records=0
Reduce input groups=18
Reduce shuffle bytes=9066
Reduce input records=597
Reduce output records=597
Spilled Records=1194
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=488
CPU time spent (ms)=0
Physical memory (bytes) snapshot=0
Virtual memory (bytes) snapshot=0
Total committed heap usage (bytes)=336207872
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=12847
File Output Format Counters
Bytes Written=5478
Job ended: Sun Apr 03 13:56:07 IST 2016
The job took 3.563 seconds.
Multi-Job ended: Sun Apr 03 13:56:07 IST 2016
The multi-job took 25.128 seconds.
FuzzyJoinDriver(RIDPairsImproved)
Input Path: {hdfs://localhost:54310/user/hduser/dblp-small/records-000}
Output Path: hdfs://localhost:54310/user/hduser/dblp-small/ridpairs-000
Map Jobs: 2
Reduce Jobs: 1
Properties: {fuzzyjoin.similarity.name=Jaccard
fuzzyjoin.similarity.threshold=.5
fuzzyjoin.tokenizer=Word
fuzzyjoin.tokens.package=Scalar
fuzzyjoin.tokens.lengthstats=false
fuzzyjoin.ridpairs.group.class=TokenIdentity
fuzzyjoin.ridpairs.group.factor=1
fuzzyjoin.data.tokens=dblp-small/tokens-000/part-00000
fuzzyjoin.data.joinindex=}
Job started: Sun Apr 03 13:56:08 IST 2016
16/04/03 13:56:08 INFO jvm.JvmMetrics: Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
16/04/03 13:56:08 INFO jvm.JvmMetrics: Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
16/04/03 13:56:09 INFO mapred.FileInputFormat: Total input paths to process : 1
16/04/03 13:56:09 INFO mapreduce.JobSubmitter: number of splits:1
16/04/03 13:56:09 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local1951342027_0003
16/04/03 13:56:16 INFO mapred.LocalDistributedCacheManager: Creating symlink: /tmp/mapred/local/1459671970648/part-00000 <- /home/midhu/fuzzyjoin/fuzzyjoin-hadoop/part-00000
16/04/03 13:56:16 INFO mapred.LocalDistributedCacheManager: Localized hdfs://localhost:54310/user/hduser/dblp-small/tokens-000/part-00000 as file:/tmp/mapred/local/1459671970648/part-00000
16/04/03 13:56:17 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
16/04/03 13:56:17 INFO mapreduce.Job: Running job: job_local1951342027_0003
16/04/03 13:56:17 INFO mapred.LocalJobRunner: OutputCommitter set in config null
16/04/03 13:56:17 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapred.FileOutputCommitter
16/04/03 13:56:17 INFO mapred.LocalJobRunner: Waiting for map tasks
16/04/03 13:56:17 INFO mapred.LocalJobRunner: Starting task: attempt_local1951342027_0003_m_000000_0
16/04/03 13:56:17 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ]
16/04/03 13:56:17 INFO mapred.MapTask: Processing split: hdfs://localhost:54310/user/hduser/dblp-small/records-000/part-00000:0+36687
16/04/03 13:56:17 INFO mapred.MapTask: numReduceTasks: 1
16/04/03 13:56:17 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
16/04/03 13:56:17 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
16/04/03 13:56:17 INFO mapred.MapTask: soft limit at 83886080
16/04/03 13:56:17 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
16/04/03 13:56:17 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
16/04/03 13:56:17 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
16/04/03 13:56:17 INFO mapred.LocalJobRunner: map task executor complete.
16/04/03 13:56:17 WARN mapred.LocalJobRunner: job_local1951342027_0003
java.lang.Exception: java.lang.RuntimeException: Error in configuring object
at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
Caused by: java.lang.RuntimeException: Error in configuring object
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:446)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
... 10 more
Caused by: java.lang.RuntimeException: Error in configuring object
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:38)
... 15 more
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
... 18 more
Caused by: java.lang.RuntimeException: java.io.FileNotFoundException: file:/tmp/mapred/local/1459671970648/part-00000 (No such file or directory)
at edu.uci.ics.fuzzyjoin.tokenorder.TokenLoad.loadTokenRank(TokenLoad.java:60)
at edu.uci.ics.fuzzyjoin.tokenorder.TokenLoad.loadTokenRank(TokenLoad.java:40)
at edu.uci.ics.fuzzyjoin.hadoop.ridpairs.token.MapSelfJoin.configure(MapSelfJoin.java:98)
... 23 more
Caused by: java.io.FileNotFoundException: file:/tmp/mapred/local/1459671970648/part-00000 (No such file or directory)
at java.io.FileInputStream.open(Native Method)
at java.io.FileInputStream.<init>(FileInputStream.java:146)
at java.io.FileInputStream.<init>(FileInputStream.java:101)
at edu.uci.ics.fuzzyjoin.tokenorder.TokenLoad.loadTokenRank(TokenLoad.java:45)
... 25 more
16/04/03 13:56:18 INFO mapreduce.Job: Job job_local1951342027_0003 running in uber mode : false
16/04/03 13:56:18 INFO mapreduce.Job: map 0% reduce 0%
16/04/03 13:56:18 INFO mapreduce.Job: Job job_local1951342027_0003 failed with state FAILED due to: NA
16/04/03 13:56:18 INFO mapreduce.Job: Counters: 0
java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:836)
at edu.uci.ics.fuzzyjoin.hadoop.FuzzyJoinDriver.run(FuzzyJoinDriver.java:179)
at edu.uci.ics.fuzzyjoin.hadoop.ridpairs.RIDPairsImproved.main(RIDPairsImproved.java:108)
at edu.uci.ics.fuzzyjoin.hadoop.FuzzyJoin.bib(FuzzyJoin.java:39)
at edu.uci.ics.fuzzyjoin.hadoop.FuzzyJoin.main(FuzzyJoin.java:86)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:71)
at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144)
at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:152)
at edu.uci.ics.fuzzyjoin.hadoop.FuzzyJoinDriver.main(FuzzyJoinDriver.java:121)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
I think this is a Hadoop configuration error on Ubuntu. I used the configuration from this tutorial: http://www.bogotobogo.com/hadoop/bigdata_hadoop_install_on_ubuntu_single_node_cluster.php
1 Answer
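I finally managed to run the code and fix the error. The error occurred because the MapReduce program was running locally. I changed it to run on YARN, and the code now works fine for all kinds of data.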
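For readers hitting the same trace: the job IDs in the log start with job_local and the tasks are run by LocalJobRunner, which is what Hadoop prints when mapreduce.framework.name is left at its default value of local. The answer does not say which file was edited, so the following is only a minimal sketch, assuming a Hadoop 2.x single-node install whose configuration lives in etc/hadoop/mapred-site.xml under the Hadoop home directory:

```xml
<!-- mapred-site.xml (assumed location: $HADOOP_HOME/etc/hadoop/) -->
<configuration>
  <property>
    <!-- Submit MapReduce jobs to YARN instead of the in-process LocalJobRunner -->
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
```

With this setting the YARN daemons (ResourceManager and NodeManager) need to be running before the job is submitted, and yarn-site.xml usually also needs yarn.nodemanager.aux-services set to mapreduce_shuffle. After resubmitting, the job ID should no longer carry the job_local prefix.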