It runs fine locally, but fails when run as a MapReduce job. Is it because I use some third-party libraries in reducer.py?
In the GCP Console shell, I submit the MapReduce job with the following command:
gcloud dataproc jobs submit hadoop --cluster cc-1 --region=us-central1 --jar file:///usr/lib/hadoop-mapreduce/hadoop-streaming.jar --files=mapper.py,reducer.py -- -mapper "mapper.py" -reducer "reducer.py" -input gs://2018weather/2018.csv -output gs://2018weather/output-streaming
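When checking "works locally", it can help to run mapper.py and reducer.py the way a streaming job does: as separate processes connected through stdin/stdout, with a sort in between. The sketch below approximates that flow under stated assumptions. `simulate_streaming` is a hypothetical helper, not part of Hadoop or gcloud. It sorts whole mapper output lines as a stand-in for Hadoop's sort by key, and it uses the local Python environment, so it cannot show a package that is installed locally but missing on the cluster workers.

```python
import subprocess


def simulate_streaming(mapper_cmd, reducer_cmd, input_text):
    """Hypothetical helper: run mapper -> sort -> reducer as separate processes.

    Returns (stage, returncode, stdout, stderr), where stage is "mapper" if
    the mapper failed and "reducer" otherwise.
    """
    # Run the mapper as its own process with the raw input on stdin.
    mapped = subprocess.run(mapper_cmd, input=input_text,
                            capture_output=True, text=True)
    if mapped.returncode != 0:
        return ("mapper", mapped.returncode, mapped.stdout, mapped.stderr)

    # Rough stand-in for the shuffle: sort mapper output lines so that
    # lines with the same key end up next to each other.
    lines = sorted(mapped.stdout.splitlines())
    shuffled = "".join(line + "\n" for line in lines)

    # Run the reducer as its own process; its exit code and stderr are kept
    # so a failure is visible rather than just "exited with code 1".
    reduced = subprocess.run(reducer_cmd, input=shuffled,
                             capture_output=True, text=True)
    return ("reducer", reduced.returncode, reduced.stdout, reduced.stderr)
```

For example, `simulate_streaming(["python3", "mapper.py"], ["python3", "reducer.py"], sample_text)`, where `sample_text` would be a few lines cut from 2018.csv; a non-zero return code for the reducer stage comes with the reducer's stderr, which usually names the exception.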
The error output is as follows:
packageJobJar: [] [/usr/lib/hadoop-mapreduce/hadoop-streaming-2.9.2.jar] /tmp/streamjob3637931789919464651.jar tmpDir=null
19/11/22 00:58:35 INFO client.RMProxy: Connecting to ResourceManager at cc-1-m/10.128.0.21:8032
19/11/22 00:58:35 INFO client.AHSProxy: Connecting to Application History server at cc-1-m/10.128.0.21:10200
19/11/22 00:58:36 INFO client.RMProxy: Connecting to ResourceManager at cc-1-m/10.128.0.21:8032
19/11/22 00:58:36 INFO client.AHSProxy: Connecting to Application History server at cc-1-m/10.128.0.21:10200
19/11/22 00:58:38 INFO mapred.FileInputFormat: Total input files to process : 1
19/11/22 00:58:38 INFO mapreduce.JobSubmitter: number of splits:42
19/11/22 00:58:38 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1574381669902_0002
19/11/22 00:58:39 INFO impl.YarnClientImpl: Submitted application application_1574381669902_0002
19/11/22 00:58:39 INFO mapreduce.Job: The url to track the job: http://cc-1-m:8088/proxy/application_1574381669902_0002/
19/11/22 00:58:39 INFO mapreduce.Job: Running job: job_1574381669902_0002
19/11/22 00:58:49 INFO mapreduce.Job: Job job_1574381669902_0002 running in uber mode : false
19/11/22 00:58:49 INFO mapreduce.Job: map 0% reduce 0%
19/11/22 00:59:07 INFO mapreduce.Job: map 5% reduce 0%
19/11/22 00:59:13 INFO mapreduce.Job: map 12% reduce 0%
19/11/22 00:59:15 INFO mapreduce.Job: map 31% reduce 0%
19/11/22 00:59:16 INFO mapreduce.Job: map 33% reduce 0%
19/11/22 00:59:24 INFO mapreduce.Job: map 38% reduce 0%
19/11/22 00:59:35 INFO mapreduce.Job: map 45% reduce 0%
19/11/22 00:59:36 INFO mapreduce.Job: map 50% reduce 0%
19/11/22 00:59:37 INFO mapreduce.Job: map 60% reduce 0%
19/11/22 00:59:38 INFO mapreduce.Job: map 67% reduce 0%
19/11/22 00:59:40 INFO mapreduce.Job: map 71% reduce 0%
19/11/22 00:59:56 INFO mapreduce.Job: map 79% reduce 0%
19/11/22 00:59:58 INFO mapreduce.Job: map 83% reduce 0%
19/11/22 00:59:59 INFO mapreduce.Job: map 90% reduce 0%
19/11/22 01:00:00 INFO mapreduce.Job: map 100% reduce 0%
19/11/22 01:00:09 INFO mapreduce.Job: Task Id : attempt_1574381669902_0002_r_000000_0, Status : FAILED
Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:325)
at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:538)
at org.apache.hadoop.streaming.PipeReducer.close(PipeReducer.java:134)
at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:453)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:177)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1893)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:171)
19/11/22 01:00:17 INFO mapreduce.Job: Task Id : attempt_1574381669902_0002_r_000000_1, Status : FAILED
Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:325)
at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:538)
at org.apache.hadoop.streaming.PipeReducer.close(PipeReducer.java:134)
at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:453)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:177)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1893)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:171)
19/11/22 01:00:25 INFO mapreduce.Job: Task Id : attempt_1574381669902_0002_r_000000_2, Status : FAILED
Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:325)
at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:538)
at org.apache.hadoop.streaming.PipeReducer.close(PipeReducer.java:134)
at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:453)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:177)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1893)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:171)
19/11/22 01:00:35 INFO mapreduce.Job: map 100% reduce 100%
19/11/22 01:00:35 INFO mapreduce.Job: Job job_1574381669902_0002 failed with state FAILED due to: Task failed task_1574381669902_0002_r_000000
Job failed as tasks failed. failedMaps:0 failedReduces:1
19/11/22 01:00:35 INFO mapreduce.Job: Counters: 43
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=8927219
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
GS: Number of bytes read=1171999949
GS: Number of bytes written=0
GS: Number of read operations=0
GS: Number of large read operations=0
GS: Number of write operations=0
HDFS: Number of bytes read=3234
HDFS: Number of bytes written=0
HDFS: Number of read operations=42
HDFS: Number of large read operations=0
HDFS: Number of write operations=0
Job Counters
Failed reduce tasks=4
Killed map tasks=2
Launched map tasks=42
Launched reduce tasks=4
Rack-local map tasks=42
Total time spent by all maps in occupied slots (ms)=3381160
Total time spent by all reduces in occupied slots (ms)=197616
Total time spent by all map tasks (ms)=845290
Total time spent by all reduce tasks (ms)=24702
Total vcore-milliseconds taken by all map tasks=845290
Total vcore-milliseconds taken by all reduce tasks=24702
Total megabyte-milliseconds taken by all map tasks=865576960
Total megabyte-milliseconds taken by all reduce tasks=50589696
Map-Reduce Framework
Map input records=33448213
Map output records=935
Map output bytes=28541
Map output materialized bytes=30663
Input split bytes=3234
Combine input records=0
Spilled Records=935
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=16391
CPU time spent (ms)=63120
Physical memory (bytes) snapshot=18240372736
Virtual memory (bytes) snapshot=107584892928
Total committed heap usage (bytes)=13746216960
File Input Format Counters
Bytes Read=1171999949
19/11/22 01:00:35 ERROR streaming.StreamJob: Job not successful!
Streaming Command Failed!
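In this log all map tasks finish (failedMaps:0) and only the reduce task fails, each attempt with `subprocess failed with code 1`, meaning the reducer.py process itself exited with status 1. One possible cause, assumed here and not confirmed by the log, is an ImportError because a third-party package is installed locally but not on the worker nodes. An uncaught ImportError already prints a traceback to stderr, which should end up in the failed attempt's logs (reachable from the tracking URL above). The sketch below just makes that check explicit and reports every missing package at once. `REQUIRED_MODULES` and its `numpy` entry are hypothetical placeholders, and `reduce_lines` is a stand-in for the real reduce logic.

```python
import importlib
import sys
from itertools import groupby

# Hypothetical placeholder: list the third-party packages reducer.py really imports.
REQUIRED_MODULES = ["numpy"]


def find_missing_modules(names):
    """Try to import each module; return (name, error message) for each failure."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError as exc:
            missing.append((name, str(exc)))
    return missing


def reduce_lines(lines):
    """Stand-in reduce logic: count values per key in sorted key<TAB>value lines."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines if line.strip())
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        yield "%s\t%d" % (key, sum(1 for _ in group))


def main(stdin=sys.stdin, stdout=sys.stdout, stderr=sys.stderr,
         required=REQUIRED_MODULES):
    """Return a process exit code; 1 means a required module is missing."""
    missing = find_missing_modules(required)
    if missing:
        # Report every missing module on stderr before exiting with 1.
        for name, err in missing:
            stderr.write("reducer.py: cannot import %s: %s\n" % (name, err))
        return 1
    for out in reduce_lines(stdin):
        stdout.write(out + "\n")
    return 0
```

In reducer.py this would be wired up with `if __name__ == "__main__": sys.exit(main())`. If the message does name a missing package, it would have to be installed on the worker nodes, not only on the machine where the scripts were tested.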