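I recently started learning how to use Hadoop and decided it was time to try writing some code. Before that, I wanted to run the example from the getting-started page. However, it does not seem to produce any visible result.
I am currently using Hadoop 3.3.1 in a single-node setup with JDK 11.0.11. I am running it on Windows 10 (due to current development requirements).
I used the following command in cmd: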
hadoop jar %hadoop_home%/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.1.jar grep input /output 'dfs[a-z.]+'
Output of the command:
C:\Windows\system32>hadoop jar %hadoop_home%/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.1.jar grep input /output 'dfs[a-z.]+'
2021-12-15 00:33:10,486 INFO client.DefaultNoHARMFailoverProxyProvider: Connecting to ResourceManager at /0.0.0.0:8032
2021-12-15 00:33:10,800 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /tmp/hadoop-yarn/staging/E/.staging/job_1639519343908_0005
2021-12-15 00:33:11,029 INFO input.FileInputFormat: Total input files to process : 10
2021-12-15 00:33:11,108 INFO mapreduce.JobSubmitter: number of splits:10
2021-12-15 00:33:11,281 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1639519343908_0005
2021-12-15 00:33:11,281 INFO mapreduce.JobSubmitter: Executing with tokens: []
2021-12-15 00:33:11,442 INFO conf.Configuration: resource-types.xml not found
2021-12-15 00:33:11,443 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
2021-12-15 00:33:11,497 INFO impl.YarnClientImpl: Submitted application application_1639519343908_0005
2021-12-15 00:33:11,527 INFO mapreduce.Job: The url to track the job: http://DESKTOP-S15C716:8088/proxy/application_1639519343908_0005/
2021-12-15 00:33:11,528 INFO mapreduce.Job: Running job: job_1639519343908_0005
2021-12-15 00:33:19,611 INFO mapreduce.Job: Job job_1639519343908_0005 running in uber mode : false
2021-12-15 00:33:19,615 INFO mapreduce.Job: map 0% reduce 0%
2021-12-15 00:33:31,178 INFO mapreduce.Job: map 50% reduce 0%
2021-12-15 00:33:32,263 INFO mapreduce.Job: map 60% reduce 0%
2021-12-15 00:33:39,624 INFO mapreduce.Job: map 90% reduce 0%
2021-12-15 00:33:40,632 INFO mapreduce.Job: map 100% reduce 0%
2021-12-15 00:33:41,636 INFO mapreduce.Job: map 100% reduce 100%
2021-12-15 00:33:41,648 INFO mapreduce.Job: Job job_1639519343908_0005 completed successfully
2021-12-15 00:33:41,760 INFO mapreduce.Job: Counters: 51
File System Counters
FILE: Number of bytes read=6
FILE: Number of bytes written=3021766
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=31877
HDFS: Number of bytes written=86
HDFS: Number of read operations=35
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
HDFS: Number of bytes read erasure-coded=0
Job Counters
Killed map tasks=1
Launched map tasks=10
Launched reduce tasks=1
Data-local map tasks=10
Total time spent by all maps in occupied slots (ms)=89653
Total time spent by all reduces in occupied slots (ms)=8222
Total time spent by all map tasks (ms)=89653
Total time spent by all reduce tasks (ms)=8222
Total vcore-milliseconds taken by all map tasks=89653
Total vcore-milliseconds taken by all reduce tasks=8222
Total megabyte-milliseconds taken by all map tasks=91804672
Total megabyte-milliseconds taken by all reduce tasks=8419328
Map-Reduce Framework
Map input records=819
Map output records=0
Map output bytes=0
Map output materialized bytes=60
Input split bytes=1139
Combine input records=0
Combine output records=0
Reduce input groups=0
Reduce shuffle bytes=60
Reduce input records=0
Reduce output records=0
Spilled Records=0
Shuffled Maps =10
Failed Shuffles=0
Merged Map outputs=10
GC time elapsed (ms)=90
CPU time spent (ms)=0
Physical memory (bytes) snapshot=0
Virtual memory (bytes) snapshot=0
Total committed heap usage (bytes)=2952790016
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=30738
File Output Format Counters
Bytes Written=86
2021-12-15 00:33:41,790 INFO client.DefaultNoHARMFailoverProxyProvider: Connecting to ResourceManager at /0.0.0.0:8032
2021-12-15 00:33:41,814 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /tmp/hadoop-yarn/staging/E/.staging/job_1639519343908_0006
2021-12-15 00:33:41,855 INFO input.FileInputFormat: Total input files to process : 1
2021-12-15 00:33:41,913 INFO mapreduce.JobSubmitter: number of splits:1
2021-12-15 00:33:41,950 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1639519343908_0006
2021-12-15 00:33:41,950 INFO mapreduce.JobSubmitter: Executing with tokens: []
2021-12-15 00:33:42,179 INFO impl.YarnClientImpl: Submitted application application_1639519343908_0006
2021-12-15 00:33:42,190 INFO mapreduce.Job: The url to track the job: http://DESKTOP-S15C716:8088/proxy/application_1639519343908_0006/
2021-12-15 00:33:42,191 INFO mapreduce.Job: Running job: job_1639519343908_0006
2021-12-15 00:33:55,301 INFO mapreduce.Job: Job job_1639519343908_0006 running in uber mode : false
2021-12-15 00:33:55,302 INFO mapreduce.Job: map 0% reduce 0%
2021-12-15 00:34:00,336 INFO mapreduce.Job: map 100% reduce 0%
2021-12-15 00:34:06,366 INFO mapreduce.Job: map 100% reduce 100%
2021-12-15 00:34:07,375 INFO mapreduce.Job: Job job_1639519343908_0006 completed successfully
2021-12-15 00:34:07,404 INFO mapreduce.Job: Counters: 50
File System Counters
FILE: Number of bytes read=6
FILE: Number of bytes written=548197
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=212
HDFS: Number of bytes written=0
HDFS: Number of read operations=9
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
HDFS: Number of bytes read erasure-coded=0
Job Counters
Launched map tasks=1
Launched reduce tasks=1
Data-local map tasks=1
Total time spent by all maps in occupied slots (ms)=3232
Total time spent by all reduces in occupied slots (ms)=3610
Total time spent by all map tasks (ms)=3232
Total time spent by all reduce tasks (ms)=3610
Total vcore-milliseconds taken by all map tasks=3232
Total vcore-milliseconds taken by all reduce tasks=3610
Total megabyte-milliseconds taken by all map tasks=3309568
Total megabyte-milliseconds taken by all reduce tasks=3696640
Map-Reduce Framework
Map input records=0
Map output records=0
Map output bytes=0
Map output materialized bytes=6
Input split bytes=126
Combine input records=0
Combine output records=0
Reduce input groups=0
Reduce shuffle bytes=6
Reduce input records=0
Reduce output records=0
Spilled Records=0
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=13
CPU time spent (ms)=0
Physical memory (bytes) snapshot=0
Virtual memory (bytes) snapshot=0
Total committed heap usage (bytes)=536870912
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=86
File Output Format Counters
Bytes Written=0
However, when I looked at the contents of the newly created /output folder, I got the following result:
hdfs dfs -ls /output
Found 2 items
-rw-r--r-- 1 E supergroup 0 2021-12-15 00:34 /output/_SUCCESS
-rw-r--r-- 1 E supergroup 0 2021-12-15 00:34 /output/part-r-00000
In other words, no data was written to these files! Can anyone help me?
1 Answer
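If the HDFS input folder contains no data matching the grep pattern 'dfs[a-z.]+', the output will be empty. Going by the linked documentation (which is for Unix, not Windows), make sure the command it describes has completed.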
You can also run
grep dfs $HADOOP_HOME/etc/hadoop/*.xml
locally (at least on Unix) to verify that there is data to output.
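Since grep is not normally available in a Windows cmd session, the same check can be sketched in Python. This is a minimal sketch, not part of Hadoop: the function name count_matches and the sample file are hypothetical, and it assumes Python's re module treats 'dfs[a-z.]+' the same way the Java regex engine used by the example job does, which should hold for a pattern this simple. The second call tests one possible cause worth ruling out, offered as an assumption rather than a confirmed diagnosis: cmd does not treat single quotes as quoting characters, so the job may have received the pattern with the quotes included. That would be consistent with the first job's counters, which show 819 map input records but 0 map output records.

```python
import re
import tempfile
from pathlib import Path

# Regex from the question, as the grep example is meant to receive it.
PATTERN = r"dfs[a-z.]+"


def count_matches(directory, pattern, glob="*.xml"):
    """Count every regex match in the files under `directory`.

    Hypothetical helper: it mimics the kind of result the grep example
    reports, i.e. each matched string and how often it occurs.
    """
    regex = re.compile(pattern)
    counts = {}
    for path in sorted(Path(directory).glob(glob)):
        text = path.read_text(encoding="utf-8", errors="replace")
        for match in regex.findall(text):
            counts[match] = counts.get(match, 0) + 1
    return counts


if __name__ == "__main__":
    # Made-up sample file so the sketch runs without a Hadoop install.
    with tempfile.TemporaryDirectory() as tmp:
        sample = Path(tmp) / "hdfs-site.xml"
        sample.write_text(
            "<property><name>dfs.replication</name><value>1</value></property>\n",
            encoding="utf-8",
        )
        # Pattern as intended: finds the property name.
        print(count_matches(tmp, PATTERN))  # {'dfs.replication': 1}
        # Pattern with the single quotes kept as literal characters.
        print(count_matches(tmp, "'" + PATTERN + "'"))  # {}
```

To check a real installation, pass the configuration directory instead of the temporary one, for example os.path.join(os.environ["HADOOP_HOME"], "etc", "hadoop"). If the intended pattern matches locally but the job output stays empty, it may be worth removing /output and rerunning the job from cmd with the pattern in double quotes instead of single quotes.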