I am learning to use Hadoop. I have an old laptop on which I installed Linux Mint 21, and I was able to install Hadoop.
The following command works. When I run hdfs dfs -ls / I get:
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.hadoop.security.authentication.util.KerberosUtil (file:/home/dell/hadoop/share/hadoop/common/lib/hadoop-auth-2.7.3.jar) to method sun.security.krb5.Config.getInstance()
WARNING: Please consider reporting this to the maintainers of org.apache.hadoop.security.authentication.util.KerberosUtil
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
Found 2 items
drwxr-xr-x - nenn supergroup 0 2023-02-11 15:30 /my_data
drwx------ - nenn supergroup 0 2023-02-11 15:21 /tmp
To see what is inside my_data, I run hdfs dfs -ls -R / and get:
drwxr-xr-x - nenn supergroup 0 2023-02-11 15:30 /my_data
-rw-r--r-- 1 nenn supergroup 1174876 2023-02-11 15:01 /my_data/book1.txt
drwx------ - nenn supergroup 0 2023-02-11 15:21 /tmp
drwx------ - nenn supergroup 0 2023-02-11 15:21 /tmp/hadoop-yarn
drwx------ - nenn supergroup 0 2023-02-11 15:29 /tmp/hadoop-yarn/staging
drwx------ - nenn supergroup 0 2023-02-11 15:21 /tmp/hadoop-yarn/staging/d
Then I run the program:
hadoop jar /home/nenn/wordcount.jar WordCount /my_data/book1.txt /my_data/output_wordcount
23/02/11 15:33:55 INFO client.RMProxy: Connecting to ResourceManager at /
23/02/11 15:33:55 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
23/02/11 15:33:56 INFO input.FileInputFormat: Total input paths to process : 1
23/02/11 15:33:56 INFO mapreduce.JobSubmitter: number of splits:1
23/02/11 15:33:56 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1676124615395_0004
23/02/11 15:33:56 INFO impl.YarnClientImpl: Submitted application application_1676124615395_0004
23/02/11 15:33:56 INFO mapreduce.Job: The url to track the job: http://my-computer-05:8088/proxy/application_1676124615395_0004/
23/02/11 15:33:56 INFO mapreduce.Job: Running job: job_1676124615395_0004
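The wordcount.jar program was given to me by my school. When I tried it at school, it worked. Now I want to try it on my own computer, but I cannot tell whether it works.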
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// defining the class WordCount
public class WordCount {

    // defining the class TokenizerMapper
    // this class is in charge of the mapping process
    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {
        // it extends the class Mapper from mapreduce api
        // this mapper takes as input an Object (identifier of the partition) and a Text (the partition of the text)
        // it outputs a Text (a word) and an Integer (1)

        // defining the value to emit
        private final static IntWritable one = new IntWritable(1);
        // initializing the word to emit
        private Text word = new Text();

        // defining the function performed during map
        @Override
        public void map(Object key, Text value, Context context
                ) throws IOException, InterruptedException {
            // tokenizing the text partition
            StringTokenizer itr = new StringTokenizer(value.toString());
            // running through the tokens
            while (itr.hasMoreTokens()) {
                // setting the value of word
                word.set(itr.nextToken().toLowerCase().replaceAll("[^a-z 0-9A-Z]", ""));
                // emitting the key-value pair
                context.write(word, one);
            }
        }
    }

    // defining the class IntSumReducer
    // this class is in charge of the reducing process
    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        // it extends the class Reducer from mapreduce api
        // it takes as input a Text (a word) and a list of integers (1s)
        // it outputs a Text (a word) and an integer (the frequency of the word)

        // initializing the frequency
        private IntWritable result = new IntWritable();

        // defining the function performed during reduce
        @Override
        public void reduce(Text key, Iterable<IntWritable> values,
                Context context
                ) throws IOException, InterruptedException {
            // initializing the sum
            int sum = 0;
            // running through the values associated to this key
            for (IntWritable val : values) {
                // incrementing the sum
                sum += val.get();
            }
            // attributing the sum to the value to emit
            result.set(sum);
            // emitting the key-value pair
            context.write(key, result);
        }
    }

    // defining the main method containing the parameters of the job
    public static void main(String[] args) throws Exception {
        // initializing configuration
        Configuration conf = new Configuration();
        // initializing job
        Job job = Job.getInstance(conf, "word count");
        // telling Hadoop which jar contains the job classes
        job.setJarByClass(WordCount.class);
        // providing job with the classes for mapper and reducer
        job.setMapperClass(TokenizerMapper.class); // mapper
        job.setCombinerClass(IntSumReducer.class); // combiner
        job.setReducerClass(IntSumReducer.class); // reducer
        // providing job with the output classes
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        // arguments to interpret
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        // completion of the job
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
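One way to separate "is the word-count logic right?" from "is the cluster running the job?" is to try the same tokenizing and summing steps in plain Java, with no Hadoop involved. The sketch below is only an illustration under that assumption: the class LocalWordCount and its methods normalize and count are hypothetical names, not part of wordcount.jar. It reuses the mapper's StringTokenizer and replaceAll pattern and stands in for the reducer with a map that sums the 1s.

```java
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical helper, not part of wordcount.jar: mirrors the mapper's
// tokenizing/normalizing and the reducer's summing without any Hadoop classes.
class LocalWordCount {

    // Same normalization as TokenizerMapper: lower-case the token, then drop
    // every character that is not a letter, a digit, or a space.
    static String normalize(String token) {
        return token.toLowerCase().replaceAll("[^a-z 0-9A-Z]", "");
    }

    // Plays the role of map + reduce for a single input string.
    static Map<String, Integer> count(String text) {
        Map<String, Integer> counts = new TreeMap<>();
        StringTokenizer itr = new StringTokenizer(text);
        while (itr.hasMoreTokens()) {
            // merge() adds 1 to the running total for this word,
            // which is what IntSumReducer does with the emitted 1s.
            counts.merge(normalize(itr.nextToken()), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Print one "word<TAB>count" line per distinct normalized token.
        count("Hello, hello world! -- 42")
                .forEach((word, n) -> System.out.println(word + "\t" + n));
    }
}
```

In this sample, a token made only of punctuation (here "--") is reduced to an empty string and still counted, which suggests the Hadoop job would also emit an empty key for such tokens.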
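Open the link printed after "The url to track the job", then look at the application's *actual* logs in the YARN UI, or use yarn logs (for example, yarn logs -applicationId application_1676124615395_0004).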
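If a browser is not handy, the ResourceManager also exposes application reports over its REST interface on the same web port as the UI. The sketch below is a rough illustration, not a replacement for reading the logs: the class YarnAppStatus and its methods are hypothetical names, the host and application ID are copied from the output in the question, and it assumes Java 11 or later (for java.net.http) and a ResourceManager reachable at that address. The returned JSON includes fields such as state and diagnostics, which can hint at why a job is not progressing.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical helper: asks the YARN ResourceManager REST API for the
// report of one application and prints the raw JSON.
class YarnAppStatus {

    // Builds the Cluster Application API URL for one application.
    // rmWebAddress is host:port of the ResourceManager web UI.
    static URI appUri(String rmWebAddress, String applicationId) {
        return URI.create("http://" + rmWebAddress
                + "/ws/v1/cluster/apps/" + applicationId);
    }

    // Fetches the JSON report; needs a running, reachable ResourceManager.
    static String fetchReport(URI uri) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(uri)
                .header("Accept", "application/json")
                .GET()
                .build();
        return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }

    public static void main(String[] args) throws Exception {
        // Values taken from the job output in the question (assumed still valid).
        URI uri = appUri("my-computer-05:8088", "application_1676124615395_0004");
        System.out.println(uri);
        // Only contact the cluster when explicitly asked to.
        if (args.length > 0 && args[0].equals("--fetch")) {
            System.out.println(fetchReport(uri));
        }
    }
}
```

Run without arguments, it only prints the URL it would query; passing --fetch performs the request.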