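I am working on the code below and ran into problems when compiling. What I am trying to implement is a word index: for each word, and for each file, it should list the file and the positions at which the word occurs. Suppose .txt contains "boy"; we would get
boy /usr/.txt: 1 3
meaning that boy is the first and third word in the file.
I am using the code below and see two errors at compile time. One is that GenericOptionsParser cannot be found, and the other is that filename cannot be found. I am trying to modify the generic wordcount code. Can someone point me in the right direction?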
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordIndex {

    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {

        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(Object key, Text value, Context context
                ) throws IOException, InterruptedException {
            //context.getInputSplit();
            //Path filePath = ((FileSplit) context.getInputSplit()).getPath();
            //String filename = ((FileSplit)context.getInputSplit()).getPath().getName();
            String line = value.toString();
            StringTokenizer itr = new StringTokenizer(line);
            //StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                String fileName = ((org.apache.hadoop.mapreduce.lib.input.FileSplit) context.getInputSplit()).getPath().getName();
                word.set(itr.nextToken().toLowerCase().replaceAll("[^a-z]+","") +" "+ filename); // get rid of special char
                context.write(word, one);
            }
        }
    }

    public static class IntSumReducer
            extends Reducer<Text,IntWritable,Text,IntWritable> {

        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values,
                Context context
                ) throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        if (otherArgs.length != 2) {
            System.err.println("Usage: wordcount <in> <out>");
            System.exit(2);
            Job job = Job.getInstance(conf, "word count");
            job.setJarByClass(DocWordIndex.class);
            job.setMapperClass(TokenizerMapper.class);
            job.setCombinerClass(IntSumReducer.class);
            job.setReducerClass(IntSumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }
}
1 Answer
368yc8dk #1
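I kept your code as it is and was able to compile it after making three modifications.
First, in the statement below, change filename to fileName (capital N) so that it matches the variable declared on the line above it. Java identifiers are case-sensitive, which is why filename could not be found.
Change:
word.set(itr.nextToken().toLowerCase().replaceAll("[^a-z]+","") +" "+ filename);
To:
word.set(itr.nextToken().toLowerCase().replaceAll("[^a-z]+","") +" "+ fileName);
Second, import the GenericOptionsParser class. Add the following import:
import org.apache.hadoop.util.GenericOptionsParser;
Third, make setJarByClass refer to the class that is actually defined in the file.
Change:
job.setJarByClass(DocWordIndex.class);
To:
job.setJarByClass(WordIndex.class);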
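Once it compiles, note that the job as written emits a 1 for every "word fileName" key and sums them, so it reports how often a word occurs in a file rather than where. Below is a minimal sketch of the position bookkeeping the question describes, written in plain Java without Hadoop so that it can be run on its own. The class PositionIndexSketch and the method index are hypothetical names introduced for illustration; the tokenizing and cleanup follow the mapper in the question. It assumes positions are counted over whitespace-separated tokens starting at 1. Inside a real mapper, which usually sees one line at a time, the same counter would only give positions within that line.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical stand-alone sketch; not part of the original WordIndex job.
class PositionIndexSketch {

    // Maps each "word fileName" key to the 1-based positions of that word in text.
    static Map<String, List<Integer>> index(String fileName, String text) {
        Map<String, List<Integer>> positions = new TreeMap<>();
        StringTokenizer itr = new StringTokenizer(text);
        int position = 0;
        while (itr.hasMoreTokens()) {
            position++; // every whitespace-separated token counts as one position
            // Same cleanup as the mapper in the question: lower-case, letters only.
            String word = itr.nextToken().toLowerCase().replaceAll("[^a-z]+", "");
            if (word.isEmpty()) {
                continue; // the token had no letters, e.g. "42" or "--"
            }
            positions.computeIfAbsent(word + " " + fileName, k -> new ArrayList<>())
                     .add(position);
        }
        return positions;
    }

    public static void main(String[] args) {
        index("sample.txt", "Boy meets boy.")
                .forEach((key, value) -> System.out.println(key + ": " + value));
    }
}
```

Running main prints "boy sample.txt: [1, 3]" followed by "meets sample.txt: [2]". In a MapReduce version, one option would be to emit the position as the value instead of the constant 1 and have the reducer concatenate the positions rather than sum them; in that case IntSumReducer could no longer be reused as the combiner.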