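I'm having trouble writing a reducer that outputs only the top 10 (key, value) pairs.
My current output format is ((year, market), total amount). What I want is the top 10 total amounts for each year. My current code outputs the total for every market in every year.
Any suggestions would be greatly appreciated!
Mapper: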
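import java.io.IOException;
import java.io.StringReader;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
// CSVReader comes from OpenCSV; its import path depends on the library version in use.

public class FundingMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private Text Year = new Text();
    private Text Market = new Text();

    @Override
    public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        // Parse one input line into its columns.
        String line = value.toString();
        CSVReader reader = new CSVReader(new StringReader(line));
        String[] array = reader.readNext();
        reader.close();

        Year.set(array[14]);
        Market.set(array[3]);

        // Strip everything except digits (for example the thousands separators).
        String amountString = array[15].replaceAll("[^0-9]", "");
        int amount = 0;
        try {
            amount = Integer.parseInt(amountString);
        } catch (NumberFormatException nfe) {
            // Skip records whose amount is empty or does not fit in an int.
            return;
        }

        // Composite key "year market " with the amount as the value.
        IntWritable intW = new IntWritable(amount);
        String S = new StringBuilder().append(Year + " ").append(Market + " ").toString();
        context.write(new Text(S), intW);
    }
}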
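Reducer:
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class FundingReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        // Sum all amounts that share the same (year, market) key.
        int sum = 0;
        for (IntWritable value : values) {
            sum += value.get();
        }
        context.write(key, new IntWritable(sum));
    }
}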
Sample data:
/organization/contravir-pharmaceuticals ContraVir Pharmaceuticals |Biotechnology| Biotechnology USA NY New York City New York /funding-round/9a7cc724deba554585e2b79c14605866 post_ipo_equity 8/22/14 2014-08 2014-Q3 2014 4,742,648
/organization/contravir-pharmaceuticals ContraVir Pharmaceuticals |Biotechnology| Biotechnology USA NY New York City New York /funding-round/04a7ec54417a0f9a6c99cf8db2eac819 venture A 10/15/14 2014-10 2014-Q4 2014 9,000,000
/organization/contravir-pharmaceuticals ContraVir Pharmaceuticals |Biotechnology| Biotechnology USA NY New York City New York /funding-round/328384053df3a992ca6d5da55ca0420e venture 2/14/14 2014-02 2014-Q1 2014 3,225,000
/organization/contrib-com contrib.com |Entrepreneur|Technology|Domains|Education|Social Media| Social Media USA FL Palm Beaches Delray Beach /funding-round/fea112ed22657c1456820aa26af3ab17 seed 6/17/14 2014-06 2014-Q2 2014 300,000
Sample output:
2014 Biotechnology 16967648
2014 Social Media 300000
1 Answer
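You need to emit the year as the key in your map output. That way, each reduce call receives all the values for a single year, and you can then keep only the top 10 of them in your output. See below.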
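As a rough sketch of the reduce-side logic this suggests, the following standalone Java class sums amounts per market for one year and keeps the N largest totals. It deliberately leaves out the Hadoop classes so it can run on its own. The class name `TopMarketsPerYear`, the helper names `sumByMarket` and `topN`, and the `market<TAB>amount` value encoding are all assumptions made for illustration, not part of the original job. In a real reducer keyed by year, the same steps would run inside `reduce()` over the values for that year.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TopMarketsPerYear {

    // Sums amounts per market for a single year.
    // Each value is assumed (hypothetically) to be encoded as "market<TAB>amount",
    // which is what the mapper would emit once the year alone is the key.
    public static Map<String, Long> sumByMarket(Iterable<String> values) {
        Map<String, Long> totals = new HashMap<>();
        for (String value : values) {
            int tab = value.lastIndexOf('\t');
            String market = value.substring(0, tab);
            long amount = Long.parseLong(value.substring(tab + 1));
            // long is used so that large yearly totals do not overflow an int.
            totals.merge(market, amount, Long::sum);
        }
        return totals;
    }

    // Returns at most n entries, ordered from largest total to smallest.
    public static List<Map.Entry<String, Long>> topN(Map<String, Long> totals, int n) {
        List<Map.Entry<String, Long>> entries = new ArrayList<>(totals.entrySet());
        entries.sort(Map.Entry.<String, Long>comparingByValue().reversed());
        return new ArrayList<>(entries.subList(0, Math.min(n, entries.size())));
    }

    public static void main(String[] args) {
        // The four sample rows from the question, reduced to market and amount.
        List<String> values2014 = Arrays.asList(
                "Biotechnology\t4742648",
                "Biotechnology\t9000000",
                "Biotechnology\t3225000",
                "Social Media\t300000");

        for (Map.Entry<String, Long> e : topN(sumByMarket(values2014), 10)) {
            System.out.println("2014 " + e.getKey() + " " + e.getValue());
        }
    }
}
```

Sorting the whole list is the simplest option here because the number of distinct markets per year is small. If it were large, a bounded min-heap of size N would avoid sorting every entry.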