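I am writing a Pig program that reads a file containing city and zip, then passes the city to a UDF. The UDF loads a file of city/county pairs into a HashMap, looks up the city's county in that HashMap, and returns it.
Please let me know what I am doing wrong here; I get the following error when running the program: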
2014-12-28 16:15:16,506 WARN org.apache.hadoop.mapred.Child: Error running child
org.apache.pig.backend.executionengine.ExecException: ERROR 2078: Caught error from UDF: GetCounty, Out of bounds access [1]
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:370)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:434)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:340)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:372)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:297)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:283)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:278)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:140)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:672)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330)
at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1438)
at org.apache.hadoop.mapred.Child.main(Child.java:262)
Caused by: java.lang.ArrayIndexOutOfBoundsException: 1
at GetCounty.exec(GetCounty.java:33)
at GetCounty.exec(GetCounty.java:1)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:337)
... 15 more
2014-12-28 16:15:16,510 INFO org.apache.hadoop.mapred.Task: Runnning cleanup for the task
The input files contain the following data:
File zipcity:
irving 75038
san francisco 94903
san rafael 94905
las vegas 98043
coppel 75063
File citycounty:
irving dallas
las vegas tarrant
san francisco san francisco
coppel dallas
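The ArrayIndexOutOfBoundsException: 1 reported at GetCounty.java:33 is most likely the parsing step: none of the citycounty lines above contains a ':', and String.split returns a one-element array holding the whole line when the delimiter does not occur, so tok[1] is out of bounds. (Pre-allocating tok with new String[2] does not help, because split returns a new array.) This standalone sketch reproduces only that step; the class name SplitDemo is hypothetical.

```java
import java.util.Arrays;

public class SplitDemo {
    // Same call as the posted UDF: split on ':' with a limit of 2.
    static String[] parseWithColon(String line) {
        return line.split(":", 2);
    }

    public static void main(String[] args) {
        // A citycounty record as shown above; it contains no ':'.
        String line = "irving dallas";
        String[] tok = parseWithColon(line);
        System.out.println(Arrays.toString(tok)); // [irving dallas]
        System.out.println(tok.length);           // 1
        try {
            // The same access that fails in lookup.put(tok[0], tok[1]).
            System.out.println(tok[1]);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("ArrayIndexOutOfBoundsException, as in the task log");
        }
    }
}
```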
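import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.HashMap;
import java.util.Map;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;
import org.apache.pig.impl.util.UDFContext;

public class GetCounty extends EvalFunc<String> {
    private final String lookupfile;
    private Map<String, String> lookup = null;

    public GetCounty(String f) {
        lookupfile = f;
    }

    @Override
    public String exec(Tuple input) throws IOException {
        if (input == null || input.size() != 1 || input.get(0) == null) {
            return null;
        }
        if (lookup == null) {
            // Create the map before put(); the original left it null.
            lookup = new HashMap<String, String>();
            FileSystem fs = FileSystem.get(UDFContext.getUDFContext().getJobConf());
            BufferedReader in = new BufferedReader(new InputStreamReader(fs.open(new Path(lookupfile))));
            try {
                String line;
                while ((line = in.readLine()) != null) {
                    // Assumes citycounty is tab-delimited (city<TAB>county); splitting on ':' gave one token.
                    String[] tok = line.split("\t", 2);
                    if (tok.length == 2) {
                        lookup.put(tok[0].trim(), tok[1].trim());
                    }
                }
            } finally {
                in.close();
            }
        }
        return lookup.get((String) input.get(0));
    }
}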
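The parse-and-lookup core of the UDF can be exercised in plain Java, without Pig or HDFS, which makes delimiter problems like this one easy to catch before submitting a job. This is a minimal sketch assuming a tab-delimited citycounty file; the class CountyLookup and its load method are hypothetical names. It also shows that a city with no entry (san rafael appears in zipcity but not in citycounty) simply yields null, which Pig outputs as an empty field.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;

public class CountyLookup {
    // Builds a city -> county map from "city<TAB>county" lines (tab delimiter is an assumption).
    static Map<String, String> load(BufferedReader in) throws IOException {
        Map<String, String> lookup = new HashMap<>(); // created before any put()
        String line;
        while ((line = in.readLine()) != null) {
            String[] tok = line.split("\t", 2);
            if (tok.length == 2) { // skip malformed lines instead of throwing
                lookup.put(tok[0].trim(), tok[1].trim());
            }
        }
        return lookup;
    }

    public static void main(String[] args) throws IOException {
        String citycounty = "irving\tdallas\nlas vegas\ttarrant\n"
                + "san francisco\tsan francisco\ncoppel\tdallas\n";
        Map<String, String> lookup = load(new BufferedReader(new StringReader(citycounty)));
        System.out.println(lookup.get("irving"));     // dallas
        System.out.println(lookup.get("san rafael")); // null: no entry in citycounty
    }
}
```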
I invoke Pig as follows:
grunt> register 'PigMyUDF.jar';
grunt> define GetCounty GetCounty('pig/citycounty');
grunt> a = load 'pig/zipcity' as ( city:chararray, zip:int );
grunt> b = foreach a generate city, zip, GetCounty(city);
grunt> dump b;
1 Answer
Can you try this? The input fields are separated by tabs.
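Tabs matter here because several city names contain spaces ("san francisco", "las vegas"), so splitting on whitespace cannot tell where the city ends and the county begins. Pig's default loader, PigStorage, also splits on tab, so the load of pig/zipcity only yields city and zip correctly if that file is tab-delimited too. A small illustration, with the hypothetical class name TabSplitDemo:

```java
import java.util.Arrays;

public class TabSplitDemo {
    // Split on the first tab only; spaces inside a field are kept.
    static String[] splitOnTab(String line) {
        return line.split("\t", 2);
    }

    // Split on any run of whitespace; multi-word names are broken apart.
    static String[] splitOnWhitespace(String line) {
        return line.trim().split("\\s+");
    }

    public static void main(String[] args) {
        String record = "san francisco\tsan francisco";
        System.out.println(Arrays.toString(splitOnTab(record)));        // [san francisco, san francisco]
        System.out.println(Arrays.toString(splitOnWhitespace(record))); // [san, francisco, san, francisco]
    }
}
```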
zipcity:
citycounty:
Pig script:
Output: