Pig GROUP: unable to get multiple fields

Posted by 7y4bm7vi on 2021-05-29 in Hadoop
Follows (0) | Answers (3) | Views (375)

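I want to find out who earns the highest salary in each department. I managed to get the highest salary per department, but I could not get the person's name into the result. My Pig script and input file are attached below.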

    EmpData = LOAD '/data/EmpDet3.csv' using PigStorage(',') as
        (fname:chararray,lname:chararray,position:chararray,dept:chararray, salary:chararray);
    Grp_Dept = GROUP EmpData by dept;
    EmpDataC = FOREACH EmpData GENERATE fname,lname,position,dept,(int)SUBSTRING(salary,1,10) as sal;
    Group_Pos = GROUP EmpDataC by position;
    Max_Sal = FOREACH Group_Pos GENERATE group,MAX(EmpDataC.sal);
    dump Max_Sal;

I am getting:

    (FIRE ENGINEER,103331)
    (POLICE OFFICER,90778)
    (POLICE OFFICER2,86520)
    (WATER RATE TAKER,88968)
    (CIVIL ENGINEER IV,104736)
    (ELECTRICAL MECHANIC,91520)
    (ASST TO THE ALDERMAN,70764)
    (GENERAL LABORER - DSS,40560)
    (CHIEF CONTRACT EXPEDITER,84780)

What I need is the same details together with the person's name:

    (FIRE ENGINEER,Dudolfi,103331)
    (POLICE OFFICER,AARON,90778)
    (POLICE OFFICER2,ABBATE,86520)
    (WATER RATE TAKER,AARON,88968)
    (CIVIL ENGINEER IV,ABAD JR,104736)
    (ELECTRICAL MECHANIC,ABBATACOLA,91520)
    (ASST TO THE ALDERMAN,ABARCA,70764)
    (GENERAL LABORER - DSS,ABARCA,40560)
    (CHIEF CONTRACT EXPEDITER,AARON,84780)

My input file:

    "AARON, ELVIA J",WATER RATE TAKER,WATER MGMNT,"$88,968.00 "
    "AARON, JEFFERY M",POLICE OFFICER,POLICE,"$80,778.00 "
    "AARON, KARINA",POLICE OFFICER,POLICE,"$90,778.00 "
    "AARON, KIMBERLEI R",CHIEF CONTRACT EXPEDITER,GENERAL SERVICES,"$84,780.00 "
    "ABAD JR, VICENTE M",CIVIL ENGINEER IV,WATER MGMNT,"$104,736.00 "
    "ABARCA, ANABEL",ASST TO THE ALDERMAN,CITY COUNCIL,"$70,764.00 "
    "ABARCA, EMMANUEL",GENERAL LABORER - DSS,STREETS & SAN,"$40,560.00 "
    "ABBATACOLA, ROBERT J",ELECTRICAL MECHANIC,AVIATION,"$91,520.00 "
    "ABBATEMARCO, JAMES J",FIRE ENGINEER,FIRE,"$90,456.00 "
    "ABBATE, TERRY M",POLICE OFFICER2,POLICE,"$86,520.00 "
    "XXRON, KINA",POLICE OFFICER2,POLICE,"$50,778.00 "
    "Dudolfi, Cris",FIRE ENGINEER,FIRE,"$103,331.00 "

irtuqstp #1

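OK, this should work. Keep in mind, though, that two people in the same position can have the same salary; if that salary is the highest for the position, the script below will output a record for each of them.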

    -- Load the raw CSV; salary is read as text and converted to int below.
    Emp_Data = LOAD '/data/EmpDet3.csv' USING PigStorage(',') AS
        (fname:chararray, lname:chararray, position:chararray, dept:chararray, salary:chararray);
    Grp_Dept = GROUP Emp_Data BY dept;
    Emp_DataC = FOREACH Emp_Data GENERATE fname, lname, position, dept, (int)SUBSTRING(salary,1,10) AS sal;
    Group_Pos = GROUP Emp_DataC BY position;
    -- Highest salary per position.
    Pos_max_sal = FOREACH Group_Pos GENERATE group AS pos, MAX(Emp_DataC.sal) AS highest_sal;
    -- Join back on (position, sal) to recover the employee rows that earn that maximum.
    -- The join uses Emp_DataC so that both salary keys are ints.
    Emp_max_sal = JOIN Emp_DataC BY (position, sal), Pos_max_sal BY (pos, highest_sal);
    final_set = FOREACH Emp_max_sal GENERATE position, CONCAT(fname, lname) AS emp_name, sal;
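
As noted above, the join-based script returns every employee who ties for the top salary in a position. If exactly one row per position is wanted instead, a nested FOREACH with ORDER and LIMIT is one possible alternative. The following is only a sketch: it assumes the same input path and five-column schema as the script in this answer, and `Top_per_pos`, `sorted` and `top_row` are illustrative names that do not appear in the original thread.

```pig
-- Assumption: same file and column layout as the script in this answer.
Emp_Data = LOAD '/data/EmpDet3.csv' USING PigStorage(',') AS
    (fname:chararray, lname:chararray, position:chararray, dept:chararray, salary:chararray);
Emp_DataC = FOREACH Emp_Data GENERATE fname, lname, position, dept,
    (int)SUBSTRING(salary, 1, 10) AS sal;
Group_Pos = GROUP Emp_DataC BY position;

-- For each position, sort that position's bag by salary (highest first)
-- and keep only the first row, so the name travels with the salary.
Top_per_pos = FOREACH Group_Pos {
    sorted  = ORDER Emp_DataC BY sal DESC;
    top_row = LIMIT sorted 1;
    GENERATE FLATTEN(top_row);
};
DUMP Top_per_pos;
```

When several employees share the top salary, which of them survives the LIMIT is not defined, so the join-based script remains the better choice if all tied records matter.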

2exbekwf #2

Thanks, nihal bhagchandani. The approach you mentioned worked; my script is below.

    EmpData = LOAD '/sachin/emp' USING PigStorage(',') AS (fname:chararray, lname:chararray, position:chararray, dept:chararray, salary:int);
    Grp_Dept = GROUP EmpData BY dept;
    EmpDataC = FOREACH EmpData GENERATE fname, lname, position, dept, salary AS sal;
    Group_Pos = GROUP EmpDataC BY position;
    Max_Sal = FOREACH Group_Pos GENERATE group AS pos, MAX(EmpDataC.sal) AS SalMax;
    -- Join on position as well as salary; joining on salary alone would also match
    -- employees in other positions who happen to earn the same amount.
    filterMainData = JOIN EmpData BY (position, salary), Max_Sal BY (pos, SalMax);
    filterData = FOREACH filterMainData GENERATE EmpData::position AS position, EmpData::fname AS fname, EmpData::salary AS salary;
    orderedData = ORDER filterData BY salary DESC;
    DUMP orderedData;

mzaanser #3

It is better to group by both fields (department and position), which makes the group key a tuple.
Like this:

    Emp_DataC = FOREACH Emp_Data GENERATE fname, lname, position, dept, (int)SUBSTRING(salary,1,10) AS sal;
    group_data = GROUP Emp_DataC BY (dept, position);
    -- Grouping by two fields makes "group" a tuple with the fields (dept, position).
    tuple_data = FOREACH group_data GENERATE group AS tuple_name, MAX(Emp_DataC.sal) AS highest_sal;
    data = FOREACH tuple_data GENERATE tuple_name.dept AS dept, tuple_name.position AS position, highest_sal;
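
The idea in this answer, grouping by both dept and position, can also be written with FLATTEN(group), which unpacks the group tuple into ordinary columns in one step. The sketch below additionally joins the maxima back to the employee rows so that the names appear in the output, which is what the question asks for. It assumes the input path and schema from the question; `dept_pos_max`, `with_names` and `result` are illustrative names, not part of the original answer.

```pig
-- Assumption: same file and column layout as in the question.
Emp_Data = LOAD '/data/EmpDet3.csv' USING PigStorage(',') AS
    (fname:chararray, lname:chararray, position:chararray, dept:chararray, salary:chararray);
Emp_DataC = FOREACH Emp_Data GENERATE fname, lname, position, dept,
    (int)SUBSTRING(salary, 1, 10) AS sal;

-- Group by both fields; FLATTEN turns the (dept, position) key tuple into two columns.
group_data = GROUP Emp_DataC BY (dept, position);
dept_pos_max = FOREACH group_data GENERATE
    FLATTEN(group) AS (dept, position),
    MAX(Emp_DataC.sal) AS highest_sal;

-- Join back to the employee rows to pick up the names of the top earners.
with_names = JOIN Emp_DataC BY (dept, position, sal),
                  dept_pos_max BY (dept, position, highest_sal);
result = FOREACH with_names GENERATE
    Emp_DataC::dept AS dept, Emp_DataC::position AS position, fname, lname, sal;
DUMP result;
```

As with the join in the first answer, employees who tie for the highest salary within a (dept, position) pair each get their own output row.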
