apachepig:联合的问题

wwtsj6pe  于 2021-05-29  发布在  Hadoop
关注(0)|答案(0)|浏览(280)

apachepig:我有两个具有相同模式的数据集x,y。x有222条记录,y有70条记录。我需要合并两个,意思是垂直添加两个数据集。如果我使用union,问题是输出的记录数超过了预期的数量。即z=并集x,y;有888张唱片。谁能给点建议吗。
示例代码:

  1. UI_INFO = filter LOGS BY hotel_book.step == 'UI_INFO';
  2. LOGS_ITINERARY = filter LOGS BY hotel_book.step == 'CREATE_ITINERARY';
  3. LOGS_JOINED = join UI_INFO by header.itinerary_id LEFT OUTER , LOGS_ITINERARY by header.itinerary_id;
  4. LOGS_BOOK_COL1 = FOREACH LOGS_JOINED {
  5. CURRENCY = LOGS_ITINERARY::hotel_book.itinerary.hotel.rooms.currency;
  6. GENERATE UI_INFO::header.date_time AS date_time,
  7. LOGS_ITINERARY::hotel_book.pay_at_hotel AS pay_at_hotel,
  8. UI_INFO::header.referrer AS referrer,
  9. UI_INFO::hotel_book.step AS stage,
  10. FLATTEN( (IsEmpty(CURRENCY) ? TOBAG('unknown') : CURRENCY) ) AS currency;
  11. };
  12. REMAINING_LOGS = FILTER LOGS BY (hotel_book.step == 'CREATE_ITINERARY' OR hotel_book.step == 'PROVISIONAL_BOOK')
  13. LOGS_BOOK_COL2 = FOREACH REMAINING_LOGS {
  14. CURRENCY = hotel_book.itinerary.hotel.rooms.currency;
  15. GENERATE header.date_time AS date_time,
  16. hotel_book.pay_at_hotel AS pay_at_hotel,
  17. header.referrer AS referrer,
  18. hotel_book.step AS stage,
  19. FLATTEN( (IsEmpty(CURRENCY) ? TOBAG('unknown') : CURRENCY) ) AS currency;
  20. };
  21. LOGS_BOOK_COL = UNION LOGS_BOOK_COL1,LOGS_BOOK_COL2;

暂无答案!

目前还没有任何答案,快来回答吧!

相关问题