Pig: storing a relation with a complex schema in a Hive table

tpxzln5u  posted on 2021-05-29  in Hadoop

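Here is my problem for today. I load a relation from Hive and apply a couple of transformations to it. After some analysis I want to store the final relation back into Hive, but I can't. The code should make this clearer.
First, this is where I load from Hive and transform the result: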

july = LOAD 'POC.july' USING org.apache.hive.hcatalog.pig.HCatLoader();
july_cl = FOREACH july GENERATE GetDay(ToDate(start_date)) AS day:int, start_station, duration;
jul_cl_fl = FILTER july_cl BY day == 31;
july_gr = GROUP jul_cl_fl BY (day, start_station);
july_result = FOREACH july_gr { 
           total_dura = SUM(jul_cl_fl.duration); 
           avg_dura = AVG(jul_cl_fl.duration); 
           qty_trips = COUNT(jul_cl_fl); 
           GENERATE FLATTEN(group),total_dura,avg_dura,qty_trips;
 };

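Now, when I try to store the resulting relation, it fails, because the schema has changed and I think it is no longer compatible with Hive:
STORE july_result INTO 'poc.july_analysis' USING org.apache.hive.hcatalog.pig.HCatStorer();
Even when I tried to define an explicit schema for the final relation, I could not get it to work: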

july_result = FOREACH july_gr {
              total_dura = SUM(jul_cl_fl.duration);
              avg_dura = AVG(jul_cl_fl.duration);
              qty_trips = COUNT(jul_cl_fl);
              GENERATE FLATTEN(group) as (day:int),total_dura as (total_dura:int),avg_dura as (avg_dura:int),qty_trips as (qty_trips:int);
              };
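
A quick way to see what the store is being compared against is to print the schema Pig has inferred for the aggregated relation. The sketch below assumes the relation names used above. In Pig, COUNT returns a long and AVG normally returns a double, while SUM returns a long or a double depending on the input type, so the types may well differ from the int or float columns of the target Hive table. An `AS (name:type)` clause only declares a type; it does not replace an explicit cast.

```pig
-- Print the field names and types Pig inferred for the aggregated relation.
-- Compare this output with the column types of the target Hive table.
DESCRIBE july_result;
```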
of1yzvn4 #1

After some research in the Hortonworks community, I found out how to define the output format for a grouped relation in Pig. My new code looks like this:

july_result = FOREACH july_gr {
              total_dura = SUM(jul_cl_fl.duration);
              avg_dura = AVG(jul_cl_fl.duration);
              qty_trips = COUNT(jul_cl_fl);
              GENERATE FLATTEN( group) AS (day, code_station),(int)total_dura as (total_dura:int),(float)avg_dura as (avg_dura:float),(int)qty_trips as (qty_trips:int);
              };
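
With the explicit casts in place, the store statement from the question should line up with the table. The lines below are only a sketch under stated assumptions: that `poc.july_analysis` already exists in Hive, and that its columns are named `day`, `code_station`, `total_dura`, `avg_dura` and `qty_trips` with types matching the casts (int, float, int). The question does not show the table definition, so those names and types are assumptions. The script is also assumed to be started with `pig -useHCatalog` so that the HCatalog classes are on the classpath.

```pig
-- Assumption: poc.july_analysis already exists in Hive, with column names
-- and types that match the aliases and casts in july_result.
STORE july_result INTO 'poc.july_analysis'
    USING org.apache.hive.hcatalog.pig.HCatStorer();
```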

Thank you all.
