配置单元sql:获取特定范围内每年对应的最大值列表

am46iovg  于 2021-06-24  发布在  Hive
关注(0)|答案(2)|浏览(447)

我有这样一个数据集:

ticker | value | year
  A       1      2008
  A       2      2008
  A       5      2010
  A       6      2006
  B       7      2010

我正在尝试编写一个hivesql,它在ticker上返回一个groupby,其中的值是在一定范围内对应于每年的最大值列表。在本例中,如果范围是2008-2010,它将返回类似的值(不确定结果的结构实际上是什么样子,我希望您能够理解我的意思):

A [(year: 2008, value: 2), (year: 2009, value: 0 (found no value for this year)), (year: 2010, value: 5)]
B [(year: 2008, value: 0), (year: 2009, value: 0), (year: 2010, value: 7)]

特别是,我不明白如何要求hive返回与数据集中某个值的范围相对应的值列表。如果你能帮忙,我会很高兴的。

kse8i1jr

kse8i1jr1#

如果只需要有效值,可以使用where-between

select  ticker, max(value), year
    from my_table 
    where year between 2008 and 2010 
    group by ticker, year

    or if need  all the year you could build the set using union  

   select  m.ticker, max(coalesce(m.value,0)), m.year
    from  (
        select 2008 year
        union
        select 2009 year
        union 
        select 2010 year
        ) t
    left join my_table m on t.year = m.year
    group by ticker, year
ckocjqey

ckocjqey2#

如果你想为每一个股票代码和年份排一行,那么使用 cross join 生成行和 left join 引入价值观。多年来,使用子查询生成值非常容易:

select t.ticker, y.year, coalesce(max(d.value, 0)) as max_value
from (select distinct ticker from dataset) t cross join
     (select 2008 as year union all select 2009 union all select 2020) y left join
     dataset d
     on d.ticker = t.ticker and d.year = y.year
group by t.ticker, y.year
order by t.ticker, y.year;

对于较大的范围,可以使用如下子查询:

(select row_number() over () + 2008 - 1 as year
 from dataset
 limit 3
) y

相关问题