Collapsing rows in Hive while keeping non-null values

hi3rlvi2 posted on 2021-06-25 in Hive

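I have a table in Hive in which the athr_name and post_date fields are 90% null (shown as "?" in Hive). I want to query the table and group by author name, post date, page name and visit date to get the number of visits and visitors. However, I also want to merge the null rows into the rows where athr_name and post_date are not null, replacing the nulls with those values (each page_nm belongs to a single author, so a row can only have the correct athr_name or a null).
In other words, I have: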

   athr_name post_date         page_nm visit_date visit visitors
1      Steve  9/1/2019 /page1/content/   20191014    45       11
2      Steve  9/1/2019 /page1/content/   20191015    62       38
3      Steve  9/1/2019 /page1/content/   20191016    28       49
4      Steve  9/1/2019 /page1/content/   20191207    54       70
5      Steve  9/1/2019 /page1/content/   20191208    39       26
6          ?         ? /page1/content/   20191014    28       24
7          ?         ? /page1/content/   20191015    17       63
8          ?         ? /page1/content/   20191016    48       40
9          ?         ? /page1/content/   20191017    47       14
10         ?         ? /page1/content/   20191018    33        1

I want to collapse the data into this:

  athr_name post_date         page_nm visit_date visit visitors
1     Steve  9/1/2019 /page1/content/   20191014    73       35
2     Steve  9/1/2019 /page1/content/   20191015    79      101
3     Steve  9/1/2019 /page1/content/   20191016    76       89
4     Steve  9/1/2019 /page1/content/   20191017    47       14
5     Steve  9/1/2019 /page1/content/   20191018    33        1
6     Steve  9/1/2019 /page1/content/   20191207    54       70
7     Steve  9/1/2019 /page1/content/   20191208    39       26

If these were columns instead of rows, I could handle this with the coalesce function. Any help is much appreciated!


drkbr07n #1

Is this what you want?

select max(athr_name), max(post_date), page_nm, 
       visit_date, sum(visit), sum(visitors)
from t
group by page_nm, visit_date;
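A minimal sketch for checking this answer locally, assuming Python's built-in sqlite3 module as a stand-in for Hive (SQLite 3.25+ is needed for the window function) and using the question's sample rows with "?" stored as NULL; the table name t follows the answer. It shows that max() skips NULLs within each (page_nm, visit_date) group, but on visit dates that only have NULL rows (20191017 and 20191018) the author and post date stay NULL. The PER_PAGE query is a hypothetical variant that takes max() over the whole page first, so those dates are filled too.

```python
import sqlite3

# Sample rows from the question; None stands in for the "?" (NULL) values.
ROWS = [
    ("Steve", "9/1/2019", "/page1/content/", 20191014, 45, 11),
    ("Steve", "9/1/2019", "/page1/content/", 20191015, 62, 38),
    ("Steve", "9/1/2019", "/page1/content/", 20191016, 28, 49),
    ("Steve", "9/1/2019", "/page1/content/", 20191207, 54, 70),
    ("Steve", "9/1/2019", "/page1/content/", 20191208, 39, 26),
    (None, None, "/page1/content/", 20191014, 28, 24),
    (None, None, "/page1/content/", 20191015, 17, 63),
    (None, None, "/page1/content/", 20191016, 48, 40),
    (None, None, "/page1/content/", 20191017, 47, 14),
    (None, None, "/page1/content/", 20191018, 33, 1),
]

# The answer's query: max() ignores NULLs within each (page_nm, visit_date) group.
PER_DATE = """
select max(athr_name), max(post_date), page_nm,
       visit_date, sum(visit), sum(visitors)
from t
group by page_nm, visit_date
order by visit_date
"""

# Hypothetical variant: take max() over the whole page first, so dates that
# only have NULL rows still pick up the page's author and post date.
PER_PAGE = """
select athr_name, post_date, page_nm, visit_date, sum(visit), sum(visitors)
from (select max(athr_name) over (partition by page_nm) as athr_name,
             max(post_date) over (partition by page_nm) as post_date,
             page_nm, visit_date, visit, visitors
      from t) x
group by athr_name, post_date, page_nm, visit_date
order by visit_date
"""


def run(sql):
    """Load the sample rows into an in-memory table t and run `sql` against it."""
    con = sqlite3.connect(":memory:")
    con.execute("create table t (athr_name text, post_date text, page_nm text, "
                "visit_date integer, visit integer, visitors integer)")
    con.executemany("insert into t values (?, ?, ?, ?, ?, ?)", ROWS)
    result = con.execute(sql).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    for row in run(PER_DATE):
        print(row)
    print()
    for row in run(PER_PAGE):
        print(row)
```

With the per-date query the 20191017 and 20191018 rows come back as (None, None, ...); the per-page variant labels all seven rows with Steve and 9/1/2019, which matches the desired output in the question.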

u3r8eeie #2

First, you need to fill the null values with the last non-null value. Your query could look like this:

SELECT athr_name,
       post_date,
       page_nm,
       visit_date,
       sum(visit),
       sum(visitors)
from (
    -- Fill each NULL with the last non-null value seen so far. Rows are ordered
    -- by page with NULLs last, so the known value comes first within a page;
    -- LAST_VALUE(col, TRUE) skips NULLs.
    select nvl(athr_name, LAST_VALUE(athr_name, TRUE)
                   OVER (ORDER BY page_nm, athr_name NULLS LAST
                         ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)) as athr_name,
           nvl(post_date, LAST_VALUE(post_date, TRUE)
                   OVER (ORDER BY page_nm, post_date NULLS LAST
                         ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)) as post_date,
           page_nm,
           visit_date,
           visit,
           visitors
    from your_table) as tmp_view
GROUP BY athr_name, post_date, page_nm, visit_date;
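To make the fill-down logic concrete, here is a plain-Python emulation of the subquery above (a sketch, not Hive itself): rows are sorted by page_nm with NULLs last, each NULL takes the last non-null value seen so far, and the filled rows are then grouped and summed. The helper names fill_last_value and collapse and the extra /page2/content/ row are hypothetical. Because the window has no PARTITION BY, a value can carry over into the next page, which is the situation the update below addresses.

```python
from collections import defaultdict

# Question's sample rows: (athr_name, post_date, page_nm, visit_date, visit, visitors).
# None stands in for the "?" (NULL) values.
ROWS = [
    ("Steve", "9/1/2019", "/page1/content/", 20191014, 45, 11),
    ("Steve", "9/1/2019", "/page1/content/", 20191015, 62, 38),
    ("Steve", "9/1/2019", "/page1/content/", 20191016, 28, 49),
    ("Steve", "9/1/2019", "/page1/content/", 20191207, 54, 70),
    ("Steve", "9/1/2019", "/page1/content/", 20191208, 39, 26),
    (None, None, "/page1/content/", 20191014, 28, 24),
    (None, None, "/page1/content/", 20191015, 17, 63),
    (None, None, "/page1/content/", 20191016, 48, 40),
    (None, None, "/page1/content/", 20191017, 47, 14),
    (None, None, "/page1/content/", 20191018, 33, 1),
]


def fill_last_value(rows, col):
    """Emulate nvl(col, LAST_VALUE(col, TRUE) OVER (ORDER BY page_nm, col NULLS LAST
    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)) for column index `col`."""
    # Sort by page, then non-null values before NULLs (NULLS LAST).
    order = sorted(range(len(rows)),
                   key=lambda i: (rows[i][2], rows[i][col] is None, rows[i][col] or ""))
    filled = [None] * len(rows)
    last = None  # last non-null value seen; no PARTITION BY, so it spans pages
    for i in order:
        if rows[i][col] is not None:
            last = rows[i][col]
        filled[i] = last
    return filled


def collapse(rows):
    """Fill author and post date, then sum visit/visitors per group key."""
    names = fill_last_value(rows, 0)
    dates = fill_last_value(rows, 1)
    totals = defaultdict(lambda: [0, 0])
    for row, name, date in zip(rows, names, dates):
        key = (name, date, row[2], row[3])
        totals[key][0] += row[4]
        totals[key][1] += row[5]
    return {key: tuple(sums) for key, sums in totals.items()}


if __name__ == "__main__":
    for key, (visit, visitors) in sorted(collapse(ROWS).items(), key=lambda kv: kv[0][3]):
        print(key, visit, visitors)
    # Hypothetical page with no known author: it inherits "Steve" from /page1/.
    extra = ROWS + [(None, None, "/page2/content/", 20191014, 5, 2)]
    print(collapse(extra)[("Steve", "9/1/2019", "/page2/content/", 20191014)])
```

On the question's data this yields the seven expected rows; the last line shows the carry-over effect when a page has no non-null author at all.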

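Update:
If some page may have no corresponding author name or post date at all, it is better to use this query, which preserves that information (such pages keep NULL instead of borrowing values from another page):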

SELECT athr_name, post_date, page_nm, visit_date, sum(visit), sum(visitors)
from (
         select name_view.athr_name as athr_name,
                date_view.post_date as post_date,
                main.page_nm,
                main.visit_date,
                main.visit,
                main.visitors
         from your_table main
                  -- One non-null author per page (rn = 1); pages without one get NULL.
                  LEFT JOIN (select athr_name, page_nm, row_number() over (PARTITION BY page_nm) as rn
                             from your_table
                             where athr_name is not null) name_view
                            ON main.page_nm = name_view.page_nm AND name_view.rn = 1
                  -- Same lookup for the post date.
                  LEFT JOIN (select post_date, page_nm, row_number() over (PARTITION BY page_nm) as rn
                             from your_table
                             where post_date is not null) date_view
                            ON main.page_nm = date_view.page_nm AND date_view.rn = 1) as tmp_view
GROUP BY athr_name, post_date, page_nm, visit_date;
