Hive: copy a value from the previous row

Posted by f0brbegy on 2021-06-26 in Hive

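I am trying to write a Hive query that, when a field in the current row is null, copies the value from the previous row in the same column. If the current value is not null, it should be kept. For example, given the following input: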

company    empId   first_name   last_name   job_code   department   start_date
    110        500400   ABC          XYZ         300        101         01/20/2015
    110        500400   Null         Null        305        105         04/02/2015
    110        500400   ABC1         Null        Null       Null        15/02/1015
    110        500400   Null         XYZ1        307        Null        01/03/2015

The output should look like this:

company    empId   first_name   last_name   job_code   department   start_date
    110        500400   ABC          XYZ         300        101         01/20/2015
    110        500400   ABC          XYZ         305        105         04/02/2015
    110        500400   ABC1         XYZ         305        105         15/02/1015
    110        500400   ABC1         XYZ1        307        105         01/03/2015

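I tried queries using the last_value and lag functions, but neither seems to work. The last_value query only works when the number of rows is small. When I run it on a large dataset, it fails (the MapReduce job never finishes). This is the query I am trying: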

select
    company, empId, start_date,
    last_value(last_name, true) over (partition by company, empId order by start_date) as last_name,
    last_value(first_name, true) over (partition by company, empId order by start_date) as first_name,
    last_value(department, true) over (partition by company, empId order by start_date) as department,
    last_value(job_code, true) over (partition by company, empId order by start_date) as job_code
from samples.z_sample_test
order by start_date;
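
One possible reading of the failure on large data: the trailing `order by start_date` is a global sort, which Hive runs through a single reducer, and the window has no explicit frame. The sketch below is a variant of the query above, not a confirmed fix. It assumes the same table and columns, spells out the frame, and drops the global sort. It also assumes `start_date` sorts in the intended order within each employee; a string column in a day/month format would sort lexicographically rather than chronologically.

```sql
-- Sketch only: forward-fill nulls per (company, empId), oldest row first.
-- Assumes start_date sorts chronologically within each partition.
select
    company,
    empId,
    start_date,
    -- the second argument `true` tells last_value to skip nulls
    last_value(first_name, true) over (
        partition by company, empId order by start_date
        rows between unbounded preceding and current row) as first_name,
    last_value(last_name, true) over (
        partition by company, empId order by start_date
        rows between unbounded preceding and current row) as last_name,
    last_value(job_code, true) over (
        partition by company, empId order by start_date
        rows between unbounded preceding and current row) as job_code,
    last_value(department, true) over (
        partition by company, empId order by start_date
        rows between unbounded preceding and current row) as department
from samples.z_sample_test;
```

If sorted output is still needed, `sort by` orders rows within each reducer instead of globally, which may be enough when only the per-employee order matters.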

With lag, only one record gets updated; the subsequent records are not. This is the query I am using:

select
    c.company,
    c.empId,
    c.start_date,
    if(c.first_name is null, lag(c.first_name, 1) over (order by c.start_date), c.first_name) as first_name,
    if(c.last_name is null, lag(c.last_name, 1) over (order by c.start_date), c.last_name) as last_name,
    if(c.job_code is null, lag(c.job_code, 1) over (order by c.start_date), c.job_code) as job_code,
    if(c.department is null, lag(c.department, 1) over (order by c.start_date), c.department) as department
from samples.z_sample_test c
left join samples.z_sample_test p
    on (c.company = p.company and c.empId = p.empId)
group by c.company, c.empId, c.start_date, c.last_name, c.first_name, c.job_code, c.department
order by c.start_date;
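
A likely explanation for the lag behaviour: `lag(col, 1)` reads the original value of the immediately preceding row, not the value already filled in for that row, so a run of two or more consecutive nulls is only patched at its first row. A minimal sketch of that effect on one column, assuming the same table and leaving out the self-join and `group by` (neither is needed for the window function); `department_filled` is a name made up for illustration:

```sql
-- Sketch only: shows why a single-step lag cannot fill longer gaps.
select
    company,
    empId,
    start_date,
    department,
    coalesce(
        department,
        lag(department, 1) over (partition by company, empId order by start_date)
    ) as department_filled  -- hypothetical output column
from samples.z_sample_test;
-- If the rows sort in the order shown in the sample input,
-- department is 101, 105, null, null:
--   row 3: lag returns 105, so the gap is filled;
--   row 4: lag returns row 3's original null, so the result stays null.
```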

I would appreciate any help with this.
