许多关系数据库(如Oracle、Snowflake)支持PIVOT函数,您可以使用该函数将行转换为列。但是,Apache Hive还不支持Pivot函数。作为一种替代方法,您可以使用CASE和DECODE语句,根据您的需求将表行转换为列,或将列转换为行。在本文中,我们将用透视函数替代法来检验Hive表的转置方法的不同,并举例说明。
在数据仓库中,创建数据透视表是一项非常常见的任务。在加载到目标表之前,可能必须将行转换为列,或将列转换为行。Microsoft excel是一个流行的工具,它允许您根据自己的需求来透视列。Apache Hive还不支持Pivot或unpivot功能。在Hive中要实现同样的事情并不容易。你必须使用一种变通方法将行转换成列,反之亦然。
但是,您可以使用CASE或DECODE语句在Hive中转换行到列。
如前所述,将行转置到列是加载到目标表之前的常见预处理需求。
在本例中,我们将使用下面的student表。
> select class,section,pass_fail,count from students;
+---------+----------+------------+--------+--+
| class | section | pass_fail | count |
+---------+----------+------------+--------+--+
| CLASS2 | B | FAIL | 333 |
| CLASS2 | B | PASS | 543 |
| CLASS1 | A | FAIL | 123 |
| CLASS1 | A | PASS | 321 |
+---------+----------+------------+--------+--+
考虑以下示例,以将PASS和FAIL行转换为列。
SELECT class,
section,
Sum(CASE
WHEN pass_fail = 'PASS' THEN count
ELSE 0
END) AS PASS,
Sum(CASE
WHEN pass_fail = 'FAIL' THEN count
ELSE 0
END) AS FAIL
FROM students
GROUP BY class,
section;
+---------+----------+-------+-------+--+
| class | section | pass | fail |
+---------+----------+-------+-------+--+
| CLASS1 | A | 321 | 123 |
| CLASS2 | B | 543 | 333 |
+---------+----------+-------+-------+--+
就像前面的示例一样,将列转置或转置到行也是数据仓库环境中非常常见的需求。
例如,考虑下面一个示例销售表。该表包含每月销售数量。
> select month_id,sale_qty from sales_by_month;
+-----------+-----------+--+
| month_id | sale_qty |
+-----------+-----------+--+
| 201601 | 897456 |
| 201603 | 267156 |
| 201602 | 232253 |
| 201604 | 265646 |
| 201606 | 265654 |
| 201606 | 123457 |
+-----------+-----------+--+
现在,让我们将月份值转置为列,将数量转置为行。
SELECT Sum(CASE
WHEN month_id = 201601 THEN sale_qty
ELSE NULL
END) AS m_201601,
Sum(CASE
WHEN month_id = 201602 THEN sale_qty
ELSE NULL
END) AS m_201602,
Sum(CASE
WHEN month_id = 201603 THEN sale_qty
ELSE NULL
END) AS m_201603,
Sum(CASE
WHEN month_id = 201604 THEN sale_qty
ELSE NULL
END) AS m_201604,
Sum(CASE
WHEN month_id = 201605 THEN sale_qty
ELSE NULL
END) AS m_201605,
Sum(CASE
WHEN month_id = 201606 THEN sale_qty
ELSE NULL
END) AS m_201606
FROM sales_by_month;
+---------+---------+---------+---------+-------+---------+--+
| m_01 | m_02 | m_03 | m_04 | m_05 | m_06 |
+---------+---------+---------+---------+-------+---------+--+
| 897456 | 232253 | 267156 | 265646 | NULL | 389111 |
+---------+---------+---------+---------+-------+---------+--+
Apache Hive中的替代Oracle PIVOT 像Oracle和Snowflake这样的关系数据库提供了FOR子句PIVOT函数。您可以使用CASE语句作为替代方法。
下面是一个带有FOR子句的Oracle PIVOT函数示例。
SELECT * FROM order_stats
PIVOT(
COUNT(order_id)
FOR category_name
IN (
'CPU',
'VC',
'MB',
'Storage'
)
)
ORDER BY status;
+------------+------+-----+-----+----------+--+
| status | cpu | vc | mb | storage |
+------------+------+-----+-----+----------+--+
| cancelled | 2 | 0 | 0 | 0 |
| pending | 0 | 1 | 0 | 0 |
| shipped | 1 | 1 | 1 | 1 |
+------------+------+-----+-----+----------+--+
以下是Apache Hive的等效示例。
select status,
count( case when category_name = 'CPU' then order_id else null end ) as CPU,
count( case when category_name = 'VC' then order_id else null end ) as VC,
count( case when category_name = 'MB' then order_id else null end ) as MB,
count( case when category_name = 'Storage' then order_id else null end ) as Storage
from order_stats
group by status
order by status
;
+------------+------+-----+-----+----------+--+
| status | cpu | vc | mb | storage |
+------------+------+-----+-----+----------+--+
| cancelled | 2 | 0 | 0 | 0 |
| pending | 0 | 1 | 0 | 0 |
| shipped | 1 | 1 | 1 | 1 |
+------------+------+-----+-----+----------+--+
内容来源于网络,如有侵权,请联系作者删除!