下面是HQL中的代码。它从一个超过25亿行、大约334列的表中提取数据。它需要一段时间来运行,所以我正在寻找任何可能的性能提升。这是查询的主要块,还有后续的CTE执行一些进一步的处理,但下面是最昂贵的查询。
我知道窗口函数在这种情况下可能很重,但是它们是必需的,因为需要高度特定的时间戳计算。
任何帮助都非常感谢!
WITH t1 AS (
SELECT
*,
CASE
WHEN (LEAD(hour) OVER (PARTITION BY id ORDER BY `time`) - hour) > 1
THEN NULL
WHEN (LEAD(hour) OVER (PARTITION BY id ORDER BY `time`) - hour) = 1
THEN UNIX_TIMESTAMP(CONCAT(`date`, " ", LPAD((hour + 1), 2, 0), ":00:00"), 'dd/MM/yyyy HH:mm:ss') - UNIX_TIMESTAMP(CONCAT(`date`, " ", REGEXP_REPLACE(`time`, '\\.\\d+', '')), 'dd/MM/yyyy HH:mm:ss')
WHEN (LAG(s_id) OVER (PARTITION BY id ORDER BY `time`) = s_id) = TRUE
AND (hour - LAG(hour) OVER (PARTITION BY id ORDER BY `time`)) = 1
AND (LEAD(hour) OVER (PARTITION BY id ORDER BY `time`) - hour) = 0
THEN (LEAD(UNIX_TIMESTAMP(CONCAT(`date`, " ", REGEXP_REPLACE(`time`, '\\.\\d+', '')), 'dd/MM/yyyy HH:mm:ss'), 1) OVER (PARTITION BY id ORDER BY `time`) -
UNIX_TIMESTAMP(CONCAT(`date`, " ", REGEXP_REPLACE(`time`, '\\.\\d+', '')), 'dd/MM/yyyy HH:mm:ss') + LAG(hour_overlap_add) OVER (PARTITION BY id ORDER BY `time`))
ELSE LEAD(unix_timestamp(CONCAT(`date`, " ", regexp_replace(`TIME`, '\\.\\d+', '')), 'dd/MM/yyyy HH:mm:ss'), 1) OVER (PARTITION BY id, hour ORDER BY `time`)
- unix_timestamp(CONCAT(`date`, " ", regexp_replace(`TIME`, '\\.\\d+', '')), 'dd/MM/yyyy HH:mm:ss')
END AS time_to_next_trans
FROM(
SELECT
*,
(overlap_time_to_next_trans - sec_between_current_trans_and_next_hr) as hour_overlap_add
FROM(
SELECT
c_id,
s_id,
id,
rat,
dt,
`date`,
`time`,
hour,
(LEAD(s_id) OVER (PARTITION BY id ORDER BY `time`) = s_id) AS s_id_change,
LEAD(hour) OVER (PARTITION BY id ORDER BY `time`) - hour AS difference_hour,
UNIX_TIMESTAMP(CONCAT(`date`, " ", LPAD((hour + 1), 2, 0), ":00:00"), 'dd/MM/yyyy HH:mm:ss')
- UNIX_TIMESTAMP(CONCAT(`date`, " ", REGEXP_REPLACE(`time`, '\\.\\d+', '')), 'dd/MM/yyyy HH:mm:ss') AS sec_between_current_trans_and_next_hr,
LEAD(UNIX_TIMESTAMP(CONCAT(`DATE`, " ", REGEXP_REPLACE(`TIME`, '\\.\\d+', '')), 'dd/MM/yyyy HH:mm:ss'), 1) OVER (PARTITION BY id ORDER BY `time`) -
UNIX_TIMESTAMP(CONCAT(`DATE`, " ", REGEXP_REPLACE(`TIME`, '\\.\\d+', '')), 'dd/MM/yyyy HH:mm:ss') AS overlap_time_to_next_trans
FROM database.tablename
WHERE dt = "${rundate}” AND ID <> 0
)j
)o
)
1条答案
按热度按时间7dl7o3gd1#
在步骤1中完成大部分推导。然后重用它们而不再次派生。
希望这对你有帮助。