impala中聚合函数与重采样的sql结合

2w2cym1i  于 2021-07-13  发布在  Hadoop
关注(0)|答案(1)|浏览(516)

我在hadoop中有一个表,其中有不同传感器单元的采样时间数据 ts 1毫秒。我可以使用impala中的以下查询,使用不同聚合函数的组合对单个单元的数据进行重采样(假设我想使用 LAST_VALUE() 作为聚合函数):

  1. SELECT DISTINCT *
  2. from ( select ts_resample, unit,
  3. last_value(Val1) over (partition by ts_resample order by ts rows between unbounded preceding and unbounded following) as Val1,
  4. last_value(Val2) over (partition by ts_resample order by ts rows between unbounded preceding and unbounded following) as Val2
  5. from (
  6. SELECT cast(cast(unix_timestamp(cast(ts/1000 as TIMESTAMP))/300 as bigint)*300 as TIMESTAMP) as ts_resample,
  7. ts as ts, unit as unit, Val1 as Val1, Val2 as Val2
  8. FROM Sensor_Data.Table1 WHERE unit='Unit1') as t) as tt

如果我为一个单元运行这个查询,那么我得到的是正确的asnwer,没有问题。
但是如果我想根据一些聚合函数对每个单元的数据进行重采样,例如。 LAST_VALUE() 然后我得到了错误的答案和重采样结果为每个单位是相同的,虽然我有不同的数据为每个单位。下面给出了我正在运行的查询,我没有在其中指定任何单位 WHERE 条款:

  1. SELECT DISTINCT *
  2. from(
  3. select ts_resample, unit,
  4. last_value(Val1) over (partition by ts_resample order by ts rows between unbounded preceding and unbounded following) as Val1,
  5. last_value(Val2) over (partition by ts_resample order by ts rows between unbounded preceding and unbounded following) as Val2
  6. from (
  7. SELECT cast(cast(unix_timestamp(cast(ts/1000 as TIMESTAMP))/300 as bigint)*300 as TIMESTAMP) as ts_resample,
  8. ts as ts, unit as unit, Val1 as Val1, Val2 as Val2
  9. FROM Sensor_Data.Table1) as t) as tt

使用上述查询数据中现有三个单位得到的结果如下:

  1. ts_resample unit Val1 Val2
  2. 2020-12-01 00:00:00 unit1 0.8974 10.485
  3. 2020-12-01 00:00:00 unit2 0.8974 10.485
  4. 2020-12-01 00:00:00 unit3 0.8974 10.485
  5. 2020-12-01 00:05:00 unit1 0.9041 11.854
  6. 2020-12-01 00:05:00 unit2 0.9041 11.854
  7. 2020-12-01 00:05:00 unit3 0.9041 11.854

实际上我想得到每个单位的最后一个值,每个单位的值是不同的。如下所示:

  1. ts_resample unit Val1 Val2
  2. 2020-12-01 00:00:00 unit1 0.8974 10.485
  3. 2020-12-01 00:00:00 unit2 0.9014 11.954
  4. 2020-12-01 00:00:00 unit3 0.7854 10.821
  5. 2020-12-01 00:05:00 unit1 0.9841 11.125
  6. 2020-12-01 00:05:00 unit2 0.8742 10.963
  7. 2020-12-01 00:05:00 unit3 0.9632 11.784

有人能告诉我我的问题是什么吗?
谢谢

2ekbmq32

2ekbmq321#

我通过使用ts\u重采样在分区中提供单元信息来解决这个问题。最终解决方案如下:

  1. SELECT DISTINCT *
  2. from(
  3. select ts_resample, unit,
  4. last_value(Val1) over (partition by ts_resample, unit order by ts rows between unbounded preceding and unbounded following) as Val1,
  5. last_value(Val2) over (partition by ts_resample, unit order by ts rows between unbounded preceding and unbounded following) as Val2
  6. from (
  7. SELECT cast(cast(unix_timestamp(cast(ts/1000 as TIMESTAMP))/300 as bigint)*300 as TIMESTAMP) as ts_resample,
  8. ts as ts, unit as unit, Val1 as Val1, Val2 as Val2
  9. FROM Sensor_Data.Table1) as t) as tt

在这之后,我得到了我想要的结果,并在我的问题中显示出来。

相关问题