sql - Using BigQuery to find overall data for a specific date

bwleehnv  posted on 2021-07-26  in Java

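I am using BigQuery to build a query that counts the total data over the 5 days starting from a specific date, for each ID. So far I do get results, but they come back per day (day 1 through day 5), which means there are 5 dates in the result. What I actually want is a single date, the last one, so that each ID has one row showing the overall data for the whole 5-day window.
Sample data:

Sample code: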

SELECT
  ID,
  Date,
  SUM(CASE WHEN Language = 'EN' THEN 1 ELSE 0 END) AS lang_EN,
  SUM(CASE WHEN Language = 'SN' THEN 1 ELSE 0 END) AS lang_SN
FROM (
  SELECT
    DATE(Timestamp) AS Date,
    ID,
    CASE
      WHEN Language IN ('EN', 'English') THEN 'EN'
      WHEN Language IN ('MY', 'Malay') THEN 'MY'
      ELSE Language
    END AS Language
  FROM t
  WHERE Smooch_User_ID IS NOT NULL
    AND DATE(Timestamp) BETWEEN '2020-01-01' AND '2020-01-31'
  GROUP BY ID, Language, DATE(Timestamp)
)
GROUP BY ID, Date

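Sample output:
So far the query returns data for each date from day 1 to day 5. How can I make it return the overall data for days 1 to 5, with the Date column showing only the day-5 date, as shown below (assuming day 1 is 2020-01-01 and day 5 is 2020-01-05)?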


2lpgd968 #1

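I think one approach is to declare variables, provided you are using scripting. You can then extract and label the output based on the end date of the interval. For example: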

    declare end_dt date default "2020-07-05"; -- this can be used as placeholder and replaced at run-time based on execution
    -- BETWEEN is inclusive on both ends, so subtracting 4 days gives a 5-day window ending on end_dt
    declare start_dt date default DATE_SUB(end_dt, INTERVAL 4 DAY);

    select 
      end_dt as ts_date, 
      id, 
      sum(CASE WHEN lang = 'EN' THEN 1 ELSE 0 END) AS lang_EN, 
      sum(CASE WHEN lang = 'SN' THEN 1 ELSE 0 END) AS lang_SN 
    from(
      -- sample data
      select cast("2020-07-01 01:01:25.550 UTC" as timestamp) as ts, "A" as id, "EN" as lang
      union all
      select cast("2020-07-01 01:01:25.550 UTC" as timestamp) as ts, "A" as id, "EN" as lang
      union all
      select cast("2020-07-01 01:01:25.550 UTC" as timestamp) as ts, "A" as id, "SN" as lang
      union all
      select cast("2020-07-02 01:01:25.550 UTC" as timestamp) as ts, "B" as id, "EN" as lang
      union all
      select cast("2020-07-02 01:01:25.550 UTC" as timestamp) as ts, "B" as id, "EN" as lang
      union all
      select cast("2020-07-02 01:01:25.550 UTC" as timestamp) as ts, "B" as id, "SN" as lang
      union all
      select cast("2020-07-02 01:01:25.550 UTC" as timestamp) as ts, "B" as id, "SN" as lang
      union all
      select cast("2020-07-03 01:01:25.550 UTC" as timestamp) as ts, "C" as id, "SN" as lang
      union all
      select cast("2020-07-03 01:01:25.550 UTC" as timestamp) as ts, "C" as id, "EN" as lang
      union all
      select cast("2020-07-03 01:01:25.550 UTC" as timestamp) as ts, "D" as id, "SN" as lang
      union all
      select cast("2020-07-04 01:01:25.550 UTC" as timestamp) as ts, "D" as id, "EN" as lang
    ) where date(ts) between start_dt and end_dt
    group by 1,2

And the output:

Row     ts_date     id  lang_EN     lang_SN      
1       2020-07-05  A   2           1    
2       2020-07-05  B   2           2    
3       2020-07-05  C   1           1    
4       2020-07-05  D   1           1
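The same windowed aggregation can be checked locally without a BigQuery project. The following is only a minimal sketch, assuming Python's standard sqlite3 module as a stand-in: the table name t, the helper window_totals, and the ROWS list are hypothetical, and SQLite's date() modifiers replace BigQuery's DATE_SUB. It labels every row with the window's end date and sums per id over an inclusive window of `days` days.

```python
import sqlite3

# Hypothetical sample rows (timestamp, id, language), mirroring the inline data above.
ROWS = [
    ("2020-07-01 01:01:25", "A", "EN"),
    ("2020-07-01 01:01:25", "A", "EN"),
    ("2020-07-01 01:01:25", "A", "SN"),
    ("2020-07-02 01:01:25", "B", "EN"),
    ("2020-07-02 01:01:25", "B", "EN"),
    ("2020-07-02 01:01:25", "B", "SN"),
    ("2020-07-02 01:01:25", "B", "SN"),
    ("2020-07-03 01:01:25", "C", "SN"),
    ("2020-07-03 01:01:25", "C", "EN"),
    ("2020-07-03 01:01:25", "D", "SN"),
    ("2020-07-04 01:01:25", "D", "EN"),
]


def window_totals(rows, end_dt, days=5):
    """Return one row per id with totals over the `days`-day window ending on end_dt."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (ts TEXT, id TEXT, lang TEXT)")
    conn.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)
    query = """
        SELECT :end_dt AS ts_date,
               id,
               SUM(CASE WHEN lang = 'EN' THEN 1 ELSE 0 END) AS lang_EN,
               SUM(CASE WHEN lang = 'SN' THEN 1 ELSE 0 END) AS lang_SN
        FROM t
        -- BETWEEN is inclusive, so the start is (days - 1) days before the end
        WHERE date(ts) BETWEEN date(:end_dt, :offset) AND :end_dt
        GROUP BY id
        ORDER BY id
    """
    params = {"end_dt": end_dt, "offset": f"-{days - 1} days"}
    result = conn.execute(query, params).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in window_totals(ROWS, "2020-07-05"):
        print(row)
```

Because the date column is a constant taken from the parameter rather than from the data, grouping by id alone is enough to get a single row per id.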

yws3nbqq #2

Your query is more complicated than it needs to be. You could use COUNT(DISTINCT) instead:

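SELECT ID,
       -- count the distinct days on which each language appears
       COUNT(DISTINCT CASE WHEN Language IN ('EN', 'English') THEN DATE(Timestamp) END) AS lang_EN,
       -- alias kept from the question; note that the condition matches MY/Malay
       COUNT(DISTINCT CASE WHEN Language IN ('MY', 'Malay') THEN DATE(Timestamp) END) AS lang_SN
FROM t
WHERE Smooch_User_ID IS NOT NULL AND
      DATE(Timestamp) <= DATE('2020-01-31') AND
      -- inclusive lower bound: 4 days back gives a 5-day window ending on 2020-01-31
      DATE(Timestamp) >= DATE_SUB(DATE('2020-01-31'), INTERVAL 4 DAY)
GROUP BY ID;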

I think this also captures the date logic you are trying to implement.
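To make the behaviour of COUNT(DISTINCT ...) concrete, here is a small local sketch. It again assumes Python's sqlite3 module in place of BigQuery, and the function distinct_day_counts, the table t, and the SAMPLE rows are all hypothetical. The point it illustrates is that this form counts distinct days per language, not individual rows, so two messages from the same id on the same day count once.

```python
import sqlite3

# Hypothetical rows (timestamp, id, language); not taken from the original question.
SAMPLE = [
    ("2020-01-27 09:00:00", "A", "EN"),
    ("2020-01-27 18:30:00", "A", "English"),  # same day as above, counted once
    ("2020-01-29 10:00:00", "A", "Malay"),
    ("2020-01-31 08:00:00", "A", "MY"),
    ("2020-01-20 08:00:00", "A", "EN"),       # outside the 5-day window
    ("2020-01-30 12:00:00", "B", "EN"),
]


def distinct_day_counts(rows, end_dt, days=5):
    """Count, per id, the distinct days each language appears on within the window."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (ts TEXT, id TEXT, lang TEXT)")
    conn.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)
    query = """
        SELECT id,
               COUNT(DISTINCT CASE WHEN lang IN ('EN', 'English') THEN date(ts) END) AS lang_EN,
               COUNT(DISTINCT CASE WHEN lang IN ('MY', 'Malay') THEN date(ts) END) AS lang_MY
        FROM t
        -- inclusive window of `days` days ending on end_dt
        WHERE date(ts) BETWEEN date(:end_dt, :offset) AND :end_dt
        GROUP BY id
        ORDER BY id
    """
    params = {"end_dt": end_dt, "offset": f"-{days - 1} days"}
    result = conn.execute(query, params).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in distinct_day_counts(SAMPLE, "2020-01-31"):
        print(row)
```

The CASE expression yields NULL for non-matching rows, and COUNT ignores NULLs, which is why no ELSE branch is needed.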
