在时间序列数据表中添加缺少的日期记录

nhaq1z21  于 2021-06-18  发布在  Mysql
关注(0)|答案(1)|浏览(274)

我正在使用一个表,其中包含工作日的数据。这些数据基本上都是关于日终余额的。数据如下所示:

ID  Name        Some Val    Other Val   Date

10  Somebody    33001.93    33001.93    2018-10-01
10  Somebody    33481.93    33481.93    2018-10-02
10  Somebody    33001.93    33001.93    2018-10-03
10  Somebody    33582.76    33582.76    2018-10-04
10  Somebody    33582.73    33582.79    2018-10-05
------- Missing row for 2018-10-06 ---------------
------- Missing row for 2018-10-07 ---------------
10  Somebody    33582.76    33582.76    2018-10-08
------- Missing row for 2018-10-09 ---------------
10  Somebody    33462.76    33462.76    2018-10-10

我的任务是计算平均每日余额(每天结束时的总余额/总天数)。为了进行计算,我需要确保我有所有天的数据。为此,最后一行需要替换丢失的数据。
我需要的是:

ID  Name        Some Val    Other Val   Date

10  Somebody    33001.93    33001.93    2018-10-01
10  Somebody    33481.93    33481.93    2018-10-02
10  Somebody    33001.93    33001.93    2018-10-03
10  Somebody    33582.76    33582.76    2018-10-04
10  Somebody    33582.73    33582.79    2018-10-05    
10  Somebody    33582.73    33582.79    2018-10-06
10  Somebody    33582.73    33582.79    2018-10-07    
10  Somebody    33582.76    33582.76    2018-10-08
10  Somebody    33382.76    33582.76    2018-10-09
10  Somebody    33462.76    33462.76    2018-10-10

基本上,行5被写入丢失的行6和7,行8被写入行9。
我得到了部分解决方案,如果只是缺少周末记录的话。

select ID, Name, val1, val2, date from t
union all
select id, name, val1, val2, date + interval 1 day from t where dayofweek(date) = 6
union all
select id, name, val1, val2, date + interval 2 day from t where dayofweek(date) = 6
;

这个部分解决方案是在假设只有周末记录丢失的情况下工作的。通过将数据从星期五复制到星期六和星期日,创建了两个新表。最后将这三个数据集连接在一起。
如果数据在工作周内丢失(例如公共假日),则解决方案将失败,因此只有第6行和第7行被填充。第9行仍然是空的。
如何找到丢失的记录,用最后的记录信息填充它们,从而完成时间序列?我不熟悉sql,但不熟悉编程。有了正确的指针,我就能想出一个解决办法。有人建议我如何解决这个问题。
我使用的mysql版本是:

mysql  Ver 14.14 Distrib 5.7.24, for Linux (x86_64) using  EditLine wrapper
2mbi3lxu

2mbi3lxu1#

如果你的mysql支持 cte recursive 你可以试着用它做一个日历表。
那就做吧 outer join 和子查询 case when 架构(mysql v8.0)

CREATE TABLE T(
   ID int,
   Name varchar(50),
   SomeVal float,   
   OtherVal float,   
   `Date` date
);

insert into T values (10,'Somebody',33001.93,33001.93,'2018-10-01');
insert into T values (10,'Somebody',33481.93,33481.93,'2018-10-02');
insert into T values (10,'Somebody',33001.93,33001.93,'2018-10-03');
insert into T values (10,'Somebody',33582.76,33582.76,'2018-10-04');
insert into T values (10,'Somebody',33582.73,33582.79,'2018-10-05');
insert into T values (10,'Somebody',33582.76,33582.76,'2018-10-08');
insert into T values (10,'Somebody',33462.76,33462.76,'2018-10-10');

查询#1

WITH recursive CTE as(
  SELECT MIN(Date) minDt,MAX(Date) maxDt
  FROM T
  UNION ALL
  SELECT date_add(minDt,INTERVAL 1 DAY),maxDt
  FROM CTE
  WHERE minDt < maxDt
)

SELECT  
    CASE WHEN ID IS NULL THEN (SELECT ID 
                            FROM T tt 
                            WHERE tt.Date < t1.minDt
                            ORDER BY tt.Date DESC
                            LIMIT 1)  
    ELSE ID END ID,
    CASE WHEN Name IS NULL THEN (SELECT Name 
                            FROM T tt 
                            WHERE tt.Date < t1.minDt
                            ORDER BY tt.Date DESC
                            LIMIT 1) 
    ELSE Name END Name,
    CASE WHEN SomeVal IS NULL THEN (SELECT SomeVal 
                            FROM T tt 
                            WHERE tt.Date < t1.minDt
                            ORDER BY tt.Date DESC
                            LIMIT 1) 
    ELSE SomeVal END SomeVal,
    CASE WHEN OtherVal IS NULL THEN (SELECT OtherVal 
                            FROM T tt 
                            WHERE tt.Date < t1.minDt
                            ORDER BY tt.Date DESC
                            LIMIT 1) 
    ELSE OtherVal END OtherVal,
    minDt
FROM CTE t1 
LEFT JOIN T t2 ON t1.minDt = t2.Date
ORDER BY t1.minDT;

| ID  | Name     | SomeVal        | OtherVal       | minDt      |
| --- | -------- | -------------- | -------------- | ---------- |
| 10  | Somebody | 33001.9296875  | 33001.9296875  | 2018-10-01 |
| 10  | Somebody | 33481.9296875  | 33481.9296875  | 2018-10-02 |
| 10  | Somebody | 33001.9296875  | 33001.9296875  | 2018-10-03 |
| 10  | Somebody | 33582.76171875 | 33582.76171875 | 2018-10-04 |
| 10  | Somebody | 33582.73046875 | 33582.7890625  | 2018-10-05 |
| 10  | Somebody | 33582.73046875 | 33582.7890625  | 2018-10-06 |
| 10  | Somebody | 33582.73046875 | 33582.7890625  | 2018-10-07 |
| 10  | Somebody | 33582.76171875 | 33582.76171875 | 2018-10-08 |
| 10  | Somebody | 33582.76171875 | 33582.76171875 | 2018-10-09 |
| 10  | Somebody | 33462.76171875 | 33462.76171875 | 2018-10-10 |

db fiddle视图
如果你的mysql版本不支持 cte ,可以为创建日历表 outer join 架构(mysql v5.7)

CREATE TABLE T(
   ID int,
   Name varchar(50),
   SomeVal float,   
   OtherVal float,   
   `Date` date
);

insert into T values (10,'Somebody',33001.93,33001.93,'2018-10-01');
insert into T values (10,'Somebody',33481.93,33481.93,'2018-10-02');
insert into T values (10,'Somebody',33001.93,33001.93,'2018-10-03');
insert into T values (10,'Somebody',33582.76,33582.76,'2018-10-04');
insert into T values (10,'Somebody',33582.73,33582.79,'2018-10-05');
insert into T values (10,'Somebody',33582.76,33582.76,'2018-10-08');
insert into T values (10,'Somebody',33462.76,33462.76,'2018-10-10');

CREATE Table calendar(
   minDt Date
);

INSERT INTO calendar values ('2018-10-01');
INSERT INTO calendar values ('2018-10-02');
INSERT INTO calendar values ('2018-10-03');
INSERT INTO calendar values ('2018-10-04');
INSERT INTO calendar values ('2018-10-05');
INSERT INTO calendar values ('2018-10-06');
INSERT INTO calendar values ('2018-10-07');
INSERT INTO calendar values ('2018-10-08');
INSERT INTO calendar values ('2018-10-09');
INSERT INTO calendar values ('2018-10-10');

查询#1

SELECT  
    CASE WHEN ID IS NULL THEN (SELECT ID 
                            FROM T tt 
                            WHERE tt.Date < t1.minDt
                            ORDER BY tt.Date DESC
                            LIMIT 1)  
    ELSE ID END ID,
    CASE WHEN Name IS NULL THEN (SELECT Name 
                            FROM T tt 
                            WHERE tt.Date < t1.minDt
                            ORDER BY tt.Date DESC
                            LIMIT 1) 
    ELSE Name END Name,
    CASE WHEN SomeVal IS NULL THEN (SELECT SomeVal 
                            FROM T tt 
                            WHERE tt.Date < t1.minDt
                            ORDER BY tt.Date DESC
                            LIMIT 1) 
    ELSE SomeVal END SomeVal,
    CASE WHEN OtherVal IS NULL THEN (SELECT OtherVal 
                            FROM T tt 
                            WHERE tt.Date < t1.minDt
                            ORDER BY tt.Date DESC
                            LIMIT 1) 
    ELSE OtherVal END OtherVal,
    minDt
FROM calendar t1 
LEFT JOIN T t2 ON t1.minDt = t2.Date
ORDER BY t1.minDT;

| ID  | Name     | SomeVal        | OtherVal       | minDt      |
| --- | -------- | -------------- | -------------- | ---------- |
| 10  | Somebody | 33001.9296875  | 33001.9296875  | 2018-10-01 |
| 10  | Somebody | 33481.9296875  | 33481.9296875  | 2018-10-02 |
| 10  | Somebody | 33001.9296875  | 33001.9296875  | 2018-10-03 |
| 10  | Somebody | 33582.76171875 | 33582.76171875 | 2018-10-04 |
| 10  | Somebody | 33582.73046875 | 33582.7890625  | 2018-10-05 |
| 10  | Somebody | 33582.73046875 | 33582.7890625  | 2018-10-06 |
| 10  | Somebody | 33582.73046875 | 33582.7890625  | 2018-10-07 |
| 10  | Somebody | 33582.76171875 | 33582.76171875 | 2018-10-08 |
| 10  | Somebody | 33582.76171875 | 33582.76171875 | 2018-10-09 |
| 10  | Somebody | 33462.76171875 | 33462.76171875 | 2018-10-10 |

db fiddle视图

相关问题