Oracle SQL -匹配时更新并插入

eh57zj3b  于 2023-04-11  发布在  Oracle
关注(0)|答案(1)|浏览(202)

我在这里发现了一些类似的问题,但似乎没有一个真正适合我的情况。
我正在做的表格是这样的,它是一个记录学生在课程上表现的表格:

| STUDENT_ID | COURSE_ID | ENROLLMENT_TYPE | MARK | STATUS  | VERSION |
|            |           |                 |      |         |         |
| 1234       | 5678      | Mandatory       | 70   | ACTIVE  | 2       |
| 1234       | 5678      | Optional        | 70   | HISTORY | 1       |
| 1234       | 5678      | Optional        | null | HISTORY | 0       |
| 9876       | 4597      | Institutional   | 99   | ACTIVE  | 1       |
| 9876       | 4597      | Institutional   | null | HISTORY | 0       |

我需要将此表与另一个表合并,该表根据学生的组跟踪学生的注册情况,以便我可以根据需要插入或更新行:

| GROUP_ID | STUDENT_ID | COURSE_ID | ENROLLMENT_TYPE |
| 4976555  | 1234       | 5678      | Mandatory       |
| 6399875  | 1234       | 9034      | Optional        |
| 6399875  | 9876       | 4597      | Institutional   |

长话短说,我需要检查注册类型是否相同或已更改,因为有些学生根据他们所属的组注册课程,但组可以在一夜之间更改。
到这里为止一切都很好,但是如果行匹配,我还需要复制我要更新的行并将其设置为“HISTORY”,这样我们就可以记录对某行进行的所有更新。
目前,我有一个典型的

MERGE INTO performance USING group_enrollments
ON performance.STUDENT_ID = group_enrollments.STUDENT_ID 
AND performance.COURSE_ID = group_enrollments.COURSE_ID
WHEN MATCHED THEN UPDATE ......
WHEN NOT MATCHED THEN INSERT ......

在我之前做这部分代码的人认为在合并之前复制所有的行作为“HISTORY”行是一个好主意,但是这给我们带来了问题,因为这个过程每天晚上都运行,每次写入超过150.000行。
16/01 12.34:更新了有关表及其关系的更多信息

v7pvogib

v7pvogib1#

好了,首先,我会写一个查询,生成要更新和/或插入的行:

WITH    performance AS (SELECT 1234 student_id, 5678 course_id, 'Mandatory' enrollment_type, 70 mark, 'ACTIVE' status, 2 VERSION FROM dual UNION ALL
                        SELECT 1234 student_id, 5678 course_id, 'Optional' enrollment_type, 70 mark, 'HISTORY' status, 1 VERSION FROM dual UNION ALL
                        SELECT 1234 student_id, 5678 course_id, 'Optional' enrollment_type, NULL mark, 'HISTORY' status, 0 VERSION FROM dual UNION ALL
                        SELECT 9876 student_id, 4597 course_id, 'Institutional' enrollment_type, 99 mark, 'ACTIVE' status, 1 VERSION FROM dual UNION ALL
                        SELECT 9876 student_id, 4597 course_id, 'Institutional' enrollment_type, NULL mark, 'HISTORY' status, 0 VERSION FROM dual),
  group_enrollments AS (SELECT 4976555 group_id, 1234 student_id, 5678 course_id, 'Mandatory2' enrollment_type FROM dual UNION ALL
                        SELECT 6399875 group_id, 1234 student_id, 9034 course_id, 'Optional' enrollment_type FROM dual UNION ALL
                        SELECT 6399875 group_id, 9876 student_id, 4597 course_id, 'Institutional' enrollment_type FROM dual)
-- end of mimicking your tables with data in them
SELECT res.student_id,
       res.course_id,
       CASE WHEN dummy.id = 1 THEN res.new_enrollment_type
            WHEN dummy.id = 2 THEN res.old_enrollment_type
       END enrollment_type,
       res.mark,
       CASE WHEN dummy.id = 1 THEN 'ACTIVE'
            WHEN dummy.id = 2 THEN 'HISTORY'
       END status,
       CASE WHEN dummy.id = 1 THEN res.new_version
            WHEN dummy.id = 2 THEN res.old_version
       END VERSION 
FROM   (SELECT ge.student_id,
               ge.course_id,
               ge.enrollment_type new_enrollment_type,
               p.enrollment_type old_enrollment_type,
               p.mark,
               p.status,
               p.version old_version,
               nvl(p.version + 1, 0) new_VERSION
                 -- n.b. this may produce duplicates or unique constraint errors in a concurrent environment
        FROM   group_enrollments ge
               LEFT OUTER JOIN PERFORMANCE p ON ge.student_id = p.student_id
                                                AND ge.course_id = p.course_id
        WHERE  (p.status = 'ACTIVE' OR p.status IS NULL)
        AND    (p.enrollment_type != ge.enrollment_type OR p.enrollment_type IS NULL)) res
        INNER JOIN (SELECT 1 ID FROM dual UNION ALL
                    SELECT 2 ID FROM dual) dummy ON dummy.id = 1
                                                    OR (dummy.id = 2 
                                                        AND res.status = 'ACTIVE');

STUDENT_ID  COURSE_ID ENROLLMENT_TYPE       MARK STATUS     VERSION
---------- ---------- --------------- ---------- ------- ----------
      1234       5678 Mandatory2              70 ACTIVE           3
      1234       9034 Optional                   ACTIVE           0
      1234       5678 Mandatory               70 HISTORY          2

此查询首先查找所有全新的行(即group_enrollment表中的行在performance表中没有行)或具有不同enrollment_type的行。这些是需要插入或更新的行。
一旦我们知道了这一点,我们就可以连接一个虚拟的2行表,这样我们就可以连接到第一个虚拟行,不管我们是需要插入还是更新,但是如果我们需要更新,我们只会连接到第二个虚拟行。这意味着我们只有一行用于插入,但是两行用于更新。
然后很容易根据www.example.com输出正确的值dummy.id(第一个虚拟行是新值,第二个虚拟行是旧值)。
一旦我们完成了这些,我们就知道哪些数据需要合并到性能表中,所以现在merge语句看起来像这样:

merge into performance tgt
  using (SELECT res.student_id,
                res.course_id,
                CASE WHEN dummy.id = 1 THEN res.new_enrollment_type
                     WHEN dummy.id = 2 THEN res.old_enrollment_type
                END enrollment_type,
                res.mark,
                CASE WHEN dummy.id = 1 THEN 'ACTIVE'
                     WHEN dummy.id = 2 THEN 'HISTORY'
                END status,
                CASE WHEN dummy.id = 1 THEN res.new_version
                     WHEN dummy.id = 2 THEN res.old_version
                END VERSION 
         FROM   (SELECT ge.student_id,
                        ge.course_id,
                        ge.enrollment_type new_enrollment_type,
                        p.enrollment_type old_enrollment_type,
                        p.mark,
                        p.status,
                        p.version old_version,
                        nvl(p.version + 1, 0) new_VERSION
                          -- n.b. this may produce duplicates or unique constraint errors in a concurrent environment
                 FROM   group_enrollments ge
                        LEFT OUTER JOIN PERFORMANCE p ON ge.student_id = p.student_id
                                                         AND ge.course_id = p.course_id
                 WHERE  (p.status = 'ACTIVE' OR p.status IS NULL)
                 AND    (p.enrollment_type != ge.enrollment_type OR p.enrollment_type IS NULL)) res
                 INNER JOIN (SELECT 1 ID FROM dual UNION ALL
                             SELECT 2 ID FROM dual) dummy ON dummy.id = 1
                                                             OR (dummy.id = 2 
                                                                 AND res.status = 'ACTIVE')) src
    ON (tgt.student_id = src.student_id AND tgt.course_id = src.course_id AND tgt.status = src.status)
WHEN MATCHED THEN
  UPDATE SET tgt.enrollment_type = src.enrollment_type,
             tgt.version = src.version
WHEN NOT MATCHED THEN
  INSERT (tgt.student_id, tgt.course_id, tgt.enrollment_type, tgt.mark, tgt.status, tgt.version)
  VALUES (src.student_id, src.course_id, src.enrollment_type, src.mark, src.status, src.version);

为了澄清的目的,这里有一个非常简单的行的条件复制的例子(我们也可以称之为部分交叉连接,因为一个表中的所有行都至少连接到另一个表中的一行):

WITH sample_data AS (SELECT 100 ID, NULL status FROM dual UNION ALL -- expect only one row
                     SELECT 101 ID, 'A' status FROM dual UNION ALL -- expect two rows
                     SELECT 102 ID, 'B' status FROM dual -- expect only one row
                    )
SELECT dummy.id dummy_row_id,
       sd.id,
       sd.status
FROM   sample_data sd
       INNER JOIN (SELECT 1 ID FROM dual UNION ALL
                   SELECT 2 ID FROM dual) dummy ON dummy.id = 1
                                                   OR (dummy.id = 2 
                                                       AND sd.status = 'A')
ORDER BY sd.id, dummy.id;

DUMMY_ROW_ID         ID STATUS
------------ ---------- ------
           1        100 
           1        101 A
           2        101 A
           1        102 B

可以看到,对于sample_data“table”中id=101的行,我们有两行,但其他两个id各只有一行。
希望这能让你明白一些事情?

相关问题