使用groupby和having优化复杂sql查询

7y4bm7vi  于 2021-07-26  发布在  Java
关注(0)|答案(2)|浏览(223)

我遇到一个查询太慢(大约2-3分钟!)的问题。
完整的查询是

  1. SELECT Count(DISTINCT activitytr0_.libraryid) AS col_0_0_
  2. FROM mma_activitytrackerlibrary activitytr0_
  3. inner join mma_activitytracker activities1_
  4. ON activitytr0_.libraryid = activities1_.libraryid
  5. WHERE ( activitytr0_.channel NOT IN ( 'Classica', 'VOD Contents', 'FOX' ) )
  6. AND ( activities1_.phase IN ( 'Quality control', 'Spot check' ) )
  7. AND ( activities1_.result NOT IN ( 'Fail', 'TBC' ) )
  8. AND ( activitytr0_.tx_date IS NULL
  9. OR activitytr0_.tx_date >= DATE '2020-07-13' )
  10. AND ( activitytr0_.ident_tx IN (SELECT channelsch3_.clipid
  11. FROM mma_channelscheduleevents channelsch3_
  12. GROUP BY channelsch3_.clipid
  13. HAVING Min(channelsch3_.eventstarttime) >= DATE '2020-07-13')
  14. OR ( activitytr0_.tx_date IS NULL
  15. AND activitytr0_.ident_tx NOT IN (SELECT channelsch4_.clipid
  16. FROM mma_channelscheduleevents channelsch4_) ) )
  17. AND activitytr0_.trash <> 'Y'

“剪切”查询,最慢的部分是:

  1. SELECT channelsch3_.clipid
  2. FROM mma_channelscheduleevents channelsch3_
  3. GROUP BY channelsch3_.clipid
  4. HAVING Min(channelsch3_.eventstarttime) >= DATE '2020-07-13')

这个查询大约需要21秒,在一个有46000000行的表上,索引位于 clipid 在街上 eventstarttime 字段,并返回大约2000行
请记住,这是一个子查询,它是 field not in (...) 条款
我不知道如何优化查询,请记住 clipid 可以复制,我只想要 clipid 那里甚至没有一个相关的 eventstarttime 在“过去”
所以,对于这个简单的例子

  1. +--------+----------------+
  2. | clipid | eventstarttime |
  3. +--------+----------------+
  4. | A | 2020-01-01 |
  5. | A | 2021-01-01 |
  6. | B | 2020-01-01 |
  7. | C | 2021-01-01 |
  8. +--------+----------------+

我只想要一排,带 C ---应sayan malakshinov的要求

  1. select * from user_tab_col_statistics where table_name='MMA_CHANNELSCHEDULEEVENTS'

退货

  1. TABLE_NAME |COLUMN_NAME |NUM_DISTINCT|LOW_VALUE |HIGH_VALUE |DENSITY|NUM_NULLS|NUM_BUCKETS|LAST_ANALYZED |SAMPLE_SIZE|GLOBAL_STATS|USER_STATS|AVG_COL_LEN|HISTOGRAM|
  2. -------------------------|-----------------------|------------|-----------|-----------|-------|---------|-----------|-------------------|-----------|------------|----------|-----------|---------|
  3. MMA_CHANNELSCHEDULEEVENTS|EVENTID | 1T ]4 T ]4 | 1| 0| 1|2020-07-13 22:00:08| 1|YES |NO | 7|NONE |
  4. MMA_CHANNELSCHEDULEEVENTS|SCHEDULEID | 1 Z Z | 1| 0| 1|2020-07-13 22:00:08| 1|YES |NO | 4|NONE |
  5. MMA_CHANNELSCHEDULEEVENTS|EVENTTYPE | 1| | | 1| 0| 1|2020-07-13 22:00:08| 1|YES |NO | 2|NONE |
  6. MMA_CHANNELSCHEDULEEVENTS|EVENTSTARTTIME | 1|xx |xx | 1| 0| 1|2020-07-13 22:00:08| 1|YES |NO | 8|NONE |
  7. MMA_CHANNELSCHEDULEEVENTS|EVENTENDTIME | 1|xx |xx | 1| 0| 1|2020-07-13 22:00:08| 1|YES |NO | 8|NONE |
  8. MMA_CHANNELSCHEDULEEVENTS|EVENTACTUALSTARTFRAMES | 1 = = | 1| 0| 1|2020-07-13 22:00:08| 1|YES |NO | 4|NONE |
  9. MMA_CHANNELSCHEDULEEVENTS|EVENTSECEVENTOFFSET | 0| | | 0| 1| 0|2020-07-13 22:00:08| |YES |NO | 0|NONE |
  10. MMA_CHANNELSCHEDULEEVENTS|CLIPID | 1|THL50883 |THL50883 | 1| 0| 1|2020-07-13 22:00:08| 1|YES |NO | 9|NONE |
  11. MMA_CHANNELSCHEDULEEVENTS|SEGMENTNUMBER | 0| | | 0| 1| 0|2020-07-13 22:00:08| |YES |NO | 0|NONE |
  12. MMA_CHANNELSCHEDULEEVENTS|SCHEDULEDTITLE | 0| | | 0| 1| 0|2020-07-13 22:00:08| |YES |NO | 0|NONE |
  13. MMA_CHANNELSCHEDULEEVENTS|SCHEDULEDSOM | 0| | | 0| 1| 0|2020-07-13 22:00:08| |YES |NO | 0|NONE |
  14. MMA_CHANNELSCHEDULEEVENTS|SCHEDULEDDURATION | 1|00:00:00:00|00:00:00:00| 1| 0| 1|2020-07-13 22:00:08| 1|YES |NO | 12|NONE |
  15. MMA_CHANNELSCHEDULEEVENTS|SCHEDULEDDURATIONFRAMES| 0| | | 0| 1| 0|2020-07-13 22:00:08| |YES |NO | 0|NONE |
  16. MMA_CHANNELSCHEDULEEVENTS|EVENTCONTROLCODES | 0| | | 0| 1| 0|2020-07-13 22:00:08| |YES |NO | 0|NONE |
  17. MMA_CHANNELSCHEDULEEVENTS|RECONCILEKEY | 0| | | 0| 1| 0|2020-07-13 22:00:08| |YES |NO | 0|NONE |
  18. MMA_CHANNELSCHEDULEEVENTS|SCHEDULEDSOMBCD | 0| | | 0| 1| 0|2020-07-13 22:00:08| |YES |NO | 0|NONE |
  19. MMA_CHANNELSCHEDULEEVENTS|SCHEDULEDDURATIONBCD | 0| | | 0| 1| 0|2020-07-13 22:00:08| |YES |NO | 0|NONE |
j5fpnvbx

j5fpnvbx1#

我们无法优化您的原始查询,因为您没有显示它,而且我们也不知道统计数据的实际执行计划,但我可以建议您改进您显示给我们的部分的一种方法:
您需要在(eventstarttime,clipid)上创建索引
您可以这样处理查询:

  1. SELECT DISTINCT c.clipid
  2. FROM mma_channelscheduleevents c
  3. WHERE c.eventstarttime>=date'2020-07-13'
  4. and not exists (select 1
  5. from mma_channelscheduleevents c2
  6. where c2.clipid = c.clipid and c2.eventstarttime < date'2020-07-13'
  7. );

如果 eventstarttime>=date'2020-07-13' 远小于行数 eventstarttime<date'2020-07-13' .

uemypmqf

uemypmqf2#

您可能会发现,在聚合之前进行筛选会更快。所以值得一试:

  1. SELECT DISTINCT c.clipid
  2. FROM mma_channelscheduleevents c
  3. WHERE NOT EXISTS (SELECT 1
  4. FROM mma_channelscheduleevents c2
  5. WHERE c2.clipid = c.clipid AND c2.eventstarttime) DATE '2020-07-13'
  6. );

为了提高性能,您需要一个索引 mma_channelscheduleevents(clip_id, eventstarttime) .
这值得一试。不过,我不确定会不会更好。
编辑:
大概,你有另一张“剪辑”表。如果是,我建议:

  1. SELECT cl.clipid
  2. FROM clips cl
  3. WHERE NOT EXISTS (SELECT 1
  4. FROM mma_channelscheduleevents c2
  5. WHERE c2.clipid = cl.clipid AND
  6. c2.eventstarttime < DATE '2020-07-13'
  7. );
展开查看全部

相关问题