Sampling database results with conditions in Hive SQL

o2rvlv0m  posted on 2021-06-26 in Hive

I have a table with no primary key, partitioned by date, with columns like these:

1. user_id  
2. device
3. region
4. datetime
5. and other columns

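It contains events that users generate in a web game; the events fire every second. I want to return a batch containing all events generated on the current day (including duplicate rows) by the first 6 users (from the top of the table) that satisfy the following conditions:
Region = US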

- one user from iOS
- one user from android
- one user from PC

Region = EU

- one user from iOS
- one user from android
- one user from PC

Could you provide sample code showing where I should start? A friend suggested something with rank(), but I have never used it.
Thank you!

SELECT * FROM 
    (SELECT user_id, 
    event_post_time, 
    device, 
    region, 
    COUNT(DISTINCT player_id) over (partition by player_id) as ct_pid, 
    COUNT(DISTINCT region) over (partition by region) as ct_region, 
    COUNT(DISTINCT device) over (partition by device) as ct_device 
    FROM events 
    WHERE event_post_time = current_date() 
    AND region IN ('EU','US') 
    AND device IN ('ios','android','pc')) e 
WHERE ct_pid <= 6 
AND ct_region <= 2 
AND ct_device <= 3 
ORDER BY player_id

Added dummy data and expected output in SQL Fiddle:

user_id  device   region  date_generated
1        ios      EU      22-05-18
1        ios      EU      22-05-18
1        ios      EU      22-05-18
4        ios      US      22-05-18
4        ios      US      22-05-18
2        android  EU      22-05-18
2        android  US      22-05-18
4        pc       EU      22-05-18
4        pc       EU      22-05-18
4        pc       EU      22-05-18
5        pc       US      22-05-18

omvjsjqw  #1

Maybe this is what you are looking for.

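select *
from (
    -- rank the players inside each region/device combination by event count
    select rank() over (partition by region, device order by cn desc) as top_num,
           player_id,
           region,
           device,
           cn
    from (
        -- number of events per player, region and device
        select count(*) as cn, player_id, region, device
        from test_table
        group by player_id, region, device
    ) l
) t
where top_num = 1;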
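
One thing to watch with rank(): players that tie on the event count all get top_num = 1, so a region/device combination can return more than one player. The sketch below illustrates this on made-up rows and compares it with row_number(), which keeps exactly one player per combination. It runs the query through Python's sqlite3 module as a stand-in for Hive (an assumption made only so the example is easy to run; both window functions are also available in Hive). The sample rows and the tie-breaker on player_id are hypothetical.

```python
import sqlite3

# Assumption: SQLite (3.25+) stands in for Hive; the rows below are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test_table (player_id INTEGER, region TEXT, device TEXT)")
conn.executemany(
    "INSERT INTO test_table VALUES (?, ?, ?)",
    [
        (1, "EU", "ios"), (1, "EU", "ios"),  # player 1: two events
        (7, "EU", "ios"), (7, "EU", "ios"),  # player 7: two events, tied with player 1
        (9, "EU", "ios"),                    # player 9: one event
    ],
)

# Same shape as the answer's query; only the window expression is swapped.
QUERY = """
SELECT player_id
FROM (SELECT {window} AS top_num, player_id
      FROM (SELECT COUNT(*) AS cn, player_id, region, device
            FROM test_table
            GROUP BY player_id, region, device) l) t
WHERE top_num = 1
ORDER BY player_id
"""

RANK = "RANK() OVER (PARTITION BY region, device ORDER BY cn DESC)"
# The extra player_id sort key is a hypothetical tie-breaker.
ROW_NUMBER = "ROW_NUMBER() OVER (PARTITION BY region, device ORDER BY cn DESC, player_id)"

rank_winners = [row[0] for row in conn.execute(QUERY.format(window=RANK))]
row_number_winners = [row[0] for row in conn.execute(QUERY.format(window=ROW_NUMBER))]

print(rank_winners)        # both tied players are kept
print(row_number_winners)  # exactly one player is kept
```

If exactly one user per combination is required, swapping rank() for row_number() with a deterministic tie-breaker is probably the simpler choice.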

Let me know if this helps.
OP edit: I managed to get what I wanted using the query you provided; this is the final version:

WITH combo 
 AS (SELECT user_id, 
            region, 
            device 
     FROM   (SELECT Rank() 
                      OVER ( 
                        partition BY region, device 
                        ORDER BY cn DESC) AS top_num, 
                    user_id, 
                    region, 
                    device, 
                    cn 
             FROM   (SELECT Count(*) AS cn, 
                            user_id, 
                            region, 
                            device 
                     FROM   samples 
                     GROUP  BY user_id, 
                               region, 
                               device)l)t 
     WHERE  top_num = 1) 
SELECT s.user_id, 
   s.region, 
   s.device 
FROM   samples s 
   JOIN combo 
     ON s.user_id = combo.user_id 
        AND s.region = combo.region 
        AND s.device = combo.device
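
To check what the final query returns, here is a small sketch that loads the dummy data from the question into an in-memory table and runs the query unchanged. It uses Python's sqlite3 module as a stand-in for Hive (an assumption for convenience; the real table is a partitioned Hive table), and the column types are guesses.

```python
import sqlite3

# Assumption: SQLite (3.25+) stands in for Hive; column types are guesses.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE samples (user_id INTEGER, device TEXT, region TEXT, date_generated TEXT)"
)

# Dummy data from the question.
dummy_rows = [
    (1, "ios", "EU"), (1, "ios", "EU"), (1, "ios", "EU"),
    (4, "ios", "US"), (4, "ios", "US"),
    (2, "android", "EU"),
    (2, "android", "US"),
    (4, "pc", "EU"), (4, "pc", "EU"), (4, "pc", "EU"),
    (5, "pc", "US"),
]
conn.executemany(
    "INSERT INTO samples VALUES (?, ?, ?, '22-05-18')", dummy_rows
)

# The final query from the OP edit, unchanged.
FINAL_QUERY = """
WITH combo AS (
    SELECT user_id, region, device
    FROM (SELECT RANK() OVER (PARTITION BY region, device
                              ORDER BY cn DESC) AS top_num,
                 user_id, region, device, cn
          FROM (SELECT COUNT(*) AS cn, user_id, region, device
                FROM samples
                GROUP BY user_id, region, device) l) t
    WHERE top_num = 1)
SELECT s.user_id, s.region, s.device
FROM samples s
JOIN combo
  ON s.user_id = combo.user_id
 AND s.region = combo.region
 AND s.device = combo.device
"""

result = conn.execute(FINAL_QUERY).fetchall()
combos = sorted(set(result))          # distinct (user_id, region, device) winners
users = sorted({row[0] for row in result})

print(len(result))  # number of event rows returned, duplicates included
print(combos)
print(users)
```

On this data every region/device combination has a single user, so all 11 event rows come back, duplicates included. Note that the six combinations are covered by only four distinct users (users 2 and 4 each win two combinations), and the query as written has no filter on the date, so a condition on the partition column would still have to be added to restrict it to the current day.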
