How to count unique values based on rolling rows and column values in sparklyr/MySQL (example given)?

falq053o  posted on 2021-07-09  in  Spark

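I have sample data loaded into a local Spark connection. For each device ID, I am trying to count the number of distinct location IDs it was seen at (associated with) over a rolling window of the past 10 minutes. The example below illustrates the task.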

Sample data

sample_df <- data.frame(
  local_date = c('2021-03-22 10:01:00', '2021-03-22 10:01:00', '2021-03-22 10:03:00', '2021-03-22 10:04:00',
                 '2021-03-22 10:04:00', '2021-03-22 10:01:00', '2021-03-22 10:06:00', '2021-03-22 10:07:00'),
  location_id = c("x", "y", "z", "x", "y", "x", "y", "x"),
  device_id = c("a", "a", "a", "a", "a", "b", "b", "b")
)

Load into Spark

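sc <- sparklyr::spark_connect(master = "local")  # local Spark connection, as described above
df <- sparklyr::copy_to(sc, sample_df, "sample_df")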

Expected result

expected_df <- data.frame(
  local_date = c('2021-03-22 10:01:00', '2021-03-22 10:02:00', '2021-03-22 10:03:00', '2021-03-22 10:04:00',
                 '2021-03-22 10:01:00', '2021-03-22 10:02:00', '2021-03-22 10:03:00', '2021-03-22 10:04:00',
                 '2021-03-22 10:05:00', '2021-03-22 10:06:00', '2021-03-22 10:07:00'),
  no_locations = c(2, 2, 3, 3, 1, 1, 1, 1, 1, 2, 2),
  device_id = c("a", "a", "a", "a", "b", "b", "b", "b", "b", "b", "b")
)
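To make the intended logic concrete, here is a minimal sketch in plain base R. It runs on the local data frame, not on the Spark table, so it is only a reference for what the result should look like and not a sparklyr solution. It rests on two assumptions that the question does not state explicitly: each device gets one row per minute between its first and last event, and the 10-minute window includes the current minute and excludes the instant exactly 10 minutes earlier. The helper name `count_locations` and the `ts` column are made up for illustration.

```r
# Sample data from the question (local data frame, not the Spark table).
sample_df <- data.frame(
  local_date = c("2021-03-22 10:01:00", "2021-03-22 10:01:00",
                 "2021-03-22 10:03:00", "2021-03-22 10:04:00",
                 "2021-03-22 10:04:00", "2021-03-22 10:01:00",
                 "2021-03-22 10:06:00", "2021-03-22 10:07:00"),
  location_id = c("x", "y", "z", "x", "y", "x", "y", "x"),
  device_id = c("a", "a", "a", "a", "a", "b", "b", "b")
)

# Assumption: the look-back window is 10 minutes, expressed in seconds.
window_secs <- 10 * 60

# Parse the timestamps; UTC is used only to avoid daylight-saving surprises.
sample_df$ts <- as.POSIXct(sample_df$local_date, tz = "UTC")

# Hypothetical helper: rolling distinct-location count for one device.
count_locations <- function(events) {
  # One grid point per minute between the device's first and last event.
  grid <- seq(min(events$ts), max(events$ts), by = "1 min")
  counts <- vapply(seq_along(grid), function(i) {
    now <- grid[i]
    # Events in the half-open window (now - 10 min, now].
    in_window <- events$ts <= now & events$ts > now - window_secs
    length(unique(events$location_id[in_window]))
  }, integer(1))
  data.frame(
    local_date = format(grid, "%Y-%m-%d %H:%M:%S"),
    no_locations = counts,
    device_id = events$device_id[1]
  )
}

# Apply the helper per device and stack the pieces.
result <- do.call(
  rbind,
  lapply(split(sample_df, sample_df$device_id), count_locations)
)
rownames(result) <- NULL
print(result)
```

On the sample data this yields the same `no_locations` and `device_id` values as `expected_df`. All sample events fall within a single 10-minute span, so the data does not show whether the window boundary should be inclusive; adjust the comparison in `in_window` if a different convention is intended.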
