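I have sample data loaded into a local Spark connection. For each device id, I am trying to calculate a rolling count of the distinct location ids it was seen at (associated with) during the past 10 minutes. The example below illustrates the task.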
Sample data
sample_df<-data.frame(local_date=c('2021-03-22 10:01:00','2021-03-22 10:01:00','2021-03-22 10:03:00','2021-03-22 10:04:00',
'2021-03-22 10:04:00','2021-03-22 10:01:00','2021-03-22 10:06:00','2021-03-22 10:07:00'),
location_id=c("x","y","z","x","y","x","y","x"),device_id=c("a","a","a","a","a","b","b","b"))
Load into Spark
# sc is the local Spark connection, e.g. sc <- sparklyr::spark_connect(master = "local")
df<-copy_to(sc,sample_df,"sample_df")
Expected result
expected_df<-data.frame(local_date=c('2021-03-22 10:01:00','2021-03-22 10:02:00','2021-03-22 10:03:00','2021-03-22 10:04:00',
'2021-03-22 10:01:00','2021-03-22 10:02:00','2021-03-22 10:03:00','2021-03-22 10:04:00',
'2021-03-22 10:05:00','2021-03-22 10:06:00','2021-03-22 10:07:00'),
no_locations=c(2,2,3,3,1,1,1,1,1,2,2),device_id=c("a","a","a","a","b","b","b","b","b","b","b"))
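To make the intended logic concrete, here is a minimal local sketch in base R (not Spark) of how I read the expected result: one row per minute between each device's first and last observation, counting the distinct location ids seen in the 10 minutes up to and including that minute. The helper name `rolling_locations`, the `window_secs` variable, the UTC time zone, and the inclusive window boundary are my own assumptions for illustration.

```r
# Local illustration only: plain base R, no Spark involved.
sample_df <- data.frame(
  local_date = c('2021-03-22 10:01:00', '2021-03-22 10:01:00', '2021-03-22 10:03:00',
                 '2021-03-22 10:04:00', '2021-03-22 10:04:00', '2021-03-22 10:01:00',
                 '2021-03-22 10:06:00', '2021-03-22 10:07:00'),
  location_id = c("x", "y", "z", "x", "y", "x", "y", "x"),
  device_id = c("a", "a", "a", "a", "a", "b", "b", "b"),
  stringsAsFactors = FALSE
)

# Parse timestamps (time zone is an assumption; the data does not state one).
sample_df$ts <- as.POSIXct(sample_df$local_date, tz = "UTC")

window_secs <- 10 * 60  # 10-minute look-back window, in seconds

# Hypothetical helper: rolling distinct-location count for one device.
rolling_locations <- function(d) {
  # One grid point per minute from the device's first to last observation.
  grid <- seq(min(d$ts), max(d$ts), by = "1 min")
  counts <- vapply(seq_along(grid), function(i) {
    t <- grid[i]
    # Observations in the window (t - 10 min, t].
    in_window <- d$ts <= t & d$ts > t - window_secs
    length(unique(d$location_id[in_window]))
  }, integer(1))
  data.frame(
    local_date = format(grid, "%Y-%m-%d %H:%M:%S", tz = "UTC"),
    no_locations = counts,
    device_id = d$device_id[1],
    stringsAsFactors = FALSE
  )
}

# Apply per device and stack the results.
result <- do.call(rbind, lapply(split(sample_df, sample_df$device_id), rolling_locations))
rownames(result) <- NULL
print(result)
```

On the sample data this should give the same `no_locations` column as `expected_df`. What I am after is the equivalent computed inside Spark on `df`, without collecting the data locally.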