R: reshape a dataset to wide format for entries that fall within a date interval

Posted by gab6jxml on 2023-04-27 in Other

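I'm trying to figure out how to reshape multiple customer orders placed within 120 days into a wide format for each customer. If a customer has placed orders over the course of several years, there are multiple 120-day intervals. Each customer may therefore have several rows, and each row corresponds to a 120-day interval starting from an index date. Each index date is the first order date that falls outside the 120-day interval starting from the previous index order date. Thanks!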
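The index-date rule above can be illustrated with a small base-R sketch. `interval_from_index()` is a hypothetical helper, not from the original post. It takes one customer's order dates as day offsets, sorted ascending. It assumes that an order exactly 120 days after the current index date opens a new interval, a boundary case the question leaves open.

```r
# Hypothetical helper (not from the original post): assign each order to an
# interval under the question's rule. `days` are one customer's order dates as
# day offsets, sorted ascending. Assumption: an order exactly `window` days
# after the current index date opens a new interval.
interval_from_index <- function(days, window = 120) {
  out <- integer(length(days))
  if (length(days) == 0) return(out)
  index_day <- days[1]   # the first order is the first index date
  current <- 0L
  for (i in seq_along(days)) {
    if (days[i] - index_day >= window) {
      index_day <- days[i]   # first order outside the window becomes the new index
      current <- current + 1L
    }
    out[i] <- current
  }
  out
}

# Customer A1 from the sample data: days since 2020-05-19
interval_from_index(c(0, 112, 120, 216, 229))
#> [1] 0 0 1 1 1

# Hypothetical orders on days 0, 200, 250: the rule re-anchors at day 200,
# whereas fixed 120-day bins counted from the first order split 200 and 250
interval_from_index(c(0, 200, 250))
#> [1] 0 1 1
c(0, 200, 250) %/% 120
#> [1] 0 1 2
```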
I have a dataframe:

customerid <- c("A1", "A1", "A1", "A1", "A1", "A2", "A2", "A2")
orderid <- c("1", "2", "3", "4", "5", "6", "7", "8")
orderdate <- c("2020-05-19", "2020-09-08", "2020-09-16", "2020-12-21", "2021-01-03", "2020-08-21","2020-11-22","2021-02-01")
df <- data.frame(customerid, orderid, orderdate)

The result should be:

Thanks, everyone!

Answer #1, by x6yk4ghg

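I think the simplest approach is:
1. Build an index data frame holding each customer's first order date
2. For each order, work out which "120-day interval" it falls in
3. Number the distinct orders within each 120-day interval
4. Pivot wider, using customerid and int120 as the ID columns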

library(dplyr)
library(tidyr)

customerid <- c("A1", "A1", "A1", "A1", "A1", "A2", "A2", "A2")
orderid <- c("1", "2", "3", "4", "5", "6", "7", "8")
orderdate <- as.Date(c("2020-05-19", "2020-09-08", "2020-09-16", "2020-12-21", "2021-01-03", "2020-08-21","2020-11-22","2021-02-01"))
df <- data.frame(customerid, orderid, orderdate)

indexes <- df %>% 
  slice(which.min(orderdate), .by = customerid) %>%  # first order per customer; .by needs dplyr >= 1.1.0
  select(customerid, indexdate = orderdate)          # create index df

df %>% 
  left_join(indexes, by = "customerid") %>% 
  mutate(days_diff = as.numeric(orderdate - indexdate),  # days since the customer's first order
         int120 = floor(days_diff / 120)) %>%            # fixed 120-day bin number
  group_by(customerid,int120) %>% 
  mutate(order_in_int120 = row_number()) %>% 
  ungroup() %>% 
  pivot_wider(id_cols = c("customerid","int120"),
              names_from = order_in_int120,
              values_from = c("orderid","orderdate"))
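The answer's `floor(days_diff / 120)` counts fixed 120-day bins from each customer's first order. The question instead re-anchors each interval at the first order outside the previous window. On the sample data the two rules give the same groups, but they can differ otherwise (for example, orders on days 0, 200 and 250). The sketch below swaps in that rolling rule and keeps the answer's pivot. `rolling_interval()` is a hypothetical helper, and the boundary case (an order exactly 120 days after the index date starts a new interval) is an assumption.

```r
library(dplyr)
library(tidyr)

# Sample data from the question, with orderdate parsed as Date
df <- data.frame(
  customerid = c("A1", "A1", "A1", "A1", "A1", "A2", "A2", "A2"),
  orderid    = c("1", "2", "3", "4", "5", "6", "7", "8"),
  orderdate  = as.Date(c("2020-05-19", "2020-09-08", "2020-09-16", "2020-12-21",
                         "2021-01-03", "2020-08-21", "2020-11-22", "2021-02-01"))
)

# Hypothetical helper: interval number for one customer's sorted order dates.
# A new interval starts at the first order >= `window` days after the current
# index date (assumed boundary rule).
rolling_interval <- function(dates, window = 120) {
  out <- integer(length(dates))
  if (length(dates) == 0) return(out)
  index_date <- dates[1]
  for (i in seq_along(dates)[-1]) {
    outside <- as.numeric(dates[i] - index_date) >= window
    if (outside) index_date <- dates[i]   # re-anchor the window at this order
    out[i] <- out[i - 1] + outside        # logical adds as 0/1
  }
  out
}

wide <- df %>%
  arrange(customerid, orderdate) %>%           # rolling_interval() expects sorted dates
  group_by(customerid) %>%
  mutate(int120 = rolling_interval(orderdate)) %>%
  group_by(customerid, int120) %>%
  mutate(order_in_int120 = row_number()) %>%   # position of the order within its interval
  ungroup() %>%
  pivot_wider(id_cols = c(customerid, int120),
              names_from = order_in_int120,
              values_from = c(orderid, orderdate))
wide
```

On the sample data this yields four rows: A1 with orders 1 and 2, then orders 3, 4 and 5; A2 with orders 6 and 7, then order 8. Cells for interval positions a customer never reaches are filled with `NA`.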
