R语言 指定满足条件的上面第一行的值

cbjzeqam  于 2023-06-03  发布在  其他
关注(0)|答案(1)|浏览(329)

使用dplyr,我如何为每个状态为“已发送”的act_id获取前一个状态为“自动”或“已接收”的act_id的creation_date?
在每个参照组中,每个状态“已发送”必须找到在同一列(状态)中检测到的第一个“已接收”或“自动”的创建日期。
数据和示例

data = data.frame(

  creation_date = as.POSIXct(c("2023-05-01", "2023-05-02", "2023-05-03", "2023-05-04", "2023-05-05")),

  end_date = as.POSIXct(c("2023-05-06", "2023-05-07", "2023-05-08", "2023-05-09", "2023-05-10")),

  status = c("Automatic", "Received", "Sent", "Sent", "Received"),

  reference = c("A", "A", "A", "B", "B"),

  act_id = c(1, 2, 3, 4, 5)

)

我尝试创建一个新列(“position”),它给我row_number,其中状态是“Automatic”和“Received”,然后检查满足条件的第一行以获取其creation_date:

data %>%
  mutate(posicion = ifelse((status == "Automatic" | status == "Received"), row_number(), NA_integer_)) %>%

  group_by(reference) %>%

  mutate(creation_date_2 = sapply(seq_along(status),

                               function(i){

                                 if(status[i] == "Sent")  return(as.POSIXct(creation_date[max(which(position < i))]))}                           

))

但我不确定这是最正确/有效的方法,也不知道我在哪里犯了错误。
我也试过:

data %>%
 group_by(reference) %>%
  mutate(creation_date_2 = if_else(
               status == "Sent",
               creation_date[max(position[status %in% c("Automatic", "Received")] > row_number(), na.rm = TRUE)],
as.POSIXct(NA)))

在本例中,它可以工作,但它为同一引用中的所有act_id提供了相同的creation_date。
感谢您的评分

lpwwtiir

lpwwtiir1#

好吧,我可以用一个使用for循环的函数来解决这个问题,这不是我在R中最喜欢的方法,但你可能会想出一个apply风格的替代方案。

look_back <- function(df, id) {
  if(id<2) return(as.Date(NA))
  for (i in (id-1):1) {
    if (df$status[i] %in% c('Automatic', 'Received')) return(df$creation_date[i])
  }
  # if the loop finishes without finding any previous Automatic/Received entries, return NA
  return(as.Date(NA))
}

group_by(data, act_id) %>% 
  mutate(sent_date=case_when(
    status=='Sent' ~ look_back(data, act_id),
    TRUE ~ as.Date(NA)))

这给出了一个很好的干净输出:

# A tibble: 5 x 6
# Groups:   act_id [5]
  creation_date       end_date            status    reference act_id sent_date          
  <dttm>              <dttm>              <chr>     <chr>      <dbl> <dttm>             
1 2023-05-01 00:00:00 2023-05-06 00:00:00 Automatic A              1 NA                 
2 2023-05-02 00:00:00 2023-05-07 00:00:00 Received  A              2 NA                 
3 2023-05-03 00:00:00 2023-05-08 00:00:00 Sent      A              3 2023-05-02 00:00:00
4 2023-05-04 00:00:00 2023-05-09 00:00:00 Sent      B              4 2023-05-02 00:00:00
5 2023-05-05 00:00:00 2023-05-10 00:00:00 Received  B              5 NA

相关问题