R: Generating a variable in a data frame using minif on other columns

eni9jsuy  posted on 2023-01-28  in  Other

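I have a data frame containing a variable for the observation date (Portfolio.Date), an ID (Loan.Number), and a status indicator ranging from 1 to 8 (Current.Delinquency.Code). I want to compute a new variable that is TRUE only on the first observation where the status indicator for a given ID exceeds 3. In Excel I would write =[Portfolio.Date]=minif([Portfolio.Date], [Loan.Number], [@Loan.Number], [Current.Delinquency.Code], ">3"), but I don't know how to do this in R. Can anyone help?
Many thanks!
What I'm looking for is a formula for the [Delinquent] column in the example data below, which switches to "TRUE" the first time the "Current.Delinquency.Code" observed for a "Loan.Number" is greater than 3.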

Portfolio.Date  Loan.Number  Current.Delinquency.Code  Delinquent
    2022/01/01      1            1                     FALSE
    2022/02/01      1            4                     TRUE
    2022/03/01      1            4                     FALSE
    2022/04/01      1            4                     FALSE
    2022/01/01      2            1                     FALSE
    2022/02/01      2            1                     FALSE
    2022/03/01      2            1                     FALSE
    2022/04/01      2            1                     FALSE
    2022/01/01      3            1                     FALSE
    2022/02/01      3            3                     FALSE
    2022/03/01      3            4                     TRUE
    2022/04/01      3            4                     FALSE
toiithl6

If I understand correctly, this is one possible solution:

library(dplyr)

# to make it reproducible:
set.seed(1)
# sample data
df <- data.frame(Portfolio.Date = seq.Date(from = as.Date("2022-01-01"), to = as.Date("2022-01-30"), by = "days"),
                 Loan.Number = rep(c(1,2), 15),
                 Current.Delinquency.Code = sample(1:8, size = 30, replace = TRUE))

# group by Loan.Number
dplyr::group_by(df, Loan.Number) %>%
    # order by Portfolio.Date
    dplyr::arrange(Portfolio.Date) %>%
    # check the condition and take its cumsum, keeping only rows where the cumsum is 1 (first occurrence)
    dplyr::mutate(nc = ifelse(Current.Delinquency.Code > 3 & cumsum(Current.Delinquency.Code > 3) == 1, 1, 0)) %>%
    # ungroup to prevent unwanted behaviour down stream
    dplyr::ungroup()

# A tibble: 30 x 4
   Portfolio.Date Loan.Number Current.Delinquency.Code    nc
   <date>               <dbl>                    <int> <dbl>
 1 2022-01-01               1                        1     0
 2 2022-01-02               2                        4     1
 3 2022-01-03               1                        7     1
 4 2022-01-04               2                        1     0
 5 2022-01-05               1                        2     0
 6 2022-01-06               2                        5     0
 7 2022-01-07               1                        7     0
 8 2022-01-08               2                        3     0
 9 2022-01-09               1                        6     0
10 2022-01-10               2                        2     0
# ... with 20 more rows
# i Use `print(n = ...)` to see more rows
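
To check the idea against the example data from the question, here is a minimal sketch that rebuilds that table and derives a logical Delinquent column. It swaps in base R's ave() for the dplyr pipeline above, so it is an alternative formulation rather than the answer's own code. The object names loans, over3 and running are made up for illustration, and it assumes each loan has at most one row per date.

```r
# Example data copied from the table in the question
loans <- data.frame(
  Portfolio.Date = rep(as.Date(c("2022-01-01", "2022-02-01",
                                 "2022-03-01", "2022-04-01")), times = 3),
  Loan.Number = rep(1:3, each = 4),
  Current.Delinquency.Code = c(1, 4, 4, 4,
                               1, 1, 1, 1,
                               1, 3, 4, 4)
)

# Sort by loan and date so that "first" means "earliest date"
loans <- loans[order(loans$Loan.Number, loans$Portfolio.Date), ]

# TRUE wherever the status code is above the threshold of 3
over3 <- loans$Current.Delinquency.Code > 3

# Running count of over-threshold rows within each loan
running <- ave(as.integer(over3), loans$Loan.Number, FUN = cumsum)

# TRUE only on the first over-threshold row of each loan
loans$Delinquent <- over3 & running == 1

print(loans)
```

For this data the Delinquent column is TRUE only for loan 1 on 2022-02-01 and loan 3 on 2022-03-01, which matches the column requested in the question. The `over3 &` part matters: without it, later rows whose code drops back to 3 or below would still have a running count of 1 and be flagged.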
