R: Convert a date to NA based on the validity of the value corresponding to that date

ax6ht2ek  posted on 2023-01-03  in Other

Suppose I have

> df
    fu1_date fu1_n_symp   fu5_date fu5_n_symp   fu7_date fu7_n_symp
1 2012-03-05          1 2014-03-05         NA 2016-03-05          1
2 2013-08-09          1 2015-10-09          2 2017-11-09         NA
3 2019-05-05          1 2020-06-07          2 2021-07-09          2

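df stands in for a very large data frame. In this example I recorded the number of symptoms, n_symp, at different follow-up dates, fu_date.
Each row of my data frame has up to 20 follow-ups: fu1_, fu2_, ..., fu20_. I need to correct the data frame so that whenever n_symp is NA, the corresponding fuX_date is changed from a Date (as.Date()) to NA.
As you can see, row 1 has a missing value at follow-up 5 (fu5_n_symp is NA) but not at fu1 or fu7. So fu5_date in row 1 should be changed from 2014-03-05 to NA.
I am looking for a solution in dplyr only.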

Expected output
> df
    fu1_date fu1_n_symp   fu5_date fu5_n_symp   fu7_date fu7_n_symp
1 2012-03-05          1       <NA>         NA 2016-03-05          1
2 2013-08-09          1 2015-10-09          2       <NA>         NA
3 2019-05-05          1 2020-06-07          2 2021-07-09          2

Data

df <- structure(list(fu1_date = structure(c(15404, 15926, 18021), class = "Date"), 
    fu1_n_symp = c(1L, 1L, 1L), fu5_date = structure(c(16134, 
    16717, 18420), class = "Date"), fu5_n_symp = c(NA, 2L, 2L
    ), fu7_date = structure(c(16865, 17479, 18817), class = "Date"), 
    fu7_n_symp = c(1L, NA, 2L)), class = "data.frame", row.names = c(NA, -3L))

wj8zmpe1  #1

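With pivot_longer(), you can put ".value" in names_to to stack date and n_symp in pairs. In that case, one of names_sep or names_pattern must be supplied to say how the column names should be split. After that it is easy to replace the dates whose n_symp is missing with NA. Finally, pivot the long data wider again to get back the original layout.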

library(dplyr)
library(tidyr)

df %>%
  mutate(id = 1:n()) %>% 
  pivot_longer(-id, names_to = c("fu", ".value"), names_sep = "(?<=\\d)_") %>%
  mutate(date = replace(date, is.na(n_symp), NA)) %>%
  pivot_wider(names_from = fu, values_from = c(date, n_symp),
              names_glue = "{fu}_{.value}", names_vary = "slowest")

# # A tibble: 3 × 7
#      id fu1_date   fu1_n_symp fu5_date   fu5_n_symp fu7_date   fu7_n_symp
#   <int> <date>          <int> <date>          <int> <date>          <int>
# 1     1 2012-03-05          1 NA                 NA 2016-03-05          1
# 2     2 2013-08-09          1 2015-10-09          2 NA                 NA
# 3     3 2019-05-05          1 2020-06-07          2 2021-07-09          2
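
To see why the replacement step is a one-liner, it can help to look at the intermediate long format. The sketch below is only an illustration: `toy` is a hand-built two-row subset of the example data, and it uses names_pattern instead of names_sep (an assumption made for readability, the split is the same).

```r
library(dplyr)
library(tidyr)

# Hypothetical two-row subset of the example data, rebuilt by hand
toy <- data.frame(
  fu1_date   = as.Date(c("2012-03-05", "2013-08-09")),
  fu1_n_symp = c(1L, 1L),
  fu5_date   = as.Date(c("2014-03-05", "2015-10-09")),
  fu5_n_symp = c(NA, 2L)
)

# ".value" keeps date and n_symp as separate columns,
# so each row holds one follow-up of one subject
long <- toy %>%
  mutate(id = row_number()) %>%
  pivot_longer(-id, names_to = c("fu", ".value"),
               names_pattern = "(fu\\d+)_(.*)")

# date and n_symp now sit side by side, so the fix is a plain replace()
fixed <- long %>%
  mutate(date = replace(date, is.na(n_symp), NA))

fixed
```

Because each date sits in the same row as its n_symp, no column-name matching is needed at the replacement step.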

names_vary in pivot_wider() controls the order in which the combined output column names are arranged.

  • "fastest" (default)
fu1_date   fu5_date   fu7_date   fu1_n_symp fu5_n_symp fu7_n_symp
  • "slowest"
fu1_date   fu1_n_symp fu5_date   fu5_n_symp fu7_date   fu7_n_symp
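
A small sketch of the two orderings, assuming a tidyr version that has names_vary (1.2.0 or later). The `long` data frame here is made up for illustration and is not the question's data.

```r
library(tidyr)

# Hypothetical long-format data: two subjects, two follow-ups each
long <- data.frame(
  id     = c(1L, 1L, 2L, 2L),
  fu     = c("fu1", "fu5", "fu1", "fu5"),
  date   = as.Date(c("2012-03-05", "2014-03-05", "2013-08-09", "2015-10-09")),
  n_symp = c(1L, NA, 1L, 2L)
)

# Default ("fastest"): all date columns first, then all n_symp columns
fastest <- pivot_wider(long, names_from = fu, values_from = c(date, n_symp),
                       names_glue = "{fu}_{.value}")

# "slowest": date and n_symp of the same follow-up stay next to each other
slowest <- pivot_wider(long, names_from = fu, values_from = c(date, n_symp),
                       names_glue = "{fu}_{.value}", names_vary = "slowest")

names(fastest)
names(slowest)
```

With "slowest" the columns come back in the same paired order as the original data frame, which is why the answer uses it.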

mbyulnm0  #2

Update: code adjusted after input from @Darren Tsai:

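library(readr)  # provides type_convert()

df %>% 
  mutate(id = row_number()) %>% 
  pivot_longer(-id,
               names_to = "key",
               values_to = "val", 
               values_transform = list(val = as.character)) %>% # convert all values to character
  # each *_date row is directly followed by its *_n_symp row, so lead(val) is the matching count
  mutate(val = ifelse(is.na(lead(val, default = val[1])), NA_character_, val)) %>% 
  pivot_wider(names_from = key, values_from = val) %>% 
  type_convert() %>% # re-guess the column types from the character values
  select(-id)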
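
The lead() step is the part that is easy to misread, so here is a minimal sketch of it on a plain character vector. The vector `val` is made up for illustration and mimics the long layout, where every date is immediately followed by its symptom count.

```r
library(dplyr)

# Hypothetical slice of the long "val" column: date, count, date, count
val <- c("2014-03-05", NA, "2016-03-05", "1")

# lead() shifts the vector up by one, so each date is compared
# with the count stored right after it
nxt <- lead(val, default = val[1])

# a date becomes NA when the value after it (its count) is missing
res <- ifelse(is.na(nxt), NA_character_, val)

res
```

Note that this relies on the columns being ordered date first, count second, for every follow-up. If the column order in the real data differs, the pairing no longer holds.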

The original version of this answer also used pivoting.
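
Since the question asks for dplyr only, a sketch without tidyr might look like the following. This is not taken from either answer: it assumes every fuX_date column has a matching fuX_n_symp column, and the names `prefixes`, `out`, `date_col` and `symp_col` are made up for illustration.

```r
library(dplyr)

# The example data from the question, rebuilt with as.Date()
df <- data.frame(
  fu1_date   = as.Date(c("2012-03-05", "2013-08-09", "2019-05-05")),
  fu1_n_symp = c(1L, 1L, 1L),
  fu5_date   = as.Date(c("2014-03-05", "2015-10-09", "2020-06-07")),
  fu5_n_symp = c(NA, 2L, 2L),
  fu7_date   = as.Date(c("2016-03-05", "2017-11-09", "2021-07-09")),
  fu7_n_symp = c(1L, NA, 2L)
)

# Follow-up prefixes ("fu1", "fu5", ...) taken from the *_date column names
prefixes <- sub("_date$", "", grep("_date$", names(df), value = TRUE))

out <- df
for (p in prefixes) {
  date_col <- paste0(p, "_date")
  symp_col <- paste0(p, "_n_symp")
  # blank out the date wherever the matching symptom count is missing
  out <- mutate(out, "{date_col}" := replace(.data[[date_col]],
                                             is.na(.data[[symp_col]]), NA))
}

out
```

The loop touches one date column per pass and leaves the column order and types alone, so no reshaping or type conversion is needed afterwards.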
