我有一个长格式的数据集。数据集中的每个对象都被观察了很多次(从1到3)。根据受试者的可用性,从周一到周五每天进行一次观察。如果病例观察少于3天,我需要为缺失的观察生成空行(s)如果一个事例在星期一和星期五被观察到,那么第三个观察应该随机出现在星期二、星期三或星期四。下面的代码运行良好,除了一个基本缺陷:当日期是随机产生的,它们可能与受试者已经被观察的日期重叠。2这不应该发生。3如果我的解决方案太复杂而无法编辑,请随意建议完全不同的代码。4谢谢!
#Toy dataset
dataset_long <- data.frame(
id = c(1, 1, 2, 2, 2, 3, 3, 4, 5, 5),
observation = c(1, 2, 1, 2, 3, 1, 2, 1, 1, 2),
day_name = c("Monday", "Tuesday", "Monday", "Wednesday", "Thursday", "Tuesday", "Thursday", "Wednesday", "Monday", "Friday"),
scores = sample(20:60, 10)
)
# case observation day scores
#1 1 1 Monday 32
#2 1 2 Tuesday 31
#3 2 1 Monday 29
#4 2 2 Wednesday 28
#5 2 3 Thursday 22
#6 3 1 Tuesday 45
#7 3 2 Thursday 30
#8 4 1 Wednesday 36
#9 5 1 Monday 58
#10 5 2 Friday 37
#Identify cases with fewer than 3 observations
cases_to_fill <- dataset_long %>%
group_by(case) %>%
summarize(days_observed = n()) %>%
filter(days_observed < 3) %>%
ungroup()
#Create vector with day names
all_days <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")
#Fill the missing observation day randomly
cases_to_fill_long <- cases_to_fill %>%
mutate(missing_day = map(days_observed, ~sample(all_days, 3 - .x))) %>%
unnest(missing_day) %>%
mutate(observation = 3)
#Join the filled cases with the original dataset
dataset_long_filled <- dataset_long %>%
full_join(cases_to_fill_long, by = c("case", "observation")) %>%
arrange(case, observation)
#Coalesce the two columns of day into one
dataset_long_filled |>
mutate(day = coalesce(day, missing_day)) |>
select(-days_observed, -missing_day)
# case observation day scores
#1 1 1 Monday 32
#2 1 2 Tuesday 31
#3 1 3 Friday NA
#4 2 1 Monday 29
#5 2 2 Wednesday 28
#6 2 3 Thursday 22
#7 3 1 Tuesday 45
#8 3 2 Thursday 30
#9 3 3 Tuesday NA Tuesday is repeated for this subject
#10 4 1 Wednesday 36
#11 4 3 Wednesday NA Wednesday is repeated for this subject
#12 4 3 Monday NA
#13 5 1 Monday 58
#14 5 2 Friday 37
#15 5 3 Friday NA Friday is repeated for this subject
1条答案
按热度按时间uwopmtnx1#
您可以尝试: