Public Health Epi - Applying a custom function to a date column without using dplyr::rowwise()

Posted by nr7wwzry on 2024-01-03 in Other

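First off, a big thank you to the Stack Overflow community. You have always been able to answer my questions without my having to ask one. I'm finally stuck writing a function and would appreciate any insight.
Background: I'm an epidemiologist working on viral respiratory disease surveillance. We usually look at data by "season", which starts on the Sunday of the first week of October and runs through the following year. For example, the 2023-2024 season started on October 1, 2023 and will end on September 28, 2024.
What I'm trying to do: the function below works on a single date, or on a data frame with dplyr::rowwise(), but is there a way to apply it to a column of a data frame without using dplyr::rowwise()?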

test_func <- function(date){
  
  if(class(date)!="Date"){
    stop("Input not in date format.")
  }
  
  year = as.numeric(strftime(date, "%Y"))
  days = date - as.Date(sprintf("%s-10-01", year))
  
  oct_start = date - days
  
  #1 = monday, 7 = sunday
  
  wkday = as.numeric(strftime(oct_start, "%u"))
  
  #10/1/2022 saturday, starts 10/2/2022
  #10/1/2021 friday, starts 10/3/2021
  #10/1/2020 thursday, starts 10/4/2020
  
  if(wkday %in% 1:6){
    season_start = oct_start - wkday
  } else{
    season_start = oct_start
  }
  
  season_end = season_start + 364
  
  if(date < season_start){
    season_start = season_start - 364
    season_end = season_start + 364
  }
  
  year1 = as.numeric(strftime(season_start, "%Y"))
  year2 = as.numeric(strftime(season_end, "%Y"))
  
  return(paste(year1,year2, sep = "-"))
}

test_func(as.Date("2024-09-28")) returns "2023-2024", which is correct.
If I apply the same function to a vector of dates, I get a "the condition has length > 1" error.

date = seq.Date(from = as.Date("2023-10-01"), to = as.Date("2023-10-31"), by = 1)
test_func(date)
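
The error comes from `if`, which accepts only a single TRUE/FALSE value. A minimal sketch with a hypothetical toy function (`month_label` is not part of the question) reproduces the problem. In R versions before 4.2.0 this was a warning rather than an error, so both are captured here:

```r
# Hypothetical toy function with a scalar-only `if`, mirroring test_func's structure
month_label <- function(d) {
  if (as.numeric(format(d, "%m")) == 10) "October" else "other"
}

# A single date works
one <- month_label(as.Date("2023-10-01"))

# A vector makes the condition longer than 1; capture the resulting message
# (an error in R >= 4.2.0, a warning in older versions)
dates <- as.Date(c("2023-10-01", "2023-11-01"))
msg <- tryCatch(
  month_label(dates),
  error = function(e) conditionMessage(e),
  warning = function(w) conditionMessage(w)
)

print(one)
print(msg)
```

Any branch written with `if`/`else` has to be replaced by a vectorized construct (or the function called once per element) before it can take a whole column.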
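Update: thanks for the responses, especially the ones about keeping things vectorized. Dates get a bit wonky with ifelse, so I had to bring in a helper function to preserve the class. Here is my final function: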

assign_season <- function(date){

  if(class(date)!="Date"){
    stop("Input not in date format.")
  }

  init_week_num = OCepi::to_mmwr_date(OCepi::week_ending_date(date), "week")

  adjust = ifelse(init_week_num %in% c(40:53), init_week_num - 40, 40 - init_week_num) * 7

  # init_season_start = OCepi::week_ending_date(date + adjust) - 6
  init_season_start = safe.ifelse(
    init_week_num %in% c(40:53),
    OCepi::week_ending_date(date - adjust) - 6,
    OCepi::week_ending_date(date + adjust) - 6
  )

  season_start = safe.ifelse(date < init_season_start, init_season_start - 364, init_season_start)

  season_end = season_start + 363

  year1 = as.numeric(strftime(season_start, "%Y"))
  year2 = as.numeric(strftime(season_end, "%Y"))

  return(paste(year1,substr(year2,3,4), sep = "-"))
}
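
The `safe.ifelse` helper is not defined in the post. `ifelse()` drops attributes, so a Date comes back as a bare number of days; a commonly used workaround re-applies the class of the `yes` argument. The definition below is an assumption about what the helper looks like, not the author's code:

```r
# Assumed definition of the helper (not shown in the original post):
# run ifelse(), then restore the class of `yes` so Dates stay Dates
safe.ifelse <- function(cond, yes, no) {
  structure(ifelse(cond, yes, no), class = class(yes))
}

d <- as.Date(c("2023-09-30", "2023-10-01"))
cutoff <- as.Date("2023-10-01")

plain <- ifelse(d < cutoff, d - 364, d)       # numeric: the Date class is dropped
kept  <- safe.ifelse(d < cutoff, d - 364, d)  # still a Date vector

class(plain)
class(kept)
kept
```

This sketch assumes `yes` and `no` have the same class; it does no checking of its own.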

Answer 1 (ldxq2e6h)

One way to do this without modifying the function:

# vector of dates
date <- seq.Date(
    from = as.Date("2024-09-25"),
    to = as.Date("2024-10-03"),
    by = 1
)

# create data.frame
df <- data.frame(date = date)

# base
df$out <- sapply(df$date, test_func)

Using dplyr:

# dplyr
df <- dplyr::mutate(df, out = sapply(date, test_func))
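
As a possible variation, `vapply()` declares the return type up front and fails loudly if any call returns something else, whereas `sapply()` guesses. The sketch below uses a hypothetical scalar-only stand-in (`season_scalar`, simplified to an October 1 cutoff instead of the Sunday rule from the question) so that it runs on its own:

```r
# Hypothetical scalar-only stand-in for test_func (simplified: the season is
# assumed to start on October 1, not on the Sunday rule from the question)
season_scalar <- function(date) {
  year <- as.numeric(format(date, "%Y"))
  if (date < as.Date(sprintf("%s-10-01", year))) {
    year <- year - 1
  }
  paste(year, year + 1, sep = "-")
}

df <- data.frame(date = as.Date(c("2024-09-30", "2024-10-01")))

# character(1) declares that each call must return exactly one string
df$out <- vapply(df$date, season_scalar, character(1))
df
```

Both `sapply()` and `vapply()` still call the function once per row, so they avoid `rowwise()` but not the per-element cost.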

Answer 2 (sdnqo3pr)

Numeric vectors can be multiplied by logical values:

> c(1, 2, 3)*c(c(1, 2, 3) %in% 1:2)
[1] 1 2 0

We can take advantage of this in your function:

test_func <- function(date) {
  if (class(date) != "Date") {
    stop("Input not in date format.")
  }
  year <- as.numeric(strftime(date, "%Y"))
  days <- date - as.Date(sprintf("%s-10-01", year))
  oct_start <- date - days
  ## 1=monday, 7=sunday
  wkday <- as.numeric(strftime(oct_start, "%u"))
  ## 10/1/2022 saturday, starts 10/2/2022
  ## 10/1/2021 friday, starts 10/3/2021
  ## 10/1/2020 thursday, starts 10/4/2020
  ## FIX 1 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
  # if (wkday %in% 1:6) {
  #   season_start <- oct_start - wkday
  # } else {
  #   season_start <- oct_start
  # }
  working_day <- wkday %in% 1:6
  season_start <- oct_start - wkday*working_day
  ## +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
  season_end <- season_start + 364
  ## FIX 2 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
  # if (date < season_start) {
  #   season_start <- season_start - 364
  #   season_end <- season_start + 364
  # }
  pre_season <- date < season_start
  season_start <- season_start - 364*pre_season
  season_end <- season_start + 364
  ## +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
  year1 <- as.numeric(strftime(season_start, "%Y"))
  year2 <- as.numeric(strftime(season_end, "%Y"))
  return(paste(year1, year2, sep="-"))
}

Usage

test_func(as.Date("2024-09-28"))
# [1] "2023-2024"

date <- seq.Date(from=as.Date("2023-12-24"), to=as.Date("2024-01-10"), by=1)
test_func(date)
#  [1] "2023-2024" "2023-2024" "2023-2024" "2023-2024" "2023-2024" "2023-2024" "2023-2024"
#  [8] "2023-2024" "2023-2024" "2023-2024" "2023-2024" "2023-2024" "2023-2024" "2023-2024"
# [15] "2023-2024" "2023-2024" "2023-2024" "2023-2024"
