R: Identify the column in which a value is found

Asked by jyztefdp, 12 months ago

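I have a data frame, DF, containing a record ID, 24 columns of ICD-10 codes, and 24 corresponding columns that indicate whether each ICD-10 code was present on admission: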

record_id   admitting_dx   principal_dx   poa_principal_dx   other_dx_1   poa_other_dx_1 ...
1111111     A000           A001           Y                  A0224        N
1111112     B409           B441           Y                  B464         N
1111113     J201           J850           N                  K37          Y

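The names of the columns containing ICD-10 codes are stored in a vector named ICD10_code_columns. All ICD-10 codes that correspond to an infection are stored in another vector named ICD10_infxn (about 1,800 ICD-10 codes).
I am using the following code to create a new column that indicates whether any of the columns in ICD10_code_columns contains any of the ICD-10 codes in ICD10_infxn: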

DF <- DF %>% 
   mutate(
      infxn = case_when(
          if_any(any_of(ICD10_code_columns), ~ . %in% ICD10_infxn) ~ TRUE, 
          TRUE ~ FALSE
      )
   )


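I would like to create an additional column named infxn_col_name that stores, for each record ID, the name of the first column containing an infection-related ICD-10 code, and NA for record IDs without any infection-related ICD-10 code. I would also like to determine whether the infection-related ICD-10 code was present on admission, based on the value stored in the corresponding POA_ column. Ideally the solution would use dplyr, but this is not required.
I tried using apply() and max.col(), which resulted in an "unable to evaluate" error: "operations are possible only for numeric, logical or complex types".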
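For reference, here is a minimal base R sketch of how max.col() could be applied here. It assumes the error came from passing the character code columns to an operation that expects numeric or logical input, so it first builds a logical match matrix. The toy data reuses the sample rows above, and the two-element ICD10_infxn is an illustrative stand-in for the real list.

```r
# Toy data mirroring the sample rows in the question.
DF <- data.frame(
  record_id    = c("1111111", "1111112", "1111113"),
  admitting_dx = c("A000", "B409", "J201"),
  principal_dx = c("A001", "B441", "J850"),
  other_dx_1   = c("A0224", "B464", "K37"),
  stringsAsFactors = FALSE
)
ICD10_code_columns <- c("admitting_dx", "principal_dx", "other_dx_1")
ICD10_infxn <- c("A000", "B464")  # illustrative subset, not the real list

# Logical matrix: TRUE where a cell holds an infection-related code.
is_infxn <- do.call(
  cbind,
  lapply(DF[ICD10_code_columns], function(col) col %in% ICD10_infxn)
)

# Index of the first TRUE in each row. Rows without any TRUE also return 1,
# so they are masked with any_hit below.
first_hit <- max.col(is_infxn * 1, ties.method = "first")
any_hit   <- rowSums(is_infxn) > 0

DF$infxn          <- any_hit
DF$infxn_col_name <- ifelse(any_hit, colnames(is_infxn)[first_hit], NA_character_)

DF$infxn_col_name
# "admitting_dx" "other_dx_1" NA
```

The mask matters because max.col() always returns some column index, even for a row in which nothing matched.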

Answer #1 (fykwrbwg)

I would offer the following tidyverse solution:

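library(tidyverse)

# Helper: x is a one-row tibble holding the ICD-10 code columns of one record.
# Returns (wrapped in a list) a tibble with a flag and the first matching column name.
foo = function(x){
  x %>% 
    t %>%                   # transpose: one row per code column
    as_tibble() %>%         # the codes end up in a column named V1
    mutate(rn=names(x),
           inICD = V1 %in% ICD10_infxn) %>% 
    mutate(cols_contain_ICD = any(inICD)) %>% 
    filter(inICD) %>% 
    transmute(cols_contain_ICD,
              firstcol = first(rn)) %>% 
    slice_head(n = 1) %>%   # keep a single row when several columns match
    list()
}

df %>% 
  rowwise() %>% 
  mutate(tmp = across(any_of(ICD10_code_columns)) %>% list()  ) %>% 
  mutate(tmp2 = foo(tmp)) %>% 
  select(-tmp) %>% 
  unnest_wider(tmp2) %>% 
  mutate(cols_contain_ICD = replace_na(cols_contain_ICD, FALSE))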

For the following data:

df <- tibble(
  record_id        = c("1111111", "1111112", "1111113"),
  admitting_dx     = c("A000", "B409", "J201"),
  principal_dx     = c("A001", "B441", "J850"),
  poa_principal_dx = c("Y", "Y", "N"),
  other_dx_1       = c("A0224", "B464", "K37"),
  poa_other_dx_1   = c("N", "N", "Y")
)

ICD10_code_columns <- c("admitting_dx", "principal_dx", "other_dx_1")
ICD10_infxn <- c("A000", "B464")


it returns the following result:

# A tibble: 3 × 8
  record_id admitting_dx principal_dx poa_principal_dx other_dx_1 poa_other_dx_1 cols_contain_ICD firstcol    
  <chr>     <chr>        <chr>        <chr>            <chr>      <chr>          <lgl>            <chr>       
1 1111111   A000         A001         Y                A0224      N              TRUE             admitting_dx
2 1111112   B409         B441         Y                B464       N              TRUE             other_dx_1  
3 1111113   J201         J850         N                K37        Y              FALSE            NA

Answer #2 (ruoxqz4g)

Thanks to asd-tm for the suggestion. It put me on the right track to a working solution. In the end, I used purrr to get the result I needed:

library(dplyr)
library(purrr)

#Function to find first column containing an infection-related ICD-10 code
#(ICD10_infxn is the character vector of codes described in the question)
find_infxn_col <- function(...) {
  codes <- c(...)
  matching_columns <- which(codes %in% ICD10_infxn)
  if(length(matching_columns) > 0) {
    return(names(codes)[matching_columns[1]])
  } else {
    return(NA_character_)  #typed NA so that pmap_chr() receives a character value
  }
}

DF <- DF %>% 
  mutate(
    infxn = case_when(
      if_any(any_of(ICD10_code_columns), ~ . %in% ICD10_infxn) ~ TRUE, 
      TRUE ~ FALSE
    ), 
    infxn_col = pmap_chr(across(any_of(ICD10_code_columns)), find_infxn_col), 
    poa_col_name = case_when(
      grepl("PRINC|OTH", infxn_col, ignore.case = TRUE) ~ paste0("POA_", infxn_col), 
      grepl("ADMITTING", infxn_col, ignore.case = TRUE) ~ infxn_col, 
      is.na(infxn_col) ~ NA_character_
    )
  )

#For infection code in ADMITTING column, store ICD-10 code in poa_col_value
#For infection code in PRINC or OTH column, store associated POA value in poa_col_value
DF <- DF %>% 
  rowwise() %>% 
  mutate(
    poa_col_value = ifelse(!is.na(poa_col_name), get(poa_col_name), NA_character_)
  ) %>% 
  ungroup()

#Convert values in poa_col_value to TRUE/FALSE if not NA
DF <- DF %>% 
  mutate(
    infxn_poa = case_when(
      poa_col_value == "N" ~ FALSE, 
      !is.na(poa_col_value) ~ TRUE 
    )
  )
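
As a possible alternative to the rowwise() and get() step, the POA lookup can be sketched in base R with matrix indexing. This is only a sketch under assumptions: the lowercase "poa_" prefix follows the sample table in the question, and the infxn_col values are filled in by hand to match what the column search would return for that sample. Unlike the code above, it leaves poa_col_value as NA for the admitting diagnosis and simply treats that diagnosis as present on admission.

```r
# Sample rows from the question, plus a hand-filled infxn_col (illustrative).
DF <- data.frame(
  record_id        = c("1111111", "1111112", "1111113"),
  admitting_dx     = c("A000", "B409", "J201"),
  principal_dx     = c("A001", "B441", "J850"),
  poa_principal_dx = c("Y", "Y", "N"),
  other_dx_1       = c("A0224", "B464", "K37"),
  poa_other_dx_1   = c("N", "N", "Y"),
  infxn_col        = c("admitting_dx", "other_dx_1", NA),
  stringsAsFactors = FALSE
)

# Name of the POA column paired with each record's infection column.
# Assumption: POA columns are named "poa_" + diagnosis column name.
poa_col <- paste0("poa_", DF$infxn_col)
has_poa <- poa_col %in% names(DF)  # FALSE for NA and for admitting_dx

# Two-column (row, column) index matrix picks one cell per record.
idx <- cbind(which(has_poa), match(poa_col[has_poa], names(DF)))
poa_value <- rep(NA_character_, nrow(DF))
poa_value[has_poa] <- as.matrix(DF)[idx]
DF$poa_col_value <- poa_value

# An admitting diagnosis counts as present on admission; otherwise use the flag.
DF$infxn_poa <- ifelse(
  is.na(DF$infxn_col), NA,
  ifelse(DF$infxn_col == "admitting_dx", TRUE, poa_value != "N")
)

DF$infxn_poa
# TRUE FALSE NA
```

Indexing with a two-column matrix avoids evaluating a column name per row, which may matter on a large data frame.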
