R语言 从变量中删除负值,并在新变量中仅保留大于等于0的值

xfyts7mz  于 2023-10-13  发布在  其他
关注(0)|答案(1)|浏览(95)

新手到R。
我有1年的患者数据。原始数据集包括患者数据,其中包含50多个变量,例如Pat_id、年龄、种族、实验室检查、HbA 1c日期、值、HbA1c_baseline_date、Hb_m_days_HbA1c_Date_1、Hb_m_days_HbA1c_Date_2、.直至Hb_m_days_HbA1c_Date_15。
样本数据集包括Pat_ID、HbA1c_baseline_date、Hb_m_days_HbA1c_Date_1、Hb_m_days_HbA1c_Date_2、Hb_m_days_HbA1c_Date_3、Hb_m_days_HbA1c_Date_4、Hb_m_days_HbA1c_Date_5。
“PAt_id”-唯一患者ID。
“HbA1c_baseline_date”-是基线HbA 1c日期
“Hb_m_days_HbA1c_Date_”= HbA1c_Baseline日期与相应随访日期之间的差值。

sampledata <- data.frame(
  Pat_ID = c(1123, 1134, 1138, 1139),
  HbA1c_Baseline_Date = c("10/2/2019", "8/28/2019", "10/16/2019", "9/30/2019"),
  Hb_m_days_HbA1c_Date_1 = c(-421, -261, 0, 0),
  Hb_m_days_HbA1c_Date_2 = c(-329, -168, NA, 95),
  Hb_m_days_HbA1c_Date_3 = c(-196, -142, NA, 245),
  Hb_m_days_HbA1c_Date_4 = c(-105, -141, NA, NA),
  Hb_m_days_HbA1c_Date_5 = c(0, 0, NA, NA),
  Hb_m_days_HbA1c_Date_6 = c(97, 89, NA, NA),
)

我希望R创建一个新的变量,如“baseline”、“followup_1”、“followup_2”、“followup_3”、“followup_4”、“followup_5”等,直到原始数据集中的“followup_12”。
我想用R扫描所有“Hb_m_days_HbA1c_date_“s列,删除负值,找出“0”并将其保留在“基线”列中,之后的任何数字都应在下一个相应列“followup_1”、“followup_2”等中。
我试着使用图书馆(tidyverse),但它没有多大帮助。str(dataraw)的输出:

data.frame':    852 obs. of  167 variables:
 $ Pat_id                      : int  10004 10010 10012 10013 10014 10015 10019 10021 10025 10030 ...
 $ practice_id                    : int  11 11 13 11 12 14 8 8 14 12 ...
 $ rph_id                         : int  7 7 1 7 10 1 2 2 1 10 ...
 $ index_date                     : chr  "2019-10-02" "2019-08-28" "2019-07-19" "2019-07-10" ...
 $ study_group                    : int  1 1 1 1 1 1 1 1 1 1 ...
 $ age                            : int  67 56 39 34 31 72 48 57 59 46 ...
 $ gender                         : int  1 1 1 0 0 0 1 1 1 0 ...
 $ race                           : int  3 1 1 2 2 1 2 1 2 2 ...
 $ medicare                       : int  1 0 0 0 0 1 0 0 0 0 ...
 $ commercial                     : int  0 0 0 0 0 0 0 0 1 0 ...
 $ hypertension                   : int  1 1 0 0 0 0 0 1 1 1 ...
 $ diabetes                       : int  1 1 1 1 1 1 1 1 1 1 ...
 $ cc_total                       : int  1 1 3 1 2 2 1 1 1 1 ...
 $ num_meds                       : int  11 16 8 4 7 5 8 14 8 16 ...
 $ sbp_baseline_value             : int  144 120 114 125 137 138 140 126 144 151 ...
 $ dbp_baseline_value             : int  60 76 72 84 78 82 98 84 70 96 ...
 $ hba1c_baseline_value           : num  9.2 11.2 11.1 10.4 11.3 10.3 12.8 10 13.1 9.6 ...
 $ awv_inclusion                  : int  NA NA NA NA NA NA NA NA NA NA ...
 $ diabetes_inclusion_9           : int  NA NA NA NA NA NA NA NA NA NA ...
 $ diabetes_inclusion_7           : int  NA NA NA NA NA NA NA NA NA NA ...
 $ bp_inclusion                   : int  NA NA NA NA NA NA NA NA NA NA ...
 $ dm_match_7                     : int  1 1 1 1 1 1 1 1 1 1 ...
 $ dm_match_9                     : int  1 1 1 1 1 1 1 1 1 1 ...
 $ awv_2019                       : int  1 1 1 1 1 1 1 1 1 1 ...
 $ dsm_dc_2019                    : int  1 1 1 1 1 1 1 1 1 1 ...
 $ dsm_htn_2019                   : int  0 0 0 0 0 0 0 0 0 1 ...
 $ diabetes_subset7               : int  1 1 1 1 1 1 1 1 1 1 ...
 $ diabetes_subset9               : int  1 1 1 1 1 1 1 1 1 1 ...
 $ WHITE                          : int  0 1 1 0 0 1 0 1 0 0 ...
 $ BLACK                          : int  0 0 0 1 1 0 1 0 1 1 ...
 $ OTHER                          : int  1 0 0 0 0 0 0 0 0 0 ...
 $ DIABETES_SUPER7                : int  1 1 1 1 1 1 1 1 1 1 ...
 $ DIABETES_SUPER9                : int  1 1 1 1 1 1 1 1 1 1 ...
 $ hba1c_date                     : chr  "2019-10-02" "2019-08-28" "2019-10-16" "2019-09-30" ...
 $ hba1c_final_2019_date          : chr  "2019-10-02" "2019-08-28" "2019-10-16" "2019-09-30" ...
 $ Max..hba1c_relative_index_date.: int  0 0 89 82 149 -13 -15 -22 206 224 ...
 $ hba1c_endyear_value            : num  9.2 11.2 9.2 10.7 6.7 10.3 12.8 10 6.4 6.3 ...
 $ a1c_control_7_2019             : int  0 0 0 0 1 0 0 0 1 1 ...
 $ a1c_control_8_2019             : int  0 0 0 0 1 0 0 0 1 1 ...
 $ a1c_control_9_2019             : int  0 0 0 0 1 0 0 0 1 1 ...
 $ HbA1c.Poor.Control_2019        : int  1 1 1 1 0 1 1 1 0 0 ...
 $ bp_date                        : chr  "2019-10-02" "2019-11-06" "2019-12-27" "2019-09-30" ...
 $ bp_final_2019_date             : chr  "2019-10-02" "2019-11-06" "2019-12-27" "2019-09-30" ...
 $ Max..bp_relative_index_date.   : int  0 70 161 82 261 -18 -15 0 310 223 ...
 $ bp_relative_index_date         : int  0 70 161 82 261 -18 -15 0 310 223 ...
 $ sbp_endyear_value              : int  144 133 128 139 135 138 140 126 127 148 ...
 $ dbp_endyear_value              : int  60 86 88 71 89 82 98 84 68 87 ...
 $ sbp_control_2019               : int  0 1 1 1 1 1 0 1 1 0 ...
 $ dbp_control_2019               : int  1 1 1 1 1 1 0 1 1 1 ...
 $ bp_control_2019                : int  0 1 1 1 1 1 0 1 1 0 ...
 $ acr_baseline_date              : chr  "2019-03-21" "2019-08-29" "2019-07-20" "2019-06-28" ...
 $ acr                            : num  39.5 6.5 6.9 3.8 NA 9.5 6.9 NA 17.7 6.9 ...
 $ eGFR_baseline_date             : chr  "2019-10-03" "2019-08-29" "2019-07-23" "2019-06-28" ...
 $ eGFR                           : int  92 98 40 124 60 86 52 79 86 65 ...
 $ HDL_baseline_date              : chr  "2019-03-21" "2019-08-29" "2019-02-28" "2019-06-28" ...
 $ HDL                            : int  41 37 44 55 36 54 42 NA 42 23 ...
 $ LDL_baseline_date              : chr  "2019-03-21" "2019-08-29" "2019-02-28" "2019-06-28" ...
 $ LDL                            : num  81 70 90 56 93 59 64 NA 91 NA ...
 $ TG_baseline_date               : chr  "2019-03-21" "2019-08-29" "2019-02-28" "2019-06-28" ...
 $ TG                             : int  215 115 205 86 180 88 70 NA 172 589 ...
 $ TC_baseline_date               : chr  "2019-03-21" "2019-08-29" "2019-02-28" "2019-06-28" ...
 $ TC                             : int  165 130 175 128 165 131 120 NA 167 147 ...
 $ Non.HDL                        : int  124 93 131 73 129 77 78 NA 125 124 ...
 $ VLDL_baseline_date             : chr  "2019-03-21" "2019-08-29" "2019-02-28" "2019-06-28" ...
 $ VLDL                           : num  43 23 41 17 36 18 14 NA 34 NA ...
 $ height_in_baseline             : num  62 64.5 62 NA 72 NA NA 60 68 72 ...
 $ weight_lbs_baseline            : num  280 192 237 NA NA ...
 $ index_date_month               : int  10 8 7 7 4 10 12 10 2 5 ...
 $ index_date_quarter             : int  4 3 3 3 2 4 4 4 1 2 ...
 $ smoking_status                 : int  2 3 2 3 2 3 3 2 2 1 ...
 $ elix_obesity                   : int  1 0 1 0 1 0 0 0 0 1 ...
 $ elix_depression                : int  1 0 0 0 0 0 0 0 0 0 ...
 $ icd_10_chart                   : int  10 4 12 1 4 1 1 3 5 8 ...
 $ X_PS_                          : num  0.3575 0.3941 0.0882 0.2653 0.2291 ...
 $ X_Lps_2                        : num  -0.586 -0.43 -2.336 -1.019 -1.213 ...
 $ X_MATCHWGT_                    : num  1 1 1 1 1 1 1 1 1 1 ...
 $ X_MatchID_2                    : int  189 217 1 105 61 16 227 158 248 255 ...
 $ days_1                         : int  -421 0 -261 -13 -14 -467 -431 -380 -146 0 ...
 $ HbA1c_Date_1                   : chr  "2018-08-07" "2019-08-28" "2018-10-31" "2019-06-27" ...
 $ HbA1c_1                        : num  7.9 11.2 7 10.4 11.3 8.8 6.4 8 8.2 9.6 ...
 $ days_2                         : int  -329 NA -168 82 149 -13 -257 -22 -1 93 ...
 $ HbA1c_Date_2                   : chr  "2018-11-07" NA "2019-02-01" "2019-09-30" ...
 $ HbA1c_2                        : num  8.8 NA 7.7 10.7 6.7 10.3 6.8 10 13.1 5.9 ...
 $ days_3                         : int  -196 NA -142 232 NA NA -96 169 92 94 ...
 $ HbA1c_Date_3                   : chr  "2019-03-20" NA "2019-02-27" "2020-02-27" ...
 $ HbA1c_3                        : num  9.1 NA 8.1 11 NA NA 7.1 8.1 7.6 6 ...
 $ days_4                         : int  -105 NA -141 NA NA NA -15 NA 206 224 ...
 $ HbA1c_Date_4                   : chr  "2019-06-19" NA "2019-02-28" NA ...
 $ HbA1c_4                        : num  9.1 NA 8.2 NA NA NA 12.8 NA 6.4 6.3 ...
 $ days_5                         : int  0 NA 0 NA NA NA 190 NA NA NA ...
 $ HbA1c_Date_5                   : chr  "2019-10-02" NA "2019-07-19" NA ...
 $ HbA1c_5                        : num  9.2 NA 11.1 NA NA NA 9 NA NA NA ...
 $ days_6                         : int  97 NA 89 NA NA NA NA NA NA NA ...
 $ HbA1c_Date_6                   : chr  "2020-01-07" NA "2019-10-16" NA ...
 $ HbA1c_6                        : num  9.1 NA 9.2 NA NA NA NA NA NA NA ...
 $ days_7                         : int  NA NA 168 NA NA NA NA NA NA NA ...
 $ HbA1c_Date_7                   : chr  NA NA "2020-01-03" NA ...
 $ HbA1c_7                        : num  NA NA 10.1 NA NA NA NA NA NA NA ...
 $ days_8                         : int  NA NA NA NA NA NA NA NA NA NA ... The list output is truncated.

输出数据集应该如下所示-
| Pat_id|基线|followup_1|后续_2| followup_3|后续_4|后续_5|后续_6|后续_7|
| --|--|--|--|--|--|--|--|--|
| 1123 | 0 | 97 |||||||
| 1134 | 0 | 89 | 168 ||||||
| 1138 | 0 ||||||||
| 1139 | 0 | 95 | 245 ||||||

pexxcrt2

pexxcrt21#

使用tidyverse函数的一个潜在选项:

library(tidyverse)

sampledata <- data.frame(
  Pat_ID = c(1123, 1134, 1138, 1139),
  HbA1c_Baseline_Date = c("10/2/2019", "8/28/2019", "10/16/2019", "9/30/2019"),
  Hb_m_days_HbA1c_Date_1 = c(-421, -261, 0, 0),
  Hb_m_days_HbA1c_Date_2 = c(-329, -168, NA, 95),
  Hb_m_days_HbA1c_Date_3 = c(-196, -142, NA, 245),
  Hb_m_days_HbA1c_Date_4 = c(-105, -141, NA, NA),
  Hb_m_days_HbA1c_Date_5 = c(0, 0, NA, NA),
  Hb_m_days_HbA1c_Date_6 = c(97, 89, NA, NA)
)

df1 <- sampledata %>%
  pivot_longer(starts_with("days")) %>% # pivot the dataset to long format
  group_by(Pat_ID) %>%                              # group results by patient ID
  mutate(followup = case_when(value < 0 ~ NA,            # if the value is < 0 make it NA
                              value == 0 ~ "baseline",       # if the value == 0, make it baseline
                              value > 0 ~ paste("followup",      # if the value is > 0
                                                cumsum(value > 0),     # count each occurrence
                                                sep = "_"))) %>%        # and create followup_1, followup_2 etc
  na.omit() %>%                                     # then delete lines that contain an NA (don't need them)
  pivot_wider(names_from = followup,                # and pivot the data back to wide format
              values_from = value,
              id_cols = Pat_ID) %>%
  ungroup()                                         # and ungroup the dataframe

df1
#> # A tibble: 4 × 4
#>   Pat_ID baseline followup_1 followup_2
#>    <dbl>    <dbl>      <dbl>      <dbl>
#> 1   1123        0         97         NA
#> 2   1134        0         89         NA
#> 3   1138        0         NA         NA
#> 4   1139        0         95        245

# To add the new columns to your original dataframe
df2 <- sampledata %>%
  left_join(df1)
#> Joining with `by = join_by(Pat_ID)`

df2
#>   Pat_ID HbA1c_Baseline_Date Hb_m_days_HbA1c_Date_1 Hb_m_days_HbA1c_Date_2
#> 1   1123           10/2/2019                   -421                   -329
#> 2   1134           8/28/2019                   -261                   -168
#> 3   1138          10/16/2019                      0                     NA
#> 4   1139           9/30/2019                      0                     95
#>   Hb_m_days_HbA1c_Date_3 Hb_m_days_HbA1c_Date_4 Hb_m_days_HbA1c_Date_5
#> 1                   -196                   -105                      0
#> 2                   -142                   -141                      0
#> 3                     NA                     NA                     NA
#> 4                    245                     NA                     NA
#>   Hb_m_days_HbA1c_Date_6 baseline followup_1 followup_2
#> 1                     97        0         97         NA
#> 2                     89        0         89         NA
#> 3                     NA        0         NA         NA
#> 4                     NA        0         95        245

创建于2023-10-09带有reprex v2.0.2

相关问题