Replace multiple column values based on a logical condition from the same data frame

8xiog9wr posted on 2023-02-06 in Other

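I have a data frame df. In the columns df[c("PhysicalActivity_yn_agesurvey", "smoker_former_or_never_yn_agesurvey", "NOT_RiskyHeavyDrink_yn_agesurvey", "Not_obese_yn_agesurvey", "HEALTHY_Diet_yn_agesurvey")], I want to replace with NA every value for which the comparison with df$SURVEY_MIN is TRUE, i.e. every value that differs from SURVEY_MIN in the same row. How can I do this in R?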

df <- structure(list(PhysicalActivity_yn_agesurvey = c(58, 47, 47, 
50, 53, 59), smoker_former_or_never_yn_agesurvey = c(58, 47, 
47, 50, 53, 59), NOT_RiskyHeavyDrink_yn_agesurvey = c(59, 48, 
47, 50, 53, 59), Not_obese_yn_agesurvey = c(58, 47, 47, 50, 53, 
59), HEALTHY_Diet_yn_agesurvey = c(58, 47, 47, 50, 53, 59), SURVEY_MIN = c(58, 
47, 47, 50, 53, 59)), row.names = c(NA, 6L), class = "data.frame")
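The comparison described above can be evaluated on its own to see what it returns. A minimal sketch, assuming the sample df from the question; cols and mask are illustrative names that do not appear in the question. Comparing a data frame with a vector whose length equals nrow(df) recycles the vector down each column, so the result is a logical matrix with TRUE wherever a value differs from SURVEY_MIN in the same row:

```r
# Sample data from the question
df <- data.frame(
  PhysicalActivity_yn_agesurvey       = c(58, 47, 47, 50, 53, 59),
  smoker_former_or_never_yn_agesurvey = c(58, 47, 47, 50, 53, 59),
  NOT_RiskyHeavyDrink_yn_agesurvey    = c(59, 48, 47, 50, 53, 59),
  Not_obese_yn_agesurvey              = c(58, 47, 47, 50, 53, 59),
  HEALTHY_Diet_yn_agesurvey           = c(58, 47, 47, 50, 53, 59),
  SURVEY_MIN                          = c(58, 47, 47, 50, 53, 59)
)

# Hypothetical helper: the five columns to check (all but SURVEY_MIN)
cols <- setdiff(names(df), "SURVEY_MIN")

# Logical matrix with one row per observation and one column per entry of cols;
# TRUE marks a value that differs from SURVEY_MIN in the same row
mask <- df[cols] != df$SURVEY_MIN
print(mask)

# Number of cells that would be replaced with NA
sum(mask)
```

A full-size logical matrix like this can be used directly as an index when assigning into the matching columns, as in df[cols][mask] <- NA.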

This is the code I tried:

df[lapply(df, function(x) ifelse(x != df$SURVEY_MIN, TRUE, FALSE))] <- NA

I also tried:

df[c("PhysicalActivity_yn_agesurvey", "smoker_former_or_never_yn_agesurvey", "NOT_RiskyHeavyDrink_yn_agesurvey",
                "Not_obese_yn_agesurvey", "HEALTHY_Diet_yn_agesurvey")] [df[c("PhysicalActivity_yn_agesurvey", "smoker_former_or_never_yn_agesurvey", "NOT_RiskyHeavyDrink_yn_agesurvey",
                 "Not_obese_yn_agesurvey", "HEALTHY_Diet_yn_agesurvey")] != df$SURVEY_MIN] <- NA
Answer #1 (bnl4lu3b)

Writing for loops in R is very bad practice (in 99% of cases)!

# Compare every column with SURVEY_MIN; SURVEY_MIN always equals itself, so it is never set to NA
df[df != df$SURVEY_MIN] <- NA

That's all you need.
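The one-liner compares every column, SURVEY_MIN included. That is harmless here because SURVEY_MIN always equals itself, but in a wider data frame it would also put NA into any unrelated column whose values differ from SURVEY_MIN. A sketch restricted to the five columns from the question, written with lapply() in the spirit of the question's first attempt; cols is an illustrative name, and the sample df from the question is assumed:

```r
# Sample data from the question
df <- data.frame(
  PhysicalActivity_yn_agesurvey       = c(58, 47, 47, 50, 53, 59),
  smoker_former_or_never_yn_agesurvey = c(58, 47, 47, 50, 53, 59),
  NOT_RiskyHeavyDrink_yn_agesurvey    = c(59, 48, 47, 50, 53, 59),
  Not_obese_yn_agesurvey              = c(58, 47, 47, 50, 53, 59),
  HEALTHY_Diet_yn_agesurvey           = c(58, 47, 47, 50, 53, 59),
  SURVEY_MIN                          = c(58, 47, 47, 50, 53, 59)
)

# Hypothetical helper: the five survey columns (everything except SURVEY_MIN)
cols <- setdiff(names(df), "SURVEY_MIN")

# For each survey column, replace() puts NA wherever the value differs from
# SURVEY_MIN in the same row; SURVEY_MIN and any other columns are untouched
df[cols] <- lapply(df[cols], function(x) replace(x, x != df$SURVEY_MIN, NA))
df
```

lapply() returns a list of modified columns, and assigning that list back to df[cols] replaces those columns in place; this is the step the first attempt was missing, since it passed the list as an index instead.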

Answer #2 (xoefb8l8)

I hope I understood your question correctly, but this should solve it:

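# Assumes SURVEY_MIN is the last column of df
for (i in seq_len(nrow(df))) {
  for (j in seq_len(ncol(df) - 1)) {
    # Blank out any value that differs from SURVEY_MIN in the same row
    if (df[i, j] != df$SURVEY_MIN[i]) {
      df[i, j] <- NA
    }
  }
}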
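On the sample data, this loop and the vectorised one-liner from the first answer should produce the same result. A quick check, assuming the sample df from the question; df_loop and df_vec are illustrative copies. Note that the loop relies on SURVEY_MIN being the last column, and its if () would stop with an error if the data contained missing values, whereas the vectorised assignment tolerates them:

```r
# Sample data from the question
df <- data.frame(
  PhysicalActivity_yn_agesurvey       = c(58, 47, 47, 50, 53, 59),
  smoker_former_or_never_yn_agesurvey = c(58, 47, 47, 50, 53, 59),
  NOT_RiskyHeavyDrink_yn_agesurvey    = c(59, 48, 47, 50, 53, 59),
  Not_obese_yn_agesurvey              = c(58, 47, 47, 50, 53, 59),
  HEALTHY_Diet_yn_agesurvey           = c(58, 47, 47, 50, 53, 59),
  SURVEY_MIN                          = c(58, 47, 47, 50, 53, 59)
)

# Loop version (assumes SURVEY_MIN is the last column)
df_loop <- df
for (i in seq_len(nrow(df_loop))) {
  for (j in seq_len(ncol(df_loop) - 1)) {
    if (df_loop[i, j] != df_loop$SURVEY_MIN[i]) {
      df_loop[i, j] <- NA
    }
  }
}

# Vectorised version from the first answer
df_vec <- df
df_vec[df_vec != df_vec$SURVEY_MIN] <- NA

# TRUE if both approaches produce the same data frame
isTRUE(all.equal(df_loop, df_vec))
```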

Answer #3 (20jt8wwn)

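You first need to create a data frame of zeros, which is then filled in according to your condition (an if statement in R). This approach uses a loop in which each cell is compared with the corresponding value in the column SURVEY_MIN. So first I create a data frame called df_result that leaves out the column you are comparing against (SURVEY_MIN); you can join it back later: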

df_result <- data.frame(PhysicalActivity_yn_agesurvey = numeric(nrow(df)), 
                    smoker_former_or_never_yn_agesurvey = numeric(nrow(df)), 
                    NOT_RiskyHeavyDrink_yn_agesurvey = numeric(nrow(df)), 
                    Not_obese_yn_agesurvey = numeric(nrow(df)), 
                    HEALTHY_Diet_yn_agesurvey = numeric(nrow(df)))
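Listing every column by hand works, but the same zero-filled frame can also be built from the column names. A sketch, assuming the sample df from the question and that the columns of interest are all columns except SURVEY_MIN; cols is an illustrative name:

```r
# Sample data from the question
df <- data.frame(
  PhysicalActivity_yn_agesurvey       = c(58, 47, 47, 50, 53, 59),
  smoker_former_or_never_yn_agesurvey = c(58, 47, 47, 50, 53, 59),
  NOT_RiskyHeavyDrink_yn_agesurvey    = c(59, 48, 47, 50, 53, 59),
  Not_obese_yn_agesurvey              = c(58, 47, 47, 50, 53, 59),
  HEALTHY_Diet_yn_agesurvey           = c(58, 47, 47, 50, 53, 59),
  SURVEY_MIN                          = c(58, 47, 47, 50, 53, 59)
)

# Hypothetical helper: every column except the one used for comparison
cols <- setdiff(names(df), "SURVEY_MIN")

# Zero-filled data frame with one column per name in cols,
# built from a numeric matrix instead of listing each column by hand
df_result <- as.data.frame(
  matrix(0, nrow = nrow(df), ncol = length(cols), dimnames = list(NULL, cols))
)
str(df_result)
```

Deriving the names from df keeps df_result in sync if columns are added or renamed later.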

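Then fill in the cells according to your condition: loop over every cell of the five columns in df, compare it with SURVEY_MIN in the same row, and save the result in df_result: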

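# Loop over every row and over the five survey columns
for (i in seq_len(nrow(df))) {
  for (j in 1:5) {
    if (df[i, j] == df$SURVEY_MIN[i]) {
      # Value matches SURVEY_MIN in this row: keep it
      df_result[i, j] <- df[i, j]
    } else {
      # Value differs: store NA instead
      df_result[i, j] <- NA
    }
  }
}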

This shows that only two values differ from the corresponding row value in SURVEY_MIN, and both come from NOT_RiskyHeavyDrink_yn_agesurvey:

df_result
  PhysicalActivity_yn_agesurvey smoker_former_or_never_yn_agesurvey NOT_RiskyHeavyDrink_yn_agesurvey Not_obese_yn_agesurvey HEALTHY_Diet_yn_agesurvey
1                            58                                  58                               NA                     58                        58
2                            47                                  47                               NA                     47                        47
3                            47                                  47                               47                     47                        47
4                            50                                  50                               50                     50                        50
5                            53                                  53                               53                     53                        53
6                            59                                  59                               59                     59                        59
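The answer mentions joining SURVEY_MIN back afterwards. One way to do that is cbind(). A self-contained sketch, assuming the sample df from the question; df_result is recreated here with a vectorised step instead of the loop for brevity, and cols and df_final are illustrative names:

```r
# Sample data from the question
df <- data.frame(
  PhysicalActivity_yn_agesurvey       = c(58, 47, 47, 50, 53, 59),
  smoker_former_or_never_yn_agesurvey = c(58, 47, 47, 50, 53, 59),
  NOT_RiskyHeavyDrink_yn_agesurvey    = c(59, 48, 47, 50, 53, 59),
  Not_obese_yn_agesurvey              = c(58, 47, 47, 50, 53, 59),
  HEALTHY_Diet_yn_agesurvey           = c(58, 47, 47, 50, 53, 59),
  SURVEY_MIN                          = c(58, 47, 47, 50, 53, 59)
)
cols <- setdiff(names(df), "SURVEY_MIN")

# Recreate df_result: keep values that match SURVEY_MIN, NA otherwise
df_result <- df[cols]
df_result[df_result != df$SURVEY_MIN] <- NA

# Attach the comparison column again as the last column
df_final <- cbind(df_result, SURVEY_MIN = df$SURVEY_MIN)
df_final
```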
