R - Split numeric values from units in data frame columns and append the units to the column names

Posted by zpjtge22 on 2023-11-14 in Other

I have a sample data frame as follows:

    df_test <- data.frame(
      ID = c(1:5),
      Length = c("17 mm", "20 mm", "32 mm", "69 mm", "100 mm"),
      Vmax = c("17 cm2", "20 cm2", "32 cm2", "69 cm2", "100 cm2"),
      Age = c("17 years", "20 years", "32 years", "69 years", "100 years"),
      PG = c("17 mmHg", "20 mmHg", "32 mmHg", "69 mmHg", "100 mmHg"),
      Weight = c("17 kg", "20 kg", "32 kg", "69 kg", "100 kg")
    )

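I want to identify the unit in each column, append it to the column name, and keep only the numeric values. The result should look like this: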

      ID Length_mm Vmax_cm2 Age_years PG_mmHg Weight_kg
    1  1        17       17        17      17        17
    2  2        20       20        20      20        20
    3  3        32       32        32      32        32
    4  4        69       69        69      69        69
    5  5       100      100       100     100       100


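Next, I would like to go one step further and be able to specify which columns in a range this rule is applied to.
With the code below, I can do this successfully for a single specified column: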

    df_test$Vmax
    # extract the unit: everything after the last space
    units <- gsub(".* ", "", df_test$Vmax)
    units <- units[!is.na(units)]
    units <- unique(units)
    # keep only the numeric part: everything before the first space
    df_test$Vmax <- as.numeric(gsub(" .*$", "", df_test$Vmax))
    # append the unit to the column name
    col_index <- which(colnames(df_test) == "Vmax")
    col_name <- colnames(df_test)[col_index]
    new_col_name <- paste(col_name, units, sep = "_")
    names(df_test)[col_index] <- new_col_name


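I don't know how to apply this approach to multiple columns. I tried putting it in a for loop and using mutate(across()), but without success.
Note: to clarify, the value and the unit are always separated by a space. The unit is consistent within any given column. Columns may also contain some NA values.
Any help with this would be greatly appreciated! Thanks!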

Answer 1 (ifsvaxew)

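We can reshape to long format, split each measurement into a value and a unit (converting the value to numeric data), and then reshape back to wide format, combining the original column name and the unit into the new name.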

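    library(tidyr)
    df_test %>%
      # one row per ID/column pair
      pivot_longer(-ID) %>%
      # split "17 mm" into val = 17 and unit = "mm"; convert = TRUE makes val numeric
      separate(value, c("val", "unit"), convert = TRUE) %>%
      # back to wide, naming columns <name>_<unit>
      pivot_wider(names_from = c(name, unit), values_from = val)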

Result:

    # A tibble: 5 x 6
         ID Length_mm Vmax_cm2 Age_years PG_mmHg Weight_kg
      <int>     <int>    <int>     <int>   <int>     <int>
    1     1        17       17        17      17        17
    2     2        20       20        20      20        20
    3     3        32       32        32      32        32
    4     4        69       69        69      69        69
    5     5       100      100       100     100       100
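
The question also asks how to restrict the rule to chosen columns. The following is only a sketch of one way the reshaping approach might be limited to a subset, not part of the original answer: `target_cols` is a hypothetical name for the selection, and the code assumes every selected value has the form "number unit" with a single space and no NA entries.

```r
library(tidyr)

df_test <- data.frame(
  ID = 1:5,
  Length = c("17 mm", "20 mm", "32 mm", "69 mm", "100 mm"),
  Vmax = c("17 cm2", "20 cm2", "32 cm2", "69 cm2", "100 cm2"),
  Weight = c("17 kg", "20 kg", "32 kg", "69 kg", "100 kg")
)

# Hypothetical selection: only these columns are split and renamed
target_cols <- c("Length", "Vmax")

res <- df_test %>%
  # reshape only the selected columns; all others stay as identifier columns
  pivot_longer(all_of(target_cols)) %>%
  # split on the single space; convert = TRUE turns "17" into a number
  separate(value, c("val", "unit"), sep = " ", convert = TRUE) %>%
  pivot_wider(names_from = c(name, unit), values_from = val)

res
```

Columns that are not selected (here `Weight`) are left as character strings and act as identifier columns, so the converted columns end up after them in the result.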

Answer 2 (jhkqcmku)

Try this:

    library(tidyverse)
    df_test %>%
      pivot_longer(-ID, names_to = "column", values_to = "value") %>%
      # split "17 mm" into value = "17" and unit = "mm"
      separate(col = value, into = c("value", "unit")) %>%
      transmute(
        ID,
        column = str_c(column, unit, sep = "_"),
        value = as.numeric(value)
      ) %>%
      pivot_wider(names_from = column, values_from = value)
    #> # A tibble: 5 × 6
    #>      ID Length_mm Vmax_cm2 Age_years PG_mmHg Weight_kg
    #>   <int>     <dbl>    <dbl>     <dbl>   <dbl>     <dbl>
    #> 1     1        17       17        17      17        17
    #> 2     2        20       20        20      20        20
    #> 3     3        32       32        32      32        32
    #> 4     4        69       69        69      69        69
    #> 5     5       100      100       100     100       100

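Also, try lower_case + camel_case for your column names; it avoids some trouble!
Regarding "Columns may also contain some NA values":
P.S. Here is a small revision to help with NA data.
Let's say: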

    df_test <- data.frame(
      ID = c(1:5),
      Length = c("17 mm", "20 mm", "32 mm", NA_character_, "100 mm"),
      Vmax = c("17 cm2", "20 cm2", "32 cm2", "69 cm2", NA_character_),
      Age = c("17 years", NA_character_, NA_character_, "69 years", "100 years"),
      PG = c("17 mmHg", "20 mmHg", "32 mmHg", "69 mmHg", "100 mmHg"),
      Weight = c(NA_character_, "20 kg", "32 kg", "69 kg", "100 kg")
    )


As long as each column has at least one value that is not NA, a fill grouped by "column" will do the job:

    df_test %>%
      pivot_longer(-ID, names_to = "column", values_to = "value") %>%
      separate(col = value, into = c("value", "unit")) %>%
      # within each original column, copy the unit into rows where it is NA
      with_groups(column, \(x) fill(x, unit, .direction = "downup")) %>%
      transmute(
        ID,
        column = str_c(column, unit, sep = "_"),
        value = as.numeric(value)
      ) %>%
      pivot_wider(names_from = column, values_from = value)
    #> # A tibble: 5 × 6
    #>      ID Length_mm Vmax_cm2 Age_years PG_mmHg Weight_kg
    #>   <int>     <dbl>    <dbl>     <dbl>   <dbl>     <dbl>
    #> 1     1        17       17        17      17        NA
    #> 2     2        20       20        NA      20        20
    #> 3     3        32       32        NA      32        32
    #> 4     4        NA       69        69      69        69
    #> 5     5       100       NA       100     100       100
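
The question mentions an unsuccessful attempt with mutate(across()). As a rough sketch of how that route might look without reshaping (not taken from this answer), the units can be collected first and then used in `rename_with()`. The names `target_cols` and `col_units` are hypothetical, and the code assumes one unit per column, a single space as separator, and at least one non-NA value in each selected column.

```r
library(dplyr)

df_test <- data.frame(
  ID = 1:5,
  Length = c("17 mm", "20 mm", "32 mm", NA, "100 mm"),
  Vmax = c("17 cm2", "20 cm2", "32 cm2", "69 cm2", NA),
  Weight = c(NA, "20 kg", "32 kg", "69 kg", "100 kg")
)

# Hypothetical selection of columns to process
target_cols <- c("Length", "Vmax")

# Unit of each selected column, taken from its first non-NA entry
# (a column that is entirely NA would yield an NA unit here)
col_units <- vapply(
  df_test[target_cols],
  function(x) sub("^\\S+\\s+", "", x[!is.na(x)][1]),
  character(1)
)

result <- df_test %>%
  # drop everything from the first space onward and convert to numeric
  mutate(across(all_of(target_cols), ~ as.numeric(sub("\\s.*$", "", .x)))) %>%
  # append the matching unit to each selected column name
  rename_with(~ paste(.x, col_units[.x], sep = "_"), .cols = all_of(target_cols))

result
```

Compared with the pivot-based version, this keeps the original column order and leaves unselected columns such as `Weight` untouched.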

Answer 3 (iyfamqjs)

Alternatively, try:

    library(stringr)  # str_extract()
    library(purrr)    # map_df()

    # get the units (first non-NA entry of each column)
    nm <- sapply(df_test[, -1], \(x) {
      na.omit(str_extract(x, '(?<=\\s).*'))[1]
    })
    # rename the variables with units
    names(df_test)[-1] <- paste(names(df_test)[-1], nm, sep = '_')
    # remove the units from variables and convert to numeric
    new_df <- map_df(df_test[, -1], \(x) {
      as.numeric(str_extract(x, '.*(?=\\s)'))
    })
    # combine the ID with other variables
    cbind(ID = df_test[, 1], new_df)

Created on 2023-11-09 with reprex v2.0.2

      ID Length_mm Vmax_cm2 Age_years PG_mmHg Weight_kg
    1  1        17       17        17      17        17
    2  2        20       20        20      20        20
    3  3        32       32        32      32        32
    4  4        69       69        69      69        69
    5  5       100      100       100     100       100
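
For readers who prefer base R, the single-column code from the question can probably be wrapped in a small function and looped over the chosen columns. This is only a sketch under the question's stated assumptions (value and unit separated by a space, one unit per column, NA values possible); `split_units` is a hypothetical helper name, not an existing function.

```r
# Hypothetical helper: strip units from the chosen columns and
# append each column's unit to its name.
split_units <- function(df, cols) {
  for (col in cols) {
    x <- as.character(df[[col]])
    # unit = everything after the first space; NA entries are ignored
    unit <- unique(sub("^\\S+\\s+", "", x[!is.na(x)]))
    if (length(unit) != 1) {
      stop("Column '", col, "' does not have exactly one unit")
    }
    # numeric part = everything before the first space (NA stays NA)
    df[[col]] <- as.numeric(sub("\\s.*$", "", x))
    names(df)[names(df) == col] <- paste(col, unit, sep = "_")
  }
  df
}

df_na <- data.frame(
  ID = 1:5,
  Length = c("17 mm", "20 mm", "32 mm", NA, "100 mm"),
  Age = c("17 years", NA, NA, "69 years", "100 years"),
  Weight = c(NA, "20 kg", "32 kg", "69 kg", "100 kg")
)

# Process only the selected columns; Age is left as it is
out <- split_units(df_na, c("Length", "Weight"))
out
```

A range of columns can be passed by position as well, for example `split_units(df_na, names(df_na)[2:3])`. The function stops with an error when a selected column is entirely NA or mixes units, since no single unit can be determined in that case.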

