R - Split numerical values from units within a data frame column and append the units to the column names

Posted by zpjtge22 on 2023-11-14 in Other


I have a sample data frame as follows:

df_test <- data.frame(
  ID = c(1:5),
  Length = c("17 mm","20 mm", "32 mm", "69 mm", "100 mm"),
  Vmax = c("17 cm2","20 cm2", "32 cm2", "69 cm2", "100 cm2"),
  Age = c("17 years","20 years", "32 years", "69 years", "100 years"),
  PG = c("17 mmHg","20 mmHg", "32 mmHg", "69 mmHg", "100 mmHg"),
  Weight = c("17 kg","20 kg", "32 kg", "69 kg", "100 kg")
)

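I would like to identify the unit in each column, append it to the column name, and leave only the numeric values. The result should look like this:

  ID Length_mm Vmax_cm2 Age_years PG_mmHg Weight_kg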
1  1        17       17        17      17        17
2  2        20       20        20      20        20
3  3        32       32        32      32        32
4  4        69       69        69      69        69
5  5       100      100       100     100       100

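I would then like to go one step further and be able to specify which columns this rule is applied to.
With the code below, I can do this successfully for a single specified column: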

df_test$Vmax
units <- gsub(".* ", "", df_test$Vmax)
units <- units[!is.na(units)]
units <- unique(units)
df_test$Vmax <- as.numeric(gsub(" .*$", "", df_test$Vmax))
col_index <- which(colnames(df_test) == "Vmax")
col_name <- colnames(df_test)[col_index]
new_col_name <- paste(col_name, units, sep="_")
names(df_test)[col_index] <- new_col_name

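I don't know how to apply this approach to multiple columns. I tried putting it in a for loop and using mutate(across()), but without success.
Note: to clarify, the numeric value and the unit are always separated by a space, and the unit is consistent within any given column. There may also be some NA values in the columns.
Any help with this would be greatly appreciated. Thank you!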

Source: https://stackoverflow.com/questions/77456094/r-split-numerical-values-from-units-within-a-data-frame-column-and-append-unit

3 Answers


ifsvaxew1#

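We can reshape to long format, split each measurement into a value and a unit, convert the numbers to numeric data, and then reshape back to wide format, combining each original column name with its unit to form the new name.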

library(tidyr)
df_test %>%
  pivot_longer(-ID) %>%
  separate(value, c("val", "unit"), convert = TRUE) %>%
  pivot_wider(names_from = c(name, unit), values_from = val)

Result

# A tibble: 5 x 6
     ID Length_mm Vmax_cm2 Age_years PG_mmHg Weight_kg
  <int>     <int>    <int>     <int>   <int>     <int>
1     1        17       17        17      17        17
2     2        20       20        20      20        20
3     3        32       32        32      32        32
4     4        69       69        69      69        69
5     5       100      100       100     100       100
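
The question also asks how to restrict the rule to chosen columns. One possible variation of the pipeline above, shown only as a sketch: pass a selection to pivot_longer() instead of -ID. The cols vector and the reduced sample data are assumptions made for illustration, and the sketch assumes the selected columns contain no NA values (an NA would end up in a column named like Length_NA).

```r
library(tidyr)

# Reduced sample data (illustrative, same layout as the question's df_test)
df_test <- data.frame(
  ID = 1:3,
  Length = c("17 mm", "20 mm", "32 mm"),
  Age = c("17 years", "20 years", "32 years"),
  PG = c("17 mmHg", "20 mmHg", "32 mmHg")
)

# Hypothetical selection: only these columns are split
cols <- c("Length", "PG")

res <- df_test %>%
  pivot_longer(all_of(cols)) %>%                        # only the chosen columns go long
  separate(value, c("val", "unit"), convert = TRUE) %>% # "17 mm" -> 17, "mm"
  pivot_wider(names_from = c(name, unit), values_from = val)

res
```

Columns that are not selected (Age here) are left as they were and act as identifier columns alongside ID when reshaping back to wide format.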


jhkqcmku2#

Try this:

library(tidyverse)
df_test %>% 
  pivot_longer(-ID, names_to = "column", values_to = "value") %>% 
  separate(col = value, into = c("value", "unit")) %>% 
  
  transmute(
    ID, 
    column = str_c(column, unit, sep = "_"),
    value = as.numeric(value)
  ) %>% 
  
  pivot_wider(names_from = column, values_from = value) 
# A tibble: 5 × 6
     ID Length_mm Vmax_cm2 Age_years PG_mmHg Weight_kg
  <int>     <dbl>    <dbl>     <dbl>   <dbl>     <dbl>
1     1        17       17        17      17        17
2     2        20       20        20      20        20
3     3        32       32        32      32        32
4     4        69       69        69      69        69
5     5       100      100       100     100       100

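Also, consider sticking to lower_case or camelCase for column names; it saves some trouble!
"There may also be some NA values in the columns."
PS. A small edit to help with NA data.
Let's say: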

df_test <- data.frame(
  ID = c(1:5),
  Length = c("17 mm","20 mm", "32 mm", NA_character_, "100 mm"),
  Vmax = c("17 cm2","20 cm2", "32 cm2", "69 cm2", NA_character_),
  Age = c("17 years", NA_character_, NA_character_, "69 years", "100 years"),
  PG = c("17 mmHg","20 mmHg", "32 mmHg", "69 mmHg", "100 mmHg"),
  Weight = c(NA_character_,"20 kg", "32 kg", "69 kg", "100 kg")
)

As long as at least one value in each column is not NA, fill() grouped by "column" does the job:

df_test %>% 
  pivot_longer(-ID, names_to = "column", values_to = "value") %>% 
  separate(col = value, into = c("value", "unit")) %>% 
  
  with_groups(column, \(x) fill(x, unit, .direction = "downup")) %>% 
  
  transmute(
    ID, 
    column = str_c(column, unit, sep = "_"),
    value = as.numeric(value)
  ) %>% 
  
  pivot_wider(names_from = column, values_from = value) 
# A tibble: 5 × 6
     ID Length_mm Vmax_cm2 Age_years PG_mmHg Weight_kg
  <int>     <dbl>    <dbl>     <dbl>   <dbl>     <dbl>
1     1        17       17        17      17        NA
2     2        20       20        NA      20        20
3     3        32       32        NA      32        32
4     4        NA       69        69      69        69
5     5       100       NA       100     100       100
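
Since the question mentions trying mutate(across()), here is one way that route might look, as a sketch rather than a definitive solution. The units are collected first from the non-NA entries, across() strips them from the values, and rename_with() appends them to the names. The cols vector and the shortened sample data are assumptions for illustration, and the sketch assumes each selected column has exactly one unit and at least one non-NA value.

```r
library(dplyr)

# Shortened sample data with NA values (illustrative)
df_test <- data.frame(
  ID = 1:5,
  Length = c("17 mm", "20 mm", "32 mm", NA, "100 mm"),
  Vmax = c("17 cm2", "20 cm2", "32 cm2", "69 cm2", NA),
  Age = c("17 years", NA, NA, "69 years", "100 years")
)

# Hypothetical selection of columns to process
cols <- c("Length", "Vmax")

# One unit per selected column, taken from the non-NA entries
units <- sapply(df_test[cols], \(x) unique(na.omit(sub(".* ", "", x))))

res <- df_test %>%
  # drop everything from the space onwards and convert to numeric
  mutate(across(all_of(cols), \(x) as.numeric(sub(" .*$", "", x)))) %>%
  # append the matching unit to each selected column name
  rename_with(\(nm) paste(nm, units[nm], sep = "_"), .cols = all_of(cols))

res
```

Columns outside cols (Age here) keep both their name and their original character values.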


iyfamqjs3#

Alternatively, please try

library(stringr)
library(purrr)

# get the units (first non-NA unit found in each column)
nm <- sapply(df_test[,-1], \(x){ 
  na.omit(str_extract(x,'(?<=\\s).*'))[1]
})
# rename the variables with units
names(df_test)[-1] <- paste(names(df_test)[-1], nm, sep = '_')
# remove the units from variables and convert the values to numeric
new_df <- map_df(df_test[,-1], \(x) {
  as.numeric(str_extract(x,'.*(?=\\s)'))
})
# combine the ID with other variables
cbind(ID=df_test[,1],new_df)

Created on 2023-11-09 with reprex v2.0.2

  ID Length_mm Vmax_cm2 Age_years PG_mmHg Weight_kg
1  1        17       17        17      17        17
2  2        20       20        20      20        20
3  3        32       32        32      32        32
4  4        69       69        69      69        69
5  5       100      100       100     100       100
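
The single-column code in the question can also be generalised directly in base R by wrapping it in a function and looping over the chosen columns. The following is a minimal sketch under stated assumptions: split_units() is a hypothetical helper name (not from any package), the sample data is shortened for illustration, and each selected column is assumed to hold one consistent unit and at least one non-NA value.

```r
# Shortened sample data with NA values (illustrative)
df_test <- data.frame(
  ID = 1:5,
  Length = c("17 mm", "20 mm", "32 mm", NA, "100 mm"),
  Vmax = c("17 cm2", "20 cm2", "32 cm2", "69 cm2", NA),
  Weight = c(NA, "20 kg", "32 kg", "69 kg", "100 kg")
)

# split_units() is a hypothetical helper, not part of any package
split_units <- function(df, cols) {
  for (col in cols) {
    x <- df[[col]]
    # the unit is everything after the first space; NA entries are ignored
    unit <- unique(sub("^\\S+\\s+", "", x[!is.na(x)]))
    # stop if the column has no unit or more than one distinct unit
    stopifnot(length(unit) == 1)
    # keep only the number before the space
    df[[col]] <- as.numeric(sub("\\s.*$", "", x))
    # append the unit to the column name
    names(df)[names(df) == col] <- paste(col, unit, sep = "_")
  }
  df
}

res <- split_units(df_test, c("Length", "Weight"))
str(res)
```

Only the columns passed in cols are changed, so Vmax keeps its name and its character values in this example.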
