R中的多列旋转

jum4pzuy  于 2023-07-31  发布在  其他
关注(0)|答案(4)|浏览(108)

我的数据是广泛的形式,看起来像下面。数据包括名称和四个数据列。

df_wide <- data.frame(
  name = c("John", "Smith", "Tom", "Alice"),
  area = c("A", "A", "B", "B"),
  hello_1 = c(20, 30, 40, 70),
  hello_covered = c(60, 89, 76, 87),
  world_1 = c(55, 65, 20, 29),
  world_1_covered = c(87, 29, 43, 42)
)

字符串
我想把它转换成一个长格式,其中每个名称,我得到两行和两个数据列。我们期望的数据如下所示

name area medicine value covered
1  John    A  hello    20      60
2  John    A  world    55      87
3 Smith    A  hello    30      89
4 Smith    A  world    65      29
5   Tom    B  hello    40      76
6   Tom    B  world    20      43
7 Alice    B  hello    70      87
8 Alice    B  world    29      42


我试过使用pivto_longer,但不太成功。

qlfbtfca

qlfbtfca1#

您可以在names_to中设置".value",并提供names_sepnames_pattern之一,以指定如何拆分列名。

library(dplyr)
library(tidyr)

df_wide %>%
  rename(world_covered = world_1_covered) %>%
  rename_with(~ sub("1$", "value", .x), ends_with('1')) %>%
  pivot_longer(
    3:6,
    names_to = c("medicine", ".value"),
    names_sep = "_"
  )

# # A tibble: 8 × 5
#   name  area  medicine value covered
#   <chr> <chr> <chr>    <dbl>   <dbl>
# 1 John  A     hello       20      60
# 2 John  A     world       55      87
# 3 Smith A     hello       30      89
# 4 Smith A     world       65      29
# 5 Tom   B     hello       40      76
# 6 Tom   B     world       20      43
# 7 Alice B     hello       70      87
# 8 Alice B     world       29      42

字符串

cngwdvgl

cngwdvgl2#

可以先重命名以简化透视:

library(dplyr)
library(tidyr)
df_wide %>% 
  rename_with(~ gsub("\\_1", "", .x)) %>% 
  rename_with(~ ifelse(grepl("_", .x), .x, paste(.x, "value", sep = "_")), -c(name, area)) %>% 
  pivot_longer(-c(name, area), names_to = c("medicine", ".value"), names_sep = "_")

  name  area  medicine value covered
  <chr> <chr> <chr>    <dbl>   <dbl>
1 John  A     hello       20      60
2 John  A     world       55      87
3 Smith A     hello       30      89
4 Smith A     world       65      29
5 Tom   B     hello       40      76
6 Tom   B     world       20      43
7 Alice B     hello       70      87
8 Alice B     world       29      42

字符串

62lalag4

62lalag43#

tidyverse中的一种可选方法-不那么简洁,但可能更容易理解:

df_wide %>%
    # make data long
    tidyr::pivot_longer(hello_1:world_1_covered, names_to = "medicine_pre", values_to = "value") %>%
    # remove one from before column names made into long format
    dplyr::mutate(medicine_pre = stringr::str_remove_all(medicine_pre, pattern = "_1")) %>%
    # split the before column names into two columns
    tidyr::separate(medicine_pre, into = c("medicine", "helper"), sep = "_") %>% 
    # make NAs to "value"
    dplyr::mutate(helper = dplyr::coalesce(helper, "value"))  %>%
    # make data wide again
    tidyr::pivot_wider(names_from = helper, values_from = value)

# A tibble: 8 x 5
  name  area  medicine value covered
  <chr> <chr> <chr>    <dbl>   <dbl>
1 John  A     hello       20      60
2 John  A     world       55      87
3 Smith A     hello       30      89
4 Smith A     world       65      29
5 Tom   B     hello       40      76
6 Tom   B     world       20      43
7 Alice B     hello       70      87
8 Alice B     world       29      42

字符串

chhqkbe1

chhqkbe14#

一种方法应该适用于其他 hello_X/world_X 名称,因为这些名称标记了

  • hello_covered* 或 work_1_covered 是模式的离群值,这就是为什么需要额外的mutatemedicine
library(dplyr)
library(tidyr)

pivot_longer(df_wide, -c(name, area), names_pattern = "(.*)_*.*_(.*)", 
             names_to = c("medicine", "nm")) %>% 
  mutate(nm = replace(nm, grepl("\\d", nm), "value"), 
         medicine = sub("_.*", "" , medicine)) %>% 
  pivot_wider(names_from = nm, values_from = value)
# A tibble: 8 × 5
  name  area  medicine value covered
  <chr> <chr> <chr>    <dbl>   <dbl>
1 John  A     hello       20      60
2 John  A     world       55      87
3 Smith A     hello       30      89
4 Smith A     world       65      29
5 Tom   B     hello       40      76
6 Tom   B     world       20      43
7 Alice B     hello       70      87
8 Alice B     world       29      42

字符串

相关问题