R: pivot longer by group prefix

c9qzyr3d  posted on 2023-07-31  in Other

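I need to pivot longer by the string prefix of grouped columns. The toy example below has two groups, "A" and "B", but I need a general tidyverse solution that handles any number of group prefixes.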

    # toy df
    # (library calls added so the snippet runs on its own)
    library(data.table)
    library(tidyverse)
    set.seed(1)
    df <- data.table(
      date = rep(seq(as.Date("2020-01-01"), as.Date("2020-01-05"), by = "day"), each = 6),
      k = rep(c("A.mean", "A.median", "A.min", "B.mean", "B.median", "B.min"), 5),
      v = runif(30, 0, 50)
    ) %>%
      pivot_wider(names_from = k, values_from = v)
    df %>% head
    #   date       A.mean A.median  A.min B.mean B.median B.min
    #   <date>      <dbl>    <dbl>  <dbl>  <dbl>    <dbl> <dbl>
    # 1 2020-01-01   13.3     18.6 28.6    45.4      10.1 44.9
    # 2 2020-01-02   47.2     33.0 31.5     3.09     10.3  8.83
    # 3 2020-01-03   34.4     19.2 38.5    24.9      35.9 49.6
    # 4 2020-01-04   19.0     38.9 46.7    10.6      32.6  6.28
    # 5 2020-01-05   13.4     19.3  0.670  19.1      43.5 17.0

    # pivot longer by group prefix (current approach: one hand-written block per group)
    df %>%
      select(date, matches("A\\.")) %>%
      rename_with(~ str_replace(.x, "A\\.", "")) %>%
      mutate(k = "A") %>%
      bind_rows(
        df %>%
          select(date, matches("B\\.")) %>%
          rename_with(~ str_replace(.x, "B\\.", "")) %>%
          mutate(k = "B")
      )
    #    date        mean median    min k
    #    <date>     <dbl>  <dbl>  <dbl> <chr>
    #  1 2020-01-01 13.3    18.6 28.6   A
    #  2 2020-01-02 47.2    33.0 31.5   A
    #  3 2020-01-03 34.4    19.2 38.5   A
    #  4 2020-01-04 19.0    38.9 46.7   A
    #  5 2020-01-05 13.4    19.3  0.670 A
    #  6 2020-01-01 45.4    10.1 44.9   B
    #  7 2020-01-02  3.09   10.3  8.83  B
    #  8 2020-01-03 24.9    35.9 49.6   B
    #  9 2020-01-04 10.6    32.6  6.28  B
    # 10 2020-01-05 19.1    43.5 17.0   B


Answer #1 (z31licg0)

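Here is a two-step process (shown on two lines for demonstration purposes). First, pivot longer to create columns for k, the statistic name, and the value; then pivot wider to produce the desired result. The updated answer in the code collapses both steps into a single pivot_longer() call.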

    library(tidyr)
    set.seed(1)
    df <- data.frame(
      date = rep(seq(as.Date("2020-01-01"), as.Date("2020-01-05"), by = "day"), each = 6),
      k = rep(c("A.mean", "A.median", "A.min", "B.mean", "B.median", "B.min"), 5),
      v = runif(30, 0, 50)
    ) %>%
      pivot_wider(names_from = k, values_from = v)

    # original two-step version: split the names into k and stat, then widen by stat
    # temp <- pivot_longer(df, -date, names_sep = "\\.", names_to = c("k", "stat"))
    # answer <- pivot_wider(temp, id_cols = c("date", "k"), names_from = "stat", values_from = "value")

    # updated answer, simplified down to just the pivot_longer function:
    # ".value" turns the part after the dot into output column names
    answer <- pivot_longer(df, -date, names_sep = "\\.", names_to = c("k", ".value"))
    print(head(answer))
    # # A tibble: 6 x 5
    #   date       k      mean median   min
    #   <date>     <chr> <dbl>  <dbl> <dbl>
    # 1 2020-01-01 A     13.3    18.6 28.6
    # 2 2020-01-01 B     45.4    10.1 44.9
    # 3 2020-01-02 A     47.2    33.0 31.5
    # 4 2020-01-02 B      3.09   10.3  8.83
    # 5 2020-01-03 A     34.4    19.2 38.5
    # 6 2020-01-03 B     24.9    35.9 49.6
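
The single-call form above relies on the special ".value" entry in names_to, which tells pivot_longer() to use that piece of each column name as an output column name instead of storing it as a value. As a minimal sketch of how the same idea might extend to longer prefixes, the example below swaps names_sep for names_pattern so that only the first dot separates group from statistic. The data frame `wide` and the prefixes "grp1" and "groupB" are made up for illustration and are not part of the question.

```r
library(tidyr)

# Hypothetical wide data with multi-character group prefixes
wide <- data.frame(
  date        = as.Date("2020-01-01") + 0:1,
  grp1.mean   = c(1, 2),
  grp1.min    = c(0.5, 1.5),
  groupB.mean = c(10, 20),
  groupB.min  = c(5, 15)
)

# First capture group becomes k; the second becomes the output column names
long <- pivot_longer(
  wide,
  -date,
  names_to = c("k", ".value"),
  names_pattern = "^([^.]+)\\.(.+)$"
)
print(long)
```

With names_pattern, everything after the first dot is treated as the statistic name, so a statistic name that itself contains a dot would presumably stay in one piece, whereas names_sep = "\\." would try to split it further.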


Answer #2 (li9yvcax)

This should work:

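    library(dplyr)
    library(tidyr)
    # note: substr(name, 1, 1) assumes one-character group prefixes such as "A" and "B"
    df %>% pivot_longer(cols = contains(".")) %>%
      mutate(k = substr(name, 1, 1), name = substr(name, 3, nchar(name))) %>%
      pivot_wider(names_from = name, values_from = value) %>%
      arrange(k)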

For example:

    df
    # # A tibble: 5 x 7
    #   date       A.mean A.median A.min B.mean B.median B.min
    #   <date>      <dbl>    <dbl> <dbl>  <dbl>    <dbl> <dbl>
    # 1 2020-01-01 17.9       40.2 12.6    32.7   17.9    14.3
    # 2 2020-01-02 49.5       29.8 50.0    36.5    0.788  49.7
    # 3 2020-01-03  0.375     48.2 20.7    14.9   33.0    12.1
    # 4 2020-01-04  5.42      10.1 16.8    35.5   49.4    10.7
    # 5 2020-01-05 17.9       28.2  5.64   25.8   31.3    10.8

    df %>% pivot_longer(cols = contains(".")) %>%
      mutate(k = substr(name, 1, 1), name = substr(name, 3, nchar(name))) %>%
      pivot_wider(names_from = name, values_from = value) %>%
      arrange(k)
    # # A tibble: 10 x 5
    #    date       k       mean median   min
    #    <date>     <chr>  <dbl>  <dbl> <dbl>
    #  1 2020-01-01 A     17.9     40.2 12.6
    #  2 2020-01-02 A     49.5     29.8 50.0
    #  3 2020-01-03 A      0.375   48.2 20.7
    #  4 2020-01-04 A      5.42    10.1 16.8
    #  5 2020-01-05 A     17.9     28.2  5.64
    #  6 2020-01-01 B     32.7     17.9 14.3
    #  7 2020-01-02 B     36.5    0.788 49.7
    #  8 2020-01-03 B     14.9     33.0 12.1
    #  9 2020-01-04 B     35.5     49.4 10.7
    # 10 2020-01-05 B     25.8     31.3 10.8
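
Because substr(name, 1, 1) only picks up a one-character prefix, the long-then-wide approach above may need a small change for arbitrary group names. The following is a rough sketch, not the answerer's code: it splits each column name at the first dot using base R's sub(). The tibble `wide` and the prefixes "ctrl" and "treat" are invented for illustration.

```r
library(dplyr)
library(tidyr)

# Hypothetical wide data with multi-character group prefixes
wide <- tibble(
  date       = as.Date("2020-01-01") + 0:1,
  ctrl.mean  = c(1, 2),
  ctrl.min   = c(0.5, 1.5),
  treat.mean = c(10, 20),
  treat.min  = c(5, 15)
)

result <- wide %>%
  pivot_longer(cols = contains(".")) %>%
  mutate(
    k    = sub("\\..*$", "", name),    # text before the first dot
    name = sub("^[^.]*\\.", "", name)  # text after the first dot
  ) %>%
  pivot_wider(names_from = name, values_from = value) %>%
  arrange(k)
print(result)
```

The order inside mutate() matters here: k is computed from the original column name before name is overwritten with the statistic part.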
