R: How to reshape data using column names that contain years

lvmkulzt  posted on 2023-04-18 in Other

Example data frame

df <- data.frame(countrycode=c("A","B","C"),
                 hdi_1999 = c(0.7, 0.8, 0.6),
                 hdi_2000 = c(0.71, 0.81, 0.61),
                 hdi_2001 = c(0.72, 0.82, 0.62),
                 icrg_1999 = c(60, 50, 70),
                 icrg_2000 = c(61, 51, 71),
                 icrg_2001 = c(62, 52, 72))

What I need is four columns with unique country code-year pairs, where the year is the number after the underscore (1999, 2000, 2001):

countrycode year hdi icrg

My code is

df_new <- df %>%
  pivot_longer(cols = starts_with("hdi"),
               names_to = c("hdi", "year"),
               names_sep = "_",
               values_to = "hdi_value",
               names_repair = "unique") %>%
  pivot_longer(cols = starts_with("icrg"),
               names_to = c("icrg", "year"),
               names_sep = "_",
               values_to = "icrg_value",
               names_repair = "unique")

The result does not contain unique country code-year pairs.

ccgok5k5 #1

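We can do this in a single pivot_longer call. The special ".value" entry in names_to turns the part of each column name before the underscore (hdi, icrg) into an output column, while the four-digit suffix goes into year.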

library(tidyr)
pivot_longer(df, cols = -countrycode, 
   names_to = c(".value", "year"), names_pattern = "(.*)_(\\d{4})$")
Output:
# A tibble: 9 × 4
  countrycode year    hdi  icrg
  <chr>       <chr> <dbl> <dbl>
1 A           1999   0.7     60
2 A           2000   0.71    61
3 A           2001   0.72    62
4 B           1999   0.8     50
5 B           2000   0.81    51
6 B           2001   0.82    52
7 C           1999   0.6     70
8 C           2000   0.61    71
9 C           2001   0.62    72
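
Note that year comes back as a character column above. If a numeric year is preferred, the following is a minimal sketch, assuming tidyr 1.1.0 or later (where the names_transform argument is available). The name df_long is only illustrative and is not part of the original answer.

```r
library(tidyr)

# Same example data as in the question.
df <- data.frame(countrycode = c("A", "B", "C"),
                 hdi_1999 = c(0.7, 0.8, 0.6),
                 hdi_2000 = c(0.71, 0.81, 0.61),
                 hdi_2001 = c(0.72, 0.82, 0.62),
                 icrg_1999 = c(60, 50, 70),
                 icrg_2000 = c(61, 51, 71),
                 icrg_2001 = c(62, 52, 72))

# ".value" maps the first captured group (hdi, icrg) to output columns;
# the second group (the four digits) becomes the year column.
# names_transform converts year from character to integer during the pivot.
df_long <- pivot_longer(df, cols = -countrycode,
                        names_to = c(".value", "year"),
                        names_pattern = "(.*)_(\\d{4})$",
                        names_transform = list(year = as.integer))

# Sanity check: every country code-year pair occurs exactly once.
stopifnot(anyDuplicated(paste(df_long$countrycode, df_long$year)) == 0)
```

The final check stops with an error if any country code-year pair is duplicated, which is the property the question asks for.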

nuypyhwy #2

Using reshape from base R:

df |> reshape(idvar=1, direction='long', varying=list(2:4, 5:7), sep=' ', 
              v.names=c('hdi', 'icrg'), times=1999:2001)
#         countrycode time  hdi icrg
# A.1999           A 1999 0.70   60
# B.1999           B 1999 0.80   50
# C.1999           C 1999 0.60   70
# A.2000           A 2000 0.71   61
# B.2000           B 2000 0.81   51
# C.2000           C 2000 0.61   71
# A.2001           A 2001 0.72   62
# B.2001           B 2001 0.82   52
# C.2001           C 2001 0.62   72
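
As a possible variation (a sketch, not the answerer's code), reshape can also infer the value names and the years from the column names themselves when varying is given as a plain vector and sep is set to "_". Here idvar is passed by name and timevar = "year" is an added choice so that the time column is called year. The name long_df is illustrative.

```r
# Same example data as in the question.
df <- data.frame(countrycode = c("A", "B", "C"),
                 hdi_1999 = c(0.7, 0.8, 0.6),
                 hdi_2000 = c(0.71, 0.81, 0.61),
                 hdi_2001 = c(0.72, 0.82, 0.62),
                 icrg_1999 = c(60, 50, 70),
                 icrg_2000 = c(61, 51, 71),
                 icrg_2001 = c(62, 52, 72))

# With an atomic `varying` and no `v.names`, reshape splits each column name
# at `sep`: the part before "_" becomes the value column (hdi, icrg) and the
# part after it becomes the time value (1999, 2000, 2001).
long_df <- reshape(df, direction = "long",
                   idvar = "countrycode",
                   varying = names(df)[-1],
                   sep = "_",
                   timevar = "year")

long_df
```

This avoids listing the column positions and the years by hand, at the cost of relying on every reshaped column following the name_year pattern.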
