R: Converting factor levels to columns, and columns to factor levels

Posted by mfuanj7w on 2023-06-19 in Other

I have a dataset from a survey question that asks respondents to rank their reasons for pursuing a PhD (as an example).

df <- data.frame(
  id = c(1:4),
  rank1 = c("Salary", "Interest", "Career", "Title"),
  rank2 = c("Title", "Career", "Salary", NA),
  rank3 = c("Interest", "Title", NA, NA),
  rank4 = c("Career", NA, NA, NA))

> df
  id    rank1  rank2    rank3  rank4
1  1   Salary  Title Interest Career
2  2 Interest Career    Title   <NA>
3  3   Career Salary     <NA>   <NA>
4  4    Title   <NA>     <NA>   <NA>

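The person with id 1 ranked "Salary" as the most important reason, followed by "Title", and so on.
However, I am trying to turn the factor levels of the variables into columns, and the columns into the factor levels of the variables, to get the following result: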

  id Salary Title Interest Career
1  1  rank1 rank2    rank3  rank4
2  2   <NA> rank3    rank1  rank2
3  3  rank2  <NA>     <NA>  rank1
4  4   <NA> rank1     <NA>   <NA>

Is there a way to do this in R? I have tried spread() from tidyr, but it did not give me the result I was after. Any help is appreciated! Thank you!

laik7k3q #1

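I believe @Chamkrai has the answer you are looking for (currently deleted), but I was thinking about how to handle the NAs. In this example, you could replace the NA for id 2 with "Salary", because that is the only value missing for that id. You could also fill in the other NAs by sampling from the "missing" values. I have not been able to work out a concise way of doing it, but there is a small chance this will help with your actual use case: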

library(tidyverse)

df <- data.frame(
  id = c(1:4),
  rank1 = c("Salary", "Interest", "Career", "Title"),
  rank2 = c("Title", "Career", "Salary", NA),
  rank3 = c("Interest", "Title", NA, NA),
  rank4 = c("Career", NA, NA, NA))
df
#>   id    rank1  rank2    rank3  rank4
#> 1  1   Salary  Title Interest Career
#> 2  2 Interest Career    Title   <NA>
#> 3  3   Career Salary     <NA>   <NA>
#> 4  4    Title   <NA>     <NA>   <NA>

unique_values <- df %>%
  select(-id) %>%
  pivot_longer(everything()) %>%
  na.omit() %>%
  distinct(value) %>%
  pull(value)
unique_values
#> [1] "Salary"   "Title"    "Interest" "Career"

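df %>%
  t %>%                     # transpose so each respondent becomes a column
  as.data.frame %>%
  mutate(across(everything(), 
                # row 1 holds the id; fill NAs in the rank rows with the
                # reasons this respondent did not rank (recycled by position)
                ~ifelse(is.na(.x) & row_number() > 1,
                        unique_values[!(unique_values %in% .x)],
                        .x))) %>%
  t %>%                     # transpose back: one row per respondent
  as.data.frame %>%
  pivot_longer(-id) %>%
  pivot_wider(names_from = value,
              values_from = name)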
#> # A tibble: 4 × 5
#>   id    Salary Title Interest Career
#>   <chr> <chr>  <chr> <chr>    <chr> 
#> 1 1     rank1  rank2 rank3    rank4 
#> 2 2     rank4  rank3 rank1    rank2 
#> 3 3     rank2  rank4 rank3    rank1 
#> 4 4     rank3  rank1 rank4    rank2

Created on 2023-06-15 with reprex v2.0.2
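
The idea of filling each respondent's NAs by sampling from the reasons they did not rank can also be sketched in base R. This is only a minimal sketch, not the answer's own method: `fill_row()`, `rank_cols`, `all_reasons` and `filled` are hypothetical names introduced here, and the code assumes the number of rank columns equals the number of distinct reasons, so every row has exactly as many gaps as unranked reasons.

```r
# Example data from the question
df <- data.frame(
  id = c(1:4),
  rank1 = c("Salary", "Interest", "Career", "Title"),
  rank2 = c("Title", "Career", "Salary", NA),
  rank3 = c("Interest", "Title", NA, NA),
  rank4 = c("Career", NA, NA, NA))

rank_cols <- setdiff(names(df), "id")

# All reasons that appear anywhere in the rank columns
all_values <- unlist(df[rank_cols], use.names = FALSE)
all_reasons <- unique(all_values[!is.na(all_values)])

# Hypothetical helper: fill the NA slots of one respondent's row with a
# random ordering of the reasons that respondent did not rank
fill_row <- function(row) {
  gaps <- which(is.na(row))
  if (length(gaps) == 0) return(row)
  missing_reasons <- setdiff(all_reasons, row)
  # sample.int() on indices avoids sample()'s special case for length-1 input
  row[gaps] <- missing_reasons[sample.int(length(missing_reasons), length(gaps))]
  row
}

set.seed(1)  # only to make the random fill reproducible
filled <- df
filled[rank_cols] <- t(apply(as.matrix(df[rank_cols]), 1, fill_row))
filled
```

For id 2 the fill is deterministic ("Salary" is the only unranked reason); for ids 3 and 4 the order of the filled reasons depends on the seed. The filled data frame can then be reshaped with the same pivot_longer() and pivot_wider() steps as above.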

bvpmtnay #2

library(tidyverse)
(df <- data.frame(
  id = c(1:4),
  rank1 = c("Salary", "Interest", "Career", "Title"),
  rank2 = c("Title", "Career", "Salary", NA),
  rank3 = c("Interest", "Title", NA, NA),
  rank4 = c("Career", NA, NA, NA)))

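# Long format: one row per (id, rank column); drop the unranked NA entries
(df_long <- pivot_longer(df,
                        cols=-id) |> na.omit())

# Back to wide: one column per reason, holding the rank label it received
(df_rewide <- pivot_wider(data = df_long,
                          id_cols = "id",
                          names_from = "value",
                          values_from = "name"))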
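
If tidyr is not available, a similar result can be sketched in base R. Note that this swaps the long-then-wide reshape for a different technique: for each reason, match() looks up which rank column holds it in each row. The names `rank_cols`, `ranks`, `reasons` and `wide` are hypothetical, and the order of `reasons` is simply copied from the desired output in the question.

```r
# Example data from the question
df <- data.frame(
  id = c(1:4),
  rank1 = c("Salary", "Interest", "Career", "Title"),
  rank2 = c("Title", "Career", "Salary", NA),
  rank3 = c("Interest", "Title", NA, NA),
  rank4 = c("Career", NA, NA, NA))

rank_cols <- setdiff(names(df), "id")
ranks <- as.matrix(df[rank_cols])

# Column order taken from the desired output in the question
reasons <- c("Salary", "Title", "Interest", "Career")

wide <- data.frame(id = df$id)
for (reason in reasons) {
  # Position of `reason` within each respondent's row, or NA if unranked
  pos <- apply(ranks, 1, function(row) match(reason, row))
  # Indexing with NA yields NA, so unranked reasons stay NA
  wide[[reason]] <- rank_cols[pos]
}
wide
```

Unlike the first answer, this leaves reasons a respondent did not rank as NA, which is what the desired output in the question shows.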
