R: How to use comma-separated numbers as numeric variables

cigdeys3, posted on 2023-05-11 in Other

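I am working with survey responses from Qualtrics in R.
Fifteen of the questions are multiple choice, where a person can select more than one option (for example, the output for someone who selects options 1, 3 and 4 looks like "1,3,4"). Say I have 4 questions (instead of 15): social, emotional, cognitive and family. If a person selects 1, 2 and 4 for social, the output will be "1,2,4", and if they select only "2" for family, the output will be "2". See the example data below: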
| gender | social | emotional | cognitive | family |
| --------------|--------------|--------------|--------------|--------------|
| 1 | 1 | 1,2,4 | 3 | 2 |
| 2 | 2 | 3,4 | 4 | 1,2,4 |
| 1 | 3,4 | 1,3 | 1,2,3 | 1 |
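Each number in the social/emotional/cognitive/family columns represents a category. If a respondent ticked "1", I have a positive response for that category; if they did not, I have a negative response for it. So each number in these columns is really a binary response (positive/negative).
Therefore, to be able to analyze the data (chi-square), I would like the data frame to look like this: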
| gender | social1 | social2 | social3 | social4 |
| --------------|--------------|--------------|--------------|--------------|
| 1 | Yes | No | Yes | Yes |
| 2 | Yes | Yes | No | No |
| 1 | No | No | Yes | No |
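Is there a function, or a series of functions, that would let me do this?
Note that I have 15 questions (i.e. 15 columns), so I would prefer to do this over the whole data frame rather than one question at a time.
I tried this (for each column):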

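library(stringr)  # provides str_split_fixed()

data <- read.csv("data.csv")
# Split the "social" responses into (at most) 3 separate character columns
social.data <- data.frame(Sex = c(data$gender),
                          social = c(data$social),
                          str_split_fixed(data$social, ',', 3))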

R gives me the numbers split into separate columns. From there, I don't know what to do to get the data frame I described above.

Answer 1 (u0sqgete)

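First, use str_split() to split the comma-separated strings into a list of numbers. Then you can map over the known response values to create the binary variables.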

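library(stringr)  # str_split() comes from stringr
library(tidyr)
library(dplyr)
dat <- data.frame(
  gender = c(1,2,1), 
  social = c("1", "2", "3,4"), 
  emotional = c("1,2,4", "3,4", "1,3"), 
  cognitive = c("3", "4", "1,2,3"), 
  family=c("2", "1,2,4", "1")
)
# For each response value i in 1:4, build one 0/1 indicator per question
purrr::map(1:4, \(i){
  dat %>% 
    # split e.g. "1,2,4" into c(1, 2, 4), stored in a list column
    mutate(across(social:family,  ~purrr::map(str_split(.x, ","), as.numeric))) %>% 
    rowwise() %>% 
    # unary + turns TRUE/FALSE into 1/0; columns are named e.g. social1
    transmute(across(social:family,  ~+(i %in% .x), .names = paste0("{.col}", i)))}) %>% 
    bind_cols() %>% 
    bind_cols(dat,.)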
#>   gender social emotional cognitive family social1 emotional1 cognitive1
#> 1      1      1     1,2,4         3      2       1          1          0
#> 2      2      2       3,4         4  1,2,4       0          0          0
#> 3      1    3,4       1,3     1,2,3      1       0          1          1
#>   family1 social2 emotional2 cognitive2 family2 social3 emotional3 cognitive3
#> 1       0       0          1          0       1       0          0          1
#> 2       1       1          0          0       1       0          1          0
#> 3       1       0          0          1       0       1          1          1
#>   family3 social4 emotional4 cognitive4 family4
#> 1       0       0          1          0       0
#> 2       0       0          1          1       1
#> 3       0       1          0          0       0
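
The question asks for Yes/No columns rather than 0/1. A minimal sketch of that last step, assuming indicator columns like the ones printed above (the `ind` data frame below is a hand-typed copy of the `social1` to `social4` columns, and the names `ind`, `labelled` and `tab` are only illustrative):

```r
library(dplyr)

# Hypothetical 0/1 indicators, copied from the social1..social4 output above
ind <- data.frame(
  gender  = c(1, 2, 1),
  social1 = c(1, 0, 0),
  social2 = c(0, 1, 0),
  social3 = c(0, 0, 1),
  social4 = c(0, 0, 1)
)

# Recode every indicator column into a No/Yes factor
labelled <- ind %>%
  mutate(across(starts_with("social"),
                ~ factor(.x, levels = c(0, 1), labels = c("No", "Yes"))))

# Cross-tabulate gender against one category; a table like this
# is the kind of input chisq.test() accepts
tab <- table(labelled$gender, labelled$social1)
```

Fixing the factor levels to `c(0, 1)` keeps both "No" and "Yes" in the table even when one of them never occurs for a given category.
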
Old answer:
library(stringr)
library(tidyr)
library(dplyr)
dat <- data.frame(
  gender = c(1,2,1), 
  social = c("1", "2", "3,4"), 
  emotional = c("1,2,4", "3,4", "1,3"), 
  cognitive = c("3", "4", "1,2,3"), 
  family=c("2", "1,2,4", "1")
)
dat <- dat %>% 
  mutate(across(social:family,  ~purrr::map(str_split(.x, ","), as.numeric)))
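
To see what this splitting step produces, here is a small sketch on a single column, written with base R's `lapply()` and `vapply()` instead of purrr purely for illustration (the names `split_social` and `has_4` are not part of the answer):

```r
library(stringr)

social <- c("1", "2", "3,4")

# Each element becomes a numeric vector of the selected options
split_social <- lapply(str_split(social, ","), as.numeric)

# Membership test for one known response value (here: option 4)
has_4 <- vapply(split_social, function(v) 4 %in% v, logical(1))
```

Each list element holds the options one respondent ticked, so testing membership for a given value yields that category's binary indicator.
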

You can then use unnest() on the list columns one at a time and pivot each of them wider with tidyr's pivot_wider().

dat %>% 
  mutate(obs = row_number()) %>% 
  dplyr::select(obs, everything()) %>% 
  unnest(social) %>% 
  pivot_wider(names_from = "social", 
              values_from = "social", 
              values_fn =  function(x)as.numeric(!is.na(x)), 
              names_prefix="social", 
              values_fill=0) %>%
  unnest(emotional) %>% 
  pivot_wider(names_from = "emotional", 
              values_from = "emotional", 
              values_fn =  function(x)as.numeric(!is.na(x)), 
              names_prefix="emotional", 
              values_fill=0) %>% 
  unnest(cognitive) %>% 
  pivot_wider(names_from = "cognitive", 
              values_from = "cognitive", 
              values_fn =  function(x)as.numeric(!is.na(x)), 
              names_prefix="cognitive", 
              values_fill=0) %>% 
  unnest(family) %>% 
  pivot_wider(names_from = "family", 
              values_from = "family", 
              values_fn =  function(x)as.numeric(!is.na(x)), 
              names_prefix="family", 
              values_fill=0)
#> # A tibble: 3 × 17
#>     obs gender social1 social2 social3 social4 emotion…¹ emoti…² emoti…³ emoti…⁴
#>   <int>  <dbl>   <dbl>   <dbl>   <dbl>   <dbl>     <dbl>   <dbl>   <dbl>   <dbl>
#> 1     1      1       1       0       0       0         1       1       1       0
#> 2     2      2       0       1       0       0         0       0       1       1
#> 3     3      1       0       0       1       1         1       0       0       1
#> # … with 7 more variables: cognitive3 <dbl>, cognitive4 <dbl>,
#> #   cognitive1 <dbl>, cognitive2 <dbl>, family2 <dbl>, family1 <dbl>,
#> #   family4 <dbl>, and abbreviated variable names ¹​emotional1, ²​emotional2,
#> #   ³​emotional4, ⁴​emotional3

You could also wrap the unnest() and pivot_wider() steps in a function and then just call that function on the data:

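u_pivot <- function(.data, x){
  # capture the bare column name as a string, e.g. "social"
  xn <- as_label(enquo(x))
  .data %>% 
    unnest({{ x }}) %>%
    # all_of() makes selecting by a character variable explicit
    pivot_wider(names_from =  all_of(xn),
                values_from =  all_of(xn),
                values_fn =  function(x)as.numeric(!is.na(x)),
                names_prefix= xn,
                values_fill=0)
}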

dat %>% 
  mutate(obs = row_number()) %>% 
  u_pivot(social) %>% 
  u_pivot(emotional) %>% 
  u_pivot(cognitive) %>% 
  u_pivot(family)

#> # A tibble: 3 × 17
#>   gender   obs social1 social2 social3 social4 emotion…¹ emoti…² emoti…³ emoti…⁴
#>    <dbl> <int>   <dbl>   <dbl>   <dbl>   <dbl>     <dbl>   <dbl>   <dbl>   <dbl>
#> 1      1     1       1       0       0       0         1       1       1       0
#> 2      2     2       0       1       0       0         0       0       1       1
#> 3      1     3       0       0       1       1         1       0       0       1
#> # … with 7 more variables: cognitive3 <dbl>, cognitive4 <dbl>,
#> #   cognitive1 <dbl>, cognitive2 <dbl>, family2 <dbl>, family1 <dbl>,
#> #   family4 <dbl>, and abbreviated variable names ¹​emotional1, ²​emotional2,
#> #   ³​emotional4, ⁴​emotional3

Created on 2023-05-10 with reprex v2.0.2

Answer 2 (bxgwgixi)

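Maybe a start. You can use dplyr::across to work over several columns, or all of them. This will need some cleaning up afterwards, but it should get you started.
First, some data: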

library(tidyverse)
data <- tibble(gender=c(1,2),
               social=c(1,'1,2,4'),
               emotional=c(2,'3,4'))

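I'm not sure whether there is a way to program this so that it recognizes the number of commas, but the following adds blanks where a response has fewer than the maximum number of values. You need to hard-code the maximum!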

data %>% 
  mutate(across(.cols = everything(),
                .fns = ~str_split_fixed(.,',',3)))
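
One possible way around the hard-coded 3 is to count the commas first. This is only a sketch on a single toy column, assuming stringr's `str_count()`; the names `n_max` and `parts` are illustrative:

```r
library(stringr)

# Same toy responses as the social column above
social <- c("1", "1,2,4")

# Most selections in any response = most commas + 1
n_max <- max(str_count(social, ","), na.rm = TRUE) + 1

# Use the computed width instead of a hard-coded 3
parts <- str_split_fixed(social, ",", n_max)
```

Shorter responses are still padded with empty strings, exactly as with the hard-coded width.
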

For renaming, you might check this question: Splitting multiple string columns and rename the new columns adequately- R
Here is the code from that link, adapted to this question:

pipe_to_do <- . %>%
  str_split_fixed(string = .,pattern = "(,)",n = 3) %>% 
  as_tibble() %>% 
  rename(letter = V1,
         number = V2,
         sign = V3)

xx <- data %>%
  summarise(across(everything(),.fns = pipe_to_do))
xx

names_xx <- names(xx)

combine_names <- function(df,name) {
  str_c(name,"_",df)
}

combine_names_func <- function(df,name){
  df %>% 
    rename_with(.fn = ~ combine_names(.x,name))
}

map2(xx,names_xx,combine_names_func) %>% 
  reduce(bind_cols)

Answer 3 (kyvafyod)

Using data.table: reshape the data to long format with melt, split on the commas, then reshape back to wide format with dcast.

library(data.table)

d <- fread("gender  social  emotional   cognitive   family
1   1   1,2,4   3   2
2   2   3,4 4   1,2,4
1   3,4 1,3 1,2,3   1")

d[, id := .I
  ][, melt(.SD, id.vars = c("id", "gender"), variable.name = "grp")
    ][, .(x = paste(grp, unlist(tstrsplit(value, split = ",")), sep = "_")), by = .(id, gender, grp)
      ][, dcast(.SD, id + gender ~ x, \(i){sum(!is.na(i))}) ]

#    id gender cognitive_1 cognitive_2 cognitive_3 cognitive_4
# 1:  1      1           0           0           1           0
# 2:  2      2           0           0           0           1
# 3:  3      1           1           1           1           0
#    emotional_1 emotional_2 emotional_3 emotional_4 family_1 family_2
# 1:           1           1           0           1        0        1
# 2:           0           0           1           1        1        1
# 3:           1           0           1           0        1        0
#    family_4 social_1 social_2 social_3 social_4
# 1:        0        1        0        0        0
# 2:        1        0        1        0        0
# 3:        0        0        0        1        1
