R: How to use numbers separated by commas as numeric variables

Posted by cigdeys3 on 2023-05-11 in Other


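I am working with survey responses from Qualtrics in R.
15 of the questions are multiple-choice questions where a person can select more than one option (for example, someone who selects options 1, 3 and 4 produces the output "1,3,4"). For illustration, suppose I have 4 questions (instead of 15): social, emotional, cognitive and family. If a person selects 1, 2 and 4 for social, the output will be "1,2,4"; if they select only "2" for family, the output will be "2". See the example data below: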
| gender | social | emotional | cognitive | family |
| --------------|--------------|--------------|--------------|--------------|
| 1 | 1 | 1,2,4 | 3 | 2 |
| 2 | 2 | 3,4 | 4 | 1,2,4 |
| 1 | 3,4 | 1,3 | 1,2,3 | 1 |
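Each number in the social/emotional/cognitive/family columns represents a category. If a respondent ticked "1", I have a positive response for that category; if they did not tick it, I have a negative response for that category. So each number in these columns is really a binary (positive/negative) response.
Therefore, to be able to analyse the data (chi-square), I would like the data frame to look like this: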
| gender | social1 | social2 | social3 | social4 |
| --------------|--------------|--------------|--------------|--------------|
| 1 | Yes | No | Yes | Yes |
| 2 | Yes | Yes | No | No |
| 1 | No | No | Yes | No |
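Is there a function, or a series of functions, that would let me do this?
Note that I have 15 questions (i.e. 15 columns), so I would prefer something that works on the whole data frame rather than on one question at a time.
I tried this (for each column):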

library(stringr)  # provides str_split_fixed()

data <- read.csv("data.csv")
social.data <- data.frame(Sex = data$gender,
                          social = data$social,
                          str_split_fixed(data$social, ',', 3))

R gives me the numbers split into separate columns. From there, I don't know what to do to get the desired data frame described above.

Source: https://stackoverflow.com/questions/76217067/how-to-use-numbers-separated-by-comma-as-numeric-variables

3 Answers


u0sqgete1#

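First, use str_split() to split the comma-separated strings into a list of numbers. You can then map over the known response values to create the binary variables.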

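library(stringr)  # str_split() comes from stringr
library(tidyr)
library(dplyr)
dat <- data.frame(
  gender = c(1,2,1), 
  social = c("1", "2", "3,4"), 
  emotional = c("1,2,4", "3,4", "1,3"), 
  cognitive = c("3", "4", "1,2,3"), 
  family=c("2", "1,2,4", "1")
)
# For each possible response value i (1 to 4), build one 0/1 column per question
purrr::map(1:4, \(i){
  dat %>% 
    # Turn every comma-separated string into a numeric vector (list-column)
    mutate(across(social:family,  ~purrr::map(str_split(.x, ","), as.numeric))) %>% 
    rowwise() %>% 
    # +(...) converts TRUE/FALSE to 1/0; new columns are named e.g. social1, family1
    transmute(across(social:family,  ~+(i %in% .x), .names = paste0("{.col}", i)))}) %>% 
  bind_cols() %>% 
  bind_cols(dat, .)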
#>   gender social emotional cognitive family social1 emotional1 cognitive1
#> 1      1      1     1,2,4         3      2       1          1          0
#> 2      2      2       3,4         4  1,2,4       0          0          0
#> 3      1    3,4       1,3     1,2,3      1       0          1          1
#>   family1 social2 emotional2 cognitive2 family2 social3 emotional3 cognitive3
#> 1       0       0          1          0       1       0          0          1
#> 2       1       1          0          0       1       0          1          0
#> 3       1       0          0          1       0       1          1          1
#>   family3 social4 emotional4 cognitive4 family4
#> 1       0       0          1          0       0
#> 2       0       0          1          1       1
#> 3       0       1          0          0       0
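
The same split-then-test-membership idea can also be sketched in base R, which may be convenient when the real data has 15 such columns. This is only an illustration under stated assumptions: `expand_choices()` and its arguments are hypothetical names that do not appear in the answer, and the sketch assumes every question has the same options, 1 to 4.

```r
# Sketch: expand comma-separated multi-select columns into 0/1 indicator columns.
# `expand_choices` is a hypothetical helper; `levels` assumes options 1 to 4.
expand_choices <- function(df, cols, levels = 1:4) {
  out <- df
  for (col in cols) {
    # One character vector of selected options per respondent
    picked <- strsplit(as.character(df[[col]]), ",", fixed = TRUE)
    for (lv in levels) {
      # 1 if the respondent selected option `lv`, otherwise 0
      out[[paste0(col, lv)]] <- vapply(
        picked,
        function(p) as.integer(as.character(lv) %in% trimws(p)),
        integer(1)
      )
    }
  }
  out
}

dat <- data.frame(
  gender = c(1, 2, 1),
  social = c("1", "2", "3,4"),
  emotional = c("1,2,4", "3,4", "1,3"),
  cognitive = c("3", "4", "1,2,3"),
  family = c("2", "1,2,4", "1")
)

res <- expand_choices(dat, c("social", "emotional", "cognitive", "family"))
res[, c("gender", "social1", "social2", "social3", "social4")]
```

Because the option values are passed in explicitly, a column such as family3 is created (all zeros) even though nobody selected that option.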

Old answer:

library(stringr)
library(tidyr)
library(dplyr)
dat <- data.frame(
  gender = c(1,2,1), 
  social = c("1", "2", "3,4"), 
  emotional = c("1,2,4", "3,4", "1,3"), 
  cognitive = c("3", "4", "1,2,3"), 
  family=c("2", "1,2,4", "1")
)
dat <- dat %>% 
  mutate(across(social:family,  ~purrr::map(str_split(.x, ","), as.numeric)))

You can then use unnest() on the list columns one at a time and pivot each of them wider with pivot_wider() from tidyr.

dat %>% 
  mutate(obs = row_number()) %>% 
  dplyr::select(obs, everything()) %>% 
  unnest(social) %>% 
  pivot_wider(names_from = "social", 
              values_from = "social", 
              values_fn =  function(x)as.numeric(!is.na(x)), 
              names_prefix="social", 
              values_fill=0) %>%
  unnest(emotional) %>% 
  pivot_wider(names_from = "emotional", 
              values_from = "emotional", 
              values_fn =  function(x)as.numeric(!is.na(x)), 
              names_prefix="emotional", 
              values_fill=0) %>% 
  unnest(cognitive) %>% 
  pivot_wider(names_from = "cognitive", 
              values_from = "cognitive", 
              values_fn =  function(x)as.numeric(!is.na(x)), 
              names_prefix="cognitive", 
              values_fill=0) %>% 
  unnest(family) %>% 
  pivot_wider(names_from = "family", 
              values_from = "family", 
              values_fn =  function(x)as.numeric(!is.na(x)), 
              names_prefix="family", 
              values_fill=0)
#> # A tibble: 3 × 17
#>     obs gender social1 social2 social3 social4 emotion…¹ emoti…² emoti…³ emoti…⁴
#>   <int>  <dbl>   <dbl>   <dbl>   <dbl>   <dbl>     <dbl>   <dbl>   <dbl>   <dbl>
#> 1     1      1       1       0       0       0         1       1       1       0
#> 2     2      2       0       1       0       0         0       0       1       1
#> 3     3      1       0       0       1       1         1       0       0       1
#> # … with 7 more variables: cognitive3 <dbl>, cognitive4 <dbl>,
#> #   cognitive1 <dbl>, cognitive2 <dbl>, family2 <dbl>, family1 <dbl>,
#> #   family4 <dbl>, and abbreviated variable names ¹emotional1, ²emotional2,
#> #   ³emotional4, ⁴emotional3

You could also wrap the unnest() and pivot_wider() steps in a function and then simply call that function on the data:

u_pivot <- function(.data, x){
  xn <- as_label(enquo(x))
  .data %>% 
    unnest({{ x }}) %>%
    pivot_wider(names_from =  xn,
                values_from =  xn,
                values_fn =  function(x)as.numeric(!is.na(x)),
                names_prefix= xn,
                values_fill=0)
}

dat %>% 
  mutate(obs = row_number()) %>% 
  u_pivot(social) %>% 
  u_pivot(emotional) %>% 
  u_pivot(cognitive) %>% 
  u_pivot(family)

#> # A tibble: 3 × 17
#>   gender   obs social1 social2 social3 social4 emotion…¹ emoti…² emoti…³ emoti…⁴
#>    <dbl> <int>   <dbl>   <dbl>   <dbl>   <dbl>     <dbl>   <dbl>   <dbl>   <dbl>
#> 1      1     1       1       0       0       0         1       1       1       0
#> 2      2     2       0       1       0       0         0       0       1       1
#> 3      1     3       0       0       1       1         1       0       0       1
#> # … with 7 more variables: cognitive3 <dbl>, cognitive4 <dbl>,
#> #   cognitive1 <dbl>, cognitive2 <dbl>, family2 <dbl>, family1 <dbl>,
#> #   family4 <dbl>, and abbreviated variable names ¹emotional1, ²emotional2,
#> #   ³emotional4, ⁴emotional3

Created on 2023-05-10 with reprex v2.0.2
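
Since the question asks for Yes/No values ahead of a chi-square analysis, one possible follow-up step is to recode the 0/1 indicators as factors and cross-tabulate them against gender. The sketch below is an assumption about how that might look, not part of the answer: `ind` is a small hypothetical data frame that copies the social1 and social2 values from the output above.

```r
# Hypothetical indicator data in the shape produced above (values taken from that output)
ind <- data.frame(
  gender  = c(1, 2, 1),
  social1 = c(1, 0, 0),
  social2 = c(0, 1, 0)
)

# Recode every indicator column (everything except gender) as a No/Yes factor
ind_cols <- setdiff(names(ind), "gender")
ind[ind_cols] <- lapply(ind[ind_cols], factor, levels = c(0, 1), labels = c("No", "Yes"))

# Contingency table of gender by social1; both levels are kept even if one is empty
tab <- table(gender = ind$gender, social1 = ind$social1)
tab
```

On the real survey data, `chisq.test(tab)` would be the natural next call; with only three example rows the test itself is not meaningful, so it is left out here.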

2023-05-11

bxgwgixi2#

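Perhaps this is a start. You can use dplyr::across to operate on several columns, or on all of them. The result will need some cleaning up afterwards, but it should get you started.
First, some data: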

library(tidyverse)
data <- tibble(gender=c(1,2),
               social=c(1,'1,2,4'),
               emotional=c(2,'3,4'))

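I'm not sure whether there is a way to program this so that it detects the number of commas. This approach adds blanks wherever a cell has fewer values than the maximum, and you need to hard-code that maximum!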

data %>% 
  mutate(across(.cols = everything(),
                .fns = ~str_split_fixed(.,',',3)))
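
One way the maximum might be derived from the data instead of hard-coded is to count the commas in every cell and add one. The following is a minimal sketch under assumptions: `choice_cols` and `n_max` are hypothetical names, and the data is the answer's example written with explicit strings.

```r
library(stringr)

# Same illustrative data as above, written as explicit strings
data <- data.frame(
  gender = c(1, 2),
  social = c("1", "1,2,4"),
  emotional = c("2", "3,4")
)

choice_cols <- c("social", "emotional")  # assumed names of the multi-select columns

# Largest number of selections in any cell: number of commas plus one
n_max <- max(str_count(unlist(data[choice_cols]), ",")) + 1

# Split each multi-select column into n_max pieces; shorter answers are padded with ""
pieces <- lapply(data[choice_cols], str_split_fixed, pattern = ",", n = n_max)
pieces$social
```

Restricting the split to `choice_cols` also leaves the gender column untouched.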

For renaming, you might look at this question: Splitting multiple string columns and rename the new columns adequately- R
Here is the code from that link, applied to this question:

pipe_to_do <- . %>%
  str_split_fixed(string = .,pattern = "(,)",n = 3) %>% 
  as_tibble() %>% 
  rename(letter = V1,
         number = V2,
         sign = V3)

xx <- data %>%
  summarise(across(everything(),.fns = pipe_to_do))
xx

names_xx <- names(xx)

combine_names <- function(df,name) {
  str_c(name,"_",df)
}

combine_names_func <- function(df,name){
  df %>% 
    rename_with(.fn = ~ combine_names(.x,name))
}

map2(xx,names_xx,combine_names_func) %>% 
  reduce(bind_cols)

2023-05-11

kyvafyod3#

Using data.table: reshape the data to long format with melt, split on the commas, then reshape it back to wide format with dcast:

library(data.table)

d <- fread("gender  social  emotional   cognitive   family
1   1   1,2,4   3   2
2   2   3,4 4   1,2,4
1   3,4 1,3 1,2,3   1")

d[, id := .I
  ][, melt(.SD, id.vars = c("id", "gender"), variable.name = "grp")
    ][, .(x = paste(grp, unlist(tstrsplit(value, split = ",")), sep = "_")), by = .(id, gender, grp)
      ][, dcast(.SD, id + gender ~ x, \(i){sum(!is.na(i))}) ]

#    id gender cognitive_1 cognitive_2 cognitive_3 cognitive_4
# 1:  1      1           0           0           1           0
# 2:  2      2           0           0           0           1
# 3:  3      1           1           1           1           0
#    emotional_1 emotional_2 emotional_3 emotional_4 family_1 family_2
# 1:           1           1           0           1        0        1
# 2:           0           0           1           1        1        1
# 3:           1           0           1           0        1        0
#    family_4 social_1 social_2 social_3 social_4
# 1:        0        1        0        0        0
# 2:        1        0        1        0        0
# 3:        0        0        0        1        1
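
Note that the output above has no family_3 column, because no respondent selected option 3 for family. If every option should appear regardless, one possibility is to add the missing columns afterwards. The sketch below assumes each question has options 1 to 4; `expected` and `missing_cols` are hypothetical names, and the long/wide steps are a spelled-out variant of the chain above rather than the answer's exact code.

```r
library(data.table)

d <- data.table(
  gender = c(1, 2, 1),
  social = c("1", "2", "3,4"),
  emotional = c("1,2,4", "3,4", "1,3"),
  cognitive = c("3", "4", "1,2,3"),
  family = c("2", "1,2,4", "1")
)
d[, id := .I]

# Long format: one row per respondent and question, then one row per selected option
long <- melt(d, id.vars = c("id", "gender"), variable.name = "grp")
long <- long[, .(opt = unlist(strsplit(value, ",", fixed = TRUE))),
             by = .(id, gender, grp)]
long[, x := paste(grp, opt, sep = "_")]

# Wide format: count of selections per option (0 or 1 here)
wide <- dcast(long, id + gender ~ x, fun.aggregate = length, value.var = "opt")

# Add an all-zero column for every option that nobody selected (assumes options 1 to 4)
expected <- as.vector(outer(c("social", "emotional", "cognitive", "family"),
                            1:4, paste, sep = "_"))
missing_cols <- setdiff(expected, names(wide))
if (length(missing_cols) > 0) wide[, (missing_cols) := 0L]
```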

2023-05-11
