R: error when splitting into new rows with a comma as the delimiter

Posted by o3imoua4 on 2023-02-14 in Other


I have the following data frame:

temp = structure(list(pid = c("s1", "s1", "s1"), LEFT_GENE = c("PTPRO", "EPS8", "DPY19L2,AC084357.2,AC027667.1"
), RIGHT_GENE = c("", "FOx,D", "DPY19L2P2,S100A11P1")), row.names = c(1L, 2L, 3L), class = "data.frame")

  pid                     LEFT_GENE          RIGHT_GENE
1  s1                         PTPRO                    
2  s1                          EPS8               FOx,D
3  s1 DPY19L2,AC084357.2,AC027667.1 DPY19L2P2,S100A11P1

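I want to split each comma-separated item into its own row and create every new combination. For example, the last row should produce 6 new rows. However, I get the following error, which I don't understand.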

temp %>%
  separate_rows(LEFT_GENE:RIGHT_GENE, sep=",") %>%  
  data.frame ( stringsAsFactors = F)

Error in `fn()`:
! In row 3, can't recycle input of size 3 to size 2.
Run `rlang::last_error()` to see where the error occurred.

The error seems to come from row 3, because rows 1:2 work fine:

> temp[1:2, 
+      ] %>%
+   separate_rows(LEFT_GENE:RIGHT_GENE, sep=",") %>%  
+   data.frame ( stringsAsFactors = F)
  pid LEFT_GENE RIGHT_GENE
1  s1     PTPRO           
2  s1      EPS8        FOx
3  s1      EPS8          D

Does anyone know what the problem is?

Source: https://stackoverflow.com/questions/75439645/error-while-splitting-into-new-row-with-comma-as-delimiter

2 Answers


ljsrvy3e1#

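To get every combination, separate one column at a time. When several columns are passed in a single call, separate_rows() splits them in parallel, so the pieces in each row have to line up; row 3 yields 3 pieces in LEFT_GENE and 2 in RIGHT_GENE, which is what the recycling error reports.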

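library(tidyr)

# Split one column per call so each split is crossed with the other column
temp %>%
   separate_rows(RIGHT_GENE, sep = ",") %>%
   separate_rows(LEFT_GENE, sep = ",")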

# A tibble: 9 × 3
  pid   LEFT_GENE  RIGHT_GENE 
  <chr> <chr>      <chr>      
1 s1    PTPRO      ""         
2 s1    EPS8       "FOx"      
3 s1    EPS8       "D"        
4 s1    DPY19L2    "DPY19L2P2"
5 s1    AC084357.2 "DPY19L2P2"
6 s1    AC027667.1 "DPY19L2P2"
7 s1    DPY19L2    "S100A11P1"
8 s1    AC084357.2 "S100A11P1"
9 s1    AC027667.1 "S100A11P1"
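To see why only row 3 fails in the original call, it can help to count the comma-separated pieces in each column. The following is a minimal base R sketch, not part of the original answer. `count_pieces` is a hypothetical helper introduced here for illustration, and treating an empty string as one (empty) piece is an assumption chosen to match the output above.

```r
# Rebuild the data frame from the question
temp <- data.frame(
  pid = c("s1", "s1", "s1"),
  LEFT_GENE = c("PTPRO", "EPS8", "DPY19L2,AC084357.2,AC027667.1"),
  RIGHT_GENE = c("", "FOx,D", "DPY19L2P2,S100A11P1")
)

# Hypothetical helper: number of comma-separated pieces per cell.
# strsplit() returns character(0) for "", so pmax() lifts that to 1
# (assumption: an empty cell counts as a single empty piece).
count_pieces <- function(x) pmax(lengths(strsplit(x, ",", fixed = TRUE)), 1L)

n_left  <- count_pieces(temp$LEFT_GENE)
n_right <- count_pieces(temp$RIGHT_GENE)

# Rows whose counts differ and where neither side is a single piece:
# these cannot be paired up element by element
mismatched <- which(n_left != n_right & n_left != 1L & n_right != 1L)

# Number of rows when every LEFT/RIGHT combination is kept
n_combinations <- sum(n_left * n_right)

print(data.frame(n_left, n_right))
print(mismatched)
print(n_combinations)
```

Row 3 is the only row flagged, with 3 pieces on the left and 2 on the right, and the combination count of 9 agrees with the 9-row tibble above.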

Answered 2023-02-14

2ul0zpep2#

If you need 6 rows, one option is:

library(dplyr)
library(tidyr)
library(stringr)
library(purrr)
temp %>% 
  mutate(across(ends_with("_GENE"), ~ strsplit(.x,  split = ",")), 
  cnt = pmax(lengths(LEFT_GENE), lengths(RIGHT_GENE))) %>% 
  mutate(across(ends_with("_GENE"),
    ~ map2(.x, cnt, ~ `length<-`(.x, .y)))) %>%
  select(-cnt) %>%
  unnest_longer(where(is.list))

Output

# A tibble: 6 × 3
  pid   LEFT_GENE  RIGHT_GENE
  <chr> <chr>      <chr>     
1 s1    PTPRO      <NA>      
2 s1    EPS8       FOx       
3 s1    <NA>       D         
4 s1    DPY19L2    DPY19L2P2 
5 s1    AC084357.2 S100A11P1 
6 s1    AC027667.1 <NA>
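The padding step in the pipeline above relies on `length<-`, which extends a vector to a requested length and fills the new positions with NA. A minimal base R sketch of what that does to a single split cell follows; the variable names are illustrative and do not appear in the original answer.

```r
# Pieces from row 3 of the question, already split on ","
left_pieces  <- c("DPY19L2", "AC084357.2", "AC027667.1")
right_pieces <- c("DPY19L2P2", "S100A11P1")

# Target length is the longer of the two, as computed by `cnt` in the pipeline
target <- max(length(left_pieces), length(right_pieces))

# `length<-` called as a function: extends the shorter vector with NA
right_padded <- `length<-`(right_pieces, target)

print(right_padded)
```

Once both list columns have the same length in every row, they can be unnested side by side, which is why the result has 6 rows rather than 9.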

If each NA should be replaced by the previous non-NA value, add fill() at the end:

...
%>% fill(ends_with("_GENE"))
# A tibble: 6 × 3
  pid   LEFT_GENE  RIGHT_GENE
  <chr> <chr>      <chr>     
1 s1    PTPRO      <NA>      
2 s1    EPS8       FOx       
3 s1    EPS8       D         
4 s1    DPY19L2    DPY19L2P2 
5 s1    AC084357.2 S100A11P1 
6 s1    AC027667.1 S100A11P1

Answered 2023-02-14
