R: How do I combine two columns whose values are comma-separated lists?

fjnneemd  posted on 2023-11-14  in  Other

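I want to combine two columns, position and substitution. They look like this: position = c("123, 45", "67, 891, 23") and substitution = c("A/B, C/D", "E/F, G/H, J/K"), and I want to end up with a column that looks like this:
combined = c("A123B, C45D", "E67F, G891H, J23K")
Basically, I want to replace each / in substitution with the number from position that has the same "index" (it is not really an index yet, because each value is just one long string). Any ideas?
I want to create a column that lists amino acid substitutions, built from their positions and their reference and mutated amino acids. Unfortunately, the data comes in this odd format.
Here is a small example: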

library(tidyverse)

df <- tibble(name = c("A", "B"),
             position = c("123, 45", "67, 891, 23"),
             substitution = c("A/B, C/D", "E/F, G/H, J/K"))


mmvthczy #1

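Use the split-apply-combine approach twice.
1. Split the position and substitution columns on ", ". Store the results as new columns (suffixed "_1") so the original columns are preserved.
2. Split each element of the already-split substitution column on "/" to get the individual characters as vectors.
3. Use sprintf to build each string in the required order.
4. Paste the strings back together, separated by commas.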

library(dplyr)
library(purrr)

df %>%
  mutate(across(c(position, substitution), 
          ~strsplit(.x, ", "), .names = "{col}_1"), 
         substitution_1 = map(substitution_1, ~strsplit(.x, "/")), 
         combined = map2_chr(position_1, substitution_1, \(x, y) {
           toString(map2_chr(x, y, \(p, q) sprintf("%s%s%s", q[1], p, q[2])))
          })) %>%
  select(-ends_with("_1"))

#  name  position    substitution  combined         
#  <chr> <chr>       <chr>         <chr>            
#1 A     123, 45     A/B, C/D      A123B, C45D      
#2 B     67, 891, 23 E/F, G/H, J/K E67F, G891H, J23K
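
The four steps can also be traced by hand on a single row using only base R. This is just an illustrative sketch: the variable names (pos, sub_pairs, parts, pieces) are made up for the example and are not part of the answer above.

```r
# One row of the example data
position <- "123, 45"
substitution <- "A/B, C/D"

# Step 1: split both strings on ", "
pos <- strsplit(position, ", ")[[1]]            # "123" "45"
sub_pairs <- strsplit(substitution, ", ")[[1]]  # "A/B" "C/D"

# Step 2: split each pair on "/" into reference and mutated residue
parts <- strsplit(sub_pairs, "/")               # list(c("A", "B"), c("C", "D"))

# Step 3: build reference + position + mutant for each pair
pieces <- mapply(function(p, q) sprintf("%s%s%s", q[1], p, q[2]), pos, parts)

# Step 4: collapse the pieces with ", "
combined <- toString(pieces)
combined
```

The dplyr/purrr pipeline does the same thing for every row at once, with map2_chr playing the role of mapply.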


ffvjumwh #2

You can try the code below:

df %>%
    separate_rows(-name, sep = ",\\s+") %>%
    summarise(
        combined = toString(map2_chr(substitution, position, \(x, y) sub("/", y, x))),
        .by = name
    )

which gives

# A tibble: 2 × 2
  name  combined
  <chr> <chr>
1 A     A123B, C45D
2 B     E67F, G891H, J23K
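
The key step here is sub(), which replaces only the first match in each string, so each "ref/alt" pair gets its position inserted exactly once. Because sub() is not vectorised over its replacement argument, the answer pairs the values up with map2_chr. A minimal base R sketch of that step for one row, assuming the values have already been split (the names subs and pos are illustrative only):

```r
# sub() swaps the first "/" for the position
single <- sub("/", "123", "A/B", fixed = TRUE)

# Pairwise replacement over one row's values, then collapse with ", "
subs <- c("A/B", "C/D")
pos <- c("123", "45")
combined <- toString(mapply(function(s, p) sub("/", p, s, fixed = TRUE), subs, pos))
combined
```

fixed = TRUE is optional here, since "/" has no special meaning in a regular expression; it only makes the literal match explicit.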

0vvn1miw #3

All of the answers are great. A friend of mine came up with a different solution:

library(tidyverse)

df <- tibble(name = c("A", "B"),
             position = c("123, 45", "67, 891, 23"),
             substitution = c("A/B, C/D", "E/F, G/H, J/K"))

df <- 
  df |>
  separate_longer_delim(c(position, substitution), ", ") |>
  mutate(combined = paste0(substring(substitution, 1, 1), position, substring(substitution, 3, 3))) |>
  group_by(name) |>
  summarise(
    # collapse with ", " so the output matches the original "123, 45" format
    position = paste(position, collapse = ", "),
    substitution = paste(substitution, collapse = ", "),
    combined = paste(combined, collapse = ", ")
  )
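
The substring() calls rely on every substitution being exactly three characters long: one letter, a slash, one letter. A quick check of that assumption follows; the multi-letter value "Ala/Gly" is hypothetical and does not appear in the question's data.

```r
# Single-letter codes: characters 1 and 3 are the two residues
s <- "A/B"
first <- substring(s, 1, 1)
last <- substring(s, 3, 3)
ok <- paste0(first, "123", last)

# Hypothetical three-letter codes: characters 1 and 3 are no longer the residues
long <- "Ala/Gly"
bad <- paste0(substring(long, 1, 1), "123", substring(long, 3, 3))

ok
bad
```

For data like the example, where every residue is a single letter, the fixed positions are fine; splitting on "/" is the safer choice if that ever changes.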


neekobn8 #4

There are already many good answers. Here is another one, all tidyverse, but without using a replacement function:

library(tidyverse)    

df %>%
  separate_longer_delim(-name, delim = ", ") %>% 
  separate_wider_delim(substitution, names = c("start", "end"), delim = "/") %>% 
  mutate(name, combined = paste0(start, position, end), .keep = "none") %>% 
  summarise(.by = name, combined = str_flatten_comma(combined))

# A tibble: 2 × 2
  name  combined         
  <chr> <chr>            
1 A     A123B, C45D      
2 B     E67F, G891H, J23K
