R: pivot_longer to compare case and control samples side by side without NA values

Asked by k4ymrczo on 2023-07-31, in Other

I have a data frame that looks similar to this:

gene_id gene_short_name S1-Case S2-Case S3-Case S4-Control S5-Control S6-Control
EN0001  TFG             0.003   0.001   0.002   0.001       0.002     0.003

I would like to get:

gene_id gene_short_name Case   Control
EN0001  TFG             0.003  0.001
EN0001  TFG             0.001  0.002
EN0001  TFG             0.002  0.003


The code I tried is this:

df_longer <- pivot_longer(df, cols  = -c(gene_id, gene_short_name), names_to = c("sample", ".value"), 
         names_sep = "-", names_repair = "check_unique" )


But this gives me:

gene_id gene_short_name  sample Case   Control
EN0001     TFG              S1      0.003  NA
EN0001     TFG              S2      0.001  NA
EN0001     TFG              S3      0.002  NA
EN0001     TFG              S4      NA     0.001
EN0001     TFG              S5      NA     0.002
EN0001     TFG              S6      NA     0.003


Is there a way to drop these NA values within pivot_longer, or do I have to rearrange the data before using pivot_longer? Thank you.

Answer #1 (mwg9r5ms)

Since the leading S[0-9]- part is to be discarded, we use names_pattern= and capture only the part after the hyphen.

pivot_longer(df, cols  = -c(gene_id, gene_short_name),
    names_to = ".value", names_pattern = ".*-(.*)")
# # A tibble: 3 × 4
#   gene_id gene_short_name  Case Control
#   <chr>   <chr>           <dbl>   <dbl>
# 1 EN0001  TFG             0.003   0.001
# 2 EN0001  TFG             0.001   0.002
# 3 EN0001  TFG             0.002   0.003
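
To see why this works, it can help to look at what the regular expression captures on its own. The following is a minimal base R sketch that applies the same pattern with sub(); the vector cols is just a hand-typed copy of the column names from the question, not something tidyr produces.

```r
# Column names from the question, typed in by hand for illustration
cols <- c("S1-Case", "S2-Case", "S3-Case",
          "S4-Control", "S5-Control", "S6-Control")

# ".*-" consumes everything up to the hyphen; "(.*)" captures the rest
captured <- sub(".*-(.*)", "\\1", cols)

print(captured)
print(unique(captured))
```

With names_to = ".value", each distinct captured string becomes an output column, so the six input columns collapse into Case and Control. Since no other names_to column keeps the samples apart, the values appear to be paired by their position within each group, which matches the three-row result shown above.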


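Edit: To keep S[1-6], the following works here and is probably general enough for most cases. I kept the helper variables rn1 and rn2 mainly to show what they contain; they can safely be removed after the second pivot: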
df %>%
  mutate(rn1 = row_number()) %>%
  pivot_longer(cols  = -c(rn1, gene_id, gene_short_name)) %>%
  separate(name, into = c("S", "var")) %>%
  group_by(rn1, var) %>%
  mutate(rn2 = row_number()) %>%
  ungroup() %>%
  pivot_wider(id_cols = c(gene_id, gene_short_name, rn1, rn2), names_from = var, values_from = c(value, S))
# # A tibble: 3 × 8
#   gene_id gene_short_name   rn1   rn2 value_Case value_Control S_Case S_Control
#   <chr>   <chr>           <int> <int>      <dbl>         <dbl> <chr>  <chr>    
# 1 EN0001  TFG                 1     1      0.003         0.001 S1     S4       
# 2 EN0001  TFG                 1     2      0.001         0.002 S2     S5       
# 3 EN0001  TFG                 1     3      0.002         0.003 S3     S6
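
If the helper columns are not wanted in the final result, they could be dropped with select() once the second pivot is done. This is a sketch, assuming dplyr and tidyr are installed; the name res is only illustrative, and id_cols is passed by name here rather than by position.

```r
library(dplyr)
library(tidyr)

# Example data from the question; check.names = FALSE keeps the hyphens
df <- data.frame(
  gene_id = "EN0001", gene_short_name = "TFG",
  "S1-Case" = 0.003, "S2-Case" = 0.001, "S3-Case" = 0.002,
  "S4-Control" = 0.001, "S5-Control" = 0.002, "S6-Control" = 0.003,
  check.names = FALSE
)

res <- df %>%
  mutate(rn1 = row_number()) %>%                  # index of the original row
  pivot_longer(cols = -c(rn1, gene_id, gene_short_name)) %>%
  separate(name, into = c("S", "var"), sep = "-") %>%
  group_by(rn1, var) %>%
  mutate(rn2 = row_number()) %>%                  # position within Case / Control
  ungroup() %>%
  pivot_wider(id_cols = c(gene_id, gene_short_name, rn1, rn2),
              names_from = var, values_from = c(value, S)) %>%
  select(-c(rn1, rn2))                            # helpers no longer needed

print(res)
```

rn1 keeps different genes apart and rn2 pairs the n-th case with the n-th control, so both have to stay in place until pivot_wider() has run.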


Data

df <- structure(list(gene_id = "EN0001", gene_short_name = "TFG", "S1-Case" = 0.003, "S2-Case" = 0.001, "S3-Case" = 0.002, "S4-Control" = 0.001, "S5-Control" = 0.002, "S6-Control" = 0.003), class = "data.frame", row.names = c(NA, -1L))

Answer #2 (6bc51xsx)

Not the most efficient code, but an alternative approach:

df %>% pivot_longer(cols = c(starts_with('S'))) %>% 
  separate(name, into = c('s','c'), sep = '-') %>% 
  pivot_wider(id_cols = c(gene_id,gene_short_name), names_from = c, values_from = value, values_fn = list) %>% 
  unnest(c(Case, Control))

Created on 2023-07-15 with reprex v2.0.2

# A tibble: 3 × 4
  gene_id gene_short_name  Case Control
  <chr>   <chr>           <dbl>   <dbl>
1 EN0001  TFG             0.003   0.001
2 EN0001  TFG             0.001   0.002
3 EN0001  TFG             0.002   0.003
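
To see why the unnest() step is needed, it may help to stop the pipeline one step earlier. In this sketch (assuming dplyr and tidyr are installed; the names wide and grp are only illustrative), values_fn = list collects the three values of each group into a single list-column cell, and unnest() then expands those cells in parallel.

```r
library(dplyr)
library(tidyr)

# Example data from the question; check.names = FALSE keeps the hyphens
df <- data.frame(
  gene_id = "EN0001", gene_short_name = "TFG",
  "S1-Case" = 0.003, "S2-Case" = 0.001, "S3-Case" = 0.002,
  "S4-Control" = 0.001, "S5-Control" = 0.002, "S6-Control" = 0.003,
  check.names = FALSE
)

wide <- df %>%
  pivot_longer(cols = starts_with("S")) %>%
  separate(name, into = c("sample", "grp"), sep = "-") %>%
  pivot_wider(id_cols = c(gene_id, gene_short_name),
              names_from = grp, values_from = value, values_fn = list)

# One row per gene; Case and Control are list-columns of three values each
print(lengths(wide$Case))

# unnest() expands both list-columns side by side into one row per pair
print(unnest(wide, c(Case, Control)))
```

Unnesting both columns together relies on each gene having the same number of case and control samples; with unequal group sizes this step would most likely fail rather than pad with NA.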
