R: Extracting the names at the beginning of the rows

2cmtqfgy  posted on 2023-06-19 in Other

I have a DataFrame, and I am trying to extract its row names, which contain the gene names, so that I can use them in a plot I am making.

> dput(head(top_genes, 10))
structure(list(gene = c("1", "2", "3", "4", "5", "6", "7", "8", 
"9", "10"), padj = c(4.06560302580566e-07, 4.06560302580566e-07, 
8.82394456455233e-06, 8.82394456455233e-06, 1.40670757561308e-05, 
3.14618269147922e-05, 5.45992164401531e-05, 6.357130849789e-05, 
9.57636036627077e-05, 0.000162558694623289)), row.names = c(NA, 
10L), class = "data.frame")

> top_genes
DataFrame with 10 rows and 6 columns
                 baseMean log2FoldChange     lfcSE      stat      pvalue        padj
                <numeric>      <numeric> <numeric> <numeric>   <numeric>   <numeric>
ENSG00000189057  208.4504        2.46758  0.374405   6.59067 4.37844e-11 4.06560e-07
ENSG00000144857  449.1676       -1.75691  0.264875  -6.63299 3.28954e-11 4.06560e-07
ENSG00000086570  108.1964       -3.11881  0.517200  -6.03018 1.63777e-09 8.82394e-06
ENSG00000188229 3466.9125        1.20077  0.199925   6.00608 1.90059e-09 8.82394e-06
ENSG00000073464   69.5251       -1.99824  0.339075  -5.89322 3.78738e-09 1.40671e-05
ENSG00000145362  345.8562       -2.00053  0.349257  -5.72796 1.01648e-08 3.14618e-05
ENSG00000259976  255.7929       -1.44627  0.257937  -5.60705 2.05802e-08 5.45992e-05
ENSG00000146648  450.7668       -1.63299  0.293841  -5.55738 2.73852e-08 6.35713e-05
ENSG00000118523 7688.2137       -2.51771  0.460736  -5.46454 4.64096e-08 9.57636e-05
ENSG00000167992   60.4870       -5.69455  1.064230  -5.35087 8.75336e-08 1.62559e-04

But when I try to use this code, I get numbers instead of names.

# Create a dataframe with the top genes data
top_genes <- data.frame(
  gene = c(row.names(top_genes)),
  padj = c(top_genes$padj)
)

# Create the plot
ggplot(top_genes, aes(x = gene, y = -log10(padj))) +
  geom_bar(stat = "identity", fill = "steelblue") +
  labs(x = "Gene", y = "-log10(Adjusted p-value)", title = "Differential Expression - Top Genes") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

(Image: example of the output)
How can I fix this?

6l7fqoea


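Your dput() output does not include the gene IDs as row names (it only has the automatic 1 to 10), so it is hard to reproduce your problem. When I recreated the dataset and the plotting call, I got the expected result.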

library(ggplot2)

# Recreate the data from the question's dput() output
top_genes <- structure(list(
  gene = c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10"),
  padj = c(4.06560302580566e-07, 4.06560302580566e-07, 8.82394456455233e-06,
           8.82394456455233e-06, 1.40670757561308e-05, 3.14618269147922e-05,
           5.45992164401531e-05, 6.357130849789e-05, 9.57636036627077e-05,
           0.000162558694623289)
), row.names = c(NA, 10L), class = "data.frame")


gene_names <- c('ENSG00000189057',  
                'ENSG00000144857', 
                'ENSG00000086570',  
                'ENSG00000188229',
                'ENSG00000073464',   
                'ENSG00000145362', 
                'ENSG00000259976', 
                'ENSG00000146648',  
                'ENSG00000118523', 
                'ENSG00000167992')

rownames(top_genes) <- gene_names

top_genes <- data.frame(
  gene = c(row.names(top_genes)),
  padj = c(top_genes$padj)
)

ggplot(top_genes, aes(x = gene, y = -log10(padj))) +
  geom_bar(stat = "identity", fill = "steelblue") +
  labs(x = "Gene", y = "-log10(Adjusted p-value)", title = "Differential Expression - Top Genes") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
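
One possible explanation for the numbers, offered as a guess rather than a diagnosis: the dput() output in the question already has gene = "1" to "10" and automatic row names, which is what you get if the data.frame(...) statement overwrites top_genes and is then run a second time in the same session. The sketch below reproduces that with a small stand-in table. The names res and plot_df are hypothetical, and only three of the genes are used.

```r
# Hypothetical stand-in for the original results table:
# the gene IDs live in the row names, as in the printed DataFrame.
res <- data.frame(
  padj = c(4.0656e-07, 8.82394e-06, 1.40671e-05),
  row.names = c("ENSG00000189057", "ENSG00000086570", "ENSG00000073464")
)

# First run: the gene column is correct, but overwriting the object
# replaces the gene-ID row names with automatic ones.
top_genes <- res
top_genes <- data.frame(gene = row.names(top_genes), padj = top_genes$padj)
first_run_genes <- as.character(top_genes$gene)
row_names_after_first_run <- row.names(top_genes)

# Second run of the same statement: the gene column is now rebuilt
# from the automatic row names, so it contains numbers.
top_genes <- data.frame(gene = row.names(top_genes), padj = top_genes$padj)
second_run_genes <- as.character(top_genes$gene)

# Writing the result to a new name keeps the source table intact,
# so the statement can be re-run safely.
plot_df <- data.frame(gene = row.names(res), padj = res$padj)
plot_df <- data.frame(gene = row.names(res), padj = res$padj)
```

If this is what happened, rebuilding top_genes from the original results object and assigning the plotting table to a different name should bring the gene IDs back.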

However, if you care about the gene names, I would suggest moving the row names into a column with the following command:

top_genes$gene_names <- row.names(top_genes)

Then update your ggplot call so that it reads like this:

ggplot(top_genes, aes(x = gene_names, y = -log10(padj))) +
  geom_bar(stat = "identity", fill = "steelblue") +
  labs(x = "Gene", y = "-log10(Adjusted p-value)", title = "Differential Expression - Top Genes") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
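
The printout in the question ("DataFrame with 10 rows and 6 columns") suggests that the original top_genes is a Bioconductor DataFrame rather than a base data.frame. Assuming that is the case, a minimal sketch of the same idea is to convert it with as.data.frame() first and store the result under a new name. The stand-in table below is a plain data.frame so the example runs without Bioconductor; res and plot_df are hypothetical names.

```r
# Hypothetical stand-in for the results table. With a Bioconductor
# DataFrame, the as.data.frame() call below is assumed to work the same way.
res <- data.frame(
  padj = c(4.0656e-07, 8.82394e-06, 1.40671e-05),
  row.names = c("ENSG00000189057", "ENSG00000086570", "ENSG00000073464")
)

# Convert to a base data.frame and copy the row names into a column.
plot_df <- as.data.frame(res)
plot_df$gene_names <- rownames(plot_df)

# Optional: drop the row names now that the IDs are stored in a column.
rownames(plot_df) <- NULL
```

The ggplot call above can then take plot_df as its data argument, with x = gene_names unchanged.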
