Extract the chromosome number from a column [duplicate]

Posted by nbnkbykc on 2023-11-14 in Other

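This question already has an answer here:

Extracting numbers from vectors of strings (13 answers)
Closed two years ago.
I have a data frame with a column that contains chromosome details (1 to 22). I want to create another column that contains only the chromosome number.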

cbwuti44 1#

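With the stringr package and a regular expression you can get what you are looking for, but you need to know every format the column can take. If the part you want is separated from the unwanted information only by an underscore, you can use str_split with "_" as the pattern argument.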

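library(stringr)

# Example data: plain chromosomes and alternate contigs
df <- data.frame(chromosome = c("chr6_GL000253v2_alt", "chr6_GL000254v2_alt",
                                "chr6_GL000255v2_alt", "chr6_GL000256v2_alt", "chr4", "chr11",
                                "chr8", "chr12", "chr2", "chr12", "chr4", "chr6", "chr15", "chr4",
                                "chr2"))

# Keep the part before the first underscore, e.g. "chr6_GL000253v2_alt" -> "chr6"
# (the "chr" prefix is still present in the result)
df$chromosome_fixed <- str_split(df$chromosome, "_", simplify = TRUE)[, 1]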
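
The split above keeps the "chr" prefix, so the result is "chr6" rather than 6. As a rough sketch of how the remaining step might look, here is the same split-then-strip idea on plain Python strings (not R, and not part of the original answer). The helper name `chr_number` is hypothetical, and the code assumes every label starts with "chr" followed by digits:

```python
def chr_number(label):
    """Return the chromosome number from a label such as 'chr6_GL000253v2_alt'."""
    # Keep only the part before the first underscore: 'chr6_GL000253v2_alt' -> 'chr6'
    first = label.split("_", 1)[0]
    # Drop the leading 'chr' prefix if it is there: 'chr6' -> '6'
    if first.startswith("chr"):
        first = first[len("chr"):]
    # Return an int for purely numeric chromosomes, None otherwise
    return int(first) if first.isdecimal() else None


labels = ["chr6_GL000253v2_alt", "chr4", "chr11"]
numbers = [chr_number(s) for s in labels]
```

Returning None for anything that is not purely numeric is a design choice made for this sketch; it keeps unexpected labels visible instead of silently producing a wrong number.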


gab6jxml 2#

A solution using the data.table package:

REPREX

  • Code
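library(data.table)
library(stringr)

# DT is the data.table defined at the end of this answer.
# The lookbehind (?<=^chr) matches the digits that directly follow a leading "chr".
DT[, Chr_ID := str_extract(chromosome, "(?<=^chr)\\d+")]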


  • Output
DT
#>              chromosome Chr_ID
#>  1: chr6_GL000253v2_alt      6
#>  2: chr6_GL000254v2_alt      6
#>  3: chr6_GL000255v2_alt      6
#>  4: chr6_GL000256v2_alt      6
#>  5:                chr4      4
#>  6:               chr11     11
#>  7:                chr8      8
#>  8:               chr12     12
#>  9:                chr2      2
#> 10:               chr12     12
#> 11:                chr4      4
#> 12:                chr6      6
#> 13:               chr15     15
#> 14:                chr4      4
#> 15:                chr2      2

  • Your data
DT <- data.table(chromosome = c("chr6_GL000253v2_alt", "chr6_GL000254v2_alt",
                 "chr6_GL000255v2_alt", "chr6_GL000256v2_alt", "chr4", "chr11",
                 "chr8", "chr12", "chr2", "chr12", "chr4", "chr6", "chr15", "chr4",
                 "chr2"))
DT
#>              chromosome
#>  1: chr6_GL000253v2_alt
#>  2: chr6_GL000254v2_alt
#>  3: chr6_GL000255v2_alt
#>  4: chr6_GL000256v2_alt
#>  5:                chr4
#>  6:               chr11
#>  7:                chr8
#>  8:               chr12
#>  9:                chr2
#> 10:               chr12
#> 11:                chr4
#> 12:                chr6
#> 13:               chr15
#> 14:                chr4
#> 15:                chr2


Created on 2021-10-12 by the reprex package (v2.0.1)

ca1c2owp 3#

I created a similar column and extracted the numbers into a new column called Number:

import pandas as pd

# Populate a dummy table
df = pd.DataFrame(data=['chr6_GL', 'chr6_GL00', 'chr4', 'chr11', 'chr8', 'chr12'], columns=['Data'])
# Extract the digits after "chr" with a regex capture group and assign them to a new column called 'Number'
df['Number'] = df['Data'].str.extract(r'chr([0-9]+)', expand=False)
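
The same capture-group idea can be tried on plain strings with Python's standard `re` module, which makes the non-matching case easy to inspect. This is only a sketch and not part of the answer: the function name `extract_chr_number` and the extra label 'chrUn' are assumptions added for illustration. It uses `\d+` so that a label with no digits after "chr" gives None rather than an empty match.

```python
import re

# One or more digits directly after a leading "chr"
CHR_PATTERN = re.compile(r"^chr(\d+)")


def extract_chr_number(label):
    """Return the digits after a leading 'chr' as a string, or None if there are none."""
    match = CHR_PATTERN.match(label)
    return match.group(1) if match else None


# The first six labels come from the dummy table; 'chrUn' is a made-up non-numeric case
labels = ['chr6_GL', 'chr6_GL00', 'chr4', 'chr11', 'chr8', 'chr12', 'chrUn']
numbers = [extract_chr_number(s) for s in labels]
```

The results are strings; converting them with `int` is left out here because whether the column should be numeric depends on how it is used afterwards.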
