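I have 5 random DNA sequences (each 20 bases long) and I want to find the base counts for each one.
In the first section I prepared a dna_length function to generate 5 sequences of 20 bases each. What I still need is the base count per sequence: how many "A", how many "C", how many "G", and how many "T" each sequence contains.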
-- Prepared statement: returns one random DNA string of $1 bases
prepare dna_length(int) as
with t1 as (select chr(65) as s union select chr(67) union select chr(71) union select chr(84)) -- A, C, G, T
, t2 as (select s, row_number() over() as rn from t1) -- number the four bases 1..4
, t3 as (select generate_series(1,$1) as i, round(random() * 4 + 0.5) as rn) -- random 1..4 per position
, t4 as (select t2.s from t2 join t3 on (t2.rn = t3.rn))
select array_to_string(array(select s from t4),'') as dna;
-- Generate 5 samples of 20 random bases each
with t1 as (
    select 1 as rn, 'A' as s
    union select 2, 'C'
    union select 3, 'T'
    union select 4, 'G'
), t2 as (
    select generate_series(1, 5) as sample
), t3 as (
    -- one row per (sample, position) with a random base number 1..4
    select t2.sample, generate_series(1,20) as i,
           round(random() * 4 + 0.5) as rn
    from t2
), t4 as (
    select t3.sample, t3.i, t3.rn, t1.s
    from t3
    join t1 on t1.rn = t3.rn
)
select sample, string_agg(s, '' order by i)
from t4
group by sample
order by sample;
The output currently looks like this:
id DNA
1 ACTGCTGCAGTCGTACGTAC
2 TGCAGTCGTAGCTGACGTAG
3 GCAGTGACCAACGTGTGACA
4 TGACGTGTCGAGACGAAGAG
5 CGTGTGAGAGTCGTAGAGTG
The result should look like this:
id DNA A C G T
1 ACTGCTGCAGTCGTACGTAC 4 6 5 5
2 TGCAGTCGTAGCTGACGTAG 4 4 7 5
3 GCAGTGACCAACGTGTGACA 6 5 6 3
4 TGACGTGTCGAGACGAAGAG 6 3 8 3
5 CGTGTGAGAGTCGTAGAGTG 4 2 9 5
1 Answer
You can do a conditional count in the final query:
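One way such a conditional count might be written is sketched below. It reuses the CTEs from the question and assumes PostgreSQL 9.4 or later, since it relies on the aggregate `filter` clause. The output aliases (`id`, `dna`, `a`, `c`, `g`, `t`) are chosen here only to match the desired table and are not part of the original query.

```sql
-- Sketch: same generator as in the question, plus one filtered count per base.
with t1 as (
    select 1 as rn, 'A' as s
    union all select 2, 'C'
    union all select 3, 'T'
    union all select 4, 'G'
), t2 as (
    select generate_series(1, 5) as sample
), t3 as (
    -- one row per (sample, position) with a random base number 1..4
    select t2.sample,
           generate_series(1, 20) as i,
           round(random() * 4 + 0.5) as rn
    from t2
), t4 as (
    select t3.sample, t3.i, t1.s
    from t3
    join t1 on t1.rn = t3.rn
)
select sample                          as id,
       string_agg(s, '' order by i)    as dna,
       count(*) filter (where s = 'A') as a,  -- rows in this sample whose base is A
       count(*) filter (where s = 'C') as c,
       count(*) filter (where s = 'G') as g,
       count(*) filter (where s = 'T') as t
from t4
group by sample
order by sample;
```

On versions without `filter`, `sum(case when s = 'A' then 1 else 0 end)` should give the same counts. If the sequences are already stored as strings rather than one row per base, `length(dna) - length(replace(dna, 'A', ''))` is another common way to count occurrences of a single character.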