Bioinformatics/Subsequence
Edit by Thundergnat: Add a Raku example
Revision as of 18:31, 20 March 2021
- Task
Randomly generate a string of 200 characters drawn from A, C, G, and T, representing a DNA sequence, and write a routine to find the position(s) of a subsequence (also generated randomly) within it.
Let the length of the subsequence be 4.
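A minimal Python sketch of the task, for illustration only. Python is used because this page has no single host language, and `random_dna`, `find_positions`, `expected_count` and `prob_at_least_one` are hypothetical names, not taken from either entry. `find_positions` tests every window of 4 bases and reports 1-based positions, as the Ring entry does, so overlapping matches are included. The two probability helpers show why a 200-base sequence often yields no match: a fixed 4-base string is expected (200 - 4 + 1) / 4^4 = 197/256, about 0.77, times, and `prob_at_least_one` computes the exact chance of at least one match with a KMP-style automaton (a technique chosen for this sketch, not one used by the entries below).

```python
import random
from fractions import Fraction

BASES = "ACGT"


def random_dna(n: int, rng: random.Random) -> str:
    """Random DNA string of n bases, each drawn uniformly from A, C, G, T."""
    return "".join(rng.choice(BASES) for _ in range(n))


def find_positions(seq: str, sub: str) -> list[int]:
    """1-based start positions of every occurrence of sub in seq, overlaps included."""
    k = len(sub)
    return [i + 1 for i in range(len(seq) - k + 1) if seq[i:i + k] == sub]


def expected_count(n: int, k: int) -> Fraction:
    """Expected number of occurrences of a fixed k-mer in a random n-base string."""
    return Fraction(max(n - k + 1, 0), 4 ** k)


def prob_at_least_one(pattern: str, n: int) -> Fraction:
    """Exact probability that pattern occurs in a random n-base string (KMP automaton)."""
    k = len(pattern)
    if k == 0:
        return Fraction(1)
    # fail[i]: length of the longest proper prefix of pattern[:i+1] that is also its suffix
    fail = [0] * k
    j = 0
    for i in range(1, k):
        while j and pattern[i] != pattern[j]:
            j = fail[j - 1]
        if pattern[i] == pattern[j]:
            j += 1
        fail[i] = j

    def step(state: int, ch: str) -> int:
        # state = number of pattern characters matched so far (always < k here)
        while state and pattern[state] != ch:
            state = fail[state - 1]
        return state + 1 if pattern[state] == ch else 0

    # Probability mass of each automaton state over strings with no match so far
    dist = {0: Fraction(1)}
    for _ in range(n):
        nxt: dict[int, Fraction] = {}
        for state, mass in dist.items():
            for ch in BASES:
                s = step(state, ch)
                if s < k:  # reaching state k means a match; that mass is dropped
                    nxt[s] = nxt.get(s, Fraction(0)) + mass / 4
        dist = nxt
    return 1 - sum(dist.values(), Fraction(0))


if __name__ == "__main__":
    rng = random.Random()
    dna = random_dna(200, rng)
    sub = random_dna(4, rng)
    print("DNA sequence:", dna)
    print("subsequence to search:", sub)
    hits = find_positions(dna, sub)
    for pos in hits:
        print("start position of subsequence =", pos)
    if not hits:
        print("subsequence not found")
    print("expected matches:", float(expected_count(200, 4)))
    print("P(at least one match):", float(prob_at_least_one(sub, 200)))
```

For instance, `prob_at_least_one("ACGT", 5)` is exactly 1/128: the pattern can start at position 1 or 2, but not both. The exact probability for 200 bases varies slightly with the pattern, because self-overlapping patterns such as AAAA tend to occur in clusters.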
Raku
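A random 4-base string will probably occur at most once, and quite possibly not at all, in a random 200-base sequence: the expected number of occurrences is only (200 - 4 + 1) / 4^4 = 197/256, about 0.77. So the sequence size is bumped up to 8 lines of 80 bases (640 in all) to get a reasonable chance of multiple matches.
<lang perl6>use String::Splice:ver<0.0.3>;
# Number of bases per display line
my $line = 80;
# 640 random bases
my $haystack = [~] <A C G T>.roll($line * 8);
# Random 4-base needle, printed as it is generated
say 'Needle: ' ~ my $needle = [~] <A C G T>.roll(4);
# All non-overlapping matches of the needle
my $these = $haystack ~~ m:g/<$needle>/;
# Start offset and end offset (exclusive) of each match
my @match = $these.map: { .from, .pos }
printf "From: %3s to %3s\n", |$_ for @match;
# Break the haystack into lines of $line bases
my $disp = $haystack.comb.batch($line)».join.join("\n");
# Insert the color codes from the last match backwards so earlier offsets stay valid;
# "div $line" adds the number of newlines that precede each offset
for @match.reverse {
    $disp.=&splice(.[1] + .[1] div $line, "\e[0m" );
    $disp.=&splice(.[0] + .[0] div $line, "\e[31m");
}
say $disp;</lang>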
- Output:
On a terminal the matches are printed in red using ANSI escape codes; that highlighting is not reproduced in the plain listing below.
Needle: TAGC
From: 159 to 163
From: 262 to 266
From: 315 to 319
From: 505 to 509
From: 632 to 636
CATATGTGACACTGACAGCTCGCGCGAAAATCCGTGTGACGGTCTGAACACTATACTATAGGCCCGGTCGGCATTTGTGG
CTCCCCAGTGGAGAGACCACTCGTCAATTGCTGACGACTTAACACAAATCGAGTCGCCCTTAGTGCCAGACGGGACTCCT
AGCAAAGGGCGGCACGTGGTGACTCCCAATATGTGAGCATGCCATCTAATTGATCTGGGGGGTTTCGCGGGAATACCTAG
GGGCGTTCTGTCCATGGATCTCTAGCCCTGCGAAGAGATACCCGCAGTGAGTTGCACGTGCAAAGAACTTGTAACTAGCG
TATTCTGTATCCGCCGCGCGATATGCTTCTGCGGGATGTACTTCTTGTGACTAAGACTTTGTTATCCAAATTGACCAATA
TTCAACGGTCGACTCTCCGAGGCAGTATCGGTACGCCGAAAAATGGTTACTTCGGCCATACGTAACCTCTCAAGTCACGA
TTACAGCCCACGGGGGCTTACAGCATAGCTCCAAAGACATTCCAATTGAGCTACAACGTGTTCAGTGCGGAGCAGTATCC
AGTACTCGACTGTTATGGTAAAAGGGCATCGTGATCGTTTATATTAATCATTGGGACAGGTGGTTAATGTCATAGCTTAG
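The highlighting step of the Raku entry can be sketched in Python as well. This is only an illustration; `match_spans`, `highlight`, `RED` and `RESET` are hypothetical names. Like `m:g/<$needle>/`, `re.finditer` yields non-overlapping matches. A haystack offset k lands at k + k // width in the line-broken text, since k // width newlines precede it, and the escape codes are inserted from the last match backwards so the offsets still to be processed are not shifted.

```python
import random
import re

RED = "\x1b[31m"   # ANSI SGR: red foreground
RESET = "\x1b[0m"  # ANSI SGR: reset all attributes


def match_spans(haystack: str, needle: str) -> list[tuple[int, int]]:
    """Non-overlapping (start, end) spans of needle in haystack, end exclusive."""
    return [(m.start(), m.end()) for m in re.finditer(re.escape(needle), haystack)]


def highlight(haystack: str, spans: list[tuple[int, int]], width: int = 80) -> str:
    """Break haystack into lines of `width` bases and wrap each span in color codes."""
    disp = "\n".join(haystack[i:i + width] for i in range(0, len(haystack), width))
    # Work backwards so earlier offsets are unaffected by text already inserted.
    for start, end in reversed(spans):
        e = end + end // width  # shift by the number of newlines before offset `end`
        disp = disp[:e] + RESET + disp[e:]
        s = start + start // width
        disp = disp[:s] + RED + disp[s:]
    return disp


if __name__ == "__main__":
    width = 80
    haystack = "".join(random.choices("ACGT", k=width * 8))
    needle = "".join(random.choices("ACGT", k=4))
    spans = match_spans(haystack, needle)
    print("Needle:", needle)
    for start, end in spans:
        print(f"From: {start:3} to {end:3}")
    print(highlight(haystack, spans, width))
```

A match that ends exactly at the end of a line gets its reset code after the line break, as in the Raku version; the terminal output looks the same.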
Ring
<lang ring>
row = 0
dnaList = []
base = ["A","C","G","T"]
long = 20
see "DNA sequence:" + nl
see " " + long + ": "

# Generate 200 random bases, printing them 20 per row,
# each row labelled with the position of its last base
for nr = 1 to 200
    row = row + 1
    rnd = random(3)+1
    baseStr = base[rnd]
    see baseStr # + " "
    if (row%20) = 0 and long < 200
        long = long + 20
        see nl
        if long < 100
            see " " + long + ": "
        else
            see "" + long + ": "
        ok
    ok
    add(dnaList,baseStr)
next

# Generate the random 4-base subsequence
strBase = ""
for n = 1 to 4
    rnd = random(3)+1
    strBase = strBase + base[rnd]
next

see nl + "subsequence to search: " + strBase + nl
seqok = 0

# Compare every window of 4 bases; the last possible start is 200 - 4 + 1 = 197
for n = 1 to 197
    flag = 1
    for m = 0 to 3
        if dnaList[n+m] != strBase[m+1]
            flag = 0
            exit
        ok
    next
    if flag = 1
        seqok = 1
        see "start position of subsequence = " + n + nl
    ok
next

if seqok = 0
    see "subsequence not found" + nl
ok
</lang>
- Output:
DNA sequence:
 20: GAGTATAAAAAGCGACATAG
 40: AAGCAGGGGGGGAACAGACA
 60: ACAATTGTGAAAACTAATCA
 80: ATACGGAAAAGGATAAACAT
100: GAGGGACTGCGGTTGGTAGG
120: CGATGAAACCTAAGAATGAA
140: AACGAGGAAGGTGTAAAGTG
160: ATGGGGTCATGGGACAGACA
180: TAGCTAAATGGATAAAAGCG
200: GGTGAAGTCGGTCGCAAACG
subsequence to search: ATGA
start position of subsequence = 79
start position of subsequence = 103
start position of subsequence = 116
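The labelled listing and the reported positions can be cross-checked with a short Python sketch. It is an illustration only: `ROWS` and `format_rows` are hypothetical names, and `ROWS` simply copies the ten rows of the output above. Each row label is the 1-based position of the row's last base, so a 1-based window search over the joined rows should reproduce the positions the Ring program printed.

```python
# The ten 20-base rows of the Ring output above, in order.
ROWS = [
    "GAGTATAAAAAGCGACATAG",
    "AAGCAGGGGGGGAACAGACA",
    "ACAATTGTGAAAACTAATCA",
    "ATACGGAAAAGGATAAACAT",
    "GAGGGACTGCGGTTGGTAGG",
    "CGATGAAACCTAAGAATGAA",
    "AACGAGGAAGGTGTAAAGTG",
    "ATGGGGTCATGGGACAGACA",
    "TAGCTAAATGGATAAAAGCG",
    "GGTGAAGTCGGTCGCAAACG",
]


def format_rows(seq: str, width: int = 20) -> str:
    """Label each row with the 1-based position of its last base, right-aligned to 3 columns."""
    rows = []
    for i in range(0, len(seq), width):
        chunk = seq[i:i + width]
        rows.append(f"{i + len(chunk):>3}: {chunk}")
    return "\n".join(rows)


seq = "".join(ROWS)
# 1-based start positions of "ATGA"; range(len(seq) - 3) includes the last window (start 197).
hits = [i + 1 for i in range(len(seq) - 3) if seq[i:i + 4] == "ATGA"]

print(format_rows(seq))
print(hits)
```

The search returns 79, 103 and 116, the same positions the Ring program reports.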