Bioinformatics/Subsequence: Difference between revisions
Thundergnat (talk | contribs) (→{{header|Raku}}: Add a Raku example) |
(Added Wren) |
Revision as of 19:22, 20 March 2021
- Task
Randomly generate a string of 200 characters drawn from A, C, G, and T, representing a DNA sequence, then write a routine to find the positions of a subsequence (also randomly generated) within it.
Let the length of the subsequence be 4.
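As a language-neutral illustration of the task, here is a minimal Python sketch. The function names `random_dna` and `find_all` are hypothetical helpers chosen for this example, not part of any entry on this page. The sketch reports 1-based start positions and counts overlapping occurrences, which a plain window-by-window scan also does.

```python
import random

BASES = "ACGT"

def random_dna(length, rng):
    """Return a string of `length` bases drawn uniformly from ACGT."""
    return "".join(rng.choice(BASES) for _ in range(length))

def find_all(haystack, needle):
    """Return the 1-based start positions of every occurrence of needle,
    including overlapping ones."""
    positions = []
    start = haystack.find(needle)
    while start != -1:
        positions.append(start + 1)               # convert to 1-based
        start = haystack.find(needle, start + 1)  # step by one to allow overlaps
    return positions

if __name__ == "__main__":
    rng = random.Random()
    dna = random_dna(200, rng)
    sub = random_dna(4, rng)
    print("DNA sequence:", dna)
    print("Subsequence to locate:", sub)
    hits = find_all(dna, sub)
    if hits:
        for pos in hits:
            print("start position of subsequence =", pos)
    else:
        print("subsequence not found")
```

Because both strings are random, the output differs on every run; pass a seeded `random.Random(seed)` if reproducible output is wanted.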
Raku
Chances are actually pretty small that a random 4-base string will show up at all in a random 200-base sequence. Bump up the sequence size to get a reasonable chance of multiple matches.
<lang perl6>use String::Splice:ver<0.0.3>;
my $line = 80;
my $haystack = [~] <A C G T>.roll($line * 8);
say 'Needle: ' ~ my $needle = [~] <A C G T>.roll(4);
my $these = $haystack ~~ m:g/<$needle>/;
my @match = $these.map: { .from, .pos }
printf "From: %3s to %3s\n", |$_ for @match;
my $disp = $haystack.comb.batch($line)».join.join("\n");
for @match.reverse {
$disp.=&splice(.[1] + .[1] div $line, "\e[0m" ); $disp.=&splice(.[0] + .[0] div $line, "\e[31m");
}
say $disp;</lang>
- Output:
Show in custom div to better display highlighting.
Needle: TAGC
From: 159 to 163
From: 262 to 266
From: 315 to 319
From: 505 to 509
From: 632 to 636
CATATGTGACACTGACAGCTCGCGCGAAAATCCGTGTGACGGTCTGAACACTATACTATAGGCCCGGTCGGCATTTGTGG
CTCCCCAGTGGAGAGACCACTCGTCAATTGCTGACGACTTAACACAAATCGAGTCGCCCTTAGTGCCAGACGGGACTCCT
AGCAAAGGGCGGCACGTGGTGACTCCCAATATGTGAGCATGCCATCTAATTGATCTGGGGGGTTTCGCGGGAATACCTAG
GGGCGTTCTGTCCATGGATCTCTAGCCCTGCGAAGAGATACCCGCAGTGAGTTGCACGTGCAAAGAACTTGTAACTAGCG
TATTCTGTATCCGCCGCGCGATATGCTTCTGCGGGATGTACTTCTTGTGACTAAGACTTTGTTATCCAAATTGACCAATA
TTCAACGGTCGACTCTCCGAGGCAGTATCGGTACGCCGAAAAATGGTTACTTCGGCCATACGTAACCTCTCAAGTCACGA
TTACAGCCCACGGGGGCTTACAGCATAGCTCCAAAGACATTCCAATTGAGCTACAACGTGTTCAGTGCGGAGCAGTATCC
AGTACTCGACTGTTATGGTAAAAGGGCATCGTGATCGTTTATATTAATCATTGGGACAGGTGGTTAATGTCATAGCTTAG
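The remark in the Raku entry about match odds can be checked with a quick expected-value calculation. The sketch below assumes every base is drawn independently and uniformly, and it counts overlapping occurrences; `expected_matches` is a hypothetical helper name. By linearity of expectation, each of the `n - k + 1` windows matches with probability `1 / 4**k`.

```python
def expected_matches(seq_len, sub_len, alphabet_size=4):
    """Expected number of (possibly overlapping) occurrences of a fixed
    sub_len-letter string in a uniformly random seq_len-letter string."""
    windows = max(seq_len - sub_len + 1, 0)   # number of candidate start positions
    return windows / alphabet_size ** sub_len

print(expected_matches(200, 4))   # the task's 200-base sequence
print(expected_matches(640, 4))   # the 80 * 8 = 640-base haystack used above
```

Under these assumptions the expected count is about 0.77 for 200 bases and about 2.49 for 640 bases, which is consistent with the decision to enlarge the haystack.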
Ring
<lang ring>row = 0
dnaList = []
base = ["A","C","G","T"]
long = 20
see "DNA sequence:" + nl
see " " + long + ": "

# generate 200 random bases, printing them 20 per row
for nr = 1 to 200
    row = row + 1
    rnd = random(3)+1
    baseStr = base[rnd]
    see baseStr # + " "
    if (row%20) = 0 and long < 200
        long = long + 20
        see nl
        if long < 100
            see " " + long + ": "
        else
            see "" + long + ": "
        ok
    ok
    add(dnaList,baseStr)
next

# build the random 4-base subsequence
strBase = ""
for n = 1 to 4
    rnd = random(3)+1
    strBase = strBase + base[rnd]
next

see "subsequence to search: " + strBase + nl

seqok = 0
# the last possible start position is 200 - 4 + 1 = 197
for n = 1 to 197
    flag = 1
    for m = 0 to 3
        if dnaList[n+m] != strBase[m+1]
            flag = 0
            exit
        ok
    next
    if flag = 1
        seqok = 1
        see "start position of subsequence = " + n + nl
    ok
next

if seqok = 0
    see "subsequence not found" + nl
ok
</lang>
- Output:
DNA sequence:
 20: GAGTATAAAAAGCGACATAG
 40: AAGCAGGGGGGGAACAGACA
 60: ACAATTGTGAAAACTAATCA
 80: ATACGGAAAAGGATAAACAT
100: GAGGGACTGCGGTTGGTAGG
120: CGATGAAACCTAAGAATGAA
140: AACGAGGAAGGTGTAAAGTG
160: ATGGGGTCATGGGACAGACA
180: TAGCTAAATGGATAAAAGCG
200: GGTGAAGTCGGTCGCAAACG
subsequence to search: ATGA
start position of subsequence = 79
start position of subsequence = 103
start position of subsequence = 116
Wren
<lang ecmascript>import "random" for Random
import "/pattern" for Pattern
import "/str" for Str
import "/fmt" for Fmt

var rand = Random.new()
var base = "ACGT"

var findDnaSubsequence = Fn.new { |dnaSize, chunkSize|
    var dnaSeq = List.filled(dnaSize, null)
    for (i in 0...dnaSize) dnaSeq[i] = base[rand.int(4)]
    var dnaStr = dnaSeq.join()
    var dnaSubseq = List.filled(4, null)
    for (i in 0...4) dnaSubseq[i] = base[rand.int(4)]
    var dnaSubstr = dnaSubseq.join()
    System.print("DNA sequence:")
    var i = chunkSize
    for (chunk in Str.chunks(dnaStr, chunkSize)) {
        Fmt.print("$3d..$3d: $s", i - chunkSize + 1, i, chunk)
        i = i + chunkSize
    }
    System.print("\nSubsequence to locate: %(dnaSubstr)")
    var p = Pattern.new(dnaSubstr)
    var matches = p.findAll(dnaStr)
    if (matches.count == 0) {
        System.print("No matches found.")
    } else {
        System.print("Matches found at the following indices:")
        for (m in matches) {
            Fmt.print("$3d..$3d", m.index + 1, m.index + 4)
        }
    }
}

findDnaSubsequence.call(200, 20)
System.print()
findDnaSubsequence.call(600, 40)</lang>
- Output:
DNA sequence:
  1.. 20: TATGGGCGCATTATGACAAC
 21.. 40: GGCTACTGAAACGAAAATTC
 41.. 60: ATGCCTTCGGAGGCTAGACC
 61.. 80: ACTCATACATGATTTACAGC
 81..100: TAGTCAGTTGCGTCCGCCAT
101..120: CCCGCATAACTATGTATTAC
121..140: GAGCATGTTCTGGCAACCTT
141..160: TCAGTGACAGTTCCTCAGGC
161..180: GCGTTCGCGTTGAAGGCCTC
181..200: CCCACACCGCACCCCTGCCG

Subsequence to locate: AATT
Matches found at the following indices:
 36.. 39

DNA sequence:
  1.. 40: GCGCTGAGCGCCCCAGTACAGCGGGTTAAACCGAGCCCGC
 41.. 80: TCCGATGAACCAACTCCCATTCCTATAATGGTGCCCCGAC
 81..120: ATATTGAATTCGGCGGGTCCGCTATCGGGCTGAGGATGCC
121..160: AATATCTAGGCGCTACCCTGAAGATCCTCAGTTGTGGTGT
161..200: CGCGGAGTGTCGATCCCAGAGCTCCCAATTGACTCAATTA
201..240: CTTTTTCCGTCCTCTTGCTTACGGATTTATGTTTGTGGCA
241..280: GAGGTTATGCTTCAGGCATCCCCATGTTTCCTGAGATACG
281..320: ACCACTGTCAGGTGGCTTGAATCTACCTTGTATTTCCTCT
321..360: AGTACCAGTCACTGTCATCTACTGGAAGCCATATCAGCGT
361..400: TGAAATGTCTATAATTTACTCTCCGGTTGTACCCAAGCGA
401..440: TAACAGCAACGTGTGGGTCTAAAGAGTTCCGCGTTTCGAC
441..480: ATAACGTGCTCCTATTTATCTACCGAAACACCCTATTTTC
481..520: CATCTAACCGGCACCCAATGCGCAGGTGTACGCGTCCTAC
521..560: TACGTTTGAAACGGTTCCATCTCGCCATGTACAATTGTGG
561..600: GGCTACGATTAAGTGTAGTCGGTAATTCAGGGTGAAGTTG

Subsequence to locate: TTCG
Matches found at the following indices:
 89.. 92
435..438