Bioinformatics/Subsequence: Difference between revisions

Content added Content deleted

Inline

Revision as of 20:28, 27 March 2021

Task

Randomly generate a string of 200 DNA bases (represented by A, C, G, and T).

Write a routine to find all the positions of a randomly generated subsequence (four letters).

Factor

Works with: Factor version 0.99 2021-02-05

<lang factor>USING: accessors formatting grouping io kernel math math.functions.integer-logs math.parser random regexp sequences ;

new-dna ( n -- str ) [ "ACGT" random ] "" replicate-as ;

pad ( n d -- str ) [ number>string ] dip 32 pad-head ;

.dna ( seq n -- )

   seq length integer-log10 1 + :> d seq n group
   [ n * d pad write ": " write write nl ] each-index ;

.match ( slice -- ) [ from>> ] [ to>> ] bi "%d..%d\n" printf ;

.matches ( slices -- )

   "Matches found at the following indices:" print
   [ .match ] each ;

.locate ( slices -- )

   [ "No matches found." print ] [ .matches ] if-empty ;

.biosub ( dna-size row-size -- )

   [ new-dna dup ] [ .dna nl ] bi*
   4 new-dna dup "Subsequence to locate: %s\n" printf
   <regexp> all-matching-slices .locate ;

80 10 .biosub nl 600 39 .biosub nl</lang>

Output:

 0: ATTCAAGGAC
10: CACTATTAAC
20: CTGCATTGTG
30: AGAACTTGCA
40: GTGTACCGAG
50: AGCGAGTTTA
60: AAGCAACACA
70: TCTTTACCGA

Subsequence to locate: GTAG
No matches found.

  0: GATCTCGTCATGGTCCATCCTAACATTTCGGTTGTGGGC
 39: GCATCCCGATAGGCGAAGTTAAATCTACGTAGTCCTACG
 78: TCACGACGGAACATGATTGCCCACCGAAGTCGTAGGCGA
117: GCTAAAGTCGGTACATACACGATCTGCTATATTCGTTCT
156: CCGACACACGACATGCAATCCGAGAAGCTCTCGAAGTGC
195: GGTCAGATCCTCAGACTCGAACAGAGGAGACCTTAACTG
234: ATACCCACAGTACTTCTCGCATAACCTAAGCACCTATGC
273: TTACACCATCGTCCTGATATTGAGTGAGTCTGGTCGGAG
312: ATATTATCTAGCACCCTCAAGCTCTGTGTGCCACACCAG
351: GATTCCACTTCGCGCTTGCCTAGAGAAAGTAGAGTAGGT
390: GGTGTCATTAGTACACTGTTTGCGATGCACCAACCAAAC
429: CCGACCGCCATGATGACTGCTTTTCGGCCAACGTCAGAT
468: TAAGAGTACTTTTAGTAGCACCGCAAGCCAGCCGGTTTA
507: GCAAGATCCTGCAGCCTCCACGTTATTTCAGGTCTCTAA
546: GCGTTCTTTCCATGGAAGTAGTCACCGCTCCCGTTGCCA
585: ATGGACACAGACGTT

Subsequence to locate: ATAT
Matches found at the following indices:
145..149
289..293
312..316

Julia

<lang julia>DNArand(n, bases=['A', 'T', 'C', 'G']) = String(rand(bases, n))

DNAsearch(needle, haystack, lap=true) = findall(needle, haystack, overlap=lap)

const rand_string = DNArand(200) const subseq = DNArand(4)

println("Search sequence:\n$rand_string\nfor substring $subseq. Found at positions: ") foreach(p -> print(rpad(p[2], 8), p[1] % 10 == 0 ? "\n" : ""), enumerate(DNAsearch(subseq, rand_string)))

</lang>

Output:

Search sequence:
CCGAAGCCAGGAGGACTGAGCGCTTGCGTCCCGAGTTCTGCGACGAGTCTCTTCATTATAAGGCCACTGATTGCGCTCATCATGAGTGCCAGAAGCACCGCTAAACATAAGTGTCCTTTCTTCCTGACGCACTTGAAGATTGTGACCATTTGTGCGGGTTGTGAGTTAGGGGCTCTCATTGTACACGATCTATAGTGTGC
for substring CGCT. Found at positions:
21:24   74:77   99:102

Perl

<lang perl>use strict; use warnings; use feature 'say';

my @bases = <A C G T>; my $basecnt = 160;

my($string,$target); $string .= $bases[ int rand @bases ] for 1 .. $basecnt; $target .= $bases[ int rand @bases ] for 1 .. 4; say "Target: $target"; say 'Matches at these positions:'; say (($string =~ s/.{1,40}\K/\n/gr) =~ s/($target)/ >$1< /gr);</lang>

Output:

Target: CCTG
Matches at these positions:
9
90
157
TTGCC >CCTG< CAAAGTTAATAAGTAAACAATTAAGTGAGTG
CTCTAGGGTAAGGTGAGGGCGGGAAGGGGAAAAATACCGA
TGCGAG >CCTG< TAGAGCCGGGCCTCAAATTAAACGAAAAAT
ATAAGTTTGCTTGGCACGCTGTACTACTTATCC >CCTG< ACT

Phix

Note: match_all() is due to become a builtin in the next release, so the version below may or may not need renaming/deleting before it will run.
Currently only searches for non-overlapped sequences, but it should be pretty obvious how to change that, in which case the next underline will simply partially overwrite the previous, so you'll get eg "<=<==>".

constant cheat = false

function grandna(integer len)
    string dna = repeat(' ',len)
    for i=1 to len do dna[i] = "ACGT"[rand(4)] end for
    return dna
end function

procedure show(string dna, sequence idx)
    idx &= length(dna)+100 -- (add an otherwise unused sentinel)
    sequence s = split(trim(join_by(split(join_by(dna,1,10,""),"\n"),1,5," ")),"\n")
    integer ii = 1,         -- idx index
            i = idx[ii],    -- current target
            ux = 1,         -- underline index (1..4)
            ldx = 1         -- line index (1, 51, 101, etc)
    for si=1 to length(s) do
        printf(1,"%3d: %s\n",{ldx,s[si]})
        ldx += 50
        if i and i<ldx then
            string ul = repeat(' ',59)
            while i and i<ldx do
                integer up = i-ldx+51       -- underline pos (relative to ldx)
                up += floor((up-1)/10)+5    -- (plus any needed spacing)
                ul[up] = "<==>"[ux]
                ux += 1
                i += 1
                if ux>4 then
                    ux = 1
                    ii += 1
                    i = idx[ii]
                end if
            end while
            printf(1,"%s\n",ul)
        end if
    end for
    if length(idx) then
        string s = iff(length(idx)>1?"s":""),
               t = join(apply(idx,sprint),", ")
        printf(1,"%s occurs at location%s: %s\n",{test,s,t})
    else
        printf(1,"%s does not occur\n",{test})
    end if
end procedure

function match_all(object needle, sequence haystack, bool bOverlap = false)
    if atom(needle) then return find_all(needle,haystack) end if
    integer start = 1
    sequence res = {}
    while 1 do
        start = match(needle,haystack,start)
        if start=0 then exit end if
        res = append(res,start)
        start += iff(bOverlap?1:length(needle))
    end while
    return res
end function
 
string dna = grandna(200),
       test = grandna(4)
constant cheats = iff(cheat?{9,13,49,60,64,68}:{})
for i=1 to length(cheats) do
    dna[cheats[i]..cheats[i]+3] = test
end for
sequence idx = match_all(test,dna)
show(dna,idx)

Output:

with cheat enabled

  1: GGAGATATCG ACCGACCGAA GTAAAGTCAA AGTCGTCCAA TCCACGGACG
             <= =><==>                                   <=
 51: ACTTCAGCAC GACCGACCGA CCTATTTAAG AGACCACACT TAAGGAATCC
     =>       < ==><==><== >
101: ATGCGAAATA AAAATGGGCG AGTAGCCGTG GGGCGCTAAA GCACCCACCT
151: AGTTTTCGCC GAAGTACTAG ACCACCTTCG GATCGACAAA GCTTTCACCA
                                         <==>
CGAC occurs at locations: 9, 13, 49, 60, 64, 68, 184

with cheat disabled

  1: TGATTTAAAC CGTGGTGCAA TTTATAAACA CTGCGATATG CCTCCTGATG
 51: GCATGGTATT CGACACCAAG ACGCTGGTGG GCACACTGGC TTTCAGAATA
101: GGAGTCACAA TCCCTCTATG ATGTCCTCTA GCGGGTGTGT GTTCAGTGCC
151: AGCGCTTACT TCCGGCGTGG CCGACTCTTT TTAAAGCGTA TAGCTGGGGT
GCTA does not occur

Python

Works with: Python version 3.8

Library: regex

<lang python> from random import choice import regex as re import time

def generate_sequence(n: int ) -> list:

   return "".join([ choice(['A','C','G','T']) for _ in range(n) ])

def dna_findall(needle: str, haystack: str) -> None:

   if sum(1 for _ in re.finditer(needle, haystack, overlapped=True)) == 0:
       print("No matches found")
   else:
       print(f"Found {needle} at the following indices: ")
       for match in re.finditer(needle, haystack, overlapped=True):
           print(f"{match.start()}:{match.end()} ")

dna_seq = generate_sequence(200) sample_seq = generate_sequence(4)

c = 1 for i in dna_seq:

   print(i, end="") if c % 20 != 0 else print(f"{i}")
   c += 1

print(f"\nSearch Sample: {sample_seq}")

dna_findall(sample_seq, dna_seq) </lang>

Output:

TTGCCCCTGTACTGAGCCCA TAAGCTTGCACTCAAGGTTT TGCCCCCTCATATTATAACG CATCCATTATACAAAACCGA TACCCTTCCGCATATTATGA AAAGTGGCGAAGTGCCTTGA TTTGCATTCATAGTACAACG GTGCAAAAGCATTGTATGTC TCACATTTACATGGGAAATG CCTAGTAGGTGCAAGACCTG

Search Sample: TACA Found TACA at the following indices: 69:73 133:137 167:171

Raku

Chances are actually pretty small that a random 4 codon string will show up at all in a random 200 codon sequence. Bump up the sequence size to get a reasonable chance of multiple matches. <lang perl6>use String::Splice:ver<0.0.3>;

my $line = 80;

my $haystack = [~] <A C G T>.roll($line * 8);

say 'Needle: ' ~ my $needle = [~] <A C G T>.roll(4);

my $these = $haystack ~~ m:g/<$needle>/;

my @match = $these.map: { .from, .pos }

printf "From: %3s to %3s\n", |$_ for @match;

my $disp = $haystack.comb.batch($line)».join.join("\n");

for @match.reverse {

   $disp.=&splice(.[1] + .[1] div $line, "\e[0m" );
   $disp.=&splice(.[0] + .[0] div $line, "\e[31m");

}

say $disp;</lang>

Output:

Show in custom div to better display highlighting.

Needle: TAGC

From: 159 to 163

From: 262 to 266

From: 315 to 319

From: 505 to 509

From: 632 to 636

CATATGTGACACTGACAGCTCGCGCGAAAATCCGTGTGACGGTCTGAACACTATACTATAGGCCCGGTCGGCATTTGTGG

CTCCCCAGTGGAGAGACCACTCGTCAATTGCTGACGACTTAACACAAATCGAGTCGCCCTTAGTGCCAGACGGGACTCCT

AGCAAAGGGCGGCACGTGGTGACTCCCAATATGTGAGCATGCCATCTAATTGATCTGGGGGGTTTCGCGGGAATACCTAG

GGGCGTTCTGTCCATGGATCTCTAGCCCTGCGAAGAGATACCCGCAGTGAGTTGCACGTGCAAAGAACTTGTAACTAGCG

TATTCTGTATCCGCCGCGCGATATGCTTCTGCGGGATGTACTTCTTGTGACTAAGACTTTGTTATCCAAATTGACCAATA

TTCAACGGTCGACTCTCCGAGGCAGTATCGGTACGCCGAAAAATGGTTACTTCGGCCATACGTAACCTCTCAAGTCACGA

TTACAGCCCACGGGGGCTTACAGCATAGCTCCAAAGACATTCCAATTGAGCTACAACGTGTTCAGTGCGGAGCAGTATCC

AGTACTCGACTGTTATGGTAAAAGGGCATCGTGATCGTTTATATTAATCATTGGGACAGGTGGTTAATGTCATAGCTTAG

REXX

This REXX version allows the user to specify:

length of the (random) DNA data sequence (default is 200).
length of the (random) DNA sequence (default is four).
DNA proteins to be used in the sequence (default is ACGT).
width of the output lines of (random DNA) (default is 100).
often (if ever) to add a blank to the output (default is every 10 proteins).
DNA proteins to be searched in the data (the default is four unique random proteins).
the seed for the RANDOM function so runs can be repeated with the same data (no default).

<lang rexx>/*REXX pgm gens random DNA (ACGT) sequence & finds positions of a random 4─protein seq. */ parse arg totLen rndLen basePr oWidth Bevery rndDNA seed . if totLen== | totLen=="," then totLen= 200 /*Not specified? Then use the default.*/ if rndLen== | rndLen=="," then rndLen= 4 /* " " " " " " */ if basePr== | basePr=="," then basePr= 'acgt' /* " " " " " " */ if oWidth== | oWidth=="," then oWidth= 100 /* " " " " " " */ if Bevery== | Bevery=="," then Bevery= 10 /* " " " " " " */ if rndDNA== | rndDNA=="," then rndDNA= copies(., rndLen) /*what we're looking for*/ if datatype(seed, 'W') then call random ,,seed /*used to generate repeatable random #s*/ call genRnd /*gen data field of random proteins. */ call show /*show " " " " " */ say ' base DNA proteins used: ' basePr say 'random DNA proteins used: ' dna? call findRnd if @== then do; say "the random DNA proteins weren't found."; exit 4; end say 'the random DNA proteins were found in positions:' strip(@) exit 0 /*stick a fork in it, we're all done. */ /*──────────────────────────────────────────────────────────────────────────────────────*/ commas: parse arg ?; do jc=length(?)-3 to 1 by -3; ?=insert(',', ?, jc); end; return ? /*──────────────────────────────────────────────────────────────────────────────────────*/ findRnd: @=; p=0 /*@: list of the found target proteins*/

             do until p==0;    p= pos(dna?, $$, p+1);     if p>0  then @= @ commas(p)
  /*Found one?  Append it to the "Found"s*/
             end   /*p*/;             return

/*──────────────────────────────────────────────────────────────────────────────────────*/ genRnd: dna?=; use= basePr; upper use basePr rndDNA; lenB= length(basePr)

             do k=1  for rndLEN;      x= substr(rndDNA, k, 1)
             if x==.  then  do;  ?= random(1, length(use) );         x= substr(use, ?, 1)
                                 use= delstr(use, ?, 1)   /*elide so no protein repeats*/
                            end
             dna?= dna? || x                              /*build a random protein seq.*/
             end   /*k*/
       return

/*──────────────────────────────────────────────────────────────────────────────────────*/ show: say " index │"center('DNA sequence of ' commas(totLen) " proteins", oWidth+10)

     say "───────┼"center(                                            , oWidth+10, '─')
     $=; $$=;                 idx= 1               /*gen data field of random proteins.*/
        do j=1  for totLen;   c= substr( basePr, random(1, lenB), 1)
         $$= $$ || c                                       /*append a random protein.  */
         if Bevery\==0  then if j//Bevery==0  then $= $' ' /*possibly add a blank.     */
         if length( space($ || c, 0) )<oWidth  then do;   $= $  ||  c;   iterate;   end
         say strip( right(idx, 7)'│' $, 'T');  $=          /*display line ──► terminal.*/
         idx= idx + oWidth                                 /*bump the index number.    */
         end   /*j*/
     if $\==  then say right(idx, 7)"│" strip($, 'T')    /*show residual protein data*/
     say;                return</lang>

output when using the default inputs:

 index │                                                 DNA sequence of  200  proteins
───────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────
      1│ TTTTTAGCG CGTTTTGTAG CGCTCTAAAA ACCGTAGCTA TATTTCTCGA AGTTTCACCC AGCTCTTTTG CCCCAGGGTT GCGCTAAGCC CAGCTTCGAG
    101│ GGGGCACAG GTAAAATACT ACCGTCCGTG GAGGGGGATG AATTGACCCG ACATTTTTTG AAGCATAACT CGTGACTCAA TATTGCATGA TTACACCAGC

  base DNA proteins used:  ACGT
random DNA proteins used:  GCAT

the random DNA proteins were found in positions: 162 184

output when using the inputs of: 1000 , , , , tttt

 index │                                       DNA sequence of  1,000  proteins
───────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────
      1│ GTGATTTTT AGCGCGTTTT GTAGCGCTCT AAAAACCGTA GCTATATTTC TCGAAGTTTC ACCCAGCTCT TTTGCCCCAG GGTTGCGCTA AGCCCAGCTT
    101│ GAGGGGGGC ACAGGTAAAA TACTACCGTC CGTGGAGGGG GATGAATTGA CCCGACATTT TTTGAAGCAT AACTCGTGAC TCAATATTGC ATGATTACAC
    201│ AGCTAGGTT AGTGTAAAAA CCCCCCTATC TTCCTGATCA ATGGCGAGTA AAACATGCAA CCAATTTGTG AGCGAGTACT GGAAATTATT GTTTACGGGA
    301│ AGGCACATG CTACGCGCAA CAGATATCTT AGACTGACCC TTTTAGAGTC ATAAGCCCCT GTCGCCTACA TGCTACTAAT ACTCCAACTA GCGGCGCACC
    401│ TCAACCGGA TCATGGCGCC AGGGAAAATG TGGCGTAGCG ACGTGCTCAT CGCTCGCCGG GGAGAGCCTT TCAGAATCTC GAATAAAACC TGGTAATGAC
    501│ TCATCAATC GTAATGGTCG TCTGGGGCAA GAAGCCGATA TTATAGACTC AGGTCAGACG TGTGCACAAC GGCAGAATTT ATAGTAATTC GCGTGAACTA
    601│ GTTTCGGGA TAGGCCTACG ACCAATCATA GGACATTCGA TGCACGGTGT AGAAACAGTT CTCTGATGTT ACTCGGGATA ACACTCGCAA TCCCCTAGGA
    701│ ACCGTGAGC GTCGCTAGTA TCTGAGATAG TCGCGACTGC CCAGCGGTCT TTAAGTTCGC ACACTACGGG ACTCCTAGTT CGCCCATTCA TGGCTATTTT
    801│ CCTATCAGT CCAATCCCAC GGGGAGGGCA CTCGCGCAAT TCATTCAAAG AGGGCCATTT GCCGATATAA GGTCCATCAT CGGGAGGAAT ATGACTCCTG
    901│ TTAGTATTA GAGCAGCCTC GCTGCGTACT ACTGTCAGTG GCCCGTCAGG GAAGGCAAAA CGTTTTTCCT CTAGGAATCC GTCAATTGGA CTTCTAGACT

  base DNA proteins used:  ACGT
random DNA proteins used:  TTTT

the random DNA proteins were found in positions: 5 6 16 69 157 158 159 340 796 797 962 963

Ring

<lang ring> /*-----------------------------------

Project : DNA subsequences
Date : 2021/03/23
Author : Gal Zsolt (~ CalmoSoft ~)
Email : <calmosoft@gmail.com>

*/

//-----------------------------------------

load "stdlibcore.ring" load "guilib.ring"

start = 0 base = ["A","C","G","T"] dnaList = [] dnaSeq = [] ColLine = list(21)

C_Spacing = 2

C_ButtonDnaStyle = ' background-color: Red; border-radius: 8px;' C_ButtonStyle = '"background-color:white"; border-radius: 8px;' Button = newlist(10,20) LayoutButtonRow = list(10)

//-----------------------------------------

app = new qApp {

     win = new qWidget() {
           setWindowTitle('DNA subsequences')

setWinIcon(self,AppFile("white.jpg"))

           setStyleSheet('background-color:White')
           setgeometry(560,180,300,300)
           //reSize(400,400)
           winheight = 10 
           fontSize = 8 # + (winheight / 100)

           LayoutButtonMain = new QVBoxLayout()            
           LayoutButtonMain.setSpacing(C_Spacing)
           LayoutButtonMain.setContentsmargins(0,0,0,0)

           LabelInd = new qLabel(win) { settext("    DNA subsequences start positions:")
                                        setAlignment(Qt_AlignHCenter | Qt_AlignVCenter)
                                        setStyleSheet("background-color:yellow") }

           ButtonInd = new QPushButton(win) { setStyleSheet("background-color:yellow") }

           LabelFind = new qLabel(win) { settext("    DNA subsequence to find:")
                                         setStyleSheet("background-color:yellow") }

           ButtonFind = new QPushButton(win)

           DnaSearch = new QPushButton(win) { setclickevent("pstart()")
                                              setStyleSheet("background-color:yellow")
                                              settext("Find")
                                            }
   
           for Col = 1 to 21
               ColLine[Col] = new qLabel(win) {
                               setmaximumheight(20)
                               setAlignment(Qt_AlignHCenter | Qt_AlignVCenter) 
                               setStyleSheet("background-color:darkgray")
                               setText(string(Col-1))
                               } 
           next

           LayoutInd = new QHBoxLayout() { setSpacing(C_Spacing) setContentsMargins(0,0,0,0) }
           LayoutInd.AddWidget(LabelInd) 
           LayoutInd.AddWidget(ButtonInd) 
           LayoutButtonMain.AddLayout(LayoutInd) 
               
           LayoutTitleRow = new QHBoxLayout() { setSpacing(C_Spacing) setContentsMargins(0,0,0,0) }

               for Col = 1 to 21                
                   LayoutTitleRow.AddWidget(ColLine[Col])         
               next
                           
           LayoutButtonMain.AddLayout(LayoutTitleRow)  
           
           RowLine = list(10)

           for Row = 1 to 10
               Letter = "" + Row*20
               if Row*20 < 100
                  Letter = "  " + Row*20
               ok
               RowLine[Row] = new qLabel(win) { setFont(new qFont("Verdana",fontSize,40,0))
                                                setAlignment(Qt_AlignHCenter | Qt_AlignVCenter) 
                                                setStyleSheet("background-color:darkgray")
                                                setText(Letter)
                                              } 
           next

           for Row = 1 to 10
               LayoutButtonRow[Row] = new QHBoxLayout()    
               {
                   setSpacing(C_Spacing)
                   setContentsmargins(0,0,0,0)
               }

              LayoutButtonRow[Row].AddWidget(RowLine[Row])
              
              for Col = 1 to 20
                   Button[Row][Col] = new QPushButton(win) {
                                      setmaximumwidth(20) 
                                      }
                   LayoutButtonRow[Row].AddWidget(Button[Row][Col])    
              next
              
              LayoutButtonMain.AddLayout(LayoutButtonRow[Row])         
           next

           LayoutDataRow = new QHBoxLayout() { setSpacing(C_Spacing) setContentsMargins(0,0,0,0) }

           LayoutDataRow.AddWidget(LabelFind)
           LayoutDataRow.AddWidget(ButtonFind)               
           LayoutDataRow.AddWidget(DnaSearch)
           LayoutButtonMain.AddLayout(LayoutDataRow) 
                 
           setLayout(LayoutButtonMain)
           
           pStart()
           show()
  }
  exec()
}

//-----------------------------------------

func pStart()

    start = start + 1

    dnaList = []
    for row = 1 to 10
        for col = 1 to 20
            Button[row][col].settext("")
        next
    next
    for nr = 1 to 200
        rnd = random(3)+1
        baseStr = base[rnd]
        row = ceil(nr/20)
        col = nr%20
        if col = 0
           col = 20
        ok
        Button[row][col].settext(baseStr)
        add(dnaList,baseStr)
    next

    startDna()

//-----------------------------------------

func startDna()

    strDna = list2str(dnaList)
    strDna = substr(strDna,nl,"")

    while true
          strBase = ""
          for n = 1 to 4
              rnd = random(3)+1
              strBase = strBase + base[rnd]
          next
          ind = substr(strDna,strBase)
          if ind > 0
             exit
          ok
    end

    showDna(dnaList)

//-----------------------------------------

func showDna(dnaList)

    if start > 1
    see nl
    for n = 1 to len(dnaSeq)
        for m = 0 to 3
            ind = dnaSeq[n] + m
            row = ceil(ind/20)
            col = ind%20
            if col = 0
               col = 20
            ok
            Button[row][col].setstylesheet(C_ButtonStyle)
        next
    next
    ok

    dnaSeq = []
    strDna = list2str(dnaList)
    strDna = substr(strDna,nl,"")

    while true
          strBase = ""
          for n = 1 to 4
              rnd = random(3)+1
              strBase = strBase + base[rnd]
          next
          ind = substr(strDna,strBase)
          if ind > 0
             exit
          ok
    end

    ButtonFind.setStyleSheet("background-color:yellow")
    ButtonFind.settext(strBase)

    for n = 1 to 196
        flag = 1
        for m = 0 to 3
            if dnaList[n+m] != strBase[m+1]
               flag = 0
               exit
            ok
        next
        if flag = 1
           add(dnaSeq,n)
        ok
    next
    
    temp = ""
    ButtonInd.settext("")
    for nr = 1 to len(dnaList)
        ind = find(dnaSeq,nr)
        if ind > 0
           temp = temp + string(dnaSeq[ind]) + " "
           ButtonInd.settext(temp)
           for n = nr to nr + 3
               row = ceil(n/20)
               col = n%20
               if col = 0
                  col = 20
               ok
               Button[row][col].setStyleSheet(C_ButtonDnaStyle)
               Button[row][col].settext(dnaList[n])
           next
        ok
     next

//----------------------------------------- </lang>

Output:

Bioinformatics/Subsequence - video

Wren

Library: Wren-pattern

Library: Wren-str

Library: Wren-fmt

<lang ecmascript>import "random" for Random import "/pattern" for Pattern import "/str" for Str import "/fmt" for Fmt

var rand = Random.new() var base = "ACGT"

var findDnaSubsequence = Fn.new { |dnaSize, chunkSize|

   var dnaSeq = List.filled(dnaSize, null)
   for (i in 0...dnaSize) dnaSeq[i] = base[rand.int(4)]
   var dnaStr = dnaSeq.join()
   var dnaSubseq = List.filled(4, null)
   for (i in 0...4) dnaSubseq[i] = base[rand.int(4)]
   var dnaSubstr = dnaSubseq.join()
   System.print("DNA sequence:")
   var i = chunkSize
   for (chunk in Str.chunks(dnaStr, chunkSize)) {
        Fmt.print("$3d..$3d: $s", i - chunkSize + 1, i, chunk)
        i = i + chunkSize
   }
   System.print("\nSubsequence to locate: %(dnaSubstr)")
   var p = Pattern.new(dnaSubstr)
   var matches = p.findAll(dnaStr)
   if (matches.count == 0) {
       System.print("No matches found.")
   } else {
       System.print("Matches found at the following indices:")
       for (m in matches) {
           Fmt.print("$3d..$3d", m.index + 1, m.index + 4)
       }
   }

}

findDnaSubsequence.call(200, 20) System.print() findDnaSubsequence.call(600, 40)</lang>

Output:

DNA sequence:
  1.. 20: TATGGGCGCATTATGACAAC
 21.. 40: GGCTACTGAAACGAAAATTC
 41.. 60: ATGCCTTCGGAGGCTAGACC
 61.. 80: ACTCATACATGATTTACAGC
 81..100: TAGTCAGTTGCGTCCGCCAT
101..120: CCCGCATAACTATGTATTAC
121..140: GAGCATGTTCTGGCAACCTT
141..160: TCAGTGACAGTTCCTCAGGC
161..180: GCGTTCGCGTTGAAGGCCTC
181..200: CCCACACCGCACCCCTGCCG

Subsequence to locate: AATT
Matches found at the following indices:
 36.. 39

DNA sequence:
  1.. 40: GCGCTGAGCGCCCCAGTACAGCGGGTTAAACCGAGCCCGC
 41.. 80: TCCGATGAACCAACTCCCATTCCTATAATGGTGCCCCGAC
 81..120: ATATTGAATTCGGCGGGTCCGCTATCGGGCTGAGGATGCC
121..160: AATATCTAGGCGCTACCCTGAAGATCCTCAGTTGTGGTGT
161..200: CGCGGAGTGTCGATCCCAGAGCTCCCAATTGACTCAATTA
201..240: CTTTTTCCGTCCTCTTGCTTACGGATTTATGTTTGTGGCA
241..280: GAGGTTATGCTTCAGGCATCCCCATGTTTCCTGAGATACG
281..320: ACCACTGTCAGGTGGCTTGAATCTACCTTGTATTTCCTCT
321..360: AGTACCAGTCACTGTCATCTACTGGAAGCCATATCAGCGT
361..400: TGAAATGTCTATAATTTACTCTCCGGTTGTACCCAAGCGA
401..440: TAACAGCAACGTGTGGGTCTAAAGAGTTCCGCGTTTCGAC
441..480: ATAACGTGCTCCTATTTATCTACCGAAACACCCTATTTTC
481..520: CATCTAACCGGCACCCAATGCGCAGGTGTACGCGTCCTAC
521..560: TACGTTTGAAACGGTTCCATCTCGCCATGTACAATTGTGG
561..600: GGCTACGATTAAGTGTAGTCGGTAATTCAGGGTGAAGTTG

Subsequence to locate: TTCG
Matches found at the following indices:
 89.. 92
435..438