Bioinformatics/Subsequence
Bioinformatics/Subsequence is a draft programming task. It is not yet considered ready to be promoted as a complete task, for reasons that should be found in its talk page.
- Task
Randomly generate a string of 200 characters drawn from A, C, G, and T, representing a DNA sequence, then write a routine that finds the position(s) of a subsequence within it.
Let the length of the subsequence be 4.
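Python
The task can also be sketched with built-in string search rather than a character-by-character comparison. This is a minimal illustration, not a reference solution: the names `random_dna` and `find_all` are hypothetical helpers introduced here, positions are reported 1-based, and overlapping matches are assumed to count.

```python
import random

BASES = "ACGT"


def random_dna(length=200):
    """Return a random string of `length` characters drawn from A, C, G, T."""
    return "".join(random.choice(BASES) for _ in range(length))


def find_all(dna, pattern):
    """Return the 1-based start position of every match, overlaps included."""
    positions = []
    start = dna.find(pattern)
    while start != -1:
        positions.append(start + 1)           # convert 0-based index to 1-based
        start = dna.find(pattern, start + 1)  # advance by one so overlaps are found
    return positions


def main():
    dna = random_dna(200)
    pattern = random_dna(4)  # random 4-base subsequence to look for
    print("DNA sequence:")
    for offset in range(0, len(dna), 20):
        # Label each row with the position of its last base
        print(f"{offset + 20:3d}: {dna[offset:offset + 20]}")
    print("sequence to search:", pattern)
    hits = find_all(dna, pattern)
    if hits:
        for pos in hits:
            print("start position of subsequence =", pos)
    else:
        print("sequence not found")


if __name__ == "__main__":
    main()
```

Because the sequence and the search pattern are both random, the printed positions differ from run to run, and a run may report that the sequence was not found.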
Ring
<lang ring>
# Generate a random 200-base DNA sequence and print it 20 bases per row
row = 0
dnaList = []
base = ["A","C","G","T"]
long = 20
see "DNA sequence:" + nl
see "     12345678901234567890" + nl
see " " + long + ": "

for nr = 1 to 200
    row = row + 1
    rnd = random(3)+1          # random(3) yields 0..3, so rnd is 1..4
    baseStr = base[rnd]
    see baseStr
    # After every 20th base, start a new row labelled with its last position
    if (row%20) = 0 and long < 200
        long = long + 20
        see nl
        if long < 100
            see " " + long + ": "
        else
            see "" + long + ": "
        ok
    ok
    add(dnaList,baseStr)
next
see nl + "     12345678901234567890" + nl

# Build a random 4-base subsequence to search for
strBase = ""
for n = 1 to 4
    rnd = random(3)+1
    strBase = strBase + base[rnd]
next

see "sequence to search: " + strBase + nl

# Try every start position; the last valid start is 200 - 4 + 1 = 197
seqok = 0
for n = 1 to 197
    flag = 1
    for m = 0 to 3
        if dnaList[n+m] != strBase[m+1]
            flag = 0
            exit
        ok
    next
    if flag = 1
        seqok = 1
        see "start position of subsequence = " + n + nl
    ok
next

if seqok = 0
    see "sequence not found" + nl
ok
</lang>
- Output:
DNA sequence:
     12345678901234567890
 20: GAGTATAAAAAGCGACATAG
 40: AAGCAGGGGGGGAACAGACA
 60: ACAATTGTGAAAACTAATCA
 80: ATACGGAAAAGGATAAACAT
100: GAGGGACTGCGGTTGGTAGG
120: CGATGAAACCTAAGAATGAA
140: AACGAGGAAGGTGTAAAGTG
160: ATGGGGTCATGGGACAGACA
180: TAGCTAAATGGATAAAAGCG
200: GGTGAAGTCGGTCGCAAACG
     12345678901234567890
sequence to search: ATGA
start position of subsequence = 79
start position of subsequence = 103
start position of subsequence = 116