Bioinformatics/Sequence mutation: Difference between revisions

m
No edit summary
m (→‎{{header|jq}}: 0 .. 300)
 
(3 intermediate revisions by the same user not shown)
Line 2,243:
Σ: 261
</pre>
=={{header|jq}}==
{{Works with|jq}}
 
'''Works with gojq, the Go implementation of jq'''
 
'''Adapted from [[#Wren|Wren]]'''
 
Since jq does not include a PRNG, the following assumes that
an external source of entropy such as /dev/urandom is available.
See the "Invocation" section below for details.
<syntaxhighlight lang="jq">
### Generic utilities
 
# Output: a PRN in range(0; .)
def prn:
if . == 1 then 0
else . as $n
| (($n-1)|tostring|length) as $w
| [limit($w; inputs)] | join("") | tonumber
| if . < $n then . else ($n | prn) end
end;
 
# bag of words
def bow(stream):
reduce stream as $word ({}; .[($word|tostring)] += 1);
 
# Emit a stream of the constituent characters of the input string
def chars: explode[] | [.] | implode;
 
def lpad($len): tostring | ($len - length) as $l | (" " * $l) + .;
 
# Print $n-character segments at a time, each prefixed by a 1-based index
def pretty_nwise($n):
(length | tostring | length) as $len
| def _n($i):
if length == 0 then empty
else "\($i|lpad($len)): \(.[:$n])",
(.[$n:] | _n($i+$n))
end;
_n(1);
### Biology
def bases: ["A", "C", "G", "T"];
 
def randomBase:
bases | .[length|prn];
 
# $w is an array [weightSwap, weightDelete, weightInsert]
# specifying the weights out of 300 for each of swap, delete and insert
# Input: an object {dna}
# Output: an object {dna, message}
def mutate($w):
def removeAt($p): .[:$p] + .[$p+1:];
(.dna|length) as $le
# get a random position in the dna to mutate
| ($le | prn) as $p
# get a random number between 0 and 299 inclusive
| (300 | prn) as $r
| .dna |= [chars]
| if $r < $w[0]
then # swap
randomBase as $base
| .message = " Change @\($p) \(.dna[$p]) to \($base)"
| .dna[$p] = $base
elif $r < $w[0] + $w[1]
then # delete
.message = " Delete @\($p) \(.dna[$p])"
| .dna |= removeAt($p)
else # insert
randomBase as $base
| .message = " Insert @\($p) \($base)"
| .dna |= .[:$p] + [$base] + .[$p:]
end
| .dna |= join("") ;
 
# Generate a random dna sequence of given length:
def generate($n):
[range(0; $n) | randomBase] | join("");
 
# Pretty print dna and stats.
def prettyPrint($rowLen):
"SEQUENCE:", pretty_nwise($rowLen),
( bow(chars) as $baseMap
| "\nBASE COUNT:",
( bases[] as $c | " \($c): \($baseMap[$c] // 0)" ),
" ------",
" Σ: \(length)",
" ======\n"
) ;
 
# For displaying the weights
def pretty_weights:
" Change: \(.[0])\n Delete: \(.[1])\n Insert: \(.[2])";
 
# Arguments are length, weights, mutations
def task($n; $w; $muts ):
generate($n)
| . as $dna
| prettyPrint(50),
"\nWEIGHTS (0 .. 300):", ($w|pretty_weights),
"\nMUTATIONS (\($muts)):",
(reduce range(0;$muts) as $i ({$dna};
mutate($w)
| .emit += [.message] )
| (.emit | join("\n")),
"",
(.dna | prettyPrint(50)) ) ;
 
 
task(250; # length
[100, 100, 100]; # use e.g. [0, 300, 0] to choose only deletions
10 # mutations
)
</syntaxhighlight>
'''Invocation:'''
<pre>
< /dev/urandom tr -cd '0-9' | fold -w 1 | $JQ -cnr -f rc-sequence-mutation.jq
</pre>
{{output}}
<pre>
SEQUENCE:
1: AGGACACTGCCTTATTTTGTTTCAACAGAAGCCATCTCGAGCAACTACGT
51: GGCCACACAAGCTAATACGAATGACCTTGTATGGGGAGTTACGGGGGGTT
101: TATCTTGAGAAATGGTATAACGATACCCCAAGTGGCGTGATAGGCCGCGC
151: GGGCCTCAGAATAGGTCGTAGATCCGTAAGGGCACCGGGAGCCTTTCTTC
201: TCGTATAATCCGCCGAGATGTTAAAAGACAGCTATGGATTCCCGTAATGC
 
BASE COUNT:
A: 66
C: 57
G: 67
T: 60
------
Σ: 250
======
 
 
WEIGHTS (0 .. 300):
Change: 100
Delete: 100
Insert: 100
 
MUTATIONS (10):
Insert @76 T
Delete @104 C
Change @197 T to T
Insert @206 A
Delete @184 C
Change @69 A to C
Insert @211 G
Delete @31 C
Insert @165 G
Insert @234 T
 
SEQUENCE:
1: AGGACACTGCCTTATTTTGTTTCAACAGAAGCATCTCGAGCAACTACGTG
51: GCCACACAAGCTAATACGCATGACCTTTGTATGGGGAGTTACGGGGGGTT
101: TATTTGAGAAATGGTATAACGATACCCCAAGTGGCGTGATAGGCCGCGCG
151: GGCCTCAGAATAGGTGCGTAGATCCGTAAGGGCACGGGAGCCTTTCTTCT
201: CGTATAAATCCGGCCGAGATGTTAAAAGACAGCTTATGGATTCCCGTAAT
251: GC
 
BASE COUNT:
A: 66
C: 55
G: 69
T: 62
------
Σ: 252
======
</pre>
 
=={{header|Julia}}==
<syntaxhighlight lang="julia">dnabases = ['A', 'C', 'G', 'T']
Line 2,357 ⟶ 2,530:
Total 502
</pre>
 
=={{header|Lua}}==
Using the <code>prettyprint()</code> function from [[Bioinformatics/base_count#Lua]] (not replicated here)
2,442

edits