Bioinformatics/Sequence mutation: Difference between revisions
Content added Content deleted
No edit summary |
|||
Line 2,243: | Line 2,243: | ||
Σ: 261 |
Σ: 261 |
||
</pre> |
</pre> |
||
=={{header|jq}}== |
|||
{{Works with|jq}} |
|||
'''Works with gojq, the Go implementation of jq''' |
|||
'''Adapted from [[#Wren|Wren]]''' |
|||
Since jq does not include a PRNG, the following assumes that |
|||
an external source of entropy such as /dev/urandom is available. |
|||
See the "Invocation" section below for details. |
|||
<syntaxhighlight lang="jq"> |
|||
### Generic utilities |
|||
# Output: a PRN in range(0; .) |
|||
def prn: |
|||
if . == 1 then 0 |
|||
else . as $n |
|||
| (($n-1)|tostring|length) as $w |
|||
| [limit($w; inputs)] | join("") | tonumber |
|||
| if . < $n then . else ($n | prn) end |
|||
end; |
|||
# bag of words |
|||
def bow(stream): |
|||
reduce stream as $word ({}; .[($word|tostring)] += 1); |
|||
# Emit a stream of the constituent characters of the input string |
|||
def chars: explode[] | [.] | implode; |
|||
def lpad($len): tostring | ($len - length) as $l | (" " * $l) + .; |
|||
# Print $n-character segments at a time, each prefixed by a 1-based index |
|||
def pretty_nwise($n): |
|||
(length | tostring | length) as $len |
|||
| def _n($i): |
|||
if length == 0 then empty |
|||
else "\($i|lpad($len)): " + (if length <= $n then . else .[:$n] end), |
|||
(.[$n:] | _n($i+$n)) |
|||
end; |
|||
_n(1); |
|||
### Biology |
|||
def bases: ["A", "C", "G", "T"]; |
|||
def randomBase: |
|||
bases | .[length|prn]; |
|||
# $w is an array [weightSwap, weightDelete, weightInsert] |
|||
# specifying the weights out of 300 for each of swap, delete and insert |
|||
# Input: an object {dna} |
|||
# Output: an object {dna, message} |
|||
def mutate($w): |
|||
def removeAt($p): .[:$p] + .[$p+1:]; |
|||
(.dna|length) as $le |
|||
# get a random position in the dna to mutate |
|||
| ($le | prn) as $p |
|||
# get a random number between 0 and 299 inclusive |
|||
| (300 | prn) as $r |
|||
| .dna |= [chars] |
|||
| if $r < $w[0] |
|||
then # swap |
|||
randomBase as $base |
|||
| .message = " Change @\($p) \(.dna[$p]) to \($base)" |
|||
| .dna[$p] = $base |
|||
elif $r < $w[0] + $w[1] |
|||
then # delete |
|||
.message = " Delete @\($p) \(.dna[$p])" |
|||
| .dna |= removeAt($p) |
|||
else # insert |
|||
randomBase as $base |
|||
| .message = " Insert @\($p) \($base)" |
|||
| .dna |= .[:$p] + [$base] + .[$p:] |
|||
end |
|||
| .dna |= join("") ; |
|||
# Generate a random dna sequence of given length: |
|||
def generate($n): |
|||
[range(0; $n) | randomBase] | join(""); |
|||
# Pretty print dna and stats. |
|||
def prettyPrint($rowLen): |
|||
"SEQUENCE:", pretty_nwise($rowLen), |
|||
( bow(chars) as $baseMap |
|||
| "\nBASE COUNT:", |
|||
( bases[] as $c | " \($c): \($baseMap[$c] // 0)" ), |
|||
" ------", |
|||
" Σ: \(length)", |
|||
" ======\n" |
|||
) ; |
|||
# For displaying the weights |
|||
def pretty_weights: |
|||
" Change: \(.[0])\n Delete: \(.[1])\n Insert: \(.[2])"; |
|||
# Arguments are length, weights, mutations |
|||
def task($n; $w; $muts ): |
|||
generate($n) |
|||
| . as $dna |
|||
| prettyPrint(50), |
|||
"\nWEIGHTS (0 .. 300):", ($w|pretty_weights), |
|||
"\nMUTATIONS (\($muts)):", |
|||
(reduce range(0;$muts) as $i ({$dna}; |
|||
mutate($w) |
|||
| .emit += [.message] ) |
|||
| (.emit | join("\n")), |
|||
"", |
|||
(.dna | prettyPrint(50)) ) ; |
|||
task(250; # length |
|||
[100, 100, 100]; # use e.g. [0, 300, 0] to choose only deletions |
|||
10 # mutations |
|||
) |
|||
</syntaxhighlight> |
|||
'''Invocation:''' |
|||
<pre> |
|||
< /dev/urandom tr -cd '0-9' | fold -w 1 | $JQ -cnr -f rc-sequence-mutation.jq |
|||
</pre> |
|||
{{output}} |
|||
<pre> |
|||
SEQUENCE: |
|||
1: AGGACACTGCCTTATTTTGTTTCAACAGAAGCCATCTCGAGCAACTACGT |
|||
51: GGCCACACAAGCTAATACGAATGACCTTGTATGGGGAGTTACGGGGGGTT |
|||
101: TATCTTGAGAAATGGTATAACGATACCCCAAGTGGCGTGATAGGCCGCGC |
|||
151: GGGCCTCAGAATAGGTCGTAGATCCGTAAGGGCACCGGGAGCCTTTCTTC |
|||
201: TCGTATAATCCGCCGAGATGTTAAAAGACAGCTATGGATTCCCGTAATGC |
|||
BASE COUNT: |
|||
A: 66 |
|||
C: 57 |
|||
G: 67 |
|||
T: 60 |
|||
------ |
|||
Σ: 250 |
|||
====== |
|||
WEIGHTS (ex 300): |
|||
Change: 100 |
|||
Delete: 100 |
|||
Insert: 100 |
|||
MUTATIONS (10): |
|||
Insert @76 T |
|||
Delete @104 C |
|||
Change @197 T to T |
|||
Insert @206 A |
|||
Delete @184 C |
|||
Change @69 A to C |
|||
Insert @211 G |
|||
Delete @31 C |
|||
Insert @165 G |
|||
Insert @234 T |
|||
SEQUENCE: |
|||
1: AGGACACTGCCTTATTTTGTTTCAACAGAAGCATCTCGAGCAACTACGTG |
|||
51: GCCACACAAGCTAATACGCATGACCTTTGTATGGGGAGTTACGGGGGGTT |
|||
101: TATTTGAGAAATGGTATAACGATACCCCAAGTGGCGTGATAGGCCGCGCG |
|||
151: GGCCTCAGAATAGGTGCGTAGATCCGTAAGGGCACGGGAGCCTTTCTTCT |
|||
201: CGTATAAATCCGGCCGAGATGTTAAAAGACAGCTTATGGATTCCCGTAAT |
|||
251: GC |
|||
BASE COUNT: |
|||
A: 66 |
|||
C: 55 |
|||
G: 69 |
|||
T: 62 |
|||
------ |
|||
Σ: 252 |
|||
====== |
|||
</pre> |
|||
=={{header|Julia}}== |
=={{header|Julia}}== |
||
<syntaxhighlight lang="julia">dnabases = ['A', 'C', 'G', 'T'] |
<syntaxhighlight lang="julia">dnabases = ['A', 'C', 'G', 'T'] |
||
Line 2,357: | Line 2,531: | ||
Total 502 |
Total 502 |
||
</pre> |
</pre> |
||
=={{header|Lua}}== |
=={{header|Lua}}== |
||
Using the <code>prettyprint()</code> function from [[Bioinformatics/base_count#Lua]] (not replicated here) |
Using the <code>prettyprint()</code> function from [[Bioinformatics/base_count#Lua]] (not replicated here) |