Jump to content

Bioinformatics/Sequence mutation: Difference between revisions

Add Factor
(Add Factor)
Line 316:
 
Total:513
</pre>
 
=={{header|Factor}}==
<lang factor>USING: assocs combinators.random formatting grouping io kernel
macros math math.statistics namespaces prettyprint quotations
random sequences sorting ;
IN: sequence-mutation
 
SYMBOL: verbose? ! Turn on to show mutation details.
! Off by default.
 
! Return a random base as a character.
: rand-base ( -- n ) "ACGT" random ;
 
! Generate a random dna sequence of length n.
: <dna> ( n -- seq ) [ rand-base ] "" replicate-as ;
 
! Prettyprint a dna sequence in blocks of n.
: .dna ( seq n -- )
"SEQUENCE:" print [ group ] [ ] bi
[ * swap " %3d: %s\n" printf ] curry each-index ;
 
! Show a histogram of bases in a dna sequence and their total.
: show-counts ( seq -- )
"BASE COUNTS:" print histogram >alist [ first ] sort-with
[ [ " %c: %3d\n" printf ] assoc-each ]
[ "TOTAL: " write [ second ] [ + ] map-reduce . ] bi ;
 
! Prettyprint the overall state of a dna sequence.
: show-dna ( seq -- ) [ 50 .dna nl ] [ show-counts nl ] bi ;
 
! Call a quotation only if verbose? is on.
: log ( quot -- ) verbose? get [ call ] [ drop ] if ; inline
 
! Set index n to a random base.
: bswap ( n seq -- seq' )
[ rand-base ] 2dip 3dup [ nth ] keepd spin
[ " index %3d: swapping %c with %c\n" printf ] 3curry log
[ set-nth ] keep ;
 
! Remove the base at index n.
: bdelete ( n seq -- seq' )
2dup dupd nth [ " index %3d: deleting %c\n" printf ]
2curry log remove-nth ;
 
! Insert a random base at index n.
: binsert ( n seq -- seq' )
[ rand-base ] 2dip over reach
[ " index %3d: inserting %c\n" printf ] 2curry log
insert-nth ;
 
! Allow "passing" probabilities to casep. This is necessary
! because casep is a macro.
MACRO: build-casep-seq ( seq -- quot )
{ [ bswap ] [ bdelete ] [ binsert ] } zip 1quotation ;
 
! Mutate a dna sequence according to some weights.
! For example,
! "ACGT" { 0.1 0.3 0.6 } mutate
! means swap with 0.1 probability, delete with 0.3 probability,
! and insert with 0.6 probability.
: mutate ( dna-seq weights-seq -- dna-seq' )
[ [ length random ] keep ] [ build-casep-seq ] bi* casep ;
inline
 
! Prettyprint a sequence of weights.
: show-weights ( seq -- )
"MUTATION PROBABILITIES:" print
" swap: %.2f\n delete: %.2f\n insert: %.2f\n\n" vprintf
;
 
: main ( -- )
verbose? on "ORIGINAL " write 200 <dna> dup show-dna 10
{ 0.2 0.2 0.6 } dup show-weights "MUTATION LOG:" print
[ mutate ] curry times nl "MUTATED " write show-dna ;
 
MAIN: main</lang>
{{out}}
<pre>
ORIGINAL SEQUENCE:
0: CACAGGCAAGGGTCGTATGCTACTATAGATGTTTCAGAACCGTATTTCGA
50: CTCCGACGCGGTCATGAAGCAGACACTCCGTCACCGATTGCAAGTGTGCA
100: GTTGGGAGAATGCATTAAAATTCTGGGTTATGAAACGGGCAGCCTTGATT
150: GACAGGTGGTCCAGCGACAGTTTAACATACCAAACTCTTTGAGTACGCAG
 
BASE COUNTS:
A: 55
C: 44
G: 52
T: 49
TOTAL: 200
 
MUTATION PROBABILITIES:
swap: 0.20
delete: 0.20
insert: 0.60
 
MUTATION LOG:
index 82: deleting A
index 161: inserting C
index 48: deleting G
index 10: swapping G with T
index 184: swapping T with C
index 137: inserting T
index 60: inserting T
index 135: inserting C
index 32: inserting T
index 201: inserting A
 
MUTATED SEQUENCE:
0: CACAGGCAAGTGTCGTATGCTACTATAGATGTTTTCAGAACCGTATTTCA
50: CTCCGACGCGGTTCATGAAGCAGACACTCCGTCCCGATTGCAAGTGTGCA
100: GTTGGGAGAATGCATTAAAATTCTGGGTTATGAAACCGGGTCAGCCTTGA
150: TTGACAGGTGGTCCCAGCGACAGTTTAACATACCAAACCCTTTGAGTACG
200: CAAG
 
BASE COUNTS:
A: 55
C: 47
G: 50
T: 52
TOTAL: 204
</pre>
 
1,827

edits

Cookies help us deliver our services. By using our services, you agree to our use of cookies.