Bioinformatics/Sequence mutation: Difference between revisions

Automated syntax highlighting fixup (second round - minor fixes)
m (syntax highlighting fixup automation)
m (Automated syntax highlighting fixup (second round - minor fixes))
Line 15:
* Give more information on the individual mutations applied.
* Allow mutations to be weighted and/or chosen.
<syntaxhighlight lang="11l">UInt32 seed = 0
F nonrandom(n)
:seed = 1664525 * :seed + 1013904223
Line 113 ⟶ 112:
TOT= 249
<syntaxhighlight lang="ada">with Ada.Containers.Vectors;
with Ada.Numerics.Discrete_Random;
with Ada.Text_Io;
Line 264 ⟶ 262:
Count of G is 56
Count of T is 51</pre>
<syntaxhighlight lang="rebol">bases: ["A" "T" "G" "C"]
dna: map 1..200 => [sample bases]
Line 370 ⟶ 367:
200 : CC
Total count => A: 46 T: 47 G: 55 C: 54</pre>
Adenine (A) is always swapped for Thymine (T) and vice versa. The same holds for Cytosine (C) and Guanine (G).
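The swap rule above can be illustrated with a minimal sketch, written in Python rather than C for brevity. The names <code>COMPLEMENT</code> and <code>swap_base</code> are hypothetical and do not come from the solution below, and applying the swap at a single position is an assumption about how the rule is used.
<syntaxhighlight lang="python">
# Hypothetical illustration of the fixed swap pairs: A <-> T and C <-> G.
COMPLEMENT = str.maketrans("ATCG", "TAGC")

def swap_base(dna, pos):
    """Return a copy of dna with the base at index pos replaced by its partner."""
    return dna[:pos] + dna[pos].translate(COMPLEMENT) + dna[pos + 1:]
</syntaxhighlight>
Because the pairing is fixed, applying <code>swap_base</code> twice at the same position restores the original sequence.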
<syntaxhighlight lang="c">
Line 672 ⟶ 668:
<syntaxhighlight lang="cpp">#include <array>
#include <iomanip>
#include <iostream>
Line 829 ⟶ 824:
A: 65, C: 66, G: 64, T: 56, Total: 251
=={{header|Common Lisp}}==
<b>Usage :</b>
Line 836 ⟶ 830:
:: :genome <i><Genome Sequence></i>)
<b>All keys are optional. <i><Genome length></i> is discarded when :genome is set.</b>
<syntaxhighlight lang="lisp">
(defun random_base ()
(random 4))
Line 979 ⟶ 973:
T : 137 G : 119
<syntaxhighlight lang="factor">USING: assocs combinators.random formatting grouping io kernel
macros math math.statistics namespaces prettyprint quotations
random sequences sorting ;
Line 1,101 ⟶ 1,094:
TOTAL: 204
<syntaxhighlight lang="go">package main
import (
Line 1,257 ⟶ 1,249:
<syntaxhighlight lang="haskell">import Data.List (group, sort)
import Data.List.Split (chunksOf)
import System.Random (Random, randomR, random, newStdGen, randoms, getStdRandom)
Line 1,377 ⟶ 1,369:
Σ: 203</pre>
<syntaxhighlight lang="j">ACGT=: 'ACGT'
MUTS=: ;: 'del ins mut'
Line 1,506 ⟶ 1,497:
│ │ 200│GGC │
<syntaxhighlight lang="java">import java.util.Arrays;
import java.util.Random;
Line 1,652 ⟶ 1,642:
A: 71, C: 62, G: 58, T: 61, Total: 252
<syntaxhighlight lang="javascript">// Basic set-up
const numBases = 250
const numMutations = 30
Line 1,876 ⟶ 1,865:
Σ: 261
<syntaxhighlight lang="julia">dnabases = ['A', 'C', 'G', 'T']
randpos(seq) = rand(1:length(seq)) # 1
mutateat(pos, seq) = (s = seq[:]; s[pos] = rand(dnabases); s) # 2-1
Line 1,991 ⟶ 1,979:
Total 502
Using the <code>prettyprint()</code> function from [[Bioinformatics/base_count#Lua]] (not replicated here)
<syntaxhighlight lang="lua">math.randomseed(os.time())
bases = {"A","C","T","G"}
function randbase() return bases[math.random(#bases)] end
Line 2,053 ⟶ 2,040:
121 gcatagagtg gattccttta acctaggaga aacgcccttc cggttcagca tggcgagtgc
181 gtacaacgat gacccagat</pre>
=={{header|Mathematica}} / {{header|Wolfram Language}}==
BioSequence is a fundamental data type in Mathematica:
<syntaxhighlight lang="mathematica">SeedRandom[13122345];
randompos = RandomInteger[seq["SequenceLength"]];
Line 2,104 ⟶ 2,090:
{{"T", 60}, {"A", 70}, {"G", 67}, {"C", 49}}</pre>
<syntaxhighlight lang="nim">import random
import strformat
import strutils
Line 2,258 ⟶ 2,243:
<syntaxhighlight lang="perl">use strict;
use warnings;
use feature 'say';
Line 2,330 ⟶ 2,314:
G: 51
T: 51</pre>
<!--<syntaxhighlight lang="phix">(phixonline)-->
<span style="color: #004080;">string</span> <span style="color: #000000;">dna</span> <span style="color: #0000FF;">=</span> <span style="color: #7060A8;">repeat</span><span style="color: #0000FF;">(</span><span style="color: #008000;">' '</span><span style="color: #0000FF;">,</span><span style="color: #000000;">200</span><span style="color: #0000FF;">+</span><span style="color: #7060A8;">rand</span><span style="color: #0000FF;">(</span><span style="color: #000000;">300</span><span style="color: #0000FF;">))</span>
<span style="color: #008080;">for</span> <span style="color: #000000;">i</span><span style="color: #0000FF;">=</span><span style="color: #000000;">1</span> <span style="color: #008080;">to</span> <span style="color: #7060A8;">length</span><span style="color: #0000FF;">(</span><span style="color: #000000;">dna</span><span style="color: #0000FF;">)</span> <span style="color: #008080;">do</span> <span style="color: #000000;">dna</span><span style="color: #0000FF;">[</span><span style="color: #000000;">i</span><span style="color: #0000FF;">]</span> <span style="color: #0000FF;">=</span> <span style="color: #008000;">"ACGT"</span><span style="color: #0000FF;">[</span><span style="color: #7060A8;">rand</span><span style="color: #0000FF;">(</span><span style="color: #000000;">4</span><span style="color: #0000FF;">)]</span> <span style="color: #008080;">end</span> <span style="color: #008080;">for</span>
Line 2,408 ⟶ 2,391:
Base counts: A:128, C:110, G:119, T:123, total:480
<syntaxhighlight lang="purebasic">#BASE$="ACGT"
Line 2,494 ⟶ 2,476:
Base counts:
A: 51, C: 55, G: 43, T: 53, Total: 202</pre>
In function <code>seq_mutate</code>, the argument <code>kinds</code> selects between the three kinds of mutation. The kind of each mutation is drawn at random from the characters I, D, and S in that string, so the more often a character appears, the more often that kind of mutation is performed.<br>
Similarly, the base used for a substitution or insertion is drawn from the parameter <code>choice</code>: the more often a base appears in it, the more likely it is to be chosen.
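The weighting idea described above can be sketched on its own as follows. This is not the solution's code: the function name <code>seq_mutate_sketch</code>, its defaults, the <code>rng</code> parameter and the returned log are all assumptions made for illustration, as is the reading of I, D and S as insert, delete and substitute.
<syntaxhighlight lang="python">
import random

def seq_mutate_sketch(dna, count=1, kinds="IDSSSS", choice="ATCG", rng=random):
    """Apply `count` random mutations to `dna` (hypothetical helper).

    Repeated characters act as weights: kinds="IDSSSS" makes a substitution
    four times as likely as an insertion or a deletion, and repeating a base
    in `choice` makes that base proportionally more likely to be picked.
    """
    seq = list(dna)
    log = []                                  # (kind, position, base) per mutation
    for _ in range(count):
        kind = rng.choice(kinds)
        if kind == "I" or not seq:            # an empty sequence can only grow
            pos = rng.randint(0, len(seq))    # inserting at the very end is allowed
            base = rng.choice(choice)
            seq.insert(pos, base)
            log.append(("I", pos, base))
        elif kind == "D":
            pos = rng.randrange(len(seq))
            log.append(("D", pos, seq.pop(pos)))
        else:                                 # any other character is treated as "S"
            pos = rng.randrange(len(seq))
            base = rng.choice(choice)
            seq[pos] = base
            log.append(("S", pos, base))
    return "".join(seq), log
</syntaxhighlight>
Passing a seeded <code>random.Random</code> instance as <code>rng</code> makes a run repeatable, which is convenient when comparing the effect of different weight strings.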
<syntaxhighlight lang="python">import random
from collections import Counter
Line 2,587 ⟶ 2,568:
T: 72
TOT= 251</pre>
<code>prettyprint</code> and <code>tallybases</code> are defined at [[Bioinformatics/base count#Quackery]].
<syntaxhighlight lang="quackery"> [ $ "ACGT" 4 random peek ] is randomgene ( --> c )
[ $ "" swap times
Line 2,640 ⟶ 2,620:
total 201
<syntaxhighlight lang="racket">#lang racket
(define current-S-weight (make-parameter 1))
Line 2,784 ⟶ 2,763:
T : 42
TOTAL: 193</pre>
(formerly Perl 6)
Line 2,791 ⟶ 2,769:
<syntaxhighlight lang="raku" line>my @bases = <A C G T>;
# The DNA strand
Line 2,847 ⟶ 2,825:
G 43
T 53</pre>
<syntaxhighlight lang="ring">
row = 0
dnaList = []
Line 3,008 ⟶ 2,985:
A: 83, T: 32, C: 36, G: 49, Total: 200
<syntaxhighlight lang="ruby">class DNA_Seq
attr_accessor :seq
Line 3,064 ⟶ 3,040:
Total 199: {:A=>52, :C=>50, :G=>49, :T=>48}
<syntaxhighlight lang="rust">
use rand::prelude::*;
use std::collections::HashMap;
use std::fmt::{Display, Formatter, Error};

pub struct Seq<'a> {
    alphabet: Vec<&'a str>,
    distr: rand::distributions::Uniform<usize>,
    // Note: pos_distr is fixed at construction (0..=len-1) and does not
    // track later changes in the length of seq.
    pos_distr: rand::distributions::Uniform<usize>,
    seq: Vec<&'a str>,
}

impl Display for Seq<'_> {
    fn fmt(&self, f: &mut Formatter) -> Result<(), Error> {
        // Break the sequence into lines of 60 bases.
        let pretty: String = self.seq
            .iter()
            .enumerate()
            .map(|(i, nt)| if (i + 1) % 60 == 0 { format!("{}\n", nt) } else { nt.to_string() })
            .collect();

        // Count the occurrences of each base.
        let counts_hm = self.seq
            .iter()
            .fold(HashMap::<&str, usize>::new(), |mut m, nt| {
                *m.entry(nt).or_default() += 1;
                m
            });

        // Sort the counts by base so the report is stable.
        let mut counts_vec: Vec<(&str, usize)> = counts_hm.into_iter().collect();
        counts_vec.sort_by(|a, b| a.0.cmp(&b.0));
        let counts_string = counts_vec
            .iter()
            .fold(String::new(), |mut counts_string, (nt, count)| {
                counts_string += &format!("{} = {}\n", nt, count);
                counts_string
            });

        write!(f, "Seq:\n{}\n\nLength: {}\n\nCounts:\n{}", pretty, self.seq.len(), counts_string)
    }
}

impl Seq<'_> {
    pub fn new(alphabet: Vec<&str>, len: usize) -> Seq {
        let distr = rand::distributions::Uniform::new_inclusive(0, alphabet.len() - 1);
        let pos_distr = rand::distributions::Uniform::new_inclusive(0, len - 1);

        let seq: Vec<&str> = (0..len)
            .map(|_| {
                alphabet[thread_rng().sample(distr)]
            })
            .collect();

        Seq { alphabet, distr, pos_distr, seq }
    }

    pub fn insert(&mut self) {
        let pos = thread_rng().sample(self.pos_distr);
        let nt = self.alphabet[thread_rng().sample(self.distr)];
        println!("Inserting {} at position {}", nt, pos);
        self.seq.insert(pos, nt);
    }

    pub fn delete(&mut self) {
        let pos = thread_rng().sample(self.pos_distr);
        println!("Deleting {} at position {}", self.seq[pos], pos);
        self.seq.remove(pos);
    }

    pub fn swap(&mut self) {
        let pos = thread_rng().sample(self.pos_distr);
        let cur_nt = self.seq[pos];
        let new_nt = self.alphabet[thread_rng().sample(self.distr)];
        println!("Replacing {} at position {} with {}", cur_nt, pos, new_nt);
        self.seq[pos] = new_nt;
    }
}

fn main() {
    let mut seq = Seq::new(vec!["A", "C", "T", "G"], 200);
    println!("Initial sequence:\n{}", seq);

    let mut_distr = rand::distributions::Uniform::new_inclusive(0, 2);

    // Apply ten mutations, each chosen uniformly from the three kinds.
    for _ in 0..10 {
        let mutation = thread_rng().sample(mut_distr);

        if mutation == 0 {
            seq.insert();
        } else if mutation == 1 {
            seq.delete();
        } else {
            seq.swap();
        }
    }

    println!("\nMutated sequence:\n{}", seq);
}
</syntaxhighlight>
{{out}}
<pre>
Initial sequence:
Length: 200
A = 56
C = 38
G = 53
T = 53
Deleting C at position 197
Inserting T at position 157
Replacing C at position 149 with G
Replacing A at position 171 with G
Replacing T at position 182 with G
Deleting C at position 124
Inserting T at position 128
Replacing G at position 175 with C
Deleting A at position 35
Replacing A at position 193 with G
Mutated sequence:
Length: 199
A = 53
C = 36
G = 56
T = 54</pre>
<syntaxhighlight lang="swift">let bases: [Character] = ["A", "C", "G", "T"]
enum Action: CaseIterable {
Line 3,151 ⟶ 3,267:
G: 56
T: 45</pre>
<syntaxhighlight lang="vlang">import rand
import rand.seed
Line 3,301 ⟶ 3,416:
<syntaxhighlight lang="ecmascript">import "random" for Random
import "/fmt" for Fmt
import "/sort" for Sort
Line 3,438 ⟶ 3,552:
<syntaxhighlight lang="yabasic">// Rosetta Code problem:
// by Galileo, 07/2022
Line 3,526 ⟶ 3,639:
Base counts: A: 71, C: 84, G: 75, T: 82, total: 312
---Program done, press RETURN---</pre>
<syntaxhighlight lang="zkl">var [const] bases="ACGT", lbases=bases.toLower();
dna:=(190).pump(Data().howza(3),(0).random.fp(0,4),bases.get); // bucket of bytes
Line 3,588 ⟶ 3,700:
Base Counts: 191 : A(49) C(45) G(57) T(40)
