Bioinformatics/base count: Difference between revisions
→{{header|Java}}: Rework java code. Add some comments.
Σ: 500</pre>
=={{header|Java}}==
To count the bases, we use a <code>HashMap</code> and call <code>Map.merge</code> for every character, passing <code>1</code> as the value and <code>Integer::sum</code> as the remapping function. The first occurrence of a base inserts <code>1</code>; each later occurrence adds <code>1</code> to the stored count, so the <code>Map</code> keeps a running tally for us. Java ''does'' provide the <code>groupingBy</code> and <code>counting</code> collectors, which ''generally'' make this kind of operation easier. However, <code>String</code>’s <code>chars()</code> method returns an <code>IntStream</code>, so each value has to be boxed back into a <code>Character</code> before it can be grouped, which makes the stream version more verbose and typically less efficient. Here, doing it by hand is simpler than using streams. The best tool for the job, though, would be Guava’s <code>Multiset</code>, a dedicated key-to-count container.
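For comparison, the collector-based approach mentioned above could be sketched roughly as follows. The class name <code>StreamBaseCount</code> and the use of a <code>TreeMap</code> (to get sorted keys) are assumptions made for this illustration; they are not part of the solution on this page.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical helper class, used only to illustrate the stream-based variant.
class StreamBaseCount {

    /** Counts each character of seq with the groupingBy and counting collectors. */
    static Map<Character, Long> count(String seq) {
        return seq.chars()                       // IntStream of UTF-16 code units
                .mapToObj(c -> (char) c)         // box each int into a Character
                .collect(Collectors.groupingBy(
                        Function.identity(),     // group by the base itself
                        TreeMap::new,            // sorted keys, for stable output
                        Collectors.counting())); // counts come back as Long
    }

    public static void main(String[] args) {
        count("CGTAAAAAATTACAACGTCC").forEach(
                (base, n) -> System.out.printf("%5s : %s%n", base, n));
    }
}
```

Note the extra <code>mapToObj</code> step and the <code>Long</code> counts: this is the boxing overhead that the hand-written loop avoids.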
Note that Java’s strings are UTF-16: each <code>char</code> is 2 bytes long. When parsing a '''very''' large ASCII/UTF-8 text file, <code>String</code> is therefore a poor choice compared with, say, <code>byte[]</code>. For the purpose of this exercise, though, using <code>byte[]</code> would only add uninteresting casts and bloat to the code, so we stick to <code>String</code>.
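To give an idea of what the <code>byte[]</code> alternative might look like, here is a minimal sketch that assumes the input is pure ASCII. The class name <code>ByteBaseCount</code> and the 256-entry lookup table are illustrative choices, not something taken from the solution on this page.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, used only to illustrate counting over raw bytes.
class ByteBaseCount {

    /** Tallies every byte value in a table indexed by the unsigned byte value. */
    static int[] count(byte[] data) {
        int[] counts = new int[256];
        for (byte b : data) {
            counts[b & 0xFF]++;   // mask to 0..255, because Java bytes are signed
        }
        return counts;
    }

    public static void main(String[] args) {
        // Assumption: the data is ASCII, so one byte corresponds to one base.
        byte[] data = "CGTAAAAAATTACAACGTCC".getBytes(StandardCharsets.US_ASCII);
        int[] counts = count(data);
        for (char base : new char[] {'A', 'C', 'G', 'T'}) {
            System.out.printf("%5s : %s%n", base, counts[base]);
        }
    }
}
```

The mask and the array indexing are the kind of extra noise referred to above, which is why the task solution itself stays with <code>String</code>.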
<lang Java>import java.util.HashMap;
import java.util.Map;

public class orderedSequence {
    public static void main(String[] args) {
        Sequence gene = new Sequence("CGTAAAAAATTACAACGTCCTTTGGCTATCTCTTAAACTCCTGCTAAATGCTCGTGCTTTCCAATTATGTAAGCGTTCCGAGACGGGGTGGTCGATTCTGAGGACAAAGGTCAAGATGGAGCGCATCGAACGCAATAAGGATCATTTGATGGGACGTTTCGTCGACAAAGTCTTGTTTCGAGAGTAACGGCTACCGTCTTCGATTCTGCTTATAACACTATGTTCTTATGAAATGGATGTTCTGAGTTGGTCAGTCCCAATGTGCGGGGTTTCTTTTAGTACGTCGGGAGTGGTATTATATTTAATTTTTCTATATAGCGATCTGTATTTAAGCAATTCATTTAGGTTATCGCCGCGATGCTCGGTTCGGACCGCCAAGCATCTGGCTCCACTGCTAGTGTCCTAAATTTGAATGGCAAACACAAATAAGATTTAGCAATTCGTGTAGACGACCGGGGACTTGCATGATGGGAGCAGCTTTGTTAAACTACGAACGTAAT");
        gene.runSequence();
    }
}

/** Holds a base sequence and the operations that report on it. */
// Package-private so that both classes compile from a single source file.
class Sequence {

    private final String seq;

    public Sequence(String sq) {
        this.seq = sq;
    }

    /** print the organized structure of the sequence */
    public void prettyPrint() {
        System.out.println("Sequence:");
        int i = 0;
        // Full 50-base rows, each labelled with the index of its last base.
        for ( ; i < seq.length() - 50 ; i += 50) {
            System.out.printf("%5s : %s\n", i + 50, seq.substring(i, i + 50));
        }
        // Final row, which may hold fewer than 50 bases.
        System.out.printf("%5s : %s\n", seq.length(), seq.substring(i));
    }

    /** display a base vs. frequency chart */
    public void displayCount() {
        Map<Character, Integer> counter = new HashMap<>();
        for (int i = 0 ; i < seq.length() ; ++i) {
            // Insert 1 for a new base, otherwise add 1 to its current count.
            counter.merge(seq.charAt(i), 1, Integer::sum);
        }
        System.out.println("Base vs. Count:");
        // HashMap does not guarantee any particular iteration order.
        counter.forEach(
            (key, value) -> System.out.printf("%5s : %s\n", key, value));
        System.out.printf("%5s: %s\n", "SUM", seq.length());
    }

    public void runSequence() {
        this.prettyPrint();
        this.displayCount();
    }
}
</lang>
{{out}}
<pre>
Sequence:
50 : CGTAAAAAATTACAACGTCCTTTGGCTATCTCTTAAACTCCTGCTAAATG
100 : CTCGTGCTTTCCAATTATGTAAGCGTTCCGAGACGGGGTGGTCGATTCTG
150 : AGGACAAAGGTCAAGATGGAGCGCATCGAACGCAATAAGGATCATTTGAT
200 : GGGACGTTTCGTCGACAAAGTCTTGTTTCGAGAGTAACGGCTACCGTCTT
250 : CGATTCTGCTTATAACACTATGTTCTTATGAAATGGATGTTCTGAGTTGG
300 : TCAGTCCCAATGTGCGGGGTTTCTTTTAGTACGTCGGGAGTGGTATTATA
350 : TTTAATTTTTCTATATAGCGATCTGTATTTAAGCAATTCATTTAGGTTAT
400 : CGCCGCGATGCTCGGTTCGGACCGCCAAGCATCTGGCTCCACTGCTAGTG
450 : TCCTAAATTTGAATGGCAAACACAAATAAGATTTAGCAATTCGTGTAGAC
500 : GACCGGGGACTTGCATGATGGGAGCAGCTTTGTTAAACTACGAACGTAAT
Base vs. Count:
A : 129
C : 97
T : 155
G : 119
SUM: 500
</pre>
=={{header|JavaScript}}==