Huffman coding

From Rosetta Code
Revision as of 01:29, 26 March 2009 by rosettacode>Mwn3d (Added task with Java)
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)
Task
Huffman coding
You are encouraged to solve this task according to the task description, using any language you may know.

Huffman coding is a way of reducing the number of bits it takes to represent a string of symbols. It is a lossless compression algorithm. In order to generate Huffman codes you need to have a list of symbols sorted by their frequency. Say you have these symbols and frequencies:

A 50%
B 25%
C 12.5%
D 12.5%

Right away you might think that the best way to encode these symbols is with two binary bits (since there are only four), but because the frequencies of the characters differ so much, Huffman codes can save some space.

The slightly humanized procedure for generating Huffman codes for these symbols looks a little like a tree. You start with a vertex and branch out twice from it, labeling the top branch "0" and the bottom branch "1". Each time you make a "0" branch you label the vertex at the end of it with the most frequent symbol on your list. Each time you make a "1" branch you treat the new vertex at the end of it like the previous vertex, until you only have one symbol left, when you label the new "1" vertex with that symbol. The tree for the symbols above looks like this:

   A
 0/
 /
.     B
 \  0/
 1\ /
   .     C
    \  0/
    1\ /
      .
       \
       1\
         D

The Huffman codes (sometimes called "codewords") can be read off this tree by going to each symbol starting at the original vertex and appending the branch names:

A: 0
B: 10
C: 110
D: 111

Write a program which reads in symbols and frequencies in pairs separated by whitespace and generates Huffman codes for each symbol.

Java

Works with: Java version 1.5+

First a Letter class to keep things organized: <lang java5>public class Letter implements Comparable<Letter>{ public Double freq; public char letter; private String huff = "";

public Letter(char l, double f){ letter = l; freq = f; }

//For easy sorting public int compareTo(Letter o){ if(freq == o.freq){ return ((Character)letter).compareTo(o.letter); } //we want higher frequencies first so add a minus return -freq.compareTo(o.freq); }

public String getHuff(){ return huff; }

public void setHuff(String huff){ this.huff= huff; }

public String toString(){ return letter + ": " + huff; } }</lang> Now the class that actually does the work: <lang java5>import java.util.Collections; import java.util.LinkedList; import java.util.Scanner;

public class GenHuffs{

public static void main(String[] args){ LinkedList<Letter> lets = new LinkedList<Letter>(); Scanner in = new Scanner(System.in); while(in.hasNext()){ char symbol = in.next().charAt(0); double freq = in.nextDouble(); Letter newLet = new Letter(symbol, freq); lets.add(newLet); } Collections.sort(lets); genHuffs(lets); System.out.println(lets); }

/** * Sets the most frequent symbol to codeword 0, next most frequent to 10, * continues prepending 1's until the last symbol which gets all 1's * * @param letters a list of symbols sorted by frequency */ public static void genHuffs(LinkedList<Letter> letters){ String currHuff = "0"; String nextHuffPre = "1"; for(Letter l:letters){ if(l!=letters.getLast()){ l.setHuff(currHuff); currHuff = nextHuffPre + "0"; nextHuffPre += "1"; }else{ l.setHuff(currHuff.replace("0", "")); } } }

}</lang>