Huffman coding

From Rosetta Code
Revision as of 11:42, 17 December 2011 by 79.53.134.54 (talk) (Updated D code)
Task
Huffman coding
You are encouraged to solve this task according to the task description, using any language you may know.

Huffman encoding is a way to assign binary codes to symbols that reduces the overall number of bits used to encode a typical string of those symbols.

For example, if you use letters as symbols and have details of the frequency of occurrence of those letters in typical strings, then you could just encode each letter with a fixed number of bits, such as in ASCII codes. You can do better than this by encoding more frequently occurring letters such as e and a, with smaller bit strings; and less frequently occurring letters such as q and x with longer bit strings.

Any string of letters will be encoded as a string of bits that are no-longer of the same length per letter. To successfully decode such as string, the smaller codes assigned to letters such as 'e' cannot occur as a prefix in the larger codes such as that for 'x'.

If you were to assign a code 01 for 'e' and code 011 for 'x', then if the bits to decode started as 011... then you would not know if you should decode an 'e' or an 'x'.

The Huffman coding scheme takes each symbol and its weight (or frequency of occurrence), and generates proper encodings for each symbol taking account of the weights of each symbol, so that higher weighted symbols have less bits in their encoding. (See the WP article for more information).

A Huffman encoding can be computed by first creating a tree of nodes:

  1. Create a leaf node for each symbol and add it to the priority queue.
  2. While there is more than one node in the queue:
    1. Remove the node of highest priority (lowest probability) twice to get two nodes.
    2. Create a new internal node with these two nodes as children and with probability equal to the sum of the two nodes' probabilities.
    3. Add the new node to the queue.
  3. The remaining node is the root node and the tree is complete.

Traverse the constructed binary tree from root to leaves assigning and accumulating a '0' for one branch and a '1' for the other at each node. The accumulated zeros and ones at each leaf constitute a Huffman encoding for those symbols and weights:

Using the characters and their frequency from the string "this is an example for huffman encoding", create a program to generate a Huffman encoding for each character as a table.

Ada

Works with: Ada 2005

huffman.ads: <lang Ada>with Ada.Containers.Indefinite_Ordered_Maps; with Ada.Containers.Ordered_Maps; with Ada.Finalization; generic

  type Symbol_Type is private;
  with function "<" (Left, Right : Symbol_Type) return Boolean is <>;
  with procedure Put (Item : Symbol_Type);
  type Symbol_Sequence is array (Positive range <>) of Symbol_Type;
  type Frequency_Type is private;
  with function "+" (Left, Right : Frequency_Type) return Frequency_Type
    is <>;
  with function "<" (Left, Right : Frequency_Type) return Boolean is <>;

package Huffman is

  -- bits = booleans (true/false = 1/0)
  type Bit_Sequence is array (Positive range <>) of Boolean;
  Zero_Sequence : constant Bit_Sequence (1 .. 0) := (others => False);
  -- output the sequence
  procedure Put (Code : Bit_Sequence);
  -- type for freqency map
  package Frequency_Maps is new Ada.Containers.Ordered_Maps
    (Element_Type => Frequency_Type,
     Key_Type     => Symbol_Type);
  type Huffman_Tree is private;
  -- create a huffman tree from frequency map
  procedure Create_Tree
    (Tree        : out Huffman_Tree;
     Frequencies : Frequency_Maps.Map);
  -- encode a single symbol
  function Encode
    (Tree   : Huffman_Tree;
     Symbol : Symbol_Type)
     return   Bit_Sequence;
  -- encode a symbol sequence
  function Encode
    (Tree    : Huffman_Tree;
     Symbols : Symbol_Sequence)
     return    Bit_Sequence;
  -- decode a bit sequence
  function Decode
    (Tree : Huffman_Tree;
     Code : Bit_Sequence)
     return Symbol_Sequence;
  -- dump the encoding table
  procedure Dump_Encoding (Tree : Huffman_Tree);

private

  -- type for encoding map
  package Encoding_Maps is new Ada.Containers.Indefinite_Ordered_Maps
    (Element_Type => Bit_Sequence,
     Key_Type     => Symbol_Type);
  type Huffman_Node;
  type Node_Access is access Huffman_Node;
  -- a node is either internal (left_child/right_child used)
  -- or a leaf (left_child/right_child are null)
  type Huffman_Node is record
     Frequency   : Frequency_Type;
     Left_Child  : Node_Access := null;
     Right_Child : Node_Access := null;
     Symbol      : Symbol_Type;
  end record;
  -- create a leaf node
  function Create_Node
    (Symbol    : Symbol_Type;
     Frequency : Frequency_Type)
     return      Node_Access;
  -- create an internal node
  function Create_Node (Left, Right : Node_Access) return Node_Access;
  -- fill the encoding map
  procedure Fill
    (The_Node : Node_Access;
     Map      : in out Encoding_Maps.Map;
     Prefix   : Bit_Sequence);
  -- huffman tree has a tree and an encoding map
  type Huffman_Tree is new Ada.Finalization.Controlled with record
     Tree : Node_Access       := null;
     Map  : Encoding_Maps.Map := Encoding_Maps.Empty_Map;
  end record;
  -- free memory after finalization
  overriding procedure Finalize (Object : in out Huffman_Tree);

end Huffman;</lang>

huffman.adb: <lang Ada>with Ada.Text_IO; with Ada.Unchecked_Deallocation; with Ada.Containers.Vectors; package body Huffman is

  package Node_Vectors is new Ada.Containers.Vectors
    (Element_Type => Node_Access,
     Index_Type   => Positive);
  function "<" (Left, Right : Node_Access) return Boolean is
  begin
     -- compare frequency
     if Left.Frequency < Right.Frequency then
        return True;
     elsif Right.Frequency < Left.Frequency then
        return False;
     end if;
     -- same frequency, choose leaf node
     if Left.Left_Child = null and then Right.Left_Child /= null then
        return True;
     elsif Left.Left_Child /= null and then Right.Left_Child = null then
        return False;
     end if;
     -- same frequency, same node type (internal/leaf)
     if Left.Left_Child /= null then
        -- for internal nodes, compare left children, then right children
        if Left.Left_Child < Right.Left_Child then
           return True;
        elsif Right.Left_Child < Left.Left_Child then
           return False;
        else
           return Left.Right_Child < Right.Right_Child;
        end if;
     else
        -- for leaf nodes, compare symbol
        return Left.Symbol < Right.Symbol;
     end if;
  end "<";
  package Node_Vector_Sort is new Node_Vectors.Generic_Sorting;
  procedure Create_Tree
    (Tree        : out Huffman_Tree;
     Frequencies : Frequency_Maps.Map) is
     Node_Queue : Node_Vectors.Vector := Node_Vectors.Empty_Vector;
  begin
     -- insert all leafs into the queue
     declare
        use Frequency_Maps;
        Position : Cursor      := Frequencies.First;
        The_Node : Node_Access := null;
     begin
        while Position /= No_Element loop
           The_Node :=
             Create_Node
               (Symbol    => Key (Position),
                Frequency => Element (Position));
           Node_Queue.Append (The_Node);
           Next (Position);
        end loop;
     end;
     -- sort by frequency (see "<")
     Node_Vector_Sort.Sort (Node_Queue);
     -- iterate over all elements
     while not Node_Queue.Is_Empty loop
        declare
           First : constant Node_Access := Node_Queue.First_Element;
        begin
           Node_Queue.Delete_First;
           -- if we only have one node left, it is the root node of the tree
           if Node_Queue.Is_Empty then
              Tree.Tree := First;
           else
              -- create new internal node with two smallest frequencies
              declare
                 Second : constant Node_Access := Node_Queue.First_Element;
              begin
                 Node_Queue.Delete_First;
                 Node_Queue.Append (Create_Node (First, Second));
              end;
              Node_Vector_Sort.Sort (Node_Queue);
           end if;
        end;
     end loop;
     -- fill encoding map
     Fill (The_Node => Tree.Tree, Map => Tree.Map, Prefix => Zero_Sequence);
  end Create_Tree;
  -- create leaf node
  function Create_Node
    (Symbol    : Symbol_Type;
     Frequency : Frequency_Type)
     return      Node_Access
  is
     Result : Node_Access := new Huffman_Node;
  begin
     Result.Frequency := Frequency;
     Result.Symbol    := Symbol;
     return Result;
  end Create_Node;
  -- create internal node
  function Create_Node (Left, Right : Node_Access) return Node_Access is
     Result : Node_Access := new Huffman_Node;
  begin
     Result.Frequency   := Left.Frequency + Right.Frequency;
     Result.Left_Child  := Left;
     Result.Right_Child := Right;
     return Result;
  end Create_Node;
  -- fill encoding map
  procedure Fill
    (The_Node : Node_Access;
     Map      : in out Encoding_Maps.Map;
     Prefix   : Bit_Sequence) is
  begin
     if The_Node.Left_Child /= null then
        -- append false (0) for left child
        Fill (The_Node.Left_Child, Map, Prefix & False);
        -- append true (1) for right child
        Fill (The_Node.Right_Child, Map, Prefix & True);
     else
        -- leaf node reached, prefix = code for symbol
        Map.Insert (The_Node.Symbol, Prefix);
     end if;
  end Fill;
  -- free memory after finalization
  overriding procedure Finalize (Object : in out Huffman_Tree) is
     procedure Free is new Ada.Unchecked_Deallocation
       (Name   => Node_Access,
        Object => Huffman_Node);
     -- recursively free all nodes
     procedure Recursive_Free (The_Node : in out Node_Access) is
     begin
        -- free node if it is a leaf
        if The_Node.Left_Child = null then
           Free (The_Node);
        else
           -- free left and right child if node is internal
           Recursive_Free (The_Node.Left_Child);
           Recursive_Free (The_Node.Right_Child);
           -- free node afterwards
           Free (The_Node);
        end if;
     end Recursive_Free;
  begin
     -- recursively free root node
     Recursive_Free (Object.Tree);
  end Finalize;
  -- encode single symbol
  function Encode
    (Tree   : Huffman_Tree;
     Symbol : Symbol_Type)
     return   Bit_Sequence
  is
  begin
     -- simply lookup in map
     return Tree.Map.Element (Symbol);
  end Encode;
  -- encode symbol sequence
  function Encode
    (Tree    : Huffman_Tree;
     Symbols : Symbol_Sequence)
     return    Bit_Sequence
  is
  begin
     -- only one element
     if Symbols'Length = 1 then
        -- see above
        return Encode (Tree, Symbols (Symbols'First));
     else
        -- encode first element, append result of recursive call
        return Encode (Tree, Symbols (Symbols'First)) &
        Encode (Tree, Symbols (Symbols'First + 1 .. Symbols'Last));
     end if;
  end Encode;
  -- decode a bit sequence
  function Decode
    (Tree : Huffman_Tree;
     Code : Bit_Sequence)
     return Symbol_Sequence
  is
     -- maximum length = code length
     Result   : Symbol_Sequence (1 .. Code'Length);
     -- last used index of result
     Last     : Natural     := 0;
     The_Node : Node_Access := Tree.Tree;
  begin
     -- iterate over the code
     for I in Code'Range loop
        -- if current element is true, descent the right branch
        if Code (I) then
           The_Node := The_Node.Right_Child;
        else
           -- false: descend left branch
           The_Node := The_Node.Left_Child;
        end if;
        if The_Node.Left_Child = null then
           -- reached leaf node: append symbol to result
           Last          := Last + 1;
           Result (Last) := The_Node.Symbol;
           -- reset current node to root
           The_Node := Tree.Tree;
        end if;
     end loop;
     -- return subset of result array
     return Result (1 .. Last);
  end Decode;
  -- output a bit sequence
  procedure Put (Code : Bit_Sequence) is
     package Int_IO is new Ada.Text_IO.Integer_IO (Integer);
  begin
     for I in Code'Range loop
        if Code (I) then
           -- true = 1
           Int_IO.Put (1, 0);
        else
           -- false = 0
           Int_IO.Put (0, 0);
        end if;
     end loop;
     Ada.Text_IO.New_Line;
  end Put;
  -- dump encoding map
  procedure Dump_Encoding (Tree : Huffman_Tree) is
     use type Encoding_Maps.Cursor;
     Position : Encoding_Maps.Cursor := Tree.Map.First;
  begin
     -- iterate map
     while Position /= Encoding_Maps.No_Element loop
        -- key
        Put (Encoding_Maps.Key (Position));
        Ada.Text_IO.Put (" = ");
        -- code
        Put (Encoding_Maps.Element (Position));
        Encoding_Maps.Next (Position);
     end loop;
  end Dump_Encoding;

end Huffman;</lang>

example main.adb: <lang Ada>with Ada.Text_IO; with Huffman; procedure Main is

  package Char_Natural_Huffman_Tree is new Huffman
    (Symbol_Type => Character,
     Put => Ada.Text_IO.Put,
     Symbol_Sequence => String,
     Frequency_Type => Natural);
  Tree         : Char_Natural_Huffman_Tree.Huffman_Tree;
  Frequencies  : Char_Natural_Huffman_Tree.Frequency_Maps.Map;
  Input_String : constant String :=
    "this is an example for huffman encoding";

begin

  -- build frequency map
  for I in Input_String'Range loop
     declare
        use Char_Natural_Huffman_Tree.Frequency_Maps;
        Position : constant Cursor := Frequencies.Find (Input_String (I));
     begin
        if Position = No_Element then
           Frequencies.Insert (Key => Input_String (I), New_Item => 1);
        else
           Frequencies.Replace_Element
             (Position => Position,
              New_Item => Element (Position) + 1);
        end if;
     end;
  end loop;
  -- create huffman tree
  Char_Natural_Huffman_Tree.Create_Tree
    (Tree        => Tree,
     Frequencies => Frequencies);
  -- dump encodings
  Char_Natural_Huffman_Tree.Dump_Encoding (Tree => Tree);
  -- encode example string
  declare
     Code : constant Char_Natural_Huffman_Tree.Bit_Sequence :=
       Char_Natural_Huffman_Tree.Encode
         (Tree    => Tree,
          Symbols => Input_String);
  begin
     Char_Natural_Huffman_Tree.Put (Code);
     Ada.Text_IO.Put_Line
       (Char_Natural_Huffman_Tree.Decode (Tree => Tree, Code => Code));
  end;

end Main;</lang>

Output:

  = 101
a = 1001
c = 01010
d = 01011
e = 1100
f = 1101
g = 01100
h = 11111
i = 1110
l = 01101
m = 0010
n = 000
o = 0011
p = 01110
r = 01111
s = 0100
t = 10000
u = 10001
x = 11110
1000011111111001001011110010010110010001011100111101001001001110011011100101110100110111110111111100011101110100101001000101110000001010001101011111000001100
this is an example for huffman encoding

C

This code lacks a lot of needed checkings, especially for memory allocation.

<lang c>#include <stdio.h>

  1. include <stdlib.h>
  2. include <string.h>
  1. define BYTES 256

struct huffcode {

 int nbits;
 int code;

}; typedef struct huffcode huffcode_t;

struct huffheap {

 int *h;
 int n, s, cs;
 long *f;

}; typedef struct huffheap heap_t;

/* heap handling funcs */ static heap_t *_heap_create(int s, long *f) {

 heap_t *h;
 h = malloc(sizeof(heap_t));
 h->h = malloc(sizeof(int)*s);
 h->s = h->cs = s;
 h->n = 0;
 h->f = f;
 return h;

}

static void _heap_destroy(heap_t *heap) {

 free(heap->h);
 free(heap);

}

  1. define swap_(I,J) do { int t_; t_ = a[(I)]; \
     a[(I)] = a[(J)]; a[(J)] = t_; } while(0)

static void _heap_sort(heap_t *heap) {

 int i=1, j=2; /* gnome sort */
 int *a = heap->h;
 while(i < heap->n) { /* smaller values are kept at the end */
   if ( heap->f[a[i-1]] >= heap->f[a[i]] ) {
     i = j; j++;
   } else {
     swap_(i-1, i);
     i--;
     i = (i==0) ? j++ : i;
   }
 }

}

  1. undef swap_

static void _heap_add(heap_t *heap, int c) {

 if ( (heap->n + 1) > heap->s ) {
   heap->h = realloc(heap->h, heap->s + heap->cs);
   heap->s += heap->cs;
 }
 heap->h[heap->n] = c;
 heap->n++;
 _heap_sort(heap);

}

static int _heap_remove(heap_t *heap) {

 if ( heap->n > 0 ) {
   heap->n--;
   return heap->h[heap->n];
 }
 return -1;

}

/* huffmann code generator */ huffcode_t **create_huffman_codes(long *freqs) {

 huffcode_t **codes;
 heap_t *heap;
 long efreqs[BYTES*2];
 int preds[BYTES*2];
 int i, extf=BYTES;
 int r1, r2;
 memcpy(efreqs, freqs, sizeof(long)*BYTES);
 memset(&efreqs[BYTES], 0, sizeof(long)*BYTES);
 heap = _heap_create(BYTES*2, efreqs);
 if ( heap == NULL ) return NULL;
 for(i=0; i < BYTES; i++) if ( efreqs[i] > 0 ) _heap_add(heap, i);
 while( heap->n > 1 )
 {
   r1 = _heap_remove(heap);
   r2 = _heap_remove(heap);
   efreqs[extf] = efreqs[r1] + efreqs[r2];
   _heap_add(heap, extf);
   preds[r1] = extf;
   preds[r2] = -extf;
   extf++;
 }
 r1 = _heap_remove(heap);
 preds[r1] = r1;
 _heap_destroy(heap);
 codes = malloc(sizeof(huffcode_t *)*BYTES);
 int bc, bn, ix;
 for(i=0; i < BYTES; i++) {
   bc=0; bn=0;
   if ( efreqs[i] == 0 ) { codes[i] = NULL; continue; }
   ix = i;
   while( abs(preds[ix]) != ix ) {
     bc |= ((preds[ix] >= 0) ? 1 : 0 ) << bn;
     ix = abs(preds[ix]);
     bn++;
   }
   codes[i] = malloc(sizeof(huffcode_t));
   codes[i]->nbits = bn;
   codes[i]->code = bc;
 }
 return codes;

}

void free_huffman_codes(huffcode_t **c) {

 int i;
 for(i=0; i < BYTES; i++) free(c[i]);
 free(c);

}

  1. define MAXBITSPERCODE 100

void inttobits(int c, int n, char *s) {

 s[n] = 0;
 while(n > 0) {
   s[n-1] = (c%2) + '0';
   c >>= 1; n--;
 }

}

const char *test = "this is an example for huffman encoding";

int main() {

 huffcode_t **r;
 int i;
 char strbit[MAXBITSPERCODE];
 const char *p;
 long freqs[BYTES];
 memset(freqs, 0, sizeof freqs);
 p = test;
 while(*p != '\0') freqs[*p++]++;
 r = create_huffman_codes(freqs);
 for(i=0; i < BYTES; i++) {
   if ( r[i] != NULL ) {
     inttobits(r[i]->code, r[i]->nbits, strbit);
     printf("%c (%d) %s\n", i, r[i]->code, strbit);
   }
 }
 free_huffman_codes(r);
 return 0;

}</lang>

Alternative

Using a simple heap-based priority queue. Heap is an array, while ndoe tree is done by binary links. <lang c>#include <stdio.h>

  1. include <string.h>

typedef struct node_t { struct node_t *left, *right; int freq; char c; } *node;

struct node_t pool[256] = Template:0; node qqq[255], *q = qqq - 1; int n_nodes = 0, qend = 1; char *code[128] = {0}, buf[1024];

node new_node(int freq, char c, node a, node b) { node n = pool + n_nodes++; if (freq) n->c = c, n->freq = freq; else { n->left = a, n->right = b; n->freq = a->freq + b->freq; } return n; }

/* priority queue */ void qinsert(node n) { int j, i = qend++; while ((j = i / 2)) { if (q[j]->freq <= n->freq) break; q[i] = q[j], i = j; } q[i] = n; }

node qremove() { int i, l; node n = q[i = 1];

if (qend < 2) return 0; qend--; while ((l = i * 2) < qend) { if (l + 1 < qend && q[l + 1]->freq < q[l]->freq) l++; q[i] = q[l], i = l; } q[i] = q[qend]; return n; }

/* walk the tree and put 0s and 1s */ void build_code(node n, char *s, int len) { static char *out = buf; if (n->c) { s[len] = 0; strcpy(out, s); code[n->c] = out; out += len + 1; return; }

s[len] = '0'; build_code(n->left, s, len + 1); s[len] = '1'; build_code(n->right, s, len + 1); }

void init(char *s) { int i, freq[128] = {0}; char c[16];

while (*s) freq[(int)*s++]++;

for (i = 0; i < 128; i++) if (freq[i]) qinsert(new_node(freq[i], i, 0, 0));

while (qend > 2) qinsert(new_node(0, 0, qremove(), qremove()));

build_code(q[1], c, 0); }

void encode(char *s, char *out) { while (*s) { strcpy(out, code[*s]); out += strlen(code[*s++]); } }

void decode(char *s, node t) { node n = t; while (*s) { if (*s++ == '0') n = n->left; else n = n->right;

if (n->c) putchar(n->c), n = t; }

putchar('\n'); if (t != n) printf("garbage input\n"); }

int main(void) { int i; char *str = "this is an example for huffman encoding", buf[1024];

init(str); for (i = 0; i < 128; i++) if (code[i]) printf("'%c': %s\n", i, code[i]);

encode(str, buf); printf("encoded: %s\n", buf);

printf("decoded: "); decode(buf, q[1]);

return 0; }</lang>output<lang>' ': 000 'a': 1000 'c': 01101 'd': 01100 'e': 0101 'f': 0010 'g': 010000 'h': 1101 'i': 0011 'l': 010001 'm': 1111 'n': 101 'o': 1110 'p': 10011 'r': 10010 's': 1100 't': 01111 'u': 01110 'x': 01001 encoded: 0111111010011110000000111100000100010100001010100110001111100110100010101000001011101001000011010111000100010111110001010000101101011011110011000011101010000 decoded: this is an example for huffman encoding</lang>

C++

This code builds a tree to generate huffman codes, then prints the codes.

<lang cpp>#include <iostream>

  1. include <queue>
  2. include <map>
  3. include <climits> // for CHAR_BIT
  4. include <iterator>
  5. include <algorithm>

const int UniqueSymbols = 1 << CHAR_BIT; const char* SampleString = "this is an example for huffman encoding";

typedef std::vector<bool> HuffCode; typedef std::map<char, HuffCode> HuffCodeMap;

class INode { public:

   const int f;
   virtual ~INode() {}

protected:

   INode(int f) : f(f) {}

};

class InternalNode : public INode { public:

   INode *const left;
   INode *const right;
   InternalNode(INode* c0, INode* c1) : INode(c0->f + c1->f), left(c0), right(c1) {}
   ~InternalNode()
   {
       delete left;
       delete right;
   }

};

class LeafNode : public INode { public:

   const char c;
   LeafNode(int f, char c) : INode(f), c(c) {}

};

struct NodeCmp {

   bool operator()(const INode* lhs, const INode* rhs) const { return lhs->f > rhs->f; }

};

INode* BuildTree(const int (&frequencies)[UniqueSymbols]) {

   std::priority_queue<INode*, std::vector<INode*>, NodeCmp> trees;
   for (int i = 0; i < UniqueSymbols; ++i)
   {
       if(frequencies[i] != 0)
           trees.push(new LeafNode(frequencies[i], (char)i));
   }
   while (trees.size() > 1)
   {
       INode* childR = trees.top();
       trees.pop();
       INode* childL = trees.top();
       trees.pop();
       INode* parent = new InternalNode(childR, childL);
       trees.push(parent);
   }
   return trees.top();

}

void GenerateCodes(const INode* node, const HuffCode& prefix, HuffCodeMap& outCodes) {

   if (const LeafNode* lf = dynamic_cast<const LeafNode*>(node))
   {
       outCodes[lf->c] = prefix;
   }
   else if (const InternalNode* in = dynamic_cast<const InternalNode*>(node))
   {
       HuffCode leftPrefix = prefix;
       leftPrefix.push_back(false);
       GenerateCodes(in->left, leftPrefix, outCodes);
       HuffCode rightPrefix = prefix;
       rightPrefix.push_back(true);
       GenerateCodes(in->right, rightPrefix, outCodes);
   }

}

int main() {

   // Build frequency table
   int frequencies[UniqueSymbols] = {0};
   const char* ptr = SampleString;
   while (*ptr != '\0')
       ++frequencies[*ptr++];
   INode* root = BuildTree(frequencies);
   
   HuffCodeMap codes;
   GenerateCodes(root, HuffCode(), codes);
   delete root;
   for (HuffCodeMap::const_iterator it = codes.begin(); it != codes.end(); ++it)
   {
       std::cout << it->first << " ";
       std::copy(it->second.begin(), it->second.end(),
                 std::ostream_iterator<bool>(std::cout));
       std::cout << std::endl;
   }
   return 0;

}</lang>

Sample output:

  110
a 1001
c 101010
d 10001
e 1111
f 1011
g 101011
h 0101
i 1110
l 01110
m 0011
n 000
o 0010
p 01000
r 01001
s 0110
t 01111
u 10100
x 10000

Clojure

<lang clojure> (use 'clojure.contrib.seq-utils)

(defn probs [items]

 (let [freqs (frequencies items) sum (reduce + (vals freqs))]
   (into {} (map (fn k v [k (/ v sum)]) freqs))))

(defn init-pq [weighted-items]

 (let [comp (proxy [java.util.Comparator] []
               (compare [a b] (compare (:priority a) (:priority b))))
       pq (java.util.PriorityQueue. (count weighted-items) comp)]
   (doseq [[item prob] weighted-items] (.add pq { :symbol item, :priority prob }))
   pq))

(defn huffman-tree [pq]

 (while (> (.size pq) 1)
   (let [a (.poll pq) b (.poll pq) new-node { :priority (+ (:priority a) (:priority b)), :left a, :right b }]
     (.add pq new-node)))
 (.poll pq))        
     

(defn symbol-map

 ([t] (into {} (symbol-map t [])))
 ([{:keys [symbol,left,right] :as t} code]
   (if symbol symbol code
     (concat (symbol-map left (conj code 0))
             (symbol-map right (conj code 1))))))

(defn huffman-encode [items]

 (-> items probs init-pq huffman-tree symbol-map))

(defn display-huffman-encode [s]

 (println "SYMBOL\tWEIGHT\tHUFFMAN CODE")
 (let [probs (probs (seq s))]
   (doseq [[char code] (huffman-encode (seq s))]
     (printf "%s:\t\t%s\t\t%s\n" char (probs char) (apply str code)))))

(display-huffman-encode "this is an example for huffman encoding") </lang>

Program Output:

SYMBOL	WEIGHT	HUFFMAN CODE
 :		2/13		111
a:		1/13		1001
c:		1/39		00110
d:		1/39		00000
e:		1/13		1011
f:		1/13		1101
g:		1/39		101010
h:		2/39		0001
i:		1/13		1100
l:		1/39		10001
m:		2/39		0100
n:		4/39		011
o:		2/39		0101
p:		1/39		101011
r:		1/39		10100
s:		2/39		0010
t:		1/39		00001
u:		1/39		10000
x:		1/39		00111

CoffeeScript

<lang coffeescript> huffman_encoding_table = (counts) ->

 # counts is a hash where keys are characters and
 # values are frequencies;
 # return a hash where keys are codes and values
 # are characters
 
 build_huffman_tree = ->
   # returns a Huffman tree.  Each node has
   #   cnt: total frequency of all chars in subtree
   #   c: character to be encoded (leafs only)
   #   children: children nodes (branches only)
   q = min_queue()
   for c, cnt of counts
     q.enqueue cnt,
       cnt: cnt
       c: c
   while q.size() >= 2
     a = q.dequeue()
     b = q.dequeue()
     cnt = a.cnt + b.cnt
     node = 
       cnt: cnt
       children: [a, b]
     q.enqueue cnt, node
   root = q.dequeue()
   
 root = build_huffman_tree()
 
 codes = {}
 encode = (node, code) ->
   if node.c?
     codes[code] = node.c
   else
     encode node.children[0], code + "0"
     encode node.children[1], code + "1"
 
 encode(root, "")
 codes

min_queue = ->

 # This is very non-optimized; you could use a binary heap for better
 # performance.  Items with smaller priority get dequeued first.
 arr = []
 enqueue: (priority, data) ->
   i = 0
   while i < arr.length
     if priority < arr[i].priority
       break
     i += 1  
   arr.splice i, 0,
     priority: priority
     data: data
 dequeue: ->
   arr.shift().data
 size: -> arr.length
 _internal: ->
   arr

freq_count = (s) ->

 cnts = {}
 for c in s
   cnts[c] ?= 0
   cnts[c] += 1
 cnts
 

rpad = (s, n) ->

 while s.length < n
   s += ' '
 s

examples = [

 "this is an example for huffman encoding"
 "abcd"
 "abbccccddddddddeeeeeeeee"

]

for s in examples

 console.log "---- #{s}"
 counts = freq_count(s)
 huffman_table = huffman_encoding_table(counts)
 codes = (code for code of huffman_table).sort()
 for code in codes
   c = huffman_table[code]
   console.log "#{rpad(code, 5)}: #{c} (#{counts[c]})"
 console.log()
</lang>

output

<lang> > coffee huffman.coffee


this is an example for huffman encoding

000  : n (4) 0010 : s (2) 0011 : m (2) 0100 : o (2) 01010: t (1) 01011: x (1) 01100: p (1) 01101: l (1) 01110: r (1) 01111: u (1) 10000: c (1) 10001: d (1) 1001 : i (3) 101  : (6) 1100 : a (3) 1101 : e (3) 1110 : f (3) 11110: g (1) 11111: h (2)


abcd

00  : a (1) 01  : b (1) 10  : c (1) 11  : d (1)


abbccccddddddddeeeeeeeee

0  : e (9) 1000 : a (1) 1001 : b (2) 101  : c (4) 11  : d (8)

</lang>


Common Lisp

This implementation uses a tree built of huffman-nodes, and a hash table mapping from elements of the input sequence to huffman-nodes. The priority queue is implemented as a sorted list. (For a more efficient implementation of a priority queue, see the Heapsort task.)

<lang lisp>(defstruct huffman-node

 (weight 0 :type number)
 (element nil :type t)
 (encoding nil :type (or null bit-vector))
 (left nil :type (or null huffman-node))
 (right nil :type (or null huffman-node)))

(defun initial-huffman-nodes (sequence &key (test 'eql))

 (let* ((length (length sequence))
        (increment (/ 1 length))
        (nodes (make-hash-table :size length :test test))
        (queue '()))
   (map nil #'(lambda (element)
                (multiple-value-bind (node presentp) (gethash element nodes)
                  (if presentp
                    (incf (huffman-node-weight node) increment)
                    (let ((node (make-huffman-node :weight increment
                                                   :element element)))
                      (setf (gethash element nodes) node
                            queue (list* node queue))))))
        sequence)
   (values nodes (sort queue '< :key 'huffman-node-weight))))

(defun huffman-tree (sequence &key (test 'eql))

 (multiple-value-bind (nodes queue)
     (initial-huffman-nodes sequence :test test)
   (do () ((endp (rest queue)) (values nodes (first queue)))
     (destructuring-bind (n1 n2 &rest queue-rest) queue
       (let ((n3 (make-huffman-node
                  :left n1
                  :right n2
                  :weight (+ (huffman-node-weight n1)
                             (huffman-node-weight n2)))))
         (setf queue (merge 'list (list n3) queue-rest '<
                            :key 'huffman-node-weight)))))))1

(defun huffman-codes (sequence &key (test 'eql))

 (multiple-value-bind (nodes tree)
     (huffman-tree sequence :test test)
   (labels ((hc (node length bits)
              (let ((left (huffman-node-left node))
                    (right (huffman-node-right node)))
                (cond
                 ((and (null left) (null right))
                  (setf (huffman-node-encoding node)
                        (make-array length :element-type 'bit
                                    :initial-contents (reverse bits))))
                 (t (hc left (1+ length) (list* 0 bits))
                    (hc right (1+ length) (list* 1 bits)))))))
     (hc tree 0 '())
     nodes)))

(defun print-huffman-code-table (nodes &optional (out *standard-output*))

 (format out "~&Element~10tWeight~20tCode")
 (loop for node being each hash-value of nodes
       do (format out "~&~s~10t~s~20t~s"
                  (huffman-node-element node)
                  (huffman-node-weight node)
                  (huffman-node-encoding node))))</lang>

Example:

> (print-huffman-code-table 
   (huffman-codes "this is an example for huffman encoding"))
Element   Weight    Code
#\t       1/39      #*10010
#\d       1/39      #*01101
#\m       2/39      #*0100
#\f       1/13      #*1100
#\o       2/39      #*0111
#\x       1/39      #*100111
#\h       2/39      #*1000
#\a       1/13      #*1010
#\s       2/39      #*0101
#\c       1/39      #*00010
#\l       1/39      #*00001
#\u       1/39      #*00011
#\e       1/13      #*1101
#\n       4/39      #*001
#\g       1/39      #*01100
#\p       1/39      #*100110
#\i       1/13      #*1011
#\r       1/39      #*00000
#\Space   2/13      #*111

D

Translation of: Python

<lang d>import std.stdio, std.algorithm, std.typecons, std.container;

auto encode(TFreq, TSymb)(in TFreq[TSymb] symb2freq) pure {

   alias Tuple!(TSymb,"symb", string,"code") Pair;
   alias Tuple!(TFreq,"freq", Pair[],"pairs") Block;
   Block[] blocks;
   foreach (symb, freq; symb2freq)
       blocks ~= Block(freq, [Pair(symb, "")]);
   auto heap = BinaryHeap!(Block[], "b < a")(blocks);
   while (heap.length > 1) {
       auto lo = heap.front(); heap.removeFront();
       auto hi = heap.front(); heap.removeFront();
       foreach (ref pair; lo.pairs)
           pair.code = '0' ~ pair.code;
       foreach (ref pair; hi.pairs)
           pair.code = '1' ~ pair.code;
       heap.insert(Block(lo.freq + hi.freq, lo.pairs ~ hi.pairs));
   }
   // schwartzSort!q{ a.code.length }(heap.front().pairs);
   schwartzSort!((p){ return p.code.length; })(heap.front().pairs);
   return heap.front().pairs;

}

void main() {

   const txt = "this is an example for huffman encoding";
   int[char] symb2freq;
   foreach (c; txt)
       symb2freq[c]++;
   writeln("Symbol\tWeight\tHuffman Code");
   foreach (p; encode(symb2freq))
       writefln("%s\t%s\t%s", p.symb, symb2freq[p.symb], p.code);

}</lang> Output:

Symbol  Weight  Huffman Code
n   4   010
    6   101
a   3   1001
e   3   1100
o   2   0011
i   3   1110
h   2   0001
f   3   1101
s   2   0111
m   2   0010
t   1   10000
g   1   00000
r   1   01101
p   1   01100
l   1   00001
u   1   10001
x   1   11110
d   1   111111
c   1   111110

F#

Translation of: OCaml

<lang fsharp>type 'a HuffmanTree =

   | Leaf of int * 'a
   | Node of int * 'a HuffmanTree * 'a HuffmanTree

let freq = function Leaf (f, _) | Node (f, _, _) -> f let freqCompare a b = compare (freq a) (freq b)

let buildTree charFreqs =

   let leaves = List.map (fun (c,f) -> Leaf (f,c)) charFreqs
   let freqSort = List.sortWith freqCompare
   let rec aux = function
       | [] -> failwith "empty list"
       | [a] -> a
       | a::b::tl ->
           let node = Node(freq a + freq b, a, b)
           aux (freqSort(node::tl))
   aux (freqSort leaves)

let rec printTree = function

 | code, Leaf (f, c) ->
     printfn "%c\t%d\t%s" c f (String.concat "" (List.rev code));
 | code, Node (_, l, r) ->
     printTree ("0"::code, l);
     printTree ("1"::code, r)

let () =

 let str = "this is an example for huffman encoding"
 let charFreqs =
   str |> Seq.groupBy id
       |> Seq.map (fun (c, vals) -> (c, Seq.length vals))
       |> Map.ofSeq
        
 let tree = charFreqs |> Map.toList |> buildTree
 printfn "Symbol\tWeight\tHuffman code";
 printTree ([], tree)</lang>

Output:

Symbol	Weight	Huffman code
p	1	00000
r	1	00001
g	1	00010
l	1	00011
n	4	001
m	2	0100
o	2	0101
c	1	01100
d	1	01101
h	2	0111
s	2	1000
x	1	10010
t	1	100110
u	1	100111
f	3	1010
i	3	1011
a	3	1100
e	3	1101
 	6	111

Fantom

<lang fantom> class Node {

 Float probability := 0.0f

}

class Leaf : Node {

 Int character
 new make (Int character, Float probability)
 {
   this.character = character
   this.probability = probability
 }

}

class Branch : Node {

 Node left
 Node right
 new make (Node left, Node right)
 {
   this.left = left
   this.right = right
   probability = this.left.probability + this.right.probability
 }

}

class Huffman {

 Node[] queue := [,]
 Str:Str table := [:]
 new make (Int[] items)
 {
   uniqueItems := items.dup.unique
   uniqueItems.each |Int item|
   {
     num := items.findAll { it == item }.size
     queue.add (Leaf(item, num.toFloat / items.size)) 
   }
   createTree 
   createTable
 }
 Void createTree ()
 {
   while (queue.size > 1)
   {
     queue.sort |a,b| {a.probability <=> b.probability}
     node1 := queue.removeAt (0)
     node2 := queue.removeAt (0)
     queue.add (Branch (node1, node2))
   }
 }
 Void traverse (Node node, Str encoding)
 {
   if (node is Leaf)
   {
     table[(node as Leaf).character.toChar] = encoding
   }
   else // (node is Branch)
   {
     traverse ((node as Branch).left, encoding + "0")
     traverse ((node as Branch).right, encoding + "1")
   }
 }
 Void createTable ()
 {
   if (queue.size != 1) return // error!
   traverse (queue.first, "")
 }
 override Str toStr ()
 {
   result := "Huffman Encoding Table:\n"
   table.keys.sort.each |Str key|
   {
     result += "$key -> ${table[key]}\n"
   }
   return result
 }

}

class Main {

 public static Void main ()
 {
   example := "this is an example for huffman encoding"
   huffman := Huffman (example.chars)
   echo ("From \"$example\"")
   echo (huffman)
 }

} </lang>

Output:

From "this is an example for huffman encoding"
Huffman Encoding Table:
  -> 101
a -> 1100
c -> 10000
d -> 10001
e -> 1101
f -> 1110
g -> 11110
h -> 11111
i -> 1001
l -> 01101
m -> 0011
n -> 000
o -> 0100
p -> 01100
r -> 01110
s -> 0010
t -> 01010
u -> 01111
x -> 01011

Go

Translation of: Java

<lang go>package main

import (

   "fmt"
   "container/heap"

)

type HuffmanTree interface {

   Freq() int

}

type HuffmanLeaf struct {

   freq  int
   value int

}

type HuffmanNode struct {

   freq        int
   left, right HuffmanTree

}

func (self HuffmanLeaf) Freq() int {

   return self.freq

}

func (self HuffmanNode) Freq() int {

   return self.freq

}

type treeHeap []HuffmanTree

func (th treeHeap) Len() int { return len(th) } func (th treeHeap) Less(i, j int) bool {

   return th[i].Freq() < th[j].Freq()

} func (th *treeHeap) Push(ele interface{}) { *th = append(*th, ele.(HuffmanTree)) } func (th *treeHeap) Pop() (popped interface{}) {

   popped = (*th)[len(*th)-1]
   *th = (*th)[:len(*th)-1]
   return

} func (th treeHeap) Swap(i, j int) { th[i], th[j] = th[j], th[i] }

func buildTree(charFreqs map[int]int) HuffmanTree {

   var trees treeHeap
   for c, f := range charFreqs {
       trees = append(trees, HuffmanLeaf{f, c})
   }
   heap.Init(&trees)
   for trees.Len() > 1 {
       // two trees with least frequency
       a := heap.Pop(&trees).(HuffmanTree)
       b := heap.Pop(&trees).(HuffmanTree)
       // put into new node and re-insert into queue
       heap.Push(&trees, HuffmanNode{a.Freq() + b.Freq(), a, b})
   }
   return heap.Pop(&trees).(HuffmanTree)

}

func printCodes(tree HuffmanTree, prefix []int) {

   switch i := tree.(type) {
   case HuffmanLeaf:
       // print out character, frequency, and code for this leaf (which is just the prefix)
       fmt.Printf("%c\t%d\t%s\n", i.value, i.freq, string(prefix))
   case HuffmanNode:
       // traverse left
       prefix = append(prefix, '0')
       printCodes(i.left, prefix)
       prefix = prefix[:len(prefix)-1]
       // traverse right
       prefix = append(prefix, '1')
       printCodes(i.right, prefix)
       prefix = prefix[:len(prefix)-1]
   }

}

func main() {

   test := "this is an example for huffman encoding"
   charFreqs := make(map[int]int)
   // read each character and record the frequencies
   for _, c := range test {
       charFreqs[c]++
   }
   // build tree
   tree := buildTree(charFreqs)
   // print out results
   fmt.Println("SYMBOL\tWEIGHT\tHUFFMAN CODE")
   printCodes(tree, []int{})

}</lang> Example output:

SYMBOL	WEIGHT	HUFFMAN CODE
n	4	000
m	2	0010
o	2	0011
s	2	0100
u	1	01010
p	1	01011
h	2	0110
d	1	01110
c	1	01111
t	1	10000
l	1	10001
x	1	10010
r	1	100110
g	1	100111
i	3	1010
e	3	1011
 	6	110
f	3	1110
a	3	1111
Translation of: Python

<lang go>package main

import (

   "fmt"
   "container/heap"

)

type coded struct {

   char int
   code string

}

type counted struct {

   total int
   chars []coded

}

type cHeap []counted

// satisfy heap.Interface func (c cHeap) Len() int { return len(c) } func (c cHeap) Less(i, j int) bool { return c[i].total < c[j].total } func (c cHeap) Swap(i, j int) { c[i], c[j] = c[j], c[i] } func (c *cHeap) Push(ele interface{}) {

   *c = append(*c, ele.(counted))

} func (c *cHeap) Pop() (popped interface{}) {

   popped = (*c)[len(*c)-1]
   *c = (*c)[:len(*c)-1]
   return

}

func encode(symb2freq map[int]int) []coded {

   var ch cHeap
   for char, freq := range symb2freq {
       ch = append(ch, counted{freq, []codedTemplate:Char: char})
   }
   heap.Init(&ch)
   for len(ch) > 1 {
       a := heap.Pop(&ch).(counted)
       b := heap.Pop(&ch).(counted)
       for i, c := range a.chars {
           a.chars[i].code = "0" + c.code
       }
       for i, c := range b.chars {
           b.chars[i].code = "1" + c.code
       }
       heap.Push(&ch, counted{a.total+b.total, append(a.chars, b.chars...)})
   }
   return heap.Pop(&ch).(counted).chars

}

const txt = "this is an example for huffman encoding"

func main() {

   symb2freq := make(map[int]int)
   for _, c := range txt {
       symb2freq[c]++
   }
   table := encode(symb2freq)
   fmt.Println("Symbol  Weight Huffman Code")
   for _, c := range table {
       fmt.Printf("     %c    %d    %s\n", c.char, symb2freq[c.char], c.code)
   }

}</lang>

Haskell

Credits go to huffman where you'll also find a non-tree solution. Implements its priority queue as a sorted list. <lang haskell>import Data.List import Control.Arrow import Data.Ord

data HTree a = Leaf a | Branch (HTree a) (HTree a)

               deriving (Show, Eq, Ord)

test :: String -> IO () test s = mapM_ (\(a,b)-> putStrLn ('\ : a : "\' : " ++ b))

        . serialize . huffmanTree . freq $ s

serialize :: HTree a -> [(a, String)] serialize (Branch l r) = map (second('0':)) (serialize l) ++ map (second('1':)) (serialize r) serialize (Leaf x) = [(x, "")]

huffmanTree :: (Ord w, Num w) => [(w, a)] -> HTree a huffmanTree = snd . head . until (null.tail) hstep

                 . sortBy (comparing fst) . map (second Leaf)

hstep :: (Ord a, Num a) => [(a, HTree b)] -> [(a, HTree b)] hstep ((w1,t1):(w2,t2):wts) = insertBy (comparing fst) (w1 + w2, Branch t1 t2) wts

freq :: Ord a => [a] -> [(Int, a)] freq = map (length &&& head) . group . sort</lang> Example output: <lang haskell>*Main> test "this is an example for huffman encoding" 'p' : 00000 'r' : 00001 'g' : 00010 'l' : 00011 'n' : 001 'm' : 0100 'o' : 0101 'c' : 01100 'd' : 01101 'h' : 0111 's' : 1000 'x' : 10010 't' : 100110 'u' : 100111 'f' : 1010 'i' : 1011 'a' : 1100 'e' : 1101 ' ' : 111</lang>

Using Set as a priority queue

(might be worth it for bigger alphabets): <lang haskell>import qualified Data.Set as S

htree :: (Ord t, Num t, Ord a) => S.Set (t, HTree a) -> HTree a htree ts | S.null ts_1 = t1

        | otherwise = htree ts_3
          where
            ((w1,t1), ts_1) = S.deleteFindMin ts
            ((w2,t2), ts_2) = S.deleteFindMin ts_1
            ts_3 = S.insert (w1 + w2, Branch t1 t2) ts_2

huffmanTree :: (Ord w, Num w, Ord a) => [(w, a)] -> HTree a huffmanTree = htree . S.fromList . map (second Leaf)</lang>

A non-tree version

This produces the output required of a task without building the Huffman tree at all, by building all the trace strings directly while reducing the histogram: <lang haskell>huffman :: [(Int, Char)] -> [(Char, String)] huffman = reduce . map (\(p, c) -> (p, [(c ,"")])) . sortBy (comparing fst)

 where
   reduce [(_, ys)]  = ys
   reduce (x1:x2:xs) = reduce $ insertBy (comparing fst) (add x1 x2) xs
   add (p1, xs1) (p2, xs2) = (p1 + p2, map (second ('0':)) xs1 ++ map (second ('1':)) xs2)

test s = mapM_ (\(a,b)->putStrLn ('\ : a : "\' : " ++ b)) . huffman . freq $ s</lang>

Icon and Unicon

<lang Icon>record huffnode(l,r,n,c) # internal and leaf nodes record huffcode(c,n,b,i) # encoding table char, freq, bitstring, bits (int)

procedure main()

s := "this is an example for huffman encoding"

Count := huffcount(s) # frequency count Tree := huffTree(Count) # heap and tree

Code := [] # extract encodings CodeT := table() every x := huffBits(Tree) do

  put( Code, CodeT[c] := huffcode( c := x[-1], Count[c].n, b := x[1:-1], integer("2r"||b) ) )


Code := sortf( Code, 1 ) # show table in char order write("Input String : ",image(s)) write(right("char",5), right("freq",5), " encoding" ) every write(right(image((x := !Code).c),5), right(x.n,5), " ", x.b )

end

procedure huffBits(N) # generates huffman bitcodes with trailing character if \N.c then return N.c # . append leaf char code suspend "0" || huffBits(N.l) # . left suspend "1" || huffBits(N.r) # . right end


procedure huffTree(T) # two queue huffman tree method local Q1,Q2,x,n1,n2

Q1 := [] # queue of characters and weights every x := !T do # ensure all are huffnodes

  if type(x) == "huffnode" then put(Q1,x) else runerr(205,x)

Q1 := sortf(Q1,3) # sort by weight ( 3 means by .n )

if *Q1 > 1 then Q2 := [] while *Q1+*\Q2 > 1 do { # While there is more than one node ...

  n1 := if Q1[1] & ( ( Q1[1].n <= Q2[1].n ) | not Q2[1] ) then get(Q1) else get(Q2)  # lowest weight from Q1 or Q2
  n2 := if Q1[1] & ( ( Q1[1].n <= Q2[1].n ) | not Q2[1] ) then get(Q1) else get(Q2)  # lowest weight from Q1 or Q2
  put( Q2, huffnode( n1, n2, n1.n + n2.n ) )   # new weighted node to end of Q2

}

return (\Q2 | Q1)[1] # return the root node end

procedure huffcount(s) # return characters and frequencies in a table of huffnodes by char local c,T

T := table() every c := !s do {

  /T[c] := huffnode(,,0,c) 
  T[c].n +:= 1	  
  }

return T end</lang>

Sample output:

Input String : "this is an example for huffman encoding"
 char freq encoding
  " "    6 101
  "a"    3 1100
  "c"    1 10000
  "d"    1 10001
  "e"    3 1101
  "f"    3 1110
  "g"    1 11110
  "h"    2 11111
  "i"    3 1001
  "l"    1 01101
  "m"    2 0011
  "n"    4 000
  "o"    2 0100
  "p"    1 01100
  "r"    1 01110
  "s"    2 0010
  "t"    1 01010
  "u"    1 01111
  "x"    1 01011

The following Unicon specific solution takes advantage of the Heap priority queue implementation found in the UniLib Collections package and implements the algorithm given in the problem description. The program produces Huffman codes based on each line of input. <lang Unicon>import Collections

procedure main(A)

   every line := !&input do {
       every (t := table(0))[!line] +:= 1           # Frequency table
       heap := Heap(sort(t), field, "<")            # Initial priority queue
       while heap.size() > 1 do {                   # Tree construction
           every (p1|p2) := heap.get()
           heap.add([&null, p1[2]+p2[2], p1, p2])
           }
       codes := treeWalk(heap.get(),"")             # Get codes from tree
       write("Huffman encoding:")                   # Display codes
       every pair := !sort(codes) do
           write("\t'",\pair[1],"'-> ",pair[2])
       }

end

procedure field(node) # selector function for Heap

   return node[2]  # field to use for priority ordering

end

procedure treeWalk(node, prefix, codeMap)

   /codeMap := table("")
   if /node[1] then {  # interior node
       treeWalk(node[3], prefix||"0", codeMap)
       treeWalk(node[4], prefix||"1", codeMap)
       }
   else codeMap[node[1]] := prefix
   return codeMap

end</lang>

A sample run:

->huffman
this is an example for huffman encoding
Huffman encoding:
        ' '-> 111
        'a'-> 1001
        'c'-> 00110
        'd'-> 00000
        'e'-> 1011
        'f'-> 1101
        'g'-> 101010
        'h'-> 0001
        'i'-> 1100
        'l'-> 10001
        'm'-> 0100
        'n'-> 011
        'o'-> 0101
        'p'-> 101011
        'r'-> 10100
        's'-> 0010
        't'-> 00001
        'u'-> 10000
        'x'-> 00111
aardvarks are ant eaters
Huffman encoding:
        ' '-> 011
        'a'-> 10
        'd'-> 0010
        'e'-> 010
        'k'-> 0011
        'n'-> 0001
        'r'-> 110
        's'-> 1111
        't'-> 1110
        'v'-> 0000
->

HuffStuff provides huffman encoding routines

J

Solution (drawn from the J wiki):

<lang j>hc=: 4 : 0

if. 1=#x do. y
else. ((i{x),+/j{x) hc (i{y),<j{y [ i=. (i.#x) -. j=. 2{./:x end.

)

hcodes=: 4 : 0

assert. x -:&$ y           NB. weights and words have same shape
assert. (0<:x) *. 1=#$x    NB. weights are non-negative
assert. 1 >: L.y           NB. words are boxed not more than once
w=. ,&.> y                 NB. standardized words
assert. w -: ~.w           NB. words are unique
t=. 0 {:: x hc w           NB. minimal weight binary tree
((< S: 0 t) i. w) { <@(1&=)@; S: 1 {:: t

)</lang>

Example:<lang j>  ;"1":L:0(#/.~ (],.(<' '),.hcodes) ,&.>@~.)'this is an example for huffman encoding'

t    0 1 0 1 0
h    1 1 1 1 1
i    1 0 0 1  
s    0 0 1 0  
     1 0 1    
a    1 1 0 0  
n    0 0 0    
e    1 1 0 1  
x    0 1 0 1 1
m    0 0 1 1  
p    0 1 1 0 0
l    0 1 1 0 1
f    1 1 1 0  
o    0 1 0 0  
r    0 1 1 1 0
u    0 1 1 1 1
c    1 0 0 0 0
d    1 0 0 0 1
g    1 1 1 1 0</lang>

Java

This implementation creates an actual tree structure, and then traverses the tree to recover the code. <lang java>import java.util.*;

abstract class HuffmanTree implements Comparable<HuffmanTree> {

   public final int frequency; // the frequency of this tree
   public HuffmanTree(int freq) { frequency = freq; }
   // compares on the frequency
   public int compareTo(HuffmanTree tree) {
       return frequency - tree.frequency;
   }

}

class HuffmanLeaf extends HuffmanTree {

   public final char value; // the character this leaf represents
  
   public HuffmanLeaf(int freq, char val) {
       super(freq);
       value = val;
   }

}

class HuffmanNode extends HuffmanTree {

   public final HuffmanTree left, right; // subtrees
  
   public HuffmanNode(HuffmanTree l, HuffmanTree r) {
       super(l.frequency + r.frequency);
       left = l;
       right = r;
   }

}

public class HuffmanCode {

   // input is an array of frequencies, indexed by character code
   public static HuffmanTree buildTree(int[] charFreqs) {
       PriorityQueue<HuffmanTree> trees = new PriorityQueue<HuffmanTree>();
       // initially, we have a forest of leaves
       // one for each non-empty character
       for (int i = 0; i < charFreqs.length; i++)
           if (charFreqs[i] > 0)
               trees.offer(new HuffmanLeaf(charFreqs[i], (char)i));
       assert trees.size() > 0;
       // loop until there is only one tree left
       while (trees.size() > 1) {
           // two trees with least frequency
           HuffmanTree a = trees.poll();
           HuffmanTree b = trees.poll();
           // put into new node and re-insert into queue
           trees.offer(new HuffmanNode(a, b));
       }
       return trees.poll();
   }
   public static void printCodes(HuffmanTree tree, StringBuffer prefix) {
       assert tree != null;
       if (tree instanceof HuffmanLeaf) {
           HuffmanLeaf leaf = (HuffmanLeaf)tree;
           // print out character, frequency, and code for this leaf (which is just the prefix)
           System.out.println(leaf.value + "\t" + leaf.frequency + "\t" + prefix);
       } else if (tree instanceof HuffmanNode) {
           HuffmanNode node = (HuffmanNode)tree;
           // traverse left
           prefix.append('0');
           printCodes(node.left, prefix);
           prefix.deleteCharAt(prefix.length()-1);
           // traverse right
           prefix.append('1');
           printCodes(node.right, prefix);
           prefix.deleteCharAt(prefix.length()-1);
       }
   }
   public static void main(String[] args) {
       String test = "this is an example for huffman encoding";
       // we will assume that all our characters will have
       // code less than 256, for simplicity
       int[] charFreqs = new int[256];
       // read each character and record the frequencies
       for (char c : test.toCharArray())
           charFreqs[c]++;
       // build tree
       HuffmanTree tree = buildTree(charFreqs);
       // print out results
       System.out.println("SYMBOL\tWEIGHT\tHUFFMAN CODE");
       printCodes(tree, new StringBuffer());
   }

}</lang>

Example output:

SYMBOL	WEIGHT	HUFFMAN CODE
d	1	00000
t	1	00001
h	2	0001
s	2	0010
c	1	00110
x	1	00111
m	2	0100
o	2	0101
n	4	011
u	1	10000
l	1	10001
a	3	1001
r	1	10100
g	1	101010
p	1	101011
e	3	1011
i	3	1100
f	3	1101
 	6	111

JavaScript

Translation of: Ruby
Works with: SpiderMonkey

for the print() function.

First, use the Binary Heap implementation from here: http://eloquentjavascript.net/appendix2.html

The Huffman encoder <lang javascript>function HuffmanEncoding(str) {

   this.str = str;
   var count_chars = {};
   for (var i = 0; i < str.length; i++) 
       if (str[i] in count_chars) 
           count_chars[str[i]] ++;
       else 
           count_chars[str[i]] = 1;
   var pq = new BinaryHeap(function(x){return x[0];});
   for (var ch in count_chars) 
       pq.push([count_chars[ch], ch]);
   while (pq.size() > 1) {
       var pair1 = pq.pop();
       var pair2 = pq.pop();
       pq.push([pair1[0]+pair2[0], [pair1[1], pair2[1]]]);
   }
   var tree = pq.pop();
   this.encoding = {};
   this._generate_encoding(tree[1], "");
   this.encoded_string = ""
   for (var i = 0; i < this.str.length; i++) {
       this.encoded_string += this.encoding[str[i]];
   }

}

HuffmanEncoding.prototype._generate_encoding = function(ary, prefix) {

   if (ary instanceof Array) {
       this._generate_encoding(ary[0], prefix + "0");
       this._generate_encoding(ary[1], prefix + "1");
   }
   else {
       this.encoding[ary] = prefix;
   }

}

HuffmanEncoding.prototype.inspect_encoding = function() {

   for (var ch in this.encoding) {
       print("'" + ch + "': " + this.encoding[ch])
   }

}

HuffmanEncoding.prototype.decode = function(encoded) {

   var rev_enc = {};
   for (var ch in this.encoding) 
       rev_enc[this.encoding[ch]] = ch;
   var decoded = "";
   var pos = 0;
   while (pos < encoded.length) {
       var key = ""
       while (!(key in rev_enc)) {
           key += encoded[pos];
           pos++;
       }
       decoded += rev_enc[key];
   }
   return decoded;

}</lang>

And, using the Huffman encoder <lang javascript>var s = "this is an example for huffman encoding"; print(s);

var huff = new HuffmanEncoding(s); huff.inspect_encoding();

var e = huff.encoded_string; print(e);

var t = huff.decode(e); print(t);

print("is decoded string same as original? " + (s==t));</lang>

output

this is an example for huffman encoding
'n': 000
's': 0010
'm': 0011
'o': 0100
't': 01010
'x': 01011
'p': 01100
'l': 01101
'r': 01110
'u': 01111
'c': 10000
'd': 10001
'i': 1001
' ': 101
'a': 1100
'e': 1101
'f': 1110
'g': 11110
'h': 11111
0101011111100100101011001001010111000001011101010111100001101100011011101101111001000111010111111011111110111000111100000101110100010000010010001100100011110
this is an example for huffman encoding
is decoded string same as original? true

Objective-C

Translation of: Java

This is not purely Objective-C. It uses Apple's Core Foundation library for its binary heap, which admittedly is very ugly. Thus, this only builds on Mac OS X, not GNUstep. <lang objc>#import <Foundation/Foundation.h>


@interface HuffmanTree : NSObject { int freq; } -(id)initWithFreq:(int)f; @property (readonly) int freq; @end

@implementation HuffmanTree @synthesize freq; // the frequency of this tree -(id)initWithFreq:(int)f { if (self = [super init]) { freq = f; } return self; } @end


const void *HuffmanRetain(CFAllocatorRef allocator, const void *ptr) { return [(id)ptr retain]; } void HuffmanRelease(CFAllocatorRef allocator, const void *ptr) { [(id)ptr release]; } CFComparisonResult HuffmanCompare(const void *ptr1, const void *ptr2, void *unused) { int f1 = ((HuffmanTree *)ptr1).freq; int f2 = ((HuffmanTree *)ptr2).freq; if (f1 == f2) return kCFCompareEqualTo; else if (f1 > f2) return kCFCompareGreaterThan; else return kCFCompareLessThan; }


@interface HuffmanLeaf : HuffmanTree { char value; // the character this leaf represents } @property (readonly) char value; -(id)initWithFreq:(int)f character:(char)c; @end

@implementation HuffmanLeaf @synthesize value; -(id)initWithFreq:(int)f character:(char)c { if (self = [super initWithFreq:f]) { value = c; } return self; } @end


@interface HuffmanNode : HuffmanTree { HuffmanTree *left, *right; // subtrees } @property (readonly) HuffmanTree *left, *right; -(id)initWithLeft:(HuffmanTree *)l right:(HuffmanTree *)r; @end

@implementation HuffmanNode @synthesize left, right; -(id)initWithLeft:(HuffmanTree *)l right:(HuffmanTree *)r { if (self = [super initWithFreq:l.freq+r.freq]) { left = [l retain]; right = [r retain]; } return self; } -(void)dealloc { [left release]; [right release]; [super dealloc]; } @end


HuffmanTree *buildTree(NSCountedSet *chars) {

CFBinaryHeapCallBacks callBacks = {0, HuffmanRetain, HuffmanRelease, NULL, HuffmanCompare}; CFBinaryHeapRef trees = CFBinaryHeapCreate(NULL, 0, &callBacks, NULL);

// initially, we have a forest of leaves // one for each non-empty character for (NSNumber *ch in chars) { int freq = [chars countForObject:ch]; if (freq > 0) CFBinaryHeapAddValue(trees, [[[HuffmanLeaf alloc] initWithFreq:freq character:(char)[ch intValue]] autorelease]); }

NSCAssert(CFBinaryHeapGetCount(trees) > 0, @"String must have at least one character"); // loop until there is only one tree left while (CFBinaryHeapGetCount(trees) > 1) { // two trees with least frequency HuffmanTree *a = (HuffmanTree *)CFBinaryHeapGetMinimum(trees); CFBinaryHeapRemoveMinimumValue(trees); HuffmanTree *b = (HuffmanTree *)CFBinaryHeapGetMinimum(trees); CFBinaryHeapRemoveMinimumValue(trees);

// put into new node and re-insert into queue CFBinaryHeapAddValue(trees, [[[HuffmanNode alloc] initWithLeft:a right:b] autorelease]); } HuffmanTree *result = [(HuffmanTree *)CFBinaryHeapGetMinimum(trees) retain]; CFRelease(trees); return [result autorelease]; }

void printCodes(HuffmanTree *tree, NSMutableString *prefix) { NSCAssert(tree != nil, @"tree must not be nil"); if ([tree isKindOfClass:[HuffmanLeaf class]]) { HuffmanLeaf *leaf = (HuffmanLeaf *)tree;

// print out character, frequency, and code for this leaf (which is just the prefix) NSLog(@"%c\t%d\t%@", leaf.value, leaf.freq, prefix);

} else if ([tree isKindOfClass:[HuffmanNode class]]) { HuffmanNode *node = (HuffmanNode *)tree;

// traverse left [prefix appendString:@"0"]; printCodes(node.left, prefix); [prefix deleteCharactersInRange:NSMakeRange([prefix length]-1, 1)];

// traverse right [prefix appendString:@"1"]; printCodes(node.right, prefix); [prefix deleteCharactersInRange:NSMakeRange([prefix length]-1, 1)]; } }

int main(int argc, const char * argv[]) {

   NSAutoreleasePool * pool = [[NSAutoreleasePool alloc] init];

NSString *test = @"this is an example for huffman encoding";

// read each character and record the frequencies NSCountedSet *chars = [[NSCountedSet alloc] init]; int n = [test length]; for (int i = 0; i < n; i++) [chars addObject:[NSNumber numberWithInt:[test characterAtIndex:i]]];

// build tree HuffmanTree *tree = buildTree(chars); [chars release];

// print out results NSLog(@"SYMBOL\tWEIGHT\tHUFFMAN CODE"); printCodes(tree, [NSMutableString string]);

   [pool drain];
   return 0;

}</lang>

Example output:

SYMBOL	WEIGHT	HUFFMAN CODE
g	1	00000
x	1	00001
m	2	0001
d	1	00100
u	1	00101
t	1	00110
r	1	00111
n	4	010
s	2	0110
o	2	0111
p	1	10000
l	1	10001
a	3	1001
 	6	101
f	3	1100
e	3	1101
c	1	11100
h	2	11101
i	3	1111

OCaml

Translation of: Standard ML

We use a Set (which is automatically sorted) as a priority queue. <lang ocaml>type 'a huffman_tree =

 | Leaf of 'a
 | Node of 'a huffman_tree * 'a huffman_tree

module HSet = Set.Make

 (struct
    type t = int * char huffman_tree (* pair of frequency and the tree *)
    let compare = compare
      (* We can use the built-in compare function to order this: it will order
         first by the first element (frequency) and then by the second (the tree),
         the latter of which we don't care about but which helps prevent elements
         from being equal, since Set does not allow duplicate elements *)
  end);;

let build_tree charFreqs =

 let leaves = List.fold_left (fun z (c,f) -> HSet.add (f, Leaf c) z) HSet.empty charFreqs in
 let rec aux trees =
   let f1, a = HSet.min_elt trees in
   let trees' = HSet.remove (f1,a) trees in
   if HSet.is_empty trees' then
     a
   else
     let f2, b = HSet.min_elt trees' in
     let trees = HSet.remove (f2,b) trees' in
     let trees' = HSet.add (f1 + f2, Node (a, b)) trees in
     aux trees
 in
 aux leaves

let rec print_tree code = function

 | Leaf c ->
     Printf.printf "%c\t%s\n" c (String.concat "" (List.rev code));
 | Node (l, r) ->
     print_tree ("0"::code) l;
     print_tree ("1"::code) r

let () =

 let str = "this is an example for huffman encoding" in
 let charFreqs = Hashtbl.create 42 in
 String.iter (fun c ->
     let old =
       try Hashtbl.find charFreqs c
       with Not_found -> 0 in
     Hashtbl.replace charFreqs c (old+1)
   ) str;
 let charFreqs = Hashtbl.fold (fun c f acc -> (c,f)::acc) charFreqs [] in
 let tree = build_tree charFreqs in
 print_string "Symbol\tHuffman code\n";
 print_tree [] tree</lang>

Perl

<lang perl>sub make_tree { my %letters; $letters{$_}++ for (split "", shift); my (@nodes, $n) = map({ a=>$_, freq=>$letters{$_} }, keys %letters); while ((@nodes = sort { $a->{freq} <=> $b->{freq} or $a->{a} cmp $b->{a} } @nodes) > 1) { $n = { "0"=>shift(@nodes), "1"=>shift(@nodes) }; $n->{freq} = $n->{0}{freq} + $n->{1}{freq}; push @nodes, $n; }

walk($n, "", $n->{tree} = {}); $n; }

sub walk { my ($n, $s, $h) = @_; exists $n->{a} and do { print "'$n->{a}': $s\n"; $h->{$n->{a}} = $s if $h; return; }; walk($n->{0}, $s.0, $h); walk($n->{1}, $s.1, $h); }

sub encode { my ($s, $t) = @_; $t = $t->{tree}; join("", map($t->{$_}, split("", $s))); }

sub decode { my @b = split("", shift); my ($n, $out) = $_[0];

while (@b) { $n = $n->{shift @b}; if ($n->{a}) { $out .= $n->{a}; $n = $_[0]; } } $out; }

my $text = "this is an example for huffman encoding"; my $tree = make_tree($text); my $e = encode($text, $tree); print "$e\n"; print decode($e, $tree), "\n";</lang>output<lang>'g': 00000 'l': 00001 'p': 00010 'r': 00011 't': 00100 'u': 00101 'h': 0011 'm': 0100 'o': 0101 'n': 011 's': 1000 'x': 10010 'c': 100110 'd': 100111 'a': 1010 'e': 1011 'f': 1100 'i': 1101 ' ': 111 0010000111101100011111...111110101100000 this is an example for huffman encoding</lang>

Perl 6

<lang perl6>sub huffman ($s) {

   my $de = $s.chars;
   my @q = $s.comb.classify({$_}).map({[+.value / $de, .key]}).sort;
   while @q > 1 {

my ($a,$b) = @q.splice(0,2); @q = sort [$a[0] + $b[0], [$a[1], $b[1]]], @q;

   }
   sort *.value, gather walk @q[0][1], ;

}

multi walk (@node, $prefix) {

   walk @node[0], $prefix ~ 1;
   walk @node[1], $prefix ~ 0;

} multi walk ($node, $prefix) { take $node => $prefix }

.perl.say for huffman('this is an example for huffman encoding');</lang> Output:

"d" => "000000"
"c" => "000001"
"x" => "00001"
"i" => "0001"
"f" => "0010"
"e" => "0011"
" " => "010"
"a" => "0110"
"u" => "01110"
"t" => "01111"
"s" => "1000"
"r" => "10010"
"p" => "10011"
"n" => "101"
"o" => "1100"
"m" => "1101"
"h" => "1110"
"l" => "11110"
"g" => "11111"

To demonstrate that the table can do a round trip: <lang perl6>my $str = 'this is an example for huffman encoding'; my %enc = huffman $str; my %dec = %enc.invert; say $str; my $huf = %enc{$str.comb}.join; say $huf; my $rx = join('|', map { "'" ~ .key ~ "'" }, %dec); $rx = eval '/' ~ $rx ~ '/'; say $huf.subst(/<$rx>/, -> $/ {%dec{~$/}}, :g);</lang> Output:

this is an example for huffman encoding
0111111100001100001000011000010011010101000110000101101101100111111000110100010110010010010111001110001000101101011010101000111010000011100000000000110111111
this is an example for huffman encoding

PicoLisp

Using a con cells (freq . char) for leaves, and two cells (freq left . right) for nodes. <lang PicoLisp>(de prio (Idx)

  (while (cadr Idx) (setq Idx @))
  (car Idx) )

(let (A NIL P NIL L NIL)

  (for C (chop "this is an example for huffman encoding")
     (accu 'A C 1) )                  # Count characters
  (for X A                            # Build index tree as priority queue
     (idx 'P (cons (cdr X) (car X)) T) )
  (while (or (cadr P) (cddr P))       # Remove entries, insert as nodes
     (let (A (car (idx 'P (prio P) NIL))  B (car (idx 'P (prio P) NIL)))
        (idx 'P (cons (+ (car A) (car B)) A B) T) ) )
  (setq P (car P))
  (recur (P L)                        # Traverse and print
     (if (atom (cdr P))
        (prinl (cdr P)  " " L)
        (recurse (cadr P) (cons 0 L))
        (recurse (cddr P) (cons 1 L)) ) ) )</lang>

Output:

n 000
m 0100
o 1100
s 0010
c 01010
d 11010
g 00110
l 10110
p 01110
r 11110
t 00001
u 10001
a 1001
  101
e 0011
f 1011
i 0111
x 01111
h 11111

PHP

Works with: PHP version 5.3+
Translation of: Python

(not exactly)

<lang php><?php function encode($symb2freq) {

   $heap = new SplPriorityQueue;
   $heap->setExtractFlags(SplPriorityQueue::EXTR_BOTH);
   foreach ($symb2freq as $sym => $wt)
       $heap->insert(array($sym => ), -$wt);
   while ($heap->count() > 1) {
       $lo = $heap->extract();
       $hi = $heap->extract();
       foreach ($lo['data'] as &$x)
           $x = '0'.$x;
       foreach ($hi['data'] as &$x)
           $x = '1'.$x;
       $heap->insert($lo['data'] + $hi['data'],
                     $lo['priority'] + $hi['priority']);
   }
   $result = $heap->extract();
   return $result['data'];

}

$txt = 'this is an example for huffman encoding'; $symb2freq = array_count_values(str_split($txt)); $huff = encode($symb2freq); echo "Symbol\tWeight\tHuffman Code\n"; foreach ($huff as $sym => $code)

   echo "$sym\t$symb2freq[$sym]\t$code\n";

?></lang>

Example output:

Symbol	Weight	Huffman Code
n	4	000
m	2	0010
o	2	0011
t	1	01000
g	1	01001
x	1	01010
u	1	01011
s	2	0110
c	1	01110
d	1	01111
p	1	10000
l	1	10001
a	3	1001
 	6	101
f	3	1100
i	3	1101
r	1	11100
h	2	11101
e	3	1111

Prolog

Works with SWI-Prolog <lang Prolog>huffman :- L = 'this is an example for huffman encoding', atom_chars(L, LA), msort(LA, LS), packList(LS, PL), sort(PL, PLS), build_tree(PLS, A), coding(A, [], C), sort(C, SC), format('Symbol~t Weight~t~30|Code~n'), maplist(print_code, SC).



build_tree(R1], [V2|R2, [V, [V1|R1], [V2|R2]]) :- V is V1+V2.


build_tree(R1], [V2|R2] | T], AF) :- V is V1 + V2, A = [V, [V1|R1], [V2|R2, sort([A| T], NT), build_tree(NT, AF).


coding([_A,FG,FD], Code, CF) :- ( is_node(FG) -> coding(FG, [0 | Code], C1); leaf_coding(FG, [0|Code], C1) ), ( is_node(FD) -> coding(FD, [1 | Code], C2); leaf_coding(FD, [1 | Code], C2) ), append(C1, C2, CF).

leaf_coding([FG,FD], Code, CF) :- reverse(Code, CodeR), CF = FG, FD, CodeR .


is_node([_V, _FG, _FD]).


print_code([N, Car, Code]):- format('~w :~t~w~t~30|', [Car, N]), forall(member(V, Code), write(V)), nl.


packList([],[]).

packList([X],1,X) :- !.

packList([X|Rest],[XRun|Packed]):-

   run(X,Rest, XRun,RRest),
   packList(RRest,Packed).

run(Var,[],[1,Var],[]).

run(Var,[Var|LRest],[N1,Var],RRest):-

   run(Var,LRest,[N, Var],RRest),
   N1 is N + 1.

run(Var,[Other|RRest], [1,Var],[Other|RRest]):-

   dif(Var,Other).

</lang> Output :

 ?- huffman.
Symbol          Weight        Code
c :             1             01010
d :             1             01011
g :             1             01100
l :             1             01101
p :             1             01110
r :             1             01111
t :             1             10000
u :             1             10001
x :             1             11110
h :             2             11111
m :             2             0010
o :             2             0011
s :             2             0100
a :             3             1001
e :             3             1100
f :             3             1101
i :             3             1110
n :             4             000
  :             6             101

PureBasic

Works with: PureBasic version 4.50

<lang PureBasic>OpenConsole()

SampleString.s="this is an example for huffman encoding" datalen=Len(SampleString)

Structure ztree

 linked.c
 ischar.c
 char.c
 number.l
 left.l
 right.l

EndStructure

Dim memc.c(0) memc()=@SampleString

Dim tree.ztree(255)

For i=0 To datalen-1

 tree(memc(i))\char=memc(i)
 tree(memc(i))\number+1
 tree(memc(i))\ischar=1

Next

SortStructuredArray(tree(),#PB_Sort_Descending,OffsetOf(ztree\number),#PB_Sort_Character)

For i=0 To 255

 If tree(i)\number=0
   ReDim tree(i-1)
   Break
 EndIf

Next

dimsize=ArraySize(tree()) Repeat

 min1.l=0
 min2.l=0
 For i=0 To dimsize
   If tree(i)\linked=0
     If tree(i)\number<min1 Or min1=0
       min1=tree(i)\number
       hmin1=i
     ElseIf tree(i)\number<min2 Or min2=0
       min2=tree(i)\number
       hmin2=i
     EndIf
   EndIf
 Next
 
 If min1=0 Or min2=0
   Break
 EndIf
 
 dimsize+1
 ReDim tree(dimsize)
 tree(dimsize)\number=tree(hmin1)\number+tree(hmin2)\number
 tree(hmin1)\left=dimsize
 tree(hmin2)\right=dimsize
 tree(hmin1)\linked=1
 tree(hmin2)\linked=1

ForEver

i=0 While tree(i)\ischar=1

 str.s=""
 k=i
 ZNEXT:
 If tree(k)\left<>0
   str="0"+str
   k=tree(k)\left
   Goto ZNEXT
 ElseIf tree(k)\right<>0
   str="1"+str
   k=tree(k)\right
   Goto ZNEXT
 EndIf
 PrintN(Chr(tree(i)\char)+" "+str)
 i+1

Wend Input()

CloseConsole()</lang>

output:

  110
n 000
e 1010
f 1001
a 1011
i 1110
h 0010
s 11111
o 0011
m 0100
x 01010
u 01011
l 01100
r 01101
c 01110
g 01111
p 10000
t 10001
d 11110

Python

A slight modification of the method outlined in the task description allows the code to be accumulated as the heap is manipulated.

The output is sorted first on length of the code, then on the symbols.

<lang python>from heapq import heappush, heappop, heapify from collections import defaultdict

def encode(symb2freq):

   """Huffman encode the given dict mapping symbols to weights"""
   heap = [[wt, [sym, ""]] for sym, wt in symb2freq.items()]
   heapify(heap)
   while len(heap) > 1:
       lo = heappop(heap)
       hi = heappop(heap)
       for pair in lo[1:]:
           pair[1] = '0' + pair[1]
       for pair in hi[1:]:
           pair[1] = '1' + pair[1]
       heappush(heap, [lo[0] + hi[0]] + lo[1:] + hi[1:])
   return sorted(heappop(heap)[1:], key=lambda p: (len(p[-1]), p))

txt = "this is an example for huffman encoding" symb2freq = defaultdict(int) for ch in txt:

   symb2freq[ch] += 1
  1. in Python 3.1+:
  2. symb2freq = collections.Counter(txt)

huff = encode(symb2freq) print "Symbol\tWeight\tHuffman Code" for p in huff:

   print "%s\t%s\t%s" % (p[0], symb2freq[p[0]], p[1])</lang>

Example output:

Symbol  Weight  Huffman Code
    6   101
n   4   010
a   3   1001
e   3   1100
f   3   1101
h   2   0001
i   3   1110
m   2   0010
o   2   0011
s   2   0111
g   1   00000
l   1   00001
p   1   01100
r   1   01101
t   1   10000
u   1   10001
x   1   11110
c   1   111110
d   1   111111

An extension to the method outlined above is given here.

Ruby

Uses a

Library: RubyGems

package PriorityQueue

<lang ruby>require 'priority_queue'

def huffman_encoding(str)

 char_count = Hash.new(0)
 str.each_char {|c| char_count[c] += 1}
 
 pq = CPriorityQueue.new
 # chars with fewest count have highest priority
 char_count.each {|char, count| pq.push(char, count)}
 
 while pq.length > 1
   key1, prio1 = pq.delete_min
   key2, prio2 = pq.delete_min
   pq.push([key1, key2], prio1 + prio2)
 end
 
 Hash[*generate_encoding(pq.min_key)]

end

def generate_encoding(ary, prefix="")

 case ary
 when Array
   generate_encoding(ary[0], "#{prefix}0") + generate_encoding(ary[1], "#{prefix}1")
 else
   [ary, prefix]
 end

end

def encode(str, encoding)

 str.each_char.collect {|char| encoding[char]}.join

end

def decode(encoded, encoding)

 rev_enc = encoding.invert
 decoded = ""
 pos = 0
 while pos < encoded.length
   key = ""
   while rev_enc[key].nil?
     key << encoded[pos]
     pos += 1
   end
   decoded << rev_enc[key]
 end
 decoded

end

str = "this is an example for huffman encoding" encoding = huffman_encoding(str) encoding.to_a.sort.each {|x| p x}

enc = encode(str, encoding) dec = decode(enc, encoding) puts "success!" if str == dec</lang>

[" ", "111"]
["a", "1011"]
["c", "00001"]
["d", "00000"]
["e", "1101"]
["f", "1100"]
["g", "00100"]
["h", "1000"]
["i", "1001"]
["l", "01110"]
["m", "10101"]
["n", "010"]
["o", "0001"]
["p", "00101"]
["r", "00111"]
["s", "0110"]
["t", "00110"]
["u", "01111"]
["x", "10100"]
success!

Scala

Works with: scala version 2.8

<lang scala>object Huffman {

 import scala.collection.mutable.{Map, PriorityQueue}
 
 sealed abstract class Tree
 case class Node(left: Tree, right: Tree) extends Tree
 case class Leaf(c: Char) extends Tree
 
 def treeOrdering(m: Map[Tree, Int]) = new Ordering[Tree] { 
    def compare(x: Tree, y: Tree) = m(y).compare(m(x))
 }
 def stringMap(text: String) = text groupBy (x => Leaf(x) : Tree) mapValues (_.length)
 
 def buildNode(queue: PriorityQueue[Tree], map: Map[Tree,Int]) {
   val right = queue.dequeue
   val left = queue.dequeue
   val node = Node(left, right)
   map(node) = map(left) + map(right)
   queue.enqueue(node)
 }
 def codify(tree: Tree, map: Map[Tree, Int]) = {
   def recurse(tree: Tree, prefix: String): List[(Char, (Int, String))] = tree match {
     case Node(left, right) => recurse(left, prefix+"0") ::: recurse(right, prefix+"1")
     case leaf @ Leaf(c) => c -> ((map(leaf), prefix)) :: Nil
   }
   recurse(tree, "")
 }
 def encode(text: String) = {
   val map = Map.empty[Tree,Int] ++= stringMap(text)
   val queue = new PriorityQueue[Tree]()(treeOrdering(map)) ++= map.keysIterator
   
   while(queue.size > 1) {
     buildNode(queue, map)
   }
   codify(queue.dequeue, map)
 }
 
 
 def main(args: Array[String]) {
   val text = "this is an example for huffman encoding"
   val code = encode(text)
   println("Char\tWeight\t\tEncoding")
   code sortBy (_._2._1) foreach { 
     case (c, (weight, encoding)) => println("%c:\t%3d/%-3d\t\t%s" format (c, weight, text.length, encoding)) 
   }
 }

}</lang>

Output:

Char    Weight          Encoding
t:        1/39          011000
p:        1/39          011001
r:        1/39          01101
c:        1/39          01110
x:        1/39          01111
g:        1/39          10110
l:        1/39          10111
u:        1/39          11000
d:        1/39          11001
o:        2/39          1010
s:        2/39          1101
m:        2/39          1110
h:        2/39          1111
f:        3/39          0000
a:        3/39          0001
e:        3/39          0010
i:        3/39          0011
n:        4/39          100
 :        6/39          010

SETL

<lang SETL>var forest := {}, encTab := {};

plaintext := 'this is an example for huffman encoding';

ft := {}; (for c in plaintext)

 ft(c) +:= 1;

end;

forest := {[f, c]: [c, f] in ft}; (while 1 < #forest)

 [f1, n1] := getLFN();
 [f2, n2] := getLFN();
 forest with:= [f1+f2, [n1,n2]];

end; addToTable(, arb range forest);

(for e = encTab(c))

 print(c, ft(c), e);

end;

print(+/ [encTab(c): c in plaintext]);

proc addToTable(prefix, node);

 if is_tuple node then
   addToTable(prefix + '0', node(1));
   addToTable(prefix + '1', node(2));
 else
   encTab(node) := prefix;
 end;

end proc;

proc getLFN();

 f := min/ domain forest;
 n := arb forest{f};
 forest less:= [f, n];
 return [f, n];

end proc;</lang>

Standard ML

Works with: SML/NJ

<lang sml>datatype 'a huffman_tree =

        Leaf of 'a
      | Node of 'a huffman_tree * 'a huffman_tree

structure HuffmanPriority = struct

 type priority = int

(* reverse comparison to achieve min-heap *)

 fun compare (a, b) = Int.compare (b, a)
 type item = int * char huffman_tree
 val priority : item -> int = #1

end

structure HPQueue = LeftPriorityQFn (HuffmanPriority)

fun buildTree charFreqs = let

   fun aux trees = let
       val ((f1,a), trees) = HPQueue.remove trees
   in
       if HPQueue.isEmpty trees then
           a
       else let
               val ((f2,b), trees) = HPQueue.remove trees
               val trees = HPQueue.insert ((f1 + f2, Node (a, b)),
                                           trees)
           in
               aux trees
           end
   end
   val trees = HPQueue.fromList (map (fn (c,f) => (f, Leaf c)) charFreqs)

in

   aux trees

end

fun printCodes (revPrefix, Leaf c) =

   print (String.str c ^ "\t" ^
          implode (rev revPrefix) ^ "\n")
 | printCodes (revPrefix, Node (l, r)) = (
   printCodes (#"0"::revPrefix, l);
   printCodes (#"1"::revPrefix, r)
   );

let

   val test = "this is an example for huffman encoding"
   val charFreqs = HashTable.mkTable
                       (HashString.hashString o String.str, op=)
                       (42, Empty)
   val () =
       app (fn c =>
               let val old = getOpt (HashTable.find charFreqs c, 0)
               in HashTable.insert charFreqs (c, old+1)
               end)
           (explode test)
   val tree = buildTree (HashTable.listItemsi charFreqs)

in

   print "SYMBOL\tHUFFMAN CODE\n";
   printCodes ([], tree)

end</lang>

Tcl

Library: Tcllib (Package: struct::prioqueue)

<lang tcl>package require Tcl 8.5 package require struct::prioqueue

proc huffmanEncode {str args} {

   array set opts [concat -dump false $args]
   
   set charcount [dict create]
   foreach char [split $str ""] {
       dict incr charcount $char
   }
   
   set pq [struct::prioqueue -dictionary] ;# want lower values to have higher priority
   dict for {char count} $charcount {
       $pq put $char $count
   }
   
   while {[$pq size] > 1} {
       lassign [$pq peekpriority 2] p1 p2
       $pq put [$pq get 2] [expr {$p1 + $p2}]
   }
   
   set encoding [walkTree [$pq get]]
   set map [dict create {*}[lreverse $encoding]]
   
   if {$opts(-dump)} {
       foreach key [lsort -command compare [dict keys $map]] {
           set char [dict get $map $key]
           puts "$char\t[dict get $charcount $char]\t$key"
       }
   }
   
   return $encoding

}

proc walkTree {tree {prefix ""}} {

   if {[llength $tree] < 2} {
       return [list $tree $prefix]
   }
   lassign $tree left right
   return [concat [walkTree $left "${prefix}0"] [walkTree $right "${prefix}1"]]

}

proc compare {a b} {

   if {[string length $a] < [string length $b]} {return -1}
   if {[string length $a] > [string length $b]} {return  1}
   return [string compare $a $b]

}

set str "this is an example for huffman encoding"

set encoding [huffmanEncode $str -dump true]

puts $str puts [string map $encoding $str]</lang> outputs

n	4	000
 	6	101
s	2	0010
m	2	0011
o	2	0100
i	3	1001
a	3	1100
e	3	1101
f	3	1110
t	1	01010
x	1	01011
p	1	01100
l	1	01101
r	1	01110
u	1	01111
c	1	10000
d	1	10001
g	1	11110
h	2	11111
this is an example for huffman encoding
0101011111100100101011001001010111000001011101010111100001101100011011101101111001000111010111111011111110111000111100000101110100010000010010001100100011110

Ursala

following the algorithm given above <lang Ursala>#import std

  1. import nat
  2. import flo

code_table = # takes a training dataset to a table <char: code...>

-+

  *^ ~&v?\~&iNC @v ~&t?\~&h ~&plrDSLrnPlrmPCAS/'01',
  ~&itB->h fleq-<&d; ^C\~&tt @hthPX ^V\~&lrNCC plus@bd,
  ^V(div@rrPlX,~&rlNVNC)^*D(plus:-0.@rS,~&)+ *K2 ^/~&h float+ length+-
  1. cast %csAL

table = code_table 'this is an example for huffman encoding'</lang> a quick walk through the code starting from the bottom:

  • *K2 ^/~&h float+ length compute character frequencies by partitioning the input list of characters by equality, and transforming each equivalence class to a pair containing its member and its cardinality represented as a floating point number
  • ^V(div@rrPlX,~&rlNVNC)^*D(plus:-0.@rS,~&) construct a list of unary trees, one for each character class, with its normalized frequency in the root, and the character in the leaf
  • ~&itB->h while the list contains more than one tree, do the following, and when done take the head of the list
  • fleq-<&d; sort the trees in increasing order by their roots
  • ^C\~&tt @hthPX ^V\~&lrNCC plus@bd change the first two trees in the sorted list to a single binary tree whose root is the sum of their roots
  • *^ visit the following function on each node of the tree obtained from the loop and propagate the results upward from the leaves
  • ~&v?\~&iNC if the node is a leaf, construct a singleton list containing the pair of its root (a character) and the empty string (of bits)
  • @v ~&t?\~&h if there is only a single subtree, propagate the result already obtained for it
  • ~&plrDSLrnPlrmPCAS/'01' otherwise there are two subtrees, hence two lists previously computed results propagating upward, so insert a zero into all of the bit strings in the results on the left, and a one into all the ones on the right, concatenate the left and right results, and propagate the contatenation upward

output:

<
   `r: '00000',
   `l: '00001',
   `c: '00010',
   `u: '00011',
   `n: '001',
   `m: '0100',
   `h: '0101',
   `g: '01100',
   `d: '01101',
   `o: '0111',
   `s: '1000',
   `t: '10010',
   `p: '100110',
   `x: '100111',
   `a: '1010',
   `f: '1011',
   `i: '1100',
   `e: '1101',
   ` : '111'>