AVL Tree/Discussion: Difference between revisions

no edit summary
(Created page with "= AVL Trees in C# = Despite it being more than 50 years since their invention, confusion reigns in the realms of programming them. A clear and definitive algorithm is require...")
 
No edit summary
Line 9:
The simplest of data structures that support the set membership operator is an unbalanced tree. Set elements are stored in sorted ordered. This is achieved by a set node which is shown below.
 
<syntaxhighlight lang="C#" csharp>
class SetNode<T> where T : IComparable<T>
{
Line 23:
}
}
</lang>
</syntaxhighlight>
 
Modern programming languages such as C# and C++ have facilities for expressing set theory. C# has generics and C++ has templates. This allows for the concept of a set of type T (where T is the element type expressed as a parameter). The generic notation for a set of T in C# is Set<T>.
Line 29:
Above is a node for a set of T, where T is the generic class parameter. Note that the generic parameter T inherits from IComparable of T (is constrained in other words). IComparable of T is shown below.
 
<syntaxhighlight lang="C#" csharp>
public interface IComparable<T>
{
int CompareTo(T value);
}
</lang>
</syntaxhighlight>
 
This is an interface class containing a single method. Note that interface classes contain no code - they merely define the form of virtual functions. Virtual functions are functions that are called through pointers and hence it is determined at runtime which version of the function is called.
Line 52:
A set node contains a reference to an object of type T (which is comparable). It also contains a left and right reference to child nodes. The first program uses this definition to place the nodes into a data structure that orders the elements. From a given node, smaller nodes are placed on the left, and greater nodes are placed on the right. A program that does this is shown below.
 
<syntaxhighlight lang="C#" csharp>
// Set1 - Binary Search Tree - Constraints
// Copyright Benedict McNamara 2006 - 2014, All Rights Reserved
Line 174:
}
}
</lang>
</syntaxhighlight>
 
The method Add moves left and right in the tree to place a new entry in the tree. New entries are placed in the tree at a leaf of the tree. Note that the method CompareTo of the IComparable of T interface is used to determine the order of new entries. An infinite loop is used to navigate the tree, with break statements used to break out of the loop when required. The Add method is iterative and this will remain the case throughout what follows.
Line 186:
By ensuring that each node in a set contains a reference to its parent, it becomes possible to iterate on the set. Thus, the node for a set will be upgraded to the following.
 
<syntaxhighlight lang="C#" csharp>
public class SetNode<T> where T : IComparable<T>
{
Line 212:
}
}
</lang>
</syntaxhighlight>
 
Two fields have been added to the node, a parent reference and a boolean flag indicating whether the node is a header node. Header nodes will be discussed shortly. The first constructor is used to create a header node and the second constructor is the standard constructor used to create a normal node. Note that the second constructor now assumes that the parent node is passed as a parameter when creating the node.
Line 218:
When attempting to progress from one key to the next in a binary tree, the problem is simple if the current node has a non-null right child reference. One merely goes right, then all the way left up to when the next left node is null. This is because the first node on the right is greater, and continuing left from there will eventually lead to the smallest node greater than the current node. This is half the algorithm required for iteration, but the real problem comes if the right node is null (the other half). The solution lies in having a parent reference. When the right node is null, one progresses up the parent chain until a node greater than the existing node is found. The logic is not found in the set class, rather it is found in the separate enumerator class. The logic is shown below, and should be carefully examined.
 
<syntaxhighlight lang="C#" csharp>
public struct SetEntry<T> : IEnumerator<T> where T : IComparable<T>
{
Line 280:
public SetNode<T> Node;
}
</lang>
</syntaxhighlight>
 
The class SetEntry is an enumerator for the set and it is a separate class. It holds only a reference to a node in the field Node. The method MoveNext implements iteration on the set. At the top of a set is the header node. The header node points left to the beginning of the set and points right to the end of the set. The header node contains no entry, it merely acts as a sentinel in the set. The parent of the header node is the root node and the parent of the root node is the header node. The root node is at the top of the set of entries that actually hold data. In the above algorithm for iteration, when the header node is reached, iteration has proceeded beyond the end of the set (hence false is returned). MoveNext() implements the previously discussed logic for iteration. It first examines child references and failing that examines parent references to obtain the next entry in the set.
Line 286:
Set must now be implemented such that it supports parent references and a header node. The following code does this (Project Set2).
 
<syntaxhighlight lang="C#" csharp>
// Set2 - Binary Search Set
// - Implements: Parents, Iterators
Line 505:
}
}
</lang>
</syntaxhighlight>
 
The method Add has been modified to support parent references. The set membership indexer is unchanged.
Line 529:
The source code for a node swap is shown below (Project Set3). This code was ported from C++ into C#. The code was developed in C++ entirely independently. To correctly master binary trees (in particular deletion) node swapping is required.
 
<syntaxhighlight lang="C#" csharp>
static void SwapNodeReference(ref SetNode<T> First,
ref SetNode<T> Second)
Line 668:
}
}
</lang>
</syntaxhighlight>
 
There are five cases, depending upon parent child relationships between the nodes being swapped. If no parent-child relationship exists (ie. when the nodes are in different parts of the set), the last case is executed. This is the general case and it should be studied before other special cases.
Line 674:
Once swapping has been mastered, deletion then is as follows.
 
<syntaxhighlight lang="C#" csharp>
public void Remove(T Key)
{
Line 750:
}
}
</lang>
</syntaxhighlight>
 
Swapping is performed when required. Near the line containing Nodes--, balancing routines can be plugged into the set. They also need to be plugged into the add routine. This will be done in the next section where the set will be balanced. Note that only a couple of lines of the project Set3 change to achieve the balancing.
Line 756:
With project Set3, a different approach has been taken with comparability. No longer are generic constraints used. Instead, the following field has been included in the set class.
 
<syntaxhighlight lang="C#" csharp>
IComparer<T> Comparer;
</lang>
</syntaxhighlight>
 
The interface IComparer is documented in the .Net documentation. Roughly, an IComparer looks like the following interface class.
 
<syntaxhighlight lang="C#" csharp>
public interface IComparer<T>
{
int Compare(T t1, T t2);
}
</lang>
</syntaxhighlight>
 
There is another interface called IComparable, which is shown below.
 
<syntaxhighlight lang="C#" csharp>
public interface IComparable<T>
{
int CompareTo(T value);
}
</lang>
</syntaxhighlight>
 
Sets work off the IComparer interface. There are two constructors for the set class, as shown below.
 
<syntaxhighlight lang="C#" csharp>
class Set<T> : IEnumerable<T>
{
Line 803:
...
}
</lang>
</syntaxhighlight>
 
The first constructor (the default constructor) generates the IComparer for the set. The way this is done is through another class called Comparer (roughly shown below).
 
<syntaxhighlight lang="C#" csharp>
[Serializable]
public abstract class Comparer<T> : IComparer<T>
Line 815:
public abstract int Compare(T t1,T t2);
}
</lang>
</syntaxhighlight>
 
Notice that the property Default is static. Although the class itself (i.e. Comparer) is abstract (hence cannot be instantiated), the static property Default can be used without instantiating the class. What the property Default does is detect the IComparable interface in the key class T and use it to generate an IComparer. In summary, the class T derives from IComparable<T>, and the default constructor of the set class uses the CompareTo method of IComparable to generate the Compare method of IComparer. Hence we end up with a comparer for the key class. Alternatively, the Comparer may be passed in directly (via the other constructor). This accounts for the two constructors shown above. A production standard set class will use these techniques rather than use generic constraints on the key class.
Line 823:
Here is the full source code to Set3.
 
<syntaxhighlight lang="C#" csharp>
// Set3 - Binary Search Tree
// - Using IComparer Generic Interface
Line 1,224:
return string_out;
}
 
}
 
Line 1,421 ⟶ 1,420:
}
}
</lang>
</syntaxhighlight>
 
= Balancing Sets =
Line 1,436 ⟶ 1,435:
There is also a right rotation. Once left and right rotations have been discovered, it is possible to use them to maintain balancing when adding and removing entries from a tree. The type of tree to be considered is constrained in that left and right subtrees of any node are restricted to being at most 1 different in height (both of the trees above satisfy this condition). This type of tree is almost balanced, and is referred to as an AVL tree in honour of the Russian mathematicians mentioned above. It is possible (using rotations) to always satisfy the AVL condition when adding and removing entries from the tree.
 
==Definition of an AVL SetSets==
 
An AVL set is a binary tree in which the heights of the left and right subtrees of the root differ by at most 1 and in which the left and right subtrees are again AVL sets.
Line 1,448 ⟶ 1,447:
The above definition leads to the following base class for node and class for a set node.
 
<syntaxhighlight lang="C#" csharp>
public enum Direction { FromLeft, FromRight };
 
Line 1,491 ⟶ 1,490:
}
}
</lang>
</syntaxhighlight>
 
There is a non-generic base class Node. This facilitates non-generic balancing. The header node is an instance of the class Node, whereas all other nodes are instances of the class SetNode<T>. Apart from the references to the left and right subtrees and parent, a node contains a balance factor. The balance factor is one of the values from the above enumeration State. With just the addition of the member (Balance), sets can be balanced (which is quite efficient in terms of node storage). The generic class SetNode<T> derives from Node.
Line 1,497 ⟶ 1,496:
The routine that balances a set after an insertion is shown below.
 
<syntaxhighlight lang="C#" csharp>
public static void BalanceSet(Node Root, Direction From)
{
Line 1,570 ⟶ 1,569:
}
}
</lang>
</syntaxhighlight>
 
If the direction is from the left, left set balancing is instigated. If the direction is from the right, right set balancing is performed. This algorithm progresses up the parent chain of the set until the set is balanced. Certain cases can be taken care of immediately. For example, when inserting into the right subtree (FromRight), if the current node is left high, the balance factor is immediately set to State.Balanced and the job is done. Yet other cases require the set to be restructured. Ancilliary procedures BalanceLeft and BalanceRight have been created for this job. They restructure the set as required. The procedures BalanceLeft and BalanceRight are shown below.
 
<syntaxhighlight lang="C#" csharp>
static void BalanceLeft(ref Node Root)
{
Line 1,668 ⟶ 1,667:
}
}
</lang>
</syntaxhighlight>
 
Reference rotations are performed based upon the cases of the insertion. The procedures BalanceLeft and BalanceRight are symmetric under left/right reflection.
Line 1,696 ⟶ 1,695:
== The Final AVL Tree Code ==
 
<syntaxhighlight lang="C#" csharp>
// Set4 - Finite Ordered Sets - 4State - Balanced
 
Line 2,990 ⟶ 2,989:
}
}
</lang>
</syntaxhighlight>