Tuesday 12 July 2022

Tree Data Structures

In this post I will explain an important data structure, "The Tree". Trees are versatile data structures that are used in many algorithms, from abstract syntax trees, to the HTML DOM your Web browser uses.

Top

A Basic Tree Data Structure

Diagram 1 illustrates that a tree is simply a collection of nodes. How the nodes are layed out and, the information a given node contains, is specific to the problem being solved.

Some basic observations about a tree data structure are...

  • A tree always contains a single, top-level node, known as the root.
  • Any given tree node can have zero or more child nodes. (E.g. The root node in diagram 1, contains three child nodes)
  • Any given node has a level. In diagram 1, the root is defined as being at level 1.

Top

A Tree Structure In Code

Given that we know a node may contain zero or more children. How can we represent this in code? One way might be to have each node contain a collection of child nodes. The collection might be an array or list of nodes. A list, of course allows us to add more child nodes without having to resize the collection as with an array. Under the hood, a list will resize itself as nodes are added.

The list approach is commonly used, however, having a list instance for each node can impact the memory footprint.

As always with software, there are tradeoffs and one needs to consider the problem domain and problem size to determine the best fit.

Diagram 2 illustrates how to implement a tree structure, where child nodes are stored within a list. As is often the case in software, adding a level of indirection is key. Here, the indirection is a list that stores child nodes for any given node. As already noted, this approach can prove to be expensive in terms of memory usage. Each node must contain an instance of the list class, even if the node contains no child nodes.

Top

Self-Describing Tree Structure

Most of the time, using data structures provided by your development platform are fine. Other times, however, one needs to rethink, go back to basics and develop a more streamlined solution. Developing custom data structures to fit a niche is not a new idea.

Diagram 3 illustrates a tree structure that is fully described using a node structure. That is, no need for a separate list data structure to store child nodes.

Diagram 3 also illustrates the following...

  • Implements a doubly-linked list, allowing one to forward or reverse traverse sibling nodes.
  • Each node has a pointer to its parent node.
  • A node containing children has a pointer to its first child node and to its last child node. This caters for efficient insertion of removal of nodes.

This particular structure is optimal in the sense that only the node data required is stored. A node does not require a list or array to store child nodes. This does complicate the implementation somewhat, but, I think is worthwhile to ensure minimal memory usage.

Analysis allows us to determine that a node basically consists of a previous and next sibling. This essentially describes a doubly-linked list. The first and last child pointers describe how a node collection might describe a collection of nodes.

Top

Self-Describing Tree Source Code

Analysis of the self-describing tree structure leads to two separate records. The first describes a simple node, that contains pointers to previous and next siblings. The second, describes a node that contains the first and last child nodes, essentially a collection of nodes. The following C# code illustrates how these ideas might be implemented. Records, in C# are best represented using classes or structs.

The following is the code for a Node (C#). Note, all members could just be declared as public fields. Using private members and separate accessor functions allows us to add additional functionality. This might include raising events when the tree structure is modified.


public class Node
{
  private NodeContainer _parent;
  private Node _previousSibling;
  private Node _nextSibling;

  public Node Parent =>
    _parent;

  public Node PreviousSibling =>
    _previousSibling;

  public Node NextSibling =>
    _nextSibling;

  public void SetParent(NodeContainer container)
  {
    _parent = container;
  }

  public void SetPreviousSibling(Node prev)
  {
    _previousSibling = prev;
  }

  public void SetNextSibling(Node next)
  {
    _nextSibling = next;
  }
}
Top

The next code snippet describes a node container. As one might expect, a node container, contains zero or more nodes. Code is included to append a node to an existing node. The code also includes an All method. This method returns all nodes, using a depth-first tree search.


public class NodeContainer : Node
{
  private Node _firstChild;
  private Node _lastChild;

  public IEnumerable<Node> All()
  {
    Stack<Node> stack = new Stack<Node>();
    stack.Push(this);

    while (stack.Count > 0)
    {
      Node cur = stack.Pop();
      yield return cur;

      if (cur is NodeContainer container)
      {
        Node curSib = container._lastChild;

        while (null != curSib)
        {
          stack.Push(curSib);
          curSib = curSib.PreviousSibling;
        }
      }       
    }
  }

  public void Append(Node node)
  {
    node.SetParent(this);

    if (null == _firstChild)
    {
      _firstChild = node;
      _lastChild = node;
      node.SetNextSibling(null);
      node.SetPreviousSibling(null);
    }
    else
    {
      node.SetPreviousSibling(_lastChild);
      node.SetNextSibling(null);
      _lastChild.SetNextSibling(node);
      _lastChild = node;
    }
  }
}
Top

Summary

This post only scratches the surface when discussing tree data structures.

Tree, in software are found in many places. Examples include...
  • Abstract Syntax Trees - the result of parsing program text.
  • Expression trees - the result of parsing an expression.
  • HTML DOM - as used by a web browser.