Threaded Binary Trees

CS321 Lecture: Threaded Binary Trees                    last revised 10/31/97

Objectives:

1. To introduce inorder-threading of binary trees, and to show how inthreaded
   trees can be traversed easily.
2. To discuss threading schemes based on preorder or postorder traversal
   instead of inorder.

Materials: Transparency + Handout of a completely in-threaded binary tree

I. Inorder Threading of Binary Trees
-  ------- --------- -- ------ -----

   A. When we talked about preorder, inorder, and postorder traversal of binary
      trees in CS122, we saw that a stack is always needed to accomplish the 
      traversal - either explicitly or implicitly because of recursion.  

      1. Example: If we had a Node class with data members _info, _lchild, and
                  _rchild, we might implement a recursive inorder traversal 
                  function as follows:

                void inorder(Node * p)
                  { if (p != NULL)
                      { inorder(p -> _lchild);
                        // Do whatever visiting the _info of this node means
                        inorder(p -> _rchild);
                      }
                  }

         This could be done non-recursively as follows:

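                // Assumes #include <stack> and using std::stack
                void inorder(Node * p)
                  { stack < Node * > s;
                    do
                      {
                        while (p != NULL)
                          { s.push(p);
                            p = p -> _lchild;
                          }
                        if (! s.empty())
                          { p = s.top();      // std::stack::pop returns nothing,
                            s.pop();          // so read the top before popping
                            // Do whatever it means to visit p's _info
                            p = p -> _rchild;
                          }
                      }
                    while (p != NULL || ! s.empty());
                  }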
                    
         The non-recursive version uses a stack whose maximum size is equal
         to the height of the tree; the recursive version uses an implicit
         stack of the same depth.

      2. Since these traversals are used often, we would like to avoid the 
         space and time overhead of the stack if we can. It turns out that there
         is a simple tree representation that allows us to do this, while also
         allowing us to define an iterator for the tree that allows us to
         easily move from one node to the next in the appropriate order.

   B. Consider inorder traversal.  We will define the inorder successor of
      a node n as being the next node that will be visited in doing an
      inorder traversal of the tree - or some sentinel value (e.g. NULL or
      a pointer back to the header node if there is one) if the node is the
      last one visited in inorder.  

      1. Suppose we were able to define a function

                Node * insucc(Node * p);
                /* returns a pointer to the inorder successor of p */

         Suppose, further, that we arranged for there to be a header node for
         the tree, whose insucc is the first node in the inorder traversal 
         order.  If so, we could implement inorder traversal of a tree as 
         follows, without the use of a stack or recursion:

                p = insucc(header);
                while (p != sentinel)
                  { // Do whatever it means to visit the _info at this node
                    p = insucc(p);
                  }

      2. If a node has a non-NULL right child, insucc is easy to define:

                Node * c = p -> _rchild;
                while (c -> _lchild != NULL)
                    c = c -> _lchild;
                return c;

      3. However, if a node has a NULL right child, then its inorder successor
         is "above" it in the tree.  This is what the stack does for us - note
         that, in the non-recursive inorder traversal algorithm, when 
         p -> _rchild is NULL then we fall through the while loop and pop the 
         stack again, getting the nearest ancestor of p that has not yet been
         visited (p's parent, if p is a left child).  If that node's right 
         child is also NULL we end up popping the stack again, going further 
         up the tree.

      4. We have also noted that a binary tree with n nodes contains n+1 NULL
         pointers.  It would be nice to do something useful with these.  One
         thing we could do with a NULL rchild pointer is to use it to point to 
         the inorder successor of the node.  We will call such a pointer a 
         thread, and a tree containing such pointers a right-inthreaded binary
         tree.

      5. Of course, we must have some way of tagging the pointers to distinguish
         between a regular child pointer and a thread.  Since this requires just
         one bit, it can generally be done at no additional cost by using a bit
         somewhere that is otherwise unused.

         a. For example, on some machines a pointer must be even, since words 
            in memory begin on even address boundaries.  Therefore, the 
            low-order bit of a pointer must be zero.  We can differentiate 
            threads from regular pointers by setting this bit to 1.

         b. On most machines, the number of bits used to store an address far
            exceeds the number needed to represent the range of addresses
            needed for the physical memory installed; hence, the high order
            bit is normally 0.  We can differentiate threads from regular
            pointers by setting this bit to 1.
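
         The low-order-bit approach in (a) might be sketched as follows.  This
         is only an illustration: it assumes Node objects are aligned on at
         least a 2-byte boundary, and that converting a pointer to
         std::uintptr_t and back preserves it (true on common platforms, but
         implementation-defined).  isthread and makepointer are the helper
         names used later in these notes; makethread is a hypothetical name
         added here.

```cpp
#include <cassert>
#include <cstdint>

struct Node {
    int   _info;
    Node* _lchild;
    Node* _rchild;
};

// Assumption: every Node address is even, so its low-order bit is free.
static_assert(alignof(Node) >= 2, "low-order bit must be free for the tag");

// Mark a pointer as a thread by setting its low-order bit (hypothetical helper).
inline Node* makethread(Node* p) {
    return reinterpret_cast<Node*>(reinterpret_cast<std::uintptr_t>(p) | 1u);
}

// True if the low-order tag bit is set, i.e. the pointer is a thread.
inline bool isthread(const Node* p) {
    return (reinterpret_cast<std::uintptr_t>(p) & 1u) != 0;
}

// Clear the tag bit so the thread can be dereferenced like a normal pointer.
inline Node* makepointer(Node* p) {
    const std::uintptr_t mask = ~static_cast<std::uintptr_t>(1);
    return reinterpret_cast<Node*>(reinterpret_cast<std::uintptr_t>(p) & mask);
}
```

         A tagged pointer must never be dereferenced directly; it has to go
         through makepointer first.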

   C. We can now implement insucc - and hence inorder traversal - as follows:

        Node * insucc(Node * p)
        /* returns a pointer to the inorder successor of p */
          {
            if (isthread(p -> _rchild))
                return makepointer(p -> _rchild);
            else
              { Node * c = p -> _rchild;
                while (c -> _lchild != NULL)
                    c = c -> _lchild;
                return c;
              }
          }

        - where isthread tests the extra bit of a pointer to see if it is a
          regular pointer or a thread, and makepointer clears the extra bit
          so that the thread can be used like a pointer.
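
        To see insucc drive a complete traversal, here is a small
        self-contained sketch of a right-inthreaded tree.  It makes two
        simplifying assumptions that are not in the notes: a bool member
        (_rthread) stands in for the tag bit, so isthread and makepointer are
        not needed, and the last node's thread is NULL because no header node
        is used.

```cpp
#include <cassert>
#include <vector>

struct Node {
    int   _info;
    Node* _lchild;
    Node* _rchild;
    bool  _rthread;   // assumption: bool flag instead of a tag bit
    explicit Node(int v)
        : _info(v), _lchild(nullptr), _rchild(nullptr), _rthread(true) {}
};

// Inorder successor in a right-inthreaded tree (nullptr after the last node).
Node* insucc(Node* p) {
    if (p->_rthread)
        return p->_rchild;            // a thread leads straight to the successor
    Node* c = p->_rchild;             // otherwise: leftmost node of right subtree
    while (c->_lchild != nullptr)
        c = c->_lchild;
    return c;
}

// Visit every node in inorder, without a stack or recursion.
std::vector<int> inorder(Node* root) {
    std::vector<int> out;
    if (root == nullptr) return out;
    Node* p = root;
    while (p->_lchild != nullptr) p = p->_lchild;   // first node in inorder
    for (; p != nullptr; p = insucc(p))
        out.push_back(p->_info);
    return out;
}

// Hand-built tree:      4
//                     /   \
//                    2     6
//                   / \   /
//                  1   3 5
std::vector<int> demo() {
    Node n1(1), n2(2), n3(3), n4(4), n5(5), n6(6);
    n4._lchild = &n2; n4._rchild = &n6; n4._rthread = false;
    n2._lchild = &n1; n2._rchild = &n3; n2._rthread = false;
    n6._lchild = &n5;                 // n6 keeps its null thread: it is last
    n1._rchild = &n2;                 // threads to the inorder successors
    n3._rchild = &n4;
    n5._rchild = &n6;
    return inorder(&n4);
}
```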

      1. What is the time efficiency of this algorithm?  Clearly, any one
         application of insucc can require time proportional to the height
         of the tree.  But what is of more interest is the average cost of
         applying insucc n times in order to visit all the nodes of the
         tree.  We call the total cost of a sequence of operations, divided
         by the number of operations in the sequence, the AMORTIZED COST.

      2. Note that a tree of n nodes contains n lchild pointers and n rchild
         pointers.  In the process of traversing the tree, insucc follows each
         non-NULL lchild pointer exactly once, and each rchild pointer (normal
         or thread) exactly once.  Therefore, the total time for traversing
         a right-inthreaded tree of n nodes in inorder is O(n)!  This, of
         course, is optimal - since we must visit all n nodes.

      3. From this, it follows that the amortized cost of one use is O(n/n) =
         O(1).
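
         The counting argument can be checked experimentally.  The sketch
         below instruments insucc with a counter and builds right-inthreaded
         trees by binary-search-tree insertion; the counter, the insert
         routine, and countFollows are all assumptions made for this
         experiment (as is the use of a bool flag in place of a tag bit and a
         NULL thread on the last node).  For n nodes the count should never
         exceed 2n.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Node {
    int   _info;
    Node* _lchild;
    Node* _rchild;
    bool  _rthread;   // assumption: bool flag instead of a tag bit
};

static long follows = 0;   // number of pointers followed (experiment only)

Node* insucc(Node* p) {
    ++follows;                                   // p's rchild, normal or thread
    if (p->_rthread) return p->_rchild;
    Node* c = p->_rchild;
    while (c->_lchild != nullptr) { c = c->_lchild; ++follows; }
    return c;
}

// Insert n as a leaf of a right-inthreaded binary search tree.
Node* insert(Node* root, Node* n) {
    n->_lchild = nullptr; n->_rchild = nullptr; n->_rthread = true;
    if (root == nullptr) return n;
    Node* p = root;
    for (;;) {
        if (n->_info < p->_info) {
            if (p->_lchild == nullptr) {         // new left child: thread to p
                n->_rchild = p; p->_lchild = n;
                break;
            }
            p = p->_lchild;
        } else {
            if (p->_rthread) {                   // new right child inherits
                n->_rchild = p->_rchild;         // the parent's old thread
                p->_rchild = n; p->_rthread = false;
                break;
            }
            p = p->_rchild;
        }
    }
    return root;
}

// Build a tree from keys, traverse it once, return the pointers followed.
long countFollows(const std::vector<int>& keys) {
    std::vector<Node> pool(keys.size());   // sized once: addresses stay valid
    Node* root = nullptr;
    for (std::size_t i = 0; i < keys.size(); ++i) {
        pool[i]._info = keys[i];
        root = insert(root, &pool[i]);
    }
    follows = 0;
    Node* p = root;
    while (p != nullptr && p->_lchild != nullptr) { p = p->_lchild; ++follows; }
    while (p != nullptr) p = insucc(p);
    return follows;
}
```

         For the six-node tree built from 4, 2, 6, 1, 3, 5 the count is 9:
         three non-NULL lchild pointers plus six rchild pointers.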

   D. Note that this trick only made use of the NULL rchild pointers.  What
      about the NULL lchild pointers?  Suppose we define inpred as the
      inorder predecessor.  By symmetry, it turns out that inpred looks like
      insucc with lchild and rchild pointers interchanged.  Thus, we can
      replace NULL lchild pointers by threads to the inorder predecessor.  If we
      do so, then we can perform inorder traversal in either direction without
      the use of recursion or a stack.

      1. Such a tree is called completely inthreaded.

      2. A tree which contained only left-threads would allow reverse inorder
         traversal only.  Such a tree is called left-inthreaded.

   E. A completely inthreaded binary tree might look like the following.  

        TRANSPARENCY + HANDOUT

      Note that we make use of a header node to simplify some of the algorithms
      to follow.  The header convention is this:

      1. If the tree is empty, then the header's left child is a thread back
         to the header.  Otherwise, it points to the root of the tree.

      2. The header's right child is a pointer (not a thread) to itself.

      3. The first node (in inorder) in the tree has an lchild thread back to
         the header.  (Note that our insertion algorithm will ensure this.)
         Likewise, the last node has an rchild thread back to the header. 
         (Insert will also do this.)

      4. Note how this choice causes our insucc algorithm, when applied to the
         header, to yield the first node of the tree.  Our inpred algorithm
         also works correctly.  Finally, both algorithms return a pointer to
         the header when applied to the first/last node in the tree (as the
         case may be.)
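
      The header convention can be tried out on a hand-built three-node tree.
      The sketch below is an illustration only; it assumes bool members
      (_lthread, _rthread) in place of tag bits, and the Demo struct, forward,
      and backward are hypothetical names.  In a completely inthreaded tree
      the descent loops test the thread flag rather than comparing with NULL.

```cpp
#include <cassert>
#include <vector>

struct Node {
    int   _info;
    Node* _lchild;
    Node* _rchild;
    bool  _lthread;   // assumption: bool flags stand in for the tag bits
    bool  _rthread;
};

Node* insucc(Node* p) {
    if (p->_rthread) return p->_rchild;
    Node* c = p->_rchild;
    while (!c->_lthread) c = c->_lchild;   // leftmost node of right subtree
    return c;
}

Node* inpred(Node* p) {                    // insucc with left/right swapped
    if (p->_lthread) return p->_lchild;
    Node* c = p->_lchild;
    while (!c->_rthread) c = c->_rchild;   // rightmost node of left subtree
    return c;
}

// Header plus the tree   2
//                       / \
//                      1   3
struct Demo {
    Node header, n1, n2, n3;
    Demo() {
        header = Node{0, &n2,     &header, false, false};  // rchild: itself
        n2     = Node{2, &n1,     &n3,     false, false};
        n1     = Node{1, &header, &n2,     true,  true};   // first node
        n3     = Node{3, &n2,     &header, true,  true};   // last node
    }
};

// The header doubles as the sentinel that ends each traversal.
std::vector<int> forward(Node* header) {
    std::vector<int> out;
    for (Node* p = insucc(header); p != header; p = insucc(p))
        out.push_back(p->_info);
    return out;
}

std::vector<int> backward(Node* header) {
    std::vector<int> out;
    for (Node* p = inpred(header); p != header; p = inpred(p))
        out.push_back(p->_info);
    return out;
}
```

      On an empty tree (header's lchild a thread to itself) both insucc and
      inpred return the header at once, so both loops terminate immediately.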

   F. How can we build such a tree?  If we always insert new nodes in place
      of previously NULL pointers, then the following approach will work:

      1. If the new node is the lchild of its parent, then it lies between
         its parent's inpred and its parent in inorder traversal.  Therefore,
         let the lchild of the new node be the original lchild (thread) of
         the parent, and let the rchild of the new node be a thread to its
         parent.  The parent's lchild then becomes a regular pointer to the
         new node.

      2. If the new node is the rchild of its parent, then it lies between
         its parent and its parent's insucc in inorder traversal.  Therefore,
         let the rchild of the new node be the original rchild (thread) of
         the parent, and let the lchild of the new node be a thread to its
         parent.  The parent's rchild then becomes a regular pointer to the
         new node.
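
      The two insertion cases might be coded as follows.  This is a sketch
      under stated assumptions: bool flags replace the tag bits, the names
      initHeader, insertLeft, insertRight, insert, and inorderOf are
      hypothetical, and the insertion point is chosen by binary-search-tree
      ordering, which the notes do not specify.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Node {
    int   _info;
    Node* _lchild;
    Node* _rchild;
    bool  _lthread;   // assumption: bool flags stand in for the tag bits
    bool  _rthread;
};

// Make h the header of an empty tree (header convention, items 1 and 2).
void initHeader(Node* h) {
    h->_info = 0;
    h->_lchild = h; h->_lthread = true;
    h->_rchild = h; h->_rthread = false;
}

// Case 1: n becomes the lchild of parent (parent's lchild must be a thread).
void insertLeft(Node* parent, Node* n) {
    n->_lchild = parent->_lchild; n->_lthread = true;   // to parent's inpred
    n->_rchild = parent;          n->_rthread = true;   // to parent
    parent->_lchild = n;          parent->_lthread = false;
}

// Case 2: n becomes the rchild of parent (parent's rchild must be a thread).
void insertRight(Node* parent, Node* n) {
    n->_rchild = parent->_rchild; n->_rthread = true;   // to parent's insucc
    n->_lchild = parent;          n->_lthread = true;   // to parent
    parent->_rchild = n;          parent->_rthread = false;
}

// Find the insertion point by key order, then apply case 1 or case 2.
void insert(Node* header, Node* n) {
    if (header->_lthread) { insertLeft(header, n); return; }   // empty tree
    Node* p = header->_lchild;
    for (;;) {
        if (n->_info < p->_info) {
            if (p->_lthread) { insertLeft(p, n); return; }
            p = p->_lchild;
        } else {
            if (p->_rthread) { insertRight(p, n); return; }
            p = p->_rchild;
        }
    }
}

Node* insucc(Node* p) {
    if (p->_rthread) return p->_rchild;
    Node* c = p->_rchild;
    while (!c->_lthread) c = c->_lchild;
    return c;
}

// Build a tree from keys and list it in inorder.
std::vector<int> inorderOf(const std::vector<int>& keys) {
    std::vector<Node> pool(keys.size());   // sized once: addresses stay valid
    Node header;
    initHeader(&header);
    for (std::size_t i = 0; i < keys.size(); ++i) {
        pool[i]._info = keys[i];
        insert(&header, &pool[i]);
    }
    std::vector<int> out;
    for (Node* p = insucc(&header); p != &header; p = insucc(p))
        out.push_back(p->_info);
    return out;
}
```

      Inserting the root is just case 1 applied to the header, which is why
      the first and last nodes end up with threads back to the header.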

   G. As a further consideration, note that while threads as we have 
      implemented them are based on inorder traversal, they can assist the
      other traversals as well:

      1. Preorder - define the function presucc.  Note that:

         a. If a node has an lchild, then its lchild is its presucc.
            Ex: node 1 in diagram.

         b. Otherwise, if it has an rchild, then its rchild is its presucc.
            Ex: node 19

         c. If it has no children, then it is the last node to be visited in
            preorder in the left subtree of some node Q.  Let Q be the nearest
            such node having a non-empty right child.  (If all else fails, the
            header qualifies.)  Then Q's rchild is the presucc. 

                Ex: node 17 - Q is 2, presucc is 5
                    node 27 - Q is 1, presucc is 3

         d. From a node P having no actual rchild, this node Q can be found as 
            follows:

            i. Follow P's rchild thread to a node above it.  Clearly, P is in
               the left subtree of this node.  If this node has a non-thread
               rchild, then it is node Q.

           ii. If this node's rchild is a thread, then repeat the process
               as many times as necessary until a node is found having a
               non-thread rchild.  This is node Q.

          iii. Having found this node Q, P's presucc is Q's rchild.

        Time complexity for a complete traversal: note that each non-thread
        lchild is followed exactly once, and that each rchild is followed
        exactly once - therefore, the traversal is O(n), and the amortized cost
        of presucc is O(1).
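
        Rules a through d translate fairly directly into code.  The sketch
        below assumes bool flags in place of tag bits and the header
        convention described earlier; the insert helper and preorderOf are
        hypothetical names used only to build and exercise a test tree (a
        binary search tree, which is an assumption made here).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Node {
    int   _info;
    Node* _lchild;
    Node* _rchild;
    bool  _lthread;   // assumption: bool flags stand in for the tag bits
    bool  _rthread;
};

// Helper (an assumption, not part of the notes): insert n as a leaf of a
// completely inthreaded binary search tree that uses the header convention.
void insert(Node* header, Node* n) {
    Node* p = header;
    bool left = true;                     // the tree hangs off header's lchild
    while (left ? !p->_lthread : !p->_rthread) {
        p = left ? p->_lchild : p->_rchild;
        left = n->_info < p->_info;
    }
    if (left) {
        *n = Node{n->_info, p->_lchild, p, true, true};
        p->_lchild = n; p->_lthread = false;
    } else {
        *n = Node{n->_info, p, p->_rchild, true, true};
        p->_rchild = n; p->_rthread = false;
    }
}

// Preorder successor using only the inorder threads.
Node* presucc(Node* p) {
    if (!p->_lthread) return p->_lchild;    // rule a
    if (!p->_rthread) return p->_rchild;    // rule b
    Node* q = p->_rchild;                   // rule d.i: follow the thread up
    while (q->_rthread) q = q->_rchild;     // rule d.ii: until a real rchild
    return q->_rchild;                      // rule d.iii
}

// Build a tree from keys and list it in preorder.
std::vector<int> preorderOf(const std::vector<int>& keys) {
    std::vector<Node> pool(keys.size());    // sized once: addresses stay valid
    Node header{0, &header, &header, true, false};   // empty tree
    for (std::size_t i = 0; i < keys.size(); ++i) {
        pool[i]._info = keys[i];
        insert(&header, &pool[i]);
    }
    std::vector<int> out;
    for (Node* p = presucc(&header); p != &header; p = presucc(p))
        out.push_back(p->_info);
    return out;
}
```

        Because the header is its own (non-thread) right child, the climb in
        rule d stops there for the last node and the function returns the
        header, which ends the traversal.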

      2. Reverse preorder - define the function prepred.  This is not quite as
         easy, since we must always go through the parent of the node.  Note:

         a. If a node is the lchild of its parent, then its parent is its
            prepred.  Ex: node 2.

         b. If a node is the rchild of its parent and the parent has no lchild,
            then the parent is the prepred.  Ex: node 25.

         c. Otherwise, its prepred is the last node (in preorder) in the left 
            subtree of its parent.  This can be found by going down the left
            subtree of the parent as far as possible - going right whenever
            possible, otherwise left.

            Ex: node 13 - prepred is 38.

         d. Thus, we must first define a function parent (which is useful in
            its own right and also for postsucc, it turns out.)

            For any node P, there exists a nearest ancestor Q, such that
            P is in its right subtree.  (If all else fails, the header is
            such.)  We can find this node by following lchild pointers 
            until we have followed a thread.  Then, if P is the rchild of Q,
            then Q is its parent - otherwise, we follow lchild pointers in
            the right subtree of Q until we hit P.

                ex: node 3 - Q = 1 and is its parent
                    node 12 - Q = 1.  Note that we can find the parent
                        (node 6) by going right from Q, then continuing left.
                    node 13 - Q = 6 and is its parent

         e. Given the parent function, prepred is easily defined as discussed
            above.  Note that reverse preorder traversal using prepred will
            not be O(n) for the whole tree, but rather O(n*h), since parent
            potentially involves visiting one node on each level of the tree,
            and in subsequent applications of parent the same path can be
            retraced.  Thus, the amortized cost of prepred is O(h) =
            O(log n) if the tree is well balanced.  (But O(n) worst case).
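
         The parent function and prepred might be sketched as follows, again
         assuming bool flags in place of tag bits and the header convention
         described earlier.  The insert helper and reversePreorderOf are
         hypothetical names used only to build and exercise a test tree (a
         binary search tree, which is an assumption made here).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Node {
    int   _info;
    Node* _lchild;
    Node* _rchild;
    bool  _lthread;   // assumption: bool flags stand in for the tag bits
    bool  _rthread;
};

// Helper (an assumption, not part of the notes): insert n as a leaf of a
// completely inthreaded binary search tree that uses the header convention.
void insert(Node* header, Node* n) {
    Node* p = header;
    bool left = true;                     // the tree hangs off header's lchild
    while (left ? !p->_lthread : !p->_rthread) {
        p = left ? p->_lchild : p->_rchild;
        left = n->_info < p->_info;
    }
    if (left) {
        *n = Node{n->_info, p->_lchild, p, true, true};
        p->_lchild = n; p->_lthread = false;
    } else {
        *n = Node{n->_info, p, p->_rchild, true, true};
        p->_rchild = n; p->_rthread = false;
    }
}

// Parent of p (the header, if p is the root).
Node* parent(Node* p) {
    Node* q = p;
    while (!q->_lthread) q = q->_lchild;   // follow lchild pointers ...
    q = q->_lchild;                        // ... until a thread: this is Q
    if (q->_rchild == p) return q;         // p is Q's rchild
    Node* c = q->_rchild;                  // else walk left from Q's rchild
    while (c->_lchild != p) c = c->_lchild;
    return c;
}

// Preorder predecessor.
Node* prepred(Node* p) {
    Node* q = parent(p);
    if (q->_lchild == p || q->_lthread) return q;   // rules a and b
    Node* c = q->_lchild;                           // rule c: last node in
    for (;;) {                                      // preorder of left subtree
        if (!c->_rthread)      c = c->_rchild;      // go right when possible
        else if (!c->_lthread) c = c->_lchild;      // otherwise left
        else return c;
    }
}

// Build a tree from keys and list it in reverse preorder.
std::vector<int> reversePreorderOf(const std::vector<int>& keys) {
    std::vector<Node> pool(keys.size());    // sized once: addresses stay valid
    Node header{0, &header, &header, true, false};   // empty tree
    for (std::size_t i = 0; i < keys.size(); ++i) {
        pool[i]._info = keys[i];
        insert(&header, &pool[i]);
    }
    std::vector<int> out;
    for (Node* p = prepred(&header); p != &header; p = prepred(p))
        out.push_back(p->_info);
    return out;
}
```

         In this sketch parent(header) comes out as the header itself, so
         prepred(header) falls into rule c and yields the last node in
         preorder without any special-case code.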

      3. Postorder traversal - define a function postsucc.

         a. By symmetry, this turns out to be similar to prepred, but with
            the roles of lchild and rchild interchanged.  To find the postsucc,
            we first find the parent of the node in question.  

         b. If the node is the rchild of its parent, or if it is the only
            child of its parent, then the parent is the postsucc.
            Ex: nodes 3, 24.

         c. Otherwise, we find the first node in postorder in the right subtree
            of the parent.  This can be found by going down the subtree as far
            as possible, preferring to go left whenever possible, otherwise
            right.
            Ex: node 2 - postsucc = 32.

         d. As with prepred, postorder traversal using postsucc is O(n*h);
            the amortized cost of postsucc is O(h), which is O(log n) if the
            tree is well balanced.

         e. A caution on implementation: with the previous algorithms, our
            header convention has worked to our advantage to produce desired
            results - e.g. we could apply inpred, insucc, prepred, or presucc
            to the header and get a correct node, and in each case applying
            the function to the last node would lead us back to the header.
            With postsucc, some special cases are needed when leaving or
            coming back to the header due to our trick of making the header
            its own right child.  (However, the fact that the header is its
            own right child makes it easy to recognize the header.)
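
         One way the postsucc rules and the header special cases might fit
         together is sketched below.  The assumptions are the same as before
         (bool flags in place of tag bits, the header convention described
         earlier); isheader, firstPost, insert, and postorderOf are
         hypothetical names, and the test tree is a binary search tree only
         for convenience.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Node {
    int   _info;
    Node* _lchild;
    Node* _rchild;
    bool  _lthread;   // assumption: bool flags stand in for the tag bits
    bool  _rthread;
};

// Helper (an assumption, not part of the notes): insert n as a leaf of a
// completely inthreaded binary search tree that uses the header convention.
void insert(Node* header, Node* n) {
    Node* p = header;
    bool left = true;                     // the tree hangs off header's lchild
    while (left ? !p->_lthread : !p->_rthread) {
        p = left ? p->_lchild : p->_rchild;
        left = n->_info < p->_info;
    }
    if (left) {
        *n = Node{n->_info, p->_lchild, p, true, true};
        p->_lchild = n; p->_lthread = false;
    } else {
        *n = Node{n->_info, p, p->_rchild, true, true};
        p->_rchild = n; p->_rthread = false;
    }
}

// Parent of p (the header, if p is the root).
Node* parent(Node* p) {
    Node* q = p;
    while (!q->_lthread) q = q->_lchild;
    q = q->_lchild;
    if (q->_rchild == p) return q;
    Node* c = q->_rchild;
    while (c->_lchild != p) c = c->_lchild;
    return c;
}

// The header is the only node that is its own (non-thread) right child.
bool isheader(const Node* p) { return !p->_rthread && p->_rchild == p; }

// First node in postorder of the subtree rooted at c.
Node* firstPost(Node* c) {
    for (;;) {
        if (!c->_lthread)      c = c->_lchild;   // prefer left
        else if (!c->_rthread) c = c->_rchild;   // otherwise right
        else return c;
    }
}

// Postorder successor, with special cases for leaving and reaching the header.
Node* postsucc(Node* p) {
    if (isheader(p))                                 // leaving the header
        return p->_lthread ? p : firstPost(p->_lchild);
    Node* q = parent(p);
    if (isheader(q)) return q;                       // p is the root: done
    if (q->_rchild == p || q->_rthread) return q;    // rule b
    return firstPost(q->_rchild);                    // rule c
}

// Build a tree from keys and list it in postorder.
std::vector<int> postorderOf(const std::vector<int>& keys) {
    std::vector<Node> pool(keys.size());    // sized once: addresses stay valid
    Node header{0, &header, &header, true, false};   // empty tree
    for (std::size_t i = 0; i < keys.size(); ++i) {
        pool[i]._info = keys[i];
        insert(&header, &pool[i]);
    }
    std::vector<int> out;
    for (Node* p = postsucc(&header); p != &header; p = postsucc(p))
        out.push_back(p->_info);
    return out;
}
```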

      4. Reverse postorder traversal - define a function postpred.

         a. By symmetry, this is analogous to presucc, but with lchild and
            rchild roles reversed.  Reverse postorder traversal using postpred
            is therefore O(n), so the amortized cost of postpred is O(1).

         b. As with postsucc, some special cases are needed around the
            header.
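
         A possible postpred, mirroring presucc with the two header special
         cases made explicit, is sketched below.  As before, bool flags
         replace the tag bits and the header convention described earlier is
         assumed; isheader, insert, and reversePostorderOf are hypothetical
         names, and the test tree is a binary search tree only for
         convenience.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Node {
    int   _info;
    Node* _lchild;
    Node* _rchild;
    bool  _lthread;   // assumption: bool flags stand in for the tag bits
    bool  _rthread;
};

// Helper (an assumption, not part of the notes): insert n as a leaf of a
// completely inthreaded binary search tree that uses the header convention.
void insert(Node* header, Node* n) {
    Node* p = header;
    bool left = true;                     // the tree hangs off header's lchild
    while (left ? !p->_lthread : !p->_rthread) {
        p = left ? p->_lchild : p->_rchild;
        left = n->_info < p->_info;
    }
    if (left) {
        *n = Node{n->_info, p->_lchild, p, true, true};
        p->_lchild = n; p->_lthread = false;
    } else {
        *n = Node{n->_info, p, p->_rchild, true, true};
        p->_rchild = n; p->_rthread = false;
    }
}

// The header is the only node that is its own (non-thread) right child.
bool isheader(const Node* p) { return !p->_rthread && p->_rchild == p; }

// Postorder predecessor: presucc with lchild and rchild roles reversed.
Node* postpred(Node* p) {
    if (isheader(p)) return p->_lchild;     // root, or header itself if empty
    if (!p->_rthread) return p->_rchild;
    if (!p->_lthread) return p->_lchild;
    Node* q = p->_lchild;                   // follow the left thread up
    while (!isheader(q) && q->_lthread)     // until a node with a real lchild
        q = q->_lchild;
    return isheader(q) ? q : q->_lchild;    // reaching the header: p was first
}

// Build a tree from keys and list it in reverse postorder.
std::vector<int> reversePostorderOf(const std::vector<int>& keys) {
    std::vector<Node> pool(keys.size());    // sized once: addresses stay valid
    Node header{0, &header, &header, true, false};   // empty tree
    for (std::size_t i = 0; i < keys.size(); ++i) {
        pool[i]._info = keys[i];
        insert(&header, &pool[i]);
    }
    std::vector<int> out;
    for (Node* p = postpred(&header); p != &header; p = postpred(p))
        out.push_back(p->_info);
    return out;
}
```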

II. Preorder and postorder threading
--  -------- --- --------- ---------

   A. The threading scheme we have discussed has been based on the inorder
      traversal of the tree.  However, as we have seen, the inorder threads
      can also be used to accomplish other traversals (though not necessarily
      in O(1) amortized time.)

   B. If some other traversal is going to be used regularly instead of inorder,
      then an alternate threading scheme might be considered.  We could, for
      example, base a scheme on pre-order:

      1. We might build a threading scheme on the fact that if a node has a
         left child, then that child is its preorder successor.  If it has no
         left child, then its lchild pointer could be made into a thread to
         its pre-order successor.  (In this case, the rchild pointer is used
         as in an unthreaded tree.)  Presucc now becomes simply:

                if (! isthread(p -> _lchild))
                    return p -> _lchild;
                else
                    return makepointer(p -> _lchild);

      2. Alternately, we could adopt the following scheme for pre-order:

         a. If the node has no lchild, then make its lchild pointer a thread
            to its pre-order predecessor.

         b. If the node has no rchild, then make its rchild pointer a thread
            to its pre-order successor.

         c. This scheme, like the previous one, makes forward pre-order 
            traversal fairly easy:

                if (! isthread(p -> _lchild))
                    return p -> _lchild;
                else if (! isthread(p -> _rchild))
                    return p -> _rchild;
                else
                    return makepointer(p -> _rchild);

         d. Reverse pre-order is also possible with this scheme, THOUGH WE
            WOULD OCCASIONALLY HAVE TO GO TO THE HEADER AND APPLY PRESUCC
            REPEATEDLY.  (This is because a node's prepred is never below it
            in the tree.)
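
         The scheme in 2a through 2c can be tried out with the sketch below.
         The notes do not say how the threads are installed; here, purely as
         an assumption, an ordinary tree is first listed in preorder by
         recursion and the NULL pointers are then replaced by threads.  Bool
         flags replace the tag bits, NULL (rather than a header) marks the two
         ends, and collect, threadPreorder, and demo are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Node {
    int   _info;
    Node* _lchild;
    Node* _rchild;
    bool  _lthread;   // assumption: bool flags stand in for the tag bits
    bool  _rthread;
};

// List the nodes of an ordinary (unthreaded) tree in preorder.
void collect(Node* p, std::vector<Node*>& order) {
    if (p == nullptr) return;
    order.push_back(p);
    collect(p->_lchild, order);
    collect(p->_rchild, order);
}

// Replace NULL lchild / rchild pointers by preorder predecessor / successor
// threads (nullptr at the two ends of the traversal).
void threadPreorder(Node* root) {
    std::vector<Node*> order;
    collect(root, order);                  // must finish before any rewiring
    for (std::size_t i = 0; i < order.size(); ++i) {
        Node* p = order[i];
        if (p->_lchild == nullptr) {
            p->_lthread = true;
            p->_lchild = (i > 0) ? order[i - 1] : nullptr;
        }
        if (p->_rchild == nullptr) {
            p->_rthread = true;
            p->_rchild = (i + 1 < order.size()) ? order[i + 1] : nullptr;
        }
    }
}

// Preorder successor under this scheme (same three cases as in 2c).
Node* presucc(Node* p) {
    if (!p->_lthread) return p->_lchild;   // real left child
    if (!p->_rthread) return p->_rchild;   // real right child
    return p->_rchild;                     // thread to the successor
}

// Hand-built tree:      1
//                     /   \
//                    2     5
//                   / \     \
//                  3   4     6
std::vector<int> demo() {
    Node n[7] = {};                        // n[1]..n[6] used, all zeroed
    for (int i = 1; i <= 6; ++i) n[i]._info = i;
    n[1]._lchild = &n[2]; n[1]._rchild = &n[5];
    n[2]._lchild = &n[3]; n[2]._rchild = &n[4];
    n[5]._rchild = &n[6];
    threadPreorder(&n[1]);
    std::vector<int> out;
    for (Node* p = &n[1]; p != nullptr; p = presucc(p))
        out.push_back(p->_info);
    return out;
}
```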

   C. We could also base a scheme on post-order.  Unfortunately, forward
      post-order will always be hard, because a node's post-order successor
      is never below it in the tree.  However, a scheme to support reverse
      post-order would be somewhat easier!
Copyright ©1998 - Russell C. Bjork