Trees

CS122 Lecture: Introduction to Trees                    last revised 3/11/98

Objectives:

1. To define "tree" and "binary tree"
2. To show how trees and forests can be represented as binary trees

I. Introduction

   A. Our discussion of data types has moved from elementary types (integers
      etc.) to linear structured types (arrays, stacks, queues, lists etc.).
      Now we want to move to a consideration of branching structures, in
      which each element of the structure can have more than one "successor".

   B. The most general sort of branching structure is the graph, which we shall
      consider later.  First, though, we want to give considerable attention to
      a particularly useful class of branching structures: trees.

   C. Definition: A tree is a set of nodes consisting of a special node, called
      the root, and 0 or more disjoint subsets, each of which is a tree.

      1. ex:                    A
                            /   |   \
                        B       C       E
                                |     /   \
                                D     F   G
                                          |
                                          H

        - the set of nodes A .. H is a tree.  A is the root, and the
          subtrees are B, C .. D, and E .. H.

          in the subtree B, B is the root and there are no subtrees.
          in the subtree C..D, C is the root, D is the subtree.  D in turn
            is the root of a tree with no subtrees
          in the subtree E..H, E is the root, F and G are the roots of two
            subtrees, one of which (F) has no subtrees of its own, and the
            other of which (G) has the subtree H.

      2. Note well the insistence that the subtrees be disjoint.  For
         example:
                        A
                      /   \
                     B     C
                      \   /  \
                        D     E

         is not a tree, because D would belong to the subtrees of both B
         and C.
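
         The disjointness requirement can be checked mechanically.  What
         follows is only a minimal Python sketch.  It assumes (hypothetically)
         that a structure is given as a dictionary mapping each node to the
         list of its children; the names is_tree, TREE, and NOT_A_TREE are
         illustrative and do not come from the notes.

```python
def is_tree(root, children):
    """Return True if every node is reached exactly once from the root.

    Reaching a node a second time means two subtrees share it, so the
    subtrees are not disjoint.
    """
    seen = set()
    stack = [root]
    while stack:
        node = stack.pop()
        if node in seen:
            return False              # shared node: subtrees overlap
        seen.add(node)
        stack.extend(children[node])
    return seen == set(children)      # no node may be left unreachable


# The example tree A .. H from item 1.
TREE = {
    "A": ["B", "C", "E"], "B": [], "C": ["D"], "D": [],
    "E": ["F", "G"], "F": [], "G": ["H"], "H": [],
}

# The counterexample from item 2: D hangs below both B and C.
NOT_A_TREE = {"A": ["B", "C"], "B": ["D"], "C": ["D", "E"], "D": [], "E": []}

print(is_tree("A", TREE))         # True
print(is_tree("A", NOT_A_TREE))   # False
```

         The second structure is rejected because D is reached once through B
         and again through C.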

   D. Some terminology:

      1. Tree terminology is borrowed from two portions of the natural world:

         a. Wood type trees: we speak of the "root" of a tree and of its
            "leaves".  We have already defined the notion of "root" (but
            notice that we draw it on the top, not on the bottom!)  A leaf of
            a tree is the root of a (sub)tree that has no subtrees of its own.

         b. Genealogical trees (family trees):

            (1) If A is the root of a tree and B is the root of one of its 
                subtrees, then we say that A is the "father" or "parent" of B, 
                and B is the "son" or "child" of A.  In the above:

                - A is the parent of B, C, and E.  B, C, and E are children of A.
                - C is the parent of D, D is the child of C.
                - E is the parent of F and G; F and G are children of E.

            (2) We can carry this further, speaking of A as the grandparent of
                D etc.  In general, we say that A is the "ancestor" of H and
                H is the "descendant" of A if H is in one of the subtrees of A.
                In the example above, B, C, D, E, F, G, and H are all 
                descendants of A.
 
            (3) If two nodes are the children of the same parent, we say that 
                they are "brothers" or "siblings" or (sometimes) "twins".  In 
                the above, B, C, and E are siblings, as are F and G.

            (4) We could go farther and use terms like "uncle" - but we
                seldom do.

      2. Additional terminology:

         a. The leaves of a tree are sometimes also called "external" or
            "terminal" nodes, and the non-leaf nodes can be called "internal"
            or "non-terminal" nodes.

         b. The "degree" of a node is the number of children it has.  (Note
            that we can then define a leaf as a node with degree 0.)  The
            degree of a tree is the maximum degree of any of its nodes.  In
            the above example, the degree of A is three - and this also happens
            to be the degree of the whole tree, since the next highest degree
            is two.  It need not always be the case that the root has the
            highest degree.

         c. The "level" of a node can be defined as follows:

            - The level of a node is its distance from the root

         or - equivalently:

            - The level of the root of a tree is zero.
            - The level of any other node is 1 + the level of its parent.
            - In the above: A is at level 0, B, C, and E at level 1, D, F, and
              G at level 2, and H at level 3.

            But note: Some authors define the level of the root of a tree to be
            1, not 0.  The effect, in the above example, would be to make each
            level number one greater.

         d. The "height" of a tree is the number of levels present in it.  In 
            the above example, the height is 4. (Note: this is equivalent to 
            the maximum level in the tree + 1.)

         e. A "path" from the root of a tree to a node is a sequence of nodes
            N_1 .. N_h such that N_1 is the root, N_h is the node in question,
            and N_i is the parent of N_(i+1) for all i, 1 <= i < h.

         f. We can define the notion of "path length" to any node as the number
            of nodes visited in travelling from the root to that node.  (We
            will include the root in our enumeration, though some authors
            don't.)  Note that by our definition, the path length to a node is
            the same as 1 + the level of the node.  We can define the "path 
            length" for a whole tree to be the sum of the path lengths of each 
            of the nodes.  Then the "average path length" for the tree will be 
            the total path length divided by the number of nodes.  This figure 
            turns out to be a measure of the time complexity of many operations
            on the tree, since operations typically work down to a node by 
            following a path from the root.
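
            The quantities defined in this section can be worked out for the
            example tree.  The following is a minimal Python sketch, not a
            definitive implementation.  It assumes a hypothetical encoding in
            which each node maps to the ordered list of its children; the
            names CHILDREN, ROOT, and levels are illustrative only.  Levels
            start at 0 and path lengths count the root, as in the notes.

```python
# Hypothetical encoding of the example tree: node -> ordered list of children.
CHILDREN = {
    "A": ["B", "C", "E"], "B": [], "C": ["D"], "D": [],
    "E": ["F", "G"], "F": [], "G": ["H"], "H": [],
}
ROOT = "A"


def levels(root, children):
    """Return {node: level}, with the root at level 0."""
    result = {root: 0}
    stack = [root]
    while stack:
        node = stack.pop()
        for child in children[node]:
            result[child] = result[node] + 1   # 1 + the level of its parent
            stack.append(child)
    return result


level = levels(ROOT, CHILDREN)

# Degree of the tree: the maximum number of children of any node.
degree_of_tree = max(len(kids) for kids in CHILDREN.values())

# Leaves are the nodes of degree 0.
leaves = sorted(node for node, kids in CHILDREN.items() if not kids)

# Height: the number of levels, i.e. the maximum level + 1.
height = max(level.values()) + 1

# Path length to a node is 1 + its level (the root is counted).
total_path_length = sum(lv + 1 for lv in level.values())
average_path_length = total_path_length / len(CHILDREN)

print(degree_of_tree)        # 3
print(leaves)                # ['B', 'D', 'F', 'H']
print(height)                # 4
print(total_path_length)     # 20  (1 + 2+2+2 + 3+3+3 + 4)
print(average_path_length)   # 2.5
```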

      3. In drawing our tree examples, there has been an implicit left-to-right
         ordering of the children of a given parent.  In an actual tree, this
         ordering may or may not be important.  An "ordered" tree is one
         in which there is such an ordering imposed on the children of the
         same parent; in an "unordered" tree, no such relationship exists.

         a. Note that any practical scheme for representing a tree imposes an
            order.

         b. In our further discussion, we will work with ordered trees unless
            we explicitly say otherwise - though most of what we say about
            ordered trees applies equally to unordered trees.

   E. To further generalize, we can define the concept of a "forest" as a
      set of 0 or more disjoint trees.  

      1. Example:
                        B       C       E
                                |     /   \
                                D     F   G
                                          |
                                          H

      2. Observe: we can convert a forest to a tree by adding a single node
         to serve as the root of a tree in which each of the original trees
         is a subtree:

        ex:                     A
                            /   |   \
                        B       C       E
                                |     /   \
                                D     F   G
                                          |
                                          H
 
      3. Conversely, deleting the root from a tree leaves behind a forest
         consisting of its subtrees.  (Obviously, this is how we got our
         forest from our original tree.)

   F. In writing about trees, we can adopt one of several systems of notation:

      1. The graph-like drawings we have been using thus far.

      2. Indentation:

         ex: Our original tree:

                A
                  B
                  C
                    D
                  E
                    F
                    G
                      H

         ex: Our forest:

                B
                C
                  D
                E
                  F
                  G
                    H

      3. Parentheses.  ex: our tree

                A(B, C(D), E(F, G(H)))
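
      The indentation and parenthesis notations can both be generated by a
      simple recursion over the tree.  This is a minimal Python sketch,
      assuming a hypothetical dictionary encoding (node -> ordered list of
      children); the function names parenthesized and indented are
      illustrative only.

```python
# Hypothetical encoding of the example tree: node -> ordered list of children.
CHILDREN = {
    "A": ["B", "C", "E"], "B": [], "C": ["D"], "D": [],
    "E": ["F", "G"], "F": [], "G": ["H"], "H": [],
}


def parenthesized(node, children):
    """Return the parenthesis notation for the tree rooted at node."""
    kids = children[node]
    if not kids:
        return node
    inner = ", ".join(parenthesized(kid, children) for kid in kids)
    return f"{node}({inner})"


def indented(node, children, depth=0):
    """Return the indentation notation as a list of lines."""
    lines = ["  " * depth + node]
    for kid in children[node]:
        lines.extend(indented(kid, children, depth + 1))
    return lines


print(parenthesized("A", CHILDREN))        # A(B, C(D), E(F, G(H)))
print("\n".join(indented("A", CHILDREN)))  # the indented listing, one node per line

# A forest is written by listing its trees one after another.
forest_roots = ["B", "C", "E"]
print(", ".join(parenthesized(r, CHILDREN) for r in forest_roots))
```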

   G. Some uses of trees - observe that a tree is a fundamentally hierarchical
      structure.  Thus, a tree is appropriate to model any reality that
      exhibits hierarchy:

      1. Genealogical trees of all sorts: family relationships among
         individuals, tribes, languages etc.

      2. Classification systems:

         a. Taxonomic classification of plants and animals.
         b. Dewey decimal (or Library of Congress) classification of books.

      3. Breakdown of a manufactured product into subassemblies, each of
         which in turn consists of sub-subassemblies etc. down to the
         smallest components.

      4. Structure of a program - main routine is the root, procedures it
         contains are subtrees, each of which contains nested procedure
         definitions etc.

   H. Another use of trees is game trees.  Consider the problem of writing a
      program to play tic-tac-toe against a human opponent.  If the board
      configuration is:

                        X O X  where squares 5,6,8, and 9 are vacant
                        X 5 6   
                        O 8 9   

      and it is the machine's turn to move, an optimal move can be found by
      exploring all possible machine moves, then all possible human counter-
      moves, then all possible machine moves ... expanding each branch until
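      it ends in either a victory or a drawn game.  This analysis can be
      represented by a tree like this, where a round node represents a
      situation in which it is the human's turn to move and a square node
      represents the machine's move.  The numbers on the edges give the
      square played; W, D, and L mark games that the machine wins, draws, or
      loses, and the values 1, 0, and -1 are the corresponding outcomes
      backed up from the leaves.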

                                        O

         [0]                 [0]                  [1]               [0]
       /  |  \             /  |  \             /   |  \          /   |  \
    1     0     1       0     0     0        1     1     1     1     1     0
  8/ \9 6/ \9 6/ \8   8/ \9 5/ \9 5/ \8    6/ \9 5/ \9 5/ \6 6/ \8 5/ \8 5/ \6
  W [0] [0][0][0] W [-1][0] [0][0][0][-1] [-1] W W[-1] W [0] [0] W [0] W [0][0]
     |8  |9 |6 |8     |9 |8  |9 |5 |8 |5   |9       |5    |5  |8    |8    |6 |5
     D   D  D  D      L  D   D  D  D  L    L        L     D   D     D     D  D
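
      The exhaustive exploration described above is commonly called minimax
      search.  The following is a minimal Python sketch of it for this board,
      not the notes' own program.  It assumes the machine plays O (the board
      shows three X's and two O's, so O moves next) and scores a position as
      1, 0, or -1 from the machine's point of view.  The names LINES, winner,
      and minimax are illustrative only.

```python
# The eight winning lines, as 0-based indices into a 9-character board string.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]


def winner(board):
    """Return 'X' or 'O' if that player has three in a row, else None."""
    for a, b, c in LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None


def minimax(board, player, machine="O"):
    """Return (value, move) for the side to move.

    value is 1 (machine wins), 0 (draw), or -1 (machine loses) under best
    play by both sides; move is the 1-based square to play, or None if the
    game is already over.
    """
    won = winner(board)
    if won is not None:
        return (1 if won == machine else -1), None
    vacant = [i for i, mark in enumerate(board) if mark == " "]
    if not vacant:
        return 0, None                         # board full: drawn game
    other = "X" if player == "O" else "O"
    best_value, best_move = None, None
    for i in vacant:
        child = board[:i] + player + board[i + 1:]
        value, _ = minimax(child, other, machine)
        if best_value is None:
            better = True
        elif player == machine:
            better = value > best_value        # machine maximizes
        else:
            better = value < best_value        # human minimizes
        if better:
            best_value, best_move = value, i + 1
    return best_value, best_move


# The position from the notes; squares 5, 6, 8, and 9 are vacant.
BOARD = "XOX" + "X  " + "O  "
value, move = minimax(BOARD, "O")
print(value, move)   # 1 8
```

      Playing square 8 threatens both 2-5-8 and 7-8-9, and the human cannot
      block both, which agrees with the value 1 shown at the third square
      node of the tree.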
 
   I. Trees are also very useful for information storage and retrieval
      situations such as symbol tables, even though hierarchy may not be
      involved.

II. Operations on trees

   A. As with any flexible data structure, there are many possible operations
      we could define on trees.  Certainly, we want a create operation - but
      note that there is no such thing as an empty tree!  So when we create
      a tree, we create a tree having at least one node - the root.

   B. The operation of insertion into a tree is certainly important, but
      depends heavily on the principle by which the nodes are organized.
      We defer discussion of insertion and deletion to discussion of various
      special kinds of tree organized on various principles.

   C. One class of operations that can be defined for all kinds of tree is
      traversal.  By "traversal", we mean the act of systematically
      "visiting" all of the nodes to perform some operation on them:

      1. Printing out the contents of all of the nodes involves a traversal.

      2. Unless the tree is ordered somehow on the basis of some key,
         searching for a node containing a given value would involve a
         traversal (though in practice trees that are to be searched are
         usually structured in such a way as to avoid this.)

   D. One issue that arises in connection with traversal is the order of
      traversal.  Two orders are of particular importance:

      1. Preorder traversal:    Visit the root of the tree
                                Traverse each subtree in turn in preorder

        Example on the above:   A B C D E F G H

      2. Postorder traversal:   Traverse each subtree in postorder
                                Visit the root

        Example on the above:   B D C F H G E A

   E. Of lesser importance is level order traversal: visit all the nodes
      on level zero, then all on level one etc.  

        Example on the above:   A B C E D F G H

   F. The above operations can be defined on a forest by mentally adding a
      root which is ignored when it comes time to visit it.
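
      The three traversal orders can be sketched as follows.  This is a
      minimal Python sketch under the assumption of a hypothetical dictionary
      encoding (node -> ordered list of children); "visiting" a node is
      modelled by appending it to a list, and the function names are
      illustrative only.

```python
from collections import deque

# Hypothetical encoding of the example tree: node -> ordered list of children.
CHILDREN = {
    "A": ["B", "C", "E"], "B": [], "C": ["D"], "D": [],
    "E": ["F", "G"], "F": [], "G": ["H"], "H": [],
}


def preorder(node, children):
    """Visit the root, then traverse each subtree in turn in preorder."""
    order = [node]
    for kid in children[node]:
        order.extend(preorder(kid, children))
    return order


def postorder(node, children):
    """Traverse each subtree in postorder, then visit the root."""
    order = []
    for kid in children[node]:
        order.extend(postorder(kid, children))
    order.append(node)
    return order


def level_order(root, children):
    """Visit level 0, then level 1, and so on, using a queue."""
    order = []
    queue = deque([root])
    while queue:
        node = queue.popleft()
        order.append(node)
        queue.extend(children[node])
    return order


print(" ".join(preorder("A", CHILDREN)))      # A B C D E F G H
print(" ".join(postorder("A", CHILDREN)))     # B D C F H G E A
print(" ".join(level_order("A", CHILDREN)))   # A B C E D F G H
```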

III. Representing Trees and Forests

   A. We have noted that a forest can be converted to a tree by adding a
      root.  Thus we focus on representing trees - to represent a forest,
      simply include a "root" as a header.

   B. One method is to use a linked representation in which each node contains
      pointers to its children.  This means that when we define the data type
      for a node, the degree of the tree determines the number of pointer
      fields needed.  Pointer fields in a given node that are not needed can
      be set to nil.

        ex: for a tree of degree 3, as in our example:

        const
            degree = 3;

        type
            nodeptr = ^node;
            node = record
                info: infotype;
                child: array[1..degree] of nodeptr
            end;

   C. Now, for example, we could implement operations on this tree as follows:

      1. preorder traversal:

        procedure preorder(t: nodeptr);
        (* Traverses the tree whose root is pointed to by t in preorder *)

            var
                i: 1..degree;

            begin
                visit(t^.info);
                for i := 1 to degree do
                    if t^.child[i] <> nil then
                        preorder(t^.child[i])
            end;

      2. Reading a tree in from a text file.  Assume that the nodes of a
         tree have been written out, one node to a line, in preorder.
         Assume each line contains the contents of the node and the number
         of its children.

        ex:     The tree        A
                            B   C   D
                                   E F

        would be stored as:

        A 3
        B 0
        C 0
        D 2
        E 0
        F 0

        We can read the tree in as follows:

        procedure ReadTree(var t: nodeptr);
        (* Reads in a (sub)tree from a list of nodes stored on disk *)

            var
                i, NoChildren: 0..degree;

            begin

                new(t);
                readln(infile, t^.info, NoChildren);
                for i := 1 to NoChildren do
                    ReadTree(t^.child[i]);
                for i := NoChildren + 1 to degree do
                    t^.child[i] := nil

            end;

   D. However, this representation runs into a severe efficiency problem if
      the degree of the tree is large.  

      1. Thm: For a tree of degree d with n nodes, we will always have 
              n*(d-1) + 1 nil pointers stored in the nodes.

         Pf: Each of the n nodes has room for d pointers - or n*d pointers
             in all.  Each node (except the root) is pointed to by exactly
             one of these.  So n-1 pointers are used to point to other
             nodes, leaving n*d - (n-1) = n*(d-1) + 1 nil.

      2. For example, for a tree of degree 10 with 100 nodes, we waste 901
         pointers.
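
         The count in the theorem can be checked directly.  This is a minimal
         Python sketch, assuming a hypothetical dictionary encoding (node ->
         list of children) in which every node is taken to reserve d pointer
         fields; the names nil_pointers, EXAMPLE, and BIG are illustrative
         only.

```python
def nil_pointers(children, degree):
    """Count unused child slots when every node reserves `degree` slots."""
    return sum(degree - len(kids) for kids in children.values())


# The example tree of degree 3 with 8 nodes.
EXAMPLE = {
    "A": ["B", "C", "E"], "B": [], "C": ["D"], "D": [],
    "E": ["F", "G"], "F": [], "G": ["H"], "H": [],
}
n, d = len(EXAMPLE), 3
print(nil_pointers(EXAMPLE, d), n * (d - 1) + 1)   # 17 17

# Some tree of degree 10 with 100 nodes: node i hangs below node (i - 1) // 10.
BIG = {i: [] for i in range(100)}
for i in range(1, 100):
    BIG[(i - 1) // 10].append(i)
print(max(len(kids) for kids in BIG.values()))     # 10
print(nil_pointers(BIG, 10))                       # 901
```

         The result depends only on n and d, not on the shape of the tree,
         because exactly n - 1 pointers are in use in any tree of n nodes.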

   E. An alternate representation can be arrived at by transforming the general
      tree into a binary tree.  A binary tree is either empty, or it consists
      of a root and exactly two disjoint sets of nodes, designated the left
      subtree and the right subtree, each of which is a binary tree.

      This transformation can be done recursively, as follows:

      1. To transform a general tree rooted at a node A to its equivalent
         binary tree:

        - create a binary tree whose root is A.
        - transform the leftmost subtree of A in the general tree (if any), and
          make this the left subtree of A in the binary tree.
        - transform the next sibling of A in the general tree (if any), and
          make this the right subtree of A in the binary tree.
        - where A has no children or no next sibling, the corresponding
          subtree in the binary tree is empty.

      2. ex: our original tree:

                A
              /
             B
              \
               C
              /  \
             D    E
                 /
                F
                 \
                  G
                 /
                H

      3. Note that you can visualize the shape of the original tree by mentally
         rotating the binary equivalent 45 degrees counterclockwise.

      4. The same method can be applied to a forest - the right subtree of the
         binary equivalent of the root of one of the trees is the transformed
         version of the next tree in the forest.  We can see what this would
         look like for our example forest by just deleting the A node from
         the above tree.
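
      The transformation can be sketched in a few lines.  This is a minimal
      Python sketch, not a definitive implementation.  It assumes a
      hypothetical dictionary encoding of the general tree (node -> ordered
      list of children); the class BinNode and the functions to_binary and
      show are illustrative names, with lchild and rchild chosen to match the
      field names used later in these notes.

```python
# Hypothetical encoding of the example tree: node -> ordered list of children.
CHILDREN = {
    "A": ["B", "C", "E"], "B": [], "C": ["D"], "D": [],
    "E": ["F", "G"], "F": [], "G": ["H"], "H": [],
}


class BinNode:
    """A binary tree node."""

    def __init__(self, info):
        self.info = info
        self.lchild = None   # leftmost child in the general tree
        self.rchild = None   # next sibling in the general tree


def to_binary(siblings, children):
    """Transform an ordered list of sibling roots into one binary tree.

    A single tree is the list [root]; a forest is the list of its roots.
    """
    if not siblings:
        return None                                   # empty binary tree
    first = siblings[0]
    node = BinNode(first)
    node.lchild = to_binary(children[first], children)
    node.rchild = to_binary(siblings[1:], children)
    return node


def show(node):
    """Render a binary tree as info(left, right), with '-' for empty."""
    if node is None:
        return "-"
    if node.lchild is None and node.rchild is None:
        return node.info
    return f"{node.info}({show(node.lchild)}, {show(node.rchild)})"


tree = to_binary(["A"], CHILDREN)
forest = to_binary(["B", "C", "E"], CHILDREN)
print(show(tree))     # A(B(-, C(D, E(F(-, G(H, -)), -))), -)
print(show(forest))   # B(-, C(D, E(F(-, G(H, -)), -)))
```

      The forest result is the tree result with the A node removed, as
      observed in item 4.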

   G. Performing traversals on a general tree represented by an equivalent
      binary tree.

      1. Preorder traversal of the general tree is accomplished by preorder
         traversal of the transformed tree.

        ex: preorder traversal of the above binary tree: A B C D E F G H

      2. Postorder traversal of the general tree is accomplished by INORDER
         traversal of the transformed tree.

         a. Inorder traversal:  traverse the left subtree in inorder
                                visit the root
                                traverse the right subtree in inorder

         b. ex: the above:      B D C F H G E A

         c. This works because:

            - The left subtree of any node in the transformed tree contains all
              the nodes that were descendants of that node in the original
              tree.  These should be visited first.
            - The right subtree of any node in the transformed tree contains
              all the nodes that were right siblings (or descendants thereof)
              of the node in the original tree. These should be visited after
              the node.

      3. Postorder traversal of the transformed tree has no relationship to
         any meaningful operation on the original tree.
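
      These correspondences can be checked on the example.  This is a minimal
      Python sketch, assuming a hypothetical dictionary encoding of the
      general tree and representing each binary node as a tuple
      (info, left, right), with None for an empty subtree; all function names
      are illustrative only.

```python
# Hypothetical encoding of the example tree: node -> ordered list of children.
CHILDREN = {
    "A": ["B", "C", "E"], "B": [], "C": ["D"], "D": [],
    "E": ["F", "G"], "F": [], "G": ["H"], "H": [],
}


def to_binary(siblings, children):
    """Transform a list of sibling roots into (info, left, right) or None."""
    if not siblings:
        return None
    first = siblings[0]
    return (first,
            to_binary(children[first], children),   # leftmost child
            to_binary(siblings[1:], children))      # next sibling


def general_preorder(node, children):
    order = [node]
    for kid in children[node]:
        order.extend(general_preorder(kid, children))
    return order


def general_postorder(node, children):
    order = []
    for kid in children[node]:
        order.extend(general_postorder(kid, children))
    return order + [node]


def binary_preorder(t):
    if t is None:
        return []
    info, left, right = t
    return [info] + binary_preorder(left) + binary_preorder(right)


def binary_inorder(t):
    if t is None:
        return []
    info, left, right = t
    return binary_inorder(left) + [info] + binary_inorder(right)


binary = to_binary(["A"], CHILDREN)
print(" ".join(binary_preorder(binary)))   # A B C D E F G H
print(" ".join(binary_inorder(binary)))    # B D C F H G E A
print(binary_preorder(binary) == general_preorder("A", CHILDREN))   # True
print(binary_inorder(binary) == general_postorder("A", CHILDREN))   # True
```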

   H. An equivalent to our ReadTree procedure defined above would be the
      following (again assuming that the tree is stored on the disk in
      pre-order):

        procedure ReadTree(var t: nodeptr);
        (* Reads in a (sub)tree from a list of nodes stored on disk.
           Assumes node now has the fields info, lchild, and rchild. *)

            var
                i, NoChildren: 0..degree;
                p: nodeptr;

            begin

                new(t);
                readln(infile, t^.info, NoChildren);
                if NoChildren > 0 then
                  begin
                    ReadTree(t^.lchild);
                    p := t^.lchild;
                    for i := 2 to NoChildren do
                      begin
                        ReadTree(p^.rchild);    
                        p := p^.rchild
                      end;
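                    p^.rchild := nil  (* Not really necessary: the recursive
                                         call has already set it to nil *)
                  end
                else
                    t^.lchild := nil;
                t^.rchild := nil  (* Done in both cases; the caller links in
                                     any next sibling afterwards *)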

            end;
Copyright ©1999 - Russell C. Bjork