CS122 Lecture: Introduction to Trees last revised 3/11/98
Objectives:
1. To define "tree" and "binary tree"
2. To show how trees and forests can be represented as binary trees
I. Introduction
A. Our discussion of data types has moved from elementary types (integers
etc.) to linear structured types (arrays, stacks, queues, lists etc.).
Now we want to move to a consideration of branching structures, in
which each element of the structure can have more than one "successor".
B. The most general sort of branching structure is the graph, which we shall
consider later. First, though, we want to give considerable attention to
a particularly useful class of branching structures: trees.
C. Definition: A tree is a set of nodes, consisting of a special node, called
the root, and 0 or more disjoint subsets, each of which is a tree.
1. ex:        A
            / | \
          B   C   E
              |  / \
              D F   G
                    |
                    H
- The set of nodes A .. H is a tree. A is the root, and the
subtrees are B, C .. D, and E .. H.
In the subtree B, B is the root and there are no subtrees.
In the subtree C .. D, C is the root and D is the subtree. D in turn
is the root of a tree with no subtrees.
In the subtree E .. H, E is the root, and F and G are the roots of two
subtrees, one of which (F) has no subtrees of its own, and the
other of which (G) has the subtree H.
2. Note well the insistence that the subtrees be disjoint. For
example:
              A
             / \
            B   C
             \ / \
              D   E
is not a tree, because the subtrees rooted at B and C share the node D.
D. Some terminology:
1. Tree terminology is borrowed from two portions of the natural world:
a. Wood type trees: we speak of the "root" of a tree and of its
"leaves". We have already defined the notion of "root" (but
notice that we draw it on the top, not on the bottom!) A leaf of
a tree is the root of a (sub)tree that has no subtrees of its own.
b. Genealogical trees (family trees):
(1) If A is the root of a tree and B is the root of one of its
subtrees, then we say that A is the "father" or "parent" of B,
and B is the "son" or "child" of A. In the above:
- A is the parent of B, C, and E. B,C, and E are children of A.
- C is the parent of D, D is the child of C.
- E is the parent of F and G; F and G are children of E.
(2) We can carry this further, speaking of A as the grandparent of
D etc. In general, we say that A is the "ancestor" of H and
H is the "descendant" of A if H is in one of the subtrees of A.
In the example above, B, C, D, E, F, G, and H are all
descendants of A.
(3) If two nodes are the children of the same parent, we say that
they are "brothers" or "siblings" or (sometimes) "twins". In
the above, B, C, and E are siblings, as are F and G.
(4) We could go farther and use terms like "uncle" - but we
seldom do.
2. Additional terminology:
a. The leaves of a tree are sometimes also called "external" or
"terminal" nodes, and the non-leaf nodes can be called "internal"
or "non-terminal" nodes.
b. The "degree" of a node is the number of children it has. (Note
that we can then define a leaf as a node with degree 0.) The
degree of a tree is the maximum degree of any of its nodes. In
the above example, the degree of A is three - and this also happens
to be the degree of the whole tree, since the next highest degree
is two. It need not always be the case that the root has the
highest degree.
c. The "level" of a node can be defined as follows:
- The level of a node is its distance from the root
or - equivalently:
- The level of the root of a tree is zero.
- The level of any other node is 1 + the level of its parent.
- In the above: A is at level 0, B, C, and E at level 1, D, F, and
G at level 2, and H at level 3.
But note: Some authors define the level of the root of a tree to be
1, not 0. The effect, in the above example, would be to make each
level number one greater.
d. The "height" of a tree is the number of levels present in it. In
the above example, the height is 4. (Note: this is equivalent to
the maximum level in the tree + 1.)
e. A "path" from the root of a tree to a node is a sequence of nodes
N_1 .. N_h such that N_1 is the root, N_h is the node in question, and
N_i is the parent of N_(i+1) for all i, 1 <= i < h.
f. We can define the notion of "path length" to any node as the number
of nodes visited in travelling from the root to that node. (We
will include the root in our enumeration, though some authors
don't.) Note that by our definition, the path length to a node is
the same as 1 + the level of the node. We can define the "path
length" for a whole tree to be the sum of the path lengths of each
of the nodes. Then the "average path length" for the tree will be
the total path length divided by the number of nodes. This figure
turns out to be a measure of the time complexity of many operations
on the tree, since operations typically work down to a node by
following a path from the root.
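As a check on the definitions of degree, height and path length above, the following is a minimal sketch that computes them for the example tree. It assumes the linked node representation introduced in section III.B, with infotype taken to be char. MakeNode, NodeDegree, TreeDegree, Height and PathLength are names invented for this illustration.

```pascal
program TreeMetrics;
(* Illustrative sketch only. MakeNode, NodeDegree, TreeDegree, Height and
   PathLength are hypothetical names chosen for this example. *)
const
  degree = 3;                      (* room for up to 3 children per node *)
type
  nodeptr = ^node;
  node = record
    info: char;
    child: array[1..degree] of nodeptr
  end;
var
  a, c, e, g: nodeptr;
  total: integer;

(* Allocates a node holding ch, with no children. *)
function MakeNode(ch: char): nodeptr;
var p: nodeptr; i: integer;
begin
  new(p);
  p^.info := ch;
  for i := 1 to degree do
    p^.child[i] := nil;
  MakeNode := p
end;

(* Degree of a node: the number of children actually present. *)
function NodeDegree(t: nodeptr): integer;
var i, count: integer;
begin
  count := 0;
  for i := 1 to degree do
    if t^.child[i] <> nil then
      count := count + 1;
  NodeDegree := count
end;

(* Degree of a tree: the maximum degree of any of its nodes. *)
function TreeDegree(t: nodeptr): integer;
var i, best, d: integer;
begin
  best := NodeDegree(t);
  for i := 1 to degree do
    if t^.child[i] <> nil then
      begin
        d := TreeDegree(t^.child[i]);
        if d > best then
          best := d
      end;
  TreeDegree := best
end;

(* Height: the number of levels, i.e. 1 + the height of the tallest subtree. *)
function Height(t: nodeptr): integer;
var i, best, h: integer;
begin
  best := 0;
  for i := 1 to degree do
    if t^.child[i] <> nil then
      begin
        h := Height(t^.child[i]);
        if h > best then
          best := h
      end;
  Height := best + 1
end;

(* Sum of the path lengths of all nodes in the subtree rooted at t, where t
   sits at the given level. The path length to a node is 1 + its level. *)
function PathLength(t: nodeptr; level: integer): integer;
var i, sum: integer;
begin
  sum := level + 1;
  for i := 1 to degree do
    if t^.child[i] <> nil then
      sum := sum + PathLength(t^.child[i], level + 1);
  PathLength := sum
end;

begin
  (* Build the example tree A(B, C(D), E(F, G(H))), which has 8 nodes. *)
  a := MakeNode('A'); c := MakeNode('C');
  e := MakeNode('E'); g := MakeNode('G');
  a^.child[1] := MakeNode('B'); a^.child[2] := c; a^.child[3] := e;
  c^.child[1] := MakeNode('D');
  e^.child[1] := MakeNode('F'); e^.child[2] := g;
  g^.child[1] := MakeNode('H');
  writeln('degree of A: ', NodeDegree(a):1);
  writeln('degree of tree: ', TreeDegree(a):1);
  writeln('height: ', Height(a):1);
  total := PathLength(a, 0);
  writeln('total path length: ', total:1);
  writeln('average path length: ', (total / 8):4:2)
end.
```

For the example tree this reports a degree of 3, a height of 4, a total path length of 20 (1 for A, 2 each for B, C and E, 3 each for D, F and G, and 4 for H) and an average path length of 2.50.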
3. In drawing our tree examples, there has been an implicit left-to-right
ordering of the children of a given parent. In an actual tree, this
ordering may or may not be important. An "ordered" tree is one
in which there is such an ordering imposed on the children of the
same parent; in an "unordered" tree, no such relationship exists.
a. Note that any practical scheme for representing a tree imposes an
order.
b. In our further discussion, we will work with ordered trees unless
we explicitly say otherwise - though most of what we say about
ordered trees applies equally to unordered trees.
E. To further generalize, we can define the concept of a "forest" as a
set of 0 or more disjoint trees.
1. Example:
          B   C   E
              |  / \
              D F   G
                    |
                    H
2. Observe: we can convert a forest to a tree by adding a single node
to serve as the root of a tree in which each of the original trees
is a subtree:
ex:           A
            / | \
          B   C   E
              |  / \
              D F   G
                    |
                    H
3. Conversely, deleting the root from a tree leaves behind a forest
consisting of its subtrees. (Obviously, this is how we got our
forest from our original tree.)
F. In writing about trees, we can adopt one of several systems of notation:
1. The graph-like drawings we have been using thus far.
2. Indentation:
ex: Our original tree:
    A
      B
      C
        D
      E
        F
        G
          H
ex: Our forest:
    B
    C
      D
    E
      F
      G
        H
3. Parentheses. ex: our tree
A(B, C(D), E(F, G(H)))
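Both textual notations can be produced by a simple recursive walk of the tree. The following is a minimal sketch, again assuming the linked node representation of section III.B with infotype taken to be char. MakeNode, WriteParens and WriteIndented are names invented for this illustration.

```pascal
program Notations;
(* Illustrative sketch only. MakeNode, WriteParens and WriteIndented are
   hypothetical names chosen for this example. *)
const
  degree = 3;
type
  nodeptr = ^node;
  node = record
    info: char;
    child: array[1..degree] of nodeptr
  end;
var
  a, c, e, g: nodeptr;

(* Allocates a node holding ch, with no children. *)
function MakeNode(ch: char): nodeptr;
var p: nodeptr; i: integer;
begin
  new(p);
  p^.info := ch;
  for i := 1 to degree do
    p^.child[i] := nil;
  MakeNode := p
end;

(* Writes the tree rooted at t in parenthesis notation. *)
procedure WriteParens(t: nodeptr);
var i: integer; first: boolean;
begin
  write(t^.info);
  first := true;                    (* no child written yet *)
  for i := 1 to degree do
    if t^.child[i] <> nil then
      begin
        if first then
          write('(')
        else
          write(', ');
        first := false;
        WriteParens(t^.child[i])
      end;
  if not first then
    write(')')
end;

(* Writes the tree rooted at t in indentation notation, two spaces per level. *)
procedure WriteIndented(t: nodeptr; depth: integer);
var i: integer;
begin
  for i := 1 to depth do
    write('  ');
  writeln(t^.info);
  for i := 1 to degree do
    if t^.child[i] <> nil then
      WriteIndented(t^.child[i], depth + 1)
end;

begin
  (* Build the example tree A(B, C(D), E(F, G(H))). *)
  a := MakeNode('A'); c := MakeNode('C');
  e := MakeNode('E'); g := MakeNode('G');
  a^.child[1] := MakeNode('B'); a^.child[2] := c; a^.child[3] := e;
  c^.child[1] := MakeNode('D');
  e^.child[1] := MakeNode('F'); e^.child[2] := g;
  g^.child[1] := MakeNode('H');
  WriteParens(a);
  writeln;
  WriteIndented(a, 0)
end.
```

The first line of output is the parenthesis form A(B, C(D), E(F, G(H))); the remaining lines give the same tree in indentation notation.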
G. Some uses of trees - observe that a tree is a fundamentally hierarchical
structure. Thus, a tree is appropriate to model any reality that
exhibits hierarchy:
1. Genealogical trees of all sorts: family relationships among
individuals, tribes, languages etc.
2. Classification systems:
a. Taxonomic classification of plants and animals.
b. Dewey decimal (or Library of Congress) classification of books.
3. Breakdown of a manufactured product into subassemblies, each of
which in turn consists of sub-subassemblies etc. down to the smallest
components.
4. Structure of a program - main routine is the root, procedures it
contains are subtrees, each of which contains nested procedure
definitions etc.
H. Another use of trees is game trees. Consider the problem of writing a
program to play tic-tac-toe against a human opponent. If the board
configuration is:
    X O X      where squares 5, 6, 8, and 9 are vacant
    X 5 6
    O 8 9
and it is the machine's turn to move, an optimal move can be found by
exploring all possible machine moves, then all possible human counter-
moves, then all possible machine moves ... expanding each branch until
it either ends in a victory or a drawn game. This analysis can be
represented by a tree like this, where a round node represents a
situation in which it is the human's turn to move and a square node
represents the machine's move.
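(Shown here in indentation notation. Each line gives the square played to
reach a node, then the node itself. Values in brackets are square nodes and
plain values are round nodes; 1, 0 and -1 mean a win, a draw and a loss for
the machine. W, D and L mark games that have ended in a win, a draw or a
loss for the machine.)
    root (the position above, machine to move)
      5: [0]
        6: 1
          8: W
          9: [0]
            8: D
        8: 0
          6: [0]
            9: D
          9: [0]
            6: D
        9: 1
          6: [0]
            8: D
          8: W
      6: [0]
        5: 0
          8: [-1]
            9: L
          9: [0]
            8: D
        8: 0
          5: [0]
            9: D
          9: [0]
            5: D
        9: 0
          5: [0]
            8: D
          8: [-1]
            5: L
      8: [1]
        5: 1
          6: [-1]
            9: L
          9: W
        6: 1
          5: W
          9: W
        9: 1
          5: W
          6: [-1]
            5: L
      9: [0]
        5: 1
          6: [0]
            8: D
          8: W
        6: 1
          5: [0]
            8: D
          8: W
        8: 0
          5: [0]
            6: D
          6: [0]
            5: D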
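The exhaustive analysis described above can be sketched as a recursive procedure that explores the game tree without ever storing it. The following is only an illustration under stated assumptions: the machine is taken to play O and the human X, squares are numbered 1 to 9 row by row, and a position is valued at 1, 0 or -1 for a machine win, a draw or a machine loss. HasLine, Wins and Evaluate are names invented for this example.

```pascal
program GameTreeDemo;
(* Illustrative sketch only. HasLine, Wins and Evaluate are hypothetical
   names. The machine is assumed to play O and the human X. *)
type
  boardtype = array[1..9] of char;   (* squares numbered row by row *)
var
  board: boardtype;
  sq: integer;

(* True if player p holds all three of squares i, j and k. *)
function HasLine(var b: boardtype; p: char; i, j, k: integer): boolean;
begin
  HasLine := (b[i] = p) and (b[j] = p) and (b[k] = p)
end;

(* True if player p has completed any row, column or diagonal. *)
function Wins(var b: boardtype; p: char): boolean;
begin
  Wins := HasLine(b, p, 1, 2, 3) or HasLine(b, p, 4, 5, 6) or
          HasLine(b, p, 7, 8, 9) or HasLine(b, p, 1, 4, 7) or
          HasLine(b, p, 2, 5, 8) or HasLine(b, p, 3, 6, 9) or
          HasLine(b, p, 1, 5, 9) or HasLine(b, p, 3, 5, 7)
end;

(* Value of position b for the machine: 1 = win, 0 = draw, -1 = loss.
   The machine picks the move with the largest value and the human is
   assumed to pick the move with the smallest. *)
function Evaluate(var b: boardtype; machineToMove: boolean): integer;
var
  i, best, v: integer;
  anyMove: boolean;
begin
  if Wins(b, 'O') then
    Evaluate := 1
  else if Wins(b, 'X') then
    Evaluate := -1
  else
    begin
      anyMove := false;
      if machineToMove then best := -2 else best := 2;
      for i := 1 to 9 do
        if b[i] = ' ' then
          begin
            anyMove := true;
            if machineToMove then b[i] := 'O' else b[i] := 'X';
            v := Evaluate(b, not machineToMove);
            b[i] := ' ';                       (* undo the trial move *)
            if machineToMove and (v > best) then best := v;
            if (not machineToMove) and (v < best) then best := v
          end;
      if anyMove then
        Evaluate := best
      else
        Evaluate := 0                          (* board full: drawn game *)
    end
end;

begin
  (* The position from the text: squares 5, 6, 8 and 9 are vacant. *)
  board[1] := 'X'; board[2] := 'O'; board[3] := 'X';
  board[4] := 'X'; board[5] := ' '; board[6] := ' ';
  board[7] := 'O'; board[8] := ' '; board[9] := ' ';
  for sq := 1 to 9 do
    if board[sq] = ' ' then
      begin
        board[sq] := 'O';                      (* try this machine move *)
        writeln('machine plays ', sq:1, ': value ',
                Evaluate(board, false):1);
        board[sq] := ' '
      end
end.
```

The four values printed correspond to the four square nodes on the first level of the tree. Only the move to square 8 has value 1, since it threatens to complete both the bottom row and the middle column.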
I. Trees are also very useful for information storage and retrieval
situations such as symbol tables, even though hierarchy may not be
involved.
II. Operations on trees
A. As with any flexible data structure, there are many possible operations
we could define on trees. Certainly, we want a create operation - but
note that there is no such thing as an empty tree! So when we create
a tree, we create a tree having at least one node - the root.
B. The operation of insertion into a tree is certainly important, but
depends heavily on the principle by which the nodes are organized.
We defer discussion of insertion and deletion to discussion of various
special kinds of tree organized on various principles.
C. One class of operations that can be defined for all kinds of tree is
traversal. By "traversal", we mean the act of systematically
"visiting" all of the nodes to perform some operation on them:
1. Printing out the contents of all of the nodes involves a traversal.
2. Unless the tree is ordered somehow on the basis of some key,
searching for a node containing a given value would involve a
traversal (though in practice trees that are to be searched are
usually structured in such a way as to avoid this.)
D. One issue that arises in connection with traversal is the order of
traversal. Two orders are of particular importance:
1. Preorder traversal: Visit the root of the tree
Traverse each subtree in turn in preorder
Example on the above: A B C D E F G H
2. Postorder traversal: Traverse each subtree in postorder
Visit the root
Example on the above: B D C F H G E A
E. Of lesser importance is level order traversal: visit all the nodes
on level zero, then all on level one etc.
Example on the above: A B C E D F G H
F. The above operations can be defined on a forest by mentally adding a
root which is ignored when it comes time to visit it.
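Postorder and level order traversal can be sketched as follows. This is a minimal illustration assuming the linked node representation of section III.B with infotype taken to be char. The array-based queue and its bound maxnodes are assumptions made for this example, and MakeNode is an invented helper name.

```pascal
program Traversals;
(* Illustrative sketch only. MakeNode and maxnodes are assumptions made
   for this example. *)
const
  degree = 3;
  maxnodes = 100;            (* assumed upper bound on the number of nodes *)
type
  nodeptr = ^node;
  node = record
    info: char;
    child: array[1..degree] of nodeptr
  end;
var
  a, c, e, g: nodeptr;

(* Allocates a node holding ch, with no children. *)
function MakeNode(ch: char): nodeptr;
var p: nodeptr; i: integer;
begin
  new(p);
  p^.info := ch;
  for i := 1 to degree do
    p^.child[i] := nil;
  MakeNode := p
end;

(* "Visiting" a node here simply prints its contents. *)
procedure visit(ch: char);
begin
  write(ch, ' ')
end;

(* Traverses each subtree in postorder, then visits the root. *)
procedure postorder(t: nodeptr);
var i: integer;
begin
  for i := 1 to degree do
    if t^.child[i] <> nil then
      postorder(t^.child[i]);
  visit(t^.info)
end;

(* Visits the nodes level by level, using a queue of nodes whose children
   have not yet been examined. *)
procedure levelorder(t: nodeptr);
var
  queue: array[1..maxnodes] of nodeptr;
  head, tail, i: integer;
begin
  head := 1;
  tail := 1;
  queue[tail] := t;
  while head <= tail do
    begin
      t := queue[head];
      head := head + 1;
      visit(t^.info);
      for i := 1 to degree do
        if t^.child[i] <> nil then
          begin
            tail := tail + 1;
            queue[tail] := t^.child[i]
          end
    end
end;

begin
  (* Build the example tree A(B, C(D), E(F, G(H))). *)
  a := MakeNode('A'); c := MakeNode('C');
  e := MakeNode('E'); g := MakeNode('G');
  a^.child[1] := MakeNode('B'); a^.child[2] := c; a^.child[3] := e;
  c^.child[1] := MakeNode('D');
  e^.child[1] := MakeNode('F'); e^.child[2] := g;
  g^.child[1] := MakeNode('H');
  write('postorder: ');
  postorder(a);
  writeln;
  write('level order: ');
  levelorder(a);
  writeln
end.
```

The two lines printed match the postorder and level order examples given above. Level order is the only one of the traversals that needs an auxiliary queue; the others rely on the recursion itself.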
III. Representing Trees and Forests
A. We have noted that a forest can be converted to a tree by adding a
root. Thus we focus on representing trees - to represent a forest,
simply include a "root" as a header.
B. One method is to use a linked representation in which each node contains
pointers to its children. This means that when we define the data type
for a node, the degree of the tree determines the number of pointer
fields needed. Pointer fields in a given node that are not needed can
be set to nil.
ex: for a tree of degree 3, as in our example:
    const
      degree = 3;
    type
      nodeptr = ^node;
      node = record
        info: infotype;
        child: array[1..degree] of nodeptr
      end;
C. Now, for example, we could implement operations on this tree as follows:
1. preorder traversal:
    procedure preorder(t: nodeptr);
    (* Traverses the tree whose root is pointed to by t in preorder *)
      var
        i: 1..degree;
      begin
        visit(t^.info);
        for i := 1 to degree do
          if t^.child[i] <> nil then
            preorder(t^.child[i])
      end;
2. Reading a tree in from a text file. Assume that the nodes of a
tree have been written out, one node to a line, in pre-order.
Assume each line contains the contents of the node and the number
of its children.
ex: The tree        A
                  / | \
                B   C   D
                       / \
                      E   F
would be stored as:
    A 3
    B 0
    C 0
    D 2
    E 0
    F 0
We can read the tree in as follows:
    procedure ReadTree(var t: nodeptr);
    (* Reads in a (sub)tree from a list of nodes stored on disk *)
      var
        i, NoChildren: 0..degree;
      begin
        new(t);
        readln(infile, t^.info, NoChildren);
        (* The first NoChildren subtrees follow immediately in the file *)
        for i := 1 to NoChildren do
          ReadTree(t^.child[i]);
        (* Any remaining child pointers are unused *)
        for i := NoChildren + 1 to degree do
          t^.child[i] := nil
      end;
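A file in this format could be produced by a matching preorder procedure. The following is a minimal sketch of one, with infotype taken to be char and the output going to the standard output rather than a disk file. WriteTree and MakeNode are names invented for this illustration.

```pascal
program WriteTreeDemo;
(* Illustrative sketch only. WriteTree and MakeNode are hypothetical names;
   output goes to standard output rather than to a disk file. *)
const
  degree = 3;
type
  nodeptr = ^node;
  node = record
    info: char;
    child: array[1..degree] of nodeptr
  end;
var
  a, d: nodeptr;

(* Allocates a node holding ch, with no children. *)
function MakeNode(ch: char): nodeptr;
var p: nodeptr; i: integer;
begin
  new(p);
  p^.info := ch;
  for i := 1 to degree do
    p^.child[i] := nil;
  MakeNode := p
end;

(* Writes the (sub)tree rooted at t in preorder, one node to a line, each
   line holding the contents of the node and the number of its children. *)
procedure WriteTree(t: nodeptr);
var i, NoChildren: integer;
begin
  NoChildren := 0;
  for i := 1 to degree do
    if t^.child[i] <> nil then
      NoChildren := NoChildren + 1;
  writeln(t^.info, ' ', NoChildren:1);
  for i := 1 to degree do
    if t^.child[i] <> nil then
      WriteTree(t^.child[i])
end;

begin
  (* Build the tree A(B, C, D(E, F)) used in the ReadTree example. *)
  a := MakeNode('A');
  d := MakeNode('D');
  a^.child[1] := MakeNode('B');
  a^.child[2] := MakeNode('C');
  a^.child[3] := d;
  d^.child[1] := MakeNode('E');
  d^.child[2] := MakeNode('F');
  WriteTree(a)
end.
```

The six lines printed are exactly the stored form shown in the example. Note that ReadTree places the children it reads in the first NoChildren positions, so this round trip preserves the tree only if the children of each node occupy the leading slots of the child array.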
D. However, this representation runs into a severe efficiency problem if
the degree of the tree is large.
1. Thm: For a tree of degree d with n nodes, we will always have
n*(d-1) + 1 nil pointers stored in the nodes.
Pf: Each of the n nodes has room for d pointers - or n*d pointers
in all. Each node (except the root) is pointed to by exactly
one of these. So n-1 pointers are used to point to other
nodes, leaving n*d - (n-1) = n*(d-1) + 1 nil.
2. For example, for a tree of degree 10 with 100 nodes, we waste 901
pointers.
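The theorem can be checked on our example tree, which has n = 8 nodes stored with d = 3 pointers each. The following sketch counts the nil pointers directly and compares the count with n*(d-1) + 1. It assumes infotype is char; Count and MakeNode are names invented for this illustration.

```pascal
program NilCount;
(* Illustrative sketch only. Count and MakeNode are hypothetical names. *)
const
  degree = 3;
type
  nodeptr = ^node;
  node = record
    info: char;
    child: array[1..degree] of nodeptr
  end;
var
  a, c, e, g: nodeptr;
  nodes, nils: integer;

(* Allocates a node holding ch, with no children. *)
function MakeNode(ch: char): nodeptr;
var p: nodeptr; i: integer;
begin
  new(p);
  p^.info := ch;
  for i := 1 to degree do
    p^.child[i] := nil;
  MakeNode := p
end;

(* Adds the number of nodes and of nil child pointers in the (sub)tree
   rooted at t to the running totals nodes and nils. *)
procedure Count(t: nodeptr; var nodes, nils: integer);
var i: integer;
begin
  nodes := nodes + 1;
  for i := 1 to degree do
    if t^.child[i] = nil then
      nils := nils + 1
    else
      Count(t^.child[i], nodes, nils)
end;

begin
  (* Build the example tree A(B, C(D), E(F, G(H))). *)
  a := MakeNode('A'); c := MakeNode('C');
  e := MakeNode('E'); g := MakeNode('G');
  a^.child[1] := MakeNode('B'); a^.child[2] := c; a^.child[3] := e;
  c^.child[1] := MakeNode('D');
  e^.child[1] := MakeNode('F'); e^.child[2] := g;
  g^.child[1] := MakeNode('H');
  nodes := 0;
  nils := 0;
  Count(a, nodes, nils);
  writeln('nodes: ', nodes:1);
  writeln('nil pointers: ', nils:1);
  writeln('n*(d-1) + 1: ', (nodes * (degree - 1) + 1):1)
end.
```

Here 8 nodes provide 24 pointer fields, of which 7 point to other nodes, leaving 17 nil, in agreement with 8*(3-1) + 1 = 17.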
E. An alternate representation can be arrived at by transforming the general
tree into a binary tree. A binary tree is either empty, or it consists
of a root and exactly two disjoint sets of nodes - designated the left
subtree and the right subtree - each of which is a binary tree.
This transformation can be done recursively, as follows:
1. To transform a general tree rooted at a node A to its equivalent
binary tree:
- create a binary tree whose root is A.
- transform the leftmost subtree of A in the general tree, and make
this the left subtree of A in the binary tree.
- transform the next sibling of A in the general tree, and make this
the right subtree of A in the binary tree.
2. ex: our original tree:
              A
             /
            B
             \
              C
             / \
            D   E
               /
              F
               \
                G
               /
              H
3. Note that you can visualize the shape of the original tree by mentally
rotating the binary equivalent 45 degrees counterclockwise.
4. The same method can be applied to a forest - the right subtree of the
binary equivalent of the root of one of the trees is the transformed
version of the next tree in the forest. We can see what this would
look like for our example forest by just deleting the A node from
the above tree.
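The transformation can be sketched in code as follows. This is only one possible way to write it, under stated assumptions: the general tree uses the linked representation of section III.B with infotype taken to be char, and the binary node type binnode, with fields lchild and rchild, is invented for this example, as are the names MakeNode, Convert, NameOf and Show.

```pascal
program ToBinary;
(* Illustrative sketch only. binnode, MakeNode, Convert, NameOf and Show
   are hypothetical names chosen for this example. *)
const
  degree = 3;
type
  nodeptr = ^node;
  node = record
    info: char;
    child: array[1..degree] of nodeptr
  end;
  binptr = ^binnode;
  binnode = record
    info: char;
    lchild, rchild: binptr   (* leftmost child and next sibling *)
  end;
var
  a, c, e, g: nodeptr;

(* Allocates a general tree node holding ch, with no children. *)
function MakeNode(ch: char): nodeptr;
var p: nodeptr; i: integer;
begin
  new(p);
  p^.info := ch;
  for i := 1 to degree do
    p^.child[i] := nil;
  MakeNode := p
end;

(* Returns the binary equivalent of the general tree rooted at t. sibling
   is the already transformed next sibling of t (nil if there is none). *)
function Convert(t: nodeptr; sibling: binptr): binptr;
var b, kids: binptr; i: integer;
begin
  (* Work through the children from right to left, so that each child can
     be given the transformed version of the sibling to its right. *)
  kids := nil;
  for i := degree downto 1 do
    if t^.child[i] <> nil then
      kids := Convert(t^.child[i], kids);
  new(b);
  b^.info := t^.info;
  b^.lchild := kids;         (* transformed leftmost subtree *)
  b^.rchild := sibling;      (* transformed next sibling *)
  Convert := b
end;

(* Contents of the node b points to, or '-' for an empty subtree. *)
function NameOf(b: binptr): char;
begin
  if b = nil then
    NameOf := '-'
  else
    NameOf := b^.info
end;

(* Lists each node of the binary tree with its two children, in preorder. *)
procedure Show(b: binptr);
begin
  if b <> nil then
    begin
      writeln(b^.info, ': lchild=', NameOf(b^.lchild),
              ' rchild=', NameOf(b^.rchild));
      Show(b^.lchild);
      Show(b^.rchild)
    end
end;

begin
  (* Build the example tree A(B, C(D), E(F, G(H))). *)
  a := MakeNode('A'); c := MakeNode('C');
  e := MakeNode('E'); g := MakeNode('G');
  a^.child[1] := MakeNode('B'); a^.child[2] := c; a^.child[3] := e;
  c^.child[1] := MakeNode('D');
  e^.child[1] := MakeNode('F'); e^.child[2] := g;
  g^.child[1] := MakeNode('H');
  Show(Convert(a, nil))
end.
```

The listing printed agrees with the binary tree drawn above: for instance C has left child D (its leftmost child in the general tree) and right child E (its next sibling).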
G. Performing traversals on a general tree represented by an equivalent
binary tree.
1. Preorder traversal of the general tree is accomplished by preorder
traversal of the transformed tree.
ex: preorder traversal of the above binary tree: A B C D E F G H
2. Postorder traversal of the general tree is accomplished by INORDER
traversal of the transformed tree.
a. Inorder traversal: traverse the left subtree in inorder
visit the root
traverse the right subtree in inorder
b. ex: the above: B D C F H G E A
c. This works because:
- The left subtree of any node in the transformed tree contains all
the nodes that were descendants of that node in the original
tree. These should be visited first.
- The right subtree of any node in the transformed tree contains
all the nodes that were right siblings (or descendants thereof)
of the node in the original tree. These should be visited after
the node.
3. Postorder traversal of the transformed tree has no relationship to
any meaningful operation on the original tree.
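These three claims can be checked on the binary equivalent of our example tree. The following sketch builds that binary tree directly and runs all three traversals on it. The node type binnode and the helper MakeBin are assumptions made for this illustration, and infotype is taken to be char.

```pascal
program BinaryTraversals;
(* Illustrative sketch only. binnode and MakeBin are hypothetical names
   chosen for this example. *)
type
  binptr = ^binnode;
  binnode = record
    info: char;
    lchild, rchild: binptr
  end;
var
  a, b, c, d, e, f, g, h: binptr;

(* Allocates a binary tree node with the given contents and subtrees. *)
function MakeBin(ch: char; l, r: binptr): binptr;
var p: binptr;
begin
  new(p);
  p^.info := ch;
  p^.lchild := l;
  p^.rchild := r;
  MakeBin := p
end;

(* Visit the root, then the left subtree, then the right subtree. *)
procedure preorder(t: binptr);
begin
  if t <> nil then
    begin
      write(t^.info, ' ');
      preorder(t^.lchild);
      preorder(t^.rchild)
    end
end;

(* Traverse the left subtree, visit the root, traverse the right subtree. *)
procedure inorder(t: binptr);
begin
  if t <> nil then
    begin
      inorder(t^.lchild);
      write(t^.info, ' ');
      inorder(t^.rchild)
    end
end;

(* Traverse the left subtree, then the right subtree, then visit the root. *)
procedure postorder(t: binptr);
begin
  if t <> nil then
    begin
      postorder(t^.lchild);
      postorder(t^.rchild);
      write(t^.info, ' ')
    end
end;

begin
  (* Build the binary equivalent of A(B, C(D), E(F, G(H))) bottom-up:
     lchild is the leftmost child, rchild is the next sibling. *)
  h := MakeBin('H', nil, nil);
  g := MakeBin('G', h, nil);
  f := MakeBin('F', nil, g);
  e := MakeBin('E', f, nil);
  d := MakeBin('D', nil, nil);
  c := MakeBin('C', d, e);
  b := MakeBin('B', nil, c);
  a := MakeBin('A', b, nil);
  write('preorder: ');  preorder(a);  writeln;
  write('inorder: ');   inorder(a);   writeln;
  write('postorder: '); postorder(a); writeln
end.
```

The preorder and inorder lines reproduce the preorder and postorder traversals of the general tree. The postorder line, D H G F E C B A, matches none of the traversal orders of the general tree discussed above.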
H. An equivalent to our ReadTree procedure defined above would be the
following (again assuming that the tree is stored on the disk in
pre-order):
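    procedure ReadTree(var t: nodeptr);
    (* Reads in a (sub)tree from a list of nodes stored on disk. Each node
       is assumed here to have the fields info, lchild and rchild. *)
      var
        i, NoChildren: 0..degree;
        p: nodeptr;
      begin
        new(t);
        readln(infile, t^.info, NoChildren);
        if NoChildren > 0 then
          begin
            (* The first child becomes the left child of t *)
            ReadTree(t^.lchild);
            p := t^.lchild;
            (* Each further child is chained on as a right sibling *)
            for i := 2 to NoChildren do
              begin
                ReadTree(p^.rchild);
                p := p^.rchild
              end;
            p^.rchild := nil (* Not really necessary! *)
          end
        else
          t^.lchild := nil;
        (* Executed in both cases: the caller links in any sibling of t *)
        t^.rchild := nil
      end;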
Copyright ©1999 - Russell C. Bjork