CPS222 Lecture: Binary Trees                      Last revised 1/18/2013

Objectives:

1. To define "binary tree"
2. To introduce traversals on binary trees
3. To introduce the use of binary trees to represent arithmetic expressions

Materials:

1. Excerpts from recursive code for binary tree traversals (to project)
2. Non-recursive inorder traversal code to project
3. Level order traversal code to project
4. Guessing game program - executable and code handout

I. Introduction
-  ------------

   A. In our discussion of general trees and forests, we noted that we can
      represent any general tree/forest by an equivalent binary tree; and we
      saw that operations on a tree can be mapped to equivalent operations
      on the equivalent binary tree.

   B. However, binary trees are of great importance in their own right; and
      it is in this sense that we consider them now.

   C. Definition: A binary tree is a set of nodes that is either empty or
      consists of a root and two disjoint subsets, designated the left
      subtree and the right subtree, each of which is a binary tree.

      1. Example:

                       A
                     /   \
                    B     D
                   / \   / \
                  C         E
                 / \       / \

         - A is the root.  Its subtrees are B..C and D..E
         - B's left subtree is C, and its right subtree is empty
         - both of C's subtrees are empty
         - D's left subtree is empty, and its right subtree is E
         - both of E's subtrees are empty.
         - In drawing the above, I deliberately included pointers to empty
           subtrees.  In practice, the tree can also be drawn as:

                       A
                     /   \
                    B     D
                   /       \
                  C         E

      2. Note: (By our earlier definition of tree) a binary tree is not
         necessarily a tree!  A binary tree can be empty - a tree cannot!

         a. Nonetheless, we use similar terminology for talking about binary
            trees - e.g. "parent", "child" etc.

         b. For clarity, I will sometimes use the term "general tree" to
            distinguish from "binary tree"

      3. Every binary tree has exactly two subtrees - though one or both may
         be empty.

      4.
         Not only are the subtrees of a binary tree ordered; but even if a
         binary tree has only one non-empty subtree, we still designate it
         as either "left" or "right".  Thus, the following two binary trees
         are distinct:

                    A                   A
                   /                     \
                  B                       B

   D. When discussing binary trees, it is useful to define a number of
      special kinds of binary tree.  Unfortunately, the terminology is not
      used consistently from one author to another, so one must be careful
      to be sure one knows what a given writer means!

      1. A proper or strictly binary tree is a binary tree in which every
         node has either two non-empty subtrees or no non-empty subtrees.
         (Some writers include in the definition the requirement that the
         tree itself be non-empty.)

         Ex:        A                   A
                   / \                 / \
                  B   C               B   C
                                         / \
                                        D   E
                                           / \
                                          F   G

         but not:   A
                   / \
                  B

      2. A perfect binary tree (called "full" or "complete" by some writers,
         though our book uses the term "complete" differently, as we shall
         see) is a strictly binary tree in which all the leaves lie on the
         same level

         - OR -

         A binary tree of height 1 (where we use the intuitive definition of
         height, counted in nodes) is perfect.  A binary tree of height h
         (h > 1) is perfect iff its two subtrees are perfect binary trees of
         height h-1.

         Ex:        A                   A
                   / \                /   \
                  B   C              B     C
                                    / \   / \
                                   D   E F   G

         Observe: for a given height, a perfect binary tree contains the
         maximum possible number of nodes.

      3. A complete binary tree (called "almost-complete" by some writers)
         is a binary tree having the following properties:

         a. If the height of the tree is h, then all leaves lie at level h
            or at level h - 1.

         b. If any node has a descendant at level h in its right subtree,
            then all of the leaves in its left subtree are at level h.

         Ex:        A                   A
                   / \                /   \
                  B                  B     C
                                    / \   /
                                   D   E F

         Observe: a perfect binary tree can be converted to a complete, but
         not perfect, tree of the same height by removing some (but not all)
         of the nodes on the lowest level, starting from the right and
         working toward the left.

   E. There are two important theorems about the number of nodes in various
      kinds of binary trees:

      1.
         Thm: In a perfect binary tree of height h, the number of nodes is
              2^h - 1 (if height is measured intuitively in NODES)

         Pf:  Assigned as a homework problem

      2. Thm: In a complete binary tree of height h (measured intuitively),
              there are at least 2^(h-1) nodes and at most 2^h - 1 nodes.

         Pf:  If we delete all the nodes at level h, we end up with a
              perfect tree of height h-1.  By the previous theorem, this
              tree contains 2^(h-1) - 1 nodes.  Since our original tree was
              of height h, we must have deleted at least one node;
              therefore, our original tree had at least 2^(h-1) nodes.

              Again, if our complete tree of height h is also perfect, then
              it contains 2^h - 1 nodes; otherwise, we can add nodes at
              level h to produce a perfect tree of height h, ending up with
              2^h - 1 nodes; therefore, our original complete tree had at
              most 2^h - 1 nodes to begin with.

         Observe: for any non-zero number of nodes, it is always possible to
         construct a complete binary tree.  Also, for a given number of
         nodes, a complete binary tree has the minimum height (though if the
         tree is not perfect there are other arrangements of the bottom
         level that yield equal performance).  Since many algorithms have
         time complexity proportional to the height, this means that
         complete binary trees are optimal for such algorithms.

         Observe: the preceding two theorems give us an important measure of
         the size of a binary tree.  These two theorems tell us that for an
         optimal binary tree of n nodes:

              2^(h-1) <= n <= 2^h - 1

         or:

              h-1 <= ceiling(log2 n) <= h

         (more precisely, h = floor(log2 n) + 1).  That is - for a complete
         binary tree, any algorithm whose time is proportional to the height
         of the tree is O(log n).

II. Operations on Binary Trees
--  ---------- -- ------ -----

   A. One of the most useful operations on a binary tree is traversal.
      Three orders of traversal are of special interest:

      1. Preorder:  visit the root
                    traverse the left subtree in preorder
                    traverse the right subtree in preorder

      2.
         Inorder:   traverse the left subtree in inorder
                    visit the root
                    traverse the right subtree in inorder

      3. Postorder: traverse the left subtree in postorder
                    traverse the right subtree in postorder
                    visit the root

      4. Note that preorder and postorder were also defined for general
         trees.  Inorder pertains only to binary trees, though we did make
         use of it when representing a general tree as a binary tree.

   B. The traversal algorithms are most commonly expressed recursively,
      since they are defined recursively.

      PROJECT Code for binary tree operations

      1. Node class

      2. preorder traversal

         What would you have to do to change this to inorder or postorder?

         ASK

         - change names
         - change relative order of visit and recursion

   C. These operations can also be expressed in non-recursive form.

      1. Ex: inorder - PROJECT code for non-recursive inorder

      2. Non-recursive preorder traversal is very similar.

      3. Non-recursive postorder traversal is somewhat more complex.

   D. Another traversal that is sometimes of interest is level-order.  For
      this, we use a non-recursive algorithm with a queue:

      1. PROJECT code for level-order

      2. To see that this algorithm works correctly, note the following:

         a. Nodes are visited in the order in which they are inserted into
            the queue.

         b. All the nodes at level L are inserted in the queue - in level
            order - before any nodes at level L+1 are inserted in the queue.
            This can be shown inductively:

            i.  Basis: the node at level 0 (the root) is inserted in the
                queue before the nodes at level 1 are inserted.

            ii. Hypothesis: assume that for all levels L <= some k (k >= 0)
                it is true that all the nodes at level L are inserted in the
                queue in level order before any node at level L+1 is
                inserted.

                We wish to show that it is also true that all the nodes at
                level k + 1 are inserted in the queue in level order before
                any node at level k+2 is inserted.

                Proof: as each node at level k is visited, its children (at
                most two) at level k+1 are inserted in the queue in level
                order.
                When all the nodes at level k have been visited, all the
                nodes at level k+1 have been inserted in level order; but no
                node at level k+1 has been visited; therefore, no node at
                level k+2 has yet been inserted.  QED

III. Uses of binary trees
---  ---- -- ------ -----

   A. We have already seen that any general forest or tree can be
      represented by an equivalent binary tree, and that when a linked
      structure is used, this binary tree is more space-efficient because it
      has fewer wasted null pointers.

   B. A very important use of binary trees is in representing arithmetic or
      logical expressions.  Such a tree is called an expression tree.  For
      example, the expression:

         A * (B + C) / D - E

      can be represented by the tree:

                      -
                     / \
                    /   E
                   / \
                  *   D
                 / \
                A   +
                   / \
                  B   C

      (The left child of the root is the division operator "/".)

      1. Observe that in an expression tree, the internal nodes are
         operators and the external nodes are operands.  The subtrees are
         subexpressions.

      2. Observe further that:

         a. Traversing the tree in inorder yields the infix form of the
            expression - though parentheses may be needed to show operator
            precedence.

            Ex: the above: A*B+C/D-E

         b. Traversing the tree in preorder yields the prefix form of the
            expression.

            Ex: the above: -/*A+BCDE

         c. Traversing the tree in postorder yields the postfix form of the
            expression.

            Ex: the above: ABC+*D/E-

      3. The tree form of an expression can obviously be used for conversion
         from one form to another - e.g. create the tree from infix, then
         traverse it in postorder to yield postfix.  But it has many other
         uses, as well:

         a. In an optimizing compiler, the tree form of an expression can be
            used for optimization.  For example, if the same subtree occurs
            more than once, it can be evaluated once and stored; then the
            resulting value can be plugged in wherever the common
            subexpression occurred.

            ex:        *
                     /   \
                    +     +
                   / \   / \
                  A   B A   B

         b. In an interpreter, an expression can be stored in tree form.
            Whenever the expression is to be evaluated, the tree can be
            traversed in postorder, with the current values of the various
            operands being substituted for the terminal nodes.

   C. Decision trees

      1. Some classification or diagnosis problems can be solved by a
         protocol based on yes-no questions.  Such a problem can be modeled
         by a binary tree in which each non-leaf node represents a question,
         with the left subtree being the process to be followed if the
         answer is "no" and the right if the answer is "yes".  Leaf nodes
         represent conclusions.

      2. Example - a simple guessing game program.

         a. DEMO

         b. HANDOUT code

   D. We will see later that a very nice sorting algorithm - called
      heapsort - is based on a special kind of binary tree.

   E. A particular kind of binary tree that is often useful is a BINARY
      SEARCH TREE (BST).  (In fact, sometimes people mistakenly think that
      all binary trees are BSTs - not so!)  We will look at this shortly.
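
Sketch of the traversals from section II.  The projected course code is not
reproduced in these notes, so what follows is only an illustrative Python
version written under assumptions: the names Node, preorder, inorder,
postorder, inorder_iterative and level_order are invented for this sketch,
and each function returns a list of visited values instead of printing.  The
sample tree is the five-node example from section I.C.

```python
from collections import deque


class Node:
    """Binary tree node (hypothetical name); None stands for an empty subtree."""

    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right


def preorder(t):
    # visit the root, then the left subtree, then the right subtree
    if t is None:
        return []
    return [t.value] + preorder(t.left) + preorder(t.right)


def inorder(t):
    # left subtree, then the root, then the right subtree
    if t is None:
        return []
    return inorder(t.left) + [t.value] + inorder(t.right)


def postorder(t):
    # left subtree, then the right subtree, then the root
    if t is None:
        return []
    return postorder(t.left) + postorder(t.right) + [t.value]


def inorder_iterative(t):
    # Non-recursive inorder: an explicit stack replaces the call stack.
    out, stack = [], []
    while stack or t is not None:
        while t is not None:      # run down the left spine, saving nodes
            stack.append(t)
            t = t.left
        t = stack.pop()           # leftmost node not yet visited
        out.append(t.value)
        t = t.right               # then traverse its right subtree
    return out


def level_order(t):
    # Nodes are visited in the order in which they enter the queue.
    out = []
    q = deque([t] if t is not None else [])
    while q:
        node = q.popleft()
        out.append(node.value)
        if node.left is not None:
            q.append(node.left)
        if node.right is not None:
            q.append(node.right)
    return out


# Example tree from section I.C: A has children B and D,
# C is the left child of B, E is the right child of D.
example = Node("A", Node("B", Node("C")), Node("D", None, Node("E")))
```

In this sketch, inorder and postorder differ from preorder only in where the
visit of the root falls relative to the two recursive calls, which is the
point of the ASK in section II.B.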
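
Sketch of the expression tree from section III.B.  This is an illustrative
Python version, not code from the course: the names Node, OPS, prefix,
postfix, infix and evaluate are assumptions, operands are single letters, and
operand values are supplied in a dictionary (here called env).  It builds the
tree for A * (B + C) / D - E, produces the three forms by traversal, and
evaluates the tree in postorder as an interpreter might.

```python
import operator

# Binary operators allowed in internal nodes (an assumption of this sketch).
OPS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}


class Node:
    """Expression tree node: operators are internal nodes, operands are leaves."""

    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right

    def is_leaf(self):
        return self.left is None and self.right is None


def prefix(t):
    # preorder traversal yields the prefix form
    if t is None:
        return ""
    return t.value + prefix(t.left) + prefix(t.right)


def postfix(t):
    # postorder traversal yields the postfix form
    if t is None:
        return ""
    return postfix(t.left) + postfix(t.right) + t.value


def infix(t):
    # inorder traversal, fully parenthesized so that precedence is explicit
    if t.is_leaf():
        return t.value
    return "(" + infix(t.left) + t.value + infix(t.right) + ")"


def evaluate(t, env):
    # postorder evaluation: both operands are computed before the operator
    if t.is_leaf():
        return env[t.value]
    return OPS[t.value](evaluate(t.left, env), evaluate(t.right, env))


# A * (B + C) / D - E
expr = Node("-",
            Node("/",
                 Node("*", Node("A"), Node("+", Node("B"), Node("C"))),
                 Node("D")),
            Node("E"))
```

For example, evaluate(expr, {"A": 2, "B": 3, "C": 4, "D": 7, "E": 1}) computes
2 * (3 + 4) / 7 - 1.  The infix function adds parentheses around every
subexpression, which is one simple way to handle the precedence problem noted
in III.B.2.a.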
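
Sketch checking the node-count bounds from section I.E.  This is only a
numeric illustration in Python, with function names (perfect_node_count,
complete_tree_height) invented for the purpose.  Height is measured in nodes,
as in the theorems, and the height of a complete tree of n nodes is taken to
be floor(log2 n) + 1, which Python's int.bit_length computes directly for
n >= 1.

```python
def perfect_node_count(h):
    """Number of nodes in a perfect binary tree of height h (in nodes)."""
    return 2 ** h - 1


def complete_tree_height(n):
    """Height (in nodes) of a complete binary tree with n >= 1 nodes."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # For n >= 1, n.bit_length() equals floor(log2 n) + 1.
    return n.bit_length()


def within_bounds(n):
    """Check 2^(h-1) <= n <= 2^h - 1 for the height h computed above."""
    h = complete_tree_height(n)
    return 2 ** (h - 1) <= n <= perfect_node_count(h)
```

For instance, 7 nodes fit in a tree of height 3 (which is then perfect), while
an 8th node forces height 4.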