CPS222 Lecture: Lists                                 Last revised 12/17/14

Objectives:

1. To introduce the data type sequence (ordered list).
2. To show how sequences can be implemented by arrays, vectors, or linked
   lists.
3. To introduce the representation of matrices as arrays of two or more
   dimensions.

I. Sequences
-  ---------

   A. Many of the interesting "standard" abstract data types are variations
      on the theme of a sequence.

   B. A sequence is a group of items that has the following basic
      properties:

      1. Either the sequence is empty, or it has a unique first item and a
         unique last item. (If it consists of exactly one item, these two
         are the same; otherwise they are different.)

      2. Each item, except the last, has a unique successor.

      3. If you start with the first item and apply the successor operation
         repeatedly, you will eventually visit each item exactly once,
         ending at the last item.

      4. We may also want to define a predecessor operation analogous to
         successor:

         a. Each item, except the first, has a unique predecessor.

         b. If you start with the last item and apply the predecessor
            operation repeatedly, you will eventually visit each item
            exactly once, ending at the first item.

         c. Successor and predecessor are inverses - i.e. if B is the
            successor of A, then A is the predecessor of B, and vice versa.

      5. We may want to be able to access items by relative position in the
         list (with position 0 being the first).

   C. For a sequence, we have the following set of values and set of
      potential operations. Note that, for different kinds of sequences, we
      may be interested in only a subset of these operations:

      Values:     { all sequences of items (of some type) }

      Operations: { add a new item at a specified position - interesting
                      special cases are beginning, end, or a specific
                      numbered position
                    access the item at a specified position - same options
                      as above
                    delete the item at a specified position - same options
                      as above
                    determine whether the sequence is empty
                    obtain the successor of a given item
                    obtain the predecessor of a given item }

II. Representations for Sequences
--  --------------- --- ---------

   A. There are two basic alternatives for implementing a sequence: using
      an array (or a variant known as a vector), or using a linked list.

   B. Arrays

      1. Since arrays are supported directly in almost all programming
         languages, they are an attractive representation for sequences.

         a. In an array, LOGICAL ADJACENCY (B follows A) is modelled by
            PHYSICAL ADJACENCY (B occurs just after A in memory).

         b. If the sequence is allowed to grow or shrink over time, we might
            also store a count of the number of items along with the actual
            array of items, which would be allocated with extra space to
            allow for growth.

         c. In C/C++, an array is declared by a declaration of the form
            <type> <name>[<size>], which both declares the array and
            allocates the needed storage.

            i. Example:

               int n[100];   // declares n to refer to an array of integers
                             // and allocates storage for 100 integers.

           ii. Contrast this with Java, where declaring an array and
               allocating its storage are two distinct steps - e.g.

               int n[];
               n = new int[100];

          iii. An array element is accessed by subscript - e.g. n[i] is
               element number i of the array. (Subscripts are 0-origin, as
               in Java, so n[0] is the first element and n[99] the last.)

           iv. A potential source of errors in C/C++ programs is that array
               subscripts are not checked for legitimacy. For example, given
               the above declaration of n as an array of 100 ints, it would
               be possible to refer to n[200] - which would access a storage
               location outside the array, quite possibly one belonging to
               some other variable. Storing a value into this location could
               result in a hard-to-find error!

      2. With an array representation of a sequence, certain operations are
         very easy:

         a. Accessing an item at an arbitrary position. If the items in the
            sequence are numbered 0, 1, ...
            and we know the address in memory of the first item, then the
            address of item i is

               (address of first item) + i * (size of an item)

            Example: Given the array declaration

               int n[100];

            and assuming that the array n starts at location 1000 in memory
            and an int occupies 4 bytes of memory, n[10] is at location

               1000 + 10 * 4 = 1040

         b. Obtaining the successor or predecessor of an item. If we know the
            address of a particular item, then its successor is at address

               (address of current item) + (size of an item)

            and its predecessor is at address

               (address of current item) - (size of an item)

         c. Adding a new last item (assuming there is room for one more item
            in the array):

            - Store the item at address
              (address of first item) + (item count) * (size of an item)
            - Increment the item count.

         d. Deleting the last item:

            - Decrement the item count. (The old value is still stored in
              memory, but is no longer considered part of the sequence.)

         All of the above are O(1).

      3. With an array representation of a sequence, certain operations are
         relatively hard:

         a. Adding a new item at an arbitrary position (or at the beginning)
            entails moving all the items currently at the same or
            higher-numbered positions up one slot.

         b. Deleting an item at an arbitrary position (or at the beginning)
            entails moving all the items currently at higher-numbered
            positions down one slot.

         The above are O(n), where n is the number of items in the sequence.

      4. Many programming languages (including C/C++) support creating arrays
         with two or more dimensions. (A two-dimensional array is often used
         for modeling mathematical matrices.) Though these are not sequences
         in the sense we have been discussing, we mention them briefly here.
         You will use a matrix in your "Game of Life" project.

         C/C++ Example

         a. Declaration

            float x[10][20];  // Declares x to be a matrix of floats.
                              // The matrix has 10 rows and 20 columns.
                              // Allocates storage for 200 floats.

         b. Access

            x[i][j] refers to the element in row i and column j.

   C. Vectors

      1. When we create an array, we must specify how many items it may
         contain. If the sequence grows larger than this, we typically have
         to move the entire array to some new, larger location in memory,
         since the memory allocator will usually have put other variables
         immediately after the space we reserved for the array. This is a
         non-trivial O(n) exercise at best - and may not even be easily
         possible. For this reason, we may be tempted to allocate more than
         enough memory to accommodate the potential growth of the sequence -
         which leads either to wasted memory or to an unpleasant surprise
         when we discover we guessed too small!

      2. Many languages provide a variant, typically known as a vector, which
         can be resized at any time - though increasing the size can take
         O(n) time, because the vector is implemented by an array that may
         need to be copied to a new, larger location in storage.

   D. Linked lists

      1. The use of a linked implementation of a sequence typically requires
         that the programming language support variables of pointer or
         reference type - which we will discuss in the next lecture.

      2. The fundamental idea is that we abandon the notion of modelling
         logical adjacency by physical adjacency. Instead, we associate with
         each item an explicit LINK - the address in memory of its successor.

         EXAMPLE: We often represent linked lists using a "box and arrow"
         notation. (The "ground" symbol at the end indicates that the last
         node has no successor.)

            -----   -----   -----
            | A |   | B |   | D |
            | o-|-->| o-|-->| o-|--
            -----   -----   ----- |
                                 ---
                                  -

         (It is common to refer to the individual boxes as NODES.)

         Form the class into a list linked in alphabetical order, with each
         person pointing to the next.

      3. With a linked representation of a sequence, certain operations are
         very easy:

         a. Adding a new item at an arbitrary position is a matter of
            readjusting links - assuming we know the node that is to be its
            predecessor.

            EXAMPLES: Show the modifications to the above drawing needed to
            insert a node containing "C" just after "B".

            Show the process of inserting a new person into the class list.

         b. Deleting an item at an arbitrary position is a matter of
            readjusting links - assuming we know its predecessor.

            EXAMPLE: Show the modifications to the above drawing needed to
            delete the node containing "B".

            Show the process of deleting a person from the class list.

         c. Accessing the successor of an item involves simply following its
            link.

         The above pointer operations are O(1).

      4. With a linked representation of a sequence, certain operations are
         relatively hard:

         a. Accessing an item at an arbitrary position entails starting at
            the beginning and following links (traversing the list) the
            required number of steps - e.g. to access item 10, we start at
            the beginning (item 0) and follow links 10 times. This is an O(n)
            operation.

            (Note that this may also be part of the cost of adding or
            deleting an item at an arbitrary position, since we need access
            to its predecessor - unless we are already there for some
            reason.)

         b. Accessing the predecessor of an item entails starting at the
            beginning of the list and following links until we find a node
            whose successor is the one we want - i.e. the links are "one-way
            streets". This is an O(n) operation. (This can be avoided by
            maintaining a doubly-linked list, in which each node has two
            links - one to its successor and one to its predecessor.)

      5. Provided memory is not totally full, it is easy to grow the sequence
         by allocating a new node and linking it in at the right place. There
         is no need to specify a size up front.

   E. You should already be quite familiar with working with arrays. In the
      next lecture, we turn to implementing linked lists, using C++.
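The address arithmetic in II.B.2.a and II.B.2.b can be checked directly in C++. The sketch below is illustrative only: byteOffset, successor, predecessor and demo are hypothetical helper names. Rather than assuming an int occupies 4 bytes, it compares against 10 * sizeof(int), which equals the 40 bytes of the lecture's example only on machines where an int is 4 bytes.

```cpp
#include <cassert>
#include <cstddef>

// Byte distance from a[0] to a[i], measured from the addresses the compiler
// actually assigned. The lecture's formula predicts i * (size of an item).
std::ptrdiff_t byteOffset(const int a[], std::size_t i) {
    const char* first = reinterpret_cast<const char*>(&a[0]);
    const char* ith = reinterpret_cast<const char*>(&a[i]);
    return ith - first;
}

// In C++, adding 1 to an int* advances by sizeof(int) bytes, so
// "(address of current item) + (size of an item)" is simply p + 1.
// (Calling predecessor on the first element would point before the
// array, which is not allowed.)
const int* successor(const int* p) { return p + 1; }
const int* predecessor(const int* p) { return p - 1; }

// Checks the lecture's example with int n[100], using the real size of an
// int instead of assuming it is 4 bytes.
void demo() {
    int n[100] = {};
    assert(byteOffset(n, 10) == static_cast<std::ptrdiff_t>(10 * sizeof(int)));
    assert(successor(&n[10]) == &n[11]);
    assert(predecessor(&n[10]) == &n[9]);
}
```

In C++ the "(size of an item)" factor is supplied by the compiler: pointer arithmetic on an int* is already scaled by sizeof(int), which is why successor is written as p + 1 rather than p + 4.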
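The easy and hard array operations of II.B.2 and II.B.3 can be sketched as a small C++ class built on an array plus an item count, as in II.B.1.b. ArraySequence, its method names, and its fixed capacity of 100 are assumptions made for this example, not a standard library type; the assert calls stand in for whatever error handling a real program would use.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical fixed-capacity sequence of ints: an array allocated with
// extra room plus a count of the items currently in use.
// The capacity of 100 is an arbitrary assumption for this sketch.
class ArraySequence {
public:
    static constexpr std::size_t CAPACITY = 100;

    bool isEmpty() const { return count_ == 0; }
    std::size_t size() const { return count_; }

    // O(1): the item at position pos is found by address arithmetic.
    int get(std::size_t pos) const {
        assert(pos < count_);
        return items_[pos];
    }

    // O(1): store at position count_, then increment the count.
    void addLast(int item) {
        assert(count_ < CAPACITY);   // a plain array cannot grow
        items_[count_] = item;
        ++count_;
    }

    // O(1): decrement the count; the old value stays in memory but is
    // no longer part of the sequence.
    void removeLast() {
        assert(count_ > 0);
        --count_;
    }

    // O(n): move items at positions pos..count_-1 up one slot, working
    // from the end so nothing is overwritten before it has been moved.
    void insertAt(std::size_t pos, int item) {
        assert(count_ < CAPACITY && pos <= count_);
        for (std::size_t i = count_; i > pos; --i)
            items_[i] = items_[i - 1];
        items_[pos] = item;
        ++count_;
    }

    // O(n): move items at positions pos+1..count_-1 down one slot.
    void removeAt(std::size_t pos) {
        assert(pos < count_);
        for (std::size_t i = pos; i + 1 < count_; ++i)
            items_[i] = items_[i + 1];
        --count_;
    }

private:
    int items_[CAPACITY];
    std::size_t count_ = 0;
};
```

For example, starting from 10 30, insertAt(1, 20) shifts 30 up one slot to give 10 20 30, and removeAt(0) on that result shifts 20 and 30 down to give 20 30. Only get, addLast and removeLast avoid touching other items, which is what makes them O(1).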
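C and C++ store a multi-dimensional array such as float x[10][20] (II.B.4) in row-major order: all of row 0, then all of row 1, and so on. The address formula of II.B.2.a therefore extends to (address of x[0][0]) + (i * 20 + j) * (size of a float). The hedged sketch below compares that prediction with the addresses the compiler actually assigns; ROWS, COLS, elementOffset and predictedOffset are hypothetical names chosen for illustration.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t ROWS = 10;
constexpr std::size_t COLS = 20;

// Byte distance from the start of the matrix to x[i][j], measured from the
// addresses the compiler actually assigned.
std::ptrdiff_t elementOffset(const float (&x)[ROWS][COLS],
                             std::size_t i, std::size_t j) {
    assert(i < ROWS && j < COLS);   // C/C++ will not check this for us
    const char* start = reinterpret_cast<const char*>(&x);
    const char* elem = reinterpret_cast<const char*>(&x[i][j]);
    return elem - start;
}

// Row-major prediction: skip i complete rows of COLS floats, then j more.
std::ptrdiff_t predictedOffset(std::size_t i, std::size_t j) {
    return static_cast<std::ptrdiff_t>((i * COLS + j) * sizeof(float));
}
```

Because each row is itself a contiguous array of 20 floats, the element just after x[0][19] in memory is x[1][0]; this is worth remembering when a "Game of Life" loop walks off the end of a row.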
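To see the vector growth described in II.C.2, the C++ sketch below appends items to a std::vector<int> and counts how often its capacity() changes; each change means a new, larger array was allocated and any existing items were copied or moved into it. countReallocations is a hypothetical helper; push_back, reserve and capacity are standard std::vector members. The exact count depends on the library's growth policy, which the C++ standard leaves to the implementation, so no particular count is claimed for the case without reserve.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Appends n ints to an empty std::vector<int> and counts how many times its
// capacity changed, i.e. how many times a new, larger underlying array had
// to be allocated (the very first allocation is counted too).
// If reserveFirst is true, room for all n items is requested up front,
// much like declaring a generously sized array.
std::size_t countReallocations(std::size_t n, bool reserveFirst) {
    std::vector<int> v;
    if (reserveFirst)
        v.reserve(n);
    std::size_t reallocations = 0;
    std::size_t lastCapacity = v.capacity();
    for (std::size_t k = 0; k < n; ++k) {
        v.push_back(static_cast<int>(k));
        if (v.capacity() != lastCapacity) {
            ++reallocations;             // items now live in a new array
            lastCapacity = v.capacity();
        }
    }
    assert(v.size() == n && v.capacity() >= n);
    return reallocations;
}
```

Calling reserve(n) first corresponds to allocating a generously sized array up front: the standard guarantees no reallocation until the size would exceed the reserved capacity, so countReallocations(n, true) returns 0. Without it, the standard requires push_back to take amortized constant time, which common implementations achieve by growing the capacity by a constant factor; an individual push_back can still cost O(n) when it triggers a copy.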
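The next lecture develops linked lists in C++ properly, but a minimal sketch of the operations in II.D.3 and II.D.4 may help connect them to the box-and-arrow drawing. The Node struct and the functions insertAfter, removeAfter, nodeAt, predecessorOf and freeList are hypothetical names for this illustration; the sketch assumes a singly-linked list of strings managed with raw new and delete, and uses nullptr for the missing successor of the last node.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical node type: an item plus a link to its successor
// (nullptr for the last node, the "ground" symbol in the drawing).
struct Node {
    std::string item;
    Node* next;
};

// O(1): link a new node in just after pred (we must already know pred).
Node* insertAfter(Node* pred, const std::string& item) {
    assert(pred != nullptr);
    Node* n = new Node{item, pred->next};   // new node points at pred's old successor
    pred->next = n;                         // pred now points at the new node
    return n;
}

// O(1): unlink and free the node just after pred.
void removeAfter(Node* pred) {
    assert(pred != nullptr && pred->next != nullptr);
    Node* doomed = pred->next;
    pred->next = doomed->next;
    delete doomed;
}

// O(n): follow links pos times starting from the first node (item 0).
// Returns nullptr if pos equals the length of the list.
Node* nodeAt(Node* first, std::size_t pos) {
    Node* p = first;
    for (std::size_t i = 0; i < pos; ++i) {
        assert(p != nullptr);
        p = p->next;
    }
    return p;
}

// O(n): links are one-way streets, so scan from the front for the node
// whose successor is target. Returns nullptr if target is the first node.
Node* predecessorOf(Node* first, const Node* target) {
    for (Node* p = first; p != nullptr; p = p->next)
        if (p->next == target)
            return p;
    return nullptr;
}

// Frees every node in the list.
void freeList(Node* first) {
    while (first != nullptr) {
        Node* next = first->next;
        delete first;
        first = next;
    }
}
```

Starting from the A, B, D list in the drawing, insertAfter(b, "C") sets only two links, and removeAfter(a) deletes B by redirecting A's link to C; nodeAt and predecessorOf, by contrast, must walk the list from the front.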