CS322: Multiple CPU Systems: Multiprocessing, Distributed Processing, Networks

Introduction

  1. Multiple CPU systems are becoming more common as CPU costs decrease

    We have previously noted that until recently the cost of CPU's generally dictated that computer systems be built around a single CPU. Now, however, CPU's have become relatively cheap, and various configurations involving multiple CPU's are common. In this session, we want to briefly survey some of the issues that arise in such systems.

  2. First, we need to classify multiple-CPU systems:

    1. Tightly vs. loosely coupled systems

      The most basic distinction is between tightly coupled and loosely coupled systems:

      1. In a tightly coupled system, 2 or more CPU's share a common clock, share some or all memory in common, and run under the control of a single operating system. Obviously, in such a configuration the CPU's are physically located at the same site. This is often referred to as multiprocessing. The primary justifications for the use of multiple CPU's are:

        • To obtain increased computing power.

        • Increased reliability: if one CPU goes down, operation can continue at reduced throughput. ("Graceful degradation" or "fail soft").

        • Incremental expansion: adding a CPU, rather than replacing the existing CPU with a larger model.

        • To allow for specialization: one CPU may be a number cruncher while another is tailored for IO.

      2. In a loosely coupled system, 2 or more CPU's may be located at some distance from one another, and communicate by passing messages over some sort of communication link. The primary justifications for such configurations are:

        • Sharing of resources between multiple sites. These resources can include:

          • Specialized hardware such as fast printers, high power processors.
          • Software.
          • Data.
        • Making the computing facility better reflect the structure of the organization it serves. For example, order entry may be done on a computer system at the branch office where the order is received; inventory processing may be done at the warehouse; and billing etc. may be done at the home office or back at the branch.

          • In the early days of computerization, large central systems tended to concentrate power in ways that were not necessarily consistent with organizational policy.

          • More recently, dispersion of small systems has caused trends to the opposite extreme.

          • Distributed computing is seen as a way to achieve the best possible mix of centralization and decentralization.

        • Reliability, as above, though this requires some duplication of resources and so tends to go slightly against the desire for resource sharing.

        • Incremental expansion, as above.

        • Communication, as via electronic mail (email).

    2. Symmetrical vs. asymmetrical systems

      Within the broad category of tightly coupled systems, we may further distinguish between symmetrical systems, in which all processors are physically similar, and asymmetric systems, in which the processors may be specialized and/or may perform specialized roles.

    3. Further distinctions between types of loosely coupled systems

      Within the broad category of loosely coupled systems, further distinctions are possible, though terminology is not always uniformly applied.

      1. The term distributed computing can be used in a technical sense to refer to a configuration in which an array of computers at various sites is interconnected in such a way as to mirror the organizational structure of an organization.

        • Various steps in processing may be done at different sites. (The order processing example above.) When the bulk of the actual processing is done at a central site based on data entry at remote sites, this is sometimes called remote job entry (RJE).

        • Databases may be distributed. For example, a branch office may maintain the records on the customers it serves, with the home office still having access to them. Conversely, the home office may maintain the database on inventory and pricing of the product line, which data can also be accessed in the field.

      2. The term local area network (LAN) refers to a configuration in which a number of computer systems in close physical proximity (typically one building or adjacent buildings) share resources such as printers or a central database.

      3. The term computer network or wide area network (WAN) refers to a configuration in which computer systems dispersed over a wide geographic area are interconnected to provide facilities such as communication and, in some cases, remote login. Note that, unlike the previous two cases, such networks often cut across organizational lines.

Some comments about tightly-coupled systems (multiprocessors):

  1. Many "conventional" computers are in fact asymmetric multiprocessors

    As we have noted, such systems can be symmetric or asymmetric. In point of fact, many systems that are thought of as a single computer may be a form of asymmetric multiprocessor, e.g.:

    1. IO channels found on large mainframes are programmable and can be thought of as processors specialized for IO.

    2. Some mainframe systems incorporate conventional minicomputers as IO processors:

      1. "Front-end" processors to service the requirements of terminals (echoing, line editing etc.)

      2. "Back-end" database processors.

    3. Microprocessors in the system console that manage system startup, shutdown, and failure diagnosis.

    4. In such systems, all processors except the main CPU are dedicated to specific tasks and run software dedicated to that task, perhaps without any true operating system. Therefore, from an operating system standpoint they are relatively straightforward; the central CPU operating system sees the dedicated processors as very smart peripherals and communicates with them as such.

  2. Symmetric systems have some interesting features not found in asymmetric systems.

    1. Efficiency

      First, an observation about efficiency. One of the prime goals of symmetric systems is increased computational power. However, the communication overhead involved means that one cannot double computing power simply by going from one CPU to two. Experimental data indicates that 2 CPU's have roughly 1.8 times the power of 1, and 3 CPU's have roughly 2.1 times the power of 1 - though some systems have achieved better results (e.g. 4 CPU's with 3.6 times the power of 1.)

      The chief villain - but not the only one - is contention for access to shared memory.

    2. There is the question of the locus of operating system kernel functions.

      1. Master-slave: one CPU manages kernel functions such as CPU scheduling and IO. This is simpler to implement, but can lead to a bottleneck; and if the master CPU goes down then the whole system goes down (cf goal of graceful degradation.)

      2. Totally symmetric: any CPU can execute operating system kernel code. Of course, this raises important synchronization questions: while one CPU is executing the OS kernel, other CPU's must be locked out of doing so, and probably must simply busy wait (since scheduling is a kernel function!) This argues for keeping the kernel as small as possible.
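      The busy-wait lockout just described can be sketched in Python. The `test_and_set` helper below is a stand-in for the hardware atomic instruction a real kernel would use, and the thread and iteration counts are arbitrary illustration:

```python
import threading

held = False                 # simulated kernel-lock flag
guard = threading.Lock()     # stands in for hardware atomicity

def test_and_set():
    """Atomically read the flag and set it; return the old value."""
    global held
    with guard:
        old, held = held, True
        return old

def release():
    global held
    held = False

counter = 0                  # shared "kernel" data

def run_kernel_code():
    global counter
    for _ in range(1000):
        while test_and_set():   # busy wait: spin until the lock is free
            pass                # (no blocking - scheduling IS a kernel function)
        counter += 1            # critical section: kernel code
        release()

threads = [threading.Thread(target=run_kernel_code) for _ in range(3)]
for t in threads: t.start()
for t in threads: t.join()
```

Since only one thread at a time gets past the spin loop, the three threads' 1000 increments each cannot be lost.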

    3. There is the question of migration of processes.

      1. A given process could be scheduled on only one CPU

        In the simplest case, once a given process starts executing on a certain CPU, it continues to run only on that CPU.

        • Apportioning of workload among CPU's is an issue only when a new process is created.

        • This allows each CPU to have a private memory for the processes it is currently running, plus access to a shared memory that all of the CPU's use for functions like communication and scheduling - which reduces contention for shared memory.

        • This can lead to load imbalance, with one CPU very busy while others are idle.

      2. A given process could be scheduled on more than one CPU

        Alternately, a process may run on any CPU that is free to run it. This generally requires that processes reside in shared memory, and thus leads to increased memory contention problems; but it allows better use of CPU cycles. (Note: the memory contention problem can be greatly reduced if each CPU has its own private cache memory, since the majority of memory operations will then be performed in the cache rather than in shared memory.)

Some comments about loosely-coupled (distributed/network) systems:

  1. Includes a very broad family of systems

    We can only make very general comments. Some key issues:

    1. Communication configuration.

    2. Routing of messages.

    3. Message security.

    4. Time-ordering of events.

    5. Control of access to critical sections in a distributed environment.

    6. Deadlock handling.

    7. Methods for maintaining network integrity in the face of possible failure of sites and/or links.

  2. Communication configuration:

    1. Types of network connections

      Clearly, every site in a network must be able to communicate with each other site in the network by some means. Depending on the goals of the network, there are a number of viable options:

      1. Fully connected is usually neither necessary nor feasible.

      2. Partially connected is suitable for networks whose primary functions are communication and remote login. To minimize the possibility of the network being partitioned by the failure of a link, it is desirable for each site to connect to at least two others, and to ensure that no one link is on all the paths between any pair of sites. Note that techniques from graph theory (cf CS321) can be used to find minimal cost interconnection patterns etc. Networks such as the Internet use this approach.

      3. Tree structured or hierarchical is suitable for distributed computation within a hierarchically structured organization. The root of the tree would be the home office, the level 2 nodes the regional offices etc. This is advantageous, since the structure of the computer system mirrors corporate organization.

      4. The star configuration can be thought of as a special case of hierarchical. It can be used for distributed systems, and is also useful in LAN's where the central site may be a file server dedicated to providing shared access to files, perhaps coupled with major shared resources such as high-speed printers. Also, within a larger network, a series of stars might service a number of sites within close geographic proximity, with long distance communication service being provided only from star to star. (Networks such as the Internet make use of this approach.)

      5. Ring structures are useful for networks whose primary function is communication; but they may impose too much delay for other functions. (Note that on a one-way ring a message may have to pass through n nodes and on a two-way ring through n/2 nodes, worst case.) IBM's LAN architecture is based on this model.

      6. Bus structures differ from the others in that a link connects several sites, rather than just two. This means that a site cannot transmit over the link whenever it wants to, but must instead wait until the link becomes free.

        • Contrast CB radio with ordinary telephones.

        • Such systems must deal with the possibility of collision: if two sites are both waiting for the link to become free, they might both begin transmitting at the same time when the previous user finishes. To deal with this, a site must listen for a collision when it is transmitting, and if it detects one must stop transmitting, wait for a while, then start again. To prevent repeated collisions, each site waits for a random time before restarting.

        • An alternate approach to collisions, if the bus is logically (but not necessarily physically) structured as a ring, is token passing.

        Bus structures are common for LAN's, but not for longer-distance networks.

        A prime example is Ethernet, which underlies the LAN network of a number of manufacturers

        Example: At Gordon, FAITH, HOPE, and the minor prophets cluster are connected to an Ethernet that also has terminal servers and our Internet connection on it.
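        The collision-handling cycle described above (listen while sending, stop on a collision, wait a random time, retry) can be sketched as a slotted-time simulation in Python. The slot model, station count, and backoff range are illustrative assumptions, not Ethernet's actual timing:

```python
import random

def csma_cd(n_stations, rng):
    """Slotted-time sketch of listen/collide/back-off on a shared bus.
    Returns the slot at which each station finally transmits alone."""
    next_try = [0] * n_stations       # earliest slot each station will try
    success = [None] * n_stations     # slot of each successful transmission
    attempts = [0] * n_stations
    slot = 0
    while None in success:
        ready = [i for i in range(n_stations)
                 if success[i] is None and next_try[i] <= slot]
        if len(ready) == 1:           # sole transmitter: no collision
            success[ready[0]] = slot
        elif len(ready) > 1:          # collision detected while transmitting
            for i in ready:
                attempts[i] += 1
                # stop, wait a random number of slots, then retry
                next_try[i] = slot + 1 + rng.randrange(2 ** min(attempts[i], 10))
        slot += 1
    return success

slots = csma_cd(3, random.Random(42))
```

Because the random waits grow with repeated collisions, the stations eventually separate and each transmits in a distinct slot.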

  3. Routing of messages

    1. Two issues:

      1. How do we choose a path?

        If there exists more than one path between a message's source and destination, then some choice must be made as to the path to use.

      2. Do we packetize the messages?

        Are messages sent as complete entities all at one time, or are they broken up into smaller pieces?

    2. With regard to the first question, the major options are:

      1. Fixed routing: a message between A and B always follows a fixed, predetermined route.

      2. Virtual circuit: a route is chosen when two processes begin communicating, and continues for the duration of that session.

      3. Dynamic routing: each message (or portion of a message) travels over the route deemed most efficient at the time it is sent.
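      With dynamic routing, each site needs a way to compute the route "deemed most efficient" from the current link costs. One standard choice is Dijkstra's shortest-path algorithm; the sites and link costs below are made-up values for illustration:

```python
import heapq

def shortest_route(links, src, dst):
    """Dijkstra's algorithm: cheapest path from src to dst given
    current link costs (links maps site -> list of (neighbor, cost))."""
    dist = {src: 0}
    prev = {}
    heap = [(0, src)]
    seen = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in seen:
            continue
        seen.add(u)
        if u == dst:
            break
        for v, w in links.get(u, []):
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                prev[v] = u
                heapq.heappush(heap, (d + w, v))
    path = [dst]                       # walk predecessors back to src
    while path[-1] != src:
        path.append(prev[path[-1]])
    return list(reversed(path)), dist[dst]

links = {"A": [("B", 1), ("C", 4)],
         "B": [("C", 1), ("D", 5)],
         "C": [("D", 1)]}
path, cost = shortest_route(links, "A", "D")
```

Here the direct-looking A-C link costs more than routing A-B-C, so the chosen path is A-B-C-D at total cost 3.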

    3. Without exploring all the alternatives for the second question, we note that the most prevalent approach is packet switching:

      1. A message is broken up into a number of fixed-size pieces called packets. Each packet begins with a header that contains the sender and destination addresses, a sequence number, and other control information as needed. It ends with error-detection information to allow the recipient to be sure it was not corrupted:

        ---------------------------------------------------
        | Header    | Text                       | Error  |
        |           |                            | Control|
        ---------------------------------------------------

      2. Using this approach, different portions of the message can be sent over different routes if dynamic routing is used. It is the task of the receiving site to reassemble the message in proper order (using the sequence numbers), and to request the retransmission of any packets that were lost or corrupted.

      3. Note that this requires that each site along the way function in a store and forward mode. When a message arrives for a site located further down the network, the site must hold it until it is able to send it on its way over an appropriate link. (This would be true with any scheme not involving dedicated circuits. One of the advantages of packet switching is that the units to be stored temporarily en route are of a fixed size.)
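      The packetize/reassemble cycle can be sketched in Python. The 16-byte packet size and the use of an MD5 digest as the error-control field are illustrative choices, not any particular protocol's format:

```python
import hashlib
import random

PACKET_SIZE = 16   # bytes of text per packet (fixed size)

def packetize(message, sender, dest):
    """Split a message into (header, text, error-control) packets."""
    packets = []
    for i in range(0, len(message), PACKET_SIZE):
        text = message[i:i + PACKET_SIZE]
        header = {"src": sender, "dst": dest, "seq": i // PACKET_SIZE}
        check = hashlib.md5(text).hexdigest()   # error-detection field
        packets.append((header, text, check))
    return packets

def reassemble(packets):
    """Restore message order via sequence numbers; verify each packet."""
    packets = sorted(packets, key=lambda p: p[0]["seq"])
    for header, text, check in packets:
        # a corrupted packet would trigger a retransmission request
        assert hashlib.md5(text).hexdigest() == check
    return b"".join(text for _, text, _ in packets)

msg = b"Packet switching splits a message into fixed-size pieces."
pkts = packetize(msg, "A", "B")
random.shuffle(pkts)        # dynamic routing: packets arrive out of order
restored = reassemble(pkts)
```

Even though the packets "arrive" in a scrambled order, the sequence numbers let the receiving site rebuild the original message.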

  4. Clearly, any time a message is traveling over a considerable distance, security becomes an issue:

    1. Eavesdropping

      could be done either by an unscrupulous forwarding site or by wiretapping or monitoring of radio links.

    2. Spoofing

      Alteration of the message or insertion of spurious messages - e.g. in an EFT network.

    3. The chief technique for maintaining security is cryptography: the message text is enciphered at the sending site and deciphered by the recipient.

    4. DES

      The federal government has adopted an encryption technique called DES, which was developed jointly by IBM and the NSA. However, DES has been widely criticized:

      1. Certain of its design criteria have been kept secret. There are those who suspect that NSA knows how to break it.

      2. Like most cryptographic schemes, it suffers from the key distribution problem. If A is going to send messages to B that can be decrypted with some key K, how does he ensure that the message that gives B the key is not compromised?

    5. Public Key Encryption

      An alternate approach is the so-called public-key or RSA cryptosystem.

      1. Messages are represented as large positive integers, by treating the letters of the alphabet as digits modulo 26 (or 95 if one uses the full ASCII code)

      2. Based on a result from number theory, as follows:

        • Let n be the product of two primes p and q: n = pq.

        • Let d and e be integers such that:

          gcd( e, ( p - 1 ) ( q - 1 ) ) = 1
          d * e mod ( p - 1 ) ( q - 1 ) = 1
        • Then if m is a positive integer with m < n, it turns out that if we encrypt m by e

          C = E(m) = m^e mod n

          then we can recover m as follows:

          m = C^d mod n
      3. Observe:

        • Two different keys

          Unlike most schemes, the key for encrypting a message is not the same as the key for decrypting it. A user can publish his chosen values of (e,n), while retaining d as a secret. Now anyone can send him an encrypted message; but only he knows how to decrypt it.

        • Easy to find p and q but hard to recover them

          It is relatively easy to find large primes for p and q (say 100 digits each), and from them to calculate n, d, and e. But it is conjectured to be exceedingly hard to go the other way, since factoring is a hard problem to which no efficient solution has been found in over 300 years of mathematical research.

        • Can be used to digitally sign messages

          The same scheme can be used to sign messages to prevent insertion of fake messages into a system: if a user uses his private key d to encrypt a message, anyone in the world can decrypt it using his public key e. The content of the message is thus not protected; but the fact that the specified individual sent it is established by the fact that no one else could have created such an encrypted message without knowledge of d. (To guard the contents of the message as well, the user could encrypt it twice, using the public key of his recipient the second time.)
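      The entire scheme fits in a few lines of Python. The primes 61 and 53 (and hence n = 3233) are toy values for illustration only; real keys use primes of a hundred or more digits:

```python
from math import gcd

# Key generation
p, q = 61, 53
n = p * q                      # 3233: published as part of the public key
phi = (p - 1) * (q - 1)        # 3120
e = 17                         # public exponent: gcd(e, phi) == 1
d = pow(e, -1, phi)            # private exponent: d*e mod phi == 1
assert gcd(e, phi) == 1 and (d * e) % phi == 1

# Encryption with the public key (e, n); decryption with the secret d
m = 65                         # message as an integer, m < n
c = pow(m, e, n)               # C = m^e mod n
assert pow(c, d, n) == m       # m = C^d mod n

# Signing: "encrypt" with the private key; anyone can verify with (e, n)
sig = pow(m, d, n)
assert pow(sig, e, n) == m
```

Note that Python's three-argument pow performs modular exponentiation efficiently, and (since Python 3.8) pow(e, -1, phi) computes the modular inverse needed for d.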

      4. Three problems with RSA:

        • The fear that someone will someday find an efficient way to factor large numbers. Given that n is public, this would totally destroy the security of the system.

        • The computational time involved - about 1000 x that needed by DES

        • It is patented, which limits its use

  5. Time-ordering of events

    1. the order of events is important

      A number of the solutions to some of the problems we will consider shortly require that we have some way of determining the relative time order of two events - i.e. we have to be able to answer the question "which of these two events occurred first?".

    2. Different CPUs often have unsynchronized clocks

      The problem, of course, is that if each CPU maintains its own clock then there is no way of guaranteeing that all the clocks on the network run in exact synchronization. Since significant events on a network can transpire in time intervals under a second, even a small amount of disagreement could produce problems.

    3. Event timestamps

      Therefore, many schemes rely on a notion of a timestamp, which is not actually a time but an integer that simply represents the relative order of events at some site on the network (i.e. the first event to occur at the site gets timestamp 1, the second gets timestamp 2 etc.)

      1. Each CPU maintains its own logical clock for the purpose of issuing timestamps to events that occur at its site. Each time an event is issued a timestamp, this clock is incremented. (Thus the timestamp of each event is unique.)

      2. To maintain global consistency, each message that is sent carries a timestamp representing the event of sending it. Further, when a message is received the recipient CPU also timestamps it as of the time it is received.

      3. Since it cannot be that a message arrives earlier than the time it is sent (or even at the exact same time), if the receiving CPU finds that the incoming message has a timestamp greater than or equal to the reading of its own clock, then it first increments its own clock to 1 more than the time stamp of the received message. (Thus, some timestamp values will never be issued by a given site, though all will be used somewhere in the network.)

        Example:

        Process A                 Process B                 Process C
        Time  Event               Time  Event               Time  Event
        100   Send to B           101   Send to C           99    Send to A
        101   Receive from C      102   Receive from A      100   Receive from B;
                                                                  bumps own clock
                                                                  to 102

        The messages end up with the following timestamps:

        Source -> Destination   Sending timestamp   Receiving timestamp
        A -> B                  100                 102
        B -> C                  101                 102
        C -> A                  99                  101

        Regardless of the actual physical time at which the messages were sent, the message from C to A is regarded as having been sent first, that from A to B as second, and that from B to C as third.

        In terms of arrival, that from C to A is regarded as arriving first, while the other two are regarded as having both arrived at the same time (tied for second.)

    4. A time stamping scheme like this guarantees that certain global consistency conditions hold:

      1. If two events A and B occur at the same site, then either A occurs before B (and TS(A) < TS(B)) or B occurs before A (and TS(B) < TS(A)).

      2. For any message, TS(send) < TS(receive).

      3. If any event A can have any causal influence on some other event B (either directly or indirectly) then TS(A) < TS(B).

      4. If two events have the same timestamp, then they occurred at different sites and it is impossible for either to be a cause of the other. We say, in this case, that the two events occurred at the same time.

      5. To break ties among events occurring at the same time, we assign each site a unique number and use that as the tie breaker for events with the same timestamp value. (Thus each timestamp also includes an indication of the site that issued it.)
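      The timestamping rules above can be written as a small Python class; the starting clock values below are chosen to reproduce the example's numbers:

```python
class LogicalClock:
    """Per-site logical clock for issuing event timestamps."""
    def __init__(self, start):
        self.time = start        # timestamp of the most recent local event

    def tick(self):
        """Timestamp a local event (including sending a message)."""
        self.time += 1
        return self.time

    def receive(self, msg_ts):
        """Timestamp a receive, bumping past the message's timestamp
        if the incoming timestamp is >= our own clock."""
        self.time = max(self.time + 1, msg_ts + 1)
        return self.time

a, b, c = LogicalClock(99), LogicalClock(100), LogicalClock(98)
ta, tb, tc = a.tick(), b.tick(), c.tick()  # A->B at 100, B->C at 101, C->A at 99
ra = a.receive(tc)   # 101: C's message (ts 99) arrives at A
rb = b.receive(ta)   # 102: A's message (ts 100) arrives at B
rc = c.receive(tb)   # 102: C bumps its clock past B's timestamp 101
```

The resulting send and receive timestamps match the tables in the example above.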

  6. Control of access to critical sections.

    1. This is more of a problem in a distributed system than it is in a simple network.

      Particular concerns arise when a database is distributed, with multiple processes being able to update it, and with update transactions potentially involving data stored at more than one site.

      Note: We are dealing here, of course, only with critical sections that concern items distributed across the network. Each CPU is free to enter and leave strictly local critical sections at will.

    2. The simplest approach is to centralize control of access to each critical section.

      For example, if a portion of a shared database is stored at a certain site, then any process wishing to update that data may be required to send a message to the host CPU requesting permission to do so. (In fact, the host CPU will probably actually have to do the update itself, so the message may contain all the information needed.)

    3. Alternately, token passing may be used.

      1. In one variant of token passing, the network must be logically (but not necessarily physically) structured as a ring. The token is a special message that grants permission to enter a critical section. When a process receives the token, it either:

        • Enters its critical section, then when it leaves passes the token to the next process in the ring.

        • or simply passes the token on without delay.

      2. Alternately, to reduce unnecessary message traffic, we can allow a process to hang onto the token until it gets a message from some other process requesting it - in which case it can forward it directly to the requester.

        • This avoids the need for a ring structure.

        • But it does mean a process wishing to enter a critical section must potentially send messages to all other sites requesting the token. (In general, no one can know where the token is at any given time except the process holding it.)
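      The first (ring) variant can be simulated in a few lines of Python; the site numbering and the set of requesting sites are arbitrary illustration:

```python
def token_ring(requesters, n_sites, start=0):
    """Pass the token around a logical ring of n_sites sites.
    A site holding the token enters its critical section if it
    wants to, then forwards the token to the next site."""
    order = []                         # order in which sites enter the CS
    pending = set(requesters)
    token = start
    while pending:
        if token in pending:
            order.append(token)        # enter, then leave, critical section
            pending.discard(token)
        token = (token + 1) % n_sites  # pass the token along the ring
    return order
```

Starting from site 0, sites 1, 3, and 4 get the token (and thus the critical section) in ring order: token_ring({3, 1, 4}, 5) returns [1, 3, 4].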

  7. If neither of the above approaches is possible or desirable for some reason, then a fully distributed synchronization algorithm must be used.

    This relies on the use of time stamps, and requires a considerable number of messages.

    1. The text (page 567) gives an algorithm for controlling critical sections. In brief, each process that wants to enter a global critical section must obtain permission from each other process before it may do so.

    2. Observe: this requires 2 * ( n - 1 ) messages, a provable minimum.

    3. Unfortunately, fully distributed schemes like this, in addition to requiring a lot of message overhead, also have two other significant problems:

      • Each process (or at least each site) must know the identity of every other process on the network - which can be problematical if sites go down and recover.

      • The failure of a site could prevent a process at another from ever getting into a critical section, even if the failed site was not using it.

  8. Dealing with deadlock.

    1. As is true with local deadlock, we can choose among deadlock prevention, deadlock avoidance, and deadlock detection and recovery. However, in a distributed system all three approaches become more complex.

    2. Deadlock prevention: processes may be prioritized based on the timestamp of their creation. Two schemes have been proposed based on this idea:

      1. Wait-die prevents circular wait.

        • A process can never wait for a resource held by an older process; if it needs such a resource, it must be rolled back to some safe check point or die altogether and be restarted. (The restart happens automatically.)

        • Because this prevents circular wait, deadlock cannot occur (cf the argument used to prove that resource ordering prevents deadlock.)

        • To prevent starvation, a process that dies keeps its original time stamp when it is restarted; thus, it will eventually get to be the oldest process competing for a given resource and complete (though it may have to be rolled back several times first.)

          (Note, though, that a process tends to wait more as it ages.)

      2. Wound-wait allows preemption: an older process may preempt a resource from a younger process (forcing the younger process to be rolled back or wounded), while a younger process will wait for one held by an older process.

        • Once again, circular wait is impossible with this scheme.

        • Once again, to prevent starvation, a process that is wounded keeps its original timestamp.
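      The two rules differ only in which party yields. As decision functions in Python (a smaller creation timestamp means an older process; the names are illustrative):

```python
def wait_die(requester_ts, holder_ts):
    """Requester wants a resource held by holder (wait-die rule)."""
    # An older process may wait for a younger one; a younger process
    # dies (is rolled back and restarted with its original timestamp).
    return "wait" if requester_ts < holder_ts else "die"

def wound_wait(requester_ts, holder_ts):
    """Requester wants a resource held by holder (wound-wait rule)."""
    # An older process wounds (preempts) the younger holder;
    # a younger process simply waits for the older one.
    return "wound" if requester_ts < holder_ts else "wait"
```

In both schemes every wait edge points from a younger process to an older one (wait-die) or from an older to a younger (wound-wait), so a cycle of waits - and hence deadlock - cannot form.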

    3. Deadlock avoidance is usually managed by using an algorithm like the banker's algorithm at a central site, from which processes obtain permission to use shared resources.

    4. Deadlock detection and recovery can be accomplished by a centralized process, or by a hierarchy of processes that each manage a portion of a tree structured network, as described in the text.

  9. Methods of maintaining network integrity.

    1. In a system spread over a wide area, the possibility that a link or site might fail is very real, and the network must be able to both pick up after the failure and also re-integrate the failed component into the system when it is restored.

    2. This is a particular issue with regard to synchronization and/or deadlock detection:

      1. In a centralized approach, if the synchronizer fails, it must be replaced.

      2. In a token passing scheme, the token can be lost.

      3. In a fully distributed scheme, a failed process will never answer a synchronization request message.

    3. A class of algorithms known as election algorithms are used to resolve such problems. The book considers two such algorithms for the first case listed above - the need to replace an (apparently) failed central coordinator. We will not consider them further.


  • $Id: multicpu.html,v 1.4 2000/04/18 14:47:17 senning Exp $

    These notes were written by R. Bjork of Gordon College. They were edited, revised and converted to HTML by J. Senning of Gordon College in April 1998.