Overall System Structure; Busses

CS222 Lecture: Overall System Structure         last revised 3/30/99

Objectives:

1. To overview the major building blocks of a complete computer system
2. To discuss issues and options in the design of bus systems.

Materials: Transparency of Stallings page 76

I. Introduction
-- ------------

   A. At the start of the course, we noted that a computer system can be
      described at five different levels of detail.  More recently, we have
      been focussing on one of those levels - the hardware design level - and 
      have divided it into three sublevels.  What are they?  ASK

      1. The system level 

      2. The CPU implementation level

      3. The logic design level

      We focussed on the lowest of these three levels in CS221; we have been
      looking at the middle level in the last week or so; now we move up
      to the overall, or system level.

   B. We will do this by looking at two issues:

      1. The major kinds of building blocks from which complete systems are
         constructed.

      2. The way in which these building blocks are interconnected in order to
         produce a complete system.

   C. As we are doing this, we will also introduce a system of notation that
      can be used to describe an overall system.

      1. In this system, each major component of a system is denoted by a
         single upper-case letter, with each type of component being denoted
         by a different letter - e.g. a P is a processor, an M is a memory 
         element, etc.

      2. The description is sometimes further qualified - e.g. an M might be

         a. Primary memory, such as semiconductor RAM (M.primary or M.p or Mp).

         b. Secondary memory, such as disk (M.secondary or M.s or Ms)

         etc.

      3. A formal definition of the syntax of the notation we will use is 
         contained in an appendix to:

          C. Gordon Bell and Allan Newell. Computer Structures: Readings
          and Examples (NY: McGraw Hill, 1971)

II. Basic Building Blocks
--  ----- -------- ------

   A. All computer systems are built by combining certain kinds of basic 
      building blocks, which fall into various categories.  (We will
      overview the categories now and study each in detail later in the
      course.)

   B. The following are basic components:

      1. Memories (M)

         a. A memory is a device that stores information without altering its
            meaning or form - i.e. if a certain binary value is stored into a
            memory, then that exact same binary value can be retrieved at a
            later time.

         b. The simplest form of memory is a register, that stores a single
            atomic value.  Registers are basic building blocks of many
            components of a system.  Those contained in the CPU are often
            directly visible to the assembly language programmer (e.g. the
            accumulator of the von Neumann machine.)

         c. Other memories store multiple values, with some mechanism being
            used to specify a particular value that is to be read or written.

            i. Primary memory is typically organized using a linear addressing
               scheme, so that each value stored is assigned a unique address
               in the range 0 .. total_size - 1.

           ii. Secondary memory may require that values be addressed by
               physical position - e.g. surface, track, sector, location
               within sector.

         d. Note that we can further distinguish different types of memory -
            e.g. Mp, Ms.
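
            The two addressing schemes can be contrasted with a minimal
            sketch.  The disk geometry and the function name below are
            hypothetical, chosen only for illustration; real devices differ.

```python
# Hypothetical disk geometry, chosen only for illustration.
SURFACES = 4             # recording surfaces per cylinder
SECTORS_PER_TRACK = 16   # sectors on each track


def block_to_position(block):
    """Map a linear block number to a physical (track, surface, sector)."""
    if block < 0:
        raise ValueError("block number must be non-negative")
    sector = block % SECTORS_PER_TRACK
    surface = (block // SECTORS_PER_TRACK) % SURFACES
    track = block // (SECTORS_PER_TRACK * SURFACES)
    return track, surface, sector


# Primary memory: the linear address itself selects the value.
primary = [0] * 1024            # addresses 0 .. total_size - 1
primary[37] = 0b1011
assert primary[37] == 0b1011    # the same binary value comes back unchanged

# Secondary memory: a physical position has to be worked out first.
print(block_to_position(100))   # prints (1, 2, 4)
```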

      2. Data elements (D)

         a. A data element is a device that changes the meaning of information
            without altering its form - i.e. it may take in a binary value and
            output a binary value that is the result of some computation on the
            input value.

         b. A simple example is a shifter, which receives a binary value and
            outputs a binary value; but the value outputted is different from
            the value put in - e.g. if the shifter does a left shift then the
            output is the input value * 2. (Devices like adders also fall into
            this category.)

         c. A more complicated example is an ALU, which can perform any one of
            several operations on an operand or set of operands presented to
            it.  (This can be realized by a set of combinatorial circuits - one
            per function - plus MUX(es) to control the inputs to the circuits
            and to select which function appears as the output.)
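
            A rough software analogue of this ALU organization is sketched
            below: one function per operation, with a select input acting as
            the output MUX.  The 8-bit width and the operation names are
            assumptions made for the example.

```python
WIDTH = 8                    # assumed word width in bits
MASK = (1 << WIDTH) - 1      # keeps results within WIDTH bits

# One "circuit" per function; each takes two operands.
FUNCTIONS = {
    "ADD": lambda a, b: a + b,
    "AND": lambda a, b: a & b,
    "OR":  lambda a, b: a | b,
    "SHL": lambda a, b: a << 1,   # one-bit left shift; b is ignored
}


def alu(select, a, b=0):
    """Compute every function, then let `select` pick one (the MUX)."""
    outputs = {name: f(a, b) & MASK for name, f in FUNCTIONS.items()}
    return outputs[select]


print(alu("SHL", 5))          # prints 10: a one-bit left shift doubles the value
print(alu("ADD", 200, 100))   # prints 44: the carry out of 8 bits is lost
```

            Note that the left shift doubles the input only while no 1 bit
            is shifted out of the word.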

      3. Transducers (T)

         a. A transducer is a device that changes the form of information
            without altering its meaning.

         b. Most IO devices are transducers.  For example:

            i. A keyboard transforms the representation of a character as the
               physical motion of a key into the representation of that same
               character as a binary code.

           ii. A screen transforms the representation of a character as a
               binary code into the representation of that same character as
               a pattern of dots on the screen.

      4. Control elements (K)

         a. A control element is a device that controls the operation of other 
            devices.

         b. Control elements are frequently used to interface between various
            devices.  For example, disks (a type of M) typically have 
            controllers that control operations such as the positioning of the
            heads.

         c. Control elements are frequently realized as state machines.
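
            As a minimal sketch of a control element realized as a state
            machine, consider a disk-controller-like K.  The state and event
            names are hypothetical and do not describe any particular
            controller.

```python
# Hypothetical states and events for a disk-controller-like K element.
TRANSITIONS = {
    ("IDLE", "command"): "SEEKING",
    ("SEEKING", "heads_in_position"): "TRANSFERRING",
    ("TRANSFERRING", "transfer_done"): "IDLE",
}


def step(state, event):
    """Return the next state; an event that does not apply changes nothing."""
    return TRANSITIONS.get((state, event), state)


def run(events, state="IDLE"):
    """Feed a sequence of events to the machine and return the final state."""
    for event in events:
        state = step(state, event)
    return state


print(run(["command", "heads_in_position"]))   # prints TRANSFERRING
```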

   C. The following are used to connect basic components:

      1. Links (L)

         a. A link is a path for transmitting information between two points
            without altering either its meaning or form.  

         b. Technically, even a piece of wire is a link.  But links are 
            explicitly noted in describing a system only when the link has 
            important characteristics that affect system performance, such as 
            speed limitations.

      2. Switches (S)

         a. A switch is a device that provides alternate paths for information
            between other devices - i.e. a set of potential links.

            i. In the most sophisticated case, several links can be active at
               the same time - i.e. the switch behaves like a telephone 
               switchboard.

           ii. Simpler switches allow only one link to be active at a given 
               time.  

         b. Note that the system bus that forms the heart of many computers is
            a form of switch that allows only one link to be active at a time.

   D. One last basic component is the processor (P)

      1. A processor can actually be viewed as composed of simpler components:

         One or more M's (registers)
         D (ALU)
         K (control)
         One or more S's (internal data paths)

      2. However, in describing a complete system it is often expedient to 
         treat a processor as a single unit without worrying about its internal
         structure.

      3. What makes a processor a processor is that it is capable of fetching,
         interpreting and executing instructions - i.e. it is programmable.

      4. Every computer system has at least one processor - a central processor.

         a. Some have additional processors - e.g. systems with multiple CPU's,
            mainframe systems that have a single CPU and some number of IO
            processors, or a personal computer having a single CPU and a
            floating-point coprocessor.

         b. When several processors are present in a system, we qualify the P
            symbol to denote the type of each - e.g.

            P.central or P.c or Pc
            P.input_output or P.io or Pio
            P.floating_point or P.fpp or Pfpp

   E. When we describe networks of interconnected computer systems, we may
      choose to treat an entire computer system as a single building block, for
      which we would use the letter C.

   F. Some examples of complete systems:

      1. The von Neumann machine:

                M--P--T

         or:    K
               /|\
                M
             /  |  \
             T--D--T

      2. IBM 370/155:

                Mp(#0) --
                        |
                Mp(#1) ---- S ---|--- M.cache --- Pc --- T('Console)
                        |        |----------------|
                Mp(#2) -|        |                   |--Ms(#0;fixed head disk)
                        |        |--- Pio(#0) -- K --|
                Mp(#3) -|        |                   |--Ms(#1;fixed head disk)
                                 |
                                 |
                                 .
                                 .
                                 .
                                 |                   |--Ms(#0;movable arm disk)
                                 |--- Pio(#4) -- K --|
                                                     |--Ms(#1;movable arm disk)
                                                     |
                                                     |--Ms(#2;movable arm disk)

      3. A network of PC's connected to a central file-server via ethernet:

                C       C       C       C       C
                |_______|_______|_______|_______|___ S.ethernet
                                                        |
                                                     C.server

III. Computer Connection Structures
---  -------- ---------- ----------

   A. At the highest level, a complete computer system can be regarded as being
      built up from three basic modules:
 
      1. One or more CPU's - often one per system, but sometimes more than one
         (multiprocessor systems).

      2. The memory system, consisting of one or more kinds of memory plus
         associated controllers.

      3. The IO system, consisting of some number of IO devices plus associated
         controllers.

   B. A key design issue in building a computer system is how these various
      devices are to be CONNECTED.

      1. The SPEED at which information flows between the various components
         of the system can be the determining factor in overall system
         performance.  If system performance is limited by the speed of the
         interconnection system, then technical improvements to the individual
         components will not result in performance gains.

      2. Interconnection systems often have a longer lifetime than individual
         components; thus design decisions concerning them can have long-term
         implications. 

         Example: DEC designed the UNIBUS as an interconnect structure for
         its PDP-11's in 1970.  The PDP-11 CPU went through a series of
         generations, but the UNIBUS architecture connecting the CPU to
         peripherals remained the same.  Early members of the VAX family - 
         introduced in the late 1970's - still used a UNIBUS for connecting to 
         many of their IO devices.  Thus, the UNIBUS architecture lived through 
         several generations of PDP-11 CPU and on into the next CPU
         architecture.

         (Note: later PDP-11's and some VAXes used a bus structure called the
          QBus, and several other bus architectures have been used on
          various VAX models over the years.)

   C. By far the most commonly-used interconnect structure today is a bus
      structure in which all modules connect to a set of shared lines called a
      BUS SYSTEM.

      1. Today there are several industry-standard bus architectures which allow
         components from several different manufacturers to be assembled
         together into a single system.

         Example: The PCI Bus is widely used in both Wintel PCs and Macintoshes.

      2. In addition, many manufacturers have their own proprietary bus
         architectures.

      3. Actually - as we shall see - a given computer may have several bus
         systems.  Further, there are a number of fundamental design choices
         which must be made when designing such a system.

IV. Overview of Bus Systems
--  -------- -- --- -------

   A. A bus system is a simple form of switch (S)

      1. A bus consists of a collection of wires that individual components 
         (CPU's, memories, and IO device interfaces) plug into.

      2. These wires include some for carrying addresses, some for carrying
         information, and some for control.

      TRANSPARENCY: STALLINGS P. 76

      Example: the Z80 system bus consists of 40 lines: 16 for address, 8 for
               data, and 16 for control (including power, ground, clock.)

      Example: the UNIBUS consists of 56 lines: 18 for address, 16 for data,
               and 22 for control.

   B. Generally, bus architectures are standardized - either by one company or
      by an industry group such as IEEE.  This allows many different kinds of
      devices to be built to plug into it.  Each device that plugs into a given
      bus must know what signals to expect on what pins and what protocols will
      be used to exchange information over the bus.

   C. The transfer of a piece of information between two devices on the bus
      involves a BUS CYCLE.

      1. For each bus cycle, one device on the bus is designated the BUS MASTER,
         and the other is designated the SLAVE.

         a. This master may be the same device all the time, or provision may
            be made to allow different devices to become bus masters on
            different cycles.

            i. CPU's are almost always masters.

           ii. Memories are almost always slaves.

          iii. IO devices are generally slaves when receiving commands or data
               from the CPU, but can be masters when doing DMA transfers from
               memory.

         b. If multiple devices can be bus masters, then each cycle must be
            preceded by some arbitration period when one device is chosen to
            be the master for the current cycle.  (This is generally done on a
            priority basis.)
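
            Priority arbitration can be sketched in a few lines.  The device
            names and the convention that a larger number means higher
            priority are assumptions for the example; real busses implement
            this in hardware, with a variety of schemes.

```python
def arbitrate(requests):
    """Choose the bus master for the next cycle.

    `requests` maps each requesting device to its priority
    (assumed here: the larger number wins).
    Returns None when no device is requesting the bus.
    """
    if not requests:
        return None
    return max(requests, key=requests.get)


# Hypothetical requesters competing for one cycle.
print(arbitrate({"cpu": 1, "disk_dma": 3, "net_dma": 2}))   # prints disk_dma
```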

      2. The first part of a bus cycle involves the bus master putting an
         address on the bus, with the expectation that the slave device
         will recognize it and respond.  

         a. When the slave is a memory, the address may consist of two parts:

            i. The high order bits designate a particular memory device (if
               there is more than one on the system).

           ii. The low order bits designate a particular location in that 
               memory.

         b. When the slave is an IO device, the address serves only to 
            designate the particular device.  Furthermore, sometimes only a
            portion of the address bits are used for this purpose, since the
            number of IO devices on the system is usually small compared to
            the total number of individually-addressable memory locations.
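
         The split of a memory address into a device-select part and a
         location part can be sketched as follows.  The 16-bit address width
         and the 2-bit select field are assumptions for the example.

```python
ADDRESS_BITS = 16   # assumed width of the address part of the bus
SELECT_BITS = 2     # assumed: high-order bits that pick one memory device
OFFSET_BITS = ADDRESS_BITS - SELECT_BITS


def decode(address):
    """Split a bus address into (memory device number, location within it)."""
    if not 0 <= address < (1 << ADDRESS_BITS):
        raise ValueError("address does not fit on the bus")
    device = address >> OFFSET_BITS                  # high-order bits
    offset = address & ((1 << OFFSET_BITS) - 1)      # low-order bits
    return device, offset


print(decode(0x8123))   # prints (2, 291), i.e. device 2, location 0x123
```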

      3. At the same time, the bus master uses control signals to indicate what 
         type of data transfer is to be performed.

         a. Type of slave being addressed (memory, IO, or perhaps
            a coprocessor). (Note: some bus architectures don't need this - no
            distinction is drawn between types of devices.)

         b. Direction: master to slave (write) or slave to master (read) or
            (sometimes) slave to master to slave (read-modify-write).

         c. Quantity of data to be transferred (one byte, one word, or
            sometimes a block of contiguous locations).

      4. The second part of the bus cycle involves the actual data transfer.

         a. In the simplest case, one unit of data (i.e. as many bits as
            there are in the data part of the bus) is transferred.

         b. It is also possible to do BLOCK MODE transfers, in which several
            units of data are transferred from successive addresses, one
            after another.  (The address specified in the address phase is
            the address of the first unit of data).
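
         The two phases of a bus cycle, including block mode, can be modeled
         roughly as below.  The class and function names are hypothetical,
         and timing, arbitration and control-signal details are left out.

```python
class MemorySlave:
    """A memory that responds to a contiguous range of bus addresses."""

    def __init__(self, base, size):
        self.base = base
        self.cells = [0] * size

    def responds_to(self, address):
        return self.base <= address < self.base + len(self.cells)

    def read(self, address):
        return self.cells[address - self.base]

    def write(self, address, value):
        self.cells[address - self.base] = value


def bus_cycle(slaves, address, direction, data=None, count=1):
    """One bus cycle: an address phase followed by a data phase.

    direction is "read" or "write".  A read of count > 1 units, or a write
    of several values in `data`, models a block-mode transfer starting at
    `address`.
    """
    # Address phase: the master puts out an address; one slave recognizes it.
    slave = next((s for s in slaves if s.responds_to(address)), None)
    if slave is None:
        raise LookupError("no slave responded to address %#x" % address)

    # Data phase: one unit, or several units from successive addresses.
    if direction == "read":
        return [slave.read(address + i) for i in range(count)]
    if direction == "write":
        for i, value in enumerate(data):
            slave.write(address + i, value)
        return None
    raise ValueError("direction must be 'read' or 'write'")


slaves = [MemorySlave(0x0000, 256), MemorySlave(0x0100, 256)]
bus_cycle(slaves, 0x0104, "write", data=[7, 8, 9])       # block write
print(bus_cycle(slaves, 0x0104, "read", count=3))        # prints [7, 8, 9]
```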

   D. Bus cycles are used for a variety of different purposes.  (Note: high
      performance CPU's have a small amount of very high speed on-chip memory
      known as cache memory that allows most of these accesses to be done
      without the need for an actual bus cycle - more on this later.)

      1. Each instruction executed by the CPU may involve a bus cycle for
         INSTRUCTION FETCH.  Here the master is the CPU and the slave is memory.

         a. On CPU's that allow variable-length instructions, several bus
            cycles may be needed to fetch a complete instruction.

         b. As we have seen, some CPU's prefetch instructions - i.e.
            they try to maintain a lookahead of several words into the
            instruction stream by issuing instruction fetch cycles when the
            bus is not otherwise in use.

      2. Instructions executed by the CPU may involve additional bus cycles
         with the CPU as master and memory as the slave - for:

         a. OPERAND ADDRESS CALCULATION may or may not require a bus cycle.
            (A bus cycle is required when indirect addressing is used.)

         b. OPERAND FETCH (possibly more than one on 2 or 3 address machines).

         c. OPERAND STORE (possibly more than one, though multiple stores for
            a single instruction are less common than multiple fetches).

      3. IO instructions will involve a bus cycle with the CPU as master and
         an IO controller as the slave. This may involve:

         a. TRANSFER OF A COMMAND (to the controller).

         b. TRANSFER OF STATUS INFORMATION (from the controller).

         c. TRANSFER OF DATA TO/FROM AN IO DEVICE.

      4. On many systems, IO controllers can also initiate bus cycles.  These
         are of two types:

         a. INTERRUPT CYCLES (CPU is the slave).

         b. DMA CYCLES (memory is the slave).

      5. On multi-processor systems, bus cycles may also be used for
         INTER-PROCESSOR COMMUNICATION.

   E. Implementation of a bus

      1. This is an interesting problem because, in general, it must be possible
         for different devices to drive a given line at different times.

         a. Example: For a write transaction, the master drives data on to the
            data bus; but for a read transaction, the slave does so.  Further,
            different read transactions may involve different slaves.

         b. Example: If a bus can have multiple masters, then each master must
            be capable of driving the address and control lines in the bus 
            when it is in charge.

      2. This suggests that each device that can drive a given line of a bus
         should contain a gate (called the driver) whose output is tied to
         that line - e.g.
                                                |
                device #1       --- driver -----+
                                                |
                device #2       --- driver -----+
                                                |
                device #3       --- driver -----+
                                                |

         However, this won't work if ordinary gates are used for the drivers.
         
         ASK CLASS WHY

      3. One solution might be to implement a bus using MUXes:
                                        _______
                device #1 --------------| MUX |----- bus
                device #2 --------------|     |
                device #3 --------------|     |
                                        .     .
                                        .     .

         a. This technique is frequently used for INTERNAL BUSSES in the CPU

         b. But it is not a good approach for system busses.  ASK WHY

            - The total number of devices to be connected to the bus must be
              known when it is built (inflexible)
            - A lot more wires are needed - each bus slot must have its own
              set of connections to the MUXes

      4. The most common approach is to use TRI-STATE gates.

         a. As the name implies, a tri-state gate is one whose output can be in
            any of three states: 0, 1, or Hi-impedance.

         b. The hi-impedance state is the new one.  When the output is in this
            state, it behaves like it is not connected at all - e.g. (viewing
            the gate output as being like a switch)

            Ordinary gate:                      Tri-state gate:

                1 ---o                          1 ---o
                       \___                          o \___
                0 ---o                          0 ---o

            Output is always connected          Output is connected to 1 or 0 
            to 1 or 0                           or not connected at all

         c. Tri-state gates are available in many standard configurations (e.g.
            AND, OR, NAND, flip-flops etc.)  A tri-state gate has an additional
            input called ENABLE.  When this is active, the output of the gate 
            is determined by the other inputs, as usual; when it is inactive,
            the output of the gate is effectively disconnected from the circuit.

         d. Tri-state gates are realized by modifying the output circuit of
            a standard gate.  The following is the "totem-pole" circuit used
            by TTL gates.  (CMOS gates use a similar structure). 

                                Vcc
                                |
                            __|/
                              |\
                                |
                                +--- output
                                |
                            __|/
                              |\
                                |
                                Ground

            i. Each of the two transistors acts like a switch which is either
               off or on. If the transistor is on, then the output of the gate
               is effectively connected to Vcc or ground (as the case may be.)
               (Clearly, we cannot allow both transistors to be on at the
               same time.  This would effectively short the power supply to
               ground, resulting in the rapid destruction of one or both of
               the transistors.)

           ii. In an ordinary TTL gate one or the other of the two transistors 
               connecting to the output is on at any given time, and the other 
               is off, thus connecting the output to either Vcc or ground.

          iii. In a disabled tri-state gate, BOTH transistors are off, thus
               leaving the output effectively unconnected (as if the gate
               weren't in the circuit at all).
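
            The behavior of tri-state drivers sharing one line can be
            modeled as follows.  This is a logical sketch only; the names
            are hypothetical, and treating any two enabled drivers as an
            error (even if they agree) is a simplifying assumption.

```python
Z = "Z"   # high-impedance: the driver is effectively disconnected


def tri_state(value, enable):
    """Output of a tri-state driver: the input when enabled, otherwise Z."""
    return value if enable else Z


def resolve(outputs):
    """Value seen on one bus line, given the output of every driver on it."""
    active = [v for v in outputs if v != Z]
    if not active:
        return Z   # nobody is driving the line; it floats
    if len(active) > 1:
        # Assumed policy: more than one enabled driver is a design error.
        raise RuntimeError("bus contention: more than one driver enabled")
    return active[0]


# Three devices share the line; only device #2 is enabled.
line = resolve([tri_state(1, False), tri_state(0, True), tri_state(1, False)])
print(line)   # prints 0
```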

      5. A third approach is to use OPEN-COLLECTOR gates. This is useful for
         cases where a given device must drive a given line either to 0 or not
         at all (i.e. it never has to drive the line to 1).

         a. Open-collector gates are available in many standard configurations 
            (e.g. AND, OR, NAND, flip-flops etc.)  However, the two states of
            the output are disconnected or 0.  (The disconnected state occurs
            when the logic function the gate implements would call for a 1
            to be output.)

            Ordinary gate:                      Open collector gate:

                1 ---o                               o
                       \___                            \___
                0 ---o                          0 ---o

            Output is always connected          Output is connected to 0 
            to 1 or 0                           or not connected at all

         b. Open-collector gates are realized by a different modification of
            the standard gate output circuit.  For example, this is the way a
            basic TTL "totem pole" would be turned into an open-collector gate
            by omitting one of the output transistors:

                                +--- output
                                |
                            __|/
                              |\
                                |
                                Ground

         c. Open-collector gates are most often used when any number of
            devices must be able to assert the same line at the same time -
            e.g. an arbitration line representing a bus request.

            i. Because the only state to which a device can assert such a
               line is 0, such lines are most often configured as ACTIVE-LOW -
               i.e. the active state is 0 and the inactive state is 1.

           ii. To make sure the line goes to 1 when no device is asserting it,
               such lines normally are terminated by a PULLUP RESISTOR to Vcc.
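
         A shared active-low line driven by open-collector outputs, with a
         pull-up resistor, behaves like the following sketch.  The function
         names are hypothetical; None stands for a disconnected output.

```python
def open_collector(assert_line):
    """An open-collector output pulls the line to 0 or lets go of it."""
    return 0 if assert_line else None   # None = disconnected


def line_level(outputs):
    """Level of a shared line that has a pull-up resistor to Vcc."""
    # Any device pulling low wins; otherwise the pull-up holds the line at 1.
    return 0 if any(o == 0 for o in outputs) else 1


def bus_requested(requests):
    """Active-low bus request: true when at least one device asserts it."""
    outputs = [open_collector(r) for r in requests]
    return line_level(outputs) == 0


print(bus_requested([False, True, True]))   # prints True
print(bus_requested([False, False]))        # prints False
```

         Unlike tri-state drivers, several open-collector outputs may assert
         the line at once without conflict.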

   F. Regardless of how the interfaces connect to the bus, the electrical
      characteristics of a bus system have an important influence on
      system performance.

      1. Bus designers must take at least the following characteristics into 
         consideration:
         
         a. PROPAGATION DELAY: When a bus master or slave near one end of the 
            bus places some information on the bus, it will take a measurable 
            time for that information to propagate to the other end, due to 
            effects of capacitance and inductance.  This time increases with
            increasing physical length of the bus.

         b. SKEW: The propagation delay for different lines of the bus is not
            necessarily the same; thus if several bits are changed at the same
            time near one end of the bus, the changes may be seen at different
            times at the other end.  Also, if a common clock is used to
            synchronize events, the clock may actually arrive at different
            devices at different times.

         c. LOADING: When we talked about the realization of gates, we mentioned
            that a given gate can only drive so many of a given type.  Since
            some signals generated by a bus master must be received by all other
            devices on the bus (e.g. to recognize their own address), there is
            a limit as to how many devices may be plugged into the bus.  Note,
            too, that propagation delay tends to increase with increasing bus
            load.

      2. Bus designers take these factors into consideration when establishing
         bus timing.

         a. An appropriate interval must be allowed between the time a device
            asserts a signal and the time it can expect the signal to be
            received.  This time is called the SETTLING TIME.

         b. In the case of addresses, because skew could cause the wrong device
            to respond to an address, a separate "address valid" control signal
            is often used, asserted some time AFTER the address itself is put
            on the bus (to ensure that all bits have settled.)

         c. For over a decade now, bus speeds have lagged behind CPU speeds, so
            that the basic bus cycle time is some multiple of the CPU cycle
            time (e.g. 2:1 or 3:1 or 4:1).
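
         The arithmetic behind these timing decisions can be sketched as
         follows.  All delay and cycle-time figures are made-up values in
         nanoseconds, and the function names are hypothetical.

```python
def skew(delays_ns):
    """Difference between the slowest and fastest line of the bus."""
    return max(delays_ns) - min(delays_ns)


def address_valid_delay(address_delays_ns, valid_delay_ns):
    """How long to wait after driving the address before asserting
    "address valid", so that it arrives no earlier than the slowest bit."""
    return max(0, max(address_delays_ns) - valid_delay_ns)


def cpu_cycles_per_bus_cycle(bus_cycle_ns, cpu_cycle_ns):
    """Whole CPU cycles occupied by one bus cycle (rounded up)."""
    return -(-bus_cycle_ns // cpu_cycle_ns)   # integer ceiling division


address_delays = [10, 12, 15, 11]               # assumed per-line delays
print(skew(address_delays))                     # prints 5
print(address_valid_delay(address_delays, 10))  # prints 5
print(cpu_cycles_per_bus_cycle(30, 10))         # prints 3, i.e. a 3:1 ratio
```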
   
V. General Issues in the Design of Bus Systems
-  ------- ------ -- --- ------ -- --- -------

   A. Before establishing the detailed assignments of different wires on the
      bus, bus architects need to make a number of general design choices.  We
      consider these in turn now.

   B. One fundamental choice, when designing an overall system, is whether to
      have one bus to serve both memory and IO devices, or separate memory and
      IO busses.

      1. The difference can be seen by comparing diagrams:

                Two busses                      Single bus

            M--S--P--S--K--T                    P--S--M
                     |--K--T                       |--K--T
                     ---K--T                       |--K--T
                                                   ---K--T

            or (w/DMA):

            M--S--P--S--K--T  non-DMA device
               |     |--K--T    "       "
               |     |
               |     ---K--T  DMA device
               |________|

            or (w/CPU connected to memory bus only and an adapter used to 
                connect the busses)

            M--S--P
               |
            K.adapter--S--K--T
                       |--K--T
                       |--K--T

      2. On high-end computer systems, the choice is often made to have two or 
         more separate busses.  Often considerations of speed are a reason for 
         going this route - a memory bus (which gets the most intense
         use) can be made faster if it handles memory only, since the total
         length of the bus is smaller.

      3. Smaller computers generally use a single physical bus, possibly with 
         some control lines unique to memory operations and some to IO.  (Note 
         that this design choice gave rise to the name UNIBUS (one bus) for the 
         PDP-11 bus - the first system to use this design.)

         a. This makes for a simpler and less expensive system.

         b. It also simplifies the building of interfaces for DMA devices.  If
            there are separate IO and memory busses, then DMA device controllers
            must connect to both busses somehow, either directly or through an
            adapter that ties the two busses together.

         c. Use of one bus reduces the number of pins needed on the CPU
            package.

      4. Note that even when there is only one PHYSICAL bus, there can be
         more than one LOGICAL bus if there are several sets of control lines.
                                                       ____
         Example: One of the control lines on Z80 bus (MREQ) is used only
                                             ____
                  for memory cycles and one (IORQ) is used only for IO
                  cycles.  Exactly one of these two lines is asserted during
                  any given bus cycle.

      5. When a single physical bus is used, there is also a choice to be made
         between using MEMORY MAPPED IO and ISOLATED IO.

         a. With memory-mapped IO, both memory and IO devices use the same
            address space - i.e. a "memory read" or "memory write" operation to 
            certain addresses actually transfers data to or from a given device.

            Example: The PDP-11 UNIBUS is designed for memory-mapped IO.

                     Addresses 000000 to 757777 (octal) are memory addresses
                               760000 to 777777 are IO devices

         b. With isolated IO, separate address spaces are used for memory and 
            IO.  This also requires separate control lines for each kind of
            operation.

            Example: The Z80 uses addresses 0000 to FFFF (hex) for memory
                                                           ____
                     and 00 to FF (hex) for IO ports.  The MREQ control
                     line causes memories to look at the address lines and IO
                                                 ____
                     ports to ignore them, while IORQ causes ports to look at
                     the address lines and memory to ignore them.

         c. Each approach has its advantages and disadvantages (ASK).

            i. Advantages of memory-mapped IO:

               - Fewer control lines on the bus (an important consideration with
                 the limited pinout of microprocessor chips.)

               - The full instruction set of the processor can be used for IO, 
                 not just a few specialized instructions.  (E.g. bit-oriented 
                 instructions can be used to test/set individual bits in 
                 peripheral registers.)

               - If the CPU itself is configured for memory-mapped IO, then the
                 opcodes that would have been needed for input-output 
                 can be used for something else.

           ii. Disadvantages of memory-mapped IO:

               - With a limited number of bits available for addressing memory
                 (e.g. 16 on small microprocessors), memory-mapped IO reduces 
                 the total amount of memory that can be installed in a system 
                 since some of the address space must be used for IO addresses.
                 (This is less of a problem with processors that use wider 
                 addresses.)

               - With separate IO and memory addresses and control lines, it is
                 possible to tailor bus protocols to the characteristics of
                 memory and of IO ports separately.  

               - Interfacing can often be simpler with separate IO addresses - 
                 e.g. the number of address bits to decode is smaller.  (There
                 are many fewer ports than memory addresses.)

               - Also, IO instructions can be shorter, since they need to 
                 specify fewer address bits.
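
         The difference between the two schemes shows up in how an address
         is decoded.  The following is a minimal sketch in Python for a
         hypothetical 16-bit machine; the boundary IO_BASE and both function
         names are assumptions made for illustration and are not taken from
         the PDP-11 or the Z80.

```python
# Hypothetical 16-bit machine, for illustration only.  IO_BASE is an assumed
# boundary, not a value taken from any real system.
IO_BASE = 0xFF00   # memory-mapped scheme: addresses >= IO_BASE select devices


def decode_memory_mapped(address):
    """One address space: the address alone picks memory or a device."""
    if not 0 <= address <= 0xFFFF:
        raise ValueError("address out of range")
    if address >= IO_BASE:
        return ("io", address - IO_BASE)
    return ("memory", address)


def decode_isolated(address, mreq, iorq):
    """Two address spaces: a control line says which space is meant.

    mreq and iorq are True when the corresponding line is asserted.
    """
    if mreq == iorq:
        raise ValueError("exactly one of MREQ / IORQ must be asserted")
    if mreq:
        return ("memory", address & 0xFFFF)
    return ("io", address & 0xFF)   # only 8 address bits need decoding


print(decode_memory_mapped(0x1234))
print(decode_memory_mapped(0xFF02))
print(decode_isolated(0x1234, mreq=True, iorq=False))
print(decode_isolated(0x1234, mreq=False, iorq=True))
```

         Note that in the memory-mapped sketch the top 256 addresses are no
         longer available for memory, while in the isolated sketch the full
         16-bit space is, at the cost of the extra control line.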

      6. Finally, we should note that some systems have been built in which
         the CPU connects to a single physical bus, but other busses are
         present in the system, being connected to the central bus via BUS
         ADAPTERS.

         Example: When the VAX line was first introduced, Digital wanted to
                  allow customers to continue to use peripherals that worked
                  with the PDP-11 UNIBUS, since there were many in existence.
                  However, it was necessary to design a new bus (the SBI bus) 
                  for memory, to allow more than 256K of memory to be
                  present.  The approach used was to build a system with one
                  to four UNIBUSes connected to the SBI bus via a bus adapter:

         P -- S.SBI -- M
              |
              |------ K.UNIBUS adapter --- S.UNIBUS -- (To UNIBUS peripherals)
              |
              |------ K.UNIBUS adapter --- S.UNIBUS -- (To UNIBUS peripherals)

   C. Another fundamental choice is the WIDTH of the bus (each bus) - the 
      number of bits used for addresses, and the number of bits used for data.

      1. The address width ultimately determines how much memory and/or how
         many IO devices can be connected to the bus.

         Example: The UNIBUS's 18 bits of address allow up to 256K bytes
                  (2^18 addresses, shared between memory and IO devices),
                  which seemed more than adequate when it was designed.
                  However, this eventually proved inadequate, and later PDP-11's
                  and VAXes had to resort to using a separate bus for memory, 
                  with some fairly complicated techniques used to allow IO 
                  devices (on the UNIBUS) to access memory on the memory bus 
                  for DMA operations.

      2. The data width helps determine bus throughput (# of bytes transferred
         over the bus per second).

         Example: Microprocessors are generally classified as 8 bit, 16 bit, or
                  32 bit not on the basis of the width of their internal data
                  paths, but rather on the basis of their bus width.
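
      The arithmetic behind both widths can be sketched as follows.  The
      function names and the 1 MHz cycle rate are assumptions chosen for
      illustration; the throughput figure is only an upper bound that
      assumes one transfer per bus cycle.

```python
def addressable_locations(address_bits):
    """Number of distinct addresses an address bus of this width can name."""
    return 2 ** address_bits


def peak_throughput(data_bits, cycles_per_second):
    """Upper bound in bytes per second, assuming one transfer per bus cycle."""
    return (data_bits // 8) * cycles_per_second


print(addressable_locations(16))   # 65536  (64K)
print(addressable_locations(18))   # 262144 (256K)

# Assumed cycle rate of 1,000,000 bus cycles per second.
print(peak_throughput(8, 1_000_000))
print(peak_throughput(16, 1_000_000))
```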

   D. Another fundamental choice is whether to DEDICATE various lines in the
      bus to certain functions, or to MULTIPLEX certain lines.

      1. In our discussion so far, we have assumed that the bus contains
         separate lines for address and data.

      2. To reduce the width of the bus (and thus the cost of each interface),
         the same lines can be used for both functions, but at different times
         during the bus cycle.

         a. During the first half of the cycle, they carry address information.

         b. During the second half, they carry the data being transferred.

      3. Of course, this could result in an increased cycle time, since the
         address and data parts of the cycle cannot overlap.  It also makes
         the memory system interface to the bus more complex, since it must
         now contain a register to hold the address during the data part of
         the cycle.

      4. Microprocessors with data busses wider than 8 bits sometimes have to
         use multiplexing due to pinout limitations on the chip package.
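
      The need for an address register in the memory interface can be seen
      in a small model.  This is a sketch only: the class LatchedMemory and
      its method names are hypothetical, and real timing is ignored.

```python
class LatchedMemory:
    """Memory interface on a multiplexed address/data bus (illustrative)."""

    def __init__(self, size):
        self.cells = [0] * size
        self.address_latch = None   # the extra register the interface needs

    def address_phase(self, shared_lines):
        # First half of the cycle: the shared lines carry the address,
        # which must be captured before the lines are reused.
        self.address_latch = shared_lines

    def data_phase_write(self, shared_lines):
        # Second half: the same lines now carry data, so the address has
        # to come from the latch.
        self.cells[self.address_latch] = shared_lines

    def data_phase_read(self):
        # Second half of a read: the memory drives the shared lines.
        return self.cells[self.address_latch]


mem = LatchedMemory(256)
mem.address_phase(0x10)
mem.data_phase_write(0xAB)
mem.address_phase(0x10)
print(hex(mem.data_phase_read()))   # 0xab
```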

   E. The previous choices have dealt with the physical configuration of the
      bus.  Another important choice has to do with the bus PROTOCOL - the
      rules whereby the bus master and slave exchange signals with one
      another.  Here, the fundamental choice is between SYNCHRONOUS and
      ASYNCHRONOUS protocols.

      1. In a synchronous protocol, all devices on the bus share a common clock.
         The bus master puts signals on the bus and expects the slave to respond
         within a certain time frame, without looking for explicit
         acknowledgement from the slave that it has done so.

         a. Example: The Z80 memory read protocol:
                                 ____________________________________
            Address from CPU  __/                                    \__
                                \____________________________________/
            ____              _______                             ______
            MREQ from CPU            |                            |
                                     |____________________________|
                                                     _______________
            Data from memory  ______________________/               \___
                                                    \_______________/
         
            i. Note the delay between putting the address on the bus and
                         ____
               asserting MREQ.  This ensures that the address has settled so
               that only the right memory chip will respond.

           ii. The protocol specifies a maximum interval between the falling
                       ____
               edge of MREQ and the time the memory gets its data on the bus.
               (We will see that a separate wait line is provided for use by
               memories that cannot meet this standard.)

         b. Note that, in a synchronous protocol, all control signals are 
            generated by the master.  

            i. The slave has to respond by providing the data, but does not 
               send any control signals to the master.  This simplifies the
               construction of the slave interface.

           ii. However, if the slave failed to respond, the CPU would never 
               know.  (A totally floating bus looks like a byte of all 1's, so 
               if the CPU addressed a non-functioning (or nonexistent) slave it 
               would think that the slave was sending it the value FF and that 
               would be treated as the slave's data.)

         c. Note, too, that in a synchronous system all devices are expected
            to be able to respond within a specific time frame when they are
            addressed.  Since this is not necessarily realistic, many
            synchronous systems include a WAIT control signal that a device
            may assert if it needs more time.
                                                                       
            i. Example: Many bus systems - including that of the Z80   ____
               we will be using in lab - include a control line called WAIT.
               If the device being addressed asserts this, the progress of
               the protocol is held up until it is released.

           ii. A typical use of this control line is to interface memory
               chips with a longer access time to a system.  This is the
               origin of the phrase "zero wait state memory" - describing
               a system whose memory chips are fast enough not to require
               use of this facility.
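
         The behavior described above can be modeled roughly as follows.
         This is a sketch under stated assumptions: the function name, the
         slave table format, and the base cycle length of 3 clocks are all
         hypothetical and are not taken from the Z80 data sheet.

```python
FLOATING_BUS = 0xFF   # value read when nothing drives the data lines


def synchronous_read(slaves, address, base_clocks=3):
    """Return (data, clocks_used) for one synchronous read cycle.

    slaves maps address -> (data, extra_clocks_needed).  base_clocks is an
    assumed cycle length, not a figure from any data sheet.
    """
    clocks = base_clocks
    if address not in slaves:
        # There is no acknowledgement, so the master cannot tell that
        # nobody answered; it simply samples whatever is on the bus.
        return FLOATING_BUS, clocks
    data, extra = slaves[address]
    # A slow slave holds WAIT asserted; each such clock is one wait state.
    clocks += extra
    return data, clocks


slaves = {
    0x1000: (0x42, 0),   # "zero wait state" memory
    0x2000: (0x99, 2),   # slower chip that needs two wait states
}
print(synchronous_read(slaves, 0x1000))
print(synchronous_read(slaves, 0x2000))
print(synchronous_read(slaves, 0x3000))   # nonexistent slave reads as 0xFF
```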

      2. In an asynchronous protocol, the CPU and port EXCHANGE a series of
         signals.  

         a. For example, the following is the protocol for a memory or IO 
            read on the MC68000 microprocessor:
                                 ___________________________________
            Address from CPU  __/                                   \__
                                \___________________________________/

            Address and data  ______                             ____
            strobes from CPU        \___________________________/
                                          ____________________________
            Data from slave   ___________/                            \_
                                         \____________________________/
            _____       
            DTACK from slave  _________________                     ___
                                               \___________________/

            i. The CPU puts the address on the bus, waits for settling time,
               then asserts the strobes (three separate lines).  It then
               waits for the memory/port to respond.

           ii. The memory or port addressed places its data on the bus, waits
                                                   _____
               for settling time, and then asserts DTACK.

          iii. The CPU captures the data, then releases its strobes.  After
               a settling time it also releases its address.
                                                                        _____
           iv. The port, seeing the strobes no longer asserted, releases DTACK,
               then (after a settling time) its data. 

         b. This exchange of control signals is often referred to as 
            "HANDSHAKING".

            Note that the issue in the handshake is the TRANSMISSION of the
            data, not its PROCESSING by the device.  For example, a printer
            may take several milliseconds to print a character.  But its
            interface will handshake with the CPU when the character to be
            printed has been received, not when it has actually been 
            printed. (The CPU must still poll a status bit separately to be 
            sure that the printer has finished printing the preceding 
            character.)  
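
         The four steps of the handshake can be traced in a small model.
         This is an illustrative sketch, not the MC68000's actual logic: the
         function name and the bus dictionary are assumptions, the three
         strobe lines are collapsed into one, and settling times are
         ignored.

```python
def handshake_read(memory, address):
    """Return (data, trace) for one asynchronous read; illustrative only."""
    bus = {"address": None, "strobe": False, "data": None, "dtack": False}
    trace = []

    # i. The master drives the address, then asserts its strobe.
    bus["address"] = address
    bus["strobe"] = True
    trace.append("master: strobe asserted")

    # ii. The slave sees the strobe, drives its data, then asserts DTACK.
    bus["data"] = memory[bus["address"]]
    bus["dtack"] = True
    trace.append("slave: DTACK asserted")

    # iii. The master captures the data only once DTACK is asserted, then
    #      releases its strobe and its address.
    assert bus["dtack"]
    captured = bus["data"]
    bus["strobe"] = False
    bus["address"] = None
    trace.append("master: strobe released")

    # iv. The slave sees the strobe released and releases DTACK, then data.
    bus["dtack"] = False
    bus["data"] = None
    trace.append("slave: DTACK released")

    return captured, trace


data, trace = handshake_read({0x100: 0x7E}, 0x100)
print(hex(data))
for event in trace:
    print(event)
```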

      3. The choice of synchronous versus asynchronous protocols is 
         basically made by the CPU designer.  However, if desired, 
         handshaking can often be added to a synchronous system (e.g. by 
         appropriate use of WAIT on the Z80.)

         For example, suppose we wanted to use handshaking for IO (only) ____
         on a Z80 based system.   For this, we add the new control line  SYNC
         and require that the addressed device assert this to indicate that 
         it has responded to the transfer request.  Of course, now we must 
         make the Z80 wait during an IO cycle for this response to occur.
                    ____
         We can use WAIT for this, as follows:

        ____               _________                    Note that this NAND gate
        SYNC    ____|\o____|        \                           ____
        ____        |/     |         \       ____       asserts WAIT to the Z80
        IORQ    ____|\o____|          )o____ WAIT       if we are in an IO cycle
        __          |/     |          )                  ____         __
        M1      ___________|         /                  (IORQ low and M1 high) 
                           |________/                   and the device hasn't  
                                                                      ____
                                                        yet responded(SYNC high)
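
         The condition stated in the note beside the gate can be written as
         a truth function.  This is a sketch that follows the note's wording
         (an IO cycle in progress, device not yet responded); the function
         name wait_line and its argument names are hypothetical.  All three
         arguments and the result are levels on active-low lines, with True
         meaning high (not asserted).

```python
def wait_line(sync_n, iorq_n, m1_n):
    """Level of the active-low WAIT line (True = high = not asserted).

    Each argument is the level on an active-low line: True means high.
    """
    io_cycle = (not iorq_n) and m1_n    # IORQ low and M1 high
    not_yet_responded = sync_n          # SYNC still high
    return not (io_cycle and not_yet_responded)   # NAND of the two conditions


# Print the full truth table.
for sync_n in (False, True):
    for iorq_n in (False, True):
        for m1_n in (False, True):
            print(sync_n, iorq_n, m1_n, "->", wait_line(sync_n, iorq_n, m1_n))
```

         WAIT is driven low in exactly one of the eight cases.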

      4. Both synchronous and asynchronous protocols have their pros and cons:

         a. In favor of the synchronous approach:

            i. Interfacing is simpler: the slave does not need to send any
               signals back to the CPU.  (However, this advantage goes away 
               if the slave is slow and must request wait states.)

           ii. The synchronous approach is faster overall, since fewer signals 
               must be put on the bus.   (Recall that each signal must be
               followed by a settling time before other activity can occur.)

         b. In favor of the asynchronous approach:

            i. This approach can accommodate a wide variety of interface speeds
               mixed on the same bus.  

               - This allows older and newer technology interfaces to be used
                 on the same system, increasing the range of possible devices
                 that can be interfaced.

               - If a slow interface is replaced with a fast interface, system 
                 speed immediately improves without changing anything else.

           ii. This approach gives a positive assurance that the requested 
               data transfer has actually occurred - i.e. the interface 
               addressed exists, is working, and was able to respond.  (Of 
               course, if an attempt is made to access a nonexistent or 
               nonfunctional device, the CPU could wait forever for a 
               handshake signal that never comes. This is usually handled by 
               having a bus timeout mechanism that causes a trap to a software
               routine that deals with the problem.)

         c. On systems having separate busses for memory and IO, it is common
            to find that the memory bus is synchronous (for speed, and since
            the memories can be assumed to be of uniform technology), while the
            IO bus is asynchronous (for interface flexibility.)
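
         Two of the points made in favor of the asynchronous approach, mixed
         device speeds and the bus timeout, can be sketched together.  The
         names BusTimeout and read_with_timeout, the device table format,
         and the limit of 8 ticks are assumptions made for illustration.

```python
class BusTimeout(Exception):
    """Stands in for the trap taken when no handshake arrives."""


def read_with_timeout(devices, address, timeout_ticks=8):
    """Return (data, tick_at_which_the_handshake_arrived).

    devices maps address -> (data, ticks_until_dtack).  All values are
    assumed for illustration.
    """
    if address in devices:
        data, ticks_needed = devices[address]
    else:
        data, ticks_needed = None, None   # nobody will ever answer
    for tick in range(timeout_ticks):
        if ticks_needed is not None and tick >= ticks_needed:
            return data, tick             # cycle ends as soon as DTACK is seen
    raise BusTimeout(f"no DTACK from address {address:#x}")


devices = {0x10: (0x55, 1), 0x20: (0x66, 5)}   # a fast and a slow interface
print(read_with_timeout(devices, 0x10))
print(read_with_timeout(devices, 0x20))
try:
    read_with_timeout(devices, 0x30)           # nonexistent device
except BusTimeout as error:
    print("trap:", error)
```

         Each cycle lasts only as long as the addressed device needs, which
         is why replacing a slow interface with a fast one speeds up the
         system without any other change.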

   F. Finally, if more than one device can serve as a bus master, there is the 
      matter of BUS ARBITRATION - how is a master chosen if more than one
      device wants to use the bus at the same time?

      1. There are two basic approaches that can be taken.

         a. A centralized approach: one device (often the CPU) is designated
            as the bus ARBITER.  All requests to use the bus are routed to
            it and it gives permission on a priority basis.

         b. A decentralized approach: all potential bus masters look at the
            arbitration lines, and the highest priority device recognizes that
            it has priority and proceeds while all others wait.

      2. An example of a centralized approach: DAISY-CHAINING:

         a. Each device has a BUS-GRANT INPUT (BGI) and a BUS-GRANT OUTPUT 
            (BGO).

         b. The devices are connected in a chain, such that the BGO of one
            device connects to the BGI of its neighbor.  The first device
            on the chain receives an external grant signal (usually coming
            from a centralized arbiter) and the last device on the chain has
            no connection from its BGO.  Usually, all devices are also connected
            to a common request line.

                _______
                REQUEST ------------------------------------------------
        Arbiter                |            |            |            |
                           ----------   ----------   ----------   ----------
                GRANT   ---|BGI  BGO|---|BGI  BGO|---|BGI  BGO|---|BGI  BGO|
                           ----------   ----------   ----------   ----------
                               ||           ||           ||           ||
                  Other ================================================
                  bus signals
        
         c. When the arbiter sees an incoming bus use request and is able to 
            grant it, it asserts BGI to the first device.

         d. Each device behaves as follows:

            i. If its BGI is not asserted, then it does not assert its BGO.

           ii. If its BGI is asserted then

               - If it wants the bus, it uses it and leaves BGO unasserted.
               - Otherwise, it asserts BGO.

         e. The result is that, if multiple devices request the bus, only the
            one nearest the arbiter gets to use it.  Priority is thus fixed by
            a device's position on the chain.
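
         The rules each device follows can be simulated directly.  This is a
         minimal sketch: the function name daisy_chain_grant and the list
         representation of the chain are assumptions made for illustration.

```python
def daisy_chain_grant(wants_bus):
    """Return the index of the device that takes the bus, or None.

    wants_bus[i] is True if device i is requesting the bus; device 0 is
    the one nearest the arbiter.
    """
    if not any(wants_bus):
        return None        # no request, so the arbiter issues no grant
    winner = None
    bgi = True             # the arbiter asserts the grant to the first device
    for index, wants in enumerate(wants_bus):
        if bgi and wants:
            winner = index
            bgo = False    # uses the bus and leaves BGO unasserted
        else:
            bgo = bgi      # passes along whatever arrived on BGI
        bgi = bgo          # this BGO is the next device's BGI
    return winner


print(daisy_chain_grant([False, True, False, True]))   # 1
print(daisy_chain_grant([True, True, True, True]))     # 0
print(daisy_chain_grant([False, False, False, False])) # None
```
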
Copyright ©1999 - Russell C. Bjork