CS222 Lecture: Input-output devices and interfacing     Last revised 3/26/98

Transparencies: IBM 370 block diagram, 
                Priority Encoder, Z80 DMA, Software polled interrupt network

Outline:

1. Device characteristics and control requirements
2. Connecting the CPU to IO devices - options and examples
3. Control options: programmed, interrupt-driven, DMA

I. IO Devices (T's)
-  -- ------- -----

   A. Practically anything can be connected to a computer system as an IO
      device: not only conventional peripherals but such things as automobile 
      carburetors and nuclear power plants.  However, we want to look at some 
      of the more common kinds of device: terminals, printers, tape units,
      disks, and real-time devices.

   B. TERMINALS

      1. Terminals function both as input devices and as output devices.  When
         a terminal is used as an input device, though, the host often has to be
         responsible for "echoing" the typed characters back to it.  Thus, each
         input operation may actually involve both input and output 
         communication with the device.

      2. Input characteristics:

         a. Speed: The communication link may allow speeds of up to 6000
            bytes/second.  

            i. However, when a human is typing, the speed tends to be quite low.
               (Observe that a 60wpm typist is typing about 5 characters per 
               second, with occasional pauses.)

           ii. It is common to connect PC's to a host system as terminals.  The
               maximum data rate can be achieved when the PC is uploading 
               data it has stored internally to the host.

         b. Input tends to come in bursts, with pauses between.

         c. Input can require significant per-character activity by the
            host - including:

            i. Checking for line terminators.

           ii. Processing control characters such as DEL, break etc.

          iii. Echoing the data back for display.  This may involve sending
               back more than one character for one character received - e.g.
               CRLF for CR; BS space BS for DEL on a video terminal etc.
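
               The per-character work listed above can be sketched in a few
               lines.  This is only an illustrative Python sketch, not code
               from any real terminal driver: the names echo_for and
               process_line are hypothetical, and it assumes CR is the line
               terminator and DEL erases the previous character.

```python
# Hypothetical sketch of host-side per-character input processing.
CR, LF, BS, DEL = "\r", "\n", "\b", "\x7f"

def echo_for(ch):
    """Return the characters the host sends back for one received character."""
    if ch == CR:
        return CR + LF           # CR is echoed as CR LF
    if ch == DEL:
        return BS + " " + BS     # back up, blank the character, back up again
    return ch                    # ordinary characters are echoed unchanged

def process_line(typed):
    """Process typed characters up to the line terminator.

    Returns (line delivered to the program, characters echoed to the terminal).
    """
    line, echoed = [], []
    for ch in typed:
        if ch == DEL:
            if line:                         # nothing to erase on an empty line
                line.pop()
                echoed.append(echo_for(ch))
            continue
        echoed.append(echo_for(ch))
        if ch == CR:                         # line terminator ends this input
            break
        line.append(ch)
    return "".join(line), "".join(echoed)
```

               Note that one received DEL produces three echoed characters, so
               a single input operation can generate more output traffic than
               input traffic.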

         d. Often, there is a provision for synchronization protocols.  One
            such protocol, XON/XOFF, uses two special characters ^Q (XON) and
            ^S (XOFF).

            i. If the terminal sends XOFF to the host, then the host will not
               send any more data to the terminal until the terminal sends XON.  This
               allows terminals to keep up with bursts of data from the CPU -
               especially important for hardcopy devices or PC's that have
               to pause to write data to a local disk from time to time.

           ii. In like manner, the terminal may also respond to XON/XOFF from
               the CPU.  This is especially necessary for block mode terminals
               or PC's.
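
               The host's side of this protocol might look something like the
               following.  This is a simplified Python sketch under stated
               assumptions: the class HostSender and its methods are
               hypothetical, and "sending" is modeled by appending to a list.

```python
XON, XOFF = "\x11", "\x13"       # ^Q and ^S

class HostSender:
    """Host side of XON/XOFF: holds output while the terminal has sent XOFF."""

    def __init__(self):
        self.paused = False
        self.pending = []        # characters waiting to go to the terminal
        self.sent = []           # characters actually transmitted

    def queue(self, text):
        """Program output destined for the terminal."""
        self.pending.extend(text)
        self._drain()

    def receive(self, ch):
        """Handle one control character arriving from the terminal."""
        if ch == XOFF:
            self.paused = True   # terminal asked us to stop sending
        elif ch == XON:
            self.paused = False  # terminal is ready again
            self._drain()

    def _drain(self):
        while self.pending and not self.paused:
            self.sent.append(self.pending.pop(0))
```

               Output queued while the sender is paused is simply held until
               the terminal sends XON; nothing is discarded.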

         e. There are three options for handling this per-character processing:

            i. The CPU can do it.
           ii. A special terminal interface board with its own microprocessor
               can handle it.  Such boards typically handle 8-32 lines.
          iii. A dedicated "front end" minicomputer can handle it for all
               terminals on the system.  This is common on larger mainframes.

      3. Output characteristics

         a. Terminals are generally line-oriented - i.e. the host normally
            sends the characters making up a single line in one burst, often
            with pauses between lines.  (But not always)

         b. Some per-character processing may be required for output, too -
            e.g. Conversion of control characters like Form Feed, Tab to
            line feeds, spaces.

         c. Note that XON/XOFF characters sent to the host as input by the
            terminal actually control output from the host to the terminal,
            and vice versa.

   C. PRINTERS are output only devices and are available with a broad spectrum 
      of capabilities, from relatively slow dot-matrix printers to fast laser 
      and page printers.  The former may print under 100 cps; the latter 10's 
      of 1000's of cps.

   D. GRAPHICS DEVICES, both video and hardcopy, have many of the 
      characteristics of printers.  (In fact, many printers can also
      do graphics.)  Video graphics devices impose special requirements on the
      system, because the display must be continually refreshed.  This is
      generally accomplished by including a local memory in the device to store
      a "bit map" of the screen as displayed.  Sometimes, this local memory is
      seen as part of the regular system memory space by the host CPU.

   E. POINTING DEVICES (mice, graphics tablets etc.) are input devices often 
      used in conjunction with video graphics devices.  

      1. They typically send "messages" to the CPU to report "events" such as 
         mouse movement (rotation of the ball), button clicking, or the touching
         of the tablet.

      2. A distinctive characteristic of these devices is that their input is
         often unsolicited.  The CPU must be able to receive and respond to a
         message from them at any time.

   F. MAGNETIC TAPE has already been discussed under memory.  Tape devices have
      data rates of 100,000 to 1 million cps, and are used for both input and
      output.

   G. DISK has also been discussed already.  Data rates may exceed 1 million 
      cps.  Disks are also used for both input and output.

   H. REAL TIME DEVICES

      1. This broad category includes various sensors (input devices) and 
         effectors (output devices) that might be found in process control 
         environments.  The distinctive characteristic of these 
         devices is that the outside world is directly affected by the
         computation in progress as it occurs. 

      2. The requirements vary widely, but range from very low volumes of data
         (e.g. a simple A/D converter) to very high volumes (e.g. an image
         digitizer.)  Often, fast CPU response to certain kinds of external
         events detected by a sensor is expected (e.g. to an 
         over-temperature or over-pressure alarm in process control.)

   I. NETWORK INTERFACES

      1. A computer that is part of a network will contain a network
         interface that connects to the computer's system bus on one side and
         to the network on the other side.  (For illustration, we will assume
         that the network is an ethernet.)  The interface must be capable of 
         two basic operations, plus additional functions:

         a. Transferring a stream of data from the computer to the network -
            which may involve waiting for the network to be idle before
            initiating a transmission.  (Ethernet is a shared bus).

         b. Monitoring the data coming over the network, and receiving any
            data that is addressed to this particular computer (or is
            broadcast to all computers on the network), while ignoring all
            other data.

         c. Data is transmitted in units called packets, which have headers
            that contain network control information, followed by the actual
            data; frequently, the interface is made responsible for generating 
            and interpreting these headers.

         d. For these reasons, a network interface will often be built around
            a dedicated microprocessor on the interface card itself.
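
            The receive-side filtering and header handling can be sketched as
            follows.  This Python sketch assumes the usual ethernet header
            layout (6-byte destination address, 6-byte source address, 2-byte
            type/length field, then the data); the function names are
            hypothetical, and a real interface does this in hardware or
            firmware.

```python
BROADCAST = b"\xff" * 6          # all-ones destination: every station receives it

def build_frame(destination, source, type_or_length, data):
    """Generate the header and prepend it to the data (transmit side)."""
    return destination + source + type_or_length.to_bytes(2, "big") + data

def parse_header(frame):
    """Split a received frame into (destination, source, type_or_length, data)."""
    return (frame[0:6], frame[6:12],
            int.from_bytes(frame[12:14], "big"), frame[14:])

def accept_frame(frame, my_address):
    """True if the interface should pass this frame on to the host."""
    destination = frame[0:6]
    return destination == my_address or destination == BROADCAST
```

            Frames that fail accept_frame are dropped by the interface, so
            the host CPU never has to look at traffic meant for other
            machines.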

      2. The data rate that must be supported depends, of course, on the
         type of network; for ethernet it can be either 10 Mbits/sec (a little
         over 1 Mbyte/sec) or 100 Mbits/sec; other networks can reach gigabit
         data rates.

   J. THE SYSTEM CLOCK

       1. Most systems include a real-time clock that keeps track of the current
          time of day.  Often, this clock will be set up to interrupt the CPU
          at regular or programmable intervals (often every 1/60 second as
          tied to the power-line frequency).  At every tick, system software
          updates the internal time.

       2. Clocks are IO devices only in the sense that they interface to the
          CPU like IO devices.
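
          The software side of the tick can be sketched as follows.  This
          Python sketch assumes a 60 Hz tick; the class SystemClock is
          hypothetical, and tick() stands in for the clock's interrupt
          service routine.

```python
TICKS_PER_SECOND = 60            # assumed: tick derived from 60 Hz power line

class SystemClock:
    def __init__(self):
        self.ticks = 0           # ticks since midnight

    def tick(self):
        """Called once per clock interrupt."""
        self.ticks += 1

    def time_of_day(self):
        """Return (hours, minutes, seconds) derived from the tick count."""
        seconds = self.ticks // TICKS_PER_SECOND
        return (seconds // 3600) % 24, (seconds // 60) % 60, seconds % 60
```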

   K. MISCELLANEOUS DEVICES

      In addition to the devices we have listed here, there are, of course,
      many others that could be named - scanners, video input, etc.
      The principles we develop for the devices we did list will work with 
      these, too.

   L. Summary of Requirements Imposed on the System

Device          I/O     Transfer rate (cps)     Special CPU involvement

Terminal/PC     I,O     up to 6000              Considerable per
                                                character processing
Printer         O       100-100,000 +

Graphics device O       Varies widely

Pointing device I       Generally very low      Respond at any time to
                                                message from device

Tape            I,O     up to 1 million or so   Blocking/deblocking data

Disk            I,O     up to 2 million or so   Blocking data;  scheduling of 
                                                requests to minimize head 
                                                movement

Real Time       I,O     varies                  varies - often needs very fast 
                                                response to certain events

Network         I,O     up to gigabits/sec      Considerable processing
                                                generally done by dedicated CPU
                                                that is part of the interface

Clock           I       none                    Update internal time each "tick"

II. Connecting the CPU to IO Devices - Options and Examples
--  ---------- --- --- -- -- ------- - ------- --- --------

   A. In discussing IO systems, one will often hear reference to an "IO port".
      A port is simply a mechanism for the transmission of information
      to/from a particular device.  Implementing a port requires the building
      of an interface to the system bus, plus an interface to the device.

      1. The interface to the device may take one of three forms:

         a. It may be a serial line (one wire for each direction of data flow).
            This is commonly used for low data-rate devices (terminals, some 
            printers and graphics devices, pointing devices) - especially when
            they are located at a considerable physical distance from the CPU.

         b. It may be a parallel cable (one wire per bit of data to be
            transferred at one time, plus control lines).  This is required for
            high data-rate devices.  Because these busses are built for speed,
            their physical length must be kept short, requiring the devices
            they serve to be physically close to the CPU.

         c. An extension of the idea of a parallel interface to the device is
            to use a bus system in its own right to service multiple devices
            from one controller.

            i. In large computer systems, controllers for devices like disks 
               are often set up this way, to allow one controller to service
               multiple disks.

           ii. On smaller systems (PC's and workstations), there are a number
               of peripheral bus standards in common use - e.g. SCSI (Small 
               Computer System Interface, pronounced "scuzzy") and IDE.
               The computer system itself contains a single interface from its 
               internal bus to the external bus, and then individual devices in
               turn are interfaced to it.  (Of course, now each device must 
               have its own internal interface between the external bus and the 
               device proper!)

         d. The MPF-I uses both serial and parallel IO.

            i. Communication with the keyboard and display is done in parallel.

           ii. But data is stored on cassette tape as serial data.

         e. We will begin by focussing on parallel IO.  Serial IO always 
            requires an interface that connects to the system bus as a parallel 
            device and to the external device as a serial device, so we need 
            to understand parallel operation first.

   B. In designing an IO system, there are a number of key design choices
      to be made.  Some of these choices are determined by the architecture
      of the CPU, while others are made by the system designer.

      1. Separate IO processor versus CPU responsible for IO.

         a. On very large mainframe computer systems, it is common to have
            one or more separate programmable processors responsible for
            handling IO, with the main CPU doing computation only.  This
            allows each kind of processor to be customized for the particular
            task it has to perform.

            Example: IBM 370 - TRANSPARENCY

            This approach has found its way into smaller computers as well;
            for example, some microcomputer families offer the system designer
            the option of either having the microprocessor do IO itself or 
            adding a special IO processor to the system configuration.  (But
            this is always an option, never a necessity as with the 370.)

         b. Note that, regardless of which approach is taken, the following
            discussion is still relevant either to the CPU itself or to
            the IO processor/coprocessor - though we will assume that the
            CPU handles IO for simplicity.

      2. Several of the bus system design issues we discussed in an earlier
         lecture come into play here.

         a. Separate IO and memory busses versus a single physical bus.

         b. Separate IO address space versus Memory-mapped IO. 

            i. Primarily an issue on systems with a single physical bus - 
               though memory mapped IO COULD be used on any system by ignoring 
               the separate IO bus if it is present and hooking everything to 
               the memory bus.

           ii. Mandatory if the CPU has a single physical bus which is NOT
               divided into two logical busses, and no special I/O instructions.

               Example: PDP-11, VAX, Motorola 680x0 series processors

         c. Synchronous versus asynchronous bus protocols.

         d. Linear select versus partially decoded versus fully-decoded 
            addressing.

            i. In any interfacing scheme, an IO port must be able to recognize 
               and respond to its unique address (whether that be an IO port 
               number or an address within the memory map.)  This requires 
               decoding logic.

           ii. In a fully-decoded addressing scheme, all of the relevant bits of
               the address bus are considered by the decoder.  This means that 
               the device will respond to the correct address, and no others.  
               However, this requires considerable decoding logic (e.g. 8 
               address bits + 3 control lines for a port on the Z80).
               Therefore, on smaller systems one of two shortcuts is often used
               to reduce decoding logic requirements.

          iii. Partial decoding: only some of the relevant address bits are 
               actually decoded; the rest are ignored.  This means that the 
               port will actually respond to several different addresses, 
               though only one is its "official address".

               Example: the address of 8255 #1 port A on the MPF-I is 80 (hex).  
               However, since the decoding logic for IO ports ignores A3 and 
               A2, this chip will actually respond to port numbers 84, 88, and 
               8C (hex) as well.
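
               The set of addresses a partially-decoded port responds to can
               be enumerated mechanically.  The following Python sketch is
               illustrative only; the function name aliases is hypothetical,
               and the port is assumed to have an 8-bit port number.

```python
def aliases(official, ignored_bits, width=8):
    """List every port number that a partially-decoded port responds to.

    official     -- the port's "official" address
    ignored_bits -- address bit numbers the decoder does not examine
    """
    mask = 0
    for bit in ignored_bits:
        mask |= 1 << bit
    decoded = official & ~mask               # the bits the decoder does check
    return [addr for addr in range(1 << width)
            if (addr & ~mask) == decoded]

# Port A of 8255 #1: official address 80 hex, decoder ignores A3 and A2.
print([format(a, "02X") for a in aliases(0x80, [3, 2])])
```

               Each ignored bit doubles the number of addresses the port
               occupies, so ignoring two bits gives four addresses in all.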

           iv. If the number of ports is small, it may be possible to assign 
               port addresses so that each port has one bit in its address that
               is unique to it.  This address bus bit can then be used directly
               as a chip select, with no decoding.

               Example: A Z80-based system with 8 ports might assign (binary) 
               addresses to the ports as follows:

                        1111 1110
                        1111 1101
                        1111 1011
                        1111 0111
                        1110 1111
                        1101 1111
                        1011 1111
                        0111 1111

                   (Note that each address includes exactly one 0.)

               Now suppose each device has two separate low-true chip select 
               inputs, as is often the case.  One of these can be derived by 
                                          ____      __
               NAND-ing the complement of IORQ with M1.  This will be common 
                                                        ____            __
               to all chips: no chip can respond unless IORQ is low and M1 is 
               high.  Individual selects for each port can now be derived as 
               follows. For the first:
                __
                CS = A0

               for the second:
                __
                CS = A1

               etc.
   
               (Of course, if each chip includes some address decoding logic on
                it, as the 8255's do, we might only include (say) 6 of the 8 
                address bus bits in this linear select scheme.)
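
               A small model shows both how linear select works and its main
               hazard.  This Python sketch is illustrative; the names are
               hypothetical, and it models only the per-port address-bit
               select, assuming the common IORQ/M1 select is already active.

```python
# The eight port addresses from the example: each has exactly one 0 bit.
PORT_ADDRESSES = [0xFF & ~(1 << bit) for bit in range(8)]

def selected_ports(address):
    """Return the ports (0..7) whose low-true chip select is asserted.

    Port n uses address bit An directly as its chip select, so it is
    selected whenever that bit is 0.
    """
    return [bit for bit in range(8) if not (address >> bit) & 1]
```

               An address containing more than one 0 selects more than one
               port at once (for example, binary 1111 1100 selects both the
               first and second ports), so software must use only the
               assigned addresses.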

   C. Basic requirements for implementing an IO port

      1. Regardless of what schemes we use for port addressing and selection, 
         bus protocol etc. each port must meet certain requirements:

         a. It will have an interface to the system data bus.

            i. If it is an input port (data from it can be read by the CPU), it 
               must have one tri-state bus driver (buffer) per bit, which will 
               be enabled only when the CPU is reading from the port.

           ii. If it is an output port (data can be sent to it from the CPU), 
               it must have one latch per bit to hold the data the CPU sends, 
               since the data will only be on the data bus for a brief time.  
               This latch must be enabled to capture data only when the CPU is 
               writing to the port.

         b. It will have to include the necessary logic to respond to system 
            bus control signals and (if necessary) to generate handshake 
            signals or wait state requests.

         c. It will have an interface to the device it serves.

      2. Often, additional facilities are included in a port

         a. An input port may include a latch to hold the data from the device
            until the CPU reads it.

         b. The port may have handshaking signals with the device it serves to
            control transfer between the port latches and the device.

         c. The port may have one or more status bits that the CPU can read to
            determine if the port is ready for a data transfer.  (These status
            bits will actually be part of a separate status port that the CPU 
            can examine.)

         Example: input port for a keyboard.  When a key is struck, a byte is
                  transferred to a latch in the port.  The "data available"
                  status bit will be set and will remain set until the data is
                  read by the CPU.  (This bit will be part of a separate port.)

         d. The port may be able to generate an interrupt to inform the CPU that
            the device has just completed a data transfer to/from the latches in
            the port, so that it would now be appropriate for the CPU to do a
            transfer to/from the port.

            Example: In the above, the setting of "data available" may trigger 
                     an interrupt.          
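
            The keyboard port in the example above can be modeled as follows.
            This is a behavioral Python sketch, not a description of any
            particular chip: the class KeyboardPort, its method names, and
            the choice of bit 0 for "data available" are all assumptions.

```python
class KeyboardPort:
    DATA_AVAILABLE = 0x01        # assumed bit position in the status port

    def __init__(self):
        self.latch = 0           # holds the byte until the CPU reads it
        self.status = 0

    def key_struck(self, code):
        """Device side: latch the byte and set 'data available'."""
        self.latch = code & 0xFF
        self.status |= self.DATA_AVAILABLE

    def read_status(self):
        """CPU side: read the separate status port."""
        return self.status

    def read_data(self):
        """CPU side: reading the data port clears 'data available'."""
        self.status &= ~self.DATA_AVAILABLE
        return self.latch
```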

III. Control Options

   A. We have seen how parallel devices can be interfaced to a computer
      system bus by the use of appropriate devices.  We now turn to a 
      consideration of mechanisms for transmitting data to/from the device
      at an appropriate rate.

   B. IO devices vary widely in the rates at which they can handle data.  

      1. At one end of the spectrum, the rate at which a keyboard produces 
         data is limited by the typing speed of a human typist, which is rarely
         more than 5 characters per second, and can be as slow as one character
         every few seconds (or even slower).  Pointing devices (e.g. mice) have
         similar characteristics.

      2. At the other end of the spectrum, devices such as standard magnetic 
         tape (not cassettes) or disks can transfer data at rates in excess of 
         1 million bytes/second.  Fast network interfaces can transfer
         10's or even 100's of millions of bytes/second.

      3. Compare these numbers with CPU speeds, where the clock rate may be
         on the order of 100s of MHz.  In some cases, the CPU could execute
         1000's of instructions in the time the IO device takes to process
         one byte of data; in other cases, the CPU cannot even execute one
         instruction in this time.
 
   C. For any device then, we need some way of coordinating the device speed
      and the CPU speed.  In most cases, the CPU will have to wait for the
      slower IO device to perform its work; but in other cases the reverse may
      be true.

   D. For illustration, we will use a typical moderate-speed printer, which
      may print at a rate of 100 CPS (characters per second) or so, or roughly 
      1 millionth of the clock rate of a typical CPU.   (We will assume
      for now that the printer is connected as a parallel device, though
      a serial connection may also be used.  We will also ignore the possibility
      of the printer having its own memory to buffer incoming characters.)

      1. If we assume that the CPU executes a loop in which it sends
         characters to the printer as fast as it can, and the loop contains
         10 instructions, then the CPU can send 10 million characters per
         second and the printer can handle only 100.  If we did this, all but
         about one character in every 100,000 would be lost!
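
         The arithmetic behind these figures can be checked directly.  The
         Python sketch below assumes a 100 MHz clock and one instruction per
         clock cycle, which is consistent with the figures above but is an
         assumption; real instruction timings vary.

```python
CLOCK_HZ = 100_000_000           # assumed: 100 MHz, one instruction per cycle
LOOP_INSTRUCTIONS = 10           # instructions per character sent
PRINTER_CPS = 100                # characters the printer can actually print

cpu_chars_per_second = CLOCK_HZ // LOOP_INSTRUCTIONS
lost_per_second = cpu_chars_per_second - PRINTER_CPS
lost_fraction = lost_per_second / cpu_chars_per_second

print(cpu_chars_per_second, lost_per_second, lost_fraction)
```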

      2. To prevent such things from occurring, devices such as printers are
         typically built with some sort of status flag which indicates whether
         the printer is able to accept a character.  This flag will be part of
         a status byte that the CPU can read (possibly along with other flags
         such as "out of paper" etc.)  Thus, the interface to the printer will
         include at least two separate ports: an output port to which an ASCII
         character may be sent, and an input port which may be read in order to
         determine the device's status. 

      3. Two different kinds of status flags may be used:

         a. A "ready" flag would be 1 to indicate that the device is able to
            accept a new character, and 0 to indicate that it cannot accept
            a character because it is still printing a previous one.

         b. A "busy" flag would be 1 to indicate that the device is printing a
            previously-sent character, and so cannot accept a new one, and 0
            to indicate that it is available to accept a new character.

         c. Clearly, these definitions of "ready" and "busy" are complementary;
            if the designer of a particular system chose to implement a ready
            flag, and a user wanted a busy flag, all he would have to do is
            invert the flag supplied.

         d. For our discussion, we will assume a printer with a "ready" flag.
            We assume the printer manages this flag as follows:

            - When it is first turned on, the printer sets its ready flag to 1.
            - When it receives a character to be printed, it sets its ready
              flag to 0.
            - When it has finished printing the character, it sets its ready
              flag back to 1.
      
      4. To prevent loss of data, we impose the following requirement: a
         character to be printed can only be sent to the printer's output
         port when the printer's ready flag is 1.

   E. Four basic approaches may be taken to synchronizing the CPU and its
      external devices.

      1. A strictly hardware-based approach, in which the device interface
         "hangs" the CPU if a data transfer is attempted when the device is
         not ready, by using handshaking or WAIT control lines.  

         a. In our example, we might construct a configuration like the
            following:
         ______                                    ____      
         Select for printer from decoder --------o|    \    |\        ____
                                                  |     )---| >o----- WAIT
         Ready flag from device -----------------o|____/    |/
                                      ____
         b. This circuit would assert WAIT to the CPU (thus forcing some number
            of wait states to occur) whenever the device was selected by an IO
            operation but was not ready for it.

         c. Such an approach, however, is seldom desirable.  A busy device will
            hang up the entire system, leaving it unable to do anything else
            until the device has finished its work.  Thus, we shall not pursue
            this approach any further.

      2. A strictly software-based approach, in which the CPU tests the device
         status before sending data.

         a. The simplest form of this is an approach known as busy waiting.
            Suppose, for illustration, that:

            - Printer data to be printed is to be output to port 40
            - Printer status can be read from port 41.  The low order bit of
              the status is the ready bit.

            Then the following subroutine could be used to print a character
            from the A register.  (The example code is for a Z80):

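                PRTCHR  PUSH    AF              ; save the character to print
                WAIT    IN      A,(41H)         ; read the printer status port
                        BIT     0,A             ; test the ready bit (bit 0)
                        JP      Z,WAIT          ; not ready: test again
                        POP     AF              ; recover the character
                        OUT     (40H),A         ; send it to the data port
                        RET

            Note that the loop from WAIT through JP would typically be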
            executed over a thousand times for each character printed.   As
            in the previous strictly-hardware example, the CPU would do nothing
            else during this time.
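
            The cost of busy waiting can be seen in a small simulation.  This
            Python sketch is illustrative only: SimulatedPrinter and
            print_char are hypothetical names, and the time the printer
            spends printing is modeled as a fixed number of status reads
            (1000 here) rather than as real time.

```python
class SimulatedPrinter:
    """Printer with a ready flag, as managed in the description above."""

    def __init__(self, busy_polls=1000):
        self.busy_polls = busy_polls   # status reads needed to print one char
        self.remaining = 0             # 0 means the ready flag is 1
        self.printed = []

    def read_status(self):             # models IN A,(41H); bit 0 = ready
        if self.remaining > 0:
            self.remaining -= 1
            return 0
        return 1

    def write_data(self, ch):          # models OUT (40H),A
        if self.remaining > 0:
            raise RuntimeError("character sent while printer not ready")
        self.printed.append(ch)
        self.remaining = self.busy_polls   # ready flag drops to 0

def print_char(printer, ch):
    """Busy-wait until ready, then send; return the number of wasted polls."""
    wasted = 0
    while (printer.read_status() & 1) == 0:
        wasted += 1
    printer.write_data(ch)
    return wasted
```

            The first character goes out immediately, since the printer
            starts out ready; every later character costs the full wait.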

         b. Another variant of this is an approach known as polling.  Here,
            instead of looping forever on a status test, the CPU arranges to
            test the status periodically, doing other useful things in the
            meantime.  Such an approach might be used, for example, on a CPU
            dedicated to serving the IO needs of a large number of IO devices
            (e.g. a communications "front end" processor on a mainframe system).
            Such a processor might execute code like the following:

                for i := 1 to NumberOfDevices do
                    test status of device i
                    if it is ready, then service it
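
            A runnable version of this loop might look like the following
            Python sketch.  The Device class and poll_once function are
            hypothetical; "servicing" a device is modeled simply as clearing
            its ready flag and counting the transfer.

```python
class Device:
    def __init__(self, name):
        self.name = name
        self.ready = False       # status flag the poller tests
        self.serviced = 0        # number of transfers performed

    def service(self):
        """Perform one data transfer; the device is then busy again."""
        self.ready = False
        self.serviced += 1

def poll_once(devices):
    """Make one pass over all devices, servicing those that are ready."""
    handled = []
    for device in devices:
        if device.ready:
            device.service()
            handled.append(device.name)
    return handled
```

            A front-end processor would call poll_once repeatedly; devices
            that are not ready cost only one status test per pass.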

         c. With somewhat more difficulty, polling of devices might be
            intermixed with other kinds of computation.  This is made difficult
            by the need to have the other computational routines "remember"
            to call the polling routines from time to time.

         d. Except for CPU's totally dedicated to IO, total software control of
            IO is rarely satisfactory.

         Example: simple microcomputer operating systems such as CP/M and
                  MS-DOS use a polling approach to keyboard input.  For
                  example, in CP/M the keyboard ready flag is checked under 
                  two circumstances:

                  - as part of a busy waiting loop whenever the current program
                    needs keyboard input.  (At this point, busy waiting is
                    appropriate since the program cannot proceed until the data
                    is obtained.)

                  - Whenever IO is done to another device such as the screen
                    or printer, CP/M also checks the keyboard flag.

                  The effect of this is that one can control-C a running 
                  program which does output to the screen or printer; but the
                  only way to stop a program that does no IO at all is to
                  reset the system!

      3. A third approach to IO control uses a mixture of hardware and software
         techniques, by utilizing the interrupt capabilities of the CPU.  This
         approach is called interrupt-driven IO.

         a. Most CPU's have one or more interrupt control lines, which may be
            asserted by an external device.  (For now, we assume just one.) We 
            begin by connecting the device's ready flag to the CPU's 
            interrupt input, so that an interrupt is requested whenever the
            device needs CPU attention:

                                             (other devices)
                                        |\      |        ___
        Ready flag of device -----------| >o----O------- INT
                                        |/      |

                                (Note: this gate is normally an
                                 open collector gate to allow
                                 multiple devices to connect to the
                                 same line.)

         b. We further arrange for the CPU to respond to the interrupt request
            by executing a software routine that performs an appropriate data
            transfer to/from the device, thus clearing the ready flag and
            removing the interrupt condition until the operation is complete,
            at which time a new interrupt will be generated.

         c. Interrupt-driven IO has several complexities that must be dealt with
            in a complete system:

            i. With the simple hardware configuration shown above, we assumed
               that we would always want the device to interrupt when it becomes
               ready.  In the case of a device like a printer, however, there 
               may be times when we have no work for it to do.  In such a case,
               we want to be able to tell the device to quit interrupting until
               some more work comes along.  This is conventionally handled by
               including an interrupt-enable flag in each interface, which the
               CPU can set and clear to determine whether that particular
               device may interrupt.

                                     ____     (other devices)
        Ready flag of device -------|    \      |        ___
                ____                |     )o----O------- INT
                |  |----------------|____/      |
                |  |
                |__| Interrupt-enable flip flop (settable/clearable by CPU)
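
               The logic of this circuit can be expressed in a few lines.
               This Python sketch is a behavioral model only; the Interface
               class and int_asserted function are hypothetical names.

```python
class Interface:
    def __init__(self, name):
        self.name = name
        self.ready = False               # ready flag from the device
        self.interrupt_enable = False    # flip flop set/cleared by the CPU

    def requesting(self):
        """The NAND gate pulls INT low only if ready AND enabled."""
        return self.ready and self.interrupt_enable

def int_asserted(interfaces):
    """True when at least one interface pulls the shared low-true INT line."""
    return any(i.requesting() for i in interfaces)
```

               A ready device whose interrupt-enable flag is clear has no
               effect on the INT line, which is exactly what we want when
               there is no work for it to do.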

           ii. If the system has more than one IO device (as it generally does),
               some provision must be made to cause the software for the proper
               device to be invoked when the interrupt is received.  

          iii. Further, if two or more devices become ready at the same time we
               want to guarantee that each is serviced in turn without 
               interference from the other.  We may wish to prioritize the 
               interrupts so that the highest priority device gets served
               first, and we may even wish to allow a higher priority device to
               interrupt a lesser-priority one.

         d. The problem of identifying the device responsible for the interrupt
            can be handled in one of two ways:

            i. The simplest approach (from a hardware standpoint) is simply to 
               have the CPU poll all the devices to see which one is in need
               of service.  However, this makes the interrupt service routine
               slow, and so is not generally desirable.
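
               A minimal sketch of the polling approach, assuming each
               interface exposes a readable ready flag.  (The Device class and
               the device names are hypothetical.)

```python
from dataclasses import dataclass


@dataclass
class Device:
    name: str
    ready: bool = False   # models the interface's ready flag


def poll_for_interrupter(devices):
    """Return the first device whose ready flag is set, or None.

    Devices are checked in list order, so list order acts as a priority.
    The work grows with the number of devices, which is why polling
    slows down the interrupt service routine.
    """
    for dev in devices:
        if dev.ready:
            return dev
    return None


devices = [Device("keyboard"), Device("printer"), Device("disk")]
devices[1].ready = True
devices[2].ready = True
found = poll_for_interrupter(devices)   # the printer is found first
```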

           ii. Instead, most systems make some provision for the interrupting
               device to place some data on the system bus to identify itself.
               This is done in response to an interrupt acknowledge signal from
               the CPU - e.g.

                                _______                              ________
        Interrupt request              |                            |
                                       |____________________________|
                                ________________                   __________
        Acknowledge                             |                  |
                                                |__________________|
                                                    ________________
        Device identification   ___________________/                \________
                                                   \________________/

        (Note: we assume that the device also uses the acknowledge as an
         indicator to remove its request.)

          iii. Anything that will uniquely identify the device can be chosen
               for the device's response; but the most typical choice is to
               require the device to put on the bus a memory address which
               is either:

               - The starting address of a service routine for the device.

               - The address of a memory location which contains the starting
                 address of a service routine for the device.  

               (The second option is more flexible since it allows system
                software to be restructured without rewiring the devices,
                so long as a table of service routine addresses is kept in
                a fixed, known location.)

               This approach is known as VECTORED interrupts.

               Example: On the Intel 8086/8088, memory locations 0..3FF are
                        reserved for interrupt vectors.  An interrupting
                        device puts an 8 bit interrupt type on the bus,
                        which the CPU uses as an index into a table of 32
                        bit addresses of service routines.  Each device is
                        generally assigned a unique interrupt type from
                        among the 256 possible values.  A very similar
                        approach is used by later members of the 80x86 family,
                        except the table can be anywhere in memory, not
                        necessarily at 0..3FF (a special CPU register is
                        used to point to it.)
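
               The 8086 lookup can be sketched as follows.  Each table entry
               occupies 4 bytes (a 16 bit offset followed by a 16 bit segment,
               low byte first), so the entry for type n starts at address 4*n.
               The handler location used in the example is hypothetical.

```python
def vector_entry_address(int_type):
    """Each table entry is 4 bytes, so type n lives at address 4*n."""
    if not 0 <= int_type <= 0xFF:
        raise ValueError("interrupt type must fit in 8 bits")
    return 4 * int_type


def read_vector(memory, int_type):
    """Return (segment, offset) stored for the given interrupt type."""
    base = vector_entry_address(int_type)
    offset = memory[base] | (memory[base + 1] << 8)
    segment = memory[base + 2] | (memory[base + 3] << 8)
    return segment, offset


memory = bytearray(0x400)            # locations 0..3FF
# Hypothetical handler for type 21H at segment 1000H, offset 0234H.
memory[0x84:0x88] = bytes([0x34, 0x02, 0x00, 0x10])
```

               Note that the last entry (type FFH) starts at 3FCH, which is
               why 256 entries exactly fill locations 0..3FF.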

           iv. Vectored interrupts are the most sophisticated scheme for
               identifying the interrupting device, and are available on most
               medium and larger CPUs, including many micros.

         e. The problem of multiple devices interrupting at the same time
            can be handled in several different ways.

            i. First, all CPU's have some mechanism whereby interrupt
               recognition can be temporarily disabled.  (The request is
               present, but the CPU ignores it until interrupts are
               re-enabled.)  This allows an interrupt service routine to
               protect itself from interrupts by other devices.

           ii. But we still have to ensure that when an interrupt is
               accepted only one device will respond.  One way to do this
               is by daisy-chaining.  We add one new input and one new output 
               to each interface, known as IEI (interrupt-enable-in) and IEO 
               (interrupt-enable-out).

               The various interfaces are connected as follows:
                    ___
                    INT ------------------------------------------------
                               |            |            |             |
                           ----------   ----------   ----------   ----------
        Acknowledge    ---|IEI  IEO |---|IEI  IEO|---|IEI  IEO|---|IEI  IEO|
                           ----------   ----------   ----------   ----------
                               ||          ||            ||           ||
                  Other ================================================
                  bus signals
        
               Note that the first device receives an input of 1 when the
               CPU acknowledges an interrupt, and either keeps it or passes
               it on to the next device.  On a Z80, the IEI input to the
               first device might be derived as follows:
                ____       _____
                IORQ ----o|     \
                __        |      )---- IEI to first device
                M1 ------o|_____/

               Each interface is now wired something like the following:

                                               ____
                        ----------------------|    \
                       |              |\    /-|     )---------  IEO            
                       |           +--| >o-/  |____/
                       |           |  |/       ____
                IEI ---+-----------)----------|    \    Put device's vector
                                   |          |     )-- on bus
                Internal request --+----------|____/
                                   |
                                   |  |\                ___
                                   +--| >o------------  INT
                                      |/
                                       ___
               - Any device may assert INT, and multiple devices may do so at
                 the same time.

               - However, only the requesting device nearest the CPU will see
                 the acknowledge signal, and so it alone will put its vector
                 on the bus.

               - To prevent race conditions, however, we must ensure that no
                 device near the CPU decides to request an interrupt (and
                 thus "steal" IEI) when a device further down the chain is in
                 the process of being acknowledged.  This can be done by
                 wiring the internal request so that it cannot be set when IEI
                 coming into the interface is high.
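
               The acknowledge propagation can be sketched in a few lines.
               This models only the steady-state logic of the interface
               diagram above (IEO = IEI AND NOT request; vector driven when
               IEI AND request), not the timing or the race-prevention wiring.
               The function name is an assumption.

```python
def daisy_chain_acknowledge(requests):
    """Propagate an acknowledge down the daisy chain.

    requests[i] is True if interface i has an internal request pending;
    index 0 is the interface nearest the CPU.  Returns the index of the
    interface that puts its vector on the bus, or None if no interface
    is requesting.
    """
    iei = True  # the CPU's acknowledge feeds IEI of the first interface
    for index, requesting in enumerate(requests):
        if iei and requesting:
            return index               # IEI AND request: drive the vector
        iei = iei and not requesting   # IEO = IEI AND NOT request
    return None
```

               With requests [False, True, True] the result is 1: the second
               interface keeps the acknowledge, so the third never sees it.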

          iii. Another way to handle the problem of multiple devices 
               interrupting at the same time is by the use of a special purpose
               support chip called a priority encoder.

               - As an example, a one out of eight priority encoder has 8 inputs
                 and 4 outputs.  The inputs are numbered 0, 1, 2 ... 7, with
                 7 being the highest priority input and 0 the lowest.

               - One of the outputs is asserted if at least one of the inputs
                 is asserted.  This output is called GS.

               - The remaining three outputs encode the number of the highest
                 priority input that is currently asserted.  (If no input is
                 asserted, these outputs are normally ignored.)  These 
                 outputs are designated A2, A1, A0.  

               Examples: 

                    - No input is asserted.  GS is not asserted, A2..A0 ignored.
                    - Input 4 is asserted.   GS is asserted, A2..A0 encode 4.
                    - Inputs 4,5 asserted.   GS is asserted, A2..A0 encode 5.
                    - All inputs asserted.   GS is asserted, A2..A0 encode 7.
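
               The behavior in these examples can be sketched as a function.
               This is a logic-level model only (real encoder chips commonly
               use active-low signals), and the helper names are assumptions.

```python
def priority_encode(inputs):
    """Model a one-out-of-eight priority encoder.

    inputs is a sequence of 8 booleans; inputs[7] has highest priority.
    Returns (gs, code): gs is True if any input is asserted, and code is
    the number of the highest-priority asserted input.  code is None
    when gs is False, since A2..A0 are ignored in that case.
    """
    if len(inputs) != 8:
        raise ValueError("expected exactly 8 inputs")
    for n in range(7, -1, -1):   # scan from highest priority down
        if inputs[n]:
            return True, n
    return False, None


def asserted(*numbers):
    """Build an 8-element input list with the given inputs asserted."""
    return [n in numbers for n in range(8)]
```

               For instance, priority_encode(asserted(4, 5)) gives (True, 5).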

                TRANSPARENCY - TERRELL PAGE 359

                Notes:

                - The 8212 is a microprocessor support chip that accepts an 
                                                                   ___
                  8-bit data item from an external source when its STB input
                                           ___
                  is asserted.  So long as STB remains asserted, the data in 
                  the 8212 will follow the inputs, changing with any change on 
                                                                          ___
                  them.  At the same time, the 8212 also starts to assert INT. 
                  The data contained in the 8212 is put onto the system bus 
                                                      ___
                  when DS2 is asserted, at which time INT is de-asserted.

                - In this example, the 8212 is wired to load a value suitable
                  for forming a Z80 interrupt vector.  The connection from
                  __                            ___
                  GS of the priority encoder to STB of the 8212 causes the 
                  interrupt request process to begin as soon as any one of the 
                  eight devices requests an interrupt.  However, the interrupt 
                  that is actually finally generated will be the highest 
                  priority one in effect when the Z80 acknowledges, since the 
                  8212 follows changes in the output of the priority encoder.

                - This circuit relieves the individual interfaces from the task
                  of generating a vector to put on the bus.

                - It is assumed that each individual interface removes its
                  interrupt request when it is addressed for a data transfer.
                  Thus if multiple devices are simultaneously requesting 
                  interrupts, each will be serviced in priority order and will 
                  remove its request at that time.

            iv. Some CPU's effectively internalize what the priority encoder
                does, by having multiple interrupt lines coming in at
                different levels.  

                - For example, the PDP-11 has 4 such lines, designated 
                  BR4 .. BR7, with BR7 being the highest priority.
                  Each level also has its own acknowledge line.

                - The CPU contains a three bit field in the PSW that encodes
                  a processor priority.   (This can range from 0..7).  Under
                  normal conditions, the CPU priority will be 3 or less.

                - An interrupt will only be acknowledged when the CPU
                  priority is less than that of the incoming request - e.g.
                  a BR4 request will only be acknowledged if the CPU
                  priority is 3 or less.

                - If multiple requests are coming in, the highest priority
                  request is acknowledged.

                - Generally, the service routine for a given device will
                  see that the PSW is set to a priority level equal to
                  the priority of the interrupt that called for the service.
                  This means that a service routine for a level 4 device
                  (e.g. a terminal) cannot be interrupted by any other
                  level 4 device, but can be interrupted by level 5 or
                  higher devices.  When the service routine exits, it
                  resets the CPU priority to what it was on entrance.
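
                The acceptance rule just described can be sketched as follows
                (a simplified model; the function name is an assumption):

```python
def select_request(cpu_priority, pending_levels):
    """Return the request level to acknowledge, or None.

    pending_levels holds the levels (4..7) of all pending requests.
    A request is acknowledged only if its level exceeds the current
    CPU priority; among the eligible requests the highest level wins.
    """
    eligible = [level for level in pending_levels if level > cpu_priority]
    return max(eligible) if eligible else None
```

                A service routine running at priority 4 therefore holds off
                another level 4 request (select_request(4, [4]) is None) but
                yields to a level 5 one (select_request(4, [4, 5]) is 5).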

            v. This last approach is known as MULTIPLE-LEVEL interrupts.  It
               is the most sophisticated scheme, found on most medium and
               larger CPUs (including many 16/32 bit micros)

           vi. Of course, it is still possible to have more devices than levels.
               (And generally this will be true.)  In this case, a daisy
               chain can be used to prioritize devices on the same level.

                Example:



                -------
                | CPU |<-- Level 4 daisy chain
                |     |<-- Level 5 daisy chain
                |     |<-- Level 6 daisy chain
                |     |<-- Level 7 daisy chain
                -------   

      4. Another approach to IO control is direct memory access (DMA).  This is
         an approach that is totally based in hardware.

         a. We have noted that the speed of IO devices ranges from several
            thousand times slower than the CPU to as fast as the CPU itself.
            When device speeds approach those of the CPU, the other forms of
            IO control we have discussed cease to be useable, since any form
            of software IO requires several machine instructions (at least)
            to transfer a single item of data.  Thus, when device speeds
            approach 10% or so of CPU speed, software control of IO becomes
            impractical.

         b. The alternative for fast devices is to allow the device interface
            to gain control of the system bus each time it has a data item
            to transfer.  This means, in essence, that the interface must 
            contain some of the capabilities of a CPU.

            - It must be able to generate the various system bus signals for
              a MEMORY operation (such as MREQ) and to gate its own address 
              and data information onto the bus.

            - Typically, the interface needs at least two registers of its
              own:

              - A memory address register to keep track of where the next
                transfer is to go to/come from.  This register must be
                incremented after each transfer.

              - A counter to keep track of the number of data items remaining
                to be transferred.  Typically, the DMA interface will interrupt
                the CPU when this count reaches 0.

            - Often, a third register is needed.  If the device itself is
              addressable (as would be true in the case of a disk, say), then
              the interface also needs a DEVICE ADDRESS register to keep track
              of the location on the device to/from which the transfers occur.

             Example: (for 8-bit CPU with 16 bit address):

                                                     D Bus
                                                    ^^^^^^^^
                                                    ||||||||
                                                 --------------
                                                 | Tri-state  |
                                                 | buffers    |<- TS Enable
                                                 |____________|
                         A Bus                      ||||||||
                     ^^^^^^^^^^^^^^^^               |===============> To device
                     ||||||||||||||||               ||||||||
                ---------------------------  -----------------------
TS enable  ---> | 16 bit address register |  | 8 bit data register |
Load lower ---> | with tri-state outputs  |  | with latched inputs |<--Load
Load upper ---> |                         |  | and 2 sets of       |
Increment  ---> |                         |  | tri-state outputs   |
                ---------------------------  -----------------------
                   ^^^^^^^^   ^^^^^^^^              ^^^^^^^^    
                   ||||||||   ||||||||              ||||||||
                    D Bus      D Bus                ||||||||
To internal __________                       -----------------------
control              |                       |   MUX               |<- In/Out
                -----------                  -----------------------
                | Counter |<-Load               ^^^^^^^^ ^^^^^^^^
                -----------                     |||||||| ||||||||
                  ^^^^^^^^                      D Bus     Device
                  ||||||||
                   D Bus

        c. When a DMA device is in use, the only task of the software is to
           load up the registers in the interface and start it doing the
           transfer.
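
           A minimal sketch of the division of labor, assuming a device-to-
           memory transfer on a machine with a 16 bit address.  The class and
           method names are hypothetical; program() stands for the only work
           the software does, and transfer_from_device() stands for what the
           interface hardware does on each bus cycle it obtains.

```python
class DmaInterface:
    def __init__(self):
        self.address = 0             # memory address register
        self.count = 0               # data items still to be transferred
        self.done_interrupt = False  # set when the block is complete

    def program(self, address, count):
        """Software's job: load the registers and start the transfer."""
        self.address = address
        self.count = count
        self.done_interrupt = False

    def transfer_from_device(self, memory, item):
        """Hardware's job: store one item from the device into memory."""
        if self.count == 0:
            raise RuntimeError("no transfer in progress")
        memory[self.address] = item
        self.address = (self.address + 1) & 0xFFFF  # increment, 16 bits
        self.count -= 1
        if self.count == 0:
            self.done_interrupt = True  # interrupt the CPU at end of block


memory = bytearray(0x10000)
dma = DmaInterface()
dma.program(0x2000, 3)
for byte in b"abc":
    dma.transfer_from_device(memory, byte)
```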

        d. Because DMA interfaces are complex, DMA is typically used only in
           cases where device speeds make it necessary.

        e. One other issue arises in conjunction with DMA interfaces:
           cycle-stealing versus burst mode transfer:

           i. Most interfaces are designed so that they have to 
              go through the process of requesting the bus and waiting for
              acknowledgement for EACH data transfer done.  This is fine,
              so long as the data rate of the device is low enough.  This
              mode is called CYCLE-STEALING, because each transfer "steals"
              one memory cycle from the CPU.

          ii. For very fast devices (e.g. some disks), there might not be
              enough time to allow the interface to request and wait for the
              bus for each transfer.  Such interfaces may work in a BURST MODE
              in which, once the interface has control of the bus, it keeps
              it until a whole block of transfers is done - i.e. it goes through
              repeated memory cycles, but holds the bus request active the
              whole time without ever releasing it.
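
              The tradeoff can be put in rough numbers.  The sketch below
              assumes a fixed arbitration delay per bus request and a fixed
              memory cycle time, both in arbitrary units; the function and
              the figures are illustrative assumptions, not measurements.

```python
def bus_time(items, arbitration, cycle, burst):
    """Total time to move `items` data items over the bus.

    arbitration: time to request the bus and receive the acknowledgement
    cycle:       time for one memory cycle
    burst:       True for burst mode, False for cycle stealing
    """
    if items == 0:
        return 0
    handshakes = 1 if burst else items   # burst mode arbitrates only once
    return handshakes * arbitration + items * cycle
```

              For a 512-item block with arbitration = 2 and cycle = 1, cycle
              stealing takes 1536 units against 514 for burst mode, but in
              burst mode the CPU is locked off the bus for the whole block.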

      5. Finally, we mention the use of separate, dedicated IO processors to
         almost totally remove IO responsibility from the CPU.  For example,
         we have already mentioned the IBM 370 configuration.  Here,
         communication between the CPU and the Pio is generally by means of
         shared memory plus the ability for each processor to interrupt the
         other.
  
      6. We can contrast these approaches in terms of the extent to which they
         allow for the OVERLAPPING of computation and IO:

         Busy-waiting           No overlap possible: computation halts while
                                IO is being done

         Polling                Some overlap possible

         Interrupt-driven       Considerable overlap - but CPU must still pause
                                computation to handle each byte transferred

         DMA and Pio            Total overlap - CPU is only involved in
                                initiating the request.

IV. Interrupts on the Z80
                                                           ___     ___   ___
   A. We have noted that the Z80 has two interrupt inputs: INT and NMI.  INT
      is the one typically used for ordinary IO activity.

      1. The Z80 contains an internal flip-flop called IFF1 which allows
         software to control the recognition of interrupts.  The following
         gating structure is present on the chip itself:
        ___             _____
        INT     ------o|     \
                       |      )---- interrupt recognition circuits.
        IFF1    -------|_____/
                                                                            ___
         Note the effect of this: when IFF1 is clear, an external signal on INT
         is simply ignored by the CPU until IFF1 is set.

      2. The setting and clearing of IFF1 is controlled by two instructions:
         EI (enable interrupts) and DI (disable interrupts.)  Also, IFF1 is
         automatically cleared under certain circumstances:

         a. When the CPU is reset.  Any software that wants to use interrupts
            must therefore explicitly enable them as part of its initialization
            code.

         b. When an interrupt is acknowledged.  This means that the routine
            which responds to an interrupt must re-enable interrupts, typically
            just before it exits. 

         c. When a non-maskable interrupt is received.  In this case, the
            hardware stores the current value of IFF1 in a second flip-flop
            IFF2 and clears IFF1.  When the non-maskable interrupt handler
            software exits, it may use the RETN instruction to return from
            the interrupt and reset IFF1 to the saved value in IFF2.
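
         The rules above can be collected into a small model.  This sketch
         follows the simplified description given in these notes rather than
         the full behavior of the chip, and the class and method names are
         assumptions.

```python
class InterruptFlipFlops:
    def __init__(self):
        self.reset()

    def reset(self):
        """CPU reset: interrupts start out disabled."""
        self.iff1 = False
        self.iff2 = False

    def ei(self):
        self.iff1 = True

    def di(self):
        self.iff1 = False

    def accept_int(self):
        """Return True if a request on INT is recognized."""
        if not self.iff1:
            return False      # request is present but ignored
        self.iff1 = False     # acknowledging clears IFF1
        return True

    def accept_nmi(self):
        """NMI cannot be masked: save IFF1 in IFF2, then clear IFF1."""
        self.iff2 = self.iff1
        self.iff1 = False

    def retn(self):
        """Return from NMI: restore IFF1 from the saved copy."""
        self.iff1 = self.iff2
```
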
                                                                          ___
      3. The Z80 has three different interrupt modes that determine how an INT
         interrupt (if enabled) is actually handled.  These modes are called
         mode 0, mode 1, and mode 2.  Software may set the mode by using the
         instructions IM 0, IM 1, and IM 2.  (The default mode at reset is 0.)

         a. Mode 0 is also called the 8080 mode, since in this mode the Z80
            behaves like an 8080.

         b. Mode 1 provides a very simple mechanism for simple systems.  It
            requires minimal hardware in the interface.

         c. Mode 2 provides a more sophisticated mechanism for more complex
            systems.

         d. We will discuss the modes in the order 1, 0, then 2.

   B. Mode 1 interrupts on the Z80.
                                                                           ___
      1. When interrupt mode 1 is selected, the Z80 responds to an enabled INT
         interrupt as if the program had executed an RST 38 instruction.
         That is:

         a. The address of the instruction that was about to be fetched when
            the interrupt was accepted is instead pushed on the system stack.

         b. The PC is loaded with 38H.

         c. Note that this is the same as a CALL 38H.

         d. It is assumed that memory locations 38 and following contain a
            routine to service the interrupt.  Often, the code here will
            simply jump to a service routine elsewhere in memory.

         e. Note that 38H is within the address space of the MPF-I monitor
            ROM.  The MPF-I code beginning at 38H is the following:

                PUSH    HL
                LD      HL,(FF01)
                EX      (SP),HL
                RET

            - The effect of the first three instructions is to push the
              word located at addresses FF01..FF02 (in RAM) onto the stack,
              without destroying any register.  (HL is used, but its original
              value is restored by the exchange.)

            - The RET pops this value and transfers control to it.

            - The MPF-I user can therefore route a mode 1 interrupt to a
              service routine anywhere in memory by placing the address of
              the routine in RAM locations FF01..FF02.  During initialization,
              the MPF-I monitor loads these locations with the address of
              its breakpoint service routine, so a mode 1 interrupt would be
              treated as a program breakpoint in the absence of user
              alteration of these locations.
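
            The four instructions can be traced with a short simulation.  The
            stack is modeled as a Python list with the top at the end, and
            the handler address 1900H and the other values in the example
            are hypothetical.

```python
def mode1_dispatch(memory, hl, return_address):
    """Trace the code at 38H; returns (pc, hl, stack) after the RET."""
    stack = [return_address]                     # pushed by the interrupt
    stack.append(hl)                             # PUSH HL
    hl = memory[0xFF01] | (memory[0xFF02] << 8)  # LD HL,(FF01)
    stack[-1], hl = hl, stack[-1]                # EX (SP),HL
    pc = stack.pop()                             # RET
    return pc, hl, stack


memory = bytearray(0x10000)
memory[0xFF01] = 0x00   # low byte of the handler address
memory[0xFF02] = 0x19   # high byte of the handler address
pc, hl, stack = mode1_dispatch(memory, hl=0xBEEF, return_address=0x1812)
```

            After the RET, the PC holds 1900H, HL again holds its original
            value, and the interrupted program's return address is on top of
            the stack, ready for the handler's own return.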

      2. In addition to the above, the Z80 also clears IFF1, disabling further
         interrupts until the software executes EI again.

      3. The interrupt handler software that is invoked by the interrupt should
         behave as follows:

         a. It must save and later restore any registers it uses, to avoid
            messing up the program that was interrupted.  Often, this is done
            by using EX AF, AF' and EXX both at the start and finish of the
            routine; but this may only be done if no other software uses these
            instructions.

         b. It can return to the program that was interrupted as follows:

                        EI
                        RETI

            (Note: the RETI is essentially the same as an RET, and an RET could
             be used instead in many cases.  But RETI does have a purpose in 
             connection with certain special devices to be discussed later.)

      4. A mode 1 interrupt does not expect the hardware to do anything
         special in response to the acknowledge signal from the CPU; thus,
         it imposes the least hardware requirements on the system.

      5. A key limitation of mode 1 is that if the system has more than one
         device capable of generating an interrupt, then there is no way for
         the service routine to know "who did it", short of somehow polling all
         of the devices to see which of them have their ready flags set.
         On a system that has 8 or fewer devices capable of interrupting, a 
         fairly elegant scheme can be used.  (This also illustrates how hardware
         can be constructed to allow the software to selectively disable 
         interrupts from certain devices at certain times.)

                TRANSPARENCY FROM TERRELL PAGE 347

   C. Mode 0 interrupts on the Z80

      1. Mode 0 on the Z80 imitates the interrupt-handling provisions of the
         8080.   When an interrupt is accepted in mode 0, the CPU does an
         instruction fetch cycle, not from memory but from the IO bus.  The
         interrupting device is expected to recognize the interrupt
                                     ____     __
         acknowledgement on the bus (IORQ and M1 both low), placing any 1-byte
         opcode on the data bus.  The CPU treats this as an instruction to
         execute, just as if it had been fetched from memory.

      2. In addition to the above, the Z80 also clears its interrupt-enable,
         of course.

      3. While any 1-byte opcode may be used as the response from the device,
         the most common choice will be one of the 8 RST instructions
         (opcodes C7, CF, D7, DF, E7, EF, F7 and FF).  Each of these
         instructions behaves as follows:

         a. Push the current PC on the system stack.

         b. Transfer control to one of the eight memory locations 0, 8, 10, 18,
            20, 28, 30, or 38.

         c. In other words, these instructions behave like a subroutine call;
            but the subroutine address is implicit in the opcode rather than
            requiring two more bytes in the instruction.

         d. These instructions were put in the 8080 instruction set primarily
            for use as interrupt-acknowledgement responses by devices.  Due
            to the requirement that the response be only 1 byte long, ordinary
            CALL instructions could not be used.
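
         The relationship between the eight opcodes and the eight target
         addresses is a simple bit pattern: the target occupies bits 3..5 of
         the opcode and the remaining bits are all ones.  A short sketch
         (the function names are assumptions):

```python
def rst_opcode(target):
    """Opcode of the RST instruction that transfers to `target`."""
    if target & ~0x38:
        raise ValueError("target must be one of 0, 8, 10H, ... 38H")
    return 0xC7 | target


def rst_target(opcode):
    """Address to which the given RST opcode transfers control."""
    if opcode & 0xC7 != 0xC7:
        raise ValueError("not an RST opcode")
    return opcode & 0x38
```

         For example, rst_opcode(0x10) is D7H and rst_target(0xFF) is 38H.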
           
      4. Mode 0, then, allows for 8 different devices to each force the Z80 to
         execute an appropriate service routine.

         Example: A system with a keyboard, display, and disk, each
                  capable of generating an interrupt.  The system designer
                  decides to put the first few instructions of the service
                  routines in the following locations:

                        10      - service routine for keyboard
                        20      - service routine for display
                        30      - service routine for disk

                  When the keyboard generates an interrupt and sees the
                  acknowledge coming back, it will put the opcode for
                  RST 10 (D7) on the data bus.

      5. Mode 0 is not terribly useful on the MPF-I, because the monitor
         uses the reserved locations for the RST instructions for other 
         purposes, except for location 38 (as discussed above).  Thus, only
         RST 38 is useable.

      6. As an aside, note that mode 1 can be thought of as a special case of
         mode 0, in which the bus is not actually read; instead, an FF
         (RST 38) is used as the op-code to execute.

   D. The most flexible Z80 interrupt mode is mode 2.  Mode 2 provides for
      VECTORED interrupts.

      1. In a vectored interrupt scheme, the device supplies the address of
         a memory location which in turn contains the address of a service
         routine for the device.

      2. Since the Z80 data bus is only 8 bits wide and an address is 16
         bits, the vector address is specified as follows:

         a. The Z80 contains an internal I register, which must be pre-set
            by the programmer to contain the high order byte of the
            address of the interrupt vectors.  Two instructions allow
            access to the I register:

                LD      I,A
                LD      A,I

         b. The device, when responding to an interrupt acknowledge, will
            place the low order byte of the vector address on the bus.
            (This value must always be even.)

         c. Note that the combination I+D-Bus is not the address of the
            actual service routine to execute, but rather the address of
            a memory location containing the address of the service
            routine.

      3. As with the other modes, the Z80 pushes the PC and disables
         further interrupts before loading the PC with the value specified
         by the vector.

      4. Example: A system with keyboard, display, and disk.  The interrupt
         handlers for these devices begin at

                1234
                2017
                3029

         respectively.  The system designer decides to put the interrupt
         vector at locations 8000 on up.

         a. At system startup, the I register will be loaded with 80.
        
         b. Memory locations 8000..8005 will contain 34 12 17 20 29 30
            (remember byte-reversed format.)

         c. The keyboard interface will respond to interrupt acknowledge
            by placing 00 on the bus; the display 02, and the disk 04.
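
         The lookup in this example can be sketched as follows.  The function
         names are assumptions; the addresses are those of the example.

```python
def build_vector_table(memory, base, handlers):
    """Store each 16 bit handler address at base, low byte first."""
    for i, handler in enumerate(handlers):
        memory[base + 2 * i] = handler & 0xFF
        memory[base + 2 * i + 1] = handler >> 8


def mode2_handler(memory, i_register, device_byte):
    """Return the service routine address for a mode 2 interrupt.

    The I register supplies the high byte of the vector address and
    the device supplies the (even) low byte.
    """
    vector_address = (i_register << 8) | device_byte
    return memory[vector_address] | (memory[vector_address + 1] << 8)


memory = bytearray(0x10000)
build_vector_table(memory, 0x8000, [0x1234, 0x2017, 0x3029])
keyboard = mode2_handler(memory, 0x80, 0x00)   # 1234H
display = mode2_handler(memory, 0x80, 0x02)    # 2017H
disk = mode2_handler(memory, 0x80, 0x04)       # 3029H
```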

      5. With vectored interrupts, up to 128 different devices can be
         accommodated.

   E. NMI on the Z80
                                                                   ___
      1. We now can say something about how the Z80 responds to an NMI.  

      2. No interaction with external hardware is involved, and no acknowledge
         signal is put on the bus.

      3. The Z80 does the following:

         a. Push the address of the next instruction.

         b. Put 66H in the PC.

         c. Save IFF1 into IFF2 and clear IFF1.

      4. NMI interrupt handling routines should terminate with RETN, which
         restores IFF1 from IFF2.

Copyright ©1999 - Russell C. Bjork