CS222 Lecture: Course Introduction; Architecture and Organization;
               Performance                                              1/11/99

Objectives:

1. Introduce course, requirements
2. Tie this course into CS221
3. Review levels of structure and major components of a computer
4. Introduce concepts of architecture and organization and relate to structure
   of this course.
5. Introduce the notion of performance as a driving force in this field.
6. Review basic Von Neumann machine architecture
7. Introduce concepts of machine and assembly language programming

I. Preliminaries: Roll, Syllabus
-  -------------  ----  --------

II. Course Introduction
--  ------ ------------

   A. Last semester, we started off CS221 by observing that the complexity of
      computer systems requires us to study them at various levels of
      abstraction.  Can anyone recall what those levels are?

      ASK
      
      1. The user level: the computer system performs certain tasks in 
         response to certain commands (e.g. EDIT).  To the user, it appears
         as if the system "understands" a command language such as the DCL
         command language of the VAX, or the keypad commands of EDT, or the
         mouse clicks of a graphical interface.

      2. The higher-level language programming level: each application is
         programmed using the statements of a higher-level language such as
         Pascal or C.  A single user-level command is thus implemented by
         100's or 1000's of statements in a programming language.  To
         the programmer, it appears as if the system "understands" the
         particular higher-level language he or she is programming in.

      3. The machine language programming level: as delivered by the 
         manufacturer, a given computer system has certain primitive
         components and capabilities:

         a. A memory system, capable of storing and retrieving information
            in fixed-size units known as "bytes" or "words".

         b. An input-output system, capable of transferring information
            between memory and some number of devices such as keyboards,
            screens, disks etc.

         c. A CPU, capable of performing primitive operations such as
            addition, subtraction, comparison, etc., and also capable of
            controlling the other two systems.

            i. The CPU is designed to respond to a set of basic machine
               language instructions, which is specific to a given type of
               CPU.  (E.g. the machine language for the VAX is vastly
               different from that of the Intel CPU's used in PC's or the
               Motorola 68K and PowerPC CPU's used in Macintoshes.)

           ii. The compiler for a higher level language translates that
               language into the native machine language of the underlying
               machine.  

              - The same program must be translated into different machine
                languages to run on different machines; thus, each type of
                machine must have its own set of compilers.

              - Regardless of the HLL used, the machine code generated by the 
                compiler for a given machine will be in the same native
                machine language of that machine.

              - On the VAX, the .OBJ and .EXE files produced by the compiler
                and linker contain machine language binary code.

         At this level, it appears that the system "understands" its
         machine language.

      4. The hardware design level: Ultimately, computer systems are built as
         interconnections of hardware devices known as gates, flip-flops, etc.,
         combined to form registers and busses.  These, in turn, are realized
         from primitive electronic building blocks known as transistors,
         resistors, capacitors etc.  The resultant system is capable of
         directly executing the instructions comprising the machine language 
         of the system.

      5. The solid-state physics level: current computers are fabricated from
         materials such as silicon that have been chemically "doped" to alter
         their electronic properties.   Transistors, resistors, and capacitors
         are realized by utilizing the properties of these semiconductor
         materials.  (Of course, future computers may use some other technology
         such as optics.)

   Summary:

                User Level              User commands, Application software
                -----------------------------------------------------------
                HLL Programming level   Statements in Pascal, C, etc.
                -----------------------------------------------------------
                Machine language level  Machine language instructions
                -----------------------------------------------------------
                Hardware design level   Gates, flip-flops etc.
                -----------------------------------------------------------
                Solid-state physics     Physical properties of semiconductors

      In CS221 we spent most of our time at the 4th level - hardware design.
      In this course, we will consider this level further, but will also study
      the third level in detail.  (In fact, we will study the 3rd level
      first, and then go back to the 4th level to see how the capabilities we
      have studied are implemented there.)

   B. Note the course schedule in the syllabus: the first half will focus on
      machine and assembly language programming (the third level in our
      hierarchy), and the second half on how this level of abstraction is 
      implemented at the hardware level (the fourth level).

   C. One way to view how these two emphases of the course relate is in
      terms of two words that are often used interchangeably, but which
      really have distinct technical meanings: COMPUTER ARCHITECTURE and
      COMPUTER ORGANIZATION.  (Note title of course).

       1. Computer architecture is concerned with the FUNCTIONAL CHARACTERISTICS
          of a computer system - as seen by the assembly language programmer.

          One of the major topics of the first half of the course will be 
          the architecture of the VAX, and we will also devote some time to
          the architecture of the MIPS CPU's used in our workstations and
          to several other architectures.

       2. Computer organization is concerned with how an architecture can be
          REALIZED: the logical arrangement of various component parts to 
          produce an overall system to accomplish certain design goals.

          a. The technology used to build the system components.

          b. The component parts themselves

          c. Their interconnection

          d. Strategies for improving performance.

       3. Note that a given architecture may be realized by many different
          organizations.  The VAX is a good example.

          a. At one time, our main academic system was a VAX-11/780 - which 
             was the first VAX model developed.  It occupied four good-sized 
             cabinets - each big enough to hold a person.  In particular, the 
             VAX instruction set was realized in this machine by a CPU 
             implemented using 20 circuit boards, each about 8" x 15", using 
             third-generation integrated-circuit technology.

          b. Each of our current VAX systems sits in a small box not much bigger
             than a PC.  In it, the VAX instruction set is realized by a single 
             1" square chip, using fourth-generation CMOS VLSI technology.  
             Yet this CPU is many times faster than the 11/780!

          c. Further, the two systems have different kinds of internal busses.
             As a result, the two systems use very different kinds of memory 
             expansion boards.  Though they can use the same IO devices, a 
             different kind of controller board is needed for each machine.

          d. Nonetheless, the two machines have the same architecture.  An
             assembly language programmer could not tell the difference
             between them.

       4. Computer architectures tend to be rather stable.

          a. E.g. the VAX architecture has been in use essentially unchanged 
             since 1981, and IBM's basic mainframe architecture has lasted even 
             longer.  The 80x86 architecture used in Wintel PC's has its roots 
             in an architecture developed in the late 1970's, with a major 
             revision in the mid 1980's and minor revisions since then.  

          b. A major factor in the stability of architecture is the need to be
             able to continue to use existing software.   Potential changes
             to an architecture have to be weighed carefully in terms of their
             impact on existing software, and adoption of an altogether new
             architecture comes at a huge software development cost - which
             is why we are still using architectures developed in the 1970's.

       5. On the other hand, computer organization tends to evolve quickly with 
          changes in technology - each new model of a given system will 
          typically have different organizational features from its predecessors
          (though some aspects will be common, too.)  The driving factor here
          is performance; and it is common for one or more new implementations 
          of a popular architecture to be developed each year.

   D. A fair question to ask at this point is "why should I need to learn
      about computer architecture and organization, given that I'm not planning
      to be a computer hardware designer, and that higher level language
      compilers insulate the software I write from the details of the
      hardware on which it is running?"

      1. An understanding of computer architecture is important for a number
         of reasons:

         a. Although modern compilers hide the underlying hardware architecture
            from the higher-level-language programmer, it is still useful to
            have some sense of what is going on "under the hood" 

            i. Cf the benefit of learning Greek for NT studies.

           ii. There will be times when one has to look at what is happening at 
               the machine language level to find an obscure error in a program.

         b. Further, familiarity with the underlying architecture is necessary
            for developing and maintaining some kinds of software:

            i. compilers

           ii. operating systems and operating system components (such as 
               device drivers)

          iii. embedded systems.

         c. In order to understand various performance-improvement techniques,
            one must have some understanding of the functionality whose
            performance they are improving.

      2. Likewise, an understanding of computer organization is important for
         a number of reasons:

         a. Intelligent purchase decisions - seeing beyond the "hype" to
            understand what the real impact of various features on performance
            is.

         b. Making effective use of high performance systems - sometimes the
             way data and code are structured can prevent efficient use of
            mechanisms designed to improve performance.

         c. Increasingly, compilers that produce code for high performance
            systems have to incorporate knowledge as to how the code is
            actually going to be executed by the underlying hardware -
            especially when the CPU uses techniques like pipelining and 
            out-of-order execution to maximize performance.

III. Performance
---  -----------

   A. We have noted that computer organization is largely driven by performance
      issues.  A CPU manufacturer cannot be content to keep selling the same
      basic design, but must continually be developing better designs in order
      to remain competitive.

   B. This raises an important issue: how do we measure the performance of a
      computer system, and how do we compare the performance of different
      systems?

      1. Based on general reading you do in trade publications, what performance
         metrics do manufacturers tend to advertise?

         ASK

      2. The authors of our textbook point out that the only really legitimate
         way to measure performance is TIME.

         a. Two different time-related metrics are important

            i. Response time - how long does it take a given system to complete
               a given task - start to finish?

           ii. Throughput - how many tasks of a given kind can a given system
               complete in a unit of time? 

          iii. The former metric is of most importance in a single-user system
               (e.g. a personal computer or workstation).  The latter metric 
               may be more important for multi-user systems (e.g. time-shared 
               systems, servers).

         b. The time needed to complete a given task consists of several
            components:

            i. CPU time (computation)

           ii. I/O time (e.g. time spent accessing a disk or transmitting
               information over a network)

          iii. (On a multi-user system) Time spent waiting for a resource that 
               is in use by another user 

         c. Response time can be improved by speeding up the CPU, speeding up
            I/O, or both.  These measures will also improve throughput; in
            addition, throughput can be improved by more effective overlapping
            of the use of various resources (e.g. doing computation on the
            CPU for one user while simultaneously performing disk accesses
            for other users on the various disks.)

   C. Most of the performance improving techniques we will consider focus on
      speeding up computation - i.e. reducing the amount of CPU time needed
      to perform a given task.  This reflects the fact that this component of
      overall time is the more easily improved - I/O operations tend to be
      mechanical in nature (e.g. moving disk heads) and are therefore less 
      easily speeded up.

   D. The CPU time needed to perform a given task is given by the following
      equation:

      number of instructions       average number of         time for one
      that must be executed     X  clock cycles needed    X  clock cycle 
      to perform the task          per instruction (CPI)

       Since the last factor is the reciprocal of the clock rate, this can
      also be written:

      number of instructions       average number of
      that must be executed     X  clock cycles needed
      to perform the task          per instruction (CPI)
      ---------------------------------------------------
                clock rate

      Example: A certain task requires the execution of 1 million instructions,
               each requiring an average of three clock cycles.  On a 300 MHz
               clock system, this task will take:

        1 million instructions x 3 clocks/instruction
        ---------------------------------------------  = .01 second = 10 ms
                300 million clocks/second 
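
       This calculation is easy to mechanize.  The following short Pascal
       program (a sketch, using the numbers from the example above) evaluates
       the equation:

          program CpuTime;
          { CPU time = (instruction count x CPI) / clock rate }
          var
             instructions : Double;   { total instructions executed }
             cpi          : Double;   { average clock cycles per instruction }
             clockRate    : Double;   { clock cycles per second }
             seconds      : Double;
          begin
             instructions := 1.0e6;    { 1 million instructions }
             cpi          := 3.0;      { 3 clocks per instruction }
             clockRate    := 300.0e6;  { 300 MHz }
             seconds := instructions * cpi / clockRate;
             writeln('CPU time = ', seconds:0:4, ' seconds')  { 0.0100 }
          end.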

      1. This equation suggests three basic ways that performance on a given
         task might be improved:

         a. Reduce the total number of instructions that must be executed

            i. Use a better algorithm (a software issue, not a hardware one)

           ii. Use a CPU with a more powerful instruction set (e.g. a VAX 
               instruction might perform a task that would take several
               instructions on a MIPS CPU).

         b. Reduce the average number of clocks needed to execute an instruction

            i. Better implementation of the instruction in hardware

            ii. Use of various forms of parallelism to allow the CPU to be
                working on different portions of several different instructions
                at the same time.  This doesn't reduce the total number of
                clocks needed to execute one instruction, but it does reduce
                the total number of clocks needed to execute a series of
                instructions, and hence the effective average number of clocks
                needed per instruction.  (See the sketch following this list.)

         c. Increase the clock rate

            i. Use of improved technology - e.g. smaller basic feature sizes on 
               a chip result in lower capacitances and inductances, allowing 
               faster clock rates.

           ii. The time needed for a clock cycle is determined by the amount of 
               time needed for a signal to propagate down the longest internal 
               data path in the CPU.  Using internal data paths with fewer gates
               allows a shorter clock cycle and a higher clock rate.  (E.g. 
               using carry lookahead instead of ripple carry in an adder uses 
               more gates overall, but results in shorter individual data 
               paths).
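
          To make point (b.ii) above concrete: on an idealized k-stage
          pipeline with no stalls, n instructions finish in k + (n - 1)
          clocks rather than k x n.  A minimal Pascal sketch (the 5-stage
          depth is an assumed figure):

             program PipelineCpi;
             { Effective CPI on an idealized k-stage pipeline with no
               stalls: n instructions complete in k + (n - 1) clocks. }
             var
                k, n, clocks : Integer;
             begin
                k := 5;                 { hypothetical 5-stage pipeline }
                n := 1000;              { instructions executed }
                clocks := k + (n - 1);
                writeln('Total clocks:  ', clocks);
                writeln('Effective CPI: ', clocks / n : 0 : 3)
                { prints 1.004 - versus 5.000 with no overlap }
             end.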

      2. Unfortunately, these three components interact with each other, so that
         improving one dimension of performance may come at the cost of reducing
         performance in another direction.

         Example: 

         a. Until the 1980's, a basic trend in CPU design was toward
            increasingly powerful instruction sets - i.e. increasing the amount 
            of computation that a single instruction could perform.  (In many 
            ways, the VAX architecture represents the high water mark of this 
            trend.)

          b. In the 1980's, an alternate approach emerged that focused on using
            simpler instructions that lend themselves to faster clock rates and
            a much higher level of intra-instruction parallelism.  (The MIPS
            architecture is representative of this trend.)

         c. Proponents of the latter trend coined the name Reduced Instruction 
            Set Computer (RISC) to describe this approach.  The earlier approach
            then was given the name Complex Instruction Set Computer (CISC).

      3. Further, measures to improve CPU performance also impact other system
         components.

         a. Since each instruction executed by the CPU involves at least one
            reference to memory (to fetch the instruction), improving CPU speed
            necessitates improving memory system speed as well.  However, basic
            DRAM memory chip technology has not improved significantly (access
            times remain around 60 ns), so memory systems have had to 
            incorporate sophisticated cache memory techniques to keep up with
            CPU speeds.

         b. Increased CPU speed typically results in increased power
            consumption, which impacts power supplies, CPU cooling, and the
            ability to run a system off rechargeable batteries.

   E. Further, the equation we have been discussing does not lend itself
      to direct calculation of the time needed for a given task, so other
      techniques must be used to actually measure performance.

      1. The clock rate is the one number that is easily obtained.  The
         total number of instructions needed to perform a given task
         could be calculated from the program code, but the computation would
         be laborious.  And determination of CPI would be very difficult, since
         it may depend on:

         a. The exact nature of the instructions executed (on CISC's, some 
            instructions require more clocks than others; on RISC's, average
            CPI is affected by program flow).

         b. The interaction between the CPU and the memory system.

      2. For similar reasons, one cannot simply say that if a given program
         takes time t on a system with a given clock rate, it will take, say,
         time t/2 on a system whose clock is twice as fast.

         a. The improvement could be much less than the clock rate ratio, if
            the rest of the system (e.g. memory, I/O) is not speeded up
             proportionally.  (See the sketch following this list.)

         b. Sometimes, the performance improvement turns out to be greater than 
            that implied by the clock rate ratio, because other components
            of the equation (e.g. CPI) have been improved as well.
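
          To illustrate point (a): suppose - a made-up split - that a task
          spends 60% of its time computing and 40% doing I/O.  Doubling the
          clock rate halves only the CPU portion:

             program ClockScaling;
             { Doubling the clock speeds up only the CPU portion of a task.
               The 60/40 split below is an invented figure. }
             var
                cpuTime, ioTime, oldTotal, newTotal : Double;
             begin
                cpuTime  := 0.6;                     { fraction computing }
                ioTime   := 0.4;                     { fraction doing I/O }
                oldTotal := cpuTime + ioTime;
                newTotal := cpuTime / 2.0 + ioTime;  { clock rate doubled }
                writeln('Speedup = ', oldTotal / newTotal : 0 : 2)
                { prints 1.43 - well short of 2.00 }
             end.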
            
      3. In practice, speeds of various systems are typically compared by
         the use of BENCHMARKS (individual programs or sets of programs.)
         The book discusses a number of reasons why this approach is fraught
         with pitfalls, ranging from statistical issues to the possibility
         of manufacturers "rigging" their product to do well on known
         benchmarks.

IV. Review Of Basic Von Neumann Machine Architecture
--  ------ -- ----- --- ------- ------- ------------

   A. Last semester we saw that modern computer systems are based on a
      basic architecture frequently known as "the Von Neumann machine". 

      1. In this architecture there are five fundamental kinds of building 
         blocks.  Can anyone recall what they are?

         ASK

         a. Memory

         b. Arithmetic-logic Unit (ALU)
   
         c. Control
   
         d. Input
   
         e. Output
   
      2. In CS221 we looked in detail at memory elements (registers, memory 
         chips, and various magnetic media) and the arithmetic-logic unit (e.g. 
         shifts, hardware realizations of arithmetic operations.)  In this 
         course, we will look at the other building blocks - especially the 
         control element, which is responsible for interpreting machine language
         programs.  We will also look at how these building blocks are 
         interconnected to one another. (Overall system structure, plus the IO 
         and memory subsystems.)

         Note: In most modern computers the ALU and Control elements are 
               combined into a single building block known as the CPU.  However,
               for the purpose of understanding how the CPU works, it is 
                helpful to consider each part separately.

      3. The basic Von Neumann machine architecture can be pictured as follows:

          - - - - - - - CONTROL - - - - - -   Solid lines: flow of data
          |      |          ^ < -|         |  Dashed lines: flow of control
                            |               
          v      |          |    |         v
        INPUT     - - > MEMORY          OUTPUT
          |      |       ^  |    |         ^
          |              |  |              |
          |      |- - >  |  v - -|         |
          |-----------> A.L.U. ------------|
  
      4. The execution cycle of this machine could be described as follows:

            while not halted do
              begin
                fetch an instruction from memory  (Symbolically: IR <- M[PC])
                update program counter            (Symbolically: PC <- PC + 1)
                decode instruction
                execute instruction
              end
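
          The following is a minimal Pascal sketch of this cycle for a
          hypothetical toy machine.  The instruction format (opcode x 100 +
          address) and the three opcodes are invented for illustration; they
          are not those of any real CPU:

             program FetchExecute;
             { A toy Von Neumann machine: each memory cell holds either data
               or an instruction encoded as opcode x 100 + address. }
             const
                LOAD = 1; ADD = 2; HALT = 3;    { invented opcodes }
             var
                M : array[0..99] of Integer;    { memory }
                PC, IR, ACC : Integer;          { program counter, instruction
                                                  register, accumulator }
                opcode, address : Integer;
                halted : Boolean;
             begin
                { a tiny program: ACC := M[50] + M[51], then halt }
                M[0]  := 1 * 100 + 50;          { LOAD 50 }
                M[1]  := 2 * 100 + 51;          { ADD  51 }
                M[2]  := 3 * 100;               { HALT }
                M[50] := 20;  M[51] := 22;

                PC := 0;  ACC := 0;  halted := false;
                while not halted do
                   begin
                      IR := M[PC];              { fetch:  IR <- M[PC] }
                      PC := PC + 1;             { update program counter }
                      opcode  := IR div 100;    { decode }
                      address := IR mod 100;
                      case opcode of            { execute }
                         LOAD: ACC := M[address];
                         ADD:  ACC := ACC + M[address];
                         HALT: halted := true
                      end
                   end;
                writeln('ACC = ', ACC)          { prints ACC = 42 }
             end.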

      5. We will now briefly review the various individual building blocks.

   B. We saw in CS221 that a conventional memory system can be viewed as an 
      array of addressable units or cells, each of which has an unsigned 
      integer address in the range 0..# of cells - 1.  This system interfaces 
      to the rest of the computer through two special registers called the 
      memory address register (MAR) and the memory buffer register (MBR).

       1. The memory system is capable of performing two primitive operations:

         a. Reading the contents of a cell (or sometimes two or more adjacent
            cells), delivering the data stored there to the rest of the
            system (while leaving the copy in memory unchanged.)

         b. Writing a new value into a cell (or sometimes two or more
            adjacent ones.)

      2. To access the memory, the control unit arranges for an address to be
         placed in the MAR.  If the operation is to be a write into the memory,
         it also arranges for data to be placed in the MBR.  Then it issues
         a command to the memory to do the required operation.  (In the case
         of a read, the data read will be placed in the MBR upon completion.)
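
          This protocol can be modeled as a pair of Pascal procedures sharing
          MAR and MBR variables (a sketch of the behavior, not the hardware;
          the 1024-cell memory size is arbitrary):

             program MarMbr;
             var
                M        : array[0..1023] of Integer;
                MAR, MBR : Integer;

             procedure MemoryRead;
             begin
                MBR := M[MAR]    { data appears in the MBR; the copy in
                                   memory is unchanged }
             end;

             procedure MemoryWrite;
             begin
                M[MAR] := MBR    { the value placed in the MBR is stored }
             end;

             begin
                MAR := 42;  MBR := 17;  MemoryWrite;  { write 17 to cell 42 }
                MAR := 42;  MemoryRead;               { read it back }
                writeln(MBR)                          { prints 17 }
             end.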

      3. A fundamental concept is the notion of a "memory cell".

         a. The basic unit of information storage in a computer is, of course, 
            the bit.  But since a single bit is too small a unit of information 
            for most purposes, memories are organized on the basis of larger 
            units each consisting of some fixed number of bits.

         b. In the early days of computing, computers were usually 
            specialized as either "business" or "scientific" machines.

            i. On a typical "business" machine the unit of storage in memory 
               was the character, represented by a code requiring 6-8 bits.

           ii. On a typical "scientific" machine the unit of storage was the
               word, typically around 24-60 bits.  Later, when minicomputers
               were introduced, a word size of 16 bits became common, and one
               minicomputer even had a 12-bit word.

         c. The IBM 360 (early 60's) introduced a new concept: multiple 
            memory organizations in a single machine:

            i. The primary organization of memory was by bytes of 8 bits.

           ii. Two adjacent bytes formed a halfword (16 bits), four adjacent
               bytes formed a word (32 bits), and 8 adjacent bytes formed a
               doubleword.

          iii. Memory was byte-addressable.  The address of a larger unit was
               specified by giving the address of its lowest byte.  Thus
               halfwords always had even addresses; words had addresses that
               were multiples of 4, etc.

         d. This organization has been adopted by many modern machines,
            including the VAX and MIPS.  However, the terminology varies -
            e.g. on the VAX a "word" is 16 bits and a "longword" is 32 bits;
            on MIPS a "word" is 32 bits and 16 bits is called a "halfword",
            as on the 360.   
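
          The alignment rule is easy to state in code.  A short Pascal sketch
          (the unit sizes follow the 360 convention described above):

             program Alignment;
             { A larger unit is named by the address of its lowest byte,
               and that address must be a multiple of the unit's size. }
             function Aligned(address, unitSize : Integer) : Boolean;
             begin
                Aligned := (address mod unitSize) = 0
             end;

             begin
                writeln(Aligned(6, 2));   { TRUE:  legal halfword address }
                writeln(Aligned(6, 4));   { FALSE: words start at 0, 4, 8 ... }
                writeln(Aligned(8, 8))    { TRUE:  legal doubleword address }
             end.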

      4. In addition to specifying the basic unit of memory, we also talk about
         the address space of a machine as representing the range of possible
         memory addresses.  This is basically a function of how many bits are
         used in the formation of a memory address (and thus the size of the
         MAR).  Examples:

         a. IBM 360/370 - 24 bit address -> 16 Megabytes.

         b. PDP-11 - 16 bit address -> 64 K bytes

         c. VAX - 32 bit address -> 4 gigabytes potential - but actual
            implementations of the VAX architecture use a somewhat smaller
            address size to keep costs down.

         d. DEC Alpha (successor to VAX) and later members of the MIPS family 
             - 64 bit address -> about 1.8 x 10^19 bytes potential (only 
             partially supported in terms of actual memory, of course).
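
          In each case the address space is just 2 raised to the number of
          address bits.  A quick check of these figures in Pascal (floating
          point is used so the 64-bit case does not overflow an integer):

             program AddrSpace;
             uses Math;   { for Power }
             begin
                writeln('24 bits: ', Power(2.0, 24):0:0, ' bytes');  { 16 MB }
                writeln('32 bits: ', Power(2.0, 32):0:0, ' bytes');  { 4 GB }
                writeln('64 bits: ', Power(2.0, 64))  { about 1.8 x 10^19 }
             end.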

   C. The ALU is a portion of the system where there is considerable
      architectural diversity.

      1. All ALU's consist of three basic types of building block:

         a. Registers - special high-speed memory cells for storing items that
            are being worked on.

         b. Functional elements - e.g. adders, shifters, comparators etc.

          c. Data paths connecting (a) and (b), as well as external connections
            to the memory and I/O subsystems.

         d. Not all of these are directly visible to the assembly language
            programmer.  Typically, the registers and the functional elements 
            are the most noticeable.

      2. The following simplified block diagram shows how a typical ALU might 
         be organized:

                                To memory/IO bus(ses)
                                        ^ v
                                        |_|______________________________
                                                                        |
                ______________________________________ Result bus       |
                |                 |        |         |                  |
                |                 |        |       Bitwise              |
                |               Adder   Shifter    Logic                |
                |                ^ ^       ^       Functions            |
                |                | |       |         ^ ^                ^
                |                |_|_______|_________| |                |
                |                | |___________________|                v
                |                |                     |                |
                |       Operand 1|_______   ___________| Operand 2      |
                |       Bus              ^  ^            Bus            |
                |                        |  |                           |
                |                    Register Set                       |
                |                    (including                         | 
                |                         MAR  _________________________|
                |                         MBR) _________________________|
                |                          ^
                |                          |
                |__________________________|

         a. An instruction to add the contents of a memory location to a
            register could be executed as follows:

            i. The address of the appropriate cell in memory would be placed
               in the MAR.

           ii. The memory would be instructed to read that cell and place its
               contents in the MBR.

          iii. The contents of the appropriate register would be placed on
               the Operand 1 Bus, and the contents of the MBR would be placed
               on the Operand 2 Bus (or vice versa).  The adder would sum the 
               two numbers and place the result on the Result Bus, and
               the value on the Result Bus would be stored back into the
               appropriate register.

         b. Note that some instructions would not use all the busses.  For
            example, an instruction to copy a memory location into a
            register would not use the Operand 1 bus; the adder would be
            told to route the Operand 2 bus value straight through (in
            effect by adding zero to it.)

   D. The I/O subsystem is the area where there is the greatest diversity 
      between systems.  Almost anything can be an IO device for a computer - 
      from a terminal to an automobile or a power plant!

      1. Broadly speaking, IO devices fall into two categories:
   
         a. Sequential devices can process data only in a certain fixed order.
            This category includes devices like terminals and printers.

         b. Direct access devices allow the user to read/write a specific
            location on the device.  Disks are the major example of such a
            device.
   
         c. Some devices - such as magnetic tape - are hybrids: they are
            basically sequential, but have some direct access capability.
   
      2. Each IO device connects to the system through a CONTROLLER or
          INTERFACE, which in turn connects to a system bus.  (Often, one
         controller may service several devices of the same type to keep
         costs down.)  For example, the following is a simplified version of
         the basic configuration of our old Micro-VAX (CHARITY):

        Terminal---                     
        Terminal---  Terminal           
        Terminal---  controller ________________ Disk        ----- Disk
        Terminal---                     |         Controller ----- Disk
        Terminal---                     |                    ----- Disk
        Terminal---                     |
        Terminal---                     |________ Tape
                                        |         Controller ----- Tape
        Ethernet---  Network    --------|
                     controller         |
                                  System IO Bus

       3. Writing routines that access IO devices is quite complicated,
          because the code is very device-specific, and because it is often
          necessary to deal with various kinds of error conditions that can
          arise.  For this reason, most computer systems are used with an
          OPERATING SYSTEM (such as VMS on the VAX or MS-DOS on PC's) that
          contains routines (known as device drivers) for each kind of device
          on the system.  Consequently, we will say only a little about
          accessing IO devices in this course.

   E. The control unit is the part of the system that is responsible for
      carrying out the basic Von Neumann machine "fetch-execute" cycle.

      1. To facilitate this, the control unit contains two special registers:

         a. An instruction register (IR) to hold the instruction that is
            currently being worked on.

         b. A program counter (PC) to hold the address of the next instruction
            to execute.  This must be updated each time through the fetch
            execute cycle.

            i. Typically, this is done by adding the length of each
               instruction to the PC when it is fetched - i.e. instructions
               occupy successive locations in memory.  (This is analogous
               to the way statements in a Pascal program are executed
                successively, one after another.)

           ii. Some instructions serve to alter the PC to change this flow
               of control.  They are analogous in function to the goto and
               procedure call instructions of Pascal (which are, of course,
               derived from them.)

      2. What does an instruction look like?
  
          a. Normally, it consists of an OPERATION CODE (op code) that tells 
            is to be done, plus some number of OPERAND SPECIFIERS that 
            specify the operands to be used.
   
         b. The set of op-codes that can be used comprises the INSTRUCTION SET
            of a particular machine.  A typical instruction set might include
            operations like the following:
   
            i. Data movement operations
   
           ii. Arithmetic operations: add, sub, mul, div - often with different
               versions for different data types (e.g. integer, floating point,
               possibly with different operand sizes.)
   
          iii. Bitwise logical operations - bitwise and, or, xor.
   
           iv. Arithmetic comparison operations
   
            v. Conditional and unconditional branch instructions (gotos) and
               procedure call instructions
   
           vi. Etc.
   
         c. The operand specifiers often allow a variety of ADDRESSING MODES -
            e.g. an operand specifier may specify that a certain constant value
            is to be used, or that the contents of a certain register are to
            be used, or that the operand is to be found in a certain memory cell.
   
            Example: Consider the following Pascal program fragment
   
            var
                i: integer;
                p: ^integer;
   
            ...
               ... i + p^ + 3
   
            At the machine language level, three different addressing modes 
            could be used for the three operands (if available on the machine 
            in use.)
   
            i. For i - direct addressing.  The instruction would contain the 
               address of i, which contains the value to use.
   
           ii. For p^ - indirect addressing.  The instruction would contain the
               address of p, which in turn contains the ADDRESS of the value to 
               use.  To access the data, the CPU would make two trips to memory 
               - one to get the address of the data item (contents of p) and one
               to get the data value (contents of p^).
   
          iii. For 3 - immediate addressing.  The instruction would contain the
               actual value 3.
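
             These three modes can be modeled in Pascal (a sketch; the mode
             names, memory array, and cell assignments are invented for
             illustration):

                program AddrModes;
                type
                   Mode = (Immediate, Direct, Indirect);
                var
                   M : array[0..99] of Integer;

                function FetchOperand(m : Mode; field : Integer) : Integer;
                begin
                   case m of
                      Immediate: FetchOperand := field;      { value is in
                                                               the instruction }
                      Direct:    FetchOperand := M[field];   { one memory trip }
                      Indirect:  FetchOperand := M[M[field]] { two memory trips }
                   end
                end;

                begin
                   M[10] := 5;    { i lives in cell 10; i = 5 }
                   M[11] := 20;   { p lives in cell 11, pointing to cell 20 }
                   M[20] := 7;    { p^ = 7 }
                   { i + p^ + 3, using direct, indirect, and immediate: }
                   writeln(FetchOperand(Direct, 10) +
                           FetchOperand(Indirect, 11) +
                           FetchOperand(Immediate, 3))   { prints 15 }
                end.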

      3. The instruction set of a given machine constitutes a language - the
         machine language of that machine - and the control unit is an
         interpreter for that language.  Each different CPU architecture is
         characterized by a distinctive machine language.

         Example: Consider the task of adding one to the contents of an
                  integer variable X that happens to be stored in memory
                  cell 42.  The following is the machine language code for
                  this on various machines (in hexadecimal)

         VAX            0000002A 9F D6

          80x86          FF 05 0000002A

         MIPS (minimum of three instructions - recall that MIPS is a RISC!)

                         8C02002A
                        20420001
                        AC02002A

         (Actually, some NOP instructions would likely be needed to account
          for delays needed by pipelining)

      4. Of course, if one needs to write programs at this level, one
         seldom programs directly in machine language.  Instead, one typically
         uses a symbolic language known as ASSEMBLY LANGUAGE.

         Example: Assembly language equivalents of the above:

         VAX            INCL    X

         80x86          INC     X       

         MIPS (three instructions - recall that MIPS is a RISC!)

                        lw      $2, X
                        addi    $2, $2, 1
                        sw      $2, X

          Note:

         a. Mnemonics are used in place of numeric op-codes

          b. Symbolic names are used for storage locations.

         c. Generally: one line of assembly language per machine language
            instruction.

          d. Translation into machine language is a straightforward mechanical
            process done by a program called an ASSEMBLER.

      5. Even though we will focus on a particular assembly language (that of
         the VAX), it is a goal of this course that you should be able to 
         transfer what you have learned to another machine.  All assembly 
         languages are similar in principle, though different in form.

   F. Conclusion

      1. This ends our discussion of the general structure of Von Neumann
         computers.
   
      2. Beginning with the next lecture, we will begin considering in detail
         the architecture of the VAX, with some comparative looks at MIPS.

Copyright ©1999 - Russell C. Bjork