CS311 Lecture: Representing Information in Binary;      Last revised 8/27/03
               Octal and Hexadecimal Shorthands
Objectives:

1. To review binary representation for unsigned integers
2. To introduce binary representations for signed integers
3. To introduce the IEEE 754 floating point representation
4. To discuss the basic process of doing floating point arithmetic
5. To survey binary representations for characters, sounds, and graphics
6. To introduce octal and hexadecimal shorthands

Materials:

1. Sample Editor program
2. Transparency summarizing IEEE floating point formats
3. RGB Demo Applet

I. Introduction
-  ------------

   A. One of the key suggestions in Von Neumann's landmark paper was that
      computer systems should be based on the binary system, using just
      the two bit values 0 and 1 to encode information.

      1. This is consistent with the nature of the physical devices used
         to construct computer systems, and has contributed to the high
         reliability we associate with digital systems.

      2. Although Von Neumann was thinking just in terms of representing
         integers in binary, we have since learned to represent many other
         kinds of information this way, including text, sounds, and graphics.

      3. To this day - over 5 decades later, and through several changes
         in fundamental technology  - this idea still dominates the field of 
         computing.

   B. At the outset, we should note that there is a crucial distinction between
      a NUMBER (an abstraction) and a REPRESENTATION of a number (a symbol).  

      1. For example, the number we call four can be represented by the 
         following symbols:

        FOUR    QUATUOR         4       IV      ||||    11_3    100_2   etc.

      2. On the other hand, the symbol 100 could stand for 4 (as above) or for 
         one hundred (decimal system) or for 9 (trinary system) or for any one 
         of an infinite variety of possible values.

   C. Most often, we symbolize numbers by a symbol based on a positional
      notation system.  The development of positional notation was a crucial 
      advance in the development of modern mathematics.  In a positional 
      system, we have a base or radix (such as 2 for binary, 3 for trinary, or 
      ten for decimal).  We make use of a limited set of symbols for 
      representing values 0 .. radix - 1.

      1. For example, in binary, we use two symbols (0,1); in octal (base 8) 
         we use 8 (0,1,2,3,4,5,6,7); in hexadecimal (base 16) we use 16 symbols 
         (0,1,2,3,4,5,6,7,8,9,A,B,C,D,E,F) etc. 

      2. When there is any possibility of confusion, we denote the radix of a
         numerical symbol by a subscript (written using decimal notation!) - 
         e.g.: 9010_16
   D. Further, in a positional system the value of a symbol depends on where it
      is written.  The rightmost symbol (digit) has the value (symbol_value)*1;
      the one next to it has value (symbol_value) * radix etc.  Thus, the       
      radix-2 (binary) number 01000001 is equivalent to the decimal number:

        0*128 + 1*64 + 0*32 + 0*16 + 0*8 + 0*4 + 0*2 + 1*1 = 65_10
   E. In this and subsequent lectures, we will consider how various kinds
      of information can be represented in binary.  Today, we limit
      ourselves to unsigned integers.

II. Review of Internal Encoding for Unsigned Binary Integers
--  ------ -- -------- -------- --- -------- ------ --------

   A. The normal way of encoding unsigned binary integers is to use a
      straightforward place-value system, in which the bits are assigned
      weights 2^0, 2^1, 2^2 etc.  Thus, for example, the binary integer 1011
      would be interpreted as

          1 * 2^0 = 1
        + 1 * 2^1 = 2
        + 0 * 2^2 = 0
        + 1 * 2^3 = 8
                  ---
                   11

   B. Note that with n bits we can store values in the range 0 .. 2^n - 1

   C. Conversion between decimal and binary

      1. To go from binary to decimal, we use the basic approach outlined above:
         multiply the rightmost bit by 2^0, the next bit by 2^1, the next bit 
         by 2^2 etc.  Add the products.  (It helps if you memorize the powers 
         of 2)

         Example: convert 10011101 to decimal           (157)

         Exercise: 10101010                             (170)

      2. To go from decimal to binary, we can use successive division:
         Divide the decimal number by 2.  The remainder is the rightmost bit of
         the binary equivalent.  Divide the quotient by 2.  The new remainder 
         is the second rightmost bit.  Divide the new quotient by 2.  The new
         remainder is third rightmost bit ... Continue until the quotient is 0.

         Example: convert 238 to binary

                238 / 2 = 119 rem 0     <- least significant bit
                119 / 2 =  59 rem 1
                 59 / 2 =  29 rem 1
                 29 / 2 =  14 rem 1
                 14 / 2 =   7 rem 0
                  7 / 2 =   3 rem 1
                  3 / 2 =   1 rem 1
                  1 / 2 =   0 rem 1     <- most significant     238 => 11101110

         Exercise: 252                                  (11111100)
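
      Aside: both conversions are easy to express in code.  The following
      is a minimal Python sketch (the function names are my own, chosen
      just for illustration):

         def binary_to_decimal(bits):
             # Multiply each bit by its weight - 2^0 for the rightmost
             # bit, 2^1 for the next, etc. - and add the products.
             value = 0
             for i, bit in enumerate(reversed(bits)):
                 value += int(bit) * 2**i
             return value

         def decimal_to_binary(n):
             # Successive division: each remainder is the next bit,
             # right to left; continue until the quotient is 0.
             if n == 0:
                 return "0"
             bits = ""
             while n > 0:
                 bits = str(n % 2) + bits
                 n = n // 2
             return bits

         print(binary_to_decimal("10011101"))   # 157
         print(decimal_to_binary(238))          # 11101110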

   D. Adding binary numbers: As a child, you learned how to do addition based
      on an addition table like the following:

        +       0    1    2    3    4    5    6    7    8    9
        0       0    1    2    3    4    5    6    7    8    9
        1       1    2    3    4    5    6    7    8    9    0+c
        2       2    3    4    5    6    7    8    9    0+c  1+c
        3       3    4    5    6    7    8    9    0+c  1+c  2+c

        etc.    (here "n+c" means digit n plus a carry into the next column)

      For binary, the table is much simpler:

        +       0       1
        0       0       1
        1       1       0+c

      Example:  01011010                Check:  90
                01101100                       108
                --------                       ---
                11000110                       198

      Exercise: 00111001
                01011010
                --------
                10010011

   E. One issue to be sensitive to in addition and other operations is
      OVERFLOW.  Computer hardware typically uses a fixed number of bits
      to represent integers, so an erroneous result will be produced if the
      correct sum is too big for the representation.

      Example: assume a certain machine uses 8 bits for integers.  Consider
               the following problem:

                11001000                200     
                11001000                200
                --------                ---
                10010000                144 !!

      The error arises because 8 bits cannot represent the correct sum.
      (Here, we detect overflow by the fact that there was carry out of the
      most significant bit position.)
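
      Aside: in code, unsigned overflow detection amounts to checking for a
      carry out of the top bit.  A minimal Python sketch (the function name
      is illustrative only):

         def add_unsigned_8bit(a, b):
             # Add two 8-bit unsigned values; a carry out of the most
             # significant bit position signals overflow.
             total = a + b
             carry_out = total > 0xFF         # i.e. bit 8 would be 1
             return total & 0xFF, carry_out   # keep only the low 8 bits

         result, overflow = add_unsigned_8bit(200, 200)
         print(result, overflow)   # 144 True - the erroneous sum above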

   F. Other arithmetic operations can be done on binary numbers in a manner
      analogous to the decimal operations, but using the binary tables.

III. Representations for Signed Integers
---- --------------- --- ------ --------

   A. The method we have developed thus far for representing integers in binary
      only allows us to represent integers >= 0.  Obviously, we also need
      a way to represent negative numbers.

   B. In decimal, we represent negative numbers by representing their absolute 
      value and preceding it by a negative sign.  This, in effect, means that 
      we have added one more symbol to our set of numerical symbols (a 10%
      increase in the size of our set of symbols).  On a  computer, this is 
      highly undesirable, since we would then have to have three different 
      symbols to represent numbers with, instead of two (a 50% increase in the
      size of our set of symbols.)

   C. Instead, we take advantage of the fact that the number of bits in
      a binary number is fixed by the hardware design (the machine's word
      length).  

      1. If we wish to represent signed numbers, then, we can reserve the
         leftmost bit as a sign bit, and interpret it as follows:

                0 --> the number is >= 0        1 --> the number is negative

      2. A consequence of reserving one bit for the sign is that the largest
         positive number we can represent is about half as big as what we
         could have represented with the same number of bits using unsigned
         notation (we've given up one bit for the sign.)  (This is why some
         languages (e.g. C++) allow integers to explicitly be declared as
         unsigned - to allow all bits to be used for representing magnitude.)

      3. An important ramification of a choice like this is that we have
         to be careful about OVERFLOW.

         a. With a fixed number of bits allocated for representing an integer,
            an addition can produce a result that is too big to be represented
            in the specified number of bits.

            Example: represent 100 (decimal) as an 8 bit signed binary number:

                     01100100   (leftmost bit is sign - 0)

            Now take the sum 100 + 100 = 200.  This results in overflow,
            because we only have seven bits available for representing the
            magnitude of the number (since one bit is reserved for the sign)

          b. With the representation schemes we will be using, undetected
             overflow will typically produce a result of the wrong sign - 
             e.g. the binary representation for 200 in 8 bits is

                    11001000

             but this looks like a negative number since the leftmost bit
             is 1!

   D. With this basic encoding, there are actually three possible schemes for
      representing signed numbers. All agree on using the leftmost bit
      in the number to encode the sign - with 0 encoding + and 1 encoding -.
      They differ in what they do with the remaining bits.

      1. Sign magnitude

      2. One's complement

      3. Two's complement

      We will only discuss the first and last schemes, since one's complement
      is rarely used.

   E. For the following examples, we will assume the use of 8 bits to
      represent an integer - 1 for the sign and 7 for the value.   This is
      just so we don't have to write lots of 0's or 1's on the board -
      most typically we use 32 bits (or even 64 bits) to represent an integer.

IV. Sign-Magnitude Representation
--  -------------- --------------

   A. The simplest way to represent signed numbers is called sign-magnitude.
      It is based on the method we use with decimal numbers:

      1. To represent a positive number in (say) 8 bits, represent its
         magnitude as seven bits and prefix a 0.

      2. To represent a negative number in (say) 8 bits, represent its magnitude
         as seven bits and prefix a 1.

         Example:       +65     --> 1000001 (7 bits) --> 01000001
                        -65     --> 1000001 (7 bits) --> 11000001

         Exercise:      + 72, -72       (01001000, 11001000)

      3. To change the sign of a number, invert the sign bit.

   B. Some Ramifications of this Representation

      1. There are two representations for zero: 00000000, 10000000

         (The latter can be used as an "undefined value" to be stored in
          uninitialized variables.  If the hardware ever reads it from memory,
          a trap can be generated to indicate that the programmer has used
          an undefined variable.)

      2. Range of values representable with n bits: -(2^(n-1) - 1) .. 2^(n-1) - 1

      3. Unfortunately, while simple for us, sign-magnitude is awkward in
         hardware.  For example, the algorithm to add two sign magnitude numbers
         looks like this:

         If signs agree:        add magnitudes, retaining signs.  Overflow
                                occurs if the sum magnitude exceeds 2^(n-1) - 1 
                                - i.e. if there is carry out of the most
                                significant bit of the magnitude sum.

         If signs differ:       find number with greater magnitude and retain
                                its sign.  Form magnitude of result by
                                subtracting smaller magnitude from larger.
                                (Overflow cannot occur.)

          Examples:      0 0000001               1
                       + 0 0000001               1
                         ---------               -
                         0 0000010               2

                        1 1000000             -64
                      + 0 0000001               1

              becomes:  1 1000000
                      - 1 0000001
                        ---------             ---
                        1 0111111             -63

                        1 0000001              -1
                        1 1111111            -127
                        ---------            ----
                        1 0000000       OVERFLOW - CARRY OUT OF MOST
                                        SIGNIFICANT BIT OF MAGNITUDE SUM

         Exercise:      00100000
                      + 10000010
                        --------
                        00011110

      4. Actually, multiplication and division are more straightforward:
         multiply/divide magnitudes and set sign = xor of original signs.
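
      Aside: a minimal Python sketch of the sign-magnitude encoding described
      above (8 bits assumed; the function names are illustrative only):

         def to_sign_magnitude(x, n=8):
             # Sign bit (0 = non-negative, 1 = negative) followed by
             # the magnitude in n-1 bits.
             sign = '1' if x < 0 else '0'
             return sign + format(abs(x), '0%db' % (n - 1))

         def from_sign_magnitude(bits):
             magnitude = int(bits[1:], 2)
             return -magnitude if bits[0] == '1' else magnitude

         print(to_sign_magnitude(65))            # 01000001
         print(to_sign_magnitude(-65))           # 11000001
         print(from_sign_magnitude('11001000'))  # -72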

   C. Sign-magnitude is rarely used for integers, but is used as part of
      the representation for reals as we shall see later.

V. Two's Complement Representation
-  ----- ---------- --------------

   A. The most commonly used scheme for representing signed numbers is
      called two's complement.  

      1. The basic idea is this: when we write a positive number, we
         are really writing an abbreviation for the representation, in the
         sense that the number could be thought of as having infinitely many
         0's to the left - e.g.

         The representation for 42 (using 8 bits) is 00101010 - but we could
         think of this as standing for

         ... 00000000000000000101010
         infinitely many 0's 

      2. To form the representation for a negative number, think of what would
         happen if we subtracted its absolute value from 0 (-X = 0 - X).  E.g.
         if we subtracted the representation for 42 from 0, we would get

         ... 00000000000000000000000
       - ... 00000000000000000101010
         ---------------------------
         ... 11111111111111111010110
         infinitely many 1's 

         a. If we abbreviate down to 8 bits, we get 11010110

         b. Actually, we don't have to work with infinitely many bits - to
            negate an n bit number, it suffices to subtract it from the
            (n+1)-bit number 2^n and then truncate to n bits - e.g.

         100000000      Representation for 2^8
       -  00101010      42
         ---------
         011010110 = 11010110 in 8 bits
         ^
         +---- Discard this bit

      3. This is called the TWO'S-COMPLEMENT representation - i.e. we say
         that 11010110 is the two's complement of 00101010 - and vice versa.

         Observe that if we add the n-bit representation of a number and its 
         two's complement, without treating the sign bit specially, the result
         is 2^n.  However, if we truncate the result to n bits, it becomes 
         0 - which is surely what we want, since X + (-X) = 0

                 00101010
              +  11010110
                 --------
                100000000 = 00000000 in 8 bits

   B. An Easy Way to Represent Numbers in Two's complement

      1. To represent a positive number in n bits, represent it as an
         unsigned number using n-1 bits and prefix a 0.  (If this cannot be
         done, then the number cannot be represented as an n-bit two's
         complement number.)

      2. To represent a negative number in n bits, represent its absolute
         value as above, then invert all the bits (including the sign bit)
         and add 1. 

         Example:       +65 --> 01000001
         (8 bits)       -65 --> 10111111

         Exercise:      + 72, -72       (01001000, 10111000)
                                                                     
      3. To change the sign of a number, invert all the bits, add 1

      4. Observe: if we are using n bits, then X, if negative, looks like the
                  unsigned number 2^n + X
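
      Aside: a minimal Python sketch of the two's complement encoding just
      described.  It exploits the observation just above: a negative X looks
      like the unsigned number 2^n + X.  (Function names are illustrative
      only.)

         def to_twos_complement(x, n=8):
             # A negative x looks like the unsigned number 2^n + x, so
             # reduce modulo 2^n and format as n unsigned bits.
             return format(x % 2**n, '0%db' % n)

         def negate(bits):
             # Invert all the bits, then add 1 (truncating to n bits).
             n = len(bits)
             inverted = ''.join('1' if b == '0' else '0' for b in bits)
             return format((int(inverted, 2) + 1) % 2**n, '0%db' % n)

         print(to_twos_complement(65))    # 01000001
         print(to_twos_complement(-65))   # 10111111
         print(negate('01001000'))        # 10111000 = -72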

   C. Some Ramifications of this Representation

      1. There is one representation for 0: 00000000, so X + (-X) = 00000000

      2. Range of values representable with n bits:  -2^(n-1) .. 2^(n-1) - 1

         (Note that this is asymmetric)

      3. To add two 2's complement numbers: add them as if they were unsigned 
         (i.e. treat the sign as if it were the most significant bit of an
         unsigned number.)  Discard any carry out of the sign position.

         To see why this works (in the absence of overflow), observe the
         following:

         a. If we add two non-negative numbers X and Y, the operation is the
            same as unsigned addition in any case.

          b. If we add a negative number (X) to a non-negative number (Y),
             X looks like the unsigned number X + 2^n and the sum looks like 
             the unsigned number X + Y + 2^n.

            i. If the final sum is negative, this is the correct
               representation.

           ii. If the final sum is positive, there is (of necessity) a
               carry out of the most significant position which cancels
               the 2^n term, so the result looks like X + Y.

         c. If we add two negative numbers X and Y, X looks like the
            unsigned number X + 2^n and Y looks like the unsigned number
            Y + 2^n.  The sum therefore looks like X + Y + 2^n + 2^n.
            However, since there is (of necessity) a carry out of the
            most significant position, one of the 2^n's is cancelled and
            the sum looks like X + Y + 2^n.

       4. There is overflow if carry out of sign != carry in to sign

          Examples:     00001000                8
                        11111111               -1
                        --------               --
                        00000111                7

                        11111110               -2
                        11111110               -2
                        --------               --
                     1  11111100               -4
                      \
                        carry out of sign position discarded

                        01100100              100
                        01100100              100
                        --------
                        11001000 (-56)  OVERFLOW - CARRY-IN TO SIGN BUT
                                                   NONE OUT OF SIGN

          Exercise:     11000001              -63
                        01000000               64
                        --------               --
                        00000001                1


            Proof of overflow rule:

            Carry in to sign != carry out of sign  <=> overflow

            a. If we add two positive numbers, there will never be carry out
               of the sign (since both signs are zero).  The sign of the result
               will be 1 (which must be an overflow) iff there is carry in to
               the sign - i.e. the two carries differ.

            b. If we add two negative numbers, there will always be carry out
               of the sign (since both signs are one).  The sign of the result
               will be 0 (which must be an overflow) iff there is no carry in to
               the sign - i.e. the two carries differ.

            c. If we add two numbers of unlike sign, there can never be
               overflow.  Further, there will be carry out of the sign iff
               there is carry in to the sign (since just one sign is one) - 
               i.e. the two carries will always be the same.

      5. To subtract: negate (flip all the bits, add 1) the subtrahend and
         add.

      6. Multiplication and division are more complex.  One approach is
         to complement negative values so as to do the whole operation
         with positive values, then, if the two original operands were of
         opposite sign, complement the result.  Or the whole operation can
         be done in 2's complement - we'll discuss this later.
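
      Aside: a minimal Python sketch of rules 3 and 4 above - add as if
      unsigned, discard the carry out of the sign, and flag overflow when
      the carry in to the sign differs from the carry out.  (The function
      name is illustrative only.)

         def add_twos_complement(a_bits, b_bits):
             n = len(a_bits)
             a, b = int(a_bits, 2), int(b_bits, 2)
             total = a + b
             carry_out = (total >> n) & 1
             # Carry in to the sign = carry out of adding the low n-1 bits
             low_mask = 2**(n - 1) - 1
             carry_in = ((a & low_mask) + (b & low_mask)) >> (n - 1) & 1
             overflow = carry_in != carry_out
             return format(total % 2**n, '0%db' % n), overflow

         print(add_twos_complement('00001000', '11111111'))  # ('00000111', False)
         print(add_twos_complement('01100100', '01100100'))  # ('11001000', True)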

   D. Two's complement is the preferred scheme for integer arithmetic on 
      most machines, though many (including MIPS) use sign-magnitude for 
      floating point.  (Note that, since the arithmetic algorithms are 
      wired into the hardware, the choice is made by the hardware designer 
      and cannot easily be altered in software.)

VI. Internal encoding for reals (floating point numbers)
--  -------- -------- --- ----- ------------------------

   A. Thus far we have confined our discussion to integers.  How can we 
      represent real numbers?  It turns out that this is an area where 
      architectures diverge widely.  We will discuss general principles, and
      then a particular approach that has become widely accepted.

   B. A real number is stored internally as a mantissa times a power of some
      radix - i.e.
                 m * r^e

      1. CPU architects have several basic options regarding the format of the
         mantissa, m:

         a. It may be a pure fraction.  That is, there is an assumed binary 
            point just to the LEFT of the first bit of the mantissa.

         b. It may be a number between 1 and 2, with an assumed binary point
            between the first and second bits.

         c. It may be an integer, with an assumed binary point just to the
            RIGHT of its last bit.

      2. The exponent e specifies a power of some radix, r, by which the 
         mantissa is to be multiplied.

         a. The radix is often 2, but in some machines is 4, 8, or 16.

            Example: If a particular machine uses 2 as the radix of its
                     exponent, then a number with mantissa .1000 and exponent 
                     2 is interpreted as

                     0.5_10 * 2^2  =  2

                     However, if the exponent radix is 16 (as on the IBM 
                     360/370), then the interpretation is

                     0.5_10 * 16^2  =  128

         b. Of course, the choice of what radix to use for the exponent is an
            architectural choice, involving tradeoffs between RANGE and
            PRECISION.  

   C. In an attempt to deal with data exchange problems arising from a 
      diversity of floating point representations used by different 
      manufacturers, the IEEE has developed a floating point standard (Standard 
      754) that is used by most systems.

      1. Older architectures developed prior to this standard may use a
         different format (in fact, the proliferation of such formats was
         what led to the development of IEEE 754).  

         a. Two important architectures that are still in use that utilize
            a pre-IEEE 754 floating point format are the IBM mainframe
            architecture and the VAX architecture.

         b. All current microprocessor architectures in wide use utilize the
            IEEE 754 format, including MIPS, IA32, and PowerPC.

         c. The Java Virtual Machine utilizes the IEEE 754 format.

       2. The standard provides two different formats for single and double 
          precision numbers.  We will consider the single precision 
          representation in some detail.

              31 30       23 22                   0
              -------------------------------------
              |s|exp        |fraction             |
              -------------------------------------

         a. s = sign of mantissa: 0 = +, 1 = -

         b. exp = exponent as power of 2, stored in excess 127 form - i.e. 
            value in this field stored is 127 + true value of exponent.

         ex: true exponent = 0; stored exponent = 127
             true exponent = 127; stored exponent = 254 (largest possible value)
             true exponent = -126; stored exponent = 1 (smallest possible value)

         The extreme values of the stored exponent (0 and 255) are reserved
         for special purposes to be described below.

         c. The significand (magnitude of mantissa) is normalized to lie in the 
            range 1.0 <= |m| < 2.0.  This implies that the leftmost bit is a 1. 
            It is not actually stored - the 23 bits allocated are used to store
            the bits to the right of the binary point, and a 1 is inserted to 
            the left of the binary point by the hardware when doing arithmetic.
            (This is called hidden-bit normalization, and is why this field is 
            labelled "fraction".) In effect, we get 24 bits of precision by 
            storing 23 bits.

         Example:       stored significand = 00000000000000000000000
                        true significand = 1.0

                        stored significand = 11111111111111111111111
                        true significand = 1.11111111111111111111111
  
          d. As noted above, certain exponent values are reserved for special
             purposes, and when they appear the interpretation of the significand
             changes:

            i. Stored exponent = 0

               - If the significand is zero, then the number represented is 0.0
                 (Since an all-zero stored significand represents 1.0, a special
                 case is needed to represent zero.)

               - If the significand is non-zero, then we have a denormalized
                 number; no hidden bit is inserted and the true exponent is 
                 taken as -126

           ii. Stored exponent = 255

               - If the significand is zero, then the number represented is
                 +/- infinity (depending on the sign bit.)  The hardware
                  correctly propagates infinity in arithmetic - e.g. infinity +
                  anything is infinity.

               - If the significand is non-zero, then the representation is
                  not a number (NAN).  Any use of NAN in an arithmetic operation
                  always produces NAN as the result.

      3. Examples:  0.0:                0 00000000 00000000000000000000000

                    1.0:                0 01111111 00000000000000000000000
                        (1.0 x 2^0)

                    0.5 (0.1 binary):   0 01111110 00000000000000000000000
                        (1.0 x 2^-1)

                    0.75 (0.11 binary): 0 01111110 10000000000000000000000
                         (1.1 x 2^-1)

                    3.0 (11 binary):    0 10000000 10000000000000000000000
                         (1.1*2^1)

                   -0.375 (-0.011 binary): 1 01111101 10000000000000000000000
                        (-1.1*2^-2)

                    1 10000011 01000000000000000000000 = - 1.01 * 2^4 = -20.0

      4. Range of values:

          - largest finite positive:     0 11111110 11111111111111111111111 =
            1.11111111111111111111111 * 2^127 or ~2^128 (about 3.4 * 10^38) 

         - smallest normalized pos.:    0 00000001 00000000000000000000000 =
           1.00000000000000000000000 * 2^-126 (about 1 * 10^-38)
        
         (The precision of both of the above is 24 bits = ~ 7 decimal places)

          - smallest positive:           0 00000000 00000000000000000000001 =
            0.00000000000000000000001 * 2^-126 = 2^-149 (about 1.4 * 10^-45)
            (But precision is only one bit!)
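
       Aside: a minimal Python sketch that decodes a single precision bit
       pattern according to the rules above, cross-checked against the
       machine's own decoding via the standard struct module:

         import struct

         def decode_ieee_single(pattern):
             # Split the 32-bit pattern: 1 sign, 8 exponent, 23 fraction bits
             sign = (pattern >> 31) & 1
             exp  = (pattern >> 23) & 0xFF
             frac = pattern & 0x7FFFFF
             if exp == 255:                    # infinity or NAN
                 return float('nan') if frac else (-1)**sign * float('inf')
             if exp == 0:                      # zero or denormalized:
                 return (-1)**sign * (frac / 2**23) * 2**-126   # no hidden bit
             # Normalized: insert the hidden 1; exponent is excess 127
             return (-1)**sign * (1 + frac / 2**23) * 2**(exp - 127)

         print(decode_ieee_single(0b0_01111110_10000000000000000000000))  # 0.75
         print(decode_ieee_single(0b1_10000011_01000000000000000000000))  # -20.0
         print(struct.unpack('>f', (0xC1A00000).to_bytes(4, 'big'))[0])   # -20.0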

       5. IEEE 754 also defines a double precision floating point standard, 
          which represents a number using 64 bits: 1 for the sign, 11 for the
          exponent (excess 1023), and 52 for the fraction.

      6. Summary: TRANSPARENCIES - MOTOROLA MC68881 MANUAL PP 2-16, 3-24

VII. Floating point Arithmetic
---  -------- ----- ----------

   A. Arithmetic on floating point numbers is, of course, much more complex
      than integer (or fixed-point) arithmetic.  

      1. It is not necessary to have hardware provisions for doing floating
         point arithmetic - it is possible to code subroutines to perform
         basic floating point operations using a combination of integer
         arithmetic and shift operations.  

          a. Historically, when integrated circuit technology was more limited,
            that was often the case.

         b. It still is the case for low-end microprocessors used in embedded
            systems.

      2. Historically, many CPU's relegated floating point arithmetic to
         a separate processor, often called a "floating point coprocessor".
         On older systems, this was often a separate chip, which may or may
         not be installed in a given computer.  (If not, floating point
         arithmetic would be done in software.)  Today, the "coprocessor"
         is actually often part of the main CPU chip.  The term coprocessor
         remains in use for historical reasons, and because floating point
         operations often use their own register set.

      3. What we have to say in this section is applicable regardless of how
         floating point operations are physically implemented.

   B. We will briefly consider the basic task facing floating point processors,
      but will not look at the algorithms in detail.

   C. Recall that a floating point number is actually represented internally
      by two fixed-point numbers: a mantissa and an exponent.  That is, it is
      of the form:
                 m * r^e

      We will assume use of the IEEE standard - i.e. 1 <= m < 2, with r = 2.

   D. Floating point addition or subtraction entails the following steps:

      1. Reinsertion of the hidden bit.  Though normalized floating point 
         numbers are STORED without the 1 to the left of the binary point,
         the arithmetic unit can work on an internal form of the number
         with the hidden bit explicitly present.

         (Of course, if an operand is zero or a denormalized number, a 0 is
          inserted in the hidden bit position.)

      2. Denormalization: if the exponents of the two operands differ, then
         the operand with the smaller exponent must be shifted right to line
         up the implied binary points.  The larger exponent will then be the
         exponent of the result.
          Example:       1.00 * 2^0  +  1.00 * 2^-1   must be converted to:

                         1.00 * 2^0  +  0.10 * 2^0    before adding
  
      3. The addition/subtraction proper.

       4. Renormalization: There are three possibilities for the result of
          the addition/subtraction step:

          a. There could be carry out from the leftmost mantissa bit:

             Example: 1.10 * 2^0 + 1.00 * 2^0 yields 0.10 * 2^0 plus a carry out

             In this case, the mantissa is shifted right (bringing the
             carry bit in), and the exponent is increased.

             Example: The final result of the above is 1.01 * 2^1

         b. There could be no carry out, and the leftmost bit of the mantissa
            could be zero - i.e. the result could be unnormalized.  (This only
            occurs when adding numbers of unlike signs or subtracting numbers
            of like sign.)

             Example: 1.10 * 2^0 - 1.00 * 2^0 yields 0.10 * 2^0 (with no carry out)

             In this case, the mantissa must be shifted left one or more places
             to renormalize it, and the exponent must be decreased for each
             shift.

             Example: The final result of the above is 1.00 * 2^-1

            Note: To reduce the loss of precision in cases like this, the
                  floating point unit often includes one or two GUARD BITS to
                  the right of the mantissa which "catch" bits shifted out
                  during denormalization and make them available for use in
                  renormalization

            Note: If the exponent would be reduced below the smallest
                  permissible value, the result is left in denormalized form.

         c. The result could be correct and normalized as it stands.

      5. Preparation for storage.

         a. If the number has been shifted right during renormalization, then
            a bit will have been shifted out, and will be caught by the
            guard bits.  Moreover, the guard bits may contain bits that were 
            shifted out during initial denormalization which are properly part
            of the infinite-precision result.

         b. IEEE 754 defines various ROUNDING MODES that control how to handle
            the guard bits:

            i. Round toward zero: the guard bits are discarded. (Also called
               truncation.)

            ii. Round to nearest: round the result to the nearest representable
                value - e.g. if the guard bits are 11, then add 1 to the least
                significant bit of the result.  Ties (guard bits exactly 10)
                are broken by rounding to the value whose least significant 
                bit is 0 - "round to nearest even", the IEEE default.

          iii. Round toward plus infinity: if the result is positive, and
               the guard bits are non-zero, add one to the least significant
               bit of the result - else discard the guard bits.

           iv. Round toward minus infinity: same, but round if result is
               negative.

          c. In any case, the hidden bit is removed prior to storing the 
             result.  
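
       Aside: a toy Python sketch of steps 1-4 above for two positive
       normalized single precision operands, each given as (true exponent,
       23-bit fraction field).  Guard bits and rounding are omitted, so this
       shows only the alignment and renormalization logic:

         def fp_add_positive(exp_a, frac_a, exp_b, frac_b):
             F = 23
             m_a = (1 << F) | frac_a           # step 1: reinsert hidden bits
             m_b = (1 << F) | frac_b
             if exp_a < exp_b:                 # step 2: denormalize the
                 m_a >>= exp_b - exp_a         # operand with the smaller
                 exp = exp_b                   # exponent
             else:
                 m_b >>= exp_a - exp_b
                 exp = exp_a
             m = m_a + m_b                     # step 3: the addition proper
             if m >> (F + 1):                  # step 4: carry out of the
                 m >>= 1                       # leftmost bit - shift right,
                 exp += 1                      # increase the exponent
             return exp, m & ((1 << F) - 1)    # step 5: strip the hidden bit

         # 1.0 * 2^0 + 1.0 * 2^-1 = 1.1 (binary) * 2^0 = 1.5
         print(fp_add_positive(0, 0, -1, 0))   # (0, 4194304) - frac = 100...0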

   E. Floating point division and multiplication are - relatively speaking -
      simpler than addition and subtraction.

      1. The basic rule for multiplication is

         a. Reinsert the hidden bit.

         b. Multiply the mantissas

         c. Add the exponents

          d. If necessary, normalize the product by shifting right and increase
             the exponent by 1.  (Note that if the mantissas are normalized, 
             they will lie in the range:         1 <= m < 2

             Therefore, the product of the mantissas will lie in the range:

                         1 <= m < 4

             So at most one right shift is needed.)

         e. Store the result less the hidden bit after appropriate rounding.

      2. The basic rule for division is

         a. Reinsert the hidden bit.

         b. Divide the mantissas

         c. Subtract the exponents

          d. If necessary, normalize the quotient by shifting left and decrease
             the exponent by 1.  (Note that if the mantissas are normalized, 
             they will lie in the range:         1 <= m < 2

             Therefore, the quotient of the mantissas will lie in the range

                         0.5 < m < 2.0

             So at most one left shift is needed.)

         e. Store the result less the hidden bit after appropriate rounding.
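
       Aside: the multiplication rule in toy Python form, for two positive
       normalized operands given as (true exponent, 23-bit fraction field);
       rounding is again omitted:

         def fp_mul_positive(exp_a, frac_a, exp_b, frac_b):
             F = 23
             m = ((1 << F) | frac_a) * ((1 << F) | frac_b)  # multiply mantissas
             exp = exp_a + exp_b                            # add exponents
             m >>= F                   # rescale: drop the extra F fraction bits
             if m >> (F + 1):          # product in [2,4): one right shift
                 m >>= 1
                 exp += 1
             return exp, m & ((1 << F) - 1)                 # strip hidden bit

         # 1.5 * 2^0  times  1.5 * 2^0  =  2.25  =  1.001 (binary) * 2^1
         print(fp_mul_positive(0, 1 << 22, 0, 1 << 22))     # (1, 1048576)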

   F. As can be seen, a floating point arithmetic unit needs to be able to
      add and subtract exponents, and to shift, add, and subtract mantissas.
      The latter can be done by using the same hardware as for the integer
      multiply/divide operations, or special, dedicated hardware.

VIII. Representing Characters, Sounds, and Graphics
----  ------------ ----------- ------- --- --------

   A. In principle, any information that can be represented as integers
      (unsigned or signed) can be represented in binary by converting the
      integer representation into binary.

   B. Real numbers can be represented as a pair of integers - a mantissa
      (perhaps with an implied binary point) and an exponent.

   C. We have previously seen how textual information can be represented
      by assigning integer codes to individual characters - either:

      1. ASCII: each character is assigned a 7 bit code in the range 0 .. 127
         (usually stored in an 8 bit byte)

      2. Unicode: each character is assigned a 16 bit code in the range
         0 .. 65535.

   D. Sounds

      1. Computers can store and reproduce sounds by storing digitized
         SAMPLES of the sound signal intensity.

         DEMO: Sample Editor

      2. For high quality, these samples must be taken tens of thousands of
         times per second.

          a. There is an important theorem, called the sampling theorem,
             that says that any sound can be accurately reproduced given
             samples taken at a rate at least twice the highest frequency
             present in it.

            - The human ear can hear pitches up to about 20 kHz.  CD 
              quality audio is based on 44,100 samples per second.
            - To conserve storage, computer systems often use 22,050 or
              11,025 samples per second.  This loses the upper end of the
              frequency spectrum, but is adequate for many purposes

         b. The precision with which the samples are stored is also important.

             - Music CD's use 16 bit samples, which gives a precision of one
               part in 65536, or about .0015 %
            - Many computer systems use 8 bit samples, which gives a
              precision of one part in 256, or about 0.4%

   E. Graphics

      1. Pictorial information is displayed by breaking the screen into
         individual dots, known as PIXELS.  The quality of the image is
         in part determined by the number of pixels per inch (often
         abbreviated dpi = dots per inch.)  This is called the RESOLUTION
         of the image.

         a. Computer monitors typically use a resolution of around 72 dpi.

          b. Typical laser printers use 300-600 dpi; some publication-quality
             printers go to 1200 dpi or more.

      2. For black and white graphics, each pixel can be represented by a
         single bit in memory.

      3. For gray scale graphics, each pixel can be represented by a
         small integer (often a single byte) representing a degree of
         lightness or darkness.  For example, using one byte per pixel:

         0 = black       255 = white    128 = medium gray

      4. For color graphics, each pixel is represented by three small 
         integers, representing the intensity of each of the three primary 
         colors (red, green, and blue) at that point.

         a. The most sophisticated systems currently available store 24 bits
            for each pixel - 8 for each color.  This allows for over
            16 million colors, ranging from black (all 0's - all colors
            totally off) to white (all 1's - all colors totally on.)

            Examples

             R          G               B

             11111111   00000000        00000000        Pure red
             11111111   11111111        00000000        Pure yellow
             10000000   10000000        11111111        Light blue

            DEMO: RGB Applet on course page

         b. To conserve storage, some systems store only 8 bits per pixel,
            where each possible value selects one of 256 predefined colors.

          c. Other systems store 16 bits per pixel, allowing a choice of one
             of 65536 predefined colors.
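
          Aside: the 24-bit scheme in (a) above amounts to packing three
          8-bit integers into one word.  A minimal Python sketch:

             def pack_rgb(r, g, b):
                 # red in bits 16-23, green in bits 8-15, blue in bits 0-7
                 return (r << 16) | (g << 8) | b

             print(hex(pack_rgb(255, 255, 0)))    # 0xffff00 - pure yellow
             print(hex(pack_rgb(128, 128, 255)))  # 0x8080ff - light blue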

   F. Movies

      1. Current television technology is based on redrawing the screen
         30 times per second.  Each such image is called a FRAME.

      2. Thus, video can be represented by a series of individual frames
         (using graphics representation) - 30 per second - plus
         an associated sound track.

      3. As you can imagine, the storage requirements for video information
         can be huge - e.g. even a 640 x 480 image at 3 bytes per pixel,
         redrawn 30 times per second, amounts to over 25 MB per second of
         uncompressed video.  The storage (and transmission time)
         requirements can be significantly reduced by various compression
         techniques we will not discuss here.

IX. Octal Numbers
--  ----- -------

   A. By now you are probably tired of writing 1's and 0's all the time.  
      Writing numbers in binary is tiring, and it is very easy to make mistakes.
      On the other hand, converting numbers between decimal and binary is 
      itself a painful process, so at the hardware level we like to work
      with the binary form.

   B. Consider, for a moment, the radix-8 (octal) number system.  Since there 
      are 8 different symbols in this system, octal numbers are about as easy 
      to write as decimal numbers.  Further, because 8 is a power of 2, it is 
      very easy to convert between binary and octal notations.

      1. Binary to octal: group binary number into groups of three bits, 
         starting from the right.  Each group will now represent a value in 
         the range 0 .. 7 - i.e. an octal digit.

         Example:       11 000 111      --> 307_8

         Exercise:      10 101 100      (254)

      2. Octal to binary: convert each digit to three bits:

         Example:       146 --> 001 100 110

         Exercise:      321     (011 010 001)

   C. In a sense, then, octal becomes a form of shorthand for binary.  Any
      given bit can be quickly recovered from the octal representation.

      Example:  What is bit 4 of the number represented by 246_8 ?

                Observe: octal digit 0 is bits 0..2; digit 1 is bits 3..5.
                So bit 4 is the middle bit of the middle digit - i.e. the
                middle bit of 4 - i.e. the middle bit of 100 - i.e. 0.

      Exercise: Bit 5 of 172?   (1)

X. Hexadecimal numbers
-  ----------- -------

   A. Another system often used as a shorthand for binary is hexadecimal - base
      16.  The hex digits are written 0 .. 9, A, B, C, D, E, F

        Example:        A13E --> 1010 0001 0011 1110

        Exercise:       4FB7     (0100 1111 1011 0111)

        Exercise:       1100 0101 1111 0100     (C5F4)
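
      Aside: the grouping procedure behind both shorthands, in a minimal
      Python sketch (3 bits per group for octal, 4 for hexadecimal; the
      function name is illustrative only):

         def binary_to_base(bits, group):
             # Pad on the left so the length is a multiple of the group
             # size, then convert each group of bits to one digit.
             bits = bits.zfill(-(-len(bits) // group) * group)
             digits = (bits[i:i+group] for i in range(0, len(bits), group))
             return ''.join(format(int(d, 2), 'X') for d in digits)

         print(binary_to_base('11000111', 3))          # 307  (octal)
         print(binary_to_base('1010000100111110', 4))  # A13E (hex)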

   B. Historically, the instruction set architecture of the DEC PDP-11
      minicomputer was such that using the octal shorthand was the more
      natural notation.  (Octal digit boundaries fell at field boundaries
      within the instruction.)  As a result, octal notation found its way
      into the C programming language and its descendants.  However, for
      most modern architectures (including MIPS) the documentation convention
      is to use hexadecimal, rather than octal, as a shorthand for binary.
      Therefore, hexadecimal is the main shorthand we will use in this course.

   C. Always remember, though, that octal and hexadecimal are simply
      shorthands for making the writing and reading of binary numbers
      easier.  The internal representation is always binary.

Copyright ©2003 - Russell C. Bjork