Traditional programming languages are oriented toward sequential programming, since the assumption is that S1; S2 means that S1 is done before S2. To write programs that use concurrency, we must either:
1. Have programming language constructs that allow the programmer to specify portions of the program which can be done in parallel (processor resources permitting), or
2. Have compilers that automatically detect potential parallelism (using rules like those above) and generate appropriate code.
We focus on alternative (1) here.
One of the earliest proposed programming primitives for concurrency was fork .. join. This primitive has the advantage of being able to specify any possible parallel computation.
FORK label
This calls for the creation of a new process, with execution of the new process beginning at label. The original process continues with the next statement.
JOIN count
This must be executed by the specified number of processes. All but the last terminate when they reach the JOIN; the last continues execution with the next statement.
Example: the above program can be written:

        b := d;
        FORK L1;
        a := b + c;
        GOTO L2;
L1:     d := e + f;
        FORK L3;
L2:     JOIN 2;
        e := a;
        GOTO L4;
L3:     f := b;
L4:     JOIN 2;
        g := e + f;
However, as this example shows, FORK and JOIN lead to spaghetti code that is far from transparent. Unfortunately, FORK/JOIN is the only primitive that allows any possible precedence graph to be realized; nevertheless, the same principle holds here as for structured sequential programming: what is gained in readability and maintainability by using more structured constructs far exceeds any loss of generality.
No major programming language includes fork and join as built-in features of the language.
However, the basic concurrency primitive of Unix is a system call called fork() that is similar to, but not the same as, what we have just discussed.
Fork() is a Unix system service that creates a copy of the process that executes it. Following a successful fork, there are two almost-identical processes on the system.
Both processes share the same code.
The new process gets a copy of the creator's data - but the data is not shared (each process has its own copy).
The only difference between the two processes is this: the parent and newly-created child get back a different value from the fork system service - the parent receives the process id of the child, while the child receives zero. Thus, each knows who it is.
There is no Unix service that is analogous to join. Each forked process goes on and runs until it terminates. (However, there is a mechanism whereby a parent can wait until its child process(es) die(s).)
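As a concrete illustration, here is a minimal C sketch of this pattern (assuming a POSIX system; the printed messages are illustrative only):

    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void)
    {
        pid_t pid = fork();            /* create a near-identical copy of this process */

        if (pid < 0) {                 /* fork failed; no child was created */
            perror("fork");
            exit(1);
        } else if (pid == 0) {         /* child: fork() returned 0 */
            printf("child: my pid is %d\n", (int) getpid());
        } else {                       /* parent: fork() returned the child's pid */
            waitpid(pid, NULL, 0);     /* wait until the child terminates */
            printf("parent: child %d has terminated\n", (int) pid);
        }
        return 0;
    }

Here waitpid() is the "parent can wait" mechanism just mentioned; note that, unlike JOIN, it only lets a parent wait for its own children.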
Dijkstra's primitive: PARBEGIN .. PAREND, also known as COBEGIN .. COEND:
PARBEGIN       or:    COBEGIN
   S1;                   S1;
   S2;                   S2;
   ...                   ...
   Sn                    Sn
PAREND;               COEND;
The compiler may (but is not required to) generate code which causes each of the Si to be done in parallel. Only when all of the statements terminate does control continue past the {PAR,CO}END.
The code segment above could produce:

        FORK L2;
        FORK L3;
        ...
        FORK Ln;
        S1;
        GOTO L9999;
L2:     S2;
        GOTO L9999;
L3:     S3;
        GOTO L9999;
        ...
Ln:     Sn;
L9999:  JOIN n;
The reverse is not true. PARBEGIN and PAREND cannot be used to express some precedence graphs. In particular, our first example can only be partially realized using PARBEGIN and PAREND:
S1;                -- this does not allow S2 and S5 to be done in
PARBEGIN           -- parallel, though the precedence graph permits it
   S2;
   S3
PAREND;
PARBEGIN
   S4;
   S5
PAREND;
S6;
No major programming language includes PARBEGIN and PAREND as a built-in feature, though some experimental languages have done so.
Tasking. Several programming languages - notably PL/I and Ada - include the concept of a task. A task looks syntactically like a (parameterless) procedure, but if a program consists of a number of tasks, then each task is able to execute concurrently with all the others.
A concurrent program consists entirely of a set of tasks - there is no separate main program, as there was in the previous two schemes, where non-concurrent code preceded and followed the concurrency primitives.
Each task making up a program is automatically started when the program is started.
The program terminates when all tasks comprising it have terminated.
Note that this is a fairly coarse-grained concurrency; there is no way to specify that two single statements can be done in parallel without making a task out of each - a lot of overhead. However, tasking is more than adequate for many problems calling for concurrency.
For example, problems like the bounded-buffer problem can easily be implemented this way. Later, we will look at an example of this, written in Ada.
Java Threads. Java is a true multi-threaded language. It is difficult to do any substantial GUI-based work in Java without using threads. In many cases, threads are used to avoid what would otherwise be a more complicated interrupt scheme, and little inter-thread communication is required. However, it is possible to have threads interact with each other in arbitrarily complex ways.
Threads share the same code.
Threads share the same data.
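Java syntax is not shown in these notes; as a stand-in, the following C sketch uses POSIX threads, which have the same two properties: both threads execute the same function, and both see the single shared variable counter (a mutex is used so the concurrent increments do not interfere):

    #include <pthread.h>
    #include <stdio.h>

    int counter = 0;                          /* one variable, shared by all threads */
    pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

    void *work(void *arg)                     /* both threads execute this same code */
    {
        (void) arg;                           /* unused */
        for (int i = 0; i < 100000; i++) {
            pthread_mutex_lock(&lock);        /* updates to shared data must be */
            counter++;                        /* coordinated; here a mutex      */
            pthread_mutex_unlock(&lock);      /* serializes the increments      */
        }
        return NULL;
    }

    int main(void)
    {
        pthread_t t1, t2;
        pthread_create(&t1, NULL, work, NULL);
        pthread_create(&t2, NULL, work, NULL);
        pthread_join(t1, NULL);               /* wait for both threads to finish */
        pthread_join(t2, NULL);
        printf("counter = %d\n", counter);    /* prints 200000: both threads saw */
        return 0;                             /* the same counter, not copies    */
    }

Contrast this with fork(): after a fork, each process updates its own copy of counter, while threads genuinely share one variable.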
Note that all of these constructs simply indicate to the compiler that certain statements may be done in parallel.
On a uniprocessor system, the compiler may generate totally non-parallel code - example:
    PARBEGIN S1; S2; S3 PAREND

may be compiled as

    BEGIN S1; S2; S3 END

or - equally correctly -

    BEGIN S2; S3; S1 END
More rarely, the compiler may generate code that spawns a set of processes that are multiprogrammed by the operating system. The problem here is that the operating system's provisions for sharing of data (e.g. global variables) between separate processes may not be as rich as those implicit in the language.
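For instance, PARBEGIN S1; S2; S3 PAREND might plausibly be compiled into Unix fork()/wait() calls along the following lines (a sketch only; s1(), s2(), s3() are hypothetical stand-ins for the statements). Note that each child works on its own copy of the parent's data, so any values the Si compute are lost to the parent unless some interprocess-communication facility is used - precisely the limitation just described.

    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/wait.h>
    #include <unistd.h>

    /* hypothetical stand-ins for the statements S1, S2, S3 */
    static void s1(void) { puts("S1"); }
    static void s2(void) { puts("S2"); }
    static void s3(void) { puts("S3"); }

    int main(void)
    {
        void (*stmt[])(void) = { s1, s2, s3 };
        const int n = 3;

        /* "PARBEGIN": spawn one child process per statement */
        for (int i = 0; i < n; i++) {
            pid_t pid = fork();
            if (pid < 0) { perror("fork"); exit(1); }
            if (pid == 0) {           /* child: execute one statement... */
                stmt[i]();
                _exit(0);             /* ...then terminate */
            }
        }

        /* "PAREND": continue only after all n children have terminated */
        for (int i = 0; i < n; i++)
            wait(NULL);
        return 0;
    }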
On a multiprocessor system, truly parallel code can be produced, though the number of available processors limits how much of the potential parallelism can actually be exploited.
Note that we have now established a distinction between a program and the process(es) executing it. This is terribly important.
The execution of a traditional sequential program involves a single process.
The execution of a concurrent program involves multiple processes.
FORK causes a copy of the current process to be made, sharing the same code and data but having its own program counter (state). JOIN terminates all but the last process to arrive.
PARBEGIN causes the creation of one process per parallel statement, each of which accesses shared code and global data but has its own program counter and possibly local data. Each process terminates when it finishes executing its statement. The parent process waits until all the child processes have terminated, and then proceeds from the PAREND.
In a program using tasking, each task constitutes a separate process.
These notes were written by Prof. R. Bjork of Gordon College. In February 1998 they were edited and converted into HTML by J. Senning of Gordon College.