CMPSC 311, Introduction to Systems Programming

Process Control and Signals, Projects 6 and 7



This sequence of two projects shows how to create and test (6) a child process and signal handlers, and (7) a simple command interpreter, or shell.  The first project asks you to manage (in a simple way) the coordination between two processes.  The second project extends the first with interactive features and multiple processes.  The first project is individual and will take one week, while the second is for two people and will take two weeks.  We will use the description of Project 6 for the in-class discussion of Process Control and Signals on Mar. 13, 15 and 18; Project 6 is then due before class on Mar. 22.  Note that the second midterm exam will be given during the time you are working on Project 7, and a solution to Project 6 will be posted before the in-class exam review on Mar. 27.

Background information for the projects, with a complete example program, is provided.  You should work through the background information first; we covered some of this in class on Feb. 1.  A starting point for Project 6 is also provided here, which uses some of the background information.

A solution to Project 6 will be posted soon after the due date, so that you can check your work, review for the exam, and proceed to the next project even if not completely successful.  Please be sure to turn in your work before the solution is posted, even if it is incomplete.  Late submissions will not be accepted.

Here is what you should turn in for these projects:

  1. a printed copy of the source code, with comments, your name, and the date;
  2. a printed copy of the output from the program (if it compiles and runs successfully);
  3. a printed copy of the error messages from the compiler or runtime system (otherwise);
  4. any additional write-up required for the project;
  5. a brief statement of how you allocated your time working on the project (planning, reading the manuals, coding, debugging, cursing the prof, etc.).
  6. An electronic version of your program should be submitted through ANGEL.  Specific instructions will be included with each project.  Be sure to attach all parts of your program's source code.  Do not attach an executable file.  The dropbox will remain open until 2 pm on the project due date for Project 6, and 11 pm for Project 7.

You can run your examples on Solaris, Linux or Mac OS X, but please specify which system you used, and when.  The test cases to demonstrate the output are your choice; 5 or 6 should be enough.  There are some examples at the end of this description.  The time allocation statement should be like "1 hr planning, ...", and not like "1% planning, ..."; there's no reason to be completely precise about it, but at least try to be honest.

The notation <checkpoint> indicates a place in the description where you should have a version of the program that compiles correctly without complaint, and does some limited action correctly.  You may need to rewrite code between the checkpoints.  Only the final version needs to be turned in.  Code that was provided and remains unchanged does not need to be turned in, but you should indicate this somewhere.

Some parts of the project descriptions review topics discussed earlier in the course, when they might not have seemed so important, and in case they didn't sink in the first time.



CMPSC 311, Project 6

Posted Mar. 12, 2013.  Due Friday, Mar. 22, 2013, 2 pm (electronic version, to ANGEL), and in class (paper version).  A solution will be posted on ANGEL at 2 pm on Mar. 22, so no late projects will be accepted after that time.  25 points.

Reading (review).

Reading (new, processes).

Reading (new, signals).

Reading (new, further concepts).

That's a lot of reading.  The rest of this project description provides a quick introduction to these topics, hopefully enough to get you through the project.  All of this material will be needed for Project 7.



The objective of the project is to learn about processes and signals.  Specifically, it is to write a Unix program pr6 in C.  The components of pr6 will be useful in the next project; they demonstrate some of the principles underlying processes, concurrency, interrupt handlers, reentrant functions, and scheduling.  Signals in Unix are a software abstraction of interrupts at the hardware level.

These man pages on Solaris will also be useful:

Note that the manual section numbers are sometimes different on Linux and Mac OS X.  The most common change is from Section 3C to simply Section 3.  Use the -a option with the man command to see everything, independent of the system.



Suggestions on the development of the program.

The project description is written as a series of incremental steps leading to the following program structure:

  1. getopt() loop, to read the command-line arguments
  2. check the values set during the getopt() loop
  3. print initial information as seen in the options and (perhaps) the default values
  4. install signal handlers
  5. initialize process table
  6. in a loop, fork some child processes
    1. child: sleep() with alarm(), then exit()
    2. parent: update process table
  7. now it is just the parent
  8. sleep() with alarm()
  9. in a second loop, wait for the child processes to terminate, update process table as this happens
  10. print final information

Compare this to CS:APP Fig. 8.17, 8.18, 8.31, 8.32, 8.33 and 8.35, for the basic structure.  Fig. 8.29 shows a related but different usage of alarm().

To build the program in stages, you could install and test the components in this order:
  1. main(), getopt() loop
  2. usage()
  3. Ctime(), print_msg(), ...
  4. signal handlers
  5. in main(), alarm() and sleep()
  6. in main(), fork() once
  7. in the child, alarm() and sleep(), exit()
  8. in main(), wait_child(), wait_any_child()
  9. in main(), wait for child
  10. in main(), put loops around fork() and wait_any_child()
  11. process table

These source files are linked in the description:

pr6.1.c       pr6.2.c        pr6.3.c
pr6_ctime.h   pr6_ctime.c
pr6_signal.h  pr6_signal.c
pr6_wait.h    pr6_wait.c
pr6_table.h   pr6_table.c
Makefile      Makefile-c99   Makefile-gcc   Makefile-lnx   Makefile-mac

Over half the code is provided directly from this description.  Some of the code is directly useful, and some of it consists of examples that are intended to be discarded or replaced after you understand how the example works.

Be sure to use the compiler commands "c99 -v ..." or "gcc -std=c99 -Wall -Wextra ...", and to recompile the program often as you work on it.



The -h option with pr6 should print (at least)

  Usage:  pr6 [-h] [-v] [-a n] [-b n] [-c n] [-f n] [-s n] [-t n] [-x n]

The options should work as follows:
 
  -h      print a help message
  -v      enable verbose mode (extra output)
  -a n    child alarm time interval, default 0
  -b n    parent alarm time interval, default 0
          The alarms repeat at regular intervals.  The default value 0 means that the alarm feature is not to be used.
  -c n    fork() n child processes, default 0
          You can force a maximum value for n, but the program should at least allow n between 0 and 8.
  -f n    fflush() before fork(), 0 = no, 1 = yes, default 1
          This is described later.  It is an "extra feature" that allows some experimentation.
  -s n    child sleep time, default 0
  -t n    parent sleep time, default 0
          The default value 0 means that the process does not sleep.
  -x n    child exit status n, default 0
          The default value is EXIT_SUCCESS, which is 0.

The options -a, -s and -x apply to each child process.
The option -v applies to the parent and child processes.
The remaining options apply only to the parent process.
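
As a rough illustration of the getopt() loop for these options, here is a minimal sketch.  It is not the code in pr6.1.c: the struct pr6_options and the function parse_options() are hypothetical names chosen for this example, and atoi() is used only for brevity (it does not detect malformed numbers).  Checking the values afterward, such as the allowed range for -c, is left out.

```c
/* Ask for the POSIX declarations (getopt) even when compiling as strict C99. */
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical container for the option values; not part of pr6.1.c. */
struct pr6_options {
  int help;                 /* -h */
  int verbose;              /* -v */
  int child_alarm;          /* -a n */
  int parent_alarm;         /* -b n */
  int children;             /* -c n */
  int flush_before_fork;    /* -f n */
  int child_sleep;          /* -s n */
  int parent_sleep;         /* -t n */
  int child_exit_status;    /* -x n */
};

/* Fill *opt from the command line, starting from the documented defaults.
 * Return 0 on success, -1 on an unknown option or a missing argument.
 */
int parse_options(int argc, char *argv[], struct pr6_options *opt)
{
  int ch;
  struct pr6_options defaults = { 0, 0, 0, 0, 0, 1, 0, 0, EXIT_SUCCESS };

  *opt = defaults;

  /* The leading ':' makes getopt() return ':' for a missing argument
   * instead of printing its own message.
   */
  while ((ch = getopt(argc, argv, ":hva:b:c:f:s:t:x:")) != -1)
    {
      switch (ch)
        {
        case 'h': opt->help = 1; break;
        case 'v': opt->verbose = 1; break;
        case 'a': opt->child_alarm = atoi(optarg); break;
        case 'b': opt->parent_alarm = atoi(optarg); break;
        case 'c': opt->children = atoi(optarg); break;
        case 'f': opt->flush_before_fork = atoi(optarg); break;
        case 's': opt->child_sleep = atoi(optarg); break;
        case 't': opt->parent_sleep = atoi(optarg); break;
        case 'x': opt->child_exit_status = atoi(optarg); break;
        default:  /* '?' for an unknown option, ':' for a missing argument */
          return -1;
        }
    }

  return 0;
}
```

A main() function would call parse_options(argc, argv, &opt) once, print the usage line if it returns -1 or if opt.help is set, and then go on to check the values.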

The starting point of the project is in the file pr6.1.c, which you should save and test.  Compile it with one of these commands, using the Sun or GNU compilers, and C89 or C99.  The -D option with C99 clears up a warning from the compilers on Solaris about getopt().

cc -v -o pr6 pr6.1.c
gcc -Wall -Wextra -o pr6 pr6.1.c

c99 -v -D_POSIX_C_SOURCE=200112L -o pr6 pr6.1.c
gcc -std=c99 -Wall -Wextra -D_POSIX_C_SOURCE=200112L -o pr6 pr6.1.c

On Linux, you can use -D_POSIX_C_SOURCE=200809L instead.  On Mac OS X, just omit it.

There will be more code provided later in the project description.

<checkpoint>



The idea of this project is to generate several processes, which will all send output to the same place, the terminal or a file.  We need to start with a simple mechanism to help identify which process has printed a message, and when.  The initial testing will be with signal handlers for only one process.

You should now pick up and read the code in pr6_ctime.h and pr6_ctime.c.  Note that the file names have an embedded underscore _ not a space; web browsers that underline links make this difficult to see.  The functions defined here are

char * Ctime(char buf[26]);

void print_msg(char *msg);
void print_msg_1(char *msg, int n);
void print_msg_2(char *msg, int n1, int n2);
void print_msg_error(char *msg, char *errmsg);
void print_msg_abort(char *msg);

Here is an example of using the print_msg() function,

print_msg("Hello, world");

 18173: Tue Mar 12 13:43:45 2013 Hello, world

If the child process and the parent process want to print the same text, print_msg() makes it possible to see which process caused the output, and at what time.  If more than two processes are involved, then having the process ID and time attached to an error message or diagnostic output makes it easier to see what's going on.  Note that printed output is typically buffered, and you might see an earlier message from one process appear after a later message from a different process.  This can be disconcerting, but it isn't wrong.  If you sort the output by process number, or by the time, then everything looks right.
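
To make the idea concrete, here is a minimal sketch of how a line in that format could be assembled.  It is not the code in pr6_ctime.c; the function format_stamped_msg() is a hypothetical name, and the sketch assumes the two-argument POSIX version of ctime_r().

```c
/* Ask for the POSIX declarations (ctime_r) even when compiling as strict C99. */
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif

#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: write "pid: date-and-time msg" into out[0..n-1].
 * Return the number of characters written, or -1 on failure
 * (including the case where the buffer is too small).
 */
int format_stamped_msg(char *out, size_t n, const char *msg)
{
  char tbuf[26];                      /* ctime_r() needs at least 26 bytes */
  time_t now = time(NULL);
  int len;

  if (now == (time_t)(-1) || ctime_r(&now, tbuf) == NULL)
    { return -1; }

  tbuf[strcspn(tbuf, "\n")] = '\0';   /* drop the newline ctime_r() appends */

  len = snprintf(out, n, "%6ld: %s %s", (long) getpid(), tbuf, msg);

  if (len < 0 || (size_t) len >= n)
    { return -1; }

  return len;
}
```

Building the whole line first and printing it with a single call keeps the process ID, the time, and the message together in the output, which is the point of the time-stamp.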

These functions are built using ctime(3C) and getpid(2).  getpid(2) returns the process identifier of the current process.  getppid(2) returns the parent process's identifier.  The type of a process identifier is pid_t, an integer type defined in the header <sys/types.h>.

The ps(1) command can be used to see the identifiers of all processes currently running, but that's the wrong way to obtain process numbers for use in a program.  It might be a good idea to run ps occasionally (or top continuously) if you think your program has a bug, to be sure you are not accumulating a large number of processes.

Later, when studying thread programming, we will need the reentrant function Ctime().  It is useful here because it allows a simple method of giving a time-stamp to a line of output.  You should understand why ctime(3C) is not reentrant, and why ctime_r(3C) is necessary.  It will be harder to understand why there are two versions of ctime_r().  (It's because Sun implemented one proposed version before a second version was chosen by the POSIX committee to be the standard version.  The old version remains so that old code doesn't need to be rewritten.  No one ever said this business of programming makes complete sense.  More information can be found in /usr/include/time.h, but it's not easy reading and not really recommended at this point.)

The lines #ifdef ... #else ... #endif are used by the C preprocessor to select which part of the code is actually given to the compiler.  If you want to test both versions in pr6_ctime.c, the Sun compiler command now is one of

cc -v -D_POSIX_PTHREAD_SEMANTICS -o pr6 pr6.1.c pr6_ctime.c
cc -v -o pr6 pr6.1.c pr6_ctime.c

and the same -D option can be used with c99 and gcc (use gcc on Linux and Mac OS X).  The -D option acts as if there was a corresponding #define before the first line of the source code.  Using the POSIX version is generally preferred.

This would be a good time to start thinking about writing a shell script or Makefile to keep track of the various compiler commands, so you can avoid retyping them and making mistakes.  The Makefile provided later is an example, using some features of make not previously discussed.

<checkpoint>



In general, if a Unix system function fails, it returns -1 or 0 or NULL instead of a proper value.  In this case, the external integer variable errno is set to a value that indicates the nature of the failure.  You should check the return value of any system function, and act on an error in some way that makes sense for the program.

The possible values for the failure indicator and for errno are given in the man page for the system function.  The particular values of errno that are interesting are given symbolically (for example, ECHILD), and the values can be converted to character strings with strerror(3C).

/* for errno, strerror(3C) */
#include <errno.h>
#include <string.h>

Some uses of errno and strerror() associated with the system functions signal(), sigaction(), fork(), wait() and waitpid() are shown later.  Note that we checked the return values of time() and ctime_r() in Ctime().  printf() hardly ever fails, so most programmers don't check its return value.  Some programmer tools, such as lint(1) on Solaris, can check to see if you have ignored the return value, but this is not entirely foolproof.
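
As a small illustration of the pattern, here is a sketch that checks the return value of close(2) and reports a failure with strerror().  The function name checked_close() is hypothetical and is not part of the project files; close() was chosen only because it is easy to make it fail.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical wrapper: close fd and report any failure.
 * Return 0 on success, or the errno value that close() set.
 */
int checked_close(int fd)
{
  if (close(fd) == -1)
    {
      int saved = errno;   /* save it first: later library calls may change errno */

      fprintf(stderr, "close(%d) failed: %s\n", fd, strerror(saved));
      return saved;
    }

  return 0;
}
```

Calling checked_close(-1) fails with EBADF, since -1 is never a valid file descriptor.  Note that errno is copied before fprintf() is called, because errno is only meaningful immediately after the function that failed.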



A signal handler has the general form

void sighandler(int sig)
{
  /* This is just a sample */
  print_msg_1("sighandler, signal number", sig);
}

The parameter is a signal number, supplied as part of generating and catching the signal.  The complete list of signal numbers and their symbolic values is given in the header signal.h and the man page signal.h(3HEAD); partial lists are in CS:APP Fig. 8.25, or APUE Sec. 10.2.

Of course, what the signal handler actually should do depends on the requirements of the program.  In the simplest case, it could just print a message giving the signal number, and let the program continue.  In general, printing is not required, and it is often undesirable.  You could also call exit() from within the handler, which is appropriate in some cases but not here.  In this project's final version you will only need handlers for the signals SIGINT, SIGCHLD and SIGALRM.  The handler body print_msg_1() is sufficient for now, but you should add to it later.

You should now pick up and read the code in pr6_signal.h and pr6_signal.c.  Compare this to CS:APP Fig. 8.34.

To install a signal handler, use

#include "pr6_signal.h"

and then just call the installer like this:

  install_signal_handler(SIGALRM, sighandler);

The reason the code in pr6_signal.c is so ugly (with the #ifdef's) is that the simpler code using signal(3C) is an older style, while the sigaction(2) function is a more modern version giving greater control if you want to exert it.  In this case, you do want to use sigaction(), for reasons explained later, but we are allowing for experiments.

In the example, when the process receives an alarm signal, the function sighandler() will be called.  (SIGALRM really is spelled that way, and not SIGALARM.)  Some additional code would be needed if you want to save the previously-installed signal handler so it could be restored later; you do not need to do that here.  The code to install the signal handler should go into main() before the call to fork(), which ensures that the parent and child will use the same handlers.
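
For comparison while reading pr6_signal.c, here is a minimal sketch of an installer built on sigaction(2).  It is not the provided code: install_handler(), note_signal() and last_signal are hypothetical names for this example.  The handler only records the signal number, which is one of the few things that is always safe to do inside a handler.

```c
/* Ask for the POSIX declarations (sigaction, SA_RESTART) even in strict C99. */
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif

#include <signal.h>
#include <string.h>

/* Written by the handler; sig_atomic_t can be safely assigned in a handler. */
volatile sig_atomic_t last_signal = 0;

void note_signal(int sig)
{
  last_signal = sig;
}

/* Hypothetical installer (not the code in pr6_signal.c).
 * Return 0 on success, or -1 on failure with errno set by sigaction().
 */
int install_handler(int sig, void (*handler)(int))
{
  struct sigaction act;

  memset(&act, 0, sizeof act);
  act.sa_handler = handler;
  sigemptyset(&act.sa_mask);     /* block no additional signals in the handler */
  act.sa_flags = SA_RESTART;     /* ask for interrupted system calls to restart */

  return sigaction(sig, &act, NULL);
}
```

After install_handler(SIGUSR1, note_signal) succeeds, a call to raise(SIGUSR1) runs the handler before raise() returns, so last_signal then holds SIGUSR1.  An attempt to install a handler for SIGKILL returns -1.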

We wrote a simple program sigs.c to see what the different versions of Unix actually use for their signal numbers; sigs.c uses the non-standard value NSIG and the non-standard function strsignal().  There is output for Solaris 9, Solaris 10, Linux (2.6 kernel) and Mac OS X.  The lesson of this test is, always use the symbolic name of a signal, because the numeric values differ widely.  The symbolic names valid for a particular system are in the man page for signal(), or through the system command "kill -l" to generate the variable part of the signal names.  There is output from the latter for the same systems.

To test the signal handler installation, use the program pr6.2.c.  This will simply try to install your own signal handler for every available signal.  Note that some signal handlers, such as those responsible for killing the program, cannot be replaced.  The system function raise(3) will send signal i to its own process, and you should see one line of output from the handler.

Here is what you should do to improve your understanding of signals:

  1. Copy the program pr6.2.c, comment out the call to signal_test(), and compile the program with one of
        cc -v -D_POSIX_PTHREAD_SEMANTICS -o pr6 pr6.2.c pr6_ctime.c pr6_signal.c
        gcc -Wall -Wextra -D_POSIX_PTHREAD_SEMANTICS -o pr6 pr6.2.c pr6_ctime.c pr6_signal.c
  2. Explain any problems reported by the compiler and fix them.
  3. Run the program (no command-line options required) and verify that some signals cannot have their handlers replaced.
  4. Uncomment the call to signal_test(), recompile and rerun the program.
  5. Run the program again, and type control-C while it is running.  Normally the terminal driver forwards control-C to the foreground process as a SIGINT signal, and the default handler for SIGINT will terminate the program.  Here it should print one line of output and continue.  Try control-Z as well; normally this will temporarily stop the process, and you would restart it using the jobs and fg commands, or terminate the process with the kill command.  We'll have more to say about foreground and background processes in the next project.
  6. In signal_test(), replace the if test so it is always true, then recompile and rerun.  It should stop running when you reach the SIGKILL signal, whose handler could not be replaced.
  7. Explain (to yourself, not to hand in) the output you have seen.  Pay attention to the times!  [For example, why did the program not sleep for 60 seconds at the end?]

You should also try the original pr6.2.c without the call to signal_setup().

Again, note that some of the code in pr6.2.c is for testing and understanding, and will need to be removed in the final version of the program.

<checkpoint>



An alarm is an event that occurs in the future, at which time an alarm signal is generated for the process.  The function alarm(2) sets one alarm time, some number of seconds in the future.  For example,

  alarm(10);

for ten seconds ahead, removing any previous alarm setting.  (Remember, the 2 in alarm(2) refers to section 2 of the man pages.)  If the -a or -b options of pr6 are used with a non-zero time (assumed to be an integer number of seconds), then you can set an alarm with something like

/* for alarm(2) */
#include <unistd.h>

  if (alarm_time_interval > 0)
    { alarm(alarm_time_interval); }

After the alarm signal is delivered, the alarm can be reset with the same time interval.

The reason to prefer sigaction() over signal() is so that the signal handler you installed will stay installed.  On some Unix systems, including Solaris but not Linux or Mac OS X, signal() can have the effect of a one-shot installation, so that in this case, if the process receives more than one signal of the same type, after the first signal is caught then the default signal handler is reinstalled.  This is usually not what you want to happen.  Here is an example assuming you used signal() for the installation.  (In pr6_signal.c, that would be the case if you compile with the -DPR6_USE_SIGNAL option.)

void sighandler_test_case(int sig)
{ /* point A */  signal(sig, sighandler_test_case); }

Which signal handler is installed when the process arrives at /* point A */?  It will be the default handler.  If you don't reinstall the signal handler, the default remains in effect.  This may lead to a race condition or undesired behavior.
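
If your system's signal() does not behave as a one-shot installation, one way to observe the same effect is the SA_RESETHAND flag of sigaction(), which resets the handler to the default when the handler is entered.  The sketch below is an experiment outside the project files; install_one_shot(), one_shot_handler() and has_default_handler() are hypothetical names.

```c
/* Ask for the POSIX declarations (sigaction, SA_RESETHAND) even in strict C99. */
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif

#include <signal.h>
#include <string.h>

/* Counts how many times the handler has run. */
volatile sig_atomic_t one_shot_calls = 0;

void one_shot_handler(int sig)
{
  (void) sig;            /* the signal number is not needed here */
  one_shot_calls++;
}

/* Install handler so that it is removed when it is first used.
 * Return 0 on success, -1 on failure.
 */
int install_one_shot(int sig, void (*handler)(int))
{
  struct sigaction act;

  memset(&act, 0, sizeof act);
  act.sa_handler = handler;
  sigemptyset(&act.sa_mask);
  act.sa_flags = SA_RESETHAND;   /* back to SIG_DFL on entry to the handler */

  return sigaction(sig, &act, NULL);
}

/* Return 1 if sig currently has the default handler, 0 if not, -1 on error. */
int has_default_handler(int sig)
{
  struct sigaction cur;

  if (sigaction(sig, NULL, &cur) == -1)   /* query only, change nothing */
    { return -1; }

  return cur.sa_handler == SIG_DFL;
}
```

After install_one_shot(SIGUSR1, one_shot_handler) and one raise(SIGUSR1), the handler has run once and has_default_handler(SIGUSR1) reports 1.  A second SIGUSR1 would then take the default action, which terminates the process.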

Here is an example on Solaris.  The commands entered follow the % prompt.  We typed control-C twice after the first "signal 2" output appeared, and omitted some of the output for convenience.

% cc -v -D_POSIX_PTHREAD_SEMANTICS -o pr6 pr6.2.c pr6_ctime.c pr6_signal.c
pr6.2.c:
pr6_ctime.c:
pr6_signal.c:

% pr6
CMPSC 311 Project 6, version 2
install_signal_handler(0) failed: Invalid argument
install_signal_handler(9) failed: Invalid argument
install_signal_handler(23) failed: Invalid argument
 18222: Tue Mar 12 13:50:57 2013 generic_signal_handler, signal 1
 18222: Tue Mar 12 13:50:58 2013 generic_signal_handler, signal 2
 18222: Tue Mar 12 13:50:59 2013 generic_signal_handler, signal 3
 18222: Tue Mar 12 13:51:00 2013 generic_signal_handler, signal 4
^C 18222: Tue Mar 12 13:51:00 2013 generic_signal_handler, signal 2
 18222: Tue Mar 12 13:51:00 2013 generic_signal_handler, signal 5
 18222: Tue Mar 12 13:51:01 2013 generic_signal_handler, signal 6
 18222: Tue Mar 12 13:51:02 2013 generic_signal_handler, signal 7
^C 18222: Tue Mar 12 13:51:03 2013 generic_signal_handler, signal 2
 18222: Tue Mar 12 13:51:03 2013 generic_signal_handler, signal 8
 18222: Tue Mar 12 13:51:04 2013 generic_signal_handler, signal 10
 18222: Tue Mar 12 13:51:05 2013 generic_signal_handler, signal 11
 ...
 18222: Tue Mar 12 13:51:40 2013 generic_signal_handler, signal 47
 18222: Tue Mar 12 13:51:41 2013 generic_signal_handler, signal 48
 18222: Tue Mar 12 13:51:44 2013 generic_signal_handler, signal 14
 18222: Tue Mar 12 13:51:44 2013 all done!


% cc -v -D_POSIX_PTHREAD_SEMANTICS -DPR6_USE_SIGNAL -o pr6 pr6.2.c pr6_ctime.c pr6_signal.c
pr6.2.c:
pr6_ctime.c:
pr6_signal.c:

% pr6
CMPSC 311 Project 6, version 2
install_signal_handler(0) failed: Invalid argument
install_signal_handler(9) failed: Invalid argument
install_signal_handler(23) failed: Invalid argument
 18245: Tue Mar 12 13:52:40 2013 generic_signal_handler, signal 1
 18245: Tue Mar 12 13:52:41 2013 generic_signal_handler, signal 2
 18245: Tue Mar 12 13:52:42 2013 generic_signal_handler, signal 3
 18245: Tue Mar 12 13:52:43 2013 generic_signal_handler, signal 4
^C

%

Notice that the first SIGINT signal was sent by the process to itself with raise(), and the second was sent to the process by the terminal driver in response to the control-C.  The second used the default handler, which terminated the process.

By now you may have noticed that the default handler for SIGALRM will print "Alarm clock" and terminate the process.  If not, just let the second-compiled pr6 run to completion.

If you tried running this example on Linux, and got output that begins with

Using built-in specs.
Target: x86_64-redhat-linux
...

then you forgot to change the compile command from "cc -v" to "gcc -Wall -Wextra".  Recall that the cc and gcc commands on Linux and Mac OS X are the same.

When comparing Unix signal handlers to C++ or Java exception handlers, keep these features in mind.  Signal handlers apply to the entire process and are installed while the program is running.  This is the same concept as for interrupt handlers in an operating system.  Exception handlers in C++ or Java (the catch clause following try) apply to regions of the program.  This gives more structure to the program, and greater control over program behavior, but at a higher design and runtime cost.  The major benefit is that errors in program design can often be caught by the compiler before runtime. 

<checkpoint>



If the -s or -t options of pr6 are used with a non-zero time (assumed to be an integer number of seconds), then the process should sleep for a total of that number of seconds.  The parent process should sleep after all the child processes have been started (there are no child processes at this point in the development).  The system function sleep(3C) takes the process out of the operating system's run queue and reschedules it to start again at a later time.  The code for this ought to be easy:

/* for sleep(3C) */
#include <unistd.h>

  if (sleep_time > 0)
    { sleep(sleep_time); }

The problem is that if a signal (in particular, SIGALRM) is received while the process is sleeping, the process wakes up and sleep() returns too soon; you should have noticed this earlier.  In that case, sleep() returns the number of seconds remaining from the original request, so you can reset the alarm and sleep for the remaining time, as follows:

  int remaining_sleep_time = sleep_time;

  while (remaining_sleep_time > 0)
    {
      if (alarm_time_interval > 0)
        { alarm(alarm_time_interval); }
      remaining_sleep_time = sleep(remaining_sleep_time);
    }

Note that these are now the only calls to alarm() and sleep(), but the argument values are different in the parent and child processes.

All this is wrapped up in the Sleep() function in pr6.3.c, which provides some more tests you should run shortly.

There are other ways to reset the alarm, most of which cause mysterious behavior.  For example, resetting the alarm inside the SIGALRM signal handler may confuse sleep() about how much time was remaining.  One good technique to look at later is the interval timer function setitimer(2).

Your signal handlers could print some additional messages to help see what's going on.  If your program ends unexpectedly with the message "Alarm clock" then you have the problem that the default alarm signal handler is installed, or became reinstalled.  For example, try compiling pr6.3.c with the -DPR6_USE_SIGNAL option.  You could change from signal() to sigaction(), or call signal() again inside the while loop just before alarm(), or call signal() again inside the handler.

% cc -v -D_POSIX_PTHREAD_SEMANTICS -o pr6 pr6.3.c pr6_ctime.c pr6_signal.c
pr6.3.c:
pr6_ctime.c:
pr6_signal.c:

% pr6
CMPSC 311 Project 6, version 3
 18270: Tue Mar 12 13:53:47 2013 starting - the default signal handlers are installed
 18270: Tue Mar 12 13:53:47 2013 going to sleep for 10 seconds - try control-C or control-Z (or not)
 18270: Tue Mar 12 13:53:57 2013 please wait while the signal handlers are changed
install_signal_handler(0) failed: Invalid argument
install_signal_handler(9) failed: Invalid argument
install_signal_handler(23) failed: Invalid argument
 18270: Tue Mar 12 13:53:57 2013 ok - the generic signal handlers are installed
 18270: Tue Mar 12 13:53:57 2013 going to sleep for 15 seconds - try control-C or control-Z (or not)
 18270: Tue Mar 12 13:54:01 2013 alarm signal received
 18270: Tue Mar 12 13:54:05 2013 alarm signal received
 18270: Tue Mar 12 13:54:09 2013 alarm signal received
 18270: Tue Mar 12 13:54:12 2013 all done!


% cc -v -D_POSIX_PTHREAD_SEMANTICS -DPR6_USE_SIGNAL -o pr6 pr6.3.c pr6_ctime.c pr6_signal.c
pr6.3.c:
pr6_ctime.c:
pr6_signal.c:

% pr6
CMPSC 311 Project 6, version 3
 18293: Tue Mar 12 13:54:49 2013 starting - the default signal handlers are installed
 18293: Tue Mar 12 13:54:49 2013 going to sleep for 10 seconds - try control-C or control-Z (or not)
 18293: Tue Mar 12 13:54:59 2013 please wait while the signal handlers are changed
install_signal_handler(0) failed: Invalid argument
install_signal_handler(9) failed: Invalid argument
install_signal_handler(23) failed: Invalid argument
 18293: Tue Mar 12 13:54:59 2013 ok - the generic signal handlers are installed
 18293: Tue Mar 12 13:54:59 2013 going to sleep for 15 seconds - try control-C or control-Z (or not)
 18293: Tue Mar 12 13:55:03 2013 alarm signal received
Alarm clock


There is one more experiment you should try, but the results might not be easy to see at this point.  The setup for sigaction() in pr6_signal.c uses SA_RESTART to communicate to other system functions that they should restart after the process catches a signal.  If SA_RESTART is replaced with 0 (or if sigaction() is replaced by signal()), then a signal could cause a system function to return before it is finished.  Some of the following code is designed to handle that case.

As a reminder, these tests were run on Solaris.  If you don't get the same result with the same program on Linux or Mac OS X, that's expected.

If you tried using control-Z to test the program, it is possible you have some background or suspended jobs.  Try running the jobs command; if there is no output, then you have no background or suspended jobs.  You can use a command like "kill %1" to terminate a suspended job (in this case, job 1).  Jobs that are running in the background probably should be left alone.  For example,

% cc -v -D_POSIX_PTHREAD_SEMANTICS -o pr6 pr6.3.c pr6_ctime.c pr6_signal.c
pr6.3.c:
pr6_ctime.c:
pr6_signal.c:

% pr6
CMPSC 311 Project 6, version 3
 18315: Tue Mar 12 13:55:50 2013 starting - the default signal handlers are installed
 18315: Tue Mar 12 13:55:50 2013 going to sleep for 10 seconds - try control-C or control-Z (or not)
^C

% jobs

% pr6
CMPSC 311 Project 6, version 3
 18317: Tue Mar 12 13:56:13 2013 starting - the default signal handlers are installed
 18317: Tue Mar 12 13:56:13 2013 going to sleep for 10 seconds - try control-C or control-Z (or not)
^Z
Suspended

% jobs
[1]  + Suspended                     pr6

% kill %1

% (just type return)
[1]    Terminated                    pr6

<checkpoint>



Now it's time to create some more processes.

The code to create a child process looks like this.  Indented parts are inside some function, perhaps main().

/* for fork(2), getpid(2) */
#include <sys/types.h>
#include <unistd.h>

  pid_t child_pid;

  child_pid = fork();

  /* There is (should be) one more process running the same program.
   * Both processes have returned from fork(), but with different
   * values assigned to child_pid.
   */

  if (child_pid == (pid_t)(-1))
    { /* This is the parent process.  The fork failed, there is no child. */
      print_msg_error("fork()", strerror(errno));

      /* maybe quit? */
    }
  else if (child_pid == 0)
    { /* This is the child process.  The fork succeeded. */

      /* add more code, but still exit */

      exit(child_exit_status);
         /* this will also send a SIGCHLD signal to the parent process */
    }
  else
    { /* This is the parent process.  The fork succeeded. */

      /* add more code, but do not exit yet */
    }

The entire address space of the parent process is copied to build the address space of the child process.  Once the child process starts, it is at the point in the program where fork() returns, the same point as in the parent process.  The only way the two can notice which is which is by the return value from fork().  In the child process, fork() returns 0; in the parent process, fork() returns the process identifier of the child process that was just created.

It will be helpful to print the process identifiers of each process and its parent after the fork, just to see what's going on.  Similarly, print this information before each process exits.  Use the getpid() and getppid() functions for this. 

<checkpoint>



The system function exit() terminates the process, sends a SIGCHLD signal to the parent process, and places its argument (the child process exit status) in a place where it can be retrieved by the parent process.  See exit(2) and exit(3C).  You should now be able to install a SIGCHLD handler without much difficulty, but its function is not yet clear.



The parent can wait for a child to terminate with something like the following:  (see the files pr6_wait.h and pr6_wait.c)

/* for errno */
#include <errno.h>

/* for waitpid(2) and wait(2) */
#include <sys/types.h>
#include <sys/wait.h>

/* wait for a child process whose pid you know
 *
 * return 1 if a child was found
 *    *child_status has been updated, and the child has terminated
 *
 * return 0 if no child was found
 *    *child_status has not been updated
 */

int wait_child(pid_t wait_pid, int *child_status)
{
  int s;

  /* loop because waitpid() can be interrupted by a signal and return early */

  while (waitpid(wait_pid, &s, 0) == (pid_t)(-1))
    {
      if (errno == ECHILD)                  /* no more children */
        { return 0; }
    }

  *child_status = s;

  return 1;
}

/* wait for a child process whose pid you do not know
 *    if more than one child has terminated, report only one
 *
 * return 1 if a child was found
 *    *wait_pid and *child_status have been updated, and the child has terminated
 *
 * return 0 if no child was found
 *    *wait_pid and *child_status have not been updated
 */

int wait_any_child(pid_t *wait_pid, int *child_status)
{
  pid_t w;
  int s;

  /* loop because wait() can be interrupted by a signal and return early */

  while ((w = wait(&s)) == (pid_t)(-1))
    {
      if (errno == ECHILD)                  /* no more children */
        { return 0; }
    }

  *wait_pid = w;
  *child_status = s;

  return 1;
}

It would make more sense to use the wait.h(3HEAD) macros applied to child_status, but that is a feature you can use in the next project (see CS:APP Fig. 8.17 or APUE Sec. 8.6 for an example).
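For reference, here is a minimal sketch of how those macros might be applied to a status value.  The function name report_child_status() is hypothetical, and the messages are only for illustration.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>

/* hypothetical helper, not part of the project files
 *
 * describe a status value obtained from wait() or waitpid()
 *
 * return the exit status if the child exited normally
 * return -1 if the child was terminated by a signal, or otherwise did not exit
 */
int report_child_status(pid_t pid, int status)
{
  if (WIFEXITED(status))
    {
      printf("child %ld exited with status %d\n",
             (long)pid, WEXITSTATUS(status));
      return WEXITSTATUS(status);
    }

  if (WIFSIGNALED(status))
    {
      printf("child %ld was terminated by signal %d\n",
             (long)pid, WTERMSIG(status));
      return -1;
    }

  printf("child %ld has raw status 0x%08x\n", (long)pid, (unsigned)status);
  return -1;
}
```

A call such as report_child_status(child_pid, child_status) could follow a successful return from wait_child() or wait_any_child().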

The reason that wait() and waitpid() should be used in a loop is that if the parent process catches a signal while it is waiting, then wait() returns early with -1 and errno set to EINTR (unless the handler was installed with SA_RESTART).  The child may not have terminated yet, so you need to wait some more.  Otherwise, the code would have been something easy like

  wait_pid = wait(&child_status);

  if (wait_pid == (pid_t)(-1))
    { deal with the error }

As long as a process exists, its own process identifier will not change.  However, if the parent process terminates before the child process, then the parent process ID of the child is set to 1, the process ID of the init process.  This affects the return value of getppid() in the child process.

<checkpoint>



In preparation for the next project, you should make it possible for the parent to have several children.  The pr6 command-line option -c sets how many children to fork().  The easiest way to do this is to put some of the code you have written so far into a loop, actually two loops.  Fork the children in one loop, and wait for the children in a second loop.  In general, you don't know the order in which the children will finish, so the second loop should use wait_any_child().  One version of the solution to be posted will use wait_child() in the second loop, just to test the code; it will be useful later.  Of course, to use wait_child() you would need to change child_pid from a simple variable to an array.
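The two-loop structure might look something like the sketch below.  This is only an outline under some assumptions: the function name fork_and_wait_all() and the constant SKETCH_MAX_CHILDREN are hypothetical, the children exit immediately instead of doing any work, and wait() is called directly (with the same retry on EINTR as in wait_any_child()) so that the example stands on its own.

```c
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define SKETCH_MAX_CHILDREN 8           /* hypothetical limit for this sketch */

/* hypothetical helper, not part of the project files
 *
 * fork n children that each exit with exit_status, then wait for all of them
 * return the number of children reaped, or -1 if n is out of range
 */
int fork_and_wait_all(int n, int exit_status)
{
  pid_t child_pid[SKETCH_MAX_CHILDREN];
  pid_t w;
  int created = 0, reaped = 0;
  int i, status;

  if (n < 0 || n > SKETCH_MAX_CHILDREN)
    { return -1; }

  /* first loop: create the children */
  for (i = 0; i < n; i++)
    {
      fflush(stdout); fflush(stderr);

      child_pid[created] = fork();

      if (child_pid[created] == (pid_t)(-1))
        { perror("fork"); break; }

      if (child_pid[created] == 0)      /* child: a real one would do work here */
        { _exit(exit_status); }

      printf("created child %d, pid %ld\n", created, (long)child_pid[created]);
      created++;
    }

  /* second loop: wait for the children, in whatever order they finish */
  while (reaped < created)
    {
      w = wait(&status);

      if (w == (pid_t)(-1))
        {
          if (errno == EINTR)           /* interrupted by a signal, try again */
            { continue; }
          break;                        /* ECHILD, no more children */
        }

      printf("child finished %ld 0x%08x\n", (long)w, (unsigned)status);
      reaped++;
    }

  return reaped;
}
```

Keeping the pids in an array is what would allow the second loop to call waitpid() on a specific child instead.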

<checkpoint>



Here is some sample output at this stage of the program.  Note the process numbers that distinguish the parent and child processes.  The "child finished" output uses print_msg_2() to print the process number and exit status as retrieved by wait().  Note that the exit status 2 in the last example does not appear in the low byte; it shows up as 0x00000200.  This is because wait() encodes the exit status and the termination status of the child process into one int.  The WEXITSTATUS macro in <sys/wait.h> extracts the exit status from that int.

% cc -v -D_POSIX_PTHREAD_SEMANTICS -o pr6 pr6.4.c pr6_ctime.c pr6_signal.c pr6_wait.c
pr6.4.c:
pr6_ctime.c:
pr6_signal.c:
pr6_wait.c:

% pr6 -a 2 -b 3 -s 5 -t 8
CMPSC 311 Project 6, version 4
  child_alarm_time = 2
  parent_alarm_time = 3
  child_sleep_time = 5
  parent_sleep_time = 8
 18350: Tue Mar 12 13:57:56 2013 here is the parent, all children created
 18350: Tue Mar 12 13:57:59 2013 alarm signal received
 18350: Tue Mar 12 13:58:02 2013 alarm signal received

% pr6 -a 2 -b 3 -s 5 -t 8 -c 1
CMPSC 311 Project 6, version 4
  child_alarm_time = 2
  parent_alarm_time = 3
  child_processes = 1
  child_sleep_time = 5
  parent_sleep_time = 8
 18353: Tue Mar 12 13:58:24 2013 here is the parent, all children created
 18354: Tue Mar 12 13:58:24 2013 here is child 0
 18354: Tue Mar 12 13:58:26 2013 alarm signal received
 18353: Tue Mar 12 13:58:27 2013 alarm signal received
 18354: Tue Mar 12 13:58:28 2013 alarm signal received
 18353: Tue Mar 12 13:58:29 2013 child signal received - ignored
 18353: Tue Mar 12 13:58:32 2013 alarm signal received
 18353: Tue Mar 12 13:58:32 2013 child finished 18354 0x00000000

% pr6 -a 2 -b 3 -s 5 -t 8 -c 2
CMPSC 311 Project 6, version 4
  child_alarm_time = 2
  parent_alarm_time = 3
  child_processes = 2
  child_sleep_time = 5
  parent_sleep_time = 8
 18357: Tue Mar 12 13:58:59 2013 here is the parent, all children created
 18358: Tue Mar 12 13:58:59 2013 here is child 0
 18359: Tue Mar 12 13:58:59 2013 here is child 1
 18358: Tue Mar 12 13:59:01 2013 alarm signal received
 18359: Tue Mar 12 13:59:01 2013 alarm signal received
 18357: Tue Mar 12 13:59:02 2013 alarm signal received
 18358: Tue Mar 12 13:59:03 2013 alarm signal received
 18359: Tue Mar 12 13:59:03 2013 alarm signal received
 18357: Tue Mar 12 13:59:04 2013 child signal received - ignored
 18357: Tue Mar 12 13:59:04 2013 child signal received - ignored
 18357: Tue Mar 12 13:59:07 2013 alarm signal received
 18357: Tue Mar 12 13:59:07 2013 child finished 18358 0x00000000
 18357: Tue Mar 12 13:59:07 2013 child finished 18359 0x00000000

% pr6 -a 2 -b 3 -s 5 -t 8 -c 2 -x 2
CMPSC 311 Project 6, version 4
  child_alarm_time = 2
  parent_alarm_time = 3
  child_processes = 2
  child_sleep_time = 5
  parent_sleep_time = 8
  child_exit_status = 2
 18375: Tue Mar 12 13:59:32 2013 here is the parent, all children created
 18376: Tue Mar 12 13:59:32 2013 here is child 0
 18377: Tue Mar 12 13:59:32 2013 here is child 1
 18377: Tue Mar 12 13:59:34 2013 alarm signal received
 18376: Tue Mar 12 13:59:34 2013 alarm signal received
 18375: Tue Mar 12 13:59:35 2013 alarm signal received
 18377: Tue Mar 12 13:59:36 2013 alarm signal received
 18376: Tue Mar 12 13:59:36 2013 alarm signal received
 18375: Tue Mar 12 13:59:37 2013 child signal received - ignored
 18375: Tue Mar 12 13:59:37 2013 child signal received - ignored
 18375: Tue Mar 12 13:59:40 2013 alarm signal received
 18375: Tue Mar 12 13:59:40 2013 child finished 18376 0x00000200
 18375: Tue Mar 12 13:59:40 2013 child finished 18377 0x00000200




Also in preparation for the next project, the parent process should maintain a "process table" (use the ps command in Unix to see all your current processes).  This part of the current project does not need to be very sophisticated.  Each entry in the table contains information about a process you have created, except for the parent itself.  After a new child process is created, the parent puts its information into the table, and then removes that information when the child terminates.  At this point, the only information you have is the child process number and some simple state information (new, running, stopped, terminated, etc.).  The table can be updated by the parent process after fork() (the child process is new and assumed to be running) and after wait() or waitpid() (the child process was running and has now terminated).  As an illustration, you could use a fixed-size array like this:

/* fixed-size process table, give the size as a symbolic constant */
#define MAX_CHILDREN 8

/* an entry in the process table */
typedef struct pr6_process {
  pid_t pid;            /* process ID, supplied from fork() */
                        /* if 0, this entry is currently not in use */
  int   state;          /* process state, your own definition */
  int   exit_status;    /* supplied from wait() if process has finished */
} pr6_process_info;

/* the process table, maintained by the parent process only */
pr6_process_info process_table[MAX_CHILDREN];

A full but too-simple implementation of this is given in the files pr6_table.h and pr6_table.c.  One problem with this design is the size limit.  It would be better to use a dynamic data structure, such as a linked list or an array allocated with malloc(), so that you don't have to impose a hard limit on the number of child processes (8 is not enough on a real system) and so that space is not wasted (usually we are far below the maximum).  However, the fixed-size table is a reasonable compromise for now.  You will need to improve on it in Project 7.  It will be useful to have a function to print the process table, and to use this as part of the verbose option (-v).  An example is given below.
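To make the idea concrete, here is a rough sketch of the operations such a table needs.  The function names table_insert(), table_remove() and table_print(), and the state constants, are hypothetical; they are not taken from pr6_table.h, which may define things differently.

```c
#include <stdio.h>
#include <sys/types.h>

#define MAX_CHILDREN 8

/* hypothetical state values; choose your own */
enum { STATE_UNUSED = 0, STATE_RUNNING = 1, STATE_TERMINATED = 2 };

typedef struct pr6_process {
  pid_t pid;            /* process ID; if 0, this entry is not in use */
  int   state;
  int   exit_status;
} pr6_process_info;

/* static storage starts out as all zeros, so every entry is unused */
static pr6_process_info process_table[MAX_CHILDREN];

/* record a new child; return the index used, or -1 if the table is full */
int table_insert(pid_t pid)
{
  int i;

  if (pid <= 0)
    { return -1; }

  for (i = 0; i < MAX_CHILDREN; i++)
    {
      if (process_table[i].pid == 0)
        {
          process_table[i].pid = pid;
          process_table[i].state = STATE_RUNNING;
          process_table[i].exit_status = 0;
          return i;
        }
    }

  return -1;
}

/* remove a child; return the index freed, or -1 if the pid was not found */
int table_remove(pid_t pid)
{
  int i;

  if (pid <= 0)
    { return -1; }

  for (i = 0; i < MAX_CHILDREN; i++)
    {
      if (process_table[i].pid == pid)
        {
          process_table[i].pid = 0;
          process_table[i].state = STATE_UNUSED;
          return i;
        }
    }

  return -1;
}

/* print the entries that are in use */
void table_print(void)
{
  int i;

  for (i = 0; i < MAX_CHILDREN; i++)
    {
      if (process_table[i].pid != 0)
        {
          printf("  [%d] pid %ld state %d status 0x%08x\n", i,
                 (long)process_table[i].pid, process_table[i].state,
                 (unsigned)process_table[i].exit_status);
        }
    }
}
```

In this sketch the parent would call table_insert() right after a successful fork(), and table_remove() after wait() or waitpid() reports that the child has terminated.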

One idea you should consider but reject is to update the process table as soon as possible.  After all, the child process sends a SIGCHLD signal to the parent as soon as it terminates.  (Actually, there are some other times when this signal could be sent, but don't worry about that yet.)  Why not let the signal handler for SIGCHLD call wait() and update the process table?  It turns out that this is a bad idea.  The reason is that a child process could exit and send the signal before the parent process has even created the process table entry for the child.  This gets you into a race condition, and very likely into an error condition.  Later we'll discuss how to get around the problem, but the easiest approach for now is to avoid it entirely.  This is a real problem and you will need to know how to deal with it, but that's mostly for later.  See CS:APP Sec. 8.5.7 for another example of the same problem.

Here's an example of the race condition, but without using the process table.  The first version is on Solaris, and the second version is on Mac OS X.  We added a call to print_msg_1() in the parent so you can see its progress through the loop that calls fork(); the extra number after "here is the parent" is just a loop iteration counter.  Pay attention to the relative ordering of the output from the parent and from child 0.

% cc -v -D_POSIX_PTHREAD_SEMANTICS -o pr6 pr6.4a.c pr6_ctime.c pr6_signal.c pr6_wait.c
pr6.4a.c:
pr6_ctime.c:
pr6_signal.c:
pr6_wait.c:

% pr6 -b 3 -t 8 -c 2 -x 2
CMPSC 311 Project 6, version 4a
  parent_alarm_time = 3
  child_processes = 2
  parent_sleep_time = 8
  child_exit_status = 2
 18411: Tue Mar 12 14:03:49 2013 here is the parent 0
 18412: Tue Mar 12 14:03:49 2013 here is child 0
 18411: Tue Mar 12 14:03:49 2013 here is the parent 1
 18411: Tue Mar 12 14:03:49 2013 here is the parent, all children created
 18411: Tue Mar 12 14:03:49 2013 child signal received - ignored
 18413: Tue Mar 12 14:03:49 2013 here is child 1
 18411: Tue Mar 12 14:03:49 2013 child signal received - ignored
 18411: Tue Mar 12 14:03:52 2013 alarm signal received
 18411: Tue Mar 12 14:03:55 2013 alarm signal received
 18411: Tue Mar 12 14:03:57 2013 child finished 18412 0x00000200
 18411: Tue Mar 12 14:03:57 2013 child finished 18413 0x00000200


% gcc -Wall -Wextra -D_POSIX_PTHREAD_SEMANTICS -o pr6 pr6.4a.c pr6_ctime.c pr6_signal.c pr6_wait.c
pr6_signal.c:63: warning: unused parameter ‘sig’
pr6_signal.c:63: warning: unused parameter ‘func’

% pr6 -b 3 -t 8 -c 2 -x 2

CMPSC 311 Project 6, version 4a
  parent_alarm_time = 3
  child_processes = 2
  parent_sleep_time = 8
  child_exit_status = 2
  7974: Tue Mar 12 14:18:07 2013 here is the parent 0
  7975: Tue Mar 12 14:18:07 2013 here is child 0
  7974: Tue Mar 12 14:18:07 2013 child signal received - ignored
  7974: Tue Mar 12 14:18:07 2013 here is the parent 1
  7974: Tue Mar 12 14:18:07 2013 here is the parent, all children created
  7976: Tue Mar 12 14:18:07 2013 here is child 1
  7974: Tue Mar 12 14:18:07 2013 child signal received - ignored
  7974: Tue Mar 12 14:18:10 2013 alarm signal received
  7974: Tue Mar 12 14:18:13 2013 alarm signal received
  7974: Tue Mar 12 14:18:15 2013 child finished 7975 0x00000200
  7974: Tue Mar 12 14:18:15 2013 child finished 7976 0x00000200

<checkpoint>



There is another "feature" of the program which you might find distressing.  Here's the problem, the cause, and the cure.  Consider this little program, omitting the include files.

int main(void)
{
  printf("0 %d\n", getpid());
  fork();
  printf("1 %d\n", getpid());
  return 0;
}

Here is some output on Solaris (Linux is similar, because it's really a problem with the C libraries).  The first command line (after compiling) sends all output to the terminal window.  The second command line sends all output to the program cat through a pipe.  cat simply repeats its input, in this case by sending it to the terminal.  The third command line sends all output to a file out, then prints the file.

% cc -o a example.c

% a
0 27393
1 27393
1 27394

% a | cat
0 27395
1 27395
0 27395
1 27397

% a > out ; cat out
0 27398
1 27398
0 27398
1 27399

There are two different lines beginning with "1 " because there are two processes running the program at the point of the second printf().  But why did we get two identical lines "0 27395"?  After all, there was only one printf() of this text before the fork().

The reason is that stdout in C, like cout in C++, is a buffered output stream.  The characters output by printf() are placed in a reserved part of memory in the process address space.  If the output is going to the terminal window, then the buffer is flushed promptly, at the end of each line, so you see the output as soon as it is complete (the output stream is line buffered).  If the output is going to a file or to another process through a pipe, then the flush is delayed until the buffer is full (the output stream is fully buffered), until the fflush(3C) function is called by the producer (the program example.c in this case), or until the program exits normally.

When the parent process in the example executes fork(), its entire address space is copied to build the child process address space.  If the output buffer has anything in it, that is also copied.  When the parent does its next printf(), the original copy of the buffer is used.  When the child does its next printf(), its copy of the buffer is used.  Eventually, both buffers are flushed, each with its own copy of the line printed before the fork().  That's why you get the same line twice.  The cure for the problem is to call fflush() before fork(), as follows:

  if (flush_before_fork)
    { fflush(stdout); fflush(stderr); }

  child_pid = fork();

The flag flush_before_fork should be set from the command line option -f, with the default being to perform the flush operations.  This will give you some flexibility for experiments.  Remember, the conditional here is only to make experiments easy.  A proper application program would always fflush() before fork().
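If you would like to see the effect without redirecting output by hand, the following sketch reproduces it with a temporary file, which is fully buffered just like a redirected stdout.  The function name count_lines_after_fork() is hypothetical.  It writes one line before fork(), lets both processes flush their buffers, and counts how many lines ended up in the file.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* hypothetical demo, not part of the project files
 *
 * return the number of lines found in the temporary file, or -1 on error
 */
int count_lines_after_fork(int flush_before_fork)
{
  FILE *fp = tmpfile();                 /* a fully buffered stream */
  pid_t child_pid;
  int c, status, lines = 0;

  if (fp == NULL)
    { return -1; }

  fprintf(fp, "printed before fork\n"); /* stays in the buffer for now */

  if (flush_before_fork)
    { fflush(fp); }

  child_pid = fork();

  if (child_pid == (pid_t)(-1))
    { fclose(fp); return -1; }

  if (child_pid == 0)
    {
      fclose(fp);                       /* flushes the child's copy of the buffer */
      _exit(0);                         /* leave the other stdio buffers alone */
    }

  if (waitpid(child_pid, &status, 0) == (pid_t)(-1))
    { fclose(fp); return -1; }

  fflush(fp);                           /* flushes the parent's copy of the buffer */

  rewind(fp);
  while ((c = getc(fp)) != EOF)
    {
      if (c == '\n')
        { lines++; }
    }

  fclose(fp);
  return lines;
}
```

With flush_before_fork set to 0 the line should show up twice, once from each process; with it set to 1 the line should show up once.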

<checkpoint>



Although it is not part of this project, here is how two processes can send signals to each other in a more general way.  This will be needed later.

/* for kill(2) */
#include <sys/types.h>
#include <signal.h>

  /* in the child */
  if (...)
    { kill(parent_pid, SIGUSR1); }

  /* in the parent */
  if (...)
    { kill(child_pid, SIGUSR2); }

The name kill() is from the old days of Unix, when the kill signal to terminate a process was the most important use of signals.  The signals SIGUSR1 and SIGUSR2 (yes, these are spelled correctly) are reserved for application programs, without any restrictions on their use or interpretation by the OS.

Consider:  What happens to the signal being sent if the other process has already terminated?




Here is some sample output from a completed program on Solaris.  Yours does not need to be identical, but should be similar as far as the timing and ordering are concerned.  You might see different behavior with Linux or Mac OS X.

% cc -v -D_POSIX_PTHREAD_SEMANTICS -o pr6 pr6.5.c pr6_ctime.c pr6_signal.c pr6_wait.c pr6_table.c
pr6.5.c:
pr6_ctime.c:
pr6_signal.c:
pr6_wait.c:
pr6_table.c:

% pr6 -h
CMPSC 311 Project 6, version 5
Usage: pr6 [-h] [-v] [-a n] [-b n] [-c n] [-f n] [-s n] [-t n] [-x n]
    -h      help
    -v      verbose mode
    -a n    child alarm time interval, default 0
    -b n    parent alarm time interval, default 0
    -c n    fork() n child processes, default 0, max 5
    -f n    fflush() before fork(), 0 = no, 1 = yes, default 1
    -s n    child sleep time, default 0
    -t n    parent sleep time, default 0
    -x n    child exit status n, default 0

The output from the following commands can be found here.

% pr6
% pr6 -c 1
% pr6 -c 2
% pr6 -v -c 1 -x 12
% pr6 -b 3 -t 10
% pr6 -a 3 -s 10 -c 1
% pr6 -a 3 -s 10 -c 3
% pr6 -a 3 -b 7 -s 8 -t 15 -c 1
% pr6 -a 3 -b 7 -s 8 -t 15 -c 3
% pr6 -a 1 -b 2 -s 3 -t 4 -x 5 -c 2
% pr6 -a 1 -b 2 -s 3 -t 4 -x 5 -c 2 -v
% pr6 -a 1 -b 2 -s 3 -t 4 -x 5 -c 4 -v




Any time you have a program split into several files, recompiling the various parts can become tedious.  The traditional Unix tool for managing program development is make.  The idea is to create a file, called Makefile or makefile, that contains instructions on how to build the program or take various actions.  You then give a command like "make target", and make will read the Makefile, extract the relevant commands, and cause them to be executed.

Here is a makefile that could be useful on Solaris; save it as Makefile.  Some other variations:
  Makefile-c99 -- using C99 for Solaris
  Makefile-gcc -- using GCC for Solaris
  Makefile-lnx -- using GCC for Linux
  Makefile-mac -- using GCC for Mac OS X

Be sure that lines like "cc ..." begin with a tab character and not 8 spaces.  Pay attention to what lint says about "function returns value which is always ignored".  This means that you might not be checking the success or failure of a system function.



There are a few standard and non-standard system functions that might be useful, but you don't need them here:
  strsignal(3C), sig2str(3C) -- convert signal number to string
  str2sig(3C) -- convert string to signal number
  perror(3C) -- print errno information to stderr (we used strerror() and stdout here)
  psignal(3C) -- print signal information to stderr

The Solaris man page siginfo.h(3HEAD) might be useful later.



Instructions for turning in the program.

Your program and results should be submitted in two steps.

  1. In your project6 directory, run these commands, which will create the file project-6-username.tar.gz.  Be sure to substitute your own username and your own list of files.  The first command creates a tar file, and the second confirms its contents.  The third command compresses the file, and the fourth confirms that the compressed file exists.  Note that the first command is so long that it may wrap around to the next line of the browser; it ends with Makefile.

tar cvf project-6-username.tar pr6.5.c pr6_ctime.[ch] pr6_signal.[ch] pr6_wait.[ch] pr6_table.[ch] Makefile

tar tvf project-6-username.tar

gzip project-6-username.tar

ls -l project-6-username.tar.gz

  2. Log in to ANGEL and put the file project-6-username.tar.gz in the ANGEL Dropbox for Project 6 (with your username substituted, of course).

The grade will be based on the paper version that you turn in at class time; the electronic version is kept in case we have any questions about your program or data.



Here are some remarks by a former student, about his experience dealing with fork().  There's a lot of belated wisdom here.

I spent a loonnggggggggg time trying to figure out simply what was going on.  Fork() made little sense to me initially but I looked up examples online but mainly learned by making simple test programs.  To me it wasn't clear that fork() split the program at the line of fork().  I was under the initial impression that fork started a process from the beginning.

This was followed/coincided with a period of intense cursing.

There wasn't much planning.  Once I figured out what to do, things fell into place.



Last revised 12 Mar. 2013