CMPSC 311, Introduction to Systems Programming

main() and exit()





main()

main() is a function with some special properties required by C and by Posix.  In the simplest sense, it is the first function called when a C program starts as a new process, but there are lots of details involved.  In reality, there is a startup function, provided by the OS, that is called before main().  The startup function arranges initial data in the process address space, and then it calls main().  The process ends when main() returns or the program calls exit() (normal behavior) or abort() (abnormal behavior); control then returns to the execution environment.


Hosted environment

main() is required in a hosted environment, which is the typical setting on a workstation or anything else with an operating system.  There might be some other way to start a program in a freestanding environment, so nothing can be said about main() there without checking the system-specific documentation.



Objects with static storage duration

An object in C is a region of data storage in the execution environment, consisting of a contiguous sequence of one or more bytes, whose contents can represent values; don't confuse this simple definition with objects in C++.  An object's storage duration, or lifetime, is the portion of program execution time during which storage is guaranteed to be reserved for the object.  An object exists, has a constant address, and retains its last-stored value throughout its lifetime.

Objects with static storage duration are allocated and initialized by the startup function before main() is called, and remain allocated until the process terminates.  This category of data includes variables declared outside any function, variables declared static inside a function, and string literals.

The startup function will also initialize the runtime stack for the process, effectively providing arguments and a return address for main().  Of course, the program itself will be loaded into memory, or at least enough of it to get started.
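
As a small illustration of static storage duration, here is a minimal sketch with one file-scope object and one block-scope static object.  The names call_count, next_id and last_id are invented for this example; they are not part of any library.

```c
/* File-scope object: static storage duration, initialized before main() runs. */
int call_count = 0;

/* Returns the next identifier in sequence, starting from 1. */
int next_id(void)
{
  /* Block-scope static object: initialized once, before the program starts,
     and it retains its last-stored value between calls. */
  static int last_id = 0;

  call_count++;
  return ++last_id;
}
```

Calling next_id() twice should yield 1 and then 2, because last_id is not re-initialized on each call the way an automatic variable would be.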

Exercise.  Why is main() not necessarily the first programmer-defined function called when a C++ program starts?

Exercise.  How much storage is reserved for the string literals if you write
char *foo = "string 1"; char *bar = "string 2";
Would your answer be different if you write
char *foo = "string 1"; char *bar = "string 1";
or
char foo[] = "string 1"; char bar[] = "string 2";
How would your answer affect the design of the rest of your program?



main()'s return type is int

main() returns an int value to the startup function, which passes it on to the execution environment as the exit status of the process.  The command shells use the exit status of a process to indicate success (0) or failure (nonzero, with the particular value indicating the reason for failure).  The function exit() takes one int argument that also acts as the exit status.  The function abort() terminates the process abnormally by raising the SIGABRT signal; unless the program catches the signal, exit() is not called, and the execution environment reports an unsuccessful (nonzero) status.

Don't forget the famous tag line, "Friends don't let friends void main()."

You can recover the exit status of the immediately previous command, but the syntax depends on which shell you are using: it is $? in sh and bash, and $status in csh and tcsh.

You can use the exit status with a conditional statement in a shell script.  This example uses sh, which is the standard Posix shell.

if prog foo bar
then
  echo prog was successful # exit status zero
else
  echo prog failed         # exit status nonzero
fi

The important concept is that any program you write could be used in a shell script, and therefore must set its exit status according to the usual expectations.
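
The same exit status is visible to a C program that starts another program.  The following is a minimal sketch, assuming a Posix system where system() runs its argument with sh; the function name exit_status_of is invented for this example.

```c
#include <stdlib.h>    /* for system() */
#include <sys/wait.h>  /* for WIFEXITED(), WEXITSTATUS() */

/* Runs a shell command and returns its exit status,
   or -1 if the command could not be run or did not exit normally
   (for example, if it was killed by a signal). */
int exit_status_of(const char *command)
{
  int status = system(command);

  if (status == -1 || !WIFEXITED(status))
    return -1;
  return WEXITSTATUS(status);
}
```

With this sketch, exit_status_of("prog foo bar") plays the same role as the if statement in the shell script above: zero means success, and anything else means failure.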

Example.

void-main-void.c

void main(void) { return; }

Try it.

% gcc void-main-void.c
void-main-void.c: In function 'main':
void-main-void.c:1: warning: return type of 'main' is not 'int'



main()'s parameters

There are two standard choices, and one non-standard choice that is often possible.  Recall the basic terminology that parameters are used in the definition of a function, arguments are used in the invocation (calling) of a function, and the two are matched up by the function call mechanism.



int main(void) { ... }

In this case, the parameters that are allowed are not used.  Nevertheless, the execution environment and the startup function will allocate and initialize storage as if they were going to be used.  This is not wasted effort.  There is additional information copied from the execution environment to the process address space that is accessible without going through main()'s parameters.  Anyway, the startup function doesn't know how main() was written, so it must do the same thing for all new processes.



int main(int argc, char *argv[]) { ... }

The parameter names, as for any function, are arbitrary, but these are customary.  The parameters allow access to the command line as interpreted by the command shell.  Commands are entered as a single line of text, expanded if necessary, and then broken into shorter strings representing the "words" of the command.  argc is the number of command-line arguments, and argv points to the arguments as a NULL-terminated array of null-terminated character strings.  Think of "argument count" and "argument vector".

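The rules for the parameters (as defined by the C Standard) are
  argc is nonnegative.
  argv[argc] is a null pointer.
  If argc is greater than zero, argv[0] through argv[argc-1] point to strings.
  If argc is greater than zero, argv[0] represents the program name, or argv[0][0] is the null character if the name is not available.
  The parameters argc and argv, and the strings that argv points to, can be modified by the program.

There are additional rules defined by the Posix Standard, and expected by users.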
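
Some of the C Standard's guarantees about argc and argv can be checked at run time.  This is only a sketch for building intuition, since a conforming startup function already ensures these conditions; the function name args_are_well_formed is invented for this example.

```c
#include <stddef.h>  /* for NULL */

/* Returns 1 if argc is nonnegative, argv[0] .. argv[argc-1] are all
   non-null, and argv[argc] is a null pointer; returns 0 otherwise. */
int args_are_well_formed(int argc, char *argv[])
{
  if (argc < 0)
    return 0;

  for (int n = 0; n < argc; n++)
    if (argv[n] == NULL)
      return 0;

  return argv[argc] == NULL;
}
```

One possible use is assert(args_are_well_formed(argc, argv)); as the first statement of main() while experimenting.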



int main(int argc, char *argv[], char *envp[]) { ... }

The parameter names, as for any function, are arbitrary, but these are customary.  The parameters argc and argv behave as previously.  The third parameter gives access to the environment variables, but in a non-standard way.  This is one of the cases of a previously common practice that is still accepted by the compilers for backward-compatibility, but new programs should avoid it.  We'll discuss this more in a later section.



Example.  Make it possible for error messages to contain the name of the program as it was used on the command line.

prog.h

extern char *program_name;

prog.c

char *program_name = "[unknown]";

main.c

#include "prog.h"
#include "foo.h"

int main(int argc, char *argv[])
{
  if (argc > 0 && argv[0][0] != '\0')
    program_name = argv[0];

  foo("display this message");

  return 0;
}

foo.h

void foo(char *msg);

foo.c


#include <stdio.h>   /* for fprintf() */
#include <stdlib.h>  /* for exit() */
#include "prog.h"
#include "foo.h"

void foo(char *msg)
{
  fprintf(stderr, "%s: %s failed: %s\n", program_name, __func__, msg);
  exit(1);
}

Now let's try it.

% cc -o prog main.c prog.c foo.c
% cp prog prog2
% prog
prog: foo failed: display this message
% prog2
prog2: foo failed: display this message

Beyond this example, some programs are designed to adjust their behavior according to the name of the program.  The Posix utilities true and false are easily implemented this way.
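
Here is a minimal sketch of choosing behavior from the program name, assuming the name arrives as argv[0] with an optional directory prefix.  The function name status_from_name and the fallback status 2 are invented for this example, and this is not a claim about how any real true or false utility is written.

```c
#include <string.h>  /* for strrchr(), strcmp() */

/* Returns the exit status suggested by the program's own name:
   0 for "true", 1 for "false", and 2 (an arbitrary choice) otherwise. */
int status_from_name(const char *argv0)
{
  /* Skip any leading directory components, as in /usr/bin/true. */
  const char *slash = strrchr(argv0, '/');
  const char *base = (slash != NULL) ? slash + 1 : argv0;

  if (strcmp(base, "true") == 0)
    return 0;
  if (strcmp(base, "false") == 0)
    return 1;
  return 2;
}
```

A main() built on this sketch could simply return status_from_name(argv[0]) after checking that argc is greater than zero; one executable file installed under two names would then behave as two different commands.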



Command-line arguments

Think of argv[] as the set of parameters or arguments to the program.  The easiest thing to do is just echo the arguments.

for (int n = 0; n < argc; n++)
  printf("argv[%d] = %s\n", n, argv[n]);

or

for (int n = 0; argv[n] != NULL; n++)
  printf("argv[%d] = %s\n", n, argv[n]);

Later we'll discuss getopt(), which is used to separate command-line options from command-line operands in a standard way.  The distinction is that options affect how the program works, while operands provide its data.



Environment variables

Think of environment variables as a set of global values accessible by the program, except that they are not variables in the usual sense.  An environment variable and its value are represented as a character string of the form name=value.  The set of strings is a NULL-terminated array.  The global variable environ is defined, allocated and initialized by the startup function, and must be declared in your program as

extern char **environ;

if you intend to use it.  Then you can do something like this

for (int n = 0; environ[n] != NULL; n++)
  printf("environ[%d] = %s\n", n, environ[n]);

However, this is not the right approach to finding the value associated with any particular environment variable.  Later we'll look at getenv(), to obtain environment variable values in a standard way.
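
As a preview, here is a minimal sketch of looking up one environment variable with getenv(), which returns a null pointer when the name is not set.  The function name env_or_default is invented for this example.

```c
#include <stdlib.h>  /* for getenv() */

/* Returns the value of the named environment variable,
   or fallback if the variable is not set. */
const char *env_or_default(const char *name, const char *fallback)
{
  const char *value = getenv(name);

  return (value != NULL) ? value : fallback;
}
```

For example, env_or_default("HOME", "/") gives the value of HOME when it is set, without the caller ever touching environ.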

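The vocabulary is: an environment variable has a name and a value; the string name=value is an environment string; and the NULL-terminated array of environment strings is the environment list.
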
The non-standard version of main() cited earlier,

int main(int argc, char *argv[], char *envp[]) { ... }

allows access to the initial state of the environment variables via envp.  Since it is possible for a running program to change its own environment variables with the putenv(), setenv() or unsetenv() library functions, use of envp instead of environ is often a mistake.  Nevertheless, we will need to look at envp in a later discussion of the process address space, to help learn more about how environ is managed.
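
To see that environ tracks changes made by the running program, here is a minimal sketch that searches the current environment list for a name.  It is meant only to build understanding, since getenv() is the standard way to do a lookup; the function name environ_contains is invented for this example.

```c
#include <string.h>  /* for strlen(), strncmp() */

extern char **environ;

/* Returns 1 if the current environment list has an entry for name, else 0. */
int environ_contains(const char *name)
{
  size_t len = strlen(name);

  for (int n = 0; environ[n] != NULL; n++)
    /* An entry matches only if name is followed immediately by '='. */
    if (strncmp(environ[n], name, len) == 0 && environ[n][len] == '=')
      return 1;

  return 0;
}
```

After a successful call to setenv() for a new name, environ_contains() should report the name as present, while a copy of envp saved at startup is not guaranteed to show it.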

When you use environ directly in a program, do not modify it (leave that to putenv() and so on), and question whether you really need to access it (leave that to getenv()).  When we use environ here, it is mostly to increase understanding of how Unix works.

Exercise.  Is it permitted for an environment variable name to use the character = ?  Is it permitted for an environment variable value to use the character = ?  What happens if the environment list contains two entries with the same name?  [no, yes, it's undefined]

Exercise.  Suppose you want to find the name of the current user (owner of the current process).  Your choices are the getlogin() function, among others, or the environment variable USER.  Which should you choose?



Files

At program startup, the three file streams standard input, standard output and standard error are predefined and opened.  After main() returns, or if exit() is called, all output streams are flushed, and all open files are closed.  If the process ends by calling abort(), the runtime system is not obligated to close all files properly; for example, an output stream might not be flushed before closing.



Termination

Process termination returns control to the execution environment.  If main() executes return n; it is as if it called exit(n).  The exit() function takes care of cleaning up the process, by calling functions registered with atexit(), flushing open output streams, closing open streams (input or output), removing temporary files created by tmpfile(), and passing the exit status to the execution environment; exit() itself never returns to its caller.

The standard header <stdlib.h> defines two macros for use as arguments to exit(), EXIT_SUCCESS and EXIT_FAILURE.
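
Here is a minimal sketch of registering a cleanup function with atexit() and reporting the result with the two macros.  The names say_goodbye and register_cleanup are invented for this example.

```c
#include <stdio.h>   /* for fputs() */
#include <stdlib.h>  /* for atexit(), EXIT_SUCCESS, EXIT_FAILURE */

/* Called by exit(), or after main() returns, but not after abort(). */
static void say_goodbye(void)
{
  fputs("cleanup handler ran\n", stderr);
}

/* Registers say_goodbye() to run at normal termination.
   Returns EXIT_SUCCESS if registration worked, EXIT_FAILURE otherwise. */
int register_cleanup(void)
{
  return (atexit(say_goodbye) == 0) ? EXIT_SUCCESS : EXIT_FAILURE;
}
```

A main() could call register_cleanup() early and return EXIT_FAILURE if it fails; the message from say_goodbye() then appears only when the process terminates normally.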

Exercise.  This classic example does not follow modern practice in its arguments or return value.  Is it justified to claim that "the exit code is random"?  Rewrite the example in a completely standard way.

#include <stdio.h>

main()
{
  printf("hello, world\n");
}

Compile with C89 on Linux or Mac OS X, using bash, then print the exit status.

$ cc hello.c
$ a.out
hello, world
$ echo $?
13

Compile with C99 on Solaris, using tcsh, then print the exit status.

% c99 hello.c
"hello.c", line 4: warning: old-style declaration or incorrect type for: main
% a.out
hello, world
% echo $status
1



Miscellaneous

It is possible to obtain the address of a function with the unary & operator.  This is the address in memory of the first instruction of the function.  The expression &main is legal in C but not in C++.

main() cannot be inline'd, since the startup function needs to find main()'s address from information in the executable file.

In C11, the new keyword _Noreturn is used to indicate a function that does not return.  For example, the prototype for exit() would become
_Noreturn void exit(int status);
instead of the current
void exit(int status);
Some other uses are with abort(), longjmp(), _Exit() and quick_exit().  Obviously, you could not use _Noreturn with main().



Summary and additional notes -- main()'s structure, command-line arguments, and environment variables

The usual way to start a C main program is with
int main(int argc, char *argv[]) { ... }
To get direct access to the environment variables, add the POSIX standard declaration
extern char **environ;
or use the nonstandard form
int main(int argc, char *argv[], char *envp[]) { ... }
or both.  environ and envp have essentially the same type, and have the same value when the program starts.  The number of environment strings is not specified; start with environ[0] and continue until environ[i] is NULL, which indicates the end of the array.  The same applies to envp[0] and envp[j].  In general, it is better to use getenv(3C) to search for a specific environment variable, so neither the pointer environ nor the array envp is normally used explicitly.

Environment variables can be set with the library function putenv(3C).  putenv() could cause environ to change, to make room for new string pointers, so in general it is safer to use environ (whose value could change to reflect a new environment variable) than envp (whose value will not change).  Of course, if you really want to ignore changes to the environment as the program runs, then use envp.  Note that calls to putenv() affect only the current process, not the command shell that started the process.

The command line is parsed by the shell into argv[0], ..., argv[argc-1], where argv[0] is the program name as the command was given.  Although the number of argument strings is specified by argc, it is also the case that argv[] has a null element argv[argc] to mark the end of the array, as with environ[] and envp[].  In most cases, it is more natural to iterate through argv[] by incrementing an index, comparing the index to argc.

The exit status of the program is the return value from main(), or the value supplied to the system function exit(3C).  Note that there is also a man page for exit(2).  There is some additional information in intro(1).  An exit status of 0 is interpreted as success, non-zero as failure.  In the context of shell scripts with loops and conditionals, this would be treated as true or false.

The following condition should hold for main(): envp is equal to environ at the start of the program.  This could change if the program adds to its own environment; in that case, envp remains the same, and environ changes.

There is no importance to the ordering of the environment strings.  Windows requires the environment strings to be sorted alphabetically, but Unix does not.

The POSIX standard includes two functions setenv(3) and unsetenv(3), which would be useful for a shell program.  These are implemented on Solaris 10, GNU/Linux and Mac OS X but not Solaris 9.


Last revised, 28 Jan. 2013