main() and exit()
abort, atexit, exit, etc.

main()

main()
is a function with some special properties required by C and by Posix. In the simplest sense, it is the first function called when a C program starts as a new process, but there are lots of details involved. In reality, there is a startup function, provided by the OS, that is called before main(). The startup function arranges initial data in the process address space, and then it calls main(). The process ends when main() returns or the program calls exit() (normal behavior) or abort() (abnormal behavior); control then returns to the execution environment.

The fork() and exec() functions are the usual Unix interfaces to the startup function. Don't confuse this with the system startup
mechanism that gets the OS running in the first place.

main() is required in a hosted environment, which is the typical setting on a workstation or anything else with an operating system. There might be some other way to start a program in a freestanding environment, so nothing can be said about main() there without checking the system-specific
documentation.

Some data objects are allocated before main() is called, and remain allocated until the process terminates. This category of data includes

- global variables, which can be accessed from another source file by using the extern keyword. You can prevent access to a global variable from another source file by using the static keyword. We will discuss more of this later.
- static local variables, which are only accessible by the defining function.
- ... main().

Of course, the program itself will be loaded into memory, or at least enough of it to get started.

Why is main() not necessarily the first programmer-defined function called when a C++ program starts?

    char *foo = "string 1"; char *bar = "string 2";

    char *foo = "string 1"; char *bar = "string 1";

    char foo[] = "string 1"; char bar[] = "string 2";

main()'s return type is int

main() returns an int value to the startup function, which passes it on to the execution environment as the exit status of the process. The command shells use the exit status of a process to indicate success (0) or failure (nonzero, with the particular value indicating the reason for failure). The function exit() takes one int argument that also acts as the exit status. The function abort() terminates the process indirectly by sending it a SIGABRT signal, which (usually) ends the process abnormally without calling exit(); the shell then sees a nonzero exit status.

... void main()."

To print the exit status of the most recent command:

    (csh, tcsh)  echo $status
    (sh, bash)   echo $?

The following example uses sh, which is the standard Posix shell.

    if prog foo bar
    then
        echo prog was successful    # exit status zero
    else
        echo prog failed            # exit status nonzero
    fi
void-main-void.c

    void main(void) { return; }

    % gcc void-main-void.c
    void-main-void.c: In function 'main':
    void-main-void.c:1: warning: return type of 'main' is not 'int'

main()'s parameters

    int main(void) { ... }

This form ignores main()'s parameters. Anyway, the startup function doesn't know how main() was written, so it must do the same thing for all new processes.

    int main(int argc, char *argv[]) { ... }

argc is the number of command-line arguments, and argv points to the arguments as a NULL-terminated array of null-terminated character strings. Think of "argument count" and "argument vector".

- argc is nonnegative; it tells you how many elements of argv[] are valid, as in
      for (int n = 0; n < argc; n++) do something with argv[n]
- argv[argc] is NULL; you can detect the end of the array when iterating through it, as in
      for (char **p = argv; *p != NULL; p++) do something with *p
- If argc > 0, then argv[0] through argv[argc-1] point to valid strings which are initialized to implementation-defined values by the execution environment prior to program startup.
- If argc > 0, argv[0] points to a string representing the program name.
- If the program name is not available from the host environment, argv[0][0] is the null character; this should be considered an unusual situation.
- If argc > 1, the strings pointed to by argv[1] through argv[argc-1] are the program parameters.
- The program may modify argc, argv, and the strings pointed to by the argv array. For example,
      argc--; argv++;
... argc, argv, etc., after main() begins to execute.

Since argc and argv are function parameters, they act like local variables, and changes to them are not communicated back to main()'s caller. There are other ways to return information to the execution environment.

... exec() functions.

... exec() should follow the same rules as required for the Posix utility programs.

    int main(int argc, char *argv[], char *envp[]) { ... }

argc and argv behave as previously. The third parameter gives access to the environment variables, but in a non-standard way. This is one of the cases of a previously common practice that is still accepted by the compilers for backward compatibility, but new programs should
avoid it. We'll discuss this more in a later section.

prog.h

    extern char *program_name;

prog.c

    char *program_name = "[unknown]";

main.c

    #include "prog.h"
    #include "foo.h"

    int main(int argc, char *argv[])
    {
        if (argc > 0 && argv[0][0] != '\0')
            program_name = argv[0];
        foo("display this message");
        return 0;
    }

foo.h

    void foo(char *msg);

foo.c

    #include <stdio.h>   /* for fprintf() */
    #include <stdlib.h>  /* for exit() */
    #include "prog.h"
    #include "foo.h"

    void foo(char *msg)
    {
        fprintf(stderr, "%s: %s failed: %s\n", program_name, __func__, msg);
        exit(1);
    }

    % cc -o prog main.c prog.c foo.c
    % cp prog prog2
    % prog
    prog: foo failed: display this message
    % prog2
    prog2: foo failed: display this message

Beyond this example, some programs are designed to adjust their behavior according to the name of the program. The Posix utilities true and false are easily
implemented this way.

Think of argv[] as the set of parameters or arguments to the program. The easiest thing to do is just echo the arguments.

    for (int n = 0; n < argc; n++)
        printf("argv[%d] = %s\n", n, argv[n]);

    for (int n = 0; argv[n] != NULL; n++)
        printf("argv[%d] = %s\n", n, argv[n]);

Posix provides a library function, getopt(), which is used to separate command-line options from command-line operands in a standard way. The distinction is that options affect how the program works, while operands provide its data.

Each environment variable is a string of the form name=value. The set of strings is a NULL-terminated array. The global variable environ is defined, allocated and initialized by the startup function, and must be declared in your program as

    extern char **environ;

    for (int n = 0; environ[n] != NULL; n++)
        printf("environ[%d] = %s\n", n, environ[n]);

Use the library function getenv() to obtain environment variable values in a standard way.

environ

environ is the only object specified in the Posix standard whose declaration is not in an include file.

environ[0]
environ[]
name=value

The system limit ARG_MAX gives the limit on the total amount of space used for all the command-line argument and environment variable strings. While it is possible to exceed the limit, this is not a common event.

The nonstandard form of main() cited earlier,

    int main(int argc, char *argv[], char *envp[]) { ... }

supplies the initial environment through envp. Since it is possible for a running program to change its own environment variables with the putenv(), setenv() or unsetenv() library functions, use of envp instead of environ is often a mistake. Nevertheless, we will need to look at envp in a later discussion of the process address space, to help learn more about how environ is managed.

If you use environ directly in a program, do not modify it (leave that to putenv() and so on), and question whether you really need to access it (leave that to getenv()). When we use environ here, it is mostly to increase understanding of
how Unix works.

To find the user's login name, you could use the getlogin() function, among others, or the environment variable USER. Which should you choose?

When main() returns, or if exit() is called, all output streams are flushed, and all open files are closed. If the process ends by calling abort(), the runtime system is not obligated to close all files properly; for example, an output stream might not be flushed before closing.

When main() executes return n; it is as if it called exit(n). The exit() function takes care of cleaning up the process, by calling functions registered with atexit(), flushing open output streams, closing open streams (input or output), removing temporary files created by tmpfile(), and making the exit status available to the execution environment.

The header <stdlib.h> defines two macros for use as arguments to exit(), EXIT_SUCCESS and EXIT_FAILURE.

    #include <stdio.h>
    main()
    {
        printf("hello, world\n");
    }

    $ cc hello.c                compile with C89 on Linux or Mac OS X, using bash
    $ a.out
    hello, world
    $ echo $?                   print the exit status
    13

    % c99 hello.c               compile with C99 on Solaris, using tcsh
    "hello.c", line 4: warning: old-style declaration or incorrect type for: main
    % a.out
    hello, world
    % echo $status
    1

It is possible to obtain the address of a function with the unary & operator. This is the address in memory of the first instruction of the function. The expression &main is legal in C but not in C++.

main() cannot be inline'd, since the startup function needs to find main()'s address from information in the executable file.

In C11, the keyword _Noreturn is used to indicate a function that does not return. For example, the prototype for exit() would become

    _Noreturn void exit(int status);

instead of

    void exit(int status);

The same applies to abort(), longjmp(), _Exit() and quick_exit(). Obviously, you could not use _Noreturn with main().

    int main(int argc, char *argv[]) { ... }

To get direct access to the environment variables, add the POSIX standard declaration

    extern char **environ;

or use the nonstandard form

    int main(int argc, char *argv[], char *envp[]) { ... }

or both. environ and envp have essentially the same type, and have the same value when the program starts. The number of environment strings is not specified; start with environ[0] and continue until environ[i] is NULL, which indicates the end of the array. The same applies to envp[0] and envp[j].

In general, it is better to use getenv(3C) to search for a specific environment variable, so the pointer environ or the array envp are not normally used explicitly. Environment variables can be set with the library function putenv(3C). putenv() could cause environ to change, to make room for new string pointers, so in general it is safer to use environ (whose value could change to reflect a new environment variable) than envp (whose value will not change). Of course, if you really want to ignore changes to the environment as the program runs, then use envp. Note that calls to putenv() affect only the current process, not the command shell that started the process.
The command line is parsed by the shell into argv[0], ..., argv[argc-1], where argv[0] is the program name as the command was given. Although the number of argument strings is specified by argc, it is also the case that argv[] has a null element argv[argc] to mark the end of the array, as with environ[] and envp[]. In most cases, it is more natural to iterate through argv[] by incrementing an index, comparing the index to argc.
The exit status of the program is the return value from main(), or the value supplied to the system function exit(3C). Note that there is also a man page for exit(2). There is some additional information in intro(1). An exit status of 0 is interpreted as success, non-zero as failure. In the context of shell scripts with loops and conditionals, this would be treated as true or false.
The following conditions should hold for main():
There is no importance to the ordering of the environment strings. Windows requires the environment strings to be sorted alphabetically, but Unix does not.
The POSIX standard includes two functions setenv(3) and
unsetenv(3), which would be useful for a shell
program. These are implemented on Solaris 10, GNU/Linux and
Mac OS X but not Solaris 9.
Last revised, 28 Jan. 2013