We start with some background information about C programs on Unix, including a full example. Some of this will be review, and some of this will be new information. For information on remote access and editors, see the General Instructions. There will be examples of compiler commands later.
Standards. The POSIX standard describes the C language interface to a Unix-like system, and it often defers to the ANSI/ISO standard for C. This describes how system and library functions should behave. The on-line documentation for each particular operating system should indicate if there are any discrepancies. Links to the POSIX standard are given on the General Instructions page. Most Linux distributions now include various sections of the standard in their man pages.
Terminology. In the POSIX standard (IEEE Std 1003.1-2008), there are some important definitions:
"... sh utility and the system() function, although popen() and the various forms of exec() may also be considered to behave as interpreters."
The commonly used Unix and Linux command shells are sh (the Bourne shell, standardized by POSIX), csh (the C shell), tcsh (the Tenex C shell, another version of csh), ksh (the Korn shell), and bash (the Bourne-again shell, from GNU). You can see which shell you are using with a command like "echo $SHELL", as will be seen later.
Notation. The notation fork(2) means that the appropriate man page is in section 2 ("man -s 2 fork" for this one page, or "man -a fork" to see everything). The section numbers used here refer to Solaris, and may be different on Linux or other versions of Unix.
man pages (manual pages). These are written for people who have been using Unix already, and beginners often find them inadequate. Try the GNU documentation or one of the textbooks if the man page is too confusing.
The command syntax for the man command differs between systems. For example, on Solaris or Linux, "man -s 2 fork", while on Mac OS X, "man 2 fork". The apropos command is also useful.
One reason for using "man -a something" instead of "man something" is that you discover more information. Another is that Unix systems are inconsistent about which information goes in which section, so what worked on one system might not work on another.
On the Web. Here are some links to documentation related to program design. You might find these more informative or more accessible than the Solaris man pages. Examples are included.
GNU C Library, The Basic Program/System Interface (read this along with the current discussion)
GNU C Library, Signal Handling (read this later)
GNU C Library, Processes (read this later)
On-Line. The GNU C library information can also be accessed from Linux or the CSE Sun systems with the commands
info libc "program basics"
info libc "signal handling"
info libc processes
A quick guide to info is available. (Exercise: how many command-line arguments are there in each of these three examples?)
Command-line structure. More information is available in intro(1).
The simplest form of a Unix command is to name a program to run, and give it some arguments to use in the same way that a function uses parameters. When the program runs, the command-line arguments are given to it as an array of character strings. Arguments are separated into options and operands; options can be simple or have an option-argument. All the options should precede the operands. The general form is
utility_name [options] [operands]
The utility name is expected to refer to an executable file, or a command that is built into the shell. The brackets indicate that something is optional. For example,
ls -l -t -r foobar
will list directory information about the file system entry foobar, which can be a file or directory. The options all start with "-" and are one letter. The simple options can be combined, so that
ls -ltr foobar
has the same effect. In most cases the order of the options does not matter. Another example is
cc -v -o bgi bgi.c
The option -v is simple, the option -o has one option-argument (bgi), and the command has one operand (bgi.c). The parts of a command line are separated by "white space", which is some number of spaces, tabs and escaped-newlines (backslash followed by return). If an option has more than one option-argument, they are separated by commas; this is used infrequently. The man page for each utility will describe how it treats repeated options, or mutually-exclusive options.
The getopt(3C) function is the standard mechanism to separate options and operands. Examples of the use of getopt() can be found on its man page and in the bgi.n.c programs to follow. The GNU C library has two other mechanisms, getopt_long() and argp_parse(), which you could use after gaining experience with getopt(), but they are not part of the Sun or POSIX libraries.
Environment variables. If the command line is like a function call with parameters, then the environment variables are like global variables and hidden parameters. The shell maintains a set of strings of the form "name=value" which determine the environment. As the shell starts a new program, it copies the environment variables to a place where the new program can find them. The command printenv(1) will print the current set of environment variables known to the shell, and the set can be altered with the commands setenv and unsetenv (or export, declare, set and unset, depending on the shell). On Solaris, see set(1). If you already know the name of an environment variable, such as SHELL, then you can print its value with "printenv SHELL" or "echo $SHELL".
main() structure. The usual way to start a C main program is with
int main(int argc, char *argv[]) { ... }
To get direct access to the environment variables, add the POSIX standard declaration
extern char **environ;
or use the nonstandard form
int main(int argc, char *argv[], char *envp[]) { ... }
or both. environ and envp have essentially the same type, and have the same value when the program starts on Solaris and Linux. The number of environment strings is not specified; start with envp[0] and continue until envp[i] is NULL, which indicates the end of the array. The same applies to environ[0] and environ[j].
In general, it is better to use getenv(3C) to search for a specific environment variable, so the array envp or the pointer environ is not normally used explicitly. Environment variables can be set with the library function putenv(3C). putenv() could cause environ to change, to make room for new string pointers, so in general it is safer to use environ (whose value could change to reflect a new environment variable) than envp (whose value will not change). Of course, if you really want to ignore changes to the environment as the program runs, then use envp. Note that calls to putenv() affect only the current process, not the command shell that started the process.
The command line is parsed by the shell into argv[0], ..., argv[argc-1], where argv[0] is the program name as the command was given. Although the number of argument strings is specified by argc, it is also the case that argv[] has a null element argv[argc] to mark the end of the array, as with envp[] and environ[]. In most cases, it is more natural to iterate through argv[] by incrementing an index, comparing the index to argc.
The exit status of the program is the return value from main(), or the value supplied to the system function exit(3C). Note that there is also a man page for exit(2). There is some additional information in intro(1). An exit status of 0 is interpreted as success, non-zero as failure. In the context of shell scripts with loops and conditionals, this would be treated as true or false.
The following conditions should hold for main():
argc >= 0
argv is not NULL
for (i = 0; i < argc; i++) { argv[i] is not NULL; strlen(argv[i]) could be 0; }
argv[argc] is NULL
envp is not NULL
environ is not NULL
envp[0] could be NULL
environ[0] could be NULL
envp is equal to environ at the start of the program, but this could change if the program adds to its own environment. In this case, envp remains the same, and environ changes.
There is no importance to the ordering of the environment strings. Windows requires the environment strings to be sorted alphabetically, but Unix does not.
The POSIX standard includes two functions setenv(3) and unsetenv(3), which would be useful for a shell program. These are implemented on Solaris 10, GNU/Linux and Mac OS X but not Solaris 9.
General guidelines.
argv[0], ..., argv[argc-1]
argv[0] is the program name as the command was given
stdout with printf(...)
stderr with fprintf(stderr, ...)
getopt(3C)
argv[] array only once with getopt() (although this is not a completely firm rule)
Using C89,
Sun's compiler    cc -v -O -o bgi bgi.c
GNU compiler      gcc -Wall -Wextra -O -o bgi bgi.c
code checker      lint bgi.c
Using C99,
Sun's compiler    c99 -v -O3 -o bgi bgi.c
GNU compiler      gcc -std=c99 -Wall -Wextra -O -o bgi bgi.c
code checker      lint -Xc99 bgi.c
The optimization flags (-O and -O3) can be omitted for this short example program. On SPARC processors running Solaris with Sun's compiler (cc) the default is 32-bit addresses; you can request 64-bit addresses by adding the option -m64 or the option -xarch=generic64.
Some of what lint(1) complains about can be ignored. Use "man -a lint" to see complete instructions. Of course, you should read the man pages for cc and gcc to see what these additional options do for you. Note that the -v option on cc does not mean "verbose", as it does with GCC. Use "info gcc" for more complete information.
Run the program. At the prompt from the command shell, type the name of the program followed by its options and operands. If you get a response like "command not found", try using "./bgi" instead of "bgi". This problem is connected to the path variable in the command shell.
Now we'll go through the development of a complete example. Links to the code follow the specification and some output examples. The program itself is intended to be more-or-less realistic, but it's still just an example.
The objective of this example is to write, test and understand a Unix program bgi [BackGround Information] in C with command-line arguments, in the style of a typical Unix utility, using the standard library functions getopt(), getenv(), putenv(), setenv(), unsetenv(), exit(), printf(), fprintf(), atoi(). The program should return with an exit status in a style consistent with other Unix commands, and you should provide a simple shell script to test the exit status, using an if-then-else-endif construct (the exact syntax depends on the shell you choose) and the echo command. Most of the testing can be done interactively. The program should print the following when the -h option is used (this is the typical form of a help option):
usage: bgi [-h] [-v] [-a] [-e] [-p] [-s var=val] [-t var] [-u var] [-x stat]
The options should behave as follows:
-h            help, print a usage message to stdout (some additional output could be provided, explaining the options)
-a            print the command-line arguments, one per line
-e            print the environment variables using environ, one per line, in the style SOMETHING=something
-p            print the environment variables using envp[], one per line, in the style SOMETHING=something
-s var=val    set the environment variable var to the value val
-t var        print the value of the environment variable var, in the style var = "something" or var: not found
-u var        remove (unset) the environment variable var from the environment
-x stat       exit with the given status
-v            verbose mode; print the addresses of argv[i] with the -a option, print the addresses of environ[i] with the -e option, print the addresses of envp[i] with the -p option
The options can occur in any order, and may be repeated. If multiple options are used, the -t output should come before the -a output, which should come before the -p output, which should come before the -e output. The -s, -u and -x options produce no output except perhaps an error message. The -s and -u options affect only the current process, not the command shell that started it. The last -x option takes effect; if there is none, then exit with status 0, for success. The -x option does not cause an immediate exit, it just saves a status value for use later when the program would normally exit. If some other option is used, or there is an option-argument missing, this is an error, and the program should print (to stderr) the help message or something equivalent. Command-line operands are ignored except with the -a option.
The output from the command printenv should be identical to your output with the -e or -p option alone.
The exit status should be a byte-sized integer, between 0 and 255, but it might be interesting to avoid checking this condition, just to see what happens. The value of the exit status can be printed from a shell script or interactively with a command sequence like
bgi -x 17
echo $?
for sh, ksh, tcsh and bash, or
bgi -x 17
echo $status
for csh, tcsh. You should also try an if-then-else construct to see how the exit status can be used in a shell script (there is an example shortly).
Before reading further, try sketching the design of this program.
Here is an example using Solaris, where '%' is the shell's prompt. The commands have been highlighted. The addresses might be different when you try it.
% bgi -a -t USER -t SHELL
USER = "dheller"
SHELL = "/bin/tcsh"
argc = 6
bgi
-a
-t
USER
-t
SHELL
% bgi -v -a -t USER -t SHELL
USER = "dheller"
SHELL = "/bin/tcsh"
address ffbff0a4: argc = 7
address ffbff0a8: argv = ffbff0c4
address ffbff0c4: argv[0] = ffbff220 --> "bgi"
address ffbff0c8: argv[1] = ffbff224 --> "-v"
address ffbff0cc: argv[2] = ffbff227 --> "-a"
address ffbff0d0: argv[3] = ffbff22a --> "-t"
address ffbff0d4: argv[4] = ffbff22d --> "USER"
address ffbff0d8: argv[5] = ffbff232 --> "-t"
address ffbff0dc: argv[6] = ffbff235 --> "SHELL"
% bgi -v -z
bgi: illegal option -- z
bgi: invalid option 'z'
usage: bgi [-h] [-v] [-a] [-e] [-p] [-s var=val] [-t var] [-u var] [-x stat]
-h print help
-v verbose mode
-a print argc and argv[]
-e print environ[]
-p print envp[]
-s var=val set environment variable var to value val
-t var print value of environment variable var
-u var unset environment variable var
-x stat set exit status to stat
% bgi -a blat -t USER
argc = 5
bgi
-a
blat
-t
USER
Here is the same example on Mac OS X. The prompt is '$'.
$ bgi -a -t USER -t SHELL
USER = "dheller"
SHELL = "/bin/bash"
argc = 6
bgi
-a
-t
USER
-t
SHELL
$ bgi -v -a -t USER -t SHELL
USER = "dheller"
SHELL = "/bin/bash"
address 0xbffffd58: argc = 7
address 0xbffffd5c: argv = 0xbffffdf8
address 0xbffffdf8: argv[0] = 0xbffffe64 --> "bgi"
address 0xbffffdfc: argv[1] = 0xbffffe68 --> "-v"
address 0xbffffe00: argv[2] = 0xbffffe6b --> "-a"
address 0xbffffe04: argv[3] = 0xbffffe6e --> "-t"
address 0xbffffe08: argv[4] = 0xbffffe71 --> "USER"
address 0xbffffe0c: argv[5] = 0xbffffe76 --> "-t"
address 0xbffffe10: argv[6] = 0xbffffe79 --> "SHELL"
$ bgi -v -z
bgi: illegal option -- z
bgi: invalid option 'z'
usage: bgi [-h] [-v] [-a] [-e] [-p] [-s var=val] [-t var] [-u var] [-x stat]
-h print help
-v verbose mode
-a print argc and argv[]
-e print environ[]
-p print envp[]
-s var=val set environment variable var to value val
-t var print value of environment variable var
-u var unset environment variable var
-x stat set exit status to stat
$ bgi -a blat -t USER
argc = 5
bgi
-a
blat
-t
USER
Here is a sequence of initial versions of the program, with extensive comments, which you can use to get started. The fourth version was used to generate the output shown above. The fifth version cleans up some details, and completes this example. Use the browser to save these files - copy and paste is sometimes inaccurate, and retyping is just a waste of time.
bgi.1.c
bgi.2.c
bgi.3.c
bgi.3a.c (one character different)
bgi.4.c
bgi.5.c
bgi.sh (shell script, Solaris only)
bgi.gcc.sh (shell script, Solaris, Linux or Mac OS X)
A reasonable order in which to read the Solaris man pages for this example (and what to look for) is
man -s 1 intro
    introduction to commands and application programs, manual page organization, command syntax standard, diagnostics (exit status), list of commands
man -s 2 intro
    introduction to system calls and error numbers, errno.h, errno, various definitions, list of system functions
man -s 3 intro
    introduction to functions and libraries, include files, interfaces and headers, definitions, standard C library, memory allocators, networking, etc.
man -s 4 intro
    introduction to file formats
man -s 5 intro
    introduction to miscellany, standards, environment, etc.
man -s 3C exit
    stdlib.h, exit(), EXIT_SUCCESS, EXIT_FAILURE
man -s 2 exit
    most of this information will be relevant later
man -s 3C stdio
    stdio.h, stdin, stdout, stderr, EOF (end of file), NULL (null pointer)
man -s 3C printf
    printf(), fprintf(), format conversions, examples
man -s 3C getopt
    getopt(), relevant example
man -s 3C atoi
    string to int conversion, but without error checking
man -s 5 environ
    user environment, environment strings; try the command printenv first; most of the specific environment information is not needed here
man -s 3C getenv
    getenv()
man -s 3C putenv
    putenv(), heed the warning!
man printenv
which printenv
    printenv is built into the shell tcsh, but is a separate utility under csh or sh or ksh or bash
man -s 1 set
    setenv, unsetenv, etc.
man tcsh
    or, man csh, depending on which shell you are using
man sh
    the Bourne shell, adopted by POSIX, because there is less reading to reach what you need here (the if-then-else)
The command which printenv will be useful to see whether the command printenv is built into the shell (tcsh) or is a separate command (sh, csh, ksh, bash).
Here are some test cases to try. If you already have files named T1 and T2, and you don't want to destroy them, then pick some other names.
bgi
bgi -h
    (does the message go to stdout or stderr?)
bgi -g
    (same question)
bgi -x 0
bgi -x 1
    (the shell script will be needed to see any difference)
bgi -a
bgi -e
bgi -p
bgi -p > T1
printenv > T2
diff T1 T2
    (there should be no difference, but there is using the bash shell)
bgi -t USER -t GROUP -t SHELL
    (also try this as a command in a makefile, or with another shell)
bgi -t LC -t USER -a
bgi -t LC -a -t USER
setenv foo 13
bgi -t foo
unsetenv foo
bgi -t foo
Some experiments with getopt(): (digit 1, not letter l)
bgi -v x 1
bgi -v -x 1
bgi -v -x1
bgi -vx1
bgi -av
bgi -va
Here is another test script using the sh shell. If you already have files named e, p, p2, then pick some other names.
# test bgi.c with Sun and GNU C compilers
for compile in "cc -v" "gcc -Wall -Wextra"
do
    echo compiling with $compile
    # -f: do not complain if bgi does not exist yet
    rm -f bgi
    if $compile -o bgi bgi.c
    then
        echo testing -s option
        bgi -t foo
        bgi -t foo -s foo=bar
        bgi -t foo -s foo=bar -t foo
        echo testing -e and -p options
        bgi -e > e
        bgi -p > p
        printenv > p2
        echo compare -e and -p
        diff e p
        echo compare -p and printenv
        diff p p2
        rm e p p2
        echo compare -e and -p after -s
        bgi -s foo=bar -e > e
        bgi -s foo=bar -p > p
        diff e p
        rm e p
        echo done
    else
        echo
        echo compile failed, test abandoned
    fi
    echo
done
Some additional notes.
When writing any C program, check all the system functions and library functions to verify that they are being used correctly, and that possible runtime errors are detected. No one bothers checking to see if printf() or fprintf() fail - what would you do about it, anyway?
Some versions of printf() don't mind character string pointers that are NULL, and some will choke. You might need to check the value of a pointer before passing it to printf(). The runtime error messages "segmentation fault" and "bus error" usually mean "bad pointer". Sun's position is that the C programming language standard says that the argument for a %s specifier shall be a pointer to an array of characters, and while NULL is indeed a pointer it clearly does not qualify as a pointer to an array of characters. The GNU C library will accept a NULL string pointer for printf(), but usually not for other functions. Would you rather find your bugs sooner or later?
Most Unix utilities accept "--" (two hyphens) as a separator between the command-line options and operands, in case one of the operands starts with "-". The operand "-" usually refers to stdin. What does getopt() do about these special cases?
Unix has been historically inconsistent about whether help and error messages go to stdout or stderr. The current standard is help to stdout, error messages to stderr.
Single-character constants like 'c' have type int in C, not type char as in C++. Usually this does not matter because of the type conversion rules.
Why is the return value from getopt() an int and not a char? getopt() returns either a character (an option letter, or '?' or ':' to report a problem with an option) or the int value -1 (end of the options). The dual use of return values is often described as a design flaw, and it is a common problem in the C libraries. The global variables optarg, optind, opterr, optopt simplify the interface to getopt(), as they might have been parameters to a more complex function. If there is a problem with one of the options, getopt() will print a message to stderr unless you disable this feature by first setting opterr to 0. optopt and optind are used to see which option and command-line argument you are currently referring to. This is necessary because some options can be combined into the same argument.
GNU's Not Unix. No kidding.
The GNU/Linux getopt(3) function differs from the Solaris getopt(3C) function. Here is the earlier example, for comparison, but now run on Linux. The commands have been highlighted.
$ ./bgi -a -t USER -t SHELL
USER = "dheller"
SHELL = "/bin/tcsh"
argc = 6
./bgi
-a
-t
USER
-t
SHELL
$ ./bgi -v -a -t USER -t SHELL
USER = "dheller"
SHELL = "/bin/tcsh"
address 0xbfe301e0: argc = 7
address 0xbfe301e4: argv = 0xbfe30264
address 0xbfe30264: argv[0] = 0xbff13b05 --> "./bgi"
address 0xbfe30268: argv[1] = 0xbff13b0b --> "-v"
address 0xbfe3026c: argv[2] = 0xbff13b0e --> "-a"
address 0xbfe30270: argv[3] = 0xbff13b11 --> "-t"
address 0xbfe30274: argv[4] = 0xbff13b14 --> "USER"
address 0xbfe30278: argv[5] = 0xbff13b19 --> "-t"
address 0xbfe3027c: argv[6] = 0xbff13b1c --> "SHELL"
$ ./bgi -v -z
./bgi: invalid option -- z
./bgi: invalid option 'z'
usage: ./bgi [-h] [-v] [-a] [-e] [-p] [-s var=val] [-t var] [-u var] [-x stat]
-h print help
-v verbose mode
-a print argc and argv[]
-e print environ[]
-p print envp[]
-s var=val set environment variable var to value val
-t var print value of environment variable var
-u var unset environment variable var
-x stat set exit status to stat
$ ./bgi -a blat -t USER
USER = "dheller"
argc = 5
./bgi
-a
-t
USER
blat
Note from the last example that the GNU/Linux getopt() permutes the argv[] array. This is non-standard behavior. One way to force the standard behavior is to define the environment variable POSIXLY_CORRECT (if you believe one part of the documentation) or _POSIX_OPTION_ORDER (if you believe another part of the documentation). The bash commands would be
declare -x POSIXLY_CORRECT
declare -x _POSIX_OPTION_ORDER
The csh or tcsh commands would be
setenv POSIXLY_CORRECT
setenv _POSIX_OPTION_ORDER
POSIXLY_CORRECT is the right one to use, but you could also insert the character '+' at the head of getopt()'s third argument (this is non-standard). However, that defeats the use of ':' as the first character, which makes it easier to detect missing option-arguments. So much for consistent and orthogonal extensions to the standard.
_POSIX_, but not _POSIX_OPTION_ORDER, and none of them are environment variables.
There are a few other incompatibilities that arise with the bgi program.
The GNU/Linux putenv() function can remove something from the environment. Try this on Solaris, Linux and Mac OS X:
gcc -Wall -Wextra -o bgi bgi.4.c
bgi -s a=A -t a -s a -t a
Solaris
a = "A"
a = "A"
Linux
a = "A"
a: not found
Mac OS X
a = "A"
bgi: putenv(a) failed: Invalid argument
a = "A"
The unsetenv() function returns an int, according to the POSIX standard. On Mac OS X the compiler formerly complained
% gcc -Wall -Wextra -o bgi bgi.4.c
bgi.4.c: In function 'main':
bgi.4.c:217: error: void value not ignored as it ought to be
because unsetenv() had a void return; the man page for unsetenv() notes this. The standard implementation is used on Linux, but the man page is wrong (if you are feeling adventuresome, compare man -a unsetenv and the file /usr/include/stdlib.h). The cure for the Mac OS X problem was simply to insist on the standard version:
gcc -std=c99 -D_POSIX_C_SOURCE=200112L -D_XOPEN_SOURCE=600 -Wall -Wextra -o bgi bgi.4.c
Q&A
Q. From bgi.3.c,
printf(" address %10p: argv[%d] = %10p --> \"%s\"\n",
       &argv[i], i, argv[i], argv[i]);
Could you explain why &argv[i] and argv[i] give two different addresses? Let me make a guess: &argv[i] gives the address of argv since by default a pointer of char array always points to the first element so &argv[i] gives the current address of the pointer argv? and argv[i] always gives the address of where argv[i] is stored?
A. argv has type char *[] so each of its elements has type char *. argv[i] has type char *, and its value points to some character string (or is NULL). The character string is an array of char elements in memory. &argv[i] is the address of argv[i], and has type char **. The operator precedence of C and C++ causes [] to be evaluated before &.
The best way to answer this kind of question is to start drawing boxes for memory locations and arrows from one box to another when you have a pointer. It's kinda hard to do in plain text.
For this particular case we have
argv ---> argv[0]    ---> "string"
          argv[1]    ---> "another string"
          ...
          argv[argc] == NULL
Actually, argv[0] points to argv[0][0], the value of argv[0][0] is 's', the value of argv[0][1] is 't', etc. And, the argv array should go upward, since argv[1] is at a higher address than argv[0].
Last revised 31 Jan. 2013