CMPSC 311, Introduction to Systems Programming
Introduction to Unix
Reading and References
- These notes are based on APUE, Foreword, Preface 2, Preface 1,
and Chapter 1 (UNIX System Overview).
- We have added references to similar material in CS:APP, CP:AMA
and C:ARM.
- See also the General Instructions, so you can log in to and use Solaris or
Linux.
- Try the sample programs from APUE as you go. We provide
links to versions of the programs that do not depend on
apue.h,
in case you don't have the book. The differences are only
in the error-reporting mechanisms.
Reading
- CS:APP
- Preface
- Ch. 1, A Tour of Computer Systems
- Ch. 10, Intro (p. 862), Sec. 10.1-10.3, 10.8-10.10,
System-Level I/O
- Ch. 8, pp. 702-703; Sec. 8.2, Processes; Sec. 8.4, Process
Control
- Sec. 8.3 and Appendix A explain the error-handling wrappers,
which may be a little confusing at first.
- For some more about processes and memory, see Ch. 9, Sec.
9.1-9.2. We'll get back to this later.
- CP:AMA
- Preface
- Ch. 1, Introducing C
- Ch. 2, Fundamentals (this should be mostly review for you,
but read it anyway)
- Ch. 22, Input/Output (esp. Sec. 22.1, Streams; Sec. 22.4,
Character I/O; Sec. 22.5, Line I/O).
- APUE, see above
- C:ARM, as noted below
It all started with ...
- AT&T Bell Labs
- the telephone monopoly
Modern trends
- Public standards
- Open source
- Community development
- You know how it should work, and you can see how it does work
(or doesn't, and then you can help fix it).
Programming language standards
- ANSI Standard C, ISO
Standard C
- first version, 1989-1990
- second version, 1999
- most recent revision, approved Dec. 2011
System interface standards
- AT&T System V Interface Definition
- X/Open Portability Guide
- IEEE POSIX
- The Open Group, Single UNIX
Specification
- most recent revision, 2008
- Linux Standard Base
Commercial systems
- AT&T research systems, since 1969
- Ken Thompson, Dennis Ritchie, Brian Kernighan, Doug McIlroy,
etc.
- 6th edition, Version 6; 7th edition, Version 7; 8th, 9th,
10th editions
- Plan 9 from Bell
Labs
- AT&T commercial products
- System III, 1981
- System V, 1983; System V Release 4, 1988-95
- AT&T derivatives
- SCO UnixWare
- the former Microsoft Xenix, 1980, transferred to SCO in 1987
- etc.
Non-Commercial systems
- Univ. of California, Berkeley - BSD series, up to 1995
- Carnegie-Mellon Univ. - Mach kernel, 1985-94
- Free Software Foundation,
since 1985; GNU Project,
since 1983
- compilers, libraries and utility programs - everything but
an OS kernel
- eventually (maybe; who knows?) the GNU Hurd operating system
kernel
Commercial systems
- Sun Microsystems, SunOS, Solaris,
OpenSolaris
- SunOS and Solaris were derived from SVR4, with many changes
- Sun was purchased by Oracle Corp. in 2010
- OpenSolaris source
code browser
- IBM, AIX, z/OS (which is a mainframe operating system)
- Silicon Graphics Inc., IRIX
- Hewlett-Packard, HP-UX
- Digital Equipment Corp. / Compaq / HP, Ultrix, Digital Unix,
Tru64 Unix
- DEC/IBM/HP consortium, OSF/1 (based on Mach)
- Apple, Mac OS X
Semi-Commercial systems
- Linux, since 1991
- Red Hat, SUSE/Novell, Caldera
(defunct, SCO), Debian,
Mandrake/Mandriva, Slackware, Gentoo, Ubuntu, Knoppix,
Fedora, etc., etc.
- Linux kernel source code
browser
- Android, since 2003
- Linux kernel, Open Handset Alliance, Android Open Source
Project
- Android
(Wikipedia)
Unix Timeline,
1969 to date
APUE, Sec. 1.2, UNIX Architecture
An Operating System provides services to running programs
(processes), and manages resources on their behalf.
- physical resources -- hardware -- processor, memory, disk, I/O
devices
- abstract resources -- hide the details of a physical resource
behind a software layer
- virtual resources -- construct multiple abstract resources by
sharing physical resources
- manage resources
- allocate resources, prevent their misuse
- provide abstractions of the resources
- processor --> instruction set architecture, assembly
language
- processor, main memory --> processes and threads, address
space
- main memory, disk memory --> virtual memory
- disk memory, I/O devices --> files, networks
- whole system --> virtual machine
- map abstract resources to virtual resources over time
(scheduling)
- map virtual resources to physical resources over time
(virtualization)
- run one process for a while, switch to another process, run
it for a while, etc.
- load parts of several processes into main memory, and share
the space
Nearly everything about I/O can be abstracted as a file.
- An open device appears to be an open file.
- See the
/dev directory for the file interfaces
to devices.
- An open network connection (a socket) appears to be an open file.
- How can this help in debugging?
- On Solaris, see the
/proc file system for
process information.
- On Linux, see the
/proc file system for process
and system information.
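A minimal C sketch of "an open device appears to be an open file": the hypothetical helper null_device_demo() opens the device /dev/null by pathname with open(), then uses write() and read() on it exactly as it would on a disk file. /dev/null discards everything written to it, and every read from it reports end-of-file.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: open the device /dev/null as if it were a file,
   write len bytes of msg to it, then try to read from it.
   Returns what read() returned (0, end-of-file, for /dev/null),
   or -1 if any step failed. */
ssize_t null_device_demo(const char *msg, size_t len)
{
    char buf[64];
    int fd = open("/dev/null", O_RDWR);        /* a device, opened by pathname */
    if (fd < 0)
        return -1;
    if (write(fd, msg, len) != (ssize_t)len) { /* the device discards the bytes */
        close(fd);
        return -1;
    }
    ssize_t n = read(fd, buf, sizeof buf);     /* reading /dev/null gives end-of-file */
    close(fd);
    return n;
}
```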
Software layers
- OS kernel -- direct interaction with hardware
- system calls -- interface to the kernel
- system libraries -- wrappers around system calls
- programming language libraries -- wrappers around system
libraries
- system utilities -- application-independent tools
- command interpreter, command shell -- user interface
- application libraries -- application-specific tools
- applications -- complete programs for ordinary users
- some applications have their own command shells and
programming-language facilities
Toolkits and connections between tools
- Operating Systems also provide methods for connecting hardware
and software components.
- "There are many people who use UNIX or Linux who IMHO do not
understand UNIX. UNIX is not just an operating system, it
is a way of doing things, and the shell plays a key role by
providing the glue that makes it work. The UNIX
methodology relies heavily on reuse of a set of tools rather
than on building monolithic applications. Even perl
programmers often miss the point, writing the heart and soul of
the application as perl script without making use of the UNIX
toolkit." -- David
Korn, author of the Korn shell, 2001
- We'll see examples of this as we go on.
APUE, Sec. 1.3, Logging In
(generic)
Login protocol
- You provide a user name (login name) and a password, and maybe
another identifier.
- The system verifies that this information matches a known
legal login name and its password.
- A command shell is started, based on your user information
stored in some administrative database.
(PSU CSE specific)
Log in via
- Workstations in 218 IST - Dell systems running Red Hat Linux
- Servers in the CSE Dept. - from
Oracle, formerly Sun Microsystems (running Solaris), and Dell (running Red Hat Linux)
- SSH Client, from Windows on your own system, with a terminal
window
- ssh command, from a terminal window
- PuTTY, from Windows in 220 IST
- You might need to connect to the CSE Dept. Cisco Virtual
Private Network first, if not on campus.
(PSU CSE specific)
Login instructions
- locally, in
218 IST
- remotely,
to 218 IST
- The CSE file system is shared between all the CSE Unix and
Windows systems, but it is separate from the PSU file system.
- The files in your CSE home directory are backed up, but it
might not be a bad idea to keep copies on your own USB drive.
Important
- Do not use control-C when you want to copy text on a terminal
window.
- The Windows SSH client has Copy and Paste buttons, or you
can use control-Insert and shift-Insert in place of control-C
and control-V.
- The Unix/Linux window managers have similar features - check
the menu bar.
- On a Mac, use Command (⌘)-C and Command (⌘)-V in the usual
way.
- On Unix systems, in a terminal window, control-C sends a
signal to a running program that causes the program to
terminate.
Try these commands
cat /etc/passwd    (the public part of the password file)
echo $HOME         (your home directory)
echo $SHELL        (which shell is started at login?)
hostname           (which system is this?)
w                  (who is logged in, and what are they doing?)
who                (who is logged in?)
ls                 (what files and directories do I have?)
pwd                (which directory am I in?)
A shell is a command-line
interpreter.
- reads user input
- executes some commands itself
- to execute other commands, finds the appropriate program and
starts a new process
- login shell
- interactive shell
- input from a keyboard, single commands with history and
editing (try the arrow keys)
- shell scripts
- input from a file, as a programming language
- See APUE Sec. 1.3 for a brief history of different shells.
cat /etc/shells
man shells
ls -l /etc/shells
ls -l `cat /etc/shells`
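- Note: this uses the back-quote character ` , not the single-quote character ' . The shell runs the command inside the back-quotes and replaces it with that command's output, so ls -l receives the pathnames listed in /etc/shells.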
- The Bourne shell sh is standardized in Posix.
- bash = the Bourne-again shell, from GNU
- The C shell
csh has been replaced by the Tenex C
shell tcsh, which is widely available.
- The Korn shell
ksh is another example.
- Its merits (and many other topics) are discussed by its
author in an interview
on Slashdot in 2001.
- The command prompt is typically
$ or %,
but is configurable. It may include the machine name and
your current directory.
- Shell scripts should be written for the Bourne shell.
- You can start any of the shells for interactive use, after
login; just type the appropriate command.
Using man pages
- The Unix manual sections and the
man command
syntax vary from one system to another.
- The notation
ls(1) refers to the entry for ls
in section 1.
- Run man -a instead of man to see all the entries.
- Type <space> to go to the next page.
- On Linux and Mac OS X, type q to quit at the end of a man page (or at any point before the end).
Some vocabulary
- A utility is a
program that is provided with the system.
- A command is input
to a command interpreter.
- Some commands are handled by the command interpreter
directly, and some require starting another program.
HOME and SHELL are examples of environment variables.
There are also shell variables,
and we'll discuss both of these in more detail later.
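Inside a C program, environment variables are looked up with getenv() from <stdlib.h>. The helper env_or_default() is a hypothetical name used only for this sketch. Keep in mind that a shell variable becomes visible to getenv() only if the shell exports it.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>

/* Hypothetical helper: look up an environment variable of this process.
   getenv() returns NULL if the variable is not set, so substitute a
   fallback string in that case. */
const char *env_or_default(const char *name, const char *fallback)
{
    const char *value = getenv(name);
    return value != NULL ? value : fallback;
}
```

For example, printf("%s\n", env_or_default("HOME", "(not set)")) prints the same directory as echo $HOME.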
To end a shell, type the command exit. You may
need to close the terminal window afterward.
To logout, type the command logout. Some window
managers have a button labeled EXIT, and others have Logout as an
option on some menu. Mac OS X has its own tricks; if you use
it, you know them already.
Don't forget to disconnect from the VPN if you needed it to log in
remotely.
If you are using a shared system, it is impolite to lock the screen
and then leave. It is especially impolite to turn the system
off after you logout, as someone else could be logged in remotely.
APUE, Sec. 1.4, Files and Directories
Hierarchical file system
- A directory contains
a list of file names and directory names.
- A directory entry contains a name and attributes of the
name.
- The attributes of a file include the name of its owner,
access permissions, size, location of the contents, etc.
- The actual implementation varies in some details from one
system to the next, but much of it is standardized.
- A folder is just a
pretty picture that the GUI uses to display a connection to a
directory.
- A pathname is a
sequence of filenames
separated by
/
- a filename could be a file name or a (sub)directory name
- root, root directory
- absolute pathname, fully-qualified pathname
- starts with
/
- for example,
/usr/bin/gcc
- relative to the root directory of this file system
- relative pathname, partially-qualified pathname
- does not start with
/
- for example,
foo/bar
- relative to the current directory
- a single filename, such as bar, is also a relative pathname
- Unix file and directory names are case-sensitive.
(More about this later.)
- Filenames don't have extensions. (More about this later.)
- current directory, current working directory, present working
directory
- associated with each running program (process)
- prefixed to a relative pathname, to form an absolute
pathname
- shorthand notations
- home directory (tilde)
~
- current directory (dot)
.
- parent directory (dot-dot)
..
- dot and dot-dot are actual directory entries; tilde is
expanded by the command shell
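The rule "the current directory is prefixed to a relative pathname" can be sketched in C with getcwd(). make_absolute() is a hypothetical helper; it does not simplify . or .. components and does not follow symbolic links, which a full pathname resolver (such as the Posix function realpath(), for existing files) would have to handle.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: form an absolute pathname from `path`.
   Absolute pathnames are copied unchanged; relative pathnames get the
   current working directory prefixed.  No "." or ".." processing is done.
   Returns 0 on success, -1 if getcwd() fails or `out` is too small. */
int make_absolute(const char *path, char *out, size_t outlen)
{
    if (path[0] == '/') {                     /* already absolute */
        if (strlen(path) >= outlen)
            return -1;
        strcpy(out, path);
        return 0;
    }
    if (getcwd(out, outlen) == NULL)          /* this process's current directory */
        return -1;
    size_t len = strlen(out);
    /* don't double the slash when the current directory is the root, "/" */
    const char *sep = (out[len - 1] == '/') ? "" : "/";
    int n = snprintf(out + len, outlen - len, "%s%s", sep, path);
    if (n < 0 || (size_t)n >= outlen - len)   /* output would be truncated */
        return -1;
    return 0;
}
```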
Some directories at the root level
/bin       executable files (binaries)
/usr       user libraries and applications
/usr/bin   more executable files
/etc       configuration files
/dev       device interfaces
/tmp       temporary files
- Mac OS X organizes executables and applications differently.
- Generally speaking, each system has its own locally-stored
file system, with some easy way to refer to file systems
elsewhere on the network.
- On the PSU CSE network,
/tmp is local storage,
while most other directories and files are on a networked file
server.
- Your Desktop directory could be local storage or networked
storage. It's probably local.
- Do not confuse file and directory names with Uniform Resource
Locators (URL's) used with Web browsers.
Directory-related commands
ls      list the filenames in the current directory
cd      change the current directory to your home directory
cd x    change the current directory to x
pwd     print the absolute pathname of the current directory
There are "hidden files".
- Any file or directory name that starts with . (dot) is not normally listed by the ls command.
- ls -a will list all files, including the hidden files and hidden directories.
The command shells can do filename expansion and match multiple
filenames in your current working directory.
~       (tilde) expands to the absolute pathname of your home directory
*       matches any sequence of characters (including none)
?       matches any one character
[a-z]   matches any one character between a and z
- combinations are allowed
Try these commands (the semicolon
indicates sequential execution)
cd / ; pwd ; ls ; echo [def]*
cd etc ; pwd ; ls ; ls . ; ls ..
cd ; pwd
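The shell performs filename expansion before the command runs, so when [def]* matches something, echo receives only the matching names, not the pattern. A C program can apply the same matching rules to a single name with the Posix function fnmatch() from <fnmatch.h>; the wrapper matches() is a hypothetical name for this sketch. The FNM_PERIOD flag makes wildcards skip a leading dot, the way the shell leaves hidden files out of *.

```c
#define _POSIX_C_SOURCE 200809L
#include <fnmatch.h>

/* Hypothetical wrapper: does `name` match the shell-style pattern?
   FNM_PERIOD: a leading '.' in name must be matched by a literal '.',
   not by '*', '?' or a bracket expression. */
int matches(const char *pattern, const char *name)
{
    return fnmatch(pattern, name, FNM_PERIOD) == 0;
}
```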
Exercise 1 --- Solution (no peeking!)
It is stated in APUE (p. 4),
"The only two characters that cannot
appear in a filename are the slash character (/) and
the null character. The slash separates the filenames that
form a pathname and the null character terminates a
pathname. Nevertheless, it's good practice to restrict the
characters in a filename to a subset of the normal printing
characters."
- Why are DOS-style pathnames incompatible with C character
strings?
- for example, C:\Program Files\Outlook Express\MSIMN.EXE
- Why is it a bad idea to use a space character or a tab
character in a filename?
- Why is it a bad idea to use a colon in a filename?
- Why is it a bad idea to use a semicolon in a filename?
- Same question, ampersand
- Same question, question mark, equals sign, ampersand
- Same question, percent sign
Files and directories have various properties (attributes)
associated with them.
% ls -l x.c bits.html
-rw-rw----  1 dheller  fcse  3736 Aug 19  2002 bits.html
-rw-------  1 dheller  fcse   533 Dec  3  2004 x.c
- file type (here, the first character - indicates a "plain file"; also, d for a directory, etc.)
- permissions (user, group, others; read, write, execute)
- number of links to the file (hard links, not soft links or
symbolic links)
- file owner and group (for matching permissions)
- number of bytes
- last modification time
- file name
- Unix filenames do not use filename extensions.
- In the filename
foo.pdf, ".pdf" is
part of the filename, but it's a suffix, not an extension.
- Many programs do pay attention to the filename suffix,
however.
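The attributes that ls -l prints come from the stat() system call. A minimal sketch, assuming we only want the first column: mode_string() and file_mode_string() are hypothetical helpers that turn st_mode into a string like -rw-rw----. Unlike ls, this sketch ignores the setuid, setgid and sticky bits, and it uses stat(), which follows symbolic links (ls -l uses lstat() to describe the link itself).

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/stat.h>

/* Hypothetical helper: build the 10-character type-and-permission string
   that ls -l prints.  buf needs room for 11 chars, including '\0'. */
void mode_string(mode_t mode, char buf[11])
{
    static const char rwx[] = "rwxrwxrwx";
    buf[0] = S_ISDIR(mode)  ? 'd'
           : S_ISLNK(mode)  ? 'l'
           : S_ISCHR(mode)  ? 'c'
           : S_ISBLK(mode)  ? 'b'
           : S_ISFIFO(mode) ? 'p'
           : S_ISSOCK(mode) ? 's'
           : '-';                               /* plain file (or unknown type) */
    for (int i = 0; i < 9; i++)                 /* user, group, others; r, w, x */
        buf[1 + i] = (mode & (0400 >> i)) ? rwx[i] : '-';
    buf[10] = '\0';
}

/* Hypothetical helper: fill buf for the named file.
   Returns -1 (with errno set) if stat() fails. */
int file_mode_string(const char *path, char buf[11])
{
    struct stat sb;
    if (stat(path, &sb) < 0)
        return -1;
    mode_string(sb.st_mode, buf);
    return 0;
}
```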
The command file reads (part of) the file contents,
and tries to guess what type of file it is.
We created a file using MS Word (Mac 2008), and saved it in four
different formats.
% ls -l
-rw-------@ 1 dheller  fcse  22016 Aug 10 12:08 foo.doc
-rw-------@ 1 dheller  fcse  23398 Aug 10 12:07 foo.docx
-rw-------@ 1 dheller  fcse  26257 Aug 10 12:08 foo.rtf
-rw-------@ 1 dheller  fcse      5 Aug 10 12:13 foo.txt
% file *
foo.doc:  Microsoft Office Document
foo.docx: Zip archive data, at least v2.0 to extract
foo.rtf:  Rich Text Format data, version 1,
foo.txt:  ASCII text, with CRLF line terminators
% cat foo.txt
foo
If you rename foo.doc to foo.bar, or foo.html,
then it is still recognized as a Word file by Mac OS X.
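The file command works mostly by looking for "magic numbers", fixed byte patterns near the start of a file; the real command consults a large database of them. The sketch below checks only three patterns relevant to the example: a Zip archive begins with the bytes P K 03 04 (a .docx file is a Zip archive), an older Word .doc file is an OLE2 compound document beginning with D0 CF 11 E0 A1 B1 1A E1, and an RTF file begins with {\rtf. The function names are hypothetical.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: guess a file type from its first bytes only. */
const char *guess_type(const unsigned char *buf, size_t n)
{
    static const unsigned char ole2[] = {0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1};
    if (n >= 4 && memcmp(buf, "PK\x03\x04", 4) == 0)
        return "Zip archive data";               /* e.g. .docx */
    if (n >= sizeof ole2 && memcmp(buf, ole2, sizeof ole2) == 0)
        return "Microsoft Office Document";      /* e.g. older .doc */
    if (n >= 5 && memcmp(buf, "{\\rtf", 5) == 0)
        return "Rich Text Format data";
    return "unknown";                            /* no magic number matched */
}

/* Hypothetical helper: read the start of a file and guess its type.
   Returns NULL if the file can't be opened. */
const char *guess_file_type(const char *path)
{
    unsigned char buf[16];
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return NULL;
    size_t n = fread(buf, 1, sizeof buf, fp);    /* only the first few bytes */
    fclose(fp);
    return guess_type(buf, n);
}
```

Since the filename is never consulted, renaming foo.doc to foo.bar gives the same answer.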
Source code from APUE (modified to avoid apue.h)
- Fig. 1.3 -- read a
directory, print the filenames
- right-click on the link to download the file
- A simple explanation of
printf()
- A quick explanation of the program
- take one command-line argument (argv[1]), and use it as a directory name
- open the directory with opendir()
- obtain a pointer (dp) to a struct (type DIR) that contains information about the directory
- read the directory entries with readdir() in a loop
- each time readdir() is called, it advances to the next directory entry
- obtain a pointer (dirp) to a struct (type struct dirent) that contains information about the directory entry
- close the directory with closedir()
- check all the return values in case there's an error of some kind
- compile the program with
cc fig1.3.c
The executable file is named a.out by default.
- If you want to name the executable file myprog, then compile with
cc -o myprog fig1.3.c
- If you want to name the executable file ls, then compile with
cc -o ls fig1.3.c
This may lead to confusion about which ls program to run, yours or the system's. It's easily resolved with some more information; meanwhile, don't do it.
- If you want to see more warning messages from the compiler (a good idea!), then use
cc -v fig1.3.c (on Solaris only)
gcc -Wall -Wextra fig1.3.c (on Solaris, Linux or Mac OS X)
- run the program (you might need to use ./a.out instead of a.out; we'll explain this later)
- verify that the program is correct
- This part uses some techniques that are discussed in the
next section.
a.out . | sort > /tmp/x
/bin/ls -a . > /tmp/y
diff /tmp/x /tmp/y
rm /tmp/x /tmp/y
- If the
diff command reports nothing, then the
program is correct, except for the ordering of the output.
- How could these verification commands report a problem, even
if the program is correct?
- What directory are you in?
- Who else is doing the same thing?
- Does it matter that you lost the previous contents of
/tmp/x
and /tmp/y?
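The steps listed for Fig. 1.3 can be sketched as a C function. This is our own reconstruction from the description, not APUE's code; list_directory() is a hypothetical name, and it writes to any FILE * so that it can also be tested. Because readdir() returns NULL both at the end of the directory and on an error, the sketch clears errno before each call to tell the two apart.

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <errno.h>
#include <stdio.h>

/* Hypothetical helper: print each entry of directory `name` on `out`,
   one per line: opendir(), then readdir() in a loop, then closedir().
   Returns the number of names printed, or -1 if the directory
   couldn't be opened. */
long list_directory(const char *name, FILE *out)
{
    DIR *dp = opendir(name);
    if (dp == NULL) {
        perror(name);                  /* e.g. no such directory, or no permission */
        return -1;
    }
    long count = 0;
    for (;;) {
        errno = 0;                     /* NULL from readdir(): end, or error? */
        struct dirent *dirp = readdir(dp);
        if (dirp == NULL) {
            if (errno != 0)
                perror("readdir");
            break;
        }
        fprintf(out, "%s\n", dirp->d_name);
        count++;
    }
    if (closedir(dp) < 0)
        perror("closedir");
    return count;
}
```

A main() that checks that argc is 2 and calls list_directory(argv[1], stdout) gives a program that can be checked with the same a.out . | sort commands.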
The Unix abstractions go even farther than what we described so far.
- A file is a sequence of bytes.
- Depending on permissions, you can open it, read it, write
it, close it, and share it with other users.
- Every I/O device can be modeled as a file.
- This includes memory, disks, keyboards, displays, and
networks.
- All system I/O can be made to appear as reading and writing
a file.
APUE, Sec. 1.5, Input and Output
Standard Input (stdin) and Standard Output (stdout)
are abstractions of some input source and output target. The
OS connects stdin to an input source, depending on how
you started the program, and similarly for stdout.
When the program runs, it only needs to know that stdin
and stdout are connected to something, and (usually)
won't need to care exactly what they are connected to. When
using an interactive shell, stdin and stdout
are normally connected to the keyboard and terminal window, but it's
easy to change that.
Standard Error (stderr) is a separate output channel
used for reporting errors. A frequent situation is that stdout
is connected to a file, and stderr is connected to a
terminal window with an interactive user. Server programs,
which have no interactive user, typically send error messages to a
separate log file for later inspection.
The Standard I/O Library <stdio.h>
defines stdin, stdout and stderr,
and a lot of other stuff such as printf(). See
also CP:AMA Ch. 22, or C:ARM Sec. 15.4.
stdin, stdout
and stderr in C correspond to cin, cout and cerr
in C++, but they are accessed via functions, and not through
methods and operators.
You can override the default connections on the shell's command
line, with the symbols <, > and |
(vertical bar). These can be combined.
command > file            stdout is connected to file (output redirection)
command < file            stdin is connected to file (input redirection)
command < file1 > file2   stdin is connected to file1, and stdout is connected to file2
command1 | command2       stdout of command1 is connected to stdin of command2, and the two processes run concurrently (pipeline)
At this early stage, you should leave stderr connected
to the terminal, but it is also possible to connect it to a
file. The exact syntax of this depends on which shell you are
using.
A filter is a program that
reads from stdin, and writes to stdout,
and uses no other files except perhaps stderr.
See the use of sort above.
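A filter can be very small. The sketch below (upcase_filter() is a hypothetical name) copies its input to its output, changing lowercase letters to uppercase. Note that c is an int rather than a char: getc() returns either a byte value or EOF, and a char cannot reliably hold both.

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical filter: copy `in` to `out`, converting lowercase letters
   to uppercase.  Returns 0 at end-of-file, -1 on a read or write error. */
int upcase_filter(FILE *in, FILE *out)
{
    int c;                              /* int, so EOF can be distinguished */
    while ((c = getc(in)) != EOF) {
        if (putc(toupper(c), out) == EOF)
            return -1;                  /* write error */
    }
    return ferror(in) ? -1 : 0;         /* EOF means end-of-file or a read error */
}
```

Called as upcase_filter(stdin, stdout) from main(), it could be used in a pipeline such as ls | ./upcase | sort.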
A pipe is the connection
between two processes denoted by the | symbol. A
pipeline is a sequence of
commands connected by pipes. Only the first and last commands
in a pipeline can use additional I/O redirection (input redirection
on the first, output redirection on the last).
In command1 | command2, the operating system maintains an intermediate buffer between command1
and command2. The first command writes into
the buffer, while the second reads from it. If the buffer
fills, the first command is temporarily suspended; if the buffer
empties, the second command is temporarily suspended. The OS
also coordinates the processes to manage the buffer correctly.
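The pipe that a shell creates for command1 | command2 is made with the pipe() system call before the processes are forked. A minimal sketch with one process on each end, where pipe_message() is a hypothetical helper: the child writes, and the parent reads until read() returns 0 (end-of-file), which happens only after every copy of the write end has been closed.

```c
#define _POSIX_C_SOURCE 200809L
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: a child writes msg into a pipe and the parent reads
   it back into buf (buflen must be at least 1).
   Returns the number of bytes read, or -1 on error. */
ssize_t pipe_message(const char *msg, char *buf, size_t buflen)
{
    int fd[2];                               /* fd[0]: read end, fd[1]: write end */
    if (pipe(fd) < 0)
        return -1;
    pid_t pid = fork();
    if (pid < 0) {
        close(fd[0]);
        close(fd[1]);
        return -1;
    }
    if (pid == 0) {                          /* child: plays command1 */
        close(fd[0]);                        /* the writer doesn't read */
        size_t len = strlen(msg);
        ssize_t w = write(fd[1], msg, len);  /* blocks if the pipe buffer is full */
        _exit(w == (ssize_t)len ? 0 : 1);    /* exiting closes fd[1] */
    }
    close(fd[1]);   /* parent (command2) closes its copy of the write end,
                       or read() would never see end-of-file */
    size_t total = 0;
    ssize_t n = 0;
    while (total < buflen - 1
           && (n = read(fd[0], buf + total, buflen - 1 - total)) > 0)
        total += (size_t)n;                  /* read() blocks while the pipe is empty */
    close(fd[0]);
    buf[total] = '\0';
    waitpid(pid, NULL, 0);                   /* collect the child */
    return n < 0 ? -1 : (ssize_t)total;
}
```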
There are some additional redirections possible. See the
shell's documentation for more details.
command > file    stdout is connected to file; the previous contents of file are lost (output redirection)
command >> file   stdout of command is appended to file; the previous contents of file are not lost
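A shell implements > and >> by opening the file with different flags before starting the command. A sketch, assuming the common choice of mode 0666 (which the process's umask then restricts); open_for_redirect() and write_redirect() are hypothetical names. O_TRUNC discards the previous contents, while O_APPEND makes every write go to the current end of the file.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: open path the way a shell might for
   "> path" (append == 0) or ">> path" (append != 0). */
int open_for_redirect(const char *path, int append)
{
    int flags = O_WRONLY | O_CREAT | (append ? O_APPEND : O_TRUNC);
    return open(path, flags, 0666);          /* the umask limits 0666 further */
}

/* Hypothetical helper: write string s to path, redirected as above.
   Returns 0 on success, -1 on failure. */
int write_redirect(const char *path, const char *s, int append)
{
    int fd = open_for_redirect(path, append);
    if (fd < 0)
        return -1;
    size_t len = strlen(s);
    ssize_t n = write(fd, s, len);
    close(fd);
    return n == (ssize_t)len ? 0 : -1;
}
```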
When you use the open() function to open a file for
reading or writing, you get a file
descriptor which is used to identify the open file to
functions that manipulate it. When you use the fopen()
function, you get a file pointer.
stdin is a file pointer, and STDIN_FILENO
is a file descriptor; both refer to the same open file.
A file descriptor has type int, and a file pointer has type FILE *. FILE
is a struct type defined in stdio.h.
The
OS uses a file descriptor as an index into a small array of
structures, while a FILE is a structure that
contains a file descriptor and some additional information.
The details vary from one implementation to another, but the
principles don't.
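The link between the two levels is visible through fileno(), which returns the file descriptor inside a FILE. The sketch below (hypothetical function names) checks that the standard streams sit on descriptors 0, 1 and 2, and shows why mixing the two levels on one open file needs fflush(): output from printf() may wait in the FILE's buffer (for example, when stdout is redirected to a file), while write() goes straight to the descriptor.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: do the three standard FILEs use the standard
   file descriptors? */
int std_streams_match_descriptors(void)
{
    return fileno(stdin) == STDIN_FILENO
        && fileno(stdout) == STDOUT_FILENO
        && fileno(stderr) == STDERR_FILENO;
}

/* Hypothetical helper: write to the same open file at both levels.
   Without the fflush(), the second line could come out first. */
void print_both_ways(void)
{
    printf("first line, via the FILE * stdout\n");
    fflush(stdout);                          /* empty the stdio buffer */
    static const char msg[] = "second line, via descriptor STDOUT_FILENO\n";
    if (write(STDOUT_FILENO, msg, sizeof msg - 1) < 0)
        perror("write");
}
```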
A stream is the underlying
concept of I/O in the C Standard Library. Since open()
and fopen() work on devices as well as files, and we
can connect the output of one program to the input of another,
without actually creating an intermediate file, it's often better to
speak of stream I/O than file I/O.
You can open a stream for reading, or for writing, or both,
subject to permissions that are checked by the operating system.
When you have read all the bytes of a stream, you have reached
end-of-file, and the next read operation returns a value that
indicates this condition. When entering data at the
keyboard, and the keyboard is treated as stdin, you
can type control-D to indicate end-of-file. Be careful that
you don't type control-D as a command, since that terminates input
to your command shell.
Here is a sampler of the available functions that have been
mentioned so far, and a few more. See CP:AMA Ch. 22 or C:ARM
Ch. 15 for much more information about the C library, and CS:APP Ch.
10 (Sec. 10.1-10.3) about the Posix library.
Posix Standard, unbuffered I/O
- open, close: open, close
- input: read
- output: write
- positioning: lseek
C Standard, buffered I/O
- open, close: fopen, fclose
- input: fread, getc, getchar, fgets, scanf
- output: fwrite, putc, putchar, fputs, printf
- positioning: fseek, ftell
Posix Standard, directory access
- open, close: opendir, closedir
- input: readdir
The read() and fread() functions pick up
a given number of bytes, the getc() and getchar()
functions pick up one character, fgets() picks up a
line of input as a character string, and scanf()
interprets a line of input to assign values to variables.
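A common pattern is to read each line with fgets() and then interpret it with sscanf(), the string-reading relative of scanf(); a malformed line can then be reported and skipped without leaving unread input behind. The input format "name value" and the function sum_values() are assumptions made for this sketch.

```c
#include <stdio.h>

/* Hypothetical example: read lines of the form "name value" from `in`,
   and return the sum of the values.  *nlines receives the number of
   well-formed lines; malformed lines are reported on stderr and skipped. */
long sum_values(FILE *in, int *nlines)
{
    char line[256];                          /* assumed maximum line length */
    char name[64];
    long value, sum = 0;
    *nlines = 0;
    while (fgets(line, sizeof line, in) != NULL) {     /* NULL at end-of-file */
        if (sscanf(line, "%63s %ld", name, &value) == 2) {
            sum += value;
            (*nlines)++;
        } else {
            fprintf(stderr, "bad line: %s", line);
        }
    }
    return sum;
}
```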
Source code from APUE (modified to avoid apue.h)
- Fig. 1.4 -- copy a file from stdin to stdout using read() and write()
- Fig. 1.5 -- copy a file from stdin to stdout using getc() and putc()
- This becomes your second assignment: can you tell the difference between a DOS-style text file and a UNIX-style text file?
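For comparison while reading Fig. 1.4, here is a sketch of the same idea as a function on file descriptors. The BUFFSIZE of 4096 is an assumption, and copy_fd() is a hypothetical name. One design choice: write() may write fewer bytes than requested (for example, to a pipe or a socket), so the inner loop keeps writing until the whole buffer has gone out.

```c
#define _POSIX_C_SOURCE 200809L
#include <unistd.h>

#define BUFFSIZE 4096                    /* assumed buffer size */

/* Hypothetical helper: copy everything from descriptor `in` to descriptor
   `out` with read() and write().  Returns 0 at end-of-file, -1 on error. */
int copy_fd(int in, int out)
{
    char buf[BUFFSIZE];
    ssize_t n;
    while ((n = read(in, buf, BUFFSIZE)) > 0) {    /* 0 means end-of-file */
        ssize_t done = 0;
        while (done < n) {                         /* handle partial writes */
            ssize_t w = write(out, buf + done, (size_t)(n - done));
            if (w < 0)
                return -1;
            done += w;
        }
    }
    return n < 0 ? -1 : 0;                         /* -1 if read() failed */
}
```

Called as copy_fd(STDIN_FILENO, STDOUT_FILENO), it copies stdin to stdout, so it can be used with input and output redirection.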
Exercise 2 --- Solution (no peeking!)
- CS:APP, Fig. 10.2, has a shorter example of file copying,
using
read() and write().
Explain how it differs from APUE Fig. 1.4.
- CP:AMA, pp. 568-569, has a longer example of file copying,
using
fopen() and fclose(), checking
the command-line arguments for misuse, and reporting problems if
the files can't be opened. It's a good example in general,
but which other features of misuse should also be considered?
APUE, Sec. 1.6, Programs and Processes (and threads)
A program is an executable
file.
A process is an executing
instance of a program. Processes are managed by the operating
system.
- process ID -- identifies the process to the OS
- user ID -- identifies the user who started the process
- group ID -- identifies the group of the user who started the
process
- These are non-negative integers with types
pid_t,
uid_t, gid_t.
- The process ID is unique among all current processes.
- It is possible for one program to be instantiated as more than
one process.
- for example, several users (on the same system) open files with
the same editor
- It is possible for one command to start more than one process.
- a command pipeline,
prog1 | prog2
| prog3
- executing a shell script starts a new instance of the shell,
which in turn starts new processes while interpreting the
script
- Processes started by the same command form a process group.
Source code from APUE (modified to avoid apue.h)
- Fig. 1.6, getpid() obtains the process ID
- see also CS:APP, Sec. 8.4.1
- Fig. 1.9, getuid() obtains the user ID, getgid() obtains the user's group ID
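A minimal sketch in the spirit of Figs. 1.6 and 1.9 (print_ids() is a hypothetical name, and getppid(), which returns the parent's process ID, is our addition). The values are cast to long because pid_t, uid_t and gid_t are integer types whose exact size varies between systems.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: print the identifiers the OS keeps for this process. */
void print_ids(FILE *out)
{
    fprintf(out, "process ID %ld, parent process ID %ld\n",
            (long)getpid(), (long)getppid());
    fprintf(out, "user ID %ld, group ID %ld\n",
            (long)getuid(), (long)getgid());
}
```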
Process control functions, called from an existing process
fork() -- make a new process, as a copy of the
current process
- parent process, child process
- generalizes to a process
group or process
tree
- the parent and child processes both continue execution after
returning from
fork(), using the same program,
executing concurrently
exec() -- replace the program of the current
process
- a family of six functions with slightly different interfaces
- the original process ID, and the relationship between parent
and child processes, are not changed
waitpid() -- wait for a child process to
terminate
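The two return values of fork() can be seen in a few lines. In this sketch, fork_and_wait() (a hypothetical name) creates a child that immediately exits with a chosen status; the parent collects that status with waitpid() and decodes it with the macros WIFEXITED and WEXITSTATUS from <sys/wait.h>. The child calls _exit() rather than exit() so that it does not flush a copy of the parent's stdio buffers.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that exits with `code`, wait for it,
   and return its exit status (or -1 on error). */
int fork_and_wait(int code)
{
    pid_t pid = fork();
    if (pid < 0) {                     /* failed: there is no child */
        perror("fork");
        return -1;
    }
    if (pid == 0)                      /* fork() returned 0: we are the child */
        _exit(code);
    int status;                        /* fork() returned the child's ID: parent */
    if (waitpid(pid, &status, 0) < 0) {
        perror("waitpid");
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```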
Source code from APUE (modified to avoid apue.h)
- Fig. 1.7, a rudimentary
command shell
- We will expand on this later as an extended project (Projects
6 and 7).
- see also CS:APP, Sec. 8.4, Process Control; compare APUE
Fig. 1.7 to CS:APP Fig. 8.15.
- fgets() reads a line of input from stdin; this is the command to be executed.
- fgets() returns NULL on end-of-file. NULL is the null pointer, defined in stdio.h and elsewhere. The user needs to type control-D to indicate end-of-file. This ends the shell.
- The input line includes a newline character from when the user
typed return (or enter). This is removed.
- A child process is started with
fork().
This is a complete copy of the parent process. Both
instances of the program return from fork(), but
with different return values.
- fork() returns 0 to the child process, and returns the (positive) process ID of the child process to the parent process.
- If fork() fails, it returns -1 to the parent process, and there is no child process.
- The child process replaces its program with the one indicated
by the command, using
execlp(). If
successful, the child process has started a new program, which
does the requested work. If not, execlp()
returns, and the child exits.
- Meanwhile, the parent process waits for the child process to
finish, using
waitpid().
- The exit status of
the child is retrieved by
waitpid(), but we ignore
it here.
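The steps above fit in one function. This is our own sketch, not APUE's Fig. 1.7: run_line() and shell_loop() are hypothetical names, the exit status 127 for a program that could not be run is our choice (shells commonly use it for "command not found"), and like the rudimentary shell it passes no arguments, so the whole line is taken as the program name. Unlike the description above, this sketch does keep the child's exit status and returns it.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run one command line.  Returns the child's exit
   status, 127 if the program couldn't be run, or -1 on fork/wait errors. */
int run_line(char *line)
{
    size_t len = strlen(line);
    if (len > 0 && line[len - 1] == '\n')
        line[len - 1] = '\0';                  /* remove the newline from fgets() */
    pid_t pid = fork();
    if (pid < 0) {
        perror("fork");
        return -1;
    }
    if (pid == 0) {                            /* child: become the program */
        execlp(line, line, (char *)NULL);
        fprintf(stderr, "couldn't execute: %s\n", line);  /* only if execlp failed */
        _exit(127);
    }
    int status;                                /* parent: wait for the child */
    if (waitpid(pid, &status, 0) < 0) {
        perror("waitpid");
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

/* Hypothetical loop: prompt, read a line, run it, until end-of-file. */
void shell_loop(FILE *in)
{
    char buf[1024];
    printf("%% ");
    fflush(stdout);
    while (fgets(buf, sizeof buf, in) != NULL) {   /* control-D ends the loop */
        if (buf[0] != '\n')
            run_line(buf);
        printf("%% ");
        fflush(stdout);
    }
}
```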
In all modern systems, a process contains one or more threads.
- "thread of control" -- imagine a line passing through program
statements as the execution progresses
- threads share the resources of a process, within the process address space
- functions
- global data
- file descriptors, etc.
- each thread has its own runtime stack, for local data and
function return addresses
- threads execute concurrently
- perhaps simultaneously if the hardware permits this
- Some care is required to coordinate the activities of
threads - more about this later.
In C (the 1989 and 1999 editions), threads are implemented through
library functions provided with the operating system. Posix
specifies a particular thread library (Pthreads, declared in <pthread.h>). The 2011 edition of the
C Standard adds threads to the language and its library (<threads.h>, an optional feature in C11), as does the 2011
edition of the C++ Standard (<thread>).
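A minimal Pthreads sketch of the points above: the global counter is shared by all threads of the process, each thread's local variables live on its own stack, and a mutex coordinates the updates. The names are hypothetical; with gcc, compile with -pthread.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>

#define MAX_THREADS 16

static long counter = 0;                      /* global: shared by all threads */
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;

static void *add_many(void *arg)
{
    long n = *(const long *)arg;              /* local: on this thread's stack */
    for (long i = 0; i < n; i++) {
        pthread_mutex_lock(&counter_lock);    /* one thread at a time */
        counter++;
        pthread_mutex_unlock(&counter_lock);
    }
    return NULL;
}

/* Hypothetical helper: run nthreads threads (at most MAX_THREADS), each
   adding per_thread to the shared counter.  Returns the final count,
   or -1 if a thread couldn't be started. */
long run_threads(int nthreads, long per_thread)
{
    pthread_t tid[MAX_THREADS];
    int started = 0, ok = 1;
    if (nthreads > MAX_THREADS)
        nthreads = MAX_THREADS;
    counter = 0;
    for (int i = 0; i < nthreads; i++) {
        if (pthread_create(&tid[i], NULL, add_many, &per_thread) != 0) {
            ok = 0;
            break;
        }
        started++;
    }
    for (int i = 0; i < started; i++)         /* wait for every thread started */
        pthread_join(tid[i], NULL);
    return ok ? counter : -1;
}
```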
APUE, Sec. 1.7, Error Handling
When a system function or library function fails, it returns
- an error indicator -- did the function succeed or fail?
- an error number -- if the function failed, why?
Typical error indicators, as return value
- -1, instead of a non-negative integer
- EOF, instead of a valid character
- NULL, instead of a valid pointer
- much more about this later
Typical error numbers
- the integer
errno, defined in <errno.h>,
may be assigned an error number
- system-defined error numbers are listed in
<errno.h>
- Always use the macro name, never the numerical value.
- Only a few are specified in the C Standard, many more are in
the Posix Standard.
- See CS:APP Sec. 8.3 and Appendix A.
- See CP:AMA Sec. 24.2, but that discussion isn't typical of
what we'll do in this course. The Q&A on pp. 637-638
(first two) is more like what we described here.
To interpret an error number as a character string,
#include <string.h>
char *strerror(int errnum);
strerror() returns a pointer to a string that you can supply to printf(), etc.
#include <stdio.h>
void perror(const char *msg);
perror() writes msg, a colon, and the explanation of the current value of errno to stderr.
strerror() gives you greater flexibility
- use fprintf() to write to stderr or any open output stream
- printf() is tied to stdout, and perror() is tied to stderr
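Putting errno and strerror() together: the hypothetical helper open_or_report() reports why fopen() failed. Saving errno right away matters, because later library calls (even fprintf()) may change it. Note that the C Standard does not require fopen() to set errno; Posix does.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: open a file, and if that fails, explain why on
   stderr.  Returns the FILE * or NULL, leaving errno as fopen() set it. */
FILE *open_or_report(const char *path, const char *mode)
{
    FILE *fp = fopen(path, mode);
    if (fp == NULL) {
        int saved = errno;                   /* save it before calling anything else */
        fprintf(stderr, "can't open %s: %s\n", path, strerror(saved));
        errno = saved;                       /* let the caller inspect it too */
    }
    return fp;
}
```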
100% ABSOLUTE RULE
- If something can fail, don't ignore that possibility.
99.9% ABSOLUTE RULE
- If something has failed, recover gracefully.
APUE, Sec. 1.8, User Identification
Every activity is associated with some user, from login to process
creation to logout.
- For security, passwords are checked and activity can be
logged.
- "Pseudo-users" can be created if there is no actual person
involved.
- The "superuser" root
is all-powerful.
File access permissions are checked by matching IDs of the file and
the requesting process.
source                    information                     describes                  set at
Password File             user name, user ID, group ID    the user                   account creation
properties of a process   process ID, user ID, group ID   the owner of the process   login, process startup
properties of a file      user ID, group ID               the owner of the file      file creation
APUE, Sec. 1.9, Signals
A signal is a
generalization and abstraction of a hardware interrupt.
- Hardware interrupts are generated
asynchronously by processor or device activity, sometimes
intentionally, sometimes not.
- The OS reacts to the interrupt, and an interrupt handler (in the
OS) does something appropriate.
- No user process deals directly with hardware interrupts.
- Example - a machine instruction executes integer
divide-by-zero.
- A hardware exception occurs (exceptions are internally
generated, interrupts are externally generated, otherwise they
are treated in the same way by the OS).
- The processor catches the exception and calls an interrupt
handler pre-installed by the OS, which determines which
process caused the exception; the signal
SIGFPE
(floating-point exception) is then sent to that process.
- Exercise for some other course: Why did integer divide
by 0 report a floating-point exception? Does this happen
on every machine?
Signals are generated
asynchronously by some agent outside the process (via the OS), or by
the process itself (with OS support).
A signal causes transfer of control to a signal handler, which is a function provided by
the programmer.
- If this sounds like a C++ exception and exception handler, it
is, but signals are a simpler mechanism.
- A signal is delivered
to the process, and is caught
(or, received) by the
signal handler.
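- The signal handler should do something appropriate, and then
return or call _exit(). (exit() is not async-signal-safe, since it flushes stdio buffers, so _exit() is the safer choice inside a handler.)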
- There are default signal handlers supplied by the system.
- You can write and install your own signal handlers for most
signals, with the
signal() or sigaction()
functions.
- The process can decide to delay receipt of a signal (block the signal) or to ignore the signal.
Source code from APUE (modified to avoid apue.h)
- Fig. 1.10, which
extends Fig. 1.7 with a signal handler
- The signal handler sig_int(), like every handler installed with signal(), has one int parameter indicating which signal caused it to be called.
- The handler is installed by a call to signal(), associating it with the signal number SIGINT. SIGINT is the name of a particular signal; it's a symbol defined in <signal.h>.
- The signal handler is not actually called until its associated
signal is delivered.
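The sketch below shows the install-then-deliver pattern described above. The handler is named after the book's sig_int(), but this is not the code from Fig. 1.10. The counter interrupts and the function install_sig_int() are hypothetical. The handler uses write() rather than printf() because only async-signal-safe functions are guaranteed to work inside a handler.

```c
#include <signal.h>
#include <unistd.h>

/* Counts how many times the handler has run. */
volatile sig_atomic_t interrupts = 0;

/* One int parameter: the number of the signal that was caught. */
static void sig_int(int signo)
{
    static const char msg[] = "interrupt\n";
    ssize_t ignored;

    (void) signo;
    interrupts++;
    /* write() is async-signal-safe; printf() is not guaranteed to be. */
    ignored = write(STDOUT_FILENO, msg, sizeof msg - 1);
    (void) ignored;
}

/* Associate sig_int() with SIGINT; returns 0 on success, -1 on error. */
int install_sig_int(void)
{
    return signal(SIGINT, sig_int) == SIG_ERR ? -1 : 0;
}
```

Nothing happens at installation time. The handler runs only when SIGINT is delivered, for example from raise(SIGINT) or a control-C at the terminal. The exact semantics of signal() have varied between systems (for example, whether the handler stays installed after it runs), which is one reason sigaction() is usually preferred.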
Some signals can be sent to a process from the keyboard: the terminal driver generates them, and the command shell arranges which process receives them.
keystroke | signal | default effect
control-C | SIGINT | terminate the process
control-Z | SIGTSTP | stop (suspend) the process; it can be allowed to continue later by sending a SIGCONT signal
control-backslash | SIGQUIT | dump core and terminate the process
control-D | (none) | end-of-input indicator (at the start of a line), treated as end-of-file by stdin
- The phrase dump core
means to write a file that contains a memory image, enough information to run a
debugger and find out what state the process was in when it
ended. This phrase is the last vestige of the old core
memory that was used in early computer systems.
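Control-D does not send a signal. When it is typed at the start of a line, a read() on the terminal returns 0. A program sees the same indication at the end of a file or a pipe. This sketch assumes a POSIX system. The helper count_until_eof() is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <unistd.h>

/* Read fd until read() returns 0 (end of input) and return the number of
   bytes seen, or -1 on error. */
long count_until_eof(int fd)
{
    char buf[512];
    long total = 0;

    for (;;) {
        ssize_t n = read(fd, buf, sizeof buf);

        if (n == 0)
            return total;          /* control-D on a terminal, or end of file/pipe */
        if (n == -1) {
            if (errno == EINTR)
                continue;          /* interrupted by a signal; try again */
            return -1;
        }
        total += n;
    }
}
```

With count_until_eof(STDIN_FILENO) reading from a terminal, typing a few lines and then control-D at the start of a line ends the loop.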
After typing control-Z to stop a process, you should either restart
it, or terminate it. Here is an example using the fg
(foreground) command, and then the bg (background)
command.
- A foreground process
is connected to a terminal window, and the command shell waits
for it to finish.
- A background process
is also connected to a terminal window, but the command shell
does not wait for it to finish, and it runs concurrently with
the shell.
- Actually, foreground and background refer to process groups,
but let's not get ahead of ourselves.
- To start a command as a background process, end the command
with
&
- To see a list of your background processes, use the command
jobs
% sleep 60
[60 seconds later we get a new prompt]
% sleep 60
[after some time, type control-C]
^C
[the process was terminated]
% sleep 60
[after some time, type control-Z]
^Z
Suspended
% jobs
[1] + Suspended sleep 60
% fg
[allow the process to continue in the foreground]
sleep 60
% sleep 60
[that one finished, start another]
^Z
Suspended
% bg
[allow the process to continue in the background]
[1] sleep 60 &
% jobs
[1] Running sleep 60
% jobs
[1] Running sleep 60
%
[1] Done sleep 60
%
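In the session above, the terminal and the shell send the signals. A program can send them with kill() and observe the effects with waitpid(). The following is a rough sketch, assuming a POSIX system. The child puts itself in its own process group, as a job-control shell would arrange, and restores the default actions for SIGTSTP and SIGINT. The function names stop_continue_interrupt() and give_up() are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Kill and reap the child after something unexpected happened. */
static int give_up(pid_t pid)
{
    kill(pid, SIGKILL);
    waitpid(pid, NULL, 0);
    return -1;
}

/* Fork a child that behaves like "sleep 60", then stop it with SIGTSTP
   (control-Z), resume it with SIGCONT (fg or bg), and end it with SIGINT
   (control-C). Returns 0 if waitpid() observed each effect, else -1. */
int stop_continue_interrupt(void)
{
    int ready_pipe[2], status;
    char ready;
    pid_t pid;

    if (pipe(ready_pipe) == -1)
        return -1;
    if ((pid = fork()) == -1) {
        close(ready_pipe[0]);
        close(ready_pipe[1]);
        return -1;
    }

    if (pid == 0) {                          /* child */
        setpgid(0, 0);                       /* own process group, like a shell job */
        signal(SIGTSTP, SIG_DFL);            /* insist on the default actions */
        signal(SIGINT, SIG_DFL);
        close(ready_pipe[0]);
        if (write(ready_pipe[1], "r", 1) != 1)   /* tell the parent we are set up */
            _exit(1);
        sleep(60);
        _exit(0);
    }

    close(ready_pipe[1]);
    if (read(ready_pipe[0], &ready, 1) != 1) {
        close(ready_pipe[0]);
        return give_up(pid);
    }
    close(ready_pipe[0]);

    kill(pid, SIGTSTP);                      /* like typing control-Z */
    if (waitpid(pid, &status, WUNTRACED) != pid || !WIFSTOPPED(status))
        return give_up(pid);

    kill(pid, SIGCONT);                      /* like fg or bg */
    kill(pid, SIGINT);                       /* like typing control-C */
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return (WIFSIGNALED(status) && WTERMSIG(status) == SIGINT) ? 0 : -1;
}
```

The pipe ensures that the parent does not send signals before the child has finished its setup. Without it, the parent could signal a child that still had inherited dispositions.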
APUE, Sec. 1.10, Time Values
How do you measure time on a computer?
How long does it take a program to run?
- clock time, wall-clock time
- CPU time, made up of user time (running the program's own code) and system time (the kernel working on behalf of the program)
- All Unix systems have a
time command that runs a
program and reports these three times.
- Competition from other processes affects clock time, but not
CPU time (much).
Most modern microprocessors have cycle counters and
processor-specific hardware counters to measure "interesting events"
very accurately.
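Here is a minimal sketch that measures wall-clock time and the two kinds of CPU time from inside a program, in the spirit of the time command. It assumes a POSIX system with clock_gettime() and getrusage(). The struct times3 and the functions snapshot() and elapsed() are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/resource.h>
#include <sys/time.h>
#include <time.h>

/* Wall-clock, user CPU, and system CPU time, all in seconds. */
struct times3 {
    double wall, user, sys;
};

static double tv_to_seconds(struct timeval tv)
{
    return (double) tv.tv_sec + (double) tv.tv_usec / 1e6;
}

/* Take a snapshot of the three clocks for the calling process. */
struct times3 snapshot(void)
{
    struct timespec ts;
    struct rusage ru;
    struct times3 t = { 0.0, 0.0, 0.0 };

    /* CLOCK_MONOTONIC is not affected by changes to the system date. */
    if (clock_gettime(CLOCK_MONOTONIC, &ts) == 0)
        t.wall = (double) ts.tv_sec + (double) ts.tv_nsec / 1e9;
    if (getrusage(RUSAGE_SELF, &ru) == 0) {
        t.user = tv_to_seconds(ru.ru_utime);   /* running the program's own code */
        t.sys = tv_to_seconds(ru.ru_stime);    /* kernel working on its behalf */
    }
    return t;
}

/* Field-by-field difference: how much of each kind of time passed. */
struct times3 elapsed(struct times3 before, struct times3 after)
{
    struct times3 d;

    d.wall = after.wall - before.wall;
    d.user = after.user - before.user;
    d.sys = after.sys - before.sys;
    return d;
}
```

If the program sleeps between two snapshots, the wall-clock difference grows while user and system time stay near zero. The two measurements can diverge even further when other processes compete for the CPU.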
Exercise 3 --- Solution (no peeking!)
- When is wall-clock time not a reliable measurement of the time
it takes to run a program?
- The current time and date are provided by the operating system, traditionally as a 32-bit integer counting the number of seconds since "the Epoch", midnight on 1 Jan. 1970, Coordinated Universal Time. (See the time() function.)
- When does this time "roll over"?
- What are the equivalent starting times for Windows and Mac OS
X?
APUE, Sec. 1.11, System Calls and Library Functions
Function call -- transfer control to another part of the same
program
- The "function call" machine instruction includes a change of
address for the next instruction to execute, and saves a return
address.
- The "return from function call" instruction uses the
previously saved return address to get back to the place where
the function was called.
System call -- transfer control to the operating system
- direct entry to OS services
- callable from C, but very likely implemented in assembler
- The "system call" machine instruction also includes a change
of processor operating mode from User (unprivileged) to Kernel
(highly privileged), and saves state information for the current
process.
- The "return from system call" instruction also includes a
change of processor operating mode from Kernel to User, and
restores the previous state of the process.
System function
- for example,
write() makes a system call to send bytes to
an open device.
- The previous call to open() would have associated the file descriptor used by write() with the proper device driver for read and write operations on the file.
- So, the driver code that write() actually runs inside the kernel is not known until run time.
- This is why you need to understand pointers to functions.
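A toy model helps explain why function pointers matter here. "Opening" a file stores a pointer to a table of driver operations, and the generic write entry point follows that pointer. The sketch below is hypothetical and greatly simplified. Real kernels use the same idea (Linux, for example, has a struct file_operations), but struct file_ops, struct open_file, toy_write() and both toy drivers are invented for illustration.

```c
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical, greatly simplified table of driver operations. */
struct file_ops {
    ssize_t (*write)(void *dev, const char *buf, size_t n);
};

/* What "open" leaves behind: which operations to use, and device state. */
struct open_file {
    const struct file_ops *ops;
    void *dev;
};

/* Toy driver 1: accepts and discards everything, like /dev/null. */
static ssize_t sink_write(void *dev, const char *buf, size_t n)
{
    (void) dev;
    (void) buf;
    return (ssize_t) n;
}

/* Toy driver 2: appends bytes to a small in-memory buffer. */
struct mem_dev {
    char data[64];
    size_t len;
};

static ssize_t mem_write(void *dev, const char *buf, size_t n)
{
    struct mem_dev *m = dev;
    size_t room = sizeof m->data - m->len;

    if (n > room)
        n = room;                     /* a "short write" when the buffer fills */
    memcpy(m->data + m->len, buf, n);
    m->len += n;
    return (ssize_t) n;
}

const struct file_ops sink_ops = { sink_write };
const struct file_ops mem_ops = { mem_write };

/* The generic entry point cannot know which driver will run until it
   follows the pointer chosen when the file was opened. */
ssize_t toy_write(struct open_file *f, const char *buf, size_t n)
{
    return f->ops->write(f->dev, buf, n);
}
```

The same call, toy_write(f, "abc", 3), discards the bytes or stores them, depending only on which operations table f was given when it was opened.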
Library function
- for example, printf() formats a character string into a buffer managed by the standard I/O library, and then uses write() (possibly later, when the buffer is flushed) to send the bytes to the device (or pseudo-device) associated with stdout.
- strcpy() copies character strings using existing storage, and would not use a system function.
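The difference between the system function and the library function shows up when the two are mixed. The following is a minimal sketch, assuming a POSIX system. The helper names write_all() and greet_both_ways() are hypothetical. write_all() loops because write() may transfer fewer bytes than requested. greet_both_ways() calls fflush() because printf() may only fill the stdio buffer. When stdout is a file or pipe it is fully buffered, so without the flush the printf() line could appear after the write() line.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hand all n bytes to the kernel with the write() system function,
   retrying after short writes and after interruption by a signal.
   Returns 0 on success, -1 on error. */
int write_all(int fd, const char *buf, size_t n)
{
    while (n > 0) {
        ssize_t k = write(fd, buf, n);

        if (k == -1) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        buf += k;
        n -= (size_t) k;
    }
    return 0;
}

/* Mix the library function printf() with direct write() calls.
   fflush() pushes printf()'s buffered bytes out (via write()) first,
   so the two lines appear in the order they were produced. */
int greet_both_ways(void)
{
    const char *line = "hello from write()\n";

    printf("hello from printf()\n");
    if (fflush(stdout) == EOF)
        return -1;
    return write_all(STDOUT_FILENO, line, strlen(line));
}
```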
System calls are an integral part of the operating system.
System functions are provided with the operating system.
- In the Unix world, system functions are expected to follow the POSIX standard.
Library functions are provided with the programming language, and (in the case of C and C++) can usually be replaced by another version.
- Library functions are expected to follow the appropriate language standard.
- POSIX adopts all of the C library functions, with some clarification about behavior the C standard leaves unspecified.
You can use your own library functions, and maybe even your own
system functions, but not your own system calls.
Summary of commands and examples so far
utility | purpose | example
exit | quit a shell (you may need to close the terminal window separately) | exit
logout | log out | logout
passwd | change your password (on the CSE systems, use adpasswd instead) | passwd
man | read Unix manual pages (<space> for next screenful; q to quit) | man csh
man | read all sections | man -a printf
apropos | search man page headers | apropos print
ls | list directory contents | ls /tmp
ls | list contents of the current working directory | ls
ls | list with details (the long option) | ls -l
ls | apply ls to the current directory itself, not its contents | ls -d
cp | copy a file | cp source target
cp | first, prompt for confirmation (y) if it would overwrite an existing file | cp -i source target
mv | move (rename) a file | mv old new
rm | remove a file | rm foo
rm | first, prompt for confirmation (y) | rm -i foo
rm | Never execute the command rm * unless you really mean it. |
cat | list the file contents | cat foo
more | list the file contents, by screenful (<space> for next); see also: less, head, tail | more foo
cd | change current working directory | cd /tmp
cd | change to home directory | cd
pwd | print working directory | pwd
mkdir | make a new directory | mkdir cmpsc311
rmdir | remove a directory (must be empty) | rmdir oldstuff
cmp | compare two files | cmp x y
diff | compare two text files | diff x y
Exercises
4. Larry Wall, author of the Perl programming language, once
said "It's easier to port a shell than a shell script."
Explain why this is true; you will need to add some information
about the programming languages associated with command interpreters
in general.
- This remark was in response to an incident between David Korn,
author of the Korn shell, and a Microsoft product manager in
1998. You can read about it here.
5. The relevant code for this exercise is from APUE Appendix
B, Fig. B.3.
- APUE Exercise 1.4. In the error-handling function
err_sys
in Appendix B, why is the value of errno saved
when the function is called?
- When is the value of
errno saved? There is no assignment statement
involving errno in err_sys.
6. On APUE p. 7 (Sec. 1.4), concerning the sample program in Fig. 1.3, it is claimed, "We don't care what's in the DIR structure." But from the functions opendir() and readdir() it is possible to deduce at least one element of the DIR structure. Explain why.
7. Use the program in APUE Fig. 1.3 (or the ls -i command) and the touch command to demonstrate that the Mac OS X file system is inconsistent about whether filenames are case-sensitive or not.
8. strerror() and perror() have no
error conditions - they apparently never fail. Do they ever
modify errno? If so, how could that affect their
use?
Solutions to the Exercises
(no peeking!)
Last revised, 11 Jan. 2013