CMPSC 311, Introduction to Systems Programming

Introduction to Unix

Reading and References

It all started with ...
Modern trends
Programming language standards
System interface standards
Commercial systems
Non-Commercial systems
Semi-Commercial systems
Unix Timeline, 1969 to date

APUE, Sec. 1.2, UNIX Architecture

An Operating System provides services to running programs (processes), and manages resources on their behalf.
Nearly everything about I/O can be abstracted as a file.
Software layers
Toolkits and connections between tools

APUE, Sec. 1.3, Logging In

Login protocol
(PSU CSE specific)
Log in via
(PSU CSE specific)
Login instructions
Try these commands
cat /etc/passwd        (the public part of the password file)
echo $HOME             (your home directory)
echo $SHELL            (which shell is started at login?)
hostname               (which system is this?)
w                      (who is logged in?)
who                    (who is logged in?)
ls                     (what files and directories do I have?)
pwd                    (which directory am I in?)

A shell is a command-line interpreter.
Using man pages
Some vocabulary
HOME and SHELL are examples of environment variables.  There are also shell variables, and we'll discuss both of these in more detail later.

To end a shell, type the command exit.  You may need to close the terminal window afterward.

To log out, type the command logout.  Some window managers have a button labeled EXIT, and others have Logout as an option on some menu.  Mac OS X has its own tricks; if you use it, you know them already.

Don't forget to disconnect from the VPN if you needed it to log in remotely.

If you are using a shared system, it is impolite to lock the screen and then leave.  It is especially impolite to turn the system off after you log out, as someone else could be logged in remotely.

APUE, Sec. 1.4, Files and Directories

Hierarchical file system
Some directories at the root level
Directory-related commands
There are "hidden files".
The command shells can do filename expansion and match multiple filenames in your current working directory.
Try these commands (the semicolon indicates sequential execution)
Exercise 1   ---   Solution (no peeking!)

It is stated in APUE (p. 4),
"The only two characters that cannot appear in a filename are the slash character (/) and the null character.  The slash separates the filenames that form a pathname and the null character terminates a pathname.  Nevertheless, it's good practice to restrict the characters in a filename to a subset of the normal printing characters."
Files and directories have various properties (attributes) associated with them.

% ls -l x.c bits.html
-rw-rw----   1 dheller  fcse        3736 Aug 19  2002 bits.html
-rw-------   1 dheller  fcse         533 Dec  3  2004 x.c
The command file reads (part of) the file contents, and tries to guess what type of file it is.

We created a file using MS Word (Mac 2008), and saved it in four different formats.

% ls -l
-rw-------@ 1 dheller  fcse  22016 Aug 10 12:08 foo.doc
-rw-------@ 1 dheller  fcse  23398 Aug 10 12:07 foo.docx
-rw-------@ 1 dheller  fcse  26257 Aug 10 12:08 foo.rtf
-rw-------@ 1 dheller  fcse      5 Aug 10 12:13 foo.txt

% file *
foo.doc:              Microsoft Office Document
foo.docx:             Zip archive data, at least v2.0 to extract
foo.rtf:              Rich Text Format data, version 1,
foo.txt:              ASCII text, with CRLF line terminators

% cat foo.txt

If you rename foo.doc, for example to foo.html, then it is still recognized as a Word file by Mac OS X.

Source code from APUE (modified to avoid apue.h)
a.out . | sort > /tmp/x
/bin/ls -a . > /tmp/y
diff /tmp/x /tmp/y
rm /tmp/x /tmp/y
The Unix abstractions go even further than what we have described so far.

APUE, Sec. 1.5, Input and Output

Standard Input (stdin) and Standard Output (stdout) are abstractions of some input source and output target.  The OS connects stdin to an input source, depending on how you started the program, and similarly for stdout.  When the program runs, it only needs to know that stdin and stdout are connected to something, and (usually) won't need to care exactly what they are connected to.  When using an interactive shell, stdin and stdout are normally connected to the keyboard and terminal window, but it's easy to change that.

Standard Error (stderr) is a separate output channel used for reporting errors.  A frequent situation is that stdout is connected to a file, and stderr is connected to a terminal window with an interactive user.  Server programs, which have no interactive user, typically send error messages to a separate log file for later inspection.

The Standard I/O Library <stdio.h> defines stdin, stdout and stderr, and a lot of other stuff such as printf().  See also CP:AMA Ch. 22, or C:ARM Sec. 15.4.

stdin, stdout and stderr in C correspond to cin, cout and cerr in C++, but they are accessed via functions, and not through methods and operators.

You can override the default connections on the shell's command line, with the symbols <, > and | (vertical bar).  These can be combined.

command > file             (output redirection: stdout is connected to file)
command < file             (input redirection: stdin is connected to file)
command < file1 > file2    (stdin is connected to file1, stdout is connected to file2)
command1 | command2        (stdout of command1 is connected to stdin of command2, and the two processes run concurrently)

At this early stage, you should leave stderr connected to the terminal, but it is also possible to connect it to a file.  The exact syntax of this depends on which shell you are using.

A filter is a program that reads from stdin, and writes to stdout, and uses no other files except perhaps stderr.  See the use of sort above.

A pipe is the connection between two processes denoted by the | symbol.  A pipeline is a sequence of commands connected by pipes.  Only the first and last commands in a pipeline can use additional I/O redirection (input redirection on the first, output redirection on the last).

In  command1 | command2 , the operating system maintains an intermediate buffer between command1 and command2.  The first command writes into the buffer, while the second reads from it.  If the buffer fills, the first command is temporarily suspended; if the buffer empties, the second command is temporarily suspended.  The OS also coordinates the processes to manage the buffer correctly.

There are some additional redirections possible.  See the shell's documentation for more details.

command > file             (output redirection: stdout is connected to file; the previous contents of file are lost)
command >> file            (stdout of command is appended to file; the previous contents of file are not lost)

When you use the open() function to open a file for reading or writing, you get a file descriptor which is used to identify the open file to functions that manipulate it.  When you use the fopen() function, you get a file pointer.  stdin is a file pointer, and STDIN_FILENO is a file descriptor; both refer to the same open file.

A file descriptor has type int, and a file pointer has type FILE *.  FILE is a struct type defined in stdio.h.  The OS uses a file descriptor as an index into a small array of structures, while a FILE is a structure that contains a file descriptor and some additional information.  The details vary from one implementation to another, but the principles don't.

A stream is the underlying concept of I/O in the C Standard Library.  Since open() and fopen() work on devices as well as files, and we can connect the output of one program to the input of another, without actually creating an intermediate file, it's often better to speak of stream I/O than file I/O.

You can open a stream for reading, or for writing, or both, subject to permissions that are checked by the operating system.

When you have read all the bytes of a stream, you have reached end-of-file, and the next read operation returns a value that indicates this condition.  When entering data at the keyboard, and the keyboard is treated as stdin, you can type control-D to indicate end-of-file.  Be careful that you don't type control-D as a command, since that terminates input to your command shell.

Here is a sampler of the available functions that have been mentioned so far, and a few more.  See CP:AMA Ch. 22 or C:ARM Ch. 15 for much more information about the C library, and CS:APP Ch. 10 (Sec. 10.1-10.3) about the Posix library.

Posix Standard, unbuffered I/O
C Standard, buffered I/O
Posix Standard, directory access

The read() and fread() functions pick up a given number of bytes, the getc() and getchar() functions pick up one character, fgets() picks up a line of input as a character string, and scanf() interprets input according to a format string to assign values to variables.

Source code from APUE (modified to avoid apue.h)
Exercise 2   ---   Solution (no peeking!)

APUE, Sec. 1.6, Programs and Processes (and threads)

A program is an executable file.

A process is an executing instance of a program.  Processes are managed by the operating system.
Source code from APUE (modified to avoid apue.h)
Process control functions, called from an existing process
Source code from APUE (modified to avoid apue.h)
In all modern systems, a process contains one or more threads.
In C (the 1989 and 1999 editions), threads are implemented through library functions provided with the operating system.  Posix specifies a particular thread library.  The 2011 edition of the C Standard adds threads directly to the language, as does the 2011 edition of the C++ Standard.

APUE, Sec. 1.7, Error Handling

When a system function or library function fails, it returns an error indicator, and usually sets errno to an error number.
Typical error indicators, as return value
Typical error numbers
To interpret an error number as a character string, use strerror(); perror() prints the same message on stderr.

APUE, Sec. 1.8, User Identification

Every activity is associated with some user, from login to process creation to logout.
File access permissions are checked by matching ID's of the file and the requesting process.

APUE, Sec. 1.9, Signals

A signal is a generalization and abstraction of a hardware interrupt.
Signals are generated asynchronously by some agent outside the process (via the OS), or by the process itself (with OS support).

A signal causes transfer of control to a signal handler, which is a function provided by the programmer.
Source code from APUE (modified to avoid apue.h)
Some signals can be sent to a process from the keyboard, via the terminal controller and command shell.

key          signal     default effect
control-C    SIGINT     terminate the process
control-Z    SIGTSTP    stop (suspend) the process (it can be allowed to continue later by sending a SIGCONT signal)
control-\    SIGQUIT    dump core and terminate the process
control-D    (none)     end-of-input indicator, treated as end-of-file by stdin
After typing control-Z to stop a process, you should either restart it, or terminate it.  Here is an example using the fg (foreground) command, and then the bg (background) command.
% sleep 60             [60 seconds later we get a new prompt]
% sleep 60             [after some time, type control-C]
^C                     [the process was terminated]
% sleep 60             [after some time, type control-Z]
% jobs
[1]  + Suspended                     sleep 60
% fg                   [allow the process to continue in the foreground]
sleep 60
% sleep 60             [that one finished, start another; after some time, type control-Z]
% bg                   [allow the process to continue in the background]
[1]    sleep 60 &
% jobs
[1]    Running                       sleep 60
% jobs
[1]    Running                       sleep 60
[1]    Done                          sleep 60

APUE, Sec. 1.10, Time Values

How do you measure time on a computer?
How long does it take a program to run?
Most modern microprocessors have cycle counters and processor-specific hardware counters to measure "interesting events" very accurately.

Exercise 3   ---   Solution (no peeking!)

APUE, Sec. 1.11, System Calls and Library Functions

Function call -- transfer control to another part of the same program
System call -- transfer control to the operating system
System function
Library function
System calls are an integral part of the operating system.

System functions are provided with the operating system.
Library functions are provided with the programming language, and (in the case of C and C++) can usually be replaced by another version.
You can use your own library functions, and maybe even your own system functions, but not your own system calls.

Summary of commands and examples so far

exit                   (quit a shell; you may need to close the terminal window separately)
logout                 (log out)
passwd                 (change your password; on the CSE systems, use adpasswd instead)
man csh                (read Unix manual pages; <space> for next screenful, q to quit)
man -a printf          (read all sections)
apropos print          (search man page headers)
ls /tmp                (list directory contents)
ls                     (list contents of current working directory)
ls -l                  (list with details, the long option)
ls -d                  (apply ls to the current directory itself, not its contents)
cp source target       (copy a file)
cp -i source target    (first, prompt for confirmation (y) if it would overwrite an existing file)
mv old new             (move (rename) a file)
rm foo                 (remove a file)
rm -i foo              (first, prompt for confirmation (y))
Never execute the command  rm *  unless you really mean it.
cat foo                (list the file contents)
more foo               (list the file contents, by screenful; <space> for next; see also: less, head, tail)
cd /tmp                (change current working directory)
cd                     (change to home directory)
pwd                    (present working directory)
mkdir cmpsc311         (make a new directory)
rmdir oldstuff         (remove a directory, which must be empty)
cmp x y                (compare two files)
diff x y               (compare two text files)


4.  Larry Wall, author of the Perl programming language, once said "It's easier to port a shell than a shell script."  Explain why this is true; you will need to add some information about the programming languages associated with command interpreters in general.
5.  The relevant code for this exercise is from APUE Appendix B, Fig. B.3.
6.  On APUE p. 7 (Sec. 1.4), concerning the sample program Fig. 1.3, it is claimed "We don't care what's in the DIR structure."  But, from the functions opendir() and readdir() it is possible to deduce at least one element of the DIR structure.  Explain why.

7.  Use the program in APUE Fig. 1.3 (or the  ls -i  command) and the touch command to demonstrate that the Mac OS X file system is inconsistent about whether filenames are case-sensitive or not.

8.  strerror() and perror() have no error conditions - they apparently never fail.  Do they ever modify errno?  If so, how could that affect their use?

Solutions to the Exercises  (no peeking!)

Last revised, 11 Jan. 2013