Due Date: February 5, 2008. 55 points
In this project, you will learn about the Linux/Solaris system calls that enable a process to be loaded. Fork and exec are the most prominent system calls in creating a process's address space and loading the desired code/data for that process, but process loading consists of several additional steps. In this project, you will document these steps (and learn the associated system calls) and answer specific questions regarding process loading.
The first part of the project will focus on compiling and running a small program to gather information about process loading. We will provide you with a UNIX tarball containing the code and the files necessary to build the code. You will be required to build the code on both a Linux system and a Solaris 10 system (i.e., SunOS 5.10) to learn the build mechanism in both. You will run the resultant binaries on each system under system call tracing tools to collect the system calls executed. In addition to the traces, you will have to document process loading at the level of the system calls used for the Linux case only.
In the second part of the project, you will have to answer some questions that will require some more focused investigation into process loading. For example, you will have to compare some aspects of the Linux and Solaris traces and examine the binary to identify the causes of certain process loading operations (don't worry, I will provide some guidance below).
Hopefully, this project will enable you to become more comfortable with the Linux/Solaris systems that we will be using and provide you with an initial familiarity with how systems get your programs to run.
Follow these instructions:
Download the following tarball Project 1 Code to your CSE account file space. You should have one file p1.tgz.
We are going to build both Linux and Solaris versions of the program. As a result, you will need to create two directories for the project. tar is a rather complex command -- use man tar if you want to learn more.
First, create the Linux directory p1_linux:
tar xvfz proj1.tgz
mv proj1 p1_linux
Then, create the Solaris directory p1_sol:
tar xvfz proj1.tgz
mv proj1 p1_sol
Next, build the binary versions of the project program on both a Linux and a Solaris system:
From the Linux directory, p1_linux, the program binary must be built using makefile.linux.txt. From the Solaris directory, p1_sol, the program binary must be built using makefile.sun.txt. For information about compiling via Makefiles, look at man make.
To compile the Linux version, you must be on a Linux machine. To compile the Solaris version, you must be on a Solaris 10 machine (i.e., SunOS 5.10) for this project. You can determine the operating system your machine is running via the command uname -a. Think about why each version must be built on its own platform, beyond the obvious reasons. Both programs are compiled (in their respective directories) by invoking make -f makefile.*sys* where *sys* is either 'linux' or 'sun' depending on the platform. You should then have two copies of the pr1_c89_32 file, one in each directory.
Now, we are going to run the program to generate the system call traces. Naturally, Linux and Solaris use different utilities to trace the system calls used by a process. Linux uses strace and Solaris uses truss. Be careful: Solaris has another service called strace, but it does something completely different. So, from the respective directories, run the two binaries using the following commands:
Linux: strace -o strace.linux pr1_c89_32
Solaris 10: truss -o truss.sun pr1_c89_32
These commands will generate files, strace.linux and truss.sun containing a sequence of system calls, including their argument values. Save these traces somewhere safe, as you will need them later.
Using the Linux trace (strace.linux), you need to describe how the process loading is executed. You may use man pages to determine what a system call does, but you will probably also need to search for web resources that provide additional guidance. You should be as precise as possible (given some limitations as described below). Perform the following tasks using the strace.linux file:
Define (in your own words) the function of each unique system call in the trace. There should be about 10 unique system calls in the trace.
Using brk as an example, I would see that the man page describes the system call as one that "changes the data segment size", but that it just returns the end of the current data segment when sent a 0 (actually the man page doesn't say that, but you would have looked at some web resources and found that it is true). As a result, you could say (but you will use your own words) that "brk either returns the current end of the process's data segment or changes (usually by extending) the end of the data segment. The kernel thus backs the data segment with the corresponding physical memory."
Write a description of the process loading protocol in terms of the sequence of system calls in strace.linux. With the exception of mmap, mprotect, and munmap, you should be able to provide a detailed description of what each system call is doing and what object it is operating on. Your description must answer the question: What is the purpose of the operation and the object (file) it is operating on?
For mmap, determine whether the system call is mapping data from a file (specify the file), providing a block of zeroed memory, or causing memory to be unmapped.
munmap and mprotect affect specific memory segments in the process's address space. Identify the memory segment by specifying which mmap command mapped the memory initially. You can simply identify the mmap command by number (you don't have to write it out). One mprotect appears not to correspond to the mmap'd memory.
In addition, answer the following 3 questions. The first two compare the Solaris trace to the Linux trace. The last requires you to gain a basic understanding of the structure of a Linux binary.
The first files opened in the Linux and Solaris traces are ld.so.cache and ld.config, respectively. What information do these files provide? What is the difference between the contents of these two files?
In the Solaris trace, the system calls stat and resolvepath are invoked before opening any file. What is stat being used to verify and what does resolvepath do in general?
The libraries libc.so.* and libm.so.* are loaded (via mmap) after the executable by a mechanism known as dynamic linking. Define dynamic linking (in your own words), including a contrast with static linking.
(Optional, but potentially helpful) This program is also a good example of a C program, and provides some insights into what an address space looks like.
This program prints out several program variables (data) and functions (code) to help you understand how different variables and code correspond to different areas of an address space. The program prints the following information:
Address (Memory_Size) Variable_Name Value
Recall from Tu's lecture that an address space consists of a code segment (text), global data, dynamic data (heap), and local data (on the stack). How the variables/functions are defined should give you some clue as to their segment in the address space. You should then see if the addresses correspond to your understanding. You can then answer the following questions:
Which variables are global? Which variables are local? What is the difference in their memory locations?
Allocate a variable using malloc(). Where is this variable located in the address space relative to the others?
Where are local variables allocated? How about function arguments (argv, argc)? What about the environment variables? You should familiarize yourself with environment variables.
Add your own variables to the program and see where they are allocated.