CMPSC 311, Introduction to Systems Programming

Introduction to Unix
Solutions to Exercises

Exercise 1

It is stated in APUE (p. 4),
"The only two characters that cannot appear in a filename are the slash character (/) and the null character.  The slash separates the filenames that form a pathname and the null character terminates a pathname.  Nevertheless, it's good practice to restrict the characters in a filename to a subset of the normal printing characters."
It helps to remember that a character string in C is represented as an array of characters whose contents follow one simple rule.  The first character in the array is the first character of the string, and the null character '\0' indicates the end of the string.  The "null terminator" is not part of the string, but it must be stored with the string because there is no other way to find the end of the string.  Of course, that's not the only thing to remember ...

Why are DOS-style pathnames incompatible with C character strings?

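If you try to write the pathname as a string literal in C, in the obvious way, you get "C:\Program Files\Outlook Express\MSIMN.EXE", but C interprets each backslash as the start of an escape sequence.  Here \P, \O and \M are not even valid escape sequences, so a compiler will typically warn about them.  To fix the problem, write each backslash as a double backslash, "C:\\Program Files\\Outlook Express\\MSIMN.EXE".  This probably means altering the string literal by hand or, if a program does the rewriting, copying the string into a larger char array that has room for the extra backslashes.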

Why is it a bad idea to use the space character in a filename?

At the shell command level, spaces are used to separate words in a command.  The command
ls Program Files
complains that it cannot find the files Program and Files.  Write either of these instead:
ls 'Program Files'
ls "Program Files"
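
You can see the effect of the quotes by counting how many arguments the shell hands to a command.  Here is a minimal sketch, assuming a POSIX shell such as sh or bash; the function count_args and the variable names are made up for this illustration.

```shell
# count_args is a hypothetical helper: it prints how many arguments it received.
count_args() { echo "$#"; }

name='Program Files'

unquoted=$(count_args $name)      # the shell splits the value at the space
quoted=$(count_args "$name")      # the double quotes keep the value as one word

echo "unquoted: $unquoted argument(s)"
echo "quoted:   $quoted argument(s)"
```

The unquoted form delivers two arguments and the quoted form delivers one, which is the same difference as between the failing and the working ls commands.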

Depending on how the output of ls is arranged, and how many spaces are in the file name, you could be fooled.  Consider
% ls
a       b
Does this tell me I have two files, a and b, or one file, a       b, with seven spaces in the middle of its name?
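
One way to settle the question is to let the shell count the names and print each one between brackets.  This is only a sketch, assuming a POSIX shell and the mktemp command; it works in a scratch directory so that none of your own files are touched.

```shell
# Create a scratch directory holding one file named: a, seven spaces, b.
dir=$(mktemp -d)
touch "$dir/a       b"

# Each file matched by the pattern becomes exactly one argument.
set -- "$dir"/*
count=$#
echo "number of files: $count"

# Print each name between brackets so the embedded spaces are visible.
for f in "$dir"/*; do
    printf '[%s]\n' "${f##*/}"
done

rm -r "$dir"
```

The count is 1, and the single bracketed name shows the seven spaces.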

Why is it a bad idea to use a semicolon in a filename?
Same question, ampersand
Same question, question mark, ampersand, equals sign
Same question, percent sign

Most command shells use a semicolon as a separator between commands.  One day, you will write a script that goes wrong because the second half of a filename is misinterpreted as the start of the next command.

There are similar problems with ampersand.  It also separates commands, but the shell runs the first command in the background and does not wait for it to complete before starting the second command.
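
Here is a sketch of how such a script goes wrong, assuming a POSIX shell.  The filename is hypothetical, and echo stands in for whatever command the script really runs.  The trouble starts when the filename is pasted into a command string that a shell parses a second time.

```shell
name='notes;echo oops'      # hypothetical filename containing a semicolon

# Unsafe: the filename is pasted into a command string that sh then parses.
# sh sees two commands, "echo notes" and "echo oops".
unsafe=$(sh -c "echo $name")

# Safe: the filename is passed as a separate quoted argument and is never parsed.
safe=$(sh -c 'echo "$1"' sh "$name")

printf 'unsafe: %s\n' "$unsafe"
printf 'safe:   %s\n' "$safe"
```

In the unsafe case the second half of the filename runs as a command of its own.  With & in place of ; the same thing happens, except that the first command is put in the background.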

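URL syntax gives ?, & and = special meanings (? starts the query part of a URL, and by convention & separates the query parameters and = separates a parameter name from its value), so a web browser gets confused when a filename contains them.  Create two files whose contents differ,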
echo one > filename
echo two > 'filename?a=foo&b=bar'
We need the single-quotes to prevent the command shell from doing something "interesting" to the second filename.  Now start your favorite browser, and try to open the second file.  Most browsers notice the ?, and replace it in the URL with %3F (the ASCII character code for ? is 3F in hexadecimal), which gets you the second file.  Now manually change the URL file:///... to replace %3F with ?, and try again.  You will get the first file, because the browser now treats everything after the ? as a query and not as part of the filename.
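
You can compute these codes yourself.  The following sketch assumes a POSIX printf, which treats an argument that begins with a single quote as the numeric code of the next character; the variable names are made up for this illustration.

```shell
# "'?" asks printf for the character code of ?, printed here as %XX in hexadecimal.
question=$(printf '%%%02X' "'?")
ampersand=$(printf '%%%02X' "'&")
equals=$(printf '%%%02X' "'=")

echo "? -> $question"
echo "& -> $ampersand"
echo "= -> $equals"
```

The three results are %3F, %26 and %3D.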

OK, now you can guess what happens with the last case: in a URL, the percent sign introduces a two-digit hexadecimal character code, as in %3F above.

Last revised, 9 Jan. 2012