CMPSC 311, Introduction to Systems Programming

The C Preprocessor



Reading
References



Compiling a C program takes several steps.

In general,
Specifically for C,
The actual implementation of the compiler could be in two parts, or in one part.

Filename conventions   (for more, see the Sun and GCC documentation)
Compiler options, starting from file.c   (for more, see the Sun and GCC documentation)
Experiment



Some initial vocabulary



Preprocessor commands, in the .h or .c files

Preprocessor command line
Preprocessor command lines are removed from the source file (actually, replaced by blank lines), and may cause transformations of the remaining part of the source file.

Preprocessor commands, preprocessor directives, preprocessing directives
Preprocessor functions
Preprocessor operators
There are no comment-like preprocessor features, except for the null directive.

Exercise.  What is the proper name of the # character?



#include <something.h>
#include "something.h"

Examples

#include <stdio.h>
#include <stdio.h>  /* for printf() */
#include "something.h"

Bad examples

#include <something.c>   /* almost certainly wrong */
#include "something.c"   /* a symptom of bad program design */

Examples to try
From GCC
From Solaris



#define name body
#define name(identifier-list) body
#undef name
When the name of a defined macro is encountered by the preprocessor, it is replaced with the body of the macro.

#define TABLESIZE 20

int table1[TABLESIZE];
int table2[TABLESIZE];

for (int i = 0; i < TABLESIZE; i++)
  ...

Simple macro, object-like macro
Parameterized macro, function-like macro
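A function-like macro can be used almost anywhere a function call can be used, but it is not a function: it has no address, so it cannot be passed as a function pointer.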
Example, from <stdio.h> on Solaris (more of this later)

#define getchar()   getc(stdin)
#define putchar(x)  putc((x), stdout)

Everyone's "favorite" example

#define max(a,b) ((a) > (b) ? (a) : (b))
Better?  Is this even legal?

#define max(a,b) { int A = (a), B = (b); A > B ? A : B }

The GNU compiler and Sun's compiler (most recent versions) have a C language extension known as statement expressions.  The following is allowed:

#define max(a,b) ({ int A = (a), B = (b); A > B ? A : B; })

[Note that an expression statement is a different concept, which is standard.]

Better?   Legal in C99 but not in C89.

static inline int max(int a, int b) { return a > b ? a : b; }

GCC: An Inline Function is As Fast As a Macro
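One reason the macro version of max is everyone's "favorite" example is that an argument may be evaluated twice. The sketch below makes that visible with a hypothetical helper, next_value(), that counts its own calls; the names MAX_MACRO, max_int, calls_via_macro and calls_via_function are illustrative and not from the notes.

```c
/* The textbook macro: the selected argument is evaluated a second time. */
#define MAX_MACRO(a,b) ((a) > (b) ? (a) : (b))

/* Function version: each argument is evaluated exactly once. */
static inline int max_int(int a, int b) { return a > b ? a : b; }

/* Hypothetical helper with a visible side effect: it counts its calls. */
static int calls = 0;
static int next_value(void) { return ++calls; }

/* Returns how many times next_value() ran inside one MAX_MACRO use. */
static int calls_via_macro(void)
{
    calls = 0;
    int m = MAX_MACRO(next_value(), 0);
    (void) m;
    return calls;
}

/* Returns how many times next_value() ran inside one max_int call. */
static int calls_via_function(void)
{
    calls = 0;
    int m = max_int(next_value(), 0);
    (void) m;
    return calls;
}
```

With the macro, next_value() runs once for the comparison and once more to produce the result; with the function it runs once.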

Another example, although this really should be an inline function (more about that later).

#define swap(x,y) { int t = x; x = y; y = t; }
// properly used as  swap(a,b)
// easily misused as  swap(a,b);
// consider:   if (a < b) swap(a,b); else b = a;

#define swap(x,y) \
  do { int t = x; x = y; y = t; } while (0)

swap(a,b);
// now this is ok
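As a sketch of why the do/while(0) form matters, the function below uses the same if/else shape as the "consider" line above. The macro is renamed SWAP_INT here only to keep the example separate from the other swap definitions, and larger() is a hypothetical function.

```c
/* do/while(0) version of the swap macro, under a different name. */
#define SWAP_INT(x,y) \
  do { int t = x; x = y; y = t; } while (0)

/* Returns the larger of a and b, using the if/else shape from the notes. */
static int larger(int a, int b)
{
    if (a < b)
        SWAP_INT(a, b);   /* expands to  do { ... } while (0);  one statement */
    else
        b = a;
    return a;
}
```

With the brace-only version of the macro, the semicolon after the macro call would end the if statement, and the else would have nothing to attach to.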

Why is it not necessary to write
#define swap(x,y) \
  do { int t = (x); (x) = (y); (y) = t; } while (0)


This style appears in the Linux kernel, but it's often confusing.

#define incr(v,low,high) \
  for ((v) = (low); (v) <= (high); (v)++)

incr(j, 1, 20)
  printf("%d\n", j);

These are probably mistakes.

#define TABLESIZE = 20
#define func (a) sqrt(a)
#define func(a) a*sqrt(a)
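The problem with the third line is operator precedence after substitution. The sketch below shows the same mistake with a simpler body, using hypothetical macros TIMES10_BAD and TIMES10_OK so that no math library is needed.

```c
/* Hypothetical macros, not from the notes. */
#define TIMES10_BAD(a)  a*10        /* parameter and body unparenthesized */
#define TIMES10_OK(a)   ((a)*10)    /* parenthesize the parameter and the body */

/* Expands to 1 + 2*10, so the multiplication binds first. */
static int bad_result(void)  { return TIMES10_BAD(1 + 2); }

/* Expands to ((1 + 2)*10), which is what the caller meant. */
static int good_result(void) { return TIMES10_OK(1 + 2); }
```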

The # operator (the "stringization" operator) converts a function-like macro argument to a string literal.

#define stringify(a) #a

stringify(word one) yields "word one"

Does stringify(word one, word two) yield "word one, word two"?

Try to avoid things like stringify("x").
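A common follow-up question is how to stringize the value of a macro rather than its name. The sketch below assumes a second, hypothetical macro xstringify that forces its argument to be expanded before # is applied; raw() and expanded() are illustrative helpers.

```c
#define stringify(a)  #a             /* from the notes */
#define xstringify(a) stringify(a)   /* hypothetical: expand first, then stringize */

#define TABLESIZE 20

/* The operand of # is not macro-expanded: this is "TABLESIZE". */
static const char *raw(void)      { return stringify(TABLESIZE); }

/* The extra level expands TABLESIZE first: this is "20". */
static const char *expanded(void) { return xstringify(TABLESIZE); }
```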

Here's another example:

#define SHOW(type) \
  printf("%-32s %3zu %3zu\n", \
    #type, sizeof(type), __alignof__(type));


The ## operator pastes two tokens together in a macro's replacement text (token merging, token pasting); it is most often applied to the arguments of a function-like macro.
There are further rules about rescanning and further replacement of macros.
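As a small illustration of token pasting, the sketch below builds function names from a suffix. DEFINE_ADD, add_int and add_dbl are hypothetical names chosen for this example.

```c
/* Hypothetical macro: pastes "add_" and the suffix into one identifier. */
#define DEFINE_ADD(suffix, type) \
  static type add_##suffix(type a, type b) { return a + b; }

DEFINE_ADD(int, int)        /* defines add_int(int, int) */
DEFINE_ADD(dbl, double)     /* defines add_dbl(double, double) */
```

Each use of DEFINE_ADD produces one complete function definition, so no semicolon follows it.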

The following is safe:
#define printf (void) printf
printf("hack\n");
but mind the spaces!



#if constant-expression-1
  group-of-lines-1
#elif constant-expression-2
  group-of-lines-2
#else
  group-of-lines-3
#endif

Conditional compilation, conditional inclusion
Example, from <stdio.h> on Solaris (indentation added)

#if __cplusplus >= 199711L
  namespace std {
    inline int getchar() { return getc(stdin); }
    inline int putchar(int _x) { return putc(_x, stdout); }
  }
#else
  #define getchar()     getc(stdin)
  #define putchar(x)    putc((x), stdout)
#endif /* __cplusplus >= 199711L */

Example, from the C Standard, to illustrate macro replacement on the #include line

#if VERSION == 1
    #define INCFILE "vers1.h"
#elif VERSION == 2
    #define INCFILE "vers2.h" // and so on
#else
    #define INCFILE "versN.h"
#endif
#include INCFILE

Better?

#if VERSION == 1
    #include "vers1.h"
#elif VERSION == 2
    #include "vers2.h" // and so on
#else
    #include "versN.h"
#endif



defined identifier
defined(identifier)
#ifdef identifier
#ifndef identifier

Exercise.  Evaluate defined(int).



#line
#pragma
#error
#
Better?

#if VERSION == 1
    #include "vers1.h"
#elif VERSION == 2
    #include "vers2.h" // and so on
#else
    #error You blew it! VERSION not recognized.
#endif
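The #else branch matters because an identifier that is not defined as a macro is treated as 0 inside #if. The sketch below leaves VERSION undefined on purpose and records which branch was taken; SELECTED and selected() are hypothetical names, and a string is used in place of #error so that the example still compiles.

```c
/* VERSION is deliberately left undefined here, so it evaluates as 0 in #if. */
#if VERSION == 1
#  define SELECTED "vers1"
#elif VERSION == 2
#  define SELECTED "vers2"
#else
#  define SELECTED "unrecognized"   /* the notes use #error at this point */
#endif

/* Reports which branch the preprocessor kept. */
static const char *selected(void) { return SELECTED; }
```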



Common techniques

To ignore large parts of a program,
#if 0
the compiler never sees this text
#endif
This is much safer than
/*
the compiler never sees this text
 */

The compilers allow object-like macros to be defined from the command line with the -D option.
cc -o prog -Dname=body prog.c
For example,
cc -o prog -DTABLESIZE=100 prog.c

prog.c
#ifndef TABLESIZE
#define TABLESIZE 20
#endif

int table1[TABLESIZE];
int table2[TABLESIZE];

for (int i = 0; i < TABLESIZE; i++)
  ...

What would happen if we used this with the previous example?
cc -o prog -DTABLESIZE prog.c

A good rule: avoid multiple-definition errors.  Multiple declarations are allowed.
To avoid rereading an include file if it has been included already, write something.h as
#ifndef SOMETHING_H
#define SOMETHING_H
...
#endif
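The effect of the guard can be sketched in a single file by writing out the contents of a hypothetical header, point.h, twice, as if it had been included twice. The names POINT_H, struct point and point_sum are assumptions made for this example.

```c
/* First "inclusion" of the hypothetical point.h. */
#ifndef POINT_H
#define POINT_H
struct point { int x, y; };
#endif

/* Second "inclusion": POINT_H is already defined, so this text is skipped
   and struct point is not defined a second time. */
#ifndef POINT_H
#define POINT_H
struct point { int x, y; };
#endif

static int point_sum(struct point p) { return p.x + p.y; }
```

Without the guard, the second copy would define struct point again, which is a multiple-definition error.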

To avoid redefinition of a macro,  [simplified from <stddef.h> on Solaris]
#ifndef NULL
#define NULL    0
#endif

To avoid redefinition of a type, use a macro,  [simplified from <stddef.h> on Solaris]
#if !defined(_SIZE_T)
#define _SIZE_T
typedef unsigned long size_t;    /* size of something in bytes */
#endif  /* !_SIZE_T */

To select one of several cases,
#define TYPE_1 0
#define TYPE_2 1
#if TYPE_1
...
#endif
#if TYPE_2
...
#endif
How can you be certain that exactly one of these applies?
#if (TYPE_1 + TYPE_2) != 1
#error oops
#endif
Better?
cc -DTYPE=n ...

#if TYPE == 1
...
#elif TYPE == 2
...
#else
#error oops
#endif
Better?
Use an int or an enumerated type, with if/else statements or a switch statement, and make the selection at runtime.
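A minimal sketch of the runtime alternative, assuming a hypothetical enumerated type variant and a function describe() in place of the TYPE_1 and TYPE_2 macros:

```c
#include <stddef.h>

/* Hypothetical case labels replacing the TYPE_1 / TYPE_2 macros. */
enum variant { VARIANT_1 = 1, VARIANT_2 = 2 };

/* Returns a description of the variant, or NULL if it is not recognized. */
static const char *describe(enum variant v)
{
    switch (v) {
    case VARIANT_1: return "first";
    case VARIANT_2: return "second";
    default:        return NULL;   /* runtime counterpart of "#error oops" */
    }
}
```

All cases are compiled and checked by the compiler every time, which is one argument for preferring this form when the choice does not have to be made at compile time.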



Example, from <assert.h> on Solaris, indentation added
#ifdef  NDEBUG

  #define assert(EX) ((void)0)

#else

  #if defined(__STDC__)

    #if __STDC_VERSION__ - 0 >= 199901L
      #define assert(EX) (void)((EX) || \
        (__assert_c99(#EX, __FILE__, __LINE__, __func__), 0))
    #else
      #define assert(EX) (void)((EX) || \
        (__assert(#EX, __FILE__, __LINE__), 0))
    #endif /* __STDC_VERSION__ - 0 >= 199901L */

  #else

    #define assert(EX) (void)((EX) || \
      (_assert("EX", __FILE__, __LINE__), 0))

  #endif  /* __STDC__ */

#endif  /* NDEBUG */

We'll have more examples of the assert macro later.

__func__ is discussed in CP:AMA, p. 333; it is an identifier, not a macro.  Within each function, __func__ behaves as if it had been declared as  static const char __func__[] = "function-name";  so its value is a character string made from the name of the function currently being compiled.
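The same ingredients (#EX, __FILE__, __LINE__ and __func__) can be combined in a macro of your own. The sketch below is a hypothetical, non-aborting check, not the Solaris implementation: CHECK, report and check_positive are names invented for this example, and the message is written into a buffer instead of being printed.

```c
#include <stdio.h>

/* Hypothetical check: evaluates to 0 when EX holds; otherwise formats a
   report into buf and evaluates to the (positive) length snprintf returns. */
#define CHECK(EX, buf, size) \
  ((EX) ? 0 : snprintf((buf), (size), "%s: check failed: %s (%s:%d)", \
                       __func__, #EX, __FILE__, __LINE__))

static char report[512];

/* Returns nonzero if the check failed and a report was written. */
static int check_positive(int n)
{
    report[0] = '\0';
    return CHECK(n > 0, report, sizeof report) != 0;
}
```

Because CHECK is a macro, __func__ and __LINE__ refer to the place where CHECK is used, which is why assert has to be a macro as well.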



Example, the swap function.

C uses pass-by-value only, but we can pass pointers.

static inline void int_swap(int *a, int *b)
{ int t = *a; *a = *b; *b = t; }

The usage would be like

int m = 5, n = 6;
int_swap(&m, &n);



Example, the swap macro/function as if we were using C++ pass-by-reference.

#define swap(a,b) int_swap(&a, &b)

The usage would be like

int m = 5, n = 6;
swap(m, n);

Question.  Will this confuse a C programmer who is not expecting to find C++-like features in the program?

Question.  If we use the wrong types, which line of code does the compiler complain about?

 1  static inline void int_swap(int *a, int *b)
 2  { int t = *a; *a = *b; *b = t; }
 3
 4  #define swap(a,b) int_swap(&a, &b)
 5
 6  int main(void)
 7  {
 8    int m = 5, n = 6;
 9    swap(m, n);
10    double x = 1, y = 2;
11    swap(x, y);
12    return 0;
13  }
 
% gcc -Wall -Wextra x.c
x.c: In function 'main':
x.c:11: warning: passing argument 1 of 'int_swap' from incompatible pointer type
x.c:11: warning: passing argument 2 of 'int_swap' from incompatible pointer type



Exercise.  Consider the swap macro discussed earlier.

#define swap(x,y) \
  do { int t = x; x = y; y = t; } while (0)

Some people might prefer it to be written this way.

#define swap(x,y) \
  do { int _t = x; x = y; y = _t; } while (0)

Why?  Is the second version really better?



<tgmath.h>, Type-Generic Math in C99

The non-generic math functions in <math.h> look like

double      sqrt (double x);
float       sqrtf(float x);
long double sqrtl(long double x);

The non-generic math functions in <complex.h> look like

double      complex csqrt (double complex x);
float       complex csqrtf(float complex x);
long double complex csqrtl(long double complex x);

The macros in <tgmath.h> allow you to write something like

pick_a_type y, z;
y = something;
z = sqrt(y);

You can now change the declared type of y and z without changing the rest of the code.
Here is a simple implementation that distinguishes float, double and long double (provided the three types have different sizes), but not the complex versions:

#define sqrt(x) \
  ((sizeof(x) == sizeof(double)) ? sqrt(x) : \
   (sizeof(x) == sizeof(float)) ? sqrtf(x) : sqrtl(x))

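Note that the preprocessor only performs the textual substitution; the sizeof comparisons are constant expressions that the compiler can evaluate at compile time, so there is normally no extra cost at runtime.  Since the preprocessor does not expand the name sqrt again inside its own expansion, there is no recursive explosion.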
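The rule that a macro's own name is not expanded again inside its expansion can be seen without the math library. The sketch below uses a hypothetical function twice() and a macro of the same name that counts how often the macro is used; macro_uses and demo() are also invented for this example.

```c
static int macro_uses = 0;

static int twice(int n) { return 2 * n; }

/* Hypothetical wrapper with the same name as the function it calls.
   Inside its own expansion, "twice" is left alone, so it names the function. */
#define twice(n) (macro_uses++, twice(n))

static int demo(void)
{
    int a = twice(4);      /* goes through the macro: counts one use */
    int b = (twice)(5);    /* name not followed by '(': calls the function directly */
    return a + b;
}
```

The parenthesized form (twice)(5) is a standard way to reach the underlying function when a function-like macro of the same name is in scope.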

In C11, generic type-matching moves from the preprocessor to the language itself, and would look like
#define sqrt(x) _Generic((x), \
    long double: sqrtl, \
    default: sqrt, \
    float: sqrtf \
  )(x)
The generic selection of sqrt, sqrtf or sqrtl is determined from the type of x, without having to resort to the subterfuge of sizeof.  The selection is made by the compiler, not by the preprocessor, so there's more flexibility in the available types.
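To see generic selection at work without linking the math library, the sketch below maps the type of an expression to a string. It assumes a C11 compiler; type_name and name_of_sum are hypothetical names, not part of any standard header.

```c
/* Hypothetical macro: selects a string according to the type of x.
   The selection is made by the compiler; x itself is not evaluated. */
#define type_name(x) _Generic((x), \
    float:       "float",          \
    double:      "double",         \
    long double: "long double",    \
    default:     "other")

/* float + double is converted to double by the usual arithmetic conversions. */
static const char *name_of_sum(void)
{
    float f = 1.0f;
    double d = 2.0;
    return type_name(f + d);
}
```

Unlike the sizeof version, this distinguishes double from long double even on platforms where the two types have the same size.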



An excerpt from GCC's <tgmath.h>, Type-Generic Math in C99
/*
 *      ISO C99 Standard: 7.22 Type-generic math        <tgmath.h>
 */

#ifndef _TGMATH_H
#define _TGMATH_H       1

/* Include the needed headers.  */
#include <math.h>
#include <complex.h>


/* Since `complex' is currently not really implemented in most C compilers
   and if it is implemented, the implementations differ.  This makes it
   quite difficult to write a generic implementation of this header.  We
   do not try this for now and instead concentrate only on GNU CC.  Once
   we have more information support for other compilers might follow.  */

# ifdef __NO_LONG_DOUBLE_MATH
#  define __tgml(fct) fct
# else
#  define __tgml(fct) fct ## l
# endif


/* This is ugly but unless gcc gets appropriate builtins we have to do
   something like this.  Don't ask how it works.  */

/* 1 if 'type' is a floating type, 0 if 'type' is an integer type.
   Allows for _Bool.  Expands to an integer constant expression.  */
# define __floating_type(type) (((type) 0.25) && ((type) 0.25 - 1))

/* The tgmath real type for T, where E is 0 if T is an integer type and
   1 for a floating type.  */
# define __tgmath_real_type_sub(T, E) \
  __typeof__(*(0 ? (__typeof__ (0 ? (double *) 0 : (void *) (E))) 0    \
                 : (__typeof__ (0 ? (T *) 0 : (void *) (!(E)))) 0))

/* The tgmath real type of EXPR.  */
# define __tgmath_real_type(expr) \
  __tgmath_real_type_sub(__typeof__(expr), __floating_type(__typeof__(expr)))


/* We have two kinds of generic macros: to support functions which are
   only defined on real valued parameters and those which are defined
   for complex functions as well.  */
# define __TGMATH_UNARY_REAL_ONLY(Val, Fct) \
     (__extension__ ({ __tgmath_real_type (Val) __tgmres;             \
                       if (sizeof (Val) == sizeof (double)            \
                           || __builtin_classify_type (Val) != 8)     \
                         __tgmres = Fct (Val);                        \
                       else if (sizeof (Val) == sizeof (float))       \
                         __tgmres = Fct##f (Val);                     \
                       else                                           \
                         __tgmres = __tgml(Fct) (Val);                \
                       __tgmres; }))

...

/* XXX This definition has to be changed as soon as the compiler understands
   the imaginary keyword.  */
# define __TGMATH_UNARY_REAL_IMAG(Val, Fct, Cfct) \
     (__extension__ ({ __tgmath_real_type (Val) __tgmres;                     \
                       if (sizeof (__real__ (Val)) > sizeof (double)          \
                           && __builtin_classify_type (__real__ (Val)) == 8)  \
                         {                                                    \
                           if (sizeof (__real__ (Val)) == sizeof (Val))       \
                             __tgmres = __tgml(Fct) (Val);                    \
                           else                                               \
                             __tgmres = __tgml(Cfct) (Val);                   \
                         }                                                    \
                       else if (sizeof (__real__ (Val)) == sizeof (double)    \
                                || __builtin_classify_type (__real__ (Val))   \
                                   != 8)                                      \
                         {                                                    \
                           if (sizeof (__real__ (Val)) == sizeof (Val))       \
                             __tgmres = Fct (Val);                            \
                           else                                               \
                             __tgmres = Cfct (Val);                           \
                         }                                                    \
                       else                                                   \
                         {                                                    \
                           if (sizeof (__real__ (Val)) == sizeof (Val))       \
                             __tgmres = Fct##f (Val);                         \
                           else                                               \
                             __tgmres = Cfct##f (Val);                        \
                         }                                                    \
                       __tgmres; }))

...

/* Compute base-2 logarithm of X.  */
#define log2(Val) __TGMATH_UNARY_REAL_ONLY (Val, log2)

/* Return the square root of X.  */
#define sqrt(Val) __TGMATH_UNARY_REAL_IMAG (Val, sqrt, csqrt)

...




Stack Overflow



Last revised, 24 Jan. 2013