CMPSC 311, Introduction to Systems Programming

The C Standard



Reading
References



The C Standard describes the syntax, semantics and execution environment of the language and its associated libraries.
The official designation of the C99 standard is ISO/IEC 9899:1999; the newly approved version, C11, is ISO/IEC 9899:2011, which replaces it.

The standard is produced by ISO/IEC JTC 1 / SC 22 / WG 14, http://www.open-std.org/jtc1/sc22/wg14/.
"All programming languages have constructs that are undefined, imperfectly defined, implementation-dependent, or difficult to use correctly.  As a result, software programs can execute differently than intended by the writer.  In some cases, these vulnerabilities can be exploited by an attacker to compromise the safety, security, and privacy of a system.

ISO/IEC JTC 1/SC 22/WG 23 is preparing comparative guidance spanning multiple programming languages, so that application developers will be better able to avoid the programming errors that lead to vulnerabilities in these languages and their attendant consequences.  This guidance can also be used by developers to select source code evaluation tools that can discover and eliminate coding errors that lead to vulnerabilities.

The project is preparing an ISO/IEC Technical Report containing guidance to users of programming languages on how to avoid the vulnerabilities that exist in the programming language selected for a particular project.  The document is tentatively scheduled for publication in 2010."  [and it was]

Why do we need a standard for a programming language?
We need not just "some idea" of what a program will do, but "as clear an idea as possible", without overly constraining the design of computer systems.  Together, these goals allow a program to be portable from one system to another.

In general, the standard describes what is required, what is prohibited, and what is allowed within certain ranges.

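In detail, the standard specifies the representation of C programs; the syntax and constraints of the C language; the semantic rules for interpreting C programs; the representation of input data to be processed by C programs and of output data produced by them; and the restrictions and limits imposed by a conforming implementation of C.
The standard is divided into preliminary elements (scope, normative references, terms and definitions, and conformance, some of which are discussed here), and three main clauses, Environment, Language, and Library, followed by a set of annexes.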
The C Standard is like a contract between implementers and users of the programming language and its associated tools.  In some countries, there may actually be legal consequences if you assert that your software is "Standards-Conforming".  In most large companies, the claim of being standards-conforming may be an important part of bidding on a software development project.

But, there is also a lot that is not specified.  By concentrating on the observable features of programs, the Standard avoids discussing the internals of compilers and instruction set architectures, and the use, limits and design of operating systems and input/output subsystems.

The Standard is accompanied by a non-binding Rationale document that explains the decisions of the C Standard Committee (up to April 2003).



Historical timeline

For comparison, see also the C++ timeline and the timeline for the IEEE Floating-Point Arithmetic Standard.



Syntax and semantics

The syntax of a language describes the form of its constructs.  In English, that would be phrases and sentences, built upon the letters and words of the language.  In a programming language, that would be definitions, expressions, statements, etc., built upon the symbols and keywords of the language.  The syntax of English is rather flexible, but the syntax of a programming language should be precisely defined; some languages allow more flexibility than others.  The syntax of a language is usually described by a grammar, but there may be additional constraints on the constructs that are not expressed directly in the grammar.

The semantics of a language describes the meaning of its constructs.  The more completely or precisely a meaning is described, the more useful it can be.  For a programming language, its semantics describe what happens when a program is executed.

Some more details of the phrase "what happens" include
There is no reason to describe the semantics of a syntactically incorrect program.

Sometimes a part of the language is context-sensitive, so its syntax or semantics can only be determined in the context of some additional parts.  The structure or meaning of something might not be clear until you see the whole of it.  In the worst case, this would make it difficult to compose a large program from smaller modules.

For example, a comma character can appear in at least seven different contexts in C, depending on its surrounding text.
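The short sketch below gathers eight of these contexts in one place; the names MAX, add, comma_contexts, and enum color are made up for the illustration.  Only one of the commas, inside the parentheses that initialize k, is the comma operator; the others are punctuation that separates items in a list.

```c
/* Hypothetical names throughout; each comment names the comma's context. */

/* 1. Separates parameters in a macro definition. */
#define MAX(a, b) ((a) > (b) ? (a) : (b))

/* 2. Separates enumerators. */
enum color { RED, GREEN, BLUE };

/* 3. Separates parameters in a function definition. */
int add(int x, int y)
{
    return x + y;
}

int comma_contexts(void)
{
    int i, j;                        /* 4. separates declarators */
    int a[3] = { 1, 2, 3 };          /* 5. separates initializers */
    int k = (i = 1, j = 2, i + j);   /* 6. comma operator: k is 3 */

    /* 7. separates function arguments; 8. separates macro arguments */
    return add(a[0], a[2]) + MAX(i, j) + k + BLUE;   /* 4 + 2 + 3 + 2 */
}
```

Without the parentheses, the commas in the initialization of k would be read as separators between declarators, and the compiler would reject the line.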
In English, ambiguous syntax and semantics can be entertaining, such as "Time flies like an arrow."  (Which is the verb, time, flies, or like?)  In a programming language, ambiguous syntax can be detected by a compiler, but ambiguous or undefined semantics is dangerous, and should be avoided.

C makes a distinction between the translation environment, where the compiler runs, and the execution environment, where the compiled program runs.  These are often the same, but that's not a requirement.  The execution environment often comes with an operating system, but that's not a requirement either, as the program might be the operating system, or might be executed from read-only memory.



Categories of behavior as specified in the C Standard

The term behavior describes the external appearance or action of a program or program component.  This much is observable when the program is compiled or runs.

The term implementation includes the compiler, header files, runtime library, and the execution environment in general. 
An implementation or program can follow the standard to varying degrees.  Strictly conforming programs are intended to be maximally portable among conforming implementations.  Conforming programs may depend on nonportable features of a conforming implementation.
Annex J of the C Standard (Portability issues) has a complete list of the behaviors which are described as unspecified, undefined, implementation-defined, or locale-specific.
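As a sketch of three of those categories (the function names are hypothetical), the code below queries an implementation-defined property, whether plain char is signed, through <limits.h>; avoids the undefined behavior of signed integer overflow by testing the operands before adding; and sidesteps the unspecified order in which function arguments are evaluated.

```c
#include <limits.h>    /* CHAR_MIN, INT_MAX, INT_MIN */
#include <stdbool.h>   /* bool (C99) */

/* Implementation-defined: whether plain char is signed. Each implementation
   documents its choice, and <limits.h> reports it through CHAR_MIN. */
bool plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}

/* Undefined: signed integer overflow. Checking a + b after the fact is too
   late, so test the operands against the limits before adding. */
bool add_would_overflow(int a, int b)
{
    return (b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b);
}

/* Helper: increments *counter and returns the new value. */
static int next(int *counter)
{
    return ++*counter;
}

/* Unspecified: the order in which function arguments are evaluated, so a
   call like f(next(&n), next(&n)) might receive (1, 2) or (2, 1).
   Separate statements fix the order. */
int ordered_difference(void)
{
    int n = 0;
    int first = next(&n);    /* always 1 */
    int second = next(&n);   /* always 2 */
    return second - first;   /* always 1 */
}
```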

The implementation and environment can be hosted or freestanding; the distinction is, roughly, with or without an operating system.  For example, an embedded system or the OS kernel itself would be a freestanding environment, while a workstation would be a hosted implementation.
Some features of the language and library are described as obsolescent, to indicate that they could be withdrawn in the future.  These features should be avoided in new programs.
Most compilers have compile-time switches or options that allow the programmer to select from various levels of conformance, hosted or freestanding, and so on.
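A program can also ask which kind of implementation compiled it: C99 added the predefined macro __STDC_HOSTED__, which is 1 for a hosted implementation and 0 for a freestanding one.  With GCC, for instance, -std=c99 or -std=c11 selects a version of the Standard, -pedantic requests all the diagnostics the Standard requires, and -ffreestanding compiles for a freestanding environment.  A minimal sketch, with a hypothetical function name:

```c
/* __STDC_HOSTED__ is predefined by C99 and later compilers: 1 for a hosted
   implementation, 0 for a freestanding one. is_hosted is a hypothetical name. */
int is_hosted(void)
{
#if defined(__STDC_HOSTED__) && __STDC_HOSTED__ == 1
    return 1;   /* the full library, including <stdio.h>, is available */
#else
    return 0;   /* freestanding, or a pre-C99 compiler without the macro */
#endif
}
```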

Categories of behavior as specified in the Posix Standard

more later ...

Categories of behavior as specified in the C++ Standard

more later ...



An informal statement of C's design philosophy



Some guiding philosophy used by the C Standard Committee

For the C89/C90 process
For the C99 process
For the C1X process



What were some of the changes from K&R C, traditional C, to C89?
What were some of the changes from C89 to C99?
What are some of the changes from C99 to C11?
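Full answers to these questions are long, but a few of the additions are easy to show.  This sketch (names hypothetical) compiles as C99 or later but not as C89; the C11 part is guarded by __STDC_VERSION__ so that a C99 compiler skips it.

```c
#include <limits.h>   /* CHAR_BIT */

struct point { int x, y; };

/* Features new in C99 (not valid C89): // comments, designated
   initializers, declarations in a for header, declarations after
   statements, compound literals, and the long long type. */
long long c99_features(int n)
{
    struct point p = { .y = 2, .x = 1 };   // designated initializer
    long long total = 0;                   // long long is new in C99
    for (int i = 0; i < n; i++)            // declaration in the for header
        total += i;
    int extra = (struct point){ 3, 4 }.y;  // declaration after a statement; compound literal
    return total + p.x + p.y + extra;
}

#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 201112L
/* Features new in C11, compiled only when the compiler claims C11:
   static assertions and type-generic selection. */
_Static_assert(sizeof(long long) * CHAR_BIT >= 64,
               "long long must have at least 64 bits");
#define TYPE_NAME(x) _Generic((x), int: "int", double: "double", default: "other")
#endif
```

For example, c99_features(4) returns (0 + 1 + 2 + 3) + 1 + 2 + 4 = 13, and under C11 TYPE_NAME(1.0) yields "double".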



Keywords in C89 (32)

auto
break
case
char
const
continue
default
do
double
else
enum
extern
float
for
goto
if
int
long
register
return
short
signed
sizeof
static
struct
switch
typedef
union
unsigned
void
volatile
while

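Keywords in C99 (37): the 32 keywords of C89, plus

inline
restrict
_Bool
_Complex
_Imaginary

Keywords in C11 (44): the 37 keywords of C99, plus

_Alignas
_Alignof
_Atomic
_Generic
_Noreturn
_Static_assert
_Thread_local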
Note that the C Preprocessor has an additional set of keyword-like symbols, such as ifdef and defined, but these are only recognized in preprocessor directives.



The C Standard headers

As of C11, the headers that describe the core library, which must be available even in a freestanding implementation, are <float.h>, <iso646.h>, <limits.h>, <stdalign.h>, <stdarg.h>, <stdbool.h>, <stddef.h>, <stdint.h>, and <stdnoreturn.h>.  The third column shows the version of the C Standard that first required the header, if not C89.  The chapter, section and table numbers refer to the course textbooks (CP:AMA, C:ARM, APUE).  See also APUE Fig. 2.1, which needs to be updated for Solaris 10.

Header           Standard description                                 Version  CP:AMA       C:ARM       APUE
<assert.h>       Diagnostics                                                   Sec. 24.1    Sec. 19.1
<complex.h>      Complex arithmetic                                   C99      Sec. 27.4    Ch. 23
<ctype.h>        Character handling                                            Sec. 23.5    Ch. 12
<errno.h>        Errors                                                        Sec. 24.2    Sec. 11.2   Sec. 1.7
<fenv.h>         Floating-point environment                           C99      Sec. 27.6    Ch. 22
<float.h>        Characteristics of floating types                             Sec. 23.1    Table 5-3   Sec. 2.5.1
<inttypes.h>     Format conversion of integer types                   C99      Sec. 27.2    Ch. 21
<iso646.h>       Alternative spellings                                C95      Sec. 25.3    Sec. 11.5
<limits.h>       Sizes of integer types                                        Sec. 23.2    Table 5-2   Sec. 2.5
<locale.h>       Localization                                                  Sec. 25.1    Ch. 10
<math.h>         Mathematics                                                   Sec. 23.3-4  Ch. 17
<setjmp.h>       Nonlocal jumps                                                Sec. 24.4    Sec. 19.4   Sec. 7.10
<signal.h>       Signal handling                                               Sec. 24.3    Sec. 19.6   Ch. 10
<stdalign.h>     Alignment                                            C11
<stdarg.h>       Variable arguments                                            Sec. 26.1    Sec. 11.4
<stdatomic.h>    Atomics                                              C11
<stdbool.h>      Boolean type and values                              C99      Sec. 21.5    Sec. 11.3
<stddef.h>       Common definitions                                            Sec. 21.4    Sec. 11.1
<stdint.h>       Integer types                                        C99      Sec. 27.1    Ch. 21
<stdio.h>        Input/output                                                  Sec. 22.1-8  Ch. 15      Ch. 5
<stdlib.h>       General utilities                                             Sec. 26.2    Ch. 16
<stdnoreturn.h>  _Noreturn                                            C11
<string.h>       String handling                                               Sec. 23.6    Ch. 13
<tgmath.h>       Type-generic math                                    C99      Sec. 27.5    Sec. 17.12
<threads.h>      Threads                                              C11
<time.h>         Date and time                                                 Sec. 26.3    Ch. 18      Sec. 6.10
<uchar.h>        Unicode utilities                                    C11
<wchar.h>        Extended multibyte/wide character utilities          C95      Sec. 25.5    Ch. 24
<wctype.h>       Wide character classification and mapping utilities  C95      Sec. 25.6    Ch. 24


The following table is derived from the C Standard, Annex B, Library summary.

Header           Types defined
<assert.h>
<complex.h>      complex and imaginary are macros that expand to the keywords _Complex and _Imaginary
<ctype.h>
<errno.h>
<fenv.h>         fenv_t, fexcept_t
<float.h>
<inttypes.h>     imaxdiv_t
<iso646.h>
<limits.h>
<locale.h>       struct lconv
<math.h>         float_t, double_t
<setjmp.h>       jmp_buf
<signal.h>       sig_atomic_t
<stdalign.h>
<stdarg.h>       va_list
<stdatomic.h>    too many to list
<stdbool.h>      bool is a macro that expands to _Bool
<stddef.h>       ptrdiff_t, size_t, wchar_t
<stdint.h>       intptr_t, uintptr_t, intmax_t, uintmax_t,
                 intN_t, uintN_t, int_leastN_t, uint_leastN_t,
                 int_fastN_t, uint_fastN_t   (N = 8, 16, 32, 64)
<stdio.h>        size_t, FILE, fpos_t
<stdlib.h>       size_t, wchar_t, div_t, ldiv_t, lldiv_t
<stdnoreturn.h>
<string.h>       size_t
<tgmath.h>
<threads.h>      too many to list
<time.h>         size_t, clock_t, time_t, struct tm
<uchar.h>        mbstate_t, size_t, char16_t, char32_t
<wchar.h>        wchar_t, size_t, mbstate_t, wint_t, struct tm
<wctype.h>       wint_t, wctrans_t, wctype_t
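The <stdint.h> types pair with the format macros in <inttypes.h>.  Because int64_t may be long on one system and long long on another, printf needs a different conversion specifier on each; the PRId64 macro supplies the right one as a string literal.  A minimal sketch, with a hypothetical function name:

```c
#include <inttypes.h>   /* PRId64; also includes <stdint.h> for int64_t */
#include <stdio.h>      /* snprintf, size_t */

/* Format an int64_t without guessing whether it is long or long long.
   Returns the number of characters snprintf reports (excluding '\0'). */
int format_int64(char *buf, size_t len, int64_t value)
{
    return snprintf(buf, len, "%" PRId64, value);
}
```

For example, format_int64(buf, sizeof buf, INT64_C(-9000000000)) stores "-9000000000" in a large enough buf and returns 11.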



Extensions to the C Standard

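An implementation can support additional types, language features and library functions, provided they do not alter the behavior of any strictly conforming program, and the implementation must document them.  Such an implementation can still be conforming; however, a program that uses the extensions is no longer strictly conforming.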

Here are some examples:  nested functions, typeof, insertion of assembly code, access to "unusual" types such as Intel's MMX/SSE and Motorola's AltiVec.  For more examples, see the GCC extensions list in the References, or Annex J.5 of the C Standard (Portability issues, Common extensions).

If you want to claim the highest level of portability, don't use extensions to the language.  If you are writing an operating system or a compiler, or a modern graphics library, the usual extensions are necessary, but be aware that they introduce implementation dependencies.
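One common way to use an extension without giving up portability entirely is to guard it with a compiler-specific predefined macro and provide a Standard C fallback.  The sketch below assumes GCC, or a compiler such as Clang that also defines __GNUC__ and accepts GNU statement expressions and __typeof__; the macro and function names are hypothetical.

```c
#if defined(__GNUC__)
/* GNU extensions: a statement expression ({ ... }) and __typeof__, so each
   argument is evaluated exactly once. __extension__ silences -pedantic. */
#define EVALUATES_ONCE 1
#define MAX_ONCE(a, b) __extension__ ({   \
        __typeof__(a) max_a_ = (a);       \
        __typeof__(b) max_b_ = (b);       \
        max_a_ > max_b_ ? max_a_ : max_b_; })
#else
/* Portable Standard C fallback: an argument may be evaluated twice. */
#define EVALUATES_ONCE 0
#define MAX_ONCE(a, b) ((a) > (b) ? (a) : (b))
#endif

/* Shows the implementation dependency: how many times is i++ evaluated? */
int max_with_side_effect(void)
{
    int i = 5;
    int m = MAX_ONCE(i++, 3);
    return m * 10 + i;   /* 56 with the GNU version, 67 with the fallback */
}
```

The GNU version evaluates i++ once, so the function returns 56; the fallback evaluates it twice and returns 67.  The difference is exactly the kind of implementation dependency an extension introduces.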



Programming support tools

Some typical tools that are not included with or specified by the C Standard
Some of the programming examples and projects to be described later would be part of a full inquiry and validation suite.

The idea of a C program checker to supplement a compiler goes back to 1979 with the lint program, still available in modern form on Solaris.  C is a flexible language, but it can be pushed into some highly questionable use.  Lint, and other tools like splint or cqual, can check whole programs for inconsistent or suspicious usages.

The C Standard library is incorporated into the Posix Standard, which extends it with many more headers and functions.  Note that Posix specifies a c99 compiler utility, including some of its command-line options, but the C Standard says nothing about how a compiler is invoked.
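A minimal sketch of the Posix layer on top of Standard C, assuming a Posix system (<unistd.h> is not a Standard C header): defining the feature-test macro _POSIX_C_SOURCE before any header requests the POSIX.1-2008 interfaces, and _POSIX_VERSION reports the version the system supports.  The function name is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L   /* request POSIX.1-2008; must precede all headers */
#include <unistd.h>               /* Posix header, not part of Standard C */

/* Returns the Posix version the system claims, or -1 if it claims none. */
long posix_version(void)
{
#ifdef _POSIX_VERSION
    return _POSIX_VERSION;   /* e.g., 200809L for POSIX.1-2008 */
#else
    return -1;
#endif
}
```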



What is not described by the C Standard



Coding example

How do you know which version of the C Standard is being used, if any?  From outside the program, the compiler command can select which version of the C Standard to use; the default would be stated in the compiler documentation.  From inside the program, there are predefined macros that can help determine which version was used by the compiler, and this makes it possible for one source file to contain code for several different versions of the Standard.

Here is an example, where we want to write a function that will work with both standard and non-standard versions of the C compiler.

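#include <stdio.h>

#ifdef __STDC__
  /* Standard C: void and the predefined macro __DATE__ are available. */
  void print_date_compiled(void)
  {
    printf("%s\n", __DATE__);   /* __DATE__ has the form "Mmm dd yyyy" */
  }
#else
  /* Not Standard C: void and __DATE__ might not be available, */
  /* so the function returns int and prints a placeholder.     */
  int print_date_compiled()
  {
    printf("(unknown)\n");
    return 0;
  }
#endif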

For more details and more examples, see CP:AMA, Sec. 14.3, Macro Definitions, esp. pp. 329-331, Predefined Macros, or C:ARM, Ch. 3, The C Preprocessor (especially Sec. 3.3.4, 3.9), and Sec. 10.1, 10.2.
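The same idea extends to telling the versions apart.  __STDC__ is defined (as 1) by a conforming implementation; __STDC_VERSION__ was introduced by the 1995 amendment and has the values 199409L (C95), 199901L (C99), and 201112L (C11).  The sketch below, with a hypothetical function name, maps these macros to a version name.

```c
/* Map the predefined macros to the version of the Standard in use. */
const char *c_standard_version(void)
{
#if !defined(__STDC__)
    return "pre-Standard C";
#elif !defined(__STDC_VERSION__)
    return "C89/C90";          /* __STDC_VERSION__ did not exist yet */
#elif __STDC_VERSION__ >= 201112L
    return "C11 or later";
#elif __STDC_VERSION__ >= 199901L
    return "C99";
#elif __STDC_VERSION__ >= 199409L
    return "C95";
#else
    return "C89/C90";
#endif
}
```

Compiling the same file with, say, gcc -std=c89 and then gcc -std=c99 should select different branches.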

The following is an exhaustive example, intended to gather information about the compiler and some of the available compiling options.  The table lists the operating systems and compilers it was used with.
Operating System   Compiler
Solaris 10         Sun's, GCC 3.4.3
Solaris 9          Sun's, GCC 3.3
GNU/Linux 2.6.9    GCC 3.4.6
GNU/Linux 2.4.21   GCC 3.2.3
Mac OS X 10.4      GCC 4.0.1
Mac OS X 10.5      GCC 4.0.1
Mac OS X 10.6      GCC 4.2.1
Mac OS X 10.8      GCC 4.2.1
Microsoft Windows  Visual C++ (Visual Studio)
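Part of gathering information about the compiler is identifying it.  Each vendor predefines its own macros: GCC defines __GNUC__, __GNUC_MINOR__, and __GNUC_PATCHLEVEL__; Sun's compiler defines __SUNPRO_C; Microsoft's defines _MSC_VER; and Clang defines __clang__ along with __GNUC__, which is why the sketch below tests __clang__ first.  The function name is hypothetical, and the sketch assumes C99 snprintf, which some older Microsoft libraries did not provide.

```c
#include <stdio.h>   /* snprintf, size_t */

/* Write a short description of the compiler into buf; returns the length
   snprintf reports. Clang is tested first because it also defines __GNUC__. */
int describe_compiler(char *buf, size_t len)
{
#if defined(__clang__)
    return snprintf(buf, len, "Clang %d.%d.%d",
                    __clang_major__, __clang_minor__, __clang_patchlevel__);
#elif defined(__GNUC__)
    return snprintf(buf, len, "GCC %d.%d.%d",
                    __GNUC__, __GNUC_MINOR__, __GNUC_PATCHLEVEL__);
#elif defined(__SUNPRO_C)
    return snprintf(buf, len, "Sun C 0x%x", __SUNPRO_C);
#elif defined(_MSC_VER)
    return snprintf(buf, len, "Microsoft C/C++ %d", _MSC_VER);
#else
    return snprintf(buf, len, "unknown compiler");
#endif
}
```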




Standards and Portability

Hardware eventually becomes old and obsolete.  If your programs are tied closely to the hardware, they will also become old and obsolete.  A "high-level" programming language provides a useful abstraction of a processor and memory.  An operating system provides useful abstractions for additional devices, and the management of resources.  A standard for the language, and another standard for the interfaces to the operating system, give some assurance that the abstractions are useful on more than one kind of computer system, and will survive over time.

If your programs are carefully designed and well-written, they can be improved over time.  If your programs are portable, they can be moved to new hardware with no modification other than recompiling.  If the operating system and compiler are transportable, they can be moved to new hardware with relatively little modification, and then recompiled.  Now your programs have a chance to avoid becoming old and obsolete.

Design and Implementation

The combined history of the C and C++ languages shows that design and implementation go together.  This is typical of experimental programming.  Moreover, that history suggested the rule that no feature would be added to the standard language unless it was already known to work and be useful.



References
Wikipedia



Last revised, 22 Jan. 2013