CMPSC 311, Introduction to Systems Programming
Types and Objects
Scope, Linkage, Lifetime, Location
Reading
- CP:AMA
- Ch. 7, Basic Types
- Sec. 8.3, Variable-Length Arrays (C99)
- Ch. 10, Program Organization
- Ch. 15, Writing Large Programs
- Ch. 16, Structures, Unions and Enumerations
- Ch. 18, Declarations, incl. the Q&A
- Ch. 20, Low-Level Programming
- C:ARM
- Sec. 2.5, Identifiers; Sec. 2.6, Keywords
- Sec. 3.3.4, Predefined Macros
- Ch. 4, Declarations (introductory part); Sec. 4.1, Organization of Declarations; Sec. 4.2, Terminology; Sec. 4.3, Storage Class and Function Specifiers; Sec. 4.4, Type Specifiers and Qualifiers; Sec. 4.8, External Names
- Ch. 5, Types (introductory part)
- Sec. 7.1, Objects, Lvalues, and Designators
References
- The C Standard, Sec. 3, Terms, definitions, and symbols
- The C Standard, Sec. 6, Language
- Sec. 6.2 Concepts
- 6.2.1 Scopes of identifiers
- 6.2.2 Linkages of identifiers
- 6.2.3 Name spaces of identifiers
- 6.2.4 Storage durations of objects
- 6.2.5 Types
- Sec. 6.3 Conversions
- Sec. 6.7 Declarations
- Sec. 6.8 Statements and blocks
- 6.8.2 Compound statement
- 6.8.4 Selection statements
- 6.8.5 Iteration statements
- and a few others
An expression is a sequence of operators and operands (with punctuators) that
- specifies computation of a value, or
- designates an object or a function, or
- generates side effects, or
- performs a combination of these.
A type is a set of values and a set of operations on those values.
- A variable or expression "has type T " when its values are
constrained to the domain of T
(the set of values of type T).
- The type of a variable is given by the variable's declaration.
- The type of an expression is given by the definitions of the
expression's operators, using the types of its operands.
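- The void type has no values and no operations. A void expression is evaluated only for its side effects, and its value is discarded. Casting an expression to void, as below, explicitly discards its value (here, the count that printf returns).
    (void) printf("n is now %d\n", ++n);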
- The C99 type _Bool has values 0 and 1, and operations ...
- See <stdbool.h> for macros bool, true, false.
Types in C are partitioned into
- object types -- types that fully describe objects
- function types -- types that describe functions
- incomplete types -- types that describe objects but lack
information needed to determine their sizes
- Note that "object" here means only some data, not an object as in C++. More later.
- Note that, in C, a function is not an object.
- A function type describes a function only as seen in its prototype, not as seen in its header and body. That's one reason why function types do not fully describe functions.
- The basic types are char, the signed and unsigned integer types, and the floating types.
- The standard signed integer types are signed char, short int, int, long int, and long long int.
- The extended signed integer types are implementation-defined.
- Together, these are the signed integer types.
- etc.
- _Bool in C99 is an unsigned integer type.
- An enumerated type is an integer type, but not a basic type.
- See CP:AMA pp. 136-137, or C:ARM Table 5-1, or the C Standard, for a more detailed classification of the arithmetic types.
- Derived types are pointers, arrays, structures, unions, and functions.
- A function type is derived from the type of its return value and the types of its parameters.
- Atomic types were added in C11 for support of multi-threaded programs; this affects the load/modify/store operations.
- The void type is an incomplete type that cannot be completed.
- An array type of unknown size is an incomplete type. It is completed, for an identifier of that type, by specifying the size in a later declaration (with internal or external linkage).
    extern int A[];   // incomplete, in one source file
    int A[10];        // completed, in another source file
- A structure or union type of unknown content is an incomplete type. It is completed, for all declarations of that type, by declaring the same structure or union tag with its defining content later in the same scope.
    struct foo;                 // incomplete
    struct foo { int bar; };    // completed
struct foo is the name of the type. foo is a structure tag.
- You cannot write the type name only as foo, which would be allowed in C++. But you could write
    typedef struct foo foo;
    foo * x;
    struct foo * y;
Even while struct foo (and so the typedef name foo) is still incomplete, the pointer type foo * is complete. We will see later that foo (the structure tag) and foo (the typedef name) are in different name spaces, so there is no syntactic confusion.
- Incomplete types are often used as forward references; see C:ARM Sec. 4.2.3 for a linked list example.
The meaning of a value stored in an object or returned by a function is determined by the type of the expression used to access it.
- An identifier declared to be an object is the simplest such expression; the type is specified in the declaration of the identifier.
- Data in C has no inherent type. If you can access the data in more than one way, then the meaning of "its value" is not unique.
- An object can be modified by going through an identifier.
    unsigned int a = 0;
    a = 1;
- An object can be modified by going through an expression. This is legal, and does not produce a compiler warning.
    unsigned int a = 0;
    *(int *)&a = -1;
- Exercise. Find a better way to set a to "all 1 bits".
A declaration specifies the interpretation and attributes of a set of identifiers.
- An identifier is associated with the abstract properties of some C construct, such as a variable, function or type.
A definition of an identifier is a declaration for that identifier that
- for an object, causes storage to be reserved for that object;
- for a function, includes the function body;
- for an enumeration constant or typedef name, is the (only) declaration of the identifier.
- An identifier is associated with the concrete representation of some C construct.
Notes
- A function prototype is a declaration, but not a definition.
- An object or function can be declared multiple times, as long as the declarations are the same or at least consistent.
- There is a formal definition of "consistent" in the C Standard (compatible types).
- An object or function can be defined only once.
A declaration specifies the linkage, the storage duration, the scope, and (part of) the type of the entities that are being declared.
- Linkage helps to connect declarations to definitions.
- relevant keywords: extern, static
- Storage duration is the length of time that an object's storage is allocated.
- Equivalent terms are extent and lifetime.
- Functions in C essentially have permanent storage duration.
- relevant keywords: auto, extern, static, register, _Thread_local (in C11)
- Scope defines the region of the program text over which something is defined and accessible.
- Related concepts are visibility and name spaces.
- Most operations on objects need complete type information, so there are rules for completing incomplete types.
The placement of a declaration can affect the properties of the identifier being declared.
- Top-level declarations are outside of any function definition.
- These can declare or define variables, functions, types, etc.
- Function definitions contain parameter declarations, and a body.
- Function bodies contain blocks.
- Blocks may contain inner blocks. (Blocks can be nested.)
- Blocks may contain declarations.
- A declaration in an inner block hides a declaration of the same identifier in an outer block or at the top level.
- A declaration is visible throughout its scope, except when it is hidden by another declaration of the same identifier in an inner scope.
- Storage for a hidden identifier remains allocated, but it becomes inaccessible by name until the program leaves the inner scope.
Declarations for object identifiers specify a type, with type qualifiers, and a storage class specifier.
- For example,
    extern const int foo;
- storage class specifier: extern
- type qualifier: const
- type specifier: int
- identifier: foo
A declarator describes the properties of an identifier, in a declaration.
- used for variables, functions, types
- See CP:AMA Sec. 18.1-4, or C:ARM Sec. 4.5, for more info.
A type specifier describes the properties of a type, in a declaration.
- used for type tags, structure and union components, enumeration constants
We'll ignore statement labels and preprocessor macros for now, as they do not represent objects or properties of objects.
The syntax of declarations in C11 (but not all of it; for example, we left out static assertions and alignment specifiers). The suffix _opt marks an optional element.
declaration:
    declaration-specifiers init-declarator-list_opt ;
declaration-specifiers:
    storage-class-specifier declaration-specifiers_opt
    type-specifier declaration-specifiers_opt
    type-qualifier declaration-specifiers_opt
    function-specifier declaration-specifiers_opt
init-declarator-list:
    init-declarator
    init-declarator-list , init-declarator
init-declarator:
    declarator
    declarator = initializer
storage-class-specifier:
    typedef
    extern
    static
    _Thread_local
    auto
    register
type-specifier:
    void
    char
    short
    int
    long
    float
    double
    signed
    unsigned
    _Bool
    _Complex
    atomic-type-specifier
    struct-or-union-specifier
    enum-specifier
    typedef-name
type-qualifier:
    const
    restrict
    volatile
    _Atomic
function-specifier:
    inline
    _Noreturn
etc., etc.
There are some constraints on declarations:
- at most one storage class specifier (except that C11 allows _Thread_local together with static or extern), and it should come first
- at most one type specifier, after accounting for multi-keyword types like unsigned long long int
- etc.
An object (in C) is a region of memory that can be examined and stored into.
- Do not confuse this simple concept with the more complex concept of object in C++ or Java.
From the C Standard,
- object: region of data storage in the execution environment, the contents of which can represent values
- When referenced, an object may be interpreted as having a particular type.
- An object representation is a bit pattern without interpretation.
- value: precise meaning of the contents of an object when interpreted as having a specific type
- Certain object representations need not represent a value of the given type.
- access: execution-time action to read or modify the value of an object
Not everything is specified completely.
- An object declared as type _Bool must be large enough to store the values 0 and 1. The actual number of bits or bytes used for _Bool is not specified.
- The C Standard does not completely specify all values for all the types it defines. Values can be
- an unspecified value, a valid value of the given type with no requirements on which value is chosen in any instance,
- a trap representation, an object representation that (essentially) prevents use of the object as a value of the given type,
- use of a trap representation may lead to a program exception or fault, handled by the operating system
- this would be useful as an initial value that was not otherwise specified by the program
- an indeterminate value, which is either an unspecified value or a trap representation,
- or an implementation-defined value, which is an unspecified value where the implementation documents how the choice is made.
- The C Standard has a set of rules about the timing of read/modify/write operations, to ensure that programs behave as expected by the programmer.
- Ordinarily, the compiler and processor have some freedom to decide when to load a value from memory, and when to store it, as long as the rules are respected.
An lvalue is an expression that refers to an object in such a way that the object may be examined or altered.
- An lvalue designates an object, and provides its type information.
- An lvalue can have an object type or an incomplete type, but cannot be void or have a function type.
- See the notes on const for the expressions that can be lvalues, and the operators requiring an lvalue operand.
- also, C:ARM Table 7-1, Table 7-2
A modifiable lvalue is an expression that refers to an object in such a way that the object may be altered.
- Only a modifiable lvalue may be used on the left-hand side of an assignment (hence the name).
For example,
    const int foo = 13;
    int blat;
    int *iptr = &blat;
defines an identifier foo, and while the expression foo is an lvalue, it is not a modifiable lvalue. The identifier iptr is an lvalue, and a modifiable lvalue; the expression *iptr is also an lvalue and a modifiable lvalue.
- A modifiable lvalue is an lvalue that does not have
- array type,
- an incomplete type, or
- a const-qualified type; and, if it has structure or union type, none of its members (including, recursively, members of contained structures and unions) has a const-qualified type
More about lvalues and rvalues
- In C there is an important distinction between lvalues, which correspond to memory locations, and rvalues, which are ordinary values like integers. In the C type system, lvalues and rvalues are given the same type. For example, consider the following code:
    int x;
    x = ...;
    ... = x;
The first line declares that the name x refers to a location containing an integer. On the second line x is used as an lvalue: it appears on the left-hand side of an assignment, meaning that the content of the location corresponding to x should be updated. On the third line x is used as an rvalue, meaning that we are not referring to the location of x, but to x’s contents. In the C type system, x is given the type int in both places, and the syntax distinguishes integers that are lvalues from integers that are rvalues.
A function designator is an expression that has function type.
For example,
    int bar(void);
    int *gptr(void);
    int (*fptr)(void) = bar;
declares a function bar of type "function returning int", declares a function gptr of type "function returning pointer to int", and defines an object fptr of type "pointer to function returning int". The expression fptr is an lvalue, and a modifiable lvalue, while the expressions bar, gptr and *fptr are function designators.
- The expression sizeof(function_designator) is not allowed.
- The expression &function_designator yields a value of type "pointer to function returning ...".
- Otherwise, a function designator with type "function returning ..." is converted automatically to an expression that has type "pointer to function returning ...".
Any number of derived types can be constructed from the object, function, and incomplete types, as follows:
- An array type describes a contiguously allocated nonempty set of objects with a particular member object type, called the element type. (Since object types do not include incomplete types, an array of incomplete type cannot be constructed.) Array types are characterized by their element type and by the number of elements in the array.
- An array type is said to be derived from its element type, and if its element type is T, the array type is called "array of T".
- A structure type describes a sequentially allocated nonempty set of member objects (and, in certain circumstances, an incomplete array), each of which has an optionally specified name and possibly distinct type.
- Exercise. Explain why arrays are "contiguously allocated" but structures are only "sequentially allocated".
- A union type describes an overlapping nonempty set of member objects, each of which has an optionally specified name and possibly distinct type.
- A function type describes a function with specified return type. A function type is characterized by its return type and the number and types of its parameters.
- A function type is said to be derived from its return type, and if its return type is T, the function type is called "function returning T".
- A pointer type may be derived from a function type, an object type, or an incomplete type, called the referenced type. A pointer type describes an object whose value provides a reference to an entity of the referenced type.
- A pointer type derived from the referenced type T is called "pointer to T".
- A pointer type is a complete object type.
- An atomic type describes the type designated by the construct _Atomic(type-name).
- Added in C11, but optional.
- Atomic types have restrictions on their access methods.
- These methods of constructing derived types can be applied recursively.
Some further notes
- Arithmetic types and pointer types are scalar types.
- Array and structure types are aggregate
types.
- A union type is not an aggregate type, since a union type can
contain only one member at a time.
(Excerpt from the C Standard, lightly edited)
The address and indirection operators, unary & and *
- The operand of the unary & operator shall be either a function designator, the result of a [] or unary * operator, or an lvalue that designates an object that is not a bit-field and is not declared with the register storage-class specifier.
- The operand must be something in memory and addressable.
- The operand of the unary * operator shall have pointer type.
The unary & operator yields the address of its operand.
- If the operand has type T, the result has type "pointer to T".
- If the operand is the result of a unary * operator, neither that operator nor the & operator is evaluated and the result is as if both were omitted, except that the constraints on the operators still apply and the result is not an lvalue.
- Thus, &*E is equivalent to E, even if E is a null pointer.
- If the operand is the result of a [] operator, neither the & operator nor the unary * that is implied by the [] is evaluated and the result is as if the & operator were removed and the [] operator were changed to a + operator.
- Recall, the expression A[n] is defined to be *(A + n) if A has type pointer to T or array of T and n is an integer.
- Thus, &(E1[E2]) is equivalent to ((E1)+(E2)).
- Otherwise, the result is a pointer to the object or function designated by its operand.
The unary * operator denotes indirection.
- If the operand points to a function, the result is a function designator; if it points to an object, the result is an lvalue designating the object.
- If the operand has type "pointer to T", the result has type T.
- If an invalid value has been assigned to the pointer, the behavior of the unary * operator is undefined.
- It is always true that if E is a function designator or an lvalue that is a valid operand of the unary & operator, *&E is a function designator or an lvalue equal to E.
- If *P is an lvalue and T is the name of an object pointer type, *(T)P is an lvalue that has a type compatible with that to which T points.
- Among the invalid values for dereferencing a pointer by the unary * operator are a null pointer, an address inappropriately aligned for the type of object pointed to, and the address of an object after the end of its lifetime.
- We'll define lifetime shortly.
(Excerpt from the C Standard, lightly edited)
The sizeof operator
sizeof yields a result of type size_t, which is an unsigned integer type defined in <stddef.h> and other headers.
- The sizeof operator yields the size (in bytes) of its operand, which may be an expression or the parenthesized name of a type. The size is determined from the type of the operand.
- You cannot apply sizeof to an expression that has function type or an incomplete type, to the parenthesized name of such a type, or to an expression that designates a bit-field member of a structure.
- If the type of the operand is a variable length array type, the operand is evaluated; otherwise, the operand is not evaluated and the result is an integer constant.
- When applied to an operand that has type char, unsigned char, or signed char (or a qualified version thereof), the result is 1.
- When applied to an operand that has array type, the result is the total number of bytes in the array.
- When applied to a function parameter declared to have array or function type, the sizeof operator yields the size of the adjusted (pointer) type.
- When applied to an operand that has structure or union type, the result is the total number of bytes in such an object, including internal and trailing padding.
Examples
    struct foo *foo_ptr = malloc(sizeof(struct foo));       // needs <stdlib.h>; no cast is needed in C
    int array[] = { 2, 3, 5, 7, 11 };                       // any list of integer values
    size_t array_size = sizeof(array) / sizeof(array[0]);
The _Alignof operator (C11)
- _Alignof yields a result of type size_t.
- You cannot apply _Alignof to a function type or an incomplete type.
- The _Alignof operator yields the alignment requirement of its operand type. The operand is not evaluated and the result is an integer constant. When applied to an array type, the result is the alignment requirement of the element type.
- The header <stdalign.h> defines the macro alignof as _Alignof.
A quick review and some vocabulary
Types and type qualifiers (more about these later)
- The type int (for example) specifies a few general properties of the data stored in an object.
- Type qualifiers describe how objects are accessed through lvalues.
- The type qualifier const specifies the rules for changing the data stored in an object (you can't, after initialization).
- The type qualifier restrict ...
- The type qualifier volatile ...
- Type qualifiers may be combined in the same declaration.
- The _Atomic qualifier in C11 is treated somewhat differently, but it still affects the access rules.
- Loads and stores of objects with atomic types are done with memory_order_seq_cst semantics.
- With any luck, someday there will be a course that explains memory order semantics.
- For now, let's be satisfied with "seq_cst" = "sequential consistency", which means "it works like you would expect it to".
Storage-class specifiers (more about these later)
- The storage-class specifier extern ...
- The storage-class specifier static ...
The const keyword
- The const type qualifier specifies the rules for changing the data stored in an object (you can't, after initialization).
- For examples of const-qualified pointers, with errors and warnings from the compilers, see const.pdf.
- You must distinguish between the ability to change a pointer, and the ability to change what the pointer points to.
    type                   read as
    int *                  pointer to int
    const int *            pointer to const int
    int * const            const pointer to int
    const int * const      const pointer to const int
- "Casting away const-ness" is a common problem that leads to bugs. This is also illustrated in const.pdf.
External agents and the volatile keyword
- The volatile type qualifier informs the compiler that the value of an object could be changed asynchronously by an external agent.
- One consequence is that the compiler cannot keep the object's value cached in a register between accesses; every read or write in the program must actually access the object, although a value that has been read can be copied to a register.
- See C:ARM Sec. 4.4.5 for more info and examples.
- Memory-mapped input/output registers are the usual example, and that is done on pp. 93-94 (volatile.pdf).
- The notion of a sequence point is defined in the C Standard to describe the places in a program where changes to objects (via side effects like assignment) are required to have been completed. This is surprisingly difficult to get right when dealing with volatile objects and threads. See C:ARM Sec. 7.12.1 for some more info.
Aliases and the restrict keyword
- For example,
    int *a, *b;                  // two pointers
    b = something legitimate;
    a = b;
After the second assignment, a and b point to the same location, so *a is an alias for *b.
- Continuing the example,
    void func(int *p, int *q) { ... }
    func(a, b);
In the function call, the arguments, and therefore the parameters, point to the same object, so *p and *q are aliases of each other. The author of func() might have been expecting that the parameters p and q point to different objects.
- The restrict type qualifier in C99 informs the compiler that a pointer is to be treated as if it is the only pointer to an object (in the current context).
    void func(int * restrict p, int * restrict q) { ... }
- If that is not actually true, the behavior is undefined (it's a programming error, with indeterminate consequences).
- See C:ARM Sec. 4.4.6 for more info and examples.
Scope - over which region of the program is something defined and accessible?
- block scope
- selection and iteration statements are special cases of block scope
- function scope
- file scope
- global scope is an extension of file scope
- The placement of a definition determines the scope of the object being defined.
Linkage - how can we connect declarations and definitions between scopes?
Lifetime - over what length of time is something allocated?
- between entry to and exit from a block, loop or function (automatic variables)
- between allocate and deallocate actions (dynamically-allocated variables)
- between the start and end of the program execution (statically-allocated variables, functions)
- between the start and end of a thread's execution (per-thread variables)
Location - where is something allocated in memory?
- address space in general
- memory segments in general
- stack
- one per thread, including main()'s thread
- heap
- data section
- text section
What can affect the content of memory?
Scope - over which region of the program is something defined and accessible?
Related concepts
- visibility
- hidden declarations
- duplicate declarations
- defining declaration
- referencing declaration
- You can have multiple referencing declarations, but only one defining declaration.
- We earlier used the terms definition and declaration, which express the same distinction.
Examples
    extern int foo;            // referencing declaration
    int foo;                   // defining declaration
    extern void f(void);       // referencing declaration
    void g(void) { f(); }      // reference to f
    void f(void) { g(); }      // defining declaration of f
Exercise. If a variable is "out of scope", does that mean it no longer has any associated storage?
An identifier in C is a name for something.
An identifier in C can denote
- an object, via a variable name
- a function, via a function name
- a member of a structure, union, or enumeration
- A member of an enumeration is an enumeration constant.
- a type, via a typedef name
- a type tag (part of a struct, union or enum type name)
- a label name
- a macro name
- a macro parameter
- Macro names and macro parameters are removed by the
preprocessor, so we won't consider them any further here.
The same identifier can denote different entities at different points in the program.
- For each different entity that an identifier designates, the identifier is visible (i.e., can be used) only within a region of program text called its scope.
- Different entities designated by the same identifier either have different scopes, or are in different name spaces.
- If more than one declaration of a particular identifier is visible at any point in a translation unit, the syntactic context disambiguates uses that refer to different entities.
- A translation unit is a C source file (.c) together with the headers and other files it includes, as seen after preprocessing.
There are separate name spaces for various categories of identifiers,
- preprocessor macro names (these are removed early on, so they cannot conflict with other identifiers)
- statement label names (disambiguated by the syntax of the label declaration and use)
- the tags of structures, unions, and enumerations (disambiguated by following any of the keywords struct, union, or enum)
- There is only one name space for tags even though three are possible.
- the members of structures or unions; each structure or union has a separate name space for its members (disambiguated by the type of the expression used to access the member via the . or -> operators)
- all other identifiers, called ordinary identifiers (declared in ordinary declarators or as enumeration constants)
- also called overloading classes in C
- not the same as namespaces in C++, which are programmer-defined
There are four kinds of scopes in C,
- function prototype scope
- function scope
- block scope
- file scope
- also called global scope, but that can be misleading
- Function prototype scope and function scope are determined by syntax.
- Block scope and file scope are determined by the placement of a declaration (inside a block or not).
Function prototype scope
- A function prototype is a declaration of a function that declares the types of its parameters.
- The parameters in a function prototype may be named, or unnamed.
- If the declarator or type specifier that declares the identifier appears within the list of parameter declarations in a function prototype (not part of a function definition), the identifier has function prototype scope, which terminates at the end of the function declarator.
- A formal parameter in a function prototype has scope extending from its declaration point to the end of the function prototype.
Function scope
- A label name is the only kind of identifier that has function scope. It can be used (in a goto statement) anywhere in the function in which it appears, and is declared implicitly by its syntactic appearance (followed by a : and a statement).
- The scope of a statement label is the entire body of the function in which it appears.
Block scope (also called local scope)
- If the declarator or type specifier that declares the identifier appears inside a block or within the list of parameter declarations in a function definition, the identifier has block scope, which terminates at the end of the associated block.
- A formal parameter in a function definition has scope extending from its declaration point to the end of the function body.
- See C:ARM Sec. 9.3 for much more info.
- A block (local) identifier has scope extending from its declaration point to the end of the block.
- We'll define the varieties of blocks shortly.
File scope
- If the declarator or type specifier that declares the identifier appears outside of any block or list of parameters, the identifier has file scope, which terminates at the end of the translation unit.
- A top-level identifier has scope extending from its declaration point to the end of the source file containing the declaration.
- The concept of linkage allows you to connect file scope functions and variables across a whole program.
- This is where you get the idea of global scope.
- The keyword static in the declaration of a top-level identifier prevents that identifier from being accessible outside its own file scope.
- The keyword extern in the declaration of a top-level identifier allows that identifier to be accessible outside its own file scope; this is the default.
Properties of scopes
- Nested scopes can cause identifiers to be hidden.
- If an identifier designates two different entities in the same name space, the scopes might overlap. If so, the scope of one entity (the inner scope) will be a strict subset of the scope of the other entity (the outer scope). Within the inner scope, the identifier designates the entity declared in the inner scope; the entity declared in the outer scope is hidden (and not visible) within the inner scope.
- If a variable or function's identifier is hidden, its storage remains allocated.
- Exercise. How many variables named x are there here? If two, which is being assigned?
    int x;
    void foo(void) { int x; x = 17; }
- Exercise. How many variables named x are there here? If two, which is being assigned?
    void foo(void) {
        int x;
        { int x; x = 17; }
    }
- Exercise (mildly tricky). How many variables named x are there here? If two, which is being assigned?
    void foo(int x) { int x; x = 17; }
% gcc -Wall -Wextra -c x.c
x.c: In function 'foo':
x.c:1: error: 'x' redeclared as different kind of symbol
x.c:1: error: previous definition of 'x' was here
x.c: At top level:
x.c:1: warning: unused parameter 'x'
- Two identifiers have the same scope if and only if their scopes terminate at the same point.
- This settles a number of questions about multiple definitions in the same scope.
Where does a scope begin?
- Structure, union, and enumeration tags have scope that begins just after the appearance of the tag in a type specifier that declares the tag.
- Each enumeration constant has scope that begins just after the appearance of its defining enumerator in an enumerator list.
- Any other identifier has scope that begins just after the completion of its declarator.
We skipped preprocessor macros for convenience. Their scope extends from a #define to the end of the source file, or to a corresponding #undef.
More about block scope
A block allows a set of declarations and statements to be grouped into one syntactic unit. The initializers of objects that have automatic storage duration, and the variable length array declarators of ordinary identifiers with block scope, are evaluated and the values are stored in the objects (including storing an indeterminate value in objects without an initializer) each time the declaration is reached in the order of execution, as if it were a statement, and within each declaration in the order that declarators appear.
A compound statement is a block.
compound-statement:
    { block-item-list_opt }
block-item-list:
    block-item
    block-item-list block-item
block-item:
    declaration
    statement
A selection statement is a block whose scope is a strict subset of the scope of its enclosing block.
- Each associated substatement is also a block whose scope is a strict subset of the scope of the selection statement.
selection-statement:
    if ( expression ) statement
    if ( expression ) statement else statement
    switch ( expression ) statement
An iteration statement is a block whose scope is a strict subset of the scope of its enclosing block.
- The loop body is also a block whose scope is a strict subset of the scope of the iteration statement.
iteration-statement:
    while ( expression ) statement
    do statement while ( expression ) ;
    for ( expression_opt ; expression_opt ; expression_opt ) statement
    for ( declaration expression_opt ; expression_opt ) statement
- The declaration part of a for statement shall only declare identifiers for objects having storage class auto or register.
- There appears to be a missing semicolon between the declaration part and the following expression, but the semicolon appears in the declaration itself, so we're ok.
The statement
    for ( clause-1 ; expression-2 ; expression-3 ) statement
behaves as follows:
- The expression expression-2 is the controlling expression that is evaluated before each execution of the loop body.
- The expression expression-3 is evaluated as a void expression after each execution of the loop body.
- If clause-1 is a declaration, the scope of any identifiers it declares is the remainder of the declaration and the entire loop, including the other two expressions; it is reached in the order of execution before the first evaluation of the controlling expression.
- If clause-1 is an expression, it is evaluated as a void expression before the first evaluation of the controlling expression.
Thus, clause-1 specifies initialization for the loop, possibly declaring one or more variables for use in the loop; the controlling expression, expression-2, specifies an evaluation made before each iteration, such that execution of the loop continues until the expression compares equal to 0; and expression-3 specifies an operation (such as incrementing) that is performed after each iteration.
Both clause-1 and expression-3 can be omitted. An omitted expression-2 is replaced by a nonzero constant.
continue statements jump to the end of the current loop body; execution stays within the enclosing iteration statement. break statements terminate execution of the smallest enclosing switch or iteration statement, so execution leaves the current scope. return statements leave all scopes entered since the function was most recently called.
An identifier declared in different scopes or in the same scope more than once can be made to refer to the same object or function by a process called linkage.
- There is no linkage between different identifiers.
There are three kinds of linkage:
- external,
- internal,
- none.
In the set of translation units and libraries that constitutes an entire program, each declaration of a particular identifier with external linkage denotes the same object or function. Within one translation unit, each declaration of an identifier with internal linkage denotes the same object or function. Each declaration of an identifier with no linkage denotes a unique entity.
If the declaration of a file scope identifier for an object or a function contains the storage-class specifier static, the identifier has internal linkage. [A function declaration can contain the storage-class specifier static only if it is at file scope.]
For an identifier declared with the storage-class specifier extern in a scope in which a prior declaration of that identifier is visible, if the prior declaration specifies internal or external linkage, the linkage of the identifier at the later declaration is the same as the linkage specified at the prior declaration. If no prior declaration is visible, or if the prior declaration specifies no linkage, then the identifier has external linkage.
If the declaration of an identifier for a function has no storage-class specifier, its linkage is determined exactly as if it were declared with the storage-class specifier extern. If the declaration of an identifier for an object has file scope and no storage-class specifier, its linkage is external.
The following identifiers have no linkage: an identifier declared to be anything other than an object or a function [a label or a type, for ex.]; an identifier declared to be a function parameter; a block scope identifier for an object declared without the storage-class specifier extern.
If, within a translation unit, the same identifier appears with both
internal and external linkage, the behavior is undefined.
- See CP:AMA Sec. 10.2, 15.2 and 18.2, or C:ARM Sec. 4.8, for more discussion of external names and definitions of extern variables. C:ARM Sec. 4.8.5 has good recommendations about use of top-level declarations in source and header files.
An object has a storage duration that determines its lifetime, or extent.
Lifetime - over what length of time is something allocated?
- between entry to and exit from a block, loop or function
- local extent, automatic storage duration
- automatic variables and function parameters, on the stack
- between allocate and deallocate actions
- dynamic extent, allocated storage duration
- dynamically-allocated objects, in the heap
- between the start and end of the program execution
- static extent, static storage duration
- functions and statically-allocated variables, in the text and data sections
- between the start and end of a thread execution (C11)
- thread storage duration
- the initial thread, which exists when the program starts, runs main()
- each thread has its own stack, and may have its own data section
- all threads share the text section
- all threads share the heap, but each thread could have its own subsection of the heap for efficiency
Equivalent terms
- lifetime
- This is the generic concept.
- extent
- This term is used in C:ARM.
- storage duration
- This term is used in the C Standard and CP:AMA.
Equivalent terms?
- stack-allocated dynamic variables
- heap-allocated dynamic variables
Quick summary
- Variables declared at the top level have static extent. All functions have static extent.
- Variables declared in blocks have local extent, unless declared with the static storage class specifier, and then they have static extent. In either case, the variable has block (local) scope.
- Function parameters have local extent.
malloc() and free() allocate and deallocate objects with dynamic extent.
Exercises
- What would typically be the storage class of
- a block-scope variable
- a file-scope variable
- If a pointer is being used to hold the address of a dynamically-allocated object, should the pointer have local extent, dynamic extent, or static extent?
- The lifetime of a pointer should be longer than the lifetime of the object it points to.
- Should we discuss the "useful lifetime" of a pointer?
- True or False? The lifetime of a variable must not exceed its scope.
Storage class specifiers in C
auto
- This is the default for block-scope variables.
static
extern
- This is the default for functions and file-scope variables. A function can be declared only extern or static (and static only at file scope).
register
- This is a suggestion to the compiler to keep the object in a
register, for efficiency.
typedef
- This is a storage-class specifier for syntactic convenience only.
_Thread_local
- Added in C11.
- If used at block scope, it must be combined with either static or extern.
- <threads.h> defines the macro thread_local as _Thread_local.
There are four storage durations in C.
| storage duration | extent | address space section | storage class specifier |
|---|---|---|---|
| static | static | text, data | extern, static |
| automatic | local | stack | auto |
| allocated | dynamic | heap | |
| thread (C11) | thread | thread-specific | _Thread_local |
The key difference between extern and static
for file-scope objects and functions:
- Both indicate static extent, because the name has file scope.
- If extern, the name is known to the linker. Access to the name is possible from any source file in the program.
- If static, the name is not known to the linker. Access to the name is restricted to the source file where the declaration appears.
The key difference between auto (the default) and static
for block-scope variables:
- Both indicate local scope.
- auto indicates local extent. The variable is initialized when the block is entered (C89) or when the declaration is encountered (C99).
- static indicates static extent. The variable is initialized once, before the program starts, and it retains its current value throughout the lifetime of the process.
The lifetime of an object is the portion of program execution time during which storage is guaranteed to be reserved for it.
- An object exists, has a constant address, and retains its last-stored value throughout its lifetime.
- The term "constant address" means that two pointers to the object constructed at possibly different times will compare equal. The address may be different during two different executions of the same program.
- In the case of a volatile object, the last store need not be explicit in the program.
- If an object is referred to outside of its lifetime, the behavior is undefined.
- The value of a pointer becomes indeterminate when the object it points to reaches the end of its lifetime.
- But note that the value might not change; you should not dereference it.
An object whose identifier is declared without the storage-class specifier _Thread_local, and either with external or internal linkage, or with the storage-class specifier static, has static storage duration. Its lifetime is the entire execution of the program and its stored value is initialized only once, prior to program startup.
An object whose identifier is declared with the storage-class specifier _Thread_local has thread storage duration. Its lifetime is the entire execution of the thread for which it is created, and its stored value is initialized when the thread is started. There is a distinct object per thread, and use of the declared name in an expression refers to the object associated with the thread evaluating the expression. The result of attempting to indirectly access an object with thread storage duration from a thread other than the one with which the object is associated is implementation-defined.
An object whose identifier is declared with no linkage and without the storage-class specifier static has automatic storage duration.
- The object is associated with one thread only, and no other thread should attempt to (indirectly) access it.
- Some objects have automatic storage duration without being associated with an identifier, and can appear directly in expressions. Compound literals at block scope are one example. Another (C11) is a non-lvalue expression of structure or union type with an array member, such as a function call returning such a struct; it refers to an object with automatic storage duration and temporary lifetime.
For such an object that does not have a variable length array type, its lifetime extends from entry into the block with which it is associated until execution of that block ends in any way. (Entering an enclosed block or calling a function suspends, but does not end, execution of the current block.) If the block is entered recursively, a new instance of the object is created each time. The initial value of the object is indeterminate. If an initialization is specified for the object, it is performed each time the declaration is reached in the execution of the block; otherwise, the value becomes indeterminate each time the declaration is reached.
For such an object that does have a variable length array type, its lifetime extends from the declaration of the object until execution of the program leaves the scope of the declaration. (Leaving the innermost block containing the declaration, or jumping to a point in that block or an embedded block prior to the declaration, leaves the scope of the declaration.) If the scope is entered recursively, a new instance of the object is created each time. The initial value of the object is indeterminate.
Location - where is something allocated in memory?
- address space in general
- memory segments (sections) in general
- registers
- The machine instructions (usually) take their data from
registers or memory. Register accesses are much faster.
- The register storage-class specifier requests that the compiler make extra efforts to keep a variable in a register. The compiler can ignore the request, in which case it acts as if the auto storage-class specifier had been specified instead. The address operator & cannot be applied to a register variable.
- stack section
- automatic storage duration
- heap section
- allocated storage duration (allocated by malloc() and deallocated by free())
- data section
- constants
- file-scope variables, static local-scope variables
- static storage duration
- text section
- functions
- static storage duration
- Exercise. Why is cache memory not on this list? Why is disk or flash memory not on this list?
- Exercise. Where is a block-scope (local) variable stored?
What can affect the content of memory?
- initializers
- If static extent, or thread extent, initialized when allocated, default 0.
- If local extent, initialized each time the declaration is reached; with no initializer, the value is indeterminate (unpredictable).
- If dynamic extent, initialized or not by the library function that did the allocation (calloc() zero-fills the object, malloc() does not).
- See the warnings in C:ARM Sec. 4.2.8.
- assignment expressions, as side effects
- assignment via name
- assignment via dereferenced pointer expression
- assignment via external agent
Common bugs associated with misunderstanding Scope
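- implicit declaration of a function, because you omitted the function's prototype
- under C89 rules, an undeclared function is implicitly declared as int function(); C99 removed implicit declarations, but many compilers still accept them with only a warning
- no argument type-checking, probably the wrong return type
- <more later>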
Common bugs associated with misunderstanding Scope and Lifetime
Common bugs associated with misunderstanding Location
Some notes on identifiers (choice of names)
- letters, digits, underscore, not starting with a digit
- as a regular expression, [A-Za-z_][A-Za-z0-9_]*
Identifiers to avoid when choosing your own names - these are already used, etc.
- keywords in C
- It would be a good idea to avoid the keywords in C++ also.
- predefined macros in C
- predefined identifiers in C
- names used in the C Standard Library, or in the POSIX libraries
- bool, true, false in C99 <stdbool.h>
- type names ending with _t
- Some of these are actually macros.
- names that might be in the libraries in the future
- identifiers matching _[A-Z_][A-Za-z0-9_]*
- function names ending with _s
- macro names beginning with __STDC_
- names of extern variables or functions that are too long for the linker
- This length is system-dependent.
An experiment with the C99 predefined identifier __func__
% cat foo.c
#include <stdio.h>
int foo(void) { return 0; }
int bar(void) { return 1; }
int hack_Foo(void) { printf("%s\n", __func__); return 0; }
int hack_Bar(void) { printf("%s\n", __func__); return 1; }
% gcc -std=c99 -c foo.c
% strings foo.o
hack_Foo
hack_Bar
Can you explain this version of the experiment?
% cat foo.c
#include <stdio.h>
int foo(void) { printf("%s\n", __func__); return 0; }
int bar(void) { printf("%s\n", __func__); return 1; }
% gcc -std=c99 -c foo.c
% strings foo.o
% strings -n 2 foo.o
}h
=k
$}
foo
%s
bar
Last revised, 8 Apr. 2013