Reading the C Standard is not an easy task. The wording has to be read carefully, and you must always remember that certain words have special technical meanings and may not mean what you think they do. Some apparently simple questions are only answered by assembling pieces from several different sections.
With all this in mind, your editor has asked me to put together a guide to some of the commonest pitfalls in Understanding the Standard. My first article explained some of the special terms in the Standard, such as "undefined". This article talks about how the Standard connects the different parts of your program together: the various modules you write, the library, and the operating system. I cover three topics: linkage, how and why to declare main, and freestanding implementations.
If you use the same name in two different places, you might be referring to the same object, or you might not. For example, if you declare the variable i in two functions, you expect them to be independent. On the other hand, you expect a function call to refer to the function with the same name, and you would not expect the compiler to treat the two as different. The Standard uses the term "linkage" to describe this sort of thing, and discusses it in subclauses 6.1.2.2 and 6.7.
As you undoubtedly know, the linkage of identifiers is controlled by where declarations and definitions appear, and by the keywords static and extern. However, you might not be aware of some of the subtleties that the Standard includes.
Sidebar: declarations and definitions.
The terms "declaration" and "definition" are often misunderstood. A declaration is a construct that describes an object, for example by giving its type. So a function prototype is a declaration, as is extern int i;. A definition, on the other hand, is a declaration that also reserves memory for the object. So a function definition is the declaration that includes the function body, and a variable definition is the one that causes the variable actually to exist; for example, int i = 10; is a definition. All definitions are also declarations, but declarations don't have to be definitions. In general, every variable and function in your program must have exactly one definition, but can have any number of declarations. However, declarations that are not definitions are only useful when the definition is not in scope: for example, because it is in a different module or (in the case of functions) because it comes later in the source file. As we will see, the Standard also has the concept of tentative definitions, which are declarations that might be definitions but might not.
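The Standard talks about three kinds of linkage: external linkage, where every declaration of a name anywhere in the program refers to the same object or function; internal linkage, where every declaration of a name within one module refers to the same object or function, but declarations in other modules do not; and no linkage, where each declaration denotes a separate entity.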
The rules for functions and variables are different enough that they are best considered separately.
The big surprise with function declarations is that the keyword extern has no meaning whatsoever; in particular, it does not mean that the function has external linkage! Instead, the meaning of a function declaration depends only on whether or not it uses the keyword static [*]. Functions must have either internal or external linkage. Furthermore, every declaration of a function in the same module refers to the same function and must have the same linkage. Thus it is the first declaration (which might be within an included header file, of course) which determines the linkage. If this declaration uses the keyword static, the function has internal linkage, and all subsequent declarations can include or omit the static without effect; the function remains an internal one that cannot be accessed by name from another module. If, on the other hand, the first declaration does not use static, then the function has external linkage, and none of the subsequent declarations may use static either (a compiler will probably, but need not, detect this error).
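* Of course, using both static and extern in the same declaration is not allowed: a declaration may contain at most one storage-class specifier, and the compiler must diagnose a violation.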
There are two subtleties here, both to do with the fact that a function can be declared within another function body - a "block scope declaration" (block scope declarations cannot use the keyword static). Firstly, if the first declaration is a block scope declaration, then its effect vanishes at the end of the block it appears in. Nevertheless, the above rules apply because the next declaration, which will be treated as another "first declaration", must specify the same linkage, which must therefore be external. Secondly, if a block scope declaration is not the first declaration, but the first declaration is hidden by a declaration of a variable in an outer block, then the inner declaration always specifies external linkage, and in this case the first declaration must also have external linkage.
The following code illustrates these situations (the comment "1-D" means that this is a "first declaration"):
```c
static int fn_a (void); /* Internal linkage (1-D) */
int fn_a (void); /* Remains internal linkage */
extern int fn_a (void); /* Remains internal linkage; extern has no effect */
int fn_b (void); /* External linkage (1-D) */
static int fn_b (void); /* Error - fn_b has external linkage */
extern int fn_c (void); /* External linkage (1-D) */
static int fn_d (void); /* Internal linkage (1-D) */

int main (void)
{
    int fn_a (void); /* Remains internal linkage */
    int fn_b (void); /* Remains external linkage */
    static int fn_a (void); /* Error - block scope must not be static */
    int fn_e (void); /* External linkage (1-D) */
    int fn_f (void); /* External linkage (1-D) */
    int fn_c, fn_d; /* Variable declarations hide the functions */
    {
        int fn_c (); /* Remains external linkage */
        int fn_d (); /* Error - hidden fn_d has internal linkage */
        /* fn_c here refers to the external linkage function once more */
    }
    /* fn_c here refers to the variable */
    return 0;
}

int fn_e (void); /* Remains external linkage */
static int fn_f (void); /* Error - fn_f has external linkage */
```
Now we're ready to go on to variables. Unlike with functions, the keyword extern is useful here. Firstly, variables declared inside function bodies (this includes the parameters of the function) without the keyword extern have no linkage. Therefore each separate declaration is also the definition of the variable, and it follows that such a variable can only be declared once in any given block (though of course the same name can be redeclared in an inner block, yielding a new variable). As we know, if static is used, the variable persists for the duration of the program. Otherwise it is created when the block is entered and destroyed when it is left.
For variables declared within a function body with the keyword extern, and for variables declared outside any function body, the rules are somewhat more complex. They depend not only on which keyword has been used, but also on whether the declaration includes an initializer (an equals sign and an initial value for the variable).

The first rule to remember is that a variable can only have one definition in the entire program. If it has internal linkage, all declarations of the variable are in the same module, and only one of them can be a definition. If it has external linkage, it can have declarations in several modules, but only one module can contain a definition. This point is often misunderstood. Many linkers allow a variable to have several definitions and, provided that they all agree, there is no problem. Other linkers, however, will complain if there are two definitions, or your program may silently go wrong (as I discussed in my first article, this is what is called "undefined behaviour"); in any case, the Standard prohibits it. Therefore you should always take care to ensure that there is only one definition for each variable, and that is why these rules are so important. Another, minor, point is that if the variable is never used (use within a sizeof expression does not count), it need not have a definition at all. Of course, if it is used, there must be one.
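After we've considered that lot, the remainder of what the Standard says can best be expressed as a set of simple rules. A declaration with an initializer is a definition (though a declaration within a function body that uses extern may not have an initializer at all). A declaration outside any function body that has no initializer and uses either static or neither keyword is a tentative definition. A declaration that uses extern and has no initializer is not a definition. Outside a function body, a declaration using static gives the variable internal linkage, and one using neither keyword gives it external linkage. A declaration using extern gives the variable the same linkage as any visible earlier declaration of it outside a function body, or external linkage if there is none. As with functions, all the declarations of a variable in a module must agree on its linkage. At the end of the module, if a variable has one or more tentative definitions but no definition, the compiler behaves as if there were a definition with an initializer of zero.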
Finally, the two subtle points that applied to functions also apply to variables. If the first declaration is within a function body, or any declaration is within an inner block with the first declaration hidden from it, the variable has external linkage, and no declaration may use static.
Again, let's have some examples.
```c
static int var_a = 1; /* Internal linkage, definition */
static int var_a; /* Remains internal linkage */
extern int var_a; /* Remains internal linkage */
int var_a; /* Error - var_a has internal linkage */
static int var_a = 1; /* Error - more than one definition */
static int var_b; /* Internal linkage, tentative definition */
extern int var_b = 1; /* Remains internal linkage, definition overrides
                         tentative definition */
int var_c = 1; /* External linkage, definition */
extern int var_c; /* Remains external linkage */
int var_c; /* Remains external linkage */
static int var_c; /* Error - var_c has external linkage */
static int var_c = 1; /* Error - more than one definition */
extern int var_d = 1; /* External linkage, definition */
static int var_e; /* Internal linkage, tentative definition */
int var_e = 1; /* Error - var_e has internal linkage */
static int var_f; /* Internal linkage, tentative definition */
int var_g; /* External linkage, tentative definition */
extern int var_h; /* External linkage */
extern int var_f; /* Remains internal linkage */
static int var_f; /* Remains internal linkage, another tentative
                     definition */

int main (void)
{
    extern int var_i; /* External linkage */
    extern int var_j; /* External linkage */
    extern int var_k; /* External linkage */
    auto int var_f, var_g; /* Auto declarations hide the previous ones */
    {
        extern int var_f; /* Error - hidden var_f has internal linkage */
        extern int var_g; /* Remains external linkage */
        /* var_g here refers to the external linkage variable once more */
    }
    /* var_g here refers to the auto variable */
    return 0;
}

int var_i; /* Remains external linkage, tentative definition */
extern int var_j; /* Remains external linkage */
static int var_k; /* Error - var_k has external linkage */
```
If that is the complete source file, and if we delete the lines with the "Error" comments, then some of the variables mentioned are defined and some are not, as follows:
| variable | linkage | defined |
|---|---|---|
| var_a | internal | yes |
| var_b | internal | yes |
| var_c | external | yes |
| var_d | external | yes |
| var_e | internal | yes |
| var_f | internal | yes |
| var_g | external | yes |
| var_h | external | no |
| var_i | external | yes |
| var_j | external | no |
| var_k | external | no |
There is one more point to remember about tentative definitions that end up defining variables. For most types of variable, the only effect is that the variable is initialized to zero. However, if the variable is an array and no declaration gives the array a size, then it is treated as having one element [*]. So, in the following code:
```c
int array_a [];
int array_b [];
int array_c [];
extern int array_a [5];
extern int array_b [] = { 1, 2, 3 };
```
arrays array_a and array_b are given sizes (5 and 3 respectively) by the second declarations, but array_c is not, and so will have only 1 element.
* This is not explicitly stated in the Standard, but is ISO's interpretation of the wording.
By this point you are probably beginning to panic slightly. Thankfully, however, you don't have to remember all of this. Instead, all you have to do is to obey a few simple rules.
These rules don't make use of all the features that the Standard allows, but then the Standard was designed to bring together diverse previous practices. What the rules do offer is that they are easy to remember, practical, and easy to understand when you read the resulting code.
If you asked a group of experts on the C Standard what is the single most common violation in application programs, I suspect that most of them would reply 'main being defined as void'.
The Standard is very clear: subclause 5.1.2.2.1 says that main must be defined in one of the following two ways (or their equivalents not using prototypes):
```c
int main (void)
{ /* ... */ }

int main (int argc, char *argv [])
{ /* ... */ }
```
"But," says the programmer, "making it void works for me". Unfortunately, that's not a defence when something fails. There are two parts to this; let's examine them separately.
Firstly, why does main return a value? Well, if we look at subclauses 5.1.2.2.3 and 7.10.4.3, we see that the answer is that the implementation (and here this usually means part of the operating system) uses the value to determine whether your program succeeded or failed. The Standard says that you can return zero, the value EXIT_SUCCESS, or the value EXIT_FAILURE (the last two are macros defined in <stdlib.h>). Zero and EXIT_SUCCESS both imply that the program "succeeded", while EXIT_FAILURE (obviously) implies that it "failed". Of course, what "success" and "failure" actually mean depends very much on the implementation, and there isn't much that the Standard can say about it; the same applies if you return a value other than one of those three. However, we can look at some common arrangements.
The other half of the problem with declaring main as void is to do with the ways in which functions are called in particular systems. The implementation is expecting main to return a value, but it doesn't! So what will be returned instead? Let's consider the four most common cases of this.
The first case is when the author of the C compiler you are using has explicitly decided to allow main to be declared as void. That's his or her prerogative, and there's nothing wrong with it. The Standard allows an implementation to do this, and many (including many popular MS-DOS compilers) have done so. On such systems, the compiler will generate a return value instead if one is needed; whether it's "success" or "failure" is, of course, another question.
So long as your program only runs under those compilers, you've got nothing to worry about. However, one day you may want to move to a new system where void doesn't work. Then you'll be in trouble. Why not save yourself the aggravation in the first place by using int?
The second case involves those systems where the return values from int functions are placed in a register. In this case, the part of the operating system that calls main will expect to find a value in that register, but nothing will have been placed there. Instead, it will find some random value, left there by a previous calculation, and use it. How it will interpret that value, of course, is beyond your control. If you don't look at the status of your program, you'll never find the problem. But it's lurking.
The third case is when the return value is placed somewhere on the stack. What will now happen depends on the exact way in which the stack is laid out, and how it is cleared after a function call. Let's consider a function call with four arguments: a, b, c, d. One arrangement is that the caller will push the parameters on to the stack, followed by the return address, and will take the result (if one is expected) from the location of the last parameter, after which it pops off the rest. Thus:
| Step | int function | void function |
|---|---|---|
| Before call: | ... x y z | ... x y z |
| Arguments pushed: | ... x y z a b c d | ... x y z a b c d |
| Function called: | ... x y z a b c d addr | ... x y z a b c d addr |
| Result placed: | ... x y z a b c r addr | [no action] |
| Return operation: | ... x y z a b c r | ... x y z a b c d |
| Result popped: | ... x y z a b c | ... x y z a b c |
| Other arguments popped: | ... x y z | ... x y z |
So, if the function is a void one, but the caller thinks it returns an int, the value of the fourth parameter is "returned".
However, another common arrangement is for the called function to pop the arguments instead:
| Step | int function | void function |
|---|---|---|
| Before call: | ... x y z | ... x y z |
| Arguments pushed: | ... x y z a b c d | ... x y z a b c d |
| Function called: | ... x y z a b c d addr | ... x y z a b c d addr |
| Arguments removed: | ... x y z addr | ... x y z addr |
| Result placed: | ... x y z r addr | [no action] |
| Return operation: | ... x y z r | ... x y z |
| Result popped: | ... x y z | ... x y |
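And now, we see, the stack is corrupted: the caller has popped z in the belief that it was the result, even though z was not its to remove. Since various things, such as flushing and closing open files, happen after main returns, we can see that disaster lurks.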
The fourth and final case is the rare compiler that won't let you declare main as void. If you find one of these, it's worth paying extra for! Because there are no other functions with more than one possible prototype, and because there must not be a prototype for main in any standard header, making this test requires special code in the compiler. Anyone who's gone to the effort of getting that right will probably have put lots of other well-directed effort in as well.
"What," I hear you cry, "is a freestanding implementation?" Well, if you examine subclause 5.1.2, you will see that it talks about two different kinds of execution environment: hosted and freestanding. If you're the average C programmer, you will only have used hosted implementations up to now. Freestanding implementations are rather more specialised, and in general are used for things like writing operating systems (where you don't have the basic facilities that the Standard library needs) and for code for embedded systems, where you want your final program to contain the absolute minimum, even if it means not having functions like printf available.
The differences between hosted and freestanding implementations are best described in a table:
| Feature | Hosted | Freestanding |
|---|---|---|
| Standard headers available | all 15 | <float.h>, <limits.h>, <stdarg.h>, <stddef.h> (the implementation may provide others) |
| Available library functions and macros | all listed in the Standard | those in the above 4 headers (the implementation may provide others) |
| Reserved identifiers | all listed in the Standard | all listed in the Standard (this was decided by WG14 last December; the wording of subclause 5.1.2.1 will be changed in a future Technical Corrigendum to bring it into line with this decision) |
| Function called to run the program | main | implementation-defined |
| Arguments for that function | argc and argv, or none | implementation-defined |
As you can see, the main difference is that freestanding implementations only provide a "stripped-down" library. You may never come across one in practice, but at least you now know what the term means when you find it in your reading of the Standard.
Hopefully you now understand how a Standard program links together. The next article in this series will cover the topics of international character sets, how the Standard allows you to use them, and what it requires you to do.
Back to Clive's home page.