ISO/IEC JTC1/SC22/WG14 N579

Proposal CB-004 - Changes to phases of translation
==================================================

Summary
-------

This proposal makes various clarifications and corrections to the
concepts involved in the phases of translation. No new features are
added.

Conformance
-----------

No C89 strictly conforming program should be affected by this proposal.

Discussion
----------

On close examination, there appear to be a number of inconsistencies,
irregularities, and omissions in the concepts surrounding the phases of
translation. This proposal attempts to clean this up. The following
changes are made.

(1) C89 uses the term "translation unit" for two slightly different
    concepts:
    - the source file after all preprocessing;
    - the source file after #include and #if directives have been
      applied, but before macro substitution.
    A new term "preprocessing source unit" is introduced for the latter.

(2) The present definition of preprocessing makes no attempt to remove
    the preprocessor directives from the translation unit; this means
    that the # preprocessing tokens will cause a mandatory diagnostic on
    (failed) conversion to a token. This oversight is corrected.

(3) The present wording of translation phase 2 implies that, after
    splicing, the implementation must scan backwards one character for
    another backslash where another newline occurs immediately after the
    splice, and thus that three backslashes followed by three newlines
    are all removed. This seems undesirable, and it is reported that
    existing implementations vary in their handling of this. The
    proposal makes it clear that only a backslash at the end of a
    *physical* source line causes splicing.

(4) C89 does not describe the "start symbol" of the grammar, making it
    unclear what the actual syntax is! This oversight is corrected.

(5) The interaction of #include and #if, where the included file
    contains unbalanced #if, #else, #elif, or #endif directives, is
    clarified to be a syntax violation.
    Without this, the effect of (say) the directive "#elif 0" in the
    middle of a conditionally included file would be non-intuitive.

Detailed proposal
-----------------

In subclause 5.1.1.1, replace:

    A source file together with all the headers and source files
    included via the preprocessing directive #include, less any source
    lines skipped by any of the conditional inclusion preprocessing
    directives, is called a translation unit.

with

    A source file together with all the headers and source files
    included via the preprocessing directive #include, less any source
    lines skipped by any of the conditional inclusion preprocessing
    directives, is called a /preprocessing source unit/. After further
    preprocessing and removal of all preprocessing directives, it is
    called a /translation unit/.

In subclause 5.1.1.2, replace translation phase 2 with:

    Each instance of a backslash character immediately followed by a
    newline character is deleted, splicing physical source lines to
    form logical source lines. Only the last backslash on any physical
    source line shall be eligible for being part of such a splice. A
    source file that is not empty shall end in a new-line character,
    which shall not be immediately preceded by a backslash character
    before any such splicing takes place.4a

    __________
    4a. Thus the physical source lines (delimited by | characters):
            |\\\|
            ||
            |n|
        generate the logical source lines:
            |\\|
            |n|
        and a source file may end with a backslash followed by two
        physical newlines, which will generate a last logical source
        line ending in a backslash.
    __________

In subclause 5.1.1.2, append to translation phase 4 the sentence:

    All preprocessing directives are then removed to form the resulting
    translation unit.
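
As an illustration of the revised phase 2 wording, the following C
sketch performs the splice in a single forward pass over the physical
text, so a backslash that becomes adjacent to a newline only because of
an earlier splice is never reconsidered. The function name splice_lines
and its buffer convention are assumptions made for this example; they
are not part of the proposal or of any existing implementation.

```c
#include <stddef.h>

/*
 * Hypothetical helper (not from the proposal): copy src to dst, deleting
 * every backslash that is immediately followed by a new-line in the
 * ORIGINAL text.  Because the scan only moves forward over src, the
 * result of one splice is never rescanned for a further splice.
 * dst must have room for the length of src plus the terminating null.
 * Returns the number of characters written, excluding the null.
 */
size_t splice_lines(const char *src, char *dst)
{
    size_t in = 0;
    size_t out = 0;

    while (src[in] != '\0') {
        if (src[in] == '\\' && src[in + 1] == '\n') {
            in += 2;                 /* drop the backslash/new-line pair */
        } else {
            dst[out++] = src[in++];  /* ordinary character: copy it */
        }
    }
    dst[out] = '\0';
    return out;
}
```

Applied to the input of footnote 4a (three backslashes, two newlines,
the letter n, and a newline), this sketch leaves two backslashes, a
newline, the letter n, and a newline, matching the logical source lines
given in the footnote.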
Append to subclause 5.1.1.3, immediately before the example:

    A syntax rule is violated if, in translation phase 4, any source
    file[*] or header fails to be an example of the syntactic category
    preprocessing-file or, in translation phase 7, the translation unit
    fails to be an example of the syntactic category translation-unit.

add the footnote:

    [*] This includes any included source file or header.

and add the Rationale material:

    In translation phase 4, the syntactic category preprocessing-file
    applies to each included file separately from the file it is
    included into. Thus an included file cannot contain (for example)
    unbalanced #else or #elif directives.

Replace:
    translation unit
with
    preprocessing source unit
in each of:
  - the tenth bullet item of subclause 5.2.4.1,
  - subclause 6.8.3.1,
  - subclause 6.8.3.5,
  - subclause 6.8.8.

Replace:
    source file
with
    preprocessing source unit
in 6.8.3.4 (at two locations).
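
The rule that each included file must match preprocessing-file on its
own can be illustrated with a rough per-file check. The sketch below is
a simplification under stated assumptions: the name
conditionals_balanced is hypothetical, comments and line splicing are
ignored, and the order of #else and #elif within a group is not
verified. It only shows that balance is judged file by file.

```c
#include <ctype.h>
#include <string.h>

/*
 * Hypothetical helper (not from the proposal): return 1 if the
 * conditional inclusion directives in the text of ONE file are balanced
 * within that file, 0 otherwise.
 * Assumptions: no comments, no line splicing, no trigraphs; the order
 * of #else and #elif inside a group is not checked.
 */
int conditionals_balanced(const char *text)
{
    int depth = 0;
    const char *p = text;

    while (*p != '\0') {
        const char *q = p;
        size_t len = 0;

        /* A directive line is optional white space, '#', then a name. */
        while (*q == ' ' || *q == '\t') q++;
        if (*q == '#') {
            q++;
            while (*q == ' ' || *q == '\t') q++;
            while (isalpha((unsigned char)q[len])) len++;

            if ((len == 2 && strncmp(q, "if", 2) == 0) ||
                (len == 5 && strncmp(q, "ifdef", 5) == 0) ||
                (len == 6 && strncmp(q, "ifndef", 6) == 0)) {
                depth++;                    /* opens a group */
            } else if (len == 4 && (strncmp(q, "else", 4) == 0 ||
                                    strncmp(q, "elif", 4) == 0)) {
                if (depth == 0) return 0;   /* no group open in this file */
            } else if (len == 5 && strncmp(q, "endif", 5) == 0) {
                if (depth == 0) return 0;   /* closes nothing in this file */
                depth--;
            }
        }

        /* Advance to the start of the next line. */
        while (*p != '\0' && *p != '\n') p++;
        if (*p == '\n') p++;
    }
    return depth == 0;                      /* every group must be closed */
}
```

Under this sketch, a file consisting of a declaration, the line
"#elif 0", and another declaration is rejected, whatever conditional
group happens to be open in the file that includes it.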