Public Comment Number PC-____

ISO/IEC CD 9899 (SC22N2620) Public Comment
===========================================

Date: 1998-02-09
Author: Clive D.W. Feather
Author Affiliation: Self
Postal Address: Demon Internet Limited
                322 Regents Park Road
                London N3 2QQ
                United Kingdom
E-mail Address:
Telephone Number: +44 181 371 1138
Fax Number: +44 181 371 1037
Number of individual comments: 1

Comment 1.
Category: Inconsistency
Committee Draft subsection: various
Title: problems with UCNs
Detailed description:

Further examination of UCNs shows that they have many problems
associated with them, and in particular produce behaviour very
different from that which would occur with C89.

The following example was presented in comp.std.c by Antoine Leca and
is summarised by me:

What is the effect of the following code:

    #include <stdio.h>
    #define str(s) #s
    int main(void)
    {
        printf (" # of <%s> is <%s>\n", "$", str ("$"));
        return 0;
    }

Since $ is not part of the basic character set, this is not strictly
conforming. However, assume that the implementation has a
representation for $. Then, under C89 the output is clearly:

    # of <$> is <"$">

Under C9X, the output is probably one of:

    # of <$> is <"\u0024">

or

    # of <$> is <"\$">

At Translation Phase 1, both $s will be converted to \u0024, and so
the source will become:

    #include <stdio.h>
    #define str(s) #s
    int main(void)
    {
        printf (" # of <%s> is <%s>\n", "\u0024", str ("\u0024"));
        return 0;
    }

When the # operator is applied as part of the expansion of str, the \
is doubled, producing the line:

        printf (" # of <%s> is <%s>\n", "\u0024", "\"\\u0024\"");

in accordance with 6.8.3.2p2.

Now, when Translation Phase 5 (TP5) is reached one has to decide
whether the UCN is recognised first, generating:

        printf (" # of <%s> is <%s>\n", "\u0024", "\"\$\"");

and undefined behaviour because of the escape sequence \$ - though I
would expect at least some implementations to generate:

    # of <$> is <"\$">

- or else the escape sequence \\ is recognised first, generating the
output:

    # of <$> is <"\u0024">

Neither, however, is what the naive programmer would expect, and
neither interpretation allows a non-basic character to remain in a
string that has the # operator applied to it.

Another serious issue with UCNs is that they do not mix well with
systems such as ISO 2022. Consider a situation where redundant shift
sequences appear within string literals in source files. In C89 these
sequences will be retained throughout the translation process and will
appear when the literal is output by the program. In C9X the
characters in the literal will be converted to UCNs and the shift
sequences lost; a new set of, possibly different, shift sequences has
to be added during TP5. For some applications this is a Quiet Change
from C89.