[Note to WG14: some items - 27 is a good example - are the sort of thing
that should have been, but wasn't, spotted during the C9X review
process. We agreed then that there might be "cleanup" issues that
surfaced and would have to be dealt with through Defect Reports.]

==== UK Defect Report 1 of 2001-09

Subject: completion of declarators

------- Problem -------

6.2.1#7 reads in part:

    Any other identifier has scope that begins just after the
    completion of its declarator.

However, nothing says when a declarator is completed. While it seems
obvious to experienced people that this means the syntactic end of the
declarator, the term "complete" has other meanings when discussing
declarations and objects, and is therefore a bad term to use.

Suggested Technical Corrigendum
-------------------------------

Change the quoted text to:

    Any other identifier has scope that begins just after the end of
    the full declarator it appears in.

==== UK Defect Report 2 of 2001-09

Subject: are values a form of behaviour?

------- Problem -------

I can see nothing that says or implies that production of an
unspecified value is a form of unspecified behaviour, and similarly for
implementation-defined values. It is therefore arguable that a program
is strictly conforming even if its output depends on an unspecified
value.

Suggested Technical Corrigendum
-------------------------------

Add a new paragraph 4#2a after 4#2:

    [#2a] An evaluation that makes use of an unspecified or
    implementation-defined value is a form of unspecified or
    implementation-defined behaviour respectively.

==== UK Defect Report 3 of 2001-09

Subject: limits are required for optional types

------- Problem -------

The types sig_atomic_t and wint_t are optional on freestanding
implementations, since they don't have to provide the relevant headers.
But the limits SIG_ATOMIC_MIN, SIG_ATOMIC_MAX, WINT_MIN, and WINT_MAX
are in <stdint.h>, which all implementations must provide.
So a freestanding implementation must provide limits for types which it
doesn't implement.

Suggested Technical Corrigendum
-------------------------------

Append to 7.18.3#2:

    A freestanding implementation shall only define the symbols
    corresponding to those typedef names it actually provides.

==== UK Defect Report 4 of 2001-09

Subject: Lacuna applying C89:TC1 to C99

------- Problem -------

Defect Report 009 made a change to the text concerning function
declarators. This text seems not to have made it into C99, even though
the issue remains valid. The change should be reinstated.

Suggested Technical Corrigendum
-------------------------------

Change 6.7.5.3#11 to:

    [#11] If, in a parameter declaration, an identifier can be treated
    as a typedef name or as a parameter name, it shall be taken as a
    typedef name.

==== UK Defect Report 5 of 2001-09

Subject: non-directives within macro arguments

------- Problem -------

Consider the code:

    #define nothing(x)    // Nothing

    /* Case 1 */
    nothing (
    #include <stdio.h>
    )

    /* Case 2 */
    nothing (
    #nonstandard
    )

6.10.3#11 reads in part:

    If there are sequences of preprocessing tokens within the list of
    arguments that would otherwise act as preprocessing directives,
    the behavior is undefined.

This clearly covers case 1. However, it is not clear whether or not
case 2 is a preprocessing directive. It is a "non-directive", but is
that also a directive? If case 2 is a directive, it is undefined
behaviour. If it is not, then case 2 is strictly conforming and
macro-expands to nothing. Since non-directives are only valid as
extensions, it might be more sensible for them to behave as directives
do and make the behaviour undefined in this case.
Suggested Technical Corrigendum
-------------------------------

In 6.10.3#11, change the last sentence to:

    If there are sequences of preprocessing tokens within the list of
    arguments that would otherwise act as preprocessing directives or
    as non-directives (that is, the first preprocessing token on a
    line is a #), the behavior is undefined.

==== UK Defect Report 6 of 2001-09

Subject: are "struct fred" and "union fred" the same type?

------- Problem -------

Consider the code:

    union fred { int a; };

    int main (void)
    {
        struct fred *ptr;    /* Line X */
        // ...

I can see nothing that forbids this code. In particular, 6.7.2.3#8
reads:

    [#8] If a type specifier of the form

        struct-or-union identifier

    or

        enum identifier

    occurs other than as part of one of the above forms, and a
    declaration of the identifier as a tag is visible, then it
    specifies the same type as that other declaration, and does not
    redeclare the tag.

At line X a declaration of "fred" as a tag is visible, so this line
specifies the same type as that other declaration, even though this
uses "struct" and that uses "union"! It has been further pointed out
to me that nothing in the Standard actually says that "union x" is a
union type as opposed to a structure type, and vice versa.

Suggested Technical Corrigendum
-------------------------------

Append to 6.7.2.1#6:

    The keywords struct and union indicate that the type being
    specified is, respectively, a structure type or a union type.

Add a new paragraph following 6.7.2.3#1:

    [#1a] Where two declarations that use the same tag declare the
    same type, they shall both use the same choice of struct, union,
    or enum.

==== UK Defect Report 7 of 2001-09

Subject: incomplete argument types when calling non-prototyped functions

------- Problem -------

Consider the code:

    void jim ();
    void sheila (void);
    // ...
    sheila (jim ());    /* Line A */
    jim (sheila ());    /* Line B */

Line A violates the constraint of 6.5.2.2#2, which requires the
argument to have a type that can be assigned to the parameter type.
But line B doesn't, because that constraint only applies to prototyped
functions. 6.5.2.2#4 reads in part:

    [#4] An argument may be an expression of any object type.

but this is not a constraint. Should it not be? After all, the
compiler has to know the type of the argument in order to compile the
function call, so it can check at that point that the argument has a
complete object type.

Suggested Technical Corrigendum
-------------------------------

Add a new paragraph #1a following 6.5.2.2#1:

    [#1a] Each argument shall have a type which is a complete object
    type.

==== UK Defect Report 8 of 2001-09

Subject: "overriding" in designated initializers

------- Problem -------

Consider the code:

    struct fred { char s [6]; int n; };
    struct fred x [] = { { { "abc" }, 1 }, [0].s[0] = 'q' };
    struct fred y [] = { { { "abc" }, 1 }, [0] = { .s[0] = 'q' } };

Both x and y will contain one element of type struct fred, which will
be initialized by the initializer "{ { "abc" }, 1 }" and then modified
in some way. The question is exactly how it is modified. 6.7.8#19
reads:

    [#19] The initialization shall occur in initializer list order,
    each initializer provided for a particular subobject overriding
    any previously listed initializer for the same subobject; all
    subobjects that are not initialized explicitly shall be
    initialized implicitly the same as objects that have static
    storage duration.

In the case of x, it is fairly clear that the first initializer sets:

    x [0].s [0] = 'a'
    x [0].s [1] = 'b'
    x [0].s [2] = 'c'
    x [0].s [3] = '\0'
    x [0].n = 1

and the second one sets:

    x [0].s [0] = 'q'

Finally, the remaining subobjects are initialized implicitly:

    x [0].s [4] = 0
    x [0].s [5] = 0

Now consider the second initializer for y.
One point of view says that this behaves the same as for x: it
specifies a value for y [0].s [0], after which the two remaining
elements of y [0].s are still uninitialized and so are set to zero.
The other point of view says that this sets:

    y [0] = (struct fred) { .s[0] = 'q' }

and that the rule concerning "all subobjects that are not initialized
explicitly" applies recursively. If so, the effect is to set:

    y [0].s [0] = 'q'
    y [0].s [1] = 0
    y [0].s [2] = 0
    y [0].s [3] = 0
    y [0].s [4] = 0
    y [0].s [5] = 0
    y [0].n = 0

Which of these is correct?

Suggested Technical Corrigendum 1
---------------------------------

If x and y are supposed to have the same effect, change 6.7.8#19 to:

    [#19] The initialization shall occur in initializer list order,
    each initializer provided for a particular subobject overriding
    any previously listed initializer for the same subobject. When all
    initializers have been applied, any subobjects of the overall
    object being initialized that have not been initialized explicitly
    shall be initialized implicitly the same as objects that have
    static storage duration.

and add a new paragraph at the end:

    [#39] To illustrate the rules for implicit initialization, in:

        struct fred { char s [6]; int n; };
        struct fred x [] = { { { "abc" }, 1 }, [0].s[0] = 'q' };
        struct fred y [] = { { { "abc" }, 1 }, [0] = { .s[0] = 'q' } };

    the definitions of x and y result in identical objects. Each will
    be an array with one element; within that element, the members
    s[4] and s[5] are implicitly initialized to zero.
Suggested Technical Corrigendum 2
---------------------------------

If x and y are supposed to be different, change 6.7.8#19 to:

    [#19] The initialization shall occur in initializer list order,
    each initializer provided for a particular subobject overriding
    any previously listed initializer for the same subobject; for each
    brace-enclosed list, all subobjects within the object that that
    list initializes that are not initialized explicitly shall be
    initialized implicitly the same as objects that have static
    storage duration.

and add a new paragraph at the end:

    [#39] To illustrate the rules for implicit initialization, in:

        struct fred { char s [6]; int n; };
        struct fred x [] = { { { "abc" }, 1 }, [0] = { .s[0] = 'q' } };
        struct fred y [] = { { .s[0] = 'q' } };

    the definitions of x and y result in identical objects. Each will
    be an array with one element; within that element, all the members
    are implicitly initialized to zero except for s[0]. In the
    definition of x the first initializer has no effect, since the
    second one initializes the same subobject (x[0]).

==== UK Defect Report 9 of 2001-09

Subject: Editorial

------- Problem -------

In 7.20.7.2, the first paragraph of the Returns section does not have
a separate paragraph number.

==== UK Defect Report 10 of 2001-09

Subject: mbtowc and partial characters

------- Problem -------

If mbtowc() is given a partial character (or an escape sequence that
isn't a complete character), it returns -1. However, is it supposed to
remember the relevant state information, or should it ignore it?

Consider an implementation where the character '\xE' starts an
alternate shift state and '\xF' returns to the initial shift state.
The wide character encodings are:

    initial shift state:    'x' -> ASCII codes
    alternate shift state:  'x' -> ASCII codes + 0x100

Starting in the initial shift state,

    mbtowc (&wc, "\xEZ", 2);

should return 2 and set wc to 0x15A.
However, starting in the initial shift state, consider:

    mbtowc (&wc1, "\xE", 1);
    mbtowc (&wc2, "Z", 1);

I would expect that the first call returns -1, leaving wc1 unaltered,
while the second returns 1 and sets wc2 to 0x5A. However, is it
permitted for the second to set wc2 to 0x15A? If so, how is an
application meant to use mbtowc?

[The newer function mbrtowc does not have this problem.]

Suggested Technical Corrigendum
-------------------------------

The UK National Body prefers to add a new return value for this case.
To do so, change the main part (see UK DR 9) of 7.20.7.2#3 to read:

    If s is a null pointer, the mbtowc function returns a nonzero or
    zero value, if multibyte character encodings, respectively, do or
    do not have state-dependent encodings. If s is not a null pointer,
    the mbtowc function returns the first of the following that
    applies (given the current conversion state):

    0                    if s points to the null character

    between 1 and n      if the next n or fewer bytes complete a valid
    inclusive            multibyte character (which is the value
                         stored); the value returned is the number of
                         bytes that complete the multibyte character.
                         The value returned will not be greater than
                         that of the MB_CUR_MAX macro.

    (size_t)(-2)         if the next n bytes contribute to an
                         incomplete (but potentially valid) multibyte
                         character, and all n bytes have been
                         processed (no value is stored).

    (size_t)(-1)         if an encoding error occurs, in which case
                         the next n or fewer bytes do not contribute
                         to a complete and valid multibyte character
                         (no value is stored); the value of the macro
                         EILSEQ is stored in errno, and the conversion
                         state is unspecified.

(Note that most of this wording comes from mbrtowc.)

and delete #4. If this option is unacceptable, then append to
7.20.7.2#2:

    If the next multibyte character is incomplete or invalid, the
    shift state is unaffected and nothing is stored.
==== UK Defect Report 11 of 2001-09

Subject: non-prototyped function calls and argument mismatches

------- Problem -------

Consider the code:

    #include <stdio.h>

    int f ();

    int main (void)
    {
        return f (0);
    }

    #ifdef PROTO
    int f (unsigned int x)
    #else
    int f (x)
        unsigned int x;
    #endif
    {
        return printf ("%u\n", x);
    }

Now, 6.5.2.2#6 reads:

    [#6] If the expression that denotes the called function has a type
    that does not include a prototype, the integer promotions are
    performed on each argument, and arguments that have type float are
    promoted to double. These are called the default argument
    promotions. [...] If the function is defined with a type that
    includes a prototype, and either the prototype ends with an
    ellipsis (, ...) or the types of the arguments after promotion are
    not compatible with the types of the parameters, the behavior is
    undefined. If the function is defined with a type that does not
    include a prototype, and the types of the arguments after
    promotion are not compatible with those of the parameters after
    promotion, the behavior is undefined, except for the following
    cases:

    -- one promoted type is a signed integer type, the other promoted
       type is the corresponding unsigned integer type, and the value
       is representable in both types;

    -- both types are pointers to qualified or unqualified versions of
       a character type or void.

So the above code is undefined if PROTO is defined, but is legitimate
if it is not. This seems inconsistent.

Traditionally, when a function is called and no prototype is in scope,
the implementation applies the default argument promotions to the
argument value and then assumes that is the parameter type. If it
isn't, this can cause all kinds of problems, which is why the
behaviour is undefined. However, if it is known that the argument
value will be correctly handled by the parameter type, there is no
problem; this is the rationale behind the exceptions. The exceptions
should apply to both cases, no matter how the function is eventually
defined.
Suggested Technical Corrigendum
-------------------------------

Change the part of 6.5.2.2#6 after the omission to:

    If the types of the arguments after promotion are not compatible
    with those of the parameters after promotion 78A), the behavior is
    undefined, except for the following cases:

    -- one promoted type is a signed integer type, the other promoted
       type is the corresponding unsigned integer type, and the value
       is representable in both types;

    -- both types are pointers to qualified or unqualified versions of
       a character type or void.

    If the function is defined with a type that includes a prototype,
    and either any parameter has a type which is altered by the
    default argument promotions or the prototype ends with an ellipsis
    (, ...), the behavior is undefined.

    78A) Because of the rule later in this paragraph, it is only
    necessary to check whether the parameter type undergoes promotion
    when the function is not defined using a prototype.

==== UK Defect Report 12 of 2001-09

Subject: multiple inclusion of headers

------- Problem -------

Consider the code:

    #include <stdio.h>    // Line 1
    #undef FOPEN_MAX      // Line 2, permitted by 7.1.3#3
    #include <stdio.h>    // Line 3
    #ifdef FOPEN_MAX      // Line 4

7.1.2 says:

    [#4] Standard headers may be included in any order; each may be
    included more than once in a given scope, with no effect different
    from being included only once, except that the effect of including
    <assert.h> depends on the definition of NDEBUG (see 7.2).

Does "with no effect different" mean:

(1) the includes on lines 1 and 3 have the same effect, so at line 4
    the macro FOPEN_MAX is defined;

(2) the include on line 3 has no effect, so that at line 4 the macro
    FOPEN_MAX is undefined;

(3) something else?

Most current implementations wrap the contents of headers with an
"idempotent guard", such as:

    #ifndef _STDIO_H_INCLUDED_
    #define _STDIO_H_INCLUDED_

    // Real contents go here

    #endif

This will provide behaviour (2), which I would suggest is the most
desirable.
Furthermore, the concept of scope doesn't apply here, both because
includes happen during preprocessing and because there is a
requirement in the same paragraph that:

    If used, a header shall be included outside of any external
    declaration or definition,

If the wording is being altered, this would be a good opportunity to
fix this as well.

Suggested Technical Corrigendum
-------------------------------

Change the first sentence of 7.1.2#4 to:

    [#4] Standard headers may be included in any order; each may be
    included any number of times in a preprocessing translation unit.
    The second and subsequent occurrences of a given header shall be
    ignored, except in the case of <assert.h> (where the behaviour is
    defined in subclause 7.2).

==== UK Defect Report 13 of 2001-09

Subject: common initial sequences and related issues with unions

------- Problem -------

6.5.2.3#5 reads:

    [#5] One special guarantee is made in order to simplify the use of
    unions: if a union contains several structures that share a common
    initial sequence (see below), and if the union object currently
    contains one of these structures, it is permitted to inspect the
    common initial part of any of them anywhere that a declaration of
    the complete type of the union is visible. Two structures share a
    common initial sequence if corresponding members have compatible
    types (and, for bit-fields, the same widths) for a sequence of one
    or more initial members.

Two possible reasons have been suggested for this rule.

(1) The implementation may put padding between structure members.
    This rule is necessary to ensure that the common initial sequence
    uses the same padding in both places, so that the corresponding
    members occupy the same location.

(2) If we consider part of the second example in 6.5.2.3#8:

        struct t1 { int m; };
        struct t2 { int m; };

        int f(struct t1 * p1, struct t2 * p2)
        {
            if (p1->m < 0)
                p2->m = -p2->m;
            return p1->m;
        }

    the rule is necessary for an implementation to realize that p1 and
    p2 might refer to the same location.
If (1) is the reason, then the example is a bad one, because the two
members are both at the start of their respective structures, and
therefore are required to be at an offset of 0 from the start of the
structure (and therefore of the union). It should be changed to use a
member further along a common initial sequence.

On the other hand, the requirement is not actually very suitable.
Consider the code:

    struct t1 { int x; double y; char z; } s1;
    struct t2 { int i; double q; unsigned long u; } s2;

    void f1 (struct t1 *p) { p->y = sqrt ((double) p->x); }
    void f2 (struct t2 *p) { p->q = sqrt ((double) p->i); }

    union { struct t1 t1; struct t2 t2; } u;

    // Followed by code using the common initial sequence property

The implementation might wish to use different padding in structures
t1 and t2. It is prevented from doing so by the existence of the
union, but a one-pass compilation will not become aware of this until
after compiling f1 and f2. Therefore it will have to assume, when
deciding the layout of the structures, that there might be a union.
Therefore the rule about a union type being visible is useless.

If, on the other hand, (2) is the reason, then the wording does not
address enough cases. For example, consider a version of the example
in 6.5.2.3#8 where one member is signed and the other is unsigned.

    struct t1 { signed int m; };
    struct t2 { unsigned int m; };

    int f(struct t1 * p1, struct t2 * p2)
    {
        if (p1->m > 0)
            p2->m = p2->m * 2;
        return p1->m;
    }

There is no common initial sequence, but nevertheless many of the same
issues apply. On the other hand, the correct way for a function such
as f to protect itself against such aliasing is not to rely on the
rule in 6.5.2.3#8, but rather to use the restrict qualifier. I would
suggest, therefore, that (2) is not a valid reason for the rule.

As stated above, a corollary of this discussion is that the "union
type must be visible" rule is useless.
Finally, one of the changes from C90 to C99 was to remove any
restriction on accessing one member of a union when the last store was
to a different one. The rationale was that the behaviour would then
depend on the representations of the values. Since this point is often
misunderstood, it might well be worth making it clear in the Standard.

Suggested Technical Corrigendum
-------------------------------

To address point (1), in 6.5.2.3#8, second example, change the two
structures to:

    struct t1 { double d; int m; };
    struct t2 { double d; int m; };

To address the wider point about visibility, change the first part of
6.5.2.3#5 to read:

    [#5] One special guarantee is made in order to simplify the use of
    unions: if several structure types share a common initial sequence
    (see below), then corresponding members are required to lie at the
    same offset from the start of the union. Therefore if a union
    contains two or more such structures, the common initial part may
    be inspected using any of them, no matter which one was used to
    store the value.

To address issues about "similar" types raised in point (2) above,
change the second part of #5 to read:

    Two structures share a common initial sequence if corresponding
    members have matching types for a sequence of one or more initial
    members. Two types, in turn, are matching if they are

    - compatible types (and, for bit-fields, the same widths),
    - signed and unsigned versions of the same integer type,
    - qualified or unqualified versions of matching types, or
    - pointers to matching types.

To address the issue about "type punning", attach a new footnote 78a
to the words "named member" in 6.5.2.3#3:

    78a) If the member used to access the contents of a union object
    is not the same as the member last used to store a value in the
    object, the appropriate part of the object representation of the
    value is reinterpreted as an object representation in the new type
    as described in 6.2.6 (a process sometimes called "type punning").
    This might be a trap representation.

Note: all the above changes are independent of one another, depending
on the committee's view of the issues.

==== UK Defect Report 14 of 2001-09

Subject: ordering of "defined" and macro replacement

------- Problem -------

Consider the code:

    #define repeat(x) x && x      // Line 1
    #if repeat(defined fred)      // Line 2

and the code:

    #define forget(x) 0           // Line 3
    #if forget(defined fred)      // Line 4

6.10.1#3 says:

    [#3] Prior to evaluation, macro invocations in the list of
    preprocessing tokens that will become the controlling constant
    expression are replaced (except for those macro names modified by
    the defined unary operator), just as in normal text. If the token
    defined is generated as a result of this replacement process or
    use of the defined unary operator does not match one of the two
    specified forms prior to macro replacement, the behavior is
    undefined.

Does line 2 "generate" a defined operator? Is line 4 strictly
conforming code, or does the fact that macro expansion "forgets" the
defined operator cause a problem?

The restriction was clearly intended to make code like the following
undefined:

    #define jim defined
    #if jim loves sheila

I would guess that the original intention was that any "defined X"
pair in the original source worked correctly. The proposed change
would resolve this.

In addition, given the order of events, it is unsuitable to say that a
"defined X" expression is "evaluated". Rather, it should be described
as a textual substitution.

Suggested Technical Corrigendum
-------------------------------

Change 6.10.1#1 to read:

    [...] defined identifier or defined ( identifier ) which are
    replaced by the token 1 if the identifier is currently [...]
    subject identifier), or the token 0 if it is not.

and #3 to read:

    [#3] Prior to evaluation, the list of preprocessing tokens that
    will become the controlling constant expression is examined.
    Firstly all expressions using the "defined" operator are replaced
    as described above, and then macro invocations are replaced, just
    as in normal text. If the token defined appears in the list after
    the replacement process, or the use of the defined unary operator
    does not match one of the two specified forms prior to macro
    replacement, the behavior is undefined. After all [...]

==== UK Defect Report 15 of 2001-09

Subject: macro invocations with no arguments

------- Problem -------

Consider the code:

    #define m0() replacement
    #define m1(x) begin x end

    m0()
    m1()

The number of arguments in a macro invocation is defined by 6.10.3#11:

    [#11] The sequence of preprocessing tokens bounded by the
    outside-most matching parentheses forms the list of arguments for
    the function-like macro. The individual arguments within the list
    are separated by comma preprocessing tokens, but comma
    preprocessing tokens between matching inner parentheses do not
    separate arguments.

while 6.10.3#4 reads:

    [#4] If the identifier-list in the macro definition does not end
    with an ellipsis, the number of arguments (including those
    arguments consisting of no preprocessing tokens) in an invocation
    of a function-like macro shall equal the number of parameters in
    the macro definition. Otherwise, there shall be more arguments in
    the invocation than there are parameters in the macro definition
    (excluding the ...). There shall exist a ) preprocessing token
    that terminates the invocation.

Now:

    EITHER: the invocation of m0 has a single argument,
    OR:     the invocation of m1 has no arguments,

and in either case the requirement of 6.10.3#4 is violated. This is
clearly not the intent.

Suggested Technical Corrigendum
-------------------------------

Append to 6.10.3#4:

    If the invocation has no preprocessing tokens between the
    parentheses, this shall count as one argument unless the macro
    definition has neither an identifier list nor an ellipsis, in
    which case it shall count as no arguments.
==== UK Defect Report 16 of 2001-09

Subject: indeterminate values and identical representations

------- Problem -------

This is an intermingling of something that started out as two separate
questions:

(1) if an object holds an indeterminate value, can that value change
    other than by an explicit action of the program?

(2) if two objects hold identical representations derived from
    different sources, can they be used interchangeably?

However, after much discussion the UK National Body decided that they
were better treated together. Both involve the concept of the
"provenance" of a value.

Consider the code:

    char *p;
    unsigned char cp [sizeof p];

    p = malloc (SOME_VALUE);
    // Assume the allocation succeeds
    // Other code omitted that does not alter p.
    memcpy (cp, &p, sizeof p);
    (free)(p);
    // Point X
    // ...
    // Point Y
    if (memcmp (cp, &p, sizeof p))
        // ...

After the call to free, the value of p is indeterminate. Is the
implementation allowed, at this point (point X), to change this
indeterminate value (presumably through compiler magic) so that the
memcmp function sees a difference, or must the value remain constant?
Can it make the change later, between points X and Y?

It is suggested that this is implied by 6.2.4#2:

    An object [...] retains its last-stored value throughout its
    lifetime.

particularly if each byte of an object is also an object. On the other
hand, such a requirement would eliminate useful optimisation and
debugging opportunities. (As an example of an optimisation, if p has
been loaded into a register and then modified, it need not be written
back to memory; as an example of a debugging opportunity, p could be
set to a null pointer or to a detectable value.)

[Note that where an object contains padding, 6.2.6.1#6 and #7 allows
the value of padding bits and bytes to change whenever the object
changes.]
If an implementation *is* allowed to change the value of p, then
consider the code:

    char *p, *q, *r;

    p = malloc (SOME_VALUE);
    // Assume the allocation succeeds
    q = p;
    r = p + 1;
    // Other code omitted that does not alter p, q, or r
    (free)(p);
    // Point Z

Can it change the value of q or r at point Z? What about later?

Now consider the code:

    int *p, *q;

    p = malloc (sizeof (int));
    assert (p != NULL);
    (free)(p);
    q = malloc (sizeof (int));
    assert (q != NULL);
    if (memcmp (&p, &q, sizeof p) == 0)
    {
        // Assume this point is reached
        *p = 42;    // Line A

Is the assignment valid (because an assignment using *q would have
been, and the two variables hold identical values)? Or is it invalid
because the last-stored value of p is now indeterminate (because of
the free)?

Similarly, consider the code:

    int x, y;
    int *p, *q;

    p = &x + 1;
    q = &y;
    if (memcmp (&p, &q, sizeof p) == 0)
    {
        // Assume this point is reached
        *q = 42;    p [-1] = 42;    // Line B
        *p = 42;    q [-1] = 42;    // Line C

The assignments on line B are clearly valid, but what about those on
line C? After all, p and q are identical, even in their hidden bits.

What if we then add:

    int *r;

    remote_memcpy (&r, &p, sizeof p);    // See note
    *r = 42;        // Line D
    r [-1] = 42;    // Line E

(The function remote_memcpy is identical to memcpy, but it is done in
another translation unit so that the compiler cannot associate special
semantics with it.) Which, if either, of the assignments is allowed?

Another example is the program:

    static int *p;

    void f (int n)
    {
        int sum = 0;
        int *q = &n;

        if (memcmp (&p, &q, sizeof p) == 0)
            for (int i = 0; i < n; i++)
                sum += i, p [i] = 0;
        p = &n;
    }

    int main (void)
    {
        int x;
        p = &x;
        f (1);
        f (6);
        return 0;
    }

On the first call to f the test is false. Therefore p is set to point
to n. The value of p becomes indeterminate at the end of the call, but
on most implementations this will have no effect. On the second call
the test is true.
Therefore the first time round the loop p [0], which is n, will be set
to 0 and the loop will terminate. However, most implementations would
reasonably assume that n is not changed by anything in the loop and
generate code accordingly. If the behaviour were undefined for some
reason, such an implementation would be conforming. But is the program
strictly conforming, or is its behaviour undefined?

Finally, note that we can generate a similar situation without giving
the compiler any clue in advance:

    void f (int vec [], int n)
    {
        void *vp;
        int *p;

        printf ("%p or %p", (void *) vec, (void *) &n);    // Line Q
        scanf ("%p", &vp);
        p = vp;
        for (int i = 0; i < n; i++)
            p [i] = 0;
    }

The user could ensure that p is set to either of the values printed.
If a debugger is used, it isn't even necessary to retain line Q to
determine the value to enter on stdin, and therefore the compiler has
no warning that the address of n is being taken.

Resolution
----------

After much discussion, the UK National Body came to a number of
conclusions as to what it would be desirable for the Standard to mean.
These can be expressed as three requirements.

(1) The implementation is entitled to take account of the provenance
    of a pointer value when determining what actions are and are not
    defined. Thus the assignments on lines A and C involve undefined
    behaviour. Similarly line D would be undefined and line E valid,
    though in practice a compiler would probably assume that p could
    point anywhere.

(2) Where a pointer value becomes indeterminate because the object
    pointed to has reached the end of its lifetime, all objects whose
    effective type is a pointer and that point to the same object
    acquire an indeterminate value. Thus p at point X, and p, q, and r
    at point Z, can all change their value.

(3) At any time that the compiler can determine that an object
    contains an indeterminate value, even if the type of the object
    does not have trap representations, the object may change value
    arbitrarily.
Thus p need not have the same values at lines X and Y. As soon as the object is given an explicit value, this behaviour stops.

Suggested Technical Corrigendum
-------------------------------

Change 3.17.2 to:

    [#1] indeterminate value
    a value which, at any given moment, could be either an
    unspecified value or a trap representation.

    [#2] While an object holds an indeterminate value it is
    indeterminate. Successive reads from an object that is
    indeterminate might return different results. Storing a value in
    an object, other than an indeterminate value, means that the
    object is no longer indeterminate.

Change the last sentence of 6.2.4#2 from:

    The value of a pointer becomes indeterminate when the object it
    points to reaches the end of its lifetime.

to:

    When an object reaches the end of its lifetime, any object with
    an effective type that is a pointer type and that points to that
    object becomes indeterminate.

[Various uses of the word "indeterminate" could be tidied up, but this is the only one where the meaning needs to change.]

Add a new paragraph to 6.5.3.2:

    [#5] The implementation is permitted to use the derivation of a
    pointer value in determining whether or not access through that
    pointer is undefined behaviour, even if the pointer compares
    equal to, or has the same representation as, a different pointer
    for which the access would be permitted. For example, if two
    objects with the same type have non-overlapping lifetimes and
    happened to occupy the same address, a pointer to one cannot be
    used to access the other.

[The * operator seems a reasonable place to put this. However, it could equally be elsewhere.]

==== UK Defect Report 17 of 2001-09
Subject: constant expressions

-------
Problem
-------

When is an expression a constant expression ?
Consider the code (at block scope):

    enum e1 { ex1 = INT_MAX + 1 };        // Line E1
    enum e2 { ex2 = INT_MAX + (0, 1) };   // Line E2
    char *p1 = (1 - 1);                   // Line P1
    char *p2 = (42, 1 - 1);               // Line P2
    short s1 = 42 + (0, 1);               // Line S1
    p1 = (42, 1 - 1);                     // Line X1
    s1 = (42, 69);                        // Line X2
    p2 = 0;                               // Line X3
    p2 = 1 - 1;                           // Line X4

On line E1 the syntax says that INT_MAX + 1 is a constant-expr. Therefore this is a constant expression, the requirements of 6.6 apply, and line E2 violates the constraint in 6.6#3.

On the remaining lines the syntax says that the code following the = sign is an assignment-expr; at no point in the parse does a constant-expr occur. So are these constant expressions ?

For line P1 to be legitimate, the expression (1 - 1) must be an integer constant expression (6.3.2.3#3). This implies that any expression comprised entirely of constants is an integer constant expression. So line P2 violates the constraint in 6.6#3 and, rather more worryingly, so does line S1.

If a generic initializer can be a constant expression, then, surely, so can any other expression. This means that lines X1 and X2 violate the constraint in 6.6#3. On the other hand, if they are not constant expressions, then the right hand sides on lines X3 and X4 do not include a null pointer constant; nor does line P1.

Consider also:

    static int v = sizeof (int [(2, 2)]);

This is legitimate if, and only if, "(2, 2)" is a constant expression.

It would appear that the term "constant expression" actually has four subtly different meanings.

(1) An object in the syntax. Where the syntax tree contains constant-expr the resulting code must meet the constraints and semantics of 6.6. An example is 6.7.2.2, where explicit values for enumeration constants must be constant-exprs.

(2) A requirement on the program that a given construct must, in context, be a constant expression even though in other contexts the expression need not be constant.
An example is 6.7.8#4: if the object has static storage duration, the initializer is subject to the constraints and semantics of 6.6, but if it has automatic storage duration there is no such requirement.

(3) A requirement on the implementation that an entity must be a constant expression. For example, this applies to macros in standard headers. The implementation is not conforming if the definition does not meet the syntax, constraints, and semantic requirements of 6.6.

(4) A test that distinguishes two cases. An example is 6.3.2.3#3, where a certain subset of integer expressions (those that are constant-exprs and have a value of 0) are also null pointer constants. It is not clear whether expressions that break the constraints or semantic requirements are erroneous or are simply not constant expressions.

The Standard needs to make clear when each of these four cases applies. On further examination, cases (1) and (2) appear to always be obvious from the text of the Standard. Case (3) appears only to apply to macros defined in standard headers or predefined. Case (4) is harder to identify, but I believe that there are only two situations:

- null pointer constants;
- determining whether a type is variably modified.

Suggested Technical Corrigendum
-------------------------------

Replace 6.6#2 with the following:

    [#2] A constant expression is one which is evaluated during
    translation rather than runtime, usually because the precise
    value will affect the translation in some way.

    [#2a] Where the implementation is required to provide a constant
    expression, that expression shall be one that, if included in
    the appropriate context, would meet the requirements of this
    subclause and whose evaluation would not involve undefined
    behaviour.

    [#2b] An expression has a /translation-time value/ if it meets
    the requirements of this subclause and evaluation would not
    involve undefined behaviour.
    If the expression fails to meet these requirements (for example,
    an integer expression includes a comma operator or a cast to a
    floating type), the expression does not have a translation-time
    value but nevertheless is not necessarily invalid.

Change 6.3.2.3#3 to begin:

    [#3] An integer expression with the translation-time value 0, or
    such an expression cast to type void *, is called a null pointer
    constant.55)

Change 6.7.5.2#1 to read, in part:

    [...] an integer type. If the expression has a translation-time
    value, it shall be greater than zero. The element type [...]

the last part of #4 to read:

    If the size is an integer expression with a translation-time
    value and the element type has a known constant size, the array
    type is not a variable length array type; otherwise, the array
    type is a variable length array type.

#5 to begin:

    [#5] If the size is an expression that does not have a
    translation-time value: if it occurs [...]

#6 to begin:

    [#6] For two array types to be compatible, both shall have
    compatible element types, and if both size specifiers are
    present and have translation-time values, then both size
    specifiers shall have the same value.

and add a new example:

    [#11] EXAMPLE 5: an expression that contains only constants but
    breaks one or more of the rules of 6.6 does not have a
    translation-time value. Therefore, in:

        int fla [5];       // not a VLA, "5" has a translation-time value
        int vla [(0, 5)];  // VLA, 6.6 forbids comma operators

    This can be used to force an array to have a constant size but
    still be variably modified.

==== UK Defect Report 18 of 2001-09
Subject: maximum size of bit fields

-------
Problem
-------

6.7.2.1#3 reads, in part:

    [#3] The expression that specifies the width of a bit-field
    shall be an integer constant expression that has nonnegative
    value that shall not exceed the number of bits in an object of
    the type that is specified if the colon and expression are
    omitted.

Is "the number of bits of the type ..." the width or is it the number of bits in the object representation ? Since it might not be practical to make use of padding bits in such an object, the former would be more sensible.

Suggested Technical Corrigendum
-------------------------------

Change the cited text to read:

    [#3] The expression that specifies the width of a bit-field
    shall be an integer constant expression that has nonnegative
  | value that shall not exceed the width of an object of the type
    that is specified if the colon and expression are omitted.

==== UK Defect Report 19 of 2001-09
Subject: all-zero bits representations

-------
Problem
-------

Consider the code:

    int v [10];
    memset (v, 0, sizeof v);

Most programmers would expect this code to set all the elements of v to zero. However, the code is actually undefined: it is possible for int to have a representation in which all-bits-zero is a trap representation (for example, if there is an odd-parity bit in the value). Consider also:

    int *p;
    p = calloc (n_members, sizeof (int));

This problem applies to all integer types except for unsigned char. I believe that the idiom is well-enough known that it should be made a part of the Standard.

Suggested Technical Corrigendum
-------------------------------

Append to 6.2.6.2#5:

    For any integer type, the object representation where all the
    bits are zero shall be a representation of the value zero in
    that type.

==== UK Defect Report 20 of 2001-09
Subject: graphic characters

-------
Problem
-------

The Standard uses the terms "printing character", "graphic character", and "nongraphic character". The first is discussed in 5.2.2#1 and defined formally in 7.4#3:

    [#3] The term printing character refers to a member of a
    locale-specific set of characters, each of which occupies one
    printing position on a display device;

A "nongraphic character" is clearly a character which is not a graphic character, but "graphic character" is nowhere defined.
It is used only in 5.2.1#3, which requires "the following 29 graphic characters" to be part of the basic character sets, while "nongraphic character" is used in 5.2.2#2 and 6.4.4.4#8 when discussing the \a \b \f \n \r \t and \v escape sequences.

The key questions are:

(1) Are the 29 enumerated graphic characters required to be printing characters ?
(2) Are isalnum() and isspace() required to be false for them ?
(3) Is ispunct() required to be true for them ?

In addition, given that the seven characters corresponding to the escape sequences above are required to be control characters (see 5.2.1#3):

(4) Should "nongraphic character" be replaced by "control character" ?

I believe that the answers should be: (1) yes; (2) yes; (3) yes in the C locale, but not otherwise; (4) yes. However, it is not clear that these answers can be derived from the Standard (though if (1) and (2) are "yes", (3) must at least be "yes in the C locale").

Suggested Technical Corrigendum
-------------------------------

To address (1): in 5.2.1#3, replace "29 graphic characters" with "29 printing characters".

To address (4): in 5.2.2#2 and 6.4.4.4#8 replace "nongraphic" with "control".

To address (2): append to 5.2.1#4:

    A /graphical mark character/ is one of the 29 other printing
    characters listed above.

in 7.4.1.2#2, insert between the two sentences:

    The isalpha function returns false for all graphical mark
    characters.

and in 7.4.1.10#2, change "characters for which" to "characters which are not graphical mark characters and for which".

Given the above changes, (3) can be derived from the modified Standard.

==== UK Defect Report 21 of 2001-09
Subject: preprocessor arithmetic

-------
Problem
-------

Assume that both compile-time and run-time arithmetic have 2's complement, no trap representations, 8/16/32/48/64 bit integer types. Consider the code:

    #if -0xFFFFFFFF < 0

Is this expression true or false ?

6.10.1#3 reads, in part:

    and then each preprocessing token is converted into a token.
    The resulting tokens compose the controlling constant expression
    which is evaluated according to the rules of 6.6, except that
    all signed integer types and all unsigned integer types act as
    if they have the same representation as, respectively, the types
    intmax_t and uintmax_t defined in the header <stdint.h>.

Does the "except" wording apply to the conversion to a token, or only to the evaluation of the expression ? If the former, then 0xFFFFFFFF can be represented in an int (intmax_t), it has a signed type, and the expression is true. If the latter, 0xFFFFFFFF cannot be represented in an int but can be represented in an unsigned int, so it has unsigned type and the expression is false.

I believe that the former was intended, with the preprocessor only having to consider one pair of integer types.

Suggested Technical Corrigendum
-------------------------------

Change the cited text to:

    and then each preprocessing token is converted into a token. The
    resulting tokens compose the controlling constant expression
    which is evaluated according to the rules of 6.6.
  | For the purposes of the conversion and evaluation all signed
    integer types and all unsigned integer types act as if they have
    the same representation as, respectively, the types intmax_t and
    uintmax_t defined in the header <stdint.h>.

Add a footnote reference to the end of this text, and add the footnote:

    140a Thus on an implementation where INT_MAX is 0x7FFF and
    UINT_MAX is 0xFFFF, the constant 0x8000 is signed within a #if
    expression even though it is unsigned in translation phase 7.

==== UK Defect Report 22 of 2001-09
Subject: overflow of sizeof

-------
Problem
-------

Consider the following code:

    char x [SIZE_MAX / 2][SIZE_MAX / 2];
    size_t s = sizeof x;

The size of x cannot be fitted into an object of type size_t. Assuming that SIZE_MAX is 65535, what is the value of s ?

More generally, which of the following is, or should be, the case ?

(1) The value is reduced modulo (SIZE_MAX + 1).
(2) The behaviour is undefined (or perhaps implementation-defined).
(3) The program is forbidden to use sizeof with such a large argument.
(4) The implementation must ensure that no object can be larger than SIZE_MAX bytes.

6.5.3.4#2 says in part:

    [#2] The sizeof operator yields the size (in bytes) of its
    operand, which may be an expression or the parenthesized name of
    a type. The size is determined from the type of the operand. The
    result is an integer.

Note that there is no indication that the result may be other than the correct size.

Suggested Technical Corrigendum
-------------------------------

One of:

(1) Append to 6.5.3.4#4:

    If the size is too large to fit in an object of type size_t, it
    is converted to that type in the manner described in subclause
    6.3.1.3.

(2) Append to 6.5.3.4#4:

    If the size is too large to fit in an object of type size_t, it
    is replaced by an implementation-defined value.

(3) Add a new constraint paragraph after 6.5.3.4#1:

    [#1a] The sizeof operator shall not be applied to an operand
    whose size, in bytes, is larger than the maximum value of the
    type size_t.

(4) Append to 6.5.3.4#4:

    The implementation shall ensure that the type size_t is large
    enough to hold the result of all uses of the sizeof operator.

[Some of these are less than wonderful, and consideration should also be given to the interaction with VLAs.]

==== UK Defect Report 23 of 2001-09
Subject: jumps into iteration statements

-------
Problem
-------

Consider the code:

    int x = 0;
    goto centre;
    while (++x < 10)
    {
        // Some code
    centre:
        // More code
    }

"Everyone knows" that, when the end of the block is reached, the loop test is evaluated in the normal way. Nevertheless, I can find nothing in the Standard that says so (it is implied by the example in 6.8.6.1#3, but that is all). Note that in:

    int x;
    // ...
    if (condition)
    {
        x = -1;
        goto true_case;
    }
    // ...
    if (x > 0)
    true_case:
        do_something ();
    else
        do_something_else ();

the else case is not executed after a jump to true_case, even though the condition x > 0 is false. Therefore it is not possible to argue from analogy; note also that this latter case is spelled out in the Standard.

Since this technique is well-known, it ought to be well-defined.

Suggested Technical Corrigendum
-------------------------------

Add a new paragraph after 6.8.5#4:

    [#4a] If the loop body is reached by a jump from outside the
    iteration statement, the behaviour is as if the body were
    entered in the normal way. That is, when the end of the body is
    reached the controlling expression is evaluated (and, in the
    case of a for statement, expr-3 is evaluated first) and the body
    re-executed if it is not 0. Similarly, a break or continue
    statement has the appropriate effect. However, the code jumped
    over - including the controlling expressions in the case of a
    while or for statement - is not evaluated when the jump happens.

Possibly also add an example either as 6.8.5#6 or 6.8.6.1#5 (with appropriate editorial changes):

    [#6] EXAMPLE: A jump into a for statement does not execute
    clause-1 at all or expr-2 during the jump:

        int i = 5;
        if (condition)
            goto body;
        for (i = 0; i < 10; i++)
        {
            if (i > 2)
                i++;
        body:
            printf (" %d", i);
        }
        printf ("\n");

    If condition is true, this prints:

        5 7 9

    while if it is false it prints:

        0 1 2 4 6 8 10

==== UK Defect Report 24 of 2001-09
Subject: lacunae in exact-width integer types

-------
Problem
-------

7.18.1.1 reads:

    [#1] The typedef name intN_t designates a signed integer type
    with width N, no padding bits, and a two's complement
    representation. Thus, int8_t denotes a signed integer type with
    a width of exactly 8 bits.

    [#2] The typedef name uintN_t designates an unsigned integer
    type with width N. Thus, uint24_t denotes an unsigned integer
    type with a width of exactly 24 bits.

    [#3] These types are optional.
    However, if an implementation provides integer types with widths
    of 8, 16, 32, or 64 bits, it shall define the corresponding
    typedef names.

The requirements for no padding bits and two's complement were added at a late stage, and the implications to the text weren't fully thought through. In particular:

- the second sentence of #1 is inconsistent with the first;

- the unsigned types should also have the "no padding bits" requirement (it can be derived from the requirement to provide both or neither of these types and the requirement that they have the same size, but it ought to be spelled out);

- the requirements in #3 aren't the same as those in #1, so an implementation can't have 8 bit types *with* padding bits or a sign-and-magnitude representation.

Suggested Technical Corrigendum
-------------------------------

Change this section to read:

    [#1] The typedef name intN_t designates a signed integer type
    with width N, no padding bits, and a two's complement
    representation. Thus, int8_t denotes a signed integer type with
    a width of exactly 8 bits and those other properties.

    [#2] The typedef name uintN_t designates an unsigned integer
    type with width N and no padding bits. Thus, uint24_t denotes an
    unsigned integer type with a width of exactly 24 bits and no
    padding bits.

    [#3] These types are optional. However, if an implementation
    provides integer types with widths of 8, 16, 32, or 64 bits, no
    padding bits, and (for the signed types) that have a two's
    complement representation, it shall define the corresponding
    typedef names.

Or, alternatively:

    [#3] These types are optional. However, if an implementation has
    a type with width 8, 16, 32, or 64 bits that meets the above
    requirements, it shall define the corresponding typedef names.
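The requirements at issue are mechanically checkable. The following sketch (the function name is ours, not the Standard's) assumes a hosted implementation that defines int8_t; note that the two's complement requirement is what forces INT8_MIN to be -128 rather than the -127 a sign-and-magnitude type would have:

```c
#include <limits.h>
#include <stdint.h>

/* Sketch: where int8_t is defined, 7.18.1.1's requirements (width
   exactly 8, no padding bits, two's complement) imply all of the
   checks below. A sign-and-magnitude or ones' complement 8-bit type
   would have a minimum of -127 and so could not serve as int8_t. */
#ifdef INT8_MAX
int int8_meets_dr24_requirements(void)
{
    return INT8_MAX == 127
        && INT8_MIN == -INT8_MAX - 1         /* two's complement */
        && sizeof(int8_t) * CHAR_BIT == 8;   /* no padding bits  */
}
#endif
```

On an implementation without such a type, INT8_MAX is simply not defined and the check does not apply.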
==== UK Defect Report 25 of 2001-09
Subject: wint_t is not the promoted version of wchar_t

-------
Problem
-------

In the fprintf conversion specifier "%lc", the corresponding argument is of type wint_t, but is then treated as if it contained a wchar_t value. In 7.19.6.1#18, the last call is:

    fprintf(stdout, "|%13lc|\n", wstr[5]);

This argument has the type wchar_t. There is no requirement in the Standard that the default argument promotions convert wchar_t to wint_t. Therefore this example exhibits undefined behaviour on some implementations. Nonetheless, the code looks like it ought to work, and WG14 should consider changing the definition of wint_t to force it.

The current definition of wint_t is in 7.24.1#2:

    wint_t

        which is an integer type unchanged by default argument
        promotions that can hold any value corresponding to members
        of the extended character set, as well as at least one value
        that does not correspond to any member of the extended
        character set (see WEOF below);269)

and:

    269) wchar_t and wint_t can be the same integer type.

Three possible solutions are:

(1) Fix the example.
(2) Change the definition of wint_t to be the promoted version of wchar_t.
(3) Change the definition of %lc to take promoted wchar_t rather than wint_t.
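The first solution can be applied today at any call site that passes a wchar_t for %lc; a minimal sketch (the wrapper name format_wide_char is ours, introduced only for illustration):

```c
#include <stdio.h>
#include <wchar.h>

/* Sketch of solution (1): cast the wchar_t argument to wint_t
   explicitly, so the call is well-defined even on implementations
   where the default argument promotions do not convert wchar_t to
   wint_t. The wrapper name is ours. */
int format_wide_char(char *buf, size_t n, wchar_t wc)
{
    return snprintf(buf, n, "%lc", (wint_t) wc);
}
```

For a member of the basic character set such as L'A', this produces the single byte "A" in any locale.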
Suggested Technical Corrigendum 1
---------------------------------

Change the quoted line of 7.19.6.1#18 to:

    fprintf(stdout, "|%13lc|\n", (wint_t) wstr[5]);

Suggested Technical Corrigendum 2
---------------------------------

Change the cited portion of 7.24.1#2 to:

    wint_t

        which is the integer type resulting when the default
        argument promotions are applied to the type wchar_t;269)

    and

Suggested Technical Corrigendum 3
---------------------------------

Change 7.19.6.1#7 and 7.24.2.1#7, l modifier, to:

    l (ell)  Specifies that a following d, i, o, u, x, or X
             conversion specifier applies to a long int or unsigned
             long int argument; that a following n conversion
             specifier applies to a pointer to a long int argument;
             that a following c conversion specifier applies to an
          || argument whose type is that resulting when the default
          || argument conversions are applied to the type
          || wchar_t; that a following s conversion
          || specifier applies to a pointer to a wchar_t argument;
             or has no effect on a following a, A, e, E, f, F, g, or
             G conversion specifier.

Change 7.19.6.1#8, c specifier, second paragraph, to:

    If an l length modifier is present, the argument
 || - whose type is that resulting when the default
 || argument conversions are applied to the type wchar_t -
 || is converted as if by an ls conversion specification with no
    precision and an argument that points to the initial element of
    a two-element array of wchar_t, the first element containing the
 || argument to the lc conversion specification and the second a
    null wide character.

Change 7.24.2.1#8, c specifier, second paragraph, to:

    If an l length modifier is present, the
 || argument is converted to wchar_t and written.

==== UK Defect Report 26 of 2001-09
Subject: lacuna in iswctype and towctrans

-------
Problem
-------

Consider the calls:

    iswctype (c, wctype (property))
    towctrans (c, wctrans (property))

where property is not valid in the current locale.
The wctype and wctrans functions return zero, but the behaviour of iswctype and towctrans is not specified. I believe it would be useful - and considered natural - for them to return 0 ("c does not have this property") and c ("c is unaffected by this mapping") respectively.

Suggested Technical Corrigendum
-------------------------------

Append to 7.25.2.2.1#4:

    If desc is zero, the iswctype function returns zero (false).

Append to 7.25.3.2.1#4:

    If desc is zero, the towctrans function returns the value of wc.

==== UK Defect Report 27 of 2001-09
Subject: type category

-------
Problem
-------

The concept of "type category" is defined but is never used in a useful way; it is also used inconsistently. The term and its cognates appear in only six places:

    6.2.5#24:     defines the term;
    6.2.5#25:     qualified and unqualified versions of types belong
                  to the same category;
    6.2.5#27:     example: (float *) has category "pointer";
    6.2.5#28:     example: (struct tag (*[5])(float)) has category
                  "array";
    footnote 93:  "... removes any type qualifiers from the type
                  category of the expression"
    footnote 137: "The intent is that the type category in a
                  function definition cannot be inherited from a
                  typedef."

Note how the use in footnote 93 conflicts with that in #25, and that the use in footnote 137 remains less than clear. Having an unnecessary term defined leaves the reader confused to no benefit. The term should be removed and the remaining wording changed. Even if the other changes described here are foregone, footnote 93 is in error and should be changed.

Suggested Technical Corrigendum
-------------------------------

Delete 6.2.5#24.

In 6.2.5#25, delete "belong to the same type category and".

In 6.2.5#27, change "Its type category is pointer" to "It is a pointer type".

In 6.2.5#28, change "Its type category is array" to "It is an array type".
In footnote 93 change "which removes any type qualifiers from the type category of the expression" to "which removes any type qualifiers from the outermost component of the type of the expression (for example, it removes const but not volatile from the type int volatile *const)".

In footnote 137 change the first part to:

    The intent is that the fact that the identifier designates a
    function is shown explicitly and cannot be inherited from a
    typedef:

leaving the examples unchanged.

==== UK Defect Report 28 of 2001-09
Subject: meaning of __STDC_ISO_10646__

-------
Problem
-------

6.10.8 reads in part:

    __STDC_ISO_10646__ An integer constant of the form yyyymmL (for
    example, 199712L), intended to indicate that values of type
    wchar_t are the coded representations of the characters defined
    by ISO/IEC 10646, along with all amendments and technical
    corrigenda as of the specified year and month.

Firstly, this wording is less than optimal, in that it could be read as making an implementation non-conforming if wchar_t has a value that does not correspond to an ISO 10646 (Unicode) character. Since Unicode has gaps in the encoding tables, this would mean that no implementation could define this symbol.

Secondly, is this wording meant to put a lower bound on the size of wchar_t, or does the (wchar_t == Unicode) mapping only apply to those values that wchar_t can take ? In other words, if a given version of Unicode defines characters up to U+12345, can WCHAR_MAX be less than 0x12345 on a system that defines this symbol ?

Suggested Technical Corrigendum
-------------------------------

Replace the cited text by:

    __STDC_ISO_10646__ An integer constant of the form yyyymmL (for
    example, 199712L). If this symbol is defined, then every
    character in the "Unicode required set", when stored in an
    object of type wchar_t, has the same value as the short
    identifier of that character.
and then either:

    The "Unicode required set" consists of all the characters that
    are defined by ISO/IEC 10646, along with all amendments and
    technical corrigenda, as of the specified year and month.

if the intent is to put a minimum on the value of WCHAR_MAX, or:

    The "Unicode required set" consists of all the characters that:

    - are defined by ISO/IEC 10646, along with all amendments and
      technical corrigenda, as of the specified year and month; and

    - have short identifiers that lie within the range of values
      that can be represented by the type wchar_t.

==== UK Defect Report 29 of 2001-09
Subject: meaning of "character" in functions

-------
Problem
-------

7.21.2.1#2 defines the operation of memcpy as:

    [#2] The memcpy function copies n characters from the object
    pointed to by s2 into the object pointed to by s1.

7.21.2.3#2 defines the operation of strcpy as:

    [#2] The strcpy function copies the string pointed to by s2
    (including the terminating null character) into the array
    pointed to by s1.

Other functions in 7.21 refer to either a string or a set of characters in the same way. The definition of string is in 7.1.1#1:

    [#1] A string is a contiguous sequence of characters terminated
    by and including the first null character.

and that of character is in 3.7:

    3.7
    [#1] character
    member of a set of elements used for the organization, control,
    or representation of data

    3.7.1
    [#1] character
    single-byte character
    bit representation that fits in a byte

However, none of this makes it clear whether "character" is to be interpreted as having type char, signed char, or unsigned char. This matters because signed char need not have the same sized range of values as unsigned char (for example, SCHAR_MIN could be -127, or on a 10 bit byte system signed chars could have a padding bit, with SCHAR_MAX equal to 255 but UCHAR_MAX equal to 1023). It would be very unfortunate if the mem* functions could not copy every possible byte value.
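The property wanted for the mem* functions can be stated as a round-trip over every byte value; a minimal sketch (the function name is ours) of what the proposed unsigned char interpretation guarantees:

```c
#include <limits.h>
#include <string.h>

/* Sketch: under the proposed wording, memcpy views each byte as an
   unsigned char, so every object representation is a distinct valid
   value and a buffer holding all possible byte values must copy and
   compare exactly. The function name is ours. */
int memcpy_preserves_every_byte_value(void)
{
    unsigned char src[UCHAR_MAX + 1], dst[UCHAR_MAX + 1];
    int i;
    for (i = 0; i <= UCHAR_MAX; i++)
        src[i] = (unsigned char) i;
    memcpy(dst, src, sizeof src);
    return memcmp(dst, src, sizeof src) == 0;
}
```

Were memcpy instead read as working in signed char, the byte values outside the signed range could be trap representations and this round-trip could not be relied on.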
The str* functions probably ought to access the values as if they were plain char.

Suggested Technical Corrigendum
-------------------------------

Append a new paragraph to 7.21.1:

    [#3] Where a block of characters is accessed through a parameter
    of type void *, each character shall be interpreted as if it had
    type unsigned char (and therefore every object representation is
    valid and has a different value). Where it is accessed through a
    parameter of type char *, each character shall be interpreted as
    if it had type char (and therefore, if CHAR_MAX - CHAR_MIN + 1
    is less than UCHAR_MAX, some byte values may be trap
    representations or be treated as equal to other values).

==== UK Defect Report 30 of 2001-09
Subject: bitwise-OR of nothing

-------
Problem
-------

FE_ALL_EXCEPT is defined in 7.6#6 as:

    [#6] The macro FE_ALL_EXCEPT is simply the bitwise OR of all
    floating-point exception macros defined by the implementation.

If no floating-point exception macros are defined, is FE_ALL_EXCEPT:

- required to be defined as zero
- required to be undefined
- unspecified whether it is either of the above ?

[This appears to be the only case of its kind.]

Suggested Technical Corrigendum
-------------------------------

Append to 7.6#6:

    If no such macros are defined, FE_ALL_EXCEPT can either be
    defined as 0 or left undefined.

==== UK Defect Report 31 of 2001-09
Subject: orientation of perror

-------
Problem
-------

The perror function (7.19.10.4) is not listed in 7.19.1 as either a byte input/output function or a wide character output function. I believe it should be the former.

Suggested Technical Corrigendum
-------------------------------

In 7.19.1#5, fourth bullet, insert "perror" after "gets".
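The classification matters because a stream's orientation is fixed by the first byte or wide character operation on it (7.19.2), observable via fwide. The mechanism can be sketched with fprintf, which 7.19.1 does list as a byte output function (the wrapper name is ours; it assumes stderr has not already been given an orientation):

```c
#include <stdio.h>
#include <wchar.h>

/* Sketch: applying a byte output function to a stream with no
   orientation makes it byte oriented; fwide(stream, 0) then reports
   this with a negative return value without changing it. Under the
   corrigendum, perror would fix stderr's orientation the same way.
   The function name is ours; it assumes stderr is not yet wide
   oriented when called. */
int byte_output_sets_byte_orientation(void)
{
    fprintf(stderr, "orientation demo\n");   /* a byte output function */
    return fwide(stderr, 0) < 0;             /* negative: byte oriented */
}
```

Until perror is classified, an implementation is free either to orient stderr on a perror call or to leave its orientation untouched, which is exactly the portability gap the report points out.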
==== UK Defect Report 32 of 2001-09
Subject: declarations within iteration statements

-------
Problem
-------

Consider the code:

    for (enum fred { jim, sheila = 10 } i = jim; i < sheila; i++)
        // loop body

6.8.5#3 reads:

    [#3] The declaration part of a for statement shall only declare
    identifiers for objects having storage class auto or register.

Does this wording forbid the declaration of tag fred - since it is not an object - or is fred not covered by that wording because it is not an object ?

Suggested Technical Corrigendum
-------------------------------

Change 6.8.5#3 to one of:

    [#3] The declaration part of a for statement shall only declare
    identifiers for objects; any object so declared shall have
    storage class auto or register.

or:

    [#3] Any object whose identifier is declared in the declaration
    part of a for statement shall have storage class auto or
    register.

==== UK Defect Report 33 of 2001-09
Subject: lacuna in character encodings

-------
Problem
-------

Defect Report 091 discussed a multibyte character encoding where some single-byte characters are proper prefixes of two-byte characters. For example, single-byte characters have codes 1 to 127 while two-byte characters consist of such a code followed by a code from 128 to 255. At the time WG14 stated that such an encoding was legitimate.

Now 5.2.1.2 states, inter alia:

    -- The basic character set shall be present and each character
       shall be encoded as a single byte.

    -- A byte with all bits zero shall be interpreted as a null
       character independent of shift state.

    -- A byte with all bits zero shall not occur in the second or
       subsequent bytes of a multibyte character.

Nothing in this wording forbids a two-byte character from having a first byte that is zero. By the logic of DR091, just as the sequences 0x12 and 0x12 0x9A are both valid, but different, characters, so would the sequences 0x00 and 0x00 0x9A; the first would be the null character and the second would be something else.
Note that there are no shift states, and so the wording "independent of shift state" is irrelevant. This interpretation is undesirable for obvious reasons, and so it ought to be outlawed.

Suggested Technical Corrigendum
-------------------------------

Replace the current last two bullets with a single one:

    -- A byte with all bits zero shall be interpreted as a null
       character independent of shift state. Such a byte shall not
       occur as part of any other multibyte character.

====
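The part of the guarantee that survives the merge can be observed directly: a lone zero byte must parse as the null character in any multibyte encoding, and mblen reports a null character by returning 0. A minimal sketch (the function name is ours):

```c
#include <stdlib.h>

/* Sketch: 5.2.1.2 guarantees that a byte with all bits zero is the
   null character; mblen reports a null character by returning 0.
   The proposed merged bullet would additionally forbid that byte
   from appearing anywhere inside a longer multibyte character, so
   this result could never depend on the bytes that follow.
   The function name is ours. */
int zero_byte_is_null_character(void)
{
    mblen(NULL, 0);            /* reset any internal conversion state */
    return mblen("", 1) == 0;  /* the all-zero byte is the null character */
}
```

Under the current wording, by contrast, an implementation following the DR091 logic could treat 0x00 as the start of a longer character, and the result of scanning a zero byte would depend on what came after it.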