SC22/WG14 N783

Significant outstanding issues

Clive D.W. Feather
clive@demon.net
1997-10-20


Abstract
========

This paper is assembled from those elements of N720, N735, and N739
that involve significant outstanding issues in the Standard. Items are
given a serial number in this paper, but also carry a note stating their
origin. Items taken from N720 do not have a rationale; the related DR
explains the issues.

Change bars are included where part of a large piece of text is
changed, but not in some items where nearly all the quoted text is
changed.

References are relative to Draft 11 pre 3.


Specific items
==============

Item 1 [Was N720 DR 166]
-----------------

Many constraints refer to lvalues, yet the current definition can make
it impossible to tell if something is an lvalue until runtime.
[6.3.2.4, 6.3.3.1, 6.3.16 are mentioned; 6.3.3.2 has already been
addressed.]

[Wording needed]

Item 2 [Was N720 DR 172]
-----------------

There are a number of defects in the rules for pointer comparison.
These should be fixed. Suitable wording is provided in the original DR.

Item 3 [Was N739 item 11]
------------------

In subclause 6.5.2.1, change paragraph 3 from:

    The expression that specifies the width of a bit-field shall be
    an integral constant expression that has nonnegative value that
    shall not exceed the number of bits in an ordinary object of
    compatible type. If the value is zero, the declaration shall have
    no declarator.

to:

    The expression that specifies the width of a bit-field shall be
    an integral constant expression that has nonnegative value that
|   shall not exceed the number of bits in an object of the type
|   that would be specified if the colon and expression had been
|   omitted. If the value is zero, the declaration shall have no
    declarator.

The current wording doesn't say *what* the type is compatible with.
Item 4 [Was N739 item 12]
------------------

Subclause 6.5.2.2 allows an enumerated type (say /enum e/) to be
compatible with /long/ or even /unsigned long long/. On the other hand,
subclause 6.2.1.1 states that the type converts to /int/ or
/unsigned int/ as part of the integral promotions. This produces the
apparent contradiction that two compatible types promote differently !

There are two alternative approaches to solving this.

(A) Change subclause 6.5.2.2 paragraph 4 from:

    Each enumerated type shall be compatible with an integer type. The
    choice of type is implementation-defined, but shall be capable of
    representing the values of all the members of the enumeration.

to:

|   Each enumerated type shall be compatible with one of the following
|   types:
|       signed char         unsigned char
|       signed short        unsigned short
|       signed int          unsigned int
    The choice of type is implementation-defined, but shall be capable
    of representing the values of all the members of the enumeration.

(B) Change subclause 6.2.1.1 paragraph 1 from:

    A /char/, a /short int/, or an /int/ bit-field, or their signed or
    unsigned versions, or an enumeration type, may be used in an
    expression wherever an /int/ or /unsigned int/ may be used. If an
    /int/ can represent all values of the original type, the value is
    converted to an /int/; otherwise, it is converted to an
    /unsigned int/. These are called the /integral promotions/.[37]
    All other arithmetic types are unchanged by the integral
    promotions.

to:

    A /char/, a /short int/, or an /int/ bit-field, or their signed or
|   unsigned versions, may be used in an expression wherever an /int/ or
    /unsigned int/ may be used. If an /int/ can represent all values of
    the original type, the value is converted to an /int/; otherwise,
|   it is converted to an /unsigned int/. These are called the
|   /integral promotions/.[37]
|   An enumeration type may be used in an expression wherever the type
|   that it is compatible with may be used.
|   The integral promotions cause the value to be converted in the same
|   way as that compatible type would be. All other arithmetic types are
    unchanged by the integral promotions.

and in subclause 6.5.2.2, change the first sentence of paragraph 4 from:

    Each enumerated type shall be compatible with an integer type.

to:

|   Each enumerated type shall be compatible with some signed or
|   unsigned integral type.

[At present, enumerated types *are* integer types; the intent is to make
them clearly compatible with one of the 10 types named in 6.1.2.5.]

Item 5 [Was N720 DRs 072, 073 and 178]
-------------------------------

These DRs leave the issue of the "struct hack" totally confused. One
way out may be to explicitly bless the following:

    struct hack
    {
        /* other members */
        T last [];  /* Last member may be an indeterminate size array */
    };

sizeof (struct hack) would equal offsetof (struct hack, last). The
notation is an explicit warning that last will be accessed as a VLA
within malloced memory.

In any case, wording will be required.

Item 6 [Was N720 DR 142]
-----------------

The Technical Corrigendum given in the DR misses the point. The words
"unless explicitly stated otherwise" aren't needed, because they are
implicit in any reading of the Standard, but in any case they don't
solve the original problem. What the DR asked about is using #undef
with reserved identifiers, something which is currently strictly
conforming.

The following change (suggested in the DR) is necessary to allow an
implementation to make use of flag macros such as _INCLUDED_STDIO_H.
Append to 7.1.3:

    If the program removes (with #undef) any macro definition of an
    identifier in the first group listed above, the behaviour is
    undefined.

Item 7 [Was N735 item 2]
-----------------

There was a long discussion some time ago about the following code:

    printf ("%n foo %n", &i, &i);

and whether it is strictly conforming.
I would suggest that we need the following somewhere in 7.1 (either as
a new 7.1.9, or added to 7.1.8 after paragraph 2):

    [1] Except where explicitly stated, there are no sequence points
    during the evaluation of a library function. Where a function's
    action is described in sequential terms, or one function is defined
    in terms of calls to another, this is for the purpose of describing
    the final effect, and does not require the events to actually occur
    in that order, or for an actual call to the other function to occur.

    [2] Nevertheless, there is a sequence point immediately before the
    function is called (as specified by subclause 6.3.2.2), and
    immediately before it returns.

    [3] Example

        The call:

            int i;
            (printf) ("%n %n", &i, &i)

        invokes undefined behaviour, because it assigns to i twice
        between the same pair of sequence points. Even though printf is
        defined in terms of calls to putc(), it is not required for such
        a call actually to occur, nor for there to be a sequence point
        before and after outputting the space.

There was discussion on this item at London, but no resolution.

Item 8 [Was N735 item 10]
------------------

Locales are currently treated as extremely opaque. It is not possible
to determine whether two locales are equivalent in a category. It is
not even sensible to compare locale strings for equality; the string
returned need not be the same as the string passed in, even if it was
also the string returned from a previous call. That is:

    char *loc;
    char copy_loc [LARGE_ENOUGH];

    loc = setlocale (LC_COLLATE, "C");
    if (strcmp (loc, "C") != 0)
        do_something ();        // This can happen

    assert (strlen (loc) < LARGE_ENOUGH);
    strcpy (copy_loc, loc);
    loc = setlocale (LC_COLLATE, copy_loc);
    if (strcmp (loc, copy_loc) != 0)
        do_something ();        // This can happen

I realize that most systems store most locales in files, and therefore
comparing for functional equality is not as simple as it might seem.
However, I would recommend the following as a minimum:

(1) Add to 7.5.1.1 (setlocale()) paragraph 8:

        Furthermore, if this string value is passed to the setlocale
        function with the same category, the result shall be the same
        string value.

(2) Add either a function to compare two locale strings for functional
    equivalence in a category, or a function to compare a locale string
    with the current locale in a category. Functional equivalence is
    defined as:

        No behaviour defined in clause 7, other than the result of the
        setlocale function, changes as a result of changing the locale.

    Note that "strictly conforming" is not a good term to use in any
    comparison.

Item 9 [Was N735 item 11]
------------------

The localeconv() function discusses monetary and non-monetary
formatting, especially the former, but provides no easy way to
implement it. The natural place to do this is the printf() family of
functions. Therefore add to 7.13.6.1 (fprintf()):

    Flag , (comma): for d, i, o, u, x, X, f, F, e, E, g, G, a, and A
    conversions, the output shall be grouped in accordance with the
    /thousands_sep/ and /grouping/ fields of the locale. For other
    conversions, the behaviour is undefined.

    Format or flag $ (dollar): [It is unclear whether this is better as
    a flag or a format.] Generate a formatted monetary quantity. If it
    is a format, the argument is a double (or long double if L is
    included). The plus and space flags act as if the output already
    included a sign (even if it does not). The # flag specifies
    international formatting. The minus and zero flags can be used. If
    no precision is specified, the value of /frac_digits/ or
    /int_frac_digits/ from the current locale is used; if that is
    CHAR_MAX, the precision is unspecified. [If it is a flag, this
    would overrule the normal meaning of the precision.]

Issues:

[comma] Should there be a mechanism to allow the grouping to depend on
the format (e.g. decimal output grouped in threes, hex output grouped in
fours) ?
I am informed that there are circumstances where the /thousands_sep/
character is different for each grouping. For example, a notation
commonly used in Japan (particularly in newspapers) places characters
meaning "myriad", "hundred million", "billion" and so on between the
groups. This would require changing the separator to be a list of
strings, and providing a convention to indicate this (for example,
using CHAR_MAX as the first byte of the string).

[dollar] I've used the normal rule that the specified precision
overrides the default. An alternative would be that the precision
applies only if the locale-specified value is CHAR_MAX. Which is
preferable, or should there be a way to choose ?

If $ is a flag and is used with %d, should it scale the value to the
appropriate number of fractional digits ? For example, "%$6.2d" might
indicate that the integer is to be printed in /ddd.dd/ form, with 12345
being printed as "123.45". Should %$d and %$i behave differently in
this case ?

If $ is a format, should there be an equivalent for integral types ?

Since this proposal was drafted, it has been pointed out to me that any
use of $ will conflict with the X/Open mechanisms, which use
descriptors of the form "%1$d", "$*2$3$d", and "%*6$.*5$4$d".

Item 10 [Was N735 item 14]
------------------

The Standard is somewhat unclear about the details of stdio buffering.
For example, considering output (the analogous situation happens with
input), a call to fputc() can have one of the following effects:

(1) the character is sent to the underlying system;
(2) the character is written to a buffer;
(3) the character is written to a buffer and then a number of
    characters are sent to the underlying system from the buffer;
(4) a number of characters are sent to the underlying system from a
    buffer, and then the character is written to the buffer.

In case (1), failure can be reported in a straightforward manner, and
it can be assumed that case (2) never fails.
The question is: what will happen if cases (3) or (4) have a failure
during the output, but not directly as a result of that character (that
is, the error occurs earlier on in the buffer) ?

The present wording of the Standard implies that an error in outputting
a character can only be reported on that call to fputc(), and not on
any subsequent call. This needs to be changed, or buffering becomes a
nonsense: the implementation would be required to *predict* whether a
write will succeed.

A suitable location is 7.13.3, and the wording needs to say something
along the following lines:

    If output is buffered, then it may be transmitted to the host
    environment at any subsequent call to fputc(), and shall be
    transmitted no later than the next fflush() call or when the stream
    is closed. Thus a call to fputc() may fail and set the error
    indicator on the stream because of the earlier output. Similarly, if
    input is buffered, a call to fgetc() may cause the error indicator
    to be set even though the same call on an unbuffered stream would
    not (because the error is associated with a later character in the
    input).

    Even if the data is successfully transmitted to the host
    environment, it is possible for an error to occur within the latter.
    If this happens after the stream has been closed, it cannot be
    reported to the application; if it occurs earlier, it is
    implementation-defined when it is so reported.

A secondary issue is: can the buffer be sent to the underlying system
other than within a call to fputc(); is asynchronous I/O permitted ? If
so, then:

    When a stream is buffered, characters may be transmitted to or from
    the host environment other than as part of a library function, and
    thus the error indicator for the stream may be set outside such a
    function (the indicator can only be cleared as part of a function
    that explicitly states it does so).
Item 11 [Was N735 item 15]
------------------

Is there a need to provide a way to make the three standard streams
binary, in the same way that they can already be made wide ? Without
it, there's no strictly-conforming way to write "cat". Even with it
there is the trailing zero byte problem (a binary stream may have an
implementation-defined number of null characters appended to it).

Item 12 [Was N735 item 16]
------------------

There is no way to determine whether two fpos_t values represent the
same position in a file. Therefore, it is not possible to do the
following:

    open a file
    read through it, looking for some mark
    note the position using fgetpos()
    rewind
    read through it again to the same position, using calls to
        fgetpos() to determine where you are, rather than recalculating
        it

I suggest the following function be added to subclause 7.13.10:

    struct fcmppos fcmppos (fpos_t *a, fpos_t *b, FILE *stream)

    Compares two fpos_t values that refer to the given stream; if either
    argument is a null pointer, the result of a call to fgetpos() on the
    stream is used instead. The resulting structure contains at least
    the following fields:

        int before;     // Less than, equal to, or greater than zero
                        // according to whether /a/ is before, at the
                        // same location as, or after /b/ in the file.
        int mbstate;    // Zero if the two positions have the same
                        // multibyte parsing status.

    If the stream has been written to at any point before the later of
    the two positions, the behaviour is undefined.

Item 13 [Was N735 item 19]
------------------

The specification of the comparison functions for bsearch() and qsort()
(7.14.5.1 and 7.14.5.2) is insufficient for them to be coded safely.
In particular, it does not address the following issues.

(1) Are the pointers to objects within the base array (or the key
    object), or can they be to copies ?
(2) Can the comparison alter the values of the pointed-to objects ?
(3) If so, does the alteration persist ?
(4) What are the requirements on the consistency of the comparison
    results ?
I propose that comparisons are not allowed to alter the values, and
therefore that the implementation can pass pointers to copies of the
objects. [This, of course, invalidates an item in one of my articles in
CUJ :-]

Therefore add the following immediately after the heading of 7.14.5
(there is currently no text between that and the heading of 7.14.5.1).

    [1] These utilities make use of a comparison function. This shall
    behave in the following way.

    [2] The implementation shall ensure that the second argument (when
    called from /bsearch/), or both arguments (when called from
    /qsort/), shall be pointers to an element of the array, or to a copy
    of such an element. The first argument when called from /bsearch/
    shall equal /key/. The function shall make its comparison based on
    the pointed-to objects, and not the specific addresses passed to it.

    [3] The comparison function shall not alter the contents of the
    array. The implementation may reorder elements of the array between
    calls to the comparison function, but shall not alter the contents
    of any individual element.

    [4] When the same object (consisting of /size/ bytes, irrespective
    of its current position in the array) is passed more than once to
    the comparison function, the results shall be consistent with one
    another. That is, for /qsort/ they shall define a total ordering on
    the array, and for /bsearch/ the same object shall always compare
    the same way with the key.

    [5] A sequence point occurs immediately before and immediately after
    each call to the comparison function, and also between any call to
    the comparison function and any movement of the objects passed as
    arguments to that call.

If it is felt desirable that the pointers *shall* always point into the
array, then replace paragraph [2] above by:

    [2] The implementation shall ensure that the second argument (when
    called from /bsearch/), or both arguments (when called from
    /qsort/), shall be pointers to elements of the array [*].
    The first argument when called from /bsearch/ shall equal /key/.

    [*] That is, if the value passed is /p/, then the following
    expressions are always non-zero:

        ((char *) p - (char *) base) % size == 0
        (char *) p >= (char *) base
        (char *) p <  (char *) base + nmemb * size

Item 14 [Was N720 DR 063]
-----------------

What is the required precision of floating point calculations ?

Item 15 [Was N720 DR 087]
-----------------

The issue of sequence points, parallel evaluation, and so on still
needs to be faced squarely. It isn't easy [example: x = f (x++)].

Item 16 [Was N739 item 1]
-----------------

The term "access" is not well defined. From context, it sometimes
appears to mean "read the value", and sometimes "read or write the
value". This ambiguity sometimes makes it hard to understand what is
actually meant. There needs to be a definition in clause 3, and all
uses of the term need to be checked for the read-only / read-write
problem. Probably the best approach is to define it as "read or write",
and to find and fix the places where "read" is meant.

An example of the "read" usage is 6.3.2.3 paragraph 5:

    With one exception, if a member of a union object is accessed after
    a value has been stored in a different member of the object, the
    behaviour is implementation-defined.

where writing is clearly meant to be excluded.

An example of the "read or write" usage is 6.3 paragraph 6:

    ... If a value is stored into an object ... the type of the lvalue
    becomes the effective type of the object for that access and for
    subsequent accesses ...

where writing is clearly meant to be included.

An example where this causes problems with interpreting the Standard is
6.5.3. Paragraph 11 reads:

    A reference to a value means either an access to or a modification
    of the value.

So "access" presumably means read, but not write. But then paragraph 6
reads:

    What constitutes an access to an object that has volatile-qualified
    type is implementation-defined.
So what constitutes a write to a volatile object is *not*
implementation-defined ?

There are other instances; this is the first one that comes to mind.