Public Comment Number PC-UK0118 ISO/IEC CD 9899 (SC22N2620) Public Comment =========================================== Date: 1998-02-25 Author: N.M Maclaren Author Affiliation: Self Postal Address: University of Cambridge, Computer Laboratory, New Museums Site, Pembroke Street, Cambridge CB3 3QG, United Kingdom E-mail Address: Telephone Number: +44 1223 334761 Fax Number: +44 1223 334679 Number of individual comments: 1 Comment 1. Category: Normative change to intent of existing feature Committee Draft subsection: 6.5.3, 6.5.3.1, 6.7.1 Title: Enhancements to restrict for optimisation Detailed description: I think this can be made to work, and will be of immense advantage to many people, but there are still some very serious problems to resolve. As the current proposal stands, I am afraid that the facility will be of limited use for optimisation on a good many modern machines. The four most serious problems that I have seen are the following, and I have included suggestions to improve all of them. 1) Arrays have always been slightly second-class objects in C, though C9X makes them almost into first-class ones, but this is one area that they remain firmly second-class. For example, the term in 6.7.1 Function definitions paragraph 10 "On entry to the function" has caused some confusion in C89, but becomes extremely serious in the context of VLAs and restrict. Inter alia, it means that there is no way to declare an array parameter with bounds that can be relied on or even to guarantee that an apparently array argument is not a null pointer. The main reason this problem is so serious is that modern RISC CPUs are a hundred times as fast as their memory, and an increasing number of such architectures are introducing various forms of operand preloading to cope with this problem. To do this efficiently, a compiler needs to know both the declared size of an array and that it is valid on function entry. This can produce much better code than if it has to delay preloading until an actual reference through the argument, and this sort of optimisation is fairly standard in Fortran. I believe that it is fairly easy to make restrict-qualified array arguments safe for optimisation (though probably not debugging). It could be done for all arguments, but there are almost certainly many programs that rely on the fact that array bounds are not checked in C. Yes, I know about the wording in sections 6.3.6 and elsewhere, but they are not watertight and hence cannot be used for optimisation. 2) There is a horrible problem introduced by 6.5.3 Type qualifiers paragraph 7, and (I am afraid) described incorrectly in 6.5.3.1 Formal definition of restrict example 3. Consider the following variation of that example: extern int * s = NULL; void h (int n, int * const restrict p, const int * const q, const int * const r) { int i; *s = 0; for (i = 0; i < n; ++i) *s += (p[i] = q[i]+r[i]); } double a[5], b[5], c; s = &c; fred(5,a,b,b); s = &b[3]; fred(5,a,b,b); Unless I am much mistaken, this is conforming. It is certainly completely unoptimisable, unless the compiler can work out what the contents of s are. The following example is equally bad on many systems. extern int func(int,int); void h (int n, int * const restrict p, const int * const q, const int * const r) { int i; for (i = 0; i < n; ++i) p[i] = func(q[i],r[i]); } Firstly, these examples show that plastering arguments with const does not really help the compiler very much when it comes to optimisation, because it doesn't prevent update through any other pointer. Secondly, it shows that the restrict qualifier does very little to help with indicating that immutable arrays are actually immutable - all it does say is that a particular restrict-qualified parameter will not update another argument. There are two conditions actually needed for optimisation. One is that a modified object is not accessed through any other pointer, and restrict does this. The other is to know that an unmodified object is immutable during the life of the pointer, and the current specifications of restrict and const don't help much. The latter can be solved by noting that there is no need to forbid multiple accesses to restrict-qualified objects where no accesses modify the object. 3) 6.5.3.1 Formal definition of restrict paragraph 4 refers to "the array object that is determined dynamically by all references". Precisely what does this mean? Consider the following code fragment - for simplicity, assume that 'sizeof(double) > 1': void fred (restrict double a[3], restrict double b[3]) { ((char *)b)[sizeof(double)+2] = ((char *)a)[sizeof(double)+1]; a[0] = b[2]; } double array[3]; fred(array,array); Obviously, I can produce far more evil examples, including ones where the definition of the array objects depends on implementation-defined or unspecified (but not undefined) behaviour. I am afraid that a precise definition is needed. My proposal STILL does not handle cases like the following very well: void fred (restrict char *a, char *c) { double *b = a; ((char *)b)[5] = 0; *c = 1; } double z[10]; fred(z,&a[6]); But that is thoroughly perverse. Yes, I know that it is both legal and common, but my view is that such code is beyond hope and I don't care if it runs like a drain. The best that we can hope for is to prevent it from causing syntactic or semantic problems, which the current wording does not. 4) Consider the following: extern volatile int fred; void joe (const int * const volatile restrict a) {...} joe(&fred); Unless I have lost my marbles, that destroys the whole point of restrict. The question is whether it is worth bothering about. My view is probably not, on the grounds that C does not protect the fool from his folly, but it is debatable. It could be locked out very easily. The following are my suggestions to improve these problems. I don't guarantee that they are watertight, but they are as good as I can make them. 6.5.3 Type qualifiers. The constraint is a change from the approach taken in C89, and one that prevents several very important optimisations. So paragraph 2 should be changed to: 2 Types other than array types in function declarations or pointer types derived from object or incomplete types shall not be restrict qualified. If the combination of restrict and volatile were to be locked out, the following could be added: No type shall be both restrict and volatile qualified. The following should be added to the end of the second sentence of paragraph 7: ..., except in the case where no reference modifies any part of the object. 6.5.3.1 Formal definition of restrict We need to produce a precise definition of the object A. Also, in order to allow preloading, we need to make array bounds binding. There is some wording in 6.3.6 Additive operators paragraph 8, but I don't think that it is sufficient for this purpose. So, add after paragraph 3: The array object A determined dynamically by references though a pointer object P of type T * is defined in the following way. Firstly ignore all qualifications in T * and, if T * then becomes void *, treat P as if T * were unsigned char *. Then, consider all pointer expressions E based on P evaluated during the execution of B, convert them to unsigned char *, and denote the smallest such value by Q and the largest R. If an object is accessed, then Q and R are calculated as if each byte of the object and the byte immediately following the end of the object were addressed by separate pointer expressions of type unsigned char *. Lastly, let M and N be the largest and smallest integers, such that (unsigned char *)&P[M] <= (unsigned char *)Q and (unsigned char *)&P[N] >= (unsigned char *)R. If either M or N is not defined because one of (unsigned char *)&P[M] and (unsigned char *)&P[N]-1 would be outside the limits of the originally allocated object into which P points, the behaviour is undefined. If P is declared as a pointer parameter, A is the array with element type T and size N-M whose first element has address &P[M]. If P is declared as an incomplete array parameter and M is less than 0, the behaviour is undefined; otherwise A is the array with element type T and size N whose first element has address &P[0]. If P is declared as a a complete array parameter (whether variably modified or not) of size S and M is less than 0 or N is greater than S, the behaviour is undefined; otherwise A is the array with element type T and size S whose first element has address &P[0]. The second sentence of paragraph 4 should be changed to: Then either all references to values of A shall be through pointer expressions based on P, or no reference to A (through expressions based on P or otherwise) shall modify any part of its contents during the execution of B. Example 3 should then become: void h (int n, int * p, const int * restrict q, const int * restrict r) { int i; for (i = 0; i < n; ++i) p[i] = q[i]+r[i]; } shows how const can be used in conjunction with restrict. The const qualifiers imply, without the need to examine the body of h, that the values of any object accessed through q and r cannot change during the execution of h. This is the precise assertion needed to optimise the loop. Note that p does not need to be restrict qualified in this example, but may allow better optimisation in a more complicated function. 6.7.1 Function definitions The other requirement for preloading is to know that an argument is not a null pointer. The following should be added after paragraph 9: 9 Upon entry to a function, if the actual argument corresponding to a restrict qualified array parameter is a null pointer, the behaviour is undefined.