SC22/WG14 N764
J11/97/128

Issues about time

Clive D.W. Feather
clive@demon.net
1997-09-22


Abstract
--------

N735 contained a number of items (20 to 25) concerning the time-related
functions. In addition, N733 added four new conversion specifiers to
strftime(). Since discussion in the ISO-8601 community shows that some of
these changes were flawed, and since all these items are related, this
paper attempts to address all these issues at once.


Discussion - ISO 8601 weeks
---------------------------

ISO 8601 specifies the concept of a week number in the year and the day
within the week. Weeks always begin on a Monday, so, for example,
Wednesday 8th January 1997 is the third day of week two of 1997 which, in
ISO format, is "1997-W02-3" or "1997W023". The first week of the year is
specified to be that containing January 4th or, equivalently, that
containing the first Thursday of January.

However, ISO 8601 does not explicitly show how to indicate days in January
before week 1 of the year, or days in December that fall in week 1 of the
next year. For example, in 1999 week 1 starts on Monday 4th January, and
so there is an issue as to how to express January 1st to 3rd; similarly,
in 1998 week 1 includes Thursday 1st January, and so there is an issue as
to how to express the last three days of 1997.

The changes in N733 assumed that dates always belong to the current year.
However, current practice among users of ISO 8601 is to give every day of
a week the same week and year number. Thus we see the following:

    Date          N733          Current practice
    1998-12-31    1998-W53-4    1998-W53-4
    1999-01-01    1999-W00-5    1998-W53-5
    1999-01-02    1999-W00-6    1998-W53-6
    1999-01-03    1999-W00-7    1998-W53-7
    1999-01-04    1999-W01-1    1999-W01-1

    1997-12-28    1997-W52-7    1997-W52-7
    1997-12-29    1997-W53-1    1998-W01-1
    1997-12-30    1997-W53-2    1998-W01-2
    1997-12-31    1997-W53-3    1998-W01-3
    1998-01-01    1998-W01-4    1998-W01-4

Current practice also uses different letters for the specifiers.
Given this, the changes of N733 should be altered.


Proposals
---------

Part A
------

In subclause 7.16.3.5 (strftime()), paragraph 3 (the list of specifiers):

* Change the item %f to be %u (wording unaltered).

* Change the wording of %V to be:

    %V is replaced by the ISO 8601 week number (see below) as a decimal
       number (01-53).

* Add the following items:

    %g is replaced by the last 2 digits of the week-based year (see below)
       as a decimal number (00-99).
    %G is replaced by the week-based year (see below) as a decimal number
       (e.g. 1997).

* Add the following text at the end of the list:

    %g, %G, and %V give values according to the ISO 8601 week-based year.
    In this system, weeks begin on a Monday and week 1 of the year is the
    week that includes both January 4th and the first Thursday of the
    year. If the first Monday of January is the 2nd, 3rd, or 4th, the
    preceding days are part of the last week of the preceding year; thus
    Saturday 2nd January 1999 will have %G == 1998 and %V == 53. If
    December 29th, 30th, or 31st is a Monday, it and any following days
    are part of week 1 of the following year. Thus Tuesday 30th December
    1997 will have %G == 1998 and %V == 1.


Part B
------

In subclause 7.16.3.5 (strftime()), paragraph 3 (the list of specifiers),
change the wording of the following items to avoid confusion (e.g. 2000 is
in the 20th century but 2001 is in the 21st):

    %y is replaced by the last 2 digits of the year as a decimal number
       (00-99).
    %Y is replaced by the whole year as a decimal number (e.g. 1997).


Part C
------

[Was N735 item 23]

In subclause 7.16.1, change the range of tm_sec to [0,60] and remove
footnote 243. See various WG14 mailing list items (e.g. 3482) or:

  # The International Earth Rotation Service periodically uses leap seconds
  # to keep UTC to within 0.9 s of UT1 (which measures the Earth's
  # rotation); see
  # Terry J Quinn, The BIPM and the accurate measure of time,
  # Proc IEEE 79, 7 (July 1991), 894-905.

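
The week-based year rules proposed in Part A can be illustrated with a
short sketch. This is not proposed wording and not any implementation's
code: the type struct iso_week and the function iso_week_of() are names
invented here, and the sketch assumes that tm_year, tm_wday, and tm_yday
are in range and consistent with one another.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical result type: the values %G, %V and %u would print. */
struct iso_week {
    int year;   /* week-based year (%G) */
    int week;   /* week number, 1..53 (%V) */
    int wday;   /* day of week, Monday = 1 .. Sunday = 7 (%u) */
};

static int days_in_year(int year)
{
    int leap = (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
    return leap ? 366 : 365;
}

/* Uses only tm_year, tm_wday and tm_yday, which are assumed consistent. */
struct iso_week iso_week_of(const struct tm *tm)
{
    struct iso_week r;
    int wday0, n;

    assert(tm != NULL);
    wday0 = (tm->tm_wday + 6) % 7;   /* Monday = 0 .. Sunday = 6 */
    r.year = tm->tm_year + 1900;
    r.wday = wday0 + 1;

    /* Week containing this day, counted within the calendar year. */
    r.week = (tm->tm_yday - wday0 + 10) / 7;

    if (r.week < 1) {
        /* Early January day in the last week of the preceding year. */
        r.year -= 1;
        r.week = (tm->tm_yday + days_in_year(r.year) - wday0 + 10) / 7;
    } else {
        /* Late December day that may already be in week 1 of next year. */
        n = tm->tm_yday - days_in_year(r.year) - wday0 + 10;
        if (n >= 7) {
            r.year += 1;
            r.week = 1;
        }
    }
    return r;
}
```

Under these assumptions, Saturday 2nd January 1999 (tm_year 99, tm_yday 1,
tm_wday 6) gives year 1998 and week 53, and Tuesday 30th December 1997
(tm_year 97, tm_yday 363, tm_wday 2) gives year 1998 and week 1, matching
the examples in the proposed text.
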

Part D
------

[Was N735 item 20]

Some specifiers of strftime() generate a number, and this usually has a
known width and is zero filled (though %Y is potentially an exception).
However, the other specifiers generate items of an unknown width. While it
is possible to expand the value into a string and then manipulate it, this
is an inconvenient approach.

In subclause 7.16.3.5 (strftime()), paragraph 2, change:

    A conversion specifier consists of a % character followed by a
    character that determines the behavior of the conversion specifier.

to:

    A conversion specifier consists of a % character followed by a
    character that determines the behavior of the conversion specifier,
    possibly separated by various modifiers.

Add the following paragraph between paragraphs 2 and 3:

    Any or all of the following may occur between the % character and the
    letter of a conversion specifier (they are not permitted for %%).
    Those that appear must be in this order:

    * A minus sign, indicating that any padding (see below) is to be on
      the right, not the left.

    * A field width, as a decimal integer. If the replacement string has
      fewer characters than the field width, it is padded with spaces on
      the left (right if a minus sign was used).

    * A dot followed by a precision, as a decimal integer:
      * if the specifier produces a decimal number which contains more
        characters than the precision, then sufficient leading zeros (if
        available) are removed from the replacement string until the
        precision is reached;
      * otherwise, if the replacement string contains more characters than
        the precision, then only that number of characters are placed in
        the array, taken from the left end of the replacement string.


Part E
------

[Was N735 item 22]

The only facility for generating the time zone is a locale-specific
specifier (%Z) in strftime(). However, zone names are not standardised,
and there are two common numeric formats which give the offset from UTC:
ISO 8601 and Internet common practice.
Both use the notation "+0830" to mean an offset of 8 hours 30 minutes, but
the signs differ: ISO 8601 uses + for east of Greenwich, while Internet
common practice uses it for west of Greenwich.

Add the following conversion specifiers to subclause 7.16.3.5 (strftime())
paragraph 3:

    %o is replaced by the offset from UTC in the form "+0830" (meaning
       8 hours 30 minutes behind). This format is common on the Internet.
    %O is replaced by the offset from UTC in the form "-0830" (meaning
       8 hours 30 minutes behind). This is the ISO 8601 format.


Part F
------

[Was N735 item 24]

Subclause 7.16.3.5 (strftime()) is unclear on how the values of the
members of /timeptr/ affect the result, especially if they are outside the
normal range.

Add one of the following sets of wording, in each case after paragraph 4:

Option [Fa]:

    If the value of any member of the structure pointed to by /timeptr/ is
    out of the normal range, or the values are not consistent with one
    another [*], the behaviour is undefined.

    [*] For example, the contents represent "30th Feb", "29th Feb 1997",
    or "Monday 10th May 1997".

Option [Fb]:

    If the value of any member of the structure pointed to by /timeptr/ is
    out of the normal range, or the values are not consistent with one
    another [*], the value returned and the contents of the array are
    unspecified.

    [*] For example, the contents represent "30th Feb", "29th Feb 1997",
    or "Monday 10th May 1997".

Option [Fc]:

    The characters placed in the array by each conversion specifier depend
    on a member of the structure pointed to by /timeptr/, as specified in
    brackets in the description. If this value is outside the normal
    range, the characters stored are unspecified.

If option [Fc] is taken, add the following to each specifier in
paragraph 3:

    %a [tm_wday]
    %A [tm_wday]
    %b [tm_mon]
    %B [tm_mon]
    %c [all specified in 7.16.1]
    %d [tm_mday]
    %H [tm_hour]
    %I [tm_hour]
    %j [tm_yday]
    %m [tm_mon]
    %M [tm_min]
    %p [tm_hour]
    %S [tm_sec]
    %U [tm_year, tm_wday, tm_yday]
    %w [tm_wday]
    %W [tm_year, tm_wday, tm_yday]
    %x [all specified in 7.16.1]
    %X [all specified in 7.16.1]
    %y [tm_year]
    %Y [tm_year]
    %Z [tm_isdst]

If part A is accepted, add:

    %g [tm_year, tm_wday, tm_yday]
    %G [tm_year, tm_wday, tm_yday]
    %u [tm_wday]
    %V [tm_year, tm_wday, tm_yday]

If part E is accepted, add:

    %o [tm_isdst]
    %O [tm_isdst]

If part H is accepted, then %o, %O, and %Z become:

    %o [tm_utcoffset, tm_isdst, tm_xisdst]
    %O [tm_utcoffset, tm_isdst, tm_xisdst]
    %Z [tm_utcoffset, tm_isdst, tm_xisdst]


Part G
------

[Was N735 item 25]

Those conversion specifiers in subclause 7.16.3.5 (strftime()) that
generate variable strings should have values specified for the C locale.

Add at the end of the subclause:

    In the C locale the replacement strings for the following specifiers
    are:

    %a the first three characters of %A
    %A one of "Sunday", "Monday", ..., "Saturday"
    %b the first three characters of %B
    %B one of "January", "February", ..., "December"
    %c equivalent to "%A %B %d %T %Y"
    %p one of "am" or "pm"
    %x equivalent to "%A %B %d %Y"
    %X equivalent to "%T"
    %Z implementation-defined


Part H
------

[Was N735 item 21]

The conversion carried out by localtime() does not provide any way of
determining the time zone used, and the normalization done by mktime()
does not specify how DST changes are handled. Similarly, many systems are
now aware of leap seconds, but the Standard is not clear on how these are
to be handled.

Adding this information is not trivial, because there is no obvious way to
extend /struct tm/ in a compatible manner. This proposal therefore
contains a kludge.

[The following is not final wording, as I wanted to see agreement on the
semantics before trying to craft them.
Given that time is short, I will attempt to produce final wording if I
have the opportunity.]

Add the following fields to struct tm:

    int tm_version;    /* version number of the structure layout */
    int tm_utcoffset;  /* offset from UTC in minutes - [-1439, +1439] */
    int tm_leapsecs;   /* leap seconds applied */
    int tm_xisdst;     /* daylight saving time flag - [-1, +1439] */

and add the following macros to <time.h>, all constant integral
expressions capable of being stored in an object of type int:

    _EXTENDED_TM
    _NO_LEAP_SECONDS
    _LOCALTIME

The gmtime() function shall set tm_utcoffset to 0, while the localtime()
function shall set it according to the local time zone, including any DST
corrections; a positive value for tm_utcoffset indicates ahead of UTC, so
that PDT is represented by -420. If the implementation is unable to
determine the local zone, localtime() shall set this field to _LOCALTIME
and gmtime() shall fail.

Both functions shall set tm_isdst to represent whether DST is (believed to
be) in effect at the represented time, and tm_xisdst to -1, 0, or the
(positive) size of the DST offset, in minutes, according to whether
tm_isdst is less than, equal to, or greater than zero.

Both functions shall set tm_leapsecs to indicate the number of leap
seconds that have been applied to the resulting value (if tm_sec == 60,
the relevant leap second is *not* included in the count). If the
implementation is not aware of leap seconds, it shall set tm_leapsecs to
_NO_LEAP_SECONDS.

Both functions shall set tm_version to 1.

The mktime() function shall behave as follows. If the tm_isdst field is
equal to _EXTENDED_TM, then the tm_version field shall be 1. The broken
down time is normalized according to the following rules, and also
converted to a time_t representation. If the call is successful, a second
call to mktime() with the resulting struct tm value shall always leave it
unchanged and return the same value as the first call.
If the call is successful and the normalized time is exactly representable
as a time_t value, then the normalized broken-down time, and the
broken-down time generated by converting the result of mktime() as if by a
call to localtime(), shall be identical except that, if the tm_isdst
member of the former originally had the value _EXTENDED_TM, it shall
remain unchanged.

A time is normalized according to the following rules. The principle
behind normalization is that the date is converted to a number of seconds
past some epoch, and then converted back to the correct normalized form.

If the tm_isdst member does not equal _EXTENDED_TM, then the rules shall
be applied as if:

- tm_leapsecs is _NO_LEAP_SECONDS;
- tm_utcoffset is _LOCALTIME;
- tm_xisdst is -1, 0, or +60 according to whether tm_isdst is less than,
  equal to, or greater than zero.

All dates are in the Gregorian calendar. Thus a value of -800 for tm_year
represents 1100 CE, while a value of -2000 represents -100 CE (101 BCE);
neither is a leap year, while -2300 (-400 CE, 401 BCE) is.

The value of tm_leapsecs is the number of leap seconds applied (the value
of UTC-UT0) at the represented time. It should therefore be added to the
value determined by (days*86400 + hours*3600 + mins*60 + seconds). If the
value is _NO_LEAP_SECONDS, then the implementation should determine the
correct number if it can, and use 0 otherwise.

The value of tm_utcoffset is a number of minutes to be subtracted from the
time to convert it to UTC. The value _LOCALTIME is a request for the
implementation to determine this; if it is unknown, it should assume that
local time is UTC plus any DST offset determined from tm_xisdst.

If tm_mon is outside the range [0, 11], it shall be converted to that
range by adding or subtracting a multiple of 12 and adjusting the year
accordingly. This shall then be used to determine the number of days in
the year prior to the month. Thus tm_year == 97 and tm_mon == -8
represents May of 1996, a leap year.
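
The tm_mon rule above can be sketched as follows. The function names
normalize_month() and days_before_month() are hypothetical and appear
nowhere in the Standard or in this proposal; the sketch only illustrates
folding the month into [0, 11] with a carry into the year, and then
counting the days in the year prior to that month in the Gregorian
calendar.

```c
#include <assert.h>
#include <stddef.h>

/* Fold *mon into [0, 11], carrying whole years into *year.
 * Correct whether integer division truncates or floors. */
void normalize_month(int *year, int *mon)
{
    int carry, m;

    assert(year != NULL && mon != NULL);
    carry = *mon / 12;
    m = *mon % 12;
    if (m < 0) {          /* truncating division left a negative remainder */
        m += 12;
        carry -= 1;
    }
    *year += carry;
    *mon = m;
}

/* Days in the year prior to month mon (0 = January), where full_year is
 * the Gregorian year number, i.e. tm_year + 1900. */
int days_before_month(int full_year, int mon)
{
    static const int cumulative[12] = {
        0, 31, 59, 90, 120, 151, 181, 212, 243, 273, 304, 334
    };
    int leap = (full_year % 4 == 0 && full_year % 100 != 0)
               || full_year % 400 == 0;

    assert(mon >= 0 && mon <= 11);
    return cumulative[mon] + (leap && mon >= 2 ? 1 : 0);
}
```

With tm_year == 97 and tm_mon == -8, normalize_month() yields 96 and 4
(May of 1996), and days_before_month(1996, 4) counts 121 days because
1996 is a leap year.
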

Apart from this, the final date can be determined simply by adding
together the various fields, each with a suitable weight, to get the
number of seconds past the epoch.

The normalization should be exact provided that there is no unreasonable
overflow. I would consider reasonable limitations to be that each of the
following expressions is in the range [-1<<30,+1<<30]:

    tm_year * 366
    tm_mon * 31
    tm_mday
    tm_hour * 3600
    tm_min * 60
    tm_sec
    tm_leapsecs
    tm_utcoffset * 60
    tm_xisdst * 60    [if nonnegative, else tm_xisdst must be -1]

This would ensure that separate "seconds in the day" and "days since
epoch" calculations won't overflow in 32 bits.
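
The suggested limits could be checked along the following lines. This is
only a sketch: the function name tm_within_reasonable_limits() is
hypothetical, the check covers only the members that exist in the current
struct tm (the proposed tm_leapsecs, tm_utcoffset, and tm_xisdst members
are omitted), and it assumes an implementation that provides long long so
that the products themselves cannot overflow.

```c
#include <assert.h>
#include <time.h>

/* Bound taken from the text above: 1 << 30, computed in a wide type. */
#define TM_LIMIT (1LL << 30)

static int in_limit(long long v)
{
    return v >= -TM_LIMIT && v <= TM_LIMIT;
}

/* Returns nonzero if every weighted member lies within the suggested
 * range, so that later arithmetic on it fits comfortably in 32 bits. */
int tm_within_reasonable_limits(const struct tm *tm)
{
    assert(tm != NULL);
    return in_limit((long long)tm->tm_year * 366)
        && in_limit((long long)tm->tm_mon * 31)
        && in_limit((long long)tm->tm_mday)
        && in_limit((long long)tm->tm_hour * 3600)
        && in_limit((long long)tm->tm_min * 60)
        && in_limit((long long)tm->tm_sec);
}
```

A caller would apply such a check before normalizing, and treat a zero
result as a broken-down time that need not be normalized exactly.
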