Public Comment Number PC-UK0083
ISO/IEC CD 9899 (SC22N2620) Public Comment
===========================================

Date: 1998-02-25
Author: N.M Maclaren
Author Affiliation: Self
Postal Address: University of Cambridge, Computer Laboratory,
                New Museums Site, Pembroke Street,
                Cambridge CB3 3QG, United Kingdom
E-mail Address:
Telephone Number: +44 1223 334761
Fax Number: +44 1223 334679
Number of individual comments: 1

Comment 1.
Category: Inconsistency
Committee Draft subsection: 5.2.1.2
Title: Multibyte characters and C89/C9X changes
Detailed description:

This appears to be a relic of C89, as far as the source character set
is concerned, and is utterly baffling in the context of C9X. Here are
some of the problems:

1) Paragraph 1 refers to multibyte characters in the source character
set, but 5.1.1.2 bullet one refers to physical source file multibyte
characters being mapped to members of the source character set.

2) The second bullet says that the presence, meaning, and
representation of any additional characters is locale-specific. But
locale is an execution concept! I cannot find anything anywhere else
in the standard that describes the concept of locale during
compilation.

3) The fourth bullet says that a byte with all bits zero shall be
interpreted as a null character, but the source character set is not
required to include a null character (5.2.1 paragraph 2).

4) Paragraph 2 describes how multibyte characters must fit within
various syntactic objects. But tokenisation does not occur until
translation phase 3, and multibyte characters are mapped to universal
character names in phase 1!

5) And, in any case, at what stage should this be true? After phase 3,
or after phase 4? Token concatenation and stringisation could cause
trouble here, especially if a multibyte character in an identifier
changed the shift state (ugh).

I suggest replacing paragraph 1 by:

    1. The source may be encoded using multibyte characters, used to
    represent members of the extended character set. The execution
    character set may also contain multibyte characters, which need
    not have the same encoding as for the source. For the execution
    character set, the following shall hold:

Paragraph 2 should be replaced by:

    Recommended practice

    If the source is encoded using multibyte characters, a
    representation compatible with some conforming multibyte
    execution character set should be used.
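
For illustration only, the property that the existing paragraph 2 demands of comments, string literals, character constants and header names (a sequence of valid multibyte characters that begins and ends in the initial shift state) is well defined at execution time, where a locale exists. The sketch below checks it with mbrtowc and mbsinit from <wchar.h>. The function name is_well_formed_mb is hypothetical and is not part of any proposed wording; the sketch says nothing about how a translator would apply the same test in phases 1 to 4, which is the point at issue.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper (not from the Committee Draft): returns 1 if the
   n bytes at s form a sequence of valid multibyte characters in the
   current locale that starts and ends in the initial shift state,
   and 0 otherwise. */
static int is_well_formed_mb(const char *s, size_t n)
{
    mbstate_t st;

    assert(s != NULL);
    memset(&st, 0, sizeof st);      /* zeroed object = initial conversion state */

    while (n > 0) {
        wchar_t wc;
        size_t r = mbrtowc(&wc, s, n, &st);

        if (r == (size_t)-1)        /* encoding error */
            return 0;
        if (r == (size_t)-2)        /* sequence ends in an incomplete character */
            return 0;
        if (r == 0)                 /* null character: state is initial again */
            break;

        s += r;                     /* r bytes completed one character */
        n -= r;
    }
    return mbsinit(&st) != 0;       /* must finish in the initial shift state */
}
```

The result depends on the locale in effect when the function is called, which is exactly the information that has no defined counterpart during translation.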