Public Comment Number PC-UK0257

ISO/IEC CD2 9899 (SC22N2794) Public Comment
===========================================

Date: 1998-10-26
Author: Geoffrey Keating
Author Affiliation: Self
Postal Address: PO Box 2043 Woden ACT 2606 Australia
E-mail address:
Telephone Number: +61 4 1144 6878
Fax Number: +61 2 6290 2349
Category: Feature that should be included
Committee Draft subsection: 7.25.2.1, 7.25.3.
Title: ISO10646 to/from wchar_t conversion functions.

Often programs that manipulate C source code are themselves written
in C. The purpose of these changes is to make it easier for such
programs to portably handle universal character names that appear in
their input files rather than in their own source files.

The functions can also be used for interpreting data files and
suchlike, although the preferred way to do this is to use the
appropriate locale; thus, there is no functionality for converting
several wide characters at a time.

The mapping functions below could be implemented by writing

    wchar_t u2wc[] = { L'\U00000000', L'\U00000001', L'\U00000002', ... };

    wint_t toiso10646wc(long iso10646)
    {
        return u2wc[iso10646];
    }

and the reverse for towciso10646, except that implementation limits
will usually prohibit such a large array.

The functions can be trivially defined to return -1 or WEOF always,
although this is not recommended. Such a definition may be
appropriate, for instance, if the wide character set in use does not
have any characters which have known equivalents in ISO10646. It may
also happen that a wide character does have an equivalent in ISO10646
but it is unreasonable for the runtime library to know about it; in
such cases the functions may return -1 or WEOF (this is a
quality-of-implementation issue).

The names of the functions are chosen so as not to tread on anybody's
namespace. `long' is chosen because int_fast32_t need not be defined
by wctype.h.
I would have used (long)WEOF instead of -1 as the error return for
towciso10646, but (long)WEOF might be a valid result: for instance,
if wchar_t is 64 bits, WEOF is 0xFFFFFFFF00000000ll, and long is
32 bits.

These changes apply to the committee draft of August 3, 1998.

Add after section 7.25.2.1.11 "The iswxdigit function":

7.25.2.1.12 The iswiso10646 function

Synopsis

    #include <wctype.h>
    int iswiso10646(wint_t wc);

Description

The /iswiso10646/ function tests for those characters for which
/towciso10646/ would not return -1.

Add to section 7.25.2.2.1 "The iswctype function":

    iswctype(wc, wctype("iso10646")) // iswiso10646(wc)

Add after section 7.25.3.2.2 "The wctrans function":

7.25.3.3 Wide-character ISO10646 mapping functions

The function /towciso10646/ and the function /toiso10646wc/ convert
wide characters to and from ISO10646 code points.

7.25.3.3.1 The towciso10646 function

Synopsis

    #include <wctype.h>
    long towciso10646(wint_t wc);

Description

The /towciso10646/ function returns the ISO10646:1993 code point
corresponding to /wc/, or -1. If /towciso10646/ does not return -1,
then /toiso10646wc(towciso10646(wc))/ returns /wc/.

Recommended practice

/towciso10646(L'\Unnnnnnnn')/ returns /0xnnnnnnnnl/ when /\Unnnnnnnn/
is a universal character name that corresponds to a wide character.

/towciso10646/ does not return -1 for wide characters corresponding
to those required in the basic execution character set.

7.25.3.3.2 The toiso10646wc function

Synopsis

    #include <wctype.h>
    wint_t toiso10646wc(long iso10646);

Description

The /toiso10646wc/ function returns the wide character corresponding
to the ISO10646:1993 code point /iso10646/, or /WEOF/. If
/toiso10646wc/ does not return /WEOF/, then
/towciso10646(toiso10646wc(iso10646))/ returns /iso10646/.
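
As an illustration only, the following sketch shows how the proposed
functions might behave on an implementation where wchar_t values are
themselves ISO10646 code points, and how a program might use them to
handle a universal character name read from an input file. The
sketch_ prefix, the parse_ucn and check_round_trip helpers, and the
SKETCH_MAX_CODE limit are all hypothetical names introduced here;
nothing below is part of the proposed wording, and an implementation
with a different wide character set would need real mapping tables.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* ASSUMPTION: wchar_t values are ISO10646 code points on this
   implementation. All names in this sketch are hypothetical. */

/* ISO10646:1993 code points fit in 31 bits. */
#define SKETCH_MAX_CODE 0x7FFFFFFFUL

/* Stand-in for the proposed toiso10646wc: code point to wide character. */
static wint_t sketch_toiso10646wc(long iso10646)
{
    if (iso10646 < 0 || (unsigned long)iso10646 > SKETCH_MAX_CODE)
        return WEOF;
    if ((unsigned long)iso10646 > (unsigned long)WCHAR_MAX)
        return WEOF;              /* not representable in wchar_t */
    return (wint_t)iso10646;      /* a value equal to WEOF reads as failure */
}

/* Stand-in for the proposed towciso10646: wide character to code point. */
static long sketch_towciso10646(wint_t wc)
{
    if (wc == WEOF)
        return -1L;
    if ((unsigned long)wc > (unsigned long)WCHAR_MAX ||
        (unsigned long)wc > SKETCH_MAX_CODE)
        return -1L;
    return (long)wc;
}

/* Stand-in for the proposed iswiso10646. */
static int sketch_iswiso10646(wint_t wc)
{
    return sketch_towciso10646(wc) != -1L;
}

/* Parse "\uXXXX" or "\UXXXXXXXX" at the start of s.
   Returns the code point, or -1 if s does not start with a
   well-formed universal character name. */
static long parse_ucn(const char *s)
{
    static const char hex[] = "0123456789abcdef";
    unsigned long value = 0;
    size_t digits, i;

    if (s[0] != '\\' || (s[1] != 'u' && s[1] != 'U'))
        return -1L;
    digits = (s[1] == 'u') ? 4 : 8;
    for (i = 0; i < digits; i++) {
        char c = s[2 + i];
        /* c == 0 must be rejected first: strchr would match the terminator */
        const char *p = c ? strchr(hex, tolower((unsigned char)c)) : NULL;
        if (p == NULL)
            return -1L;
        value = value * 16 + (unsigned long)(p - hex);
    }
    if (value > SKETCH_MAX_CODE)
        return -1L;
    return (long)value;
}

/* Checks the round-trip guarantee from the Description of toiso10646wc:
   if the conversion succeeds, converting back yields the same code point. */
static void check_round_trip(long iso10646)
{
    wint_t wc = sketch_toiso10646wc(iso10646);
    if (wc != WEOF)
        assert(sketch_towciso10646(wc) == iso10646);
}
```

A program that reads the text \u00E9 from an input file could pass it
to parse_ucn and hand the result to the conversion function, treating
a WEOF return as "no equivalent wide character known" rather than as
an input error.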