5.2.1.2 Multibyte characters


238 The source character set may contain multibyte characters, used to represent members of the extended character set.

239 The execution character set may also contain multibyte characters, which need not have the same encoding as for the source character set.

240 For both character sets, the following shall hold:

241 —  The basic character set shall be present and each character shall be encoded as a single byte.

242 —  The presence, meaning, and representation of any additional members is locale-specific.

243 —  A multibyte character set may have a state-dependent encoding, wherein each sequence of multibyte characters begins in an initial shift state and enters other locale-specific shift states when specific multibyte characters are encountered in the sequence.

244 While in the initial shift state, all single-byte characters retain their usual interpretation and do not alter the shift state.

245 12) The trigraph sequences enable the input of characters that are not defined in the Invariant Code Set as described in ISO/IEC 646, which is a subset of the seven-bit US ASCII code set.

246 The interpretation for subsequent bytes in the sequence is a function of the current shift state.

247 —  A byte with all bits zero shall be interpreted as a null character independent of shift state.

248 Such a byte shall not occur as part of any other multibyte character. (C99 as originally published read instead: "A byte with all bits zero shall not occur in the second or subsequent bytes of a multibyte character.")

249 For source files, the following shall hold:

250 —  An identifier, comment, string literal, character constant, or header name shall begin and end in the initial shift state.

251 —  An identifier, comment, string literal, character constant, or header name shall consist of a sequence of valid multibyte characters.


Created at: 2008-01-30 02:39:39 The text from WG14/N1256 is copyright © ISO