5.2.1 Character sets


214 Two sets of characters and their associated collating sequences shall be defined: the set in which source files are written (the source character set), and the set interpreted in the execution environment (the execution character set).

215 Each set is further divided into a basic character set, whose contents are given by this subclause, and a set of zero or more locale-specific members (which are not members of the basic character set) called extended characters.

216 The combined set is also called the extended character set.

217 The values of the members of the execution character set are implementation-defined.

218 In a character constant or string literal, members of the execution character set shall be represented by corresponding members of the source character set or by escape sequences consisting of the backslash \ followed by one or more characters.
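As a rough illustration of the rule above, the sketch below converts a run of execution characters into the form they could take inside a string literal: most characters are written directly, while the double quote, the backslash, new line, and horizontal tab are written as escape sequences. The function name `to_literal_body` and its buffer-size convention are hypothetical and chosen only for this example; the Standard defines no such function.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: write `in` to `out` as it could appear between the
   quotes of a string literal. Returns the number of characters written
   (not counting the terminating null), or 0 if `out` is too small.
   An empty input also yields 0. */
size_t to_literal_body(const char *in, char *out, size_t cap)
{
    assert(in != NULL && out != NULL);
    size_t n = 0;
    for (; *in != '\0'; in++) {
        char esc = 0;                 /* second character of the escape, if any */
        switch (*in) {
        case '"':  esc = '"';  break; /* written as \" */
        case '\\': esc = '\\'; break; /* written as \\ */
        case '\n': esc = 'n';  break; /* written as \n */
        case '\t': esc = 't';  break; /* written as \t */
        default:   break;             /* written directly */
        }
        size_t need = esc ? 2 : 1;
        if (n + need + 1 > cap)       /* keep room for the terminating null */
            return 0;
        if (esc) {
            out[n++] = '\\';
            out[n++] = esc;
        } else {
            out[n++] = *in;
        }
    }
    if (cap == 0)
        return 0;
    out[n] = '\0';
    return n;
}
```

Only four escapes are handled here; a fuller version would presumably cover the remaining simple escape sequences as well as octal and hexadecimal forms.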

219 A byte with all bits set to 0, called the null character, shall exist in the basic execution character set;

220 it is used to terminate a character string.
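The role of the null character can be sketched with a hand-written length function that scans until it meets the all-bits-zero byte. The name `length_to_null` is an assumption made for this example; in practice the library function `strlen` does the same job.

```c
#include <assert.h>
#include <stddef.h>

/* Count the characters that precede the terminating null character. */
size_t length_to_null(const char *s)
{
    assert(s != NULL);
    size_t n = 0;
    while (s[n] != '\0')   /* '\0' has the value 0 */
        n++;
    return n;
}
```

Because the first null character ends the string, `length_to_null("ab\0cd")` reports 2 even though the array holding that literal is six bytes long.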

221 Both the basic source and basic execution character sets shall have the following members: the 26 uppercase letters of the Latin alphabet

        A  B  C  D  E  F  G  H  I  J  K  L  M
        N  O  P  Q  R  S  T  U  V  W  X  Y  Z

the 26 lowercase letters of the Latin alphabet

        a  b  c  d  e  f  g  h  i  j  k  l  m
        n  o  p  q  r  s  t  u  v  w  x  y  z

the 10 decimal digits

        0  1  2  3  4  5  6  7  8  9

the following 29 graphic characters

        !  "  #  %  &  '  (  )  *  +  ,  -  .  /  :
        ;  <  =  >  ?  [  \  ]  ^  _  {  |  }  ~

the space character, and control characters representing horizontal tab, vertical tab, and form feed.
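The membership list above can be written down as a small lookup, which may help when checking whether a character is one the Standard requires every implementation to provide. This is a sketch only: the function name `in_listed_basic_set` is hypothetical, and it covers just the characters enumerated in this sentence (so new line and the null character are deliberately reported as non-members).

```c
#include <assert.h>
#include <string.h>

/* Return nonzero if `c` is one of the characters enumerated in the list:
   52 letters, 10 digits, 29 graphic characters, space, and the
   horizontal tab, vertical tab, and form feed control characters. */
int in_listed_basic_set(char c)
{
    static const char upper[]   = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    static const char lower[]   = "abcdefghijklmnopqrstuvwxyz";
    static const char digits[]  = "0123456789";
    static const char graphic[] = "!\"#%&'()*+,-./:;<=>?[\\]^_{|}~";
    static const char blank[]   = " \t\v\f";

    /* Sanity checks on the counts quoted in the text. */
    assert(sizeof upper - 1 == 26 && sizeof lower - 1 == 26);
    assert(sizeof digits - 1 == 10 && sizeof graphic - 1 == 29);

    if (c == '\0')          /* strchr would otherwise match the terminator */
        return 0;
    return strchr(upper, c) != NULL
        || strchr(lower, c) != NULL
        || strchr(digits, c) != NULL
        || strchr(graphic, c) != NULL
        || strchr(blank, c) != NULL;
}
```

The tables are spelled out rather than tested with range comparisons because nothing here requires the letters to have consecutive values. Characters such as `@`, `$`, and the grave accent are absent from the list, so the function rejects them.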

222 The representation of each member of the source and execution basic character sets shall fit in a byte.

223 In both the source and execution basic character sets, the value of each character after 0 in the above list of decimal digits shall be one greater than the value of the previous.
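This guarantee is what makes the familiar `c - '0'` idiom portable. A minimal sketch, with the hypothetical helper names `digit_value` and `digit_char`:

```c
#include <assert.h>

/* Map a decimal digit character to its numeric value, or -1 if `c`
   is not a decimal digit. Relies on '0'..'9' having consecutive values. */
int digit_value(char c)
{
    return (c >= '0' && c <= '9') ? c - '0' : -1;
}

/* Map a value in the range 0..9 back to its digit character. */
char digit_char(int v)
{
    assert(v >= 0 && v <= 9);
    return (char)('0' + v);
}
```

No comparable guarantee is given for letters, so the analogous `c - 'a'` is not portable to every character set.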

224 In source files, there shall be some way of indicating the end of each line of text;

225 this International Standard treats such an end-of-line indicator as if it were a single new-line character.

226 In the basic execution character set, there shall be control characters representing alert, backspace, carriage return, and new line.
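In source code these control characters are normally written with simple escape sequences. The sketch below maps the letter following the backslash to the corresponding execution character, covering the four characters named here plus the horizontal tab, vertical tab, and form feed listed earlier. The function name `control_from_escape_letter` is an assumption made for the example.

```c
#include <assert.h>

/* Return the control character denoted by the escape sequence
   backslash + `letter`. Unsupported letters trip the assertion
   (or yield '\0' when assertions are disabled). */
char control_from_escape_letter(char letter)
{
    switch (letter) {
    case 'a': return '\a';   /* alert */
    case 'b': return '\b';   /* backspace */
    case 'f': return '\f';   /* form feed */
    case 'n': return '\n';   /* new line */
    case 'r': return '\r';   /* carriage return */
    case 't': return '\t';   /* horizontal tab */
    case 'v': return '\v';   /* vertical tab */
    default:
        assert(0 && "unsupported escape letter");
        return '\0';
    }
}
```

The numeric values returned are implementation-defined; only their existence as distinct members of the basic execution character set is required.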

227 If any other characters are encountered in a source file (except in an identifier, a character constant, a string literal, a header name, a comment, or a preprocessing token that is never converted to a token), the behavior is undefined.

228 A letter is an uppercase letter or a lowercase letter as defined above;

229 in this International Standard the term does not include other characters that are letters in other alphabets.

230 The universal character name construct provides a way to name other characters.

231 Forward references: universal character names (6.4.3), character constants (6.4.4.4), preprocessing directives (6.10), string literals (6.4.5), comments (6.4.9), string (7.1.1).


Created at: 2008-01-30 02:39:39

The text from WG14/N1256 is copyright © ISO