Translation phases


115 The precedence among the syntax rules of translation is specified by the following phases.5)

116 1. Physical source file multibyte characters are mapped, in an implementation-defined manner, to the source character set (introducing new-line characters for end-of-line indicators) if necessary.

117 Trigraph sequences are replaced by corresponding single-character internal representations.
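As an illustration of the trigraph replacement in phase 1, the sketch below applies the nine trigraph mappings to a character buffer. The function name replace_trigraphs is hypothetical, and a translator performs this step on the source text itself, not at run time. In the usage note the question marks are written as \? so that the example's own source contains no trigraphs.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: rewrite the nine trigraph sequences in place,
   mirroring what phase 1 is described as doing for source text. */
void replace_trigraphs(char *s)
{
    /* Third character of each trigraph and its replacement, pairwise. */
    static const char third[]  = "=/'()!<>-";
    static const char single[] = "#\\^[]|{}~";
    char *out = s;

    assert(s != NULL);
    while (*s != '\0') {
        const char *hit = NULL;

        /* A trigraph is two question marks plus one listed character. */
        if (s[0] == '?' && s[1] == '?' && s[2] != '\0')
            hit = strchr(third, s[2]);

        if (hit != NULL) {
            *out++ = single[hit - third];
            s += 3;
        } else {
            *out++ = *s++;
        }
    }
    *out = '\0';
}
```

Given a buffer initialised with "\?\?=define X a\?\?(0\?\?)", a call to replace_trigraphs leaves "#define X a[0]" in the buffer.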

118 2. Each instance of a backslash character (\) immediately followed by a new-line character is deleted, splicing physical source lines to form logical source lines.

119 Only the last backslash on any physical source line shall be eligible for being part of such a splice.

120 5) Implementations shall behave as if these separate phases occur, even though many are typically folded together in practice.

121 Source files, translation units and translated translation units need not necessarily be stored as files, nor need there be any one-to-one correspondence between these entities and any external representation.

122 The description is conceptual only, and does not specify any particular implementation.

123 A source file that is not empty shall end in a new-line character, which shall not be immediately preceded by a backslash character before any such splicing takes place.
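The line splicing of phase 2 can be seen in the following sketch. The names SUM3, spliced, sum3_value and spliced_text are illustrative only. Each backslash must be the very last character on its physical line for the splice to take place.

```c
#include <assert.h>
#include <string.h>

/* The backslash/new-line pairs are deleted in phase 2, so this macro
   definition is a single logical line by the time directives are
   processed in phase 4. */
#define SUM3(a, b, c) \
    ((a) +            \
     (b) +            \
     (c))

/* A splice may fall inside a token. The continuation starts in column
   one so that no leading spaces end up inside the string literal. */
static const char spliced[] = "translation \
phases";

int sum3_value(void)
{
    return SUM3(1, 2, 3);
}

const char *spliced_text(void)
{
    assert(strlen(spliced) > 0);
    return spliced;
}
```

Here sum3_value() returns 6 and spliced_text() returns the string "translation phases".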

124 3. The source file is decomposed into preprocessing tokens6) and sequences of white-space characters (including comments).

125 A source file shall not end in a partial preprocessing token or in a partial comment.

126 Each comment is replaced by one space character.

127 New-line characters are retained.

128 Whether each nonempty sequence of white-space characters other than new-line is retained or replaced by one space character is implementation-defined.
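Two consequences of phase 3 are sketched below: a comment acts as a single space between preprocessing tokens, and (per 6.4) each preprocessing token is the longest sequence of characters that could form one. The macro STR and the function names are illustrative only.

```c
#include <assert.h>
#include <string.h>

#define STR(x) #x

/* The comment becomes one space in phase 3, so a and b remain two
   separate preprocessing tokens instead of merging into ab. */
const char *comment_as_space(void)
{
    return STR(a/**/b);
}

/* The longest possible preprocessing token is taken at each step,
   so x+++y is read as x ++ + y. */
int longest_match(void)
{
    int x = 1, y = 2;
    int r = x+++y;          /* (x++) + y: r is 3 and x becomes 2 */

    assert(x == 2);
    return r * 10 + x;      /* packs both results into one value: 32 */
}
```

With these definitions comment_as_space() yields the string "a b" and longest_match() returns 32.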

129 4. Preprocessing directives are executed, macro invocations are expanded, and _Pragma unary operator expressions are executed.

130 If a character sequence that matches the syntax of a universal character name is produced by token concatenation (6.10.3.3), the behavior is undefined.

131 A #include preprocessing directive causes the named header or source file to be processed from phase 1 through phase 4, recursively.

132 All preprocessing directives are then deleted.
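A minimal sketch of what phase 4 does to a translation unit follows: a conditional directive selects a definition, a function-like macro is expanded, and two tokens are concatenated. All macro and function names here (VERSION, SCALE, PASTE, SQUARE, phase4_demo) are made up for the illustration.

```c
#include <assert.h>

#define VERSION 2
#define PASTE(a, b) a##b            /* token concatenation */
#define SQUARE(x) ((x) * (x))       /* function-like macro */

/* The directive is evaluated in phase 4; only one branch survives. */
#if VERSION >= 2
#define SCALE 10
#else
#define SCALE 1
#endif

int phase4_demo(void)
{
    int value1 = 4;

    assert(SCALE == 10);
    /* PASTE(value, 1) is rewritten to the identifier value1 before the
       tokens are analyzed in phase 7. */
    return SQUARE(PASTE(value, 1)) * SCALE;
}
```

After phase 4 the return statement reads ((value1) * (value1)) * 10, so phase4_demo() returns 160.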

133 5. Each source character set member and escape sequence in character constants and string literals is converted to the corresponding member of the execution character set;

134 if there is no corresponding member, it is converted to an implementation-defined member other than the null (wide) character.7)
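The effect of phase 5 on escape sequences can be sketched as follows. Each escape sequence ends up as a single member of the execution character set. The function names are illustrative, and the second function assumes an ASCII-compatible execution character set, which the standard does not require.

```c
#include <assert.h>
#include <string.h>

/* Each escape sequence denotes exactly one execution character. */
size_t escape_lengths(void)
{
    size_t tab   = strlen("a\tb");      /* 3: a, horizontal tab, b  */
    size_t slash = strlen("\\n");       /* 2: backslash, n          */
    size_t codes = strlen("\x41\101");  /* 2: one hex, one octal    */

    assert(tab == 3);
    return tab + slash + codes;
}

/* Assumption: ASCII-compatible execution character set, where the
   value 0x41 is the letter A. */
int hex_escape_is_letter_a(void)
{
    return '\x41' == 'A';
}
```

Here escape_lengths() returns 7 on any conforming implementation, while hex_escape_is_letter_a() returns 1 only under the stated ASCII assumption.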

135 6. Adjacent string literal tokens are concatenated.
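The concatenation in phase 6, and the reason it matters that phase 5 comes first, can be sketched as below. The object and function names are illustrative only.

```c
#include <assert.h>
#include <string.h>

/* Two adjacent string literal tokens form one string after phase 6. */
static const char greeting[] = "Hello, " "world";

/* Escape sequences were already converted in phase 5, so the digit 3
   is not absorbed into the hexadecimal escape: the result is the
   character with value 0x12 followed by the character 3. */
static const char split_escape[] = "\x12" "3";

const char *joined_greeting(void)
{
    return greeting;
}

size_t split_escape_length(void)
{
    assert(split_escape[1] == '3');
    return strlen(split_escape);
}
```

With these definitions joined_greeting() returns "Hello, world" and split_escape_length() returns 2. Splitting a literal this way is a common technique for ending a hexadecimal escape sequence before a character that would otherwise be read as another hexadecimal digit.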

136 7. White-space characters separating tokens are no longer significant.

137 Each preprocessing token is converted into a token.

138 The resulting tokens are syntactically and semantically analyzed and translated as a translation unit.

139 8. All external object and function references are resolved.

140 Library components are linked to satisfy external references to functions and objects not defined in the current translation.

141 All such translator output is collected into a program image which contains information needed for execution in its execution environment.

142 Forward references: universal character names (6.4.3), lexical elements (6.4), preprocessing directives (6.10), trigraph sequences (5.2.1.1), external definitions (6.9).

143 6) As described in 6.4, the process of dividing a source file's characters into preprocessing tokens is context-dependent.

144 For example, see the handling of < within a #include preprocessing directive.

145 7) An implementation need not convert all non-corresponding source characters to the same execution character.


Created at: 2008-01-30 02:39:39 The text from WG14/N1256 is copyright © ISO