![]() |
![]() |
![]() |
![]() |
![]() |
Syntax Definition
In this section, a notation called a context-free grammar (grammar for short) is introduced. This notation is used for specifying the syntax of a language.
A grammar naturally describes the hierarchical structure of many programming language constructs. For example, an if-else statement in C has the form as below.
if ( expression ) statement else statementThat is, the statement is the concatenation of the keyword if, an opening parenthesis, an expression, a closing parenthesis, a statement, the keyword else, and another statement. Using the variables expr to denote an expression and the variables stmt, stmt1 and stmt2 to denote three (possible different) statements, this structuring rules can be expressed as
stmt -> if ( expr ) stmt1 else stmt2in which the arrow may be read as "can have the form". Such a rule is called a production. In a production lexical elements like the keyword if and the parentheses are called tokens. Variables like expr and stmt2 represent sequence of tokens and are called nonterminals.
A context-free grammar has four components:
- A set of tokens, known as terminal symbols.
- A set of nonterminals.
- A set of productions where each production consists of a nonterminal, called the left side of the production, an arrow, and a sequence of tokens and/or nonterminals, called the right side of the production.
- A destination of one of the nonterminals as the start symbol.
Through this chapter, examples and pictures will be present in order to clarify a topic or explain a specific concept. As TTCN, at this stage, has a too large grammar definition to be included in this document, a language called SMALL is defined that in this chapter will be used for examples and discussions.
SMALL will be defined as a language only consisting of digits and the plus and minus signs, e.g., 9-5+2, 3-1 and 7. In the language SMALL, a plus or minus sign must appear between two digits, and expressions as above will be referred to as "lists of digits separated by plus or minus sign".
The following grammar specifies the syntax of this language with list as its start symbol:
1:list -> list + digit2:list -> list - digit3:list -> digit4:digit -> 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9From production (3), a single digit by itself is a list. Productions (1) and (2) express the fact that if one take any list and follow it by a plus or a minus sign and then another digit one has a new list. For example, it is possible to deduce that 9-5+2 is a list as follows:
- a) 9 is a list by production (3), since 9 is a digit.
- b) 9-5 is a list by production (2), since 9 is a list and 5 is a digit.
- c) 9-5+2 is a list by production (1), since 9-5 is a list and 2 is a digit.
This reasoning is illustrated by the tree in the next figure. Each node in the tree is labelled by a grammar symbol. An interior node and its children correspond to a production; the interior node corresponds to the left side of the production, the children to the right side. Such trees are called parse trees and are discussed in the next chapter.
http://www.ibm.com/rational |
![]() |
![]() |
![]() |
![]() |