All subsequent references to identifiers refer to the appropriate symbol table index. The grammar in the above diagram is a text file you create with a text editor. Yacc will read your grammar and generate C code for a syntax analyzer or parser. The syntax analyzer uses grammar rules that allow it to analyze tokens from the lexical analyzer and create a syntax tree. The syntax tree imposes a hierarchical structure on the tokens. For example, operator precedence and associativity are apparent in the syntax tree.
The next step, code generation, does a depth-first walk of the syntax tree to generate code. Some compilers produce machine code, while others, as shown above, output assembly language. First, we need to specify all pattern matching rules for lex bas.
Furthermore, Lex reads an input stream specifying the lexical analyzer. Then, it outputs the source code implementing the lexer in the C language. Rules : It contains regular expression patterns with C statements.
When the lexer identifies that the text in the input matches a given pattern, it will execute the associated C code. C code : This section consists of C statements and functions. The most popular open-source version of Lex is called flex, which stands for Fast Lexical Analyzer. Yacc stands for Yet Another Compiler-Compiler. Stephan C. Johnson developed it, and it is used in UNIX systems. Lex turns these regular expressions into a form that the lexer can use to scan the input text extremely fast, independent of the number of expressions that it is trying to match.
A lex lexer is almost always faster than a lexer that you might write in C by hand. As the input is divided into tokens, a program often needs to establish the relationship among the tokens. A C compiler needs to find the expressions, statements, declarations, blocks, and procedures in the program. This task is known as parsing and the list of rules that define the relationships that the program understands is a grammar. Yacc takes a concise description of a grammar and produces a C routine that can parse that grammar, a parser.
A yacc parser is generally not as fast as a parser you could write by hand, but the ease in writing and modifying the parser is invariably worth any speed loss. The amount of time a program spends in a parser is rarely enough to be an issue anyway. When a task involves dividing the input into units and establishing some relationship among those units, you should think of lex and yacc.
We do not intend for this chapter to be a complete tutorial on lex and yacc, but rather a gentle introduction to their use. It acts very much like the UNIX cat command run with no arguments.
Lex automatically generates the actual C program code needed to handle reading the input file and sometimes, as in this case, writing the output as well. We start by identifying parts of speech noun, verb, etc. Example shows a simple lex specification to recognize these verbs. What we type is in bold. This first section, the definition section , introduces any initial C program code we want copied into the final program.
This is especially important if, for example, we have header files that must be included for code later in the file to work. In this example, the only thing in the definition section is some C comments. You might wonder whether we could have included the comments without the delimiters. The next section is the rules section. Each rule is made up of two parts: a pattern and an action , separated by whitespace. The lexer that lex generates will execute the action when it recognizes the pattern. These patterns are UNIX-style regular expressions, a slightly extended version of the same expressions used by tools such as grep, sed , and ed.
Chapter 6 describes all the rules for regular expressions. The first rule in our example is the following:. Thus, this pattern describes whitespace any combination of tabs and spaces. The second part of the rule, the action , is simply a semicolon, a do-nothing C statement.
Its effect is to ignore the input. This is a special action that means to use the same action as the next pattern, so all of the verbs use the action specified for the last one. Our patterns match any of the verbs in the list. Once we recognize a verb, we execute the action, a C printf statement. The array yytext contains the text that matched the pattern. The answer is that lex has a set of simple disambiguating rules. The two that make our lexer work are:.
Lex executes the action for the longest possible match for the current input. If you think about how a lex lexer matches patterns, you should be able to see how our example matches only the verbs listed. The last line is the default case. The special action ECHO prints the matched pattern on the output, copying any punctuation or other characters. We explicitly list this case although it is the default behavior. We have seen some complex lexers that worked incorrectly because of this very feature, producing occasional strange output when the default pattern matched unanticipated input characters.
Even though there is a default action for unmatched input characters, well-written lexers invariably have explicit rules to match all possible input. The final section is the user subroutines section , which can consist of any legal C code.
Lex copies it to the C file after the end of the lex generated code. We have included a main program. The lexer produced by lex is a C routine called yylex , so we call it. We placed our original example in a file called ch To create an executable program on our UNIX system we enter these commands:. Lex translates the lex specification into a C source file called lex. We then execute the resulting program to check that it works as we expect, as we saw earlier in this section. In many cases, though, tasks don't want to assume that the recipient stores a data structure in the same way the sender does.
This may be due to portability requirements or to the fact that the tasks are part of a distributed application and are running on different hardware. In such cases, it's common for a message to be provided that describes the data in an agreed format rather than passing the binary. Where this is done, you need to parse the description at the receiving end. Many embedded systems require configuration data, which is commonly stored in a flash filesystem. This configuration data is typically stored within a text file that is parsed at run-time and used to populate a database with values affecting the subsequent behavior of the application.
I use the term database in the loosest sense. The data storage may comprise anything from a couple of variables to a large and complex set of data structures. Let's look at a couple of examples to see how it can be done. There are many versions of lex and yacc available for a variety of platforms. For the examples below I have used GNU's flex 2. String Processing Figure 3 shows a motor control system that consists of three key components:.
Figure 3: A motor controller and supervisory micro-controller configuration the drive. Firmware running on the microcontroller sends commands to and receives responses from the drive system by means of an RS cable.
We're only concerned here with one component of this data flow: the data coming in from the drive. This data takes the form of a stream of characters; a lex-generated scanner interprets it and generates messages that can be used internally within the application.
We'll assume we're looking for just one message:. It may be generated in response to some microcontroller output or asynchronously by the drive system. This doesn't affect how the scanner interprets the data. Due to its simplicity, this application won't require the use of yacc. Instead of returning tokens to a parser, we'll do everything within our scanner function and generate messages for the application.
The procedure for producing our scanner is as follows:. What do we want our scanner to do? Our scanner function will be called as required by the application maybe by polling it or by generating an event when data is available for it to process. It will read from its input repeatedly until it:. Although the lex specification format is fundamentally simple, like many tools it has enough options to occupy you for a long time, should you be so inclined.
The specification in Listing 1 is divided into three sections as follows:. Listing 1 Barebones lex specification for simple serial protocol interpreter. In Listing 1, the only item in the definitions section is an alias for a complex regular expression. You can use aliases to avoid duplicating a complex expression throughout the rules section. The DoubleValue alias describes the acceptable format of a double-precision decimal value. Lex supports a large regular expression syntax that is documented in the companion manuals.
The second section is the rules section. This is the business end of the lex specification and describes most of the functionality of the scanner. Each rule consists of an input pattern to be matched and an action to be taken when and if it is. The input patterns can be literal characters, aliases from our definitions section enclosed in braces , or a combination of both. The user code contains any additional code we want to insert at the bottom of the generated source file.
In the example, I have included an input function to read from the serial port. Some additional definitions are required to enable this function but I've omitted them for clarity. This is why I have redefined the input function here.
Execution proceeds as follows:. If at any stage an unexpected input is received something that cannot be matched , it is matched by the default rule, which is represented by the period.
In this case, we return the fact that the input is invalid to the calling function. The scanner will repeatedly call the input function until no data is available. If the input function returns 0 because no further data is available via the serial connection, for example , the scanner will return 0 to the calling function but will maintain its internal state.
When it is subsequently called, it will simply try once more to get additional data from the input function. As already stated, if at any stage the data received is inconsistent with the expected patterns, the default rule is matched and an error is returned. Many aspects of a lex scanner's behavior can be altered by using different options in both the lex specification and on the command line and the resulting scanner can be a very different beast.
To use the scanner in your program, call the scanner function int yylex void as required. Configuration File Parsing Figure 4 illustrates the parsing of a simple configuration file. The configuration file is designed to hold a list of a variable number of image descriptions for a device's GUI.
0コメント