Lexical analysis with lex


The lex definitions section may contain any of several classes of items. The most critical are external definitions, preprocessor statements like #include, and abbreviations. Recall that for valid lex source this section is optional, but in most cases some of these items are necessary. Preprocessor statements and C source code should appear between a line of the form %{ and one of the form %}. All lines between these delimiters -- including those that begin with white space -- are copied to lex.yy.c immediately before the definition of yylex(). (Lines in the definition section that are not enclosed by the delimiters are copied to the same place provided they begin with white space.) The definitions section is where you would normally place C definitions of objects accessed by actions in the rules section or by routines with external linkage.

One example occurs in using lex with yacc, which generates parsers that call a lexical analyzer. In this context, you should include the header file generated by yacc, which may contain #defines for token names:

   %{
   #include ""
   extern int tokval;
   int lineno;
   %}

After the %} that ends your #include directives and declarations, place your abbreviations for regular expressions to be used in the rules section. Each abbreviation appears at the left of the line, and its definition, or translation, appears on the right, separated from it by one or more spaces. When you later use abbreviations in your rules, be sure to enclose them within braces. Abbreviations avoid needless repetition in writing your specifications and make them easier to read.
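
One way to picture what an abbreviation does is as text substitution: each {NAME} in a rule is replaced by the translation given for NAME. The C sketch below models only that idea. The names struct abbrev and expand_abbrevs are hypothetical helpers invented for illustration, not part of lex, and the sketch ignores quoting and backslash escapes, which a real lex implementation handles.

```c
#include <string.h>

/* One abbreviation from the definitions section: the name on the left,
   its translation on the right. Hypothetical type, not part of lex. */
struct abbrev {
    const char *name;
    const char *text;
};

/* Copy pattern into out, replacing each {NAME} whose NAME is in tab with
   that abbreviation's translation. Brace groups that do not name an
   abbreviation, such as the repetition count {2,3}, are copied unchanged.
   Returns 0 on success, -1 if out is too small.
   Simplification: quotes and backslash escapes are not recognized. */
int expand_abbrevs(const char *pattern, const struct abbrev *tab, size_t ntab,
                   char *out, size_t outsz)
{
    size_t used = 0;

    if (outsz == 0)
        return -1;
    while (*pattern != '\0') {
        const char *piece = pattern;  /* text to append this round */
        size_t len = 1;               /* default: one literal character */

        if (*pattern == '{') {
            const char *close = strchr(pattern, '}');
            if (close != NULL) {
                size_t namelen = (size_t)(close - pattern - 1);
                for (size_t i = 0; i < ntab; i++) {
                    if (strlen(tab[i].name) == namelen &&
                        strncmp(tab[i].name, pattern + 1, namelen) == 0) {
                        piece = tab[i].text;
                        len = strlen(piece);
                        pattern = close;  /* resume after the closing brace */
                        break;
                    }
                }
            }
        }
        if (used + len >= outsz)      /* keep room for the terminating NUL */
            return -1;
        memcpy(out + used, piece, len);
        used += len;
        pattern++;
    }
    out[used] = '\0';
    return 0;
}
```

Given a table that maps D to [0-9], calling expand_abbrevs on the rule -{D}+ fills the output buffer with -[0-9]+, which is the pattern you would otherwise have to write out in full in every rule that refers to digits.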

As an example, reconsider the lex source reviewed at the beginning of this section on advanced lex usage. The use of definitions simplifies our later reference to digits, letters, and blanks. This is especially true if the specifications appear several times:

   D               [0-9]
   L               [a-zA-Z]
   B               [ \t]+
   %%
   -{D}+           printf("negative integer");
   \+?{D}+         printf("positive integer");
   -0.{D}+         printf("negative fraction");
   G{L}*           printf("may have a G word here");
   rail{B}road     printf("railroad is one word");
   crook           printf("criminal");
     .              .
     .              .
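
To check what the rules above accept once the abbreviations are written out, the following C sketch tries each expanded pattern against the start of an input string using the POSIX regcomp() and regexec() routines. It is an approximation under stated assumptions, not how lex is implemented: lex compiles all rules into a single table-driven scanner, and its pattern syntax differs in places from POSIX extended regular expressions. The names struct rule, match_prefix, and classify are hypothetical. The sketch follows the usual lex convention of preferring the longest match and, on a tie, the rule listed first.

```c
#include <regex.h>
#include <stdio.h>

/* A rule with its abbreviations already expanded by hand. */
struct rule {
    const char *pattern;  /* POSIX extended regular expression */
    const char *message;  /* text printed by the original action */
};

/* "\t" inside these C strings is a real tab character, because POSIX
   bracket expressions do not interpret backslash escapes. */
static const struct rule rules[] = {
    { "-[0-9]+",        "negative integer" },
    { "\\+?[0-9]+",     "positive integer" },
    { "-0.[0-9]+",      "negative fraction" },
    { "G[a-zA-Z]*",     "may have a G word here" },
    { "rail[ \t]+road", "railroad is one word" },
    { "crook",          "criminal" },
};

/* Return the length of the prefix of input matched by pattern, 0 if the
   start of input does not match, or -1 if the pattern cannot be compiled. */
int match_prefix(const char *pattern, const char *input)
{
    char anchored[256];
    regex_t re;
    regmatch_t m;
    int len = 0;

    /* Anchor the pattern so that it must match at the start of input. */
    int n = snprintf(anchored, sizeof anchored, "^(%s)", pattern);
    if (n < 0 || (size_t)n >= sizeof anchored)
        return -1;
    if (regcomp(&re, anchored, REG_EXTENDED) != 0)
        return -1;
    if (regexec(&re, input, 1, &m, 0) == 0)
        len = (int)m.rm_eo;
    regfree(&re);
    return len;
}

/* Return the message of the rule matching the longest prefix of input,
   or NULL if no rule matches. */
const char *classify(const char *input)
{
    const char *best = NULL;
    int best_len = 0;

    for (size_t i = 0; i < sizeof rules / sizeof rules[0]; i++) {
        int len = match_prefix(rules[i].pattern, input);
        if (len > best_len) {  /* strictly longer, so earlier rules win ties */
            best_len = len;
            best = rules[i].message;
        }
    }
    return best;
}
```

With these definitions, classify("-42") selects the negative integer rule, while classify("-0.5") selects the negative fraction rule, because that rule matches four characters and the negative integer rule matches only the first two.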

© 2004 The SCO Group, Inc. All rights reserved.
UnixWare 7 Release 7.1.4 - 27 April 2004