Lexical analysis with lex

Generating a lexical analyzer program

lex generates a C language scanner from a source specification that you write to solve the problem at hand. This specification consists of a list of rules. Each rule pairs a sequence of characters to search for in the input text (an expression) with the action to take when that expression is found. We will show you how to write a lex specification in the next section.

The C source code for the lexical analyzer is generated when you enter

   $ lex lex.l
where lex.l is the file containing your lex specification. (The name lex.l is the conventional choice, but you may use any name you want. Keep in mind, though, that the .l suffix is recognized by other UNIX system tools, in particular make.) By default, the source code is written to an output file called lex.yy.c. That file contains the definition of a function called yylex() that scans the input text for the expressions you have specified. When the action for a matched expression executes a return statement, yylex() returns that value (typically a nonzero token code); it returns 0 when end of file is encountered. Each such call to yylex() delivers one token, and when yylex() is called again, it picks up where it left off.

Note that running lex on a specification that is spread across several files

   $ lex lex1.l lex2.l lex3.l
produces a single lex.yy.c.

Invoking lex with the -t option causes it to write its output to stdout rather than to lex.yy.c, so that the output can be redirected:
   $ lex -t lex.l > lex.c
Options to lex must appear between the command name and the file name argument.

The lexical analyzer code stored in lex.yy.c (or in the .c file to which it was redirected) must be compiled to produce the executable program, or scanner, that performs the lexical analysis of an input text. The lex library, libl.a, supplies a default main() that calls yylex(), so you need not supply your own main(). To link the library, pass the -ll option to cc:

   $ cc lex.yy.c -ll
Alternatively, you may want to write your own driver. The following is similar to the library version:
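   /* driver.c: a minimal replacement for the main() in libl.a */
   extern int yylex(void);

   /* Returning 1 tells yylex() that no more input follows end of file. */
   int yywrap(void) { return 1; }

   /* Call yylex() repeatedly until it returns 0 at end of file. */
   int main(void) { while (yylex()) ; return 0; }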

We will take a closer look at the function yywrap() in "lex routines". For now it is enough to note that when your driver file is compiled with lex.yy.c
   $ cc lex.yy.c driver.c
its main() will call yylex() at run time exactly as if the lex library had been loaded. The resulting executable reads stdin and writes its output to stdout. Figure 4-1 shows how lex works.

Figure 4-1: Creation and use of a lexical analyzer with lex


© 2004 The SCO Group, Inc. All rights reserved.
UnixWare 7 Release 7.1.4 - 27 April 2004