Lexical analysis with lex

Lexical analysis with lex

lex is a software tool that lets you solve a wide class of problems drawn from text processing, code enciphering, compiler writing, and other areas. In text processing, you might check the spelling of words for errors; in code enciphering, you might translate certain patterns of characters into others; and in compiler writing, you might determine what the tokens are in the program to be compiled. The task common to all these problems is lexical analysis: recognizing different strings of characters that satisfy certain characteristics. Hence the name lex.

You do not have to use lex to handle problems of this kind. You could write programs in a standard language like C to handle them, too. In fact, what lex does is produce such C programs. (lex is therefore called a program generator.) What lex offers you, once you acquire a facility with it, is typically a faster, easier way to create programs that perform these tasks. Its weakness is that it often produces C programs that are longer than necessary for the task at hand and that execute more slowly than they otherwise might. In many applications this is a minor consideration, and the advantages of using lex considerably outweigh it.

lex can also be used to collect statistical data on features of an input text, such as character count, word length, number of occurrences of a word, and so forth. In the remaining sections, we will see