If you need to add a parser to a C/C++ project, this is a useful starting point: an empty Flex lexer and Bison parser combination that uses no global variables and tracks source locations accurately.

You can replace the string "project" in these files with the name of your parser and then rename the files accordingly. Obviously these defaults are not correct for every project.

Lexer Template

%option 8bit
%option never-interactive
%option noyywrap
%option nodefault
%option bison-bridge
%option bison-locations
%option reentrant
%option warn
%option yylineno
%option outfile=""
%option header-file="project_lex.hh"

%{
#include "project_parse.hh"
#include <stdio.h>

/* Runs before each rule action: record the token's span, then advance the column. */
#define YY_USER_ACTION \
    { \
        yylloc->first_line = yylloc->last_line = yylineno; \
        yylloc->first_column = yycolumn; \
        yylloc->last_column = yycolumn + yyleng - 1; \
        yycolumn += yyleng; \
    }
%}

%%

\n*     yycolumn = 1;
.       fprintf(stderr, "%d:%d:Unhandled character %02x\n", yylineno, yycolumn, (unsigned int)(unsigned char)(yytext[0]));

Newlines need explicit handling in a rule, as the column counter needs to be reset to 1. As long as this rule has no return statement, newlines do not appear as tokens in the token stream, so if they are significant in your language, you will need to adapt the rule accordingly. The yylineno variable is silently incremented when newlines are matched, so no special handling is required for it.

The catch-all rule at the bottom generates an error message with location data for otherwise unhandled input, then ignores these characters for further parsing.

The YY_USER_ACTION macro updates the standard YYLTYPE as defined by Bison and the internal yycolumn variable. If you define your own YYLTYPE, e.g. because you need to add a file name, this needs to be adjusted as well.
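As an illustration of such an adjustment, here is a minimal sketch of a location type that carries a file name. The names `project_location` and `set_location` are hypothetical and not part of Flex or Bison. The update logic is one common way to fill the fields from the scanner's line counter, column counter and match length, and it assumes that a token does not span several lines.

```cpp
#include <cassert>

// Hypothetical replacement for the default YYLTYPE: the four standard
// fields plus a file name. Plain data only, so it can be copied freely.
struct project_location {
    int first_line;
    int first_column;
    int last_line;
    int last_column;
    char const *filename;  // not owned by the location
};

// Plays the role of YY_USER_ACTION for this type. "lineno" and "column"
// stand in for yylineno and yycolumn, "leng" for yyleng.
inline void set_location(project_location &loc, int lineno, int &column, int leng)
{
    assert(leng > 0);                      // a rule always matches at least one character
    loc.first_line = loc.last_line = lineno;
    loc.first_column = column;             // column of the first matched character
    loc.last_column = column + leng - 1;   // column of the last matched character
    column += leng;                        // next token starts right after this one
}
```

The file name is left untouched by the update, so it only has to be set once when the input file is opened.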

The lexer definition needs the parser declarations (for the token types), so project_parse.hh is included here as well.

Parser Template

%define api.pure full
%define parse.error verbose
%param {yyscan_t scanner}
%parse-param {toplevel &top}
%locations

%output ""
%defines "project_parse.hh"

%union {
    /* value types for tokens and productions go here */
}

%code requires {
#include "project_tree.h"
    typedef void *yyscan_t;
}

%code {
#include "project_lex.hh"

#include <stdio.h>

void yyerror(YYLTYPE *yylloc, yyscan_t, toplevel &, char const *msg)
{
    fprintf(stderr, "%d:%d: %s\n", yylloc->first_line, yylloc->first_column, msg);
}
}

%token end_of_file 0 "end of file"

%%

/* grammar rules go here */

The order of the include files here is tricky, as the parser definition needs the lexer declarations, which in turn needs the parser declarations.

Also, the user rules in the parser require the syntax tree declarations, which I normally keep in a separate file.

As the pure parser doesn't pass the value of the top production outside of yyparse, the top production needs to copy it somewhere. For this, a separate parameter is added to the parser, a reference to a toplevel object. This is available in all levels of yyparse as well as in yyerror, so an alternative error handling method could store errors in a list inside the toplevel.
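A minimal sketch of what such a toplevel object might look like follows. The template does not prescribe its contents, so every member shown here (`root`, `has_root`, `errors`, `add_error`) is a hypothetical choice, and the `int` root is only a placeholder for a real syntax tree.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical error record; the field names are illustrative.
struct parse_error {
    int line;
    int column;
    std::string message;
};

// Hypothetical shape of the object passed to yyparse by reference.
struct toplevel {
    // Set by the action of the top production. A real project would
    // store its syntax tree root here instead of an int.
    int root = 0;
    bool has_root = false;

    // Errors are collected here instead of being printed immediately.
    std::vector<parse_error> errors;

    void add_error(int line, int column, char const *msg)
    {
        assert(msg != nullptr);
        errors.push_back(parse_error{line, column, msg});
    }
};
```

With something like this in place, yyerror could forward its location and message to `add_error` on the toplevel reference it receives, and the caller can decide after parsing how to present the collected errors.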

The typedef void *yyscan_t; is knowledge we're not supposed to have here, but this definition needs to be available before the declaration of yyparse in the parser header file, which normally doesn't have the lexer definition visible. Including the AST definitions that early makes them available for use in the generated YYSTYPE declaration, so AST types can be used as value types for productions.
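To make the last point concrete, here is a sketch of how AST types could serve as value types. All names (`expression`, `number`, `semantic_value`, `reduce_number`, `take_expression`) are hypothetical, and `semantic_value` is only a stand-in for the union that Bison generates from `%union`. Since a union is copied around as plain data, keeping its members to trivially copyable types such as raw pointers is the safe choice.

```cpp
#include <cassert>
#include <memory>

// Hypothetical contents of project_tree.h.
struct expression {
    virtual ~expression() = default;
};

struct number : expression {
    explicit number(long v) : value(v) {}
    long value;
};

// Stand-in for the generated YYSTYPE: one member per value type.
union semantic_value {
    long integer;       // value of a numeric token
    expression *expr;   // value of an expression production
};

// Roughly what an action like "$$ = new number($1);" does.
inline semantic_value reduce_number(semantic_value const &token)
{
    semantic_value result;
    result.expr = new number(token.integer);
    return result;
}

// Ownership is taken back from the union, for example when the top
// production stores the finished tree in the toplevel object.
inline std::unique_ptr<expression> take_expression(semantic_value &v)
{
    assert(v.expr != nullptr);
    std::unique_ptr<expression> owned(v.expr);
    v.expr = nullptr;
    return owned;
}
```

Raw pointers in the value union mean that values discarded during error recovery are not freed automatically, so a real grammar needs a cleanup strategy for that case.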

Invocation Template

To use the generated parser, you need to open a FILE * stream, create a scanner, attach the stream to it and invoke the parser (which will call the lexer as required):

toplevel top;

FILE *in = fopen(input, "r");
if(!in)
    /* handle error */ ;
yyscan_t scanner;
yylex_init(&scanner);
yyset_in(in, scanner);
int ret = yyparse(scanner, top);
yylex_destroy(scanner);
fclose(in);

The return code from yyparse indicates whether the top production was matched successfully (0 means success). The action in this production should then update the toplevel object.
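Bison documents three return values for yyparse: 0 for success, 1 for invalid input (or an explicit abort) and 2 for memory exhaustion. A small sketch of how a caller might turn these into something more readable follows. The names `parse_status`, `classify` and `result_is_usable` are hypothetical helpers, not part of Bison.

```cpp
#include <cassert>

enum class parse_status { ok, invalid_input, out_of_memory, unknown };

// Maps the integer returned by yyparse to a named status.
inline parse_status classify(int ret)
{
    switch (ret) {
    case 0:  return parse_status::ok;
    case 1:  return parse_status::invalid_input;
    case 2:  return parse_status::out_of_memory;
    default: return parse_status::unknown;
    }
}

// The toplevel object should only be read after a successful parse.
inline bool result_is_usable(parse_status status, bool top_was_updated)
{
    // The top production is reduced before yyparse returns 0, so a
    // successful parse without an updated toplevel points to a missing action.
    assert(status != parse_status::ok || top_was_updated);
    return status == parse_status::ok;
}
```

A caller would pass `ret` to `classify` and consult the toplevel object only if `result_is_usable` returns true.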

Makefile Template

To compile the files, just invoke flex and bison with the respective file as a single argument. The output names are listed in the input files, and both outputs are generated at the same time, so make sure dependencies are declared correctly:

project_lex.hh: project.ll
    flex $<

project_parse.hh: project.yy
    bison $<

If you generate dummy dependency files somewhere, include the generated headers in them:

    @echo >$@ "$*.o: project_lex.hh project_parse.hh"

This ensures that the headers are generated before any source is compiled on the first build — subsequent builds will have accurate dependency information anyway.