ANTLR 4: using the lexer, parser and listener with example grammar

This is a quick overview of the latest version of ANTLR and how to write a simple lexer/parser and listen to any matching parse events using a Java target. Listening to parse events is new to ANTLR 4 and keeps grammars much more concise, because the application code lives in a listener class instead of being embedded in the grammar.


If you want to follow along, create a new Java project in your IDE of choice and make sure the ANTLR runtime is on the classpath.

If you use Maven, ANTLR is available from the main Maven repository (the runtime artifact is org.antlr:antlr4-runtime).

Lexer

The first thing we need to do is create a grammar file. A grammar file is what you’d expect: a human-readable text file describing the rules of a certain grammar. Ours has the extension ‘.g’ (ANTLR 4 grammars more commonly use ‘.g4’). ANTLR generates a lexer and a parser from this file, and the lexer then tries to match the text you feed it against a number of lexer tokens.

At a basic level the lexer part of a grammar is simply a pairing of token names with the text they match. For example, a grammar file containing:

OF : 'of' ;

This rule produces an OF token whenever the lexer encounters the string ‘of’ in its input. We’ll write a simple grammar to parse a string such as ‘a cup of tea’ and listen for the type of drink being asked for.

Create a file named Drink.g and paste the following into it:


grammar Drink;

// Parser Rules

drinkSentence : ARTICLE? DRINKING_VESSEL OF drink ;

drink : TEXT;

// Lexer Rules

ARTICLE : 'the' | 'an' | 'a' ;

OF : 'of' ;

DRINKING_VESSEL : 'cup' | 'pint' | 'shot' | 'mug' | 'glass' ;

TEXT : ('a'..'z')+ ;

WHITESPACE : ( '\t' | ' ' | '\r' | '\n'| '\u000C' )+ -> skip ;

You may be able to guess what this grammar does, but I’ll explain the process in a little more detail. When you feed a string to the lexer generated from this grammar, ANTLR looks at the string and tries to match parts of it against the lexer rules.

Lexer rules are the ones whose names start with a capital letter (here written entirely in capitals). Given the string ‘a pint of beer’, the lexer will match ‘a’ to the ARTICLE rule, ‘pint’ to the DRINKING_VESSEL rule, ‘of’ to OF, and ‘beer’ to TEXT. TEXT is any run of lower-case letters: ('a'..'z') is the range and the + means one or more times. The spaces match WHITESPACE, which is skipped and never reaches the parser.
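To make the matching order concrete, here is a minimal sketch of the same lexer rules written with plain java.util.regex. It is not how ANTLR is implemented, and the class and method names (DrinkLexerSketch, tokenize) are made up for illustration. It assumes the usual lexer behaviour of taking the longest match at each position and, on a tie, preferring the rule listed first, which is why ‘a’ becomes ARTICLE rather than TEXT.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for the generated DrinkLexer; illustration only.
final class DrinkLexerSketch {
    // Rules in the same order as in Drink.g; the order breaks ties.
    private static final Map<String, Pattern> RULES = new LinkedHashMap<>();
    static {
        RULES.put("ARTICLE", Pattern.compile("the|an|a"));
        RULES.put("OF", Pattern.compile("of"));
        RULES.put("DRINKING_VESSEL", Pattern.compile("cup|pint|shot|mug|glass"));
        RULES.put("TEXT", Pattern.compile("[a-z]+"));
        RULES.put("WHITESPACE", Pattern.compile("[ \\t\\r\\n\\f]+"));
    }

    /** Splits the input into "RULE:text" entries, dropping whitespace. */
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            String bestRule = null;
            int bestEnd = pos;
            for (Map.Entry<String, Pattern> rule : RULES.entrySet()) {
                Matcher m = rule.getValue().matcher(input).region(pos, input.length());
                // Strictly longer matches win, so on a tie the earlier rule is kept.
                if (m.lookingAt() && m.end() > bestEnd) {
                    bestRule = rule.getKey();
                    bestEnd = m.end();
                }
            }
            if (bestRule == null) {
                // Simplification: this sketch just gives up on unknown characters.
                throw new IllegalArgumentException("No rule matches at index " + pos);
            }
            if (!bestRule.equals("WHITESPACE")) { // mirrors "-> skip"
                tokens.add(bestRule + ":" + input.substring(pos, bestEnd));
            }
            pos = bestEnd;
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [ARTICLE:a, DRINKING_VESSEL:pint, OF:of, TEXT:beer]
        System.out.println(tokenize("a pint of beer"));
    }
}
```

Note that a word such as ‘another’ comes out as TEXT, because the seven-letter TEXT match is longer than the two-letter ARTICLE match on ‘an’.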

Parser

That’s lexing done; next comes parsing.

Parser rules are the ones whose names start with a lower-case letter (here written in camelCase). When we parse a string we have to pick a parser rule that acts as the entry point. We’ll do that later in the Java code, but for reference the entry rule will be drinkSentence.

What the parser sees when ANTLR parses our string ‘a pint of beer’ is the sequence of tokens tagged by the lexer: ARTICLE (the ‘?’ after ARTICLE means the token may occur zero or one times, i.e. it is optional), DRINKING_VESSEL, OF and TEXT (the drink rule consists of a single TEXT token). This sequence matches the drinkSentence rule, so ANTLR is able to recognize the string.
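As a rough illustration of what matching drinkSentence means, the sketch below checks a list of token type names against ARTICLE? DRINKING_VESSEL OF TEXT by hand. The generated DrinkParser is far more general (it builds a parse tree and recovers from errors); the class and method names here are hypothetical, and the sketch simply ignores anything that follows the drink.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical hand-written check for one rule; not the generated parser.
final class DrinkSentenceSketch {
    /** Returns true if the token types start with ARTICLE? DRINKING_VESSEL OF TEXT. */
    static boolean matchesDrinkSentence(List<String> types) {
        int pos = 0;
        // ARTICLE? : consume an article only if one is present.
        if (typeAt(types, pos).equals("ARTICLE")) {
            pos++;
        }
        // DRINKING_VESSEL and OF are both mandatory, in that order.
        if (!typeAt(types, pos++).equals("DRINKING_VESSEL")) {
            return false;
        }
        if (!typeAt(types, pos++).equals("OF")) {
            return false;
        }
        // drink : TEXT
        return typeAt(types, pos).equals("TEXT");
    }

    private static String typeAt(List<String> types, int pos) {
        return pos < types.size() ? types.get(pos) : "<EOF>";
    }

    public static void main(String[] args) {
        // "a pint of beer" -> true
        System.out.println(matchesDrinkSentence(
                Arrays.asList("ARTICLE", "DRINKING_VESSEL", "OF", "TEXT")));
        // "pint of cup" -> false, because 'cup' lexes as DRINKING_VESSEL, not TEXT
        System.out.println(matchesDrinkSentence(
                Arrays.asList("DRINKING_VESSEL", "OF", "DRINKING_VESSEL")));
    }
}
```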

Listener – New to ANTLR4

Once parsing has done its job we’ll get ANTLR to walk the resulting parse tree with a listener attached. The listener is notified whenever the walker enters or exits a node for one of our parser rules. We want to find out what drink people are having, so we need to listen for the drink parser rule.

To do that we first need to run the ANTLR tool over our grammar file to generate the Java sources. Use the ‘complete’ jar for this; the runtime-only jar has no main class:

java -jar path/to/antlr/download/antlr-4.1-complete.jar /path/to/antlr/file/Drink.g

This will create a number of files in the same directory as Drink.g:

  • DrinkLexer.java
  • DrinkParser.java
  • DrinkLexer.tokens
  • Drink.tokens
  • DrinkListener.java
  • DrinkBaseListener.java

The only file we care about for listening is DrinkBaseListener.java. This is a class that has empty enter and exit methods for each of our parser rules. We simply inherit from this base class and override the enterDrink(DrinkContext ctx) method, which lets us capture every time the walker reaches a drink rule. From there we get a DrinkContext object, and we can access the text matched by the drink parser rule by calling ctx.getText():

    	public class AntlrDrinkListener extends DrinkBaseListener {
    
    		@Override
    		public void enterDrink(DrinkContext ctx) {
    			System.out.println(ctx.getText());
    		}
    
    	}
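If the enter/exit callbacks feel abstract, the following self-contained sketch imitates the idea with a hand-built tree and no ANTLR classes at all. Node, Listener, walk and traceSample are invented names for this illustration, not part of the ANTLR API; the only assumption carried over is that a walker visits the tree depth-first, calling an enter method before a node’s children and an exit method after them.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical miniature of the listener pattern; illustration only.
final class ListenerWalkSketch {
    /** A tree node: the rule that produced it, the text it covers, and child nodes. */
    static final class Node {
        final String rule;
        final String text;
        final List<Node> children = new ArrayList<>();

        Node(String rule, String text) {
            this.rule = rule;
            this.text = text;
        }
    }

    interface Listener {
        void enter(Node node);
        void exit(Node node);
    }

    /** Depth-first walk: enter the node, walk its children, then exit the node. */
    static void walk(Listener listener, Node node) {
        listener.enter(node);
        for (Node child : node.children) {
            walk(listener, child);
        }
        listener.exit(node);
    }

    /** Builds a tiny tree for 'a pint of beer' and records the callbacks fired. */
    static List<String> traceSample() {
        Node sentence = new Node("drinkSentence", "a pint of beer");
        sentence.children.add(new Node("drink", "beer"));

        List<String> events = new ArrayList<>();
        walk(new Listener() {
            @Override
            public void enter(Node node) {
                events.add("enter " + node.rule + ": " + node.text);
            }

            @Override
            public void exit(Node node) {
                events.add("exit " + node.rule);
            }
        }, sentence);
        return events;
    }

    public static void main(String[] args) {
        // prints the four events in order:
        // enter drinkSentence: a pint of beer
        // enter drink: beer
        // exit drink
        // exit drinkSentence
        traceSample().forEach(System.out::println);
    }
}
```

In the real project the tree comes from the parser and the walking is done by ANTLR, so all you write is the listener subclass shown above.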
    

Finishing off

Now we need the entry point for all this: Java code that tells ANTLR to lex the string, parse it, and walk the resulting parse tree with our listener attached:

    	private void printDrink(String drinkSentence) {
    		// Get our lexer
    		DrinkLexer lexer = new DrinkLexer(new ANTLRInputStream(drinkSentence));
    
    		// Get a list of matched tokens
    		CommonTokenStream tokens = new CommonTokenStream(lexer);
    
    		// Pass the tokens to the parser
    		DrinkParser parser = new DrinkParser(tokens);
    
    		// Specify our entry point
    		DrinkSentenceContext drinkSentenceContext = parser.drinkSentence();
    
    		// Walk it and attach our listener
    		ParseTreeWalker walker = new ParseTreeWalker();
    		AntlrDrinkListener listener = new AntlrDrinkListener();
    		walker.walk(listener, drinkSentenceContext);
    	}
    

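Once you’ve merged this somewhere suitable in your project, you can call the printDrink method above with a string and the drink should be printed to standard output (via our listener). I’ve left out imports and package declarations; the runtime classes used here come from org.antlr.v4.runtime and org.antlr.v4.runtime.tree, not the older org.antlr.runtime package.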

That’s pretty much the gist of ANTLR. This example doesn’t really do anything useful, and a lexical problem this trivial is not worth using ANTLR for, but hopefully you can build upon it to make something useful.

    13 comments on “ANTLR 4: using the lexer, parser and listener with example grammar”

    1. Hello, Very nice tutorial and extremely helpful, I’ve recently started with ANTRL and I’m thinking of creating a new language with all the basic principles and guidelines. It would be very delightful if you’d direct me on how to go about it.

    2. Rahuls-MacBook-Pro:Documents ashutoshdwivedi$ java -jar antlr4-runtime-4.1.jar /Users/ashutoshdwivedi/work/Test/src/Drink.g
      no main manifest attribute, in antlr4-runtime-4.1.jar getting this error

    3. Thanks for the post. People please note that for the printDrink method imports you need are
      import org.antlr.v4.runtime.ANTLRInputStream;
      import org.antlr.v4.runtime.CommonTokenStream;

      and NOT
      import org.antlr.runtime.ANTLRInputStream;
      import org.antlr.runtime.CommonTokenStream;

    4. Hi Alex,

      Can we use ANTLR using c# with Lexer and Parser?
      Please reply.

      I am waiting your response.

      Thanks in advance.

    5. hello,
      I am a new driver for Antlr4, kindly I asked how we can augment the type checking or semantic analysis to it.
      thanks
