
ANTLR4: How to match extra spaces at the beginning of a line?

I tried to match extra spaces at the beginning of a line, but it didn’t work. How should I modify the lexer rules so that such lines are still matched?

TestParser.g4:

parser grammar TestParser;

options { tokenVocab=TestLexer; }

root
    : choice+ EOF
    ;

choice:
    QUESTION OPTION+;

TestLexer.g4:

lexer grammar TestLexer;

@lexer::members {
    private boolean aheadIsNotAnOption(IntStream _input) {
        int nextChar = _input.LA(1);
        return nextChar != 'A' && nextChar != 'B' && nextChar != 'C' && nextChar != 'D';
    }
}

QUESTION:                      {getCharPositionInLine() == 0}? DIGIT DOT CONTENT -> pushMode(OPTION_MODE);
OTHER:                         . -> skip;

mode OPTION_MODE;
OPTION:                        OPTION_HEADER DOT CONTENT;
NOT_OPTION_LINE:               NEWLINE SPACE* {aheadIsNotAnOption(_input)}? -> popMode, skip;
OPTION_OTHER:                  OTHER -> skip;

fragment DIGIT:                [0-9]+;
fragment OPTION_HEADER:        [A-D];
fragment CONTENT:              [a-zA-Z0-9 ,.'?/()!]+? {_input.LA(1) == '\n'}?;
fragment DOT:                  '.';
fragment NEWLINE:              '\n';
fragment SPACE:                ' ';

Text:

1.title
A.aaa
B.bbb
 C.ccc
2.title
A.aaa

Java code:

import org.antlr.v4.runtime.CharStream;
import org.antlr.v4.runtime.CharStreams;
import org.antlr.v4.runtime.CommonTokenStream;
import org.antlr.v4.runtime.Lexer;
import org.antlr.v4.runtime.tree.ParseTree;

import java.io.IOException;
import java.net.URISyntaxException;

public class TestParseTest {

    public static void main(String[] args) throws URISyntaxException, IOException {
        CharStream charStream = CharStreams.fromString("1.title\n" +
                "A.aaa\n" +
                "B.bbb\n" +
                " C.ccc\n" +
                "2.title\n" +
                "A.aaa\n");
        Lexer lexer = new TestLexer(charStream);

        CommonTokenStream tokens = new CommonTokenStream(lexer);
        TestParser parser = new TestParser(tokens);
        ParseTree parseTree = parser.root();

        System.out.println(parseTree.toStringTree(parser));
    }

}

The output is as follows:

(root (choice 1.title A.aaa B.bbb) (choice 2.title A.aaa) <EOF>)

The idea is that when a non-option line is encountered in OPTION_MODE, the lexer pops back to the default mode. But when an option line starts with an extra space, it is not matched as expected: C.ccc is missing from the output above.

It looks like the \n before C.ccc matches NOT_OPTION_LINE and pops the mode. I want C.ccc to be matched as an OPTION. Thanks.
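That guess is consistent with how the rules read. ANTLR picks the longest match, and when SPACE* matches zero spaces, LA(1) in the predicate is the space in front of C, which is not one of A to D, so the predicate succeeds and NOT_OPTION_LINE matches just the newline (it is listed before OPTION_OTHER, so it wins the tie for that single character). The path that also consumes the space fails, because its LA(1) is C. Back in the default mode, QUESTION cannot match " C.ccc" (it needs a digit at column 0), so OTHER skips it. The plain-Java sketch below replays that lookahead check on a string. It does not use the ANTLR runtime: la is a hypothetical stand-in for IntStream.LA, and the space-skipping variant is an untested idea, not something proposed in the thread.

```java
class LookaheadDemo {

    // Hypothetical stand-in for ANTLR's IntStream.LA(i): the i-th character
    // (1-based) at or after index pos, or -1 (like IntStream.EOF) past the end.
    static int la(String input, int pos, int i) {
        int idx = pos + i - 1;
        return idx < input.length() ? input.charAt(idx) : -1;
    }

    // Same test as the question's aheadIsNotAnOption: only LA(1) is inspected.
    static boolean aheadIsNotAnOption(String input, int pos) {
        int c = la(input, pos, 1);
        return c != 'A' && c != 'B' && c != 'C' && c != 'D';
    }

    // Untested variant: skip spaces first, then look at the first non-space character.
    static boolean aheadIsNotAnOptionSkippingSpaces(String input, int pos) {
        int i = 1;
        while (la(input, pos, i) == ' ') {
            i++;
        }
        int c = la(input, pos, i);
        return c < 'A' || c > 'D';
    }

    public static void main(String[] args) {
        String rest = "\n C.ccc\n";
        int afterNewline = 1; // NEWLINE matched, SPACE* has matched nothing yet
        System.out.println(aheadIsNotAnOption(rest, afterNewline));               // true
        System.out.println(aheadIsNotAnOptionSkippingSpaces(rest, afterNewline)); // false
    }
}
```

In the grammar itself the loop would call _input.LA(i) instead of the la helper; that change has not been tried against the ANTLR runtime.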

Answer

I think you’re making it a bit too complex. As I see it, lines either start as a question ([ \t]* [0-9]+) or as an option ([ \t]* [A-Z]). In all other cases, just ignore the line (. -> skip). That boils down to the following grammar:

lexer grammar TestLexer;

QuestionStart
 : {getCharPositionInLine() == 0}? [ \t]* [0-9]+ '.' -> pushMode(ContentMode)
 ;

OptionStart
 : {getCharPositionInLine() == 0}? [ \t]* [A-Z] '.' -> pushMode(ContentMode)
 ;

Ignored
 : . -> skip
 ;

mode ContentMode;

  Content
   : ~[\r\n]+
   ;

  QuestionEnd
   : [\r\n]+ -> skip, popMode
   ;
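Stripped of the mode handling, the default-mode rules above amount to a check on how each line starts. The plain-Java sketch below (no ANTLR runtime; the class, enum and method names are made up for illustration) expresses the same check with java.util.regex, which can be handy for trying out prefix patterns before changing the grammar. It is only an analogy: the real lexer works character by character and also tokenizes the rest of each line in ContentMode.

```java
import java.util.regex.Pattern;

class LineKind {

    enum Kind { QUESTION, OPTION, IGNORED }

    // Regex counterparts of QuestionStart and OptionStart (the text before the mode push).
    // Matcher.lookingAt() only matches at the start of the line, which plays the role
    // of the {getCharPositionInLine() == 0}? predicate.
    private static final Pattern QUESTION_START = Pattern.compile("[ \\t]*[0-9]+\\.");
    private static final Pattern OPTION_START = Pattern.compile("[ \\t]*[A-Z]\\.");

    static Kind classify(String line) {
        if (QUESTION_START.matcher(line).lookingAt()) {
            return Kind.QUESTION;
        }
        if (OPTION_START.matcher(line).lookingAt()) {
            return Kind.OPTION;
        }
        // The lexer drops such lines one character at a time via Ignored.
        return Kind.IGNORED;
    }

    public static void main(String[] args) {
        String[] lines = {"1.title", "A.aaa", " C.ccc", "  ...ignored ...", "2.title"};
        for (String line : lines) {
            System.out.println(classify(line) + "\t" + line);
        }
    }
}
```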

A parser grammar could then look like this:

parser grammar TestParser;

options { tokenVocab=TestLexer; }

root
 : question+ EOF
 ;

question
 : QuestionStart Content option+
 ;

option
 : OptionStart Content+
 ;

And the Java code:

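import org.antlr.v4.runtime.CharStreams;
import org.antlr.v4.runtime.CommonTokenStream;
import org.antlr.v4.runtime.Lexer;
import org.antlr.v4.runtime.tree.ParseTree;

public class TestParseTest {

    public static void main(String[] args) {
        // The indented option and the "ignored" line exercise the new rules.
        String source = "1.title\n" +
            "A.aaa\n" +
            "B.bbb\n" +
            " C.ccc\n" +
            "  ...ignored ...\n" +
            "2.title\n" +
            "A.aaa\n";

        Lexer lexer = new TestLexer(CharStreams.fromString(source));

        CommonTokenStream tokens = new CommonTokenStream(lexer);
        TestParser parser = new TestParser(tokens);
        ParseTree parseTree = parser.root();

        System.out.println(parseTree.toStringTree(parser));
    }
}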

will then print:

(root (question 1. title (option A. aaa) (option B. bbb) (option  C. ccc)) (question 2. title (option A. aaa)) <EOF>)
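To see how the parser rules root, question and option consume the tokens, here is a hand-written recursive-descent sketch in plain Java. It assumes the token sequence the lexer produces for the sample input (the ignored line and the newlines are skipped, so they do not appear) and prints a bracketed tree in the same shape as the toStringTree output above; note how the leading space stays inside the " C." token. The Tok and TreeSketch names are hypothetical, and ANTLR's generated parser uses its own adaptive prediction rather than this simple one-token lookahead.

```java
import java.util.List;

class TreeSketch {

    // Hypothetical token: a type name as in TestLexer, plus the matched text.
    static final class Tok {
        final String type;
        final String text;

        Tok(String type, String text) {
            this.type = type;
            this.text = text;
        }
    }

    private final List<Tok> toks;
    private int pos = 0;

    TreeSketch(List<Tok> toks) {
        this.toks = toks;
    }

    private boolean at(String type) {
        return pos < toks.size() && toks.get(pos).type.equals(type);
    }

    private String expect(String type) {
        if (!at(type)) {
            throw new IllegalStateException("expected " + type + " at token " + pos);
        }
        return toks.get(pos++).text;
    }

    // root : question+ EOF ;
    String root() {
        StringBuilder sb = new StringBuilder("(root");
        do {
            sb.append(' ').append(question());
        } while (at("QuestionStart"));
        if (pos != toks.size()) {
            throw new IllegalStateException("expected EOF at token " + pos);
        }
        return sb.append(" <EOF>)").toString();
    }

    // question : QuestionStart Content option+ ;
    private String question() {
        StringBuilder sb = new StringBuilder("(question ");
        sb.append(expect("QuestionStart")).append(' ').append(expect("Content"));
        do {
            sb.append(' ').append(option());
        } while (at("OptionStart"));
        return sb.append(')').toString();
    }

    // option : OptionStart Content+ ;
    private String option() {
        StringBuilder sb = new StringBuilder("(option ");
        sb.append(expect("OptionStart"));
        do {
            sb.append(' ').append(expect("Content"));
        } while (at("Content"));
        return sb.append(')').toString();
    }

    public static void main(String[] args) {
        List<Tok> toks = List.of(
            new Tok("QuestionStart", "1."), new Tok("Content", "title"),
            new Tok("OptionStart", "A."), new Tok("Content", "aaa"),
            new Tok("OptionStart", "B."), new Tok("Content", "bbb"),
            new Tok("OptionStart", " C."), new Tok("Content", "ccc"),
            new Tok("QuestionStart", "2."), new Tok("Content", "title"),
            new Tok("OptionStart", "A."), new Tok("Content", "aaa"));
        // Prints the same string as the toStringTree output shown above.
        System.out.println(new TreeSketch(toks).root());
    }
}
```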

EDIT

Given that you already have target-specific code in your grammar, you could just trim the spaces from an option like this (untested!):

OptionStart
 : {getCharPositionInLine() == 0}? [ \t]* [A-Z] '.'
   {setText(getText().trim());}
   -> pushMode(ContentMode)
 ;
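String.trim() removes every leading and trailing character whose code is at most U+0020, so it also drops a tab matched by [ \t]*, not just spaces. A minimal plain-Java check of what the setText(getText().trim()) action would do to a few raw OptionStart texts (the TrimDemo class and trimmedTokenText helper are hypothetical):

```java
class TrimDemo {

    // Mirrors the action {setText(getText().trim());} applied to a raw token text.
    static String trimmedTokenText(String rawText) {
        // trim() drops leading and trailing chars <= U+0020,
        // so both ' ' and '\t' from the [ \t]* prefix disappear.
        return rawText.trim();
    }

    public static void main(String[] args) {
        for (String raw : new String[] {"A.", " C.", "\t D."}) {
            System.out.println("[" + raw + "] -> [" + trimmedTokenText(raw) + "]");
        }
    }
}
```

With the action in place, the indented option should then print as (option C. ccc); that follows from how trim() is defined, but like the rule itself it has not been run through ANTLR here.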
User contributions licensed under: CC BY-SA