ANTLR4: How to match extra spaces at the beginning of a line?

I tried to match the extra space at the beginning of a line, but it didn't work. How should I modify the lexer rule so that it matches?

TestParser.g4:

parser grammar TestParser;

options { tokenVocab=TestLexer; }

root
    : choice+ EOF
    ;

choice:
    QUESTION OPTION+;

TestLexer.g4:

lexer grammar TestLexer;

@lexer::members {
    private boolean aheadIsNotAnOption(IntStream _input) {
        int nextChar = _input.LA(1);
        return nextChar != 'A' && nextChar != 'B' && nextChar != 'C' && nextChar != 'D';
    }
}

QUESTION:                      {getCharPositionInLine() == 0}? DIGIT DOT CONTENT -> pushMode(OPTION_MODE);
OTHER:                         . -> skip;

mode OPTION_MODE;
OPTION:                        OPTION_HEADER DOT CONTENT;
NOT_OPTION_LINE:               NEWLINE SPACE* {aheadIsNotAnOption(_input)}? -> popMode, skip;
OPTION_OTHER:                  OTHER -> skip;

fragment DIGIT:                [0-9]+;
fragment OPTION_HEADER:        [A-D];
fragment CONTENT:              [a-zA-Z0-9 ,.'?/()!]+? {_input.LA(1) == '\n'}?;
fragment DOT:                  '.';
fragment NEWLINE:              '\n';
fragment SPACE:                ' ';

Text:

1.title
A.aaa
B.bbb
 C.ccc
2.title
A.aaa

Java code:

import org.antlr.v4.runtime.CharStream;
import org.antlr.v4.runtime.CharStreams;
import org.antlr.v4.runtime.CommonTokenStream;
import org.antlr.v4.runtime.Lexer;
import org.antlr.v4.runtime.tree.ParseTree;

import java.io.IOException;
import java.net.URISyntaxException;

public class TestParseTest {

    public static void main(String[] args) throws URISyntaxException, IOException {
        CharStream charStream = CharStreams.fromString("1.title\n" +
                "A.aaa\n" +
                "B.bbb\n" +
                " C.ccc\n" +
                "2.title\n" +
                "A.aaa\n");
        Lexer lexer = new TestLexer(charStream);

        CommonTokenStream tokens = new CommonTokenStream(lexer);
        TestParser parser = new TestParser(tokens);
        ParseTree parseTree = parser.root();

        System.out.println(parseTree.toStringTree(parser));
    }

}

The output is as follows:

(root (choice 1.title A.aaa B.bbb) (choice 2.title A.aaa) <EOF>)

The idea is that when a non-option line is encountered in OPTION_MODE, the mode is popped. However, when a line has an extra space at its beginning, it is not matched as expected.

It seems that the \n before C.ccc matches NOT_OPTION_LINE, causing the mode to be popped? I want C.ccc to be matched as OPTION. Thanks.

Answer

I think you're making it a bit too complex. As I see it, a line either starts as a question ([ \t]* [0-9]+) or as an option ([ \t]* [A-Z]). In all other cases, just ignore the line (. -> skip). That boils down to the following grammar:

lexer grammar TestLexer;

QuestionStart
 : {getCharPositionInLine() == 0}? [ \t]* [0-9]+ '.' -> pushMode(ContentMode)
 ;

OptionStart
 : {getCharPositionInLine() == 0}? [ \t]* [A-Z] '.' -> pushMode(ContentMode)
 ;

Ignored
 : . -> skip
 ;

mode ContentMode;

  Content
   : ~[\r\n]+
   ;

  QuestionEnd
   : [\r\n]+ -> skip, popMode
   ;
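
The line classification behind these rules can also be tried outside ANTLR with plain java.util.regex. This is only a rough sketch of the same idea, not the lexer itself: the class name LineKind and the method classify are made up for illustration, and the two patterns simply restate the QuestionStart and OptionStart prefixes.

```java
import java.util.regex.Pattern;

public class LineKind {
    // Optional leading spaces/tabs, one or more digits, then a dot: a question line.
    private static final Pattern QUESTION = Pattern.compile("^[ \\t]*[0-9]+\\..*");
    // Optional leading spaces/tabs, one capital letter, then a dot: an option line.
    private static final Pattern OPTION = Pattern.compile("^[ \\t]*[A-Z]\\..*");

    // Hypothetical helper: returns "question", "option" or "ignored" for one line.
    public static String classify(String line) {
        if (QUESTION.matcher(line).matches()) return "question";
        if (OPTION.matcher(line).matches()) return "option";
        return "ignored";
    }

    public static void main(String[] args) {
        String source = "1.title\nA.aaa\nB.bbb\n C.ccc\n  ...ignored ...\n2.title\nA.aaa\n";
        for (String line : source.split("\n")) {
            System.out.println(classify(line) + ": " + line);
        }
    }
}
```

The line with the extra leading space is classified as an option here because the whitespace is part of the pattern itself, which is the same reason OptionStart accepts it.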

A parser grammar could then look like this:

parser grammar TestParser;

options { tokenVocab=TestLexer; }

root
 : question+ EOF
 ;

question
 : QuestionStart Content option+
 ;

option
 : OptionStart Content+
 ;

And the Java code:

String source = "1.title\n" +
    "A.aaa\n" +
    "B.bbb\n" +
    " C.ccc\n" +
    "  ...ignored ...\n" +
    "2.title\n" +
    "A.aaa\n";

Lexer lexer = new TestLexer(CharStreams.fromString(source));

CommonTokenStream tokens = new CommonTokenStream(lexer);
TestParser parser = new TestParser(tokens);
ParseTree parseTree = parser.root();

System.out.println(parseTree.toStringTree(parser));

will then print:

(root (question 1. title (option A. aaa) (option B. bbb) (option  C. ccc)) (question 2. title (option A. aaa)) <EOF>)

EDIT

Given that you already have target-specific code in your grammar, you could just trim the spaces from an option like this (untested!):

OptionStart
 : {getCharPositionInLine() == 0}? [ \t]* [A-Z] '.'
   {setText(getText().trim());}
   -> pushMode(ContentMode)
 ;
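
Since the rule above is untested, the string handling at least can be checked in isolation. String.trim() strips leading and trailing characters up to U+0020, which covers both the space and the tab that [ \t]* may consume. The class TrimDemo and the method normalizeHeader are hypothetical names used only for this sketch; in the lexer action the same call would run on getText().

```java
public class TrimDemo {
    // Mirrors the action body setText(getText().trim()) on a plain String.
    public static String normalizeHeader(String tokenText) {
        return tokenText.trim();
    }

    public static void main(String[] args) {
        // Brackets make any remaining whitespace visible.
        System.out.println("[" + normalizeHeader(" C.") + "]");
        System.out.println("[" + normalizeHeader("\t B.") + "]");
    }
}
```

This only shows that the token text would lose its indentation; whether the action runs as intended inside the generated lexer still needs to be confirmed against your ANTLR version.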
