I tried to match the extra space at the beginning of the line, but it didn’t work. How can I modify the lexer rule so that it matches?
parser grammar TestParser;

options { tokenVocab=TestLexer; }

root : choice+ EOF ;

choice: QUESTION OPTION+;
lexer grammar TestLexer;

@lexer::members {
    private boolean aheadIsNotAnOption(IntStream _input) {
        int nextChar = _input.LA(1);
        return nextChar != 'A' && nextChar != 'B' && nextChar != 'C' && nextChar != 'D';
    }
}

QUESTION: {getCharPositionInLine() == 0}? DIGIT DOT CONTENT -> pushMode(OPTION_MODE);
OTHER: . -> skip;

mode OPTION_MODE;
OPTION: OPTION_HEADER DOT CONTENT;
NOT_OPTION_LINE: NEWLINE SPACE* {aheadIsNotAnOption(_input)}? -> popMode, skip;
OPTION_OTHER: OTHER -> skip;

fragment DIGIT: [0-9]+;
fragment OPTION_HEADER: [A-D];
fragment CONTENT: [a-zA-Z0-9 ,.'?/()!]+? {_input.LA(1) == '\n'}?;
fragment DOT: '.';
fragment NEWLINE: '\n';
fragment SPACE: ' ';
1.title
A.aaa
B.bbb
 C.ccc
2.title
A.aaa
Java code:
import org.antlr.v4.runtime.CharStream;
import org.antlr.v4.runtime.CharStreams;
import org.antlr.v4.runtime.CommonTokenStream;
import org.antlr.v4.runtime.Lexer;
import org.antlr.v4.runtime.tree.ParseTree;

import java.io.IOException;
import java.net.URISyntaxException;

public class TestParseTest {
    public static void main(String[] args) throws URISyntaxException, IOException {
        CharStream charStream = CharStreams.fromString("1.title\n" +
                "A.aaa\n" +
                "B.bbb\n" +
                " C.ccc\n" +
                "2.title\n" +
                "A.aaa\n");
        Lexer lexer = new TestLexer(charStream);
        CommonTokenStream tokens = new CommonTokenStream(lexer);
        TestParser parser = new TestParser(tokens);
        ParseTree parseTree = parser.root();
        System.out.println(parseTree.toStringTree(parser));
    }
}
The output is as follows:
(root (choice 1.title A.aaa B.bbb) (choice 2.title A.aaa) <EOF>)
The idea is that when a non-option line is encountered in OPTION_MODE, the mode is popped. However, when there is an extra space at the beginning of a line, the line is not matched as expected.
It seems that the \n before C.ccc causes the mode to be popped? I want C.ccc to match as OPTION. Thanks.
I think you’re making it a bit too complex. As I see it, lines either start as a question ([ \t]* [0-9]+) or as an option ([ \t]* [A-Z]). In all other cases, just ignore the line (. -> skip). That boils down to the following grammar:
lexer grammar TestLexer;

QuestionStart
 : {getCharPositionInLine() == 0}? [ \t]* [0-9]+ '.' -> pushMode(ContentMode)
 ;

OptionStart
 : {getCharPositionInLine() == 0}? [ \t]* [A-Z] '.' -> pushMode(ContentMode)
 ;

Ignored
 : . -> skip
 ;

mode ContentMode;

Content
 : ~[\r\n]+
 ;

QuestionEnd
 : [\r\n]+ -> skip, popMode
 ;
A parser grammar could then look like this:
parser grammar TestParser;

options { tokenVocab=TestLexer; }

root
 : question+ EOF
 ;

question
 : QuestionStart Content option+
 ;

option
 : OptionStart Content+
 ;
And the Java code:
String source = "1.title\n" +
        "A.aaa\n" +
        "B.bbb\n" +
        " C.ccc\n" +
        " ...ignored ...\n" +
        "2.title\n" +
        "A.aaa\n";

Lexer lexer = new TestLexer(CharStreams.fromString(source));
CommonTokenStream tokens = new CommonTokenStream(lexer);
TestParser parser = new TestParser(tokens);
ParseTree parseTree = parser.root();
System.out.println(parseTree.toStringTree(parser));
will then print:
(root (question 1. title (option A. aaa) (option B. bbb) (option C. ccc)) (question 2. title (option A. aaa)) <EOF>)
Given that you already have target-specific code in your grammar, you could just trim the spaces from an option like this (untested!):
OptionStart
 : {getCharPositionInLine() == 0}? [ \t]* [A-Z] '.' {setText(getText().trim());} -> pushMode(ContentMode)
 ;
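To sanity-check which lines the simplified lexer rules are meant to pick up, the same classification can be sketched in plain Java with regular expressions, without ANTLR. This is only an illustration: the class `LineClassifier` and its `classify` method are hypothetical names, and the patterns are an approximation of QuestionStart and OptionStart (for instance, they also accept an empty content part, which the grammar's Content rule does not).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LineClassifier {
    // Approximates QuestionStart: optional spaces/tabs, digits, a dot, then the rest of the line.
    private static final Pattern QUESTION = Pattern.compile("^[ \\t]*([0-9]+\\.)(.*)$");
    // Approximates OptionStart: optional spaces/tabs, one upper-case letter, a dot, then the rest.
    private static final Pattern OPTION = Pattern.compile("^[ \\t]*([A-Z]\\.)(.*)$");

    // Returns a short description of how a single line (without its line terminator) is classified.
    static String classify(String line) {
        Matcher q = QUESTION.matcher(line);
        if (q.matches()) {
            return "QUESTION " + q.group(1) + " " + q.group(2);
        }
        Matcher o = OPTION.matcher(line);
        if (o.matches()) {
            // group(1) excludes the leading whitespace, similar in effect to trimming the header.
            return "OPTION " + o.group(1) + " " + o.group(2);
        }
        return "IGNORED";
    }

    public static void main(String[] args) {
        String source = "1.title\n" +
                "A.aaa\n" +
                "B.bbb\n" +
                " C.ccc\n" +
                " ...ignored ...\n" +
                "2.title\n" +
                "A.aaa\n";
        // Classify each input line in turn.
        for (String line : source.split("\n")) {
            System.out.println(classify(line));
        }
    }
}
```

Because the capture group starts after the leading whitespace, the indented ` C.ccc` line is reported with the header `C.`, which is what the `trim()` action in the lexer rule above is intended to achieve.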