I tried to match the extra space at the beginning of the line, but it didn’t work. How can I modify the lexer rule so that it matches?
TestParser.g4:
parser grammar TestParser;
options { tokenVocab=TestLexer; }
root
: choice+ EOF
;
choice:
QUESTION OPTION+;
TestLexer.g4:
lexer grammar TestLexer;
@lexer::members {
    // True when the next character does not start an option (A-D).
    private boolean aheadIsNotAnOption(IntStream _input) {
        int nextChar = _input.LA(1);
        return nextChar != 'A' && nextChar != 'B' && nextChar != 'C' && nextChar != 'D';
    }
}
QUESTION: {getCharPositionInLine() == 0}? DIGIT DOT CONTENT -> pushMode(OPTION_MODE);
OTHER: . -> skip;
mode OPTION_MODE;
OPTION: OPTION_HEADER DOT CONTENT;
NOT_OPTION_LINE: NEWLINE SPACE* {aheadIsNotAnOption(_input)}? -> popMode, skip;
OPTION_OTHER: OTHER -> skip;
fragment DIGIT: [0-9]+;
fragment OPTION_HEADER: [A-D];
fragment CONTENT: [a-zA-Z0-9 ,.'?/()!]+? {_input.LA(1) == '\n'}?;
fragment DOT: '.';
fragment NEWLINE: '\n';
fragment SPACE: ' ';
Text:
1.title
A.aaa
B.bbb
 C.ccc
2.title
A.aaa
Java code:
import org.antlr.v4.runtime.CharStream;
import org.antlr.v4.runtime.CharStreams;
import org.antlr.v4.runtime.CommonTokenStream;
import org.antlr.v4.runtime.Lexer;
import org.antlr.v4.runtime.tree.ParseTree;
import java.io.IOException;
import java.net.URISyntaxException;
public class TestParseTest {
    public static void main(String[] args) throws URISyntaxException, IOException {
        // Note the extra leading space before "C.ccc".
        CharStream charStream = CharStreams.fromString("1.title\n" +
                "A.aaa\n" +
                "B.bbb\n" +
                " C.ccc\n" +
                "2.title\n" +
                "A.aaa\n");
        Lexer lexer = new TestLexer(charStream);
        CommonTokenStream tokens = new CommonTokenStream(lexer);
        TestParser parser = new TestParser(tokens);
        ParseTree parseTree = parser.root();
        System.out.println(parseTree.toStringTree(parser));
    }
}
The output is as follows:
(root (choice 1.title A.aaa B.bbb) (choice 2.title A.aaa) <EOF>)
The idea is that when a non-option line is encountered in OPTION_MODE, the mode is popped. Now, when there is an extra space at the beginning of a line, that line is not matched as expected.
It seems that the \n before C.ccc matches NOT_OPTION_LINE, causing the mode to be popped? I want C.ccc to match as OPTION. Thanks.
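The suspicion can be checked outside ANTLR with a small simulation. This is only a sketch: it assumes the lexer evaluates the predicate at the point where it appears in the rule, so that SPACE* may also match zero spaces. PredicateCheck and longestPoppingMatch are hypothetical names used for illustration and are not part of the grammar.

```java
class PredicateCheck {
    // Same test as aheadIsNotAnOption in the lexer, applied to a plain character code.
    static boolean aheadIsNotAnOption(int nextChar) {
        return nextChar != 'A' && nextChar != 'B' && nextChar != 'C' && nextChar != 'D';
    }

    // Simulates NEWLINE SPACE* {aheadIsNotAnOption}? starting at the newline at index nl.
    // Returns the largest number of spaces after which the predicate passes, or -1 if it never does.
    static int longestPoppingMatch(String input, int nl) {
        int best = -1;
        int spaces = 0;
        int i = nl + 1;
        while (true) {
            int la = i < input.length() ? input.charAt(i) : -1; // -1 stands in for EOF
            if (aheadIsNotAnOption(la)) {
                best = spaces; // the rule could end here and pop the mode
            }
            if (la != ' ') {
                break; // SPACE* cannot consume anything else
            }
            i++;
            spaces++;
        }
        return best;
    }

    public static void main(String[] args) {
        // Newline followed by " C.ccc": the predicate passes after zero spaces (lookahead is ' ').
        System.out.println(longestPoppingMatch("B.bbb\n C.ccc\n", 5)); // 0
        // Newline followed directly by "B.bbb": the predicate never passes.
        System.out.println(longestPoppingMatch("A.aaa\nB.bbb\n", 5)); // -1
    }
}
```

Under that assumption, the bare \n is enough to match NOT_OPTION_LINE, because the character after it is a space rather than A-D. That would be consistent with the output above, where C.ccc is lost.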
Answer
I think you’re making it a bit too complex. As I see it, lines either start as a question ([ \t]* [0-9]+) or as an option ([ \t]* [A-Z]). In all other cases, just ignore the line (. -> skip). That boils down to the following grammar:
lexer grammar TestLexer;
QuestionStart
: {getCharPositionInLine() == 0}? [ \t]* [0-9]+ '.' -> pushMode(ContentMode)
;
OptionStart
: {getCharPositionInLine() == 0}? [ \t]* [A-Z] '.' -> pushMode(ContentMode)
;
Ignored
: . -> skip
;
mode ContentMode;
Content
: ~[\r\n]+
;
QuestionEnd
: [\r\n]+ -> skip, popMode
;
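To make the line-based idea concrete without generating a lexer, the three cases can be mirrored with plain regular expressions. This is a rough sketch, not the ANTLR lexer itself: LineKinds and classify are hypothetical names, and the patterns only approximate the QuestionStart, OptionStart and Ignored rules.

```java
import java.util.regex.Pattern;

class LineKinds {
    // Optional leading spaces/tabs, digits, then a dot: mirrors QuestionStart.
    private static final Pattern QUESTION_START = Pattern.compile("^[ \\t]*[0-9]+\\.");
    // Optional leading spaces/tabs, one capital letter, then a dot: mirrors OptionStart.
    private static final Pattern OPTION_START = Pattern.compile("^[ \\t]*[A-Z]\\.");

    // Classifies a single line (without its line terminator).
    static String classify(String line) {
        if (QUESTION_START.matcher(line).find()) {
            return "question";
        }
        if (OPTION_START.matcher(line).find()) {
            return "option";
        }
        return "ignored"; // everything else is skipped, like the Ignored rule
    }

    public static void main(String[] args) {
        String source = "1.title\nA.aaa\nB.bbb\n C.ccc\n ...ignored ...\n2.title\nA.aaa\n";
        for (String line : source.split("\n")) {
            System.out.println(classify(line) + ": " + line);
        }
    }
}
```

The point of the sketch is that the indented line " C.ccc" is still classified as an option, because the leading whitespace is part of the pattern rather than something that has to be handled by a separate rule.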
A parser grammar could then look like this:
parser grammar TestParser;
options { tokenVocab=TestLexer; }
root
: question+ EOF
;
question
: QuestionStart Content option+
;
option
: OptionStart Content+
;
And the Java code:
String source = "1.title\n" +
        "A.aaa\n" +
        "B.bbb\n" +
        " C.ccc\n" +
        " ...ignored ...\n" +
        "2.title\n" +
        "A.aaa\n";
Lexer lexer = new TestLexer(CharStreams.fromString(source));
CommonTokenStream tokens = new CommonTokenStream(lexer);
TestParser parser = new TestParser(tokens);
ParseTree parseTree = parser.root();
System.out.println(parseTree.toStringTree(parser));
will then print:
(root (question 1. title (option A. aaa) (option B. bbb) (option C. ccc)) (question 2. title (option A. aaa)) <EOF>)
EDIT
Given that you already have target-specific code in your grammar, you could just trim the spaces from an option like this (untested!):
OptionStart
: {getCharPositionInLine() == 0}? [ \t]* [A-Z] '.'
{setText(getText().trim());}
-> pushMode(ContentMode)
;
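As a quick illustration of what the trim() call in that action would do to the token text, here is a minimal standalone sketch. TrimDemo and normalizeOptionStart are hypothetical names, and the strings stand in for what getText() might return for an OptionStart token.

```java
class TrimDemo {
    // Same operation as the action body: drop leading and trailing whitespace.
    static String normalizeOptionStart(String tokenText) {
        return tokenText.trim();
    }

    public static void main(String[] args) {
        // Sample token texts: indented with a space, with tabs, and not indented.
        String[] samples = {" C.", "\t\tD.", "A."};
        for (String sample : samples) {
            System.out.println("[" + normalizeOptionStart(sample) + "]");
        }
    }
}
```

With these samples, each token text is reduced to the bare option header followed by the dot, so the parse tree no longer carries the indentation.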