Skip to content
Advertisement

How to force ANTLR to parse all input CharStream

I’m using ANTLR4 to parse a syntax file. When I use BaseErrorListener to detect errors, I got a problem. When faced with an illegal input string, ANTLR automatically matches the appropriate branch and then ignores the subsequent stream of characters even if it contains errors. And I want to detect that error. Here are my g4 file and java file.
TransitionLexer is my lexer file and TransitionCondition is my parser file. ErrorDialogListener.java is my errorListener and Test.java id main java file.

TransitionLexer.g4

lexer grammar TransitionLexer;

BOOLEAN: 'true' | 'false';
IF: 'if';
THEN: 'then';
ELSE: 'else';

NAME: (ALPHA | CHINESE | '_')(ALPHA | CHINESE | '_'|DIGIT)*;

ALPHA: [a-zA-Z];
CHINESE: [u4e00-u9fa5];

NUMBER: INT | REAL;
INT: DIGIT+
    |'(-'DIGIT+')';
REAL: DIGIT+ ('.' DIGIT+)?
    | '(-' DIGIT+ ('.' DIGIT+)? ')';
fragment DIGIT: [0-9];

OPCOMPARE: '='|'>='|'<='|'>'|'<';
WS: [ tnr]+ ->skip;
SL_COMMENT:  '/*' .*? '*/' ->skip;

TransitionCondition.g4

grammar TransitionCondition;
import TransitionLexer;

condition : stat+;
stat : expr;
expr: expr (('and' | 'or') expr)+
    | '(' expr ')'
    | '(' var OPCOMPARE value ')'
    | booleanExpr
    | BOOLEAN
    ;

var: localStates
     | globalStates
     | connector
     ;
localStates: NAME;
globalStates: 'Top' ('.' brick)+ '.' NAME;
connector: brick '.' NAME;

value: userdefinedValue | basicValue;
userdefinedValue: NAME;
basicValue: basicValue op=('*'|'/') basicValue
                    | basicValue op=('+' | '-') basicValue
                    | basicValue ('and' | 'or') basicValue
                    | NUMBER | BOOLEAN
                    | '(' basicValue ')'
                    ;

booleanExpr: booleanExpr OPCOMPARE booleanExpr
           | '(' booleanExpr ')'
           | NUMBER (OPCOMPARE|'*'| '/'|'+'|'-') NUMBER
           ;
brick: NAME;

ErrorDialogListener.java

package errorprocess;

import java.awt.Color;
import java.awt.Container;
import java.util.Collections;
import java.util.List;

import javax.swing.JDialog;
import javax.swing.JFrame;
import javax.swing.JLabel;

import org.antlr.v4.runtime.BaseErrorListener;
import org.antlr.v4.runtime.Parser;
import org.antlr.v4.runtime.RecognitionException;
import org.antlr.v4.runtime.Recognizer;
import org.antlr.v4.runtime.atn.ATNConfigSet;
import org.antlr.v4.runtime.dfa.DFA;

public class ErrorDialogListener extends BaseErrorListener {


    @Override
    public void reportContextSensitivity(Parser recognizer, DFA dfa, int startIndex, int stopIndex, int prediction,
            ATNConfigSet configs) {
        System.out.println(dfa.toLexerString());
        System.out.println(dfa.getStates());        
        super.reportContextSensitivity(recognizer, dfa, startIndex, stopIndex, prediction, configs);
    }

    @Override
    public void syntaxError(Recognizer<?, ?> recognizer, Object offendingSymbol, int line, int charPositionInLine,
            String msg, RecognitionException e) {
        List<String> stack = ((Parser)recognizer).getRuleInvocationStack();
        Collections.reverse(stack);
        StringBuilder buf = new StringBuilder();
        buf.append("rule stack: "+stack+" ");
        buf.append("line "+line+":"+charPositionInLine+" at "+
                   offendingSymbol+": "+msg);

        JDialog dialog = new JDialog();
        Container contentPane = dialog.getContentPane();
        contentPane.add(new JLabel(buf.toString()));
        contentPane.setBackground(Color.white);
        dialog.setTitle("Syntax error");
        dialog.pack();
        dialog.setLocationRelativeTo(null);
        dialog.setDefaultCloseOperation(JFrame.DISPOSE_ON_CLOSE);
        dialog.setVisible(true);
    }

}

Test.java

package errorprocess;

import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

import org.antlr.v4.runtime.*;
import org.antlr.v4.runtime.atn.PredictionMode;

import antlr4.my.transition.TransitionConditionLexer;
import antlr4.my.transition.TransitionConditionParser;

public class Test {

    public static void main(String[] args) throws IOException {
        InputStream in = new FileInputStream("G:\AltaRica\ANTLR4\test\condition\t.expr");
        ANTLRInputStream input = new ANTLRInputStream(in);
        TransitionConditionLexer lexer = new TransitionConditionLexer(input);
        CommonTokenStream tokens = new CommonTokenStream(lexer);
        TransitionConditionParser parser = new TransitionConditionParser(tokens);
        parser.removeErrorListeners();
        parser.addErrorListener(new ErrorDialogListener());
//        parser.addErrorListener(new DiagnosticErrorListener());
//        parser.getInterpreter().setPredictionMode(PredictionMode.LL_EXACT_AMBIG_DETECTION);
//        parser.getInterpreter().setPredictionMode(PredictionMode.LL);
        parser.condition();
    }

}

The main problem

When my input is (Top.b2.states = nominal) and (b1.i1 = wrong) and (states >= 5.5), the parser works fine.
But when my input is (Top.b2.states = nominal) aaa (b1.i1 = wrong) and (states >= 5.5), the parser only parse (Top.b2.states = nominal) and ignores words after aaa which is not right with syntax file.
I guess the reason is that the parser follows the second branch of my first rule in TransitionCondition.g4, which is expr: ‘(‘ expr ‘)’, and simply ignores others. So How to force ANTLR recognize all input or how to force ANTLR only choose the first branch(expr: expr ((‘and’ | ‘or’) expr)+) in this situation?

What I’ve tried.

I tried to use DiagnosticErrorListener or override reportContextSensitivity() but it seems not worked.

Advertisement

Answer

Your main rule needs to end with the EOF token – an ANTLR-provided special token that matches end of input.

If the token’s not there, ANTLR will just parse whatever it can match and then stop. By putting the EOF at the end of your entry rule, you tell ANTLR that whatever it parses must end at the end of input.

User contributions licensed under: CC BY-SA
10 People found this is helpful
Advertisement