
How to get from parse tree to Java class file

I am working on a command-line tool with the following functionality:

  1. Parse modified .java files using an extended ANTLR4 Java9 grammar. The syntax in the files is Java, with one modification to the method declaration which includes a purpose, like in this example: public void {marketing} sendEmail() {}
  2. Collect and remove all purposes using a visitor. Collection and analysis of the purposes is the main functionality of the program.
  3. Compile and execute the Java files where the purposes are removed.

I am searching for the simplest and most effective way to achieve step 3. Building a full compiler is out of scope for my project, so I would prefer to reuse the Java compiler and run javac if possible. I have considered the following approaches, but none seem optimal:

Any input is much appreciated.



You could use TokenStreamRewriter to get the source code without the purpose node (or to accomplish many other rewriting tasks). Here’s an example (in TypeScript) from an application where I conditionally add a top-level LIMIT clause to a MySQL query:

000    /**
001     * Parses the query to see if there's already a top-level limit clause. If none was found, the query is
002     * rewritten to include a limit clause with the given values.
003     *
004     * @param query The query to check and modify.
005     * @param serverVersion The version of MySQL to use for checking.
006     * @param sqlMode The current SQL mode in the server.
007     * @param offset The limit offset to add.
008     * @param count The row count value to add.
009     *
010     * @returns The rewritten query if the original query is error free and contained no top-level LIMIT clause.
011     *          Otherwise the original query is returned.
012     */
013    public checkAndApplyLimits(query: string, serverVersion: number, sqlMode: string, offset: number,
014        count: number): [string, boolean] {
016        this.applyServerDetails(serverVersion, sqlMode);
017        const tree = this.startParsing(query, false, MySQLParseUnit.Generic);
018        if (!tree || this.errors.length > 0) {
019            return [query, false];
020        }
022        const rewriter = new TokenStreamRewriter(this.tokenStream);
023        const expressions = XPath.findAll(tree, "/query/simpleStatement//queryExpression", this.parser);
024        let changed = false;
025        if (expressions.size > 0) {
026            // There can only be one top-level query expression where we can add a LIMIT clause.
027            const candidate: ParseTree = expressions.values().next().value;
029            // Check if the candidate comes from a subquery.
030            let run: ParseTree | undefined = candidate;
031            let invalid = false;
032            while (run) {
033                if (run instanceof SubqueryContext) {
034                    invalid = true;
035                    break;
036                }
038                run = run.parent;
039            }
041            if (!invalid) {
042                // Top level query expression here. Check if there's already a LIMIT clause before adding one.
043                const context = candidate as QueryExpressionContext;
044                if (!context.limitClause() && context.stop) {
045                    // OK, ready to add an own limit clause.
046                    rewriter.insertAfter(context.stop, ` LIMIT ${offset}, ${count}`);
047                    changed = true;
048                }
049            }
050        }
052        return [rewriter.getText(), changed];
053    }

What this code does:

  • Line 017: the input is parsed to get a parse tree. If you have done that already, you can pass in the parse tree, of course, instead of parsing again.
  • Line 022 prepares a new TokenStreamRewriter instance with your token stream.
  • Line 023 uses ANTLR4’s XPath feature to get all nodes of a specific context type. This is where you can retrieve all your purpose contexts in one go. This would also be a solution for your point 2.
  • The following lines only check whether a new LIMIT clause must be added at all. They are specific to my use case and not so interesting for you.
  • Line 046 is the place where you manipulate the token stream. In this case something is added, but you can also replace or remove nodes.
  • Line 052 is probably what you are most interested in: it returns the original text of the input, but with all the rewrite actions applied.
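
To make the idea concrete for your purpose syntax, here is a minimal sketch in Java. It does not use ANTLR at all: MiniRewriter is a hypothetical, heavily simplified stand-in that only mimics how a token stream rewriter records edits and applies them when the text is requested. The tokenize and collectAndRemove helpers are made up for illustration as well: a regex splitter and a scan for "{" identifier "}" triples stand in for your lexer and for the purpose contexts you would find with a visitor or XPath. That scan would also match something like {x} elsewhere in a file, which is exactly why the real tool should take the ranges from the parse tree. With the ANTLR Java runtime you would instead call delete on the real TokenStreamRewriter with the start and stop tokens of each purpose context.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PurposeRewriteSketch {

    /** Simplified stand-in for a token stream rewriter; NOT the ANTLR class. */
    static final class MiniRewriter {
        private final List<String> tokens; // original tokens, never modified
        private final boolean[] deleted;   // recorded edits, applied only in getText()

        MiniRewriter(List<String> tokens) {
            this.tokens = tokens;
            this.deleted = new boolean[tokens.size()];
        }

        /** Records deletion of the tokens from index from to index to, inclusive. */
        void delete(int from, int to) {
            for (int i = from; i <= to; i++) {
                deleted[i] = true;
            }
        }

        /** Renders the original text with all recorded deletions applied. */
        String getText() {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < tokens.size(); i++) {
                if (!deleted[i]) {
                    sb.append(tokens.get(i));
                }
            }
            return sb.toString();
        }
    }

    // Word runs, whitespace runs, or any other single character.
    private static final Pattern TOKEN = Pattern.compile("\\w+|\\s+|.", Pattern.DOTALL);

    /** Hypothetical toy lexer; a real tool would use the ANTLR-generated lexer. */
    static List<String> tokenize(String source) {
        List<String> out = new ArrayList<>();
        Matcher m = TOKEN.matcher(source);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    /**
     * Collects every "{" identifier "}" triple and records its removal.
     * Assumption: a real tool gets these ranges from the purpose contexts
     * of the parse tree instead of scanning tokens like this.
     */
    static List<String> collectAndRemove(List<String> tokens, MiniRewriter rewriter) {
        List<String> purposes = new ArrayList<>();
        for (int i = 0; i + 2 < tokens.size(); i++) {
            if (tokens.get(i).equals("{")
                    && tokens.get(i + 1).matches("\\w+")
                    && tokens.get(i + 2).equals("}")) {
                purposes.add(tokens.get(i + 1));
                rewriter.delete(i, i + 2);
            }
        }
        return purposes;
    }

    public static void main(String[] args) {
        List<String> tokens = tokenize("public void {marketing} sendEmail() {}");
        MiniRewriter rewriter = new MiniRewriter(tokens);
        System.out.println(collectAndRemove(tokens, rewriter)); // [marketing]
        // The whitespace around the removed tokens is kept, so two spaces remain.
        System.out.println(rewriter.getText()); // public void  sendEmail() {}
    }
}
```

Since only the purpose tokens are dropped, everything else stays on its original line, so line numbers in later compiler messages still match the original files.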

With this code you can create a temporary Java file for compilation. It can also cover two actions from your list at the same time: collecting the purposes and removing them.
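
For step 3, one possible approach is to hand the rewritten text to the compiler that ships with the JDK through the standard javax.tools API, then load and run the result. The following is only a sketch under some assumptions: the names CompileAndRun, compileAndInvoke and Mailer are hypothetical, the rewritten source is a single class with a public static no-argument method, and the program runs on a JDK (on a plain JRE, ToolProvider.getSystemJavaCompiler() returns null).

```java
import java.lang.reflect.Method;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

final class CompileAndRun {

    /**
     * Writes source to a temporary directory as className.java, compiles it
     * with the JDK's compiler and invokes the public static no-argument
     * method methodName, returning its result.
     * The temporary directory is not cleaned up in this sketch.
     */
    static Object compileAndInvoke(String className, String source, String methodName)
            throws Exception {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        if (compiler == null) {
            throw new IllegalStateException("No system Java compiler; run on a JDK");
        }

        Path dir = Files.createTempDirectory("purpose-free");
        Path file = dir.resolve(className + ".java");
        Files.write(file, source.getBytes(StandardCharsets.UTF_8));

        // run() returns 0 on success; null streams fall back to System.in/out/err.
        int status = compiler.run(null, null, null, "-d", dir.toString(), file.toString());
        if (status != 0) {
            throw new IllegalStateException("Compilation failed with exit code " + status);
        }

        // Load the freshly compiled class from the temporary directory.
        try (URLClassLoader loader = new URLClassLoader(new URL[] {dir.toUri().toURL()})) {
            Class<?> cls = loader.loadClass(className);
            Method method = cls.getMethod(methodName);
            return method.invoke(null); // static method, so no receiver object
        }
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical rewriter output: the purpose has already been removed.
        String rewritten =
                "public class Mailer { public static String sendEmail() { return \"sent\"; } }";
        System.out.println(compileAndInvoke("Mailer", rewritten, "sendEmail")); // sent
    }
}
```

Starting javac as an external process on the temporary files would work too; the in-process API mainly saves you from locating the javac binary and parsing its output.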