I am working on a command-line tool with the following functionality:
1. Parse modified .java files using an extended ANTLR4 Java9 grammar. The syntax in the files is Java, with one modification: the method declaration includes a purpose, as in this example:
   public void {marketing} sendEmail() {}
2. Collect and remove all purposes using a visitor. Collecting and analyzing the purposes is the main functionality of the program.
3. Compile and execute the Java files with the purposes removed.
I am searching for the simplest and most effective way to achieve step 3. Building a full compiler is out of scope for my project, so I would prefer to reuse the Java compiler and run javac if possible. I have considered the following approaches, but none seems optimal:
- Pretty-printing (from parse tree to source code) as proposed in this post: Compiling an AST back to source code. It could be a lot of work on large directories, though.
- Use ASM to generate bytecode, though as I understand it I would need valid Java source code or class files for this to work (https://asm.ow2.io/asm4-guide.pdf).
- Build a Java compiler plugin to modify the AST and remove purposes at the parse step of the compilation (https://www.baeldung.com/java-build-compiler-plugin). I am unsure whether the compilation would fail before I can modify the AST, because the syntax is not valid Java.
Any input is much appreciated.
Answer
You could use TokenStreamRewriter to get the source code without the purpose node (or accomplish many other rewriting tasks). Here's an example from an application where I conditionally add a top-level LIMIT clause to a MySQL query:
    /**
001  * Parses the query to see if there's already a top-level limit clause. If none was found, the query is
002  * rewritten to include a limit clause with the given values.
003  *
004  * @param query The query to check and modify.
005  * @param serverVersion The version of MySQL to use for checking.
006  * @param sqlMode The current SQL mode in the server.
007  * @param offset The limit offset to add.
008  * @param count The row count value to add.
009  *
010  * @returns The rewritten query if the original query is error free and contained no top-level LIMIT clause.
011  * Otherwise the original query is returned.
012  */
013 public checkAndApplyLimits(query: string, serverVersion: number, sqlMode: string, offset: number,
014     count: number): [string, boolean] {
015
016     this.applyServerDetails(serverVersion, sqlMode);
017     const tree = this.startParsing(query, false, MySQLParseUnit.Generic);
018     if (!tree || this.errors.length > 0) {
019         return [query, false];
020     }
021
022     const rewriter = new TokenStreamRewriter(this.tokenStream);
023     const expressions = XPath.findAll(tree, "/query/simpleStatement//queryExpression", this.parser);
024     let changed = false;
025     if (expressions.size > 0) {
026         // There can only be one top-level query expression where we can add a LIMIT clause.
027         const candidate: ParseTree = expressions.values().next().value;
028
029         // Check if the candidate comes from a subquery.
030         let run: ParseTree | undefined = candidate;
031         let invalid = false;
032         while (run) {
033             if (run instanceof SubqueryContext) {
034                 invalid = true;
035                 break;
036             }
037
038             run = run.parent;
039         }
040
041         if (!invalid) {
042             // Top level query expression here. Check if there's already a LIMIT clause before adding one.
043             const context = candidate as QueryExpressionContext;
044             if (!context.limitClause() && context.stop) {
045                 // OK, ready to add an own limit clause.
046                 rewriter.insertAfter(context.stop, ` LIMIT ${offset}, ${count}`);
047                 changed = true;
048             }
049         }
050     }
051
052     return [rewriter.getText(), changed];
053 }
What this code does:
- Line 017: the input is parsed to get a parse tree. If you have done that already, you can pass in the parse tree, of course, instead of parsing again.
- Line 022 prepares a new TokenStreamRewriter instance with your token stream.
- Line 023 uses ANTLR4's XPath feature to get all nodes of a specific context type. This is where you can retrieve all your purpose contexts in one go. This would also be a solution for your step 2.
- The following lines only check if a new LIMIT clause must be added at all. Not so interesting for you.
- Line 046 is the place where you manipulate the token stream. In this case something is added, but you can also replace or remove nodes.
- Line 052 contains probably what you are most interested in: it returns the original text of the input, but with all the rewrite actions applied.
With this approach you can create a temporary Java file for compilation. It also lets you perform two actions from your list at the same time (collecting the purposes and removing them).
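To make the last step concrete, here is a minimal Java sketch of how the rewritten text could be handed to the JDK's own compiler through javax.tools and then executed via reflection. Everything named here is an assumption for illustration: the class PurposePipeline, the sample class Mailer, and in particular stripPurposes, which is only a regex stand-in for the ANTLR visitor plus TokenStreamRewriter step so that the example runs without the ANTLR runtime. In the real tool the string would come from rewriter.getText().

```java
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;
import java.io.IOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PurposePipeline {

    // Stand-in for the parse-tree based rewrite: matches "{purpose}" directly before
    // a method name. A regex is not a reliable parser; this is an assumption made
    // only to keep the sketch self-contained.
    private static final Pattern PURPOSE = Pattern.compile("\\{(\\w+)\\}\\s*(?=\\w+\\s*\\()");

    /** Removes purposes from the source and appends each one to {@code found}. */
    public static String stripPurposes(String source, List<String> found) {
        Matcher m = PURPOSE.matcher(source);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            found.add(m.group(1));        // collect the purpose
            m.appendReplacement(out, ""); // and drop it from the output
        }
        m.appendTail(out);
        return out.toString();
    }

    /**
     * Writes plain Java source to a temporary directory, compiles it with the
     * system compiler, and invokes a public static no-argument method on it.
     * The temporary directory is not cleaned up in this sketch.
     */
    public static Object compileAndRun(String className, String source, String methodName) {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        if (compiler == null) {
            throw new IllegalStateException("No system compiler found; run on a JDK, not a JRE");
        }
        try {
            Path dir = Files.createTempDirectory("purpose-out");
            Path file = dir.resolve(className + ".java");
            Files.write(file, source.getBytes(StandardCharsets.UTF_8));

            // Null streams mean System.in, System.out and System.err are used.
            int status = compiler.run(null, null, null, "-d", dir.toString(), file.toString());
            if (status != 0) {
                throw new IllegalStateException("javac failed with status " + status);
            }

            try (URLClassLoader loader = new URLClassLoader(new URL[] {dir.toUri().toURL()})) {
                Class<?> cls = loader.loadClass(className);
                return cls.getMethod(methodName).invoke(null);
            }
        } catch (IOException | ReflectiveOperationException e) {
            throw new IllegalStateException("Could not compile or run " + className, e);
        }
    }

    public static void main(String[] args) {
        String annotated =
            "public class Mailer {\n"
          + "    public static String {marketing} sendEmail() { return \"sent\"; }\n"
          + "}\n";

        List<String> purposes = new ArrayList<>();
        String plainJava = stripPurposes(annotated, purposes);

        System.out.println("Purposes: " + purposes);
        System.out.println("Result: " + compileAndRun("Mailer", plainJava, "sendEmail"));
    }
}
```

For a whole directory you would probably write each rewritten file under the same relative path in a temporary output folder and pass all of them to a single compiler.run call, so the original sources stay untouched.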