How to parse a ClickHouse SQL statement using ANTLR4?

Objective: Add an additional WHERE clause to any given ClickHouse statement.

I’m using the following ANTLR grammars to generate Java classes for a lexer and parser.

Lexer grammar

https://github.com/ClickHouse/ClickHouse/blob/master/utils/antlr/ClickHouseLexer.g4

Parser grammar

https://github.com/ClickHouse/ClickHouse/blob/master/utils/antlr/ClickHouseParser.g4

Problem: I cannot figure out how to interact with the classes that ANTLR generates, or how to create the appropriate POJOs for use with them.

Example of statement

Goal of SQL (enrichment code)

I have the following Java main:

Answer

I’d suggest taking a look at TokenStreamRewriter.
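
To give a feel for why TokenStreamRewriter suits this job, here is a toy model of the idea in plain Java, with no ANTLR dependency. It is not ANTLR's class: ToyRewriter and the hand-split token list are made up for illustration. What it shows is that the original tokens are never modified; edits are queued by token index and applied only when the text is rendered. ANTLR's real TokenStreamRewriter works on the same principle and offers methods such as insertBefore, insertAfter, replace, delete and getText.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy model of deferred token rewriting (hypothetical; NOT ANTLR's TokenStreamRewriter). */
class ToyRewriter {
    private final List<String> tokens;  // original tokens, never mutated
    private final Map<Integer, StringBuilder> insertsAfter = new TreeMap<>();

    ToyRewriter(List<String> tokens) {
        this.tokens = new ArrayList<>(tokens);
    }

    /** Queue text to be emitted right after the token at tokenIndex. */
    void insertAfter(int tokenIndex, String text) {
        if (tokenIndex < 0 || tokenIndex >= tokens.size()) {
            throw new IndexOutOfBoundsException("no token at index " + tokenIndex);
        }
        insertsAfter.computeIfAbsent(tokenIndex, k -> new StringBuilder()).append(text);
    }

    /** Replay the original tokens in order, splicing in any queued text. */
    String getText() {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < tokens.size(); i++) {
            out.append(tokens.get(i));
            StringBuilder extra = insertsAfter.get(i);
            if (extra != null) {
                out.append(extra);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Whitespace is kept as tokens so the rendered text matches the input layout.
        List<String> toks = Arrays.asList("SELECT", " ", "a", " ", "FROM", " ", "t");
        ToyRewriter rewriter = new ToyRewriter(toks);
        rewriter.insertAfter(toks.size() - 1, " WHERE x = 1");
        System.out.println(rewriter.getText()); // SELECT a FROM t WHERE x = 1
    }
}
```

In the real setup, the token indexes come from the parse tree (for example, the stop token of a rule context) rather than from counting by hand.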

First, let’s get the grammars ready.

1 – With TokenStreamRewriter we’ll want to preserve whitespace, so let’s change the -> skip directives to -> channel(HIDDEN).

At the end of the Lexer grammar:

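
The reason for that change can be sketched in plain Java. This is a toy lexer, not the generated ClickHouseLexer, and the names ChannelDemo, Tok, lex and render are hypothetical. With "skip" behaviour the whitespace tokens are never created, so text rebuilt from the token list loses its spacing. With "hidden channel" behaviour the parser would ignore them, but they remain in the stream and the original text can be reproduced exactly.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy lexer contrasting skipped whitespace with hidden-channel whitespace. */
class ChannelDemo {
    static final int DEFAULT = 0;
    static final int HIDDEN = 1;

    static final class Tok {
        final String text;
        final int channel;

        Tok(String text, int channel) {
            this.text = text;
            this.channel = channel;
        }
    }

    /** Split input into alternating runs of whitespace and non-whitespace. */
    static List<Tok> lex(String input, boolean skipWhitespace) {
        List<Tok> out = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            int start = i;
            boolean ws = Character.isWhitespace(input.charAt(i));
            while (i < input.length() && Character.isWhitespace(input.charAt(i)) == ws) {
                i++;
            }
            String text = input.substring(start, i);
            if (!ws) {
                out.add(new Tok(text, DEFAULT));
            } else if (!skipWhitespace) {
                out.add(new Tok(text, HIDDEN)); // kept, but off the parser's channel
            }
            // else: skipped, so no token is created at all
        }
        return out;
    }

    /** Concatenate every token in the stream, whatever its channel. */
    static String render(List<Tok> tokens) {
        StringBuilder sb = new StringBuilder();
        for (Tok t : tokens) {
            sb.append(t.text);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String sql = "SELECT a  FROM t";
        System.out.println(render(lex(sql, false))); // SELECT a  FROM t
        System.out.println(render(lex(sql, true)));  // SELECTaFROMt
    }
}
```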
2 – The C++-specific stuff just guards against using keywords more than once. You don’t really need that check for your purposes (and it could be done in a post-parse listener if you did need it). So let’s just lose the language-specific stuff:

and

NOTE: There seems to be an issue with the grammar not accepting the actual values for an insert statement:

(I’m not going to try to fix that part, so I’ve commented out that portion of your input to accommodate.)

(It would also help if the top-level rule ended with an EOF token; without that, ANTLR just stops parsing after VALUE. Ending a root rule with EOF is considered a best practice for exactly this reason.)
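
The effect of a missing EOF can be mimicked with a toy recognizer in plain Java. This is only an analogy, not ANTLR code; EofDemo and its parse method are hypothetical. A rule that does not demand end of input is satisfied as soon as its own tokens match, so any trailing input is silently left unparsed.

```java
import java.util.Arrays;
import java.util.List;

/** Toy recognizer for "SELECT <name> FROM <name>", standing in for a root parser rule. */
class EofDemo {
    static boolean parse(List<String> tokens, boolean requireEof) {
        if (tokens.size() < 4) {
            return false;
        }
        boolean ruleMatches = tokens.get(0).equals("SELECT") && tokens.get(2).equals("FROM");
        if (!ruleMatches) {
            return false;
        }
        // Without the EOF check, anything after the fourth token is ignored.
        return !requireEof || tokens.size() == 4;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("SELECT", "a", "FROM", "t", "oops");
        System.out.println(parse(input, false)); // true: trailing token goes unnoticed
        System.out.println(parse(input, true));  // false: trailing token is reported
    }
}
```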

The Main program:

The Listener:

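
Whatever shape the listener takes, the decision it has to make is roughly the following. This is a minimal sketch in plain Java under stated assumptions: WhereEnricher, mergedClause and the sample predicates are hypothetical, and it works on strings rather than on ANTLR contexts. In an actual listener you would make the same choice in the callbacks generated for the relevant rules (assuming the grammar has a dedicated rule for the WHERE clause) and apply it through the rewriter. The parentheses matter: AND binds tighter than OR, so appending " AND ..." to an existing "a OR b" without them would change the meaning of the query.

```java
/** Builds the WHERE clause text for a query that may or may not already have one. */
class WhereEnricher {
    /**
     * @param existingPredicate the current WHERE expression, or null if the query has none
     * @param extraPredicate    the condition to enforce in addition
     */
    static String mergedClause(String existingPredicate, String extraPredicate) {
        if (existingPredicate == null) {
            return "WHERE " + extraPredicate;
        }
        // Wrap both sides so operator precedence cannot alter either condition.
        return "WHERE (" + existingPredicate + ") AND (" + extraPredicate + ")";
    }

    public static void main(String[] args) {
        System.out.println(mergedClause(null, "tenant = 1"));
        // WHERE tenant = 1
        System.out.println(mergedClause("a = 1 OR b = 2", "tenant = 1"));
        // WHERE (a = 1 OR b = 2) AND (tenant = 1)
    }
}
```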
Output:

User contributions licensed under: CC BY-SA