
Java regular expression to match valid Java identifiers

I need to create a regular expression able to find and get valid identifiers in Java code like this:


I have tried to add multiple regexes in a single regex, but how can I build a pattern to exclude reserved words?

I tried this regex ^(((&&|<=|>=|<|>|!=|==|&|!)|([-+=]{1,2})|([.!?)}{;,(-]))|(else|if|float|int)|(\d[\d.])) but it does not work as expected.


In the following picture, how should I match for identifiers?

enter image description here


Answer

A valid Java identifier satisfies the following rules:

  1. it has at least one character
  2. the first character MUST be a letter [a-zA-Z], underscore _, or dollar sign $
  3. the rest of the characters MAY be letters, digits, underscores, or dollar signs
  4. reserved words MUST NOT be used as identifiers
  5. Update: a single underscore _ is a keyword since Java 9, so it cannot be used as an identifier either
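
The five rules above can also be checked directly, without a regular expression. The following is a minimal sketch: the class name IdentifierRules and the short RESERVED set are assumptions made for illustration, and the set is deliberately not the full list of reserved words.

```java
import java.util.Set;

class IdentifierRules {
    // Illustrative subset of reserved words only (assumption: not the full list).
    static final Set<String> RESERVED = Set.of("_", "if", "else", "for", "float", "int");

    // Rule 2: a letter [a-zA-Z], underscore, or dollar sign.
    static boolean isIdentifierStart(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_' || c == '$';
    }

    // Rule 3: anything allowed at the start, plus digits.
    static boolean isIdentifierPart(char c) {
        return isIdentifierStart(c) || (c >= '0' && c <= '9');
    }

    static boolean isValidIdentifier(String s) {
        if (s == null || s.isEmpty()) return false;            // rule 1
        if (!isIdentifierStart(s.charAt(0))) return false;     // rule 2
        for (int i = 1; i < s.length(); i++) {                 // rule 3
            if (!isIdentifierPart(s.charAt(i))) return false;
        }
        return !RESERVED.contains(s);                          // rules 4 and 5
    }
}
```

Because the reserved-word check compares the whole string, names such as integer or iffy are still accepted.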

A naive regexp to validate the first three conditions would be as follows: (\b([A-Za-z_$][$\w]*)\b) but it does not filter out the reserved words.
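
As a rough illustration of the naive pattern, the sketch below runs it over a line of code and collects every match. The class name NaiveIdentifierMatcher and the sample input are assumptions, not part of the original answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NaiveIdentifierMatcher {
    // Naive pattern: checks only the shape of the token, not reserved words.
    static final Pattern NAIVE = Pattern.compile("\\b([A-Za-z_$][$\\w]*)\\b");

    static List<String> find(String code) {
        List<String> result = new ArrayList<>();
        Matcher m = NAIVE.matcher(code);
        while (m.find()) {
            result.add(m.group(1)); // group 1 is the identifier-shaped token
        }
        return result;
    }
}
```

For the input "int count = total + 1;" this returns int, count and total, so the reserved word int is reported alongside the real identifiers.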

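To exclude the reserved words, a negative look-ahead (?!) is needed to specify a group of tokens that cannot match: \b(?!(_\b|if|else|for|float|int))([A-Za-z_$][$\w]*):

  • Group #1: (?!(_\b|if|else|for|float|int)) excludes the list of the specified words
  • Group #2: ([A-Za-z_$][$\w]*) matches identifiers.

Note that only _ carries a trailing \b here, so the other alternatives also reject identifiers that merely begin with a reserved word, such as integer or format. Ending the alternation with a boundary check, for example (?!(_|if|else|for|float|int)(?![$\w])), restricts the exclusion to whole words.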
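
The effect of the negative look-ahead can be sketched as follows. AS_WRITTEN is the pattern from the text. WHOLE_WORD is an assumed variant, not taken from the original answer, that adds (?![$\w]) after the alternation so that only complete reserved words are rejected. The class and constant names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaheadIdentifierMatcher {
    // Pattern as given in the text: only "_" is followed by \b.
    static final Pattern AS_WRITTEN = Pattern.compile(
        "\\b(?!(_\\b|if|else|for|float|int))([A-Za-z_$][$\\w]*)");

    // Assumed variant: a reserved word is rejected only when it is not
    // followed by another identifier character.
    static final Pattern WHOLE_WORD = Pattern.compile(
        "\\b(?!(?:_|if|else|for|float|int)(?![$\\w]))([A-Za-z_$][$\\w]*)");

    static List<String> find(Pattern p, String code) {
        List<String> result = new ArrayList<>();
        Matcher m = p.matcher(code);
        while (m.find()) {
            // In both patterns the identifier is the last capturing group.
            result.add(m.group(m.groupCount()));
        }
        return result;
    }
}
```

With AS_WRITTEN, the input "integer" produces no match because it starts with int. With WHOLE_WORD, the input "int integer = iffy + _ + _x;" yields integer, iffy and _x.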

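However, the word boundary \b does not match between a non-word character (such as a space) and a dollar sign $, because $ is not a word character, so this regular expression fails to match identifiers starting with $.
Also, we may want to exclude matching inside string and character literals ("not_a_variable", 'c', '\u65').

This can be done by replacing the word boundary \b with a positive lookbehind (?<=), which requires a given character before the main expression without including it in the result: (?<=[^$\w'"\\])(?!(_\b|if|else|for|float|int))([A-Za-z_$][$\w]*). Because the lookbehind needs a preceding character, an identifier at the very start of the input is not matched.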
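
A minimal sketch of the lookbehind approach is shown below. The class name LookbehindIdentifierMatcher is hypothetical, and the look-ahead uses an assumed whole-word form, (?![$\w]) after the alternation, rather than the exact look-ahead from the text. The lookbehind guards only the token immediately after a quote, so words later in a multi-word string literal would still match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookbehindIdentifierMatcher {
    static final Pattern IDENTIFIER = Pattern.compile(
        // Preceding character must not be $, a word character, a quote, or a backslash.
        "(?<=[^$\\w'\"\\\\])"
        // Reject whole reserved words (short illustrative list).
        + "(?!(?:_|if|else|for|float|int)(?![$\\w]))"
        // The identifier itself.
        + "([A-Za-z_$][$\\w]*)");

    static List<String> find(String code) {
        List<String> result = new ArrayList<>();
        Matcher m = IDENTIFIER.matcher(code);
        while (m.find()) {
            result.add(m.group(1));
        }
        return result;
    }
}
```

The sample inputs in a quick check need a leading space (or any other non-identifier character), since nothing precedes the first token otherwise.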


Next, the full list of the Java reserved words can be collected into a single String of tokens separated with |.

A test class showing the final pattern for regular expression and its usage to detect the Java identifiers is provided below.
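
A minimal sketch of such a test class follows. It is an illustration under stated assumptions, not the answer's original code: the class name JavaIdentifierFinder, the sample input, and the whole-word check (?![$\w]) are assumptions. The reserved-word list is the set of keywords from the Java Language Specification (including _ and the unused const and goto) plus the literals true, false and null.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class JavaIdentifierFinder {
    // Java keywords plus the literals true, false and null.
    static final String[] RESERVED = {
        "_", "abstract", "assert", "boolean", "break", "byte", "case", "catch",
        "char", "class", "const", "continue", "default", "do", "double", "else",
        "enum", "extends", "final", "finally", "float", "for", "goto", "if",
        "implements", "import", "instanceof", "int", "interface", "long",
        "native", "new", "package", "private", "protected", "public", "return",
        "short", "static", "strictfp", "super", "switch", "synchronized", "this",
        "throw", "throws", "transient", "try", "void", "volatile", "while",
        "true", "false", "null"
    };

    static final Pattern IDENTIFIER = Pattern.compile(
        // Lookbehind: the previous character is not $, a word character, a quote, or a backslash.
        "(?<=[^$\\w'\"\\\\])"
        // Look-ahead: reject a reserved word unless more identifier characters follow it.
        + "(?!(?:" + String.join("|", RESERVED) + ")(?![$\\w]))"
        + "([A-Za-z_$][$\\w]*)");

    static List<String> findIdentifiers(String code) {
        List<String> result = new ArrayList<>();
        Matcher m = IDENTIFIER.matcher(code);
        while (m.find()) {
            result.add(m.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        // Leading space: the lookbehind requires a character before the first token.
        String code = " public static int sum(int a, int $b) { return a + $b; }";
        System.out.println(findIdentifiers(code));
    }
}
```

The alternation is joined with String.join, so adding or removing a reserved word only requires editing the RESERVED array.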


Output

User contributions licensed under: CC BY-SA