I'm writing a syntax checker (in Java) for a file that contains keywords and values separated by commas, with a semicolon ending each line. The amount of whitespace between two complete constructions is unspecified.
What is required:
Find any duplicate words (consecutive and non-consecutive) in the multiline file.
// Example_1 (duplicate 'test'):
item1 , test, item3 ;
item4,item5;
test , item6;
// Example_2 (duplicate 'test'):
item1 , test, test ;
item2,item3;
I've tried to apply the
(\w+)(\s*\W\s*\w*)*\1 pattern, which doesn't catch duplicates properly.
You may use this regex in DOTALL (single-line) mode:
(\b\w+\b)(?=.*\b\1\b)
(\b\w+\b): Match a complete word and capture it in group #1.
(?=.*\b\1\b): Lookahead to assert that the back-reference \1 is present somewhere ahead.
\b is used to make sure we match the exact same word again.
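As a rough illustration, here is a minimal Java sketch of how the capture group and lookahead described above might be applied to the two examples from the question. It assumes the full pattern is simply those two pieces concatenated; the class name DuplicateWordFinder and the method findDuplicates are made up for this example.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of any library.
class DuplicateWordFinder {
    // Assumed pattern: capture a whole word, then look ahead for the same word.
    // DOTALL lets '.' cross line breaks, so the lookahead scans the rest of the file.
    private static final Pattern DUPLICATE =
            Pattern.compile("(\\b\\w+\\b)(?=.*\\b\\1\\b)", Pattern.DOTALL);

    // Returns every word that occurs again later in the text, in order of first match.
    static Set<String> findDuplicates(String text) {
        Set<String> duplicates = new LinkedHashSet<>();
        Matcher m = DUPLICATE.matcher(text);
        while (m.find()) {
            duplicates.add(m.group(1));
        }
        return duplicates;
    }

    public static void main(String[] args) {
        String example1 = "item1 , test, item3 ;\nitem4,item5;\ntest , item6;";
        String example2 = "item1 , test, test ;\nitem2,item3;";
        System.out.println(findDuplicates(example1)); // prints [test]
        System.out.println(findDuplicates(example2)); // prints [test]
    }
}
```

Because the lookahead rescans the remainder of the input for each word, this approach is probably best suited to small files; for large inputs, splitting on non-word characters and tracking words in a HashSet may be the more practical route.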
Based on the comments below, if the intent was to not match consecutive word repeats like
item1 item1, then the following regex may be used:
(\b\w+\b)(?!\W+\1\b)(?=.*\b\1\b)
There is one extra negative lookahead assertion here to make sure we don’t match consecutive repeats.
(?!\W+\1\b): Negative lookahead to fail the match for consecutive repeats.
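The effect of the extra negative lookahead can be sketched in Java as follows. This assumes the pattern is the word capture, then the negative lookahead, then the positive lookahead for a later repeat, in that order; the class name NonConsecutiveDuplicates and the method find are hypothetical.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of any library.
class NonConsecutiveDuplicates {
    // Assumed pattern: a whole word that is NOT immediately repeated
    // (only non-word characters in between) but does occur again later.
    private static final Pattern PATTERN = Pattern.compile(
            "(\\b\\w+\\b)(?!\\W+\\1\\b)(?=.*\\b\\1\\b)", Pattern.DOTALL);

    // Returns words that repeat later in the text, skipping occurrences
    // whose very next word is the same word.
    static Set<String> find(String text) {
        Set<String> result = new LinkedHashSet<>();
        Matcher m = PATTERN.matcher(text);
        while (m.find()) {
            result.add(m.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        // Non-consecutive repeat of 'test': still reported.
        System.out.println(find("item1 , test, item3 ;\nitem4,item5;\ntest , item6;")); // prints [test]
        // Consecutive repeat only: not reported.
        System.out.println(find("item1 , test, test ;\nitem2,item3;")); // prints []
    }
}
```

Note that a word is still reported if, after a consecutive pair, it shows up again further on: in "test, test, x, test" the second occurrence passes the negative lookahead and has a later repeat.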