I’m writing a syntax checker (in Java) for a file that contains keywords and values separated by commas, with a semicolon ending each line. The amount of whitespace between two complete constructions is unspecified.
What is required:
Find any duplicate words (consecutive and non-consecutive) in the multiline file.
// Example_1 (duplicate 'test'):
item1 , test, item3 ;
item4,item5;
test , item6;

// Example_2 (duplicate 'test'):
item1 , test, test ;
item2,item3;
I’ve tried to apply the pattern (\w+)(\s*\W\s*\w*)*\1, which doesn’t catch duplicates properly.
Answer
You may use this regex with DOTALL (single line) mode:
(?s)(\b\w+\b)(?=.*\b\1\b)
RegEx Details:
(?s): Enable DOTALL mode, so that . also matches line breaks.
(\b\w+\b): Match a complete word and capture it in group #1.
(?=.*\b\1\b): Lookahead to assert that back-reference \1 is present somewhere ahead. \b is used to make sure we match the exact same word again.
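As a rough illustration, the pattern could be applied from Java along the following lines. This is only a sketch: the class name `DuplicateWordFinder` and the method `findDuplicates` are made up for this example, and the backslashes are doubled because the regex is written as a Java string literal.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library.
final class DuplicateWordFinder {
    // (?s) lets '.' cross line breaks; group 1 is a whole word that the
    // lookahead requires to appear again, as a whole word, later in the input.
    private static final Pattern DUPLICATE =
            Pattern.compile("(?s)(\\b\\w+\\b)(?=.*\\b\\1\\b)");

    // Returns each word that occurs again later in the text,
    // in order of first appearance.
    static Set<String> findDuplicates(String text) {
        Set<String> duplicates = new LinkedHashSet<>();
        Matcher m = DUPLICATE.matcher(text);
        while (m.find()) {
            duplicates.add(m.group(1));
        }
        return duplicates;
    }
}
```

Calling `DuplicateWordFinder.findDuplicates(...)` on the whole file content as one string should report `test` for both examples in the question. Note that the lookahead rescans the rest of the input for every word, so the running time grows roughly quadratically with file size.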
Additionally:
Based on earlier comments, if the intent was to not match consecutive word repeats like item1 item1, then the following regex may be used:
(?s)(\b\w+\b)(?!\W+\1\b)(?=.*\b\1\b)
There is one extra negative lookahead assertion here to make sure we don’t match consecutive repeats.
(?!\W+\1\b): Negative lookahead to fail the match for consecutive repeats.
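To see the effect of the extra negative lookahead, a small sketch like the one below may help. The class name `NonConsecutiveDuplicateFinder` and its method are hypothetical, chosen only for this example.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library.
final class NonConsecutiveDuplicateFinder {
    // Same as the basic pattern, plus (?!\W+\1\b): reject a word when the
    // very next word (after only non-word characters) is the same word.
    private static final Pattern DUPLICATE =
            Pattern.compile("(?s)(\\b\\w+\\b)(?!\\W+\\1\\b)(?=.*\\b\\1\\b)");

    // Returns words that occur again later, skipping occurrences that are
    // immediately followed by a repeat of themselves.
    static Set<String> findDuplicates(String text) {
        Set<String> duplicates = new LinkedHashSet<>();
        Matcher m = DUPLICATE.matcher(text);
        while (m.find()) {
            duplicates.add(m.group(1));
        }
        return duplicates;
    }
}
```

With this variant, Example_2 from the question (where `test` is repeated back to back) should yield no matches, while Example_1 should still report `test`.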