
Regex pattern matching is getting timed out

I want to split an input string on a regex pattern using the Pattern.split(String) API. The regex uses both positive and negative lookarounds. It is supposed to split on a delimiter (,) but must ignore the delimiter when it is enclosed in double quotes ("x,y").

The regex is – (?<!(?<!\Q\\E)\Q\\E)\Q,\E(?=(?:[^\Q"\E]*(?<=\Q,\E)\Q"\E[[^\Q,\E|\Q"\E] | [\Q"\E]]+[^\Q"\E]*[^\Q\\E]*[\Q"\E]*)*[^\Q"\E]*$)

The input string for which this split call is getting timed out is –

JavaScript

I read that lookarounds are expensive and can cause timeouts if the string is too long. If I remove the backslashes enclosing ["BOLT,HI-JOK"] at the end of the string, the regex is able to detect the delimiters and split.

With the above string, the pattern also fails to detect the first delimiter, at [STIFFENER]","QH20426AD3. But if I remove the backslashes enclosing ["BOLT,HI-JOK"] at the end of the string, the regex does detect it.

I am not very experienced with lookarounds in regex. Can someone please give hints on how I can optimize this regex and avoid timeouts? Any pointers or article links are appreciated!


Answer

If you want to split on a comma only when it is directly followed by a complete double-quoted string (from the opening double quote to the closing one, allowing backslash-escaped characters in between), you can use:

,(?="[^"\\]*(?:\\.[^"\\]*)*")

The pattern matches:

  • , Match a comma
  • (?= Positive lookahead, assert that what is directly to the right is
    • "[^"\\]* Match " and 0+ times any char except " or \
    • (?:\\.[^"\\]*)*" Optionally repeat matching \ followed by any char (an escaped char), then again any chars other than " and \, and finally match the closing "
  • ) Close lookahead
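
As a rough illustration of how the pattern described above could be used with Pattern.split, here is a minimal sketch. The class name QuotedCsvSplit, the method splitFields, and the sample input are made up for this example (the input string from the question is not reproduced here), and the sketch assumes that every field is wrapped in double quotes and that quotes inside a field are escaped with a backslash. Because the character class [^"\\] and the escape branch \\. can never match the same character, the lookahead should not backtrack much, which is likely why it avoids the timeout.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original question or answer.
class QuotedCsvSplit {

    // Regex: ,(?="[^"\\]*(?:\\.[^"\\]*)*")
    // A comma, followed by a complete double-quoted string in which
    // backslash-escaped characters (for example \") are skipped as a unit.
    private static final Pattern DELIMITER =
            Pattern.compile(",(?=\"[^\"\\\\]*(?:\\\\.[^\"\\\\]*)*\")");

    // Splits a line on the delimiter commas only; commas inside quotes stay.
    static String[] splitFields(String line) {
        return DELIMITER.split(line);
    }

    public static void main(String[] args) {
        // Assumed sample input: "A","B,C","D \"E,F\""
        String line = "\"A\",\"B,C\",\"D \\\"E,F\\\"\"";
        for (String field : splitFields(line)) {
            System.out.println(field);
        }
    }
}
```

Compiling the pattern once into a static final field avoids recompiling it on every call. Note that the lookahead only checks that a quoted string follows the comma, so input with unquoted fields or with a quote directly after a comma inside a field would need a different approach.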

Regex demo | Java demo

JavaScript

Output

JavaScript