
Regular expression to match optional patterns

I know that regex is a pretty hot topic and that there's a plethora of similar questions; however, I have not found one that matches my needs.

I need to check the formatting of my string to be as follows:

  • Every line must start with 5 digits.
  • Characters 6 to 12 must be white space.
  • Character 13 must be either white space or an asterisk.
  • If there is any period, colon or semicolon before the final period, the character must not be preceded by white space, but it must be followed by white space.
  • Opening parentheses cannot be followed by white space.
  • Closing parentheses cannot be preceded by white space.

I haven't tried to implement the colon, semicolon or parentheses yet; so far I'm stuck on just the period. These characters are optional, so I can't make a hard check. I'm trying to catch them, but I still get a match in cases like these:

00000      *TEST .FINAL STATEMENT. //Matches, but it shouldn't match.
00001      *TEST2 . FINAL STATEMENT. //Matches, but it shouldn't match.
00002      *TEST3. FINAL STATEMENT. //Matches, **should** match.

This is the regex I have so far:

^\d{5}\s{6}[\s*][^.]*([^.\s]+\.\s)?[^.]*\..*$

I really don’t see how this is happening, especially because I’m using [^.] to indicate I’ll accept anything except a period as a wildcard, and the optional pattern looks correct at a glance: If there’s a period, it should not have white space behind it and it should have white space after it.


Answer

Try this:

^\d{5}\s{6}[\s*]       # Your original pattern
(?:                    # Repeat 0 or more times:
  [^.:;()]*|           # Unconstrained characters
  (?<!\s)[.:;](?=\s)|  # Punctuation after non-space, followed by space
  \((?!\s)|            # Opening parenthesis not followed by space
  (?<!\s)\)            # Closing parenthesis not preceded by space
)*
\.$                    # Period, then end of string

https://regex101.com/r/WwpssV/1
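
As a minimal sketch of how the pattern could be used, assuming Python's `re` module: `re.VERBOSE` lets the commented layout be kept as written. The names `LINE_RE` and `is_well_formed` are illustrative and not part of the original post, and the sample lines are rebuilt with exactly six spaces after the five digits.

```python
import re

# LINE_RE and is_well_formed are illustrative names, not from the original post.
LINE_RE = re.compile(
    r"""
    ^\d{5}\s{6}[\s*]          # 5 digits, 6 whitespace, then whitespace or asterisk
    (?:                       # repeat 0 or more times:
        [^.:;()]*             #   unconstrained characters
      | (?<!\s)[.:;](?=\s)    #   punctuation after non-space, followed by space
      | \((?!\s)              #   opening parenthesis not followed by space
      | (?<!\s)\)             #   closing parenthesis not preceded by space
    )*
    \.$                       # period, then end of string
    """,
    re.VERBOSE,
)


def is_well_formed(line: str) -> bool:
    """Return True if the line satisfies the rules encoded in LINE_RE."""
    return LINE_RE.search(line) is not None


if __name__ == "__main__":
    gap = " " * 6  # six whitespace characters after the 5-digit prefix
    samples = [
        "00000" + gap + "*TEST .FINAL STATEMENT.",
        "00001" + gap + "*TEST2 . FINAL STATEMENT.",
        "00002" + gap + "*TEST3. FINAL STATEMENT.",
    ]
    for sample in samples:
        print(repr(sample), is_well_formed(sample))
```

Only the third sample line should be accepted; the first two fail because their inner period is preceded by a space.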

In the last part of the pattern, the characters with special requirements are .:;(), so use a negated character set to match anything but those characters: [^.:;()]*. Then alternate with the following:

if there is any period, colon or semicolon before the final period, the character must not be preceded by a white space, but it must be followed by a white space.

Fulfilled by (?<!\s)[.:;](?=\s) – match one of those characters only if it is not preceded by a space and is followed by a space.

opening parentheses cannot be followed by a white space.

Fulfilled by \((?!\s)

closing parentheses cannot be preceded by a white space.

Fulfilled by (?<!\s)\)

Then just alternate between those 4 possibilities at the end of the pattern.
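
As for why the pattern in the question accepted the bad lines, a short check may help. It is only a sketch, assuming Python's `re` module and assuming the question's pattern reads as below once its backslash escapes are written out. Because the capture group is optional, the engine can skip it entirely; the `\.` then consumes the first period and the trailing `.*` swallows whatever follows it.

```python
import re

# The question's pattern with escapes written out (assumed reconstruction).
ORIGINAL = re.compile(r"^\d{5}\s{6}[\s*][^.]*([^.\s]+\.\s)?[^.]*\..*$")

# First sample line from the question: the inner period is preceded by a space.
bad_line = "00000" + " " * 6 + "*TEST .FINAL STATEMENT."
m = ORIGINAL.match(bad_line)

# The line is accepted although the optional group took no part in the match.
print(m is not None)
print(m.group(1))
```

The group being `None` on an accepted line shows that the optional check was never applied, which is why the answer's pattern constrains every period inside the repeated group instead.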
