
Regex for adding a word to a specific line if line does not contain the word

I have a YAML file with multiple lines and I know there’s one line that looks like this:

...
  schemas: core,ext,plugin
...

Note that there is an unknown number of whitespace characters at the beginning of this line (because YAML). The line can be identified uniquely by the schemas: expression. The number of existing values for the schemas property is unknown but greater than zero, and I do not know what these values are, except that one of them might be foo.

I would like to use a regex match-and-replace to append ,foo to this line if foo is not already one of the values, at any position in the list. foo might also appear on other lines, but I want to ignore those occurrences, and I don’t want any other line to be modified.

I’ve tried different regular expressions with lookarounds and capture groups, but none did the job. My latest attempt, which looked promising at first, was:

(?s)(?!.*foo)(.*schemas:.*)

But this does not match if foo appears on any other line, which is not what I want.
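A small Java sketch (the class and method names are made up for illustration) makes this behavior visible. Because (?s) lets . match line breaks, the lookahead (?!.*foo) scans everything after the current position, not just the schemas line, so a foo on a later line prevents the match. Even when the pattern does match, the trailing .* runs on to the end of the input, so a $1,foo replacement would put ,foo at the end of the file rather than on the schemas line.

```java
import java.util.regex.Pattern;

class AttemptDemo {
    // The attempt from the question. (?s) makes . match line breaks too,
    // so (?!.*foo) looks at the whole rest of the input, not one line.
    static final Pattern ATTEMPT = Pattern.compile("(?s)(?!.*foo)(.*schemas:.*)");

    static boolean matches(String yaml) {
        return ATTEMPT.matcher(yaml).find();
    }

    static String appendFoo(String yaml) {
        // $1 refers to capture group 1, which here extends to the end of the input.
        return ATTEMPT.matcher(yaml).replaceFirst("$1,foo");
    }

    public static void main(String[] args) {
        String noFoo = "a: 1\n  schemas: core,ext\nb: 2\n";
        String fooElsewhere = "a: 1\n  schemas: core,ext\nb: foo\n";
        System.out.println(matches(noFoo));        // true
        System.out.println(matches(fooElsewhere)); // false: foo on another line blocks the match
        System.out.print(appendFoo(noFoo));        // ,foo ends up on a new last line, not on the schemas line
    }
}
```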

Any assistance would be very much appreciated. Thanks.

(I use the Java regex engine, btw.)


Answer

Would this work?

^(?!.*foo)(\s*schemas:.*)$

If you want to make sure that lines containing words like food or fool (but not foo itself) still match, you can use this:

^(?!.*(?:foo\s*$|foo,))(\s*schemas:.*)$

Replacement:

$1,foo

If I understood your question correctly, you want only the schemas line to be checked by the negative lookahead. This should accomplish that. I tested it on https://regex101.com/ using the Java 8 engine; you can also check there what each operator does.
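To apply the pattern and the $1,foo replacement from Java instead of regex101, something like the following sketch should work. Two details are assumptions of this sketch rather than part of the answer as written: the (?m) flag, which Java needs for ^ and $ to match at line boundaries, and [ \t]* instead of \s* for the indentation, so that the match cannot begin on a blank line above the schemas line. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

class AppendFoo {
    // Lookahead version of the answer's pattern, adjusted (assumption) with
    // (?m) for line anchors and [ \t]* so the indentation match stays on one line.
    static final Pattern SCHEMAS_WITHOUT_FOO =
            Pattern.compile("(?m)^(?!.*(?:foo\\s*$|foo,))([ \\t]*schemas:.*)$");

    static String appendFoo(String yaml) {
        // Group 1 is the whole schemas line; "$1,foo" writes it back with ,foo appended.
        return SCHEMAS_WITHOUT_FOO.matcher(yaml).replaceAll("$1,foo");
    }

    public static void main(String[] args) {
        String yaml = "name: demo\n\n  schemas: core,ext,plugin\nother: foo\n";
        System.out.print(appendFoo(yaml));
        // Applying it a second time changes nothing, because foo is now present: prints true.
        System.out.println(appendFoo(appendFoo(yaml)).equals(appendFoo(yaml)));
    }
}
```

Only the schemas line is rewritten; the foo on the other line is ignored, and running the replacement repeatedly is safe.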

Explanation:

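Wrapping the expression with

^$

makes sure that only one line is considered at a time, provided that ^ and $ match at line boundaries (multiline mode: the m flag on regex101; in Java, prefix the pattern with (?m) or compile it with Pattern.MULTILINE) and that (?s) is not used, so . cannot match a line break.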
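The following sketch (hypothetical class name, made-up sample input) shows why the line anchoring matters in Java, and one subtlety: \s* also matches line breaks, so when a blank line precedes the schemas line, the match can start on the blank line. The lookahead then checks that empty line instead of the schemas line, and a line that already contains foo is matched anyway. Matching only spaces and tabs with [ \t]* keeps the whole match on one line.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LineScopeDemo {
    // Made-up input: a blank line, then a schemas line that already contains foo.
    static final String YAML = "a: 1\n\n  schemas: core,foo\n";

    // Returns group 1 of the first match, or null if the pattern does not match.
    static String firstGroup(String regex) {
        Matcher m = Pattern.compile(regex).matcher(YAML);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Without (?m), ^ only matches at the start of the input,
        // which is not the schemas line: prints null.
        System.out.println(firstGroup("^(?!.*foo)(\\s*schemas:.*)$"));
        // With (?m), \s* can swallow the blank line's line break, so the lookahead
        // ran on the empty line and the schemas line matches despite containing foo.
        System.out.println(firstGroup("(?m)^(?!.*foo)(\\s*schemas:.*)$"));
        // [ \t]* cannot cross a line break, so the lookahead sees the schemas line: prints null.
        System.out.println(firstGroup("(?m)^(?!.*foo)([ \\t]*schemas:.*)$"));
    }
}
```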

The negative lookahead

(?!.*(?:foo\s*$|foo,))

looks for any “foo” on this line that is followed either by optional whitespace and the end of the line, or by a comma. If you want to make the expression faster, you could probably turn the lookahead into a lookbehind, so that the simpler check for “schemas:” comes first. However, I don’t know if this actually improves performance.
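To see which lines this lookahead treats as already containing foo, a sketch like the one below can help (hypothetical names; (?m) and [ \t]* are added as assumptions for use from Java). It also exposes a limitation of the check: nothing is required before foo, so a value that merely ends in foo, such as barfoo, is mistaken for foo.

```java
import java.util.regex.Pattern;

class LookaheadCheck {
    // The answer's second pattern, with (?m) and [ \t]* added (assumptions of this sketch).
    static final Pattern SCHEMAS_WITHOUT_FOO =
            Pattern.compile("(?m)^(?!.*(?:foo\\s*$|foo,))([ \\t]*schemas:.*)$");

    // True if the pattern matches the line, i.e. the replacement would append ,foo.
    static boolean wouldAppend(String line) {
        return SCHEMAS_WITHOUT_FOO.matcher(line).find();
    }

    public static void main(String[] args) {
        String[] lines = {
            "  schemas: core,ext,plugin", // no foo: true
            "  schemas: foo,core",        // foo followed by a comma: false
            "  schemas: core,foo",        // foo at the end of the line: false
            "  schemas: core,food,fool",  // only food and fool: true
            "  schemas: core,barfoo",     // ends in foo, so treated as foo: false
        };
        for (String line : lines) {
            System.out.println(wouldAppend(line) + "  " + line);
        }
    }
}
```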

^(\s*schemas:.*)(?<!(?:foo\s?$|foo,))$

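With lookbehinds you can’t use the * quantifier, so this version would still match (and append ,foo) if foo is followed by more than one whitespace character. Also note that the lookbehind only inspects the text right before the end of the line, so it only detects foo when it is the last value; a line such as schemas: foo,core would still get ,foo appended.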
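Both caveats of the lookbehind version can be checked with a short sketch like this one (hypothetical names; (?m) is added as an assumption so the pattern also works on a whole file in Java). Each example is a single line, so the outcome depends only on what the lookbehind sees at the end of that line.

```java
import java.util.regex.Pattern;

class LookbehindCheck {
    // Lookbehind version from the answer, with (?m) added for Java.
    static final Pattern LOOKBEHIND =
            Pattern.compile("(?m)^(\\s*schemas:.*)(?<!(?:foo\\s?$|foo,))$");

    static String appendFoo(String yaml) {
        return LOOKBEHIND.matcher(yaml).replaceAll("$1,foo");
    }

    public static void main(String[] args) {
        System.out.println(appendFoo("  schemas: core,foo"));   // unchanged: foo is the last value
        System.out.println(appendFoo("  schemas: foo,core"));   // ,foo appended although foo is present
        System.out.println(appendFoo("  schemas: core,foo  ")); // ,foo appended after two trailing spaces
    }
}
```

If foo can appear anywhere in the list, the lookahead version is the safer choice of the two.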
