I am trying to get the part of a URL starting from the third `/`.
Here is the URL:
http://192.168.1.253:18888/2021/03/11/896459e4-875f-455a-a2cb-768c879555e7.png
I wish to get /2021/03/11/896459e4-875f-455a-a2cb-768c879555e7.png
So I used the following regex: `(?://.+)/.+`
The `?:` marks a non-capturing group, so I expected `//192.168.1.253:18888` not to be matched.
But when I test it on regex101.com, the result is `//192.168.1.253:18888/2021/03/11/896459e4-875f-455a-a2cb-768c879555e7.png`.
Why is that?
Answer
The reason regex101.com reports `//192.168.1.253:18888/2021/03/11/896459e4-875f-455a-a2cb-768c879555e7.png` is that a non-capturing group `(?: ... )` still consumes the text it matches. "Non-capturing" only means that the text is not stored as a numbered group; it remains part of the overall match, and the overall match is what regex101 highlights.
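To see this outside regex101, here is a minimal Java sketch. The class name `NonCapturingDemo` and the helper `wholeMatch` are made up for illustration; only `java.util.regex` from the standard library is used.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NonCapturingDemo {
    // Hypothetical helper: returns the overall match (group 0), or null if nothing matches.
    static String wholeMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(0) : null;
    }

    // Hypothetical helper: number of capturing groups defined by the pattern.
    static int capturingGroups(String regex) {
        return Pattern.compile(regex).matcher("").groupCount();
    }

    public static void main(String[] args) {
        String url = "http://192.168.1.253:18888/2021/03/11/896459e4-875f-455a-a2cb-768c879555e7.png";
        String regex = "(?://.+)/.+";

        // The non-capturing group defines no numbered group...
        System.out.println(capturingGroups(regex));
        // ...but the text it consumed is still part of the overall match.
        System.out.println(wholeMatch(regex, url));
    }
}
```

The pattern defines no capturing groups at all, so the only thing available is the overall match, which starts at the `//`.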
In languages such as Java, match the double slash followed by everything up to the next forward slash, capture the rest in a group, and keep only that group:
Regex: `//[^/]+(.+)`
Input: `http://192.168.1.253:18888/2021/03/11/896459e4-875f-455a-a2cb-768c879555e7.png`
Ignore Match1: `//192.168.1.253:18888/2021/03/11/896459e4-875f-455a-a2cb-768c879555e7.png`
Keep Group1: `/2021/03/11/896459e4-875f-455a-a2cb-768c879555e7.png`
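In Java, that could look roughly like the sketch below. The class name `UrlPathExtractor` and the method `extractPath` are hypothetical; returning `null` when nothing matches is just one possible choice.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UrlPathExtractor {
    // "//", then the host and port (anything except "/"), then capture the rest.
    private static final Pattern PATH_AFTER_HOST = Pattern.compile("//[^/]+(.+)");

    // Hypothetical helper: returns group 1 (the path from the third slash), or null if no match.
    static String extractPath(String url) {
        Matcher m = PATH_AFTER_HOST.matcher(url);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String url = "http://192.168.1.253:18888/2021/03/11/896459e4-875f-455a-a2cb-768c879555e7.png";
        System.out.println(extractPath(url));
    }
}
```

Here `m.group(0)` would still be the whole match starting at `//`; only `m.group(1)` holds the path.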