Regex to extract valid HTTP or HTTPS URLs

I’m currently having some issues with a regex to extract a URL.

I want my regex to accept URLs such as:

http://stackoverflow.com/questions/ask
https://stackoverflow.com
http://local:1000
https://local:1000

Through some tutorials, I’ve learned that this regex will find all of the above: ^(http|https)://.*$ However, it will also take http://local:1000;http://invalid http://khttp:// as a single string, when it shouldn’t match it at all.

I understand that my expression isn’t written to exclude this, but my issue is I cannot think of how to write it so it checks for this scenario.

Any help is greatly appreciated!

Edit:

Looking at my issue, it seems I could solve it if I can implement a check to make sure ‘//’ doesn’t occur in my string after the initial http:// or https://. Any ideas on how to implement that?

Sorry, this will be done in Java.

I also need to add the following constraint: a string such as http://local:80/test:90 should fail because of the duplicate port. In other words, I need a constraint that only allows two : symbols in total in a valid string: one after http/https and one before the port.

Answer

This will only produce a match if there is no :// after its first appearance in the string.

^https?://(?!.*://)\S+

Note that parsing a valid URL out of a string is very complex (see "In search of the perfect URL validation regex"), so the above does not attempt to do that. It only matches the protocol and the non-space characters that follow it.

In Java

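import java.util.regex.Matcher;
import java.util.regex.Pattern;

// In a Java string literal "/" needs no escaping, and \S must be written as "\\S".
Pattern reg = Pattern.compile("^https?://(?!.*://)\\S+");
Matcher m = reg.matcher("http://somesite.com");
if (m.find()) {
    System.out.println(m.group());
} else {
    System.out.println("No match");
}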
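The pattern above does not cover the extra constraint from the question's edit (no second : after the port, as in http://local:80/test:90). A minimal sketch of one way to tighten it follows. The class name UrlCheck, the method isValid, and the stricter pattern itself are assumptions made for illustration, not part of the original answer, and this is still not a full URL validator.

```java
import java.util.regex.Pattern;

class UrlCheck {
    // Hypothetical stricter pattern (an assumption, not the answer's regex):
    //   ^https?://      scheme
    //   [^\s:/]+        host: no whitespace, ':' or '/'
    //   (?::\d+)?       optional numeric port
    //   (?:/[^\s:]*)?   optional path with no further ':'
    private static final Pattern STRICT =
            Pattern.compile("^https?://[^\\s:/]+(?::\\d+)?(?:/[^\\s:]*)?$");

    // Returns true only if the whole candidate string fits the pattern.
    static boolean isValid(String candidate) {
        return STRICT.matcher(candidate).matches();
    }

    public static void main(String[] args) {
        String[] samples = {
            "http://stackoverflow.com/questions/ask",
            "https://stackoverflow.com",
            "http://local:1000",
            "https://local:1000",
            "http://local:1000;http://invalid http://khttp://",
            "http://local:80/test:90"
        };
        for (String s : samples) {
            System.out.println(s + " -> " + (isValid(s) ? "match" : "no match"));
        }
    }
}
```

With these samples, the four URLs listed in the question match, while the concatenated string and http://local:80/test:90 do not. Because the path part cannot contain a colon, a second :// can never appear, so the negative lookahead is not needed in this variant.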
User contributions licensed under: CC BY-SA