I’m currently having some issues with a regex to extract a URL.
I want my regex to accept URLs such as:
http://stackoverflow.com/questions/ask
https://stackoverflow.com
http://local:1000
https://local:1000
Through some tutorials, I’ve learned that this regex will match all of the above: ^(http|https)://.*$
However, it will also accept inputs such as
http://local:1000;http://invalid
http://khttp://
as a single string, when it shouldn’t accept them at all.
I understand that my expression isn’t written to exclude this, but my issue is I cannot think of how to write it so it checks for this scenario.
Any help is greatly appreciated!
Edit:
Looking at my issue, it seems I could solve it by checking that ‘//’ doesn’t occur in the string after the initial http:// or https://. Any ideas on how to implement that?
Sorry, this will be done in Java.
I also need to add the following constraint: a string such as http://local:80/test:90
should fail because of the duplicate port. In other words, I need a constraint that allows only two : symbols in total in a valid string: one after http/https and one before the port.
Answer
This will only produce a match if there is no :// after its first appearance in the string.
^https?://(?!.*://)\S+
Note that validating a URL found within a string is very complex (see “In search of the perfect URL validation regex”), so the above does not attempt to do that.
It will just match the protocol and following non-space characters.
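In Java:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Backslashes must be doubled inside a Java string literal; '/' needs no escaping.
Pattern reg = Pattern.compile("^https?://(?!.*://)\\S+");
Matcher m = reg.matcher("http://somesite.com");
if (m.find()) {
    System.out.println(m.group());
} else {
    System.out.println("No match");
}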
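The pattern above does not cover the extra constraint from the question's edit (no second port, i.e. no further : after the one before the port). A minimal sketch of one way to add that, assuming a host never contains : or /, the port is purely numeric, and no : is allowed anywhere in the path, could look like this. The class name UrlCheck and the method isValid are hypothetical names chosen for illustration, and the pattern is deliberately stricter than real URL syntax (it would reject IPv6 hosts and any path or query string containing a colon).

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of the original answer.
final class UrlCheck {
    // scheme, then a host with no ':' or '/', then an optional numeric port,
    // then an optional path that contains no further ':' and no whitespace.
    private static final Pattern URL =
            Pattern.compile("https?://[^\\s/:]+(?::\\d+)?(?:/[^\\s:]*)?");

    static boolean isValid(String candidate) {
        // matches() requires the whole input to fit the pattern,
        // so no explicit ^ and $ anchors are needed.
        return URL.matcher(candidate).matches();
    }

    public static void main(String[] args) {
        String[] samples = {
            "http://stackoverflow.com/questions/ask",
            "https://local:1000",
            "http://local:1000;http://invalid",
            "http://khttp://",
            "http://local:80/test:90"
        };
        for (String s : samples) {
            System.out.println(s + " -> " + isValid(s));
        }
    }
}
```

Because a colon is only permitted directly after the scheme and directly before the port digits, a second :// can never appear in a matching string, so the negative lookahead is not needed in this variant.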