Given a domain string like aaaa.bbbb.cccc.dddd, I am trying to iterate over all of its subdomains, i.e.
aaaa.bbbb.cccc.dddd
bbbb.cccc.dddd
cccc.dddd
dddd
I thought this regex ((?:[a-zA-Z0-9]+\.)*)([a-zA-Z0-9]+)$ should do the trick (please ignore the fact that I am only matching the characters [a-zA-Z0-9]); however, it only matches the full string.
How can I modify it to make it work?
Edit 1: The following code
var pattern = Pattern.compile("((?:[a-zA-Z0-9]+\\.)*)([a-zA-Z0-9]+)$"); // fixed regex here
var matcher = pattern.matcher("aaaa.bbbb.cccc.dddd");
matcher.results()
    .forEach(matchResult -> System.out.println(matchResult.group()));
should print (in any order)
aaaa.bbbb.cccc.dddd
bbbb.cccc.dddd
cccc.dddd
dddd
Answer
The regex you’re looking for is
(?=(?:^|\.)([\.\w]+)*)
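This pattern is based on a lookahead. Because a lookahead is zero-width, it consumes no characters, so the capture group can capture text that overlaps with what was captured in previous iterations. The lookahead succeeds at the start of the string and at each dot, and group 1 holds the rest of the string from that point. The overall match itself is empty, so read group 1 rather than the whole match.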
Java Example
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        // Backslashes are doubled because this is a Java string literal.
        final String regex = "(?=(?:^|\\.)([\\.\\w]+)*)";
        final String domain = "aaaa.bbbb.cccc.dddd";
        final Pattern pattern = Pattern.compile(regex);
        final Matcher matcher = pattern.matcher(domain);
        // Each find() is a zero-width match; the subdomain is in group 1.
        while (matcher.find()) {
            for (int i = 1; i <= matcher.groupCount(); i++) {
                System.out.println(matcher.group(i));
            }
        }
    }
}
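
If you would rather keep the matcher.results() style from the question, the same lookahead pattern can be used with streams. This is only a sketch: the class name Subdomains and the helper method subdomains are made up for illustration, and it assumes Java 9 or later, where Matcher.results() is available. The key difference from the code in the question is that it reads group(1) instead of group(), since the whole match is always empty with a lookahead.

```java
import java.util.List;
import java.util.Objects;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class Subdomains {
    // Same lookahead pattern as in the answer, written as a Java string literal.
    private static final Pattern SUFFIX = Pattern.compile("(?=(?:^|\\.)([\\.\\w]+)*)");

    // Hypothetical helper: returns every dot-separated suffix of the input, longest first.
    public static List<String> subdomains(String domain) {
        return SUFFIX.matcher(domain)
                .results()
                // group() would be the empty string here; the text is in group 1.
                .map(result -> result.group(1))
                // Group 1 is null when nothing follows a dot (e.g. a trailing dot).
                .filter(Objects::nonNull)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        subdomains("aaaa.bbbb.cccc.dddd").forEach(System.out::println);
    }
}
```

For the input from the question, this should print the same four lines as the loop above, in the same order.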