Regex to return all subdomains from a given domain

Given a domain string like aaaa.bbbb.cccc.dddd, I am trying to iterate over all of its subdomains, i.e.

aaaa.bbbb.cccc.dddd
bbbb.cccc.dddd
cccc.dddd
dddd

I thought this regex ((?:[a-zA-Z0-9]+\.)*)([a-zA-Z0-9]+)$ should do the trick (please ignore the fact that I am only matching the characters [a-zA-Z0-9]); however, it only matches the full string.

How can I modify it to make it work?

Edit 1: The following code

var pattern = Pattern.compile("((?:[a-zA-Z0-9]+\\.)*)([a-zA-Z0-9]+)$"); //fixed regex here
var matcher = pattern.matcher("aaaa.bbbb.cccc.dddd");
matcher.results()
    .forEach(matchResult -> System.out.println(matchResult.group()));

should print (in any order)

aaaa.bbbb.cccc.dddd
bbbb.cccc.dddd
cccc.dddd
dddd

Answer

The regex you’re looking for is

(?=(?:^|\.)([.\w]+)*)

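This pattern is built on a lookahead. Because a lookahead consumes no characters, each match is zero-width, so the capture group can overlap text that was already captured in a previous iteration. The lookahead succeeds only at the start of the string or at a dot, and group 1 then captures everything that follows, so each subdomain is read from group 1 rather than from the match itself.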

Java Example

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        // Backslashes must be doubled inside a Java string literal.
        final String regex = "(?=(?:^|\\.)([\\.\\w]+)*)";
        final String domain = "aaaa.bbbb.cccc.dddd";
        
        final Pattern pattern = Pattern.compile(regex);
        final Matcher matcher = pattern.matcher(domain);
        
        // Every find() is a zero-width match; the subdomain itself is in capture group 1.
        while (matcher.find()) {
            for (int i = 1; i <= matcher.groupCount(); i++) {
                System.out.println(matcher.group(i));
            }
        }
    }
}
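
The same pattern can also be used with the stream-based code from the question. This is only a sketch, assuming Java 9 or later, where Matcher.results() is available. Since every match is zero-width, matchResult.group() would always return an empty string, so the sketch reads group(1) instead. The class name SubdomainStream and the helper method subdomains are illustrative names, not part of the original question or answer.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class SubdomainStream {
    // Zero-width lookahead: succeeds at the start of the input or at a dot,
    // and captures the remaining dots and word characters in group 1.
    private static final Pattern SUBDOMAIN =
            Pattern.compile("(?=(?:^|\\.)([\\.\\w]+)*)");

    // Hypothetical helper: collects group 1 of every match, in order of appearance.
    static List<String> subdomains(String domain) {
        return SUBDOMAIN.matcher(domain)
                .results()
                .map(matchResult -> matchResult.group(1))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        subdomains("aaaa.bbbb.cccc.dddd").forEach(System.out::println);
    }
}
```

For the input aaaa.bbbb.cccc.dddd this yields the four strings listed in the question, from longest to shortest.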
User contributions licensed under: CC BY-SA