Mapping a stream of tokens to a stream of n-grams in Java 8

I think this is a fairly basic question concerning Java 8 streams, but I have a difficult time thinking of the right search terms. So I am asking it here. I am just getting into Java 8, so bear with me.

I was wondering how I could map a stream of tokens to a stream of n-grams (represented as arrays of tokens of size n). Suppose that n = 3, then I would like to convert the following stream

{1, 2, 3, 4, 5, 6, 7}

to

{[1, 2, 3], [2, 3, 4], [3, 4, 5], [4, 5, 6], [5, 6, 7]}

How would I accomplish this with Java 8 streams? It should be possible to compute this concurrently, which is why I am interested in accomplishing this with streams (it also doesn’t matter in what order the n-arrays are processed).

Sure, I could do it easily with old-fashioned for-loops, but I would prefer to make use of the stream API.

Answer

Such an operation is not really suited to the Stream API. In functional-programming jargon, what you’re trying to do is called a sliding window of size n. Scala has it built in as the sliding() method, but the Java Stream API has no equivalent.

To make that happen, you have to stream over the indexes of the input list rather than over its elements.

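import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class NGrams {
    public static void main(String[] args) {
        List<Integer> list = Arrays.asList(1, 2, 3, 4, 5, 6, 7);
        List<List<Integer>> result = nGrams(list, 3);
        System.out.println(result); // [[1, 2, 3], [2, 3, 4], [3, 4, 5], [4, 5, 6], [5, 6, 7]]
    }

    // Builds one n-element window per valid start index i, copying each subList view.
    private static <T> List<List<T>> nGrams(List<T> list, int n) {
        return IntStream.range(0, list.size() - n + 1)
                        .mapToObj(i -> new ArrayList<>(list.subList(i, i + n)))
                        .collect(Collectors.toList());
    }
}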

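This code creates a Stream over the valid start indexes of the input list (0 up to list.size() - n, inclusive), maps each index i to a new list holding the elements from i (inclusive) to i + n (exclusive), and collects the results into a List. If n is larger than the list, the index range is empty and the result is an empty list.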
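Since the question asks for arrays and for concurrent processing, the same index-based idea can be adapted as in the sketch below. It is only an illustration, not part of the original answer: the class name NGramArrays and the method nGramArrays are hypothetical, and it assumes the tokens already sit in a random-access List<Integer> that no other thread modifies while the stream runs.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.IntStream;

public class NGramArrays {

    // Hypothetical helper (not from the answer): one Integer[] per window of n tokens.
    public static Integer[][] nGramArrays(List<Integer> tokens, int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        // An empty index range (n > tokens.size()) simply yields an empty result.
        return IntStream.range(0, tokens.size() - n + 1)
                .parallel() // windows are independent, so they may be built concurrently
                .mapToObj(i -> tokens.subList(i, i + n).toArray(new Integer[0]))
                .toArray(Integer[][]::new); // toArray keeps encounter order, even in parallel
    }

    public static void main(String[] args) {
        List<Integer> tokens = Arrays.asList(1, 2, 3, 4, 5, 6, 7);
        System.out.println(Arrays.deepToString(nGramArrays(tokens, 3)));
        // prints [[1, 2, 3], [2, 3, 4], [3, 4, 5], [4, 5, 6], [5, 6, 7]]
    }
}
```

Each window depends only on its own start index, which is why the index stream can be switched to parallel without any shared mutable state. For short inputs the parallel overhead will likely outweigh any gain, so the .parallel() call is worth measuring before keeping.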
