Java 8 Streams Remove Duplicate Letter

Tags: , ,



I’m trying to apply my knowledge of streams to some leetcode algorithm questions. Here is a general summary of the question:

Given a string which contains only lowercase letters, remove duplicate letters so that every letter appears once and only once. You must make sure your result is the smallest in lexicographical order among all possible results.

Example:

Input: "bcabc"
Output: "abc"

Another example:

Input: "cbacdcbc"
Output: "acdb"

This seemed like a simple problem, just stream the values into a new list from the string, sort the values, find the distinct values, and then throw it back into a list, and append the list’s value to a string. Here is what I came up with:

public String removeDuplicateLetters(String s)
{
    char[] c = s.toCharArray();
    List<Character> list = new ArrayList<>();
    for(char ch : c) 
    {
        list.add(ch);
    }
    
    List<Character> newVal = list.stream().distinct().collect(Collectors.toList()); 
    String newStr = "";
    for(char ch : newVal) 
    {
        newStr += ch;
    }
    
    return newStr;
}

The first example is working perfectly, but instead of “acdb” for the second output, I’m getting “abcd”. Why would abcd not be the least lexicographical order? Thanks!

Answer

As I had pointed out in the comments using a LinkedHashSet would be best here, but for the Streams practice you could do this:

public static String removeDuplicateLetters(String s) {
    return s.chars().sorted().distinct().collect(
        StringBuilder::new,
        StringBuilder::appendCodePoint,
        StringBuilder::append
    ).toString();
}

Note: distinct() comes after sorted() in order to optimize the stream, see Holger’s explanation in the comments as well as this answer.

Lot of different things here so I’ll number them:

  1. You can stream the characters of a String using String#chars() instead of making a List where you add all the characters.

  2. To ensure that the resulting string is smallest in lexographical order, we can sort the IntStream.

  3. We can convert the IntStream back to a String by performing a mutable reduction with a StringBuilder. We then convert this StringBuilder to our desired string.

A mutable reduction is the Stream way of doing the equivalent of something like:

for (char ch : newVal) {
    newStr += ch;
}

However, this has the added benefit of using a StringBuilder for concatenation instead of a String. See this answer as to why this is more performant.

For the actual question you have about the conflict of expected vs. observed output: I believe abcd is the right answer for the second output, since it is the smallest in lexographical order.



Source: stackoverflow