How do I remove duplicate lines and ignore some of the text? [closed]

There is a list containing lines of the following form:

1/ce/a6/5a/1cea65ab9260df8d55fb29ce0df570d3.jpg ::: 2021-09-17T17:07:52Z

How do I remove duplicate lines while ignoring the date? That is, the comparison should skip the date at the end of the line:

::: 2021-09-17T17:07:52Z

Only the first part of the string before the date is important:

1/ce/a6/5a/1cea65ab9260df8d55fb29ce0df570d3.jpg

Answer

This should work:

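import java.util.HashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class Main {
    public static void main(String[] args) {
        String[] input = {"1/ce/a6/5a/1cea65ab9260df8d55fb29ce0df570d3.jpg ::: 2021-09-17T17:07:52Z",
                "1/ce/a6/5a/1cea65ab9260df8d55fb29ce0df570d4.jpg ::: 2021-09-17T17:07:52Z",
                "1/ce/a6/5a/1cea65ab9260df8d55fb29ce0df570d3.jpg ::: 2021-09-17T17:07:00Z"};

        // Maps each key (the text before ":::") to the first line seen with that key.
        Map<String, String> outMap = new HashMap<>();
        // Remembers the keys in the order they first appear.
        List<String> keys = new LinkedList<>();
        for (String line : input) {
            int sep = line.indexOf(":::");
            // A line without ":::" is used as its own key instead of throwing.
            String key = sep >= 0 ? line.substring(0, sep) : line;
            // putIfAbsent returns null only when the key was not yet present.
            String oldVal = outMap.putIfAbsent(key, line);
            if (oldVal == null) {
                keys.add(key);
            }
        }
        List<String> collect = keys.stream().map(outMap::get).collect(Collectors.toList());
        collect.forEach(System.out::println);
    }
}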

For each line, the part before the ::: is used as a key. The HashMap records whether a line with that key has already been seen: putIfAbsent stores only the first occurrence and returns null when the key is new.

A HashMap has one drawback, though: it does not preserve the order in which entries were inserted. To work around this, the List<String> keys records the keys in the order they first appear.

This code prints:

1/ce/a6/5a/1cea65ab9260df8d55fb29ce0df570d3.jpg ::: 2021-09-17T17:07:52Z
1/ce/a6/5a/1cea65ab9260df8d55fb29ce0df570d4.jpg ::: 2021-09-17T17:07:52Z
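
As a possible variation on the approach above, a LinkedHashMap keeps its entries in insertion order, so the separate key list may not be needed. The following is only a minimal sketch: the class name DedupSketch and the helper dedupe are hypothetical, and treating a line without ::: as its own key is an assumption, not something the question specifies.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class DedupSketch {
    // Hypothetical helper: keeps the first line seen for each key, in input order.
    static List<String> dedupe(List<String> lines) {
        // LinkedHashMap iterates in insertion order, unlike HashMap.
        Map<String, String> firstByKey = new LinkedHashMap<>();
        for (String line : lines) {
            int sep = line.indexOf(":::");
            // Assumption: a line without ":::" is compared as a whole.
            String key = (sep >= 0 ? line.substring(0, sep) : line).trim();
            // putIfAbsent leaves the first stored line untouched for repeated keys.
            firstByKey.putIfAbsent(key, line);
        }
        return new ArrayList<>(firstByKey.values());
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList(
                "1/ce/a6/5a/1cea65ab9260df8d55fb29ce0df570d3.jpg ::: 2021-09-17T17:07:52Z",
                "1/ce/a6/5a/1cea65ab9260df8d55fb29ce0df570d4.jpg ::: 2021-09-17T17:07:52Z",
                "1/ce/a6/5a/1cea65ab9260df8d55fb29ce0df570d3.jpg ::: 2021-09-17T17:07:00Z");
        dedupe(input).forEach(System.out::println);
    }
}
```

With the three sample lines, this keeps the same two lines as the output shown above. Trimming the key is a small design choice so that stray whitespace before the ::: does not make two otherwise identical paths look different.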
User contributions licensed under: CC BY-SA