Deleting Strings from an ArrayList between 2 specified tags

I am trying to remove all elements of an ArrayList between a start tag and an end tag.

My list and my tags:

String startTag = "<p>";
String endTag = "</p>";
List<String> elements = new ArrayList<>();

Let’s say my list looks like this:

[<text>, <p>, <text>, clean me, </text>, </p>, </text>]

I only want to delete the contents between the specified tags, along with the tags themselves. This is my code for doing that:

boolean delete = false;
List<String> remove = new ArrayList<>();
for (String element : elements) {
    if (delete) {
        remove.add(element);
    }

    if (element.startsWith(startTag)) {
        delete = true;
        remove.add(element);
    }
    if (element.endsWith(endTag)) {
        delete = false;
        remove.add(element);
    }
}
elements.removeAll(remove);

This is what my list “remove” looks like after that:

[<p>, <text>, clean me, </text>, </p>, </p>]

So after deleting those elements from my list it looks like this:

[]

When it should look like this:

[<text>, </text>]

How can I prevent Strings that have duplicates from being deleted when they are outside of the deletion range?

Answer

How can I prevent Strings that have duplicates from being deleted when they are outside of the deletion range?

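By identifying the range to delete by element index instead of by element value. removeAll deletes every element that is equal to any entry of its argument, wherever that element appears, which is why the outer <text> and </text> disappear as well. There are lots of ways you could work with indexes instead, but here’s one that I like: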

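// Assumes elements, startTag and endTag as declared in the question.
// elements is modified in place; result collects the deleted elements.
List<String> remainingElements = elements;
List<String> result = new ArrayList<>();

for (int start = remainingElements.indexOf(startTag);
         start >= 0;
         start = remainingElements.indexOf(startTag)) {
    // View of the list from the start tag to the end.
    List<String> tail = remainingElements.subList(start, remainingElements.size());
    int end = tail.indexOf(endTag);

    if (end >= 0) {
        // Range from the start tag through the end tag, inclusive.
        List<String> range = tail.subList(0, end + 1);
        result.addAll(range);
        range.clear();               // also removes these elements from elements
        remainingElements = tail;    // continue the search after the deleted range
    } else {
        break;                       // no matching end tag: stop
    }
}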

Note in particular that a subList is backed by its parent list, so that modifications to the former are reflected in the latter.
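To make the view behaviour concrete, here is a minimal sketch. The class name SubListViewDemo and the helper removeRange are made up for illustration and are not part of the code above; the only library behaviour relied on is that clearing a subList view removes the same elements from the backing list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class SubListViewDemo {
    // Hypothetical helper: removes the elements at indices [from, to)
    // by clearing a subList view of the given (modifiable) list.
    static List<String> removeRange(List<String> list, int from, int to) {
        list.subList(from, to).clear(); // the change is reflected in list itself
        return list;
    }

    public static void main(String[] args) {
        List<String> elements = new ArrayList<>(Arrays.asList("a", "b", "c", "d", "e"));
        removeRange(elements, 1, 4);
        System.out.println(elements); // prints [a, e]
    }
}
```

Because the deletion is addressed by position, any other "b", "c" or "d" elsewhere in the list would be left alone.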

Note also that the details presented here follow the apparent idea of your original example: they match the first appearance of startTag with the first appearance of endTag after it. This might not be what you actually want if you need to account for tag nesting. For example, with startTag = "<text>" and endTag = "</text>", the list would end up as [</p>, </text>]. You can still use subList in such a case, but you need to be cleverer about identifying the range boundaries.
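If nesting does matter, one possible way to find the boundaries is to count depth while scanning. The following is only a sketch under some assumptions: the class NestedTagRemover and the method removeNested are hypothetical names, tags are compared with equals (as indexOf does) rather than with startsWith/endsWith, and an unbalanced start tag simply stops the processing.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class NestedTagRemover {
    // Hypothetical helper: removes every startTag ... endTag range in place,
    // pairing each start tag with its matching end tag by counting depth.
    static List<String> removeNested(List<String> elements, String startTag, String endTag) {
        int i = 0;
        while (i < elements.size()) {
            if (!elements.get(i).equals(startTag)) {
                i++;
                continue;
            }
            int depth = 0;
            int end = -1;
            for (int j = i; j < elements.size(); j++) {
                String e = elements.get(j);
                if (e.equals(startTag)) {
                    depth++;
                } else if (e.equals(endTag)) {
                    depth--;
                }
                if (depth == 0) { // found the end tag matching the start tag at i
                    end = j;
                    break;
                }
            }
            if (end < 0) {
                break; // no matching end tag: leave the rest untouched
            }
            elements.subList(i, end + 1).clear(); // delete by index, not by value
            // i is not advanced: the element now at i has not been checked yet
        }
        return elements;
    }

    public static void main(String[] args) {
        List<String> elements = new ArrayList<>(Arrays.asList(
                "<text>", "<p>", "<text>", "clean me", "</text>", "</p>", "</text>"));
        System.out.println(removeNested(elements, "<p>", "</p>")); // prints [<text>, </text>]
    }
}
```

With startTag = "<text>" and endTag = "</text>", the same input is emptied completely, because the outer pair encloses everything.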

User contributions licensed under: CC BY-SA