Counting Inversions using TreeSet Java

Question

I'm using TreeSet for solving counting inversions problem. I'm using following approach which uses gnu_pbds which works in O(logn) time. Algorithm Insert the first element of the array in the Ordered_Set. For all the remaining element in arr[] do the following: Insert the current element in the Ordered_Set. Find the number of element strictly less than current element + 1

Accepted Answer

The order_of_key function returns the number of elements less than the provided key. In your C++ code, you add the element to the set and then call order_of_key element plus one, in order to include the element you just added.Preserving this arrangement in the Java code, the way to implement something like order_of_key faithfully using a Java TreeSet is to callreturn s.headSet(k, false).size();or simplyreturn s.headSet(k).size();because either of these returns the size of the head set containing elements strictly less than k. There’s no need to do a contains-check first. When I run this using the input array from your C++ code, I get the same intermediate results and final result for the number of inversions: 6.Note that Java’s TreeSet also includes the ability to create the tail-set, that is, the elements that are greater than the provided element. This turns out to be (in my opinion) a much simpler way to compute the number of inversions. Prior to adding an element, the size of the tail-set that is larger than the current element is the number of inversions relative to this element. Since we want the tail-set to be strictly larger than the current element, we use tailSet(k, false). We can then do this for each element in the input list:int inversions(List a) { var s = new TreeSet(); int invcount = 0; for (int k : a) { invcount += s.tailSet(k, false).size(); s.add(k); } return invcount;}inversions(List.of(32, 35, 43, 1, 38, 39, 42)) // result is 6UPDATE 2020-06-24The above code works only for input with unique values. If the input has duplicate values, it doesn’t work. I note that in the C++ code, a tree that uses the less_equal comparison function preserves duplicates, while using less compresses out duplicates.The reason that keeping duplicates is important is that each element — even if it’s a duplicate — can count as an inversion. Thus, input of [2, 2, 1] is considered to have two inversions. A Java TreeSet compresses out duplicates, so we have to do some additional work to preserve them.One way to allow “duplicate” int values is to make them unique somehow. This can be done by creating a new object to contain the int value paired with a counter that’s always incremented. Duplicate values are made unique since they’ll have different counter values. Here’s a class that does that:static class UniqInt implements Comparable { static int count = 0; final int value; final int uniq; UniqInt(int value) { this.value = value; uniq = count++; } public int compareTo(UniqInt other) { int c = Integer.compare(this.value, other.value); return c != 0 ? c : Integer.compare(this.uniq, other.uniq); }}Note that the compareTo method here compares both the value and the “uniq” counter value, so creating multiple UniqInt instances will differ from each other and will be totally ordered. Once we have this class, we essentially do the same thing, except that we keep track of UniqInt instead of Integer objects in the TreeSet.int inversions1(List a) { var s = new TreeSet(); int invcount = 0; for (int k : a) { var u = new UniqInt(k); invcount += s.tailSet(u, false).size(); s.add(u); } return invcount;}For the input provided in the comment (which includes duplicate values),84, 2, 37, 3, 67, 82, 19, 97, 91, 63, 27, 6, 13, 90, 63, 89, 100, 60, 47, 96, 54, 26, 64, 50, 71, 16, 6, 40, 84, 93, 67, 85, 16, 22, 60this gives the expected result of 290.This is a bit fragile, however. Note that creating a new UniqInt with the same value always creates an instance that is greater than any existing one because the counter is always incremented. (Until you’ve created 2^31 of them.) When the new instance is created, its tail-set will never include any of the duplicate values already in the TreeSet. This is probably reasonable for this small example, but if this were part of a larger system, I’d think more carefully about how to get the correct head-set or tail-set relative to some value, without having to rely on the most recently created UniqInt being greater than all previous ones.

Counting Inversions using TreeSet Java

Algorithm

My code

Respective C++ Code

Observations

Advertisement

Answer