Ambiguity in a CodeForces Problem – usage of HashSet Vs LinkedHashSet

Tags: , , , ,



I was solving a Codeforces problem yesterday. The problem’s URL is this

I will just explain the question in short below.

Given a binary string, divide it into a minimum number of subsequences
in such a way that each character of the string belongs to exactly one
subsequence and each subsequence looks like “010101 …” or “101010
…” (i.e. the subsequence should not contain two adjacent zeros or
ones).

Now, for this problem, I had submitted a solution yesterday during the contest. This is the solution. It got accepted temporarily and on final test cases got a Time limit exceeded status.

So today, I again submitted another solution and this passed all the cases.

In the first solution, I used HashSet and in the 2nd one I used LinkedHashSet. I want to know, why didn’t HashSet clear all the cases? Does this mean I should use LinkedHashSet whenever I need a Set implementation? I saw this article and found HashSet performs better than LinkedHashSet. But why my code doesn’t work here?

Answer

This question would probably get more replies on Codeforces, but I’ll answer it here anyways.

After a contest ends, Codeforces allows other users to “hack” solutions by writing custom inputs to run on other users’ programs. If the defending user’s program runs slowly on the custom input, the status of their code submission will change from “Accepted” to “Time Limit Exceeded”.

The reason why your code, specifically, changed from “Accepted” to “Time Limit Exceeded” is that somebody created an “anti-hash test” (a test on which your hash function results in many collisions) on which your program ran slower than usual. If you’re interested in how such tests are generated, you can find several posts on Codeforces, like this one: https://codeforces.com/blog/entry/60442.

As linked by @Photon, there’s a post on Codeforces explaining why you should avoid using Java.HashSet and Java.HashMap: https://codeforces.com/blog/entry/4876, which is essentially due to anti-hash tests. In some instances, adding the extra log(n) factor from a balanced BST may not be so bad (by using TreeSet or TreeMap). In many cases, an extra log(n) factor won’t make your code time out, and it gives you protection from anti-hash tests.

How do you determine whether your algorithm is fast enough to add the log(n) factor? I guess this comes with some experience, but most people suggest performing some sort of calculation. Most online judges (including Codeforces) show the time that your program is allowed to run on a particular problem (usually somewhere between one and four seconds), and you can use 10^9 constant-time operations per second as a rule of thumb when performing calculations.



Source: stackoverflow