# Find top k frequent words in real time data stream

I am trying to solve an algorithms problem using java tree set.

The problem as follows:

```Find top k frequent words in realtime data stream.

Implement three methods for Topk Class:

TopK(k). The constructor.
topk(). Get the current top k frequent words.
```

And my thought was to use a hashmap to remember frequencies and a treeset as a buffer.

My implementation passed most of the case, except one:

```TopK(10)
topk()
```

The answer supposed to be [fb,aw] but now it’s [fb,aw, fb] However, my code passed test case like:

```    TopK(10)
topk()
```

and

```TopK(10)
topk()
```

I have no idea what’s wrong, so I printed some value when the comparator is called. It gave me this:

``` aw aw
11111111
fb aw
33333333
fb aw
33333333
fb aw
222222222
fb aw
222222222
```

Which means, the second “fb” was compared to “aw” twice and the comparator is done. I spent hours to debug and I have found nothing so far.

Here is my implementation:

```public class TopK {
int size;
HashMap<String, Integer> map;
TreeSet<String> seen;
public TopK(int k) {
// do intialization if necessary
size = k;
seen = new TreeSet<String>(new Comparator<String>(){
@Override
public int compare(String str1, String str2){
System.out.println(str1 + " "+ str2);
if (str1.equals(str2)){
System.out.println("11111111");
return 0;
}
// important !https://www.jiuzhang.com/qa/7646/
// 128 以后integer就不同了
int number1 = map.get(str1);
int number2 = map.get(str2);
if (number1 != number2){
System.out.println("222222222");
return map.get(str1) - map.get(str2);
} else {
System.out.println("33333333");
return str2.compareTo(str1);
}
}
});
map = new HashMap<String, Integer>();
}

/*
* @param word: A string
* @return: nothing
*/
if (!map.containsKey(word)){
map.put(word, 0);
}
map.put(word, map.get(word) + 1);

if (seen.contains(word)){
seen.remove(word);
} else {
if (seen.size() > size){
seen.pollFirst();
}
}
}

/*
* @return: the current top k frequent words.
*/
public List<String> topk() {
List<String> results = new ArrayList<String>();
Iterator it = seen.iterator();
while(it.hasNext()) {
String str = (String)it.next();
}
return results;
}
}
```

Our first clue is that case:

```aw
fb
fb
```

Fails but:

```iiiii
fb
fb
```

Succeed.

This can happen only because of the line: `return str2.compareTo(str1);` – if the number of appearances are different order by string compare (this could be check easily – please do that).

The only explanation I can think of is the `contains` function of java TreeSet has “optimization” of searching only until where the element should be – if you have order and the element is not where it should be then consider it as none exist in the TreeSet (think about array that should be sort for checking for number you run in log(n) but no on all array – so if he exist in wrong position you will miss him).

Notice that you change the place where the element should be before checking the `contains` function. So let look at the 2th iteration – we have `fb` and `aw` both with value of 1 in the map. On the TreeSet the are as `[fb,aw]` (because string compare as explain before). Now you change the map and `fb` have value of 2 -> it should be in the last place but the `contains` function compare to `aw` and think it should be after it – but he is the last element so it assume `fb` does not exist and just add him -> That why you see 2 compares between `fb` and `aw` – one for the `contain` and one for the `add`.

Hope that was understandable….

Source: stackoverflow