Skip to content
Advertisement

Find element position in a Java TreeMap

I am working with a TreeMap of Strings TreeMap<String, String>, and using it to implement a Dictionay of words.

I then have a collection of files, and would like to create a representation of each file in the vector space (space of words) defined by the dictionary.

Each file should have a vector representing it with following properties:

  • vector should have same size as dictionary
  • for each word contained in the file the vector should have a 1 in the position corresponding to the word position in dictionary
  • for each word not contained in the file the vector should have a -1 in the position corresponding to the word position in dictionary

So my idea is to use a Vector<Boolean> to implement these vectors. (This way of representing documents in a collection is called Boolean Model – http://www.site.uottawa.ca/~diana/csi4107/L3.pdf)

The problem I am facing in the procedure to create this vector is that I need a way to find position of a word in the dictionary, something like this:

String key;
int i = get_position_of_key_in_Treemap(key); <--- purely invented method...

1) Is there any method like this I can use on a TreeMap?If not could you provide some code to help me implement it by myself?

2) Is there an iterator on TreeMap (it’s alphabetically ordered on keys) of which I can get position?

3)Eventually should I use another class to implement dictionary?(If you think that with TreeMaps I can’t do what I need) If yes, which?

Thanks in advance.

ADDED PART:

Solution proposed by dasblinkenlight looks fine but has the problem of complexity (linear with dimension of dictionary due to copying keys into an array), and the idea of doing it for each file is not acceptable.

Any other ideas for my questions?

Advertisement

Answer

I’d like to thank all of you for the effort you put in answering my question, they all were very useful and taking the best from each of them made me come up to the solution I actually implemented in my project.


What I beleive to be best answers to my single questions are:

2) There is not an Iterator defined on TreeMaps as @Isoliveira sais:

There's no such implementation in the JDK itself. 
Although TreeMap iterates in natural key ordering,
its internal data structures are all based on trees and not arrays
(remember that Maps do not order keys, by definition, 
in spite of that the very common use case).

and as I found in this SO answer How to iterate over a TreeMap?, the only way to iterate on elements in a Map is to use map.entrySet() and use Iterators defined on Set (or some other class with Iterators).


3) It’s possible to use a TreeMap to implement Dictionary, but this will garantuee a complexity of O(logN) in finding index of a contained word (cost of a lookup in a Tree Data Structure).

Using a HashMap with same procedure will instead have complexity O(1).


1) There exists no such method. Only solution is to implement it entirely.

As @Paul stated

Assumes that once getPosition() has been called, the dictionary is not changed.

assumption of solution is that once that Dictionary is created it will not be changed afterwards: in this way position of a word will always be the same.

Giving this assumption I found a solution that allows to build Dictionary with complexity O(N) and after garantuees the possibility to get index of a word contained with constat time O(1) in lookup.

I defined Dictionary as a HashMap like this:

public HashMap<String, WordStruct> dictionary = new HashMap<String, WordStruct>();
  • key –> the String representing the word contained in Dictionary
  • value –> an Object of a created class WordStruct

where WordStruct class is defined like this:

public class WordStruct {

    private int DictionaryPosition;    // defines the position of word in dictionary once it is alphabetically ordered

    public WordStruct(){

    }

    public SetWordPosition(int pos){
        this.DictionaryPosition = pos;
    }

}

and allows me to keep memory of any kind of attribute I like to couple with the word entry of the Dictionary.

Now I fill dictionary iterating over all words contained in all files of my collection:

THE FOLLOWING IS PSEUDOCODE

for(int i = 0; i < number_of_files ; i++){

        get_file(i);

        while (file_contais_words){

            dictionary.put( word(j) , new LemmaStruct());

        }

}   

Once HashMap is filled in whatever order I use procedure indicated by @dasblinkenlight to order it once and for all with complexity O(N)

    Object[] dictionaryArray = dictionary.keySet().toArray();
    Arrays.sort(dictionaryArray);

    for(int i = 0; i < dictionaryArray.length; i++){

        String word = (String) dictionaryArray[i];
        dictionary.get(word).SetWordPosition(i);

    }

And from now on to have index position in alphatebetic order of word in dictionary only thing needed is to acces it’s variable DictionaryPosition:

since word is know you just need to access it and this has constant cost in a HashMap.


Thanks again and Iwish you all a Merry Christmas!!

Advertisement