I’m building an inverted index, but I can’t get the correct term frequencies when I check the database. I’ve read everywhere that you should use a HashMap, but I’m not sure whether I’m using it correctly. Any ideas?
    public class Tokenize {
        public static void createIndex() throws Exception {
            ArrayList<Dokument> dok = new QueryHandler().getDokuments();
            ArrayList<String> queries = new ArrayList<String>();
            ArrayList<String> queries2 = new ArrayList<String>();
            HashMap<String, Integer> frek = new HashMap<String, Integer>();
            for (int d = 0; d < dok.size(); d++) {
                String token = "";
                int frekvens = 0;
                try {
                    Dokument document = dok.get(d);
                    StringTokenizer st = new StringTokenizer(document.dokument());
                    while (st.hasMoreTokens()) {
                        token = st.nextToken();
                        token.replaceAll("[']", "");
                        token.replaceAll("[,]", "");
                        token.replaceAll("[)]", "");
                        token.replaceAll("[(]", "");
                        token.replaceAll("[.]", "");
                        frekvens++;
                        frek.put(token, frekvens);
                        queries.add("INSERT IGNORE INTO termindeks (docID, term) values ("
                                + document.docID() + ", '" + token + "')");
                        queries2.add("INSERT IGNORE INTO invertedindeks (term, docID, termfrekvens) values ('"
                                + token + "', " + document.docID() + ", " + frekvens + ")");
                    }
                } catch (Exception e) {
                    e.printStackTrace();
                    System.out.println(token);
                }
            }
            String[] ffs = new String[queries.size()];
            ffs = queries.toArray(ffs);
            getDB().runQueriesIgnoreException(queries.toArray(ffs));
            String[] ffs2 = new String[queries2.size()];
            ffs2 = queries2.toArray(ffs2);
            getDB().runQueriesIgnoreException(queries2.toArray(ffs2));
        }
    }
Answer
You should get the current count for the token first, increment it, and then put it back into the map.
Like this, inside your loop:

    Integer frekvens = frek.get(token); //remove the other frekvens as it's not needed - or find a better name for this one ;)
    if (frekvens == null) {
        frekvens = 0; // first time this token is seen
    }
    frekvens++;
    frek.put(token, frekvens);
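A related point, offered as a hedged sketch rather than the answerer's code: in the question, `frekvens` is incremented for every token in the document, not per term, and the `INSERT` strings are built inside the token loop, so each row carries an intermediate value instead of a final per-term count. If `invertedindeks` has a unique key on (term, docID), which is an assumption here, `INSERT IGNORE` would keep only the first of those rows. The code also calls `token.replaceAll(...)` without assigning the result, and since Java strings are immutable, the punctuation is never actually removed. The class below uses hypothetical names (`TermFrequencySketch`, `countTerms`, `buildIndex`) and models documents as a plain `Map<Integer, String>` of docID to text instead of the question's `Dokument` type. It counts terms with a fresh map per document and builds the index only after counting is finished.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringTokenizer;

public class TermFrequencySketch {

    // Counts how often each cleaned token occurs in ONE document's text.
    static Map<String, Integer> countTerms(String text) {
        Map<String, Integer> counts = new HashMap<>();
        StringTokenizer st = new StringTokenizer(text);
        while (st.hasMoreTokens()) {
            // replaceAll returns a new String, so the result must be assigned.
            String token = st.nextToken().replaceAll("[',().]", "");
            if (token.isEmpty()) {
                continue; // the token consisted only of punctuation
            }
            // Same get / increment / put idea as the answer, via Map.merge.
            counts.merge(token, 1, Integer::sum);
        }
        return counts;
    }

    // Builds term -> (docID -> frequency), counting each document separately.
    static Map<String, Map<Integer, Integer>> buildIndex(Map<Integer, String> docs) {
        Map<String, Map<Integer, Integer>> index = new HashMap<>();
        for (Map.Entry<Integer, String> doc : docs.entrySet()) {
            Map<String, Integer> counts = countTerms(doc.getValue());
            for (Map.Entry<String, Integer> e : counts.entrySet()) {
                index.computeIfAbsent(e.getKey(), k -> new HashMap<>())
                     .put(doc.getKey(), e.getValue());
            }
        }
        return index;
    }

    public static void main(String[] args) {
        Map<Integer, String> docs = new LinkedHashMap<>();
        docs.put(1, "the cat (the hat).");
        docs.put(2, "the dog's bone");
        Map<String, Map<Integer, Integer>> index = buildIndex(docs);
        System.out.println(index.get("the"));  // {1=2, 2=1}
        System.out.println(index.get("dogs")); // {2=1}
    }
}
```

Once the counts for a document are final, one row per (term, docID) pair could be inserted with its final frequency, rather than one row per token occurrence.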