
How do I count word occurrences in a csv file?

I have a CSV file that I need to read, and then display the number of occurrences of each word. The application should only count words that have more than one letter and are not alphanumeric, and all words should be converted to lowercase.

This is what I have right now. I'm stuck and have no idea where to go from here.

public static void countWordNumber() throws IOException, CsvException {

        String pathFile1 = "src/main/resources/Documents/Example.csv";

        CSVReader reader = new CSVReaderBuilder(new FileReader(pathFile1)).withSkipLines(1).build();

        Map<String, Integer> frequency = new HashMap<>();
        String[] line;


        while ((line = reader.readNext()) != null) {
            String words = line[1];

            words = words.replaceAll("\\p{Punct}", " ").trim();
            words = words.replaceAll("\\s{2}", " ");
            words = words.toLowerCase();

            if (frequency.containsKey(words)) {
                frequency.put(words, frequency.get(words) + 1);
            } else {
                frequency.put(words, 0);
            }


        }


    }

I am trying to read the second element of the array for each CSV row, line[1], which is where the text of the document is located.

I have replaced all punctuation with spaces and trimmed the result. I have also replaced double spaces with a single space and converted the text to lowercase.

The output I am trying to achieve is:

Title of document: XXXX

Word: is, Value: 3 

EDIT: This is an example of my input file.

title,text,date
exampleTitle,This is is is an example example, April 2022


Answer

Your solution does not look that bad. For the initialization, though, I would replace

frequency.put(words, 0);

with

frequency.put(words, 1);
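
The reason is that with 0 the first occurrence of a word is never counted, so every total ends up one too low. A minimal sketch of the same counting step, using Map.merge from the standard library in place of the containsKey/put pair (the class and method names here are made up for illustration):

```java
import java.util.HashMap;
import java.util.Map;

public class MergeCount {
    // Illustrative helper: counts how often each string appears in the array.
    static Map<String, Integer> count(String[] words) {
        Map<String, Integer> frequency = new HashMap<>();
        for (String word : words) {
            // Stores 1 the first time a word is seen, otherwise adds 1 to the stored count.
            frequency.merge(word, 1, Integer::sum);
        }
        return frequency;
    }

    public static void main(String[] args) {
        Map<String, Integer> frequency = count(new String[] {"is", "is", "is", "an"});
        System.out.println("Word: is, Value: " + frequency.get("is")); // prints "Word: is, Value: 3"
    }
}
```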

Since I am missing your input file, I created a dummy list that works fine.

    Map<String, Integer> frequency = new HashMap<>();
    List<String> csvSimulation = new ArrayList<String>();
    csvSimulation.add("test");
    csvSimulation.add( "marvin");
    csvSimulation.add("aaaaa");
    csvSimulation.add("nothing");
    csvSimulation.add("test");
    csvSimulation.add("test");
    csvSimulation.add("aaaaa");
    csvSimulation.add("stackoverflow");
    csvSimulation.add("test");
    csvSimulation.add("bread");

    Iterator<String> iterator = csvSimulation.iterator();


    while(iterator.hasNext()){
        String words = iterator.next();
        words = words.toLowerCase();
        if (frequency.containsKey(words)) {
            frequency.put(words, frequency.get(words) + 1);
        } else {
            frequency.put(words, 1);
        }

    }

    System.out.println(frequency);

Are you sure that accessing line[1] on each iteration of the while loop is correct? Reading the input correctly seems to be the problem to me. Without seeing your CSV file, however, I can't help you any further.
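
One way to check is to print what the parsed row actually contains. A rough sketch, assuming a made-up row and using a plain String.split(",") as a stand-in for the reader's readNext() (that stand-in is an assumption; a real CSV parser also handles quoted fields, which split does not):

```java
import java.util.Arrays;

public class RowInspection {
    // Hypothetical helper: naive stand-in for a CSV reader, with no support for quoting.
    static String[] parseRow(String row) {
        return row.split(",");
    }

    public static void main(String[] args) {
        // Made-up row in a title,text,date layout.
        String[] line = parseRow("someTitle,some text some text,some date");
        System.out.println(Arrays.toString(line));
        System.out.println("line[1] = " + line[1]);
    }
}
```

If line[1] turns out to be a whole sentence rather than a single word, it has to be split into words before counting.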

EDIT:

With the provided CSV data, an adjustment to your code like this would solve your problem:

.....
.....
    while ((line = reader.readNext()) != null) {
        String words = line[1];

        // Replace punctuation with spaces, then lowercase the text.
        words = words.replaceAll("\\p{Punct}", " ").trim();
        words = words.toLowerCase();
        // Split on any run of whitespace so repeated spaces do not produce empty words.
        String[] singleWords = words.split("\\s+");

        for (String currentWord : singleWords) {
            // Skip one-letter words (and the empty string left by a blank field).
            if (currentWord.length() < 2) {
                continue;
            }
            if (frequency.containsKey(currentWord)) {
                frequency.put(currentWord, frequency.get(currentWord) + 1);
            } else {
                frequency.put(currentWord, 1);
            }
        }
    }

    System.out.println("Word: is, Value: " + frequency.get("is"));
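
To get closer to the output format from the question, the counting can be pulled into its own method and tried without the CSV reader. This is only a sketch under a few assumptions: the class and method names are invented, the title and text are typed in from the sample row instead of being read from the file, and "not alphanumerical" is interpreted as "drop any token that contains a digit".

```java
import java.util.Map;
import java.util.TreeMap;

public class WordFrequency {
    // Illustrative helper: counts lowercase, letters-only words with at least two letters.
    static Map<String, Integer> countWords(String text) {
        Map<String, Integer> frequency = new TreeMap<>(); // sorted keys give a stable print order
        // Split on runs of anything that is not an ASCII letter or digit,
        // so punctuation and repeated spaces never end up inside a token.
        for (String word : text.toLowerCase().split("[^\\p{Alnum}]+")) {
            // Keep only purely alphabetic tokens of length two or more (assumed reading of the rules).
            if (word.matches("\\p{Alpha}{2,}")) {
                frequency.merge(word, 1, Integer::sum);
            }
        }
        return frequency;
    }

    public static void main(String[] args) {
        // Values copied from the sample row: these would be line[0] and line[1].
        String title = "exampleTitle";
        String text = "This is is is an example example";

        System.out.println("Title of document: " + title);
        countWords(text).forEach((word, count) ->
                System.out.println("Word: " + word + ", Value: " + count));
    }
}
```

Inside the while loop this would be called as countWords(line[1]) for each row. Note that \p{Alpha} and \p{Alnum} only cover ASCII in Java by default, so text with accented letters would need a different pattern.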
User contributions licensed under: CC BY-SA