Unable to parse header from GitHub CSV URL using Apache Commons CSV



I’m trying to read the header values for each record of a CSV file hosted on GitHub, using the Apache Commons CSV library.

This is my code:

@Service
public class CoronaVirusDataService {

    private static String virus_data_url = "https://raw.githubusercontent.com/CSSEGISandData/COVID-19/Aysen_Chile_07032021/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv";
    
    @PostConstruct
    public void getVirusData()
    {
        try
        {
        URL url = new URL(virus_data_url);
        HttpURLConnection con = (HttpURLConnection) url.openConnection();
        BufferedReader in = new BufferedReader( new InputStreamReader(con.getInputStream()));
        
        while((in.readLine()) != null)
        {
            StringReader csvReader = new StringReader(in.readLine());
            Iterable<CSVRecord> records = CSVFormat.DEFAULT.withFirstRecordAsHeader().parse(csvReader);
            for (CSVRecord record : records) {
                String country = record.get("Country/Region");
                System.out.println(country);
            }       
        }
        in.close();
        }
        catch(Exception e) 
        {
            e.printStackTrace();
        }
    }
}

When I run the application, I get this error:

java.lang.IllegalArgumentException: A header name is missing in [, Afghanistan, 33.93911, 67.709953, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 2, 4, 4, 4, 4, 5, 7, 8, 11, 12, 13, 15, 16, 18, 20, 24, 25, 29, 30, 34, 41, 43, 76, 80, 91, 107, 118, 146, 175, 197, 240, 275, 300, 338, 368, 424, 445, 485, 532, 556, 608, 666, 715, 785, 841, 907, 934, 997, 1027, 1093]
at org.apache.commons.csv.CSVParser.createHeaders(CSVParser.java:501)
at org.apache.commons.csv.CSVParser.<init>(CSVParser.java:412)
at org.apache.commons.csv.CSVParser.<init>(CSVParser.java:378)
at org.apache.commons.csv.CSVFormat.parse(CSVFormat.java:1157)
at com.p1.Services.CoronaVirusDataService.getVirusData(CoronaVirusDataService.java:34)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)

Answer

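The loop calls in.readLine() twice per iteration: the call in the while condition reads a line and throws it away, and the call in the body reads the next one. On the first iteration the real header line is discarded and the Afghanistan row is handed to the parser on its own. Because of withFirstRecordAsHeader(), Commons CSV treats that single row as the header, and since its first column (Province/State) is empty, it throws "A header name is missing". Don't read the stream line by line. Pass the reader itself to parse(), so the parser consumes the first line as the header and every following line as a record. The code below works: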

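import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// javax.annotation on Java 8 / older Spring; jakarta.annotation on Spring Boot 3+
import javax.annotation.PostConstruct;

import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVRecord;
import org.springframework.stereotype.Service;

@Service
public class CoronaVirusDataService {

    private static String virus_data_url = "https://raw.githubusercontent.com/CSSEGISandData/COVID-19/Aysen_Chile_07032021/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv";

    @PostConstruct
    public void getVirusData()
    {
        try
        {
            URL url = new URL(virus_data_url);
            HttpURLConnection con = (HttpURLConnection) url.openConnection();

            // try-with-resources closes the reader even if parsing throws.
            // Assumption: the CSV is UTF-8 encoded.
            try (BufferedReader in = new BufferedReader(
                    new InputStreamReader(con.getInputStream(), StandardCharsets.UTF_8)))
            {
                // Hand the whole reader to the parser: the first line becomes
                // the header, every following line becomes a CSVRecord.
                Iterable<CSVRecord> records = CSVFormat.DEFAULT.withFirstRecordAsHeader().parse(in);
                for (CSVRecord record : records) {
                    String country = record.get("Country/Region");
                    System.out.println(country);
                }
            }
        }
        catch (Exception e)
        {
            e.printStackTrace();
        }
    }
}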
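
The skipped-line effect of the original loop can be reproduced without network access or a CSV library. The following is only an illustrative sketch: the class name DoubleReadLineDemo, the method linesSeenByLoopBody, and the shortened sample data are made up for this example and are not part of the question's code.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class, not part of the original service.
class DoubleReadLineDemo {

    // Mirrors the loop from the question: readLine() is called once in the
    // while condition (result discarded) and once more in the loop body.
    // Returns the lines that the loop body actually gets to see.
    public static List<String> linesSeenByLoopBody(String text) {
        List<String> seen = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(new StringReader(text))) {
            while (in.readLine() != null) {   // consumes and drops one line
                String line = in.readLine();  // consumes the NEXT line (may be null)
                seen.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return seen;
    }

    public static void main(String[] args) {
        // Shortened stand-in for the real file: one header line, three rows.
        String csv = "Province/State,Country/Region\n"
                   + ",Afghanistan\n"
                   + ",Albania\n"
                   + ",Algeria\n";
        // The header and every second row never reach the loop body.
        System.out.println(linesSeenByLoopBody(csv)); // prints [,Afghanistan, ,Algeria]
    }
}
```

The first line the body sees is the Afghanistan row, which is exactly the row named in the exception message. With an odd number of lines, the second readLine() returns null on the last iteration, so the original code would additionally fail with a NullPointerException in new StringReader(null).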


Source: stackoverflow