I have a large file containing two million lines. I want to iterate over the file, convert each line into a key-value pair, and store it in a HashMap so I can make comparisons later on. However, in the interest of space complexity, I do not want a HashMap holding all 2 million key-value pairs at once.
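To make the idea concrete, here is a minimal sketch of one possible approach: stream the file line by line and keep only a bounded batch of pairs in memory at a time. The `key=value` line format, the class name `BatchedLineProcessor`, the method `processInBatches`, and the batch size are all assumptions for illustration; the actual parsing and comparison logic are not specified above.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;

public class BatchedLineProcessor {

    // Reads lines of the assumed form "key=value" and hands them to onBatch
    // in maps of at most batchSize entries. Returns the number of batches.
    public static int processInBatches(Reader source, int batchSize,
                                       Consumer<Map<String, String>> onBatch) throws IOException {
        int batches = 0;
        Map<String, String> batch = new HashMap<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                int sep = line.indexOf('=');
                if (sep < 0) {
                    continue; // skip lines without a separator (assumed policy)
                }
                batch.put(line.substring(0, sep), line.substring(sep + 1));
                if (batch.size() >= batchSize) {
                    onBatch.accept(batch);   // do the comparisons for this batch
                    batch = new HashMap<>(); // drop the old map so it can be collected
                    batches++;
                }
            }
        }
        if (!batch.isEmpty()) {
            onBatch.accept(batch); // flush the final, partially filled batch
            batches++;
        }
        return batches;
    }

    public static void main(String[] args) throws IOException {
        // Small in-memory input for demonstration only.
        Reader demo = new StringReader("a=1\nb=2\nc=3\n");
        processInBatches(demo, 2, batch -> System.out.println("batch: " + batch));
    }
}
```

For a real file, the `StringReader` could be replaced with `Files.newBufferedReader(path)`. Note that this only helps if the later comparisons can be done one batch at a time; if every key must be compared against every other key, some other structure (for example sorting the file on disk first) would be needed.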
I need to develop an application that processes CSV files as soon as they are created in a predefined directory. A huge number of incoming files is expected. I have seen applications use Apache Commons IO file monitoring in production, and it works pretty well: I have seen it process as many as 21 million files in a day.
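For comparison, here is a rough sketch of the same "react when a CSV file appears" idea using only the JDK's `java.nio.file.WatchService`, not the Apache Commons IO monitor mentioned above. The class name `CsvDirectoryWatcher`, the helper `isCsv`, and the callback are hypothetical names chosen for illustration.

```java
import java.io.IOException;
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardWatchEventKinds;
import java.nio.file.WatchEvent;
import java.nio.file.WatchKey;
import java.nio.file.WatchService;
import java.util.Locale;
import java.util.function.Consumer;

public class CsvDirectoryWatcher {

    // True if the file name ends with ".csv", ignoring case.
    public static boolean isCsv(Path file) {
        Path name = file.getFileName();
        return name != null && name.toString().toLowerCase(Locale.ROOT).endsWith(".csv");
    }

    // Blocks and calls onCsv for each CSV file created directly in dir.
    public static void watch(Path dir, Consumer<Path> onCsv)
            throws IOException, InterruptedException {
        try (WatchService watcher = FileSystems.getDefault().newWatchService()) {
            dir.register(watcher, StandardWatchEventKinds.ENTRY_CREATE);
            while (true) {
                WatchKey key = watcher.take(); // waits until events are available
                for (WatchEvent<?> event : key.pollEvents()) {
                    if (event.kind() == StandardWatchEventKinds.OVERFLOW) {
                        continue; // events were lost; a rescan of dir would be needed
                    }
                    Path created = dir.resolve((Path) event.context());
                    if (isCsv(created)) {
                        onCsv.accept(created);
                    }
                }
                if (!key.reset()) {
                    break; // directory is no longer accessible
                }
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        watch(dir, file -> System.out.println("New CSV: " + file));
    }
}
```

Two caveats with this sketch: the create event can fire before the writer has finished the file, so the callback may need to wait or check for a completion marker, and under very heavy load the `OVERFLOW` event means some creations were missed and the directory should be rescanned.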