Java Programmingtechnic to check files performant?

Question

I'm searching for a pattern or programming technic in Java for my service. The Input of the service is an Array of Strings. These Strings represent a file and some information. The output of the service is a Map of the exact String and a Boolean. The Boolean represents that the information exists in the file. e.g. Input Output I

Accepted Answer

To make an efficient solution, the assumptions and preconditions of the system must be considered.Assuming the file sizes are smallIf each file is going to be relatively small, then I’d recommend caching the contents of the files.When search patterns are unpredictableIf you cannot predict the possible search terms or the contents of the files cannot be split into searchable items, then simply checking the cached file contents with String#contains will have to do.However, if search terms are predictable, or files contain a set number of search terms, then you can attempt to index these terms using the methods discussed below (each have different computation requirements).O(m) time complexity approach – where m is number of filesAssuming each search term is going to be a word in the file, it’s a good idea to store the file contents as a HashSet (to take advantage of the O(1) time complexity of HashSet#contains), with each element being a word or potential search term.That way, you could have a Map like so://the following code runs once at startup//map containing file names against their words as a HashSetMap> fileContents = new HashMap<>();//add files to Map (assuming parseFileContents returns a HashSet)fileContents.put("file1", parseFileContents("file1"));fileContents.put("file2", parseFileContents("file2"));fileContents.put("file3", parseFileContents("file3"));//the following code runs every time a request is made//get files containing the word "dog" (time complexity is O(m) where m is the number of files)fileContents.entrySet() .stream() .filter(e -> e.getValue().contains("dog")) .map(Map.Entry::getKey) .collect(LinkedList::new);O(1) time complexity approachIf the search patterns were predictable and there weren’t many, then you could create a Map where search terms are keys and a List of files are the values://the following code runs once at startup//map containing possible search patterns against files that they appear inMap> patternsAgainstFiles = new HashMap<>();//then loop through all files and each term found in the file as a key to the map,//with the file being added to the list in the value//the following code runs every time a request is made//get files containing the word "dog" (time complexity is O(1) after initial computation at startup)patternsAgainstFiles.getOrDefault("dog", new LinkedList<>());Other optionUse this option when:Files are too large to store in memoryFiles could change during program executionIn this case, you have no choice but to actually search the file in secondary storage every time you wish to search for the given pattern in the file.The best method I can think for doing this at the time of writing, is opening a FileInputStream (or something similar) for each file, then reading one byte at a time and appending it to a ‘current buffer’ which is never longer than the length of the search term in bytes. Whenever the ‘current buffer’ has the same length (in bytes) as the search term, it is evaluated against the search term, and if they match then the search term is in the file, but if they don’t match pop the first byte in the ‘current buffer’ and continue.The above is all my suggestions and there may be more efficient ways to achieve the task at hand. These are just the best approaches I could think of as of writing.

Java Programmingtechnic to check files performant?

Advertisement

Answer

Assuming the file sizes are small

When search patterns are unpredictable

O(`m`) time complexity approach – where `m` is number of files

O(1) time complexity approach

Other option