I have a big test file with 70 million lines of text. I have to read the file line by line.
I used two different approaches:
InputStreamReader isr = new InputStreamReader(new FileInputStream(FilePath), "unicode");
BufferedReader br = new BufferedReader(isr);
while ((cur = br.readLine()) != null);
and
LineIterator it = FileUtils.lineIterator(new File(FilePath), "unicode");
while (it.hasNext())
    cur = it.nextLine();
Is there another approach that can make this task faster?
Answer
1) I am sure there is no difference speed-wise; both use FileInputStream internally and buffer the input.
2) You can take measurements and see for yourself
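For example, a rough timing harness (a hypothetical sketch, assuming a test.txt path; run each variant several times so the OS page cache and JIT warm-up do not skew the numbers):

import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;

public class ReadTiming {
    public static void main(String[] args) throws IOException {
        long start = System.nanoTime();
        long lines = 0;
        // Time the plain BufferedReader variant; swap in the
        // LineIterator variant here to compare the two approaches.
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(new FileInputStream("test.txt"), "unicode"))) {
            while (br.readLine() != null) {
                lines++;
            }
        }
        System.out.printf("%d lines in %d ms%n",
                lines, (System.nanoTime() - start) / 1_000_000);
    }
}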
3) Though there are no performance benefits, I like the Java 7 approach:
try (BufferedReader br = Files.newBufferedReader(Paths.get("test.txt"), StandardCharsets.UTF_8)) {
    for (String line = null; (line = br.readLine()) != null; ) {
        // process the line
    }
}
4) A Scanner-based version:
try (Scanner sc = new Scanner(new File("test.txt"), "UTF-8")) {
    while (sc.hasNextLine()) {
        String line = sc.nextLine();
        // process the line
    }
    // note that Scanner suppresses exceptions
    if (sc.ioException() != null) {
        throw sc.ioException();
    }
}
5) This may be faster than the rest:

try (SeekableByteChannel ch = Files.newByteChannel(Paths.get("test.txt"))) {
    ByteBuffer bb = ByteBuffer.allocateDirect(1000);
    for (;;) {
        StringBuilder line = new StringBuilder();
        int n = ch.read(bb);
        if (n == -1) break;   // end of file
        // decode the bytes and append chars to line
        // ...
    }
}
It requires a bit of coding, but it can be really fast because of ByteBuffer.allocateDirect: it lets the OS read bytes from the file into the ByteBuffer directly, without copying them through an intermediate Java heap array.
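A fleshed-out sketch of that idea (my own completion, not the original answer's code): it assumes a UTF-8 test.txt, uses a CharsetDecoder to turn bytes into chars, and for brevity skips \r\n handling and the final decoder flush.

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.channels.SeekableByteChannel;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class ChannelLineReader {
    public static void main(String[] args) throws IOException {
        try (SeekableByteChannel ch = Files.newByteChannel(Paths.get("test.txt"))) {
            // Direct buffer: the OS can fill it without an extra copy
            // through a Java heap array.
            ByteBuffer bb = ByteBuffer.allocateDirect(64 * 1024);
            CharBuffer cb = CharBuffer.allocate(64 * 1024);
            CharsetDecoder dec = StandardCharsets.UTF_8.newDecoder();
            StringBuilder line = new StringBuilder();
            long count = 0;
            while (ch.read(bb) != -1) {
                bb.flip();
                dec.decode(bb, cb, false);
                bb.compact();          // keep undecoded trailing bytes for the next read
                cb.flip();
                while (cb.hasRemaining()) {
                    char c = cb.get();
                    if (c == '\n') {   // simplistic: 'line' now holds one full line
                        count++;
                        line.setLength(0);
                    } else {
                        line.append(c);
                    }
                }
                cb.clear();
            }
            System.out.println(count + " lines");
        }
    }
}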
6) Parallel processing would definitely increase speed: make a big byte buffer, run several tasks that read bytes from the file into that buffer in parallel, and when one is ready, find the first end of line, make a String, find the next one, and so on (a rough sketch follows).
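A minimal sketch of the parallel idea (my own illustration, assuming a UTF-8 test.txt in which '\n' is a single byte): each task scans its own byte range of the file and only counts lines, since handing complete lines across chunk boundaries would take more bookkeeping.

import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelLineCount {
    public static void main(String[] args) throws Exception {
        final String path = "test.txt";   // hypothetical path
        final int tasks = Runtime.getRuntime().availableProcessors();
        final long size;
        try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
            size = raf.length();
        }
        final long chunk = (size + tasks - 1) / tasks;
        ExecutorService pool = Executors.newFixedThreadPool(tasks);
        List<Future<Long>> results = new ArrayList<>();
        for (int i = 0; i < tasks; i++) {
            final long from = i * chunk;
            final long to = Math.min(from + chunk, size);
            results.add(pool.submit(() -> {
                long newlines = 0;
                // Each task opens its own channel and scans only its byte range.
                try (FileChannel ch = new RandomAccessFile(path, "r").getChannel()) {
                    ByteBuffer bb = ByteBuffer.allocateDirect(1 << 16);
                    long pos = from;
                    while (pos < to) {
                        bb.clear();
                        int n = ch.read(bb, pos);   // positional read, no shared state
                        if (n == -1) break;
                        int limit = (int) Math.min(n, to - pos);
                        for (int j = 0; j < limit; j++) {
                            if (bb.get(j) == '\n') newlines++;
                        }
                        pos += n;
                    }
                }
                return newlines;
            }));
        }
        long total = 0;
        for (Future<Long> f : results) total += f.get();
        pool.shutdown();
        System.out.println(total + " lines");
    }
}

Splitting the file by byte ranges avoids coordinating a single shared buffer: each task reads independently with positional reads, which is simpler and avoids contention.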