How to check whether the file is binary?

Question

I wrote the following method to check whether a particular file contains only ASCII text characters or also contains control characters. Could you glance at this code, suggest improvements, and point out oversights? The logic is as follows: "If first 500 bytes of a file contain 5 or more Control charac…

Accepted Answer

Since you call this method "isASCIIText", you know exactly what you're looking for. In other words, it's not "isTextInCurrentLocaleEncoding". Thus you can be more accurate with:

    if (thisByte < 32 || thisByte > 127) bin++;

Edit, a long time later: it's pointed out in a comment that this simple check would be tripped up by a text file that started with a lot of newlines. It'd probably be better to use a table of "ok" bytes that includes the printable characters plus carriage return, newline, and tab (and possibly form feed, though I don't think many modern documents use those), and then check each byte against the table.
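For what it's worth, here is a minimal sketch of the table idea in Java (assuming that is the language in use; the snippet in the question isn't shown here). The class name `AsciiTextCheck`, the method `looksLikeAsciiText`, and both constants are hypothetical names made up for illustration. The 500-byte sample and the threshold of 5 come from the part of the question that is visible; that reaching the threshold means "binary" is my assumption, since the rule is cut off. Unlike the one-line check, this also counts DEL (127) as suspect, and it masks each byte with `0xFF` because Java bytes are signed.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: names and thresholds are illustrative assumptions.
final class AsciiTextCheck {
    // From the question: look at the first 500 bytes only.
    private static final int SAMPLE_SIZE = 500;
    // From the question: 5 or more suspect bytes (assumed to mean "binary").
    private static final int MAX_SUSPECT = 5;

    // Lookup table indexed by unsigned byte value; true means "ok in ASCII text".
    private static final boolean[] OK = new boolean[256];
    static {
        for (int b = 0x20; b <= 0x7E; b++) {
            OK[b] = true;          // printable ASCII, space through tilde
        }
        OK['\t'] = true;           // tab
        OK['\n'] = true;           // newline
        OK['\r'] = true;           // carriage return
        OK['\f'] = true;           // form feed (optional)
    }

    // Returns false once MAX_SUSPECT bytes outside the table have been seen
    // within the first SAMPLE_SIZE bytes of data.
    static boolean looksLikeAsciiText(byte[] data, int length) {
        int limit = Math.min(length, SAMPLE_SIZE);
        int suspect = 0;
        for (int i = 0; i < limit; i++) {
            // Mask to 0..255: Java bytes are signed, so 0x80..0xFF are negative.
            if (!OK[data[i] & 0xFF]) {
                suspect++;
                if (suspect >= MAX_SUSPECT) {
                    return false;
                }
            }
        }
        return true;
    }

    // Convenience overload that samples the start of a file (needs Java 11+
    // for InputStream.readNBytes(int)).
    static boolean looksLikeAsciiText(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            byte[] sample = in.readNBytes(SAMPLE_SIZE);
            return looksLikeAsciiText(sample, sample.length);
        }
    }
}
```

You'd call it as `AsciiTextCheck.looksLikeAsciiText(Path.of("some-file"))`, where the path is just a placeholder. A file that starts with a long run of newlines passes, because newline is in the table. Note this is still only a heuristic: UTF-8 text with enough non-ASCII characters in the first 500 bytes would be reported as not ASCII, which matches the "isASCIIText" name but may not be what you want for a general "is it binary" test.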

Answer