Are there any well-known libraries in Java for sparse bit vectors?
(And are there guidelines for how sparse the data should be before they are worth using instead of java.util.BitSet?)
Answer
The Colt library has sparse matrices (1D, 2D and 3D). It also has an efficient BitVector that uses 1 bit per value, rather than the 8 bits per value that a boolean[] typically uses.
However, the sparse matrices do not support bits directly, only doubles and objects. You could wrap the 1D sparse double matrix by mapping each bit index to a long index (bitIndex >> 6, since each long holds 64 bits), converting the retrieved double to its raw long value, and using bit manipulation to access the bits of that long. It's a little work, but nowhere near as much as implementing the sparse vector yourself. Once your wrapper is working, you could avoid the double-to-long conversion altogether by implementing a real sparse long 1D matrix, using the available Colt source code for the double 1D sparse matrix as a starting point.
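To make the index arithmetic concrete, here is a minimal sketch of such a wrapper. It does not use Colt: a plain java.util.HashMap stands in for the sparse 1D matrix, and the class name SparseBitVector and its methods are made up for illustration. Storing longs directly also sidesteps a possible pitfall of the double route: the word with only bit 63 set is the bit pattern of -0.0, which compares equal to zero, so a sparse store that drops zero values might discard it (that depends on how the store tests for zero).

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sparse bit vector: a 64-bit word is stored only while it is non-zero. */
final class SparseBitVector {
    // Stand-in for a sparse 1D matrix of longs: word index -> 64 bits.
    private final Map<Long, Long> words = new HashMap<>();

    public void set(long bitIndex) {
        long wordIndex = bitIndex >>> 6;       // bitIndex / 64
        long mask = 1L << (bitIndex & 63);     // position of the bit inside its word
        words.merge(wordIndex, mask, (a, b) -> a | b);
    }

    public void clear(long bitIndex) {
        long wordIndex = bitIndex >>> 6;
        long mask = 1L << (bitIndex & 63);
        // Returning null removes the entry, so an all-zero word costs no memory.
        words.computeIfPresent(wordIndex, (k, w) -> (w & ~mask) == 0 ? null : w & ~mask);
    }

    public boolean get(long bitIndex) {
        Long word = words.get(bitIndex >>> 6);
        return word != null && (word & (1L << (bitIndex & 63))) != 0;
    }

    /** Number of backing words currently stored. */
    public int storedWords() {
        return words.size();
    }
}
```

Setting bits 5, 64 and 70 stores two words here, because 64 and 70 share word 1. Boxed HashMap entries cost far more than 64 bits each, so treat this as an illustration of the mapping rather than a memory-efficient implementation; a primitive-keyed map like the one Colt uses internally would be the realistic backing store.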
EDIT: More info. The Colt vectors/matrices require no memory initially for storage, assuming all bits (longs) are initially 0. Setting a value to non-zero consumes memory. Setting the value back to 0 continues to consume memory, although memory for zero values is reclaimed periodically.
If the bits are truly sparse, such that each backing long has only one bit set, the storage overhead will be very poor: 64 bits of storage per bit actually set. But since you mention that the typical case is 20-40% sparse, the overhead will be much lower, with possibly no wasted storage if the bits are clustered in ranges, e.g. bits 0-100, 1000-1100 and 2000-2200 (values in hex). Overall, only about 12% of the region (0x400 of 0x2200 bits) is assigned to bits, but the clustering means that the bits are stored with no wasted space.
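The clustering arithmetic can be checked with a few lines of Java. This is only a sketch: it treats the ranges as half-open (the end bit is excluded), counts only the 64-bit payload words, and ignores whatever per-entry overhead the sparse container adds. The class and method names are made up for illustration.

```java
import java.util.HashSet;
import java.util.Set;

final class SparseStorageEstimate {
    /** Counts the distinct 64-bit words touched by the half-open bit ranges [from, to). */
    static int wordsTouched(long[][] ranges) {
        Set<Long> words = new HashSet<>();
        for (long[] r : ranges) {
            for (long bit = r[0]; bit < r[1]; bit++) {
                words.add(bit >>> 6);   // word index = bit index / 64
            }
        }
        return words.size();
    }

    /** Total number of bits covered by the ranges (assumes they do not overlap). */
    static long bitsSet(long[][] ranges) {
        long total = 0;
        for (long[] r : ranges) {
            total += r[1] - r[0];
        }
        return total;
    }

    public static void main(String[] args) {
        // The clustered example from the answer, values in hex.
        long[][] clustered = { {0x0, 0x100}, {0x1000, 0x1100}, {0x2000, 0x2200} };
        long set = bitsSet(clustered);
        long stored = wordsTouched(clustered) * 64L;
        System.out.println(set + " bits set, " + stored + " bits of word storage");
        System.out.println("fraction of region in use: " + (double) set / 0x2200);
    }
}
```

For the clustered ranges, the 1024 set bits fill exactly 16 words, so no payload bits are wasted even though only roughly an eighth of the 0x2200-bit region is in use. The same 1024 bits spread one per word would need 1024 words, i.e. 64 bits of storage per set bit.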