Skip to content
Advertisement

Ultra-fast “Begins With” Query from Disk

I have a 40MB (too big for memory in this case) list of strings that I want to do “begins with” queries on to extract matches. Anyone know of a good data structure for this? Bonus points for an existing os java implementation. I would be willing to sacrifice “begins with” to just exact matching if something already exists. A disk-based trie sounds ideal.

Advertisement

Answer

It looks like you need something like this: http://en.wikipedia.org/wiki/Trie

An implementation in Java can be found here, although it isn’t disk-based. I’ll keep searching :/

Useful papers: Trie methods for text and spatial data on secondary storage, B-tries for disk-based string management

Edit: I came across this might might be useful: MG4J: Managing Gigabytes for Java™

User contributions licensed under: CC BY-SA
6 People found this is helpful
Advertisement