
Storing and searching 4+ million documents [closed]

I am expected to implement a storage and search solution for a large dataset of more than 4 million documents. Each document will have 40 or more fields (search criteria).

I have worked with Lucene and Solr before, so I am inclined to use them for this problem (other ideas and solutions are welcome, of course). What bugs me is efficient and scalable storage. I have been looking at Cassandra, MongoDB, and some other NoSQL solutions, but I couldn't tell which technology would best fit the requirement.

I would like to ask whether anyone has faced a similar issue and what they used to solve it.


Answer

Check this survey paper for general reference:

Survey of Document Oriented Datastores, some metrics available
http://cattell.net/datastores/Datastores.pdf

For IEEE subscribers:

NoSQL evaluation: A use case oriented survey
http://www.computer.org/portal/web/csdl/doi/10.1109/CSC.2011.6138544
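
To make the multi-field search requirement from the question concrete, here is a minimal sketch of a fielded inverted index, the general idea behind engines such as Lucene and Solr. It is a toy illustration only, not how either engine is implemented: the `FieldedIndex` class, its methods, and the sample documents are all hypothetical, and it ignores persistence, scoring, and scaling entirely.

```python
from collections import defaultdict


class FieldedIndex:
    """Toy inverted index (hypothetical): (field, term) -> set of document ids."""

    def __init__(self):
        self.postings = defaultdict(set)
        self.docs = {}

    def add(self, doc_id, doc):
        """Store the document and index every whitespace-separated term per field."""
        self.docs[doc_id] = doc
        for field, value in doc.items():
            for term in str(value).lower().split():
                self.postings[(field, term)].add(doc_id)

    def search(self, **criteria):
        """Return sorted ids of documents matching every field=term criterion (AND)."""
        result = None
        for field, value in criteria.items():
            for term in str(value).lower().split():
                ids = self.postings.get((field, term), set())
                result = set(ids) if result is None else result & ids
        return sorted(result) if result else []


# Made-up sample documents with three fields each.
index = FieldedIndex()
index.add(1, {"title": "Storing documents", "author": "alice", "year": 2011})
index.add(2, {"title": "Searching documents", "author": "bob", "year": 2011})
index.add(3, {"title": "Storing logs", "author": "alice", "year": 2012})

hits = index.search(title="documents", year=2011)
print(hits)
```

Each search criterion becomes a lookup in the postings map followed by a set intersection, so a query over many fields never has to scan the documents themselves.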

User contributions licensed under: CC BY-SA