I will be saving a flattened JSON entity type into Cassandra. I have 2 options for the data model:
1. ((entityType, entityId), jsonPath), value
2. (entityType, entityId), map<text, text> keyValue
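To make "flattened JSON" concrete, here is a minimal sketch of turning a nested document into jsonPath/value pairs and shaping them for each of the two models. The `$.a.b[0]` path syntax, the JSON-encoded leaf values, and all helper names are assumptions for illustration, not something the question specifies.

```python
import json
from typing import Any, Dict, List, Tuple


def flatten_json(doc: Any, prefix: str = "$") -> Dict[str, str]:
    """Flatten nested JSON into {jsonPath: value} pairs (hypothetical path syntax).

    Empty objects and arrays produce no entries in this sketch.
    """
    out: Dict[str, str] = {}
    if isinstance(doc, dict):
        for key, child in doc.items():
            out.update(flatten_json(child, f"{prefix}.{key}"))
    elif isinstance(doc, list):
        for i, child in enumerate(doc):
            out.update(flatten_json(child, f"{prefix}[{i}]"))
    else:
        # Leaves are stored as JSON text so 3 and "3" stay distinguishable in a text column.
        out[prefix] = json.dumps(doc)
    return out


def rows_for_option1(entity_type: str, entity_id: str,
                     flat: Dict[str, str]) -> List[Tuple[str, str, str, str]]:
    """Option 1: one clustering row per jsonPath inside the (entityType, entityId) partition."""
    return [(entity_type, entity_id, path, value) for path, value in flat.items()]


def row_for_option2(entity_type: str, entity_id: str,
                    flat: Dict[str, str]) -> Tuple[str, str, Dict[str, str]]:
    """Option 2: a single row whose map<text, text> holds every jsonPath."""
    return (entity_type, entity_id, dict(flat))
```

For example, `flatten_json({"name": "Acme", "tags": ["a", "b"]})` yields the paths `$.name`, `$.tags[0]` and `$.tags[1]`; option 1 turns those into three rows of one partition, while option 2 stores them as three entries of one map.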
My use case: on each insert for an entityId, delete all of its existing mappings and then insert the new ones. Queries will be by entityType, entityId and jsonPath.
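A minimal sketch of option 1's "delete everything, then insert" write and its per-path read, assuming a hypothetical table `entity_kv` with columns `entity_type`, `entity_id`, `json_path` and `value`. It builds CQL text directly so it runs without a cluster; real code should use prepared statements with bound values from your driver. The explicit `USING TIMESTAMP` values are there because statements in a batch share one write timestamp by default, and when a deletion and a write carry the same timestamp Cassandra lets the deletion win, which would silently drop the freshly inserted rows.

```python
import time
from typing import Dict, Optional

TABLE = "entity_kv"  # hypothetical: PRIMARY KEY ((entity_type, entity_id), json_path)


def cql_text(s: str) -> str:
    """Render a Python str as a CQL text literal (embedded single quotes are doubled)."""
    return "'" + s.replace("'", "''") + "'"


def key_clause(entity_type: str, entity_id: str) -> str:
    return f"entity_type = {cql_text(entity_type)} AND entity_id = {cql_text(entity_id)}"


def replace_entity_batch(entity_type: str, entity_id: str, fields: Dict[str, str],
                         ts_micros: Optional[int] = None) -> str:
    """Delete every mapping of one entity, then insert the new ones, in one batch.

    All statements hit the same partition, so an UNLOGGED batch is applied atomically.
    The DELETE is stamped 1 microsecond before the INSERTs so it cannot shadow them.
    """
    if ts_micros is None:
        ts_micros = time.time_ns() // 1000  # Cassandra write timestamps are in microseconds
    stmts = ["BEGIN UNLOGGED BATCH",
             f"  DELETE FROM {TABLE} USING TIMESTAMP {ts_micros - 1} "
             f"WHERE {key_clause(entity_type, entity_id)};"]
    for path, value in fields.items():
        stmts.append(
            f"  INSERT INTO {TABLE} (entity_type, entity_id, json_path, value) "
            f"VALUES ({cql_text(entity_type)}, {cql_text(entity_id)}, "
            f"{cql_text(path)}, {cql_text(value)}) USING TIMESTAMP {ts_micros};")
    stmts.append("APPLY BATCH;")
    return "\n".join(stmts)


def select_path(entity_type: str, entity_id: str, path: str) -> str:
    """Single-row read: full partition key plus the json_path clustering key."""
    return (f"SELECT value FROM {TABLE} WHERE {key_clause(entity_type, entity_id)} "
            f"AND json_path = {cql_text(path)};")
```

Deleting the whole partition leaves one partition tombstone per replacement rather than one per jsonPath, and it also removes paths that no longer appear in the new version of the document.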
Which of the above would perform and scale better for a system that does streaming ingestion and serves UI queries?
A flattened JSON document will have about 100 fields. The number of entities will be under a million, in the mid hundreds of thousands.
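A quick back-of-envelope check of these numbers. Only the ~100 fields comes from the question; 500,000 entities is an assumed midpoint of "mid hundreds of thousands", and 100 bytes per field is an assumed average for path plus value text, ignoring compression, per-cell overhead and replication.

```python
# Back-of-envelope sizing; every constant below is an assumption except the ~100 fields.
ENTITIES = 500_000            # assumed midpoint of "mid hundreds of thousands"
FIELDS_PER_ENTITY = 100       # "about 100 fields" from the question
AVG_BYTES_PER_FIELD = 100     # assumed jsonPath + value text, before compression and overhead

partitions = ENTITIES                                # both models: one partition per (entityType, entityId)
rows_option1 = ENTITIES * FIELDS_PER_ENTITY          # option 1: one clustering row per jsonPath
map_entries_option2 = ENTITIES * FIELDS_PER_ENTITY   # option 2: same count, stored as map entries
raw_bytes_per_partition = FIELDS_PER_ENTITY * AVG_BYTES_PER_FIELD

print(f"partitions={partitions:,} rows(option 1)={rows_option1:,} "
      f"raw bytes per partition ~{raw_bytes_per_partition:,}")
# prints: partitions=500,000 rows(option 1)=50,000,000 raw bytes per partition ~10,000
```

Under these assumptions each partition holds roughly 10 KB of raw data in either model, so partition size alone is unlikely to decide between them; the replace-everything write pattern and the per-path read pattern matter more.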
Just to add to what Erick said: large collections in Cassandra can lead to other issues. DataStax has some documentation on how to “freeze” collections to suit different access patterns. The tradeoff is that non-frozen collections can generate a LOT of tombstones in various high-write-throughput scenarios, while frozen collections must rewrite the entire collection on every in-place update.
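To illustrate the frozen/non-frozen tradeoff for option 2, here is a minimal sketch that generates the relevant CQL, assuming a hypothetical table `entity_map` with a `key_value` column. Assigning the whole map with `=` is the only way to write a frozen map (it is stored as one serialized value), while on a non-frozen map the same assignment first writes a tombstone covering the old entries. Per-entry updates avoid that tombstone but are only allowed on non-frozen maps. Real code should bind values through prepared statements rather than build strings.

```python
from typing import Dict


def cql_text(s: str) -> str:
    """Render a Python str as a CQL text literal (embedded single quotes are doubled)."""
    return "'" + s.replace("'", "''") + "'"


def cql_map(m: Dict[str, str]) -> str:
    """Render a dict as a CQL map<text, text> literal."""
    return "{" + ", ".join(f"{cql_text(k)}: {cql_text(v)}" for k, v in m.items()) + "}"


def where_key(entity_type: str, entity_id: str) -> str:
    return f"WHERE entity_type = {cql_text(entity_type)} AND entity_id = {cql_text(entity_id)}"


def create_map_table(frozen: bool) -> str:
    """Option 2 table (hypothetical name entity_map) with a frozen or non-frozen map column."""
    col_type = "frozen<map<text, text>>" if frozen else "map<text, text>"
    return ("CREATE TABLE entity_map (entity_type text, entity_id text, "
            f"key_value {col_type}, PRIMARY KEY ((entity_type, entity_id)));")


def overwrite_map(entity_type: str, entity_id: str, fields: Dict[str, str]) -> str:
    """Replace the whole map.

    Frozen map: the only allowed write; the entire value is rewritten.
    Non-frozen map: Cassandra also writes a tombstone for the previous entries.
    """
    return f"UPDATE entity_map SET key_value = {cql_map(fields)} {where_key(entity_type, entity_id)};"


def set_one_entry(entity_type: str, entity_id: str, path: str, value: str) -> str:
    """Update a single entry in place; valid only for a non-frozen map."""
    return (f"UPDATE entity_map SET key_value[{cql_text(path)}] = {cql_text(value)} "
            f"{where_key(entity_type, entity_id)};")
```

With the "delete all mappings and insert" pattern from the question, every write is a full overwrite, so a non-frozen map would take a collection tombstone on each ingest, whereas a frozen map simply replaces one value.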
Mapping to individual columns is a much better option.