I have an ordered set of incoming events and I need to insert them into Cassandra. I want to take advantage of the speed of asynchronous inserts, but my incoming events may contain duplicates with respect to the primary key of the target table.
If I understand correctly, asynchronous inserts can't guarantee data consistency in this case, because asynchronous execution gives no guarantee about the order in which the operations are performed. However, I was unable to write a Java example in which the order of the asynchronous inserts differed from the order in which they were submitted. I also could not find any information about this in the documentation on asynchronous inserts with the Cassandra driver (datastax-java-driver).
Do I have to deduplicate the data on my side before the asynchronous inserts to ensure data consistency in this case?
If you need sample code, this is what I am doing:
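```java
@Autowired
private ReactiveCassandraRepository repository;
...
// assuming eventsList is a java.util.List, Flux.fromIterable is needed (Flux.from expects a Publisher)
Flux.fromIterable(eventsList)
    .flatMap(value -> repository.save(value))
    .subscribe();
```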
Answer
This isn’t really a problem and you can definitely maximise throughput using asynchronous writes.
The important thing to note is that the "order" isn't determined by when an asynchronous request reaches the cluster. By default, the Java driver (v3.0+) assigns a client-side timestamp to each request when it is submitted, and that timestamp becomes the write time of the data the request writes.
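To illustrate why a timestamp assigned at submission preserves the submission order, here is a simplified model of a monotonic client-side timestamp generator. It is a sketch under our own assumptions: the class name `MonotonicTimestamps` is hypothetical and this is not the driver's source code. It returns microseconds since the epoch and bumps the value by one whenever the clock has not advanced, so a request submitted later always carries a larger write time.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical, simplified model of a monotonic client-side timestamp generator
// (an illustration, not the DataStax driver's implementation).
class MonotonicTimestamps {
    private final AtomicLong last = new AtomicLong(Long.MIN_VALUE);

    // Returns a timestamp in microseconds, strictly greater than any previously returned one.
    long next() {
        while (true) {
            long previous = last.get();
            long now = System.currentTimeMillis() * 1000; // millisecond clock scaled to microseconds
            long candidate = Math.max(now, previous + 1); // never go backwards or repeat a value
            if (last.compareAndSet(previous, candidate)) {
                return candidate;
            }
            // another thread won the race: retry with the updated value
        }
    }

    public static void main(String[] args) {
        MonotonicTimestamps generator = new MonotonicTimestamps();
        long first = generator.next();
        long second = generator.next();
        System.out.println(second > first); // prints "true"
    }
}
```

Because each request is stamped when it is handed to the driver, the order in which the requests later reach or complete on the cluster does not change which write carries the latest timestamp.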
When you read the data, Cassandra returns only the most recent version of each column, as determined by that timestamp (last write wins), so there is no duplication. Cheers!
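The toy simulation below models that last-write-wins behaviour in plain Java, with no Cassandra or driver involved; `LwwStore`, `Event` and `replay` are hypothetical names. Duplicate-key events are stamped in submission order with a simple counter, applied on a thread pool so they complete in a random order, and reconciled by keeping the value with the highest write time. Cassandra does this per column; the toy keeps one value per key for brevity.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;

// Hypothetical toy last-write-wins store (not Cassandra): keeps, per key, the value with the highest write time.
class LwwStore {
    record Event(String key, String value) {}

    private record Cell(String value, long writeTime) {}

    private final Map<String, Cell> cells = new ConcurrentHashMap<>();

    void write(String key, String value, long writeTime) {
        // merge() is atomic per key: the higher write time wins, whatever order the writes arrive in
        cells.merge(key, new Cell(value, writeTime),
                (current, incoming) -> incoming.writeTime() > current.writeTime() ? incoming : current);
    }

    String read(String key) {
        Cell cell = cells.get(key);
        return cell == null ? null : cell.value();
    }

    // Stamps events in submission order, then applies them on a thread pool so they complete in random order.
    static LwwStore replay(List<Event> events) {
        LwwStore store = new LwwStore();
        ExecutorService pool = Executors.newFixedThreadPool(8);
        List<CompletableFuture<Void>> pending = new ArrayList<>();
        long clock = 0;
        for (Event event : events) {
            long writeTime = ++clock; // assigned at submission, like a client-side timestamp
            pending.add(CompletableFuture.runAsync(() -> {
                sleepRandomly(); // simulate requests reaching the cluster out of order
                store.write(event.key(), event.value(), writeTime);
            }, pool));
        }
        CompletableFuture.allOf(pending.toArray(new CompletableFuture[0])).join();
        pool.shutdown();
        return store;
    }

    private static void sleepRandomly() {
        try {
            TimeUnit.MILLISECONDS.sleep(ThreadLocalRandom.current().nextInt(20));
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        LwwStore store = replay(List.of(
                new Event("k1", "a"), new Event("k2", "x"),
                new Event("k1", "b"), new Event("k1", "c"), new Event("k2", "y")));
        System.out.println(store.read("k1") + " " + store.read("k2")); // prints "c y"
    }
}
```

Whatever order the tasks finish in, `k1` ends up as `c` and `k2` as `y`. The model assumes a single client stamping every write; timestamps generated by several clients are only as well ordered as those clients' clocks are synchronised.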