I am trying to retrieve the _id values of inserted documents after a successful InsertMany operation. To achieve this I am using InsertManyResult.getInsertedIds(). While this approach works most of the time, there are cases where not all _id values are retrieved.
I am not sure if I am doing something wrong, but I would assume that InsertManyResult.getInsertedIds() returns the _id of every document inserted.
Problem details
I am inserting 1000 documents into MongoDB in two batches of 500 documents. Each document is approximately 1 MB in size.
After a batch is inserted using InsertMany, I attempt to read the _id values via InsertManyResult.getInsertedIds() and save them to a collection for later use.
I would assume that after inserting 500 documents via InsertMany, InsertManyResult.getInsertedIds() would return 500 _id values. However, it returns only 16 _id values out of 500.
When I check the Mongo collection directly via the Mongo Shell, I see that all records were successfully inserted: there are 1000 documents in my test collection. I am just unable to get the _id of every inserted document via InsertManyResult.getInsertedIds(). I only get 32 _id values for the 1000 documents inserted.
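Since getInsertedIds() returns a map keyed by each document's position in the inserted list, one way to see which positions are missing is a small check like the one below. This is only a sketch: InsertedIdCheck and missingIndexes are hypothetical names, not part of the MongoDB driver, and the helper uses only the standard library so it accepts any Map<Integer, ?>.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical diagnostic helper, not part of the MongoDB Java driver.
final class InsertedIdCheck {

    // Returns the 0-based batch positions that have no entry in the map
    // obtained from InsertManyResult.getInsertedIds().
    static List<Integer> missingIndexes(Map<Integer, ?> insertedIds, int batchSize) {
        List<Integer> missing = new ArrayList<>();
        for (int i = 0; i < batchSize; i++) {
            if (!insertedIds.containsKey(i)) {
                missing.add(i);
            }
        }
        return missing;
    }
}
```

Calling InsertedIdCheck.missingIndexes(insertManyResult.getInsertedIds(), insertBatch.size()) right after insertMany would list the positions for which no _id was reported.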
JSON structure
To replicate the issue I use a single JSON document, approximately 1 MB in size, which looks like this:
{
  "textVal" : "RmKHtEMMzJDXgEApmWeoZGRdZJZerIj1",
  "intVal" : 161390623,
  "longVal" : "98213019054010317",
  "timestampVal" : "2020-12-31 23:59:59.999",
  "numericVal" : -401277306,
  "largeArrayVal" : [ "MMzJDXg", "ApmWeoZGRdZJZerI", "1LhTxQ", "adprPSb1ZT", ..., "QNLkBZuXenmYE77"]
}
Note that the key largeArrayVal holds almost all of the data. I have omitted most of its values for readability.
Sample code
The code below parses the JSON shown above into a Document, which is then inserted into MongoDB via InsertMany. After that is done I try to get the inserted _id values using InsertManyResult.getInsertedIds().
// Imports needed by this snippet
import java.util.ArrayList;
import java.util.List;
import org.bson.Document;
import org.bson.types.ObjectId;
import com.mongodb.client.result.InsertManyResult;

private static final int MAX_DOCUMENTS = 1000;
private static final int BULK_SIZE = 500;

// Inserts one batch and returns the _id values reported by the driver.
private static List<ObjectId> insertBatchReturnIds(List<Document> insertBatch) {
    List<ObjectId> insertedIds = new ArrayList<ObjectId>();
    InsertManyResult insertManyResult;
    insertManyResult = mongoClient.getDatabase(MONGO_DATABASE).getCollection(MONGO_COLLECTION).insertMany(insertBatch);
    // getInsertedIds() maps each document's index in the batch to its _id
    insertManyResult.getInsertedIds().forEach((k,v) -> insertedIds.add(v.asObjectId().getValue()));
    System.out.println("Batch inseted:");
    System.out.println(" - Was acknowladged: " + Boolean.toString(insertManyResult.wasAcknowledged()).toUpperCase());
    System.out.println(" - InsertManyResult.getInsertedIds().size(): " + insertManyResult.getInsertedIds().size());
    return insertedIds;
}

// Inserts MAX_DOCUMENTS copies of the sample JSON in batches of BULK_SIZE.
private static void insertDocuments() {
    int documentsInserted = 0;
    List<Document> insertBatch = new ArrayList<Document>();
    List<ObjectId> insertedIds = new ArrayList<ObjectId>();
    // Backslash escaped so that "\t" is not read as a tab character
    final String largeJson = loadLargeJsonFromFile("d:\\test-sample.json");
    System.out.println("Starting INSERT test...");
    while (documentsInserted < MAX_DOCUMENTS) {
        insertBatch.add(Document.parse(largeJson));
        documentsInserted++;
        if (documentsInserted % BULK_SIZE == 0) {
            insertedIds.addAll(insertBatchReturnIds(insertBatch));
            insertBatch.clear();
        }
    }
    // Flush any remaining partial batch
    if (insertBatch.size() > 0)
        insertedIds.addAll(insertBatchReturnIds(insertBatch));
    System.out.println("INSERT test finished");
    System.out.println(String.format("Expected IDs retrieved: %d. Actual IDs retrieved: %d.", MAX_DOCUMENTS, insertedIds.size()));
    if (insertedIds.size() != MAX_DOCUMENTS)
        throw new IllegalStateException("Not all _ID were returned for each document in batch");
}
Sample output
Starting INSERT test...
Batch inseted:
 - Was acknowladged: TRUE
 - InsertManyResult.getInsertedIds().size(): 16
Batch inseted:
 - Was acknowladged: TRUE
 - InsertManyResult.getInsertedIds().size(): 16
INSERT test finished
Expected IDs retrieved: 1000. Actual IDs retrieved: 32.
Exception in thread "main" java.lang.IllegalStateException: Not all _ID were returned for each document in batch
My questions
- Is InsertManyResult.getInsertedIds() meant to return the _id of every document inserted?
- Is the way I am using InsertManyResult.getInsertedIds() correct?
- Could the size of the inserted JSON be a factor here?
- How should I use InsertManyResult to get the _id of the inserted documents?
Note
I am aware that I can either read _id after Document.parse, as it is the driver that generates this, or select _id after the documents were inserted.
I would like to know how this can be achieved using InsertManyResult.getInsertedIds(), as it seems to be made for this purpose.
Answer
This is a bug in the Java driver, and it’s being tracked in https://jira.mongodb.org/browse/JAVA-4436 (reported on January 5, 2022).
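Until you are on a driver version that includes the fix, a possible workaround is to read _id directly from the documents in the batch once insertMany has returned. The sketch below assumes that the driver writes the generated _id into each Document instance it inserts (as far as I know this is what the default Document codec does, but please verify it against your driver version). BatchIds and idsOf are hypothetical names; the helper relies only on the fact that org.bson.Document implements Map<String, Object>.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the MongoDB Java driver.
final class BatchIds {

    // Collects the "_id" value of every document in the batch, in order.
    // Assumes each document already carries an _id (for example because
    // the driver populated it during insertMany).
    static List<Object> idsOf(List<? extends Map<String, Object>> documents) {
        List<Object> ids = new ArrayList<>();
        for (Map<String, Object> doc : documents) {
            Object id = doc.get("_id");
            if (id == null) {
                throw new IllegalStateException("Document has no _id");
            }
            ids.add(id);
        }
        return ids;
    }
}
```

In the question's code this would be called as BatchIds.idsOf(insertBatch) after insertMany and before insertBatch.clear(), casting each element to ObjectId if the ids are driver-generated.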