I am trying to retrieve the _id values of inserted documents after a successful insertMany operation. To do this I am using InsertManyResult.getInsertedIds(). This works most of the time, but in some cases not all _id values are returned.
I am not sure whether I am doing something wrong, but I would assume that InsertManyResult.getInsertedIds() returns the _id of every inserted document.
Problem details
I am inserting 1000 documents into MongoDB in two batches of 500 documents. Each document is approx. 1 MB in size.
After each batch is inserted using insertMany, I read the _id values via InsertManyResult.getInsertedIds() and save them to a collection for later use.
I would assume that after inserting 500 documents via insertMany, InsertManyResult.getInsertedIds() would return 500 _id values. However, it returns only 16 _id values out of 500.
When I check the collection directly via the Mongo Shell, I see that all records were inserted successfully: there are 1000 documents in my test collection. I am just unable to get the _id of every inserted document via InsertManyResult.getInsertedIds(). I get only 32 _id values for 1000 inserted documents.
JSON structure
To replicate the issue I use a single JSON document, approx. 1 MB in size, which looks like this:
{
  "textVal" : "RmKHtEMMzJDXgEApmWeoZGRdZJZerIj1",
  "intVal" : 161390623,
  "longVal" : "98213019054010317",
  "timestampVal" : "2020-12-31 23:59:59.999",
  "numericVal" : -401277306,
  "largeArrayVal" : [ "MMzJDXg", "ApmWeoZGRdZJZerI", "1LhTxQ", "adprPSb1ZT", ..., "QNLkBZuXenmYE77"]
}
Note that the key largeArrayVal holds almost all of the data. I have omitted most of its values for readability.
Sample code
The code below parses the JSON shown above into a Document, which is then inserted into MongoDB via insertMany. After that, I try to get the inserted _id values using InsertManyResult.getInsertedIds().
import java.util.ArrayList;
import java.util.List;
import org.bson.Document;
import org.bson.types.ObjectId;
import com.mongodb.client.result.InsertManyResult;

private static final int MAX_DOCUMENTS = 1000;
private static final int BULK_SIZE = 500;

// Inserts one batch and returns the _id values reported by InsertManyResult.
private static List<ObjectId> insertBatchReturnIds(List<Document> insertBatch)
{
    List<ObjectId> insertedIds = new ArrayList<ObjectId>();
    InsertManyResult insertManyResult;
    insertManyResult = mongoClient.getDatabase(MONGO_DATABASE).getCollection(MONGO_COLLECTION).insertMany(insertBatch);
    // getInsertedIds() maps the index within the batch to the inserted _id.
    insertManyResult.getInsertedIds().forEach((k,v) -> insertedIds.add(v.asObjectId().getValue()));
    System.out.println("Batch inseted:");
    System.out.println(" - Was acknowladged: " + Boolean.toString(insertManyResult.wasAcknowledged()).toUpperCase());
    System.out.println(" - InsertManyResult.getInsertedIds().size(): " + insertManyResult.getInsertedIds().size());
    return insertedIds;
}

// Inserts MAX_DOCUMENTS copies of the sample JSON in batches of BULK_SIZE.
private static void insertDocuments()
{
    int documentsInserted = 0;
    List<Document> insertBatch = new ArrayList<Document>();
    List<ObjectId> insertedIds = new ArrayList<ObjectId>();
    // The backslash must be escaped; "\t" would otherwise be read as a tab.
    final String largeJson = loadLargeJsonFromFile("d:\\test-sample.json");
    System.out.println("Starting INSERT test...");
    while (documentsInserted < MAX_DOCUMENTS)
    {
        insertBatch.add(Document.parse(largeJson));
        documentsInserted++;
        if (documentsInserted % BULK_SIZE == 0)
        {
            insertedIds.addAll(insertBatchReturnIds(insertBatch));
            insertBatch.clear();
        }
    }
    // Flush a final partial batch, if any.
    if (insertBatch.size() > 0)
        insertedIds.addAll(insertBatchReturnIds(insertBatch));
    System.out.println("INSERT test finished");
    System.out.println(String.format("Expected IDs retrieved: %d. Actual IDs retrieved: %d.", MAX_DOCUMENTS, insertedIds.size()));
    if (insertedIds.size() != MAX_DOCUMENTS)
        throw new IllegalStateException("Not all _ID were returned for each document in batch");
}
Sample output
Starting INSERT test...
Batch inseted:
 - Was acknowladged: TRUE
 - InsertManyResult.getInsertedIds().size(): 16
Batch inseted:
 - Was acknowladged: TRUE
 - InsertManyResult.getInsertedIds().size(): 16
INSERT test finished
Expected IDs retrieved: 1000. Actual IDs retrieved: 32.
Exception in thread "main" java.lang.IllegalStateException: Not all _ID were returned for each document in batch
My questions
- Is InsertManyResult.getInsertedIds() meant to return _id for all documents inserted?
- Is the way I am using InsertManyResult.getInsertedIds() correct?
- Could the size of the inserted JSON be a factor here?
- How should I use InsertManyResult to get _id for inserted documents?
Note
I am aware that I can either read _id from the Document objects themselves, since it is the driver that generates it on the client side, or query for _id after the documents have been inserted.
I would like to know how this can be achieved using InsertManyResult.getInsertedIds(), as it seems to be made for this purpose.
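For reference, the first alternative mentioned above could look roughly like the sketch below. It is written against Map<String, Object>, which org.bson.Document implements, so that it runs without the driver on the classpath. The class name InsertedIdFallback, the helper idsOf, and the string ids used in main are illustrative assumptions, not part of the driver API. The sketch also assumes the driver has already written the generated _id into each Document by the time insertMany returns.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class InsertedIdFallback {

    // Reads "_id" from every document, in list order.
    // A List<Document> can be passed here because org.bson.Document
    // implements Map<String, Object>.
    public static List<Object> idsOf(List<? extends Map<String, Object>> docs) {
        List<Object> ids = new ArrayList<>(docs.size());
        for (Map<String, Object> doc : docs) {
            Object id = doc.get("_id");
            if (id == null) {
                // Fails loudly instead of silently returning fewer ids.
                throw new IllegalStateException("Document has no _id");
            }
            ids.add(id);
        }
        return ids;
    }

    public static void main(String[] args) {
        // Plain maps stand in for inserted Documents (assumption for the demo).
        List<Map<String, Object>> batch = new ArrayList<>();
        for (int i = 0; i < 3; i++) {
            Map<String, Object> doc = new HashMap<>();
            doc.put("_id", "id-" + i); // stand-in for a driver-generated ObjectId
            batch.add(doc);
        }
        System.out.println(idsOf(batch)); // prints [id-0, id-1, id-2]
    }
}
```

In the sample code this would be called as idsOf(insertBatch) right after insertMany(insertBatch) and before insertBatch.clear(), with each returned value expected to be an ObjectId.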
Answer
This is a bug in the Java driver, and it’s being tracked in https://jira.mongodb.org/browse/JAVA-4436 (reported on January 5, 2022).
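The ticket's details are not reproduced here, so the following is only a hypothesis that happens to match the numbers in the question. It assumes the driver splits a large insertMany into several wire messages and that only the ids of the last message end up in the result. The 48,000,000-byte limit is the default maxMessageSizeBytes reported by the server. The per-document size of 1,080,000 bytes is a hypothetical value chosen within the question's "approx 1 MB", and message overhead is ignored.

```java
public class BatchSplitEstimate {

    // Number of whole documents that fit into one message (overhead ignored).
    public static int docsPerMessage(long maxMessageBytes, long docBytes) {
        return (int) (maxMessageBytes / docBytes);
    }

    // Size of the final message, which may be only partially filled.
    public static int lastBatchSize(int totalDocs, int docsPerMessage) {
        int remainder = totalDocs % docsPerMessage;
        return remainder == 0 ? docsPerMessage : remainder;
    }

    public static void main(String[] args) {
        long maxMessageBytes = 48_000_000L; // default maxMessageSizeBytes
        long docBytes = 1_080_000L;         // hypothetical document size
        int perMessage = docsPerMessage(maxMessageBytes, docBytes);
        System.out.println(perMessage);                     // prints 44
        System.out.println(lastBatchSize(500, perMessage)); // prints 16
    }
}
```

Under these assumptions a 500-document insertMany would be sent as eleven messages of 44 documents plus a final one of 16, which would explain why the question sees 16 ids per batch and suggests that document size is indeed a factor.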