How to ensure uniqueness when inserting into Elasticsearch from multiple threads?

We have some documents in Elasticsearch. A document's uniqueness is determined by a combination of several fields. How can we ensure uniqueness when multiple Java threads each check whether a document exists and insert it if it does not?

I did not know of a good approach, so I wrote a method that checks whether the document exists and inserts it if it does not, and I marked that method synchronized. However, I found this to be very inefficient.

/**
 * @param document
 */
synchronized void selectAndInsert(Map<String, Object> document){
    //Determine if it exists, insert it if it does not exist
}

My mapping is as follows:

{
  "properties": {
    "pt_number":    { "type": "keyword" },
    "pt_name":      { "type": "keyword" },
    "pt_longitude": { "type": "text" },
    "pt_latitude":  { "type": "text" },
    "rd_code":      { "type": "text" },
    "rd_name":      { "type": "keyword" },
    "area_code":    { "type": "keyword" }
    … and so on
  }
}

Uniqueness is determined by area_code, pt_longitude and pt_latitude. Before inserting a document, I check whether one with the same area_code, pt_longitude and pt_latitude already exists, and insert only if it does not. How do I guarantee the uniqueness of a document when multiple Java threads are running?

This question has plagued me for some time. I would be very grateful for any help.

Answer

There is no way to guarantee that such a document does not already exist just by checking field values in the index. Even if you check for its presence and do not find it, there is a window of time between the response to that check and the moment Elasticsearch accepts your indexing request, and another thread can insert the same document in that window.

So basically you have only two ways:

  • Guarantee that each indexing operation executes exactly once (a long and not-so-easy way, because exactly-once systems do not really exist)
  • Derive the document ID from the document's unique properties, so that even if your indexing operations overlap, they just write the same values into the same document (or the second and subsequent ones fail, depending on the request options).
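To illustrate the "second one fails" variant, here is a minimal sketch that sends a create-only request over plain HTTP with the JDK's java.net.http client (Java 11+). It assumes Elasticsearch 7 or later, where PUT /{index}/_create/{id} responds with 201 when the document is created and 409 when that ID already exists. The class name CreateOnlyIndexer, the base URL and the index name are hypothetical, and the ID is assumed to be URL-safe (for example a hex string).

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical helper: inserts a document only if its ID is not taken yet.
final class CreateOnlyIndexer {
    private final HttpClient http = HttpClient.newHttpClient();
    private final String baseUrl; // assumption, e.g. "http://localhost:9200"
    private final String index;   // assumption: name of your index

    CreateOnlyIndexer(String baseUrl, String index) {
        this.baseUrl = baseUrl;
        this.index = index;
    }

    // Builds PUT {baseUrl}/{index}/_create/{id}; the id must be URL-safe.
    HttpRequest createRequest(String id, String json) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/" + index + "/_create/" + id))
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(json))
                .build();
    }

    // Returns true if this call created the document,
    // false if a document with the same ID already existed.
    boolean insertIfAbsent(String id, String json) throws IOException, InterruptedException {
        HttpResponse<String> response =
                http.send(createRequest(id, json), HttpResponse.BodyHandlers.ofString());
        int status = response.statusCode();
        if (status == 201) {
            return true;  // created by this request
        }
        if (status == 409) {
            return false; // ID already present, nothing was overwritten
        }
        throw new IOException("Unexpected status " + status + ": " + response.body());
    }
}
```

With this approach the method no longer needs to be synchronized: any number of threads can call insertIfAbsent with the same ID, and at most one of them should see true. If overwriting with identical values is acceptable, a PUT to /{index}/_doc/{id} would be the "write the same document" variant instead.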

The latter is quite easy, and you have some options out of the box:

  • Take all unique properties in a fixed order and concatenate their string representations (ugly)
  • Take all unique properties in a fixed order, concatenate their byte values and encode the result using Base64 (less ugly)
  • Take all unique properties in a fixed order, pass them through a hash function (MD5, the SHA-x families, whatever you like) and use the string representation of the result.
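The hashing option could look roughly like the sketch below, which uses the field names from the question (area_code, pt_longitude, pt_latitude) and SHA-256 from the JDK. The class name PointDocumentId and the length-prefixing of each value are my own choices for illustration, not something Elasticsearch requires.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;

// Hypothetical helper: derives a deterministic document ID from the key fields.
final class PointDocumentId {
    // Fields that define uniqueness, always read in this fixed order.
    private static final String[] KEY_FIELDS = {"area_code", "pt_longitude", "pt_latitude"};

    static String of(Map<String, Object> document) {
        MessageDigest digest;
        try {
            digest = MessageDigest.getInstance("SHA-256");
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required to be available on every Java platform.
            throw new IllegalStateException(e);
        }
        for (String field : KEY_FIELDS) {
            Object value = document.get(field);
            if (value == null) {
                throw new IllegalArgumentException("Missing key field: " + field);
            }
            byte[] bytes = String.valueOf(value).getBytes(StandardCharsets.UTF_8);
            // Length prefix keeps ("ab", "c") and ("a", "bc") from hashing alike.
            digest.update(ByteBuffer.allocate(4).putInt(bytes.length).array());
            digest.update(bytes);
        }
        // Hex-encode the 32-byte hash into a 64-character, URL-safe string.
        byte[] hash = digest.digest();
        StringBuilder hex = new StringBuilder(hash.length * 2);
        for (byte b : hash) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }
}
```

The resulting string is then used as the document ID in the indexing request. Note that the values are hashed exactly as given, so "116.40" and "116.4" would produce different IDs; normalize the coordinates before hashing if both forms can occur in your data.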