Gremlin Driver blocks while initializing ConnectionPool with multiple endpoints

We are running a Neptune DB in AWS with one writer and three reader instances. A few weeks ago we found out that load balancing does not work as expected: our software instance connects to just one reader and keeps this connection until EOL, so the other reader instances are never used. According to https://docs.aws.amazon.com/neptune/latest/userguide/feature-overview-endpoints.html, load balancing for Neptune has to be done on the client side, and one precondition is that the DNS cache is disabled. The client-side implementation is described at https://docs.amazonaws.cn/en_us/neptune/latest/userguide/best-practices-gremlin-java-multiple.html and https://docs.aws.amazon.com/neptune/latest/userguide/best-practices-gremlin-java-separate.html, the latter because we handle the writer and reader clusters separately. Our software is written in Java, so we implemented the described approach as follows:

Disable the DNS cache in the JVM:

JavaScript
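
A minimal sketch of what disabling the JVM DNS cache can look like, assuming the standard `networkaddress.cache.ttl` security property is the mechanism meant here. The class and method names are hypothetical and only serve the illustration.

```java
import java.security.Security;

public class DnsCacheConfig {

    // Hypothetical helper: call this as early as possible, ideally before the
    // first host name lookup, since the JVM typically reads its cache policy once.
    public static void disableDnsCaching() {
        // A TTL of "0" tells the JVM not to cache successful name lookups.
        Security.setProperty("networkaddress.cache.ttl", "0");
    }

    public static void main(String[] args) {
        disableDnsCaching();
        // Print the effective setting to confirm it was applied.
        System.out.println("networkaddress.cache.ttl = "
                + Security.getProperty("networkaddress.cache.ttl"));
    }
}
```

Setting the property in code keeps the change local to the application; the same property can also be set in the JVM's `java.security` file if it should apply to every application on that JVM.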

pom.xml looks like:

JavaScript

Connecting to database via gremlin driver:

JavaScript

The problem is that, when this code runs, nothing happens the first time the graph is obtained. After some debugging we found that the blocking code is in the constructor of ConnectionPool. Depending on minPoolSize, it creates a CompletableFuture for each Connection, and each future establishes its Connection via a Host. The futures are executed through the Cluster Manager's ScheduledExecutor, and the ConnectionPool constructor joins all of them. Judging by the question "I want do something as future done order in CompletableFuture List", the implementation seems to be right, but something must be blocking. After we checked out the gremlin-driver, commented out the line that joins the futures and put in a simple Thread.sleep() instead, the code worked as expected, and the load balancing now works too. After adding some logging, the output of the code above looks like:

JavaScript
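
One way such a join can block is sketched below. This is not the gremlin-driver code. It assumes, without confirmation from the driver source, that the code joining the futures itself runs on a thread of the same executor that is supposed to run the connection tasks, and that this executor has too few free threads. All names (`JoinOnSameExecutorDemo`, `initPool`) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class JoinOnSameExecutorDemo {

    // Returns true if the simulated pool initialization finished within the timeout.
    public static boolean initPool(int executorThreads, int minPoolSize, long timeoutMillis)
            throws Exception {
        // Daemon threads let the JVM exit even if a worker stays blocked in join().
        ExecutorService executor = Executors.newFixedThreadPool(executorThreads, r -> {
            Thread t = new Thread(r);
            t.setDaemon(true);
            return t;
        });
        try {
            // ASSUMPTION: the outer task stands in for a constructor that runs on the executor.
            Future<?> init = executor.submit(() -> {
                List<CompletableFuture<String>> connections = new ArrayList<>();
                for (int i = 0; i < minPoolSize; i++) {
                    final int id = i;
                    // Each simulated connection is created on the SAME executor.
                    connections.add(
                            CompletableFuture.supplyAsync(() -> "connection-" + id, executor));
                }
                // Blocks this executor thread until every connection task has run.
                CompletableFuture.allOf(connections.toArray(new CompletableFuture[0])).join();
            });
            try {
                init.get(timeoutMillis, TimeUnit.MILLISECONDS);
                return true;
            } catch (TimeoutException e) {
                // Still blocked: no free thread was left to run the connection tasks.
                return false;
            }
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("1 thread:  finished = " + initPool(1, 2, 500));
        System.out.println("4 threads: finished = " + initPool(4, 2, 500));
    }
}
```

Under these assumptions the single-thread case never finishes, because the only worker thread is waiting for tasks that can run only on that same thread, while the four-thread case completes. That would also be consistent with the observation that replacing the join with a sleep lets the code proceed, though it does not prove that this is what happens in the driver.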

The question now is: are we using the gremlin driver in the wrong way, or is this a bug, in which case we should open an issue in the tinkerpop-master repository? Or is there some other magic we do not understand?

Answer

We hit this issue with Neptune load balancing across reader nodes in the past. We addressed it by using

https://github.com/awslabs/amazon-neptune-tools/tree/master/neptune-gremlin-client/gremlin-client

and we had to tweak our reader client a bit in order to handle load balancing on the client side.
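
To illustrate what load balancing on the client side means in general, here is a minimal round-robin sketch over a fixed list of reader endpoints. It is not the neptune-gremlin-client API; the class `RoundRobinEndpoints` and the host names are hypothetical and used only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class RoundRobinEndpoints {

    private final List<String> endpoints;
    private final AtomicInteger next = new AtomicInteger();

    public RoundRobinEndpoints(List<String> endpoints) {
        if (endpoints.isEmpty()) {
            throw new IllegalArgumentException("at least one endpoint is required");
        }
        // Defensive copy so later changes to the caller's list have no effect here.
        this.endpoints = new ArrayList<>(endpoints);
    }

    // Returns the endpoints in turn; safe to call from multiple threads.
    public String nextEndpoint() {
        // floorMod keeps the index non-negative even after the counter overflows.
        int index = Math.floorMod(next.getAndIncrement(), endpoints.size());
        return endpoints.get(index);
    }

    public static void main(String[] args) {
        // Hypothetical reader host names.
        RoundRobinEndpoints readers = new RoundRobinEndpoints(Arrays.asList(
                "reader-1.example.com", "reader-2.example.com", "reader-3.example.com"));
        for (int i = 0; i < 4; i++) {
            System.out.println(readers.nextEndpoint());
        }
    }
}
```

A real client also has to cope with readers being added, removed or failing over, which is the kind of bookkeeping a dedicated library is meant to take over.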

The updated way of creating a reader client looks something like this:

JavaScript

This reader client can then be created before the GraphTraversalSource, something like this:

JavaScript