I am using Apache Ignite as a distributed cache and am running into some fundamental robustness issues. If our Ignite servers reboot for any reason, all of our Ignite clients seem to break, and they stay broken even after the servers come back online.
This is the error the clients see when interacting with caches after the servers reboot and the clients reconnect:
Caused by: org.apache.ignite.internal.processors.cache.CacheStoppedException: Failed to perform cache operation (cache is stopped): <redacted>
My expectation is that the Ignite clients would reconnect to the Ignite servers and continue working once the servers are back online. From what I’ve read, thick clients should do this, but I don’t see it happening. Why is the cache still considered to be stopped?
We are using Ignite 2.7.6 with the Kubernetes IP finder.
Answer
Looks like you are using a stale cache proxy.
If you are using an in-memory cluster and created the cache dynamically from a client, that cache disappears when the cluster restarts.
The following code, executed from a client against an in-memory cluster, throws an exception after the cluster restarts if the cache in question is not part of a server config but was created dynamically on the client:
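import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;

Ignition.setClientMode(true);
Ignite ignite = Ignition.start();
// Dynamically created cache: it is not defined in any server config.
IgniteCache<Integer, Integer> cache = ignite.getOrCreateCache("mycache");
int counter = 0;
while (true) {
    try {
        cache.put(counter, counter);
        System.out.println("added counter: " + counter);
        counter++;
    } catch (Exception e) {
        // After a cluster restart, every put on the old proxy ends up here.
        e.printStackTrace();
    }
}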
generates

java.lang.IllegalStateException: class org.apache.ignite.internal.processors.cache.CacheStoppedException: Failed to perform cache operation (cache is stopped): mycache
    at org.apache.ignite.internal.processors.cache.GridCacheGateway.enter(GridCacheGateway.java:164)
    at org.apache.ignite.internal.processors.cache.GatewayProtectedCacheProxy.onEnter(GatewayProtectedCacheProxy.java:1555)
You need to watch for disconnect events and exceptions.
See: https://ignite.apache.org/docs/latest/clustering/connect-client-nodes
IgniteCache cache = ignite.getOrCreateCache(cachecfg);
try {
    cache.put(1, "value");
} catch (CacheException e) {
    // Cache operations report a disconnect as a javax.cache.CacheException
    // whose cause is IgniteClientDisconnectedException.
    if (e.getCause() instanceof IgniteClientDisconnectedException) {
        IgniteClientDisconnectedException cause = (IgniteClientDisconnectedException) e.getCause();
        cause.reconnectFuture().get(); // Wait until the client is reconnected.
        // proceed
If this is a persistent cluster consisting of multiple baseline nodes, you should wait until the cluster activates.
https://ignite.apache.org/docs/latest/clustering/baseline-topology
        while (!ignite.cluster().active()) {
            System.out.println("Waiting for activation");
            Thread.sleep(5000);
        }
After reconnecting, you might need to reinitialize your cache proxy:
        cache = ignite.getOrCreateCache(cachecfg);
    }
}
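
To make the "reinitialize your cache proxy" step concrete, here is a minimal sketch of the retry-and-refresh pattern on its own, without any Ignite dependency. `CacheHandle` and `RefreshingCache` are hypothetical names invented for this illustration and are not part of the Ignite API. The sketch assumes the stale-proxy failure surfaces as an `IllegalStateException`, as in the stack trace above, and that the supplier returns a fresh handle each time it is called.

```java
import java.util.function.Supplier;

/** Hypothetical stand-in for a cache proxy such as IgniteCache; not an Ignite type. */
interface CacheHandle<K, V> {
    void put(K key, V value);
}

/** Hypothetical wrapper that re-acquires its handle when a put fails on a stale one. */
final class RefreshingCache<K, V> {
    private final Supplier<CacheHandle<K, V>> factory; // produces a fresh handle on demand
    private CacheHandle<K, V> handle;

    RefreshingCache(Supplier<CacheHandle<K, V>> factory) {
        this.factory = factory;
        this.handle = factory.get();
    }

    /** Tries the put up to maxAttempts times, refreshing the handle after each failure. */
    void put(K key, V value, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        IllegalStateException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                handle.put(key, value);
                return;
            } catch (IllegalStateException e) {
                // Assumption: a stopped cache shows up as IllegalStateException.
                last = e;
                handle = factory.get(); // drop the stale handle and get a new one
            }
        }
        throw last; // every attempt failed; rethrow the most recent error
    }
}
```

In a real client the supplier would presumably wrap a call to `ignite.getOrCreateCache(cachecfg)`, and you would still wait for the reconnect future (and for activation on a persistent cluster) before retrying. Keep in mind that on an in-memory cluster the recreated cache starts out empty.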