I’m looking to integrate the resilience4j circuit-breaking library into a web application.
My application talks to two services, and each service receives anywhere between 20 and 150 requests per second depending on the time of day. Resilience4j lets you define a config for each circuit breaker, which is where you set the thresholds and the ring buffer size.
Extra info on ring buffer size and the resilience4j vs Netflix Hystrix implementations:
Hystrix, by default, stores execution results in 10 1-second window buckets. If a 1-second window bucket is passed, a new bucket is created and the oldest is dropped. This library stores execution results in Ring Bit Buffer without a statistical rolling time window. A successful call is stored as a 0 bit and a failed call is stored as a 1 bit. The Ring Bit Buffer has a configurable fixed-size and stores the bits in a long array which is saving memory compared to a boolean array. That means the Ring Bit Buffer only needs an array of 16 long (64-bit) values to store the status of 1024 calls. The advantage is that this CircuitBreaker works out-of-the-box for low and high frequency backend systems, because execution results are not dropped when a time window is passed.
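To make the quoted description concrete, here is a minimal sketch of a count-based ring bit buffer backed by a long array. It is only an illustration of the idea described above, not resilience4j's actual implementation; the class name `RingBitBufferSketch` and all of its methods are hypothetical.

```java
// Illustrative sketch only; NOT resilience4j's real implementation.
// Stores the outcome of the last `capacity` calls, one bit per call
// (0 = success, 1 = failure), packed 64 to a long.
final class RingBitBufferSketch {
    private final long[] words;  // 64 call outcomes per element
    private final int capacity;  // number of calls remembered
    private int nextIndex = 0;   // slot the next outcome will overwrite
    private int recorded = 0;    // outcomes stored so far, capped at capacity
    private int failures = 0;    // number of 1 bits currently stored

    RingBitBufferSketch(int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.capacity = capacity;
        this.words = new long[(capacity + 63) / 64]; // round up to whole longs
    }

    /** Records one call outcome, overwriting the oldest once the buffer is full. */
    void record(boolean failed) {
        int word = nextIndex >>> 6;          // nextIndex / 64
        long mask = 1L << (nextIndex & 63);  // bit position inside that long
        if ((words[word] & mask) != 0) {
            failures--;                      // the outcome being overwritten was a failure
        }
        if (failed) {
            words[word] |= mask;
            failures++;
        } else {
            words[word] &= ~mask;
        }
        nextIndex = (nextIndex + 1) % capacity;
        if (recorded < capacity) {
            recorded++;
        }
    }

    boolean isFull() {
        return recorded == capacity;
    }

    /** Failure rate in percent over the outcomes currently stored. */
    double failureRatePercent() {
        return recorded == 0 ? 0.0 : 100.0 * failures / recorded;
    }

    int wordCount() {
        return words.length;
    }
}
```

With a capacity of 1024 the sketch allocates 16 longs, which matches the figure in the quote. Because nothing is evicted by time, the stored outcomes only change when new calls arrive.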
My question is, with requests fluctuating between 20 and 150 per second, how do I determine the optimal size for the ring buffer? How would I justify the number I’ve chosen if someone asked me this same question?
If I set the ring buffer to 100, it will take 5 seconds to fill up at 20 requests per second, and during peak hours it will take less than 1 second to fill up. I’m not sure if I should be using a time-based implementation like Hystrix or if I can work around this with resilience4j.
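The fill-time arithmetic above is just buffer size divided by request rate. A tiny sketch, assuming a steady request rate (the class and method names are hypothetical):

```java
// Sketch of the fill-time arithmetic, assuming a constant request rate.
final class FillTimeSketch {
    /** Seconds until a count-based buffer of the given size has seen enough calls to fill. */
    static double secondsToFill(int ringBufferSize, double requestsPerSecond) {
        if (ringBufferSize <= 0 || requestsPerSecond <= 0.0) {
            throw new IllegalArgumentException("size and rate must be positive");
        }
        return ringBufferSize / requestsPerSecond;
    }

    public static void main(String[] args) {
        // Off-peak and peak rates taken from the question.
        System.out.printf("off-peak: %.2f s%n", secondsToFill(100, 20.0));
        System.out.printf("peak:     %.2f s%n", secondsToFill(100, 150.0));
    }
}
```

The same function can be run in reverse when reasoning about a candidate size: pick the longest reaction time you can tolerate at the lowest traffic level and see what buffer size that allows.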
Completely agree with Jim Garrison on that one.
This configuration will heavily depend on the requirements and behaviors of your application.
Before you can get an answer to your primary question
With requests fluctuating between 20 and 150 per second, how do I determine the optimal size for the ring buffer?
you should decide what error rate for these particular requests your system will treat as normal, that is, the rate at which the circuit should stay closed.
You should also take into account how quickly your system must react to anomalously high error rates.
By adjusting the CircuitBreaker configuration you are balancing sensitivity against specificity, and the right balance depends entirely on your business requirements.
For example, if safety and availability are top priorities for your system, you can accept some false-positive circuit openings.
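One rough way to put numbers on false-positive openings is to ask how likely a full buffer is to cross the threshold while the backend is behaving normally. The sketch below is a simplified model, not anything resilience4j provides: it assumes calls fail independently at a constant baseline rate, and that the breaker opens once the failure rate in a full buffer is greater than or equal to the threshold. The class and method names are hypothetical.

```java
// Simplified model (assumption): failures are independent with a fixed
// baseline probability, so the failure count in a full buffer is binomial.
final class FalseOpenSketch {
    /**
     * Probability that a full buffer of `bufferSize` calls reaches the
     * threshold purely by chance at the normal (baseline) error rate.
     */
    static double falseOpenProbability(int bufferSize, double baselineErrorRate,
                                       int thresholdPercent) {
        if (bufferSize <= 0 || baselineErrorRate <= 0.0 || baselineErrorRate >= 1.0) {
            throw new IllegalArgumentException("need bufferSize > 0 and 0 < rate < 1");
        }
        // Smallest failure count whose rate is >= threshold (integer ceiling).
        int minFailures = (thresholdPercent * bufferSize + 99) / 100;
        double odds = baselineErrorRate / (1.0 - baselineErrorRate);
        double pmf = Math.pow(1.0 - baselineErrorRate, bufferSize); // P(X = 0)
        double tail = 0.0;
        for (int k = 0; k <= bufferSize; k++) {
            if (k >= minFailures) {
                tail += pmf;
            }
            // Binomial recurrence: P(X = k + 1) from P(X = k).
            pmf = pmf * (bufferSize - k) / (k + 1) * odds;
        }
        return Math.min(1.0, tail);
    }
}
```

Comparing, say, `falseOpenProbability(10, 0.05, 50)` with `falseOpenProbability(100, 0.05, 50)` shows the trade-off: the smaller buffer reacts sooner but is far more likely to open on noise. Real failures tend to be correlated, so treat the output as a guide for discussion rather than a guarantee.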
In real production systems it is pretty hard to get a CircuitBreaker configuration right from scratch, so be ready to externalize this config and change it when needed.
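As a minimal sketch of what externalizing could look like, the settings can be read from a properties source per service and then handed to whatever builds the circuit breaker. The property keys (`breaker.<service>.ringBufferSize` and so on), the class name, and the fallback values are all made up for illustration; they are not resilience4j configuration keys.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical holder for per-service breaker settings loaded from config.
final class BreakerSettingsSketch {
    final int ringBufferSize;
    final int failureRateThresholdPercent;
    final long waitInOpenStateMillis;

    private BreakerSettingsSketch(int ringBufferSize, int failureRateThresholdPercent,
                                  long waitInOpenStateMillis) {
        this.ringBufferSize = ringBufferSize;
        this.failureRateThresholdPercent = failureRateThresholdPercent;
        this.waitInOpenStateMillis = waitInOpenStateMillis;
    }

    /** Reads settings for one service; the fallback values are placeholders. */
    static BreakerSettingsSketch from(Properties props, String serviceName) {
        String prefix = "breaker." + serviceName + ".";
        return new BreakerSettingsSketch(
            Integer.parseInt(props.getProperty(prefix + "ringBufferSize", "100")),
            Integer.parseInt(props.getProperty(prefix + "failureRateThresholdPercent", "50")),
            Long.parseLong(props.getProperty(prefix + "waitInOpenStateMillis", "60000")));
    }

    /** Parses properties text; in practice this would come from a file or config service. */
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }
}
```

Each of the two services can then get its own values, and a change means editing config rather than redeploying code.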