I have seen a lot of presentations on circuit breakers, and on how successful they were when Company X or Company Y applied them to their services. Yet I have always asked myself one question: is a circuit breaker really practical in an auto-scaling environment? That question was never really answered by the presenters, and many of the examples were demonstrated on a single node running locally.
I may be wrong, but there are two ways to apply a circuit breaker to your application: at the application level (a.k.a. thick client, where all the logic lives in your application, typical of monolithic applications) or at the infrastructure level (think Linkerd2, Envoy, Traefik, etc.).
At the application level, you would normally use an existing solution such as Hystrix and attach it as middleware. But this can be limiting, since it is hard to find equivalent libraries in other languages (Hystrix is Java-based). The Go equivalent, hystrix-go, for example, does not have a dashboard comparable to the one for Java’s Hystrix. And what if you need a library for Rust? Tough luck. Microservices are supposed to be heterogeneous.
Let’s imagine we have such an application with a circuit breaker baked in, and we scale it out to 10 instances behind a load balancer. If you distribute 1000 requests evenly among these 10 instances, so that each receives 100 requests, what happens when one of them trips? It would most likely reject incoming requests. But the desired outcome would be to stop sending requests to that instance altogether (correct me if I am wrong). Since the state of the circuit breaker is local to the application instance, the load balancer has no way of knowing that it should stop sending requests there. So each instance would still receive 100 requests, and the tripped one would reject all of its requests until it recovers.
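To make the arithmetic concrete, here is a minimal sketch of that scenario in Python. It is a toy simulation, not how any real load balancer or Hystrix works: the `Instance` class, the `round_robin` helper and the `app-N` names are all made up for illustration, and the breaker is simply modelled as a flag that is already open on one instance.

```python
from dataclasses import dataclass
from itertools import cycle


@dataclass
class Instance:
    """One application instance with its own, purely local circuit breaker."""
    name: str
    breaker_open: bool = False  # local state; the load balancer never sees it
    received: int = 0
    rejected: int = 0

    def handle(self) -> bool:
        self.received += 1
        if self.breaker_open:
            # Fail fast: the request still arrived, it is just refused.
            self.rejected += 1
            return False
        return True


def round_robin(instances, total_requests):
    """Spread requests evenly, ignoring breaker state; return the success count."""
    targets = cycle(instances)
    return sum(next(targets).handle() for _ in range(total_requests))


# Hypothetical setup: 10 instances, one of which has a tripped breaker.
instances = [Instance(f"app-{i}") for i in range(10)]
instances[3].breaker_open = True

succeeded = round_robin(instances, 1000)
print(succeeded)                                      # 900
print(instances[3].received, instances[3].rejected)   # 100 100
```

The tripped instance still receives its full share of 100 requests and rejects every one of them, so one in ten requests fails for as long as the breaker stays open.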
By placing the circuit breaker at the infrastructure level, we can achieve what is called latency-aware load balancing. Service instances that exceed a certain error threshold stop receiving requests until they recover, while the requests are distributed among the remaining healthy instances. Also, by moving the logic outside of the application, it becomes much easier to configure the circuit breaker for your applications. Microservices written in any language can share the same configuration without adding a single line of code. Most modern service meshes have circuit-breaker capabilities baked in, as well as other features (retries, gRPC load balancing, rate limiting, etc.) and a centralized dashboard for monitoring.
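As a rough illustration of the idea, the sketch below moves the failure tracking into the balancer itself. This is my own simplified model, not the algorithm of Linkerd2, Envoy or Traefik: the `Backend` and `EjectingBalancer` classes and the `max_consecutive_errors` threshold are hypothetical, and for brevity an ejected backend is never re-admitted, whereas a real proxy would typically let it back in after some cooldown.

```python
from dataclasses import dataclass


@dataclass
class Backend:
    name: str
    healthy: bool = True  # whether this backend currently serves requests successfully
    received: int = 0

    def handle(self) -> bool:
        self.received += 1
        return self.healthy


class EjectingBalancer:
    """Round-robin balancer that stops routing to a backend after repeated errors."""

    def __init__(self, backends, max_consecutive_errors=5):
        self.backends = list(backends)
        self.max_consecutive_errors = max_consecutive_errors  # assumed threshold
        self.errors = {b.name: 0 for b in self.backends}
        self.ejected = set()
        self._next = 0

    def _pick(self):
        # Walk the ring until a backend that has not been ejected is found.
        for _ in range(len(self.backends)):
            backend = self.backends[self._next % len(self.backends)]
            self._next += 1
            if backend.name not in self.ejected:
                return backend
        raise RuntimeError("no healthy backends available")

    def send(self) -> bool:
        backend = self._pick()
        ok = backend.handle()
        if ok:
            self.errors[backend.name] = 0
        else:
            self.errors[backend.name] += 1
            if self.errors[backend.name] >= self.max_consecutive_errors:
                self.ejected.add(backend.name)  # breaker state lives in the balancer
        return ok


# Hypothetical setup: 10 backends, one of which fails every request.
backends = [Backend(f"app-{i}") for i in range(10)]
backends[3].healthy = False

balancer = EjectingBalancer(backends)
succeeded = sum(balancer.send() for _ in range(1000))
print(succeeded)              # 995
print(backends[3].received)   # 5
```

Because the breaker state sits where the routing decision is made, the failing backend sees only the 5 requests needed to trip the threshold, and the remaining 995 go to the nine healthy backends.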
That is my two cents on circuit-breakers in an auto-scaling environment.