When building SaaS products on a microservice architecture, reducing latency is an obvious way to improve performance. But can latency really be reduced?

There’s no silver bullet:

First, you have to find where the latency comes from. Is it between the service and the database? Between the load balancer and the service? Inside the service itself? Only then can you find out why.

In point-to-point communication, latency can be calculated from the response time of the system: the end-to-end latency of a flow is the sum of the latencies of all the microservices it passes through, assuming they are called one after another.
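To make the "sum of all hops" idea concrete, here is a minimal sketch. The hop names and timings are made up for illustration; in practice they would come from your tracing or metrics system. It adds up the per-hop latencies of a sequential flow and points at the hop that contributes the most, which is usually where to look first.

```python
from typing import Dict, Tuple


def end_to_end_latency_ms(hop_latencies_ms: Dict[str, float]) -> float:
    """Total latency of a sequential flow: the sum of every hop's latency."""
    return sum(hop_latencies_ms.values())


def slowest_hop(hop_latencies_ms: Dict[str, float]) -> Tuple[str, float]:
    """Return the (name, latency) of the hop that contributes the most."""
    name = max(hop_latencies_ms, key=hop_latencies_ms.get)
    return name, hop_latencies_ms[name]


# Hypothetical measurements for one request, in milliseconds.
flow = {
    "load_balancer -> orders": 4.0,
    "orders (internal)": 11.0,
    "orders -> db": 38.0,
    "orders -> billing": 17.0,
}

total = end_to_end_latency_ms(flow)
hop, worst = slowest_hop(flow)
print(f"total={total:.1f} ms, slowest hop: {hop} ({worst:.1f} ms)")
```

If some calls run in parallel, a plain sum overstates the total; the sketch only covers the sequential case described above.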

If two services interact a lot, a first step toward reducing the network latency between them is to politely ask the scheduler to place their pods as close together as possible, using Kubernetes affinity features such as node affinity or inter-pod affinity.
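As a rough illustration, the helper below builds the `affinity` fragment of a pod spec as a plain Python dict. Note that it uses inter-pod affinity rather than node affinity, since that is the Kubernetes feature that expresses "near this other pod". The function name and the `app` label are assumptions for the example; the rule is a soft preference, so the scheduler can still place the pod elsewhere if no suitable node is available.

```python
import json


def co_location_affinity(peer_app_label: str, weight: int = 100) -> dict:
    """Build a soft inter-pod affinity rule that prefers nodes already
    running pods labelled app=<peer_app_label> (label name is an assumption)."""
    if not 1 <= weight <= 100:
        raise ValueError("weight must be between 1 and 100")
    return {
        "affinity": {
            "podAffinity": {
                # "preferred" = best effort; the pod is still scheduled
                # if no node satisfies the rule.
                "preferredDuringSchedulingIgnoredDuringExecution": [
                    {
                        "weight": weight,
                        "podAffinityTerm": {
                            "labelSelector": {
                                "matchLabels": {"app": peer_app_label}
                            },
                            # Same node; a zone-level key would be looser.
                            "topologyKey": "kubernetes.io/hostname",
                        },
                    }
                ]
            }
        }
    }


# Fragment to merge into the spec of the pod that calls "billing" a lot.
print(json.dumps(co_location_affinity("billing"), indent=2))
```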

A low-latency microservice reacts to events, processes the input data, and generates output data. In many cases the function is stateless: all required information is derived from the input events. In some cases, however, efficiency is gained by maintaining some state between transactions. High performance and low latency are the objectives that determine whether a particular Chronicle service maintains state or is stateless. If state is maintained, it can be persisted to reduce start-up time the next time the service is launched.

Adding timeout and retry functionality to a service sounds like a good idea. But if another service is chronically slow, it will always trigger the timeout, and the retries will put additional stress on an already overloaded service, causing a bigger latency problem than the one the fix was meant to resolve.
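One common way to keep retries from piling onto a struggling service is to cap the number of attempts and back off between them. The sketch below is one possible shape, not a prescription: `call_with_retries` and its parameters are hypothetical names, and it assumes the wrapped operation signals a timeout by raising `TimeoutError`.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(
    operation: Callable[[], T],
    max_attempts: int = 3,
    base_delay_s: float = 0.1,
    max_delay_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call operation(), retrying on TimeoutError with capped
    exponential backoff and random jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TimeoutError:
            if attempt == max_attempts:
                # Give up instead of hammering a service that is already slow.
                raise
            # The backoff ceiling doubles on each attempt, up to max_delay_s.
            ceiling = min(max_delay_s, base_delay_s * 2 ** (attempt - 1))
            # Jitter spreads out retries from many callers.
            sleep(random.uniform(0, ceiling))
    raise RuntimeError("unreachable")
```

The `sleep` parameter exists only so the delay can be replaced in tests. A bounded retry count still adds load, so it is worth pairing with the throttling ideas discussed further down.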

Asynchronous calls can help to avoid a single slow response slowing down the entire response chain. But developers also need to be careful to avoid anti-patterns with these asynchronous calls.
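As a small sketch of the idea, the snippet below fans out to several downstream calls concurrently with `asyncio` and gives each one its own deadline, so one slow dependency yields a missing value instead of holding up the whole response. The service names and delays are simulated stand-ins, and `fan_out` is a hypothetical helper.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Optional


async def fan_out(
    calls: Dict[str, Callable[[], Awaitable[str]]],
    timeout_s: float,
) -> Dict[str, Optional[str]]:
    """Run all calls concurrently; a call that exceeds timeout_s yields None."""

    async def guarded(call: Callable[[], Awaitable[str]]) -> Optional[str]:
        try:
            return await asyncio.wait_for(call(), timeout_s)
        except asyncio.TimeoutError:
            return None  # degrade gracefully instead of blocking the chain

    results = await asyncio.gather(*(guarded(c) for c in calls.values()))
    return dict(zip(calls.keys(), results))


async def fake_service(name: str, delay_s: float) -> str:
    """Stand-in for a real network call (simulated with a sleep)."""
    await asyncio.sleep(delay_s)
    return f"{name}: ok"


async def main() -> Dict[str, Optional[str]]:
    return await fan_out(
        {
            "profile": lambda: fake_service("profile", 0.01),
            "recommendations": lambda: fake_service("recommendations", 2.0),
        },
        timeout_s=0.2,
    )


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Whether a missing result is acceptable depends on the call; for some dependencies the right response to a timeout is still an error.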

Throttling requests or using fixed connection limits on a service-by-service basis can help your receiving services keep up. Throttling also helps with fairness by preventing a few hyperactive services from starving the others.

While throttling does keep the service available to your application, it will make the application slower. That is still a better alternative than having the application fail altogether.
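One simple way to throttle per caller is a token bucket for each calling service. The sketch below is an in-memory, single-process illustration with hypothetical class names; a real deployment would more likely rely on a gateway, sidecar, or shared store. Each caller gets its own bucket, so a hyperactive service uses up only its own allowance.

```python
import time
from typing import Callable, Dict


class TokenBucket:
    """Allows bursts of up to `burst` requests, refilled at `rate_per_s`."""

    def __init__(self, rate_per_s: float, burst: int,
                 clock: Callable[[], float] = time.monotonic):
        self.rate_per_s = rate_per_s
        self.burst = burst
        self.clock = clock
        self.tokens = float(burst)
        self.last = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never beyond the burst size.
        self.tokens = min(self.burst,
                          self.tokens + (now - self.last) * self.rate_per_s)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class PerCallerThrottle:
    """One bucket per calling service, so callers cannot starve each other."""

    def __init__(self, rate_per_s: float, burst: int,
                 clock: Callable[[], float] = time.monotonic):
        self.rate_per_s = rate_per_s
        self.burst = burst
        self.clock = clock
        self._buckets: Dict[str, TokenBucket] = {}

    def allow(self, caller: str) -> bool:
        bucket = self._buckets.get(caller)
        if bucket is None:
            bucket = TokenBucket(self.rate_per_s, self.burst, self.clock)
            self._buckets[caller] = bucket
        return bucket.allow()
```

A rejected request would typically be answered with a "try again later" response rather than being queued, which is where the extra latency for throttled callers comes from.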

Even properly configured and optimized services can have performance ceilings.

If you’ve already determined that all your requests are necessary and optimized, and you’re still overloading your service, consider load balancing across additional containers to improve scalability.

Also consider auto-scaling, which dynamically adjusts to the incoming request load by adding and removing containers as necessary. If you go this route, be sure to implement a maximum container count and have a plan for defending against DDoS (Distributed Denial of Service) attacks, especially if your application is deployed in a public cloud.
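To show where the maximum container count fits, here is a minimal sketch of a scaling decision. It uses a simple proportional rule, similar in shape to the one documented for the Kubernetes Horizontal Pod Autoscaler, and clamps the result between a floor and a ceiling. The function name, the load metric, and the default limits are assumptions for illustration.

```python
import math


def desired_containers(
    current: int,
    observed_load_per_container: float,
    target_load_per_container: float,
    min_containers: int = 1,
    max_containers: int = 10,
) -> int:
    """Scale the container count in proportion to load, within hard limits."""
    if current < 1:
        raise ValueError("current must be at least 1")
    if target_load_per_container <= 0:
        raise ValueError("target_load_per_container must be positive")
    if min_containers > max_containers:
        raise ValueError("min_containers must not exceed max_containers")
    raw = math.ceil(
        current * observed_load_per_container / target_load_per_container
    )
    # The ceiling stops a traffic flood from scaling the bill without bound.
    return max(min_containers, min(max_containers, raw))


print(desired_containers(4, 150.0, 100.0))   # moderate overload: scale up
print(desired_containers(4, 1000.0, 100.0))  # flood: capped at max_containers
print(desired_containers(4, 10.0, 100.0))    # quiet period: scale down
```

The cap limits cost during an attack but does not stop one, so it complements a DDoS defense rather than replacing it.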


Addressing performance issues in microservices is not a simple, one-off task. Given the increasing dependence on them while developing software products, it is too big an issue to ignore.

It pays to have a robust strategy for solving latency problems today and mitigating them in the future. It is basically no different from bug hunting and fixing; only here, the bug is latency.
