In my last post I showed how simple request rate based throttling could be adapted to provide a more proactive approach to managing service consumption limits; applicable to a range of scenarios like capacity management, service metering and critical service protection. I also emphasized the difference between request throttling and concurrent connection throttling. Over the course of the past few weeks I have had cause to think a little harder about how to control and restrict the number of “in-flight” transactions for a particular service.
The obvious point to note here is that the number of requests processed per second does not necessarily relate to the load on the target system (whatever that may be);for example a search request with broad parameters that returns a huge amount of data will consume more resources on the service provider system than an update that contains a unique identifier for a record and a couple of values. The ability to restrict the number of concurrently processing, high cost requests should be considered as important as controlling the rate of incoming requests. Failing to provide this type of protection for mission critical services has seriously undesirable side-effects.
In this post I’ll give an example of how a gateway can be used to provide protection from system overload (at an operation level) and discuss the various approaches to managing this in an HA (High Availability) and dual site configuration.
First off let’s be clear about in-flight requests and concurrent transactions….
Assuming the target service is exposed to external consumers in the classic service virtualisation pattern above, an “in-flight” request is considered to have a lifetime that begins as soon as the request leaves the gateway (forwarded to the service) and that ends once the response from the service is received by the gateway. For longer running requests (in the order of 2 to 5 seconds or more) the effect of concurrent processing is exaggerated. To clarify, the requests do not need to be concurrent – it is the overlap of the lifetime of in-flight requests that constitutes a “concurrent transaction” in the context of MCT (Maximum Concurrent Transaction) throttling.
In the diagram above, (green dots representing requests) if a limit of 2 MCT had been applied to the service, then subsequent requests would be rejected until such time as one of the requests currently being processed completes and returns.
Clearly the objective here is to manage a count, applying a one-in one-out restriction when the maximum number of allowed transactions currently being processed by the target server is reached… sounds pretty easy, and in a single gateway in-line configuration it is.
Using either the service name, operation name (or a combination of both) or even the URI of the incoming request to provide the key for the increment allows the restriction of the number of concurrently executing requests on the target server.
The challenge here is how to scale, simply dividing the maximum limit by the number of gateways behind a load balancer is notoriously unreliable and could easily result in a loss of transaction capacity if the load balancer has any bias at all. The only real solution is to maintain an atomic transaction count across 1..n gateways.
I spent some time looking at the different options for achieving this, and whilst there are a few that are satisfactory from a functionality point of view, they all came with a cost (usually a performance hit).
The silver bullet was actually handed to me on plate by David Cooke at Oracle (thanks David), who introduced me to Oracle Coherence, a very cool piece of tech. The key to it all is providing a non-blocking lock on the count – allowing multiple gateways to read, increment and decrement the count for a given service/operation. The simple policy shown below is all that is required to implement service level MCT throttling (in its most basic form)
If the gateway fails to acquire a lock on a transaction a custom error message is returned back to the client. The fault handler (in blue) for this circuit also releases the lock – to prevent orphaned transactions. In this example the connection to the target service is also load balanced (by the gateways) to multiple hosts.
As a solution this is highly scalable (multiple gateways, 300+ services/operations) and also self healing; one or more gateways can be removed from the architecture without impacting the availability or the transaction count, as long as there is one gateway up and running. Orphaned transaction locks from a terminated gateway are automatically cleaned up and freed for use by the remaining gateways.
The load balancing performed by the gateway will also mitigate downstream server failure – by removing an unresponsive host from the available pool and adding it back in when it becomes operational and available again.
If that wasn’t enough, the transaction restrictions can be applied even in a dual / multi site (DR) scenario – although performance may vary depending on the inter-site connectivity.
The MCT throttling limits can be implemented as “hard” and “soft” limits with alerting (as described in the previous request throttling – and the concurrency can be tracked against almost any facet of the interaction service/operation/user/client IP address etc… Limits can be stored externally (abstracted to a directory/database / IdM solution) allowing the administration.