Roger Stoffers



Roger is a Certified SOA Architect, Security Specialist and Consultant as well as a Certified Trainer for Arcitura Education. He is also a TOGAF-certified Enterprise Architect and a senior Solution Architect with an affinity for service-orientation and Cloud at Hewlett Packard Enterprise in the Netherlands. He has led many SOA and Cloud projects, primarily in the telecommunications and media sectors, and has 20 years of international experience working with large organizations across many countries. He is a contributor to the book Service-Oriented Architecture: Analysis and Design for Services and Microservices from the Prentice Hall Service Technology Series from Thomas Erl.

As a Solution Architect, Roger has a wide interest in different types of architectures and leads many service-orientation initiatives. He is interested in application integration as well as its end-to-end consequences for businesses, organizations and business processes, and he always looks for the business drivers behind the requirements in order to find the solution with the greatest potential for business satisfaction.

As an enterprise architect, his interests include translating business drivers and goals into architecture principles and requirements. He is also interested in the relationships between business and IT: how business decisions affect IT and how IT decisions can affect businesses. As a strategic and pragmatic thinker, he has designed several innovative future-state architectures, some of which have already been realized, while others are still in progress due to their long-term nature.

Roger has worked at Hewlett Packard Enterprise as an Enterprise and Solution Architect, leading many projects for Digital Transformation and Enterprise Application Transformation. His recent work involved being the Lead Architect for many projects in a large-scale business and IT transformation program, acting as strategic advisor for the program board, and providing advice about integration strategy and principles. Roger is also responsible for the strategic architecture and governance principles of a Global Service Bus in a challenging multi-national environment where central governance was not possible.

His further expertise covers a wide range of business capabilities, including but not limited to: Billing, CRM, Retail, Credit Management, Product and Offer Management, Service Provisioning, Business Process Management, Lead Management, Sales and Ordering, and Order Fulfillment.




Runtime Governance Challenges & Concerns:
A Practical Approach to Throttling

Published: January 19, 2012 • Service Technology Magazine Issue LVIII

Abstract: Just as technology vendors muddied the meaning of SOA by relabeling their products as "even better SOA than my competitor's", runtime governance tooling is now running into the same problem. Vendors say "Buy our product and all your problems will be solved. A system admin can do the job in less than one hour and your show is on the road again". But is this really possible (recall Thomas Erl's "Evil-twin" and "Good-twin" analogy [REF-1])? This article takes throttling as an example of runtime governance and shows how difficult it can be to come up with a proper strategy. It also shows that the use of runtime governance tooling must be arranged up front, in the definition of the reference architecture, to make sure the system is not limited accidentally. This involves defining which kinds of policies can be applied and how, and who is responsible for defining and managing them.


When runtime governance vendors present their tools to potential customers, they explain that installation is easy and that implementing the behavior of the service agents requires configuration only, which system administrators can simply take care of.

The pitfall is that some customers infer from these statements that they can purchase licenses for the required service agents, hand the CD to an administrator, and let the administrator configure the throttling agents into the service inventory. All problems solved. This is an incorrect assumption, and one that vendors often (ab)use because “zero-build” is a good selling argument.

Let’s start by defining what throttling actually is from several points of view, and then illustrate how far off these “zero-build” conclusions can be and how costly they can be to resolve.

For vendors, throttling means high-value sales. Vendors will spend lots of time persuading the customer that this (service) agent is the solution to their problems with system load and capacity. The zero-build argument all but guarantees that customers will spend more money on something that promises to save build effort.

In the eyes of customers, throttling is a way to manage the amount of traffic to a service or a back-end system without build effort, and customers pay significant amounts of money for throttling agents. The expectation is often that they manage throughput by restricting access to a service so that it does not exceed a certain metric (e.g. a specific number of calls per second or per minute, depending on the expected load). These are tangible metrics which can be discussed. Customers can easily assume that the throttling agent reduces the load on the system automatically, without thinking about the consequences of that statement. It sounds like magic, and if we believe the vendors, it is magic.

In the eyes of designers and architects, throttling is a way of managing the runtime characteristics of services within a service-oriented solution.
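To make the customers' metric concrete, a policy such as "at most N calls per time window" can be sketched in Python as a fixed-window counter. This is a minimal illustration under stated assumptions, not how any particular vendor's throttling agent works: the class name `FixedWindowThrottle` and its parameters are hypothetical, and a fixed window is only one of several counting schemes (sliding windows and token buckets are common alternatives).

```python
import time
from typing import Callable


class FixedWindowThrottle:
    """Hypothetical throttling policy: at most `limit` calls per `window_seconds`."""

    def __init__(self, limit: int, window_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window_seconds = window_seconds
        self.clock = clock            # injectable so the example is deterministic
        self.window_start = clock()
        self.count = 0

    def try_acquire(self) -> bool:
        """Return True if the call fits within the threshold, False if it is throttled."""
        now = self.clock()
        if now - self.window_start >= self.window_seconds:
            # The current window has elapsed: start a new one with an empty counter.
            self.window_start = now
            self.count = 0
        if self.count < self.limit:
            self.count += 1
            return True
        return False


# Usage with a fake clock: 3 calls per second allowed.
fake_now = [0.0]
throttle = FixedWindowThrottle(limit=3, window_seconds=1.0, clock=lambda: fake_now[0])
print([throttle.try_acquire() for _ in range(5)])  # [True, True, True, False, False]
fake_now[0] = 1.0
print(throttle.try_acquire())                      # True: a new window has started
```

Note what the sketch leaves open: `try_acquire` only says "no". What should happen to the rejected message is the question the rest of this article deals with.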

What does a statement like this mean for the messages that arrive at the throttling agent and are marked as exceeding the predefined threshold?

For a consumer program this could mean that the message has to wait until the measured load no longer exceeds the threshold. For the throttling agent itself this usually means that the service call cannot be executed, because allowing it would violate the threshold even further. Depending on the business goal and the context in which a consumer requests the service activity, the consequences of throttling a message differ in severity from one scenario to the next.

Throttling Examples

What should happen to a service call exceeding the threshold? This actually depends on the purpose of the service and the context in which this service call is executed. The following three examples can be seen in Figure 1.

  1. A read capability for customer data intended for display in a client application
  2. An update capability for an address based on a consumer program request, or for placing an online order on behalf of a customer.
  3. A read capability in the context of an update service composition. (Perhaps this is exactly the same read capability as mentioned in #1.)
Figure 1 - Three (Business) Contexts of Capabilities.

Let’s highlight some characteristics for each of these:

Ad 1:
It seems acceptable to respond “too busy” to the service consumer. If the request is not too important, the consumer can retry at a less busy time. For the throttling agent this means it is acceptable to discard the message and respond “too busy”.

Ad 2:
This case is a little less easy to deal with. Discarding the message most likely does not meet the business requirements. If the front-end application (service consumer) is a web application that exposes a page for submitting address changes or orders, responding “too busy” after the data has been captured might even cost customers and revenue. But then what? It seems that storing the message and retrying at a less busy time should be fine. This means that store and forward is acceptable unless the context of the write operation requires a status to be returned to the consumer. If a success/failure status must be returned, however, this is not an option.
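The first two cases amount to a decision the throttling agent has to make per message. A minimal Python sketch, assuming the message carries an operation type and a flag saying whether the consumer needs a synchronous status; both attribute names are hypothetical, and the in-memory `parked` deque merely stands in for a persistent store:

```python
from collections import deque
from dataclasses import dataclass, field
from enum import Enum


class Outcome(Enum):
    TOO_BUSY = "too busy"        # discard and tell the consumer (Ad 1)
    STORED = "stored"            # store and forward for later execution (Ad 2)


@dataclass
class Request:
    operation: str               # "read" or "write" (hypothetical message attribute)
    needs_status: bool           # must a success/failure status go back to the consumer?
    payload: dict = field(default_factory=dict)


parked: deque = deque()          # stand-in for a persistent message store


def handle_throttled(request: Request) -> Outcome:
    """Decide what happens to a request that exceeds the throttling threshold."""
    if request.operation == "write" and not request.needs_status:
        parked.append(request)   # no synchronous status needed: execute it later
        return Outcome.STORED
    # Reads, and writes whose consumer is waiting for a status, get "too busy".
    return Outcome.TOO_BUSY
```

Writes that must return a status fall back to "too busy" here, in line with the observation that store and forward is not an option for them.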

Ad 3:
This scenario is distinctly different from the first one. Discarding the “read message” executed in the context of an update service activity will likely trigger a retry mechanism. The retry mechanism will make the same message come back, potentially even more quickly than a retry coordinated by a service consumer. This significantly increases the resource load on the messaging infrastructure as well as on the throttling mechanism. One way to overcome this issue is to add message properties that identify the context in which the message is being executed. On the same throttled service, but in a different context, you can then decide to allow the service call even though messages in the ‘regular read’ scenario would be refused. For this you can even use two different throttling statistics, also known as throttling policies.
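Selecting a throttling policy by context can be sketched as follows. The sketch assumes the caller sets a `context` message property (for example in a message header); the property name, its values and the limits are all hypothetical, and window resets are left out to keep the example short. Messages without a recognized context fall under the stricter policy, so a consumer cannot bypass throttling simply by omitting the property.

```python
from dataclasses import dataclass


@dataclass
class CallBudget:
    """Hypothetical per-window call budget (window resets omitted for brevity)."""
    limit: int
    used: int = 0

    def try_acquire(self) -> bool:
        if self.used < self.limit:
            self.used += 1
            return True
        return False


# Two throttling policies for the SAME read capability, selected by context.
policies = {
    "regular-read": CallBudget(limit=2),
    "update-composition": CallBudget(limit=5),
}


def admit(message: dict) -> bool:
    """Pick the policy from the assumed 'context' property; unknown or missing
    contexts fall under the stricter regular-read policy."""
    context = message.get("context", "regular-read")
    return policies.get(context, policies["regular-read"]).try_acquire()
```

Once the regular-read budget is exhausted, a read issued inside an update composition is still admitted, so it does not trigger the aggressive retry loop described above.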

This does, however, expose an issue related to throttling in a service composition context (see the Where to Throttle in the SOA section).

But let’s make things a bit more complex. One can ask whether the order in which messages are processed is significant in the context of the core service logic. This determines whether it is acceptable to park requests that exceed the threshold for later execution. Even if requests are parked, this merely moves the problem to another place in the system or to another point in time: if the number of throttled (parked) messages is large, processing them later may present another throttling challenge. If we are just moving the problem, this does not seem like a viable solution.
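A back-of-the-envelope calculation shows why parking only moves the problem. Assume constant rates for simplicity (the symbols are introduced here for illustration only): messages arrive at rate $\lambda_{\text{peak}}$ during a peak lasting $T$, the throttling policy lets through $\mu$ messages per second, and after the peak messages keep arriving at $\lambda_{\text{off}}$. With $\lambda_{\text{peak}} > \mu$, the parked backlog $B$ and the time needed to drain it are:

$$
B = (\lambda_{\text{peak}} - \mu)\,T, \qquad
T_{\text{drain}} = \frac{B}{\mu - \lambda_{\text{off}}} \quad (\mu > \lambda_{\text{off}})
$$

For example, with $\lambda_{\text{peak}} = 150$ msg/s, $\mu = 100$ msg/s and a 10-minute peak ($T = 600$ s), $B = 50 \times 600 = 30{,}000$ parked messages. If off-peak traffic is $\lambda_{\text{off}} = 80$ msg/s, draining takes $30{,}000 / 20 = 1{,}500$ s, i.e. 25 minutes during which the service runs at its throttling limit. As $\lambda_{\text{off}}$ approaches $\mu$ the drain time grows without bound, and if $\lambda_{\text{off}} \ge \mu$ the backlog never drains.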

If the order of execution must be guaranteed, this solution cannot be used, because no new messages can be executed until the one exceeding the throttling policy has been processed successfully. This is another approach that does not seem viable for solving the throttling issue, and in addition it seriously affects the scalability of the service logic.

What other options do we have? In this scenario, throttling should do no more than ensure that messages do not get lost while the availability of the service provider is not guaranteed (exceeding the throttling policy is equivalent to the service provider having an availability issue). This can be solved by using a queue to bridge the periods of reduced availability of the service provider, which should be sufficient to ensure that the address update is (eventually) executed.

One way to throttle in this situation is to read messages from a queue (the store holding messages posted by service consumers) at a predefined maximum rate so that the throttling policy is never violated. This is only possible if the consumer does not require a synchronous response to its update request. It is a perfectly fine solution in which no messages get lost, but the order of execution is maintained only if messages are read from the queue one at a time, which seriously impacts the scalability of the throttled service.
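A minimal sketch of such a rate-limited reader in Python, using the standard library's `queue.Queue` as a stand-in for the persistent queue (a production setup would use a durable message queue, which this sketch does not model). The function name and parameters are hypothetical. Because there is exactly one reader, messages are handled in the order they were posted; adding readers would raise throughput but give up that guarantee, which is the scalability trade-off described above.

```python
import queue
import time
from typing import Callable


def drain_at_rate(inbox: "queue.Queue[dict]", handle: Callable[[dict], None],
                  max_per_second: float,
                  sleep: Callable[[float], None] = time.sleep) -> int:
    """Single reader: take messages one at a time in FIFO order and never hand them
    to the service faster than max_per_second. Returns how many were processed.
    For the demo it stops when the queue is empty; a real listener would keep waiting."""
    interval = 1.0 / max_per_second
    processed = 0
    while True:
        try:
            message = inbox.get_nowait()
        except queue.Empty:
            return processed
        handle(message)          # one message at a time preserves the posting order
        inbox.task_done()
        processed += 1
        sleep(interval)          # pacing keeps the provider under the throttling policy


# Usage: three address updates posted by a consumer, drained at 50 msg/s.
inbox: "queue.Queue[dict]" = queue.Queue()
for seq in range(3):
    inbox.put({"seq": seq, "street": f"Street {seq}"})
applied = []
print(drain_at_rate(inbox, lambda m: applied.append(m["seq"]), max_per_second=50))  # 3
print(applied)  # [0, 1, 2]
```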

The processing order of messages is relevant if out-of-order execution causes data integrity issues or service failures. For example, two subsequent address changes for the same customer result in incorrect address details in the customer database if they are executed out of order. Similarly, if the core service logic of a service must make subsequent service calls, and the second one cannot complete successfully unless the first one has been executed successfully, out-of-order execution creates similar data inconsistencies.

Figure 2 - Out of Order Execution Causes Data Corruption.

One way to make the system more scalable, applicable only in certain situations, is to skip messages when a more recent one has already been received. Taking the two subsequent address changes as an example: only the most recent address change message is relevant to the system. This can be achieved by assigning a time-based or sequence-based message property or header element to each message upon receipt by the service provider. In Figure 2, the last update would not happen if the system could tell that the first update it processed was actually caused by a more recent consumer request. Perhaps the service consumer can already assign this property to the request, although one shouldn't abuse a consumer to help solve a runtime service availability problem. An expression on the message ID or sequence number can then identify whether an incoming message can be dropped because a more recent one has already been processed (a similar mechanism can be used to detect replay attacks). For this to work, some form of data store must be available in the system to keep track of what has been processed.
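The sequence check can be sketched in Python as follows. It assumes the provider stamps each message with an increasing sequence number on receipt; the class and field names are hypothetical, and the two dictionaries stand in for the tracking data store and the customer database. In a real system the compare-and-update on the tracking store must be atomic (for example, a conditional update in the database), which this single-threaded sketch does not show.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class AddressChange:
    customer_id: str
    sequence: int        # assumed: stamped on receipt, increasing over time
    address: str


class LatestWinsApplier:
    """Skips an address change when a more recent one for the same customer
    has already been applied, so out-of-order delivery cannot undo a newer update."""

    def __init__(self) -> None:
        self.last_sequence: Dict[str, int] = {}  # stand-in for the tracking data store
        self.addresses: Dict[str, str] = {}      # stand-in for the customer database

    def apply(self, change: AddressChange) -> bool:
        """Return True if applied, False if dropped as stale (or replayed)."""
        last = self.last_sequence.get(change.customer_id)
        if last is not None and change.sequence <= last:
            return False
        self.last_sequence[change.customer_id] = change.sequence
        self.addresses[change.customer_id] = change.address
        return True


applier = LatestWinsApplier()
# The newer change (sequence 2) happens to be processed first...
print(applier.apply(AddressChange("cust-1", 2, "New Street 2")))  # True
# ...so the older change that arrives late is skipped instead of overwriting it.
print(applier.apply(AddressChange("cust-1", 1, "Old Street 1")))  # False
print(applier.addresses["cust-1"])                                 # New Street 2
```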

Where to Throttle in the SOA

Where should throttling happen anyway? There is no single right answer. Let’s consider the following layered service inventory:

  A. Public services controlling access to the services inside the service inventory, acting as a formalized and centralized endpoint for all external access to the service inventory (i.e. federated endpoint layer)

  B. Orchestration services (orchestrated task services) controlling all centralized and long-running processes

  C. Business services (i.e. task services, utility services) which are the sole access point to any underlying layer

  D. Data services (i.e. entity services, utility services) controlling all back-end access

Figure 3 - Throttling Can Happen in Several Places in the Infrastructure.

What happens if we throttle on services in each of these layers?

Ad A:
This can control the amount of traffic allowed into the service inventory, but what does that achieve? Only throttling at that level of the infrastructure. It is good for enforcing specific service consumer policies and, indirectly, for keeping the load on the underlying system manageable, but in the end many public services can access the same business services or back-end services in complex compositions. This results in a significantly greater number of requests to the back end, which can be a multiple of the number of client requests. Furthermore, not all business services need to be exposed at the public level, meaning that “hidden” load can exist on the lower layers inside the service inventory that the throttling mechanism is not measuring. The best use of throttling at this layer is to control the amount of load a specific consumer is allowed to produce on a certain endpoint.
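The multiplication effect, and the hidden load, can be made concrete with a small calculation. The sketch below assumes each public service triggers a fixed number of back-end calls per request; all service names and rates are hypothetical.

```python
from typing import Dict


def backend_load(public_rates: Dict[str, float],
                 fan_out: Dict[str, Dict[str, int]],
                 hidden_rates: Dict[str, float]) -> Dict[str, float]:
    """Calls per second arriving at each back-end service.

    public_rates: calls/s admitted by the public-layer throttle, per public service
    fan_out:      back-end calls caused by ONE call to each public service
    hidden_rates: calls/s from internal consumers the public throttle never sees
    """
    load = dict(hidden_rates)
    for service, rate in public_rates.items():
        for backend, calls in fan_out.get(service, {}).items():
            load[backend] = load.get(backend, 0.0) + rate * calls
    return load


# 150 client calls/s pass the public throttle...
print(backend_load(
    public_rates={"placeOrder": 50.0, "getCustomer": 100.0},
    fan_out={"placeOrder": {"CustomerData": 2, "BillingData": 1},
             "getCustomer": {"CustomerData": 1}},
    hidden_rates={"CustomerData": 40.0},
))
# ...but CustomerData alone receives 240 calls/s: {'CustomerData': 240.0, 'BillingData': 50.0}
```

A public-layer limit of 150 calls/s therefore says little about the 240 calls/s that reach `CustomerData`, 40 of which the public throttle never sees.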

Ad B:
A system can hardly be throttled at this level, as process starters are often inside the orchestration engine and cannot be exposed to service agents. Because a process can be a complex enterprise asset, for example due to its long-lived execution, time-based triggers, schedulers and compensation handlers, throttling at this level seems virtually useless. If a process must be controlled, it is probably best to throttle externally to the process (i.e. in the layers ‘above’ or ‘below’ the orchestration layer). This essentially amounts to controlling the flow of messages to and from the process in general.

Ad C:
Business services might seem the most suitable place for throttling, because all the traffic coming into the system has to pass through them: generally speaking, they see the traffic from the top layers and the traffic from the layers below. But multiple business services may be combined in complex compositions, resulting in a throttling nightmare when the need to throttle these kinds of services arises.

For example, if I had an order service used in a composition that also invoices the customer and schedules an electronic payment, on which of these elements (functional business service areas) would I throttle? On one of the composed services? On the composition controller? On all of the above? (check all that apply)

Even here a single answer is not possible, because it all depends on the purpose of the throttling, which differs per throttled service and per context in which the service is being used. Last, but not least: more than one task service may use a single entity service. If the business service is being used to control access to a particular data service, then how do we control the total amount of load on the data service? We can’t, at least not in this layer.

Ad D:
This can probably control the back-end load best, but as data access services usually do not know the context in which they are being called, throttling at this level has its restrictions. Referring back to the third example in Figure 1: if the read capability of underlying data is relevant for an update operation in the same service composition, what effect does refusing the read operation have on the service composition, and what are the consequences for the SOA infrastructure? This is not easy to answer, as the answer probably differs for many services.

It all comes down to why the throttling happens, from a business or a technology point of view. Sometimes throttling happens to protect legacy and back-end systems, which calls for throttling at the data service level. Sometimes it happens to protect the middleware from excessive load, which can probably best be managed at the business service level. Sometimes a single consumer can threaten the entire system’s availability; then a throttling policy for the relevant service capabilities exposed to that particular consumer can protect other consumers from the load the throttled consumer causes in the middleware.
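The consumer-specific policy from the last case can be sketched as a limiter keyed on the (consumer, capability) pair, so that one heavy consumer is limited on selected capabilities while everyone else passes. All names and limits below are hypothetical, and window resets are omitted for brevity.

```python
from typing import Dict, Tuple

Key = Tuple[str, str]   # (consumer, capability)


class ConsumerThrottle:
    """Hypothetical consumer-specific policy: only the listed (consumer, capability)
    pairs are limited; all other traffic passes. Window resets omitted for brevity."""

    def __init__(self, limits: Dict[Key, int]) -> None:
        self.limits = limits
        self.used: Dict[Key, int] = {}

    def admit(self, consumer: str, capability: str) -> bool:
        key = (consumer, capability)
        if key not in self.limits:
            return True                      # no policy for this pair
        if self.used.get(key, 0) >= self.limits[key]:
            return False                     # this consumer has used up its share
        self.used[key] = self.used.get(key, 0) + 1
        return True


# Usage: limit one heavy consumer on one capability, leave everyone else alone.
throttle = ConsumerThrottle({("batch-reporting", "getCustomer"): 2})
print([throttle.admit("batch-reporting", "getCustomer") for _ in range(3)])  # [True, True, False]
print(throttle.admit("web-shop", "getCustomer"))                             # True
```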

Each method has its pros and cons. Looking at the overall picture, it may be perfectly fine to throttle at two or three levels. Although a combined throttling policy may not be the easiest to comprehend and may not use system resources to the best extent, it remains a popular method because it guards a number of key system parameters. The result is a solution that remains manageable without the need for immediate capacity enhancements.
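Throttling at several levels means a call is admitted only if every layer on its path still has room. A minimal Python sketch of that combined check, with hypothetical names and limits; one design choice worth noting is that all levels are checked before any of them counts the call, so a rejection at the data layer does not silently consume budget at the public layer.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LevelBudget:
    """Hypothetical per-window budget for one layer (window resets omitted)."""
    name: str
    limit: int
    used: int = 0

    def has_room(self) -> bool:
        return self.used < self.limit


def admit_all(path: List[LevelBudget]) -> bool:
    """Admit a call only if EVERY level on its path has room. Checking all levels
    before counting the call means a rejection at one level does not use up
    budget at another. Single-threaded sketch: concurrent callers would need a
    lock around the check and the update."""
    if not all(level.has_room() for level in path):
        return False
    for level in path:
        level.used += 1
    return True


# Usage: a generous public-layer policy in front of a strict data-layer policy.
public = LevelBudget("public", limit=10)
data = LevelBudget("data", limit=2)
print([admit_all([public, data]) for _ in range(3)])  # [True, True, False]
print(public.used, data.used)                        # 2 2
```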

Of course, throttling policies can be used in other ways as well, for example to give priority to certain messages, to messages from certain consumers or to certain customer requests on the system. Many other uses exist, but this is just an example.


Throttling is not trivial. In fact, it is crucial to perform an up-front analysis of your throttling architecture before any policy is applied. This can be formalized by documenting specific approaches for specific situations in a reference architecture document.

A well-respected colleague of mine once said: the throttled message can be discarded and a technical exception can be thrown to the service consumer. Although this may be fine for many messages and throttling implementations, be aware that more options exist, and these can be addressed by having a “throttling (reference) architecture”.

A throttled service may throw a technical exception. Consumers usually treat technical exceptions as a permanent failure of a service call. If the call was a read operation, it will probably not be repeated, but if it is a write operation, the consumer may have retry mechanisms in place that immediately result in another call with the same message. This is, however, the easiest and most straightforward implementation, and it can be introduced without major implementation issues in the system. Most initial implementations probably use this method.

Some caution with this statement: if the consumer treats this as an invalid call to another system, elaborate log analysis sessions may follow, since people cannot tell the difference between a back-end availability issue and a throttled message response. To make this distinction, you may not be able to avoid customizing the services, although this can be done in standardized ways that make it easier for solution designers to understand and use these features.

What must be clear by now is that pairing “throttling” with “zero-build” should be considered a contradictio in terminis. To do it properly, prepare the reference architecture aspects concerned with throttling. These can have consequences at the service architecture and service composition architecture levels, which can hardly be considered zero-build. Retrofitting an existing service inventory with these kinds of changes can be very costly, as cascading effects can be expected throughout the service inventory and even outside of it. Assuming that existing service consumers are not ready for these kinds of changes, it becomes clear that all of this can only start with good planning and implementation strategies.

Figure 4 - Standard Exception Does Not Allow the Consumer to Find Out When a Retry Makes Sense, New Exception is Not Always Understood.

A throttled service may return a technical or functional status (i.e. “not now”), but in the end this means that service consumers must be able to understand this message. What it means is that the message cannot be completed at present. An immediate retry probably does not make sense, but a retry at a later time may work perfectly well: a delayed retry may succeed after all, whereas an immediate retry would not. This requires up-front analysis and design of the SOA infrastructure to make sure this is even possible.
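On the consumer side, a "not now" status that is distinct from a back-end failure, and that carries a hint for when to retry, could look like the Python sketch below. The exception names, the `retry_after` hint and the retry helper are hypothetical, not part of any particular product; the point is that the consumer retries only throttled calls, only after a delay, and lets genuine back-end failures surface so that log analysis can tell the two apart.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class ThrottledError(Exception):
    """Hypothetical 'not now' status. Kept distinct from back-end failures so that
    consumers and log analysis can tell the two apart."""

    def __init__(self, retry_after: float) -> None:
        super().__init__(f"throttled; retry after {retry_after} s")
        self.retry_after = retry_after   # hint: when a retry is likely to make sense


class BackendUnavailableError(Exception):
    """Hypothetical technical failure of the provider or back end."""


def call_with_delayed_retry(call: Callable[[], T], max_attempts: int = 3,
                            sleep: Callable[[float], None] = time.sleep) -> T:
    """Retry only when throttled, and only after the suggested delay.
    Any other exception, including BackendUnavailableError, propagates at once."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 1
    while True:
        try:
            return call()
        except ThrottledError as err:
            if attempt >= max_attempts:
                raise                      # give up: report "still throttled"
            sleep(err.retry_after)         # delayed retry, never an immediate one
            attempt += 1


# Usage: the first attempt is throttled, the delayed retry succeeds.
responses = iter([ThrottledError(0.1), "address updated"])


def update_address():
    result = next(responses)
    if isinstance(result, Exception):
        raise result
    return result


print(call_with_delayed_retry(update_address))  # address updated (after a 0.1 s pause)
```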

Once a reference architecture exists, it should be easier for system administrators to think about and implement new policies and to fine-tune existing throttled entities. Be aware, however, that depending on how elaborate the throttling architecture is, its complexity may increase dramatically. Even if the current throttling parameters are understood perfectly well, a dependency analysis must be conducted to fully assess and understand the implementation of a new throttling policy. Sometimes a small change in one parameter or in a service composition can have catastrophic cascading consequences that cripple (areas of) the service inventory.

A similar risk applies to changes to an existing throttling policy. As soon as the throttling policy is tuned to a point near the “typical load”, or the typical load increases to a value near the configured threshold, fairly small changes in system load can cause dramatic changes in system performance and behavior.
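A standard textbook queueing model illustrates why operating near the threshold is so sensitive. Purely as an illustration, assume the throttled service behaves like an M/M/1 queue: Poisson arrivals at rate $\lambda$ and exponentially distributed processing at the throttled rate $\mu$ (real traffic and real throttling agents will differ). The mean time a message spends waiting and being processed is then:

$$
W = \frac{1}{\mu - \lambda}, \qquad \lambda < \mu
$$

With $\mu = 100$ msg/s: at $\lambda = 90$ msg/s, $W = 0.1$ s; at $\lambda = 95$ msg/s, $W = 0.2$ s; at $\lambda = 99$ msg/s, $W = 1$ s. A 10% increase in load, from 90 to 99 msg/s, makes the mean time ten times longer.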

Changes or new policies should be investigated by a team consisting of administrators (knowledge of the current system), architects (system dependencies and consequences) and capacity planners (future system load). Once the analysis and configuration are done, any new policy of this kind must be tested.


[REF-1] Erl, Thomas & Manes, Anne Thomas. “Exorcising the Evil SOA: A Necessary Step Towards Next Generation SOA”, 2009.