Software Pipelines Theory
From Software Pipelines Alliance
Contents |
[edit] Abstract
The concept of Software Pipelines was developed as a methodology to enable service-oriented business applications to implement concurrency while maintaining order of execution priorities and simplicity of application development. In order to effectively apply the Software Pipelines methodology, it is helpful to have a solid understanding of the principles that underlie the methodology.
This paper focuses on Software Pipelines law and rules, and discusses the challenges and opportunities presented by Software Pipelines when applied to business applications. Pipelines law and rules govern and guide the development of successful applications that utilize pipelines architecture and technology.
[edit] Introduction
A related wiki entry, Software Pipelines Overview covered an overview of Software Pipelines, showing how this technology can be used to provide powerful scalability for business applications. Pipelines can be used in a wide variety of applications, to provide tremendous improvements in throughput and performance, while maintaining key requirements such as order-of-processing and priority.
The previous paper provided some background about the need for concurrent processing and why traditional approaches to concurrent processing were not a good fit for business application transactions. Business applications are growing, driving higher volume and performance requirements. This is especially important given the shift to new “multi-core” processor architectures, underscoring the need for business application software to support concurrent processing.
The vast majority of research on parallel or concurrent processing has been applied to scientific or engineering problems, or to the “embarrassingly parallel” computation of large analytical calculations that are easily distributed.
The needs of business applications, however, often contradict the concepts of concurrent processing due to the following characteristics:
- Order-of-processing — Business application tasks commonly must be executed in a specific sequence. “First in/ first out” (FIFO) requirements are often used in business applications to maintain order of processing.
- Unpredictable workloads — Business applications must process “mixed workload” tasks where size and processing requirements are difficult to predict. Application workloads vary greatly over time, even by the hour. This makes it difficult to divide business applications so that they behave in a predictable manner.
- High volume of discrete transactions — Unlike scientific processes that are often self-contained computations that generate an end result after a lengthy calculation, business applications are based on a high volume of discrete transactions that often utilize external resources such as a centrally shared database. As a business application grows and its transaction volume increases, bottlenecks can occur due to contention for shared resources.
Pipelines offer a new way for developers to control and distribute the workload of business applications, relying on a high-performance execution control facility combined with a Pipeline Distributor to intelligently allocate transactions across available hardware.
This paper focuses on Software Pipelines law and rules, and discusses the challenges and opportunities presented by Software Pipelines when applied to business applications. Pipelines law and rules govern and guide the development of successful applications that utilize pipelines architecture and technology. Although pipelines solutions will, of course, vary from application to application, three basic questions will need to be dealt with by any pipelines application:
- How do pipelines work best to accelerate application performance?
- What are the keys to predicting and optimizing pipelines performance?
- What are the best ways to minimize the constraints and limitations of pipelines?
Understanding pipelines laws and rules is key to answering these three questions, and to the development of successful Software Pipelines applications.
[edit] Pipelines Law
There is a famous adage that describes a fundamental limitation in computing performance, and it is as old as information technology itself: “All processors wait at the same speed.” While it most often is used to describe underlying hardware limitations of I/O and other components that surround and interconnect with the CPU, it is highly applicable to software design and development. It turns out that this adage is more important than ever in today’s high-performance business applications. Many IT organizations today will admit that they average only 25 percent utilization of their existing CPU capacity. Published studies show that even this is optimistic with the industry average at 15 percent utilization. This leaves the other 75 to 85 percent theoretically available for up to 4+ times improvement in performance and throughput from existing hardware resources alone. Add the concept of multi-core CPUs to the equation (with 2, 8 or even 32 CPU “cores” per physical chip) and there is even more unused capacity. If today’s business applications are only utilizing a fraction of current available computing power, how can IT organizations capitalize on these vastly expanded capabilities?
A key goal of business application developers everywhere, is to obtain maximum throughput in a given system. To meet that goal, developers must maximize the utilization of available processors in a way that is optimized for a given set of application problems or requirements. When processors are not busy (i.e., waiting for other resources to perform), they are not productive, and thus a performance bottleneck results. The goal of Software Pipelines, and of the application of Pipelines Law, is to minimize wait time between interconnected components in an application system, thus helping to maximize processor utilization.
The Software Pipelines architecture is designed to accomplish tremendous scalability with predictability of performance and order control for business transaction applications. By understanding Pipelines Law and its impact on a system, it can be very straightforward to analyze, optimize and predict the performance of a given software system using this approach.
Many readers will be familiar with Amdahl’s Law, a mathematical formula used to predict the theoretical maximum speedup using multiple processors. Although Amdahl’s law has been used since the mid-1960s to predict system performance (and limitations) in parallel systems, most applications of Amdahl’s principles have been applied to either the “embarrassingly parallel” problem, or low-level chip and operating system architectures. Pipelines Law, however, is specifically designed to simplify the analysis of business transaction applications in order to maximize throughput. Pipelines Law is a simplified derivation of concepts similar to those of Amdahl’s law, combined with principles derived from the fundamentals of fluid dynamics.
Fluid dynamics is the engineering discipline concerned with fluids moving through a physical “pipe” system. Astonishingly, the parallels between fluid dynamics and software systems are very direct and easy to understand. The analogy to fluid dynamics helps make the principles of Software Pipelines easier to understand. The analogy is used in this paper to affirm some basic formulas that can allow developers to predict and optimize a Software Pipelines system.
For example, in fluid dynamics the first and most important rule is:
Inflow equals outflow.
While this is obvious enough, it is nonetheless the first and most important consideration when designing a pipeline system. Given a pipe of a certain size, the pipe can never output more fluid on the downstream end than was fed into it. Therefore, the inflow volume and rate determine the maximum possible outflow from a given pipe system (Figure 1).
Figure 1. Inflow equals outflow
This fluid dynamics rule translates into Software Pipelines Law as:
Input equals output.
When considering a computing system as a flow of transactions, this rule shows that one can never process (output) more transactions than the available input supply. In practice, input and output transaction rates will always be the same. However, it is also known that the potential output rate will never be greater than the available input rate. The input rate of transactions is the “raw material” available to work with. Thus, the first consideration in any concurrent processing system is to evaluate the available input transaction rate, as this represents the maximum number of transactions that can be processed. Of course, a single input transaction can generate multiple output transactions. However, the law still applies. One cannot process more than the available number of transactions. There must be an adequate supply of transactions to work with. This can be immediately applied when determining the potential usefulness of a concurrent processing system through this one question, “Are there enough transactions available to justify concurrent processing in the first place?”
[edit] Aspects of Pipelines Law
One of the most important concepts to understand in a pipelines system is the existence of limiters to flow. If an engineer can observe and predict what inhibits the flow of a given system, then the system can be optimized to ensure that the design is adequate to meet expected performance goals.
There are some important corollaries to Pipelines Law that can help illustrate the behavior of Software Pipelines.
The first and most obvious corollary is:
Friction and Restriction Limit the Flow
Any restriction in a fluid pipe limits the overall flow through the system, restricting and reducing both inflow and outflow. This can be viewed as a “crimp” in the pipe, a bend in the pipe, or a pipe of reduced diameter as shown in Figure 2.
Figure 2. Restrictions to flow can limit overall throughput
This translates to Software Pipelines very directly. If a single component in the middle of a system cannot process fast enough to keep up with the input flow, the processing rate of the entire system will be reduced accordingly. In other words, the processing rate is always limited by the slowest component in the system.
When a single component cannot keep up, transactions will “back up” on the input side. If that goes on long enough, it can result in a system crash due to running out of memory or available disk space. Therefore, it is very important to evaluate the performance of each component in a Software Pipelines application and ensure that components are balanced to provide optimum flow through the system. Without this evaluation, the optimization of one component in a pipelines system can often create a bottleneck in another, “downstream” part of the system.
Another corollary which relates to limited flow is as follows:
Restriction of the Outflow Reduces the Overall Flow
Limiting output flow is another common way that the flow through a fluid pipeline system can be restricted. If anything restricts the output side of a fluid pipeline, the entire flow is adversely affected. Consider any blockage on the output, such as some unwanted debris, whether partial or complete. Another example is a reservoir at the end of the pipe system that overflows, thus backing up the entire system and effectively stopping the flow in the entire system (Figure 3). In other words, if anything restricts the output side of a fluid pipeline, the entire flow is adversely affected.
Figure 3. Blockage on the output side will restrict the flow
The same thing is also true of a software system. Every developer has seen the effect of some centralized component that is external to the application and yet cannot handle the load. For example, a mainframe system that is the final repository for processed transactions can be too slow to accommodate the input from many sources. Another common example is a centralized relational database that cannot keep up with transaction rates. And an extreme case is where a database runs out of disk space or becomes locked, and simply stops receiving transactions at all.
In each of these cases, the processing rate of the entire system is either hampered or stopped completely. When this occurs, transactions “back-up”, users wait, and in some cases the entire application crashes. This is a sure recipe for disaster in any mission critical business application.
[edit] Pipelines Rules
Rule #1
Input equals Output.
Rule #2
The capacity (transaction rate) of any “downstream” component or process must be greater than or equal to the input rate supplied by any “upstream” process or component. When this is not the case, the component or process must be optimized, or a Pipeline Distributor must be used to support concurrent processing to meet the load.
Rule #3
The processing rate of the Pipeline Distributor must be far greater than the downstream processing rate.
The following sections look at each of these rules in more detail. Based on these rules, some very simple mathematical formulae are provided for effectively analyzing a Software Pipelines system.
[edit] Rule #1
Input equals Output
This is of course Pipelines Law itself, and all of the points made earlier that apply to Pipelines Law, apply to Rule #1 as well. The success of a pipelines system requires that: An adequate supply of input transactions is available, justifying the use of pipelines in the first place.
The slowest component or process in the system is identified and optimized, because it will govern the overall throughput and performance of the system.
Any external system or application that cannot handle the load is identified and optimized, because it will present a bottleneck similar to that of a slow component or process.
If there is an adequate supply of input transactions, the next order of business is to identify each component in the system, and analyze each for its performance characteristics in order to predict and remove bottlenecks.
The formula to express Pipelines Law is:
InputRate = OutputRate
This formula shows that the InputRate and the OutputRate will always be the same, regardless of how many transactions there are to process. In other words, a pipelines system (or any software system for that matter) cannot accept more transactions (InputRate) than it can process (OutputRate). Another, and more useful way of stating this is:
AvailableInputRate = PotentialOutputRate
For example, if the AvailableInputRate is equal to 10 transactions-per-second (TPS), the system can never process or output more than 10 TPS, regardless of how effective the processing actually is.
Viewing this from the processing side and assuming an AvailableInputRate of 1000 TPS, it is easy to determine the PotentialOutputRate. If the process can handle a load of 1000 TPS or better, then the PotentialOutputRate will also be 1000 TPS. However, what if the process can only handle 500 TPS? It’s easy to see that there will be a “back-up” of queued or lost transactions to the tune of 500 TPS in the system, definitely far from ideal.
Rule #2 addresses this potential of a downstream bottleneck created by a relatively slow downstream process.
[edit] Rule #2
The capacity (transaction rate) of any “downstream” component or process must be greater than or equal to the input rate supplied by any “upstream” process or component. When this is not the case, the component or process must be optimized, or a Pipeline Distributor must be used to support concurrent processing to meet the load.
This rule is the key to ensuring the success of any pipelines system in meeting maximum performance and delivering on established service level agreements and business requirements. Every single point in a pipelines system can be analyzed with this rule. Potential and actual bottlenecks can be identified as can the need for a Pipeline Distributor which enables concurrent processing for increased capacity.
The formula to represent this rule is:
InputRate must be <= ProcessRate
In other words, the “downstream” ProcessRate for any component or process must be able to accommodate the InputRate supplied by any “upstream” component or process. When this is not the case, one must increase the ProcessRate through the implementation of multiple pipelines or other optimization.
Consider the simple flow of components shown in Figure 4.
Figure 4. Step A (1000 TPS) -> Step B (500 TPS Capacity) -> Step C (2000 TPS )
In this case, Step B is the bottleneck that limits transaction flow. The result will be a backup of transactions queuing up in front of Step B and limited flow of transactions into Step C such that Step C is then underutilized. Furthermore, if there is no buffer for transactions to queue back up in this example up in front of Step B, transactions could be lost, an unacceptable consequence for mission-critical applications such as banking.
To balance the transaction flow requires the application of pipelines distribution or another form of optimization for Step B that increases its throughput. Otherwise the entire flow will be limited to the 500 TPS capacity of Step B. If it is safe to distribute Step B, a developer orn operator can simply create 2 pipelines for Step B to double its capacity. This will resolve the problem and balance the flow as shown in Figure 5.
Figure 5. Software Pipelines can be used to add capacity and balance process flow
Adding a second Software Pipeline to Step B may not be enough to double its capacity if there are not enough hardware resources to handle the increased volume as well. In order to double the capacity, it may be necessary to add more hardware resources as well. With two pipelines and sufficient hardware resources, Step B can take advantage of concurrent processing to double its capacity (Figure 6).
However, there is one more missing link for effective concurrent processing. Concurrent processing across multiple pipelines implies the use of a Pipelines Distributor to distribute and balance the load. Figure 6 also shows the addition of a Pipelines distributor that distributes transactions across the software pipelines in Step B.
Figure 6. A Pipeline Distributor enables Software Pipelines to take advantage of concurrent processing across distributed hardware resources.
A Pipeline Distributor performs content-based routing on input messages, distributing the load to individual pipelines for effective concurrent processing and execution of transactions. This facility provides the developer or operations administrator with a large measure of control over the flow of transactions and supports key business application requirements such as first-in-first-out (FIFO) ordering or priority.
The Pipelines Distributor is what allows pipelines to take advantage of concurrent processing across multiple hardware systems or multiple processor cores within a single hardware system. Each software pipeline can be assigned to specific hardware resources so that it can execute independently and concurrently with other pipelines.
The Pipelines Distributor then sorts transactions into the pipelines where they can be processed independently of transactions in other pipelines. Developers or administrators can then add hardware resources to a given pipeline if the hardware becomes the bottleneck in processing for a given pipeline. Because the pipelines are completely independent of each other, adding more hardware resources along with more pipelines enables linear or near-linear scalability.
Taken to an extreme, it’s easy to see that adding a large number of Software Pipelines with enough hardware resources under the control of a single Pipelines Distributor will eventually create a bottleneck at the Pipelines Distributor itself. To effectively implement a Pipeline Distributor, Rule #3 must be applied.
[edit] Rule #3
The processing rate of the Pipeline Distributor must be far greater than the downstream processing rate.
There are several things to consider when implementing a Pipeline Distributor. The first and foremost is that the work of distribution always adds overhead to the system.
For pipelines distribution to be effective, the Pipeline Distributor must perform its work far faster than the actual process that it is supplying.
This can be expressed with the formula:
DistributorRate >> ProcessRate
In other words, the DistributorRate must be far greater than the “downstream” ProcessRate. Otherwise the Pipeline Distributor will create a new bottleneck in itself, thereby defeating the entire purpose of the Software Pipelines architecture.
An example is shown in Figure 76. In this example, the Pipeline Distributor must have at least a 2000 TPS throughput capacity in order to avoid becoming a bottleneck itself.
Figure 7. Pipeline Distributor throughput must at least match downstream throughput
If the distributor in the diagram can process 2000 TPS, and feed four pipelines for Step B that each can handle 500 TPS, then everything will work fine.
However, what would happen if the distributor has very complex logic for determining the correct pipeline and routing of transactions and can only route transactions at the rate of 1000 TPS? As shown in Figure 8, this would simply create a bottleneck at the Pipeline Distributor. In that case, there would be no benefit from implementing pipelines and the Pipelines Distributor would be unnecessarily wasting valuable computing resources .
Figure 8. A low throughput Pipeline Distributor will create a bottleneck.
Another problem can come about if a developer were to distribute too many pipelines. Assume again that the distributor can process 2000 TPS. However, instead of feeding four pipelines as in the optimal flow of Figure 7, this time it routes the transactions to eight pipelines as shown in Figure 9. In this case, the distributor would only be able to feed 250 TPS to each instance of Step B running on a given pipeline, yet Step B pipelines can each handle 500 TPS. Again, resources are wasted and the system is not optimized . Figure 9 shows eight pipelines in Step B for a total transaction throughput capacity of 4000 TPS. Yet the Pipeline Distributor feeding these pipelines has a capacity of only 2000 TPS, It can therefore only send enough transactions for half of the capacity of Step B.
Figure 9. Adding more pipelines than the Pipeline Distributor can handle will also waste resources.
The following formula can be used to determine the right number of pipelines for a given part of a system:
NumberOfPipelines = DistributorTPS / ProcessTPS
The formula simply states that the theoretical maximum number of downstream execution pipelines (or subdivisions of a given process) is the ratio of the distribution rate (DistributorTPS) and the downstream processing rate (ProcessTPS). The formula shows that the theoretical ideal number of effective pipelines (NumberOfPipelines) is determined by this ratio. Of course, real-life systems demand some margin for error and it is thus typical to “over-design” to some degree. It will be best to plan to increase the NumberOfPipelines by 10% - 20% to allow for this margin of error and help ensure adequate flow in the system. Used in this way, the formula can provide an excellent guide for properly sizing a system and determining the optimum application of a Pipelines Distributor in a pipelines system.
In the simplest possible example, assume that a given distributor can evaluate and route transactions at the rate of 1000 TPS, and the downstream process runs at the rate of 100 TPS. Therefore, the maximum number of pipelines would be 10, expressed as:
10 = 1000 / 100
In this example, the pipelined system should be able to perform up to ten times as fast as a comparable system that does not utilize pipelines technology. More specifically, the system should be able to process a throughput of ten times the number of transactions in a given amount of time.
The formula is designed to show that downstream execution pipelines must be “supplied” at a rate comparable to the rate they can process or “consume” transactions. Otherwise the execution pipeline processors will simply wait and be an unproductive resource in the system.
Therefore, in the worst possible scenario, if the pipeline distributor rate requires more time to evaluate and route a transaction than a single execution of the actual transaction process, there is no benefit to concurrent processing . In fact, quite to the contrary, the distributor would simply introduce unnecessary overhead into the process. For a pipelined application to be successful, the design goal must be to minimize the latency of the distributor (i.e., increase the DistributorTPS), while delegating as much work as possible to downstream execution pipelines.
It is also important to remember that pipelines can be distributed at multiple levels in a given system as required. This rule must be applied at each level of the pipeline distribution hierarchy.
[edit] Summary
Pipelines Law and the three derived Pipelines Rules provide a very simple, yet highly effective method for evaluating a Software Pipelines system. To be sure, there are many other limiting factors that need to be taken into account, such as network latency, contention, etc., and these will be the subject of further papers and examples on this subject. However, these rules can be easily applied to the analysis and prediction of performance for any business application that requires the scalability offered by concurrent processing. In actual practice Rogue Wave Software has seen that these rules provide an excellent measure of prediction. When applied skillfully adding processors or processor “cores.” can result in linear or near-linear scalability. These principles offer a means for rapid identification of performance limiters and barriers and can greatly improve throughput and utilization of hardware resources.









