Software Pipelines Examples

From Software Pipelines Alliance

Jump to: navigation, search

Contents

[edit] Abstract

The Software Pipelines architecture provides a practical, usable way to enable concurrent computing in business applications by maximizing all CPUs and cores in a multi-core system. And because this technology is specifically for business applications, you can maintain critical order requirements at any level of complexity.

The surprising, yet direct parallels between fluid dynamics and software provide the basic laws that make this architecture work. You can use these laws to predict and optimize any Software Pipelines application.

In this paper, we’ll show you how to use Software Pipelines and Software Pipeline Rules to solve some typical business problems. You’ll learn how to get the best performance from your pipeline and Pipeline Distributor components, and how to easily achieve scalability for future expansion.

[edit] Introduction

In the previous wiki entries of this series, we defined Software Pipelines, and we described Pipelines Law – the fundamental rules and formulas for predicting and optimizing a pipelined system. We’re going to use these concepts to solve some real-world problems, but before we work through the examples, let’s briefly recap the principles. The basic Pipelines Law is:

Inflow equals outflow.

According to this law, we know that for a given process, any component that can’t keep up with the input rate will inhibit system performance. In other words, your application will never perform faster than the slowest component in the flow.

The following Software Pipelines Rules and mathematical expressions are based on the Pipelines Law.

Rule #1. Input equals output.

InputRate = OutputRate

AvailableInputRate = PotentialOutputRate

Rule #2. The capacity (transaction rate) of any downstream component or process must be greater than or equal to the input rate of any upstream process or component. When this is not the case, you must optimize the downstream component or process, or you must use a Pipeline Distributor to support concurrent processing and handle the load.

InputRate must be <= ProcessRate

Rule #3. The processing rate of the Pipeline Distributor must be far greater than the downstream processing rate.

DistributorRate >> ProcessRate

NumberOfPipelines = DistributorRate / ProcessRate

In the following sections, we’ll solve two business problems by using Software Pipelines architecture, and you’ll learn how to apply Software Pipelines Rules and formulas to each step of the design process

[edit] Bank ATM System (Single-Tier Distribution)

We’ll use a bank ATM application for our example problems, as we did in the Pipelines Overview paper. Nearly everyone uses ATMs, so we’re all familiar with this type of software.

The main processing flow for each transaction in our example ATM application is 1) Authenticate Customer, 2) Validate Transaction, and 3) Perform Transaction. You can see this flow in Figure 1.

Figure 1. Main process for each ATM transaction.

Figure 1. Main process for each ATM transaction.

An ATM application typically has the following critical requirements:

  • It must handle a high (and often growing) volume of customers at many locations.
  • It must prevent each customer from overdrawing his or her account.
  • An ATM system is, by nature, a highly distributed system, so it’s vital to maintain FIFO order when processing transactions. FIFO prevents any two users from accessing the same account and overdrawing available funds.

If we use Software Pipelines, we can design an application that will meet all these requirements.

[edit] Pipelines

To apply pipelines to the ATM application, we need several specific metrics. The metrics help us determine how many pipelines to use. We can use actual measurements of the existing system, which is ideal, or we can use best estimates based on how we believe the system will perform.

In our first design step, we’ll compare the InputRate metric (the rate at which we expect to receive incoming transactions) to the ProcessRate metric (the rate of the application’s main process). InputRate will vary, but we must design for peak loads. Assume the bank has 1000 ATM machines around the country, and, on average, it takes a customer 30 seconds to enter a transaction (insert the bank card, enter the PIN number, key in the transaction, etc.). We want to know the total number of transactions performed each second (TPS) across the whole system. To calculate that, let’s first get the TPS for one transaction:

One transaction / 30 seconds = .033 transactions per second (TPS)

The TPS for one transaction is .033, and we have 1000 ATMs. Let’s calculate the TPS for the whole system (we’ll round it off):

1000 ATMs in the system * .033 TPS for one ATM = 33 average TPS across the system

Users don’t submit their transactions at an even rate, so we’ll double the average TPS (from 33 to 66) to cover the potential peak load. We now have our first metric, InputRate, which is 66 TPS.

For our second metric, ProcessRate, we’ll use the TPS for the whole process flow – the Authenticate, Validate, and Perform Transaction steps. In other words, we estimate the time to perform the entire monolithic process (this is normally the easiest way to start implementing pipelines). Let’s assume we tested the process code, and discovered it takes 50 milliseconds to perform. We use the same formula as before to calculate the TPS for one transaction, and determine that ProcessRate is 20 TPS. We can now compare InputRate to ProcessRate:

66 TPS InputRate > 20 TPS ProcessRate

The metrics prove we have a bottleneck to handle, because the application violates Rule #2 (InputRate must be <= ProcessRate). To find out how many pipelines we need to fix the bottleneck and get acceptable performance, we calculate how much InputRate exceeds ProcessRate:

66 TPS InputRate / 20 TPS ProcessRate = 3.3

The ratio tells us how many pipelines to use: 3.3. We want to allow for some cushion (about 20%), so let’s round it up to four pipelines. In our next step, we’ll make sure the Pipeline Distributor can support this number of pipelines.

[edit] Pipeline Distributor

To implement the Pipeline Distributor, we must first decide how we want to split the transaction load among the four pipelines. Each incoming message contains values we can use to sort transactions into groups for routing. The distributor uses the selected value(s), which we’ll call input keys, to choose the pipeline for each transaction.

The bank has 1000 branches, each with a branch_id numbered from 0001 to 1000, and each customer account (account_no) belongs to a particular branch. Since we need only four pipelines, we don’t need very complex logic to sort transactions. Using the branch_id as the input key, we’ll divide transactions into four groups (each group includes a range of branches) and assign an explicitly-named pipeline to each group.

Next, we’ll do some additional planning to ensure acceptable performance. The distributor must enforce FIFO order, which adds some overhead, so we don’t want to use more pipelines than the distributor can handle. To calculate how many pipelines it supports, we use this formula:

NumberOfPipelines = DistributorRate / ProcessRate

We already know the ProcessRate. To complete the formula, we need the distributor’s TPS, which we plug into DistributorRate. Before we calculate the distributor’s TPS, let’s learn more about its operation. It receives transactions as input messages in the following XML format:

<atmtrans> <branch_id>222</branch_id> <trans_date>01/01/2007</trans_date> <trans_time>15:41:01/trans_time> <account_no>11111111</account_no <pin>0000</pin> <atm_id>456</atm_id> <trans_type>withdrawal</trans_type> <trans_amount>100.00</trans_amount> <currency>USD</currency> </atmtrans>

The distributor’s code is very simple; as each message arrives, the distributor parses the XML into a DOM (Document Object Model), reads the message, and then uses a case statement to evaluate the branch_id element. Then, the distributor routes the transaction to the pipeline assigned to the range in which the branch_id falls.

The following table shows the transaction groups and pipeline names. The evaluation expressions are written in XPath notation:

Pipeline Evaluation Expression Pipeline Name
/atmtrans/[branch_id >= 0001 and branch_id <= 0250] P_0001_to_0250
/atmtrans/[branch_id >= 0251 and branch_id <= 0500] P_0251_to_0500
/atmtrans/[branch_id >= 0501 and branch_id <= 0750] P_0501_to_0750
/atmtrans/[branch_id >= 0751 and branch_id <= 1000] P_0751_to_1000

Let’s assume we tested or estimated the distribution algorithm, and we determined that its latency is two milliseconds. When we calculate the distributor’s TPS for one transaction, we get 500 TPS, which we use for DistributorRate. Now, we can figure out the maximum number of pipelines:

25 NumberOfPipelines = 500 TPS DistributorRate / 20 TPS ProcessRate

The distributor’s maximum NumberOfPipelines is 25, so it can easily support four. We’re now done with the high-level design, which appears in Figure 2.

Figure 2. High-level design for the first ATM application.

Figure 2. High-level design for the first ATM application.

The bank will be able to easily manage 1000 ATMs if we use this design for the actual application. To deploy it, we’ll use a multi-core server. Since we have only four pipelines, a single CPU quad-core system should be adequate for the load.

[edit] Bank ATM System (Multi-Tier Distribution)

Now that we’ve seen a simple pipelines example and design, let’s look at one that’s a bit more challenging.

Assume that our pipelined ATM system runs well and performs as expected. We attend a special company meeting, and the executive team makes a surprise announcement: our bank is merging with another retail bank that has 10,000 ATMs. Because our technology is SOA-based, and theirs is mainframe-based, they’ve selected our system to run everything for the newly-combined bank. Great news – until we’re told we have three months to make the switch! As if that wasn’t enough, the team announces plans to acquire more banks over the next two years, so we might have to run as many as 30,000 ATMs!

We do a quick analysis. If we’re handling 10,000 ATMs, our peak input rate jumps 10X from 66 TPS to 660 TPS. However, a single instance of our process component still handles only 20 TPS. Once again, the process violates Rule #2 (InputRate must be <= ProcessRate):

660 TPS InputRate > 20 TPS ProcessRate

This time, we have a much bigger bottleneck to resolve. We’ll start with the same formula; to determine how many pipelines we need for acceptable performance, we calculate how much InputRate exceeds ProcessRate:

660 TPS InputRate / 20 TPS ProcessRate = 33

That tells us we need 33 pipelines. To add the 20% cushion, we’ll make that 40 pipelines.

At first glance, it’s easy to assume we can just add more hardware and pipelines, but wait – from our earlier calculations, we know the maximum NumberOfPipelines for our current distributor is only 25. That won’t handle 10,000 ATMs, and there’s no capacity for future expansion. In addition, our server hardware platform supports dual quad-core CPUs. With eight cores per system, it’s not realistic to host all 40 pipelines on a single server.

The solution is to distribute across and within servers. To do this, we’ll implement a multi-tier distribution mechanism, as seen in Figure 3.

Figure 3. Multi-tier distribution.

Figure 3. Multi-tier distribution.

Using Pipelines Law, we find two areas in this example we should optimize:

  • We must increase the primary distributor’s throughput.
  • We must distribute transactions across multiple levels of pipelines (multi-tier distribution).

Two tiers will handle our new requirements. The best approach to the problem is bottom-up; in other words, we’ll analyze and solve the lowest (secondary) tier of distribution and processing, and then design the top (primary) tier. The lowest tier provides the actual processing, and must handle the ultimate input load, so this is the best place to start our analysis.

[edit] Secondary Pipeline Tier

The secondary tier includes the distributors and pipelines that actually perform the ATM process. We already know we need 40 pipelines to handle the load. At this point, we must determine the best way to distribute transactions, and we must select the value we’ll use as the input key.

We could simply create 40 named pipelines, each supporting a range of branch_id values, as we did before. This is acceptable, but the input load might not be evenly distributed by branch. Instead, let’s use a different technique for the secondary tier – dynamic named pipelines.

Dynamic named pipelines are virtual pipelines; their names are based on the changing values of the current input key. The server creates each pipeline on the fly as it’s needed, allocates it to an individual thread, and then destroys it when it’s no longer required. As long as the application is processing transactions for a specific key value, the associated pipeline is “alive”. When the key value is no longer present, the system frees the pipeline thread, and it can use that thread to create a new pipeline. This approach provides superior utilization of resources; the server uses a fixed thread pool, and constantly shifts pipelines to accommodate the load.

Figure 4. Dynamic named pipelines.

Figure 4. Dynamic named pipelines.

Now, let’s determine the best input key for the secondary tier of pipelines. Remember, we must support FIFO processing, so we still want to use order-based pipelines. We have two choices: we can distribute by branch_id or by account_no. Branch ID enforces order for each branch, and thus enforces order for the accounts within that branch. Account Number enforces order for each individual account.

There’s a trade-off to consider here. If we use branch_id, it takes less overhead to allocate dynamic pipelines; however, if there’s a burst of transactions for a single branch, we might have a serious wait condition, and users for that branch will see their transactions backing up. Using account_no increases overhead for pipeline allocation, but it guarantees almost 100% that pipelines will not contend for resources. Based on these factors, account_no is our best input key; it provides more scalability and better performance.

Next, let’s examine distributor performance at the secondary tier. We know we need 40 pipelines to handle the load, and we have eight cores available per server. We estimate we’ll need five servers, and we’ll assume we can safely allocate eight threads per server (we’re allocating one thread per physical core, although in some hardware architectures you might be able to use more or less than one thread per core).

Figure 5. Five servers, each with dual quad-core CPUs, and a pool of eight threads per server.

Figure 5. Five servers, each with dual quad-core CPUs, and a pool of eight threads per server.

The distributor reads each incoming message as it arrives, and then performs the following steps:

  • Parses the input message into a DOM tree.
  • Evaluates the account_no value.
  • Based on the account_no value, routes the transaction to a dynamic named pipeline. If the pipeline doesn’t yet exist, the distributor allocates it on an available thread. If the pipeline does exist, the transaction waits until the prior transaction for that specific account_no is complete. Note that this design does not constrain transactions for other account_no values.

Figure 6. Secondary-tier distributor with dynamic named pipelines. Pipelines change according to the value of the current input key.

Figure 6. Secondary-tier distributor with dynamic named pipelines. Pipelines change according to the value of the current input key.

Using this mechanism, if the distributor receives two or more transactions for the same account_no at or close to the same time, order of processing is guaranteed.

Remember, our InputRate is now 660 TPS, so let’s find out what the input load is for each server:

660 TPS InputRate / 5 servers = 132 TPS input load per server

The ProcessRate for any single instance of the ATM process is still 20 TPS, so eight threads provide an adequate cushion for pipeline allocation. To check if this is true, we calculate the downstream ProcessRate; that’s the combined rate for each secondary-tier server, including distribution overhead:

20 TPS ProcessRate * 8 threads/pipelines per server = 160 TPS downstream ProcessRate

And then compare it to the input load:

132 TPS input load per server < 160 TPS downstream ProcessRate

As you can see, this complies with Rule #2, InputRate must be <= ProcessRate.

However, when we test the distributor, we find out it’s slightly slower than the original one; the overhead for allocating pipelines requires 3 milliseconds per transaction. We calculate the distributor’s TPS for one transaction and get 333 TPS, which we use for DistributorRate. Now, we can determine the maximum number of pipelines:

16 NumberOfPipelines = 333 TPS DistributorRate / 20 TPS ProcessRate

The maximum NumberOfPipelines is 16, which isn’t enough to manage the entirety of the expected load. However, it is more than adequate to support eight concurrent pipelines per server, and we can support the entire load by properly designing the primary pipeline tier. We’ll do that in the next section.

[edit] Primary Pipeline Tier

We’ll now look at the primary pipeline tier, which uses the distributor from the first ATM application we designed. This level must handle the load of all input transactions, and then distribute them to the secondary-tier servers for actual processing. Remember from our first example, the DistributorRate for this distributor is 500 TPS. However, this time the InputRate is 660 TPS, so we have to increase the distributor’s performance. And as our bank expands, we know that the required InputRate will grow well beyond this level.

The answer lies in a simple design change. The distributor still routes transactions by branch_id as it did before, but we’ll optimize its design. Instead of using a DOM model to parse the inbound message, the distributor uses stream-based parsing (such as SAX or other stream-based technology). Instead of waiting until it parses an entire document, the distributor sends the transaction as soon as it finds and evaluates the branch_id. Because branch_id is near the top of the document, we get a tremendous improvement in distributor throughput.

The new design reduces the distributor’s latency to 0.5 milliseconds, including any network latency incurred when forwarding the transaction. Now, when we calculate the distributor’s TPS for one transaction, we get 2000 TPS for the DistributorRate. We’ve improved our original design by four times, which provides more than enough capacity for 10,000 ATMs. In addition, we’ve made it possible to add extra capacity when the bank expands in the future.

Figure 7. Primary-tier distributor, with pipelines routing transactions to five other servers.

Figure 7. Primary-tier distributor, with pipelines routing transactions to five other servers.

Let’s use the Software Pipelines rules to validate these conclusions. Using the downstream ProcessRate from our earlier calculations, we’ll determine the maximum NumberOfPipelines for the primary tier (we’ve rounded down the answer):

12 NumberOfPipelines = 2000 TPS new DistributorRate / 160 TPS downstream ProcessRate

The new primary-tier distributor can support 12 secondary-tier servers. To handle our expected load, we require only five servers at the secondary tier. Therefore, our conclusions are correct; the new design will comfortably support our expected load and provide the capacity to increase the load when the bank acquires more ATMs.

As for the distribution routing, we’ll use the same approach as in the first ATM example, but we’ll use five pipelines this time, one to each server. Again, we’ll use a range of branch IDs to group transactions, this time dividing transactions into five groups. The distributor reads each message as it arrives, evaluates the branch_id, and then routes the transaction to the pipeline assigned to the range in which the branch_id falls.

The following table shows the transaction groups and pipeline names. Again, the evaluation expressions are written in XPath notation:

Pipeline Evaluation Expression Pipeline Name
/atmtrans/[branch_id >= 0001 and branch_id <= 2000] P_0001_to_2000
/atmtrans/[branch_id >= 2001 and branch_id <= 4000] P_2001_to_4000
/atmtrans/[branch_id >= 4001 and branch_id <= 6000] P_4001_to_6000
/atmtrans/[branch_id >= 6001 and branch_id <= 8000] P_6001_to_8000
/atmtrans/[branch_id >= 8001 and branch_id <= 10000] P_8001_to_10000

This completes our second ATM application. You can see the high-level design in Figure 8.

Figure 8. High-level design for the second ATM application.

Figure 8. High-level design for the second ATM application.

[edit] Summary

We’ve used Software Pipelines to solve two typical business problems, and we showed you how to apply Pipelines Theory to application design. Many other factors can affect your system, and we’ll cover these topics in later articles. You’ll learn how external systems, such as mainframes or centralized databases, can cause bottlenecks, how to further minimize distribution overhead, and how to include scalability in all your designs.

Regarding Software Pipelines and Pipeline Distributor design we’ve only scratched the surface; many other possible designs exist and our follow-up articles will explore these in depth. You’ll learn additional patterns that illustrate a variety of typical pipelines problems and potential solutions. Our goal is to give you a predictable methodology for applying Software Pipelines Rules. If you understand this technology, you can design for maximum optimization of available resources, and more importantly, you can predict how your pipelined application will perform.

Personal tools