Profiling Integration Performance on a Tight Budget

Application performance is heavily dependent on the performance of the communications between the primary application and all of the systems it integrates with.  Even a tiny change in a connected system can suddenly cause a huge performance hit.  For example, a small web service that retrieves data about a user at login gets altered to return an extra field from a joined table, and suddenly login performance tanks and users start seeing timeouts.

An application is built under one set of assumptions, but the changes that will inevitably occur in any integrated system need to be monitored on an ongoing basis.  This is where automated performance regression testing comes in. Whether you are running an agile project or not, ensuring integrated systems keep performing over time requires a solid regression plan.  How can this be accomplished cheaply?

Automated Load Testing

Building a full automated load test suite that covers the common user flows and can be run every iteration gives you the best feedback on what actual performance will be like for your users.  It also uncovers problems that only appear at higher load levels, such as memory leaks or network infrastructure issues.

However, this usually winds up eating a lot of project budget: somebody has to make ongoing tweaks to the load test scripts as minor changes happen along the way, and the analysis of the data is not a simple “yes/no” answer.  This is definitely a solid way to ensure performance, but you need the extra QA budget to sustain it.

Additionally, many organizations use load testing tools that the developers are not familiar with, which means you usually need to bring an expert onto the team for a period of time who knows the tool but lacks project context.  That ramp-up time adds yet another cost to the project.  How can this work be kept within the team to reduce the cost?

Profiling

Visual Studio Premium and Ultimate come with a profiling capability that you can use to measure the performance of your unit tests.  There are plenty of other tools, such as ANTS Profiler or JetBrains dotTrace, that can do something similar.  Assuming you have some integration unit tests, this is definitely your next best cost option.  If you are using MSTest for executing unit tests with Visual Studio, you already have a Premium or higher license, so you won’t really need to pay extra for licensing.

However, if you don’t already have the necessary licenses and they are outside your budget, or you don’t want to add another tool, you can still use your existing unit tests with a cheap alternative.

Automated Integration Performance Unit Tests

Do you have integration unit tests that execute against the various integrated systems to ensure connectivity and business logic haven’t been compromised?  This layer is a great way to keep constant watch on your integrated systems, although you don’t want these tests running on every health check build.  For these integration tests, though, the execution time also needs to be measured to ensure performance remains consistent.

If a decision has already been made to build a series of integration tests into your unit test layer, you can extend or clone those existing tests with some minor modifications as a cheap way to get performance regression tests into the system.

Eliminating Environment Variables

The performance integration tests need to execute in an environment that is consistent from one run to the next, which eliminates the network or hardware differences that can affect timing.  Using Microsoft Test and Lab Environments to execute a build against the actual target lab, or running these unit tests only on the build server as part of an integration testing build, allows you to predict the behaviour of the tests more reliably and avoids introducing yet another performance variable.

Determining Limits

The assumption here is that you know when the system is in a good state.  If the system is performing the way it is expected, you can then see how long your routines and integrations take in a “good state” scenario.  These become your limits.  You may want to buffer your limits a little to allow for slight variances in performance, but capturing these run times is crucial to being able to monitor performance.  These limits will NOT be your production target metrics.  If you are executing these tests on a build server or in a development lab, you won’t have production hardware or networks, so you can’t assume they will perform as quickly.  The goal here is to determine the limit that yields acceptable performance within the target environment where the test will run, and monitor against this limit to ensure that performance doesn’t get worse.

How is this limit captured?  An existing unit test built to execute a regression call against the integrated system can be extended to also measure the performance of the call.  In this example, an integration test ensures the application can load a Coffee instance from an integrated third-party Coffee Service.

/// <summary>
/// Validates if the performance is matching the limits specified.
/// </summary>
[TestMethod(), TestCategory("Performance"), TestCategory("Integration")]
public void TestIntegrationPerformanceTime()
{
   //Setup variables
   var code = "espresso";

   //Capture the time
   var startTime = DateTime.Now;
   CoffeeService.Get(code);
   var endTime = DateTime.Now;

   //Determine response time in milliseconds
   var responseTime = (endTime - startTime).TotalMilliseconds;
}

In the above unit test example, there is an integration call to a third-party system that feeds us coffee information via a web service, keyed by a product code.  This simple unit test can be executed to capture the response time.

The unit test can be executed in debug mode on the local developer machine, watching the responseTime variable to determine a possible initial limit, but this value will not match how the test executes on the build server or in the development lab.  It does, however, give a first “guesstimate” at the limit.

Alternatively, logging the responseTime variable can help determine the response time when the test executes in the target environment.  After checking in an update to the above example with logging code added, run the integration tests and observe the logging output.
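A minimal sketch of what that logging version might look like is below.  The Trace call and message format are assumptions, not part of the original test; any logging mechanism available in the target environment would do.

/// <summary>
/// Captures and logs the integration response time to help determine an initial limit.
/// </summary>
[TestMethod(), TestCategory("Performance"), TestCategory("Integration")]
public void TestIntegrationPerformanceTime()
{
   //Setup variables
   var code = "espresso";

   //Capture the time
   var startTime = DateTime.Now;
   CoffeeService.Get(code);
   var endTime = DateTime.Now;

   //Determine response time in milliseconds
   var responseTime = (endTime - startTime).TotalMilliseconds;

   //Write the observed time to the trace output so it can be read from the
   //test results produced on the build server
   System.Diagnostics.Trace.WriteLine(
      string.Format("CoffeeService.Get({0}) took {1} ms", code, responseTime));
}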

For the purposes of this example, assume the response time that came back is 530 milliseconds.  The performance limit now needs to be determined.  If this call doubles in time, will the application still function within acceptable parameters?  Probably not, so an initial limit of around 1000 milliseconds can be used to flag when this call starts exceeding it.

Side note: if the application is known to already be teetering on the edge of acceptable performance, the buffer should not be as large as the one used above.  In those cases, set a tighter limit (perhaps 600 milliseconds in this example).

Monitoring Limits

The next step is simply to add assertions against the performance limits. The unit test is updated so that it fails if the execution time exceeds the determined performance limit.

/// <summary>
/// Validates if the performance is matching the limits specified.
/// </summary>
[TestMethod(), TestCategory("Performance"), TestCategory("Integration")]
public void TestIntegrationPerformanceTime()
{
   //Setup variables
   var code = "espresso";
   var performanceLimit = 1000;

   //Capture the time
   var startTime = DateTime.Now;
   CoffeeService.Get(code);
   var endTime = DateTime.Now;

   //Determine response time in milliseconds
   var responseTime = (endTime - startTime).TotalMilliseconds;

   //Check response time against limit
   Assert.IsTrue(responseTime < performanceLimit, "Coffee integration performance failed.  Expected {0}.  Actual: {1}", performanceLimit, responseTime);
}

Configuring Limits

You may also want to add a configuration layer to allow the limits to be tweaked. Some simple app settings in the app.config of your unit test project can specify the performance limits and let you leave the unit test itself untouched when you decide to adjust your expectations of the various integration performance times. Just load the performanceLimit variable from an application setting instead of hard-coding the limit into the test, and you’re good to go!
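A minimal sketch of that change, assuming a reference to System.Configuration and a hypothetical CoffeePerformanceLimitMs key in the test project’s app.config:

//app.config entry (hypothetical key name):
//  <appSettings>
//    <add key="CoffeePerformanceLimitMs" value="1000" />
//  </appSettings>

//Load the limit from configuration instead of hard-coding it in the test
var performanceLimit = double.Parse(
   System.Configuration.ConfigurationManager.AppSettings["CoffeePerformanceLimitMs"]);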

Scaling Up and Moving Forward

As you build up, you’ll find this cheap option starts slowing down your integration test runs, because you are effectively doubling the number of times each integration is exercised: once to check for logical consistency, and again to check performance.  For longer-running integrations, this makes your integration testing suite grind to a halt.  To alleviate this, you can embed the performance check inside the existing unit test that verifies the logic, as sketched below, instead of cloning it into a separate performance test.
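A minimal sketch of that combined approach, assuming the existing logic test already calls CoffeeService.Get and asserts on the returned Coffee instance (the test name and logic assertion here are illustrative, not from the original):

/// <summary>
/// Validates the coffee integration logic and that the call stays within the performance limit.
/// </summary>
[TestMethod(), TestCategory("Performance"), TestCategory("Integration")]
public void TestGetCoffeeLogicAndPerformance()
{
   //Setup variables
   var code = "espresso";
   var performanceLimit = 1000;

   //Capture the time around the existing integration call
   var startTime = DateTime.Now;
   var coffee = CoffeeService.Get(code);
   var endTime = DateTime.Now;

   //Existing logic checks
   Assert.IsNotNull(coffee, "Coffee integration returned no result.");

   //Performance check piggybacks on the same call
   var responseTime = (endTime - startTime).TotalMilliseconds;
   Assert.IsTrue(responseTime < performanceLimit, "Coffee integration performance failed.  Expected {0}.  Actual: {1}", performanceLimit, responseTime);
}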

In the long run, however, it won’t just be your integration tests that you want to profile this way.  Adding this code to every single unit test slows the test cycle down and increases the time you spend writing tests, determining limits, and monitoring them.  After a handful of these, you should look into the profiling tools or the load testing solution so you can scale up.
