Baby Steps to SOA – Step Two: Measure It

In the continuing Baby Steps to SOA series, we follow Doug and his IT team behind BuyMyWidget.com as they take steps to renovate their digital asset architecture. Previously, we introduced the problem and the team and started planning and analysis. Now we continue along the road map by measuring success!

The Evolutionary Roadmap

Step Two: Measure It

Before embarking on years of refactoring, rebuilding, and redesigning, the team needs to identify how they will know whether the changes they make are bringing them closer to their goals.  This means determining what can be measured and how it can be reported to provide metrics on the success of the project.  Metrics are crucial to any endeavour that has to justify massive expenditures, and because most SOA projects involve internal “behind the scenes” changes that yield no visible changes for most end users, they become even more important.

As with any Key Performance Indicators (KPIs), we must be very careful about which ones we choose.  Whatever metric is targeted will inevitably influence the team’s behaviour to maximize the outcomes of these metrics.  If bug counts are measured, fewer bugs will be reported by QA.  If mean-time-to-repair is measured, developers will focus on older bugs rather than using priority or severity to determine which bugs should be dealt with first.

Possible KPIs

There are many possible KPIs that we could use to validate that our system is evolving into something more usable, maintainable, and extendable.  The following are a few examples that could be used for evaluating progress in these areas:

  1. Percentage of New Defects Introduced by Change
    This allows us to measure improvements in the extendability and maintainability of the system.  Fixing bugs and adding new features will inevitably cause dependent systems or components to fail.  The fewer of these that occur when making changes to the system, the cheaper it is to build and fix the system.

  2. Mean Time to Repair (MTTR)
    Captures the average time to repair a failed component.  If no lead time is included in the measurement, the KPI can be used to identify how long it takes to fix the average issue in the system.  If a lead time is included, then the KPI will be more useful in tracking how long it takes from the point of reporting a bug to when the bug is fixed and deployed to production. In both cases, using a weighting based on the severity of the issue will yield a better result.  Higher severity issues usually take more time to repair, but also need to have a short lead time.  Lower severity issues might take very little time to repair, but can generally tolerate a very long lead time.

  3. Maintenance %
    Tracks the percentage of hours spent by the team working on bug fixes and general maintenance of the application as opposed to feature development.  Using this KPI will let you know if you are spending more time delivering revenue-generating features instead of keeping the existing feature set up and running.

  4. Support Requests per Page View
    Measures how many calls, tickets, emails, issues, etc. are received by your customer support team.  Dividing by Page Views allows this metric to scale as your user base expands and you begin to receive more visits to your web application.  The intention of this metric is to track an increase in usability of the site.  The fewer problems and questions users have, the stronger the indication that the application is serving its target audience without direct assistance.

  5. Maintenance Down Time per day
    This metric averages the number of hours/minutes of maintenance down time that has occurred over a span of time for the measurement. Typically, you would want to capture this number for a quarter and for a year to match against any strategic goals identified by the management team.  Reducing the amount of time that systems are offline for maintenance shows an increase in the ability to rapidly deploy fixes to the system.

  6. Projects Completed per Quarter
    Increasing the number of projects that can be completed in a quarter allows for faster generation of revenue from investment made into projects.  This can be achieved by either adjusting processes to deliver smaller batches within a project, or by decreasing the costs of projects such that they can be delivered more quickly.

  7. Project Cost
    This requires projects to be similar when compared.  This KPI can be difficult to track if projects vary wildly in complexity.  If projects are very similar in scope and complexity, and are repeatable, comparing project costs allows for measuring if changes to the architecture have allowed for delivering the same amount of value for a lower cost.
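
As a rough illustration of the severity weighting described for MTTR above, here is a minimal Python sketch. The `Issue` record, its field names, and the weight values are all assumptions made for this example; nothing here comes from a real bug tracker, and you would pick weights that suit your own severity scheme.

```python
from dataclasses import dataclass
from datetime import datetime

# Hypothetical severity weights; the values are assumptions for illustration.
SEVERITY_WEIGHTS = {"low": 1.0, "medium": 2.0, "high": 4.0}


@dataclass
class Issue:
    severity: str        # one of the keys in SEVERITY_WEIGHTS
    reported: datetime   # when the bug was reported
    started: datetime    # when work on the fix began
    deployed: datetime   # when the fix reached production


def weighted_mttr_hours(issues, include_lead_time=True):
    """Severity-weighted mean time to repair, in hours.

    With include_lead_time=True the clock starts when the bug is reported;
    otherwise it starts when work on the fix begins.
    """
    weighted_hours = 0.0
    weight_sum = 0.0
    for issue in issues:
        start = issue.reported if include_lead_time else issue.started
        hours = (issue.deployed - start).total_seconds() / 3600
        weight = SEVERITY_WEIGHTS[issue.severity]
        weighted_hours += weight * hours
        weight_sum += weight
    if weight_sum == 0:
        raise ValueError("at least one issue is required")
    return weighted_hours / weight_sum
```

Calling the function twice, once with and once without lead time, gives the two flavours of MTTR discussed above from the same set of issues.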

In our scenario, Doug has several goals that he is trying to meet.  We first examined some of these in the scenario introduction:

“Doug’s budget costs from the previous year are under scrutiny. His team is unable to deliver new revenue-generating features without significant costs, and most of his team are unavailable due to incredibly time-consuming ongoing maintenance and support needs.  The board hasn’t cut Doug’s budget yet, but has indicated that Doug needs to show better revenue coming from his team’s activities in order for them to justify the costs”

“Customers are complaining about slow load-times and dropped order transactions.  Dealing with this is going to consume a lot of the team on an ongoing basis…”

“It is hard to find where problems are, and very expensive to fix them. Since the team has only been hacking in fixes and new features on top of a bad base, the system has gradually become a huge cost sink-hole.”

Given that the majority of Doug’s reporting needs are going to center around the maintenance cost aspect, Doug decides to track the following KPIs:

  1. Maintenance %
  2. Support Requests per Page View
  3. Project Cost
  4. Projects Completed per Quarter

Measuring

Having decided on the KPIs to be measured, the team now needs to implement ways to track and report the data required for these metrics.  Most teams don’t readily have the historical data they need available to them to create a baseline, which means that an exercise needs to be done to create this KPI baseline from whatever data can be gathered from existing information.  This generally involves going over budgets, project cost reports, customer support logs, bug reports, etc. in order to gather up the information needed for the various KPIs that are going to be measured.

[Figure: MTTR dashboard example from sms.interxion.com]

Once the baseline has been created, systems need to be put in place to gather new data in a manner that is easy to compare against the baseline.  The data also needs to be gathered in such a way that trends over time can be analyzed.  Most executive levels will want this information pushed up in dashboard form in a centralized location.  If your team needs to respond quickly to fluctuations in this data, you may also wish to consider realtime dashboards displayed on strategically-placed monitors.
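
To make the baseline comparison a little more concrete, here is a small Python sketch of one possible approach: average the historical values into a baseline, express each new reading as a percentage change against it, and smooth the new readings with a trailing average to see the trend. The function names and the quarterly figures are invented for illustration only.

```python
from statistics import mean


def percent_change(baseline_value, current_value):
    """Change of the current reading relative to the baseline, in percent."""
    if baseline_value == 0:
        raise ValueError("baseline must be non-zero")
    return (current_value - baseline_value) / baseline_value * 100.0


def trailing_average(values, window):
    """Average of the last `window` readings at each point in the series.

    The first few entries average over fewer readings than `window`.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    return [
        mean(values[max(0, i - window + 1): i + 1])
        for i in range(len(values))
    ]


# Hypothetical quarterly Maintenance % figures, invented for this example.
historical = [70.0, 66.0, 68.0]    # reconstructed from old records
new_quarters = [64.0, 61.0, 51.0]  # gathered after the baseline was set

baseline_value = mean(historical)
changes = [percent_change(baseline_value, v) for v in new_quarters]
trend = trailing_average(new_quarters, window=2)
```

Storing each period's reading alongside its change against the baseline is what lets a dashboard show both the current value and the direction it is moving in.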

In our scenario, Doug has four KPIs he would like to measure.  For each, Doug needs to gather the data required and determine how the new data will be captured.

  1. Maintenance %
    Of all the KPIs, this data is the most readily available to Doug.  The current timesheet system has been split between R&D and maintenance projects, which allows Doug to easily grab realtime data from this system for actual hours spent on maintenance.  By dividing maintenance hours by total hours spent by the department, Doug can easily capture the KPI value at any point in time, but will need to store the historical result of this data.
  2. Support Requests per Page View
    The Page View portion of this KPI can be easily pulled from the system analytics.  Doug decides to pull down total page views that occur every day in order to provide the denominator for his KPI. The support requests are not as simple.
    Customer support tracks their phone calls in one system, and this number can be pulled readily from this third-party tool.  The bugs filed by support are in the central bug tracking system alongside bugs that have been found internally.  The source of the bug is not readily apparent, though a guess can be made based on the reporter of the issue.  Doug decides to collect a list of reporters that he knows are on the support team and pull these issues from the system.
    Emails to support are not being tracked at all.  The mail server logs can indicate the number of emails that are sent to the support address, but this does not correlate to the number of emails that are related to customer support issues.  Doug decides to take a subset of these emails, and assume that any that don’t start with “Re:” are initial requests for help.  This isn’t entirely accurate, but will hopefully be close enough for comparison purposes.
    In order to ensure data is tracked correctly in the future, Doug asks the support team and internal QA team to ensure that bugs are logged with a value on them indicating their source.  This will allow for filtering out internal issues.  Additionally, Doug asks that emails be logged in the customer support call tracking system as a different type of call, so that there is only the single support system from which Doug needs to pull his information.
  3. Project Cost
    Doug has already pulled together some of the projects that the team has run recently when he used it for his analysis of costs related to systems within the architecture.  The difficulty here will be comparing new projects to existing ones of similar complexity.
  4. Projects Completed Per Quarter
    Historical data is available for Doug, but projects have not typically been scheduled to run and complete within a quarter.  This means that some projects span multiple quarters, so comparing quarterly numbers to each other is somewhat misleading.  Having three completed in the second quarter and one completed in the first quarter could mean that the team is getting better, or could just mean that one of the projects slid from quarter one into quarter two.
    In essence, this KPI is very similar to our velocity chart, in that stories from one iteration might slide into another and give a false sense of per-iteration performance.  In aggregate, however, we know that our velocity average gains more value with more data, and we can start predicting the next iteration more accurately based on our historical velocity average.
    With this in mind, Doug takes the data he has and plots the projects based on their completion date (“done”).  The numbers per quarter change drastically, and Doug notices that this seems to be because of projects of different complexities.  Going forward, Doug can continue plotting completions this way and see whether his project completion velocity goes up over time, just as he would with iteration velocity on story points.
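
The calculations Doug describes above are simple enough to sketch in a few lines of Python. This is only an illustration under stated assumptions: the function names are made up, the inputs are plain numbers and lists rather than exports from any real timesheet, bug tracking, or mail system, and matching “Re:” case-insensitively is an extra assumption on top of Doug's heuristic.

```python
from collections import Counter
from datetime import date


def maintenance_percent(maintenance_hours, total_hours):
    """Maintenance hours as a percentage of all hours spent by the department."""
    if total_hours <= 0:
        raise ValueError("total_hours must be positive")
    return 100.0 * maintenance_hours / total_hours


def count_initial_emails(subjects):
    """Doug's heuristic: a subject that does not start with 'Re:' is a new request.

    Ignoring case and leading whitespace is an assumption of this sketch.
    """
    return sum(1 for s in subjects if not s.strip().lower().startswith("re:"))


def count_support_bugs(bug_reporters, support_team):
    """Count bugs whose reporter is on the known list of support staff."""
    return sum(1 for reporter in bug_reporters if reporter in support_team)


def support_requests_per_page_view(calls, bugs, emails, page_views):
    """All support requests for a period divided by page views for that period."""
    if page_views <= 0:
        raise ValueError("page_views must be positive")
    return (calls + bugs + emails) / page_views


def projects_per_quarter(completion_dates):
    """Count projects by the (year, quarter) in which they were completed."""
    counts = Counter()
    for done in completion_dates:
        quarter = (done.month - 1) // 3 + 1
        counts[(done.year, quarter)] += 1
    return dict(sorted(counts.items()))
```

Once bugs carry an explicit source value and emails are logged in the call tracking system, as Doug has asked, the reporter list and the “Re:” heuristic could be dropped in favour of a direct count.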

How long will this step take?

If all of the data is already being tracked and is ready for reporting, it should only take a few hours to discuss and plan out which KPIs should be used.  A calendar day or two may be needed to create around 4 reports on existing data.

If data is not in a state to be used, the team should allocate a few weeks for data gathering, as well as spin up a short project (2-4 weeks) to run parallel to ensure data is gathered correctly in the future.

How much will this step cost?

The total cost will vary with the complexity of your data mining requirements.  The easier it is to gather the data for your KPIs, the cheaper this step will be.  At the very least, a senior member of the team will be spending 2-4 days developing KPI options, presenting them to a group, and gathering the data for the KPIs.

If additional effort is required for data mining, especially in an automated fashion, expect a group of 2-3 developers to be required to build the automation for mining the data from any tools that are in use by the organization.  The effort here will be based on the complexity of the data required for the KPI, the ease with which the data can be extracted from the tracking systems, and the ease with which the systems can be customized for gathering the data in the future.

What’s next?

This series continues with the team starting on their first refactoring of the architecture by focusing on the website codebase.  COMING SOON!
