Be Your Company's IT Hero - Evolving IT's Metrics

Author William Gibson is famously reported as having said: “The future is already here – it’s just not very evenly distributed.” Good examples of this phenomenon are how IT organizations view the role of “business metrics” (i.e., KPIs, SLIs, SLAs, SLOs, etc.), and which metrics they gather and emphasize in conversations (internally, with the business at large, and with customers).

Within some organizations, it’s still 2015 or so. Leadership is still earnestly working to be “data driven,” and IT life is measured by many numbers, typically derived in complicated ways.

In 2015-world, operational metrics propose arcane ways of thinking about availability that are strangely detached from the reality of the business. For example (and this is a real example, drawn from a 2015 article in CIO magazine), an online application’s availability might be judged first by establishing per-page baselines: having developers measure the time it takes for application pages to render on an unburdened deployment (Dev? Test? Prod?); then deciding arbitrarily that “fully available” means the time it takes for each page to render to the 90% point (what?), and then determining production availability by measuring what percentage of live page-renders meet or beat these times. The “relating this to the business” part then happens by calling this metric “Customer Service Level.”
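To see how detached that is, here is a rough sketch of the calculation as described above. The page names, baseline times, and sample data are all made up for illustration, and skipping pages that have no baseline is an assumption (the description doesn't say what happens to them).

```python
# Hypothetical per-page baselines: seconds to reach the "90% rendered" point,
# as measured by developers on an unburdened deployment.
BASELINES = {"/home": 1.2, "/search": 2.0, "/checkout": 2.5}


def customer_service_level(renders, baselines):
    """Return the percentage of live page renders that meet or beat baseline.

    `renders` is an iterable of (page, seconds) tuples. Renders of pages
    with no baseline are skipped (an assumption, not stated in the source).
    Returns None when there is nothing to score.
    """
    scored = [(page, secs) for page, secs in renders if page in baselines]
    if not scored:
        return None
    met = sum(1 for page, secs in scored if secs <= baselines[page])
    return 100.0 * met / len(scored)


# Made-up live measurements: two of the four renders beat their baseline.
live_renders = [
    ("/home", 1.0),
    ("/home", 1.5),
    ("/search", 1.9),
    ("/checkout", 3.1),
]

print(customer_service_level(live_renders, BASELINES))  # 50.0
```

Note what the resulting number does not tell you: which pages were slow, when, or whether any customer noticed.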

2015-era delivery metrics are, likewise, earnestly timewarped. The organization is still hung up in the transition between old-school waterfall project management and more agile methods. The former is still strongly dominant. But the latter (itself not yet fully embraced, shortcomings still unknown) is already making people uncomfortable – kind of like a snarky teenager, smirking at uncool parents. Discomfort is amplified by earnest attempts to fix waterfall by making it smell more agile (e.g., by insisting that delivery dates aren’t set until the ‘design phase’ is complete – a hugely sensible thing to do that can all too easily end up looking like CYA). Attempts are also made to characterize agile as capital-A radical, rather than pragmatic, e.g., “For Agile projects, (delivery date) is not relevant as delivery dates are almost always met by adjusting scope.” Which is truthy, but very much the TL;DR (as pretty well elaborated in the answer to this Stack Overflow question, also from 2015).

Cost metrics, defect metrics, all sorts of other metrics remain earnestly wonky in these decelerated environments. “Agile projects are less likely to benefit from (analysis of project cost).” Okay. “Measure changes made during pre-release code-freezes.” Oookay. “Measure unscheduled changes to apps in production.” “Measure manual changes made post-install on customer sites.” So okay. This is normal.

It is normal. It’s how a lot of IT organizations still run today, and it reflects a great deal of solid thinking based on direct observation. But that leaves several problems:

First, a lot of legacy IT business metrics are heavily conditioned by underlying assumptions that are malleable and fast-evolving. That means a metric’s useful life may be very short, and metrics need constant re-evaluation for relevance against ground truth. (Example: In a world of single-page JavaScript applications, what does “every page gets scored on how long it takes to render to 90%” even mean?)

Second, and maybe more important, business metrics need to be looked at from outside the technological box they were incubated in, because the most important, powerful changes you can and should be making now will likely make whole flocks of metrics go away completely. If you start delivering continuously, those “changes during code-freeze” metrics aren’t front-and-center any more. If you adopt infrastructure as code, you will never again make manual changes in production (or staging, or QA/Test, if you’re serious).

Third, business metrics selected and promoted as “important” are largely invisible (and/or irrelevant) to the business at large. And this is not helping you, your team, your department, or your business succeed.

Business Metrics for IT Heroes

Today’s IT Heroes are measured on contribution to the business: their impact (good or bad) on other people’s workflows, on customer satisfaction, on revenue/sales growth, and on success in reaching longer-term strategic goals for tech.

Business relevance and common-sense understanding are key to IT business metrics that work for the whole organization. IT Heroes ask questions of managers, business leaders, colleagues, and customers to find out what’s really important to them, then create metrics reflecting these priorities. Eschew complexity. Nobody cares if 99% of page views were completed to 90% within each page’s computed optimal load time. But they care a lot if the website feels slow during peak usage hours.

Sharing is caring. Willingness to share metrics (and attendant commitments) widely is the hallmark of a mature data-driven business culture. Wherever you can, overcome institutional resistance (and technical frictions) preventing everyone from easily seeing what’s going on with critical services, how hard your team is working to provide them, and (gulp!) how well you’re succeeding.

Just because you inherited bad metrics doesn’t mean you need to endure them. The stack of customer-facing, metrics-based agreements (SLIs, SLOs, SLAs, etc.) tends to be deep, complicated, and resistant to change. Seek the help of Sales and Management in analyzing inherited SLAs, determining their relevance and good sense, building change proposals, and presenting these to customers at the most appropriate opportunity.

Automate all the metrics things, or suffer. Legacy metrics may resist updating because they’re inaccessible: computed in batches, at intervals, and requiring manual steps before they’re consumable. Take an action-item to stop this: any important metric should be accessible in near-realtime, packaged for consumption by all stakeholders.

Cost cutting is still important, but so are the semantics around savings. Efforts to control (read: ‘cut’) costs are important in IT – certainly, cost-consciousness is esteemed and needs to be visible. In the current, still-bullish climate, however, selective spending – i.e., “investing to cut costs long-term” – gets positive attention from management.

The take-away: it’s a good time to propose disruptive upgrades to IT process that hit the trifecta of increasing productivity and customer satisfaction while reducing risk. Deep automation initiatives, integrations with analytics and process management frameworks, new ways of managing and tracking cloud usage and costs, and implementation of self-service frameworks are all good candidates.

How Does This Connect with Monitoring?

Enterprise-class IT monitoring software can play a critically important role in changing the business metrics culture of your organization. Here are some ways:

Use dashboards to share metrics. Monitoring should let you create simplified, customized views of system status (and share them securely without needing to provide access credentials to all and sundry). Bonus: sharing status dashboards tends to save IT time (no need to answer repetitive questions) and reduce organizational panic when issues arise.

Enable analytics, ITOM integration, automation. The best enterprise monitoring platforms can output raw IT metrics to analytics and machine learning platforms, enabling discovery of obscure patterns, security risks, etc. Integrated with ticketing or other process management platforms, monitoring can trigger ticket creation, facilitate resolution – even kick off automated fixes.

Track costs as a metric. Costs for public cloud services can be forbidding, and are often obscure until bills arrive. A solid enterprise monitoring platform can be customized to take cloud usage data and compute estimated costs automatically. Bonus: while costs are important to the front office, they’re also useful as a metric of engineering efficiency for apps running on pay-for-use infrastructure.
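As a rough illustration of the cost-tracking idea, the sketch below turns usage data into an estimated cost and then into a simple efficiency figure. The resource names, hourly rates, and request counts are all hypothetical, and this is plain Python rather than any particular monitoring platform's API. Rates are kept in whole cents to avoid floating-point rounding surprises.

```python
# Hypothetical pay-for-use rates, in cents per hour of usage.
RATES_CENTS_PER_HOUR = {"small-vm": 5, "large-vm": 40}


def estimated_cost_cents(usage_hours, rates):
    """Estimate total cost in cents from hours used per resource type.

    Raises KeyError if usage mentions a resource with no known rate, so
    unpriced usage is surfaced rather than silently counted as free.
    """
    return sum(hours * rates[resource] for resource, hours in usage_hours.items())


def cost_per_thousand_requests(cost_cents, requests_served):
    """Cost (in cents) of serving 1,000 requests: a rough efficiency metric."""
    if requests_served <= 0:
        return None
    return cost_cents / (requests_served / 1000)


# Made-up usage for one month.
usage = {"small-vm": 720, "large-vm": 100}
cost = estimated_cost_cents(usage, RATES_CENTS_PER_HOUR)

print(cost)                                       # 7600 (cents, i.e. $76.00)
print(cost_per_thousand_requests(cost, 380_000))  # 20.0 (cents per 1,000 requests)
```

Tracked over time, the second number is the one that speaks to engineering efficiency: it should fall as an app is tuned, even when total spend rises with traffic.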

