ITIL distils and documents the best practices of IT. However, some of its critics maintain that in practice, the result is to crystallise the status quo at a particular moment in time, preventing further evolution, explains Dominic Wellington from Moogsoft.


ITIL’s many defenders will quite rightly point out that ITIL does not define or document any specific process. Instead, its best practices are supposed to guide and inform the creation and definition of specific processes and their implementation in a particular context. Rather than refighting this particular battle yet again, I would like to look at how some of those implementations can be updated to better fit the new realities of IT.

Dynamic, not static
ITIL is over three decades old, and IT has changed dramatically in that time. In particular, where infrastructure (whether compute or network) used to be fairly static, at least in the short term, today it is increasingly dynamic, and the rate of change is continuing to accelerate. While the high-level imperatives of ITIL remain valid, this change requires a corresponding change in how monitoring events are filtered and correlated. The easy assumption that infrastructure could be exhaustively documented and assets tagged with barcode stickers no longer holds true in a virtualised and software-defined world. In turn, this means that alerts can no longer be filtered and correlated based on which device they originated on and how that device relates to others. Instead, new analysis techniques are needed to identify significant events in real time and map how they may relate to other events that are occurring around them. In this way, we can avoid overwhelming help desk operators with irrelevant tickets, and instead supply them with high-quality, real, and actionable incidents that they can immediately begin to investigate.
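
As a rough illustration of picking out significant events without any knowledge of device topology, the sketch below scores each alert message by how rare it is within a batch, so that routine chatter scores low and unusual messages stand out. This is only one possible approach, not a description of any particular product; the function names, the sample messages, and the threshold of 2.0 bits are all assumptions made for the example.

```python
import math
from collections import Counter


def significance_scores(messages):
    """Pair each message with its surprisal, -log2(relative frequency), in the batch."""
    counts = Counter(messages)
    total = len(messages)
    return [(msg, -math.log2(counts[msg] / total)) for msg in messages]


def significant(messages, threshold=2.0):
    """Return distinct messages whose surprisal meets the (assumed) threshold,
    in the order they were first seen."""
    seen = set()
    result = []
    for msg, score in significance_scores(messages):
        if score >= threshold and msg not in seen:
            seen.add(msg)
            result.append(msg)
    return result


# Hypothetical batch: six routine heartbeats and two unusual alerts.
batch = ["heartbeat ok"] * 6 + ["disk latency high", "link flap"]
rare = significant(batch)
```

Here the repeated heartbeat is dropped as noise, while the two one-off messages are kept for an operator to look at. A real system would work on a sliding window of a live stream rather than a fixed batch.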

No single owner or single cause
A second consequence of the increasing complexity of IT environments is that there is rarely a single causal event which has triggered an issue. Instead, incidents are caused by a number of seemingly unrelated events, occurring in an unexpected pattern – because the expected patterns have already been foreseen and mitigated. This means that there is rarely an obvious “root cause”; rather, there is a cloud of related events. To avoid creating duplicate tickets and causing unnecessary work for already busy Operations teams, we need to capture all of these related events in one place, and do so quickly and automatically. In turn, this means that there is no single owner of the resulting incident record, as it probably spans various teams’ areas of coverage and competence. New ways of working will be required, enabling ad-hoc teams to form and swarm around an issue, using techniques such as ChatOps to collaborate and resolve incidents rapidly and minimise their impact.
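
To make the idea of capturing related events in one place more concrete, here is a minimal sketch that groups alerts into a single incident whenever they arrive close together in time, regardless of which team's system raised them. Time proximity alone is a deliberately crude signal chosen to keep the example short; the `Alert` and `Incident` types, the `cluster_alerts` function, and the 120-second window are hypothetical names and values, not taken from any specific tool.

```python
from dataclasses import dataclass, field


@dataclass
class Alert:
    timestamp: float  # seconds since some reference point
    source: str       # system or team that raised the alert
    message: str


@dataclass
class Incident:
    alerts: list = field(default_factory=list)


def cluster_alerts(alerts, window_seconds=120.0):
    """Group alerts into incidents: an alert joins the current incident if it
    arrives within window_seconds of that incident's most recent alert."""
    incidents = []
    for alert in sorted(alerts, key=lambda a: a.timestamp):
        if incidents and alert.timestamp - incidents[-1].alerts[-1].timestamp <= window_seconds:
            incidents[-1].alerts.append(alert)
        else:
            incidents.append(Incident(alerts=[alert]))
    return incidents


# Hypothetical alerts from three different teams' systems.
sample = [
    Alert(0.0, "network", "packet loss on uplink"),
    Alert(60.0, "database", "replication lag rising"),
    Alert(150.0, "web", "checkout latency high"),
    Alert(1000.0, "network", "scheduled config push"),
]
incidents = cluster_alerts(sample)
```

The first three alerts end up in one incident spanning the network, database, and web teams, which is exactly the situation in which no single team obviously owns the record. The later alert falls outside the window and starts a separate incident.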

Distributed knowledge
In such a distributed context, it would be easy for knowledge to be lost, requiring operators to start their investigations from scratch each time. Because of the rapid pace of change, the chance that any individual incident might recur exactly is relatively low, reducing the usefulness of traditional knowledge base systems. The new approach is to capture knowledge unobtrusively in real time from operator actions, documenting behind the scenes which alerts were causal and which particular suggestion turned out to be the one that resolved the incident.

AIOps: From a singular static ITSM suite to a flexible algorithm-enabled IT Ops toolchain
The emerging field of AIOps helps to take ITIL into this new world, updating specific processes and implementations to suit the new realities in the data centre and in the cloud, while building on the solid shared foundations that ITIL continues to make available. The goal is to augment the capabilities of expert IT Operations teams by enabling them to complement their existing service desk systems with a number of specialised tools, assembling them into a toolchain that is a perfect fit for each organisation’s specific needs.

For more on how Moogsoft can accelerate incident detection and resolution times in concert with your existing ITIL processes, please visit booth 705 at SITS18. We would also like to invite you to our AIOps Symposium, in London on the 6th of September.

Guest post by:

Dominic Wellington

Dominic Wellington is Chief Evangelist at Moogsoft. He has been involved in IT operations for a number of years, working in fields as diverse as SecOps, cloud computing, and data center automation.
