This is the last in a five-part series about IT Service Management. Review the series at www.rwonline.com; click on the IT Service Management tab.
You’ve jumped through the hoops of getting your IT Service Management together; now it’s time to go live. But are you ready? What distinguishes a good IT show from a bad one? And how do you prevent stage fright?
Much of what people think of as IT comes from the public-focused Service Desk, which handles Incident Management and some Problem Management. In IT Service Management, Incident Management is concerned primarily with returning a system to a usable state, although service requests as well as failures can constitute Incidents.
When you are out front, it’s easy to go with the flow, handling requests as they come in and taking care of problems as they appear. This is a recipe for chaos. A good performer prepares the audience, has practiced the material and knows just how far to go. He or she even anticipates hecklers and has witty comebacks at the ready. ITSM should be no different. A good performance is creativity layered on top of thorough preparation and comfort with the material.
The Service Desk should have its act timed – how long to handle an Incident of whatever type – and have systems back in service by the end. The recovery process involves tools, spare parts, specific roles, procedures, communications, a business case and a limited mission.
Your Service Desk personnel should be comfortable running through the basic types of swap-outs and repairs required in typical cases, and they should have a good sense of procedures in unrehearsed scenarios. They should know the types of inputs needed for a solution – personnel, equipment, processes, et al. They should know when and how to escalate their response to problems to higher levels or more specific support functions, and when to stop, including how to use a diplomatic “No.”
The priority of an Incident is based on urgency of the problem crossed with business impact, usually defined and limited by Service Level Agreements. While some common sense is required in performing duties, too often the Service Desk is seen as providing heroics – performing above and beyond the call of duty. Well-designed IT Service Management should instead promote outstanding performance within agreed-upon boundaries through planning.
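As a rough illustration of urgency crossed with impact, here is a minimal Python sketch. The three-level scale, the 1-to-5 priority numbering and the response targets are assumptions made up for the example, not values from any particular Service Level Agreement; your own agreements would supply the real figures.

```python
from enum import IntEnum


class Level(IntEnum):
    """Hypothetical three-step scale used for both urgency and impact."""
    LOW = 1
    MEDIUM = 2
    HIGH = 3


# Assumed response targets in minutes, keyed by priority (1 = most pressing).
# In practice these numbers would be taken from the Service Level Agreement.
SLA_TARGET_MINUTES = {1: 30, 2: 120, 3: 480, 4: 1440, 5: 4320}


def incident_priority(urgency: Level, impact: Level) -> int:
    """Cross urgency with impact to get a priority from 1 (highest) to 5 (lowest).

    Summing the two levels is one simple way to fill in the matrix;
    a facility could just as well use an explicit lookup table.
    """
    return 7 - (int(urgency) + int(impact))


def response_target(urgency: Level, impact: Level) -> int:
    """Return the agreed response time in minutes for this combination."""
    return SLA_TARGET_MINUTES[incident_priority(urgency, impact)]
```

With these assumed values, a high-urgency, high-impact failure such as dead air lands at priority 1, while a low-urgency, low-impact request lands at priority 5. The point is that the Service Desk reads the answer off an agreed table instead of deciding case by case how heroic to be.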
Training can be especially important for the handling of Incidents. Life is much easier when your users know how to work around system failure, turning trouble tickets into issues for Problem Management to resolve at a convenient time rather than urgent matters for the Service Desk.
Some methods of helping users to Do It Yourself include rebooting machines, using spare stations and studios or choosing alternate transmission paths, though there are rational limits to users’ abilities as well as costs involved with providing spare facilities. But for off-hours users, emergency training can cut down on late-night and weekend pages and the need for more 24×7 staff.
Of course not all Incidents are initiated by users. Your monitoring systems or routine maintenance can flag needs, failures or impending doom, and it’s up to the Service Desk to assign the proper priority for the response.
Once the Incident is handled, it should be properly recorded and passed on to Problem Management as needed. While in practice the cause of Incidents may be diagnosed and fixed in handling the Incident, this is a second hat, and should be kept separate, at least mentally, from the prime responsibility of restoring order.
Don’t forget the role of external systems in recovery. While you may have underpinning contracts for third-party vendors, these may not help when you need to be back online. Plan your contingencies around the likelihood of external problems, and try to understand the vendors’ emergency flows before you need to rely on them.
The human touch
Once the situation is handled, make sure users are aware of and using the restored system. It is also time to perform whatever public relations are required, be it a simple e-mail noting the problem is fixed, or a full facility briefing of what happened.
While a good automated trouble ticket system will handle much of the work for you, there is always room for a human touch. This is important not just for keeping users informed and productive but also for keeping a good attitude towards the clientele and putting the relationship to good use. One study noted that only 15 percent of facility problems are caused by people, but almost all problems can be better solved through feedback from those familiar with the systems.
Though Incident Management is separate from Problem Management, the staff that resolves problems has wide access to historical data and current Configuration info that can be useful to Incident efforts as well as Problem diagnosis. Recent change to the facility is a frequent source of breakage, so logging changes and being able to pull up maintenance records quickly are important for the recovery effort. Anything that can help automate logging and bring the reporting closer to the point of service helps ensure accurate records. It’s not uncommon to be stopped in the hall on the way to your desk, so think of tools such as PDAs and logging stations to make updating the database more immediate.
Of course if the Configuration Database and other resources are down, they cannot help. Some thought needs to be given to what resources and information are critical during power outages or network or server failures; make sure backup versions are available, such as on a PDA, a USB drive with a laptop, a printed hard copy or an accessible off-site system.
By the time an issue arrives at Problem Management, it is typically an Error that needs to be analyzed and resolved. The resolution is generally a fix or a workaround matched to a new or known error, or a Request For Change (RFC) affecting equipment, software, processes, training or even the Service Level Agreements.
Monitoring known problems and proactively preventing outages is a main part of the job. But the facility doesn’t stand still – learning about new weaknesses, building up better capabilities for debugging and monitoring, becoming proactive on system upkeep and staying abreast of improved technical solutions are some of the tactics for problem handling. Make problem, process and facility improvement part of the daily and weekly schedule, build it into the budget, force it into the routine. Otherwise, there will never be a “good” time when staff can review procedures and knowledge – daily firefighting will take precedence.
Problem Management is of course not standalone. Aside from the areas mentioned, there are strong, obvious ties with Capacity, Continuity, Availability and Security, as well as Financial and Service Level management. Release Management attempts to decrease problems through controlled rollouts and homogeneous systems. Even when a fix is in place, the solution needs to be monitored to make sure it’s effective and sustainable. Problem Management should drive the effort to make the facility reliable and understood.
Understanding takes on a special role with management. In addition to talking with users, Service and Problem roles need to communicate effectively with management, both to provide metrics on performance and facility needs and to explain the reasons for their existence. Look at ways to tie your Configuration Database and trouble ticket systems into management reports that provide simplified graphical views of key facility metrics. Even line personnel need to see relevant, processed metrics and other information, not raw unreadable data.
Try to send out reports automatically, not on demand, and follow up with human contact to make sure management and others involved are paying attention to the important points. Aside from the risk that management outsources your job or cuts your budget, it might be making large errors in planning due to a faulty understanding of IT operations, while lower-level staff may be adding to problems by not following procedures. Be prepared to follow up on pertinent issues in more depth. IT performance may affect other strategic issues, and its goals should be aligned with those of the overall business.
Managing expectations and providing calibrated service are two goals of ITSM, though “shameless self-promotion” should not be neglected. A quiet IT department may be in trouble or may have everything under control, but users and management typically will suspect the worst. It’s not often that a shy, hesitant performer wins over a crowd, and the same can be said for IT. Come in prepared, organized and ready to improvise, and the chances of success improve 1,000 percent.
Smooth the edges
We’ve covered the gamut of IT Service Management, from its business goals, structuring and planning down to implementation. But this is just a framework; a real-world system accounts for the type of facility, the people who work there and end users of services.
Some audiences are better than others, while every performer has an off night here and there. Train your ITSM sights on the long term, a steadily improving act moving from stage fright through remembering your lines, adapting your content to the crowd and finally, perfecting your delivery.
Every act starts off a bit stiff, but try to have fun along the way, choosing appropriate goals and challenges and enjoying your successes. These will smooth the edges on the long path to becoming a successful ITSM performer.