6 Steps to Service Outage Analysis

SOA provides both a valuable learning exercise and a clear, justified RFC to improve service availability and customer satisfaction, writes ITSM Watch columnist Hank Marquis of itSM Solutions.

Originally published at Internet.com


The IT Infrastructure Library (ITIL) refers to service or systems outage analysis (SOA) as a method to improve availability. Presented as an availability management process tool or technique, SOA is a powerful management tool to improve quality.

As is quite common, since ITIL is descriptive rather than prescriptive, it does not explain how to carry out an SOA. In this article I will explain what an SOA is, describe its benefits, and give you an easy-to-follow six-step guide to performing one.

The reason to use SOA is to identify the causes of outages and thus reduce the frequency and duration of outages. SOA aims to improve mean-time-to-repair (MTTR).

The result of an SOA is a clear understanding of what happened to cause an outage, along with exposure of the risk of future outages due to the same cause or causes. Finally, an SOA can produce recommendations for improvement to avoid the issue in the future.

With these types of benefits, you might think that performing an SOA is complicated but, in reality, just the opposite is true: you can perform an SOA without any major investment in software, tools, or training.

Performing an SOA is straightforward. Working with problem management and customers, you examine past outages to identify the configuration items (CIs), meaning the products, people, or processes, related to an outage. In effect, you simply review the impact to the organization and infrastructure as reflected by how the organization responded to an outage.

This differs from proactive problem management, since availability management has a scope that includes the organization (people, process, training, staffing, etc.).

Getting Started

To get going, collect outage data in the form of incidents, any related closed problems, or known errors. Gather together a team of people familiar with the outages, the infrastructure, processes, procedures, people, and so on. Be sure to include a customer representative and perhaps some users on the team as well (their input will be critical in guiding the team through the SOA process).

Once you have the team empowered, lead them through the following six steps:

Group related outages together by vendor, product, family, application, customer, etc. Then, using customer and user input as appropriate, categorize each outage as "significant" or "less significant." Focus only on those labeled "significant," and monitor the "less significant" for future outages.
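The grouping and tagging described above can be sketched in a few lines of Python. This is only an illustration: the record fields, the sample outages, and the threshold rule inside `is_significant` are all assumptions, and in a real SOA the "significant" label comes from customer and user input rather than from a fixed formula.

```python
from collections import defaultdict

# Hypothetical outage records; field names and values are illustrative only.
outages = [
    {"id": "INC-1", "product": "mail server", "minutes_down": 240, "users_affected": 500},
    {"id": "INC-2", "product": "mail server", "minutes_down": 15, "users_affected": 20},
    {"id": "INC-3", "product": "payroll app", "minutes_down": 90, "users_affected": 300},
]


def is_significant(outage, min_minutes=60, min_users=100):
    """Assumed rule of thumb standing in for customer and user judgment."""
    return outage["minutes_down"] >= min_minutes or outage["users_affected"] >= min_users


def group_and_categorize(records, key="product"):
    """Group outage ids by `key`, split into the two categories used in the text."""
    groups = defaultdict(lambda: {"significant": [], "less significant": []})
    for outage in records:
        label = "significant" if is_significant(outage) else "less significant"
        groups[outage[key]][label].append(outage["id"])
    return dict(groups)


grouped = group_and_categorize(outages)
for name, buckets in grouped.items():
    print(name, buckets)
```

Changing `key` to a vendor, application, or customer field (if your incident records carry one) regroups the same outages along a different line.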

For each outage tagged as "significant," review the root cause of the unavailability, for example faulty hardware or software (this requires closed incidents and problems). The root cause is probably already known since the outage is resolved.

Perform a simple Pareto analysis to break the significant issues into a smaller group. Using the Pareto 80/20 rule, you can rank the related outages and their causes.

You will find that the majority (80%) of the outages result from a select few causes (20% of the organization or infrastructure). Of course, you want to focus on the 20% of causes responsible for 80% of the outages.
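A simple Pareto ranking needs nothing more than a tally. The sketch below assumes you have one root-cause label per significant outage; the cause names and counts are invented for illustration, and the 80% cutoff is just the rule of thumb mentioned above.

```python
from collections import Counter

# Hypothetical root causes, one entry per significant outage.
causes = [
    "failed disk", "failed disk", "failed disk", "failed disk",
    "expired certificate", "failed disk", "operator error",
    "failed disk", "failed disk", "expired certificate",
]


def pareto(cause_list, cutoff=0.8):
    """Return the top causes that together reach at least `cutoff` of all outages.

    Each entry is (cause, outage count, cumulative share of outages).
    """
    counts = Counter(cause_list).most_common()  # sorted by count, largest first
    total = sum(n for _, n in counts)
    vital_few, running = [], 0
    for cause, n in counts:
        if running / total >= cutoff:
            break
        running += n
        vital_few.append((cause, n, running / total))
    return vital_few


vital_few = pareto(causes)
for cause, n, share in vital_few:
    print(f"{cause}: {n} outages, cumulative {share:.0%}")
```

Real data rarely splits at exactly 80/20; the point is only to surface the short list of causes worth examining in the next step.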

For each grouping of similar outages, examine the reasons for the duration of the unavailability. For example, the outage may have occurred because of faulty hardware or software, but the duration of the unavailability might have been extended by lack of tools, little or no training, unavailable spares, etc.

Remember to consider the "3 P's" - people, product and process. Then review:

* All existing procedures and policies used during the outage.
* The actions and inactions of staff members, customers and anyone else involved in the outage or its restoration.
* The management directives given to all involved before and during the outage.

You must determine if anything might have lessened the duration of the outage or, better yet, avoided it altogether. Your examination of the "3 P's" should locate a trend, a related cause, or at least something in common with similar outages. This is the smoking gun.

For example, a common cause that extends an outage may be a hierarchical escalation requirement that does not allow staff to proceed without management approval, or a special tool that is required but could not be found.

The next step is to quantify the avoidable outage time. That is, if one hour of downtime resulted from trying to locate the proper tool, then the avoidable outage time is one hour times the number of outages so affected.

Your goal is to identify the cause responsible for the most avoidable outage time. That cause is the most significant generator of preventable downtime.
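The arithmetic is simply avoidable hours per outage multiplied by the number of outages affected, then ranked. Here is a minimal sketch; the causes and figures are hypothetical and serve only to show the calculation.

```python
# Hypothetical figures: (avoidable hours per outage, number of outages affected).
avoidable = {
    "tool could not be located": (1.0, 6),
    "waited for management approval": (0.5, 4),
    "no spare on site": (2.0, 2),
}


def avoidable_time(factors):
    """Return (cause, total avoidable hours) pairs, largest total first."""
    totals = {cause: hours * count for cause, (hours, count) in factors.items()}
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


ranked = avoidable_time(avoidable)
for cause, hours in ranked:
    print(f"{cause}: {hours:.1f} avoidable hours")
```

With these sample numbers, the missing tool tops the list at six avoidable hours, so that is where a recommendation would pay off first.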

End the SOA by creating a report summarizing the number of outages analyzed, timeframe, avoidable outage time, and the suggestions for improving or avoiding the outage. Prepare a request for change (RFC) and pass the entire kit on to change management.
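If it helps to standardize the report, the four items listed above fit in a small record that renders as plain text. The class name, field names, and sample values below are assumptions for illustration, not a format ITIL prescribes.

```python
from dataclasses import dataclass, field


@dataclass
class SoaReport:
    """Hypothetical container for the summary items named in the text."""
    outages_analyzed: int
    timeframe: str
    avoidable_hours: float
    suggestions: list = field(default_factory=list)

    def render(self):
        lines = [
            "Service Outage Analysis Report",
            f"Outages analyzed: {self.outages_analyzed}",
            f"Timeframe: {self.timeframe}",
            f"Avoidable outage time: {self.avoidable_hours:.1f} hours",
            "Suggestions:",
        ]
        lines += [f"  - {s}" for s in self.suggestions]
        return "\n".join(lines)


# Sample values are invented for illustration.
report = SoaReport(
    outages_analyzed=12,
    timeframe="first quarter",
    avoidable_hours=12.0,
    suggestions=["Store the recovery tool in a labeled, known location"],
)
text = report.render()
print(text)
```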

Hank Marquis is a managing partner and CTO at itSM Solutions. You can contact Hank at hank.marquis@itsmsolutions.com.

Author: Hank Marquis
