IBM GTS - PA
L&I - Systems Management Plan
Problem/Incident Management
Version 1.14
Prepared for
Commonwealth of Pennsylvania
Department of Labor and Industry
December 2010
Revision History
|Release/Version Information |Revision Date |Author / Editor |Summary of Changes |
|1.0 |02/01/07 |Mike Smith |Creation of document |
|1.2 |05/15/08 |Mary Hill-Hartman |Updates per L&I comments |
| | |John Tamosaitis |Update document |
|1.3 |07/22/08 |John Tamosaitis |Updates per L&I comments |
|1.4 |8/13/08 |John Tamosaitis |Updates per L&I comments |
|1.5 |8/25 |John Tamosaitis |Updates per L&I comments |
|1.6 |10/31 |John Tamosaitis |Updates per L&I comments |
|1.7 |12/22/08 |John Tamosaitis |Updates per L&I comments |
|1.8 |3/2/09 |John Tamosaitis |Updates per L&I comments |
|1.9 |4/27/09 |John Tamosaitis |Updates per L&I comments |
|1.10 |5/15/09 |John Tamosaitis |Updates per L&I comments |
|1.11 |5/27/09 |John Tamosaitis |Updates per L&I comments |
|1.12 |10/16/09 |John Tamosaitis |Updates per L&I comments |
|1.13 |10/30 |John Tamosaitis |Updates per L&I comments |
|1.14 |12/9/09 |John Tamosaitis |Updates per L&I comments |
|1.15 |11/15/10 |John Tamosaitis |Update for Prod outages – use of incident reports |
|Reviewed By: |
|Name |Team/Role |Reviewer Comments |Date Reviewed |
|Myrna Barnes |Chief, Customer Relations Division |Myrna Barnes |12/7/2009 |
|Anita Steinmeier |Chief, Enterprise Software and |Anita Steinmeier |12/7/2009 |
| |Information Division | | |
|Karen Fausnacht |Chief, Project Mgmt Division |Karen Fausnacht |11/16/2009 |
|Steve Yurich |Chief, Security Division |Steve Yurich |11/10/2009 |
|Jacki Hagmayer |Chief, Engineering and Research |Jacki Hagmayer |12/7/2009 |
| |Division | | |
|Ed Bowlen |Chief, Standards Development & |Ed Bowlen |12/8/2009 |
| |Compliance Division | | |
|Joe Sheridan |Chief, Data Mgmt & Database Operations |Joe Sheridan |12/8/2009 |
| |Division | | |
|Bryan Reed |Chief, Compensation & Insurance |Bryan Reed |11/13/2009 |
| |Division | | |
|Mary Lynn Kowalski |Chief, Unemployment Compensation |Mary Lynn Kowalski |12/8/2009 |
| |Division | | |
|John Shontz |Chief, Vocational Rehabilitation – |John Shontz |12/8/2009 |
| |Safety & Labor Mgmt Relations Division| | |
|Phil Day |Chief, Workforce Development Division |Phil Day |12/8/2009 |
|John Auchey |Chief, Server Farm Operations Division |John Auchey |12/8/2009 |
|David Vogelsong |Chief, Infrastructure Division |David Vogelsong |11/10/2009 |
|Bill Glatz |Chief, Network Support Services |Bill Glatz |11/12/2009 |
| |Division | | |
|Marty Thomas |Chief, Mainframe Operations Division |Marty Thomas |11/12/2009 |
|Approved By: |
|Name |Team/Role |Sign-off Date |
|Michele Sinko |Director, BES |12/18/2009 |
|John Malinoski |Director, BIO |12/18/2009 |
|Neil Ross |Director, BEA |12/18/2009 |
|David Andrews |Director, BBAD |12/18/2009 |
Table of Contents
1.0 Preface
2.0 Owner/Responsible
3.0 IT Process Integration
4.0 Problem/Incident Management
4.1 Problem/Incident Management Introduction
4.2 Problem/Incident Management Purpose
4.3 Problem/Incident Management Definitions
4.3.1 Problem/Incident Management Definitions
4.4 Problem/Incident Management Objectives
4.5 Problem/Incident Management Inter-relationships
4.6 Problem/Incident Management Guiding Principles
4.7 Problem Management Roles and Responsibilities
4.8 Problem/Incident Management Process
4.8.1 Problem Management Activities
4.8.2 Process Components
4.8.3 Description of Process Components
4.9 Problem/Incident Management Procedures
4.9.1 Problem Identified
4.9.2 Record, Assess, Classify Problem/Incident
4.9.3 Diagnose and Escalate Problem/Incident
4.9.4 Resolution/Bypass and Verification of Problem/Incident
4.9.5 Survey and Follow-up Problem/Incident
4.10 Problem Management Guidelines
4.10.1 Remedy Problem/Incident Priority Levels Matrix
4.10.2 Help Desk Priority/Event Management Matrix for Servers
4.10.3 Tivoli Response Framework Matrix
4.10.4 Help Desk Brain Knowledgebase Entries
4.11 Problem Management Metrics
4.12 Problem/Incident Management Tool Capabilities
5.0 Appendix A – Sample Problems
5.1 L&I Enterprise Problem/Incident Examples
6.0 Appendix B – Acronyms
6.1 L&I Acronyms
1.0 Preface
A cross-IBM team signed a 6 1/2 year Service Oriented Architecture (SOA)-based application development contract with the Commonwealth of Pennsylvania for a new unemployment compensation modernization system (UCMS), intended as a platform for growth and innovation that will serve the Commonwealth for the foreseeable future. As part of the Agreement, IBM was required to prepare a Systems Management Plan for the UCMS project. This document represents the evolution of the Enterprise Systems Management (ESM) Plan work product. This document, L&I - Systems Management Plan - Problem/Incident Management, can be found at T:\All (Common area for all OIT Staff)\Enterprise Systems Management Documents (ITIL), along with other Systems Management Plan documents based on the Information Technology Infrastructure Library (ITIL).
See Appendix B for a complete list of Acronyms used in this document.
2.0 Owner/Responsible
As of July 21, 2008, the Office of Information Technology (OIT), Bureau of Enterprise Services Customer Relations Division (BES-CRD) is the owner of this document. The owner is expected to update the Systems Management Plan on a quarterly basis.
3.0 IT Process Integration
Multiple integration points exist between the processes in IT Operational Management. The following figure gives a high-level overview of those integration points.
[pic]
3.0 Figure 1: ESM Process Integration (Refer to section 4.5 for description of this process)
4.0 Problem/Incident Management
4.1 Problem/Incident Management Introduction
Problem/Incident Management is a formal, structured process that identifies and addresses service anomalies and restores application or system functions as quickly as possible, to mitigate the impact on the Department of Labor and Industry (L&I) business and bring services back up to the levels outlined in the Service Level Agreements (SLAs). L&I’s Problem/Incident Management Plan includes a Problem Process Owner, an Operations Manager Team Lead, a Help Desk Manager, a Help Desk Coordinator, LINKS Help Desk Agents, and Level 2/Level 3 Subject Matter Experts (SMEs) for diagnosing and resolving Problem tickets. The entire process will be managed by the Help Desk Manager. The process will record the Problem and the root cause behind it, record the results of the resolution of the Problem, and provide information required by other processes, such as Change.
4.2 Problem/Incident Management Purpose
The L&I Problem/Incident Management Plan covers all problems and incidents that occur in all of the L&I custom application software, commercial-off-the-shelf software, and infrastructure/network support services hardware and software components that impact (or may impact) the L&I business and technology environments. Examples are listed in Appendix A – this list is a working document and will be modified over time.
The L&I Problem/Incident Management Plan will also serve as the “starting” point for problems that need to be forwarded to L&I’s overall Problem Management process.
4.3 Problem/Incident Management Definitions
The following diagram illustrates the relationships between Event Management, Problem/Incident Management, and Configuration Management.
[pic]
4.3 Figure 2: Problem, Event and Configuration Management
4.3.1 Problem/Incident Management Definitions
• An incident is any event that is not part of the standard operation of a service and causes, or may cause, an interruption to or reduction in the quality of that service.
• A problem is an unknown, underlying cause of one or more incidents. A single problem may generate several incidents.
o For the purpose of this document, the following are examples of problems:
▪ Events that are detected by L&I Tivoli Infrastructure and escalated to the Tivoli Enterprise Console (TEC) and Remedy.
▪ Incidents that are reported by end users through the LINKS Help Desk.
▪ Incidents that are identified by OIT staff and reported through the LINKS Help Desk.
• A large scale problem or outage is defined as one in which one or more applications or services become inoperable, causing a major impact on the availability or function of systems. Examples of such systems include but are not limited to:
o Wide Area Network (WAN) links or Metropolitan Area Network (MAN) links that affect a large number of users
o Enterprise applications
o Public facing applications
o Enterprise servers that service a large number of users
o Enterprise shared applications
o Mainframe applications
o Voice services affecting a large number of users or multiple sites
o Desktop services that affect a large number of users or sites
o Facility issues that affect a large number of users or multiple sites
o Business Applications
4.4 Problem/Incident Management Objectives
The objective of the L&I Problem/Incident Management Plan is to provide a set of unambiguous and repeatable processes and procedures for:
• Providing a model for recording and resolving Problems and Incidents that may occur within the L&I environment. (Please see Appendix A – Sample Problems)
• Providing initial support and classification of received Incidents and Problems
• Ensuring that Problems and Incidents are assigned to the proper support team with an assigned priority
• Ensuring that all Problems and Incidents are resolved within established time frames (according to priority) and/or escalated to the next level of support
• Effectively tracking and managing Problems and Incidents once they occur
• Providing information to other processes, such as Change and Service Level Management
• Leveraging knowledge bases to increase problem resolution effectiveness
• Reviewing and validating closed problems to ensure customer satisfaction
• Performing trend analysis and proactive problem prevention
• Leveraging Help Desk tools to increase problem resolution effectiveness
4.5 Problem/Incident Management Inter-relationships
Following are some specific examples of how Problem/Incident Management interacts with other IT Operational processes. (Please note, not all processes listed are depicted in 3.0 Figure 1.)
• Change Management
o Fixes for problems will generate changes to all environments and will require change requests to install the tested and approved fixes.
o The implementation of a change may trigger problems in all environments that need to be logged and managed by the Problem Management process.
o Description and schedule of changes planned for systems is needed for problem analysis
o Help Desk training requirements associated with technical changes.
• Event Management
o Future Management Plan
o When an event is classified as a potential or current problem, a Problem Ticket should be opened and handled within the Problem Management process.
• Configuration Management
o Future Management Plan
o The Problem Management process will obtain configuration information via the Configuration Management Process, when required.
o The Problem Management process uses Configuration Management information during monitoring and troubleshooting problems.
• Asset Management
o Future Management Plan
o The Problem/Incident Management Process requires updated Asset Management information for use during monitoring and troubleshooting problems.
• Service Level Management
o Future Management Plan
o Problem/Incident Management provides data to Service Level Management for use in preparing measurement reports.
o Problem/Incident Management must also detect and identify problem trends that impact the attainment of service targets as a result of repetitive problems.
• Performance Management
o Future Management Plan
o System and performance problems are reported through Problem Management for analysis and resolution by the responsible technical support staff.
• Backup and Recovery Management
o Future Management Plan
o Problem Management is linked to Backup and Recovery Management to ensure that all component problems have been identified and properly recorded.
o Documented and/or automated recovery procedures are essential for fast problem resolution or service restoration.
4.6 Problem/Incident Management Guiding Principles
Guiding principles are fundamental rules or guidelines that establish design and implementation constraints and align with management’s vision of L&I service delivery:
• The LINKS Help Desk provides a single point of contact (SPOC) for L&I employees and business partners needing technology support during agreed upon coverage hours. All problems raised are entered into Remedy.
• The long term objective of Problem/Incident Management is to have all problems/incidents called into the LINKS Help Desk as a SPOC.
• Events are generated, forwarded to the Tivoli Enterprise Console (TEC) and entered into Remedy depending on a pre-assigned priority and risk level. Priority and risk levels are described in T:\All (Common area for all OIT Staff)\Tivoli.
• Soft skills such as customer service orientation, communication and analytic ability are a priority at the LINKS Help Desk.
• The Help Desk is proactive rather than reactive wherever possible.
• Service level targets for the LINKS Help Desk are defined, measured and reported on a regular basis.
• Interfaces to other organizations are through a defined set of escalation processes and support agreements and enabled by a Help Desk management system.
• Support acceptance criteria for applications and systems include timely review and acceptance rights for the Help Desk at the pre-implementation stages and all required changes are made in accordance with the Change Management process requirements.
• The Help Desk is automated wherever possible.
• There are defined second and third level support groups or SMEs, depending upon level of expertise, and associated procedures for routing all problems or service requests that cannot be addressed by the LINKS Help Desk (LINKS Help Desk = Level 1; Support Groups = Level 2 or Level 3).
• Help Desk, Level 2 and Level 3 Support Groups have access to all appropriate resource tools and information databases to assist in servicing the customer request or addressing problems.
• Any problems that cause an outage are entered into Remedy and an Incident Report is developed.
• Failure to conform to the Problem Management process will result in appropriate management action.
4.7 Problem Management Roles and Responsibilities
|Role |Responsibility |Members |
|L&I Customer/Employee |Initiates the need for a Help Desk Ticket and opens a call with |L&I Customer/Employee |
| |the LINKS Help Desk Analyst (All calls come to LINKS Help Desk) | |
|L&I/OIT Employee |Reports problems in response to L&I employee concern by entering |L&I/OIT/BBAD Staff |
| |into Remedy or by reporting problems to LINKS to enter into |L&I/OIT/BEA Staff |
| |Remedy |L&I/OIT/BES Staff |
| |Provides responsive, timely support to all support requests |L&I/OIT/BIO Staff |
| |escalated from the LINKS Help Desk Agents or OIT self reported | |
| |Help Desk tickets | |
| |Resolves the problem, documents the solution in the database or | |
| |ensures follow-through if the call is passed to another Level 2, | |
| |Level 3 SME | |
| |Maintains service level agreements on response turnaround | |
|Problem Process Owner |Acts as the overall “evangelist” for process work |L&I/OIT/BES/CRD Chief, |
| |Prioritizes investment, as the responsible individual for the | |
| |cost and investment overall in process work | |
| |Resolves or escalates cross-process issues | |
| |Approves new process definitions and approves or rejects process | |
| |deviation requests | |
| |Assigns or designates ownership and roles and responsibilities | |
| |for each Operational process | |
| |Evaluates process performance against standards and control | |
| |criteria | |
|LINKS Help Desk Agents |Provides telephone assistance to customers and maintains accurate|LINKS Help Desk Agents |
| |records | |
| |Makes the first attempt to resolve the service issue reported by | |
| |the end user | |
| |Acts as end-user advocate to ensure that service issues are | |
| |resolved in a timely fashion | |
| |Ensures that the ticket contains an accurate and properly | |
| |detailed description of the problem | |
| |Ensures that the priority classification is correct | |
| |Recognizes patterns of symptoms, applies search tools to identify| |
| |previously developed solutions, and helps end-users implement the| |
| |solution. | |
| |Assumes responsibility for problem tickets until resolved | |
| |Escalates problems, to Level 2 support group, if unable to | |
| |satisfactorily resolve them | |
|Lead Help Desk Agent |Provides telephone assistance to customers and maintains accurate|LINKS Help Desk Lead Agent |
| |records | |
| |Makes the first attempt to resolve the service issue reported by | |
| |the end user | |
| |Acts as end-user advocate to ensure that service issues are | |
| |resolved in a timely fashion | |
| |Ensures tickets contain an accurate and properly detailed | |
| |description of the problem | |
| |Ensures ticket priority classification is correct | |
| |Recognizes patterns of symptoms, applies search tools to identify| |
| |previously developed solutions, and helps end-users implement the| |
| |solution. | |
| |Assumes responsibility for problem tickets until resolved | |
| |Escalates problems, to Level 2 support group, if unable to | |
| |satisfactorily resolve them | |
| |Verifies customer satisfaction of problem resolutions (Level 1 | |
| |and Level 2) by performing customer follow-ups | |
| |Develops department-specific reports in Remedy and for the | |
| |Automatic Call Distribution System(ACD) | |
|Help Desk Coordinator |Communicates problem status and unresolved problems to customers |L&I, |
| |and Help Desk management |Help Desk Coordinator |
| |Verifies customer satisfaction of problem resolutions (Level 1 | |
| |and Level 2). Remedy sends a survey to customers when a ticket is| |
| |resolved | |
| |Maintains and improves communication and escalation lists | |
| |Develops department-specific reports and procedures | |
| |Participates in the problem review process | |
| |Ensures assigned priority level for tickets follows the | |
| |agreed-upon guidelines and that problems are resolved or | |
| |escalated within service level targets | |
|Help Desk Manager |Ensures that a well defined, consistently executed and effective | |
| |PM/IM process is established and maintained |L&I Help Desk Manager |
| |As owner of the IM process, ensures that the process and | |
| |capabilities are adequate, and are improved when necessary | |
| |Reviews and understands the Problem Management process and tools | |
| |Evaluates the effectiveness of the PM/IM process and supporting | |
| |mechanisms such as reports, communication formats/messages, and | |
| |escalation procedures | |
| |Makes recommendations to the Problem Process Owner on ways to | |
| |improve the process | |
|Level 2, Level 3 Subject |Provides responsive, timely support to all support requests |L&I/OIT/BBAD Staff |
|Matter Experts (SMEs) |escalated from the LINKS Help Desk Agents |L&I/OIT/BEA Staff |
| |Resolves the problem, documents the solution in the database and |L&I/OIT/BES Staff |
| |ensures follow-through if the call is passed to another SME |L&I/OIT/BIO Staff |
| |Maintains service level agreements on response turnaround | |
| |Works as a team to resolve outstanding support problems and/or | |
| |requests to even workload, establish priorities, and meet | |
| |deadlines | |
| |Escalates and works with appropriate vendor support to resolve | |
| |issues where appropriate | |
|Level 2, Level 3 SME Managers |Leads and manages Level 2 and Level 3 SMEs throughout the problem|L&I/OIT/BBAD Management |
| |resolution process. |L&I/OIT/BEA Management |
| |Provides communication and notification to users, OIT Bureau |L&I/OIT/BES Management |
| |Directors and CIO as necessary |L&I/OIT/BIO Management |
|Problem/Incident Coordinator |Assembles and manages the Level 2 and Level 3 SME teams and |L&I OIT BEA Bureau Director |
| |sub-teams | |
| |Coordinates with other Commonwealth agencies, OIT managers, | |
| |business process managers and agency executives | |
| |Establishes team leads as needed | |
| |Leads and manages sub-teams to ensure close coordination and | |
| |communications with each of the sub-teams | |
| |Takes ownership of business critical IT problems and delivers | |
| |effective workaround implementation, accurate root cause analysis| |
| |and problem resolution | |
| |Ensures complete and accurate documentation is completed at all | |
| |stages of the Problem Management process | |
| |Details responsibilities and specific tasks for emergency | |
| |response activities and business resumption operations based upon| |
| |pre-defined timeframes | |
|Operations Manager Team Lead |Attends required meetings and effectively communicates the status| |
| |of problems with high visibility to senior management, when |BIO Technical Operations Lead |
| |required | |
| |Conducts Incident Report meetings for analyzing outages | |
4.7 Figure 3: Problem/Incident Management Roles and Responsibilities
4.8 Problem/Incident Management Process
The Problem/Incident Control Process is a structured, step-by-step approach to controlling and managing Problem activity. This process will focus on restoring interrupted service as soon as possible.
4.8.1 Problem Management Activities
Five documented activities make up the Problem Control Process:
1. Problem Identified
a. Receive notification of Problem/Incident through LINKS Help Desk
b. Tivoli generates an event
2. Record, Assess, and Classify Problem/Incident
a. Log call details in Remedy
b. Assess & classify Problem/Incident or Event (priority) & communicate
c. Assign priority level
d. Identify and execute incident bypass or resolution, if possible
3. Diagnose and Escalate Problem/Incident
a. Diagnose or escalate problem (Level 2, Level 3, vendor or Problem/Incident Coordinator)
4. Resolution/Bypass and Verify
a. Recover from problem, if necessary, apply bypass or temporary fix
b. Resolve problem (correction at root cause)
c. Update customer and verify resolution
5. Survey and Follow-up Problem/Incident
a. Survey end user
b. Conduct Problem/Incident review meeting, produce an Incident Report, and analyze reports
The following figure, 4.8 Figure 4, illustrates the proposed L&I Problem/Incident Management process flow.
[pic]4.8 Figure 4: Process Flow
4.8.2 Process Components
The Problem/Incident Management process focuses on restoring interrupted service as soon as possible. Process Components are the Inputs, Tools/Techniques, and Outputs required for effective and comprehensive Problem/Incident control management.
The following figure maps each Process Component to the appropriate Process Flow Activity.
|Activity |Problem Identified |Record, Assess and Classify |Diagnose and Escalate |Resolution, Bypass and Verify |Survey and Follow-up |
|Tools and Techniques |Telephone system; Email; Tivoli Monitors |Remedy; Brain Knowledgebase; Procedures |Level 2 Support team tools (various); Level 3 Support team tools (various) |Remedy; Level 2 Support team tools (various); Level 3 Support team tools (various) |Remedy |
|Output |Remedy Problem Ticket |Remedy Problem Ticket; Communication to user |Updated Remedy Problem Ticket; Remedy Work Log; Conference Call |Updated Remedy Problem Ticket; Remedy Work Log and Solution; Incident Report(s) |Completed Problem Ticket; Communicate result; Survey; Incident Log |
4.8 Figure 6: Process Flow Activity
4.8.3 Description of Process Components
|Process Component |Description |
|Purpose |Restore interrupted service as soon as possible |
|Owner |LINKS Help Desk/OIT Level 2/Level 3/Problem/Incident Coordinator |
|Input |Problem identified |
| |Problem recorded in Remedy |
|Output |Service restored |
| |End user notified |
| |Recorded in Remedy |
| |Updated Brain Entry, if required |
|Measurement |Quantity of tickets presently open |
| |Quantity of incidents (tickets) by time (monthly, quarterly) |
| |Quantity of tickets resolved by each support group |
| |Average time tickets were assigned to each group |
| |Average time to resolve incident |
| |Percentage of incidents resolved by LINKS Help Desk |
| |Percentage of incidents escalated to support groups |
| |Customer Surveys |
4.8 Figure 7: Process Components
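As an illustration only, a few of the measurements listed in 4.8 Figure 7 could be derived from exported ticket records along the following lines. This is a minimal sketch: the record fields (resolved_by, hours_to_resolve) are hypothetical and do not reflect the actual Remedy schema, and the percentage is computed over resolved tickets, which is an assumption about how the measurement is defined.

```python
from statistics import mean

# Hypothetical ticket record: {"resolved_by": str or None, "hours_to_resolve": float or None}
# An open ticket has hours_to_resolve set to None.

def help_desk_metrics(tickets):
    """Compute a few of the Figure 7 measurements from a list of ticket dicts."""
    resolved = [t for t in tickets if t["hours_to_resolve"] is not None]
    open_count = len(tickets) - len(resolved)
    if not resolved:
        return {
            "open_tickets": open_count,
            "avg_hours_to_resolve": None,
            "pct_resolved_by_help_desk": None,
        }
    by_help_desk = sum(1 for t in resolved if t["resolved_by"] == "LINKS Help Desk")
    return {
        # Quantity of tickets presently open
        "open_tickets": open_count,
        # Average time to resolve incident
        "avg_hours_to_resolve": mean(t["hours_to_resolve"] for t in resolved),
        # Percentage of incidents resolved by LINKS Help Desk (of resolved tickets)
        "pct_resolved_by_help_desk": 100.0 * by_help_desk / len(resolved),
    }
```

The remaining measurements (tickets by month or quarter, tickets per support group, survey results) would follow the same pattern, given the corresponding fields.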
4.9 Problem/Incident Management Procedures
A procedure integrates a Process Flow Activity with one or more Process Components to create a series of step-by-step instructions that facilitate effective Problem/Incident Management. Below are the Problem/Incident Management Procedures.
4.9.1 Problem Identified
4.9.1.1 Receive call or notice of the Problem/Incident
• Problems are identified in one of two ways:
o An incident occurs and is reported to the LINKS Help Desk via telephone call.
o Tivoli identifies a problem and TEC generates an event.
4.9.2 Record, Assess, Classify Problem/Incident
4.9.2.1 Log call details in Remedy
• The problem is recorded automatically in Remedy by TEC or manually by the LINKS Help Desk or an L&I/OIT Employee. Multiple tickets for associated problems/incidents will be related to one parent Remedy ticket as necessary.
• The problem is analyzed, properly classified and assigned a priority level.
• Common or previously identified problems will be resolved at this level when possible.
4.9.3 Diagnose and Escalate Problem/Incident
4.9.3.1 Diagnose the Problem/Incident
• Problems not immediately resolved or those that appear part of a larger problem will then be escalated to Level 2 SMEs.
• The Level 2 SMEs will perform problem diagnosis activities using the appropriate technical tools.
• In the event of a major problem/incident or an outage, Level 2 SME Manager will notify the appropriate people using the L&I – Problem/Incident Communication Plan.
• The Level 2 SME Manager will work with the BES-CRD Chief or the Help Desk Manager to determine if enterprise wide notification is necessary.
4.9.3.2 Escalate the Problem/Incident
• If the Level 2 SME is unable to identify and resolve the problem, Level 2 will escalate the problem to Level 3.
• In the event of a major problem/incident or an outage, Level 3 SME Manager will notify the appropriate people using the L&I – Problem/Incident Communication Plan.
• The Level 3 SME Manager will work with the BES-CRD Chief or the Help Desk Manager to determine if enterprise wide notification is necessary.
• Level 3 works to resolve the problem and/or involves the Vendor as required. If the Level 3 SME is unable to identify or resolve the problem or if the problem appears to be part of an outage or larger problem, Level 3 will escalate the problem to the Problem/Incident Coordinator.
• The Problem/Incident Coordinator will form ad hoc teams to resolve the problem or deliver an effective workaround, and will ensure complete and accurate documentation is completed at all stages of the problem resolution process.
• If the problem/incident or outage affects the Production environment and/or a Production System, the Problem/Incident Coordinator will initiate a conference call within the first 30 minutes with all appropriate staff involved (see Problem/Incident Communication Plan, Section 5, for the teleconference phone number).
• In the event of a major problem/incident or an outage, Level 3 SME Manager will notify the appropriate people using the L&I – Problem/Incident Communication Plan.
• The Problem/Incident Coordinator will work with the BES-CRD Chief or the Help Desk Manager to determine if enterprise wide notification is necessary.
• The Problem/Incident Coordinator will work with Level 2, Level 3 SMEs and vendors as necessary on diagnosing the problem/incident.
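The escalation path described in this section can be sketched as a simple ordered chain. The following is a hedged illustration, not a description of any Remedy workflow; the function and constant names are hypothetical.

```python
# Ordered support levels, as described in the escalation procedure.
ESCALATION_PATH = [
    "LINKS Help Desk (Level 1)",
    "Level 2 SME",
    "Level 3 SME",
    "Problem/Incident Coordinator",
]

# Conference call window for Production problems, per the procedure text.
CONFERENCE_CALL_WINDOW_MINUTES = 30


def next_support_level(current):
    """Return the next level in the escalation path, or None at the top."""
    index = ESCALATION_PATH.index(current)
    if index == len(ESCALATION_PATH) - 1:
        return None
    return ESCALATION_PATH[index + 1]


def conference_call_required(current, affects_production):
    """A conference call is initiated by the Problem/Incident Coordinator
    when the problem affects the Production environment or a Production System."""
    return current == "Problem/Incident Coordinator" and affects_production
```

For example, next_support_level("Level 2 SME") returns "Level 3 SME". Vendor involvement is handled by Level 3 in the procedure and is deliberately not modeled as a separate level here.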
4.9.4 Resolution/Bypass and Verification of Problem/Incident
4.9.4.1 Recover from problem/incident: Bypass or temporary fix
• Once the Problem/Incident has been correctly identified, a bypass or temporary fix can be implemented, if a permanent fix is not available in the required time frame as defined in Section 4.10 Figure 8.
• Before any temporary fix or bypass can be implemented, it will need to be tested, and scheduled through the change control process.
• The Remedy ticket should be updated and remain open until a permanent resolution can be developed and implemented.
4.9.4.2 Resolve problem (correction at root cause)
• Work will continue on the problem to ensure the root cause is identified and the problem resolved.
• The Level 2, Level 3 SMEs will document the resolution in Remedy for quicker diagnosis of similar Problem tickets in the future. If applicable, the resolution will be incorporated into the BRAIN Knowledgebase for future calls. Multiple tickets for associated problems/incidents will be related to one parent Remedy ticket as necessary.
• The resolution will be tested, scheduled through the change control process, and implemented.
4.9.4.3 Update Customer and Verify Resolution
• Once the Problem/Incident is resolved, the resolving Help Desk agent or Level 2, Level 3 SME will verify with the L&I Customer/Employee that it has been resolved. The resolving agent or technician will then resolve the ticket in Remedy.
4.9.5 Survey and Follow-up Problem/Incident
4.9.5.1 Survey End User
• When a Problem/Incident is successfully resolved in Remedy, a Remedy survey will be electronically sent to the L&I Customer/Employee as a follow up to the Problem/Incident resolution.
• The L&I Customer/Employee has the option of filling out the survey and commenting. The results are returned to the Help Desk Manager, Help Desk Coordinator and LINKS Lead Help Desk agent to be reviewed for possible follow up. The appropriate Level 2, Level 3 SME Supervisor may be contacted when the survey warrants further follow up action.
4.9.5.2 Conduct Problem/Incident Review meetings and analyze reports
• Produce OIT Incident Report and review report with management.
• Monthly reports are generated through Remedy and the LINKS ACD System.
o Top Ten Issues – Category, Type and Item
o Links - HD Services Rpt
o Call and Ticket statistics
o Survey Report
o Close Ratio
o Tickets per Agent
• The L&I Help Desk Manager will conduct monthly review meetings to review all Problem tickets, and review reports (generated from Remedy).
4.10 Problem Management Guidelines
4.10.1 Remedy Problem/Incident Priority Levels Matrix
The problem/incident priority levels are set depending on their source.
• An incident occurs for the end user and is reported to the LINKS Help Desk via telephone call. These problems/incidents are assigned a priority level by the LINKS Help Desk agent following the table in 4.10 Figure 8, Remedy Problem/Incident Priority Levels Matrix.
• OIT staff are alerted to a Problem/Incident. These problems/incidents are assigned a priority level by the Level 2, Level 3 Subject Matter Experts (SMEs) following the table in 4.10 Figure 8, Remedy Problem/Incident Priority Levels Matrix.
• Tivoli identifies a problem and TEC generates an event. Based on the server’s risk assessment and the severity of the event in TEC, a ticket may or may not be opened in Remedy. The source of an event from Tivoli may determine its initial severity. If the source does not set the severity, it will be determined by the default settings for the event class in the Tivoli Enterprise Console (TEC). For tickets opened in Remedy, Remedy sets the trouble ticket priority based on the Help Desk Priority/Event Management Matrix for Servers, 4.10 Figure 9.
|Priority |Scope of Impact |Impact |Resolution Time |
|Urgent |Location or System Down |Critical (Entire office or Location, Impacts a large number of users) |0-2 Hours |
|High |Component Down or Degraded |Severe (Impacts a number of users) |2-4 Hours |
|Medium |Component Down or Degraded |Minimal (Impact to a single user) |4+ Hours - Day |
|Low |None, component is functional |None (Impact viewed as an inconvenience to a single user) |Day(s) |
4.10 Figure 8: Remedy Problem/Incident Priority Levels Matrix
2 Help Desk Priority/Event Management Matrix for Servers
The Help Desk Priority/Event Management Matrix defines how events generated through TEC are mapped to the Remedy Priority Level. In most cases, this is done by checking the “risk level” assigned to the server in the Remedy Asset Record.
For example, a TEC Severity Level of ‘Critical’ and a Risk Level of ‘Medium’ will produce a Remedy Help Desk priority level of ‘High’. In cases where a Help Desk ticket needs to be entered manually, only the Risk Level assignment of the server will be used to set the Help Desk Priority Level.
|Help Desk Priority Matrix |
|TEC Severity |Set Help Desk Priority as Shown Below |
| |Risk Level High |Risk Level Medium |Risk Level Low |Risk Level Blank |
|Warning |High |Medium |Low |Medium |
|Critical |Urgent |High |Medium |High |
|Fatal |Urgent |Urgent |High |Urgent |
|None (Manually Created HD Ticket) |Urgent |High |Medium |N/A |
4.10 Figure 9: Help Desk Priority/Event Management Matrix for Servers
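The mapping in 4.10 Figure 9 can be read as a two-key lookup. The following is a minimal Python sketch of that lookup, not the actual Remedy or TEC configuration. The names HELP_DESK_PRIORITY, MANUAL, BLANK and help_desk_priority are hypothetical and were chosen only for illustration.

```python
# Hypothetical lookup table transcribed from 4.10 Figure 9.
# The keys are illustrative labels, not Remedy or TEC identifiers.
MANUAL = "None"   # row for a manually created ticket (no TEC event)
BLANK = "Blank"   # column for a server with no risk level assigned

HELP_DESK_PRIORITY = {
    "Warning":  {"High": "High",   "Medium": "Medium", "Low": "Low",    BLANK: "Medium"},
    "Critical": {"High": "Urgent", "Medium": "High",   "Low": "Medium", BLANK: "High"},
    "Fatal":    {"High": "Urgent", "Medium": "Urgent", "Low": "High",   BLANK: "Urgent"},
    # The matrix lists N/A for a manual ticket with a blank risk level,
    # so that combination is deliberately left out.
    MANUAL:     {"High": "Urgent", "Medium": "High",   "Low": "Medium"},
}


def help_desk_priority(tec_severity: str, risk_level: str) -> str:
    """Return the Help Desk priority for a TEC severity and server risk level."""
    try:
        return HELP_DESK_PRIORITY[tec_severity][risk_level]
    except KeyError:
        raise ValueError(
            f"No priority defined for severity={tec_severity!r}, risk={risk_level!r}"
        ) from None


# The example from the text: TEC severity 'Critical' with risk level 'Medium'.
print(help_desk_priority("Critical", "Medium"))  # High
```

Combinations the matrix does not define, such as a manually created ticket for a server with a blank risk level, raise an error in this sketch rather than guessing a priority.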
3 Tivoli Response Framework Matrix
Problems/incidents that are generated through TEC will be escalated based on the server’s risk assessment and the severity of the event, as defined in the Tivoli Response Framework Matrix below.
For example, if a server risk level is set to “High” and the TEC event is determined to be either critical or fatal, the following actions will be taken:
1. An Alarm Point Call will be generated 24/7 AND
2. A Remedy Ticket will be created using the Help Desk Priority Matrix AND
3. A Text Page will be generated AND
4. An Email will be generated AND
5. The event will be displayed on the TEC console
In a second example, if a server risk level is set to “High” and the TEC event is determined to be either warning or minor, the following actions will be taken:
1. A Remedy Ticket will be created using the Help Desk Priority Matrix AND
2. An Email will be generated AND
3. The event will be displayed on the TEC console
|Server Risk Level / TEC Event Severity |High Priority |Medium Priority |Low Priority |
|Critical/Fatal |Alarm Point Call/Operator Call (24/7); Remedy Ticket; Text Page; E-Mail; TEC Console |Alarm Point Call/Operator Call (Work Hours); Remedy Ticket; Text Page; E-Mail; TEC Console |Alarm Point Call/Operator Call (Work Hours); Remedy Ticket; Text Page; E-Mail; TEC Console |
|Warning/Minor |Remedy Ticket; E-Mail; TEC Console |TEC Console |TEC Console |
|Harmless/Unknown |TEC Console |TEC Console |TEC Console |
4.10 Figure 10: Tivoli Response Framework Matrix
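The response matrix in 4.10 Figure 10 can likewise be sketched as a lookup from server risk level and TEC event severity to a list of actions. This is an illustrative Python sketch only. SEVERITY_GROUP, RESPONSE_MATRIX and response_actions are hypothetical names, and the action strings are simply the labels from the figure, not calls into Tivoli, Remedy or Alarm Point.

```python
# Hypothetical transcription of 4.10 Figure 10; not an actual Tivoli rule base.
CALL_24_7 = "Alarm Point Call/Operator Call (24/7)"
CALL_WORK_HOURS = "Alarm Point Call/Operator Call (Work Hours)"
FULL_RESPONSE = ["Remedy Ticket", "Text Page", "E-Mail", "TEC Console"]

# Each TEC severity falls into one of the three row groups used in the figure.
SEVERITY_GROUP = {
    "Critical": "Critical/Fatal",
    "Fatal": "Critical/Fatal",
    "Warning": "Warning/Minor",
    "Minor": "Warning/Minor",
    "Harmless": "Harmless/Unknown",
    "Unknown": "Harmless/Unknown",
}

RESPONSE_MATRIX = {
    ("High", "Critical/Fatal"): [CALL_24_7] + FULL_RESPONSE,
    ("Medium", "Critical/Fatal"): [CALL_WORK_HOURS] + FULL_RESPONSE,
    ("Low", "Critical/Fatal"): [CALL_WORK_HOURS] + FULL_RESPONSE,
    ("High", "Warning/Minor"): ["Remedy Ticket", "E-Mail", "TEC Console"],
    ("Medium", "Warning/Minor"): ["TEC Console"],
    ("Low", "Warning/Minor"): ["TEC Console"],
    ("High", "Harmless/Unknown"): ["TEC Console"],
    ("Medium", "Harmless/Unknown"): ["TEC Console"],
    ("Low", "Harmless/Unknown"): ["TEC Console"],
}


def response_actions(risk_level: str, tec_severity: str) -> list:
    """Return the actions listed in the matrix for a risk level and severity."""
    group = SEVERITY_GROUP[tec_severity]
    # Return a copy so callers cannot modify the shared table.
    return list(RESPONSE_MATRIX[(risk_level, group)])


# The second example from the text: risk level High, severity warning or minor.
for action in response_actions("High", "Warning"):
    print(action)
```

Every cell in the figure includes the TEC Console display, so the sketch always returns at least that one action for a known risk level and severity.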
4 Help Desk Brain Knowledgebase Entries
When the need for a new Brain Knowledgebase entry is identified, the following outlines the necessary steps.
• Notify the Help Desk Manager and Help Desk Coordinator
• The Help Desk Manager and Coordinator will email the Brain Knowledgebase Entry Template to the requestor
• Create the new Entry using the template as a guide
• Email the new Entry to the Help Desk Manager and Coordinator
• The new Entry will be reviewed by the Help Desk Manager, Coordinator and the LINKS Help Desk Lead Agent
• If no corrections or additions are necessary the new Entry will be scheduled, then added to the Brain Knowledgebase
• If corrections or additions were made to the new Entry, it will be sent back to the requestor for review and approval
• The new Entry is then emailed back to the Help Desk Manager and Coordinator
• It is reviewed once again by the Help Desk Manager, Coordinator and LINKS Lead Help Desk Agent
• The LINKS Lead Agent schedules and adds the new Entry to the Brain Knowledgebase
11 Problem Management Metrics
The following current reports provide Problem/Incident Management metrics, which can measure the effectiveness of the process:
|Key Performance Indicators |
|Number of Tickets generated per Day, week, and Month. |
|Number of Tickets resolved Monthly at Level 1 |
|Monthly Top 10 Category of tickets created using CTIs. |
|Number of Tickets processed per agent Monthly |
|Number of Tickets escalated to Level 2 Monthly |
|Monthly Satisfaction Surveys |
|Monthly LINKS Help Desk Services Report |
|Number of Tickets generated by Tivoli daily |
|Remedy Problem Tickets by Category |
|Applications |
|Hardware |
|Network |
|Remote Access |
|Restore |
|Security |
|Server |
|Voice |
|Web Services Event |
|Problem Priority |
|1 - Urgent |
|2 - High |
|3 - Medium |
|4 - Low |
12 Problem/Incident Management Tool Capabilities
The following are capabilities that have been evaluated and implemented:
• Request/problem status communication
• Ability to assign priority to problems
• Interface with Tivoli Event Management Tool to create Problem tickets from Events
• Ability to provide current status of all tickets
• Ability to forward ticket based on escalation status matrix
• Ability to provide status and analysis reports
• Logging of Problem data in a database
• Access to Asset Management information
• Automated notification when problems are transferred from queue to queue
• Simple and quick entry and update of problem tickets
• High availability for the Remedy application and data
• Solicit and retrieve customer satisfaction information via Satisfaction Survey emailed to user after resolution of Problem/Incident
• Ability to design custom reports to extract desired data
Appendix A – Sample Problems
1 L&I Enterprise Problem/Incident Examples
The following are Problem/Incident examples given the use of current CTIs within Remedy:
• Security/Password/CWOPA - CWOPA security password resets and/or account unlocks
• Voice/Voice/Dial Tone - Network problems experienced by a site concerning phone issues
• Network/Connection/Router - Network problems experienced by a site related to connectivity issues and/or performance problems
• Hardware/Network Printer/IBM-Lexmark - Local and/or network printer issues
• Applications/CWDS-BWDP/Staff Access - Application problems concerning CWDS
• Applications/UCMS-BBAD/Staff Access - Application problems concerning UCMS
• Applications/DeskTop/Adobe - Application problems encountered locally on user’s PC
• Applications/Operating Systems/Windows 2000 - Application problems encountered concerning operating system errors or performance issues
• Applications/Email/Outlook-CWOPA - Application problems encountered concerning email operation
• Hardware/PC/Hard Disk Drive - Computer hardware problems encountered by users related to hard disk drive errors
• Hardware/PC/Network Card - Computer hardware problems encountered by users specifically related to network connectivity
• Server/Hardware/Hard Disk Drive - Server Hardware problems experienced by users generated by faulty hard disk drives
• Hardware/MainFrame/CPU - Mainframe system problems
Appendix B – Acronyms
1 L&I Acronyms
|Acronym |Definition |
|ACD System |Automatic Call Distribution System |
|BBAD |Bureau of Business Application Development |
|BBAD/CI |Bureau of Business Application Development/Compensation and Insurance Division |
|BBAD/WFD |Bureau of Business Application Development/Workforce Development Division |
|BBAD/UC |Bureau of Business Application Development/Unemployment Compensation Division |
|BBAD/OVR |Bureau of Business Application Development/Occupational and Vocational Rehabilitation Division |
|BBAD/SLMR |Bureau of Business Application Development/Safety and Labor-Management Relations Division |
|BEA |Bureau of Enterprise Architecture |
|BEA/DMDB |Bureau of Enterprise Architecture/Data Management and Database Management Division |
|BEA/ERD |Bureau of Enterprise Architecture/Engineering and Research Division |
|BEA/SDCD |Bureau of Enterprise Architecture/Standards Development and Compliance Division |
|BES |Bureau of Enterprise Services |
|BES/CoE |Bureau of Enterprise Services/Business Center of Excellence Division |
|BES/CRD |Bureau of Enterprise Services/Customer Relations Division |
|BES/PMD |Bureau of Enterprise Services/Project Management Division |
|BES/SD |Bureau of Enterprise Services/Security Division |
|BIO |Bureau Infrastructure and Operations |
|BIO/ID |Bureau Infrastructure and Operations/Infrastructure Division |
|BIO/NSS |Bureau Infrastructure and Operations/Network Support Services Division |
|BIO/SFO |Bureau Infrastructure and Operations/Server Farm Operations Division |
|BIO/MFO |Bureau Infrastructure and Operations/Mainframe Operations Division |
|BWDP |Bureau of Workforce Development Partnership |
|CIO |Chief Information Officer |
|CTI |Category, Type, Item |
|CWDS |Commonwealth Workforce Development System |
|ESM |Enterprise System Management |
|IS/IT |Information Systems/Information Technology |
|IT |Information Technology |
|ITIL |Information Technology Infrastructure Library |
|MAN |Metropolitan Area Network |
|OIT |Office of Information Technology |
|PIC |Problem/Incident Coordinator |
|PM/IM |Problem Management/Incident Management |
|SLA |Service Level Agreement |
|SME |Subject Matter Expert |
|SOA |Service Oriented Architecture |
|SPOC |Single Point of Contact |
|TEC |Tivoli Enterprise Console |
|UCMS |Unemployment Compensation Modernization System |
|WAN |Wide Area Network |
[Figure: ESM Process Integration. Processes shown: Availability Management Process; Release/Software Distribution Process; Security Process; Performance/Capacity Management Process; Event Management System; Service Level Management Process; Asset/Configuration Management Process; Backup/Recovery Process; Help Desk/Problem Process; Change Management Process]