


Escalation Procedure (rev October 27, 2003)

|Jump to Severity Three crisis management |

|Definitions |

|Assignment and Aging |

|Sample problems and their severity levels |

|Back to Main On Call Page |

Escalation Definitions

Following are the definitions for the four severity levels referenced in this escalation procedure.

There is also a flow picture of how severity levels might change for a given problem.

Severity 0: Lowest impact. Default for unassigned problems. Requires no active assignment in a database. Definition: "Minor Failure [Single user, non-critical facility, not clustered patient care]".

Severity 1: May be assigned by agent upon report. Definition: "Multiple Users, or Single in Critical Area".

Severity 2: Equivalent of severity 1, with limited direct assignment by agent. This level is typically a severity 1 problem that has been escalated because its expected repair time is greater than 12 business hours. Definition: Severity 1 definition with repair expected to be greater than 12 business hours from the time of the report.

Severity 3: Major failure; CBX or Voicemail node down. Strategic business unit network down. Backbone down. Internet connection down.
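As an illustration only, the relationship between severity 1 and severity 2 can be sketched in Python. The names `Severity`, `aged_severity`, and the enum member names are assumptions made for this sketch and do not come from any NTOC system; only the 12-business-hour threshold is taken from the definitions above.

```python
from enum import IntEnum

# Threshold taken from the Severity 2 definition (business hours from time of report).
AGING_THRESHOLD_BUSINESS_HOURS = 12


class Severity(IntEnum):
    """Hypothetical labels for the four levels defined in this procedure."""
    MINOR = 0     # single user, non-critical facility
    MULTIPLE = 1  # multiple users, or single user in a critical area
    AGED = 2      # severity 1 with repair expected to exceed 12 business hours
    MAJOR = 3     # CBX/voicemail node, backbone, or Internet connection down


def aged_severity(reported: Severity, elapsed_hours: float, remaining_hours: float) -> Severity:
    """Apply the severity 1 -> severity 2 ageing rule.

    elapsed_hours: business hours since the report.
    remaining_hours: current estimate of business hours still needed for repair.
    Other levels are returned unchanged in this sketch; their own escalation
    rules are described in their respective sections.
    """
    expected_total = elapsed_hours + remaining_hours
    if reported == Severity.MULTIPLE and expected_total > AGING_THRESHOLD_BUSINESS_HOURS:
        return Severity.AGED
    return reported
```

For example, a severity 1 problem that is 6 business hours old with an estimated 7 more hours of repair has an expected total of 13 hours, so the sketch returns severity 2.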

There are two e-mail lists used to communicate and escalate problems based on their severity. These messages are sent from the NTOC. One of the email lists sends the 33999 page and the other list is used to send a text message to a VIP list.

Escalation Flows

[Flow diagram: an external report's initial assignment and ageing across severity levels 0, 1, 2 and 3.]


Severity 0 : Minor Failure [Single user, non-critical facility, not clustered patient care]

Voice Examples:

NEC phone, Analog line, Metered Business Line (no dial tone, features not working, etc.);

VoiceMail box (can't access, lost messages);

A single individual's pager is not working.

A single student authorization code is not functioning.

Network Examples:

Soft failure degradation of departmental network performance (network functioning but

not efficiently), or intermittent component failure. Network problem clearly identified to be

within the department's internal network.

Patient Care Examples:

Single patient unit phone, or multiple patient unit phones where there is remaining service,

such that patient care has minimal impact.

Response time:

Average 4 elapsed hours, up to 12 business hours. Discretionary negotiation between

Manager On Call and patient unit leader. Staff phones serviced same evening. Patient

phones escalated through the patient phones telephone # x50143.

Notification / Business Hours:

Network and Telecommunications Operations Center (NTOC) > technician.

Problem owner is responsible for communication with customer and Network and Telecommunications Operations Center (NTOC).

Notification / After hours:

Network and Telecommunications Operations Center (NTOC) via Answering Service >

Manager On Call >

customer >

Network and Telecommunications Operations Center (NTOC) evening reporting number @ 3-1159. >

tech >

customer >

Manager On Call >

Closure report to Network and Telecommunications Operations Center (NTOC) @ 3-1159.

Escalation:

At any point, if trouble is determined to have propagated to more than one user, trouble will be immediately escalated to a "severity 2 or 1", as appropriate, by the technician dispatched. Technician and Manager On Call discuss escalation to telecommunications engineer on call.

Response > 12 business hours? = escalation to severity 1
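The two severity 0 escalation triggers above can be sketched as a small decision function. This is a minimal sketch, not an NTOC tool: the function name, its parameters, and the `technician_selects_two` flag (standing in for the dispatched technician's judgement between severity 1 and 2) are assumptions for illustration.

```python
def escalate_from_severity_zero(
    users_affected: int,
    response_business_hours: float,
    technician_selects_two: bool = False,
) -> int:
    """Return the severity level a severity 0 problem should carry.

    - More than one user affected: escalate to 1 or 2, as the technician
      judges appropriate (modelled here by a hypothetical flag).
    - Response exceeding 12 business hours: escalate to 1.
    - Otherwise the problem stays at severity 0.
    """
    if users_affected > 1:
        return 2 if technician_selects_two else 1
    if response_business_hours > 12:
        return 1
    return 0
```

A single-user problem open for 4 business hours stays at severity 0; the same problem open for 13 business hours becomes severity 1.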

Comments:

Customer and Network and Telecommunications Operations Center (NTOC) will be advised of status if problem is not projected to be resolved within 12 business hours. Long-standing problem status will be provided to Network and Telecommunications Operations Center (NTOC) Triage position and customer by end of business day by the problem owner*1.


Severity 1: Multiple Users, or Single in Critical Area

Voice Examples:

Can't access area code (software config); several phones down in an area (CBX group);

T line down to medical off site location; any problem, single critical user or area. Multiple

critical phones out of service in given area;

A single pager is not operational for a person who is on-call.

Network Examples:

Hard failure of important network components (department LAN, network router interfaces,

SLA devices, single switch, internet connectivity or single modem pool server).

Security Examples:

Limited display of individualized security concerns on individual, isolated machines. See Incident Response Team or other actions following the generalized response time section.

See response section below.

Patient Care Examples:

Three or more patient unit phones, or multiple staff phones in patient unit where business

is impacted and patient care may degrade as a result.

Response time:

Immediate dispatch for escalated trouble ticket (>12 business hours). On site within 1.5 to 2 hours (for non-escalated trouble tickets). Average response in 4 elapsed hours, up to 6 hours before escalation to severity two or three, as appropriate. Discretionary negotiation between Manager On Call and patient unit leader. Staff phones serviced same evening. Patient phones escalated through the patient phones telephone #. Critical phones on the Emergency Preparedness list require response with two way radios (or cellular, if the customer prefers). Immediate response by technician who troubleshoots the system suspected. Contacts the customer if required for further information to assess the problem. Determine problem, advise Network and Telecommunications Operations Center (NTOC), resolve.

Technician/engineer assigned will be dedicated to problem until it is resolved.

Network and Telecommunications Operations Center (NTOC) immediately advises customers. If after one hour from start of problem determination, problem cause not identified, consult with switch tech and/or engineering. Switch tech assumes ownership of the problem (even if engineering consulted). Switch Tech and Engineer assess the problem, advise the Network and Telecommunications Operations Center (NTOC), fix the problem. If appropriate, trouble escalated to vendor.

Security Incident Response Team Actions:

Upon receiving notification of a serious security vulnerability, the Engineer On Call e-mails notification to its.telecomhelp@rochester.edu and works with the Incident Response Team as well as any desktop support staff to address the localized problem and keep it localized.

Notification / Business Hours:

Network and Telecommunications Operations Center (NTOC) > technician

Problem owner is responsible for communication with customer, Manager On Call,

and Triage at Network and Telecommunications Operations Center (NTOC).

Notification / After hours:

Network and Telecommunications Operations Center (NTOC) via Answering Service >

Manager On Call >

customer >

------ assessment of severity two, potential escalation

------ and notification per severity one guidelines

----- if affecting patient care, potential consult with

----- AOC to validate impact and whether a need to

----- escalate

Network and Telecommunications Operations Center (NTOC) evening reporting number @ 3-1159. >

tech and engineer >

customer >

Manager On Call >

Closure report to Network and Telecommunications Operations Center (NTOC) @ 3-1159.

If severity two, the Network and Telecommunications Operations Center (NTOC) or Manager On Call may opt to initiate a severity page following the NTOC response process (at the end of this section). After-hours the Manager On Call may choose to initiate a call-tree.

Escalation:

At any point, if trouble is determined to have propagated to strategic business units (e.g. Emergency or event locations) or multiple users, trouble will be immediately escalated to a severity two or three, as appropriate, as assessed by the technician and/or engineer on site. Technician and Manager On Call discuss escalation to telecommunications engineer on call. Consultation with AOC may occur to assess.

Comments:

Customer and Network and Telecommunications Operations Center (NTOC) will be advised of status if problem is not projected to be resolved within 12 business hours. Long-standing problem status will be provided to Network and Telecommunications Operations Center (NTOC) Triage position and customer by end of business day by the problem owner*1.


Severity 2: Multiple Users, or Single in Critical Area with response time likely to exceed the initial 12 hours from time of report.

Voice Examples:

Can't access area code (software config); several phones down in an area (CBX

group); T line down to medical off site location; any problem, single critical user

or area. Multiple critical phones out of service in given area.

Network Examples:

Hard failure of important network components (department LAN, network router

interfaces, SLA devices, single switch, internet connectivity or single modem

pool server).

Security Examples:

Display of security concerns on machines with somewhat limited impact to the business environment. Limited danger to University assets. Presents a level of inconvenience. See Incident Response Team or other actions following the generalized response time section. Specific examples include port scans, large numbers of virus infected e-mail messages, reports of attempted exploit originating from the UofR. See security response section below.

Patient Care Examples:

Three or more patient unit phones, or multiple staff phones in patient unit where

business is impacted and patient care may degrade as a result.

Response time:

Immediate dispatch. On site within 1.5 to 2 hours with sniffers or other complex diagnostic devices. Average response in 4 elapsed hours, up to 6 hours before escalation to severity three. Discretionary negotiation between Manager On Call and patient unit leader. Staff phones serviced same evening. Patient phones escalated through the patient phones telephone #. Critical phones on the Emergency Preparedness list require response with two way radios (or cellular, if the customer prefers). Immediate response by technician who troubleshoots the system suspected. Contacts the customer if required for further information to assess the problem. Determine problem, advise Network and Telecommunications Operations Center (NTOC), resolve. Technician/engineer assigned will be dedicated to problem until it is resolved.

Network and Telecommunications Operations Center (NTOC) immediately advises customers. ITS web site is updated. If after one hour from start of problem determination, problem cause not identified, consult with switch tech and/or engineering. Switch tech assumes ownership of the problem (even if engineering consulted). Switch Tech and Engineer assess the problem, advise the Network and Telecommunications Operations Center (NTOC), fix the problem. If appropriate, trouble escalated to vendor.

Security Incident Response Team Actions:

Upon receiving notification of a serious security vulnerability as previously described, the Engineer On Call e-mails notification to its.telecomhelp@rochester.edu and works with the Incident Response Team as well as any desktop support staff to address the problem and reduce further spread. Consider whether escalation to severity three is called for. If so, determine whether we communicate via abuse@rochester.edu and urcert@utd.rochester.edu mail lists.

Notification for other than a Security Issue / Business Hours:

Network and Telecommunications Operations Center (NTOC) > technician

Problem owner is responsible for communication with customer, Manager On

Call, and Triage at Network and Telecommunications Operations Center (NTOC).

Notification for other than a Security Issue / After hours:

Network and Telecommunications Operations Center (NTOC) via Answering Service >

Manager On Call >

customer >

------ assessment of severity two, potential escalation

------ and notification per severity one guidelines

----- if affecting patient care, potential consult with

----- AOC to validate impact and whether a need to

----- escalate

Network and Telecommunications Operations Center (NTOC) evening reporting number @ 3-1159. >

tech and engineer >

customer >

Manager On Call >

Closure report to Network and Telecommunications Operations Center (NTOC) @ 3-1159.

If severity two, the Network and Telecommunications Operations Center (NTOC) or Manager On Call may opt to initiate a severity page following the NTOC response process (at the end of this section). After-hours the Manager On Call may choose to initiate a call-tree.

Escalation:

At any point, if trouble is determined to have propagated to strategic business units (e.g. Emergency or event locations) or multiple users, trouble will be immediately escalated to a "severity 3", as appropriate, by the technician and/or engineer on site. Technician and Manager On Call discuss escalation to telecommunications engineer on call. Consultation with AOC may occur to assess.

Comments:

Customer and Network and Telecommunications Operations Center (NTOC) will be advised of status if problem is not projected to be resolved within 12 business hours. Long-standing problem status will be provided to Network and Telecommunications Operations Center (NTOC) Triage position and customer by end of business day by the problem owner*1.


Severity 3 : Major Failure; CBX or VoiceMail Node down. Strategic (critical) business unit network down. Backbone down. Internet connection down. Serious information security problem affects multiple clients and has non-trivial impact creating outages versus inconvenience. Information security problems that include serious disruption to business activities, exemplified by problems such as worms and security vulnerability exploits, especially those launched from the UofR community.

IMMEDIATE Organizational Actions, if voice services impacted.

1. Upon receipt of a severity three condition, during daytime business hours, the Network and Telecommunications Operations Center (NTOC) notifies a Senior Manager of the severity three and assigns a scribe to record events in a chronology. During non-business hours, the Manager On Call notifies a Senior Manager. Notifier clearly states "severity three emergency condition".

2. Chain of command defined.

3. The Senior Manager determines a designated location for the response team.

4. Manager presence in the affected area, as appropriate.

5. Lead Technical Engineer assigned.

6. Two way radios are distributed as follows:

a) Network and Telecommunications Operations Center (NTOC);

b) Manager;

c) Emergency Operations Center (EOC) rep;

d) Lead technical engineer;

e) Comm Center (as appropriate);

and / or

f) Area runner and walkabout. Two-way radios will be available locked in the Network and Telecommunications Operations Center (NTOC) for this purpose.

g) Each holder of a radio must be familiar with two-way radio guidelines. Note that the key to the cabinet holding the radios is in the Network and Telecommunications Operations Center (NTOC) Team Leader's desk drawer. It is labeled radios.

7. Emergency Operations Center phone number list distributed to individuals (either x50500, or two-way radio). Non-University phone service at this location.

8. 15 minute updates initiated.

9. Debrief document available within 72 hours of re-institution of service.

Response Time:

Immediate and continuous effort. Immediate technician dispatch and engineering involvement. Immediate response by technician to Network and Telecommunications Operations Center (NTOC) who advises status every 15 minutes. Work begins immediately. Identify, report and resolve problem. Technician/engineer assigned to this problem will be dedicated to its resolution until fixed. Complex diagnostic gear is immediately brought to both ends of communications points.

Security Incident Response Team Actions:

Upon receiving notification of a serious security vulnerability as previously described, the Engineer On Call e-mails notification to its.telecomhelp@rochester.edu and works with the Incident Response Team as well as any desktop support staff to address the problem and reduce further impact. Communicate via abuse@rochester.edu and urcert@utd.rochester.edu mail lists. Engineer On Call notifies the Manager On Call. Manager On Call notifies Senior Managers.

Notification for other than a Security Issue:

Network and Telecommunications Operations Center (NTOC) > engineer > Senior Manager, Director, and University Administration, as appropriate.

Immediate notification of critical areas affected by Senior Telecommunications Engineer. Engineer responsible for 15 minute updates* to Network and Telecommunications Operations Center (NTOC). Network and Telecommunications Operations Center (NTOC) or engineer communicates with critical customers every 15 minutes. Immediate notification to Security Dispatch as a "condition utility / telecommunication".
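To make the 15 minute update cadence concrete, the following minimal sketch lists when status updates would fall due after a severity 3 condition is declared. The function name `update_times` and its parameters are hypothetical, and the start time in the usage note is an arbitrary example rather than a real incident.

```python
from datetime import datetime, timedelta
from typing import List


def update_times(declared_at: datetime, count: int, interval_minutes: int = 15) -> List[datetime]:
    """Return the next `count` update deadlines after a severity 3 is declared.

    The 15 minute default mirrors the update interval stated in this
    procedure; the first update falls one interval after declaration.
    """
    step = timedelta(minutes=interval_minutes)
    return [declared_at + step * i for i in range(1, count + 1)]
```

For a condition declared at 09:00, `update_times(datetime(2003, 10, 27, 9, 0), 4)` yields deadlines at 09:15, 09:30, 09:45 and 10:00.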

Comments:

*If delay is caused by equipment availability, tech will track shipment and notify Network and Telecommunications Operations Center (NTOC) as soon as repair can be scheduled

Expectations:

Whenever a "hand-off" occurs, tech rep and engineer will ensure that problem ownership is clear and communicated to the Network and Telecommunications Operations Center (NTOC).

Problem Owner* Duties:

*1(technician or engineer until TT returned to Network and Telecommunications Operations Center (NTOC))

1 - Informing the Network and Telecommunications Operations Center (NTOC) at regular intervals: at least every hour, and as often as every 15 minutes, for Severity 3 status; at end of day, or as possible, for Severity 1 or 2; for an ongoing problem, by end of business day to Triage.

2 - Network and Telecommunications Operations Center (NTOC) sends a severity page following the NTOC response process (at the end of this section).

After hours communications protocol:

Network and Telecommunications Operations Center (NTOC) >

On Call Rep >

customer >

----- Notification to Security Dispatch for condition utility occurs here -----

Network and Telecommunications Operations Center (NTOC) after hours @ 3-1159 >

tech/engineer >

customer >

On Call Rep.

1 - Updating "Notification" parties identified above. Tracking accumulated tech and engineer labor and materials for repair.

2 - Verifying billing with appropriate tech supervisor or backup.

3 - Providing all data to the Network and Telecommunications Operations Center (NTOC) at close out.

4 - Filing, or categorizing in AimWorX, any trouble tickets for SLA customers.

5 - Entering information into tracking dbase (as defined above).


Sample Problems and Their Severity level (under construction)

|Sample Problem Description |Severity Level |
|dead phone, not on critical list |0 |
|single report of inability to connect to VPN and therefore to the University network |0 |
|single voicemail box problem, not on critical list |0 |
|inability to connect to VPN determined to be a multiple user problem. VPN never exceeds this severity level as a service that is not mission critical. |1 |
|notification of serious security vulnerability that has not implemented itself. |1 |
|pager broken for person who is on-call |1 |
|a pager is broken for a person on-call, the problem is 6 hours old, and estimated time for repair is > 6 more hours. Thus total response exceeds 12 business hours and problem became a sev 2 when that estimate was first known. |2 |
|Vulnerability exploit which has begun implementing itself and has created inconveniences rather than serious impacts such as an outage. |2 |
|Auth Code Manager is down (though sev 2 is typically an ageing from sev 1, this problem was deemed more critical than the usual sev 1, while keeping it from the glut of sev 3 conditions that may require a SWAT team response). |2 |
|An FPC (processor) in the PBX is down |3 |
|Voicemail is down |3 |
|URNet backbone is down |3 |
|Worm or security vulnerability exploit in progress, with outage impact or other serious disruption to mission critical services. Includes those both affecting and launched from the UofR community. |3 |
|Ability to call out, whether long distance, or local, is impaired by congestion in the public telephone network (therefore affects multiple users in critical areas) |3 |



Published by the University of Rochester Telecommunications Division. Copyright 2000.

Daytime

Severity 3 - Call Center Responsibilities

Definition - Severity 3 : Major Failure; PBX or VoiceMail Node down. Strategic (critical) business unit network down. Backbone down. Internet connection down.

IMMEDIATE Organizational Actions, if voice services impacted.

Upon receipt of a severity 3 condition, during daytime business hours, the Call Center notifies a Senior Manager and the Director of the severity 3 and assigns a scribe to record events in a chronology.

1. A SEV3 page will be issued.

2. _______________________________

Identify the engineering lead responsible for responding to / communicating regarding the severity. The engineering lead will be based upon the type of outage and the on-call schedule.

3. _______________________________

Assign an NTOC partner role of lead communicator to someone in the Call Center: either a Senior Call Agent, Call Center Manager, or Triage. That person will identify themselves to Kate; if she is unavailable, this role will be identified to Norm. If neither is on site, the escalation is to David Lewis.

4. _______________________________

Establish communication expectation with Dave, Kate & Norm - status updates every 20 minutes (or other acceptable interval).

5. Open a trouble ticket - a new trouble ticket will be opened for each report if it is necessary to track specific information to assess and correct the outage.

6. _______________________________

Assign chronology duties to track the "history" of the event.

In the chronology, include:

non-technical description of what happened (customer experience)

technical description

clear understanding of: is the problem solved

if not, who is following up

7. Notify the Directory Service Agents – provide them a script to use for callers.

8. Notify the Front Office – provide them a script to use for callers.

9. The NTOC lead assigned in Step 3 will be responsible for email notification to the proper list(s) (pager, phonedown, netdown, etc).

Prepare an email notification.

Ask Engineering Lead to approve.

Provide notification to either Kate or Norm to review.

Suggest the lists that need to be accessed.

_______________________________ David Lewis

_______________________________ Med Ctr Director’s office

(Julie Choate (x54601), Roberta Parker)

_______________________________ ‘Phonedown’

_______________________________ CIOs office

(Maureen Baisch (x55240))

_______________________________ President’s Office

_______________________________ Provost’s Office

10. Copies of each communiqué to chron keeper. The phonedown & netdown communiqués will be sent by the NTOC communication lead. If they are unable to send the communiqué, alert Kate or Norm, who will assign this function.

11. Call Center staff will ready 2-way radios for possible issuance.

12. Call Center will ready Cell phones for issuance.

13. Determine need for other communication devices.

14. Determine additional needs for follow up email to lists.

Post sev 3 activity:

15. Verify with Engineering & the NTOC that the interruption has been resolved.

16. Notify each of the lists & communication points that the interruption has been resolved. Work with Kate & Mike to

17. Follow up with each customer that has a trouble ticket that the issue has been resolved.

18. Prepare & email chron to mgrs & appropriate engineers. Engineering prepares any additional documentation required. Post document to On-call/Reporting folder on server.
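The chronology contents listed under step 6 could be captured in a small record like the one below. This is a sketch under stated assumptions: the class name `ChronologyEntry`, its field names, and the `is_accounted_for` check are invented for illustration and do not describe an existing Call Center tool.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ChronologyEntry:
    """One chronology record, mirroring the items listed under step 6."""
    customer_description: str   # non-technical description (customer experience)
    technical_description: str  # technical description of the event
    resolved: bool              # is the problem solved?
    follow_up_owner: Optional[str] = None  # if not solved, who is following up

    def is_accounted_for(self) -> bool:
        """True when the problem is solved or someone is named for follow-up."""
        return self.resolved or bool(self.follow_up_owner)
```

An unresolved entry with no follow-up owner fails the check, which flags the gap before the chronology is emailed in step 18.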
