Definitions



Table of Contents

Definitions
  Incident
  Declaring
Incident Levels
  Priority 1
  Priority 2
  Priority 3
  Priority 4
Mission Critical Services
  Authentication
  Computer Labs
  Email
  Network
  Wireless Network
  Power Disruptions to Campus
  Storage Area Network (SAN) Disruption
Roles and Responsibilities of Team for Priority 1 Outages/Major Incidents
  CIO or Backup
  Incident Commander
  Technical Lead
  User Support Lead
  Communications Lead
  Field Coordinator
  Service Owners
  Subject Matter Expert (SME)
Communications
  Internal Communications Options
  External Communications Options
Reporting
Meeting Location
Logistics
Execution of Work Steps for Priority1 Major Incident
Incident commander duties
  General duties
  Incident commander timed checklist
Check List

Review the Emergency Prep roles and procedures!
Username: it
Password: ?Ic3

Definitions

Incident
An incident occurs whenever a user is not receiving an expected level of service from an IT service.
Expected levels of service are based on Service Level Agreements (SLA).

Major Incident/Outage
A major incident is defined as a significant event, which demands a response beyond the routine, resulting from uncontrolled developments in the course of the operation of any establishment or transient work activity.

Declaring
A major incident is declared when mission critical (university or internal) IT service(s) are not performing at the expected level for a period of 30 minutes, unless defined differently in the SLA or designated otherwise by this plan.

Incident Levels

Priority 1
Mission critical services are not performing for the University. All appropriate resources will be dedicated to restoring service(s).

Priority 2
Mission critical services are not performing for departments or computer labs. Service(s) is not performing at a campus or enterprise level. Appropriate service owners will be dedicated to restoring service(s).

Priority 3
Address the problem and escalate as necessary. These incidents do not require the level of dedication of Priority 1 or 2.

Priority 4
There is a known workaround for the issue. It does not require dedicated resources to resolve.

Mission Critical Services

Authentication
Accepting authentication requests and responses for the following systems:
- Blackboard
- Campus desktop computers
- Central IT maintained computer lab machines
- CUSIS portal
- MyUCCS Portal
- Wireless System

Computer Labs
Disruptions to IT maintained computer lab machines that prevent customers from using the systems.

Email
Email messages not flowing in or out of the following systems:
- Exchange on premise
- Office 365 Cloud Solution
Note: If the service outage is determined to be exclusively a Microsoft issue and UCCS IT personnel have no ability to participate in a resolution, then it will not follow the full major incident procedures. Conceivably only the communication plan will be followed.

Network
Disruptions to campus network systems, including:
- Campus Firewalls
- Campus Routing
- Campus Switches
- Connections in and out or within the El Pomar Data Center
- Connections in and out or within the Columbine Data Center
- Connections in and out or within Main Hall and Cragmor Hall
- External internet connectivity

Wireless Network
Disruptions to the wireless system that prevent customers from using the network.

Power Disruptions to Campus
Any power disruption to the El Pomar or Columbine Data Centers lasting longer than 10 minutes.

Storage Area Network (SAN) Disruption
Any disruption to data flowing in or out of the campus SAN solution.

Roles and Responsibilities of Team for Priority 1 Outages/Major Incidents

CIO or Backup
Authorize resources for the major incident; communicate directly with the Chancellor and UCCS Leadership team; and, if needed, communicate with the President's office.

Incident Commander
Coordinate plan; oversee response; lead meetings; organize meals; and provide funding. See below for detailed description.

Technical Lead
Examine situation; confirm major incident; attempt to identify root cause; work to find technical options; present technical options to team; and participate in the plan where needed.

User Support Lead
Provide information from the user's perspective; provide user support options; contact specialized users; and participate where needed.

Communications Lead
Create plan for messaging including frequency; provide messaging to campus; update ; point person for internal communication; and participate where needed.

Field Coordinator
Provide information from the field; deliver support from the field; participate where needed. Note: depending on the major incident this role may not be needed.

Service Owners
Provide information on services affected; work with technical lead to create options for plan of action.

Subject Matter Expert (SME)
An individual with a high level of overall knowledge of the service impacted, both in terms of general architecture and business service.

Communications

Internal Communications Options
Communications should be sent out from the helpdesk@uccs.edu email address if possible.
- outage@uccs.edu - hosted on lists.uccs.edu (Communigate server - local infrastructure must be working) (Texting and Email)
- outage@ - hosted through (Texting and Email)
- uccshelpdesk@ - help desk communications sent when Exchange is not available
- CenturyLink Conferencing
  - Audio Conferencing
    USA: 1-720-279-0026
    USA/Canada (toll free): 1-877-820-7831
    This will be the Major Incident main line:
  - Web / sharing

External Communications Options
- Students - student-1@uccs.edu (lists.uccs.edu is required)
- Faculty - faculty-l@uccs.edu (lists.uccs.edu is required)
- Staff - staff-l@uccs.edu (lists.uccs.edu is required)
- {needing login information} - Automatically posts to IT Twitter
- UIS - itccop@cu.edu
- Housing Email Lists:
  - Summit-l@uccs.edu
  - Alpine_l@uccs.edu
  - Timberline-l@uccs.edu
- UCCS leadership - Only CIO or backup communicates with leadership team
- University Relations
    Hutton, Tom, 719-255-3439
    Executive Director
    University Advancement - University Communications and Media Relations
    MAIN 301A
    thutton@uccs.edu
- UCCS Twitter
    Denman, Philip, 719-255-3732
    Assistant Director
    University Advancement - University Communications and Media Relations
    MAIN 301
    pdenman@uccs.edu
- Website Alerts {Craig needing information for posting in Ingeniux or how this should be handled}
- Rave (must first check with Tim Stoecklein before posting a message with the system)
    Stoecklein, Tim, 719-255-3106
    Program Director of Emergency Management
    Public Safety Department - Emergency Management
    DPS 208
    tstoeckl@uccs.edu
- Phones
  - Help Desk ACD message
  - Sidecars if necessary
- Media
  University Relations will be the only organization allowed to speak to the media.

Reporting
- When to report
- Who to report to
  - UIS
  - Other CU campuses
  - Chancellor's office
  - President's office

Meeting Location
EPC 139, IT Conference Room
Location needs:
- Phone
- Laptop/Projector
- White board
- Table
- Room and chairs for 10 people
- Extra Ports
- Power

Logistics
- Review Mission Critical Services
- Communications expectations plan
- Communication templates
- Define essential personnel and backups
- Personnel expectations during major incident and after
  - Essential personnel are expected to participate in major incident/outage response. If the incident is after hours, essential personnel are expected to participate if available.
  - Working Time: 16 hours working max or 2 a.m. At the 14-hour mark, or midnight as appropriate, the technical lead must start creating a plan for providing rest to employees. Discuss a break/meal every 4 hours.
  - Food/Drink coordination
  - After the major incident/outage is resolved, if work was conducted after normal business hours, employees will be given hour-for-hour flex time. The employee is expected to take the time, and it must be used within one month from when the work was performed. The Incident Commander will work with the employee's supervisor to coordinate flex time.
- Equipment needs
  Equipment needs shall be coordinated by the Incident Commander.
- Funding
  Funding will be coordinated by the Incident Commander.

Execution of Work Steps for Priority1 Major Incident
Service restoration target is two hours for a Priority 1 Major Incident.

Task / Description / Time

1. Notification and Confirmation
   - Priority 1 Major Incident has been identified
   - Incident has passed the trigger points

2. Notify IT Outage Group
   Send email or text message to outage@ or outage@uccs.edu
   Text Template:
     UCCS IT internal alert
     (ServiceName) is experiencing a service interruption.
     See email for more details.
     Sent by: (Name) (PhoneNumber)
   Email Template:
     Subject: (ServiceName) is experiencing a service interruption.
     Current symptoms include:
     - SYMPTOM1
     Known workarounds include:
     - WORKAROUND1
     IT is working to restore service and will provide more information as it becomes available. The next communication will be sent by XX:XX a.m./p.m.
   Time: Within 10 minutes

3. Contact Computing Services Directors:
   - Kirk Moore
     Cell: (719)238-9451
     House: (719)282-1887
     Email: kmoore@uccs.edu or uccsit@
   - Greg Williams
     Cell: (719)237-6491
     House: (719)481-1290
     Email: gwillia5@uccs.edu or
   If directors have not been reached, contact associate directors:
   - Rob Garvie
     Cell: (719)439-1724
     House: (719)266-8525
     Email: rgarvie@uccs.edu
   - Mike Belding
     Cell: (719)338-9776
     House: (719)260-6794
     Email: mbelding@uccs.edu
   If directors or associate directors have not been reached, contact CIO:
   - Jerry Wilson
     Cell: (719)440-2215
     House: (719)599-4752
     Email: jwilson@uccs.edu
   Time: Within 10 minutes of initial contact

4. Open Technical Bridge
   1-720-279-0026
   (toll free): 1-877-820-7831
   Host and Guest passcodes stored in LastPass under Incident Response
   Time: Within 15 minutes of initial contact

5. Initial Triage
   - Start assessment
   - Replicate issue
   - Review monitoring and logs
   - Try to identify workarounds
   - If no quick solution or workaround is discovered, then declare a Priority 1 Incident.
   Time: Within 20 minutes of initial contact

6. Declare Priority 1 Incident
   - Select Incident commander
   - Start calling in personnel to meeting location or conference bridge
   - Define roles of team
   - Start external communication
   - Set Message on using Email Template
     Text Template:
       UCCS IT Alert: SERVICENAME is experiencing a service interruption. IT is working to restore service and will provide more information as it becomes available.
     Email Template:
       Subject: UCCS IT Alert: SERVICENAME is experiencing a service interruption.
       Body:
       Current symptoms include:
       - SYMPTOM1
       Known workarounds include:
       - WORKAROUND1
       IT is working to restore service and will provide more information as it becomes available. The next communication will be sent by XX:XX a.m./p.m.
   - Communicate to CIO or person acting as backup, who will contact the Chancellor's office
   - Determine if service should remain active or be brought down.
   - Determine whether individuals aiding in incident restoration should convene in person to aid in restoration efforts.
   - Determine whether vendor involvement or escalation is required.
   - If incident resolution is not expected within 15 minutes, establish time frame for next status update.
   - Continue to facilitate conversation as appropriate to ensure focus is on restoring service.
   Time: Within 30 minutes of initial contact

7. While Priority 1 Incident is occurring
   - Request current status of restoration efforts.
   - Instruct communication lead to send a notice using the following templates:
     Email Template:
       Subject: UCCS IT Alert: SERVICENAME is experiencing a service interruption.
       Body:
       Current symptoms include:
       - SYMPTOM1
       Known workarounds include:
       - WORKAROUND1
       Update: UCCS IT is working to restore service and will provide more information as it becomes available. The next communication will be sent by XX:XX a.m./p.m.
   Time: Every 60 minutes

8. Upon Service Restoration
   - Request the technical lead to verify that service has been restored and report on the current state of operation.
   - Confirm the communications lead will:
     - Update
     - Remove ACD message on Help Desk phone line.
     - Send an incident restoration message with the following content:
       Email Template:
         Subject: UCCS IT Alert: SERVICENAME Service Now Available
         Body:
         UCCS IT has identified and addressed root cause. As of XX:XX a.m./p.m. service has been restored. Thank you for your cooperation as we worked to resolve this issue.
   - The technical manager or director of the failing service or component that caused the incident will:
     - Hold a debrief meeting
     - Prepare and deliver incident report to CIO, Directors, and campus IT partners within three business days.
   - Formally state to participants on the technical bridge that the incident status is downgraded, everyone is standing down and the technical bridge is being closed.
   - Document resolution and close incident management (IM) ticket.
   - Open problem management ticket to track ongoing root cause analysis efforts and document any known workarounds.
   Time: Upon service restoration

Close Major Incident
   - Hold debrief meeting within three days
   - Prepare Major Incident Response report within five days with help from those participating {Rachel - needing report template}
   - Distribute report

Incident commander duties
An incident commander serves to keep an incident project on track for process, maintain focus on the problems, facilitate analysis and interactions, and verify that the incident response team's needs are being met (resources, information, etc.). To that end, the individual has several duties outlined below.

General duties
- Open the incident phone bridge line:
  1-720-279-0026
  (toll free): 1-877-820-7831
  Host Passcode: 9694542
  Guest Passcode: 321592
- Monitor incident phone bridge line or assign duty.
- Assign a participant to set up any required A/V resources (projecting monitoring data, etc.).
- Briefly recap incident process at the beginning of incident room level events.
- Coordinate efforts within the room to minimize confusion and reduce the risk of inadvertent or simultaneous changes.
- Draw focus back together when conversations become unproductively fragmented.
- Document notable events and steps taken in a visible incident log (to be recorded electronically by designated in-room scribe).
- Solicit approval and/or consensus on decisions to bring services up/down and to make changes to production services.
- Ensure that the appropriate UIS employees and vendors are engaged and working the issue, tasking people to escalate as needed.
- Initiate brainstorming during troubleshooting and ensure that identified paths of investigation (hypotheses) or actions are assigned to individuals and given an order/priority.
- Facilitate communications efforts (both to broad groups of customers and executives) by ensuring that the appropriate communicators have timely and accurate information.
- Record employees' hours worked and ensure they take their flex time.

Incident commander timed checklist

Every 30 minutes:
- Request status from teams working issues.
  - What current hypotheses are being investigated, and have any been eliminated or verified?
  - What actions have been completed or are in progress?
- Provide a verbal update within the incident room and update the incident log.

Every hour:
- Check in with communications staff regarding next status update steps.
- If the list of hypotheses has been exhausted, initiate a new cycle of brainstorming, documenting, assigning tasks, etc.

At 11:30am and 5:30pm:
- Request that business office (if available) order some food for those working the issue in the incident room. Be sure to cover dietary needs (vegetarian, etc.). Encourage participants to use mealtime as an opportunity to leave the room for a little while, allowing for coverage if needed.

At 9pm:
- Request that directors/managers begin their plans for staff rotation during the night if on-going work is required.

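The timed checklist above lends itself to a simple reminder schedule. The following is a minimal sketch, not part of the plan itself: it assumes the 30-minute and hourly prompts are counted from the moment the incident is declared (the plan does not say), and the function name prompts_between and the shortened prompt wording are hypothetical.

```python
from datetime import datetime, time, timedelta

# Recurring prompts from the timed checklist, keyed by interval in minutes.
# ASSUMPTION: intervals are measured from the incident start time.
RECURRING = {
    30: "Request status from teams; give verbal update and update incident log.",
    60: "Check in with communications staff; restart brainstorming if needed.",
}

# Fixed clock-time prompts from the timed checklist.
FIXED = {
    time(11, 30): "Ask business office (if available) to order food.",
    time(17, 30): "Ask business office (if available) to order food.",
    time(21, 0): "Ask directors/managers to plan overnight staff rotation.",
}


def prompts_between(start, end):
    """Return sorted (when, message) pairs that fall in the window (start, end]."""
    due = []
    # Recurring prompts: one entry per elapsed interval since start.
    for minutes, message in RECURRING.items():
        step = timedelta(minutes=minutes)
        when = start + step
        while when <= end:
            due.append((when, message))
            when += step
    # Fixed prompts: check every calendar day the window touches.
    day = start.date()
    while day <= end.date():
        for clock, message in FIXED.items():
            when = datetime.combine(day, clock)
            if start < when <= end:
                due.append((when, message))
        day += timedelta(days=1)
    return sorted(due)


if __name__ == "__main__":
    declared = datetime(2015, 4, 13, 10, 0)  # hypothetical declaration time
    for when, message in prompts_between(declared, declared + timedelta(hours=2)):
        print(when.strftime("%H:%M"), message)
```

An incident commander (or a scribe) could run this at declaration time to print the prompts expected over the next few hours; the checklist itself remains the authority on what each prompt requires.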
Check List
- Notification of Priority 1 incident
- Confirmation of major incident/outage
  - Priority 1 incident has been determined
  - If Level 1 priority 1 incident
  - Has incident crossed trigger points
    - No - continue to monitor situation
    - Yes
      - Create problem in Cherwell
      - Determine which individuals are needed to evaluate the situation
      - Define roles for individuals participating
- Tools
  - LastPass Cloud Service for password management
  - Monitoring
  - Testing environment
- Build action plan:
  - Define scope / timeframe
  - Develop technical plan
  - Define personnel needed
  - Determine return on investment
  - Assign tasks
- Communication plan
  - How do we communicate with each other?
  - How and who do we communicate with externally?
  - Recording communications
  - Confirming communications postings
  - How often do we need to communicate?
  - Communicating to UCCS Leadership (Role of CIO)
- Document ongoing progress and issues; record in Cherwell
- Resolved
  - Documenting issue, response and fix
  - Closing response
  - Hold debrief meeting within three days
  - Prepare Major Incident Response report within five days with help from those participating {Rachel - needing report template}
  - Distribute report
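
The "Has incident crossed trigger points" check in the list above can be illustrated with a short sketch. It uses only the two thresholds stated in this plan (30 minutes by default, 10 minutes for data center power). Everything else is an assumption made for illustration: the service key "data_center_power", the function name has_crossed_trigger, the choice to let an SLA-defined trigger take precedence over the plan's values, and treating the threshold itself as already crossed.

```python
from datetime import timedelta

# Default declaration trigger stated in the plan: 30 minutes of degraded service.
DEFAULT_TRIGGER = timedelta(minutes=30)

# Triggers "designated otherwise by this plan". The key name is hypothetical.
PLAN_OVERRIDES = {
    "data_center_power": timedelta(minutes=10),
}


def has_crossed_trigger(service, degraded_for, sla_trigger=None):
    """Return True when a degraded service has reached its trigger point.

    ASSUMPTION: an SLA-defined trigger, when given, wins over the plan's
    own override, which in turn wins over the 30-minute default.
    """
    if sla_trigger is not None:
        threshold = sla_trigger
    else:
        threshold = PLAN_OVERRIDES.get(service, DEFAULT_TRIGGER)
    # ASSUMPTION: reaching the threshold exactly counts as crossing it.
    return degraded_for >= threshold


if __name__ == "__main__":
    # "No" branch: continue to monitor the situation.
    print(has_crossed_trigger("email", timedelta(minutes=20)))
    # "Yes" branch: create the problem in Cherwell and assemble the team.
    print(has_crossed_trigger("data_center_power", timedelta(minutes=12)))
```

A False result corresponds to the "No - continue to monitor situation" branch; a True result corresponds to the "Yes" branch and its follow-up steps.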
