Change Management Procedure
Description
The university has established a change management process to document and communicate changes to LTS-defined Tier 0, 1, 2, and 3 services. This procedure outlines how LTS staff initiate, approve or deny, and communicate change requests.
Scope
This procedure applies to all LTS staff.
Changes made during an unplanned outage (a full outage, a partial outage, or degraded performance) are exempt from this process so that services can be restored quickly. Follow the Incident Response procedure instead.
Change Management Procedure
Submitting a change request
Any LTS staff member making changes to LTS-defined Tier 0, 1, and 2 services must follow this procedure. To create a change request, go to the LTS Service Desk management at https://lehigh.atlassian.net/jira/servicedesk/projects/LTS/forms/form/direct/51/10122?requestTypeId=81, create a ticket, and select Change Request as the Request Type. Enter the necessary information.
Changes to the Secure Research Cloud must follow this process in addition to IaC and Cloud Steering Committee reviews.
Important Note: Planned changes must be submitted at least 3 working days before the change. Some resources, such as HPC, require more. HPC provides a 14-day notice before a scheduled outage, and 3 to 4 weeks of lead time is preferred. A minimum of 3 days' notice is also needed to pause the scheduler and let jobs complete normally. Any change affecting HPC with less than 3 days of lead time is unacceptable.
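The lead-time rule above can be sketched as a small check. This is an illustrative sketch only, not part of the Jira workflow: the function and constant names are hypothetical, working days are assumed to be Monday through Friday with university holidays ignored, the submission day itself is not counted, and the 14-day HPC figure is treated as calendar days for scheduled outages.

```python
from datetime import date, timedelta

# Minimums taken from the note above; the names are illustrative only.
STANDARD_LEAD_WORKING_DAYS = 3
HPC_OUTAGE_NOTICE_DAYS = 14  # assumed to mean calendar days


def working_days_between(submitted: date, change: date) -> int:
    """Count Monday-Friday days after `submitted`, up to and including `change`."""
    count = 0
    day = submitted
    while day < change:
        day += timedelta(days=1)
        if day.weekday() < 5:  # 0-4 are Monday through Friday
            count += 1
    return count


def meets_lead_time(submitted: date, change: date, hpc_outage: bool = False) -> bool:
    """Return True if the request gives at least the minimum notice."""
    if hpc_outage:
        # Scheduled HPC outages: compare calendar days against the 14-day notice.
        return (change - submitted).days >= HPC_OUTAGE_NOTICE_DAYS
    return working_days_between(submitted, change) >= STANDARD_LEAD_WORKING_DAYS
```

Under this counting convention, a request submitted on Monday, March 4, 2024 for a change on Thursday, March 7, 2024 has exactly 3 working days of lead time and passes the check.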
Approval of the change
By default, all changes are automatically approved provided they are entered via the Change Management portal and the Change start date/time is not left empty. In lieu of having a traditional Change Advisory Board (CAB), the following safeguards are in place to allow staff to request more information, deny changes, or change the date/time.
Anyone in LTS can request more information on the change request.
Any manager or director within LTS can request that a change be denied or have its date and time changed. If you have concerns about a change, work through your supervisor to raise and resolve them.
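The default-approval rule and its safeguards can be summarized as a small decision function. This is a minimal sketch for illustration: the `ChangeRequest` fields and status strings are hypothetical and do not reflect the actual Jira request form or workflow states.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class ChangeRequest:
    # Hypothetical fields; the real request form may use different names.
    via_portal: bool                  # entered via the Change Management portal
    start: Optional[datetime]         # Change start date/time, None if left empty
    denied_by_manager: bool = False   # a manager or director asked for a denial


def approval_status(req: ChangeRequest) -> str:
    """Apply the default-approval rule, with a manager denial taking precedence."""
    if req.denied_by_manager:
        return "denied"
    if req.via_portal and req.start is not None:
        return "approved"
    return "not approved"
```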
Communicating the Change Request
The Change Management process has built-in automation: once a change request is submitted, the system sends notifications through the following channels.
An email is sent to the LTS-Changes (lts-changes-list@lehigh.edu) Google Group.
All approved changes are automatically entered into the LTS Systems Upgrade Calendar. You can add the calendar to your calendar views in Google Calendar.
You can set up the calendar to notify you by email when a change event is added. Go to the settings menu (the gear), select the Calendars tab, and edit the notification settings of the calendar for which you want to receive these notifications.
All approved changes will be posted in the #change-management Slack channel.
15 minutes prior to the scheduled change, a reminder will be posted in the #operations Slack channel.
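The notification channels listed above can be laid out as a simple schedule. This sketch is illustrative and is not the actual automation: the `Notification` type and channel labels are hypothetical, and it assumes that the email, calendar entry, and #change-management post all go out at submission time, since changes are approved by default.

```python
from datetime import datetime, timedelta
from typing import NamedTuple

REMINDER_LEAD = timedelta(minutes=15)  # reminder offset stated in the list above


class Notification(NamedTuple):
    channel: str        # illustrative label, not a real integration identifier
    send_at: datetime


def planned_notifications(submitted: datetime, start: datetime) -> list[Notification]:
    """List each notification and when it would be sent for one change request."""
    return [
        Notification("email:lts-changes-list@lehigh.edu", submitted),
        Notification("calendar:LTS Systems Upgrade Calendar", submitted),
        Notification("slack:#change-management", submitted),
        # The reminder is the only notification tied to the change start time.
        Notification("slack:#operations", start - REMINDER_LEAD),
    ]
```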
Working the scheduled Change Request
Depending on the scope of the change, and especially for major changes, we recommend coordinating the work in a temporary Slack channel as follows:
Create a temporary Slack channel for the change.
Announce the temp Slack channel in the #incidentmanagement Slack channel so others know where to find it.
Post periodic updates in the temp Slack channel.
Those not involved with the change are welcome to observe and to post relevant information, such as other services that may have been impacted during the change. Please be mindful of the staff working on the change so that you do not overwhelm or distract them while they are making the changes.
When the change is completed, lock the temp Slack channel.
Delete the temp Slack channel after a few days or after a post mortem is completed.
Closing out the Change Request
When the change has been completed, it is important to bring closure to the change request so others know that the change was completed and whether it was successful.
Open the Jira Change Management Queue at https://lehigh.atlassian.net/issues/?filter=10797&atlOrigin=eyJpIjoiYWQ3NWQ1NzM3YTdlNDdhY2FlOGI2NTQ5MmJjM2JjZTEiLCJwIjoiaiJ9 and select the change request.
Update and close out the change request.
Retrospective Process
As part of closing out an incident, the Incident Owner must conduct a blameless retrospective. No one is to be blamed for the incident itself or for the activities through its completion. The retrospective must take place in an open, honest, learning-focused environment so that we can capture findings, observations, and actions to continuously improve our services.
All LTS Retrospectives are stored in Google Shared Drive: https://drive.google.com/drive/folders/1cE9od_f0d9cqDqeEG9scYB_x6oIwcRIx
The Retrospective template is located at https://docs.google.com/document/d/1tRgl6V8nNqUtOIadrcGDqkXylmQnZ0GaR8X5_TpTFXk/edit
Key information captured includes:
Incident Summary: Executive summary of the incident
Detection: How were we notified of the incident? Our management tools? Help Desk ticket?
Incident Time Frame: Time between incident notification, incident declaration, and full restoration of services.
Root Cause: What was the root cause of the incident, if known?
Resolution: What resolved the incident to return the service to normal operations?
Team Members: Who was involved in resolving the incident, identifying roles such as Incident Owner.
Timeline (Sequence of events from first incident through normal operations): Document key events that occurred during the incident such as notifications, communications, debugging, testing, changes in service state, and final resolution.
Findings and Observations (Positives and Negatives): Open and honest assessment of the incident. What went well, what went wrong, and where did we get lucky?
Actions: Document any corrective, preventive, and process actions. Assign owners and migrate actions into Jira in either team, project, or general tickets for tracking and documentation purposes.
Changes occurring over end of year winter break
Changes over the winter break pose unique challenges because of the number of changes taking place and staff vacations. For this reason, we are establishing the following special process for changes over winter break.
Winter Break is defined as the period from the day after the last exam to the day before the first day of class for the Spring Semester.
All changes must be submitted to the change management process at least 2 weeks before the last day of exams.
The calendar showing all the planned changes over winter break will be shared with LTS staff on the day after the last day of exams.
Any change request submitted after the last day of exams will be treated as an exception requiring approval from an Assistant Director, a Director, or the CTO.
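The winter break dates above all derive from two inputs: the last day of exams and the first day of Spring classes. The sketch below shows that arithmetic. The function names and dictionary keys are hypothetical, and the dates in the usage note are made-up examples, not the actual academic calendar.

```python
from datetime import date, timedelta


def winter_break_schedule(last_exam: date, first_class: date) -> dict[str, date]:
    """Derive the key winter break dates from the two academic calendar dates."""
    return {
        # Changes must be submitted at least 2 weeks before the last day of exams.
        "submission_deadline": last_exam - timedelta(weeks=2),
        # Winter break runs from the day after the last exam...
        "break_start": last_exam + timedelta(days=1),
        # ...and the planned-changes calendar is shared on that same day.
        "calendar_shared": last_exam + timedelta(days=1),
        # ...through the day before the first day of Spring classes.
        "break_end": first_class - timedelta(days=1),
    }


def needs_exception(submitted: date, last_exam: date) -> bool:
    """Requests submitted after the last day of exams require exception approval."""
    return submitted > last_exam
```

With a hypothetical last exam on December 17, 2024 and first class on January 21, 2025, the submission deadline would be December 3, 2024, and the break would run from December 18, 2024 through January 20, 2025.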
Change Management Advisory Committee
A Change Management Advisory Committee is established to review and approve changes to the process.
The committee will meet quarterly for 30 minutes.
The committee will discuss the effectiveness of the process.
The committee will review and approve or deny features and changes to the process.
Change Management Workflow
Revision History
Date | Version | Description | Approval |
---|---|---|---|
Mar 1, 2024 | 1.1 | Updated to new Jira Cloud | Approved |
Feb 10, 2022 | 1.0 | Final Original Document | Approved |