Overview:
We are seeking a highly experienced Major Incident & Problem Manager to lead critical incident response and ensure timely resolution of business- and customer-impacting incidents in a 24x7x365 environment. This role acts as the central point of contact during major incidents and coordinates across internal teams and external partners to drive resolution and continuous improvement.
Responsibilities:
Assess the impact and severity of major incidents; gather data to support decision-making.
Act as the primary contact during critical incidents, keeping all relevant teams informed and engaged.
Lead incident response efforts, including triage, technical bridges, recovery coordination, and communication.
Maintain clear, timely communication with internal stakeholders, external partners, and authorities.
Develop and maintain communication templates for effective incident updates.
Ensure adherence to response SLAs and escalation procedures.
Coordinate with external vendors for additional support during incidents.
Maintain accurate records of incident response activities and decisions.
Conduct thorough post-incident reviews to identify root causes and improvement opportunities.
Assign and track Problem records to resolution, coordinating root cause analysis (RCA) through to closure.
Create and manage incident response and escalation plans.
Establish metrics and reporting to track incident and problem management performance.
Continuously review and update incident response plans and frameworks.
Provide support and guidance to the incident response team during complex situations.
Develop contingency plans for various scenarios and ensure organizational readiness.
Requirements:
4+ years of experience in IT operations within a large-scale environment.
5+ years of hands-on experience leading major incident resolution.
Strong background in Incident and Problem Management (5+ years).
Solid understanding of ITIL processes – Incident, Problem, and Change Management.
Experience working with ServiceNow (or similar ITSM platforms).
Familiarity with cloud platforms like AWS and Azure.
Ability to work in a 24x7x365 on-call rotation.
Key Skills:
Incident & Problem Management
Criticality Analysis
Service Desk Operations
Security Incident Response
Communication & Coordination
ITIL Framework
ServiceNow
Cloud Platforms (AWS, Azure)
Notice period: Immediate
Job location: Chennai