Washington, DC, 20080, USA
1 day ago
Sr. Manager, SRE
**Job Overview**

We are seeking an exceptional Senior Manager of Site Reliability Engineering (SRE) to lead our global SRE organization and drive operational excellence across our multi-cloud SaaS platform. This role is critical to our mission of delivering reliable, scalable, and performant solutions to thousands of customers worldwide. The successful candidate will lead distributed teams across the US, Ireland, and India while ensuring optimal customer outcomes through proactive issue prevention and rapid incident resolution.

**Success Metrics:**

+ **Customer Impact**: Reduced MTTR and improved customer satisfaction scores
+ **Reliability**: Achievement of 99.9%+ uptime SLAs across all products and regions
+ **Team Growth**: Successful scaling of global SRE organization with low attrition
+ **Proactive Prevention**: Reduction in incident frequency through automated detection and prevention
+ **Cross-functional Collaboration**: Improved partnership metrics with Product, Engineering, and Customer Success teams

**About Us**

When you join iCIMS, you join the team helping global companies transform business and the world through the power of talent. Our customers do amazing things: design rocket ships, create vaccines, deliver consumer goods globally, overnight, with a smile. As the Talent Cloud company, we empower these organizations to attract, engage, hire, and advance the right talent. We’re passionate about helping companies build a diverse, winning workforce and about building our home team. We're dedicated to fostering an inclusive, purpose-driven, and innovative work environment where everyone belongs.

**Responsibilities**

**Leadership & Strategy**

+ Lead and scale a global SRE organization spanning multiple time zones (US, Ireland, India)
+ Develop and execute SRE strategy aligned with business objectives and customer success metrics
+ Drive cultural transformation toward reliability-first engineering practices across the organization
+ Partner closely with Customer Success to ensure a customer-centric approach to all SRE initiatives
+ Establish and maintain SLAs, SLOs, and error budgets that balance reliability with feature velocity

**Incident Management & Response**

+ Lead enterprise-wide incident management, ensuring rapid detection, response, and resolution
+ Serve as executive point of contact during critical incidents
+ Drive comprehensive root cause analysis (RCA) processes with actionable prevention strategies
+ Establish and maintain 24/7 on-call rotation and escalation procedures across global teams
+ Develop and execute disaster recovery and business continuity plans

**Technical Leadership**

+ Provide technical direction for complex, multi-cloud infrastructure spanning AWS, Azure, and GCP
+ Oversee reliability engineering for our entire product portfolio
+ Lead application performance monitoring initiatives
+ Drive modernization efforts and ensure optimal performance across geographically distributed DCs
+ Drive best practices in tuning SQL and NoSQL data platforms

**Platform Reliability**

+ Ensure high availability and performance of services including: AWS (ECS, ECR, RDS, Aurora, SQS, SNS, Kinesis, S3, DynamoDB, OpenSearch), Authentication (Auth0/Okta CIC), Integration platforms (Workato), BI (Looker), API management (Apigee), Legacy systems (Tomcat, MongoDB)
+ Manage reliability for thousands of customers in North America and EU

**Operational Excellence**

+ Establish observability standardization strategy (Sumo Logic, New Relic, and Grafana)
+ Drive automation initiatives to reduce manual operational overhead
+ Implement chaos engineering and reliability testing practices
+ Lead capacity planning and performance optimization efforts
+ Establish a metrics-driven culture with a focus on customer impact measurements

**Qualifications**

**Leadership Experience**

+ 15+ years in SRE, DevOps, or Infrastructure Engineering roles with 5+ years in senior positions
+ Proven track record of scaling global engineering teams across multiple time zones
+ Experience leading teams through high-stakes incident response and customer escalations
+ Strong organizational skills with the ability to influence cross-functional stakeholders

**Technical Expertise**

+ Deep expertise in multi-cloud environments (AWS primary, Azure secondary, GCP preferred)
+ Extensive experience with containerization, orchestration, and modern deployment practices
+ Strong background in database technologies
+ Proficiency with observability tools (New Relic, Grafana, Sumo Logic, or similar)
+ Experience with large-scale Java applications and legacy system modernization

**SRE & Operations**

+ Demonstrated success implementing SRE principles in large-scale production environments
+ Experience with ITIL, incident management frameworks and tools
+ Background in establishing and maintaining SLAs for enterprise SaaS products

**Preferred**

+ Background with authentication systems (Auth0, Okta, SAML, OAuth)
+ Experience with API management platforms and integration architectures
+ Previous exposure to CDN optimization and global content delivery
+ Relevant certifications in AWS, Azure, or SRE practices

**EEO Statement**

iCIMS is a place where everyone belongs. We celebrate diversity and are committed to creating an inclusive environment for all employees. Our approach helps us to build a winning team that represents a variety of backgrounds, perspectives, and abilities. So, regardless of how your diversity expresses itself, you can find a home here at iCIMS. We are proud to be an equal opportunity and affirmative action employer.
We prohibit discrimination and harassment of any kind based on race, color, religion, national origin, sex (including pregnancy), sexual orientation, gender identity, gender expression, age, veteran status, genetic information, disability, or other applicable legally protected characteristics. If you’d like to view a copy of the company’s affirmative action plan or policy statement and/or if you would like to request an accommodation due to a disability, please contact us at careers@icims.com.

**Compensation and Benefits**

We accept applications for this position on an ongoing basis until the position is filled. Applications will be reviewed as they are received, and qualified candidates may be contacted throughout the posting period.

The anticipated base pay range for this position is $150,000-200,000 annually. Additional compensation for this role includes bonus eligibility, which is based on personal, department, and/or company performance, where applicable. Final compensation will be based on factors such as relevant experience, skills, education, internal equity, and market data. This range aligns with our commitment to equitable and transparent compensation practices, as required by applicable law.

Competitive health and wellness benefits include medical, dental, vision, 401(k), dependent care, short-term and long-term disability, life and AD&D insurance, bonding and parental leave, mindfulness resources, an open vacation policy, sick days, paid holidays, quiet hours each workday, and tuition reimbursement. Benefits and eligibility may vary by location, role, and tenure. Learn more here: https://careers.icims.com/benefits