Chicago, IL, United States
Senior Site Reliability Engineering (SRE) Manager

Guide and shape the future of technology at a globally recognized firm, driven by pride in ownership.

As a Senior Manager of Site Reliability Engineering at JPMorgan Chase within the Commercial & Investment Bank (Merchant and Commercial Card Production Management), you are the non-functional requirement owner and champion for the applications in your remit. You are a key influencer in your team’s strategic planning, driving continual improvement in customer experience, resiliency, security, scalability, monitoring, instrumentation, and automation of the software in your area. You act in a blameless, data-driven manner and navigate difficult situations with composure and tact.

Job responsibilities

Demonstrates expertise in site reliability principles and an understanding of the fine balance between features, efficiency, and stability
Effectively negotiates with peers and executive partners to ensure optimal outcomes for all
Drives the adoption of site reliability practices throughout the organization
Ensures your teams apply site reliability best practices and can demonstrate this empirically through stability and reliability metrics
Drives a culture of continual improvement and solicits real-time feedback to improve the customer’s experience
Ensures your team collaborates with other teams within your group’s specialization and avoids duplication of work where possible
Follows blameless, data-driven, post-mortem strategies and conducts regular team debriefs to enable learning from both successes and mistakes
Provides personalized coaching for entry- to mid-level team members
Ensures your team documents and shares their knowledge and innovations via internal forums, communities of practice, guilds, and conferences

Key Responsibilities:

Leadership and Team Management: Lead and mentor a global team of site reliability engineers, fostering a culture of innovation, collaboration, and continuous improvement. Provide leadership training to enhance strategic thought leadership capabilities and understanding of SRE tenets.
Operational Efficiency: Implement enhanced communication protocols and structured prioritization processes to streamline operations and reduce missed deliverables. Develop a dashboard for resource capacity monitoring to enable real-time adjustments and better capacity planning.
Strategic Service Improvement: Introduce automation and AI-driven solutions to improve monitoring, telemetry, and communication processes. Deploy AI algorithms for pattern and trend detection to proactively address potential issues and optimize system performance.
Onboarding and Recruitment: Develop a comprehensive onboarding program for new SREs to accelerate integration and productivity. Expedite the hiring process to fill open positions and ensure the team is adequately staffed to meet demands.
Alignment with SRE Tenets: Implement training programs focused on the five key SRE tenets, ensuring all team members understand

Required qualifications, capabilities, and skills

15+ years of experience in site reliability engineering, with a focus on financial services
Advanced proficiency in site reliability culture and principles, with the ability to demonstrate how to implement site reliability across application and platform teams while avoiding common pitfalls
Experience leading technologists to manage and solve complex technological issues at a firmwide level
Ability to influence the team’s culture by championing innovation and change for success
Experience hiring, developing, and recognizing talent
Strong communication skills and a desire to mentor and educate others on SRE principles and practices
Technical proficiency in Python, Java, AWS Cloud, Jenkins, Terraform, Kubernetes, Docker, and monitoring tools such as Grafana, Dynatrace, Prometheus, Datadog, and Splunk
Demonstrated proficiency in software applications and technical processes within a technical discipline (e.g., cloud, artificial intelligence, machine learning, mobile, etc.)
Proficiency in continuous integration and continuous delivery tools (e.g., Jenkins, GitLab, Terraform, etc.)
Experience with containers and container orchestration (e.g., ECS, Kubernetes, Docker, etc.)
Experience troubleshooting common networking technologies and issues
Formal training or certification on software engineering concepts and 5+ years applied experience
5+ years of experience leading technologists to manage and solve complex technical items within your domain of expertise

Preferred qualifications, capabilities, and skills

Ability to code and demonstrate data fluency
AWS Certified Cloud Practitioner or equivalent certifications
Bachelor of Engineering or equivalent experience