London, Undisclosed, GB
Sr. SDM, Availability Eng, Prime Video
Come build the future of entertainment with us. Are you interested in shaping the future of movies and television? Do you want to define the next generation of how and what Amazon customers are watching?

Prime Video is a premium streaming service that offers customers a vast collection of TV shows and movies, all with the ease of finding what they love to watch in one place. We offer customers thousands of popular movies and TV shows, from Amazon Originals and exclusive licensed content to exciting live sports events. We also offer our members the opportunity to subscribe to add-on channels, which they can cancel at any time, and to rent or buy new-release movies and TV box sets on the Prime Video Store. Prime Video is a fast-paced, growing business, available in over 200 countries and territories worldwide. The team works in a dynamic environment where innovating on behalf of our customers is at the heart of everything we do. If this sounds exciting to you, please read on.

We are seeking an experienced Senior Software Development Manager to lead our Availability Engineering team within Prime Video. This team is responsible for developing and maintaining our observability platform, incident management systems, and resiliency programs.

Key job responsibilities
- Manage a high-performing team of software engineers, program managers, data scientists, and incident responders focused on improving the availability and resilience of Prime Video
- Oversee the development and evolution of our observability platform, which enables analysis of logs, traces, and other telemetry at scale to rapidly triage and resolve issues
- Implement observability and incident management solutions, including the use of generative AI to assist developers in diagnosis and remediation
- Establish and refine processes for effective incident management, including on-call rotations, escalation paths, and post-incident review
- Drive initiatives to improve the overall resilience and fault-tolerance of the Prime Video platform
- Partner closely with other engineering leaders to ensure availability and reliability goals are met
- Hire, develop, and retain top technical talent for the Availability Engineering team

A day in the life
1. Team Management:
- Hold 1-on-1 meetings with direct reports to discuss progress, challenges, and development goals
- Lead daily/weekly team standups to align on priorities and unblock any issues
- Facilitate team planning and retrospective sessions to continuously improve processes
- Provide technical and career mentorship to team members

2. Observability Platform Oversight:
- Review performance metrics and identify areas for improvement in the observability platform
- Collaborate with applied scientists and engineers to enhance the platform's analytics capabilities, including the use of generative AI
- Ensure the platform is scaling to meet the growing needs of the Prime Video development teams
- Oversee the roadmap and backlog for new observability features and capabilities

3. Incident Management:
- Oversee the incident management process, including establishing escalation paths and post-incident review
- Analyze incident data to identify recurring issues and drive long-term reliability improvements

4. Resiliency Program:
- Work with the team to develop and execute on initiatives to improve the overall resilience of the Prime Video platform
- Collaborate with other engineering leaders to align on resiliency goals and strategies
- Monitor key resiliency metrics and evaluate the effectiveness of resiliency efforts

5. Stakeholder Engagement:
- Regularly communicate with engineering leadership on the team's progress and challenges

6. Talent Management:
- Recruit and interview candidates to grow the Availability Engineering team
- Develop and retain top talent through career development plans and performance management

About the team
We bring together multiple complex programs and work streams to deliver high availability and minimize customer impact. Our solutions provide observability and insights to Prime Video developers, with a particular focus on using GenAI techniques to guide users to the root cause of issues. This software is used to support high-value events such as the English Premier League and the NBA, measure success and failure metrics, and investigate customer journeys through our products and services.