Celebration, Florida, USA
Sr Site Reliability Engineer
Job Summary:

“We Power the Magic!” That’s our motto at Disney Experiences Tech & Digital (DXT). Our team creates world-class immersive digital experiences for the Company’s premier vacation brands including Disney’s Parks & Resorts worldwide, Disney Cruise Line, Aulani, a Disney Resort & Spa, and Disney Vacation Club.

We are responsible for the end-to-end digital and physical Guest experience for all technology & digital-led initiatives across the Attractions & Entertainment, Food & Beverage, Resorts & Transportation and Merchandise lines of business as well as other initiatives including MyDisneyExperience and Hey, Disney!

This role sits in the DSE Technologies Operations organization within Technology & Digital for Disney Experiences. It works closely with Applications Teams from across the company. The Sr. Site Reliability Engineer will report to the Manager, Technology Operations.

In this role, you will be responsible for coordinating and managing all retrospective discussions and continued troubleshooting in support of operational systems. This role will work closely with our infrastructure and application teams to troubleshoot, determine root cause, and provide recommendations for both long-term fixes and interim mitigation steps, with an eye toward increased availability and reduced time to recover in the event of a systems failure. This role will also be deeply involved with designing and refreshing our lower environment strategy to allow us to better support release and deployment activities. The DSE Technology Operations team provides operational support for the production systems used by our guests, cast, and crew for Disney Cruise Line, Disney Vacation Club, and all DSE emerging businesses.

What You'll Do:

Drive a DevOps culture among peers, infrastructure engineers, and developers.
Design, build, and support product platforms.
Consult on, design, build, and support development pipelines; automate infrastructure and operations; create telemetry for monitoring; engineer high reliability; and reinforce best practices to secure company data.
Apply strong systems administration skills on Linux, Windows, and Kubernetes, including on AWS, Google Cloud, and Azure, along with extensive experience with web technologies, source control management using Git, AWX, and Ansible.
Perform systems administration on Windows, Linux, and Kubernetes platforms, and bring knowledge of systems, networking, operational excellence, application stability, security, performance, capacity management, and documentation to application and infrastructure teams.
Stay up to date on emerging technologies and operational best practices.
Collaborate with cross-functional teams to ensure timely and comprehensive resolution of system issues.
Design and implement robust monitoring and tracking solutions for Windows, Linux, and containerized systems by leveraging existing investments in productivity and related tools.
Coordinate and organize retrospective discussions following major incident outages or key system challenges; review troubleshooting and resolution as compared with best practices.
Apply SDLC, ITIL, and other industry-wide best practices to leverage incident and problem management to effectively increase system availability while decreasing time to resolve and return to service.
Provide expert-level support in troubleshooting and resolving application-related incidents when needed.
Take responsibility for lower environment design, build, and management, and develop automation and monitoring for both lower and production environments. This includes coordinating and automating the deployment of new software builds across multiple lower environments.

Required Qualifications & Skills:

Minimum 5 years of related work experience
Proficient in agile environments
Applied understanding of observability principles using relevant tools
Hands-on experience with CI tools like GitLab, Ansible, and Azure DevOps
Proficient in configuration management tools: Terraform, Ansible, Chef
Experience in procedural programming languages (Python, Perl, Ruby, Java, Go, Rust, C/C++, PowerShell)
Skilled in cloud environments (AWS, Azure, Google Cloud)
Collaborative in building reliable, scalable enterprise systems
Capable of identifying root causes in large-scale distributed systems
Proficient in UNIX/Linux/Windows and Kubernetes administration, troubleshooting, and security
Experience leading technical projects and ensuring smooth delivery
Experience collaborating with Security Operations teams on secure solutions
Strong troubleshooting skills across systems, network, and code
Proficiency in systems, networking, operational excellence, application stability, security, performance, and capacity management, as well as system documentation
Proactive approach to continuous learning and skill development, with an interest in mastering emerging data engineering tools and methodologies

Required Education:

Bachelor’s degree in Computer Science, Information Systems, Software, Electrical or Electronics Engineering, or comparable field of study, and/or equivalent work experience
