Senior Engineer, Site Reliability Engineering

Chicago, IL, USA

1 day ago

United Airlines

Job overview and responsibilities

As a Sr. Engineer, you will be a self-starter who is seen as a technical expert in Site Reliability Engineering (SRE), responsible for the day-to-day performance and availability of enterprise applications used across United’s businesses. You will accomplish this through a combination of general application and environment understanding and the analytics and automation tools provided to you for both cloud and on-premises environments. You will also participate in a 24x7 on-call rotation and be accountable for all aspects of IT Service Delivery, including incident, problem, and change management. You will ensure adherence to these processes across everything from coding and scaling applications to performance tuning and post-mortem analysis. Lastly, as the Sr. Engineer, you will drive thought leadership and act as interim leader in the absence of the Sr. Manager, partnering with DevOps teams to define and implement observability and monitoring practices throughout the SDLC.

- Collaborate proactively with interdisciplinary teams across the IT department to identify and mitigate unplanned application downtime, and engage in thorough post-outage root cause analysis, improving system designs for automated troubleshooting
- Partner with Application Development and DevOps teams to continuously refine application instrumentation to maximize reliability and availability, enforcing best practices and improving system optimization
- Continuously build knowledge of the assigned application portfolio to understand architecture, usage patterns, performance trends, outages, and business impact; create strategies to proactively identify and report application performance problems and failures, detecting and preventing issues to mitigate operational risk
- Continuously monitor production environment availability and take a holistic view of system health, service performance, and availability, including real user monitoring, logging, distributed tracing, and alerting for cloud and on-premises systems
- Engage with project teams to ensure that operational monitoring and instrumentation requirements are addressed by defining and implementing SLIs, SLOs, and SLAs during application deployment
- Develop expert-level knowledge of SRE and observability toolsets to maintain and enhance our observability practices and solutions, implementing proactive self-healing capabilities to avoid real user impact
- Serve as a mentor to other team members, providing support and guidance in performing core functions and championing the adoption of SRE practices

What’s needed to succeed (Minimum Qualifications):

- Bachelor's degree in Information Technology, Computer Science, or a related field
- 4 years in an IT organization with experience in end-user technical support or systems administration
- Experience with AWS networking services such as VPC, Route 53, and CloudFront, with an understanding of cloud concepts such as IaaS, PaaS, and SaaS
- Ability to diagnose and troubleshoot issues effectively
- Must be legally authorized to work in the United States for any employer without sponsorship
- Successful completion of an interview is required to meet job qualifications
- Reliable, punctual attendance is an essential function of the position

What will help you propel from the pack (Preferred Qualifications):

- Experience with ITIL Service Management, SRE practices, or observability solutions for cloud in a medium-to-large IT organization
- Experience with DevOps in a medium-to-large IT organization
- Experience using Dynatrace and DQL; large-enterprise experience is a plus
- Experience leading small projects or teams
- Experience with AWS services such as EC2 (Elastic Compute Cloud), S3 (Simple Storage Service), RDS (Relational Database Service), VPC (Virtual Private Cloud), Lambda, and CloudFormation
- Proficiency with dynamic resource management frameworks (Kubernetes, YARN)
- Proficiency with DevOps practices and tools (CI/CD pipelines, Jenkins)
- Ability to code (structured and object-oriented) in one or more high-level languages, such as Python, Java, C#, or JavaScript
- Understanding of API management and integration services such as API Gateway, and experience with RESTful and SOAP APIs
- Dynatrace Associate Certification or AWS Certified DevOps Engineer required
