Pittsburgh, PA, USA
5 days ago
Site Reliability Engineer

As the Site Reliability Engineer, you will be responsible for ensuring the availability, reliability, and performance of our customer-facing software applications. This role combines planning, engineering, monitoring, incident response, and administration to create highly scalable and fault-tolerant systems. You will handle complex and escalated application and network infrastructure support cases, including troubleshooting of Ethernet networking problems. You will develop, review, and approve customer-facing and internal documentation on best practices, troubleshooting flowcharts, training materials, and FAQs. You will act as a technical team lead, technical resource, and coach for the Associate Engineer – Support and Engineer – Support team members.

Responsibilities:

Collaborate with engineering, technical services, and quality assurance divisions on any problems, software bugs, or emerging customer needs
Ensure the high availability and reliability of the production environment by monitoring system health and performance
Provide primary operational support for large-scale distributed software applications
Facilitate incident resolution via triage, communication, engagement, escalation, and documentation
Partner with platform administration (both internal and external) to define and achieve stability and scalability objectives
Collaborate with technical and quality teams to improve services by identifying areas of risk and helping to define and proactively implement solutions
Drive continual improvement in system performance by setting service level objectives in collaboration with a performance center of practice and/or product development teams
Participate in system design, capacity planning, and platform management
Analyze and publish metrics from operating systems and applications to assist in performance tuning and fault finding
Pursue opportunities for automation and process improvements
Deliver exceptional customer service, providing infrastructure support per Service Level Agreements (SLAs)
Handle escalated cases, including troubleshooting of complex audio, video, and Ethernet networking problems
Handle security patch and vulnerability management
Evaluate, identify, and replicate issues, and follow an escalation process to reach outcomes that ensure a positive customer experience
Serve as a technical resource to other functional groups and individuals to improve service quality and user experience
Develop customer-facing and internal documentation on best practices, troubleshooting flowcharts, training materials, and FAQs to ensure a consistent customer experience
Take ownership of cases escalated by Associate Engineers and Engineers and see them through to resolution

Qualifications:

Bachelor's degree in an engineering-related discipline required; Master's degree preferred
Experience providing first-level incident response and troubleshooting with technical teams to resolve end-user issues
Proficiency with enterprise system monitoring software (examples: New Relic, Nagios, SolarWinds, Dynatrace, Datadog, Azure Monitor, Splunk)
Experience with cloud-based infrastructure, databases, and applications
Experience with performance tuning and fault finding in large-scale distributed systems
Experience with designing, implementing, and managing performance testing practices, including specific tools and frameworks
Knowledge of disaster recovery planning and execution
Ability to work effectively in a highly matrixed organization
Strong understanding of coding, automation, and engineering principles to build resilient, self-healing systems
Familiarity with DevOps practices and tools
Jira (or equivalent work management)
Confluence (or equivalent knowledge management)
Licenses/Certificates/Designations: IT industry networking certifications such as CCNP or JNCIP; ITIL or equivalent
Minimum 5 years of experience supporting network and AV operations
5 years required delivering support in Ethernet technologies/AV and networking concepts
Advanced knowledge of platform OS (router platform, VXLAN, WAN, LAN, and routing protocols) and how they interact with the network
Ability to apply principles, theories, and concepts, as well as knowledge of related networking/AV disciplines
Advanced skills in, knowledge of, and adherence to the change management process
Network routing and switching
Possess a customer-centric mindset
Possess strong computer skills, including proficiency with Microsoft Office (Outlook, Word, Excel, and PowerPoint)
Excellent oral and written communication skills

 

Wesco International, Inc., including its subsidiaries and affiliates (“Wesco”), provides equal employment opportunities to all employees and applicants for employment. Employment decisions are made without regard to race, religion, color, national or ethnic origin, sex, sexual orientation, gender identity or expression, age, disability, or other characteristics protected by law. For US applicants only: we are an Equal Opportunity and Affirmative Action Employer.

