Bangalore, Karnataka, India
19 hours ago
Lead Engineer - Support

The Site Reliability Engineer will be responsible for ensuring the availability, reliability, and performance of our customer-facing software applications. This role combines planning, engineering, monitoring, incident response, and administration to create highly scalable and fault-tolerant systems.

Responsibilities:

- Ensure the high availability and reliability of the production environment by monitoring system health and performance
- Provide primary operational support for large-scale distributed software applications
- Facilitate incident resolution via triage, communication, engagement, escalation, and documentation
- Partner with platform administration (both internal and external) to define and achieve stability and scalability objectives
- Collaborate with technical and quality teams to improve services by identifying areas of risk and helping to define and proactively implement solutions
- Drive continual improvement in system performance by setting service level objectives in collaboration with a performance center of practice and/or product development teams
- Participate in system design, capacity planning, and platform management
- Analyze and publish metrics from operating systems and applications to assist in performance tuning and fault finding
- Pursue opportunities for automation and process improvements

Qualifications:

- Bachelor’s degree (or demonstrable equivalent work experience) in information technology
- Experience providing first-level incident response and troubleshooting with technical teams to resolve end-user issues
- Proficiency with enterprise system monitoring software (examples: New Relic, Nagios, SolarWinds, Dynatrace, Datadog, Azure Monitor, Splunk)
- Experience with cloud-based infrastructure, databases, and applications
- Experience with performance tuning and fault finding in large-scale distributed systems
- Experience with designing, implementing, and managing performance testing practices, including specific tools and frameworks
- Knowledge of disaster recovery planning and execution
- Ability to work effectively in a highly matrixed organization
- Excellent verbal and written communication skills
- Strong understanding of coding, automation, and engineering principles to build resilient, self-healing systems
- Familiarity with DevOps practices and tools
- Jira (or equivalent work management)
- Confluence (or equivalent knowledge management)

 

Wesco International, Inc., including its subsidiaries and affiliates (“Wesco”), provides equal employment opportunities to all employees and applicants for employment. Employment decisions are made without regard to race, religion, color, national or ethnic origin, sex, sexual orientation, gender identity or expression, age, disability, or other characteristics protected by law. For US applicants only: we are an Equal Opportunity and Affirmative Action Employer.
