Bellevue, WA, USA
1 day ago
Site Reliability Engineer with Observability & Production Support
Job seekers, please send resumes to resumes@hireitpeople.com

Mandatory Skills

SRE, Observability Tools and Production Support

Desired Skills

Knowledge of Java, Python, Go, Node, etc.

Any Certification (Mandatory)

Certification in one or more observability tools, such as Splunk, AppDynamics, Grafana, or Dynatrace.

Detailed JD (Roles and Responsibilities)

Skills

SRE mindset in production support: proactive issue identification using observability tools, and skill with a range of monitoring and observability tools to track system performance.
Incident commander: ability to diagnose complex issues and actively drive incident calls, working with technical and product SMEs and Tier 2 SREs.
Communication: excellent communicator, able to interact with Directors, Senior Directors, and above.

Technical expertise

Splunk (including Splunk APM and Splunk O11y), AppDynamics, Grafana, RedMetrics, ThousandEyes
Knowledge of VMs, load balancers, firewalls, API gateways, databases, networking, and Linux/Unix
Knowledge of containerization, Docker, Kubernetes, AWS, PCF, and GCP
ServiceNow (including AIOps, Self-Heal tools, and automated playbooks)
APM, NMON, and Wireshark usage and analysis
Experience with UEM and synthetic monitoring tools

Responsibilities

Production support activities, including proactive identification of issues using observability tools, with the aim of reducing MTTD and MTTR.
Coordinate all activities required to lead incident triage in compliance with SLAs and OLAs.
Correlate inputs from various dashboards and tools to drive resolution.
Flexibility to work in a 24x7 environment.