Newtown Square, PA, 19073, USA
9 hours ago
Site Observability Engineer
Job Description

We are seeking a skilled and experienced Senior Observability Engineer to join the NS2 Observability team. The ideal candidate will be responsible for improving our monitoring and alerting posture for Cloud Infrastructure. The role requires a strong understanding of observability tools and practices, with a focus on Prometheus, Grafana, Gardener Kubernetes, and Splunk. Experience with Dynatrace is a plus.

Key Responsibilities:
- Implement, manage, and improve monitoring solutions that use Prometheus, ensuring high availability and accurate alerting for our systems.
- Contribute to the development of observability strategies to improve our Cloud monitoring posture.
- Collaborate with development teams to integrate observability into the CI/CD pipeline and throughout the application lifecycle.
- Respond to and investigate incidents, providing thorough post-mortem analyses and implementing preventive measures.
- Stay current with the latest trends and best practices in site reliability and observability.
- Work with cross-functional teams to ensure system reliability, scalability, and performance.

We are a company committed to creating inclusive environments where people can bring their full, authentic selves to work every day. We are an equal opportunity employer that believes everyone matters. Qualified candidates will receive consideration for employment opportunities without regard to race, religion, sex, age, marital status, national origin, sexual orientation, citizenship status, disability, or any other status or characteristic protected by applicable laws, regulations, and ordinances. If you need assistance and/or a reasonable accommodation due to a disability during the application or recruiting process, please send a request to Human Resources Request Form (https://airtable.com/app21VjYyxLDIX0ez/shrOg4IQS1J6dRiMo).
The EEOC "Know Your Rights" Poster is available here (https://www.eeoc.gov/sites/default/files/2023-06/22-088_EEOC_KnowYourRights6.12ScreenRdr.pdf). To learn more about how we collect, keep, and process your private information, please review Insight Global's Workforce Privacy Policy: https://insightglobal.com/workforce-privacy-policy/.

Skills and Requirements
- 5+ years of experience as a Site Reliability or DevOps Engineer
- 3+ years of experience working with Kubernetes for container orchestration, preferably with Gardener Kubernetes
- 3+ years of scripting experience with Python, Terraform, or Ansible
- Experience with Prometheus for monitoring solutions and ensuring high availability and accurate alerting for systems
- Experience with Grafana and Splunk for observability
- Ideally, experience with Dynatrace; the person in this role will work on expanding its footprint in the organization

We are a company committed to creating diverse and inclusive environments where people can bring their full, authentic selves to work every day. We are an equal employment opportunity/affirmative action employer that believes everyone matters. Qualified candidates will receive consideration for employment without regard to race, color, ethnicity, religion, sex (including pregnancy), sexual orientation, gender identity and expression, marital status, national origin, ancestry, genetic factors, age, disability, protected veteran status, military or uniformed service member status, or any other status or characteristic protected by applicable laws, regulations, and ordinances. If you need assistance and/or a reasonable accommodation due to a disability during the application or the recruiting process, please send a request to HR@insightglobal.com.