IT Engineer IV - Site Reliability Engineer
Kforce
Kforce has a client that is seeking an IT Engineer IV - Site Reliability Engineer in Reston, VA.
Duties Include:
* Design and implement observability strategies using OpenTelemetry for distributed tracing, metrics, and logging
* Instrument microservices written in Java and Python using OpenTelemetry SDKs and auto-instrumentation tools
* Develop and maintain Splunk dashboards, alerts, and reports to provide actionable insights into system performance and reliability
* Collaborate with development and operations teams to ensure consistent and effective telemetry across services
* Automate monitoring and alerting pipelines to proactively detect and resolve issues
* Participate in on-call rotations, incident response, and postmortem analysis to improve system resilience
* Drive adoption of SRE best practices, including SLIs, SLOs, and error budgets
* Continuously evaluate and improve observability tools and practices