Site Reliability Engineer
Oracle
We are looking for a Site Reliability Engineer (SRE) to join our team and help ensure the reliability, scalability, and performance of our systems. In this role, you will bridge the gap between development and operations by implementing best practices in automation, monitoring, incident response, and infrastructure management.
Key Responsibilities:
Design, implement, and maintain scalable, reliable, and high-performance infrastructure.
Develop and improve monitoring, alerting, and logging systems to ensure system health and performance.
Automate operational tasks, deployments, and infrastructure provisioning.
Collaborate with development and operations teams to improve system reliability and efficiency.
Identify and resolve production issues, ensuring minimal downtime and fast recovery.
Conduct root cause analysis and post-mortems for incidents, implementing preventive measures.
Optimize system performance, capacity planning, and cost efficiency.
Enhance security, compliance, and risk management practices for infrastructure and applications.
Qualifications & Skills:
Bachelor's degree in Computer Science, Engineering, or a related field (or equivalent experience).
Experience with cloud platforms such as OCI, AWS, GCP, or Azure.
Proficiency in scripting and automation using Python, Bash, or similar languages.
Hands-on experience with infrastructure-as-code tools such as Terraform, Ansible, or CloudFormation.
Familiarity with containerization and orchestration (Docker, Kubernetes).
Strong knowledge of CI/CD pipelines and DevOps best practices.
Experience with monitoring and logging tools (Prometheus, Grafana, ELK, Datadog, etc.).
Understanding of networking, Linux system administration, and database management.
Strong problem-solving skills and a proactive approach to system reliability.
Career Level - IC4
---