Chennai
3 days ago
Senior Site Reliability Engineer – Azure | Kubernetes | Terraform

We are seeking a skilled Site Reliability Engineer to administer Azure Kubernetes Service (AKS) clusters that run critical, always-on middleware processing thousands of transactions per second (TPS). The ideal candidate will work with the goal of achieving 99.999% ("five nines") availability, which allows roughly five minutes of downtime per year.

Key Responsibilities:

Own and manage AKS cluster deployments, cutovers, base image updates, and daily operational tasks.

Test and implement Infrastructure as Code (IaC) changes using best practices.

Apply software engineering principles to IT operations to maintain scalable and reliable production environments.

Write and maintain IaC as well as automation code for:

Monitoring and alerting

Log analysis

Disaster recovery testing

Incident response

Documentation-as-code

Mandatory Skills:

Strong experience with Terraform

In-depth knowledge of Azure Cloud

Proficiency in Kubernetes cluster creation and lifecycle management (experience limited to deploying workloads onto existing clusters is not sufficient)

Hands-on experience with CI/CD tools (GitHub Actions preferred)

Bash and Python scripting skills

Desirable Skills:

Exposure to Azure Databricks and Azure Data Factory

Experience with secret management using HashiCorp Vault

Familiarity with any monitoring tool
