Identify repetitive, manual operational tasks and design automation solutions to eliminate them.
Develop scripts, tools, and pipelines to automate deployments, scaling, monitoring, and incident response.
Kubernetes & Cloud Operations
Manage and optimize Kubernetes clusters across multiple environments (dev, staging, production).
Implement automated cluster lifecycle management (provisioning, upgrades, scaling).
Reliability & Observability
Build self-healing mechanisms for common failure scenarios.
Enhance observability by automating metrics, logging, and alerting integrations.
CI/CD & Infrastructure as Code
Implement and maintain CI/CD pipelines for application and infrastructure deployments.
Use Infrastructure as Code (IaC) tools for consistent environment management.
Collaboration & Best Practices
Work closely with SREs, developers, and platform teams to improve reliability and reduce MTTR.
Advocate for an automation-first culture and SRE principles across teams.
Required Skills:
Automation & Scripting: Proficiency in Python or Bash for automation tasks.
Kubernetes Expertise: Hands-on experience with Kubernetes (deployment, scaling, troubleshooting); CKA/CKAD certification preferred.
Cloud Platforms: Experience with AWS.
CI/CD Tools: Jenkins, GitLab CI, or similar.
IaC Tools: Terraform.
Observability: Familiarity with Splunk.
Version Control: Strong Git skills and experience with GitOps workflows.
Problem-Solving: Ability to analyze operational pain points and design automation solutions.