8+ years of relevant experience. Total industry experience
Technical Capabilities:
•Automating Tasks: Design, maintain, and manage tools that automate operational processes. Write code to automate repetitive tasks, such as provisioning new servers or managing configurations.
•Troubleshooting Outages: When incidents occur, troubleshoot, identify root causes, and resolve issues promptly.
•On-Call Responsibilities: Participate in on-call rotations, ensuring 24/7 availability and rapid response to incidents.
•Monitoring and Observability: Set up monitoring systems, track key metrics, and respond proactively to anomalies.
•Capacity Planning: Analyze system capacity, predict resource needs, and optimize infrastructure.
•Deployment and Release Management: Deploy, automate, manage, configure, and maintain AWS cloud-based production systems.
Process Capabilities:
•Change Management: Oversee how code is deployed, configured, and monitored.
•Availability and Latency: Maintain high availability and low latency for services.
•Emergency Response: Manage incidents, ensuring timely resolution and minimal impact.
•Capacity Management: Assess system capacity and scale resources as needed.
•Documentation: Document processes, best practices, and incident resolutions.
•Collaboration: Work closely with development teams, fostering collaboration and shared responsibility.
Work Experience
•Experience as a DevOps engineer for large cloud-native applications
•Communication: Effective communication and collaboration with cross-functional teams.
•Manage your own time and work well both independently and as part of a team
•Good understanding of Agile processes
•Experience in code development in at least one high-level programming language.
•System Administration: Familiarity with Operating Systems and networking.
•Cloud Platforms: Extensive experience with the AWS platform: EC2, EKS, S3, RDS, IAM, CloudFront, CloudWatch, SNS/SQS, Kubernetes, ElastiCache, Lambda, AWS IoT, Kinesis. Experience with multi-tier architectures: load balancers, caching, web servers, application servers, databases, and networking.
•Monitoring Tools: Knowledge of monitoring and observability tools: Grafana and Splunk.
•Automation: Comfortable with infrastructure-as-code tools (e.g., Terraform, Ansible).
•Problem-Solving: Strong analytical skills to troubleshoot complex issues.
•Deployment Know-How: CI/CD, pod management, SonarQube, Git, etc.