1) AWS Cloud Monitoring & Performance Management
Design, implement, and manage monitoring solutions for AWS cloud infrastructure using Amazon CloudWatch and AWS X-Ray or third-party tools (e.g., Datadog, New Relic, Nagios).
Define and set up metrics, alerts, and dashboards for system health, application performance, and infrastructure reliability.
Troubleshoot and resolve AWS infrastructure issues to minimize downtime and optimize system performance.
2) Automation Using Ansible
Write, manage, and maintain Ansible playbooks for automating configuration management, deployments, patching, and other operational processes.
Develop and test automation workflows to ensure reliable execution across different environments.
Collaborate with DevOps and development teams to streamline CI/CD pipelines using Ansible.
Experience migrating from Chef to Ansible is an added advantage.
3) Cloud Infrastructure Management
Deploy and manage AWS services such as EC2, S3, RDS, Lambda, VPC, and CloudFormation.
4) Cost and Resource Optimization
Optimize AWS resources for cost efficiency and performance.
Stay updated on the latest AWS offerings and recommend relevant services to enhance infrastructure.
5) Incident Management and Problem Resolution
Monitor system incidents and resolve them efficiently, ensuring adherence to SLAs.
Perform root cause analysis and implement preventive measures to mitigate recurring issues.
Maintain and improve incident response processes and documentation.
6) Documentation and Reporting
Maintain accurate documentation of infrastructure configurations, monitoring systems, and automation scripts.
Create reports to demonstrate cloud environment health, resource utilization, and compliance.
Share knowledge and best practices with team members through documentation and training sessions.
7) Security and Compliance
Implement security best practices for monitoring and automation scripts.
Ensure systems are compliant with organizational and regulatory requirements.
Collaborate with security teams to perform vulnerability assessments and patch management.
Required Skills and Qualifications
Technical Skills:
Extensive experience with AWS services, architecture, and tools (e.g., CloudWatch, CloudFormation, IAM, EC2, S3, Lambda).
Proficient in writing and managing Ansible playbooks for automation and orchestration.
Experience with monitoring tools and setting up dashboards (e.g., Datadog, Prometheus, Grafana).
Strong understanding of networking concepts within AWS, including VPCs, subnets, routing, and security groups.
Experience with Linux/Unix environments and scripting languages such as Python, Bash, or PowerShell.
Familiarity with CI/CD tools such as Jenkins, GitLab CI, or AWS CodePipeline.
Knowledge of cloud cost optimization strategies and resource tagging.
Soft Skills:
Strong problem-solving and troubleshooting abilities.
Excellent communication and collaboration skills to work effectively with cross-functional teams.
Ability to multitask and prioritize tasks in a fast-paced environment.