Associate Manager GCP SRE
PepsiCo
Overview
We are seeking a skilled and dedicated Associate Manager Google Cloud Platform Site Reliability Engineer (GCP SRE) to join our dynamic team. The ideal candidate will be responsible for maintaining the reliability, scalability, and performance of our cloud-based infrastructure. As an Associate Manager GCP SRE, you will work closely with our development and operations teams to ensure the seamless deployment and operation of our applications and services on the Google Cloud Platform.

Responsibilities
Infrastructure Management: Design, implement, and manage scalable and reliable infrastructure on the Google Cloud Platform.
Monitoring and Incident Response: Develop and maintain monitoring and alerting systems, respond to incidents, and perform root cause analysis to prevent recurrence.
Automation: Automate repetitive tasks to improve efficiency and reduce the likelihood of human error.
Performance Optimization: Continuously evaluate and optimize the performance of our cloud-based applications and services.
Collaboration: Work closely with development, operations, and security teams to ensure the successful deployment and operation of applications.
Security: Implement and maintain security best practices to protect our infrastructure and data.
Documentation: Create and maintain documentation for our infrastructure, processes, and procedures.
Capacity Planning: Perform capacity planning to ensure our infrastructure can handle future growth and demand.
Continuous Improvement: Identify and implement improvements to our infrastructure and processes to enhance reliability and efficiency.

Qualifications
Experience: 10+ years of overall experience, including a minimum of 5 years in a site reliability engineering or similar role with a focus on Google Cloud Platform.
Education: Bachelor’s degree in Computer Science, Information Technology, or a related field, or equivalent work experience.
Technical Skills: Proficiency in GCP services, such as Compute Engine, Kubernetes Engine, Cloud Storage, BigQuery, and Pub/Sub.
Programming: Strong scripting and programming skills in languages such as Python, Go, or Bash.
DevOps Tools: Experience with DevOps tools and practices, including CI/CD, Terraform, Ansible, and Jenkins.
Monitoring Tools: Experience with monitoring and alerting tools, such as Prometheus, Grafana, and Stackdriver.
Problem-Solving: Excellent problem-solving skills and the ability to troubleshoot complex issues in a cloud environment.
Communication: Strong communication and collaboration skills, with the ability to work effectively in a team-oriented environment.