We are looking for DevOps engineers who are problem solvers at heart, with a solid ability to dig into code and logs, troubleshoot issues, and own the reliability domain. As a member of this team, you will work with your developer partners and implement operability improvements, security, infrastructure
About Us
The Central Data team builds industry-leading, next-generation cloud platforms, services, and tools that underpin our products. We build and maintain systems at scale and ensure that Yahoo is always able to process petabytes of data and billions of events daily. Using a combination of open source software and internal tools across multiple cloud providers, we engineer platform solutions for data collection, stream processing, batch processing, data querying, data governance, and data lineage. We promote and enable developer self-sufficiency and DevOps ownership models.
About You
We're looking for world-class, fun-loving engineers to join our team in Taiwan, where you will have the opportunity to help develop and implement scalable and reliable big data solutions. You will analyze requirements; investigate production best practices; architect, design, implement and test those solutions; and support our globally deployed infrastructure.
Responsibilities:
Lead initiatives to enhance and optimize existing cloud infrastructure, drive improvements in scalability, efficiency, and resilience, and oversee large-scale projects related to cloud platforms, automation, and performance optimization.
Develop and optimize tools for infrastructure management and automation on cloud platforms, applying Site Reliability Engineering (SRE) principles to write high-quality, maintainable code in languages such as Python, Java, and Go.
Collaborate with engineering teams to integrate SRE principles into the product lifecycle, ensuring improved site reliability and product functionality across cloud platforms.
Develop and implement automation strategies across cloud/on-prem environments to enhance system deployment, monitoring, and operational efficiency. This includes designing and managing CI/CD pipelines and utilizing infrastructure-as-code tools like Terraform, Ansible, and CloudFormation.
Maintain and support production systems and associated infrastructure, ensuring their availability, performance, and scalability through continuous monitoring and automation.
Work closely with cross-functional teams to understand product and technical roadmaps, identifying potential impacts on system operability and proposing proactive solutions for cloud environments.
Foster cross-functional collaboration between development, infrastructure, and operations teams to improve the overall performance and reliability of services in the cloud.
Minimum Qualifications
BS/MS in Computer Science or equivalent degree
3+ years of experience in site reliability engineering, systems engineering, or a related role, ideally in large-scale environments, with a focus on supporting 24x7 highly available systems.
Familiarity & working experience with Kubernetes and container-based orchestration
Intermediate level of coding expertise in one or more languages, including Java, Python, or Go
Experience working with IaC (e.g., Terraform, Ansible)
Experience with using Git to manage code
Experience with building CI/CD pipelines
Good knowledge of TCP/IP and networking
Familiarity with Observability tools, metric design and implementation
Strong verbal and written communication skills
Preferred Qualifications
Experience in Big Data technologies such as Hadoop, HBase, Storm, Flink and EMR
Experience designing and managing large-scale infrastructure on AWS (EKS, OpenSearch) or GCP (GKE, Observability/Monitoring), including multi-zone and multi-region deployments
Deep understanding of UNIX/Linux system internals and tools for troubleshooting application stack dumps and networking
Experience working with GitHub Actions
Prior experience in technical operations and exposure to tools/product development.
Familiarity with observability tools and best practices, and hands-on experience with applications like Chronosphere, Splunk, OpenSearch, GCP Observability Suite, Grafana, and OpenTelemetry (OTel)
The material job duties and responsibilities of this role include those listed above as well as adhering to Yahoo policies; exercising sound judgment; working effectively, safely and inclusively with others; exhibiting trustworthiness and meeting expectations; and safeguarding business operations and brand integrity.
Yahoo is proud to be an equal opportunity workplace. All qualified applicants will receive consideration for employment without regard to, and will not be discriminated against based on age, race, gender, color, religion, national origin, sexual orientation, gender identity, veteran status, disability or any other protected category. Yahoo will consider for employment qualified applicants with criminal histories in a manner consistent with applicable law. Yahoo is dedicated to providing an accessible environment for all candidates during the application process and for employees during their employment. If you need accessibility assistance and/or a reasonable accommodation due to a disability, please submit a request via the Accommodation Request Form (www.yahooinc.com/careers/contact-us.html) or call +1.866.772.3182. Requests and calls received for non-disability related issues, such as following up on an application, will not receive a response.
Yahoo has a high degree of flexibility around employee location and hybrid working. In fact, our flexible-hybrid approach to work is one of the things our employees rave about. Most roles don’t require specific regular patterns of in-person office attendance. If you join Yahoo, you may be asked to attend (or travel to attend) on-site work sessions, team-building, or other in-person events. When these occur, you’ll be given notice to make arrangements.
If you’re curious about how this factors into this role, please discuss with the recruiter.
Currently work for Yahoo? Please apply on our internal career site.