Mexico
1 day ago
Site Reliability Engineer

Site Reliability Engineering at Ford Motor Company plays a critical role in maintaining and improving the reliability, scalability, and performance of our services. You will work closely with our development teams to build and maintain large-scale, distributed systems and ensure our products meet our high standards for availability and user experience.

Basic Qualifications:

Bachelor’s degree in computer science, engineering, mathematics or equivalent experience.
3+ years of experience as an SRE, DevOps Engineer, Software Engineer or similar role.
3+ years of experience with Python, Java, C/C++, Ruby, and JavaScript.
3+ years of experience with J2EE, NoSQL/SQL datastores, Spring Boot, GCP/AWS/Azure, Docker/Kubernetes, RESTful APIs, and microservices platforms.
3+ years of experience with APM or other monitoring tools such as Dynatrace, New Relic, ELK, Splunk, Prometheus, Sensu, Nagios, Kafka, DataDog, PagerDuty.

Preferred Qualifications:

Strong experience with monitoring and observability tools, particularly Dynatrace and OpenTelemetry, or similar tools.
Experience using Cycode ASPM.
Proficient with cloud services, with a strong preference for Google Cloud Platform (GCP) experience.
Solid programming skills in Java, Golang, or other programming languages, with a good understanding of software development best practices.
Experience with relational and document databases.
Familiarity with front-end development frameworks, particularly React.
Ability to debug, optimize code, and automate routine tasks.
Strong problem-solving skills and the ability to work under pressure in a fast-paced environment.
Excellent verbal and written communication skills.

Key Responsibilities Include:

Continuously monitoring the availability, reliability, and performance of systems, platforms, and applications, maintaining a holistic view of system health.
Regularly reviewing key site technical metrics such as transaction errors, logging, response times, caching strategies, conversion/bounce rates, and capacity and resource utilization.
Providing primary operational and engineering support for multiple large, distributed software applications.
Proactively identifying stability risks and working with engineering leadership to establish appropriate mitigation plans.
Using automation tools, scripts, and processes to reduce or eliminate repetitive tasks, thereby improving the support provided by Site Reliability Engineering.
Creating or modifying Terraform files according to Ford formats to develop new monitoring dashboards and alert policies.