London, NA, GB
1 day ago
Search Inference - Senior MLOps Engineer

Elastic, the Search AI Company, enables everyone to find the answers they need in real time, using all their data, at scale — unleashing the potential of businesses and people. The Elastic Search AI Platform, used by more than 50% of the Fortune 500, brings together the precision of search and the intelligence of AI to enable everyone to accelerate the results that matter. By taking advantage of all structured and unstructured data — securing and protecting private information more effectively — Elastic’s complete, cloud-based solutions for search, security, and observability help organizations deliver on the promise of AI.

The Search Inference team is responsible for bringing performant, ergonomic, and cost-effective machine learning (ML) model inference to Search workflows. ML inference has become a crucial part of the modern search experience, whether used for query understanding, semantic search, RAG, or any other GenAI use case.

Our goal is to simplify ML inference in Search workflows by focusing on large-scale inference capabilities for embedding and reranking models that are available across the Elasticsearch user base. As a team, we are a collaborative, cross-functional group with backgrounds in information retrieval, natural language processing, and distributed systems. We work with Go microservices, Python, Ray Serve, and Kubernetes/KubeRay, and we work on AWS, GCP, and Azure.

We provide thought leadership through a variety of channels, including open code repositories, blog posts, and conference talks. We focus on meeting our customers' expectations for throughput, latency, and cost. We’re seeking an experienced MLOps Engineer to help us deliver on this vision.

Please include whatever info you believe is relevant in your application: resume, GitHub profile, code samples, blog posts and writing samples, links to personal projects, etc.

What You Will Be Doing

• Working with the team (and other teams) to evolve our inference service so it may host LLMs in addition to existing models (ELSER, E5, Rerank)

• Enhancing the scalability and reliability of the service, and working with the team to ensure knowledge is shared and best practices are followed

• Improving the cost and efficiency of the platform, making the best use of available infrastructure

• Adapting existing solutions to use our inference service, ensuring a seamless transition

What You Bring

• 5+ years working in an MLOps or related ML Engineering role

• Production experience self-hosting & operating LLMs at scale for generative tasks via an inference framework such as Ray or KServe (or similar)

• Production experience with running and tuning specialized hardware for Generative AI workloads, especially GPUs via CUDA

• Measured and articulate written and spoken communication skills. You work well with others and can craft concise and expressive thoughts into correspondence: emails, issues, investigations, documentation, onboarding materials, and so on.

• An interest in learning new tools, workflows and philosophies that can help you grow. You can function well in an environment that drives towards change. This role has tremendous opportunities for growth!

As a distributed company, diversity drives our identity. Whether you’re looking to launch a new career or grow an existing one, Elastic is the type of company where you can balance great work with great life. Your age is only a number. It doesn’t matter if you’re just out of college or your children are; we need you for what you can do.

We strive to have parity of benefits across regions, and while regulations differ from place to place, we believe taking care of our people is the right thing to do.

• Competitive pay based on the work you do here and not your previous salary

• Health coverage for you and your family in many locations

• Ability to craft your calendar with flexible locations and schedules for many roles

• Generous number of vacation days each year

• Increase your impact: we match up to $2,000 (or local currency equivalent) for financial donations and service

• Up to 40 hours each year to use toward volunteer projects you love

• Embracing parenthood with a minimum of 16 weeks of parental leave
