Machine Learning Engineer II, AWS Just-Walk-Out Science Team
Amazon
Description
As part of the AWS Solutions organization, we have a vision to provide business applications, leveraging Amazon’s unique experience and expertise, that are used by millions of companies worldwide to manage day-to-day operations. We will accomplish this by accelerating our customers’ businesses through delivery of intuitive and differentiated technology solutions that solve enduring business challenges. We blend vision with curiosity and Amazon’s real-world experience to build opinionated, turnkey solutions. Where customers prefer to buy over build, we become their trusted partner with solutions that are no-brainers to buy and easy to use.
The Team
Just Walk Out (JWO) is a new kind of store with no lines and no checkout—you just grab and go! Customers simply use the Amazon Go app to enter the store, take what they want from our selection of fresh, delicious meals and grocery essentials, and go!
Our checkout-free shopping experience is made possible by our Just Walk Out Technology, which automatically detects when products are taken from or returned to the shelves and keeps track of them in a virtual cart. When you’re done shopping, you can just leave the store. Shortly after, we’ll charge your account and send you a receipt. Check it out at amazon.com/go. Designed and custom-built by Amazonians, our Just Walk Out Technology uses a variety of technologies including computer vision, sensor fusion, deep learning, and foundation models. Innovation is part of our DNA! Our goal is to be Earth’s most customer-centric company, and we are just getting started. We need people who want to join an ambitious program that continues to push the state of the art in computer vision, deep learning, real-time and distributed systems, and hardware design.
Everyone on the team needs to be entrepreneurial, wear many hats and work in a highly collaborative environment that’s more startup than big company. The team works on designing autonomous AI agents that can make intelligent decisions based on visual inputs, understand customer behavior patterns, and adapt to dynamic retail environments. This includes developing systems that can perform complex scene understanding, reason about object permanence, and predict customer intentions through visual cues.
Key job responsibilities
- Collaborate with Applied Scientists to integrate state-of-the-art model architectures into the training pipeline and state-of-the-art multi-modal LLMs (MLLMs) into the auto-labeling pipeline.
- Collaborate with Applied Scientists to process massive datasets and scale machine learning models while optimizing GPU utilization, memory management, and training workflows (e.g., kernel fusion, mixed-precision training, gradient accumulation, optimizer-state offloading, and large-scale parallelization; a brief sketch of these techniques follows this list).
- Design and maintain large-scale distributed training systems to support multi-modal foundation models for autonomous retailing. Optimize GPU utilization for efficient model training and fine-tuning on massive datasets.
- Develop robust monitoring and debugging tools to ensure the reliability and performance of training workflows on large GPU clusters. Design and maintain the large-scale auto-labeling pipeline.
- Collaborate with Engineers and Applied Scientists to investigate design approaches, prototype new technology, evaluate technical feasibility, and identify and solve complex problems.
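For illustration, the following is a minimal PyTorch sketch of two of the training optimizations named above: mixed-precision training (with dynamic loss scaling) and gradient accumulation. The model, data, and hyperparameters are hypothetical placeholders, not JWO code.

```python
# Minimal sketch (hypothetical, not JWO production code): mixed-precision
# training with gradient accumulation in PyTorch. Requires a CUDA device.
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

ACCUM_STEPS = 4  # micro-batches accumulated per optimizer step

model = nn.Sequential(nn.Linear(512, 1024), nn.ReLU(), nn.Linear(1024, 10)).cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
scaler = torch.cuda.amp.GradScaler()  # dynamic loss scaling for fp16 stability
criterion = nn.CrossEntropyLoss()

# Synthetic data standing in for a real training set.
dataset = TensorDataset(torch.randn(1024, 512), torch.randint(0, 10, (1024,)))
loader = DataLoader(dataset, batch_size=32, shuffle=True)

model.train()
optimizer.zero_grad(set_to_none=True)
for step, (inputs, targets) in enumerate(loader):
    inputs, targets = inputs.cuda(), targets.cuda()
    with torch.cuda.amp.autocast():           # forward pass in mixed precision
        loss = criterion(model(inputs), targets) / ACCUM_STEPS
    scaler.scale(loss).backward()             # accumulate scaled gradients
    if (step + 1) % ACCUM_STEPS == 0:
        scaler.step(optimizer)                # unscale gradients and update weights
        scaler.update()                       # adjust the loss scale
        optimizer.zero_grad(set_to_none=True)
```

Gradient accumulation trades extra forward/backward passes for a larger effective batch size without additional GPU memory, which is why it often appears alongside mixed precision in memory-constrained training.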
A day in the life
As an MLE on the JWO team, you will be responsible for leading the development of novel algorithms and modeling techniques to advance the state of the art of model training on hardware such as NVIDIA GPUs. Your work will directly impact our customers in the form of products and services that make use of Just Walk Out innovations. You will leverage Amazon’s heterogeneous data sources and large-scale computing resources to accelerate development with multi-modal foundation models and other Artificial Intelligence (AI) applications. As a key player in our team, you'll have a significant influence on our overall strategy, shaping the future direction of JWO at Amazon. You'll be the driving force behind our system architecture and the champion of best practices that ensure infrastructure of the highest quality. You will work in an Agile/Scrum environment to move fast and deliver high-quality software.
Basic Qualifications
- 3+ years of non-internship professional software development experience, including coding standards, code reviews, source control management, build processes, testing, and operations.
- 2+ years of non-internship experience in the design or architecture (design patterns, reliability, and scaling) of new and existing systems.
- Proficient in Python or a related language.
- Hands-on model training experience in PyTorch and deep learning frameworks such as MMEngine or Megatron-LM; experienced in large-scale deep learning or machine learning operations.
- Familiar with modern vision-language models, multi-modal AI systems, and pre-training and post-training techniques. Proficient with training profilers and performance-analysis tools to identify and eliminate bottlenecks in model training.
Preferred Qualifications
- Master's or PhD degree in computer science or equivalent.
- 1+ years of experience in developing, deploying or optimizing ML models. Exceptional engineering skills in building, testing, and maintaining scalable distributed GPU training frameworks. Familiar with HuggingFace Transformers for vision-language modeling.
- Hands-on experience in large-scale multi-modal LLM and generative model training. Contributions to popular open-source LLM frameworks or research publications in top-tier AI conferences such as CVPR, ECCV, ICCV, or ICLR.
- Experience with GPU utilization and memory-optimization techniques such as kernel fusion and custom kernels; mixed-precision training with lower-precision formats and dynamic loss scaling; gradient (activation) checkpointing; gradient accumulation; optimizer-state offloading and smart prefetching; Fully Sharded Data Parallel (FSDP); and tensor and pipeline model parallelism (a minimal FSDP sketch follows this list).
- Proven experience in large-scale video understanding tasks, with a focus on multi-modal learning that integrates visual and/or textual information; includes experience designing efficient data preprocessing pipelines, building and scaling multi-modal model architectures, and conducting robust evaluation at scale.
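As context for the parallelism techniques listed above, here is a minimal, hypothetical sketch of sharding a toy model with PyTorch Fully Sharded Data Parallel (FSDP); it assumes a standard torchrun launch and is not specific to JWO’s training stack.

```python
# Hypothetical FSDP sketch; launch with: torchrun --nproc_per_node=<num_gpus> fsdp_sketch.py
import torch
import torch.distributed as dist
from torch import nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from torch.distributed.fsdp import MixedPrecision

def main():
    dist.init_process_group("nccl")                        # one process per GPU
    local_rank = dist.get_rank() % torch.cuda.device_count()
    torch.cuda.set_device(local_rank)

    # Toy model standing in for a large multi-modal backbone.
    model = nn.Sequential(nn.Linear(1024, 4096), nn.GELU(), nn.Linear(4096, 1024)).cuda()

    # FSDP shards parameters, gradients, and optimizer state across ranks;
    # compute runs in bf16 while gradient reductions stay in fp32.
    model = FSDP(
        model,
        mixed_precision=MixedPrecision(param_dtype=torch.bfloat16,
                                       reduce_dtype=torch.float32),
    )
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

    for _ in range(10):                                    # toy training loop
        batch = torch.randn(8, 1024, device="cuda")
        loss = model(batch).pow(2).mean()
        loss.backward()
        optimizer.step()
        optimizer.zero_grad(set_to_none=True)

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

Techniques such as activation checkpointing and CPU offloading of optimizer state can be layered on top of the same wrapper when memory is the binding constraint.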
Amazon is an equal opportunity employer and does not discriminate on the basis of protected veteran status, disability, or other legally protected status.
Los Angeles County applicants: Job duties for this position include: work safely and cooperatively with other employees, supervisors, and staff; adhere to standards of excellence despite stressful conditions; communicate effectively and respectfully with employees, supervisors, and staff to ensure exceptional customer service; and follow all federal, state, and local laws and Company policies. Criminal history may have a direct, adverse, and negative relationship with some of the material job duties of this position. These include the duties and responsibilities listed above, as well as the abilities to adhere to company policies, exercise sound judgment, effectively manage stress and work safely and respectfully with others, exhibit trustworthiness and professionalism, and safeguard business operations and the Company’s reputation. Pursuant to the Los Angeles County Fair Chance Ordinance, we will consider for employment qualified applicants with arrest and conviction records.
Our inclusive culture empowers Amazonians to deliver the best results for our customers. If you have a disability and need a workplace accommodation or adjustment during the application and hiring process, including support for the interview or onboarding process, please visit https://amazon.jobs/content/en/how-we-hire/accommodations for more information. If the country/region you’re applying in isn’t listed, please contact your Recruiting Partner.
Our compensation reflects the cost of labor across several US geographic markets. The base pay for this position ranges from $129,300/year in our lowest geographic market up to $223,600/year in our highest geographic market. Pay is based on a number of factors including market location and may vary depending on job-related knowledge, skills, and experience. Amazon is a total compensation company. Dependent on the position offered, equity, sign-on payments, and other forms of compensation may be provided as part of a total compensation package, in addition to a full range of medical, financial, and/or other benefits. For more information, please visit https://www.aboutamazon.com/workplace/employee-benefits . This position will remain posted until filled. Applicants should apply via our internal or external career site.