At a Glance
- Tasks: Compress generative AI models for edge and cloud deployment across Amazon's consumer devices.
- Company: Amazon Devices, a leader in innovative tech and consumer products.
- Benefits: Competitive salary, flexible work options, and opportunities for professional growth.
- Other info: Collaborative team environment with a focus on innovation and career development.
- Why this job: Make a real impact on cutting-edge AI technology used by millions worldwide.
- Qualifications: Master’s or PhD in relevant fields; programming skills in Java, C++, or Python.
The predicted salary is between £80,000 and £98,000 per year.
Amazon Devices is an inventive research and development company that designs and engineers high-profile devices like the Kindle family of products, Fire Tablets, Fire TV, Health & Wellness, Amazon Echo, and Astro products. This is an exciting opportunity to bring generative AI to Amazon's consumer products, both on-device at the edge and in the cloud. Our compression platform delivers 20x to 100x neural network compression, but using it well still takes weeks of hands-on learning and expert intuition. The Edge AI Model Studio team exists to change that.
We are looking for an Applied Scientist to join Model Studio and help compress the next generation of models for edge and cloud deployment across modalities, including large language models, vision-language models, speech and audio models, and omni models that reason jointly over text, audio, and video. You will apply and extend state-of-the-art compression recipes to real models, define the benchmarks and evaluation methodology that make trade-offs explicit, and build the reference implementations that let other teams deploy compressed models without our help.
Key job responsibilities:
- Apply and extend compression recipes (knowledge distillation, structured pruning, post-training quantization, and quantization-aware training, including low-bit and mixed-precision schemes) to assigned models, achieving 20x to 100x compression while preserving model quality.
- Design and run healing recipes (fine-tuning and distillation that recover accuracy lost to compression), iterating on data mixes, objectives, and training settings until the compressed model meets its quality bar.
- Track emerging model architectures and dissect how they work internally, so you can choose where to compress, anticipate where accuracy will break, and design recovery strategies grounded in the model's actual structure.
- Build a library of compression-ready model entries: reference implementations, compression recipes, model cards, and benchmark results that partner teams can run self-service to produce deployment-ready artifacts for edge and cloud targets.
- Define the datasets, benchmarks, and KPIs that matter for your models, and build evaluation methodology that makes accuracy, latency, memory, and cost trade-offs explicit.
- Run fast feasibility gates on new model families and modalities before committing to long efforts, and pivot early when a candidate does not clear the bar.
- Capture platform friction as high-signal feedback: minimal reproductions and tracked fix requests that help platform and compression-science partners root-cause issues, so partner teams never rediscover the same blockers.
- Write reproducible, testable, well-documented code that meets the SDE I bar, so your recipes and results can be reproduced and built on by others.
- Collaborate with Applied Scientists, platform and compiler engineers, hardware architects, and partner teams; mentor interns and help newer teammates ramp up.
- Where appropriate and not precluded by business considerations, publish and present on Amazon's behalf at top ML venues such as NeurIPS, ICLR, and MLSys.
A typical week mixes hands-on compression and evaluation with design discussions alongside fellow scientists and platform engineers. You work in a small, fast-moving team where every recipe you harden compounds across future models and every partner you unblock ships faster.
Basic Qualifications:
- Master’s degree, or a PhD and experience, in CS, CE, ML, or a related field.
- Experience programming in Java, C++, Python, or a related language.
- Patents or publications at top-tier peer-reviewed conferences or journals.
- Experience in architecture design for state-of-the-art deep learning models, in deep learning training and optimization, and in model pruning.
Preferred Qualifications:
- Experience with multimodal and omni models: vision-language models, audio-language or speech models, or omni architectures that jointly process text, audio, and video.
- Experience with neural network compression techniques (quantization, knowledge distillation, structured pruning, low-rank factorization) for resource-constrained deployment.
- Familiarity with mixed-precision training and inference (FP16, BF16, FP8, INT8, INT4) and low-bit quantization.
- Experience with edge deployment, model compilation, or inference optimization, and an understanding of hardware-aware trade-offs.
- Experience with large-scale ML systems, including profiling, debugging, and reasoning about system performance.
Amazon is an equal opportunities employer. We believe passionately that employing a diverse workforce is central to our success. We make recruiting decisions based on your experience and skills. We value your passion to discover, invent, simplify and build.
Applied Scientist, Edge AI and Science in Cambridge. Employer: Amazon Science
Amazon Devices is an exceptional employer that fosters a culture of innovation and collaboration, particularly in the dynamic field of Edge AI. Employees benefit from a fast-paced environment where they can work on cutting-edge technology, enjoy ample opportunities for professional growth, and contribute to impactful projects that reach millions of customers worldwide. With a commitment to diversity and inclusion, Amazon Devices empowers its workforce to thrive while pushing the boundaries of generative AI.