Staff / Principal Machine Learning Engineer, Serving - UK

Full-Time, £140,000 - £200,000 / year (est.), no working from home possible

At a Glance

  • Tasks: Develop and optimise cutting-edge machine learning models for real-time applications.
  • Company: Join a top AI research lab backed by major investors and industry leaders.
  • Benefits: Competitive salary, equity options, and comprehensive benefits package.
  • Other info: Dynamic work environment with opportunities for growth and open-source contributions.
  • Why this job: Make a significant impact in the AI field with innovative technology and projects.
  • Qualifications: Experience in ML systems, programming, and a strong problem-solving mindset.

The predicted salary is between £140,000 and £200,000 per year.

About Inworld

Inworld is a product-oriented research lab of top AI researchers and engineers, developing best-in-class realtime multimodal models and the only realtime orchestration platform optimized for thousands of queries per second. Our technology has powered experiences from companies such as NVIDIA, Microsoft Xbox, Niantic, Logitech Streamlabs, Wishroll, Little Umbrella and Bible Chat. We’ve also been recognized by CB Insights as one of the 100 most promising AI companies globally and have been named one of LinkedIn's Top 10 Startups in the USA.

Who We're Looking For

A year ago, reliably working agentic systems and sub-second multimodal inference at scale barely existed. Nobody has a decade of experience here. So we're not screening for a resume template — we're looking for strong people from varied backgrounds who learn fast, thrive in ambiguity, and can show us what they've built, broken, and understood.

Experience We Find Useful

  • Inference Optimization: Deep understanding of modern serving frameworks and techniques like vLLM or TRT-LLM.
  • Model Acceleration: Hands-on experience with quantization, distillation, caching strategies, continuous batching, paged attention, and speculative decoding.
  • High-Performance Systems: Proficiency in C++, CUDA, Rust, or highly optimized Python. You know how to profile code and squeeze every ounce of performance out of NVIDIA GPUs.
  • Distributed Systems & Scaling: Experience with Kubernetes, Ray, custom load balancing, multi-GPU/multi-node inference, and reliably handling thousands of concurrent connections.
  • Public work: Non-trivial systems programming projects, open-source contributions to major inference engines, or deep-dive technical write-ups.
  • Full-cycle ownership: You can take a model from the research team, containerize it, optimize its serving, and ensure it runs reliably in production.
  • Background: PhD in CS, Physics, Math, or equivalent practical experience building backend or ML systems.

Who Thrives Here

  • You don’t need a roadmap to start walking; you’re comfortable picking a direction and building the map as you go.
  • You believe engineering isn't finished until it’s shipped and stable. You have a bias for impact over purely theoretical optimizations.
  • You don't just ship code; you obsess over the why. You’re the first to question an architecture if you think there’s a better way to solve the core latency or throughput problem.
  • You aren't satisfied with "the PM said so." You thrive on deep context and want to understand the fundamental logic behind every decision we make.

What Working Here Is Like

We hand you unclear problems and expect you to make them clear. We value engineers who say "I don't know yet" and then design the benchmark or prototype that finds out. We treat performance, latency, and reliability as first-class product features, not a box to check before launch. Impact comes before everything else, though we support sharing work and open-source contributions that move the field forward. Your work should be visible. Flat structure, fast iterations, minimal process theater.

The base salary range for this full-time position is £140,000 – £200,000. In addition to base pay, total compensation includes equity and benefits. Within the range, individual pay is determined by work location, level, and additional factors, including competencies, experience, and business needs. The base pay range is subject to change and may be modified in the future.

Candidates must already have the legal right to work in the United Kingdom, as visa sponsorship is not available for this role. For candidates interested in relocating to the San Francisco Bay Area in the future, full U.S. visa and relocation support may be available, subject to business needs and applicable legal and work authorization requirements.

Staff / Principal Machine Learning Engineer, Serving - UK employer: Inworld AI

Inworld is an exceptional employer for those passionate about AI and machine learning, offering a dynamic work culture that encourages innovation and impact. With a flat structure and a focus on meaningful contributions, employees are empowered to tackle complex challenges and drive advancements in technology. Located in the UK, Inworld provides competitive compensation, equity options, and opportunities for professional growth in a rapidly evolving field.

Contact Details:

Inworld AI Recruitment Team

StudySmarter Expert Advice 🤫

We think this is how you could land Staff / Principal Machine Learning Engineer, Serving - UK

Tip Number 1

Get your hands dirty with projects that showcase your skills. Build something cool, break it, and then fix it! This hands-on experience is what we want to see, so don’t be shy about sharing your journey.

Tip Number 2

Network like a pro! Connect with folks in the industry on platforms like LinkedIn or at meetups. You never know who might have the inside scoop on job openings or can refer you directly to us.

Tip Number 3

When you get that interview, come prepared with questions that show you’ve done your homework. We love candidates who are curious and want to understand the 'why' behind our tech and decisions.

Tip Number 4

Apply through our website! It’s the best way to ensure your application gets seen by the right people. Plus, it shows us you’re genuinely interested in being part of our team.

We think you need these skills to ace Staff / Principal Machine Learning Engineer, Serving - UK

Inference Optimization
Deep understanding of modern serving frameworks
Model Acceleration
Hands-on experience with quantization
Caching strategies
Proficiency in C++
CUDA

Some tips for your application 🫡

Show Us Your Passion: When you're writing your application, let your enthusiasm for machine learning and AI shine through. We want to see what excites you about the field and how you've engaged with it in your past projects.

Be Specific About Your Experience: Don't just list your skills; give us examples of how you've applied them. Whether it's optimising inference or working with distributed systems, share specific projects or challenges you've tackled that relate to the role.

Keep It Clear and Concise: While we love detail, make sure your application is easy to read. Use clear language and structure your thoughts logically. This helps us understand your journey and thought process better.

Apply Through Our Website: We encourage you to submit your application directly through our website. It’s the best way for us to keep track of your application and ensure it gets the attention it deserves!

How to prepare for a job interview at Inworld AI

Know Your Stuff

Make sure you have a solid grasp of inference optimisation and model acceleration techniques. Brush up on frameworks like vLLM or TRT-LLM, and be ready to discuss your hands-on experience with quantisation and caching strategies.

Show Your Work

Prepare to showcase any public work or contributions you've made to open-source projects. This could be anything from systems programming projects to technical write-ups that demonstrate your understanding of complex ML systems.

Embrace Ambiguity

Inworld values engineers who can navigate unclear problems. Be ready to discuss how you've tackled ambiguous situations in the past and how you approach designing benchmarks or prototypes to find solutions.

Ask Why

Don't just accept decisions at face value. Prepare thoughtful questions about the architecture and design choices made by the team. Show that you’re not only interested in shipping code but also in understanding the underlying logic behind every decision.