LLM Inference & Deployment Engineer in London

London | Temporary | £60,000 - £80,000 / year (est.) | Home office (partial)
Synergetic

At a Glance

  • Tasks: Own the end-to-end inference infrastructure for large-scale LLMs in air-gapped environments.
  • Company: Join a cutting-edge tech firm focused on innovative AI solutions.
  • Benefits: Hybrid working, competitive pay, and a chance to work with advanced technologies.
  • Other info: Exciting opportunity for career growth in a specialist role.
  • Why this job: Make a real impact by deploying frontier models in a unique, compliance-driven setting.
  • Qualifications: Experience with multi-GPU deployments and model quantisation techniques required.

The predicted salary is between £60,000 and £80,000 per year.

3-month+ contract

Inside IR35

Hybrid working (2-3 days in London)

You've deployed 70B parameter models on GPU clusters with no Internet access. You know the difference between a model that works in a notebook and one that runs reliably in production under compliance scrutiny. If that's your world, we want to talk.

This is a genuinely specialist role. The platform you'll be working on runs multiple large-scale LLMs concurrently: frontier models for text screening, code LLMs for analysis, and transformer encoders for classification, all in an air-gapped environment with a fixed compute budget and zero external API access.

You'll own the inference infrastructure end-to-end: GPU allocation strategy, quantisation decisions, batching, determinism controls, and offline deployment packaging. The system has to be fast, reliable, and auditable. That's a rare combination of skills, and this role is for someone who has genuinely done it before.
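To give a flavour of the quantisation decisions mentioned above, here is a minimal pure-Python sketch of a symmetric 8-bit weight quantisation round-trip. This is a toy illustration of the precision/memory trade-off only; production schemes such as GPTQ or AWQ operate per-channel on GPU tensors with calibration data, not like this.

```python
# Toy symmetric int8 quantisation round-trip: illustrates the
# precision/memory trade-off behind GPTQ/AWQ-style schemes.
# Real quantisers work per-channel on GPU tensors; this is a sketch.

def quantise_int8(weights):
    """Map floats to int8 codes plus a single scale factor."""
    scale = max(abs(w) for w in weights) / 127.0 or 1.0
    codes = [max(-127, min(127, round(w / scale))) for w in weights]
    return codes, scale

def dequantise(codes, scale):
    """Recover approximate float weights from codes and scale."""
    return [c * scale for c in codes]

weights = [0.42, -1.31, 0.07, 0.98]
codes, scale = quantise_int8(weights)
restored = dequantise(codes, scale)

# Worst-case round-trip error is bounded by half a quantisation step.
max_err = max(abs(a - b) for a, b in zip(weights, restored))
assert max_err <= scale / 2
```

The point of the sketch is the trade the role turns on: each weight shrinks from 32 bits to 8 (or 4, for GPTQ/AWQ int4), at the cost of a bounded reconstruction error that has to be validated against model quality.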

What we're looking for:

  • Production experience with vLLM, TensorRT-LLM, TGI, or equivalent at multi-GPU scale
  • Model quantisation expertise: GPTQ, AWQ, GGUF, bitsandbytes
  • Multi-node inference: tensor/pipeline/expert parallelism
  • Air-gapped or classified environment deployment experience strongly preferred
  • Offline dependency packaging: conda-pack, pip wheels, container images
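The tensor-parallelism bullet above can be sketched conceptually: a layer's weight matrix is sharded column-wise across devices, each device computes a partial output, and the partials are summed (the all-reduce step). The following is a pure-Python stand-in, not how real systems do it; in practice the shards live on separate GPUs and the reduction runs over NCCL.

```python
# Conceptual sketch of tensor parallelism: a matrix-vector product is
# split column-wise across "devices", then partial outputs are summed
# (the all-reduce step). Pure Python stand-in for GPU shards.

def matvec(matrix, vec):
    """Reference single-device matrix-vector product."""
    return [sum(row[j] * vec[j] for j in range(len(vec))) for row in matrix]

def shard_columns(matrix, vec, n_shards):
    """Yield (weight shard, matching input slice) per simulated device."""
    cols = len(vec)
    step = cols // n_shards
    for s in range(n_shards):
        lo = s * step
        hi = cols if s == n_shards - 1 else (s + 1) * step
        yield [row[lo:hi] for row in matrix], vec[lo:hi]

def parallel_matvec(matrix, vec, n_shards=2):
    partials = [matvec(m, v) for m, v in shard_columns(matrix, vec, n_shards)]
    # "All-reduce": element-wise sum of the per-shard partial outputs.
    return [sum(p[i] for p in partials) for i in range(len(matrix))]

W = [[1, 2, 3, 4], [5, 6, 7, 8]]
x = [1, 0, 2, 1]
assert parallel_matvec(W, x, n_shards=2) == matvec(W, x)  # same result, sharded
```

Pipeline parallelism splits by layer rather than within a layer, and expert parallelism routes tokens to experts on different devices; all three trade communication cost against per-device memory, which is exactly the allocation maths this role owns.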

If you are available and interested in this role, please send a current CV.

LLM Inference & Deployment Engineer in London employer: Synergetic

As an LLM Inference & Deployment Engineer, you will join a forward-thinking company that values innovation and expertise in cutting-edge technology. With a hybrid working model based in London, the company fosters a collaborative work culture that encourages professional growth and development, offering unique opportunities to work on large-scale models in air-gapped environments. Employees benefit from a supportive atmosphere that prioritises compliance and reliability, making it an excellent place for those seeking meaningful and rewarding employment in a specialist field.

Contact Details:

Synergetic Recruiting Team

StudySmarter Expert Advice 🤫

We think this is how you could land the LLM Inference & Deployment Engineer role in London

✨Tip Number 1

Network, network, network! Reach out to folks in your industry on LinkedIn or at meetups. We all know that sometimes it’s not just what you know, but who you know that can get you in the door.

✨Tip Number 2

Prepare for those interviews like a pro! Research the company and the role thoroughly. We want you to be able to discuss how your experience with deploying large models fits perfectly into their needs.

✨Tip Number 3

Showcase your skills through projects or contributions. If you've worked on similar deployments or have relevant side projects, make sure to highlight them. We love seeing practical examples of your expertise!

✨Tip Number 4

Apply directly through our website! It’s the best way to ensure your application gets seen by the right people. Plus, we’re always on the lookout for talent that matches our needs.

We think you need these skills to ace the LLM Inference & Deployment Engineer role in London

LLM Deployment
GPU Cluster Management
Model Quantisation
vLLM
TensorRT-LLM
TGI
Multi-GPU Scaling
Multi-Node Inference
Air-Gapped Environment Experience
Offline Dependency Packaging
Conda-pack
Pip Wheels
Container Images
Production Reliability
Auditable Systems

Some tips for your application 🫡

Tailor Your CV: Make sure your CV highlights your experience with deploying large models in air-gapped environments. We want to see specific examples of your work with GPU clusters and any relevant tools you've used, like vLLM or TensorRT-LLM.

Showcase Your Skills: Don’t just list your skills; demonstrate them! Include details about your expertise in model quantisation and multi-node inference. We’re looking for someone who can really own the inference infrastructure, so make that clear!

Be Clear and Concise: Keep your application straightforward and to the point. We appreciate clarity, so avoid jargon unless it’s necessary. Make it easy for us to see why you’re a great fit for this specialist role.

Apply Through Our Website: We encourage you to apply directly through our website. It helps us keep track of applications better and ensures you don’t miss out on any important updates from us!

How to prepare for a job interview at Synergetic

✨Know Your Models Inside Out

Make sure you can discuss deploying large models with serving engines like vLLM or TensorRT-LLM in detail. Be prepared to explain your experience with model quantisation techniques and how they impact performance in air-gapped environments.

✨Demonstrate Your Infrastructure Skills

Be ready to talk about your end-to-end ownership of inference infrastructure. Highlight your strategies for GPU allocation, batching, and ensuring reliability under compliance scrutiny. Real-world examples will make your points more compelling.

✨Showcase Your Problem-Solving Abilities

Prepare to discuss challenges you've faced in deploying models in classified environments. Think of specific instances where you had to innovate or adapt your approach to meet strict requirements, and share those stories.

✨Familiarise Yourself with Offline Packaging

Brush up on your knowledge of offline dependency packaging tools like conda-pack and pip wheels. Being able to explain how you've used these tools in previous roles will demonstrate your readiness for this position.
