At a Glance
- Tasks: Own the end-to-end inference infrastructure for large-scale LLMs in air-gapped environments.
- Company: Join a cutting-edge tech firm focused on innovative AI solutions.
- Benefits: Hybrid working, competitive pay, and a chance to work with advanced technologies.
- Other info: Exciting opportunity for career growth in a specialist role.
- Why this job: Make a real impact by deploying frontier models in a unique, compliance-driven setting.
- Qualifications: Experience with multi-GPU deployments and model quantisation techniques required.
The predicted salary is between £60,000 and £80,000 per year.
3-month+ contract
Inside IR35
Hybrid working (2-3 days in London)
You've deployed 70B parameter models on GPU clusters with no Internet access. You know the difference between a model that works in a notebook and one that runs reliably in production under compliance scrutiny. If that's your world, we want to talk.
This is a genuinely specialist role. The platform you'll be working on runs multiple large-scale LLMs concurrently: frontier models for text screening, code LLMs for analysis, and transformer encoders for classification, all in an air-gapped environment with a fixed compute budget and zero external API access.
You'll own the inference infrastructure end-to-end: GPU allocation strategy, quantisation decisions, batching, determinism controls, and offline deployment packaging. The system has to be fast, reliable, and auditable. That's a rare combination of skills, and this role is for someone who has genuinely done it before.
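To put the quantisation decisions mentioned above in context: weight memory scales linearly with bits per parameter, which is why 4-bit formats like GPTQ and AWQ matter under a fixed compute budget. A back-of-the-envelope sketch (the 70B figure comes from the role description; the arithmetic ignores KV cache and activation memory):

```python
def weight_memory_gb(n_params: float, bits_per_weight: int) -> float:
    """Approximate model weight memory in GB (decimal), ignoring
    KV cache, activations, and framework overhead."""
    return n_params * bits_per_weight / 8 / 1e9

# A 70B-parameter model, as in the role description:
fp16 = weight_memory_gb(70e9, 16)  # 140.0 GB: multi-GPU before you even serve a request
int4 = weight_memory_gb(70e9, 4)   # 35.0 GB: far easier to fit in a fixed budget
print(f"FP16: {fp16:.0f} GB, 4-bit: {int4:.0f} GB")
```

Real-world footprints are higher once the KV cache and runtime overhead are added, but the linear scaling is the core of the trade-off.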
What we're looking for:
- Production experience with vLLM, TensorRT-LLM, TGI, or equivalent at multi-GPU scale
- Model quantisation expertise: GPTQ, AWQ, GGUF, bitsandbytes
- Multi-node inference: tensor/pipeline/expert parallelism
- Air-gapped or classified environment deployment experience strongly preferred
- Offline dependency packaging: conda-pack, pip wheels, container images
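As an illustration of the offline-packaging bullet, one common pattern is to resolve wheels on a connected build host and install them from a local directory on the air-gapped target. A hedged sketch that only constructs the two `pip` invocations (the `requirements.txt` and `wheels/` names are placeholders, not from the role spec):

```python
import sys

def offline_pip_commands(requirements: str, wheel_dir: str):
    """Build the two pip invocations for an air-gapped transfer:
    `download` runs on the connected host, `install` on the offline target."""
    download = [sys.executable, "-m", "pip", "download",
                "-r", requirements, "-d", wheel_dir]
    install = [sys.executable, "-m", "pip", "install",
               "--no-index", "--find-links", wheel_dir,
               "-r", requirements]
    return download, install

dl, inst = offline_pip_commands("requirements.txt", "wheels/")
print(" ".join(dl[2:]))    # pip download -r requirements.txt -d wheels/
print(" ".join(inst[2:]))  # pip install --no-index --find-links wheels/ -r requirements.txt
```

The key flag is `--no-index`, which stops pip from ever reaching for PyPI; everything must come from the transferred wheel directory.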
If you are available and interested in this role, please send a current CV.
LLM Inference & Deployment Engineer in London
Employer: Synergetic
Contact: Synergetic Recruiting Team
StudySmarter Expert Advice 🤫
We think this is how you could land the LLM Inference & Deployment Engineer role in London
✨Tip Number 1
Network, network, network! Reach out to folks in your industry on LinkedIn or at meetups. We all know that sometimes it’s not just what you know, but who you know that can get you in the door.
✨Tip Number 2
Prepare for those interviews like a pro! Research the company and the role thoroughly. We want you to be able to discuss how your experience with deploying large models fits perfectly into their needs.
✨Tip Number 3
Showcase your skills through projects or contributions. If you've worked on similar deployments or have relevant side projects, make sure to highlight them. We love seeing practical examples of your expertise!
✨Tip Number 4
Apply directly through our website! It’s the best way to ensure your application gets seen by the right people. Plus, we’re always on the lookout for talent that matches our needs.
Some tips for your application 🫡
Tailor Your CV: Make sure your CV highlights your experience with deploying large models in air-gapped environments. We want to see specific examples of your work with GPU clusters and any relevant tools you've used, like vLLM or TensorRT-LLM.
Showcase Your Skills: Don’t just list your skills; demonstrate them! Include details about your expertise in model quantisation and multi-node inference. We’re looking for someone who can really own the inference infrastructure, so make that clear!
Be Clear and Concise: Keep your application straightforward and to the point. We appreciate clarity, so avoid jargon unless it’s necessary. Make it easy for us to see why you’re a great fit for this specialist role.
Apply Through Our Website: We encourage you to apply directly through our website. It helps us keep track of applications better and ensures you don’t miss out on any important updates from us!
How to prepare for a job interview at Synergetic
✨Know Your Models Inside Out
Make sure you can discuss deploying large models with frameworks like vLLM or TensorRT-LLM in detail. Be prepared to explain your experience with model quantisation techniques and how they impact performance in air-gapped environments.
✨Demonstrate Your Infrastructure Skills
Be ready to talk about your end-to-end ownership of inference infrastructure. Highlight your strategies for GPU allocation, batching, and ensuring reliability under compliance scrutiny. Real-world examples will make your points more compelling.
✨Showcase Your Problem-Solving Abilities
Prepare to discuss challenges you've faced in deploying models in classified environments. Think of specific instances where you had to innovate or adapt your approach to meet strict requirements, and share those stories.
✨Familiarise Yourself with Offline Packaging
Brush up on your knowledge of offline dependency packaging tools like conda-pack and pip wheels. Being able to explain how you’ve used these tools in previous roles will demonstrate your readiness for this position.