Member of Technical Staff: LLM Inference Systems in London

Location: London. Full-time. Estimated salary: £60,000 - £80,000 per year. No home office possible.

At a Glance

Tasks: Develop cutting-edge inference technology for generative AI and optimise performance.
Company: Join a pioneering team dedicated to advancing large language models.
Benefits: Competitive pay, comprehensive benefits, and growth opportunities in tech.
Other info: Dynamic environment focused on innovation and professional development.
Why this job: Make a real impact in AI by solving complex inference challenges.
Qualifications: Experience with GPU architectures and deep learning libraries is a plus.

The predicted salary is between £60,000 and £80,000 per year.

We're seeking a Senior Research Engineer to join our mission of solving the hardest inference challenges in generative AI. You'll be responsible for developing cutting-edge inference technology at all levels of the inference stack. This could involve writing custom kernels for inference, designing compute clusters for unique inference needs, or contributing to state-of-the-art open-source inference engines.

Examples of projects you might work on:

Building and optimizing infrastructure for batch inference workloads: focusing on high-throughput, cost-efficient processing.
Running inference on fine-tuned models at scale: using tools like multi-LoRA and multi-PEFT inference engines.
Optimizing open-source inference engines for offloading-based inference: implementing inference optimizations for severely memory-constrained environments.

What We're Looking For:

Note: A good candidate will have 80% of the following qualities. Please apply, even if the following doesn't describe you perfectly.

Understanding of GPU architectures and their performance characteristics
Deep understanding of LLM inference workloads, performance characteristics, and optimization techniques
Familiarity with inference tooling and deep learning libraries (PyTorch, TensorRT, vLLM, SGLang, TensorRT-LLM)
Curiosity about emerging hardware trends and ML optimization techniques
Ability to understand complex research requirements and translate them into infrastructure needs
Comfort with ambiguity and rapidly evolving technical landscapes
Experience supporting research workflows and experimental systems

We're dedicated to making large language models faster, cheaper, and more accessible. Our infrastructure team is laser-focused on LLM inference optimization, pushing the boundaries of what's possible in terms of performance and cost efficiency while maintaining the reliability needed to serve these models at scale. We provide competitive compensation, comprehensive benefits, and opportunities for professional growth in one of the most exciting fields in technology.

Employer: Doubleword

Join a pioneering team dedicated to advancing generative AI through innovative LLM inference systems. We offer a collaborative work culture that fosters creativity and professional growth, alongside competitive compensation and comprehensive benefits. Located in a vibrant tech hub, our company provides unique opportunities to work on cutting-edge projects that shape the future of AI technology.

Contact Details:

Doubleword Recruiting Team

StudySmarter Expert Advice 🤫

We think this is how you could land the Member of Technical Staff: LLM Inference Systems role in London

✨Tip Number 1

Network like a pro! Reach out to folks in the industry, attend meetups, and connect with people on LinkedIn. You never know who might have the inside scoop on job openings or can refer you directly.

✨Tip Number 2

Show off your skills! Create a portfolio or GitHub repository showcasing your projects related to LLM inference systems. This gives potential employers a taste of what you can do and sets you apart from the crowd.

✨Tip Number 3

Prepare for interviews by brushing up on technical concepts and common questions in the field. Practice explaining your thought process clearly, especially around GPU architectures and optimization techniques.

✨Tip Number 4

Don’t forget to apply through our website! It’s the best way to ensure your application gets seen by the right people. Plus, we love seeing candidates who are genuinely interested in joining our mission.

We think you need these skills to ace the Member of Technical Staff: LLM Inference Systems role in London

GPU Architecture Understanding

LLM Inference Workloads

Performance Optimization Techniques

Inference Tooling Familiarity

Deep Learning Libraries (PyTorch, TensorRT, vLLM, SGLang, TensorRT-LLM)

Research Mindset

Curiosity about Emerging Hardware Trends

Machine Learning Optimization Techniques

Infrastructure Development for Inference

Batch Inference Workload Optimization

Experimental Systems Support

Adaptability to Rapidly Evolving Technical Landscapes

Some tips for your application 🫡

Tailor Your CV: Make sure your CV reflects the skills and experiences that align with the role of Member of Technical Staff. Highlight your understanding of GPU architectures and LLM inference workloads, as these are key to what we’re looking for.

Craft a Compelling Cover Letter: Use your cover letter to showcase your passion for generative AI and your curiosity about emerging hardware trends. This is your chance to tell us why you’re excited about the role and how you can contribute to our mission.

Showcase Relevant Projects: If you've worked on projects involving deep learning libraries or inference tooling, make sure to include them in your application. We love seeing practical examples of your work that demonstrate your technical skills and research mindset.

Apply Through Our Website: We encourage you to apply directly through our website. It’s the best way for us to receive your application and ensures you don’t miss out on any important updates from our team!

How to prepare for a job interview at Doubleword

✨Know Your Tech Inside Out

Make sure you brush up on your understanding of GPU architectures and LLM inference workloads. Be ready to discuss optimisation techniques and how they apply to real-world scenarios. This will show that you’re not just familiar with the theory, but you can also translate it into practical solutions.

✨Showcase Your Curiosity

Demonstrate your research mindset by discussing recent trends in hardware and ML optimisation techniques. Bring examples of how you've kept up with emerging technologies or contributed to projects that required innovative thinking. This will highlight your passion for the field and your ability to adapt.

✨Prepare for Technical Challenges

Expect to tackle some technical questions or problems during the interview. Practice explaining complex concepts clearly and concisely. You might even want to simulate a coding challenge or a system design discussion to get comfortable with articulating your thought process under pressure.

✨Align with Their Mission

Familiarise yourself with the company’s goals around making large language models faster and more accessible. Be prepared to discuss how your skills and experiences align with their mission. Showing that you understand and are excited about their objectives can set you apart from other candidates.
