Inference Engine Development - Member of Technical Staff

Callosum

Full-time | £60,000 to £80,000 per year (est.) | No working from home possible

At a Glance

Tasks: Build cutting-edge inference engines for diverse AI systems and optimise performance across hardware.
Company: Join Callosum, a pioneering company in intelligent systems and AI infrastructure.
Benefits: Competitive salary, flexible work options, and opportunities for professional growth.
Other info: Dynamic team environment focused on solving the impossible.
Why this job: Tackle complex challenges and shape the future of AI with innovative technology.
Qualifications: Experience with SGLang, vLLM, high-performance Python, and C++/CUDA.

The predicted salary is between £60,000 and £80,000 per year.

About Us

Artificial intelligence scaled on a bet: that bigger models, more identical chips, and more data would keep delivering. As problems grow more complex and the requirements of intelligence more diverse, that bet is breaking down. The next era belongs to heterogeneous intelligence: diverse models on diverse chips, each with distinct strengths, co-evolving into systems whose capabilities no single model or accelerator can reach. Callosum is the Intelligent Systems company, and we have built the infrastructure to make that possible. Our co-evolution engine optimises simultaneously across workflows, agents, and silicon. We launched in early 2026, demonstrating orders-of-magnitude performance improvements and a shift in the cost-performance frontier that no single chip or model provider can match. We believe intelligence comes from the system, not the model. We are scientists and engineers solving what others consider impossible. If you thrive on hard problems and are passionate about and energised by the scale of the challenge, we'd love to hear from you.

About the Role

Callosum believes that orders-of-magnitude improvements in AI systems will come from application-aware orchestration across heterogeneous hardware. We are building that vision: infrastructure that treats the full landscape of compute, well beyond GPUs, as a unified, co-evolving system. Today's inference engines were designed for single-model inference on homogeneous GPU clusters, and this role takes them further. Working directly on systems like vLLM and SGLang, you will adapt and extend them for heterogeneous resources. That means making them hardware-aware and optimising scheduling, memory, and execution more deeply. The execution strategies you design (parallelism, disaggregation, caching) will define what heterogeneous inference looks like at production scale. Your work ensures that the capabilities exposed by the lower layers of the stack translate into real, measurable gains and set a new standard for how inference runs on diverse hardware.

What You'll Build

Contribute upstream to SGLang and vLLM, and maintain internal forks where our requirements diverge
Improve hardware-awareness within inference engines so that scheduling, memory management, and execution adapt to the capabilities of the underlying accelerator
Design and implement bespoke parallelism and disaggregation strategies that go beyond default configurations to better exploit heterogeneous hardware
Work closely with an Accelerator Systems Software engineer to ensure engine-level abstractions map cleanly onto diverse hardware capabilities

What You Bring

Deep familiarity with the internals of SGLang, vLLM, or comparable inference serving frameworks: scheduler design, memory management, and execution pipelines
Strong background in high-performance Python and C++/CUDA systems, particularly in the context of ML inference
Experience designing or implementing parallelism strategies for large model serving
Understanding of disaggregated serving architectures and the tradeoffs involved in separating modules of a workflow
Demonstrable record of working effectively in fast-moving open source codebases with evolving APIs and design conventions

Callosum as an Employer

At Callosum, we pride ourselves on being an exceptional employer that fosters a culture of innovation and collaboration. Our team thrives in a dynamic environment where tackling complex challenges is not just encouraged but celebrated, offering ample opportunities for professional growth and development. Located at the forefront of AI technology, we provide our employees with access to cutting-edge resources and a supportive community that values diverse perspectives and ideas.

Contact Details:

Callosum Recruitment Team

StudySmarter Expert Advice🤫

We think this is how you could land the Inference Engine Development - Member of Technical Staff role

✨Tip Number 1

Network like a pro! Reach out to folks in the industry, attend meetups, and connect with people on LinkedIn. You never know who might have the inside scoop on job openings or can put in a good word for you.

✨Tip Number 2

Show off your skills! Create a portfolio or GitHub repository showcasing your projects related to inference engines or heterogeneous systems. This gives potential employers a taste of what you can do and sets you apart from the crowd.

✨Tip Number 3

Prepare for technical interviews by brushing up on your knowledge of SGLang, vLLM, and parallelism strategies. Practice coding challenges and system design questions that relate to high-performance Python and C++/CUDA systems.

✨Tip Number 4

Don’t forget to apply through our website! We love seeing passionate candidates who resonate with our mission. Tailor your application to highlight how your experience aligns with our vision of heterogeneous intelligence.

We think you need these skills to ace the Inference Engine Development - Member of Technical Staff role

SGLang

vLLM

Inference Serving Frameworks

Scheduler Design

Memory Management

Execution Pipelines

High-Performance Python

C++

CUDA

Parallelism Strategies

Disaggregated Serving Architectures

Open Source Codebases

API Design

Adaptability

Some tips for your application 🫡

Show Your Passion: When you're writing your application, let your enthusiasm for AI and complex problem-solving shine through. We want to see that you're not just ticking boxes but genuinely excited about the challenges we tackle at Callosum.

Tailor Your Experience: Make sure to highlight your experience with SGLang, vLLM, or similar frameworks. We’re looking for specific examples of how you've worked on scheduling, memory management, or execution pipelines, so don’t hold back!

Be Clear and Concise: While we love detail, clarity is key! Keep your application straightforward and to the point. Use bullet points if they help your achievements stand out; we want to see what you can bring to the table without wading through fluff.

Apply Through Our Website: We encourage you to apply directly through our website. It’s the best way to ensure your application gets into the right hands. Plus, it shows us you’re keen on being part of our team at Callosum!

How to prepare for a job interview at Callosum

✨Know Your Frameworks

Make sure you’re well-versed in SGLang, vLLM, or similar inference serving frameworks. Brush up on scheduler design, memory management, and execution pipelines. Being able to discuss your experience with these frameworks will show that you’re ready to hit the ground running.
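It can help to make "memory management" concrete before an interview. One way is to sketch the paging idea behind vLLM's PagedAttention: the KV cache is split into fixed-size blocks, and each sequence keeps a table of the blocks it owns. The minimal Python sketch below assumes a single device and uses a hypothetical `BlockAllocator` class. It is a study aid, not vLLM's or SGLang's actual implementation.

```python
from dataclasses import dataclass, field


@dataclass
class BlockAllocator:
    """Toy paged KV-cache allocator (hypothetical; not a real framework API).

    The cache is divided into num_blocks blocks of block_size tokens each.
    Every sequence owns a block table listing the block ids it uses.
    """
    num_blocks: int
    block_size: int  # tokens per block
    free: list = field(init=False, default_factory=list)
    tables: dict = field(init=False, default_factory=dict)   # seq_id -> [block ids]
    lengths: dict = field(init=False, default_factory=dict)  # seq_id -> tokens stored

    def __post_init__(self) -> None:
        self.free = list(range(self.num_blocks))

    def append_tokens(self, seq_id: int, n: int) -> bool:
        """Reserve room for n more tokens; return False if blocks run out."""
        table = self.tables.get(seq_id, [])
        length = self.lengths.get(seq_id, 0)
        # Blocks needed for the new length (ceiling division) minus blocks held.
        needed = -(-(length + n) // self.block_size) - len(table)
        if needed > len(self.free):
            return False  # a real scheduler would preempt or queue here
        for _ in range(needed):
            table.append(self.free.pop())
        self.tables[seq_id] = table
        self.lengths[seq_id] = length + n
        return True

    def release(self, seq_id: int) -> None:
        """Return all blocks of a finished sequence to the free pool."""
        self.free.extend(self.tables.pop(seq_id, []))
        self.lengths.pop(seq_id, None)


alloc = BlockAllocator(num_blocks=4, block_size=16)
print(alloc.append_tokens(0, 20))  # True: 20 tokens need 2 blocks
print(alloc.append_tokens(1, 40))  # False: 40 tokens need 3 blocks, only 2 free
```

Paging means a sequence only holds the blocks it has actually filled, rather than a contiguous slab sized for its maximum length. That is why a growing sequence can often extend without claiming new memory.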

✨Showcase Your Coding Skills

Since high-performance Python and C++/CUDA systems are crucial for this role, prepare to demonstrate your coding skills. You might be asked to solve a problem on the spot, so practice coding challenges related to ML inference to build your confidence.
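Continuous batching is a common warm-up in this area. New requests join the running batch between decoding steps instead of waiting for the whole batch to finish. The sketch below is a toy version built on simplifying assumptions: a fixed per-step token budget, whole-prompt prefill, and strict first-come-first-served admission. `Request`, `schedule_step`, and `finish_step` are hypothetical names. Production schedulers also have to deal with preemption, chunked prefill, and KV-cache limits.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Request:
    req_id: int
    prompt_len: int       # tokens processed in the prefill step
    max_new_tokens: int   # tokens to generate before the request finishes
    generated: int = 0


def schedule_step(waiting: deque, running: list, token_budget: int) -> list:
    """Admit waiting requests for the next step of a toy continuous-batching loop.

    Each running request needs one decode token this step. Whatever budget is
    left goes to prefilling waiting requests, first come first served, stopping
    at the first prompt that does not fit. Returns the newly admitted ids.
    """
    budget = token_budget - len(running)
    admitted = []
    while waiting and waiting[0].prompt_len <= budget:
        req = waiting.popleft()
        budget -= req.prompt_len
        running.append(req)
        admitted.append(req.req_id)
    return admitted


def finish_step(running: list) -> list:
    """Credit every running request with one new token; remove and return finished ids."""
    for req in running:
        req.generated += 1
    done = [r.req_id for r in running if r.generated >= r.max_new_tokens]
    running[:] = [r for r in running if r.generated < r.max_new_tokens]
    return done


waiting = deque([Request(0, 6, 2), Request(1, 5, 1), Request(2, 3, 1)])
running = []
print(schedule_step(waiting, running, token_budget=10))  # [0]
print(finish_step(running))                              # []
print(schedule_step(waiting, running, token_budget=10))  # [1, 2]
print(finish_step(running))                              # [0, 1, 2]
```

Request 1 waits a step because its 5-token prompt does not fit in the 4 tokens left after request 0. Strict FCFS admission keeps ordering predictable at some cost in utilisation, which is a trade-off worth discussing in an interview.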

✨Understand Heterogeneous Systems

Familiarise yourself with the concept of heterogeneous hardware and how it impacts inference engines. Be ready to discuss your understanding of disaggregated serving architectures and the trade-offs involved. This knowledge will help you stand out as someone who can contribute to the team’s vision.
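A back-of-the-envelope number makes the trade-offs easier to discuss. One common form of disaggregation runs prefill and decode on different devices, so each prompt's KV cache has to be moved between them. The sketch below assumes a standard decoder layout with grouped KV heads. It uses hypothetical model dimensions (32 layers, 8 KV heads, head dimension 128, 16-bit values) and a hypothetical 50 GB/s link. Swap in real figures for the model and interconnect you are talking about.

```python
def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   num_tokens: int, bytes_per_elem: int = 2) -> int:
    """Size of the KV cache for num_tokens tokens.

    The leading factor of 2 counts one key and one value vector per
    KV head, per layer, per token.
    """
    return 2 * num_layers * num_kv_heads * head_dim * num_tokens * bytes_per_elem


def transfer_seconds(num_bytes: int, link_gb_per_s: float) -> float:
    """Idealised transfer time, ignoring latency and protocol overhead."""
    return num_bytes / (link_gb_per_s * 1e9)


if __name__ == "__main__":
    # Hypothetical model shape and interconnect bandwidth, for illustration only.
    size = kv_cache_bytes(num_layers=32, num_kv_heads=8, head_dim=128, num_tokens=4096)
    print(f"KV cache: {size / 2**20:.0f} MiB")                      # 512 MiB
    print(f"Transfer: {transfer_seconds(size, 50) * 1e3:.1f} ms")   # 10.7 ms
```

Under these assumed figures, handing off a 4,096-token prompt costs roughly 10.7 ms of transfer per request. That cost has to be weighed against the interference avoided by keeping long prefills off the decode devices.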

✨Be Ready for Open Source Discussions

Since the role involves working with fast-moving open source codebases, prepare to talk about your past experiences in this area. Highlight any contributions you've made, how you adapted to evolving APIs, and how you collaborated with others in the community.
