At a Glance
- Tasks: Build cutting-edge inference engines for diverse hardware and optimise AI systems.
- Company: Join Callosum, a pioneering Intelligent Systems company revolutionising AI infrastructure.
- Benefits: Competitive salary, flexible work options, and opportunities for professional growth.
- Other info: Dynamic team environment focused on innovation and collaboration.
- Why this job: Tackle complex challenges in AI and make a real impact on the future of technology.
- Qualifications: Experience with SGLang, vLLM, high-performance Python, and C++/CUDA.
The predicted salary is between €60,000 and €80,000 per year.
About Us
Artificial intelligence scaled on a bet - that bigger models, more identical chips, and more data would keep delivering. As problems grow more complex and the requirements of intelligence more diverse, that bet is breaking down. The next era belongs to heterogeneous intelligence: diverse models on diverse chips, each with distinct strengths, co-evolving into systems of capability unreachable by any single model or accelerator.
Callosum is the Intelligent Systems company. We built the infrastructure to make that possible. Our co-evolution engine optimises simultaneously across workflows, agents, and silicon. We launched in early 2026 showing orders-of-magnitude improvements in performance and a shift in the cost-performance frontier that no single chip or model provider can match. We believe intelligence comes from the system, not the model. We are scientists and engineers solving what others consider impossible. If you thrive on hard problems and are energised by the scale of the challenge, we'd love to hear from you.
About the Role
Callosum believes that orders-of-magnitude improvements in AI systems will come through application-aware orchestration across heterogeneous hardware. We are building that vision: infrastructure that treats the full landscape of compute as a unified, co-evolving system that extends beyond GPUs. Inference engines were designed for single-model inference on homogeneous GPU clusters; this role takes them beyond that. Working directly on systems like vLLM and SGLang, you will adapt and extend them for heterogeneous resources, making them hardware-aware, with deeper optimisation around scheduling, memory, and execution. The execution strategies you design - parallelism, disaggregation, caching - will define what heterogeneous inference looks like at production scale. Your work ensures that the capabilities exposed by the lower layers of the stack translate into real, measurable gains, setting the new standard for how inference runs on diverse hardware.
What You'll Build
- Contribute upstream to SGLang and vLLM, and maintain internal forks where our requirements diverge
- Improve hardware-awareness within inference engines so that scheduling, memory management, and execution adapt to the capabilities of the underlying accelerator
- Design and implement bespoke parallelism and disaggregation strategies that go beyond default configurations to better exploit heterogeneous hardware
- Work closely with an Accelerator Systems Software engineer to ensure engine-level abstractions map cleanly onto diverse hardware capabilities
What You Bring
- Deep familiarity with the internals of SGLang, vLLM, or comparable inference serving frameworks - scheduler design, memory management, and execution pipelines
- Strong background in high-performance Python and C++/CUDA systems, particularly in the context of ML inference
- Experience designing or implementing parallelism strategies for large model serving
- Understanding of disaggregated serving architectures and the trade-offs involved in separating modules of a workflow
- Demonstrable record of working effectively in fast-moving open source codebases with evolving APIs and design conventions
Inference Engine Development - Member of Technical Staff in London | Employer: Callosum
At Callosum, we pride ourselves on being at the forefront of intelligent systems, fostering a culture of innovation and collaboration. Our team thrives in a dynamic environment where tackling complex challenges is not just encouraged but celebrated, offering ample opportunities for professional growth and development. Located in a vibrant tech hub, we provide our employees with access to cutting-edge resources and a supportive community that values diverse perspectives and ideas.
StudySmarter Expert Advice🤫
We think this is how you could land Inference Engine Development - Member of Technical Staff in London
✨Tip Number 1
Network like a pro! Reach out to folks in the industry, attend meetups, and connect with people on LinkedIn. You never know who might have the inside scoop on job openings or can refer you directly.
✨Tip Number 2
Show off your skills! Create a portfolio or GitHub repository showcasing your projects related to inference engines or heterogeneous systems. This gives potential employers a taste of what you can do beyond your CV.
✨Tip Number 3
Prepare for technical interviews by brushing up on your knowledge of SGLang and vLLM. Practice coding challenges and system design questions that focus on parallelism and memory management to impress your interviewers.
✨Tip Number 4
Apply through our website! It’s the best way to ensure your application gets seen by the right people. Plus, it shows you’re genuinely interested in joining our team at Callosum.
Some tips for your application 🫡
Show Your Passion for AI: When writing your application, let your enthusiasm for artificial intelligence shine through. We want to see how excited you are about tackling complex problems and pushing the boundaries of what's possible in AI systems.
Tailor Your Experience: Make sure to highlight your experience with SGLang, vLLM, or similar frameworks. We’re looking for specific examples of how you've worked on scheduling, memory management, and execution pipelines, so don’t hold back!
Be Clear and Concise: Keep your application straightforward and to the point. We appreciate clarity, so avoid jargon unless it’s necessary. Make it easy for us to see how your skills align with the role.
Apply Through Our Website: We encourage you to apply directly through our website. It’s the best way for us to receive your application and ensures you’re considered for the role. Plus, it shows you’re keen to join our team!
How to prepare for a job interview at Callosum
✨Know Your Frameworks Inside Out
Make sure you’re well-versed in SGLang, vLLM, and any comparable inference serving frameworks. Brush up on scheduler design, memory management, and execution pipelines. Being able to discuss these topics confidently will show that you’re not just familiar but truly knowledgeable.
✨Showcase Your Coding Skills
Since the role requires a strong background in high-performance Python and C++/CUDA systems, be prepared to demonstrate your coding skills. You might be asked to solve a problem on the spot, so practice coding challenges related to ML inference beforehand.
✨Understand Heterogeneous Systems
Dive deep into the concepts of heterogeneous hardware and disaggregated serving architectures. Be ready to discuss the trade-offs involved in separating modules of a workflow. This understanding will be crucial in showcasing your fit for the role.
✨Prepare for Open Source Discussions
Since the company values contributions to open source codebases, think about your past experiences in this area. Be ready to talk about how you've navigated evolving APIs and design conventions, and how you can bring that experience to Callosum.