Inference Performance & Deployment - Member of Technical Staff in London

London | Full-Time | 60000 - 80000 € / year (est.) | Home office (partial)

At a Glance

Tasks: Design and optimise tooling for AI performance in real-world conditions.
Company: Join Callosum, a pioneering Intelligent Systems company revolutionising AI infrastructure.
Benefits: Competitive salary, flexible work options, and opportunities for professional growth.
Other info: Collaborative environment with a focus on empirical evidence and real-world impact.
Why this job: Be at the forefront of AI innovation and tackle complex challenges head-on.
Qualifications: Experience with large model inference and strong performance characterisation skills.

The predicted salary is between 60000 and 80000 € per year.

About Us

Artificial intelligence scaled on a bet - that bigger models, more identical chips, and more data would keep delivering. As problems grow more complex and the requirements of intelligence more diverse, that bet is breaking down. The next era belongs to heterogeneous intelligence: diverse models on diverse chips, each with distinct strengths, co-evolving into systems of capability unreachable by any single model or accelerator.

Callosum is the Intelligent Systems company. We built the infrastructure to make that possible. Our co-evolution engine optimises simultaneously across workflows, agents, and silicon. We launched in early 2026 showing orders-of-magnitude improvements in performance and a shift in the cost-performance frontier that no single chip or model provider can deliver. We believe intelligence comes from the system, not the model. We are scientists and engineers solving what others consider impossible. If you thrive on hard problems and are passionate about and energised by the scale of the challenge, we'd love to hear from you.

About the Role

Callosum believes that orders of magnitude improvements in AI systems will come through application-aware orchestration across heterogeneous hardware. We are building that vision: infrastructure that treats the full landscape of compute as a unified, co-evolving system, evolved beyond GPUs. This role owns the bridge between Callosum's internal engineering and the real world. You design the tooling and methodologies that ground our technology in real-world performance and behaviour, sitting at the integration point of every engineering function. You will be the first to run our heterogeneous infrastructure in production-equivalent conditions, systematically characterising performance, identifying bottlenecks, and driving decisions on production-readiness. Your work ensures that every layer of the stack is guided by empirical evidence rather than assumption.

What You’ll Build

Run experiments self-hosting models on cloud instances or on-prem across providers and hardware configurations, systematically characterising performance envelopes.
Develop and maintain deployment patterns that are reproducible, measurable, and optimised for latency, throughput, and cost.
Work on the orchestration and routing software that sits above the inference engine to improve caching, request scheduling, batching, and resource allocation.
Act as the integration point for the other roles: consume new accelerator support, engine features, and infrastructure upgrades, then provide high-quality feedback on bottlenecks and essential capabilities to guide stack optimisations.
Build and maintain benchmarking harnesses, regression suites, and performance dashboards that give the team a shared view of system health and progress.

What You Bring

Experience deploying and benchmarking large model inference in production or production-equivalent environments.
Familiarity with multi-node GPU deployments and associated networking/communication stacks.
Strong end-to-end performance characterisation skills: able to isolate whether a bottleneck is in the network, the runtime, the memory subsystem, or the model itself.
Familiarity with serving frameworks like Dynamo, Triton Inference Server, or similar orchestration layers.
Clear communication skills - able to translate performance data into actionable, prioritised feedback for the teams building the underlying systems.
A demonstrably disciplined and systematic approach to deployment: reproducibility, measurement methodology, and controlled comparisons.

Employer: Callosum

At Callosum, we pride ourselves on being at the forefront of intelligent systems, fostering a culture of innovation and collaboration. Our team thrives in an environment that encourages tackling complex challenges, with ample opportunities for professional growth and development. Located in a vibrant tech hub, we offer competitive benefits and a unique chance to contribute to groundbreaking advancements in AI technology.

Contact Details:

Callosum Recruiting Team

StudySmarter Expert Advice🤫

We think this is how you could land Inference Performance & Deployment - Member of Technical Staff in London

✨Tip Number 1

Network, network, network! Get out there and connect with people in the industry. Attend meetups, webinars, or conferences related to AI and tech. You never know who might have a lead on your dream job!

✨Tip Number 2

Show off your skills! Create a portfolio showcasing your projects, especially those related to deploying large model inference. This will give potential employers a taste of what you can do and how you approach complex problems.

✨Tip Number 3

Don’t just apply blindly! Tailor your approach for each role. Research Callosum and understand their vision. When you reach out, mention specific projects or values that resonate with you – it shows genuine interest.

✨Tip Number 4

Use our website to apply! We love seeing candidates who take the initiative to engage with us directly. Plus, it’s a great way to stay updated on new roles and company news. Let’s make this happen together!

We think you need these skills to ace Inference Performance & Deployment - Member of Technical Staff in London

Performance Characterisation

Benchmarking

Deployment Patterns

Cloud Computing

Multi-node GPU Deployments

Networking/Communication Stacks

Serving Frameworks (Dynamo, Triton Inference Server)

Data Analysis

Systematic Approach to Deployment

Reproducibility

Measurement Methodology

Controlled Comparisons

Clear Communication Skills

Integration and Orchestration

Some tips for your application 🫡

Show Your Passion: When writing your application, let your enthusiasm for tackling complex problems shine through. We want to see that you're not just looking for a job, but that you're genuinely excited about the challenges we face at Callosum.

Tailor Your Experience: Make sure to highlight your relevant experience with deploying and benchmarking large model inference. We’re keen on seeing how your skills align with our needs, so don’t hold back on showcasing your achievements in this area!

Be Clear and Concise: We appreciate clear communication, so keep your application straightforward. Use bullet points or short paragraphs to make it easy for us to digest your key points, especially when discussing your performance characterisation skills.

Apply Through Our Website: Don’t forget to submit your application through our website! It’s the best way for us to receive your details and ensures you’re considered for the role. Plus, it shows you’re serious about joining our team!

How to prepare for a job interview at Callosum

✨Know Your Stuff

Make sure you brush up on your knowledge of large model inference and multi-node GPU deployments. Familiarise yourself with the specific technologies mentioned in the job description, like Dynamo or Triton Inference Server. Being able to discuss these confidently will show that you're serious about the role.

✨Showcase Your Problem-Solving Skills

Prepare examples from your past experiences where you've tackled complex problems, especially in performance characterisation or deployment. Be ready to explain how you identified bottlenecks and what steps you took to resolve them. This will demonstrate your analytical thinking and hands-on experience.

✨Communicate Clearly

Practice translating technical performance data into clear, actionable insights. During the interview, focus on how you can convey complex information simply and effectively. This is crucial for the role, as you'll need to provide feedback to various teams, so showing you can do this well will set you apart.

✨Ask Insightful Questions

Prepare thoughtful questions about Callosum's approach to heterogeneous intelligence and their co-evolution engine. This not only shows your interest in the company but also gives you a chance to demonstrate your understanding of the challenges they face. It’s a great way to engage with the interviewers and leave a lasting impression.
