Inference Performance & Deployment - Member of Technical Staff

Full-Time · £60,000 - £80,000 per year (est.) · No working from home possible

At a Glance

Tasks: Run experiments and optimise AI performance across diverse hardware.
Company: Join Callosum, a pioneering Intelligent Systems company revolutionising AI infrastructure.
Benefits: Competitive salary, flexible work options, and opportunities for professional growth.
Other info: Dynamic team environment with a focus on collaboration and cutting-edge technology.
Why this job: Be at the forefront of AI innovation and tackle complex challenges head-on.
Qualifications: Experience in deploying large model inference and strong performance characterisation skills.

The predicted salary is between £60,000 and £80,000 per year.

About Us

Artificial intelligence scaled on a bet: that bigger models, more identical chips, and more data would keep delivering. As problems grow more complex and the requirements of intelligence more diverse, that bet is breaking down. The next era belongs to heterogeneous intelligence: diverse models on diverse chips, each with distinct strengths, co-evolving into systems with capabilities unreachable by any single model or accelerator.

Callosum is the Intelligent Systems company. We built the infrastructure to make that possible. Our co-evolution engine optimises simultaneously across workflows, agents, and silicon. We launched in early 2026 showing orders-of-magnitude improvements in performance and a shift in the cost-performance frontier that no single chip or model provider can match. We believe intelligence comes from the system, not the model.

We are scientists and engineers solving what others consider impossible. If you thrive on hard problems and are passionate about and energised by the scale of the challenge, we'd love to hear from you.

About the Role

Callosum believes that orders-of-magnitude improvements in AI systems will come through application-aware orchestration across heterogeneous hardware. We are building that vision: infrastructure that treats the full landscape of compute as a unified, co-evolving system, evolved beyond GPUs.

This role owns the bridge between Callosum's internal engineering and the real world. You will design the tooling and methodologies that ground our technology in real-world performance and behaviour, sitting at the integration point of every engineering function. You will be the first to run our heterogeneous infrastructure in production-equivalent conditions, systematically characterising performance, identifying bottlenecks, and driving decisions on production-readiness. Your work ensures that every layer of the stack is guided by empirical evidence rather than assumption.

What You’ll Build

Run experiments self-hosting models on cloud instances or on-prem across providers and hardware configurations, systematically characterising performance envelopes.
Develop and maintain deployment patterns that are reproducible, measurable, and optimised for latency, throughput, and cost.
Work on the orchestration and routing software that sits above the inference engine to improve caching, request scheduling, batching, and resource allocation.
Act as the integration point for the other roles: consume new accelerator support, engine features, and infrastructure upgrades, then provide high-quality feedback on bottlenecks and essential capabilities to guide stack optimisations.
Build and maintain benchmarking harnesses, regression suites, and performance dashboards that give the team a shared view of system health and progress.
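
For candidates wondering what a benchmarking harness with a regression check might look like at its simplest, here is a minimal Python sketch. It is an illustration only, not Callosum's tooling: the names benchmark, regressed, and send_request, the metric keys, and the 10% tolerance are all assumptions made for this example. It times sequential requests, reports latency percentiles and throughput, and flags metrics that are worse than a stored baseline.

```python
import statistics
import time
from typing import Callable, Dict, List


def benchmark(send_request: Callable[[], None],
              n_requests: int = 50,
              warmup: int = 5) -> Dict[str, float]:
    """Time sequential calls to send_request; summarise latency and throughput."""
    for _ in range(warmup):
        send_request()  # warm-up calls are excluded from the measurements

    latencies: List[float] = []
    start = time.perf_counter()
    for _ in range(n_requests):
        t0 = time.perf_counter()
        send_request()
        latencies.append(time.perf_counter() - t0)
    elapsed = time.perf_counter() - start

    cuts = statistics.quantiles(latencies, n=100)  # 99 percentile cut points
    return {
        "p50_s": statistics.median(latencies),
        "p95_s": cuts[94],
        "p99_s": cuts[98],
        "throughput_rps": n_requests / elapsed,
    }


def regressed(current: Dict[str, float],
              baseline: Dict[str, float],
              tolerance: float = 0.10) -> List[str]:
    """Return the baseline metrics that got worse by more than `tolerance`."""
    failures = []
    for name, base in baseline.items():
        value = current[name]
        if name.endswith("_s"):
            worse = value > base * (1 + tolerance)  # latency: higher is worse
        else:
            worse = value < base * (1 - tolerance)  # throughput: lower is worse
        if worse:
            failures.append(name)
    return failures
```

In practice send_request would wrap a call to whichever serving endpoint is under test, and a realistic harness would also issue concurrent requests and record the hardware, model, and software versions so that comparisons stay controlled.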

What You Bring

Experience deploying and benchmarking large model inference in production or production-equivalent environments.
Familiarity with multi-node GPU deployments and associated networking/communication stacks.
Strong end-to-end performance characterisation skills: able to isolate whether a bottleneck is in the network, the runtime, the memory subsystem, or the model itself.
Familiarity with serving frameworks like Dynamo, Triton Inference Server, or similar orchestration layers.
Clear communication skills - able to translate performance data into actionable, prioritised feedback for the teams building the underlying systems.
A demonstrably disciplined and systematic approach to deployment: reproducibility, measurement methodology, and controlled comparisons.

Inference Performance & Deployment - Member of Technical Staff employer: Callosum

At Callosum, we are at the forefront of revolutionising artificial intelligence through our innovative approach to heterogeneous intelligence. Our collaborative work culture fosters creativity and problem-solving, empowering employees to tackle complex challenges while enjoying opportunities for professional growth and development. Located in a dynamic environment, we offer competitive benefits and a unique chance to contribute to groundbreaking technology that shapes the future of AI.

Contact Details:

Callosum Recruitment Team

StudySmarter Expert Advice🤫

We think this is how you could land Inference Performance & Deployment - Member of Technical Staff

✨Tip Number 1

Network, network, network! Get out there and connect with people in the industry. Attend meetups, webinars, or even just chat with folks on LinkedIn. You never know who might have the inside scoop on job openings or can put in a good word for you.

✨Tip Number 2

Show off your skills! Create a portfolio or GitHub repository showcasing your projects and experiments. This is your chance to demonstrate your hands-on experience with deploying large model inference and performance characterisation. Make it easy for potential employers to see what you can do!

✨Tip Number 3

Prepare for interviews by brushing up on your technical knowledge and problem-solving skills. Be ready to discuss your experience with multi-node GPU deployments and how you've tackled bottlenecks in the past. Practice makes perfect, so consider mock interviews with friends or mentors.

✨Tip Number 4

Don't forget to apply through our website! We love seeing passionate candidates who resonate with our mission at Callosum. Tailor your application to highlight your relevant experience and enthusiasm for tackling complex challenges in AI systems.

We think you need these skills to ace Inference Performance & Deployment - Member of Technical Staff

Performance Characterisation

Benchmarking

Cloud Deployment

Multi-node GPU Deployments

Networking and Communication Stacks

Serving Frameworks (e.g., Dynamo, Triton Inference Server)

Latency Optimisation

Throughput Measurement

Cost Optimisation

Integration Skills

Clear Communication

Systematic Approach to Deployment

Reproducibility

Measurement Methodology

Some tips for your application 🫡

Show Your Passion: When writing your application, let your enthusiasm for tackling complex problems shine through. We want to see that you're not just looking for a job, but that you're genuinely excited about the challenges we face at Callosum.

Be Specific About Your Experience: Make sure to highlight your relevant experience with deploying and benchmarking large model inference. We love details, so share specific examples of how you've tackled similar challenges in production environments.

Communicate Clearly: Your ability to translate performance data into actionable insights is crucial. Use clear and concise language in your application to demonstrate your communication skills, as this will be key in our collaborative environment.

Apply Through Our Website: We encourage you to apply directly through our website. It’s the best way for us to receive your application and ensures you’re considered for the role. Plus, it shows you’re proactive and keen to join our team!

How to prepare for a job interview at Callosum

✨Know Your Tech Inside Out

Make sure you’re well-versed in the technologies mentioned in the job description, like multi-node GPU deployments and serving frameworks. Brush up on performance characterisation, as you'll need to demonstrate how you can isolate bottlenecks effectively.

✨Prepare Real-World Examples

Think of specific instances where you've deployed large model inference in production or production-equivalent environments. Be ready to discuss the challenges you faced, how you overcame them, and the impact of your work. This will show your practical experience and problem-solving skills.

✨Communicate Clearly

Practice translating complex performance data into simple, actionable insights. The interviewers will want to see that you can communicate effectively with both technical and non-technical team members. Consider preparing a few examples where your communication made a difference in a project.

✨Show Your Passion for Innovation

Since Callosum is all about pushing boundaries, express your enthusiasm for tackling hard problems and your interest in heterogeneous intelligence. Share any personal projects or research that align with their vision, as this will demonstrate your genuine interest in the role and the company.
