Member of technical staff (Inference) - London

London, Full-Time, £70,000 - £90,000 / year (est.), Home office (partial)

At a Glance

  • Tasks: Develop scalable AI inference pipelines and optimise model performance for cutting-edge technology.
  • Company: Join a pioneering AI startup focused on superintelligence and agentic capabilities.
  • Benefits: Competitive salary, hybrid work, and opportunities for professional growth and continuous learning.
  • Other info: Collaborative environment with world-class talent and exciting career development opportunities.
  • Why this job: Be part of a dynamic team shaping the future of AI and making a real impact.
  • Qualifications: MS or PhD in Computer Science or related fields; proficient in Python, Rust, or C/C++.

The predicted salary is between £70,000 and £90,000 per year.

About H: H exists to push the boundaries of superintelligence with agentic AI. By automating complex, multi-step tasks typically performed by humans, AI agents will help unlock full human potential. H is hiring the world’s best AI talent, seeking those who are dedicated as much to building safely and responsibly as to advancing disruptive agentic capabilities. We promote a mindset of openness, learning, and collaboration, where everyone has something to contribute.

About the Team: The Inference team develops and enhances the inference stack for serving H-models that power our agent technology. The team focuses on optimizing hardware utilization to reach high throughput, low latency and cost efficiency in order to deliver a seamless user experience.

Key Responsibilities:

  • Develop scalable, low-latency and cost-effective inference pipelines
  • Optimize model performance: memory usage, throughput, and latency, using advanced techniques like distributed computing, model compression, quantization and caching mechanisms
  • Develop specialized GPU kernels for performance-critical tasks like attention mechanisms, matrix multiplications, etc.
  • Collaborate with H research teams on model architectures to enhance efficiency during inference
  • Review state-of-the-art papers to improve memory usage, throughput and latency (FlashAttention, PagedAttention, continuous batching, etc.)
  • Prioritize and implement state-of-the-art inference techniques

Requirements:

Technical skills:

  • MS or PhD in Computer Science, Machine Learning or related fields
  • Proficient in at least one of the following programming languages: Python, Rust or C/C++
  • Experience in GPU programming such as CUDA, OpenAI Triton, Metal, etc.
  • Experience in model compression and quantization techniques

Soft skills:

  • Collaborative mindset, thriving in dynamic, multidisciplinary teams
  • Strong communication and presentation skills
  • Eager to explore new challenges

Bonuses:

  • Experience with LLM serving frameworks such as vLLM, TensorRT-LLM, SGLang, llama.cpp, etc.
  • Experience with CUDA kernel programming and NCCL
  • Experience with deep learning inference frameworks (PyTorch/ExecuTorch, ONNX Runtime, GGML, etc.)

Location: Paris or London. This role is hybrid, and you are expected to be in the office 3 days a week on average. The final decision for this will lie with the hiring manager for each individual role.

What We Offer:

  • Join the exciting journey of shaping the future of AI, and be part of the early days of one of the hottest AI startups
  • Collaborate with a fun, dynamic and multicultural team, working alongside world-class AI talent in a highly collaborative environment
  • Enjoy a competitive salary
  • Unlock opportunities for professional growth, continuous learning, and career development

If you want to change the status quo in AI, join us.

Member of technical staff (Inference) - London employer: H Company

At H, we are committed to pushing the boundaries of superintelligence with agentic AI, making us an exceptional employer for those passionate about innovation and responsible technology. Our London office fosters a vibrant work culture that values collaboration, continuous learning, and professional growth, offering competitive salaries and the chance to work alongside world-class talent in a dynamic, multicultural environment. Join us to be part of a transformative journey in AI, where your contributions will help unlock human potential.

Contact Details:

H Company Recruiting Team

StudySmarter Expert Advice 🤫

We think this is how you could land Member of technical staff (Inference) - London

✨Tip Number 1

Network like a pro! Reach out to people in the AI field, especially those at H. Use LinkedIn or even Twitter to connect and engage with them. A friendly message can go a long way in getting your foot in the door.

✨Tip Number 2

Show off your skills! Create a portfolio showcasing your projects, especially those related to inference pipelines or GPU programming. Share it during interviews or on your social media to grab attention.

✨Tip Number 3

Prepare for technical interviews by brushing up on your knowledge of model compression and quantization techniques. Practice coding challenges in Python, Rust, or C/C++ to demonstrate your proficiency.

✨Tip Number 4

Don’t forget to apply through our website! It’s the best way to ensure your application gets seen by the right people. Plus, it shows you’re genuinely interested in joining our team.

We think you need these skills to ace Member of technical staff (Inference) - London

Machine Learning
GPU Programming
CUDA
Model Compression
Quantization Techniques
Python
Rust
C/C++
Distributed Computing
Attention Mechanisms
Collaboration
Communication Skills
Presentation Skills
Deep Learning Inference Frameworks

Some tips for your application 🫡

Tailor Your CV: Make sure your CV reflects the skills and experiences that align with the role. Highlight your technical expertise in programming languages like Python, Rust, or C/C++, and any relevant GPU programming experience. We want to see how you can contribute to our mission!

Craft a Compelling Cover Letter: Your cover letter is your chance to shine! Share your passion for AI and how your background fits with our goals at H. Don’t forget to mention your collaborative mindset and eagerness to tackle new challenges – we love that!

Showcase Your Projects: If you've worked on any projects related to model compression, quantization, or GPU programming, make sure to include them. We’re keen to see real examples of your work and how you’ve tackled complex problems in the past.

Apply Through Our Website: We encourage you to apply directly through our website. It’s the best way for us to receive your application and ensures you don’t miss out on any important updates. Plus, it shows you’re serious about joining our team!

How to prepare for a job interview at H Company

✨Know Your Tech Inside Out

Make sure you’re well-versed in the technical skills listed in the job description. Brush up on your knowledge of GPU programming, model compression, and quantization techniques. Being able to discuss these topics confidently will show that you're not just a candidate, but a potential asset to the team.

✨Show Off Your Collaborative Spirit

Since the role emphasises collaboration, be prepared to share examples of how you've worked effectively in multidisciplinary teams. Highlight your communication skills and how you’ve contributed to group projects. This will demonstrate that you fit into their culture of openness and teamwork.

✨Stay Current with Research

Familiarise yourself with state-of-the-art papers related to inference techniques. Being able to discuss recent advancements like FlashAttention or continuous batching can set you apart. It shows that you’re proactive about learning and can bring fresh ideas to the table.

✨Prepare Questions That Matter

Think of insightful questions to ask during the interview. Inquire about the team’s current challenges with inference pipelines or how they approach optimising model performance. This not only shows your interest in the role but also your eagerness to contribute meaningfully from day one.
