At a Glance
- Tasks: Build and optimise AI inference systems for cutting-edge multimodal models.
- Company: Join a pioneering AI company pushing the boundaries of superintelligence.
- Benefits: Competitive salary, hybrid work, and opportunities for professional growth.
- Other info: Hybrid role based in Paris or London, with a multicultural team atmosphere.
- Why this job: Shape the future of AI while collaborating with top talent in a dynamic environment.
- Qualifications: Strong software engineering skills and experience with deep learning frameworks.
The predicted salary is between £60,000 and £80,000 per year.
About H: H exists to push the boundaries of superintelligence with agentic AI. By automating complex, multi-step tasks typically performed by humans, AI agents will help unlock full human potential. H is hiring the world's best AI talent, seeking those who are dedicated as much to building safely and responsibly as to advancing disruptive agentic capabilities. We promote a mindset of openness, learning, and collaboration, where everyone has something to contribute.
About the Team: The Inference team builds and operates the systems that serve H's foundational models in production. We focus on multimodal inference and serving for Computer Use Agents, optimizing across both the inference engine layer (e.g., vLLM, SGLang) and the model serving layer (e.g., disaggregated inference, intelligent routing). Agentic inference brings constraints around context length, multimodality, and tool calls, which we address by co-designing with the Models team on training-time choices and with the agent teams on how models are deployed. We operate at the intersection of research and production, translating cutting-edge inference techniques into the systems that power H's next generation of agents. We are looking for strong engineers excited about inference to join the team and help shape the systems behind superintelligent AI.
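For a concrete flavour of the inference-engine layer mentioned above, here is a minimal sketch using vLLM's offline API. It is an illustration only, with a placeholder checkpoint and prompt rather than H's production stack.

```python
# Minimal sketch of the inference-engine layer (vLLM offline API).
# The checkpoint and prompt are placeholders, not H's production setup.
from vllm import LLM, SamplingParams

llm = LLM(model="Qwen/Qwen2-VL-7B-Instruct")  # placeholder multimodal checkpoint
params = SamplingParams(temperature=0.0, max_tokens=256)

# Agentic prompts tend to be long (history, tool results, screenshots), which is
# why context length and multimodality dominate the serving constraints above.
outputs = llm.generate(
    ["Given the current screenshot description, propose the next UI action."],
    params,
)
print(outputs[0].outputs[0].text)
```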
Key Responsibilities:
- Build and operate the inference stack that serves H's multimodal agentic models
- Improve latency, throughput, and cost of model serving across the stack
- Research and implement inference techniques tailored to agent workloads
- Co-design with the Models team on training-time decisions that affect inference
- Collaborate with cross-functional teams to integrate inference into agentic AI products
- Evaluate inference, serving, and hardware platforms, and communicate findings to stakeholders
- Stay current with advancements in inference, model serving, and accelerator technology
Requirements:
Technical skills:
- Strong software engineering track record
- Proficient in Python and at least one systems language (Rust, C++, or Go)
- Hands-on experience with deep learning frameworks (PyTorch, JAX), preferably in an industry setting
- Solid distributed systems fundamentals
- Experience working in a modern cloud environment and with production ML infrastructure (Kubernetes, etc.)
- Working knowledge of modern ML, including transformers and multimodal architectures
Research skills:
- Research engagement: an advanced degree with research output, publications at top-tier AI or systems venues (e.g., NeurIPS, ICML, MLSys, OSDI), research internships, or substantive open-source contributions
Soft skills:
- Excellent communication and presentation skills
- Strong collaboration and teamwork skills
- Passion for inference and AI
Preferred qualifications:
- Startup experience
- Hands-on experience with inference frameworks (vLLM, SGLang, TensorRT-LLM)
- Experience writing or modifying GPU kernels (CUDA, Triton, etc.)
- Edge or on-device inference experience (llama.cpp, MLX, ONNX Runtime, etc.)
- Experience with quantization, speculative decoding, disaggregated inference, or KV-cache compression (a brief quantization sketch follows this list)
- Experience with multimodal models and/or agentic systems
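To make the quantization item above more concrete, the short sketch below loads a publicly available AWQ-quantized checkpoint through vLLM's offline API. This is an illustrative assumption, not a description of H's serving setup.

```python
# Sketch: serving an AWQ-quantized checkpoint with vLLM, trading a small
# accuracy cost for lower memory use and higher throughput.
# The model name and prompt are placeholders.
from vllm import LLM, SamplingParams

llm = LLM(
    model="TheBloke/Mistral-7B-Instruct-v0.2-AWQ",  # public AWQ checkpoint (placeholder)
    quantization="awq",
)
outputs = llm.generate(
    ["Summarise the tool-call trace in one sentence."],
    SamplingParams(temperature=0.0, max_tokens=64),
)
print(outputs[0].outputs[0].text)
```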
Location: Paris or London. This role is hybrid, and you are expected to be in the office 3 days a week on average. Please expect some travel between offices on a reasonable cadence (e.g., every 4-6 weeks).
What We Offer:
- Join the exciting journey of shaping the future of AI
- Collaborate with a fun, dynamic and multicultural team, working alongside world-class AI talent in a highly collaborative environment
- Enjoy a competitive salary
- Unlock opportunities for professional growth, continuous learning, and career development
Research Engineer, Model Inference & Serving - London
Employer: H Company
Contact: H Company Recruiting Team
StudySmarter Expert Advice 🤫
We think this is how you could land the Research Engineer, Model Inference & Serving role in London
✨Tip Number 1
Network like a pro! Reach out to folks in the AI and tech community, especially those who work at H or similar companies. Attend meetups, webinars, or conferences to make connections that could lead to job opportunities.
✨Tip Number 2
Show off your skills! Create a portfolio showcasing your projects, especially those related to inference and AI. This can be a game-changer when it comes to standing out during interviews.
✨Tip Number 3
Prepare for technical interviews by brushing up on your coding skills and understanding of deep learning frameworks. Practice common algorithms and system design questions to impress the interviewers.
✨Tip Number 4
Don’t forget to apply through our website! It’s the best way to ensure your application gets seen by the right people. Plus, it shows you’re genuinely interested in joining our team.
Some tips for your application 🫡
Tailor Your CV: Make sure your CV reflects the skills and experiences that align with the Research Engineer role. Highlight your software engineering track record, especially in Python and any systems languages you know. We want to see how your background fits into our mission!
Craft a Compelling Cover Letter: Your cover letter is your chance to shine! Share your passion for inference and AI, and explain why you're excited about joining our team. Be sure to mention any relevant research or projects that showcase your expertise.
Showcase Your Research Experience: If you've got publications or research internships under your belt, flaunt them! We love seeing candidates who have engaged with cutting-edge research, so make sure to include any relevant work that demonstrates your capabilities.
Apply Through Our Website: We encourage you to apply directly through our website. It’s the best way for us to receive your application and ensures you’re considered for the role. Plus, it shows you’re keen on being part of our journey at H!
How to prepare for a job interview at H Company
✨Know Your Tech Inside Out
Make sure you’re well-versed in the technical skills listed in the job description. Brush up on Python, deep learning frameworks like PyTorch or JAX, and any systems languages you’ve worked with. Be ready to discuss your past projects and how they relate to inference techniques.
✨Showcase Your Research Experience
If you have an advanced degree or publications, be prepared to talk about them! Highlight any research internships or open-source contributions that demonstrate your engagement with cutting-edge AI topics. This will show your passion for the field and your ability to contribute to their innovative environment.
✨Collaboration is Key
Since the role involves working with cross-functional teams, think of examples where you successfully collaborated with others. Whether it’s co-designing with teams or integrating systems, showing that you can work well with diverse groups will set you apart.
✨Stay Current and Curious
Keep yourself updated on the latest advancements in inference and model serving. Mention any recent articles, papers, or technologies you’ve explored. This not only shows your enthusiasm but also your commitment to continuous learning, which aligns perfectly with their company culture.