Member of Technical Staff: LLM Inference Systems

Doubleword

Full-Time · 36000 - 60000 € / year (est.) · No home office possible

At a Glance

Tasks: Develop cutting-edge inference technology for generative AI and optimise performance.
Company: Join a pioneering team dedicated to advancing large language models.
Benefits: Competitive pay, comprehensive benefits, and growth opportunities in tech.
Other info: Dynamic environment focused on innovation and professional development.
Why this job: Make a real impact in AI by solving complex inference challenges.
Qualifications: Experience with GPU architectures and deep learning libraries is essential.

The predicted salary is between 36000 and 60000 € per year.

About the Role

We are seeking a Senior Research Engineer to join our mission of solving the hardest inference challenges in generative AI. You will be responsible for developing cutting-edge inference technology at all levels of the inference stack. This could involve writing custom kernels for inference, designing compute clusters for unique inference needs, or contributing to state-of-the-art open source inference engines.

What You Will Do

Building and optimizing infrastructure for batch inference workloads: focusing on high-throughput, cost-efficient processing.
Running inference on fine-tuned models at scale: using tools like multi-LoRA and multi-PEFT inference engines.
Optimizing open source inference engines for offloading-based inference: implementing inference optimizations for severely memory-constrained environments.

What We Are Looking For

Note: A good candidate will have 80% of the following qualities. Please apply, even if the following doesn’t describe you perfectly.

Core Technical Skills

Understanding of GPU architectures and their performance characteristics.
Deep understanding of LLM inference workloads, performance characteristics, and optimization techniques.
Familiarity with inference tooling and deep learning libraries (PyTorch, TensorRT, vLLM, SGLang, TensorRT-LLM).

Research Mindset

Curiosity about emerging hardware trends and ML optimization techniques.
Ability to understand complex research requirements and translate them into infrastructure needs.
Comfort with ambiguity and rapidly evolving technical landscapes.
Experience supporting research workflows and experimental systems.

About Us

We are dedicated to making large language models faster, cheaper, and more accessible. Our infrastructure team is laser-focused on LLM inference optimization, pushing the boundaries of what is possible in terms of performance and cost efficiency while maintaining the reliability needed to serve these models at scale. We provide competitive compensation, comprehensive benefits, and opportunities for professional growth in one of the most exciting fields in technology.

Member of Technical Staff: LLM Inference Systems employer: Doubleword

Join a pioneering team dedicated to advancing generative AI through innovative LLM inference systems. We offer a collaborative work culture that fosters curiosity and creativity, alongside competitive compensation and comprehensive benefits. With ample opportunities for professional growth in a rapidly evolving field, our company is an excellent employer for those looking to make a meaningful impact in technology.

Contact Detail:

Doubleword Recruiting Team

StudySmarter Expert Advice🤫

We think this is how you could land the Member of Technical Staff: LLM Inference Systems role

✨Tip Number 1

Network like a pro! Reach out to folks in the industry, attend meetups, and connect with people on LinkedIn. You never know who might have the inside scoop on job openings or can refer you directly.

✨Tip Number 2

Show off your skills! Create a portfolio or GitHub repository showcasing your projects related to LLM inference systems. This gives potential employers a taste of what you can do and sets you apart from the crowd.

✨Tip Number 3

Prepare for interviews by brushing up on your technical knowledge and problem-solving skills. Practice common interview questions related to GPU architectures and LLM workloads so you can impress us with your expertise.

✨Tip Number 4

Don’t forget to apply through our website! It’s the best way to ensure your application gets seen by the right people. Plus, we love seeing candidates who are genuinely interested in joining our mission.

We think you need these skills to ace the Member of Technical Staff: LLM Inference Systems role

GPU Architecture Understanding

LLM Inference Workloads

Performance Optimization Techniques

Inference Tooling Familiarity

Deep Learning Libraries (PyTorch, TensorRT, vLLM, SGLang, TensorRT-LLM)

Research Mindset

Curiosity about Emerging Hardware Trends

ML Optimization Techniques

Infrastructure Development

Batch Inference Workload Optimization

Experimental Systems Support

Comfort with Ambiguity

Adaptability to Rapidly Evolving Technical Landscapes

Some tips for your application 🫡

Tailor Your CV: Make sure your CV reflects the skills and experiences that align with the role of Member of Technical Staff. Highlight your understanding of GPU architectures and any relevant projects you've worked on in LLM inference.

Craft a Compelling Cover Letter: Use your cover letter to showcase your curiosity about emerging hardware trends and ML optimisation techniques. This is your chance to express why you're passionate about generative AI and how you can contribute to our mission.

Showcase Your Research Mindset: In your application, emphasise your ability to navigate complex research requirements. Share examples of how you've translated research needs into practical solutions, especially in rapidly evolving tech landscapes.

Apply Through Our Website: We encourage you to apply directly through our website. It’s the best way for us to receive your application and ensures you’re considered for this exciting opportunity in LLM inference systems!

How to prepare for a job interview at Doubleword

✨Know Your Tech Inside Out

Make sure you brush up on your understanding of GPU architectures and LLM inference workloads. Be ready to discuss optimisation techniques and how they apply to real-world scenarios. This will show that you're not just familiar with the theory, but you can also translate it into practical solutions.

✨Showcase Your Research Mindset

Prepare to talk about your curiosity for emerging hardware trends and ML optimisation techniques. Think of examples where you've tackled complex research requirements and how you translated them into infrastructure needs. This will demonstrate your ability to adapt in a rapidly evolving technical landscape.

✨Familiarity with Tools is Key

Get comfortable with the inference tooling and deep learning libraries mentioned in the job description, like PyTorch and TensorRT. If you’ve worked with any of these tools, be ready to share specific projects or challenges you faced and how you overcame them.

✨Prepare for Ambiguity

Since the role involves comfort with ambiguity, think of instances where you've thrived in uncertain situations. Be prepared to discuss how you approach problem-solving when the path isn't clear, as this will highlight your adaptability and innovative thinking.
