At a Glance
- Tasks: Build and scale infrastructure for real-time AI inference across GPU fleets and servers.
- Company: Join a remote-first collective revolutionising AI products with cutting-edge technology.
- Benefits: Enjoy flexible hours, generous paid time off, and meaningful stock options.
- Other info: Collaborative culture with twice-yearly retreats and a focus on work-life balance.
- Why this job: Make a real impact on the future of AI while working in a dynamic environment.
- Qualifications: Strong DevOps experience and deep Linux knowledge required.
The predicted salary is between £70,000 and £90,000 per year.
Runware is building the API layer for the next generation of AI products. Our platform gives teams fast, reliable access to real-time inference across thousands of models through a single flexible API. We help customers build and scale media generation products with better performance, lower cost, and less operational complexity.
Behind this is an infrastructure platform built for speed, reliability, and GPU scale. New models launch constantly. Customer traffic can grow quickly. Performance matters at every layer.
We are looking for a Staff/Senior DevOps Engineer to help build, operate, and scale the infrastructure behind Runware’s global AI inference platform. You’ll play a critical role in making our systems faster, more resilient, easier to operate, and ready for the next stage of growth.
Runware’s infrastructure is the engine behind some of the fastest-growing AI products in the world. As a Staff/Senior DevOps Engineer, you’ll help design, build, and operate the systems that power real-time AI inference across large-scale GPU fleets and a global production platform.
This is not a traditional DevOps role. You’ll be working at the intersection of bare-metal infrastructure, GPUs, networking, automation, observability, and high-performance distributed systems. Your work will directly shape how quickly we can launch new models, scale customer traffic, recover from failures, and deliver low-latency AI experiences to millions of users.
You’ll turn complex, hardware-driven infrastructure into reliable, automated, developer-friendly platforms. From provisioning and orchestration to deployment pipelines, monitoring, incident response, and capacity scaling, you’ll help remove friction so engineering teams can move faster without compromising reliability.
You’ll build the foundations that let Runware scale with confidence: infrastructure that is fast, resilient, observable, secure, and built for the demands of real-time AI.
What you’ll do
- Build and scale the infrastructure that powers real-time AI inference across GPU fleets, bare-metal servers, serverless and containerised production systems.
- Help evolve Runware’s platform toward more elastic, on-demand infrastructure that can scale quickly with customer traffic and model demand.
- Make Runware faster, more reliable and more resilient by improving the critical paths behind our request entrypoints, inference services, queues, storage, load balancers and networking layer.
- Automate the hard parts of infrastructure operations, from provisioning and configuration through to CI/CD, deployment safety, progressive rollouts and rapid rollback.
- Build the observability backbone for a high-performance AI platform, with the signals needed to spot issues early, understand capacity and fix problems before customers feel them.
- Play a leading role in production operations, incident response, debugging and post-incident improvements, helping us turn operational challenges into a stronger platform.
- Strengthen the security and compliance foundations of our infrastructure through patching, secrets management, access controls, hardening, auditability, documentation and repeatable operational processes.
Requirements
- Strong experience as a DevOps Engineer, SRE, Infrastructure Engineer, Platform Engineer or similar, with a track record of running production systems at scale.
- Deep Linux knowledge and confidence debugging real production issues across networking, storage, performance, services and system behaviour.
- Hands-on experience building automation, Infrastructure-as-Code, CI/CD pipelines and deployment workflows that make infrastructure safer and easier to operate.
- Experience operating high-availability, low-latency or high-throughput platforms where reliability and performance directly affect customers.
- Strong networking fundamentals across TCP/IP, DNS, load balancing, routing, firewalls, proxies, TLS and HTTP.
- A calm and pragmatic approach under pressure, with strong communication, good judgement and a bias toward automation over manual toil.
Bonus
- Experience operating GPU infrastructure for AI/ML inference, including NVIDIA drivers, CUDA, container runtimes, GPU monitoring, capacity planning and workload isolation.
- Familiarity with inference serving and optimisation frameworks such as vLLM, TensorRT, Triton or similar.
Benefits
- We’re a remote-first collective, meeting in person twice a year to plan, brainstorm, celebrate wins, and enjoy some face-to-face time.
- We have core hours for cooperative working and calls, but outside of that your calendar is yours. Work the hours that let you perform at your peak while also building a healthy life.
- Our release cycles are fast and intense, but they’re followed by real downtime. After big pushes we expect the team to unplug, recharge, and come back stronger than ever, ready for the next leap.
- Generous paid time off – vacation, sick days, public holidays.
- Meaningful stock options – share in the upside you create.
- Remote-first setup – work from home anywhere we can employ you.
- Flexible hours – own your schedule outside core collaboration blocks.
- Family leave – paid maternity, paternity, and caregiver time.
- Company retreats – twice-yearly gatherings in inspiring locations.
Staff DevOps Engineer employer: Runware
Contact Details:
Runware Recruiting Team
StudySmarter Expert Advice 🤫
We think this is how you could land the Staff DevOps Engineer role
✨Tip Number 1
Network like a pro! Reach out to folks in the industry, attend meetups, and connect with people on LinkedIn. You never know who might have the inside scoop on job openings or can put in a good word for you.
✨Tip Number 2
Show off your skills! Create a portfolio or GitHub repository showcasing your projects and contributions. This is a great way to demonstrate your expertise in DevOps and make a lasting impression on potential employers.
✨Tip Number 3
Prepare for interviews by practising common DevOps scenarios and technical questions. We recommend doing mock interviews with friends or using online platforms to get comfortable with the format and types of questions you might face.
✨Tip Number 4
Don’t forget to apply through our website! It’s the best way to ensure your application gets seen by the right people. Plus, it shows you’re genuinely interested in joining our team at Runware.
Some tips for your application 🫡
Tailor Your CV: Make sure your CV reflects the skills and experiences that align with the Staff DevOps Engineer role. Highlight your experience with automation, infrastructure-as-code, and any relevant projects that showcase your ability to operate high-performance systems.
Craft a Compelling Cover Letter: Use your cover letter to tell us why you're passionate about AI and how your background makes you a great fit for our team. Share specific examples of how you've tackled challenges in previous roles, especially those related to scaling infrastructure and improving reliability.
Showcase Your Problem-Solving Skills: In your application, don’t shy away from discussing complex problems you've solved in past positions. We love seeing how you approach challenges, especially in high-pressure situations, so give us the details on your thought process and outcomes.
Apply Through Our Website: We encourage you to apply directly through our website. It’s the best way for us to receive your application and ensures you’re considered for the role. Plus, it gives you a chance to explore more about our culture and values!
How to prepare for a job interview at Runware
✨Know Your Tech Inside Out
Make sure you’re well-versed in the technologies mentioned in the job description, especially around Linux, networking, and automation. Brush up on your experience with CI/CD pipelines and Infrastructure-as-Code, as these will likely come up during technical discussions.
✨Showcase Your Problem-Solving Skills
Prepare to discuss specific challenges you've faced in previous roles, particularly around high-availability systems or low-latency platforms. Be ready to explain how you approached these issues, what solutions you implemented, and the outcomes of your actions.
✨Demonstrate Your Automation Mindset
Since the role emphasises automation, think of examples where you’ve automated processes or improved operational efficiency. Highlight any tools or frameworks you’ve used, and be prepared to discuss how these changes positively impacted your team’s workflow.
✨Be Ready for Scenario-Based Questions
Expect scenario-based questions that test your ability to handle real-world situations, such as incident response or debugging under pressure. Practice articulating your thought process clearly, as communication is key in a collaborative environment like Runware.