At a Glance
- Tasks: Build and maintain high-availability systems while pushing the boundaries of AI technology.
- Company: Join a cutting-edge company focused on superintelligence and innovative solutions.
- Benefits: Competitive salary, flexible work options, and opportunities for professional growth.
- Other info: Collaborative environment with a focus on innovation and career advancement.
- Why this job: Make a real impact in the AI space and drive system reliability.
- Qualifications: 5+ years in infrastructure engineering with strong AI tooling experience.
The predicted salary is between £70,000 and £90,000 per year.
Requirements
- 5+ years of experience in infrastructure engineering, DevOps, or a similar role focused on building and operating large-scale, high-availability production systems at a high-growth product company.
- Experience running containerised workloads in production (a real cluster, not a lab), using Helm and Terraform or Pulumi on at least one major cloud (AWS preferred), plus good proficiency in Python or Go for automation and tooling.
- AI is part of how you ship, not a thing you've read about — agentic tooling (Claude Code, Droid, Codex, internal skills) is in your daily loop, you've built or adopted AI-assisted workflows others now use, and you have strong opinions on where it's unreliable. This is a hard requirement, not a bonus.
- Demonstrated ability to challenge the status quo, proactively identify systemic weaknesses, and propose innovative solutions to complex reliability problems — reason from constraints and failure modes (not analogy or vendor defaults), name the tradeoff in business terms (reliability vs. velocity, cost vs. blast radius, standardisation vs. one-off), and reject the "best practices" answer when it doesn't fit the problem.
- Make reversible calls by default — write the rollback before you touch production, work fluently with monitoring and logging stacks (Prometheus, Grafana, ELK or equivalent), and stress the system in safe places so it comes back stronger.
- Excellent communication, collaboration, and problem-solving skills, with a talent for building strong relationships and connecting with cross-functional teams.
- A strong sense of ownership and accountability, eager to own mission-critical systems and drive them toward peak performance and unparalleled reliability.
- At least one 0-to-1 infrastructure build you owned end-to-end, with the outcome metric attached (Desirable).
- A software-engineering background, not only config and scripting — you've designed, built, and shipped non-trivial production code (services, libraries, internal frameworks) in Python, Go, or a comparable language.
What the job involves
- At WRITER, our mission to expand human capacity with superintelligence relies on a foundational truth: our platform must be available, performant, and reliable, 24/7.
- As an Infrastructure Engineer, you'll be at the heart of making this a reality, impacting every enterprise customer who trusts us with their AI-powered workflows.
- This isn't just about keeping the lights on; it's about pushing the boundaries of what's possible, proactively identifying and solving complex systemic challenges, and laying the groundwork for our rapid growth and the evolving demands of enterprise generative AI.
- You'll build resilient systems, automate across the stack, and champion reliability best practices, directly enabling our ambitious product roadmap and ensuring our customers always have access to the powerful tools they need.
- Bring deep focus to one problem at a time, with the breadth to move between SRE, DevOps, Infrastructure, and Platform work over a quarter or two as the leverage shifts.
- Challenge the status quo and remove toil before adding features — automate operational tasks and infrastructure management with Python or Go, reject tools that don't fit the problem, and treat manual on-call work as a defect to be designed out, not a status quo to be staffed up.
- Design scalable, fault-tolerant infrastructure across AWS (preferred), GCP, and Azure, working fluently across Kubernetes, Helm, Terraform, and the supporting cloud and AI tooling that backs WRITER's high-traffic platform.
- Run agents in your daily loop — Claude Code, Droid, Codex, internal skills — to investigate incidents, draft Terraform / Helm changes, write runbooks, scaffold tooling, and review PRs.
- Lead incident response, post-mortems, and root-cause analyses — trace failures to the underlying problem (never the symptom), apply the learning back into the architecture, and prevent the same incident from happening twice.
- Own the reliability, performance, and efficiency of WRITER's core services end-to-end — define and uphold the SLOs and error budgets, carry the on-call pager, and stand behind the outcome metric, not just the system you shipped.
- Balance this week's critical work with the 6–12-month platform direction — ship the on-call-driving fix today while shaping the multi-year observability, cost, and reliability investments that move WRITER's enterprise customers.
- Operate at the seams with product, security, and engineering peers — provide expert guidance on system design for reliability, performance, and scalability from conception through launch.
Infrastructure Engineer employer: Writer
At WRITER, we pride ourselves on being an exceptional employer that fosters a culture of innovation and collaboration. Our Infrastructure Engineers play a pivotal role in shaping the future of AI-powered workflows, with ample opportunities for professional growth and development in a dynamic, high-growth environment. Located in a vibrant tech hub, we offer competitive benefits, a commitment to work-life balance, and a unique chance to work with cutting-edge technologies that push the boundaries of what's possible.
StudySmarter Expert Advice🤫
We think this is how you could land the Infrastructure Engineer role
✨Tip Number 1
Network like a pro! Attend industry meetups, conferences, or even online webinars. Connect with fellow engineers and recruiters on LinkedIn. You never know who might have the inside scoop on your dream job!
✨Tip Number 2
Show off your skills! Create a portfolio showcasing your projects, especially those involving AI tooling and infrastructure automation. This gives potential employers a taste of what you can do and sets you apart from the crowd.
✨Tip Number 3
Prepare for interviews by practising common technical questions and scenarios related to infrastructure engineering. Use mock interviews with friends or online platforms to get comfortable discussing your experience and problem-solving approach.
✨Tip Number 4
Apply through our website! We love seeing candidates who are genuinely interested in joining us at StudySmarter. Tailor your application to highlight your experience with containerisation, cloud services, and AI workflows to catch our eye.
Some tips for your application 🫡
Show Off Your Experience: Make sure to highlight your 5+ years of experience in infrastructure engineering or DevOps. We want to see how you've built and operated large-scale systems, so don’t hold back on the details!
Get Technical: We’re looking for your hands-on experience with containerisation, Helm, Terraform, or Pulumi. If you’ve worked with AWS or any major cloud platform, let us know how you’ve used these tools in real-world scenarios.
AI is Key: Don’t forget to mention how AI tooling is part of your daily workflow. Share specific examples of how you’ve integrated AI into your processes, as this is a must-have for us!
Be Yourself: When writing your application, let your personality shine through! We value authenticity and want to get to know the real you. Remember to apply through our website for the best chance!
How to prepare for a job interview at Writer
✨Know Your Tech Inside Out
Make sure you’re well-versed in the technologies mentioned in the job description, especially AWS, Kubernetes, Helm, and Terraform. Be ready to discuss your hands-on experience with these tools, including any real-world challenges you've faced and how you overcame them.
✨Showcase Your AI Experience
Since AI tooling is a must-have for this role, prepare examples of how you've integrated AI into your workflows. Talk about specific tools you've used, like Claude Code or Codex, and how they’ve improved your processes. This will demonstrate that you’re not just familiar with AI but actively using it.
✨Demonstrate Problem-Solving Skills
Be prepared to discuss complex reliability problems you've tackled in the past. Use the STAR method (Situation, Task, Action, Result) to structure your answers, focusing on how you identified issues, proposed solutions, and the outcomes of your actions.
✨Emphasise Collaboration and Communication
This role requires strong cross-functional collaboration, so be ready to share examples of how you've worked with product, security, and engineering teams. Highlight your ability to surface non-goals and build relationships, as this will show you can operate effectively within a team.