At a Glance
- Tasks: Lead the design and execution of balenaCloud’s infrastructure and reliability architecture.
- Company: Join a forward-thinking tech company with a flat structure and high trust.
- Benefits: Enjoy competitive salary, flexible schedules, and generous parental leave.
- Other info: Collaborate asynchronously and enjoy autonomy in a dynamic work environment.
- Why this job: Make a real impact on global infrastructure while working remotely with a talented team.
- Qualifications: 6+ years in infrastructure engineering and deep AWS expertise required.
The predicted salary is between £60,000 and £80,000 per year.
We are looking for a Staff Infrastructure Engineer to lead the technical direction and execution of balenaCloud’s infrastructure and reliability architecture. As our customer base and device fleets expand globally, we need a dedicated technical lead to drive our transition into multi-region hosting and single-tenant dedicated instances, natively within Amazon Web Services (AWS).
At balena, we don't have traditional managers or hierarchy; we rely on high levels of trust, autonomy, and alignment. You will be joining at the Staff Level (Tactical scope / Domain Leader). Given the company strategy (the Why), you define the Tactics and the What, design the How, and heavily participate in the Do.
This role represents a dual leadership mandate: you will operate across both Infrastructure Engineering (planning for immense scale, multi-region hosting, and deep AWS automation) and Reliability Engineering (designing the observability tooling, defining operational procedures, and scaling the team's ability to debug and improve the system). Our infrastructure is deeply rooted in AWS, and we need an engineer who can drop in and be highly effective within this ecosystem immediately.
Your Impact (Responsibilities)
- AWS-Native Architecture: Architect, automate, and optimize deeply integrated AWS environments. You will leverage the right AWS services to build a system that hosts balenaCloud reliably, delivering maximum performance and deep cost/resource optimization on a per-device basis.
- Infrastructure & Reliability: Bridge the gap between building for scale and running for stability. You will not only design the infrastructure but also lead the reliability practices for our growing systems, driving continuous improvement, robust feedback loops, and incident resilience.
- Architect for Massive B2B Scale: Design infrastructure capable of handling enterprise-level loads: billions of requests per week (>30 million/hour) and terabytes of data per day. Your mental model should align with massive B2B platforms rather than B2C media streaming.
- Multi-Region & Single-Tenant Hosting: Own the technical tactics and execution to deploy single-tenant, single-region balenaCloud instances (e.g., dedicated instances in the EU, Australia, US, or Japan) to satisfy strict customer data sovereignty needs.
- Kubernetes at Scale: Architect and manage multiple balenaCloud stacks simultaneously, overseeing the deployment and orchestration of many independent Kubernetes clusters for various customers.
- Decade-Long Reliability: We are responsible for physical devices in the real world that will stay deployed for decades. Short-term, fragile infrastructure solutions are unacceptable, as they risk rendering devices lost in the field. Your designs and implementations must meet our >10-year durability bar.
- Team Enablement & Async Collaboration: You will scale your knowledge across an overwhelmed engineering team. You will document, articulate, and demonstrate decision proposals based on objective facts and empirical evidence, minimizing the need for synchronous calls.
Essential Qualifications
- Experience: Minimum of 6 years of highly relevant professional work experience in infrastructure and reliability engineering.
- Deep AWS Expertise: Proven, hands‑on mastery of the AWS ecosystem. You must be able to navigate, architect, and optimize AWS services with immediate effectiveness.
- Observability & Reliability: Deep understanding of Site Reliability Engineering principles. You have proven experience building highly usable observability tooling, metrics, and monitoring systems from the ground up to support high availability.
- Exceptional Documentation Skills: Strong, hands‑on ability to write clear, actionable, and maintainable technical documentation, scaling plans, and onboarding materials for the team.
- Distributed Systems: Proven experience hosting across multiple geographic regions with distributed data and processing, specifically in multi‑tenant SaaS environments.
- Core Stack & Automation: Deep expertise with Kubernetes deployments at scale, managing massive PostgreSQL / RDS databases, and proven mastery of Infrastructure as Code and infrastructure automation.
- Scale Testing: Extensive experience in load and scale testing, specifically at the scale of 10k–100k simultaneous connections.
- Remote & Async Communication: Fluent English. Intrinsic motivation to prioritize open, text‑based communication in a public knowledge base. You actively work to reduce synchronous call time to respect scarce overlapping hours across global time zones.
- Abstract Thinking: Ability to identify, research, and advocate for solutions to complicated problems with minimal technical guidance, working from a defined company strategy.
Preferred Skills (Nice to Have)
- Compliance: Experience deploying solutions into special compliance environments (e.g., federal services, FedRAMP, GovCloud).
- AWS Certifications: High-level AWS certifications (e.g., AWS Certified Solutions Architect - Professional) are a strong bonus.
What “Staff Level” Means for You
To succeed in this role, you should fit the following profile based on our internal leveling guide (Tactical level / Domain Leader).
Given: The Company Strategy and Environment (e.g., "We need to scale our AWS infrastructure to support dedicated regional hosting to satisfy global data sovereignty laws, while improving overall fleet reliability").
You: Define the What and the How (researching AWS networking options, advocating for a specific EKS cluster architecture, writing the scaling plans, observability specs, and IaC), and heavily participate in the Do (hands‑on coding and infrastructure provisioning).
Enable: You elevate the entire company. You remove systemic friction, prevent architectural dead‑ends by identifying doomed approaches early, and mentor Domain Contributors. You back up your decision‑making with solid reasoning. You execute within architectural decisions that hold up over a 10+ year horizon, and raise flags early when tactics or designs threaten that durability.
Benefits
- Competitive salary
- Autonomous vacation allowance
- 12 weeks of paid parental leave for new parents
- Equipment of your choice and hardware for side projects
- Books of your choice to help you in your work
- Annual company gathering in an international location (Balena Summit 2024)
- Working with a talented and globally distributed team
- Flexible schedules by default
Staff Infrastructure Engineer (fully remote) in London employer: Yocto Project
At balena, we pride ourselves on fostering a culture of trust and autonomy, making us an exceptional employer for those seeking meaningful work in a fully remote environment. Our commitment to employee growth is evident through our competitive benefits, including generous parental leave, flexible schedules, and opportunities for professional development, all while collaborating with a talented, globally distributed team. Join us to lead innovative infrastructure solutions that will shape the future of device management and reliability.
StudySmarter Expert Advice🤫
We think this is how you could land the Staff Infrastructure Engineer (fully remote) role in London
✨Join Local Tech Meetups
Get out there and mingle with fellow developers by joining local tech meetups. It’s a fantastic way to meet people who might be working at Yocto Project or know someone who does. Plus, you can pick up some new tech skills and trends while you're at it!
✨Contribute to Open Source Projects
Show off your coding chops by jumping into open-source projects. Not only does this give you practical experience, but it also gets you noticed in the dev community. You'll create a killer portfolio that speaks volumes about your skills to Yocto Project.
✨Tap into Online Developer Communities
Don’t underestimate the power of online developer communities like GitHub, Stack Overflow, and even Reddit. Participate in discussions, share your projects, and build your visibility. You can often find opportunities through these channels that can lead to a full-time gig at companies like Yocto Project.
✨Explore Job Boards Specifically for Tech Roles
Keep your eyes peeled on job boards that focus on tech roles. Sites like TechCareers or Stack Overflow Jobs can often have listings for companies like Yocto Project that might not show up on broader job sites. Make it a habit to check these regularly, and don’t hesitate to apply directly through our website!
Some tips for your application 🫡
Show off your coding skills: When applying for a software engineering role, it's super important to showcase your coding skills. Make sure your CV includes your tech stack, any relevant programming languages you’re comfortable with, and examples of projects you've worked on. If you have a GitHub profile, link it up! We love to see code in action.
Tailor your portfolio: For a full-time role, we’d expect to see some solid examples of your work in your portfolio. Make sure to include at least two or three projects that highlight your problem-solving skills and your ability to work with different technologies. Focus on the projects that are most relevant to the position at Yocto Project.
Craft a killer cover letter: Your cover letter is your chance to stand out: make it personal! Explain why you want to work at Yocto Project and how your skills align with the role. Show us your passion for software development. We dig enthusiastic candidates who understand the value of collaboration and continuous learning!
Be clear and concise: When it comes to writing your CV and cover letter, clarity is key. Avoid jargon that could confuse us and stick to simple, direct language. Highlight your achievements with quantifiable results where possible, and keep everything easy to read. A well-organised application goes a long way!
How to prepare for a job interview at Yocto Project
✨Brush Up on Your Coding Skills
For a full-time software engineering role, it's crucial to keep your coding abilities sharp. Expect technical questions that might involve solving problems on the spot or discussing algorithms. Practise on platforms like LeetCode or HackerRank to get comfortable with the types of questions that often come up.
✨Know Your Tools and Frameworks
Make sure you’re well acquainted with the tools and technologies listed in the job description. Familiarise yourself with any specific frameworks or programming languages mentioned. If Yocto Project uses React or Node.js, for instance, be ready to discuss how you’ve used them in previous projects or coursework.
✨Showcase Your Projects
Bring along a portfolio that highlights your best work. This could be code samples, GitHub repositories, or any side projects you’ve built. Make sure you can talk through your thought process for each project, especially the challenges you faced and how you solved them; this shows your problem-solving skills in action.
✨Prepare for Behavioural Questions
While technical skills are key, full-time positions also require cultural fit. Be ready to discuss your previous experiences and how you handle teamwork, conflict, and deadlines. Brush up on the STAR method (Situation, Task, Action, Result) to clearly articulate your past experiences when discussing how you've contributed to a team.