At a Glance
- Tasks: Lead the strategy for AI infrastructure and manage complex systems across multiple clouds.
- Company: Join Anthropic, a pioneering tech company focused on reliable AI systems.
- Benefits: Enjoy a competitive salary, flexible hours, and generous leave policies.
- Other info: Collaborative environment with opportunities for mentorship and career growth.
- Why this job: Make a real impact in AI by building scalable, cutting-edge infrastructure.
- Qualifications: Expertise in distributed systems and cloud platforms; strong coding skills required.
About the role
Anthropic's Infrastructure organization is foundational to our mission of developing AI systems that are reliable, interpretable, and steerable. The systems we build determine how quickly we can train new models, how reliably we can run safety experiments, and how effectively we can scale Claude to millions of users — demonstrating that safe, reliable infrastructure and frontier capabilities can go hand in hand.
Node Infra owns the full lifecycle of accelerator capacity at Anthropic. We ingest and provision compute from all major CSPs and our own datacenters, stand up and scale clusters from thousands to hundreds of thousands of hosts, and build the health, diagnostics, and repair automation that keeps every GPU, TPU, and Trainium node in the fleet usable and ready to power Anthropic’s frontier AI research.
Key responsibilities
- Own the technical strategy and roadmap for node lifecycle management – ingestion, bring‑up, health checking, and automated repair
- Drive cross‑team initiatives to build and scale AI clusters across multiple clouds and accelerator families
- Design and operate the systems that detect, isolate, and remediate unhealthy hardware automatically, driving up fleet MTBI and minimizing stranded capacity
- Define infrastructure architecture, ensuring the hardest problems get solved – whether by you directly or by working through others
- Work closely with cloud providers and internal research/inference/product teams to shape long‑term compute, data, and infrastructure strategy
- Establish and evolve operational excellence practices (incident response, postmortem culture, on‑call)
- Support the growth of engineers around you through technical mentorship and coaching
Minimum qualifications
- Deep expertise in distributed systems, reliability, and cloud platforms (e.g., Kubernetes, IaC, AWS/GCP/Azure)
- Strong proficiency in at least one systems language (e.g., Rust, Go, or Python) and in IaC with Terraform
- Hands‑on experience with machine learning accelerators (GPUs, TPUs, or Trainium)
- Track record of leading complex, multi‑quarter technical initiatives that span multiple teams or systems
- Ability to build alignment across senior stakeholders and communicate effectively at all levels
- Bachelor’s degree or equivalent in a field relevant to the role
Preferred qualifications
- 8+ years of software engineering experience, including time as a technical lead setting direction for a team
- Experience managing large‑scale compute infrastructure at hyperscale (10K+ nodes), including capacity management and efficiency
- Depth in one or more of: Kubernetes internals (scheduler, autoscaler, kubelet, Karpenter), cluster orchestration systems (Mesos, Borg‑like), or node provisioning pipelines
- Low‑level systems experience: kernel, virtualization, device drivers, firmware, or hardware health/diagnostics daemons
- Familiarity with high‑performance networking (EFA, RDMA, InfiniBand) for distributed ML workloads
- Demonstrated ownership of production reliability for high‑throughput, latency‑sensitive systems
- Contributions to relevant open‑source projects (Kubernetes, Linux kernel, container runtimes, etc.)
- Skill in quickly understanding systems design tradeoffs and keeping track of rapidly evolving software systems
Compensation
Annual salary: £325,000 – £485,000 GBP
Benefits
- Competitive compensation and benefits package
- Optional equity donation matching
- Generous vacation and parental leave
- Flexible working hours
- Lovely office space in San Francisco
Senior Staff+ Software Engineer, Node Infra employer: Menlo Ventures
At Anthropic, we pride ourselves on being an exceptional employer, particularly for those in the Senior Staff+ Software Engineer role within our Infrastructure organisation. Our commitment to fostering a collaborative and innovative work culture is complemented by competitive compensation, generous vacation policies, and flexible working hours, all set in the vibrant city of San Francisco. We prioritise employee growth through mentorship opportunities and encourage contributions to cutting-edge AI research, making it a truly rewarding environment for talented engineers looking to make a meaningful impact.
StudySmarter Expert Advice🤫
We think this is how you could land the Senior Staff+ Software Engineer, Node Infra role
✨Tip Number 1
Network like a pro! Reach out to folks in the industry, especially those at Anthropic. A friendly chat can sometimes lead to opportunities that aren’t even advertised yet.
✨Tip Number 2
Show off your skills! If you’ve got a GitHub or portfolio, make sure it’s up to date. Share projects that highlight your expertise in distributed systems and cloud platforms.
✨Tip Number 3
Prepare for technical interviews by brushing up on your knowledge of Kubernetes and IaC tools like Terraform. We want to see how you tackle real-world problems, so practice coding challenges related to infrastructure management.
✨Tip Number 4
Don’t forget to apply through our website! It’s the best way to ensure your application gets seen by the right people. Plus, we love seeing candidates who are proactive about their job search.
Some tips for your application 🫡
Tailor Your Application: Make sure to customise your CV and cover letter for the role. Highlight your experience with distributed systems and cloud platforms, as these are key for us at Anthropic. Show us how your skills align with our mission!
Showcase Your Technical Skills: Don’t hold back on detailing your technical expertise! We want to see your proficiency in systems languages and any hands-on experience with machine learning accelerators. This is your chance to shine!
Communicate Clearly: When writing your application, keep it clear and concise. We appreciate straightforward communication, especially when it comes to complex topics like infrastructure architecture. Make it easy for us to understand your thought process.
Apply Through Our Website: We encourage you to apply directly through our website. It’s the best way for us to receive your application and ensures you’re considered for the role. Plus, it’s super easy!
How to prepare for a job interview at Menlo Ventures
✨Know Your Tech Inside Out
Make sure you’re well-versed in distributed systems, cloud platforms, and the specific technologies mentioned in the job description. Brush up on your knowledge of Kubernetes, IaC, and machine learning accelerators. Being able to discuss these topics confidently will show that you’re not just familiar with the tech landscape but truly understand it.
✨Showcase Your Leadership Skills
Since this role involves leading complex initiatives, be prepared to share examples of past experiences where you’ve successfully led teams or projects. Highlight how you built alignment across stakeholders and drove technical strategies. This will demonstrate your capability to take charge and guide others effectively.
✨Prepare for Problem-Solving Questions
Expect to face questions that assess your problem-solving skills, especially around infrastructure architecture and reliability. Think of scenarios where you had to troubleshoot or optimise systems, and be ready to walk through your thought process. This will showcase your analytical skills and ability to handle real-world challenges.
✨Engage with the Interviewers
Interviews are a two-way street! Prepare thoughtful questions about the team’s current projects, challenges they face, and their vision for the future. This not only shows your interest in the role but also helps you gauge if the company aligns with your career goals. Plus, it makes for a more engaging conversation!