At a Glance
- Tasks: Design and optimise AI infrastructure for large-scale machine learning systems.
- Company: Leading tech firm focused on innovative AI solutions.
- Benefits: Competitive salary, flexible working, and opportunities for professional growth.
- Other info: Join a diverse team with exciting projects across Europe.
- Why this job: Be at the forefront of AI technology and make a real impact.
- Qualifications: Experience in AI/ML infrastructure and strong coding skills required.
The predicted salary is between £80,000 and £100,000 per year.
As a Lead and Principal Infrastructure Architect, you own end-to-end responsibility for designing optimized compute infrastructure for large-scale AI and machine learning systems, including large-scale distributed training environments. You are the authority who translates business goals, SLAs, and client standards into infrastructure architectures that perform at scale while being deliberately engineered for cost-efficiency.
Drawing on deep experience, you weigh multiple viable solutions for any given problem - across compute, networking, storage, orchestration, and model serving - and make rational, well-justified architectural decisions tailored to each client's situation, products, constraints, and standards. You architect and optimize the full computational stack for performance, power, cost, and scalability; design and tune large-scale GPU clusters and distributed training systems; and ensure infrastructure meets security, compliance, and regulatory requirements.
As the recognized AI infrastructure expert in at least one hyperscaler cloud (such as AWS, Azure, or Google Cloud), you bring authoritative knowledge of that platform's AI/ML services, accelerators, networking, and cost levers, and apply it to deliver best-in-class solutions. Beyond design, you set technical direction and standards, lead and mentor engineers and architects, partner with clients and stakeholders to shape the infrastructure roadmap, and are ultimately accountable for delivering AI/ML infrastructure that meets business SLAs, controls cost, and scales to enterprise and frontier workloads.
THE WORK
- Own the end-to-end architecture and design of optimized compute infrastructure for large-scale AI/ML systems, including large-scale distributed training environments, from concept through delivery.
- Develop and evaluate architecture alternatives, weighing trade-offs across compute, networking, storage, orchestration, and model serving to make rational, well-justified decisions tailored to each client's situation and standards.
- Lead architecture assessments and reviews of existing and proposed environments, identifying gaps, risks, bottlenecks, and optimization opportunities, and recommending remediation.
- Drive architectural decision-making, documenting rationale, trade-offs, and assumptions so decisions are transparent, defensible, and aligned with business SLAs and standards.
- Define and maintain the AI infrastructure roadmap, planning capacity, scaling, and technology evolution in step with business and product goals.
- Architect and optimize the full computational stack for performance, power, cost, and scalability, ensuring infrastructure meets business SLAs while being deliberately engineered for cost-efficiency.
- Design and tune large-scale GPU clusters and distributed training systems, including accelerator selection, interconnect/networking, and storage for high-throughput training workloads.
- Serve as the authoritative AI infrastructure expert in at least one hyperscaler cloud (AWS, Azure, or GCP), applying deep knowledge of its AI/ML services, accelerators, networking, and cost levers.
- Design deployment, automation, and CI/CD strategies for reliable, repeatable, and scalable releases of AI systems, models, and data pipelines into production.
- Establish AI monitoring and observability strategy across InfraOps and MLOps, defining SLAs, SLOs, alerting, and performance/cost tracking, and driving continuous optimization.
- Integrate AI/ML systems into enterprise environments, ensuring interoperability, security, compliance, and adherence to regulatory and client standards.
- Lead capacity planning and cost modeling, forecasting compute needs and engineering cost-efficiency into the architecture without compromising performance.
- Collaborate with clients, stakeholders, and engineering teams to align infrastructure decisions with business outcomes, translating requirements into actionable architecture and standards.
- Set technical direction, standards, and best practices, mentoring engineers and architects and leading design and code reviews across the team.
Qualifications
EDUCATION
Bachelor's Degree in Computer Science, Computer Engineering, or related Engineering field.
BASIC (REQUIRED) QUALIFICATION
- Solid background in coding, building, monitoring, and troubleshooting applications of AI/ML models, and in selecting, designing, and implementing the infrastructure to deploy and run them on premises or in the public cloud.
- Strong understanding of AI and machine learning as a subject.
- Strong understanding of computing infrastructure as a subject; knowledge of AI infrastructure preferred.
- Good proficiency in programming languages such as Python, Java, or C++.
- Experience with data pipeline and workflow management tools (e.g., Apache Airflow, Kubeflow).
- Strong problem-solving skills and ability to work in a fast-paced environment.
- Excellent communication and collaboration skills.
- Significant experience in AI/ML infrastructure engineering or related roles, deploying large-scale solutions on a hyperscaler platform.
- Proven experience in leading and managing AI projects and teams.
- Strong project management skills, with the ability to manage multiple projects simultaneously.
- Demonstrated experience in evaluating and selecting AI technologies and frameworks.
- Ability to work with cross-functional teams and drive project alignment.
Locations
London, Berlin, Madrid, Paris
Equal Employment Opportunity Statement
All employment decisions shall be made without regard to age, race, creed, color, religion, sex, national origin, ancestry, disability status, veteran status, sexual orientation, gender identity or expression, genetic information, marital status, citizenship status or any other basis as protected by federal, state, or local law.
AI Infrastructure Lead Architect employer: WeAreTechWomen
As an AI Infrastructure Lead Architect, you will thrive in a dynamic and innovative work culture that prioritises collaboration and continuous learning. Our London office offers exceptional employee growth opportunities, including mentorship from industry experts and access to cutting-edge technology, ensuring you remain at the forefront of AI advancements. With a commitment to diversity and inclusion, we foster an environment where every voice is valued, making us an excellent employer for those seeking meaningful and rewarding careers.
StudySmarter Expert Advice🤫
We think this is how you could land the AI Infrastructure Lead Architect role
✨Join Local Tech Meetups
Get out there and mingle with fellow developers by joining local tech meetups. It’s a fantastic way to meet people who might be working at WeAreTechWomen or know someone who does. Plus, you can pick up new skills and keep up with tech trends while you're at it!
✨Contribute to Open Source Projects
Show off your coding chops by jumping into open-source projects. Not only does this give you practical experience, but it also gets you noticed in the dev community. You'll create a killer portfolio that speaks volumes about your skills to WeAreTechWomen.
✨Tap into Online Developer Communities
Don’t underestimate the power of online developer communities like GitHub, Stack Overflow, and even Reddit. Participate in discussions, share your projects, and build your visibility. You can often find opportunities through these channels that can lead to a full-time gig at companies like WeAreTechWomen.
✨Explore Job Boards Specifically for Tech Roles
Keep your eyes peeled on job boards that focus on tech roles. Sites like TechCareers or Stack Overflow Jobs can often have listings for companies like WeAreTechWomen that might not show up on broader job sites. Make it a habit to check these regularly, and don’t hesitate to apply directly through our website!
Some tips for your application 🫡
Show off your coding skills: When applying for a software engineering role, it's super important to showcase your coding skills. Make sure your CV includes your tech stack, any relevant programming languages you’re comfortable with, and examples of projects you've worked on. If you have a GitHub profile, link it up! We love to see code in action.
Tailor your portfolio: For a full-time role, we’d expect to see some solid examples of your work in your portfolio. Make sure to include at least two or three projects that highlight your problem-solving skills and your ability to work with different technologies. Focus on the projects that are most relevant to the position at WeAreTechWomen.
Craft a killer cover letter: Your cover letter is your chance to stand out, so make it personal! Explain why you want to work at WeAreTechWomen and how your skills align with the role. Show us your passion for software development. We dig enthusiastic candidates who understand the value of collaboration and continuous learning!
Be clear and concise: When it comes to writing your CV and cover letter, clarity is key. Avoid jargon that could confuse us and stick to simple, direct language. Highlight your achievements with quantifiable results where possible, and keep everything easy to read. A well-organised application goes a long way!
How to prepare for a job interview at WeAreTechWomen
✨Brush Up on Your Coding Skills
For a full-time software engineering role, it's crucial that you stay sharp with your coding abilities. Expect technical questions that might involve solving problems on the spot or discussing algorithms. Practise on platforms like LeetCode or HackerRank to get comfortable with the types of questions that often come up.
✨Know Your Tools and Frameworks
Make sure you’re well-acquainted with the tools and technologies listed in the job description. Familiarise yourself with any specific frameworks or programming languages mentioned. If WeAreTechWomen uses React or Node.js, for instance, be ready to discuss how you’ve used them in previous projects or coursework.
✨Showcase Your Projects
Bring along a portfolio that highlights your best work. This could be code samples, GitHub repositories, or any side projects you’ve built. Make sure you can talk through your thought process for each project, especially the challenges you faced and how you solved them; this shows your problem-solving skills in action.
✨Prepare for Behavioural Questions
While technical skills are key, full-time positions also require cultural fit. Be ready to discuss your previous experiences and how you handle teamwork, conflict, and deadlines. Brush up on the STAR method (Situation, Task, Action, Result) to clearly articulate your past experiences when discussing how you've contributed to a team.