At a Glance
- Tasks: Own and enhance core infrastructure for cutting-edge products using Kubernetes and cloud technologies.
- Company: Fast-paced Silicon Valley startup with a focus on innovation and collaboration.
- Benefits: Competitive salary, flexible work environment, and opportunities for professional growth.
- Why this job: Shape the future of tech by building reliable systems that power impactful products.
- Qualifications: 5+ years in production infrastructure, strong Kubernetes experience, and coding skills.
- Other info: Join a dynamic team where your contributions directly influence product development.
The predicted salary is between £54,000 and £84,000 per year.
About the Role
We are looking for a Senior Platform / Infra Engineer to own the core infrastructure that powers Cosine's products — from Kubernetes and deployment pipelines to networking and platform services. You will design and run the "paved road" that our engineers, researchers, and customers build on: reliable Kubernetes clusters, fast and safe CI/CD, solid observability, and hardened environments for demanding enterprise and on-prem deployments. You will also wear a classic "DevOps/SRE" hat: thinking in SLOs, running incident response, and keeping us up even as we move quickly. This is a high-ownership role at a fast-paced, venture-backed Silicon Valley startup. You will work directly with founding engineers and leadership, and your decisions will materially shape how we build and ship products.
What You Will Do
- Own core infrastructure
  - Design, operate, and evolve our Kubernetes-based platform (EKS or similar), including cluster topology, node groups, autoscaling, and multi-environment isolation.
  - Manage supporting cloud resources: container registries, load balancers, queues, caches, and data infra needed to run our APIs and agents.
- Build the deployment & tooling layer
  - Design and maintain CI/CD pipelines for image builds and infra rollouts (e.g. Pulumi/Terraform + Helm/Docker).
  - Implement safe rollout strategies (blue/green, canary, staged rollouts) and fast rollback paths.
  - Build internal tools and abstractions that make it easy for product teams to self-serve infra safely.
- Own reliability & operations (SRE-ish)
  - Define and track SLOs/SLIs for key services (latency, error rates, availability).
  - Improve our observability stack (metrics, logs, traces, alerts) so issues are obvious, actionable, and debuggable.
  - Participate in the on-call rotation, lead incident response when needed, and drive blameless post-mortems and fixes.
- Shape networking & security
  - Design and maintain networking: VPCs, subnets, ingress/egress, service meshes / L7 routing, DNS, and TLS.
  - Implement least-privilege access via IAM, secure secret management, and hardened configurations for multi-tenant and isolated customer environments.
  - Help design patterns for secure enterprise and on-prem / regulated deployments.
- Partner with product & research
  - Work closely with application, ML, and research teams to understand their needs and translate them into reusable infra building blocks.
  - Provide guidance on "how to run this in production": capacity planning, failure modes, and operational readiness reviews.
You Might Be a Great Fit If You
- Have strong experience
- 5+ years building and operating production infrastructure on a major cloud (AWS, GCP, or Azure).
- Significant hands-on experience running Kubernetes in production (EKS/GKE/AKS or self-managed): cluster upgrades, autoscaling, node group design, and multi-env setups.
- Helm or similar for packaging services.
- Deep experience with IaC tools (Pulumi, Terraform, CDK, or similar).
- Comfortable managing infra changes via code review, CI, and automated rollouts.
- Have owned the uptime and performance of user-facing systems.
- Comfortable participating in (and improving) on-call rotations and incident management.
- Experience setting up / tuning observability (Prometheus, Grafana, CloudWatch, OpenTelemetry, etc.).
- Have built internal tools, libraries, or platforms on top of cloud providers so product teams can move faster with fewer foot-guns.
- Think about developer experience and "golden paths," not just raw infra.
- Strong scripting and programming skills in at least one modern language (e.g. TypeScript, Go, Python).
- Happy to dive into app code when needed to debug a production issue or improve an integration.
- Enjoy working in a fast-moving environment with evolving priorities and incomplete specs.
- Bias toward pragmatic solutions: ship something small, measure, iterate.
- Communicate clearly, give/receive direct feedback, and collaborate across functions.
Nice to Have (Not Required)
- Experience with:
  - AWS primitives like EKS, ECS/Fargate, ECR, SQS, ElastiCache/Redis.
  - Argo CD or other GitOps tools for Kubernetes.
  - On-prem, air-gapped, or regulated industry deployments (e.g. finance, healthcare).
  - AI/ML infrastructure (GPU workloads, model hosting, feature stores).
- Prior experience as an early infra / platform hire at a startup.
Senior Platform Engineer in London
Employer: Cosine
Contact Details:
Cosine Recruiting Team
StudySmarter Expert Advice 🤫
We think this is how you could land the Senior Platform Engineer role in London
✨Tip Number 1
Network, network, network! Get out there and connect with folks in the industry. Attend meetups, webinars, or even just grab a coffee with someone who works at a company you admire. You never know when a casual chat could lead to a job opportunity!
✨Tip Number 2
Show off your skills! Create a portfolio or GitHub repository showcasing your projects, especially those related to Kubernetes, CI/CD, or any cool tools you've built. This gives potential employers a taste of what you can do and sets you apart from the crowd.
✨Tip Number 3
Prepare for interviews like a pro! Research the company, understand their products, and be ready to discuss how your experience aligns with their needs. Practice common technical questions and be prepared to demonstrate your problem-solving skills on the spot.
✨Tip Number 4
Don’t forget to apply through our website! We love seeing candidates who are genuinely interested in joining us. Tailor your application to highlight your relevant experience and show us why you’d be a great fit for our team.
Some tips for your application 🫡
Tailor Your Application: Make sure to customise your CV and cover letter for the Senior Platform Engineer role. Highlight your experience with Kubernetes, CI/CD pipelines, and any relevant cloud infrastructure work. We want to see how your skills align with what we're looking for!
Showcase Your Projects: Include specific examples of projects you've worked on that demonstrate your expertise in building and operating production infrastructure. If you've implemented observability tools or designed deployment strategies, let us know! We love seeing real-world applications of your skills.
Be Clear and Concise: When writing your application, keep it straightforward and to the point. Use bullet points where possible to make it easy for us to read through your experience. We appreciate clarity and directness, especially in a fast-paced environment like ours.
Apply Through Our Website: We encourage you to submit your application directly through our website. This helps us keep track of all applications and ensures you’re considered for the role. Plus, it’s super easy to do!
How to prepare for a job interview at Cosine
✨Know Your Kubernetes Inside Out
Make sure you can talk confidently about your experience with Kubernetes, especially in production environments. Be ready to discuss cluster upgrades, autoscaling, and how you've managed multi-environment setups. This is crucial for the role, so brush up on any recent projects or challenges you've faced.
✨Showcase Your Infrastructure as Code Skills
Prepare to discuss your experience with IaC tools like Terraform or Pulumi. Have examples ready of how you've implemented infrastructure changes through code reviews and automated rollouts. This will demonstrate your ability to manage infrastructure efficiently and safely.
✨Emphasise Reliability and Observability
Be prepared to share specific instances where you've owned uptime and performance for user-facing systems. Discuss how you've set up observability stacks using tools like Prometheus or Grafana, and how you've handled incident management. This shows that you care about reliability, which is key for this position.
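If it helps to make the reliability discussion concrete, you could practise the error-budget arithmetic that usually sits behind an availability SLO. The Python sketch below is only an illustration: the function name, the 99.9% target, and the request counts are assumptions chosen for the example, not figures or tooling from Cosine.

```python
def error_budget_remaining(slo_target: float, total_requests: int, failed_requests: int) -> float:
    """Return the unspent fraction of the error budget (negative if overspent).

    slo_target is the availability objective as a fraction, e.g. 0.999 for 99.9%.
    """
    if not 0 < slo_target < 1:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    if failed_requests < 0:
        raise ValueError("failed_requests must not be negative")

    # The error budget is the number of failures the SLO tolerates in the window.
    allowed_failures = (1 - slo_target) * total_requests
    return 1 - failed_requests / allowed_failures


if __name__ == "__main__":
    # Hypothetical window: 1,000,000 requests, 250 failures, 99.9% availability SLO.
    remaining = error_budget_remaining(0.999, 1_000_000, 250)
    print(f"Error budget remaining: {remaining:.0%}")
```

In this hypothetical window the SLO tolerates about 1,000 failed requests, so 250 failures leave roughly three quarters of the budget unspent. Being able to walk through numbers like these is one way to show you think in SLOs rather than raw uptime.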
✨Communicate Your Startup Mindset
Since this is a fast-paced startup environment, highlight your adaptability and willingness to tackle evolving priorities. Share examples of how you've shipped pragmatic solutions quickly and iterated based on feedback. This will show that you're a great fit for their dynamic culture.