Lead Staff Systems Reliability Engineer (Linux & Distributed Systems) in London

London | Full-Time | 80000 - 100000 € / year (est.) | Home office (partial)
The Trade Desk

At a Glance

  • Tasks: Lead a team to manage and optimise systems at scale in a global ecosystem.
  • Company: The Trade Desk, a top-rated tech company focused on innovative advertising solutions.
  • Benefits: Inclusive culture, competitive salary, and opportunities for professional growth.
  • Other info: Diverse environment with a commitment to fostering inclusivity and innovation.
  • Why this job: Join a dynamic team and work with cutting-edge technology to shape the future of infrastructure.
  • Qualifications: Experience with Linux, leadership skills, and a passion for problem-solving.

The predicted salary is between 80000 - 100000 € per year.

The Trade Desk is a global technology company with a mission to create a better, more open internet for everyone through principled, intelligent advertising. Handling over 1 trillion queries per day, our platform operates at an unprecedented scale. We value the unique experiences and perspectives that each person brings to The Trade Desk, and we are committed to fostering inclusive spaces where everyone can bring their authentic selves to work every day.

Do you have a passion for solving hard problems at scale? Are you eager to join a dynamic, globally-connected team where your contributions will make a meaningful difference in building a better media ecosystem?

We are looking to hire a Lead Systems Reliability Engineer to join our engineering team and continue building and maintaining our data-driven platform. We leverage technologies like Aerospike, MongoDB, and Kafka to perform many real-time activities, translating to a p99 latency under 1 millisecond on the back end. Do you enjoy tuning, performance testing, troubleshooting, automation, and operating at scale? Do testing next-gen hardware, evaluating data access patterns, and designing automation around distributed systems excite you?

What makes this role different

  • First in the Industry: The Trade Desk is the first company to run over 5 million QPS to NVMe in Aerospike on a single node, which forced core software redesigns to achieve this scale.
  • Work on Cutting-Edge Hardware: Design clusters with nodes featuring 300TB of NVMe, 3TB RAM, and 512 cores, delivering a global 2,500GB/s throughput directly from flash.
  • Shape the Future of Infrastructure: Spec your own systems and collaborate directly with AMD and NoSQL vendors to run PoCs and optimize bleeding-edge technology for internet-scale workloads.
  • Deep Performance Engineering: Dive into kernel, hardware, and system interactions, leveraging tools like flamegraphs, NUMA counters, BIOS tuning, and synthetic testing to achieve world-class performance.
  • Push Hardware Endurance Limits: Build clusters engineered for over 1 zettabyte of endurance.

What you’ll do

  • Lead a team to influence, manage, and plan work streams, systems, and data structures at scale within a global ecosystem, spanning multiple infrastructure providers (cloud and traditional datacenters).
  • Encourage, improve, and build infrastructure automation in a way that works with stateful systems at scale.
  • Own operations for Linux-based systems running Aerospike, Kafka, and MongoDB.
  • Serve as a point of contact to review new use cases, answer questions, and participate in on-call rotation.
  • Learn to be a NoSQL SME. You do not need experience to apply – we will train you.
  • Benchmark and analyze next-generation hardware offerings.

Who you are

  • Skills and experience:
    • Linux operating system
    • Leadership experience and ability to mentor
    • Troubleshooting techniques: fault isolation and the scientific method
    • Identifying bottlenecks (is it CPU? I/O?)
  • Nice-to-have experience:
    • Physical hardware (on-prem) internals, management, and operation
    • Performance testing and tuning
    • Databases (relational or NoSQL)
    • Ansible/PyInfra/Chef
    • Prometheus
    • Kubernetes
    • Python/Ruby/Rust/Bash/Golang/C#
  • Empathetic, Objective, Critical Thinker: Thinking beyond the task at hand to deeply understand the 'why' behind an objective.
  • You welcome ideas and seek to understand perspectives that are different from your own, and you look for common ground to build on.
  • You are a creative thinker, not bound by "the way things have always been done"; you think of the questions that are "yet to be asked".
  • What you know is less important than how well you learn, innovate, collaborate, and adapt.
  • As part of a global team with many diverse backgrounds, experiences, and perspectives, you value and seek out ways to foster diversity.

Lead Staff Systems Reliability Engineer (Linux & Distributed Systems) in London employer: The Trade Desk

The Trade Desk is an exceptional employer that champions innovation and inclusivity, making it a fantastic place for a Lead Staff Systems Reliability Engineer to thrive. With a commitment to employee growth, cutting-edge technology, and a collaborative work culture, team members are empowered to push the boundaries of performance engineering while enjoying a supportive environment that values diverse perspectives. Located in a dynamic global setting, The Trade Desk offers unique opportunities to work with state-of-the-art hardware and influence the future of infrastructure at scale.

Contact Detail:

The Trade Desk Recruiting Team

StudySmarter Expert Advice🤫

We think this is how you could land the Lead Staff Systems Reliability Engineer (Linux & Distributed Systems) role in London

Tip Number 1

Network like a pro! Reach out to current employees at The Trade Desk on LinkedIn. Ask them about their experiences and any tips they might have for the interview process. This can give you insider knowledge and make your application stand out.

Tip Number 2

Prepare for technical interviews by brushing up on your Linux and distributed systems knowledge. Practice common troubleshooting scenarios and be ready to discuss how you've tackled similar challenges in the past. We want to see your problem-solving skills in action!

Tip Number 3

Showcase your leadership experience! Be ready to share examples of how you've mentored others or led projects. The Trade Desk values collaboration, so demonstrating your ability to work well in a team will definitely give you an edge.

Tip Number 4

Don’t forget to apply through our website! It’s the best way to ensure your application gets seen. Plus, it shows you're genuinely interested in joining our team. Good luck, and we can't wait to see what you bring to the table!

We think you need these skills to ace the Lead Staff Systems Reliability Engineer (Linux & Distributed Systems) role in London

Linux Operating System
Leadership Experience
Mentoring
Troubleshooting Techniques
Performance Testing
Automation
NoSQL Databases

Some tips for your application 🫡

Tailor Your Application: Make sure to customise your CV and cover letter to highlight your experience with Linux and distributed systems. We want to see how your skills align with the role, so don’t hold back on showcasing your relevant projects!

Show Your Passion: Let your enthusiasm for solving complex problems shine through in your application. We love candidates who are excited about technology and eager to contribute to our mission of building a better media ecosystem.

Be Clear and Concise: When writing your application, keep it straightforward and to the point. Use clear language to describe your experiences and achievements, making it easy for us to see why you’d be a great fit for the team.

Apply Through Our Website: We encourage you to submit your application directly through our website. It’s the best way for us to receive your details and ensures you’re considered for the role. Plus, it’s super easy!

How to prepare for a job interview at The Trade Desk

Know Your Tech Stack

Familiarise yourself with the technologies mentioned in the job description, like Aerospike, Kafka, and MongoDB. Be ready to discuss how you've used similar tools in past projects or how you would approach learning them.

Showcase Problem-Solving Skills

Prepare examples of complex problems you've solved, especially those involving Linux systems or distributed architectures. Use the STAR method (Situation, Task, Action, Result) to structure your responses clearly.

Demonstrate Leadership and Mentorship

Since this role involves leading a team, think of instances where you've successfully mentored others or led projects. Highlight your leadership style and how you encourage collaboration and innovation within a team.

Emphasise Adaptability and Learning

The Trade Desk values creative thinkers who can adapt and learn quickly. Be prepared to discuss how you've tackled new challenges or technologies in the past and your approach to continuous learning in tech.