At a Glance
- Tasks: Lead the reliability and stability of production systems while driving continuous improvement.
- Company: Join a forward-thinking tech company focused on innovation and excellence.
- Benefits: Competitive salary, flexible working options, and opportunities for professional growth.
- Other info: Dynamic team environment with a focus on collaboration and cutting-edge solutions.
- Why this job: Make a real impact by ensuring system reliability and enhancing customer experiences.
- Qualifications: 5+ years in Site Reliability, strong leadership, and expertise in modern technologies.
The predicted salary is between £60,000 and £80,000 per year.
Requirements
- At least 5 years of hands-on experience in Site Reliability-focused positions
- Strong knowledge of containerization technologies (Docker, Kubernetes)
- Experience with infrastructure as code (Terraform)
- Solid understanding of networking, security, and system architecture
- Proficient in programming and scripting languages (Java, Golang, Python, Bash, or similar)
- Experience with monitoring and observability tools (DataDog, Prometheus, Grafana)
- Knowledge of database management systems (PostgreSQL, Bigtable)
- Understanding of API and microservices architecture
- Strong people leadership skills, with at least one year leading high-performance technical teams
- Experience in operations teams within enterprise environments, with knowledge of DevOps, ITIL, cloud services, and IT infrastructure and operations, including supporting and maintaining production and development environments and building cloud services that are secure, reliable, scalable and observable
- Experience establishing Service Delivery strategies that align with new ways of working, including Agile
- Experience establishing and delivering IT support services in a high availability (HA) environment such as 24/7 operations
What the job involves
- The System and Platform Operations Manager is a technical leadership role responsible for the support, reliability and stability of Epsilon Retail Media production systems, environments and offerings
- The team owns the reliability vision for the company, driving continuous improvement through a combination of development and operations initiatives as well as process excellence
- This position and its team have solid-line responsibility for operations, including the deployment, management, monitoring, reporting, troubleshooting, and repair of production systems
- Core to the success of the role is providing a premium customer support experience, built around a “center of excellence” that supports the full service delivery cycle
- This role is responsible for managing the Platform Operations team centralized within a single geo-region, coordinating regional teamwork, providing both technical and professional support, and championing the company values
- The Platform Operations Engineer works closely with the Engineering team to ensure ongoing system stability and supports the Technical Account Managers from an environment perspective
- The Platform Operations team is responsible for supporting all retailers once they are live
- Critically important is how this team collaborates and liaises with other teams such as Customer Support, Technical Account Management, Engineering and Customer Success teams
- You'll establish and manage operational practices and ensure we design, implement and operate a support model that is fit for purpose for our future
- Adopt a “Measure Everything” approach to ensure that internal service level objectives and customer service level agreements are exceeded, including executive-level reporting on operational health metrics such as SLAs, incident resolution, performance, availability, reliability and capacity
- Take ownership of complex issues related to performance, reliability, and scalability, and lead the resolution of serious incidents and events, including communications with customers and wider stakeholders
- Provide insight and expertise on how customers will perceive changes and their impact, to drive customer change management and communication
- Empower the Delivery teams to release new products, features, updates and fixes quickly, while ensuring Platforms remain reliable and stable
- Work with the wider Engineering, Product, Delivery and Security teams to ensure that appropriate attention is given to production/system reliability
- Identify the capabilities needed to meet the current and emerging business needs of a significant function
- As the subject matter expert on the team, maintain an understanding of current technology, database management, reliability practices, and future trends through ongoing education, conference attendance and industry press
Platform Reliability & Operations Lead in London employer: Epsilon
Epsilon Retail Media is an exceptional employer that prioritises a culture of collaboration and continuous improvement, making it an ideal place for professionals in the tech industry. With a strong focus on employee growth, we offer opportunities for skill enhancement through ongoing education and industry engagement, all while fostering a supportive environment that values innovation and excellence. Located in a vibrant region, our team enjoys a dynamic work atmosphere that encourages creativity and teamwork, ensuring that every member can contribute to our mission of delivering premium customer support and operational excellence.
StudySmarter Expert Advice🤫
We think this is how you could land the Platform Reliability & Operations Lead role in London
✨Tip Number 1
Network, network, network! Get out there and connect with folks in the industry. Attend meetups, webinars, or even local tech events. You never know who might have a lead on that perfect Platform Reliability & Operations Lead role!
✨Tip Number 2
Show off your skills! Create a portfolio or GitHub repository showcasing your projects, especially those involving Docker, Kubernetes, or Terraform. This gives potential employers a taste of what you can do and sets you apart from the crowd.
✨Tip Number 3
Prepare for interviews by brushing up on your technical knowledge and soft skills. Be ready to discuss your experience with monitoring tools like DataDog or Grafana, and how you've led high-performance teams. Practice makes perfect!
✨Tip Number 4
Don’t forget to apply through our website! We love seeing applications directly from candidates who are passionate about joining our team. Plus, it shows you're proactive and really interested in the role.
Some tips for your application 🫡
Tailor Your CV: Make sure your CV highlights your hands-on experience in Site Reliability roles and showcases your knowledge of containerization technologies like Docker and Kubernetes. We want to see how your skills align with our needs, so don’t be shy about making those connections clear!
Showcase Your Leadership Skills: Since this role involves leading high-performance technical teams, it’s crucial to demonstrate your people leadership experience. Share specific examples of how you've driven team success and fostered collaboration in previous roles – we love a good story!
Highlight Relevant Tools and Technologies: Be sure to mention your proficiency in scripting languages and any experience with monitoring tools like DataDog or Grafana. We’re looking for someone who can hit the ground running, so let us know how you’ve used these tools to enhance system reliability.
Apply Through Our Website: We encourage you to apply directly through our website. It’s the best way for us to receive your application and ensures you’re considered for the role. Plus, it gives you a chance to explore more about our company culture and values!
How to prepare for a job interview at Epsilon
✨Know Your Tech Inside Out
Make sure you brush up on your knowledge of containerization technologies like Docker and Kubernetes, as well as infrastructure as code with Terraform. Be ready to discuss how you've used these tools in past roles, and think of specific examples that showcase your hands-on experience.
✨Showcase Your Leadership Skills
Since this role involves leading high-performance technical teams, prepare to share your experiences in people leadership. Think about challenges you've faced, how you motivated your team, and the strategies you implemented to drive success. This will demonstrate your capability to manage and inspire others.
✨Understand the Bigger Picture
Familiarise yourself with the company's vision for reliability and how it aligns with their operations. Be prepared to discuss how you would contribute to their goals, especially in terms of continuous improvement and customer support excellence. Showing that you understand their mission can set you apart.
✨Prepare for Scenario-Based Questions
Expect questions that assess your problem-solving skills, especially around performance, reliability, and scalability issues. Think of complex incidents you've resolved in the past and be ready to explain your thought process, the steps you took, and the outcomes. This will highlight your analytical skills and ability to handle pressure.