At a Glance
- Tasks: Lead the Site Reliability Engineering team and ensure the reliability of the Midnight platform.
- Company: Join IOG, a pioneering tech company focused on blockchain research and development.
- Benefits: Enjoy remote work, laptop reimbursement, and competitive PTO.
- Why this job: Be part of a visionary team shaping the future of blockchain technology with a focus on data privacy.
- Qualifications: 8+ years in SRE or DevOps, coding skills in Python, Rust/C++, or JavaScript, and strong leadership experience required.
- Other info: Embrace a culture of innovation and continuous growth at IOG.
The predicted salary is between £72,000 and £108,000 per year.
Who are we? IOG is a technology company focused on blockchain research and development. We are renowned for our scientific approach to blockchain development, emphasizing peer-reviewed research and formal methods to ensure security, scalability, and sustainability.
Our projects include the Cardano blockchain, as well as other products in the areas of decentralized finance (DeFi), governance, and identity management, aiming to advance the capabilities and adoption of blockchain and Web3 technology globally.
About Midnight: IOG's Midnight Tribe is a business technology provider and core contributor to the Midnight Network, a blockchain platform for developing decentralized applications that safeguard personal and commercial data. The Midnight Network is the first blockchain to offer programmable data isolation, using zero-knowledge (ZK) proofs to enable selective disclosure of what information is visible on-chain. It is designed to help developers implement necessary business policies, such as meeting regulatory requirements.
What The Role Involves: As an experienced and visionary Head of Site Reliability Engineering (SRE), you will be responsible for leading the infrastructure and reliability strategy for Midnight, a regulatory-friendly blockchain focused on data protection, privacy, and freedom of expression. In this senior leadership role, you will own the reliability, scalability, and performance of the Midnight platform. You will be responsible for building and leading a high-performing team of SREs, driving the SRE roadmap, and partnering closely with engineering, security, and product teams to deliver robust production systems.
You will be instrumental in setting the foundations of our infrastructure, designing systems that scale globally, and ensuring high availability, while embracing the unique challenges of a blockchain-based architecture. This is a hands-on leadership role combining technical depth, architectural vision, operational rigor, and people leadership.
- Lead the SRE team, sharing expertise and best practices.
- Coach, mentor, and develop the SRE team.
- Demonstrate leadership in driving initiatives that enhance service reliability, scalability, and overall performance.
- Lead the entire lifecycle of services, including inception, design, deployment, operation, and refinement.
- Support services before they go live through activities such as system design consulting, developing software platforms and frameworks, capacity planning, and launch reviews.
- Oversee the maintenance of live services by continuously measuring and monitoring factors like availability, latency, and overall system health.
- Assist our teams in creating software that is both simple and flexible to configure and deploy.
- Lead sustainable incident response practices, ensuring timely resolution with a focus on minimizing impact.
- Collaborate with software engineering and testing teams to establish and maintain automated regression suite infrastructure and performance testing.
- Sustainably scale systems through mechanisms like automation; evolve systems by pushing for changes that improve reliability and velocity.
- Conduct blameless postmortems to analyze incidents, identify root causes, and implement preventive measures.
Requirements: Who you are:
- Bachelor's degree in Computer Science, Information Technology, or a related field.
- At least 8 years in a reliability engineering, DevOps, or infrastructure-focused role.
- Proven track record of leading and managing a high-performing SRE team.
- Experience writing code in Python, Rust/C++ or JavaScript.
- Proven experience in build and release engineering, Linux operational excellence, and automation.
- Systematic problem-solving approach, coupled with effective communication skills and a sense of drive.
- You will be someone who works well on your own and with a team.
- You are kind and respectful of others' opinions and you are open and act with integrity when engaging in academic or technical discussions.
- Proven experience in capacity planning, performance monitoring, and optimization to ensure systems can handle current and future loads efficiently.
- System engineering experience working with application servers, containers, and web servers.
- Demonstrated ability to analyze incidents, identify root causes, and implement preventive measures to reduce the likelihood of recurring issues.
- Strong understanding of cloud architecture including the major cloud providers (AWS, GCP, etc).
- Experience working with Docker containers and related orchestration technologies (such as Kubernetes or ECS).
- Knowledge of SRE principles (observability, SLOs, SLIs, logging, etc).
- Understanding of the networking and security considerations involved in developing the architecture of our deployment environments.
- Fluency in Git-based workflows and commit discipline.
- Experience in providing mentorship and coaching to team members.
Benefits:
- Remote work
- Laptop reimbursement
- New starter package to buy hardware essentials (headphones, monitor, etc)
- Learning & Development opportunities
- Competitive PTO
At IOG, we value diversity and always treat all employees and job applicants based on merit, qualifications, competence, and talent. We do not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status.
Head of Site Reliability Engineering - Midnight employer: Io Me
Contact Detail:
Io Me Recruiting Team
StudySmarter Expert Advice 🤫
We think this is how you could land Head of Site Reliability Engineering - Midnight
✨ Tip Number 1
Familiarise yourself with the specific technologies and tools mentioned in the job description, such as Python, Rust/C++, and Docker. Being able to discuss your experience with these technologies during an interview will demonstrate your technical expertise and alignment with the role.
✨ Tip Number 2
Showcase your leadership skills by preparing examples of how you've successfully led teams in the past. Highlight any initiatives you've driven that improved service reliability or performance, as this is crucial for the Head of Site Reliability Engineering position.
✨ Tip Number 3
Research IOG and its projects, particularly the Midnight Network. Understanding their mission and values will help you articulate how your vision aligns with theirs, making you a more compelling candidate.
✨ Tip Number 4
Prepare to discuss your approach to incident response and system reliability. Be ready to share specific examples of how you've handled incidents in the past, including your problem-solving process and the outcomes of your actions.
Some tips for your application 🫡
Tailor Your CV: Make sure your CV highlights relevant experience in Site Reliability Engineering, DevOps, and infrastructure roles. Emphasise your leadership skills and any specific technologies mentioned in the job description, such as Python, Rust/C++, or JavaScript.
Craft a Compelling Cover Letter: In your cover letter, express your passion for blockchain technology and how your vision aligns with IOG's mission. Mention specific projects or experiences that demonstrate your ability to lead high-performing teams and drive reliability initiatives.
Showcase Problem-Solving Skills: Provide examples of how you've tackled complex problems in previous roles. Highlight your systematic approach to incident analysis and your experience with blameless postmortems, as these are crucial for the role.
Highlight Collaboration Experience: Discuss your experience working closely with engineering, security, and product teams. Illustrate how you've successfully collaborated on projects to enhance service reliability and performance, which is key for this position.
How to prepare for a job interview at Io Me
✨ Understand the Company and Its Vision
Before your interview, make sure to research IOG and its Midnight project thoroughly. Familiarise yourself with their approach to blockchain technology, especially their focus on data protection and privacy. This will help you align your answers with their vision and demonstrate your genuine interest in the role.
✨ Showcase Your Leadership Experience
As a Head of Site Reliability Engineering, you'll need to lead a high-performing team. Be prepared to discuss your previous leadership roles, how you've mentored team members, and the strategies you've implemented to enhance service reliability and performance. Use specific examples to illustrate your impact.
✨ Demonstrate Technical Proficiency
Given the technical nature of the role, be ready to discuss your experience with coding languages like Python, Rust/C++, or JavaScript. Highlight your familiarity with cloud architecture, containerisation technologies like Docker, and orchestration tools such as Kubernetes. This will show that you have the hands-on skills necessary for the position.
✨ Prepare for Problem-Solving Scenarios
Expect to face questions that assess your systematic problem-solving approach. Prepare to discuss past incidents you've managed, how you identified root causes, and the preventive measures you implemented. This will demonstrate your ability to handle challenges effectively and your commitment to continuous improvement.