At a Glance
- Tasks: Ensure our systems run smoothly and reliably while collaborating with development teams.
- Company: Join a rapidly growing startup blending human expertise with AI for innovative solutions.
- Benefits: Enjoy flexible hours, remote work, competitive salary, equity, and generous leave policies.
- Why this job: Make a significant impact in a smaller company with cutting-edge technology and a supportive culture.
- Qualifications: Experience with monitoring tools, coding in Go or Python, and CI/CD platforms required.
- Other info: Work remotely from anywhere within three hours of UK time and grow your professional skills.
The predicted salary is between £36,000 and £60,000 per year.
We are a rapidly growing startup developing solutions that blend human expertise and AI agents to handle manual customer and marketplace operations tasks. Our unique approach combines the strengths of human expertise (high accuracy and nuanced decision-making) with the advantages of AI automation (speed and cost efficiency). This cutting-edge technology helps businesses solve real-world challenges in trust & safety and beyond without complex technical integration. We believe in an online world free from harm, where we can trust AI to make safe and fair decisions.
We have raised about $25M in VC funding from top-tier funds, including Creandum and Plural, and we operate at significant scale, analysing millions of images and videos every day. We are now looking for a Site Reliability Engineer to ensure our systems run smoothly and reliably at scale. Your expertise in monitoring, observability, and system automation will help maintain the high availability and performance our customers depend on. You will work at the intersection of development and operations, using your technical skills to build robust infrastructure and streamline deployment processes.
You will collaborate closely with development teams to implement monitoring solutions, create comprehensive alerting systems, and develop the tools needed to maintain system reliability. Initially, you will focus on enhancing our existing monitoring and alerting infrastructure, then gradually build self-healing systems and self-service capabilities that empower teams to diagnose and resolve issues independently.
- Collaborate with our development teams to ensure our observability stack provides clear visibility into system health and performance.
- Build self-healing systems using AI tools that automatically resolve common issues before they require human intervention.
- Develop automation tools and diagnostic capabilities that help teams quickly identify and resolve issues when manual investigation is required.
- Ensure secure and reliable code deployment processes through robust CI/CD pipelines and infrastructure automation.
We are looking for someone who is excited about building innovative solutions and wants to have a large impact in a smaller company; you should be comfortable balancing your time between fixing urgent issues and investing in proactive system improvements.
- Have worked with visualisation tools such as Grafana for creating and maintaining dashboards that provide meaningful insights into system performance.
- Are proficient with metrics platforms such as Prometheus, InfluxDB, or OpenTelemetry for collecting and analysing system data.
- Are confident in writing production code in languages such as Go or Python.
- Have experience working in a fully remote, international team.
- Have experience with CI/CD platforms for building reliable deployment pipelines, and have worked with Kubernetes and infrastructure-as-code tools such as Terraform for scalable system deployment.
- Are familiar with MLOps practices and tools, and monitoring machine learning systems in production.
This role will report to the VP of Engineering and can be based anywhere within three hours of UK time. Unitary is a remote-first team with flexible hours and location.
Competitive salary and equity package, occupational pension, generous paid parental leave, generous paid sick leave, annual budget for your professional development and growth.
Site Reliability Engineer I employer: Unitary
Contact Detail:
Unitary Recruiting Team
StudySmarter Expert Advice 🤫
We think this is how you could land the Site Reliability Engineer I role
✨Tip Number 1
Familiarise yourself with the specific tools mentioned in the job description, such as Grafana, Prometheus, and Kubernetes. Having hands-on experience or projects showcasing your skills with these technologies can set you apart during discussions.
✨Tip Number 2
Engage with the SRE community online. Join forums, attend webinars, or participate in relevant discussions on platforms like LinkedIn or GitHub. This not only helps you stay updated but also allows you to network with professionals who might provide insights or referrals.
✨Tip Number 3
Prepare to discuss real-world scenarios where you've implemented monitoring solutions or automated processes. Be ready to share specific examples of how your contributions improved system reliability or performance in previous roles.
✨Tip Number 4
Show your enthusiasm for the company's mission by researching their products and understanding the challenges they address. During conversations, express how your skills align with their goals and how you can contribute to building innovative solutions.
Some tips for your application 🫡
Tailor Your CV: Make sure your CV highlights relevant experience in site reliability engineering, particularly with monitoring, observability, and automation. Use keywords from the job description to demonstrate your fit for the role.
Craft a Compelling Cover Letter: In your cover letter, express your enthusiasm for the role and the company. Mention specific projects or experiences that align with their focus on AI and system reliability, showcasing how you can contribute to their mission.
Showcase Technical Skills: Clearly outline your proficiency with tools mentioned in the job description, such as Grafana, Prometheus, and CI/CD platforms. Provide examples of how you've used these tools in past roles to improve system performance.
Highlight Remote Work Experience: Since this is a remote position, emphasise any previous experience working in remote teams. Discuss how you effectively communicate and collaborate with colleagues across different time zones.
How to prepare for a job interview at Unitary
✨Understand the Company’s Mission
Before your interview, make sure you grasp Unitary AI's mission of blending human expertise with AI. Be prepared to discuss how your skills as a Site Reliability Engineer can contribute to their goal of creating a safe and fair online world.
✨Showcase Your Technical Skills
Be ready to demonstrate your proficiency with visualisation tools like Grafana and metrics platforms such as Prometheus or InfluxDB. Prepare examples of how you've used these tools to improve system reliability in previous roles.
✨Discuss Automation Experience
Since the role involves building self-healing systems, share your experiences with automation tools and CI/CD pipelines. Highlight any projects where you implemented infrastructure as code using Terraform or worked with Kubernetes.
✨Emphasise Collaboration
Unitary AI values teamwork, especially between development and operations. Be ready to discuss how you've collaborated with cross-functional teams in the past to improve system performance and reliability.