Arbi is an AI‑powered legal research platform serving legal professionals. We run our own infrastructure and take reliability seriously — our clients depend on the platform being fast, stable, and available. You’ll be the person who makes sure it stays that way.
What you’ll be responsible for
- System monitoring & alerting — keep a close eye on all services, spot issues before clients do, and own the alerting stack
- Incident response — be the first line of response when something goes wrong; diagnose, escalate when needed, and resolve quickly
- Service health & uptime — ensure our cluster, databases, inference services, and networking are running smoothly day to day
- Databases & storage — monitor and maintain replicated databases and object storage; perform routine health checks and backup verification
- Observability — maintain and improve the monitoring stack: metrics, dashboards, distributed tracing, and log aggregation
What we’re looking for
- Working knowledge of Linux systems administration
- Hands‑on experience with Docker and Docker Swarm — managing services, stacks, and multi‑node deployments
- CI/CD pipeline experience (GitHub Actions or similar)
- Experience monitoring and maintaining production systems — you’ve been on‑call and know what it feels like
- Familiarity with observability tools: Grafana, log aggregation, alerting (Prometheus, Netdata, or similar)
- Database administration with PostgreSQL or similar: backups and replication monitoring
- Basic networking awareness — understanding of how services are exposed, port configurations, overlay networks, and DNS resolution in containerised environments
- Good communication: you can write a clear incident summary and keep stakeholders informed
Nice to have
- Experience with GPU infrastructure or AI/ML inference services
- Familiarity with Redis or other in‑memory data stores
- Ansible or configuration management tooling
How we work
We are an AI‑first company — not just in what we build, but in how we operate. We provide access to AI coding assistants and automation tools, and we expect you to use them. You don’t need prior experience with these tools, just the right mindset.
Good to know
- Existing infrastructure — you’d be joining a small, focused engineering team with systems already in production, runbooks in place, and monitoring set up. Your job is to keep it healthy and improve it over time, not rebuild it from scratch.
- Real stakes — our clients are legal professionals who rely on the platform daily. Uptime and responsiveness matter.
- Location — ideally based in London. We are not able to sponsor visas at this time.
The ideal candidate is someone who takes quiet pride in a green dashboard — and knows exactly what to do when it turns red.