DevOps Engineer – Reinforcement Learning Platforms
We are seeking an experienced DevOps Engineer to help build and scale a web-based platform for reinforcement learning (RL) training and RLOps. You will design, implement, and maintain the cloud infrastructure, CI/CD pipelines, and deployment systems that support large-scale RL workloads.
Responsibilities
- Design and manage scalable cloud infrastructure for high-performance RL training and distributed environments
- Build and optimise CI/CD pipelines for open-source and enterprise components
- Implement containerisation and orchestration using Docker and Kubernetes
- Develop Infrastructure as Code solutions (Terraform, CloudFormation, Pulumi)
- Implement monitoring, logging, and alerting for distributed ML systems
- Collaborate with ML teams on resource optimisation and cost efficiency
- Apply security best practices, manage access controls, and ensure compliance
- Automate operational tasks such as backups, disaster recovery, and maintenance
- Support GPU clusters and distributed compute resources for RL workloads
- Maintain availability and performance of production ML systems
Requirements
- Degree in Computer Science/Engineering or 3+ years of DevOps/infrastructure experience
- Strong background in AWS, GCP, or Azure, including ML/AI workloads
- Proficiency with Docker, Kubernetes, and ML-focused orchestration tools
- Experience with Terraform/CloudFormation/Pulumi and configuration management
- Solid understanding of CI/CD tools (GitHub Actions, GitLab CI, Jenkins)
- Knowledge of monitoring/observability tools (Prometheus, Grafana, OpenObserve)
- Experience with GPU infrastructure and distributed ML compute frameworks
- Familiarity with MLOps tools and model lifecycle management
- Strong scripting skills (Python, Bash)
- Understanding of cloud networking, security, and database fundamentals
- Experience with HPC environments or job schedulers is a plus
- Strong problem‑solving and communication skills
Compensation & Benefits
- Stock options
- 30 days' holiday plus bank holidays
- Flexible and remote working options
- Enhanced parental leave
- £500 annual learning and development budget
- Pension scheme
- Regular socials and quarterly gatherings
- Bike‑to‑Work scheme
Seniority level
Mid‑Senior level
Employment type
Full‑time
Job function
Information Technology
Industries
Software Development
Location: Greater London, England, United Kingdom.
Contact Detail:
Institute of Communication Recruiting Team