At a Glance
- Tasks: Manage AI helpdesk platform operations, ensuring reliability and performance.
- Company: HelloKindred, a leader in staffing for marketing, creative, and tech roles.
- Benefits: Hybrid work setup, competitive salary, and opportunities for professional growth.
- Other info: Inclusive hiring practices with excellent career advancement potential.
- Why this job: Join a dynamic team and shape the future of AI technology.
- Qualifications: Experience in DevOps, cloud platforms, and strong troubleshooting skills.
The predicted salary is between €60,000 and €80,000 per year.
Our client in the Information Technology and Services industry is looking for a Platform / SRE Engineer to own deployment, observability, reliability, cost control, and production operations for an AI helpdesk platform. This role will support the design, deployment, and operational management of AI services and production environments while ensuring scalability, uptime, performance optimization, and operational resilience across cloud-based infrastructure.
The ideal candidate will bring strong expertise in DevOps and Site Reliability Engineering practices, along with experience managing cloud-native platforms, CI/CD pipelines, observability tooling, and AI/ML production workloads within complex enterprise environments.
What you will do:
- Build and manage CI/CD pipelines, infrastructure, and runtime environments for AI services.
- Deploy and operate model-serving, orchestration, and application workloads.
- Implement monitoring, tracing, alerting, logging, and operational dashboards.
- Manage scaling activities, release processes, rollback mechanisms, and production support operations.
- Optimize inference cost, latency, uptime, and overall system reliability.
- Create runbooks, operational standards, and incident response processes.
- Support infrastructure automation and platform engineering initiatives.
- Maintain observability and monitoring solutions across production environments.
- Support release automation, secrets management, and production operational processes.
- Collaborate with engineering teams to support AI platform reliability and operational readiness.
- Troubleshoot production issues and support system diagnostics and remediation activities.
- Ensure platform stability, scalability, and performance across cloud-native environments.
Qualifications:
- Strong experience in DevOps and Site Reliability Engineering environments.
- Experience with Docker, Kubernetes, cloud platforms, and Infrastructure as Code practices.
- Strong experience with monitoring, observability, and operational tooling.
- Familiarity with CI/CD pipelines, release automation, secrets management, and production support processes.
- Understanding of LLM deployment patterns and API-based model integrations.
- Experience working with cloud platforms, particularly AWS.
- Experience using Jira, Confluence, and ServiceNow.
- Experience supporting AI/ML workloads in production environments is preferred.
- Experience with GPU workloads, autoscaling, and cost optimization is preferred.
- Strong troubleshooting, operational support, and incident response capabilities.
- Strong communication and collaboration skills within cross-functional engineering teams.
All your information will be kept confidential according to EEO guidelines. Candidates must be legally authorized to live and work in the country where the position is based, without requiring employer sponsorship. HelloKindred is committed to fair, transparent, and inclusive hiring practices. We assess candidates based on skills, experience, and role-related requirements. We appreciate your interest in this opportunity. While we review every application carefully, only candidates selected for an interview will be contacted. HelloKindred is an equal opportunity employer. We welcome applicants of all backgrounds and do not discriminate on the basis of race, colour, religion, sex, gender identity or expression, sexual orientation, age, national origin, disability, veteran status, or any other protected characteristic under applicable law.
About the employer of the Platform - SRE Engineer role in Sheffield: HelloKindred
HelloKindred is an exceptional employer that prioritises employee well-being and professional growth, offering a hybrid work environment that fosters flexibility and collaboration. With a strong commitment to inclusivity and innovation, employees have access to cutting-edge projects in AI and cloud technologies, alongside opportunities for continuous learning and development within a supportive team culture. Join us to be part of a forward-thinking organisation that values your contributions and empowers you to make a meaningful impact.
StudySmarter Expert Advice🤫
We think this is how you could land the Platform - SRE Engineer role in Sheffield
✨Tip Number 1
Network like a pro! Reach out to folks in the industry, especially those who work at HelloKindred or similar companies. A friendly chat can open doors and give you insider info that could help you stand out.
✨Tip Number 2
Show off your skills! Create a portfolio or GitHub repo showcasing your projects, especially those related to DevOps and SRE. This gives potential employers a taste of what you can do beyond your CV.
✨Tip Number 3
Prepare for the interview by brushing up on common SRE scenarios and challenges. Think about how you’d tackle issues like scaling or incident response. We want you to be ready to impress with your problem-solving skills!
✨Tip Number 4
Don’t forget to apply through our website! It’s the best way to ensure your application gets seen. Plus, we love seeing candidates who take that extra step to connect directly with us.
Some tips for your application 🫡
Tailor Your CV: Make sure your CV is tailored to the Platform - SRE Engineer role. Highlight your experience with DevOps, Site Reliability Engineering, and any relevant cloud platforms like AWS. We want to see how your skills match what we're looking for!
Craft a Compelling Cover Letter: Your cover letter is your chance to shine! Use it to explain why you're passionate about this role and how your background makes you a great fit. We love seeing enthusiasm and a personal touch, so let your personality come through.
Showcase Relevant Projects: If you've worked on projects involving CI/CD pipelines, Docker, or Kubernetes, make sure to mention them! We want to see concrete examples of your work that demonstrate your expertise in managing cloud-native platforms and AI services.
Apply Through Our Website: Don't forget to apply through our website! It’s the best way for us to receive your application and ensures you’re considered for the role. Plus, it helps us keep everything organised and efficient. We can’t wait to hear from you!
How to prepare for a job interview at HelloKindred
✨Know Your Tech Stack
Make sure you’re well-versed in the technologies mentioned in the job description, like Docker, Kubernetes, and AWS. Brush up on your knowledge of CI/CD pipelines and observability tools, as these will likely come up during the interview.
✨Showcase Your Problem-Solving Skills
Prepare to discuss specific examples where you've troubleshot production issues or optimised system performance. Use the STAR method (Situation, Task, Action, Result) to structure your answers and highlight your operational support capabilities.
✨Understand AI/ML Workloads
Since the role involves supporting AI services, be ready to talk about your experience with AI/ML workloads. Familiarise yourself with LLM deployment patterns and API-based model integrations to demonstrate your expertise in this area.
✨Ask Insightful Questions
Prepare thoughtful questions about the company’s approach to platform reliability and operational readiness. This shows your genuine interest in the role and helps you assess whether the company culture aligns with your values.