At a Glance
- Tasks: Maintain cloud infrastructure and manage incidents for seamless operations.
- Company: WALT Labs, a leading managed service provider in cloud technology.
- Benefits: 20 holiday days, private health insurance, and career growth opportunities.
- Other info: On-site at least 3 days a week in Kings Cross, London, with a collaborative work environment.
- Why this job: Join a dynamic team and work with cutting-edge Google Cloud Platform technologies.
- Qualifications: 3-5 years' experience with Google Cloud Platform and strong GCP knowledge required.
The predicted salary is between £36,000 and £60,000 per year.
Company Description
WALT Labs, a leading managed service provider, is dedicated to empowering businesses by harnessing the power of cloud technology. Our team specializes in delivering customized solutions tailored to meet the unique needs of our clients, driving growth and operational efficiency across industries. From supporting small businesses with seamless data migration to enabling large corporations to manage complex infrastructure projects, we provide exceptional service while staying at the forefront of cloud technology advancements.
Role Description
This is a full-time role, on-site at least 3 days a week in Kings Cross, London. We are seeking a skilled Site Reliability Engineer with a strong focus on Google Cloud Platform (GCP) to join our dynamic team. In this role, you’ll be responsible for maintaining cloud infrastructure, managing incidents, and ensuring seamless operations for our clients. You’ll use tools like incident.io and JIRA to manage and resolve support requests efficiently.
Responsibilities
- Serve as L2 on-call escalation point for complex technical issues requiring advanced troubleshooting
- Lead response to critical incidents, coordinating multiple teams and ensuring effective communication
- Provide expert-level support for GCP services including advanced networking, security, and architecture
- Perform advanced Google Workspace administration including domain management, security policies, and integration
- Use incident.io to manage escalated incidents, major incidents, and coordinate war room activities
- Optimize support workflows in JIRA, creating automation rules and improving ticket routing
- Monitor and tune infrastructure performance using advanced Grafana queries and custom metrics
- Lead technical projects including migrations, upgrades, and new service implementations
- Create comprehensive documentation including architectural diagrams, runbooks, and best practices guides
- Achieve a minimum of 50% billable hours through work for complex Cloud Assist/Managed Cloud customers and consulting engagements
- Mentor Cloud Support Engineers and juniors through formal and informal training sessions
- Identify and implement process improvements to increase efficiency and reduce resolution time
- Conduct thorough root cause analysis for recurring issues and implement permanent fixes
- Present technical solutions and recommendations to customer stakeholders and management
- Design and implement monitoring strategies for complex multi-cloud environments
- Develop automation scripts and tools to improve team efficiency and reduce manual work
- Participate in pre-sales activities providing technical expertise for solution design
- Review and approve changes to production environments following change management procedures
- Lead knowledge sharing sessions and technical deep-dives for the team
- Coordinate with vendor support for complex issues requiring manufacturer assistance
- Maintain expertise in multiple GCP services and stay current with new feature releases
- Participate in the business-hours escalation rotation
Qualifications
- 3-5 years' experience with Google Cloud Platform
- Minimum 2 Google Cloud Professional certifications
- Advanced Kubernetes knowledge and troubleshooting
- Proficient in Infrastructure as Code (Terraform)
- Strong scripting abilities (Python, Go, Bash)
- Expert with monitoring tools (Grafana, Datadog)
- Experience leading incident response
- Excellent communication and mentoring skills
- Proven track record of process improvement
- Ability to manage multiple priorities effectively
- Strong customer service orientation
Benefits
- 20 holiday days + bank holidays (earn an additional 1.5 days every 3 years)
- Private health insurance
Site Reliability Engineer in London employer: WALT Labs
WALT Labs is an exceptional employer that fosters a collaborative and innovative work culture in the heart of Kings Cross, London. With a strong commitment to employee growth, we offer extensive training opportunities and support for professional development, particularly in cloud technologies like Google Cloud Platform. Our competitive benefits package, including private health insurance and generous holiday allowances, ensures that our team members are well-supported both personally and professionally.
StudySmarter Expert Advice🤫
We think this is how you could land the Site Reliability Engineer role in London
✨Tip Number 1
Network like a pro! Reach out to folks in the industry, attend meetups, and connect with current employees at WALT Labs. A friendly chat can sometimes lead to opportunities that aren’t even advertised!
✨Tip Number 2
Show off your skills! Create a portfolio or GitHub repository showcasing your projects, especially those related to Google Cloud Platform. This gives you a chance to demonstrate your expertise beyond just a CV.
✨Tip Number 3
Prepare for the interview by brushing up on common SRE scenarios and incident management processes. Practise explaining how you’ve tackled challenges in the past, especially using tools like incident.io and JIRA.
✨Tip Number 4
Don’t forget to apply through our website! It’s the best way to ensure your application gets seen by the right people. Plus, it shows you’re genuinely interested in joining the WALT Labs team!
Some tips for your application 🫡
Tailor Your CV: Make sure your CV is tailored to the Site Reliability Engineer role. Highlight your experience with Google Cloud Platform and any relevant tools like incident.io and JIRA. We want to see how your skills match what we're looking for!
Craft a Compelling Cover Letter: Your cover letter is your chance to shine! Use it to explain why you're passionate about cloud technology and how your background makes you a great fit for our team. We love seeing enthusiasm and personality in applications.
Showcase Your Problem-Solving Skills: In your application, don’t forget to mention specific examples of how you've tackled challenges in cloud environments. We’re all about strong troubleshooting skills, so let us know how you’ve made a difference in past roles!
Apply Through Our Website: We encourage you to apply directly through our website. It’s the best way to ensure your application gets into the right hands. Plus, it shows us you’re serious about joining our awesome team at WALT Labs!
How to prepare for a job interview at WALT Labs
✨Know Your GCP Inside Out
Make sure you brush up on your Google Cloud Platform knowledge. Be ready to discuss specific services you've used and how they relate to the role. Prepare examples of how you've managed cloud infrastructure or resolved incidents in the past.
✨Familiarise Yourself with Incident Management Tools
Since you'll be using tools like incident.io and JIRA, it’s a good idea to get comfortable with them before the interview. Think of scenarios where you’ve effectively used these tools to manage incidents or support requests, and be ready to share those experiences.
✨Show Off Your Troubleshooting Skills
Prepare to discuss your approach to troubleshooting in cloud environments. Have a couple of real-life examples ready that showcase your problem-solving skills, especially under pressure. This will demonstrate your ability to handle the challenges of the role.
✨Communicate Clearly and Confidently
Excellent communication is key, especially when dealing with clients and stakeholders. Practise explaining complex technical concepts in simple terms. This will not only help you during the interview but also show that you can provide great customer service.