At a Glance
- Tasks: Design and maintain cloud infrastructure, ensuring reliability and performance through automation.
- Company: Join a forward-thinking tech company focused on cloud-native solutions.
- Benefits: Competitive salary, flexible working hours, and opportunities for professional growth.
- Why this job: Be at the forefront of cloud technology and make a real impact on data platforms.
- Qualifications: Experience with AWS, SRE principles, and automation tools like Terraform or CloudFormation.
- Other info: Dynamic team environment with a focus on innovation and continuous improvement.
The predicted salary is between £36,000 and £60,000 per year.
We are looking for an AWS Site Reliability Engineer (SRE) to support and scale a cloud-native data platform built on AWS, Snowflake, and Databricks. The role focuses on driving reliability through automation, disaster recovery (DR) testing, resiliency engineering, observability, and proactive SLO/SLI/SLA management.
Key Responsibilities
- Design, build, and maintain automation for infrastructure provisioning, platform operations, and incident response using IaC and CI/CD.
- Lead resiliency and disaster recovery planning, including regular DR drills, failure testing, and recovery validation across AWS and data platform components.
- Define, implement, and manage SLIs, SLOs, and SLAs for critical data pipelines and platform services; use error budgets to guide reliability improvements.
- Build and operate robust observability solutions (metrics, logs, traces, alerts) for AWS services, Snowflake, and Databricks workloads.
- Partner with data engineering and platform teams to embed reliability-by-design into architecture and delivery practices.
- Perform root cause analysis (RCA) and drive continuous improvement to reduce toil and improve platform availability and performance.
- Own and resolve incidents and service requests raised by consumer teams, providing operational support for platform usage; identify recurring issues and automate fixes to improve reliability and user experience.
Required Skills & Experience
- Practical knowledge of SRE principles, including SLO/SLI/SLA design and error budgets.
- Strong experience with AWS (e.g., EC2, S3, IAM, VPC, CloudWatch) in production environments.
- Experience with observability tools and monitoring/alerting best practices.
- Hands-on experience with automation and IaC (Terraform, CloudFormation, CDK) and scripting (Python, Bash).
- Exposure to data platforms such as Snowflake and/or Databricks.
Nice to Have
- Experience running DR tests, chaos engineering, or resiliency testing in cloud environments.
- Familiarity with CI/CD pipelines and GitOps practices.
- Background supporting large-scale data or analytics platforms.
Cloud Engineer in Milton
Employer: Pyramid Consulting, Inc
Contact Details:
Pyramid Consulting, Inc Recruiting Team
StudySmarter Expert Advice 🤫
We think this is how you could land the Cloud Engineer role in Milton
✨Tip Number 1
Network like a pro! Reach out to folks in the industry, attend meetups, and connect with people on LinkedIn. You never know who might have the inside scoop on job openings or be able to refer you directly.
✨Tip Number 2
Show off your skills! Create a portfolio or GitHub repository showcasing your projects, especially those related to AWS, automation, and data platforms. This gives potential employers a taste of what you can do beyond your CV.
✨Tip Number 3
Prepare for interviews by brushing up on SRE principles and your hands-on experience with AWS tools. Practice common interview questions and scenarios related to incident response and reliability engineering to boost your confidence.
✨Tip Number 4
Don’t forget to apply through our website! We’re always on the lookout for talented individuals like you. Plus, it’s a great way to ensure your application gets the attention it deserves.
Some tips for your application 🫡
Tailor Your CV: Make sure your CV speaks directly to the role of AWS Site Reliability Engineer. Highlight your experience with SRE principles, AWS services, and any relevant automation tools. We want to see how your skills align with what we're looking for!
Craft a Compelling Cover Letter: Your cover letter is your chance to shine! Use it to explain why you're passionate about reliability engineering and how you can contribute to our cloud-native data platform. Keep it engaging and personal – we love to see your personality come through!
Showcase Relevant Projects: If you've worked on projects involving AWS, Snowflake, or Databricks, make sure to mention them! We want to know about your hands-on experience and how you've tackled challenges in the past. Real-world examples can really set you apart.
Apply Through Our Website: We encourage you to apply directly through our website. It’s the best way for us to receive your application and ensures you’re considered for the role. Plus, it shows us you’re keen on joining the StudySmarter team!
How to prepare for a job interview at Pyramid Consulting, Inc
✨Know Your SRE Principles
Make sure you brush up on your understanding of SRE principles, especially around SLOs, SLIs, and error budgets. Be ready to discuss how you've applied these concepts in past roles, as this will show your practical knowledge and readiness for the job.
✨Showcase Your AWS Experience
Prepare to talk about your hands-on experience with AWS services like EC2, S3, and CloudWatch. Have specific examples ready that demonstrate how you've used these tools in production environments to drive reliability and performance.
✨Demonstrate Automation Skills
Since automation is key in this role, be ready to discuss your experience with IaC tools like Terraform or CloudFormation. Share examples of how you've built automation for infrastructure provisioning or incident response, highlighting any challenges you overcame.
✨Emphasise Collaboration
This role involves partnering with data engineering and platform teams, so be prepared to discuss how you've worked collaboratively in the past. Highlight any projects where you embedded reliability-by-design into architecture and delivery practices, showcasing your teamwork skills.