At a Glance
- Tasks: Design and maintain cloud infrastructure, ensuring reliability and performance through automation.
- Company: Join a forward-thinking tech company focused on cloud-native solutions.
- Benefits: Competitive salary, flexible working hours, and opportunities for professional growth.
- Why this job: Be at the forefront of cloud technology and make a real impact on data platforms.
- Qualifications: Experience with AWS, SRE principles, and automation tools like Terraform or CloudFormation.
- Other info: Dynamic team environment with a focus on innovation and continuous improvement.
The predicted salary is between £36,000 and £60,000 per year.
We are looking for an AWS Site Reliability Engineer (SRE) to support and scale a cloud-native data platform built on AWS, Snowflake, and Databricks. The role focuses on driving reliability through automation, disaster recovery (DR) testing, resiliency engineering, observability, and proactive SLO/SLI/SLA management.
Key Responsibilities
- Design, build, and maintain automation for infrastructure provisioning, platform operations, and incident response using IaC and CI/CD.
- Lead resiliency and disaster recovery planning, including regular DR drills, failure testing, and recovery validation across AWS and data platform components.
- Define, implement, and manage SLIs, SLOs, and SLAs for critical data pipelines and platform services; use error budgets to guide reliability improvements.
- Build and operate robust observability solutions (metrics, logs, traces, alerts) for AWS services, Snowflake, and Databricks workloads.
- Partner with data engineering and platform teams to embed reliability-by-design into architecture and delivery practices.
- Perform root cause analysis (RCA) and drive continuous improvement to reduce toil and improve platform availability and performance.
- Own and drive resolution of incidents and service requests raised by consumer teams, providing operational support for platform usage while identifying recurring issues and automating fixes to improve reliability and user experience.
Required Skills & Experience
- Practical knowledge of SRE principles, including SLO/SLI/SLA design and error budgets.
- Strong experience with AWS (e.g., EC2, S3, IAM, VPC, CloudWatch) in production environments.
- Experience with observability tools and monitoring/alerting best practices.
- Hands-on experience with automation and IaC (Terraform, CloudFormation, CDK) and scripting (Python, Bash).
- Exposure to data platforms such as Snowflake and/or Databricks.
Nice to Have
- Experience running DR tests, chaos engineering, or resiliency testing in cloud environments.
- Familiarity with CI/CD pipelines and GitOps practices.
- Background supporting large-scale data or analytics platforms.
Cloud Engineer in Paisley employer: Pyramid Consulting, Inc
Contact Details:
Pyramid Consulting, Inc Recruiting Team
StudySmarter Expert Advice
We think this is how you could land the Cloud Engineer role in Paisley
Tip Number 1
Network like a pro! Reach out to folks in the industry, attend meetups, and connect with people on LinkedIn. You never know who might have the inside scoop on job openings or can refer you directly.
Tip Number 2
Show off your skills! Create a portfolio or GitHub repository showcasing your projects, especially those related to AWS, automation, and data platforms. This gives potential employers a taste of what you can do beyond your CV.
Tip Number 3
Prepare for interviews by practising common SRE scenarios and technical questions. Brush up on your knowledge of SLIs, SLOs, and disaster recovery strategies. The more confident you are, the better you'll perform!
Tip Number 4
Don't forget to apply through our website! We love seeing applications from passionate candidates who are eager to join our team. Plus, it's a great way to ensure your application gets the attention it deserves.
Some tips for your application
Tailor Your CV: Make sure your CV speaks directly to the role of AWS Site Reliability Engineer. Highlight your experience with SRE principles, AWS services, and any relevant automation tools. We want to see how your skills align with what we're looking for!
Craft a Compelling Cover Letter: Your cover letter is your chance to shine! Use it to explain why you're passionate about reliability engineering and how you can contribute to our cloud-native data platform. Keep it engaging and personal; we love to see your personality come through!
Showcase Your Projects: If you've worked on any projects related to automation, disaster recovery, or observability, make sure to mention them. We're keen to see real-world examples of your work that demonstrate your problem-solving skills and technical expertise.
Apply Through Our Website: We encourage you to apply directly through our website. It's the best way for us to receive your application and ensures you don't miss out on any important updates. Plus, it shows us you're serious about joining the StudySmarter team!
How to prepare for a job interview at Pyramid Consulting, Inc
Know Your SRE Principles
Make sure you brush up on your understanding of SRE principles, especially around SLOs, SLIs, and error budgets. Be ready to discuss how you've applied these concepts in past roles, as this will show your practical knowledge and readiness for the position.
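If you want a quick refresher on the error-budget arithmetic before the interview, the following is a minimal Python sketch. The function names, the 30-day window, and the 99.9% target are illustrative assumptions, not figures taken from the job description.

```python
def error_budget_minutes(slo_target: float, window_days: int = 30) -> float:
    """Allowed downtime, in minutes, for an availability SLO over a window.

    slo_target is a fraction such as 0.999 (i.e. 99.9%).
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be a fraction between 0 and 1")
    window_minutes = window_days * 24 * 60
    return (1.0 - slo_target) * window_minutes


def budget_remaining(slo_target: float, total_requests: int, failed_requests: int) -> float:
    """Fraction of a request-based error budget still unspent.

    Returns a negative value when the budget has been overspent.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be a fraction between 0 and 1")
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    allowed_failures = (1.0 - slo_target) * total_requests
    return 1.0 - failed_requests / allowed_failures


if __name__ == "__main__":
    # Hypothetical numbers: a 99.9% SLO over 30 days, 1,000,000 requests, 250 failures.
    print(round(error_budget_minutes(0.999), 1))
    print(round(budget_remaining(0.999, 1_000_000, 250), 2))
```

With these assumed numbers, a 99.9% target over 30 days allows roughly 43 minutes of downtime, and 250 failed requests out of a million would use about a quarter of the budget. Being able to walk through this kind of calculation aloud is a simple way to show you understand how error budgets guide reliability decisions.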
Showcase Your AWS Experience
Prepare specific examples of your experience with AWS services like EC2, S3, and CloudWatch. Highlight any projects where you've implemented automation or incident response strategies, as this will demonstrate your hands-on expertise and problem-solving skills.
Demonstrate Automation Skills
Be ready to talk about your experience with Infrastructure as Code (IaC) tools like Terraform or CloudFormation. Share instances where you've built automation for infrastructure provisioning or incident response, as this aligns perfectly with the role's requirements.
Discuss Observability Solutions
Familiarise yourself with observability tools and monitoring best practices. Prepare to discuss how you've built or operated observability solutions in previous roles, focusing on metrics, logs, and alerts, as this is crucial for ensuring platform reliability.