Site Reliability Engineer

Full-Time, £48,000 - £72,000 / year (est.), no home office possible

At a Glance

  • Tasks: Join us as a Site Reliability Engineer to automate and enhance system reliability.
  • Company: Be part of a dynamic team in London focused on cutting-edge technology solutions.
  • Benefits: Enjoy a collaborative work environment with opportunities for professional growth.
  • Why this job: Make a real impact by reducing manual toil and improving system performance.
  • Qualifications: 5-9 years of experience in SRE, automation tools, and cloud technologies required.
  • Other info: This is a full onsite position, perfect for tech enthusiasts ready to innovate.

The predicted salary is between £48,000 and £72,000 per year.

SRE Expert (Full Onsite)

Location: London

Responsible for delivering end-to-end self-healing automation solutions that reduce manual effort (toil).

Technical skills: Ansible, Terraform, Python, DevOps, SRE, Docker, AWS (Atlas), ECS-based internal tooling, shell scripting, Linux. Monitoring tools: Datadog, Splunk, Dynatrace, Grafana, ThousandEyes, Gremlin, etc.

  1. 5 to 9 years of experience with automation principles and tools (Ansible, etc.), including toil identification and quality-of-life automation.
  2. Advanced working experience with two or more of the following: Unix/Linux, Windows Server, Oracle, MSSQL, MongoDB.
  3. Experience with Python, Java, Curl scripting or any other types of scripting.
  4. Experience with JIRA, Confluence, Bitbucket, GitHub, Jenkins, Jules, Terraform.
  5. Experience with two or more of the following observability tools: AppDynamics, Geneos, Dynatrace, ECS-based internal tooling, Datadog, CloudWatch, BigPanda, Elasticsearch (ELK), Google Cloud Logging, Grafana, Prometheus, Splunk, ThousandEyes, etc.
  6. Experience in creating dashboards for infra / APM / E2E workflows.
  7. Monitoring, logging, alerting and error budgets (99.9%, 99.99%, 99.999%) for software, operations and business.
  8. Defining SLOs, SLIs and SLAs with business, operations and engineering teams.
  9. Experience with logging, monitoring, and event detection on Cloud or Distributed platforms.
  10. Experience creating and modifying technical documentation such as environment flow, functional requirements, nonfunctional requirements.
  11. Effective production management: incident and change management, production control, ITSM, ServiceNow; problem-solving and analytical skills, with the ability to turn findings into strategic imperatives.
  12. Technical operations experience in application support, stability, reliability and resiliency.
  13. Minimum 4-6 years of hands-on experience in SRE implementation of monitoring systems, including dashboard development for application reliability using Splunk, Dynatrace, Grafana, AppDynamics, Datadog, BigPanda.
  14. Experience working with Configuration as Code, Infrastructure as Code, AWS (Atlas).
  15. Provides technical direction regarding monitoring and logging to less experienced staff or develops highly complex original solutions. Acts as an Expert technical resource for modeling, simulation and analysis efforts.
  16. Overall, we are looking for an automation engineer who can reduce toil and improve the system's reliability and scalability.
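
To make the error budget figures in the list above concrete, here is a minimal Python sketch that converts an availability target into allowed downtime. It is an illustration only: the 30-day window and the function names are assumptions, not something the role specifies.

```python
# Sketch: convert an availability SLO into an error budget (allowed downtime).
# Assumption: a 30-day measurement window; the posting does not state one.

WINDOW_DAYS = 30  # assumed window, adjust to the agreed SLO period


def error_budget_minutes(slo_percent: float, window_days: int = WINDOW_DAYS) -> float:
    """Return the allowed downtime in minutes for an availability SLO."""
    total_minutes = window_days * 24 * 60
    return total_minutes * (1 - slo_percent / 100)


def budget_remaining(slo_percent: float, downtime_minutes: float,
                     window_days: int = WINDOW_DAYS) -> float:
    """Return the unspent fraction of the error budget (negative if overspent)."""
    budget = error_budget_minutes(slo_percent, window_days)
    return (budget - downtime_minutes) / budget


if __name__ == "__main__":
    for slo in (99.9, 99.99, 99.999):
        minutes = error_budget_minutes(slo)
        print(f"{slo}% availability -> {minutes:.2f} minutes of downtime per {WINDOW_DAYS} days")
```

Under these assumptions, 99.9% allows roughly 43.2 minutes of downtime per 30 days, and each additional nine cuts that allowance by a factor of ten.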

Nature of the Job:

1. Collaborate with the production support team to identify existing manual activities and automate them.

2. Identify areas of toil that can be automated to avoid manual intervention.

3. Build a monitoring system and observability platform that provides richer stack traces, alerts and dashboards.

4. Define SLAs, SLOs and SLIs and implement them for better monitoring.

5. Treat scalability, reliability and observability as the primary goals in order to reduce MTTD and MTTR.
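
As a rough illustration of the MTTD and MTTR goals above, the Python sketch below computes both from incident timestamps. The `Incident` record and its field names are hypothetical, and it assumes MTTR is measured from fault start to resolution; some teams measure it from detection instead.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Incident:
    """Hypothetical incident record; field names are illustrative."""
    started: datetime    # when the fault began
    detected: datetime   # when monitoring or a person noticed it
    resolved: datetime   # when service was restored


def mean_delta(deltas: list[timedelta]) -> timedelta:
    """Average a non-empty list of durations."""
    return sum(deltas, timedelta()) / len(deltas)


def mttd(incidents: list[Incident]) -> timedelta:
    """Mean time to detect: fault start to detection."""
    return mean_delta([i.detected - i.started for i in incidents])


def mttr(incidents: list[Incident]) -> timedelta:
    """Mean time to resolve: fault start to resolution (assumed definition)."""
    return mean_delta([i.resolved - i.started for i in incidents])


if __name__ == "__main__":
    sample = [
        Incident(datetime(2025, 1, 6, 10, 0), datetime(2025, 1, 6, 10, 5),
                 datetime(2025, 1, 6, 10, 45)),
        Incident(datetime(2025, 1, 9, 14, 0), datetime(2025, 1, 9, 14, 15),
                 datetime(2025, 1, 9, 15, 15)),
    ]
    print("MTTD:", mttd(sample))
    print("MTTR:", mttr(sample))
```

Tracking both numbers over time shows whether better alerting (lower MTTD) or better automation and runbooks (lower MTTR) is driving the improvement.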

Site Reliability Engineer employer: Mphasis

As a Site Reliability Engineer in London, you will join a dynamic team that values innovation and collaboration, fostering a culture of continuous improvement and professional growth. We offer competitive benefits, including flexible working arrangements and opportunities for skill development in cutting-edge technologies like AWS and DevOps tools. Our commitment to reducing manual toil through automation not only enhances operational efficiency but also empowers you to make a meaningful impact on our systems' reliability and scalability.

Contact Detail:

Mphasis Recruiting Team

StudySmarter Expert Advice 🤫

We think this is how you could land Site Reliability Engineer

✨Tip Number 1

Make sure to showcase your hands-on experience with automation tools like Ansible and Terraform. Highlight specific projects where you've successfully reduced manual effort through automation, as this aligns perfectly with our focus on self-healing solutions.

✨Tip Number 2

Familiarize yourself with the observability tools mentioned in the job description, such as Datadog and Grafana. Being able to discuss how you've used these tools to create dashboards or improve monitoring will set you apart during discussions.

✨Tip Number 3

Prepare to discuss your experience with defining SLAs, SLOs, and SLIs. We value candidates who can articulate how they have implemented these metrics in previous roles to enhance system reliability and performance.

✨Tip Number 4

Be ready to share examples of how you've collaborated with production support teams to identify and automate toil areas. This collaborative mindset is crucial for the role and demonstrates your ability to work effectively within a team.

We think you need these skills to ace Site Reliability Engineer

Ansible
Terraform
Python
DevOps
SRE
Docker
AWS (Atlas)
ECS Based internal tooling
Shell Scripting
Linux
Monitoring tools (Datadog, Splunk, Dynatrace, Grafana, ThousandEyes, Gremlin)
Toil identification
Quality of life automation
Unix/Linux
Windows Server
Oracle
MSSQL
MongoDB
Java
Curl scripting
JIRA
Confluence
Bitbucket
GitHub
Jenkins
Jules
AppDynamics
CloudWatch
BigPanda
Elasticsearch (ELK)
Google Cloud Logging
Prometheus
Dashboard creation for Infra/APM/E2E workflows
Monitoring, logging, alerting, and error budget management
Defining SLO, SLI, SLA
Technical documentation creation and modification
Incident & change management
ITSM
ServiceNow
Problem-solving skills
Analytical skills
Configuration as Code
Infrastructure as Code
Technical direction provision
Modeling, simulation, and analysis

Some tips for your application 🫡

Tailor Your CV: Make sure your CV highlights your experience with automation tools like Ansible and Terraform, as well as your proficiency in Python and other scripting languages. Emphasize your hands-on experience with monitoring tools such as Datadog and Splunk.

Craft a Strong Cover Letter: In your cover letter, explain how your background aligns with the responsibilities of the Site Reliability Engineer role. Discuss specific projects where you reduced manual effort through automation and improved system reliability.

Showcase Relevant Experience: When detailing your work history, focus on your experience with incident management, production control, and your ability to define SLAs, SLOs, and SLIs. Use metrics to demonstrate your impact on system reliability and scalability.

Highlight Collaboration Skills: Since the role involves collaboration with production support teams, mention any relevant teamwork experiences. Describe how you identified manual activities and successfully automated them, showcasing your problem-solving skills.

How to prepare for a job interview at Mphasis

✨Showcase Your Automation Skills

Be prepared to discuss your experience with automation tools like Ansible and Terraform. Highlight specific projects where you successfully reduced manual effort and improved system reliability.

✨Demonstrate Your Monitoring Expertise

Familiarize yourself with the monitoring tools mentioned in the job description, such as Datadog and Grafana. Be ready to explain how you've used these tools to create dashboards and improve observability.

✨Discuss Your Experience with SRE Principles

Talk about your understanding of SRE principles, including defining SLAs, SLOs, and SLIs. Provide examples of how you've implemented these concepts in previous roles to enhance system performance.

✨Prepare for Technical Questions

Expect technical questions related to scripting languages like Python and shell scripting. Brush up on your knowledge of Unix/Linux systems and be ready to solve problems on the spot.

Application deadline: 2027-03-15