System Monitoring & Observability Engineer job in Bristol

Woodford | Full-Time | £50,000 - £65,000 / year (est.) | No working from home possible
United Cerebral Palsy of Georgia

At a Glance

  • Tasks: Design and maintain observability solutions using Prometheus and Grafana for real-time insights.
  • Company: Join SRT, a forward-thinking tech company in Bristol with a collaborative spirit.
  • Benefits: Enjoy a competitive salary, generous leave, and career development opportunities.
  • Other info: Work in a dynamic environment with a supportive team of experienced engineers.
  • Why this job: Make a real impact by enhancing user experience through innovative monitoring solutions.
  • Qualifications: Experience with Prometheus, Grafana, and strong Linux skills are essential.

The predicted salary is between £50,000 and £65,000 per year.

Role Overview

As a Software Engineer (Prometheus / Grafana) at SRT, you will join a small team implementing an observability visualisation for end users. Our engineers already have observability dashboards, which use Prometheus for metrics collection and Grafana for visualisation; this initiative aims to deliver a more user-friendly solution tailored to our end users. Our clients are located in countries around the world, each with differing WAN capabilities, and our system is geographically distributed on-premises across multiple sites. You will be supported by a team of highly experienced engineers, including UX designers, and the lead observability engineer will oversee and assist with your work throughout the project.

Key Responsibilities (not exhaustive)

  • Monitoring & Metrics Collection
    • Design, configure, and maintain Prometheus-based monitoring solutions
    • Develop and manage metric exporters for application and system-level data
    • Optimise Prometheus scraping configurations and retention policies
  • Alerting & Incident Response
    • Define and maintain alert rules based on SLIs/SLOs and performance baselines
    • Ensure alerts are actionable, with minimal false positives
    • Participate (not necessarily lead) in on-call rotations and incident postmortems
  • Observability Dashboards
    • Design and maintain Grafana dashboards for real-time operational insights
    • Collaborate with engineering and product teams to create tailored visualisations
    • Provide self-service dashboard capabilities for end users
  • System Performance & Reliability
    • Monitor infrastructure (servers, containers, databases, services) for uptime, latency, and throughput
    • Identify bottlenecks and recommend improvements
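
For candidates who want a feel for the exporter work listed above, here is a minimal sketch of what a custom exporter produces: a gauge rendered in the Prometheus text exposition format. It uses only the Python standard library; a real exporter would more likely use an official client library. The metric name `site_link_latency_seconds`, the `site` label, and the helper names are hypothetical and not taken from SRT's systems.

```python
def escape_label_value(value: str) -> str:
    """Escape backslash, double quote, and newline, as the text format requires."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_gauge(name: str, help_text: str, samples: dict) -> str:
    """Render one gauge family in the Prometheus text exposition format.

    `samples` maps a tuple of (label_name, label_value) pairs to a number.
    `help_text` is assumed to be a single line without backslashes.
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} gauge"]
    for labels, value in samples.items():
        if labels:
            pairs = ",".join(f'{k}="{escape_label_value(v)}"' for k, v in labels)
            lines.append(f"{name}{{{pairs}}} {float(value)}")
        else:
            # A sample without labels is written without braces.
            lines.append(f"{name} {float(value)}")
    return "\n".join(lines) + "\n"


# Hypothetical measurement for one site, for illustration only.
example = render_gauge(
    "site_link_latency_seconds",
    "Round-trip latency per site.",
    {(("site", "site-a"),): 0.042},
)
print(example, end="")
```

Serving this text over HTTP on a `/metrics` path is what lets Prometheus scrape it; that part is left out here to keep the sketch short.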

Required Skills & Experience

  • Proven experience with Prometheus (including PromQL) and Grafana in production environments
  • Strong knowledge of Linux-based systems
  • Experience writing and optimising PromQL queries for alerts and dashboards
  • Familiarity with exporters (node_exporter, blackbox_exporter, custom exporters)
  • Understanding of Alertmanager configuration and routing
  • Proficiency with Grafana dashboard creation and templating
  • Strong troubleshooting skills for infrastructure and application issues
  • Familiarity with containers (Docker)
  • Scripting skills with a focus on Python (Bash or Go also beneficial) for automation

Work Arrangement

You will be required to come to our Cardiff office one day a week.

Benefits

  • Highly competitive salary & benefits package
  • Matched company pension contributions up to 5%
  • 25 days annual leave rising to 28 days with service
  • Career development opportunities

Equal Opportunity Employer Statement

SRT Marine Systems plc are an equal opportunity employer. We are committed to creating an inclusive working environment for all employees and actively encourage applications from all sectors of the community.

System Monitoring & Observability Engineer job in Bristol, employer: United Cerebral Palsy of Georgia

At SRT, we pride ourselves on being an excellent employer, offering a highly competitive salary and benefits package alongside unmatched career development opportunities. Our inclusive work culture fosters collaboration among experienced engineers and UX designers, ensuring that every team member can thrive while contributing to innovative projects in the vibrant city of Bristol. With a commitment to employee growth and a supportive environment, SRT is the ideal place for those seeking meaningful and rewarding employment in the tech industry.

Contact Details:

United Cerebral Palsy of Georgia Recruitment Team

StudySmarter Expert Advice🤫

We think this is how you could land the System Monitoring & Observability Engineer job in Bristol

Tip Number 1

Network like a pro! Reach out to folks in the industry, attend meetups, and connect with people on LinkedIn. You never know who might have the inside scoop on job openings or can refer you directly.

Tip Number 2

Show off your skills! Create a portfolio showcasing your work with Prometheus and Grafana. Include any dashboards you've built or metrics you've optimised. This will give potential employers a taste of what you can do.

Tip Number 3

Prepare for interviews by brushing up on your technical knowledge. Be ready to discuss your experience with Linux systems, PromQL queries, and alerting strategies. Practice common interview questions to boost your confidence.

Tip Number 4

Don't forget to apply through our website! It’s the best way to ensure your application gets seen. Plus, we love seeing candidates who are proactive about their job search.

We think you need these skills to ace the System Monitoring & Observability Engineer job in Bristol

Prometheus
Grafana
PromQL
Linux-based Systems
Metric Exporters
Alertmanager Configuration
Grafana Dashboard Creation

Some tips for your application 🫡

Tailor Your CV: Make sure your CV highlights your experience with Prometheus and Grafana. We want to see how you've used these tools in real-world scenarios, so don’t hold back on the details!

Craft a Compelling Cover Letter: Your cover letter is your chance to shine! Tell us why you're passionate about observability and how your skills align with our needs. Keep it engaging and personal – we love a good story!

Show Off Your Projects: If you've worked on any relevant projects, whether at work or in your spare time, make sure to mention them. We’re keen to see your hands-on experience and how you’ve tackled challenges in monitoring and metrics collection.

Apply Through Our Website: We encourage you to apply directly through our website. It’s the best way for us to receive your application and ensures you’re considered for the role. Plus, it’s super easy!

How to prepare for a job interview at United Cerebral Palsy of Georgia

Know Your Tools Inside Out

Make sure you’re well-versed in Prometheus and Grafana. Brush up on your PromQL queries and be ready to discuss how you've used these tools in production environments. Being able to share specific examples of your experience will show that you’re not just familiar with the tools, but that you can effectively leverage them.

Showcase Your Problem-Solving Skills

Prepare to discuss past incidents where you had to troubleshoot infrastructure or application issues. Highlight your approach to identifying bottlenecks and the improvements you recommended. This will demonstrate your analytical skills and your ability to think critically under pressure.

Understand the User Perspective

Since the role involves creating user-friendly observability dashboards, think about how you can tailor visualisations for end-users. Be ready to discuss any previous experiences where you collaborated with product teams to enhance user experience. This shows you value the end-user and can work cross-functionally.

Be Ready for Technical Questions

Expect some technical questions related to alerting, metrics collection, and system performance. Brush up on your knowledge of Alertmanager configuration and routing. Practising common interview questions in this area can help you feel more confident and articulate during the interview.
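
One alerting topic that may come up is SLO-based burn-rate alerting. Below is a minimal sketch of the arithmetic, assuming a 99.9% availability target measured over a 30-day period and a paging threshold of 14.4. These figures are illustrative assumptions drawn from common practice, not from the role description, and the function names are hypothetical.

```python
def error_budget(slo_target: float) -> float:
    """Fraction of requests allowed to fail, e.g. about 0.001 for a 99.9% SLO."""
    return 1.0 - slo_target


def burn_rate(error_ratio: float, slo_target: float) -> float:
    """How many times faster than 'exactly on budget' errors are occurring."""
    return error_ratio / error_budget(slo_target)


def budget_consumed(rate: float, window_hours: float,
                    period_hours: float = 30 * 24) -> float:
    """Fraction of the period's error budget used if `rate` holds for the window."""
    return rate * window_hours / period_hours


def should_page(error_ratio: float, slo_target: float,
                threshold: float = 14.4) -> bool:
    """Page when the observed burn rate reaches the (assumed) threshold."""
    return burn_rate(error_ratio, slo_target) >= threshold


# A 2% error ratio against a 99.9% SLO burns budget roughly 20x too fast.
print(round(burn_rate(0.02, 0.999), 2))
print(should_page(0.02, 0.999))
```

A burn rate of 14.4 sustained for one hour uses about 2% of a 30-day budget (14.4 × 1 / 720), which is why that figure is often used as a fast-burn paging threshold.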