Observability Engineer

London | Full-Time | £36,000 - £60,000 / year (est.) | No home office possible

At a Glance

  • Tasks: Lead the design and improvement of our observability platform, focusing on monitoring and event correlation.
  • Company: Join Trafigura Group IT, a leader in providing shared services across a diverse global network.
  • Benefits: Enjoy a collaborative work environment, opportunities for remote work, and a commitment to diversity.
  • Why this job: Be part of a team that enhances infrastructure performance and automates operational intelligence with cutting-edge tools.
  • Qualifications: 3+ years in Infrastructure Observability Engineering; degree in computer science or related field required.
  • Other info: Work alongside a dynamic team and contribute to innovative solutions in a fast-paced environment.

The predicted salary is between £36,000 and £60,000 per year.

Main Purpose: We are looking for an experienced Infrastructure Observability Engineer to lead the design, implementation, and continuous improvement of our enterprise observability platform. This role focuses on delivering comprehensive monitoring, event correlation, and impact analysis, using AIOps capabilities and tools such as BMC Helix Operations Manager.
The ideal candidate will be passionate about improving visibility into infrastructure performance, automating operational intelligence, and reducing mean time to resolution (MTTR) through intelligent alerting and root cause analysis.

Knowledge, Skills and Abilities, Key Responsibilities:

  • Own and evolve the enterprise observability strategy across all infrastructure tracks

  • Design, implement, and support event management and impact analysis workflows using platforms such as BMC Helix Operations Manager

  • Integrate and correlate data from multiple sources (e.g., 20+ monitoring systems) into a unified monitoring and alerting framework

  • Apply AIOps principles to reduce alert noise, detect anomalies, and predict/prevent potential outages

  • Collaborate with infrastructure, application, and service desk teams to define meaningful service-level metrics and dashboards

  • Maintain and extend the configuration of monitoring tools, event enrichment, suppression rules, and correlation logic

  • Develop and support automation for observability platform configuration using Infrastructure as Code

  • Define best practices for monitoring new platforms and services in collaboration with engineering and operations teams

  • Support the integration of observability data with ITSM platforms (e.g., Ivanti Neurons ITSM) to streamline incident and change processes

  • Ensure observability platforms are reliable, secure, well-documented, and continuously aligned with business requirements

Knowledge, Skills and Abilities

Specialist Knowledge:

  • Demonstrable experience in observability engineering, infrastructure monitoring, or event management roles

  • Experience with traditional and modern observability stacks such as SCOM, SolarWinds, Prometheus, Grafana and Elastic Stack (ELK)

  • Hands-on experience with BMC Helix Operations Manager, TrueSight, or similar enterprise monitoring platforms

  • Solid understanding of AIOps concepts, including event correlation, noise reduction, anomaly detection, and root cause analysis

  • Strong proficiency with scripting (e.g., Python, PowerShell, Bash) for automation and data handling

  • Solid understanding of networking fundamentals

  • Excellent problem-solving skills with the ability to diagnose complex issues using observability tools and logs

  • Exposure to cloud-native monitoring for platforms such as Azure Monitor, AWS CloudWatch, or Google Cloud Operations

  • Experience implementing self-healing alerts/systems based on tools such as VMware VCF Operations, Syslog, Splunk and VMware Log Insight

  • Proficiency with observability of Kubernetes clusters

Educational Background:

Bachelor’s degree in computer science, information technology or a related field.

Professional Experience:

Minimum of 3 years of experience in Infrastructure Observability Engineering.

Competencies

  • Problem-solving

  • Ability to improve business processes

  • Able to use initiative

  • Strategic planning

Key Relationships and Department Overview:

Key Relationships

  • Outsourced Event & Impact Management Team

  • Outsourced Monitoring Administration Teams

  • Engineering Teams (Platform, Windows, Networks, SQL Server & Oracle)

  • Vendor management

  • Change, Incident & Problem Manager

  • Outsourced IT management

Department

Trafigura Group IT provides shared services across the Trafigura group of companies, offering services at scale where it makes economic sense.

Reporting Structure

The engineer will report to the Platform Architect and will join a collaborative team of six other engineers covering the Storage, Linux and Virtualisation towers.

Equal Opportunity Employer

We are an Equal Opportunity Employer and take pride in a diverse workforce. We do not discriminate in recruitment, hiring, training, promotion or other employment practices for reasons of race, colour, religion, gender, sexual orientation, national origin, age, marital or veteran status, medical condition or handicap, disability, or any other legally protected status.

Observability Engineer employer: Trafigura

At Trafigura, we pride ourselves on being an excellent employer, offering a dynamic work culture that fosters collaboration and innovation. As an Infrastructure Observability Engineer, you will have the opportunity to work with cutting-edge technologies while benefiting from our commitment to employee growth through continuous learning and development. Our diverse workforce and inclusive environment ensure that every team member is valued, making Trafigura a rewarding place to advance your career in a meaningful way.

Contact Details:

Trafigura Recruiting Team

StudySmarter Expert Advice 🤫

We think this is how you could land the Observability Engineer role

✨Tip Number 1

Familiarise yourself with the specific observability tools mentioned in the job description, such as BMC Helix Operations Manager and AIOps concepts. Having hands-on experience or even a solid understanding of these tools will give you an edge during discussions.

✨Tip Number 2

Network with professionals in the observability engineering field. Join relevant online forums or LinkedIn groups where you can engage with others who work in similar roles. This can provide insights into industry trends and potentially lead to referrals.

✨Tip Number 3

Prepare to discuss your problem-solving skills and how you've used observability tools to diagnose complex issues in past roles. Be ready to share specific examples that demonstrate your ability to improve business processes and reduce MTTR.

✨Tip Number 4

Showcase your scripting skills, particularly in Python, PowerShell, or Bash. Consider creating a small project or script that automates a task related to observability, which you can present during interviews to highlight your technical capabilities.

We think you need these skills to ace the Observability Engineer role

Observability Engineering
Infrastructure Monitoring
Event Management
BMC Helix Operations Manager
AIOps Principles
Data Correlation
Anomaly Detection
Root Cause Analysis
Scripting (Python, PowerShell, Bash)
Networking Fundamentals
Cloud-Native Monitoring (Azure Monitor, AWS CloudWatch, Google Cloud Operations)
Kubernetes Cluster Observability
Self-Healing Alerts Implementation
Problem-Solving Skills
Automation using Infrastructure as Code
Collaboration with Engineering and Operations Teams

Some tips for your application 🫡

Tailor Your CV: Make sure your CV highlights relevant experience in observability engineering and infrastructure monitoring. Focus on specific tools you've used, such as BMC Helix Operations Manager, and any AIOps principles you've applied.

Craft a Strong Cover Letter: In your cover letter, express your passion for improving infrastructure performance and automating operational intelligence. Mention how your skills align with the responsibilities outlined in the job description.

Showcase Relevant Projects: If you have worked on projects involving event management, impact analysis, or automation using Infrastructure as Code, be sure to include these in your application. Provide specific examples of how you contributed to reducing MTTR or improving alerting systems.

Highlight Problem-Solving Skills: Demonstrate your problem-solving abilities by discussing complex issues you've diagnosed using observability tools. Include any experience with self-healing alerts or cloud-native monitoring that showcases your expertise.

How to prepare for a job interview at Trafigura

✨Showcase Your Technical Expertise

Be prepared to discuss your hands-on experience with observability tools like BMC Helix Operations Manager, Prometheus, and Grafana. Highlight specific projects where you've implemented monitoring solutions or automated processes, as this will demonstrate your capability in the role.

✨Understand AIOps Principles

Since the role involves applying AIOps concepts, brush up on event correlation, noise reduction, and anomaly detection. Be ready to explain how you've used these principles in past roles to improve infrastructure performance and reduce MTTR.

✨Prepare for Problem-Solving Scenarios

Expect to face technical problem-solving questions during the interview. Practice articulating your thought process when diagnosing complex issues using observability tools and logs, as this will showcase your analytical skills and approach to troubleshooting.

✨Demonstrate Collaboration Skills

The role requires collaboration with various teams, so be ready to discuss how you've worked with cross-functional teams in the past. Share examples of how you defined service-level metrics or contributed to team projects, emphasising your ability to communicate effectively and work towards common goals.
