Senior Operations Reliability Engineer – Enterprise Platforms and Tools

Full-Time | £60,000 - £75,000 / year (est.) | No working from home possible
Genesys Cloud Services, Inc.

At a Glance

  • Tasks: Own the reliability and management of enterprise productivity platforms like Jira and Confluence.
  • Company: Join Genesys, a leader in customer experience technology with a collaborative culture.
  • Benefits: Enjoy competitive salary, health perks, remote work options, and growth opportunities.
  • Other info: Dynamic team environment with mentorship and career advancement potential.
  • Why this job: Make a real impact by enhancing operational reliability and automating processes.
  • Qualifications: 5+ years in SaaS administration and strong troubleshooting skills required.

The predicted salary is between £60,000 and £75,000 per year.

Genesys empowers organizations of all sizes to improve loyalty and business outcomes by creating the best experiences for their customers and employees. Through Genesys Cloud, the AI-powered Experience Orchestration platform, organizations can accelerate growth by delivering empathetic, personalized experiences at scale to drive customer loyalty, workforce engagement, efficiency and operational improvements.

We employ more than 6,000 people across the globe who embrace empathy and cultivate collaboration to succeed. And, while we offer great benefits and perks like larger tech companies, our employees have the independence to make a larger impact on the company and take ownership of their work. Join the team and create the future of customer experience together.

Overview

As a Senior Operations Reliability Engineer specializing in Enterprise Platforms and Tools, you will own the operational reliability, health, and lifecycle management of enterprise productivity and collaboration platforms. This role combines hands-on platform administration with day-to-day operational ownership and governance of enterprise SaaS tools such as Jira, Confluence, Figma, Lucid, and other SaaS-related platforms. In addition to serving as a senior escalation point, you will improve monitoring accuracy, reduce alert noise, validate automation workflows, and contribute to AIOps tuning and observability standards. You will help transition enterprise tool operations from reactive issue handling toward proactive, automation-driven reliability practices that improve uptime, user communication, and service maturity.

Responsibilities
  • General Reliability Operations
    • Monitor observability and AIOps platforms to detect anomalies, performance degradation, and emerging issues across enterprise systems.
    • Perform advanced incident triage and event correlation to identify root cause and reduce duplicate or misrouted incidents.
    • Lead or contribute to post-incident reviews, identifying systemic fixes and automation opportunities.
    • Validate automated remediation workflows prior to production adoption.
    • Identify recurring manual tasks and translate them into automation requirements or scripted improvements.
    • Improve alert signal quality by refining thresholds, suppression logic, and event correlation rules.
    • Ensure platform telemetry, SaaS health signals, and configuration data align with monitoring and CMDB standards.
    • Collaborate with Cloud, IAM, Network, Security, and ServiceNow teams to improve enterprise service reliability.
  • Enterprise Tools Ownership & Operational Management
    • Own day-to-day operational health and administration of enterprise SaaS platforms (e.g., Jira, Confluence, Figma, Lucid, monitoring tools, and similar productivity platforms).
    • Monitor vendor service health dashboards and integrate SaaS outage signals into internal observability and AIOps workflows.
    • Lead user-impact communications during enterprise tool outages or service degradations in partnership with IT Communications and ServiceNow teams.
    • Review vendor release notes and roadmap updates; assess feature changes, security updates, and deprecations.
    • Plan and coordinate controlled feature rollouts, configuration updates, and tenant-level optimizations.
    • Provide guidance and education to end users on new features, configuration changes, and best practices.
    • Manage licensing, usage monitoring, and cost optimization for enterprise tools.
    • Partner with Security and IAM teams to ensure access governance and compliance standards are maintained.
    • Improve monitoring coverage for enterprise tools by integrating telemetry and health signals into AIOps platforms.
    • Document operational standards, support models, and escalation paths for each owned platform.
  • Enterprise Platform Responsibilities
    • Diagnose and remediate integration issues between enterprise platforms and supporting systems.
    • Validate patching and upgrade activities to ensure minimal service disruption.
    • Participate in resilience validation exercises, including failover and recovery testing.
    • Provide mentorship and knowledge-sharing to junior reliability engineers.
    • Support operational reliability of Microsoft Power Platform components (Power Apps, Power Automate, Power BI), including:
      • Monitoring flow failures
      • Troubleshooting environment-level issues
      • Supporting connector configuration
      • Assisting with environment governance and data loss prevention policies
  • Automation & AIOps Contributions
    • Develop and maintain automation scripts (PowerShell, Python) to reduce repetitive operational effort.
    • Contribute to ServiceNow and Power Automate workflow improvements tied to enterprise tool incidents.
    • Partner with teams to refine automated remediation logic.
    • Improve enterprise tool signal quality by integrating vendor health data and usage telemetry into AIOps systems.
    • Support tuning of alert correlation and anomaly detection models for enterprise services.
    • Track improvements in MTTR, alert noise reduction, automation coverage, and platform uptime.

Requirements
  • Bachelor’s degree in Computer Science, Information Technology, or related field; equivalent experience considered.
  • 5+ years of experience in enterprise platform operations, SaaS administration, or infrastructure support roles.
  • Hands-on experience administering enterprise tools such as Jira, Confluence, Figma, Lucid, or similar SaaS platforms. This includes setting up monitoring and event management capabilities to alert on outages or service degradation.
  • Experience with SQL Server and IIS/Apache administration is an asset.
  • Experience managing SaaS service health, vendor communications, and feature rollouts.
  • Proficiency in PowerShell or equivalent scripting for automation tasks.
  • Solid understanding of monitoring, observability, and event management practices.
  • Familiarity with ITIL principles and ServiceNow workflows.
  • Strong troubleshooting and analytical skills.
  • Effective communication skills, including experience communicating user-facing outages or changes.
  • Motivation to deepen expertise in automation, AIOps, and reliability engineering.

Preferred Qualifications
  • Experience integrating SaaS platforms with identity providers (Okta, Entra ID).
  • Familiarity with CI/CD pipelines or automation-driven configuration management.
  • Exposure to cloud platforms (AWS or Azure).

Additional Information
  • On-Call Support: Participation in a shared, rotational on-call schedule is required.

Genesys is an equal opportunity employer committed to fairness in the workplace. We evaluate qualified applicants without regard to race, color, age, religion, sex, sexual orientation, gender identity or expression, marital status, domestic partner status, national origin, genetics, disability, military and veteran status, and other protected characteristics.

Senior Operations Reliability Engineer – Enterprise Platforms and Tools employer: Genesys Cloud Services, Inc.

Genesys is an exceptional employer that fosters a collaborative and empathetic work culture, empowering employees to take ownership of their roles and make a significant impact. Located in Northern Ireland, the company offers competitive benefits and opportunities for professional growth, particularly in the rapidly evolving field of enterprise platforms and tools. With a commitment to innovation and employee development, Genesys provides a unique environment where you can thrive while contributing to the future of customer experience.

Contact Details:

Genesys Cloud Services, Inc. Recruitment Team

StudySmarter Expert Advice 🤫

We think this is how you could land the Senior Operations Reliability Engineer – Enterprise Platforms and Tools role

Tip Number 1

Network like a pro! Reach out to folks in your industry on LinkedIn or at local meetups. A friendly chat can lead to opportunities that aren’t even advertised yet.

Tip Number 2

Prepare for those interviews! Research the company and its culture, and be ready to discuss how your skills with tools like Jira and Confluence can make a difference. We want to see your passion shine through!

Tip Number 3

Show off your problem-solving skills! Be ready to share examples of how you’ve tackled challenges in past roles, especially around operational reliability and automation. We love hearing about real-life experiences!

Tip Number 4

Don’t forget to apply through our website! It’s the best way to ensure your application gets seen by the right people. Plus, it shows you’re genuinely interested in joining our team!

We think you need these skills to ace the Senior Operations Reliability Engineer – Enterprise Platforms and Tools role

Operational Reliability
SaaS Administration
Platform Monitoring
Incident Triage
Automation Scripting (PowerShell, Python)
AIOps
Event Management

Some tips for your application 🫡

Tailor Your CV: Make sure your CV is tailored to the Senior Operations Reliability Engineer role. Highlight your experience with enterprise platforms and tools like Jira and Confluence, and don’t forget to showcase your automation skills!

Craft a Compelling Cover Letter: Your cover letter is your chance to shine! Use it to explain why you’re passionate about operational reliability and how your background makes you a perfect fit for our team at Genesys.

Showcase Your Problem-Solving Skills: In your application, give examples of how you've tackled complex issues in previous roles. We love seeing candidates who can think critically and come up with innovative solutions!

Apply Through Our Website: Don’t forget to apply through our website! It’s the best way to ensure your application gets into the right hands. Plus, it shows you’re serious about joining our awesome team!

How to prepare for a job interview at Genesys Cloud Services, Inc.

Know Your Tools Inside Out

Make sure you’re well-versed in the enterprise platforms mentioned in the job description, like Jira, Confluence, and Figma. Familiarise yourself with their functionalities, common issues, and best practices. This will not only help you answer technical questions but also show your genuine interest in the role.

Demonstrate Problem-Solving Skills

Prepare to discuss specific incidents where you identified root causes and implemented solutions. Use the STAR method (Situation, Task, Action, Result) to structure your answers. This will highlight your analytical skills and ability to handle operational challenges effectively.

Showcase Your Automation Knowledge

Since automation is a key part of this role, be ready to talk about your experience with scripting languages like PowerShell or Python. Share examples of how you've used automation to improve processes or reduce manual tasks, as this aligns perfectly with the job's focus on proactive reliability practices.

Communicate Clearly and Effectively

Effective communication is crucial, especially when discussing user-facing outages or changes. Practice explaining complex technical concepts in simple terms. This will demonstrate your ability to collaborate with cross-functional teams and ensure everyone is on the same page during critical situations.