Director, Platform Management & Observability

Bristol | Full-Time | No home office possible

About Graphcore

How often do you get the chance to build a technology that transforms the future of humanity? Graphcore products have set the standard in made-for-AI compute hardware and software, gaining global attention and industry acclaim. Now we are developing the next generation of artificial intelligence compute with systems that will allow AI researchers to develop more advanced models, help scientists unlock exciting new discoveries, and power companies around the world as they put AI at the heart of their business. Graphcore recently joined SoftBank Group – bringing large and ongoing investment from one of the world’s leading backers of innovative AI companies.

Job Summary

As Engineering Director for Platform Management & Observability, you will be responsible for building, managing, and guiding a team of talented engineers focused on the architecture, implementation, and deployment of highly scalable management solutions for AI infrastructure built using our next-generation products. The role covers monitoring, observability, control, and data centre infrastructure management. You will work closely with software, cloud, and customer-facing teams to build first-hand knowledge of these solutions, enabling the creation of proof-of-concepts, reference designs, and integrations with third-party tooling.

Your team will work closely with product, architecture, and other delivery teams to ensure that solutions are functionally complete, simple to deploy, and easy to use, both to support internal engineering efforts and to provide reference designs for our customers.

Responsibilities and Duties

  • Manage a team contributing to all phases of product development, from definition, architecture, and design, through implementation, debugging, testing, and early customer support.
  • Deliver and operate an internal management & observability service for engineering teams to aid debugging, performance analysis, benchmarking, test/QA, etc., from system bring-up through customer release, at all scales.
  • Evaluate new technologies and innovations to anticipate future customer needs and develop strategies for Graphcore data centre management solutions.
  • Prioritize team objectives dynamically in response to evolving business goals.
  • Identify opportunities for process improvement and lead initiatives to enhance efficiency and quality.
  • Collaborate with product management, engineering leads, customer-facing teams, and internal customers to ensure timely delivery of team outputs.
  • Champion quality by ensuring solutions are thoroughly tested.
  • Work with senior management to establish strategic plans and objectives.
  • Mentor and guide junior engineers; coach managers and team leads.
  • Foster a culture of continuous learning and improvement.

Skills and Experience

  • BSc or MSc in Computer Engineering, Computer Science, or related field, or equivalent experience.
  • 10+ years of proven experience managing engineering teams.
  • Experience tackling complex issues where problems are not clearly defined and fundamental principles may not fully apply.
  • Detail-oriented with the ability to multitask in a dynamic environment with shifting priorities.
  • Strong analytical, creative, and problem-solving skills.
  • Excellent written and verbal communication skills.
  • Experience using Jira and Confluence for project management.
  • 14+ years of relevant post-degree experience.
  • Familiarity with technologies such as Prometheus, Grafana, OpenTelemetry, ClickHouse, Kafka, and Superset, and with stacks such as the Elastic Stack, Better Stack, and LGTM (Loki, Grafana, Tempo, Mimir).
  • Knowledge of commercial observability solutions like Datadog, Dynatrace, and Splunk.

In addition to a competitive salary, Graphcore offers flexible working, a generous annual leave policy, private medical insurance, a health cash plan, a dental plan, a pension (matched up to 5%), life assurance, and income protection. We have a generous parental leave policy and an employee assistance programme supporting health, mental wellbeing, and bereavement. Our Bristol office provides healthy food, snacks, and a barista bar! We value diversity and inclusion and strive to create a welcoming environment for all. We run an equal opportunity recruitment process and are happy to make reasonable adjustments during interviews.

Applicants must have the right to work in the UK. We are currently unable to offer visa sponsorship.

Contact Details:

Graphcore Recruiting Team

Application deadline: 2027-06-10