At a Glance
- Tasks: Design and maintain cutting-edge network observability systems for a GPU cloud network.
- Company: Join CoreWeave, a fast-growing tech company with a collaborative culture.
- Benefits: Enjoy competitive salary, family-level health insurance, and tuition reimbursement.
- Why this job: Make a real impact by optimising network performance and automating workflows.
- Qualifications: Experience with Prometheus, Grafana, Python, and Go is essential.
- Other info: Hybrid work environment with opportunities for growth and learning.
The predicted salary is between £36,000 and £60,000 per year.
Overview: Join CoreWeave's Network Observability team as a Senior Engineer. You will design, develop, and maintain the monitoring, telemetry, and observability systems that keep CoreWeave's GPU cloud network operating reliably and at scale. You will build solutions that provide real-time insights into network performance and enable proactive issue detection and rapid resolution.
What You'll Do:
- Develop, optimize, and maintain network observability platforms.
- Use Python and Go to create and automate collectors, exporters, and dashboards that provide deep visibility into network health and performance.
- Collaborate with Network Engineering and Platform teams to ingest and unify logs, metrics, and events from Arista EOS, NVIDIA Cumulus Linux, Nokia SR OS, SR Linux, and other platforms into a single observability pipeline.
- Design and implement scalable telemetry solutions using protocols like gNMI, SNMP, and streaming analytics.
- Ensure advanced alerting and anomaly detection with Prometheus, Grafana, and Alertmanager.
- Work with network developers, site reliability engineers, and security teams to integrate observability across the broader infrastructure.
- Participate in design discussions, RFCs, and architectural decisions.
- Join a rotating on-call schedule to troubleshoot observability-related issues and provide timely support to operations teams, quickly isolating and fixing problems.
- Guide junior team members, share best practices, and foster a culture of continuous learning within the observability domain.
Who You Are — Minimum Qualifications:
- Deep familiarity with Prometheus, Grafana, Alertmanager, gNMI, and SNMP.
- Experience writing or extending custom metric collectors/exporters is a plus.
- Experience as a Network Engineer, SRE, Software Developer, or Systems Administrator in large-scale environments with telemetry and monitoring deployments.
- Passion for automating tasks and reducing human error through automated workflows.
- Experience containerizing applications and deploying container-based workloads efficiently on Kubernetes.
- Proficient with Python, Go, and Bash; familiarity with configuration management and templating tools (e.g., Ansible, Jinja2).
- Strong knowledge of Linux systems and IP networking concepts, including routing, switching, and network troubleshooting.
- Hands-on experience with platforms such as Arista EOS, NVIDIA Cumulus Linux, Nokia SR OS, and SR Linux.
- Collaborative, humble, and open to learning from more senior colleagues.
Preferred Qualifications:
- Bachelor's degree in Computer Science or related field.
- Hands-on ML experience for anomaly detection in networks (e.g., TensorFlow, scikit-learn).
- Network certifications (e.g., CCNA, CCNP) or equivalent.
- Experience with data pipelines, event correlation, or large-scale analytics.
- Familiarity with OpenTelemetry, Jaeger, or Zipkin for distributed tracing.
Why CoreWeave?
At CoreWeave, we work hard, have fun, and move fast. We're in a hyper-growth phase and value curiosity, ownership, and collaboration. Our core values include being curious, acting like an owner, empowering employees, delivering best-in-class client experiences, and achieving more together. We support an entrepreneurial mindset and provide opportunities to develop innovative solutions. You will be surrounded by top talent and gain growth opportunities as we scale.
What We Offer:
In addition to a competitive salary, we offer a range of benefits to support your needs:
- Family-level Medical Insurance
- Family-level Dental Insurance
- Generous Pension Contribution
- Life Assurance at 4x Salary
- Critical Illness Cover
- Employee Assistance Programme
- Tuition Reimbursement
- Work culture focused on innovative disruption
Benefits may vary by location.
Our Workplace:
We prioritize a hybrid work environment; remote work may be considered for candidates located more than 30 miles from an office, based on role requirements. New hires attend onboarding at a hub within the first month. Teams also gather quarterly to support collaboration.
CoreWeave is an equal opportunity employer, committed to fostering an inclusive and supportive workplace. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, disability, age, sexual orientation, gender identity, national origin, veteran status, or genetic information.
Export Control Compliance:
This position requires access to export-controlled information. To conform to U.S. Government export regulations, applicants must meet certain criteria. CoreWeave may decline to pursue export licensing as appropriate.
Senior Engineer, Network Observability in London. Employer: CoreWeave
Contact Details:
CoreWeave Recruiting Team
StudySmarter Expert Advice 🤫
We think this is how you could land the Senior Engineer, Network Observability role in London
✨Tip Number 1
Network observability is all about collaboration, so don’t hesitate to reach out to current employees on LinkedIn. Ask them about their experiences and insights into the role. This can give you a leg up in understanding the company culture and what they value.
✨Tip Number 2
When you get an interview, be ready to showcase your technical skills. Prepare to discuss your experience with Prometheus, Grafana, and any custom metric collectors you've built. We want to see how you think and solve problems, so bring your A-game!
✨Tip Number 3
Don’t just focus on your technical skills; show us your passion for automation and reducing human error. Share examples of how you’ve implemented automated workflows in past roles. This will resonate well with our values at CoreWeave.
✨Tip Number 4
Finally, apply through our website! It’s the best way to ensure your application gets seen by the right people. Plus, it shows you’re genuinely interested in joining our team at CoreWeave.
Some tips for your application 🫡
Tailor Your Application: Make sure to customise your CV and cover letter to highlight your experience with network observability tools like Prometheus and Grafana. We want to see how your skills align with what we do at CoreWeave!
Show Off Your Projects: If you've worked on any relevant projects, especially those involving Python or Go, don’t hold back! Share specific examples that demonstrate your ability to develop and maintain observability systems.
Be Clear and Concise: When writing your application, keep it straightforward. We appreciate clarity, so avoid jargon unless it's necessary. Make it easy for us to see why you’re a great fit for the Senior Engineer role.
Apply Through Our Website: We encourage you to apply directly through our website. It’s the best way for us to receive your application and ensures you’re considered for the role. Plus, it’s super easy!
How to prepare for a job interview at CoreWeave
✨Know Your Tools Inside Out
Make sure you’re well-versed in Prometheus, Grafana, and Alertmanager. Be ready to discuss how you've used these tools in past projects, especially in relation to network observability. Having specific examples of how you’ve implemented or optimised these systems will really impress.
✨Showcase Your Coding Skills
Since Python and Go are key for this role, brush up on your coding skills. Prepare to demonstrate your ability to write or extend custom metric collectors/exporters. You might even want to bring a small project or code snippet to discuss during the interview.
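As a rough illustration of the kind of small snippet you could bring, here is a minimal Python sketch that renders a gauge in the Prometheus text exposition format using only the standard library. The `Gauge` class, the metric name `interface_oper_up`, and the device and interface labels are hypothetical and not taken from CoreWeave. A real exporter would more likely use an official client library and serve the output over HTTP.

```python
from dataclasses import dataclass, field


def escape_label_value(value: str) -> str:
    """Escape backslash, double quote, and newline for a label value."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


@dataclass
class Gauge:
    """Hypothetical in-memory gauge; not part of any real client library."""
    name: str
    help_text: str  # assumed to contain no backslashes or newlines
    samples: dict = field(default_factory=dict)

    def set(self, value: float, **labels: str) -> None:
        # Sort the labels so the same label set always maps to the same key.
        key = tuple(sorted(labels.items()))
        self.samples[key] = float(value)

    def render(self) -> str:
        # One HELP line, one TYPE line, then one line per labelled sample.
        lines = [
            f"# HELP {self.name} {self.help_text}",
            f"# TYPE {self.name} gauge",
        ]
        for key, value in sorted(self.samples.items()):
            if key:
                label_str = ",".join(
                    f'{k}="{escape_label_value(v)}"' for k, v in key
                )
                lines.append(f"{self.name}{{{label_str}}} {value}")
            else:
                lines.append(f"{self.name} {value}")
        return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical device and interface names, for illustration only.
    up = Gauge("interface_oper_up", "1 if the interface is up, else 0.")
    up.set(1, device="leaf1", interface="Ethernet1")
    up.set(0, device="leaf1", interface="Ethernet2")
    print(up.render(), end="")
```

A snippet this size is easy to talk through in an interview: you can explain why label values are escaped, why the labels are sorted, and what you would change to collect the values from a real device.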
✨Collaborate and Communicate
This role involves working closely with various teams, so be prepared to talk about your collaborative experiences. Share examples of how you’ve worked with Network Engineering or SRE teams to unify logs and metrics. Highlight your communication skills and how you foster teamwork.
✨Emphasise Continuous Learning
CoreWeave values curiosity and continuous learning, so be ready to discuss how you stay updated with industry trends. Mention any recent courses, certifications, or projects that showcase your commitment to personal and professional growth, especially in areas like ML for anomaly detection.