Site Reliability Engineer (K8s)

Full-Time, no working from home possible

Responsibilities

  • Our BI team runs a set of GCP-based APIs and data services that a lot of internal products depend on
  • As we've grown, keeping things running has increasingly been a side responsibility for engineers who are primarily building features — and that's not sustainable
  • We're looking for an SRE to own that space: service health, incident response, infrastructure monitoring, and making sure we're not blindly burning cloud budget
  • The Site Reliability Engineer will ensure the availability, performance, and security of the Business Intelligence team’s GCP-hosted APIs and data infrastructure
  • This role is responsible for proactive monitoring, incident response, and continuous improvement of platform reliability across a cloud-native stack
  • The engineer will work closely with backend and data engineers to maintain service health and drive operational excellence
  • This position also carries responsibility for GCP cost visibility, helping the team track and optimize cloud spend through structured monitoring and alerting
  • Monitor and maintain uptime of GCP-hosted APIs and services, keeping performance within agreed targets
  • Lead incident response for BI platform services — triage, resolve, and follow up with post‑mortems that actually prevent recurrence
  • Build and manage observability infrastructure: dashboards, alerts, and logging across GCP services
  • Track GCP cloud spend and set up cost alerting to flag anomalies before they become problems
  • Review and fix security gaps — IAP configs, service account permissions, API access controls
  • Work with data and backend engineers to shore up reliability of data pipelines and BigQuery workflows
  • Contribute to infrastructure‑as‑code and help keep deployments documented and reproducible

Benefits

  • Dollars and Sense: 401(k) match
  • Happy + Healthy: Comprehensive, affordable medical, dental, and vision options, 100%-paid life & disability insurance
  • Break a Sweat: Free virtual fitness classes, Better Yourself Wellness program
  • Always Learning: Generous annual tuition reimbursement, ongoing team trainings
  • Take a Load Off: Paid vacation, sick time, and company holidays (including a floating holiday)
  • Good Ol’ Fun: Team‑building events, happy hours, holiday celebrations, and more!

Qualifications

  • Practical experience with GCP — Cloud Run, API Gateway, and BigQuery in particular
  • Proficiency with Git and version control in a team setting
  • 2+ years in a Site Reliability, DevOps, or Cloud Infrastructure role in a production environment
  • Bachelor’s degree in Computer Science, Engineering, or related field, or equivalent hands‑on experience
  • Experience with monitoring and observability tooling (Cloud Monitoring, Datadog, or similar)
  • Solid grasp of cloud security fundamentals — IAM, network controls, access management
  • Terraform or other infrastructure‑as‑code tools
  • CI/CD pipelines and deployment automation (GitHub Actions, Cloud Build, or similar)
  • Python for scripting or automation
  • MySQL, Spanner, or BigQuery at any meaningful depth
  • Experience with dbt or Looker
  • GCP cost management and spend optimization
  • Comfortable working across CET/EST hours in a distributed team

Contact Details:

PulsePoint Recruitment Team