Responsibilities
- Design, implement, and maintain data pipelines to ingest and process OpenShift telemetry (metrics, logs, traces) at scale.
- Stream OpenShift telemetry via Kafka (producers, topics, schemas) and build resilient consumer services for transformation and enrichment (see the producer sketch after this list).
- Engineer data models and routing for multi‑tenant observability; ensure lineage, quality, and SLAs across the stream layer.
- Integrate processed telemetry into Splunk for visualisation, dashboards, alerting, and analytics to achieve Observability Level 4 (proactive insights).
- Implement schema management (Avro/Protobuf), governance, and versioning for telemetry events.
- Build automated validation, replay, and backfill mechanisms for data reliability and recovery.
- Instrument services with OpenTelemetry; standardise tracing, metrics, and structured logging across platforms (see the tracing sketch after this list).
- Use LLMs to enhance observability capabilities (e.g., query assistance, anomaly summarisation, runbook generation).
- Collaborate with platform, SRE, and application teams to integrate telemetry, alerts, and SLOs.
- Ensure security, compliance, and best practices for data pipelines and observability platforms.
- Document data flows, schemas, dashboards, and operational runbooks.
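To illustrate the streaming responsibilities above, a minimal producer sketch in Python follows. It is indicative only: the broker address, topic name, and event fields are hypothetical, and JSON stands in for the Avro/Protobuf serialisation a production pipeline would register against a schema registry.

```python
# Minimal sketch of a telemetry producer, assuming a reachable Kafka broker at
# kafka.example.com:9092 and a hypothetical topic "openshift.metrics.raw".
# JSON is used for brevity; a real pipeline would typically serialise with
# Avro/Protobuf via a schema registry, as described above.
import json
import time

from confluent_kafka import Producer

producer = Producer({"bootstrap.servers": "kafka.example.com:9092"})

def delivery_report(err, msg):
    # Surface delivery failures so they can be alerted on rather than silently dropped.
    if err is not None:
        print(f"Delivery failed for key {msg.key()}: {err}")

event = {
    "cluster": "prod-eu-west-1",          # hypothetical tenant/cluster identifier
    "namespace": "payments",
    "metric": "container_cpu_usage_seconds_total",
    "value": 42.7,
    "timestamp": int(time.time()),
}

producer.produce(
    topic="openshift.metrics.raw",
    key=event["cluster"].encode("utf-8"),   # key by cluster to keep per-tenant ordering
    value=json.dumps(event).encode("utf-8"),
    callback=delivery_report,
)
producer.flush()
```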
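The tracing sketch below shows the kind of OpenTelemetry instrumentation the role involves, assuming the opentelemetry-sdk Python package; the service name, span name, and attributes are illustrative only.

```python
# Minimal sketch of OpenTelemetry tracing set-up for an enrichment service.
from opentelemetry import trace
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor, ConsoleSpanExporter

# Register a tracer provider tagged with the service name used across dashboards.
provider = TracerProvider(
    resource=Resource.create({"service.name": "telemetry-enricher"})  # hypothetical name
)
# ConsoleSpanExporter keeps the example self-contained; an OTLP exporter would be used in practice.
provider.add_span_processor(BatchSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)

tracer = trace.get_tracer(__name__)

def enrich(event: dict) -> dict:
    # Each enrichment step becomes a span, so latency and errors are visible per stage.
    with tracer.start_as_current_span("enrich_event") as span:
        span.set_attribute("telemetry.cluster", event.get("cluster", "unknown"))
        event["enriched"] = True
        return event
```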
Required Skills
- Hands‑on experience building streaming data pipelines with Kafka (producers/consumers, schema registry, Kafka Connect/ksqlDB/Kafka Streams).
- Proficiency with OpenShift/Kubernetes telemetry (OpenTelemetry, Prometheus) and CLI tooling.
- Experience integrating telemetry into Splunk (HTTP Event Collector, Universal Forwarder, source types, CIM), building dashboards and alerting (see the HEC sketch after this list).
- Strong data engineering skills in Python (or similar) for ETL/ELT, enrichment, and validation.
- Knowledge of event schemas (Avro/Protobuf/JSON), contracts, and backward/forward compatibility.
- Familiarity with observability standards and practices; ability to drive toward Level 4 maturity (proactive monitoring, automated insights).
- Understanding of hybrid cloud and multi‑cluster telemetry patterns.
- Security and compliance for data pipelines: secret management, RBAC, encryption in transit/at rest.
- Good problem‑solving skills and ability to work in a collaborative team environment.
- Strong communication and documentation skills.
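The sketch below illustrates the Splunk integration skill above by forwarding an enriched event to the HTTP Event Collector (HEC). The endpoint URL, token, index, and sourcetype are placeholders, not values from this posting.

```python
# Minimal sketch of sending an event to Splunk HEC; all identifiers are placeholders.
import requests

HEC_URL = "https://splunk.example.com:8088/services/collector/event"
HEC_TOKEN = "00000000-0000-0000-0000-000000000000"  # placeholder token

def send_to_splunk(event: dict) -> None:
    payload = {
        "event": event,
        "sourcetype": "openshift:telemetry",  # hypothetical sourcetype
        "index": "observability",             # hypothetical index
    }
    response = requests.post(
        HEC_URL,
        headers={"Authorization": f"Splunk {HEC_TOKEN}"},
        json=payload,
        timeout=10,
    )
    response.raise_for_status()  # surface ingest failures to the caller
```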
Contact Details:
Test Triangle Ltd Recruiting Team