What Is the Role
Elastic is building Agent Builder, a conversational platform that connects production agents to real customer business data in Elasticsearch. As a Principal Engineer, you will set technical direction and drive the Kibana backend architecture for the agentic platform: streaming APIs, secure tool execution, session and memory persistence, retrieval and citation contracts, and evaluation telemetry. Your influence will extend beyond a single feature, shaping service boundaries, reliability posture, and standards that other solutions build on.
What You Will Be Doing
- Own the architecture for chat back-end services (Node/TypeScript), defining service boundaries, data contracts, and scalability targets.
- Lead cross-team design reviews; author ADRs and RFCs that become reference standards for AI-chat and ingestion work.
- Build and harden event-driven pipelines that capture chat telemetry, evaluation traces, and LLM feedback loops; expose them via self-service analytics endpoints.
- Champion reliability: define error budgets, introduce a testing strategy, and steer incident-response playbooks for conversational workloads.
- Mentor senior and junior engineers; grow their system-design skills and foster a high-trust, low-ego culture.
- Partner with Product, Design, and Data Science to translate ambiguous goals (e.g., "multi-step reasoning with tool calling") into incremental, testable action items.
- Represent Elastic in open-source AI communities (LangGraph/LangChain, MCP/A2A) through design proposals, blog posts, and conference talks.
What You Bring
- We appreciate articulate and "low ego" people who want to grow as part of a team.
- 10+ years building distributed, production SaaS services, including at least 5 years leading large-scale Node/TypeScript or similar back-end stacks.
- Deep expertise in distributed systems fundamentals—shard routing, consensus, eventual consistency, back-pressure, and circuit-breaker patterns.
- Demonstrated success designing high-throughput, low-latency APIs (gRPC / REST / WebSocket)—including streaming responses and resumable sessions.
- Hands-on experience with observability: OpenTelemetry, log/metric pipelines, synthetic checks, and SLO dashboards.
- Exposure to LLM tooling (LangChain/LangGraph, OpenAI function calls, vector search, RAG orchestration) and enthusiasm for advancing GenAI architectures.
- Clear, persuasive written communication—your ADRs and RFCs set the standard others emulate.
- Nice-to-have: contribution history to Kibana or other large SPAs; ability to prototype front-end dashboards when it unblocks back-end work.
If this sounds interesting, we would love to hear from you! Please include whatever info you believe is relevant: resume, GitHub profile, code samples, blog posts and writing samples, links to personal projects, etc.