AI Inference Engineer
We are seeking an AI Inference Engineer to join our expanding team. Our current technology stack includes Python, C++, TensorRT-LLM, and Kubernetes. This role offers the opportunity to work on large-scale deployment of machine learning models for real-time inference.
Responsibilities
- Develop APIs for AI inference used by internal and external customers
- Benchmark and optimize the inference stack to address bottlenecks
- Enhance system reliability and observability, and respond to outages
- Research and implement optimizations for LLM inference
Qualifications
- Experience with ML systems and deep learning frameworks (e.g., PyTorch, TensorFlow, ONNX)
- Knowledge of LLM architectures and inference techniques (e.g., batching, quantization)
- Experience deploying reliable, distributed, real-time model serving at scale
- (Optional) Understanding of GPU architectures or CUDA kernel programming
The compensation range for this role is $190,000 – $240,000. Additional benefits include equity, comprehensive health insurance, dental, vision, and a 401(k) plan.
Contact:
Perplexity Recruiting Team