At a Glance
- Tasks: Design and implement evaluation frameworks for cutting-edge AI models.
- Company: Join LILT, a leader in AI-driven translation technology.
- Benefits: Competitive salary, remote work options, and growth opportunities.
- Other info: Dynamic team environment with a focus on collaboration and innovation.
- Why this job: Be the quality gatekeeper for innovative AI solutions that change communication.
- Qualifications: Expertise in Python and modern AI frameworks required.
The predicted salary is between £36,000 and £60,000 per year.
About LILT
AI is changing how the world communicates, and LILT is leading that transformation. We are on a mission to make the world's information accessible to everyone, regardless of the language they speak. We use cutting‑edge AI, machine translation, and human‑in‑the‑loop expertise to translate content faster, more accurately, and more cost‑effectively without compromising on brand, voice, or quality. At LILT, we empower our teammates with leading tools, global collaboration, and growth opportunities to do their best work. Our company virtues (Work together, win together; Find a way or make one; Quicker than they expect; Quality is Job 1) guide everything we do.
We are trusted by Intel Corporation, Canva, the United States Department of Defense, the United States Air Force, ASICS, and hundreds of global enterprises. Backed by Sequoia, Intel Capital, and Redpoint, we’re building a category‑defining company in a $50B+ global translation market being redefined by AI.
About The Role
As a Research Engineer focused on Model Evaluation, you are the final arbiter of technical quality for our frontier AI deliverables. You will design sophisticated evaluation suites and serve as the lead calibrator, reviewing and refining the contributions of other engineers to ensure our data samples and model outputs meet the exacting standards of the world’s leading AI labs. This is a highly technical role for someone who enjoys getting in the weeds of model behaviour, RAG performance, and RLHF alignment.
Key Responsibilities
- Eval Architecture & Benchmarking: Design and implement automated and human‑in‑the‑loop evaluation frameworks to measure model performance across multiple modalities (text, code, image, etc.).
- Calibration & Peer Review: Act as the Gold Standard reviewer for other engineers. You will calibrate their data generation and evaluation contributions, providing technical feedback to ensure scientific consistency and high‑fidelity output.
- Frontier Sample Generation: Write and refine complex prompts and golden response pairs for frontier‑model training, specifically focusing on edge cases in reasoning and multilingual contexts.
- Quality Control (End‑to‑End): Develop the logic for multimodal QC checks, ensuring that high‑volume data samples are correct across diverse domains and languages.
- Technical Mentorship: Bring new knowledge and best practices to our established delivery and forward‑deployed engineering teams on model evaluations.
Qualifications
- Education: B.S. in Computer Science, AI, or a related field, or 5+ years of relevant experience in a high‑growth AI/Research environment.
- Deep Technical Proficiency: Expert-level Python skills and hands‑on experience with modern AI frameworks (PyTorch, Transformers, LangChain/LlamaIndex).
- Evaluation Experience: Experience building model evaluation suites (e.g., MMLU‑style benchmarks, custom RAG metrics, or human‑preference alignment).
- Domain Expertise: Deep understanding of RAG architectures, vector database retrieval logic, and agentic workflows. Experience with RLHF/RLAIF environments and the mechanics of preference signalling/reward modelling.
- Multimodal & Multilingual Rigor: Experience handling data quality at scale across different languages and modalities (images, video, or audio).
- Precision‑ and Quality‑Orientation: You find bugs in model reasoning that others miss. You are comfortable being the final quality arbiter for technical deliverables that others produce.
Preferred Skills
- Fluency in multiple languages (highly preferred for multilingual model calibration).
- Experience in frontier labs or high‑tier AI research environments.
- At least one of: a portfolio of research contributions, an example of evals or “model‑breaking” samples, or use of open‑source AI evaluation tools.
Our Story
Our founders, Spence and John, met at Google working on Google Translate. As researchers at Stanford and Berkeley, they both worked on language technology to make information accessible to everyone. While together at Google, they were amazed to learn that Google Translate wasn’t used for enterprise products and services inside the company. The quality just wasn’t there. So they set out to build something better. LILT was born.
LILT has been a machine learning company since its founding in 2015. At the time, machine translation didn’t meet the quality standard for enterprise translations, so LILT assembled a cutting‑edge research team tasked with closing that gap. While meeting customer demand for translation services, LILT has prioritised investments in Large Language Models, human‑in‑the‑loop systems, and now agentic AI. With AI innovation accelerating and enterprise demand growing, the next phase of LILT’s journey is just beginning.
What Sets Our Platform Apart
- Brand‑aware AI that learns your voice, tone, and terminology to ensure every translation is accurate and consistent.
- Agentic AI workflows that automate the entire translation process from content ingestion to quality review to publishing.
- 100+ native integrations with systems such as Adobe Experience Manager, Webflow, Salesforce, GitHub, and Google Drive to simplify content translation.
- Human‑in‑the‑loop reviews via our global network of professional linguists, for high‑impact content that requires expert review.
LILT in the News
- Featured in The Software Report’s Top 100 Software Companies!
- LILT makes it onto the Inc. 5000 List.
- LILT continues to be an intellectual powerhouse, holding numerous patents that help power the most efficient and sophisticated AI and language models in the industry.
Check out all our news on our website.
Information collected and processed as part of your application process, including any job applications you choose to submit, is subject to LILT's Privacy Policy at https://lilt.com/legal/privacy.
At LILT, we are committed to a fair, inclusive, and transparent hiring process. As part of our recruitment efforts, we may use artificial intelligence (AI) and automated tools to assist in the evaluation of applications, including résumé screening, assessment scoring, and interview analysis. These tools are designed to support human decision‑making and help us identify qualified candidates efficiently and objectively. All final hiring decisions are made by people. If you have any concerns, require accommodations, or would like to opt out of the use of AI in our hiring process, please let us know at recruiting@lilt.com.
LILT is an equal opportunity employer. We extend equal opportunity to all individuals without regard to an individual’s race, religion, color, national origin, ancestry, sex, sexual orientation, gender identity, age, physical or mental disability, medical condition, genetic characteristics, veteran or marital status, pregnancy, or any other classification protected by applicable local, state or federal laws. We are committed to the principles of fair employment and the elimination of all discriminatory practices.
Research Engineer, Evaluations, Applied AI employer: LILT AI
At LILT, we are at the forefront of AI-driven communication solutions, fostering a collaborative and innovative work culture that empowers our employees to excel. Our commitment to employee growth is evident through mentorship opportunities and access to cutting-edge tools, ensuring that every team member can contribute meaningfully to our mission of making information accessible globally. Located in a vibrant tech hub, we offer a dynamic environment where creativity thrives, and every voice is valued.
StudySmarter Expert Advice🤫
We think this is how you could land the Research Engineer, Evaluations, Applied AI role
✨Tip Number 1
Network like a pro! Reach out to folks in the AI and tech space, especially those who work at LILT or similar companies. A friendly chat can open doors and give you insider info that could help you stand out.
✨Tip Number 2
Show off your skills! Prepare a portfolio showcasing your best work, especially any projects related to model evaluation or AI frameworks. This is your chance to shine and demonstrate your expertise in a tangible way.
✨Tip Number 3
Practice makes perfect! Get ready for technical interviews by brushing up on your Python skills and understanding of RAG architectures. Mock interviews with friends or mentors can really help you feel more confident.
✨Tip Number 4
Apply through our website! It’s the best way to ensure your application gets seen. Plus, it shows you’re genuinely interested in being part of the LILT team and contributing to our mission.
Some tips for your application 🫡
Tailor Your Application: Make sure to customise your CV and cover letter for the Research Engineer role. Highlight your experience with AI frameworks and model evaluation, as this will show us you understand what we're looking for.
Showcase Your Technical Skills: Don’t hold back on flaunting your Python prowess and any hands-on experience with AI tools. We want to see how you’ve used these skills in real-world scenarios, so include specific examples!
Be Clear and Concise: When writing your application, keep it straightforward. Use clear language and avoid jargon unless it's relevant. We appreciate a well-structured application that gets straight to the point.
Apply Through Our Website: We encourage you to submit your application directly through our website. This way, you’ll ensure it reaches us without any hiccups, and you can easily track your application status!
How to prepare for a job interview at LILT AI
✨Know Your AI Stuff
Make sure you brush up on your knowledge of AI frameworks like PyTorch and Transformers. Be ready to discuss your experience with model evaluation suites and how you've tackled challenges in RAG architectures or RLHF environments.
✨Showcase Your Precision
Prepare examples that highlight your attention to detail, especially when it comes to quality control. Think about instances where you identified bugs in model reasoning that others missed, and be ready to explain your thought process.
✨Be a Team Player
LILT values collaboration, so come prepared to discuss how you've worked with others in high-pressure environments. Share experiences where you’ve provided technical feedback or mentored peers, showcasing your ability to elevate team performance.
✨Multimodal Mastery
Since the role involves handling diverse data types, be ready to talk about your experience with multimodal data quality. Discuss any projects where you’ve worked with text, images, or audio, and how you ensured accuracy across different languages.