At a Glance
- Tasks: Evaluate and improve LLM agent systems in a dynamic gaming environment.
- Company: Join Tencent's Level Infinite, a global leader in gaming innovation.
- Benefits: Gain hands-on experience, mentorship, and potential for future opportunities.
- Other info: Collaborative culture that values diverse voices and innovative ideas.
- Why this job: Dive into AI evaluation and make a real impact on cutting-edge technology.
- Qualifications: Pursuing or recently graduated in relevant fields with strong Python skills.
The predicted salary is between £20,000 and £30,000 per year.
About The Hiring Team
Level Infinite is Tencent’s global gaming brand. It is a global game publisher offering a comprehensive network of services for games, development teams, and studios around the world. We are dedicated to delivering engaging and original gaming experiences to a worldwide audience, whenever and wherever they choose to play while building a community that fosters inclusivity, connection, and accessibility. Level Infinite also provides a wide range of services and resources to our network of developers and partner studios around the world to help them unlock the true potential of their games.
What The Role Entails
We are hiring an intern to work on evaluation and reliability infrastructure for a real-world LLM agent system in the user acquisition (UA) performance marketing field. The agent performs multi-step reasoning, retrieves context, selects tools, executes actions, handles user confirmations, and interacts with external services. The goal of this internship is to build transferable expertise in agent evaluation engineering: evaluating tool use, measuring trajectory quality, designing benchmarks, analyzing traces, comparing model and prompt variants, and improving the reliability of agentic AI systems. This role is ideal for someone interested in future opportunities in LLM agent evaluation, AI safety evaluation, research engineering, LLMOps, or applied AI infrastructure.
- Research state-of-the-art frameworks for evaluating agentic workflows, in both industry and academic research.
- Apply that research to build automated evaluation pipelines that can run agent scenarios, capture execution artifacts, score results, and detect regressions.
- Evaluate tool-use behavior, including whether the agent selects the right tool, passes correct arguments, avoids unnecessary calls, and handles tool errors appropriately.
- Analyze agent trajectories using traces, logs, intermediate steps, and final outputs to identify reasoning failures, context misuse, hallucinated assumptions, and brittle workflow patterns.
- Design metrics for agent reliability, including success rate, tool-call precision, argument accuracy, recovery rate, retry count, latency, cost, and safety-related failure rates.
- Create reusable evaluation datasets from synthetic cases, golden workflows, and real anonymized executions.
- Support experiments comparing prompts, model providers, tool descriptions, memory strategies, context construction methods, and execution modes.
- Help build human evaluation workflows and rubrics for judging agent correctness, faithfulness, usefulness, and risk awareness.
- Work with engineers to translate evaluation findings into better tests, monitoring signals, tool interfaces, prompts, and guardrails.
- Potentially write research papers and publish them at scientific conferences.
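To make a few of the metrics named in the list above concrete, here is a minimal sketch of how success rate, tool-call precision, and argument accuracy might be scored against golden workflows, with a simple regression check on top. Everything in it is an assumption for illustration: the `ToolCall` and `Episode` types, the tool names (`get_campaign`, `update_budget`, `pause_campaign`), the matching rules, and the tolerance are hypothetical and do not describe the team's actual system.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ToolCall:
    name: str
    args: dict[str, Any] = field(default_factory=dict)


@dataclass
class Episode:
    expected: list[ToolCall]  # golden workflow for the scenario
    actual: list[ToolCall]    # calls the agent actually made
    succeeded: bool           # did the final outcome meet the goal?


def tool_call_precision(ep: Episode) -> float:
    """Share of the agent's calls whose tool name matches an expected call."""
    if not ep.actual:
        return 0.0
    made = Counter(c.name for c in ep.actual)
    wanted = Counter(c.name for c in ep.expected)
    # Counter intersection keeps the minimum count per tool name,
    # so repeated, unnecessary calls lower the score.
    return sum((made & wanted).values()) / len(ep.actual)


def argument_accuracy(ep: Episode) -> float:
    """Among name-matched calls, the share whose arguments match exactly."""
    remaining = list(ep.expected)
    matched = correct = 0
    for call in ep.actual:
        candidates = [e for e in remaining if e.name == call.name]
        if not candidates:
            continue  # extra or wrong tool: counted by precision instead
        # Prefer an expected call with identical arguments, if one is left.
        best = next((e for e in candidates if e.args == call.args), candidates[0])
        remaining.remove(best)
        matched += 1
        correct += best.args == call.args
    return correct / matched if matched else 0.0


def summarize(episodes: list[Episode]) -> dict[str, float]:
    """Average each metric over a non-empty list of episodes."""
    n = len(episodes)
    return {
        "success_rate": sum(ep.succeeded for ep in episodes) / n,
        "tool_call_precision": sum(tool_call_precision(ep) for ep in episodes) / n,
        "argument_accuracy": sum(argument_accuracy(ep) for ep in episodes) / n,
    }


def find_regressions(baseline: dict[str, float], candidate: dict[str, float],
                     tolerance: float = 0.02) -> list[str]:
    """Names of metrics where the candidate falls below baseline minus tolerance."""
    return [m for m, base in baseline.items() if candidate[m] < base - tolerance]


# Hypothetical episodes: one clean run, one with a redundant call and a wrong id.
episodes = [
    Episode(
        expected=[ToolCall("get_campaign", {"id": 1}),
                  ToolCall("update_budget", {"id": 1, "amount": 100})],
        actual=[ToolCall("get_campaign", {"id": 1}),
                ToolCall("update_budget", {"id": 1, "amount": 100})],
        succeeded=True,
    ),
    Episode(
        expected=[ToolCall("get_campaign", {"id": 2}),
                  ToolCall("pause_campaign", {"id": 2})],
        actual=[ToolCall("get_campaign", {"id": 2}),
                ToolCall("get_campaign", {"id": 2}),
                ToolCall("pause_campaign", {"id": 3})],
        succeeded=False,
    ),
]

summary = summarize(episodes)
baseline = {"success_rate": 0.9, "tool_call_precision": 0.8, "argument_accuracy": 0.75}
regressions = find_regressions(baseline, summary)
print(summary, regressions)
```

Exact argument matching is a deliberately strict choice for this sketch. A real pipeline would likely need looser comparisons for free-text arguments, plus the other metrics from the list, such as recovery rate, retry count, latency, and cost.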
Who We Look For
- Currently pursuing, or recently graduated with, a Master’s or PhD degree in Computer Science, Artificial Intelligence, Machine Learning, Software Engineering, Data Science, or a related field.
- Strong Python fundamentals and interest in AI systems.
- Curious about how LLM agents work, fail, and improve.
- Interested in evaluation methodology, not just application building.
- Comfortable reading logs, traces, test cases, and structured data.
- Detail-oriented and able to define clear, measurable criteria for ambiguous agent behavior.
- Prior experience with LLMs, LangChain-like agents, tool calling, pytest, data analysis, or observability tools is helpful but not required.
Equal Employment Opportunity at Tencent
As an equal opportunity employer, we firmly believe that diverse voices fuel our innovation and allow us to better serve our users and the community. We foster an environment where every employee of Tencent feels supported and inspired to achieve individual and common goals.
LLM Agent Evaluation Intern employer: Tencent
Level Infinite, as part of Tencent's global gaming brand, is an exceptional employer that champions inclusivity and innovation within the gaming industry. Our collaborative work culture encourages creativity and personal growth, providing interns with invaluable hands-on experience in cutting-edge AI technologies while contributing to meaningful projects. Located in a vibrant tech hub, we offer unique opportunities for professional development and networking, making it an ideal place for aspiring talents in the field of AI and gaming.
StudySmarter Expert Advice🤫
We think this is how you could land the LLM Agent Evaluation Intern role
✨Tip Number 1
Network like a pro! Reach out to people in the gaming and AI fields on LinkedIn or at events. A friendly chat can open doors that applications alone can't.
✨Tip Number 2
Show off your skills! Create a portfolio or GitHub repo showcasing any projects related to LLMs or AI systems. This gives you a chance to demonstrate your expertise beyond just a CV.
✨Tip Number 3
Prepare for interviews by diving deep into the role. Understand the latest trends in agent evaluation and be ready to discuss how you can contribute to Level Infinite's mission.
✨Tip Number 4
Apply through our website! It’s the best way to ensure your application gets seen by the right people. Plus, it shows you're genuinely interested in joining our team.
Some tips for your application 🫡
Tailor Your CV: Make sure your CV is tailored to the role of LLM Agent Evaluation Intern. Highlight any relevant experience or skills that align with the job description, especially in AI systems and evaluation methodologies. We want to see how you fit into our world!
Craft a Compelling Cover Letter: Your cover letter is your chance to shine! Use it to express your passion for AI and evaluation engineering. Share specific examples of your work or projects that relate to the role, and let us know why you're excited about joining Level Infinite.
Show Off Your Python Skills: Since strong Python fundamentals are key for this role, make sure to mention any relevant projects or coursework that showcase your coding abilities. If you've worked on anything related to LLMs or data analysis, we want to hear about it!
Apply Through Our Website: We encourage you to apply through our website for a smoother application process. It’s the best way for us to receive your application and keep track of all the amazing candidates like you. Don’t miss out on this opportunity!
How to prepare for a job interview at Tencent
✨Know Your Stuff
Make sure you brush up on the latest trends in LLMs and AI systems. Familiarise yourself with evaluation methodologies and be ready to discuss how they apply to real-world scenarios. This shows your genuine interest and understanding of the field.
✨Show Off Your Skills
Prepare to demonstrate your Python skills, especially if you have any experience with data analysis or observability tools. You might be asked to solve a coding problem or explain your thought process, so practice articulating your approach clearly.
✨Ask Smart Questions
Come prepared with insightful questions about the role and the team. Inquire about their current projects or challenges they face in agent evaluation. This not only shows your enthusiasm but also helps you gauge if the company is the right fit for you.
✨Be Detail-Oriented
Since the role requires a keen eye for detail, be ready to discuss how you've approached ambiguous problems in the past. Share examples where you defined clear criteria for evaluating outcomes, as this will highlight your analytical skills and attention to detail.