At a Glance
- Tasks: Evaluate and improve AI agent systems in a dynamic gaming environment.
- Company: Join Tencent's Level Infinite, a leader in global gaming innovation.
- Benefits: Gain hands-on experience, mentorship, and potential for future roles in AI.
- Other info: Inclusive culture that values diverse perspectives and fosters growth.
- Why this job: Dive into cutting-edge AI technology and shape the future of gaming.
- Qualifications: Pursuing or recently graduated in relevant tech fields with Python skills.
The predicted salary is between €20,000 and €30,000 per year.
About the Hiring Team
Level Infinite is Tencent’s global gaming brand. It is a game publisher offering a comprehensive network of services for games, development teams, and studios around the world. We are dedicated to delivering engaging and original gaming experiences to a worldwide audience, whenever and wherever they choose to play, while building a community that fosters inclusivity, connection, and accessibility. Level Infinite also provides a wide range of services and resources to our network of developers and partner studios to help them unlock the true potential of their games.
What the Role Entails
We are hiring an intern to work on evaluation and reliability infrastructure for a real-world LLM agent system in the user acquisition (UA) performance marketing field. The agent performs multi-step reasoning, retrieves context, selects tools, executes actions, handles user confirmations, and interacts with external services. The goal of this internship is to build transferable expertise in agent evaluation engineering: evaluating tool use, measuring trajectory quality, designing benchmarks, analyzing traces, comparing model and prompt variants, and improving the reliability of agentic AI systems.
- Research state-of-the-art frameworks for evaluating agentic workflows, both in industry and in academic research.
- Apply that research to build automated evaluation pipelines that can run agent scenarios, capture execution artifacts, score results, and detect regressions.
- Evaluate tool-use behavior, including whether the agent selects the right tool, passes correct arguments, avoids unnecessary calls, and handles tool errors appropriately.
- Analyze agent trajectories using traces, logs, intermediate steps, and final outputs to identify reasoning failures, context misuse, hallucinated assumptions, and brittle workflow patterns.
- Design metrics for agent reliability, including success rate, tool-call precision, argument accuracy, recovery rate, retry count, latency, cost, and safety-related failure rates.
- Create reusable evaluation datasets from synthetic cases, golden workflows, and real anonymized executions.
- Support experiments comparing prompts, model providers, tool descriptions, memory strategies, context construction methods, and execution modes.
- Help build human evaluation workflows and rubrics for judging agent correctness, faithfulness, usefulness, and risk awareness.
- Work with engineers to translate evaluation findings into better tests, monitoring signals, tool interfaces, prompts, and guardrails.
- Potentially write research papers and publish them at scientific conferences.
Who We Look For
- Currently pursuing, or recently graduated with, a Master’s or PhD degree in Computer Science, Artificial Intelligence, Machine Learning, Software Engineering, Data Science, or a related field.
- Strong Python fundamentals and interest in AI systems.
- Curious about how LLM agents work, fail, and improve.
- Interested in evaluation methodology, not just application building.
- Comfortable reading logs, traces, test cases, and structured data.
- Detail-oriented and able to define clear, measurable criteria for ambiguous agent behavior.
- Prior experience with LLMs, LangChain-like agents, tool calling, pytest, data analysis, or observability tools is helpful but not required.
Equal Employment Opportunity at Tencent
As an equal opportunity employer, we firmly believe that diverse voices fuel our innovation and allow us to better serve our users and the community. We foster an environment where every employee of Tencent feels supported and inspired to achieve individual and common goals.
Agent Evaluation Intern in London, employer: Tencent
Level Infinite, as part of Tencent's global gaming brand, is an exceptional employer that champions innovation and inclusivity within the gaming industry. Our vibrant work culture encourages collaboration and creativity, providing interns with invaluable hands-on experience in cutting-edge AI technologies while fostering personal and professional growth. Located in a dynamic environment, we offer unique opportunities to engage with industry leaders and contribute to meaningful projects that shape the future of gaming.
StudySmarter Expert Advice🤫
We think this is how you could land the Agent Evaluation Intern role in London
✨Tip Number 1
Network like a pro! Reach out to people in the gaming and AI fields on LinkedIn or at events. A friendly chat can open doors that a CV just can't.
✨Tip Number 2
Show off your skills! Create a portfolio showcasing any projects related to LLMs or AI systems. This gives you a chance to demonstrate your expertise beyond what's on paper.
✨Tip Number 3
Prepare for interviews by diving deep into the company’s work. Understand their products and how they use AI. Tailor your answers to show how you can contribute to their goals.
✨Tip Number 4
Don’t forget to apply through our website! It’s the best way to ensure your application gets seen by the right people. Plus, it shows you're genuinely interested in joining our team.
Some tips for your application 🫡
Tailor Your CV: Make sure your CV reflects the skills and experiences that align with the Agent Evaluation Intern role. Highlight any relevant projects or coursework in AI, machine learning, or software engineering that showcase your Python skills and curiosity about LLM agents.
Craft a Compelling Cover Letter: Use your cover letter to tell us why you're passionate about AI systems and evaluation methodologies. Share specific examples of your interest in the field and how you can contribute to our team at Level Infinite.
Showcase Your Projects: If you've worked on any projects related to LLMs or AI evaluation, make sure to mention them! We love seeing practical applications of your skills, so include links to your GitHub or any relevant portfolios.
Apply Through Our Website: We encourage you to apply directly through our website for the best chance of getting noticed. It’s the easiest way for us to keep track of your application and ensure it reaches the right people!
How to prepare for a job interview at Tencent
✨Know Your Stuff
Make sure you brush up on your knowledge of LLMs and agent evaluation methodologies. Familiarise yourself with the latest frameworks in the industry and be ready to discuss how they apply to the role. This shows your genuine interest and understanding of the field.
✨Show Off Your Python Skills
Since strong Python fundamentals are key for this internship, be prepared to demonstrate your coding skills. You might be asked to solve a problem or explain your thought process while coding. Practising common algorithms or data structures can really help you shine.
✨Ask Smart Questions
Prepare thoughtful questions about the team’s projects, challenges they face, or their approach to AI safety evaluation. This not only shows your enthusiasm but also helps you gauge if the company culture aligns with your values.
✨Be Detail-Oriented
Highlight your attention to detail by discussing past experiences where you defined clear criteria for evaluating ambiguous behaviours. Use specific examples to illustrate how you’ve approached similar challenges, as this will resonate well with the evaluative nature of the role.