At a Glance
- Tasks: Evaluate and improve AI agent systems in a dynamic gaming environment.
- Company: Join Level Infinite, Tencent's global gaming brand, dedicated to innovative gaming experiences.
- Benefits: Gain hands-on experience in AI evaluation with potential for future opportunities.
- Other info: Diverse and inclusive workplace fostering creativity and collaboration.
- Why this job: Be at the forefront of AI technology and contribute to cutting-edge gaming solutions.
- Qualifications: Pursuing or recently graduated in relevant fields with strong Python skills.
The predicted salary is between £20,000 and £30,000 per year.
About The Hiring Team
Level Infinite is Tencent’s global gaming brand. It is a global game publisher offering a comprehensive network of services for games, development teams, and studios around the world. We are dedicated to delivering engaging and original gaming experiences to a worldwide audience, whenever and wherever they choose to play, while building a community that fosters inclusivity, connection, and accessibility. Level Infinite also provides a wide range of services and resources to our network of developers and partner studios to help them unlock the true potential of their games.
What The Role Entails
We are hiring an intern to work on evaluation and reliability infrastructure for a real-world LLM agent system in the user acquisition (UA) performance marketing field. The agent performs multi-step reasoning, retrieves context, selects tools, executes actions, handles user confirmations, and interacts with external services. The goal of this internship is to build transferable expertise in agent evaluation engineering: evaluating tool use, measuring trajectory quality, designing benchmarks, analyzing traces, comparing model and prompt variants, and improving the reliability of agentic AI systems.
- Research state-of-the-art evaluation frameworks for agentic workflows, both in industry and in the research community.
- Apply that research to build automated evaluation pipelines that can run agent scenarios, capture execution artifacts, score results, and detect regressions.
- Evaluate tool-use behavior, including whether the agent selects the right tool, passes correct arguments, avoids unnecessary calls, and handles tool errors appropriately.
- Analyze agent trajectories using traces, logs, intermediate steps, and final outputs to identify reasoning failures, context misuse, hallucinated assumptions, and brittle workflow patterns.
- Design metrics for agent reliability, including success rate, tool-call precision, argument accuracy, recovery rate, retry count, latency, cost, and safety-related failure rates.
- Create reusable evaluation datasets from synthetic cases, golden workflows, and real anonymized executions.
- Support experiments comparing prompts, model providers, tool descriptions, memory strategies, context construction methods, and execution modes.
- Help build human evaluation workflows and rubrics for judging agent correctness, faithfulness, usefulness, and risk awareness.
- Work with engineers to translate evaluation findings into better tests, monitoring signals, tool interfaces, prompts, and guardrails.
- Potentially write research papers and publish them at scientific conferences.
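As an illustration only, a few of the reliability metrics named above (success rate, tool-call precision, argument accuracy, retry count) can be sketched in Python. This is not the team's actual pipeline: the record layout, the metric definitions, and the tool names `get_campaign` and `update_budget` are all hypothetical assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class ToolCall:
    name: str
    args: dict


@dataclass
class EpisodeResult:
    """One scored agent run. All field names are hypothetical."""
    succeeded: bool
    expected_calls: list[ToolCall]
    actual_calls: list[ToolCall]
    retries: int = 0


def tool_call_precision(ep: EpisodeResult) -> float:
    """Assumed definition: share of the agent's calls whose tool name was expected."""
    if not ep.actual_calls:
        return 0.0
    expected_names = {c.name for c in ep.expected_calls}
    hits = sum(1 for c in ep.actual_calls if c.name in expected_names)
    return hits / len(ep.actual_calls)


def argument_accuracy(ep: EpisodeResult) -> float:
    """Assumed definition: share of expected calls reproduced with identical arguments."""
    if not ep.expected_calls:
        return 1.0
    matched = sum(
        1
        for e in ep.expected_calls
        if any(a.name == e.name and a.args == e.args for a in ep.actual_calls)
    )
    return matched / len(ep.expected_calls)


def summarize(episodes: list[EpisodeResult]) -> dict:
    """Average the per-episode metrics over a batch of runs."""
    if not episodes:
        raise ValueError("need at least one episode")
    n = len(episodes)
    return {
        "success_rate": sum(e.succeeded for e in episodes) / n,
        "tool_call_precision": sum(tool_call_precision(e) for e in episodes) / n,
        "argument_accuracy": sum(argument_accuracy(e) for e in episodes) / n,
        "mean_retries": sum(e.retries for e in episodes) / n,
    }


# Two made-up episodes: one clean run, one with an extra call and a wrong argument.
episodes = [
    EpisodeResult(
        succeeded=True,
        expected_calls=[ToolCall("get_campaign", {"id": 1})],
        actual_calls=[ToolCall("get_campaign", {"id": 1})],
    ),
    EpisodeResult(
        succeeded=False,
        expected_calls=[ToolCall("update_budget", {"id": 1, "amount": 50})],
        actual_calls=[
            ToolCall("get_campaign", {"id": 1}),
            ToolCall("update_budget", {"id": 1, "amount": 500}),
        ],
        retries=2,
    ),
]
summary = summarize(episodes)
```

A real evaluation suite would likely need looser argument matching (for example, tolerance on numeric fields) and per-scenario breakdowns; exact equality is used here only to keep the sketch short.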
Who We Look For
- Currently pursuing, or recently graduated with, a Master’s or PhD degree in Computer Science, Artificial Intelligence, Machine Learning, Software Engineering, Data Science, or a related field.
- Strong Python fundamentals and an interest in AI systems.
- Curious about how LLM agents work, fail, and improve.
- Interested in evaluation methodology, not just application building.
- Comfortable reading logs, traces, test cases, and structured data.
- Detail-oriented and able to define clear, measurable criteria for ambiguous agent behavior.
- Prior experience with LLMs, LangChain-like agents, tool calling, pytest, data analysis, or observability tools is helpful but not required.
Equal Employment Opportunity at Tencent
As an equal opportunity employer, we firmly believe that diverse voices fuel our innovation and allow us to better serve our users and the community. We foster an environment where every employee of Tencent feels supported and inspired to achieve individual and common goals.
Agent Evaluation Intern employer: Tencent
Level Infinite is an exceptional employer that champions innovation and inclusivity within the gaming industry. Located in a vibrant environment, we offer interns the chance to work on cutting-edge AI systems while benefiting from a supportive culture that prioritises employee growth and collaboration. With access to a global network of resources and opportunities for meaningful contributions, our interns are empowered to develop their skills and make a real impact in the world of gaming.
StudySmarter Expert Advice🤫
We think this is how you could land the Agent Evaluation Intern role
✨Tip Number 1
Network like a pro! Reach out to current or former employees at Level Infinite on LinkedIn. A friendly chat can give you insider info about the company culture and maybe even a referral!
✨Tip Number 2
Prepare for the interview by diving deep into the latest trends in AI and LLMs. Show that you’re not just interested in the role, but also passionate about the field. Bring some fresh ideas to the table!
✨Tip Number 3
Practice makes perfect! Do mock interviews with friends or use online platforms. This will help you articulate your thoughts clearly and confidently when discussing your skills and experiences.
✨Tip Number 4
Don’t forget to apply through our website! It’s the best way to ensure your application gets seen by the right people. Plus, it shows you’re serious about joining the team at Level Infinite!
Some tips for your application 🫡
Tailor Your CV: Make sure your CV reflects the skills and experiences that align with the Agent Evaluation Intern role. Highlight any relevant projects or coursework in AI, machine learning, or software engineering that showcase your Python skills and curiosity about LLM agents.
Craft a Compelling Cover Letter: Use your cover letter to tell us why you're passionate about AI systems and evaluation methodologies. Share specific examples of your work or studies that demonstrate your detail-oriented approach and interest in how LLM agents operate.
Showcase Your Curiosity: In your application, let us know about your eagerness to learn and explore the intricacies of agent evaluation. Mention any research you've done on state-of-the-art frameworks or tools that relate to the role, as this will show your proactive attitude.
Apply Through Our Website: We encourage you to submit your application through our website for the best chance of being noticed. It’s the easiest way for us to keep track of your application and ensure it gets to the right people!
How to prepare for a job interview at Tencent
✨Know Your Stuff
Make sure you brush up on the latest trends in AI and LLMs. Familiarise yourself with evaluation methodologies and be ready to discuss how they apply to real-world scenarios. This shows your genuine interest and understanding of the field.
✨Show Off Your Skills
Prepare to demonstrate your Python skills, especially if you have any projects or coursework related to AI systems. Bring examples of your work that highlight your ability to analyse data, read logs, or build evaluation pipelines. Practical experience can really set you apart!
✨Ask Smart Questions
Come prepared with thoughtful questions about the role and the team. Inquire about their current projects, challenges they face in agent evaluation, or how they measure success. This not only shows your enthusiasm but also helps you gauge if the company is the right fit for you.
✨Be Detail-Oriented
Since the role requires a keen eye for detail, be ready to discuss how you've approached ambiguous problems in the past. Share examples where you defined clear criteria for evaluating performance or reliability, as this will demonstrate your suitability for the position.