Member of Engineering (Pre-training / Data Research)

Trainee | £36,000 - £60,000 / year (est.) | Home office possible

At a Glance

  • Tasks: Join our data team to enhance the quality of datasets for training AI models.
  • Company: Poolside, a pioneering company in Artificial General Intelligence.
  • Benefits: Enjoy remote work, flexible hours, 37 days off, and health insurance.
  • Why this job: Be at the forefront of AI research and make a real impact on technology.
  • Qualifications: Strong background in machine learning and experience with Large Language Models.
  • Other info: Collaborative culture with frequent team meet-ups and a focus on personal growth.

The predicted salary is between £36,000 and £60,000 per year.

ABOUT POOLSIDE

In this decade, the world will create Artificial General Intelligence. Only a small number of companies will achieve this. Their ability to stack advantages and pull ahead will define the winners. These companies will move faster than anyone else. They will attract the world's most capable talent. They will be at the forefront of applied research, engineering, infrastructure and deployment at scale. They will continue to scale their training to larger and more capable models. They will be given the right to raise large amounts of capital along their journey to enable this. They will create powerful economic engines. They will obsess over the success of their users and customers. Poolside exists to be one of these companies and to build a world where AI will be the engine behind economically valuable work and scientific progress.

ABOUT OUR TEAM

We are a remote-first team spread across Europe and North America. We come together in person once a month for 3 days and hold longer offsites twice a year. Our R&D and production teams combine research-oriented and engineering-oriented profiles, but everyone cares deeply about the quality of the systems we build and has a strong underlying knowledge of software development. We believe that good engineering leads to faster development iterations, which allows us to compound our efforts.

ABOUT THE ROLE

You would work on our data team, which focuses on the quality of the datasets delivered for training our models. This is a hands-on role where your #1 mission would be to improve the quality of the pretraining datasets by leveraging your previous experience, intuition and training experiments. This includes synthetic data generation and data mix optimization. You would collaborate closely with other teams such as Pre-training, Fine-tuning and Product to define high-quality data both quantitatively and qualitatively. Staying in sync with the latest research on dataset design and pretraining is key to success in this role: you would constantly pursue original research initiatives through short, time-bounded experiments, and apply strong engineering competence when deploying your solutions in production. Because the volumes of data to process are massive, you will have at your disposal a performant distributed data pipeline together with a large GPU cluster.

YOUR MISSION

To deliver massive-scale, highest-quality datasets of natural language and source code for training Poolside models.

RESPONSIBILITIES

  • Follow the latest research related to LLMs and data quality in particular.
  • Be familiar with the most relevant open-source datasets and models.
  • Work closely with other teams such as Pre-training, Fine-tuning or Product to ensure short feedback loops on the quality of the models delivered.
  • Suggest, conduct and analyze data ablations or training experiments that use quantitative insights to improve the quality of the generated datasets.

SKILLS & EXPERIENCE

  • Strong machine learning and engineering background.
  • Experience with Large Language Models (LLM).
  • Good knowledge of Transformers is a must.
  • Knowledge/Experience with cutting-edge training tricks.
  • Knowledge/Experience of distributed training.
  • Trained LLMs from scratch.
  • Knowledge of deep learning fundamentals.
  • Experience in building trillion-scale pretraining datasets, in particular:
    • Ingesting, filtering and deduplicating large amounts of web and code data.
    • Familiarity with the concepts behind SOTA pretraining datasets: multilinguality, curriculum learning, data augmentation, data packing, etc.
    • Running data ablations, tokenization and data-mixture experiments.
    • Developing prompt engineering pipelines to generate synthetic data at scale.
    • Fine-tuning small models for data filtering purposes.
  • Experience working with large-scale GPU clusters and distributed data pipelines.
  • Strong obsession with data quality.
  • Research experience.
    • Authorship of scientific papers on topics such as applied deep learning, LLMs or source code generation is a nice to have.
    • Can freely discuss the latest papers and descend to fine details.
    • Is reasonably opinionated.
  • Programming experience.
    • Strong algorithmic skills.
    • Linux, Git, Docker, k8s, cloud managed services.
    • Data pipelines and queues.
    • Python with PyTorch or JAX.
  • Nice to have: prior experience in non-ML programming, especially outside Python (C/C++, CUDA, Triton).
PROCESS

  • Intro call with Eiso, our CTO & Co-Founder.
  • Technical Interview(s) with one of our Founding Engineers.
  • Team fit call with the People team.
  • Final interview with Eiso, our CTO & Co-Founder.

BENEFITS

  • Fully remote work & flexible hours.
  • 37 days/year of vacation & holidays.
  • Health insurance allowance for you and dependents.
  • Company-provided equipment.
  • Wellbeing, always-be-learning and home office allowances.
  • Frequent team get-togethers.
  • Great diverse & inclusive people-first culture.

    Member of Engineering (Pre-training / Data Research) employer: poolside

    Poolside is an exceptional employer for those passionate about advancing artificial intelligence, offering a fully remote work environment with flexible hours and a generous 37 days of vacation per year. Our inclusive culture prioritises employee wellbeing and continuous learning, while providing opportunities for collaboration across diverse teams and access to cutting-edge technology, ensuring that you can thrive in your role as a Member of Engineering focused on data research.

    Contact Detail:

    poolside Recruiting Team

    StudySmarter Expert Advice 🤫

    We think this is how you could land the Member of Engineering (Pre-training / Data Research) role

    ✨Tip Number 1

    Network like a pro! Reach out to people in the industry, especially those already working at Poolside. A friendly chat can go a long way, and who knows, they might even put in a good word for you!

    ✨Tip Number 2

    Show off your skills! Prepare a portfolio or a project that highlights your experience with large datasets and machine learning. This is your chance to demonstrate your expertise and passion for data quality.

    ✨Tip Number 3

    Stay updated on the latest research! Dive into recent papers about LLMs and dataset design. Being able to discuss these topics during interviews will show that you're genuinely interested and knowledgeable.

    ✨Tip Number 4

    Apply through our website! It’s the best way to ensure your application gets seen by the right people. Plus, it shows you’re proactive and keen to join the team at Poolside.

    We think you need these skills to ace the Member of Engineering (Pre-training / Data Research) role

    Machine Learning
    Large Language Models (LLM)
    Transformers
    Distributed Training
    Deep Learning Fundamentals
    Data Quality Analysis
    Data Ablations
    Tokenization
    Data Augmentation
    Prompt Engineering
    Python
    PyTorch
    Git
    Docker
    Cloud Managed Services

    Some tips for your application 🫡

    Show Your Passion for Data Quality: When writing your application, let us know why you're obsessed with data quality. Share any relevant experiences or projects that highlight your commitment to improving datasets, especially in the context of machine learning.

    Be Specific About Your Skills: We want to see your technical prowess! Make sure to detail your experience with Large Language Models, distributed training, and any specific tools you've used like Python, PyTorch, or JAX. The more specific you are, the better!

    Connect with Our Mission: In your application, connect your personal goals with our mission at Poolside. Explain how your background and aspirations align with building a world where AI drives economic value and scientific progress.

    Apply Through Our Website: Don't forget to apply through our website! It’s the best way for us to receive your application and ensures you’re considered for the role. Plus, it shows you’re keen on joining our team!

    How to prepare for a job interview at poolside

    ✨Know Your Stuff

    Make sure you brush up on the latest research related to Large Language Models (LLMs) and data quality. Familiarise yourself with relevant open-source datasets and models, as well as cutting-edge training tricks. This will not only show your passion but also your commitment to staying ahead in the field.

    ✨Collaborate Like a Pro

    Since this role involves working closely with teams like Pre-training and Fine-tuning, be prepared to discuss how you would ensure short feedback loops on model quality. Think of examples from your past experiences where collaboration led to successful outcomes, and be ready to share those stories.

    ✨Show Off Your Technical Skills

    Be ready to dive deep into your technical expertise during the interview. Whether it's discussing your experience with distributed training or your knowledge of Transformers, make sure you can articulate your skills clearly. Prepare to talk about specific projects where you’ve built large-scale pretraining datasets or run data ablations.

    ✨Ask Insightful Questions

    Interviews are a two-way street, so come armed with thoughtful questions about the company’s approach to data quality and their engineering processes. This shows that you’re genuinely interested in the role and helps you gauge if the company is the right fit for you.
