Remote Member of Engineering (Pre-training / Data Acquisition) in Poole

Poole | Full-Time | £60,000 - £80,000 / year (est.) | Working from home possible
At a Glance

  • Tasks: Join our team to build cutting-edge web crawlers for AI data acquisition.
  • Company: Poolside, a pioneering company in Artificial General Intelligence.
  • Benefits: Enjoy fully remote work, flexible hours, and generous vacation time.
  • Other info: Collaborative culture with a focus on innovation and personal growth.
  • Why this job: Make a real impact on the future of AI and software development.
  • Qualifications: Experience in distributed systems and web crawling; Python proficiency required.

The predicted salary is between £60,000 and £80,000 per year.

ABOUT POOLSIDE

In this decade, the world will create Artificial General Intelligence. There will only be a small number of companies who will achieve this. Their ability to stack advantages and pull ahead will define the winners. These companies will move faster than anyone else. They will attract the world's most capable talent. They will be on the forefront of applied research, engineering, infrastructure and deployment at scale. They will continue to scale their training to larger & more capable models. They will create powerful economic engines. They will obsess over the success of their users and customers.

Poolside exists to be this company: to build a world where AI will be the engine behind economically valuable work and scientific progress. We believe the fastest way to reach AGI lies in accelerating software development itself, by reshaping the developer experience with agentic systems, coding assistants, and the frontier models that power them. We deploy these systems directly into the development environments of security-conscious enterprises.

ABOUT OUR TEAM

We were founded in the US and have our home there, but our team is distributed across Europe and North America. We get our fix of in-person collaboration (and croissants) in Paris each month for 3 days, always Monday-Wednesday, with an open invitation to stay the whole week. We also do longer off-sites once a year.

Our team is a multidisciplinary blend of research, engineering, and business experts. What unites us is our deep care for what we build together. We’re in a race that requires hard work, intellectual curiosity, and obsession; to balance this intensity, we’ve assembled a team of low ego and kind-hearted individuals who have built the special culture Poolside has. By building collaboratively and with intention, we create a compounding effect that moves the entire company forward towards our mission: reaching AGI through intelligence systems built for software development.

ABOUT THE ROLE

You'll be working alongside our pre-training data team, focused on one of the most foundational challenges in training frontier LLMs: acquiring the best possible pre-training data. The data we collect is upstream of everything. It directly shapes the capability of the models we train. As our first dedicated data acquisition engineer, you will spearhead and evolve systems that crawl the web at massive scale, rapidly ingest data from strategic partnerships, and build specialized tooling to maximize recall from high-value sources. You'll collaborate closely with pre-training data researchers and engineers to ensure that our data sourcing maps to our training needs, so that we train the most capable pre-trained models.

YOUR MISSION

To deliver the highest-quality, most diverse, and most comprehensive data corpus to fuel the pre-training of frontier models for software development.

RESPONSIBILITIES

  • Design, build, and operate a large-scale web crawler responsible for acquiring all openly accessible data on the internet
  • Develop specialized deep crawlers targeting high-value sources to improve recall and coverage
  • In collaboration with data researchers, own a long-term road map for data acquisition
  • Build observability, monitoring, and debugging tooling to ensure reliability and transparency across crawl infrastructure
  • Collaborate with pre-training, post-training, and evaluations teams to align data acquisition priorities with model training needs
  • Build high-throughput ingestion pipelines for rapidly onboarding partner data and evaluating it for quality

SKILLS & EXPERIENCE

  • Strong distributed systems background with proven experience building and operating large-scale infrastructure — data pipelines, web crawlers, or similar
  • Proficiency in Python, and comfort optimizing performance and debugging complex systems under production conditions
  • Hands-on experience with web crawling or large-scale data extraction: understanding of HTTP protocols, distributed job queues, and data parsing at scale
  • Familiarity with cloud platforms (AWS) and container orchestration (Kubernetes, Docker) for deploying and managing high-throughput workloads
  • Awareness of the non-technical dimensions of internet-scale crawling: data privacy, robots.txt adherence, and responsible crawl practices
  • Nice to have: prior experience pre-training LLMs; experience building trillion-scale SOTA pre-training datasets; experience translating research to production at scale

PROCESS

  • Intro call with one of our Founding Engineers
  • Technical Interview(s) with one of our Members of Engineering
  • Team fit call with the People team
  • Final interview with one of our Founding Engineers

BENEFITS

  • Fully remote work & flexible hours
  • 37 days/year of vacation & holidays
  • 16 weeks of flexible, full-pay parental leave
  • Health insurance allowance for you & dependents
  • Company-provided equipment
  • Well-being, always-be-learning & home office allowances
  • Frequent team get-togethers
  • Diverse & inclusive people-first culture

Employer: poolside

At Poolside, we pride ourselves on being an exceptional employer, offering a fully remote work environment with flexible hours that empower our team to thrive. Our commitment to employee well-being is reflected in our generous benefits package, including 37 days of vacation, comprehensive health insurance, and a culture that prioritises diversity and inclusion. With opportunities for professional growth and regular team gatherings in vibrant locations like Paris, we foster a collaborative atmosphere where innovation and kindness flourish, making it an exciting place to contribute to the future of AI.

Contact Details:

poolside Recruitment Team

StudySmarter Expert Advice🤫

We think this is how you could land Remote Member of Engineering (Pre-training / Data Acquisition) in Poole

Get Involved in Data Science Meetups

Tap into local data science meetups or workshops to connect with fellow enthusiasts and professionals. These events are goldmines for networking, and sometimes even lead directly to job openings at companies like poolside!

Show Off Your Projects

Start building a public portfolio showcasing your data science projects on platforms like GitHub or personal websites. Highlight unique analyses or models you've developed. This not only demonstrates your skills but also gets your name out there for roles like Remote Member of Engineering (Pre-training / Data Acquisition) at poolside.

Leverage Professional Networks

Join professional bodies related to data science, like the Data Science Society or similar organisations. Getting involved can lead to mentorship opportunities and insider knowledge about full-time positions at companies like poolside.

Apply Directly through Our Website

When you find a suitable opening like Remote Member of Engineering (Pre-training / Data Acquisition) at poolside, make sure to apply directly through our website. It gives you an edge and shows you're keen to join our team. Plus, who doesn’t love a direct application? It’s easier than navigating through job boards!

We think you need these skills to ace Remote Member of Engineering (Pre-training / Data Acquisition) in Poole

Distributed Systems
Large-Scale Infrastructure
Data Pipelines
Web Crawlers
Python
Performance Optimisation
Debugging Complex Systems

Some tips for your application 🫡

Show Off Your Projects: In the world of data science, your projects can speak volumes about your skills. Make sure to showcase a few key projects in your CV or portfolio, especially those that highlight your ability to work with data sets, build models, or use relevant tools like Python, R, or SQL. Don’t forget to include links to any GitHub repositories if applicable!

Quantify Your Achievements: Employers love numbers! When drafting your CV, highlight your achievements with quantifiable results. For instance, mention how your data analysis led to a certain percentage increase in efficiency or revenue at a previous job or project. These details can really make your application pop!

Craft a Tailored Cover Letter: For a full-time role at poolside, your cover letter should reflect your passion for data science and your excitement about the specific projects or values of the company. Dive into why you’re a good fit, how your skills align with their needs, and any unique perspectives you can bring to the team.

Stand Out with Relevant Courses and Certifications: Although experience talks, relevant courses or certifications can be your ticket to impressing hiring managers at poolside. Mention any standout courses you've completed that equipped you with essential skills, such as machine learning certifications or data visualisation courses. This shows your commitment to continuously developing your skills in the field!

How to prepare for a job interview at poolside

Brush Up on Your Statistics

For a data science role, you need to sharpen your statistics skills. Get ready to tackle technical questions on probability distributions, hypothesis testing, and regression analysis. These are often the bread and butter of data science interviews, so don't just skim over them!

Showcase Your Projects

Prepare a strong portfolio showcasing your data science projects. Include details about the datasets used, the tools and techniques applied, and the impact of your findings. If you can walk the interviewers through a particularly challenging project or a visualisation that had real-world implications, it will help you stand out!

Get Comfortable with Python and R

Most data science positions require proficiency in programming languages like Python and R. Practise with common libraries like pandas, NumPy, and scikit-learn, and be ready for live coding exercises or algorithm questions. Showing off your coding skills can really impress the interviewers at poolside!

Prepare for Case Studies

Expect to encounter real-world case studies during the interview. You might be asked how you would approach a data problem or analyse a dataset to extract insights. It's essential to think out loud and demonstrate your problem-solving process so that the interviewer can see your logical thinking in action.