At a Glance
- Tasks: Transform raw data into training-ready datasets for autonomous driving.
- Company: Join Oxa, a leader in autonomous vehicle technology based in Oxford.
- Benefits: Competitive salary, flexible working, and comprehensive health benefits.
- Other info: Dynamic team environment with opportunities for growth and innovation.
- Why this job: Make a real impact on the future of autonomous driving technology.
- Qualifications: Strong Python and SQL skills; experience with data pipelines and large datasets.
The predicted salary is between £60,000 and £80,000 per year.
Who are we? Founded in 2014, Oxa is a global leader in autonomous vehicle (AV) technology, dedicated to accelerating Industrial Mobile Autonomy (IMA). We develop advanced physical AI and robotics technology, anchored around our configurable and explainable self-driving software, Oxa Driver; development toolchain, Oxa Foundry; and fleet management software, Oxa Hub. We utilise hardware blueprints known as Reference Autonomy Designs (RADs) to enable the integration of sensors, compute and drive‑by‑wire systems into existing vehicles produced by OEMs. Our solutions automate repetitive industrial driving tasks, such as the towing and carrying of goods in locations like ports, airports and manufacturing facilities, or asset and perimeter monitoring in environments such as solar farms or industrial plants. We’re helping global businesses to address critical challenges like labour shortages and rising operational costs - driving efficiency, productivity, and safety.
Your Role: We are hiring a Data Engineer to help build the systems that prepare, curate, and scale training and evaluation data for machine learning in autonomous driving. You will work across the full data lifecycle, from raw vehicle logs and simulation outputs to curated, labelled, and model‑ready datasets. This includes handling multimodal sensor data, scaling labelling through both human and ML‑based workflows, and enabling intelligent selection of high‑value data from thousands of hours of real‑world and simulated driving. This role sits close to model performance and safety: the quality, structure, and selection of data directly influence how perception and planning systems behave in the real world.
What You Will Work On:
- Transform raw multimodal logs (camera, LiDAR, radar) into training‑ready datasets
- Support hand‑labelled and auto‑labelled data pipelines, including validation and quality control
- Help build and scale autolabelling systems, where ML models generate annotations across large datasets
- Support intelligent data curation and selection from thousands of hours of real‑world and simulated driving
- Generate and process simulated data for perception and planning, ensuring sufficient sim‑to‑real fidelity for synthetic data to be useful in training and evaluation
- Manage multiple data representations, including sensor‑native formats (images, point clouds), structured scene representations (objects, semantics, occupancy), and bird’s‑eye view (BEV) representations for downstream models
- Support dataset generation for perception models (for example detection, segmentation, and occupancy) and planning models (behavioural learning)
Key Responsibilities:
- Design, build, and maintain scalable data pipelines from raw logs to training datasets
- Contribute to systems for dataset generation, versioning, and reproducibility
- Develop and operate autolabelling pipelines, integrating model outputs into labelling workflows
- Implement quality control mechanisms for both human and ML‑generated labels
- Support ML-assisted data curation workflows to identify high‑value or failure‑prone scenarios
- Build pipelines to generate, transform, and validate simulated datasets, helping identify and reduce sim‑to‑real mismatches to improve their usefulness for training and evaluation
- Work closely with ML engineers to translate model requirements into data pipelines and datasets
- Debug data issues across the stack, from sensor‑level artefacts to dataset inconsistencies
- Improve storage, compute, and throughput efficiency for large‑scale datasets
What You Need to Succeed:
- Strong software engineering skills, with Python as a primary language
- Strong SQL skills and experience working with analytical data warehouses (e.g. BigQuery, Snowflake)
- Experience building production‑grade data pipelines or distributed data systems
- Experience working with large‑scale datasets
- Familiarity with cloud infrastructure (e.g. GCP, AWS, or similar)
- Solid understanding of data modelling, transformation, and data quality considerations
Extra Kudos If You Have:
- Experience working with ML data pipelines or supporting ML systems
- Familiarity with computer vision, robotics, or autonomous systems
- Experience working with multimodal sensor data, such as images, LiDAR, or radar
- Exposure to labelling workflows, autolabelling, or dataset curation
- Experience with spatial or geospatial data
- Familiarity with Linux‑based development environments
- Experience with tools such as Docker, shell scripting, workflow orchestrators, and transformation frameworks (e.g. Hera Workflows, dbt)
Benefits:
- Competitive salary, benchmarked against the market and reviewed annually
- Company share programme
- Hybrid and/or flexible remote working arrangements
- Core benefits including market-leading private healthcare, life assurance, critical illness cover, and income protection, alongside a company-paid health cash plan (including gym discounts)
- A salary exchange pension plan
- 25 days’ annual leave plus bank holidays
- A pet‑friendly office environment
- Safe assigned spaces for team members with individual and diverse needs
Data Engineer - ML Systems for Autonomous Driving in England
Employer: Oxa
Contact Details:
Oxa Recruiting Team
StudySmarter Expert Advice 🤫
How we think you could land the Data Engineer - ML Systems for Autonomous Driving role in England
✨Tip Number 1
Network like a pro! Reach out to folks in the industry, attend meetups, and connect with people on LinkedIn. You never know who might have the inside scoop on job openings or can put in a good word for you.
✨Tip Number 2
Show off your skills! Create a portfolio or GitHub repository showcasing your projects, especially those related to data engineering or machine learning. This gives potential employers a taste of what you can do beyond your CV.
✨Tip Number 3
Prepare for interviews by brushing up on common data engineering questions and practical tasks. Practice coding challenges and be ready to discuss your past projects in detail. Confidence is key!
✨Tip Number 4
Don’t forget to apply through our website! It’s the best way to ensure your application gets seen by the right people. Plus, it shows you’re genuinely interested in joining our team at Oxa.
Some tips for your application 🫡
Tailor Your CV: Make sure your CV is tailored to the Data Engineer role. Highlight your experience with data pipelines, Python, and SQL, and don’t forget to mention any work with ML systems or multimodal sensor data!
Craft a Compelling Cover Letter: Your cover letter is your chance to shine! Use it to explain why you’re passionate about autonomous driving and how your skills align with our mission at Oxa. Keep it concise but impactful!
Showcase Relevant Projects: If you've worked on projects related to data engineering or machine learning, make sure to showcase them. Include links to your GitHub or any relevant portfolios that demonstrate your expertise.
Apply Through Our Website: We encourage you to apply through our website for the best chance of getting noticed. It’s straightforward and ensures your application goes directly to us, so we can review it promptly!
How to prepare for a job interview at Oxa
✨Know Your Data Inside Out
Make sure you’re familiar with the types of data you'll be working with, especially multimodal sensor data like camera, LiDAR, and radar. Brush up on how to transform raw logs into training-ready datasets, as this will likely come up in your interview.
✨Showcase Your Software Skills
Since strong software engineering skills are crucial for this role, be prepared to discuss your experience with Python and SQL. Bring examples of production-grade data pipelines you've built or worked on, and be ready to explain your thought process behind them.
✨Understand the ML Pipeline
Familiarise yourself with machine learning data pipelines and how they integrate with data curation workflows. Be ready to talk about any experience you have with autolabelling systems and quality control mechanisms, as these are key aspects of the job.
✨Prepare Questions About the Role
Interviews are a two-way street! Prepare insightful questions about the team’s current projects, challenges they face with data management, or how they ensure sim-to-real fidelity. This shows your genuine interest in the role and helps you assess if it’s the right fit for you.