At a Glance
- Tasks: Lead the design and development of next-gen data platforms using Databricks and PySpark.
- Company: Join a forward-thinking tech company in London with a hybrid work model.
- Benefits: Enjoy competitive pay, health coverage, and perks like free lunches and social events.
- Other info: Great opportunities for learning, growth, and working with cutting-edge AI technologies.
- Why this job: Shape the future of data engineering while mentoring others and solving complex challenges.
- Qualifications: Proven experience in PySpark and Databricks; strong programming skills in Python.
The predicted salary is between 80,000 and 100,000 € per year.
We're looking for a Lead Data Engineer (Databricks, PySpark) to join our team in London, UK in a hybrid working mode. In this role, you will help shape and deliver next-generation data platforms. You will be hands‑on in developing, implementing and optimizing scalable ETL workflows and data pipelines, leveraging the full capabilities of Databricks and modern cloud technologies. You will play a key part in the transition to a robust Lakehouse architecture, working closely with cross‑functional teams in an agile environment. This position is ideal for a data engineering leader who enjoys solving complex challenges, mentoring others and working at the forefront of Databricks technology. Experience with any major cloud provider is welcome, but a strong focus on Databricks is essential.
Responsibilities
- Design, develop and maintain production‑grade data applications, reusable frameworks and scalable data pipelines using Databricks, PySpark and Python/Scala.
- Lead the architectural design and modernization of data platforms to a Lakehouse architecture leveraging Databricks‑native technologies such as Delta Lake and Unity Catalog.
- Drive advanced Spark performance tuning, including handling data skew, optimizing Catalyst query execution plans and managing cluster compute and memory efficiency for high‑volume workloads.
- Champion modern software engineering practices within the data ecosystem including CI/CD pipelines, Infrastructure as Code (IaC), rigorous code reviews, automated testing and version control.
- Implement secure, scalable and highly available data solutions leveraging integrations between Databricks and major cloud services (AWS, Azure or GCP).
- Architect and support AI‑driven data solutions including integrating Large Language Models (LLMs), building Agentic workflows and operationalizing GenAI or machine learning models within Databricks pipelines.
- Act as a Technical Lead in an agile environment collaborating with architects and product owners to decompose complex business requirements into actionable technical strategies, Epics and User Stories.
- Mentor and upskill engineers fostering a culture of engineering excellence, continuous learning and technical innovation.
- Serve as a key technical liaison effectively translating and communicating complex architectural decisions, data concepts and system capabilities to both technical and non‑technical stakeholders.
Requirements
- Bachelor’s or Master’s degree in Computer Science, Software Engineering or a related field.
- Deep, hands‑on proficiency in PySpark with proven ability to tackle advanced performance tuning, data skew handling, memory management and Catalyst optimizer troubleshooting.
- Extensive experience building production workloads on Databricks including knowledge of Databricks Workflows, Delta Lake and Unity Catalog for governance and security.
- Demonstrable experience designing and migrating to Lakehouse architectures utilizing open table formats such as Delta Lake or Apache Iceberg.
- Strong hands‑on experience integrating Databricks with native cloud services on AWS, Azure or GCP.
- Advanced programming skills in Python (Scala is a plus) with strong understanding of object‑oriented and functional programming principles.
- Proven track record of applying software engineering standards to data pipelines including CI/CD, Infrastructure as Code (e.g. Terraform), version control (Git) and rigorous code reviews.
- Solid background in implementing automated testing frameworks and data quality validation within pipelines.
- Proven experience as a Senior or Lead Engineer capable of driving technical strategy, making architectural decisions and decomposing complex solutions into Agile Epics and User Stories.
- Strong ability to articulate complex technical concepts and trade‑offs clearly to both technical peers and non‑technical stakeholders.
- Advantageous: Official Databricks certifications (e.g. Certified Data Engineer Professional, Spark Developer).
- Highly desirable: Hands‑on experience or strong interest in AI and Agentic workflows including operationalizing LLMs, using frameworks like LangChain or LlamaIndex or leveraging Databricks ML/MosaicML for GenAI applications.
We offer
- EPAM Employee Stock Purchase Plan (ESPP).
- Protection benefits including life assurance, income protection and critical illness cover.
- Private medical insurance and dental care.
- Employee Assistance Program.
- Cyclescheme, Techscheme and season ticket loans.
- Various perks such as free Wednesday lunch in‑office, on‑site massages and regular social events.
- Learning and development opportunities including in‑house training and coaching, professional certifications, and courses.
- If otherwise eligible, participation in the discretionary annual bonus program.
- If otherwise eligible and hired into a qualifying level, participation in the discretionary Long‑Term Incentive (LTI) Program.
Lead Data Engineer (Databricks, PySpark) in London. Employer: EPAM Systems
Join a forward-thinking company in London that prioritises innovation and employee growth, offering a hybrid working model for the Lead Data Engineer role. With a strong focus on modern technologies like Databricks and a commitment to fostering a culture of continuous learning, you will have access to extensive training opportunities, competitive benefits including private medical insurance, and a vibrant work environment that encourages collaboration and social engagement.
StudySmarter Expert Advice🤫
We think this is how you could land the Lead Data Engineer (Databricks, PySpark) role in London
✨Tip Number 1
Network like a pro! Reach out to your connections in the data engineering field, especially those who work with Databricks or PySpark. A friendly chat can lead to insider info about job openings that aren't even advertised yet.
✨Tip Number 2
Show off your skills! Create a portfolio showcasing your projects, especially those involving scalable ETL workflows and Lakehouse architectures. This will give potential employers a taste of what you can do and set you apart from the crowd.
✨Tip Number 3
Prepare for interviews by brushing up on your technical knowledge. Be ready to discuss advanced Spark performance tuning and how you've tackled complex challenges in past roles. We want to see your problem-solving skills in action!
✨Tip Number 4
Don't forget to apply through our website! It’s the best way to ensure your application gets noticed. Plus, we love seeing candidates who are proactive about joining our team at StudySmarter.
Some tips for your application 🫡
Tailor Your CV: Make sure your CV is tailored to the Lead Data Engineer role. Highlight your experience with Databricks, PySpark, and any cloud technologies you've worked with. We want to see how your skills align with our needs!
Showcase Your Projects: Include specific projects where you've designed or optimised data pipelines. We love seeing real-world examples of your work, especially if they involve Lakehouse architectures or advanced Spark performance tuning.
Be Clear and Concise: When writing your application, keep it clear and to the point. Use bullet points for your achievements and responsibilities. We appreciate straightforward communication, especially when it comes to complex technical concepts.
Apply Through Our Website: Don't forget to apply through our website! It’s the best way for us to receive your application and ensures you’re considered for the role. Plus, we love seeing candidates who take that extra step!
How to prepare for a job interview at EPAM Systems
✨Know Your Databricks Inside Out
Make sure you brush up on your Databricks knowledge before the interview. Be ready to discuss your hands-on experience with Databricks Workflows, Delta Lake, and Unity Catalog. Prepare examples of how you've tackled performance tuning and data skew issues in past projects.
✨Showcase Your Leadership Skills
As a Lead Data Engineer, you'll need to demonstrate your ability to mentor and lead teams. Think of specific instances where you've guided others or made architectural decisions. Be prepared to discuss how you foster a culture of engineering excellence and continuous learning.
✨Prepare for Technical Questions
Expect technical questions that dive deep into PySpark, cloud integrations, and CI/CD practices. Brush up on your programming skills in Python and be ready to explain complex concepts clearly. Practising coding challenges can also help you feel more confident.
✨Understand the Business Context
It's crucial to articulate how your technical skills align with the company's goals. Research the company’s projects and think about how your experience can contribute to their success. Be ready to translate technical jargon into business value for non-technical stakeholders.