Lead Data Engineer (Databricks, PySpark) in London

London | Full-Time | 80000 - 100000 € / year (est.) | Hybrid working
EPAM Systems, Inc.

At a Glance

  • Tasks: Lead the design and development of next-gen data platforms using Databricks and PySpark.
  • Company: Join a forward-thinking tech company in London with a hybrid work culture.
  • Benefits: Enjoy a competitive salary, private medical insurance, and perks like free lunches and social events.
  • Other info: Great opportunities for learning and career growth in a dynamic, agile environment.
  • Why this job: Shape the future of data engineering while mentoring others and solving complex challenges.
  • Qualifications: Strong experience in Databricks, PySpark, and cloud services, plus proven experience as a Senior or Lead Engineer.

The predicted salary is between 80000 and 100000 € per year.

We're looking for a Lead Data Engineer (Databricks, PySpark) to join our team in London, UK in a hybrid working mode. In this role, you will help shape and deliver next-generation data platforms. You will be hands-on in developing, implementing and optimizing scalable ETL workflows and data pipelines, leveraging the full capabilities of Databricks and modern cloud technologies. You will play a key part in the transition to a robust Lakehouse architecture, working closely with cross-functional teams in an agile environment. This position is ideal for a data engineering leader who enjoys solving complex challenges, mentoring others and working at the forefront of Databricks technology. Experience with any major cloud provider is welcome, but a strong focus on Databricks is essential.

Responsibilities

  • Design, develop and maintain production-grade data applications, reusable frameworks and scalable data pipelines using Databricks, PySpark and Python/Scala.
  • Lead the architectural design and modernization of data platforms to a Lakehouse architecture leveraging Databricks-native technologies such as Delta Lake and Unity Catalog.
  • Drive advanced Spark performance tuning, including handling data skew, optimizing the query execution plans produced by the Catalyst optimizer and managing cluster compute and memory efficiency for high-volume workloads.
  • Champion modern software engineering practices within the data ecosystem including CI/CD pipelines, Infrastructure as Code (IaC), rigorous code reviews, automated testing and version control.
  • Implement secure, scalable and highly available data solutions leveraging integrations between Databricks and major cloud services (AWS, Azure or GCP).
  • Architect and support AI-driven data solutions including integrating Large Language Models (LLMs), building Agentic workflows and operationalizing GenAI or machine learning models within Databricks pipelines.
  • Act as a Technical Lead in an agile environment collaborating with architects and product owners to decompose complex business requirements into actionable technical strategies, Epics and User Stories.
  • Mentor and upskill engineers fostering a culture of engineering excellence, continuous learning and technical innovation.
  • Serve as a key technical liaison effectively translating and communicating complex architectural decisions, data concepts and system capabilities to both technical and non-technical stakeholders.

Requirements

  • Bachelor’s or Master’s degree in Computer Science, Software Engineering or a related field.
  • Deep, hands-on proficiency in PySpark with proven ability to tackle advanced performance tuning, data skew handling, memory management and Catalyst optimizer troubleshooting.
  • Extensive experience building production workloads on Databricks including knowledge of Databricks Workflows, Delta Lake and Unity Catalog for governance and security.
  • Demonstrable experience designing and migrating to Lakehouse architectures utilizing open table formats such as Delta Lake or Apache Iceberg.
  • Strong hands-on experience integrating Databricks with native cloud services on AWS, Azure or GCP.
  • Advanced programming skills in Python (Scala is a plus) with strong understanding of object-oriented and functional programming principles.
  • Proven track record of applying software engineering standards to data pipelines including CI/CD, Infrastructure as Code (e.g. Terraform), version control (Git) and rigorous code reviews.
  • Solid background in implementing automated testing frameworks and data quality validation within pipelines.
  • Proven experience as a Senior or Lead Engineer capable of driving technical strategy, making architectural decisions and decomposing complex solutions into Agile Epics and User Stories.
  • Strong ability to articulate complex technical concepts and trade-offs clearly to both technical peers and non-technical stakeholders.

Advantageous: Official Databricks certifications (e.g. Certified Data Engineer Professional, Spark Developer).

Highly desirable: Hands-on experience or strong interest in AI and Agentic workflows including operationalizing LLMs, using frameworks like LangChain or LlamaIndex or leveraging Databricks ML/MosaicML for GenAI applications.

We offer

  • EPAM Employee Stock Purchase Plan (ESPP).
  • Protection benefits including life assurance, income protection and critical illness cover.
  • Private medical insurance and dental care.
  • Employee Assistance Program.
  • Competitive group pension plan.
  • Cyclescheme, Techscheme and season ticket loans.
  • Various perks such as free Wednesday lunch in-office, on-site massages and regular social events.
  • Learning and development opportunities including in-house training and coaching, professional certifications, over 22,000 courses on LinkedIn Learning Solutions and much more.
  • If otherwise eligible, participation in the discretionary annual bonus program.
  • If otherwise eligible and hired into a qualifying level, participation in the discretionary Long-Term Incentive (LTI) Program.
  • All benefits and perks are subject to certain eligibility requirements.

Lead Data Engineer (Databricks, PySpark) in London employer: EPAM Systems, Inc.

At EPAM, we pride ourselves on being an exceptional employer, offering a dynamic work culture that fosters innovation and collaboration in the heart of London. As a Lead Data Engineer, you will not only have the opportunity to work with cutting-edge technologies like Databricks and PySpark but also benefit from extensive learning and development resources, competitive compensation packages, and a supportive environment that encourages mentorship and professional growth. Our hybrid working model, along with unique perks such as free lunches and wellness initiatives, makes EPAM a truly rewarding place to advance your career while making a meaningful impact.

Contact Details:

EPAM Systems, Inc. Recruiting Team

StudySmarter Expert Advice 🤫

We think this is how you could land the Lead Data Engineer (Databricks, PySpark) role in London

Tip Number 1

Network like a pro! Reach out to your connections in the data engineering field, especially those who work with Databricks or PySpark. A friendly chat can lead to insider info about job openings that aren't even advertised yet.

Tip Number 2

Show off your skills! Create a portfolio showcasing your projects, especially those involving Databricks and scalable data pipelines. This gives potential employers a taste of what you can do and sets you apart from the crowd.

Tip Number 3

Prepare for interviews by brushing up on common data engineering challenges and solutions. Be ready to discuss your experience with Lakehouse architectures and performance tuning in detail. We want to see your problem-solving skills in action!

Tip Number 4

Apply through our website! It’s the best way to ensure your application gets noticed. Plus, employers love seeing candidates who are proactive and eager to join their team.

We think you need these skills to ace the Lead Data Engineer (Databricks, PySpark) role in London

Databricks
PySpark
Python
Scala
ETL Workflows
Lakehouse Architecture
Delta Lake

Some tips for your application 🫡

Tailor Your CV: Make sure your CV is tailored to the Lead Data Engineer role. Highlight your experience with Databricks, PySpark, and any cloud services you've worked with. We want to see how your skills align with what we're looking for!

Showcase Your Projects: Include specific projects where you've designed or optimised data pipelines. We love seeing real-world examples of your work, especially if they involve Lakehouse architectures or advanced Spark performance tuning.

Be Clear and Concise: When writing your application, keep it clear and to the point. Use bullet points for easy reading and make sure to highlight your key achievements. We appreciate straightforward communication!

Apply Through Our Website: Don't forget to apply through our website! It’s the best way for us to receive your application and ensures you’re considered for the role. We can’t wait to see what you bring to the table!

How to prepare for a job interview at EPAM Systems, Inc.

Know Your Tech Inside Out

Make sure you brush up on your Databricks and PySpark skills. Be ready to discuss specific projects where you've implemented scalable ETL workflows or optimised data pipelines. The more detailed examples you can provide, the better!

Showcase Your Leadership Skills

As a Lead Data Engineer, you'll be expected to mentor others and drive technical strategy. Prepare to share experiences where you've led teams or made architectural decisions. Highlight how you fostered a culture of continuous learning.

Understand the Lakehouse Architecture

Familiarise yourself with Lakehouse architecture and its benefits. Be prepared to discuss how you've designed or migrated to such architectures, especially using Delta Lake or Apache Iceberg. This will show your depth of knowledge in modern data solutions.

Communicate Clearly with All Stakeholders

Practice explaining complex technical concepts in simple terms. You’ll need to articulate your ideas to both technical peers and non-technical stakeholders. Think of examples where you've successfully communicated intricate details to diverse audiences.