At a Glance
- Tasks: Design and maintain scalable data pipelines using PySpark and Python.
- Company: Join a high-profile programme focused on building a modern data lake.
- Benefits: Remote work with occasional London travel; competitive pay up to £450/day.
- Why this job: Shape a long-term, impactful platform from the ground up in a collaborative environment.
- Qualifications: Expertise in PySpark, Python, Databricks, and Azure DevOps required.
- Other info: 6-month contract with potential for long-term extension; active SC clearance needed.
The predicted salary is between £72,000 and £108,000 per year.
We are seeking a PySpark Data Engineer to support the development of a modern, scalable data lake for a new strategic programme. This is a greenfield initiative to replace fragmented legacy reporting solutions, offering the opportunity to shape a long-term, high-impact platform from the ground up.
Key Responsibilities:
- Design, build, and maintain scalable data pipelines using PySpark 3/4 and Python 3.
- Contribute to the creation of a unified data lake following medallion architecture principles.
- Leverage Databricks and Delta Lake (Parquet format) for efficient, reliable data processing (see the sketch after this list).
- Apply BDD testing practices using Python Behave and ensure code quality with Python Coverage.
- Collaborate with cross-functional teams and participate in Agile delivery workflows.
- Manage configurations and workflows using YAML, Git, and Azure DevOps.
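To make one of these pipeline hops concrete, here is a minimal PySpark sketch of a bronze-to-silver step on Delta Lake, assuming a Databricks-style setup where the Delta format is available. The paths, table layout, and column names (/lake/bronze/transactions, transaction_id, and so on) are illustrative assumptions, not details of the actual programme:

```python
# Minimal bronze -> silver hop on Delta Lake (illustrative sketch;
# paths and schema are assumptions, not programme details).
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("medallion-demo").getOrCreate()

# Bronze layer: raw ingested records, stored as-is in Delta
# (Parquet files under the hood).
bronze = spark.read.format("delta").load("/lake/bronze/transactions")

# Silver layer: de-duplicated, cleaned, conformed records.
silver = (
    bronze
    .dropDuplicates(["transaction_id"])
    .filter(F.col("amount").isNotNull())
    .withColumn("ingested_date", F.to_date("ingested_at"))
)

(silver.write
    .format("delta")
    .mode("overwrite")
    .partitionBy("ingested_date")
    .save("/lake/silver/transactions"))
```

Partitioning the silver table by an ingest date is one common choice; the real layout would follow the programme's own medallion conventions.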
Required Skills & Experience:
- Proven expertise in PySpark 3/4 and Python 3 for large-scale data engineering.
- Hands-on experience with Databricks, Delta Lake, and medallion architecture.
- Familiarity with Python Behave for Behaviour Driven Development (a sketch follows this list).
- Strong understanding of YAML, code quality tools (e.g. Python Coverage), and CI/CD pipelines.
- Knowledge of Azure DevOps and Git best practices.
- Active SC clearance is essential; applicants without it cannot be considered.
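For flavour, here is a minimal Behave sketch of the BDD style the role calls for. The feature wording, file name, and the in-memory stand-in for the real Spark job are all hypothetical:

```python
# steps/pipeline_steps.py -- step definitions for a hypothetical feature:
#
#   Feature: Silver layer cleansing
#     Scenario: Duplicate transactions are removed
#       Given a bronze dataset with duplicate transaction ids
#       When the silver cleansing job runs
#       Then each transaction id appears exactly once
#
from behave import given, when, then

@given("a bronze dataset with duplicate transaction ids")
def step_given_duplicates(context):
    context.rows = [
        {"transaction_id": 1},
        {"transaction_id": 1},
        {"transaction_id": 2},
    ]

@when("the silver cleansing job runs")
def step_run_job(context):
    # Stand-in for the real PySpark job: de-duplicate in plain Python.
    unique = {r["transaction_id"]: r for r in context.rows}
    context.result = list(unique.values())

@then("each transaction id appears exactly once")
def step_assert_unique(context):
    ids = [r["transaction_id"] for r in context.result]
    assert len(ids) == len(set(ids))
```

Under a typical setup these steps run with `behave`, and `coverage run -m behave` followed by `coverage report` gives the code-quality signal the listing mentions via Python Coverage.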
Contract Details:
- 6-month initial contract with long-term extension potential (multi-year programme).
- Inside IR35.
This is an excellent opportunity to join a high-profile programme at its inception and help build a critical data platform from the ground up. If you are a mission-driven engineer with a passion for scalable data solutions and secure environments, we would love to hear from you.
PySpark Data Engineer employer: Job Traffic
Contact Details:
Job Traffic Recruiting Team
StudySmarter Expert Advice 🤫
We think this is how you could land the PySpark Data Engineer role
✨Tip Number 1
Familiarise yourself with the specific technologies mentioned in the job description, such as PySpark, Databricks, and Delta Lake. Having hands-on experience or projects that showcase your skills in these areas will make you stand out during discussions.
✨Tip Number 2
Since this role involves Agile delivery workflows, brush up on Agile methodologies and be prepared to discuss how you've successfully collaborated in cross-functional teams. Sharing specific examples can demonstrate your adaptability and teamwork skills.
✨Tip Number 3
Make sure you understand the principles of medallion architecture and be ready to explain how you would apply them in building a unified data lake. This knowledge will show your strategic thinking and ability to contribute to the project's long-term vision.
✨Tip Number 4
Since active SC clearance is essential for this position, mention it early and prominently in your conversations. The listing is explicit that applicants without active clearance cannot be considered, so make your clearance status easy to verify from the outset.
Some tips for your application 🫡
Tailor Your CV: Make sure your CV highlights your experience with PySpark, Python, and any relevant tools like Databricks and Delta Lake. Use specific examples that demonstrate your expertise in building scalable data pipelines.
Craft a Compelling Cover Letter: In your cover letter, express your enthusiasm for the role and the opportunity to work on a greenfield project. Mention your familiarity with medallion architecture and how your skills align with the company's needs.
Showcase Relevant Projects: If you have worked on similar projects, describe them briefly in your application. Highlight your contributions to data engineering initiatives, especially those involving Agile methodologies and CI/CD practices.
Highlight Security Clearance: Since active SC clearance is essential for this role, make sure to mention it prominently in your application. This will help you stand out as a qualified candidate right from the start.
How to prepare for a job interview at Job Traffic
✨Showcase Your Technical Skills
Be prepared to discuss your experience with PySpark and Python in detail. Highlight specific projects where you've built scalable data pipelines and how you utilised Databricks and Delta Lake. This is your chance to demonstrate your technical expertise.
✨Understand the Medallion Architecture
Familiarise yourself with the medallion architecture principles, as this role involves creating a unified data lake. Be ready to explain how you would implement these principles in a real-world scenario, showcasing your understanding of data engineering best practices.
✨Emphasise Collaboration and Agile Experience
Since the role requires collaboration with cross-functional teams, share examples of how you've worked in Agile environments. Discuss your experience participating in sprints and how you've contributed to team success; this will show that you work well with others.
✨Prepare for Behaviour Driven Development (BDD) Questions
As BDD testing practices using Python Behave are part of the job, be ready to discuss your experience with BDD. Prepare to explain how you ensure code quality and how tools like Python Coverage have played a role in your previous projects.