At a Glance
- Tasks: Design and maintain scalable data pipelines using PySpark and Python.
- Company: Join a cutting-edge team focused on building a modern data lake.
- Benefits: Remote work with occasional travel to London; competitive pay up to £450/day.
- Why this job: Shape a high-impact platform from scratch in a greenfield initiative.
- Qualifications: Expertise in PySpark, Python, Databricks, and Azure DevOps required.
- Other info: 6-month contract with potential for long-term extension; active SC clearance needed.
The predicted salary is between £54,000 and £75,600 per year.
We are seeking a PySpark Data Engineer to support the development of a modern, scalable data lake for a new strategic programme. This is a greenfield initiative to replace fragmented legacy reporting solutions, offering the opportunity to shape a long-term, high-impact platform from the ground up.
Key Responsibilities:
- Design, build, and maintain scalable data pipelines using PySpark 3/4 and Python 3 (a minimal sketch follows this list).
- Contribute to the creation of a unified data lake following medallion architecture principles.
- Leverage Databricks and Delta Lake (Parquet format) for efficient, reliable data processing.
- Apply BDD testing practices using Python Behave and ensure code quality with Python Coverage.
- Collaborate with cross-functional teams and participate in Agile delivery workflows.
- Manage configurations and workflows using YAML, Git, and Azure DevOps.
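To give a flavour of the day-to-day work, here is a minimal sketch of a bronze-to-silver medallion step in PySpark with Delta Lake. The paths and column names are illustrative assumptions, not details taken from the programme itself.

```python
# Minimal sketch of a bronze -> silver medallion step, assuming a Databricks-style
# runtime where Delta Lake is available. All paths and column names below are
# illustrative placeholders, not part of the job spec.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()  # on Databricks, `spark` already exists

BRONZE_PATH = "/mnt/lake/bronze/transactions"   # hypothetical raw landing zone
SILVER_PATH = "/mnt/lake/silver/transactions"   # hypothetical cleaned layer

# Bronze: raw data stored as-is in Delta (Parquet under the hood).
bronze = spark.read.format("delta").load(BRONZE_PATH)

# Silver: deduplicate, enforce types, and drop records that fail basic checks.
silver = (
    bronze
    .dropDuplicates(["transaction_id"])
    .withColumn("amount", F.col("amount").cast("decimal(18,2)"))
    .filter(F.col("transaction_id").isNotNull())
)

# Overwrite the silver layer; a production job would usually load incrementally.
silver.write.format("delta").mode("overwrite").save(SILVER_PATH)
```

In practice the silver step would typically be incremental (for example a Delta MERGE) rather than a full overwrite, but the shape of the work is the same.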
Required Skills & Experience:
- Proven expertise in PySpark 3/4 and Python 3 for large-scale data engineering.
- Hands-on experience with Databricks, Delta Lake, and medallion architecture.
- Familiarity with Python Behave for Behaviour Driven Development.
- Strong understanding of YAML, code quality tools (e.g. Python Coverage), and CI/CD pipelines.
- Knowledge of Azure DevOps and Git best practices.
- Active SC clearance is essential; applicants without it cannot be considered.
Contract Details:
- 6-month initial contract with long-term extension potential (multi-year programme).
- Inside IR35.
This is an excellent opportunity to join a high-profile programme at its inception and help build a critical data platform from the ground up. If you are a mission-driven engineer with a passion for scalable data solutions and secure environments, we'd love to hear from you.
PySpark Data Engineer employer: iO Associates
Contact Details:
iO Associates Recruiting Team
StudySmarter Expert Advice 🤫
We think this is how you could land the PySpark Data Engineer role
✨Tip Number 1
Familiarise yourself with the latest features of PySpark 3/4 and Python 3. Being well-versed in these technologies will not only boost your confidence but also allow you to discuss specific use cases during interviews.
✨Tip Number 2
Gain hands-on experience with Databricks and Delta Lake. Consider working on personal projects or contributing to open-source initiatives that utilise these tools, as practical knowledge can set you apart from other candidates.
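If you want a concrete starting point for such a project, the sketch below shows a Delta Lake upsert using the delta-spark package's DeltaTable API. The table paths and join key are made-up examples, not requirements from the role.

```python
# Illustrative Delta Lake upsert (MERGE) using the delta-spark DeltaTable API;
# the paths and the customer_id join key are invented for a personal project.
from delta.tables import DeltaTable
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

target = DeltaTable.forPath(spark, "/tmp/demo/customers")            # hypothetical path
updates = spark.read.format("delta").load("/tmp/demo/customer_updates")

(
    target.alias("t")
    .merge(updates.alias("u"), "t.customer_id = u.customer_id")
    .whenMatchedUpdateAll()      # update rows that already exist
    .whenNotMatchedInsertAll()   # insert rows that are new
    .execute()
)
```

MERGE is a good feature to practise because it is Delta-specific and comes up constantly in lakehouse work.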
✨Tip Number 3
Brush up on your understanding of Behaviour Driven Development (BDD) and Python Behave. Being able to articulate how you've implemented BDD in past projects will demonstrate your commitment to code quality and collaboration.
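As a refresher, here is a hedged sketch of what a Behave step file for a simple pipeline scenario might look like; the scenario, the data, and the inline deduplication logic are purely illustrative, not taken from the job description.

```python
# features/steps/pipeline_steps.py: sketch of Behave step definitions for a
# hypothetical feature:
#
#   Scenario: Deduplicating the silver layer
#     Given a bronze dataset with duplicate transaction ids
#     When the silver transformation runs
#     Then each transaction id appears exactly once
from behave import given, when, then

@given("a bronze dataset with duplicate transaction ids")
def step_bronze(context):
    context.bronze = [{"transaction_id": 1}, {"transaction_id": 1}, {"transaction_id": 2}]

@when("the silver transformation runs")
def step_transform(context):
    # Stand-in for the real PySpark job: keep the last row per transaction id.
    seen = {row["transaction_id"]: row for row in context.bronze}
    context.silver = list(seen.values())

@then("each transaction id appears exactly once")
def step_assert(context):
    ids = [row["transaction_id"] for row in context.silver]
    assert len(ids) == len(set(ids))
```

Python Coverage can then measure your step and pipeline code, e.g. `coverage run -m behave` followed by `coverage report`, which ties the BDD and code-quality parts of the stack together.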
✨Tip Number 4
Ensure you have a solid grasp of YAML, Git, and Azure DevOps best practices. You might want to create a small project that showcases your ability to manage configurations and workflows effectively, which can be a great talking point in interviews.
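One simple way to showcase the YAML piece is a pipeline driven by a config file. The sketch below uses the common PyYAML package and invents a small config schema purely for illustration.

```python
# Minimal sketch of driving a pipeline from a YAML config; the schema here is
# invented for demonstration, not a format used by the programme.
import yaml  # PyYAML; note that Azure DevOps pipeline definitions are also YAML

CONFIG = """
pipeline:
  name: transactions_silver
  source: /mnt/lake/bronze/transactions
  target: /mnt/lake/silver/transactions
  dedupe_keys: [transaction_id]
"""

cfg = yaml.safe_load(CONFIG)["pipeline"]
print(f"Loading {cfg['source']} -> {cfg['target']} (keys: {cfg['dedupe_keys']})")
```

Committing a small project like this to Git, with an Azure DevOps pipeline that runs your tests on each push, would touch every tool named in the job spec and gives you a ready-made talking point.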
Some tips for your application 🫡
Tailor Your CV: Make sure your CV highlights your experience with PySpark, Python, and any relevant data engineering projects. Use specific examples that demonstrate your skills in building scalable data pipelines and working with Databricks and Delta Lake.
Craft a Compelling Cover Letter: In your cover letter, express your enthusiasm for the role and the opportunity to work on a greenfield project. Mention your familiarity with medallion architecture and how your previous experiences align with the responsibilities outlined in the job description.
Showcase Relevant Projects: If you have worked on similar projects, include them in your application. Describe your role, the technologies used (like YAML, Git, and Azure DevOps), and the impact of your contributions. This will help demonstrate your hands-on experience.
Highlight Security Clearance: Since active SC clearance is essential for this position, make sure to clearly state your current security clearance status in your application. This will ensure that your application is considered without any delays.
How to prepare for a job interview at iO Associates
✨Showcase Your Technical Skills
Be prepared to discuss your experience with PySpark and Python in detail. Highlight specific projects where you've designed and built data pipelines, and be ready to explain the challenges you faced and how you overcame them.
✨Understand the Medallion Architecture
Familiarise yourself with medallion architecture principles as they are crucial for this role. Be ready to discuss how you would implement these principles in a data lake environment and provide examples from your past work.
✨Demonstrate Collaboration Skills
Since the role involves working with cross-functional teams, prepare to share examples of how you've successfully collaborated in Agile environments. Discuss your communication style and how you handle feedback and differing opinions.
✨Prepare for Behaviour Driven Development (BDD)
Brush up on BDD practices using Python Behave. Be ready to explain how you ensure code quality and testing in your projects, and consider discussing any tools or methodologies you've used to maintain high standards.