At a Glance
- Tasks: Design and optimise data pipelines using Spark and Python to enhance client projects.
- Company: Join Dataiku, a leader in AI technology and innovation.
- Benefits: Enjoy competitive salary, remote work options, and opportunities for personal growth.
- Other info: Be part of a dynamic team shaping the future of AI.
- Why this job: Make a real impact in the AI field while collaborating with diverse teams.
- Qualifications: Experience with Spark, SQL, and Python; strong problem-solving skills required.
The predicted salary is between £36,000 and £60,000 per year.
Dataiku is looking for a Data Engineer specialized in Spark (PySpark) to join our Field Engineering team. In this role, you will work closely with our clients to troubleshoot and optimize complex data pipelines within the Dataiku platform. This includes both reactive support (advanced issues reported via the support portal) and proactive services (performance reviews and architecture advisory missions we propose to clients). You will serve as a technical expert in data processing, leveraging SQL and Python frameworks, and you will specialize in Spark-based distributed data processing and lakehouse architecture. You will help our clients succeed, whether they are working with SQL-based workflows or processing data on Kubernetes, Databricks, or other modern data platforms.
What You'll Do
- Help customers design, build, and optimize Flows in Dataiku, improving overall project performance and maintainability.
- Debug and enhance complex Spark code and data pipelines for better performance and reliability.
- Guide clients in tuning and scaling Spark environments, such as Spark on Kubernetes and Databricks, by providing architectural guidance and best practices.
- Optimize SQL-based data pipelines to ensure efficient and robust data workflows within Dataiku.
- Advise clients on integrating different data pipelines (Spark, SQL, Python) into optimized solutions.
- Collaborate with internal teams to resolve technical issues and contribute to the knowledge base.
Who You Are
You have deep hands-on experience building, debugging, and tuning Spark pipelines in production environments. Specifically, you have:
- Spark & PySpark Expertise
  - Proficiency in writing and debugging PySpark code for large-scale data processing.
  - Experience with Parquet, Delta Lake, and columnar file formats.
  - Understanding of Spark’s interaction with metastores (e.g., Hive, Unity Catalog).
  - Deep understanding of resource management: Spark executors, cores, memory, and relevant configurations (e.g., spark.executor.memory, spark.sql.shuffle.partitions).
  - Expertise in tuning Spark jobs: partitioning, caching, broadcast joins, and avoiding unnecessary shuffles.
- Lakehouse & Orchestration
  - Familiarity with lakehouse architectures and ACID-compliant data layers (Delta Lake, Iceberg, Hudi).
  - Experience working with Databricks, including Databricks Connect and Databricks Workflows.
  - Experience automating and scheduling Spark jobs using tools like Apache Airflow or native orchestration tools.
- Core Data Engineering Skills
  - Proven experience developing, optimizing, and troubleshooting SQL-based data pipelines for efficient ETL and data transformation processes.
  - Proficiency in building and managing data transformation workflows in Python, leveraging frameworks such as pandas.
  - Familiarity with data modeling concepts and data quality best practices.
  - Experience integrating data from a variety of sources, including databases, APIs, and cloud storage.
  - Ability to communicate technical concepts effectively to both technical and non-technical stakeholders.
What does the hiring process look like?
- Initial call with a member of our Technical Recruiting team.
- Video call with the Field Engineer Hiring Manager.
- Technical Assessment to demonstrate your skills (take-home test).
- Debrief of your Tech Assessment with FE Team members.
- Final Interview with the VP Field Engineering.
At Dataiku, you'll be part of a journey to shape the ever-evolving world of AI. We're not just building a product; we're crafting the future of AI. If you're ready to make a significant impact in a company that values innovation, collaboration, and your personal growth, we can't wait to welcome you to Dataiku!
Our practices are rooted in the idea that everyone should be treated with dignity, decency and fairness. Dataiku also believes that a diverse identity is a source of strength and allows us to optimize across the many dimensions that are needed for our success. Therefore, we are proud to be an equal opportunity employer. All employment practices are based on business needs, without regard to race, ethnicity, gender identity or expression, sexual orientation, religion, age, neurodiversity, disability status, citizenship, veteran status or any other aspect which makes an individual unique or protected by laws and regulations in the locations where we operate.
If you need assistance or an accommodation, please contact us at: reasonable-accommodations@dataiku.com
Protect yourself from fraudulent recruitment activity. Dataiku will never ask you for payment of any type during the interview or hiring process. Other than our video-conference application, Zoom, we will also never ask you to make purchases or download third-party applications during the process. If you experience something out of the ordinary or suspect fraudulent activity, please review our page on identifying and reporting fraudulent activity.
Data Engineer – Spark Specialist in Ledbury employer: Dataiku
Dataiku is an exceptional employer that fosters a culture of innovation and collaboration, making it an ideal place for Data Engineers to thrive. With a strong commitment to employee growth, Dataiku offers opportunities to work on cutting-edge AI technologies while supporting a diverse and inclusive workplace. Located in Ledbury, England, employees benefit from a supportive environment that values personal development and encourages meaningful contributions to the future of AI.
StudySmarter Expert Advice🤫
We think this is how you could land Data Engineer – Spark Specialist in Ledbury
✨Get Involved in Data Science Meetups
Tap into local data science meetups or workshops to connect with fellow enthusiasts and professionals. These events are goldmines for networking, and sometimes even lead directly to job openings at companies like Dataiku!
✨Show Off Your Projects
Start building a public portfolio showcasing your data science projects on platforms like GitHub or personal websites. Highlight unique analyses or models you've developed. This not only demonstrates your skills but also gets your name out there for roles like Data Engineer – Spark Specialist at Dataiku.
✨Leverage Professional Networks
Join professional bodies related to data science, like the Data Science Society or similar organisations. Getting involved can lead to mentorship opportunities and insider knowledge about full-time positions at companies like Dataiku.
✨Apply Directly through Our Website
When you find a suitable opening like Data Engineer – Spark Specialist at Dataiku, make sure to apply directly through our website. It gives you an edge and shows you're keen to join our team. Plus, who doesn’t love a direct application? It’s easier than navigating through job boards!
Some tips for your application 🫡
Show Off Your Projects: In the world of data science, your projects can speak volumes about your skills. Make sure to showcase a few key projects in your CV or portfolio, especially those that highlight your ability to work with data sets, build models, or use relevant tools like Python, R, or SQL. Don’t forget to include links to any GitHub repositories if applicable!
Quantify Your Achievements: Employers love numbers! When drafting your CV, highlight your achievements with quantifiable results. For instance, mention how your data analysis led to a certain percentage increase in efficiency or revenue at a previous job or project. These details can really make your application pop!
Craft a Tailored Cover Letter: For a full-time role at Dataiku, your cover letter should reflect your passion for data science and your excitement about the specific projects or values of the company. Dive into why you’re a good fit, how your skills align with their needs, and any unique perspectives you can bring to the team.
Stand Out with Relevant Courses and Certifications: Although experience talks, relevant courses or certifications can be your ticket to impressing hiring managers at Dataiku. Mention any standout courses you've completed that equipped you with essential skills, such as machine learning certifications or data visualisation courses. This shows your commitment to continuously developing your skills in the field!
How to prepare for a job interview at Dataiku
✨Brush Up on Your Statistics
For a data science role, sharpen your statistics skills. Be ready to tackle technical questions on probability distributions, hypothesis testing, and regression analysis. These are often the bread and butter of data science interviews, so don't just skim over them!
✨Showcase Your Projects
Prepare a strong portfolio showcasing your data science projects. Include details about the datasets used, the tools and techniques applied, and the impact of your findings. Walking the interviewers through a particularly challenging project or a visualisation that had real-world implications will help you stand out!
✨Get Comfortable with Python and R
Most data science positions require proficiency in programming languages like Python and R. Practise with common libraries like pandas, NumPy, and scikit-learn, and be ready for live coding exercises or algorithm questions. Showing off your coding chops can really impress the interviewers at Dataiku!
✨Prepare for Case Studies
Expect to encounter real-world case studies during the interview. You might be asked how you’d approach a data problem or analyse a dataset to extract insights. It's essential to think out loud and demonstrate your problem-solving process so that the interviewer can see your logical thinking in action.