Job Description
Key Responsibilities:
- Build and optimize Prophecy data pipelines for large-scale batch and streaming data workloads using PySpark
- Define end-to-end data architecture leveraging Prophecy integrated with Databricks, Spark, or other cloud-native compute engines
- Establish coding standards, reusable components, and naming conventions using Prophecy's visual designer and metadata-driven approach
- Implement scalable and efficient data models (e.g. star schema, SCD Type 2) for data marts and the analytics layer (illustrated in the sketch after this list)
- Integrate Prophecy pipelines with orchestration tools such as Airflow and with data catalog tools for lineage
- Implement version control, automated testing, and deployment pipelines using Git and CI/CD tooling (e.g. GitHub and Jenkins)
- Monitor and tune the performance of Spark jobs, optimizing data partitioning and caching strategies
- Convert workloads from legacy ETL tools such as DataStage and Informatica into Prophecy pipelines using Prophecy's Transpiler component
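For context on the SCD Type 2 modeling responsibility above, here is a minimal PySpark/Delta Lake sketch of one common merge pattern. The table, path, and column names (analytics.dim_customer, customer_id, address, is_current, effective_date, end_date, /mnt/staging/customers) are illustrative assumptions, not details taken from this role.

```python
# Minimal SCD Type 2 sketch with PySpark and Delta Lake.
# All table/column names below are illustrative assumptions.
from pyspark.sql import SparkSession, functions as F
from delta.tables import DeltaTable

spark = SparkSession.builder.appName("scd2-sketch").getOrCreate()

updates = spark.read.format("delta").load("/mnt/staging/customers")  # incoming batch
dim = DeltaTable.forName(spark, "analytics.dim_customer")            # existing dimension

# Rows whose tracked attribute changed need two actions: expire the current row
# and insert a new version. Staging changed rows a second time with a NULL
# merge key lets a single MERGE handle both paths.
changed = (updates.alias("s")
           .join(dim.toDF().alias("t"),
                 (F.col("s.customer_id") == F.col("t.customer_id"))
                 & F.col("t.is_current")
                 & (F.col("s.address") != F.col("t.address")))
           .select("s.*"))

staged = (changed.withColumn("merge_key", F.lit(None).cast("string"))          # forces insert path
          .unionByName(updates.withColumn("merge_key",
                                          F.col("customer_id").cast("string"))))

(dim.alias("t")
 .merge(staged.alias("s"), "t.customer_id = s.merge_key AND t.is_current = true")
 .whenMatchedUpdate(                          # expire the old version of changed rows
     condition="t.address <> s.address",
     set={"is_current": "false", "end_date": "current_date()"})
 .whenNotMatchedInsert(values={                # insert brand-new keys and new versions
     "customer_id": "s.customer_id",
     "address": "s.address",
     "is_current": "true",
     "effective_date": "current_date()",
     "end_date": "null"})
 .execute())
```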
Required Skills & Experience:
- 2+ years of hands-on experience with Prophecy (using PySpark)
- 5+ years of experience in data engineering with tools such as Spark, Databricks, Scala/PySpark, or SQL
- Strong understanding of ETL/ELT pipelines, distributed data processing and data lake architecture.
- Exposure to ETL tools such as Informatica, DataStage, or Talend is an added advantage
- Experience with Unity Catalog, Delta Lake, and modern data lakehouse concepts
- Strong communication and stakeholder management skills.
Contact Details:
PRACYVA Recruiting Team