DE&A - AIML - Auto ML

Zensar Technologies

Full-time. No working from home possible.

Description

Must-have

• Principal-level, hands-on data engineering on AWS (7+ years).

• Deep production experience with S3, S3 Tables, Glue, Athena, and OpenSearch (including k-NN / vector search).

• Track record of building and shipping vector embedding workloads.

• Strong metadata modelling and data taxonomy design experience for scientific or engineering domains.

• Comfortable working with Parquet, JSON-LD, and large binary scientific data formats (mesh, time-series, spectra).

• Python proficiency; PySpark / Glue job tuning experience.

Nice-to-have / differentiators

• Prior simulation / CAE / HPC data lake experience (Ansys, Siemens NX, BETA CAE, OpenFOAM, etc.).

• Familiarity with surrogate model training data pipelines.

• Experience with SageMaker Unified Studio or comparable governed data-mesh tooling (in case integration is required).

• Multi-cloud data engineering experience (AWS and GCP).

• Published or contributed to AWS data architecture patterns or blueprints.

Responsibilities

Key responsibilities on this engagement

• Run the Sprint 1 architecture review of the existing UAT codebase (S3 + Glue + S3 Tables + OpenSearch + Athena) and deliver written gap findings.

• Design the metadata schema, taxonomy, and field catalogue (Light, Brain, Power).

• Tune data orchestration: Glue jobs, Athena queries, S3 Tables configuration, and scheduling.

• Lead the deep-dive technical sessions with analysts on visualization requirements.

• Build and validate the simulation data onboarding pipeline against real data — including the 30 GB-per-run acoustic spectra dataset.

• Configure and validate the OpenSearch k-NN vector store and the Bedrock embedding pipeline.

• Author the AI/ML data export format specification and the AI onboarding pattern document.

• Co-design the API middleware blueprint with the Cloud Infrastructure Architect.

Contact Details:

Zensar Technologies Recruitment Team
