DE&A - AIML - Auto ML

Full-Time; no working from home possible
Zensar Technologies
Responsibilities

Key responsibilities on this engagement

• Run the Sprint 1 architecture review of the existing UAT codebase (S3 + Glue + S3 Tables + OpenSearch + Athena) and deliver written gap findings.

• Design the metadata schema, taxonomy, and field catalogue (Light, Brain, Power).

• Tune data orchestration: Glue jobs, Athena queries, S3 Tables config, scheduling.

• Lead the deep-dive technical sessions with analysts on visualization requirements.

• Build and validate the simulation data onboarding pipeline against real data, including the 30 GB-per-run acoustic spectra dataset.

• Configure and validate the OpenSearch k-NN vector store and the Bedrock embedding pipeline.

• Author the AI/ML data export format specification and the AI onboarding pattern document.

• Co-design the API middleware blueprint with the Cloud Infrastructure Architect.


Qualifications

Must-have

• Principal-level hands-on data engineering on AWS (7+ years)

• Deep production experience with S3, S3 Tables, Glue, Athena, and OpenSearch (including k-NN / vector search)

• Built and shipped vector embedding workloads

• Strong metadata modelling and data taxonomy design experience for scientific or engineering domains

• Comfort working with Parquet, JSON-LD, and large binary scientific data formats (mesh, time-series, spectra)

• Python proficiency; PySpark / Glue job tuning experience

Nice-to-have / differentiators

• Prior simulation / CAE / HPC data lake experience (Ansys, Siemens NX, BETA CAE, OpenFOAM, etc.)

• Familiarity with surrogate model training data pipelines

• Experience with SageMaker Unified Studio or comparable governed data-mesh tooling (in case of required integration)

• Multi-cloud data engineering (AWS and GCP) experience

• Published or contributed to AWS data architecture patterns or blueprints


Zensar Technologies

Contact Details:

Zensar Technologies Recruitment Team