Machine Learning Expert - Fully Remote | Up to $90/hr


Overview

We are hiring experienced machine learning engineers and researchers to serve as human baseliners for evaluations of open-ended machine learning research tasks. These evaluations measure how well AI agents perform on realistic AI R&D problems. To interpret agent performance, we also need strong human reference points: skilled practitioners attempting the same tasks under the same time and compute constraints. As a baseliner, you will complete self-contained ML research tasks in a sandboxed environment, working independently with your preferred tools and workflow. Your performance will be used as a benchmark against which frontier-model agents are evaluated.

What You’ll Do

  • Attempt open-ended machine learning research tasks under a fixed time and compute budget (work trial)
  • Work independently in a sandboxed Linux environment with internet access
  • Use your preferred tooling, including IDEs and AI coding assistants such as Cursor, Claude Code, and ChatGPT
  • Record your full working session via screen recording
  • Complete a short pre-task and post-task questionnaire
  • Submit your final work product, screen recording, and completed questionnaires

After the work trial, selected candidates will be hired for a longer commitment.

Commitment

  • Minimum 20 hours per week if selected
  • More availability is strongly preferred

Requirements

  • 3+ years of machine learning experience (time spent in a PhD program counts toward this requirement; undergraduate and master’s experience does not count)
  • Attended a top‑100 university or worked at FAANG or a comparable company
  • Experience with at least one major ML framework such as PyTorch, JAX, or TensorFlow
  • Deep, hands‑on expertise in at least one of the following focus areas:
      • Pretraining under tight data and compute budgets
      • PPO, reward shaping, custom gym / gymnasium environments, and throughput tuning
      • Full fine‑tuning, LoRA, QLoRA, DPO, RLHF, RLAIF, and distillation
      • Large‑scale corpus filtering, deduplication, subsampling, and benchmark contamination avoidance
      • Architecture design under strict parameter‑count or size constraints
      • Modifying pretrained architectures, including attention patterns, pooling heads, or training objectives
      • Contrastive training for embedding or retrieval models
      • Generative vision or video modeling
      • Multilingual or low‑resource language experience
      • Image or video data pipelines at scale
  • Experience balancing competing model objectives such as safety and capability
  • Prior work as an ML evaluator, red‑teamer, or baseliner

Required Domain Expertise

  • Pretraining: training transformer language models from scratch
  • Reinforcement learning: training agents in custom or existing environments
  • Post‑training: fine‑tuning and aligning LLMs
  • Dataset curation: building and cleaning large text corpora for LLM training
  • Model architecture: designing and modifying neural network architectures

Logistics (work trial requirements)

  • One baseline attempt per contractor per task; a task may not be reattempted
  • All work is confidential and covered by NDA
  • Compute and environment are provided; no personal GPU is required

Contact Details:

Obsidian Recruitment Team