Machine Learning Engineer, Vision

Sarvam AI

Bengaluru, Karnataka, India · MID
AI · Deep Learning

Job Description

Join Sarvam AI as a Machine Learning Engineer focusing on vision-language models.

Responsibilities

  • Design and run training and fine-tuning pipelines for large vision-language models on GPU clusters
  • Build multimodal data pipelines — ingestion, filtering, deduplication, synthetic generation, and quality assurance
  • Implement and experiment with new architectures and training techniques from research
  • Build evaluation harnesses, benchmarks, and automated regression tracking
  • Optimise models for inference — quantisation, batching, and serving infrastructure
  • Build robust pipelines and integrations that put vision model capabilities in the hands of end users
  • Translate real-world problems into well-scoped ML tasks with the right data and evaluation strategy
  • Work directly with clients to understand their use cases — document processing, visual search, form extraction — and own the solution end to end
  • Build production-grade systems on top of Sarvam Vision and open-source models: multimodal pipelines, retrieval-augmented workflows, and structured output extraction
  • Debug and improve deployed solutions — latency, accuracy, edge cases, and integration with client infrastructure

Qualifications

  • Strong Python and PyTorch skills; comfortable reading and modifying model internals
  • Hands-on experience training or fine-tuning large models, including debugging broken runs
  • Experience building data pipelines at scale
  • Solid grounding in transformer architectures and modern training techniques
  • Comfort with ambiguity: the roadmap is not fully pre-specified
  • Strong focus on secure coding practices, code quality, and system reliability
  • Undergraduate degree in a technical discipline (CS, statistics, physics, or equivalent)

Nice to have

  • Experience with vision-language models or multimodal systems
  • Distributed training (FSDP, DeepSpeed, Megatron-LM)
  • Post-training methods: RLHF, DPO, or alignment techniques
  • Inference optimisation: quantisation, distillation, serving
  • Prior exposure to vision-based AI systems or document processing pipelines
  • Contributions to open-source projects or a solid GitHub portfolio

Why join Sarvam AI

  • Work alongside researchers, engineers, builders, and business leaders who move fast and hold each other to a very high bar
  • High ownership and high impact, from day one
  • Everything we do is AI-first, from the way we build and ship to the way we think about problems
  • You can work on problems that could change how an entire country learns, works, and communicates
